Thesaurus with åäö not working

Issues with installing under all GNU/Linux Distributions
eyerouge
Posts: 11
Joined: Thu Mar 27, 2008 8:01 am

Thesaurus with åäö not working

Post by eyerouge »

I tried to install a Swedish thesaurus in OpenOffice.org Writer 2.3.0 (Ubuntu Feisty). It works, except that whenever I look up a word containing a Swedish letter (åäöÅÄÖ), the thesaurus finds no matches. Words without Swedish letters get proper matches.

When a word without Swedish letters is found, any results for it that contain Swedish letters are displayed garbled. It looks like a character-encoding issue, but I have no idea how to fix it: I've tried different ISO encodings in the thesaurus files without success.

Here's a link to the thesaurus I use. As it's only a test, there are just three words in it (I do, however, have the full thesaurus, based on Synlex). After loading it, try looking up åland, öland and kresi. You'll notice that åland and öland give no hits while kresi does, even though all three are in the thesaurus.
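A minimal Python sketch of why the Swedish words miss while kresi hits. It assumes, without confirmation from this thread, that the thesaurus converts the typed word into the charset named on the file's first line and then compares raw bytes with the index; DECLARED, ACTUAL, key_bytes and lookup_matches are hypothetical names.

```python
# Assumption: the lookup encodes the typed word in the declared charset
# and compares the resulting bytes with the bytes stored in the index.
DECLARED = "iso8859_10"  # charset named on the file's first line
ACTUAL = "utf-8"         # charset the editor really used when saving


def key_bytes(word: str, encoding: str) -> bytes:
    """Bytes of a headword in the given encoding."""
    return word.encode(encoding)


def lookup_matches(word: str) -> bool:
    """True if the searched bytes equal the stored bytes."""
    searched = key_bytes(word, DECLARED)  # typed word, converted as declared
    stored = key_bytes(word, ACTUAL)      # same word as it sits in the file
    return searched == stored


for w in ["åland", "öland", "kresi"]:
    print(w, lookup_matches(w), key_bytes(w, DECLARED), key_bytes(w, ACTUAL))
```

ASCII-only words such as kresi have identical bytes in UTF-8 and in every ISO 8859 charset, which would explain why they are the only ones found.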

Does anyone have a solution for this?
It feels like a really small thing I'm missing.
esperantisto
Volunteer
Posts: 578
Joined: Mon Oct 08, 2007 1:31 am

Re: Thesaurus with åäö not working

Post by esperantisto »

I think your problem is an encoding mismatch: the .dat file declares ISO Latin-10 and really is in Latin-10, but the .idx file, although it also declares Latin-10, is actually in UTF-8. After re-saving it as Latin-10, it works. So, just watch your encodings.
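A minimal Python sketch for spotting this kind of mismatch, assuming the MyThes convention that the first line of each file names its charset. The helper names guess_real_encoding and check_header are hypothetical, and the check is only a heuristic: non-ASCII text that decodes cleanly as UTF-8 is very unlikely to be in a single-byte ISO 8859 charset.

```python
from typing import Tuple


def guess_real_encoding(body: bytes) -> str:
    """Heuristic guess at how the bytes after the header line are really encoded."""
    try:
        text = body.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: probably a single-byte charset such as ISO 8859-10.
        return "single-byte"
    # Pure ASCII is valid UTF-8 too, and is identical in every ISO 8859 charset.
    return "utf-8" if any(ord(ch) > 127 for ch in text) else "ascii"


def check_header(data: bytes) -> Tuple[str, str]:
    """Return (charset declared on line 1, guessed encoding of the rest)."""
    first_line, _, body = data.partition(b"\n")
    declared = first_line.strip().decode("ascii", errors="replace")
    return declared, guess_real_encoding(body)


# Demo with in-memory data; for a real file pass open(path, "rb").read().
sample = "ISO8859-10\nåland|11\n".encode("utf-8")  # declared 8859-10, stored as UTF-8
print(check_header(sample))  # ('ISO8859-10', 'utf-8')
```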
AOO 4.2.0 (of 2015) / LO 7.x / Win 7 / openSUSE Linux Leap 15.4 (64-bit)
esperantisto
Volunteer
Posts: 578
Joined: Mon Oct 08, 2007 1:31 am

Re: Thesaurus with åäö not working

Post by esperantisto »

Funny: after trying UTF-8 and ISO Latin-1 and then going back to ISO Latin-10, only öland produces any thesaurus suggestions now. So there may be some issues with the thesaurus itself.
esperantisto
Volunteer
Posts: 578
Joined: Mon Oct 08, 2007 1:31 am

Re: Thesaurus with åäö not working

Post by esperantisto »

And, by the way, all three words are flagged as spelling errors by the Swedish (Sweden) dictionary I have installed.
eyerouge
Posts: 11
Joined: Thu Mar 27, 2008 8:01 am

Re: Thesaurus with åäö not working

Post by eyerouge »

esperantisto wrote: I think your problem is an encoding mismatch: the .dat file declares ISO Latin-10 and really is in Latin-10, but the .idx file, although it also declares Latin-10, is actually in UTF-8. After re-saving it as Latin-10, it works. So, just watch your encodings.
I'm confused now: Latin-10 is not the same as ISO 8859-10. Latin-10 is, however, the same as ISO 8859-16, which would be 1) the wrong language set for this thesaurus and 2) not supported by OpenOffice.org, according to the MyThes author:
Strings currently recognized by OpenOffice.org are:

ISO8859-1
ISO8859-2
ISO8859-3
ISO8859-4
ISO8859-5
ISO8859-6
ISO8859-7
ISO8859-8
ISO8859-9
ISO8859-10
KOI8-R
CP-1251
ISO8859-14
ISCII-DEVANAGARI
So either I'm confused or you mistakenly mixed up Latin-10 with ISO 8859-10.
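The naming clash is easy to check with Python's codec registry, which knows both the "Latin-N" and the "ISO 8859-N" names. A small sketch; Python's aliases serve here only as an independent reference, not as anything OpenOffice.org reads:

```python
import codecs

# Pairs of (Latin-N name, ISO 8859 part) as registered in Python's codecs module.
pairs = [
    ("latin6", "iso-8859-10"),   # Nordic, covers å, ä, ö
    ("latin10", "iso-8859-16"),  # South-Eastern European
]

for latin_name, iso_name in pairs:
    same = codecs.lookup(latin_name).name == codecs.lookup(iso_name).name
    print(f"{latin_name} is {iso_name}: {same}")

# The mix-up in this thread: Latin-10 is not ISO 8859-10.
print(codecs.lookup("latin10").name == codecs.lookup("iso-8859-10").name)  # False
```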

I also don't follow whether you actually got it to work: did you ever get it to find all three of åland, öland and kresi? Was it just an encoding mismatch in the files? If so:

1) Exactly which file did you re-save?
2) As what encoding, in ISO terms? (See http://en.wikipedia.org/wiki/ISO/IEC_8859 for the list.)
3) What did you put on the first line of each file, where the ISO charset name goes?
4) In what program did you save it? My gedit won't let me save in anything other than UTF-8 (which isn't supported by OpenOffice.org, according to the list above).
esperantisto
Volunteer
Posts: 578
Joined: Mon Oct 08, 2007 1:31 am

Re: Thesaurus with åäö not working

Post by esperantisto »

eyerouge wrote: So either I'm confused or you mistakenly mixed up Latin-10 with ISO 8859-10.
I meant ISO 8859-10. I didn't realize it's not synonymous with Latin-10 :oops:
I also don't follow whether you actually got it to work: did you ever get it to find all three of åland, öland and kresi? Was it just an encoding mismatch in the files? If so:

1) Exactly which file did you re-save?
2) As what encoding, in ISO terms? (See http://en.wikipedia.org/wiki/ISO/IEC_8859 for the list.)
3) What did you put on the first line of each file, where the ISO charset name goes?
4) In what program did you save it? My gedit won't let me save in anything other than UTF-8 (which isn't supported by OpenOffice.org, according to the list above).
Yes, I managed to get thesaurus suggestions for all three words. I loaded th_sv_SE_new.idx as UTF-8 and re-saved it as ISO 8859-10; “ISO8859-10” was already on the first line, so I did not change it. I did not touch th_sv_SE_new.dat. I used UniRed, as I'm on Windows (I use Linux as well, but I did not try your thesaurus there). If you have trouble saving in different encodings, try jEdit http://www.jedit.org, it's cool.
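For anyone whose editor (like gedit here) only saves UTF-8, the same re-save can be scripted. A minimal Python sketch, assuming the file really is UTF-8 now and should become ISO 8859-10; resave_bytes and resave_file are hypothetical helper names. Working on raw bytes keeps the line endings exactly as they were, and errors="strict" makes the conversion stop instead of silently dropping a character that has no ISO 8859-10 equivalent.

```python
from pathlib import Path


def resave_bytes(data: bytes, target: str = "iso8859_10") -> bytes:
    """Re-encode UTF-8 bytes into the target charset, keeping line endings as-is."""
    # utf-8-sig also drops a leading byte-order mark if the editor wrote one,
    # so it cannot end up in front of the charset name on line 1.
    text = data.decode("utf-8-sig")
    # strict: raise UnicodeEncodeError rather than losing characters silently
    return text.encode(target, errors="strict")


def resave_file(src: str, dst: str, target: str = "iso8859_10") -> None:
    """Read src as UTF-8 and write it to dst in the target charset."""
    Path(dst).write_bytes(resave_bytes(Path(src).read_bytes(), target))


# Example (illustrative paths):
# resave_file("th_sv_SE_new.idx", "th_sv_SE_new.idx.latin6")
print(resave_bytes("ISO8859-10\nåland|11\n".encode("utf-8")))
```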
esperantisto
Volunteer
Posts: 578
Joined: Mon Oct 08, 2007 1:31 am

Re: Thesaurus with åäö not working

Post by esperantisto »

Oops, as I'm not proficient in Swedish, I overlooked something. With the above procedure (re-saving th_sv_SE_new.idx as ISO8859-10) I did get suggestions for all three words, but now I see that the second suggestion for öland is garbled. It shows up as:

tillhÃĨll fÃķr ÃķlÃĪndningar
.
I discovered that th_sv_SE_new.dat is also in UTF-8! But after re-saving it as ISO8859-10, the suggestions for åland and kresi disappeared. However, now I can read the öland suggestion correctly:

tillhåll för öländningar
(well, I only have to learn Swedish to understand that :lol: )
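The garbled string is exactly what you get when UTF-8 bytes are read as ISO 8859-10: each Swedish letter takes two bytes in UTF-8 (å is C3 A5, for example), and ISO 8859-10 displays those two bytes as two separate characters (Ã and Ĩ). A short Python check that reproduces the garbling and undoes it:

```python
correct = "tillhåll för öländningar"

# Encode as UTF-8 (how the .dat file was saved), decode as ISO 8859-10 (what was declared).
garbled = correct.encode("utf-8").decode("iso8859_10")
print(garbled)  # tillhÃĨll fÃķr ÃķlÃĪndningar

# Reversing the two steps recovers the original text.
repaired = garbled.encode("iso8859_10").decode("utf-8")
print(repaired == correct)  # True
```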

Summing up, there does seem to be an encoding problem. But you still need to keep an eye on your encodings :-)
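One plausible explanation for the disappearing suggestions, assuming the standard MyThes file layout: each .idx line pairs a headword with the byte offset of its entry in the .dat file. Re-encoding the .dat from UTF-8 to ISO 8859-10 shrinks every å/ä/ö from two bytes to one, so the old offsets point at the wrong places, and the index has to be rebuilt from the re-encoded .dat. MyThes includes a Perl script, th_gen_idx.pl, for this; the Python below is a hypothetical re-implementation of the same idea, not that script, and the two-entry .dat is made up for illustration.

```python
def build_index(dat: bytes) -> bytes:
    """Build .idx content from raw .dat bytes (assumed MyThes layout).

    Assumed .dat layout: line 1 = charset name, then for each entry a
    "headword|N" line followed by N meaning lines.
    """
    lines = dat.splitlines(keepends=True)
    header = lines[0].rstrip(b"\r\n")
    offset = len(lines[0])  # byte position of the first entry line
    entries = []
    i = 1
    while i < len(lines):
        line = lines[i]
        if not line.strip():  # tolerate blank lines (e.g. a trailing one)
            offset += len(line)
            i += 1
            continue
        word, _, count = line.rstrip(b"\r\n").partition(b"|")
        entries.append((word, offset))  # offset counts bytes, not characters
        n = int(count)
        block = lines[i:i + 1 + n]      # entry line plus its N meaning lines
        offset += sum(len(b) for b in block)
        i += 1 + n
    # Assumption: lookups do a byte-wise binary search, so sort by raw headword bytes.
    entries.sort(key=lambda e: e[0])
    out = [header, str(len(entries)).encode("ascii")]
    out += [w + b"|" + str(off).encode("ascii") for w, off in entries]
    return b"\n".join(out) + b"\n"


dat_text = "ISO8859-10\nöland|1\n(subst)|tillhåll för öländningar\nkresi|1\n(adj)|x\n"
print(build_index(dat_text.encode("utf-8")))       # offsets for the UTF-8 file
print(build_index(dat_text.encode("iso8859_10")))  # kresi's offset shifts after re-encoding
```

The same entries end up at different byte offsets in the two encodings, which is why re-saving the .dat alone is not enough.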