[Solved] Runtime error when importing custom Word dictionary

gerald682
Posts: 5
Joined: Fri Nov 12, 2010 5:52 am

Post by gerald682 »

I have a problem using the macro ImportExportDictionary1-1.sxw to import a Word 2003 CustomWord.dic in Unicode into an OO3.2.1 New User Dictionary called CustomOOo.

The macro runs through a few hundred words and then gives a "BASIC runtime error, Reading exceeds EOF."
OpenOffice_Import_RunTimeError.jpg
I then cannot stop the macro, which means I cannot close the file either, because BASIC is still running. Ctrl+Alt+Del gave a "programme not responding" and let me out. After recovering the .odt files I saw that nothing had been added to the CustomOOo dictionary.

In this forum, the thread “Importing User Dictionary from Microsoft Word” by Avraham in July described the same problem, but in that case the words were imported despite the error message, while here the words are not imported.

I have near zero programming knowledge and I would appreciate assistance to get the macro to work.
Last edited by gerald682 on Mon Nov 15, 2010 8:39 pm, edited 1 time in total.
Open Office 3.2.1 Windows XP Pro v5.1 SP3
franx
Volunteer
Posts: 540
Joined: Wed Nov 12, 2008 9:25 pm
Location: FRA 'n' QXB

Re: Runtime error when importing custom Word dictionary

Post by franx »

Hi,

as far as I can see, there seems to be a limit of words for the macro.
 Edit: ...but it's presumably another problem...
Now I could easily import a *.dic (text file, more than 2200 words/lines)
with that macro.
import_dic_2264.png
 
An easy alternative solution:
Use the "OOoUserDict1" format instead of the Macro and the "WBSWG6" format.

(My test with a MS Word 2000 CUSTOM.DIC [> 250 words] works well:
- convert CUSTOM.DIC to UTF-8,
- add the encoding lines [1]
- rename the dictionary, and paste it in OOo 3.2.1 user/wordbook)

[1] e.g.:

Code:

OOoUserDict1
lang: en-US
type: positive
---
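
The steps above (convert to UTF-8, add the header lines, save under a new name) could be sketched in Python roughly as follows. This is only a sketch under assumptions: the function names are made up for illustration, the Word dictionary is assumed to be a UTF-16 text file with one word per line, and `lang: en-US` is simply the value from the example header.

```python
def build_user_dict(words, lang="en-US"):
    """Return the text of an OOoUserDict1-style dictionary for the given words."""
    # The four header lines are the ones quoted in the post; lang is adjustable.
    header = ["OOoUserDict1", f"lang: {lang}", "type: positive", "---"]
    # Drop empty lines and surrounding whitespace, keep one word per line.
    entries = [w.strip() for w in words if w.strip()]
    return "\n".join(header + entries) + "\n"


def convert_word_dic(src_path, dst_path, src_encoding="utf-16", lang="en-US"):
    """Read a Word custom dictionary and write it as UTF-8 (Python adds no BOM)."""
    # Assumption: the source is UTF-16; pass another codec name if it is not.
    with open(src_path, encoding=src_encoding) as src:
        words = src.read().splitlines()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(build_user_dict(words, lang))


# Hypothetical usage; the file names are placeholders:
# convert_word_dic("CUSTOM.DIC", "CustomOOo.dic")
```

The resulting file would still have to be copied into the user/wordbook folder by hand, as described above.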
See also:
Re: Creating and mass populating Custom Dictionary part (2)

Re: Add button in spell checker doesn't work

EDIT: [2010-08-22]

→ Issue 106032: linguistic: make human-readable user-dicts the default format ?

→ Issue 60698: user-dict format re-work ...
LibreOffice 4.0.4 · WinXP

Post by gerald682 »

Thanks Franx.
I followed your first set of instructions without success; maybe they were too cryptic for a beginner.
Anyway, today I saw your edit, where you got the macro to work for over 2000 entries, which encouraged me to look at my word list as the source of the macro problem.
There were various symbols, like * and [, so I removed them.
And, fantastic, the macro added my 6221 words.
Unfortunately, when I tested it on the word kukupa, which should be kūkupa, it gave an error and no suggestion.
And, when I opened the dictionary it made no sense:
OpenOffice_dictionary_display-error.jpg
I use NZ English and the dictionary was in UTF-8 after editing and saving with NotePad before I ran the macro.
I would appreciate your further advice.
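
For anyone repeating the clean-up of symbols like * and [ mentioned above, a word list can be checked before running the macro. The sketch below is an illustration only: the function names and the set of extra allowed characters are assumptions, and the thread does not establish exactly which symbols trip up the macro.

```python
import unicodedata

# Non-letter characters tolerated inside a word; an assumption for this sketch.
EXTRA_ALLOWED = {"\u2018", "'", "-"}


def suspicious_chars(word):
    """Return the characters in word that are neither letters nor explicitly allowed."""
    return [
        ch
        for ch in word
        if not unicodedata.category(ch).startswith("L") and ch not in EXTRA_ALLOWED
    ]


def split_word_list(words):
    """Split words into (clean, flagged); flagged entries contain suspicious characters."""
    clean, flagged = [], []
    for word in words:
        word = word.strip()
        if not word:
            continue  # skip empty lines
        if suspicious_chars(word):
            flagged.append(word)
        else:
            clean.append(word)
    return clean, flagged
```

Entries containing spaces or digits would also be flagged by this check, so the flagged list is best reviewed by eye rather than deleted automatically.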

Post by franx »

gerald682 wrote: [...] I use NZ English and the dictionary was in UTF-8 after editing and saving with NotePad before I ran the macro.
I would appreciate your further advice.
That looks like "mojibake". ;) [1]
Could you upload the edited "Word 2003 CustomWord.dic", or send it to me via PM?
I would test it with both,
- the macro (WBSWG6 format)
- and the "alternative" in OOoUserDict1 format.

[1] <http://en.wikipedia.org/wiki/Mojibake>

EDIT:
A similar example (I applied the import with the macro twice):

(1) Text file UTF-8
import_utf8.png
(2) Text file "ANSI"
import_ansi.png
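
As a small illustration of what mojibake means here: if UTF-8 bytes are read as if they were "ANSI" text, each macron vowel turns into two unrelated characters. The sketch assumes the ANSI code page is Windows-1252 and that the importing side misreads the file this way, which the thread suggests but does not confirm; the function name is made up.

```python
def misread_utf8_as_ansi(text, ansi_codec="cp1252"):
    """Encode text as UTF-8, then decode the bytes with an ANSI code page."""
    # errors="replace" because a few byte values are undefined in cp1252.
    return text.encode("utf-8").decode(ansi_codec, errors="replace")


print(misread_utf8_as_ansi("kūkupa"))  # prints: kÅ«kupa
```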

Post by gerald682 »

Hi Franx.
I tried to send it via the PM system, but .dic and .txt files are not accepted, and .jpg didn't work because the dimensions could not be determined.
I mentioned I was a beginner!
And trying to send it as an attachment with this message - same problems.
How do I send it?

Post by franx »

Please see my private message...

Post by gerald682 »

Hi Franx
Dictionary file attached. .dic renamed to .zip
Attachments
MaoriSciNamesUTF8.zip
(66.37 KiB)

Post by franx »

Thanks. See you later!

Post by franx »

First try

OOoUserDict1 format

(1)
- Open MaoriSciNamesUTF8.zip with a text editor (Notepad++)
- Add the four lines at the beginning

Code:

OOoUserDict1
lang: <none>
type: positive
---
‘Ake
‘Ake‘ake
‘Aketa
‘Ange
‘Ano
‘Apūka
‘Ara
‘Arapītia
‘Atukura
...
(2)
- Convert to UTF-8 without BOM (= ANSI as UTF-8)
- Save as Maori_1.dic (or anything.dic)
Maori_3.png
(3)
- Paste Maori_1.dic in ...user\wordbook
- Open OOo, then Tools > Options > ... > Writing Aids
- activate/tick Maori_1.dic
Maori_2.png
(4) Try it (and fix the issue ;) )
[Rename Maori_1.zip to Maori_1.dic]
Maori_1.zip
(66.42 KiB)
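
Step (2) above, "UTF-8 without BOM", can also be checked or done outside a text editor. The following is a minimal sketch with made-up function names; it assumes that a BOM, if present, would sit in front of the OOoUserDict1 line, and it does not claim anything about how OOo itself reacts to a BOM.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if there is one."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def starts_with_header(data: bytes) -> bool:
    """True if the very first bytes of the file are the OOoUserDict1 marker."""
    return data.startswith(b"OOoUserDict1")


# Hypothetical usage; "Maori_1.dic" is the file name used in the steps above:
# with open("Maori_1.dic", "rb") as f:
#     raw = f.read()
# with open("Maori_1.dic", "wb") as f:
#     f.write(strip_utf8_bom(raw))
```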

Post by franx »



(worse results--for comparison only)
Macro (WBSWG6 format)

[Removed]
sorry--wrong attachments...

[New]
Original MaoriSciNamesUTF8.zip (UTF-8) → converted into text file (UTF-8 without BOM)
→ imported with macro: Maori_2ub.dic → renamed to Maori_2ub.zip
Maori_2ub.zip
(67.3 KiB)
Original MaoriSciNamesUTF8.zip (UTF-8) → converted into text file (ANSI)
→ imported with macro: Maori_3a.dic → renamed to Maori_3a.zip

Sample: Looks good--but doesn't work...
test_3a.png
Maori_3a.zip
(66.07 KiB)

Post by franx »

Second Try

Maybe a workaround...

Are you sure that you are using the correct character for: ‘ (e.g. in ‘Ake‘ake)?
The problem with these "words" is that this char ‘
is not treated as part of the word.
Maybe it's incorrect--but I've replaced it now with the Unicode char
U+02BB MODIFIER LETTER TURNED COMMA = ʻ
from the Unicode block Spacing Modifier Letters (e.g.: ʻAkeʻake).

You can see--with the help of the red wavy lines from spell check--
that this char is now treated as part of the word.
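
The difference between the two characters can be looked up in the Unicode character database: U+2018 is classed as punctuation, U+02BB as a letter. Whether the spell checker relies on exactly this property is an assumption, but it matches the behaviour described. The helper names below are made up for this sketch.

```python
import unicodedata

LEFT_QUOTE = "\u2018"    # LEFT SINGLE QUOTATION MARK
TURNED_COMMA = "\u02bb"  # MODIFIER LETTER TURNED COMMA


def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch)


def use_turned_comma(text):
    """Replace every U+2018 with U+02BB; only sensible for a plain word list."""
    return text.replace(LEFT_QUOTE, TURNED_COMMA)


print(describe(LEFT_QUOTE))    # ('U+2018', 'LEFT SINGLE QUOTATION MARK', 'Pi')
print(describe(TURNED_COMMA))  # ('U+02BB', 'MODIFIER LETTER TURNED COMMA', 'Lm')
```

Category "Pi" is initial-quote punctuation and "Lm" is a modifier letter. Note that the replacement would also change genuine opening quotation marks, so it should not be run over ordinary text.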

Example 1
workaround1.png
Then I've removed the older version of these words from the dictionary (Maori_1.dic).
Afterwards, I've added the new version (with U+02BB) to the dictionary--
via right-click > context menu.

Example 2
workaround2.png
The improved(?) dictionary (OOoUserDict1 format):
Maori_1b.zip
(66.35 KiB)
Download:
test_playground.odt
<https://docs.google.com/leaf?id=0B0EPDe ... ist&num=50>
Last edited by franx on Mon Nov 15, 2010 6:34 pm, edited 1 time in total.

Post by gerald682 »

Franx, you have done a fantastic job, including the discovery that if the unknown chr (‘) was turned into U+02BB things worked better. Now you have mentioned U+02BB my comments below deal with both your proposed solutions and the U+2018/U+02BB problem in Polynesian languages - maybe this should move to another forum, but I leave that up to you.

I should have realized that this import problem would drag me straight back into the handling of the glottal/hamza in Polynesian languages. The glottal is used in Hawaiian (called ‘okina), Tahitian, Tongan, Samoan and Cook Islands Maori (what I am dealing with), but is not required in NZ Maori (a.k.a. Maori). By the way, I neither speak nor write CK Maori, but I do strive to record the CK Maori names for plants and animals in a modern orthography.

U+2018 (Left Single Quotation Mark) is widely used for glottals because it is in most common fonts and it is more stable than using an ordinary quotation mark, in which case 'ava'ava is smart-quoted into ‘ava’ava. Its main (only?) problem is that custom dictionaries do not accept it as the first letter of a word and they treat words with it inside as two or more words.

U+02BB (Modifier Letter Turned Comma) is the real Unicode glottal and spellers recognize it as a letter rather than a quotation mark. The problem is that U+02BB is in only a couple of little-used fonts with XP. With Vista things improved in that v.5.01 (2006) Arial and Times New Roman both contain U+02BB, although it was still absent from most fonts including the new Calibri etc. And if a file in Vista's Arial with ‘Ava‘ava is copied into an XP system it becomes □Ava□ava.

Thus my Cook Islands Maori dictionary of names still has the glottal as U+2018, although when U+02BB is more generally available I will change. Generally all systems handle the macron vowels: āēīōū ĀĒĪŌŪ (although, surprisingly, your Maori_3a lost them).

Back to your various solutions. As you noticed, Maori_2ub and Maori_3a do not work: the former is a mess and marked everything with a macron or a U+2018 as misspelt, while Maori_3a handled 2018s within words OK, ignored initial 2018s and, as I note above, lost all the macrons.
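
One likely explanation for the lost macrons in Maori_3a, offered here as an assumption rather than something confirmed in the thread: if the "ANSI" conversion used the Windows-1252 code page, the macron vowels simply have no place in it, whereas U+2018 does. A short sketch (the function name is made up) shows which characters of a word a given code page cannot hold.

```python
def not_in_codec(text, codec="cp1252"):
    """Return the distinct characters of text that the codec cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


print(not_in_codec("\u2018Apūka"))       # ['ū']
print(not_in_codec("\u2018Ake\u2018ake"))  # []
```

Under the same assumption, U+02BB would not survive an ANSI conversion either, so a dictionary using it needs to stay in UTF-8.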

Solution Maori_1 (for text using U+2018).
This was the Word .dic converted to UTF-8 without BOM.
Macrons: responds to missing macrons (within words and on initial letter) and lists the correct word, e.g. Upoa lists Ūpua
Within 2018: responds to missing 2018s and lists correct word, e.g. akaoa listed Akao‘a
Initial 2018: marks all words with initial 2018 as incorrect, and for missing initial 2018s such as Ārorangi it correctly lists ‘Ārorangi (which, of course, it continues to mark as incorrect). Not 100% but the suggestion list gives good information to reflect upon and will correct missing initial 2018s.

This is the dic I will be using with Open Office until the day I change to U+02BB.

Solution Maori_1b (for text using U+02BB)
Macrons: as for Maori_1
Within 02BB: as with Maori_1
Initial 02BB: this is where this solution shines. All correct words are shown as correct, and (as for Maori_1) when an initial glottal is missing it lists the correct word.

This solution will be a great help to those working in Polynesia who have already changed to chr U+02BB - and it is certainly a good reason for me to think more seriously about abandoning U+2018 and changing to U+02BB.

Meitaki ma‘ata :D

Post by franx »

Hi Gerald,
thanks for your detailed and helpful feedback and the enlightening explanation.
All the best for your work and the Cook Islands Maori dictionary of names--
rā mānea
:D