Creating and mass populating Custom Dictionary

Paddy
Posts: 1
Joined: Sun Aug 15, 2010 9:57 pm

Creating and mass populating Custom Dictionary

Post by Paddy »

I want to create a botanical dictionary and mass populate it. I have read Bruce Byfield's article, but his description of what the dictionary file looks like does not seem to apply to OOo 3.2 Writer. I have tried creating a custom dictionary in Writer, but with this method one can only add one word at a time. I also tried creating the dictionary in Writer and then editing it in a text editor, but after I did so, Writer no longer sees the file. What am I doing wrong? Or is there no way to do what I require?
Openoffice 3.2 on Windows XP
franx
Volunteer
Posts: 540
Joined: Wed Nov 12, 2008 9:25 pm
Location: FRA 'n' QXB

Re: Creating and mass populating Custom Dictionary

Post by franx »

If you want to create an extension dictionary (.oxt) for OOo:

I've attached a short example dict-en_US_private.oxt [renamed to .zip],
made from the word list in the attached 1_word_collection.odt

(1)
Collect the words in Writer (see sample 1_word_collection.odt) or directly in a text editor [UTF-8].
(One word per "paragraph", no spaces.)
Sort the words alphanumerically (and copy them to a text editor [UTF-8]).
Insert the number of words in the first line (see en_US_private.dic) and save as *.dic.
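
Step (1) can also be scripted. The following is only a minimal sketch in Python, not part of the attached sample: the function names `build_dic` and `write_dic` are made up for illustration, and the sort is Python's default code-point order, which may differ from what Writer's sort produces. It assumes the word list is already available as plain strings.

```python
def build_dic(words):
    """Return the text of a .dic file: word count on line 1, then one word per line."""
    # Drop blank entries and duplicates, then sort (default code-point order).
    cleaned = sorted({w.strip() for w in words if w.strip()})
    for word in cleaned:
        # Step (1) asks for one word per line with no spaces.
        if any(ch.isspace() for ch in word):
            raise ValueError(f"entry contains whitespace: {word!r}")
    return "\n".join([str(len(cleaned))] + cleaned) + "\n"


def write_dic(words, path):
    """Write the word list to `path` as UTF-8 (hypothetical helper)."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(build_dic(words))
```

For example, `write_dic(["Quercus", "Acer"], "en_US_private.dic")` would write the count `2` followed by the two sorted words. If the copied .aff file declares a different encoding in its SET line, the .dic encoding presumably has to match it; that is worth checking before installing.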

(2)
I've created a folder dict-en_US_private for all the necessary files and added en_US_private.dic.
Then add an *.aff file.
I've copied en_US.aff from the English extension dictionaries and renamed it to en_US_private.aff.

(3)
Create dictionaries.xcu and customize the sample to your file names (see unzipped dict-en_US_private.oxt).


<?xml version="1.0" encoding="UTF-8"?>
<oor:component-data xmlns:oor="http://openoffice.org/2001/registry" xmlns:xs="http://www.w3.org/2001/XMLSchema" oor:name="Linguistic" oor:package="org.openoffice.Office">	
 <node oor:name="ServiceManager">
    <node oor:name="Dictionaries">
        <node oor:name="HunSpellDic_en-US_private" oor:op="fuse">
            <prop oor:name="Locations" oor:type="oor:string-list">
                <value>%origin%/en_US_private.aff %origin%/en_US_private.dic</value>
            </prop>
            <prop oor:name="Format" oor:type="xs:string">
                <value>DICT_SPELL</value>
            </prop>
            <prop oor:name="Locales" oor:type="oor:string-list">
                <value>en-US</value>
            </prop>
        </node>
    </node>
 </node>
</oor:component-data>

Create description.xml and customize the sample to your file names and date.


<?xml version="1.0" encoding="UTF-8"?>
<description xmlns="http://openoffice.org/extensions/description/2006" xmlns:d="http://openoffice.org/extensions/description/2006"  xmlns:xlink="http://www.w3.org/1999/xlink">
    <version value="2010.08.16" />
    <identifier value="en_US_private" />
    <display-name>
        <name lang="en">en_US_private spelling dictionary</name>
    </display-name>
    <platform value="all" />
    <dependencies>
        <OpenOffice.org-minimal-version value="3.0" d:name="OpenOffice.org 3.0" />
    </dependencies>
</description>

Copy the subfolder META-INF with manifest.xml.


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE manifest:manifest PUBLIC "-//OpenOffice.org//DTD Manifest 1.0//EN" "Manifest.dtd">
<manifest:manifest xmlns:manifest="http://openoffice.org/2001/manifest">
    <manifest:file-entry manifest:media-type="application/vnd.sun.star.configuration-data" 
        manifest:full-path="dictionaries.xcu"/>
</manifest:manifest>

Add license.txt and README (see sample).

(4)
In a final step, I've created an archive file dict-en_US_private.zip with:
en_US_private.dic
en_US_private.aff
dictionaries.xcu
description.xml
license.txt
README_extension_owner.txt
manifest.xml [in Folder META-INF]

Then renamed to dict-en_US_private.oxt.
Installed and tested with OOo 3.2.1 / OOo 3.3 beta (on WinXP).
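
The packaging in step (4) can be sketched with Python's standard `zipfile` module instead of zipping and renaming by hand. This is an illustration under assumptions, not the way the attached sample was built: `build_oxt` and `EXTENSION_FILES` are hypothetical names, and the file list is simply the one given above.

```python
import zipfile
from pathlib import Path

# File names as listed in step (4); manifest.xml sits in the META-INF folder.
EXTENSION_FILES = [
    "en_US_private.dic",
    "en_US_private.aff",
    "dictionaries.xcu",
    "description.xml",
    "license.txt",
    "README_extension_owner.txt",
    "META-INF/manifest.xml",
]


def build_oxt(source_dir, oxt_path, members=EXTENSION_FILES):
    """Zip the listed files from source_dir into oxt_path (hypothetical helper)."""
    source_dir = Path(source_dir)
    # Fail early if something from the list is missing in the folder.
    missing = [m for m in members if not (source_dir / m).is_file()]
    if missing:
        raise FileNotFoundError("missing extension files: " + ", ".join(missing))
    with zipfile.ZipFile(oxt_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for member in members:
            # arcname keeps the paths relative to the archive root.
            archive.write(source_dir / member, arcname=member)
    return oxt_path
```

Calling `build_oxt("dict-en_US_private", "dict-en_US_private.oxt")` writes the archive directly under the .oxt name, so the rename step is not needed.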

Play with the extension (but backup your user profile) ... ;)
See also → Extension Dictionaries
<http://wiki.services.openoffice.org/wik ... ctionaries>
Attachments
dict-en_US_private.zip
(22.66 KiB) Downloaded 532 times
1_word_collection.odt
(16.03 KiB) Downloaded 441 times
LibreOffice 4.0.4 · WinXP
franx
Volunteer
Posts: 540
Joined: Wed Nov 12, 2008 9:25 pm
Location: FRA 'n' QXB

Re: Creating and mass populating Custom Dictionary

Post by franx »

... alternative options with the user-defined dictionaries →

(1) OOo Dictionary Importer/Exporter →
Re: What is the Encoding of User-Defined Dictionaries?

(2) OOo 3.3 –
From → Feature Freeze Testing 3.3 – Component : Word Processing (Writer)
[...]
106032 : Change of file format for new created user-dictionaries
Description : The file-format of dictionaries created by the user now defaults to a flat UTF-8 text file. Thus the content can easily be viewed in regular editors. However be careful when editing it, see issue 106032 for details.
Feature Announcement : http://sw.openoffice.org/servlets/ReadM ... &msgNo=324
[...]

Sample: OOo-dev 3.3 (user-defined dictionary, created/opened/edited with a text editor)


OOoUserDict1
lang: <none>
type: positive
---
Eĥoŝanĝo
Příliš
bávččas
coṇcoṇ
daño
gør
jagħmilli
kwik
rănește
szkło
tägelîch
tükörfúrógép
yishą́ągo
ægithales
čuovžža
ē-tàng
можу
мшистым
Ὀδυσσέα
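
A file in this flat format could also be generated from a word list. The following Python sketch only mirrors the four header lines visible in the sample above; whether other `lang:` or `type:` values are accepted, and which line endings OOo expects, are assumptions I have not verified (see issue 106032 before editing real dictionaries). `build_user_dict` is a made-up name.

```python
def build_user_dict(words, lang="<none>", dict_type="positive"):
    """Return the text of a flat user dictionary, header copied from the sample."""
    # Header lines as they appear in the OOo-dev 3.3 sample.
    header = ["OOoUserDict1", f"lang: {lang}", f"type: {dict_type}", "---"]
    # One entry per line, blanks and duplicates dropped, code-point order.
    entries = sorted({w.strip() for w in words if w.strip()})
    return "\n".join(header + entries) + "\n"


def write_user_dict(words, path):
    """Save as UTF-8, matching the announced default format (hypothetical helper)."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(build_user_dict(words))
```

Back up your user profile first and test with a copy of the dictionary.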
LibreOffice 4.0.4 · WinXP
lmselby
Posts: 4
Joined: Wed Apr 17, 2013 6:21 am

Re: Creating and mass populating Custom Dictionary

Post by lmselby »

Thank you very much, franx. I was trying to bring in my entries in Haitian Creole to a custom dictionary that I created in Writer and when I would "save as" the .dic file in UTF-8 and then try to edit it in Writer, all my accented vowels became weird symbols.

I do not know how to program, but followed all the instructions on your post Mon Aug 16, 2010 11:17 am. Once I created an archive folder (.zip) and renamed it with the suffix .oxt, I followed the instructions within LibreOffice in Tools>Extension Manager>Add and I was able to select the oxt folder and install the dictionary. I did not make any changes in the name of the OpenOffice versions used. Please advise me if I have to make changes according to the LibreOffice 4.0 version installed on my computer or reconfigure the extension according to new standards. Thanks in advance.
LibreOffice 4.0 on Windows 7 Home Premium SP1 64 bit
stereo
Posts: 2
Joined: Thu Feb 27, 2014 4:33 pm

Re: Creating and mass populating Custom Dictionary

Post by stereo »

I have text files which contain many proper names and special words which are not in the dictionary. I would like to export them to a separate text file as a list in order to work on them, without selecting them one by one. How can I do that?
OpenOffice 4.0.0 on Windows 8
Hagar Delest
Moderator
Posts: 30420
Joined: Sun Oct 07, 2007 9:07 pm
Location: France

Re: Creating and mass populating Custom Dictionary

Post by Hagar Delest »

It depends on what the file looks like.
Can you upload an excerpt of it?
Is it really the same problem as above?
LibreOffice 7.2.6 on Xubuntu 22.04 and 7.2.6 portable on Windows 10
Hagar Delest
Moderator
Posts: 30420
Joined: Sun Oct 07, 2007 9:07 pm
Location: France

Re: Creating and mass populating Custom Dictionary

Post by Hagar Delest »

stereo wrote: Thank you very much for your reply and for looking at my problem and question. Please find attached an excerpt file. It's one page in German with dialect words and proper names. I have many pages (400 pp.) and many more to come. I've found out that it is possible to create your own dictionary within OO, but that demands checking each word step by step. Furthermore, each local dialect differs from the others in spelling. So I am interested in glossaries and dictionaries of different dialects, too. A list can be compared to other word lists. I think that an export of the words in a text that are unknown to a standard dictionary, if offered within OO for any language, could be useful for people who work on language problems (translators, dialectologists, terminologists, historians and toponymists searching for proper names, people learning a foreign language or those who teach them, etc.). It looks to me like the problem above; maybe mine is more general. I saved my excerpt file in doc 97/2000/XP.
I quote your PM so that the discussion benefits other users (I don't reply to such questions by mail or PM anyway).

If your words are in italics (that's what I guess from your attachment), you can select them all at once then paste them in a text file. But you'll need to add a delimiting character first. So you need to search for " -" (space and dash) then Find all and set the selection to italics.
Then search for format italics (in the More options panel). Then Find all, paste them in a text file and use find and replace again to replace the space and dash by a paragraph break (\n with regular expressions set ON). You should get your words list.
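
The last replace step (turning the delimiter into paragraph breaks) can also be done outside Writer once the selection has been pasted into a text file. This is a minimal Python sketch, assuming the pasted words are separated by a space followed by some kind of dash; the set of dash characters and the name `split_words` are assumptions for illustration, not something Writer produces by itself.

```python
import re

# Whitespace followed by a hyphen, en dash (U+2013) or em dash (U+2014).
# Hyphens inside a word (no space before them) are left alone.
DELIMITER = re.compile(r"\s+[-\u2013\u2014]\s*")


def split_words(pasted):
    """Split pasted text on the space-and-dash delimiter, one entry per item."""
    parts = DELIMITER.split(pasted)
    # Drop the empty pieces left by a leading or trailing delimiter.
    return [part.strip() for part in parts if part.strip()]


def to_word_list(pasted):
    """Return the entries as text with one word per line."""
    return "\n".join(split_words(pasted)) + "\n"
```

Tolerating more than one dash form in the pattern is meant to cope with texts where the typist mixed several of them.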
LibreOffice 7.2.6 on Xubuntu 22.04 and 7.2.6 portable on Windows 10
stereo
Posts: 2
Joined: Thu Feb 27, 2014 4:33 pm

Re: Creating and mass populating Custom Dictionary

Post by stereo »

Thanks very much Hagar for your help, I will try the proposed solution immediately. In this case/text file the use of italics is the exception (maybe some of the words in question are not in italics), but given the number of pages, it is just fine for the time being. I would appreciate it if you could remove the attachment, because the text is already printed in a book and I want to avoid copyright issues. I will try to upload another test text file later, which I have to produce first. Thanks for your understanding and help.

I am just trying. What a genius idea! I did not know about this option in 'find and replace'. I just have problems with the 'space dash' in italics, that is, with setting the delimiter. OO crashed twice! One problem is due to the irregular forms of spaces and dashes, but there are also others; I am trying to find a solution.

As promised, I am uploading a demo text, which is copyright free and which contains words in italics standing in for words unknown to the dictionary. The extraction of words in italics works, I just struggle with the delimiter problem. The copied and pasted words in italics run together in the target document. I tried different approaches, e.g. using § as a delimiter, but the fact that the author (typist) used bold, italics and different dashes makes it difficult. There are many further obstacles which I have not identified yet. Anyhow, as I mentioned before, it's a solution for words in italics or a similar format; it does not work if the text is all in one format. I tried with attributes, but I did not find a solution that way. Anyhow, thank you so far.
Attachments
The Tragedy of Hamlet, Prince of Denmark_p1.doc
demo text file
(20.5 KiB) Downloaded 325 times
OpenOffice 4.0.0 on Windows 8