
how to create a *.syn file

PostPosted: Tue Sep 04, 2012 4:50 pm
by Vanilme
Hi all.
I'd like to use both a monolingual Italian dictionary and a bilingual Italian -> French dictionary on an e-book reader.
I already have both dictionaries in StarDict format.

The problem is that neither dictionary handles morphology (inflected forms).
For example, they can't find "trattenne", which is a form of the verb "trattenere". Ideally, looking up "trattenne" in an Italian dictionary should lead to "trattenere".

However, I found an Italian-English dictionary (StarDict format) that does handle morphology.
My idea was to take the morphology data from the Italian-English dictionary and add it to both the monolingual Italian dictionary and the bilingual Italian -> French dictionary.
It seems that the *.syn file is what handles morphology.
I tried building a dictionary by adding the Italian-English *.syn file to the monolingual Italian dictionary files and adding the line "synwordcount=420349" to the *.ifo file. It failed: when I search for a word, I get a completely different one!

How can I add morphological data (*.syn) to my two dictionaries? Can I build a *.syn file with the StarDict tools and ispell/aspell?

Re: how to create a *.syn file

PostPosted: Tue Sep 04, 2012 5:17 pm
by Tvangeste
You need to install the morphology dictionaries (in myspell/hunspell format); they typically come with GoldenDict in the "morphology" folder.

You can download them from here: http://sourceforge.net/projects/goldend ... ogies/1.0/

(Don't forget to unzip them).

Once you unzip and install them, GoldenDict will recognize them; then add the appropriate morphology dictionary to the shelf for the specific language.

Re: how to create a *.syn file

PostPosted: Tue Sep 04, 2012 5:40 pm
by Vanilme
@Tvangeste: there is a misunderstanding. I already use the GoldenDict morphology dictionaries on my desktop (KDE). They are great. But they are not recognized by e-book readers (Onyx Boox, Sony PRS, Bebook, Cybook, Kobo...). That's why I'm trying to put the morphological data directly into the StarDict dictionary, as the Italian -> English Babylon dictionary does.

Re: how to create a *.syn file

PostPosted: Tue Sep 04, 2012 5:45 pm
by Vanilme
It should be possible to create a *.syn file from a GoldenDict morphology dictionary, but I don't know how.

Re: how to create a *.syn file

PostPosted: Tue Sep 04, 2012 5:57 pm
by Vanilme
http://code.google.com/p/babiloo/wiki/StarDict_format
This file [".syn"] is optional, and you should notice tree dictionary needn't this file.
Only StarDict-2.4.8 and newer support this file.

The .syn file contains information for synonyms, that means, when you input a
synonym, StarDict will search another word that related to it.

The format is simple. Each item contain one string and a number.
synonym_word; // a utf-8 string terminated by '\0'.
original_word_index; // original word's index in .idx file.
Then other items without separation.
When you input synonym_word, StarDict will search original_word;

The length of "synonym_word" should be less than 256. In other
words, (strlen(word) < 256).
original_word_index is a 32-bits unsigned number in network byte order.
Two or more items may have the same "synonym_word" with different
original_word_index.
The items must be sorted by stardict_strcmp() with synonym_word.

I don't know exactly how to build it.
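
One way to build such a file from the description above might look like the following Python sketch. It is only an illustration under assumptions: build_syn and stardict_sort_key are hypothetical names, and the sort key assumes that stardict_strcmp() compares ASCII case-insensitively first and falls back to a plain byte comparison on ties. The number of items written would also have to go into the "synwordcount=" line of the *.ifo file.

```python
import struct


def stardict_sort_key(word: bytes):
    # Assumption: stardict_strcmp() compares ASCII case-insensitively,
    # then uses a plain byte comparison to break ties.
    # bytes.lower() only touches ASCII letters, which matches that idea.
    return (word.lower(), word)


def build_syn(pairs):
    """Serialize (synonym_word, original_word_index) pairs as .syn bytes."""
    items = []
    for synonym, index in pairs:
        raw = synonym.encode("utf-8")
        if len(raw) >= 256:
            raise ValueError(f"synonym too long: {synonym!r}")
        if b"\x00" in raw:
            raise ValueError(f"synonym contains a NUL byte: {synonym!r}")
        items.append((raw, index))

    # Items must be sorted by synonym_word; the index is a secondary key
    # so that duplicates of the same synonym come out in a stable order.
    items.sort(key=lambda item: (stardict_sort_key(item[0]), item[1]))

    out = bytearray()
    for raw, index in items:
        out += raw                      # UTF-8 string
        out += b"\x00"                  # '\0' terminator
        out += struct.pack(">I", index)  # 32-bit unsigned, network byte order
    return bytes(out)


if __name__ == "__main__":
    data = build_syn([("alcooliche", 2814), ("alcoolica", 2814)])
    print(data)
```

The items follow each other without any separator, exactly as the description says.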

Re: how to create a *.syn file

PostPosted: Wed Sep 05, 2012 4:56 pm
by Vanilme
Here is an excerpt from a *.syn file:
"\xfaalcoolica\x00\x00\x00\n\xfealcooliche\x00\x00\x00\n\xfealcoolici\x00\x00\x00\n\xfealcoolismi\x00\x00\x00\n\xffalcooliste\x00\x00\x00\x0b\x00alcoolisti\x00\x00\x00\x0b\x00alcoolizza\x00\x00\x00\x0b\x01alcoolizzai\x00\x00\x00\x0b\x01alcoolizzammo\x00\x00\x00\x0b\x01alcoolizzando\x00\x00\x00\x0b\x01alcoolizzano\x00\x00\x00\x0b\x01alcoolizzante\x00\x00\x00\x0b\x01alcoolizzanti\x00\x00\x00\x0b\x01alcoolizzarono\x00\x00\x00\x0b\x01alcoolizzasse\x00\x00\x00\x0b\x01alcoolizzassero\x00\x00\x00\x0b\x01alcoolizzassi\x00\x00\x00\x0b\x01alcoolizzassimo\x00\x00\x00\x0b\x01alcoolizzaste\x00\x00\x00\x0b\x01alcoolizzasti\x00\x00\x00\x0b\x01alcoolizzata\x00\x00\x00\x0b\x01alcoolizzata\x00\x00\x00\x0b\x02alcoolizzate\x00\x00\x00\x0b\x01alcoolizzate\x00\x00\x00\x0b\x02alcoolizzati\x00\x00\x00\x0b\x01alcoolizzati\x00\x00\x00\x0b\x02alcoolizzato\x00\x00\x00\x0b\x01alcoolizzava\x00\x00\x00\x0b\x01alcoolizzavamo\x00\x00\x00\x0b\x01alcoolizzavano\x00\x00\x00\x0b\x01alcoolizzavate\x00\x00\x00\x0b\x01alcoolizzavi\x00\x00\x00\x0b\x01alcoolizzavo\x00\x00\x00\x0b\x01alcoolizzerai\x00\x00\x00\x0b\x01alcoolizzeranno\x00\x00\x00\x0b\x01alcoolizzerebbe\x00\x00\x00\x0b\x01alcoolizzerebbero\x00\x00\x00\x0b\x01alcoolizzerei\x00\x00\x00\x0b\x01alcoolizzeremmo\x00\x00\x00\x0b\x01alcoolizzeremo\x00\x00\x00\x0b\x01alcoolizzereste\x00\x00\x00\x0b\x01alcoolizzeresti\x00\x00\x00\x0b\x01alcoolizzerete\x00\x00\x00\x0b\x01alcoolizzer\xc3\xa0\x00\x00\x00\x0b\x01alcoolizzer\xc3\xb2\x00\x00\x00\x0b\x01alcoolizzi\x00\x00\x00\x0b\x01alcoolizziamo\x00\x00\x00\x0b\x01alcoolizziate\x00\x00\x00\x0b\x01alcoolizzino\x00\x00\x00\x0b\x01alcoolizzo\x00\x00\x00\x0b\x01alcoolizz\xc3\xb2\x00\x00\x00\x0b\x01alcove\x00\x00\x00\x0b\x04alcun\x00\x00\x00\x0b\talcun\x00\x00\x00\x0b\nalcun'\x00\x00\x00\x0b\talcun'\x00\x00\x00\x0b\n"

The syntax is:
SYNONYM_WORD_1\x00original_word_index_1SYNONYM_WORD_2\x00original_word_index_2SYNONYM_WORD_3\x00original_word_index_3...

For example:
alcoolica\x00\x00\x00\n\xfealcooliche\x00\x00\x00\n\xfealcoolici\x00\x00\x00\n\xfe

... with SYNONYM_WORD_1 = alcoolica, then the \x00 terminator, then original_word_index_1 = \x00\x00\n\xfe...

Consequently, you can first create a dictionary from a tab file (⇒ *.dict.dz, *.idx, *.ifo) and then a *.syn file that uses data from the *.idx.
"\x00\x00\n\xfe" is not a C address or pointer: per the format description quoted earlier, it is a 32-bit unsigned number in network byte order (here 0x00000AFE = 2814), i.e. the index of the original word in the *.idx file.
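
To check this reading of the bytes, a small Python sketch can decode a fragment of the excerpt. parse_syn is a hypothetical helper written for this post, not part of any StarDict tool; it assumes each item is a UTF-8 string, a '\0' terminator, and a 4-byte big-endian index, as in the format description.

```python
import struct


def parse_syn(data: bytes):
    """Decode .syn bytes into a list of (synonym_word, original_word_index)."""
    entries = []
    pos = 0
    while pos < len(data):
        end = data.index(b"\x00", pos)           # find the '\0' terminator
        word = data[pos:end].decode("utf-8")
        (index,) = struct.unpack_from(">I", data, end + 1)  # big-endian uint32
        entries.append((word, index))
        pos = end + 1 + 4                        # skip terminator and index
    return entries


if __name__ == "__main__":
    # Three items copied from the excerpt (the first one without the stray
    # leading byte that belongs to the previous item).
    sample = (
        b"alcoolica\x00\x00\x00\n\xfe"
        b"alcooliche\x00\x00\x00\n\xfe"
        b"alcoolizzer\xc3\xa0\x00\x00\x00\x0b\x01"
    )
    for word, index in parse_syn(sample):
        print(word, index)
```

On this fragment the indices come out as 2814, 2814 and 2817, so several inflected forms point at the same original word.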

Here is an excerpt from the *.idx file:
alcool\x00\x00\x02\xf5k\x00\x00\x00,alcoolico\x00\x00\x02\xf5\x97\x00\x00\x004alcoolismo\x00\x00\x02\xf5\xcb\x00\x00\x00=alcoolista\x00\x00\x02\xf6\x08\x00\x00\x007alcoolizzare\x00\x00\x02\xf6?\x00\x00\x00ealcoolizzato\x00\x00\x02\xf6\xa4\x00\x00\x00=alcooltest\x00\x00\x02\xf6\xe1\x00\x00\x00\x0c
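
The *.idx excerpt can be decoded in a similar way, and that probably explains why copying a *.syn file from another dictionary gives wrong results: the indices are positions in that other dictionary's *.idx. The Python sketch below is an assumption-laden illustration. parse_idx and remap_syn are hypothetical helpers; parse_idx assumes each *.idx item is a UTF-8 headword, a '\0' terminator, a 32-bit offset and a 32-bit size in network byte order (dictionaries with 64-bit offsets would need a different layout); remap_syn assumes the index is the 0-based position of the headword in the *.idx.

```python
import struct


def parse_idx(data: bytes):
    """Decode .idx bytes into [(headword, offset, size), ...] in file order."""
    entries = []
    pos = 0
    while pos < len(data):
        end = data.index(b"\x00", pos)           # '\0' after the headword
        word = data[pos:end].decode("utf-8")
        offset, size = struct.unpack_from(">II", data, end + 1)
        entries.append((word, offset, size))
        pos = end + 1 + 8                        # terminator + two uint32
    return entries


def remap_syn(syn_entries, source_idx, target_idx):
    """Re-point (synonym, index) pairs from one dictionary's .idx to another's.

    Synonyms whose headword does not exist in the target are dropped.
    """
    source_words = [word for word, _, _ in source_idx]
    target_pos = {}
    for i, (word, _, _) in enumerate(target_idx):
        target_pos.setdefault(word, i)           # keep the first occurrence

    remapped = []
    for synonym, src_index in syn_entries:
        headword = source_words[src_index]
        if headword in target_pos:
            remapped.append((synonym, target_pos[headword]))
    return remapped


if __name__ == "__main__":
    sample = (
        b"alcool\x00\x00\x02\xf5k\x00\x00\x00,"
        b"alcoolico\x00\x00\x02\xf5\x97\x00\x00\x004"
    )
    print(parse_idx(sample))
```

The remapped pairs would then have to be sorted and written out again as a new *.syn file, with the new item count in "synwordcount=".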


I'm still looking for a solution...

Re: how to create a *.syn file

PostPosted: Wed Sep 05, 2012 6:02 pm
by Vanilme
stardict-index inspects StarDict index (.idx) files, synonyms (.syn) files, and resource database index (.ridx) files.
But I'm using Debian Squeeze, and "stardict-index" isn't available for that release yet (it exists on Wheezy).