What's the best dictionary file format?

Post by onny » Wed Jan 23, 2013 2:58 am

Hi,
I'm coding a Python script which converts Wiktionary dumps into a bilingual dictionary. My question is, what's the best dictionary file format to use in terms of:
- popularity
- standardization
- features
- support in dictionary programs
- open source
I guess the DICT and StarDict formats are the best ones?
Any hints welcome!

Post by hanyl05 » Wed Jan 23, 2013 3:56 am

StarDict supports HTML tags, but the problem is that multimedia files for StarDict are usually stored in a folder named res, and GoldenDict still cannot read zipped res files. This means that if you have a Wiktionary-based dictionary with lots of multimedia files, it will probably load very slowly in GoldenDict.
If you want to use GoldenDict to load the dictionary, the best format is probably Lingvo DSL.

Post by dg333 » Wed Jan 23, 2013 6:21 am

Yes, for dumping content into a plain bilingual dictionary, DSL is probably the best choice. However, it's quite poor in features: all of them are aimed at a nice look rather than at a good dictionary structure. XDXF (an XML-based format) is definitely much better in that respect, but it's not very well supported by dictionary programs.
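To illustrate the structural side, here is a rough Python sketch that emits a tiny XDXF document with the standard `xml.etree.ElementTree` module. The element names (`xdxf`, `full_name`, `ar`, `k`) and the `lang_from`/`lang_to`/`format="visual"` attributes follow my recollection of the original XDXF format, and the three-letter language codes are assumed; check the XDXF specification before relying on it. `build_xdxf` is a hypothetical helper, not part of any existing tool.

```python
import xml.etree.ElementTree as ET


def build_xdxf(lang_from: str, lang_to: str, full_name: str,
               entries: list[tuple[str, list[str]]]) -> str:
    """Return a minimal XDXF document as a string (a sketch, not the full spec)."""
    root = ET.Element("xdxf", lang_from=lang_from, lang_to=lang_to,
                      format="visual")
    ET.SubElement(root, "full_name").text = full_name
    for headword, translations in entries:
        article = ET.SubElement(root, "ar")   # one <ar> element per article
        key = ET.SubElement(article, "k")     # <k> holds the headword
        key.text = headword
        # The translation text follows the key inside <ar>; ElementTree
        # stores text that comes after a child element in the child's .tail.
        key.tail = "\n" + "; ".join(translations)
    return ET.tostring(root, encoding="unicode")


print(build_xdxf("DEU", "ENG", "German-English", [("Haus", ["house", "home"])]))
```

Using a real XML library means characters such as `&` in headwords are escaped automatically, which is one practical advantage of an XML format over hand-written tag formats.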

Post by Tvangeste » Wed Jan 23, 2013 8:13 am

@onny, the Lingvo DSL format is well defined (see the DSL Format Help) and very easy to work with: it's just plain Unicode text with some tags. Given the huge number of dictionaries available in DSL format, there are converters from DSL to various other formats, for example from DSL to StarDict.
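A minimal sketch of writing DSL from Python, assuming the usual layout: `#NAME`, `#INDEX_LANGUAGE` and `#CONTENTS_LANGUAGE` headers, headwords starting at column 0, indented article lines using `[m1]` and `[trn]` tags, and a UTF-16LE file with a BOM as Lingvo traditionally expects. The helper names and the exact set of escaped characters are assumptions; the DSL Format Help is the authority.

```python
import codecs

# Characters that DSL uses for its own markup. Escaping them with a backslash
# is the usual convention, but treat this exact set as an assumption.
DSL_SPECIAL = "\\[]{}~@"


def dsl_escape(text: str) -> str:
    return "".join("\\" + ch if ch in DSL_SPECIAL else ch for ch in text)


def build_dsl(name: str, index_lang: str, contents_lang: str,
              entries: list[tuple[str, list[str]]]) -> str:
    lines = [
        f'#NAME "{name}"',
        f'#INDEX_LANGUAGE "{index_lang}"',
        f'#CONTENTS_LANGUAGE "{contents_lang}"',
        "",
    ]
    for headword, translations in entries:
        lines.append(dsl_escape(headword))  # headword line: no indentation
        for n, tr in enumerate(translations, start=1):
            # Article lines are indented; [m1]...[/m] sets a left margin and
            # [trn]...[/trn] marks the translation zone.
            lines.append(f"\t[m1]{n}) [trn]{dsl_escape(tr)}[/trn][/m]")
        lines.append("")  # blank line between cards, for readability
    return "\n".join(lines)


def encode_dsl(text: str) -> bytes:
    # ABBYY Lingvo traditionally expects UTF-16LE with a byte-order mark.
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


print(build_dsl("De-En", "German", "English", [("Haus", ["house", "home"])]))
```

To produce the actual file, write `encode_dsl(build_dsl(...))` to a `.dsl` file opened in binary mode.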

The only serious drawback of the DSL format is that it doesn't support proper tables. If that's OK and you don't need tables, I'd suggest looking at DSL. If you need more HTML-like features, then StarDict with HTML or XDXF content would be a better candidate.
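For comparison, a StarDict dictionary with HTML content consists of three files: a plain-text `.ifo`, a sorted binary `.idx` (UTF-8 headword, a NUL byte, then the 32-bit big-endian offset and size of the article), and the raw article data in `.dict`. The sketch below builds them in memory; the sort order mimics StarDict's ASCII-case-insensitive comparison with a byte-wise tie-break, and `sametypesequence=h` declares every article as HTML. It is based on my reading of the StarDict file format documentation, and `build_stardict` is a hypothetical helper.

```python
import struct


def _stardict_key(word: str) -> tuple[bytes, bytes]:
    # Mimics StarDict's ordering: ASCII-case-insensitive comparison first,
    # then plain byte comparison as a tie-break (bytes.lower() folds ASCII only).
    raw = word.encode("utf-8")
    return raw.lower(), raw


def build_stardict(bookname: str,
                   articles: dict[str, str]) -> tuple[bytes, bytes, bytes]:
    """Return (ifo, idx, dict) file contents for HTML articles keyed by headword."""
    idx = bytearray()
    body = bytearray()
    for word in sorted(articles, key=_stardict_key):
        data = articles[word].encode("utf-8")
        # .idx entry: NUL-terminated UTF-8 headword, then the 32-bit
        # big-endian offset and size of the article inside the .dict file.
        idx += word.encode("utf-8") + b"\0"
        idx += struct.pack(">II", len(body), len(data))
        body += data
    ifo = (
        "StarDict's dict ifo file\n"
        "version=2.4.2\n"
        f"wordcount={len(articles)}\n"
        f"idxfilesize={len(idx)}\n"
        f"bookname={bookname}\n"
        "sametypesequence=h\n"  # every article is HTML, stored without a type byte
    ).encode("utf-8")
    return ifo, bytes(idx), bytes(body)
```

Save the three byte strings as `name.ifo`, `name.idx` and `name.dict` in one folder; compressing the `.dict` file with dictzip is optional.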

BTW, there are some tools to convert Wikipedia (and Wiktionary too?) content to the AARD format, which is also supported by GoldenDict:
http://aarddict.org/aardtools/doc/aardtools.html

At least, on the dictionaries page I see a couple of Wiktionary-based dictionaries, for English and German:
http://aarddict.org/dictionaries/index.html

If I'm not mistaken, the converter is written in Python as well. And since the AARD format is just a container for any HTML content, the conversion from wiki dumps is pretty straightforward.
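Whatever the target format, the first step of such a converter is streaming pages out of the dump. Below is a minimal sketch with `xml.etree.ElementTree.iterparse`, assuming the standard MediaWiki export layout (`page`, `title`, `ns`, `revision/text`). It is not the aardtools code, and `iter_pages` is a hypothetical name. The schema namespace is stripped so the same code works across export versions, and each processed page is cleared to keep memory use down on large dumps.

```python
import io
import xml.etree.ElementTree as ET


def iter_pages(source, namespace: str = "0"):
    """Yield (title, wikitext) for every page in the given namespace."""
    title = ns = text = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the export-schema namespace
        if tag == "title":
            title = elem.text
        elif tag == "ns":
            ns = elem.text
        elif tag == "text":
            text = elem.text or ""
        elif tag == "page":
            if ns == namespace:  # "0" is the main (entry) namespace
                yield title, text
            elem.clear()  # release the finished page's subtree
            title = ns = text = None


sample = io.BytesIO(
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">'
    b"<page><title>Haus</title><ns>0</ns>"
    b"<revision><text>==German==</text></revision></page></mediawiki>"
)
for title, wikitext in iter_pages(sample):
    print(title, wikitext)
```

With a real dump, pass an open file object (or a `bz2.open(...)` handle) instead of the in-memory sample.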

Post by hanyl05 » Wed Jan 23, 2013 8:47 am

BTW, is it possible to specify fonts with DSL tags instead of CSS?

Post by Tvangeste » Wed Jan 23, 2013 8:50 am

hanyl05 wrote: BTW, is it possible to specify fonts with DSL tags instead of CSS?

Not really. You can make text bold or italic and change its color, but you cannot specify fonts in DSL.
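When converting Wiktionary HTML to DSL, this means bold, italic and color can be carried over while font faces are simply dropped. A rough sketch using Python's standard `html.parser`; the tag mapping (`b`/`strong`, `i`/`em`, `<font color>`) and the escaped character set are illustrative assumptions, and any other tag keeps only its text.

```python
from html.parser import HTMLParser

DSL_SPECIAL = "\\[]{}~@"  # assumed set of characters DSL reserves for markup


def dsl_escape(text: str) -> str:
    return "".join("\\" + ch if ch in DSL_SPECIAL else ch for ch in text)


class HtmlToDsl(HTMLParser):
    """Maps a few inline HTML tags to DSL tags; other tags keep only their text."""

    SIMPLE = {"b": "b", "strong": "b", "i": "i", "em": "i"}

    def __init__(self) -> None:
        super().__init__()
        self.out: list[str] = []
        self.font_stack: list[bool] = []  # True if that <font> opened a [c] tag

    def handle_starttag(self, tag, attrs):
        if tag in self.SIMPLE:
            self.out.append(f"[{self.SIMPLE[tag]}]")
        elif tag == "font":
            color = dict(attrs).get("color")
            # The face= attribute is ignored: DSL has no font-family tag.
            if color:
                self.out.append(f"[c {color}]")
            self.font_stack.append(bool(color))

    def handle_endtag(self, tag):
        if tag in self.SIMPLE:
            self.out.append(f"[/{self.SIMPLE[tag]}]")
        elif tag == "font" and self.font_stack:
            if self.font_stack.pop():
                self.out.append("[/c]")

    def handle_data(self, data):
        self.out.append(dsl_escape(data))


def html_to_dsl(fragment: str) -> str:
    parser = HtmlToDsl()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


print(html_to_dsl('<b>Haus</b> <font face="Arial" color="green">house</font>'))
```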

Post by hanyl05 » Wed Jan 23, 2013 8:55 am

Fonts and tables are the drawbacks of Lingvo DSL. I wish GD could support table and font tags in DSL files.
Could we create our own tags and have GD support them, such as
[F font=Arial][/F]
[T table width......][/T]?

Post by Tvangeste » Wed Jan 23, 2013 9:11 am

hanyl05 wrote: Fonts and tables are the drawbacks of Lingvo DSL. I wish GD could support table and font tags in DSL files.

I'd like that too. We have asked ABBYY to upgrade the DSL format to support these features.

hanyl05 wrote: Could we create our own tags and have GD support them, such as
[F font=Arial][/F]
[T table width......][/T]?

Unfortunately, that would lead to format fragmentation, which is a bad thing.