
Re: Babylon dictionaries in .BDC format

Posted: Sun Jun 20, 2010 2:02 pm
by ikm
Thanks! Would you mind describing what is currently lacking in GoldenDict .BGL support, though?

Re: Babylon dictionaries in .BDC format

Posted: Mon Jun 21, 2010 5:10 pm
by ilius
ikm wrote: Thanks! Would you mind describing what is currently lacking in GoldenDict .BGL support, though?

I can't remember all of my changes, but these are some of them:

1- Unicode characters specified by their codes, such as ‚, and some cases that are in the "charset" tag.
2- Many named characters, mostly phonetic ones, such as &ldash; â ê à and so on (about 40 cases so far).
3- Many other special characters, such as ♦ and ●.
4- Removing the annoying indexes for all words that are included in some glossaries.
5- Optionally removing some useless and annoying HTML tags (this is user-configurable).
6- Alternates, which are detected and appended to the word itself (with a separator " | ").
This still needs some work to make the handling of alternates configurable.
7- The old BGL implementation (included in GoldenDict and stardict-tools) removes a big part
of the descriptions in some cases (and some glossaries) because of wrong assumptions
and unexpected bytes!
8- There are specific bytes near the end of definitions in some glossaries that
specify the title of the definition; it must be moved from the end of the string
to the start and made bold, with the leading bytes removed.
9- Detecting and loading a few new pieces of general information about the glossary, such as
Creation Time and Last Updated Time (with a precision of minutes).
10- A few bytes at the end of definitions in some glossaries indicate a copyright message
and are replaced by that string.
11- ...
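
Items 1, 2 and 6 above can be illustrated with a minimal Python sketch. This is not the actual converter code: the function names are hypothetical, and it relies only on Python's standard `html.unescape`, which knows numeric references and the standard HTML named entities. Non-standard names such as &ldash; are left untouched by it and would need an extra lookup table.

```python
import html


def decode_entities(text: str) -> str:
    """Replace numeric (&#8218;) and standard named (&acirc;) references
    with the characters they stand for. Unknown names are left as-is."""
    return html.unescape(text)


def headword_with_alternates(word, alternates, sep=" | "):
    """Append alternates to the headword, skipping duplicates.
    The " | " separator follows item 6; everything else is an assumption."""
    parts = [word]
    for alt in alternates:
        if alt not in parts:
            parts.append(alt)
    return sep.join(parts)
```

For example, `headword_with_alternates("colour", ["color", "colour"])` joins the headword and its one distinct alternate with the separator.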


BGL is a VERY VERY COMPLICATED format with too many details and illogical exceptions,
and it seems to be designed to be unreadable by any program other than Babylon itself.
An ungraceful policy that has had no success, and we implement it as free software!
(Sorry for my poor English!)

Regards.

Re: Babylon dictionaries in .BDC format

Posted: Mon Jun 21, 2010 5:30 pm
by ikm
I'm sorry, but have you tried GoldenDict?

Re: Babylon dictionaries in .BDC format

Posted: Tue Jun 22, 2010 4:35 am
by ilius
ikm wrote: I'm sorry, but have you tried GoldenDict?

Yes. But I'm a GNOME user and prefer StarDict.

Re: Babylon dictionaries in .BDC format

Posted: Tue Jun 22, 2010 8:11 am
by ikm
It's just that most of the problems you've listed don't really exist in GoldenDict. There is no need to replace HTML entities -- WebKit shows them just fine, the charset clauses are handled by regexps, the displayed headwords are handled, too, and so on. The only two things from your list that aren't handled, as far as I can see, are 1) the copyright message at the end, 2) creation times.

I would really appreciate it if people looked at the program first. Granted, BGL support is something that can always be improved, but just bluntly stating things without looking is not cool.

Re: Babylon dictionaries in .BDC format

Posted: Tue May 31, 2011 1:35 pm
by levent
I have 32/64-bit Windows 7 operating systems. I converted my dictionaries to .BGL format, but I may change all of them to another format. I don't really need .BGL files, because I have used only GoldenDict since 2009... OK, I use .BGL files because they are smaller than other formats... What is the best dictionary format for GoldenDict? Which one do you recommend? Thanks.

Re: Babylon dictionaries in .BDC format

Posted: Tue May 31, 2011 7:59 pm
by ikm
When choosing formats, I'd recommend StarDict format -- many programs, both desktop and mobile, support it, unlike BGL. It supports HTML entries, so you can just store HTML content, like in BGL. It also has less overhead than BGL, since the latter has to be decompressed in full to be used (GD stores full contents of BGLs in index files in a chunk-compressed form, and Babylon creates the corresponding .BDC files, if I am not mistaken).
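
To illustrate what "chunk-compressed" storage buys, here is a minimal Python sketch. It is not GoldenDict's actual index code: the chunk size, the function names and the use of zlib are all assumptions made for the example. The point is only that a reader can fetch one article by decompressing the chunks that cover it, instead of the whole file.

```python
import zlib

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, chosen for illustration


def compress_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split the data into fixed-size chunks and compress each one separately."""
    return [
        zlib.compress(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def read_range(chunks, offset: int, length: int, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return data[offset:offset + length], decompressing only the chunks needed."""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    raw = b"".join(zlib.decompress(c) for c in chunks[first:last + 1])
    start = offset - first * chunk_size
    return raw[start:start + length]
```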

However, be sure to know that GD currently doesn't support resources in StarDict files (they are messy and ad-hoc and their proper use isn't really documented).

If you're fine with BGL, I'd recommend just staying with it unless you are going to lose the ability to convert to another format in the future.

Re: Babylon dictionaries in .BDC format

Posted: Sat Jun 25, 2011 5:48 pm
by kubtek
ikm wrote: However, be sure to know that GD currently doesn't support resources in StarDict files (they are messy and ad-hoc and their proper use isn't really documented).


What do you mean by "resources in StarDict files"? As far as I know, StarDict may store resources in two forms.

1. as ordinary files in the res subdirectory
2. in a resource storage database, which consists of the res.rifo, res.ridx and res.rdic files.

Resource database format is similar to the format of a StarDict dictionary .ifo, .idx, .dict. It is clearly documented in StarDictFileFormat file distributed in StarDict source tarball since version 3.0.2.
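
As a rough illustration of that similarity, here is a Python sketch that parses an index laid out like a StarDict .idx file: a null-terminated UTF-8 name followed by a 32-bit big-endian offset and a 32-bit big-endian size. That res.ridx uses exactly this record layout is an assumption here and should be checked against StarDictFileFormat; the function names are hypothetical.

```python
import struct


def parse_index(buf: bytes) -> dict:
    """Parse records of the form: name, NUL byte, uint32 offset, uint32 size
    (both big-endian). Returns a mapping of name -> (offset, size)."""
    entries = {}
    pos = 0
    while pos < len(buf):
        end = buf.index(b"\0", pos)            # end of the name
        name = buf[pos:end].decode("utf-8")
        offset, size = struct.unpack_from(">II", buf, end + 1)
        entries[name] = (offset, size)
        pos = end + 1 + 8                      # skip NUL and the two 32-bit fields
    return entries


def read_entry(data: bytes, entries: dict, name: str) -> bytes:
    """Fetch one resource from the data file (the .rdic counterpart)."""
    offset, size = entries[name]
    return data[offset:offset + size]
```

Since names are stored inside the index as UTF-8 strings, non-ASCII names never have to exist as file names on disk.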

Re: Babylon dictionaries in .BDC format

Posted: Sun Jun 26, 2011 6:26 am
by ikm
kubtek wrote: Resource database format is similar to the format of a StarDict dictionary .ifo, .idx, .dict. It is clearly documented in StarDictFileFormat file distributed in StarDict source tarball since version 3.0.2.

That particular resource format may be documented indeed, but have you actually seen a single dictionary using it?

Also, while the way the resources are stored is more or less clear, there's not a single word on how they should actually be referred to from the articles.

Re: Babylon dictionaries in .BDC format

Posted: Sun Jun 26, 2011 7:20 am
by kubtek
ikm wrote: That particular resource format may be documented indeed, but have you actually seen a single dictionary using it?

That resource format is hardly used now. But I believe that is because 1) it was implemented not long ago, 2) its advantages over ordinary files in the res subdirectory are not well understood.

The lingvosound2resdb utility produces a sound sample database in this format. I know this because I wrote that utility myself.

I see two reasons for using resource database format instead of ordinary files:
1. many files in one directory
2. non-ASCII file names

ikm wrote: Also, while the way the resources are stored is more or less clear, there's not a single word on how they should actually be referred to from the articles.


I do not actually understand what may be unclear here. Suppose there is an HTML article:
Code:
<p>See the picture <img src="abcde.png"/>.</p>

Then we should search for a file named "abcde.png" in the res subdirectory or in the resource db.
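
A minimal Python sketch of that lookup, under stated assumptions: the names `ImgSrcCollector` and `find_resource` are hypothetical, the resource db is stood in for by a plain mapping of name to bytes, and trying the res subdirectory before the db is an arbitrary choice for the example, not something the format documentation prescribes.

```python
from html.parser import HTMLParser
from pathlib import Path
from typing import Mapping, Optional


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in an article."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img .../>
        if tag == "img":
            for key, value in attrs:
                if key == "src" and value:
                    self.sources.append(value)


def find_resource(name: str, dict_dir, resdb: Optional[Mapping[str, bytes]] = None) -> Optional[bytes]:
    """Look for the resource as an ordinary file in <dict_dir>/res first,
    then in the resource db (here just a mapping). Returns None if absent."""
    candidate = Path(dict_dir) / "res" / name
    if candidate.is_file():
        return candidate.read_bytes()
    if resdb is not None and name in resdb:
        return resdb[name]
    return None
```

Feeding the article above to `ImgSrcCollector` yields the name "abcde.png", which is then passed to `find_resource` together with the dictionary's directory.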