Too many related words in a lookup

Post by AshkanV » Tue Jan 29, 2013 9:36 pm

Hello
Thank you for this excellent dictionary software.

The program usually has no problems with small glossaries, whether few in number or small in size. Recently I added about 20 large Babylon glossaries (*.bgl) such as Oxford, Webster, Cambridge, offline Wikipedia, etc. Some of them are larger than 200 MB.

The problem is that when I choose to show some of these glossaries together with certain other ones, GoldenDict shows too many words related to the one I entered, and it takes a very long time to show all the results (sometimes more than 5 minutes on a Core i7 CPU).

For example:
When I search for "Look" in GD with only the "Webster's New Third International Unabridged Dictionary" glossary enabled, I get only the meanings of "look" as a noun and as a verb. But when I add "Oxford Learner's Thesaurus" to the group, GD shows, in addition to "look", results for "appearance, consideration, expression, fashion, search, seem" from both glossaries, and each of these extra entries adds many more articles to the result window.

So far I have found the following problematic glossaries. The words after each one are the extra results shown from "Babylon En-En", the original Babylon glossary:

TestGroup = Babylon En-En + one of the following glossaries:

Word = Look
Oxford Learner's Thesaurus => appearance, consideration, expression, fashion, search, seem
Oxford Collocations Dictionary => look after, look at, look down, look for, look in, look into, ...
Chambers Dictionary 11th Edition => askance, face, Hippocratic, rise, rose, rose, rose (rise), ...
Persian Computer Encyclopaedia => look alike, look up
Picture Dictionary => observe
American Idioms => feel small, look a gift horse in the mouth, look after, look at, look down on, ...

Word = Flower
English (GB) Morphology => flow

Word = Up
Merriam-Webster Collegiate® Dictionary => ups and downs
Macmillan English Thesaurus => close, closed

Word = a
Concise Oxford English Dictionary => A.D. (anno domini), ad, ampere
General Chemistry Glossary => activity

Now imagine there are 10 large glossaries in a group and 5 of them make GD look up lots of words related to the one entered. The result is a huge amount of output!
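
A rough way to see how quickly this grows, as a minimal Python sketch. It assumes the behaviour described above: every extra word contributed by any glossary is then looked up in every glossary of the group. The function name `max_article_lookups` and the count of 30 extra words per glossary are made up for illustration; only the count of 6 comes from the "Look" example.

```python
def max_article_lookups(num_glossaries, extra_words_per_glossary):
    """Upper bound on per-glossary lookups for one query.

    Assumes every extra word is distinct and is looked up in every
    glossary of the group (hypothetical model, not GoldenDict code).
    """
    words = 1 + sum(extra_words_per_glossary)  # the query itself plus extras
    return words * num_glossaries


# "Look" with Babylon En-En + Oxford Learner's Thesaurus (6 extra words).
two_glossaries = max_article_lookups(2, [6])

# Ten glossaries, five of which each contribute 30 extra words (made-up count).
ten_glossaries = max_article_lookups(10, [30] * 5)

print(two_glossaries, ten_glossaries)
```

Under those assumptions the two-glossary case needs up to 14 lookups and the ten-glossary case up to 1510, so a few glossaries with long lists of related words can dominate the whole result.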

As posting download links to copyright-protected items is forbidden in the forum, I didn't add any. But if needed, I can add a direct download link for any of the mentioned glossaries. I know that some of them are freely available on the Babylon website.

I used the git version from 28 January and built it on Fedora 18 using qmake-qt4 + make.

So my request is: if this is a feature of GD, please add an option to disable it, and if it is a bug, please fix it.

Thank you.
AshkanV
 
Posts: 2
Joined: Tue Jan 29, 2013 8:31 pm

Re: Too many related words in a lookup

Post by Tvangeste » Tue Jan 29, 2013 10:38 pm

This is a problem with buggy, wrong BGL dictionaries that contain too many synonyms/alternatives and "inject" them for the translation.

If some BGL dictionary contains rules saying that for the word "look" one also needs to look up "see, show, etc.", that's the dictionary's problem.

Personally, I'd rather use DSL dictionaries (and most of the BGL dictionaries you've mentioned are just conversions from DSL). DSL dictionaries do not have this problem with additional injected lists of synonyms/alternatives.
Tvangeste
 
Posts: 893
Joined: Thu Jun 02, 2011 11:42 am

Re: Too many related words in a lookup

Post by AshkanV » Wed Jan 30, 2013 5:31 am

Tvangeste wrote: This is a problem with buggy, wrong BGL dictionaries that contain too many synonyms/alternatives and "inject" them for the translation.

I accept it, but maybe GD can prevent this type of injection.

Tvangeste wrote: If some BGL dictionary contains rules saying that for the word "look" one also needs to look up "see, show, etc.", that's the dictionary's problem.

I see, but the real problem is this: when glossary "A" has such a problem, why does it affect all the other glossaries? The problem is inside one glossary, but GD looks up those extra related words in all glossaries.

Tvangeste wrote: Personally, I'd rather use DSL dictionaries (and most of the BGL dictionaries you've mentioned are just conversions from DSL). DSL dictionaries do not have this problem with additional injected lists of synonyms/alternatives.

That is great, I'll try to find the DSL versions. But maybe GD needs a patch to prevent a single glossary from having such a wide effect.

Update:
I forgot to mention: as I wrote in the first post, some standard glossaries like "English (GB) Morphology" also make GD look up extra words: Flower => Flower + Flow

Re: Too many related words in a lookup

Post by Tvangeste » Wed Jan 30, 2013 8:16 am

AshkanV wrote: I accept it, but maybe GD can prevent this type of injection.

If we prevent this injection, then morphology dictionaries will be useless, since that's exactly what they do: they inject alternative forms, so that if the user looks up "gone" or "goes", he/she also gets the proper form "go".

AshkanV wrote: I see, but the real problem is this: when glossary "A" has such a problem, why does it affect all the other glossaries? The problem is inside one glossary, but GD looks up those extra related words in all glossaries.

There is no other way to do it. When I look up "goes", the morphology dictionary injects "go" and I get translations of "go" from *all* my dictionaries. If the morphology dictionary couldn't inject the "go" form, you'd end up with empty translations in all dictionaries except the morphology one. Not very useful.
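
To make the mechanism concrete, here is a minimal Python sketch of that two-step lookup. It is a toy model under stated assumptions, not GoldenDict's actual code: the `lookup` function, the `"alternates"` and `"articles"` fields, and the two sample dictionaries are all hypothetical.

```python
def lookup(word, dictionaries):
    """Return {dictionary name: {headword: article}} for one query.

    Each dictionary is a plain dict with two hypothetical fields:
    "alternates" (word -> list of injected forms) and "articles"
    (headword -> article text). This is a toy model, not GoldenDict code.
    """
    # Step 1: collect the query plus every alternate any dictionary injects.
    words = [word]
    for d in dictionaries.values():
        for alt in d["alternates"].get(word, []):
            if alt not in words:
                words.append(alt)

    # Step 2: look up every collected word in every dictionary.
    results = {}
    for name, d in dictionaries.items():
        found = {w: d["articles"][w] for w in words if w in d["articles"]}
        if found:
            results[name] = found
    return results


dictionaries = {
    "Morphology": {"alternates": {"goes": ["go"]}, "articles": {}},
    "General": {"alternates": {}, "articles": {"go": "to move or travel"}},
}

with_injection = lookup("goes", dictionaries)

# Same group without the morphology dictionary: nothing matches "goes".
without_injection = lookup("goes", {"General": dictionaries["General"]})

print(with_injection)
print(without_injection)
```

In this model the "General" dictionary only returns its "go" article because the "Morphology" dictionary contributed that form in step 1. Without it the result is empty, and by the same logic a dictionary that contributes dozens of forms makes every other dictionary return dozens of articles.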

AshkanV wrote: I forgot to mention: as I wrote in the first post, some standard glossaries like "English (GB) Morphology" also make GD look up extra words: Flower => Flower + Flow

And that's exactly the use case the injection is needed for.

In short, the problem lies with the specific faulty dictionary. The same would happen if you created a new, incorrect morphology dictionary that returned a huge list of alternatives. There is no way for GoldenDict to distinguish between good and bad dictionaries; that's the user's responsibility. The proper way to fix this is either to remove the buggy dictionary or to adjust it to eliminate the huge lists of alternatives.
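
As a starting point for the "adjust it" route, here is a minimal Python sketch for spotting headwords with unusually long lists of alternatives. It assumes the headword-to-alternatives mapping has already been extracted from the dictionary by some other means (reading BGL files is not shown here). The function `oversized_alternates`, the `limit` threshold and the sample data are hypothetical; the "look" entry reuses the words reported earlier in this thread.

```python
def oversized_alternates(alternates, limit):
    """Return (headword, count) pairs whose alternate list exceeds `limit`, longest first."""
    flagged = {head: len(alts) for head, alts in alternates.items() if len(alts) > limit}
    return sorted(flagged.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data, e.g. exported from a dictionary by some other tool.
alternates = {
    "goes": ["go"],
    "look": ["appearance", "consideration", "expression", "fashion", "search", "seem"],
    "flower": ["flow"],
}

report = oversized_alternates(alternates, limit=3)
print(report)
```

With a threshold of 3, only "look" is flagged here. What counts as "too many" is a judgment call for the person maintaining the dictionary.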

Re: Too many related words in a lookup

Post by Tvangeste » Wed Jan 30, 2013 10:22 am

@AshkanV, could you send me one of the broken BGL dictionaries that produce excessive suggestions, via Private Message? I'd like to take a look.