
Can't find any word with č in Slovene-English BGL

Post by chema » Sat Feb 01, 2014 5:49 am

Howdy,

I was missing Babylon in the otherwise comfortable Linux setup I use daily, so I decided to check how things were looking up, and was pleasantly surprised to discover GoldenDict. I promptly grabbed a book in Slovene and the only dictionary file I could find for such an "exotic" language: Slovene-English-Slovene by Milan Orožen Adamič, at the dusty and hidden section of free dictionaries at babylon.com.

I'm happy to report I'm lovin the double click translation alright (though I wish I could bind it to the 5th mouse button, or anything else), but with a little stump: I don't get any matches if the word contains a "č".

I restarted my PC. No good. Walked out of the room and back inside. Still no č in da house. Finally, I dictconv-erted the BGL to StarDict format, and presto, I finally could get me some čaj! But... and a big but it is, now words that contain ž or š have no matches, though they did before! This also means I can't get any match at all for words with "šč", slovenščina included... ščureki!

Seems like an encoding problem, though I can't imagine where it came from, since the Babylon glossary seems just fine, at least online: čaj, šah, žaba, ščurek... ;)

I'm now trying to make sense of the dictionary formats. Perhaps I can simply do a hex substitution of the wrong characters in the index file, or maybe I will have to convert the glossary to an editable form... any suggestions?
chema
 
Posts: 3
Joined: Sat Feb 01, 2014 4:35 am

Re: Can't find any word with č in Slovene-English BGL

Post by Abs62 » Sat Feb 01, 2014 1:01 pm

chema wrote:I'm happy to report I'm lovin the double click translation alright (though I wish I could bind it to the 5th mouse button, or anything else), but with a little stump: I don't get any matches if the word contains a "č".

This dictionary (bgl file) was created in the Win-1252 encoding, which doesn't contain the "č" character. It seems that in this file every "č" was replaced by "è".
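A small sketch of that substitution, for anyone who wants to try it. It assumes the bytes in the glossary are really Central European text (Windows-1250 is used here as a guess; ISO 8859-2 puts "č" at the same byte) while the file is labelled Win-1252. The helper name `can_encode` is made up for the illustration.

```python
# Assumption: the glossary bytes are Central European (Windows-1250 here),
# but the file header says Windows-1252.

def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

has_c_caron = can_encode("č", "cp1252")  # Windows-1252 has no "č"
raw = "čaj".encode("cp1250")             # "č" is the single byte 0xE8
misread = raw.decode("cp1252")           # the same byte read as Windows-1252

print(has_c_caron, raw, misread)
```

Under that assumption a lookup for "čaj" never matches, because the stored headword is "èaj".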
Abs62
 
Posts: 631
Joined: Mon Jun 14, 2010 11:51 am

Re: Can't find any word with č in Slovene-English BGL

Post by chema » Sat Feb 01, 2014 2:54 pm

Abs62 wrote:This dictionary (bgl file) was created in the Win-1252 encoding, which doesn't contain the "č" character. It seems that in this file every "č" was replaced by "è".


Hmm, the strange thing is it works just fine for babylon.com, and dictconv magically fixed the č while breaking š and ž. Must be the computer gnomes again, for sure! ;)

I've fixed it now: I dictconv-erted to DICT format, which has a nice plaintext index, where the rogue characters (\232 and \236) stuck out like sore thumbs. I replaced those in the index and the dict files and it worked just fine. Then I did the same with the StarDict files, because for some reason GoldenDict doesn't show the translated word for DICT files, only the translation. The minor difference is that the IDX file seems to be UTF-8, as I had to replace \302\232 and \302\236 there. It works great now, ščureki and all.
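For reference, the octal sequences \302\232 and \302\236 are the UTF-8 encodings of the control characters U+009A and U+009E. A rough sketch of the same replacement in Python follows. It assumes the data is UTF-8 and that these two control characters only occur where "š" and "ž" were lost; the names `REPLACEMENTS` and `repair` are invented for the example and are not part of dictconv or GoldenDict.

```python
# Assumption: UTF-8 data in which U+009A / U+009E stand in for "š" / "ž".
REPLACEMENTS = {
    "\u009a".encode("utf-8"): "š".encode("utf-8"),  # b'\xc2\x9a' -> b'\xc5\xa1'
    "\u009e".encode("utf-8"): "ž".encode("utf-8"),  # b'\xc2\x9e' -> b'\xc5\xbe'
}

def repair(data: bytes) -> bytes:
    """Swap the stray control characters for the intended letters."""
    for bad, good in REPLACEMENTS.items():
        data = data.replace(bad, good)
    return data

# In-memory demo; for a real file, read and write it in binary mode.
sample = "\u009a\u010durek \u009eaba".encode("utf-8")
fixed = repair(sample).decode("utf-8")
print(fixed)
```

Each replacement is two bytes for two bytes, so the file length stays the same. As far as I know a StarDict .idx also stores binary offsets after each headword, so a blind byte replace could in principle touch those too; keeping a backup of the original files seems wise.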

May I ask which tools you use to inspect BGLs? Thanks!
chema
 
Posts: 3
Joined: Sat Feb 01, 2014 4:35 am

Re: Can't find any word with č in Slovene-English BGL

Post by Abs62 » Sat Feb 01, 2014 3:37 pm

chema wrote:Hmm, the strange thing is it works just fine for babylon.com,

Maybe the online dictionary was created in another encoding.
chema wrote: and dictconv magically fixed the č while breaking š and ž. Must be the computer gnomes again, for sure! ;)

I think your dictconv just tried to read the bgl file as ISO 8859-2 instead of the Win-1252 specified in the file. That encoding has "č" at the position where Win-1252 has "è". ;)
chema wrote:May I ask which tools you use to inspect BGLs? Thanks!
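To illustrate, here is a minimal sketch of one headword read both ways. The starting bytes are an assumption (the word encoded as Windows-1250), not something taken from the actual bgl file.

```python
# Assumption: the headword is stored as these Windows-1250 bytes.
raw = "ščurek".encode("cp1250")            # b'\x9a\xe8urek'

as_cp1252 = raw.decode("cp1252")           # 'šèurek': š survives, č becomes è
as_latin2 = raw.decode("iso8859_2")        # '\x9ačurek': č is right, š becomes U+009A

# The stray control character, once written out as UTF-8:
stray_utf8 = as_latin2[0].encode("utf-8")  # b'\xc2\x9a', i.e. octal \302\232

print(repr(as_cp1252), repr(as_latin2), stray_utf8)
```

In ISO 8859-2 the bytes 0x9A and 0x9E are control codes rather than letters, which would explain why "š" and "ž" stopped matching after the conversion.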

GoldenDict source code + debugger. ;)
Abs62
 
Posts: 631
Joined: Mon Jun 14, 2010 11:51 am

