[SOLVED] How to compress large DSL files (more than 1.8 GB)

Postby the_cla5h » Tue May 20, 2014 8:49 pm

I downloaded the 2012 version of Urban Dictionary in DSL format. Its size is 2,4 GB, and it works well in GoldenDict.
I tried to compress it with dictzip from the command line in Ubuntu. The compression seems to complete successfully, but the resulting 418 MB .dz file doesn't work in GoldenDict.
Then I tried to compress it in Windows with dictzip.exe, but the compression doesn't start at all: the program closes with an error message and creates a 0-byte .dz file.

Could you please help me or try to do this for me? The dictionary can be downloaded here: http://goo.gl/4eHFuH (don't worry, it's a 7zip compressed file of only 199MB :) )
Last edited by the_cla5h on Wed May 21, 2014 3:48 pm, edited 1 time in total.
the_cla5h
 
Posts: 15
Joined: Sun Mar 09, 2014 1:38 am

Re: Could you help me to dzip this 2,4 GB Urban Dictionary?

Postby Abs62 » Tue May 20, 2014 9:02 pm

The dictzip format doesn't support such huge files. It is a restriction of the format itself.
Abs62
 
Posts: 631
Joined: Mon Jun 14, 2010 11:51 am

Re: Could you help me to dzip this 2,4 GB Urban Dictionary?

Postby the_cla5h » Tue May 20, 2014 9:36 pm

If so, what is the size limit?
I also thought it was too large for dictzip, but here I read that dictzip can handle large files (more than 1 GB): viewtopic.php?f=5&t=1841
the_cla5h
 
Posts: 15
Joined: Sun Mar 09, 2014 1:38 am

Re: Could you help me to dzip this 2,4 GB Urban Dictionary?

Postby Abs62 » Wed May 21, 2014 3:56 am

AFAIK this limit is about 1.8 GB.
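
A figure of about 1.8 GB can be reproduced with some arithmetic. The sketch below rests on assumptions that are not stated in this thread and are worth checking against the dictzip man page: that the chunk table lives in the gzip extra field, whose length is a 16-bit value; that the table costs 10 bytes of overhead plus 2 bytes per chunk; and that the default chunk length is 58315 bytes.

```python
# Rough estimate of the largest file dictzip can index.
# All constants below are assumptions taken from the dictzip format
# description, not from this thread.
MAX_EXTRA_FIELD = 0xFFFF    # gzip XLEN is a 16-bit length
TABLE_OVERHEAD = 10         # assumed: subfield header + version/chunk length/chunk count
BYTES_PER_CHUNK_ENTRY = 2   # assumed: one 16-bit compressed size per chunk
CHUNK_LENGTH = 58315        # assumed: default uncompressed chunk length

max_chunks = (MAX_EXTRA_FIELD - TABLE_OVERHEAD) // BYTES_PER_CHUNK_ENTRY
max_bytes = max_chunks * CHUNK_LENGTH

print(max_chunks, "chunks")
print(max_bytes, "bytes")
print(round(max_bytes / 2**30, 2), "GiB")
```

Under those assumptions the limit comes out at roughly 1.91 billion bytes, or about 1.78 GiB, which agrees with the "about 1.8 GB" quoted above.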
Abs62
 
Posts: 631
Joined: Mon Jun 14, 2010 11:51 am

Re: Could you help me to dzip this 2,4 GB Urban Dictionary?

Postby the_cla5h » Wed May 21, 2014 3:42 pm

I was finally able to compress it! In the end it was a matter of size, as Abs62 suggested.
Here is how I did it (thanks to Tvangeste and wargus for the input in this post: viewtopic.php?f=4&t=1301 ):

1. To make it smaller, I tried to convert my file from DOS to Unix EOL both with dos2unix and tofrodos, but I discovered that my file was already in Unix format.

2. Then, again to make it smaller, I tried converting it from UTF-16 to UTF-8, but the file was so big that gedit and other text editors crashed while opening or saving it. Fortunately, I found a really light and fast text editor named AkelPad (it's for Windows but works in Linux with Wine), which was able to open the file without crashing and save it as a new file in UTF-8 format. It shrank from 2,4 GB to 1,2 GB (50%!).

3. Then I was able to compress it with dictzip to a small 338 MB .dz file! :)
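
For anyone whose editor also chokes on a file this size, step 2 can be done without opening the file in an editor at all. This is only a minimal sketch, not what was actually used above: the function name, the chunk size and the `write_bom` flag are made up for illustration, and whether GoldenDict needs a UTF-8 BOM to recognise the file is an assumption you should test on your own dictionary.

```python
def reencode_utf16_to_utf8(src_path, dst_path, chunk_chars=1 << 20, write_bom=True):
    """Stream a UTF-16 text file into a UTF-8 copy, a chunk at a time.

    Hypothetical helper: only chunk_chars characters are held in memory,
    so the size of the file does not matter.
    """
    # "utf-16" uses the BOM of the input to pick the byte order.
    # "utf-8-sig" writes a UTF-8 BOM once at the start (assumption: the
    # dictionary program may rely on it to detect the encoding).
    out_encoding = "utf-8-sig" if write_bom else "utf-8"
    # newline="" keeps the line endings exactly as they are in the source.
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding=out_encoding, newline="") as dst:
        while True:
            chunk = src.read(chunk_chars)
            if not chunk:
                break
            dst.write(chunk)
```

Mostly-ASCII text takes two bytes per character in UTF-16 and one in UTF-8, which is why the file halves in size. After calling something like `reencode_utf16_to_utf8("urban.dsl", "urban_utf8.dsl")` (file names are placeholders), running `dictzip` on the new file corresponds to step 3.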

I hope this helps other people with a similar problem, so I'll mark the topic as solved and change its title to "How to compress large DSL files".
the_cla5h
 
Posts: 15
Joined: Sun Mar 09, 2014 1:38 am

