
How to compile .lsa pronunciation file from .wav?

Postby dma_k » Sun Oct 30, 2011 2:50 pm

Dear GoldenDict community,

I wonder if somebody can help me compile one LSA file from thousands of WAV files. I find it more practical to have a single file (which I hope also supports a choice of lossless or lossy compression).

From Sources.txt I have read:
The speech dictionary (Speech.lsa) was made from the WyabdcRealPeopleTTS.tar.bz2 file, downloaded from http://stardict.sourceforge.net

I would be happy to know what steps are needed for this conversion.
dma_k
 
Posts: 7
Joined: Mon Oct 24, 2011 11:07 pm

Re: How to compile .lsa pronunciation file from .wav?

Postby ikm » Sun Oct 30, 2011 7:10 pm

I just used my own hacky script which would 1) concatenate all wav files into one large .wav file, saving their time offsets within it, and their durations, 2) create .lsa header with the file names and corresponding time offsets and durations, 3) compress the .wav file into .ogg with oggenc, 4) concatenate the .lsa header and the resulting .ogg file. That script was only made to do this once, and I actually don't even have it right now. However, writing a proper tool to do this isn't really very hard.
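Step 1 of the procedure above (merging the WAV files while recording each file's time offset and duration) could be sketched roughly as follows. This is not the original script, which is not available. It is a minimal Python illustration using the standard `wave` module, and it assumes all input files share the same channel count, sample width and sample rate. The function name `concat_wavs` is made up for this example. Steps 2 to 4 (writing the .lsa header, encoding with oggenc, and joining the two parts) are not shown, because the header layout is not described in this thread.

```python
import wave
from pathlib import Path


def concat_wavs(wav_paths, out_path):
    """Concatenate same-format WAV files into one WAV file.

    Returns a list of (name, offset_seconds, duration_seconds) tuples,
    one per input file, in the order the files were appended.
    """
    wav_paths = list(wav_paths)
    if not wav_paths:
        raise ValueError("no input files given")

    index = []
    params = None          # (channels, sample width, frame rate) of the first file
    frames_written = 0     # running position in the output, in frames

    with wave.open(str(out_path), "wb") as out:
        for path in wav_paths:
            with wave.open(str(path), "rb") as src:
                fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                if params is None:
                    params = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != params:
                    raise ValueError(f"{path}: format {fmt} differs from {params}")
                data = src.readframes(src.getnframes())
            # Count the frames actually read rather than trusting the header.
            n = len(data) // (fmt[0] * fmt[1])
            out.writeframes(data)
            rate = params[2]
            index.append((Path(path).stem, frames_written / rate, n / rate))
            frames_written += n
    return index
```

The returned list is the information a header writer would need: one name, start offset and duration per word.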
ikm
GoldenDict author
 
Posts: 1595
Joined: Wed Feb 04, 2009 10:40 am

Re: How to compile .lsa pronunciation file from .wav?

Postby dma_k » Mon Oct 31, 2011 11:08 pm

ikm wrote:I just used my own hacky script which would 1) concatenate all wav files into one large .wav file, saving their time offsets within it, and their durations, 2) create .lsa header with the file names and corresponding time offsets and durations, 3) compress the .wav file into .ogg with oggenc, 4) concatenate the .lsa header and the resulting .ogg file.

That sounds tricky... I can probably manage merging the WAVs and saving the offsets and durations, but the LSA format is a black box to me. Is there any (unofficial) documentation for the LSA format?
dma_k
 
Posts: 7
Joined: Mon Oct 24, 2011 11:07 pm

Re: How to compile .lsa pronunciation file from .wav?

Postby dma_k » Sun Dec 11, 2011 2:41 am

If GoldenDict supported archives for sounds (e.g. the complete sound structure packed with ZIP or 7zip), there would be no need for me to pack them into .lsa.
dma_k
 
Posts: 7
Joined: Mon Oct 24, 2011 11:07 pm

Re: How to compile .lsa pronunciation file from .wav?

Postby Tvangeste » Sun Dec 11, 2011 7:15 am

dma_k wrote:If GoldenDict supported archives for sounds (e.g. the complete sound structure packed with ZIP or 7zip), there would be no need for me to pack them into .lsa.

You might consider creating a DSL dictionary and attaching all the sound files to it in the form of a ZIP archive.

The DSL format is a very simple one. The cards for it could be as simple as this:

Code:
one
  [s]one.mp3[/s]
two
  [s]two.mp3[/s]
three
  [s]three.mp3[/s]


When you create a DSL dictionary, say dictionary.dsl, you can put all its media files into a plain ZIP archive; just name it dictionary.dsl.files.zip.
Tvangeste
 
Posts: 893
Joined: Thu Jun 02, 2011 11:42 am

Re: How to compile .lsa pronunciation file from .wav?

Postby dma_k » Thu Nov 15, 2012 11:07 am

Tvangeste wrote:The DSL format is a very simple one. The cards for it could be as simple as this:

Code:
one
  [s]one.mp3[/s]
two
  [s]two.mp3[/s]
three
  [s]three.mp3[/s]

When you create a DSL dictionary, say dictionary.dsl, you can put all its media files into a plain ZIP archive; just name it dictionary.dsl.files.zip.

Thanks for the great advice. I have created the archive list.dsl.zip with the following contents:
Code:
list.dsl
a/
a/a.wav
a/aback.wav
a/abandon.wav
a/abandoned.wav
a/abandonment.wav
...
z/zodiacal.wav
z/zone.wav
z/zoo.wav
z/zoological.wav
z/zoology.wav

The file list.dsl in that archive reads:
Code:
a
  [s]a/a.wav[/s]
aback
  [s]a/aback.wav[/s]
abandon
  [s]a/abandon.wav[/s]
abandoned
  [s]a/abandoned.wav[/s]
abandonment
  [s]a/abandonment.wav[/s]
...
zodiacal
  [s]z/zodiacal.wav[/s]
zone
  [s]z/zone.wav[/s]
zoo
  [s]z/zoo.wav[/s]
zoological
  [s]z/zoological.wav[/s]
zoology
  [s]z/zoology.wav[/s]

I put list.dsl.zip into the content folder, but no pronunciation is available. I also tried moving the list.dsl file into a separate compressed file and renaming the archive:
Code:
list.dsl.gz
list.dsl.files.zip

That does not work either. Where should it appear in the program settings dialog: in the list of dictionaries or in the list of pronunciation sources?
Last edited by dma_k on Thu Nov 15, 2012 11:56 am, edited 1 time in total.
dma_k
 
Posts: 7
Joined: Mon Oct 24, 2011 11:07 pm

Re: How to compile .lsa pronunciation file from .wav?

Postby Tvangeste » Thu Nov 15, 2012 11:21 am

You are probably missing the DSL header (where the name of the dictionary and the source and target languages are specified).

Take a look here for an example:

https://github.com/VVSiz/SampleDSL/blob ... sample.dsl

If everything is done right, you'll see your new dictionary in the list of all dictionaries.
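Putting the pieces from this thread together, a small script could generate both the .dsl file (with a header) and the matching .dsl.files.zip archive from a folder of WAV files. The sketch below is only an illustration under several assumptions: the function name, the default dictionary title and the language value are invented for the example, the file is written as UTF-16LE with a byte-order mark (the encoding commonly used for DSL files), and headwords are taken directly from the file names without escaping any DSL special characters. The directory layout inside the archive mirrors the a/a.wav layout shown earlier in the thread.

```python
import codecs
import zipfile
from pathlib import Path


def build_dsl_sound_dictionary(sound_dir, out_dir, base_name="pronunciation_en",
                               title="Pronunciation (en)", language="English"):
    """Write <base_name>.dsl and <base_name>.dsl.files.zip into out_dir.

    Returns (dsl_path, zip_path). All names and defaults are illustrative.
    """
    sound_dir, out_dir = Path(sound_dir), Path(out_dir)
    wavs = sorted(sound_dir.rglob("*.wav"))

    # DSL header: dictionary name plus source and target languages.
    lines = [
        f'#NAME "{title}"',
        f'#INDEX_LANGUAGE "{language}"',
        f'#CONTENTS_LANGUAGE "{language}"',
        "",
    ]
    for wav in wavs:
        rel = wav.relative_to(sound_dir).as_posix()   # e.g. "a/aback.wav"
        lines.append(wav.stem)                        # headword, e.g. "aback"
        lines.append(f"\t[s]{rel}[/s]")               # card body must be indented
    text = "\r\n".join(lines) + "\r\n"

    dsl_path = out_dir / f"{base_name}.dsl"
    # Assumption: UTF-16LE with BOM, as DSL files are commonly distributed.
    dsl_path.write_bytes(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))

    zip_path = out_dir / f"{base_name}.dsl.files.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for wav in wavs:
            zf.write(wav, arcname=wav.relative_to(sound_dir).as_posix())
    return dsl_path, zip_path
```

Both output files use exactly the same base name, including letter case, since the archive is matched to the dictionary by its file name.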
Tvangeste
 
Posts: 893
Joined: Thu Jun 02, 2011 11:42 am

Re: How to compile .lsa pronunciation file from .wav?

Postby dma_k » Thu Nov 15, 2012 7:33 pm

Tvangeste wrote:You are probably missing the DSL header (where the name of the dictionary and the source and target languages are specified).

Thanks for this hint! Indeed, that was the problem.

Here is what I have tried:
Code:
pronunciation_en.dsl.gz
Pronunciation_en.dsl.files.zip
With these files, the dictionary is not recognized. If I just change the extension:
Code:
pronunciation_en.dsl.dz
Pronunciation_en.dsl.files.zip
the dictionary is recognized, but GoldenDict crashes when the article is opened (when it tries to pronounce the word). Is it possible to add support for compressed DSL?

The next problem I've noticed is that some (all?) audio files from "Speech.lsa" are not played correctly. Playing WAV sources requires the "DirectSound" option to be enabled, but with this option, for example, "zoo" from the default "Speech.lsa" is not pronounced and "mother" produces noise.
dma_k
 
Posts: 7
Joined: Mon Oct 24, 2011 11:07 pm

Re: How to compile .lsa pronunciation file from .wav?

Postby Abs62 » Thu Nov 15, 2012 7:49 pm

dma_k,
GoldenDict supports compressed sound packs (.zips) starting from EA build 1.0.1-333-g8e4b384. Just compress the WAV files into a ZIP archive and change its extension to "zips". GoldenDict handles such packs like .lsa files.
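Packing a folder of WAV files this way could be scripted along the following lines. This is a minimal sketch, not an official tool: the function name `make_zips_pack` is invented, and the choice to store every file flat under its base name is an assumption, since the thread does not say whether GoldenDict expects a flat layout or accepts subdirectories inside the pack.

```python
import zipfile
from pathlib import Path


def make_zips_pack(sound_dir, pack_path):
    """Compress all WAV files under sound_dir into a ZIP archive named *.zips.

    Files are stored flat under their base names (an assumption, see above).
    Returns the number of files added.
    """
    pack_path = Path(pack_path)
    if pack_path.suffix != ".zips":
        raise ValueError("pack file name should end in .zips")

    seen = set()
    count = 0
    with zipfile.ZipFile(pack_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for wav in sorted(Path(sound_dir).rglob("*.wav")):
            if wav.name in seen:
                # Two files with the same base name would collide in a flat pack.
                raise ValueError(f"duplicate file name: {wav.name}")
            seen.add(wav.name)
            zf.write(wav, arcname=wav.name)
            count += 1
    return count
```

Writing the archive directly with the .zips extension is equivalent to creating a .zip file and renaming it afterwards.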
dma_k wrote:The next problem I've noticed is that some (all?) audio files from "Speech.lsa" are not played correctly.

You can try playing the sound files with an external program. See this post.
Abs62
 
Posts: 631
Joined: Mon Jun 14, 2010 11:51 am

Re: How to compile .lsa pronunciation file from .wav?

Postby dma_k » Fri Nov 16, 2012 8:04 pm

Abs62 wrote:dma_k,
GoldenDict supports compressed sound packs (.zips) starting from EA build 1.0.1-333-g8e4b384. Just compress the WAV files into a ZIP archive and change its extension to "zips". GoldenDict handles such packs like .lsa files.
The next problem I've noticed is that some (all?) audio files from "Speech.lsa" are not played correctly.

You can try playing the sound files with an external program. See this post.

Thanks! The external program works like a charm for both Speech.lsa and Forvo!

Build 353, which I downloaded, was able to find the "zips" archive with sounds and detected 18013 words (articles) in it. However, it does not show the words from it in the article (although matches from Speech.lsa and Forvo are shown fine).

One more question: what is the convention for spaces? Will a/as_well.wav be found for "as well", or should I put spaces into the filenames? Thanks.
dma_k
 
Posts: 7
Joined: Mon Oct 24, 2011 11:07 pm
