How to compile .lsa pronunciation file from .wav?

Posted: Sun Oct 30, 2011 2:50 pm
by dma_k
Dear GoldenDict community,

I wonder if somebody can help me compile a single LSA file from thousands of WAV files. I find it more practical to have one file (which I hope also supports a choice of lossless or lossy compression).

From Sources.txt I have read:
The speech dictionary (Speech.lsa) was made from the WyabdcRealPeopleTTS.tar.bz2 file, downloaded from http://stardict.sourceforge.net

I would be glad to know what steps are needed for this conversion.

Re: How to compile .lsa pronunciation file from .wav?

Posted: Sun Oct 30, 2011 7:10 pm
by ikm
I just used my own hacky script which would 1) concatenate all wav files into one large .wav file, saving their time offsets within it, and their durations, 2) create .lsa header with the file names and corresponding time offsets and durations, 3) compress the .wav file into .ogg with oggenc, 4) concatenate the .lsa header and the resulting .ogg file. That script was only made to do this once, and I actually don't even have it right now. However, writing a proper tool to do this isn't really very hard.
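
Steps 1 and 2 above hinge on knowing where each word starts inside the merged audio. Below is a minimal Python sketch of step 1 only, using the standard `wave` module. The function name `concatenate_wavs` is made up for illustration, and the sketch assumes every input file has the same channel count, sample width and sample rate. It stops short of writing the .lsa header, because the header layout is not described in this thread.

```python
import wave
from pathlib import Path


def concatenate_wavs(wav_paths, out_path):
    """Join WAV files into one file.

    Returns a list of (file name, offset in seconds, duration in seconds),
    one entry per input file, in the order the files were appended.
    """
    wav_paths = list(wav_paths)
    if not wav_paths:
        raise ValueError("no input files given")

    index = []
    params = None          # (channels, sample width, frame rate) of the first file
    frames_written = 0     # running position in the merged file, in frames

    with wave.open(str(out_path), "wb") as out:
        for path in wav_paths:
            with wave.open(str(path), "rb") as src:
                fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                if params is None:
                    params = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != params:
                    # Mixed formats would need resampling, which is out of scope here.
                    raise ValueError(f"{path}: format {fmt} differs from {params}")
                nframes = src.getnframes()
                out.writeframes(src.readframes(nframes))
            rate = params[2]
            index.append((Path(path).name, frames_written / rate, nframes / rate))
            frames_written += nframes
    return index
```

The returned offsets and durations are the values step 2 would store in the header, and the merged file is what step 3 would hand to oggenc.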

Re: How to compile .lsa pronunciation file from .wav?

Posted: Mon Oct 31, 2011 11:08 pm
by dma_k
ikm wrote:I just used my own hacky script which would 1) concatenate all wav files into one large .wav file, saving their time offsets within it, and their durations, 2) create .lsa header with the file names and corresponding time offsets and durations, 3) compress the .wav file into .ogg with oggenc, 4) concatenate the .lsa header and the resulting .ogg file.

That sounds tricky... I can probably manage merging the WAVs and saving the offsets and durations, but the LSA format is a black box to me. Is there any (unofficial) documentation for the LSA format?

Re: How to compile .lsa pronunciation file from .wav?

Posted: Sun Dec 11, 2011 2:41 am
by dma_k
If GoldenDict could support archives for sounds (e.g. a complete sound directory structure packed with ZIP or 7-Zip), there would be no need for me to pack them into .lsa.

Re: How to compile .lsa pronunciation file from .wav?

Posted: Sun Dec 11, 2011 7:15 am
by Tvangeste
dma_k wrote:If GoldenDict could support archives for sounds (e.g. a complete sound directory structure packed with ZIP or 7-Zip), there would be no need for me to pack them into .lsa.

You may consider creating a DSL dictionary and attaching all the sound files to it in the form of a ZIP archive.

The DSL format is a very simple one. The cards for it can be as simple as this:

Code:
one
  [s]one.mp3[/s]
two
  [s]two.mp3[/s]
three
  [s]three.mp3[/s]


When you create a DSL dictionary, say dictionary.dsl, you can put all its media files into a plain ZIP archive named dictionary.dsl.files.zip.

Re: How to compile .lsa pronunciation file from .wav?

Posted: Thu Nov 15, 2012 11:07 am
by dma_k
Tvangeste wrote:The DSL format is a very simple one. The cards for it can be as simple as this:

Code:
one
  [s]one.mp3[/s]
two
  [s]two.mp3[/s]
three
  [s]three.mp3[/s]

When you create a DSL dictionary, say dictionary.dsl, you can put all its media files into a plain ZIP archive named dictionary.dsl.files.zip.

Thanks for the great advice. I have created the archive list.dsl.zip with the following content:
Code:
list.dsl
a/
a/a.wav
a/aback.wav
a/abandon.wav
a/abandoned.wav
a/abandonment.wav
...
z/zodiacal.wav
z/zone.wav
z/zoo.wav
z/zoological.wav
z/zoology.wav

The file list.dsl in that archive reads:
Code:
a
  [s]a/a.wav[/s]
aback
  [s]a/aback.wav[/s]
abandon
  [s]a/abandon.wav[/s]
abandoned
  [s]a/abandoned.wav[/s]
abandonment
  [s]a/abandonment.wav[/s]
...
zodiacal
  [s]z/zodiacal.wav[/s]
zone
  [s]z/zone.wav[/s]
zoo
  [s]z/zoo.wav[/s]
zoological
  [s]z/zoological.wav[/s]
zoology
  [s]z/zoology.wav[/s]

I have put list.dsl.zip into the content folder, but no pronunciation is available. I have also tried moving the list.dsl file into a separate compressed file and renaming the archive:
Code:
list.dsl.gz
list.dsl.files.zip

That does not work either. Where should it appear in the program's settings dialog: in the list of dictionaries or in the list of pronunciation sources?

Re: How to compile .lsa pronunciation file from .wav?

Posted: Thu Nov 15, 2012 11:21 am
by Tvangeste
You are probably missing the DSL header (where the name of the dictionary is specified, and from- and to- languages).

Take a look here for an example:

https://github.com/VVSiz/SampleDSL/blob ... sample.dsl

If everything is done right, you'll see your new dictionary in the list of all dictionaries.
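
For a large set of sounds, both the DSL file (header included) and the accompanying archive can be generated with a script. The following is only a rough Python sketch under several assumptions: the function name `build_dsl_dictionary` and the default dictionary name are made up for illustration, the header uses the usual `#NAME`, `#INDEX_LANGUAGE` and `#CONTENTS_LANGUAGE` directives with English assumed for both languages, and the file is written as UTF-16 with a byte-order mark, which is the encoding DSL files customarily use. The file name without its extension is taken as the headword, and the relative paths (such as a/a.wav) mirror the layout posted earlier in this thread.

```python
import zipfile
from pathlib import Path


def build_dsl_dictionary(sound_dir, dsl_path, name="Pronunciation (en)"):
    """Write a DSL file with one card per .wav file and zip the sounds beside it.

    Returns the path of the created <dsl name>.files.zip archive.
    """
    sound_dir = Path(sound_dir)
    dsl_path = Path(dsl_path)
    wavs = sorted(sound_dir.rglob("*.wav"))

    # Header: dictionary name plus source and target languages (assumed English).
    lines = [
        f'#NAME "{name}"',
        '#INDEX_LANGUAGE "English"',
        '#CONTENTS_LANGUAGE "English"',
        "",
    ]
    for wav in wavs:
        rel = wav.relative_to(sound_dir).as_posix()
        lines.append(wav.stem)             # headword, e.g. "zoo"
        lines.append(f"  [s]{rel}[/s]")    # indented card body with the sound tag

    # Python's "utf-16" codec prepends a byte-order mark.
    dsl_path.write_text("\n".join(lines) + "\n", encoding="utf-16")

    # Media archive named after the DSL file, e.g. list.dsl.files.zip.
    zip_path = dsl_path.with_name(dsl_path.name + ".files.zip")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for wav in wavs:
            zf.write(wav, wav.relative_to(sound_dir).as_posix())
    return zip_path
```

Calling `build_dsl_dictionary("sounds", "list.dsl")` would leave list.dsl and list.dsl.files.zip side by side, ready to be copied into a folder that GoldenDict scans for dictionaries.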

Re: How to compile .lsa pronunciation file from .wav?

Posted: Thu Nov 15, 2012 7:33 pm
by dma_k
Tvangeste wrote:You are probably missing the DSL header (where the name of the dictionary is specified, and from- and to- languages).

Thanks for this hint! Indeed, that was the problem.

What I have tried:
Code:
pronunciation_en.dsl.gz
Pronunciation_en.dsl.files.zip
With these names the dictionary is not recognized. If I just change the extension
Code:
pronunciation_en.dsl.dz
Pronunciation_en.dsl.files.zip
the dictionary is recognized, but GoldenDict crashes when the article is opened (when it tries to pronounce the word). Is it possible to add support for compressed DSL?

The next problem I've noticed is that some (all?) audio files from "Speech.lsa" are not played correctly. A WAV source requires the "DirectSound" option to be enabled, but with this option e.g. "zoo" from the default "Speech.lsa" is not pronounced, and "mother" produces noise.

Re: How to compile .lsa pronunciation file from .wav?

Posted: Thu Nov 15, 2012 7:49 pm
by Abs62
dma_k
GoldenDict, starting from EA build 1.0.1-333-g8e4b384, supports compressed sound packs (.zips). Just compress the WAV files into a ZIP archive and change its extension to "zips". GoldenDict handles such packs like .lsa files.
dma_k wrote:The next problem I've noticed is that some (all?) audio files are not played correctly from "Speech.lsa".

You can try playing sound files with an external program. See this post.
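
The .zips packing described above is easy to script. Here is a small Python sketch that writes the archive directly under the "zips" extension instead of renaming it afterwards. The function name `build_zips_pack` is made up for illustration, and keeping the relative paths of the files inside the archive is an assumption; how GoldenDict maps entry names to words is not stated here.

```python
import zipfile
from pathlib import Path


def build_zips_pack(sound_dir, pack_path):
    """Compress every .wav file under sound_dir into a .zips sound pack.

    Returns the number of files added to the pack.
    """
    sound_dir = Path(sound_dir)
    pack_path = Path(pack_path)
    if pack_path.suffix != ".zips":
        raise ValueError("pack file name should end with .zips")

    count = 0
    with zipfile.ZipFile(pack_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for wav in sorted(sound_dir.rglob("*.wav")):
            # Store paths relative to sound_dir, with forward slashes.
            zf.write(wav, wav.relative_to(sound_dir).as_posix())
            count += 1
    return count
```

The returned count can be compared with the number of words GoldenDict reports for the pack.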

Re: How to compile .lsa pronunciation file from .wav?

Posted: Fri Nov 16, 2012 8:04 pm
by dma_k
Abs62 wrote:dma_k
GoldenDict, starting from EA build 1.0.1-333-g8e4b384, supports compressed sound packs (.zips). Just compress the WAV files into a ZIP archive and change its extension to "zips". GoldenDict handles such packs like .lsa files.
dma_k wrote:The next problem I've noticed is that some (all?) audio files are not played correctly from "Speech.lsa".

You can try playing sound files with an external program. See this post.

Thanks! The external program works like a charm for both Speech.lsa and Forvo!

Build 353, which I downloaded, was able to find the "zips" archive with sounds and detected 18013 words (articles) in it. However, it does not show the words from it in the article (although matches from Speech.lsa and Forvo are shown OK).

Another question: what is the convention for spaces? Will a/as_well.wav be found for "as well", or should I put spaces into the filenames? Thanks.