Difference between revisions of "Creating the acoustic model yourself"
Latest revision as of 15:10, 22 February 2011
Setting up Speech Recognition
Setting up speech recognition has two parts. The first is recording the training data and creating the acoustic model; the second is running the speech recognition engine with that acoustic model.
Creating an Acoustic Model
The HTK toolkit (version 3.4) is used to create the acoustic model. Follow the steps below.
Step 1: Register with HTK
You will need to register with HTK at http://htk.eng.cam.ac.uk/register.shtml before you can download it.
Step 2: Download HTK sources
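Download the sources for HTK toolkit 3.4 from http://htk.eng.cam.ac.uk/ftp/software/HTK-3.4.tar.gz. Also download the HTK samples from http://htk.eng.cam.ac.uk/ftp/software/HTK-samples-3.4.tar.gz.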
Step 3
- Move to your home directory
cd ~
- Create a directory called 'bin'
mkdir bin
- Unpack the downloaded HTK toolkit sources into a folder called 'htk-3.4' in the 'bin' directory, and unpack the HTK samples into the 'bin' directory as well. The 'bin' directory should then contain the following
htk-3.4 samples
- Move the 'samples' folder to the 'htk-3.4' folder as follows
cd bin
mv samples htk-3.4
- If you have a newer version of the gcc compiler (version 4 or above), you will need to install gcc version 3.4 so that HTK will compile properly. Use the following command to see which gcc version is installed on your system
gcc -v
- If your gcc version is 4 or above, run the following commands to install gcc 3.4 and make it the default compiler
sudo apt-get install gcc-3.4
sudo rm /usr/bin/gcc
sudo ln -s /usr/bin/gcc-3.4 /usr/bin/gcc
NOTE - if the above does not work, the hardy ubuntu package repository may be missing from your sources.list file. In that case, do the following. If it does work, skip to the bullet point on installing the external dependencies.
sudo gedit /etc/apt/sources.list
- add the following line to the end of the sources.list file
deb http://de.archive.ubuntu.com/ubuntu/ hardy main universe
- now run the following command
sudo apt-get update
- This should pull the hardy ubuntu packages from the repository. You can now run the following commands.
sudo apt-get install gcc-3.4
sudo rm /usr/bin/gcc
sudo ln -s /usr/bin/gcc-3.4 /usr/bin/gcc
- Install the external dependencies as follows
sudo apt-get install libx11-dev libesd0-dev libasound2-dev libzip1 flex libncurses-dev
- Now move to the 'htk-3.4' directory and configure HTK as follows. Replace %yourusername% in the command with your user name.
cd htk-3.4
./configure --prefix=/home/%yourusername%/bin/htk-3.4
- Now run make all and make install. This should install the created binaries to the folder '/home/%yourusername%/bin/htk-3.4/bin'.
make all
make install
- Change directory back to home and create a folder called 'voxforge' and then a folder called 'HTK_scripts' in the voxforge folder.
cd ~
mkdir voxforge
cd voxforge
mkdir HTK_scripts
cd HTK_scripts
- Now copy some scripts from the 'htk-3.4/samples' folder to the 'HTK_scripts' folder as follows
cp ../../bin/htk-3.4/samples/RMHTK/perl_scripts/mkclscript.prl .
cp ../../bin/htk-3.4/samples/HTKTutorial/maketrihed .
cp ../../bin/htk-3.4/samples/HTKTutorial/prompts2mlf .
cp ../../bin/htk-3.4/samples/HTKTutorial/prompts2wlist .
- Your 'HTK_scripts' folder should contain the following
maketrihed mkclscript.prl prompts2mlf prompts2wlist
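The gcc version check in this step can also be scripted. The following is only a minimal sketch and not part of the original instructions: the function names gcc_major and needs_gcc34 are made up for illustration, and the script assumes that 'gcc -dumpversion' prints the installed version number.

```shell
#!/bin/sh
# Print the major version number from a version string such as "4.3.2".
gcc_major() {
    printf '%s\n' "${1%%.*}"
}

# Succeed when the given version is 4 or above, i.e. when this guide
# says gcc 3.4 has to be installed before building HTK.
needs_gcc34() {
    [ "$(gcc_major "$1")" -ge 4 ]
}

if command -v gcc >/dev/null 2>&1; then
    installed=$(gcc -dumpversion)
    if needs_gcc34 "$installed"; then
        echo "gcc $installed found: install gcc-3.4 before building HTK"
    else
        echo "gcc $installed found: no change needed"
    fi
else
    echo "gcc not found"
fi
```

The script only reports what it finds; installing gcc 3.4 and changing the /usr/bin/gcc link are still done by hand as described above.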
Step 4
- Now we will download Julius (version 4.1.5). We shall be using pre-compiled binaries, which can be downloaded from here
- Once downloaded extract them to your '/home/%yourusername%/bin' folder. After that is done your 'bin' folder should contain the following
htk-3.4 julius-4.1.5-linuxbin
Step 5
- Now you will need to update your user path which can be done as follows. First change to your home directory and edit the .bashrc file.
cd ~
gedit .bashrc
- Add the following to the end of the .bashrc file. Replace %yourusername% with your user name.
# HTK and JULIUS scripts and executables
PATH=$PATH:$HOME/bin:/home/%yourusername%/bin/htk-3.4/bin:/home/%yourusername%/bin/julius-4.1.5-linuxbin/bin
- Source your .bashrc file to reflect the changes
source ~/.bashrc
- Test if your HTK toolkit has been installed correctly by running the following command.
HVite -V
- You should see an output similar to the following.
HTK Version Information
Module   Version   Who    Date      : CVS Info
HVite    3.4       CUED   25/04/06  : $Id: HVite.c,v 1.1.1.1 2006/10/11 09:55:02 jal58 Exp $
HShell   3.4       CUED   25/04/06  : $Id: HShell.c,v 1.1.1.1 2006/10/11 09:54:58 jal58 Exp $
HMem     3.4       CUED   25/04/06  : $Id: HMem.c,v 1.1.1.1 2006/10/11 09:54:58 jal58 Exp $
HLabel   3.4       CUED   25/04/06  : $Id: HLabel.c,v 1.1.1.1 2006/10/11 09:54:57 jal58 Exp $
HMath    3.4       CUED   25/04/06  : $Id: HMath.c,v 1.1.1.1 2006/10/11 09:54:58 jal58 Exp $
HSigP    3.4       CUED   25/04/06  : $Id: HSigP.c,v 1.1.1.1 2006/10/11 09:54:58 jal58 Exp $
HWave    3.4       CUED   25/04/06  : $Id: HWave.c,v 1.1.1.1 2006/10/11 09:54:59 jal58 Exp $
HAudio   3.4       CUED   25/04/06  : $Id: HAudio.c,v 1.1.1.1 2006/10/11 09:54:57 jal58 Exp $
HVQ      3.4       CUED   25/04/06  : $Id: HVQ.c,v 1.1.1.1 2006/10/11 09:54:59 jal58 Exp $
HModel   3.4       CUED   25/04/06  : $Id: HModel.c,v 1.2 2006/12/07 11:09:08 mjfg Exp $
HParm    3.4       CUED   25/04/06  : $Id: HParm.c,v 1.1.1.1 2006/10/11 09:54:58 jal58 Exp $
HDict    3.4       CUED   25/04/06  : $Id: HDict.c,v 1.1.1.1 2006/10/11 09:54:57 jal58 Exp $
HNet     3.4       CUED   25/04/06  : $Id: HNet.c,v 1.1.1.1 2006/10/11 09:54:58 jal58 Exp $
HRec     3.4       CUED   25/04/06  : $Id: HRec.c,v 1.1.1.1 2006/10/11 09:54:58 jal58 Exp $
HUtil    3.4       CUED   25/04/06  : $Id: HUtil.c,v 1.1.1.1 2006/10/11 09:54:59 jal58 Exp $
HAdapt   3.4       CUED   25/04/06  : $Id: HAdapt.c,v 1.2 2006/12/07 11:09:07 mjfg Exp $
HMap     3.4       CUED   25/04/06  : $Id: HMap.c,v 1.1.1.1 2006/10/11 09:54:57 jal58 Exp $
- Test if Julius has been installed correctly by entering the following command in the terminal
julius-4.1.5
- You should see an output similar to the following
Julius rev.4.1.5 - based on JuliusLib rev.4.1.5 (fast) built for i686-pc-linux
Copyright (c) 1991-2009 Kawahara Lab., Kyoto University
Copyright (c) 1997-2000 Information-technology Promotion Agency, Japan
Copyright (c) 2000-2005 Shikano Lab., Nara Institute of Science and Technology
Copyright (c) 2005-2009 Julius project team, Nagoya Institute of Technology
Try '-setting' for built-in engine configuration.
Try '-help' for run time options.
- To switch back to your original gcc version, do the following (the original version in my case was 4.3; yours may differ)
sudo rm /usr/bin/gcc
sudo ln -s /usr/bin/gcc-4.3 /usr/bin/gcc
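The two installation tests in this step amount to checking that the HTK and Julius executables can be found on your PATH. As a rough sketch (the helper name missing_tools is hypothetical and not part of HTK or Julius), the check could be written as:

```shell
#!/bin/sh
# Print the name of every given command that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# HVite and julius-4.1.5 are the two executables tested in this step.
missing=$(missing_tools HVite julius-4.1.5)
if [ -z "$missing" ]; then
    echo "HTK and Julius are on the PATH"
else
    echo "Not found on the PATH:"
    echo "$missing"
fi
```

If a tool is reported as not found, recheck the PATH line in your .bashrc and source the file again.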
Step 6
Install Audacity as follows
sudo apt-get install audacity
Step 7
We will now compile the grammar and voca files.
- Create a folder called 'auto' in your '/home/%yourusername%/voxforge' directory
cd ~
cd voxforge
mkdir auto
cd auto
- Download the robotino.grammar (http://doc.openrobotino.org/download/SpeechRecognition/robotino.grammar) and robotino.voca (http://doc.openrobotino.org/download/SpeechRecognition/robotino.voca) files and save them in the 'auto' folder you just created. Your 'auto' folder should then contain the following
robotino.grammar robotino.voca
- Now compile the grammar and voca files to Julius files. Make sure you are in the 'auto' folder. Run the following command
mkdfa.pl robotino
KNOWN ERROR - if you get the following error while running the command above
/usr/X11R6/bin/perl: bad interpreter: No such file or directory
- Then open the mkdfa.pl file
gedit ~/bin/julius-4.1.5-linuxbin/bin/mkdfa.pl
- And change the first line from
#!/usr/X11R6/bin/perl
- To
#!/usr/bin/perl
- And run the command again
mkdfa.pl robotino
- You should see an output as follows
robotino.grammar has 12 rules
robotino.voca has 12 categories and 29 words
---
Now parsing grammar file
Now modifying grammar to minimize states[6]
Now parsing vocabulary file
Now making nondeterministic finite automaton[34/34]
Now making deterministic finite automaton[29/29]
Now making triplet list[29/29]
12 categories, 29 nodes, 38 arcs
-> minimized: 15 nodes, 24 arcs
---
generated: robotino.dfa robotino.term robotino.dict
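The interpreter fix described under KNOWN ERROR can also be applied from the terminal instead of in gedit. This is a sketch only: the function name fix_perl_shebang is invented for this example, and it assumes that Perl is installed at /usr/bin/perl, as in the manual fix above.

```shell
#!/bin/sh
# Replace the first line of the given file with '#!/usr/bin/perl', but
# only when that line is exactly the old '#!/usr/X11R6/bin/perl' path.
fix_perl_shebang() {
    file=$1
    first=$(head -n 1 "$file")
    if [ "$first" = '#!/usr/X11R6/bin/perl' ]; then
        tmp=$(mktemp) || return 1
        { echo '#!/usr/bin/perl'; tail -n +2 "$file"; } > "$tmp"
        # Write back through redirection so the file keeps its permissions.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    fi
}

script="$HOME/bin/julius-4.1.5-linuxbin/bin/mkdfa.pl"
if [ -f "$script" ]; then
    fix_perl_shebang "$script"
fi
```

Files whose first line is anything else are left untouched, so running the function a second time does no harm.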
Step 8
Now we shall proceed to the training and creation of the acoustic model.
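- Download the prompts file (http://doc.openrobotino.org/download/SpeechRecognition/prompts) and the codetrain.scp file (http://doc.openrobotino.org/download/SpeechRecognition/codetrain.scp) and save them in your '/home/%yourusername%/voxforge/auto' folder. Your 'voxforge/auto' folder should look like this
codetrain.scp robotino.dfa robotino.grammar robotino.voca
prompts robotino.dict robotino.term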
- Now create a folder called 'lexicon' in the 'voxforge' directory.
cd ~
cd voxforge
mkdir lexicon
- Download the voxforge_lexicon file (http://doc.openrobotino.org/download/SpeechRecognition/voxforge_lexicon) and save it in the 'voxforge/lexicon' folder you just created.
Step 9: Record the training data
- You must have a headset with a mic or a desktop boom mic, preferably the same mic that will be used for speech recognition on the robot. Built-in laptop or desktop mics are not recommended.
- Create a folder called 'train' in the 'voxforge/auto' folder and then a folder called 'wav' in the 'train' folder.
cd ~/voxforge
cd auto
mkdir train
cd train
mkdir wav
- Open the prompts file from the /home/%yourusername%/voxforge/auto folder in a text editor (for example gedit).
- Open Audacity and configure it as follows
- Under Edit>Preferences>Devices (or Audio I/O), make sure that 'Channels: 1 (Mono)' is selected in the 'Recording' section.
- Under Edit>Preferences>Quality, make sure that the 'Default Sample Rate' is set to '16000 Hz' and the 'Default Sample Format' is set to '16-bit'.
- If you need to configure your USB Headset then do the following, otherwise skip this bullet point
- Create a new text file called '.asoundrc' and open it in gedit as follows
gedit ~/.asoundrc
- Paste the following text in the file
pcm.!default {
    type asym
    playback.pcm {
        type plug
        slave.pcm "hw:1,0"
    }
    capture.pcm {
        type plug
        slave.pcm "hw:1,0"
    }
}
- Save the file and restart the computer.
- Make sure your microphone volume in Audacity is set to 1.0.
- Then click Record (i.e. the red circle button) and begin speaking in your normal voice for a few seconds, and then click Stop (i.e. the yellow square button). Look at the Waveform Display for the audio track you just created. The Vertical Ruler to the left of the Waveform Display provides you with a guide to your audio levels. Try to keep your recording levels between 0.5 and -0.5, averaging around 0.3 to -0.3. It is OK to have a few spikes go outside the 0.5 to -0.5 range, but avoid having any go beyond the 1.0 to -1.0 range, as this will generate distortion. If necessary, adjust Audacity's microphone volume to keep your audio within the proper ranges.
- To begin, you should not have any tracks displayed in the Audacity window. If you do, click the x icon at the top left of the audio track display (or hit ctrl-z as many times as required to remove them, or restart Audacity). If you don't, Audacity will record your new track and leave your old track untouched, and when you export your audio to a wav file, both tracks will be merged into it.
- Make sure your volumes are set properly, as outlined in the preceding section.
- Record your first file by clicking 'Record' in Audacity and saying the words in the first line of your prompts file:
ROBOTINO MOVE ROBOTINO ROTATE ROBOTINO STOP
- Speak normally - not too slow or too fast - and clearly. Leave a short pause (about half a second) before you begin speaking and after you have finished. Remember not to breathe out until you have clicked stop - most microphones pick up breathing noises.
- Click the 'Stop' icon when you are finished.
- Review your waveform to check that the highest peaks of your recording lie between 0.5 and 1.0 and the lowest peaks between -0.5 and -1.0. If they do, listen to the file (press 'Play' in Audacity) to make sure your pronunciation is clear and that you do not hear any non-speech noises (e.g. breathing noises, lip smacking or background noises). If there are any problems, hit ctrl-z and re-record your file.
- If the file sounds OK, click File>Export and make sure that the format is WAV signed 16 bit PCM. Name the file sample1 (for the first sentence) and save it in the 'train/wav' folder.
- Repeat the same procedure for the rest of the sentences in the prompts file.
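The recording settings used in this step (mono, 16000 Hz, 16-bit) can be checked by reading the header of each exported file. The sketch below is an illustration and not part of the original procedure: the function names are made up, and it assumes the plain 44-byte WAV header layout (channel count at byte 22, sample rate at byte 24, bits per sample at byte 34) and the '.wav' file extension. A file written with extra header chunks would be reported wrongly.

```shell
#!/bin/sh
# Read a little-endian unsigned integer of $3 bytes at byte offset $2
# of file $1, using od to print the bytes as decimal numbers.
read_le() {
    value=0
    mult=1
    for byte in $(od -An -tu1 -j "$2" -N "$3" "$1"); do
        value=$((value + byte * mult))
        mult=$((mult * 256))
    done
    echo "$value"
}

# Print "channels rate bits" for a WAV file with the plain 44-byte header.
wav_format() {
    echo "$(read_le "$1" 22 2) $(read_le "$1" 24 4) $(read_le "$1" 34 2)"
}

# Succeed when the file is mono, 16000 Hz, 16-bit.
check_wav() {
    [ "$(wav_format "$1")" = "1 16000 16" ]
}

for f in "$HOME"/voxforge/auto/train/wav/*.wav; do
    [ -f "$f" ] || continue
    if check_wav "$f"; then
        echo "ok: $f"
    else
        echo "check Audacity settings: $f ($(wav_format "$f"))"
    fi
done
```

Any file that is flagged should be re-exported after correcting the Audacity preferences described earlier in this step.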
Step 10: Running the script
- Create a new folder called 'scripts' in the 'voxforge/auto' folder
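cd ~/voxforge/auto
mkdir scripts
- Download the scripts archive from http://www.voxforge.org/uploads/zK/El/zKElosehAk3PGd4L3nhBUQ/scripts.tgz and extract it in the 'voxforge/auto/scripts' folder.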
- The scripts folder should look as follows
create_trainscp.pl HTK_Compile_Model.sh interim_files perlsort.pl
fixfulllist.pl input_files logs
- Now create a folder called 'mfcc' in the 'voxforge/auto/train' folder as follows
cd ~/voxforge/auto/train
mkdir mfcc
- Now run the script 'HTK_Compile_Model.sh' from the 'voxforge/auto/scripts' folder as follows
cd ~/voxforge/auto/scripts
./HTK_Compile_Model.sh
- The script should create two files, 'hmmdefs' and 'tiedlist', in the 'voxforge/auto/acoustic_model_files' folder.
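Before running the script it can help to confirm that every sentence in the prompts file has a recording. The following sketch makes two assumptions that the text above does not state: that each non-empty line of the prompts file corresponds to one recording, and that the recordings are named sample1.wav, sample2.wav and so on in line order. The function name missing_samples is hypothetical.

```shell
#!/bin/sh
# For each non-empty line N of the prompts file $1, print "sampleN.wav"
# when that file does not exist in the directory $2.
missing_samples() {
    prompts=$1
    wavdir=$2
    n=0
    while IFS= read -r line || [ -n "$line" ]; do
        [ -n "$line" ] || continue
        n=$((n + 1))
        [ -f "$wavdir/sample$n.wav" ] || echo "sample$n.wav"
    done < "$prompts"
}

auto="$HOME/voxforge/auto"
if [ -f "$auto/prompts" ]; then
    missing=$(missing_samples "$auto/prompts" "$auto/train/wav")
    if [ -z "$missing" ]; then
        echo "every prompt has a recording"
    else
        echo "recordings still missing:"
        echo "$missing"
    fi
fi
```

If your recordings follow a different naming scheme, the file name pattern inside the function has to be adapted accordingly.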
Step 11: Copying the acoustic model files to robotino
- Copy the 'hmmdefs' and 'tiedlist' files you just created into the '/etc/robotino/sr/julius/acoustic_model_files' folder, replacing the files that are already there.
- Robotino is now ready to recognize your speech input.
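The copy in this step could be scripted roughly as follows. This is a sketch under stated assumptions rather than an official procedure: the function name install_model is made up, and keeping a '.bak' copy of each replaced file is an extra precaution that the instructions above do not ask for.

```shell
#!/bin/sh
# Copy hmmdefs and tiedlist from directory $1 into directory $2.
# An existing file in $2 is first saved next to it with a .bak suffix.
install_model() {
    src=$1
    dst=$2
    # Refuse to copy anything unless both model files are present.
    for name in hmmdefs tiedlist; do
        [ -f "$src/$name" ] || { echo "missing: $src/$name" >&2; return 1; }
    done
    for name in hmmdefs tiedlist; do
        if [ -f "$dst/$name" ]; then
            cp "$dst/$name" "$dst/$name.bak" || return 1
        fi
        cp "$src/$name" "$dst/$name" || return 1
    done
}

# Example call (writing under /etc usually needs root privileges):
#   install_model "$HOME/voxforge/auto/acoustic_model_files" \
#       /etc/robotino/sr/julius/acoustic_model_files
```

The example call is left as a comment so that nothing under /etc is changed until you run it deliberately.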