How to Use Models from SphinxTrain in Sphinx-4
Using new models is easy: you just need to configure the recognizer properly. This usually involves three steps:
The dictionary and language model are usually located in <your_training_folder>/etc/ and have names like <your_model_name>.dic and <your_model_name>.lm.DMP. If you don't have a language model yet, you can create one with cmuclmtk and then convert it to DMP format with sphinx_lm_convert from the sphinxbase package.
Make the following changes in the language model and dictionary configuration so that it points to these files:
<component name="trigramModel" type="edu.cmu.sphinx.linguist.language.ngram.large.LargeTrigramModel">
    <property name="unigramWeight" value="0.7"/>
    <property name="maxDepth" value="3"/>
    <property name="logMath" value="logMath"/>
    <property name="dictionary" value="dictionary"/>
    <property name="location" value="the name of the language model file for example <your_training_folder>/etc/<your_model_name>.lm.DMP"/>
</component>
<component name="dictionary" type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
    <property name="dictionaryPath" value="the name of the dictionary file for example <your_training_folder>/etc/<your_model_name>.dic"/>
    <property name="fillerPath" value="the name of the filler file for example <your_training_folder>/etc/<your_model_name>.filler"/>
    <property name="addSilEndingPronunciation" value="false"/>
    <property name="allowMissingWords" value="false"/>
    <property name="unitManager" value="unitManager"/>
</component>
Next is the acoustic model. Training produces several models, but you need only one of them.
For a large-vocabulary task, use the cd (context-dependent) model, which is located in <your_training_folder>/model_parameters/<your_db_name>.cd_cont_<number of senones>.
For a small-vocabulary task, the ci (context-independent) model is enough. It is located in <your_training_folder>/model_parameters/<your_db_name>.ci_cont.
The folder should contain several files, such as means, variances, feat.params, and mdef. There will also be folders for different numbers of Gaussians, like _2, _4, and _8; these are intermediate results and you don't need them.
Again, define the model in the config file:
<component name="sphinx3Loader" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader">
    <property name="logMath" value="logMath"/>
    <property name="unitManager" value="unitManager"/>
    <property name="location" value="the path to the model folder for example <your_training_folder>/model_parameters/<your_model_name>.cd_cont_<senones>"/>
</component>
<component name="acousticModel" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.TiedStateAcousticModel">
    <property name="loader" value="sphinx3Loader"/>
    <property name="unitManager" value="unitManager"/>
</component>
Please note that the path value is just a URI, so it can start with a URI prefix such as http://.
Note that for an MLLT model you will probably also want to change the vectorLength property. Otherwise this is not needed.
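As a rough sketch of the MLLT case, the loader definition might carry the extra property as shown here. The value 29 is only an assumption for illustration; use the feature dimensionality that your own MLLT training produced. The location is a hypothetical placeholder path, including the senone count in the folder name.

```xml
<!-- Sketch only: the vectorLength value is assumed, not taken from a real model -->
<component name="sphinx3Loader" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader">
    <property name="logMath" value="logMath"/>
    <property name="unitManager" value="unitManager"/>
    <!-- Hypothetical path to the trained model folder -->
    <property name="location" value="/path/to/your_training_folder/model_parameters/your_model_name.cd_cont_1000"/>
    <!-- Assumed dimensionality of the MLLT-transformed feature vectors -->
    <property name="vectorLength" value="29"/>
</component>
```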
If you trained an 8 kHz model or an MLLT model, you need to change the frontend accordingly. Here are the required changes:
<component name="mfcFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
    <propertylist name="pipeline">
        ....
        <item>melFilterBank</item>
        ....
        <item>lda</item>
    </propertylist>
</component>
<component name="melFilterBank" type="edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank">
    <property name="numberFilters" value="31"/>
    <property name="minimumFrequency" value="200"/>
    <property name="maximumFrequency" value="3500"/>
</component>
<component name="lda" type="edu.cmu.sphinx.frontend.feature.LDA">
    <property name="loader" value="sphinx3Loader"/>
</component>
The melFilterBank parameters here are changed to the default values for 8 kHz audio, and the lda component is introduced to transform the feature space with the MLLT matrix.
For more information on configuration, see the Javadoc and the Programmer's Documentation.
Optionally, you can pack the models into a JAR file. The advantage is that the JAR file can simply be included in the classpath and referenced from the configuration file of a Sphinx-4 application. Once you have done so, don't forget to add the JAR to the classpath. To configure loading from JARs, Sphinx-4 allows URIs of the form resource:<acoustic or language model path>, which lets XML config files easily reference models in JAR files. The scheme resource:/path causes Sphinx-4 to search the classpath for the path. See our demos for an example of how the WSJ model files are loaded from the WSJ JAR.
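A sketch of what such references might look like is shown here. It assumes a hypothetical JAR on the classpath with a top-level folder my_model that holds the acoustic model files, a dictionary and a filler file. All names under resource:/ are placeholders, not files shipped with Sphinx-4.

```xml
<!-- Sketch only: my_model and the file names inside it are hypothetical -->
<component name="sphinx3Loader" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader">
    <property name="logMath" value="logMath"/>
    <property name="unitManager" value="unitManager"/>
    <!-- Acoustic model folder looked up on the classpath -->
    <property name="location" value="resource:/my_model"/>
</component>
<component name="dictionary" type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
    <!-- Dictionary and filler files looked up on the classpath -->
    <property name="dictionaryPath" value="resource:/my_model/my_model.dic"/>
    <property name="fillerPath" value="resource:/my_model/my_model.filler"/>
    <property name="addSilEndingPronunciation" value="false"/>
    <property name="allowMissingWords" value="false"/>
    <property name="unitManager" value="unitManager"/>
</component>
```

With this layout the same configuration file works wherever the JAR is on the classpath, since no absolute file system paths are involved.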