├── KaldiNotes.pdf ├── README.md ├── README.txt ├── _config.yml ├── _data └── menu.yml ├── _includes └── nav.html ├── _layouts └── default.html ├── fst-example ├── compileAndDraw.sh ├── composeExample.sh ├── dict.fst.txt ├── index.txt ├── makeSymbols.py ├── sent.fsa.txt └── simple.fsa.txt ├── images ├── body-bg.png ├── highlight-bg.jpg ├── hr.png ├── octocat-icon.png ├── tar-gz-icon.png └── zip-icon.png ├── index.html ├── install_notes.txt ├── javascripts └── main.js ├── params.json ├── required_knowledge.txt ├── resources.txt ├── stylesheets ├── print.css ├── pygment_trac.css └── stylesheet.css └── tidigits ├── 174o2o8a.png ├── 174o2o8aPhones.png ├── LGFST.png ├── TODO.md ├── data_prep.txt ├── eval.txt ├── grammerFST.png ├── index.txt ├── lang_prep.txt ├── lexiconFST.png └── train.txt /KaldiNotes.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oxinabox/Kaldi-Notes/845047ba0191440222338b1f3d2310b2e8e14df9/KaldiNotes.pdf -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: Kaldi Notes 4 | --- 5 | This repository backs a webpage. 6 | [Go to the Webpage](http://oxinabox.github.io/Kaldi-Notes/) 7 | 8 | There is however useful executable scripts here so you might want to be on this page. 9 | 10 | -------------------------------------------------------------------------------- /README.txt: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: Kaldi Notes 4 | --- 5 | This repository backs a webpage. 6 | [Go to the Webpage](http://oxinabox.github.io/Kaldi-Notes/) 7 | 8 | There is however useful executable scripts here so you might want to be on this page. 9 | 10 | -------------------------------------------------------------------------------- /_config.yml: -------------------------------------------------------------------------------- 1 | baseurl: /Kaldi-Notes 2 | 3 | markdown_ext: "markdown,mkdown,mkdn,mkd,md,txt" 4 | markdown: redcarpet 5 | 6 | redcarpet: 7 | extensions: [smart] 8 | 9 | -------------------------------------------------------------------------------- /_data/menu.yml: -------------------------------------------------------------------------------- 1 | - text: Other Resources 2 | url: /resources 3 | - text: Required Knowledge 4 | url: /required_knowledge 5 | - text: Installing Kaldi 6 | url: /install_notes 7 | - text: OpenFST 8 | url: /fst-example 9 | 10 | - text: TIDIGITS 11 | url: /tidigits 12 | subitems: 13 | - text: Data Preparation 14 | url: /tidigits/data_prep 15 | - text: Language Preparation 16 | url: /tidigits/lang_prep 17 | - text: Training 18 | url: /tidigits/train 19 | - text: Evaluation 20 | url: /tidigits/eval 21 | -------------------------------------------------------------------------------- /_includes/nav.html: -------------------------------------------------------------------------------- 1 | {% assign navurl = page.url | remove: 'index.html' %} 2 | 34 | 35 | -------------------------------------------------------------------------------- /_layouts/default.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 13 | {% if page.title %} {{ page.title }} | {% endif %} 14 | 15 | 16 | 17 | 18 | 21 | 22 |
23 |
24 | 25 |
26 |

Kaldi-notes

27 |

Some notes on Kaldi

28 |
29 | 30 |
31 | {{ content }} 32 |
33 | 34 | 38 | 39 | 40 |
41 |
42 | 43 | 44 | -------------------------------------------------------------------------------- /fst-example/compileAndDraw.sh: -------------------------------------------------------------------------------- 1 | #!bash 2 | #Create the example using this script 3 | if [[ $# -eq 0 ]] ; then 4 | echo 'One argument must be given of the form \"name.fst.txt\" or \"name.fsa.txt\"'; 5 | exit 1; 6 | fi 7 | 8 | IFS="."; #Make . the split character 9 | inputFilenameParts=($1); 10 | IFS=" "; #Put it back to default space 11 | 12 | name=${inputFilenameParts[0]}; 13 | type=${inputFilenameParts[1]}; 14 | 15 | echo 'Preparing: '$name; 16 | 17 | fsTextFile="${name}.${type}.txt"; 18 | fsFile="$name.${type}"; 19 | osymsFile="${name}.osyms"; 20 | isymsFile="${name}.isyms"; 21 | svgOutputFile="${name}.svg"; 22 | 23 | isymbols="--isymbols=${isymsFile}"; 24 | osymbols="--osymbols=${osymsFile}"; 25 | 26 | 27 | python makeSymbols.py $fsTextFile 2 > $isymsFile; 28 | 29 | if [ $type = "fst" ] ; then 30 | python makeSymbols.py $fsTextFile 3 > $osymsFile; 31 | fstcompile $isymbols $osymbols --keep_isymbols --keep_osymbols $fsTextFile $fsFile; 32 | fstdraw --portrait $fsFile | dot -Tsvg > $svgOutputFile; 33 | elif [ $type = "fsa" ] ; then 34 | fstcompile --acceptor $isymbols --keep_isymbols $fsTextFile $fsFile; 35 | fstdraw --portrait $fsFile | dot -Tsvg > $svgOutputFile; 36 | else 37 | echo "Filetype: ${type} not recognitsed. Recognised types are fst=finite state trasducer and fsa=finite state acceptor"; 38 | fi 39 | 40 | echo 'Done, outputted: ' $svgOutputFile 41 | 42 | -------------------------------------------------------------------------------- /fst-example/composeExample.sh: -------------------------------------------------------------------------------- 1 | #!bin/sh 2 | 3 | # Based on http://www.isle.illinois.edu/sst/courses/minicourses/2009/lecture6.pdf 4 | 5 | bash compileAndDraw.sh sent.fsa 6 | bash compileAndDraw.sh dict.fst 7 | 8 | fstcompose --fst_compat_symbols=false sent.fsa dict.fst > strings.fst 9 | fstdraw --portrait strings.fst | dot -Tsvg > strings.svg 10 | echo 'Done composing: outputted strings.svg' 11 | echo 'Example sentences:' 12 | echo '------------------' 13 | 14 | for i in `seq 1 10`; 15 | do 16 | fstrandgen --seed=$RANDOM strings.fst | fstproject --project_output | 17 | fstprint --acceptor --isymbols=dict.syms | 18 | awk '{printf("%s ",$3)}END{printf("\n")}' 19 | done 20 | -------------------------------------------------------------------------------- /fst-example/dict.fst.txt: -------------------------------------------------------------------------------- 1 | 0 0 DET the 2 | 0 0 DET a 3 | 0 0 N cat 4 | 0 0 N dog 5 | 0 0 N mouse 6 | 0 0 V chased 7 | 0 0 V bit 8 | 0 9 | -------------------------------------------------------------------------------- /fst-example/index.txt: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: Introduction to OpenFST 4 | --- 5 | 6 | 7 | #Introduction to Finite State Transducers 8 | Weighted Finite State Transducers is a generalisations of finite state machines. 9 | They can be used for many purposed, including implementing algorithms that are hard to write out otherwise -- such as HMMs, as well as for the representation of knowledge -- similar to a grammar. 
10 | 
11 | ##Other places to get information
12 | - A decent set of slides can be found [here](http://www.gavo.t.u-tokyo.ac.jp/~novakj/wfst-algorithms.pdf)
13 | - [The OpenFst documentation](http://www.openfst.org/twiki/bin/view/FST/FstQuickTour) and the [FST Examples](http://www.openfst.org/twiki/bin/view/FST/FstExamples) are not awful, though the shell and C++ sections are intermixed.
14 | - [Speech Recognition with Weighted Finite-state Transducers](http://www.cs.nyu.edu/~mohri/pub/hbka.pdf), a book chapter.
15 | 
16 | ![A FST for TIDIGITS](../tidigits/lexiconFST.png)
17 | 
18 | Above: An FST for pronouncing the digits 1-9 and the two pronunciations of zero, "oh" (o) and "zero" (z), as used in TIDIGITS.
19 | 
20 | 
21 | ##Terminology
22 | ###Symbols and Strings
23 | Symbols come from some alphabet.
24 | They could be letters, words, phonemes, etc.
25 | 
26 | A string is a sequence of symbols from an alphabet; it can also be the empty string.
27 | Matching the examples above, a string could be a word (spelt out), a sentence, a word (spelt out phonetically), etc.
28 | 
29 | A string can be represented as a Finite State Acceptor, where each symbol labels the transition from one state to the next.
30 | 
31 | ###Finite State Acceptor (FSA)
32 | A Finite State Acceptor has the following components:
33 | 
34 | - a number of states
35 | - one or more of which are initial
36 | - one or more of which are terminal
37 | - connections (edges) between states, each with an input symbol (i.e. a label)
38 | - the symbol can be the empty string (often written "-", "" or "ε")
39 | - the mapping from label to next state is not necessarily one to one (i.e. it can be nondeterministic)
40 | 
41 | An FSA can be used to check whether a string matches its pattern -- it is computationally equivalent to a regular expression.
42 | It can also be used to generate strings which match that pattern.
43 | 
44 | FSAs can be treated as FSTs with the same input and output symbol on each edge.
45 | Kaldi example scripts sometimes write them this way.
46 | 
47 | ###Finite State Transducers (FST)
48 | A Finite State Transducer extends the Finite State Acceptor with the addition of:
49 | 
50 | - output labels on each edge
51 | - again, the output can be the empty string.
52 | - it is common (as in the TIDIGITS example above) for only the first transition in a nonbranching chain to carry an output label -- the remaining transitions output nothing, and only serve to confirm that the input really is in that chain (which it might not be).
53 | - The input alphabet and output alphabet do not have to be the same, and normally are not.
54 | 
55 | An FST can be used to translate strings in its input alphabet into strings in its output alphabet, provided the input string matches the FST's structure of allowed transitions.
56 | Thus if an FSA accepting strings of the FST's input alphabet is composed with it, the FST translates that FSA.
57 | A series of FSTs can be composed, translating alphabet to (matching) alphabet, to get the desired output.
58 | 
59 | 
60 | ###Weighted Finite State Acceptor/Transducer
61 | As per the unweighted versions, but with a weight associated with each edge (in addition to the input symbol, and the output symbol for transducers).
62 | The weights have ⊕ and ⊗ operations defined on them (a semiring),
63 | so that the combined weight of alternatives (⊕) and the cumulative weight along a path (⊗) can be found. For example (a small arithmetic sketch follows the list):
64 | 
65 | - e.g. the weight along a path is the product of probabilities, and represents the probability of that input string.
66 | - e.g. the sum of the weights of two alternative edges is the probability of taking either of those alternatives.
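To make the ⊕/⊗ bookkeeping concrete, here is a small arithmetic sketch. It assumes OpenFST's default "tropical" semiring, in which weights are costs (typically negative log probabilities), ⊗ is ordinary addition and ⊕ is min; the arc weights themselves are invented for illustration.

```
# Tropical semiring (OpenFST's default): weights are costs, ⊗ = +, ⊕ = min
# One path through an FSA, with arc costs 1.0, 0.5, 1.0, 1.0:
#   path weight = 1.0 ⊗ 0.5 ⊗ 1.0 ⊗ 1.0 = 1.0 + 0.5 + 1.0 + 1.0 = 3.5
# Two alternative paths, with total costs 3.5 and 4.2:
#   combined weight = 3.5 ⊕ 4.2 = min(3.5, 4.2) = 3.5   (keep the cheaper path)
```

If the weights were plain probabilities instead, ⊗ would be multiplication and ⊕ addition, matching the two bullet-point examples above.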
67 | 
68 | 
69 | 
70 | #Finite State Transducers in Kaldi
71 | 
72 | Kaldi uses FSTs (and FSAs) as its common knowledge representation throughout -- for grammars, lexicons, and the decoding graphs built from them.
73 | 
74 | 
75 | #OpenFST
76 | 
77 | ##Filetypes
78 | 
79 | ###Textual FST/FSA definition: `.fst.txt`, `.fsa.txt`, `.txt`
80 | The textual representation of a finite state transducer or finite state acceptor respectively.
81 | These are the files you write to get things done -- they describe your system.
82 | 
83 | In most of Kaldi the `.fst.txt`/`.fsa.txt` naming is used; in other places it is just called `.txt`. In this document it is always referred to by the former terms.
84 | 
85 | 
86 | #### Line format:
87 | Normal line: `fromState toState inSymbol [outSymbol] [weight]`
88 | Terminal state line: `terminalState`
89 | 
90 | - `fromState`, `toState`, and `terminalState` are integer state labels
91 | - `inSymbol`, `outSymbol` are textual strings: the names of symbols from the respective input and output alphabets.
92 | - `outSymbol` should not be present in FSAs, and should always be present in FSTs
93 | - `weight` is a decimal number indicating the weight of the edge. It must be present in weighted FSTs/FSAs
94 | 
95 | ###Symbol table file: `.isyms`, `.osyms`, `.syms`, `.dict`, `.txt`
96 | OpenFst likes to refer to symbols by a positive integer.
97 | Since any finite alphabet is isomorphic to a subset of the positive integers,
98 | such a bijection exists, and can be created by simply enumerating the symbols.
99 | 
100 | For each FST you should have two of these files, one for the input alphabet and one for the output alphabet. For an FSA you should have only one -- for the input alphabet. Under most circumstances these can be generated programmatically from the `.fst.txt`/`.fsa.txt`. One such script is provided here in [makeSymbols.py](./makeSymbols.py). Others exist throughout the Kaldi example scripts, often as AWK one-liners.
101 | 
102 | In different places different extensions are used.
103 | The example [compileAndDraw.sh](./compileAndDraw.sh) script uses `.isyms` for the symbol file generated from the input alphabet of the textual FST/FSA description, and `.osyms` for the one generated from the output alphabet.
104 | 
105 | 
106 | ####Line Format:
107 | `symbol integer`
108 | 
109 | - `symbol` is a symbol from the alphabet being mapped
110 | - `integer` is a unique positive integer (that is to say, each integer appears only once in this file).
111 | 
112 | ### Binary FST/FSA: `.fst`, `.fsa`
113 | This is the binary representation of the finite state transducer/acceptor.
114 | It is produced from the textual representation and symbol tables using
115 | `fstcompile`.
116 | 
117 | ### Graph of FST/FSA: `.dot`
118 | This is a [Graph Description Language file](http://en.wikipedia.org/wiki/DOT_%28graph_description_language%29), produced by `fstdraw`.
119 | Piping it through `dot` converts it into another, more common format.
120 | E.g. `cat example.dot | dot -Tsvg > example.svg` will convert `example.dot` to an SVG file.
121 | This is often done directly from the line that calls `fstdraw`.
122 | 
123 | ##OpenFST components
124 | OpenFST is made up of several different command line applications.
125 | The three most used in Kaldi are detailed briefly below.
126 | 
127 | 
128 | 
129 | ###Common conventions
130 | 
131 | ####Input and Output
132 | OpenFST commands which take a single input and produce a single output
133 | (such as `fstdraw` and `fstcompile`)
134 | have the basic usage of
135 | 
136 | ```
137 | fstcommand [FLAGS] [inputfile [outputfile]]
138 | ```
139 | 
140 | Which is to say, an `inputfile` can optionally be provided,
141 | and if it is, then an `outputfile` can optionally be provided as well.
142 | 
143 | If either is missing then input is taken from standard input (i.e. piped in, or read from the keyboard if there is no input pipe),
144 | and output is sent to standard output (i.e. piped out, or printed to the terminal if there is no output pipe), respectively.
145 | 
146 | 
147 | ####Accessing Help (manpages)
148 | Because OpenFST is not properly installed, it does not have entries in the man pages.
149 | To get help with a command use:
150 | 
151 | ```
152 | fstcommand --help | less
153 | ```
154 | ###Compile: `fstcompile`
155 | This converts a textual FST/FSA into a binary one.
156 | 
157 | - FSA Usage: `fstcompile --acceptor --isymbols=<symbol file> [--keep_isymbols]`
158 | - FST Usage: `fstcompile --isymbols=<input symbol file> --osymbols=<output symbol file> [--keep_isymbols] [--keep_osymbols]`
159 | 
160 | Flags:
161 | 
162 | - `--acceptor`: compiles it as an FSA, rather than an FST
163 | - `--isymbols=<file>`, `--osymbols=<file>`: specify the input and output symbol tables
164 | - `--keep_isymbols`, `--keep_osymbols`: If set, the symbol tables are kept in the binary file and do not need to be specified at later steps such as `fstdraw`
165 | 
166 | ###Draw: `fstdraw`
167 | Produces a `.dot` graph file from a binary FST/FSA.
168 | 
169 | - FSA Usage: `fstdraw --acceptor --portrait [--isymbols=<symbol file>]`
170 | - FST Usage: `fstdraw --portrait [--isymbols=<input symbol file>] [--osymbols=<output symbol file>]`
171 | - Common use example: `cat eg.fst | fstdraw --portrait --isymbols=eg.isyms --osymbols=eg.osyms | dot -Tsvg > eg.svg`
172 | 
173 | Flags:
174 | 
175 | - `--portrait`: this flag should **always** be set. If it is not set the image comes out rotated 90 degrees, on an overly large canvas.
176 | - `--isymbols`, `--osymbols`: as before, but if not provided the symbols in the graphic will be replaced by their numeric representation, unless `--keep_isymbols` or `--keep_osymbols` was set in the compile step
177 | - `--acceptor`: draws an FSA rather than an FST. Without it, the FSA's edges will also be given output labels.
178 | 
179 | ###Compose: `fstcompose`
180 | Composes an FSA/FST with an FST.
181 | 
182 | - Usage: `fstcompose [--fst_compat_symbols=false] inner.[fst|fsa] outer.fst output.fst`
183 | 
184 | Applying an input to the output FST is equivalent to first applying it to the inner one, then applying the output of that to the outer one, i.e. `output(x) = outer(inner(x))`.
185 | 
186 | - `--fst_compat_symbols=false`: setting this to false (it defaults to true) may be required when composing FSTs/FSAs compiled with `--keep_isymbols` or `--keep_osymbols`, where the embedded symbol tables are actually compatible but are not the same files (it seems to store the filenames, which can be seen by running `strings` on an fst).
187 | 
188 | 
189 | 
190 | ###Other useful Commands
191 | All the commands in OpenFst have a use.
192 | Other commands which I have found particularly useful,
193 | but do not have space to detail, include:
194 | 
195 | - `fstsymbols`: manipulates and exports the symbol tables in a binary FST/FSA
196 | - `fstproject`: converts an FST into an FSA in either the input or output space, by discarding the other set of labels
197 | 
198 | #Examples provided here
199 | 
200 | Several scripts are provided here to demonstrate how to make use of OpenFST,
201 | and to make using it easier.
202 | They can be downloaded from the [Git repository backing this site](https://github.com/oxinabox/Kaldi-Notes/tree/gh-pages/fst-example).
203 | The section names below are also hyperlinks to download those scripts/files.
204 | 
205 | ###NOTE:
206 | The example scripts assume the OpenFST binaries are in your `PATH`.
207 | If you added all the Kaldi binaries to your path during the install step you will already have them.
208 | Otherwise you can add just the OpenFST binaries:
209 | add to your `.bashrc` (or similar) `PATH="<...>/kaldi-trunk/tools/openfst/bin:${PATH}"`, where `<...>` is the path to the directory containing `kaldi-trunk`,
210 | then `source ~/.bashrc`.
211 | 
212 | 
213 | ###[makeSymbols.py](makeSymbols.py)
214 | [makeSymbols.py](makeSymbols.py) is a script to make creating the symbol tables (which map symbols to arbitrary unique integers) easier.
215 | 
216 | Usage: `python makeSymbols.py file fieldNumber`
217 | 
218 | - `file`: the textual FST/FSA file (usually `.fst.txt` or `.fsa.txt`) to extract the symbols from
219 | - `fieldNumber`: which column of the file to take symbols from
220 | - input symbols use a `fieldNumber` of 2
221 | - output symbols use a `fieldNumber` of 3
222 | 
223 | The symbol table is output to standard out, and can be piped into a file.
224 | 
225 | ###[compileAndDraw.sh](compileAndDraw.sh)
226 | [compileAndDraw.sh](compileAndDraw.sh) is a simple bash script that runs the whole process of compiling and then drawing an FST/FSA.
227 | 
228 | Usage FSA: `bash compileAndDraw.sh filename.fsa.txt`
229 | Usage FST: `bash compileAndDraw.sh filename.fst.txt`
230 | 
231 | Note: unlike the OpenFST programs, this script is file-extension sensitive.
232 | It will make the appropriate calls for an FSA or an FST based on the extension.
233 | 
234 | ###[composeExample.sh](composeExample.sh)
235 | The [composeExample.sh](composeExample.sh) script runs through the creation and then the composition of `dict.fst` and `sent.fsa`. It then outputs some sentences generated using the language model described. (A manual walk-through of the same pipeline is sketched below, after the example files.)
236 | 
237 | Usage: `bash composeExample.sh`
238 | 
239 | ##Example FSTs/FSAs
240 | This folder contains 3 example files.
241 | The latter two examples, of sentence construction, are based on ones provided in [these lecture notes](http://www.isle.illinois.edu/sst/courses/minicourses/2009/lecture6.pdf).
242 | 
243 | ###[simple.fsa.txt](./simple.fsa.txt)
244 | [simple.fsa.txt](./simple.fsa.txt) is a very simple Finite State Acceptor.
245 | 
246 | ###[dict.fst.txt](./dict.fst.txt)
247 | [dict.fst.txt](./dict.fst.txt) is a dictionary containing several words. There is only one state in the dictionary -- as far as it is concerned, words can come in any order.
248 | 
249 | ###[sent.fsa.txt](./sent.fsa.txt)
250 | [sent.fsa.txt](./sent.fsa.txt) is a grammar for a simple sentence, expressed as a finite state acceptor. Sentences can either be `determiner noun verb` or `determiner noun verb determiner noun`.
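For readers who prefer to see the pipeline spelt out, the following is a manual walk-through of roughly what `composeExample.sh` does, using the commands described above. It assumes the OpenFST binaries and `dot` are on your `PATH`; note that the `fstprint` step follows `composeExample.sh` in reading a `dict.syms` symbol table, whereas `compileAndDraw.sh` writes the output symbols to `dict.osyms`, so you may need to adjust that filename.

```
# Compile (and draw) the grammar acceptor and the dictionary transducer
bash compileAndDraw.sh sent.fsa
bash compileAndDraw.sh dict.fst

# Compose them: the result maps category strings allowed by the grammar to word strings
fstcompose --fst_compat_symbols=false sent.fsa dict.fst > strings.fst
fstdraw --portrait strings.fst | dot -Tsvg > strings.svg

# Generate one random sentence: pick a random path, keep only the output labels, print them
fstrandgen --seed=$RANDOM strings.fst | fstproject --project_output |
    fstprint --acceptor --isymbols=dict.syms |
    awk '{printf("%s ", $3)} END{printf("\n")}'
```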
251 | 252 | 253 | 254 | -------------------------------------------------------------------------------- /fst-example/makeSymbols.py: -------------------------------------------------------------------------------- 1 | 2 | import sys 3 | 4 | 5 | if len(sys.argv)<3 or '-h' in sys.argv or '--help' in sys.argv: 6 | print(""" 7 | Usage: python makeSymbols file fieldNumber 8 | 9 | file: the textual FST/FSA file (.fst.txt or .fsa.txt usually), to extract the symbols from 10 | fieldNumber: which column of the file to take symbols fro 11 | input symbols use fieldNumber of 2 12 | output symbolss use fieldNumber of 3 13 | 14 | The Symbols Table is output to standard out, and can be piped into a file 15 | """) 16 | 17 | 18 | words=set() 19 | index = int(sys.argv[2]) 20 | 21 | with open(sys.argv[1], 'r') as ff: 22 | for line in ff: 23 | fields = line.split(' ') 24 | if len(fields)>index: 25 | field = fields[index].strip() 26 | if (field): 27 | words.add(field) 28 | 29 | print("- 0") #alway have the empty string as 0 30 | words.discard('-') 31 | for (ii,word) in enumerate(words,1): 32 | print("%s %d" % (word,ii)) 33 | 34 | 35 | 36 | 37 | -------------------------------------------------------------------------------- /fst-example/sent.fsa.txt: -------------------------------------------------------------------------------- 1 | 0 1 DET 2 | 1 2 N 3 | 2 3 V 4 | 3 4 DET 5 | 4 5 N 6 | 5 7 | 3 8 | -------------------------------------------------------------------------------- /fst-example/simple.fsa.txt: -------------------------------------------------------------------------------- 1 | 0 1 The 1 2 | 1 2 person 0.5 3 | 1 3 people 0.5 4 | 2 4 is 1 5 | 3 4 are 1 6 | 4 5 mad 1 7 | 5 8 | 9 | -------------------------------------------------------------------------------- /images/body-bg.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oxinabox/Kaldi-Notes/845047ba0191440222338b1f3d2310b2e8e14df9/images/body-bg.png -------------------------------------------------------------------------------- /images/highlight-bg.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oxinabox/Kaldi-Notes/845047ba0191440222338b1f3d2310b2e8e14df9/images/highlight-bg.jpg -------------------------------------------------------------------------------- /images/hr.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oxinabox/Kaldi-Notes/845047ba0191440222338b1f3d2310b2e8e14df9/images/hr.png -------------------------------------------------------------------------------- /images/octocat-icon.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oxinabox/Kaldi-Notes/845047ba0191440222338b1f3d2310b2e8e14df9/images/octocat-icon.png -------------------------------------------------------------------------------- /images/tar-gz-icon.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oxinabox/Kaldi-Notes/845047ba0191440222338b1f3d2310b2e8e14df9/images/tar-gz-icon.png -------------------------------------------------------------------------------- /images/zip-icon.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oxinabox/Kaldi-Notes/845047ba0191440222338b1f3d2310b2e8e14df9/images/zip-icon.png 
-------------------------------------------------------------------------------- /index.html: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | --- 4 | 5 |
6 |
7 | 8 |
9 | View on GitHub 10 |
11 | 12 |
13 | 14 |
15 |

This is an introduction to speech recognition using Kaldi. 16 | Follow one of the links to get started. 17 | 18 |

A PDF snapshot of this site/manual is available. Be aware that all links within the PDF go to the website.

21 | 22 | 23 |
24 |
25 | -------------------------------------------------------------------------------- /install_notes.txt: -------------------------------------------------------------------------------- 1 | --- 2 | 3 | layout: default 4 | title: Installing Kaldi 5 | --- 6 | 7 | #Installing Kaldi 8 | 9 | ##Building Kaldi 10 | Follow the [official instructions](http://kaldi.sourceforge.net/tutorial_setup.html) 11 | 12 | Do not forget to first `cd` to `/kaldi-trunk/tools` then do a `make -j 8` to build to tools kaldi usesi. 13 | 14 | ###Issues 15 | 16 | - Issue: Requires libtool 17 | - Resolution: Build and install libtool, I did so by installing from [source](http://ftpmirror.gnu.org/libtool/libtool-2.4.5.tar.gz) locally (via configure --prefix) 18 | - Resolution (alternative): Install with package manage (`apt-get`) 19 | - then try and build kaldi, first running .configure 20 | - Issue: Needs a BLAS. 21 | - Resolution: use OpenBlas 22 | - go back to `/kaldi-trunk/tools` and `make -j 8 openblas` 23 | - use it by running `./configure --openblas-root=../tools/OpenBLAS/install` 24 | - Issue: this Kaldi won't run with GCC 4.8.4 25 | - Resolution: install newer GCC from source 26 | - Follow instructions from [GCC website](https://gcc.gnu.org/wiki/InstallingGCC) 27 | - in particular for getting the dependencies 28 | - When it comes to running configure use: `../gcc-4.9.2/configure --disable-multilib` 29 | - when doing use `make -j 8` or it will take a very long time to build 30 | - Resolution (alternative) : install from backports 31 | 32 | ##Installing Graph Viewer 33 | 34 | For viewing the output of fstdraw, you need to convert it into a useful format. To do this you need `dot` which is part of graphviz. `apt-get install graphvis` 35 | 36 | ##Adding things to your path 37 | Since Kaldi has not been install to any location -- just built in place. 38 | Nothing is on your path. 39 | 40 | The build process, spreads out all the binaries into a number of folders in `\kaldi-trunk\src/*bin`, 41 | intermixing them with the source files. (so you can't just add the bin files to your path). 42 | 43 | You might like to symlink all executables into one folder and add it to your path. 44 | The symlinking can be done with the following shell script: 45 | 46 | ``` 47 | for a in `find . -type f -executable -print`; 48 | do 49 | ln -s `pwd`/$a bins 50 | done 51 | ``` 52 | This will put them all into the bins directory. 53 | Then you can edit your `.bashrc` file to add that to your path. 54 | e.g.: 55 | 56 | ``` 57 | PATH="/user/data7/20361362/kaldi/kaldi-trunk/src/bins:${PATH}" 58 | ``` 59 | 60 | 61 | 62 | -------------------------------------------------------------------------------- /javascripts/main.js: -------------------------------------------------------------------------------- 1 | console.log('This would be the main JS file.'); 2 | -------------------------------------------------------------------------------- /params.json: -------------------------------------------------------------------------------- 1 | {"name":"Kaldi-notes","tagline":"Some notes on Kaldi","body":"### Welcome to GitHub Pages.\r\nThis automatic page generator is the easiest way to create beautiful pages for all of your projects. Author your page content here using GitHub Flavored Markdown, select a template crafted by a designer, and publish. 
After your page is generated, you can check out the new branch:\r\n\r\n```\r\n$ cd your_repo_root/repo_name\r\n$ git fetch origin\r\n$ git checkout gh-pages\r\n```\r\n\r\nIf you're using the GitHub for Mac, simply sync your repository and you'll see the new branch.\r\n\r\n### Designer Templates\r\nWe've crafted some handsome templates for you to use. Go ahead and continue to layouts to browse through them. You can easily go back to edit your page before publishing. After publishing your page, you can revisit the page generator and switch to another theme. Your Page content will be preserved if it remained markdown format.\r\n\r\n### Rather Drive Stick?\r\nIf you prefer to not use the automatic generator, push a branch named `gh-pages` to your repository to create a page manually. In addition to supporting regular HTML content, GitHub Pages support Jekyll, a simple, blog aware static site generator written by our own Tom Preston-Werner. Jekyll makes it easy to create site-wide headers and footers without having to copy them across every page. It also offers intelligent blog support and other advanced templating features.\r\n\r\n### Authors and Contributors\r\nYou can @mention a GitHub username to generate a link to their profile. The resulting `` element will link to the contributor's GitHub Profile. For example: In 2007, Chris Wanstrath (@defunkt), PJ Hyett (@pjhyett), and Tom Preston-Werner (@mojombo) founded GitHub.\r\n\r\n### Support or Contact\r\nHaving trouble with Pages? Check out the documentation at https://help.github.com/pages or contact support@github.com and we’ll help you sort it out.\r\n","google":"","note":"Don't delete this file! It's used internally to help with page regeneration."} -------------------------------------------------------------------------------- /required_knowledge.txt: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: Prerequisite Knowledge 4 | --- 5 | 6 | #Required Knowledge 7 | 8 | To make use of Kaldi, there is some significant prior required knowledge. 9 | 10 | - Bash 11 | - Awk 12 | - Perl 13 | - Python 14 | 15 | While Kaldi is made in C++, knowledge of C++ is not required to use it. 16 | 17 | ##Bash 18 | Example run scripts in Kaldi are written in Bash. 19 | Not POSIX shell, but Bash, they contain some "Bashisms", which will not reliably work in other shells. 20 | 21 | Bash is the main glue that holds Kaldi all together. 22 | It is used to prepare the data, 23 | prepare the language files, 24 | trigger the training, 25 | and print the results. 26 | 27 | Knowing how to at least read bash is a must for using Kaldi. 28 | The others languages can be picked up as required, but bash is a must if you want to make use of the example scripts. 29 | You most likely *do* want to make use of the example scripts, they are some serious part of the documentation, and exist for most datasets -- doing a lot of the work for you 30 | 31 | Bash is however quiet easy for anyone who has work on the unix shell. 32 | 33 | Key knowledge areas: 34 | 35 | - Conditionals 36 | - Loops 37 | - Pipelines / input/output redirection 38 | - variables 39 | - the PATH 40 | 41 | ##Awk 42 | Awk is on of the standard tools for doing string manipulation on the command line, 43 | along with sed, grep, inline perl and simpler tools like tr, cut, head etc. 44 | 45 | It is used a lot in the preparing of fst file inputs. A lot of inline Awk can be found in the aforementioned bash recipes. 
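For a flavour of what that inline Awk looks like, here is a hedged sketch (not taken from any particular recipe) of the kind of one-liner used to build an OpenFST symbol table -- mapping each symbol seen in a textual FST to a unique integer, with 0 reserved for the epsilon symbol "-" -- here applied to the `dict.fst.txt` example from the OpenFST section of these notes:

```
# Take the 3rd field (the input symbol) of each arc line, deduplicate,
# and number the symbols from 1, printing "- 0" for epsilon first.
awk 'NF > 2 { print $3 }' dict.fst.txt | sort -u | grep -v '^-$' |
    awk 'BEGIN { print "- 0" } { print $1, NR }' > dict.isyms
```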
46 | 47 | It is a bit more complicated to understand than sed or grep, in that rather than a regex-tool it is a programming language. 48 | Many decent tutorials exist online. 49 | 50 | ##Perl 51 | Perl carries out a lot of the heavy lifting of kaldi setup. 52 | It is used for preparing data, where it gets to complex for Bash+Awk. 53 | It is used to facilitate the parallel (and/or distributed) task execution. 54 | It shows up throughout the examples. 55 | 56 | ##Python 57 | Python rarely shows up in the example scripts for kaldi, but it does show up. 58 | When it does, it is doing a task similar to those perl is used for. 59 | It is worth knowing, and using, as it is not a good idea to unleash more perl scripts into the wild. 60 | Combining in tools like [Plumbum](https://pypi.python.org/pypi/plumbum), it could also be used to replace Bash -- this however is less portable. 61 | Other tools like [pyp](http://code.google.com/p/pyp/), can be used to replace Awk -- again losing portability. 62 | 63 | -------------------------------------------------------------------------------- /resources.txt: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: Other Resources 4 | --- 5 | #Other Resources 6 | This document is quiet interlinked to other resources, as appropriate for each section and subsection. These and some father resources are summeriest below. 7 | 8 | ##OpenFST 9 | 10 | - [OpenFST Documentation](http://www.openfst.org/twiki/bin/view/FST/WebHome) 11 | - [Quick Tour](http://www.openfst.org/twiki/bin/view/FST/FstQuickTour) 12 | - [Examples](http://www.openfst.org/twiki/bin/view/FST/FstExamples) 13 | - [Tutorial Sheet from University of Illinois](http://www.isle.illinois.edu/sst/courses/minicourses/2009/lecture6.pdf) 14 | - [Lecture Slides from University of Tokyo](http://www.gavo.t.u-tokyo.ac.jp/~novakj/wfst-algorithms.pdf) 15 | - [Speech Recognition with Weighted Finite-state Transducers (book chapter)](http://www.cs.nyu.edu/~mohri/pub/hbka.pdf) 16 | 17 | 18 | ##Kaldi 19 | 20 | - [Kaldi Documentation](http://kaldi.sourceforge.net/) 21 | - [Tutorial](http://kaldi.sourceforge.net/tutorial.html) 22 | - Several pages by Vassil Panayotov 23 | - [This blog post on Graph construction](http://vpanayotov.blogspot.com.au/2012/06/kaldi-decoding-graph-construction.html) 24 | - [These instructions on recipe setup](http://vpanayotov.blogspot.com.au/2012/02/poor-mans-kaldi-recipe-setup.html) 25 | - [This tutorial](http://analytcz.com/kaldi-hybrid-mlphmm-asr-2/) by the same author extends the above. 
But its web hosting does not seem stable, right now the [google cached version can be used](http://webcache.googleusercontent.com/search?q=cache:z-MGlCv917sJ:analytcz.com/kaldi-hybrid-mlphmm-asr-2/) 26 | - [This Masters Thesis](https://github.com/oplatek/kaldi-thesis/blob/master/text/tags/oplatek_thesis013.pdf?raw=true) 27 | - [Lecture slides from National Taiwan Normal University](http://berlin.csie.ntnu.edu.tw/Courses/Speech%20Recognition/Lectures2013/SP2013F_Lecture14-Introduction%20to%20the%20Kaldi%20toolkit.pdf) 28 | - [Lecture Slides from Dan Povey (one of kaldi's creators)](http://danielpovey.com/kaldi-lectures.html) 29 | -------------------------------------------------------------------------------- /stylesheets/print.css: -------------------------------------------------------------------------------- 1 | html, body, div, span, applet, object, iframe, 2 | h1, h2, h3, h4, h5, h6, p, blockquote, pre, 3 | a, abbr, acronym, address, big, cite, code, 4 | del, dfn, em, img, ins, kbd, q, s, samp, 5 | small, strike, strong, sub, sup, tt, var, 6 | b, u, i, center, 7 | dl, dt, dd, ol, ul, li, 8 | fieldset, form, label, legend, 9 | table, caption, tbody, tfoot, thead, tr, th, td, 10 | article, aside, canvas, details, embed, 11 | figure, figcaption, footer, header, hgroup, 12 | menu, nav, output, ruby, section, summary, 13 | time, mark, audio, video { 14 | padding: 0; 15 | margin: 0; 16 | font: inherit; 17 | font-size: 100%; 18 | vertical-align: baseline; 19 | border: 0; 20 | } 21 | /* HTML5 display-role reset for older browsers */ 22 | article, aside, details, figcaption, figure, 23 | footer, header, hgroup, menu, nav, section { 24 | display: block; 25 | } 26 | body { 27 | line-height: 1; 28 | } 29 | ol, ul { 30 | list-style: none; 31 | } 32 | blockquote, q { 33 | quotes: none; 34 | } 35 | blockquote:before, blockquote:after, 36 | q:before, q:after { 37 | content: ''; 38 | content: none; 39 | } 40 | table { 41 | border-spacing: 0; 42 | border-collapse: collapse; 43 | } 44 | body { 45 | font-family: 'Helvetica Neue', Helvetica, Arial, serif; 46 | font-size: 13px; 47 | line-height: 1.5; 48 | color: #000; 49 | } 50 | 51 | a { 52 | font-weight: bold; 53 | color: #d5000d; 54 | } 55 | 56 | header { 57 | padding-top: 35px; 58 | padding-bottom: 10px; 59 | } 60 | 61 | header h1 { 62 | font-size: 48px; 63 | font-weight: bold; 64 | line-height: 1.2; 65 | color: #303030; 66 | letter-spacing: -1px; 67 | } 68 | 69 | header h2 { 70 | font-size: 24px; 71 | font-weight: normal; 72 | line-height: 1.3; 73 | color: #aaa; 74 | letter-spacing: -1px; 75 | } 76 | #downloads { 77 | display: none; 78 | } 79 | #main_content { 80 | padding-top: 20px; 81 | } 82 | 83 | code, pre { 84 | margin-bottom: 30px; 85 | font-family: Monaco, "Bitstream Vera Sans Mono", "Lucida Console", Terminal; 86 | font-size: 12px; 87 | color: #222; 88 | } 89 | 90 | code { 91 | padding: 0 3px; 92 | } 93 | 94 | pre { 95 | padding: 20px; 96 | overflow: auto; 97 | border: solid 1px #ddd; 98 | } 99 | pre code { 100 | padding: 0; 101 | } 102 | 103 | ul, ol, dl { 104 | margin-bottom: 20px; 105 | } 106 | 107 | 108 | /* COMMON STYLES */ 109 | 110 | table { 111 | width: 100%; 112 | border: 1px solid #ebebeb; 113 | } 114 | 115 | th { 116 | font-weight: 500; 117 | } 118 | 119 | td { 120 | font-weight: 300; 121 | text-align: center; 122 | border: 1px solid #ebebeb; 123 | } 124 | 125 | form { 126 | padding: 20px; 127 | background: #f2f2f2; 128 | 129 | } 130 | 131 | 132 | /* GENERAL ELEMENT TYPE STYLES */ 133 | 134 | h1 { 135 | font-size: 2.8em; 136 | } 137 
| 138 | h2 { 139 | margin-bottom: 8px; 140 | font-size: 22px; 141 | font-weight: bold; 142 | color: #303030; 143 | } 144 | 145 | h3 { 146 | margin-bottom: 8px; 147 | font-size: 18px; 148 | font-weight: bold; 149 | color: #d5000d; 150 | } 151 | 152 | h4 { 153 | font-size: 16px; 154 | font-weight: bold; 155 | color: #303030; 156 | } 157 | 158 | h5 { 159 | font-size: 1em; 160 | color: #303030; 161 | } 162 | 163 | h6 { 164 | font-size: .8em; 165 | color: #303030; 166 | } 167 | 168 | p { 169 | margin-bottom: 20px; 170 | font-weight: 300; 171 | } 172 | 173 | a { 174 | text-decoration: none; 175 | } 176 | 177 | p a { 178 | font-weight: 400; 179 | } 180 | 181 | blockquote { 182 | padding: 0 0 0 30px; 183 | margin-bottom: 20px; 184 | font-size: 1.6em; 185 | border-left: 10px solid #e9e9e9; 186 | } 187 | 188 | ul li { 189 | list-style-position: inside; 190 | list-style: disc; 191 | padding-left: 20px; 192 | } 193 | 194 | ol li { 195 | list-style-position: inside; 196 | list-style: decimal; 197 | padding-left: 3px; 198 | } 199 | 200 | dl dd { 201 | font-style: italic; 202 | font-weight: 100; 203 | } 204 | 205 | footer { 206 | padding-top: 20px; 207 | padding-bottom: 30px; 208 | margin-top: 40px; 209 | font-size: 13px; 210 | color: #aaa; 211 | } 212 | 213 | footer a { 214 | color: #666; 215 | } 216 | 217 | /* MISC */ 218 | .clearfix:after { 219 | display: block; 220 | height: 0; 221 | clear: both; 222 | visibility: hidden; 223 | content: '.'; 224 | } 225 | 226 | .clearfix {display: inline-block;} 227 | * html .clearfix {height: 1%;} 228 | .clearfix {display: block;} 229 | -------------------------------------------------------------------------------- /stylesheets/pygment_trac.css: -------------------------------------------------------------------------------- 1 | .highlight { background: #ffffff; } 2 | .highlight .c { color: #999988; font-style: italic } /* Comment */ 3 | .highlight .err { color: #a61717; background-color: #e3d2d2 } /* Error */ 4 | .highlight .k { font-weight: bold } /* Keyword */ 5 | .highlight .o { font-weight: bold } /* Operator */ 6 | .highlight .cm { color: #999988; font-style: italic } /* Comment.Multiline */ 7 | .highlight .cp { color: #999999; font-weight: bold } /* Comment.Preproc */ 8 | .highlight .c1 { color: #999988; font-style: italic } /* Comment.Single */ 9 | .highlight .cs { color: #999999; font-weight: bold; font-style: italic } /* Comment.Special */ 10 | .highlight .gd { color: #000000; background-color: #ffdddd } /* Generic.Deleted */ 11 | .highlight .gd .x { color: #000000; background-color: #ffaaaa } /* Generic.Deleted.Specific */ 12 | .highlight .ge { font-style: italic } /* Generic.Emph */ 13 | .highlight .gr { color: #aa0000 } /* Generic.Error */ 14 | .highlight .gh { color: #999999 } /* Generic.Heading */ 15 | .highlight .gi { color: #000000; background-color: #ddffdd } /* Generic.Inserted */ 16 | .highlight .gi .x { color: #000000; background-color: #aaffaa } /* Generic.Inserted.Specific */ 17 | .highlight .go { color: #888888 } /* Generic.Output */ 18 | .highlight .gp { color: #555555 } /* Generic.Prompt */ 19 | .highlight .gs { font-weight: bold } /* Generic.Strong */ 20 | .highlight .gu { color: #800080; font-weight: bold; } /* Generic.Subheading */ 21 | .highlight .gt { color: #aa0000 } /* Generic.Traceback */ 22 | .highlight .kc { font-weight: bold } /* Keyword.Constant */ 23 | .highlight .kd { font-weight: bold } /* Keyword.Declaration */ 24 | .highlight .kn { font-weight: bold } /* Keyword.Namespace */ 25 | .highlight .kp { font-weight: bold } /* 
Keyword.Pseudo */ 26 | .highlight .kr { font-weight: bold } /* Keyword.Reserved */ 27 | .highlight .kt { color: #445588; font-weight: bold } /* Keyword.Type */ 28 | .highlight .m { color: #009999 } /* Literal.Number */ 29 | .highlight .s { color: #d14 } /* Literal.String */ 30 | .highlight .na { color: #008080 } /* Name.Attribute */ 31 | .highlight .nb { color: #0086B3 } /* Name.Builtin */ 32 | .highlight .nc { color: #445588; font-weight: bold } /* Name.Class */ 33 | .highlight .no { color: #008080 } /* Name.Constant */ 34 | .highlight .ni { color: #800080 } /* Name.Entity */ 35 | .highlight .ne { color: #990000; font-weight: bold } /* Name.Exception */ 36 | .highlight .nf { color: #990000; font-weight: bold } /* Name.Function */ 37 | .highlight .nn { color: #555555 } /* Name.Namespace */ 38 | .highlight .nt { color: #000080 } /* Name.Tag */ 39 | .highlight .nv { color: #008080 } /* Name.Variable */ 40 | .highlight .ow { font-weight: bold } /* Operator.Word */ 41 | .highlight .w { color: #bbbbbb } /* Text.Whitespace */ 42 | .highlight .mf { color: #009999 } /* Literal.Number.Float */ 43 | .highlight .mh { color: #009999 } /* Literal.Number.Hex */ 44 | .highlight .mi { color: #009999 } /* Literal.Number.Integer */ 45 | .highlight .mo { color: #009999 } /* Literal.Number.Oct */ 46 | .highlight .sb { color: #d14 } /* Literal.String.Backtick */ 47 | .highlight .sc { color: #d14 } /* Literal.String.Char */ 48 | .highlight .sd { color: #d14 } /* Literal.String.Doc */ 49 | .highlight .s2 { color: #d14 } /* Literal.String.Double */ 50 | .highlight .se { color: #d14 } /* Literal.String.Escape */ 51 | .highlight .sh { color: #d14 } /* Literal.String.Heredoc */ 52 | .highlight .si { color: #d14 } /* Literal.String.Interpol */ 53 | .highlight .sx { color: #d14 } /* Literal.String.Other */ 54 | .highlight .sr { color: #009926 } /* Literal.String.Regex */ 55 | .highlight .s1 { color: #d14 } /* Literal.String.Single */ 56 | .highlight .ss { color: #990073 } /* Literal.String.Symbol */ 57 | .highlight .bp { color: #999999 } /* Name.Builtin.Pseudo */ 58 | .highlight .vc { color: #008080 } /* Name.Variable.Class */ 59 | .highlight .vg { color: #008080 } /* Name.Variable.Global */ 60 | .highlight .vi { color: #008080 } /* Name.Variable.Instance */ 61 | .highlight .il { color: #009999 } /* Literal.Number.Integer.Long */ 62 | 63 | .type-csharp .highlight .k { color: #0000FF } 64 | .type-csharp .highlight .kt { color: #0000FF } 65 | .type-csharp .highlight .nf { color: #000000; font-weight: normal } 66 | .type-csharp .highlight .nc { color: #2B91AF } 67 | .type-csharp .highlight .nn { color: #000000 } 68 | .type-csharp .highlight .s { color: #A31515 } 69 | .type-csharp .highlight .sc { color: #A31515 } 70 | -------------------------------------------------------------------------------- /stylesheets/stylesheet.css: -------------------------------------------------------------------------------- 1 | /* http://meyerweb.com/eric/tools/css/reset/ 2 | v2.0 | 20110126 3 | License: none (public domain) 4 | */ 5 | html, body, div, span, applet, object, iframe, 6 | h1, h2, h3, h4, h5, h6, p, blockquote, pre, 7 | a, abbr, acronym, address, big, cite, code, 8 | del, dfn, em, img, ins, kbd, q, s, samp, 9 | small, strike, strong, sub, sup, tt, var, 10 | b, u, i, center, 11 | dl, dt, dd, ol, ul, li, 12 | fieldset, form, label, legend, 13 | table, caption, tbody, tfoot, thead, tr, th, td, 14 | article, aside, canvas, details, embed, 15 | figure, figcaption, footer, header, hgroup, 16 | menu, nav, output, ruby, section, 
summary, 17 | time, mark, audio, video { 18 | padding: 0; 19 | margin: 0; 20 | font: inherit; 21 | font-size: 100%; 22 | vertical-align: baseline; 23 | border: 0; 24 | } 25 | /* HTML5 display-role reset for older browsers */ 26 | article, aside, details, figcaption, figure, 27 | footer, header, hgroup, menu, nav, section { 28 | display: block; 29 | } 30 | body { 31 | line-height: 1; 32 | } 33 | ol, ul { 34 | list-style: none; 35 | } 36 | blockquote, q { 37 | quotes: none; 38 | } 39 | blockquote:before, blockquote:after, 40 | q:before, q:after { 41 | content: ''; 42 | content: none; 43 | } 44 | table { 45 | border-spacing: 0; 46 | border-collapse: collapse; 47 | } 48 | 49 | /* LAYOUT STYLES */ 50 | body { 51 | font-family: 'Helvetica Neue', Helvetica, Arial, serif; 52 | font-size: 1em; 53 | line-height: 1.5; 54 | color: #6d6d6d; 55 | text-shadow: 0 1px 0 rgba(255, 255, 255, 0.8); 56 | background: #e7e7e7 url(../images/body-bg.png) 0 0 repeat; 57 | } 58 | 59 | 60 | a { 61 | color: #d5000d; 62 | text-decoration: underline; 63 | } 64 | a:hover { 65 | color: #a5000a; 66 | } 67 | 68 | 69 | header { 70 | padding-top: 35px; 71 | padding-bottom: 25px; 72 | } 73 | 74 | header h1 { 75 | font-family: 'Chivo', 'Helvetica Neue', Helvetica, Arial, serif; 76 | font-size: 48px; font-weight: 900; 77 | line-height: 1.2; 78 | color: #303030; 79 | letter-spacing: -1px; 80 | } 81 | 82 | header h2 { 83 | font-size: 24px; 84 | font-weight: normal; 85 | line-height: 1.3; 86 | color: #aaa; 87 | letter-spacing: -1px; 88 | } 89 | 90 | #container { 91 | min-height: 595px; 92 | background: transparent url(../images/highlight-bg.jpg) 50% 0 no-repeat; 93 | } 94 | 95 | .inner { 96 | width: 620px; 97 | margin: 0 auto; 98 | } 99 | 100 | #container .inner img { 101 | max-width: 100%; 102 | } 103 | 104 | #downloads { 105 | margin-bottom: 40px; 106 | } 107 | 108 | #nav { 109 | float: left; 110 | position: fixed; 111 | top: 50%; 112 | background-color: #6D6D6D; 113 | color: #D5000D; 114 | padding-top: 20px; 115 | padding-bottom: 20px; 116 | padding-right: 20px; 117 | text-shadow: none; 118 | border-width: 1px 1px 1px 0px; 119 | border-style: solid; 120 | border-top-right-radius: 10px; 121 | border-bottom-right-radius: 10px; 122 | } 123 | 124 | .navtop { 125 | font-weight: bold; 126 | } 127 | 128 | 129 | .navsub{ 130 | font-weight: normal; 131 | } 132 | 133 | 134 | a.button { 135 | display: block; 136 | float: left; 137 | width: 179px; 138 | padding: 12px 8px 12px 8px; 139 | margin-right: 14px; 140 | font-size: 15px; 141 | font-weight: bold; 142 | line-height: 25px; 143 | color: #303030; 144 | background: #fdfdfd; /* Old browsers */ 145 | background: -moz-linear-gradient(top, #fdfdfd 0%, #f2f2f2 100%); /* FF3.6+ */ 146 | background: -webkit-gradient(linear, left top, left bottom, color-stop(0%,#fdfdfd), color-stop(100%,#f2f2f2)); /* Chrome,Safari4+ */ 147 | background: -webkit-linear-gradient(top, #fdfdfd 0%,#f2f2f2 100%); /* Chrome10+,Safari5.1+ */ 148 | background: -o-linear-gradient(top, #fdfdfd 0%,#f2f2f2 100%); /* Opera 11.10+ */ 149 | background: -ms-linear-gradient(top, #fdfdfd 0%,#f2f2f2 100%); /* IE10+ */ 150 | background: linear-gradient(top, #fdfdfd 0%,#f2f2f2 100%); /* W3C */ 151 | filter: progid:DXImageTransform.Microsoft.gradient( startColorstr='#fdfdfd', endColorstr='#f2f2f2',GradientType=0 ); /* IE6-9 */ 152 | border-top: solid 1px #cbcbcb; 153 | border-right: solid 1px #b7b7b7; 154 | border-bottom: solid 1px #b3b3b3; 155 | border-left: solid 1px #b7b7b7; 156 | border-radius: 30px; 157 | -webkit-box-shadow: 10px 
10px 5px #888; 158 | -moz-box-shadow: 10px 10px 5px #888; 159 | box-shadow: 0px 1px 5px #e8e8e8; 160 | -moz-border-radius: 30px; 161 | -webkit-border-radius: 30px; 162 | } 163 | a.button:hover { 164 | background: #fafafa; /* Old browsers */ 165 | background: -moz-linear-gradient(top, #fdfdfd 0%, #f6f6f6 100%); /* FF3.6+ */ 166 | background: -webkit-gradient(linear, left top, left bottom, color-stop(0%,#fdfdfd), color-stop(100%,#f6f6f6)); /* Chrome,Safari4+ */ 167 | background: -webkit-linear-gradient(top, #fdfdfd 0%,#f6f6f6 100%); /* Chrome10+,Safari5.1+ */ 168 | background: -o-linear-gradient(top, #fdfdfd 0%,#f6f6f6 100%); /* Opera 11.10+ */ 169 | background: -ms-linear-gradient(top, #fdfdfd 0%,#f6f6f6 100%); /* IE10+ */ 170 | background: linear-gradient(top, #fdfdfd 0%,#f6f6f6, 100%); /* W3C */ 171 | filter: progid:DXImageTransform.Microsoft.gradient( startColorstr='#fdfdfd', endColorstr='#f6f6f6',GradientType=0 ); /* IE6-9 */ 172 | border-top: solid 1px #b7b7b7; 173 | border-right: solid 1px #b3b3b3; 174 | border-bottom: solid 1px #b3b3b3; 175 | border-left: solid 1px #b3b3b3; 176 | } 177 | 178 | a.button span { 179 | display: block; 180 | height: 23px; 181 | padding-left: 50px; 182 | } 183 | 184 | #download-zip span { 185 | background: transparent url(../images/zip-icon.png) 12px 50% no-repeat; 186 | } 187 | #download-tar-gz span { 188 | background: transparent url(../images/tar-gz-icon.png) 12px 50% no-repeat; 189 | } 190 | #view-on-github span { 191 | background: transparent url(../images/octocat-icon.png) 12px 50% no-repeat; 192 | } 193 | #view-on-github { 194 | margin-right: 0; 195 | } 196 | 197 | code, pre { 198 | margin-bottom: 30px; 199 | font-family: Monaco, "Bitstream Vera Sans Mono", "Lucida Console", Terminal, monospace; 200 | font-size: 14px; 201 | color: #222; 202 | } 203 | 204 | code { 205 | padding: 0 3px; 206 | background-color: #f2f2f2; 207 | border: solid 1px #ddd; 208 | } 209 | 210 | pre { 211 | padding: 20px; 212 | overflow: auto; 213 | color: #f2f2f2; 214 | text-shadow: none; 215 | background: #303030; 216 | } 217 | pre code { 218 | padding: 0; 219 | color: #f2f2f2; 220 | background-color: #303030; 221 | border: none; 222 | } 223 | 224 | ul, ol, dl { 225 | margin-bottom: 20px; 226 | } 227 | 228 | 229 | /* COMMON STYLES */ 230 | 231 | hr { 232 | height: 1px; 233 | padding-bottom: 1em; 234 | margin-top: 1em; 235 | line-height: 1px; 236 | background: transparent url('../images/hr.png') 50% 0 no-repeat; 237 | border: none; 238 | } 239 | 240 | strong { 241 | font-weight: bold; 242 | } 243 | 244 | em { 245 | font-style: italic; 246 | } 247 | 248 | table { 249 | width: 100%; 250 | border: 1px solid #ebebeb; 251 | } 252 | 253 | th { 254 | font-weight: 500; 255 | } 256 | 257 | td { 258 | font-weight: 300; 259 | text-align: center; 260 | border: 1px solid #ebebeb; 261 | } 262 | 263 | form { 264 | padding: 20px; 265 | background: #f2f2f2; 266 | 267 | } 268 | 269 | 270 | /* GENERAL ELEMENT TYPE STYLES */ 271 | 272 | h1 { 273 | font-size: 32px; 274 | } 275 | 276 | h2 { 277 | margin-bottom: 8px; 278 | font-size: 22px; 279 | font-weight: bold; 280 | color: #303030; 281 | } 282 | 283 | h3 { 284 | margin-bottom: 8px; 285 | font-size: 18px; 286 | font-weight: bold; 287 | color: #d5000d; 288 | } 289 | 290 | h4 { 291 | font-size: 16px; 292 | font-weight: bold; 293 | color: #303030; 294 | } 295 | 296 | h5 { 297 | font-size: 1em; 298 | color: #303030; 299 | } 300 | 301 | h6 { 302 | font-size: .8em; 303 | color: #303030; 304 | } 305 | 306 | p { 307 | margin-bottom: 20px; 308 | 
font-weight: 300; 309 | } 310 | 311 | 312 | p a { 313 | font-weight: 400; 314 | } 315 | 316 | blockquote { 317 | padding: 0 0 0 30px; 318 | margin-bottom: 20px; 319 | font-style: italic; 320 | border-left: 10px solid #e9e9e9; 321 | } 322 | 323 | ul li { 324 | list-style-position: inside; 325 | list-style: disc !important; //HACK: for nesing UL in OL 326 | padding-left: 20px; 327 | } 328 | 329 | ol li { 330 | list-style-position: inside; 331 | list-style: decimal; 332 | padding-left: 3px; 333 | } 334 | 335 | ul > li { 336 | margin-left: 20px; 337 | } 338 | 339 | dl dt { 340 | color: #303030; 341 | } 342 | 343 | footer { 344 | padding-top: 20px; 345 | padding-bottom: 30px; 346 | margin-top: 40px; 347 | font-size: 13px; 348 | color: #aaa; 349 | background: transparent url('../images/hr.png') 0 0 no-repeat; 350 | } 351 | 352 | footer a { 353 | color: #666; 354 | } 355 | footer a:hover { 356 | color: #444; 357 | } 358 | 359 | /* MISC */ 360 | .clearfix:after { 361 | display: block; 362 | height: 0; 363 | clear: both; 364 | visibility: hidden; 365 | content: '.'; 366 | } 367 | 368 | .clearfix {display: inline-block;} 369 | * html .clearfix {height: 1%;} 370 | .clearfix {display: block;} 371 | 372 | /* #Media Queries 373 | ================================================== */ 374 | 375 | /* Smaller than standard 960 (devices and browsers) */ 376 | @media only screen and (max-width: 959px) { } 377 | 378 | /* Tablet Portrait size to standard 960 (devices and browsers) */ 379 | @media only screen and (min-width: 768px) and (max-width: 959px) { } 380 | 381 | /* All Mobile Sizes (devices and browser) */ 382 | @media only screen and (max-width: 767px) { 383 | header { 384 | padding-top: 10px; 385 | padding-bottom: 10px; 386 | } 387 | #downloads { 388 | margin-bottom: 25px; 389 | } 390 | #download-zip, #download-tar-gz { 391 | display: none; 392 | } 393 | .inner { 394 | width: 94%; 395 | margin: 0 auto; 396 | } 397 | } 398 | 399 | /* Mobile Landscape Size to Tablet Portrait (devices and browsers) */ 400 | @media only screen and (min-width: 480px) and (max-width: 767px) { } 401 | 402 | /* Mobile Portrait Size to Mobile Landscape Size (devices and browsers) */ 403 | @media only screen and (max-width: 479px) { } 404 | -------------------------------------------------------------------------------- /tidigits/174o2o8a.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oxinabox/Kaldi-Notes/845047ba0191440222338b1f3d2310b2e8e14df9/tidigits/174o2o8a.png -------------------------------------------------------------------------------- /tidigits/174o2o8aPhones.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oxinabox/Kaldi-Notes/845047ba0191440222338b1f3d2310b2e8e14df9/tidigits/174o2o8aPhones.png -------------------------------------------------------------------------------- /tidigits/LGFST.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oxinabox/Kaldi-Notes/845047ba0191440222338b1f3d2310b2e8e14df9/tidigits/LGFST.png -------------------------------------------------------------------------------- /tidigits/TODO.md: -------------------------------------------------------------------------------- 1 | - Include more examples at all stages 2 | - Check constancy in path references. 
3 | - Spelling check 4 | -------------------------------------------------------------------------------- /tidigits/data_prep.txt: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: Data Preparation 4 | --- 5 | 6 | #Data Preparation 7 | [The official kaldi documentation on this section](http://kaldi.sourceforge.net/data_prep.html#data_prep_data). It is the basis of a lot of this section. 8 | 9 | These steps are carried out by the script `local/tidigits_data_prep.sh`. It takes one parameter -- the path to the dataset. 10 | 11 | One should realize after looking at this section (and the next), just how valuable AWK and Bash (or equivalents) are for this task. 12 | 13 | 14 | ##Locate the Dataset 15 | on on the SIP network, the TIDIGITs data set can be found at `/user/data14/res/speech_data/TIDIGITs/`. Symlink it into a convenient location. 16 | 17 | ###Split the Dataset into test and training 18 | TIDIGITS is already split into test and training datasets. 19 | If it were not, you would need to do the split. 20 | It could be done at any time during the data preparation step, 21 | depending on when other useful informations (from the annotations), 22 | is available. 23 | 24 | ##Parse its annotations 25 | Annotations of the correct labels for each utterance need to be generated for the `test` and `training` directories. 26 | 27 | ###Kaldi Script: `.scp`: Basically just a list of Utterances to Filenames 28 | A Kaldi script file is just a mapping from record_id, to extended-filenames. 29 | 30 | Line Format: 31 | 32 | ``` 33 | 34 | ``` 35 | 36 | ####Recording ID 37 | The recording ID is the first part of each line in a `.scp` file. 38 | If speaker id is available (which is is for TIDIGITs), it should form the first part of the recording id. 39 | Kaldi requires this not for speaker identification, but for purposes of sorting for training (`utt2spk` is for that). 40 | 41 | The remained of the Speaker ID is arbitary, so long as it is unique. 42 | For convenience of generating the unique id, the example script for TIDIGITS uses 43 | `_`. 44 | 45 | As there is only one Utterance per recording in TIDIGITS, the Recording ID is the Utterance ID. 46 | (See below) 47 | 48 | 49 | ####Extended Filename 50 | The second part of the line is the extended filename 51 | Extended Filename is the term used by Kaldi, to refer to a string that is either the path to a wav-format file or it is a bash command that will output wav-format data to standard out, followed by a pipe symbol (`|`). 52 | 53 | As the TIDIGITS data is in the [SPHERE audio format](http://www.ee.columbia.edu/ln/LabROSA/doc/HTKBook21/node64.html), it needs to be converted to wav. 54 | So the sample scripts in Kaldi use `sph2pipe` to convert them, so the .scp files lines will look like: (assuming `sph2pipe` is on your PATH, otherwise Path to the executable will need to be used) 55 | 56 | ``` 57 | ad_16a sph2pipe -f wav ../TIDIGITs/test/girl/ad/16a.wav | 58 | ``` 59 | 60 | ###Segmentation File `segments` 61 | If there were multiple utterances per recording then there would need to be a segmentation file as well, mapping Recording Ids and Start-End times to Utterance IDs. 62 | (See [The official kaldi documentation on this section](http://kaldi.sourceforge.net/data_prep.html#data_prep_data)). 63 | As there is not, by not creating a `segments` file, Kaldi defaults to utterance id == recording id. 
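If you did need a `segments` file, each of its lines maps an utterance to a time range within a recording, in the form `utterance-id recording-id segment-start segment-end`, with times in seconds. A hypothetical sketch (the IDs and times below are invented, since TIDIGITS does not need this file):

```
ad_16a_001 ad_16a 0.00 1.52
ad_16a_002 ad_16a 1.52 3.10
```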
###Segmentation File `segments`
If there were multiple utterances per recording then there would also need to be a segmentation file, mapping recording IDs and start-end times to utterance IDs.
(See [the official kaldi documentation on this section](http://kaldi.sourceforge.net/data_prep.html#data_prep_data).)
As there is not, we simply do not create a `segments` file, and Kaldi defaults to utterance id == recording id.


###Text Transcription file `text`
The text transcription must be stored in a file, which the example calls `text`.
Each line is an utterance-id followed by a transcription of what is said.
E.g.:

```
ad_1oa 1 o
ad_1z1za 1 z 1 z
ad_1z6a 1 z 6
ad_23461a 2 3 4 6 1
```

Notice the utterance-ID format as described above.
Notice also, for later, that the transcription here is in word space, not phoneme space.

###Utterance to Speaker Mappings `utt2spk`
This file maps each utterance id to a speaker id.
Each line has the form `<utterance-id> <speaker-id>`.

`spk2utt` is the opposite mapping, and can be generated using the script `utils/utt2spk_to_spk2utt.pl`.
Each of its lines starts with a speaker id, followed by every utterance id that speaker spoke.


##Feature extraction
The feature extraction is carried out by the `run.sh` script, rather than by the `local/tidigits_data_prep.sh` script.


###Extracting the MFCC Features
See [this section of the kaldi tutorial](http://kaldi.sourceforge.net/tutorial_running.html#tutorial_running_feats).

The features used are [Mel-frequency cepstral coefficients](http://en.wikipedia.org/wiki/Mel-frequency_cepstrum) (MFCCs).
Extraction is done using the script `steps/make_mfcc.sh`.



####Compute Cepstral Mean and Variance Normalization statistics
Done using the script `steps/compute_cmvn_stats.sh`.

##Data Splitting
The data needs to be divided up so that we can run many Jobs in parallel.
The data splitting is carried out by the `steps/train_mono.sh` and `steps/decode.sh` scripts if it has not already been done, rather than by the `local/tidigits_data_prep.sh` script. It can, however, be carried out at any time after the training and test directories are created and the features extracted.

It can be done with the script `utils/split_data.sh`.
Usage:

```
utils/split_data.sh <data-dir> <num-splits>
```

- `<data-dir>` is the directory where the data is. In this case it would be both of `data/test` and `data/train`.
- `<num-splits>` is the number of divisions of the data needed. It should be the number of different Jobs.

--------------------------------------------------------------------------------
/tidigits/eval.txt:
--------------------------------------------------------------------------------
---
layout: default
title: Evaluation
---
#Evaluation -- using the model to recognise speech

##Decoding
We already created a decoding graph in the [training step](training).
Using this graph to decode the utterances is done using `steps/decode.sh`.
This script only works for certain feature types -- conveniently, all the feature types we use in TIDIGITS. (Similar decoding scripts exist in `steps/` for other feature types.)

###Usage for `steps/decode.sh`

Usage:

```
steps/decode.sh [options] <graph-dir> <test-data-dir> <decode-dir>
```

- `graph-dir` is the path to the directory containing the graphs generated in the previous step
- `test-data-dir` is the path to the test data directory [prepared earlier](./data_prep)
- `decode-dir` is a path to store all of its outputs -- including the results of the evaluations. It will be created if it does not exist.
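For the monophone system trained in these notes, a typical invocation looks something like the following (the paths are illustrative, matching the examples used later on this page):

```
steps/decode.sh --nj 4 --cmd run.pl exp/mono0a/graph data/test exp/mono0a/decode
```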
###Configuration / Options
The `decode.sh` script takes many configuration options; these should be familiar from the `train_mono.sh` options in [training](./train).
They can be set by passing them as flags to the script, like so: `--<option-name> <value>`.
Or they can all be put into a bash config script, adding the flag `--config <path-to-config-file>`.
They could also be set by editing the defaults in `steps/decode.sh`, but there is no good reason to do this.


* `nj`: Number of Jobs to run in parallel. (default `4`)
* `cmd`: Job dispatcher script (default `run.pl`)


* `iter`: Iteration of the model to test. The training step actually stores a copy of the model for each iteration; this option can be used to go back and test one of those. Overridden by the `model` option. (default: the final trained model)
* `model`: which model to use, given by path. If given, this overrides `iter`. (default: determined by the value of `iter`)

* `transform-dir`: directory in which to find fMLLR transforms (not useful for TIDIGITS). (default: N/A, only used if fMLLR transforms were applied to the features)
* `scoring-opts`: options passed on to `local/score.sh`. Can be used to set the min and max Language Model Weight for the rescoring to be done. (default: "")
* `num-threads`: number of threads to use. (default 1)
* `parallel-opts`: option string to be supplied to the parallel executor (in our running-locally case, `utils/run.pl`)
  * e.g. '-pe smp 4' if you supply `--num-threads 4`
* `stage`: This is used to allow you to skip some steps, as above. However, decode only has 2 stages: if stage is greater than 0 it will skip decoding and just do scoring. (default `0`)

Options passed on to `kaldi-trunk/src/gmmbin/gmm-latgen-faster`:

* `acwt`: acoustic scale applied to acoustic likelihoods in lattice generation (default 0.083333). It affects the pruning of the lattice (paths of low enough likelihood will be pruned).
* `max_active` (default 7000)
* `beam`: decoding beam (default 13.0)
* `lattice_beam`: lattice generation beam (default 6.0)


###What is the parallelism of Jobs in the Decoding step
During decoding, the test set can be (and in the example is) split up (the actual splitting was done in the [data preparation step](data_prep)),
and each different process (Job) decodes a different subset of the utterances into lattices (see below).
When scoring happens (see below), all the different lattices are evaluated to get the transcriptions.


###Lattices
From the [Kaldi documentation](http://kaldi.sourceforge.net/lattices.html): "A lattice is a representation of the alternative word-sequences that are "sufficiently likely" for a particular utterance."

[This blog post](http://codingandlearning.blogspot.com.au/search/label/KWS14) gives quite a good introduction to the lattices in Kaldi, relating them to the other FSTs.

Kaldi creates and uses these lattices during the decoding step.
However, interpreting them can be hard, because all the command line programs for working with them use [Kaldi's special table IO](http://kaldi.sourceforge.net/io_tut.html); describing how this works in detail is beyond the scope of this introduction.
The command line programs in question can be found in `/kaldi-trunk/src/latbin`.


The lattices are output during decoding into `<decode-dir>`, as numbered gzipped files, e.g. `lat.10.gz`. The number corresponds to the Job number (because the data was distributed over multiple processes). Each gzip contains a single binary file, and each of these archives contains many lattices -- one for each utterance.
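For a first look inside one of these archives, `lattice-copy` (one of the programs in that directory) can dump it in human-readable form. A quick sketch, assuming the decode directory used in the examples below:

```
lattice-copy "ark:gunzip -c exp/mono0a/decode/lat.1.gz|" ark,t:- | head -n 20
```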
Commands to work with them take the general form of:

```
<lattice-program> [options] "ark:gunzip -c <path-to-lat.N.gz> |" ark,t:<output-file>
```

Each of the lattice commands does take the `--help` option, which will cause it to list its other options.

###Example: Converting a lattice to a FST Diagram
For example,
consider the lattice gzipped at `exp/mono0a/decode/lat.1.gz`.

Running:

```
lattice-to-fst "ark:gunzip -c exp/mono0a/decode/lat.1.gz|" ark,t:1.fsts
utils/int2sym.pl -f 3 data/lang/words.txt 1.fsts > 1.txt.fsts
```

(assuming that `/kaldi-trunk/src/latbin` is in your path)

will fill `1.fsts` with a collection of text-form FSTs, one per utterance, separated by blank lines.
Ones with multiple terminal states have multiple different "reasonably likely" phrases possible.
The input labels on the transitions are words (which we restored using int2sym).
The weights are the negative log likelihood of that transition (or of that final state).

As shown below:

```
ad_16a
0 1 1 3 14.7788
1 2 6 8 5.0416
2 2.61209

ad_174o2o8a
0 1 1 3 12.5118
0 11 o 2 9.44585
1 2 7 9 9.34774
1 16 o 2 6.57278
2 3 4 6 2.08985
3 4 o 2 10.2191
4 5 2 4 4.91992
4 9 o 2 3.20784
5 6 o 2 3.84306
6 7 o 2 3.90951
6 13 8 10 7.07031
7 8 8 10 6.74935
7 14 o 2 3.79537
8 2.61209
9 10 2 4 5.3914
10 6 o 2 3.84306
11 12 1 3 4.75861
12 2 7 9 9.34774
13 2.61209
14 15 8 10 6.63099
15 2.61209
16 17 7 9 6.38392
17 3 4 6 2.08985
```

Then we grab one particular FST out of that collection (in this case just using Awk to grab some lines -- more sophisticated approaches exist) and compile it.
We project it only along the input labels (since those are the words it will guess at), minimise the number of states to get a simpler but equivalent model (easier to read), and finally draw it as an FSA.
A pipeline along the following lines does all of that (the Awk line range simply selects the arcs of the `ad_174o2o8a` entry from the listing above; adjust it to suit your own `1.txt.fsts`):

```
cat 1.txt.fsts | awk "NR>=7 && NR<=29" \
  | fstcompile --isymbols=data/lang/words.txt \
  | fstproject \
  | fstminimize \
  | fstdraw --acceptor --portrait=true --isymbols=data/lang/words.txt \
  | dot -Tsvg > 1.2.svg
```

The result of this is a FSA that will accept (/generate) the likely matches for the utterance `ad_174o2o8a`.
The utterance actually said "174o2o8", which is accepted by the path through states "0,2,4,5,6,8,9,12".

Note that when the confidence in a path being correct is very high, no weight is shown.

![parse lattice](./174o2o8a.png)

Notice that the lattice has a lot of paths allowing 'o' to be followed by another 'o'.

#### Drawing Phone Lattices
Much like we can draw lattices at the word level,
we can go down and draw them at the phone level.


```
lattice-to-phone-lattice exp/mono0a/final.mdl "ark:gunzip -c exp/mono0a/decode/lat.1.gz|" ark,t:1.ph.lats
lattice-copy --write-compact=false ark:1.ph.lats ark,t:1.ph.fsts
utils/int2sym.pl -f 4 data/lang/phones.txt 1.ph.fsts > 1.ph.txt.wfsts
cat 1.ph.txt.wfsts | awk 'BEGIN{FS = " "}{ if (NF>=4) {print $1," ", $2," ",$3," ",$4;} else {print $1;};}' > 1.ph.txt.fsts
```

Notice that in the first step the model (`final.mdl`) was also used.
The output of the first step is in the compact lattice form, which is not amenable to being worked with by scripts like int2sym.
The second step expands it, making it a FST.
The third step simply substitutes the phone symbols into the output.
It is perhaps worth looking at `1.ph.txt.wfsts`: notice that the weights only appear on the phones that start a word. It is, however, hard to read, as it has hundreds of empty-string ('') entries. Notice also that there are 2 weights (these are the graph weight and the acoustic weight).
As there are 2 weights, this is not a valid format for OpenFST; thus the fourth line (the Awk script) removes them all.

With that done we now have something that looks like a collection of text FSTs; however, it is still very full of epsilons.

Now to draw it up. Capturing the utterance `ad_174o2o8a` again, we will draw it with a pipeline along these lines (the Awk program grabs the lines of that utterance's entry, skipping its header line):

```
cat 1.ph.txt.fsts | awk '/^ad_174o2o8a/{p=1;next} /^$/{p=0} p' \
  | fstcompile --osymbols=data/lang/phones.txt \
  | fstproject --project_output \
  | fstrmepsilon \
  | fstdeterminize \
  | fstminimize \
  | fstdraw --acceptor --portrait=true --isymbols=data/lang/phones.txt \
  | dot -Tsvg > 1.ph.svg
```

So the steps are, again: grabbing the lines we want and compiling them;
projecting (this time on the output space);
removing epsilons (matchers for empty strings), determinizing, and minimising to make it more readable;
then drawing it.

(Click to view full screen image)
[![phone lattice](./174o2o8aPhones.png)](./174o2o8aPhones.png)


##Scoring
###Viewing Results
As the final step of `steps/decode.sh`, the results are recorded.

They can be found in `<decode-dir>` under filenames of the form `wer_N`, where `N` is the Language Model Scale.


Example:

```
compute-wer --text --mode=present ark:data/test/text ark,p:- 
%WER 1.63 [ 670 / 41220, 420 ins, 111 del, 139 sub ]
%SER 4.70 [ 590 / 12547 ]
Scored 12547 sentences, 0 not present in hyp.
```

The Wikipedia entry on [Word Error Rate (WER)](https://en.wikipedia.org/wiki/Word_error_rate) is a reasonable introduction, if you are not familiar with it.

The Sentence Error Rate (SER) is really an utterance error rate here:
of all the utterances in the test set, it is the proportion that contained at least one error.
Both error rates only consider the most likely hypothesis in the lattice.

`utils/best_wer.pl` will take as input any number of the `wer_N` files,
and will output the best WER from amongst them.

###How scoring is done
Scoring is done by `local/score.sh`.
This program takes `--min-lmwt` and `--max-lmwt` options for the minimum and maximum language model weight.
It outputs a `wer_N` file for each of these different weights.
The Language Model Weight is the trade-off (against the acoustic model weight) as to which is more important: matching the language model, or matching the sounds.

The scoring program works by opening all the lattice files,
and getting them to output a transcription of the best guess at the words in all of the utterances they contain. The best guess is found with `/kaldi-trunk/src/latbin/lattice-best-path`;
the language model weight is passed to it as `--lm-scale`.


The WER is calculated using `/kaldi-trunk/src/bin/compute-wer`,
which takes two transcriptions -- the best-guess output from the previous step, and the correct labels -- and outputs the proportion that match.
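As a sketch of doing this by hand for a single archive (the real work is wrapped up by `local/score.sh`; the weight of 7 and the output name `hyp.txt` are just illustrative):

```
lattice-best-path --lm-scale=7 --word-symbol-table=data/lang/words.txt \
    "ark:gunzip -c exp/mono0a/decode/lat.1.gz|" ark,t:- \
  | utils/int2sym.pl -f 2- data/lang/words.txt > hyp.txt
compute-wer --text --mode=present ark:data/test/text ark:hyp.txt
```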

--------------------------------------------------------------------------------
/tidigits/grammerFST.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oxinabox/Kaldi-Notes/845047ba0191440222338b1f3d2310b2e8e14df9/tidigits/grammerFST.png
--------------------------------------------------------------------------------
/tidigits/index.txt:
--------------------------------------------------------------------------------
---
layout: default
title: How To Train TIDIGITS
---

#Introduction to training TIDIGITS
TIDIGITS is a comparatively simple connected digits recognition task.
As for many well-known corpora, Kaldi includes an example script for it.
It is fairly typical of the example scripts -- though simpler than most.

The example script can be found in `kaldi-trunk/egs/tidigits/s5/`; all other scripts referred to here are relative to that path. Kaldi example scripts are all written to be run from that path (or its equivalent in other examples), even if they are located in a subfolder.
Kaldi example scripts should only be run in `bash` -- they will not necessarily work in other POSIX shells.

Be aware that a lot of the recipe code is shared between WSJ (Wall Street Journal) and all the other examples (including TIDIGITs).
The `utils/` and `steps/` folders in most of the example folders (including that for TIDIGITs)
are symlinks to the matching folders in the WSJ example. You can very well make use of these scripts in your own recipes.


####Other Resources:

- The official [Kaldi tutorial](http://kaldi.sourceforge.net/tutorial.html) is not perfect (yet), but is a valuable resource. It is linked to in various sections throughout this document.

- [This tutorial](http://analytcz.com/kaldi-hybrid-mlphmm-asr-2/) seems good. Its web hosting does not seem stable; right now the [Google-cached version can be used](http://webcache.googleusercontent.com/search?q=cache:z-MGlCv917sJ:analytcz.com/kaldi-hybrid-mlphmm-asr-2/).


##The Major Steps

There are four steps to applying Kaldi to a task such as this.


1. [Data Preparation](./data_prep):
  * Locating the datafiles
  * Parsing its annotations (e.g. Speaker Labels, Utterance Labels)
  * Converting the audio data format
2. [Language Preparation](./lang_prep):
  * Create Lexicon (Phoneme/Word dictionary)
  * Create Grammar (Word Language Model)
3. [Training Speech Recognizer](train):
  * Training the GMMs
  * Building the HMM graph
4. [Evaluating the Speech Recognizer](eval):
  * Decoding and building the lattices
  * Interpreting the results

The full process can be carried out by running `bash run.sh`, though you will most likely need to edit at least the TIDIGITs path and `cmd.sh` (so that it is set to run locally, not on a cluster).

These instructions also briefly touch on some of the options that might be needed in more complicated tasks. They also go into some detail on things which are not done by the `run.sh` script, for example outputting utterance recognition lattice diagrams. The instructions do not provide any kind of solid introduction to HMMs or GMMs, nor to bash or awk.
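To recap: once the TIDIGITS path and `cmd.sh` have been edited, the whole pipeline is just:

```
cd kaldi-trunk/egs/tidigits/s5
bash run.sh
```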

--------------------------------------------------------------------------------
/tidigits/lang_prep.txt:
--------------------------------------------------------------------------------
---
layout: default
title: Language Preparation
---
#Language Preparation
[The official kaldi documentation on this section](http://kaldi.sourceforge.net/data_prep.html#data_prep_lang).

This section covers the same content as the recipe script `/local/tidigits_prepare_lang.sh`.

To understand this section you should first [understand OpenFST](../fst-example).



##The Phones
Kaldi expects a number of files to be in the `data/lang/phones/` directory.
Most of them are not complex for TIDIGITS.

To facilitate their creation, it is useful to have a full list of the phonemes. This could be created in many ways; one way is to apply Awk to the lexicon (see the next section).

These phone files are simple lists with one phone per line:

- `silence.txt`, `context_indep.txt` and `optional_silence.txt` are all just single-line files containing `sil`, the silence phoneme symbol.
- `nonsilence.txt` contains all other phonemes.

The following files would do a lot more in more complicated situations, but are simple for TIDIGITS:

- `sets.txt`: each line contains a set of phones that should be considered to be the same phoneme (i.e. a set of variant phones for one phoneme). Since in TIDIGITS this is not a concern (we don't have access to a transcription at that level), each set contains just the one phoneme, so all phonemes should be listed on their own line in this file -- including `sil` (the silence phoneme).
- `disambig.txt` should be created and left empty.
- `extra-questions.txt` should also be created and left empty.

###Generating isymbols files from the phones
Once you have a phone list, it is very easy to enumerate it to create the isymbols file required for the phoneme-word FST.

###Converting Symbolic Phone Lists to Integer Phone Lists
Once you have an isymbols file, each of the files created in the previous step needs to be converted to a list of the matching integers, rather than textual symbols. The files created this way have the same name, but with a `.int` extension.

The perl script `utils/sym2int.pl` is used for this. It takes a single parameter -- the symbols file -- then reads the symbolic (`.txt`) phone lists on standard in, and outputs (on standard out) their corresponding integer forms.


`silence`, `nonsilence`, `context_indep`, `optional_silence` and `disambig`
also all need to be converted to colon-separated list files (`.csl`);
these are the same as the `.int` files, but with the integer phone representations separated by colons instead of line breaks. A simple-ish job for Awk/sed.

###roots.txt Decision Tree Roots
Kaldi makes use of decision trees for some functionality.
See [the documentation](http://kaldi.sourceforge.net/tree_externals.html) for the why and how of this.

It requires a root definition file.
For TIDIGITS this is very simple:
`roots.txt` contains `shared split <phone>` on each line, and has one line for each phone.
It is converted to `roots.int` by converting each phone symbol to its integer representation.
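As an illustration of the mechanics (a sketch only; `phonelist.txt` is a placeholder for your full phone list, and the recipe itself inlines equivalent one-liners):

```
# symbolic phone list -> integer list, using the phone symbol table
utils/sym2int.pl data/lang/phones.txt \
    < data/lang/phones/nonsilence.txt > data/lang/phones/nonsilence.int

# integer list -> colon-separated list
tr '\n' ':' < data/lang/phones/nonsilence.int | sed 's/:$//' > data/lang/phones/nonsilence.csl

# one "shared split <phone>" line per phone, for the decision tree roots
awk '{print "shared split", $1}' phonelist.txt > data/lang/phones/roots.txt
```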
##Words and Out of Vocabulary Lists
A word symbol list will also need to be constructed for the FST.
Again this can be generated from the lexicon (see below) with Awk.

The word symbols are simply: o, z, 1, 2, 3, 4, 5, 6, 7, 8, 9.

To go with this, Kaldi needs to be told what word to use when a word that is not in the vocabulary list is found. Since no such words exist in TIDIGITS, it really doesn't matter what is done with them, but Kaldi requires the files.

Create an `oov.txt` with any one word in it (the example script uses z).
Create an `oov.int` with the matching integer for it.
This can be done manually,
or it could be done with `sym2int` on the word symbol list created earlier.

If you arranged your word symbols so that the int form of your oov word is the same as its text form, then you are either over- or under-thinking this, and you could just copy `oov.txt` to `oov.int`.

## The Lexicon
The example recipe for TIDIGITs is quite clever about constructing the phoneme-to-word FST.
The script `utils/make_lexicon_fst.pl` takes a lexicon file and outputs a text FST file.
Each line of the lexicon file has the format:

```
<word> <phone1> <phone2> ... <phoneN>
```

In the example it is invoked roughly as `utils/make_lexicon_fst.pl lexicon.txt 0.5 sil > lexicon.fst.txt`, the trailing arguments being the probability of optional silence and the silence phone.
This creates a lexicon FST that transduces phones to words, and may allow optional silence.
Note: ordinarily, each line of `lexicon.txt` is: `word phone1 phone2 ... phoneN`; if the `--pron-probs` option is used, each line is: `word pronunciation-probability phone1 phone2 ... phoneN`. The probability 'prob' will typically be between zero and one, and note that it's generally helpful to normalize so the largest one for each word is 1.0, but this is your responsibility. The silence disambiguation symbol,
e.g. something like #5, is used only when creating a lexicon with disambiguation symbols, e.g. L_disambig.fst, and was introduced to fix a particular case of non-determinism of decoding graphs.

###Compiling the Lexicon FST: `L.fst`
The `lexicon.fst.txt` is then compiled with `fstcompile`, using the isymbols and osymbols generated from `lexicon.txt`, plus the silence phoneme (sil) added in the `make_lexicon_fst.pl` step.
Its edges are then sorted by output label using `fstarcsort` (composition, which happens later, requires sorted arcs to be done efficiently).

####L_disambig.fst = L.fst

To quote the TIDIGITs recipe:
>in this setup there are no "disambiguation symbols" because the lexicon
contains no homophones; and there is no '#0' symbol in the LM [(Language Model)] because it's not a backoff LM, so L_disambig.fst is the same as L.fst

So `L.fst` is copied to `L_disambig.fst`.

For more information on disambiguation read the [documentation page](http://kaldi.sourceforge.net/graph.html#graph_disambig).
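Putting the last two subsections together, the compile-and-sort step looks something like this (a sketch; the file names follow the recipe's conventions rather than anything Kaldi requires):

```
fstcompile --isymbols=data/lang/phones.txt --osymbols=data/lang/words.txt \
    lexicon.fst.txt \
  | fstarcsort --sort_type=olabel > data/lang/L.fst

# no disambiguation symbols are needed for TIDIGITS, so:
cp data/lang/L.fst data/lang/L_disambig.fst
```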
###The final lexicon FST

![lexicon fst](./lexiconFST.png)

0 is the initial state (as always),
and 1 is the only final state.
Notice there is only one path leaving state 2, and that goes back to 1 via 'sil'.
Notice also that all states which have a transition to 2 have an identical transition to 1.

##The Grammar
The Lexicon defined how phonemes make up words.
The Grammar defines how words make up a sentence.
The grammar is a weighted FSA.
It is expressed as a weighted FST in the example script -- a FSA can be considered as a FST with the input and output symbols the same.


###The States
As our sentences are made up of digit sequences of length between 1 and 7, this could be represented as a WFSA with 8 states, 7 of which are optionally terminal, and each of which (bar the last) has an edge for every digit going to the next state.

It is simpler, and more useful in the real world, to instead take the assumption that digit sequences can be of any length (of at least one digit).

This can be modelled as a FSA with just one state, which is both initial and terminal,
and to which all edges connect.
This is what is done in the example recipe.

### The Transitions
As a FSA each edge has one label,
but as it is expressed as a FST this label is put on both the input and the output.

####The Weights
We want to relate the weight to the probability of that transition happening.
After a digit has been said there are 12 possible future actions:

- A digit from 1-9 is said
- o, pronounced "Oh", is said
- z, pronounced "zero", is said
- nothing further is said, as the sentence has ended.

It is reasonable to assume each of these 12 options is equally likely,
so they each have a probability of 1/12.

In these circumstances it is normal to work with negative log probabilities, for numerical stability:
`-ln(1/12)=2.48490664979...`. This can just be put into the final field of each line of the text FST.
The example recipe uses an inline Perl command to calculate it on the fly (though it is not printed to any more digits than this).

###Compile and Arc Sort
The FST is compiled and arc sorted just as for the lexicon.
The example calls this `G.fst`.

###The final grammar FST

![grammar fst](./grammerFST.png)

##The Final Grammar Composed with Lexicon
The great beauty of working with FSTs in this way is that they are composable.
There is no need to compose them in this step -- that will be done later when they are also composed with the HMM; but so that you can see what is going on, below is the grammar composed with the lexicon.

![Lexicon Grammar FST](./LGFST.png)

0 is the initial state. 0 and 4 are the final states.
This FST maps phones (from the lexicon) to strings of words which are allowed by the Grammar.
However, since the Grammar is so permissive (no restrictions at all on the order of words),
this looks very similar to the Lexicon FST. It is in fact equivalent to the Kleene closure of the Lexicon FST.

##HMM Topology
One could say this was really part of the next step, training.
However it is covered in the sample script for this section, `/local/tidigits_prepare_lang.sh`.

The actual action to be taken is very simple.
Understanding why takes some knowledge of HMMs.

The HMM topology defines how the HMM that is going to be created for the phones works.
In most cases the 3-state Bakis model is used.

To get an idea of what is really going on under the hood,
read [this page of the documentation](http://kaldi.sourceforge.net/hmm.html).

In short, topo files define instructions for how to build Hidden Markov Models (HMMs) -- which states are linked to which others.

The topo file is expressed in an almost-XML language (not quite XML, as not all opened tags have closing tags -- only those that have other elements nested inside them). Kaldi uses this, and will eventually, at some point, internally produce a WFST that is the HMM.
You might find this referred to in the literature as H, to go with the lexicon L and the grammar G.

###In practice
All that is required is to copy the 3-state Bakis model template from `conf/topo.proto`,
and use `sed` to replace NONSILENCEPHONES and SILENCEPHONES with space-separated lists of the integer representations of the non-silence and silence phones respectively.

##Validating everything has been done correctly so far
This step is actually carried out in `run.sh`, rather than in `local/tidigits_prepare_lang.sh`.

`utils/validate_lang.pl` takes a single argument -- the path to the lang folder.
It then validates that everything has been set up correctly.
However, there are some warnings for the TIDIGITs setup.

To quote `run.sh`:
>```utils/validate_lang.pl data/lang/ ```
> Note; this actually does report errors,
> and exits with status 1, but we've checked them and seen that they
> don't matter (this setup doesn't have any disambiguation symbols,
> and the script doesn't like that).


--------------------------------------------------------------------------------
/tidigits/lexiconFST.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oxinabox/Kaldi-Notes/845047ba0191440222338b1f3d2310b2e8e14df9/tidigits/lexiconFST.png
--------------------------------------------------------------------------------
/tidigits/train.txt:
--------------------------------------------------------------------------------
---
layout: default
title: Training
---

##Controlling remote vs local execution: `cmd.sh`
Kaldi is designed to work with SunGrid clusters.
It also works with other kinds of cluster, and it can run locally, which is what we want here.
This can be done by making sure `cmd.sh` sets the variables as follows:

```
export train_cmd=run.pl
export decode_cmd=run.pl
```

rather than making references to `queue.pl`.

Training (and testing) will still be split into multiple Jobs, each handling a different subset of the data.


#Training the Recognizer
This section is covered by [this section of the kaldi tutorial](http://kaldi.sourceforge.net/tutorial_running.html#tutorial_running_monophone).

The majority of the steps covered on this page are triggered by the script `run.sh`.


##Training
Training is done using the script `steps/train_mono.sh`; very similar steps are used in the other training scripts in `steps/` (such as `steps/train_deltas.sh`).

Usage:

```
steps/train_mono.sh [options] <training-data-dir> <lang-dir> <exp-dir>
```

- `training-data-dir` is the path to the training data directory [prepared earlier](./data_prep)
- `lang-dir` is the path to the directory containing all the language model files, [also prepared earlier](./lang_prep)
- `exp-dir` is a path for the training to store all of its outputs. It will be created if it does not exist.

###Configuration / Options
The `train_mono.sh` script takes many configuration options.
They can be set by passing them as flags to the script, like so: `--<option-name> <value>`.
Or they can all be put into a bash config script, adding the flag `--config <path-to-config-file>`.
They could also be set by editing the defaults in `steps/train_mono.sh`, but there is no good reason to do this.


* `nj`: Number of Jobs to run in parallel. (default `4`)
* `cmd`: Job dispatcher script (default `run.pl`)
* `scale_opts`: takes a string (wrap it in quotes) to control the scaling options (default `"--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"`)
  * `transition-scale` (default `1.0`)
  * `acoustic-scale` (default `0.1`)
  * `self-loop-scale` (default `0.1`)
* `num_iters`: Number of iterations of training (default `40`)
* `max_iter_inc`: maximum amount to increase the number of Gaussians by (default `30`)
* `totgauss`: Target number of Gaussians (default `1000`)
* `careful`: passed on to `gmm-align-compiled`. To quote its documentation: "If true, do 'careful' alignment, which is better at detecting alignment failure (involves loop to start of decoding graph)." (default `false`)
* `boost_silence`: Factor by which to boost silence likelihoods in alignment. (default `1.0`)
* `realign_iters`: iterations on which to perform realignment (default `"1 2 3 4 5 6 7 8 9 10 12 14 16 18 20 23 26 29 32 35 38"`)
* `power`: exponent used to determine the number of Gaussians from occurrence counts (default `0.25`)
* `cmvn_opts`: options passed on to cmvn -- like `scale_opts` (default `""`)
* `stage`: This is used to allow you to skip some stages if the program crashed partway through; the stage variable sets the stage to start at. The stages are discussed in the next section. (default `-4`)


###What is the parallelism of Jobs in the Training step
During training, the training set can be (and in the example is) split up (the actual splitting is explained in the [data preparation step](data_prep)),
and each different process (Job) trains on a different subset of the utterances; the results are then merged at each iteration.

### Initialisation Stages

####Initialise GMM (Stage -3)
Uses `/kaldi-trunk/src/gmmbin/gmm-init-mono`.
Call that with the `--help` option for more info.

This defines (amongst other things) how many GMMs there are initially.


####Compile Training Graphs (Stage -2)
Uses `/kaldi-trunk/src/bin/compile-train-graphs`.
Call that with the `--help` option for more info.

See [this section of the documentation](http://kaldi.sourceforge.net/graph_recipe_train.html).

####Align Data Equally (Stage -1)
Creates an equally spaced alignment, as a starting point for the further alignment stages.
Uses `/kaldi-trunk/src/bin/align-equal-compiled`.
Call that with the `--help` option for more info.

####Estimate Gaussians (Stage 0)
Do the maximum likelihood estimation of the GMM-based acoustic model.
Uses `/kaldi-trunk/src/gmmbin/gmm-est`.
Call that with the `--help` option for more info.

The script notes:
>In the following steps, the `--min-gaussian-occupancy=3` option is important, otherwise
> we fail to est[imate] "rare" phones and later on, they never align properly.



###Training (Stage = Iterations completed)
On every iteration a number of steps are carried out.

####Realign
If this iteration is one of the `realign_iters` then:

#####Boost Silence
Silence is boosted using `/kaldi-trunk/src/gmmbin/gmm-boost-silence`.
Call that with the `--help` option for more info.
Notably, it does not necessarily boost the silence phone (though it does in this training case); it can boost any phone.
It does this by modifying the GMM weights, to make silence more probable.
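As a sketch of what happens at this point (the iteration number, boost factor and output name are placeholders; the real script pipes the boosted model straight into the aligner):

```
gmm-boost-silence --boost=1.0 $(cat data/lang/phones/optional_silence.csl) \
    exp/mono0a/10.mdl exp/mono0a/10_boosted.mdl
```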
#####Align
The features are aligned given the GMM models.
Uses `/kaldi-trunk/src/gmmbin/gmm-align-compiled`.
Call that with the `--help` option for more info.

####Re-estimate the GMM model
First, accumulate the statistics which are used in the next step.
This is done using `/kaldi-trunk/src/gmmbin/gmm-acc-stats-ali`.
Call that with the `--help` option for more info.

Then re-estimate the GMM-based acoustic model.
This is done with `/kaldi-trunk/src/gmmbin/gmm-est`, but using very different arguments.
Again, call that with the `--help` option for more info.

####Merge GMMs
The accumulated statistics from all the Jobs over the partitioned training dataset are then merged,
using `gmm-sum-accs`, and a single model (a `.mdl` file) is produced.
The model can be examined using `/kaldi-trunk/src/gmmbin/gmm-info` to get some very basic information, such as the number of Gaussians.


Finally, the number of Gaussians is increased (capped by `max_iter_inc`), so that by the time all the iterations (`num_iters`) are complete, it approaches the target total number of Gaussians (`totgauss`) -- assuming `max_iter_inc` did not block it.


##Making the Decoding Graph


As shown earlier, the Grammar (G) can be composed with the Lexicon (L)
to get a phoneme-to-word mapping.

To increase the power of the phones, they can be expanded to add context --
for example, making the 'ay' phone in 'n-ay-n' (nine) different from the one in 'm-ay-n' (mine).
This can be done with a context dependency FST, which can be scaled
in the number of phones taken into account as context.
This is roughly equivalent to making use of n-grams at the phonetic level. Using a context of 3 (i.e. one phone to each side) is referred to as triphones.


The context dependency can be expressed as a FST, referred to as C.



[This blog post](http://vpanayotov.blogspot.com.au/2012/06/kaldi-decoding-graph-construction.html) presents the details of the creation quite well. It will be a bit of revision from the data preparation step.


###Usage of `utils/mkgraph.sh`
The final graph is created using `utils/mkgraph.sh`.
To quote the introduction to that script:
> ...creates a fully expanded decoding graph (HCLG) that represents
> all the language-model, pronunciation dictionary (lexicon), context-dependency,
> and HMM structure in our model. The output is a Finite State Transducer
> that has word-ids on the output, and pdf-ids on the input (these are indexes
> that resolve to Gaussian Mixture Models).

It also creates the aforementioned context dependency graph.

Usage:

```
utils/mkgraph.sh [options] <lang-dir> <model-dir> <graph-dir>
```

- `lang-dir` is, as before, the path to the directory containing all the language model files, [also prepared earlier](./lang_prep)
- `model-dir` is the `exp-dir` from the previous train-mono step, which now contains the trained model.
- `graph-dir` is the directory to place the final graph in. In the example script this is made as a `graph` subdirectory under the `exp-dir`. If it does not exist, it will be created.
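In the TIDIGITS recipe this is invoked for the monophone system roughly as follows (the `--mono` flag is explained under Context Options below):

```
utils/mkgraph.sh --mono data/lang exp/mono0a exp/mono0a/graph
```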
####Context Options
There are 3 options defining how many phones are used to create the context.
These are passed as options to the `utils/mkgraph.sh` script:

- `--mono` for monophone, i.e. one phone, i.e. no context (used for the monophone system trained by `steps/train_mono.sh`)
- no flag (the default) for triphone, i.e. 3 phones, i.e. one phone to each side as context
- `--quinphone` for quinphone, i.e. 5 phones, i.e. 2 phones to each side as context

It would not be hard to extend the mkgraph script to create contexts of any length.
The section of the mkgraph script responsible for this
makes use of `/kaldi-trunk/src/fstbin/fstcomposecontext`; take a look at its `--help` for more information.

--------------------------------------------------------------------------------