├── KaldiNotes.pdf ├── README.md ├── README.txt ├── _config.yml ├── _data └── menu.yml ├── _includes └── nav.html ├── _layouts └── default.html ├── fst-example ├── compileAndDraw.sh ├── composeExample.sh ├── dict.fst.txt ├── index.txt ├── makeSymbols.py ├── sent.fsa.txt └── simple.fsa.txt ├── images ├── body-bg.png ├── highlight-bg.jpg ├── hr.png ├── octocat-icon.png ├── tar-gz-icon.png └── zip-icon.png ├── index.html ├── install_notes.txt ├── javascripts └── main.js ├── params.json ├── required_knowledge.txt ├── resources.txt ├── stylesheets ├── print.css ├── pygment_trac.css └── stylesheet.css └── tidigits ├── 174o2o8a.png ├── 174o2o8aPhones.png ├── LGFST.png ├── TODO.md ├── data_prep.txt ├── eval.txt ├── grammerFST.png ├── index.txt ├── lang_prep.txt ├── lexiconFST.png └── train.txt /KaldiNotes.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oxinabox/Kaldi-Notes/845047ba0191440222338b1f3d2310b2e8e14df9/KaldiNotes.pdf -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: Kaldi Notes 4 | --- 5 | This repository backs a webpage. 6 | [Go to the Webpage](http://oxinabox.github.io/Kaldi-Notes/) 7 | 8 | There is however useful executable scripts here so you might want to be on this page. 9 | 10 | -------------------------------------------------------------------------------- /README.txt: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: Kaldi Notes 4 | --- 5 | This repository backs a webpage. 6 | [Go to the Webpage](http://oxinabox.github.io/Kaldi-Notes/) 7 | 8 | There is however useful executable scripts here so you might want to be on this page. 9 | 10 | -------------------------------------------------------------------------------- /_config.yml: -------------------------------------------------------------------------------- 1 | baseurl: /Kaldi-Notes 2 | 3 | markdown_ext: "markdown,mkdown,mkdn,mkd,md,txt" 4 | markdown: redcarpet 5 | 6 | redcarpet: 7 | extensions: [smart] 8 | 9 | -------------------------------------------------------------------------------- /_data/menu.yml: -------------------------------------------------------------------------------- 1 | - text: Other Resources 2 | url: /resources 3 | - text: Required Knowledge 4 | url: /required_knowledge 5 | - text: Installing Kaldi 6 | url: /install_notes 7 | - text: OpenFST 8 | url: /fst-example 9 | 10 | - text: TIDIGITS 11 | url: /tidigits 12 | subitems: 13 | - text: Data Preparation 14 | url: /tidigits/data_prep 15 | - text: Language Preparation 16 | url: /tidigits/lang_prep 17 | - text: Training 18 | url: /tidigits/train 19 | - text: Evaluation 20 | url: /tidigits/eval 21 | -------------------------------------------------------------------------------- /_includes/nav.html: -------------------------------------------------------------------------------- 1 | {% assign navurl = page.url | remove: 'index.html' %} 2 | 34 | 35 | -------------------------------------------------------------------------------- /_layouts/default.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 13 | {% if page.title %} {{ page.title }} | {% endif %} 14 | 15 | 16 | 17 | 18 | 21 | 22 |
23 |
24 | 25 |
26 |

Kaldi-notes

27 |

Some notes on Kaldi

28 |
29 | 30 |
31 | {{ content }} 32 |
33 | 34 | 38 | 39 | 40 |
41 |
42 | 43 | 44 | -------------------------------------------------------------------------------- /fst-example/compileAndDraw.sh: -------------------------------------------------------------------------------- 1 | #!bash 2 | #Create the example using this script 3 | if [[ $# -eq 0 ]] ; then 4 | echo 'One argument must be given of the form \"name.fst.txt\" or \"name.fsa.txt\"'; 5 | exit 1; 6 | fi 7 | 8 | IFS="."; #Make . the split character 9 | inputFilenameParts=($1); 10 | IFS=" "; #Put it back to default space 11 | 12 | name=${inputFilenameParts[0]}; 13 | type=${inputFilenameParts[1]}; 14 | 15 | echo 'Preparing: '$name; 16 | 17 | fsTextFile="${name}.${type}.txt"; 18 | fsFile="$name.${type}"; 19 | osymsFile="${name}.osyms"; 20 | isymsFile="${name}.isyms"; 21 | svgOutputFile="${name}.svg"; 22 | 23 | isymbols="--isymbols=${isymsFile}"; 24 | osymbols="--osymbols=${osymsFile}"; 25 | 26 | 27 | python makeSymbols.py $fsTextFile 2 > $isymsFile; 28 | 29 | if [ $type = "fst" ] ; then 30 | python makeSymbols.py $fsTextFile 3 > $osymsFile; 31 | fstcompile $isymbols $osymbols --keep_isymbols --keep_osymbols $fsTextFile $fsFile; 32 | fstdraw --portrait $fsFile | dot -Tsvg > $svgOutputFile; 33 | elif [ $type = "fsa" ] ; then 34 | fstcompile --acceptor $isymbols --keep_isymbols $fsTextFile $fsFile; 35 | fstdraw --portrait $fsFile | dot -Tsvg > $svgOutputFile; 36 | else 37 | echo "Filetype: ${type} not recognitsed. Recognised types are fst=finite state trasducer and fsa=finite state acceptor"; 38 | fi 39 | 40 | echo 'Done, outputted: ' $svgOutputFile 41 | 42 | -------------------------------------------------------------------------------- /fst-example/composeExample.sh: -------------------------------------------------------------------------------- 1 | #!bin/sh 2 | 3 | # Based on http://www.isle.illinois.edu/sst/courses/minicourses/2009/lecture6.pdf 4 | 5 | bash compileAndDraw.sh sent.fsa 6 | bash compileAndDraw.sh dict.fst 7 | 8 | fstcompose --fst_compat_symbols=false sent.fsa dict.fst > strings.fst 9 | fstdraw --portrait strings.fst | dot -Tsvg > strings.svg 10 | echo 'Done composing: outputted strings.svg' 11 | echo 'Example sentences:' 12 | echo '------------------' 13 | 14 | for i in `seq 1 10`; 15 | do 16 | fstrandgen --seed=$RANDOM strings.fst | fstproject --project_output | 17 | fstprint --acceptor --isymbols=dict.syms | 18 | awk '{printf("%s ",$3)}END{printf("\n")}' 19 | done 20 | -------------------------------------------------------------------------------- /fst-example/dict.fst.txt: -------------------------------------------------------------------------------- 1 | 0 0 DET the 2 | 0 0 DET a 3 | 0 0 N cat 4 | 0 0 N dog 5 | 0 0 N mouse 6 | 0 0 V chased 7 | 0 0 V bit 8 | 0 9 | -------------------------------------------------------------------------------- /fst-example/index.txt: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: Introduction to OpenFST 4 | --- 5 | 6 | 7 | #Introduction to Finite State Transducers 8 | Weighted Finite State Transducers is a generalisations of finite state machines. 9 | They can be used for many purposed, including implementing algorithms that are hard to write out otherwise -- such as HMMs, as well as for the representation of knowledge -- similar to a grammar. 
10 | 
11 | ##Other places to get information
12 | - A decent set of slides can be found [here](http://www.gavo.t.u-tokyo.ac.jp/~novakj/wfst-algorithms.pdf)
13 | - [The OpenFst documentation](http://www.openfst.org/twiki/bin/view/FST/FstQuickTour) and the [FST Examples](http://www.openfst.org/twiki/bin/view/FST/FstExamples) are not awful, though the shell and C++ sections are intermixed.
14 | - [Speech Recognition with Weighted Finite-state Transducers](http://www.cs.nyu.edu/~mohri/pub/hbka.pdf), a book chapter.
15 | 
16 | ![A FST for TIDIGITS](../tidigits/lexiconFST.png)
17 | 
18 | Above: An FST for pronouncing the digits 1-9 and the two pronunciations of zero, "oh" (o) and "zero" (z), as used in TIDIGITS.
19 | 
20 | 
21 | ##Terminology
22 | ###Symbols and Strings
23 | Symbols come from some alphabet.
24 | They could be letters, words, phonemes, etc.
25 | 
26 | A string is a sequence of symbols from an alphabet; it can also be the empty string.
27 | Matching the examples above, a string could be a word (spelt out), a sentence, a word (spelt out phonetically), etc.
28 | 
29 | A string can be represented as a Finite State Acceptor, where each symbol labels the transition from one state to the next.
30 | 
31 | ###Finite State Acceptor (FSA)
32 | A Finite State Acceptor has the following components:
33 | 
34 | - a number of states
35 | - one or more of which are initial
36 | - one or more of which are terminal
37 | - connections (edges) between states, each with an input symbol (i.e. a label)
38 | - the symbol can be the empty string (often written "-", "" or "ε")
39 | - the mapping from label to next state is not necessarily one to one (i.e. it can be nondeterministic)
40 | 
41 | An FSA can be used to check whether a string matches its pattern -- it is computationally equivalent to a regular expression.
42 | It can also be used to generate strings which match that pattern.
43 | 
44 | FSAs can be treated as FSTs with the same input and output symbol on each edge.
45 | Kaldi example scripts sometimes write them this way.
46 | 
47 | ###Finite State Transducers (FST)
48 | A Finite State Transducer extends the Finite State Acceptor with the addition of:
49 | 
50 | - output labels on each edge
51 | - again, the output can be the empty string.
52 | - it is common (as in the TIDIGITS example above) for only the first transition in a nonbranching chain to carry an output label -- the remaining transitions output nothing, and only serve to confirm that the input really is in that chain (which it might not be).
53 | - The input alphabet and output alphabet do not have to be the same, and normally are not.
54 | 
55 | An FST can be used to translate strings in its input alphabet into strings in its output alphabet, provided the input string matches the FST's structure of allowed transitions.
56 | Thus if an FSA accepting strings of the FST's input alphabet is composed with it, the FST translates that FSA.
57 | A series of FSTs can be composed, translating alphabet to (matching) alphabet, to get the desired output.
58 | 
59 | 
60 | ###Weighted Finite State Acceptor/Transducer
61 | As per the unweighted versions, but with a weight associated with each edge (in addition to the input symbol, and the output symbol for transducers).
62 | The weights have ⊕ and ⊗ operations defined on them (a semiring),
63 | so that the combined weight of alternatives (⊕) and the cumulative weight along a path (⊗) can be found. For example (a small arithmetic sketch follows the list):
64 | 
65 | - e.g. the weight along a path is the product of probabilities, and represents the probability of that input string.
66 | - e.g. the sum of the weights of two alternative edges is the probability of taking either of those alternatives.
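To make the ⊕/⊗ bookkeeping concrete, here is a small arithmetic sketch. It assumes OpenFST's default "tropical" semiring, in which weights are costs (typically negative log probabilities), ⊗ is ordinary addition and ⊕ is min; the arc weights themselves are invented for illustration.

```
# Tropical semiring (OpenFST's default): weights are costs, ⊗ = +, ⊕ = min
# One path through an FSA, with arc costs 1.0, 0.5, 1.0, 1.0:
#   path weight = 1.0 ⊗ 0.5 ⊗ 1.0 ⊗ 1.0 = 1.0 + 0.5 + 1.0 + 1.0 = 3.5
# Two alternative paths, with total costs 3.5 and 4.2:
#   combined weight = 3.5 ⊕ 4.2 = min(3.5, 4.2) = 3.5   (keep the cheaper path)
```

If the weights were plain probabilities instead, ⊗ would be multiplication and ⊕ addition, matching the two bullet-point examples above.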
67 | 
68 | 
69 | 
70 | #Finite State Transducers in Kaldi
71 | 
72 | Kaldi uses FSTs (and FSAs) as its common knowledge representation throughout -- for grammars, lexicons, and the decoding graphs built from them.
73 | 
74 | 
75 | #OpenFST
76 | 
77 | ##Filetypes
78 | 
79 | ###Textual FST/FSA definition: `.fst.txt`, `.fsa.txt`, `.txt`
80 | The textual representation of a finite state transducer or finite state acceptor respectively.
81 | These are the files you write to get things done -- they describe your system.
82 | 
83 | In most of Kaldi the `.fst.txt`/`.fsa.txt` naming is used; in other places it is just called `.txt`. In this document it is always referred to by the former terms.
84 | 
85 | 
86 | #### Line format:
87 | Normal line: `fromState toState inSymbol [outSymbol] [weight]`
88 | Terminal state line: `terminalState`
89 | 
90 | - `fromState`, `toState`, and `terminalState` are integer state labels
91 | - `inSymbol`, `outSymbol` are textual strings: the names of symbols from the respective input and output alphabets.
92 | - `outSymbol` should not be present in FSAs, and should always be present in FSTs
93 | - `weight` is a decimal number indicating the weight of the edge. It must be present in weighted FSTs/FSAs
94 | 
95 | ###Symbol table file: `.isyms`, `.osyms`, `.syms`, `.dict`, `.txt`
96 | OpenFst likes to refer to symbols by a positive integer.
97 | Since any finite alphabet is isomorphic to a subset of the positive integers,
98 | such a bijection exists, and can be created by simply enumerating the symbols.
99 | 
100 | For each FST you should have two of these files, one for the input alphabet and one for the output alphabet. For an FSA you should have only one -- for the input alphabet. Under most circumstances these can be generated programmatically from the `.fst.txt`/`.fsa.txt`. One such script is provided here in [makeSymbols.py](./makeSymbols.py). Others exist throughout the Kaldi example scripts, often as AWK one-liners.
101 | 
102 | In different places different extensions are used.
103 | The example [compileAndDraw.sh](./compileAndDraw.sh) script uses `.isyms` for the symbol file generated from the input alphabet of the textual FST/FSA description, and `.osyms` for the one generated from the output alphabet.
104 | 
105 | 
106 | ####Line Format:
107 | `symbol integer`
108 | 
109 | - `symbol` is a symbol from the alphabet being mapped
110 | - `integer` is a unique positive integer (that is to say, each integer appears only once in this file).
111 | 
112 | ### Binary FST/FSA: `.fst`, `.fsa`
113 | This is the binary representation of the finite state transducer/acceptor.
114 | It is produced from the textual representation and symbol tables using
115 | `fstcompile`.
116 | 
117 | ### Graph of FST/FSA: `.dot`
118 | This is a [Graph Description Language file](http://en.wikipedia.org/wiki/DOT_%28graph_description_language%29), produced by `fstdraw`.
119 | Piping it through `dot` converts it into another, more common format.
120 | E.g. `cat example.dot | dot -Tsvg > example.svg` will convert `example.dot` to an SVG file.
121 | This is often done directly from the line that calls `fstdraw`.
122 | 
123 | ##OpenFST components
124 | OpenFST is made up of several different command line applications.
125 | The three most used in Kaldi are detailed briefly below.
126 | 
127 | 
128 | 
129 | ###Common conventions
130 | 
131 | ####Input and Output
132 | OpenFST commands which take a single input and produce a single output
133 | (such as `fstdraw` and `fstcompile`)
134 | have the basic usage of
135 | 
136 | ```
137 | fstcommand [FLAGS] [inputfile [outputfile]]
138 | ```
139 | 
140 | Which is to say, an `inputfile` can optionally be provided,
141 | and if it is, then an `outputfile` can optionally be provided as well.
142 | 
143 | If either is missing then input is taken from standard input (i.e. piped in, or read from the keyboard if there is no input pipe),
144 | and output is sent to standard output (i.e. piped out, or printed to the terminal if there is no output pipe), respectively.
145 | 
146 | 
147 | ####Accessing Help (manpages)
148 | Because OpenFST is not properly installed, it does not have entries in the man pages.
149 | To get help with a command use:
150 | 
151 | ```
152 | fstcommand --help | less
153 | ```
154 | ###Compile: `fstcompile`
155 | This converts a textual FST/FSA into a binary one.
156 | 
157 | - FSA Usage: `fstcompile --acceptor --isymbols=<symbol file> [--keep_isymbols]`
158 | - FST Usage: `fstcompile --isymbols=<input symbol file> --osymbols=<output symbol file> [--keep_isymbols] [--keep_osymbols]`
159 | 
160 | Flags:
161 | 
162 | - `--acceptor`: compiles it as an FSA, rather than an FST
163 | - `--isymbols=<file>`, `--osymbols=<file>`: specify the input and output symbol tables
164 | - `--keep_isymbols`, `--keep_osymbols`: If set, the symbol tables are kept in the binary file and do not need to be specified at later steps such as `fstdraw`
165 | 
166 | ###Draw: `fstdraw`
167 | Produces a `.dot` graph file from a binary FST/FSA.
168 | 
169 | - FSA Usage: `fstdraw --acceptor --portrait [--isymbols=<symbol file>]`
170 | - FST Usage: `fstdraw --portrait [--isymbols=<input symbol file>] [--osymbols=<output symbol file>]`
171 | - Common use example: `cat eg.fst | fstdraw --portrait --isymbols=eg.isyms --osymbols=eg.osyms | dot -Tsvg > eg.svg`
172 | 
173 | Flags:
174 | 
175 | - `--portrait`: this flag should **always** be set. If it is not set the image comes out rotated 90 degrees, on an overly large canvas.
176 | - `--isymbols`, `--osymbols`: as before, but if not provided the symbols in the graphic will be replaced by their numeric representation, unless `--keep_isymbols` or `--keep_osymbols` was set in the compile step
177 | - `--acceptor`: draws an FSA rather than an FST. Without it, the FSA's edges will also be given output labels.
178 | 
179 | ###Compose: `fstcompose`
180 | Composes an FSA/FST with an FST.
181 | 
182 | - Usage: `fstcompose [--fst_compat_symbols=false] inner.[fst|fsa] outer.fst output.fst`
183 | 
184 | Applying an input to the output FST is equivalent to first applying it to the inner one, then applying the output of that to the outer one, i.e. `output(x) = outer(inner(x))`.
185 | 
186 | - `--fst_compat_symbols=false`: setting this to false (it defaults to true) may be required when composing FSTs/FSAs compiled with `--keep_isymbols` or `--keep_osymbols`, where the embedded symbol tables are actually compatible but are not the same files (it seems to store the filenames, which can be seen by running `strings` on an fst).
187 | 
188 | 
189 | 
190 | ###Other useful Commands
191 | All the commands in OpenFst have a use.
192 | Other commands which I have found particularly useful,
193 | but do not have space to detail, include:
194 | 
195 | - `fstsymbols`: manipulates and exports the symbol tables in a binary FST/FSA
196 | - `fstproject`: converts an FST into an FSA in either the input or output space, by discarding the other set of labels
197 | 
198 | #Examples provided here
199 | 
200 | Several scripts are provided here to demonstrate how to make use of OpenFST,
201 | and to make using it easier.
202 | They can be downloaded from the [Git repository backing this site](https://github.com/oxinabox/Kaldi-Notes/tree/gh-pages/fst-example).
203 | The section names below are also hyperlinks to download those scripts/files.
204 | 
205 | ###NOTE:
206 | The example scripts assume the OpenFST binaries are in your `PATH`.
207 | If you added all the Kaldi binaries to your path during the install step you will already have them.
208 | Otherwise you can add just the OpenFST binaries:
209 | add to your `.bashrc` (or similar) `PATH="<...>/kaldi-trunk/tools/openfst/bin:${PATH}"`, where `<...>` is the path to the directory containing `kaldi-trunk`,
210 | then `source ~/.bashrc`.
211 | 
212 | 
213 | ###[makeSymbols.py](makeSymbols.py)
214 | [makeSymbols.py](makeSymbols.py) is a script to make creating the symbol tables (which map symbols to arbitrary unique integers) easier.
215 | 
216 | Usage: `python makeSymbols.py file fieldNumber`
217 | 
218 | - `file`: the textual FST/FSA file (usually `.fst.txt` or `.fsa.txt`) to extract the symbols from
219 | - `fieldNumber`: which column of the file to take symbols from
220 | - input symbols use a `fieldNumber` of 2
221 | - output symbols use a `fieldNumber` of 3
222 | 
223 | The symbol table is output to standard out, and can be piped into a file.
224 | 
225 | ###[compileAndDraw.sh](compileAndDraw.sh)
226 | [compileAndDraw.sh](compileAndDraw.sh) is a simple bash script that runs the whole process of compiling and then drawing an FST/FSA.
227 | 
228 | Usage FSA: `bash compileAndDraw.sh filename.fsa.txt`
229 | Usage FST: `bash compileAndDraw.sh filename.fst.txt`
230 | 
231 | Note: unlike the OpenFST programs, this script is file-extension sensitive.
232 | It will make the appropriate calls for an FSA or an FST based on the extension.
233 | 
234 | ###[composeExample.sh](composeExample.sh)
235 | The [composeExample.sh](composeExample.sh) script runs through the creation and then the composition of `dict.fst` and `sent.fsa`. It then outputs some sentences generated using the language model described. (A manual walk-through of the same pipeline is sketched below, after the example files.)
236 | 
237 | Usage: `bash composeExample.sh`
238 | 
239 | ##Example FSTs/FSAs
240 | This folder contains 3 example files.
241 | The latter two examples, of sentence construction, are based on ones provided in [these lecture notes](http://www.isle.illinois.edu/sst/courses/minicourses/2009/lecture6.pdf).
242 | 
243 | ###[simple.fsa.txt](./simple.fsa.txt)
244 | [simple.fsa.txt](./simple.fsa.txt) is a very simple Finite State Acceptor.
245 | 
246 | ###[dict.fst.txt](./dict.fst.txt)
247 | [dict.fst.txt](./dict.fst.txt) is a dictionary containing several words. There is only one state in the dictionary -- as far as it is concerned, words can come in any order.
248 | 
249 | ###[sent.fsa.txt](./sent.fsa.txt)
250 | [sent.fsa.txt](./sent.fsa.txt) is a grammar for a simple sentence, expressed as a finite state acceptor. Sentences can either be `determiner noun verb` or `determiner noun verb determiner noun`.
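For readers who prefer to see the pipeline spelt out, the following is a manual walk-through of roughly what `composeExample.sh` does, using the commands described above. It assumes the OpenFST binaries and `dot` are on your `PATH`; note that the `fstprint` step follows `composeExample.sh` in reading a `dict.syms` symbol table, whereas `compileAndDraw.sh` writes the output symbols to `dict.osyms`, so you may need to adjust that filename.

```
# Compile (and draw) the grammar acceptor and the dictionary transducer
bash compileAndDraw.sh sent.fsa
bash compileAndDraw.sh dict.fst

# Compose them: the result maps category strings allowed by the grammar to word strings
fstcompose --fst_compat_symbols=false sent.fsa dict.fst > strings.fst
fstdraw --portrait strings.fst | dot -Tsvg > strings.svg

# Generate one random sentence: pick a random path, keep only the output labels, print them
fstrandgen --seed=$RANDOM strings.fst | fstproject --project_output |
    fstprint --acceptor --isymbols=dict.syms |
    awk '{printf("%s ", $3)} END{printf("\n")}'
```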
251 | 252 | 253 | 254 | -------------------------------------------------------------------------------- /fst-example/makeSymbols.py: -------------------------------------------------------------------------------- 1 | 2 | import sys 3 | 4 | 5 | if len(sys.argv)<3 or '-h' in sys.argv or '--help' in sys.argv: 6 | print(""" 7 | Usage: python makeSymbols file fieldNumber 8 | 9 | file: the textual FST/FSA file (.fst.txt or .fsa.txt usually), to extract the symbols from 10 | fieldNumber: which column of the file to take symbols fro 11 | input symbols use fieldNumber of 2 12 | output symbolss use fieldNumber of 3 13 | 14 | The Symbols Table is output to standard out, and can be piped into a file 15 | """) 16 | 17 | 18 | words=set() 19 | index = int(sys.argv[2]) 20 | 21 | with open(sys.argv[1], 'r') as ff: 22 | for line in ff: 23 | fields = line.split(' ') 24 | if len(fields)>index: 25 | field = fields[index].strip() 26 | if (field): 27 | words.add(field) 28 | 29 | print("- 0") #alway have the empty string as 0 30 | words.discard('-') 31 | for (ii,word) in enumerate(words,1): 32 | print("%s %d" % (word,ii)) 33 | 34 | 35 | 36 | 37 | -------------------------------------------------------------------------------- /fst-example/sent.fsa.txt: -------------------------------------------------------------------------------- 1 | 0 1 DET 2 | 1 2 N 3 | 2 3 V 4 | 3 4 DET 5 | 4 5 N 6 | 5 7 | 3 8 | -------------------------------------------------------------------------------- /fst-example/simple.fsa.txt: -------------------------------------------------------------------------------- 1 | 0 1 The 1 2 | 1 2 person 0.5 3 | 1 3 people 0.5 4 | 2 4 is 1 5 | 3 4 are 1 6 | 4 5 mad 1 7 | 5 8 | 9 | -------------------------------------------------------------------------------- /images/body-bg.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oxinabox/Kaldi-Notes/845047ba0191440222338b1f3d2310b2e8e14df9/images/body-bg.png -------------------------------------------------------------------------------- /images/highlight-bg.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oxinabox/Kaldi-Notes/845047ba0191440222338b1f3d2310b2e8e14df9/images/highlight-bg.jpg -------------------------------------------------------------------------------- /images/hr.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oxinabox/Kaldi-Notes/845047ba0191440222338b1f3d2310b2e8e14df9/images/hr.png -------------------------------------------------------------------------------- /images/octocat-icon.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oxinabox/Kaldi-Notes/845047ba0191440222338b1f3d2310b2e8e14df9/images/octocat-icon.png -------------------------------------------------------------------------------- /images/tar-gz-icon.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oxinabox/Kaldi-Notes/845047ba0191440222338b1f3d2310b2e8e14df9/images/tar-gz-icon.png -------------------------------------------------------------------------------- /images/zip-icon.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oxinabox/Kaldi-Notes/845047ba0191440222338b1f3d2310b2e8e14df9/images/zip-icon.png 
-------------------------------------------------------------------------------- /index.html: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | --- 4 | 5 |
6 |
7 | 8 |
9 | View on GitHub 10 |
11 | 12 |
13 | 14 |
15 |

This is an introduction to speech recognition using Kaldi. 16 | Follow one of the links to get started. 17 | 18 |

A PDF snapshot of this site/manual is available. Be aware that all links within the PDF go to the website.

21 | 22 | 23 |
24 |
25 | -------------------------------------------------------------------------------- /install_notes.txt: -------------------------------------------------------------------------------- 1 | --- 2 | 3 | layout: default 4 | title: Installing Kaldi 5 | --- 6 | 7 | #Installing Kaldi 8 | 9 | ##Building Kaldi 10 | Follow the [official instructions](http://kaldi.sourceforge.net/tutorial_setup.html) 11 | 12 | Do not forget to first `cd` to `/kaldi-trunk/tools` then do a `make -j 8` to build to tools kaldi usesi. 13 | 14 | ###Issues 15 | 16 | - Issue: Requires libtool 17 | - Resolution: Build and install libtool, I did so by installing from [source](http://ftpmirror.gnu.org/libtool/libtool-2.4.5.tar.gz) locally (via configure --prefix) 18 | - Resolution (alternative): Install with package manage (`apt-get`) 19 | - then try and build kaldi, first running .configure 20 | - Issue: Needs a BLAS. 21 | - Resolution: use OpenBlas 22 | - go back to `/kaldi-trunk/tools` and `make -j 8 openblas` 23 | - use it by running `./configure --openblas-root=../tools/OpenBLAS/install` 24 | - Issue: this Kaldi won't run with GCC 4.8.4 25 | - Resolution: install newer GCC from source 26 | - Follow instructions from [GCC website](https://gcc.gnu.org/wiki/InstallingGCC) 27 | - in particular for getting the dependencies 28 | - When it comes to running configure use: `../gcc-4.9.2/configure --disable-multilib` 29 | - when doing use `make -j 8` or it will take a very long time to build 30 | - Resolution (alternative) : install from backports 31 | 32 | ##Installing Graph Viewer 33 | 34 | For viewing the output of fstdraw, you need to convert it into a useful format. To do this you need `dot` which is part of graphviz. `apt-get install graphvis` 35 | 36 | ##Adding things to your path 37 | Since Kaldi has not been install to any location -- just built in place. 38 | Nothing is on your path. 39 | 40 | The build process, spreads out all the binaries into a number of folders in `\kaldi-trunk\src/*bin`, 41 | intermixing them with the source files. (so you can't just add the bin files to your path). 42 | 43 | You might like to symlink all executables into one folder and add it to your path. 44 | The symlinking can be done with the following shell script: 45 | 46 | ``` 47 | for a in `find . -type f -executable -print`; 48 | do 49 | ln -s `pwd`/$a bins 50 | done 51 | ``` 52 | This will put them all into the bins directory. 53 | Then you can edit your `.bashrc` file to add that to your path. 54 | e.g.: 55 | 56 | ``` 57 | PATH="/user/data7/20361362/kaldi/kaldi-trunk/src/bins:${PATH}" 58 | ``` 59 | 60 | 61 | 62 | -------------------------------------------------------------------------------- /javascripts/main.js: -------------------------------------------------------------------------------- 1 | console.log('This would be the main JS file.'); 2 | -------------------------------------------------------------------------------- /params.json: -------------------------------------------------------------------------------- 1 | {"name":"Kaldi-notes","tagline":"Some notes on Kaldi","body":"### Welcome to GitHub Pages.\r\nThis automatic page generator is the easiest way to create beautiful pages for all of your projects. Author your page content here using GitHub Flavored Markdown, select a template crafted by a designer, and publish. 
After your page is generated, you can check out the new branch:\r\n\r\n```\r\n$ cd your_repo_root/repo_name\r\n$ git fetch origin\r\n$ git checkout gh-pages\r\n```\r\n\r\nIf you're using the GitHub for Mac, simply sync your repository and you'll see the new branch.\r\n\r\n### Designer Templates\r\nWe've crafted some handsome templates for you to use. Go ahead and continue to layouts to browse through them. You can easily go back to edit your page before publishing. After publishing your page, you can revisit the page generator and switch to another theme. Your Page content will be preserved if it remained markdown format.\r\n\r\n### Rather Drive Stick?\r\nIf you prefer to not use the automatic generator, push a branch named `gh-pages` to your repository to create a page manually. In addition to supporting regular HTML content, GitHub Pages support Jekyll, a simple, blog aware static site generator written by our own Tom Preston-Werner. Jekyll makes it easy to create site-wide headers and footers without having to copy them across every page. It also offers intelligent blog support and other advanced templating features.\r\n\r\n### Authors and Contributors\r\nYou can @mention a GitHub username to generate a link to their profile. The resulting `` element will link to the contributor's GitHub Profile. For example: In 2007, Chris Wanstrath (@defunkt), PJ Hyett (@pjhyett), and Tom Preston-Werner (@mojombo) founded GitHub.\r\n\r\n### Support or Contact\r\nHaving trouble with Pages? Check out the documentation at https://help.github.com/pages or contact support@github.com and we’ll help you sort it out.\r\n","google":"","note":"Don't delete this file! It's used internally to help with page regeneration."} -------------------------------------------------------------------------------- /required_knowledge.txt: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: Prerequisite Knowledge 4 | --- 5 | 6 | #Required Knowledge 7 | 8 | To make use of Kaldi, there is some significant prior required knowledge. 9 | 10 | - Bash 11 | - Awk 12 | - Perl 13 | - Python 14 | 15 | While Kaldi is made in C++, knowledge of C++ is not required to use it. 16 | 17 | ##Bash 18 | Example run scripts in Kaldi are written in Bash. 19 | Not POSIX shell, but Bash, they contain some "Bashisms", which will not reliably work in other shells. 20 | 21 | Bash is the main glue that holds Kaldi all together. 22 | It is used to prepare the data, 23 | prepare the language files, 24 | trigger the training, 25 | and print the results. 26 | 27 | Knowing how to at least read bash is a must for using Kaldi. 28 | The others languages can be picked up as required, but bash is a must if you want to make use of the example scripts. 29 | You most likely *do* want to make use of the example scripts, they are some serious part of the documentation, and exist for most datasets -- doing a lot of the work for you 30 | 31 | Bash is however quiet easy for anyone who has work on the unix shell. 32 | 33 | Key knowledge areas: 34 | 35 | - Conditionals 36 | - Loops 37 | - Pipelines / input/output redirection 38 | - variables 39 | - the PATH 40 | 41 | ##Awk 42 | Awk is on of the standard tools for doing string manipulation on the command line, 43 | along with sed, grep, inline perl and simpler tools like tr, cut, head etc. 44 | 45 | It is used a lot in the preparing of fst file inputs. A lot of inline Awk can be found in the aforementioned bash recipes. 
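For a flavour of what that inline Awk looks like, here is a hedged sketch (not taken from any particular recipe) of the kind of one-liner used to build an OpenFST symbol table -- mapping each symbol seen in a textual FST to a unique integer, with 0 reserved for the epsilon symbol "-" -- here applied to the `dict.fst.txt` example from the OpenFST section of these notes:

```
# Take the 3rd field (the input symbol) of each arc line, deduplicate,
# and number the symbols from 1, printing "- 0" for epsilon first.
awk 'NF > 2 { print $3 }' dict.fst.txt | sort -u | grep -v '^-$' |
    awk 'BEGIN { print "- 0" } { print $1, NR }' > dict.isyms
```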
46 | 47 | It is a bit more complicated to understand than sed or grep, in that rather than a regex-tool it is a programming language. 48 | Many decent tutorials exist online. 49 | 50 | ##Perl 51 | Perl carries out a lot of the heavy lifting of kaldi setup. 52 | It is used for preparing data, where it gets to complex for Bash+Awk. 53 | It is used to facilitate the parallel (and/or distributed) task execution. 54 | It shows up throughout the examples. 55 | 56 | ##Python 57 | Python rarely shows up in the example scripts for kaldi, but it does show up. 58 | When it does, it is doing a task similar to those perl is used for. 59 | It is worth knowing, and using, as it is not a good idea to unleash more perl scripts into the wild. 60 | Combining in tools like [Plumbum](https://pypi.python.org/pypi/plumbum), it could also be used to replace Bash -- this however is less portable. 61 | Other tools like [pyp](http://code.google.com/p/pyp/), can be used to replace Awk -- again losing portability. 62 | 63 | -------------------------------------------------------------------------------- /resources.txt: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: Other Resources 4 | --- 5 | #Other Resources 6 | This document is quiet interlinked to other resources, as appropriate for each section and subsection. These and some father resources are summeriest below. 7 | 8 | ##OpenFST 9 | 10 | - [OpenFST Documentation](http://www.openfst.org/twiki/bin/view/FST/WebHome) 11 | - [Quick Tour](http://www.openfst.org/twiki/bin/view/FST/FstQuickTour) 12 | - [Examples](http://www.openfst.org/twiki/bin/view/FST/FstExamples) 13 | - [Tutorial Sheet from University of Illinois](http://www.isle.illinois.edu/sst/courses/minicourses/2009/lecture6.pdf) 14 | - [Lecture Slides from University of Tokyo](http://www.gavo.t.u-tokyo.ac.jp/~novakj/wfst-algorithms.pdf) 15 | - [Speech Recognition with Weighted Finite-state Transducers (book chapter)](http://www.cs.nyu.edu/~mohri/pub/hbka.pdf) 16 | 17 | 18 | ##Kaldi 19 | 20 | - [Kaldi Documentation](http://kaldi.sourceforge.net/) 21 | - [Tutorial](http://kaldi.sourceforge.net/tutorial.html) 22 | - Several pages by Vassil Panayotov 23 | - [This blog post on Graph construction](http://vpanayotov.blogspot.com.au/2012/06/kaldi-decoding-graph-construction.html) 24 | - [These instructions on recipe setup](http://vpanayotov.blogspot.com.au/2012/02/poor-mans-kaldi-recipe-setup.html) 25 | - [This tutorial](http://analytcz.com/kaldi-hybrid-mlphmm-asr-2/) by the same author extends the above. 
But its web hosting does not seem stable, right now the [google cached version can be used](http://webcache.googleusercontent.com/search?q=cache:z-MGlCv917sJ:analytcz.com/kaldi-hybrid-mlphmm-asr-2/) 26 | - [This Masters Thesis](https://github.com/oplatek/kaldi-thesis/blob/master/text/tags/oplatek_thesis013.pdf?raw=true) 27 | - [Lecture slides from National Taiwan Normal University](http://berlin.csie.ntnu.edu.tw/Courses/Speech%20Recognition/Lectures2013/SP2013F_Lecture14-Introduction%20to%20the%20Kaldi%20toolkit.pdf) 28 | - [Lecture Slides from Dan Povey (one of kaldi's creators)](http://danielpovey.com/kaldi-lectures.html) 29 | -------------------------------------------------------------------------------- /stylesheets/print.css: -------------------------------------------------------------------------------- 1 | html, body, div, span, applet, object, iframe, 2 | h1, h2, h3, h4, h5, h6, p, blockquote, pre, 3 | a, abbr, acronym, address, big, cite, code, 4 | del, dfn, em, img, ins, kbd, q, s, samp, 5 | small, strike, strong, sub, sup, tt, var, 6 | b, u, i, center, 7 | dl, dt, dd, ol, ul, li, 8 | fieldset, form, label, legend, 9 | table, caption, tbody, tfoot, thead, tr, th, td, 10 | article, aside, canvas, details, embed, 11 | figure, figcaption, footer, header, hgroup, 12 | menu, nav, output, ruby, section, summary, 13 | time, mark, audio, video { 14 | padding: 0; 15 | margin: 0; 16 | font: inherit; 17 | font-size: 100%; 18 | vertical-align: baseline; 19 | border: 0; 20 | } 21 | /* HTML5 display-role reset for older browsers */ 22 | article, aside, details, figcaption, figure, 23 | footer, header, hgroup, menu, nav, section { 24 | display: block; 25 | } 26 | body { 27 | line-height: 1; 28 | } 29 | ol, ul { 30 | list-style: none; 31 | } 32 | blockquote, q { 33 | quotes: none; 34 | } 35 | blockquote:before, blockquote:after, 36 | q:before, q:after { 37 | content: ''; 38 | content: none; 39 | } 40 | table { 41 | border-spacing: 0; 42 | border-collapse: collapse; 43 | } 44 | body { 45 | font-family: 'Helvetica Neue', Helvetica, Arial, serif; 46 | font-size: 13px; 47 | line-height: 1.5; 48 | color: #000; 49 | } 50 | 51 | a { 52 | font-weight: bold; 53 | color: #d5000d; 54 | } 55 | 56 | header { 57 | padding-top: 35px; 58 | padding-bottom: 10px; 59 | } 60 | 61 | header h1 { 62 | font-size: 48px; 63 | font-weight: bold; 64 | line-height: 1.2; 65 | color: #303030; 66 | letter-spacing: -1px; 67 | } 68 | 69 | header h2 { 70 | font-size: 24px; 71 | font-weight: normal; 72 | line-height: 1.3; 73 | color: #aaa; 74 | letter-spacing: -1px; 75 | } 76 | #downloads { 77 | display: none; 78 | } 79 | #main_content { 80 | padding-top: 20px; 81 | } 82 | 83 | code, pre { 84 | margin-bottom: 30px; 85 | font-family: Monaco, "Bitstream Vera Sans Mono", "Lucida Console", Terminal; 86 | font-size: 12px; 87 | color: #222; 88 | } 89 | 90 | code { 91 | padding: 0 3px; 92 | } 93 | 94 | pre { 95 | padding: 20px; 96 | overflow: auto; 97 | border: solid 1px #ddd; 98 | } 99 | pre code { 100 | padding: 0; 101 | } 102 | 103 | ul, ol, dl { 104 | margin-bottom: 20px; 105 | } 106 | 107 | 108 | /* COMMON STYLES */ 109 | 110 | table { 111 | width: 100%; 112 | border: 1px solid #ebebeb; 113 | } 114 | 115 | th { 116 | font-weight: 500; 117 | } 118 | 119 | td { 120 | font-weight: 300; 121 | text-align: center; 122 | border: 1px solid #ebebeb; 123 | } 124 | 125 | form { 126 | padding: 20px; 127 | background: #f2f2f2; 128 | 129 | } 130 | 131 | 132 | /* GENERAL ELEMENT TYPE STYLES */ 133 | 134 | h1 { 135 | font-size: 2.8em; 136 | } 137 
| 138 | h2 { 139 | margin-bottom: 8px; 140 | font-size: 22px; 141 | font-weight: bold; 142 | color: #303030; 143 | } 144 | 145 | h3 { 146 | margin-bottom: 8px; 147 | font-size: 18px; 148 | font-weight: bold; 149 | color: #d5000d; 150 | } 151 | 152 | h4 { 153 | font-size: 16px; 154 | font-weight: bold; 155 | color: #303030; 156 | } 157 | 158 | h5 { 159 | font-size: 1em; 160 | color: #303030; 161 | } 162 | 163 | h6 { 164 | font-size: .8em; 165 | color: #303030; 166 | } 167 | 168 | p { 169 | margin-bottom: 20px; 170 | font-weight: 300; 171 | } 172 | 173 | a { 174 | text-decoration: none; 175 | } 176 | 177 | p a { 178 | font-weight: 400; 179 | } 180 | 181 | blockquote { 182 | padding: 0 0 0 30px; 183 | margin-bottom: 20px; 184 | font-size: 1.6em; 185 | border-left: 10px solid #e9e9e9; 186 | } 187 | 188 | ul li { 189 | list-style-position: inside; 190 | list-style: disc; 191 | padding-left: 20px; 192 | } 193 | 194 | ol li { 195 | list-style-position: inside; 196 | list-style: decimal; 197 | padding-left: 3px; 198 | } 199 | 200 | dl dd { 201 | font-style: italic; 202 | font-weight: 100; 203 | } 204 | 205 | footer { 206 | padding-top: 20px; 207 | padding-bottom: 30px; 208 | margin-top: 40px; 209 | font-size: 13px; 210 | color: #aaa; 211 | } 212 | 213 | footer a { 214 | color: #666; 215 | } 216 | 217 | /* MISC */ 218 | .clearfix:after { 219 | display: block; 220 | height: 0; 221 | clear: both; 222 | visibility: hidden; 223 | content: '.'; 224 | } 225 | 226 | .clearfix {display: inline-block;} 227 | * html .clearfix {height: 1%;} 228 | .clearfix {display: block;} 229 | -------------------------------------------------------------------------------- /stylesheets/pygment_trac.css: -------------------------------------------------------------------------------- 1 | .highlight { background: #ffffff; } 2 | .highlight .c { color: #999988; font-style: italic } /* Comment */ 3 | .highlight .err { color: #a61717; background-color: #e3d2d2 } /* Error */ 4 | .highlight .k { font-weight: bold } /* Keyword */ 5 | .highlight .o { font-weight: bold } /* Operator */ 6 | .highlight .cm { color: #999988; font-style: italic } /* Comment.Multiline */ 7 | .highlight .cp { color: #999999; font-weight: bold } /* Comment.Preproc */ 8 | .highlight .c1 { color: #999988; font-style: italic } /* Comment.Single */ 9 | .highlight .cs { color: #999999; font-weight: bold; font-style: italic } /* Comment.Special */ 10 | .highlight .gd { color: #000000; background-color: #ffdddd } /* Generic.Deleted */ 11 | .highlight .gd .x { color: #000000; background-color: #ffaaaa } /* Generic.Deleted.Specific */ 12 | .highlight .ge { font-style: italic } /* Generic.Emph */ 13 | .highlight .gr { color: #aa0000 } /* Generic.Error */ 14 | .highlight .gh { color: #999999 } /* Generic.Heading */ 15 | .highlight .gi { color: #000000; background-color: #ddffdd } /* Generic.Inserted */ 16 | .highlight .gi .x { color: #000000; background-color: #aaffaa } /* Generic.Inserted.Specific */ 17 | .highlight .go { color: #888888 } /* Generic.Output */ 18 | .highlight .gp { color: #555555 } /* Generic.Prompt */ 19 | .highlight .gs { font-weight: bold } /* Generic.Strong */ 20 | .highlight .gu { color: #800080; font-weight: bold; } /* Generic.Subheading */ 21 | .highlight .gt { color: #aa0000 } /* Generic.Traceback */ 22 | .highlight .kc { font-weight: bold } /* Keyword.Constant */ 23 | .highlight .kd { font-weight: bold } /* Keyword.Declaration */ 24 | .highlight .kn { font-weight: bold } /* Keyword.Namespace */ 25 | .highlight .kp { font-weight: bold } /* 
Keyword.Pseudo */ 26 | .highlight .kr { font-weight: bold } /* Keyword.Reserved */ 27 | .highlight .kt { color: #445588; font-weight: bold } /* Keyword.Type */ 28 | .highlight .m { color: #009999 } /* Literal.Number */ 29 | .highlight .s { color: #d14 } /* Literal.String */ 30 | .highlight .na { color: #008080 } /* Name.Attribute */ 31 | .highlight .nb { color: #0086B3 } /* Name.Builtin */ 32 | .highlight .nc { color: #445588; font-weight: bold } /* Name.Class */ 33 | .highlight .no { color: #008080 } /* Name.Constant */ 34 | .highlight .ni { color: #800080 } /* Name.Entity */ 35 | .highlight .ne { color: #990000; font-weight: bold } /* Name.Exception */ 36 | .highlight .nf { color: #990000; font-weight: bold } /* Name.Function */ 37 | .highlight .nn { color: #555555 } /* Name.Namespace */ 38 | .highlight .nt { color: #000080 } /* Name.Tag */ 39 | .highlight .nv { color: #008080 } /* Name.Variable */ 40 | .highlight .ow { font-weight: bold } /* Operator.Word */ 41 | .highlight .w { color: #bbbbbb } /* Text.Whitespace */ 42 | .highlight .mf { color: #009999 } /* Literal.Number.Float */ 43 | .highlight .mh { color: #009999 } /* Literal.Number.Hex */ 44 | .highlight .mi { color: #009999 } /* Literal.Number.Integer */ 45 | .highlight .mo { color: #009999 } /* Literal.Number.Oct */ 46 | .highlight .sb { color: #d14 } /* Literal.String.Backtick */ 47 | .highlight .sc { color: #d14 } /* Literal.String.Char */ 48 | .highlight .sd { color: #d14 } /* Literal.String.Doc */ 49 | .highlight .s2 { color: #d14 } /* Literal.String.Double */ 50 | .highlight .se { color: #d14 } /* Literal.String.Escape */ 51 | .highlight .sh { color: #d14 } /* Literal.String.Heredoc */ 52 | .highlight .si { color: #d14 } /* Literal.String.Interpol */ 53 | .highlight .sx { color: #d14 } /* Literal.String.Other */ 54 | .highlight .sr { color: #009926 } /* Literal.String.Regex */ 55 | .highlight .s1 { color: #d14 } /* Literal.String.Single */ 56 | .highlight .ss { color: #990073 } /* Literal.String.Symbol */ 57 | .highlight .bp { color: #999999 } /* Name.Builtin.Pseudo */ 58 | .highlight .vc { color: #008080 } /* Name.Variable.Class */ 59 | .highlight .vg { color: #008080 } /* Name.Variable.Global */ 60 | .highlight .vi { color: #008080 } /* Name.Variable.Instance */ 61 | .highlight .il { color: #009999 } /* Literal.Number.Integer.Long */ 62 | 63 | .type-csharp .highlight .k { color: #0000FF } 64 | .type-csharp .highlight .kt { color: #0000FF } 65 | .type-csharp .highlight .nf { color: #000000; font-weight: normal } 66 | .type-csharp .highlight .nc { color: #2B91AF } 67 | .type-csharp .highlight .nn { color: #000000 } 68 | .type-csharp .highlight .s { color: #A31515 } 69 | .type-csharp .highlight .sc { color: #A31515 } 70 | -------------------------------------------------------------------------------- /stylesheets/stylesheet.css: -------------------------------------------------------------------------------- 1 | /* http://meyerweb.com/eric/tools/css/reset/ 2 | v2.0 | 20110126 3 | License: none (public domain) 4 | */ 5 | html, body, div, span, applet, object, iframe, 6 | h1, h2, h3, h4, h5, h6, p, blockquote, pre, 7 | a, abbr, acronym, address, big, cite, code, 8 | del, dfn, em, img, ins, kbd, q, s, samp, 9 | small, strike, strong, sub, sup, tt, var, 10 | b, u, i, center, 11 | dl, dt, dd, ol, ul, li, 12 | fieldset, form, label, legend, 13 | table, caption, tbody, tfoot, thead, tr, th, td, 14 | article, aside, canvas, details, embed, 15 | figure, figcaption, footer, header, hgroup, 16 | menu, nav, output, ruby, section, 
summary, 17 | time, mark, audio, video { 18 | padding: 0; 19 | margin: 0; 20 | font: inherit; 21 | font-size: 100%; 22 | vertical-align: baseline; 23 | border: 0; 24 | } 25 | /* HTML5 display-role reset for older browsers */ 26 | article, aside, details, figcaption, figure, 27 | footer, header, hgroup, menu, nav, section { 28 | display: block; 29 | } 30 | body { 31 | line-height: 1; 32 | } 33 | ol, ul { 34 | list-style: none; 35 | } 36 | blockquote, q { 37 | quotes: none; 38 | } 39 | blockquote:before, blockquote:after, 40 | q:before, q:after { 41 | content: ''; 42 | content: none; 43 | } 44 | table { 45 | border-spacing: 0; 46 | border-collapse: collapse; 47 | } 48 | 49 | /* LAYOUT STYLES */ 50 | body { 51 | font-family: 'Helvetica Neue', Helvetica, Arial, serif; 52 | font-size: 1em; 53 | line-height: 1.5; 54 | color: #6d6d6d; 55 | text-shadow: 0 1px 0 rgba(255, 255, 255, 0.8); 56 | background: #e7e7e7 url(../images/body-bg.png) 0 0 repeat; 57 | } 58 | 59 | 60 | a { 61 | color: #d5000d; 62 | text-decoration: underline; 63 | } 64 | a:hover { 65 | color: #a5000a; 66 | } 67 | 68 | 69 | header { 70 | padding-top: 35px; 71 | padding-bottom: 25px; 72 | } 73 | 74 | header h1 { 75 | font-family: 'Chivo', 'Helvetica Neue', Helvetica, Arial, serif; 76 | font-size: 48px; font-weight: 900; 77 | line-height: 1.2; 78 | color: #303030; 79 | letter-spacing: -1px; 80 | } 81 | 82 | header h2 { 83 | font-size: 24px; 84 | font-weight: normal; 85 | line-height: 1.3; 86 | color: #aaa; 87 | letter-spacing: -1px; 88 | } 89 | 90 | #container { 91 | min-height: 595px; 92 | background: transparent url(../images/highlight-bg.jpg) 50% 0 no-repeat; 93 | } 94 | 95 | .inner { 96 | width: 620px; 97 | margin: 0 auto; 98 | } 99 | 100 | #container .inner img { 101 | max-width: 100%; 102 | } 103 | 104 | #downloads { 105 | margin-bottom: 40px; 106 | } 107 | 108 | #nav { 109 | float: left; 110 | position: fixed; 111 | top: 50%; 112 | background-color: #6D6D6D; 113 | color: #D5000D; 114 | padding-top: 20px; 115 | padding-bottom: 20px; 116 | padding-right: 20px; 117 | text-shadow: none; 118 | border-width: 1px 1px 1px 0px; 119 | border-style: solid; 120 | border-top-right-radius: 10px; 121 | border-bottom-right-radius: 10px; 122 | } 123 | 124 | .navtop { 125 | font-weight: bold; 126 | } 127 | 128 | 129 | .navsub{ 130 | font-weight: normal; 131 | } 132 | 133 | 134 | a.button { 135 | display: block; 136 | float: left; 137 | width: 179px; 138 | padding: 12px 8px 12px 8px; 139 | margin-right: 14px; 140 | font-size: 15px; 141 | font-weight: bold; 142 | line-height: 25px; 143 | color: #303030; 144 | background: #fdfdfd; /* Old browsers */ 145 | background: -moz-linear-gradient(top, #fdfdfd 0%, #f2f2f2 100%); /* FF3.6+ */ 146 | background: -webkit-gradient(linear, left top, left bottom, color-stop(0%,#fdfdfd), color-stop(100%,#f2f2f2)); /* Chrome,Safari4+ */ 147 | background: -webkit-linear-gradient(top, #fdfdfd 0%,#f2f2f2 100%); /* Chrome10+,Safari5.1+ */ 148 | background: -o-linear-gradient(top, #fdfdfd 0%,#f2f2f2 100%); /* Opera 11.10+ */ 149 | background: -ms-linear-gradient(top, #fdfdfd 0%,#f2f2f2 100%); /* IE10+ */ 150 | background: linear-gradient(top, #fdfdfd 0%,#f2f2f2 100%); /* W3C */ 151 | filter: progid:DXImageTransform.Microsoft.gradient( startColorstr='#fdfdfd', endColorstr='#f2f2f2',GradientType=0 ); /* IE6-9 */ 152 | border-top: solid 1px #cbcbcb; 153 | border-right: solid 1px #b7b7b7; 154 | border-bottom: solid 1px #b3b3b3; 155 | border-left: solid 1px #b7b7b7; 156 | border-radius: 30px; 157 | -webkit-box-shadow: 10px 
10px 5px #888; 158 | -moz-box-shadow: 10px 10px 5px #888; 159 | box-shadow: 0px 1px 5px #e8e8e8; 160 | -moz-border-radius: 30px; 161 | -webkit-border-radius: 30px; 162 | } 163 | a.button:hover { 164 | background: #fafafa; /* Old browsers */ 165 | background: -moz-linear-gradient(top, #fdfdfd 0%, #f6f6f6 100%); /* FF3.6+ */ 166 | background: -webkit-gradient(linear, left top, left bottom, color-stop(0%,#fdfdfd), color-stop(100%,#f6f6f6)); /* Chrome,Safari4+ */ 167 | background: -webkit-linear-gradient(top, #fdfdfd 0%,#f6f6f6 100%); /* Chrome10+,Safari5.1+ */ 168 | background: -o-linear-gradient(top, #fdfdfd 0%,#f6f6f6 100%); /* Opera 11.10+ */ 169 | background: -ms-linear-gradient(top, #fdfdfd 0%,#f6f6f6 100%); /* IE10+ */ 170 | background: linear-gradient(top, #fdfdfd 0%,#f6f6f6, 100%); /* W3C */ 171 | filter: progid:DXImageTransform.Microsoft.gradient( startColorstr='#fdfdfd', endColorstr='#f6f6f6',GradientType=0 ); /* IE6-9 */ 172 | border-top: solid 1px #b7b7b7; 173 | border-right: solid 1px #b3b3b3; 174 | border-bottom: solid 1px #b3b3b3; 175 | border-left: solid 1px #b3b3b3; 176 | } 177 | 178 | a.button span { 179 | display: block; 180 | height: 23px; 181 | padding-left: 50px; 182 | } 183 | 184 | #download-zip span { 185 | background: transparent url(../images/zip-icon.png) 12px 50% no-repeat; 186 | } 187 | #download-tar-gz span { 188 | background: transparent url(../images/tar-gz-icon.png) 12px 50% no-repeat; 189 | } 190 | #view-on-github span { 191 | background: transparent url(../images/octocat-icon.png) 12px 50% no-repeat; 192 | } 193 | #view-on-github { 194 | margin-right: 0; 195 | } 196 | 197 | code, pre { 198 | margin-bottom: 30px; 199 | font-family: Monaco, "Bitstream Vera Sans Mono", "Lucida Console", Terminal, monospace; 200 | font-size: 14px; 201 | color: #222; 202 | } 203 | 204 | code { 205 | padding: 0 3px; 206 | background-color: #f2f2f2; 207 | border: solid 1px #ddd; 208 | } 209 | 210 | pre { 211 | padding: 20px; 212 | overflow: auto; 213 | color: #f2f2f2; 214 | text-shadow: none; 215 | background: #303030; 216 | } 217 | pre code { 218 | padding: 0; 219 | color: #f2f2f2; 220 | background-color: #303030; 221 | border: none; 222 | } 223 | 224 | ul, ol, dl { 225 | margin-bottom: 20px; 226 | } 227 | 228 | 229 | /* COMMON STYLES */ 230 | 231 | hr { 232 | height: 1px; 233 | padding-bottom: 1em; 234 | margin-top: 1em; 235 | line-height: 1px; 236 | background: transparent url('../images/hr.png') 50% 0 no-repeat; 237 | border: none; 238 | } 239 | 240 | strong { 241 | font-weight: bold; 242 | } 243 | 244 | em { 245 | font-style: italic; 246 | } 247 | 248 | table { 249 | width: 100%; 250 | border: 1px solid #ebebeb; 251 | } 252 | 253 | th { 254 | font-weight: 500; 255 | } 256 | 257 | td { 258 | font-weight: 300; 259 | text-align: center; 260 | border: 1px solid #ebebeb; 261 | } 262 | 263 | form { 264 | padding: 20px; 265 | background: #f2f2f2; 266 | 267 | } 268 | 269 | 270 | /* GENERAL ELEMENT TYPE STYLES */ 271 | 272 | h1 { 273 | font-size: 32px; 274 | } 275 | 276 | h2 { 277 | margin-bottom: 8px; 278 | font-size: 22px; 279 | font-weight: bold; 280 | color: #303030; 281 | } 282 | 283 | h3 { 284 | margin-bottom: 8px; 285 | font-size: 18px; 286 | font-weight: bold; 287 | color: #d5000d; 288 | } 289 | 290 | h4 { 291 | font-size: 16px; 292 | font-weight: bold; 293 | color: #303030; 294 | } 295 | 296 | h5 { 297 | font-size: 1em; 298 | color: #303030; 299 | } 300 | 301 | h6 { 302 | font-size: .8em; 303 | color: #303030; 304 | } 305 | 306 | p { 307 | margin-bottom: 20px; 308 | 
font-weight: 300; 309 | } 310 | 311 | 312 | p a { 313 | font-weight: 400; 314 | } 315 | 316 | blockquote { 317 | padding: 0 0 0 30px; 318 | margin-bottom: 20px; 319 | font-style: italic; 320 | border-left: 10px solid #e9e9e9; 321 | } 322 | 323 | ul li { 324 | list-style-position: inside; 325 | list-style: disc !important; //HACK: for nesing UL in OL 326 | padding-left: 20px; 327 | } 328 | 329 | ol li { 330 | list-style-position: inside; 331 | list-style: decimal; 332 | padding-left: 3px; 333 | } 334 | 335 | ul > li { 336 | margin-left: 20px; 337 | } 338 | 339 | dl dt { 340 | color: #303030; 341 | } 342 | 343 | footer { 344 | padding-top: 20px; 345 | padding-bottom: 30px; 346 | margin-top: 40px; 347 | font-size: 13px; 348 | color: #aaa; 349 | background: transparent url('../images/hr.png') 0 0 no-repeat; 350 | } 351 | 352 | footer a { 353 | color: #666; 354 | } 355 | footer a:hover { 356 | color: #444; 357 | } 358 | 359 | /* MISC */ 360 | .clearfix:after { 361 | display: block; 362 | height: 0; 363 | clear: both; 364 | visibility: hidden; 365 | content: '.'; 366 | } 367 | 368 | .clearfix {display: inline-block;} 369 | * html .clearfix {height: 1%;} 370 | .clearfix {display: block;} 371 | 372 | /* #Media Queries 373 | ================================================== */ 374 | 375 | /* Smaller than standard 960 (devices and browsers) */ 376 | @media only screen and (max-width: 959px) { } 377 | 378 | /* Tablet Portrait size to standard 960 (devices and browsers) */ 379 | @media only screen and (min-width: 768px) and (max-width: 959px) { } 380 | 381 | /* All Mobile Sizes (devices and browser) */ 382 | @media only screen and (max-width: 767px) { 383 | header { 384 | padding-top: 10px; 385 | padding-bottom: 10px; 386 | } 387 | #downloads { 388 | margin-bottom: 25px; 389 | } 390 | #download-zip, #download-tar-gz { 391 | display: none; 392 | } 393 | .inner { 394 | width: 94%; 395 | margin: 0 auto; 396 | } 397 | } 398 | 399 | /* Mobile Landscape Size to Tablet Portrait (devices and browsers) */ 400 | @media only screen and (min-width: 480px) and (max-width: 767px) { } 401 | 402 | /* Mobile Portrait Size to Mobile Landscape Size (devices and browsers) */ 403 | @media only screen and (max-width: 479px) { } 404 | -------------------------------------------------------------------------------- /tidigits/174o2o8a.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oxinabox/Kaldi-Notes/845047ba0191440222338b1f3d2310b2e8e14df9/tidigits/174o2o8a.png -------------------------------------------------------------------------------- /tidigits/174o2o8aPhones.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oxinabox/Kaldi-Notes/845047ba0191440222338b1f3d2310b2e8e14df9/tidigits/174o2o8aPhones.png -------------------------------------------------------------------------------- /tidigits/LGFST.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oxinabox/Kaldi-Notes/845047ba0191440222338b1f3d2310b2e8e14df9/tidigits/LGFST.png -------------------------------------------------------------------------------- /tidigits/TODO.md: -------------------------------------------------------------------------------- 1 | - Include more examples at all stages 2 | - Check constancy in path references. 
3 | - Spelling check 4 | -------------------------------------------------------------------------------- /tidigits/data_prep.txt: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: Data Preparation 4 | --- 5 | 6 | #Data Preparation 7 | [The official kaldi documentation on this section](http://kaldi.sourceforge.net/data_prep.html#data_prep_data). It is the basis of a lot of this section. 8 | 9 | These steps are carried out by the script `local/tidigits_data_prep.sh`. It takes one parameter -- the path to the dataset. 10 | 11 | One should realize after looking at this section (and the next), just how valuable AWK and Bash (or equivalents) are for this task. 12 | 13 | 14 | ##Locate the Dataset 15 | on on the SIP network, the TIDIGITs data set can be found at `/user/data14/res/speech_data/TIDIGITs/`. Symlink it into a convenient location. 16 | 17 | ###Split the Dataset into test and training 18 | TIDIGITS is already split into test and training datasets. 19 | If it were not, you would need to do the split. 20 | It could be done at any time during the data preparation step, 21 | depending on when other useful informations (from the annotations), 22 | is available. 23 | 24 | ##Parse its annotations 25 | Annotations of the correct labels for each utterance need to be generated for the `test` and `training` directories. 26 | 27 | ###Kaldi Script: `.scp`: Basically just a list of Utterances to Filenames 28 | A Kaldi script file is just a mapping from record_id, to extended-filenames. 29 | 30 | Line Format: 31 | 32 | ``` 33 | 34 | ``` 35 | 36 | ####Recording ID 37 | The recording ID is the first part of each line in a `.scp` file. 38 | If speaker id is available (which is is for TIDIGITs), it should form the first part of the recording id. 39 | Kaldi requires this not for speaker identification, but for purposes of sorting for training (`utt2spk` is for that). 40 | 41 | The remained of the Speaker ID is arbitary, so long as it is unique. 42 | For convenience of generating the unique id, the example script for TIDIGITS uses 43 | `_`. 44 | 45 | As there is only one Utterance per recording in TIDIGITS, the Recording ID is the Utterance ID. 46 | (See below) 47 | 48 | 49 | ####Extended Filename 50 | The second part of the line is the extended filename 51 | Extended Filename is the term used by Kaldi, to refer to a string that is either the path to a wav-format file or it is a bash command that will output wav-format data to standard out, followed by a pipe symbol (`|`). 52 | 53 | As the TIDIGITS data is in the [SPHERE audio format](http://www.ee.columbia.edu/ln/LabROSA/doc/HTKBook21/node64.html), it needs to be converted to wav. 54 | So the sample scripts in Kaldi use `sph2pipe` to convert them, so the .scp files lines will look like: (assuming `sph2pipe` is on your PATH, otherwise Path to the executable will need to be used) 55 | 56 | ``` 57 | ad_16a sph2pipe -f wav ../TIDIGITs/test/girl/ad/16a.wav | 58 | ``` 59 | 60 | ###Segmentation File `segments` 61 | If there were multiple utterances per recording then there would need to be a segmentation file as well, mapping Recording Ids and Start-End times to Utterance IDs. 62 | (See [The official kaldi documentation on this section](http://kaldi.sourceforge.net/data_prep.html#data_prep_data)). 63 | As there is not, by not creating a `segments` file, Kaldi defaults to utterance id == recording id. 
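If you did need a `segments` file, each of its lines maps an utterance to a time range within a recording, in the form `utterance-id recording-id segment-start segment-end`, with times in seconds. A hypothetical sketch (the IDs and times below are invented, since TIDIGITS does not need this file):

```
ad_16a_001 ad_16a 0.00 1.52
ad_16a_002 ad_16a 1.52 3.10
```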
###Segmentation File `segments`
If there were multiple utterances per recording then there would also need to be a segmentation file, mapping recording IDs and start-end times to utterance IDs.
(See [the official kaldi documentation on this section](http://kaldi.sourceforge.net/data_prep.html#data_prep_data).)
As there is not, we simply do not create a `segments` file, and Kaldi defaults to utterance id == recording id.


###Text Transcription file `text`
The text transcription must be stored in a file, which the example calls `text`.
Each line is an utterance-id followed by a transcription of what is said.
E.g.:

```
ad_1oa 1 o
ad_1z1za 1 z 1 z
ad_1z6a 1 z 6
ad_23461a 2 3 4 6 1
```

Notice the utterance-ID format as described above.
Notice also, for later, that the transcription here is in word space, not phoneme space.

###Utterance to Speaker Mappings `utt2spk`
This file maps each utterance id to a speaker id.
Each line has the form `<utterance-id> <speaker-id>`.

`spk2utt` is the opposite mapping, and can be generated using the script `utils/utt2spk_to_spk2utt.pl`.
Each of its lines starts with a speaker id, followed by every utterance id that speaker spoke.


##Feature extraction
The feature extraction is carried out by the `run.sh` script, rather than by the `local/tidigits_data_prep.sh` script.


###Extracting the MFCC Features
See [this section of the kaldi tutorial](http://kaldi.sourceforge.net/tutorial_running.html#tutorial_running_feats).

The features used are [Mel-frequency cepstral coefficients](http://en.wikipedia.org/wiki/Mel-frequency_cepstrum) (MFCCs).
Extraction is done using the script `steps/make_mfcc.sh`.



####Compute Cepstral Mean and Variance Normalization statistics
Done using the script `steps/compute_cmvn_stats.sh`.

##Data Splitting
The data needs to be divided up so that we can run many Jobs in parallel.
The data splitting is carried out by the `steps/train_mono.sh` and `steps/decode.sh` scripts if it has not already been done, rather than by the `local/tidigits_data_prep.sh` script. It can, however, be carried out at any time after the training and test directories are created and the features extracted.

It can be done with the script `utils/split_data.sh`.
Usage:

```
utils/split_data.sh <data-dir> <num-splits>
```

- `<data-dir>` is the directory where the data is. In this case it would be both of `data/test` and `data/train`.
- `<num-splits>` is the number of divisions of the data needed. It should be the number of different Jobs.

--------------------------------------------------------------------------------
/tidigits/eval.txt:
--------------------------------------------------------------------------------
---
layout: default
title: Evaluation
---
#Evaluation -- using the model to recognise speech

##Decoding
We already created a decoding graph in the [training step](training).
Using this graph to decode the utterances is done using `steps/decode.sh`.
This script only works for certain feature types -- conveniently, all the feature types we use in TIDIGITS. (Similar decoding scripts exist in `steps/` for other feature types.)

###Usage for `steps/decode.sh`

Usage:

```
steps/decode.sh [options] <graph-dir> <test-data-dir> <decode-dir>
```

- `graph-dir` is the path to the directory containing the graphs generated in the previous step
- `test-data-dir` is the path to the test data directory [prepared earlier](./data_prep)
- `decode-dir` is a path to store all of its outputs -- including the results of the evaluations. It will be created if it does not exist.
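For the monophone system trained in these notes, a typical invocation looks something like the following (the paths are illustrative, matching the examples used later on this page):

```
steps/decode.sh --nj 4 --cmd run.pl exp/mono0a/graph data/test exp/mono0a/decode
```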
###Configuration / Options
The `decode.sh` script takes many configuration options; these should be familiar from the `train_mono.sh` options in [training](./train).
They can be set by passing them as flags to the script, like so: `--<option-name> <value>`.
Or they can all be put into a bash config script, adding the flag `--config <path-to-config-file>`.
They could also be set by editing the defaults in `steps/decode.sh`, but there is no good reason to do this.


* `nj`: Number of Jobs to run in parallel. (default `4`)
* `cmd`: Job dispatcher script (default `run.pl`)


* `iter`: Iteration of the model to test. The training step actually stores a copy of the model for each iteration; this option can be used to go back and test one of those. Overridden by the `model` option. (default: the final trained model)
* `model`: which model to use, given by path. If given, this overrides `iter`. (default: determined by the value of `iter`)

* `transform-dir`: directory in which to find fMLLR transforms (not useful for TIDIGITS). (default: N/A, only used if fMLLR transforms were applied to the features)
* `scoring-opts`: options passed on to `local/score.sh`. Can be used to set the min and max Language Model Weight for the rescoring to be done. (default: "")
* `num-threads`: number of threads to use. (default 1)
* `parallel-opts`: option string to be supplied to the parallel executor (in our running-locally case, `utils/run.pl`)
  * e.g. '-pe smp 4' if you supply `--num-threads 4`
* `stage`: This is used to allow you to skip some steps, as above. However, decode only has 2 stages: if stage is greater than 0 it will skip decoding and just do scoring. (default `0`)

Options passed on to `kaldi-trunk/src/gmmbin/gmm-latgen-faster`:

* `acwt`: acoustic scale applied to acoustic likelihoods in lattice generation (default 0.083333). It affects the pruning of the lattice (paths of low enough likelihood will be pruned).
* `max_active` (default 7000)
* `beam`: decoding beam (default 13.0)
* `lattice_beam`: lattice generation beam (default 6.0)


###What is the parallelism of Jobs in the Decoding step
During decoding, the test set can be (and in the example is) split up (the actual splitting was done in the [data preparation step](data_prep)),
and each different process (Job) decodes a different subset of the utterances into lattices (see below).
When scoring happens (see below), all the different lattices are evaluated to get the transcriptions.


###Lattices
From the [Kaldi documentation](http://kaldi.sourceforge.net/lattices.html): "A lattice is a representation of the alternative word-sequences that are "sufficiently likely" for a particular utterance."

[This blog post](http://codingandlearning.blogspot.com.au/search/label/KWS14) gives quite a good introduction to the lattices in Kaldi, relating them to the other FSTs.

Kaldi creates and uses these lattices during the decoding step.
However, interpreting them can be hard, because all the command line programs for working with them use [Kaldi's special table IO](http://kaldi.sourceforge.net/io_tut.html); describing how this works in detail is beyond the scope of this introduction.
The command line programs in question can be found in `/kaldi-trunk/src/latbin`.


The lattices are output during decoding into `<decode-dir>`, as numbered gzipped files, e.g. `lat.10.gz`. The number corresponds to the Job number (because the data was distributed over multiple processes). Each gzip contains a single binary file, and each of these archives contains many lattices -- one for each utterance.
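For a first look inside one of these archives, `lattice-copy` (one of the programs in that directory) can dump it in human-readable form. A quick sketch, assuming the decode directory used in the examples below:

```
lattice-copy "ark:gunzip -c exp/mono0a/decode/lat.1.gz|" ark,t:- | head -n 20
```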
Commands to work with them take the general form of:

```
<lattice-program> [options] "ark:gunzip -c <path-to-lat.N.gz> |" ark,t:<output-file>
```

Each of the lattice commands does take the `--help` option, which will cause it to list its other options.

###Example: Converting a lattice to a FST Diagram
For example,
consider the lattice gzipped at `exp/mono0a/decode/lat.1.gz`.

Running:

```
lattice-to-fst "ark:gunzip -c exp/mono0a/decode/lat.1.gz|" ark,t:1.fsts
utils/int2sym.pl -f 3 data/lang/words.txt 1.fsts > 1.txt.fsts
```

(assuming that `/kaldi-trunk/src/latbin` is in your path)

will fill `1.fsts` with a collection of text-form FSTs, one per utterance, separated by blank lines.
Ones with multiple terminal states have multiple different "reasonably likely" phrases possible.
The input labels on the transitions are words (which we restored using int2sym).
The weights are the negative log likelihood of that transition (or of that final state).

As shown below:

```
ad_16a
0 1 1 3 14.7788
1 2 6 8 5.0416
2 2.61209

ad_174o2o8a
0 1 1 3 12.5118
0 11 o 2 9.44585
1 2 7 9 9.34774
1 16 o 2 6.57278
2 3 4 6 2.08985
3 4 o 2 10.2191
4 5 2 4 4.91992
4 9 o 2 3.20784
5 6 o 2 3.84306
6 7 o 2 3.90951
6 13 8 10 7.07031
7 8 8 10 6.74935
7 14 o 2 3.79537
8 2.61209
9 10 2 4 5.3914
10 6 o 2 3.84306
11 12 1 3 4.75861
12 2 7 9 9.34774
13 2.61209
14 15 8 10 6.63099
15 2.61209
16 17 7 9 6.38392
17 3 4 6 2.08985
```

Then we grab one particular FST out of that collection (in this case just using Awk to grab some lines -- more sophisticated approaches exist) and compile it.
We project it only along the input labels (since those are the words it will guess at), minimise the number of states to get a simpler but equivalent model (easier to read), and finally draw it as an FSA.
A pipeline along the following lines does all of that (the Awk line range simply selects the arcs of the `ad_174o2o8a` entry from the listing above; adjust it to suit your own `1.txt.fsts`):

```
cat 1.txt.fsts | awk "NR>=7 && NR<=29" \
  | fstcompile --isymbols=data/lang/words.txt \
  | fstproject \
  | fstminimize \
  | fstdraw --acceptor --portrait=true --isymbols=data/lang/words.txt \
  | dot -Tsvg > 1.2.svg
```

The result of this is a FSA that will accept (/generate) the likely matches for the utterance `ad_174o2o8a`.
The utterance actually said "174o2o8", which is accepted by the path through states "0,2,4,5,6,8,9,12".

Note that when the confidence in a path being correct is very high, no weight is shown.

![parse lattice](./174o2o8a.png)

Notice that the lattice has a lot of paths allowing 'o' to be followed by another 'o'.

#### Drawing Phone Lattices
Much like we can draw lattices at the word level,
we can go down and draw them at the phone level.


```
lattice-to-phone-lattice exp/mono0a/final.mdl "ark:gunzip -c exp/mono0a/decode/lat.1.gz|" ark,t:1.ph.lats
lattice-copy --write-compact=false ark:1.ph.lats ark,t:1.ph.fsts
utils/int2sym.pl -f 4 data/lang/phones.txt 1.ph.fsts > 1.ph.txt.wfsts
cat 1.ph.txt.wfsts | awk 'BEGIN{FS = " "}{ if (NF>=4) {print $1," ", $2," ",$3," ",$4;} else {print $1;};}' > 1.ph.txt.fsts
```

Notice that in the first step the model (`final.mdl`) was also used.
The output of the first step is in the compact lattice form, which is not amenable to being worked with by scripts like int2sym.
The second step expands it, making it a FST.
The third step simply substitutes the phone symbols into the output.
It is perhaps worth looking at `1.ph.txt.wfsts`: notice that the weights only appear on the phones that start a word. It is, however, hard to read, as it has hundreds of empty-string ('') entries. Notice also that there are 2 weights (these are the graph weight and the acoustic weight).
As there are 2 weights, this is not a valid format for OpenFST; thus the fourth line (the Awk script) removes them all.

With that done we now have something that looks like a collection of text FSTs; however, it is still very full of epsilons.

Now to draw it up. Capturing the utterance `ad_174o2o8a` again, we will draw it with a pipeline along these lines (the Awk program grabs the lines of that utterance's entry, skipping its header line):

```
cat 1.ph.txt.fsts | awk '/^ad_174o2o8a/{p=1;next} /^$/{p=0} p' \
  | fstcompile --osymbols=data/lang/phones.txt \
  | fstproject --project_output \
  | fstrmepsilon \
  | fstdeterminize \
  | fstminimize \
  | fstdraw --acceptor --portrait=true --isymbols=data/lang/phones.txt \
  | dot -Tsvg > 1.ph.svg
```

So the steps are, again: grabbing the lines we want and compiling them;
projecting (this time on the output space);
removing epsilons (matchers for empty strings), determinizing, and minimising to make it more readable;
then drawing it.

(Click to view full screen image)
[![phone lattice](./174o2o8aPhones.png)](./174o2o8aPhones.png)


##Scoring
###Viewing Results
As the final step of `steps/decode.sh`, the results are recorded.

They can be found in `<decode-dir>` under filenames of the form `wer_N`, where `N` is the Language Model Scale.


Example:

```
compute-wer --text --mode=present ark:data/test/text ark,p:- 
%WER 1.63 [ 670 / 41220, 420 ins, 111 del, 139 sub ]
%SER 4.70 [ 590 / 12547 ]
Scored 12547 sentences, 0 not present in hyp.
```

The Wikipedia entry on [Word Error Rate (WER)](https://en.wikipedia.org/wiki/Word_error_rate) is a reasonable introduction, if you are not familiar with it.

The Sentence Error Rate (SER) is really an utterance error rate here:
of all the utterances in the test set, it is the proportion that contained at least one error.
Both error rates only consider the most likely hypothesis in the lattice.

`utils/best_wer.pl` will take as input any number of the `wer_N` files,
and will output the best WER from amongst them.

###How scoring is done
Scoring is done by `local/score.sh`.
This program takes `--min-lmwt` and `--max-lmwt` options for the minimum and maximum language model weight.
It outputs a `wer_N` file for each of these different weights.
The Language Model Weight is the trade-off (against the acoustic model weight) as to which is more important: matching the language model, or matching the sounds.

The scoring program works by opening all the lattice files,
and getting them to output a transcription of the best guess at the words in all of the utterances they contain. The best guess is found with `/kaldi-trunk/src/latbin/lattice-best-path`;
the language model weight is passed to it as `--lm-scale`.


The WER is calculated using `/kaldi-trunk/src/bin/compute-wer`,
which takes two transcriptions -- the best-guess output from the previous step, and the correct labels -- and outputs the proportion that match.
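As a sketch of doing this by hand for a single archive (the real work is wrapped up by `local/score.sh`; the weight of 7 and the output name `hyp.txt` are just illustrative):

```
lattice-best-path --lm-scale=7 --word-symbol-table=data/lang/words.txt \
    "ark:gunzip -c exp/mono0a/decode/lat.1.gz|" ark,t:- \
  | utils/int2sym.pl -f 2- data/lang/words.txt > hyp.txt
compute-wer --text --mode=present ark:data/test/text ark:hyp.txt
```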

--------------------------------------------------------------------------------
/tidigits/grammerFST.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oxinabox/Kaldi-Notes/845047ba0191440222338b1f3d2310b2e8e14df9/tidigits/grammerFST.png
--------------------------------------------------------------------------------
/tidigits/index.txt:
--------------------------------------------------------------------------------
---
layout: default
title: How To Train TIDIGITS
---

#Introduction to training TIDIGITS
TIDIGITS is a comparatively simple connected digits recognition task.
As for many well-known corpora, Kaldi includes an example script for it.
It is fairly typical of the example scripts -- though simpler than most.

The example script can be found in `kaldi-trunk/egs/tidigits/s5/`; all other scripts referred to here are relative to that path. Kaldi example scripts are all written to be run from that path (or its equivalent in other examples), even if they are located in a subfolder.
Kaldi example scripts should only be run in `bash` -- they will not necessarily work in other POSIX shells.

Be aware that a lot of the recipe code is shared between WSJ (Wall Street Journal) and all the other examples (including TIDIGITs).
The `utils/` and `steps/` folders in most of the example folders (including that for TIDIGITs)
are symlinks to the matching folders in the WSJ example. You can very well make use of these scripts in your own recipes.


####Other Resources:

- The official [Kaldi tutorial](http://kaldi.sourceforge.net/tutorial.html) is not perfect (yet), but is a valuable resource. It is linked to in various sections throughout this document.

- [This tutorial](http://analytcz.com/kaldi-hybrid-mlphmm-asr-2/) seems good. Its web hosting does not seem stable; right now the [Google-cached version can be used](http://webcache.googleusercontent.com/search?q=cache:z-MGlCv917sJ:analytcz.com/kaldi-hybrid-mlphmm-asr-2/).


##The Major Steps

There are four steps to applying Kaldi to a task such as this.


1. [Data Preparation](./data_prep):
  * Locating the datafiles
  * Parsing its annotations (e.g. Speaker Labels, Utterance Labels)
  * Converting the audio data format
2. [Language Preparation](./lang_prep):
  * Create Lexicon (Phoneme/Word dictionary)
  * Create Grammar (Word Language Model)
3. [Training Speech Recognizer](train):
  * Training the GMMs
  * Building the HMM graph
4. [Evaluating the Speech Recognizer](eval):
  * Decoding and building the lattices
  * Interpreting the results

The full process can be carried out by running `bash run.sh`, though you will most likely need to edit at least the TIDIGITs path and `cmd.sh` (so that it is set to run locally, not on a cluster).

These instructions also briefly touch on some of the options that might be needed in more complicated tasks. They also go into some detail on things which are not done by the `run.sh` script, for example outputting utterance recognition lattice diagrams. The instructions do not provide any kind of solid introduction to HMMs or GMMs, nor to bash or awk.
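To recap: once the TIDIGITS path and `cmd.sh` have been edited, the whole pipeline is just:

```
cd kaldi-trunk/egs/tidigits/s5
bash run.sh
```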

--------------------------------------------------------------------------------
/tidigits/lang_prep.txt:
--------------------------------------------------------------------------------
---
layout: default
title: Language Preparation
---
#Language Preparation
[The official kaldi documentation on this section](http://kaldi.sourceforge.net/data_prep.html#data_prep_lang).

This section covers the same content as the recipe script `/local/tidigits_prepare_lang.sh`.

To understand this section you should first [understand OpenFST](../fst-example).



##The Phones
Kaldi expects a number of files to be in the `data/lang/phones/` directory.
Most of them are not complex for TIDIGITS.

To facilitate their creation, it is useful to have a full list of the phonemes. This could be created in many ways; one way is to apply Awk to the lexicon (see the next section).

These phone files are simple lists with one phone per line:

- `silence.txt`, `context_indep.txt` and `optional_silence.txt` are all just single-line files containing `sil`, the silence phoneme symbol.
- `nonsilence.txt` contains all other phonemes.

The following files would do a lot more in more complicated situations, but are simple for TIDIGITS:

- `sets.txt`: each line contains a set of phones that should be considered to be the same phoneme (i.e. a set of variant phones for one phoneme). Since in TIDIGITS this is not a concern (we don't have access to a transcription at that level), each set contains just the one phoneme, so all phonemes should be listed on their own line in this file -- including `sil` (the silence phoneme).
- `disambig.txt` should be created and left empty.
- `extra-questions.txt` should also be created and left empty.

###Generating isymbols files from the phones
Once you have a phone list, it is very easy to enumerate it to create the isymbols file required for the phoneme-word FST.

###Converting Symbolic Phone Lists to Integer Phone Lists
Once you have an isymbols file, each of the files created in the previous step needs to be converted to a list of the matching integers, rather than textual symbols. The files created this way have the same name, but with a `.int` extension.

The perl script `utils/sym2int.pl` is used for this. It takes a single parameter -- the symbols file -- then reads the symbolic (`.txt`) phone lists on standard in, and outputs (on standard out) their corresponding integer forms.


`silence`, `nonsilence`, `context_indep`, `optional_silence` and `disambig`
also all need to be converted to colon-separated list files (`.csl`);
these are the same as the `.int` files, but with the integer phone representations separated by colons instead of line breaks. A simple-ish job for Awk/sed.

###roots.txt Decision Tree Roots
Kaldi makes use of decision trees for some functionality.
See [the documentation](http://kaldi.sourceforge.net/tree_externals.html) for the why and how of this.

It requires a root definition file.
For TIDIGITS this is very simple:
`roots.txt` contains `shared split <phone>` on each line, and has one line for each phone.
It is converted to `roots.int` by converting each phone symbol to its integer representation.
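As an illustration of the mechanics (a sketch only; `phonelist.txt` is a placeholder for your full phone list, and the recipe itself inlines equivalent one-liners):

```
# symbolic phone list -> integer list, using the phone symbol table
utils/sym2int.pl data/lang/phones.txt \
    < data/lang/phones/nonsilence.txt > data/lang/phones/nonsilence.int

# integer list -> colon-separated list
tr '\n' ':' < data/lang/phones/nonsilence.int | sed 's/:$//' > data/lang/phones/nonsilence.csl

# one "shared split <phone>" line per phone, for the decision tree roots
awk '{print "shared split", $1}' phonelist.txt > data/lang/phones/roots.txt
```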
##Words and Out of Vocabulary Lists
A word symbol list will also need to be constructed for the FST.
Again this can be generated from the lexicon (see below) with Awk.

The word symbols are simply: o, z, 1, 2, 3, 4, 5, 6, 7, 8, 9.

To go with this, Kaldi needs to be told what word to use when a word that is not in the vocabulary list is found. Since no such words exist in TIDIGITS, it really doesn't matter what is done with them, but Kaldi requires the files.

Create an `oov.txt` with any one word in it (the example script uses z).
Create an `oov.int` with the matching integer for it.
This can be done manually,
or it could be done with `sym2int` on the word symbol list created earlier.

If you arranged your word symbols so that the int form of your oov word is the same as its text form, then you are either over- or under-thinking this, and you could just copy `oov.txt` to `oov.int`.

## The Lexicon
The example recipe for TIDIGITs is quite clever about constructing the phoneme-to-word FST.
The script `utils/make_lexicon_fst.pl` takes a lexicon file and outputs a text FST file.
Each line of the lexicon file has the format:

```
<word> <phone1> <phone2> ... <phoneN>
```

In the example it is invoked roughly as `utils/make_lexicon_fst.pl lexicon.txt 0.5 sil > lexicon.fst.txt`, the trailing arguments being the probability of optional silence and the silence phone.
This creates a lexicon FST that transduces phones to words, and may allow optional silence.
Note: ordinarily, each line of `lexicon.txt` is: `word phone1 phone2 ... phoneN`; if the `--pron-probs` option is used, each line is: `word pronunciation-probability phone1 phone2 ... phoneN`. The probability 'prob' will typically be between zero and one, and note that it's generally helpful to normalize so the largest one for each word is 1.0, but this is your responsibility. The silence disambiguation symbol,
e.g. something like #5, is used only when creating a lexicon with disambiguation symbols, e.g. L_disambig.fst, and was introduced to fix a particular case of non-determinism of decoding graphs.

###Compiling the Lexicon FST: `L.fst`
The `lexicon.fst.txt` is then compiled with `fstcompile`, using the isymbols and osymbols generated from `lexicon.txt`, plus the silence phoneme (sil) added in the `make_lexicon_fst.pl` step.
Its edges are then sorted by output label using `fstarcsort` (composition, which happens later, requires sorted arcs to be done efficiently).

####L_disambig.fst = L.fst

To quote the TIDIGITs recipe:
>in this setup there are no "disambiguation symbols" because the lexicon
contains no homophones; and there is no '#0' symbol in the LM [(Language Model)] because it's not a backoff LM, so L_disambig.fst is the same as L.fst

So `L.fst` is copied to `L_disambig.fst`.

For more information on disambiguation read the [documentation page](http://kaldi.sourceforge.net/graph.html#graph_disambig).
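Putting the last two subsections together, the compile-and-sort step looks something like this (a sketch; the file names follow the recipe's conventions rather than anything Kaldi requires):

```
fstcompile --isymbols=data/lang/phones.txt --osymbols=data/lang/words.txt \
    lexicon.fst.txt \
  | fstarcsort --sort_type=olabel > data/lang/L.fst

# no disambiguation symbols are needed for TIDIGITS, so:
cp data/lang/L.fst data/lang/L_disambig.fst
```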
###The final lexicon FST

![lexicon fst](./lexiconFST.png)

0 is the initial state (as always),
and 1 is the only final state.
Notice there is only one path leaving state 2, and that goes back to 1 via 'sil'.
Notice also that all states which have a transition to 2 have an identical transition to 1.

##The Grammar
The Lexicon defined how phonemes make up words.
The Grammar defines how words make up a sentence.
The grammar is a weighted FSA.
It is expressed as a weighted FST in the example script -- a FSA can be considered as a FST with the input and output symbols the same.


###The States
As our sentences are made up of digit sequences of length between 1 and 7, this could be represented as a WFSA with 8 states, 7 of which are optionally terminal, and each of which (bar the last) has an edge for every digit going to the next state.

It is simpler, and more useful in the real world, to instead take the assumption that digit sequences can be of any length (of at least one digit).

This can be modelled as a FSA with just one state, which is both initial and terminal,
and to which all edges connect.
This is what is done in the example recipe.

### The Transitions
As a FSA each edge has one label,
but as it is expressed as a FST this label is put on both the input and the output.

####The Weights
We want to relate the weight to the probability of that transition happening.
After a digit has been said there are 12 possible future actions:

- A digit from 1-9 is said
- o, pronounced "Oh", is said
- z, pronounced "zero", is said
- nothing further is said, as the sentence has ended.

It is reasonable to assume each of these 12 options is equally likely,
so they each have a probability of 1/12.

In these circumstances it is normal to work with negative log probabilities, for numerical stability:
`-ln(1/12)=2.48490664979...`. This can just be put into the final field of each line of the text FST.
The example recipe uses an inline Perl command to calculate it on the fly (though it is not printed to any more digits than this).

###Compile and Arc Sort
The FST is compiled and arc sorted just as for the lexicon.
The example calls this `G.fst`.

###The final grammar FST

![grammar fst](./grammerFST.png)

##The Final Grammar Composed with Lexicon
The great beauty of working with FSTs in this way is that they are composable.
There is no need to compose them in this step -- that will be done later when they are also composed with the HMM; but so that you can see what is going on, below is the grammar composed with the lexicon.

![Lexicon Grammar FST](./LGFST.png)

0 is the initial state. 0 and 4 are the final states.
This FST maps phones (from the lexicon) to strings of words which are allowed by the Grammar.
However, since the Grammar is so permissive (no restrictions at all on the order of words),
this looks very similar to the Lexicon FST. It is in fact equivalent to the Kleene closure of the Lexicon FST.

##HMM Topology
One could say this was really part of the next step, training.
However it is covered in the sample script for this section, `/local/tidigits_prepare_lang.sh`.

The actual action to be taken is very simple.
Understanding why takes some knowledge of HMMs.

The HMM topology defines how the HMM that is going to be created for the phones works.
In most cases the 3-state Bakis model is used.

To get an idea of what is really going on under the hood,
read [this page of the documentation](http://kaldi.sourceforge.net/hmm.html).

In short, topo files define instructions for how to build Hidden Markov Models (HMMs) -- which states are linked to which others.

The topo file is expressed in an almost-XML language (not quite XML, as not all opened tags have closing tags -- only those that have other elements nested inside them). Kaldi uses this, and will eventually, at some point, internally produce a WFST that is the HMM.
You might find this referred to in the literature as H, to go with the lexicon L and the grammar G.

###In practice
All that is required is to copy the 3-state Bakis model template from `conf/topo.proto`,
and use `sed` to replace NONSILENCEPHONES and SILENCEPHONES with space-separated lists of the integer representations of the non-silence and silence phones respectively.

##Validating everything has been done correctly so far
This step is actually carried out in `run.sh`, rather than in `local/tidigits_prepare_lang.sh`.

`utils/validate_lang.pl` takes a single argument -- the path to the lang folder.
It then validates that everything has been set up correctly.
However, there are some warnings for the TIDIGITs setup.

To quote `run.sh`:
>```utils/validate_lang.pl data/lang/ ```
> Note; this actually does report errors,
> and exits with status 1, but we've checked them and seen that they
> don't matter (this setup doesn't have any disambiguation symbols,
> and the script doesn't like that).


--------------------------------------------------------------------------------
/tidigits/lexiconFST.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oxinabox/Kaldi-Notes/845047ba0191440222338b1f3d2310b2e8e14df9/tidigits/lexiconFST.png
--------------------------------------------------------------------------------
/tidigits/train.txt:
--------------------------------------------------------------------------------
---
layout: default
title: Training
---

##Controlling remote vs local execution: `cmd.sh`
Kaldi is designed to work with SunGrid clusters.
It also works with other kinds of cluster, and it can run locally, which is what we want here.
This can be done by making sure `cmd.sh` sets the variables as follows:

```
export train_cmd=run.pl
export decode_cmd=run.pl
```

rather than making references to `queue.pl`.

Training (and testing) will still be split into multiple Jobs, each handling a different subset of the data.


#Training the Recognizer
This section is covered by [this section of the kaldi tutorial](http://kaldi.sourceforge.net/tutorial_running.html#tutorial_running_monophone).

The majority of the steps covered on this page are triggered by the script `run.sh`.


##Training
Training is done using the script `steps/train_mono.sh`; very similar steps are used in the other training scripts in `steps/` (such as `steps/train_deltas.sh`).

Usage:

```
steps/train_mono.sh [options] <training-data-dir> <lang-dir> <exp-dir>
```

- `training-data-dir` is the path to the training data directory [prepared earlier](./data_prep)
- `lang-dir` is the path to the directory containing all the language model files, [also prepared earlier](./lang_prep)
- `exp-dir` is a path for the training to store all of its outputs. It will be created if it does not exist.

###Configuration / Options
The `train_mono.sh` script takes many configuration options.
They can be set by passing them as flags to the script, like so: `--<option-name> <value>`.
Or they can all be put into a bash config script, adding the flag `--config <path-to-config-file>`.
They could also be set by editing the defaults in `steps/train_mono.sh`, but there is no good reason to do this.


* `nj`: Number of Jobs to run in parallel. (default `4`)
* `cmd`: Job dispatcher script (default `run.pl`)
* `scale_opts`: takes a string (wrap it in quotes) to control the scaling options (default `"--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"`)
  * `transition-scale` (default `1.0`)
  * `acoustic-scale` (default `0.1`)
  * `self-loop-scale` (default `0.1`)
* `num_iters`: Number of iterations of training (default `40`)
* `max_iter_inc`: maximum amount to increase the number of Gaussians by (default `30`)
* `totgauss`: Target number of Gaussians (default `1000`)
* `careful`: passed on to `gmm-align-compiled`. To quote its documentation: "If true, do 'careful' alignment, which is better at detecting alignment failure (involves loop to start of decoding graph)." (default `false`)
* `boost_silence`: Factor by which to boost silence likelihoods in alignment. (default `1.0`)
* `realign_iters`: iterations on which to perform realignment (default `"1 2 3 4 5 6 7 8 9 10 12 14 16 18 20 23 26 29 32 35 38"`)
* `power`: exponent used to determine the number of Gaussians from occurrence counts (default `0.25`)
* `cmvn_opts`: options passed on to cmvn -- like `scale_opts` (default `""`)
* `stage`: This is used to allow you to skip some stages if the program crashed partway through; the stage variable sets the stage to start at. The stages are discussed in the next section. (default `-4`)


###What is the parallelism of Jobs in the Training step
During training, the training set can be (and in the example is) split up (the actual splitting is explained in the [data preparation step](data_prep)),
and each different process (Job) trains on a different subset of the utterances; the results are then merged at each iteration.

### Initialisation Stages

####Initialise GMM (Stage -3)
Uses `/kaldi-trunk/src/gmmbin/gmm-init-mono`.
Call that with the `--help` option for more info.

This defines (amongst other things) how many GMMs there are initially.


####Compile Training Graphs (Stage -2)
Uses `/kaldi-trunk/src/bin/compile-train-graphs`.
Call that with the `--help` option for more info.

See [this section of the documentation](http://kaldi.sourceforge.net/graph_recipe_train.html).

####Align Data Equally (Stage -1)
Creates an equally spaced alignment, as a starting point for the further alignment stages.
Uses `/kaldi-trunk/src/bin/align-equal-compiled`.
Call that with the `--help` option for more info.

####Estimate Gaussians (Stage 0)
Do the maximum likelihood estimation of the GMM-based acoustic model.
Uses `/kaldi-trunk/src/gmmbin/gmm-est`.
Call that with the `--help` option for more info.

The script notes:
>In the following steps, the `--min-gaussian-occupancy=3` option is important, otherwise
> we fail to est[imate] "rare" phones and later on, they never align properly.



###Training (Stage = Iterations completed)
On every iteration a number of steps are carried out.

####Realign
If this iteration is one of the `realign_iters` then:

#####Boost Silence
Silence is boosted using `/kaldi-trunk/src/gmmbin/gmm-boost-silence`.
Call that with the `--help` option for more info.
Notably, it does not necessarily boost the silence phone (though it does in this training case); it can boost any phone.
It does this by modifying the GMM weights, to make silence more probable.
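As a sketch of what happens at this point (the iteration number, boost factor and output name are placeholders; the real script pipes the boosted model straight into the aligner):

```
gmm-boost-silence --boost=1.0 $(cat data/lang/phones/optional_silence.csl) \
    exp/mono0a/10.mdl exp/mono0a/10_boosted.mdl
```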
#####Align
The features are aligned given the GMM models.
Uses `/kaldi-trunk/src/gmmbin/gmm-align-compiled`.
Call that with the `--help` option for more info.

####Re-estimate the GMM model
First, accumulate the statistics which are used in the next step.
This is done using `/kaldi-trunk/src/gmmbin/gmm-acc-stats-ali`.
Call that with the `--help` option for more info.

Then re-estimate the GMM-based acoustic model.
This is done with `/kaldi-trunk/src/gmmbin/gmm-est`, but using very different arguments.
Again, call that with the `--help` option for more info.

####Merge GMMs
The accumulated statistics from all the Jobs over the partitioned training dataset are then merged,
using `gmm-sum-accs`, and a single model (a `.mdl` file) is produced.
The model can be examined using `/kaldi-trunk/src/gmmbin/gmm-info` to get some very basic information, such as the number of Gaussians.


Finally, the number of Gaussians is increased (capped by `max_iter_inc`), so that by the time all the iterations (`num_iters`) are complete, it approaches the target total number of Gaussians (`totgauss`) -- assuming `max_iter_inc` did not block it.


##Making the Decoding Graph


As shown earlier, the Grammar (G) can be composed with the Lexicon (L)
to get a phoneme-to-word mapping.

To increase the power of the phones, they can be expanded to add context --
for example, making the 'ay' phone in 'n-ay-n' (nine) different from the one in 'm-ay-n' (mine).
This can be done with a context dependency FST, which can be scaled
in the number of phones taken into account as context.
This is roughly equivalent to making use of n-grams at the phonetic level. Using a context of 3 (i.e. one phone to each side) is referred to as triphones.


The context dependency can be expressed as a FST, referred to as C.



[This blog post](http://vpanayotov.blogspot.com.au/2012/06/kaldi-decoding-graph-construction.html) presents the details of the creation quite well. It will be a bit of revision from the data preparation step.


###Usage of `utils/mkgraph.sh`
The final graph is created using `utils/mkgraph.sh`.
To quote the introduction to that script:
> ...creates a fully expanded decoding graph (HCLG) that represents
> all the language-model, pronunciation dictionary (lexicon), context-dependency,
> and HMM structure in our model. The output is a Finite State Transducer
> that has word-ids on the output, and pdf-ids on the input (these are indexes
> that resolve to Gaussian Mixture Models).

It also creates the aforementioned context dependency graph.

Usage:

```
utils/mkgraph.sh [options] <lang-dir> <model-dir> <graph-dir>
```

- `lang-dir` is, as before, the path to the directory containing all the language model files, [also prepared earlier](./lang_prep)
- `model-dir` is the `exp-dir` from the previous train-mono step, which now contains the trained model.
- `graph-dir` is the directory to place the final graph in. In the example script this is made as a `graph` subdirectory under the `exp-dir`. If it does not exist, it will be created.
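In the TIDIGITS recipe this is invoked for the monophone system roughly as follows (the `--mono` flag is explained under Context Options below):

```
utils/mkgraph.sh --mono data/lang exp/mono0a exp/mono0a/graph
```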
####Context Options
There are 3 options defining how many phones are used to create the context.
These are passed as options to the `utils/mkgraph.sh` script:

- `--mono` for monophone, i.e. one phone, i.e. no context (used for the monophone system trained by `steps/train_mono.sh`)
- no flag (the default) for triphone, i.e. 3 phones, i.e. one phone to each side as context
- `--quinphone` for quinphone, i.e. 5 phones, i.e. 2 phones to each side as context

It would not be hard to extend the mkgraph script to create contexts of any length.
The section of the mkgraph script responsible for this
makes use of `/kaldi-trunk/src/fstbin/fstcomposecontext`; take a look at its `--help` for more information.

--------------------------------------------------------------------------------