├── test_audio
    ├── 1_1.wav
    ├── 1_2.wav
    ├── 1_3.wav
    ├── 1_4.wav
    ├── 1_5.wav
    ├── 1_6.wav
    ├── 1_7.wav
    ├── 1_8.wav
    ├── 1_9.wav
    └── 1_10.wav
├── train_audio
    ├── 10_1.wav
    ├── 10_10.wav
    ├── 10_2.wav
    ├── 10_3.wav
    ├── 10_4.wav
    ├── 10_5.wav
    ├── 10_6.wav
    ├── 10_7.wav
    ├── 10_8.wav
    ├── 10_9.wav
    ├── 1_1.wav
    ├── 1_10.wav
    ├── 1_2.wav
    ├── 1_3.wav
    ├── 1_4.wav
    ├── 1_5.wav
    ├── 1_6.wav
    ├── 1_7.wav
    ├── 1_8.wav
    ├── 1_9.wav
    ├── 2_1.wav
    ├── 2_10.wav
    ├── 2_2.wav
    ├── 2_3.wav
    ├── 2_4.wav
    ├── 2_5.wav
    ├── 2_6.wav
    ├── 2_7.wav
    ├── 2_8.wav
    ├── 2_9.wav
    ├── 3_1.wav
    ├── 3_10.wav
    ├── 3_2.wav
    ├── 3_3.wav
    ├── 3_4.wav
    ├── 3_5.wav
    ├── 3_6.wav
    ├── 3_7.wav
    ├── 3_8.wav
    ├── 3_9.wav
    ├── 4_1.wav
    ├── 4_10.wav
    ├── 4_2.wav
    ├── 4_3.wav
    ├── 4_4.wav
    ├── 4_5.wav
    ├── 4_6.wav
    ├── 4_7.wav
    ├── 4_8.wav
    ├── 4_9.wav
    ├── 5_1.wav
    ├── 5_10.wav
    ├── 5_2.wav
    ├── 5_3.wav
    ├── 5_4.wav
    ├── 5_5.wav
    ├── 5_6.wav
    ├── 5_7.wav
    ├── 5_8.wav
    ├── 5_9.wav
    ├── 6_1.wav
    ├── 6_10.wav
    ├── 6_2.wav
    ├── 6_3.wav
    ├── 6_4.wav
    ├── 6_5.wav
    ├── 6_6.wav
    ├── 6_7.wav
    ├── 6_8.wav
    ├── 6_9.wav
    ├── 7_1.wav
    ├── 7_10.wav
    ├── 7_2.wav
    ├── 7_3.wav
    ├── 7_4.wav
    ├── 7_5.wav
    ├── 7_6.wav
    ├── 7_7.wav
    ├── 7_8.wav
    ├── 7_9.wav
    ├── 8_1.wav
    ├── 8_10.wav
    ├── 8_2.wav
    ├── 8_3.wav
    ├── 8_4.wav
    ├── 8_5.wav
    ├── 8_6.wav
    ├── 8_7.wav
    ├── 8_8.wav
    ├── 8_9.wav
    ├── 9_1.wav
    ├── 9_10.wav
    ├── 9_2.wav
    ├── 9_3.wav
    ├── 9_4.wav
    ├── 9_5.wav
    ├── 9_6.wav
    ├── 9_7.wav
    ├── 9_8.wav
    └── 9_9.wav
├── demo.py
└── README.md


/test_audio/1_1.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/test_audio/1_1.wav


--------------------------------------------------------------------------------
/test_audio/1_2.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/test_audio/1_2.wav


--------------------------------------------------------------------------------
/test_audio/1_3.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/test_audio/1_3.wav


--------------------------------------------------------------------------------
/test_audio/1_4.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/test_audio/1_4.wav


--------------------------------------------------------------------------------
/test_audio/1_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/test_audio/1_5.wav


--------------------------------------------------------------------------------
/test_audio/1_6.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/test_audio/1_6.wav


--------------------------------------------------------------------------------
/test_audio/1_7.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/test_audio/1_7.wav


--------------------------------------------------------------------------------
/test_audio/1_8.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/test_audio/1_8.wav


--------------------------------------------------------------------------------
/test_audio/1_9.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/test_audio/1_9.wav


--------------------------------------------------------------------------------
/test_audio/1_10.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/test_audio/1_10.wav


--------------------------------------------------------------------------------
/train_audio/10_1.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/10_1.wav


--------------------------------------------------------------------------------
/train_audio/10_10.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/10_10.wav


--------------------------------------------------------------------------------
/train_audio/10_2.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/10_2.wav


--------------------------------------------------------------------------------
/train_audio/10_3.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/10_3.wav


--------------------------------------------------------------------------------
/train_audio/10_4.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/10_4.wav


--------------------------------------------------------------------------------
/train_audio/10_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/10_5.wav


--------------------------------------------------------------------------------
/train_audio/10_6.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/10_6.wav


--------------------------------------------------------------------------------
/train_audio/10_7.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/10_7.wav


--------------------------------------------------------------------------------
/train_audio/10_8.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/10_8.wav


--------------------------------------------------------------------------------
/train_audio/10_9.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/10_9.wav


--------------------------------------------------------------------------------
/train_audio/1_1.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/1_1.wav


--------------------------------------------------------------------------------
/train_audio/1_10.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/1_10.wav


--------------------------------------------------------------------------------
/train_audio/1_2.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/1_2.wav


--------------------------------------------------------------------------------
/train_audio/1_3.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/1_3.wav


--------------------------------------------------------------------------------
/train_audio/1_4.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/1_4.wav


--------------------------------------------------------------------------------
/train_audio/1_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/1_5.wav


--------------------------------------------------------------------------------
/train_audio/1_6.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/1_6.wav


--------------------------------------------------------------------------------
/train_audio/1_7.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/1_7.wav


--------------------------------------------------------------------------------
/train_audio/1_8.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/1_8.wav


--------------------------------------------------------------------------------
/train_audio/1_9.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/1_9.wav


--------------------------------------------------------------------------------
/train_audio/2_1.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/2_1.wav


--------------------------------------------------------------------------------
/train_audio/2_10.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/2_10.wav


--------------------------------------------------------------------------------
/train_audio/2_2.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/2_2.wav


--------------------------------------------------------------------------------
/train_audio/2_3.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/2_3.wav


--------------------------------------------------------------------------------
/train_audio/2_4.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/2_4.wav


--------------------------------------------------------------------------------
/train_audio/2_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/2_5.wav


--------------------------------------------------------------------------------
/train_audio/2_6.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/2_6.wav


--------------------------------------------------------------------------------
/train_audio/2_7.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/2_7.wav


--------------------------------------------------------------------------------
/train_audio/2_8.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/2_8.wav


--------------------------------------------------------------------------------
/train_audio/2_9.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/2_9.wav


--------------------------------------------------------------------------------
/train_audio/3_1.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/3_1.wav


--------------------------------------------------------------------------------
/train_audio/3_10.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/3_10.wav


--------------------------------------------------------------------------------
/train_audio/3_2.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/3_2.wav


--------------------------------------------------------------------------------
/train_audio/3_3.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/3_3.wav


--------------------------------------------------------------------------------
/train_audio/3_4.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/3_4.wav


--------------------------------------------------------------------------------
/train_audio/3_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/3_5.wav


--------------------------------------------------------------------------------
/train_audio/3_6.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/3_6.wav


--------------------------------------------------------------------------------
/train_audio/3_7.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/3_7.wav


--------------------------------------------------------------------------------
/train_audio/3_8.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/3_8.wav


--------------------------------------------------------------------------------
/train_audio/3_9.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/3_9.wav


--------------------------------------------------------------------------------
/train_audio/4_1.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/4_1.wav


--------------------------------------------------------------------------------
/train_audio/4_10.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/4_10.wav


--------------------------------------------------------------------------------
/train_audio/4_2.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/4_2.wav


--------------------------------------------------------------------------------
/train_audio/4_3.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/4_3.wav


--------------------------------------------------------------------------------
/train_audio/4_4.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/4_4.wav


--------------------------------------------------------------------------------
/train_audio/4_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/4_5.wav


--------------------------------------------------------------------------------
/train_audio/4_6.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/4_6.wav


--------------------------------------------------------------------------------
/train_audio/4_7.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/4_7.wav


--------------------------------------------------------------------------------
/train_audio/4_8.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/4_8.wav


--------------------------------------------------------------------------------
/train_audio/4_9.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/4_9.wav


--------------------------------------------------------------------------------
/train_audio/5_1.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/5_1.wav


--------------------------------------------------------------------------------
/train_audio/5_10.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/5_10.wav


--------------------------------------------------------------------------------
/train_audio/5_2.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/5_2.wav


--------------------------------------------------------------------------------
/train_audio/5_3.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/5_3.wav


--------------------------------------------------------------------------------
/train_audio/5_4.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/5_4.wav


--------------------------------------------------------------------------------
/train_audio/5_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/5_5.wav


--------------------------------------------------------------------------------
/train_audio/5_6.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/5_6.wav


--------------------------------------------------------------------------------
/train_audio/5_7.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/5_7.wav


--------------------------------------------------------------------------------
/train_audio/5_8.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/5_8.wav


--------------------------------------------------------------------------------
/train_audio/5_9.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/5_9.wav


--------------------------------------------------------------------------------
/train_audio/6_1.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/6_1.wav


--------------------------------------------------------------------------------
/train_audio/6_10.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/6_10.wav


--------------------------------------------------------------------------------
/train_audio/6_2.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/6_2.wav


--------------------------------------------------------------------------------
/train_audio/6_3.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/6_3.wav


--------------------------------------------------------------------------------
/train_audio/6_4.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/6_4.wav


--------------------------------------------------------------------------------
/train_audio/6_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/6_5.wav


--------------------------------------------------------------------------------
/train_audio/6_6.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/6_6.wav


--------------------------------------------------------------------------------
/train_audio/6_7.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/6_7.wav


--------------------------------------------------------------------------------
/train_audio/6_8.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/6_8.wav


--------------------------------------------------------------------------------
/train_audio/6_9.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/6_9.wav


--------------------------------------------------------------------------------
/train_audio/7_1.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/7_1.wav


--------------------------------------------------------------------------------
/train_audio/7_10.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/7_10.wav


--------------------------------------------------------------------------------
/train_audio/7_2.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/7_2.wav


--------------------------------------------------------------------------------
/train_audio/7_3.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/7_3.wav


--------------------------------------------------------------------------------
/train_audio/7_4.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/7_4.wav


--------------------------------------------------------------------------------
/train_audio/7_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/7_5.wav


--------------------------------------------------------------------------------
/train_audio/7_6.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/7_6.wav


--------------------------------------------------------------------------------
/train_audio/7_7.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/7_7.wav


--------------------------------------------------------------------------------
/train_audio/7_8.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/7_8.wav


--------------------------------------------------------------------------------
/train_audio/7_9.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/7_9.wav


--------------------------------------------------------------------------------
/train_audio/8_1.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/8_1.wav


--------------------------------------------------------------------------------
/train_audio/8_10.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/8_10.wav


--------------------------------------------------------------------------------
/train_audio/8_2.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/8_2.wav


--------------------------------------------------------------------------------
/train_audio/8_3.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/8_3.wav


--------------------------------------------------------------------------------
/train_audio/8_4.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/8_4.wav


--------------------------------------------------------------------------------
/train_audio/8_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/8_5.wav


--------------------------------------------------------------------------------
/train_audio/8_6.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/8_6.wav


--------------------------------------------------------------------------------
/train_audio/8_7.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/8_7.wav


--------------------------------------------------------------------------------
/train_audio/8_8.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/8_8.wav


--------------------------------------------------------------------------------
/train_audio/8_9.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/8_9.wav


--------------------------------------------------------------------------------
/train_audio/9_1.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/9_1.wav


--------------------------------------------------------------------------------
/train_audio/9_10.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/9_10.wav


--------------------------------------------------------------------------------
/train_audio/9_2.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/9_2.wav


--------------------------------------------------------------------------------
/train_audio/9_3.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/9_3.wav


--------------------------------------------------------------------------------
/train_audio/9_4.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/9_4.wav


--------------------------------------------------------------------------------
/train_audio/9_5.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/9_5.wav


--------------------------------------------------------------------------------
/train_audio/9_6.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/9_6.wav


--------------------------------------------------------------------------------
/train_audio/9_7.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/9_7.wav


--------------------------------------------------------------------------------
/train_audio/9_8.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/9_8.wav


--------------------------------------------------------------------------------
/train_audio/9_9.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/9_9.wav


--------------------------------------------------------------------------------
/demo.py:
--------------------------------------------------------------------------------
'''
Created on 29/08/2018

@author: wblgers
'''
from __future__ import print_function
import warnings
import os
from scikits.talkbox.features import mfcc
from scipy.io import wavfile
from hmmlearn import hmm
import numpy as np

warnings.filterwarnings('ignore')


def extract_mfcc(full_audio_path):
    # Read the waveform and compute 12 MFCCs per frame with a 30 ms window
    sample_rate, wave = wavfile.read(full_audio_path)
    mfcc_features = mfcc(wave, nwin=int(sample_rate * 0.03), fs=sample_rate, nceps=12)[0]
    return mfcc_features


def buildDataSet(dir):
    # Keep only the wav audio files under the dir
    fileList = [f for f in os.listdir(dir) if os.path.splitext(f)[1] == '.wav']
    dataset = {}
    for fileName in fileList:
        # The label is the second '_'-separated field, e.g. '10_3.wav' -> '3'
        tmp = fileName.split('.')[0]
        label = tmp.split('_')[1]
        feature = extract_mfcc(os.path.join(dir, fileName))
        # Collect the feature matrices of all files that share a label
        dataset.setdefault(label, []).append(feature)
    return dataset


def train_GMMHMM(dataset):
    GMMHMM_Models = {}
    states_num = 5
    GMM_mix_num = 3
    tmp_p = 1.0/(states_num-2)
    # Left-to-right transition prior: a state may stay put or advance
    # by up to two states; the last state is absorbing
    transmatPrior = np.array([[tmp_p, tmp_p, tmp_p, 0, 0],
                              [0, tmp_p, tmp_p, tmp_p, 0],
                              [0, 0, tmp_p, tmp_p, tmp_p],
                              [0, 0, 0, 0.5, 0.5],
                              [0, 0, 0, 0, 1]], dtype=float)

    # Sequences are expected to start in one of the first two states
    startprobPrior = np.array([0.5, 0.5, 0, 0, 0], dtype=float)

    # Train one GMM-HMM per label
    for label in dataset.keys():
        model = hmm.GMMHMM(n_components=states_num, n_mix=GMM_mix_num,
                           transmat_prior=transmatPrior, startprob_prior=startprobPrior,
                           covariance_type='diag', n_iter=10)
        trainData = dataset[label]
        # hmmlearn expects all sequences stacked into one array plus their lengths
        length = np.array([seq.shape[0] for seq in trainData], dtype=int)
        trainData = np.vstack(trainData)
        model.fit(trainData, lengths=length)  # get optimal parameters
        GMMHMM_Models[label] = model
    return GMMHMM_Models


def main():
    trainDir = './train_audio/'
    trainDataSet = buildDataSet(trainDir)
    print("Finish prepare the training data")

    hmmModels = train_GMMHMM(trainDataSet)
    print("Finish training of the GMM_HMM models for digits 0-9")

    testDir = './test_audio/'
    testDataSet = buildDataSet(testDir)

    score_cnt = 0
    for label in testDataSet.keys():
        feature = testDataSet[label]
        # Score the single test sample of this label under every trained model
        scoreList = {}
        for model_label in hmmModels.keys():
            model = hmmModels[model_label]
            score = model.score(feature[0])
            scoreList[model_label] = score
        # Predict the label whose model gives the highest log-likelihood
        predict = max(scoreList, key=scoreList.get)
        print("Test on true label ", label, ": predict result label is ", predict)
        if predict == label:
            score_cnt += 1
    print("Final recognition rate is %.2f" % (100.0*score_cnt/len(testDataSet.keys())), "%")


if __name__ == '__main__':
    main()


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# hmm_speech_recognition_demo

### 0. Setup Environment
This demo project runs on Python 2.x. Please install the following required packages as well:
- scikits.talkbox: Calculation of MFCC features on audio
- hmmlearn: Hidden Markov Models in Python, with a scikit-learn-like API
- scipy: Fundamental library for scientific computing

All three packages can be installed via `pip install`. On Python 3.x, however, `scikits.talkbox` could not be installed correctly in my environment.


### 1. Description
#### 1.1 Problem
Using `GMMHMM` from `hmmlearn`, we model the audio files in 10 categories. `GMMHMM` provides a simple interface for training an HMM and for scoring it on a test set.

For more details, please see the [doc](https://hmmlearn.readthedocs.io/en/latest/api.html#hmmlearn-hmm) of `hmmlearn`.
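
The decision rule that `demo.py` applies with these models is: train one model per category, score a test utterance under every model, and predict the label whose model returns the highest score. The following is a minimal sketch of that rule only. `classify` and `ConstantScorer` are hypothetical names introduced here, and the stand-in scorer merely mimics the `score(X)` method of a fitted `GMMHMM` so that the snippet runs without audio data.

```python
def classify(models, features):
    """Return the label whose model assigns `features` the highest score.

    `models` maps a label to any object exposing `score(features)`,
    for example a fitted hmmlearn GMMHMM, whose score is a log-likelihood.
    """
    scores = {label: model.score(features) for label, model in models.items()}
    best_label = max(scores, key=scores.get)
    return best_label, scores


class ConstantScorer(object):
    """Hypothetical stand-in for a trained model: returns a fixed score."""

    def __init__(self, log_likelihood):
        self.log_likelihood = log_likelihood

    def score(self, features):
        return self.log_likelihood


# Made-up log-likelihoods for three labels, for illustration only
models = {
    '1': ConstantScorer(-310.5),
    '2': ConstantScorer(-295.0),
    '3': ConstantScorer(-402.25),
}
predicted, scores = classify(models, features=None)
print(predicted)
```

Because the scores are log-likelihoods, they are usually negative; the model with the value closest to zero wins.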
#### 1.2 Dataset
This is a demo project for simple isolated-word speech recognition. There are only 100 audio files with the extension `.wav` for training, and 10 audio files for testing. More specifically:

- Train: For digits 0-9, 10 samples each, with Chinese pronunciation
- Test:  For digits 0-9, 1 sample each, with Chinese pronunciation
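
In `demo.py`, the label of each recording is taken from its file name: the name is split on `_` and the second field is used, so `10_3.wav` is labeled `3`. The sketch below mirrors that parsing. `label_from_filename` is a hypothetical helper name, and reading the first field as an index over recordings is an assumption, since the repository does not document it. The labels in the file names run from 1 to 10; how they map onto the digits 0-9 is not stated either.

```python
def label_from_filename(file_name):
    """Return the label that demo.py would assign to a wav file name."""
    stem = file_name.split('.')[0]   # '10_3.wav' -> '10_3'
    return stem.split('_')[1]        # '10_3'     -> '3'


# A few file names taken from the train_audio directory
train_files = ['1_1.wav', '1_10.wav', '10_3.wav', '2_3.wav']
labels = [label_from_filename(f) for f in train_files]
print(labels)  # ['1', '10', '3', '3']
```

Note that the labels are strings, so `'10'` and `'1'` are distinct dictionary keys in the dataset built by `buildDataSet`.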

#### 1.3 Demo results
Running the script `demo.py` under Python 2.x produces the following output:
```
Finish prepare the training data
Finish training of the GMM_HMM models for digits 0-9
Test on true label  10 : predict result label is  9
Test on true label  1 : predict result label is  1
Test on true label  3 : predict result label is  3
Test on true label  2 : predict result label is  2
Test on true label  5 : predict result label is  5
Test on true label  4 : predict result label is  4
Test on true label  7 : predict result label is  7
Test on true label  6 : predict result label is  6
Test on true label  9 : predict result label is  9
Test on true label  8 : predict result label is  9
Final recognition rate is 80.00 %
```

### 2. Tricky bug workaround in hmmlearn
When training the HMM, `startprob_` may end up with negative values. After checking the hmmlearn source, I found the following code in `hmmlearn/base.py` suspicious:
```
def _do_mstep(self, stats):
    """Performs the M-step of EM algorithm.
    Parameters
    ----------
    stats : dict
        Sufficient statistics updated from all available samples.
    """
    # The ``np.where`` calls guard against updating forbidden states
    # or transitions in e.g. a left-right HMM.
    if 's' in self.params:
        startprob_ = self.startprob_prior - 1.0 + stats['start']
        self.startprob_ = np.where(self.startprob_ == 0.0,
                                   self.startprob_, startprob_)
        normalize(self.startprob_)
    if 't' in self.params:
        transmat_ = self.transmat_prior - 1.0 + stats['trans']
        self.transmat_ = np.where(self.transmat_ == 0.0,
                                  self.transmat_, transmat_)
        normalize(self.transmat_, axis=1)
```
Every time `self.startprob_` and `self.transmat_` are updated, 1.0 is subtracted from the prior before the accumulated statistics are added. When the prior entries are below 1.0, as they are in `demo.py`, the result can easily be negative for both `self.startprob_` and `self.transmat_`.
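
A small numeric sketch may make the effect concrete. It applies the same arithmetic as the quoted update to the start-probability prior used in `demo.py`. The expected counts in `start_stats` are hypothetical values chosen for illustration, and normalization is assumed to be a plain division by the sum. The hmmlearn documentation describes `startprob_prior` and `transmat_prior` as parameters of a Dirichlet prior (default 1.0), which may explain the `- 1.0` term.

```python
import numpy as np

# Prior as passed in demo.py: probabilities, so every entry is below 1.0
startprob_prior = np.array([0.5, 0.5, 0.0, 0.0, 0.0])

# Hypothetical expected start counts for 10 training sequences
# that all begin in state 0 (illustrative values, not measured)
start_stats = np.array([10.0, 0.0, 0.0, 0.0, 0.0])

# Same arithmetic as the update quoted above
startprob = startprob_prior - 1.0 + start_stats
print(startprob)

# Dividing by the sum (assumed normalization) keeps the negative entries
normalized = startprob / startprob.sum()
print(normalized)
```

Every state that receives fewer expected counts than `1.0 - prior` ends up negative, and the vector still sums to 1 after normalization, so the problem is easy to miss.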

This issue has also been reported in the issue list of `hmmlearn`, but there has been no response from the maintainers:

[startprob_ of ghmm is negative or nan #276](https://github.com/hmmlearn/hmmlearn/issues/276)

As a temporary workaround, I modified this part of the code to remove the subtraction, since `normalize` is called after the update anyway.
```
def _do_mstep(self, stats):
    """Performs the M-step of EM algorithm.

    Parameters
    ----------
    stats : dict
        Sufficient statistics updated from all available samples.
    """
    # The ``np.where`` calls guard against updating forbidden states
    # or transitions in e.g. a left-right HMM.
    if 's' in self.params:
        # startprob_ = self.startprob_prior - 1.0 + stats['start']
        startprob_ = self.startprob_prior + stats['start']
        self.startprob_ = np.where(self.startprob_ == 0.0,
                                   self.startprob_, startprob_)
        normalize(self.startprob_)
    if 't' in self.params:
        # transmat_ = self.transmat_prior - 1.0 + stats['trans']
        transmat_ = self.transmat_prior + stats['trans']
        self.transmat_ = np.where(self.transmat_ == 0.0,
                                  self.transmat_, transmat_)
        normalize(self.transmat_, axis=1)
```
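
With the subtraction removed, a non-negative prior plus non-negative expected counts cannot drop below zero. The sketch below checks this on the start-probability prior from `demo.py`. As before, the counts in `start_stats` are hypothetical, and normalization is assumed to be a division by the sum.

```python
import numpy as np

# Prior as passed in demo.py
startprob_prior = np.array([0.5, 0.5, 0.0, 0.0, 0.0])

# Hypothetical expected start counts (illustrative values, not measured)
start_stats = np.array([10.0, 0.0, 0.0, 0.0, 0.0])

# Workaround arithmetic: no "- 1.0" term
startprob = startprob_prior + start_stats
startprob = startprob / startprob.sum()
print(startprob)
```

The result is a valid probability vector, although it is no longer the estimate that the original formula was written to compute.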

Please note: this modification is just a workaround. I will follow up with the official solution if there is any update.


--------------------------------------------------------------------------------