├── test_audio
│   ├── 1_1.wav
│   ├── 1_2.wav
│   ├── 1_3.wav
│   ├── 1_4.wav
│   ├── 1_5.wav
│   ├── 1_6.wav
│   ├── 1_7.wav
│   ├── 1_8.wav
│   ├── 1_9.wav
│   └── 1_10.wav
├── train_audio
│   ├── 10_1.wav
│   ├── 10_10.wav
│   ├── 10_2.wav
│   ├── 10_3.wav
│   ├── 10_4.wav
│   ├── 10_5.wav
│   ├── 10_6.wav
│   ├── 10_7.wav
│   ├── 10_8.wav
│   ├── 10_9.wav
│   ├── 1_1.wav
│   ├── 1_10.wav
│   ├── 1_2.wav
│   ├── 1_3.wav
│   ├── 1_4.wav
│   ├── 1_5.wav
│   ├── 1_6.wav
│   ├── 1_7.wav
│   ├── 1_8.wav
│   ├── 1_9.wav
│   ├── 2_1.wav
│   ├── 2_10.wav
│   ├── 2_2.wav
│   ├── 2_3.wav
│   ├── 2_4.wav
│   ├── 2_5.wav
│   ├── 2_6.wav
│   ├── 2_7.wav
│   ├── 2_8.wav
│   ├── 2_9.wav
│   ├── 3_1.wav
│   ├── 3_10.wav
│   ├── 3_2.wav
│   ├── 3_3.wav
│   ├── 3_4.wav
│   ├── 3_5.wav
│   ├── 3_6.wav
│   ├── 3_7.wav
│   ├── 3_8.wav
│   ├── 3_9.wav
│   ├── 4_1.wav
│   ├── 4_10.wav
│   ├── 4_2.wav
│   ├── 4_3.wav
│   ├── 4_4.wav
│   ├── 4_5.wav
│   ├── 4_6.wav
│   ├── 4_7.wav
│   ├── 4_8.wav
│   ├── 4_9.wav
│   ├── 5_1.wav
│   ├── 5_10.wav
│   ├── 5_2.wav
│   ├── 5_3.wav
│   ├── 5_4.wav
│   ├── 5_5.wav
│   ├── 5_6.wav
│   ├── 5_7.wav
│   ├── 5_8.wav
│   ├── 5_9.wav
│   ├── 6_1.wav
│   ├── 6_10.wav
│   ├── 6_2.wav
│   ├── 6_3.wav
│   ├── 6_4.wav
│   ├── 6_5.wav
│   ├── 6_6.wav
│   ├── 6_7.wav
│   ├── 6_8.wav
│   ├── 6_9.wav
│   ├── 7_1.wav
│   ├── 7_10.wav
│   ├── 7_2.wav
│   ├── 7_3.wav
│   ├── 7_4.wav
│   ├── 7_5.wav
│   ├── 7_6.wav
│   ├── 7_7.wav
│   ├── 7_8.wav
│   ├── 7_9.wav
│   ├── 8_1.wav
│   ├── 8_10.wav
│   ├── 8_2.wav
│   ├── 8_3.wav
│   ├── 8_4.wav
│   ├── 8_5.wav
│   ├── 8_6.wav
│   ├── 8_7.wav
│   ├── 8_8.wav
│   ├── 8_9.wav
│   ├── 9_1.wav
│   ├── 9_10.wav
│   ├── 9_2.wav
│   ├── 9_3.wav
│   ├── 9_4.wav
│   ├── 9_5.wav
│   ├── 9_6.wav
│   ├── 9_7.wav
│   ├── 9_8.wav
│   └── 9_9.wav
├── demo.py
└── README.md

/test_audio/1_1.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/test_audio/1_1.wav
--------------------------------------------------------------------------------
/test_audio/1_2.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/test_audio/1_2.wav
--------------------------------------------------------------------------------
/test_audio/1_3.wav:
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/test_audio/1_3.wav -------------------------------------------------------------------------------- /test_audio/1_4.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/test_audio/1_4.wav -------------------------------------------------------------------------------- /test_audio/1_5.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/test_audio/1_5.wav -------------------------------------------------------------------------------- /test_audio/1_6.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/test_audio/1_6.wav -------------------------------------------------------------------------------- /test_audio/1_7.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/test_audio/1_7.wav -------------------------------------------------------------------------------- /test_audio/1_8.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/test_audio/1_8.wav -------------------------------------------------------------------------------- /test_audio/1_9.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/test_audio/1_9.wav -------------------------------------------------------------------------------- /test_audio/1_10.wav: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/test_audio/1_10.wav -------------------------------------------------------------------------------- /train_audio/10_1.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/10_1.wav -------------------------------------------------------------------------------- /train_audio/10_10.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/10_10.wav -------------------------------------------------------------------------------- /train_audio/10_2.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/10_2.wav -------------------------------------------------------------------------------- /train_audio/10_3.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/10_3.wav -------------------------------------------------------------------------------- /train_audio/10_4.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/10_4.wav -------------------------------------------------------------------------------- /train_audio/10_5.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/10_5.wav -------------------------------------------------------------------------------- /train_audio/10_6.wav: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/10_6.wav -------------------------------------------------------------------------------- /train_audio/10_7.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/10_7.wav -------------------------------------------------------------------------------- /train_audio/10_8.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/10_8.wav -------------------------------------------------------------------------------- /train_audio/10_9.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/10_9.wav -------------------------------------------------------------------------------- /train_audio/1_1.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/1_1.wav -------------------------------------------------------------------------------- /train_audio/1_10.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/1_10.wav -------------------------------------------------------------------------------- /train_audio/1_2.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/1_2.wav -------------------------------------------------------------------------------- /train_audio/1_3.wav: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/1_3.wav -------------------------------------------------------------------------------- /train_audio/1_4.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/1_4.wav -------------------------------------------------------------------------------- /train_audio/1_5.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/1_5.wav -------------------------------------------------------------------------------- /train_audio/1_6.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/1_6.wav -------------------------------------------------------------------------------- /train_audio/1_7.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/1_7.wav -------------------------------------------------------------------------------- /train_audio/1_8.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/1_8.wav -------------------------------------------------------------------------------- /train_audio/1_9.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/1_9.wav -------------------------------------------------------------------------------- /train_audio/2_1.wav: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/2_1.wav -------------------------------------------------------------------------------- /train_audio/2_10.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/2_10.wav -------------------------------------------------------------------------------- /train_audio/2_2.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/2_2.wav -------------------------------------------------------------------------------- /train_audio/2_3.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/2_3.wav -------------------------------------------------------------------------------- /train_audio/2_4.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/2_4.wav -------------------------------------------------------------------------------- /train_audio/2_5.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/2_5.wav -------------------------------------------------------------------------------- /train_audio/2_6.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/2_6.wav -------------------------------------------------------------------------------- /train_audio/2_7.wav: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/2_7.wav -------------------------------------------------------------------------------- /train_audio/2_8.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/2_8.wav -------------------------------------------------------------------------------- /train_audio/2_9.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/2_9.wav -------------------------------------------------------------------------------- /train_audio/3_1.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/3_1.wav -------------------------------------------------------------------------------- /train_audio/3_10.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/3_10.wav -------------------------------------------------------------------------------- /train_audio/3_2.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/3_2.wav -------------------------------------------------------------------------------- /train_audio/3_3.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/3_3.wav -------------------------------------------------------------------------------- /train_audio/3_4.wav: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/3_4.wav -------------------------------------------------------------------------------- /train_audio/3_5.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/3_5.wav -------------------------------------------------------------------------------- /train_audio/3_6.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/3_6.wav -------------------------------------------------------------------------------- /train_audio/3_7.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/3_7.wav -------------------------------------------------------------------------------- /train_audio/3_8.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/3_8.wav -------------------------------------------------------------------------------- /train_audio/3_9.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/3_9.wav -------------------------------------------------------------------------------- /train_audio/4_1.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/4_1.wav -------------------------------------------------------------------------------- /train_audio/4_10.wav: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/4_10.wav -------------------------------------------------------------------------------- /train_audio/4_2.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/4_2.wav -------------------------------------------------------------------------------- /train_audio/4_3.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/4_3.wav -------------------------------------------------------------------------------- /train_audio/4_4.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/4_4.wav -------------------------------------------------------------------------------- /train_audio/4_5.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/4_5.wav -------------------------------------------------------------------------------- /train_audio/4_6.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/4_6.wav -------------------------------------------------------------------------------- /train_audio/4_7.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/4_7.wav -------------------------------------------------------------------------------- /train_audio/4_8.wav: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/4_8.wav -------------------------------------------------------------------------------- /train_audio/4_9.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/4_9.wav -------------------------------------------------------------------------------- /train_audio/5_1.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/5_1.wav -------------------------------------------------------------------------------- /train_audio/5_10.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/5_10.wav -------------------------------------------------------------------------------- /train_audio/5_2.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/5_2.wav -------------------------------------------------------------------------------- /train_audio/5_3.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/5_3.wav -------------------------------------------------------------------------------- /train_audio/5_4.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/5_4.wav -------------------------------------------------------------------------------- /train_audio/5_5.wav: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/5_5.wav -------------------------------------------------------------------------------- /train_audio/5_6.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/5_6.wav -------------------------------------------------------------------------------- /train_audio/5_7.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/5_7.wav -------------------------------------------------------------------------------- /train_audio/5_8.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/5_8.wav -------------------------------------------------------------------------------- /train_audio/5_9.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/5_9.wav -------------------------------------------------------------------------------- /train_audio/6_1.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/6_1.wav -------------------------------------------------------------------------------- /train_audio/6_10.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/6_10.wav -------------------------------------------------------------------------------- /train_audio/6_2.wav: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/6_2.wav -------------------------------------------------------------------------------- /train_audio/6_3.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/6_3.wav -------------------------------------------------------------------------------- /train_audio/6_4.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/6_4.wav -------------------------------------------------------------------------------- /train_audio/6_5.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/6_5.wav -------------------------------------------------------------------------------- /train_audio/6_6.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/6_6.wav -------------------------------------------------------------------------------- /train_audio/6_7.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/6_7.wav -------------------------------------------------------------------------------- /train_audio/6_8.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/6_8.wav -------------------------------------------------------------------------------- /train_audio/6_9.wav: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/6_9.wav -------------------------------------------------------------------------------- /train_audio/7_1.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/7_1.wav -------------------------------------------------------------------------------- /train_audio/7_10.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/7_10.wav -------------------------------------------------------------------------------- /train_audio/7_2.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/7_2.wav -------------------------------------------------------------------------------- /train_audio/7_3.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/7_3.wav -------------------------------------------------------------------------------- /train_audio/7_4.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/7_4.wav -------------------------------------------------------------------------------- /train_audio/7_5.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/7_5.wav -------------------------------------------------------------------------------- /train_audio/7_6.wav: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/7_6.wav -------------------------------------------------------------------------------- /train_audio/7_7.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/7_7.wav -------------------------------------------------------------------------------- /train_audio/7_8.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/7_8.wav -------------------------------------------------------------------------------- /train_audio/7_9.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/7_9.wav -------------------------------------------------------------------------------- /train_audio/8_1.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/8_1.wav -------------------------------------------------------------------------------- /train_audio/8_10.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/8_10.wav -------------------------------------------------------------------------------- /train_audio/8_2.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/8_2.wav -------------------------------------------------------------------------------- /train_audio/8_3.wav: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/8_3.wav -------------------------------------------------------------------------------- /train_audio/8_4.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/8_4.wav -------------------------------------------------------------------------------- /train_audio/8_5.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/8_5.wav -------------------------------------------------------------------------------- /train_audio/8_6.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/8_6.wav -------------------------------------------------------------------------------- /train_audio/8_7.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/8_7.wav -------------------------------------------------------------------------------- /train_audio/8_8.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/8_8.wav -------------------------------------------------------------------------------- /train_audio/8_9.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/8_9.wav -------------------------------------------------------------------------------- /train_audio/9_1.wav: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/9_1.wav -------------------------------------------------------------------------------- /train_audio/9_10.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/9_10.wav -------------------------------------------------------------------------------- /train_audio/9_2.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/9_2.wav -------------------------------------------------------------------------------- /train_audio/9_3.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/9_3.wav -------------------------------------------------------------------------------- /train_audio/9_4.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/9_4.wav -------------------------------------------------------------------------------- /train_audio/9_5.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/9_5.wav -------------------------------------------------------------------------------- /train_audio/9_6.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/9_6.wav -------------------------------------------------------------------------------- /train_audio/9_7.wav: 
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/9_7.wav
--------------------------------------------------------------------------------
/train_audio/9_8.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/9_8.wav
--------------------------------------------------------------------------------
/train_audio/9_9.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wblgers/hmm_speech_recognition_demo/HEAD/train_audio/9_9.wav
--------------------------------------------------------------------------------
/demo.py:
--------------------------------------------------------------------------------
'''
Created on 29/08/2018

@author: wblgers
'''
from __future__ import print_function
import warnings
import os
from scikits.talkbox.features import mfcc
from scipy.io import wavfile
from hmmlearn import hmm
import numpy as np
warnings.filterwarnings('ignore')


def extract_mfcc(full_audio_path):
    # Read the waveform and compute 12 MFCCs per frame with a 30 ms window
    sample_rate, wave = wavfile.read(full_audio_path)
    mfcc_features = mfcc(wave, nwin=int(sample_rate * 0.03), fs=sample_rate, nceps=12)[0]
    return mfcc_features


def buildDataSet(dir):
    # Filter out the wav audio files under the dir
    fileList = [f for f in os.listdir(dir) if os.path.splitext(f)[1] == '.wav']
    dataset = {}
    for fileName in fileList:
        # File names look like '<prefix>_<label>.wav'; the label is the second field
        tmp = fileName.split('.')[0]
        label = tmp.split('_')[1]
        feature = extract_mfcc(os.path.join(dir, fileName))
        # Collect the feature matrices of all files that share a label
        dataset.setdefault(label, []).append(feature)
    return dataset


def train_GMMHMM(dataset):
    GMMHMM_Models = {}
    states_num = 5
    GMM_mix_num = 3
    tmp_p = 1.0/(states_num-2)
    # Left-to-right prior: each state may stay, or move one or two states forward
    # (the np.float / np.int aliases were removed from recent NumPy, so use float / int)
    transmatPrior = np.array([[tmp_p, tmp_p, tmp_p, 0, 0],
                              [0, tmp_p, tmp_p, tmp_p, 0],
                              [0, 0, tmp_p, tmp_p, tmp_p],
                              [0, 0, 0, 0.5, 0.5],
                              [0, 0, 0, 0, 1]], dtype=float)

    # Sequences are assumed to start in one of the first two states
    startprobPrior = np.array([0.5, 0.5, 0, 0, 0], dtype=float)

    # Train one GMM-HMM per label
    for label in dataset.keys():
        model = hmm.GMMHMM(n_components=states_num, n_mix=GMM_mix_num,
                           transmat_prior=transmatPrior, startprob_prior=startprobPrior,
                           covariance_type='diag', n_iter=10)
        trainData = dataset[label]
        # hmmlearn expects all sequences stacked into one array plus their lengths
        length = np.zeros([len(trainData), ], dtype=int)
        for m in range(len(trainData)):
            length[m] = trainData[m].shape[0]
        trainData = np.vstack(trainData)
        model.fit(trainData, lengths=length)  # get optimal parameters
        GMMHMM_Models[label] = model
    return GMMHMM_Models


def main():
    trainDir = './train_audio/'
    trainDataSet = buildDataSet(trainDir)
    print("Finished preparing the training data")

    hmmModels = train_GMMHMM(trainDataSet)
    print("Finished training the GMM_HMM models for digits 0-9")

    testDir = './test_audio/'
    testDataSet = buildDataSet(testDir)

    score_cnt = 0
    for label in testDataSet.keys():
        feature = testDataSet[label]
        scoreList = {}
        # Score the first test utterance of this label against every trained model
        for model_label in hmmModels.keys():
            model = hmmModels[model_label]
            score = model.score(feature[0])
            scoreList[model_label] = score
        # The predicted label is the model with the highest log-likelihood
        predict = max(scoreList, key=scoreList.get)
        print("Test on true label ", label, ": predict result label is ", predict)
        if predict == label:
            score_cnt += 1
    print("Final recognition rate is %.2f" % (100.0*score_cnt/len(testDataSet.keys())), "%")


if __name__ == '__main__':
    main()
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# hmm_speech_recognition_demo

### 0. Setup Environment
This demo project runs on Python 2.x. Please install the following required packages as well:
- scikits.talkbox: Calculation of MFCC features on audio
- hmmlearn: Hidden Markov Models in Python, with scikit-learn like API
- scipy: Fundamental library for scientific computing

All three Python packages can be installed via `pip install`. On Python 3.x, the package `scikits.talkbox` could not be installed correctly for me.


### 1. Description
#### 1.1 Problem
Using the `GMMHMM` class in `hmmlearn`, we model the audio files in 10 categories, training one model per category. `GMMHMM` provides an easy interface to train an HMM and to evaluate the score on the test set.

For more details, please see the [doc](https://hmmlearn.readthedocs.io/en/latest/api.html#hmmlearn-hmm) of `hmmlearn`.
#### 1.2 Dataset
This is a demo project for simple isolated-word speech recognition. There are only 100 audio files with the `.wav` extension for training, and 10 audio files for testing. More specifically:

- Train: For digits 0-9, each with 10 samples in Chinese pronunciation
- Test: For digits 0-9, each with 1 sample in Chinese pronunciation

#### 1.3 Demo running results:
In Python 2.x, run the script `demo.py` to get the result below:
```
Finish prepare the training data
Finish training of the GMM_HMM models for digits 0-9
Test on true label 10 : predict result label is 9
Test on true label 1 : predict result label is 1
Test on true label 3 : predict result label is 3
Test on true label 2 : predict result label is 2
Test on true label 5 : predict result label is 5
Test on true label 4 : predict result label is 4
Test on true label 7 : predict result label is 7
Test on true label 6 : predict result label is 6
Test on true label 9 : predict result label is 9
Test on true label 8 : predict result label is 9
Final recognition rate is 80.00 %
```

### 2. Tricky bug workaround in hmmlearn
When training the HMM, `startprob_` may end up with negative values. After checking the code of hmmlearn, I found the following code in `hmmlearn/base.py` suspicious:
```
def _do_mstep(self, stats):
    """Performs the M-step of EM algorithm.
    Parameters
    ----------
    stats : dict
        Sufficient statistics updated from all available samples.
    """
    # The ``np.where`` calls guard against updating forbidden states
    # or transitions in e.g. a left-right HMM.
    if 's' in self.params:
        startprob_ = self.startprob_prior - 1.0 + stats['start']
        self.startprob_ = np.where(self.startprob_ == 0.0,
                                   self.startprob_, startprob_)
        normalize(self.startprob_)
    if 't' in self.params:
        transmat_ = self.transmat_prior - 1.0 + stats['trans']
        self.transmat_ = np.where(self.transmat_ == 0.0,
                                  self.transmat_, transmat_)
        normalize(self.transmat_, axis=1)
```
When updating `self.startprob_` and `self.transmat_` in every M-step, the code subtracts 1.0 from the prior before adding the accumulated statistics. When the prior entries are smaller than 1.0, as with the probability-like priors passed in `demo.py`, and the corresponding statistics are small, this is very likely to produce negative values in both `self.startprob_` and `self.transmat_`.

This issue has also been submitted to the issue list of `hmmlearn`, but has received no response from the maintainer of `hmmlearn`:

[startprob_ of ghmm is negative or nan #276](https://github.com/hmmlearn/hmmlearn/issues/276)

As a temporary workaround, I modified this part of the code to remove the subtraction, since there is a `normalize` after the update.
```
def _do_mstep(self, stats):
    """Performs the M-step of EM algorithm.

    Parameters
    ----------
    stats : dict
        Sufficient statistics updated from all available samples.
    """
    # The ``np.where`` calls guard against updating forbidden states
    # or transitions in e.g. a left-right HMM.
    if 's' in self.params:
        # startprob_ = self.startprob_prior - 1.0 + stats['start']
        startprob_ = self.startprob_prior + stats['start']
        self.startprob_ = np.where(self.startprob_ == 0.0,
                                   self.startprob_, startprob_)
        normalize(self.startprob_)
    if 't' in self.params:
        # transmat_ = self.transmat_prior - 1.0 + stats['trans']
        transmat_ = self.transmat_prior + stats['trans']
        self.transmat_ = np.where(self.transmat_ == 0.0,
                                  self.transmat_, transmat_)
        normalize(self.transmat_, axis=1)
```

Please note: This modification is just a workaround; an official solution will be followed up here if there is any update.
--------------------------------------------------------------------------------
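To make the arithmetic behind section 2 of the README concrete, here is a minimal sketch in plain Python (not hmmlearn code) of the start-probability update with and without the `- 1.0` term. The helper `update_startprob`, its `subtract_one` flag, and the example counts in `stats` are assumptions made up for illustration; only the prior `[0.5, 0.5, 0, 0, 0]` is taken from `demo.py`.

```python
def update_startprob(prior, stats, current, subtract_one=True):
    """Mimic the arithmetic of the start-probability update from the README.

    Hypothetical helper for illustration only; it is not part of hmmlearn.
    """
    offset = 1.0 if subtract_one else 0.0
    # Same formula as in the README snippet: prior (- 1.0) + stats.
    raw = [p - offset + s for p, s in zip(prior, stats)]
    # Like the np.where guard: entries that are currently 0 stay 0.
    guarded = [c if c == 0.0 else r for c, r in zip(current, raw)]
    total = sum(guarded)
    # Normalize so the entries sum to 1 (left unchanged if the sum is 0).
    return [g / total for g in guarded] if total != 0 else guarded


prior = [0.5, 0.5, 0.0, 0.0, 0.0]    # startprobPrior from demo.py
current = [0.5, 0.5, 0.0, 0.0, 0.0]  # assumed current startprob_
stats = [9.8, 0.2, 0.0, 0.0, 0.0]    # assumed expected start counts

original = update_startprob(prior, stats, current, subtract_one=True)
patched = update_startprob(prior, stats, current, subtract_one=False)
print("with - 1.0:   ", original)
print("without - 1.0:", patched)
```

With these assumed counts, the second entry comes out negative under the original formula, because `0.5 - 1.0 + 0.2` is below zero, while the patched formula keeps every entry non-negative. Both results still sum to 1 after normalization, which is why the negative value can go unnoticed.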