├── Pretrained_models
│   ├── basic_lstm.h5
│   ├── cnn.h5
│   ├── crnn.h5
│   ├── dnn.h5
│   ├── ds_cnn.h5
│   └── gru.h5
├── README.md
├── accuracy_table.png
└── kws_pipeline.png
--------------------------------------------------------------------------------
/Pretrained_models/basic_lstm.h5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shahnawax/KWS-ANN-KERAS/6dc52f522935751994a0ba0bb3428f84ebfccf0e/Pretrained_models/basic_lstm.h5
--------------------------------------------------------------------------------
/Pretrained_models/cnn.h5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shahnawax/KWS-ANN-KERAS/6dc52f522935751994a0ba0bb3428f84ebfccf0e/Pretrained_models/cnn.h5
--------------------------------------------------------------------------------
/Pretrained_models/crnn.h5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shahnawax/KWS-ANN-KERAS/6dc52f522935751994a0ba0bb3428f84ebfccf0e/Pretrained_models/crnn.h5
--------------------------------------------------------------------------------
/Pretrained_models/dnn.h5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shahnawax/KWS-ANN-KERAS/6dc52f522935751994a0ba0bb3428f84ebfccf0e/Pretrained_models/dnn.h5
--------------------------------------------------------------------------------
/Pretrained_models/ds_cnn.h5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shahnawax/KWS-ANN-KERAS/6dc52f522935751994a0ba0bb3428f84ebfccf0e/Pretrained_models/ds_cnn.h5
--------------------------------------------------------------------------------
/Pretrained_models/gru.h5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shahnawax/KWS-ANN-KERAS/6dc52f522935751994a0ba0bb3428f84ebfccf0e/Pretrained_models/gru.h5
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# KWS-ANN-KERAS
This repository contains pretrained artificial neural network ([ANN](https://en.wikipedia.org/wiki/Artificial_neural_network)) models (.h5 files) developed and trained in Keras for keyword spotting (KWS). All of these models are the equivalents of the [TensorFlow](https://www.tensorflow.org/)-based variants in [ML-KWS-for-MCU](https://github.com/ARM-software/ML-KWS-for-MCU), proposed in the paper [Hello Edge: Keyword Spotting on Microcontrollers](https://arxiv.org/pdf/1711.07128.pdf). For details, have a look at the paper or at the `ML-KWS-for-MCU` repository.

## Pipeline of the KWS system
The pipeline of a keyword spotting system is shown in the figure below (`kws_pipeline.png`).

## Dataset used

For the experiments we used the [Speech Commands dataset](https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html) created by [TensorFlow](https://www.tensorflow.org/) and [AIY labs](https://aiyprojects.withgoogle.com/). The dataset contains `65,000` one-second-long utterances of `30` single-word commands, saved as `.wav` files and pronounced by thousands of different people. The sampling frequency of the sound files is `16000 samples/sec`.

## Preprocessing

The raw wave files are processed to generate a feature matrix containing the Mel-frequency cepstral coefficients (MFCCs) of the overlapping windowed signal. For the current results we used a window size of `40 msec` with a stride of `20 msec`. Following the networks in [ML-KWS-for-MCU](https://github.com/ARM-software/ML-KWS-for-MCU), we kept only the first `10` MFCCs per window.
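For illustration, the sketch below computes this MFCC matrix for a one-second clip. It is only a minimal example under our own assumptions: it relies on the `python_speech_features` and `scipy` packages (the original ML-KWS-for-MCU pipeline computes its MFCCs with TensorFlow's audio ops, so numerical details may differ), and the file path is hypothetical.

```python
# Minimal MFCC front-end sketch. Assumptions: `python_speech_features` and `scipy`
# are installed; this is not the exact feature extractor used to train the models.
import numpy as np
from scipy.io import wavfile
from python_speech_features import mfcc

SAMPLE_RATE = 16000   # Speech Commands clips are sampled at 16 kHz
WINDOW_SEC = 0.040    # 40 msec analysis window
STRIDE_SEC = 0.020    # 20 msec hop between windows
NUM_MFCC = 10         # keep only the first 10 coefficients

def wav_to_mfcc(path):
    """Load a one-second .wav file and return its MFCC matrix of shape (49, 10)."""
    rate, signal = wavfile.read(path)
    assert rate == SAMPLE_RATE, "expected 16 kHz audio"
    # Pad or trim to exactly one second so every clip yields the same number of frames:
    # 1 + (1000 msec - 40 msec) / 20 msec = 49 frames.
    signal = np.pad(signal, (0, max(0, SAMPLE_RATE - len(signal))))[:SAMPLE_RATE]
    return mfcc(signal, samplerate=rate,
                winlen=WINDOW_SEC, winstep=STRIDE_SEC,
                numcep=NUM_MFCC, nfft=1024)  # nfft must cover the 640-sample frame

# Hypothetical path inside an extracted copy of the dataset:
# features = wav_to_mfcc("speech_commands/zero/0a2b400e_nohash_0.wav")
```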
## Training of the models

The training commands with all the hyperparameters needed to reproduce the models shown in the [paper](https://arxiv.org/pdf/1711.07128.pdf) are given [here](https://github.com/ARM-software/ML-KWS-for-MCU/blob/master/train_commands.txt). For these experiments we trained the models on `12` classes: the numbers from `zero` to `nine` plus two classes named `silence` and `unknown`. The `unknown` class contains instances of all the other words, and the proportion of both `unknown` and `silence` is kept at `10%`. For details, see the `input_data.py` file in the [ML-KWS-for-MCU](https://github.com/ARM-software/ML-KWS-for-MCU) repository.

## Pretrained models

Pretrained models (.h5 files) for the different neural network architectures shown in the [arXiv paper](https://arxiv.org/pdf/1711.07128.pdf), namely Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), basic Long Short-Term Memory (LSTM) networks, Gated Recurrent Unit (GRU) networks, Convolutional Recurrent Neural Networks (CRNN), and Depthwise Separable Convolutional Neural Networks (DS-CNN), are provided in [Pretrained_models](Pretrained_models). The accuracy of the models on the test set and their memory requirements per inference are summarized in the accuracy table (`accuracy_table.png`).

Please note that all weights and activations are assumed to be 32 bits wide.
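As a usage sketch, one of the pretrained models can be loaded and run on a single feature matrix as shown below. This assumes a TensorFlow/Keras version that can still deserialize these HDF5 files and reuses the hypothetical `wav_to_mfcc` helper from the preprocessing sketch above; check `model.input_shape` before feeding real data, since the expected input layout differs between architectures.

```python
# Sketch of running one pretrained model on one clip. Assumptions: tensorflow.keras
# can load the .h5 file, and the (49 x 10) MFCC matrix from wav_to_mfcc() can be
# reshaped to the model's expected input layout.
import numpy as np
from tensorflow.keras.models import load_model

model = load_model("Pretrained_models/dnn.h5")
model.summary()  # inspect the expected input shape

features = wav_to_mfcc("speech_commands/zero/0a2b400e_nohash_0.wav")  # hypothetical path
x = features.reshape((1,) + model.input_shape[1:]).astype("float32")  # add batch dimension
probs = model.predict(x)
print("predicted class index:", int(np.argmax(probs, axis=-1)[0]))
```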
--------------------------------------------------------------------------------
/accuracy_table.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shahnawax/KWS-ANN-KERAS/6dc52f522935751994a0ba0bb3428f84ebfccf0e/accuracy_table.png
--------------------------------------------------------------------------------
/kws_pipeline.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shahnawax/KWS-ANN-KERAS/6dc52f522935751994a0ba0bb3428f84ebfccf0e/kws_pipeline.png
--------------------------------------------------------------------------------