├── Pretrained_models
│   ├── basic_lstm.h5
│   ├── cnn.h5
│   ├── crnn.h5
│   ├── dnn.h5
│   ├── ds_cnn.h5
│   └── gru.h5
├── README.md
├── accuracy_table.png
└── kws_pipeline.png
--------------------------------------------------------------------------------
/Pretrained_models/basic_lstm.h5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shahnawax/KWS-ANN-KERAS/6dc52f522935751994a0ba0bb3428f84ebfccf0e/Pretrained_models/basic_lstm.h5
--------------------------------------------------------------------------------
/Pretrained_models/cnn.h5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shahnawax/KWS-ANN-KERAS/6dc52f522935751994a0ba0bb3428f84ebfccf0e/Pretrained_models/cnn.h5
--------------------------------------------------------------------------------
/Pretrained_models/crnn.h5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shahnawax/KWS-ANN-KERAS/6dc52f522935751994a0ba0bb3428f84ebfccf0e/Pretrained_models/crnn.h5
--------------------------------------------------------------------------------
/Pretrained_models/dnn.h5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shahnawax/KWS-ANN-KERAS/6dc52f522935751994a0ba0bb3428f84ebfccf0e/Pretrained_models/dnn.h5
--------------------------------------------------------------------------------
/Pretrained_models/ds_cnn.h5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shahnawax/KWS-ANN-KERAS/6dc52f522935751994a0ba0bb3428f84ebfccf0e/Pretrained_models/ds_cnn.h5
--------------------------------------------------------------------------------
/Pretrained_models/gru.h5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shahnawax/KWS-ANN-KERAS/6dc52f522935751994a0ba0bb3428f84ebfccf0e/Pretrained_models/gru.h5
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# KWS-ANN-KERAS
This repository contains pretrained artificial neural network ([ANN](https://en.wikipedia.org/wiki/Artificial_neural_network)) models (.h5 files) developed and trained in Keras for keyword spotting (KWS). All of these models are the equivalents of the [TensorFlow](https://www.tensorflow.org/)-based variants in [ML-KWS-for-MCU](https://github.com/ARM-software/ML-KWS-for-MCU), proposed in the paper [Hello Edge: Keyword Spotting on Microcontrollers](https://arxiv.org/pdf/1711.07128.pdf). For details, have a look at the paper or at the `ML-KWS-for-MCU` repository.

## Pipeline of the KWS system
The pipeline of a keyword spotting system is shown in the figure below (`kws_pipeline.png`).

## Dataset used

For the experiments we used the [Speech Commands dataset](https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html) created by [TensorFlow](https://www.tensorflow.org/) and [AIY labs](https://aiyprojects.withgoogle.com/). The dataset contains `65,000` one-second-long utterances of `30` single-word commands, saved as `.wav` files and pronounced by thousands of different people. The sampling frequency of the sound files is `16000 samples/sec`.

## Preprocessing

The raw wave files are processed to generate a feature matrix containing the Mel-frequency cepstral coefficients (MFCCs) of the overlapping windowed signal. For the current results we used a window size of `40 msec` with a stride of `20 msec`. Following the networks in [ML-KWS-for-MCU](https://github.com/ARM-software/ML-KWS-for-MCU), we kept only the first `10` MFCCs per window.
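For illustration, the sketch below computes this MFCC matrix for a one-second clip. It is only a minimal example under our own assumptions: it relies on the `python_speech_features` and `scipy` packages (the original ML-KWS-for-MCU pipeline computes its MFCCs with TensorFlow's audio ops, so numerical details may differ), and the file path is hypothetical.

```python
# Minimal MFCC front-end sketch. Assumptions: `python_speech_features` and `scipy`
# are installed; this is not the exact feature extractor used to train the models.
import numpy as np
from scipy.io import wavfile
from python_speech_features import mfcc

SAMPLE_RATE = 16000   # Speech Commands clips are sampled at 16 kHz
WINDOW_SEC = 0.040    # 40 msec analysis window
STRIDE_SEC = 0.020    # 20 msec hop between windows
NUM_MFCC = 10         # keep only the first 10 coefficients

def wav_to_mfcc(path):
    """Load a one-second .wav file and return its MFCC matrix of shape (49, 10)."""
    rate, signal = wavfile.read(path)
    assert rate == SAMPLE_RATE, "expected 16 kHz audio"
    # Pad or trim to exactly one second so every clip yields the same number of frames:
    # 1 + (1000 msec - 40 msec) / 20 msec = 49 frames.
    signal = np.pad(signal, (0, max(0, SAMPLE_RATE - len(signal))))[:SAMPLE_RATE]
    return mfcc(signal, samplerate=rate,
                winlen=WINDOW_SEC, winstep=STRIDE_SEC,
                numcep=NUM_MFCC, nfft=1024)  # nfft must cover the 640-sample frame

# Hypothetical path inside an extracted copy of the dataset:
# features = wav_to_mfcc("speech_commands/zero/0a2b400e_nohash_0.wav")
```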
## Training of the models

The training commands with all the hyperparameters needed to reproduce the models shown in the [paper](https://arxiv.org/pdf/1711.07128.pdf) are given [here](https://github.com/ARM-software/ML-KWS-for-MCU/blob/master/train_commands.txt). For these experiments we trained the models on `12` classes: the numbers from `zero` to `nine` plus two classes named `silence` and `unknown`. The `unknown` class contains instances of all the other words, and the proportion of both `unknown` and `silence` is kept at `10%`. For details, see the `input_data.py` file in the [ML-KWS-for-MCU](https://github.com/ARM-software/ML-KWS-for-MCU) repository.

## Pretrained models

Pretrained models (.h5 files) for the different neural network architectures shown in the [arXiv paper](https://arxiv.org/pdf/1711.07128.pdf), namely Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), basic Long Short-Term Memory (LSTM) networks, Gated Recurrent Unit (GRU) networks, Convolutional Recurrent Neural Networks (CRNN), and Depthwise Separable Convolutional Neural Networks (DS-CNN), are provided in [Pretrained_models](Pretrained_models). The accuracy of the models on the test set and their memory requirements per inference are summarized in the accuracy table (`accuracy_table.png`).

Please note that all weights and activations are assumed to be 32 bits wide.
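As a usage sketch, one of the pretrained models can be loaded and run on a single feature matrix as shown below. This assumes a TensorFlow/Keras version that can still deserialize these HDF5 files and reuses the hypothetical `wav_to_mfcc` helper from the preprocessing sketch above; check `model.input_shape` before feeding real data, since the expected input layout differs between architectures.

```python
# Sketch of running one pretrained model on one clip. Assumptions: tensorflow.keras
# can load the .h5 file, and the (49 x 10) MFCC matrix from wav_to_mfcc() can be
# reshaped to the model's expected input layout.
import numpy as np
from tensorflow.keras.models import load_model

model = load_model("Pretrained_models/dnn.h5")
model.summary()  # inspect the expected input shape

features = wav_to_mfcc("speech_commands/zero/0a2b400e_nohash_0.wav")  # hypothetical path
x = features.reshape((1,) + model.input_shape[1:]).astype("float32")  # add batch dimension
probs = model.predict(x)
print("predicted class index:", int(np.argmax(probs, axis=-1)[0]))
```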
--------------------------------------------------------------------------------
/accuracy_table.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shahnawax/KWS-ANN-KERAS/6dc52f522935751994a0ba0bb3428f84ebfccf0e/accuracy_table.png
--------------------------------------------------------------------------------
/kws_pipeline.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shahnawax/KWS-ANN-KERAS/6dc52f522935751994a0ba0bb3428f84ebfccf0e/kws_pipeline.png
--------------------------------------------------------------------------------