├── .gitignore ├── 0_introduction.ipynb ├── 1_data_exploration.ipynb ├── 2_custom_autoencoder.ipynb ├── 3_rekognition_custom_labels.ipynb ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── README.md ├── autoencoder ├── model.py └── requirements.txt ├── pictures ├── confusion_matrix_autoencoder.png ├── confusion_matrix_rekognition.png ├── reconstruction_error_histograms.png ├── reconstruction_error_threshold.png ├── spectrograms.png ├── stft.png ├── threshold_range_exploration.png └── waveforms.png └── tools ├── rekognition_tools.py ├── sound_tools.py └── utils.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | pip-wheel-metadata/ 24 | share/python-wheels/ 25 | *.egg-info/ 26 | .installed.cfg 27 | *.egg 28 | MANIFEST 29 | 30 | /data 31 | 32 | # PyInstaller 33 | # Usually these files are written by a python script from a template 34 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 35 | *.manifest 36 | *.spec 37 | 38 | # Installer logs 39 | pip-log.txt 40 | pip-delete-this-directory.txt 41 | 42 | # Unit test / coverage reports 43 | htmlcov/ 44 | .tox/ 45 | .nox/ 46 | .coverage 47 | .coverage.* 48 | .cache 49 | nosetests.xml 50 | coverage.xml 51 | *.cover 52 | *.py,cover 53 | .hypothesis/ 54 | .pytest_cache/ 55 | 56 | # Translations 57 | *.mo 58 | *.pot 59 | 60 | # Django stuff: 61 | *.log 62 | local_settings.py 63 | db.sqlite3 64 | db.sqlite3-journal 65 | 66 | # Flask stuff: 67 | instance/ 68 | .webassets-cache 69 | 70 | # Scrapy stuff: 71 | .scrapy 72 | 73 | # Sphinx documentation 74 | docs/_build/ 75 | 76 | # PyBuilder 77 | target/ 78 | 79 | # Jupyter Notebook 80 | .ipynb_checkpoints 81 | 82 | # IPython 83 | profile_default/ 84 | ipython_config.py 85 | 86 | # pyenv 87 | .python-version 88 | 89 | # pipenv 90 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 91 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 92 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 93 | # install all needed dependencies. 94 | #Pipfile.lock 95 | 96 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow 97 | __pypackages__/ 98 | 99 | # Celery stuff 100 | celerybeat-schedule 101 | celerybeat.pid 102 | 103 | # SageMath parsed files 104 | *.sage.py 105 | 106 | # Environments 107 | .env 108 | .venv 109 | env/ 110 | venv/ 111 | ENV/ 112 | env.bak/ 113 | venv.bak/ 114 | 115 | # Spyder project settings 116 | .spyderproject 117 | .spyproject 118 | 119 | # Rope project settings 120 | .ropeproject 121 | 122 | # mkdocs documentation 123 | /site 124 | 125 | # mypy 126 | .mypy_cache/ 127 | .dmypy.json 128 | dmypy.json 129 | 130 | # Pyre type checker 131 | .pyre/ 132 | -------------------------------------------------------------------------------- /0_introduction.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Sound anomaly detection\n", 8 | "*Context*\n", 9 | "\n", 10 | "## Introduction\n", 11 | "---\n", 12 | "Industrial companies have been collecting a massive amount of time series data about their operating processes, manufacturing production lines and industrial equipment. They sometimes store years of data in historian systems or in their factory information system at large. Whether they are looking to prevent equipment breakdowns that would stop a production line, avoid catastrophic failures in a power generation facility or improve their end product quality by adjusting their process parameters, having the ability to process time series data is a challenge that modern cloud technologies are up to. However, not everything is about the cloud itself: your factory edge capability must allow you to stream the appropriate data to the cloud (bandwidth, connectivity, protocol compatibility, putting data in context...).\n", 13 | "\n", 14 | "What if you had a frugal way to qualify your equipment health with little data? This would definitely help you leverage robust and easier-to-maintain edge-to-cloud blueprints. In this post, we are going to focus on a tactical approach industrial companies can use to help them reduce the impact of machine breakdowns by reducing how unpredictable they are.\n", 15 | "\n", 16 | "Most of the time, machine failures are tackled by either reactive action (stop the line and repair...) or costly preventive maintenance where you have to build the proper replacement parts inventory and schedule regular maintenance activities. Skilled machine operators are the most valuable assets in such settings: years of experience allow them to develop a fine knowledge of how the machinery should operate; they become expert listeners and can detect unusual behavior and sounds in rotating and moving machines. However, production lines are becoming more and more automated, and augmenting these machine operators with AI-generated insights is a way to maintain and develop the fine expertise needed to prevent reactive-only postures when dealing with machine breakdowns.\n", 17 | "\n", 18 | "In this post, we are going to compare and contrast two different approaches to identify a malfunctioning machine, provided we have sound recordings from its operation: we will start by building a neural network based on an autoencoder architecture, and we will then use an image-based approach where we will feed images of sound (namely spectrograms) to an image-based automated ML classification feature."
19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "## Solution overview\n", 26 | "---\n", 27 | "In this example, we are going to use sounds recorded in an industrial environment to perform anomaly detection on industrial equipment.\n", 28 | "\n", 29 | "To achieve this, we are going to explore and leverage the MIMII dataset for anomaly detection purposes: this is a sound dataset for **M**alfunctioning **I**ndustrial **M**achine **I**nvestigation and **I**nspection (MIMII). You can download it from **https://zenodo.org/record/3384388**: it contains sounds from several types of industrial machines (valves, pumps, fans and slide rails). In this example, we are going to focus on the **fans**. **[This paper](https://arxiv.org/abs/1909.09347)** describes the sound capture procedure.\n", 30 | "\n", 31 | "We walk you through the following steps using Jupyter notebooks provided with this blog post:\n", 32 | "\n", 33 | "1. The first one will focus on *data exploration* to get familiar with sound data: sound data is a particular kind of time series data and exploring it requires specific approaches.\n", 34 | "2. We will then use Amazon SageMaker to *build an autoencoder* that will be used as a classifier able to discriminate between normal and abnormal sounds.\n", 35 | "3. Last, we are going to take a more novel approach: we are going to *transform the sound files into spectrogram images* and feed them directly to an *image classifier*. We will use Amazon Rekognition Custom Labels to perform this classification task and leverage Amazon SageMaker for the data preprocessing and to drive the Custom Labels training and evaluation process.\n", 36 | "\n", 37 | "Both approaches require an equal amount of effort to complete: although the models obtained in the end are not comparable, this will give you an idea of how much of a kick start you may get when using an applied AI service." 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "## Introducing the machine sound dataset\n", 45 | "---\n", 46 | "You can follow this data exploration work with the first companion notebook from **[this repository](https://github.com/michaelhoarau/sound-anomaly-detection)**. Each recording contains 8 channels, one for each microphone that was used to record a given machine sound. In this experiment, we will only focus on the recordings of the first microphone. The first thing we can do is to plot the waveforms of a normal and an abnormal signal next to each other:\n", 47 | "\n", 48 | "![Waveforms](pictures/waveforms.png)\n", 49 | "\n", 50 | "Each signal is 10 seconds long and apart from the larger amplitude of the abnormal signal and some patterns that are more irregular, it’s difficult to distinguish between these two signals. In the companion notebook, you will also be able to listen to some of the sounds: most of the time, the differences are small, especially if you put them in the context of a very noisy environment.\n", 51 | "\n", 52 | "A first approach could be to leverage the **[Fourier transform](https://en.wikipedia.org/wiki/Fourier_transform)**, which is a mathematical operator that decomposes a function of time (or a signal) into its underlying frequencies. The Fourier transform is a function of frequency and its amplitude represents how much of a given frequency is present in the original signal. However, a sound signal is highly non-stationary (i.e. its statistics change over time).
For a given time period, the frequency decomposition will be different from another time period. As a consequence, it will be rather meaningless to compute a single Fourier transform over the entire signal (however short they are in our case). We will need to call the short-time Fourier transform (STFT) for help: the STFT is obtained by computing the Fourier transform for successive frames in a signal.\n", 53 | "\n", 54 | "If we plot the amplitude of each frequency present in the first 64 ms of the first signal of both the normal and abnormal dataset, we obtain the following plot:\n", 55 | "\n", 56 | "![Short Fourier Transform](pictures/stft.png)\n", 57 | "\n", 58 | "We now have a tool to discretize our time signals into the frequency domain which brings us one step closer to be able to visualize them in this domain. For each signal we will now:\n", 59 | "\n", 60 | "1. Slice the signal in successive time frames\n", 61 | "2. Compute an STFT for each time frame\n", 62 | "3. Extract the amplitude of each frequency as a function of time\n", 63 | "4. Most sounds we can hear as humans, are concentrated in a very small range (**both** in frequency and amplitude range). The next step is then to take a log scale for both the frequency and the amplitude: for the amplitude, we obtain this by converting the color axis to Decibels (which is the equivalent of applying a log scale to the sound amplitudes)\n", 64 | "5. Plot the result on a spectrogram: a spectrogram has three dimensions: we keep time on the horizontal axis, put frequency on the vertical axis and use the amplitude to a color axis (in dB).\n", 65 | "\n", 66 | "The picture below shows the frequency representation of the signals plotted earlier:\n", 67 | "\n", 68 | "![Spectrograms](pictures/spectrograms.png)\n", 69 | "\n", 70 | "We can now see that these images have interesting features that we can easily uncover with our naked eyes: this is exactly the kind of features that a neural network can try to uncover and structure. We will now build two types of feature extractor based on this analysis and feed them to different type of architectures." 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "## Building a custom autoencoder architecture\n", 78 | "---\n", 79 | "The **[autoencoder architecture](https://en.wikipedia.org/wiki/Autoencoder)** is a neural network with the same number of neurons in the input and the output layers. This kind of architecture learns to generate the “identity” transformation between inputs and outputs. The second notebook of our series will go through these different steps:\n", 80 | "\n", 81 | "1. Build the dataset: to feed the spectrogram to an autoencoder, we will build a tabular dataset and upload it to Amazon S3.\n", 82 | "2. Create a TensorFlow autoencoder model, train it in script mode by using the TensorFlow / Keras existing container\n", 83 | "3. Evaluate the model to obtain a confusion matrix highlighting the classification performance between normal and abnormal sounds.\n", 84 | "\n", 85 | "### Build a dataset\n", 86 | "We are using the **[librosa library](https://librosa.org/doc/latest/index.html)** which is a python package for audio analysis. 
A feature extraction function, based on the spectrogram generation steps described earlier, is central to the dataset generation process.\n", 87 | "\n", 88 | "```python\n", 89 | "def extract_signal_features(signal, sr, n_mels=64, frames=5, n_fft=1024, hop_length=512):\n", 90 | " # Compute a spectrogram (using Mel scale):\n", 91 | " mel_spectrogram = librosa.feature.melspectrogram(\n", 92 | " y=signal,\n", 93 | " sr=sr,\n", 94 | " n_fft=n_fft,\n", 95 | " hop_length=hop_length,\n", 96 | " n_mels=n_mels\n", 97 | " )\n", 98 | " \n", 99 | " # Convert to decibel (log scale for amplitude):\n", 100 | " log_mel_spectrogram = librosa.power_to_db(mel_spectrogram, ref=np.max)\n", 101 | " \n", 102 | " # Generate an array of vectors as features for the current signal:\n", 103 | " features_vector_size = log_mel_spectrogram.shape[1] - frames + 1\n", 104 | " dims = n_mels * frames\n", 105 | " # Build N sliding windows (=frames) and concatenate\n", 106 | " # them to build a feature vector:\n", 107 | " features = np.zeros((features_vector_size, dims), np.float32)\n", 108 | " for t in range(frames):\n", 109 | " features[:, n_mels*t:n_mels*(t+1)] = log_mel_spectrogram[:, t:t+features_vector_size].T\n", 110 | " \n", 111 | " return features\n", 112 | "```\n", 113 | "\n", 114 | "Note that we will train our autoencoder only on the normal signals: our model will learn how to reconstruct these signals (“learning the identity transformation”). The main idea is to leverage this for classification later; when we feed this trained model with abnormal sounds, the reconstruction error will be a lot higher than when trying to reconstruct normal sounds. Using an error threshold, we will then be able to discriminate abnormal and normal sounds.\n", 115 | "\n", 116 | "### Create the autoencoder\n", 117 | "To build our autoencoder, we use Keras and assemble a simple autoencoder architecture with 3 hidden layers:\n", 118 | "\n", 119 | "```python\n", 120 | "from tensorflow.keras import Input\n", 121 | "from tensorflow.keras.models import Model\n", 122 | "from tensorflow.keras.layers import Dense\n", 123 | "\n", 124 | "def autoencoder_model(input_dims):\n", 125 | " inputLayer = Input(shape=(input_dims,))\n", 126 | " h = Dense(64, activation=\"relu\")(inputLayer)\n", 127 | " h = Dense(64, activation=\"relu\")(h)\n", 128 | " h = Dense(8, activation=\"relu\")(h)\n", 129 | " h = Dense(64, activation=\"relu\")(h)\n", 130 | " h = Dense(64, activation=\"relu\")(h)\n", 131 | " h = Dense(input_dims, activation=None)(h)\n", 132 | "\n", 133 | " return Model(inputs=inputLayer, outputs=h)\n", 134 | "```\n", 135 | "\n", 136 | "We put this in a training script (model.py) and use the SageMaker TensorFlow estimator to configure our training job and launch the training:\n", 137 | "\n", 138 | "```python\n", 139 | "tf_estimator = TensorFlow(\n", 140 | " base_job_name='sound-anomaly',\n", 141 | " entry_point='model.py',\n", 142 | " source_dir='./autoencoder/',\n", 143 | " role=role,\n", 144 | " instance_count=1, \n", 145 | " instance_type='ml.p3.2xlarge',\n", 146 | " framework_version='2.2',\n", 147 | " py_version='py37',\n", 148 | " hyperparameters={\n", 149 | " 'epochs': 30,\n", 150 | " 'batch-size': 512,\n", 151 | " 'learning-rate': 1e-3,\n", 152 | " 'n_mels': n_mels,\n", 153 | " 'frame': frames\n", 154 | " },\n", 155 | " debugger_hook_config=False\n", 156 | ")\n", 157 | "\n", 158 | "tf_estimator.fit({'training': training_input_path})\n", 159 | "```\n", 160 | "\n", 161 | "Training over 30 epochs will take a few minutes on a p3.2xlarge instance: at this stage, this will
cost you a few cents. If you plan to use a similar approach on the whole MIMII dataset or use hyperparameter tuning, you can even further reduce this training cost by using Spot Training (check out **[this sample](https://github.com/aws-samples/amazon-sagemaker-managed-spot-training)** on how you can leverage Managed Spot Training and get a 70% discount in the process).\n", 162 | "\n", 163 | "### Evaluate the model\n", 164 | "Let’s now deploy the autoencoder behind a SageMaker endpoint: this operation will create a SageMaker endpoint and will continue to cost you as long as you leave it running. Do not forget to shut it down at the end of this experiment!\n", 165 | "\n", 166 | "```python\n", 167 | "tf_endpoint_name = 'sound-anomaly-'+time.strftime(\"%Y-%m-%d-%H-%M-%S\", time.gmtime())\n", 168 | "tf_predictor = tf_estimator.deploy(\n", 169 | " initial_instance_count=1,\n", 170 | " instance_type='ml.c5.large',\n", 171 | " endpoint_name=tf_endpoint_name\n", 172 | ")\n", 173 | "print(f'Endpoint name: {tf_predictor.endpoint_name}')\n", 174 | "```\n", 175 | "\n", 176 | "Our test dataset contains both normal and abnormal sounds. We will loop through this dataset and send each test file to this endpoint. As our model is an autoencoder, we will evaluate how good the model is at reconstructing the input. The higher the reconstruction error, the greater the chance that we have identified an anomaly:\n", 177 | "\n", 178 | "```python\n", 179 | "y_true = test_labels\n", 180 | "reconstruction_errors = []\n", 181 | "\n", 182 | "for index, eval_filename in tqdm(enumerate(test_files), total=len(test_files)):\n", 183 | " # Load signal\n", 184 | " signal, sr = sound_tools.load_sound_file(eval_filename)\n", 185 | "\n", 186 | " # Extract features from this signal:\n", 187 | " eval_features = sound_tools.extract_signal_features(\n", 188 | " signal, \n", 189 | " sr, \n", 190 | " n_mels=n_mels, \n", 191 | " frames=frames, \n", 192 | " n_fft=n_fft, \n", 193 | " hop_length=hop_length\n", 194 | " )\n", 195 | " \n", 196 | " # Get predictions from our autoencoder:\n", 197 | " prediction = tf_predictor.predict(eval_features)['predictions']\n", 198 | " \n", 199 | " # Estimate the reconstruction error:\n", 200 | " mse = np.mean(np.mean(np.square(eval_features - prediction), axis=1))\n", 201 | " reconstruction_errors.append(mse)\n", 202 | "```\n", 203 | "\n", 204 | "In the plot below, we can see that the distribution of reconstruction error for normal and abnormal signals differs significantly. The overlap between these histograms means we have to compromise:\n", 205 | "\n", 206 | "![Reconstruction Error Histograms](pictures/reconstruction_error_histograms.png)\n", 207 | "\n", 208 | "Let's explore the recall-precision tradeoff for a reconstruction error threshold varying between 5.0 and 10.0 (this encompasses most of the overlap we can see above).
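Before visualizing it, here is a minimal sketch of how such a threshold sweep could be computed (an illustration only, not part of the original notebook: it assumes `reconstruction_errors` and `y_true` come from the evaluation loop above, with abnormal samples labeled `1`):

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

errors = np.array(reconstruction_errors)
for threshold in np.arange(5.0, 10.0, 0.5):
    # Any sample whose reconstruction error exceeds the threshold is flagged as abnormal:
    y_pred = (errors > threshold).astype(int)
    print(f'threshold={threshold:.1f}  '
          f'precision={precision_score(y_true, y_pred):.3f}  '
          f'recall={recall_score(y_true, y_pred):.3f}  '
          f'F1={f1_score(y_true, y_pred):.3f}')
```
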
First, let's visualize how this threshold range separates our signals on a scatter plot of all the testing samples:\n", 209 | "\n", 210 | "![threshold_range_exploration](pictures/threshold_range_exploration.png)\n", 211 | "\n", 212 | "If we plot the number of samples flagged as false positives and false negatives, we can see that the best compromise is to use a threshold set around 6.3 for the reconstruction error (assuming we are not looking at minimizing either the false positive or the false negative occurrences):\n", 213 | "\n", 214 | "![reconstruction_error_threshold](pictures/reconstruction_error_threshold.png)\n", 215 | "\n", 216 | "For this threshold (6.3), we obtain the confusion matrix below:\n", 217 | "\n", 218 | "![confusion_matrix](pictures/confusion_matrix_autoencoder.png)\n", 219 | "\n", 220 | "The metrics associated with this matrix are the following:\n", 221 | "\n", 222 | "* Precision: 92.1%\n", 223 | "* Recall: 92.1%\n", 224 | "* Accuracy: 88.5%\n", 225 | "* F1 Score: 92.1%\n", 226 | "\n", 227 | "### Cleanup\n", 228 | "Let’s not forget to delete our endpoint, using the **delete_endpoint()** API, to prevent any further costs from being incurred.\n", 229 | "\n", 230 | "### Autoencoder improvement and further exploration\n", 231 | "\n", 232 | "The spectrogram approach requires defining the spectrogram square dimensions (e.g. the number of Mel cells defined in the data exploration notebook), which is a heuristic. In contrast, deep learning networks with a CNN encoder can learn the best representation to perform the task at hand (anomaly detection). Further steps to investigate to improve on this first result could be:\n", 233 | "\n", 234 | "* Experimenting with several more or less complex autoencoder architectures, training for a longer time, performing hyperparameter tuning with different optimizers, tuning the data preparation sequence (e.g. sound discretization parameters), etc.\n", 235 | "* Leveraging high resolution spectrograms and feeding them to a CNN encoder to uncover the most appropriate representation of the sound.\n", 236 | "* Using end-to-end encoder-decoder model architectures that are known to give good results on waveform datasets.\n", 237 | "* Using deep learning models with multi-context temporal and channel (8 microphones) attention weights.\n", 238 | "* Experimenting with time-distributed 2D convolution layers to encode features across the 8 channels: these encoded features could then be fed as sequences across time steps to an LSTM or GRU layer. From there, multiplicative sequence attention weights can then be learnt on the output sequence from the RNN layer.\n", 239 | "* Exploring the appropriate image representation for multivariate time series signals that are not waveforms: replacing spectrograms with Markov transition fields, recurrence plots or network graphs could then be used to achieve the same goals for non-sound time-based signals." 240 | ] 241 | }, 242 | { 243 | "cell_type": "markdown", 244 | "metadata": {}, 245 | "source": [ 246 | "## Using Amazon Rekognition Custom Labels\n", 247 | "---\n", 248 | "### Build a dataset\n", 249 | "Previously, we had to train our autoencoder on only normal signals. In this case, we will build a more traditional split of training and testing datasets.
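As an illustration, the split itself could be produced along these lines (a sketch only, assuming `normal_files` and `abnormal_files` are the lists of sound file paths built in the companion notebooks):

```python
from sklearn.model_selection import train_test_split

# 0 = normal, 1 = abnormal
X = list(normal_files) + list(abnormal_files)
y = [0] * len(normal_files) + [1] * len(abnormal_files)

# Stratified 80/20 split so both classes are represented in train and test:
train_files, test_files, train_labels, test_labels = train_test_split(
    X, y, train_size=0.8, random_state=42, shuffle=True, stratify=y
)
```
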
Based on the fans sound database this will yield:\n", 250 | "\n", 251 | "* **4440 signals** for the training dataset, including:\n", 252 | " * 3260 normal signals\n", 253 | " * 1180 abnormal signals\n", 254 | "\n", 255 | "* **1110 signals** for the testing dataset including:\n", 256 | " * 815 normal signals\n", 257 | " * 295 abnormal signals\n", 258 | "\n", 259 | "We will generate and store the spectrogram of each signal and upload them in either a train or test bucket.\n", 260 | "\n", 261 | "### Create a Rekognition Custom Labels\n", 262 | "\n", 263 | "The first step is to create a Custom Labels project:\n", 264 | "\n", 265 | "```python\n", 266 | "# Initialization, get a Rekognition client:\n", 267 | "PROJECT_NAME = 'sound-anomaly-detection'\n", 268 | "reko = boto3.client(\"rekognition\")\n", 269 | "\n", 270 | "# Let's try to create a Rekognition project:\n", 271 | "try:\n", 272 | " project_arn = reko.create_project(ProjectName=PROJECT_NAME)['ProjectArn']\n", 273 | " \n", 274 | "# If the project already exists, we get its ARN:\n", 275 | "except reko.exceptions.ResourceInUseException:\n", 276 | " # List all the existing project:\n", 277 | " print('Project already exists, collecting the ARN.')\n", 278 | " reko_project_list = reko.describe_projects()\n", 279 | " \n", 280 | " # Loop through all the Rekognition projects:\n", 281 | " for project in reko_project_list['ProjectDescriptions']:\n", 282 | " # Get the project name (the string after the first delimiter in the ARN)\n", 283 | " project_name = project['ProjectArn'].split('/')[1]\n", 284 | " \n", 285 | " # Once we find it, we store the ARN and break out of the loop:\n", 286 | " if (project_name == PROJECT_NAME):\n", 287 | " project_arn = project['ProjectArn']\n", 288 | " break\n", 289 | " \n", 290 | "print(project_arn)\n", 291 | "```\n", 292 | "\n", 293 | "We need to tell Amazon Rekognition where to find the training data, testing data and where to output its results:\n", 294 | "\n", 295 | "```python\n", 296 | "TrainingData = {\n", 297 | " 'Assets': [{ \n", 298 | " 'GroundTruthManifest': {\n", 299 | " 'S3Object': { \n", 300 | " 'Bucket': ,\n", 301 | " 'Name': f'{}/manifests/train.manifest'\n", 302 | " }\n", 303 | " }\n", 304 | " }]\n", 305 | "}\n", 306 | "\n", 307 | "TestingData = {\n", 308 | " 'AutoCreate': True\n", 309 | "}\n", 310 | "\n", 311 | "OutputConfig = { \n", 312 | " 'S3Bucket': ,\n", 313 | " 'S3KeyPrefix': f'{}/output'\n", 314 | "}\n", 315 | "```\n", 316 | "\n", 317 | "Now we can create a project version: creating a project version will build and train a model within this Rekognition project for the data previously configured. Project creation can fail, if the bucket you selected cannot be accessed by Rekognition. Make sure the right Bucket Policy is applied to your bucket (check the notebooks to see the recommended policy).\n", 318 | "\n", 319 | "Let’s now create a project version: this will launch a new model training and you will then have to wait for the model to be trained. 
This should take around 1 hour (less than $1 from a cost perspective):\n", 320 | "\n", 321 | "```python\n", 322 | "version = 'experiment-1'\n", 323 | "VERSION_NAME = f'{PROJECT_NAME}.{version}'\n", 324 | "\n", 325 | "# Let's try to create a new project version in the current project:\n", 326 | "try:\n", 327 | " project_version_arn = reko.create_project_version(\n", 328 | " ProjectArn=project_arn, # Project ARN\n", 329 | " VersionName=VERSION_NAME, # Name of this version\n", 330 | " OutputConfig=OutputConfig, # S3 location for the output artefact\n", 331 | " TrainingData=TrainingData, # S3 location of the manifest describing the training data\n", 332 | " TestingData=TestingData # S3 location of the manifest describing the validation data\n", 333 | " )['ProjectVersionArn']\n", 334 | " \n", 335 | "# If a project version with this name already exists, we get its ARN:\n", 336 | "except reko.exceptions.ResourceInUseException:\n", 337 | " # List all the project versions (=models) for this project:\n", 338 | " print('Project version already exists, collecting the ARN:', end=' ')\n", 339 | " reko_project_versions_list = reko.describe_project_versions(ProjectArn=project_arn)\n", 340 | " \n", 341 | " # Loops through them:\n", 342 | " for project_version in reko_project_versions_list['ProjectVersionDescriptions']:\n", 343 | " # Get the project version name (the string after the third delimiter in the ARN)\n", 344 | " project_version_name = project_version['ProjectVersionArn'].split('/')[3]\n", 345 | "\n", 346 | " # Once we find it, we store the ARN and break out of the loop:\n", 347 | " if (project_version_name == VERSION_NAME):\n", 348 | " project_version_arn = project_version['ProjectVersionArn']\n", 349 | " break\n", 350 | " \n", 351 | "print(project_version_arn)\n", 352 | "status = reko.describe_project_versions(\n", 353 | " ProjectArn=project_arn,\n", 354 | " VersionNames=[project_version_arn.split('/')[3]]\n", 355 | ")['ProjectVersionDescriptions'][0]['Status']\n", 356 | "```\n", 357 | "\n", 358 | "### Evaluate the model\n", 359 | "\n", 360 | "First, we will deploy our model by using the ARN collected before: again, this will deploy an endpoint that will cost you around $4 per hour. Don’t forget to decommission it once you’re done!\n", 361 | "\n", 362 | "```python\n", 363 | "# Start the model\n", 364 | "print('Starting model: ' + model_arn)\n", 365 | "response = client.start_project_version(ProjectVersionArn=model_arn, MinInferenceUnits=min_inference_units)\n", 366 | "\n", 367 | "# Wait for the model to be in the running state:\n", 368 | "project_version_running_waiter = client.get_waiter('project_version_running')\n", 369 | "project_version_running_waiter.wait(ProjectArn=project_arn, VersionNames=[version_name])\n", 370 | "\n", 371 | "# Get the running status\n", 372 | "describe_response=client.describe_project_versions(ProjectArn=project_arn, VersionNames=[version_name])\n", 373 | "for model in describe_response['ProjectVersionDescriptions']:\n", 374 | " print(\"Status: \" + model['Status'])\n", 375 | " print(\"Message: \" + model['StatusMessage'])\n", 376 | "```\n", 377 | "\n", 378 | "Once the model is running you can start querying it for predictions: in the notebook, you will find a function *get_results()* that will query a given model with a list of pictures sitting in a given path. 
This will take a few minutes to run all the test samples and will cost less than $1 (for the ~3,000 test samples):\n", 379 | "\n", 380 | "```python\n", 381 | "predictions_ok = rt.get_results(project_version_arn, BUCKET, s3_path=f'{BUCKET}/{PREFIX}/test/normal', label='normal', verbose=True)\n", 382 | "predictions_ko = rt.get_results(project_version_arn, BUCKET, s3_path=f'{BUCKET}/{PREFIX}/test/abnormal', label='abnormal', verbose=True)\n", 383 | "\n", 384 | "def get_results(project_version_arn, bucket, s3_path, label=None, verbose=True):\n", 385 | " \"\"\"\n", 386 | " Sends a list of pictures located in an S3 path to\n", 387 | " the endpoint to get the associated predictions.\n", 388 | " \"\"\"\n", 389 | "\n", 390 | " fs = s3fs.S3FileSystem()\n", 391 | " data = {}\n", 392 | " predictions = pd.DataFrame(columns=['image', 'normal', 'abnormal'])\n", 393 | " \n", 394 | " for file in fs.ls(path=s3_path, detail=True, refresh=True):\n", 395 | " if file['Size'] > 0:\n", 396 | " image = '/'.join(file['Key'].split('/')[1:])\n", 397 | " if verbose == True:\n", 398 | " print('.', end='')\n", 399 | "\n", 400 | " labels = show_custom_labels(project_version_arn, bucket, image, 0.0)\n", 401 | " for L in labels:\n", 402 | " data[L['Name']] = L['Confidence']\n", 403 | " \n", 404 | " predictions = predictions.append(pd.Series({\n", 405 | " 'image': file['Key'].split('/')[-1],\n", 406 | " 'abnormal': data['abnormal'],\n", 407 | " 'normal': data['normal'],\n", 408 | " 'ground truth': label\n", 409 | " }), ignore_index=True)\n", 410 | " \n", 411 | " return predictions\n", 412 | " \n", 413 | "def show_custom_labels(model, bucket, image, min_confidence):\n", 414 | " # Call DetectCustomLabels from the Rekognition API: this will give us the list \n", 415 | " # of labels detected for this picture and their associated confidence level:\n", 416 | " reko = boto3.client('rekognition')\n", 417 | " try:\n", 418 | " response = reko.detect_custom_labels(\n", 419 | " Image={'S3Object': {'Bucket': bucket, 'Name': image}},\n", 420 | " MinConfidence=min_confidence,\n", 421 | " ProjectVersionArn=model\n", 422 | " )\n", 423 | " \n", 424 | " except Exception as e:\n", 425 | " print(f'Exception encountered when processing {image}')\n", 426 | " print(e)\n", 427 | " \n", 428 | " # Returns the list of custom labels for the image passed as an argument:\n", 429 | " return response['CustomLabels']\n", 430 | "```\n", 431 | "\n", 432 | "Let’s plot the confusion matrix associated to this test set:\n", 433 | "\n", 434 | "![confusion_matrix_rekognition](pictures/confusion_matrix_rekognition.png)\n", 435 | "\n", 436 | "The metrics associated to this matrix are the following:\n", 437 | "\n", 438 | "* Precision: 100.0%\n", 439 | "* Recall: 99.8%\n", 440 | "* Accuracy: 99.8%\n", 441 | "* F1 Score: 99.9%\n", 442 | "\n", 443 | "Without any effort (and no ML knowledge!), we get impressive results. 
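For reference, metrics like these can be recomputed from the prediction DataFrames returned by *get_results()* (a hedged sketch only, assuming the `predictions_ok` and `predictions_ko` DataFrames shown above, with their `normal`, `abnormal` and `ground truth` columns):

```python
import pandas as pd
from sklearn.metrics import confusion_matrix, classification_report

# Concatenate both result sets and keep the label with the highest
# confidence as the predicted class for each spectrogram:
results = pd.concat([predictions_ok, predictions_ko], ignore_index=True)
results['prediction'] = results[['normal', 'abnormal']].idxmax(axis=1)

print(confusion_matrix(results['ground truth'], results['prediction'], labels=['normal', 'abnormal']))
print(classification_report(results['ground truth'], results['prediction']))
```
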
With so few false positives and false negatives, we can leverage such a model in even the most challenging industrial context.\n", 444 | "\n", 445 | "### Cleanup\n", 446 | "\n", 447 | "We need to stop the running model as we will continue to incur costs while the endpoint is live:\n", 448 | "\n", 449 | "```python\n", 450 | "print('Stopping model: ' + model_arn)\n", 451 | "\n", 452 | "# Stop the model:\n", 453 | "try:\n", 454 | " reko = boto3.client('rekognition')\n", 455 | " response = reko.stop_project_version(ProjectVersionArn=model_arn)\n", 456 | " status = response['Status']\n", 457 | " print('Status: ' + status)\n", 458 | "\n", 459 | "except Exception as e: \n", 460 | " print(e) \n", 461 | "\n", 462 | "print('Done.')\n", 463 | "```" 464 | ] 465 | } 466 | ], 467 | "metadata": { 468 | "kernelspec": { 469 | "display_name": "conda_tensorflow2_p36", 470 | "language": "python", 471 | "name": "conda_tensorflow2_p36" 472 | }, 473 | "language_info": { 474 | "codemirror_mode": { 475 | "name": "ipython", 476 | "version": 3 477 | }, 478 | "file_extension": ".py", 479 | "mimetype": "text/x-python", 480 | "name": "python", 481 | "nbconvert_exporter": "python", 482 | "pygments_lexer": "ipython3", 483 | "version": "3.6.10" 484 | } 485 | }, 486 | "nbformat": 4, 487 | "nbformat_minor": 4 488 | } 489 | -------------------------------------------------------------------------------- /1_data_exploration.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Sound anomaly detection\n", 8 | "*Step 1 - Data exploration*\n", 9 | "\n", 10 | "## Introduction\n", 11 | "---" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "There are usually very few redundancies on a manufacturing production line of any shop floor. The ability to ensure the highest availability of these lines is key to deliver higher ROI, produce better quality products, increase safety levels, reduce environmental impact and waste and in the end... ensure greater customer satisfaction. More and more AI-based solutions are leveraging sensor data coming from the pieces of equipment on the production line. In this example, we are going to use sounds recorded in an industrial environment to perform anomaly detection on industrial equipment.\n", 19 | "\n", 20 | "To achieve this, we are going to explore and leverage the MIMII dataset for anomaly detection purposes: this is a sound dataset for **M**alfunctioning **I**ndustrial **M**achine **I**nvestigation and **I**nspection (MIMII). It can be downloaded from **[this link](https://zenodo.org/record/3384388#.X2werWgzaTk)** and contains sounds from several types of industrial machines (valves, pumps, fans and slide rails). In this example, we are going to focus on the **fans**.\n", 21 | "\n", 22 | "This is the first notebook in a series of three:\n", 23 | "* This notebook will help you get familiar with this kind of data. Sound data is a particular kind of time series data and exploring it requires specific approaches.\n", 24 | "* In the next notebook, we will build an autoencoder that will be used as a classifier able to discriminate between normal and abnormal sounds.\n", 25 | "* Lastly, we are going to take a more novel approach: we are going to transform the sound files into spectrogram images and feed them to an image classifier.
We will use Amazon Rekognition Custom Labels to perform this work.\n", 26 | "\n", 27 | "The two approaches (building a basic autoencoder from scratch and leveraging Rekognition) amounts approximately to the same effort: although the models are in no way comparable it will give you an idea of how much of a kick start you can get when using an applied AI service (if any)." 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "## Initialization\n", 35 | "---\n", 36 | "**WARNING**: make sure you run this notebook using an **ml.c5.2xlarge instance** with a **25 GB attached EBS volume** to process the MIMII dataset (the dataset for the industrial fans is a 10 GB archive, reaching 14 GB once unzipped)." 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "### Configuration\n", 44 | "Remove the **-q** command line parameters if you want to check for potential error messages." 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": null, 50 | "metadata": {}, 51 | "outputs": [], 52 | "source": [ 53 | "# Notebook update\n", 54 | "import sys\n", 55 | "\n", 56 | "!pip -q install --upgrade sagemaker\n", 57 | "if 'librosa' not in list(sys.modules):\n", 58 | " !conda install -q -y -c conda-forge librosa\n", 59 | "\n", 60 | "# Kernel restart\n", 61 | "from IPython.core.display import HTML\n", 62 | "HTML(\"\")" 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": null, 68 | "metadata": {}, 69 | "outputs": [], 70 | "source": [ 71 | "# Python libraries:\n", 72 | "import hashlib\n", 73 | "import matplotlib.pyplot as plt\n", 74 | "import numpy as np\n", 75 | "import os\n", 76 | "import random\n", 77 | "import sys\n", 78 | "\n", 79 | "# Sound management:\n", 80 | "import librosa\n", 81 | "import librosa.display\n", 82 | "import IPython.display as ipd\n", 83 | "\n", 84 | "sys.path.append('tools')\n", 85 | "import utils\n", 86 | "import sound_tools" 87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "execution_count": null, 92 | "metadata": {}, 93 | "outputs": [], 94 | "source": [ 95 | "# Initialization:\n", 96 | "random.seed(42)\n", 97 | "np.random.seed(42)\n", 98 | "plt.style.use('Solarize_Light2')\n", 99 | "prop_cycle = plt.rcParams['axes.prop_cycle']\n", 100 | "colors = prop_cycle.by_key()['color']\n", 101 | "blue, red = colors[1], colors[5]\n", 102 | "\n", 103 | "# Paths definition:\n", 104 | "DATA = os.path.join('data', 'interim')\n", 105 | "RAW_DATA = os.path.join('data', 'raw')\n", 106 | "PROCESSED_DATA = os.path.join('data', 'processed')" 107 | ] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "metadata": {}, 112 | "source": [ 113 | "### Loading helper functions" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "### Downloading and unzipping data" 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": null, 126 | "metadata": {}, 127 | "outputs": [], 128 | "source": [ 129 | "if not os.path.exists(DATA):\n", 130 | " print('Data directory does not exist, creating them.')\n", 131 | " os.makedirs(DATA, exist_ok=True)\n", 132 | " os.makedirs(RAW_DATA, exist_ok=True)\n", 133 | " os.makedirs(PROCESSED_DATA, exist_ok=True)" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": null, 139 | "metadata": {}, 140 | "outputs": [], 141 | "source": [ 142 | "# Checks if the dataset is already downloded and unzipped:\n", 143 | "first_file = os.path.join(DATA, 'fan', 'id_00', 'normal', 
'00000000.wav')\n", 144 | "if os.path.exists(first_file):\n", 145 | " print('=== Sound files found, no need to download them again. ===')\n", 146 | " \n", 147 | "else:\n", 148 | " print('=== Downloading and unzipping the FAN file from the MIMII dataset website (~10 GB) ===')\n", 149 | " !wget https://zenodo.org/record/3384388/files/6_dB_fan.zip?download=1 --output-document=/tmp/fan.zip\n", 150 | " \n", 151 | " # Checking file integrity: computing MD5 hash\n", 152 | " original_md5 = '0890f7d3c2fd8448634e69ff1d66dd47'\n", 153 | " downloaded_md5 = utils.md5('/tmp/fan.zip')\n", 154 | " \n", 155 | " # Correct MD5, unzipping archive:\n", 156 | " if original_md5 == downloaded_md5:\n", 157 | " !unzip -q /tmp/fan.zip -d $DATA\n", 158 | " \n", 159 | " # Raising exception for an incorrect MD5:\n", 160 | " else:\n", 161 | " raise Exception('Downloaded file was corrupted, retry the download.')" 162 | ] 163 | }, 164 | { 165 | "cell_type": "markdown", 166 | "metadata": {}, 167 | "source": [ 168 | "### Feature engineering parameters\n", 169 | "These parameters are used to extract features from sound files:" 170 | ] 171 | }, 172 | { 173 | "cell_type": "code", 174 | "execution_count": null, 175 | "metadata": {}, 176 | "outputs": [], 177 | "source": [ 178 | "n_mels = 64\n", 179 | "frames = 5\n", 180 | "n_fft = 2048\n", 181 | "hop_length = 512\n", 182 | "power = 2.0" 183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": {}, 188 | "source": [ 189 | "## Exploratory data analysis\n", 190 | "---\n", 191 | "Let's load a normal and an abnormal signal to plot them. Each recording contains **8 channels, one for each microphone** that was used to record the machine sound. For the remaining of this experiment, **we will only focus on the recordings of the first microphone**.\n", 192 | "### Wave forms" 193 | ] 194 | }, 195 | { 196 | "cell_type": "code", 197 | "execution_count": null, 198 | "metadata": {}, 199 | "outputs": [], 200 | "source": [ 201 | "normal_signal_file = os.path.join(DATA, 'fan', 'id_00', 'normal', '00000100.wav')\n", 202 | "abnormal_signal_file = os.path.join(DATA, 'fan', 'id_00', 'abnormal', '00000100.wav')\n", 203 | "normal_signal, sr = sound_tools.load_sound_file(normal_signal_file)\n", 204 | "abnormal_signal, sr = sound_tools.load_sound_file(abnormal_signal_file)\n", 205 | "print(f'The signals have a {normal_signal.shape} shape. 
At {sr} Hz, these are {normal_signal.shape[0]/sr:.0f}s signals')" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | "Let's first visualize the waveplots for these signals:" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": null, 218 | "metadata": {}, 219 | "outputs": [], 220 | "source": [ 221 | "fig = plt.figure(figsize=(24, 6))\n", 222 | "plt.subplot(1,3,1)\n", 223 | "librosa.display.waveplot(normal_signal, sr=sr, alpha=0.5, color=blue, linewidth=0.5, label='Machine #id_00 - Normal signal')\n", 224 | "plt.title('Normal signal')\n", 225 | "\n", 226 | "plt.subplot(1,3,2)\n", 227 | "librosa.display.waveplot(abnormal_signal, sr=sr, alpha=0.6, color=red, linewidth=0.5, label='Machine #id_00 - Abnormal signal')\n", 228 | "plt.title('Abnormal signal')\n", 229 | "\n", 230 | "plt.subplot(1,3,3)\n", 231 | "librosa.display.waveplot(abnormal_signal, sr=sr, alpha=0.6, color=red, linewidth=0.5, label='Abnormal signal')\n", 232 | "librosa.display.waveplot(normal_signal, sr=sr, alpha=0.5, color=blue, linewidth=0.5, label='Normal signal')\n", 233 | "plt.title('Both signals')\n", 234 | "\n", 235 | "fig.suptitle('Machine #id_00 - 2D representation of the wave forms', fontsize=16)\n", 236 | "plt.legend();" 237 | ] 238 | }, 239 | { 240 | "cell_type": "markdown", 241 | "metadata": {}, 242 | "source": [ 243 | "Apart from the larger amplitude of the abnormal signal and some pattern that are more irregular, it's difficult to distinguish between these two signals. Let's listen to them:" 244 | ] 245 | }, 246 | { 247 | "cell_type": "code", 248 | "execution_count": null, 249 | "metadata": {}, 250 | "outputs": [], 251 | "source": [ 252 | "ipd.Audio(os.path.join(DATA, 'fan', 'id_00', 'normal', '00000003.wav'), rate=sr)" 253 | ] 254 | }, 255 | { 256 | "cell_type": "code", 257 | "execution_count": null, 258 | "metadata": {}, 259 | "outputs": [], 260 | "source": [ 261 | "ipd.Audio(os.path.join(DATA, 'fan', 'id_00', 'abnormal', '00000003.wav'), rate=sr)" 262 | ] 263 | }, 264 | { 265 | "cell_type": "markdown", 266 | "metadata": {}, 267 | "source": [ 268 | "We can hear a small difference. **Let's now have a look in the frequency domain** and see if we can make that difference more obvious..." 269 | ] 270 | }, 271 | { 272 | "cell_type": "markdown", 273 | "metadata": {}, 274 | "source": [ 275 | "### Short Fourier transform\n", 276 | "Let's take the Fourier transform of a first time window. Such signals are highly non-stationary (i.e., their statistics change over time). As a consequence, it will be rather meaningless to compute a single Fourier transform over an entire signal. The short-time Fourier transform is obtained by computing the Fourier transform for successive frames in a signal. We can compute it thanks to the **`librosa.stft()`** function that returns a complex-valued matrix D where:\n", 277 | "* `np.abs(D[f, t])` is the magnitude of frequency bin **f** at frame **t** and\n", 278 | "* `np.angle(D[f, t])` is the corresponding phase for the same frequency bin **f** at frame **t**.\n", 279 | "\n", 280 | "The parameter `n_fft` of this function is the length of the window signal (frame size) while the `hop_length` is the frame increment. Our signals are 10s long: with `n_fft = 2048` and at a sampling rate of 16 kHz, this corresponds to a physical duration of `2048/16000 = 128 ms`. 
Let's display the FFT of the first 128ms window (by limiting the signal span and by setting a hop length greater than `n_fft`):" 281 | ] 282 | }, 283 | { 284 | "cell_type": "code", 285 | "execution_count": null, 286 | "metadata": {}, 287 | "outputs": [], 288 | "source": [ 289 | "D_normal = np.abs(librosa.stft(normal_signal[:n_fft], n_fft=n_fft, hop_length=n_fft + 1))\n", 290 | "D_abnormal = np.abs(librosa.stft(abnormal_signal[:n_fft], n_fft=n_fft, hop_length=n_fft + 1))\n", 291 | "\n", 292 | "fig = plt.figure(figsize=(12, 6))\n", 293 | "plt.plot(D_normal, color=blue, alpha=0.6, label='Machine #id_00 - Normal signal');\n", 294 | "plt.plot(D_abnormal, color=red, alpha=0.6, label='Machine #id_00 - Abnormal signal');\n", 295 | "plt.title('Fourier transform for the first 128ms window')\n", 296 | "plt.xlabel('Frequency (Hz)')\n", 297 | "plt.ylabel('Amplitude')\n", 298 | "plt.legend()\n", 299 | "plt.xlim(0, 200);" 300 | ] 301 | }, 302 | { 303 | "cell_type": "markdown", 304 | "metadata": {}, 305 | "source": [ 306 | "### Spectrograms\n", 307 | "Let's now take the entire signals, separate them into successive time windows of `n_fft` width (spaced `hop_length` samples apart), apply a short Fourier transform on each of these windows and plot them on a spectrogram to illustrate these three dimensions:\n", 308 | "* Frequency (Hz) is now on the vertical axis\n", 309 | "* Amplitude is shifted from the vertical axis of the previous diagram to the color axis\n", 310 | "* The horizontal axis represents time\n", 311 | "\n", 312 | "The following diagram is a plot of the short Fourier transforms for 20 bins (20 x 128ms = 2560ms which is the span of the horizontal axis) for the first 500 Hz (frequency span on the vertical axis):" 313 | ] 314 | }, 315 | { 316 | "cell_type": "code", 317 | "execution_count": null, 318 | "metadata": {}, 319 | "outputs": [], 320 | "source": [ 321 | "D_normal = np.abs(librosa.stft(normal_signal[:20*n_fft], n_fft=n_fft, hop_length=hop_length))\n", 322 | "dB_normal = sound_tools.get_magnitude_scale(normal_signal_file)\n", 323 | "\n", 324 | "fig = plt.figure(figsize=(12, 6))\n", 325 | "librosa.display.specshow(D_normal, sr=sr, x_axis='time', y_axis='linear', cmap='viridis');\n", 326 | "plt.title('Machine #id_00 - Normal signal\\nShort Fourier Transform representation of the First 2560ms')\n", 327 | "plt.ylim(0, 500)\n", 328 | "plt.xlabel('Time (s)')\n", 329 | "plt.ylabel('Frequency (Hz)')\n", 330 | "plt.colorbar()\n", 331 | "plt.show()" 332 | ] 333 | }, 334 | { 335 | "cell_type": "markdown", 336 | "metadata": {}, 337 | "source": [ 338 | "Let's now plot the same spectrogram for the whole signal (10s), a higher frequency range and for both a normal and an abnormal signal. Each spectrogram will have a dimension of `int(160,000 / hop_length) + 1 = 313` bins on the horizontal axis and `n_fft / 2 = 1,024` bins on the vertical axis. Hence, the dimension of the spectrogram for a given sound signal will be `1,024 x 313`.
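To make the dimension arithmetic above concrete, here is a small worked sketch of the frame count (under the same assumptions as this notebook: 10-second recordings sampled at 16 kHz and `hop_length = 512`):

```python
# 10 s of audio at 16 kHz:
n_samples = 10 * 16000               # 160,000 samples per recording
hop_length = 512

# librosa's centered STFT produces one frame per hop, plus one:
n_frames = n_samples // hop_length + 1
print(n_frames)                      # 313 time bins on the horizontal axis
```
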
339 | ] 340 | }, 341 | { 342 | "cell_type": "code", 343 | "execution_count": null, 344 | "metadata": {}, 345 | "outputs": [], 346 | "source": [ 347 | "D_normal = np.abs(librosa.stft(normal_signal, n_fft=n_fft, hop_length=hop_length))\n", 348 | "D_abnormal = np.abs(librosa.stft(abnormal_signal, n_fft=n_fft, hop_length=hop_length))\n", 349 | "\n", 350 | "fig = plt.figure(figsize=(24, 6))\n", 351 | "plt.subplot(1, 2, 1)\n", 352 | "librosa.display.specshow(D_normal, sr=sr, x_axis='time', y_axis='linear', cmap='viridis');\n", 353 | "plt.title('Machine #id_00 - Normal signal')\n", 354 | "plt.xlabel('Time (s)')\n", 355 | "plt.ylabel('Frequency (Hz)')\n", 356 | "plt.colorbar();\n", 357 | "\n", 358 | "plt.subplot(1, 2, 2)\n", 359 | "librosa.display.specshow(D_abnormal, sr=sr, x_axis='time', y_axis='linear', cmap='viridis');\n", 360 | "plt.title('Machine #id_00 - Abnormal signal')\n", 361 | "plt.xlabel('Time (s)')\n", 362 | "plt.ylabel('Frequency (Hz)')\n", 363 | "plt.colorbar();" 364 | ] 365 | }, 366 | { 367 | "cell_type": "markdown", 368 | "metadata": {}, 369 | "source": [ 370 | "Not much we can see here, mainly because most sounds we can hear or experience as humans, are concentrated in a very small range (both in frequency and amplitude range). Let's take a log scale for both the frequency and the amplitude: for the amplitude, we obtain this by transforming the \"color\" axis to a log scale by converting it to Decibels (which is equivalent to applying a log scale to the sound amplitudes):" 371 | ] 372 | }, 373 | { 374 | "cell_type": "code", 375 | "execution_count": null, 376 | "metadata": {}, 377 | "outputs": [], 378 | "source": [ 379 | "dB_normal = sound_tools.get_magnitude_scale(normal_signal_file, n_fft=n_fft, hop_length=hop_length)\n", 380 | "dB_abnormal = sound_tools.get_magnitude_scale(abnormal_signal_file, n_fft=n_fft, hop_length=hop_length)\n", 381 | "\n", 382 | "fig = plt.figure(figsize=(24, 6))\n", 383 | "\n", 384 | "plt.subplot(1, 2, 1)\n", 385 | "librosa.display.specshow(dB_normal, sr=sr, x_axis='time', y_axis='mel', cmap='viridis')\n", 386 | "plt.title('Machine #id_00 - Normal signal')\n", 387 | "plt.colorbar(format=\"%+2.f dB\")\n", 388 | "plt.xlabel('Time (s)')\n", 389 | "plt.ylabel('Frequency (Hz)')\n", 390 | "\n", 391 | "plt.subplot(1, 2, 2)\n", 392 | "librosa.display.specshow(dB_abnormal, sr=sr, x_axis='time', y_axis='mel', cmap='viridis')\n", 393 | "plt.title('Machine #id_00 - Abnormal signal')\n", 394 | "plt.ylabel('Frequency (Hz)')\n", 395 | "plt.colorbar(format=\"%+2.f dB\")\n", 396 | "plt.xlabel('Time (s)')\n", 397 | "plt.ylabel('Frequency (Hz)')\n", 398 | "\n", 399 | "plt.show()" 400 | ] 401 | }, 402 | { 403 | "cell_type": "markdown", 404 | "metadata": {}, 405 | "source": [ 406 | "### Applying Mel scale\n", 407 | "The way the Mel scale is constructed is to allow sounds that are at equal distance from each other on this scale, to also *sound* as if they were at equal distance if a human hear them. As a human, our ear is easily able to distinguish two sounds at frequency 250 Hz and 500 Hz. However, we will barely notice any difference between two other sounds emitted at frequency 9750 Hz and 10000 Hz (which are still 250 Hz apart)... This non-linear transformation is also available in the `librosa` library. The following function partitions the frequency scales in bins and transform each of them into the corresponding bin in the Mel scale: this produces a linear transformation matrix to project FFT bins onto Mel-frequency bins. 
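To make the idea of a "linear transformation matrix" concrete, here is a minimal sketch (using the `sr`, `n_fft` and `n_mels` values defined earlier in this notebook) that builds the Mel filter bank directly:

```python
import librosa

# The Mel filter bank maps the 1 + n_fft/2 STFT frequency bins down to n_mels bins:
mel_filter_bank = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
print(mel_filter_bank.shape)   # (n_mels, 1 + n_fft // 2), i.e. (64, 1025) here
```

Multiplying this matrix with a power spectrogram is essentially what `librosa.feature.melspectrogram()` does under the hood.
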
Then it uses this transformation matrix to plot a new spectrogram:" 408 | ] 409 | }, 410 | { 411 | "cell_type": "code", 412 | "execution_count": null, 413 | "metadata": {}, 414 | "outputs": [], 415 | "source": [ 416 | "normal_mel = librosa.feature.melspectrogram(normal_signal, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels)\n", 417 | "normal_S_DB = librosa.power_to_db(normal_mel, ref=np.max)\n", 418 | "abnormal_mel = librosa.feature.melspectrogram(abnormal_signal, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels)\n", 419 | "abnormal_S_DB = librosa.power_to_db(abnormal_mel, ref=np.max)\n", 420 | "\n", 421 | "fig = plt.figure(figsize=(24, 6))\n", 422 | "plt.subplot(1, 2, 1)\n", 423 | "librosa.display.specshow(normal_S_DB, sr=sr, hop_length=hop_length, x_axis='time', y_axis='mel', cmap='viridis');\n", 424 | "plt.title('Machine #id_00 - Normal signal')\n", 425 | "plt.xlabel('Time (s)')\n", 426 | "plt.ylabel('Frequency (Hz)')\n", 427 | "plt.colorbar(format='%+2.0f dB');\n", 428 | "\n", 429 | "plt.subplot(1, 2, 2)\n", 430 | "librosa.display.specshow(abnormal_S_DB, sr=sr, hop_length=hop_length, x_axis='time', y_axis='mel', cmap='viridis');\n", 431 | "plt.title('Machine #id_00 - Abnormal signal')\n", 432 | "plt.xlabel('Time (s)')\n", 433 | "plt.ylabel('Frequency (Hz)')\n", 434 | "plt.colorbar(format='%+2.0f dB');" 435 | ] 436 | }, 437 | { 438 | "cell_type": "code", 439 | "execution_count": null, 440 | "metadata": {}, 441 | "outputs": [], 442 | "source": [ 443 | "frames = 5\n", 444 | "stride = 1\n", 445 | "dims = frames * n_mels\n", 446 | "\n", 447 | "features_vector_size = normal_S_DB.shape[1] - frames + 1\n", 448 | "features = np.zeros((features_vector_size, dims), np.float32)\n", 449 | "for t in range(frames):\n", 450 | " features[:, n_mels * t: n_mels * (t + 1)] = normal_S_DB[:, t:t + features_vector_size].T\n", 451 | "fig = plt.figure(figsize=(24, 3))\n", 452 | "for t in range(frames):\n", 453 | " plt.subplot(1, frames, t + 1)\n", 454 | " librosa.display.specshow(features[:, n_mels * t: n_mels * (t + 1)].T, sr=sr, hop_length=hop_length, cmap='viridis');\n", 455 | " \n", 456 | "features_vector_size = abnormal_S_DB.shape[1] - frames + 1\n", 457 | "features = np.zeros((features_vector_size, dims), np.float32)\n", 458 | "for t in range(frames):\n", 459 | " features[:, n_mels * t: n_mels * (t + 1)] = abnormal_S_DB[:, t:t + features_vector_size].T\n", 460 | "fig = plt.figure(figsize=(24, 3))\n", 461 | "for t in range(frames):\n", 462 | " plt.subplot(1, frames, t + 1)\n", 463 | " librosa.display.specshow(features[:, n_mels * t: n_mels * (t + 1)].T, sr=sr, hop_length=hop_length, cmap='viridis');\n", 464 | " #plt.colorbar(format='%+2.0f dB');" 465 | ] 466 | }, 467 | { 468 | "cell_type": "markdown", 469 | "metadata": {}, 470 | "source": [ 471 | "The binning process in the frequency domain applied by the Mel transformation yields more pixelated diagrams, which is consistent with the lower sound resolution a human ear has when compared to a microphone. It also helps dilate the features we will try to extract with a deep learning model afterward..." 472 | ] 473 | }, 474 | { 475 | "cell_type": "markdown", 476 | "metadata": {}, 477 | "source": [ 478 | "### Conclusion\n", 479 | "The Mel spectrogram looks like a good candidate to extract interesting features that we could feed to a neural network. We will now build two types of feature extractors based on this analysis and feed them to different types of architectures:\n", 480 | "1. 
Extracting features into a tabular dataset that we will feed to an autoencoder neural network: as computed above each raw spectrogram has a shape of `1024 x 313` where the Mel spectrogram will have a shape of `64 x 313`.\n", 481 | "2. Using the spectrograms as an input to feed a computer vision-based architecture" 482 | ] 483 | } 484 | ], 485 | "metadata": { 486 | "kernelspec": { 487 | "display_name": "conda_tensorflow2_p36", 488 | "language": "python", 489 | "name": "conda_tensorflow2_p36" 490 | }, 491 | "language_info": { 492 | "codemirror_mode": { 493 | "name": "ipython", 494 | "version": 3 495 | }, 496 | "file_extension": ".py", 497 | "mimetype": "text/x-python", 498 | "name": "python", 499 | "nbconvert_exporter": "python", 500 | "pygments_lexer": "ipython3", 501 | "version": "3.6.10" 502 | } 503 | }, 504 | "nbformat": 4, 505 | "nbformat_minor": 4 506 | } 507 | -------------------------------------------------------------------------------- /2_custom_autoencoder.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Sound anomaly detection\n", 8 | "*Step 2 - Performing anomaly detection with an autoencoder architecture*" 9 | ] 10 | }, 11 | { 12 | "cell_type": "markdown", 13 | "metadata": {}, 14 | "source": [ 15 | "## Introduction\n", 16 | "---\n", 17 | "Here is [an article](https://towardsdatascience.com/autoencoder-neural-network-for-anomaly-detection-with-unlabeled-dataset-af9051a048) you can go through to dive deeper into what an autoencoder is and how it can be used for anomaly detections." 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": {}, 23 | "source": [ 24 | "## Initialization\n", 25 | "---" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "### Configuration" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": null, 38 | "metadata": {}, 39 | "outputs": [], 40 | "source": [ 41 | "# Python libraries:\n", 42 | "import os\n", 43 | "import sys\n", 44 | "import random\n", 45 | "import time\n", 46 | "import pickle\n", 47 | "import pandas as pd\n", 48 | "import numpy as np\n", 49 | "import matplotlib.pyplot as plt\n", 50 | "import seaborn as sns\n", 51 | "from sklearn.model_selection import train_test_split\n", 52 | "\n", 53 | "# Other imports:\n", 54 | "from sklearn import metrics\n", 55 | "from tqdm import tqdm\n", 56 | "\n", 57 | "# AWS and SageMaker libraries:\n", 58 | "import sagemaker\n", 59 | "import boto3\n", 60 | "from sagemaker.tensorflow import TensorFlow\n", 61 | "\n", 62 | "sys.path.append('tools')\n", 63 | "import sound_tools\n", 64 | "import utils" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": null, 70 | "metadata": {}, 71 | "outputs": [], 72 | "source": [ 73 | "# Initializations:\n", 74 | "%matplotlib inline\n", 75 | "plt.style.use('Solarize_Light2')\n", 76 | "prop_cycle = plt.rcParams['axes.prop_cycle']\n", 77 | "colors = prop_cycle.by_key()['color']\n", 78 | "blue = colors[1]\n", 79 | "red = colors[5]\n", 80 | "\n", 81 | "random.seed(42)\n", 82 | "np.random.seed(42)\n", 83 | "sess = sagemaker.Session()\n", 84 | "role = sagemaker.get_execution_role()\n", 85 | "\n", 86 | "# Paths definition:\n", 87 | "DATA = os.path.join('data', 'interim')\n", 88 | "RAW_DATA = os.path.join('data', 'raw')\n", 89 | "PROCESSED_DATA = os.path.join('data', 'processed')" 90 | ] 91 | }, 92 | { 93 | "cell_type": "markdown", 94 | "metadata": {}, 95 | 
"source": [ 96 | "### Feature engineering parameters\n", 97 | "These parameters are used to extract features from sound files:" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": null, 103 | "metadata": {}, 104 | "outputs": [], 105 | "source": [ 106 | "n_mels = 64\n", 107 | "frames = 5\n", 108 | "n_fft = 1024\n", 109 | "hop_length = 512\n", 110 | "power = 2.0" 111 | ] 112 | }, 113 | { 114 | "cell_type": "markdown", 115 | "metadata": {}, 116 | "source": [ 117 | "## **Step 1:** Building the datasets\n", 118 | "---\n", 119 | "### Generate list of sound files and splitting them\n", 120 | "Generate list of files found in the raw data folder and generate a training dataset from the folders marked for training. We will be training an autoencoder below:\n", 121 | "\n", 122 | "* Testing dataset: **1110 signals** including:\n", 123 | " * 295 abnormal signals\n", 124 | " * 815 normal signals\n", 125 | "* Training dataset: **3260 signals** only including normal signals" 126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "execution_count": null, 131 | "metadata": {}, 132 | "outputs": [], 133 | "source": [ 134 | "# Build the list of normal and abnormal files:\n", 135 | "normal_files, abnormal_files = utils.build_files_list(root_dir=os.path.join(DATA, 'fan'))\n", 136 | "\n", 137 | "# Concatenate them to obtain a features and label datasets that we can split:\n", 138 | "X = np.concatenate((normal_files, abnormal_files), axis=0)\n", 139 | "y = np.concatenate((np.zeros(len(normal_files)), np.ones(len(abnormal_files))), axis=0)\n", 140 | "\n", 141 | "train_files, test_files, train_labels, test_labels = train_test_split(X, y,\n", 142 | " train_size=0.8,\n", 143 | " random_state=42,\n", 144 | " shuffle=True,\n", 145 | " stratify=y\n", 146 | " )\n", 147 | "# We will want to reuse this same train/test split for our next experiment in the next notebook:\n", 148 | "dataset = dict({\n", 149 | " 'train_files': train_files,\n", 150 | " 'test_files': test_files,\n", 151 | " 'train_labels': train_labels,\n", 152 | " 'test_labels': test_labels\n", 153 | "})\n", 154 | "\n", 155 | "for key, values in dataset.items():\n", 156 | " fname = os.path.join(PROCESSED_DATA, key + '.txt')\n", 157 | " with open(fname, 'w') as f:\n", 158 | " for item in values:\n", 159 | " f.write(str(item))\n", 160 | " f.write('\\n')\n", 161 | "\n", 162 | "# We now keep only the normal signals from the train files to train the autoencoder:\n", 163 | "train_files = [f for f in train_files if f not in abnormal_files]\n", 164 | "train_labels = np.zeros(len(train_files))" 165 | ] 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "metadata": {}, 170 | "source": [ 171 | "### Extracting spectrograms as tabular features\n", 172 | "Based on the previous data exploration notebook, the training data are generated by computing a spectrogram from each signal and extracting features from these. To build the feature vector (performd in the `generate_dataset()` function), we divide the Mel spectrogram of each signal into several `(=frames)` sliding windows. We then concatenate these windows to assemble a single feature matrix associated to each signal. 
This tabular-shaped feature will then be fed to an autoencoder down the road:" 173 | ] 174 | }, 175 | { 176 | "cell_type": "code", 177 | "execution_count": null, 178 | "metadata": {}, 179 | "outputs": [], 180 | "source": [ 181 | "train_data_location = os.path.join(DATA, 'train_data.pkl')\n", 182 | "\n", 183 | "if os.path.exists(train_data_location):\n", 184 | " print('Train data already exists, loading from file...')\n", 185 | " with open(train_data_location, 'rb') as f:\n", 186 | " train_data = pickle.load(f)\n", 187 | " \n", 188 | "else:\n", 189 | " train_data = sound_tools.generate_dataset(train_files, n_mels=n_mels, frames=frames, n_fft=n_fft, hop_length=hop_length)\n", 190 | " print('Saving training data to disk...')\n", 191 | " with open(os.path.join(DATA, 'train_data.pkl'), 'wb') as f:\n", 192 | " pickle.dump(train_data, f)\n", 193 | " print('Done.')" 194 | ] 195 | }, 196 | { 197 | "cell_type": "markdown", 198 | "metadata": {}, 199 | "source": [ 200 | "#### S3 buckets preparation\n", 201 | "We upload the train dataset on S3. We use the default bucket of this SageMaker instance:" 202 | ] 203 | }, 204 | { 205 | "cell_type": "code", 206 | "execution_count": null, 207 | "metadata": {}, 208 | "outputs": [], 209 | "source": [ 210 | "training_input_path = sess.upload_data(os.path.join(DATA, 'train_data.pkl'), key_prefix='training')\n", 211 | "print(training_input_path)" 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "metadata": {}, 217 | "source": [ 218 | "## **Step 2:** Creating a TensorFlow model\n", 219 | "---\n", 220 | "### Define an Estimator\n", 221 | "We are using the TensorFlow container with script mode. The following script will be used as the entry point of the training container:" 222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": null, 227 | "metadata": {}, 228 | "outputs": [], 229 | "source": [ 230 | "!pygmentize autoencoder/model.py" 231 | ] 232 | }, 233 | { 234 | "cell_type": "code", 235 | "execution_count": null, 236 | "metadata": {}, 237 | "outputs": [], 238 | "source": [ 239 | "tf_estimator = TensorFlow(\n", 240 | " base_job_name='sound-anomaly',\n", 241 | " entry_point='model.py',\n", 242 | " source_dir='./autoencoder/',\n", 243 | " role=role,\n", 244 | " instance_count=1, \n", 245 | " instance_type='ml.p3.2xlarge',\n", 246 | " framework_version='2.2',\n", 247 | " py_version='py37',\n", 248 | " hyperparameters={\n", 249 | " 'epochs': 30,\n", 250 | " 'batch-size': 512,\n", 251 | " 'learning-rate': 1e-3,\n", 252 | " 'n_mels': n_mels,\n", 253 | " 'frame': frames\n", 254 | " },\n", 255 | " debugger_hook_config=False\n", 256 | ")" 257 | ] 258 | }, 259 | { 260 | "cell_type": "markdown", 261 | "metadata": {}, 262 | "source": [ 263 | "### Model training\n", 264 | "At this point, you will incur some training costs: the training is quite fast (a few minutes) and you can use spot training to reduce the bill by 70%." 265 | ] 266 | }, 267 | { 268 | "cell_type": "code", 269 | "execution_count": null, 270 | "metadata": {}, 271 | "outputs": [], 272 | "source": [ 273 | "tf_estimator.fit({'training': training_input_path})" 274 | ] 275 | }, 276 | { 277 | "cell_type": "markdown", 278 | "metadata": {}, 279 | "source": [ 280 | "### Model deployment" 281 | ] 282 | }, 283 | { 284 | "cell_type": "markdown", 285 | "metadata": {}, 286 | "source": [ 287 | "We can now deploy our trained model behind a SageMaker endpoint: this endpoint will continue costing you on an hourly basis, don't forget to delete it once your done with this notebook!" 
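As a side note, once the endpoint below has been created, you do not need to redeploy it if your notebook kernel restarts: you can re-attach a predictor to the running endpoint by name. A minimal sketch (assuming the endpoint is still live and its name, `tf_endpoint_name`, was kept around):

```python
# Sketch only: re-attach to an already-running endpoint instead of calling deploy() again.
from sagemaker.tensorflow import TensorFlowPredictor

tf_predictor = TensorFlowPredictor(
    endpoint_name=tf_endpoint_name,   # name of the endpoint created in the next cell
    sagemaker_session=sess            # reuse the SageMaker session from the initialization cell
)
```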
288 | ] 289 | }, 290 | { 291 | "cell_type": "code", 292 | "execution_count": null, 293 | "metadata": {}, 294 | "outputs": [], 295 | "source": [ 296 | "tf_endpoint_name = 'sound-anomaly-'+time.strftime(\"%Y-%m-%d-%H-%M-%S\", time.gmtime())\n", 297 | "tf_predictor = tf_estimator.deploy(\n", 298 | " initial_instance_count=1,\n", 299 | " instance_type='ml.c5.large',\n", 300 | " endpoint_name=tf_endpoint_name\n", 301 | ")\n", 302 | "print(f'\\nEndpoint name: {tf_predictor.endpoint_name}')" 303 | ] 304 | }, 305 | { 306 | "cell_type": "markdown", 307 | "metadata": {}, 308 | "source": [ 309 | "## **Step 3:** Model evaluation\n", 310 | "---" 311 | ] 312 | }, 313 | { 314 | "cell_type": "markdown", 315 | "metadata": {}, 316 | "source": [ 317 | "### Apply model on a test dataset\n", 318 | "Now we will loop through the test dataset and send each test file to the endpoint. As our model is an autoencoder, we will evaluate how good the model is at reconstructing the input. The higher the reconstruction error, the more likely it is that we have identified an anomaly:" 319 | ] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": null, 324 | "metadata": {}, 325 | "outputs": [], 326 | "source": [ 327 | "y_true = test_labels\n", 328 | "reconstruction_errors = []\n", 329 | "\n", 330 | "for index, eval_filename in tqdm(enumerate(test_files), total=len(test_files)):\n", 331 | " # Load signal\n", 332 | " signal, sr = sound_tools.load_sound_file(eval_filename)\n", 333 | "\n", 334 | " # Extract features from this signal:\n", 335 | " eval_features = sound_tools.extract_signal_features(\n", 336 | " signal, \n", 337 | " sr, \n", 338 | " n_mels=n_mels, \n", 339 | " frames=frames, \n", 340 | " n_fft=n_fft, \n", 341 | " hop_length=hop_length\n", 342 | " )\n", 343 | " \n", 344 | " # Get predictions from our autoencoder:\n", 345 | " prediction = tf_predictor.predict(eval_features)['predictions']\n", 346 | " \n", 347 | " # Estimate the reconstruction error:\n", 348 | " mse = np.mean(np.mean(np.square(eval_features - prediction), axis=1))\n", 349 | " reconstruction_errors.append(mse)" 350 | ] 351 | }, 352 | { 353 | "cell_type": "markdown", 354 | "metadata": {}, 355 | "source": [ 356 | "### Reconstruction error analysis" 357 | ] 358 | }, 359 | { 360 | "cell_type": "markdown", 361 | "metadata": {}, 362 | "source": [ 363 | "In the plot below, we can see that the distribution of reconstruction errors for normal and abnormal signals differs significantly. However, the overlap between the two histograms means we will have to make a trade-off between precision and recall."
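As a complement to the threshold sweep performed further down in this notebook, a quick statistical baseline can be derived from the reconstruction errors of the normal signals alone (ideally computed on a held-out set of normal signals rather than on the test set). A minimal sketch, reusing the arrays computed above:

```python
# Sketch of two common threshold heuristics based on normal signals only:
normal_errors = np.array(reconstruction_errors)[np.array(y_true) == 0]

threshold_gaussian = normal_errors.mean() + 3 * normal_errors.std()   # mean + 3 sigma
threshold_quantile = np.percentile(normal_errors, 99)                 # 99th percentile

print(f'Mean + 3 sigma threshold: {threshold_gaussian:.2f}')
print(f'99th percentile threshold: {threshold_quantile:.2f}')
```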
364 | ] 365 | }, 366 | { 367 | "cell_type": "code", 368 | "execution_count": null, 369 | "metadata": {}, 370 | "outputs": [], 371 | "source": [ 372 | "data = np.column_stack((range(len(reconstruction_errors)), reconstruction_errors))\n", 373 | "bin_width = 0.25\n", 374 | "bins = np.arange(min(reconstruction_errors), max(reconstruction_errors) + bin_width, bin_width)\n", 375 | "\n", 376 | "fig = plt.figure(figsize=(12,4))\n", 377 | "plt.hist(data[y_true==0][:,1], bins=bins, color=blue, alpha=0.6, label='Normal signals', edgecolor='#FFFFFF')\n", 378 | "plt.hist(data[y_true==1][:,1], bins=bins, color=red, alpha=0.6, label='Abnormal signals', edgecolor='#FFFFFF')\n", 379 | "plt.xlabel(\"Testing reconstruction error\")\n", 380 | "plt.ylabel(\"# Samples\")\n", 381 | "plt.title('Reconstruction error distribution on the testing set', fontsize=16)\n", 382 | "plt.legend()\n", 383 | "plt.show()" 384 | ] 385 | }, 386 | { 387 | "cell_type": "markdown", 388 | "metadata": {}, 389 | "source": [ 390 | "Let's explore the recall-precision trade off for a reconstruction error threshold varying between 5.0 and 10.0 (this encompasses most of the overlap we can see above). First, let's visualize how this threshold range separates our signals on a scatter plot of all the testing samples:" 391 | ] 392 | }, 393 | { 394 | "cell_type": "code", 395 | "execution_count": null, 396 | "metadata": {}, 397 | "outputs": [], 398 | "source": [ 399 | "# Set threshold ranges:\n", 400 | "threshold_min = 5.0\n", 401 | "threshold_max = 10.0\n", 402 | "threshold_step = 0.50" 403 | ] 404 | }, 405 | { 406 | "cell_type": "code", 407 | "execution_count": null, 408 | "metadata": {}, 409 | "outputs": [], 410 | "source": [ 411 | "# Scatter data for normal and abnormal signals:\n", 412 | "normal_x, normal_y = data[y_true==0][:,0], data[y_true==0][:,1]\n", 413 | "abnormal_x, abnormal_y = data[y_true==1][:,0], data[y_true==1][:,1]\n", 414 | "x = np.concatenate((normal_x, abnormal_x))\n", 415 | "\n", 416 | "fig, ax = plt.subplots(figsize=(24,8))\n", 417 | "plt.scatter(normal_x, normal_y, s=15, color='tab:green', alpha=0.3, label='Normal signals')\n", 418 | "plt.scatter(abnormal_x, abnormal_y, s=15, color='tab:red', alpha=0.3, label='Abnormal signals')\n", 419 | "plt.fill_between(x, threshold_min, threshold_max, alpha=0.1, color='tab:orange', label='Threshold range')\n", 420 | "plt.hlines([threshold_min, threshold_max], x.min(), x.max(), linewidth=0.5, alpha=0.8, color='tab:orange')\n", 421 | "plt.legend(loc='upper left')\n", 422 | "plt.title('Threshold range exploration', fontsize=16)\n", 423 | "plt.xlabel('Samples')\n", 424 | "plt.ylabel('Reconstruction error')\n", 425 | "plt.show()" 426 | ] 427 | }, 428 | { 429 | "cell_type": "markdown", 430 | "metadata": {}, 431 | "source": [ 432 | "### Confusion matrix analysis" 433 | ] 434 | }, 435 | { 436 | "cell_type": "code", 437 | "execution_count": null, 438 | "metadata": {}, 439 | "outputs": [], 440 | "source": [ 441 | "thresholds = np.arange(threshold_min, threshold_max + threshold_step, threshold_step)\n", 442 | "\n", 443 | "df = pd.DataFrame(columns=['Signal', 'Ground Truth', 'Prediction', 'Reconstruction Error'])\n", 444 | "df['Signal'] = test_files\n", 445 | "df['Ground Truth'] = test_labels\n", 446 | "df['Reconstruction Error'] = reconstruction_errors\n", 447 | "\n", 448 | "FN = []\n", 449 | "FP = []\n", 450 | "for th in thresholds:\n", 451 | " df.loc[df['Reconstruction Error'] <= th, 'Prediction'] = 0.0\n", 452 | " df.loc[df['Reconstruction Error'] > th, 'Prediction'] = 1.0\n", 453 | " df = 
utils.generate_error_types(df)\n", 454 | " FN.append(df['FN'].sum())\n", 455 | " FP.append(df['FP'].sum())\n", 456 | " \n", 457 | "utils.plot_curves(FP, FN, nb_samples=df.shape[0], threshold_min=threshold_min, threshold_max=threshold_max, threshold_step=threshold_step)" 458 | ] 459 | }, 460 | { 461 | "cell_type": "markdown", 462 | "metadata": {}, 463 | "source": [ 464 | "The curves above show us that the best compromise is a reconstruction error threshold of around 6.3-6.5 (assuming we are not specifically looking to minimize either false positive or false negative occurrences)." 465 | ] 466 | }, 467 | { 468 | "cell_type": "code", 469 | "execution_count": null, 470 | "metadata": {}, 471 | "outputs": [], 472 | "source": [ 473 | "th = 6.275\n", 474 | "df.loc[df['Reconstruction Error'] <= th, 'Prediction'] = 0.0\n", 475 | "df.loc[df['Reconstruction Error'] > th, 'Prediction'] = 1.0\n", 476 | "df['Prediction'] = df['Prediction'].astype(np.float32)\n", 477 | "df = utils.generate_error_types(df)\n", 478 | "tp = df['TP'].sum()\n", 479 | "tn = df['TN'].sum()\n", 480 | "fn = df['FN'].sum()\n", 481 | "fp = df['FP'].sum()\n", 482 | "\n", 483 | "from sklearn.metrics import confusion_matrix\n", 484 | "df['Ground Truth'] = 1 - df['Ground Truth']\n", 485 | "df['Prediction'] = 1 - df['Prediction']\n", 486 | "utils.print_confusion_matrix(confusion_matrix(df['Ground Truth'], df['Prediction']), class_names=['abnormal', 'normal']);" 487 | ] 488 | }, 489 | { 490 | "cell_type": "code", 491 | "execution_count": null, 492 | "metadata": {}, 493 | "outputs": [], 494 | "source": [ 495 | "df.to_csv(os.path.join(PROCESSED_DATA, 'results_autoencoder.csv'), index=False)" 496 | ] 497 | }, 498 | { 499 | "cell_type": "code", 500 | "execution_count": null, 501 | "metadata": {}, 502 | "outputs": [], 503 | "source": [ 504 | "precision = tp / (tp + fp)\n", 505 | "recall = tp / (tp + fn)\n", 506 | "accuracy = (tp + tn) / (tp + tn + fp + fn)\n", 507 | "f1_score = 2 * precision * recall / (precision + recall)\n", 508 | "\n", 509 | "print(f\"\"\"Basic autoencoder metrics:\n", 510 | "- Precision: {precision*100:.1f}%\n", 511 | "- Recall: {recall*100:.1f}%\n", 512 | "- Accuracy: {accuracy*100:.1f}%\n", 513 | "- F1 Score: {f1_score*100:.1f}%\"\"\")" 514 | ] 515 | }, 516 | { 517 | "cell_type": "markdown", 518 | "metadata": {}, 519 | "source": [ 520 | "## Cleanup\n", 521 | "---" 522 | ] 523 | }, 524 | { 525 | "cell_type": "markdown", 526 | "metadata": {}, 527 | "source": [ 528 | "Finally, we should delete the endpoint before we close this notebook.\n", 529 | "\n", 530 | "To do so, execute the cell below. Alternatively, you can navigate to the **Endpoints** tab in the SageMaker console, select the endpoint with the name stored in the variable endpoint_name, and select **Delete** from the **Actions** dropdown menu." 531 | ] 532 | }, 533 | { 534 | "cell_type": "code", 535 | "execution_count": null, 536 | "metadata": {}, 537 | "outputs": [], 538 | "source": [ 539 | "sess.delete_endpoint(tf_predictor.endpoint_name)" 540 | ] 541 | }, 542 | { 543 | "cell_type": "markdown", 544 | "metadata": {}, 545 | "source": [ 546 | "## Epilogue: model improvement and further exploration\n", 547 | "---" 548 | ] 549 | }, 550 | { 551 | "cell_type": "markdown", 552 | "metadata": {}, 553 | "source": [ 554 | "This spectrogram approach requires defining the spectrogram dimensions up front (e.g. the number of Mel bands defined in the data exploration notebook), which is a heuristic choice. 
In contrast, deep learning networks with a CNN encoder can learn the best representation to perform the task at hand (anomaly detection). Further steps to investigate in order to improve on this first result could include:\n", 555 | "* Experimenting with several more or less complex autoencoder architectures, training for a longer time, performing hyperparameter tuning with different optimizers, tuning the data preparation sequence (e.g. sound discretization parameters), etc.\n", 556 | "* Leveraging high-resolution spectrograms and feeding them to a CNN encoder to uncover the most appropriate representation of the sound\n", 557 | "* Building end-to-end model architectures with an encoder-decoder structure: these have been known to give good results on waveform datasets.\n", 558 | "* Using deep learning models with multi-context temporal and channel (8 microphones in this case) attention weights.\n", 559 | "* Using time distributed 2D convolution layers to encode features across the 8 channels: these encoded features could then be fed as sequences across time steps to an LSTM or GRU layer. From there, multiplicative sequence attention weights can then be learnt on the output sequence from the RNN layer.\n", 560 | "* Exploring the appropriate image representation for multivariate time series signals that are not waveforms: replacing spectrograms with Markov transition fields, recurrence plots or network graphs could then be used to achieve the same goals for non-sound time-based signals." 561 | ] 562 | } 563 | ], 564 | "metadata": { 565 | "kernelspec": { 566 | "display_name": "conda_tensorflow2_p36", 567 | "language": "python", 568 | "name": "conda_tensorflow2_p36" 569 | }, 570 | "language_info": { 571 | "codemirror_mode": { 572 | "name": "ipython", 573 | "version": 3 574 | }, 575 | "file_extension": ".py", 576 | "mimetype": "text/x-python", 577 | "name": "python", 578 | "nbconvert_exporter": "python", 579 | "pygments_lexer": "ipython3", 580 | "version": "3.6.10" 581 | } 582 | }, 583 | "nbformat": 4, 584 | "nbformat_minor": 4 585 | } 586 | -------------------------------------------------------------------------------- /3_rekognition_custom_labels.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Sound anomaly detection\n", 8 | "*Step 3 - Performing anomaly detection with a computer vision based approach, leveraging Amazon Rekognition Custom Labels*\n", 9 | "\n", 10 | "## Introduction\n", 11 | "---" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "In this notebook, we will use the spectrograms directly as inputs to feed a computer vision-based architecture. We will leverage Amazon Rekognition Custom Labels. Training a custom label project follows this process:\n", 19 | "1. Building the datasets and uploading them to Amazon S3\n", 20 | "2. Creating a project and collecting the generated project ARN\n", 21 | "3. Associating the project with the training data, validation data and output locations\n", 22 | "4. Training a project version with these datasets\n", 23 | "5. Starting the model: this will provision an endpoint and deploy the model behind it\n", 24 | "6. Querying the endpoint for inference on the validation and testing datasets\n", 25 | "\n", 26 | "You need to ensure that this **notebook instance has an IAM role** which allows it to call the **Amazon Rekognition Custom Labels API**:\n", 27 | "1. 
In your IAM console, look for the SageMaker execution role endorsed by your notebook instance (a role with a name like *AmazonSageMaker-ExecutionRole-yyyymmddTHHMMSS*)\n", 28 | "2. Click on **Attach Policies** and look for this managed policy: **AmazonRekognitionCustomLabelsFullAccess**\n", 29 | "3. Check the box next to it and click on **Attach Policy**\n", 30 | "\n", 31 | "Your SageMaker notebook instance can now call the Rekognition Custom Labels APIs." 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "## Initialization\n", 39 | "---" 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "### Configuration" 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": null, 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "# Python libraries:\n", 56 | "import matplotlib.pyplot as plt\n", 57 | "import numpy as np\n", 58 | "import os\n", 59 | "import pandas as pd\n", 60 | "import random\n", 61 | "import seaborn as sns\n", 62 | "import sys\n", 63 | "import time\n", 64 | "\n", 65 | "# Helper functions:\n", 66 | "sys.path.append('tools')\n", 67 | "import sound_tools\n", 68 | "import utils\n", 69 | "import rekognition_tools as rt\n", 70 | "\n", 71 | "# Other imports:\n", 72 | "from sklearn.metrics import confusion_matrix\n", 73 | "from sklearn.model_selection import train_test_split\n", 74 | "from tqdm import tqdm\n", 75 | "\n", 76 | "# AWS libraries:\n", 77 | "import boto3" 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": null, 83 | "metadata": {}, 84 | "outputs": [], 85 | "source": [ 86 | "# Initialization:\n", 87 | "%matplotlib inline\n", 88 | "random.seed(42)\n", 89 | "np.random.seed(42)\n", 90 | "plt.style.use('Solarize_Light2')\n", 91 | "prop_cycle = plt.rcParams['axes.prop_cycle']\n", 92 | "colors = prop_cycle.by_key()['color']\n", 93 | "\n", 94 | "# Paths definition:\n", 95 | "DATA = os.path.join('data', 'interim')\n", 96 | "RAW_DATA = os.path.join('data', 'raw')\n", 97 | "PROCESSED_DATA = os.path.join('data', 'processed')\n", 98 | "TRAIN_PATH = os.path.join(PROCESSED_DATA, 'train')\n", 99 | "TEST_PATH = os.path.join(PROCESSED_DATA, 'test')\n", 100 | "\n", 101 | "os.makedirs(os.path.join(PROCESSED_DATA, 'train', 'normal'), exist_ok=True)\n", 102 | "os.makedirs(os.path.join(PROCESSED_DATA, 'train', 'abnormal'), exist_ok=True)\n", 103 | "os.makedirs(os.path.join(PROCESSED_DATA, 'test', 'normal'), exist_ok=True)\n", 104 | "os.makedirs(os.path.join(PROCESSED_DATA, 'test', 'abnormal'), exist_ok=True)" 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": {}, 110 | "source": [ 111 | "### Feature engineering parameters\n", 112 | "These parameters are used to extract features from sound files:" 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": null, 118 | "metadata": {}, 119 | "outputs": [], 120 | "source": [ 121 | "n_mels = 64\n", 122 | "frames = 5\n", 123 | "n_fft = 1024\n", 124 | "hop_length = 512\n", 125 | "power = 2.0" 126 | ] 127 | }, 128 | { 129 | "cell_type": "markdown", 130 | "metadata": {}, 131 | "source": [ 132 | "## **Step 1:** Building the datasets\n", 133 | "---\n", 134 | "### Generate list of sound files and splitting them\n", 135 | "We are going to generate a spectrogram for each signal and use this as input to train a custom labels model with Rekognition:\n", 136 | "\n", 137 | "* Testing dataset: **1110 signals** including:\n", 138 | " * 295 abnormal signals\n", 139 | " * 815 normal signals\n", 140 
| "* Training dataset: **4440 signals** including:\n", 141 | " * 1180 abnormal signals\n", 142 | " * 3260 normal signals" 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": null, 148 | "metadata": {}, 149 | "outputs": [], 150 | "source": [ 151 | "# Loading the dataset from the previous notebook if it exists:\n", 152 | "try:\n", 153 | " dataset = dict({\n", 154 | " 'train_files': train_files,\n", 155 | " 'test_files': test_files,\n", 156 | " 'train_labels': train_labels,\n", 157 | " 'test_labels': test_labels\n", 158 | " })\n", 159 | "\n", 160 | " for key in ['train_files', 'test_files', 'train_labels', 'test_labels']:\n", 161 | " fname = os.path.join(PROCESSED_DATA, key + '.txt')\n", 162 | " with open(fname, 'r') as f:\n", 163 | " dataset.update({\n", 164 | " key: [line[:-1] for line in f.readlines()]\n", 165 | " })\n", 166 | "\n", 167 | " dataset['train_labels'] = [np.float(label) for label in dataset['train_labels']]\n", 168 | " dataset['test_labels'] = [np.float(label) for label in dataset['test_labels']]\n", 169 | " \n", 170 | "# If the dataset was not already generated, we generate it from scratch:\n", 171 | "except Exception as e:\n", 172 | " # Build the list of normal and abnormal files:\n", 173 | " normal_files, abnormal_files = utils.build_files_list(root_dir=os.path.join(DATA, 'fan'))\n", 174 | "\n", 175 | " # Concatenate them to obtain a features and label datasets that we can split:\n", 176 | " X = np.concatenate((normal_files, abnormal_files), axis=0)\n", 177 | " y = np.concatenate((np.zeros(len(normal_files)), np.ones(len(abnormal_files))), axis=0)\n", 178 | "\n", 179 | " train_files, test_files, train_labels, test_labels = train_test_split(X, y,\n", 180 | " train_size=0.8,\n", 181 | " random_state=42,\n", 182 | " shuffle=True,\n", 183 | " stratify=y\n", 184 | " )" 185 | ] 186 | }, 187 | { 188 | "cell_type": "markdown", 189 | "metadata": {}, 190 | "source": [ 191 | "### Generating spectrograms pictures" 192 | ] 193 | }, 194 | { 195 | "cell_type": "code", 196 | "execution_count": null, 197 | "metadata": {}, 198 | "outputs": [], 199 | "source": [ 200 | "img_train_files = sound_tools.generate_spectrograms(train_files, os.path.join(PROCESSED_DATA, 'train'))\n", 201 | "img_test_files = sound_tools.generate_spectrograms(test_files, os.path.join(PROCESSED_DATA, 'test'))" 202 | ] 203 | }, 204 | { 205 | "cell_type": "markdown", 206 | "metadata": {}, 207 | "source": [ 208 | "### S3 buckets preparation\n", 209 | "We upload the train and test dataset to S3 and generate the manifest files. 
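For reference, each line of these manifest files is a standalone JSON record in the SageMaker Ground Truth image-classification format; the sketch below mirrors what the `create_manifest_from_bucket()` helper in `tools/rekognition_tools.py` writes (the S3 URI is a placeholder, and `auto-label` is the numeric id assigned to each class, `abnormal` being 1 and `normal` being 2 in this project):

```python
import json
from datetime import datetime

# Illustrative record only: replace the bucket and key with your own values.
manifest_row = {
    'source-ref': 's3://<YOUR_BUCKET>/custom-label/train/normal/example.png',
    'auto-label': 2,
    'auto-label-metadata': {
        'confidence': 1,
        'job-name': 'labeling-job/auto-label',
        'class-name': 'normal',
        'human-annotated': 'yes',
        'creation-date': datetime.now().isoformat(timespec='milliseconds'),
        'type': 'groundtruth/image-classification'
    }
}
print(json.dumps(manifest_row))
```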
**Update the BUCKET variable with your own Bucket name below**" 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": null, 215 | "metadata": {}, 216 | "outputs": [], 217 | "source": [ 218 | "BUCKET = ''\n", 219 | "PREFIX = 'custom-label'\n", 220 | "LABELS = ['abnormal', 'normal']" 221 | ] 222 | }, 223 | { 224 | "cell_type": "code", 225 | "execution_count": null, 226 | "metadata": {}, 227 | "outputs": [], 228 | "source": [ 229 | "!aws s3 cp --recursive $TRAIN_PATH s3://$BUCKET/$PREFIX/train\n", 230 | "!aws s3 cp --recursive $TEST_PATH s3://$BUCKET/$PREFIX/test" 231 | ] 232 | }, 233 | { 234 | "cell_type": "code", 235 | "execution_count": null, 236 | "metadata": {}, 237 | "outputs": [], 238 | "source": [ 239 | "rt.create_manifest_from_bucket(BUCKET, PREFIX, 'train', LABELS, output_bucket=f's3://{BUCKET}/{PREFIX}/manifests')\n", 240 | "rt.create_manifest_from_bucket(BUCKET, PREFIX, 'test', LABELS, output_bucket=f's3://{BUCKET}/{PREFIX}/manifests')" 241 | ] 242 | }, 243 | { 244 | "cell_type": "markdown", 245 | "metadata": {}, 246 | "source": [ 247 | "## **Step 2:** Creating a custom label project in Amazon Rekognition\n", 248 | "---" 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "execution_count": null, 254 | "metadata": {}, 255 | "outputs": [], 256 | "source": [ 257 | "# Initialization, get a Rekognition client:\n", 258 | "PROJECT_NAME = 'sound-anomaly-detection'\n", 259 | "reko = boto3.client(\"rekognition\")" 260 | ] 261 | }, 262 | { 263 | "cell_type": "code", 264 | "execution_count": null, 265 | "metadata": {}, 266 | "outputs": [], 267 | "source": [ 268 | "# Let's try to create a Rekognition project:\n", 269 | "try:\n", 270 | " project_arn = reko.create_project(ProjectName=PROJECT_NAME)['ProjectArn']\n", 271 | " \n", 272 | "# If the project already exists, we get its ARN:\n", 273 | "except reko.exceptions.ResourceInUseException:\n", 274 | " # List all the existing project:\n", 275 | " print('Project already exists, collecting the ARN.')\n", 276 | " reko_project_list = reko.describe_projects()\n", 277 | " \n", 278 | " # Loop through all the Rekognition projects:\n", 279 | " for project in reko_project_list['ProjectDescriptions']:\n", 280 | " # Get the project name (the string after the first delimiter in the ARN)\n", 281 | " project_name = project['ProjectArn'].split('/')[1]\n", 282 | " \n", 283 | " # Once we find it, we store the ARN and break out of the loop:\n", 284 | " if (project_name == PROJECT_NAME):\n", 285 | " project_arn = project['ProjectArn']\n", 286 | " break\n", 287 | " \n", 288 | "project_arn" 289 | ] 290 | }, 291 | { 292 | "cell_type": "markdown", 293 | "metadata": {}, 294 | "source": [ 295 | "## **Step 3:** Associate the dataset to the project\n", 296 | "---\n", 297 | "We need to tell Rekognition where to find the training data, testing data and where to output its results" 298 | ] 299 | }, 300 | { 301 | "cell_type": "code", 302 | "execution_count": null, 303 | "metadata": {}, 304 | "outputs": [], 305 | "source": [ 306 | "TrainingData = {\n", 307 | " 'Assets': [{ \n", 308 | " 'GroundTruthManifest': {\n", 309 | " 'S3Object': { \n", 310 | " 'Bucket': BUCKET,\n", 311 | " 'Name': f'{PREFIX}/manifests/train.manifest'\n", 312 | " }\n", 313 | " }\n", 314 | " }]\n", 315 | "}\n", 316 | "\n", 317 | "TestingData = {\n", 318 | " 'AutoCreate': True\n", 319 | "}\n", 320 | "\n", 321 | "OutputConfig = { \n", 322 | " 'S3Bucket': BUCKET,\n", 323 | " 'S3KeyPrefix': f'{PREFIX}/output'\n", 324 | "}" 325 | ] 326 | }, 327 | { 328 | "cell_type": "markdown", 329 | 
"metadata": {}, 330 | "source": [ 331 | "## **Step 4:** Now we create a project version\n", 332 | "---\n", 333 | "Creating a project version will build and train a model within this Rekognition project for the data previously configured. Project creation can fail, if the bucket you selected cannot be accessed by Rekognition. Make sure the following Bucket Policy is applied to your bucket (replace **** by your bucket):\n", 334 | "\n", 335 | "```json\n", 336 | "{\n", 337 | " \"Version\": \"2012-10-17\",\n", 338 | " \"Statement\": [\n", 339 | " {\n", 340 | " \"Sid\": \"AWSRekognitionS3AclBucketRead20191011\",\n", 341 | " \"Effect\": \"Allow\",\n", 342 | " \"Principal\": {\n", 343 | " \"Service\": \"rekognition.amazonaws.com\"\n", 344 | " },\n", 345 | " \"Action\": [\n", 346 | " \"s3:GetBucketAcl\",\n", 347 | " \"s3:GetBucketLocation\"\n", 348 | " ],\n", 349 | " \"Resource\": \"arn:aws:s3:::\"\n", 350 | " },\n", 351 | " {\n", 352 | " \"Sid\": \"AWSRekognitionS3GetBucket20191011\",\n", 353 | " \"Effect\": \"Allow\",\n", 354 | " \"Principal\": {\n", 355 | " \"Service\": \"rekognition.amazonaws.com\"\n", 356 | " },\n", 357 | " \"Action\": [\n", 358 | " \"s3:GetObject\",\n", 359 | " \"s3:GetObjectAcl\",\n", 360 | " \"s3:GetObjectVersion\",\n", 361 | " \"s3:GetObjectTagging\"\n", 362 | " ],\n", 363 | " \"Resource\": \"arn:aws:s3:::/*\"\n", 364 | " },\n", 365 | " {\n", 366 | " \"Sid\": \"AWSRekognitionS3ACLBucketWrite20191011\",\n", 367 | " \"Effect\": \"Allow\",\n", 368 | " \"Principal\": {\n", 369 | " \"Service\": \"rekognition.amazonaws.com\"\n", 370 | " },\n", 371 | " \"Action\": \"s3:GetBucketAcl\",\n", 372 | " \"Resource\": \"arn:aws:s3:::\"\n", 373 | " },\n", 374 | " {\n", 375 | " \"Sid\": \"AWSRekognitionS3PutObject20191011\",\n", 376 | " \"Effect\": \"Allow\",\n", 377 | " \"Principal\": {\n", 378 | " \"Service\": \"rekognition.amazonaws.com\"\n", 379 | " },\n", 380 | " \"Action\": \"s3:PutObject\",\n", 381 | " \"Resource\": \"arn:aws:s3:::/*\",\n", 382 | " \"Condition\": {\n", 383 | " \"StringEquals\": {\n", 384 | " \"s3:x-amz-acl\": \"bucket-owner-full-control\"\n", 385 | " }\n", 386 | " }\n", 387 | " }\n", 388 | " ]\n", 389 | "}\n", 390 | "```" 391 | ] 392 | }, 393 | { 394 | "cell_type": "code", 395 | "execution_count": null, 396 | "metadata": {}, 397 | "outputs": [], 398 | "source": [ 399 | "version = 'experiment-1'\n", 400 | "VERSION_NAME = f'{PROJECT_NAME}.{version}'\n", 401 | "\n", 402 | "# Let's try to create a new project version in the current project:\n", 403 | "try:\n", 404 | " project_version_arn = reko.create_project_version(\n", 405 | " ProjectArn=project_arn, # Project ARN\n", 406 | " VersionName=VERSION_NAME, # Name of this version\n", 407 | " OutputConfig=OutputConfig, # S3 location for the output artefact\n", 408 | " TrainingData=TrainingData, # S3 location of the manifest describing the training data\n", 409 | " TestingData=TestingData # S3 location of the manifest describing the validation data\n", 410 | " )['ProjectVersionArn']\n", 411 | " \n", 412 | "# If a project version with this name already exists, we get its ARN:\n", 413 | "except reko.exceptions.ResourceInUseException:\n", 414 | " # List all the project versions (=models) for this project:\n", 415 | " print('Project version already exists, collecting the ARN:', end=' ')\n", 416 | " reko_project_versions_list = reko.describe_project_versions(ProjectArn=project_arn)\n", 417 | " \n", 418 | " # Loops through them:\n", 419 | " for project_version in reko_project_versions_list['ProjectVersionDescriptions']:\n", 420 | 
" # Get the project version name (the string after the third delimiter in the ARN)\n", 421 | " project_version_name = project_version['ProjectVersionArn'].split('/')[3]\n", 422 | "\n", 423 | " # Once we find it, we store the ARN and break out of the loop:\n", 424 | " if (project_version_name == VERSION_NAME):\n", 425 | " project_version_arn = project_version['ProjectVersionArn']\n", 426 | " break\n", 427 | " \n", 428 | "print(project_version_arn)\n", 429 | "status = reko.describe_project_versions(\n", 430 | " ProjectArn=project_arn,\n", 431 | " VersionNames=[project_version_arn.split('/')[3]]\n", 432 | ")['ProjectVersionDescriptions'][0]['Status']" 433 | ] 434 | }, 435 | { 436 | "cell_type": "markdown", 437 | "metadata": {}, 438 | "source": [ 439 | "The following loops prints the project version training status (`TRAINING_IN_PROGRESS`) until the model has been trained (`TRAINING_COMPLETE`): if it's already trained the model status will either be:\n", 440 | "* `STOPPED`: the model is trained, but is not currently deployed\n", 441 | "* `STARTED`: the model has been deployed behind an endpoint and is available to deliver inference (hourly costs are incurred)\n", 442 | "* `STARTING`: deployment in progress\n", 443 | "* `STOPPING`: stopping in progress" 444 | ] 445 | }, 446 | { 447 | "cell_type": "code", 448 | "execution_count": null, 449 | "metadata": {}, 450 | "outputs": [], 451 | "source": [ 452 | "# Loops while training of this project version is in progress:\n", 453 | "while status == 'TRAINING_IN_PROGRESS':\n", 454 | " status = reko.describe_project_versions(\n", 455 | " ProjectArn=project_arn,\n", 456 | " VersionNames=[project_version_arn.split('/')[3]]\n", 457 | " )['ProjectVersionDescriptions'][0]['Status']\n", 458 | "\n", 459 | " print(status)\n", 460 | " time.sleep(60)\n", 461 | " \n", 462 | "print(status)" 463 | ] 464 | }, 465 | { 466 | "cell_type": "markdown", 467 | "metadata": {}, 468 | "source": [ 469 | "## **Step 5:** Model starting\n", 470 | "We now have a trained model, we need to start it to serve inferences: the following command put the model in a \"hosted\" state. 
This process takes a while: in the background, a dedicated endpoint is created and our trained model is deployed on it to serve predictions:" 471 | ] 472 | }, 473 | { 474 | "cell_type": "code", 475 | "execution_count": null, 476 | "metadata": {}, 477 | "outputs": [], 478 | "source": [ 479 | "rt.start_model(project_arn, project_version_arn, VERSION_NAME)" 480 | ] 481 | }, 482 | { 483 | "cell_type": "markdown", 484 | "metadata": {}, 485 | "source": [ 486 | "## **Step 6:** Model evaluation\n", 487 | "---\n", 488 | "We now have a live endpoint with our model ready to deliver its predictions.\n", 489 | "### Apply model on a test dataset\n", 490 | "Let's now get the predictions on the **test dataset**:" 491 | ] 492 | }, 493 | { 494 | "cell_type": "code", 495 | "execution_count": null, 496 | "metadata": {}, 497 | "outputs": [], 498 | "source": [ 499 | "import s3fs\n", 500 | "\n", 501 | "test_results_filename = os.path.join(PROCESSED_DATA, f'results_rekognition_{PROJECT_NAME}-{version}.csv')\n", 502 | "print(f'Looking for test results file: \"{test_results_filename}\"')\n", 503 | "\n", 504 | "if os.path.exists(test_results_filename):\n", 505 | " print('Prediction file on the test dataset exists, loading them from disk')\n", 506 | " test_predictions = pd.read_csv(test_results_filename)\n", 507 | " \n", 508 | "else:\n", 509 | " print('Predictions file on the test dataset does not exist, querying the endpoint to collect inference results...')\n", 510 | " predictions_ok = rt.get_results(project_version_arn, BUCKET, s3_path=f'{BUCKET}/{PREFIX}/test/normal', label='normal', verbose=True)\n", 511 | " predictions_ko = rt.get_results(project_version_arn, BUCKET, s3_path=f'{BUCKET}/{PREFIX}/test/abnormal', label='abnormal', verbose=True)\n", 512 | "\n", 513 | " print('\\nWriting predictions for test set to disk.')\n", 514 | " test_predictions = pd.concat([predictions_ok, predictions_ko], axis='index')\n", 515 | " test_predictions = rt.reshape_results(test_predictions)\n", 516 | " test_predictions.to_csv(test_results_filename, index=None)\n", 517 | " print('Done.')" 518 | ] 519 | }, 520 | { 521 | "cell_type": "markdown", 522 | "metadata": {}, 523 | "source": [ 524 | "### Confusion matrix analysis" 525 | ] 526 | }, 527 | { 528 | "cell_type": "code", 529 | "execution_count": null, 530 | "metadata": {}, 531 | "outputs": [], 532 | "source": [ 533 | "df = utils.generate_error_types(test_predictions, normal_label='normal', anomaly_label='abnormal')\n", 534 | "tp = df['TP'].sum()\n", 535 | "tn = df['TN'].sum()\n", 536 | "fn = df['FN'].sum()\n", 537 | "fp = df['FP'].sum()\n", 538 | "\n", 539 | "utils.print_confusion_matrix(confusion_matrix(df['Ground Truth'], df['Prediction']), class_names=['abnormal', 'normal']);" 540 | ] 541 | }, 542 | { 543 | "cell_type": "code", 544 | "execution_count": null, 545 | "metadata": {}, 546 | "outputs": [], 547 | "source": [ 548 | "precision = tp / (tp + fp)\n", 549 | "recall = tp / (tp + fn)\n", 550 | "accuracy = (tp + tn) / (tp + tn + fp + fn)\n", 551 | "f1_score = 2 * precision * recall / (precision + recall)\n", 552 | "\n", 553 | "print(f\"\"\"Amazon Rekognition custom model metrics:\n", 554 | "- Precision: {precision*100:.1f}%\n", 555 | "- Recall: {recall*100:.1f}%\n", 556 | "- Accuracy: {accuracy*100:.1f}%\n", 557 | "- F1 Score: {f1_score*100:.1f}%\"\"\")" 558 | ] 559 | }, 560 | { 561 | "cell_type": "markdown", 562 | "metadata": {}, 563 | "source": [ 564 | "## Cleanup\n", 565 | "---\n", 566 | "We need to stop the running model as we will continue to incur costs 
while the endpoint is live:" 567 | ] 568 | }, 569 | { 570 | "cell_type": "code", 571 | "execution_count": null, 572 | "metadata": {}, 573 | "outputs": [], 574 | "source": [ 575 | "rt.stop_model(project_version_arn)" 576 | ] 577 | } 578 | ], 579 | "metadata": { 580 | "kernelspec": { 581 | "display_name": "conda_tensorflow2_p36", 582 | "language": "python", 583 | "name": "conda_tensorflow2_p36" 584 | }, 585 | "language_info": { 586 | "codemirror_mode": { 587 | "name": "ipython", 588 | "version": 3 589 | }, 590 | "file_extension": ".py", 591 | "mimetype": "text/x-python", 592 | "name": "python", 593 | "nbconvert_exporter": "python", 594 | "pygments_lexer": "ipython3", 595 | "version": "3.6.10" 596 | } 597 | }, 598 | "nbformat": 4, 599 | "nbformat_minor": 4 600 | } 601 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *main* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 
38 | 39 | GitHub provides additional document on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 60 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of 4 | this software and associated documentation files (the "Software"), to deal in 5 | the Software without restriction, including without limitation the rights to 6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of 7 | the Software, and to permit persons to whom the Software is furnished to do so. 8 | 9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 10 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 11 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 12 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 13 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 14 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 15 | 16 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Performing anomaly detection on industrial equipment using audio signals 2 | 3 | This repository contains a sample on how to perform anomaly detection on machine sounds (based on the [MIMII Dataset](https://zenodo.org/record/3384388)) leveraging several approaches. 4 | 5 | **Running time:** *once the dataset is downloaded, it takes roughly an hour and a half to go through all these notebooks from start to finish.* 6 | 7 | ## Overview 8 | Industrial companies have been collecting a massive amount of time series data about their operating processes, manufacturing production lines, industrial equipment... They sometime store years of data in historian systems. 
Whereas they are looking to prevent equipment breakdown that would stop a production line, avoid catastrophic failures in a power generation facility or improve their end product quality by adjusting their process parameters, having the ability to process time series data is a challenge that modern cloud technologies are up to. In this post, we are going to focus on preventing machine breakdown from happening. 9 | 10 | In many cases, machine failures are tackled by either reactive action (stop the line and repair...) or costly preventive maintenance where you have to build the proper replacement parts inventory and schedule regular maintenance activities. Skilled machine operators are the most valuable assets in such settings: years of experience allow them to develop a fine knowledge of how the machinery should operate; they become expert listeners and are able to detect unusual behaviors and sounds in rotating and moving machines. However, production lines are becoming more and more automated, and augmenting these machine operators with AI-generated insights is a way to maintain and develop the fine expertise needed to prevent industrial companies from falling back into a reactive-only posture when dealing with machine breakdowns. 11 | 12 | This is a companion repository for a blog post on the AWS Machine Learning Blog, where we compare and contrast two different approaches to identify a malfunctioning machine for which we have sound recordings: we will start by building a neural network based on an autoencoder architecture and we will then use an image-based approach where we will feed “images from the sound” (namely spectrograms) to an image-based automated ML classification feature. 13 | 14 | ### Installation instructions 15 | [Create an AWS account](https://portal.aws.amazon.com/gp/aws/developer/registration/index.html) if you do not already have one and login. 16 | 17 | Navigate to the SageMaker console and create a new instance. Using an **ml.c5.2xlarge instance** with a **25 GB attached EBS volume** is recommended to process the dataset comfortably. 18 | 19 | You need to ensure that this notebook instance has an IAM role which allows it to call the Amazon Rekognition Custom Labels API: 20 | 1. In your IAM console, look for the SageMaker execution role attached to your notebook instance (a role with a name like AmazonSageMaker-ExecutionRole-yyyymmddTHHMMSS) 21 | 2. Click on Attach Policies and look for this managed policy: AmazonRekognitionCustomLabelsFullAccess 22 | 3. Check the box next to it and click on Attach Policy 23 | 24 | Your SageMaker notebook instance can now call the Rekognition Custom Labels APIs. 25 | 26 | You can now navigate back to the Amazon SageMaker console, then to the Notebook Instances menu. Start your instance and launch either a Jupyter or JupyterLab session. From there, you can launch a new terminal and clone this repository into your notebook environment using `git clone`. 27 | 28 | ### Repository structure 29 | Once you've cloned this repo, browse to the [data exploration](1_data_exploration.ipynb) notebook: this first notebook will download and prepare the data necessary for the other ones. 30 | 31 | The dataset used is a subset of the MIMII dataset dedicated to industrial fan sounds. This 10 GB archive will be downloaded in the /tmp directory: if you're using a SageMaker instance, you should have enough space on the ephemeral volume to download it. 
The unzipped data is around 15 GB large and will be located in the EBS volume, make sure it is large enough to prevent any out of space error. 32 | 33 | ``` 34 | . 35 | | 36 | +-- README.md <-- This instruction file 37 | | 38 | +-- autoencoder/ 39 | | |-- model.py <-- The training script used as an entrypoint of the 40 | | | TensorFlow container 41 | | \-- requirements.txt <-- Requirements file to update the training container 42 | | at launch 43 | | 44 | +-- pictures/ <-- Assets used in in the introduction and README.md 45 | | 46 | +-- tools/ 47 | | |-- rekognition_tools.py <-- Utilities to manage Rekognition custom labels models 48 | | | (start, stop, get inference...) 49 | | |-- sound_tools.py <-- Utilities to manage sounds dataset 50 | | \-- utils.py <-- Various tools to build files list, plot curves, and 51 | | confusion matrix... 52 | | 53 | +-- 0_introduction.ipynb <-- Expose the context 54 | | 55 | +-- 1_data_exploration.ipynb <-- START HERE: data exploration notebook, useful to 56 | | generate the datasets, get familiar with sound datasets 57 | | and basic frequency analysis 58 | | 59 | +-- 2_custom_autoencoder.ipynb <-- Using the SageMaker TensorFlow container to build a 60 | | custom autoencoder 61 | | 62 | \-- 3_rekognition_custom_labels.ipynb <-- Performing the same tasks by calling the Rekognition 63 | Custom Labels API 64 | ``` 65 | 66 | ## Questions 67 | 68 | Please contact [@michaelhoarau](https://twitter.com/michaelhoarau) or raise an issue on this repository. 69 | 70 | ## Security 71 | 72 | See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information. 73 | 74 | ## License 75 | This collection of notebooks is licensed under the MIT-0 License. See the LICENSE file. 76 | -------------------------------------------------------------------------------- /autoencoder/model.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import argparse, os 5 | import numpy as np 6 | import pickle 7 | 8 | import tensorflow as tf 9 | import tensorflow.keras as keras 10 | from tensorflow.keras import backend as K 11 | from tensorflow.keras import Input 12 | from tensorflow.keras.models import Model 13 | from tensorflow.keras.layers import Dense 14 | from tensorflow.keras.optimizers import Adam 15 | from tensorflow.keras.utils import multi_gpu_model 16 | 17 | def autoencoder_model(input_dims): 18 | """ 19 | Defines a Keras model for performing the anomaly detection. 20 | This model is based on a simple dense autoencoder. 
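    The encoder compresses each input vector (n_mels x frames values) through two 64-unit layers down to an 8-unit bottleneck, and the decoder mirrors these layers to reconstruct the input; downstream, a high reconstruction error is treated as a sign of anomaly.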
21 | 22 | PARAMS 23 | ====== 24 | inputs_dims (integer) - number of dimensions of the input features 25 | 26 | RETURN 27 | ====== 28 | Model (tf.keras.models.Model) - the Keras model of our autoencoder 29 | """ 30 | 31 | # Autoencoder definition: 32 | inputLayer = Input(shape=(input_dims,)) 33 | h = Dense(64, activation="relu")(inputLayer) 34 | h = Dense(64, activation="relu")(h) 35 | h = Dense(8, activation="relu")(h) 36 | h = Dense(64, activation="relu")(h) 37 | h = Dense(64, activation="relu")(h) 38 | h = Dense(input_dims, activation=None)(h) 39 | 40 | return Model(inputs=inputLayer, outputs=h) 41 | 42 | def parse_arguments(): 43 | """ 44 | Parse the command line arguments passed when running this training script 45 | 46 | RETURN 47 | ====== 48 | args (ArgumentParser) - an ArgumentParser instance command line arguments 49 | """ 50 | parser = argparse.ArgumentParser() 51 | 52 | parser.add_argument('--epochs', type=int, default=10) 53 | parser.add_argument('--n_mels', type=int, default=64) 54 | parser.add_argument('--frame', type=int, default=5) 55 | parser.add_argument('--learning-rate', type=float, default=0.01) 56 | parser.add_argument('--batch-size', type=int, default=128) 57 | parser.add_argument('--gpu-count', type=int, default=os.environ['SM_NUM_GPUS']) 58 | parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR']) 59 | parser.add_argument('--training', type=str, default=os.environ['SM_CHANNEL_TRAINING']) 60 | 61 | args, _ = parser.parse_known_args() 62 | 63 | return args 64 | 65 | def train(training_dir, model_dir, n_mels, frame, lr, batch_size, epochs, gpu_count): 66 | """ 67 | Main training function. 68 | 69 | PARAMS 70 | ====== 71 | training_dir (string) - location where the training data are 72 | model_dir (string) - location where to store the model artifacts 73 | n_mels (integer) - number of Mel buckets to build the spectrograms 74 | frames (integer) - number of sliding windows to use to slice the Mel spectrogram 75 | lr (float) - learning rate 76 | batch_size (integer) - batch size 77 | epochs (integer) - number of epochs 78 | gpu_count (integer) - number of GPU to distribute the job on 79 | """ 80 | # Load training data: 81 | train_data_file = os.path.join(training_dir, 'train_data.pkl') 82 | with open(train_data_file, 'rb') as f: 83 | train_data = pickle.load(f) 84 | 85 | # Builds the model: 86 | model = autoencoder_model(n_mels * frame) 87 | print(model.summary()) 88 | if gpu_count > 1: 89 | model = multi_gpu_model(model, gpus=gpu_count) 90 | 91 | # Model preparation: 92 | model.compile( 93 | loss='mean_squared_error', 94 | optimizer=Adam(learning_rate=lr), 95 | metrics=['accuracy'] 96 | ) 97 | 98 | # Model training: this is an autoencoder, we 99 | # use the same data for training and validation: 100 | history = model.fit( 101 | train_data, 102 | train_data, 103 | batch_size=batch_size, 104 | validation_split=0.1, 105 | epochs=epochs, 106 | shuffle=True, 107 | verbose=2 108 | ) 109 | 110 | # Save the trained model: 111 | os.makedirs(os.path.join(model_dir, 'model/1'), exist_ok=True) 112 | model.save(os.path.join(model_dir, 'model/1')) 113 | 114 | if __name__ == '__main__': 115 | # Initialization: 116 | tf.random.set_seed(42) 117 | 118 | # Parsing command line arguments: 119 | args = parse_arguments() 120 | epochs = args.epochs 121 | n_mels = args.n_mels 122 | frame = args.frame 123 | lr = args.learning_rate 124 | batch_size = args.batch_size 125 | gpu_count = args.gpu_count 126 | model_dir = args.model_dir 127 | training_dir = args.training 128 | 
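    # Note: argparse normalizes dashes to underscores, which is why the
    # 'batch-size' and 'learning-rate' hyperparameters passed by the SageMaker
    # estimator are read back above as args.batch_size and args.learning_rate.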
129 | # Launch the training: 130 | train(training_dir, model_dir, n_mels, frame, lr, batch_size, epochs, gpu_count) -------------------------------------------------------------------------------- /autoencoder/requirements.txt: -------------------------------------------------------------------------------- 1 | keras -------------------------------------------------------------------------------- /pictures/confusion_matrix_autoencoder.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/sound-anomaly-detection-for-manufacturing/029862d1441e5e695da0a9e56823f8b6da98227a/pictures/confusion_matrix_autoencoder.png -------------------------------------------------------------------------------- /pictures/confusion_matrix_rekognition.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/sound-anomaly-detection-for-manufacturing/029862d1441e5e695da0a9e56823f8b6da98227a/pictures/confusion_matrix_rekognition.png -------------------------------------------------------------------------------- /pictures/reconstruction_error_histograms.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/sound-anomaly-detection-for-manufacturing/029862d1441e5e695da0a9e56823f8b6da98227a/pictures/reconstruction_error_histograms.png -------------------------------------------------------------------------------- /pictures/reconstruction_error_threshold.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/sound-anomaly-detection-for-manufacturing/029862d1441e5e695da0a9e56823f8b6da98227a/pictures/reconstruction_error_threshold.png -------------------------------------------------------------------------------- /pictures/spectrograms.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/sound-anomaly-detection-for-manufacturing/029862d1441e5e695da0a9e56823f8b6da98227a/pictures/spectrograms.png -------------------------------------------------------------------------------- /pictures/stft.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/sound-anomaly-detection-for-manufacturing/029862d1441e5e695da0a9e56823f8b6da98227a/pictures/stft.png -------------------------------------------------------------------------------- /pictures/threshold_range_exploration.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/sound-anomaly-detection-for-manufacturing/029862d1441e5e695da0a9e56823f8b6da98227a/pictures/threshold_range_exploration.png -------------------------------------------------------------------------------- /pictures/waveforms.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/sound-anomaly-detection-for-manufacturing/029862d1441e5e695da0a9e56823f8b6da98227a/pictures/waveforms.png -------------------------------------------------------------------------------- /tools/rekognition_tools.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import boto3 5 | import json 6 | import pandas as pd 7 | import s3fs 8 | import time 9 | import utils 10 | 11 | from datetime import datetime 12 | 13 | def create_manifest_from_bucket(bucket, prefix, folder, labels, output_bucket): 14 | """ 15 | Based on a bucket / prefix location on S3, this function will crawl this 16 | location for images and generate a JSON manifest file compatible with 17 | Rekognition Custom Labels. 18 | 19 | PARAMS 20 | ====== 21 | bucket (string) - bucket name 22 | prefix (string) - S3 prefix where to look for the images 23 | folder (string) - either train or test 24 | labels (list) - list of labels to look for (normal, anomaly) 25 | output_bucket (string) - where to upload the JSON manifest file to 26 | """ 27 | # Get a creation date: 28 | creation_date = str(pd.to_datetime(datetime.now()))[:23].replace(' ','T') 29 | 30 | # Assign a distinct identifier for each label: 31 | auto_label = {} 32 | for index, label in enumerate(labels): 33 | auto_label.update({label: index + 1}) 34 | 35 | # Get a handle on an S3 filesystem object: 36 | fs = s3fs.S3FileSystem() 37 | 38 | # Create a manifest file in the output directory passed as argument: 39 | with fs.open(output_bucket + f'/{folder}.manifest', 'w') as f: 40 | # We expect one subfolder for each label: 41 | for label in labels: 42 | # Loops through each file present at this point: 43 | for file in fs.ls(path=f'{bucket}/{prefix}/{folder}/{label}/', detail=True): 44 | # We only care for files, not directories: 45 | if file['Size'] > 0: 46 | key = file['Key'] 47 | 48 | # Build a Ground Truth format manifest row: 49 | manifest_row = { 50 | 'source-ref': f's3://{key}', 51 | 'auto-label': auto_label[label], 52 | 'auto-label-metadata': { 53 | 'confidence': 1, 54 | 'job-name': 'labeling-job/auto-label', 55 | 'class-name': label, 56 | 'human-annotated': 'yes', 57 | 'creation-date': creation_date, 58 | 'type': 'groundtruth/image-classification' 59 | } 60 | } 61 | 62 | # Write this line to the manifest: 63 | f.write(json.dumps(manifest_row, indent=None) + '\n') 64 | 65 | def start_model(project_arn, model_arn, version_name, min_inference_units=1): 66 | """ 67 | Start a Rekognition Custom Labels model. 68 | 69 | PARAMS 70 | ====== 71 | project_arn (string) - project ARN 72 | model_arn (string) - project version ARN 73 | version_name (string) - project version name 74 | min_inference_units (integer) - inference unit to provision for the 75 | endpoint which will be deployed for 76 | this particular project version. 77 | """ 78 | client = boto3.client('rekognition') 79 | 80 | try: 81 | # Start the model 82 | print('Starting model: ' + model_arn) 83 | response = client.start_project_version(ProjectVersionArn=model_arn, MinInferenceUnits=min_inference_units) 84 | 85 | # Wait for the model to be in the running state: 86 | project_version_running_waiter = client.get_waiter('project_version_running') 87 | project_version_running_waiter.wait(ProjectArn=project_arn, VersionNames=[version_name]) 88 | 89 | # Get the running status 90 | describe_response=client.describe_project_versions(ProjectArn=project_arn, VersionNames=[version_name]) 91 | for model in describe_response['ProjectVersionDescriptions']: 92 | print("Status: " + model['Status']) 93 | print("Message: " + model['StatusMessage']) 94 | 95 | except Exception as e: 96 | print(e) 97 | 98 | print('Done.') 99 | 100 | def stop_model(model_arn): 101 | """ 102 | Stops a Rekognition Custom Labels model. 
103 | 
104 |     PARAMS
105 |     ======
106 |     model_arn (string) - project version ARN
107 |     """
108 |     print('Stopping model: ' + model_arn)
109 | 
110 |     # Stop the model:
111 |     try:
112 |         reko = boto3.client('rekognition')
113 |         response = reko.stop_project_version(ProjectVersionArn=model_arn)
114 |         status = response['Status']
115 |         print('Status: ' + status)
116 | 
117 |     except Exception as e:
118 |         print(e)
119 | 
120 |     print('Done.')
121 | 
122 | def show_custom_labels(model, bucket, image, min_confidence):
123 |     """
124 |     Calls the Rekognition detect_custom_labels() API to get the prediction for
125 |     a given image.
126 | 
127 |     PARAMS
128 |     ======
129 |     model (string) - project version ARN
130 |     bucket (string) - bucket where the image is located
131 |     image (string) - full S3 key of the image
132 |     min_confidence (float) - minimum confidence score to return a result
133 | 
134 |     RETURNS
135 |     =======
136 |     custom_labels (list) - list of custom labels detected for this image
137 |     """
138 |     # Call DetectCustomLabels from the Rekognition API: this will give us the list
139 |     # of labels detected for this picture and their associated confidence level:
140 |     reko = boto3.client('rekognition')
141 |     try:
142 |         response = reko.detect_custom_labels(
143 |             Image={'S3Object': {'Bucket': bucket, 'Name': image}},
144 |             MinConfidence=min_confidence,
145 |             ProjectVersionArn=model
146 |         )
147 | 
148 |     except Exception as e:
149 |         print(f'Exception encountered when processing {image}')
150 |         print(e)
151 |         return []
152 |     # Returns the list of custom labels for the image passed as an argument:
153 |     return response['CustomLabels']
154 | 
155 | def get_results(project_version_arn, bucket, s3_path, label=None, verbose=True):
156 |     """
157 |     Sends a list of pictures located in an S3 path to
158 |     the endpoint to get the associated predictions.
159 | 
160 |     PARAMS
161 |     ======
162 |     project_version_arn (string) - ARN of the model to query
163 |     bucket (string) - bucket name
164 |     s3_path (string) - S3 prefix where to look for the images
165 |     label (string) - ground truth label of the images
166 |     verbose (boolean) - shows a progress indicator if True (defaults to True)
167 | 
168 |     RETURNS
169 |     =======
170 |     predictions (dataframe)
171 |         A dataframe with the following columns: image,
172 |         normal and abnormal confidence levels, and
173 |         ground truth.
174 |     """
175 | 
176 |     fs = s3fs.S3FileSystem()
177 |     data = {}
178 |     counter = 0
179 |     predictions = pd.DataFrame(columns=['image', 'normal', 'abnormal'])
180 | 
181 |     for file in fs.ls(path=s3_path, detail=True, refresh=True):
182 |         if file['Size'] > 0:
183 |             image = '/'.join(file['Key'].split('/')[1:])
184 |             if verbose: print('.', end='')
185 | 
186 |             labels = show_custom_labels(project_version_arn, bucket, image, 0.0)
187 |             for L in labels:
188 |                 data[L['Name']] = L['Confidence']
189 | 
190 |             predictions = predictions.append(pd.Series({
191 |                 'image': file['Key'].split('/')[-1],
192 |                 'abnormal': data['abnormal'],
193 |                 'normal': data['normal'],
194 |                 'ground truth': label
195 |             }), ignore_index=True)
196 | 
197 |             # Temporization to prevent any throttling:
198 |             counter += 1
199 |             if counter % 100 == 0:
200 |                 if verbose: print('|', end='')
201 |                 time.sleep(1)
202 | 
203 |     return predictions
204 | 
205 | def reshape_results(df, unknown_threshold=50.0):
206 |     """
207 |     Reshape a results dataframe containing image path, normal and abnormal
208 |     confidence levels into a more straightforward one with ground truth,
209 |     prediction and confidence level associated to each image.
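    For example (illustrative values): with the default 50.0 threshold, a row with
    normal=97.3 and abnormal=2.7 becomes Prediction='normal' with a Confidence Level
    of 0.973; when neither confidence level reaches the threshold, the row is marked
    'unknown' with a confidence of 0.0.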
210 | 
211 |     PARAMS
212 |     ======
213 |     df (dataframe)
214 |         Input dataframe with the following columns: image, ground
215 |         truth, normal and abnormal.
216 | 
217 |     unknown_threshold (float)
218 |         A label is assigned only if its confidence level reaches this threshold
219 |         (normal is checked first); otherwise the prediction is 'unknown' (defaults to 50.0).
220 |     """
221 |     new_val_predictions = pd.DataFrame(columns=['Image', 'Ground Truth', 'Prediction', 'Confidence Level'])
222 | 
223 |     for index, row in df.iterrows():
224 |         new_row = pd.Series(dtype='object')
225 |         new_row['Image'] = row['image']
226 |         new_row['Ground Truth'] = row['ground truth']
227 |         if row['normal'] >= unknown_threshold:
228 |             new_row['Prediction'] = 'normal'
229 |             new_row['Confidence Level'] = row['normal'] / 100
230 | 
231 |         elif row['abnormal'] >= unknown_threshold:
232 |             new_row['Prediction'] = 'abnormal'
233 |             new_row['Confidence Level'] = row['abnormal'] / 100
234 | 
235 |         else:
236 |             new_row['Prediction'] = 'unknown'
237 |             new_row['Confidence Level'] = 0.0
238 | 
239 |         new_val_predictions = new_val_predictions.append(pd.Series(new_row), ignore_index=True)
240 | 
241 |     return new_val_predictions
242 | 
243 | def classification_report(input_df):
244 |     """
245 |     Generates a classification report (similar to what Amazon Rekognition
246 |     Custom Labels shows in the console) based on the input_df dataframe.
247 | 
248 |     PARAMS
249 |     ======
250 |     input_df (dataframe) - dataframe with Image, Ground Truth, Prediction and Confidence Level columns, as returned by reshape_results()
251 |     RETURNS
252 |     =======
253 |     performance (Pandas DataFrame)
254 |         Returns a dataframe with one row per label ('abnormal' and 'normal') and the following columns:
255 |         - Label name
256 |         - F1 score
257 |         - Number of test images
258 |         - Precision score
259 |         - Recall score
260 |         - Assumed threshold (computed as the confidence level minimum)
261 |     """
262 |     input_df = utils.generate_error_types(input_df, normal_label='normal', anomaly_label='abnormal')
263 | 
264 |     # Abnormal samples (abnormal taken as the positive class):
265 |     df = input_df[input_df['Ground Truth'] == 'abnormal']
266 |     TP = df['TN'].sum()   # abnormal samples correctly flagged as abnormal
267 |     FN = df['FN'].sum()   # abnormal samples predicted as something else
268 |     FP = ((input_df['Ground Truth'] == 'normal') & (input_df['Prediction'] == 'abnormal')).sum()   # normal samples wrongly flagged as abnormal
269 |     recall = TP / (TP + FN)
270 |     precision = TP / (TP + FP)
271 |     f1_score = 2 * TP / (2 * TP + FP + FN)
272 |     min_confidence_level = df.sort_values(by='Confidence Level', ascending=True).iloc[0]['Confidence Level']
273 | 
274 |     performance = pd.DataFrame(columns=['Label name', 'F1 score', 'Test images', 'Precision', 'Recall', 'Assumed threshold'])
275 |     performance = performance.append(pd.Series({
276 |         'Label name': 'abnormal',
277 |         'F1 score': round(f1_score, 3),
278 |         'Test images': input_df[input_df['Ground Truth'] == 'abnormal'].shape[0],
279 |         'Precision': precision,
280 |         'Recall': recall,
281 |         'Assumed threshold': round(min_confidence_level,3)
282 |     }), ignore_index=True)
283 | 
284 |     # Normal samples (normal taken as the positive class):
285 |     df = input_df[input_df['Ground Truth'] == 'normal']
286 |     TP = df['TP'].sum()   # normal samples correctly predicted as normal
287 |     FN = df['FP'].sum()   # normal samples predicted as something else
288 |     FP = ((input_df['Ground Truth'] == 'abnormal') & (input_df['Prediction'] == 'normal')).sum()   # abnormal samples wrongly predicted as normal
289 |     recall = TP / (TP + FN)
290 |     precision = TP / (TP + FP)
291 |     f1_score = 2 * TP / (2 * TP + FP + FN)
292 |     min_confidence_level = df.sort_values(by='Confidence Level', ascending=True).iloc[0]['Confidence Level']
293 | 
294 |     performance = performance.append(pd.Series({
295 |         'Label name': 'normal',
296 |         'F1 score': round(f1_score,3),
297 |         'Test images': input_df[input_df['Ground Truth'] == 'normal'].shape[0],
298 |         'Precision': precision,
299 |         'Recall': recall,
300 |         'Assumed threshold': round(min_confidence_level,3)
301 |     }), ignore_index=True)
302 | 
303 |     return performance
--------------------------------------------------------------------------------
/tools/sound_tools.py:
--------------------------------------------------------------------------------
1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
2 | # SPDX-License-Identifier: MIT-0
3 | 
4 | import os
5 | import sys
6 | import librosa
7 | import librosa.display
8 | import numpy as np
9 | from PIL import Image
10 | from tqdm import tqdm
11 | 
12 | def load_sound_file(wav_name, mono=False, channel=0):
13 |     """
14 |     Loads a sound file.
15 | 
16 |     PARAMS
17 |     ======
18 |     wav_name (string) - path to the WAV file to open
19 |     mono (boolean) - downmix the signal to mono (if True) or keep all channels (False, default)
20 |     channel (integer) - which channel to load (defaults to 0); ignored when mono is True
21 | 
22 |     RETURNS
23 |     =======
24 |     signal (numpy array) - sound signal
25 |     sampling_rate (integer) - sampling rate detected in the file
26 |     """
27 |     multi_channel_data, sampling_rate = librosa.load(wav_name, sr=None, mono=mono)
28 |     signal = multi_channel_data if mono else np.array(multi_channel_data)[channel, :]   # librosa already returns a 1D signal when mono=True
29 | 
30 |     return signal, sampling_rate
31 | 
32 | def get_magnitude_scale(file, n_fft=1024, hop_length=512):
33 |     """
34 |     Get the magnitude scale (log-scaled STFT amplitude) from a WAV file.
35 | 
36 |     PARAMS
37 |     ======
38 |     file (string) - filepath to the location of the WAV file
39 |     n_fft (integer) - length of the windowed signal to compute the short Fourier transform on
40 |     hop_length (integer) - window increment when computing STFT
41 | 
42 |     RETURNS
43 |     =======
44 |     dB (ndarray) - the log-scaled amplitude of the sound file
45 |     """
46 |     # Load the sound data:
47 |     signal, sampling_rate = load_sound_file(file)
48 | 
49 |     # Compute the short-time Fourier transform of the signal:
50 |     stft = librosa.stft(signal, n_fft=n_fft, hop_length=hop_length)
51 | 
52 |     # Map the magnitude to a decibel scale:
53 |     dB = librosa.amplitude_to_db(np.abs(stft), ref=np.max)
54 | 
55 |     return dB
56 | 
57 | def extract_signal_features(signal, sr, n_mels=64, frames=5, n_fft=1024, hop_length=512):
58 |     """
59 |     Extract features from a sound signal, given a sampling rate sr. This function
60 |     computes a log-scaled Mel spectrogram (power of the signal) and then builds
61 |     N frames (where N = the frames argument passed to this function): each frame
62 |     is a sliding window in the temporal dimension.
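    For example, with the defaults n_mels=64 and frames=5, each feature vector has
    64 * 5 = 320 dimensions, built by concatenating 5 consecutive spectrogram columns;
    this is the same value (n_mels * frame) that the training script feeds to
    autoencoder_model() as its number of input dimensions.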
63 | 64 | PARAMS 65 | ====== 66 | signal (array of floats) - numpy array as returned by load_sound_file() 67 | sr (integer) - sampling rate of the signal 68 | n_mels (integer) - number of Mel buckets (default: 64) 69 | frames (integer) - number of sliding windows to use to slice the Mel spectrogram 70 | n_fft (integer) - length of the windowed signal to compute the short Fourier transform on 71 | hop_length (integer) - window increment when computing STFT 72 | """ 73 | 74 | # Compute a mel-scaled spectrogram: 75 | mel_spectrogram = librosa.feature.melspectrogram( 76 | y=signal, 77 | sr=sr, 78 | n_fft=n_fft, 79 | hop_length=hop_length, 80 | n_mels=n_mels 81 | ) 82 | 83 | # Convert to decibel (log scale for amplitude): 84 | log_mel_spectrogram = librosa.power_to_db(mel_spectrogram, ref=np.max) 85 | 86 | # Generate an array of vectors as features for the current signal: 87 | features_vector_size = log_mel_spectrogram.shape[1] - frames + 1 88 | 89 | # Skips short signals: 90 | dims = frames * n_mels 91 | if features_vector_size < 1: 92 | return np.empty((0, dims), np.float32) 93 | 94 | # Build N sliding windows (=frames) and concatenate them to build a feature vector: 95 | features = np.zeros((features_vector_size, dims), np.float32) 96 | for t in range(frames): 97 | features[:, n_mels * t: n_mels * (t + 1)] = log_mel_spectrogram[:, t:t + features_vector_size].T 98 | 99 | return features 100 | 101 | def generate_dataset(files_list, n_mels=64, frames=5, n_fft=1024, hop_length=512): 102 | """ 103 | Takes a list for WAV files as an input and generate a numpy array with 104 | the extracted features. 105 | 106 | PARAMS 107 | ====== 108 | files_list (list) - list of files to generate a dataset from 109 | n_mels (integer) - number of Mel buckets (default: 64) 110 | frames (integer) - number of sliding windows to use to slice the Mel 111 | spectrogram 112 | n_fft (integer) - length of the windowed signal to compute the short 113 | Fourier transform on 114 | hop_length (integer) - window increment when computing STFT 115 | 116 | RETURNS 117 | ======= 118 | dataset (numpy array) - dataset 119 | """ 120 | # Number of dimensions for each frame: 121 | dims = n_mels * frames 122 | 123 | for index in tqdm(range(len(files_list)), desc='Extracting features'): 124 | # Load signal 125 | signal, sr = load_sound_file(files_list[index]) 126 | 127 | # Extract features from this signal: 128 | features = extract_signal_features( 129 | signal, 130 | sr, 131 | n_mels=n_mels, 132 | frames=frames, 133 | n_fft=n_fft, 134 | hop_length=hop_length 135 | ) 136 | 137 | if index == 0: 138 | dataset = np.zeros((features.shape[0] * len(files_list), dims), np.float32) 139 | 140 | dataset[features.shape[0] * index : features.shape[0] * (index + 1), :] = features 141 | 142 | return dataset 143 | 144 | def scale_minmax(X, min=0.0, max=1.0): 145 | """ 146 | Minmax scaler for a numpy array 147 | 148 | PARAMS 149 | ====== 150 | X (numpy array) - array to scale 151 | min (float) - minimum value of the scaling range (default: 0.0) 152 | max (float) - maximum value of the scaling range (default: 1.0) 153 | """ 154 | X_std = (X - X.min()) / (X.max() - X.min()) 155 | X_scaled = X_std * (max - min) + min 156 | return X_scaled 157 | 158 | def generate_spectrograms(list_files, output_dir, n_mels=64, n_fft=1024, hop_length=512): 159 | """ 160 | Generate spectrograms pictures from a list of WAV files. Each sound 161 | file in WAV format is processed to generate a spectrogram that will 162 | be saved as a PNG file. 
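    The input files are expected to follow the .../<machine_id>/<sound_type>/<file>.wav
    layout, and each image is written to <output_dir>/<sound_type>/; these output
    subfolders are assumed to already exist, as this function does not create them.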
163 | 164 | PARAMS 165 | ====== 166 | list_files (list) - list of WAV files to process 167 | output_dir (string) - root directory to save the spectrogram to 168 | n_mels (integer) - number of Mel buckets (default: 64) 169 | n_fft (integer) - length of the windowed signal to compute the short Fourier transform on 170 | hop_length (integer) - window increment when computing STFT 171 | 172 | RETURNS 173 | ======= 174 | files (list) - list of spectrogram files (PNG format) 175 | """ 176 | files = [] 177 | 178 | # Loops through all files: 179 | for index in tqdm(range(len(list_files)), desc=f'Building spectrograms for {output_dir}'): 180 | # Building file name for the spectrogram PNG picture: 181 | file = list_files[index] 182 | path_components = file.split('/') 183 | 184 | # machine_id = id_00, id_02... 185 | # sound_type = normal or abnormal 186 | # wav_file is the name of the original sound file without the .wav extension 187 | machine_id, sound_type = path_components[-3], path_components[-2] 188 | wav_file = path_components[-1].split('.')[0] 189 | filename = sound_type + '-' + machine_id + '-' + wav_file + '.png' 190 | 191 | # Example: train/normal/normal-id_02-00000259.png: 192 | filename = os.path.join(output_dir, sound_type, filename) 193 | 194 | if not os.path.exists(filename): 195 | # Loading sound file and generate Mel spectrogram: 196 | signal, sr = load_sound_file(file) 197 | mels = librosa.feature.melspectrogram(y=signal, sr=sr, n_mels=n_mels, n_fft=n_fft, hop_length=hop_length) 198 | mels = librosa.power_to_db(mels, ref=np.max) 199 | 200 | # Preprocess the image: min-max, putting 201 | # low frequency at bottom and inverting to 202 | # match higher energy with black pixels: 203 | img = scale_minmax(mels, 0, 255).astype(np.uint8) 204 | img = np.flip(img, axis=0) 205 | img = 255 - img 206 | img = Image.fromarray(img) 207 | 208 | # Saving the picture generated to disk: 209 | img.save(filename) 210 | 211 | files.append(filename) 212 | 213 | return files 214 | -------------------------------------------------------------------------------- /tools/utils.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import hashlib 5 | import matplotlib.pyplot as plt 6 | import numpy as np 7 | import os 8 | import pandas as pd 9 | import random 10 | import seaborn as sns 11 | from tqdm import tqdm 12 | 13 | def md5(fname): 14 | """ 15 | This function builds an MD5 hash for the file passed as argument. 16 | 17 | PARAMS 18 | ====== 19 | fname (string) 20 | Full path and filename 21 | 22 | RETURNS 23 | ======= 24 | hash (string) 25 | The MD5 hash of the file 26 | """ 27 | filesize = os.stat(fname).st_size 28 | hash_md5 = hashlib.md5() 29 | with open(fname, "rb") as f: 30 | for chunk in tqdm(iter(lambda: f.read(4096), b""), total=filesize/4096): 31 | hash_md5.update(chunk) 32 | 33 | return hash_md5.hexdigest() 34 | 35 | def build_files_list(root_dir, abnormal_dir='abnormal', normal_dir='normal'): 36 | """ 37 | Generate a list of files located in the root dir. 38 | 39 | PARAMS 40 | ====== 41 | root_dir (string) 42 | Root directory to walk 43 | abnormal_dir (string) 44 | Directory where the abnormal files are located. 45 | Defaults to 'abnormal' 46 | normal_dir (string) 47 | Directory where the normal files are located. 
48 | Defaults to 'normal' 49 | 50 | RETURNS 51 | ======= 52 | normal_files (list) 53 | List of files in the normal directories 54 | abnormal_files (list) 55 | List of files in the abnormal directories 56 | """ 57 | normal_files = [] 58 | abnormal_files = [] 59 | 60 | # Loops through the directories to build a normal and an abnormal files list: 61 | for root, dirs, files in os.walk(top = os.path.join(root_dir)): 62 | for name in files: 63 | current_dir_type = root.split('/')[-1] 64 | if current_dir_type == abnormal_dir: 65 | abnormal_files.append(os.path.join(root, name)) 66 | if current_dir_type == normal_dir: 67 | normal_files.append(os.path.join(root, name)) 68 | 69 | return normal_files, abnormal_files 70 | 71 | 72 | def generate_files_list(root_dir, abnormal_dir='abnormal', normal_dir='normal'): 73 | """ 74 | Generate a list of files located in the root dir and sort test and train 75 | files and labels to be used by an autoencoder. This means that the train 76 | set only contains normal values, whereas the test set is balanced between 77 | both types. 78 | 79 | PARAMS 80 | ====== 81 | root_dir (string) 82 | Root directory to walk 83 | abnormal_dir (string) 84 | Directory where the abnormal files are located. 85 | Defaults to 'abnormal' 86 | normal_dir (string) 87 | Directory where the normal files are located. 88 | Defaults to 'normal' 89 | 90 | RETURNS 91 | ======= 92 | train_files (list) 93 | List of files to train with (only normal data) 94 | train_labels (list) 95 | List of labels (0s for normal) 96 | test_files (list) 97 | Balanced list of files with both normal and abnormal data 98 | test_labels (list) 99 | List of labels (0s for normal and 1s otherwise) 100 | """ 101 | normal_files = [] 102 | abnormal_files = [] 103 | 104 | # Loops through the directories to build a normal and an abnormal files list: 105 | for root, dirs, files in os.walk(top = os.path.join(root_dir)): 106 | for name in files: 107 | current_dir_type = root.split('/')[-1] 108 | if current_dir_type == abnormal_dir: 109 | abnormal_files.append(os.path.join(root, name)) 110 | if current_dir_type == normal_dir: 111 | normal_files.append(os.path.join(root, name)) 112 | 113 | # Shuffle the normal files in place: 114 | random.shuffle(normal_files) 115 | 116 | # The test files contains all the abnormal files and the same number of normal files: 117 | test_files = np.concatenate((normal_files[:len(abnormal_files)], abnormal_files), axis=0) 118 | test_labels = np.concatenate((np.zeros(len(abnormal_files)), np.ones(len(abnormal_files))), axis=0) 119 | 120 | # The train files contains all the remaining normal files: 121 | train_files = normal_files[len(abnormal_files):] 122 | train_labels = np.zeros(len(train_files)) 123 | 124 | return train_files, train_labels, test_files, test_labels 125 | 126 | def generate_error_types(df, ground_truth_col='Ground Truth', prediction_col='Prediction', normal_label=0.0, anomaly_label=1.0): 127 | """ 128 | Compute false positive and false negatives columns based on the prediction 129 | and ground truth columns from a dataframe. 130 | 131 | PARAMS 132 | ====== 133 | df (dataframe) 134 | Dataframe where the ground truth and prediction columns are available 135 | ground_truth_col (string) 136 | Column name for the ground truth values. Defaults to "Ground Truth" 137 | prediction_col (string) 138 | Column name for the predictied values. Defaults to "Prediction" 139 | normal_label (object) 140 | Value taken by a normal value. 
        Defaults to 0.0
141 |     anomaly_label (object)
142 |         Value taken by an abnormal value. Defaults to 1.0
143 | 
144 |     RETURNS
145 |     =======
146 |     df (dataframe)
147 |         An updated dataframe with 4 new binary columns for TP, TN, FP and FN.
148 |     """
149 |     df['TP'] = 0
150 |     df['TN'] = 0
151 |     df['FP'] = 0
152 |     df['FN'] = 0
153 |     df.loc[(df[ground_truth_col] == df[prediction_col]) & (df[ground_truth_col] == normal_label), 'TP'] = 1
154 |     df.loc[(df[ground_truth_col] == df[prediction_col]) & (df[ground_truth_col] == anomaly_label), 'TN'] = 1
155 |     df.loc[(df[ground_truth_col] != df[prediction_col]) & (df[ground_truth_col] == normal_label), 'FP'] = 1
156 |     df.loc[(df[ground_truth_col] != df[prediction_col]) & (df[ground_truth_col] == anomaly_label), 'FN'] = 1
157 | 
158 |     return df
159 | 
160 | def plot_curves(FP, FN, nb_samples, threshold_min, threshold_max, threshold_step):
161 |     """
162 |     Plot the number of false positive and false negative samples as a function of the threshold.
163 | 
164 |     PARAMS
165 |     ======
166 |     FP (dataframe)
167 |         Number of false positives depending on the threshold
168 |     FN (dataframe)
169 |         Number of false negatives depending on the threshold
170 |     threshold_min (float)
171 |         Minimum threshold to plot for
172 |     threshold_max (float)
173 |         Maximum threshold to plot for
174 |     threshold_step (float)
175 |         Threshold step to plot these curves
176 |     """
177 |     fig = plt.figure(figsize=(12, 6))
178 |     ax = fig.add_subplot(1, 1, 1)
179 | 
180 |     min_FN = np.argmin(FN)
181 |     min_FP = np.where(FP == np.min(FP))[0][-1]
182 |     plot_top = max(FP + FN) + 1
183 | 
184 |     # Grid customization:
185 |     major_ticks = np.arange(threshold_min, threshold_max, 1.0 * threshold_step)
186 |     minor_ticks = np.arange(threshold_min, threshold_max, 0.2 * threshold_step)
187 |     ax.set_xticks(major_ticks);
188 |     ax.set_xticks(minor_ticks, minor=True);
189 |     ax.grid(which='minor', alpha=0.5)
190 |     ax.grid(which='major', alpha=1.0, linewidth=1.0)
191 | 
192 |     # Plot false positives and false negatives curves
193 |     plt.plot(np.arange(threshold_min, threshold_max + threshold_step, threshold_step), FP, label='False positive', color='tab:red')
194 |     plt.plot(np.arange(threshold_min, threshold_max + threshold_step, threshold_step), FN, label='False negative', color='tab:green')
195 | 
196 |     # Finalize the plot with labels and legend:
197 |     plt.xlabel('Reconstruction error threshold (%)', fontsize=16)
198 |     plt.ylabel('# Samples', fontsize=16)
199 |     plt.legend()
200 | 
201 | def print_confusion_matrix(confusion_matrix, class_names, figsize = (4,3), fontsize=14):
202 |     """
203 |     Plots a confusion matrix, as returned by sklearn.metrics.confusion_matrix,
204 |     as a heatmap and returns the resulting figure.
205 | 
206 |     PARAMS
207 |     ======
208 |     confusion_matrix (numpy.ndarray)
209 |         The numpy.ndarray object returned from a call to
210 |         sklearn.metrics.confusion_matrix. Similarly constructed
211 |         ndarrays can also be used.
212 |     class_names (list)
213 |         An ordered list of class names, in the order they index the given
214 |         confusion matrix.
215 |     figsize (tuple)
216 |         A 2-long tuple, the first value determining the horizontal size of
217 |         the output figure, the second determining the vertical size.
218 |         Defaults to (4,3).
219 |     fontsize (int)
220 |         Font size for axes labels. Defaults to 14.
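    A typical call in this repository's context (illustrative only) would be:
    print_confusion_matrix(sklearn.metrics.confusion_matrix(y_true, y_pred), class_names=['normal', 'abnormal'])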
221 | 222 | RETURNS 223 | ======= 224 | matplotlib.figure.Figure: The resulting confusion matrix figure 225 | """ 226 | # Build a dataframe from the confusion matrix passed as argument: 227 | df_cm = pd.DataFrame(confusion_matrix, 228 | index=class_names, 229 | columns=class_names) 230 | 231 | # Plot the confusion matrix: 232 | fig = plt.figure(figsize=figsize) 233 | try: 234 | heatmap = sns.heatmap(df_cm, annot=True, fmt="d", annot_kws={"size": 16}, cmap='viridis') 235 | except ValueError: 236 | raise ValueError("Confusion matrix values must be integers.") 237 | 238 | # Figure customization: 239 | heatmap.yaxis.set_ticklabels(heatmap.yaxis.get_ticklabels(), rotation=0, ha='right', fontsize=fontsize) 240 | heatmap.xaxis.set_ticklabels(heatmap.xaxis.get_ticklabels(), rotation=45, ha='right', fontsize=fontsize) 241 | plt.ylabel('True label', fontsize=16) 242 | plt.xlabel('Predicted label', fontsize=16) 243 | 244 | return fig --------------------------------------------------------------------------------
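The end-to-end sketch below shows how the pieces above are typically combined for the autoencoder approach. It is illustrative only: the data path, the import style (it assumes the repository root is on the Python path) and the hyperparameters are assumptions, and the notebooks remain the reference implementation.

# Illustrative glue code only: paths, imports and hyperparameters are assumptions.
import numpy as np
from tools import sound_tools, utils
from autoencoder.model import autoencoder_model

n_mels, frames = 64, 5

# Build a training set from normal sounds only (the data path is a placeholder):
train_files, _, test_files, test_labels = utils.generate_files_list('data/fan/id_00')
train_data = sound_tools.generate_dataset(train_files, n_mels=n_mels, frames=frames)

# Train the autoencoder to reconstruct normal sounds:
model = autoencoder_model(n_mels * frames)
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(train_data, train_data, batch_size=128, epochs=10, validation_split=0.1)

# Score one test file: a high reconstruction error suggests an anomaly once a
# threshold has been chosen on the error distribution (see the notebooks):
signal, sr = sound_tools.load_sound_file(test_files[0])
features = sound_tools.extract_signal_features(signal, sr, n_mels=n_mels, frames=frames)
reconstruction_error = np.mean(np.square(features - model.predict(features)))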