├── .gitignore
├── ReadME.md
├── deeplearning_approach.ipynb
└── images
    └── mlflow-shot.png

/.gitignore:
--------------------------------------------------------------------------------
1 | dataset/
2 | mlflow/
3 | .ipynb_checkpoints/
4 | 
--------------------------------------------------------------------------------
/ReadME.md:
--------------------------------------------------------------------------------
 1 | ## Workshop on Deep Learning (develop to deploy for starters)
 2 | 
 3 | * _Switch between branches to see all developed versions of the code_
 4 | 
 5 | The purpose of this online series is to develop an image classifier using __[TensorFlow](https://www.tensorflow.org/)__ and __[Keras](https://keras.io/)__ on a fun dataset, __[Mosquito-on-human-skin](https://data.mendeley.com/datasets/zw4p9kj6nt/2)__; we don't talk __MNIST__ here :D.
 6 | 
 7 | We will discuss why we need to log every single experiment and how to do it easily with __[MLflow](https://www.mlflow.org/)__ in 2-3 lines of code, and then we will use MLflow's more advanced features to expand our knowledge.
 8 | 
 9 | ![](./images/mlflow-shot.png)
10 | 
11 | Here are some questions we'll try to answer:
12 | 
13 | - *MLflow: what is it? (Components, configuration, storage options, etc.)*
14 | - *Why do we need it? When and how do we use it?*
15 | - *What are the benefits?*
16 | 
17 | _The next step is to deploy the model using either __MLflow Models__ or __TensorFlow Serving__, two popular approaches._
18 | 
19 | *Hands-on experience with basic optimization steps will be provided to help us understand why and when we need optimization.*
20 | 
21 | *If we have enough time, we will also develop an __API__ (Application Programming Interface) using __[FastAPI](https://fastapi.tiangolo.com/)__ and use __[Docker](https://www.docker.com/)__ so that maintenance in production becomes as simple as possible.*
22 | 
23 | *What's next? Could it be __TFX__?*
24 | 
25 | __A basic understanding of deep learning and machine learning is required.__
--------------------------------------------------------------------------------
/deeplearning_approach.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# [Link to dataset](https://data.mendeley.com/datasets/zw4p9kj6nt/2)"
  8 |    ]
  9 |   },
 10 |   {
 11 |    "cell_type": "code",
 12 |    "execution_count": 1,
 13 |    "metadata": {},
 14 |    "outputs": [
 15 |     {
 16 |      "data": {
 17 |       "text/html": [
 18 |        "
✔️ 149 ms (2022-08-16T03:26:45/2022-08-16T03:26:45)
" 19 | ], 20 | "text/plain": [ 21 | "" 22 | ] 23 | }, 24 | "metadata": {}, 25 | "output_type": "display_data" 26 | }, 27 | { 28 | "name": "stdout", 29 | "output_type": "stream", 30 | "text": [ 31 | "GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-3b49e2b8-87f0-c515-798b-3492ec05a183)\r\n", 32 | "GPU 1: NVIDIA GeForce GTX 1080 Ti (UUID: GPU-07628ed7-6ef8-fd67-7d03-cb6a89f72de4)\r\n" 33 | ] 34 | } 35 | ], 36 | "source": [ 37 | "%load_ext autotime\n", 38 | "\n", 39 | "!nvidia-smi -L\n", 40 | "\n", 41 | "import os\n", 42 | "\n", 43 | "os.environ['CUDA_VISIBLE_DEVICES']='0'" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": 2, 49 | "metadata": {}, 50 | "outputs": [ 51 | { 52 | "data": { 53 | "text/html": [ 54 | "
✔️ 2.7 s (2022-08-16T03:26:45/2022-08-16T03:26:48)
" 55 | ], 56 | "text/plain": [ 57 | "" 58 | ] 59 | }, 60 | "metadata": {}, 61 | "output_type": "display_data" 62 | } 63 | ], 64 | "source": [ 65 | "import numpy as np, tensorflow as tf, matplotlib.pyplot as plt\n", 66 | "from tensorflow import keras\n", 67 | "from tensorflow.keras.preprocessing.image import ImageDataGenerator\n", 68 | "from tensorflow.keras.preprocessing.image import load_img\n", 69 | "from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input\n", 70 | "from tensorflow.keras import layers\n", 71 | "from tensorflow.keras.losses import SparseCategoricalCrossentropy, CategoricalCrossentropy\n", 72 | "from tensorflow.keras.models import Model\n", 73 | "\n", 74 | "from sklearn.metrics import confusion_matrix\n", 75 | "import itertools, glob\n", 76 | "\n", 77 | "# Experiment tracking with mlflow\n", 78 | "import mlflow\n", 79 | "import mlflow.tensorflow as mltf" 80 | ] 81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "metadata": {}, 85 | "source": [ 86 | "```\n", 87 | "mkdir mlflow && mlflow server \\\n", 88 | " --backend-store-uri sqlite:///mlflow/mlflow.db \\\n", 89 | " --default-artifact-root ./mlflow/artifacts \\\n", 90 | " --host 0.0.0.0\n", 91 | "```" 92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": 3, 97 | "metadata": {}, 98 | "outputs": [ 99 | { 100 | "data": { 101 | "text/html": [ 102 | "
✔️ 405 ms (2022-08-16T03:26:48/2022-08-16T03:26:48)
" 103 | ], 104 | "text/plain": [ 105 | "" 106 | ] 107 | }, 108 | "metadata": {}, 109 | "output_type": "display_data" 110 | }, 111 | { 112 | "name": "stderr", 113 | "output_type": "stream", 114 | "text": [ 115 | "2022/08/16 03:26:48 INFO mlflow.tracking.fluent: Experiment with name 'mosquito' does not exist. Creating a new experiment.\n" 116 | ] 117 | } 118 | ], 119 | "source": [ 120 | "mlflow.set_tracking_uri(\"http://localhost:5000\")\n", 121 | "mlflow.set_experiment(\"mosquito\")\n", 122 | "mltf.autolog()" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": 4, 128 | "metadata": {}, 129 | "outputs": [ 130 | { 131 | "data": { 132 | "text/html": [ 133 | "
✔️ 1.23 ms (2022-08-16T03:26:49/2022-08-16T03:26:49)
" 134 | ], 135 | "text/plain": [ 136 | "" 137 | ] 138 | }, 139 | "metadata": {}, 140 | "output_type": "display_data" 141 | } 142 | ], 143 | "source": [ 144 | "train_path = \"./dataset/data_splitting/Train/\"\n", 145 | "valid_path = \"./dataset/data_splitting/Test/\"\n", 146 | "test_path = \"./dataset/data_splitting/Pred/\"" 147 | ] 148 | }, 149 | { 150 | "cell_type": "markdown", 151 | "metadata": {}, 152 | "source": [ 153 | "# How to save ImageDataGenerator parameters in mlflow ?" 154 | ] 155 | }, 156 | { 157 | "cell_type": "code", 158 | "execution_count": 5, 159 | "metadata": {}, 160 | "outputs": [ 161 | { 162 | "data": { 163 | "text/html": [ 164 | "
✔️ 678 µs (2022-08-16T03:26:49/2022-08-16T03:26:49)
" 165 | ], 166 | "text/plain": [ 167 | "" 168 | ] 169 | }, 170 | "metadata": {}, 171 | "output_type": "display_data" 172 | } 173 | ], 174 | "source": [ 175 | "# You can add more augmentations, if you want\n", 176 | "\n", 177 | "train_gen = ImageDataGenerator(\n", 178 | " rotation_range=0.2,\n", 179 | " horizontal_flip=True,\n", 180 | " vertical_flip=True,\n", 181 | " preprocessing_function=preprocess_input,\n", 182 | ")\n", 183 | "\n", 184 | "gen = ImageDataGenerator(\n", 185 | " preprocessing_function=preprocess_input\n", 186 | ")" 187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": 6, 192 | "metadata": {}, 193 | "outputs": [ 194 | { 195 | "data": { 196 | "text/html": [ 197 | "
✔️ 2.94 ms (2022-08-16T03:26:49/2022-08-16T03:26:49)
" 198 | ], 199 | "text/plain": [ 200 | "" 201 | ] 202 | }, 203 | "metadata": {}, 204 | "output_type": "display_data" 205 | }, 206 | { 207 | "data": { 208 | "text/plain": [ 209 | "['aegypti landing',\n", 210 | " 'aegypti smashed',\n", 211 | " 'albopictus landing',\n", 212 | " 'albopictus smashed',\n", 213 | " 'Culex landing',\n", 214 | " 'Culex smashed']" 215 | ] 216 | }, 217 | "execution_count": 6, 218 | "metadata": {}, 219 | "output_type": "execute_result" 220 | } 221 | ], 222 | "source": [ 223 | "targetMap='''aegypti landing\n", 224 | "aegypti smashed\n", 225 | "albopictus landing\n", 226 | "albopictus smashed\n", 227 | "Culex landing\n", 228 | "Culex smashed'''.split('\\n')\n", 229 | "targetMap" 230 | ] 231 | }, 232 | { 233 | "cell_type": "code", 234 | "execution_count": 7, 235 | "metadata": {}, 236 | "outputs": [ 237 | { 238 | "data": { 239 | "text/html": [ 240 | "
✔️ 608 µs (2022-08-16T03:26:49/2022-08-16T03:26:49)
" 241 | ], 242 | "text/plain": [ 243 | "" 244 | ] 245 | }, 246 | "metadata": {}, 247 | "output_type": "display_data" 248 | } 249 | ], 250 | "source": [ 251 | "# Hyper-Parameters\n", 252 | "IMG_SIZE = (224, 224)\n", 253 | "BATCH_SIZE = 32\n", 254 | "EPOCHS = 1\n", 255 | "NUM_CLASSES = len(targetMap)" 256 | ] 257 | }, 258 | { 259 | "cell_type": "code", 260 | "execution_count": 8, 261 | "metadata": {}, 262 | "outputs": [ 263 | { 264 | "data": { 265 | "text/html": [ 266 | "
✔️ 626 ms (2022-08-16T03:26:49/2022-08-16T03:26:50)
" 267 | ], 268 | "text/plain": [ 269 | "" 270 | ] 271 | }, 272 | "metadata": {}, 273 | "output_type": "display_data" 274 | }, 275 | { 276 | "name": "stdout", 277 | "output_type": "stream", 278 | "text": [ 279 | "Found 4200 images belonging to 6 classes.\n", 280 | "Found 1799 images belonging to 6 classes.\n", 281 | "Found 3600 images belonging to 6 classes.\n" 282 | ] 283 | } 284 | ], 285 | "source": [ 286 | "train = train_gen.flow_from_directory(train_path, target_size=IMG_SIZE,\n", 287 | " classes=targetMap, class_mode='categorical', batch_size=BATCH_SIZE)\n", 288 | "valid = gen.flow_from_directory(valid_path, target_size=IMG_SIZE,\n", 289 | " classes=targetMap, class_mode='categorical', batch_size=BATCH_SIZE)\n", 290 | "test = gen.flow_from_directory(test_path, target_size=IMG_SIZE,\n", 291 | " classes=targetMap, class_mode='categorical', batch_size=BATCH_SIZE)" 292 | ] 293 | }, 294 | { 295 | "cell_type": "code", 296 | "execution_count": 9, 297 | "metadata": {}, 298 | "outputs": [ 299 | { 300 | "data": { 301 | "text/html": [ 302 | "
✔️ 1.31 s (2022-08-16T03:26:50/2022-08-16T03:26:51)
" 303 | ], 304 | "text/plain": [ 305 | "" 306 | ] 307 | }, 308 | "metadata": {}, 309 | "output_type": "display_data" 310 | }, 311 | { 312 | "name": "stdout", 313 | "output_type": "stream", 314 | "text": [ 315 | "Model: \"model\"\n", 316 | "_________________________________________________________________\n", 317 | " Layer (type) Output Shape Param # \n", 318 | "=================================================================\n", 319 | " input_1 (InputLayer) [(None, 224, 224, 3)] 0 \n", 320 | " \n", 321 | " block1_conv1 (Conv2D) (None, 224, 224, 64) 1792 \n", 322 | " \n", 323 | " block1_conv2 (Conv2D) (None, 224, 224, 64) 36928 \n", 324 | " \n", 325 | " block1_pool (MaxPooling2D) (None, 112, 112, 64) 0 \n", 326 | " \n", 327 | " block2_conv1 (Conv2D) (None, 112, 112, 128) 73856 \n", 328 | " \n", 329 | " block2_conv2 (Conv2D) (None, 112, 112, 128) 147584 \n", 330 | " \n", 331 | " block2_pool (MaxPooling2D) (None, 56, 56, 128) 0 \n", 332 | " \n", 333 | " block3_conv1 (Conv2D) (None, 56, 56, 256) 295168 \n", 334 | " \n", 335 | " block3_conv2 (Conv2D) (None, 56, 56, 256) 590080 \n", 336 | " \n", 337 | " block3_conv3 (Conv2D) (None, 56, 56, 256) 590080 \n", 338 | " \n", 339 | " block3_pool (MaxPooling2D) (None, 28, 28, 256) 0 \n", 340 | " \n", 341 | " block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160 \n", 342 | " \n", 343 | " block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808 \n", 344 | " \n", 345 | " block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808 \n", 346 | " \n", 347 | " block4_pool (MaxPooling2D) (None, 14, 14, 512) 0 \n", 348 | " \n", 349 | " block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808 \n", 350 | " \n", 351 | " block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808 \n", 352 | " \n", 353 | " block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808 \n", 354 | " \n", 355 | " block5_pool (MaxPooling2D) (None, 7, 7, 512) 0 \n", 356 | " \n", 357 | "=================================================================\n", 358 | "Total params: 14,714,688\n", 359 | "Trainable params: 0\n", 360 | "Non-trainable params: 14,714,688\n", 361 | "_________________________________________________________________\n" 362 | ] 363 | } 364 | ], 365 | "source": [ 366 | "# Pretrained Model\n", 367 | "\n", 368 | "ptm = VGG16(input_shape=list(IMG_SIZE)+[3], weights='imagenet', include_top=False)\n", 369 | "ptm.trainable = False\n", 370 | "\n", 371 | "\n", 372 | "vgg = Model(ptm.input, ptm.output)\n", 373 | "vgg.summary()" 374 | ] 375 | }, 376 | { 377 | "cell_type": "code", 378 | "execution_count": 10, 379 | "metadata": {}, 380 | "outputs": [ 381 | { 382 | "data": { 383 | "text/html": [ 384 | "
✔️ 141 ms (2022-08-16T03:26:51/2022-08-16T03:26:51)
" 385 | ], 386 | "text/plain": [ 387 | "" 388 | ] 389 | }, 390 | "metadata": {}, 391 | "output_type": "display_data" 392 | }, 393 | { 394 | "name": "stdout", 395 | "output_type": "stream", 396 | "text": [ 397 | "Model: \"model_1\"\n", 398 | "_________________________________________________________________\n", 399 | " Layer (type) Output Shape Param # \n", 400 | "=================================================================\n", 401 | " input_2 (InputLayer) [(None, 224, 224, 3)] 0 \n", 402 | " \n", 403 | " model (Functional) (None, 7, 7, 512) 14714688 \n", 404 | " \n", 405 | " flatten (Flatten) (None, 25088) 0 \n", 406 | " \n", 407 | " dense (Dense) (None, 128) 3211392 \n", 408 | " \n", 409 | " dense_1 (Dense) (None, 64) 8256 \n", 410 | " \n", 411 | " dropout (Dropout) (None, 64) 0 \n", 412 | " \n", 413 | " dense_2 (Dense) (None, 32) 2080 \n", 414 | " \n", 415 | " dropout_1 (Dropout) (None, 32) 0 \n", 416 | " \n", 417 | " dense_3 (Dense) (None, 6) 198 \n", 418 | " \n", 419 | "=================================================================\n", 420 | "Total params: 17,936,614\n", 421 | "Trainable params: 3,221,926\n", 422 | "Non-trainable params: 14,714,688\n", 423 | "_________________________________________________________________\n" 424 | ] 425 | } 426 | ], 427 | "source": [ 428 | "i = layers.Input(shape=(IMG_SIZE[:]+(3,)))\n", 429 | "ptm = vgg(i)\n", 430 | "\n", 431 | "# p = layers.Rescaling(1./255)(i)\n", 432 | "# ptm = vgg(p)\n", 433 | "\n", 434 | "x = layers.Flatten()(ptm)\n", 435 | "x = layers.Dense(128, activation='relu')(x)\n", 436 | "x = layers.Dense(64, activation='relu')(x)\n", 437 | "x = layers.Dropout(0.2)(x)\n", 438 | "x = layers.Dense(32, activation='relu')(x)\n", 439 | "x = layers.Dropout(0.1)(x)\n", 440 | "x = layers.Dense(NUM_CLASSES, activation='softmax')(x)\n", 441 | "\n", 442 | "model = Model(i, x)\n", 443 | "\n", 444 | "model.summary()" 445 | ] 446 | }, 447 | { 448 | "cell_type": "markdown", 449 | "metadata": {}, 450 | "source": [ 451 | "# How to save Compile parameters in mlflow ?" 452 | ] 453 | }, 454 | { 455 | "cell_type": "code", 456 | "execution_count": 11, 457 | "metadata": {}, 458 | "outputs": [ 459 | { 460 | "data": { 461 | "text/html": [ 462 | "
✔️ 57.5 s (2022-08-16T03:26:52/2022-08-16T03:27:49)
" 463 | ], 464 | "text/plain": [ 465 | "" 466 | ] 467 | }, 468 | "metadata": {}, 469 | "output_type": "display_data" 470 | }, 471 | { 472 | "name": "stderr", 473 | "output_type": "stream", 474 | "text": [ 475 | "2022/08/16 03:26:52 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'e22e6c5370374437bb3fc8f3a008b2f4', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current tensorflow workflow\n" 476 | ] 477 | }, 478 | { 479 | "name": "stdout", 480 | "output_type": "stream", 481 | "text": [ 482 | "132/132 [==============================] - 40s 276ms/step - loss: 2.4872 - acc: 0.4190 - val_loss: 0.9443 - val_acc: 0.6643\n" 483 | ] 484 | }, 485 | { 486 | "name": "stderr", 487 | "output_type": "stream", 488 | "text": [ 489 | "WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 5 of 13). These functions will not be directly callable after loading.\n" 490 | ] 491 | }, 492 | { 493 | "name": "stdout", 494 | "output_type": "stream", 495 | "text": [ 496 | "INFO:tensorflow:Assets written to: /tmp/tmp5rf1a5km/model/data/model/assets\n" 497 | ] 498 | }, 499 | { 500 | "name": "stderr", 501 | "output_type": "stream", 502 | "text": [ 503 | "INFO:tensorflow:Assets written to: /tmp/tmp5rf1a5km/model/data/model/assets\n", 504 | "2022/08/16 03:27:49 WARNING mlflow.utils.environment: Encountered an unexpected error while inferring pip requirements (model URI: /tmp/tmp5rf1a5km/model, flavor: keras), fall back to return ['tensorflow==2.9.0', 'keras==2.9.0']. Set logging level to DEBUG to see the full traceback.\n" 505 | ] 506 | } 507 | ], 508 | "source": [ 509 | "model.compile(optimizer=\"adam\", loss=\"categorical_crossentropy\", metrics=[\"acc\"])\n", 510 | "\n", 511 | "h = model.fit(\n", 512 | " train,\n", 513 | " validation_data=valid,\n", 514 | " epochs=EPOCHS\n", 515 | ")" 516 | ] 517 | }, 518 | { 519 | "cell_type": "code", 520 | "execution_count": null, 521 | "metadata": {}, 522 | "outputs": [], 523 | "source": [] 524 | } 525 | ], 526 | "metadata": { 527 | "kernelspec": { 528 | "display_name": "tf2.9", 529 | "language": "python", 530 | "name": "tf2.9" 531 | }, 532 | "language_info": { 533 | "codemirror_mode": { 534 | "name": "ipython", 535 | "version": 3 536 | }, 537 | "file_extension": ".py", 538 | "mimetype": "text/x-python", 539 | "name": "python", 540 | "nbconvert_exporter": "python", 541 | "pygments_lexer": "ipython3", 542 | "version": "3.8.10" 543 | } 544 | }, 545 | "nbformat": 4, 546 | "nbformat_minor": 4 547 | } 548 | -------------------------------------------------------------------------------- /images/mlflow-shot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pykeras/DL_develop_to_deploy_workshop/8b753eae10465ff95e76f0bc9d4e68164a25c7fb/images/mlflow-shot.png --------------------------------------------------------------------------------