├── .gitignore
├── ReadME.md
├── deeplearning_approach.ipynb
└── images
└── mlflow-shot.png
/.gitignore:
--------------------------------------------------------------------------------
1 | dataset/
2 | mlflow/
3 | .ipynb_checkpoints/
4 |
--------------------------------------------------------------------------------
/ReadME.md:
--------------------------------------------------------------------------------
1 | ## Workshop on Deep Learning (develop to deploy for starters)
2 |
3 | * _Switch between branches in order to see all developed versions of code_
4 |
5 | The purpose of this online series is to develop an image classifier using __[TensorFlow](https://www.tensorflow.org/)__ and __[Keras](https://keras.io/)__ on a fun dataset, __[Mosquito-on-human-skin](https://data.mendeley.com/datasets/zw4p9kj6nt/2)__; no __MNIST__ talk here :D.
6 |
7 | We will discuss why we need to log every single experiment and how to do it easily with __[MLflow](https://www.mlflow.org/)__ in just 2-3 lines of code (see the sketch below), and then use MLflow's more advanced features to expand our knowledge.
8 |
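As a taste of how little code the basic setup takes, here is a minimal sketch (the tracking URI and experiment name are placeholders, matching the workshop notebook):

```python
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")  # placeholder: your tracking server
mlflow.set_experiment("mosquito")                 # placeholder: your experiment name
mlflow.tensorflow.autolog()                       # logs params, metrics, and the model
```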
9 | 
10 |
11 | Here are some questions we'll try to answer:
12 |
13 | - *MLflow: what is it? (Components, configuration, storage options, etc.)*
14 | - *Why do we need it? When and how do we use it?*
15 | - *What are the benefits?*
16 |
17 | _The next step is to deploy the model using either __MLflow Models__ or __TensorFlow Serving__, two popular approaches._
18 |
19 | *Hands-on experience with basic optimization steps will be provided to help us understand why and when optimization is needed.*
20 |
21 | *It would be nice if we had enough time to develop an __API__ (Application Programming Interface) using __[FastAPI](https://fastapi.tiangolo.com/)__, and to use __[Docker](https://www.docker.com/)__ so that production maintenance becomes as simple as possible.*
22 |
23 | *What's next? Could it be __TFX__?*
24 |
25 | __A basic understanding of deep learning and machine learning is required.__
--------------------------------------------------------------------------------
/deeplearning_approach.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# [Link to dataset](https://data.mendeley.com/datasets/zw4p9kj6nt/2)"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [
15 | {
16 | "data": {
17 | "text/html": [
18 | "
✔️ 149 ms (2022-08-16T03:26:45/2022-08-16T03:26:45)
"
19 | ],
20 | "text/plain": [
21 | ""
22 | ]
23 | },
24 | "metadata": {},
25 | "output_type": "display_data"
26 | },
27 | {
28 | "name": "stdout",
29 | "output_type": "stream",
30 | "text": [
31 | "GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-3b49e2b8-87f0-c515-798b-3492ec05a183)\r\n",
32 | "GPU 1: NVIDIA GeForce GTX 1080 Ti (UUID: GPU-07628ed7-6ef8-fd67-7d03-cb6a89f72de4)\r\n"
33 | ]
34 | }
35 | ],
36 | "source": [
37 | "%load_ext autotime\n",
38 | "\n",
39 | "!nvidia-smi -L\n",
40 | "\n",
41 | "import os\n",
42 | "\n",
43 | "os.environ['CUDA_VISIBLE_DEVICES']='0'"
44 | ]
45 | },
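{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optional sanity check (a sketch added for clarity, not part of the original run): confirm TensorFlow sees only the GPU pinned above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"\n",
"# With CUDA_VISIBLE_DEVICES='0', this should list exactly one physical GPU.\n",
"tf.config.list_physical_devices('GPU')"
]
},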
46 | {
47 | "cell_type": "code",
48 | "execution_count": 2,
49 | "metadata": {},
50 | "outputs": [
51 | {
52 | "data": {
53 | "text/html": [
54 | "✔️ 2.7 s (2022-08-16T03:26:45/2022-08-16T03:26:48)
"
55 | ],
56 | "text/plain": [
57 | ""
58 | ]
59 | },
60 | "metadata": {},
61 | "output_type": "display_data"
62 | }
63 | ],
64 | "source": [
65 | "import numpy as np, tensorflow as tf, matplotlib.pyplot as plt\n",
66 | "from tensorflow import keras\n",
67 | "from tensorflow.keras.preprocessing.image import ImageDataGenerator\n",
68 | "from tensorflow.keras.preprocessing.image import load_img\n",
69 | "from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input\n",
70 | "from tensorflow.keras import layers\n",
71 | "from tensorflow.keras.losses import SparseCategoricalCrossentropy, CategoricalCrossentropy\n",
72 | "from tensorflow.keras.models import Model\n",
73 | "\n",
74 | "from sklearn.metrics import confusion_matrix\n",
75 | "import itertools, glob\n",
76 | "\n",
77 | "# Experiment tracking with mlflow\n",
78 | "import mlflow\n",
79 | "import mlflow.tensorflow as mltf"
80 | ]
81 | },
82 | {
83 | "cell_type": "markdown",
84 | "metadata": {},
85 | "source": [
86 | "```\n",
87 | "mkdir mlflow && mlflow server \\\n",
88 | " --backend-store-uri sqlite:///mlflow/mlflow.db \\\n",
89 | " --default-artifact-root ./mlflow/artifacts \\\n",
90 | " --host 0.0.0.0\n",
91 | "```"
92 | ]
93 | },
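{
"cell_type": "markdown",
"metadata": {},
"source": [
"`--backend-store-uri` is where MLflow records runs, parameters, and metrics (a SQLite file here); `--default-artifact-root` is where models and other artifacts land; `--host 0.0.0.0` exposes the UI on all interfaces. For purely local work you could also skip the server and point the client straight at the store (a sketch, not what this notebook does):\n",
"\n",
"```\n",
"mlflow.set_tracking_uri(\"sqlite:///mlflow/mlflow.db\")\n",
"```"
]
},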
94 | {
95 | "cell_type": "code",
96 | "execution_count": 3,
97 | "metadata": {},
98 | "outputs": [
99 | {
100 | "data": {
101 | "text/html": [
102 | "✔️ 405 ms (2022-08-16T03:26:48/2022-08-16T03:26:48)
"
103 | ],
104 | "text/plain": [
105 | ""
106 | ]
107 | },
108 | "metadata": {},
109 | "output_type": "display_data"
110 | },
111 | {
112 | "name": "stderr",
113 | "output_type": "stream",
114 | "text": [
115 | "2022/08/16 03:26:48 INFO mlflow.tracking.fluent: Experiment with name 'mosquito' does not exist. Creating a new experiment.\n"
116 | ]
117 | }
118 | ],
119 | "source": [
120 | "mlflow.set_tracking_uri(\"http://localhost:5000\")\n",
121 | "mlflow.set_experiment(\"mosquito\")\n",
122 | "mltf.autolog()"
123 | ]
124 | },
125 | {
126 | "cell_type": "code",
127 | "execution_count": 4,
128 | "metadata": {},
129 | "outputs": [
130 | {
131 | "data": {
132 | "text/html": [
133 | "✔️ 1.23 ms (2022-08-16T03:26:49/2022-08-16T03:26:49)
"
134 | ],
135 | "text/plain": [
136 | ""
137 | ]
138 | },
139 | "metadata": {},
140 | "output_type": "display_data"
141 | }
142 | ],
143 | "source": [
144 | "train_path = \"./dataset/data_splitting/Train/\"\n",
145 | "valid_path = \"./dataset/data_splitting/Test/\"\n",
146 | "test_path = \"./dataset/data_splitting/Pred/\""
147 | ]
148 | },
149 | {
150 | "cell_type": "markdown",
151 | "metadata": {},
152 | "source": [
153 | "# How to save ImageDataGenerator parameters in mlflow ?"
154 | ]
155 | },
156 | {
157 | "cell_type": "code",
158 | "execution_count": 5,
159 | "metadata": {},
160 | "outputs": [
161 | {
162 | "data": {
163 | "text/html": [
164 | "✔️ 678 µs (2022-08-16T03:26:49/2022-08-16T03:26:49)
"
165 | ],
166 | "text/plain": [
167 | ""
168 | ]
169 | },
170 | "metadata": {},
171 | "output_type": "display_data"
172 | }
173 | ],
174 | "source": [
175 | "# You can add more augmentations, if you want\n",
176 | "\n",
177 | "train_gen = ImageDataGenerator(\n",
178 | " rotation_range=0.2,\n",
179 | " horizontal_flip=True,\n",
180 | " vertical_flip=True,\n",
181 | " preprocessing_function=preprocess_input,\n",
182 | ")\n",
183 | "\n",
184 | "gen = ImageDataGenerator(\n",
185 | " preprocessing_function=preprocess_input\n",
186 | ")"
187 | ]
188 | },
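{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible answer to the question above (a sketch; `autolog()` does not capture generator settings): log the augmentation choices manually as run parameters. The key names below are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: log augmentation settings so they sit next to the autologged run.\n",
"# Do this inside the same run as training, e.g.\n",
"#   with mlflow.start_run():\n",
"#       mlflow.log_params(augmentation_params)\n",
"#       model.fit(...)\n",
"augmentation_params = {  # illustrative key names\n",
"    'rotation_range': 0.2,\n",
"    'horizontal_flip': True,\n",
"    'vertical_flip': True,\n",
"}\n",
"mlflow.log_params(augmentation_params)"
]
},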
189 | {
190 | "cell_type": "code",
191 | "execution_count": 6,
192 | "metadata": {},
193 | "outputs": [
194 | {
195 | "data": {
196 | "text/html": [
197 | "✔️ 2.94 ms (2022-08-16T03:26:49/2022-08-16T03:26:49)
"
198 | ],
199 | "text/plain": [
200 | ""
201 | ]
202 | },
203 | "metadata": {},
204 | "output_type": "display_data"
205 | },
206 | {
207 | "data": {
208 | "text/plain": [
209 | "['aegypti landing',\n",
210 | " 'aegypti smashed',\n",
211 | " 'albopictus landing',\n",
212 | " 'albopictus smashed',\n",
213 | " 'Culex landing',\n",
214 | " 'Culex smashed']"
215 | ]
216 | },
217 | "execution_count": 6,
218 | "metadata": {},
219 | "output_type": "execute_result"
220 | }
221 | ],
222 | "source": [
223 | "targetMap='''aegypti landing\n",
224 | "aegypti smashed\n",
225 | "albopictus landing\n",
226 | "albopictus smashed\n",
227 | "Culex landing\n",
228 | "Culex smashed'''.split('\\n')\n",
229 | "targetMap"
230 | ]
231 | },
232 | {
233 | "cell_type": "code",
234 | "execution_count": 7,
235 | "metadata": {},
236 | "outputs": [
237 | {
238 | "data": {
239 | "text/html": [
240 | "✔️ 608 µs (2022-08-16T03:26:49/2022-08-16T03:26:49)
"
241 | ],
242 | "text/plain": [
243 | ""
244 | ]
245 | },
246 | "metadata": {},
247 | "output_type": "display_data"
248 | }
249 | ],
250 | "source": [
251 | "# Hyper-Parameters\n",
252 | "IMG_SIZE = (224, 224)\n",
253 | "BATCH_SIZE = 32\n",
254 | "EPOCHS = 1\n",
255 | "NUM_CLASSES = len(targetMap)"
256 | ]
257 | },
258 | {
259 | "cell_type": "code",
260 | "execution_count": 8,
261 | "metadata": {},
262 | "outputs": [
263 | {
264 | "data": {
265 | "text/html": [
266 | "✔️ 626 ms (2022-08-16T03:26:49/2022-08-16T03:26:50)
"
267 | ],
268 | "text/plain": [
269 | ""
270 | ]
271 | },
272 | "metadata": {},
273 | "output_type": "display_data"
274 | },
275 | {
276 | "name": "stdout",
277 | "output_type": "stream",
278 | "text": [
279 | "Found 4200 images belonging to 6 classes.\n",
280 | "Found 1799 images belonging to 6 classes.\n",
281 | "Found 3600 images belonging to 6 classes.\n"
282 | ]
283 | }
284 | ],
285 | "source": [
286 | "train = train_gen.flow_from_directory(train_path, target_size=IMG_SIZE,\n",
287 | " classes=targetMap, class_mode='categorical', batch_size=BATCH_SIZE)\n",
288 | "valid = gen.flow_from_directory(valid_path, target_size=IMG_SIZE,\n",
289 | " classes=targetMap, class_mode='categorical', batch_size=BATCH_SIZE)\n",
290 | "test = gen.flow_from_directory(test_path, target_size=IMG_SIZE,\n",
291 | " classes=targetMap, class_mode='categorical', batch_size=BATCH_SIZE)"
292 | ]
293 | },
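{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optional sanity check (a sketch): inspect the folder-to-label mapping and the shape of one batch from the training generator."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# class_indices maps directory name -> integer index behind the one-hot targets.\n",
"print(train.class_indices)\n",
"\n",
"# One batch: images of shape (batch, 224, 224, 3), one-hot labels (batch, 6).\n",
"xb, yb = next(train)\n",
"print(xb.shape, yb.shape)"
]
},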
294 | {
295 | "cell_type": "code",
296 | "execution_count": 9,
297 | "metadata": {},
298 | "outputs": [
299 | {
300 | "data": {
301 | "text/html": [
302 | "✔️ 1.31 s (2022-08-16T03:26:50/2022-08-16T03:26:51)
"
303 | ],
304 | "text/plain": [
305 | ""
306 | ]
307 | },
308 | "metadata": {},
309 | "output_type": "display_data"
310 | },
311 | {
312 | "name": "stdout",
313 | "output_type": "stream",
314 | "text": [
315 | "Model: \"model\"\n",
316 | "_________________________________________________________________\n",
317 | " Layer (type) Output Shape Param # \n",
318 | "=================================================================\n",
319 | " input_1 (InputLayer) [(None, 224, 224, 3)] 0 \n",
320 | " \n",
321 | " block1_conv1 (Conv2D) (None, 224, 224, 64) 1792 \n",
322 | " \n",
323 | " block1_conv2 (Conv2D) (None, 224, 224, 64) 36928 \n",
324 | " \n",
325 | " block1_pool (MaxPooling2D) (None, 112, 112, 64) 0 \n",
326 | " \n",
327 | " block2_conv1 (Conv2D) (None, 112, 112, 128) 73856 \n",
328 | " \n",
329 | " block2_conv2 (Conv2D) (None, 112, 112, 128) 147584 \n",
330 | " \n",
331 | " block2_pool (MaxPooling2D) (None, 56, 56, 128) 0 \n",
332 | " \n",
333 | " block3_conv1 (Conv2D) (None, 56, 56, 256) 295168 \n",
334 | " \n",
335 | " block3_conv2 (Conv2D) (None, 56, 56, 256) 590080 \n",
336 | " \n",
337 | " block3_conv3 (Conv2D) (None, 56, 56, 256) 590080 \n",
338 | " \n",
339 | " block3_pool (MaxPooling2D) (None, 28, 28, 256) 0 \n",
340 | " \n",
341 | " block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160 \n",
342 | " \n",
343 | " block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808 \n",
344 | " \n",
345 | " block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808 \n",
346 | " \n",
347 | " block4_pool (MaxPooling2D) (None, 14, 14, 512) 0 \n",
348 | " \n",
349 | " block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808 \n",
350 | " \n",
351 | " block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808 \n",
352 | " \n",
353 | " block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808 \n",
354 | " \n",
355 | " block5_pool (MaxPooling2D) (None, 7, 7, 512) 0 \n",
356 | " \n",
357 | "=================================================================\n",
358 | "Total params: 14,714,688\n",
359 | "Trainable params: 0\n",
360 | "Non-trainable params: 14,714,688\n",
361 | "_________________________________________________________________\n"
362 | ]
363 | }
364 | ],
365 | "source": [
366 | "# Pretrained Model\n",
367 | "\n",
368 | "ptm = VGG16(input_shape=list(IMG_SIZE)+[3], weights='imagenet', include_top=False)\n",
369 | "ptm.trainable = False\n",
370 | "\n",
371 | "\n",
372 | "vgg = Model(ptm.input, ptm.output)\n",
373 | "vgg.summary()"
374 | ]
375 | },
376 | {
377 | "cell_type": "code",
378 | "execution_count": 10,
379 | "metadata": {},
380 | "outputs": [
381 | {
382 | "data": {
383 | "text/html": [
384 | "✔️ 141 ms (2022-08-16T03:26:51/2022-08-16T03:26:51)
"
385 | ],
386 | "text/plain": [
387 | ""
388 | ]
389 | },
390 | "metadata": {},
391 | "output_type": "display_data"
392 | },
393 | {
394 | "name": "stdout",
395 | "output_type": "stream",
396 | "text": [
397 | "Model: \"model_1\"\n",
398 | "_________________________________________________________________\n",
399 | " Layer (type) Output Shape Param # \n",
400 | "=================================================================\n",
401 | " input_2 (InputLayer) [(None, 224, 224, 3)] 0 \n",
402 | " \n",
403 | " model (Functional) (None, 7, 7, 512) 14714688 \n",
404 | " \n",
405 | " flatten (Flatten) (None, 25088) 0 \n",
406 | " \n",
407 | " dense (Dense) (None, 128) 3211392 \n",
408 | " \n",
409 | " dense_1 (Dense) (None, 64) 8256 \n",
410 | " \n",
411 | " dropout (Dropout) (None, 64) 0 \n",
412 | " \n",
413 | " dense_2 (Dense) (None, 32) 2080 \n",
414 | " \n",
415 | " dropout_1 (Dropout) (None, 32) 0 \n",
416 | " \n",
417 | " dense_3 (Dense) (None, 6) 198 \n",
418 | " \n",
419 | "=================================================================\n",
420 | "Total params: 17,936,614\n",
421 | "Trainable params: 3,221,926\n",
422 | "Non-trainable params: 14,714,688\n",
423 | "_________________________________________________________________\n"
424 | ]
425 | }
426 | ],
427 | "source": [
428 | "i = layers.Input(shape=(IMG_SIZE[:]+(3,)))\n",
429 | "ptm = vgg(i)\n",
430 | "\n",
431 | "# p = layers.Rescaling(1./255)(i)\n",
432 | "# ptm = vgg(p)\n",
433 | "\n",
434 | "x = layers.Flatten()(ptm)\n",
435 | "x = layers.Dense(128, activation='relu')(x)\n",
436 | "x = layers.Dense(64, activation='relu')(x)\n",
437 | "x = layers.Dropout(0.2)(x)\n",
438 | "x = layers.Dense(32, activation='relu')(x)\n",
439 | "x = layers.Dropout(0.1)(x)\n",
440 | "x = layers.Dense(NUM_CLASSES, activation='softmax')(x)\n",
441 | "\n",
442 | "model = Model(i, x)\n",
443 | "\n",
444 | "model.summary()"
445 | ]
446 | },
447 | {
448 | "cell_type": "markdown",
449 | "metadata": {},
450 | "source": [
451 | "# How to save Compile parameters in mlflow ?"
452 | ]
453 | },
454 | {
455 | "cell_type": "code",
456 | "execution_count": 11,
457 | "metadata": {},
458 | "outputs": [
459 | {
460 | "data": {
461 | "text/html": [
462 | "✔️ 57.5 s (2022-08-16T03:26:52/2022-08-16T03:27:49)
"
463 | ],
464 | "text/plain": [
465 | ""
466 | ]
467 | },
468 | "metadata": {},
469 | "output_type": "display_data"
470 | },
471 | {
472 | "name": "stderr",
473 | "output_type": "stream",
474 | "text": [
475 | "2022/08/16 03:26:52 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'e22e6c5370374437bb3fc8f3a008b2f4', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current tensorflow workflow\n"
476 | ]
477 | },
478 | {
479 | "name": "stdout",
480 | "output_type": "stream",
481 | "text": [
482 | "132/132 [==============================] - 40s 276ms/step - loss: 2.4872 - acc: 0.4190 - val_loss: 0.9443 - val_acc: 0.6643\n"
483 | ]
484 | },
485 | {
486 | "name": "stderr",
487 | "output_type": "stream",
488 | "text": [
489 | "WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 5 of 13). These functions will not be directly callable after loading.\n"
490 | ]
491 | },
492 | {
493 | "name": "stdout",
494 | "output_type": "stream",
495 | "text": [
496 | "INFO:tensorflow:Assets written to: /tmp/tmp5rf1a5km/model/data/model/assets\n"
497 | ]
498 | },
499 | {
500 | "name": "stderr",
501 | "output_type": "stream",
502 | "text": [
503 | "INFO:tensorflow:Assets written to: /tmp/tmp5rf1a5km/model/data/model/assets\n",
504 | "2022/08/16 03:27:49 WARNING mlflow.utils.environment: Encountered an unexpected error while inferring pip requirements (model URI: /tmp/tmp5rf1a5km/model, flavor: keras), fall back to return ['tensorflow==2.9.0', 'keras==2.9.0']. Set logging level to DEBUG to see the full traceback.\n"
505 | ]
506 | }
507 | ],
508 | "source": [
509 | "model.compile(optimizer=\"adam\", loss=\"categorical_crossentropy\", metrics=[\"acc\"])\n",
510 | "\n",
511 | "h = model.fit(\n",
512 | " train,\n",
513 | " validation_data=valid,\n",
514 | " epochs=EPOCHS\n",
515 | ")"
516 | ]
517 | },
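{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible answer to the compile-parameters question (a sketch; recent `autolog()` versions may already capture the optimizer name and learning rate): resume the autologged run and attach the compile-time choices explicitly."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch with illustrative key names; mlflow.last_active_run() needs mlflow >= 1.25.\n",
"run_id = mlflow.last_active_run().info.run_id\n",
"with mlflow.start_run(run_id=run_id):\n",
"    mlflow.log_params({\n",
"        'loss': 'categorical_crossentropy',\n",
"        'metrics': 'acc',\n",
"    })"
]
},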
518 | {
519 | "cell_type": "code",
520 | "execution_count": null,
521 | "metadata": {},
522 | "outputs": [],
523 | "source": []
524 | }
525 | ],
526 | "metadata": {
527 | "kernelspec": {
528 | "display_name": "tf2.9",
529 | "language": "python",
530 | "name": "tf2.9"
531 | },
532 | "language_info": {
533 | "codemirror_mode": {
534 | "name": "ipython",
535 | "version": 3
536 | },
537 | "file_extension": ".py",
538 | "mimetype": "text/x-python",
539 | "name": "python",
540 | "nbconvert_exporter": "python",
541 | "pygments_lexer": "ipython3",
542 | "version": "3.8.10"
543 | }
544 | },
545 | "nbformat": 4,
546 | "nbformat_minor": 4
547 | }
548 |
--------------------------------------------------------------------------------
/images/mlflow-shot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pykeras/DL_develop_to_deploy_workshop/8b753eae10465ff95e76f0bc9d4e68164a25c7fb/images/mlflow-shot.png
--------------------------------------------------------------------------------