├── example.png
├── LICENSE
├── README.md
└── encoder_decoder.ipynb
/example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/faizahkureshi232/imagetospeech/HEAD/example.png
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2025 Faizah Mahendinawaz Kureshi
4 |
5 |
6 | Permission is hereby granted, free of charge, to any person obtaining a copy
7 | of this software and associated documentation files (the "Software"), to deal
8 | in the Software without restriction, including without limitation the rights
9 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
10 | copies of the Software, and to permit persons to whom the Software is
11 | furnished to do so, subject to the following conditions:
12 |
13 | The above copyright notice and this permission notice shall be included in all
14 | copies or substantial portions of the Software.
15 |
16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
17 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
18 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
19 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
20 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
21 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
22 | SOFTWARE.
23 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Speech-enabled Image Narration: Assistance for the Visually Impaired
2 |
3 | ## Overview
4 | This project focuses on enabling accessibility for visually impaired individuals by automating the generation of descriptive captions for images and providing these descriptions through audio narration. The solution integrates advanced machine learning techniques in image processing, natural language understanding, and speech synthesis.
5 |
6 | ## Features
7 | - **Image Feature Extraction**: Utilizes the InceptionV3 CNN model to extract high-level image features.
8 | - **Semantic Word Embeddings**: Employs GloVe embeddings to enhance the language representation.
9 | - **Caption Generation**: Generates meaningful and contextually relevant captions using an LSTM-based decoder.
10 | - **Speech Narration**: Converts generated captions into audio using Text-to-Speech (TTS) technology.
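
As a rough illustration of how pre-trained vectors are typically wired into a captioning vocabulary, the sketch below parses lines in the GloVe text format (`word v1 v2 ...`) and builds a lookup matrix. The helper names `parse_glove_lines` and `build_embedding_matrix`, the toy vectors, and the choice to leave row 0 and unknown words as zeros are assumptions made for this example, not code taken from the project.

```python
def parse_glove_lines(lines):
    """Parse GloVe text lines of the form 'word v1 v2 ... vn' into a dict."""
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(v) for v in parts[1:]]
    return vectors


def build_embedding_matrix(token_to_idx, vectors, dim):
    """Row i holds the vector of the token with index i; unknown tokens stay zero."""
    # Indices start at 1, so row 0 is left as an all-zero padding row.
    matrix = [[0.0] * dim for _ in range(len(token_to_idx) + 1)]
    for token, idx in token_to_idx.items():
        if token in vectors:
            matrix[idx] = list(vectors[token])
    return matrix


# Toy 3-dimensional vectors (hypothetical values, for illustration only).
toy_lines = ["dog 0.1 0.2 0.3", "grass 0.4 0.5 0.6"]
toy_vocab = {"dog": 1, "grass": 2, "frisbee": 3}  # 'frisbee' has no toy vector
matrix = build_embedding_matrix(toy_vocab, parse_glove_lines(toy_lines), dim=3)
```

In a real model the resulting matrix would usually initialise the decoder's embedding layer; words without a pre-trained vector simply start from zeros.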
11 |
12 | ## Architecture
13 | The system follows an encoder-decoder paradigm:
14 | 1. **Image Input**: Accepts images as input.
15 | 2. **Feature Extraction**: InceptionV3 CNN extracts image features.
16 | 3. **Language Representation**: GloVe embeddings provide semantic word vectors.
17 | 4. **Caption Generation**: LSTM decoder generates captions using image features and word embeddings.
18 | 5. **Text-to-Speech Conversion**: TTS converts captions to speech.
19 | 6. **Audio Output**: Delivers the generated description as audio for user accessibility.
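
The caption generation step can be pictured as a loop that repeatedly asks the decoder for the next word. The sketch below assumes greedy decoding (always taking the highest-scoring word); the README does not say which decoding strategy the project uses, and `greedy_caption`, `toy_decoder`, and the `<start>`/`<end>` marker names are hypothetical stand-ins rather than the project's own code.

```python
def greedy_caption(predict_next, image_features, start_token, end_token, max_len):
    """Build a caption one word at a time, always taking the most likely next word."""
    words = [start_token]
    for _ in range(max_len):
        scores = predict_next(image_features, words)  # dict: word -> score
        next_word = max(scores, key=scores.get)
        if next_word == end_token:
            break
        words.append(next_word)
    return " ".join(words[1:])  # drop the start marker


# Stand-in for a trained LSTM decoder: it ignores the image and follows a fixed script.
def toy_decoder(image_features, words_so_far):
    script = ["a", "dog", "runs", "<end>"]
    target = script[min(len(words_so_far) - 1, len(script) - 1)]
    return {word: (1.0 if word == target else 0.0) for word in set(script)}


caption = greedy_caption(toy_decoder, image_features=None,
                         start_token="<start>", end_token="<end>", max_len=10)
print(caption)  # prints "a dog runs"
```

The returned string is what a Text-to-Speech engine would then receive; `max_len` guards against a decoder that never emits the end marker.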
20 |
21 |
22 | ## Technologies Used
23 | - **Python**: Programming language
24 | - **TensorFlow/Keras**: For building and training the CNN and LSTM models
25 | - **GloVe**: Pre-trained word embeddings for language representation
26 | - **Text-to-Speech (TTS)**: For converting text captions into speech
27 | - **Flask/Django (Optional)**: For deploying the application
28 | - **NumPy, Pandas, Matplotlib**: For data handling and visualization
29 |
30 | ## Installation
31 | 1. Clone this repository:
32 | ```bash
33 | git clone https://github.com/faizahkureshi232/imagetospeech.git
34 | cd imagetospeech
35 | ```
36 |
37 | 2. Download the pre-trained models:
38 |
39 | - InceptionV3 weights
40 | - GloVe word embeddings
41 |
42 | 3. Run the application by opening the notebook `encoder_decoder.ipynb` (for example in Google Colab or Jupyter) and executing its cells in order.
49 |
50 | ## Usage
52 |
53 | 1. Upload an image through the interface or specify the image path in the script.
54 | 2. The system will generate a descriptive caption.
55 | 3. The caption will be converted into speech and played as audio.
56 |
57 | ## Sample Results
59 |
60 | - Input Image: [example.png](example.png)
61 | - Generated Caption: "A Dog Running through the."
62 | - Audio Output: Speech narration of the generated caption.
63 |
64 | ## Future Enhancements
66 |
67 | - Integration with real-time image capture (e.g., through a smartphone camera).
68 | - Support for multiple languages in Text-to-Speech.
69 | - Advanced customization for user-specific accessibility needs.
70 |
71 | ## Contributing
73 |
74 | Contributions are welcome! Please follow these steps:
75 |
76 | 1. Fork the repository.
77 | 2. Create a feature branch:
78 |
83 | `git checkout -b feature-name`
84 |
85 | 3. Commit your changes:
86 |
91 | `git commit -m "Add feature description"`
92 |
93 | 4. Push to the branch:
94 |
99 | `git push origin feature-name`
100 |
101 | 5. Create a pull request.
102 |
103 | ## License
105 |
106 | This project is licensed under the MIT License. See `LICENSE` for more details.
107 |
108 | ## Acknowledgements
110 |
111 | - InceptionV3 for feature extraction.
112 | - [GloVe](https://nlp.stanford.edu/projects/glove/) for pre-trained word embeddings.
113 | - OpenAI and community resources for inspiration and support.
114 |
115 | * * * * *
116 |
117 | Feel free to suggest improvements or report issues in the repository!
118 |
--------------------------------------------------------------------------------
/encoder_decoder.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 31,
6 | "metadata": {
7 | "id": "XTs0q-TgmiSt"
8 | },
9 | "outputs": [],
10 | "source": [
11 | "import numpy as np\n",
12 | "import pandas as pd\n",
13 | "import string\n",
14 | "import matplotlib.pyplot as plt\n",
15 | "from tqdm import tqdm\n",
16 | "from PIL import Image\n",
17 | "import pickle\n",
18 | "from pickle import dump, load\n",
19 | "import time\n",
20 | "import os\n",
21 | "\n",
22 | "import torch\n",
23 | "import tensorflow as tf\n",
24 | "import keras\n",
25 | "from keras.applications.inception_v3 import InceptionV3\n",
26 | "from keras.models import Model\n",
27 | "from keras.preprocessing import image\n"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": 3,
33 | "metadata": {
34 | "colab": {
35 | "base_uri": "https://localhost:8080/"
36 | },
37 | "id": "CUXJPFGAXRkf",
38 | "outputId": "7467f855-5c11-4386-80a8-3e6574bab59e"
39 | },
40 | "outputs": [
41 | {
42 | "name": "stdout",
43 | "output_type": "stream",
44 | "text": [
45 | "Mounted at /content/gdrive\n"
46 | ]
47 | }
48 | ],
49 | "source": [
50 | "from google.colab import drive\n",
51 | "drive.mount('/content/gdrive')"
52 | ]
53 | },
54 | {
55 | "cell_type": "code",
56 | "execution_count": 4,
57 | "metadata": {
58 | "colab": {
59 | "base_uri": "https://localhost:8080/"
60 | },
61 | "id": "K5QOJ1aomdsM",
62 | "outputId": "ae4b0bdc-fa20-4174-b2cc-5e707b5c96af"
63 | },
64 | "outputs": [
65 | {
66 | "name": "stdout",
67 | "output_type": "stream",
68 | "text": [
69 | "Device: cuda\n"
70 | ]
71 | }
72 | ],
73 | "source": [
74 | "num_gpus = torch.cuda.device_count()\n",
75 | "if num_gpus > 0:\n",
76 | " device = 'cuda'\n",
77 | "else:\n",
78 | " device = 'cpu'\n",
79 | "\n",
80 | "print('Device: ', device)"
81 | ]
82 | },
83 | {
84 | "cell_type": "markdown",
85 | "metadata": {
86 | "id": "vQqEhCDonewY"
87 | },
88 | "source": [
89 | "### **1. Data Preprocessing and Cleaning**"
90 | ]
91 | },
92 | {
93 | "cell_type": "code",
94 | "execution_count": 6,
95 | "metadata": {
96 | "id": "jS2AsHKmnFdK"
97 | },
98 | "outputs": [],
99 | "source": [
100 | "# Function to load the file and read its text\n",
101 | "def load_caption(filename):\n",
102 | "    # The context manager closes the file even if reading fails\n",
103 | "    with open(filename, 'r') as file:\n",
104 | "        read_text = file.read()\n",
105 | "    return read_text\n",
106 | "\n",
107 | "# Cleaning captions: lowercasing, stripping punctuation and dropping purely numeric tokens\n",
108 | "def clean_text(text):\n",
109 | "    caption = text.lower()\n",
110 | "    translator = str.maketrans('', '', string.punctuation)  # Translation table that deletes every punctuation character\n",
111 | "    caption = caption.translate(translator)\n",
112 | "\n",
113 | "    desc_list = \"\"\n",
114 | "    for word in caption.split():\n",
115 | "        if not word.isdigit():  # Single-character words such as 'a' are kept\n",
116 | "            desc_list += \" \" + word\n",
117 | "\n",
118 | "    caption = desc_list\n",
119 | "    return caption\n",
120 | "\n",
121 | "\n",
122 | "# Creating the dataframe rows and the caption dictionary from the raw caption text\n",
123 | "def create_dataframe(text):\n",
124 | "    data = []\n",
125 | "    descriptions = {}  # Maps each image ID to the list of its cleaned captions\n",
126 | "    for sentence in text.split('\\n'):\n",
127 | "        splits = sentence.split('\\t')\n",
128 | "\n",
129 | "        if len(splits) != 1:\n",
130 | "            idx = splits[0].split('#')\n",
131 | "            data.append(idx + [splits[1].lower()])\n",
132 | "            img_idx = idx[0].split('.')[0]\n",
133 | "            raw_caption = splits[1]\n",
134 | "            caption = clean_text(raw_caption)  # Cleaning text using the function defined above\n",
135 | "\n",
136 | "            if img_idx not in descriptions.keys():\n",
137 | "                descriptions[img_idx] = list()\n",
138 | "            descriptions[img_idx].append(caption)\n",
139 | "\n",
140 | "        else:\n",
141 | "            continue\n",
142 | "    return data, descriptions\n"
143 | ]
144 | },
145 | {
146 | "cell_type": "markdown",
147 | "metadata": {
148 | "id": "6-CpTb1eClJO"
149 | },
150 | "source": [
151 | "#### An example of raw data - Image ID with caption number and associated captions\n",
152 | "\n",
153 | "#### The text file contains raw captions for the Dataset. The first column denotes the image ID with the caption number (0 - 4)"
154 | ]
155 | },
156 | {
157 | "cell_type": "code",
158 | "execution_count": 7,
159 | "metadata": {
160 | "colab": {
161 | "base_uri": "https://localhost:8080/"
162 | },
163 | "id": "nr4I8C12ouY1",
164 | "outputId": "cb964eb8-5609-4a39-d0e7-5245ee7a310f"
165 | },
166 | "outputs": [
167 | {
168 | "name": "stdout",
169 | "output_type": "stream",
170 | "text": [
171 | "1000268201_693b08cb0e.jpg#0\tA child in a pink dress is climbing up a set of stairs in an entry way .\n",
172 | "1000268201_693b08cb0e.jpg#1\tA girl going into a wooden building .\n",
173 | "1000268201_693b08cb0e.jpg#2\tA little girl climbing into a wooden playhouse .\n",
174 | "1000268201_693b08cb0e.jpg#3\tA little girl climbing the stairs to her playhouse .\n",
175 | "1000268201_693b08cb0e.jpg#4\tA little girl in a pink dress going into a wooden cabin .\n"
176 | ]
177 | }
178 | ],
179 | "source": [
180 | "text = load_caption('/content/gdrive/MyDrive/Deep_Learning/Flickr8k_text/Flickr8k.token.txt')\n",
181 | "print(text[:410])"
182 | ]
183 | },
184 | {
185 | "cell_type": "code",
186 | "execution_count": 8,
187 | "metadata": {
188 | "colab": {
189 | "base_uri": "https://localhost:8080/",
190 | "height": 218
191 | },
192 | "id": "eYqDkHEfzyJ6",
193 | "outputId": "e6dca51c-8c02-4fb4-9f46-b875072dc9b4"
194 | },
195 | "outputs": [
196 | {
197 | "name": "stdout",
198 | "output_type": "stream",
199 | "text": [
200 | "Number of unique images (datapoints): 8092\n"
201 | ]
202 | },
203 | {
204 | "data": {
205 | "text/html": [
206 | "<div>\n",
220 | "<table>\n",
221 | "  <thead>\n",
222 | "    <tr>\n",
223 | "      <th></th>\n",
224 | "      <th>image</th>\n",
225 | "      <th>index</th>\n",
226 | "      <th>caption</th>\n",
227 | "    </tr>\n",
228 | "  </thead>\n",
229 | "  <tbody>\n",
230 | "    <tr>\n",
231 | "      <th>0</th>\n",
232 | "      <td>1000268201_693b08cb0e.jpg</td>\n",
233 | "      <td>0</td>\n",
234 | "      <td>a child in a pink dress is climbing up a set o...</td>\n",
235 | "    </tr>\n",
236 | "    <tr>\n",
237 | "      <th>1</th>\n",
238 | "      <td>1000268201_693b08cb0e.jpg</td>\n",
239 | "      <td>1</td>\n",
240 | "      <td>a girl going into a wooden building .</td>\n",
241 | "    </tr>\n",
242 | "    <tr>\n",
243 | "      <th>2</th>\n",
244 | "      <td>1000268201_693b08cb0e.jpg</td>\n",
245 | "      <td>2</td>\n",
246 | "      <td>a little girl climbing into a wooden playhouse .</td>\n",
247 | "    </tr>\n",
248 | "    <tr>\n",
249 | "      <th>3</th>\n",
250 | "      <td>1000268201_693b08cb0e.jpg</td>\n",
251 | "      <td>3</td>\n",
252 | "      <td>a little girl climbing the stairs to her playh...</td>\n",
253 | "    </tr>\n",
254 | "    <tr>\n",
255 | "      <th>4</th>\n",
256 | "      <td>1000268201_693b08cb0e.jpg</td>\n",
257 | "      <td>4</td>\n",
258 | "      <td>a little girl in a pink dress going into a woo...</td>\n",
259 | "    </tr>\n",
260 | "  </tbody>\n",
261 | "</table>\n",
262 | "</div>"
263 | ],
264 | "text/plain": [
265 | " image ... caption\n",
266 | "0 1000268201_693b08cb0e.jpg ... a child in a pink dress is climbing up a set o...\n",
267 | "1 1000268201_693b08cb0e.jpg ... a girl going into a wooden building .\n",
268 | "2 1000268201_693b08cb0e.jpg ... a little girl climbing into a wooden playhouse .\n",
269 | "3 1000268201_693b08cb0e.jpg ... a little girl climbing the stairs to her playh...\n",
270 | "4 1000268201_693b08cb0e.jpg ... a little girl in a pink dress going into a woo...\n",
271 | "\n",
272 | "[5 rows x 3 columns]"
273 | ]
274 | },
275 | "execution_count": 8,
276 | "metadata": {
277 | "tags": []
278 | },
279 | "output_type": "execute_result"
280 | }
281 | ],
282 | "source": [
283 | "data, dictionary = create_dataframe(text)\n",
284 | "df = pd.DataFrame(data, columns=['image', 'index', 'caption'])\n",
285 | "images_vector = np.unique(df.image.values)\n",
286 | "print('Number of unique images (datapoints): ', len(images_vector))\n",
287 | "df.head(5)"
288 | ]
289 | },
290 | {
291 | "cell_type": "markdown",
292 | "metadata": {
293 | "id": "SLaAXaVxC85p"
294 | },
295 | "source": [
296 | "#### Captions of the first three images (five captions per image)"
297 | ]
298 | },
299 | {
300 | "cell_type": "code",
301 | "execution_count": 9,
302 | "metadata": {
303 | "colab": {
304 | "base_uri": "https://localhost:8080/"
305 | },
306 | "id": "pOZ9fnhIPv7A",
307 | "outputId": "aa7551b9-3329-4d10-c4e0-b3e69e156c9a"
308 | },
309 | "outputs": [
310 | {
311 | "data": {
312 | "text/plain": [
313 | "array(['a child in a pink dress is climbing up a set of stairs in an entry way .',\n",
314 | " 'a girl going into a wooden building .',\n",
315 | " 'a little girl climbing into a wooden playhouse .',\n",
316 | " 'a little girl climbing the stairs to her playhouse .',\n",
317 | " 'a little girl in a pink dress going into a wooden cabin .',\n",
318 | " 'a black dog and a spotted dog are fighting',\n",
319 | " 'a black dog and a tri-colored dog playing with each other on the road .',\n",
320 | " 'a black dog and a white dog with brown spots are staring at each other in the street .',\n",
321 | " 'two dogs of different breeds looking at each other on the road .',\n",
322 | " 'two dogs on pavement moving toward each other .',\n",
323 | " 'a little girl covered in paint sits in front of a painted rainbow with her hands in a bowl .',\n",
324 | " 'a little girl is sitting in front of a large painted rainbow .',\n",
325 | " 'a small girl in the grass plays with fingerpaints in front of a white canvas with a rainbow on it .',\n",
326 | " 'there is a girl with pigtails sitting in front of a rainbow painting .',\n",
327 | " 'young girl with pigtails painting outside in the grass .'],\n",
328 | " dtype=object)"
329 | ]
330 | },
331 | "execution_count": 9,
332 | "metadata": {
333 | "tags": []
334 | },
335 | "output_type": "execute_result"
336 | }
337 | ],
338 | "source": [
339 | "pd.set_option(\"display.max_rows\", None, \"display.max_columns\", None)\n",
340 | "df.caption.values[:15]"
341 | ]
342 | },
343 | {
344 | "cell_type": "markdown",
345 | "metadata": {
346 | "id": "H3l5KFhYBb1t"
347 | },
348 | "source": [
349 | "#### Preprocessing images for the InceptionV3 model\n",
350 | "1. Resizing the images to 299 x 299\n",
351 | "2. Using the Keras preprocess_input method to scale the pixel values to the range -1 to 1"
352 | ]
353 | },
354 | {
355 | "cell_type": "code",
356 | "execution_count": 10,
357 | "metadata": {
358 | "id": "zlTcJQQT7Ojt"
359 | },
360 | "outputs": [],
361 | "source": [
362 | "def load_image_preprocess(img_path):\n",
363 | "    img = tf.io.read_file(img_path)\n",
364 | "    img = tf.image.decode_jpeg(img, channels=3)\n",
365 | "    img = tf.image.resize(img, (299, 299))  # Resizing the image to 299 x 299\n",
366 | "\n",
367 | "    # preprocess_input scales the pixel values from [0, 255] to [-1, 1]\n",
368 | "    img = tf.keras.applications.inception_v3.preprocess_input(img)\n",
369 | "    return img, img_path\n",
370 | "\n",
371 | "# Build the list of full image paths by prefixing each file name with the root directory\n",
372 | "def create_images_vector(images_vector, root):\n",
373 | "    paths = []\n",
374 | "    for img in images_vector:\n",
375 | "        paths.append(root + img)\n",
376 | "    return paths\n",
377 | "\n",
378 | "root = '/content/gdrive/MyDrive/Deep_Learning/Flickr8k_Dataset/Flicker8k_Dataset/'\n",
379 | "list_image_paths = create_images_vector(images_vector, root)"
380 | ]
381 | },
382 | {
383 | "cell_type": "code",
384 | "execution_count": null,
385 | "metadata": {
386 | "colab": {
387 | "base_uri": "https://localhost:8080/",
388 | "height": 304
389 | },
390 | "id": "EwtbSLFKAXp1",
391 | "outputId": "c7c7c4cb-16e5-42a4-a912-f2a143121205"
392 | },
393 | "outputs": [
394 | {
395 | "name": "stderr",
396 | "output_type": "stream",
397 | "text": [
398 | "Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).\n"
399 | ]
400 | },
401 | {
402 | "name": "stdout",
403 | "output_type": "stream",
404 | "text": [
405 | "(299, 299, 3)\n"
406 | ]
407 | },
408 | {
409 | "data": {
410 | "image/png": "",
411 | "text/plain": [
412 | ""
413 | ]
414 | },
415 | "metadata": {
416 | "needs_background": "light",
417 | "tags": []
418 | },
419 | "output_type": "display_data"
420 | }
421 | ],
422 | "source": [
423 | "demo_img, img_path = load_image_preprocess(root + images_vector[2])\n",
424 | "plt.imshow(demo_img)\n",
425 | "print(demo_img.shape)"
426 | ]
427 | },
428 | {
429 | "cell_type": "code",
430 | "execution_count": 11,
431 | "metadata": {
432 | "colab": {
433 | "base_uri": "https://localhost:8080/"
434 | },
435 | "id": "azNPyZWJxI-D",
436 | "outputId": "f667ce65-b80d-4601-a736-1914d9c3032c"
437 | },
438 | "outputs": [
439 | {
440 | "name": "stdout",
441 | "output_type": "stream",
442 | "text": [
443 | "1346 /content/gdrive/MyDrive/Deep_Learning/Flickr8k_Dataset/Flicker8k_Dataset/2258277193_586949ec62.jpg.1\n"
444 | ]
445 | }
446 | ],
447 | "source": [
448 | "# The captions file lists '2258277193_586949ec62.jpg.1', an entry with no matching image file.\n",
449 | "# Locate that entry and drop it from the list of image paths.\n",
450 | "find = '/content/gdrive/MyDrive/Deep_Learning/Flickr8k_Dataset/Flicker8k_Dataset/2258277193_586949ec62.jpg.1'\n",
451 | "for i, val in enumerate(list_image_paths):\n",
452 | "    if find == val:\n",
453 | "        print(i, val)\n",
454 | "        list_image_paths[i] = root + '2258277193_586949ec62.jpg'\n",
455 | "\n",
456 | "del list_image_paths[1346]  # 1346 is the index reported by the loop above"
457 | ]
458 | },
459 | {
460 | "cell_type": "markdown",
461 | "metadata": {
462 | "id": "6AvihAc4xYcP"
463 | },
464 | "source": [
465 | "#### Splitting train, validate and test data"
466 | ]
467 | },
468 | {
469 | "cell_type": "code",
470 | "execution_count": 36,
471 | "metadata": {
472 | "colab": {
473 | "base_uri": "https://localhost:8080/"
474 | },
475 | "id": "LXh65aUNwHvZ",
476 | "outputId": "96c6a03a-fe28-4d6a-c37c-b8abfc3eca58"
477 | },
478 | "outputs": [
479 | {
480 | "name": "stdout",
481 | "output_type": "stream",
482 | "text": [
483 | "Length of train, validate and test data: 6000 1000 1000\n"
484 | ]
485 | }
486 | ],
487 | "source": [
488 | "# Creating separate train, validate and test image lists\n",
489 | "train_images_path = '/content/gdrive/MyDrive/Deep_Learning/Flickr8k_text/Flickr_8k.trainImages.txt'\n",
490 | "train_text = open(train_images_path, 'r').read().strip().split('\\n')\n",
491 | "train_images = create_images_vector(train_text, root)\n",
492 | "\n",
493 | "test_images_path = '/content/gdrive/MyDrive/Deep_Learning/Flickr8k_text/Flickr_8k.testImages.txt'\n",
494 | "test_txt = open(test_images_path, 'r').read().strip().split('\\n')\n",
495 | "test_images = create_images_vector(test_txt, root)\n",
496 | "\n",
497 | "validate_images_path = '/content/gdrive/MyDrive/Deep_Learning/Flickr8k_text/Flickr_8k.devImages.txt'\n",
498 | "validate_images = open(validate_images_path, 'r').read().strip().split('\\n')\n",
499 | "validate_images = create_images_vector(validate_images, root)\n",
500 | "\n",
501 | "print('Length of train, validate and test data: ', len(train_images), len(validate_images), len(test_images))"
502 | ]
503 | },
504 | {
505 | "cell_type": "markdown",
506 | "metadata": {
507 | "id": "QYZwKJbU97KN"
508 | },
509 | "source": [
510 | "#### Preprocessing captions for the decoder model"
511 | ]
512 | },
513 | {
514 | "cell_type": "code",
515 | "execution_count": 37,
516 | "metadata": {
517 | "id": "2RbVr05nxHZ2"
518 | },
519 | "outputs": [],
520 | "source": [
521 | "# Loading captions into dictionaries for the training and test images\n",
522 | "train_dictionary = {}\n",
523 | "test_dictionary = {}  # Kept for the later BLEU analysis\n",
524 | "\n",
525 | "# Adding start and end tokens to every training caption\n",
526 | "for key in dictionary.keys():\n",
527 | "    key_updated = key + '.jpg'\n",
528 | "    if key_updated in train_text:\n",
529 | "        train_dictionary[key] = list()\n",
530 | "\n",
531 | "        for c in dictionary[key]:\n",
532 | "            caption = '' + ''.join(c) + ' '\n",
533 | "            train_dictionary[key].append(caption)\n",
534 | "\n",
535 | "    if key_updated in test_txt:\n",
536 | "        test_dictionary[key] = list()\n",
537 | "\n",
538 | "        for c in dictionary[key]:\n",
539 | "            test_dictionary[key].append(c)"
540 | ]
541 | },
542 | {
543 | "cell_type": "markdown",
544 | "metadata": {
545 | "id": "udCWScDCPXRt"
546 | },
547 | "source": [
548 | "#### Defining utility functions for Captions processing"
549 | ]
550 | },
551 | {
552 | "cell_type": "code",
553 | "execution_count": 14,
554 | "metadata": {
555 | "id": "sZKlQNL895x3"
556 | },
557 | "outputs": [],
558 | "source": [
559 | "# A flat list of all the captions stored in the dictionary\n",
560 | "def create_captions_list(dictionary):\n",
561 | "    all_captions = []\n",
562 | "    for key in dictionary.keys():\n",
563 | "        for c in dictionary[key]:\n",
564 | "            all_captions.append(c)\n",
565 | "    return all_captions\n",
566 | "\n",
567 | "def max_length(captions):\n",
568 | "    lengths = []\n",
569 | "    for caption in captions:\n",
570 | "        lengths.append(len(caption.split()))\n",
571 | "    return max(lengths)"
572 | ]
573 | },
574 | {
575 | "cell_type": "code",
576 | "execution_count": 15,
577 | "metadata": {
578 | "colab": {
579 | "base_uri": "https://localhost:8080/"
580 | },
581 | "id": "GSToJ38JGzER",
582 | "outputId": "228a39c5-b32a-4a54-b0bf-5ff6f82f26a1"
583 | },
584 | "outputs": [
585 | {
586 | "name": "stdout",
587 | "output_type": "stream",
588 | "text": [
589 | "Original Vocabulary size: 7597\n",
590 | "Vocabulary size after removing low frequency words: 1654\n",
591 | "Maximum length of a caption in the corpus is: 37\n"
592 | ]
593 | }
594 | ],
595 | "source": [
596 | "# Creating a Vocabulary of all the unique tokens in the corpus\n",
597 | "vocab = set()\n",
598 | "train_all_captions = create_captions_list(train_dictionary)\n",
599 | "max_caption = max_length(train_all_captions)\n",
600 | "\n",
601 | "for key in train_dictionary.keys():\n",
602 | "    vocab.update(word for caption in train_dictionary[key] for word in caption.split())\n",
603 | "print('Original Vocabulary size: ', len(vocab))\n",
604 | "\n",
605 | "# Consider only words which occur at least 10 times in the corpus\n",
606 | "threshold = 10\n",
607 | "counts = {}\n",
608 | "num_sentence = 0\n",
609 | "for sent in train_all_captions:\n",
610 | "    num_sentence += 1\n",
611 | "    for token in sent.split(' '):\n",
612 | "        counts[token] = counts.get(token, 0) + 1\n",
613 | "\n",
614 | "vocab = [word for word in counts if counts[word] >= threshold]\n",
615 | "print('Vocabulary size after removing low frequency words: ', len(vocab))\n",
616 | "print('Maximum length of a caption in the corpus is: ', max_caption)"
617 | ]
618 | },
619 | {
620 | "cell_type": "code",
621 | "execution_count": 16,
622 | "metadata": {
623 | "colab": {
624 | "base_uri": "https://localhost:8080/"
625 | },
626 | "id": "Z9WmsayyZREz",
627 | "outputId": "a334b02d-1806-438a-cc4b-278485ecfc5b"
628 | },
629 | "outputs": [
630 | {
631 | "name": "stdout",
632 | "output_type": "stream",
633 | "text": [
634 | "Index of token is 1 and token is 15\n"
635 | ]
636 | }
637 | ],
638 | "source": [
639 | "# Python dictionaries to map tokens to index \n",
640 | "token_to_idx, idx_to_token = {}, {} \n",
641 | "\n",
642 | "index = 1\n",
643 | "for token in vocab:\n",
644 | " token_to_idx[token] = index\n",
645 | " idx_to_token[index] = token\n",
646 | " index += 1\n",
647 | "\n",
648 | "print(f\"Index of token is {token_to_idx['']} and token is {token_to_idx['']}\")"
649 | ]
650 | },
651 | {
652 | "cell_type": "markdown",
653 | "metadata": {
654 | "id": "LtX2kxtdoMSU"
655 | },
656 | "source": [
657 | "### **2. GloVe embeddings**"
658 | ]
659 | },
660 | {
661 | "cell_type": "code",
662 | "execution_count": null,
663 | "metadata": {
664 | "colab": {
665 | "base_uri": "https://localhost:8080/"
666 | },
667 | "id": "Tt6LCPV4oL_O",
668 | "outputId": "3cb5842f-6ad3-452c-9c24-cba3238d70f5"
669 | },
670 | "outputs": [
671 | {
672 | "name": "stdout",
673 | "output_type": "stream",
674 | "text": [
675 | "--2021-05-18 19:23:25-- http://nlp.stanford.edu/data/glove.6B.zip\n",
676 | "Resolving nlp.stanford.edu (nlp.stanford.edu)... 171.64.67.140\n",
677 | "Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:80... connected.\n",
678 | "HTTP request sent, awaiting response... 302 Found\n",
679 | "Location: https://nlp.stanford.edu/data/glove.6B.zip [following]\n",
680 | "--2021-05-18 19:23:25-- https://nlp.stanford.edu/data/glove.6B.zip\n",
681 | "Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:443... connected.\n",
682 | "HTTP request sent, awaiting response... 301 Moved Permanently\n",
683 | "Location: http://downloads.cs.stanford.edu/nlp/data/glove.6B.zip [following]\n",
684 | "--2021-05-18 19:23:25-- http://downloads.cs.stanford.edu/nlp/data/glove.6B.zip\n",
685 | "Resolving downloads.cs.stanford.edu (downloads.cs.stanford.edu)... 171.64.64.22\n",
686 | "Connecting to downloads.cs.stanford.edu (downloads.cs.stanford.edu)|171.64.64.22|:80... connected.\n",
687 | "HTTP request sent, awaiting response... 200 OK\n",
688 | "Length: 862182613 (822M) [application/zip]\n",
689 | "Saving to: ‘glove.6B.zip’\n",
690 | "\n",
691 | "glove.6B.zip 100%[===================>] 822.24M 5.07MB/s in 2m 43s \n",
692 | "\n",
693 | "2021-05-18 19:26:09 (5.03 MB/s) - ‘glove.6B.zip’ saved [862182613/862182613]\n",
694 | "\n"
695 | ]
696 | }
697 | ],
698 | "source": [
699 | "!wget http://nlp.stanford.edu/data/glove.6B.zip\n",
700 | "!unzip -q glove.6B.zip"
701 | ]
702 | },
703 | {
704 | "cell_type": "code",
705 | "execution_count": null,
706 | "metadata": {
707 | "colab": {
708 | "base_uri": "https://localhost:8080/"
709 | },
710 | "id": "EZR6Q6wt8LZH",
711 | "outputId": "295f19cd-877f-4bc1-dcfa-11422c4013d7"
712 | },
713 | "outputs": [
714 | {
715 | "name": "stdout",
716 | "output_type": "stream",
717 | "text": [
718 | "Found 400000 word vectors.\n"
719 | ]
720 | }
721 | ],
722 | "source": [
723 | "# Loading the pre-trained GloVe word embeddings (glove.6B, dimension: 200)\n",
724 | "\n",
725 | "path_to_glove_file = \"/content/glove.6B.200d.txt\"\n",
728 | "\n",
729 | "embeddings_index = {}\n",
730 | "with open(path_to_glove_file, encoding=\"utf-8\") as f:\n",
731 | "    for line in f:\n",
732 | "        word, coefs = line.split(maxsplit=1)\n",
733 | "        coefs = np.fromstring(coefs, \"f\", sep=\" \")\n",
734 | "        embeddings_index[word] = coefs\n",
735 | "\n",
736 | "print(\"Found %s word vectors.\" % len(embeddings_index))"
737 | ]
738 | },
739 | {
740 | "cell_type": "code",
741 | "execution_count": 20,
742 | "metadata": {
743 | "id": "G9KbDjhAs4pA"
744 | },
745 | "outputs": [],
746 | "source": [
747 | "embeddings_size = 200\n",
748 | "vocab_size = len(vocab) + 1"
749 | ]
750 | },
751 | {
752 | "cell_type": "code",
753 | "execution_count": null,
754 | "metadata": {
755 | "colab": {
756 | "base_uri": "https://localhost:8080/"
757 | },
758 | "id": "di0QmSi28z1H",
759 | "outputId": "5d37bc0d-cb66-4b5c-fc61-247a1e86a483"
760 | },
761 | "outputs": [
762 | {
763 | "name": "stderr",
764 | "output_type": "stream",
765 | "text": [
766 | "100%|██████████| 1654/1654 [00:00<00:00, 309345.35it/s]\n"
767 | ]
768 | }
769 | ],
770 | "source": [
771 | "embeddings = np.zeros((vocab_size, embeddings_size))\n",
772 | "\n",
773 | "for token, idx in tqdm(token_to_idx.items()):\n",
774 | "    if token in embeddings_index:\n",
775 | "        vector = embeddings_index[token]\n",
776 | "        embeddings[idx] = vector"
777 | ]
778 | },
779 | {
780 | "cell_type": "markdown",
781 | "metadata": {
782 | "id": "TXThDYmdBRue"
783 | },
784 | "source": [
785 | "Checkpoint: Saving the embeddings matrix"
786 | ]
787 | },
788 | {
789 | "cell_type": "code",
790 | "execution_count": 34,
791 | "metadata": {
792 | "colab": {
793 | "base_uri": "https://localhost:8080/"
794 | },
795 | "id": "Np0EQYz-_cXT",
796 | "outputId": "59c07478-d5da-48d0-882d-0a9481693c02"
797 | },
798 | "outputs": [
799 | {
800 | "name": "stdout",
801 | "output_type": "stream",
802 | "text": [
803 | "Size of embeddings matrix: (1655, 200)\n"
804 | ]
805 | }
806 | ],
807 | "source": [
808 | "print('Size of embeddings matrix: ', embeddings.shape)\n",
809 | "with open('/content/gdrive/MyDrive/Deep_Learning/Saved_files/embeddings.pkl', 'wb') as embeddings_file:\n",
810 | " pickle.dump(embeddings, embeddings_file)\n",
811 | "\n",
812 | "with open('/content/gdrive/MyDrive/Deep_Learning/Saved_files/token_to_idx.pkl', 'wb') as token_idx_file:\n",
813 | " pickle.dump(token_to_idx, token_idx_file)\n",
814 | "\n",
815 | "with open('/content/gdrive/MyDrive/Deep_Learning/Saved_files/idx_to_token.pkl', 'wb') as idx_token_file:\n",
816 | " pickle.dump(idx_to_token, idx_token_file)"
817 | ]
818 | },
819 | {
820 | "cell_type": "markdown",
821 | "metadata": {
822 | "id": "dDu8LfveH5yH"
823 | },
824 | "source": [
825 | "### **3. Encoder: InceptionV3 model**\n",
826 | "> Creating an instance of the InceptionV3 architecture pretrained on ImageNet\n",
827 | "\n",
828 | "> Extracting the feature vector from the last convolutional layer of shape (8 x 8 x 2048): Parameter include_top set to False"
829 | ]
830 | },
831 | {
832 | "cell_type": "markdown",
833 | "metadata": {
834 | "id": "GuO6RAjMV_K6"
835 | },
836 | "source": [
837 | "#### Method 1: Feature Extraction (without TensorFlow's parallel computation)"
838 | ]
839 | },
840 | {
841 | "cell_type": "code",
842 | "execution_count": null,
843 | "metadata": {
844 | "id": "IFVsxxJoaZrr"
845 | },
846 | "outputs": [],
847 | "source": [
848 | "# Load InceptionV3 pretrained on ImageNet and drop its final classification layer\n",
849 | "model = InceptionV3(weights = 'imagenet')\n",
850 | "model_new = Model(model.input, model.layers[-2].output)\n",
851 | "\n",
852 | "def preprocess(image_path):\n",
853 | "    img = image.load_img(image_path, target_size=(299, 299))\n",
854 | "    x = image.img_to_array(img)\n",
855 | "    x = np.expand_dims(x, axis=0)\n",
856 | "    x = tf.keras.applications.inception_v3.preprocess_input(x)\n",
857 | "    return x"
858 | ]
859 | },
860 | {
861 | "cell_type": "code",
862 | "execution_count": null,
863 | "metadata": {
864 | "id": "en1RGB0yfYEl"
865 | },
866 | "outputs": [],
867 | "source": [
868 | "features_dict = {}\n",
869 | "\n",
870 | "start = time.time()\n",
871 | "for img_path in tqdm(test_images):\n",
872 | "    pre_img = preprocess(img_path)\n",
873 | "    features = model_new(pre_img)\n",
874 | "    features = np.reshape(features, features.shape[1])\n",
875 | "\n",
876 | "    features_dict[img_path] = features\n",
877 | "\n",
878 | "end = time.time()\n",
879 | "print('Time taken to generate feature vectors for the test images (minutes): ', (end - start)/60)"
880 | ]
881 | },
882 | {
883 | "cell_type": "markdown",
884 | "metadata": {
885 | "id": "BGvef8EKV75r"
886 | },
887 | "source": [
888 | "#### Method 2: Feature Extraction (with TensorFlow's batch processing)"
889 | ]
890 | },
891 | {
892 | "cell_type": "code",
893 | "execution_count": null,
894 | "metadata": {
895 | "colab": {
896 | "base_uri": "https://localhost:8080/"
897 | },
898 | "id": "nAN0se2UIBBR",
899 | "outputId": "081e2f9e-c868-4d75-f1a9-499c73990f4b"
900 | },
901 | "outputs": [
902 | {
903 | "name": "stdout",
904 | "output_type": "stream",
905 | "text": [
906 | "Model: \"model\"\n",
907 | "__________________________________________________________________________________________________\n",
908 | "Layer (type) Output Shape Param # Connected to \n",
909 | "==================================================================================================\n",
910 | "input_1 (InputLayer) [(None, None, None, 0 \n",
911 | "__________________________________________________________________________________________________\n",
912 | "conv2d (Conv2D) (None, None, None, 3 864 input_1[0][0] \n",
913 | "__________________________________________________________________________________________________\n",
914 | "batch_normalization (BatchNorma (None, None, None, 3 96 conv2d[0][0] \n",
915 | "__________________________________________________________________________________________________\n",
916 | "activation (Activation) (None, None, None, 3 0 batch_normalization[0][0] \n",
917 | "__________________________________________________________________________________________________\n",
918 | "conv2d_1 (Conv2D) (None, None, None, 3 9216 activation[0][0] \n",
919 | "__________________________________________________________________________________________________\n",
920 | "batch_normalization_1 (BatchNor (None, None, None, 3 96 conv2d_1[0][0] \n",
921 | "__________________________________________________________________________________________________\n",
922 | "activation_1 (Activation) (None, None, None, 3 0 batch_normalization_1[0][0] \n",
923 | "__________________________________________________________________________________________________\n",
924 | "conv2d_2 (Conv2D) (None, None, None, 6 18432 activation_1[0][0] \n",
925 | "__________________________________________________________________________________________________\n",
926 | "batch_normalization_2 (BatchNor (None, None, None, 6 192 conv2d_2[0][0] \n",
927 | "__________________________________________________________________________________________________\n",
928 | "activation_2 (Activation) (None, None, None, 6 0 batch_normalization_2[0][0] \n",
929 | "__________________________________________________________________________________________________\n",
930 | "max_pooling2d (MaxPooling2D) (None, None, None, 6 0 activation_2[0][0] \n",
931 | "__________________________________________________________________________________________________\n",
932 | "conv2d_3 (Conv2D) (None, None, None, 8 5120 max_pooling2d[0][0] \n",
933 | "__________________________________________________________________________________________________\n",
934 | "batch_normalization_3 (BatchNor (None, None, None, 8 240 conv2d_3[0][0] \n",
935 | "__________________________________________________________________________________________________\n",
936 | "activation_3 (Activation) (None, None, None, 8 0 batch_normalization_3[0][0] \n",
937 | "__________________________________________________________________________________________________\n",
938 | "conv2d_4 (Conv2D) (None, None, None, 1 138240 activation_3[0][0] \n",
939 | "__________________________________________________________________________________________________\n",
940 | "batch_normalization_4 (BatchNor (None, None, None, 1 576 conv2d_4[0][0] \n",
941 | "__________________________________________________________________________________________________\n",
942 | "activation_4 (Activation) (None, None, None, 1 0 batch_normalization_4[0][0] \n",
943 | "__________________________________________________________________________________________________\n",
944 | "max_pooling2d_1 (MaxPooling2D) (None, None, None, 1 0 activation_4[0][0] \n",
945 | "__________________________________________________________________________________________________\n",
946 | "conv2d_8 (Conv2D) (None, None, None, 6 12288 max_pooling2d_1[0][0] \n",
947 | "__________________________________________________________________________________________________\n",
948 | "batch_normalization_8 (BatchNor (None, None, None, 6 192 conv2d_8[0][0] \n",
949 | "__________________________________________________________________________________________________\n",
950 | "activation_8 (Activation) (None, None, None, 6 0 batch_normalization_8[0][0] \n",
951 | "__________________________________________________________________________________________________\n",
952 | "conv2d_6 (Conv2D) (None, None, None, 4 9216 max_pooling2d_1[0][0] \n",
953 | "__________________________________________________________________________________________________\n",
954 | "conv2d_9 (Conv2D) (None, None, None, 9 55296 activation_8[0][0] \n",
955 | "__________________________________________________________________________________________________\n",
956 | "batch_normalization_6 (BatchNor (None, None, None, 4 144 conv2d_6[0][0] \n",
957 | "__________________________________________________________________________________________________\n",
958 | "batch_normalization_9 (BatchNor (None, None, None, 9 288 conv2d_9[0][0] \n",
959 | "__________________________________________________________________________________________________\n",
960 | "activation_6 (Activation) (None, None, None, 4 0 batch_normalization_6[0][0] \n",
961 | "__________________________________________________________________________________________________\n",
962 | "activation_9 (Activation) (None, None, None, 9 0 batch_normalization_9[0][0] \n",
963 | "__________________________________________________________________________________________________\n",
964 | "average_pooling2d (AveragePooli (None, None, None, 1 0 max_pooling2d_1[0][0] \n",
965 | "__________________________________________________________________________________________________\n",
966 | "conv2d_5 (Conv2D) (None, None, None, 6 12288 max_pooling2d_1[0][0] \n",
967 | "__________________________________________________________________________________________________\n",
968 | "conv2d_7 (Conv2D) (None, None, None, 6 76800 activation_6[0][0] \n",
969 | "__________________________________________________________________________________________________\n",
970 | "conv2d_10 (Conv2D) (None, None, None, 9 82944 activation_9[0][0] \n",
971 | "__________________________________________________________________________________________________\n",
972 | "conv2d_11 (Conv2D) (None, None, None, 3 6144 average_pooling2d[0][0] \n",
973 | "__________________________________________________________________________________________________\n",
974 | "batch_normalization_5 (BatchNor (None, None, None, 6 192 conv2d_5[0][0] \n",
975 | "__________________________________________________________________________________________________\n",
976 | "batch_normalization_7 (BatchNor (None, None, None, 6 192 conv2d_7[0][0] \n",
977 | "__________________________________________________________________________________________________\n",
978 | "batch_normalization_10 (BatchNo (None, None, None, 9 288 conv2d_10[0][0] \n",
979 | "__________________________________________________________________________________________________\n",
980 | "batch_normalization_11 (BatchNo (None, None, None, 3 96 conv2d_11[0][0] \n",
981 | "__________________________________________________________________________________________________\n",
982 | "activation_5 (Activation) (None, None, None, 6 0 batch_normalization_5[0][0] \n",
983 | "__________________________________________________________________________________________________\n",
984 | "activation_7 (Activation) (None, None, None, 6 0 batch_normalization_7[0][0] \n",
985 | "__________________________________________________________________________________________________\n",
986 | "activation_10 (Activation) (None, None, None, 9 0 batch_normalization_10[0][0] \n",
987 | "__________________________________________________________________________________________________\n",
988 | "activation_11 (Activation) (None, None, None, 3 0 batch_normalization_11[0][0] \n",
989 | "__________________________________________________________________________________________________\n",
990 | "mixed0 (Concatenate) (None, None, None, 2 0 activation_5[0][0] \n",
991 | " activation_7[0][0] \n",
992 | " activation_10[0][0] \n",
993 | " activation_11[0][0] \n",
994 | "__________________________________________________________________________________________________\n",
995 | "conv2d_15 (Conv2D) (None, None, None, 6 16384 mixed0[0][0] \n",
996 | "__________________________________________________________________________________________________\n",
997 | "batch_normalization_15 (BatchNo (None, None, None, 6 192 conv2d_15[0][0] \n",
998 | "__________________________________________________________________________________________________\n",
999 | "activation_15 (Activation) (None, None, None, 6 0 batch_normalization_15[0][0] \n",
1000 | "__________________________________________________________________________________________________\n",
1001 | "conv2d_13 (Conv2D) (None, None, None, 4 12288 mixed0[0][0] \n",
1002 | "__________________________________________________________________________________________________\n",
1003 | "conv2d_16 (Conv2D) (None, None, None, 9 55296 activation_15[0][0] \n",
1004 | "__________________________________________________________________________________________________\n",
1005 | "batch_normalization_13 (BatchNo (None, None, None, 4 144 conv2d_13[0][0] \n",
1006 | "__________________________________________________________________________________________________\n",
1007 | "batch_normalization_16 (BatchNo (None, None, None, 9 288 conv2d_16[0][0] \n",
1008 | "__________________________________________________________________________________________________\n",
1009 | "activation_13 (Activation) (None, None, None, 4 0 batch_normalization_13[0][0] \n",
1010 | "__________________________________________________________________________________________________\n",
1011 | "activation_16 (Activation) (None, None, None, 9 0 batch_normalization_16[0][0] \n",
1012 | "__________________________________________________________________________________________________\n",
1013 | "average_pooling2d_1 (AveragePoo (None, None, None, 2 0 mixed0[0][0] \n",
1014 | "__________________________________________________________________________________________________\n",
1015 | "conv2d_12 (Conv2D) (None, None, None, 6 16384 mixed0[0][0] \n",
1016 | "__________________________________________________________________________________________________\n",
1017 | "conv2d_14 (Conv2D) (None, None, None, 6 76800 activation_13[0][0] \n",
1018 | "__________________________________________________________________________________________________\n",
1019 | "conv2d_17 (Conv2D) (None, None, None, 9 82944 activation_16[0][0] \n",
1020 | "__________________________________________________________________________________________________\n",
1021 | "conv2d_18 (Conv2D) (None, None, None, 6 16384 average_pooling2d_1[0][0] \n",
1022 | "__________________________________________________________________________________________________\n",
1023 | "batch_normalization_12 (BatchNo (None, None, None, 6 192 conv2d_12[0][0] \n",
1024 | "__________________________________________________________________________________________________\n",
1025 | "batch_normalization_14 (BatchNo (None, None, None, 6 192 conv2d_14[0][0] \n",
1026 | "__________________________________________________________________________________________________\n",
1027 | "batch_normalization_17 (BatchNo (None, None, None, 9 288 conv2d_17[0][0] \n",
1028 | "__________________________________________________________________________________________________\n",
1029 | "batch_normalization_18 (BatchNo (None, None, None, 6 192 conv2d_18[0][0] \n",
1030 | "__________________________________________________________________________________________________\n",
1031 | "activation_12 (Activation) (None, None, None, 6 0 batch_normalization_12[0][0] \n",
1032 | "__________________________________________________________________________________________________\n",
1033 | "activation_14 (Activation) (None, None, None, 6 0 batch_normalization_14[0][0] \n",
1034 | "__________________________________________________________________________________________________\n",
1035 | "activation_17 (Activation) (None, None, None, 9 0 batch_normalization_17[0][0] \n",
1036 | "__________________________________________________________________________________________________\n",
1037 | "activation_18 (Activation) (None, None, None, 6 0 batch_normalization_18[0][0] \n",
1038 | "__________________________________________________________________________________________________\n",
1039 | "mixed1 (Concatenate) (None, None, None, 2 0 activation_12[0][0] \n",
1040 | " activation_14[0][0] \n",
1041 | " activation_17[0][0] \n",
1042 | " activation_18[0][0] \n",
1043 | "__________________________________________________________________________________________________\n",
1044 | "conv2d_22 (Conv2D) (None, None, None, 6 18432 mixed1[0][0] \n",
1045 | "__________________________________________________________________________________________________\n",
1046 | "batch_normalization_22 (BatchNo (None, None, None, 6 192 conv2d_22[0][0] \n",
1047 | "__________________________________________________________________________________________________\n",
1048 | "activation_22 (Activation) (None, None, None, 6 0 batch_normalization_22[0][0] \n",
1049 | "__________________________________________________________________________________________________\n",
1050 | "conv2d_20 (Conv2D) (None, None, None, 4 13824 mixed1[0][0] \n",
1051 | "__________________________________________________________________________________________________\n",
1052 | "conv2d_23 (Conv2D) (None, None, None, 9 55296 activation_22[0][0] \n",
1053 | "__________________________________________________________________________________________________\n",
1054 | "batch_normalization_20 (BatchNo (None, None, None, 4 144 conv2d_20[0][0] \n",
1055 | "__________________________________________________________________________________________________\n",
1056 | "batch_normalization_23 (BatchNo (None, None, None, 9 288 conv2d_23[0][0] \n",
1057 | "__________________________________________________________________________________________________\n",
1058 | "activation_20 (Activation) (None, None, None, 4 0 batch_normalization_20[0][0] \n",
1059 | "__________________________________________________________________________________________________\n",
1060 | "activation_23 (Activation) (None, None, None, 9 0 batch_normalization_23[0][0] \n",
1061 | "__________________________________________________________________________________________________\n",
1062 | "average_pooling2d_2 (AveragePoo (None, None, None, 2 0 mixed1[0][0] \n",
1063 | "__________________________________________________________________________________________________\n",
1064 | "conv2d_19 (Conv2D) (None, None, None, 6 18432 mixed1[0][0] \n",
1065 | "__________________________________________________________________________________________________\n",
1066 | "conv2d_21 (Conv2D) (None, None, None, 6 76800 activation_20[0][0] \n",
1067 | "__________________________________________________________________________________________________\n",
1068 | "conv2d_24 (Conv2D) (None, None, None, 9 82944 activation_23[0][0] \n",
1069 | "__________________________________________________________________________________________________\n",
1070 | "conv2d_25 (Conv2D) (None, None, None, 6 18432 average_pooling2d_2[0][0] \n",
1071 | "__________________________________________________________________________________________________\n",
1072 | "batch_normalization_19 (BatchNo (None, None, None, 6 192 conv2d_19[0][0] \n",
1073 | "__________________________________________________________________________________________________\n",
1074 | "batch_normalization_21 (BatchNo (None, None, None, 6 192 conv2d_21[0][0] \n",
1075 | "__________________________________________________________________________________________________\n",
1076 | "batch_normalization_24 (BatchNo (None, None, None, 9 288 conv2d_24[0][0] \n",
1077 | "__________________________________________________________________________________________________\n",
1078 | "batch_normalization_25 (BatchNo (None, None, None, 6 192 conv2d_25[0][0] \n",
1079 | "__________________________________________________________________________________________________\n",
1080 | "activation_19 (Activation) (None, None, None, 6 0 batch_normalization_19[0][0] \n",
1081 | "__________________________________________________________________________________________________\n",
1082 | "activation_21 (Activation) (None, None, None, 6 0 batch_normalization_21[0][0] \n",
1083 | "__________________________________________________________________________________________________\n",
1084 | "activation_24 (Activation) (None, None, None, 9 0 batch_normalization_24[0][0] \n",
1085 | "__________________________________________________________________________________________________\n",
1086 | "activation_25 (Activation) (None, None, None, 6 0 batch_normalization_25[0][0] \n",
1087 | "__________________________________________________________________________________________________\n",
1088 | "mixed2 (Concatenate) (None, None, None, 2 0 activation_19[0][0] \n",
1089 | " activation_21[0][0] \n",
1090 | " activation_24[0][0] \n",
1091 | " activation_25[0][0] \n",
1092 | "__________________________________________________________________________________________________\n",
1093 | "conv2d_27 (Conv2D) (None, None, None, 6 18432 mixed2[0][0] \n",
1094 | "__________________________________________________________________________________________________\n",
1095 | "batch_normalization_27 (BatchNo (None, None, None, 6 192 conv2d_27[0][0] \n",
1096 | "__________________________________________________________________________________________________\n",
1097 | "activation_27 (Activation) (None, None, None, 6 0 batch_normalization_27[0][0] \n",
1098 | "__________________________________________________________________________________________________\n",
1099 | "conv2d_28 (Conv2D) (None, None, None, 9 55296 activation_27[0][0] \n",
1100 | "__________________________________________________________________________________________________\n",
1101 | "batch_normalization_28 (BatchNo (None, None, None, 9 288 conv2d_28[0][0] \n",
1102 | "__________________________________________________________________________________________________\n",
1103 | "activation_28 (Activation) (None, None, None, 9 0 batch_normalization_28[0][0] \n",
1104 | "__________________________________________________________________________________________________\n",
1105 | "conv2d_26 (Conv2D) (None, None, None, 3 995328 mixed2[0][0] \n",
1106 | "__________________________________________________________________________________________________\n",
1107 | "conv2d_29 (Conv2D) (None, None, None, 9 82944 activation_28[0][0] \n",
1108 | "__________________________________________________________________________________________________\n",
1109 | "batch_normalization_26 (BatchNo (None, None, None, 3 1152 conv2d_26[0][0] \n",
1110 | "__________________________________________________________________________________________________\n",
1111 | "batch_normalization_29 (BatchNo (None, None, None, 9 288 conv2d_29[0][0] \n",
1112 | "__________________________________________________________________________________________________\n",
1113 | "activation_26 (Activation) (None, None, None, 3 0 batch_normalization_26[0][0] \n",
1114 | "__________________________________________________________________________________________________\n",
1115 | "activation_29 (Activation) (None, None, None, 9 0 batch_normalization_29[0][0] \n",
1116 | "__________________________________________________________________________________________________\n",
1117 | "max_pooling2d_2 (MaxPooling2D) (None, None, None, 2 0 mixed2[0][0] \n",
1118 | "__________________________________________________________________________________________________\n",
1119 | "mixed3 (Concatenate) (None, None, None, 7 0 activation_26[0][0] \n",
1120 | " activation_29[0][0] \n",
1121 | " max_pooling2d_2[0][0] \n",
1122 | "__________________________________________________________________________________________________\n",
1123 | "conv2d_34 (Conv2D) (None, None, None, 1 98304 mixed3[0][0] \n",
1124 | "__________________________________________________________________________________________________\n",
1125 | "batch_normalization_34 (BatchNo (None, None, None, 1 384 conv2d_34[0][0] \n",
1126 | "__________________________________________________________________________________________________\n",
1127 | "activation_34 (Activation) (None, None, None, 1 0 batch_normalization_34[0][0] \n",
1128 | "__________________________________________________________________________________________________\n",
1129 | "conv2d_35 (Conv2D) (None, None, None, 1 114688 activation_34[0][0] \n",
1130 | "__________________________________________________________________________________________________\n",
1131 | "batch_normalization_35 (BatchNo (None, None, None, 1 384 conv2d_35[0][0] \n",
1132 | "__________________________________________________________________________________________________\n",
1133 | "activation_35 (Activation) (None, None, None, 1 0 batch_normalization_35[0][0] \n",
1134 | "__________________________________________________________________________________________________\n",
1135 | "conv2d_31 (Conv2D) (None, None, None, 1 98304 mixed3[0][0] \n",
1136 | "__________________________________________________________________________________________________\n",
1137 | "conv2d_36 (Conv2D) (None, None, None, 1 114688 activation_35[0][0] \n",
1138 | "__________________________________________________________________________________________________\n",
1139 | "batch_normalization_31 (BatchNo (None, None, None, 1 384 conv2d_31[0][0] \n",
1140 | "__________________________________________________________________________________________________\n",
1141 | "batch_normalization_36 (BatchNo (None, None, None, 1 384 conv2d_36[0][0] \n",
1142 | "__________________________________________________________________________________________________\n",
1143 | "activation_31 (Activation) (None, None, None, 1 0 batch_normalization_31[0][0] \n",
1144 | "__________________________________________________________________________________________________\n",
1145 | "activation_36 (Activation) (None, None, None, 1 0 batch_normalization_36[0][0] \n",
1146 | "__________________________________________________________________________________________________\n",
1147 | "conv2d_32 (Conv2D) (None, None, None, 1 114688 activation_31[0][0] \n",
1148 | "__________________________________________________________________________________________________\n",
1149 | "conv2d_37 (Conv2D) (None, None, None, 1 114688 activation_36[0][0] \n",
1150 | "__________________________________________________________________________________________________\n",
1151 | "batch_normalization_32 (BatchNo (None, None, None, 1 384 conv2d_32[0][0] \n",
1152 | "__________________________________________________________________________________________________\n",
1153 | "batch_normalization_37 (BatchNo (None, None, None, 1 384 conv2d_37[0][0] \n",
1154 | "__________________________________________________________________________________________________\n",
1155 | "activation_32 (Activation) (None, None, None, 1 0 batch_normalization_32[0][0] \n",
1156 | "__________________________________________________________________________________________________\n",
1157 | "activation_37 (Activation) (None, None, None, 1 0 batch_normalization_37[0][0] \n",
1158 | "__________________________________________________________________________________________________\n",
1159 | "average_pooling2d_3 (AveragePoo (None, None, None, 7 0 mixed3[0][0] \n",
1160 | "__________________________________________________________________________________________________\n",
1161 | "conv2d_30 (Conv2D) (None, None, None, 1 147456 mixed3[0][0] \n",
1162 | "__________________________________________________________________________________________________\n",
1163 | "conv2d_33 (Conv2D) (None, None, None, 1 172032 activation_32[0][0] \n",
1164 | "__________________________________________________________________________________________________\n",
1165 | "conv2d_38 (Conv2D) (None, None, None, 1 172032 activation_37[0][0] \n",
1166 | "__________________________________________________________________________________________________\n",
1167 | "conv2d_39 (Conv2D) (None, None, None, 1 147456 average_pooling2d_3[0][0] \n",
1168 | "__________________________________________________________________________________________________\n",
1169 | "batch_normalization_30 (BatchNo (None, None, None, 1 576 conv2d_30[0][0] \n",
1170 | "__________________________________________________________________________________________________\n",
1171 | "batch_normalization_33 (BatchNo (None, None, None, 1 576 conv2d_33[0][0] \n",
1172 | "__________________________________________________________________________________________________\n",
1173 | "batch_normalization_38 (BatchNo (None, None, None, 1 576 conv2d_38[0][0] \n",
1174 | "__________________________________________________________________________________________________\n",
1175 | "batch_normalization_39 (BatchNo (None, None, None, 1 576 conv2d_39[0][0] \n",
1176 | "__________________________________________________________________________________________________\n",
1177 | "activation_30 (Activation) (None, None, None, 1 0 batch_normalization_30[0][0] \n",
1178 | "__________________________________________________________________________________________________\n",
1179 | "activation_33 (Activation) (None, None, None, 1 0 batch_normalization_33[0][0] \n",
1180 | "__________________________________________________________________________________________________\n",
1181 | "activation_38 (Activation) (None, None, None, 1 0 batch_normalization_38[0][0] \n",
1182 | "__________________________________________________________________________________________________\n",
1183 | "activation_39 (Activation) (None, None, None, 1 0 batch_normalization_39[0][0] \n",
1184 | "__________________________________________________________________________________________________\n",
1185 | "mixed4 (Concatenate) (None, None, None, 7 0 activation_30[0][0] \n",
1186 | " activation_33[0][0] \n",
1187 | " activation_38[0][0] \n",
1188 | " activation_39[0][0] \n",
1189 | "__________________________________________________________________________________________________\n",
1190 | "conv2d_44 (Conv2D) (None, None, None, 1 122880 mixed4[0][0] \n",
1191 | "__________________________________________________________________________________________________\n",
1192 | "batch_normalization_44 (BatchNo (None, None, None, 1 480 conv2d_44[0][0] \n",
1193 | "__________________________________________________________________________________________________\n",
1194 | "activation_44 (Activation) (None, None, None, 1 0 batch_normalization_44[0][0] \n",
1195 | "__________________________________________________________________________________________________\n",
1196 | "conv2d_45 (Conv2D) (None, None, None, 1 179200 activation_44[0][0] \n",
1197 | "__________________________________________________________________________________________________\n",
1198 | "batch_normalization_45 (BatchNo (None, None, None, 1 480 conv2d_45[0][0] \n",
1199 | "__________________________________________________________________________________________________\n",
1200 | "activation_45 (Activation) (None, None, None, 1 0 batch_normalization_45[0][0] \n",
1201 | "__________________________________________________________________________________________________\n",
1202 | "conv2d_41 (Conv2D) (None, None, None, 1 122880 mixed4[0][0] \n",
1203 | "__________________________________________________________________________________________________\n",
1204 | "conv2d_46 (Conv2D) (None, None, None, 1 179200 activation_45[0][0] \n",
1205 | "__________________________________________________________________________________________________\n",
1206 | "batch_normalization_41 (BatchNo (None, None, None, 1 480 conv2d_41[0][0] \n",
1207 | "__________________________________________________________________________________________________\n",
1208 | "batch_normalization_46 (BatchNo (None, None, None, 1 480 conv2d_46[0][0] \n",
1209 | "__________________________________________________________________________________________________\n",
1210 | "activation_41 (Activation) (None, None, None, 1 0 batch_normalization_41[0][0] \n",
1211 | "__________________________________________________________________________________________________\n",
1212 | "activation_46 (Activation) (None, None, None, 1 0 batch_normalization_46[0][0] \n",
1213 | "__________________________________________________________________________________________________\n",
1214 | "conv2d_42 (Conv2D) (None, None, None, 1 179200 activation_41[0][0] \n",
1215 | "__________________________________________________________________________________________________\n",
1216 | "conv2d_47 (Conv2D) (None, None, None, 1 179200 activation_46[0][0] \n",
1217 | "__________________________________________________________________________________________________\n",
1218 | "batch_normalization_42 (BatchNo (None, None, None, 1 480 conv2d_42[0][0] \n",
1219 | "__________________________________________________________________________________________________\n",
1220 | "batch_normalization_47 (BatchNo (None, None, None, 1 480 conv2d_47[0][0] \n",
1221 | "__________________________________________________________________________________________________\n",
1222 | "activation_42 (Activation) (None, None, None, 1 0 batch_normalization_42[0][0] \n",
1223 | "__________________________________________________________________________________________________\n",
1224 | "activation_47 (Activation) (None, None, None, 1 0 batch_normalization_47[0][0] \n",
1225 | "__________________________________________________________________________________________________\n",
1226 | "average_pooling2d_4 (AveragePoo (None, None, None, 7 0 mixed4[0][0] \n",
1227 | "__________________________________________________________________________________________________\n",
1228 | "conv2d_40 (Conv2D) (None, None, None, 1 147456 mixed4[0][0] \n",
1229 | "__________________________________________________________________________________________________\n",
1230 | "conv2d_43 (Conv2D) (None, None, None, 1 215040 activation_42[0][0] \n",
1231 | "__________________________________________________________________________________________________\n",
1232 | "conv2d_48 (Conv2D) (None, None, None, 1 215040 activation_47[0][0] \n",
1233 | "__________________________________________________________________________________________________\n",
1234 | "conv2d_49 (Conv2D) (None, None, None, 1 147456 average_pooling2d_4[0][0] \n",
1235 | "__________________________________________________________________________________________________\n",
1236 | "batch_normalization_40 (BatchNo (None, None, None, 1 576 conv2d_40[0][0] \n",
1237 | "__________________________________________________________________________________________________\n",
1238 | "batch_normalization_43 (BatchNo (None, None, None, 1 576 conv2d_43[0][0] \n",
1239 | "__________________________________________________________________________________________________\n",
1240 | "batch_normalization_48 (BatchNo (None, None, None, 1 576 conv2d_48[0][0] \n",
1241 | "__________________________________________________________________________________________________\n",
1242 | "batch_normalization_49 (BatchNo (None, None, None, 1 576 conv2d_49[0][0] \n",
1243 | "__________________________________________________________________________________________________\n",
1244 | "activation_40 (Activation) (None, None, None, 1 0 batch_normalization_40[0][0] \n",
1245 | "__________________________________________________________________________________________________\n",
1246 | "activation_43 (Activation) (None, None, None, 1 0 batch_normalization_43[0][0] \n",
1247 | "__________________________________________________________________________________________________\n",
1248 | "activation_48 (Activation) (None, None, None, 1 0 batch_normalization_48[0][0] \n",
1249 | "__________________________________________________________________________________________________\n",
1250 | "activation_49 (Activation) (None, None, None, 1 0 batch_normalization_49[0][0] \n",
1251 | "__________________________________________________________________________________________________\n",
1252 | "mixed5 (Concatenate) (None, None, None, 7 0 activation_40[0][0] \n",
1253 | " activation_43[0][0] \n",
1254 | " activation_48[0][0] \n",
1255 | " activation_49[0][0] \n",
1256 | "__________________________________________________________________________________________________\n",
1257 | "conv2d_54 (Conv2D) (None, None, None, 1 122880 mixed5[0][0] \n",
1258 | "__________________________________________________________________________________________________\n",
1259 | "batch_normalization_54 (BatchNo (None, None, None, 1 480 conv2d_54[0][0] \n",
1260 | "__________________________________________________________________________________________________\n",
1261 | "activation_54 (Activation) (None, None, None, 1 0 batch_normalization_54[0][0] \n",
1262 | "__________________________________________________________________________________________________\n",
1263 | "conv2d_55 (Conv2D) (None, None, None, 1 179200 activation_54[0][0] \n",
1264 | "__________________________________________________________________________________________________\n",
1265 | "batch_normalization_55 (BatchNo (None, None, None, 1 480 conv2d_55[0][0] \n",
1266 | "__________________________________________________________________________________________________\n",
1267 | "activation_55 (Activation) (None, None, None, 1 0 batch_normalization_55[0][0] \n",
1268 | "__________________________________________________________________________________________________\n",
1269 | "conv2d_51 (Conv2D) (None, None, None, 1 122880 mixed5[0][0] \n",
1270 | "__________________________________________________________________________________________________\n",
1271 | "conv2d_56 (Conv2D) (None, None, None, 1 179200 activation_55[0][0] \n",
1272 | "__________________________________________________________________________________________________\n",
1273 | "batch_normalization_51 (BatchNo (None, None, None, 1 480 conv2d_51[0][0] \n",
1274 | "__________________________________________________________________________________________________\n",
1275 | "batch_normalization_56 (BatchNo (None, None, None, 1 480 conv2d_56[0][0] \n",
1276 | "__________________________________________________________________________________________________\n",
1277 | "activation_51 (Activation) (None, None, None, 1 0 batch_normalization_51[0][0] \n",
1278 | "__________________________________________________________________________________________________\n",
1279 | "activation_56 (Activation) (None, None, None, 1 0 batch_normalization_56[0][0] \n",
1280 | "__________________________________________________________________________________________________\n",
1281 | "conv2d_52 (Conv2D) (None, None, None, 1 179200 activation_51[0][0] \n",
1282 | "__________________________________________________________________________________________________\n",
1283 | "conv2d_57 (Conv2D) (None, None, None, 1 179200 activation_56[0][0] \n",
1284 | "__________________________________________________________________________________________________\n",
1285 | "batch_normalization_52 (BatchNo (None, None, None, 1 480 conv2d_52[0][0] \n",
1286 | "__________________________________________________________________________________________________\n",
1287 | "batch_normalization_57 (BatchNo (None, None, None, 1 480 conv2d_57[0][0] \n",
1288 | "__________________________________________________________________________________________________\n",
1289 | "activation_52 (Activation) (None, None, None, 1 0 batch_normalization_52[0][0] \n",
1290 | "__________________________________________________________________________________________________\n",
1291 | "activation_57 (Activation) (None, None, None, 1 0 batch_normalization_57[0][0] \n",
1292 | "__________________________________________________________________________________________________\n",
1293 | "average_pooling2d_5 (AveragePoo (None, None, None, 7 0 mixed5[0][0] \n",
1294 | "__________________________________________________________________________________________________\n",
1295 | "conv2d_50 (Conv2D) (None, None, None, 1 147456 mixed5[0][0] \n",
1296 | "__________________________________________________________________________________________________\n",
1297 | "conv2d_53 (Conv2D) (None, None, None, 1 215040 activation_52[0][0] \n",
1298 | "__________________________________________________________________________________________________\n",
1299 | "conv2d_58 (Conv2D) (None, None, None, 1 215040 activation_57[0][0] \n",
1300 | "__________________________________________________________________________________________________\n",
1301 | "conv2d_59 (Conv2D) (None, None, None, 1 147456 average_pooling2d_5[0][0] \n",
1302 | "__________________________________________________________________________________________________\n",
1303 | "batch_normalization_50 (BatchNo (None, None, None, 1 576 conv2d_50[0][0] \n",
1304 | "__________________________________________________________________________________________________\n",
1305 | "batch_normalization_53 (BatchNo (None, None, None, 1 576 conv2d_53[0][0] \n",
1306 | "__________________________________________________________________________________________________\n",
1307 | "batch_normalization_58 (BatchNo (None, None, None, 1 576 conv2d_58[0][0] \n",
1308 | "__________________________________________________________________________________________________\n",
1309 | "batch_normalization_59 (BatchNo (None, None, None, 1 576 conv2d_59[0][0] \n",
1310 | "__________________________________________________________________________________________________\n",
1311 | "activation_50 (Activation) (None, None, None, 1 0 batch_normalization_50[0][0] \n",
1312 | "__________________________________________________________________________________________________\n",
1313 | "activation_53 (Activation) (None, None, None, 1 0 batch_normalization_53[0][0] \n",
1314 | "__________________________________________________________________________________________________\n",
1315 | "activation_58 (Activation) (None, None, None, 1 0 batch_normalization_58[0][0] \n",
1316 | "__________________________________________________________________________________________________\n",
1317 | "activation_59 (Activation) (None, None, None, 1 0 batch_normalization_59[0][0] \n",
1318 | "__________________________________________________________________________________________________\n",
1319 | "mixed6 (Concatenate) (None, None, None, 7 0 activation_50[0][0] \n",
1320 | " activation_53[0][0] \n",
1321 | " activation_58[0][0] \n",
1322 | " activation_59[0][0] \n",
1323 | "__________________________________________________________________________________________________\n",
1324 | "conv2d_64 (Conv2D) (None, None, None, 1 147456 mixed6[0][0] \n",
1325 | "__________________________________________________________________________________________________\n",
1326 | "batch_normalization_64 (BatchNo (None, None, None, 1 576 conv2d_64[0][0] \n",
1327 | "__________________________________________________________________________________________________\n",
1328 | "activation_64 (Activation) (None, None, None, 1 0 batch_normalization_64[0][0] \n",
1329 | "__________________________________________________________________________________________________\n",
1330 | "conv2d_65 (Conv2D) (None, None, None, 1 258048 activation_64[0][0] \n",
1331 | "__________________________________________________________________________________________________\n",
1332 | "batch_normalization_65 (BatchNo (None, None, None, 1 576 conv2d_65[0][0] \n",
1333 | "__________________________________________________________________________________________________\n",
1334 | "activation_65 (Activation) (None, None, None, 1 0 batch_normalization_65[0][0] \n",
1335 | "__________________________________________________________________________________________________\n",
1336 | "conv2d_61 (Conv2D) (None, None, None, 1 147456 mixed6[0][0] \n",
1337 | "__________________________________________________________________________________________________\n",
1338 | "conv2d_66 (Conv2D) (None, None, None, 1 258048 activation_65[0][0] \n",
1339 | "__________________________________________________________________________________________________\n",
1340 | "batch_normalization_61 (BatchNo (None, None, None, 1 576 conv2d_61[0][0] \n",
1341 | "__________________________________________________________________________________________________\n",
1342 | "batch_normalization_66 (BatchNo (None, None, None, 1 576 conv2d_66[0][0] \n",
1343 | "__________________________________________________________________________________________________\n",
1344 | "activation_61 (Activation) (None, None, None, 1 0 batch_normalization_61[0][0] \n",
1345 | "__________________________________________________________________________________________________\n",
1346 | "activation_66 (Activation) (None, None, None, 1 0 batch_normalization_66[0][0] \n",
1347 | "__________________________________________________________________________________________________\n",
1348 | "conv2d_62 (Conv2D) (None, None, None, 1 258048 activation_61[0][0] \n",
1349 | "__________________________________________________________________________________________________\n",
1350 | "conv2d_67 (Conv2D) (None, None, None, 1 258048 activation_66[0][0] \n",
1351 | "__________________________________________________________________________________________________\n",
1352 | "batch_normalization_62 (BatchNo (None, None, None, 1 576 conv2d_62[0][0] \n",
1353 | "__________________________________________________________________________________________________\n",
1354 | "batch_normalization_67 (BatchNo (None, None, None, 1 576 conv2d_67[0][0] \n",
1355 | "__________________________________________________________________________________________________\n",
1356 | "activation_62 (Activation) (None, None, None, 1 0 batch_normalization_62[0][0] \n",
1357 | "__________________________________________________________________________________________________\n",
1358 | "activation_67 (Activation) (None, None, None, 1 0 batch_normalization_67[0][0] \n",
1359 | "__________________________________________________________________________________________________\n",
1360 | "average_pooling2d_6 (AveragePoo (None, None, None, 7 0 mixed6[0][0] \n",
1361 | "__________________________________________________________________________________________________\n",
1362 | "conv2d_60 (Conv2D) (None, None, None, 1 147456 mixed6[0][0] \n",
1363 | "__________________________________________________________________________________________________\n",
1364 | "conv2d_63 (Conv2D) (None, None, None, 1 258048 activation_62[0][0] \n",
1365 | "__________________________________________________________________________________________________\n",
1366 | "conv2d_68 (Conv2D) (None, None, None, 1 258048 activation_67[0][0] \n",
1367 | "__________________________________________________________________________________________________\n",
1368 | "conv2d_69 (Conv2D) (None, None, None, 1 147456 average_pooling2d_6[0][0] \n",
1369 | "__________________________________________________________________________________________________\n",
1370 | "batch_normalization_60 (BatchNo (None, None, None, 1 576 conv2d_60[0][0] \n",
1371 | "__________________________________________________________________________________________________\n",
1372 | "batch_normalization_63 (BatchNo (None, None, None, 1 576 conv2d_63[0][0] \n",
1373 | "__________________________________________________________________________________________________\n",
1374 | "batch_normalization_68 (BatchNo (None, None, None, 1 576 conv2d_68[0][0] \n",
1375 | "__________________________________________________________________________________________________\n",
1376 | "batch_normalization_69 (BatchNo (None, None, None, 1 576 conv2d_69[0][0] \n",
1377 | "__________________________________________________________________________________________________\n",
1378 | "activation_60 (Activation) (None, None, None, 1 0 batch_normalization_60[0][0] \n",
1379 | "__________________________________________________________________________________________________\n",
1380 | "activation_63 (Activation) (None, None, None, 1 0 batch_normalization_63[0][0] \n",
1381 | "__________________________________________________________________________________________________\n",
1382 | "activation_68 (Activation) (None, None, None, 1 0 batch_normalization_68[0][0] \n",
1383 | "__________________________________________________________________________________________________\n",
1384 | "activation_69 (Activation) (None, None, None, 1 0 batch_normalization_69[0][0] \n",
1385 | "__________________________________________________________________________________________________\n",
1386 | "mixed7 (Concatenate) (None, None, None, 7 0 activation_60[0][0] \n",
1387 | " activation_63[0][0] \n",
1388 | " activation_68[0][0] \n",
1389 | " activation_69[0][0] \n",
1390 | "__________________________________________________________________________________________________\n",
1391 | "conv2d_72 (Conv2D) (None, None, None, 1 147456 mixed7[0][0] \n",
1392 | "__________________________________________________________________________________________________\n",
1393 | "batch_normalization_72 (BatchNo (None, None, None, 1 576 conv2d_72[0][0] \n",
1394 | "__________________________________________________________________________________________________\n",
1395 | "activation_72 (Activation) (None, None, None, 1 0 batch_normalization_72[0][0] \n",
1396 | "__________________________________________________________________________________________________\n",
1397 | "conv2d_73 (Conv2D) (None, None, None, 1 258048 activation_72[0][0] \n",
1398 | "__________________________________________________________________________________________________\n",
1399 | "batch_normalization_73 (BatchNo (None, None, None, 1 576 conv2d_73[0][0] \n",
1400 | "__________________________________________________________________________________________________\n",
1401 | "activation_73 (Activation) (None, None, None, 1 0 batch_normalization_73[0][0] \n",
1402 | "__________________________________________________________________________________________________\n",
1403 | "conv2d_70 (Conv2D) (None, None, None, 1 147456 mixed7[0][0] \n",
1404 | "__________________________________________________________________________________________________\n",
1405 | "conv2d_74 (Conv2D) (None, None, None, 1 258048 activation_73[0][0] \n",
1406 | "__________________________________________________________________________________________________\n",
1407 | "batch_normalization_70 (BatchNo (None, None, None, 1 576 conv2d_70[0][0] \n",
1408 | "__________________________________________________________________________________________________\n",
1409 | "batch_normalization_74 (BatchNo (None, None, None, 1 576 conv2d_74[0][0] \n",
1410 | "__________________________________________________________________________________________________\n",
1411 | "activation_70 (Activation) (None, None, None, 1 0 batch_normalization_70[0][0] \n",
1412 | "__________________________________________________________________________________________________\n",
1413 | "activation_74 (Activation) (None, None, None, 1 0 batch_normalization_74[0][0] \n",
1414 | "__________________________________________________________________________________________________\n",
1415 | "conv2d_71 (Conv2D) (None, None, None, 3 552960 activation_70[0][0] \n",
1416 | "__________________________________________________________________________________________________\n",
1417 | "conv2d_75 (Conv2D) (None, None, None, 1 331776 activation_74[0][0] \n",
1418 | "__________________________________________________________________________________________________\n",
1419 | "batch_normalization_71 (BatchNo (None, None, None, 3 960 conv2d_71[0][0] \n",
1420 | "__________________________________________________________________________________________________\n",
1421 | "batch_normalization_75 (BatchNo (None, None, None, 1 576 conv2d_75[0][0] \n",
1422 | "__________________________________________________________________________________________________\n",
1423 | "activation_71 (Activation) (None, None, None, 3 0 batch_normalization_71[0][0] \n",
1424 | "__________________________________________________________________________________________________\n",
1425 | "activation_75 (Activation) (None, None, None, 1 0 batch_normalization_75[0][0] \n",
1426 | "__________________________________________________________________________________________________\n",
1427 | "max_pooling2d_3 (MaxPooling2D) (None, None, None, 7 0 mixed7[0][0] \n",
1428 | "__________________________________________________________________________________________________\n",
1429 | "mixed8 (Concatenate) (None, None, None, 1 0 activation_71[0][0] \n",
1430 | " activation_75[0][0] \n",
1431 | " max_pooling2d_3[0][0] \n",
1432 | "__________________________________________________________________________________________________\n",
1433 | "conv2d_80 (Conv2D) (None, None, None, 4 573440 mixed8[0][0] \n",
1434 | "__________________________________________________________________________________________________\n",
1435 | "batch_normalization_80 (BatchNo (None, None, None, 4 1344 conv2d_80[0][0] \n",
1436 | "__________________________________________________________________________________________________\n",
1437 | "activation_80 (Activation) (None, None, None, 4 0 batch_normalization_80[0][0] \n",
1438 | "__________________________________________________________________________________________________\n",
1439 | "conv2d_77 (Conv2D) (None, None, None, 3 491520 mixed8[0][0] \n",
1440 | "__________________________________________________________________________________________________\n",
1441 | "conv2d_81 (Conv2D) (None, None, None, 3 1548288 activation_80[0][0] \n",
1442 | "__________________________________________________________________________________________________\n",
1443 | "batch_normalization_77 (BatchNo (None, None, None, 3 1152 conv2d_77[0][0] \n",
1444 | "__________________________________________________________________________________________________\n",
1445 | "batch_normalization_81 (BatchNo (None, None, None, 3 1152 conv2d_81[0][0] \n",
1446 | "__________________________________________________________________________________________________\n",
1447 | "activation_77 (Activation) (None, None, None, 3 0 batch_normalization_77[0][0] \n",
1448 | "__________________________________________________________________________________________________\n",
1449 | "activation_81 (Activation) (None, None, None, 3 0 batch_normalization_81[0][0] \n",
1450 | "__________________________________________________________________________________________________\n",
1451 | "conv2d_78 (Conv2D) (None, None, None, 3 442368 activation_77[0][0] \n",
1452 | "__________________________________________________________________________________________________\n",
1453 | "conv2d_79 (Conv2D) (None, None, None, 3 442368 activation_77[0][0] \n",
1454 | "__________________________________________________________________________________________________\n",
1455 | "conv2d_82 (Conv2D) (None, None, None, 3 442368 activation_81[0][0] \n",
1456 | "__________________________________________________________________________________________________\n",
1457 | "conv2d_83 (Conv2D) (None, None, None, 3 442368 activation_81[0][0] \n",
1458 | "__________________________________________________________________________________________________\n",
1459 | "average_pooling2d_7 (AveragePoo (None, None, None, 1 0 mixed8[0][0] \n",
1460 | "__________________________________________________________________________________________________\n",
1461 | "conv2d_76 (Conv2D) (None, None, None, 3 409600 mixed8[0][0] \n",
1462 | "__________________________________________________________________________________________________\n",
1463 | "batch_normalization_78 (BatchNo (None, None, None, 3 1152 conv2d_78[0][0] \n",
1464 | "__________________________________________________________________________________________________\n",
1465 | "batch_normalization_79 (BatchNo (None, None, None, 3 1152 conv2d_79[0][0] \n",
1466 | "__________________________________________________________________________________________________\n",
1467 | "batch_normalization_82 (BatchNo (None, None, None, 3 1152 conv2d_82[0][0] \n",
1468 | "__________________________________________________________________________________________________\n",
1469 | "batch_normalization_83 (BatchNo (None, None, None, 3 1152 conv2d_83[0][0] \n",
1470 | "__________________________________________________________________________________________________\n",
1471 | "conv2d_84 (Conv2D) (None, None, None, 1 245760 average_pooling2d_7[0][0] \n",
1472 | "__________________________________________________________________________________________________\n",
1473 | "batch_normalization_76 (BatchNo (None, None, None, 3 960 conv2d_76[0][0] \n",
1474 | "__________________________________________________________________________________________________\n",
1475 | "activation_78 (Activation) (None, None, None, 3 0 batch_normalization_78[0][0] \n",
1476 | "__________________________________________________________________________________________________\n",
1477 | "activation_79 (Activation) (None, None, None, 3 0 batch_normalization_79[0][0] \n",
1478 | "__________________________________________________________________________________________________\n",
1479 | "activation_82 (Activation) (None, None, None, 3 0 batch_normalization_82[0][0] \n",
1480 | "__________________________________________________________________________________________________\n",
1481 | "activation_83 (Activation) (None, None, None, 3 0 batch_normalization_83[0][0] \n",
1482 | "__________________________________________________________________________________________________\n",
1483 | "batch_normalization_84 (BatchNo (None, None, None, 1 576 conv2d_84[0][0] \n",
1484 | "__________________________________________________________________________________________________\n",
1485 | "activation_76 (Activation) (None, None, None, 3 0 batch_normalization_76[0][0] \n",
1486 | "__________________________________________________________________________________________________\n",
1487 | "mixed9_0 (Concatenate) (None, None, None, 7 0 activation_78[0][0] \n",
1488 | " activation_79[0][0] \n",
1489 | "__________________________________________________________________________________________________\n",
1490 | "concatenate (Concatenate) (None, None, None, 7 0 activation_82[0][0] \n",
1491 | " activation_83[0][0] \n",
1492 | "__________________________________________________________________________________________________\n",
1493 | "activation_84 (Activation) (None, None, None, 1 0 batch_normalization_84[0][0] \n",
1494 | "__________________________________________________________________________________________________\n",
1495 | "mixed9 (Concatenate) (None, None, None, 2 0 activation_76[0][0] \n",
1496 | " mixed9_0[0][0] \n",
1497 | " concatenate[0][0] \n",
1498 | " activation_84[0][0] \n",
1499 | "__________________________________________________________________________________________________\n",
1500 | "conv2d_89 (Conv2D) (None, None, None, 4 917504 mixed9[0][0] \n",
1501 | "__________________________________________________________________________________________________\n",
1502 | "batch_normalization_89 (BatchNo (None, None, None, 4 1344 conv2d_89[0][0] \n",
1503 | "__________________________________________________________________________________________________\n",
1504 | "activation_89 (Activation) (None, None, None, 4 0 batch_normalization_89[0][0] \n",
1505 | "__________________________________________________________________________________________________\n",
1506 | "conv2d_86 (Conv2D) (None, None, None, 3 786432 mixed9[0][0] \n",
1507 | "__________________________________________________________________________________________________\n",
1508 | "conv2d_90 (Conv2D) (None, None, None, 3 1548288 activation_89[0][0] \n",
1509 | "__________________________________________________________________________________________________\n",
1510 | "batch_normalization_86 (BatchNo (None, None, None, 3 1152 conv2d_86[0][0] \n",
1511 | "__________________________________________________________________________________________________\n",
1512 | "batch_normalization_90 (BatchNo (None, None, None, 3 1152 conv2d_90[0][0] \n",
1513 | "__________________________________________________________________________________________________\n",
1514 | "activation_86 (Activation) (None, None, None, 3 0 batch_normalization_86[0][0] \n",
1515 | "__________________________________________________________________________________________________\n",
1516 | "activation_90 (Activation) (None, None, None, 3 0 batch_normalization_90[0][0] \n",
1517 | "__________________________________________________________________________________________________\n",
1518 | "conv2d_87 (Conv2D) (None, None, None, 3 442368 activation_86[0][0] \n",
1519 | "__________________________________________________________________________________________________\n",
1520 | "conv2d_88 (Conv2D) (None, None, None, 3 442368 activation_86[0][0] \n",
1521 | "__________________________________________________________________________________________________\n",
1522 | "conv2d_91 (Conv2D) (None, None, None, 3 442368 activation_90[0][0] \n",
1523 | "__________________________________________________________________________________________________\n",
1524 | "conv2d_92 (Conv2D) (None, None, None, 3 442368 activation_90[0][0] \n",
1525 | "__________________________________________________________________________________________________\n",
1526 | "average_pooling2d_8 (AveragePoo (None, None, None, 2 0 mixed9[0][0] \n",
1527 | "__________________________________________________________________________________________________\n",
1528 | "conv2d_85 (Conv2D) (None, None, None, 3 655360 mixed9[0][0] \n",
1529 | "__________________________________________________________________________________________________\n",
1530 | "batch_normalization_87 (BatchNo (None, None, None, 3 1152 conv2d_87[0][0] \n",
1531 | "__________________________________________________________________________________________________\n",
1532 | "batch_normalization_88 (BatchNo (None, None, None, 3 1152 conv2d_88[0][0] \n",
1533 | "__________________________________________________________________________________________________\n",
1534 | "batch_normalization_91 (BatchNo (None, None, None, 3 1152 conv2d_91[0][0] \n",
1535 | "__________________________________________________________________________________________________\n",
1536 | "batch_normalization_92 (BatchNo (None, None, None, 3 1152 conv2d_92[0][0] \n",
1537 | "__________________________________________________________________________________________________\n",
1538 | "conv2d_93 (Conv2D) (None, None, None, 1 393216 average_pooling2d_8[0][0] \n",
1539 | "__________________________________________________________________________________________________\n",
1540 | "batch_normalization_85 (BatchNo (None, None, None, 3 960 conv2d_85[0][0] \n",
1541 | "__________________________________________________________________________________________________\n",
1542 | "activation_87 (Activation) (None, None, None, 3 0 batch_normalization_87[0][0] \n",
1543 | "__________________________________________________________________________________________________\n",
1544 | "activation_88 (Activation) (None, None, None, 3 0 batch_normalization_88[0][0] \n",
1545 | "__________________________________________________________________________________________________\n",
1546 | "activation_91 (Activation) (None, None, None, 3 0 batch_normalization_91[0][0] \n",
1547 | "__________________________________________________________________________________________________\n",
1548 | "activation_92 (Activation) (None, None, None, 3 0 batch_normalization_92[0][0] \n",
1549 | "__________________________________________________________________________________________________\n",
1550 | "batch_normalization_93 (BatchNo (None, None, None, 1 576 conv2d_93[0][0] \n",
1551 | "__________________________________________________________________________________________________\n",
1552 | "activation_85 (Activation) (None, None, None, 3 0 batch_normalization_85[0][0] \n",
1553 | "__________________________________________________________________________________________________\n",
1554 | "mixed9_1 (Concatenate) (None, None, None, 7 0 activation_87[0][0] \n",
1555 | " activation_88[0][0] \n",
1556 | "__________________________________________________________________________________________________\n",
1557 | "concatenate_1 (Concatenate) (None, None, None, 7 0 activation_91[0][0] \n",
1558 | " activation_92[0][0] \n",
1559 | "__________________________________________________________________________________________________\n",
1560 | "activation_93 (Activation) (None, None, None, 1 0 batch_normalization_93[0][0] \n",
1561 | "__________________________________________________________________________________________________\n",
1562 | "mixed10 (Concatenate) (None, None, None, 2 0 activation_85[0][0] \n",
1563 | " mixed9_1[0][0] \n",
1564 | " concatenate_1[0][0] \n",
1565 | " activation_93[0][0] \n",
1566 | "==================================================================================================\n",
1567 | "Total params: 21,802,784\n",
1568 | "Trainable params: 21,768,352\n",
1569 | "Non-trainable params: 34,432\n",
1570 | "__________________________________________________________________________________________________\n"
1571 | ]
1572 | }
1573 | ],
1574 | "source": [
1575 | "encoder_model = tf.keras.applications.InceptionV3(include_top=False, weights='imagenet')\n",
1576 | "\n",
1577 | "new_input = encoder_model.input \n",
1578 | "hidden_layer = encoder_model.layers[-1].output\n",
1579 | "encoder_features_model = tf.keras.Model(new_input, hidden_layer)\n",
1580 | "\n",
1581 | "encoder_features_model.summary()"
1582 | ]
1583 | },
1584 | {
1585 | "cell_type": "code",
1586 | "execution_count": null,
1587 | "metadata": {
1588 | "id": "_Q_xORSfgVpw"
1589 | },
1590 | "outputs": [],
1591 | "source": [
1592 | "# Extracting features with the encoder model and caching them\n",
1593 | "# Creating a dataset of tensors from the train images list\n",
1594 | "images_data = tf.data.Dataset.from_tensor_slices(train_images)\n",
1595 | "\n",
1596 | "# Mapping the preprocess method on the dataset and paralellizing it (num of batches: 64)\n",
1597 | "images_data = images_data.map(load_image_preprocess, num_parallel_calls = tf.data.AUTOTUNE).batch(64)\n",
1598 | "features_dict = {}\n",
1599 | "\n",
1600 | "start = time.time()\n",
1601 | "for img, filepath in tqdm(images_data):\n",
1602 | " features = encoder_features_model(img)\n",
1603 | " features = tf.reshape(features, (features.shape[0], -1, features.shape[3]))\n",
1604 | "\n",
1605 | " for f, path in zip(features, filepath):\n",
1606 | " features_path = path.numpy().decode('utf-8')\n",
1607 | " np.save(features_path, f.numpy())\n",
1608 | " features_dict[features_path] = f.numpy()\n",
1609 | " \n",
1610 | "end = time.time()"
1611 | ]
1612 | },
1613 | {
1614 | "cell_type": "code",
1615 | "execution_count": null,
1616 | "metadata": {
1617 | "colab": {
1618 | "base_uri": "https://localhost:8080/"
1619 | },
1620 | "id": "DrwM9us-gdX7",
1621 | "outputId": "352e1096-20ea-4c31-847c-9c2287341a64"
1622 | },
1623 | "outputs": [
1624 | {
1625 | "name": "stdout",
1626 | "output_type": "stream",
1627 | "text": [
1628 | "Time taken to generate features vector (train data): 28.365445109208427\n"
1629 | ]
1630 | }
1631 | ],
1632 | "source": [
1633 | "print('Time taken to generate features vector (train data): ', (end - start)/60)"
1634 | ]
1635 | },
1636 | {
1637 | "cell_type": "markdown",
1638 | "metadata": {
1639 | "id": "fZd16R4agVXJ"
1640 | },
1641 | "source": [
1642 | "Checkpoint: The code snippet below saves the features_dict generated by the InceptionV3 model, so the features only need to be computed once. These image features of length 2048 are used as input to the LSTM decoder model.\n"
1643 | ]
1644 | },
1645 | {
1646 | "cell_type": "code",
1647 | "execution_count": null,
1648 | "metadata": {
1649 | "id": "yUfNLFsIy1I_"
1650 | },
1651 | "outputs": [],
1652 | "source": [
1653 | "# Save the training features dictionary\n",
1654 | "from pickle import dump, load\n",
1655 | "with open('/content/gdrive/MyDrive/Deep_Learning/Saved_files/train_features.pkl', 'wb') as features_file:\n",
1656 | "    dump(features_dict, features_file)\n",
1657 | "\n",
1658 | "# Save the test features: features_dict must first be regenerated from the test images\n",
1659 | "with open('/content/gdrive/MyDrive/Deep_Learning/Saved_files/test_features.pkl', 'wb') as features_file:\n",
1660 | "    dump(features_dict, features_file)"
1661 | ]
1662 | },
1663 | {
1664 | "cell_type": "markdown",
1665 | "metadata": {
1666 | "id": "XjSYESq8O3zc"
1667 | },
1668 | "source": [
1669 | "### **4. LSTM Decoder**"
1670 | ]
1671 | },
1672 | {
1673 | "cell_type": "markdown",
1674 | "metadata": {
1675 | "id": "TjW6yDwzfaOu"
1676 | },
1677 | "source": [
1678 | "The following generator function formats the data into the input expected by the decoder model. \n",
1679 | "\n",
1680 | "> Caption: Consider the caption '(start) a child in a pink dress is climbing up a set of stairs in an entry way (end)' and a features_vector generated by the InceptionV3 model.\n",
1681 | "\n",
1682 | "> The function converts it into the input format:\n",
1683 | "\n",
1684 | "* features_vector + '(start)'\n",
1685 | "* features_vector + '(start) a'\n",
1686 | "* features_vector + '(start) a child'\n",
1687 | "* features_vector + '(start) a child in'\n",
1688 | "\n",
1689 | "... and so on. For each input, the target output is the next word of the caption.\n",
1690 | "\n",
1691 | "\n",
1692 | "\n"
1693 | ]
1694 | },
1695 | {
1696 | "cell_type": "code",
1697 | "execution_count": 17,
1698 | "metadata": {
1699 | "id": "eJ2qLXYefUkW"
1700 | },
1701 | "outputs": [],
1702 | "source": [
1703 | "root_dir = '/content/gdrive/MyDrive/Deep_Learning/Flickr8k_Dataset/Flicker8k_Dataset/'\n",
1704 | "\n",
1705 | "def decoder_data_generator(dictionary, features_dict, token_to_idx, max_caption, images_batch_size):\n",
1706 | "    # Yields batches of ([image features, partial captions], next words) indefinitely (uses the global vocab_size)\n",
1707 | "    X1, X2, y = list(), list(), list()\n",
1708 | "    n = 0  # number of images accumulated in the current batch\n",
1709 | "    while True:\n",
1710 | "        for key, desc_list in dictionary.items():\n",
1711 | "            n += 1\n",
1712 | "            image = features_dict[root_dir + key + '.jpg']\n",
1713 | "            for desc in desc_list:\n",
1714 | "                # Encode the caption as a sequence of token indices\n",
1715 | "                seq = [token_to_idx[word] for word in desc.split(' ') if word in token_to_idx]\n",
1716 | "                # Generate one (partial caption -> next word) pair per position\n",
1717 | "                for i in range(1, len(seq)):\n",
1718 | "                    in_seq, out_seq = seq[:i], seq[i]\n",
1719 | "                    # Left-pad the partial caption to max_caption tokens\n",
1720 | "                    in_seq = keras.preprocessing.sequence.pad_sequences([in_seq], maxlen=max_caption)[0]\n",
1721 | "                    # One-hot encode the next word\n",
1722 | "                    out_seq = keras.utils.to_categorical([out_seq], num_classes=vocab_size)[0]\n",
1723 | "                    X1.append(image)\n",
1724 | "                    X2.append(in_seq)\n",
1725 | "                    y.append(out_seq)\n",
1726 | "\n",
1727 | "            if n == images_batch_size:  # yield the batch data\n",
1728 | "                yield [np.array(X1), np.array(X2)], np.array(y)\n",
1729 | "                X1, X2, y = list(), list(), list()\n",
1730 | "                n = 0"
1731 | ]
1732 | },
1733 | {
1734 | "cell_type": "code",
1735 | "execution_count": 18,
1736 | "metadata": {
1737 | "id": "9qo7AyQPC3gW"
1738 | },
1739 | "outputs": [],
1740 | "source": [
1741 | "def decoder_model():\n",
1742 | "    # Image branch: 2048-dimensional feature vector -> 256 units\n",
1743 | "    input_1 = keras.Input(shape=(2048,))\n",
1744 | "    dropout_1 = keras.layers.Dropout(0.5)(input_1)\n",
1745 | "    dense_1 = keras.layers.Dense(256, activation='relu')(dropout_1)\n",
1746 | "    # Text branch: padded token sequence -> embeddings -> LSTM (256 units)\n",
1747 | "    input_2 = keras.Input(shape=(max_caption,))\n",
1748 | "    embeddings_layer = keras.layers.Embedding(vocab_size, embeddings_size, mask_zero=True)(input_2)\n",
1749 | "    dropout_2 = keras.layers.Dropout(0.5)(embeddings_layer)\n",
1750 | "    lstm_layer = keras.layers.LSTM(256)(dropout_2)\n",
1751 | "    # Merge both branches by element-wise addition, then predict the next word\n",
1752 | "    add_1 = keras.layers.add([dense_1, lstm_layer])\n",
1753 | "    dense_2 = keras.layers.Dense(256, activation='relu')(add_1)\n",
1754 | "    output_layer = keras.layers.Dense(vocab_size, activation='softmax')(dense_2)\n",
1755 | "    model = keras.models.Model(inputs=[input_1, input_2], outputs=output_layer)\n",
1756 | "\n",
1757 | "    return model"
1758 | ]
1759 | },
1760 | {
1761 | "cell_type": "code",
1762 | "execution_count": 21,
1763 | "metadata": {
1764 | "colab": {
1765 | "base_uri": "https://localhost:8080/"
1766 | },
1767 | "id": "42Wrm0BYHbGI",
1768 | "outputId": "94574ff4-ab38-4867-8e6d-cba0998977f3"
1769 | },
1770 | "outputs": [
1771 | {
1772 | "name": "stdout",
1773 | "output_type": "stream",
1774 | "text": [
1775 | "Model: \"model\"\n",
1776 | "__________________________________________________________________________________________________\n",
1777 | "Layer (type)                    Output Shape         Param #     Connected to                     \n",
1778 | "==================================================================================================\n",
1779 | "input_4 (InputLayer)            [(None, 37)]         0                                            \n",
1780 | "__________________________________________________________________________________________________\n",
1781 | "input_3 (InputLayer)            [(None, 2048)]       0                                            \n",
1782 | "__________________________________________________________________________________________________\n",
1783 | "embedding (Embedding)           (None, 37, 200)      331000      input_4[0][0]                    \n",
1784 | "__________________________________________________________________________________________________\n",
1785 | "dropout_1 (Dropout)             (None, 2048)         0           input_3[0][0]                    \n",
1786 | "__________________________________________________________________________________________________\n",
1787 | "dropout_2 (Dropout)             (None, 37, 200)      0           embedding[0][0]                  \n",
1788 | "__________________________________________________________________________________________________\n",
1789 | "dense_1 (Dense)                 (None, 256)          524544      dropout_1[0][0]                  \n",
1790 | "__________________________________________________________________________________________________\n",
1791 | "lstm (LSTM)                     (None, 256)          467968      dropout_2[0][0]                  \n",
1792 | "__________________________________________________________________________________________________\n",
1793 | "add (Add)                       (None, 256)          0           dense_1[0][0]                    \n",
1794 | "                                                                 lstm[0][0]                       \n",
1795 | "__________________________________________________________________________________________________\n",
1796 | "dense_2 (Dense)                 (None, 256)          65792       add[0][0]                        \n",
1797 | "__________________________________________________________________________________________________\n",
1798 | "dense_3 (Dense)                 (None, 1655)         425335      dense_2[0][0]                    \n",
1799 | "==================================================================================================\n",
1800 | "Total params: 1,814,639\n",
1801 | "Trainable params: 1,814,639\n",
1802 | "Non-trainable params: 0\n",
1803 | "__________________________________________________________________________________________________\n"
1804 | ]
1805 | }
1806 | ],
1807 | "source": [
1808 | "LSTM_decoder = decoder_model()\n",
1809 | "LSTM_decoder.summary()"
1810 | ]
1811 | },
1812 | {
1813 | "cell_type": "markdown",
1814 | "metadata": {
1815 | "id": "NK2-ReinM076"
1816 | },
1817 | "source": [
1818 | "#### Defining LSTM decoder loss and optimizer and setting embeddings weights"
1819 | ]
1820 | },
1821 | {
1822 | "cell_type": "code",
1823 | "execution_count": 24,
1824 | "metadata": {
1825 | "id": "UwoHK8ICXQgT"
1826 | },
1827 | "outputs": [],
1828 | "source": [
1829 | "# Load the saved embedding matrix and the cached training image features\n",
1830 | "with open('/content/gdrive/MyDrive/Deep_Learning/Saved_files/embeddings.pkl', 'rb') as embeddings_file:\n",
1831 | "    embeddings = load(embeddings_file)\n",
1832 | "with open('/content/gdrive/MyDrive/Deep_Learning/Saved_files/train_features.pkl', 'rb') as features_file:\n",
1833 | "    features_dict = load(features_file)\n",
1834 | "\n",
1835 | "def model_parameters(model, embeddings):\n",
1836 | "    # layers[2] is the Embedding layer (see the model summary above)\n",
1837 | "    embeddings_layer = model.layers[2]\n",
1838 | "    # Initialize it with the saved embeddings and freeze it during training\n",
1839 | "    embeddings_layer.set_weights([embeddings])\n",
1840 | "    embeddings_layer.trainable = False\n",
1841 | "\n",
1842 | "    # learning_rate replaces the deprecated lr argument\n",
1843 | "    model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.Adam(learning_rate=0.001))\n",
1844 | "    return model\n",
1845 | "\n",
1846 | "LSTM_decoder = model_parameters(LSTM_decoder, embeddings)"
1847 | ]
1848 | },
1849 | {
1850 | "cell_type": "code",
1851 | "execution_count": 26,
1852 | "metadata": {
1853 | "colab": {
1854 | "base_uri": "https://localhost:8080/"
1855 | },
1856 | "id": "w_LJz-4zXWRC",
1857 | "outputId": "7c1a258a-c476-4bab-f89d-897ded4076a5"
1858 | },
1859 | "outputs": [
1860 | {
1861 | "name": "stdout",
1862 | "output_type": "stream",
1863 | "text": [
1864 | "\r 1/1000 [..............................] - ETA: 1:41 - loss: 3.2640"
1865 | ]
1866 | },
1867 | {
1868 | "name": "stderr",
1869 | "output_type": "stream",
1870 | "text": [
1871 | "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:1844: UserWarning: `Model.fit_generator` is deprecated and will be removed in a future version. Please use `Model.fit`, which supports generators.\n",
1872 | " warnings.warn('`Model.fit_generator` is deprecated and '\n"
1873 | ]
1874 | },
1875 | {
1876 | "name": "stdout",
1877 | "output_type": "stream",
1878 | "text": [
1879 | "1000/1000 [==============================] - 102s 102ms/step - loss: 3.0930\n",
1880 | "1000/1000 [==============================] - 105s 105ms/step - loss: 2.8734\n",
1881 | "1000/1000 [==============================] - 105s 105ms/step - loss: 2.7477\n",
1882 | "1000/1000 [==============================] - 104s 104ms/step - loss: 2.6564\n",
1883 | "1000/1000 [==============================] - 104s 104ms/step - loss: 2.5849\n",
1884 | "1000/1000 [==============================] - 103s 103ms/step - loss: 2.5272\n",
1885 | "1000/1000 [==============================] - 102s 102ms/step - loss: 2.4774\n",
1886 | "1000/1000 [==============================] - 101s 101ms/step - loss: 2.4355\n",
1887 | "1000/1000 [==============================] - 99s 99ms/step - loss: 2.3999\n",
1888 | "1000/1000 [==============================] - 99s 99ms/step - loss: 2.3669\n",
1889 | "1000/1000 [==============================] - 99s 99ms/step - loss: 2.3382\n",
1890 | "1000/1000 [==============================] - 99s 99ms/step - loss: 2.3127\n",
1891 | "1000/1000 [==============================] - 99s 98ms/step - loss: 2.2913\n",
1892 | "1000/1000 [==============================] - 99s 99ms/step - loss: 2.2705\n",
1893 | "1000/1000 [==============================] - 99s 99ms/step - loss: 2.2504\n",
1894 | "1000/1000 [==============================] - 99s 99ms/step - loss: 2.2334\n",
1895 | "1000/1000 [==============================] - 99s 98ms/step - loss: 2.2185\n",
1896 | "1000/1000 [==============================] - 99s 99ms/step - loss: 2.2012\n",
1897 | "1000/1000 [==============================] - 104s 104ms/step - loss: 2.1881\n",
1898 | "1000/1000 [==============================] - 103s 103ms/step - loss: 2.1762\n",
1899 | "1000/1000 [==============================] - 102s 102ms/step - loss: 2.1608\n",
1900 | "1000/1000 [==============================] - 101s 101ms/step - loss: 2.1506\n",
1901 | "1000/1000 [==============================] - 102s 102ms/step - loss: 2.1413\n",
1902 | "1000/1000 [==============================] - 100s 100ms/step - loss: 2.1334\n",
1903 | "1000/1000 [==============================] - 100s 100ms/step - loss: 2.1226\n",
1904 | "1000/1000 [==============================] - 101s 101ms/step - loss: 2.1129\n",
1905 | "1000/1000 [==============================] - 101s 101ms/step - loss: 2.1076\n",
1906 | "1000/1000 [==============================] - 102s 102ms/step - loss: 2.0991\n",
1907 | "1000/1000 [==============================] - 102s 102ms/step - loss: 2.0897\n",
1908 | "1000/1000 [==============================] - 101s 101ms/step - loss: 2.0817\n"
1909 | ]
1910 | }
1911 | ],
1912 | "source": [
1913 | "epochs = 30\n",
1914 | "images_per_batch = 6\n",
1915 | "steps = len(train_dictionary) // images_per_batch\n",
1916 | "# Train one epoch at a time with a fresh generator and save a checkpoint after each epoch\n",
1917 | "for i in range(epochs):\n",
1918 | "    generator = decoder_data_generator(train_dictionary, features_dict, token_to_idx, max_caption, images_per_batch)\n",
1919 | "    LSTM_decoder.fit(generator, epochs=1, steps_per_epoch=steps, verbose=1)  # Model.fit accepts generators; fit_generator is deprecated\n",
1920 | "    LSTM_decoder.save('/content/gdrive/MyDrive/Deep_Learning/Saved_files/models/model_' + str(i) + '.h5')\n"
1921 | ]
1922 | },
1923 | {
1924 | "cell_type": "code",
1925 | "execution_count": 33,
1926 | "metadata": {
1927 | "id": "ND0bdI9e9t73"
1928 | },
1929 | "outputs": [],
1930 | "source": [
1931 | "LSTM_decoder.save('/content/gdrive/MyDrive/Deep_Learning/Saved_files/models/trained_model.h5')"
1932 | ]
1933 | }
1934 | ],
1935 | "metadata": {
1936 | "accelerator": "GPU",
1937 | "colab": {
1938 | "collapsed_sections": [],
1939 | "name": "encoder_decoder.ipynb",
1940 | "provenance": []
1941 | },
1942 | "kernelspec": {
1943 | "display_name": "Python 3",
1944 | "name": "python3"
1945 | },
1946 | "language_info": {
1947 | "name": "python"
1948 | }
1949 | },
1950 | "nbformat": 4,
1951 | "nbformat_minor": 0
1952 | }
1953 |
--------------------------------------------------------------------------------