├── .gitignore
├── LICENSE
└── README.md

/.gitignore:
--------------------------------------------------------------------------------
1 | .ipynb_checkpoints
2 | .DS_Store
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2019 Amrit Khera
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Image_Captioning Using LSTM
2 |
3 | ## Dataset - Flickr_8K
4 |
9 |
10 | ## Link to Dataset
11 | https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/Flickr8k_Dataset.zip
12 | https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/Flickr8k_text.zip
13 |
14 | ## Link to Code (notebook)
15 | If GitHub has trouble rendering the notebook, use:
16 | https://nbviewer.org/github/AmritK10/Image_Captioning/blob/master/image_captioning.ipynb
17 |
18 | ## Model
19 | ResNet50 was used as the image encoder; the encoded image features were then fed into the model.
20 | A Keras embedding layer was used to generate word embeddings for the captions, which had been encoded earlier.
21 | The embeddings were then passed into an LSTM, after which the image and text features were combined and sent to a decoder network
22 | to generate the next word.
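
The architecture described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function name `build_caption_model`, the layer sizes, the dropout rates, the use of `add` to combine the two branches, and the 2048-dimensional image feature size (what ResNet50 yields after global average pooling) are all assumptions.

```python
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model


def build_caption_model(vocab_size, max_len, feature_dim=2048,
                        embed_dim=256, units=256):
    """Hypothetical merge-style captioning model (all sizes are assumptions)."""
    # Image branch: precomputed ResNet50 features projected to `units` dims.
    image_in = Input(shape=(feature_dim,), name="image_features")
    image_vec = Dense(units, activation="relu")(Dropout(0.5)(image_in))

    # Text branch: integer-encoded caption prefix -> embeddings -> LSTM.
    caption_in = Input(shape=(max_len,), name="caption_prefix")
    embedded = Embedding(vocab_size, embed_dim)(caption_in)
    text_vec = LSTM(units)(Dropout(0.5)(embedded))

    # Combine image and text features, then decode to a next-word distribution.
    merged = add([image_vec, text_vec])
    hidden = Dense(units, activation="relu")(merged)
    next_word = Dense(vocab_size, activation="softmax")(hidden)

    model = Model(inputs=[image_in, caption_in], outputs=next_word)
    model.compile(loss="categorical_crossentropy", optimizer="adam")
    return model
```

In a setup like this, the model is trained on (image features, caption prefix) pairs, with the word that follows the prefix as the target.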
23 |
24 | ## Results
25 | Both greedy search and beam search were used to generate the captions.
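
As a rough illustration of how the two decoding strategies differ, the sketch below runs both over a toy next-word table. The table `TOY_MODEL`, its probabilities, and the helper names are made up for this example; in the project, the next-word probabilities would come from the trained model's softmax output.

```python
import math

# Hypothetical stand-in for the trained model: maps a caption prefix to
# next-word probabilities. The numbers are invented for illustration.
TOY_MODEL = {
    ("<start>",): {"a": 0.6, "the": 0.4},
    ("<start>", "a"): {"dog": 0.55, "cat": 0.45},
    ("<start>", "the"): {"dog": 0.9, "cat": 0.1},
}


def next_word_probs(prefix):
    # Any prefix not listed above simply ends the caption.
    return TOY_MODEL.get(tuple(prefix), {"<end>": 1.0})


def beam_search(k, max_len=10):
    """Return the highest-scoring caption found with beam width k."""
    beams = [(0.0, ["<start>"])]  # (log-probability, words so far)
    for _ in range(max_len):
        candidates = []
        for score, words in beams:
            if words[-1] == "<end>":
                candidates.append((score, words))  # finished, carry over
                continue
            for word, p in next_word_probs(words).items():
                candidates.append((score + math.log(p), words + [word]))
        # Keep only the k most probable partial captions.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
        if all(words[-1] == "<end>" for _, words in beams):
            break
    _, best_words = beams[0]
    return [w for w in best_words if w not in ("<start>", "<end>")]


def greedy_search(max_len=10):
    # Greedy search is beam search with a beam width of 1.
    return beam_search(k=1, max_len=max_len)
```

With this table, greedy search commits to "a" (the most probable first word) and ends up with "a dog" at probability 0.33, while beam search with k=3 keeps "the" alive long enough to find "the dog" at probability 0.36.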
26 | The BLEU score was used to evaluate the generated captions.
27 | ### Captions generated by Greedy Search are as follows:
28 | Screen Shot 2019-09-01 at 11 07 09 PM
29 | Screen Shot 2019-09-01 at 11 07 53 PM
30 | Screen Shot 2019-09-01 at 11 08 18 PM
31 |
32 |
33 |
34 | ### Captions generated by Beam Search with k=3 are as follows:
35 | Screen Shot 2019-09-01 at 11 13 23 PM
36 | Screen Shot 2019-09-01 at 11 13 48 PM
37 | Screen Shot 2019-09-01 at 11 14 01 PM
38 | Screen Shot 2019-09-01 at 11 14 10 PM
39 |
40 | ### Average BLEU Score on Test Set
41 | Greedy Search: 0.4776
42 | Beam Search with k=3: 0.4930
43 |
--------------------------------------------------------------------------------
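
For readers unfamiliar with the metric behind the scores above, here is a minimal sentence-level BLEU sketch. The README does not say which n-gram order, weighting, smoothing, or library was used, so the choices below (uniform weights up to `max_n`, no smoothing, closest-reference brevity penalty) are assumptions and will not necessarily reproduce the reported averages.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        # Clip each n-gram count by its maximum count in any one reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty against the reference closest in length to the candidate.
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)
```

A test-set average like the ones reported would then be the mean of `bleu(generated, reference_captions)` over all test images, under whatever settings the notebook actually used.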