# Image Captioning

This repository contains a neural-network implementation of image captioning (CNN + RNN). The model first extracts image features with a CNN and then generates a caption with an RNN. The CNN is VGG16 and the RNN is a standard LSTM.

Normal sampling and beam search are used to predict captions for images.


# Network Topology

## Encoder
The Convolutional Neural Network (CNN) can be thought of as the encoder. The input image is passed through the CNN to extract features, and the output of the CNN's last hidden layer is connected to the decoder.

## Decoder
The decoder is a Recurrent Neural Network (RNN) that performs word-level language modelling. Its first time step receives the encoded image feature from the encoder together with the start-of-sequence word vector. Minimal sketches of one possible Keras wiring and of beam-search decoding are given at the end of this README.

The dataset used is the Flickr8k dataset.

# Input
![input](https://user-images.githubusercontent.com/23000971/33495332-fbd2b75a-d6eb-11e7-999a-09fdc4255a6f.JPG)


# Output
![output](https://user-images.githubusercontent.com/23000971/33495366-2b5a9cd6-d6ec-11e7-9cd0-2b7adce57b3e.JPG)
![text](https://user-images.githubusercontent.com/23000971/33495435-7a9bd10c-d6ec-11e7-9b26-77c6865c0551.JPG)


# Dependencies

* Keras 2.0.7
* Theano 0.9.0
* NumPy
* Pandas 0.20.3
* Matplotlib
* Pickle

# References

[1] Andrej Karpathy and Li Fei-Fei, "Deep Visual-Semantic Alignments for Generating Image Descriptions", CVPR 2015.

[2] Oriol Vinyals, Alexander Toshev, Samy Bengio and Dumitru Erhan, "Show and Tell: A Neural Image Caption Generator", CVPR 2015.

[3] CS231n: Convolutional Neural Networks for Visual Recognition (Instructors: Fei-Fei Li, Andrej Karpathy, Justin Johnson).
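
# Model Sketch

The repository's exact layer wiring is not reproduced here. The snippet below is a minimal sketch of one common way to combine pre-extracted VGG16 features with an LSTM decoder in Keras 2.x; it follows the "merge" pattern (image feature fused with the LSTM output), which is not necessarily the exact inject-at-the-first-time-step wiring described above. The layer sizes and the `vocab_size` / `max_len` values are illustrative assumptions, not values taken from this codebase.

```python
from keras.applications.vgg16 import VGG16
from keras.layers import Input, Dense, Embedding, LSTM, add
from keras.models import Model

# Encoder: VGG16 up to its 4096-d fc2 layer, used as a fixed feature extractor.
base = VGG16(weights='imagenet')
encoder = Model(inputs=base.input, outputs=base.get_layer('fc2').output)

# Illustrative sizes -- replace with the values used when preprocessing Flickr8k.
vocab_size = 8000   # assumed vocabulary size
max_len = 34        # assumed maximum caption length (in words)

# Decoder: (image feature, partial caption) -> distribution over the next word.
img_in = Input(shape=(4096,))                     # pre-extracted VGG16 feature
img_emb = Dense(256, activation='relu')(img_in)

cap_in = Input(shape=(max_len,))                  # padded word indices
cap_emb = Embedding(vocab_size, 256, mask_zero=True)(cap_in)
cap_feat = LSTM(256)(cap_emb)

merged = add([img_emb, cap_feat])                 # fuse image and text features
hidden = Dense(256, activation='relu')(merged)
out = Dense(vocab_size, activation='softmax')(hidden)

decoder = Model(inputs=[img_in, cap_in], outputs=out)
decoder.compile(loss='categorical_crossentropy', optimizer='adam')
```

In this pattern, the decoder is trained on (image feature, partial caption) → next-word pairs built from the caption data; at inference time it is called repeatedly to grow a caption one word at a time.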
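
# Beam Search Sketch

Beam search keeps the `beam_width` most probable partial captions at every step instead of committing to the single most probable next word. The function below is a sketch under the assumptions of the model sketch above: a two-input model taking `[image_feature, padded_caption]`, `word_to_idx` / `idx_to_word` vocabularies, and `'startseq'` / `'endseq'` as start and end tokens. All of these names are illustrative, not taken from the codebase.

```python
import numpy as np
from keras.preprocessing.sequence import pad_sequences

def beam_search(model, photo_feat, word_to_idx, idx_to_word, max_len, beam_width=3):
    """Decode a caption for one image with beam search.

    `photo_feat` is expected to have shape (1, feature_dim), e.g. a VGG16
    fc2 feature vector produced by the encoder sketched above.
    """
    # Each beam entry is (token index sequence, cumulative log-probability).
    beams = [([word_to_idx['startseq']], 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            # Finished captions are carried over unchanged.
            if idx_to_word[seq[-1]] == 'endseq':
                candidates.append((seq, score))
                continue
            padded = pad_sequences([seq], maxlen=max_len)
            preds = model.predict([photo_feat, padded], verbose=0)[0]
            # Expand this beam with the beam_width most probable next words.
            for idx in np.argsort(preds)[-beam_width:]:
                candidates.append((seq + [int(idx)],
                                   score + np.log(preds[idx] + 1e-12)))
        # Keep only the beam_width highest-scoring sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    best_seq = beams[0][0]
    words = [idx_to_word[i] for i in best_seq
             if idx_to_word[i] not in ('startseq', 'endseq')]
    return ' '.join(words)
```

With `beam_width=1` this reduces to normal (greedy) sampling, where only the single most probable word is kept at each step.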