├── HOW TO RUN.txt
├── README.md
├── image-caption-generator.ipynb
├── output_1.png
├── output_2.png
└── output_3.png

/HOW TO RUN.txt:
--------------------------------------------------------------------------------
IMPORTANT LINKS:

Dataset link: https://www.kaggle.com/shadabhussain/flickr8k

VGG-16 pre-trained weights: https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5


HOW TO EXECUTE THE CODE:

1. Create a new directory and download the image-caption-generator.ipynb file into it.

2. From the dataset link (https://www.kaggle.com/shadabhussain/flickr8k), download the
   complete dataset and save it in your directory.

3. Open the image-caption-generator.ipynb file in Jupyter Notebook.

4. In the 2nd code cell, set the "dir_Flickr_jpg" variable to the path of your "Flicker8k_Dataset" folder,
   and the "dir_Flickr_text" variable to the path of the "Flickr8k.token.txt" file.

5. From the link (https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5),
   download the pre-trained VGG-16 weights for image classification and save them in your directory.

6. In the 12th code cell, set the path to the pre-trained weights (.h5 file) downloaded in the previous step.

7. Run all the cells.

8. After all the cells have finished executing (this takes 2-3 hours), the last cell will open a GUI.

9. Click the "Choose Image" button in the GUI and upload an image from the dataset (or any other image),
   then click the "Classify Image" button to get the model's predicted caption.
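As an illustration, steps 4 and 6 above amount to setting three paths in the notebook. The exact values depend on where you saved the files, so the paths below are placeholders, not the notebook's actual values:

```python
import os

# Placeholder paths -- replace with the locations where you saved the files
dir_Flickr_jpg = os.path.join("Flickr8k", "Flicker8k_Dataset")
dir_Flickr_text = os.path.join("Flickr8k", "Flickr8k.token.txt")
vgg16_weights = "vgg16_weights_tf_dim_ordering_tf_kernels.h5"
```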

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Image-Caption-Generator-with-GUI
Caption generation is a challenging artificial-intelligence problem in which a textual description must be generated for a given photograph. This project is inspired by Andrej Karpathy's well-known blog post "The Unreasonable Effectiveness of Recurrent Neural Networks".

### Dataset

I have used the Flickr8k dataset for this project.
It contains 8,091 images with 5 captions per image (5 * 8091 = 40,455 captions).

**Dataset Source: KAGGLE LINK https://www.kaggle.com/shadabhussain/flickr8k**

### Model Evaluation Metric

BLEU is an evaluation metric that compares a generated sentence with a reference sentence. It compares n-grams (a 1-gram is a single word); a perfect match scores 1.0 and a perfect mismatch scores 0.0.

### Data Preparation
We create a new dataframe called dfword to visualize the distribution of the words. It contains each word and its frequency across all tokens, in decreasing order.

![image](https://user-images.githubusercontent.com/41522782/125451863-97951044-f7f3-432f-9592-c2bed0650bf9.png)

Then we find the 50 most and 50 least frequently appearing words. Here I found that stopwords (like "a" and "the") and punctuation occur most often, so we have to remove them from the dataset in order to clean it. I have implemented 3 functions to clean the captions:
1. remove_punctuation(text)
2. remove_single_character(text)
3. remove_numeric(text)

Then we add start and end tokens to every caption ('startseq ' & ' endseq').

### Image Preparation for VGG16 Model
We will be using the pre-trained VGG16 network.
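The three cleaning functions and the start/end tokens described above could look like the following. The function names come from the notebook; the bodies here are a minimal sketch, my own assumption of one reasonable implementation:

```python
import string

def remove_punctuation(text):
    # Strip all punctuation characters from the caption
    return text.translate(str.maketrans("", "", string.punctuation))

def remove_single_character(text):
    # Drop single-character tokens such as "a" or a stray "s"
    return " ".join(w for w in text.split() if len(w) > 1)

def remove_numeric(text):
    # Keep only purely alphabetic tokens, discarding numbers
    return " ".join(w for w in text.split() if w.isalpha())

def clean_caption(text):
    # Apply all three cleaners, then wrap with the start/end tokens
    text = remove_numeric(remove_single_character(remove_punctuation(text.lower())))
    return "startseq " + text + " endseq"
```

For example, `clean_caption("A dog runs, 2 fast!")` yields `"startseq dog runs fast endseq"`.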

**Model Source: GITHUB LINK https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5**

The VGG16 model takes an input image of size (224, 224, 3), processes it with deep CNN layers, and outputs a tensor of shape (?, 1000), i.e. scores for 1000 classes.
We have removed the last layer from the network, since we only want the features of shape (?, 4096).

After this we reshape our images to (224, 224, 3) and feed them to the model to extract the features for each image.

**Tokenizer**
We now convert our captions into tokens, e.g. (startseq = 1, endseq = 2, etc.), and store them in an array.

### Model Training

**Splitting the dataset**
We split the dataset (all 3 parts, i.e. captions_data, images_data and filenames_data) in a ratio of 0.6 : 0.2 : 0.2 (train : valid : test).
For captions_data we have to pad the sequences, since not all token arrays have the same length: we take the maximum length and pad the shorter token arrays to it.

**Model**
- The input to the model is an image-feature vector of shape 4096.
- First we obtain a 256-unit output from the image features using a Dense layer, and a 256-unit output from the captions using Embedding and LSTM layers.
- The model uses the categorical cross-entropy loss function and the Adam optimizer.
- Then we train the model on our dataset.

### Graphical User Interface
We have used the tkinter library and PIL (Python Imaging Library) to build a GUI with a "Choose Image" button to upload an image and a "Classify Image" button to get the generated caption.
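The merge architecture described under **Model** can be sketched in Keras as below. The 4096-d image features and the 256-unit layer sizes come from the description above; `vocab_size` and `max_length` are placeholder values that in practice come from the tokenizer and the longest padded caption:

```python
from tensorflow.keras.layers import Input, Dense, Embedding, LSTM, add
from tensorflow.keras.models import Model

vocab_size = 5000   # placeholder: actual value comes from the tokenizer
max_length = 30     # placeholder: actual value is the longest padded caption

# Image branch: 4096-d VGG16 features -> 256 units
img_in = Input(shape=(4096,))
img_feat = Dense(256, activation="relu")(img_in)

# Caption branch: token sequence -> Embedding -> LSTM -> 256 units
cap_in = Input(shape=(max_length,))
cap_emb = Embedding(vocab_size, 256, mask_zero=True)(cap_in)
cap_feat = LSTM(256)(cap_emb)

# Merge both 256-unit branches and predict the next word over the vocabulary
merged = add([img_feat, cap_feat])
out = Dense(vocab_size, activation="softmax")(Dense(256, activation="relu")(merged))

model = Model(inputs=[img_in, cap_in], outputs=out)
model.compile(loss="categorical_crossentropy", optimizer="adam")
```

At inference time the caption is generated one word at a time: start from "startseq", repeatedly feed the image features plus the tokens so far, and stop at "endseq" or at `max_length`.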
54 | -------------------------------------------------------------------------------- /output_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/akshatchaturvedi28/Image-Caption-Generator-with-GUI/e2923e0d4abce31ece5a5075b5e5556a59140d58/output_1.png -------------------------------------------------------------------------------- /output_2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/akshatchaturvedi28/Image-Caption-Generator-with-GUI/e2923e0d4abce31ece5a5075b5e5556a59140d58/output_2.png -------------------------------------------------------------------------------- /output_3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/akshatchaturvedi28/Image-Caption-Generator-with-GUI/e2923e0d4abce31ece5a5075b5e5556a59140d58/output_3.png --------------------------------------------------------------------------------