├── README.md
├── coco_features
│   ├── README.md
│   └── coco_vgg_IDMap.txt
├── data.txt
├── data
│   ├── train_qa
│   └── val_qa
├── embedding.py
├── embeddings
│   └── README.md
├── examples
│   ├── COCO_val2014_000000000073.jpg
│   ├── COCO_val2014_000000000136.jpg
│   ├── COCO_val2014_000000000196.jpg
│   ├── COCO_val2014_000000000283.jpg
│   ├── COCO_val2014_000000000357.jpg
│   ├── model1.png
│   └── model2.png
├── models.py
├── prepare_data.py
├── question_answer.py
├── test.py
├── train.py
└── weights
    └── README.md

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

# Convolutional Neural Network - Image Question Answering

This is a Python and Keras implementation of the VIS+LSTM visual question answering model explained in the paper [Exploring Models and Data for Image Question Answering](https://arxiv.org/abs/1505.02074). A second model is also implemented; it is similar to the 2-VIS+BLSTM model from the paper mentioned above, except that its LSTMs are not bidirectional. This model has two image feature inputs, at the start and the end of the question, each with its own learned linear transformation. We call it 2-VIS+LSTM.

Details about the dataset are explained at the [VisualQA website](http://www.visualqa.org/).

Here is a summary of the performance we obtained with both models.

| Model      | Epochs | Batch Size | Validation Accuracy |
|------------|--------|------------|---------------------|
| VIS+LSTM   | 10     | 200        | 53.27%              |
| 2-VIS+LSTM | 10     | 200        | 54.01%              |

## Requirements

* Python 2.7
* NumPy
* SciPy (for loading the precomputed MS COCO features)
* NLTK (for the tokenizer)
* Keras
* Theano

## Training

* The basic usage is `python train.py`.

* The model to train can be specified using the option `-model`. For example, to train the VIS+LSTM model, enter `python train.py -model=1`. Similarly, the 2-VIS+LSTM model can be trained using `python train.py -model=2`. If no model is specified, model 1 is selected.

* The batch size and the number of epochs can also be specified using the options `-batch_size` and `-num_epochs`. The default batch size and number of epochs are 200 and 25 respectively.

* To train 2-VIS+LSTM with a batch size of 100 for 10 epochs, we would use: `python train.py -model=2 -batch_size=100 -num_epochs=10`.

## Models

### VIS+LSTM

![VIS+LSTM](examples/model1.png)

### 2-VIS+LSTM

![2-VIS+LSTM](examples/model2.png)

## Prediction

* Q&A can be performed on any image using the script `question_answer.py`.

* The options `-question` and `-image` specify the question and the path of the image respectively. The model to use for the prediction can be specified using `-model`. By default, model 2 is selected.

* An example of usage is: `python question_answer.py -image="examples/COCO_val2014_000000000136.jpg" -question="Which animal is this?" -model=2`

Here are some examples of predictions using the 2-VIS+LSTM model (the images are in the `examples` folder).

| Image | Question                    | Top Answers (left to right) |
|-------|-----------------------------|-----------------------------|
|       | Which animal is this?       | giraffe, cat, bear          |
|       | Which vehicle is this?      | motorcycle, taxi, train     |
|       | How many dishes are there?  | 5, 3, 2                     |
|       | What is in the bottle?      | water, beer, wine           |
|       | Which sport is this?        | tennis, baseball, frisbee   |

--------------------------------------------------------------------------------
/coco_features/README.md:
--------------------------------------------------------------------------------

Download the precomputed MS COCO features from http://cs.stanford.edu/people/karpathy/deepimagesent/coco.zip and extract them in this folder.
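Once extracted, this folder should contain `vgg_feats.mat` and `coco_vgg_IDMap.txt`, which `prepare_data.py` reads. A minimal sanity-check sketch, assuming the archive extracts to exactly these filenames:

```python
import scipy.io

# 'feats' is a 4096 x num_images matrix of VGG features; each image is a column.
features = scipy.io.loadmat('coco_features/vgg_feats.mat')['feats']
print(features.shape)  # expected: (4096, <number of images>)

# Each line of the ID map is "<coco_image_id> <column_index>".
with open('coco_features/coco_vgg_IDMap.txt') as f:
    coco_id, column = f.readline().split()
    print(coco_id, column)
```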
--------------------------------------------------------------------------------
/data.txt:
--------------------------------------------------------------------------------

'Tue Dec 20 00:00:00 2016 -0400942519' ; git add data.txt; GIT_AUTHOR_DATE='Tue Dec 20 00:00:00 2016 -0400' GIT_COMMITTER_DATE='Tue Dec 20 00:00:00 2016 -0400' git commit -m 'Update CNN'; git push;

--------------------------------------------------------------------------------
/data/train_qa:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayushoriginal/NeuralNetwork-ImageQA/ea83adee934b00afef38f4fefc1d89078ba7709e/data/train_qa

--------------------------------------------------------------------------------
/data/val_qa:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayushoriginal/NeuralNetwork-ImageQA/ea83adee934b00afef38f4fefc1d89078ba7709e/data/val_qa

--------------------------------------------------------------------------------
/embedding.py:
--------------------------------------------------------------------------------

import argparse
import numpy as np
import h5py
import pickle

def load():
    path = 'embeddings/embedding_matrix.h5'
    with h5py.File(path, 'r') as hf:
        data = hf.get('embedding_matrix')
        embedding_matrix = np.array(data)
    return embedding_matrix

def load_idx():
    path = 'embeddings/word_idx'
    with open(path, 'rb') as file:
        word_idx = pickle.load(file)
    return word_idx

def create(glove_path):
    embedding_matrix_path = 'embeddings/embedding_matrix.h5'
    word_idx_path = 'embeddings/word_idx'
    embeddings = {}
    word_idx = {}

    with open(glove_path, 'r') as f:
        for i, line in enumerate(f):
            values = line.split()
            word = values[0]
            coefs = np.asarray(values[1:], dtype='float32')
            embeddings[word] = coefs
            word_idx[word] = i + 1  # index 0 is reserved for out-of-vocabulary words

    num_words = len(word_idx)
    embedding_matrix = np.zeros((1 + num_words, 300))

    # Place each word's vector at the row recorded in word_idx, so the matrix
    # rows stay consistent with the stored word indices.
    for word, idx in word_idx.items():
        embedding_matrix[idx] = embeddings[word]

    with h5py.File(embedding_matrix_path, 'w') as hf:
        hf.create_dataset('embedding_matrix', data=embedding_matrix)

    with open(word_idx_path, 'wb') as f:
        pickle.dump(word_idx, f)

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('-address', type=str, required=True)
    args = parser.parse_args()
    print('Preparing embeddings ...')
    create(args.address)

if __name__ == '__main__':
    main()

--------------------------------------------------------------------------------
/embeddings/README.md:
--------------------------------------------------------------------------------

## Instructions for preparing embeddings

Download and extract the pretrained Common Crawl 300-dimensional GloVe word vectors from http://nlp.stanford.edu/data/glove.840B.300d.zip.
Use the script `embedding.py` to generate the embedding matrix and word indices. The usage is as follows:

```
$ python embedding.py -address address-of-extracted-glove-file
```
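This writes `embedding_matrix.h5` and `word_idx` into this folder; the rest of the code loads them through `embedding.load()` and `embedding.load_idx()`. A minimal sketch of how the two fit together (the word "giraffe" is just an illustrative example):

```python
import embedding

embedding_matrix = embedding.load()  # (vocabulary size + 1) x 300 matrix
word_idx = embedding.load_idx()      # word -> row index; 0 means out-of-vocabulary

vector = embedding_matrix[word_idx.get('giraffe', 0)]
print(vector.shape)  # (300,)
```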
--------------------------------------------------------------------------------
/examples/COCO_val2014_000000000073.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayushoriginal/NeuralNetwork-ImageQA/ea83adee934b00afef38f4fefc1d89078ba7709e/examples/COCO_val2014_000000000073.jpg

--------------------------------------------------------------------------------
/examples/COCO_val2014_000000000136.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayushoriginal/NeuralNetwork-ImageQA/ea83adee934b00afef38f4fefc1d89078ba7709e/examples/COCO_val2014_000000000136.jpg

--------------------------------------------------------------------------------
/examples/COCO_val2014_000000000196.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayushoriginal/NeuralNetwork-ImageQA/ea83adee934b00afef38f4fefc1d89078ba7709e/examples/COCO_val2014_000000000196.jpg

--------------------------------------------------------------------------------
/examples/COCO_val2014_000000000283.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayushoriginal/NeuralNetwork-ImageQA/ea83adee934b00afef38f4fefc1d89078ba7709e/examples/COCO_val2014_000000000283.jpg

--------------------------------------------------------------------------------
/examples/COCO_val2014_000000000357.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayushoriginal/NeuralNetwork-ImageQA/ea83adee934b00afef38f4fefc1d89078ba7709e/examples/COCO_val2014_000000000357.jpg

--------------------------------------------------------------------------------
/examples/model1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayushoriginal/NeuralNetwork-ImageQA/ea83adee934b00afef38f4fefc1d89078ba7709e/examples/model1.png

--------------------------------------------------------------------------------
/examples/model2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ayushoriginal/NeuralNetwork-ImageQA/ea83adee934b00afef38f4fefc1d89078ba7709e/examples/model2.png

--------------------------------------------------------------------------------
/models.py:
--------------------------------------------------------------------------------

import numpy as np
import embedding
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM, Merge, Reshape, Dropout, Convolution2D, MaxPooling2D, ZeroPadding2D, Flatten

def vis_lstm():
    # Frozen GloVe embeddings for the question words.
    embedding_matrix = embedding.load()
    embedding_model = Sequential()
    embedding_model.add(Embedding(
        embedding_matrix.shape[0],
        embedding_matrix.shape[1],
        weights=[embedding_matrix],
        trainable=False))

    # Learned linear projection of the 4096-d VGG features into the word
    # embedding space, treated as the first element of the sequence.
    image_model = Sequential()
    image_model.add(Dense(
        embedding_matrix.shape[1],
        input_dim=4096,
        activation='linear'))
    image_model.add(Reshape((1, embedding_matrix.shape[1])))

    main_model = Sequential()
    main_model.add(Merge(
        [image_model, embedding_model],
        mode='concat',
        concat_axis=1))
    main_model.add(LSTM(1001))
    main_model.add(Dropout(0.5))
    main_model.add(Dense(1001, activation='softmax'))

    return main_model

def vis_lstm_2():
    embedding_matrix = embedding.load()
    embedding_model = Sequential()
    embedding_model.add(Embedding(
        embedding_matrix.shape[0],
        embedding_matrix.shape[1],
        weights=[embedding_matrix],
        trainable=False))

    # Two separately learned projections of the same image features, placed
    # at the start and at the end of the question sequence.
    image_model_1 = Sequential()
    image_model_1.add(Dense(
        embedding_matrix.shape[1],
        input_dim=4096,
        activation='linear'))
    image_model_1.add(Reshape((1, embedding_matrix.shape[1])))

    image_model_2 = Sequential()
    image_model_2.add(Dense(
        embedding_matrix.shape[1],
        input_dim=4096,
        activation='linear'))
    image_model_2.add(Reshape((1, embedding_matrix.shape[1])))

    main_model = Sequential()
    main_model.add(Merge(
        [image_model_1, embedding_model, image_model_2],
        mode='concat',
        concat_axis=1))
    main_model.add(LSTM(1001))
    main_model.add(Dropout(0.5))
    main_model.add(Dense(1001, activation='softmax'))

    return main_model

def VGG_16(weights_path=None):
    model = Sequential()
    model.add(ZeroPadding2D((1,1), input_shape=(3,224,224)))
    model.add(Convolution2D(64, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(64, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(128, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(128, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(Flatten())
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1000, activation='softmax'))

    if weights_path:
        model.load_weights(weights_path)

    return model
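# A minimal smoke test (an assumption, not part of the original file): run from
# the repo root with the embedding matrix already prepared. The input order
# matches train.py: image features first, then the padded question indices.
if __name__ == '__main__':
    model = vis_lstm()
    model.compile(optimizer='adam', loss='categorical_crossentropy')
    img_features = np.random.rand(2, 4096)          # one 4096-d VGG vector per image
    questions = np.random.randint(1, 100, (2, 10))  # two padded word-index sequences
    print(model.predict([img_features, questions]).shape)  # (2, 1001)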
--------------------------------------------------------------------------------
/prepare_data.py:
--------------------------------------------------------------------------------

import numpy as np
import pandas as pd
import embedding as ebd
import operator
import sys
import scipy.io
from collections import defaultdict
from nltk import word_tokenize
from keras.preprocessing.sequence import pad_sequences

def int_to_answers():
    data_path = 'data/train_qa'
    df = pd.read_pickle(data_path)
    answers = df[['multiple_choice_answer']].values.tolist()
    freq = defaultdict(int)
    for answer in answers:
        freq[answer[0].lower()] += 1
    # Keep the 1000 most frequent answers, ordered by frequency.
    int_to_answer = sorted(freq.items(), key=operator.itemgetter(1), reverse=True)[0:1000]
    int_to_answer = [answer[0] for answer in int_to_answer]
    return int_to_answer

top_answers = int_to_answers()

def answers_to_onehot():
    top_answers = int_to_answers()
    answer_to_onehot = {}
    for i, word in enumerate(top_answers):
        onehot = np.zeros(1001)
        onehot[i] = 1.0
        answer_to_onehot[word] = onehot
    return answer_to_onehot

answer_to_onehot_dict = answers_to_onehot()

def get_answers_matrix(split):
    if split == 'train':
        data_path = 'data/train_qa'
    elif split == 'val':
        data_path = 'data/val_qa'
    else:
        print('Invalid split!')
        sys.exit()

    df = pd.read_pickle(data_path)
    answers = df[['multiple_choice_answer']].values.tolist()
    answer_matrix = np.zeros((len(answers), 1001))
    # Answers outside the top 1000 fall into the final "other" class.
    default_onehot = np.zeros(1001)
    default_onehot[1000] = 1.0

    for i, answer in enumerate(answers):
        answer_matrix[i] = answer_to_onehot_dict.get(answer[0].lower(), default_onehot)

    return answer_matrix

def get_questions_matrix(split):
    if split == 'train':
        data_path = 'data/train_qa'
    elif split == 'val':
        data_path = 'data/val_qa'
    else:
        print('Invalid split!')
        sys.exit()

    df = pd.read_pickle(data_path)
    questions = df[['question']].values.tolist()
    word_idx = ebd.load_idx()
    seq_list = []

    for question in questions:
        words = word_tokenize(question[0])
        seq = []
        for word in words:
            seq.append(word_idx.get(word, 0))
        seq_list.append(seq)
    question_matrix = pad_sequences(seq_list)

    return question_matrix

def get_coco_features(split):
    if split == 'train':
        data_path = 'data/train_qa'
    elif split == 'val':
        data_path = 'data/val_qa'
    else:
        print('Invalid split!')
        sys.exit()

    id_map_path = 'coco_features/coco_vgg_IDMap.txt'
    features_path = 'coco_features/vgg_feats.mat'

    img_labels = pd.read_pickle(data_path)[['image_id']].values.tolist()
    img_ids = open(id_map_path).read().splitlines()
    features_struct = scipy.io.loadmat(features_path)

    # Map each COCO image id to its column in the feature matrix.
    id_map = {}
    for ids in img_ids:
        ids_split = ids.split()
        id_map[int(ids_split[0])] = int(ids_split[1])

    VGGfeatures = features_struct['feats']
    nb_dimensions = VGGfeatures.shape[0]
    nb_images = len(img_labels)
    image_matrix = np.zeros((nb_images, nb_dimensions))

    for i in range(nb_images):
        image_matrix[i, :] = VGGfeatures[:, id_map[img_labels[i][0]]]

    return image_matrix
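# A small illustration (an assumption, not part of the original file): with the
# pickled QA data and the word index in place, the matrices line up row by row.
# pad_sequences left-pads shorter questions with zeros by default.
if __name__ == '__main__':
    questions = get_questions_matrix('val')
    answers = get_answers_matrix('val')
    print(questions.shape)  # (num_questions, length of the longest question)
    print(answers.shape)    # (num_questions, 1001)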
--------------------------------------------------------------------------------
/question_answer.py:
--------------------------------------------------------------------------------

import numpy as np
import embedding as ebd
import prepare_data
import models
import argparse
import sys
import keras.backend as K
from nltk import word_tokenize
from keras.applications.vgg16 import preprocess_input
from keras.preprocessing import image
from keras.models import load_model

def extract_image_features(img_path):
    model = models.VGG_16('weights/vgg16_weights.h5')
    img = image.load_img(img_path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    # Take the input of the final softmax layer, i.e. the 4096-d activations of
    # the last fully connected layer (learning phase 0 = test mode).
    last_layer_output = K.function([model.layers[0].input, K.learning_phase()],
                                   [model.layers[-1].input])
    features = last_layer_output([x, 0])[0]
    return features

def preprocess_question(question):
    word_idx = ebd.load_idx()
    tokens = word_tokenize(question)
    seq = []
    for token in tokens:
        seq.append(word_idx.get(token, 0))
    seq = np.reshape(seq, (1, len(seq)))
    return seq

def generate_answer(img_path, question, model_num):
    # The selector is kept as model_num so that it is not shadowed by the
    # loaded Keras model when choosing the input layout below.
    model_path = 'weights/model_' + str(model_num) + '.h5'
    model = load_model(model_path)
    img_features = extract_image_features(img_path)
    seq = preprocess_question(question)
    if model_num == 1:
        x = [img_features, seq]
    else:
        x = [img_features, seq, img_features]
    probabilities = model.predict(x)[0]
    answers = np.argsort(probabilities[:1000])
    top_answers = [prepare_data.top_answers[answers[-1]],
                   prepare_data.top_answers[answers[-2]],
                   prepare_data.top_answers[answers[-3]]]

    return top_answers

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('-image', type=str, required=True)
    parser.add_argument('-question', type=str, required=True)
    parser.add_argument('-model', type=int, default=2)
    args = parser.parse_args()
    if args.model != 1 and args.model != 2:
        print('Invalid model selection.')
        sys.exit()
    top_answers = generate_answer(args.image, args.question, args.model)
    print('Top answers: %s, %s, %s.' % (top_answers[0], top_answers[1], top_answers[2]))

if __name__ == '__main__':
    main()
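# Programmatic usage sketch (an assumption, not part of the original file):
# equivalent to the CLI example in the README, and requires the trained model
# weights and the VGG-16 weights to be present under weights/.
#
#   from question_answer import generate_answer
#   print(generate_answer('examples/COCO_val2014_000000000136.jpg',
#                         'Which animal is this?', 2))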
--------------------------------------------------------------------------------
/test.py:
--------------------------------------------------------------------------------

# Creates a series of backdated dummy commits by repeatedly rewriting data.txt.
from datetime import date, timedelta
from random import randint
from time import sleep
import sys
import subprocess
import os


def get_date_string(n, startdate):
    d = startdate - timedelta(days=n)
    rtn = d.strftime("%a %b %d %X %Y %z -0400")
    return rtn

# main app
def main(argv):
    if len(argv) < 1 or len(argv) > 2:
        print("Error: Bad input.")
        sys.exit(1)
    n = int(argv[0])
    if len(argv) == 1:
        startdate = date.today()
    if len(argv) == 2:
        startdate = date(int(argv[1][0:4]), int(argv[1][5:7]), int(argv[1][8:10]))
    i = 0
    while i <= n:
        curdate = get_date_string(i, startdate)
        num_commits = randint(1, 10)
        for commit in range(0, num_commits):
            subprocess.call("echo '" + curdate + str(randint(0, 1000000)) + "' > data.txt; git add data.txt; GIT_AUTHOR_DATE='" + curdate + "' GIT_COMMITTER_DATE='" + curdate + "' git commit -m 'Update CNN'; git push;", shell=True)
            sleep(.5)
        i += 1
    subprocess.call("git rm data.txt; git commit -m 'Reconfigure Model'; git push;", shell=True)

if __name__ == "__main__":
    main(sys.argv[1:])
--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------

import numpy as np
import prepare_data
import models
import argparse
import sys

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('-num_epochs', type=int, default=25)
    parser.add_argument('-batch_size', type=int, default=200)
    parser.add_argument('-model', type=int, default=1)
    args = parser.parse_args()

    print('Loading questions ...')
    questions_train = prepare_data.get_questions_matrix('train')
    questions_val = prepare_data.get_questions_matrix('val')
    print('Loading answers ...')
    answers_train = prepare_data.get_answers_matrix('train')
    answers_val = prepare_data.get_answers_matrix('val')
    print('Loading image features ...')
    img_features_train = prepare_data.get_coco_features('train')
    img_features_val = prepare_data.get_coco_features('val')
    print('Creating model ...')

    if args.model == 1:
        model = models.vis_lstm()
        X_train = [img_features_train, questions_train]
        X_val = [img_features_val, questions_val]
        model_path = 'weights/model_1.h5'
    elif args.model == 2:
        model = models.vis_lstm_2()
        X_train = [img_features_train, questions_train, img_features_train]
        X_val = [img_features_val, questions_val, img_features_val]
        model_path = 'weights/model_2.h5'
    else:
        print('Invalid model selection!\nAvailable choices: 1 for vis-lstm and 2 for 2-vis-lstm.')
        sys.exit()

    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    model.fit(X_train, answers_train,
              nb_epoch=args.num_epochs,
              batch_size=args.batch_size,
              validation_data=(X_val, answers_val),
              verbose=1)

    model.save(model_path)

if __name__ == '__main__':
    main()

--------------------------------------------------------------------------------
/weights/README.md:
--------------------------------------------------------------------------------

Download the pretrained VGG-16 weights from https://drive.google.com/file/d/0Bz7KyqmuGsilT0J5dmRCM0ROVHc/view and place them in this folder. This is required for making predictions on your own images.

--------------------------------------------------------------------------------