├── .gitignore ├── README.md ├── __init__.py ├── codes ├── MSCOCO.py ├── __init__.py ├── caption_generator.py ├── evaluate_model.py ├── generate_caption.py ├── generate_caption_beam.py ├── image_reader.py ├── pre_extract_googlenet_features.py ├── prepocess_captions.py ├── sample_code.ipynb ├── sample_code.py ├── sample_code_jp.ipynb └── train_caption_model.py ├── data └── .gitignore ├── download.sh ├── download_jp.sh ├── evalutation_script ├── README.md ├── evalutate_caption_val.py └── generate_caption_val.py ├── experiment1 └── .gitignore ├── images ├── COCO_val2014_000000185546.jpg ├── COCO_val2014_000000192091.jpg ├── COCO_val2014_000000229948.jpg ├── COCO_val2014_000000241747.jpg ├── COCO_val2014_000000250790.jpg ├── COCO_val2014_000000277533.jpg ├── COCO_val2014_000000285505.jpg ├── COCO_val2014_000000323758.jpg ├── COCO_val2014_000000326128.jpg ├── COCO_val2014_000000397427.jpg ├── COCO_val2014_000000553761.jpg └── test_image.jpg ├── models └── .gitignore └── work └── .gitignore /.gitignore: -------------------------------------------------------------------------------- 1 | #gtignore 以外のファイルを全部無視する。 2 | .* 3 | !.gitignore 4 | 5 | codes/sample_code_work.ipynb 6 | 7 | *.pyc 8 | 9 | *.pyc 10 | 11 | codes/image_reader.pyc 12 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ### I no longer maintain this repository. This implementation is not that clean and hard to use if you want to train on your own data. I re-implemented from scratch. The new one is much faster, accurate, and clean. It can even generate Chinese captions. Please see the [better implementation] (https://github.com/apple2373/chainer-caption). 2 | 3 | 4 | # image caption generation by chainer 5 | This codes are trying to reproduce the image captioning by google in CVPR 2015. 6 | Show and Tell: A Neural Image Caption Generator 7 | http://arxiv.org/abs/1411.4555 8 | 9 | The training data is MSCOCO. I used GoogleNet to extract images feature in advance (preprocessed them before training), and then trained language model to generate caption. 10 | 11 | I made pre-trained model available. The model achieves CIDEr of 0.66 for the MSCOCO validation dataset. To achieve the better score, the use of beam search is first step (not implemented yet). Also, I think the CNN has to be fine-tuned. 12 | Update: I implemented a beam search. Check the usage below. 13 | 14 | More information including some sample captions are in my blog post. 15 | http://t-satoshi.blogspot.com/2015/12/image-caption-generation-by-cnn-and-lstm.html 16 | 17 | ## requirement 18 | chainer 1.6 http://chainer.org 19 | and some more packages. 20 | !!Warning ** Be sure to use chainer 1.6.** Not the latest version. If you have another version, no guarantee to work. 21 | If you are new, I suggest you to install Anaconda (https://www.continuum.io/downloads) and then install chainer. You can watch the video below. 22 | 23 | ## I have a problem to prepare environment 24 | I prepared a video to show how you prepare environment and generate captions on ubuntu. I used a virtual machine just after installing ubuntu 14.04. If you imitate as in the video, you can generate captions. The process is almost the same for Mac. Windows is not suported because I cannot use it (Acutually chainer does not officialy support windows). 
25 | https://drive.google.com/file/d/0B046sNk0DhCDUkpwblZPME1vQzg/edit 26 | Or, some commands that might help: 27 | ``` 28 | #get and install anaconda. you might want to check the latest link. 29 | wget https://3230d63b5fc54e62148e-c95ac804525aac4b6dba79b00b39d1d3.ssl.cf1.rackcdn.com/Anaconda2-2.4.1-Linux-x86_64.sh 30 | bash Anaconda2-2.4.1-Linux-x86_64.sh -b 31 | echo 'export PATH=$HOME/anaconda/bin:$PATH' >> .bashrc 32 | echo 'export PYTHONPATH=$HOME/anaconda/lib/python2.7/site-packages:$PYTHONPATH' >> .bashrc 33 | source .bashrc 34 | conda update conda -y 35 | # install chainer 36 | pip install chainer==1.6 37 | ``` 38 | 39 | ## I just want to generate caption! 40 | OK, first, you need to download the models and other preprocessed files. 41 | Then you can generate caption. 42 | 43 | IMPORTANT NOTE: 44 | Google Drive suddenly shut down the hosting service and the file downlaod no longer works. 45 | Ref: https://gsuiteupdates.googleblog.com/2015/08/deprecating-web-hosting-support-in.html 46 | 47 | I don't have time to uplaod somewhere else, but all files are here: 48 | https://drive.google.com/open?id=0B046sNk0DhCDeEczcm1vaWlCTFk 49 | 50 | ``` 51 | bash download.sh 52 | cd codes 53 | python generate_caption.py -i ../images/test_image.jpg 54 | ``` 55 | This generate a caption for ../images/test_image.jpg. If you want to use your image, you just have to indicate -i option to image that you want to generate captions. 56 | 57 | Once you set up environment, you can use it as a module.Check the ipython notebooks. This includes beam search. 58 | English:https://github.com/apple2373/chainer_caption_generation/blob/master/codes/sample_code.ipynb 59 | 60 | Also, you can try beam search as: 61 | ``` 62 | cd codes 63 | python generate_caption_beam.py -b 3 -i ../images/test_image.jpg 64 | ``` 65 | -b option indicates beam size. Default is 3. 66 | 67 | ## I want to train the model by myself. 68 | I extracted the GoogleNet features and pickled, so you use it for training. 69 | ``` 70 | cd codes 71 | python train_caption_model.py 72 | python train_caption_model.py -g 0 # to use gpu. change the number to gpu_id 73 | ``` 74 | The log and trained model will be saved to a directory (experiment1 is defalt) 75 | If you want to change, use -d option. 76 | ``` 77 | python train_caption_model.py -d ./yourdirectory 78 | ``` 79 | 80 | ## I want to train from other data. 81 | Sorry, current implementation does not support it. You need to preprocess the data. Maybe you can read and modify the code. 82 | 83 | ## I want to fine-tune CNN part. 84 | Sorry, current implementation does not support it. Maybe you can read and modify the code. 85 | 86 | ## I want to generate Japanese caption. 87 | I made pre-trained Japanese caption model available. You can download Japanese caption model with the following script. 
88 | ``` 89 | bash download.sh 90 | bash download_jp.sh 91 | ``` 92 | ``` 93 | cd codes 94 | python generate_caption.py -v ../work/index2token_jp.pkl -m ../models/caption_model_jp.chainer -i ../images/test_image.jpg 95 | ``` 96 | Japnese Notebook: https://github.com/apple2373/chainer_caption_generation/blob/master/codes/sample_code_jp.ipynb 97 | Japnese Blogpost: http://t-satoshi.blogspot.com/2016/01/blog-post_1.html 98 | -------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/apple2373/chainer_caption_generation/ee3a504beec5c0a9a84662c883d68375bc41b2d8/__init__.py -------------------------------------------------------------------------------- /codes/MSCOCO.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | import json 5 | import nltk 6 | 7 | def read_MSCOCO_json(file_place): 8 | 9 | f = open(file_place, 'r') 10 | jsonData = json.load(f) 11 | f.close() 12 | 13 | captions={}#key is sentence_length. 14 | caption_id2tokens={} 15 | caption_id2image_id={} 16 | 17 | for caption_data in jsonData['annotations']: 18 | caption_id=caption_data['id'] 19 | image_id=caption_data['image_id'] 20 | caption=caption_data['caption'] 21 | 22 | caption=caption.replace('\n', '').strip().lower() 23 | if caption[-1]=='.':#to delete the last period. 24 | caption=caption[0:-1] 25 | 26 | caption_tokens=[''] 27 | caption_tokens += nltk.word_tokenize(caption) 28 | caption_tokens.append("") 29 | caption_length=len(caption_tokens) 30 | 31 | if caption_length in captions: 32 | captions[caption_length].add(caption_id) 33 | else: 34 | captions[caption_length]=set([caption_id]) 35 | 36 | caption_id2tokens[caption_id]=caption_tokens 37 | caption_id2image_id[caption_id]=image_id 38 | 39 | return captions,caption_id2tokens,caption_id2image_id 40 | -------------------------------------------------------------------------------- /codes/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/apple2373/chainer_caption_generation/ee3a504beec5c0a9a84662c883d68375bc41b2d8/codes/__init__.py -------------------------------------------------------------------------------- /codes/caption_generator.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | #!/usr/bin/env python 3 | 4 | ''' 5 | If you want to integrate caption generation system for your system, you can import this module. 6 | ''' 7 | 8 | import os 9 | #comment out the below if you want to do type check. Remeber this have to be done BEFORE import chainer 10 | #os.environ["CHAINER_TYPE_CHECK"] = "0" 11 | import chainer 12 | #If the below is false, the type check is disabled. 13 | #print(chainer.functions.Linear(1,1).type_check_enable) 14 | 15 | import numpy as np 16 | import math 17 | from chainer import cuda 18 | import chainer.functions as F 19 | from chainer import cuda, Function, FunctionSet, gradient_check, Variable, optimizers 20 | from chainer import serializers 21 | import pickle 22 | import copy 23 | from image_reader import Image_reader 24 | 25 | class Caption_generator(object): 26 | def __init__(self,caption_model_place,cnn_model_place,index2word_place,gpu_id=-1,beamsize=3): 27 | #basic paramaters you need to modify 28 | self.gpu_id=gpu_id# GPU ID. 
if you want to use cpu, -1 29 | self.beamsize=beamsize 30 | 31 | #Gpu Setting 32 | global xp 33 | if self.gpu_id >= 0: 34 | xp = cuda.cupy 35 | cuda.get_device(gpu_id).use() 36 | else: 37 | xp=np 38 | 39 | # Prepare dataset 40 | with open(index2word_place, 'r') as f: 41 | self.index2word = pickle.load(f) 42 | vocab=self.index2word 43 | 44 | #Load Caffe Model 45 | with open(cnn_model_place, 'r') as f: 46 | self.func = pickle.load(f) 47 | 48 | #Model Preparation 49 | image_feature_dim=1024#dimension of image feature 50 | self.n_units = 512 #number of units per layer 51 | n_units = 512 52 | self.model = FunctionSet() 53 | self.model.img_feature2vec=F.Linear(image_feature_dim, n_units)#CNN(I)の最後のレイヤーに相当。#parameter W,b 54 | self.model.embed=F.EmbedID(len(vocab), n_units)#W_e*S_tに相当 #parameter W 55 | self.model.l1_x=F.Linear(n_units, 4 * n_units)#parameter W,b 56 | self.model.l1_h=F.Linear(n_units, 4 * n_units)#parameter W,b 57 | self.model.out=F.Linear(n_units, len(vocab))#parameter W,b 58 | serializers.load_hdf5(caption_model_place, self.model)#read pre-trained model 59 | 60 | #To GPU 61 | if gpu_id >= 0: 62 | self.model.to_gpu() 63 | self.func.to_gpu() 64 | 65 | #to avoid overflow. 66 | #I don't know why, but this model overflows at the first time only with CPU. 67 | #So I intentionally make overflow so that it never happns after that. 68 | if gpu_id < 0: 69 | numpy_image = np.ones((3, 224,224), dtype=np.float32) 70 | self.generate(numpy_image) 71 | 72 | def feature_exractor(self,x_chainer_variable): #to extract image feature by CNN. 73 | y, = self.func(inputs={'data': x_chainer_variable}, outputs=['pool5/7x7_s1'], 74 | disable=['loss1/ave_pool', 'loss2/ave_pool','loss3/classifier'], 75 | train=False) 76 | return y 77 | 78 | def forward_one_step_for_image(self,img_feature, state, volatile='on'): 79 | x = img_feature#img_feature is chainer.variable. 80 | h0 = self.model.img_feature2vec(x) 81 | h1_in = self.model.l1_x(F.dropout(h0,train=False)) + self.model.l1_h(state['h1']) 82 | c1, h1 = F.lstm(state['c1'], h1_in) 83 | y = self.model.out(F.dropout(h1,train=False))#don't forget to change drop out into non train mode. 
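#pack the updated LSTM state and return it with a softmax over the vocabulary; the image feature is fed only at this first step, and later steps go through forward_one_step with the embedding of the previously generated word.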
84 | state = {'c1': c1, 'h1': h1} 85 | return state, F.softmax(y) 86 | 87 | #forward_one_step is after the CNN layer, 88 | #h0 is n_units dimensional vector (embedding) 89 | def forward_one_step(self,cur_word, state, volatile='on'): 90 | x = chainer.Variable(cur_word, volatile) 91 | h0 = self.model.embed(x) 92 | h1_in = self.model.l1_x(F.dropout(h0,train=False)) + self.model.l1_h(state['h1']) 93 | c1, h1 = F.lstm(state['c1'], h1_in) 94 | y = self.model.out(F.dropout(h1,train=False)) 95 | state = {'c1': c1, 'h1': h1} 96 | return state, F.softmax(y) 97 | 98 | def beam_search(self,sentence_candidates,final_sentences,depth=1,beamsize=3): 99 | volatile=True 100 | next_sentence_candidates_temp=list() 101 | for sentence_tuple in sentence_candidates: 102 | cur_sentence=sentence_tuple[0] 103 | cur_index=sentence_tuple[0][-1] 104 | cur_index_xp=xp.array([cur_index],dtype=np.int32) 105 | cur_state=sentence_tuple[1] 106 | cur_log_likely=sentence_tuple[2] 107 | 108 | state, predicted_word = self.forward_one_step(cur_index_xp,cur_state, volatile=volatile) 109 | predicted_word_np=cuda.to_cpu(predicted_word.data) 110 | top_indexes=(-predicted_word_np).argsort()[0][:beamsize] 111 | 112 | for index in np.nditer(top_indexes): 113 | index=int(index) 114 | probability=predicted_word_np[0][index] 115 | next_sentence=copy.deepcopy(cur_sentence) 116 | next_sentence.append(index) 117 | log_likely=math.log(probability) 118 | next_log_likely=cur_log_likely+log_likely 119 | next_sentence_candidates_temp.append((next_sentence,state,next_log_likely))# make each sentence tuple 120 | 121 | prob_np_array=np.array([sentence_tuple[2] for sentence_tuple in next_sentence_candidates_temp]) 122 | top_candidates_indexes=(-prob_np_array).argsort()[:beamsize] 123 | next_sentence_candidates=list() 124 | for i in top_candidates_indexes: 125 | sentence_tuple=next_sentence_candidates_temp[i] 126 | index=sentence_tuple[0][-1] 127 | if self.index2word[index]=='': 128 | final_sentence=sentence_tuple[0] 129 | final_likely=sentence_tuple[2] 130 | final_probability=math.exp(final_likely) 131 | final_sentences.append((final_sentence,final_probability,final_likely)) 132 | else: 133 | next_sentence_candidates.append(sentence_tuple) 134 | 135 | if len(final_sentences)>=beamsize: 136 | return final_sentences 137 | elif depth==50: 138 | return final_sentences 139 | else: 140 | depth+=1 141 | return self.beam_search(next_sentence_candidates,final_sentences,depth,beamsize) 142 | 143 | def generate(self,numpy_image): 144 | '''Generate Caption for an Numpy Image array 145 | 146 | Args: 147 | numpy_image: numpy image 148 | 149 | Returns: 150 | list of generated captions. The structure is [caption,caption,caption,...] 151 | Where caption = {"sentence":This is a generated sentence, "probability": The probability of the generated sentence} 152 | 153 | ''' 154 | 155 | #initial step 156 | x_batch = np.ndarray((1, 3, 224,224), dtype=np.float32) 157 | x_batch[0]=numpy_image 158 | 159 | volatile=True 160 | if self.gpu_id >=0: 161 | x_batch_chainer = Variable(cuda.to_gpu(x_batch),volatile=volatile) 162 | else: 163 | x_batch_chainer = Variable(x_batch,volatile=volatile) 164 | 165 | batchsize=1 166 | #image is chainer.variable. 
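#beam-search initialization: zero the LSTM state, run the CNN once on the image, take a single LSTM step conditioned on the feature, and seed the candidate list with the most likely first word.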
167 | state = {name: chainer.Variable(xp.zeros((batchsize, self.n_units),dtype=np.float32),volatile) for name in ('c1', 'h1')} 168 | img_feature=self.feature_exractor(x_batch_chainer) 169 | state, predicted_word = self.forward_one_step_for_image(img_feature,state, volatile=volatile) 170 | 171 | if self.gpu_id >=0: 172 | index=cuda.to_cpu(predicted_word.data.argmax(1))[0] 173 | else: 174 | index=predicted_word.data.argmax(1)[0] 175 | 176 | probability=predicted_word.data[0][index] 177 | initial_sentence_candidates=[([index],state,probability)] 178 | 179 | final_sentences=list() 180 | generated_sentence_candidates=self.beam_search(initial_sentence_candidates,final_sentences,beamsize=self.beamsize) 181 | 182 | #convert to index to strings 183 | 184 | generated_string_sentence_candidates=[] 185 | for sentence_tuple in generated_sentence_candidates: 186 | sentence=[self.index2word[index] for index in sentence_tuple[0]][1:-1] 187 | probability=sentence_tuple[1] 188 | final_likely=sentence_tuple[2] 189 | 190 | a_candidate={'sentence':sentence,'probability':probability,'log_probability':final_likely} 191 | 192 | generated_string_sentence_candidates.append(a_candidate) 193 | 194 | 195 | return generated_string_sentence_candidates 196 | 197 | def generate_temp(self,numpy_image): 198 | 199 | '''Simple Generate Caption for an Numpy Image array 200 | 201 | Args: 202 | numpy_image: numpy image 203 | 204 | Returns: 205 | string of generated capiton 206 | ''' 207 | 208 | genrated_sentence_string='' 209 | x_batch = np.ndarray((1, 3, 224,224), dtype=np.float32) 210 | x_batch[0]=numpy_image 211 | 212 | volatile=True 213 | if self.gpu_id >=0: 214 | x_batch_chainer = Variable(cuda.to_gpu(x_batch),volatile=volatile) 215 | else: 216 | x_batch_chainer = Variable(x_batch,volatile=volatile) 217 | 218 | batchsize=1 219 | 220 | #image is chainer.variable. 221 | state = {name: chainer.Variable(xp.zeros((batchsize, self.n_units),dtype=np.float32),volatile) for name in ('c1', 'h1')} 222 | img_feature=self.feature_exractor(x_batch_chainer) 223 | #img_feature_chainer is chainer.variable of extarcted feature. 
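#greedy decoding: condition the LSTM on the image feature once, then repeatedly feed back the most likely word for up to 50 steps, stopping at the end-of-sentence token.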
224 | state = {name: chainer.Variable(xp.zeros((batchsize, self.n_units),dtype=np.float32),volatile) for name in ('c1', 'h1')} 225 | state, predicted_word = self.forward_one_step_for_image(img_feature,state, volatile=volatile) 226 | index=predicted_word.data.argmax(1) 227 | index=cuda.to_cpu(index)[0] 228 | #genrated_sentence_string+=index2word[index] #dont's add it because this is 229 | 230 | for i in xrange(50): 231 | state, predicted_word = self.forward_one_step(predicted_word.data.argmax(1).astype(np.int32),state, volatile=volatile) 232 | index=predicted_word.data.argmax(1) 233 | index=cuda.to_cpu(index)[0] 234 | if self.index2word[index]=='': 235 | genrated_sentence_string=genrated_sentence_string.strip() 236 | break; 237 | genrated_sentence_string+=self.index2word[index]+" " 238 | 239 | return genrated_sentence_string 240 | 241 | def get_top_sentence(self,numpy_image): 242 | ''' 243 | just get a top sentence as string 244 | 245 | Args: 246 | numpy_image: numpy image 247 | 248 | Returns: 249 | string of generated capiton 250 | ''' 251 | candidates=self.generate(numpy_image) 252 | scores=[caption['log_probability'] for caption in candidates] 253 | argmax=np.argmax(scores) 254 | top_caption=candidates[argmax]['sentence'] 255 | 256 | sentence = '' 257 | for word in top_caption: 258 | sentence+=word+' ' 259 | 260 | return sentence.strip() 261 | 262 | 263 | 264 | -------------------------------------------------------------------------------- /codes/evaluate_model.py: -------------------------------------------------------------------------------- 1 | #under construction. 2 | #I do not use this. 3 | 4 | 5 | file_place = '../data/MSCOCO/annotations/captions_val2014.json' 6 | val_captions,val_caption_id2tokens,val_caption_id2image_id = read_MSCOCO_json(file_place) 7 | 8 | #Validiation Set 9 | print "testing" 10 | num_val_data=len(val_caption_id2image_id) 11 | caption_ids_batches=[] 12 | for caption_length in val_captions.keys(): 13 | caption_ids_set=val_captions[caption_length] 14 | caption_ids=list(caption_ids_set) 15 | caption_ids_batches+=[caption_ids[x:x + batchsize] for x in xrange(0, len(caption_ids), batchsize)] 16 | 17 | sum_loss = 0 18 | file_base='../data/MSCOCO/val2014/COCO_val2014_' 19 | for i, caption_ids_batch in enumerate(caption_ids_batches): 20 | captions_batch=[val_caption_id2sentence[caption_id] for caption_id in caption_ids_batch] 21 | sentences=xp.array(captions_batch,dtype=np.int32) 22 | image_ids_batch=[val_caption_id2image_id[caption_id] for caption_id in caption_ids_batch] 23 | 24 | try: 25 | images=images_read(image_ids_batch,file_base,volatile=True) 26 | except Exception as e: 27 | print 'image reading error' 28 | print 'type:' + str(type(e)) 29 | print 'args:' + str(e.args) 30 | print 'message:' + e.message 31 | print image_ids_batch 32 | continue 33 | 34 | batchsize=normal_batchsize#becasue I am adusting batch size depending on sentence length, I need to rechange it. 35 | if len(caption_ids_batch) != batchsize: 36 | batchsize=len(caption_ids_batch) 37 | #last batch may be less than batchsize. 
Or depend on caption_length 38 | 39 | loss = forward(images,sentences,volatile=True) 40 | 41 | sum_loss += loss.data * batchsize 42 | 43 | mean_loss = sum_loss / num_val_data 44 | print mean_loss 45 | with open(savedir+"test_mean_loss.txt", "a") as f: 46 | f.write(str(mean_loss)+'\n') -------------------------------------------------------------------------------- /codes/generate_caption.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | #!/usr/bin/env python 3 | #compatible chiner 1.5 4 | 5 | 6 | import os 7 | #comment out the below if you want to do type check. Remeber this have to be done BEFORE import chainer 8 | #os.environ["CHAINER_TYPE_CHECK"] = "0" 9 | import chainer 10 | #If the below is false, the type check is disabled. 11 | #print(chainer.functions.Linear(1,1).type_check_enable) 12 | 13 | import argparse 14 | import os 15 | import numpy as np 16 | from chainer import cuda 17 | import chainer.functions as F 18 | from chainer import cuda, Function, FunctionSet, gradient_check, Variable, optimizers 19 | #import matplotlib.pyplot as plt 20 | from chainer import serializers 21 | 22 | from scipy.misc import imread, imresize, imsave 23 | import json 24 | import random 25 | import pickle 26 | import math 27 | import skimage.transform 28 | 29 | #Settings can be changed by command line arguments 30 | gpu_id=-1# GPU ID. if you want to use cpu, -1 31 | model_place='../models/caption_model.chainer' 32 | caffe_model_place='../data/bvlc_googlenet_caffe_chainer.pkl' 33 | index2word_file = '../work/index2token.pkl' 34 | image_file_name='../images/test_image.jpg' 35 | 36 | 37 | 38 | #Override Settings by argument 39 | parser = argparse.ArgumentParser(description=u"caption generation") 40 | parser.add_argument("-g", "--gpu",default=gpu_id, type=int, help=u"GPU ID.CPU is -1") 41 | parser.add_argument("-m", "--model",default=model_place, type=str, help=u" caption generation model") 42 | parser.add_argument("-c", "--caffe",default=caffe_model_place, type=str, help=u" pre trained caffe model pickled after imported to chainer") 43 | parser.add_argument("-v", "--vocab",default=index2word_file, type=str, help=u" vocaburary file") 44 | parser.add_argument("-i", "--image",default=image_file_name, type=str, help=u"a image that you want to generate capiton ") 45 | 46 | args = parser.parse_args() 47 | gpu_id=args.gpu 48 | model_place= args.model 49 | index2word_file = args.vocab 50 | image_file_name = args.image 51 | caffe_model_place = args.caffe 52 | 53 | #Gpu Setting 54 | if gpu_id >= 0: 55 | xp = cuda.cupy 56 | cuda.get_device(gpu_id).use() 57 | else: 58 | xp=np 59 | 60 | #Basic Setting 61 | image_feature_dim=1024#dimension of image feature 62 | n_units = 512 #number of units per layer 63 | 64 | 65 | # Prepare dataset 66 | print "loading vocab" 67 | with open(index2word_file, 'r') as f: 68 | index2word = pickle.load(f) 69 | 70 | vocab=index2word 71 | 72 | 73 | #Load Caffe Model 74 | print "loading caffe models" 75 | with open(caffe_model_place, 'r') as f: 76 | func = pickle.load(f) 77 | 78 | if gpu_id>= 0: 79 | func.to_gpu() 80 | print "done" 81 | 82 | def feature_exractor(x_chainer_variable): #to extract image feature by CNN. 83 | y, = func(inputs={'data': x_chainer_variable}, outputs=['pool5/7x7_s1'], 84 | disable=['loss1/ave_pool', 'loss2/ave_pool','loss3/classifier'], 85 | train=False) 86 | return y 87 | 88 | #Read image from file into numpy. 
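#The function below resizes the image so its shortest side is 224, centre-crops it to 224x224, and returns a (3, 224, 224) float32 array in channel-first order.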
89 | #several codes are copied from here: https://github.com/ebenolson/Recipes/blob/master/examples/imagecaption/COCO%20Preprocessing.ipynb 90 | #see also https://groups.google.com/forum/#!toself.pic/lasagne-users/cCFVeT5rw-o 91 | MEAN_VALUES = np.array([104, 117, 123]).reshape((3,1,1)) 92 | def image_read_np(file_place): 93 | im = imread(file_place) 94 | if len(im.shape) == 2: 95 | im = im[:, :, np.newaxis] 96 | im = np.repeat(im, 3, axis=2) 97 | # Resize so smallest dim = 224, preserving aspect ratio 98 | h, w, _ = im.shape 99 | if h < w: 100 | im = skimage.transform.resize(im, (224, w*224/h), preserve_range=True) 101 | else: 102 | im = skimage.transform.resize(im, (h*224/w, 224), preserve_range=True) 103 | 104 | # Central crop to 224x224 105 | h, w, _ = im.shape 106 | im = im[h//2-112:h//2+112, w//2-112:w//2+112] 107 | 108 | rawim = np.copy(im).astype('uint8') 109 | 110 | # Shuffle axes to c01 111 | im = np.swapaxes(np.swapaxes(im, 1, 2), 0, 1) 112 | 113 | # Convert to BGR 114 | im = im[::-1, :, :] 115 | 116 | im = im - MEAN_VALUES 117 | return rawim.transpose(2, 0, 1).astype(np.float32) 118 | 119 | #Model Preparation 120 | print "preparing caption generation models" 121 | model = FunctionSet() 122 | model.img_feature2vec=F.Linear(image_feature_dim, n_units)#CNN(I)の最後のレイヤーに相当。#parameter W,b 123 | model.embed=F.EmbedID(len(vocab), n_units)#W_e*S_tに相当 #parameter W 124 | model.l1_x=F.Linear(n_units, 4 * n_units)#parameter W,b 125 | model.l1_h=F.Linear(n_units, 4 * n_units)#parameter W,b 126 | model.out=F.Linear(n_units, len(vocab))#parameter W,b 127 | 128 | serializers.load_hdf5(model_place, model) 129 | 130 | #To GPU 131 | if gpu_id >= 0: 132 | model.to_gpu() 133 | print "done" 134 | 135 | #Define Newtowork (Forward) 136 | 137 | #forward_one_step is after the CNN layer, 138 | #h0 is n_units dimensional vector (embedding) 139 | def forward_one_step(cur_word, state, volatile='on'): 140 | x = chainer.Variable(cur_word, volatile) 141 | h0 = model.embed(x) 142 | h1_in = model.l1_x(F.dropout(h0,train=False)) + model.l1_h(state['h1']) 143 | c1, h1 = F.lstm(state['c1'], h1_in) 144 | y = model.out(F.dropout(h1,train=False)) 145 | state = {'c1': c1, 'h1': h1} 146 | return state, y 147 | 148 | def forward_one_step_for_image(img_feature, state, volatile='on'): 149 | x = img_feature#img_feature is chainer.variable. 150 | h0 = model.img_feature2vec(x) 151 | h1_in = model.l1_x(F.dropout(h0,train=False)) + model.l1_h(state['h1']) 152 | c1, h1 = F.lstm(state['c1'], h1_in) 153 | y = model.out(F.dropout(h1,train=False))#don't forget to change drop out into non train mode. 154 | state = {'c1': c1, 'h1': h1} 155 | return state, y 156 | 157 | #to avoid overflow. 158 | #I don't know why, but this model overflows only at the first time. 159 | #So I intentionally make overflow so that it never happns after that. 
160 | if gpu_id < 0: 161 | x_batch = np.ones((1, 3, 224,224), dtype=np.float32) 162 | x_batch_chainer = Variable(x_batch) 163 | img_feature=feature_exractor(x_batch_chainer) 164 | state = {name: chainer.Variable(xp.zeros((1, n_units),dtype=np.float32)) for name in ('c1', 'h1')} 165 | state, predicted_word = forward_one_step_for_image(img_feature,state) 166 | 167 | def caption_generate(image_file_name): 168 | print('sentence generation started') 169 | 170 | genrated_sentence=[] 171 | volatile=True 172 | 173 | image=image_read_np(image_file_name) 174 | x_batch = np.ndarray((1, 3, 224,224), dtype=np.float32) 175 | x_batch[0]=image 176 | 177 | if gpu_id >=0: 178 | x_batch_chainer = Variable(cuda.to_gpu(x_batch),volatile=volatile) 179 | else: 180 | x_batch_chainer = Variable(x_batch,volatile=volatile) 181 | 182 | batchsize=1 183 | 184 | #image is chainer.variable. 185 | state = {name: chainer.Variable(xp.zeros((batchsize, n_units),dtype=np.float32),volatile) for name in ('c1', 'h1')} 186 | img_feature=feature_exractor(x_batch_chainer) 187 | state, predicted_word = forward_one_step_for_image(img_feature,state, volatile=volatile) 188 | genrated_sentence.append(predicted_word.data) 189 | 190 | for i in xrange(50): 191 | state, predicted_word = forward_one_step(predicted_word.data.argmax(1).astype(np.int32),state, volatile=volatile) 192 | genrated_sentence.append(predicted_word.data) 193 | 194 | print("---genrated_sentence--") 195 | 196 | for predicted_word in genrated_sentence: 197 | if gpu_id >=0: 198 | index=cuda.to_cpu(predicted_word.argmax(1))[0] 199 | else: 200 | index=predicted_word.argmax(1)[0] 201 | print index2word[index] 202 | if index2word[index]=='': 203 | xp.max(predicted_word) 204 | x_batch_chainer = Variable(predicted_word,volatile=volatile) 205 | print xp.max(F.softmax(x_batch_chainer).data) 206 | break 207 | 208 | caption_generate(image_file_name) -------------------------------------------------------------------------------- /codes/generate_caption_beam.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | #!/usr/bin/env python 3 | 4 | import numpy as np 5 | import argparse 6 | from image_reader import Image_reader 7 | from caption_generator import Caption_generator 8 | 9 | #Settings can be changed by command line arguments 10 | gpu_id=-1# GPU ID. 
if you want to use cpu, -1 11 | model_place='../models/caption_model.chainer' 12 | caffe_model_place='../data/bvlc_googlenet_caffe_chainer.pkl' 13 | index2word_file = '../work/index2token.pkl' 14 | image_file_name='../images/test_image.jpg' 15 | beamsize=3 16 | 17 | #Override Settings by argument 18 | parser = argparse.ArgumentParser(description=u"caption generation") 19 | parser.add_argument("-g", "--gpu",default=gpu_id, type=int, help=u"GPU ID.CPU is -1") 20 | parser.add_argument("-m", "--model",default=model_place, type=str, help=u" caption generation model") 21 | parser.add_argument("-c", "--caffe",default=caffe_model_place, type=str, help=u" pre trained caffe model pickled after imported to chainer") 22 | parser.add_argument("-v", "--vocab",default=index2word_file, type=str, help=u" vocaburary file") 23 | parser.add_argument("-i", "--image",default=image_file_name, type=str, help=u"a image that you want to generate capiton ") 24 | parser.add_argument("-b", "--beam",default=beamsize, type=int, help=u"a image that you want to generate capiton ") 25 | 26 | args = parser.parse_args() 27 | gpu_id=args.gpu 28 | model_place= args.model 29 | index2word_file = args.vocab 30 | image_file_name = args.image 31 | caffe_model_place = args.caffe 32 | beamsize = args.beam 33 | 34 | 35 | #Instantiate image_reader with GoogleNet mean image 36 | mean_image = np.array([104, 117, 123]).reshape((3,1,1))#GoogleNet Mean 37 | image_reader=Image_reader(mean=mean_image) 38 | 39 | #Instantiate caption generator 40 | caption_generator=Caption_generator(caption_model_place=model_place,cnn_model_place=caffe_model_place,index2word_place=index2word_file,beamsize=beamsize,gpu_id=gpu_id) 41 | 42 | #Read Image 43 | image=image_reader.read(image_file_name) 44 | 45 | #Generate Catpion 46 | captions=caption_generator.generate(image) 47 | 48 | #print it 49 | for caption in captions: 50 | sentence=caption['sentence'] 51 | probability=caption['probability'] 52 | print " ".join(sentence),probability 53 | 54 | -------------------------------------------------------------------------------- /codes/image_reader.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | ''' 5 | The class to read an image as numpy array. 6 | This is particurary designed for ImageNet related task. 
7 | So, whatever size the input image have, the output will be centor-croped image of 224*224 8 | Also, you can specify the mean image for CNNs like GoogleNet or VGG 9 | ''' 10 | 11 | import numpy as np 12 | from scipy.misc import imread, imresize 13 | import skimage.transform 14 | 15 | class Image_reader(object): 16 | def __init__(self,mean=np.zeros((3,1,1))): 17 | self.mean_image = mean 18 | 19 | #taken from https://github.com/ebenolson/Recipes/blob/master/examples/imagecaption/COCO%20Preprocessing.ipynb 20 | #see also https://groups.google.com/forum/#!toself.pic/lasagne-users/cCFVeT5rw-o 21 | def read(self,file_place): 22 | im = imread(file_place) 23 | if len(im.shape) == 2: 24 | im = im[:, :, np.newaxis] 25 | im = np.repeat(im, 3, axis=2) 26 | 27 | # Resize so smallest dim = 224, preserving aspect ratio 28 | h, w, _ = im.shape 29 | if h < w: 30 | im = skimage.transform.resize(im, (224, w*224/h), preserve_range=True) 31 | else: 32 | im = skimage.transform.resize(im, (h*224/w, 224), preserve_range=True) 33 | 34 | # Central crop to 224x224 35 | h, w, _ = im.shape 36 | im = im[h//2-112:h//2+112, w//2-112:w//2+112] 37 | 38 | rawim = np.copy(im).astype('uint8') 39 | 40 | # Shuffle axes to c01 41 | im = np.swapaxes(np.swapaxes(im, 1, 2), 0, 1) 42 | 43 | # Convert to BGR 44 | # We should know OpenCV's default is BGR instead of RGB 45 | im = im[::-1, :, :] 46 | 47 | im = im - self.mean_image 48 | return rawim.transpose(2, 0, 1).astype(np.float32) 49 | 50 | def crop_for_plot(self,file_place): 51 | im = imread(file_place) 52 | if len(im.shape) == 2: 53 | im = im[:, :, np.newaxis] 54 | im = np.repeat(im, 3, axis=2) 55 | # Resize so smallest dim = 224, preserving aspect ratio 56 | h, w, _ = im.shape 57 | if h < w: 58 | im = skimage.transform.resize(im, (224, w*224/h), preserve_range=True) 59 | else: 60 | im = skimage.transform.resize(im, (h*224/w, 224), preserve_range=True) 61 | 62 | # Central crop to 224x224 63 | h, w, _ = im.shape 64 | im = im[h//2-112:h//2+112, w//2-112:w//2+112] 65 | 66 | rawim = np.copy(im).astype('uint8') 67 | 68 | # Shuffle axes to c01 69 | im = np.swapaxes(np.swapaxes(im, 1, 2), 0, 1) 70 | 71 | # Convert to BGR 72 | im = im[::-1, :, :] 73 | 74 | im = im - MEAN_VALUES 75 | return rawim -------------------------------------------------------------------------------- /codes/pre_extract_googlenet_features.py: -------------------------------------------------------------------------------- 1 | ''' 2 | To extarct CNN features. 3 | 4 | This code could be messy. 5 | I did not assume others use this, but decided to make avaiable, 6 | because I saw many people who wants to use VGG insetad of GoogleNet. 7 | But remember that this is for GoogleNet. 8 | ''' 9 | #!/usr/bin/env python 10 | # -*- coding: utf-8 -*- 11 | 12 | 13 | # import os 14 | # os.environ["CHAINER_TYPE_CHECK"] = "0" #to disable type check 15 | import chainer 16 | 17 | import argparse 18 | import os 19 | import numpy as np 20 | from chainer import cuda 21 | import chainer.functions as F 22 | from chainer.functions import caffe 23 | from chainer import cuda, Function, FunctionSet, gradient_check, Variable, optimizers 24 | #import matplotlib.pyplot as plt 25 | from scipy.misc import imread, imresize, imsave 26 | import json 27 | import nltk 28 | import random 29 | import pickle 30 | import math 31 | import skimage.transform 32 | 33 | 34 | #Settings can be changed by command line arguments 35 | gpu_id=-1# GPU ID. 
if you want to use cpu, -1 36 | #gpu_id=0 37 | savedir='../work/img_features/'# name of log and results image saving directory 38 | image_feature_dim=1024#特徴の次元数。 39 | 40 | #Functions 41 | def get_image_ids(file_place): 42 | 43 | f = open(file_place, 'r') 44 | jsonData = json.load(f) 45 | f.close() 46 | 47 | image_id2feature={} 48 | for caption_data in jsonData['annotations']: 49 | image_id=caption_data['image_id'] 50 | image_id2feature[image_id]=np.array([image_feature_dim,]) 51 | 52 | return image_id2feature 53 | 54 | #Gpu Setting 55 | if gpu_id >= 0: 56 | xp = cuda.cupy 57 | cuda.get_device(gpu_id).use() 58 | else: 59 | xp=np 60 | 61 | #画像読み込み関数 62 | #ただ読むだけ 63 | MEAN_VALUES = np.array([104, 117, 123]).reshape((3,1,1)) 64 | def image_read_np(file_place): 65 | im = imread(file_place) 66 | if len(im.shape) == 2: 67 | im = im[:, :, np.newaxis] 68 | im = np.repeat(im, 3, axis=2) 69 | # Resize so smallest dim = 224, preserving aspect ratio 70 | h, w, _ = im.shape 71 | if h < w: 72 | im = skimage.transform.resize(im, (224, w*224/h), preserve_range=True) 73 | else: 74 | im = skimage.transform.resize(im, (h*224/w, 224), preserve_range=True) 75 | 76 | # Central crop to 224x224 77 | h, w, _ = im.shape 78 | im = im[h//2-112:h//2+112, w//2-112:w//2+112] 79 | 80 | rawim = np.copy(im).astype('uint8') 81 | 82 | # Shuffle axes to c01 83 | im = np.swapaxes(np.swapaxes(im, 1, 2), 0, 1) 84 | 85 | # Convert to BGR 86 | im = im[::-1, :, :] 87 | 88 | im = im - MEAN_VALUES 89 | return rawim.transpose(2, 0, 1).astype(np.float32) 90 | 91 | #main 92 | 93 | # Prepare dataset 94 | file_place = '../data/MSCOCO/annotations/captions_train2014.json' 95 | train_image_id2feature=get_image_ids(file_place) 96 | file_place = '../data/MSCOCO/annotations/captions_val2014.json' 97 | val_image_id2feature=get_image_ids(file_place) 98 | 99 | 100 | #Caffeモデルをロード 101 | print "loading caffe models" 102 | func = caffe.CaffeFunction('../data/bvlc_googlenet.caffemodel') 103 | if gpu_id>= 0: 104 | func.to_gpu() 105 | print "done" 106 | 107 | 108 | 109 | print 'feature_exractor' 110 | file_base='../data/MSCOCO/train2014/COCO_train2014_' 111 | for i, image_id in enumerate(train_image_id2feature.keys()): 112 | 113 | if i%5000==0: 114 | print i 115 | 116 | try: 117 | image=image_read_np(file_base+str("{0:012d}".format(image_id)+'.jpg')) 118 | except Exception as e: 119 | print 'image reading error' 120 | print 'type:' + str(type(e)) 121 | print 'args:' + str(e.args) 122 | print 'message:' + e.message 123 | print image_id 124 | continue 125 | 126 | x_batch = np.ndarray((1, 3, 224,224), dtype=np.float32) 127 | x_batch[0]=image 128 | if gpu_id >=0: 129 | x = Variable(cuda.to_gpu(x_batch), volatile=True) 130 | else: 131 | x = Variable(x_batch, volatile=True) 132 | image_feature_chainer, = func(inputs={'data': x}, outputs=['pool5/7x7_s1'], 133 | disable=['loss1/ave_pool', 'loss2/ave_pool','loss3/classifier'], 134 | train=False) 135 | 136 | image_feature_np=image_feature_chainer.data.reshape(1024) 137 | train_image_id2feature[image_id]=cuda.to_cpu(image_feature_np) 138 | 139 | 140 | pickle.dump(train_image_id2feature, open(savedir+"train_image_id2feature.pkl", 'wb'), -1) 141 | 142 | print "for test" 143 | file_base='../data/MSCOCO/val2014/COCO_val2014_' 144 | for i, image_id in enumerate(val_image_id2feature.keys()): 145 | 146 | if i%5000==0: 147 | print i 148 | 149 | try: 150 | image=image_read_np(file_base+str("{0:012d}".format(image_id)+'.jpg')) 151 | except Exception as e: 152 | print 'image reading error' 153 | print 'type:' + str(type(e)) 
154 | print 'args:' + str(e.args) 155 | print 'message:' + e.message 156 | print image_id 157 | continue 158 | 159 | x_batch = np.ndarray((1, 3, 224,224), dtype=np.float32) 160 | x_batch[0]=image 161 | if gpu_id >=0: 162 | x = Variable(cuda.to_gpu(x_batch), volatile=True) 163 | else: 164 | x = Variable(x_batch, volatile=True) 165 | image_feature_chainer, = func(inputs={'data': x}, outputs=['pool5/7x7_s1'], 166 | disable=['loss1/ave_pool', 'loss2/ave_pool','loss3/classifier'], 167 | train=False) 168 | 169 | image_feature_np=image_feature_chainer.data.reshape(1024) 170 | val_image_id2feature[image_id]=cuda.to_cpu(image_feature_np) 171 | 172 | pickle.dump(val_image_id2feature, open(savedir+"val_image_id2feature.pkl", 'wb'), -1) -------------------------------------------------------------------------------- /codes/prepocess_captions.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | """ 5 | This program preprocesses the caption into picke file. 6 | Main purpose is to tokenize, make lower case, and filter out low frequent vocaburaries. 7 | Note tokenize and make lower case is done by a function (read_MSCOCO_json) in another file MSCOCO.py. 8 | """ 9 | 10 | from MSCOCO import read_MSCOCO_json #to read MSCOCO json file. 11 | from gensim import corpora 12 | import pickle 13 | 14 | file_place = '../data/MSCOCO/annotations/captions_train2014.json' 15 | train_captions,train_caption_id2tokens,train_caption_id2image_id = read_MSCOCO_json(file_place) 16 | 17 | texts=train_caption_id2tokens.values() 18 | dictionary = corpora.Dictionary(texts) 19 | dictionary.filter_extremes(no_below=5, no_above=1.0) 20 | dictionary.compactify() # remove gaps in id sequence after words that were removed 21 | index2token = dict((v, k) for k, v in dictionary.token2id.iteritems()) 22 | ukn_id=len(dictionary.token2id) 23 | index2token[ukn_id]='' 24 | 25 | #just save the map from index to token (word) 26 | #that means this is vocaburary file 27 | with open('../work/index2token.pkl', 'w') as f: 28 | pickle.dump(index2token,f) 29 | 30 | 31 | train_caption_id2sentence={} 32 | for (caption_id,tokens) in train_caption_id2tokens.iteritems(): 33 | sentence=[] 34 | for token in tokens: 35 | if token in dictionary.token2id: 36 | sentence.append(dictionary.token2id[token]) 37 | else: 38 | sentence.append(ukn_id) 39 | 40 | train_caption_id2sentence[caption_id]=sentence 41 | 42 | 43 | #Save preprocessed captions. 
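#The pickle stores a tuple of three dicts: captions grouped by sentence length (length -> set of caption ids), caption_id -> list of word indices, and caption_id -> image_id.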
44 | with open('../work/preprocessed_train_captions.pkl', 'w') as f: 45 | pickle.dump((train_captions,train_caption_id2sentence,train_caption_id2image_id),f) -------------------------------------------------------------------------------- /codes/sample_code.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | ''' 5 | Sample code to generate caption 6 | ''' 7 | import numpy as np 8 | from image_reader import Image_reader 9 | from caption_generator import Caption_generator 10 | 11 | #Instantiate image_reader with GoogleNet mean image 12 | mean_image = np.array([104, 117, 123]).reshape((3,1,1)) 13 | image_reader=Image_reader(mean=mean_image) 14 | 15 | #Instantiate caption generator 16 | caption_model_place='../models/caption_model.chainer' 17 | cnn_model_place='../data/bvlc_googlenet_caffe_chainer.pkl' 18 | index2word_place='../work/index2token.pkl' 19 | caption_generator=Caption_generator(caption_model_place=caption_model_place,cnn_model_place=cnn_model_place,index2word_place=index2word_place) 20 | 21 | 22 | #The preparation is done 23 | #Let's ganarate caption for a image 24 | 25 | #First, read an image as numpy array 26 | image_file_path='../images/test_image.jpg' 27 | image=image_reader.read(image_file_path) 28 | 29 | 30 | #Next, put the image into caption generator 31 | #The output structure is 32 | # [caption,caption,caption,...] 33 | # caption = {"sentence":This is a generated sentence, "probability": The probability of the generated sentence} 34 | captions=caption_generator.generate(image) 35 | 36 | #For example, if you want to print all captions 37 | for caption in captions: 38 | sentence=caption['sentence'] 39 | probability=caption['probability'] 40 | print " ".join(sentence),probability 41 | 42 | #Let's do for another image 43 | image_file_path='../images/COCO_val2014_000000241747.jpg' 44 | image=image_reader.read(image_file_path) 45 | captions=caption_generator.generate(image) 46 | for caption in captions: 47 | sentence=caption['sentence'] 48 | probability=caption['probability'] 49 | print " ".join(sentence),probability 50 | -------------------------------------------------------------------------------- /codes/train_caption_model.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | 5 | # import os 6 | #os.environ["CHAINER_TYPE_CHECK"] = "0" #to disable type check. 7 | import chainer 8 | #Check che below is False if you disabled type check 9 | #print(chainer.functions.Linear(1,1).type_check_enable) 10 | 11 | import argparse 12 | import numpy as np 13 | import chainer.functions as F 14 | from chainer import cuda 15 | from chainer import Function, FunctionSet, Variable, optimizers, serializers 16 | import pickle 17 | import random 18 | 19 | #Settings can be changed by command line arguments 20 | gpu_id=-1# GPU ID. 
if you want to use cpu, -1 21 | #gpu_id=4 22 | savedir='../experiment1/'# name of log and results image saving directory 23 | 24 | #Override Settings by argument 25 | parser = argparse.ArgumentParser(description=u"caption generation") 26 | parser.add_argument("-g", "--gpu",default=gpu_id, type=int, help=u"GPU ID.CPU is -1") 27 | parser.add_argument("-d", "--savedir",default=savedir, type=str, help=u"The directory to save models and log") 28 | args = parser.parse_args() 29 | gpu_id=args.gpu 30 | savedir=args.savedir 31 | 32 | #Gpu Setting 33 | if gpu_id >= 0: 34 | xp = cuda.cupy 35 | cuda.get_device(gpu_id).use() 36 | else: 37 | xp=np 38 | 39 | #Prepare Data 40 | print("loading preprocessed data") 41 | 42 | with open('../work/index2token.pkl', 'r') as f: 43 | index2token = pickle.load(f) 44 | 45 | with open('../work/preprocessed_train_captions.pkl', 'r') as f: 46 | train_captions,train_caption_id2sentence,train_caption_id2image_id = pickle.load(f) 47 | 48 | with open('../work/img_features/train_image_id2feature.pkl', 'r') as f: 49 | train_image_id2feature = pickle.load(f) 50 | 51 | #Model Preparation 52 | print "preparing caption generation models" 53 | image_feature_dim=1024#特徴の次元数。 54 | n_units = 512 # number of units per layer 55 | vocab_size=len(index2token) 56 | 57 | model = chainer.FunctionSet() 58 | model.img_feature2vec=F.Linear(image_feature_dim, n_units)#CNN(I)の最後のレイヤーに相当。#parameter W,b 59 | model.embed=F.EmbedID(vocab_size, n_units)#W_e*S_tに相当 #parameter W 60 | model.l1_x=F.Linear(n_units, 4 * n_units)#parameter W,b 61 | model.l1_h=F.Linear(n_units, 4 * n_units)#parameter W,b 62 | model.out=F.Linear(n_units, vocab_size)#parameter W,b 63 | 64 | #Parameter Initialization 65 | #Mimicked Chainer Samples 66 | for param in model.params(): 67 | data = param.data 68 | data[:] = np.random.uniform(-0.1, 0.1, data.shape) 69 | 70 | #set forget bias 1 71 | model.l1_x.b.data[2*n_units:3*n_units]=np.ones(model.l1_x.b.data[2*n_units:3*n_units].shape).astype(xp.float32) 72 | model.l1_h.b.data[2*n_units:3*n_units]=np.ones(model.l1_h.b.data[2*n_units:3*n_units].shape).astype(xp.float32) 73 | 74 | #To GPU 75 | if gpu_id >= 0: 76 | model.to_gpu() 77 | 78 | 79 | #Define Newtowork (Forward) 80 | 81 | #forward_one_stepは画像の話は無視。それはforwardの一回目で特別にやる。 82 | #h0はn_units次元のベクトル(embedding) 83 | #cur_wordはその時の単語のone-hot-vector 84 | #next_wordはそこで出力すべきone-hot-vector(つまり次のー単語) 85 | 86 | 87 | def forward_one_step(cur_word, next_word, state, volatile=False): 88 | x = chainer.Variable(cur_word, volatile) 89 | t = chainer.Variable(next_word, volatile) 90 | h0 = model.embed(x) 91 | h1_in = model.l1_x(F.dropout(h0)) + model.l1_h(state['h1']) 92 | c1, h1 = F.lstm(state['c1'], h1_in) 93 | y = model.out(F.dropout(h1)) 94 | state = {'c1': c1, 'h1': h1} 95 | loss = F.softmax_cross_entropy(y, t) 96 | return state, loss 97 | 98 | def forward_one_step_for_image(img_feature, first_word, state, volatile=False): 99 | print img_feature.shape 100 | x = chainer.Variable(img_feature) 101 | t = chainer.Variable(first_word, volatile) 102 | h0 = model.img_feature2vec(x) 103 | h1_in = model.l1_x(F.dropout(h0)) + model.l1_h(state['h1']) 104 | c1, h1 = F.lstm(state['c1'], h1_in) 105 | y = model.out(F.dropout(h1)) 106 | state = {'c1': c1, 'h1': h1} 107 | loss = F.softmax_cross_entropy(y, t) 108 | return state, loss 109 | 110 | #imageは画像 111 | #x_listはある画像(image)に対応する文章(単語の集まり+EOS) 112 | #つまりx_list=[word1,word2,....,EOS] 113 | def forward(img_feature,sentences, volatile=False): 114 | #imageはすでにchinaer variableである。 115 | state = {name: 
chainer.Variable(xp.zeros((batchsize, n_units),dtype=xp.float32),volatile) for name in ('c1', 'h1')} 116 | loss = 0 117 | 118 | first_word=sentences.T[0] 119 | #[[w11,w12,...],[w21,w22...]]から[w11,w21]と最初の単語たちを取り出す. 120 | #バッチサイズの数だけ文があって、それぞれの最初の単語だけを取ってきた、一次元の配列を作るということ。 121 | 122 | state, new_loss = forward_one_step_for_image(img_feature, first_word,state, volatile=volatile) 123 | loss += new_loss 124 | 125 | #cur_wordに今の単語のnp.array(1次元) 126 | #next_wordに次の単語のnp.array(1次元) 127 | for cur_word, next_word in zip(sentences.T, sentences.T[1:]): 128 | state, new_loss = forward_one_step(cur_word, next_word,state, volatile=volatile) 129 | loss += new_loss 130 | return loss 131 | 132 | optimizer = optimizers.Adam() 133 | optimizer.setup(model) 134 | 135 | #Trining Setting 136 | normal_batchsize=256 137 | grad_clip = 1.0 138 | num_train_data=len(train_caption_id2image_id) 139 | 140 | #Begin Training 141 | print 'training started' 142 | for epoch in xrange(200): 143 | 144 | print 'epoch %d' %epoch 145 | 146 | batchsize=normal_batchsize 147 | caption_ids_batches=[] 148 | for caption_length in train_captions.keys(): 149 | caption_ids_set=train_captions[caption_length] 150 | caption_ids=list(caption_ids_set) 151 | random.shuffle(caption_ids) 152 | caption_ids_batches+=[caption_ids[x:x + batchsize] for x in xrange(0, len(caption_ids), batchsize)] 153 | random.shuffle(caption_ids_batches) 154 | 155 | # training_bacthes={} 156 | # for i, caption_ids_batch in enumerate(caption_ids_batches): 157 | # images = xp.array([train_image_id2feature[train_caption_id2image_id[caption_id]] for caption_id in caption_ids_batch],dtype=xp.float32) 158 | # sentences = xp.array([train_caption_id2sentence[caption_id] for caption_id in caption_ids_batch],dtype=xp.int32) 159 | # training_bacthes[i]= (images,sentences) 160 | 161 | #This is equivalent for above and hard to read, but I inteitionally did for faster calculation 162 | training_bacthes = \ 163 | { i:\ 164 | (\ 165 | xp.array([train_image_id2feature[train_caption_id2image_id[caption_id]] for caption_id in caption_ids_batch],dtype=xp.float32),\ 166 | xp.array([train_caption_id2sentence[caption_id] for caption_id in caption_ids_batch],dtype=xp.int32)\ 167 | )\ 168 | for i, caption_ids_batch in enumerate(caption_ids_batches)\ 169 | } 170 | 171 | sum_loss = 0 172 | for i, batch in training_bacthes.iteritems(): 173 | images=batch[0] 174 | sentences=batch[1] 175 | 176 | sentence_length=len(sentences[0]) 177 | batchsize=normal_batchsize#reverse batchsize if it is changed due to sentence length. 178 | if len(images) != batchsize: 179 | batchsize=len(images) 180 | #last batch may be less than batchsize. 
Or depend on caption_length 181 | 182 | optimizer.zero_grads() 183 | loss = forward(images,sentences) 184 | print loss.data 185 | with open(savedir+"real_loss.txt", "a") as f: 186 | f.write(str(loss.data)+'\n') 187 | with open(savedir+"real_loss_per_word.txt", "a") as f: 188 | f.write(str(loss.data/sentence_length)+'\n') 189 | 190 | loss.backward() 191 | #optimizer.clip_grads(grad_clip) 192 | optimizer.update() 193 | 194 | sum_loss += loss.data * batchsize 195 | 196 | serializers.save_hdf5(savedir+"/caption_model"+str(epoch)+'.chainer', model) 197 | serializers.save_hdf5(savedir+"/optimizer"+str(epoch)+'.chainer', optimizer) 198 | 199 | mean_loss = sum_loss / num_train_data 200 | with open(savedir+"mean_loss.txt", "a") as f: 201 | f.write(str(loss.data)+'\n') 202 | 203 | -------------------------------------------------------------------------------- /data/.gitignore: -------------------------------------------------------------------------------- 1 | #gtignore 以外のファイルを全部無視する。 2 | * 3 | !.gitignore 4 | -------------------------------------------------------------------------------- /download.sh: -------------------------------------------------------------------------------- 1 | #! /bin/bash 2 | cd data 3 | if [ ! -f bvlc_googlenet_caffe_chainer.pkl ]; then 4 | wget https://googledrive.com/host/0B046sNk0DhCDeEczcm1vaWlCTFk/data/bvlc_googlenet_caffe_chainer.pkl 5 | fi 6 | cd .. 7 | cd work 8 | if [ ! -f index2token.pkl ]; then 9 | wget https://googledrive.com/host/0B046sNk0DhCDeEczcm1vaWlCTFk/work/index2token.pkl 10 | fi 11 | if [ ! -f preprocessed_train_captions.pkl ]; then 12 | wget https://googledrive.com/host/0B046sNk0DhCDeEczcm1vaWlCTFk/work/preprocessed_train_captions.pkl 13 | fi 14 | if [ ! -d img_features ]; then 15 | mkdir img_features 16 | fi 17 | cd img_features 18 | if [ ! -f train_image_id2feature.pkl ]; then 19 | wget https://googledrive.com/host/0B046sNk0DhCDeEczcm1vaWlCTFk/work/img_features/train_image_id2feature.pkl 20 | fi 21 | if [ ! -f val_image_id2feature.pkl ]; then 22 | wget https://googledrive.com/host/0B046sNk0DhCDeEczcm1vaWlCTFk/work/img_features/val_image_id2feature.pkl 23 | fi 24 | cd ../../ 25 | cd models 26 | if [ ! -f caption_model.chainer ]; then 27 | wget https://googledrive.com/host/0B046sNk0DhCDeEczcm1vaWlCTFk/models/caption_model.chainer 28 | fi -------------------------------------------------------------------------------- /download_jp.sh: -------------------------------------------------------------------------------- 1 | #! /bin/bash 2 | cd work 3 | if [ ! -f index2token_jp.pkl ]; then 4 | wget https://googledrive.com/host/0B046sNk0DhCDeEczcm1vaWlCTFk/work/index2token_jp.pkl 5 | fi 6 | cd .. 7 | cd models 8 | if [ ! -f caption_model_jp.chainer ]; then 9 | wget https://googledrive.com/host/0B046sNk0DhCDeEczcm1vaWlCTFk/models/caption_model_jp.chainer 10 | fi 11 | -------------------------------------------------------------------------------- /evalutation_script/README.md: -------------------------------------------------------------------------------- 1 | # Evaluation Script for MSCOCO 2 | This code is based on the the follwoing repository. 3 | https://github.com/tylin/coco-caption 4 | To use the scripts here, please copy the three folders and thier contents to this place. 5 | annotations 6 | pycocoevalcap 7 | pycocotools 8 | 9 | 10 | ## How to do evaluation? 11 | Prepare the directory that contains several json files for evaluation. 
12 | The json file should be: 13 | [{"image_id": 404464, "caption": "black and white photo of a man standing in front of a building"}, {"image_id": 380932, "caption": "group of people are on the side of a snowy field"},...] 14 | Then, it will save json file into results folder by the file name. 15 | -------------------------------------------------------------------------------- /evalutation_script/evalutate_caption_val.py: -------------------------------------------------------------------------------- 1 | ''' 2 | This is a script to evaluate generated captions for validiation files. 3 | Most of the script are from https://github.com/tylin/coco-caption 4 | ''' 5 | 6 | # -*- coding: utf-8 -*- 7 | #!/usr/bin/env python 8 | #compatible chiner 1.5 9 | 10 | from pycocotools.coco import COCO 11 | from pycocoevalcap.eval import COCOEvalCap 12 | import matplotlib.pyplot as plt 13 | import skimage.io as io 14 | import pylab 15 | pylab.rcParams['figure.figsize'] = (10.0, 8.0) 16 | 17 | import json 18 | from json import encoder 19 | encoder.FLOAT_REPR = lambda o: format(o, '.3f') 20 | 21 | model_dir='../experiment1' 22 | 23 | annFile='./annotations/captions_val2014.json' 24 | 25 | # create coco object and cocoRes object 26 | coco = COCO(annFile) 27 | 28 | all_results_json=[] 29 | 30 | for i in xrange(50): 31 | resFile=model_dir+'/caption_model%d.json'%i 32 | print resFile 33 | 34 | 35 | cocoRes = coco.loadRes(resFile) 36 | # create cocoEval object by taking coco and cocoRes 37 | cocoEval = COCOEvalCap(coco, cocoRes) 38 | 39 | # evaluate on a subset of images by setting 40 | # cocoEval.params['image_id'] = cocoRes.getImgIds() 41 | # please remove this line when evaluating the full validation set 42 | #cocoEval.params['image_id'] = cocoRes.getImgIds() 43 | 44 | #evaluate results 45 | cocoEval.evaluate() 46 | 47 | # print output evaluation scores 48 | results={} 49 | for metric, score in cocoEval.eval.items(): 50 | results[metric]=score 51 | all_results_json.append(results) 52 | 53 | with open(model_dir+'/evaluation_val.json', 'w') as f: 54 | json.dump(all_results_json, f, sort_keys=True, indent=4) 55 | -------------------------------------------------------------------------------- /evalutation_script/generate_caption_val.py: -------------------------------------------------------------------------------- 1 | ''' 2 | This is a script to generate captions for validiation files. 3 | ''' 4 | 5 | # -*- coding: utf-8 -*- 6 | #!/usr/bin/env python 7 | #compatible chiner 1.5 8 | 9 | 10 | import os 11 | os.environ["CHAINER_TYPE_CHECK"] = "0" #to disable type check. 12 | import chainer 13 | #Check che below is False if you disabled type check 14 | #print(chainer.functions.Linear(1,1).type_check_enable) 15 | 16 | import argparse 17 | import numpy as np 18 | import chainer.functions as F 19 | from chainer import cuda 20 | from chainer import Function, FunctionSet, Variable, optimizers, serializers 21 | import pickle 22 | 23 | import glob 24 | import os 25 | import json 26 | 27 | #Settings can be changed by command line arguments 28 | gpu_id=0# GPU ID. 
if you want to use cpu, -1 29 | model_dir='../experiment1' 30 | 31 | #Override Settings by argument 32 | parser = argparse.ArgumentParser(description=u"caption generation") 33 | parser.add_argument("-g", "--gpu",default=gpu_id, type=int, help=u"GPU ID.CPU is -1") 34 | parser.add_argument("-m", "--modeldir",default=model_dir, type=str, help=u"The directory that have models") 35 | args = parser.parse_args() 36 | gpu_id=args.gpu 37 | model_dir= args.modeldir 38 | 39 | 40 | print('pareparing evaluation') 41 | 42 | 43 | with open('../work/img_features/val_image_id2feature.pkl', 'r') as f: 44 | val_image_id2feature = pickle.load(f) 45 | 46 | #Gpu Setting 47 | if gpu_id >= 0: 48 | xp = cuda.cupy 49 | cuda.get_device(gpu_id).use() 50 | else: 51 | xp=np 52 | 53 | #Basic Setting 54 | image_feature_dim=1024#dimension of image feature 55 | n_units = 512 #number of units per layer 56 | batchsize=1#has to be 1 currently because of implementation. 57 | volatile=False 58 | 59 | 60 | # Prepare dataset 61 | print "loading vocab" 62 | with open('../work/index2token.pkl', 'r') as f: 63 | index2word = pickle.load(f) 64 | 65 | vocab=index2word 66 | 67 | #Model Preparation 68 | print "preparing caption generation models" 69 | model = FunctionSet() 70 | model.img_feature2vec=F.Linear(image_feature_dim, n_units)#CNN(I)の最後のレイヤーに相当。#parameter W,b 71 | model.embed=F.EmbedID(len(vocab), n_units)#W_e*S_tに相当 #parameter W 72 | model.l1_x=F.Linear(n_units, 4 * n_units)#parameter W,b 73 | model.l1_h=F.Linear(n_units, 4 * n_units)#parameter W,b 74 | model.out=F.Linear(n_units, len(vocab))#parameter W,b 75 | 76 | #To GPU 77 | if gpu_id >= 0: 78 | model.to_gpu() 79 | print "done" 80 | 81 | for (image_id,feature) in val_image_id2feature.iteritems(): 82 | x_batch = np.ndarray((1,image_feature_dim), dtype=np.float32) 83 | x_batch[0]=feature 84 | if gpu_id >= 0: 85 | x_batch=cuda.to_gpu(x_batch) 86 | x_batch_chainer = Variable(x_batch,volatile=volatile) 87 | val_image_id2feature[image_id]=x_batch_chainer 88 | 89 | #Define Newtowork (Forward) 90 | 91 | #forward_one_step is after the CNN layer, 92 | #h0 is n_units dimensional vector (embedding) 93 | def forward_one_step(cur_word, state, volatile=True): 94 | x = chainer.Variable(cur_word, volatile) 95 | h0 = model.embed(x) 96 | h1_in = model.l1_x(F.dropout(h0,train=False)) + model.l1_h(state['h1']) 97 | c1, h1 = F.lstm(state['c1'], h1_in) 98 | y = model.out(F.dropout(h1,train=False)) 99 | state = {'c1': c1, 'h1': h1} 100 | return state, y 101 | 102 | def forward_one_step_for_image(img_feature, state, volatile=True): 103 | x = img_feature#img_feature is chainer.variable. 104 | h0 = model.img_feature2vec(x) 105 | h1_in = model.l1_x(F.dropout(h0,train=False)) + model.l1_h(state['h1']) 106 | c1, h1 = F.lstm(state['c1'], h1_in) 107 | y = model.out(F.dropout(h1,train=False))#don't forget to change drop out into non train mode. 108 | state = {'c1': c1, 'h1': h1} 109 | return state, y 110 | 111 | print('evaluation started') 112 | 113 | for model_place in glob.glob(os.path.join(model_dir, 'caption_model*.chainer')): 114 | print model_place 115 | 116 | serializers.load_hdf5(model_place, model)#load model 117 | 118 | results_list=[] 119 | 120 | for image_id in val_image_id2feature: 121 | 122 | img_feature_chainer=val_image_id2feature[image_id] 123 | 124 | genrated_sentence_string='' 125 | 126 | #img_feature_chainer is chainer.variable of extarcted feature. 
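#Greedy decoding for each validation image: reset the LSTM state, condition it on the pre-extracted image feature, then feed back the argmax word for up to 50 steps to build the caption string.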
--------------------------------------------------------------------------------
/evalutation_script/generate_caption_val.py:
--------------------------------------------------------------------------------
'''
This is a script to generate captions for validation files.
'''

# -*- coding: utf-8 -*-
#!/usr/bin/env python
#compatible with chainer 1.5


import os
os.environ["CHAINER_TYPE_CHECK"] = "0" #to disable type check.
import chainer
#Check that the line below prints False if you disabled type check
#print(chainer.functions.Linear(1,1).type_check_enable)

import argparse
import numpy as np
import chainer.functions as F
from chainer import cuda
from chainer import Function, FunctionSet, Variable, optimizers, serializers
import pickle

import glob
import os
import json

#Settings can be changed by command line arguments
gpu_id=0# GPU ID. Use -1 for CPU.
model_dir='../experiment1'

#Override Settings by argument
parser = argparse.ArgumentParser(description=u"caption generation")
parser.add_argument("-g", "--gpu",default=gpu_id, type=int, help=u"GPU ID. CPU is -1")
parser.add_argument("-m", "--modeldir",default=model_dir, type=str, help=u"The directory that has the models")
args = parser.parse_args()
gpu_id=args.gpu
model_dir= args.modeldir


print('preparing evaluation')


with open('../work/img_features/val_image_id2feature.pkl', 'r') as f:
    val_image_id2feature = pickle.load(f)

#GPU Setting
if gpu_id >= 0:
    xp = cuda.cupy
    cuda.get_device(gpu_id).use()
else:
    xp=np

#Basic Setting
image_feature_dim=1024#dimension of image feature
n_units = 512 #number of units per layer
batchsize=1#has to be 1 currently because of implementation.
volatile=False


# Prepare dataset
print "loading vocab"
with open('../work/index2token.pkl', 'r') as f:
    index2word = pickle.load(f)

vocab=index2word

#Model Preparation
print "preparing caption generation models"
model = FunctionSet()
model.img_feature2vec=F.Linear(image_feature_dim, n_units)#corresponds to the last layer of CNN(I) #parameter W,b
model.embed=F.EmbedID(len(vocab), n_units)#corresponds to W_e*S_t #parameter W
model.l1_x=F.Linear(n_units, 4 * n_units)#parameter W,b
model.l1_h=F.Linear(n_units, 4 * n_units)#parameter W,b
model.out=F.Linear(n_units, len(vocab))#parameter W,b

#To GPU
if gpu_id >= 0:
    model.to_gpu()
print "done"

for (image_id,feature) in val_image_id2feature.iteritems():
    x_batch = np.ndarray((1,image_feature_dim), dtype=np.float32)
    x_batch[0]=feature
    if gpu_id >= 0:
        x_batch=cuda.to_gpu(x_batch)
    x_batch_chainer = Variable(x_batch,volatile=volatile)
    val_image_id2feature[image_id]=x_batch_chainer

#Define Network (Forward)

#forward_one_step is after the CNN layer,
#h0 is n_units dimensional vector (embedding)
def forward_one_step(cur_word, state, volatile=True):
    x = chainer.Variable(cur_word, volatile)
    h0 = model.embed(x)
    h1_in = model.l1_x(F.dropout(h0,train=False)) + model.l1_h(state['h1'])
    c1, h1 = F.lstm(state['c1'], h1_in)
    y = model.out(F.dropout(h1,train=False))
    state = {'c1': c1, 'h1': h1}
    return state, y

def forward_one_step_for_image(img_feature, state, volatile=True):
    x = img_feature#img_feature is a chainer.Variable.
    h0 = model.img_feature2vec(x)
    h1_in = model.l1_x(F.dropout(h0,train=False)) + model.l1_h(state['h1'])
    c1, h1 = F.lstm(state['c1'], h1_in)
    y = model.out(F.dropout(h1,train=False))#don't forget to change dropout into non-train mode.
    state = {'c1': c1, 'h1': h1}
    return state, y

print('evaluation started')

for model_place in glob.glob(os.path.join(model_dir, 'caption_model*.chainer')):
    print model_place

    serializers.load_hdf5(model_place, model)#load model

    results_list=[]

    for image_id in val_image_id2feature:

        img_feature_chainer=val_image_id2feature[image_id]

        generated_sentence_string=''

        #img_feature_chainer is a chainer.Variable of the extracted feature.
        state = {name: chainer.Variable(xp.zeros((batchsize, n_units),dtype=np.float32),volatile) for name in ('c1', 'h1')}
        state, predicted_word = forward_one_step_for_image(img_feature_chainer,state, volatile=volatile)
        index=predicted_word.data.argmax(1)
        index=cuda.to_cpu(index)[0]
        #generated_sentence_string+=index2word[index] #don't add it, because this is not an actual word of the caption

        for i in xrange(50):
            state, predicted_word = forward_one_step(predicted_word.data.argmax(1).astype(np.int32),state, volatile=volatile)
            index=predicted_word.data.argmax(1)
            index=cuda.to_cpu(index)[0]
            if index2word[index]=='':
                generated_sentence_string=generated_sentence_string.strip()
                break
            generated_sentence_string+=index2word[index]+" "

        line={}
        line['image_id']=image_id
        line['caption']=generated_sentence_string
        results_list.append(line)

    name, ext = os.path.splitext(model_place)
    with open(name+'.json', 'w') as f:
        json.dump(results_list, f, sort_keys=True, indent=4)
--------------------------------------------------------------------------------
/experiment1/.gitignore:
--------------------------------------------------------------------------------
#Ignore everything except .gitignore.
*
!.gitignore
--------------------------------------------------------------------------------
/images/COCO_val2014_000000185546.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/apple2373/chainer_caption_generation/ee3a504beec5c0a9a84662c883d68375bc41b2d8/images/COCO_val2014_000000185546.jpg
--------------------------------------------------------------------------------
/images/COCO_val2014_000000192091.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/apple2373/chainer_caption_generation/ee3a504beec5c0a9a84662c883d68375bc41b2d8/images/COCO_val2014_000000192091.jpg
--------------------------------------------------------------------------------
/images/COCO_val2014_000000229948.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/apple2373/chainer_caption_generation/ee3a504beec5c0a9a84662c883d68375bc41b2d8/images/COCO_val2014_000000229948.jpg
--------------------------------------------------------------------------------
/images/COCO_val2014_000000241747.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/apple2373/chainer_caption_generation/ee3a504beec5c0a9a84662c883d68375bc41b2d8/images/COCO_val2014_000000241747.jpg
--------------------------------------------------------------------------------
/images/COCO_val2014_000000250790.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/apple2373/chainer_caption_generation/ee3a504beec5c0a9a84662c883d68375bc41b2d8/images/COCO_val2014_000000250790.jpg
--------------------------------------------------------------------------------
/images/COCO_val2014_000000277533.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/apple2373/chainer_caption_generation/ee3a504beec5c0a9a84662c883d68375bc41b2d8/images/COCO_val2014_000000277533.jpg
--------------------------------------------------------------------------------
/images/COCO_val2014_000000285505.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/apple2373/chainer_caption_generation/ee3a504beec5c0a9a84662c883d68375bc41b2d8/images/COCO_val2014_000000285505.jpg
--------------------------------------------------------------------------------
/images/COCO_val2014_000000323758.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/apple2373/chainer_caption_generation/ee3a504beec5c0a9a84662c883d68375bc41b2d8/images/COCO_val2014_000000323758.jpg
--------------------------------------------------------------------------------
/images/COCO_val2014_000000326128.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/apple2373/chainer_caption_generation/ee3a504beec5c0a9a84662c883d68375bc41b2d8/images/COCO_val2014_000000326128.jpg
--------------------------------------------------------------------------------
/images/COCO_val2014_000000397427.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/apple2373/chainer_caption_generation/ee3a504beec5c0a9a84662c883d68375bc41b2d8/images/COCO_val2014_000000397427.jpg
--------------------------------------------------------------------------------
/images/COCO_val2014_000000553761.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/apple2373/chainer_caption_generation/ee3a504beec5c0a9a84662c883d68375bc41b2d8/images/COCO_val2014_000000553761.jpg
--------------------------------------------------------------------------------
/images/test_image.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/apple2373/chainer_caption_generation/ee3a504beec5c0a9a84662c883d68375bc41b2d8/images/test_image.jpg
--------------------------------------------------------------------------------
/models/.gitignore:
--------------------------------------------------------------------------------
#Ignore everything except .gitignore.
*
!.gitignore
--------------------------------------------------------------------------------
/work/.gitignore:
--------------------------------------------------------------------------------
#Ignore everything except .gitignore.
*
!.gitignore
--------------------------------------------------------------------------------