├── .gitignore
├── README.md
├── config.py
├── decoder.py
├── embedding.py
├── generate.py
├── images
│   ├── ex1.jpg
│   ├── ex2.jpg
│   ├── ex3.jpg
│   └── ex4.jpg
├── search.py
└── skipthoughts.py
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 |
5 | # C extensions
6 | *.so
7 |
8 | # Distribution / packaging
9 | .Python
10 | env/
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | *.egg-info/
23 | .installed.cfg
24 | *.egg
25 |
26 | # PyInstaller
27 | # Usually these files are written by a python script from a template
28 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
29 | *.manifest
30 | *.spec
31 |
32 | # Installer logs
33 | pip-log.txt
34 | pip-delete-this-directory.txt
35 |
36 | # Unit test / coverage reports
37 | htmlcov/
38 | .tox/
39 | .coverage
40 | .coverage.*
41 | .cache
42 | nosetests.xml
43 | coverage.xml
44 | *,cover
45 |
46 | # Translations
47 | *.mo
48 | *.pot
49 |
50 | # Django stuff:
51 | *.log
52 |
53 | # Sphinx documentation
54 | docs/_build/
55 |
56 | # PyBuilder
57 | target/
58 |
59 | # Ignore changes to configuration file
60 | config.py
61 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # neural-storyteller
2 |
3 | neural-storyteller is a recurrent neural network that generates little stories about images. This repository contains code for generating stories with your own images, as well as instructions for training new models.
4 |
5 |
6 | *We were barely able to catch the breeze at the beach , and it felt as if someone stepped out of my mind . She was in love with him for the first time in months , so she had no intention of escaping . The sun had risen from the ocean , making her feel more alive than normal . She 's beautiful , but the truth is that I do n't know what to do . The sun was just starting to fade away , leaving people scattered around the Atlantic Ocean . I d seen the men in his life , who guided me at the beach once more .*
7 |
8 | [Samim](http://samim.io/) has made an awesome blog post with lots of results [here](https://medium.com/@samim/generating-stories-about-images-d163ba41e4ed).
9 |
10 | Some more results from an older model trained on Adventure books can be found [here](http://www.cs.toronto.edu/~rkiros/adv_L.html).
11 |
12 | The whole approach contains 4 components:
13 | * [skip-thought vectors](https://github.com/ryankiros/skip-thoughts)
14 | * [image-sentence embeddings](https://github.com/ryankiros/visual-semantic-embedding)
15 | * [conditional neural language models](https://github.com/ryankiros/skip-thoughts/tree/master/decoding)
16 | * style shifting (described in this project)
17 |
18 | The 'style-shifting' operation is what allows our model to transfer standard image captions to the style of stories from novels. The only source of supervision in our models is from [Microsoft COCO](http://mscoco.org/) captions. That is, we did not collect any new training data to directly predict stories given images.
19 |
20 | Style shifting was inspired by [A Neural Algorithm of Artistic Style](http://arxiv.org/abs/1508.06576) but the technical details are completely different.
21 |
22 | ## How does it work?
23 |
24 | We first train a recurrent neural network (RNN) decoder on romance novels. Each passage from a novel is mapped to a skip-thought vector. The RNN then conditions on the skip-thought vector and aims to generate the passage that it has encoded. We use romance novels collected from the BookCorpus [dataset](http://www.cs.toronto.edu/~mbweb/).
25 |
26 | Parallel to this, we train a visual-semantic embedding between COCO images and captions. In this model, captions and images are mapped into a common vector space. After training, we can embed new images and retrieve captions.
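
The retrieval step can be pictured as a nearest-neighbour search in the shared space. The sketch below is not the repository's code: the function names and the toy 2-D vectors are illustrative assumptions. It scores captions by cosine similarity, which matches a dot product once both sides are L2-normalized.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def nearest_captions(image_vec, caption_vecs, captions, k=3):
    """Return the k captions whose embeddings score highest against image_vec."""
    image_unit = l2_normalize(image_vec[None, :])[0]
    scores = l2_normalize(caption_vecs).dot(image_unit)
    top = np.argsort(scores)[::-1][:k]  # indices sorted by descending score
    return [captions[i] for i in top]

# Toy 2-D embeddings standing in for real model outputs (assumption).
captions = ["a dog on a beach", "a plate of food", "a surfer riding a wave"]
caption_vecs = np.array([[0.9, 0.1], [0.0, 1.0], [1.0, 0.0]])
image_vec = np.array([0.8, 0.2])
print(nearest_captions(image_vec, caption_vecs, captions, k=2))
```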
27 |
28 | Given these models, we need a way to bridge the gap between retrieved image captions and passages in novels. That is, if we had a function F that maps a collection of image caption vectors **x** to a book passage vector F(**x**), then we could feed F(**x**) to the decoder to get our story. There is no such parallel data, so we need to construct F another way.
29 |
30 | It turns out that skip-thought vectors have some intriguing properties that allow us to construct F in a really simple way. Suppose we have 3 vectors: an image caption **x**, a "caption style" vector **c** and a "book style" vector **b**. Then we define F as
31 |
32 | F(**x**) = **x** - **c** + **b**
33 |
34 | which intuitively means: keep the "thought" of the caption, but replace the image caption style with that of a story. Then, we simply feed F(**x**) to the decoder.
35 |
36 | How do we construct **c** and **b**? Here, **c** is the mean of the skip-thought vectors for Microsoft COCO training captions. We set **b** to be the mean of the skip-thought vectors for romance novel passages that are of length > 100.
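
As a minimal sketch of the shift, assuming the retrieved caption vectors are averaged into a single **x** before shifting (the function name and the toy 3-D vectors are hypothetical, not taken from the repository):

```python
import numpy as np

def style_shift(caption_vecs, caption_style, book_style):
    """Apply F(x) = x - c + b, with x taken as the mean of the caption vectors."""
    x = caption_vecs.mean(axis=0)          # assumption: average the retrieved captions
    return x - caption_style + book_style  # remove caption style, add book style

# Toy 3-D vectors standing in for real skip-thought vectors (assumption).
caption_vecs = np.array([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
c = np.array([1.0, 1.0, 1.0])  # "caption style" vector
b = np.array([0.0, 5.0, 0.0])  # "book style" vector
print(style_shift(caption_vecs, c, b))
```

The result has the same dimensionality as the inputs, so it can be passed to a decoder that was trained to condition on vectors from the same encoder.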
37 |
38 | #### What kind of biases work?
39 |
40 | Skip-thought vectors are sensitive to:
41 |
42 | - length (if you bias by really long passages, it will decode really long stories)
43 | - punctuation
44 | - vocabulary
45 | - syntactic style (loosely speaking)
46 |
47 | For the last point: if you bias using text that is all written the same way, the stories you get will also be written that way.
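
A bias vector of this kind could be built roughly as follows. This is a sketch under assumptions: `encode` is a hypothetical callable standing in for a skip-thought encoder, and the length filter counts whitespace-separated tokens, which may not be the unit used for the released style vectors.

```python
import numpy as np

def mean_style_vector(passages, encode, min_words=0):
    """Average the encodings of all passages longer than min_words tokens."""
    kept = [p for p in passages if len(p.split()) > min_words]
    if not kept:
        raise ValueError("no passage is longer than min_words")
    return np.mean([encode(p) for p in kept], axis=0)

def toy_encode(text):
    """Hypothetical stand-in encoder: [token count, comma count]."""
    return np.array([len(text.split()), text.count(',')], dtype='float32')

passages = [
    "short one",
    "a much longer passage , with commas , inside",
    "another long passage without any commas at all",
]
print(mean_style_vector(passages, toy_encode, min_words=5))
```

Because the mean is taken only over the passages that pass the filter, the choice of filter (length, punctuation, source text) is what the resulting vector ends up encoding.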
48 |
49 | #### What can the decoder be trained on?
50 |
51 | We use romance novels, but that is because we have over 14 million passages to train on. Anything should work, provided you have a lot of text! If you want to train your own decoder, you can use the code available [here](https://github.com/ryankiros/skip-thoughts/tree/master/decoding). Any models trained there can be substituted here.
52 |
53 | ## Dependencies
54 |
55 | This code is written in Python. To use it you will need:
56 |
57 | * Python 2.7
58 | * A recent version of [NumPy](http://www.numpy.org/) and [SciPy](http://www.scipy.org/)
59 | * [Lasagne](https://github.com/Lasagne/Lasagne)
60 | * A version of Theano that Lasagne supports
61 |
62 | To run on CPU, you will also need to install [Caffe](http://caffe.berkeleyvision.org) and its Python interface.
63 |
64 |
65 | ## Getting started
66 |
67 | You will first need to download some pre-trained models and style vectors. Most of the materials are available in a single compressed file, which you can obtain by running
68 |
69 |     wget http://www.cs.toronto.edu/~rkiros/neural_storyteller.zip
70 |
71 | Included is a pre-trained decoder on romance novels, the decoder dictionary, caption and romance style vectors, MS COCO training captions and a pre-trained image-sentence embedding model.
72 |
73 | Next, you need to obtain the pre-trained skip-thoughts encoder. Go [here](https://github.com/ryankiros/skip-thoughts) and follow the instructions on the main page to obtain the pre-trained model.
74 |
75 | Finally, we need the VGG-19 ConvNet parameters. You can obtain them by running
76 |
77 |     wget https://s3.amazonaws.com/lasagne/recipes/pretrained/imagenet/vgg19.pkl
78 |
79 | Note that this model is for non-commercial use only. Once you have all the materials, open `config.py` and specify the locations of all of the models and style vectors that you downloaded.
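
Before loading anything, it can help to check that every configured location exists. The helper below is a hypothetical sketch, not part of the repository; it only assumes that the configuration is a dict mapping names to filesystem paths, as `paths` in `config.py` is.

```python
import os
import tempfile

def missing_paths(paths):
    """Return the keys of `paths` whose file or directory does not exist."""
    return sorted(key for key, path in paths.items() if not os.path.exists(path))

# Toy check: one real temporary file and one path that does not exist.
with tempfile.NamedTemporaryFile(suffix='.npy') as tmp:
    example = {'posbias': tmp.name, 'negbias': tmp.name + '.missing'}
    print(missing_paths(example))
```

With the real configuration this would be called as `missing_paths(config.paths)`. Note that the model loaders also open a companion options file at `'%s.pkl' % path_to_model`, so that file has to sit next to each model file as well.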
80 |
81 | For running on CPU, you will need to download the VGG-19 prototxt and model by:
82 |
83 |     wget http://www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_19_layers.caffemodel
84 |     wget https://gist.githubusercontent.com/ksimonyan/3785162f95cd2d5fee77/raw/bb2b4fe0a9bb0669211cf3d0bc949dfdda173e9e/VGG_ILSVRC_19_layers_deploy.prototxt
85 |
86 | You also need to set the pycaffe and Caffe model paths (`paths['pycaffe']`, `paths['vgg_proto_caffe']` and `paths['vgg_model_caffe']`) in `config.py`, and set the flag on line 8 to:
87 |
88 |     FLAG_CPU_MODE = True
89 |
90 | ## Generating a story
91 |
92 | The `images` directory contains some sample images that you can try the model on. To generate a story, open IPython and run the following:
93 |
94 |     import generate
95 |     z = generate.load_all()
96 |     generate.story(z, './images/ex1.jpg')
97 |
98 | If everything works, it will first print out the nearest COCO captions to the image (predicted by the visual-semantic embedding model). Then it will print out a story.
99 |
100 | #### Generation options
101 |
102 | There are 2 knobs that can be tuned for generation: the number of retrieved captions to condition on as well as the beam search width. The defaults are
103 |
104 |     generate.story(z, './images/ex1.jpg', k=100, bw=50)
105 |
106 | where k is the number of captions to condition on and bw is the beam width. These are reasonable defaults but playing around with these can give you very different outputs! The higher the beam width, the longer it takes to generate a story.
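
To see why the beam width matters, here is a generic beam search over a toy next-token model. It is a sketch only and not the `gen_sample` routine in `search.py`; `beam_search`, `toy_model` and the token ids are hypothetical.

```python
import math

def beam_search(next_log_probs, beam_width, max_len, eos=0):
    """Return (tokens, negative log-likelihood) pairs, best hypothesis first."""
    beams = [([], 0.0)]
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for tokens, nll in beams:
            for tok, log_p in next_log_probs(tokens).items():
                candidates.append((tokens + [tok], nll - log_p))
        candidates.sort(key=lambda cand: cand[1])
        # Keep the best beam_width candidates; those ending in eos are done.
        beams = []
        for tokens, nll in candidates[:beam_width]:
            if tokens[-1] == eos:
                finished.append((tokens, nll))
            else:
                beams.append((tokens, nll))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len
    return sorted(finished, key=lambda cand: cand[1])

def toy_model(prefix):
    """Hypothetical next-token log-probabilities, keyed by token id (0 = end)."""
    if not prefix:
        return {1: math.log(0.6), 2: math.log(0.4)}
    if prefix[-1] == 1:
        return {0: math.log(0.6), 1: math.log(0.4)}
    return {0: 0.0}  # after token 2 the sequence always ends

for width in (1, 2):
    tokens, nll = beam_search(toy_model, width, max_len=5)[0]
    print(width, tokens, round(math.exp(-nll), 2))
```

With a width of 1 the search commits to the locally best first token and misses the sequence with the higher overall probability, which a width of 2 finds. The cost is that every step scores `beam_width` times as many candidates, which is why wider beams take longer.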
107 |
108 | If you bias by song lyrics, you can turn on the lyric flag, which prints the output on multiple lines, split at commas. `neural_storyteller.zip` contains an additional bias vector called `swift_style.npy`, which is the mean of skip-thought vectors across Taylor Swift lyrics. If you point `paths['posbias']` to this vector in `config.py`, you can generate captions in the style of Taylor Swift lyrics. For example:
109 |
110 |     generate.story(z, './images/ex1.jpg', lyric=True)
111 |
112 | should output
113 |
114 |     You re the only person on the beach right now
115 |     you know
116 |     I do n't think I will ever fall in love with you
117 |     and when the sea breeze hits me
118 |     I thought
119 |     Hey
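
The comma-delimited layout can be reproduced with a few lines of Python. The helper name below is hypothetical and this is only one way such a flag could format its output, not necessarily how `generate.story` does it.

```python
def as_lyric_lines(text):
    """Split generated text at commas and return one stripped line per piece."""
    return [piece.strip() for piece in text.split(',') if piece.strip()]

story = "You re the only person on the beach right now , you know , I thought , Hey"
for line in as_lyric_lines(story):
    print(line)
```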
120 |
121 | ## Reference
122 |
123 | This project does not have an associated paper. If you found this code useful, please consider citing:
124 |
125 | Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, and Sanja Fidler. **"Skip-Thought Vectors."** *arXiv preprint arXiv:1506.06726 (2015).*
126 |
127 |     @article{kiros2015skip,
128 |       title={Skip-Thought Vectors},
129 |       author={Kiros, Ryan and Zhu, Yukun and Salakhutdinov, Ruslan and Zemel, Richard S and Torralba, Antonio and Urtasun, Raquel and Fidler, Sanja},
130 |       journal={arXiv preprint arXiv:1506.06726},
131 |       year={2015}
132 |     }
133 |
134 | If you also use the BookCorpus data for training new models, please also consider citing:
135 |
136 | Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, Sanja Fidler.
137 | **"Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books."** *arXiv preprint arXiv:1506.06724 (2015).*
138 |
139 |     @article{zhu2015aligning,
140 |       title={Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books},
141 |       author={Zhu, Yukun and Kiros, Ryan and Zemel, Richard and Salakhutdinov, Ruslan and Urtasun, Raquel and Torralba, Antonio and Fidler, Sanja},
142 |       journal={arXiv preprint arXiv:1506.06724},
143 |       year={2015}
144 |     }
145 |
--------------------------------------------------------------------------------
/config.py:
--------------------------------------------------------------------------------
1 | """
2 | Configuration for the generate module
3 | """
4 |
5 | #-----------------------------------------------------------------------------#
6 | # Flags for running on CPU
7 | #-----------------------------------------------------------------------------#
8 | FLAG_CPU_MODE = True
9 |
10 | #-----------------------------------------------------------------------------#
11 | # Paths to models and biases
12 | #-----------------------------------------------------------------------------#
13 | paths = dict()
14 |
15 | # Skip-thoughts
16 | paths['skmodels'] = '/u/rkiros/public_html/models/'
17 | paths['sktables'] = '/u/rkiros/public_html/models/'
18 |
19 | # Decoder
20 | paths['decmodel'] = '/ais/gobi3/u/rkiros/storyteller/romance.npz'
21 | paths['dictionary'] = '/ais/gobi3/u/rkiros/storyteller/romance_dictionary.pkl'
22 |
23 | # Image-sentence embedding
24 | paths['vsemodel'] = '/ais/gobi3/u/rkiros/storyteller/coco_embedding.npz'
25 |
26 | # VGG-19 convnet
27 | paths['vgg'] = '/ais/gobi3/u/rkiros/vgg/vgg19.pkl'
28 | paths['pycaffe'] = '/u/yukun/Projects/caffe-run/python'
29 | paths['vgg_proto_caffe'] = '/ais/guppy9/movie2text/neural-storyteller/models/VGG_ILSVRC_19_layers_deploy.prototxt'
30 | paths['vgg_model_caffe'] = '/ais/guppy9/movie2text/neural-storyteller/models/VGG_ILSVRC_19_layers.caffemodel'
31 |
32 |
33 | # COCO training captions
34 | paths['captions'] = '/ais/gobi3/u/rkiros/storyteller/coco_train_caps.txt'
35 |
36 | # Biases
37 | paths['negbias'] = '/ais/gobi3/u/rkiros/storyteller/caption_style.npy'
38 | paths['posbias'] = '/ais/gobi3/u/rkiros/storyteller/romance_style.npy'
39 |
--------------------------------------------------------------------------------
/decoder.py:
--------------------------------------------------------------------------------
1 | """
2 | Decoder
3 | """
4 | import theano
5 | import theano.tensor as tensor
6 | from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams
7 |
8 | import cPickle as pkl
9 | import numpy
10 | import warnings
11 | from search import gen_sample
12 | from collections import OrderedDict
13 |
14 |
15 | def load_model(path_to_model, path_to_dictionary):
16 |     """
17 |     Load a trained model for decoding
18 |     """
19 |     # Load the worddict
20 |     with open(path_to_dictionary, 'rb') as f:
21 |         worddict = pkl.load(f)
22 |
23 |     # Create inverted dictionary
24 |     word_idict = dict()
25 |     for kk, vv in worddict.iteritems():
26 |         word_idict[vv] = kk
27 |     word_idict[0] = '<eos>'
28 |     word_idict[1] = 'UNK'
29 |
30 |     # Load model options
31 |     with open('%s.pkl'%path_to_model, 'rb') as f:
32 |         options = pkl.load(f)
33 |     if 'doutput' not in options.keys():
34 |         options['doutput'] = True
35 |
36 |     # Load parameters
37 |     params = init_params(options)
38 |     params = load_params(path_to_model, params)
39 |     tparams = init_tparams(params)
40 |
41 |     # Sampler.
42 |     trng = RandomStreams(1234)
43 |     f_init, f_next = build_sampler(tparams, options, trng)
44 |
45 |     # Pack everything up
46 |     dec = dict()
47 |     dec['options'] = options
48 |     dec['trng'] = trng
49 |     dec['worddict'] = worddict
50 |     dec['word_idict'] = word_idict
51 |     dec['tparams'] = tparams
52 |     dec['f_init'] = f_init
53 |     dec['f_next'] = f_next
54 |     return dec
55 |
56 | def run_sampler(dec, c, beam_width=1, stochastic=False, use_unk=False):
57 |     """
58 |     Generate text conditioned on c
59 |     """
60 |     sample, score = gen_sample(dec['tparams'], dec['f_init'], dec['f_next'],
61 |                                c.reshape(1, dec['options']['dimctx']), dec['options'],
62 |                                trng=dec['trng'], k=beam_width, maxlen=1000, stochastic=stochastic,
63 |                                use_unk=use_unk)
64 |     text = []
65 |     if stochastic:
66 |         sample = [sample]
67 |     for c in sample:
68 |         text.append(' '.join([dec['word_idict'][w] for w in c[:-1]]))
69 |
70 |     # Sort beams by their NLL, return the best result
71 |     lengths = numpy.array([len(s.split()) for s in text])
72 |     if lengths[0] == 0:  # in case the model only predicts <eos>
73 |         lengths = lengths[1:]
74 |         score = score[1:]
75 |         text = text[1:]
76 |     sidx = numpy.argmin(score)
77 |     text = text[sidx]
78 |     score = score[sidx]
79 |
80 |     return text
81 |
82 | def _p(pp, name):
83 |     """
84 |     make prefix-appended name
85 |     """
86 |     return '%s_%s'%(pp, name)
87 |
88 | def init_tparams(params):
89 |     """
90 |     initialize Theano shared variables according to the initial parameters
91 |     """
92 |     tparams = OrderedDict()
93 |     for kk, pp in params.iteritems():
94 |         tparams[kk] = theano.shared(params[kk], name=kk)
95 |     return tparams
96 |
97 | def load_params(path, params):
98 |     """
99 |     load parameters
100 |     """
101 |     pp = numpy.load(path)
102 |     for kk, vv in params.iteritems():
103 |         if kk not in pp:
104 |             warnings.warn('%s is not in the archive'%kk)
105 |             continue
106 |         params[kk] = pp[kk]
107 |     return params
108 |
109 | # layers: 'name': ('parameter initializer', 'feedforward')
110 | layers = {'ff': ('param_init_fflayer', 'fflayer'),
111 |           'gru': ('param_init_gru', 'gru_layer')}
112 |
113 | def get_layer(name):
114 |     fns = layers[name]
115 |     return (eval(fns[0]), eval(fns[1]))
116 |
117 | def init_params(options):
118 |     """
119 |     Initialize all parameters
120 |     """
121 |     params = OrderedDict()
122 |
123 |     # Word embedding
124 |     params['Wemb'] = norm_weight(options['n_words'], options['dim_word'])
125 |
126 |     # init state
127 |     params = get_layer('ff')[0](options, params, prefix='ff_state', nin=options['dimctx'], nout=options['dim'])
128 |
129 |     # Decoder
130 |     params = get_layer(options['decoder'])[0](options, params, prefix='decoder',
131 |                                               nin=options['dim_word'], dim=options['dim'])
132 |
133 |     # Output layer
134 |     if options['doutput']:
135 |         params = get_layer('ff')[0](options, params, prefix='ff_hid', nin=options['dim'], nout=options['dim_word'])
136 |         params = get_layer('ff')[0](options, params, prefix='ff_logit', nin=options['dim_word'], nout=options['n_words'])
137 |     else:
138 |         params = get_layer('ff')[0](options, params, prefix='ff_logit', nin=options['dim'], nout=options['n_words'])
139 |
140 |     return params
141 |
142 | def build_sampler(tparams, options, trng):
143 |     """
144 |     Forward sampling
145 |     """
146 |     ctx = tensor.matrix('ctx', dtype='float32')
147 |     ctx0 = ctx
148 |
149 |     init_state = get_layer('ff')[1](tparams, ctx, options, prefix='ff_state', activ='tanh')
150 |     f_init = theano.function([ctx], init_state, name='f_init', profile=False)
151 |
152 |     # x: 1 x 1
153 |     y = tensor.vector('y_sampler', dtype='int64')
154 |     init_state = tensor.matrix('init_state', dtype='float32')
155 |
156 |     # if it's the first word, emb should be all zero
157 |     emb = tensor.switch(y[:,None] < 0, tensor.alloc(0., 1, tparams['Wemb'].shape[1]),
158 |                         tparams['Wemb'][y])
159 |
160 |     # decoder
161 |     proj = get_layer(options['decoder'])[1](tparams, emb, init_state, options,
162 |                                             prefix='decoder',
163 |                                             mask=None,
164 |                                             one_step=True)
165 |     next_state = proj[0]
166 |
167 |     # output
168 |     if options['doutput']:
169 |         hid = get_layer('ff')[1](tparams, next_state, options, prefix='ff_hid', activ='tanh')
170 |         logit = get_layer('ff')[1](tparams, hid, options, prefix='ff_logit', activ='linear')
171 |     else:
172 |         logit = get_layer('ff')[1](tparams, next_state, options, prefix='ff_logit', activ='linear')
173 |     next_probs = tensor.nnet.softmax(logit)
174 |     next_sample = trng.multinomial(pvals=next_probs).argmax(1)
175 |
176 |     # next word probability
177 |     inps = [y, init_state]
178 |     outs = [next_probs, next_sample, next_state]
179 |     f_next = theano.function(inps, outs, name='f_next', profile=False)
180 |
181 |     return f_init, f_next
182 |
183 | def linear(x):
184 |     """
185 |     Linear activation function
186 |     """
187 |     return x
188 |
189 | def tanh(x):
190 |     """
191 |     Tanh activation function
192 |     """
193 |     return tensor.tanh(x)
194 |
195 | def ortho_weight(ndim):
196 |     """
197 |     Orthogonal weight init, for recurrent layers
198 |     """
199 |     W = numpy.random.randn(ndim, ndim)
200 |     u, s, v = numpy.linalg.svd(W)
201 |     return u.astype('float32')
202 |
203 | def norm_weight(nin,nout=None, scale=0.1, ortho=True):
204 |     """
205 |     Uniform initialization from [-scale, scale]
206 |     If matrix is square and ortho=True, use ortho instead
207 |     """
208 |     if nout is None:
209 |         nout = nin
210 |     if nout == nin and ortho:
211 |         W = ortho_weight(nin)
212 |     else:
213 |         W = numpy.random.uniform(low=-scale, high=scale, size=(nin, nout))
214 |     return W.astype('float32')
215 |
216 | # Feedforward layer
217 | def param_init_fflayer(options, params, prefix='ff', nin=None, nout=None, ortho=True):
218 |     """
219 |     Affine transformation + point-wise nonlinearity
220 |     """
221 |     if nin is None:
222 |         nin = options['dim_proj']
223 |     if nout is None:
224 |         nout = options['dim_proj']
225 |     params[_p(prefix,'W')] = norm_weight(nin, nout)
226 |     params[_p(prefix,'b')] = numpy.zeros((nout,)).astype('float32')
227 |
228 |     return params
229 |
230 | def fflayer(tparams, state_below, options, prefix='rconv', activ='lambda x: tensor.tanh(x)', **kwargs):
231 |     """
232 |     Feedforward pass
233 |     """
234 |     return eval(activ)(tensor.dot(state_below, tparams[_p(prefix,'W')])+tparams[_p(prefix,'b')])
235 |
236 | # GRU layer
237 | def param_init_gru(options, params, prefix='gru', nin=None, dim=None):
238 |     """
239 |     Gated Recurrent Unit (GRU)
240 |     """
241 |     if nin is None:
242 |         nin = options['dim_proj']
243 |     if dim is None:
244 |         dim = options['dim_proj']
245 |     W = numpy.concatenate([norm_weight(nin,dim),
246 |                            norm_weight(nin,dim)], axis=1)
247 |     params[_p(prefix,'W')] = W
248 |     params[_p(prefix,'b')] = numpy.zeros((2 * dim,)).astype('float32')
249 |     U = numpy.concatenate([ortho_weight(dim),
250 |                            ortho_weight(dim)], axis=1)
251 |     params[_p(prefix,'U')] = U
252 |
253 |     Wx = norm_weight(nin, dim)
254 |     params[_p(prefix,'Wx')] = Wx
255 |     Ux = ortho_weight(dim)
256 |     params[_p(prefix,'Ux')] = Ux
257 |     params[_p(prefix,'bx')] = numpy.zeros((dim,)).astype('float32')
258 |
259 |     return params
260 |
261 | def gru_layer(tparams, state_below, init_state, options, prefix='gru', mask=None, one_step=False, **kwargs):
262 |     """
263 |     Feedforward pass through GRU
264 |     """
265 |     nsteps = state_below.shape[0]
266 |     if state_below.ndim == 3:
267 |         n_samples = state_below.shape[1]
268 |     else:
269 |         n_samples = 1
270 |
271 |     dim = tparams[_p(prefix,'Ux')].shape[1]
272 |
273 |     if init_state is None:
274 |         init_state = tensor.alloc(0., n_samples, dim)
275 |
276 |     if mask is None:
277 |         mask = tensor.alloc(1., state_below.shape[0], 1)
278 |
279 |     def _slice(_x, n, dim):
280 |         if _x.ndim == 3:
281 |             return _x[:, :, n*dim:(n+1)*dim]
282 |         return _x[:, n*dim:(n+1)*dim]
283 |
284 |     state_below_ = tensor.dot(state_below, tparams[_p(prefix, 'W')]) + tparams[_p(prefix, 'b')]
285 |     state_belowx = tensor.dot(state_below, tparams[_p(prefix, 'Wx')]) + tparams[_p(prefix, 'bx')]
286 |     U = tparams[_p(prefix, 'U')]
287 |     Ux = tparams[_p(prefix, 'Ux')]
288 |
289 |     def _step_slice(m_, x_, xx_, h_, U, Ux):
290 |         preact = tensor.dot(h_, U)
291 |         preact += x_
292 |
293 |         r = tensor.nnet.sigmoid(_slice(preact, 0, dim))
294 |         u = tensor.nnet.sigmoid(_slice(preact, 1, dim))
295 |
296 |         preactx = tensor.dot(h_, Ux)
297 |         preactx = preactx * r
298 |         preactx = preactx + xx_
299 |
300 |         h = tensor.tanh(preactx)
301 |
302 |         h = u * h_ + (1. - u) * h
303 |         h = m_[:,None] * h + (1. - m_)[:,None] * h_
304 |
305 |         return h
306 |
307 |     seqs = [mask, state_below_, state_belowx]
308 |     _step = _step_slice
309 |
310 |     if one_step:
311 |         rval = _step(*(seqs+[init_state, tparams[_p(prefix, 'U')], tparams[_p(prefix, 'Ux')]]))
312 |     else:
313 |         rval, updates = theano.scan(_step,
314 |                                     sequences=seqs,
315 |                                     outputs_info = [init_state],
316 |                                     non_sequences = [tparams[_p(prefix, 'U')],
317 |                                                      tparams[_p(prefix, 'Ux')]],
318 |                                     name=_p(prefix, '_layers'),
319 |                                     n_steps=nsteps,
320 |                                     profile=False,
321 |                                     strict=True)
322 |     rval = [rval]
323 |     return rval
324 |
325 |
--------------------------------------------------------------------------------
/embedding.py:
--------------------------------------------------------------------------------
1 | """
2 | Joint image-sentence embedding space
3 | """
4 | import theano
5 | import theano.tensor as tensor
6 | from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams
7 |
8 | import cPickle as pkl
9 | import numpy
10 | import nltk
11 | import warnings
12 | from collections import OrderedDict, defaultdict
13 | from scipy.linalg import norm
14 |
15 |
16 | def load_model(path_to_model):
17 |     """
18 |     Load all model components
19 |     """
20 |     # Load the worddict
21 |     with open('%s.dictionary.pkl'%path_to_model, 'rb') as f:
22 |         worddict = pkl.load(f)
23 |
24 |     # Create inverted dictionary
25 |     word_idict = dict()
26 |     for kk, vv in worddict.iteritems():
27 |         word_idict[vv] = kk
28 |     word_idict[0] = '<eos>'
29 |     word_idict[1] = 'UNK'
30 |
31 |     # Load model options
32 |     with open('%s.pkl'%path_to_model, 'rb') as f:
33 |         options = pkl.load(f)
34 |
35 |     # Load parameters
36 |     params = init_params(options)
37 |     params = load_params(path_to_model, params)
38 |     tparams = init_tparams(params)
39 |
40 |     # Extractor functions
41 |     trng = RandomStreams(1234)
42 |     trng, [x, x_mask], sentences = build_sentence_encoder(tparams, options)
43 |     f_senc = theano.function([x, x_mask], sentences, name='f_senc')
44 |
45 |     trng, [im], images = build_image_encoder(tparams, options)
46 |     f_ienc = theano.function([im], images, name='f_ienc')
47 |
48 |     # Store everything we need in a dictionary
49 |     model = {}
50 |     model['options'] = options
51 |     model['worddict'] = worddict
52 |     model['word_idict'] = word_idict
53 |     model['f_senc'] = f_senc
54 |     model['f_ienc'] = f_ienc
55 |     return model
56 |
57 | def encode_sentences(model, X, verbose=False, batch_size=128):
58 |     """
59 |     Encode sentences into the joint embedding space
60 |     """
61 |     features = numpy.zeros((len(X), model['options']['dim']), dtype='float32')
62 |
63 |     # length dictionary
64 |     ds = defaultdict(list)
65 |     captions = [s.split() for s in X]
66 |     for i,s in enumerate(captions):
67 |         ds[len(s)].append(i)
68 |
69 |     # quick check if a word is in the dictionary
70 |     d = defaultdict(lambda : 0)
71 |     for w in model['worddict'].keys():
72 |         d[w] = 1
73 |
74 |     # Get features. This encodes by length, in order to avoid wasting computation
75 |     for k in ds.keys():
76 |         if verbose:
77 |             print k
78 |         numbatches = len(ds[k]) / batch_size + 1
79 |         for minibatch in range(numbatches):
80 |             caps = ds[k][minibatch::numbatches]
81 |             caption = [captions[c] for c in caps]
82 |
83 |             seqs = []
84 |             for i, cc in enumerate(caption):
85 |                 seqs.append([model['worddict'][w] if d[w] > 0 and model['worddict'][w] < model['options']['n_words'] else 1 for w in cc])
86 |             x = numpy.zeros((k+1, len(caption))).astype('int64')
87 |             x_mask = numpy.zeros((k+1, len(caption))).astype('float32')
88 |             for idx, s in enumerate(seqs):
89 |                 x[:k,idx] = s
90 |                 x_mask[:k+1,idx] = 1.
91 |             ff = model['f_senc'](x, x_mask)
92 |             for ind, c in enumerate(caps):
93 |                 features[c] = ff[ind]
94 |
95 |     return features
96 |
97 | def encode_images(model, IM):
98 |     """
99 |     Encode images into the joint embedding space
100 |     """
101 |     images = model['f_ienc'](IM)
102 |     return images
103 |
104 | def _p(pp, name):
105 |     """
106 |     make prefix-appended name
107 |     """
108 |     return '%s_%s'%(pp, name)
109 |
110 | def init_tparams(params):
111 |     """
112 |     initialize Theano shared variables according to the initial parameters
113 |     """
114 |     tparams = OrderedDict()
115 |     for kk, pp in params.iteritems():
116 |         tparams[kk] = theano.shared(params[kk], name=kk)
117 |     return tparams
118 |
119 | def load_params(path, params):
120 |     """
121 |     load parameters
122 |     """
123 |     pp = numpy.load(path)
124 |     for kk, vv in params.iteritems():
125 |         if kk not in pp:
126 |             warnings.warn('%s is not in the archive'%kk)
127 |             continue
128 |         params[kk] = pp[kk]
129 |     return params
130 |
131 | # layers: 'name': ('parameter initializer', 'feedforward')
132 | layers = {'ff': ('param_init_fflayer', 'fflayer'),
133 |           'gru': ('param_init_gru', 'gru_layer')}
134 |
135 | def get_layer(name):
136 |     fns = layers[name]
137 |     return (eval(fns[0]), eval(fns[1]))
138 |
139 | def init_params(options):
140 |     """
141 |     Initialize all parameters
142 |     """
143 |     params = OrderedDict()
144 |
145 |     # Word embedding
146 |     params['Wemb'] = norm_weight(options['n_words'], options['dim_word'])
147 |
148 |     # Sentence encoder
149 |     params = get_layer(options['encoder'])[0](options, params, prefix='encoder',
150 |                                               nin=options['dim_word'], dim=options['dim'])
151 |
152 |     # Image encoder
153 |     params = get_layer('ff')[0](options, params, prefix='ff_image', nin=options['dim_image'], nout=options['dim'])
154 |
155 |     return params
156 |
157 | def build_sentence_encoder(tparams, options):
158 |     """
159 |     Encoder only, for sentences
160 |     """
161 |     opt_ret = dict()
162 |
163 |     trng = RandomStreams(1234)
164 |
165 |     # description string: #words x #samples
166 |     x = tensor.matrix('x', dtype='int64')
167 |     mask = tensor.matrix('x_mask', dtype='float32')
168 |
169 |     n_timesteps = x.shape[0]
170 |     n_samples = x.shape[1]
171 |
172 |     # Word embedding
173 |     emb = tparams['Wemb'][x.flatten()].reshape([n_timesteps, n_samples, options['dim_word']])
174 |
175 |     # Encode sentences
176 |     proj = get_layer(options['encoder'])[1](tparams, emb, None, options,
177 |                                             prefix='encoder',
178 |                                             mask=mask)
179 |     sents = proj[0][-1]
180 |     sents = l2norm(sents)
181 |
182 |     return trng, [x, mask], sents
183 |
184 | def build_image_encoder(tparams, options):
185 |     """
186 |     Encoder only, for images
187 |     """
188 |     opt_ret = dict()
189 |
190 |     trng = RandomStreams(1234)
191 |
192 |     # image features
193 |     im = tensor.matrix('im', dtype='float32')
194 |
195 |     # Encode images
196 |     images = get_layer('ff')[1](tparams, im, options, prefix='ff_image', activ='linear')
197 |     images = l2norm(images)
198 |
199 |     return trng, [im], images
200 |
201 | def linear(x):
202 |     """
203 |     Linear activation function
204 |     """
205 |     return x
206 |
207 | def tanh(x):
208 |     """
209 |     Tanh activation function
210 |     """
211 |     return tensor.tanh(x)
212 |
213 | def l2norm(X):
214 |     """
215 |     Normalize each row of X to unit L2 norm
216 |     """
217 |     norm = tensor.sqrt(tensor.pow(X, 2).sum(1))
218 |     X /= norm[:, None]
219 |     return X
220 |
221 | def ortho_weight(ndim):
222 |     """
223 |     Orthogonal weight init, for recurrent layers
224 |     """
225 |     W = numpy.random.randn(ndim, ndim)
226 |     u, s, v = numpy.linalg.svd(W)
227 |     return u.astype('float32')
228 |
229 | def norm_weight(nin,nout=None, scale=0.1, ortho=True):
230 |     """
231 |     Uniform initialization from [-scale, scale]
232 |     If matrix is square and ortho=True, use ortho instead
233 |     """
234 |     if nout is None:
235 |         nout = nin
236 |     if nout == nin and ortho:
237 |         W = ortho_weight(nin)
238 |     else:
239 |         W = numpy.random.uniform(low=-scale, high=scale, size=(nin, nout))
240 |     return W.astype('float32')
241 |
242 | def xavier_weight(nin,nout=None):
243 |     """
244 |     Xavier init
245 |     """
246 |     if nout is None:
247 |         nout = nin
248 |     r = numpy.sqrt(6.) / numpy.sqrt(nin + nout)
249 |     W = numpy.random.rand(nin, nout) * 2 * r - r
250 |     return W.astype('float32')
251 |
252 | # Feedforward layer
253 | def param_init_fflayer(options, params, prefix='ff', nin=None, nout=None, ortho=True):
254 |     """
255 |     Affine transformation + point-wise nonlinearity
256 |     """
257 |     if nin is None:
258 |         nin = options['dim_proj']
259 |     if nout is None:
260 |         nout = options['dim_proj']
261 |     params[_p(prefix,'W')] = xavier_weight(nin, nout)
262 |     params[_p(prefix,'b')] = numpy.zeros((nout,)).astype('float32')
263 |
264 |     return params
265 |
266 | def fflayer(tparams, state_below, options, prefix='rconv', activ='lambda x: tensor.tanh(x)', **kwargs):
267 |     """
268 |     Feedforward pass
269 |     """
270 |     return eval(activ)(tensor.dot(state_below, tparams[_p(prefix,'W')])+tparams[_p(prefix,'b')])
271 |
272 | # GRU layer
273 | def param_init_gru(options, params, prefix='gru', nin=None, dim=None):
274 |     """
275 |     Gated Recurrent Unit (GRU)
276 |     """
277 |     if nin is None:
278 |         nin = options['dim_proj']
279 |     if dim is None:
280 |         dim = options['dim_proj']
281 |     W = numpy.concatenate([norm_weight(nin,dim),
282 |                            norm_weight(nin,dim)], axis=1)
283 |     params[_p(prefix,'W')] = W
284 |     params[_p(prefix,'b')] = numpy.zeros((2 * dim,)).astype('float32')
285 |     U = numpy.concatenate([ortho_weight(dim),
286 |                            ortho_weight(dim)], axis=1)
287 |     params[_p(prefix,'U')] = U
288 |
289 |     Wx = norm_weight(nin, dim)
290 |     params[_p(prefix,'Wx')] = Wx
291 |     Ux = ortho_weight(dim)
292 |     params[_p(prefix,'Ux')] = Ux
293 |     params[_p(prefix,'bx')] = numpy.zeros((dim,)).astype('float32')
294 |
295 |     return params
296 |
297 | def gru_layer(tparams, state_below, init_state, options, prefix='gru', mask=None, one_step=False, **kwargs):
298 |     """
299 |     Feedforward pass through GRU
300 |     """
301 |     nsteps = state_below.shape[0]
302 |     if state_below.ndim == 3:
303 |         n_samples = state_below.shape[1]
304 |     else:
305 |         n_samples = 1
306 |
307 |     dim = tparams[_p(prefix,'Ux')].shape[1]
308 |
309 |     if init_state is None:
310 |         init_state = tensor.alloc(0., n_samples, dim)
311 |
312 |     if mask is None:
313 |         mask = tensor.alloc(1., state_below.shape[0], 1)
314 |
315 |     def _slice(_x, n, dim):
316 |         if _x.ndim == 3:
317 |             return _x[:, :, n*dim:(n+1)*dim]
318 |         return _x[:, n*dim:(n+1)*dim]
319 |
320 |     state_below_ = tensor.dot(state_below, tparams[_p(prefix, 'W')]) + tparams[_p(prefix, 'b')]
321 |     state_belowx = tensor.dot(state_below, tparams[_p(prefix, 'Wx')]) + tparams[_p(prefix, 'bx')]
322 |     U = tparams[_p(prefix, 'U')]
323 |     Ux = tparams[_p(prefix, 'Ux')]
324 |
325 |     def _step_slice(m_, x_, xx_, h_, U, Ux):
326 |         preact = tensor.dot(h_, U)
327 |         preact += x_
328 |
329 |         r = tensor.nnet.sigmoid(_slice(preact, 0, dim))
330 |         u = tensor.nnet.sigmoid(_slice(preact, 1, dim))
331 |
332 |         preactx = tensor.dot(h_, Ux)
333 |         preactx = preactx * r
334 |         preactx = preactx + xx_
335 |
336 |         h = tensor.tanh(preactx)
337 |
338 |         h = u * h_ + (1. - u) * h
339 |         h = m_[:,None] * h + (1. - m_)[:,None] * h_
340 |
341 |         return h
342 |
343 |     seqs = [mask, state_below_, state_belowx]
344 |     _step = _step_slice
345 |
346 |     if one_step:
347 |         rval = _step(*(seqs+[init_state, tparams[_p(prefix, 'U')], tparams[_p(prefix, 'Ux')]]))
348 |     else:
349 |         rval, updates = theano.scan(_step,
350 |                                     sequences=seqs,
351 |                                     outputs_info = [init_state],
352 |                                     non_sequences = [tparams[_p(prefix, 'U')],
353 |                                                      tparams[_p(prefix, 'Ux')]],
354 |                                     name=_p(prefix, '_layers'),
355 |                                     n_steps=nsteps,
356 |                                     profile=False,
357 |                                     strict=True)
358 |     rval = [rval]
359 |     return rval
360 |
361 |
362 |
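The recurrence inside `_step_slice` can be checked outside Theano. The block below is a minimal NumPy sketch of one GRU step under the same conventions (reset gate `r` and update gate `u` packed side by side in the pre-activation, and a mask that carries the previous state through padded positions). The names `sigmoid` and `gru_step` and the random toy shapes are assumptions made for illustration; they are not part of the repository.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(m, x_, xx_, h_prev, U, Ux):
    """One GRU step following the structure of _step_slice.

    m      : (n_samples,) mask, 1 for real tokens and 0 for padding
    x_     : (n_samples, 2*dim) input projection for the two gates
    xx_    : (n_samples, dim) input projection for the candidate state
    h_prev : (n_samples, dim) previous hidden state
    """
    dim = Ux.shape[1]
    preact = h_prev.dot(U) + x_
    r = sigmoid(preact[:, :dim])           # reset gate
    u = sigmoid(preact[:, dim:2 * dim])    # update gate
    h_tilde = np.tanh(h_prev.dot(Ux) * r + xx_)
    h = u * h_prev + (1.0 - u) * h_tilde
    # Masked positions keep the previous state unchanged.
    return m[:, None] * h + (1.0 - m)[:, None] * h_prev

# Toy inputs (hypothetical sizes, fixed seed for repeatability).
rng = np.random.RandomState(0)
n_samples, dim = 2, 3
h_prev = rng.uniform(-1, 1, size=(n_samples, dim))
x_ = rng.randn(n_samples, 2 * dim)
xx_ = rng.randn(n_samples, dim)
U = rng.randn(dim, 2 * dim)
Ux = rng.randn(dim, dim)

h_new = gru_step(np.ones(n_samples), x_, xx_, h_prev, U, Ux)
h_masked = gru_step(np.zeros(n_samples), x_, xx_, h_prev, U, Ux)
```

With a mask of all zeros the step returns `h_prev` unchanged, which is how padded time steps are skipped when sequences of different lengths share a batch.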
--------------------------------------------------------------------------------
/generate.py:
--------------------------------------------------------------------------------
1 | """
2 | Story generation
3 | """
4 | import cPickle as pkl
5 | import numpy
6 | import copy
7 | import sys
8 | import skimage.transform
9 |
10 | import skipthoughts
11 | import decoder
12 | import embedding
13 |
14 | import config
15 |
16 | import lasagne
17 | from lasagne.layers import InputLayer, DenseLayer, NonlinearityLayer, DropoutLayer
18 | from lasagne.layers import MaxPool2DLayer as PoolLayer
19 | from lasagne.nonlinearities import softmax
20 | from lasagne.utils import floatX
21 | if not config.FLAG_CPU_MODE:
22 |     from lasagne.layers.corrmm import Conv2DMMLayer as ConvLayer
23 |
24 | from scipy import optimize, stats
25 | from collections import OrderedDict, defaultdict, Counter
26 | from numpy.random import RandomState
27 | from scipy.linalg import norm
28 |
29 | from PIL import Image
30 | from PIL import ImageFile
31 | ImageFile.LOAD_TRUNCATED_IMAGES = True
32 |
33 |
34 | def story(z, image_loc, k=100, bw=50, lyric=False):
35 |     """
36 |     Generate a story for an image at location image_loc
37 |     """
38 |     # Load the image
39 |     rawim, im = load_image(image_loc)
40 |
41 |     # Run image through convnet
42 |     feats = compute_features(z['net'], im).flatten()
43 |     feats /= norm(feats)
44 |
45 |     # Embed image into joint space
46 |     feats = embedding.encode_images(z['vse'], feats[None,:])
47 |
48 |     # Compute the nearest neighbours
49 |     scores = numpy.dot(feats, z['cvec'].T).flatten()
50 |     sorted_args = numpy.argsort(scores)[::-1]
51 |     sentences = [z['cap'][a] for a in sorted_args[:k]]
52 |
53 |     print 'NEAREST-CAPTIONS: '
54 |     for s in sentences[:5]:
55 |         print s
56 |     print ''
57 |
58 |     # Compute skip-thought vectors for sentences
59 |     svecs = skipthoughts.encode(z['stv'], sentences, verbose=False)
60 |
61 |     # Style shifting
62 |     shift = svecs.mean(0) - z['bneg'] + z['bpos']
63 |
64 |     # Generate story conditioned on shift
65 |     passage = decoder.run_sampler(z['dec'], shift, beam_width=bw)
66 |     print 'OUTPUT: '
67 |     if lyric:
68 |         for line in passage.split(','):
69 |             if not line.startswith(' '):
70 |                 print line
71 |             else:
72 |                 print line[1:]
73 |     else:
74 |         print passage
75 |
76 |
77 | def load_all():
78 |     """
79 |     Load everything we need for generating
80 |     """
81 |     print config.paths['decmodel']
82 |
83 |     # Skip-thoughts
84 |     print 'Loading skip-thoughts...'
85 |     stv = skipthoughts.load_model(config.paths['skmodels'],
86 |                                   config.paths['sktables'])
87 |
88 |     # Decoder
89 |     print 'Loading decoder...'
90 |     dec = decoder.load_model(config.paths['decmodel'],
91 |                              config.paths['dictionary'])
92 |
93 |     # Image-sentence embedding
94 |     print 'Loading image-sentence embedding...'
95 |     vse = embedding.load_model(config.paths['vsemodel'])
96 |
97 |     # VGG-19
98 |     print 'Loading and initializing ConvNet...'
99 |
100 |     if config.FLAG_CPU_MODE:
101 |         sys.path.insert(0, config.paths['pycaffe'])
102 |         import caffe
103 |         caffe.set_mode_cpu()
104 |         net = caffe.Net(config.paths['vgg_proto_caffe'],
105 |                         config.paths['vgg_model_caffe'],
106 |                         caffe.TEST)
107 |     else:
108 |         net = build_convnet(config.paths['vgg'])
109 |
110 |     # Captions
111 |     print 'Loading captions...'
112 |     cap = []
113 |     with open(config.paths['captions'], 'rb') as f:
114 |         for line in f:
115 |             cap.append(line.strip())
116 |
117 |     # Caption embeddings
118 |     print 'Embedding captions...'
119 |     cvec = embedding.encode_sentences(vse, cap, verbose=False)
120 |
121 |     # Biases
122 |     print 'Loading biases...'
123 |     bneg = numpy.load(config.paths['negbias'])
124 |     bpos = numpy.load(config.paths['posbias'])
125 |
126 |     # Pack up
127 |     z = {}
128 |     z['stv'] = stv
129 |     z['dec'] = dec
130 |     z['vse'] = vse
131 |     z['net'] = net
132 |     z['cap'] = cap
133 |     z['cvec'] = cvec
134 |     z['bneg'] = bneg
135 |     z['bpos'] = bpos
136 |
137 |     return z
138 |
139 | def load_image(file_name):
140 |     """
141 |     Load and preprocess an image
142 |     """
143 |     MEAN_VALUE = numpy.array([103.939, 116.779, 123.68]).reshape((3,1,1))
144 |     image = Image.open(file_name)
145 |     im = numpy.array(image)
146 |
147 |     # Expand grayscale to 3 channels, then resize so smallest dim = 256, preserving aspect ratio
148 |     if len(im.shape) == 2:
149 |         im = im[:, :, numpy.newaxis]
150 |         im = numpy.repeat(im, 3, axis=2)
151 |     h, w, _ = im.shape
152 |     if h < w:
153 |         im = skimage.transform.resize(im, (256, w*256//h), preserve_range=True)
154 |     else:
155 |         im = skimage.transform.resize(im, (h*256//w, 256), preserve_range=True)
156 |
157 |     # Central crop to 224x224
158 |     h, w, _ = im.shape
159 |     im = im[h//2-112:h//2+112, w//2-112:w//2+112]
160 |
161 |     rawim = numpy.copy(im).astype('uint8')
162 |
163 |     # Shuffle axes to c01
164 |     im = numpy.swapaxes(numpy.swapaxes(im, 1, 2), 0, 1)
165 |
166 |     # Convert to BGR
167 |     im = im[::-1, :, :]
168 |
169 |     im = im - MEAN_VALUE
170 |     return rawim, floatX(im[numpy.newaxis])
171 |
172 | def compute_features(net, im):
173 |     """
174 |     Compute fc7 features for im
175 |     """
176 |     if config.FLAG_CPU_MODE:
177 |         net.blobs['data'].reshape(* im.shape)
178 |         net.blobs['data'].data[...] = im
179 |         net.forward()
180 |         fc7 = net.blobs['fc7'].data
181 |     else:
182 |         fc7 = numpy.array(lasagne.layers.get_output(net['fc7'], im,
183 |                                                     deterministic=True).eval())
184 |     return fc7
185 |
186 | def build_convnet(path_to_vgg):
187 |     """
188 |     Construct VGG-19 convnet
189 |     """
190 |     net = {}
191 |     net['input'] = InputLayer((None, 3, 224, 224))
192 |     net['conv1_1'] = ConvLayer(net['input'], 64, 3, pad=1)
193 |     net['conv1_2'] = ConvLayer(net['conv1_1'], 64, 3, pad=1)
194 |     net['pool1'] = PoolLayer(net['conv1_2'], 2)
195 |     net['conv2_1'] = ConvLayer(net['pool1'], 128, 3, pad=1)
196 |     net['conv2_2'] = ConvLayer(net['conv2_1'], 128, 3, pad=1)
197 |     net['pool2'] = PoolLayer(net['conv2_2'], 2)
198 |     net['conv3_1'] = ConvLayer(net['pool2'], 256, 3, pad=1)
199 |     net['conv3_2'] = ConvLayer(net['conv3_1'], 256, 3, pad=1)
200 |     net['conv3_3'] = ConvLayer(net['conv3_2'], 256, 3, pad=1)
201 |     net['conv3_4'] = ConvLayer(net['conv3_3'], 256, 3, pad=1)
202 |     net['pool3'] = PoolLayer(net['conv3_4'], 2)
203 |     net['conv4_1'] = ConvLayer(net['pool3'], 512, 3, pad=1)
204 |     net['conv4_2'] = ConvLayer(net['conv4_1'], 512, 3, pad=1)
205 |     net['conv4_3'] = ConvLayer(net['conv4_2'], 512, 3, pad=1)
206 |     net['conv4_4'] = ConvLayer(net['conv4_3'], 512, 3, pad=1)
207 |     net['pool4'] = PoolLayer(net['conv4_4'], 2)
208 |     net['conv5_1'] = ConvLayer(net['pool4'], 512, 3, pad=1)
209 |     net['conv5_2'] = ConvLayer(net['conv5_1'], 512, 3, pad=1)
210 |     net['conv5_3'] = ConvLayer(net['conv5_2'], 512, 3, pad=1)
211 |     net['conv5_4'] = ConvLayer(net['conv5_3'], 512, 3, pad=1)
212 |     net['pool5'] = PoolLayer(net['conv5_4'], 2)
213 |     net['fc6'] = DenseLayer(net['pool5'], num_units=4096)
214 |     net['fc7'] = DenseLayer(net['fc6'], num_units=4096)
215 |     net['fc8'] = DenseLayer(net['fc7'], num_units=1000, nonlinearity=None)
216 |     net['prob'] = NonlinearityLayer(net['fc8'], softmax)
217 |
218 |     print 'Loading parameters...'
219 |     output_layer = net['prob']
220 |     model = pkl.load(open(path_to_vgg, 'rb'))
221 |     lasagne.layers.set_all_param_values(output_layer, model['param values'])
222 |
223 |     return net
224 |
225 |
226 |
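The two vector operations at the heart of `story` are caption retrieval (dot-product scores, sorted in descending order) and the style shift (mean sentence vector minus one bias plus another). The block below is a small NumPy sketch of both on toy data. The helper names `nearest_captions` and `style_shift` and the two-dimensional vectors are assumptions for illustration only; the real vectors come from the embedding and skip-thought models.

```python
import numpy as np

def nearest_captions(image_vec, caption_vecs, k):
    """Indices of the k captions with the highest dot-product score."""
    scores = caption_vecs.dot(image_vec)
    return np.argsort(scores)[::-1][:k]

def style_shift(sentence_vecs, source_bias, target_bias):
    """Mean sentence vector, moved from the source style to the target style."""
    return sentence_vecs.mean(axis=0) - source_bias + target_bias

# Toy 2-d "embeddings" (hypothetical values).
caption_vecs = np.array([[1.0, 0.0],
                         [0.0, 1.0],
                         [0.6, 0.8]])
image_vec = np.array([1.0, 0.0])
top = nearest_captions(image_vec, caption_vecs, k=2)

sentence_vecs = np.array([[1.0, 2.0],
                          [3.0, 4.0]])
shift = style_shift(sentence_vecs,
                    source_bias=np.array([1.0, 1.0]),
                    target_bias=np.array([0.0, 5.0]))
```

Here `top` selects captions 0 and 2 (scores 1.0 and 0.6), and `shift` is the mean `[2, 3]` moved to `[1, 7]`. In `story`, the source and target biases correspond to `z['bneg']` and `z['bpos']`.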
--------------------------------------------------------------------------------
/images/ex1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ryankiros/neural-storyteller/61e12a7a0453bdc62013c7c07b7f7c331059d360/images/ex1.jpg
--------------------------------------------------------------------------------
/images/ex2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ryankiros/neural-storyteller/61e12a7a0453bdc62013c7c07b7f7c331059d360/images/ex2.jpg
--------------------------------------------------------------------------------
/images/ex3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ryankiros/neural-storyteller/61e12a7a0453bdc62013c7c07b7f7c331059d360/images/ex3.jpg
--------------------------------------------------------------------------------
/images/ex4.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ryankiros/neural-storyteller/61e12a7a0453bdc62013c7c07b7f7c331059d360/images/ex4.jpg
--------------------------------------------------------------------------------
/search.py:
--------------------------------------------------------------------------------
1 | """
2 | Code for sequence generation
3 | """
4 | import numpy
5 | import copy
6 |
7 | def gen_sample(tparams, f_init, f_next, ctx, options, trng=None, k=1, maxlen=30,
8 |                stochastic=True, argmax=False, use_unk=False):
9 |     """
10 |     Generate a sample, using either beam search or stochastic sampling
11 |     """
12 |     if k > 1:
13 |         assert not stochastic, 'Beam search does not support stochastic sampling'
14 |
15 |     sample = []
16 |     sample_score = []
17 |     if stochastic:
18 |         sample_score = 0
19 |
20 |     live_k = 1
21 |     dead_k = 0
22 |
23 |     hyp_samples = [[]] * live_k
24 |     hyp_scores = numpy.zeros(live_k).astype('float32')
25 |     hyp_states = []
26 |
27 |     next_state = f_init(ctx)
28 |     next_w = -1 * numpy.ones((1,)).astype('int64')
29 |
30 |     for ii in xrange(maxlen):
31 |         inps = [next_w, next_state]
32 |         ret = f_next(*inps)
33 |         next_p, next_w, next_state = ret[0], ret[1], ret[2]
34 |
35 |         if stochastic:
36 |             if argmax:
37 |                 nw = next_p[0].argmax()
38 |             else:
39 |                 nw = next_w[0]
40 |             sample.append(nw)
41 |             sample_score -= numpy.log(next_p[0,nw])  # cost = -log p, as in the beam-search branch
42 |             if nw == 0:
43 |                 break
44 |         else:
45 |             cand_scores = hyp_scores[:,None] - numpy.log(next_p)
46 |             cand_flat = cand_scores.flatten()
47 |
48 |             if not use_unk:
49 |                 voc_size = next_p.shape[1]
50 |                 for xx in range(len(cand_flat) // voc_size):
51 |                     cand_flat[voc_size * xx + 1] = 1e20
52 |
53 |             ranks_flat = cand_flat.argsort()[:(k-dead_k)]
54 |
55 |             voc_size = next_p.shape[1]
56 |             trans_indices = ranks_flat // voc_size
57 |             word_indices = ranks_flat % voc_size
58 |             costs = cand_flat[ranks_flat]
59 |
60 |             new_hyp_samples = []
61 |             new_hyp_scores = numpy.zeros(k-dead_k).astype('float32')
62 |             new_hyp_states = []
63 |
64 |             for idx, [ti, wi] in enumerate(zip(trans_indices, word_indices)):
65 |                 new_hyp_samples.append(hyp_samples[ti]+[wi])
66 |                 new_hyp_scores[idx] = copy.copy(costs[idx])
67 |                 new_hyp_states.append(copy.copy(next_state[ti]))
68 |
69 |             # check the finished samples
70 |             new_live_k = 0
71 |             hyp_samples = []
72 |             hyp_scores = []
73 |             hyp_states = []
74 |
75 |             for idx in xrange(len(new_hyp_samples)):
76 |                 if new_hyp_samples[idx][-1] == 0:
77 |                     sample.append(new_hyp_samples[idx])
78 |                     sample_score.append(new_hyp_scores[idx])
79 |                     dead_k += 1
80 |                 else:
81 |                     new_live_k += 1
82 |                     hyp_samples.append(new_hyp_samples[idx])
83 |                     hyp_scores.append(new_hyp_scores[idx])
84 |                     hyp_states.append(new_hyp_states[idx])
85 |             hyp_scores = numpy.array(hyp_scores)
86 |             live_k = new_live_k
87 |
88 |             if new_live_k < 1:
89 |                 break
90 |             if dead_k >= k:
91 |                 break
92 |
93 |             next_w = numpy.array([w[-1] for w in hyp_samples])
94 |             next_state = numpy.array(hyp_states)
95 |
96 |     if not stochastic:
97 |         # dump every remaining one
98 |         if live_k > 0:
99 |             for idx in xrange(live_k):
100 |                 sample.append(hyp_samples[idx])
101 |                 sample_score.append(hyp_scores[idx])
102 |
103 |     return sample, sample_score
104 |
105 |
106 |
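The beam-search branch of `gen_sample` scores every (hypothesis, next word) pair by cumulative negative log-probability, flattens the score matrix, keeps the cheapest entries, and recovers the parent hypothesis and word from each flat index with integer division and modulo. The block below is a simplified sketch of that bookkeeping. The function name `beam_search` and the fixed transition table `trans` are assumptions standing in for the real `f_next` model, and hidden-state handling is omitted.

```python
import numpy as np

def beam_search(trans, first_word, k=2, maxlen=5):
    """Toy beam search; trans[w] is the next-word distribution after word w.

    Word 0 ends a sequence. Returns (words, cost) pairs sorted by cost,
    where cost is the cumulative negative log-probability.
    """
    finished = []              # hypotheses that have emitted word 0
    hyps = [[first_word]]      # live hypotheses
    costs = np.zeros(1)        # cost of each live hypothesis
    for _ in range(maxlen):
        last = [h[-1] for h in hyps]
        next_p = trans[last]                              # (live, vocab)
        cand = (costs[:, None] - np.log(next_p)).flatten()
        voc_size = next_p.shape[1]
        best = cand.argsort()[:k - len(finished)]
        new_hyps, new_costs = [], []
        for flat in best:
            # Flat index -> (parent hypothesis, word), as in gen_sample.
            parent, word = flat // voc_size, flat % voc_size
            seq = hyps[parent] + [int(word)]
            if word == 0:
                finished.append((seq, float(cand[flat])))
            else:
                new_hyps.append(seq)
                new_costs.append(cand[flat])
        hyps, costs = new_hyps, np.array(new_costs)
        if not hyps or len(finished) >= k:
            break
    finished.extend(zip(hyps, costs.tolist()))  # dump unfinished hypotheses
    return sorted(finished, key=lambda pair: pair[1])

# Hypothetical 3-word vocabulary: 0 = end of sentence, 1 and 2 = ordinary words.
trans = np.array([[0.8, 0.1, 0.1],
                  [0.1, 0.2, 0.7],
                  [0.6, 0.3, 0.1]])
results = beam_search(trans, first_word=1, k=2)
```

With this table the cheapest hypothesis is `[1, 2, 0]`, whose cost is `-log(0.7 * 0.6)`. As in `gen_sample`, the beam narrows as hypotheses finish, because only `k` minus the number of finished hypotheses are expanded at each step.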
--------------------------------------------------------------------------------
/skipthoughts.py:
--------------------------------------------------------------------------------
1 | '''
2 | Skip-thought vectors
3 | '''
4 | import os
5 | import warnings
6 | import theano
7 | import theano.tensor as tensor
8 |
9 | import cPickle as pkl
10 | import numpy
11 | import copy
12 | import nltk
13 |
14 | from collections import OrderedDict, defaultdict
15 | from scipy.linalg import norm
16 | from nltk.tokenize import word_tokenize
17 |
18 | profile = False
19 |
20 |
21 | def load_model(path_to_models, path_to_tables):
22 |     """
23 |     Load the model with saved tables
24 |     """
25 |     path_to_umodel = path_to_models + 'uni_skip.npz'
26 |     path_to_bmodel = path_to_models + 'bi_skip.npz'
27 |
28 |     # Load model options
29 |     with open('%s.pkl'%path_to_umodel, 'rb') as f:
30 |         uoptions = pkl.load(f)
31 |     with open('%s.pkl'%path_to_bmodel, 'rb') as f:
32 |         boptions = pkl.load(f)
33 |
34 |     # Load parameters
35 |     uparams = init_params(uoptions)
36 |     uparams = load_params(path_to_umodel, uparams)
37 |     utparams = init_tparams(uparams)
38 |     bparams = init_params_bi(boptions)
39 |     bparams = load_params(path_to_bmodel, bparams)
40 |     btparams = init_tparams(bparams)
41 |
42 |     # Extractor functions
43 |     embedding, x_mask, ctxw2v = build_encoder(utparams, uoptions)
44 |     f_w2v = theano.function([embedding, x_mask], ctxw2v, name='f_w2v')
45 |     embedding, x_mask, ctxw2v = build_encoder_bi(btparams, boptions)
46 |     f_w2v2 = theano.function([embedding, x_mask], ctxw2v, name='f_w2v2')
47 |
48 |     # Tables
49 |     utable, btable = load_tables(path_to_tables)
50 |
51 |     # Store everything we need in a dictionary
52 |     model = {}
53 |     model['uoptions'] = uoptions
54 |     model['boptions'] = boptions
55 |     model['utable'] = utable
56 |     model['btable'] = btable
57 |     model['f_w2v'] = f_w2v
58 |     model['f_w2v2'] = f_w2v2
59 |
60 |     return model
61 |
62 | def load_tables(path_to_tables):
63 |     """
64 |     Load the tables
65 |     """
66 |     words = []
67 |     utable = numpy.load(path_to_tables + 'utable.npy')
68 |     btable = numpy.load(path_to_tables + 'btable.npy')
69 |     f = open(path_to_tables + 'dictionary.txt', 'rb')
70 |     for line in f:
71 |         words.append(line.decode('utf-8').strip())
72 |     f.close()
73 |     utable = OrderedDict(zip(words, utable))
74 |     btable = OrderedDict(zip(words, btable))
75 |     return utable, btable
76 |
77 | def encode(model, X, use_norm=True, verbose=True, batch_size=128, use_eos=False):
78 |     """
79 |     Encode sentences in the list X. Returns one feature vector per entry
80 |     """
81 |     # first, do preprocessing
82 |     X = preprocess(X)
83 |
84 |     # word dictionary and init
85 |     d = defaultdict(lambda : 0)
86 |     for w in model['utable'].keys():
87 |         d[w] = 1
88 |     ufeatures = numpy.zeros((len(X), model['uoptions']['dim']), dtype='float32')
89 |     bfeatures = numpy.zeros((len(X), 2 * model['boptions']['dim']), dtype='float32')
90 |
91 |     # length dictionary
92 |     ds = defaultdict(list)
93 |     captions = [s.split() for s in X]
94 |     for i,s in enumerate(captions):
95 |         ds[len(s)].append(i)
96 |
97 |     # Get features. This encodes by length, in order to avoid wasting computation
98 |     for k in ds.keys():
99 |         if verbose:
100 |             print k
101 |         numbatches = len(ds[k]) // batch_size + 1
102 |         for minibatch in range(numbatches):
103 |             caps = ds[k][minibatch::numbatches]
104 |
105 |             if use_eos:
106 |                 uembedding = numpy.zeros((k+1, len(caps), model['uoptions']['dim_word']), dtype='float32')
107 |                 bembedding = numpy.zeros((k+1, len(caps), model['boptions']['dim_word']), dtype='float32')
108 |             else:
109 |                 uembedding = numpy.zeros((k, len(caps), model['uoptions']['dim_word']), dtype='float32')
110 |                 bembedding = numpy.zeros((k, len(caps), model['boptions']['dim_word']), dtype='float32')
111 |             for ind, c in enumerate(caps):
112 |                 caption = captions[c]
113 |                 for j in range(len(caption)):
114 |                     if d[caption[j]] > 0:
115 |                         uembedding[j,ind] = model['utable'][caption[j]]
116 |                         bembedding[j,ind] = model['btable'][caption[j]]
117 |                     else:
118 |                         uembedding[j,ind] = model['utable']['UNK']
119 |                         bembedding[j,ind] = model['btable']['UNK']
120 |                 if use_eos:
121 |                     uembedding[-1,ind] = model['utable']['<eos>']
122 |                     bembedding[-1,ind] = model['btable']['<eos>']
123 |             if use_eos:
124 |                 uff = model['f_w2v'](uembedding, numpy.ones((len(caption)+1,len(caps)), dtype='float32'))
125 |                 bff = model['f_w2v2'](bembedding, numpy.ones((len(caption)+1,len(caps)), dtype='float32'))
126 |             else:
127 |                 uff = model['f_w2v'](uembedding, numpy.ones((len(caption),len(caps)), dtype='float32'))
128 |                 bff = model['f_w2v2'](bembedding, numpy.ones((len(caption),len(caps)), dtype='float32'))
129 |             if use_norm:
130 |                 for j in range(len(uff)):
131 |                     uff[j] /= norm(uff[j])
132 |                     bff[j] /= norm(bff[j])
133 |             for ind, c in enumerate(caps):
134 |                 ufeatures[c] = uff[ind]
135 |                 bfeatures[c] = bff[ind]
136 |
137 |     features = numpy.c_[ufeatures, bfeatures]
138 |     return features
139 |
140 | def preprocess(text):
141 |     """
142 |     Preprocess text for encoder
143 |     """
144 |     X = []
145 |     sent_detector = nltk.data.load('tokenizers/punkt/english.pickle')
146 |     for t in text:
147 |         sents = sent_detector.tokenize(t)
148 |         result = ''
149 |         for s in sents:
150 |             tokens = word_tokenize(s)
151 |             result += ' ' + ' '.join(tokens)
152 |         X.append(result)
153 |     return X
154 |
155 | def _p(pp, name):
156 |     """
157 |     make prefix-appended name
158 |     """
159 |     return '%s_%s'%(pp, name)
160 |
161 | def init_tparams(params):
162 |     """
163 |     initialize Theano shared variables according to the initial parameters
164 |     """
165 |     tparams = OrderedDict()
166 |     for kk, pp in params.iteritems():
167 |         tparams[kk] = theano.shared(params[kk], name=kk)
168 |     return tparams
169 |
170 | def load_params(path, params):
171 |     """
172 |     load parameters
173 |     """
174 |     pp = numpy.load(path)
175 |     for kk, vv in params.iteritems():
176 |         if kk not in pp:
177 |             warnings.warn('%s is not in the archive'%kk)
178 |             continue
179 |         params[kk] = pp[kk]
180 |     return params
181 |
182 | # layers: 'name': ('parameter initializer', 'feedforward')
183 | layers = {'gru': ('param_init_gru', 'gru_layer')}
184 |
185 | def get_layer(name):
186 |     fns = layers[name]
187 |     return (eval(fns[0]), eval(fns[1]))
188 |
189 | def init_params(options):
190 |     """
191 |     initialize all parameters needed for the encoder
192 |     """
193 |     params = OrderedDict()
194 |
195 |     # embedding
196 |     params['Wemb'] = norm_weight(options['n_words_src'], options['dim_word'])
197 |
198 |     # encoder: GRU
199 |     params = get_layer(options['encoder'])[0](options, params, prefix='encoder',
200 |                                               nin=options['dim_word'], dim=options['dim'])
201 |     return params
202 |
203 | def init_params_bi(options):
204 |     """
205 |     initialize all parameters needed for bidirectional encoder
206 |     """
207 |     params = OrderedDict()
208 |
209 |     # embedding
210 |     params['Wemb'] = norm_weight(options['n_words_src'], options['dim_word'])
211 |
212 |     # encoder: GRU
213 |     params = get_layer(options['encoder'])[0](options, params, prefix='encoder',
214 |                                               nin=options['dim_word'], dim=options['dim'])
215 |     params = get_layer(options['encoder'])[0](options, params, prefix='encoder_r',
216 |                                               nin=options['dim_word'], dim=options['dim'])
217 |     return params
218 |
219 | def build_encoder(tparams, options):
220 |     """
221 |     build an encoder, given pre-computed word embeddings
222 |     """
223 |     # word embedding (source)
224 |     embedding = tensor.tensor3('embedding', dtype='float32')
225 |     x_mask = tensor.matrix('x_mask', dtype='float32')
226 |
227 |     # encoder
228 |     proj = get_layer(options['encoder'])[1](tparams, embedding, options,
229 |                                             prefix='encoder',
230 |                                             mask=x_mask)
231 |     ctx = proj[0][-1]
232 |
233 |     return embedding, x_mask, ctx
234 |
235 | def build_encoder_bi(tparams, options):
236 |     """
237 |     build bidirectional encoder, given pre-computed word embeddings
238 |     """
239 |     # word embedding (source)
240 |     embedding = tensor.tensor3('embedding', dtype='float32')
241 |     embeddingr = embedding[::-1]
242 |     x_mask = tensor.matrix('x_mask', dtype='float32')
243 |     xr_mask = x_mask[::-1]
244 |
245 |     # encoder
246 |     proj = get_layer(options['encoder'])[1](tparams, embedding, options,
247 |                                             prefix='encoder',
248 |                                             mask=x_mask)
249 |     projr = get_layer(options['encoder'])[1](tparams, embeddingr, options,
250 |                                              prefix='encoder_r',
251 |                                              mask=xr_mask)
252 |
253 |     ctx = tensor.concatenate([proj[0][-1], projr[0][-1]], axis=1)
254 |
255 |     return embedding, x_mask, ctx
256 |
257 | # some utilities
258 | def ortho_weight(ndim):
259 |     W = numpy.random.randn(ndim, ndim)
260 |     u, s, v = numpy.linalg.svd(W)
261 |     return u.astype('float32')
262 |
263 | def norm_weight(nin,nout=None, scale=0.1, ortho=True):
264 |     if nout is None:
265 |         nout = nin
266 |     if nout == nin and ortho:
267 |         W = ortho_weight(nin)
268 |     else:
269 |         W = numpy.random.uniform(low=-scale, high=scale, size=(nin, nout))
270 |     return W.astype('float32')
271 |
272 | def param_init_gru(options, params, prefix='gru', nin=None, dim=None):
273 |     """
274 |     parameter init for GRU
275 |     """
276 |     if nin is None:
277 |         nin = options['dim_proj']
278 |     if dim is None:
279 |         dim = options['dim_proj']
280 |     W = numpy.concatenate([norm_weight(nin,dim),
281 |                            norm_weight(nin,dim)], axis=1)
282 |     params[_p(prefix,'W')] = W
283 |     params[_p(prefix,'b')] = numpy.zeros((2 * dim,)).astype('float32')
284 |     U = numpy.concatenate([ortho_weight(dim),
285 |                            ortho_weight(dim)], axis=1)
286 |     params[_p(prefix,'U')] = U
287 |
288 |     Wx = norm_weight(nin, dim)
289 |     params[_p(prefix,'Wx')] = Wx
290 |     Ux = ortho_weight(dim)
291 |     params[_p(prefix,'Ux')] = Ux
292 |     params[_p(prefix,'bx')] = numpy.zeros((dim,)).astype('float32')
293 |
294 |     return params
295 |
296 | def gru_layer(tparams, state_below, options, prefix='gru', mask=None, **kwargs):
297 |     """
298 |     Forward pass through GRU layer
299 |     """
300 |     nsteps = state_below.shape[0]
301 |     if state_below.ndim == 3:
302 |         n_samples = state_below.shape[1]
303 |     else:
304 |         n_samples = 1
305 |
306 |     dim = tparams[_p(prefix,'Ux')].shape[1]
307 |
308 |     if mask is None:
309 |         mask = tensor.alloc(1., state_below.shape[0], 1)
310 |
311 |     def _slice(_x, n, dim):
312 |         if _x.ndim == 3:
313 |             return _x[:, :, n*dim:(n+1)*dim]
314 |         return _x[:, n*dim:(n+1)*dim]
315 |
316 |     state_below_ = tensor.dot(state_below, tparams[_p(prefix, 'W')]) + tparams[_p(prefix, 'b')]
317 |     state_belowx = tensor.dot(state_below, tparams[_p(prefix, 'Wx')]) + tparams[_p(prefix, 'bx')]
318 |     U = tparams[_p(prefix, 'U')]
319 |     Ux = tparams[_p(prefix, 'Ux')]
320 |
321 |     def _step_slice(m_, x_, xx_, h_, U, Ux):
322 |         preact = tensor.dot(h_, U)
323 |         preact += x_
324 |
325 |         r = tensor.nnet.sigmoid(_slice(preact, 0, dim))
326 |         u = tensor.nnet.sigmoid(_slice(preact, 1, dim))
327 |
328 |         preactx = tensor.dot(h_, Ux)
329 |         preactx = preactx * r
330 |         preactx = preactx + xx_
331 |
332 |         h = tensor.tanh(preactx)
333 |
334 |         h = u * h_ + (1. - u) * h
335 |         h = m_[:,None] * h + (1. - m_)[:,None] * h_
336 |
337 |         return h
338 |
339 |     seqs = [mask, state_below_, state_belowx]
340 |     _step = _step_slice
341 |
342 |     rval, updates = theano.scan(_step,
343 |                                 sequences=seqs,
344 |                                 outputs_info = [tensor.alloc(0., n_samples, dim)],
345 |                                 non_sequences = [tparams[_p(prefix, 'U')],
346 |                                                  tparams[_p(prefix, 'Ux')]],
347 |                                 name=_p(prefix, '_layers'),
348 |                                 n_steps=nsteps,
349 |                                 profile=profile,
350 |                                 strict=True)
351 |     rval = [rval]
352 |     return rval
353 |
354 |
355 |
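`encode` groups sentences by token count so that each minibatch needs no padding, then splits each group into `len(group) // batch_size + 1` minibatches using strided slices. The block below is a small sketch of only that bucketing step. The function name `bucket_by_length` and the toy sentences are assumptions for illustration, and tokens are split on whitespace as in `encode`.

```python
from collections import defaultdict

def bucket_by_length(sentences, batch_size):
    """Return (length, indices) minibatches of equal-length sentences."""
    by_len = defaultdict(list)
    for i, s in enumerate(sentences):
        by_len[len(s.split())].append(i)

    batches = []
    for length in sorted(by_len):
        idx = by_len[length]
        numbatches = len(idx) // batch_size + 1
        for b in range(numbatches):
            # Strided slice: minibatch b takes every numbatches-th index.
            batches.append((length, idx[b::numbatches]))
    return batches

sentences = ["a b", "c", "d e", "f g", "h"]
batches = bucket_by_length(sentences, batch_size=2)
```

For these five sentences the result is `[(1, [1]), (1, [4]), (2, [0, 3]), (2, [2])]`. Because of the `+ 1`, a group whose size is an exact multiple of `batch_size` is split into one more minibatch than strictly necessary, so its minibatches come out smaller than `batch_size` but are never empty.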
--------------------------------------------------------------------------------