├── .gitignore ├── 1-seq2seq.ipynb ├── 2-seq2seq-advanced.ipynb ├── 3-seq2seq-native-new.ipynb ├── LICENSE ├── README.md ├── helpers.py ├── model_new.py └── pictures ├── 1-seq2seq.png └── 2-seq2seq-feed-previous.png /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | env/ 12 | build/ 13 | develop-eggs/ 14 | dist/ 15 | downloads/ 16 | eggs/ 17 | .eggs/ 18 | lib/ 19 | lib64/ 20 | parts/ 21 | sdist/ 22 | var/ 23 | *.egg-info/ 24 | .installed.cfg 25 | *.egg 26 | 27 | # PyInstaller 28 | # Usually these files are written by a python script from a template 29 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 30 | *.manifest 31 | *.spec 32 | 33 | # Installer logs 34 | pip-log.txt 35 | pip-delete-this-directory.txt 36 | 37 | # Unit test / coverage reports 38 | htmlcov/ 39 | .tox/ 40 | .coverage 41 | .coverage.* 42 | .cache 43 | nosetests.xml 44 | coverage.xml 45 | *,cover 46 | .hypothesis/ 47 | 48 | # Translations 49 | *.mo 50 | *.pot 51 | 52 | # Django stuff: 53 | *.log 54 | local_settings.py 55 | 56 | # Flask stuff: 57 | instance/ 58 | .webassets-cache 59 | 60 | # Scrapy stuff: 61 | .scrapy 62 | 63 | # Sphinx documentation 64 | docs/_build/ 65 | 66 | # PyBuilder 67 | target/ 68 | 69 | # IPython Notebook 70 | .ipynb_checkpoints 71 | 72 | # pyenv 73 | .python-version 74 | 75 | # celery beat schedule file 76 | celerybeat-schedule 77 | 78 | # dotenv 79 | .env 80 | 81 | # virtualenv 82 | venv/ 83 | ENV/ 84 | 85 | # Spyder project settings 86 | .spyderproject 87 | 88 | # Rope project settings 89 | .ropeproject 90 | -------------------------------------------------------------------------------- /1-seq2seq.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Simple dynamic seq2seq with TensorFlow" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "This tutorial covers building seq2seq using dynamic unrolling with TensorFlow. \n", 15 | "\n", 16 | "I wasn't able to find any existing implementation of dynamic seq2seq with TF (as of 01.01.2017), so I decided to learn how to write my own, and document what I learn in the process.\n", 17 | "\n", 18 | "I deliberately try to be as explicit as possible. As it currently stands, TF code is the best source of documentation on itself, and I have a feeling that many conventions and design decisions are not documented anywhere except in the brains of Google Brain engineers. \n", 19 | "\n", 20 | "I hope this will be useful to people whose brains are wired like mine.\n", 21 | "\n", 22 | "**UPDATE**: as of r1.0 @ 16.02.2017, there is new official implementation in `tf.contrib.seq2seq`. See [tutorial #3](3-seq2seq-native-new.ipynb). Official tutorial reportedly be up soon. Personally I still find wiring dynamic encoder-decoder by hand insightful in many ways." 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "Here we implement plain seq2seq — forward-only encoder + decoder without attention. I'll try to follow closely the original architecture described in [Sutskever, Vinyals and Le (2014)](https://arxiv.org/abs/1409.3215). If you notice any deviations, please let me know." 
30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "Architecture diagram from their paper:\n", 37 | "![seq2seq architecture](pictures/1-seq2seq.png)\n", 38 | "Rectangles are encoder and decoder's recurrent layers. The encoder receives the `[A, B, C]` sequence as input. We don't care about encoder outputs, only about the hidden state it accumulates while reading the sequence. After the input sequence ends, the encoder passes its final state to the decoder, which receives `[<EOS>, W, X, Y, Z]` and is trained to output `[W, X, Y, Z, <EOS>]`. The `<EOS>` token is a special word in the vocabulary that signals to the decoder the beginning of translation." 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": {}, 44 | "source": [ 45 | "## Implementation details\n", 46 | "\n", 47 | "TensorFlow has its own [implementation of seq2seq](https://www.tensorflow.org/tutorials/seq2seq/). Recently it was moved from core examples to the [`tensorflow/models` repo](https://github.com/tensorflow/models/tree/master/tutorials/rnn/translate), and it uses the deprecated seq2seq implementation. Deprecation happened because it uses **static unrolling**.\n", 48 | "\n", 49 | "**Static unrolling** involves construction of a computation graph with a fixed number of time steps. Such a graph can only handle sequences of specific lengths. One solution for handling sequences of varying lengths is to create multiple graphs with different time lengths and separate the dataset into these buckets.\n", 50 | "\n", 51 | "**Dynamic unrolling** instead uses control flow ops to process the sequence step by step. In TF this is supposed to be more space-efficient and just as fast. This is now the recommended way to implement RNNs." 52 | ] 53 | }, 54 | { 55 | "cell_type": "markdown", 56 | "metadata": {}, 57 | "source": [ 58 | "## Vocabulary\n", 59 | "\n", 60 | "Seq2seq maps a sequence onto another sequence. Both sequences consist of integers from a fixed range. In language tasks, integers usually correspond to words: we first construct a vocabulary by assigning to every word in our corpus a serial integer. The first few integers are reserved for special tokens. We'll call the upper bound on the vocabulary the `vocabulary size`.\n", 61 | "\n", 62 | "Input data consists of sequences of integers."
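The next few cells convert such lists with `helpers.batch` from the accompanying `helpers.py`, which is not reproduced in this section. The following is only a rough sketch of the behaviour that function needs to have, consistent with how it is used in this notebook; the actual implementation in `helpers.py` may differ, and the name `batch_sketch` is made up for illustration:

```python
import numpy as np

def batch_sketch(inputs, max_sequence_length=None):
    """Rough equivalent of helpers.batch: pad integer sequences with zeros
    and lay them out time-major, i.e. with shape [max_time, batch_size]."""
    sequence_lengths = [len(seq) for seq in inputs]
    batch_size = len(inputs)
    if max_sequence_length is None:
        max_sequence_length = max(sequence_lengths)

    padded = np.zeros([max_sequence_length, batch_size], dtype=np.int32)  # PAD == 0
    for i, seq in enumerate(inputs):
        for t, element in enumerate(seq):
            padded[t, i] = element  # time is the first axis, batch the second
    return padded, sequence_lengths
```

For `[[5, 7, 8], [6, 3], [3], [1]]` such a function returns the `[3, 4]` matrix and the length list `[3, 2, 1, 1]` shown in the cells below.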
63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": 1, 68 | "metadata": { 69 | "collapsed": true 70 | }, 71 | "outputs": [], 72 | "source": [ 73 | "x = [[5, 7, 8], [6, 3], [3], [1]]" 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": {}, 79 | "source": [ 80 | "While manipulating such variable-length lists are convenient to humans, RNNs prefer a different layout:" 81 | ] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "execution_count": 2, 86 | "metadata": { 87 | "collapsed": false 88 | }, 89 | "outputs": [], 90 | "source": [ 91 | "import helpers\n", 92 | "xt, xlen = helpers.batch(x)" 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": 3, 98 | "metadata": { 99 | "collapsed": false 100 | }, 101 | "outputs": [ 102 | { 103 | "data": { 104 | "text/plain": [ 105 | "[[5, 7, 8], [6, 3], [3], [1]]" 106 | ] 107 | }, 108 | "execution_count": 3, 109 | "metadata": {}, 110 | "output_type": "execute_result" 111 | } 112 | ], 113 | "source": [ 114 | "x" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": 4, 120 | "metadata": { 121 | "collapsed": false 122 | }, 123 | "outputs": [ 124 | { 125 | "data": { 126 | "text/plain": [ 127 | "array([[5, 6, 3, 1],\n", 128 | " [7, 3, 0, 0],\n", 129 | " [8, 0, 0, 0]], dtype=int32)" 130 | ] 131 | }, 132 | "execution_count": 4, 133 | "metadata": {}, 134 | "output_type": "execute_result" 135 | } 136 | ], 137 | "source": [ 138 | "xt" 139 | ] 140 | }, 141 | { 142 | "cell_type": "markdown", 143 | "metadata": {}, 144 | "source": [ 145 | "Sequences form columns of a matrix of size `[max_time, batch_size]`. Sequences shorter then the longest one are padded with zeros towards the end. This layout is called `time-major`. It is slightly more efficient then `batch-major`. We will use it for the rest of the tutorial." 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": 5, 151 | "metadata": { 152 | "collapsed": false 153 | }, 154 | "outputs": [ 155 | { 156 | "data": { 157 | "text/plain": [ 158 | "[3, 2, 1, 1]" 159 | ] 160 | }, 161 | "execution_count": 5, 162 | "metadata": {}, 163 | "output_type": "execute_result" 164 | } 165 | ], 166 | "source": [ 167 | "xlen" 168 | ] 169 | }, 170 | { 171 | "cell_type": "markdown", 172 | "metadata": {}, 173 | "source": [ 174 | "For some forms of dynamic layout it is useful to have a pointer to terminals of every sequence in the batch in separate tensor (see following tutorials)." 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": {}, 180 | "source": [ 181 | "# Building a model" 182 | ] 183 | }, 184 | { 185 | "cell_type": "markdown", 186 | "metadata": {}, 187 | "source": [ 188 | "## Simple seq2seq" 189 | ] 190 | }, 191 | { 192 | "cell_type": "markdown", 193 | "metadata": {}, 194 | "source": [ 195 | "Encoder starts with empty state and runs through the input sequence. We are not interested in encoder's outputs, only in its `final_state`.\n", 196 | "\n", 197 | "Decoder uses encoder's `final_state` as its `initial_state`. Its inputs are a batch-sized matrix with `` token at the 1st time step and `` at the following. This is a rather crude setup, useful only for tutorial purposes. In practice, we would like to feed previously generated tokens after ``.\n", 198 | "\n", 199 | "Decoder's outputs are mapped onto the output space using `[hidden_units x output_vocab_size]` projection layer. 
This is necessary because we cannot make `hidden_units` of decoder arbitrarily large, while our target space would grow with the size of the dictionary.\n", 200 | "\n", 201 | "This kind of encoder-decoder is forced to learn fixed-length representation (specifically, `hidden_units` size) of the variable-length input sequence and restore output sequence only from this representation." 202 | ] 203 | }, 204 | { 205 | "cell_type": "code", 206 | "execution_count": 6, 207 | "metadata": { 208 | "collapsed": false 209 | }, 210 | "outputs": [], 211 | "source": [ 212 | "import numpy as np\n", 213 | "import tensorflow as tf\n", 214 | "import helpers\n", 215 | "\n", 216 | "tf.reset_default_graph()\n", 217 | "sess = tf.InteractiveSession()" 218 | ] 219 | }, 220 | { 221 | "cell_type": "code", 222 | "execution_count": 7, 223 | "metadata": { 224 | "collapsed": false 225 | }, 226 | "outputs": [ 227 | { 228 | "data": { 229 | "text/plain": [ 230 | "'1.3.0'" 231 | ] 232 | }, 233 | "execution_count": 7, 234 | "metadata": {}, 235 | "output_type": "execute_result" 236 | } 237 | ], 238 | "source": [ 239 | "tf.__version__" 240 | ] 241 | }, 242 | { 243 | "cell_type": "markdown", 244 | "metadata": {}, 245 | "source": [ 246 | "### Model inputs and outputs " 247 | ] 248 | }, 249 | { 250 | "cell_type": "markdown", 251 | "metadata": {}, 252 | "source": [ 253 | "First critical thing to decide: vocabulary size.\n", 254 | "\n", 255 | "Dynamic RNN models can be adapted to different batch sizes and sequence lengths without retraining (e.g. by serializing model parameters and Graph definitions via `tf.train.Saver`), but changing vocabulary size requires retraining the model." 256 | ] 257 | }, 258 | { 259 | "cell_type": "code", 260 | "execution_count": 8, 261 | "metadata": { 262 | "collapsed": true 263 | }, 264 | "outputs": [], 265 | "source": [ 266 | "PAD = 0\n", 267 | "EOS = 1\n", 268 | "\n", 269 | "vocab_size = 10\n", 270 | "input_embedding_size = 20\n", 271 | "\n", 272 | "encoder_hidden_units = 20\n", 273 | "decoder_hidden_units = encoder_hidden_units" 274 | ] 275 | }, 276 | { 277 | "cell_type": "markdown", 278 | "metadata": {}, 279 | "source": [ 280 | "Nice way to understand complicated function is to study its signature - inputs and outputs. 
With pure functions, only the input-output relation matters.\n", 281 | "\n", 282 | "- `encoder_inputs` int32 tensor is shaped `[encoder_max_time, batch_size]`\n", 283 | "- `decoder_targets` int32 tensor is shaped `[decoder_max_time, batch_size]`" 284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": 9, 289 | "metadata": { 290 | "collapsed": false 291 | }, 292 | "outputs": [], 293 | "source": [ 294 | "encoder_inputs = tf.placeholder(shape=(None, None), dtype=tf.int32, name='encoder_inputs')\n", 295 | "decoder_targets = tf.placeholder(shape=(None, None), dtype=tf.int32, name='decoder_targets')" 296 | ] 297 | }, 298 | { 299 | "cell_type": "markdown", 300 | "metadata": {}, 301 | "source": [ 302 | "We'll add one additional placeholder tensor: \n", 303 | "- `decoder_inputs` int32 tensor is shaped `[decoder_max_time, batch_size]`" 304 | ] 305 | }, 306 | { 307 | "cell_type": "code", 308 | "execution_count": 10, 309 | "metadata": { 310 | "collapsed": false 311 | }, 312 | "outputs": [], 313 | "source": [ 314 | "decoder_inputs = tf.placeholder(shape=(None, None), dtype=tf.int32, name='decoder_inputs')" 315 | ] 316 | }, 317 | { 318 | "cell_type": "markdown", 319 | "metadata": {}, 320 | "source": [ 321 | "We actually don't want to feed `decoder_inputs` manually — they are a function of either `decoder_targets` or previous decoder outputs during rollout. However, there are different ways to construct them. It might be illustrative to explicitly specify them for our first seq2seq implementation.\n", 322 | "\n", 323 | "During training, `decoder_inputs` will consist of the `<EOS>` token concatenated with `decoder_targets` along the time axis. In this way, we always pass the target sequence as the history to the decoder, regardless of what it actually predicts. This can introduce a distribution shift from training to prediction. \n", 324 | "In prediction mode, the model will receive tokens it previously generated (via argmax over logits), not the ground truth, which would be unknowable." 325 | ] 326 | }, 327 | { 328 | "cell_type": "markdown", 329 | "metadata": {}, 330 | "source": [ 331 | "Notice that all shapes are specified with `None`s (dynamic). We can use batches of any size with any number of timesteps. This is convenient and efficient; however, there are obvious constraints: \n", 332 | "- Feed values for all tensors should have the same `batch_size`\n", 333 | "- Decoder inputs and outputs (`decoder_inputs` and `decoder_targets`) should have the same `decoder_max_time`" 334 | ] 335 | }, 336 | { 337 | "cell_type": "markdown", 338 | "metadata": {}, 339 | "source": [ 340 | "### Embeddings\n", 341 | "\n", 342 | "`encoder_inputs` and `decoder_inputs` are int32 tensors of shape `[max_time, batch_size]`, while encoder and decoder RNNs expect a dense vector representation of words, `[max_time, batch_size, input_embedding_size]`. We convert one to the other using *word embeddings*. Specifics of working with embeddings are nicely described in the [official tutorial on embeddings](https://www.tensorflow.org/tutorials/word2vec/)." 343 | ] 344 | }, 345 | { 346 | "cell_type": "markdown", 347 | "metadata": {}, 348 | "source": [ 349 | "First we initialize the embedding matrix. Initialization is random. We rely on our end-to-end training to learn vector representations for words jointly with the encoder and decoder."
350 | ] 351 | }, 352 | { 353 | "cell_type": "code", 354 | "execution_count": 11, 355 | "metadata": { 356 | "collapsed": false 357 | }, 358 | "outputs": [], 359 | "source": [ 360 | "embeddings = tf.Variable(tf.random_uniform([vocab_size, input_embedding_size], -1.0, 1.0), dtype=tf.float32)" 361 | ] 362 | }, 363 | { 364 | "cell_type": "markdown", 365 | "metadata": {}, 366 | "source": [ 367 | "We use `tf.nn.embedding_lookup` to *index embedding matrix*: given word `4`, we represent it as 4th column of embedding matrix. \n", 368 | "This operation is lightweight, compared with alternative approach of one-hot encoding word `4` as `[0,0,0,1,0,0,0,0,0,0]` (vocab size 10) and then multiplying it by embedding matrix.\n", 369 | "\n", 370 | "Additionally, we don't need to compute gradients for any columns except 4th.\n", 371 | "\n", 372 | "Encoder and decoder will share embeddings. It's all words, right? Well, digits in this case. In real NLP application embedding matrix can get very large, with 100k or even 1m columns." 373 | ] 374 | }, 375 | { 376 | "cell_type": "code", 377 | "execution_count": 12, 378 | "metadata": { 379 | "collapsed": true 380 | }, 381 | "outputs": [], 382 | "source": [ 383 | "encoder_inputs_embedded = tf.nn.embedding_lookup(embeddings, encoder_inputs)\n", 384 | "decoder_inputs_embedded = tf.nn.embedding_lookup(embeddings, decoder_inputs)" 385 | ] 386 | }, 387 | { 388 | "cell_type": "markdown", 389 | "metadata": {}, 390 | "source": [ 391 | "### Encoder\n", 392 | "\n", 393 | "The centerpiece of all things RNN in TensorFlow is `RNNCell` class and its descendants (like `LSTMCell`). But they are outside of the scope of this post — nice [official tutorial](https://www.tensorflow.org/tutorials/recurrent/) is available. \n", 394 | "\n", 395 | "`@TODO: RNNCell as a factory`" 396 | ] 397 | }, 398 | { 399 | "cell_type": "code", 400 | "execution_count": 13, 401 | "metadata": { 402 | "collapsed": false 403 | }, 404 | "outputs": [], 405 | "source": [ 406 | "encoder_cell = tf.contrib.rnn.LSTMCell(encoder_hidden_units)\n", 407 | "\n", 408 | "encoder_outputs, encoder_final_state = tf.nn.dynamic_rnn(\n", 409 | " encoder_cell, encoder_inputs_embedded,\n", 410 | " dtype=tf.float32, time_major=True,\n", 411 | ")\n", 412 | "\n", 413 | "del encoder_outputs" 414 | ] 415 | }, 416 | { 417 | "cell_type": "markdown", 418 | "metadata": {}, 419 | "source": [ 420 | "We discard `encoder_outputs` because we are not interested in them within seq2seq framework. What we actually want is `encoder_final_state` — state of LSTM's hidden cells at the last moment of the Encoder rollout.\n", 421 | "\n", 422 | "`encoder_final_state` is also called \"thought vector\". We will use it as initial state for the Decoder. In seq2seq without attention this is the only point where Encoder passes information to Decoder. We hope that backpropagation through time (BPTT) algorithm will tune the model to pass enough information throught the thought vector for correct sequence output decoding." 
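Going back to the embedding-lookup remark above: the equivalence between indexing the embedding matrix with a word id and multiplying by a one-hot vector can be checked with plain numpy. This is a standalone illustration with made-up numbers, not part of the model graph:

```python
import numpy as np

# illustrative only: vocab of 10 "words", 20-dimensional embeddings
embedding_matrix = np.random.randn(10, 20).astype(np.float32)

word_id = 4
one_hot = np.zeros(10, dtype=np.float32)
one_hot[word_id] = 1.0

# multiplying by a one-hot vector ...
via_matmul = one_hot.dot(embedding_matrix)
# ... picks out exactly the same vector as directly indexing the matrix
via_lookup = embedding_matrix[word_id]

assert np.allclose(via_matmul, via_lookup)
```

The lookup does the same thing without materializing the one-hot vector or touching the other rows.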
423 | ] 424 | }, 425 | { 426 | "cell_type": "code", 427 | "execution_count": 14, 428 | "metadata": { 429 | "collapsed": false 430 | }, 431 | "outputs": [ 432 | { 433 | "data": { 434 | "text/plain": [ 435 | "LSTMStateTuple(c=, h=)" 436 | ] 437 | }, 438 | "execution_count": 14, 439 | "metadata": {}, 440 | "output_type": "execute_result" 441 | } 442 | ], 443 | "source": [ 444 | "encoder_final_state" 445 | ] 446 | }, 447 | { 448 | "cell_type": "markdown", 449 | "metadata": {}, 450 | "source": [ 451 | "TensorFlow LSTM implementation stores state as a tuple of tensors. \n", 452 | "- `encoder_final_state.h` is the hidden state, i.e. the cell's output at the last step\n", 453 | "- `encoder_final_state.c` is the cell state, the LSTM's internal memory, which can potentially be transformed with some wrapper" 454 | ] 455 | }, 456 | { 457 | "cell_type": "markdown", 458 | "metadata": {}, 459 | "source": [ 460 | "### Decoder" 461 | ] 462 | }, 463 | { 464 | "cell_type": "code", 465 | "execution_count": 15, 466 | "metadata": { 467 | "collapsed": false 468 | }, 469 | "outputs": [], 470 | "source": [ 471 | "decoder_cell = tf.contrib.rnn.LSTMCell(decoder_hidden_units)\n", 472 | "\n", 473 | "decoder_outputs, decoder_final_state = tf.nn.dynamic_rnn(\n", 474 | " decoder_cell, decoder_inputs_embedded,\n", 475 | "\n", 476 | " initial_state=encoder_final_state,\n", 477 | "\n", 478 | " dtype=tf.float32, time_major=True, scope=\"plain_decoder\",\n", 479 | ")" 480 | ] 481 | }, 482 | { 483 | "cell_type": "markdown", 484 | "metadata": {}, 485 | "source": [ 486 | "Since we pass `encoder_final_state` as `initial_state` to the decoder, they should be compatible. This means the same cell type (`LSTMCell` in our case), the same number of `hidden_units` and the same number of layers (a single layer). I suppose this can be relaxed if we additionally pass `encoder_final_state` through a one-layer MLP." 487 | ] 488 | }, 489 | { 490 | "cell_type": "markdown", 491 | "metadata": {}, 492 | "source": [ 493 | "With the encoder, we were not interested in the cell's outputs. But the decoder's outputs are what we are actually after: we use them to get a distribution over words of the output sequence.\n", 494 | "\n", 495 | "At this point the `decoder_cell` output is a `hidden_units`-sized vector at every timestep. However, for training and prediction we need logits of size `vocab_size`. A reasonable thing to do is to put a linear layer (a fully-connected layer without an activation function) on top of the LSTM output to get non-normalized logits. By convention, this layer is called the projection layer."
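The next cell uses `tf.contrib.layers.linear` to build this projection. Under the hood it amounts to roughly the following (a sketch only; the contrib layer handles variable scoping and initialization differently, and the names `W_proj`, `b_proj`, `manual_decoder_logits` are made up here):

```python
# rough equivalent of tf.contrib.layers.linear(decoder_outputs, vocab_size)
W_proj = tf.Variable(tf.random_uniform([decoder_hidden_units, vocab_size], -1.0, 1.0), dtype=tf.float32)
b_proj = tf.Variable(tf.zeros([vocab_size]), dtype=tf.float32)

# flatten [max_time, batch_size, hidden_units] -> [max_time*batch_size, hidden_units],
# apply the affine map, then restore the time/batch dimensions
outputs_flat = tf.reshape(decoder_outputs, (-1, decoder_hidden_units))
logits_flat = tf.matmul(outputs_flat, W_proj) + b_proj
manual_decoder_logits = tf.reshape(
    logits_flat,
    tf.concat([tf.shape(decoder_outputs)[:2], [vocab_size]], axis=0))
```

This flatten-matmul-reshape pattern is essentially what tutorial #2 does explicitly when it manages `W` and `b` by hand.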
496 | ] 497 | }, 498 | { 499 | "cell_type": "code", 500 | "execution_count": 16, 501 | "metadata": { 502 | "collapsed": false 503 | }, 504 | "outputs": [], 505 | "source": [ 506 | "decoder_logits = tf.contrib.layers.linear(decoder_outputs, vocab_size)\n", 507 | "\n", 508 | "decoder_prediction = tf.argmax(decoder_logits, 2)" 509 | ] 510 | }, 511 | { 512 | "cell_type": "markdown", 513 | "metadata": {}, 514 | "source": [ 515 | "### Optimizer" 516 | ] 517 | }, 518 | { 519 | "cell_type": "code", 520 | "execution_count": 17, 521 | "metadata": { 522 | "collapsed": false 523 | }, 524 | "outputs": [ 525 | { 526 | "data": { 527 | "text/plain": [ 528 | "" 529 | ] 530 | }, 531 | "execution_count": 17, 532 | "metadata": {}, 533 | "output_type": "execute_result" 534 | } 535 | ], 536 | "source": [ 537 | "decoder_logits" 538 | ] 539 | }, 540 | { 541 | "cell_type": "markdown", 542 | "metadata": {}, 543 | "source": [ 544 | "RNN outputs tensor of shape `[max_time, batch_size, hidden_units]` which projection layer maps onto `[max_time, batch_size, vocab_size]`. `vocab_size` part of the shape is static, while `max_time` and `batch_size` is dynamic." 545 | ] 546 | }, 547 | { 548 | "cell_type": "code", 549 | "execution_count": 18, 550 | "metadata": { 551 | "collapsed": false 552 | }, 553 | "outputs": [], 554 | "source": [ 555 | "stepwise_cross_entropy = tf.nn.softmax_cross_entropy_with_logits(\n", 556 | " labels=tf.one_hot(decoder_targets, depth=vocab_size, dtype=tf.float32),\n", 557 | " logits=decoder_logits,\n", 558 | ")\n", 559 | "\n", 560 | "loss = tf.reduce_mean(stepwise_cross_entropy)\n", 561 | "train_op = tf.train.AdamOptimizer().minimize(loss)" 562 | ] 563 | }, 564 | { 565 | "cell_type": "code", 566 | "execution_count": 19, 567 | "metadata": { 568 | "collapsed": false 569 | }, 570 | "outputs": [], 571 | "source": [ 572 | "sess.run(tf.global_variables_initializer())" 573 | ] 574 | }, 575 | { 576 | "cell_type": "markdown", 577 | "metadata": {}, 578 | "source": [ 579 | "### Test forward pass\n", 580 | "\n", 581 | "Did I say that deep learning is a game of shapes? When building a Graph, TF will throw errors when static shapes are not matching. However, mismatches between dynamic shapes are often only discovered when we try to run something through the graph.\n", 582 | "\n", 583 | "\n", 584 | "So let's try running something. For that we need to prepare values we will feed into placeholders." 
585 | ] 586 | }, 587 | { 588 | "cell_type": "markdown", 589 | "metadata": {}, 590 | "source": [ 591 | "```\n", 592 | "this is key part where everything comes together\n", 593 | "\n", 594 | "@TODO: describe\n", 595 | "- how encoder shape is fixed to max\n", 596 | "- how decoder shape is arbitraty and determined by inputs, but should probably be longer then encoder's\n", 597 | "- how decoder input values are also arbitraty, and how we use GO token, and what are those 0s, and what can be used instead (shifted gold sequence, beam search)\n", 598 | "@TODO: add references\n", 599 | "```" 600 | ] 601 | }, 602 | { 603 | "cell_type": "code", 604 | "execution_count": 20, 605 | "metadata": { 606 | "collapsed": false 607 | }, 608 | "outputs": [ 609 | { 610 | "name": "stdout", 611 | "output_type": "stream", 612 | "text": [ 613 | "batch_encoded:\n", 614 | "[[6 3 9]\n", 615 | " [0 4 8]\n", 616 | " [0 0 7]]\n", 617 | "decoder inputs:\n", 618 | "[[1 1 1]\n", 619 | " [0 0 0]\n", 620 | " [0 0 0]\n", 621 | " [0 0 0]]\n", 622 | "decoder predictions:\n", 623 | "[[2 4 6]\n", 624 | " [2 4 2]\n", 625 | " [1 4 5]\n", 626 | " [5 4 4]]\n" 627 | ] 628 | } 629 | ], 630 | "source": [ 631 | "batch_ = [[6], [3, 4], [9, 8, 7]]\n", 632 | "\n", 633 | "batch_, batch_length_ = helpers.batch(batch_)\n", 634 | "print('batch_encoded:\\n' + str(batch_))\n", 635 | "\n", 636 | "din_, dlen_ = helpers.batch(np.ones(shape=(3, 1), dtype=np.int32),\n", 637 | " max_sequence_length=4)\n", 638 | "print('decoder inputs:\\n' + str(din_))\n", 639 | "\n", 640 | "pred_ = sess.run(decoder_prediction,\n", 641 | " feed_dict={\n", 642 | " encoder_inputs: batch_,\n", 643 | " decoder_inputs: din_,\n", 644 | " })\n", 645 | "print('decoder predictions:\\n' + str(pred_))" 646 | ] 647 | }, 648 | { 649 | "cell_type": "markdown", 650 | "metadata": {}, 651 | "source": [ 652 | "Successful forward computation, everything is wired correctly." 653 | ] 654 | }, 655 | { 656 | "cell_type": "markdown", 657 | "metadata": {}, 658 | "source": [ 659 | "## Training on the toy task" 660 | ] 661 | }, 662 | { 663 | "cell_type": "markdown", 664 | "metadata": {}, 665 | "source": [ 666 | "We will teach our model to memorize and reproduce input sequence. Sequences will be random, with varying length.\n", 667 | "\n", 668 | "Since random sequences do not contain any structure, model will not be able to exploit any patterns in data. It will simply encode sequence in a thought vector, then decode from it." 
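As with `helpers.batch`, the `helpers.random_sequences` generator used in the next cell lives in `helpers.py`, which is not shown in this section. Below is a minimal sketch of a generator with the behaviour the next cell relies on (endless batches, lengths in `[length_from, length_to]`, values in `[vocab_lower, vocab_upper)`); the real implementation may differ, and `random_sequences_sketch` is a made-up name:

```python
import random

def random_sequences_sketch(length_from, length_to, vocab_lower, vocab_upper, batch_size):
    """Endless generator of batches of random integer sequences."""
    while True:
        yield [
            [random.randrange(vocab_lower, vocab_upper)
             for _ in range(random.randint(length_from, length_to))]
            for _ in range(batch_size)
        ]
```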
669 | ] 670 | }, 671 | { 672 | "cell_type": "code", 673 | "execution_count": 21, 674 | "metadata": { 675 | "collapsed": false 676 | }, 677 | "outputs": [ 678 | { 679 | "name": "stdout", 680 | "output_type": "stream", 681 | "text": [ 682 | "head of the batch:\n", 683 | "[4, 3, 6, 7, 8, 2, 7, 9]\n", 684 | "[4, 9, 3, 2, 3, 9, 5]\n", 685 | "[2, 7, 4, 7]\n", 686 | "[8, 4, 6, 6, 9, 2]\n", 687 | "[5, 8, 8, 8, 6, 2]\n", 688 | "[2, 7, 3, 2, 4]\n", 689 | "[4, 8, 6]\n", 690 | "[7, 8, 7, 3]\n", 691 | "[7, 2, 3, 3, 7, 7, 6, 2]\n", 692 | "[4, 5, 4, 7, 6, 5, 8]\n" 693 | ] 694 | } 695 | ], 696 | "source": [ 697 | "batch_size = 100\n", 698 | "\n", 699 | "batches = helpers.random_sequences(length_from=3, length_to=8,\n", 700 | " vocab_lower=2, vocab_upper=10,\n", 701 | " batch_size=batch_size)\n", 702 | "\n", 703 | "print('head of the batch:')\n", 704 | "for seq in next(batches)[:10]:\n", 705 | " print(seq)" 706 | ] 707 | }, 708 | { 709 | "cell_type": "code", 710 | "execution_count": 22, 711 | "metadata": { 712 | "collapsed": true 713 | }, 714 | "outputs": [], 715 | "source": [ 716 | "def next_feed():\n", 717 | " batch = next(batches)\n", 718 | " encoder_inputs_, _ = helpers.batch(batch)\n", 719 | " decoder_targets_, _ = helpers.batch(\n", 720 | " [(sequence) + [EOS] for sequence in batch]\n", 721 | " )\n", 722 | " decoder_inputs_, _ = helpers.batch(\n", 723 | " [[EOS] + (sequence) for sequence in batch]\n", 724 | " )\n", 725 | " return {\n", 726 | " encoder_inputs: encoder_inputs_,\n", 727 | " decoder_inputs: decoder_inputs_,\n", 728 | " decoder_targets: decoder_targets_,\n", 729 | " }" 730 | ] 731 | }, 732 | { 733 | "cell_type": "markdown", 734 | "metadata": {}, 735 | "source": [ 736 | "Given encoder_inputs `[5, 6, 7]`, decoder_targets would be `[5, 6, 7, 1]`, where 1 is for `EOS`, and decoder_inputs would be `[1, 5, 6, 7]` - decoder_inputs are lagged by 1 step, passing previous token as input at current step." 
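When the sequences in a batch have different lengths, padding enters the picture as well. A small worked example, computed by hand with the time-major padding shown earlier (PAD = 0, EOS = 1):

```python
# batch of two sequences of different lengths
batch = [[5, 6, 7], [2, 8]]

# encoder_inputs_ = helpers.batch(batch):
# [[5, 2],
#  [6, 8],
#  [7, 0]]

# decoder_targets_ = helpers.batch([[5, 6, 7, 1], [2, 8, 1]]):
# [[5, 2],
#  [6, 8],
#  [7, 1],
#  [1, 0]]

# decoder_inputs_ = helpers.batch([[1, 5, 6, 7], [1, 2, 8]]):
# [[1, 1],
#  [5, 2],
#  [6, 8],
#  [7, 0]]
```

Each column is one example; the shorter sequence is simply zero-padded after its EOS.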
737 | ] 738 | }, 739 | { 740 | "cell_type": "code", 741 | "execution_count": 23, 742 | "metadata": { 743 | "collapsed": true 744 | }, 745 | "outputs": [], 746 | "source": [ 747 | "loss_track = []" 748 | ] 749 | }, 750 | { 751 | "cell_type": "code", 752 | "execution_count": 24, 753 | "metadata": { 754 | "collapsed": false, 755 | "scrolled": false 756 | }, 757 | "outputs": [ 758 | { 759 | "name": "stdout", 760 | "output_type": "stream", 761 | "text": [ 762 | "batch 0\n", 763 | " minibatch loss: 2.3455774784088135\n", 764 | " sample 1:\n", 765 | " input > [4 3 2 0 0 0 0 0]\n", 766 | " predicted > [2 2 7 2 5 5 5 5 5]\n", 767 | " sample 2:\n", 768 | " input > [8 5 9 6 9 4 0 0]\n", 769 | " predicted > [2 1 1 9 9 9 9 9 9]\n", 770 | " sample 3:\n", 771 | " input > [3 3 3 6 8 2 6 0]\n", 772 | " predicted > [2 3 3 3 3 3 5 5 5]\n", 773 | "\n", 774 | "batch 1000\n", 775 | " minibatch loss: 0.3058355748653412\n", 776 | " sample 1:\n", 777 | " input > [3 3 5 9 3 4 6 7]\n", 778 | " predicted > [3 3 5 9 3 4 6 7 1]\n", 779 | " sample 2:\n", 780 | " input > [8 4 9 8 3 8 0 0]\n", 781 | " predicted > [8 4 8 8 3 8 1 0 0]\n", 782 | " sample 3:\n", 783 | " input > [6 5 7 0 0 0 0 0]\n", 784 | " predicted > [6 5 7 1 0 0 0 0 0]\n", 785 | "\n", 786 | "batch 2000\n", 787 | " minibatch loss: 0.1154913678765297\n", 788 | " sample 1:\n", 789 | " input > [5 4 8 5 0 0 0 0]\n", 790 | " predicted > [5 4 8 5 1 0 0 0 0]\n", 791 | " sample 2:\n", 792 | " input > [8 2 6 2 6 0 0 0]\n", 793 | " predicted > [8 2 6 2 6 1 0 0 0]\n", 794 | " sample 3:\n", 795 | " input > [6 2 9 5 0 0 0 0]\n", 796 | " predicted > [6 2 9 5 1 0 0 0 0]\n", 797 | "\n", 798 | "batch 3000\n", 799 | " minibatch loss: 0.09871210902929306\n", 800 | " sample 1:\n", 801 | " input > [6 7 9 6 0 0 0 0]\n", 802 | " predicted > [6 7 9 6 1 0 0 0 0]\n", 803 | " sample 2:\n", 804 | " input > [7 5 3 5 4 9 9 0]\n", 805 | " predicted > [7 5 3 5 4 9 9 1 0]\n", 806 | " sample 3:\n", 807 | " input > [2 5 6 2 4 9 7 6]\n", 808 | " predicted > [2 5 6 2 4 9 7 3 1]\n", 809 | "\n" 810 | ] 811 | } 812 | ], 813 | "source": [ 814 | "max_batches = 3001\n", 815 | "batches_in_epoch = 1000\n", 816 | "\n", 817 | "try:\n", 818 | " for batch in range(max_batches):\n", 819 | " fd = next_feed()\n", 820 | " _, l = sess.run([train_op, loss], fd)\n", 821 | " loss_track.append(l)\n", 822 | "\n", 823 | " if batch == 0 or batch % batches_in_epoch == 0:\n", 824 | " print('batch {}'.format(batch))\n", 825 | " print(' minibatch loss: {}'.format(sess.run(loss, fd)))\n", 826 | " predict_ = sess.run(decoder_prediction, fd)\n", 827 | " for i, (inp, pred) in enumerate(zip(fd[encoder_inputs].T, predict_.T)):\n", 828 | " print(' sample {}:'.format(i + 1))\n", 829 | " print(' input > {}'.format(inp))\n", 830 | " print(' predicted > {}'.format(pred))\n", 831 | " if i >= 2:\n", 832 | " break\n", 833 | " print()\n", 834 | "except KeyboardInterrupt:\n", 835 | " print('training interrupted')" 836 | ] 837 | }, 838 | { 839 | "cell_type": "code", 840 | "execution_count": 25, 841 | "metadata": { 842 | "collapsed": false 843 | }, 844 | "outputs": [ 845 | { 846 | "name": "stdout", 847 | "output_type": "stream", 848 | "text": [ 849 | "loss 0.0982 after 300100 examples (batch_size=100)\n" 850 | ] 851 | }, 852 | { 853 | "data": { 854 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXcAAAD8CAYAAACMwORRAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xl8FPX9x/HXJzf3lSA34RIFRNAIKEe9RbRFWv1Vbaul\nKtraqq1tf9RapbXWag+rrdWftdTWelWrFgWtRVFRK8gV7vsMBBKuJJCEXN/fH7uJSciSa7OzO3k/\nH488mJ2Z3fkMA+/Mznzn+zXnHCIi4i9xXhcgIiLhp3AXEfEhhbuIiA8p3EVEfEjhLiLiQwp3EREf\nUriLiPiQwl1ExIcU7iIiPpTg1YZTU1Ndenq6V5sXEYlJS5cu3e+cS6tvPc/CPT09nSVLlni1eRGR\nmGRmOxqyni7LiIj4kMJdRMSHFO4iIj6kcBcR8SGFu4iIDyncRUR8SOEuIuJDMRfu+48c46evr6G0\nvMLrUkREolbMhfs76/bxl4+28/GWA16XIiIStWIu3CcMCTx1u+dwkceViIhEr5gL9+4dkomPM3Yf\nUriLiIQSc+GeGB9H785t2HGw0OtSRESiVsyFO0D/bm3ZceCo12WIiEStmAz39G7t2L5f4S4iEkpM\nhnu39knkF5epOaSISAgxGe6Zuw4DsDIrz+NKRESiU0yG+1UZfQEo05m7iEidYjLcu7VLAuDN1Xs9\nrkREJDrFZLifld4VgINHSzyuREQkOsVkuMfFGWMGdGVvfrHXpYiIRKWYDHeAtA7J7C845nUZIiJR\nKXbDvX0yuQp3EZE6xW64d0im4FgZRSXlXpciIhJ1YjbcN+0rAOAPCzZ5XImISPSJ2XCfOqo3ACVl\nausuIlJbzIb7505OIyk+jrg487oUEZGoE7PhHhdnlJRX8H/vb/W6FBGRqBOz4V6duiEQEakppsP9\nmjH9ACircB5XIiISXWI63AektgUU7iIitcV0uCfEBcrXZRkRkZpiOtwrB+s4cqzM40pERKJLTIf7\ni0t2ATDlkYUeVyIiEl1iOty/MrY/APnFOnMXEakupsP9qow+AEw5rYfHlYiIRJd6w93M+prZAjNb\na2ZrzOz2OtYxM3vUzDab2UozO6Nlyq2pY0oiAG9pRCYRkRoSGrBOGXCnc26ZmXUAlprZf5xza6ut\ncykwJPgzFng8+GdEqCWkiEhN9Z65O+eynXPLgtMFwDqgd63VpgJ/cwGfAJ3NrGfYqxURkQZp1DV3\nM0sHRgOLai3qDeyq9jqL438BtCjndPouIlKpweFuZu2BfwJ3OOfym7IxM5thZkvMbElubm5TPuI4\n00YHfodszjkSls8TEfGDBoW7mSUSCPZnnXOv1LHKbqBvtdd9gvNqcM496ZzLcM5lpKWlNaXe48xd\nmQ3Ay8uywvJ5IiJ+0JDWMgb8GVjnnPttiNXmANcFW82MA/Kcc9lhrDOkjm0C94Tzi9TWXUSkUkNa\ny4wHvgasMrMVwXl3Af0AnHNPAPOAKcBmoBCYHv5S69anS1v2HymhqEThLiJSqd5wd859CJxwuCMX\nuJt5a7iKaownvnom4x54h0Fp7b3YvIhIVIrpJ1QBenRKoW1SPHlFpV6XIiISNWI+3AE6t0nksMJd\nRKSKL8K9TVI8q3fneV2GiEjUaMgN1ai3JfcoAEUl5bRJive4GhER7/nizL1SiUZkEhEBfBbuBcW6\n7i4iAj4J9xG9OwLwm7c3elyJiEh08EW4V7ZxX77zkMeViIhEB1+Ee4eUwH3h7QcKPa5ERCQ6+CLc\nZ156qtcliIhEFV+Ee/vkBC47LTA2iPp1FxHxSbgDzF0V6IRy3iqNpyoi4ptwr3Trc8u8LkFExHO+\nC3cREfFRuM+7bWLVdKmeVBWRVs434T6sV8eq6eLScg8rERHxnm/CvbriUp25i0jr5stwP1amM3cR\nad18Ge6rd+d7XYKIiKd8Ge53vbrK6xJERDzly3AvU2sZEWnlfBXuw4MtZvKLy3j4P+r+V0RaL1+F\n+9++MaZq+pF3NnlYiYiIt3wV7kkJvtodEZEm81UaxseZ1yWIiEQFX4V7m8R4r0sQEYkKvgp3M525\ni4iAz8K9trdWq293EWmdfB3ut/x9qdcliIh4wnfh/sPJQ2u81rB7ItIa+S7cv3XuYGZMGlj1emVW\nnofViIh4w3fhDvDp9oNV0yXqikBEWiFfhvvynYerpnMLjnlYiYiIN3wZ7vdPG1E1/a1nl+m6u4i0\nOvWGu5nNNrMcM1sdYvm5ZpZnZiuCP/eEv8zG+dIZfWq8/uvH270pRETEIw05c38amFzPOgudc6OC\nPz9rflnNk1LrSdXH3tviUSUiIt6oN9ydcx8AB+tbL5rpsoyItDbhuuZ+jpmtNLM3zWx4mD4zbPYf\nKWH+2n1elyEiEjHhCPdlQD/n3Ejg98BroVY0sxlmtsTMluTm5oZh0w333OKdEd2eiIiXmh3uzrl8\n59yR4PQ8INHMUkOs+6RzLsM5l5GWltbcTTdKcWl5RLcnIuKlZoe7mfWwYHeMZjYm+JkHmvu54fbx\nlqgrSUSkxTSkKeTzwH+BoWaWZWY3mNktZnZLcJUrgdVmlgk8ClztouAO5u0XDPG6BBERzyTUt4Jz\n7pp6lv8B+EPYKgqTPl3aeF2CiIhnfPmEKkBdXx2i4AuFiEhE+DbcR/TqdNy8J97f6kElIiKR59tw\nH9arI9eO7Vdj3oNvrfeoGhGRyPJtuAP07JjidQkiIp7wdbjf/LlBXpcgIuIJX4d7UsLxu/fBxsg+\nGSsi4gVfh3tdHluw2esSRERaXKsL90XbYrqDSxGRBml14Q6QPnOu1yWIiLQo34f7iN4dvS5BRCTi\nfB/uXzi9l9cliIhEnO/Dffr4AfxsatSNHyIi0qJ8H+6J8XFcd3b6cfPTZ85l+/6jkS9IRCQCfB/u\nJzJ/3T7mrswmfeZcdh0s9LocEZGwaTXhvv6+yXzvopNrzCstd7yyLAuADXsLvChLRKRFtJpwT0mM\n57ZaA3g8+NZ6FmzIASCu1fxNiEhr0OojrSLYxXtwpEAREV9odeF+92Wn1jk/XuEuIj7S6sK9fXLd\nIwvGKdxFxEdaXbiHEqdsFxEfaXXhHmoU1U+3H4poHSIiLan1hXuIdH/8fXUFLCL+0erCPZTi0gqv\nSxARCZtWF+4dUuq+oQqQnVcUwUpERFpOqwv3y07ryS+mncZDV448btnZD7xLfnGpB1WJiIRX6NNY\nn4qLM64d2w+AbfuP8vh7W2osz8k/RseURC9KExEJm1Z35l4fNXcXET9o1eGuHBcRv2rV4V4XBb6I\n+IHCvZbzf/M+6TPncriwxOtSRESarFWHe1JC6N3/+yc7IliJiEh4tepwv3nSIG6eNLDOZR9vORDh\nakREwqdVh3ubpHh+NOVUzuzf5bhlCncRiWWtOtwr/ePms70uQUQkrFrdQ0x1iQ/R3++Db62nvMJx\nx4VDaJukvyoRiR31JpaZzQYuB3KccyPqWG7AI8AUoBD4
unNuWbgL9ULl06spCXF896KTNRSfiMSM\nhlyWeRqYfILllwJDgj8zgMebX1bk3TBhQMhlj767mV+/vSGC1YiINE+94e6c+wA4eIJVpgJ/cwGf\nAJ3NrGe4CoyUn1w+jH/dOj7k8r/9V00jRSR2hOOGam9gV7XXWcF5xzGzGWa2xMyW5ObmhmHT4XV6\n384hlxUUl7FmT14EqxERabqItpZxzj3pnMtwzmWkpaVFctMNtuwnF4VcdtmjH7J+b34EqxERaZpw\nhPtuoG+1132C82JS13ZJjOzTKeRyjbUqIrEgHOE+B7jOAsYBec657DB8rmdenBG63Xt5uYbjE5Ho\n15CmkM8D5wKpZpYF3AskAjjnngDmEWgGuZlAU8jpLVVspLRJig+5rKwixAjbIiJRpN5wd85dU89y\nB9watoqixKST0/hg4/E3fXccKPSgGhGRxlH3A430zCc7KNfZu4hEOYV7CANT24VcNuiueazKUrNI\nEYleCvcQfjTlFJ6eflbI5bM/2hbBakREGkfhHkJyQjznDu0ecnlSvP7qRCR6KaGaqKi03OsSRERC\nUrjXo2u7pDrnz8ncQ6ChENz3xlrueGF5JMsSETkhhXs93vjOBO6fNoK2SfFcdWafGsvufCmTT7Ye\n4M8fbuO1FXs8qlBE5HgagaIevTq34Stj+/OVsf2Z/WHNm6ivLNvNK8titqcFEfExnbk3QkK8BusQ\nkdigcG+EhDj9dYlIbFBaNUJCiLFWRUSijcK9ES4efhKn9uwYcvntLyznWFk5RSVqJiki3rLK5nyR\nlpGR4ZYsWeLJtpsrfebcetdZOetiOqYkRqAaEWlNzGypcy6jvvV05t4EN55gMO1KI2e9TUFxaQSq\nERE5nsK9CWZMGtig9Q4cKWnhSkRE6qZwb0GlGrVJRDyicG+CyrsU3Tsks/n+S0OuN/uj7VSo73cR\n8YDCvQkqgjehzSAhPo5hIVrQPL94J798a30kSxMRARTuTVLZwMgItHufPj495LpPfrCV8379Hr95\newO7DmqIPhGJDIV7E6R1SGZE7448eOVIAK7K6HvC9bftP8rv393MLX9fGonyRETUcVhTJMbH8cZ3\nJjb6fWv25LdANSIix9OZe4S9tGSXnmAVkRancI+wH7y8kh/+cyVlaiYpIi1Il2XCZP73JpEUH8/a\n7Px6r62/nrmHeIMHrxxJckJ8hCoUkdZEZ+5hMrh7B/p1a8uEIakAfHVcv5BD9AG8tmIP189eTL66\nKBCRFqCOw1pATn4xXdslkVdUypef/ITNOUdOuP4fv3IGU07rGaHqRCSWqeMwD3XvmEJCfBzd2ifz\n7zsm0b1D8gnXX7hpf4QqE5HWQuHewuLjjMU/vpCFPzwv5Dql5RV8svWAbrKKSNjohmqE9O3aNuSy\nl5dm8fLSLAB++oXhZKR34aSOKaS2P/EZv4hIKAp3Dzz85dP57ouZdS67d84aAOIMtj5wWSTLEhEf\n0WWZCHrt1vG8/u0JTBvdp951KxzkFhyrMa+guJTNOQUtVZ6I+IjO3CNoVN/OjVp/4kPvMqR7B26Y\nMIArRvdm3C/e4WhJOdt/qTN6ETkxhXsUKy6tYNXuPO54cQXr9xZwVN0WiEgD6bKMRxbfdQH3XD6s\nwes/8f6WFqxGRPymQeFuZpPNbIOZbTazmXUsP9fM8sxsRfDnnvCX6i/dO6bwjQYMtF2XNXvyqKhw\nePUAmohEv3ovy5hZPPAYcBGQBXxqZnOcc2trrbrQOXd5C9QotVz26IcAnH9Kd2Z//SwApv3xI6aN\n7s11Z6d7WJmIRIuGnLmPATY757Y650qAF4CpLVuWNMS763OqppfvPMw9/1rjYTUiEk0aEu69gV3V\nXmcF59V2jpmtNLM3zWx4XR9kZjPMbImZLcnNzW1CuVJbRYWjuPT4G61ZhwrZuE/NJkVaq3C1llkG\n9HPOHTGzKcBrwJDaKznnngSehEDHYWHadky774oR7M0r4rEFWzitdydeuuVs5mTu4fMje3HqPW/V\n+/6Bd807bt4f39vMQ29tAFCzSZFWqiHhvhuoPkhon+C8Ks65/GrT88zsj2aW6pxTj1j1+Nq4/jjn\nSO/Wjs+f3ouUxHj+Jzgm6/ZfXkb6zLmN+ryy8oqqYBeR1qshl2U+BYaY2QAzSwKuBuZUX8HMepiZ\nBafHBD/3QLiL9Ssz46qMvqQkHj9wx3fOH9yoz7rjxRU1XqfPnMvynYeaVZ+IxJ4G9ecevNTyOyAe\nmO2cu9/MbgFwzj1hZt8GvgmUAUXA95xzH5/oM/3cn3u4FZWU88nWA8xdlV3VwVhT/N/XzuSS4T3C\nWJmIRFpD+3PXYB0x5In3t/DLN9c36zPe+M4E3lmXw+0XBm6J/OClTNJT23HreY37hiAi3tBgHT40\ndkDXGq+b0iXw5b//kIfnb6xqYfPS0ix+9W9doxfxG/UtE0NG9+vCxp9fyt68YkrKy3l/437ue6P2\ns2QNc8cLK/h0+8Gq109/tI2vj2/aE7MiEn0U7jEmKSGOft0CA38MSmvPwk25vLeh8c8MvLVmb43X\ns15fy46Dhdxxwcl0apsYllpFxDu65u4DW3KPsHFvAR9syuX5xbvqf0M9Pp55Pn98bzM/uOQUzGDx\n1oOcd0p34uMsDNWKSHPohmortXp3Hpf//sOwf+7dl53KjRMHHje/uLScpPg44hT8IhGhG6qt1Ije\nnaqmv3hGXb1ENM3P565j1pyafddUVDhO+clb3DtnDQs35ZKTXxy27YlI8+iauw898dUzOFxYyiXD\ne/DKssDDxN3aJXHgaEmzPvfpj7ez48BRJp2cxmm9O7F+b6DvmucW7+SZT3bQp0sbPvzf85tdv4g0\nn8LdhyaP6Fk1/fT0sxjdtwud2iZy2aMLWbMnv8a6XdomcqiwtMGfvWBDLgtq3cAtrwhc2ss6VETW\noUI27TvCead0b8YeiEhz6bKMz507tHtV65e/fmMMt1XrzmDGpIEsv+diXpwxLmzbm/DgAqY//ekJ\n13lu0U5WZh0O2zZF5Hg6c29FUtsn872Lh3JVRl/e25DD14IDeyQlhP93/OrdeVXX//MKS3l1eRbX\nnZ2OGdz16ipAPVaKtCSFeyvUt2vbqmCHwE3YL47uzeo9eWzcd4T7rhjBpSN6kFtwjEsfWdikbVS2\n2Ln6rL5kZuWxLjufWa+v5aoz+4RjF0SkHmoKKVUKikt5ZP4mfjB5KMkJgR4q31yVzTefXdZi20xt\nn8S73z+XFTsPc++cNWzbf5RPf3whaR0a37WCSGugdu4SNpV9yn/+9F68nrmnxbfXLimeT+++kLZJ\ngS+WR46V0T75sy+ZpeUVxJnpoSppldTOXcKm8pr8I18eVWP+y7ec3SLbO1pSzrB7/k36zLk8/dE2\nRtz7bz7eEhj35ca/LmHIj99kUB0jUDXU0x9tY0oTLzeJxAqFu9Rryd0XsvwnFxEXZzVugo7s05ne\nndtUve7ZKSX
s2571eqBjtGv/tIj0mXOZv25f1bLi0nLmrcrm+tmLKa9wTP7dB7y0ZBcVFY7DhSWE\n+lY66/W1rM3Or3OZiF/osow02rxV2Wzbf7SqD/iC4lKOlVVgQHZeMb9+e0ONzsz6d2vLjgOFYa/j\n9guG8Mg7m4C62+s/9KWR/M9Zn40Q+dv/bOT1zD1s238UgA0/n8yWnKN0bJPA0h2HmDoqfE/0irQU\nXXMXTy3dcYgvPf4xT3z1DCaP6MmmfQV8/S+fsvtwUcRquHxkT/5w7RkAHCsrZ+jddQ843jYpnsKS\ncp67aSznDEqtmu+c49dvb+CxBVvY9sAUgiNJinhK4S6eyyssrdF9cEWFY2DwWvnKWRfz6baD3PDX\n6Ps3cEqPDmQdKuLb5w+uGvlqWM+OvHrrOQA8898d9OiUwuThPXji/S1MHz+AdsEbvhUVLmQnasWl\n5Xz/pUzuuHAI723IZfr4AbopLI2mcJeoVNnypvLafXFpOZc9upAbJw7kmjH9qpbHgkeuHsXtLwQG\nJN/6iyms3J3HFY99xPM3jaNv1zaktk8mJTGenPxibnpmKZm7aj6V+9v/OZ0vnqF2/9I4DQ13PcQk\nETXvtol0SPnsn11KYjzv3Hlu1etFd13A0x9vZ8LgVJ5auJXObZN4dfluDyqt3/vV7iu8tmI33/tH\nJgDX/OkTAC4edhJPXpfB84t3HRfsAIUl5Y3a3uacI6R3a0tCvNpBSP105i5R763VeykqLaNz2yT6\nd23L+b95/7h1bvncIJ54f4sH1TXdz6YO57rgk8IfbMxlzICuPLdoJ9eO7UdKYjyLth4gtUMyg9La\ns2jrAb78ZOCXxsQhqTx1fQZPLdzGjRMHVD1w5pV9+cVM/8un/PnrGfTs1Kb+N0iz6LKM+Npv3t7A\noq0HeX7GOOIMzAznHNv2HyUxPo6JDy3wusQGe+6msVz7p0U15v3qypH84OWVACz7yUUsWJ/DnS9l\nVi3/7oUn8/D8jcQZPHV9BucN7c63n1/OZaf1ZMppgV5BDxeW0CYpnuSEeFbsOswry7K48+Kh/HTO\nGu79wnA6tWnecIr5xaW8tXove/OK+e1/NjJxSCrP3DC2WZ8p9VO4S6u2aV8Ba/bkc8eLgWvi1Z+u\nTW2fzP4jx5g8vAfXnd2fbu2TueR3H3hZbr3uu2IEP3ltdcjlf5l+FtP/EuiNc/zgbny0+UDVsvun\njeC+N9ZSXFrBN8YPYPZH27hxwgCy84qZ9YXhrM3OZ+n2g3zv4qGNqmnUz97mcGEp5wzqxsdbAttb\nOetiOqZoDN6WpHAXAf61Yjep7ZMZPziVbfuPUlZewZCTOhy33j+XZpGe2paTT+rAabPe9qBS7908\naSC9Orfh3jlruHZsP34x7TTKKxy3PruMb547iNP7dq6xfqib3w9dOZIfvrySdT+bTJuk5l8yKiop\nJ6egmP7d2jX7s/xA4S7SRO9tyGF0vy50TEngi49/zA0TBvDfLQc4vW9nzhnUjfXZBRSWlvPgm+sj\n2m7fC8N7dawa4GXp3Reyek8+iXHGgaMlfOf55Sd877M3jmX84NSQy99es5e9+cVcd3Y6xaXlPLto\nJxOHpJKdV8yEwam8unw300b35ut/WczCTftDPmtw9FgZCzftZ/KIHs3b2aCiknIqnKtq3hptFO4i\nEfDNvy/lzdV7eXHGOAaktmNtdj5d2yWx53ARWYeK+PncdQB8+L/nccnDH3C0pJz7p43gx6+GvsTi\nN9NG9+bV5bu5fGRP3liZzeh+nfnlF0dWXQpbeveFnPnz+TXeU3lpaUTvjqzeHfjl8odrR5PWPpmx\nA7sBUFJWwfYDR/n9u5t5PXMPb393Etv3H+Xd9TnkF5fy0JWnU1hSxv6CEob16ghQ1SXFroNFTPrV\nAl791jmM7tcFgDv/kcnGfQWs2p0HfNZc98VPdzKqbxeG9jj+G19dnlu0k0Fp7arq3HWwkJTEeNI6\nJPODlzKZOqo3E4aE/qVXH4W7SAQcLixhTuYevjauf51nlU8t3Mrg7u05d2h3Nucc4bbnl/P8TeMo\nLC3j7Afe5bmbxlJSVsGczD28smw3XdslcfAEY90+d9NYRvXtzH+3HIjKB8Ai4aoz+5B75FiNLi4A\nBndvz+acI3W+59yhaazPLmBvrUHcrxnTl1F9O3P2wFQm/armTfj0bm156voMLvxt4JfQtgem8NTC\nbZx3SndeWLyT8YNTq4aTLK9wHDxaQlqH5OOe5ah8ve2BKQz40bway5pC4S4Sw97fmEuFc2TuOswt\nnxvE3rxiUhLj6RGic7bKAJl320SmPFqzx8vPn96LopIy5q/LqZp3Usdk9uUfa7kd8IlObRLJKwr0\nWfTijHFVzVGru2ZMP7Lzio77ZTN1VC8euXp01bFZdNcFjP3FOwDMvW0Cw3t1alJNCneRVmTD3gJS\nEuNq3HTMKyoFR40uILbvP4oZ9O/Wjn+v2cvNzyzlmRvGsGnfEX72xlruvOhkBnVvz7eCA7S0TYrn\nX7eO56KH629NNKpvZ+6bOoKlOw5W9ebZ2o0Z0JXF2w4eN79NYjzr7pvcpM/UE6oirUhd14Prasee\nnvpZ+F8yvEfV5YGJQ9LISO/CiF6dMAu0nBkzoCsXnHoSABee2p2z0rvytbP7k5wQzwcbc0lKiOMr\nTy3ikatHcWb/LnRpm0S75ARO69OJw0Wl/G7+phPW/I3xA/j+JSdz8zNLGdazIzMvPYWCY2Wc9fP5\nHCurqFqveu+fsaauYAcoKm3c08lNoTN3EWmywpKyqhGzalu+8xCn9+nM91/OZNHWg3w08/wavXOe\n6LpzTvDaeFqH5Kp7GVtyj3BB8OnkL2f05Z31Odw0cQDbDxQyZkAXpo3uw/b9R/np62sY2adzVP9C\nqLxk0xS6LCMiUWlddj6J8XEM7t6+0e89cOQYZkbXdkn1rvvI/E2M7teZQ4Ul3P7CCi4edhKPfeUM\ncgqOsS+/mF6d2mAGr2fu4az0rjw8fyObc46QdSjQvPXp6Wdx7tDuTHpoATsPBsYjuHZsP+6/YgTX\nzV5MpzaJ3DBhAB3bJPLDl1dy4Mgxtjdg3II7LzqZmyYNJCWxac8AKNxFRMLEOce+/GMhb2hDoLvn\n5xbv5Moz+7A55wh9u7SlQ0oC76zP4R9LdvH7a0Y3OdCrU7iLiPhQWAfINrPJZrbBzDab2cw6lpuZ\nPRpcvtLMzmhK0SIiEh71hruZxQOPAZcCw4BrzGxYrdUuBYYEf2YAj4e5ThERaYSGnLmPATY757Y6\n50qAF4CptdaZCvzNBXwCdDaznmGuVUREGqgh4d4b2FXtdVZwXmPXwcxmmNkSM1uSm5tbe7GIiIRJ\nRMfrcs496ZzLcM5lpKWlRXLTIiKtSkPCfTfQt9rrPsF5jV1HREQipCHh/ikwxMwGmFkScDUwp9Y6\nc4Drgq1mxgF5zrnsMNcqIiINVG/fMs65MjP7NvBvIB6Y7ZxbY2a3BJc/
AcwDpgCbgUJgesuVLCIi\n9fHsISYzywV2NPHtqcD+MJbjJe1LdPLLvvhlP0D7Uqm/c67em5aehXtzmNmShjyhFQu0L9HJL/vi\nl/0A7UtjRbS1jIiIRIbCXUTEh2I13J/0uoAw0r5EJ7/si1/2A7QvjRKT19xFROTEYvXMXURETiDm\nwr2+7oejjZltN7NVZrbCzJYE53U1s/+Y2abgn12qrf+j4L5tMLNLvKsczGy2meWY2epq8xpdu5md\nGfw72BzsGtqiZF9mmdnu4LFZYWZTon1fzKyvmS0ws7VmtsbMbg/Oj7njcoJ9icXjkmJmi80sM7gv\nPw3O9+64OOdi5ofAQ1RbgIFAEpAJDPO6rnpq3g6k1pr3EDAzOD0TeDA4PSy4T8nAgOC+xntY+yTg\nDGB1c2oHFgPjAAPeBC6Nkn2ZBXy/jnWjdl+AnsAZwekOwMZgvTF3XE6wL7F4XAxoH5xOBBYF6/Hs\nuMTamXtDuh+OBVOBvwan/wpcUW3+C865Y865bQSe+B3jQX0AOOc+AGoP396o2i3Q9XNH59wnLvAv\n92/V3hMxIfYllKjdF+dctnNuWXC6AFhHoAfWmDsuJ9iXUKJ5X5xz7kjwZWLwx+HhcYm1cG9Q18JR\nxgHzzWxgd9zgAAACEklEQVSpmc0IzjvJfdb3zl7gpOB0LOxfY2vvHZyuPT9afMcCo4fNrvaVOSb2\nxczSgdEEzhJj+rjU2heIweNiZvFmtgLIAf7jnPP0uMRauMeiCc65UQRGq7rVzCZVXxj87RyTTZZi\nufagxwlc4hsFZAO/8bachjOz9sA/gTucc/nVl8XacaljX2LyuDjnyoP/1/sQOAsfUWt5RI9LrIV7\nzHUt7JzbHfwzB3iVwGWWfcGvXwT/zAmuHgv719jadwena8/3nHNuX/A/ZAXwJz67BBbV+2JmiQTC\n8Fnn3CvB2TF5XOral1g9LpWcc4eBBcBkPDwusRbuDel+OGqYWTsz61A5DVwMrCZQ8/XB1a4H/hWc\nngNcbWbJZjaAwJi0iyNbdb0aVXvwK2m+mY0L3vW/rtp7PGU1h4KcRuDYQBTvS3C7fwbWOed+W21R\nzB2XUPsSo8clzcw6B6fbABcB6/HyuETyjnI4fgh0LbyRwN3lH3tdTz21DiRwRzwTWFNZL9ANeAfY\nBMwHulZ7z4+D+7YBD1qV1Kr/eQJfi0sJXPu7oSm1AxkE/oNuAf5A8OG5KNiXZ4BVwMrgf7ae0b4v\nwAQCX+1XAiuCP1Ni8bicYF9i8biMBJYHa14N3BOc79lx0ROqIiI+FGuXZUREpAEU7iIiPqRwFxHx\nIYW7iIgPKdxFRHxI4S4i4kMKdxERH1K4i4j40P8DLbGt5O66KzYAAAAASUVORK5CYII=\n", 855 | "text/plain": [ 856 | "" 857 | ] 858 | }, 859 | "metadata": {}, 860 | "output_type": "display_data" 861 | } 862 | ], 863 | "source": [ 864 | "%matplotlib inline\n", 865 | "import matplotlib.pyplot as plt\n", 866 | "plt.plot(loss_track)\n", 867 | "print('loss {:.4f} after {} examples (batch_size={})'.format(loss_track[-1], len(loss_track)*batch_size, batch_size))" 868 | ] 869 | }, 870 | { 871 | "cell_type": "markdown", 872 | "metadata": {}, 873 | "source": [ 874 | "Something is definitely getting learned." 875 | ] 876 | }, 877 | { 878 | "cell_type": "markdown", 879 | "metadata": {}, 880 | "source": [ 881 | "# Limitations of the model\n", 882 | "\n", 883 | "We have no control over transitions of `tf.nn.dynamic_rnn`, it is unrolled in a single sweep. Some of the things that are not possible without such control:\n", 884 | "\n", 885 | "- We can't feed previously generated tokens without falling back to Python loops. This means *we cannot make efficient inference with dynamic_rnn decoder*!\n", 886 | "\n", 887 | "- We can't use attention, because attention conditions decoder inputs on its previous state\n", 888 | "\n", 889 | "Solution would be to use `tf.nn.raw_rnn` instead of `tf.nn.dynamic_rnn` for decoder, as we will do in tutorial #2. " 890 | ] 891 | }, 892 | { 893 | "cell_type": "markdown", 894 | "metadata": {}, 895 | "source": [ 896 | "# Fun things to try (aka Exercises)\n", 897 | "\n", 898 | "- In `copy_task` increasing `max_sequence_size` and `vocab_upper`. Observe slower learning and general performance degradation.\n", 899 | "\n", 900 | "- For `decoder_inputs`, instead of shifted target sequence `[ W X Y Z]`, try feeding `[ ]`, like we've done when we tested forward pass. Does it break things? Or slows learning?" 
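One way to try the second exercise is a small variant of `next_feed` in which only `decoder_inputs_` changes: the decoder gets a single EOS followed by PADs, as in the forward-pass test, instead of the shifted target sequence. This is just a sketch of the exercise setup, not part of the original notebook:

```python
def next_feed_eos_only():
    """Variant of next_feed for the exercise: decoder inputs are EOS followed by PADs."""
    batch = next(batches)
    encoder_inputs_, _ = helpers.batch(batch)
    decoder_targets_, _ = helpers.batch(
        [(sequence) + [EOS] for sequence in batch]
    )
    # same time dimension as decoder_targets_ (len(sequence) + 1),
    # but only the first step carries a real token
    decoder_inputs_, _ = helpers.batch(
        [[EOS] + [PAD] * len(sequence) for sequence in batch]
    )
    return {
        encoder_inputs: encoder_inputs_,
        decoder_inputs: decoder_inputs_,
        decoder_targets: decoder_targets_,
    }
```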
901 | ] 902 | } 903 | ], 904 | "metadata": { 905 | "kernelspec": { 906 | "display_name": "Python 3", 907 | "language": "python", 908 | "name": "python3" 909 | }, 910 | "language_info": { 911 | "codemirror_mode": { 912 | "name": "ipython", 913 | "version": 3 914 | }, 915 | "file_extension": ".py", 916 | "mimetype": "text/x-python", 917 | "name": "python", 918 | "nbconvert_exporter": "python", 919 | "pygments_lexer": "ipython3", 920 | "version": "3.6.0" 921 | } 922 | }, 923 | "nbformat": 4, 924 | "nbformat_minor": 2 925 | } 926 | -------------------------------------------------------------------------------- /2-seq2seq-advanced.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Advanced dynamic seq2seq with TensorFlow" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Encoder is bidirectional now. Decoder is implemented using `tf.nn.raw_rnn`. \n", 15 | "It feeds previously generated tokens during training as inputs, instead of target sequence." 16 | ] 17 | }, 18 | { 19 | "cell_type": "markdown", 20 | "metadata": {}, 21 | "source": [ 22 | "**UPDATE (16.02.2017)**: I learned some things after I wrote this tutorial. In particular:\n", 23 | " - [DONE] Replacing projection (one-hot encoding followed by linear layer) with embedding (indexing weights of linear layer directly) is more efficient.\n", 24 | " - When decoding, feeding previously generated tokens as inputs adds robustness to model's errors. However feeding ground truth speeds up training. Apperantly best practice is to mix both randomly when training.\n", 25 | "\n", 26 | "I will update tutorial to reflect this at some point." 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 1, 32 | "metadata": { 33 | "collapsed": true 34 | }, 35 | "outputs": [], 36 | "source": [ 37 | "import numpy as np\n", 38 | "import tensorflow as tf\n", 39 | "import helpers\n", 40 | "\n", 41 | "tf.reset_default_graph()\n", 42 | "sess = tf.InteractiveSession()" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 2, 48 | "metadata": { 49 | "collapsed": false 50 | }, 51 | "outputs": [ 52 | { 53 | "data": { 54 | "text/plain": [ 55 | "'1.3.0'" 56 | ] 57 | }, 58 | "execution_count": 2, 59 | "metadata": {}, 60 | "output_type": "execute_result" 61 | } 62 | ], 63 | "source": [ 64 | "tf.__version__" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": 3, 70 | "metadata": { 71 | "collapsed": true 72 | }, 73 | "outputs": [], 74 | "source": [ 75 | "PAD = 0\n", 76 | "EOS = 1\n", 77 | "\n", 78 | "vocab_size = 10\n", 79 | "input_embedding_size = 20\n", 80 | "\n", 81 | "encoder_hidden_units = 20\n", 82 | "decoder_hidden_units = encoder_hidden_units * 2" 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": 4, 88 | "metadata": { 89 | "collapsed": false 90 | }, 91 | "outputs": [], 92 | "source": [ 93 | "encoder_inputs = tf.placeholder(shape=(None, None), dtype=tf.int32, name='encoder_inputs')\n", 94 | "encoder_inputs_length = tf.placeholder(shape=(None,), dtype=tf.int32, name='encoder_inputs_length')\n", 95 | "\n", 96 | "decoder_targets = tf.placeholder(shape=(None, None), dtype=tf.int32, name='decoder_targets')" 97 | ] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "metadata": {}, 102 | "source": [ 103 | "Previously we elected to manually feed `decoder_inputs` to better understand what is going on. 
Here we implement decoder with `tf.nn.raw_rnn` and will construct `decoder_inputs` step by step in the loop." 104 | ] 105 | }, 106 | { 107 | "cell_type": "markdown", 108 | "metadata": {}, 109 | "source": [ 110 | "## Embeddings\n", 111 | "Setup embeddings (see tutorial 1)" 112 | ] 113 | }, 114 | { 115 | "cell_type": "code", 116 | "execution_count": 5, 117 | "metadata": { 118 | "collapsed": false 119 | }, 120 | "outputs": [], 121 | "source": [ 122 | "embeddings = tf.Variable(tf.random_uniform([vocab_size, input_embedding_size], -1.0, 1.0), dtype=tf.float32)\n", 123 | "\n", 124 | "encoder_inputs_embedded = tf.nn.embedding_lookup(embeddings, encoder_inputs)" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": {}, 130 | "source": [ 131 | "## Encoder\n", 132 | "\n", 133 | "We are replacing unidirectional `tf.nn.dynamic_rnn` with `tf.nn.bidirectional_dynamic_rnn` as the encoder.\n" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": 6, 139 | "metadata": { 140 | "collapsed": true 141 | }, 142 | "outputs": [], 143 | "source": [ 144 | "from tensorflow.contrib.rnn import LSTMCell, LSTMStateTuple" 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": 7, 150 | "metadata": { 151 | "collapsed": true 152 | }, 153 | "outputs": [], 154 | "source": [ 155 | "encoder_cell = LSTMCell(encoder_hidden_units)" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": 8, 161 | "metadata": { 162 | "collapsed": false 163 | }, 164 | "outputs": [], 165 | "source": [ 166 | "((encoder_fw_outputs,\n", 167 | " encoder_bw_outputs),\n", 168 | " (encoder_fw_final_state,\n", 169 | " encoder_bw_final_state)) = (\n", 170 | " tf.nn.bidirectional_dynamic_rnn(cell_fw=encoder_cell,\n", 171 | " cell_bw=encoder_cell,\n", 172 | " inputs=encoder_inputs_embedded,\n", 173 | " sequence_length=encoder_inputs_length,\n", 174 | " dtype=tf.float32, time_major=True)\n", 175 | " )" 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": 9, 181 | "metadata": { 182 | "collapsed": false 183 | }, 184 | "outputs": [ 185 | { 186 | "data": { 187 | "text/plain": [ 188 | "" 189 | ] 190 | }, 191 | "execution_count": 9, 192 | "metadata": {}, 193 | "output_type": "execute_result" 194 | } 195 | ], 196 | "source": [ 197 | "encoder_fw_outputs" 198 | ] 199 | }, 200 | { 201 | "cell_type": "code", 202 | "execution_count": 10, 203 | "metadata": { 204 | "collapsed": false 205 | }, 206 | "outputs": [ 207 | { 208 | "data": { 209 | "text/plain": [ 210 | "" 211 | ] 212 | }, 213 | "execution_count": 10, 214 | "metadata": {}, 215 | "output_type": "execute_result" 216 | } 217 | ], 218 | "source": [ 219 | "encoder_bw_outputs" 220 | ] 221 | }, 222 | { 223 | "cell_type": "code", 224 | "execution_count": 11, 225 | "metadata": { 226 | "collapsed": false 227 | }, 228 | "outputs": [ 229 | { 230 | "data": { 231 | "text/plain": [ 232 | "LSTMStateTuple(c=, h=)" 233 | ] 234 | }, 235 | "execution_count": 11, 236 | "metadata": {}, 237 | "output_type": "execute_result" 238 | } 239 | ], 240 | "source": [ 241 | "encoder_fw_final_state" 242 | ] 243 | }, 244 | { 245 | "cell_type": "code", 246 | "execution_count": 12, 247 | "metadata": { 248 | "collapsed": false 249 | }, 250 | "outputs": [ 251 | { 252 | "data": { 253 | "text/plain": [ 254 | "LSTMStateTuple(c=, h=)" 255 | ] 256 | }, 257 | "execution_count": 12, 258 | "metadata": {}, 259 | "output_type": "execute_result" 260 | } 261 | ], 262 | "source": [ 263 | "encoder_bw_final_state" 264 | ] 265 | }, 266 | { 267 | "cell_type": 
"markdown", 268 | "metadata": {}, 269 | "source": [ 270 | "Have to concatenate forward and backward outputs and state. In this case we will not discard outputs, they would be used for attention." 271 | ] 272 | }, 273 | { 274 | "cell_type": "code", 275 | "execution_count": 13, 276 | "metadata": { 277 | "collapsed": false 278 | }, 279 | "outputs": [], 280 | "source": [ 281 | "encoder_outputs = tf.concat((encoder_fw_outputs, encoder_bw_outputs), 2)\n", 282 | "\n", 283 | "encoder_final_state_c = tf.concat(\n", 284 | " (encoder_fw_final_state.c, encoder_bw_final_state.c), 1)\n", 285 | "\n", 286 | "encoder_final_state_h = tf.concat(\n", 287 | " (encoder_fw_final_state.h, encoder_bw_final_state.h), 1)\n", 288 | "\n", 289 | "encoder_final_state = LSTMStateTuple(\n", 290 | " c=encoder_final_state_c,\n", 291 | " h=encoder_final_state_h\n", 292 | ")" 293 | ] 294 | }, 295 | { 296 | "cell_type": "markdown", 297 | "metadata": {}, 298 | "source": [ 299 | "## Decoder" 300 | ] 301 | }, 302 | { 303 | "cell_type": "code", 304 | "execution_count": 14, 305 | "metadata": { 306 | "collapsed": false 307 | }, 308 | "outputs": [], 309 | "source": [ 310 | "decoder_cell = LSTMCell(decoder_hidden_units)" 311 | ] 312 | }, 313 | { 314 | "cell_type": "markdown", 315 | "metadata": {}, 316 | "source": [ 317 | "Time and batch dimensions are dynamic, i.e. they can change in runtime, from batch to batch" 318 | ] 319 | }, 320 | { 321 | "cell_type": "code", 322 | "execution_count": 15, 323 | "metadata": { 324 | "collapsed": true 325 | }, 326 | "outputs": [], 327 | "source": [ 328 | "encoder_max_time, batch_size = tf.unstack(tf.shape(encoder_inputs))" 329 | ] 330 | }, 331 | { 332 | "cell_type": "markdown", 333 | "metadata": {}, 334 | "source": [ 335 | "Next we need to decide how far to run decoder. There are several options for stopping criteria:\n", 336 | "- Stop after specified number of unrolling steps\n", 337 | "- Stop after model produced token\n", 338 | "\n", 339 | "The choice will likely be time-dependant. In legacy `translate` tutorial we can see that decoder unrolls for `len(encoder_input)+10` to allow for possibly longer translated sequence. Here we are doing a toy copy task, so how about we unroll decoder for `len(encoder_input)+2`, to allow model some room to make mistakes over 2 additional steps:" 340 | ] 341 | }, 342 | { 343 | "cell_type": "code", 344 | "execution_count": 16, 345 | "metadata": { 346 | "collapsed": false 347 | }, 348 | "outputs": [], 349 | "source": [ 350 | "decoder_lengths = encoder_inputs_length + 3\n", 351 | "# +2 additional steps, +1 leading token for decoder inputs" 352 | ] 353 | }, 354 | { 355 | "cell_type": "markdown", 356 | "metadata": {}, 357 | "source": [ 358 | "## Output projection\n", 359 | "\n", 360 | "Decoder will contain manually specified by us transition step:\n", 361 | "```\n", 362 | "output(t) -> output projection(t) -> prediction(t) (argmax) -> input embedding(t+1) -> input(t+1)\n", 363 | "```\n", 364 | "\n", 365 | "In tutorial 1, we used `tf.contrib.layers.linear` layer to initialize weights and biases and apply operation for us. This is convenient, however now we need to specify parameters `W` and `b` of the output layer in global scope, and apply them at every step of the decoder." 
366 | ] 367 | }, 368 | { 369 | "cell_type": "code", 370 | "execution_count": 17, 371 | "metadata": { 372 | "collapsed": false 373 | }, 374 | "outputs": [], 375 | "source": [ 376 | "W = tf.Variable(tf.random_uniform([decoder_hidden_units, vocab_size], -1, 1), dtype=tf.float32)\n", 377 | "b = tf.Variable(tf.zeros([vocab_size]), dtype=tf.float32)" 378 | ] 379 | }, 380 | { 381 | "cell_type": "markdown", 382 | "metadata": {}, 383 | "source": [ 384 | "## Decoder via `tf.nn.raw_rnn`\n", 385 | "\n", 386 | "`tf.nn.dynamic_rnn` allows for easy RNN construction, but is limited. \n", 387 | "\n", 388 | "For example, a nice way to increase robustness of the model is to feed it, as decoder inputs, the tokens that it previously generated, instead of the shifted true sequence.\n", 389 | "\n", 390 | "![seq2seq-feed-previous](pictures/2-seq2seq-feed-previous.png)\n", 391 | "*Image borrowed from http://www.wildml.com/2016/04/deep-learning-for-chatbots-part-1-introduction/*" 392 | ] 393 | }, 394 | { 395 | "cell_type": "markdown", 396 | "metadata": {}, 397 | "source": [ 398 | "First we prepare `EOS` and `PAD` token slices. The decoder will operate on column vectors of shape `(batch_size,)` representing single time steps of the batch." 399 | ] 400 | }, 401 | { 402 | "cell_type": "code", 403 | "execution_count": 18, 404 | "metadata": { 405 | "collapsed": false 406 | }, 407 | "outputs": [], 408 | "source": [ 409 | "assert EOS == 1 and PAD == 0\n", 410 | "\n", 411 | "eos_time_slice = tf.ones([batch_size], dtype=tf.int32, name='EOS')\n", 412 | "pad_time_slice = tf.zeros([batch_size], dtype=tf.int32, name='PAD')\n", 413 | "\n", 414 | "eos_step_embedded = tf.nn.embedding_lookup(embeddings, eos_time_slice)\n", 415 | "pad_step_embedded = tf.nn.embedding_lookup(embeddings, pad_time_slice)" 416 | ] 417 | }, 418 | { 419 | "cell_type": "markdown", 420 | "metadata": {}, 421 | "source": [ 422 | "Now for the tricky part.\n", 423 | "\n", 424 | "Remember that the standard `tf.nn.dynamic_rnn` requires all inputs `(t, ..., t+n)` to be passed in advance as a single tensor. The \"dynamic\" part of its name refers to the fact that `n` can change from batch to batch.\n", 425 | "\n", 426 | "Now, what if we want to implement a more complex mechanism, like when we want the decoder to receive previously generated tokens as input at every timestep (instead of the lagged target sequence)? Or when we want to implement soft attention, where at every timestep we add an additional fixed-length representation, derived from a query produced by the previous step's hidden state? `tf.nn.raw_rnn` is a way to solve this problem.\n", 427 | "\n", 428 | "The main part of specifying an RNN with `tf.nn.raw_rnn` is the *loop transition function*. It defines the inputs of step `t` given the outputs and state of step `t-1`.\n", 429 | "\n", 430 | "The loop transition function is a mapping `(time, previous_cell_output, previous_cell_state, previous_loop_state) -> (elements_finished, input, cell_state, output, loop_state)`. It is called *before* the RNNCell to prepare its inputs and state. Everything is a Tensor except for the initial call at time=0, when everything is `None` (except `time`).\n", 431 | "\n", 432 | "Note that decoder inputs are returned from the transition function but not passed into it. You are supposed to index inputs manually using the `time` Tensor." 433 | ] 434 | }, 435 | { 436 | "cell_type": "markdown", 437 | "metadata": {}, 438 | "source": [ 439 | "The loop transition function is called in two ways:\n", 440 | " 1. Initial call at time=0 to provide initial cell_state and input to RNN.\n", 441 | " 2. 
Transition call for all following timesteps, where you define the transition between two adjacent steps.\n", 442 | "\n", 443 | "Let's define both cases separately." 444 | ] 445 | }, 446 | { 447 | "cell_type": "markdown", 448 | "metadata": {}, 449 | "source": [ 450 | "The loop initial state is a function only of `encoder_final_state` and the embeddings:" 451 | ] 452 | }, 453 | { 454 | "cell_type": "code", 455 | "execution_count": 19, 456 | "metadata": { 457 | "collapsed": true 458 | }, 459 | "outputs": [], 460 | "source": [ 461 | "def loop_fn_initial():\n", 462 | " initial_elements_finished = (0 >= decoder_lengths) # all False at the initial step\n", 463 | " initial_input = eos_step_embedded\n", 464 | " initial_cell_state = encoder_final_state\n", 465 | " initial_cell_output = None\n", 466 | " initial_loop_state = None # we don't need to pass any additional information\n", 467 | " return (initial_elements_finished,\n", 468 | " initial_input,\n", 469 | " initial_cell_state,\n", 470 | " initial_cell_output,\n", 471 | " initial_loop_state)" 472 | ] 473 | }, 474 | { 475 | "cell_type": "markdown", 476 | "metadata": {}, 477 | "source": [ 478 | "Define the transition function such that the previously generated token (as judged in a greedy manner by `argmax` over the output projection) is passed as the next input." 479 | ] 480 | }, 481 | { 482 | "cell_type": "code", 483 | "execution_count": 20, 484 | "metadata": { 485 | "collapsed": false 486 | }, 487 | "outputs": [], 488 | "source": [ 489 | "def loop_fn_transition(time, previous_output, previous_state, previous_loop_state):\n", 490 | "\n", 491 | " def get_next_input():\n", 492 | " output_logits = tf.add(tf.matmul(previous_output, W), b)\n", 493 | " prediction = tf.argmax(output_logits, axis=1)\n", 494 | " next_input = tf.nn.embedding_lookup(embeddings, prediction)\n", 495 | " return next_input\n", 496 | " \n", 497 | " elements_finished = (time >= decoder_lengths) # this operation produces a boolean tensor of shape [batch_size]\n", 498 | " # defining whether the corresponding sequence has ended\n", 499 | "\n", 500 | " finished = tf.reduce_all(elements_finished) # -> boolean scalar\n", 501 | " input = tf.cond(finished, lambda: pad_step_embedded, get_next_input)\n", 502 | " state = previous_state\n", 503 | " output = previous_output\n", 504 | " loop_state = None\n", 505 | "\n", 506 | " return (elements_finished, \n", 507 | " input,\n", 508 | " state,\n", 509 | " output,\n", 510 | " loop_state)" 511 | ] 512 | }, 513 | { 514 | "cell_type": "markdown", 515 | "metadata": {}, 516 | "source": [ 517 | "Combine the initializer and transition functions and create the raw_rnn.\n", 518 | "\n", 519 | "Note that while all operations above are defined with TF's control flow and reduction ops, here we rely on checking whether the state is `None` to determine if it is an initializer call or a transition call. This is not a very clean API and it might be changed in the future (indeed, `tf.nn.raw_rnn`'s doc contains a warning that the API is experimental)."
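As an aside before the combined `loop_fn`: the `argmax` inside `get_next_input` gives greedy decoding, which is the simplest choice. Sampling from the predicted distribution is a common alternative (for example in scheduled-sampling-style training). A sketch of that swap, reusing the same `W`, `b` and `embeddings` (a hypothetical variant, not what this notebook uses):

```python
def get_next_input_sampled(previous_output):
    # Hypothetical alternative to the greedy argmax: sample the next token id
    # from the predicted distribution instead of taking the most likely one.
    output_logits = tf.add(tf.matmul(previous_output, W), b)    # [batch_size, vocab_size]
    sampled_ids = tf.multinomial(output_logits, num_samples=1)  # [batch_size, 1], int64 token ids
    prediction = tf.squeeze(sampled_ids, axis=1)                # [batch_size]
    return tf.nn.embedding_lookup(embeddings, prediction)       # [batch_size, embedding_size]
```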
520 | ] 521 | }, 522 | { 523 | "cell_type": "code", 524 | "execution_count": 21, 525 | "metadata": { 526 | "collapsed": false 527 | }, 528 | "outputs": [], 529 | "source": [ 530 | "def loop_fn(time, previous_output, previous_state, previous_loop_state):\n", 531 | " if previous_state is None: # time == 0\n", 532 | " assert previous_output is None and previous_state is None\n", 533 | " return loop_fn_initial()\n", 534 | " else:\n", 535 | " return loop_fn_transition(time, previous_output, previous_state, previous_loop_state)\n", 536 | "\n", 537 | "decoder_outputs_ta, decoder_final_state, _ = tf.nn.raw_rnn(decoder_cell, loop_fn)\n", 538 | "decoder_outputs = decoder_outputs_ta.stack()" 539 | ] 540 | }, 541 | { 542 | "cell_type": "code", 543 | "execution_count": 22, 544 | "metadata": { 545 | "collapsed": false 546 | }, 547 | "outputs": [ 548 | { 549 | "data": { 550 | "text/plain": [ 551 | "" 552 | ] 553 | }, 554 | "execution_count": 22, 555 | "metadata": {}, 556 | "output_type": "execute_result" 557 | } 558 | ], 559 | "source": [ 560 | "decoder_outputs" 561 | ] 562 | }, 563 | { 564 | "cell_type": "markdown", 565 | "metadata": {}, 566 | "source": [ 567 | "To do the output projection, we have to temporarily flatten `decoder_outputs` from `[max_steps, batch_size, hidden_dim]` to `[max_steps*batch_size, hidden_dim]`, since this `tf.matmul` expects rank-2 tensors." 568 | ] 569 | }, 570 | { 571 | "cell_type": "code", 572 | "execution_count": 23, 573 | "metadata": { 574 | "collapsed": false 575 | }, 576 | "outputs": [], 577 | "source": [ 578 | "decoder_max_steps, decoder_batch_size, decoder_dim = tf.unstack(tf.shape(decoder_outputs))\n", 579 | "decoder_outputs_flat = tf.reshape(decoder_outputs, (-1, decoder_dim))\n", 580 | "decoder_logits_flat = tf.add(tf.matmul(decoder_outputs_flat, W), b)\n", 581 | "decoder_logits = tf.reshape(decoder_logits_flat, (decoder_max_steps, decoder_batch_size, vocab_size))" 582 | ] 583 | }, 584 | { 585 | "cell_type": "code", 586 | "execution_count": 24, 587 | "metadata": { 588 | "collapsed": false, 589 | "scrolled": false 590 | }, 591 | "outputs": [], 592 | "source": [ 593 | "decoder_prediction = tf.argmax(decoder_logits, 2)" 594 | ] 595 | }, 596 | { 597 | "cell_type": "markdown", 598 | "metadata": {}, 599 | "source": [ 600 | "### Optimizer" 601 | ] 602 | }, 603 | { 604 | "cell_type": "markdown", 605 | "metadata": {}, 606 | "source": [ 607 | "The RNN outputs a tensor of shape `[max_time, batch_size, hidden_units]`, which the projection layer maps onto `[max_time, batch_size, vocab_size]`. The `vocab_size` part of the shape is static, while `max_time` and `batch_size` are dynamic."
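One caveat about the loss in the next cell: `tf.reduce_mean` averages the cross-entropy over every `[max_time, batch_size]` position, including the `PAD` steps after `<EOS>`. For this toy task that is intentional, since the targets deliberately contain `PAD` after `<EOS>` and the decoder is trained to emit it. If you did not want padding to contribute to the loss, one option is to weight the per-step losses with a length mask. A sketch only, to be applied after `stepwise_cross_entropy` from the next cell is defined (here `encoder_inputs_length + 1` counts the input tokens plus `<EOS>`):

```python
# Sketch only: down-weight decoder timesteps past <EOS> so PAD positions
# do not contribute to the loss (not what this notebook does).
target_weights = tf.transpose(                    # -> [max_time, batch_size], time-major like the logits
    tf.sequence_mask(encoder_inputs_length + 1,   # real target length: input tokens + <EOS>
                     maxlen=decoder_max_steps,
                     dtype=tf.float32))
masked_loss = (tf.reduce_sum(stepwise_cross_entropy * target_weights)
               / tf.reduce_sum(target_weights))
```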
608 | ] 609 | }, 610 | { 611 | "cell_type": "code", 612 | "execution_count": 25, 613 | "metadata": { 614 | "collapsed": true 615 | }, 616 | "outputs": [], 617 | "source": [ 618 | "stepwise_cross_entropy = tf.nn.softmax_cross_entropy_with_logits(\n", 619 | " labels=tf.one_hot(decoder_targets, depth=vocab_size, dtype=tf.float32),\n", 620 | " logits=decoder_logits,\n", 621 | ")\n", 622 | "\n", 623 | "loss = tf.reduce_mean(stepwise_cross_entropy)\n", 624 | "train_op = tf.train.AdamOptimizer().minimize(loss)" 625 | ] 626 | }, 627 | { 628 | "cell_type": "code", 629 | "execution_count": 26, 630 | "metadata": { 631 | "collapsed": false 632 | }, 633 | "outputs": [], 634 | "source": [ 635 | "sess.run(tf.global_variables_initializer())" 636 | ] 637 | }, 638 | { 639 | "cell_type": "markdown", 640 | "metadata": {}, 641 | "source": [ 642 | "## Training on the toy task" 643 | ] 644 | }, 645 | { 646 | "cell_type": "markdown", 647 | "metadata": {}, 648 | "source": [ 649 | "Consider the copy task — given a random sequence of integers from a `vocabulary`, learn to memorize and reproduce input sequence. Because sequences are random, they do not contain any structure, unlike natural language." 650 | ] 651 | }, 652 | { 653 | "cell_type": "code", 654 | "execution_count": 27, 655 | "metadata": { 656 | "collapsed": false 657 | }, 658 | "outputs": [ 659 | { 660 | "name": "stdout", 661 | "output_type": "stream", 662 | "text": [ 663 | "head of the batch:\n", 664 | "[7, 3, 7, 3, 8, 4]\n", 665 | "[8, 9, 7]\n", 666 | "[5, 2, 6, 7, 3, 9, 9, 8]\n", 667 | "[6, 7, 4, 2, 9, 6, 3, 3]\n", 668 | "[5, 2, 2, 3, 7, 3]\n", 669 | "[5, 6, 7, 9, 9]\n", 670 | "[6, 3, 3, 4]\n", 671 | "[4, 6, 5, 6, 4]\n", 672 | "[7, 7, 7, 3]\n", 673 | "[4, 6, 3, 9, 2, 5]\n" 674 | ] 675 | } 676 | ], 677 | "source": [ 678 | "batch_size = 100\n", 679 | "\n", 680 | "batches = helpers.random_sequences(length_from=3, length_to=8,\n", 681 | " vocab_lower=2, vocab_upper=10,\n", 682 | " batch_size=batch_size)\n", 683 | "\n", 684 | "print('head of the batch:')\n", 685 | "for seq in next(batches)[:10]:\n", 686 | " print(seq)" 687 | ] 688 | }, 689 | { 690 | "cell_type": "code", 691 | "execution_count": 28, 692 | "metadata": { 693 | "collapsed": true 694 | }, 695 | "outputs": [], 696 | "source": [ 697 | "def next_feed():\n", 698 | " batch = next(batches)\n", 699 | " encoder_inputs_, encoder_input_lengths_ = helpers.batch(batch)\n", 700 | " decoder_targets_, _ = helpers.batch(\n", 701 | " [(sequence) + [EOS] + [PAD] * 2 for sequence in batch]\n", 702 | " )\n", 703 | " return {\n", 704 | " encoder_inputs: encoder_inputs_,\n", 705 | " encoder_inputs_length: encoder_input_lengths_,\n", 706 | " decoder_targets: decoder_targets_,\n", 707 | " }" 708 | ] 709 | }, 710 | { 711 | "cell_type": "code", 712 | "execution_count": 29, 713 | "metadata": { 714 | "collapsed": true 715 | }, 716 | "outputs": [], 717 | "source": [ 718 | "loss_track = []" 719 | ] 720 | }, 721 | { 722 | "cell_type": "code", 723 | "execution_count": 30, 724 | "metadata": { 725 | "collapsed": false, 726 | "scrolled": false 727 | }, 728 | "outputs": [ 729 | { 730 | "name": "stdout", 731 | "output_type": "stream", 732 | "text": [ 733 | "batch 0\n", 734 | " minibatch loss: 2.3642187118530273\n", 735 | " sample 1:\n", 736 | " input > [2 7 2 6 0 0 0 0]\n", 737 | " predicted > [1 1 1 1 6 1 6 0 0 0 0]\n", 738 | " sample 2:\n", 739 | " input > [8 8 7 9 0 0 0 0]\n", 740 | " predicted > [1 1 1 6 1 6 6 0 0 0 0]\n", 741 | " sample 3:\n", 742 | " input > [4 4 6 2 6 7 8 0]\n", 743 | " predicted > [0 0 6 6 6 6 6 3 7 4 0]\n", 
744 | "\n", 745 | "batch 1000\n", 746 | " minibatch loss: 0.5487109422683716\n", 747 | " sample 1:\n", 748 | " input > [3 6 9 4 5 6 5 0]\n", 749 | " predicted > [3 6 6 9 6 5 5 1 0 0 0]\n", 750 | " sample 2:\n", 751 | " input > [2 3 3 7 2 0 0 0]\n", 752 | " predicted > [2 3 3 3 2 1 0 0 0 0 0]\n", 753 | " sample 3:\n", 754 | " input > [6 2 5 2 4 0 0 0]\n", 755 | " predicted > [6 2 2 2 4 1 0 0 0 0 0]\n", 756 | "\n", 757 | "batch 2000\n", 758 | " minibatch loss: 0.277699738740921\n", 759 | " sample 1:\n", 760 | " input > [7 6 7 9 9 0 0 0]\n", 761 | " predicted > [7 6 7 9 9 1 0 0 0 0 0]\n", 762 | " sample 2:\n", 763 | " input > [7 5 3 4 3 3 3 3]\n", 764 | " predicted > [7 5 4 3 3 3 3 3 1 0 0]\n", 765 | " sample 3:\n", 766 | " input > [7 9 8 4 6 8 6 3]\n", 767 | " predicted > [7 9 8 4 8 6 6 3 1 0 0]\n", 768 | "\n", 769 | "batch 3000\n", 770 | " minibatch loss: 0.13742178678512573\n", 771 | " sample 1:\n", 772 | " input > [8 5 2 4 7 5 4 5]\n", 773 | " predicted > [8 5 2 7 4 4 5 5 1 0 0]\n", 774 | " sample 2:\n", 775 | " input > [8 7 2 8 6 0 0 0]\n", 776 | " predicted > [8 7 2 8 6 1 0 0 0 0 0]\n", 777 | " sample 3:\n", 778 | " input > [9 8 9 7 0 0 0 0]\n", 779 | " predicted > [9 8 9 7 1 0 0 0 0 0 0]\n", 780 | "\n" 781 | ] 782 | } 783 | ], 784 | "source": [ 785 | "max_batches = 3001\n", 786 | "batches_in_epoch = 1000\n", 787 | "\n", 788 | "try:\n", 789 | " for batch in range(max_batches):\n", 790 | " fd = next_feed()\n", 791 | " _, l = sess.run([train_op, loss], fd)\n", 792 | " loss_track.append(l)\n", 793 | "\n", 794 | " if batch == 0 or batch % batches_in_epoch == 0:\n", 795 | " print('batch {}'.format(batch))\n", 796 | " print(' minibatch loss: {}'.format(sess.run(loss, fd)))\n", 797 | " predict_ = sess.run(decoder_prediction, fd)\n", 798 | " for i, (inp, pred) in enumerate(zip(fd[encoder_inputs].T, predict_.T)):\n", 799 | " print(' sample {}:'.format(i + 1))\n", 800 | " print(' input > {}'.format(inp))\n", 801 | " print(' predicted > {}'.format(pred))\n", 802 | " if i >= 2:\n", 803 | " break\n", 804 | " print()\n", 805 | "\n", 806 | "except KeyboardInterrupt:\n", 807 | " print('training interrupted')" 808 | ] 809 | }, 810 | { 811 | "cell_type": "code", 812 | "execution_count": 31, 813 | "metadata": { 814 | "collapsed": false 815 | }, 816 | "outputs": [ 817 | { 818 | "name": "stdout", 819 | "output_type": "stream", 820 | "text": [ 821 | "loss 0.1332 after 300100 examples (batch_size=100)\n" 822 | ] 823 | }, 824 | { 825 | "data": { 826 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXcAAAD8CAYAAACMwORRAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XecFeW9x/HPb3ujLbuANJeyiqg0V0DA3igmGmOMmogl\nkRjjvZpEEyKxcRPlGks0GguRWGI08dowgiAWQESQ3puAdFjKUraX5/5xjssu29mzO6d836/Xvpgz\n85wzv3Hky7NznnnGnHOIiEh4ifK6ABERCTyFu4hIGFK4i4iEIYW7iEgYUriLiIQhhbuISBiqM9zN\nrIuZfWpmq8xspZndUU2b88zsoJkt8f/c1zTliohIfcTUo00J8Gvn3CIzawEsNLOPnHOrjmk32zl3\nWeBLFBGRhqqz5+6c2+mcW+RfPgysBjo1dWEiInL86tNzL2dmGUB/YF41m4eY2TJgO3CXc25lbZ+V\nlpbmMjIyGrJ7EZGIt3Dhwr3OufS62tU73M0sBXgLuNM5d+iYzYuArs65I2Y2EngXyKzmM8YAYwC6\ndu3KggUL6rt7EREBzOyb+rSr12gZM4vFF+yvOefePna7c+6Qc+6If3kKEGtmadW0e8E5l+Wcy0pP\nr/MfHhEROU71GS1jwIvAaufc4zW06eBvh5kN9H/uvkAWKiIi9VefyzJDgeuB5Wa2xL/uHqArgHPu\nOeAq4OdmVgLkA9c4TTcpIuKZOsPdOfc5YHW0eRp4OlBFiYhI4+gOVRGRMKRwFxEJQwp3EZEwFHLh\nvnbXYR6dtpYDuUVelyIiErRCLtw37T3C059uYHtOvteliIgErZAL99ZJcQDk5BV7XImISPAKuXBP\nifeN3swtKvG4EhGR4BVy4Z4UFw1AflGpx5WIiASvEAx3X889T+EuIlKjkAv3RH/PPU+XZUREahRy\n4Z4SH0NcTBTZhwu9LkVEJGiFXLhHRxltkmI1WkZEpBYhF+4AyXExGi0jIlKLkAz3xLhojZYREalF\nSIa7eu4iIrULyXBXz11EpHYhGe7J8dHkKtxFRGoUkuGeGBujnruISC1CMtx9PXddcxcRqUlIhvsX\nX+8jJ6+YTXtzvS5FRCQohWS4b92fB8CybTkeVyIiEpxCMtyf/fEAAFKT4zyuREQkOIVkuLdvmQDA\n9gN6GpOISHVCMtyjzAAY+/ZyjysREQlOIRnuPduleF2CiEhQC8lwj40+WnZZmfOwEhGR4BSS4Q6Q\nlhIPwBGNdxcRqSJkw/03w08G4FC+5nUXETlWyIZ7y4RYAA4q3EVEqgjZcAfftfbnZ270uA4RkeAT\nsuGe2b4FAJOX7vC4EhGR4BOy4d4jXcMhRURqErLhXtFDU1Z7XYKISFAJ6XDvlpYMwAuzdN1dRKSi\nOsPdzLqY2admtsrMVprZHdW0MTN7ysw2mNkyMxvQNOVWdvOwbuXLuw4WNMcuRURCQn167iXAr51z\nvYHBwC/MrPcxbUYAmf6fMcCzAa2yBq0SY8uXx7y6oDl2KSISEuoMd+fcTufcIv/yYWA10OmYZpcD\nrzifL4HWZnZCwKs9xmWnH93Fqh2Hmnp3IiIho0HX3M0sA+gPzDtmUydga4XX26j6D0DARUUZ55+c\nDkCJ5pgRESlX73A3sxTgLeBO59xxdZPNbIyZLTCzBdnZ2cfzEVU8fGWf8uWb/j4/IJ8pIhLq6hXu\nZhaLL9hfc869XU2T7UCXCq87+9dV4px7wTmX5ZzLSk9PP556q+jQKqF8+dO1gfkHQ0Qk1NVntIwB\nLwKrnXOP19BsMjDaP2pmMHDQObczgHXW6sJe7ZprVyIiISGmHm2GAtcDy81siX/dPUBXAOfcc8AU\nYCSwAcgDbgp8qTW769KT+XjNnubcpYhIUKsz3J1znwNWRxsH/CJQRTXUKSe09GrXIiJBKaTvUK3I\n/1hVPlzRbFeDRESCVtiEu/OPhFy67aC3hYiIBIGwCfeEWN+hFJeUeVyJiIj3wibcv30y05FCPVNV\nRCRswv3KAZ0B+Grzfo8rERHxXtiE+12XnATA19m5HlciIuK9sAn3mOiwORQRkUYLy0S84pk5Xpcg\nIuKpsAz3JVtzvC5BRMRTYRXuT13b3+sSRESCQliF+8jTOnhdgohIUAircK/4pWpBcamHlYiIeCus\nwh2gt38SsSc+WudxJSIi3gm7cO/ftTUAhZqGQEQiWNiF++0X9AQgs32Kx5WIiHgn7MI9Jd43Rf22\nA/keVyIi4p2wC/fkOF+4P/vZ1x5XIiLinbAL96ioow+Nyskr8rASERHvhF24V5STV+x1CSIingjL\ncH/++jMAze0uIpErLMP92y9VcxXuIhKhwjrcF2sCMRGJUGEZ7ie0TgBgwtQ1HlciIuKNsAz39JR4\nr0sQEfFUWIa7mdXdSEQkjIVluFdUpDlmRCQChW24n9DKd939gG5kEpEIFLbhfu9lvQHYn6twF5HI\nE7bhnpocB8CybRoOKSKRJ2zDvWVCLAC/fWu5x5WIiDS/sA339BYaDikikSvG6wKaSnqLeJLiornw\nlPZelyIi0uzCtucO0DU1iR05emiHiESesO25A6zZdRiAguJSEmKjPa5GRKT51NlzN7NJZrbHzFbU\nsP08MztoZkv8P/cFvszGOZSved1FJLLU57LMS8DwOtrMds718/+Mb3xZgXVQ4S4iEabOcHfOzQL2\nN0MtTealLzZ7XYKISLMK1BeqQ8xsmZlNNbNTA/SZjfbIVX0AeG3eFo8rERFpXoEI90VAV+dcH+Av\nwLs1NTSzMWa2wMwWZGdnB2DXtbtqQOcm34eISDBqdLg75w455474l6cAsWaWVkPbF5xzWc65rPT0\n9Mbuuk5RUZr6V0QiU6PD3cw6mH8CdTMb6P/MfY393EC5fvCJACz8JqS/NhARaZD6DIV8HZgLnGxm\n28zsJ2Z2q5nd6m9yFbDCzJYCTwHXOOdc05XcMJee2gGA7z871+NKRESaT503MTnnrq1j+9PA0wGr\nKMBOap/idQkiIs0urKcfAEiOD+ubcEVEqhX24Z5YYdqBILpaJCLSpMI+3CuOmPnDB6s9rEREpPmE\nfbgDTLoxC4AXP9/kcSUiIs0jIsL9gl5H53TXPDMiEgkiItwr6vvgdK9LEBFpchEX7iIikSBiwr1/\n19ZelyAi0mwiJtxfuD6rfLm4tMzDSkREml7EhHt6i3jaJMUCkDluqsfViIg0rYgJd4DZv73A6xJE\nRJpFRIV7SoWpCO5+c6mHlYiINK2ICveK3ly4jW/25XpdhohIk4i4cP/9qFPKl7cdyPewEhGRphNx\n4X5J7w7lyz/62zwPKxERaToRF+5d2yZVev3VZj2hSUTCT8SFO8Cwnkcf8fqD5+ZypLDEw2pERAIv\nIsP9pZvO5H8uP7X89bJtOR5WIyISeBEZ7jHRUQyt0Hu/bqKuvYtIeInIcAfonp5CanJc+euHp+pB\nHiISPiI23AHm/u7oHavPz9zI4QLN9S4i4SGiwz0+JrrS
69MfmE5pmZ6zKiKhL6LDHWDlg5dWet3j\nninszy3yqBoRkcCI+HBPjo/h3z87q9K61+dv8agaEZHAiPhwBxjYLZWbh3Yrf/2naWs9rEZEpPEU\n7n6XnNq+0ushD3/MnsMFHlUjItI4Cne/wd3bsuGPI8pf7zhYwMA/fsyOHE0uJiKhR+FeQUx0FLN/\nc36ldUMmfOJRNSIix0/hfowuqUlV1v19ziYPKhEROX4K92pMu/OcSq8ffH8Vt722kILiUo8qEhFp\nGIV7NU7u0II1/zO80ropy3fR694Pmfv1Po+qEhGpP4V7DRJio7mgV7sq66+d+KUH1YiINIzCvRbP\n/fgMLqwm4EtKyzyoRkSk/hTutYiLieLFG8+ssv6TNXs8qEZEpP7qDHczm2Rme8xsRQ3bzcyeMrMN\nZrbMzAYEvszgMubVhWSM/YBi9eBFJEjVp+f+EjC8lu0jgEz/zxjg2caXFVw2TxjF5gmjePu2IZXW\nZ46bqhE0IhKU6gx359wsoLanSF8OvOJ8vgRam9kJgSowmAzo2obuacmV1vW690PeW7Ldo4pERKoX\niGvunYCtFV5v868LSy+Mzqqy7o43lnDvuyso01zwIhIkmvULVTMbY2YLzGxBdnZ2c+46YHq2S6ly\nkxPAq19+w6n3T/OgIhGRqgIR7tuBLhVed/avq8I594JzLss5l5Wenh6AXXvj5A4tmH/PhVXW5xeX\nkjH2A7buz/OgKhGRowIR7pOB0f5RM4OBg865nQH43KDWrmUCM351Lh//+twq285+5FMPKhIROao+\nQyFfB+YCJ5vZNjP7iZndama3+ptMATYCG4CJwG1NVm2Q6dkuhR7p1V+myRj7AZv25npQlYgImHPe\nfAmYlZXlFixY4Mm+m8LW/XnV9tjHjujFLWd3JzrKPKhKRMKNmS10zlUd2XEM3aEaIF1Sk9g8YRQ9\n0isPlZwwdQ3vLdnOoYJiTVsgIs1G4R5gH//6PAZ3T6207lf/XkqfB6Yz4snZHlUlIpFG4d4EXrl5\nULXr1+85wkt68IeINAOFexOIi4ni07vOq3bbA++vat5iRCQiKdybSLe0ZBbdezF3XpRZZVvG2A/I\nLSxhyvKd5BaWeFCdiIQ7hXsTSk2O4/sDOle77dT7p3Hba4u4553lzVyViEQChXsT65KaxLo/jGDp\n/ZdUu/29JTv0AG4RCTiFezOIi4miVWIsc8ZeUO32B99fxadr9QAQEQkchXsz6tQ6kYevPL3abTf9\n/Ssyxn7AL/+1BK9uLBOR8KFwb2bXnNmFnwzrRmJsdLXb31m8nfH/WcWeQwWs3HGwmasTkXCh6Qc8\nUlBcSq97P6yz3eYJo5qhGhEJFZp+IMglxEbzys0DGTuiF3+44rQa213xzJxmrEpEwkWM1wVEsnNO\nSueck3zz2v/10w3sOFhQpc2SrTn84rVF9O/amu/07UjLhFgS46q/pCMi8i1dlgkieUUl9L6v9qc5\nnXFiG976+ZBa24hI+NJlmRCUFBfD3286k/iYmk/Lwm8OsO9IYTNWJSKhSD33IFRa5li98xCX/eXz\nGttc0KsdI07rwOSlO9h9qIDpv6z6RCgRCT/17bnrmnsQio4yTuvUiqX3X0LfB6dX2+aTNXv4ZE3V\nG58O5hUTHxtFQg1DLUUkMuiyTBBrlRjL4nsvZt0fRtTZdvLSHQD0HT+dayd+2dSliUiQU7gHuTbJ\nccTFRLF6/HDO9Y+sqc5/v76YmeuyAVi8Jae5yhORIKVwDxGJcdG8fPNAEmJrPmU3TJpfvvz0J+ub\noywRCVL6QjXEOOeYs2Efs9dnc+dFJ3HxEzPZdiC/1vfMu+dC2rdMaKYKRaQp6QvVMGVmDMtMY1hm\nGgB//dEA/jlvC2dmpPLrN5dW+56py3eSFBfDpn25/HZ4r+YsV0Q8onAPcX06t6ZP59YAPPPZBjZm\n51ZpU/HRfj87pzutk+KarT4R8YauuYeRB797ap1t+o3/iMufmcM7i7cxf9P+ZqhKRLygcA8jZ2em\n868xg+tst3RrDr/811Kufn4uAPtziygpLWvq8kSkGekL1TCUV1TCjpwCTmybxLYD+Zz/6Gf1et/d\nl55M/66tWbXjEDcOySAmWv/2iwSb+n6hqnCPALsPFTDooY8b9J5OrRNrfCygiHhHE4dJufYtE3jz\n1rP46bBu9X7P9px8Fm85QG5hCS9/sZn8otImrFBEAk099wiUMfaD43rfpodHYmYczCsmLiZK88qL\neEA9d6nR4nsv5h8/GdTg91313FwO5hfTd/x0Lnp8ZhNUJiKBonCPQG2S4xiWmcb3+ncC4JazuzHm\nnO5c0a9jre9b+M2B8lkqt+fk49VvfSJSN12WiWBFJWXsPJjPiW2Ty9cdyC3iosdnsi+3qF6fMXF0\nFi9/sZkbhmSwdX8eV5/ZhZR43Rsn0lQ0WkaOW0FxKTNW7+b2fy5u8HtvObsb40b1xjnH+j1HyGyX\ngpk1QZUikUnhLo3mnKPb76Y06jMeuaoPV2d1CVBFIqKJw6TRzIxZd5/Psu05dGqdyKx1e3lixroG\nfcZ/lu1kQNfW7DxYQLsWCZzcoUUTVSsiFdUr3M1sOPAkEA38zTk34Zjt5wHvAZv8q952zo0PYJ3i\nka5tk+jaNgmAvp1bsyH7CO/7n/pUH7PWZXPR49nlrzf8cQQlZa7SYwBLSstYvv0g/bu2CVzhIhGu\nztEyZhYNPAOMAHoD15pZ72qaznbO9fP/KNjDUFSUlU9O1rNdCp/edR4v3zyQzm0S6/0ZPcdNpde9\nH/L5+r1c/vTnvLdkO499tI7v/fULVu442FSli0Sc+vTcBwIbnHMbAczsDeByYFWt75KwlJocx5PX\n9GNozzTSUuLplpbM7N+cT5mDHvfU//r8j1+cB8AdbyyhbbJvCuJ9R4ooK3NERekLWJHGqs84907A\n1gqvt/nXHWuImS0zs6lmVu3cs2Y2xswWmNmC7Ozs6ppICLi8XyfSUuLLX5sZ0VHGO7cNOa7P+3bY\n5ehJ8+l+zxQen76WguJSTXkg0gh1jpYxs6uA4c65n/pfXw8Mcs7dXqFNS6DMOXfEzEYCTzrnMmv7\nXI2WCU9FJWXkF5WSGBfNSb+f2ujPW//HEcRWmJ3SOUeZg2j17iVCBXL6ge1AxbFsnf3ryjnnDjnn\njviXpwCxZpbWgHolTMTFRNEqKZa4mCj+dFWf45rmoKKrn59LxtgPeH3+Fg7kFpH1hxnll3/+9dUW\n9hwuqNR+f24R63YfbtQ+RcJBfXruMcA64EJ8of4VcJ1zbmWFNh2A3c45Z2YDgf8DTnS1fLh67pHj\ncEEx/1m2k7Mz03j5i838ZngvMsc1vlf/rY9+eQ6Z7X1DLAf+cQZ7DheyecKoSm2OFJaQGButHr+E\nvID13J1zJcDtwDRgNfBv59xKM7vVzG71N7sKWGFmS4GngGtqC3aJLC0SYrl2YFc6t0li3KjexEZH\n8bfRWQzMSOW
+y6obeNUwFz8xi8/X72X+pv3sOVwIwLYDeQDkF5VSVuY47f5p3PveikbvSyRU6A5V\n8ZRzjomzN/LQlDUB/+xXbh7I6Enz+e8LevLUJxsA+OuPBjDy9BOYvnIXU1fs4okf9gv4fkWakqb8\nlZBgZow5pwebJ4xi5t3ncfPQbvV6Dmx9fLhyF0B5sAPc9toituzLY8yrC3ln8XY+XLGT0jJfB2fp\n1hxy8uo3YZpIsFPPXYLSP+dtYfhpHUiKi6bPA9MpasIHeD/wnd7ExkQx7p0VZLZL4aNfndtk+xJp\nLPXcJaRdN6grqclxJMRGc36vdMB3SaUpPPD+Ksa947sev37PETbtzaWsrHKn50BuEc/P/BrnHEUl\nZTw2fS15RSVNUo9IIGjiMAl6Y87pwcx12Qzqlsqsu8+nsKSU7MOFjJ40n0d/0Jc7/7UkoPs7/9HP\nAPjb6Cw6tk6kfct4hj85m+zDhQw4sQ3rdh/mL59soLjUMXZEr4DuWyRQFO4S9M44sQ1r/mcEAG1T\nfOsy27dgw0MjAejTuRUXPBb4x/799JWqlw2XbMkpf3bswfzigO9TJFB0zV3CwrYDeWQfLqS41HH1\n83Pp2CqBF0ZncdlfPg/4vuJjoigsKaNDywTGjTqFR6atIaNtMhed0p77J69k1fhLSYpTv0mahh7W\nIRFry748uqQmYmaUlJbxyLS1vDBrIwBX9u/E24u31/EJxyfKoMzBn3/Yj0HdU8kvKiU+NpqOrRL0\nNCoJGIW7iJ9zjnmb9pPZLoW2KfF8tGo3t/gvuSTHRTPxhiyumzivyfb/0PdO57pBXSut23O4gOS4\nGJL1vFlpIIW7SA2KSsp4Z/E2vtu3U/n18/7jp3Mgr2mvoQ/rmcaNQzLISEvmosdnktE2iYmjs/i/\nRdu4sn/nSk+pOphfTKvE2CatR0KTwl2kAfKKSigsLuOb/XkczC/mhknzGTfyFP44ZXWz1fDZXeeR\nkZbMjFW7+ekrC3jz1rM4MyOVFdsPcmrHlszZsI9l23O47byezVaTBB+Fu0gjHcwrpu/46fzv908n\nv6iUB95v3ufTjDitA4u2HGD3oULO6t6WuRv3AbDp4ZHl1/BLyxxlzlWaFvlb01buYv3uw9x+Qa2z\nb0uI0U1MIo3UKimWzRNG8cMzu3Lj0G7l67+dWPKKfh1p1yK+hnc33tQVu9h9yDcR2rfBDrBoywGc\ncxSXltHjnilkjpvK3K9927fn5PPZ2j2Uljl+9upCHp2+jgO5Rfzu7WUUFJeyYPN+Nu3NbbKaJXio\n5y5ST898uoEWCTH8aNCJzN+0n7N6tGXJ1hxumDSfHunJPHZ1v/IboLzw1LX9+e/XF1dZ/+PBXfnH\nl1sY2rMtczb4/hF46+dn0TIhtnyq5GN9uGIXt/5jISsevJQU/5e+uYUlOCh/Ld7QZRkRjxSVlFV5\nCtXPzu3O8zM3lr++6JT2zFi9u7lLq+Lrh0ZSWFJaZVz+pU/MYu3uw/zzlkHc/eYyJo7OYuRTswHK\n58qfs2EvB/KKuKxPx2avO5LpsoyIR+JiosoDsGOrBEadfgK/vvhkhvRoW97m3stOIT4mipuGZnhU\npU+Pe6bQ+75pjH9/FQXFpTw5Yz2HC4rJL/Y9v/a6ifPYnpPPXz87OrPmS3M2AfCjv83j9n8u1hw7\nQUo9d5EmkldUQmx0VPmXnftzi7j33RU89L3TaZVUdZjj5KU7eHTaWrbsz2vuUhvsVxefxOMfrSt/\nXfHJVwdyi9iXW0TPdilelBb2dFlGJAQVlpSydX8+Fz0+k75dWvP7Uafwg+fmAvDoD/rSNjmOm176\nyuMqq5p593mc+6fPOOekdGatywZ8v7VM/q9hjHllASVljsm3Dytvv2lvLocLiunTuXWj930wr5iW\niTERcxewwl0kxDnnMDPmbNjL+0t3MOH7fQAoKC5lza7DXPHMnFrff9clJ/Ho9HW1tmlql57anmkr\nj3638O+fncXAbqlkjP0AgNdvGUzrpFg6t0kkKS6GO95YzAW92hEdZdzxxhJ+Oqwbf/t8E3N/dwEn\ntEqs8vl7DhUw8KGPueuSkyJmyKfCXSTMvTDra9JbxPO9/p3L15WVOca9u5zrB2fQu2NLVu04xHee\n/pzv9DmBd5fs8LDaul0/+ERe/fKbGrcf+9Dz0+6fxpHCo9f7J47O4uLe7XHOcf/klVx1Rufy3wxK\nSsuIjrJ69e6/2ryfdbsP86NBJx7nkTQthbuIVLIjJ5/Z67PplpbC1c/PZcKVpzP27eXl2zPbpbB+\nzxEPK6zdTUMzOCcznZho4+Uvvql2tNE/bxlEt7Rkznr4E1okxPDRL8+lZWIMve+bxt2Xnswvzq98\nd29OXhF7jxz9fmDVjkNVRgWt232YlgmxdGiV0MRHWD8KdxGp07eXRzZPGEVeUQl/nrG+fAZNgPd+\nMZQnP17PkB5t+cMHzTcVQ1NaNf5Sdh8q5LHpa/nPsp0AvH3bEHqf0JJe935Y3m7p/ZfQKjG20n+j\n4/H+0h38Z9kO/vzD/uVzGTWGwl1E6nQgt4ji0jLatTzaK80+XMjv313OEz/sV2n8e2FJKc99tpEn\nZviu42+eMIqt+/M4+5FPAeiamhQSI32O13f7duSqMzozetJ87rusNzcP8921nFtYghmMeHI2f7qq\nLwO7pZa/p+JvAo9f3ZcrB3Su9rMbor7hrlvNRCJYm+S4KuvSW8Tz/PVVsyM+Jpo7Lsqka9tEjhT4\nrnV3SU3i/duHces/FvL+7cMoLCll/ub9vLlgGzP9o2be+vkQnpv5NR+t8v6mrcaYvHQHk5f6vrcY\n/59VnNg2iS6pSVzyxCzat4xn96FCfvLyVzz43VN5bubX3Dy0G+kVpqeIaubRPOq5i0iz2JGTz9b9\neczZsJefnduDU++f5nVJnrhpaAb3f+fU436/eu4iElQ6tk6kY+tEBnX33al77knpREcZz/34DErL\nHNmHC+naNontOfkMnfAJAKd2bMldl5xcPrb/7Mw0Zq/fC8Ds35zPXz/bwOvzt3pzQMfp73M2M6xn\nGhee0r5J96Oeu4gEnfW7D/P6/K3cM7IXMdFRlJU5CkpKSYyNZtb6vezIyefagV1xzvHx6j38+eN1\nrNh+qPzyyKQbs7j5pdrzpW1yHJNuPJPL67hfoCmkxMew4sFLj+u9+kJVRCLGwbxiVu86xODuR+fv\n2XOogOIyR6vEWDbvzeW0Tq3KR74s+P1FpKX4roc/OWM9T8xYx4W92vHijWey+1ABy7cdJCkumm/2\n5/Hlxn1c0b8TKfExPD9zI09f15+Jszby2EfHf4NY386teK/CHbsNoXAXETlGTl4RRwpL6NwmqXzd\nkcISHvlwDWNH9KoyO2ZNDhUU0+eB6QAsuvdi7vzXEmaty+a/LujJry4+iakrdnHba4sAeP/2YXzn\n6c8rvX/ePRfSvuXxjZvXNXcRkWO0ToqjdVLlEUIp8TGMv/y0Bn1Oy4RY
nrymH6nJcaQmx/HKzQMr\nbY+POTrh7mmdWnLuSencOCSD83u1O/7iG0g9dxGRACstczw6fS23nN2d1GqGmzaGeu4iIh6JjjJ+\nO7yXpzXoYR0iImFI4S4iEoYU7iIiYahe4W5mw81srZltMLOx1Ww3M3vKv32ZmQ0IfKkiIlJfdYa7\nmUUDzwAjgN7AtWbW+5hmI4BM/88Y4NkA1ykiIg1Qn577QGCDc26jc64IeAO4/Jg2lwOvOJ8vgdZm\ndkKAaxURkXqqT7h3AirOzLPNv66hbTCzMWa2wMwWZGdnN7RWERGpp2b9QtU594JzLss5l5Went6c\nuxYRiSj1uYlpO9ClwuvO/nUNbVPJwoUL95pZzU/DrV0asPc43xtsdCzBKVyOJVyOA3Qs36rXk7vr\nE+5fAZlm1g1fYF8DXHdMm8nA7Wb2BjAIOOic21nbhzrnjrvrbmYL6nP7bSjQsQSncDmWcDkO0LE0\nVJ3h7pwrMbPbgWlANDDJObfSzG71b38OmAKMBDYAecBNTVeyiIjUpV5zyzjnpuAL8Irrnquw7IBf\nBLY0ERE5XqF6h+oLXhcQQDqW4BQuxxIuxwE6lgbxbMpfERFpOqHacxcRkVqEXLjXNc9NsDGzzWa2\n3MyWmNns01SsAAADaUlEQVQC/7pUM/vIzNb7/2xTof3v/Me21syO7wm6AWJmk8xsj5mtqLCuwbWb\n2Rn+/wYb/HMQWZAcywNmtt1/bpaY2chgPxYz62Jmn5rZKjNbaWZ3+NeH3Hmp5VhC8bwkmNl8M1vq\nP5YH/eu9Oy/OuZD5wTda52ugOxAHLAV6e11XHTVvBtKOWfcIMNa/PBb4X/9yb/8xxQPd/Mca7WHt\n5wADgBWNqR2YDwwGDJgKjAiSY3kAuKuatkF7LMAJwAD/cgtgnb/ekDsvtRxLKJ4XA1L8y7HAPH89\nnp2XUOu512eem1BwOfCyf/ll4IoK699wzhU65zbhG1o6sJr3Nwvn3Cxg/zGrG1S7+eYYaumc+9L5\n/s99pcJ7mk0Nx1KToD0W59xO59wi//JhYDW+qT5C7rzUciw1CeZjcc65I/6Xsf4fh4fnJdTCvV5z\n2AQZB8wws4VmNsa/rr07epPXLqC9fzkUjq+htXfyLx+7Plj8l/mmqZ5U4VfmkDgWM8sA+uPrJYb0\neTnmWCAEz4uZRZvZEmAP8JFzztPzEmrhHoqGOef64ZsW+Rdmdk7Fjf5/nUNyyFIo1+73LL5LfP2A\nncBj3pZTf2aWArwF3OmcO1RxW6idl2qOJSTPi3Ou1P93vTO+Xvhpx2xv1vMSauHe4DlsvOac2+7/\ncw/wDr7LLLv9v37h/3OPv3koHF9Da9/uXz52veecc7v9fyHLgIkcvQQW1MdiZrH4wvA159zb/tUh\neV6qO5ZQPS/fcs7lAJ8Cw/HwvIRauJfPc2NmcfjmuZnscU01MrNkM2vx7TJwCbACX803+JvdALzn\nX54MXGNm8eabyycT35crwaRBtft/JT1kZoP93/qPrvAeT1nlZw58D9+5gSA+Fv9+XwRWO+cer7Ap\n5M5LTccSoucl3cxa+5cTgYuBNXh5XprzG+VA/OCbw2Ydvm+Xx3ldTx21dsf3jfhSYOW39QJtgY+B\n9cAMILXCe8b5j20tHowqOab+1/H9WlyM79rfT46ndiAL31/Qr4Gn8d88FwTH8iqwHFjm/8t2QrAf\nCzAM36/2y4Al/p+RoXheajmWUDwvfYDF/ppXAPf513t2XnSHqohIGAq1yzIiIlIPCncRkTCkcBcR\nCUMKdxGRMKRwFxEJQwp3EZEwpHAXEQlDCncRkTD0/wNUeX4fMLUzAAAAAElFTkSuQmCC\n", 827 | "text/plain": [ 828 | "" 829 | ] 830 | }, 831 | "metadata": {}, 832 | "output_type": "display_data" 833 | } 834 | ], 835 | "source": [ 836 | "%matplotlib inline\n", 837 | "import matplotlib.pyplot as plt\n", 838 | "plt.plot(loss_track)\n", 839 | "print('loss {:.4f} after {} examples (batch_size={})'.format(loss_track[-1], len(loss_track)*batch_size, batch_size))" 840 | ] 841 | } 842 | ], 843 | "metadata": { 844 | "kernelspec": { 845 | "display_name": "Python 3", 846 | "language": "python", 847 | "name": "python3" 848 | }, 849 | "language_info": { 850 | "codemirror_mode": { 851 | "name": "ipython", 852 | "version": 3 853 | }, 854 | "file_extension": ".py", 855 | "mimetype": "text/x-python", 856 | "name": "python", 857 | "nbconvert_exporter": "python", 858 | "pygments_lexer": "ipython3", 859 | "version": "3.6.0" 860 | } 861 | }, 862 | "nbformat": 4, 863 | "nbformat_minor": 2 864 | } 865 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2017 Matvey Ezhov 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall 
be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # seq2seq with TensorFlow 2 | Collection of unfinished tutorials. May be good for educational purposes. 3 | 4 | ## **1 - [simple sequence-to-sequence model with dynamic unrolling](1-seq2seq.ipynb)** 5 | > Deliberately slow-moving, explicit tutorial. I tried to thoroughly explain everything that I found in any way confusing. 6 | 7 | > Implements the simple seq2seq model described in [Sutskever et al., 2014](https://arxiv.org/abs/1409.3215) and tests it against a toy memorization task. 8 | 9 | ![1-seq2seq](pictures/1-seq2seq.png) 10 | *Picture from [Sutskever et al., 2014](https://arxiv.org/abs/1409.3215)* 11 | 12 | ## **2 - [advanced dynamic seq2seq](2-seq2seq-advanced.ipynb)** 13 | > Encoder is bidirectional now. Decoder is implemented using `tf.nn.raw_rnn`. During training it feeds previously generated tokens as inputs, instead of the target sequence. 14 | 15 | ![2-seq2seq-feed-previous](pictures/2-seq2seq-feed-previous.png) 16 | *Picture from [Deep Learning for Chatbots](http://www.wildml.com/2016/04/deep-learning-for-chatbots-part-1-introduction/)* 17 | 18 | ## **3 - [Using `tf.contrib.seq2seq`](3-seq2seq-native-new.ipynb)** (TF<=1.1) 19 | > New dynamic seq2seq appeared in r1.0. Let's try it. 20 | 21 | UPDATE: this tutorial doesn't work with TF versions > 1.1 because of API changes. I recommend checking out the new [official tutorial](https://github.com/tensorflow/nmt) instead to learn the high-level seq2seq API. 22 | -------------------------------------------------------------------------------- /helpers.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | def batch(inputs, max_sequence_length=None): 4 | """ 5 | Args: 6 | inputs: 7 | list of sentences (integer lists) 8 | max_sequence_length: 9 | integer specifying how large the `max_time` dimension should be. 
10 | If None, maximum sequence length would be used 11 | 12 | Outputs: 13 | inputs_time_major: 14 | input sentences transformed into time-major matrix 15 | (shape [max_time, batch_size]) padded with 0s 16 | sequence_lengths: 17 | batch-sized list of integers specifying amount of active 18 | time steps in each input sequence 19 | """ 20 | 21 | sequence_lengths = [len(seq) for seq in inputs] 22 | batch_size = len(inputs) 23 | 24 | if max_sequence_length is None: 25 | max_sequence_length = max(sequence_lengths) 26 | 27 | inputs_batch_major = np.zeros(shape=[batch_size, max_sequence_length], dtype=np.int32) # == PAD 28 | 29 | for i, seq in enumerate(inputs): 30 | for j, element in enumerate(seq): 31 | inputs_batch_major[i, j] = element 32 | 33 | # [batch_size, max_time] -> [max_time, batch_size] 34 | inputs_time_major = inputs_batch_major.swapaxes(0, 1) 35 | 36 | return inputs_time_major, sequence_lengths 37 | 38 | 39 | def random_sequences(length_from, length_to, 40 | vocab_lower, vocab_upper, 41 | batch_size): 42 | """ Generates batches of random integer sequences, 43 | sequence length in [length_from, length_to], 44 | vocabulary in [vocab_lower, vocab_upper] 45 | """ 46 | if length_from > length_to: 47 | raise ValueError('length_from > length_to') 48 | 49 | def random_length(): 50 | if length_from == length_to: 51 | return length_from 52 | return np.random.randint(length_from, length_to + 1) 53 | 54 | while True: 55 | yield [ 56 | np.random.randint(low=vocab_lower, 57 | high=vocab_upper, 58 | size=random_length()).tolist() 59 | for _ in range(batch_size) 60 | ] -------------------------------------------------------------------------------- /model_new.py: -------------------------------------------------------------------------------- 1 | # Working with TF commit 24466c2e6d32621cd85f0a78d47df6eed2c5c5a6 2 | 3 | import math 4 | 5 | import numpy as np 6 | import tensorflow as tf 7 | import tensorflow.contrib.seq2seq as seq2seq 8 | from tensorflow.contrib.layers import safe_embedding_lookup_sparse as embedding_lookup_unique 9 | from tensorflow.contrib.rnn import LSTMCell, LSTMStateTuple, GRUCell 10 | 11 | import helpers 12 | 13 | 14 | class Seq2SeqModel(): 15 | """Seq2Seq model usign blocks from new `tf.contrib.seq2seq`. 16 | Requires TF 1.0.0-alpha""" 17 | 18 | PAD = 0 19 | EOS = 1 20 | 21 | def __init__(self, encoder_cell, decoder_cell, vocab_size, embedding_size, 22 | bidirectional=True, 23 | attention=False, 24 | debug=False): 25 | self.debug = debug 26 | self.bidirectional = bidirectional 27 | self.attention = attention 28 | 29 | self.vocab_size = vocab_size 30 | self.embedding_size = embedding_size 31 | 32 | self.encoder_cell = encoder_cell 33 | self.decoder_cell = decoder_cell 34 | 35 | self._make_graph() 36 | 37 | @property 38 | def decoder_hidden_units(self): 39 | # @TODO: is this correct for LSTMStateTuple? 
40 | return self.decoder_cell.output_size 41 | 42 | def _make_graph(self): 43 | if self.debug: 44 | self._init_debug_inputs() 45 | else: 46 | self._init_placeholders() 47 | 48 | self._init_decoder_train_connectors() 49 | self._init_embeddings() 50 | 51 | if self.bidirectional: 52 | self._init_bidirectional_encoder() 53 | else: 54 | self._init_simple_encoder() 55 | 56 | self._init_decoder() 57 | 58 | self._init_optimizer() 59 | 60 | def _init_debug_inputs(self): 61 | """ Everything is time-major """ 62 | x = [[5, 6, 7], 63 | [7, 6, 0], 64 | [0, 7, 0]] 65 | xl = [2, 3, 1] 66 | self.encoder_inputs = tf.constant(x, dtype=tf.int32, name='encoder_inputs') 67 | self.encoder_inputs_length = tf.constant(xl, dtype=tf.int32, name='encoder_inputs_length') 68 | 69 | self.decoder_targets = tf.constant(x, dtype=tf.int32, name='decoder_targets') 70 | self.decoder_targets_length = tf.constant(xl, dtype=tf.int32, name='decoder_targets_length') 71 | 72 | def _init_placeholders(self): 73 | """ Everything is time-major """ 74 | self.encoder_inputs = tf.placeholder( 75 | shape=(None, None), 76 | dtype=tf.int32, 77 | name='encoder_inputs', 78 | ) 79 | self.encoder_inputs_length = tf.placeholder( 80 | shape=(None,), 81 | dtype=tf.int32, 82 | name='encoder_inputs_length', 83 | ) 84 | 85 | # required for training, not required for testing 86 | self.decoder_targets = tf.placeholder( 87 | shape=(None, None), 88 | dtype=tf.int32, 89 | name='decoder_targets' 90 | ) 91 | self.decoder_targets_length = tf.placeholder( 92 | shape=(None,), 93 | dtype=tf.int32, 94 | name='decoder_targets_length', 95 | ) 96 | 97 | def _init_decoder_train_connectors(self): 98 | """ 99 | During training, `decoder_targets` 100 | and decoder logits. This means that their shapes should be compatible. 101 | 102 | Here we do a bit of plumbing to set this up. 103 | """ 104 | with tf.name_scope('DecoderTrainFeeds'): 105 | sequence_size, batch_size = tf.unstack(tf.shape(self.decoder_targets)) 106 | 107 | EOS_SLICE = tf.ones([1, batch_size], dtype=tf.int32) * self.EOS 108 | PAD_SLICE = tf.ones([1, batch_size], dtype=tf.int32) * self.PAD 109 | 110 | self.decoder_train_inputs = tf.concat([EOS_SLICE, self.decoder_targets], axis=0) 111 | self.decoder_train_length = self.decoder_targets_length + 1 112 | 113 | decoder_train_targets = tf.concat([self.decoder_targets, PAD_SLICE], axis=0) 114 | decoder_train_targets_seq_len, _ = tf.unstack(tf.shape(decoder_train_targets)) 115 | decoder_train_targets_eos_mask = tf.one_hot(self.decoder_train_length - 1, 116 | decoder_train_targets_seq_len, 117 | on_value=self.EOS, off_value=self.PAD, 118 | dtype=tf.int32) 119 | decoder_train_targets_eos_mask = tf.transpose(decoder_train_targets_eos_mask, [1, 0]) 120 | 121 | # hacky way using one_hot to put EOS symbol at the end of target sequence 122 | decoder_train_targets = tf.add(decoder_train_targets, 123 | decoder_train_targets_eos_mask) 124 | 125 | self.decoder_train_targets = decoder_train_targets 126 | 127 | self.loss_weights = tf.ones([ 128 | batch_size, 129 | tf.reduce_max(self.decoder_train_length) 130 | ], dtype=tf.float32, name="loss_weights") 131 | 132 | def _init_embeddings(self): 133 | with tf.variable_scope("embedding") as scope: 134 | 135 | # Uniform(-sqrt(3), sqrt(3)) has variance=1. 
136 | sqrt3 = math.sqrt(3) 137 | initializer = tf.random_uniform_initializer(-sqrt3, sqrt3) 138 | 139 | self.embedding_matrix = tf.get_variable( 140 | name="embedding_matrix", 141 | shape=[self.vocab_size, self.embedding_size], 142 | initializer=initializer, 143 | dtype=tf.float32) 144 | 145 | self.encoder_inputs_embedded = tf.nn.embedding_lookup( 146 | self.embedding_matrix, self.encoder_inputs) 147 | 148 | self.decoder_train_inputs_embedded = tf.nn.embedding_lookup( 149 | self.embedding_matrix, self.decoder_train_inputs) 150 | 151 | def _init_simple_encoder(self): 152 | with tf.variable_scope("Encoder") as scope: 153 | (self.encoder_outputs, self.encoder_state) = ( 154 | tf.nn.dynamic_rnn(cell=self.encoder_cell, 155 | inputs=self.encoder_inputs_embedded, 156 | sequence_length=self.encoder_inputs_length, 157 | time_major=True, 158 | dtype=tf.float32) 159 | ) 160 | 161 | def _init_bidirectional_encoder(self): 162 | with tf.variable_scope("BidirectionalEncoder") as scope: 163 | 164 | ((encoder_fw_outputs, 165 | encoder_bw_outputs), 166 | (encoder_fw_state, 167 | encoder_bw_state)) = ( 168 | tf.nn.bidirectional_dynamic_rnn(cell_fw=self.encoder_cell, 169 | cell_bw=self.encoder_cell, 170 | inputs=self.encoder_inputs_embedded, 171 | sequence_length=self.encoder_inputs_length, 172 | time_major=True, 173 | dtype=tf.float32) 174 | ) 175 | 176 | self.encoder_outputs = tf.concat((encoder_fw_outputs, encoder_bw_outputs), 2) 177 | 178 | if isinstance(encoder_fw_state, LSTMStateTuple): 179 | 180 | encoder_state_c = tf.concat( 181 | (encoder_fw_state.c, encoder_bw_state.c), 1, name='bidirectional_concat_c') 182 | encoder_state_h = tf.concat( 183 | (encoder_fw_state.h, encoder_bw_state.h), 1, name='bidirectional_concat_h') 184 | self.encoder_state = LSTMStateTuple(c=encoder_state_c, h=encoder_state_h) 185 | 186 | elif isinstance(encoder_fw_state, tf.Tensor): 187 | self.encoder_state = tf.concat((encoder_fw_state, encoder_bw_state), 1, name='bidirectional_concat') 188 | 189 | def _init_decoder(self): 190 | with tf.variable_scope("Decoder") as scope: 191 | def output_fn(outputs): 192 | return tf.contrib.layers.linear(outputs, self.vocab_size, scope=scope) 193 | 194 | if not self.attention: 195 | decoder_fn_train = seq2seq.simple_decoder_fn_train(encoder_state=self.encoder_state) 196 | decoder_fn_inference = seq2seq.simple_decoder_fn_inference( 197 | output_fn=output_fn, 198 | encoder_state=self.encoder_state, 199 | embeddings=self.embedding_matrix, 200 | start_of_sequence_id=self.EOS, 201 | end_of_sequence_id=self.EOS, 202 | maximum_length=tf.reduce_max(self.encoder_inputs_length) + 3, 203 | num_decoder_symbols=self.vocab_size, 204 | ) 205 | else: 206 | 207 | # attention_states: size [batch_size, max_time, num_units] 208 | attention_states = tf.transpose(self.encoder_outputs, [1, 0, 2]) 209 | 210 | (attention_keys, 211 | attention_values, 212 | attention_score_fn, 213 | attention_construct_fn) = seq2seq.prepare_attention( 214 | attention_states=attention_states, 215 | attention_option="bahdanau", 216 | num_units=self.decoder_hidden_units, 217 | ) 218 | 219 | decoder_fn_train = seq2seq.attention_decoder_fn_train( 220 | encoder_state=self.encoder_state, 221 | attention_keys=attention_keys, 222 | attention_values=attention_values, 223 | attention_score_fn=attention_score_fn, 224 | attention_construct_fn=attention_construct_fn, 225 | name='attention_decoder' 226 | ) 227 | 228 | decoder_fn_inference = seq2seq.attention_decoder_fn_inference( 229 | output_fn=output_fn, 230 | encoder_state=self.encoder_state, 231 
| attention_keys=attention_keys, 232 | attention_values=attention_values, 233 | attention_score_fn=attention_score_fn, 234 | attention_construct_fn=attention_construct_fn, 235 | embeddings=self.embedding_matrix, 236 | start_of_sequence_id=self.EOS, 237 | end_of_sequence_id=self.EOS, 238 | maximum_length=tf.reduce_max(self.encoder_inputs_length) + 3, 239 | num_decoder_symbols=self.vocab_size, 240 | ) 241 | 242 | (self.decoder_outputs_train, 243 | self.decoder_state_train, 244 | self.decoder_context_state_train) = ( 245 | seq2seq.dynamic_rnn_decoder( 246 | cell=self.decoder_cell, 247 | decoder_fn=decoder_fn_train, 248 | inputs=self.decoder_train_inputs_embedded, 249 | sequence_length=self.decoder_train_length, 250 | time_major=True, 251 | scope=scope, 252 | ) 253 | ) 254 | 255 | self.decoder_logits_train = output_fn(self.decoder_outputs_train) 256 | self.decoder_prediction_train = tf.argmax(self.decoder_logits_train, axis=-1, name='decoder_prediction_train') 257 | 258 | scope.reuse_variables() 259 | 260 | (self.decoder_logits_inference, 261 | self.decoder_state_inference, 262 | self.decoder_context_state_inference) = ( 263 | seq2seq.dynamic_rnn_decoder( 264 | cell=self.decoder_cell, 265 | decoder_fn=decoder_fn_inference, 266 | time_major=True, 267 | scope=scope, 268 | ) 269 | ) 270 | self.decoder_prediction_inference = tf.argmax(self.decoder_logits_inference, axis=-1, name='decoder_prediction_inference') 271 | 272 | def _init_optimizer(self): 273 | logits = tf.transpose(self.decoder_logits_train, [1, 0, 2]) 274 | targets = tf.transpose(self.decoder_train_targets, [1, 0]) 275 | self.loss = seq2seq.sequence_loss(logits=logits, targets=targets, 276 | weights=self.loss_weights) 277 | self.train_op = tf.train.AdamOptimizer().minimize(self.loss) 278 | 279 | def make_train_inputs(self, input_seq, target_seq): 280 | inputs_, inputs_length_ = helpers.batch(input_seq) 281 | targets_, targets_length_ = helpers.batch(target_seq) 282 | return { 283 | self.encoder_inputs: inputs_, 284 | self.encoder_inputs_length: inputs_length_, 285 | self.decoder_targets: targets_, 286 | self.decoder_targets_length: targets_length_, 287 | } 288 | 289 | def make_inference_inputs(self, input_seq): 290 | inputs_, inputs_length_ = helpers.batch(input_seq) 291 | return { 292 | self.encoder_inputs: inputs_, 293 | self.encoder_inputs_length: inputs_length_, 294 | } 295 | 296 | 297 | def make_seq2seq_model(**kwargs): 298 | args = dict(encoder_cell=LSTMCell(10), 299 | decoder_cell=LSTMCell(20), 300 | vocab_size=10, 301 | embedding_size=10, 302 | attention=True, 303 | bidirectional=True, 304 | debug=False) 305 | args.update(kwargs) 306 | return Seq2SeqModel(**args) 307 | 308 | 309 | def train_on_copy_task(session, model, 310 | length_from=3, length_to=8, 311 | vocab_lower=2, vocab_upper=10, 312 | batch_size=100, 313 | max_batches=5000, 314 | batches_in_epoch=1000, 315 | verbose=True): 316 | 317 | batches = helpers.random_sequences(length_from=length_from, length_to=length_to, 318 | vocab_lower=vocab_lower, vocab_upper=vocab_upper, 319 | batch_size=batch_size) 320 | loss_track = [] 321 | try: 322 | for batch in range(max_batches+1): 323 | batch_data = next(batches) 324 | fd = model.make_train_inputs(batch_data, batch_data) 325 | _, l = session.run([model.train_op, model.loss], fd) 326 | loss_track.append(l) 327 | 328 | if verbose: 329 | if batch == 0 or batch % batches_in_epoch == 0: 330 | print('batch {}'.format(batch)) 331 | print(' minibatch loss: {}'.format(session.run(model.loss, fd))) 332 | for i, (e_in, dt_pred) in 
enumerate(zip( 333 | fd[model.encoder_inputs].T, 334 | session.run(model.decoder_prediction_train, fd).T 335 | )): 336 | print(' sample {}:'.format(i + 1)) 337 | print(' enc input > {}'.format(e_in)) 338 | print(' dec train predicted > {}'.format(dt_pred)) 339 | if i >= 2: 340 | break 341 | print() 342 | except KeyboardInterrupt: 343 | print('training interrupted') 344 | 345 | return loss_track 346 | 347 | 348 | if __name__ == '__main__': 349 | import sys 350 | 351 | if 'fw-debug' in sys.argv: 352 | tf.reset_default_graph() 353 | with tf.Session() as session: 354 | model = make_seq2seq_model(debug=True) 355 | session.run(tf.global_variables_initializer()) 356 | session.run(model.decoder_prediction_train) 357 | session.run(model.decoder_prediction_train) 358 | 359 | elif 'fw-inf' in sys.argv: 360 | tf.reset_default_graph() 361 | with tf.Session() as session: 362 | model = make_seq2seq_model() 363 | session.run(tf.global_variables_initializer()) 364 | fd = model.make_inference_inputs([[5, 4, 6, 7], [6, 6]]) 365 | inf_out = session.run(model.decoder_prediction_inference, fd) 366 | print(inf_out) 367 | 368 | elif 'train' in sys.argv: 369 | tracks = {} 370 | 371 | tf.reset_default_graph() 372 | 373 | with tf.Session() as session: 374 | model = make_seq2seq_model(attention=True) 375 | session.run(tf.global_variables_initializer()) 376 | loss_track_attention = train_on_copy_task(session, model) 377 | 378 | tf.reset_default_graph() 379 | 380 | with tf.Session() as session: 381 | model = make_seq2seq_model(attention=False) 382 | session.run(tf.global_variables_initializer()) 383 | loss_track_no_attention = train_on_copy_task(session, model) 384 | 385 | import matplotlib.pyplot as plt 386 | plt.plot(loss_track) 387 | print('loss {:.4f} after {} examples (batch_size={})'.format(loss_track[-1], len(loss_track)*batch_size, batch_size)) 388 | 389 | else: 390 | tf.reset_default_graph() 391 | session = tf.InteractiveSession() 392 | model = make_seq2seq_model(debug=False) 393 | session.run(tf.global_variables_initializer()) 394 | 395 | fd = model.make_inference_inputs([[5, 4, 6, 7], [6, 6]]) 396 | 397 | inf_out = session.run(model.decoder_prediction_inference, fd) -------------------------------------------------------------------------------- /pictures/1-seq2seq.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ematvey/tensorflow-seq2seq-tutorials/f767fd66d940d7852e164731cc774de1f6c35437/pictures/1-seq2seq.png -------------------------------------------------------------------------------- /pictures/2-seq2seq-feed-previous.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ematvey/tensorflow-seq2seq-tutorials/f767fd66d940d7852e164731cc774de1f6c35437/pictures/2-seq2seq-feed-previous.png --------------------------------------------------------------------------------