├── .gitignore ├── 1-seq2seq.ipynb ├── 2-seq2seq-advanced.ipynb ├── 3-seq2seq-native-new.ipynb ├── LICENSE ├── README.md ├── helpers.py ├── model_new.py └── pictures ├── 1-seq2seq.png └── 2-seq2seq-feed-previous.png /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | env/ 12 | build/ 13 | develop-eggs/ 14 | dist/ 15 | downloads/ 16 | eggs/ 17 | .eggs/ 18 | lib/ 19 | lib64/ 20 | parts/ 21 | sdist/ 22 | var/ 23 | *.egg-info/ 24 | .installed.cfg 25 | *.egg 26 | 27 | # PyInstaller 28 | # Usually these files are written by a python script from a template 29 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 30 | *.manifest 31 | *.spec 32 | 33 | # Installer logs 34 | pip-log.txt 35 | pip-delete-this-directory.txt 36 | 37 | # Unit test / coverage reports 38 | htmlcov/ 39 | .tox/ 40 | .coverage 41 | .coverage.* 42 | .cache 43 | nosetests.xml 44 | coverage.xml 45 | *,cover 46 | .hypothesis/ 47 | 48 | # Translations 49 | *.mo 50 | *.pot 51 | 52 | # Django stuff: 53 | *.log 54 | local_settings.py 55 | 56 | # Flask stuff: 57 | instance/ 58 | .webassets-cache 59 | 60 | # Scrapy stuff: 61 | .scrapy 62 | 63 | # Sphinx documentation 64 | docs/_build/ 65 | 66 | # PyBuilder 67 | target/ 68 | 69 | # IPython Notebook 70 | .ipynb_checkpoints 71 | 72 | # pyenv 73 | .python-version 74 | 75 | # celery beat schedule file 76 | celerybeat-schedule 77 | 78 | # dotenv 79 | .env 80 | 81 | # virtualenv 82 | venv/ 83 | ENV/ 84 | 85 | # Spyder project settings 86 | .spyderproject 87 | 88 | # Rope project settings 89 | .ropeproject 90 | -------------------------------------------------------------------------------- /1-seq2seq.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Simple dynamic seq2seq with TensorFlow" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "This tutorial covers building seq2seq using dynamic unrolling with TensorFlow. \n", 15 | "\n", 16 | "I wasn't able to find any existing implementation of dynamic seq2seq with TF (as of 01.01.2017), so I decided to learn how to write my own, and document what I learn in the process.\n", 17 | "\n", 18 | "I deliberately try to be as explicit as possible. As it currently stands, TF code is the best source of documentation on itself, and I have a feeling that many conventions and design decisions are not documented anywhere except in the brains of Google Brain engineers. \n", 19 | "\n", 20 | "I hope this will be useful to people whose brains are wired like mine.\n", 21 | "\n", 22 | "**UPDATE**: as of r1.0 @ 16.02.2017, there is new official implementation in `tf.contrib.seq2seq`. See [tutorial #3](3-seq2seq-native-new.ipynb). Official tutorial reportedly be up soon. Personally I still find wiring dynamic encoder-decoder by hand insightful in many ways." 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "Here we implement plain seq2seq — forward-only encoder + decoder without attention. I'll try to follow closely the original architecture described in [Sutskever, Vinyals and Le (2014)](https://arxiv.org/abs/1409.3215). If you notice any deviations, please let me know." 
30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "Architecture diagram from their paper:\n", 37 | "![seq2seq architecture](pictures/1-seq2seq.png)\n", 38 | "Rectangles are encoder and decoder's recurrent layers. The encoder receives the `[A, B, C]` sequence as input. We don't care about encoder outputs, only about the hidden state it accumulates while reading the sequence. After the input sequence ends, the encoder passes its final state to the decoder, which receives `[<EOS>, W, X, Y, Z]` and is trained to output `[W, X, Y, Z, <EOS>]`. The `<EOS>` token is a special word in the vocabulary that signals to the decoder the beginning of translation." 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": {}, 44 | "source": [ 45 | "## Implementation details\n", 46 | "\n", 47 | "TensorFlow has its own [implementation of seq2seq](https://www.tensorflow.org/tutorials/seq2seq/). Recently it was moved from core examples to the [`tensorflow/models` repo](https://github.com/tensorflow/models/tree/master/tutorials/rnn/translate), and it uses the deprecated seq2seq implementation. Deprecation happened because it uses **static unrolling**.\n", 48 | "\n", 49 | "**Static unrolling** involves construction of a computation graph with a fixed number of time steps. Such a graph can only handle sequences of specific lengths. One solution for handling sequences of varying lengths is to create multiple graphs with different time lengths and separate the dataset into these buckets.\n", 50 | "\n", 51 | "**Dynamic unrolling** instead uses control flow ops to process the sequence step by step. In TF this is supposed to be more space-efficient and just as fast. This is now the recommended way to implement RNNs." 52 | ] 53 | }, 54 | { 55 | "cell_type": "markdown", 56 | "metadata": {}, 57 | "source": [ 58 | "## Vocabulary\n", 59 | "\n", 60 | "Seq2seq maps a sequence onto another sequence. Both sequences consist of integers from a fixed range. In language tasks, integers usually correspond to words: we first construct a vocabulary by assigning to every word in our corpus a serial integer. The first few integers are reserved for special tokens. We'll call the upper bound on the vocabulary the `vocabulary size`.\n", 61 | "\n", 62 | "Input data consists of sequences of integers."
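The next few cells convert such lists with `helpers.batch` from the accompanying `helpers.py`, which is not reproduced in this section. The following is only a rough sketch of the behaviour that function needs to have, consistent with how it is used in this notebook; the actual implementation in `helpers.py` may differ, and the name `batch_sketch` is made up for illustration:

```python
import numpy as np

def batch_sketch(inputs, max_sequence_length=None):
    """Rough equivalent of helpers.batch: pad integer sequences with zeros
    and lay them out time-major, i.e. with shape [max_time, batch_size]."""
    sequence_lengths = [len(seq) for seq in inputs]
    batch_size = len(inputs)
    if max_sequence_length is None:
        max_sequence_length = max(sequence_lengths)

    padded = np.zeros([max_sequence_length, batch_size], dtype=np.int32)  # PAD == 0
    for i, seq in enumerate(inputs):
        for t, element in enumerate(seq):
            padded[t, i] = element  # time is the first axis, batch the second
    return padded, sequence_lengths
```

For `[[5, 7, 8], [6, 3], [3], [1]]` such a function returns the `[3, 4]` matrix and the length list `[3, 2, 1, 1]` shown in the cells below.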
63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": 1, 68 | "metadata": { 69 | "collapsed": true 70 | }, 71 | "outputs": [], 72 | "source": [ 73 | "x = [[5, 7, 8], [6, 3], [3], [1]]" 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": {}, 79 | "source": [ 80 | "While manipulating such variable-length lists are convenient to humans, RNNs prefer a different layout:" 81 | ] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "execution_count": 2, 86 | "metadata": { 87 | "collapsed": false 88 | }, 89 | "outputs": [], 90 | "source": [ 91 | "import helpers\n", 92 | "xt, xlen = helpers.batch(x)" 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": 3, 98 | "metadata": { 99 | "collapsed": false 100 | }, 101 | "outputs": [ 102 | { 103 | "data": { 104 | "text/plain": [ 105 | "[[5, 7, 8], [6, 3], [3], [1]]" 106 | ] 107 | }, 108 | "execution_count": 3, 109 | "metadata": {}, 110 | "output_type": "execute_result" 111 | } 112 | ], 113 | "source": [ 114 | "x" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": 4, 120 | "metadata": { 121 | "collapsed": false 122 | }, 123 | "outputs": [ 124 | { 125 | "data": { 126 | "text/plain": [ 127 | "array([[5, 6, 3, 1],\n", 128 | " [7, 3, 0, 0],\n", 129 | " [8, 0, 0, 0]], dtype=int32)" 130 | ] 131 | }, 132 | "execution_count": 4, 133 | "metadata": {}, 134 | "output_type": "execute_result" 135 | } 136 | ], 137 | "source": [ 138 | "xt" 139 | ] 140 | }, 141 | { 142 | "cell_type": "markdown", 143 | "metadata": {}, 144 | "source": [ 145 | "Sequences form columns of a matrix of size `[max_time, batch_size]`. Sequences shorter then the longest one are padded with zeros towards the end. This layout is called `time-major`. It is slightly more efficient then `batch-major`. We will use it for the rest of the tutorial." 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": 5, 151 | "metadata": { 152 | "collapsed": false 153 | }, 154 | "outputs": [ 155 | { 156 | "data": { 157 | "text/plain": [ 158 | "[3, 2, 1, 1]" 159 | ] 160 | }, 161 | "execution_count": 5, 162 | "metadata": {}, 163 | "output_type": "execute_result" 164 | } 165 | ], 166 | "source": [ 167 | "xlen" 168 | ] 169 | }, 170 | { 171 | "cell_type": "markdown", 172 | "metadata": {}, 173 | "source": [ 174 | "For some forms of dynamic layout it is useful to have a pointer to terminals of every sequence in the batch in separate tensor (see following tutorials)." 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": {}, 180 | "source": [ 181 | "# Building a model" 182 | ] 183 | }, 184 | { 185 | "cell_type": "markdown", 186 | "metadata": {}, 187 | "source": [ 188 | "## Simple seq2seq" 189 | ] 190 | }, 191 | { 192 | "cell_type": "markdown", 193 | "metadata": {}, 194 | "source": [ 195 | "Encoder starts with empty state and runs through the input sequence. We are not interested in encoder's outputs, only in its `final_state`.\n", 196 | "\n", 197 | "Decoder uses encoder's `final_state` as its `initial_state`. Its inputs are a batch-sized matrix with `` token at the 1st time step and `` at the following. This is a rather crude setup, useful only for tutorial purposes. In practice, we would like to feed previously generated tokens after ``.\n", 198 | "\n", 199 | "Decoder's outputs are mapped onto the output space using `[hidden_units x output_vocab_size]` projection layer. 
This is necessary because we cannot make `hidden_units` of decoder arbitrarily large, while our target space would grow with the size of the dictionary.\n", 200 | "\n", 201 | "This kind of encoder-decoder is forced to learn fixed-length representation (specifically, `hidden_units` size) of the variable-length input sequence and restore output sequence only from this representation." 202 | ] 203 | }, 204 | { 205 | "cell_type": "code", 206 | "execution_count": 6, 207 | "metadata": { 208 | "collapsed": false 209 | }, 210 | "outputs": [], 211 | "source": [ 212 | "import numpy as np\n", 213 | "import tensorflow as tf\n", 214 | "import helpers\n", 215 | "\n", 216 | "tf.reset_default_graph()\n", 217 | "sess = tf.InteractiveSession()" 218 | ] 219 | }, 220 | { 221 | "cell_type": "code", 222 | "execution_count": 7, 223 | "metadata": { 224 | "collapsed": false 225 | }, 226 | "outputs": [ 227 | { 228 | "data": { 229 | "text/plain": [ 230 | "'1.3.0'" 231 | ] 232 | }, 233 | "execution_count": 7, 234 | "metadata": {}, 235 | "output_type": "execute_result" 236 | } 237 | ], 238 | "source": [ 239 | "tf.__version__" 240 | ] 241 | }, 242 | { 243 | "cell_type": "markdown", 244 | "metadata": {}, 245 | "source": [ 246 | "### Model inputs and outputs " 247 | ] 248 | }, 249 | { 250 | "cell_type": "markdown", 251 | "metadata": {}, 252 | "source": [ 253 | "First critical thing to decide: vocabulary size.\n", 254 | "\n", 255 | "Dynamic RNN models can be adapted to different batch sizes and sequence lengths without retraining (e.g. by serializing model parameters and Graph definitions via `tf.train.Saver`), but changing vocabulary size requires retraining the model." 256 | ] 257 | }, 258 | { 259 | "cell_type": "code", 260 | "execution_count": 8, 261 | "metadata": { 262 | "collapsed": true 263 | }, 264 | "outputs": [], 265 | "source": [ 266 | "PAD = 0\n", 267 | "EOS = 1\n", 268 | "\n", 269 | "vocab_size = 10\n", 270 | "input_embedding_size = 20\n", 271 | "\n", 272 | "encoder_hidden_units = 20\n", 273 | "decoder_hidden_units = encoder_hidden_units" 274 | ] 275 | }, 276 | { 277 | "cell_type": "markdown", 278 | "metadata": {}, 279 | "source": [ 280 | "Nice way to understand complicated function is to study its signature - inputs and outputs. 
With pure functions, only the input-output relation matters.\n", 281 | "\n", 282 | "- `encoder_inputs` int32 tensor is shaped `[encoder_max_time, batch_size]`\n", 283 | "- `decoder_targets` int32 tensor is shaped `[decoder_max_time, batch_size]`" 284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": 9, 289 | "metadata": { 290 | "collapsed": false 291 | }, 292 | "outputs": [], 293 | "source": [ 294 | "encoder_inputs = tf.placeholder(shape=(None, None), dtype=tf.int32, name='encoder_inputs')\n", 295 | "decoder_targets = tf.placeholder(shape=(None, None), dtype=tf.int32, name='decoder_targets')" 296 | ] 297 | }, 298 | { 299 | "cell_type": "markdown", 300 | "metadata": {}, 301 | "source": [ 302 | "We'll add one additional placeholder tensor: \n", 303 | "- `decoder_inputs` int32 tensor is shaped `[decoder_max_time, batch_size]`" 304 | ] 305 | }, 306 | { 307 | "cell_type": "code", 308 | "execution_count": 10, 309 | "metadata": { 310 | "collapsed": false 311 | }, 312 | "outputs": [], 313 | "source": [ 314 | "decoder_inputs = tf.placeholder(shape=(None, None), dtype=tf.int32, name='decoder_inputs')" 315 | ] 316 | }, 317 | { 318 | "cell_type": "markdown", 319 | "metadata": {}, 320 | "source": [ 321 | "We actually don't want to feed `decoder_inputs` manually — they are a function of either `decoder_targets` or previous decoder outputs during rollout. However, there are different ways to construct them. It might be illustrative to explicitly specify them for our first seq2seq implementation.\n", 322 | "\n", 323 | "During training, `decoder_inputs` will consist of the `<EOS>` token concatenated with `decoder_targets` along the time axis. In this way, we always pass the target sequence as the history to the decoder, regardless of what it actually predicts. This can introduce a distribution shift from training to prediction. \n", 324 | "In prediction mode, the model will receive tokens it previously generated (via argmax over logits), not the ground truth, which would be unknowable." 325 | ] 326 | }, 327 | { 328 | "cell_type": "markdown", 329 | "metadata": {}, 330 | "source": [ 331 | "Notice that all shapes are specified with `None`s (dynamic). We can use batches of any size with any number of timesteps. This is convenient and efficient; however, there are obvious constraints: \n", 332 | "- Feed values for all tensors should have the same `batch_size`\n", 333 | "- Decoder inputs and outputs (`decoder_inputs` and `decoder_targets`) should have the same `decoder_max_time`" 334 | ] 335 | }, 336 | { 337 | "cell_type": "markdown", 338 | "metadata": {}, 339 | "source": [ 340 | "### Embeddings\n", 341 | "\n", 342 | "`encoder_inputs` and `decoder_inputs` are int32 tensors of shape `[max_time, batch_size]`, while encoder and decoder RNNs expect a dense vector representation of words, `[max_time, batch_size, input_embedding_size]`. We convert one to the other using *word embeddings*. Specifics of working with embeddings are nicely described in the [official tutorial on embeddings](https://www.tensorflow.org/tutorials/word2vec/)." 343 | ] 344 | }, 345 | { 346 | "cell_type": "markdown", 347 | "metadata": {}, 348 | "source": [ 349 | "First we initialize the embedding matrix. Initialization is random. We rely on our end-to-end training to learn vector representations for words jointly with the encoder and decoder."
350 | ] 351 | }, 352 | { 353 | "cell_type": "code", 354 | "execution_count": 11, 355 | "metadata": { 356 | "collapsed": false 357 | }, 358 | "outputs": [], 359 | "source": [ 360 | "embeddings = tf.Variable(tf.random_uniform([vocab_size, input_embedding_size], -1.0, 1.0), dtype=tf.float32)" 361 | ] 362 | }, 363 | { 364 | "cell_type": "markdown", 365 | "metadata": {}, 366 | "source": [ 367 | "We use `tf.nn.embedding_lookup` to *index embedding matrix*: given word `4`, we represent it as 4th column of embedding matrix. \n", 368 | "This operation is lightweight, compared with alternative approach of one-hot encoding word `4` as `[0,0,0,1,0,0,0,0,0,0]` (vocab size 10) and then multiplying it by embedding matrix.\n", 369 | "\n", 370 | "Additionally, we don't need to compute gradients for any columns except 4th.\n", 371 | "\n", 372 | "Encoder and decoder will share embeddings. It's all words, right? Well, digits in this case. In real NLP application embedding matrix can get very large, with 100k or even 1m columns." 373 | ] 374 | }, 375 | { 376 | "cell_type": "code", 377 | "execution_count": 12, 378 | "metadata": { 379 | "collapsed": true 380 | }, 381 | "outputs": [], 382 | "source": [ 383 | "encoder_inputs_embedded = tf.nn.embedding_lookup(embeddings, encoder_inputs)\n", 384 | "decoder_inputs_embedded = tf.nn.embedding_lookup(embeddings, decoder_inputs)" 385 | ] 386 | }, 387 | { 388 | "cell_type": "markdown", 389 | "metadata": {}, 390 | "source": [ 391 | "### Encoder\n", 392 | "\n", 393 | "The centerpiece of all things RNN in TensorFlow is `RNNCell` class and its descendants (like `LSTMCell`). But they are outside of the scope of this post — nice [official tutorial](https://www.tensorflow.org/tutorials/recurrent/) is available. \n", 394 | "\n", 395 | "`@TODO: RNNCell as a factory`" 396 | ] 397 | }, 398 | { 399 | "cell_type": "code", 400 | "execution_count": 13, 401 | "metadata": { 402 | "collapsed": false 403 | }, 404 | "outputs": [], 405 | "source": [ 406 | "encoder_cell = tf.contrib.rnn.LSTMCell(encoder_hidden_units)\n", 407 | "\n", 408 | "encoder_outputs, encoder_final_state = tf.nn.dynamic_rnn(\n", 409 | " encoder_cell, encoder_inputs_embedded,\n", 410 | " dtype=tf.float32, time_major=True,\n", 411 | ")\n", 412 | "\n", 413 | "del encoder_outputs" 414 | ] 415 | }, 416 | { 417 | "cell_type": "markdown", 418 | "metadata": {}, 419 | "source": [ 420 | "We discard `encoder_outputs` because we are not interested in them within seq2seq framework. What we actually want is `encoder_final_state` — state of LSTM's hidden cells at the last moment of the Encoder rollout.\n", 421 | "\n", 422 | "`encoder_final_state` is also called \"thought vector\". We will use it as initial state for the Decoder. In seq2seq without attention this is the only point where Encoder passes information to Decoder. We hope that backpropagation through time (BPTT) algorithm will tune the model to pass enough information throught the thought vector for correct sequence output decoding." 
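Going back to the embedding-lookup remark above: the equivalence between indexing the embedding matrix with a word id and multiplying by a one-hot vector can be checked with plain numpy. This is a standalone illustration with made-up numbers, not part of the model graph:

```python
import numpy as np

# illustrative only: vocab of 10 "words", 20-dimensional embeddings
embedding_matrix = np.random.randn(10, 20).astype(np.float32)

word_id = 4
one_hot = np.zeros(10, dtype=np.float32)
one_hot[word_id] = 1.0

# multiplying by a one-hot vector ...
via_matmul = one_hot.dot(embedding_matrix)
# ... picks out exactly the same vector as directly indexing the matrix
via_lookup = embedding_matrix[word_id]

assert np.allclose(via_matmul, via_lookup)
```

The lookup does the same thing without materializing the one-hot vector or touching the other rows.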
423 | ] 424 | }, 425 | { 426 | "cell_type": "code", 427 | "execution_count": 14, 428 | "metadata": { 429 | "collapsed": false 430 | }, 431 | "outputs": [ 432 | { 433 | "data": { 434 | "text/plain": [ 435 | "LSTMStateTuple(c=, h=)" 436 | ] 437 | }, 438 | "execution_count": 14, 439 | "metadata": {}, 440 | "output_type": "execute_result" 441 | } 442 | ], 443 | "source": [ 444 | "encoder_final_state" 445 | ] 446 | }, 447 | { 448 | "cell_type": "markdown", 449 | "metadata": {}, 450 | "source": [ 451 | "TensorFlow LSTM implementation stores state as a tuple of tensors. \n", 452 | "- `encoder_final_state.h` is the hidden state, i.e. the cell's output at the last step\n", 453 | "- `encoder_final_state.c` is the cell state, the LSTM's internal memory, which can potentially be transformed with some wrapper" 454 | ] 455 | }, 456 | { 457 | "cell_type": "markdown", 458 | "metadata": {}, 459 | "source": [ 460 | "### Decoder" 461 | ] 462 | }, 463 | { 464 | "cell_type": "code", 465 | "execution_count": 15, 466 | "metadata": { 467 | "collapsed": false 468 | }, 469 | "outputs": [], 470 | "source": [ 471 | "decoder_cell = tf.contrib.rnn.LSTMCell(decoder_hidden_units)\n", 472 | "\n", 473 | "decoder_outputs, decoder_final_state = tf.nn.dynamic_rnn(\n", 474 | " decoder_cell, decoder_inputs_embedded,\n", 475 | "\n", 476 | " initial_state=encoder_final_state,\n", 477 | "\n", 478 | " dtype=tf.float32, time_major=True, scope=\"plain_decoder\",\n", 479 | ")" 480 | ] 481 | }, 482 | { 483 | "cell_type": "markdown", 484 | "metadata": {}, 485 | "source": [ 486 | "Since we pass `encoder_final_state` as `initial_state` to the decoder, they should be compatible. This means the same cell type (`LSTMCell` in our case), the same number of `hidden_units` and the same number of layers (a single layer). I suppose this can be relaxed if we additionally pass `encoder_final_state` through a one-layer MLP." 487 | ] 488 | }, 489 | { 490 | "cell_type": "markdown", 491 | "metadata": {}, 492 | "source": [ 493 | "With the encoder, we were not interested in the cell's outputs. But the decoder's outputs are what we are actually after: we use them to get a distribution over words of the output sequence.\n", 494 | "\n", 495 | "At this point the `decoder_cell` output is a `hidden_units`-sized vector at every timestep. However, for training and prediction we need logits of size `vocab_size`. A reasonable thing to do is to put a linear layer (a fully-connected layer without an activation function) on top of the LSTM output to get non-normalized logits. By convention, this layer is called the projection layer."
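The next cell uses `tf.contrib.layers.linear` to build this projection. Under the hood it amounts to roughly the following (a sketch only; the contrib layer handles variable scoping and initialization differently, and the names `W_proj`, `b_proj`, `manual_decoder_logits` are made up here):

```python
# rough equivalent of tf.contrib.layers.linear(decoder_outputs, vocab_size)
W_proj = tf.Variable(tf.random_uniform([decoder_hidden_units, vocab_size], -1.0, 1.0), dtype=tf.float32)
b_proj = tf.Variable(tf.zeros([vocab_size]), dtype=tf.float32)

# flatten [max_time, batch_size, hidden_units] -> [max_time*batch_size, hidden_units],
# apply the affine map, then restore the time/batch dimensions
outputs_flat = tf.reshape(decoder_outputs, (-1, decoder_hidden_units))
logits_flat = tf.matmul(outputs_flat, W_proj) + b_proj
manual_decoder_logits = tf.reshape(
    logits_flat,
    tf.concat([tf.shape(decoder_outputs)[:2], [vocab_size]], axis=0))
```

This flatten-matmul-reshape pattern is essentially what tutorial #2 does explicitly when it manages `W` and `b` by hand.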
496 | ] 497 | }, 498 | { 499 | "cell_type": "code", 500 | "execution_count": 16, 501 | "metadata": { 502 | "collapsed": false 503 | }, 504 | "outputs": [], 505 | "source": [ 506 | "decoder_logits = tf.contrib.layers.linear(decoder_outputs, vocab_size)\n", 507 | "\n", 508 | "decoder_prediction = tf.argmax(decoder_logits, 2)" 509 | ] 510 | }, 511 | { 512 | "cell_type": "markdown", 513 | "metadata": {}, 514 | "source": [ 515 | "### Optimizer" 516 | ] 517 | }, 518 | { 519 | "cell_type": "code", 520 | "execution_count": 17, 521 | "metadata": { 522 | "collapsed": false 523 | }, 524 | "outputs": [ 525 | { 526 | "data": { 527 | "text/plain": [ 528 | "" 529 | ] 530 | }, 531 | "execution_count": 17, 532 | "metadata": {}, 533 | "output_type": "execute_result" 534 | } 535 | ], 536 | "source": [ 537 | "decoder_logits" 538 | ] 539 | }, 540 | { 541 | "cell_type": "markdown", 542 | "metadata": {}, 543 | "source": [ 544 | "RNN outputs tensor of shape `[max_time, batch_size, hidden_units]` which projection layer maps onto `[max_time, batch_size, vocab_size]`. `vocab_size` part of the shape is static, while `max_time` and `batch_size` is dynamic." 545 | ] 546 | }, 547 | { 548 | "cell_type": "code", 549 | "execution_count": 18, 550 | "metadata": { 551 | "collapsed": false 552 | }, 553 | "outputs": [], 554 | "source": [ 555 | "stepwise_cross_entropy = tf.nn.softmax_cross_entropy_with_logits(\n", 556 | " labels=tf.one_hot(decoder_targets, depth=vocab_size, dtype=tf.float32),\n", 557 | " logits=decoder_logits,\n", 558 | ")\n", 559 | "\n", 560 | "loss = tf.reduce_mean(stepwise_cross_entropy)\n", 561 | "train_op = tf.train.AdamOptimizer().minimize(loss)" 562 | ] 563 | }, 564 | { 565 | "cell_type": "code", 566 | "execution_count": 19, 567 | "metadata": { 568 | "collapsed": false 569 | }, 570 | "outputs": [], 571 | "source": [ 572 | "sess.run(tf.global_variables_initializer())" 573 | ] 574 | }, 575 | { 576 | "cell_type": "markdown", 577 | "metadata": {}, 578 | "source": [ 579 | "### Test forward pass\n", 580 | "\n", 581 | "Did I say that deep learning is a game of shapes? When building a Graph, TF will throw errors when static shapes are not matching. However, mismatches between dynamic shapes are often only discovered when we try to run something through the graph.\n", 582 | "\n", 583 | "\n", 584 | "So let's try running something. For that we need to prepare values we will feed into placeholders." 
585 | ] 586 | }, 587 | { 588 | "cell_type": "markdown", 589 | "metadata": {}, 590 | "source": [ 591 | "```\n", 592 | "this is key part where everything comes together\n", 593 | "\n", 594 | "@TODO: describe\n", 595 | "- how encoder shape is fixed to max\n", 596 | "- how decoder shape is arbitraty and determined by inputs, but should probably be longer then encoder's\n", 597 | "- how decoder input values are also arbitraty, and how we use GO token, and what are those 0s, and what can be used instead (shifted gold sequence, beam search)\n", 598 | "@TODO: add references\n", 599 | "```" 600 | ] 601 | }, 602 | { 603 | "cell_type": "code", 604 | "execution_count": 20, 605 | "metadata": { 606 | "collapsed": false 607 | }, 608 | "outputs": [ 609 | { 610 | "name": "stdout", 611 | "output_type": "stream", 612 | "text": [ 613 | "batch_encoded:\n", 614 | "[[6 3 9]\n", 615 | " [0 4 8]\n", 616 | " [0 0 7]]\n", 617 | "decoder inputs:\n", 618 | "[[1 1 1]\n", 619 | " [0 0 0]\n", 620 | " [0 0 0]\n", 621 | " [0 0 0]]\n", 622 | "decoder predictions:\n", 623 | "[[2 4 6]\n", 624 | " [2 4 2]\n", 625 | " [1 4 5]\n", 626 | " [5 4 4]]\n" 627 | ] 628 | } 629 | ], 630 | "source": [ 631 | "batch_ = [[6], [3, 4], [9, 8, 7]]\n", 632 | "\n", 633 | "batch_, batch_length_ = helpers.batch(batch_)\n", 634 | "print('batch_encoded:\\n' + str(batch_))\n", 635 | "\n", 636 | "din_, dlen_ = helpers.batch(np.ones(shape=(3, 1), dtype=np.int32),\n", 637 | " max_sequence_length=4)\n", 638 | "print('decoder inputs:\\n' + str(din_))\n", 639 | "\n", 640 | "pred_ = sess.run(decoder_prediction,\n", 641 | " feed_dict={\n", 642 | " encoder_inputs: batch_,\n", 643 | " decoder_inputs: din_,\n", 644 | " })\n", 645 | "print('decoder predictions:\\n' + str(pred_))" 646 | ] 647 | }, 648 | { 649 | "cell_type": "markdown", 650 | "metadata": {}, 651 | "source": [ 652 | "Successful forward computation, everything is wired correctly." 653 | ] 654 | }, 655 | { 656 | "cell_type": "markdown", 657 | "metadata": {}, 658 | "source": [ 659 | "## Training on the toy task" 660 | ] 661 | }, 662 | { 663 | "cell_type": "markdown", 664 | "metadata": {}, 665 | "source": [ 666 | "We will teach our model to memorize and reproduce input sequence. Sequences will be random, with varying length.\n", 667 | "\n", 668 | "Since random sequences do not contain any structure, model will not be able to exploit any patterns in data. It will simply encode sequence in a thought vector, then decode from it." 
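As with `helpers.batch`, the `helpers.random_sequences` generator used in the next cell lives in `helpers.py`, which is not shown in this section. Below is a minimal sketch of a generator with the behaviour the next cell relies on (endless batches, lengths in `[length_from, length_to]`, values in `[vocab_lower, vocab_upper)`); the real implementation may differ, and `random_sequences_sketch` is a made-up name:

```python
import random

def random_sequences_sketch(length_from, length_to, vocab_lower, vocab_upper, batch_size):
    """Endless generator of batches of random integer sequences."""
    while True:
        yield [
            [random.randrange(vocab_lower, vocab_upper)
             for _ in range(random.randint(length_from, length_to))]
            for _ in range(batch_size)
        ]
```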
669 | ] 670 | }, 671 | { 672 | "cell_type": "code", 673 | "execution_count": 21, 674 | "metadata": { 675 | "collapsed": false 676 | }, 677 | "outputs": [ 678 | { 679 | "name": "stdout", 680 | "output_type": "stream", 681 | "text": [ 682 | "head of the batch:\n", 683 | "[4, 3, 6, 7, 8, 2, 7, 9]\n", 684 | "[4, 9, 3, 2, 3, 9, 5]\n", 685 | "[2, 7, 4, 7]\n", 686 | "[8, 4, 6, 6, 9, 2]\n", 687 | "[5, 8, 8, 8, 6, 2]\n", 688 | "[2, 7, 3, 2, 4]\n", 689 | "[4, 8, 6]\n", 690 | "[7, 8, 7, 3]\n", 691 | "[7, 2, 3, 3, 7, 7, 6, 2]\n", 692 | "[4, 5, 4, 7, 6, 5, 8]\n" 693 | ] 694 | } 695 | ], 696 | "source": [ 697 | "batch_size = 100\n", 698 | "\n", 699 | "batches = helpers.random_sequences(length_from=3, length_to=8,\n", 700 | " vocab_lower=2, vocab_upper=10,\n", 701 | " batch_size=batch_size)\n", 702 | "\n", 703 | "print('head of the batch:')\n", 704 | "for seq in next(batches)[:10]:\n", 705 | " print(seq)" 706 | ] 707 | }, 708 | { 709 | "cell_type": "code", 710 | "execution_count": 22, 711 | "metadata": { 712 | "collapsed": true 713 | }, 714 | "outputs": [], 715 | "source": [ 716 | "def next_feed():\n", 717 | " batch = next(batches)\n", 718 | " encoder_inputs_, _ = helpers.batch(batch)\n", 719 | " decoder_targets_, _ = helpers.batch(\n", 720 | " [(sequence) + [EOS] for sequence in batch]\n", 721 | " )\n", 722 | " decoder_inputs_, _ = helpers.batch(\n", 723 | " [[EOS] + (sequence) for sequence in batch]\n", 724 | " )\n", 725 | " return {\n", 726 | " encoder_inputs: encoder_inputs_,\n", 727 | " decoder_inputs: decoder_inputs_,\n", 728 | " decoder_targets: decoder_targets_,\n", 729 | " }" 730 | ] 731 | }, 732 | { 733 | "cell_type": "markdown", 734 | "metadata": {}, 735 | "source": [ 736 | "Given encoder_inputs `[5, 6, 7]`, decoder_targets would be `[5, 6, 7, 1]`, where 1 is for `EOS`, and decoder_inputs would be `[1, 5, 6, 7]` - decoder_inputs are lagged by 1 step, passing previous token as input at current step." 
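When the sequences in a batch have different lengths, padding enters the picture as well. A small worked example, computed by hand with the time-major padding shown earlier (PAD = 0, EOS = 1):

```python
# batch of two sequences of different lengths
batch = [[5, 6, 7], [2, 8]]

# encoder_inputs_ = helpers.batch(batch):
# [[5, 2],
#  [6, 8],
#  [7, 0]]

# decoder_targets_ = helpers.batch([[5, 6, 7, 1], [2, 8, 1]]):
# [[5, 2],
#  [6, 8],
#  [7, 1],
#  [1, 0]]

# decoder_inputs_ = helpers.batch([[1, 5, 6, 7], [1, 2, 8]]):
# [[1, 1],
#  [5, 2],
#  [6, 8],
#  [7, 0]]
```

Each column is one example; the shorter sequence is simply zero-padded after its EOS.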
737 | ] 738 | }, 739 | { 740 | "cell_type": "code", 741 | "execution_count": 23, 742 | "metadata": { 743 | "collapsed": true 744 | }, 745 | "outputs": [], 746 | "source": [ 747 | "loss_track = []" 748 | ] 749 | }, 750 | { 751 | "cell_type": "code", 752 | "execution_count": 24, 753 | "metadata": { 754 | "collapsed": false, 755 | "scrolled": false 756 | }, 757 | "outputs": [ 758 | { 759 | "name": "stdout", 760 | "output_type": "stream", 761 | "text": [ 762 | "batch 0\n", 763 | " minibatch loss: 2.3455774784088135\n", 764 | " sample 1:\n", 765 | " input > [4 3 2 0 0 0 0 0]\n", 766 | " predicted > [2 2 7 2 5 5 5 5 5]\n", 767 | " sample 2:\n", 768 | " input > [8 5 9 6 9 4 0 0]\n", 769 | " predicted > [2 1 1 9 9 9 9 9 9]\n", 770 | " sample 3:\n", 771 | " input > [3 3 3 6 8 2 6 0]\n", 772 | " predicted > [2 3 3 3 3 3 5 5 5]\n", 773 | "\n", 774 | "batch 1000\n", 775 | " minibatch loss: 0.3058355748653412\n", 776 | " sample 1:\n", 777 | " input > [3 3 5 9 3 4 6 7]\n", 778 | " predicted > [3 3 5 9 3 4 6 7 1]\n", 779 | " sample 2:\n", 780 | " input > [8 4 9 8 3 8 0 0]\n", 781 | " predicted > [8 4 8 8 3 8 1 0 0]\n", 782 | " sample 3:\n", 783 | " input > [6 5 7 0 0 0 0 0]\n", 784 | " predicted > [6 5 7 1 0 0 0 0 0]\n", 785 | "\n", 786 | "batch 2000\n", 787 | " minibatch loss: 0.1154913678765297\n", 788 | " sample 1:\n", 789 | " input > [5 4 8 5 0 0 0 0]\n", 790 | " predicted > [5 4 8 5 1 0 0 0 0]\n", 791 | " sample 2:\n", 792 | " input > [8 2 6 2 6 0 0 0]\n", 793 | " predicted > [8 2 6 2 6 1 0 0 0]\n", 794 | " sample 3:\n", 795 | " input > [6 2 9 5 0 0 0 0]\n", 796 | " predicted > [6 2 9 5 1 0 0 0 0]\n", 797 | "\n", 798 | "batch 3000\n", 799 | " minibatch loss: 0.09871210902929306\n", 800 | " sample 1:\n", 801 | " input > [6 7 9 6 0 0 0 0]\n", 802 | " predicted > [6 7 9 6 1 0 0 0 0]\n", 803 | " sample 2:\n", 804 | " input > [7 5 3 5 4 9 9 0]\n", 805 | " predicted > [7 5 3 5 4 9 9 1 0]\n", 806 | " sample 3:\n", 807 | " input > [2 5 6 2 4 9 7 6]\n", 808 | " predicted > [2 5 6 2 4 9 7 3 1]\n", 809 | "\n" 810 | ] 811 | } 812 | ], 813 | "source": [ 814 | "max_batches = 3001\n", 815 | "batches_in_epoch = 1000\n", 816 | "\n", 817 | "try:\n", 818 | " for batch in range(max_batches):\n", 819 | " fd = next_feed()\n", 820 | " _, l = sess.run([train_op, loss], fd)\n", 821 | " loss_track.append(l)\n", 822 | "\n", 823 | " if batch == 0 or batch % batches_in_epoch == 0:\n", 824 | " print('batch {}'.format(batch))\n", 825 | " print(' minibatch loss: {}'.format(sess.run(loss, fd)))\n", 826 | " predict_ = sess.run(decoder_prediction, fd)\n", 827 | " for i, (inp, pred) in enumerate(zip(fd[encoder_inputs].T, predict_.T)):\n", 828 | " print(' sample {}:'.format(i + 1))\n", 829 | " print(' input > {}'.format(inp))\n", 830 | " print(' predicted > {}'.format(pred))\n", 831 | " if i >= 2:\n", 832 | " break\n", 833 | " print()\n", 834 | "except KeyboardInterrupt:\n", 835 | " print('training interrupted')" 836 | ] 837 | }, 838 | { 839 | "cell_type": "code", 840 | "execution_count": 25, 841 | "metadata": { 842 | "collapsed": false 843 | }, 844 | "outputs": [ 845 | { 846 | "name": "stdout", 847 | "output_type": "stream", 848 | "text": [ 849 | "loss 0.0982 after 300100 examples (batch_size=100)\n" 850 | ] 851 | }, 852 | { 853 | "data": { 854 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXcAAAD8CAYAAACMwORRAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xl8FPX9x/HXJzf3lSA34RIFRNAIKEe9RbRFWv1Vbaul\nKtraqq1tf9RapbXWag+rrdWftdTWelWrFgWtRVFRK8gV7vsMBBKuJJCEXN/fH7uJSciSa7OzO3k/\nH488mJ2Z3fkMA+/Mznzn+zXnHCIi4i9xXhcgIiLhp3AXEfEhhbuIiA8p3EVEfEjhLiLiQwp3EREf\nUriLiPiQwl1ExIcU7iIiPpTg1YZTU1Ndenq6V5sXEYlJS5cu3e+cS6tvPc/CPT09nSVLlni1eRGR\nmGRmOxqyni7LiIj4kMJdRMSHFO4iIj6kcBcR8SGFu4iIDyncRUR8SOEuIuJDMRfu+48c46evr6G0\nvMLrUkREolbMhfs76/bxl4+28/GWA16XIiIStWIu3CcMCTx1u+dwkceViIhEr5gL9+4dkomPM3Yf\nUriLiIQSc+GeGB9H785t2HGw0OtSRESiVsyFO0D/bm3ZceCo12WIiEStmAz39G7t2L5f4S4iEkpM\nhnu39knkF5epOaSISAgxGe6Zuw4DsDIrz+NKRESiU0yG+1UZfQEo05m7iEidYjLcu7VLAuDN1Xs9\nrkREJDrFZLifld4VgINHSzyuREQkOsVkuMfFGWMGdGVvfrHXpYiIRKWYDHeAtA7J7C845nUZIiJR\nKXbDvX0yuQp3EZE6xW64d0im4FgZRSXlXpciIhJ1YjbcN+0rAOAPCzZ5XImISPSJ2XCfOqo3ACVl\nausuIlJbzIb7505OIyk+jrg487oUEZGoE7PhHhdnlJRX8H/vb/W6FBGRqBOz4V6duiEQEakppsP9\nmjH9ACircB5XIiISXWI63AektgUU7iIitcV0uCfEBcrXZRkRkZpiOtwrB+s4cqzM40pERKJLTIf7\ni0t2ATDlkYUeVyIiEl1iOty/MrY/APnFOnMXEakupsP9qow+AEw5rYfHlYiIRJd6w93M+prZAjNb\na2ZrzOz2OtYxM3vUzDab2UozO6Nlyq2pY0oiAG9pRCYRkRoSGrBOGXCnc26ZmXUAlprZf5xza6ut\ncykwJPgzFng8+GdEqCWkiEhN9Z65O+eynXPLgtMFwDqgd63VpgJ/cwGfAJ3NrGfYqxURkQZp1DV3\nM0sHRgOLai3qDeyq9jqL438BtCjndPouIlKpweFuZu2BfwJ3OOfym7IxM5thZkvMbElubm5TPuI4\n00YHfodszjkSls8TEfGDBoW7mSUSCPZnnXOv1LHKbqBvtdd9gvNqcM496ZzLcM5lpKWlNaXe48xd\nmQ3Ay8uywvJ5IiJ+0JDWMgb8GVjnnPttiNXmANcFW82MA/Kcc9lhrDOkjm0C94Tzi9TWXUSkUkNa\ny4wHvgasMrMVwXl3Af0AnHNPAPOAKcBmoBCYHv5S69anS1v2HymhqEThLiJSqd5wd859CJxwuCMX\nuJt5a7iKaownvnom4x54h0Fp7b3YvIhIVIrpJ1QBenRKoW1SPHlFpV6XIiISNWI+3AE6t0nksMJd\nRKSKL8K9TVI8q3fneV2GiEjUaMgN1ai3JfcoAEUl5bRJive4GhER7/nizL1SiUZkEhEBfBbuBcW6\n7i4iAj4J9xG9OwLwm7c3elyJiEh08EW4V7ZxX77zkMeViIhEB1+Ee4eUwH3h7QcKPa5ERCQ6+CLc\nZ156qtcliIhEFV+Ee/vkBC47LTA2iPp1FxHxSbgDzF0V6IRy3iqNpyoi4ptwr3Trc8u8LkFExHO+\nC3cREfFRuM+7bWLVdKmeVBWRVs434T6sV8eq6eLScg8rERHxnm/CvbriUp25i0jr5stwP1amM3cR\nad18Ge6rd+d7XYKIiKd8Ge53vbrK6xJERDzly3AvU2sZEWnlfBXuw4MtZvKLy3j4P+r+V0RaL1+F\n+9++MaZq+pF3NnlYiYiIt3wV7kkJvtodEZEm81UaxseZ1yWIiEQFX4V7m8R4r0sQEYkKvgp3M525\ni4iAz8K9trdWq293EWmdfB3ut/x9qdcliIh4wnfh/sPJQ2u81rB7ItIa+S7cv3XuYGZMGlj1emVW\nnofViIh4w3fhDvDp9oNV0yXqikBEWiFfhvvynYerpnMLjnlYiYiIN3wZ7vdPG1E1/a1nl+m6u4i0\nOvWGu5nNNrMcM1sdYvm5ZpZnZiuCP/eEv8zG+dIZfWq8/uvH270pRETEIw05c38amFzPOgudc6OC\nPz9rflnNk1LrSdXH3tviUSUiIt6oN9ydcx8AB+tbL5rpsoyItDbhuuZ+jpmtNLM3zWx4mD4zbPYf\nKWH+2n1elyEiEjHhCPdlQD/n3Ejg98BroVY0sxlmtsTMluTm5oZh0w333OKdEd2eiIiXmh3uzrl8\n59yR4PQ8INHMUkOs+6RzLsM5l5GWltbcTTdKcWl5RLcnIuKlZoe7mfWwYHeMZjYm+JkHmvu54fbx\nlqgrSUSkxTSkKeTzwH+BoWaWZWY3mNktZnZLcJUrgdVmlgk8ClztouAO5u0XDPG6BBERzyTUt4Jz\n7pp6lv8B+EPYKgqTPl3aeF2CiIhnfPmEKkBdXx2i4AuFiEhE+DbcR/TqdNy8J97f6kElIiKR59tw\nH9arI9eO7Vdj3oNvrfeoGhGRyPJtuAP07JjidQkiIp7wdbjf/LlBXpcgIuIJX4d7UsLxu/fBxsg+\nGSsi4gVfh3tdHluw2esSRERaXKsL90XbYrqDSxGRBml14Q6QPnOu1yWIiLQo34f7iN4dvS5BRCTi\nfB/uXzi9l9cliIhEnO/Dffr4AfxsatSNHyIi0qJ8H+6J8XFcd3b6cfPTZ85l+/6jkS9IRCQCfB/u\nJzJ/3T7mrswmfeZcdh0s9LocEZGwaTXhvv6+yXzvopNrzCstd7yyLAuADXsLvChLRKRFtJpwT0mM\n57ZaA3g8+NZ6FmzIASCu1fxNiEhr0OojrSLYxXtwpEAREV9odeF+92Wn1jk/XuEuIj7S6sK9fXLd\nIwvGKdxFxEdaXbiHEqdsFxEfaXXhHmoU1U+3H4poHSIiLan1hXuIdH/8fXUFLCL+0erCPZTi0gqv\nSxARCZtWF+4dUuq+oQqQnVcUwUpERFpOqwv3y07ryS+mncZDV448btnZD7xLfnGpB1WJiIRX6NNY\nn4qLM64d2w+AbfuP8vh7W2osz8k/RseURC9KExEJm1Z35l4fNXcXET9o1eGuHBcRv2rV4V4XBb6I\n+IHCvZbzf/M+6TPncriwxOtSRESarFWHe1JC6N3/+yc7IliJiEh4tepwv3nSIG6eNLDOZR9vORDh\nakREwqdVh3ubpHh+NOVUzuzf5bhlCncRiWWtOtwr/ePms70uQUQkrFrdQ0x1iQ/R3++Db62nvMJx\nx4VDaJukvyoRiR31JpaZzQYuB3KccyPqWG7AI8AUoBD4
unNuWbgL9ULl06spCXF896KTNRSfiMSM\nhlyWeRqYfILllwJDgj8zgMebX1bk3TBhQMhlj767mV+/vSGC1YiINE+94e6c+wA4eIJVpgJ/cwGf\nAJ3NrGe4CoyUn1w+jH/dOj7k8r/9V00jRSR2hOOGam9gV7XXWcF5xzGzGWa2xMyW5ObmhmHT4XV6\n384hlxUUl7FmT14EqxERabqItpZxzj3pnMtwzmWkpaVFctMNtuwnF4VcdtmjH7J+b34EqxERaZpw\nhPtuoG+1132C82JS13ZJjOzTKeRyjbUqIrEgHOE+B7jOAsYBec657DB8rmdenBG63Xt5uYbjE5Ho\n15CmkM8D5wKpZpYF3AskAjjnngDmEWgGuZlAU8jpLVVspLRJig+5rKwixAjbIiJRpN5wd85dU89y\nB9watoqixKST0/hg4/E3fXccKPSgGhGRxlH3A430zCc7KNfZu4hEOYV7CANT24VcNuiueazKUrNI\nEYleCvcQfjTlFJ6eflbI5bM/2hbBakREGkfhHkJyQjznDu0ecnlSvP7qRCR6KaGaqKi03OsSRERC\nUrjXo2u7pDrnz8ncQ6ChENz3xlrueGF5JMsSETkhhXs93vjOBO6fNoK2SfFcdWafGsvufCmTT7Ye\n4M8fbuO1FXs8qlBE5HgagaIevTq34Stj+/OVsf2Z/WHNm6ivLNvNK8titqcFEfExnbk3QkK8BusQ\nkdigcG+EhDj9dYlIbFBaNUJCiLFWRUSijcK9ES4efhKn9uwYcvntLyznWFk5RSVqJiki3rLK5nyR\nlpGR4ZYsWeLJtpsrfebcetdZOetiOqYkRqAaEWlNzGypcy6jvvV05t4EN55gMO1KI2e9TUFxaQSq\nERE5nsK9CWZMGtig9Q4cKWnhSkRE6qZwb0GlGrVJRDyicG+CyrsU3Tsks/n+S0OuN/uj7VSo73cR\n8YDCvQkqgjehzSAhPo5hIVrQPL94J798a30kSxMRARTuTVLZwMgItHufPj495LpPfrCV8379Hr95\newO7DmqIPhGJDIV7E6R1SGZE7448eOVIAK7K6HvC9bftP8rv393MLX9fGonyRETUcVhTJMbH8cZ3\nJjb6fWv25LdANSIix9OZe4S9tGSXnmAVkRancI+wH7y8kh/+cyVlaiYpIi1Il2XCZP73JpEUH8/a\n7Px6r62/nrmHeIMHrxxJckJ8hCoUkdZEZ+5hMrh7B/p1a8uEIakAfHVcv5BD9AG8tmIP189eTL66\nKBCRFqCOw1pATn4xXdslkVdUypef/ITNOUdOuP4fv3IGU07rGaHqRCSWqeMwD3XvmEJCfBzd2ifz\n7zsm0b1D8gnXX7hpf4QqE5HWQuHewuLjjMU/vpCFPzwv5Dql5RV8svWAbrKKSNjohmqE9O3aNuSy\nl5dm8fLSLAB++oXhZKR34aSOKaS2P/EZv4hIKAp3Dzz85dP57ouZdS67d84aAOIMtj5wWSTLEhEf\n0WWZCHrt1vG8/u0JTBvdp951KxzkFhyrMa+guJTNOQUtVZ6I+IjO3CNoVN/OjVp/4kPvMqR7B26Y\nMIArRvdm3C/e4WhJOdt/qTN6ETkxhXsUKy6tYNXuPO54cQXr9xZwVN0WiEgD6bKMRxbfdQH3XD6s\nwes/8f6WFqxGRPymQeFuZpPNbIOZbTazmXUsP9fM8sxsRfDnnvCX6i/dO6bwjQYMtF2XNXvyqKhw\nePUAmohEv3ovy5hZPPAYcBGQBXxqZnOcc2trrbrQOXd5C9QotVz26IcAnH9Kd2Z//SwApv3xI6aN\n7s11Z6d7WJmIRIuGnLmPATY757Y650qAF4CpLVuWNMS763OqppfvPMw9/1rjYTUiEk0aEu69gV3V\nXmcF59V2jpmtNLM3zWx4XR9kZjPMbImZLcnNzW1CuVJbRYWjuPT4G61ZhwrZuE/NJkVaq3C1llkG\n9HPOHTGzKcBrwJDaKznnngSehEDHYWHadky774oR7M0r4rEFWzitdydeuuVs5mTu4fMje3HqPW/V\n+/6Bd807bt4f39vMQ29tAFCzSZFWqiHhvhuoPkhon+C8Ks65/GrT88zsj2aW6pxTj1j1+Nq4/jjn\nSO/Wjs+f3ouUxHj+Jzgm6/ZfXkb6zLmN+ryy8oqqYBeR1qshl2U+BYaY2QAzSwKuBuZUX8HMepiZ\nBafHBD/3QLiL9Ssz46qMvqQkHj9wx3fOH9yoz7rjxRU1XqfPnMvynYeaVZ+IxJ4G9ecevNTyOyAe\nmO2cu9/MbgFwzj1hZt8GvgmUAUXA95xzH5/oM/3cn3u4FZWU88nWA8xdlV3VwVhT/N/XzuSS4T3C\nWJmIRFpD+3PXYB0x5In3t/DLN9c36zPe+M4E3lmXw+0XBm6J/OClTNJT23HreY37hiAi3tBgHT40\ndkDXGq+b0iXw5b//kIfnb6xqYfPS0ix+9W9doxfxG/UtE0NG9+vCxp9fyt68YkrKy3l/437ue6P2\ns2QNc8cLK/h0+8Gq109/tI2vj2/aE7MiEn0U7jEmKSGOft0CA38MSmvPwk25vLeh8c8MvLVmb43X\ns15fy46Dhdxxwcl0apsYllpFxDu65u4DW3KPsHFvAR9syuX5xbvqf0M9Pp55Pn98bzM/uOQUzGDx\n1oOcd0p34uMsDNWKSHPohmortXp3Hpf//sOwf+7dl53KjRMHHje/uLScpPg44hT8IhGhG6qt1Ije\nnaqmv3hGXb1ENM3P565j1pyafddUVDhO+clb3DtnDQs35ZKTXxy27YlI8+iauw898dUzOFxYyiXD\ne/DKssDDxN3aJXHgaEmzPvfpj7ez48BRJp2cxmm9O7F+b6DvmucW7+SZT3bQp0sbPvzf85tdv4g0\nn8LdhyaP6Fk1/fT0sxjdtwud2iZy2aMLWbMnv8a6XdomcqiwtMGfvWBDLgtq3cAtrwhc2ss6VETW\noUI27TvCead0b8YeiEhz6bKMz507tHtV65e/fmMMt1XrzmDGpIEsv+diXpwxLmzbm/DgAqY//ekJ\n13lu0U5WZh0O2zZF5Hg6c29FUtsn872Lh3JVRl/e25DD14IDeyQlhP93/OrdeVXX//MKS3l1eRbX\nnZ2OGdz16ipAPVaKtCSFeyvUt2vbqmCHwE3YL47uzeo9eWzcd4T7rhjBpSN6kFtwjEsfWdikbVS2\n2Ln6rL5kZuWxLjufWa+v5aoz+4RjF0SkHmoKKVUKikt5ZP4mfjB5KMkJgR4q31yVzTefXdZi20xt\nn8S73z+XFTsPc++cNWzbf5RPf3whaR0a37WCSGugdu4SNpV9yn/+9F68nrmnxbfXLimeT+++kLZJ\ngS+WR46V0T75sy+ZpeUVxJnpoSppldTOXcKm8pr8I18eVWP+y7ec3SLbO1pSzrB7/k36zLk8/dE2\nRtz7bz7eEhj35ca/LmHIj99kUB0jUDXU0x9tY0oTLzeJxAqFu9Rryd0XsvwnFxEXZzVugo7s05ne\nndtUve7ZKSX
s2571eqBjtGv/tIj0mXOZv25f1bLi0nLmrcrm+tmLKa9wTP7dB7y0ZBcVFY7DhSWE\n+lY66/W1rM3Or3OZiF/osow02rxV2Wzbf7SqD/iC4lKOlVVgQHZeMb9+e0ONzsz6d2vLjgOFYa/j\n9guG8Mg7m4C62+s/9KWR/M9Zn40Q+dv/bOT1zD1s238UgA0/n8yWnKN0bJPA0h2HmDoqfE/0irQU\nXXMXTy3dcYgvPf4xT3z1DCaP6MmmfQV8/S+fsvtwUcRquHxkT/5w7RkAHCsrZ+jddQ843jYpnsKS\ncp67aSznDEqtmu+c49dvb+CxBVvY9sAUgiNJinhK4S6eyyssrdF9cEWFY2DwWvnKWRfz6baD3PDX\n6Ps3cEqPDmQdKuLb5w+uGvlqWM+OvHrrOQA8898d9OiUwuThPXji/S1MHz+AdsEbvhUVLmQnasWl\n5Xz/pUzuuHAI723IZfr4AbopLI2mcJeoVNnypvLafXFpOZc9upAbJw7kmjH9qpbHgkeuHsXtLwQG\nJN/6iyms3J3HFY99xPM3jaNv1zaktk8mJTGenPxibnpmKZm7aj6V+9v/OZ0vnqF2/9I4DQ13PcQk\nETXvtol0SPnsn11KYjzv3Hlu1etFd13A0x9vZ8LgVJ5auJXObZN4dfluDyqt3/vV7iu8tmI33/tH\nJgDX/OkTAC4edhJPXpfB84t3HRfsAIUl5Y3a3uacI6R3a0tCvNpBSP105i5R763VeykqLaNz2yT6\nd23L+b95/7h1bvncIJ54f4sH1TXdz6YO57rgk8IfbMxlzICuPLdoJ9eO7UdKYjyLth4gtUMyg9La\ns2jrAb78ZOCXxsQhqTx1fQZPLdzGjRMHVD1w5pV9+cVM/8un/PnrGfTs1Kb+N0iz6LKM+Npv3t7A\noq0HeX7GOOIMzAznHNv2HyUxPo6JDy3wusQGe+6msVz7p0U15v3qypH84OWVACz7yUUsWJ/DnS9l\nVi3/7oUn8/D8jcQZPHV9BucN7c63n1/OZaf1ZMppgV5BDxeW0CYpnuSEeFbsOswry7K48+Kh/HTO\nGu79wnA6tWnecIr5xaW8tXove/OK+e1/NjJxSCrP3DC2WZ8p9VO4S6u2aV8Ba/bkc8eLgWvi1Z+u\nTW2fzP4jx5g8vAfXnd2fbu2TueR3H3hZbr3uu2IEP3ltdcjlf5l+FtP/EuiNc/zgbny0+UDVsvun\njeC+N9ZSXFrBN8YPYPZH27hxwgCy84qZ9YXhrM3OZ+n2g3zv4qGNqmnUz97mcGEp5wzqxsdbAttb\nOetiOqZoDN6WpHAXAf61Yjep7ZMZPziVbfuPUlZewZCTOhy33j+XZpGe2paTT+rAabPe9qBS7908\naSC9Orfh3jlruHZsP34x7TTKKxy3PruMb547iNP7dq6xfqib3w9dOZIfvrySdT+bTJuk5l8yKiop\nJ6egmP7d2jX7s/xA4S7SRO9tyGF0vy50TEngi49/zA0TBvDfLQc4vW9nzhnUjfXZBRSWlvPgm+sj\n2m7fC8N7dawa4GXp3Reyek8+iXHGgaMlfOf55Sd877M3jmX84NSQy99es5e9+cVcd3Y6xaXlPLto\nJxOHpJKdV8yEwam8unw300b35ut/WczCTftDPmtw9FgZCzftZ/KIHs3b2aCiknIqnKtq3hptFO4i\nEfDNvy/lzdV7eXHGOAaktmNtdj5d2yWx53ARWYeK+PncdQB8+L/nccnDH3C0pJz7p43gx6+GvsTi\nN9NG9+bV5bu5fGRP3liZzeh+nfnlF0dWXQpbeveFnPnz+TXeU3lpaUTvjqzeHfjl8odrR5PWPpmx\nA7sBUFJWwfYDR/n9u5t5PXMPb393Etv3H+Xd9TnkF5fy0JWnU1hSxv6CEob16ghQ1SXFroNFTPrV\nAl791jmM7tcFgDv/kcnGfQWs2p0HfNZc98VPdzKqbxeG9jj+G19dnlu0k0Fp7arq3HWwkJTEeNI6\nJPODlzKZOqo3E4aE/qVXH4W7SAQcLixhTuYevjauf51nlU8t3Mrg7u05d2h3Nucc4bbnl/P8TeMo\nLC3j7Afe5bmbxlJSVsGczD28smw3XdslcfAEY90+d9NYRvXtzH+3HIjKB8Ai4aoz+5B75FiNLi4A\nBndvz+acI3W+59yhaazPLmBvrUHcrxnTl1F9O3P2wFQm/armTfj0bm156voMLvxt4JfQtgem8NTC\nbZx3SndeWLyT8YNTq4aTLK9wHDxaQlqH5OOe5ah8ve2BKQz40bway5pC4S4Sw97fmEuFc2TuOswt\nnxvE3rxiUhLj6RGic7bKAJl320SmPFqzx8vPn96LopIy5q/LqZp3Usdk9uUfa7kd8IlObRLJKwr0\nWfTijHFVzVGru2ZMP7Lzio77ZTN1VC8euXp01bFZdNcFjP3FOwDMvW0Cw3t1alJNCneRVmTD3gJS\nEuNq3HTMKyoFR40uILbvP4oZ9O/Wjn+v2cvNzyzlmRvGsGnfEX72xlruvOhkBnVvz7eCA7S0TYrn\nX7eO56KH629NNKpvZ+6bOoKlOw5W9ebZ2o0Z0JXF2w4eN79NYjzr7pvcpM/UE6oirUhd14Prasee\nnvpZ+F8yvEfV5YGJQ9LISO/CiF6dMAu0nBkzoCsXnHoSABee2p2z0rvytbP7k5wQzwcbc0lKiOMr\nTy3ikatHcWb/LnRpm0S75ARO69OJw0Wl/G7+phPW/I3xA/j+JSdz8zNLGdazIzMvPYWCY2Wc9fP5\nHCurqFqveu+fsaauYAcoKm3c08lNoTN3EWmywpKyqhGzalu+8xCn9+nM91/OZNHWg3w08/wavXOe\n6LpzTvDaeFqH5Kp7GVtyj3BB8OnkL2f05Z31Odw0cQDbDxQyZkAXpo3uw/b9R/np62sY2adzVP9C\nqLxk0xS6LCMiUWlddj6J8XEM7t6+0e89cOQYZkbXdkn1rvvI/E2M7teZQ4Ul3P7CCi4edhKPfeUM\ncgqOsS+/mF6d2mAGr2fu4az0rjw8fyObc46QdSjQvPXp6Wdx7tDuTHpoATsPBsYjuHZsP+6/YgTX\nzV5MpzaJ3DBhAB3bJPLDl1dy4Mgxtjdg3II7LzqZmyYNJCWxac8AKNxFRMLEOce+/GMhb2hDoLvn\n5xbv5Moz+7A55wh9u7SlQ0oC76zP4R9LdvH7a0Y3OdCrU7iLiPhQWAfINrPJZrbBzDab2cw6lpuZ\nPRpcvtLMzmhK0SIiEh71hruZxQOPAZcCw4BrzGxYrdUuBYYEf2YAj4e5ThERaYSGnLmPATY757Y6\n50qAF4CptdaZCvzNBXwCdDaznmGuVUREGqgh4d4b2FXtdVZwXmPXwcxmmNkSM1uSm5tbe7GIiIRJ\nRMfrcs496ZzLcM5lpKWlRXLTIiKtSkPCfTfQt9rrPsF5jV1HREQipCHh/ikwxMwGmFkScDUwp9Y6\nc4Drgq1mxgF5zrnsMNcqIiINVG/fMs65MjP7NvBvIB6Y7ZxbY2a3BJc/
AcwDpgCbgUJgesuVLCIi\n9fHsISYzywV2NPHtqcD+MJbjJe1LdPLLvvhlP0D7Uqm/c67em5aehXtzmNmShjyhFQu0L9HJL/vi\nl/0A7UtjRbS1jIiIRIbCXUTEh2I13J/0uoAw0r5EJ7/si1/2A7QvjRKT19xFROTEYvXMXURETiDm\nwr2+7oejjZltN7NVZrbCzJYE53U1s/+Y2abgn12qrf+j4L5tMLNLvKsczGy2meWY2epq8xpdu5md\nGfw72BzsGtqiZF9mmdnu4LFZYWZTon1fzKyvmS0ws7VmtsbMbg/Oj7njcoJ9icXjkmJmi80sM7gv\nPw3O9+64OOdi5ofAQ1RbgIFAEpAJDPO6rnpq3g6k1pr3EDAzOD0TeDA4PSy4T8nAgOC+xntY+yTg\nDGB1c2oHFgPjAAPeBC6Nkn2ZBXy/jnWjdl+AnsAZwekOwMZgvTF3XE6wL7F4XAxoH5xOBBYF6/Hs\nuMTamXtDuh+OBVOBvwan/wpcUW3+C865Y865bQSe+B3jQX0AOOc+AGoP396o2i3Q9XNH59wnLvAv\n92/V3hMxIfYllKjdF+dctnNuWXC6AFhHoAfWmDsuJ9iXUKJ5X5xz7kjwZWLwx+HhcYm1cG9Q18JR\nxgHzzWxgd9zgAAACEklEQVSpmc0IzjvJfdb3zl7gpOB0LOxfY2vvHZyuPT9afMcCo4fNrvaVOSb2\nxczSgdEEzhJj+rjU2heIweNiZvFmtgLIAf7jnPP0uMRauMeiCc65UQRGq7rVzCZVXxj87RyTTZZi\nufagxwlc4hsFZAO/8bachjOz9sA/gTucc/nVl8XacaljX2LyuDjnyoP/1/sQOAsfUWt5RI9LrIV7\nzHUt7JzbHfwzB3iVwGWWfcGvXwT/zAmuHgv719jadwena8/3nHNuX/A/ZAXwJz67BBbV+2JmiQTC\n8Fnn3CvB2TF5XOral1g9LpWcc4eBBcBkPDwusRbuDel+OGqYWTsz61A5DVwMrCZQ8/XB1a4H/hWc\nngNcbWbJZjaAwJi0iyNbdb0aVXvwK2m+mY0L3vW/rtp7PGU1h4KcRuDYQBTvS3C7fwbWOed+W21R\nzB2XUPsSo8clzcw6B6fbABcB6/HyuETyjnI4fgh0LbyRwN3lH3tdTz21DiRwRzwTWFNZL9ANeAfY\nBMwHulZ7z4+D+7YBD1qV1Kr/eQJfi0sJXPu7oSm1AxkE/oNuAf5A8OG5KNiXZ4BVwMrgf7ae0b4v\nwAQCX+1XAiuCP1Ni8bicYF9i8biMBJYHa14N3BOc79lx0ROqIiI+FGuXZUREpAEU7iIiPqRwFxHx\nIYW7iIgPKdxFRHxI4S4i4kMKdxERH1K4i4j40P8DLbGt5O66KzYAAAAASUVORK5CYII=\n", 855 | "text/plain": [ 856 | "" 857 | ] 858 | }, 859 | "metadata": {}, 860 | "output_type": "display_data" 861 | } 862 | ], 863 | "source": [ 864 | "%matplotlib inline\n", 865 | "import matplotlib.pyplot as plt\n", 866 | "plt.plot(loss_track)\n", 867 | "print('loss {:.4f} after {} examples (batch_size={})'.format(loss_track[-1], len(loss_track)*batch_size, batch_size))" 868 | ] 869 | }, 870 | { 871 | "cell_type": "markdown", 872 | "metadata": {}, 873 | "source": [ 874 | "Something is definitely getting learned." 875 | ] 876 | }, 877 | { 878 | "cell_type": "markdown", 879 | "metadata": {}, 880 | "source": [ 881 | "# Limitations of the model\n", 882 | "\n", 883 | "We have no control over transitions of `tf.nn.dynamic_rnn`, it is unrolled in a single sweep. Some of the things that are not possible without such control:\n", 884 | "\n", 885 | "- We can't feed previously generated tokens without falling back to Python loops. This means *we cannot make efficient inference with dynamic_rnn decoder*!\n", 886 | "\n", 887 | "- We can't use attention, because attention conditions decoder inputs on its previous state\n", 888 | "\n", 889 | "Solution would be to use `tf.nn.raw_rnn` instead of `tf.nn.dynamic_rnn` for decoder, as we will do in tutorial #2. " 890 | ] 891 | }, 892 | { 893 | "cell_type": "markdown", 894 | "metadata": {}, 895 | "source": [ 896 | "# Fun things to try (aka Exercises)\n", 897 | "\n", 898 | "- In `copy_task` increasing `max_sequence_size` and `vocab_upper`. Observe slower learning and general performance degradation.\n", 899 | "\n", 900 | "- For `decoder_inputs`, instead of shifted target sequence `[ W X Y Z]`, try feeding `[ ]`, like we've done when we tested forward pass. Does it break things? Or slows learning?" 
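One way to try the second exercise is a small variant of `next_feed` in which only `decoder_inputs_` changes: the decoder gets a single EOS followed by PADs, as in the forward-pass test, instead of the shifted target sequence. This is just a sketch of the exercise setup, not part of the original notebook:

```python
def next_feed_eos_only():
    """Variant of next_feed for the exercise: decoder inputs are EOS followed by PADs."""
    batch = next(batches)
    encoder_inputs_, _ = helpers.batch(batch)
    decoder_targets_, _ = helpers.batch(
        [(sequence) + [EOS] for sequence in batch]
    )
    # same time dimension as decoder_targets_ (len(sequence) + 1),
    # but only the first step carries a real token
    decoder_inputs_, _ = helpers.batch(
        [[EOS] + [PAD] * len(sequence) for sequence in batch]
    )
    return {
        encoder_inputs: encoder_inputs_,
        decoder_inputs: decoder_inputs_,
        decoder_targets: decoder_targets_,
    }
```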
901 | ] 902 | } 903 | ], 904 | "metadata": { 905 | "kernelspec": { 906 | "display_name": "Python 3", 907 | "language": "python", 908 | "name": "python3" 909 | }, 910 | "language_info": { 911 | "codemirror_mode": { 912 | "name": "ipython", 913 | "version": 3 914 | }, 915 | "file_extension": ".py", 916 | "mimetype": "text/x-python", 917 | "name": "python", 918 | "nbconvert_exporter": "python", 919 | "pygments_lexer": "ipython3", 920 | "version": "3.6.0" 921 | } 922 | }, 923 | "nbformat": 4, 924 | "nbformat_minor": 2 925 | } 926 | -------------------------------------------------------------------------------- /2-seq2seq-advanced.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Advanced dynamic seq2seq with TensorFlow" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Encoder is bidirectional now. Decoder is implemented using `tf.nn.raw_rnn`. \n", 15 | "It feeds previously generated tokens during training as inputs, instead of target sequence." 16 | ] 17 | }, 18 | { 19 | "cell_type": "markdown", 20 | "metadata": {}, 21 | "source": [ 22 | "**UPDATE (16.02.2017)**: I learned some things after I wrote this tutorial. In particular:\n", 23 | " - [DONE] Replacing projection (one-hot encoding followed by linear layer) with embedding (indexing weights of linear layer directly) is more efficient.\n", 24 | " - When decoding, feeding previously generated tokens as inputs adds robustness to model's errors. However feeding ground truth speeds up training. Apperantly best practice is to mix both randomly when training.\n", 25 | "\n", 26 | "I will update tutorial to reflect this at some point." 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 1, 32 | "metadata": { 33 | "collapsed": true 34 | }, 35 | "outputs": [], 36 | "source": [ 37 | "import numpy as np\n", 38 | "import tensorflow as tf\n", 39 | "import helpers\n", 40 | "\n", 41 | "tf.reset_default_graph()\n", 42 | "sess = tf.InteractiveSession()" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 2, 48 | "metadata": { 49 | "collapsed": false 50 | }, 51 | "outputs": [ 52 | { 53 | "data": { 54 | "text/plain": [ 55 | "'1.3.0'" 56 | ] 57 | }, 58 | "execution_count": 2, 59 | "metadata": {}, 60 | "output_type": "execute_result" 61 | } 62 | ], 63 | "source": [ 64 | "tf.__version__" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": 3, 70 | "metadata": { 71 | "collapsed": true 72 | }, 73 | "outputs": [], 74 | "source": [ 75 | "PAD = 0\n", 76 | "EOS = 1\n", 77 | "\n", 78 | "vocab_size = 10\n", 79 | "input_embedding_size = 20\n", 80 | "\n", 81 | "encoder_hidden_units = 20\n", 82 | "decoder_hidden_units = encoder_hidden_units * 2" 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": 4, 88 | "metadata": { 89 | "collapsed": false 90 | }, 91 | "outputs": [], 92 | "source": [ 93 | "encoder_inputs = tf.placeholder(shape=(None, None), dtype=tf.int32, name='encoder_inputs')\n", 94 | "encoder_inputs_length = tf.placeholder(shape=(None,), dtype=tf.int32, name='encoder_inputs_length')\n", 95 | "\n", 96 | "decoder_targets = tf.placeholder(shape=(None, None), dtype=tf.int32, name='decoder_targets')" 97 | ] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "metadata": {}, 102 | "source": [ 103 | "Previously we elected to manually feed `decoder_inputs` to better understand what is going on. 
Here we implement decoder with `tf.nn.raw_rnn` and will construct `decoder_inputs` step by step in the loop." 104 | ] 105 | }, 106 | { 107 | "cell_type": "markdown", 108 | "metadata": {}, 109 | "source": [ 110 | "## Embeddings\n", 111 | "Setup embeddings (see tutorial 1)" 112 | ] 113 | }, 114 | { 115 | "cell_type": "code", 116 | "execution_count": 5, 117 | "metadata": { 118 | "collapsed": false 119 | }, 120 | "outputs": [], 121 | "source": [ 122 | "embeddings = tf.Variable(tf.random_uniform([vocab_size, input_embedding_size], -1.0, 1.0), dtype=tf.float32)\n", 123 | "\n", 124 | "encoder_inputs_embedded = tf.nn.embedding_lookup(embeddings, encoder_inputs)" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": {}, 130 | "source": [ 131 | "## Encoder\n", 132 | "\n", 133 | "We are replacing unidirectional `tf.nn.dynamic_rnn` with `tf.nn.bidirectional_dynamic_rnn` as the encoder.\n" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": 6, 139 | "metadata": { 140 | "collapsed": true 141 | }, 142 | "outputs": [], 143 | "source": [ 144 | "from tensorflow.contrib.rnn import LSTMCell, LSTMStateTuple" 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": 7, 150 | "metadata": { 151 | "collapsed": true 152 | }, 153 | "outputs": [], 154 | "source": [ 155 | "encoder_cell = LSTMCell(encoder_hidden_units)" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": 8, 161 | "metadata": { 162 | "collapsed": false 163 | }, 164 | "outputs": [], 165 | "source": [ 166 | "((encoder_fw_outputs,\n", 167 | " encoder_bw_outputs),\n", 168 | " (encoder_fw_final_state,\n", 169 | " encoder_bw_final_state)) = (\n", 170 | " tf.nn.bidirectional_dynamic_rnn(cell_fw=encoder_cell,\n", 171 | " cell_bw=encoder_cell,\n", 172 | " inputs=encoder_inputs_embedded,\n", 173 | " sequence_length=encoder_inputs_length,\n", 174 | " dtype=tf.float32, time_major=True)\n", 175 | " )" 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": 9, 181 | "metadata": { 182 | "collapsed": false 183 | }, 184 | "outputs": [ 185 | { 186 | "data": { 187 | "text/plain": [ 188 | "" 189 | ] 190 | }, 191 | "execution_count": 9, 192 | "metadata": {}, 193 | "output_type": "execute_result" 194 | } 195 | ], 196 | "source": [ 197 | "encoder_fw_outputs" 198 | ] 199 | }, 200 | { 201 | "cell_type": "code", 202 | "execution_count": 10, 203 | "metadata": { 204 | "collapsed": false 205 | }, 206 | "outputs": [ 207 | { 208 | "data": { 209 | "text/plain": [ 210 | "" 211 | ] 212 | }, 213 | "execution_count": 10, 214 | "metadata": {}, 215 | "output_type": "execute_result" 216 | } 217 | ], 218 | "source": [ 219 | "encoder_bw_outputs" 220 | ] 221 | }, 222 | { 223 | "cell_type": "code", 224 | "execution_count": 11, 225 | "metadata": { 226 | "collapsed": false 227 | }, 228 | "outputs": [ 229 | { 230 | "data": { 231 | "text/plain": [ 232 | "LSTMStateTuple(c=, h=)" 233 | ] 234 | }, 235 | "execution_count": 11, 236 | "metadata": {}, 237 | "output_type": "execute_result" 238 | } 239 | ], 240 | "source": [ 241 | "encoder_fw_final_state" 242 | ] 243 | }, 244 | { 245 | "cell_type": "code", 246 | "execution_count": 12, 247 | "metadata": { 248 | "collapsed": false 249 | }, 250 | "outputs": [ 251 | { 252 | "data": { 253 | "text/plain": [ 254 | "LSTMStateTuple(c=, h=)" 255 | ] 256 | }, 257 | "execution_count": 12, 258 | "metadata": {}, 259 | "output_type": "execute_result" 260 | } 261 | ], 262 | "source": [ 263 | "encoder_bw_final_state" 264 | ] 265 | }, 266 | { 267 | "cell_type": 
"markdown", 268 | "metadata": {}, 269 | "source": [ 270 | "Have to concatenate forward and backward outputs and state. In this case we will not discard outputs, they would be used for attention." 271 | ] 272 | }, 273 | { 274 | "cell_type": "code", 275 | "execution_count": 13, 276 | "metadata": { 277 | "collapsed": false 278 | }, 279 | "outputs": [], 280 | "source": [ 281 | "encoder_outputs = tf.concat((encoder_fw_outputs, encoder_bw_outputs), 2)\n", 282 | "\n", 283 | "encoder_final_state_c = tf.concat(\n", 284 | " (encoder_fw_final_state.c, encoder_bw_final_state.c), 1)\n", 285 | "\n", 286 | "encoder_final_state_h = tf.concat(\n", 287 | " (encoder_fw_final_state.h, encoder_bw_final_state.h), 1)\n", 288 | "\n", 289 | "encoder_final_state = LSTMStateTuple(\n", 290 | " c=encoder_final_state_c,\n", 291 | " h=encoder_final_state_h\n", 292 | ")" 293 | ] 294 | }, 295 | { 296 | "cell_type": "markdown", 297 | "metadata": {}, 298 | "source": [ 299 | "## Decoder" 300 | ] 301 | }, 302 | { 303 | "cell_type": "code", 304 | "execution_count": 14, 305 | "metadata": { 306 | "collapsed": false 307 | }, 308 | "outputs": [], 309 | "source": [ 310 | "decoder_cell = LSTMCell(decoder_hidden_units)" 311 | ] 312 | }, 313 | { 314 | "cell_type": "markdown", 315 | "metadata": {}, 316 | "source": [ 317 | "Time and batch dimensions are dynamic, i.e. they can change in runtime, from batch to batch" 318 | ] 319 | }, 320 | { 321 | "cell_type": "code", 322 | "execution_count": 15, 323 | "metadata": { 324 | "collapsed": true 325 | }, 326 | "outputs": [], 327 | "source": [ 328 | "encoder_max_time, batch_size = tf.unstack(tf.shape(encoder_inputs))" 329 | ] 330 | }, 331 | { 332 | "cell_type": "markdown", 333 | "metadata": {}, 334 | "source": [ 335 | "Next we need to decide how far to run decoder. There are several options for stopping criteria:\n", 336 | "- Stop after specified number of unrolling steps\n", 337 | "- Stop after model produced token\n", 338 | "\n", 339 | "The choice will likely be time-dependant. In legacy `translate` tutorial we can see that decoder unrolls for `len(encoder_input)+10` to allow for possibly longer translated sequence. Here we are doing a toy copy task, so how about we unroll decoder for `len(encoder_input)+2`, to allow model some room to make mistakes over 2 additional steps:" 340 | ] 341 | }, 342 | { 343 | "cell_type": "code", 344 | "execution_count": 16, 345 | "metadata": { 346 | "collapsed": false 347 | }, 348 | "outputs": [], 349 | "source": [ 350 | "decoder_lengths = encoder_inputs_length + 3\n", 351 | "# +2 additional steps, +1 leading token for decoder inputs" 352 | ] 353 | }, 354 | { 355 | "cell_type": "markdown", 356 | "metadata": {}, 357 | "source": [ 358 | "## Output projection\n", 359 | "\n", 360 | "Decoder will contain manually specified by us transition step:\n", 361 | "```\n", 362 | "output(t) -> output projection(t) -> prediction(t) (argmax) -> input embedding(t+1) -> input(t+1)\n", 363 | "```\n", 364 | "\n", 365 | "In tutorial 1, we used `tf.contrib.layers.linear` layer to initialize weights and biases and apply operation for us. This is convenient, however now we need to specify parameters `W` and `b` of the output layer in global scope, and apply them at every step of the decoder." 
366 | ] 367 | }, 368 | { 369 | "cell_type": "code", 370 | "execution_count": 17, 371 | "metadata": { 372 | "collapsed": false 373 | }, 374 | "outputs": [], 375 | "source": [ 376 | "W = tf.Variable(tf.random_uniform([decoder_hidden_units, vocab_size], -1, 1), dtype=tf.float32)\n", 377 | "b = tf.Variable(tf.zeros([vocab_size]), dtype=tf.float32)" 378 | ] 379 | }, 380 | { 381 | "cell_type": "markdown", 382 | "metadata": {}, 383 | "source": [ 384 | "## Decoder via `tf.nn.raw_rnn`\n", 385 | "\n", 386 | "`tf.nn.dynamic_rnn` allows for easy RNN construction, but is limited. \n", 387 | "\n", 388 | "For example, a nice way to increase robustness of the model is to feed it, as decoder inputs, the tokens that it previously generated, instead of the shifted true sequence.\n", 389 | "\n", 390 | "![seq2seq-feed-previous](pictures/2-seq2seq-feed-previous.png)\n", 391 | "*Image borrowed from http://www.wildml.com/2016/04/deep-learning-for-chatbots-part-1-introduction/*" 392 | ] 393 | }, 394 | { 395 | "cell_type": "markdown", 396 | "metadata": {}, 397 | "source": [ 398 | "First we prepare `EOS` and `PAD` token slices. The decoder will operate on column vectors of shape `(batch_size,)` representing single time steps of the batch." 399 | ] 400 | }, 401 | { 402 | "cell_type": "code", 403 | "execution_count": 18, 404 | "metadata": { 405 | "collapsed": false 406 | }, 407 | "outputs": [], 408 | "source": [ 409 | "assert EOS == 1 and PAD == 0\n", 410 | "\n", 411 | "eos_time_slice = tf.ones([batch_size], dtype=tf.int32, name='EOS')\n", 412 | "pad_time_slice = tf.zeros([batch_size], dtype=tf.int32, name='PAD')\n", 413 | "\n", 414 | "eos_step_embedded = tf.nn.embedding_lookup(embeddings, eos_time_slice)\n", 415 | "pad_step_embedded = tf.nn.embedding_lookup(embeddings, pad_time_slice)" 416 | ] 417 | }, 418 | { 419 | "cell_type": "markdown", 420 | "metadata": {}, 421 | "source": [ 422 | "Now for the tricky part.\n", 423 | "\n", 424 | "Remember that the standard `tf.nn.dynamic_rnn` requires all inputs `(t, ..., t+n)` to be passed in advance as a single tensor. The \"dynamic\" part of its name refers to the fact that `n` can change from batch to batch.\n", 425 | "\n", 426 | "Now, what if we want to implement a more complex mechanism, like when we want the decoder to receive previously generated tokens as input at every timestep (instead of the lagged target sequence)? Or when we want to implement soft attention, where at every timestep we add an additional fixed-length representation, derived from a query produced by the previous step's hidden state? `tf.nn.raw_rnn` is a way to solve this problem.\n", 427 | "\n", 428 | "The main part of specifying an RNN with `tf.nn.raw_rnn` is the *loop transition function*. It defines the inputs of step `t` given the outputs and state of step `t-1`.\n", 429 | "\n", 430 | "The loop transition function is a mapping `(time, previous_cell_output, previous_cell_state, previous_loop_state) -> (elements_finished, input, cell_state, output, loop_state)`. It is called *before* the RNNCell to prepare its inputs and state. Everything is a Tensor except for the initial call at time=0, when everything is `None` (except `time`).\n", 431 | "\n", 432 | "Note that decoder inputs are returned from the transition function but not passed into it. You are supposed to index inputs manually using the `time` Tensor." 433 | ] 434 | }, 435 | { 436 | "cell_type": "markdown", 437 | "metadata": {}, 438 | "source": [ 439 | "The loop transition function is called in two ways:\n", 440 | " 1. Initial call at time=0 to provide initial cell_state and input to RNN.\n", 441 | " 2. 
Transition call for all following timesteps, where you define the transition between two adjacent steps.\n", 442 | "\n", 443 | "Let's define both cases separately." 444 | ] 445 | }, 446 | { 447 | "cell_type": "markdown", 448 | "metadata": {}, 449 | "source": [ 450 | "The loop initial state is a function only of `encoder_final_state` and the embeddings:" 451 | ] 452 | }, 453 | { 454 | "cell_type": "code", 455 | "execution_count": 19, 456 | "metadata": { 457 | "collapsed": true 458 | }, 459 | "outputs": [], 460 | "source": [ 461 | "def loop_fn_initial():\n", 462 | " initial_elements_finished = (0 >= decoder_lengths) # all False at the initial step\n", 463 | " initial_input = eos_step_embedded\n", 464 | " initial_cell_state = encoder_final_state\n", 465 | " initial_cell_output = None\n", 466 | " initial_loop_state = None # we don't need to pass any additional information\n", 467 | " return (initial_elements_finished,\n", 468 | " initial_input,\n", 469 | " initial_cell_state,\n", 470 | " initial_cell_output,\n", 471 | " initial_loop_state)" 472 | ] 473 | }, 474 | { 475 | "cell_type": "markdown", 476 | "metadata": {}, 477 | "source": [ 478 | "Define the transition function such that the previously generated token (as judged in a greedy manner by `argmax` over the output projection) is passed as the next input." 479 | ] 480 | }, 481 | { 482 | "cell_type": "code", 483 | "execution_count": 20, 484 | "metadata": { 485 | "collapsed": false 486 | }, 487 | "outputs": [], 488 | "source": [ 489 | "def loop_fn_transition(time, previous_output, previous_state, previous_loop_state):\n", 490 | "\n", 491 | " def get_next_input():\n", 492 | " output_logits = tf.add(tf.matmul(previous_output, W), b)\n", 493 | " prediction = tf.argmax(output_logits, axis=1)\n", 494 | " next_input = tf.nn.embedding_lookup(embeddings, prediction)\n", 495 | " return next_input\n", 496 | " \n", 497 | " elements_finished = (time >= decoder_lengths) # this operation produces a boolean tensor of shape [batch_size]\n", 498 | " # defining whether the corresponding sequence has ended\n", 499 | "\n", 500 | " finished = tf.reduce_all(elements_finished) # -> boolean scalar\n", 501 | " input = tf.cond(finished, lambda: pad_step_embedded, get_next_input)\n", 502 | " state = previous_state\n", 503 | " output = previous_output\n", 504 | " loop_state = None\n", 505 | "\n", 506 | " return (elements_finished, \n", 507 | " input,\n", 508 | " state,\n", 509 | " output,\n", 510 | " loop_state)" 511 | ] 512 | }, 513 | { 514 | "cell_type": "markdown", 515 | "metadata": {}, 516 | "source": [ 517 | "Combine the initializer and transition functions and create the raw_rnn.\n", 518 | "\n", 519 | "Note that while all operations above are defined with TF's control flow and reduction ops, here we rely on checking whether the state is `None` to determine if it is an initializer call or a transition call. This is not a very clean API and it might be changed in the future (indeed, `tf.nn.raw_rnn`'s doc contains a warning that the API is experimental)."
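As an aside before the combined `loop_fn`: the `argmax` inside `get_next_input` gives greedy decoding, which is the simplest choice. Sampling from the predicted distribution is a common alternative (for example in scheduled-sampling-style training). A sketch of that swap, reusing the same `W`, `b` and `embeddings` (a hypothetical variant, not what this notebook uses):

```python
def get_next_input_sampled(previous_output):
    # Hypothetical alternative to the greedy argmax: sample the next token id
    # from the predicted distribution instead of taking the most likely one.
    output_logits = tf.add(tf.matmul(previous_output, W), b)    # [batch_size, vocab_size]
    sampled_ids = tf.multinomial(output_logits, num_samples=1)  # [batch_size, 1], int64 token ids
    prediction = tf.squeeze(sampled_ids, axis=1)                # [batch_size]
    return tf.nn.embedding_lookup(embeddings, prediction)       # [batch_size, embedding_size]
```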
520 | ] 521 | }, 522 | { 523 | "cell_type": "code", 524 | "execution_count": 21, 525 | "metadata": { 526 | "collapsed": false 527 | }, 528 | "outputs": [], 529 | "source": [ 530 | "def loop_fn(time, previous_output, previous_state, previous_loop_state):\n", 531 | " if previous_state is None: # time == 0\n", 532 | " assert previous_output is None and previous_state is None\n", 533 | " return loop_fn_initial()\n", 534 | " else:\n", 535 | " return loop_fn_transition(time, previous_output, previous_state, previous_loop_state)\n", 536 | "\n", 537 | "decoder_outputs_ta, decoder_final_state, _ = tf.nn.raw_rnn(decoder_cell, loop_fn)\n", 538 | "decoder_outputs = decoder_outputs_ta.stack()" 539 | ] 540 | }, 541 | { 542 | "cell_type": "code", 543 | "execution_count": 22, 544 | "metadata": { 545 | "collapsed": false 546 | }, 547 | "outputs": [ 548 | { 549 | "data": { 550 | "text/plain": [ 551 | "" 552 | ] 553 | }, 554 | "execution_count": 22, 555 | "metadata": {}, 556 | "output_type": "execute_result" 557 | } 558 | ], 559 | "source": [ 560 | "decoder_outputs" 561 | ] 562 | }, 563 | { 564 | "cell_type": "markdown", 565 | "metadata": {}, 566 | "source": [ 567 | "To do the output projection, we have to temporarily flatten `decoder_outputs` from `[max_steps, batch_size, hidden_dim]` to `[max_steps*batch_size, hidden_dim]`, since this `tf.matmul` expects rank-2 tensors." 568 | ] 569 | }, 570 | { 571 | "cell_type": "code", 572 | "execution_count": 23, 573 | "metadata": { 574 | "collapsed": false 575 | }, 576 | "outputs": [], 577 | "source": [ 578 | "decoder_max_steps, decoder_batch_size, decoder_dim = tf.unstack(tf.shape(decoder_outputs))\n", 579 | "decoder_outputs_flat = tf.reshape(decoder_outputs, (-1, decoder_dim))\n", 580 | "decoder_logits_flat = tf.add(tf.matmul(decoder_outputs_flat, W), b)\n", 581 | "decoder_logits = tf.reshape(decoder_logits_flat, (decoder_max_steps, decoder_batch_size, vocab_size))" 582 | ] 583 | }, 584 | { 585 | "cell_type": "code", 586 | "execution_count": 24, 587 | "metadata": { 588 | "collapsed": false, 589 | "scrolled": false 590 | }, 591 | "outputs": [], 592 | "source": [ 593 | "decoder_prediction = tf.argmax(decoder_logits, 2)" 594 | ] 595 | }, 596 | { 597 | "cell_type": "markdown", 598 | "metadata": {}, 599 | "source": [ 600 | "### Optimizer" 601 | ] 602 | }, 603 | { 604 | "cell_type": "markdown", 605 | "metadata": {}, 606 | "source": [ 607 | "The RNN outputs a tensor of shape `[max_time, batch_size, hidden_units]`, which the projection layer maps onto `[max_time, batch_size, vocab_size]`. The `vocab_size` part of the shape is static, while `max_time` and `batch_size` are dynamic."
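One caveat about the loss in the next cell: `tf.reduce_mean` averages the cross-entropy over every `[max_time, batch_size]` position, including the `PAD` steps after `<EOS>`. For this toy task that is intentional, since the targets deliberately contain `PAD` after `<EOS>` and the decoder is trained to emit it. If you did not want padding to contribute to the loss, one option is to weight the per-step losses with a length mask. A sketch only, to be applied after `stepwise_cross_entropy` from the next cell is defined (here `encoder_inputs_length + 1` counts the input tokens plus `<EOS>`):

```python
# Sketch only: down-weight decoder timesteps past <EOS> so PAD positions
# do not contribute to the loss (not what this notebook does).
target_weights = tf.transpose(                    # -> [max_time, batch_size], time-major like the logits
    tf.sequence_mask(encoder_inputs_length + 1,   # real target length: input tokens + <EOS>
                     maxlen=decoder_max_steps,
                     dtype=tf.float32))
masked_loss = (tf.reduce_sum(stepwise_cross_entropy * target_weights)
               / tf.reduce_sum(target_weights))
```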
608 | ] 609 | }, 610 | { 611 | "cell_type": "code", 612 | "execution_count": 25, 613 | "metadata": { 614 | "collapsed": true 615 | }, 616 | "outputs": [], 617 | "source": [ 618 | "stepwise_cross_entropy = tf.nn.softmax_cross_entropy_with_logits(\n", 619 | " labels=tf.one_hot(decoder_targets, depth=vocab_size, dtype=tf.float32),\n", 620 | " logits=decoder_logits,\n", 621 | ")\n", 622 | "\n", 623 | "loss = tf.reduce_mean(stepwise_cross_entropy)\n", 624 | "train_op = tf.train.AdamOptimizer().minimize(loss)" 625 | ] 626 | }, 627 | { 628 | "cell_type": "code", 629 | "execution_count": 26, 630 | "metadata": { 631 | "collapsed": false 632 | }, 633 | "outputs": [], 634 | "source": [ 635 | "sess.run(tf.global_variables_initializer())" 636 | ] 637 | }, 638 | { 639 | "cell_type": "markdown", 640 | "metadata": {}, 641 | "source": [ 642 | "## Training on the toy task" 643 | ] 644 | }, 645 | { 646 | "cell_type": "markdown", 647 | "metadata": {}, 648 | "source": [ 649 | "Consider the copy task — given a random sequence of integers from a `vocabulary`, learn to memorize and reproduce input sequence. Because sequences are random, they do not contain any structure, unlike natural language." 650 | ] 651 | }, 652 | { 653 | "cell_type": "code", 654 | "execution_count": 27, 655 | "metadata": { 656 | "collapsed": false 657 | }, 658 | "outputs": [ 659 | { 660 | "name": "stdout", 661 | "output_type": "stream", 662 | "text": [ 663 | "head of the batch:\n", 664 | "[7, 3, 7, 3, 8, 4]\n", 665 | "[8, 9, 7]\n", 666 | "[5, 2, 6, 7, 3, 9, 9, 8]\n", 667 | "[6, 7, 4, 2, 9, 6, 3, 3]\n", 668 | "[5, 2, 2, 3, 7, 3]\n", 669 | "[5, 6, 7, 9, 9]\n", 670 | "[6, 3, 3, 4]\n", 671 | "[4, 6, 5, 6, 4]\n", 672 | "[7, 7, 7, 3]\n", 673 | "[4, 6, 3, 9, 2, 5]\n" 674 | ] 675 | } 676 | ], 677 | "source": [ 678 | "batch_size = 100\n", 679 | "\n", 680 | "batches = helpers.random_sequences(length_from=3, length_to=8,\n", 681 | " vocab_lower=2, vocab_upper=10,\n", 682 | " batch_size=batch_size)\n", 683 | "\n", 684 | "print('head of the batch:')\n", 685 | "for seq in next(batches)[:10]:\n", 686 | " print(seq)" 687 | ] 688 | }, 689 | { 690 | "cell_type": "code", 691 | "execution_count": 28, 692 | "metadata": { 693 | "collapsed": true 694 | }, 695 | "outputs": [], 696 | "source": [ 697 | "def next_feed():\n", 698 | " batch = next(batches)\n", 699 | " encoder_inputs_, encoder_input_lengths_ = helpers.batch(batch)\n", 700 | " decoder_targets_, _ = helpers.batch(\n", 701 | " [(sequence) + [EOS] + [PAD] * 2 for sequence in batch]\n", 702 | " )\n", 703 | " return {\n", 704 | " encoder_inputs: encoder_inputs_,\n", 705 | " encoder_inputs_length: encoder_input_lengths_,\n", 706 | " decoder_targets: decoder_targets_,\n", 707 | " }" 708 | ] 709 | }, 710 | { 711 | "cell_type": "code", 712 | "execution_count": 29, 713 | "metadata": { 714 | "collapsed": true 715 | }, 716 | "outputs": [], 717 | "source": [ 718 | "loss_track = []" 719 | ] 720 | }, 721 | { 722 | "cell_type": "code", 723 | "execution_count": 30, 724 | "metadata": { 725 | "collapsed": false, 726 | "scrolled": false 727 | }, 728 | "outputs": [ 729 | { 730 | "name": "stdout", 731 | "output_type": "stream", 732 | "text": [ 733 | "batch 0\n", 734 | " minibatch loss: 2.3642187118530273\n", 735 | " sample 1:\n", 736 | " input > [2 7 2 6 0 0 0 0]\n", 737 | " predicted > [1 1 1 1 6 1 6 0 0 0 0]\n", 738 | " sample 2:\n", 739 | " input > [8 8 7 9 0 0 0 0]\n", 740 | " predicted > [1 1 1 6 1 6 6 0 0 0 0]\n", 741 | " sample 3:\n", 742 | " input > [4 4 6 2 6 7 8 0]\n", 743 | " predicted > [0 0 6 6 6 6 6 3 7 4 0]\n", 
744 | "\n", 745 | "batch 1000\n", 746 | " minibatch loss: 0.5487109422683716\n", 747 | " sample 1:\n", 748 | " input > [3 6 9 4 5 6 5 0]\n", 749 | " predicted > [3 6 6 9 6 5 5 1 0 0 0]\n", 750 | " sample 2:\n", 751 | " input > [2 3 3 7 2 0 0 0]\n", 752 | " predicted > [2 3 3 3 2 1 0 0 0 0 0]\n", 753 | " sample 3:\n", 754 | " input > [6 2 5 2 4 0 0 0]\n", 755 | " predicted > [6 2 2 2 4 1 0 0 0 0 0]\n", 756 | "\n", 757 | "batch 2000\n", 758 | " minibatch loss: 0.277699738740921\n", 759 | " sample 1:\n", 760 | " input > [7 6 7 9 9 0 0 0]\n", 761 | " predicted > [7 6 7 9 9 1 0 0 0 0 0]\n", 762 | " sample 2:\n", 763 | " input > [7 5 3 4 3 3 3 3]\n", 764 | " predicted > [7 5 4 3 3 3 3 3 1 0 0]\n", 765 | " sample 3:\n", 766 | " input > [7 9 8 4 6 8 6 3]\n", 767 | " predicted > [7 9 8 4 8 6 6 3 1 0 0]\n", 768 | "\n", 769 | "batch 3000\n", 770 | " minibatch loss: 0.13742178678512573\n", 771 | " sample 1:\n", 772 | " input > [8 5 2 4 7 5 4 5]\n", 773 | " predicted > [8 5 2 7 4 4 5 5 1 0 0]\n", 774 | " sample 2:\n", 775 | " input > [8 7 2 8 6 0 0 0]\n", 776 | " predicted > [8 7 2 8 6 1 0 0 0 0 0]\n", 777 | " sample 3:\n", 778 | " input > [9 8 9 7 0 0 0 0]\n", 779 | " predicted > [9 8 9 7 1 0 0 0 0 0 0]\n", 780 | "\n" 781 | ] 782 | } 783 | ], 784 | "source": [ 785 | "max_batches = 3001\n", 786 | "batches_in_epoch = 1000\n", 787 | "\n", 788 | "try:\n", 789 | " for batch in range(max_batches):\n", 790 | " fd = next_feed()\n", 791 | " _, l = sess.run([train_op, loss], fd)\n", 792 | " loss_track.append(l)\n", 793 | "\n", 794 | " if batch == 0 or batch % batches_in_epoch == 0:\n", 795 | " print('batch {}'.format(batch))\n", 796 | " print(' minibatch loss: {}'.format(sess.run(loss, fd)))\n", 797 | " predict_ = sess.run(decoder_prediction, fd)\n", 798 | " for i, (inp, pred) in enumerate(zip(fd[encoder_inputs].T, predict_.T)):\n", 799 | " print(' sample {}:'.format(i + 1))\n", 800 | " print(' input > {}'.format(inp))\n", 801 | " print(' predicted > {}'.format(pred))\n", 802 | " if i >= 2:\n", 803 | " break\n", 804 | " print()\n", 805 | "\n", 806 | "except KeyboardInterrupt:\n", 807 | " print('training interrupted')" 808 | ] 809 | }, 810 | { 811 | "cell_type": "code", 812 | "execution_count": 31, 813 | "metadata": { 814 | "collapsed": false 815 | }, 816 | "outputs": [ 817 | { 818 | "name": "stdout", 819 | "output_type": "stream", 820 | "text": [ 821 | "loss 0.1332 after 300100 examples (batch_size=100)\n" 822 | ] 823 | }, 824 | { 825 | "data": { 826 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXcAAAD8CAYAAACMwORRAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XecFeW9x/HPb3ujLbuANJeyiqg0V0DA3igmGmOMmogl\nkRjjvZpEEyKxcRPlGks0GguRWGI08dowgiAWQESQ3puAdFjKUraX5/5xjssu29mzO6d836/Xvpgz\n85wzv3Hky7NznnnGnHOIiEh4ifK6ABERCTyFu4hIGFK4i4iEIYW7iEgYUriLiIQhhbuISBiqM9zN\nrIuZfWpmq8xspZndUU2b88zsoJkt8f/c1zTliohIfcTUo00J8Gvn3CIzawEsNLOPnHOrjmk32zl3\nWeBLFBGRhqqz5+6c2+mcW+RfPgysBjo1dWEiInL86tNzL2dmGUB/YF41m4eY2TJgO3CXc25lbZ+V\nlpbmMjIyGrJ7EZGIt3Dhwr3OufS62tU73M0sBXgLuNM5d+iYzYuArs65I2Y2EngXyKzmM8YAYwC6\ndu3KggUL6rt7EREBzOyb+rSr12gZM4vFF+yvOefePna7c+6Qc+6If3kKEGtmadW0e8E5l+Wcy0pP\nr/MfHhEROU71GS1jwIvAaufc4zW06eBvh5kN9H/uvkAWKiIi9VefyzJDgeuB5Wa2xL/uHqArgHPu\nOeAq4OdmVgLkA9c4TTcpIuKZOsPdOfc5YHW0eRp4OlBFiYhI4+gOVRGRMKRwFxEJQwp3EZEwFHLh\nvnbXYR6dtpYDuUVelyIiErRCLtw37T3C059uYHtOvteliIgErZAL99ZJcQDk5BV7XImISPAKuXBP\nifeN3swtKvG4EhGR4BVy4Z4UFw1AflGpx5WIiASvEAx3X889T+EuIlKjkAv3RH/PPU+XZUREahRy\n4Z4SH0NcTBTZhwu9LkVEJGiFXLhHRxltkmI1WkZEpBYhF+4AyXExGi0jIlKLkAz3xLhojZYREalF\nSIa7eu4iIrULyXBXz11EpHYhGe7J8dHkKtxFRGoUkuGeGBujnruISC1CMtx9PXddcxcRqUlIhvsX\nX+8jJ6+YTXtzvS5FRCQohWS4b92fB8CybTkeVyIiEpxCMtyf/fEAAFKT4zyuREQkOIVkuLdvmQDA\n9gN6GpOISHVCMtyjzAAY+/ZyjysREQlOIRnuPduleF2CiEhQC8lwj40+WnZZmfOwEhGR4BSS4Q6Q\nlhIPwBGNdxcRqSJkw/03w08G4FC+5nUXETlWyIZ7y4RYAA4q3EVEqgjZcAfftfbnZ270uA4RkeAT\nsuGe2b4FAJOX7vC4EhGR4BOy4d4jXcMhRURqErLhXtFDU1Z7XYKISFAJ6XDvlpYMwAuzdN1dRKSi\nOsPdzLqY2admtsrMVprZHdW0MTN7ysw2mNkyMxvQNOVWdvOwbuXLuw4WNMcuRURCQn167iXAr51z\nvYHBwC/MrPcxbUYAmf6fMcCzAa2yBq0SY8uXx7y6oDl2KSISEuoMd+fcTufcIv/yYWA10OmYZpcD\nrzifL4HWZnZCwKs9xmWnH93Fqh2Hmnp3IiIho0HX3M0sA+gPzDtmUydga4XX26j6D0DARUUZ55+c\nDkCJ5pgRESlX73A3sxTgLeBO59xxdZPNbIyZLTCzBdnZ2cfzEVU8fGWf8uWb/j4/IJ8pIhLq6hXu\nZhaLL9hfc869XU2T7UCXCq87+9dV4px7wTmX5ZzLSk9PP556q+jQKqF8+dO1gfkHQ0Qk1NVntIwB\nLwKrnXOP19BsMjDaP2pmMHDQObczgHXW6sJe7ZprVyIiISGmHm2GAtcDy81siX/dPUBXAOfcc8AU\nYCSwAcgDbgp8qTW769KT+XjNnubcpYhIUKsz3J1znwNWRxsH/CJQRTXUKSe09GrXIiJBKaTvUK3I\n/1hVPlzRbFeDRESCVtiEu/OPhFy67aC3hYiIBIGwCfeEWN+hFJeUeVyJiIj3wibcv30y05FCPVNV\nRCRswv3KAZ0B+Grzfo8rERHxXtiE+12XnATA19m5HlciIuK9sAn3mOiwORQRkUYLy0S84pk5Xpcg\nIuKpsAz3JVtzvC5BRMRTYRXuT13b3+sSRESCQliF+8jTOnhdgohIUAircK/4pWpBcamHlYiIeCus\nwh2gt38SsSc+WudxJSIi3gm7cO/ftTUAhZqGQEQiWNiF++0X9AQgs32Kx5WIiHgn7MI9Jd43Rf22\nA/keVyIi4p2wC/fkOF+4P/vZ1x5XIiLinbAL96ioow+Nyskr8rASERHvhF24V5STV+x1CSIingjL\ncH/++jMAze0uIpErLMP92y9VcxXuIhKhwjrcF2sCMRGJUGEZ7ie0TgBgwtQ1HlciIuKNsAz39JR4\nr0sQEfFUWIa7mdXdSEQkjIVluFdUpDlmRCQChW24n9DKd939gG5kEpEIFLbhfu9lvQHYn6twF5HI\nE7bhnpocB8CybRoOKSKRJ2zDvWVCLAC/fWu5x5WIiDS/sA339BYaDikikSvG6wKaSnqLeJLiornw\nlPZelyIi0uzCtucO0DU1iR05emiHiESesO25A6zZdRiAguJSEmKjPa5GRKT51NlzN7NJZrbHzFbU\nsP08MztoZkv8P/cFvszGOZSved1FJLLU57LMS8DwOtrMds718/+Mb3xZgXVQ4S4iEabOcHfOzQL2\nN0MtTealLzZ7XYKISLMK1BeqQ8xsmZlNNbNTA/SZjfbIVX0AeG3eFo8rERFpXoEI90VAV+dcH+Av\nwLs1NTSzMWa2wMwWZGdnB2DXtbtqQOcm34eISDBqdLg75w455474l6cAsWaWVkPbF5xzWc65rPT0\n9Mbuuk5RUZr6V0QiU6PD3cw6mH8CdTMb6P/MfY393EC5fvCJACz8JqS/NhARaZD6DIV8HZgLnGxm\n28zsJ2Z2q5nd6m9yFbDCzJYCTwHXOOdc05XcMJee2gGA7z871+NKRESaT503MTnnrq1j+9PA0wGr\nKMBOap/idQkiIs0urKcfAEiOD+ubcEVEqhX24Z5YYdqBILpaJCLSpMI+3CuOmPnDB6s9rEREpPmE\nfbgDTLoxC4AXP9/kcSUiIs0jIsL9gl5H53TXPDMiEgkiItwr6vvgdK9LEBFpchEX7iIikSBiwr1/\n19ZelyAi0mwiJtxfuD6rfLm4tMzDSkREml7EhHt6i3jaJMUCkDluqsfViIg0rYgJd4DZv73A6xJE\nRJpFRIV7SoWpCO5+c6mHlYiINK2ICveK3ly4jW/25XpdhohIk4i4cP/9qFPKl7cdyPewEhGRphNx\n4X5J7w7lyz/62zwPKxERaToRF+5d2yZVev3VZj2hSUTCT8SFO8Cwnkcf8fqD5+ZypLDEw2pERAIv\nIsP9pZvO5H8uP7X89bJtOR5WIyISeBEZ7jHRUQyt0Hu/bqKuvYtIeInIcAfonp5CanJc+euHp+pB\nHiISPiI23AHm/u7oHavPz9zI4QLN9S4i4SGiwz0+JrrS
69MfmE5pmZ6zKiKhL6LDHWDlg5dWet3j\nninszy3yqBoRkcCI+HBPjo/h3z87q9K61+dv8agaEZHAiPhwBxjYLZWbh3Yrf/2naWs9rEZEpPEU\n7n6XnNq+0ushD3/MnsMFHlUjItI4Cne/wd3bsuGPI8pf7zhYwMA/fsyOHE0uJiKhR+FeQUx0FLN/\nc36ldUMmfOJRNSIix0/hfowuqUlV1v19ziYPKhEROX4K92pMu/OcSq8ffH8Vt722kILiUo8qEhFp\nGIV7NU7u0II1/zO80ropy3fR694Pmfv1Po+qEhGpP4V7DRJio7mgV7sq66+d+KUH1YiINIzCvRbP\n/fgMLqwm4EtKyzyoRkSk/hTutYiLieLFG8+ssv6TNXs8qEZEpP7qDHczm2Rme8xsRQ3bzcyeMrMN\nZrbMzAYEvszgMubVhWSM/YBi9eBFJEjVp+f+EjC8lu0jgEz/zxjg2caXFVw2TxjF5gmjePu2IZXW\nZ46bqhE0IhKU6gx359wsoLanSF8OvOJ8vgRam9kJgSowmAzo2obuacmV1vW690PeW7Ldo4pERKoX\niGvunYCtFV5v868LSy+Mzqqy7o43lnDvuyso01zwIhIkmvULVTMbY2YLzGxBdnZ2c+46YHq2S6ly\nkxPAq19+w6n3T/OgIhGRqgIR7tuBLhVed/avq8I594JzLss5l5Wenh6AXXvj5A4tmH/PhVXW5xeX\nkjH2A7buz/OgKhGRowIR7pOB0f5RM4OBg865nQH43KDWrmUCM351Lh//+twq285+5FMPKhIROao+\nQyFfB+YCJ5vZNjP7iZndama3+ptMATYCG4CJwG1NVm2Q6dkuhR7p1V+myRj7AZv25npQlYgImHPe\nfAmYlZXlFixY4Mm+m8LW/XnV9tjHjujFLWd3JzrKPKhKRMKNmS10zlUd2XEM3aEaIF1Sk9g8YRQ9\n0isPlZwwdQ3vLdnOoYJiTVsgIs1G4R5gH//6PAZ3T6207lf/XkqfB6Yz4snZHlUlIpFG4d4EXrl5\nULXr1+85wkt68IeINAOFexOIi4ni07vOq3bbA++vat5iRCQiKdybSLe0ZBbdezF3XpRZZVvG2A/I\nLSxhyvKd5BaWeFCdiIQ7hXsTSk2O4/sDOle77dT7p3Hba4u4553lzVyViEQChXsT65KaxLo/jGDp\n/ZdUu/29JTv0AG4RCTiFezOIi4miVWIsc8ZeUO32B99fxadr9QAQEQkchXsz6tQ6kYevPL3abTf9\n/Ssyxn7AL/+1BK9uLBOR8KFwb2bXnNmFnwzrRmJsdLXb31m8nfH/WcWeQwWs3HGwmasTkXCh6Qc8\nUlBcSq97P6yz3eYJo5qhGhEJFZp+IMglxEbzys0DGTuiF3+44rQa213xzJxmrEpEwkWM1wVEsnNO\nSueck3zz2v/10w3sOFhQpc2SrTn84rVF9O/amu/07UjLhFgS46q/pCMi8i1dlgkieUUl9L6v9qc5\nnXFiG976+ZBa24hI+NJlmRCUFBfD3286k/iYmk/Lwm8OsO9IYTNWJSKhSD33IFRa5li98xCX/eXz\nGttc0KsdI07rwOSlO9h9qIDpv6z6RCgRCT/17bnrmnsQio4yTuvUiqX3X0LfB6dX2+aTNXv4ZE3V\nG58O5hUTHxtFQg1DLUUkMuiyTBBrlRjL4nsvZt0fRtTZdvLSHQD0HT+dayd+2dSliUiQU7gHuTbJ\nccTFRLF6/HDO9Y+sqc5/v76YmeuyAVi8Jae5yhORIKVwDxGJcdG8fPNAEmJrPmU3TJpfvvz0J+ub\noywRCVL6QjXEOOeYs2Efs9dnc+dFJ3HxEzPZdiC/1vfMu+dC2rdMaKYKRaQp6QvVMGVmDMtMY1hm\nGgB//dEA/jlvC2dmpPLrN5dW+56py3eSFBfDpn25/HZ4r+YsV0Q8onAPcX06t6ZP59YAPPPZBjZm\n51ZpU/HRfj87pzutk+KarT4R8YauuYeRB797ap1t+o3/iMufmcM7i7cxf9P+ZqhKRLygcA8jZ2em\n868xg+tst3RrDr/811Kufn4uAPtziygpLWvq8kSkGekL1TCUV1TCjpwCTmybxLYD+Zz/6Gf1et/d\nl55M/66tWbXjEDcOySAmWv/2iwSb+n6hqnCPALsPFTDooY8b9J5OrRNrfCygiHhHE4dJufYtE3jz\n1rP46bBu9X7P9px8Fm85QG5hCS9/sZn8otImrFBEAk099wiUMfaD43rfpodHYmYczCsmLiZK88qL\neEA9d6nR4nsv5h8/GdTg91313FwO5hfTd/x0Lnp8ZhNUJiKBonCPQG2S4xiWmcb3+ncC4JazuzHm\nnO5c0a9jre9b+M2B8lkqt+fk49VvfSJSN12WiWBFJWXsPJjPiW2Ty9cdyC3iosdnsi+3qF6fMXF0\nFi9/sZkbhmSwdX8eV5/ZhZR43Rsn0lQ0WkaOW0FxKTNW7+b2fy5u8HtvObsb40b1xjnH+j1HyGyX\ngpk1QZUikUnhLo3mnKPb76Y06jMeuaoPV2d1CVBFIqKJw6TRzIxZd5/Psu05dGqdyKx1e3lixroG\nfcZ/lu1kQNfW7DxYQLsWCZzcoUUTVSsiFdUr3M1sOPAkEA38zTk34Zjt5wHvAZv8q952zo0PYJ3i\nka5tk+jaNgmAvp1bsyH7CO/7n/pUH7PWZXPR49nlrzf8cQQlZa7SYwBLSstYvv0g/bu2CVzhIhGu\nztEyZhYNPAOMAHoD15pZ72qaznbO9fP/KNjDUFSUlU9O1rNdCp/edR4v3zyQzm0S6/0ZPcdNpde9\nH/L5+r1c/vTnvLdkO499tI7v/fULVu442FSli0Sc+vTcBwIbnHMbAczsDeByYFWt75KwlJocx5PX\n9GNozzTSUuLplpbM7N+cT5mDHvfU//r8j1+cB8AdbyyhbbJvCuJ9R4ooK3NERekLWJHGqs84907A\n1gqvt/nXHWuImS0zs6lmVu3cs2Y2xswWmNmC7Ozs6ppICLi8XyfSUuLLX5sZ0VHGO7cNOa7P+3bY\n5ehJ8+l+zxQen76WguJSTXkg0gh1jpYxs6uA4c65n/pfXw8Mcs7dXqFNS6DMOXfEzEYCTzrnMmv7\nXI2WCU9FJWXkF5WSGBfNSb+f2ujPW//HEcRWmJ3SOUeZg2j17iVCBXL6ge1AxbFsnf3ryjnnDjnn\njviXpwCxZpbWgHolTMTFRNEqKZa4mCj+dFWf45rmoKKrn59LxtgPeH3+Fg7kFpH1hxnll3/+9dUW\n9hwuqNR+f24R63YfbtQ+RcJBfXruMcA64EJ8of4VcJ1zbmWFNh2A3c45Z2YDgf8DTnS1fLh67pHj\ncEEx/1m2k7Mz03j5i838ZngvMsc1vlf/rY9+eQ6Z7X1DLAf+cQZ7DheyecKoSm2OFJaQGButHr+E\nvID13J1zJcDtwDRgNfBv59xKM7vVzG71N7sKWGFmS4GngGtqC3aJLC0SYrl2YFc6t0li3KjexEZH\n8bfRWQzMSOW
+y6obeNUwFz8xi8/X72X+pv3sOVwIwLYDeQDkF5VSVuY47f5p3PveikbvSyRU6A5V\n8ZRzjomzN/LQlDUB/+xXbh7I6Enz+e8LevLUJxsA+OuPBjDy9BOYvnIXU1fs4okf9gv4fkWakqb8\nlZBgZow5pwebJ4xi5t3ncfPQbvV6Dmx9fLhyF0B5sAPc9toituzLY8yrC3ln8XY+XLGT0jJfB2fp\n1hxy8uo3YZpIsFPPXYLSP+dtYfhpHUiKi6bPA9MpasIHeD/wnd7ExkQx7p0VZLZL4aNfndtk+xJp\nLPXcJaRdN6grqclxJMRGc36vdMB3SaUpPPD+Ksa947sev37PETbtzaWsrHKn50BuEc/P/BrnHEUl\nZTw2fS15RSVNUo9IIGjiMAl6Y87pwcx12Qzqlsqsu8+nsKSU7MOFjJ40n0d/0Jc7/7UkoPs7/9HP\nAPjb6Cw6tk6kfct4hj85m+zDhQw4sQ3rdh/mL59soLjUMXZEr4DuWyRQFO4S9M44sQ1r/mcEAG1T\nfOsy27dgw0MjAejTuRUXPBb4x/799JWqlw2XbMkpf3bswfzigO9TJFB0zV3CwrYDeWQfLqS41HH1\n83Pp2CqBF0ZncdlfPg/4vuJjoigsKaNDywTGjTqFR6atIaNtMhed0p77J69k1fhLSYpTv0mahh7W\nIRFry748uqQmYmaUlJbxyLS1vDBrIwBX9u/E24u31/EJxyfKoMzBn3/Yj0HdU8kvKiU+NpqOrRL0\nNCoJGIW7iJ9zjnmb9pPZLoW2KfF8tGo3t/gvuSTHRTPxhiyumzivyfb/0PdO57pBXSut23O4gOS4\nGJL1vFlpIIW7SA2KSsp4Z/E2vtu3U/n18/7jp3Mgr2mvoQ/rmcaNQzLISEvmosdnktE2iYmjs/i/\nRdu4sn/nSk+pOphfTKvE2CatR0KTwl2kAfKKSigsLuOb/XkczC/mhknzGTfyFP44ZXWz1fDZXeeR\nkZbMjFW7+ekrC3jz1rM4MyOVFdsPcmrHlszZsI9l23O47byezVaTBB+Fu0gjHcwrpu/46fzv908n\nv6iUB95v3ufTjDitA4u2HGD3oULO6t6WuRv3AbDp4ZHl1/BLyxxlzlWaFvlb01buYv3uw9x+Qa2z\nb0uI0U1MIo3UKimWzRNG8cMzu3Lj0G7l67+dWPKKfh1p1yK+hnc33tQVu9h9yDcR2rfBDrBoywGc\ncxSXltHjnilkjpvK3K9927fn5PPZ2j2Uljl+9upCHp2+jgO5Rfzu7WUUFJeyYPN+Nu3NbbKaJXio\n5y5ST898uoEWCTH8aNCJzN+0n7N6tGXJ1hxumDSfHunJPHZ1v/IboLzw1LX9+e/XF1dZ/+PBXfnH\nl1sY2rMtczb4/hF46+dn0TIhtnyq5GN9uGIXt/5jISsevJQU/5e+uYUlOCh/Ld7QZRkRjxSVlFV5\nCtXPzu3O8zM3lr++6JT2zFi9u7lLq+Lrh0ZSWFJaZVz+pU/MYu3uw/zzlkHc/eYyJo7OYuRTswHK\n58qfs2EvB/KKuKxPx2avO5LpsoyIR+JiosoDsGOrBEadfgK/vvhkhvRoW97m3stOIT4mipuGZnhU\npU+Pe6bQ+75pjH9/FQXFpTw5Yz2HC4rJL/Y9v/a6ifPYnpPPXz87OrPmS3M2AfCjv83j9n8u1hw7\nQUo9d5EmkldUQmx0VPmXnftzi7j33RU89L3TaZVUdZjj5KU7eHTaWrbsz2vuUhvsVxefxOMfrSt/\nXfHJVwdyi9iXW0TPdilelBb2dFlGJAQVlpSydX8+Fz0+k75dWvP7Uafwg+fmAvDoD/rSNjmOm176\nyuMqq5p593mc+6fPOOekdGatywZ8v7VM/q9hjHllASVljsm3Dytvv2lvLocLiunTuXWj930wr5iW\niTERcxewwl0kxDnnMDPmbNjL+0t3MOH7fQAoKC5lza7DXPHMnFrff9clJ/Ho9HW1tmlql57anmkr\nj3638O+fncXAbqlkjP0AgNdvGUzrpFg6t0kkKS6GO95YzAW92hEdZdzxxhJ+Oqwbf/t8E3N/dwEn\ntEqs8vl7DhUw8KGPueuSkyJmyKfCXSTMvTDra9JbxPO9/p3L15WVOca9u5zrB2fQu2NLVu04xHee\n/pzv9DmBd5fs8LDaul0/+ERe/fKbGrcf+9Dz0+6fxpHCo9f7J47O4uLe7XHOcf/klVx1Rufy3wxK\nSsuIjrJ69e6/2ryfdbsP86NBJx7nkTQthbuIVLIjJ5/Z67PplpbC1c/PZcKVpzP27eXl2zPbpbB+\nzxEPK6zdTUMzOCcznZho4+Uvvql2tNE/bxlEt7Rkznr4E1okxPDRL8+lZWIMve+bxt2Xnswvzq98\nd29OXhF7jxz9fmDVjkNVRgWt232YlgmxdGiV0MRHWD8KdxGp07eXRzZPGEVeUQl/nrG+fAZNgPd+\nMZQnP17PkB5t+cMHzTcVQ1NaNf5Sdh8q5LHpa/nPsp0AvH3bEHqf0JJe935Y3m7p/ZfQKjG20n+j\n4/H+0h38Z9kO/vzD/uVzGTWGwl1E6nQgt4ji0jLatTzaK80+XMjv313OEz/sV2n8e2FJKc99tpEn\nZviu42+eMIqt+/M4+5FPAeiamhQSI32O13f7duSqMzozetJ87rusNzcP8921nFtYghmMeHI2f7qq\nLwO7pZa/p+JvAo9f3ZcrB3Su9rMbor7hrlvNRCJYm+S4KuvSW8Tz/PVVsyM+Jpo7Lsqka9tEjhT4\nrnV3SU3i/duHces/FvL+7cMoLCll/ub9vLlgGzP9o2be+vkQnpv5NR+t8v6mrcaYvHQHk5f6vrcY\n/59VnNg2iS6pSVzyxCzat4xn96FCfvLyVzz43VN5bubX3Dy0G+kVpqeIaubRPOq5i0iz2JGTz9b9\neczZsJefnduDU++f5nVJnrhpaAb3f+fU436/eu4iElQ6tk6kY+tEBnX33al77knpREcZz/34DErL\nHNmHC+naNontOfkMnfAJAKd2bMldl5xcPrb/7Mw0Zq/fC8Ds35zPXz/bwOvzt3pzQMfp73M2M6xn\nGhee0r5J96Oeu4gEnfW7D/P6/K3cM7IXMdFRlJU5CkpKSYyNZtb6vezIyefagV1xzvHx6j38+eN1\nrNh+qPzyyKQbs7j5pdrzpW1yHJNuPJPL67hfoCmkxMew4sFLj+u9+kJVRCLGwbxiVu86xODuR+fv\n2XOogOIyR6vEWDbvzeW0Tq3KR74s+P1FpKX4roc/OWM9T8xYx4W92vHijWey+1ABy7cdJCkumm/2\n5/Hlxn1c0b8TKfExPD9zI09f15+Jszby2EfHf4NY386teK/CHbsNoXAXETlGTl4RRwpL6NwmqXzd\nkcISHvlwDWNH9KoyO2ZNDhUU0+eB6QAsuvdi7vzXEmaty+a/LujJry4+iakrdnHba4sAeP/2YXzn\n6c8rvX/ePRfSvuXxjZvXNXcRkWO0ToqjdVLlEUIp8TGMv/y0Bn1Oy4RY
nrymH6nJcaQmx/HKzQMr\nbY+POTrh7mmdWnLuSencOCSD83u1O/7iG0g9dxGRACstczw6fS23nN2d1GqGmzaGeu4iIh6JjjJ+\nO7yXpzXoYR0iImFI4S4iEoYU7iIiYahe4W5mw81srZltMLOx1Ww3M3vKv32ZmQ0IfKkiIlJfdYa7\nmUUDzwAjgN7AtWbW+5hmI4BM/88Y4NkA1ykiIg1Qn577QGCDc26jc64IeAO4/Jg2lwOvOJ8vgdZm\ndkKAaxURkXqqT7h3AirOzLPNv66hbTCzMWa2wMwWZGdnN7RWERGpp2b9QtU594JzLss5l5Went6c\nuxYRiSj1uYlpO9ClwuvO/nUNbVPJwoUL95pZzU/DrV0asPc43xtsdCzBKVyOJVyOA3Qs36rXk7vr\nE+5fAZlm1g1fYF8DXHdMm8nA7Wb2BjAIOOic21nbhzrnjrvrbmYL6nP7bSjQsQSncDmWcDkO0LE0\nVJ3h7pwrMbPbgWlANDDJObfSzG71b38OmAKMBDYAecBNTVeyiIjUpV5zyzjnpuAL8Irrnquw7IBf\nBLY0ERE5XqF6h+oLXhcQQDqW4BQuxxIuxwE6lgbxbMpfERFpOqHacxcRkVqEXLjXNc9NsDGzzWa2\n3MyWmNns01SsAAADaUlEQVQC/7pUM/vIzNb7/2xTof3v/Me21syO7wm6AWJmk8xsj5mtqLCuwbWb\n2Rn+/wYb/HMQWZAcywNmtt1/bpaY2chgPxYz62Jmn5rZKjNbaWZ3+NeH3Hmp5VhC8bwkmNl8M1vq\nP5YH/eu9Oy/OuZD5wTda52ugOxAHLAV6e11XHTVvBtKOWfcIMNa/PBb4X/9yb/8xxQPd/Mca7WHt\n5wADgBWNqR2YDwwGDJgKjAiSY3kAuKuatkF7LMAJwAD/cgtgnb/ekDsvtRxLKJ4XA1L8y7HAPH89\nnp2XUOu512eem1BwOfCyf/ll4IoK699wzhU65zbhG1o6sJr3Nwvn3Cxg/zGrG1S7+eYYaumc+9L5\n/s99pcJ7mk0Nx1KToD0W59xO59wi//JhYDW+qT5C7rzUciw1CeZjcc65I/6Xsf4fh4fnJdTCvV5z\n2AQZB8wws4VmNsa/rr07epPXLqC9fzkUjq+htXfyLx+7Plj8l/mmqZ5U4VfmkDgWM8sA+uPrJYb0\neTnmWCAEz4uZRZvZEmAP8JFzztPzEmrhHoqGOef64ZsW+Rdmdk7Fjf5/nUNyyFIo1+73LL5LfP2A\nncBj3pZTf2aWArwF3OmcO1RxW6idl2qOJSTPi3Ou1P93vTO+Xvhpx2xv1vMSauHe4DlsvOac2+7/\ncw/wDr7LLLv9v37h/3OPv3koHF9Da9/uXz52veecc7v9fyHLgIkcvQQW1MdiZrH4wvA159zb/tUh\neV6qO5ZQPS/fcs7lAJ8Cw/HwvIRauJfPc2NmcfjmuZnscU01MrNkM2vx7TJwCbACX803+JvdALzn\nX54MXGNm8eabyycT35crwaRBtft/JT1kZoP93/qPrvAeT1nlZw58D9+5gSA+Fv9+XwRWO+cer7Ap\n5M5LTccSoucl3cxa+5cTgYuBNXh5XprzG+VA/OCbw2Ydvm+Xx3ldTx21dsf3jfhSYOW39QJtgY+B\n9cAMILXCe8b5j20tHowqOab+1/H9WlyM79rfT46ndiAL31/Qr4Gn8d88FwTH8iqwHFjm/8t2QrAf\nCzAM36/2y4Al/p+RoXheajmWUDwvfYDF/ppXAPf513t2XnSHqohIGAq1yzIiIlIPCncRkTCkcBcR\nCUMKdxGRMKRwFxEJQwp3EZEwpHAXEQlDCncRkTD0/wNUeX4fMLUzAAAAAElFTkSuQmCC\n", 827 | "text/plain": [ 828 | "" 829 | ] 830 | }, 831 | "metadata": {}, 832 | "output_type": "display_data" 833 | } 834 | ], 835 | "source": [ 836 | "%matplotlib inline\n", 837 | "import matplotlib.pyplot as plt\n", 838 | "plt.plot(loss_track)\n", 839 | "print('loss {:.4f} after {} examples (batch_size={})'.format(loss_track[-1], len(loss_track)*batch_size, batch_size))" 840 | ] 841 | } 842 | ], 843 | "metadata": { 844 | "kernelspec": { 845 | "display_name": "Python 3", 846 | "language": "python", 847 | "name": "python3" 848 | }, 849 | "language_info": { 850 | "codemirror_mode": { 851 | "name": "ipython", 852 | "version": 3 853 | }, 854 | "file_extension": ".py", 855 | "mimetype": "text/x-python", 856 | "name": "python", 857 | "nbconvert_exporter": "python", 858 | "pygments_lexer": "ipython3", 859 | "version": "3.6.0" 860 | } 861 | }, 862 | "nbformat": 4, 863 | "nbformat_minor": 2 864 | } 865 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2017 Matvey Ezhov 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall 
be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # seq2seq with TensorFlow 2 | Collection of unfinished tutorials. May be good for educational purposes. 3 | 4 | ## **1 - [simple sequence-to-sequence model with dynamic unrolling](1-seq2seq.ipynb)** 5 | > Deliberately slow-moving, explicit tutorial. I tried to thoroughly explain everything that I found in any way confusing. 6 | 7 | > Implements the simple seq2seq model described in [Sutskever et al., 2014](https://arxiv.org/abs/1409.3215) and tests it against a toy memorization task. 8 | 9 | ![1-seq2seq](pictures/1-seq2seq.png) 10 | *Picture from [Sutskever et al., 2014](https://arxiv.org/abs/1409.3215)* 11 | 12 | ## **2 - [advanced dynamic seq2seq](2-seq2seq-advanced.ipynb)** 13 | > Encoder is bidirectional now. Decoder is implemented using `tf.nn.raw_rnn`. During training it feeds previously generated tokens as inputs, instead of the target sequence. 14 | 15 | ![2-seq2seq-feed-previous](pictures/2-seq2seq-feed-previous.png) 16 | *Picture from [Deep Learning for Chatbots](http://www.wildml.com/2016/04/deep-learning-for-chatbots-part-1-introduction/)* 17 | 18 | ## **3 - [Using `tf.contrib.seq2seq`](3-seq2seq-native-new.ipynb)** (TF<=1.1) 19 | > New dynamic seq2seq appeared in r1.0. Let's try it. 20 | 21 | UPDATE: this tutorial doesn't work with TF versions > 1.1 because of API changes. I recommend checking out the new [official tutorial](https://github.com/tensorflow/nmt) instead to learn the high-level seq2seq API. 22 | -------------------------------------------------------------------------------- /helpers.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | def batch(inputs, max_sequence_length=None): 4 | """ 5 | Args: 6 | inputs: 7 | list of sentences (integer lists) 8 | max_sequence_length: 9 | integer specifying how large the `max_time` dimension should be. 
10 | If None, maximum sequence length would be used 11 | 12 | Outputs: 13 | inputs_time_major: 14 | input sentences transformed into time-major matrix 15 | (shape [max_time, batch_size]) padded with 0s 16 | sequence_lengths: 17 | batch-sized list of integers specifying amount of active 18 | time steps in each input sequence 19 | """ 20 | 21 | sequence_lengths = [len(seq) for seq in inputs] 22 | batch_size = len(inputs) 23 | 24 | if max_sequence_length is None: 25 | max_sequence_length = max(sequence_lengths) 26 | 27 | inputs_batch_major = np.zeros(shape=[batch_size, max_sequence_length], dtype=np.int32) # == PAD 28 | 29 | for i, seq in enumerate(inputs): 30 | for j, element in enumerate(seq): 31 | inputs_batch_major[i, j] = element 32 | 33 | # [batch_size, max_time] -> [max_time, batch_size] 34 | inputs_time_major = inputs_batch_major.swapaxes(0, 1) 35 | 36 | return inputs_time_major, sequence_lengths 37 | 38 | 39 | def random_sequences(length_from, length_to, 40 | vocab_lower, vocab_upper, 41 | batch_size): 42 | """ Generates batches of random integer sequences, 43 | sequence length in [length_from, length_to], 44 | vocabulary in [vocab_lower, vocab_upper] 45 | """ 46 | if length_from > length_to: 47 | raise ValueError('length_from > length_to') 48 | 49 | def random_length(): 50 | if length_from == length_to: 51 | return length_from 52 | return np.random.randint(length_from, length_to + 1) 53 | 54 | while True: 55 | yield [ 56 | np.random.randint(low=vocab_lower, 57 | high=vocab_upper, 58 | size=random_length()).tolist() 59 | for _ in range(batch_size) 60 | ] -------------------------------------------------------------------------------- /model_new.py: -------------------------------------------------------------------------------- 1 | # Working with TF commit 24466c2e6d32621cd85f0a78d47df6eed2c5c5a6 2 | 3 | import math 4 | 5 | import numpy as np 6 | import tensorflow as tf 7 | import tensorflow.contrib.seq2seq as seq2seq 8 | from tensorflow.contrib.layers import safe_embedding_lookup_sparse as embedding_lookup_unique 9 | from tensorflow.contrib.rnn import LSTMCell, LSTMStateTuple, GRUCell 10 | 11 | import helpers 12 | 13 | 14 | class Seq2SeqModel(): 15 | """Seq2Seq model usign blocks from new `tf.contrib.seq2seq`. 16 | Requires TF 1.0.0-alpha""" 17 | 18 | PAD = 0 19 | EOS = 1 20 | 21 | def __init__(self, encoder_cell, decoder_cell, vocab_size, embedding_size, 22 | bidirectional=True, 23 | attention=False, 24 | debug=False): 25 | self.debug = debug 26 | self.bidirectional = bidirectional 27 | self.attention = attention 28 | 29 | self.vocab_size = vocab_size 30 | self.embedding_size = embedding_size 31 | 32 | self.encoder_cell = encoder_cell 33 | self.decoder_cell = decoder_cell 34 | 35 | self._make_graph() 36 | 37 | @property 38 | def decoder_hidden_units(self): 39 | # @TODO: is this correct for LSTMStateTuple? 
40 | return self.decoder_cell.output_size 41 | 42 | def _make_graph(self): 43 | if self.debug: 44 | self._init_debug_inputs() 45 | else: 46 | self._init_placeholders() 47 | 48 | self._init_decoder_train_connectors() 49 | self._init_embeddings() 50 | 51 | if self.bidirectional: 52 | self._init_bidirectional_encoder() 53 | else: 54 | self._init_simple_encoder() 55 | 56 | self._init_decoder() 57 | 58 | self._init_optimizer() 59 | 60 | def _init_debug_inputs(self): 61 | """ Everything is time-major """ 62 | x = [[5, 6, 7], 63 | [7, 6, 0], 64 | [0, 7, 0]] 65 | xl = [2, 3, 1] 66 | self.encoder_inputs = tf.constant(x, dtype=tf.int32, name='encoder_inputs') 67 | self.encoder_inputs_length = tf.constant(xl, dtype=tf.int32, name='encoder_inputs_length') 68 | 69 | self.decoder_targets = tf.constant(x, dtype=tf.int32, name='decoder_targets') 70 | self.decoder_targets_length = tf.constant(xl, dtype=tf.int32, name='decoder_targets_length') 71 | 72 | def _init_placeholders(self): 73 | """ Everything is time-major """ 74 | self.encoder_inputs = tf.placeholder( 75 | shape=(None, None), 76 | dtype=tf.int32, 77 | name='encoder_inputs', 78 | ) 79 | self.encoder_inputs_length = tf.placeholder( 80 | shape=(None,), 81 | dtype=tf.int32, 82 | name='encoder_inputs_length', 83 | ) 84 | 85 | # required for training, not required for testing 86 | self.decoder_targets = tf.placeholder( 87 | shape=(None, None), 88 | dtype=tf.int32, 89 | name='decoder_targets' 90 | ) 91 | self.decoder_targets_length = tf.placeholder( 92 | shape=(None,), 93 | dtype=tf.int32, 94 | name='decoder_targets_length', 95 | ) 96 | 97 | def _init_decoder_train_connectors(self): 98 | """ 99 | During training, `decoder_targets` 100 | and decoder logits. This means that their shapes should be compatible. 101 | 102 | Here we do a bit of plumbing to set this up. 103 | """ 104 | with tf.name_scope('DecoderTrainFeeds'): 105 | sequence_size, batch_size = tf.unstack(tf.shape(self.decoder_targets)) 106 | 107 | EOS_SLICE = tf.ones([1, batch_size], dtype=tf.int32) * self.EOS 108 | PAD_SLICE = tf.ones([1, batch_size], dtype=tf.int32) * self.PAD 109 | 110 | self.decoder_train_inputs = tf.concat([EOS_SLICE, self.decoder_targets], axis=0) 111 | self.decoder_train_length = self.decoder_targets_length + 1 112 | 113 | decoder_train_targets = tf.concat([self.decoder_targets, PAD_SLICE], axis=0) 114 | decoder_train_targets_seq_len, _ = tf.unstack(tf.shape(decoder_train_targets)) 115 | decoder_train_targets_eos_mask = tf.one_hot(self.decoder_train_length - 1, 116 | decoder_train_targets_seq_len, 117 | on_value=self.EOS, off_value=self.PAD, 118 | dtype=tf.int32) 119 | decoder_train_targets_eos_mask = tf.transpose(decoder_train_targets_eos_mask, [1, 0]) 120 | 121 | # hacky way using one_hot to put EOS symbol at the end of target sequence 122 | decoder_train_targets = tf.add(decoder_train_targets, 123 | decoder_train_targets_eos_mask) 124 | 125 | self.decoder_train_targets = decoder_train_targets 126 | 127 | self.loss_weights = tf.ones([ 128 | batch_size, 129 | tf.reduce_max(self.decoder_train_length) 130 | ], dtype=tf.float32, name="loss_weights") 131 | 132 | def _init_embeddings(self): 133 | with tf.variable_scope("embedding") as scope: 134 | 135 | # Uniform(-sqrt(3), sqrt(3)) has variance=1. 
136 | sqrt3 = math.sqrt(3) 137 | initializer = tf.random_uniform_initializer(-sqrt3, sqrt3) 138 | 139 | self.embedding_matrix = tf.get_variable( 140 | name="embedding_matrix", 141 | shape=[self.vocab_size, self.embedding_size], 142 | initializer=initializer, 143 | dtype=tf.float32) 144 | 145 | self.encoder_inputs_embedded = tf.nn.embedding_lookup( 146 | self.embedding_matrix, self.encoder_inputs) 147 | 148 | self.decoder_train_inputs_embedded = tf.nn.embedding_lookup( 149 | self.embedding_matrix, self.decoder_train_inputs) 150 | 151 | def _init_simple_encoder(self): 152 | with tf.variable_scope("Encoder") as scope: 153 | (self.encoder_outputs, self.encoder_state) = ( 154 | tf.nn.dynamic_rnn(cell=self.encoder_cell, 155 | inputs=self.encoder_inputs_embedded, 156 | sequence_length=self.encoder_inputs_length, 157 | time_major=True, 158 | dtype=tf.float32) 159 | ) 160 | 161 | def _init_bidirectional_encoder(self): 162 | with tf.variable_scope("BidirectionalEncoder") as scope: 163 | 164 | ((encoder_fw_outputs, 165 | encoder_bw_outputs), 166 | (encoder_fw_state, 167 | encoder_bw_state)) = ( 168 | tf.nn.bidirectional_dynamic_rnn(cell_fw=self.encoder_cell, 169 | cell_bw=self.encoder_cell, 170 | inputs=self.encoder_inputs_embedded, 171 | sequence_length=self.encoder_inputs_length, 172 | time_major=True, 173 | dtype=tf.float32) 174 | ) 175 | 176 | self.encoder_outputs = tf.concat((encoder_fw_outputs, encoder_bw_outputs), 2) 177 | 178 | if isinstance(encoder_fw_state, LSTMStateTuple): 179 | 180 | encoder_state_c = tf.concat( 181 | (encoder_fw_state.c, encoder_bw_state.c), 1, name='bidirectional_concat_c') 182 | encoder_state_h = tf.concat( 183 | (encoder_fw_state.h, encoder_bw_state.h), 1, name='bidirectional_concat_h') 184 | self.encoder_state = LSTMStateTuple(c=encoder_state_c, h=encoder_state_h) 185 | 186 | elif isinstance(encoder_fw_state, tf.Tensor): 187 | self.encoder_state = tf.concat((encoder_fw_state, encoder_bw_state), 1, name='bidirectional_concat') 188 | 189 | def _init_decoder(self): 190 | with tf.variable_scope("Decoder") as scope: 191 | def output_fn(outputs): 192 | return tf.contrib.layers.linear(outputs, self.vocab_size, scope=scope) 193 | 194 | if not self.attention: 195 | decoder_fn_train = seq2seq.simple_decoder_fn_train(encoder_state=self.encoder_state) 196 | decoder_fn_inference = seq2seq.simple_decoder_fn_inference( 197 | output_fn=output_fn, 198 | encoder_state=self.encoder_state, 199 | embeddings=self.embedding_matrix, 200 | start_of_sequence_id=self.EOS, 201 | end_of_sequence_id=self.EOS, 202 | maximum_length=tf.reduce_max(self.encoder_inputs_length) + 3, 203 | num_decoder_symbols=self.vocab_size, 204 | ) 205 | else: 206 | 207 | # attention_states: size [batch_size, max_time, num_units] 208 | attention_states = tf.transpose(self.encoder_outputs, [1, 0, 2]) 209 | 210 | (attention_keys, 211 | attention_values, 212 | attention_score_fn, 213 | attention_construct_fn) = seq2seq.prepare_attention( 214 | attention_states=attention_states, 215 | attention_option="bahdanau", 216 | num_units=self.decoder_hidden_units, 217 | ) 218 | 219 | decoder_fn_train = seq2seq.attention_decoder_fn_train( 220 | encoder_state=self.encoder_state, 221 | attention_keys=attention_keys, 222 | attention_values=attention_values, 223 | attention_score_fn=attention_score_fn, 224 | attention_construct_fn=attention_construct_fn, 225 | name='attention_decoder' 226 | ) 227 | 228 | decoder_fn_inference = seq2seq.attention_decoder_fn_inference( 229 | output_fn=output_fn, 230 | encoder_state=self.encoder_state, 231 
| attention_keys=attention_keys, 232 | attention_values=attention_values, 233 | attention_score_fn=attention_score_fn, 234 | attention_construct_fn=attention_construct_fn, 235 | embeddings=self.embedding_matrix, 236 | start_of_sequence_id=self.EOS, 237 | end_of_sequence_id=self.EOS, 238 | maximum_length=tf.reduce_max(self.encoder_inputs_length) + 3, 239 | num_decoder_symbols=self.vocab_size, 240 | ) 241 | 242 | (self.decoder_outputs_train, 243 | self.decoder_state_train, 244 | self.decoder_context_state_train) = ( 245 | seq2seq.dynamic_rnn_decoder( 246 | cell=self.decoder_cell, 247 | decoder_fn=decoder_fn_train, 248 | inputs=self.decoder_train_inputs_embedded, 249 | sequence_length=self.decoder_train_length, 250 | time_major=True, 251 | scope=scope, 252 | ) 253 | ) 254 | 255 | self.decoder_logits_train = output_fn(self.decoder_outputs_train) 256 | self.decoder_prediction_train = tf.argmax(self.decoder_logits_train, axis=-1, name='decoder_prediction_train') 257 | 258 | scope.reuse_variables() 259 | 260 | (self.decoder_logits_inference, 261 | self.decoder_state_inference, 262 | self.decoder_context_state_inference) = ( 263 | seq2seq.dynamic_rnn_decoder( 264 | cell=self.decoder_cell, 265 | decoder_fn=decoder_fn_inference, 266 | time_major=True, 267 | scope=scope, 268 | ) 269 | ) 270 | self.decoder_prediction_inference = tf.argmax(self.decoder_logits_inference, axis=-1, name='decoder_prediction_inference') 271 | 272 | def _init_optimizer(self): 273 | logits = tf.transpose(self.decoder_logits_train, [1, 0, 2]) 274 | targets = tf.transpose(self.decoder_train_targets, [1, 0]) 275 | self.loss = seq2seq.sequence_loss(logits=logits, targets=targets, 276 | weights=self.loss_weights) 277 | self.train_op = tf.train.AdamOptimizer().minimize(self.loss) 278 | 279 | def make_train_inputs(self, input_seq, target_seq): 280 | inputs_, inputs_length_ = helpers.batch(input_seq) 281 | targets_, targets_length_ = helpers.batch(target_seq) 282 | return { 283 | self.encoder_inputs: inputs_, 284 | self.encoder_inputs_length: inputs_length_, 285 | self.decoder_targets: targets_, 286 | self.decoder_targets_length: targets_length_, 287 | } 288 | 289 | def make_inference_inputs(self, input_seq): 290 | inputs_, inputs_length_ = helpers.batch(input_seq) 291 | return { 292 | self.encoder_inputs: inputs_, 293 | self.encoder_inputs_length: inputs_length_, 294 | } 295 | 296 | 297 | def make_seq2seq_model(**kwargs): 298 | args = dict(encoder_cell=LSTMCell(10), 299 | decoder_cell=LSTMCell(20), 300 | vocab_size=10, 301 | embedding_size=10, 302 | attention=True, 303 | bidirectional=True, 304 | debug=False) 305 | args.update(kwargs) 306 | return Seq2SeqModel(**args) 307 | 308 | 309 | def train_on_copy_task(session, model, 310 | length_from=3, length_to=8, 311 | vocab_lower=2, vocab_upper=10, 312 | batch_size=100, 313 | max_batches=5000, 314 | batches_in_epoch=1000, 315 | verbose=True): 316 | 317 | batches = helpers.random_sequences(length_from=length_from, length_to=length_to, 318 | vocab_lower=vocab_lower, vocab_upper=vocab_upper, 319 | batch_size=batch_size) 320 | loss_track = [] 321 | try: 322 | for batch in range(max_batches+1): 323 | batch_data = next(batches) 324 | fd = model.make_train_inputs(batch_data, batch_data) 325 | _, l = session.run([model.train_op, model.loss], fd) 326 | loss_track.append(l) 327 | 328 | if verbose: 329 | if batch == 0 or batch % batches_in_epoch == 0: 330 | print('batch {}'.format(batch)) 331 | print(' minibatch loss: {}'.format(session.run(model.loss, fd))) 332 | for i, (e_in, dt_pred) in 
enumerate(zip( 333 | fd[model.encoder_inputs].T, 334 | session.run(model.decoder_prediction_train, fd).T 335 | )): 336 | print(' sample {}:'.format(i + 1)) 337 | print(' enc input > {}'.format(e_in)) 338 | print(' dec train predicted > {}'.format(dt_pred)) 339 | if i >= 2: 340 | break 341 | print() 342 | except KeyboardInterrupt: 343 | print('training interrupted') 344 | 345 | return loss_track 346 | 347 | 348 | if __name__ == '__main__': 349 | import sys 350 | 351 | if 'fw-debug' in sys.argv: 352 | tf.reset_default_graph() 353 | with tf.Session() as session: 354 | model = make_seq2seq_model(debug=True) 355 | session.run(tf.global_variables_initializer()) 356 | session.run(model.decoder_prediction_train) 357 | session.run(model.decoder_prediction_train) 358 | 359 | elif 'fw-inf' in sys.argv: 360 | tf.reset_default_graph() 361 | with tf.Session() as session: 362 | model = make_seq2seq_model() 363 | session.run(tf.global_variables_initializer()) 364 | fd = model.make_inference_inputs([[5, 4, 6, 7], [6, 6]]) 365 | inf_out = session.run(model.decoder_prediction_inference, fd) 366 | print(inf_out) 367 | 368 | elif 'train' in sys.argv: 369 | tracks = {} 370 | 371 | tf.reset_default_graph() 372 | 373 | with tf.Session() as session: 374 | model = make_seq2seq_model(attention=True) 375 | session.run(tf.global_variables_initializer()) 376 | loss_track_attention = train_on_copy_task(session, model) 377 | 378 | tf.reset_default_graph() 379 | 380 | with tf.Session() as session: 381 | model = make_seq2seq_model(attention=False) 382 | session.run(tf.global_variables_initializer()) 383 | loss_track_no_attention = train_on_copy_task(session, model) 384 | 385 | import matplotlib.pyplot as plt 386 | plt.plot(loss_track) 387 | print('loss {:.4f} after {} examples (batch_size={})'.format(loss_track[-1], len(loss_track)*batch_size, batch_size)) 388 | 389 | else: 390 | tf.reset_default_graph() 391 | session = tf.InteractiveSession() 392 | model = make_seq2seq_model(debug=False) 393 | session.run(tf.global_variables_initializer()) 394 | 395 | fd = model.make_inference_inputs([[5, 4, 6, 7], [6, 6]]) 396 | 397 | inf_out = session.run(model.decoder_prediction_inference, fd) -------------------------------------------------------------------------------- /pictures/1-seq2seq.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ematvey/tensorflow-seq2seq-tutorials/f767fd66d940d7852e164731cc774de1f6c35437/pictures/1-seq2seq.png -------------------------------------------------------------------------------- /pictures/2-seq2seq-feed-previous.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ematvey/tensorflow-seq2seq-tutorials/f767fd66d940d7852e164731cc774de1f6c35437/pictures/2-seq2seq-feed-previous.png --------------------------------------------------------------------------------