├── README.md
├── conversion.png
├── data
    ├── small_vocab_en
    └── small_vocab_fr
├── decoder_shift.png
├── decoder_training_shift.png
├── dlnd_language_translationv2.html
├── dlnd_language_translationv2.ipynb
├── encoding_model.PNG
├── go_insert.png
├── gradient_clipping.PNG
├── inference_phase.PNG
├── lookup.png
├── pad_insert.png
├── params.p
└── training_phase.PNG


/README.md:
--------------------------------------------------------------------------------
 1 | # MLT (EN to FR ) TensorFlow
 2 | 
 3 | In this project, I am going to build language translation model called `seq2seq model or encoder-decoder model` in TensorFlow. The objective of the model is translating English sentences to French sentences. I am going to show the detailed steps, and they will answer to the questions like `how to preprocess the dataset`, `how to define inputs`, `how to define encoder model`, `how to define decoder model`, `how to build the entire seq2seq model`, `how to calculate the loss and clip gradients`, and `how to train and get prediction`. Please open the IPython notebook file to see the full workflow and detailed descriptions.
 4 | 
 5 | This is a part of Udacity's Deep Learning Nanodegree. Some codes/functions (save, load, measuring accuracy, etc) are provided by Udacity. However, majority part is implemented by myself along with much richer explanations and references on each section.
 6 | 
 7 | You can find only the model part explained in my medium post. https://medium.com/@parkchansung/seq2seq-model-in-tensorflow-ec0c557e560f
 8 | 
 9 | # Brief Overview of the Contents
10 | ### Data preprocessing
11 | In this section, you will see how to get the data, how to create `lookup table`, and how to `convert raw text to index based array` with the lookup table.
12 | 
13 | <div style="text-align:center;">
14 |   <img src='./conversion.png' alt='Drawing' width='500px'>
15 | </div>
16 | 
17 | ### Build model
18 | In short, this section will show how to `define the Seq2Seq model in TensorFlow`. The below steps (implementation) will be covered.
19 | - __(1)__ define input parameters to the encoder model
20 |   - `enc_dec_model_inputs`
21 | - __(2)__ build encoder model
22 |   - `encoding_layer`
23 | - __(3)__ define input parameters to the decoder model
24 |   - `enc_dec_model_inputs`, `process_decoder_input`, `decoding_layer`
25 | - __(4)__ build decoder model for training
26 |   - `decoding_layer_train`
27 | - __(5)__ build decoder model for inference
28 |   - `decoding_layer_infer`
29 | - __(6)__ put (4) and (5) together
30 |   - `decoding_layer`
31 | - __(7)__ connect encoder and decoder models
32 |   - `seq2seq_model`
33 | - __(8)__ train and estimate loss and accuracy
34 | 
35 | <div style="text-align:center;">
36 |   <img src="./decoder_shift.png" style="width:500px;"/>
37 | </div>
38 | 
39 | ### Training
40 | This section is about putting previously defined functions together to `build an actual instance of the model`. Furthermore, it will show how to `define cost function`, how to `apply optimizer` to the cost function, and how to modify the value of the gradients in the TensorFlow's optimizer module to perform `gradient clipping`.
41 | 
42 | <div style="text-align:center;">
43 |   <img src="./gradient_clipping.PNG" style="width:500px;"/>
44 | </div>
45 | 
46 | ### Prediction
47 | Nothing special but showing the prediction result.
48 | 


--------------------------------------------------------------------------------
/conversion.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/deep-diver/EN-FR-MLT-tensorflow/787400286758263cd127229e446bf8a28b8e9ca2/conversion.png


--------------------------------------------------------------------------------
/decoder_shift.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/deep-diver/EN-FR-MLT-tensorflow/787400286758263cd127229e446bf8a28b8e9ca2/decoder_shift.png


--------------------------------------------------------------------------------
/decoder_training_shift.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/deep-diver/EN-FR-MLT-tensorflow/787400286758263cd127229e446bf8a28b8e9ca2/decoder_training_shift.png


--------------------------------------------------------------------------------
/dlnd_language_translationv2.ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |  "cells": [
   3 |   {
   4 |    "cell_type": "markdown",
   5 |    "metadata": {
   6 |     "collapsed": true
   7 |    },
   8 |    "source": [
   9 |     "# Language Translation\n",
  10 |     "In this project, I am going to build language translation model called `seq2seq model or encoder-decoder model` in TensorFlow. The objective of the model is translating English sentences to French sentences. I am going to show the detailed steps, and they will answer to the questions like `how to preprocess the dataset`, `how to define inputs`, `how to define encoder model`, `how to define decoder model`, `how to build the entire seq2seq model`, `how to calculate the loss and clip gradients`, and `how to train and get prediction`.\n",
  11 |     "\n",
  12 |     "This is a part of Udacity's Deep Learning Nanodegree. Some codes/functions (save, load, measuring accuracy, etc) are provided by Udacity. However, majority part is implemented by myself along with much richer explanations and references on each section.  \n",
  13 |     "\n",
  14 |     "## Get the Data\n",
  15 |     "While I am running this project on my labtop computer, I cannot handle a huge dataset. Rather I am going to use the reduced size of the original dataset ([WMT10 French-English corpus](http://www.statmt.org/wmt10/training-giga-fren.tar)). This version of data is provided by Udacity. If you have GPU machines, feel free to run the codes. There is no other configuration needed."
  16 |    ]
  17 |   },
  18 |   {
  19 |    "cell_type": "code",
  20 |    "execution_count": 10,
  21 |    "metadata": {},
  22 |    "outputs": [],
  23 |    "source": [
  24 |     "import os\n",
  25 |     "import pickle\n",
  26 |     "import copy\n",
  27 |     "import numpy as np\n",
  28 |     "\n",
  29 |     "def load_data(path):\n",
  30 |     "    input_file = os.path.join(path)\n",
  31 |     "    with open(input_file, 'r', encoding='utf-8') as f:\n",
  32 |     "        data = f.read()\n",
  33 |     "\n",
  34 |     "    return data"
  35 |    ]
  36 |   },
  37 |   {
  38 |    "cell_type": "code",
  39 |    "execution_count": 11,
  40 |    "metadata": {},
  41 |    "outputs": [],
  42 |    "source": [
  43 |     "source_path = 'data/small_vocab_en'\n",
  44 |     "target_path = 'data/small_vocab_fr'\n",
  45 |     "source_text = load_data(source_path)\n",
  46 |     "target_text = load_data(target_path)"
  47 |    ]
  48 |   },
  49 |   {
  50 |    "cell_type": "markdown",
  51 |    "metadata": {},
  52 |    "source": [
  53 |     "## Explore the Data\n",
  54 |     "\n",
  55 |     "The two datasets store bunch of sentences in different language, and that is something we don't have to explore for now. You probably already know how your data looks like when you decided to download this one. **However**, it is worthwhile to explore how complex the datasets are. The complexity could suggest how we should approach to get the right result still considering some of restrictions. \n",
  56 |     "\n",
  57 |     "`note: ` The two files exactly contains the same number of lines. Each i-th line in both files has the same meaning but expressed in different languages."
  58 |    ]
  59 |   },
  60 |   {
  61 |    "cell_type": "code",
  62 |    "execution_count": 12,
  63 |    "metadata": {},
  64 |    "outputs": [
  65 |     {
  66 |      "name": "stdout",
  67 |      "output_type": "stream",
  68 |      "text": [
  69 |       "Dataset Brief Stats\n",
  70 |       "* number of unique words in English sample sentences: 227        [this is roughly measured/without any preprocessing]\n",
  71 |       "\n",
  72 |       "* English sentences\n",
  73 |       "\t- number of sentences: 137861\n",
  74 |       "\t- avg. number of words in a sentence: 13.225277634719028\n",
  75 |       "* French sentences\n",
  76 |       "\t- number of sentences: 137861 [data integrity check / should have the same number]\n",
  77 |       "\t- avg. number of words in a sentence: 14.226612312401622\n",
  78 |       "\n",
  79 |       "* Sample sentences range from 0 to 5\n",
  80 |       "[1-th] sentence\n",
  81 |       "\tEN: new jersey is sometimes quiet during autumn , and it is snowy in april .\n",
  82 |       "\tFR: new jersey est parfois calme pendant l' automne , et il est neigeux en avril .\n",
  83 |       "\n",
  84 |       "[2-th] sentence\n",
  85 |       "\tEN: the united states is usually chilly during july , and it is usually freezing in november .\n",
  86 |       "\tFR: les états-unis est généralement froid en juillet , et il gèle habituellement en novembre .\n",
  87 |       "\n",
  88 |       "[3-th] sentence\n",
  89 |       "\tEN: california is usually quiet during march , and it is usually hot in june .\n",
  90 |       "\tFR: california est généralement calme en mars , et il est généralement chaud en juin .\n",
  91 |       "\n",
  92 |       "[4-th] sentence\n",
  93 |       "\tEN: the united states is sometimes mild during june , and it is cold in september .\n",
  94 |       "\tFR: les états-unis est parfois légère en juin , et il fait froid en septembre .\n",
  95 |       "\n",
  96 |       "[5-th] sentence\n",
  97 |       "\tEN: your least liked fruit is the grape , but my least liked is the apple .\n",
  98 |       "\tFR: votre moins aimé fruit est le raisin , mais mon moins aimé est la pomme .\n",
  99 |       "\n"
 100 |      ]
 101 |     }
 102 |    ],
 103 |    "source": [
 104 |     "import numpy as np\n",
 105 |     "from collections import Counter\n",
 106 |     "\n",
 107 |     "print('Dataset Brief Stats')\n",
 108 |     "print('* number of unique words in English sample sentences: {}\\\n",
 109 |     "        [this is roughly measured/without any preprocessing]'.format(len(Counter(source_text.split()))))\n",
 110 |     "print()\n",
 111 |     "\n",
 112 |     "english_sentences = source_text.split('\\n')\n",
 113 |     "print('* English sentences')\n",
 114 |     "print('\\t- number of sentences: {}'.format(len(english_sentences)))\n",
 115 |     "print('\\t- avg. number of words in a sentence: {}'.format(np.average([len(sentence.split()) for sentence in english_sentences])))\n",
 116 |     "\n",
 117 |     "french_sentences = target_text.split('\\n')\n",
 118 |     "print('* French sentences')\n",
 119 |     "print('\\t- number of sentences: {} [data integrity check / should have the same number]'.format(len(french_sentences)))\n",
 120 |     "print('\\t- avg. number of words in a sentence: {}'.format(np.average([len(sentence.split()) for sentence in french_sentences])))\n",
 121 |     "print()\n",
 122 |     "\n",
 123 |     "sample_sentence_range = (0, 5)\n",
 124 |     "side_by_side_sentences = list(zip(english_sentences, french_sentences))[sample_sentence_range[0]:sample_sentence_range[1]]\n",
 125 |     "print('* Sample sentences range from {} to {}'.format(sample_sentence_range[0], sample_sentence_range[1]))\n",
 126 |     "\n",
 127 |     "for index, sentence in enumerate(side_by_side_sentences):\n",
 128 |     "    en_sent, fr_sent = sentence\n",
 129 |     "    print('[{}-th] sentence'.format(index+1))\n",
 130 |     "    print('\\tEN: {}'.format(en_sent))\n",
 131 |     "    print('\\tFR: {}'.format(fr_sent))\n",
 132 |     "    print()"
 133 |    ]
 134 |   },
 135 |   {
 136 |    "cell_type": "markdown",
 137 |    "metadata": {},
 138 |    "source": [
 139 |     "## Preprocessing\n",
 140 |     "\n",
 141 |     "Here are brief overview what steps will be done in this section\n",
 142 |     "\n",
 143 |     "- **create lookup tables** \n",
 144 |     "  - create two mapping tables \n",
 145 |     "      - (key, value) == (unique word string, its unique index)     - `(1)`\n",
 146 |     "      - (key, value) == (its unique index, unique word string)     - `(2)`\n",
 147 |     "      - `(1)` is used in the next step, and (2) is used later for prediction step\n",
 148 |     "      \n",
 149 |     "      \n",
 150 |     "- **text to word ids**\n",
 151 |     "  - convert each string word in the list of sentences to the index\n",
 152 |     "  - `(1)` is used for converting process\n",
 153 |     "  \n",
 154 |     "  \n",
 155 |     "- **save the pre-processed data**\n",
 156 |     "  - create two `(1)` mapping tables for English and French\n",
 157 |     "  - using the mapping tables, replace strings in the original source and target dataset with indicies\n",
 158 |     "\n",
 159 |     "### Create Lookup Tables\n",
 160 |     "\n",
 161 |     "As mentioned breifly, I am going to implement a function to create lookup tables. Since every models are mathmatically represented, the input and the output(prediction) should also be represented as numbers. That is why this step is necessary for NLP problem because human readable text is not machine readable. This function takes a list of sentences and returns two mapping tables (dictionary data type). Along with the list of sentences, there are special tokens, `<PAD>`, `<EOS>`, `<UNK>`, and `<GO>` to be added in the mapping tables. \n",
 162 |     "\n",
 163 |     "- (key, value) == (unique word string, its unique index)     - `(1)`\n",
 164 |     "- (key, value) == (its unique index, unique word string)     - `(2)`\n",
 165 |     "\n",
 166 |     "`(1)` will be used in the next step, `test to word ids`, to find a match between word and its index. `(2)` is not used in pre-processing step, but `(2)` will be used later. After making a prediction, the sequences of words in the output sentence will be represented as their indicies. The predicted output is machine readable but not human readable. That is why we need `(2)` to convert each indicies of words back into human readable words in string.\n",
 167 |     "\n",
 168 |     "<br/>\n",
 169 |     "<img src='./lookup.png' alt='Drawing' width='70%'>\n",
 170 |     "\n",
 171 |     "#### References\n",
 172 |     "- [Why special tokens?](https://datascience.stackexchange.com/questions/26947/why-do-we-need-to-add-start-s-end-s-symbols-when-using-recurrent-neural-n)\n",
 173 |     "- [Python `enumerate`](https://docs.python.org/3/library/functions.html#enumerate)"
 174 |    ]
 175 |   },
 176 |   {
 177 |    "cell_type": "code",
 178 |    "execution_count": 13,
 179 |    "metadata": {},
 180 |    "outputs": [],
 181 |    "source": [
 182 |     "CODES = {'<PAD>': 0, '<EOS>': 1, '<UNK>': 2, '<GO>': 3 }\n",
 183 |     "\n",
 184 |     "def create_lookup_tables(text):\n",
 185 |     "    # make a list of unique words\n",
 186 |     "    vocab = set(text.split())\n",
 187 |     "\n",
 188 |     "    # (1)\n",
 189 |     "    # starts with the special tokens\n",
 190 |     "    vocab_to_int = copy.copy(CODES)\n",
 191 |     "\n",
 192 |     "    # the index (v_i) will starts from 4 (the 2nd arg in enumerate() specifies the starting index)\n",
 193 |     "    # since vocab_to_int already contains special tokens\n",
 194 |     "    for v_i, v in enumerate(vocab, len(CODES)):\n",
 195 |     "        vocab_to_int[v] = v_i\n",
 196 |     "\n",
 197 |     "    # (2)\n",
 198 |     "    int_to_vocab = {v_i: v for v, v_i in vocab_to_int.items()}\n",
 199 |     "\n",
 200 |     "    return vocab_to_int, int_to_vocab"
 201 |    ]
 202 |   },
 203 |   {
 204 |    "cell_type": "markdown",
 205 |    "metadata": {},
 206 |    "source": [
 207 |     "### Text to Word Ids\n",
 208 |     "\n",
 209 |     "Two `(1)` lookup tables will be provided in `text_to_ids` functions as arguments. They will be used in the converting process for English(source) and French(target) respectively. This part is more like a programming part, so there are not much to mention. I will just go over few minor things to remember before jumping in.\n",
 210 |     "\n",
 211 |     "- original(raw) source & target datas contain a list of sentences\n",
 212 |     "  - they are represented as a string \n",
 213 |     "\n",
 214 |     "- the number of sentences are the same for English and French\n",
 215 |     " \n",
 216 |     "- by accessing each sentences, need to convert word into the corresponding index.\n",
 217 |     "  - each word should be stored in a list\n",
 218 |     "  - this makes the resuling list as a 2-D array ( row: sentence, column: word index )\n",
 219 |     "  \n",
 220 |     "- for every target sentences, special token, `<EOS>` should be inserted at the end\n",
 221 |     "  - this token suggests when to stop creating a sequence\n",
 222 |     "  \n",
 223 |     "<br/>\n",
 224 |     "<img src='./conversion.png' alt='Drawing' width='70%'>\n",
 225 |     "<br/>"
 226 |    ]
 227 |   },
 228 |   {
 229 |    "cell_type": "code",
 230 |    "execution_count": 14,
 231 |    "metadata": {},
 232 |    "outputs": [],
 233 |    "source": [
 234 |     "def text_to_ids(source_text, target_text, source_vocab_to_int, target_vocab_to_int):\n",
 235 |     "    \"\"\"\n",
 236 |     "        1st, 2nd args: raw string text to be converted\n",
 237 |     "        3rd, 4th args: lookup tables for 1st and 2nd args respectively\n",
 238 |     "    \n",
 239 |     "        return: A tuple of lists (source_id_text, target_id_text) converted\n",
 240 |     "    \"\"\"\n",
 241 |     "    # empty list of converted sentences\n",
 242 |     "    source_text_id = []\n",
 243 |     "    target_text_id = []\n",
 244 |     "    \n",
 245 |     "    # make a list of sentences (extraction)\n",
 246 |     "    source_sentences = source_text.split(\"\\n\")\n",
 247 |     "    target_sentences = target_text.split(\"\\n\")\n",
 248 |     "    \n",
 249 |     "    max_source_sentence_length = max([len(sentence.split(\" \")) for sentence in source_sentences])\n",
 250 |     "    max_target_sentence_length = max([len(sentence.split(\" \")) for sentence in target_sentences])\n",
 251 |     "    \n",
 252 |     "    # iterating through each sentences (# of sentences in source&target is the same)\n",
 253 |     "    for i in range(len(source_sentences)):\n",
 254 |     "        # extract sentences one by one\n",
 255 |     "        source_sentence = source_sentences[i]\n",
 256 |     "        target_sentence = target_sentences[i]\n",
 257 |     "        \n",
 258 |     "        # make a list of tokens/words (extraction) from the chosen sentence\n",
 259 |     "        source_tokens = source_sentence.split(\" \")\n",
 260 |     "        target_tokens = target_sentence.split(\" \")\n",
 261 |     "        \n",
 262 |     "        # empty list of converted words to index in the chosen sentence\n",
 263 |     "        source_token_id = []\n",
 264 |     "        target_token_id = []\n",
 265 |     "        \n",
 266 |     "        for index, token in enumerate(source_tokens):\n",
 267 |     "            if (token != \"\"):\n",
 268 |     "                source_token_id.append(source_vocab_to_int[token])\n",
 269 |     "        \n",
 270 |     "        for index, token in enumerate(target_tokens):\n",
 271 |     "            if (token != \"\"):\n",
 272 |     "                target_token_id.append(target_vocab_to_int[token])\n",
 273 |     "                \n",
 274 |     "        # put <EOS> token at the end of the chosen target sentence\n",
 275 |     "        # this token suggests when to stop creating a sequence\n",
 276 |     "        target_token_id.append(target_vocab_to_int['<EOS>'])\n",
 277 |     "            \n",
 278 |     "        # add each converted sentences in the final list\n",
 279 |     "        source_text_id.append(source_token_id)\n",
 280 |     "        target_text_id.append(target_token_id)\n",
 281 |     "    \n",
 282 |     "    return source_text_id, target_text_id"
 283 |    ]
 284 |   },
 285 |   {
 286 |    "cell_type": "markdown",
 287 |    "metadata": {},
 288 |    "source": [
 289 |     "### Preprocess and Save Data\n",
 290 |     "\n",
 291 |     "`create_lookup_tables`, `text_to_ids` are generalized functions. It can  be used for other languages too. In this particular project, the target languages are English and French, so those languages have to fed into `create_lookup_tables`, `text_to_ids` functions to generate pre-processed dataset for this project. Here is the steps to do it.\n",
 292 |     "\n",
 293 |     "- Load data(text) from the original file for English and French\n",
 294 |     "- Make them lower case letters\n",
 295 |     "- Create lookup tables for both English and French\n",
 296 |     "- Convert the original data into the list of sentences whose words are represented in index\n",
 297 |     "- Finally, save the preprocessed data to the external file (checkpoint)"
 298 |    ]
 299 |   },
 300 |   {
 301 |    "cell_type": "code",
 302 |    "execution_count": 15,
 303 |    "metadata": {},
 304 |    "outputs": [],
 305 |    "source": [
 306 |     "def preprocess_and_save_data(source_path, target_path, text_to_ids):\n",
 307 |     "    # Preprocess\n",
 308 |     "    \n",
 309 |     "    # load original data (English, French)\n",
 310 |     "    source_text = load_data(source_path)\n",
 311 |     "    target_text = load_data(target_path)\n",
 312 |     "\n",
 313 |     "    # to the lower case\n",
 314 |     "    source_text = source_text.lower()\n",
 315 |     "    target_text = target_text.lower()\n",
 316 |     "\n",
 317 |     "    # create lookup tables for English and French data\n",
 318 |     "    source_vocab_to_int, source_int_to_vocab = create_lookup_tables(source_text)\n",
 319 |     "    target_vocab_to_int, target_int_to_vocab = create_lookup_tables(target_text)\n",
 320 |     "\n",
 321 |     "    # create list of sentences whose words are represented in index\n",
 322 |     "    source_text, target_text = text_to_ids(source_text, target_text, source_vocab_to_int, target_vocab_to_int)\n",
 323 |     "\n",
 324 |     "    # Save data for later use\n",
 325 |     "    pickle.dump((\n",
 326 |     "        (source_text, target_text),\n",
 327 |     "        (source_vocab_to_int, target_vocab_to_int),\n",
 328 |     "        (source_int_to_vocab, target_int_to_vocab)), open('preprocess.p', 'wb'))"
 329 |    ]
 330 |   },
 331 |   {
 332 |    "cell_type": "code",
 333 |    "execution_count": 16,
 334 |    "metadata": {},
 335 |    "outputs": [],
 336 |    "source": [
 337 |     "preprocess_and_save_data(source_path, target_path, text_to_ids)"
 338 |    ]
 339 |   },
 340 |   {
 341 |    "cell_type": "markdown",
 342 |    "metadata": {},
 343 |    "source": [
 344 |     "# Check Point\n",
 345 |     " This project uses a small set of sentences. However, in general, NLP requires a huge amount of raw text data. It would take quite a long time to preprocess, so it is recommended to avoid whenever possible. In practice, save the preprocessed data to the external file could speed up your job and let you focus more on building a model."
 346 |    ]
 347 |   },
 348 |   {
 349 |    "cell_type": "code",
 350 |    "execution_count": 17,
 351 |    "metadata": {},
 352 |    "outputs": [],
 353 |    "source": [
 354 |     "import pickle\n",
 355 |     "\n",
 356 |     "def load_preprocess():\n",
 357 |     "    with open('preprocess.p', mode='rb') as in_file:\n",
 358 |     "        return pickle.load(in_file)"
 359 |    ]
 360 |   },
 361 |   {
 362 |    "cell_type": "code",
 363 |    "execution_count": 18,
 364 |    "metadata": {},
 365 |    "outputs": [],
 366 |    "source": [
 367 |     "import numpy as np\n",
 368 |     "\n",
 369 |     "(source_int_text, target_int_text), (source_vocab_to_int, target_vocab_to_int), _ = load_preprocess()"
 370 |    ]
 371 |   },
 372 |   {
 373 |    "cell_type": "markdown",
 374 |    "metadata": {},
 375 |    "source": [
 376 |     "### Check the Version of TensorFlow and Access to GPU\n",
 377 |     "Since the Recurrent Neural Networks is kind of heavy model to train, it is recommended to train the model in GPU environment. "
 378 |    ]
 379 |   },
 380 |   {
 381 |    "cell_type": "code",
 382 |    "execution_count": 19,
 383 |    "metadata": {},
 384 |    "outputs": [
 385 |     {
 386 |      "name": "stdout",
 387 |      "output_type": "stream",
 388 |      "text": [
 389 |       "TensorFlow Version: 1.7.0\n"
 390 |      ]
 391 |     },
 392 |     {
 393 |      "name": "stderr",
 394 |      "output_type": "stream",
 395 |      "text": [
 396 |       "/anaconda/envs/test/lib/python3.6/site-packages/ipykernel_launcher.py:12: UserWarning: No GPU found. Please use a GPU to train your neural network.\n",
 397 |       "  if sys.path[0] == '':\n"
 398 |      ]
 399 |     }
 400 |    ],
 401 |    "source": [
 402 |     "from distutils.version import LooseVersion\n",
 403 |     "import warnings\n",
 404 |     "import tensorflow as tf\n",
 405 |     "from tensorflow.python.layers.core import Dense\n",
 406 |     "\n",
 407 |     "# Check TensorFlow Version\n",
 408 |     "assert LooseVersion(tf.__version__) >= LooseVersion('1.1'), 'Please use TensorFlow version 1.1 or newer'\n",
 409 |     "print('TensorFlow Version: {}'.format(tf.__version__))\n",
 410 |     "\n",
 411 |     "# Check for a GPU\n",
 412 |     "if not tf.test.gpu_device_name():\n",
 413 |     "    warnings.warn('No GPU found. Please use a GPU to train your neural network.')\n",
 414 |     "else:\n",
 415 |     "    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))"
 416 |    ]
 417 |   },
 418 |   {
 419 |    "cell_type": "markdown",
 420 |    "metadata": {},
 421 |    "source": [
 422 |     "## Build the Neural Network\n",
 423 |     "\n",
 424 |     " In this notebook, I am going to build a special kind of model called 'sequence to sequence' (seq2seq in short). You can separate the entire model into 2 small sub-models. The first sub-model is called as __[E]__ Encoder, and the second sub-model is called as __[D]__ Decoder. __[E]__ takes a raw input text data just like any other RNN architectures do. At the end, __[E]__ outputs a neural representation. This is a very typical work, but you need to pay attention what this output really is. The output of __[E]__ is going to be the input data for __[D]__.\n",
 425 |     "\n",
 426 |     "That is why we call __[E]__ as Encoder and __[D]__ as Decoder. __[E]__ makes an output encoded in neural representational form, and we don't know what it really is. It is somewhat encrypted. __[D]__ has the ability to look inside the __[E]__'s output, and it will create a totally different output data (translated in French in this case). \n",
 427 |     "\n",
 428 |     "In order to build such a model, there are 6 steps to do overall. I noted what functions to be implemented are related to those steps.\n",
 429 |     "- __(1)__ define input parameters to the encoder model\n",
 430 |     "  - `enc_dec_model_inputs`\n",
 431 |     "- __(2)__ build encoder model\n",
 432 |     "  - `encoding_layer`\n",
 433 |     "- __(3)__ define input parameters to the decoder model\n",
 434 |     "  - `enc_dec_model_inputs`, `process_decoder_input`, `decoding_layer`\n",
 435 |     "- __(4)__ build decoder model for training\n",
 436 |     "  - `decoding_layer_train`\n",
 437 |     "- __(5)__ build decoder model for inference\n",
 438 |     "  - `decoding_layer_infer`\n",
 439 |     "- __(6)__ put (4) and (5) together \n",
 440 |     "  - `decoding_layer`\n",
 441 |     "- __(7)__ connect encoder and decoder models\n",
 442 |     "  - `seq2seq_model`\n",
 443 |     "- __(8)__ train and estimate loss and accuracy\n",
 444 |     "\n",
 445 |     "<img src=\"./training_phase.png\" style=\"width:400px;\"/>\n",
 446 |     "<div style=\"text-align:center;\">Fig 1. Neural Machine Translation / Training Phase</div>\n",
 447 |     "<br/>\n",
 448 |     "The figure above is borrowed from Thang Luong's thesis ['Neural Machine Translation'](https://github.com/lmthang/thesis/blob/master/thesis.pdf)"
 449 |    ]
 450 |   },
 451 |   {
 452 |    "cell_type": "markdown",
 453 |    "metadata": {},
 454 |    "source": [
 455 |     "### Input (1), (3)\n",
 456 |     "\n",
 457 |     "`enc_dec_model_inputs` function creates and returns parameters (TF placeholders) related to building model. \n",
 458 |     "- inputs placeholder will be fed with English sentence data, and its shape is `[None, None]`. The first `None` means the batch size, and the batch size is unknown since user can set it. The second `None` means the lengths of sentences. The maximum length of setence is different from batch to batch, so it cannot be set with the exact number. \n",
 459 |     "  - One option is to set the lengths of every sentences to the maximum length across all sentences in every batch. No matter which method you choose, you need to add special character, `<PAD>` in empty positions. However, with the latter option, there could be unnecessarily more `<PAD>` characters.\n",
 460 |     "  \n",
 461 |     "\n",
 462 |     "- targets placeholder is similar to inputs placeholder except that it will be fed with French sentence data.\n",
 463 |     "\n",
 464 |     "\n",
 465 |     "- target_sequence_length placeholder represents the lengths of each sentences, so the shape is `None`, a column tensor, which is the same number to the batch size. This particular value is required as an argument of TrainerHelper to build decoder model for training. We will see in (4).\n",
 466 |     "\n",
 467 |     "\n",
 468 |     "- max_target_len gets the maximum value out of lengths of all the target sentences(sequences). As you know, we have the lengths of all the sentences in target_sequence_length parameter. The way to get the maximum value from it is to use [tf.reduce_max](https://www.tensorflow.org/api_docs/python/tf/reduce_max). "
 469 |    ]
 470 |   },
 471 |   {
 472 |    "cell_type": "code",
 473 |    "execution_count": 20,
 474 |    "metadata": {},
 475 |    "outputs": [],
 476 |    "source": [
 477 |     "def enc_dec_model_inputs():\n",
 478 |     "    inputs = tf.placeholder(tf.int32, [None, None], name='input')\n",
 479 |     "    targets = tf.placeholder(tf.int32, [None, None], name='targets') \n",
 480 |     "    \n",
 481 |     "    target_sequence_length = tf.placeholder(tf.int32, [None], name='target_sequence_length')\n",
 482 |     "    max_target_len = tf.reduce_max(target_sequence_length)    \n",
 483 |     "    \n",
 484 |     "    return inputs, targets, target_sequence_length, max_target_len"
 485 |    ]
 486 |   },
 487 |   {
 488 |    "cell_type": "markdown",
 489 |    "metadata": {},
 490 |    "source": [
 491 |     "`hyperparam_inputs` function creates and returns parameters (TF placeholders) related to hyper-parameters to the model. \n",
 492 |     "- lr_rate is learning rate\n",
 493 |     "- keep_prob is the keep probability for Dropouts\n"
 494 |    ]
 495 |   },
 496 |   {
 497 |    "cell_type": "code",
 498 |    "execution_count": 21,
 499 |    "metadata": {},
 500 |    "outputs": [],
 501 |    "source": [
 502 |     "def hyperparam_inputs():\n",
 503 |     "    lr_rate = tf.placeholder(tf.float32, name='lr_rate')\n",
 504 |     "    keep_prob = tf.placeholder(tf.float32, name='keep_prob')\n",
 505 |     "    \n",
 506 |     "    return lr_rate, keep_prob"
 507 |    ]
 508 |   },
 509 |   {
 510 |    "cell_type": "markdown",
 511 |    "metadata": {},
 512 |    "source": [
 513 |     "### Process Decoder Input (3)\n",
 514 |     "<br/>\n",
 515 |     "<img src=\"./go_insert.png\" style=\"width:600px;\"/>\n",
 516 |     "<div style=\"text-align:center;\">Fig 2. `<GO>` insertion</div>\n",
 517 |     "<br/>\n",
 518 |     "\n",
 519 |     "On the decoder side, we need two different kinds of input for training and inference purposes repectively. While training phase, the input is provided as target label, but they still need to be embeded. On the inference phase, however, the output of each time step will be the input for the next time step. They also need to be embeded and embedding vector should be shared between two different phases.\n",
 520 |     "\n",
 521 |     "In this section, I am going to preprocess the target label data for the training phase. It is nothing special task. What all you need to do is add `<GO>` special token in front of all target data. `<GO>` token is a kind of guide token as saying like \"this is the start of the translation\". For this process, you need to know three libraries from TensorFlow.\n",
 522 |     "- [TF strided_slice](https://www.tensorflow.org/api_docs/python/tf/strided_slice)\n",
 523 |     "  - extracts a strided slice of a tensor (generalized python array indexing).\n",
 524 |     "  - can be thought as splitting into multiple tensors with the striding window size from begin to end\n",
 525 |     "  - arguments: TF Tensor, Begin, End, Strides\n",
 526 |     "- [TF fill](https://www.tensorflow.org/api_docs/python/tf/concat)\n",
 527 |     "  - creates a tensor filled with a scalar value.\n",
 528 |     "  - arguments: TF Tensor (must be int32/int64), value to fill\n",
 529 |     "- [TF concat](https://www.tensorflow.org/api_docs/python/tf/fill)\n",
 530 |     "  - concatenates tensors along one dimension.\n",
 531 |     "  - arguments: a list of TF Tensor (tf.fill and after_slice in this case), axis=1\n",
 532 |     "    \n",
 533 |     "After preprocessing the target label data, we will embed it later when implementing decoding_layer function."
 534 |    ]
 535 |   },
 536 |   {
 537 |    "cell_type": "code",
 538 |    "execution_count": 22,
 539 |    "metadata": {},
 540 |    "outputs": [],
 541 |    "source": [
 542 |     "def process_decoder_input(target_data, target_vocab_to_int, batch_size):\n",
 543 |     "    \"\"\"\n",
 544 |     "    Preprocess target data for encoding\n",
 545 |     "    :return: Preprocessed target data\n",
 546 |     "    \"\"\"\n",
 547 |     "    # get '<GO>' id\n",
 548 |     "    go_id = target_vocab_to_int['<GO>']\n",
 549 |     "    \n",
 550 |     "    after_slice = tf.strided_slice(target_data, [0, 0], [batch_size, -1], [1, 1])\n",
 551 |     "    after_concat = tf.concat( [tf.fill([batch_size, 1], go_id), after_slice], 1)\n",
 552 |     "    \n",
 553 |     "    return after_concat"
 554 |    ]
 555 |   },
 556 |   {
 557 |    "cell_type": "markdown",
 558 |    "metadata": {},
 559 |    "source": [
 560 |     "### Encoding (2)\n",
 561 |     "\n",
 562 |     "<img src=\"./encoding_model.png\" style=\"width:400px;\"/>\n",
 563 |     "<div style=\"text-align:center;\">Fig 3. Encoding model highlighted - Embedding/RNN layers</div>\n",
 564 |     "<br/>\n",
 565 |     "\n",
 566 |     "As depicted in Fig 3, the encoding model consists of two different parts. The first part is the embedding layer. Each word in a sentence will be represented with the number of features specified as `encoding_embedding_size`. This layer gives much richer representative power for the words [useful explanation](https://stackoverflow.com/questions/40784656/tf-contrib-layers-embed-sequence-is-for-what/44280918#44280918). The second part is the RNN layer(s). You can make use of any kind of RNN related techniques or algorithms. For example, in this project, multiple LSTM cells are stacked together after dropout technique is applied. You can use different kinds of RNN cells such as GRU.\n",
 567 |     "\n",
 568 |     "Embedding layer\n",
 569 |     "- [TF contrib.layers.embed_sequence](https://www.tensorflow.org/api_docs/python/tf/contrib/layers/embed_sequence)\n",
 570 |     "\n",
 571 |     "RNN layers\n",
 572 |     "- [TF contrib.rnn.LSTMCell](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/LSTMCell)\n",
 573 |     "  - simply specifies how many internal units it has\n",
 574 |     "- [TF contrib.rnn.DropoutWrapper](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/DropoutWrapper)\n",
 575 |     "  - wraps a cell with keep probability value \n",
 576 |     "- [TF contrib.rnn.MultiRNNCell](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/MultiRNNCell)\n",
 577 |     "  - stacks multiple RNN (type) cells\n",
 578 |     "  - [how this API is used in action?](https://github.com/tensorflow/tensorflow/blob/6947f65a374ebf29e74bb71e36fd82760056d82c/tensorflow/docs_src/tutorials/recurrent.md#stacking-multiple-lstms)\n",
 579 |     "  \n",
 580 |     "Encoding model\n",
 581 |     "- [TF nn.dynamic_rnn](https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn)\n",
 582 |     "  - put Embedding layer and RNN layer(s) all together"
 583 |    ]
 584 |   },
 585 |   {
 586 |    "cell_type": "code",
 587 |    "execution_count": 23,
 588 |    "metadata": {
 589 |     "scrolled": false
 590 |    },
 591 |    "outputs": [],
 592 |    "source": [
 593 |     "def encoding_layer(rnn_inputs, rnn_size, num_layers, keep_prob, \n",
 594 |     "                   source_vocab_size, \n",
 595 |     "                   encoding_embedding_size):\n",
 596 |     "    \"\"\"\n",
 597 |     "    :return: tuple (RNN output, RNN state)\n",
 598 |     "    \"\"\"\n",
 599 |     "    embed = tf.contrib.layers.embed_sequence(rnn_inputs, \n",
 600 |     "                                             vocab_size=source_vocab_size, \n",
 601 |     "                                             embed_dim=encoding_embedding_size)\n",
 602 |     "    \n",
 603 |     "    stacked_cells = tf.contrib.rnn.MultiRNNCell([tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.LSTMCell(rnn_size), keep_prob) for _ in range(num_layers)])\n",
 604 |     "    \n",
 605 |     "    outputs, state = tf.nn.dynamic_rnn(stacked_cells, \n",
 606 |     "                                       embed, \n",
 607 |     "                                       dtype=tf.float32)\n",
 608 |     "    return outputs, state"
 609 |    ]
 610 |   },
 611 |   {
 612 |    "cell_type": "markdown",
 613 |    "metadata": {},
 614 |    "source": [
 615 |     "### Decoding - Training process (4)\n",
 616 |     "\n",
 617 |     "Decoding model can be thought of two separate processes, training and inference. It is not they have different architecture, but they share the same architecture and its parameters. It is that they have different strategy to feed the shared model.\n",
 618 |     "\n",
 619 |     "For this(training) and the next(inference) section, Fig 4 shows clearly shows what they are.\n",
 620 |     "\n",
 621 |     "<img src=\"./decoder_shift.png\" style=\"width:700px;\"/>\n",
 622 |     "<div style=\"text-align:center;\">Fig 4. Decoder shifted inputs</div>\n",
 623 |     "<br/>\n",
 624 |     "\n",
 625 |     "While encoder uses [TF contrib.layers.embed_sequence](https://www.tensorflow.org/api_docs/python/tf/contrib/layers/embed_sequence), it is not applicable to decoder even though it may require its input embeded. That is because the same embedding vector should be shared via training and inferece phases. [TF contrib.layers.embed_sequence](https://www.tensorflow.org/api_docs/python/tf/contrib/layers/embed_sequence) can only embed the prepared dataset before running. What needed for inference process is dynamic embedding capability. It is impossible to embed the output from the inference process before running the model because the output of the current time step will be the input of the next time step.\n",
 626 |     "\n",
 627 |     "How we can embed? We will see soon. However, for now, what you need to remember is training and inference processes share the same embedding parameters. For the training part, embeded input should be delivered. On the inference part, only embedding parameters used in the training part should be delivered.\n",
 628 |     "\n",
 629 |     "Let's see the training part first. \n",
 630 |     "- [`tf.contrib.seq2seq.TrainingHelper`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/TrainingHelper)\n",
 631 |     "  - TrainingHelper is where we pass the embeded input. As the name indicates, this is only a helper instance. This instance should be delivered to the BasicDecoder, which is the actual process of building the decoder model.\n",
 632 |     "- [`tf.contrib.seq2seq.BasicDecoder`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/BasicDecoder)\n",
 633 |     "  - BasicDecoder builds the decoder model. It means it connects the RNN layer(s) on the decoder side and the input prepared by TrainingHelper.\n",
 634 |     "- [`tf.contrib.seq2seq.dynamic_decode`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/dynamic_decode)\n",
 635 |     "  - dynamic_decode unrolls the decoder model so that actual prediction can be retrieved by BasicDecoder for each time steps."
 636 |    ]
 637 |   },
 638 |   {
 639 |    "cell_type": "code",
 640 |    "execution_count": 24,
 641 |    "metadata": {},
 642 |    "outputs": [],
 643 |    "source": [
 644 |     "def decoding_layer_train(encoder_state, dec_cell, dec_embed_input, \n",
 645 |     "                         target_sequence_length, max_summary_length, \n",
 646 |     "                         output_layer, keep_prob):\n",
 647 |     "    \"\"\"\n",
 648 |     "    Create a training process in decoding layer \n",
 649 |     "    :return: BasicDecoderOutput containing training logits and sample_id\n",
 650 |     "    \"\"\"\n",
 651 |     "    dec_cell = tf.contrib.rnn.DropoutWrapper(dec_cell, \n",
 652 |     "                                             output_keep_prob=keep_prob)\n",
 653 |     "    \n",
 654 |     "    # for only input layer\n",
 655 |     "    helper = tf.contrib.seq2seq.TrainingHelper(dec_embed_input, \n",
 656 |     "                                               target_sequence_length)\n",
 657 |     "    \n",
 658 |     "    decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell, \n",
 659 |     "                                              helper, \n",
 660 |     "                                              encoder_state, \n",
 661 |     "                                              output_layer)\n",
 662 |     "\n",
 663 |     "    # unrolling the decoder layer\n",
 664 |     "    outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder, \n",
 665 |     "                                                      impute_finished=True, \n",
 666 |     "                                                      maximum_iterations=max_summary_length)\n",
 667 |     "    return outputs"
 668 |    ]
 669 |   },
 670 |   {
 671 |    "cell_type": "markdown",
 672 |    "metadata": {},
 673 |    "source": [
 674 |     "### Decoding - Inference process (5)\n",
 675 |     "\n",
 676 |     "- [`tf.contrib.seq2seq.GreedyEmbeddingHelper`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/GreedyEmbeddingHelper)\n",
 677 |     "  - GreedyEmbeddingHelper dynamically takes the output of the current step and give it to the next time step's input. In order to embed the each input result dynamically, embedding parameter(just bunch of weight values) should be provided. Along with it, GreedyEmbeddingHelper asks to give the `start_of_sequence_id` for the same amount as the batch size and `end_of_sequence_id`.\n",
 678 |     "- [`tf.contrib.seq2seq.BasicDecoder`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/BasicDecoder)\n",
 679 |     "  - same as described in the training process section\n",
 680 |     "- [`tf.contrib.seq2seq.dynamic_decode`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/dynamic_decode)\n",
 681 |     "  - same as described in the training process section"
 682 |    ]
 683 |   },
 684 |   {
 685 |    "cell_type": "code",
 686 |    "execution_count": 25,
 687 |    "metadata": {
 688 |     "scrolled": true
 689 |    },
 690 |    "outputs": [],
 691 |    "source": [
 692 |     "def decoding_layer_infer(encoder_state, dec_cell, dec_embeddings, start_of_sequence_id,\n",
 693 |     "                         end_of_sequence_id, max_target_sequence_length,\n",
 694 |     "                         vocab_size, output_layer, batch_size, keep_prob):\n",
 695 |     "    \"\"\"\n",
 696 |     "    Create a inference process in decoding layer \n",
 697 |     "    :return: BasicDecoderOutput containing inference logits and sample_id\n",
 698 |     "    \"\"\"\n",
 699 |     "    dec_cell = tf.contrib.rnn.DropoutWrapper(dec_cell, \n",
 700 |     "                                             output_keep_prob=keep_prob)\n",
 701 |     "    \n",
 702 |     "    helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(dec_embeddings, \n",
 703 |     "                                                      tf.fill([batch_size], start_of_sequence_id), \n",
 704 |     "                                                      end_of_sequence_id)\n",
 705 |     "    \n",
 706 |     "    decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell, \n",
 707 |     "                                              helper, \n",
 708 |     "                                              encoder_state, \n",
 709 |     "                                              output_layer)\n",
 710 |     "    \n",
 711 |     "    outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder, \n",
 712 |     "                                                      impute_finished=True, \n",
 713 |     "                                                      maximum_iterations=max_target_sequence_length)\n",
 714 |     "    return outputs"
 715 |    ]
 716 |   },
 717 |   {
 718 |    "cell_type": "markdown",
 719 |    "metadata": {},
 720 |    "source": [
 721 |     "### Build the Decoding Layer (3), (6)\n",
 722 |     "\n",
 723 |     "__Embed the target sequences__\n",
 724 |     "\n",
 725 |     "- [TF contrib.layers.embed_sequence](https://www.tensorflow.org/api_docs/python/tf/contrib/layers/embed_sequence) creates internal representation of embedding parameter, so we cannot look into or retrieve it. Rather, you need to create a embedding parameter manually by [TF Variable](https://www.tensorflow.org/api_docs/python/tf/Variable). \n",
 726 |     "\n",
 727 |     "- Manually created embedding parameter is used for training phase to convert provided target data(sequence of sentence) by [TF nn.embedding_lookup](https://www.tensorflow.org/api_docs/python/tf/nn/embedding_lookup) before the training is run. [TF nn.embedding_lookup](https://www.tensorflow.org/api_docs/python/tf/nn/embedding_lookup) with manually created embedding parameters returns the similar result to the [TF contrib.layers.embed_sequence](https://www.tensorflow.org/api_docs/python/tf/contrib/layers/embed_sequence). For the inference process, whenever the output of the current time step is calculated via decoder, it will be embeded by the shared embedding parameter and become the input for the next time step. You only need to provide the embedding parameter to the GreedyEmbeddingHelper, then it will help the process.\n",
 728 |     "\n",
 729 |     "- [How embedding_lookup works?](https://stackoverflow.com/questions/34870614/what-does-tf-nn-embedding-lookup-function-do)\n",
 730 |     "  - In short, it selects specified rows\n",
 731 |     "  \n",
 732 |     "- Note: Please be careful about setting the variable scope. As mentioned previously, parameters/variables are shared between training and inference processes. Sharing can be specified via [tf.variable_scope](https://www.tensorflow.org/api_docs/python/tf/variable_scope).\n",
 733 |     "\n",
 734 |     "__Construct the decoder RNN layer(s)__\n",
 735 |     "- As depicted in Fig 3 and Fig 4, the number of RNN layer in the decoder model has to be equal to the number of RNN layer(s) in the encoder model.\n",
 736 |     "\n",
 737 |     "__Create an output layer to map the outputs of the decoder to the elements of our vocabulary__\n",
 738 |     "- This is just a fully connected layer to get probabilities of occurance of each words at the end."
 739 |    ]
 740 |   },
 741 |   {
 742 |    "cell_type": "code",
 743 |    "execution_count": 26,
 744 |    "metadata": {},
 745 |    "outputs": [],
 746 |    "source": [
 747 |     "def decoding_layer(dec_input, encoder_state,\n",
 748 |     "                   target_sequence_length, max_target_sequence_length,\n",
 749 |     "                   rnn_size,\n",
 750 |     "                   num_layers, target_vocab_to_int, target_vocab_size,\n",
 751 |     "                   batch_size, keep_prob, decoding_embedding_size):\n",
 752 |     "    \"\"\"\n",
 753 |     "    Create decoding layer\n",
 754 |     "    :return: Tuple of (Training BasicDecoderOutput, Inference BasicDecoderOutput)\n",
 755 |     "    \"\"\"\n",
 756 |     "    target_vocab_size = len(target_vocab_to_int)\n",
 757 |     "    dec_embeddings = tf.Variable(tf.random_uniform([target_vocab_size, decoding_embedding_size]))\n",
 758 |     "    dec_embed_input = tf.nn.embedding_lookup(dec_embeddings, dec_input)\n",
 759 |     "    \n",
 760 |     "    cells = tf.contrib.rnn.MultiRNNCell([tf.contrib.rnn.LSTMCell(rnn_size) for _ in range(num_layers)])\n",
 761 |     "    \n",
 762 |     "    with tf.variable_scope(\"decode\"):\n",
 763 |     "        output_layer = tf.layers.Dense(target_vocab_size)\n",
 764 |     "        train_output = decoding_layer_train(encoder_state, \n",
 765 |     "                                            cells, \n",
 766 |     "                                            dec_embed_input, \n",
 767 |     "                                            target_sequence_length, \n",
 768 |     "                                            max_target_sequence_length, \n",
 769 |     "                                            output_layer, \n",
 770 |     "                                            keep_prob)\n",
 771 |     "\n",
 772 |     "    with tf.variable_scope(\"decode\", reuse=True):\n",
 773 |     "        infer_output = decoding_layer_infer(encoder_state, \n",
 774 |     "                                            cells, \n",
 775 |     "                                            dec_embeddings, \n",
 776 |     "                                            target_vocab_to_int['<GO>'], \n",
 777 |     "                                            target_vocab_to_int['<EOS>'], \n",
 778 |     "                                            max_target_sequence_length, \n",
 779 |     "                                            target_vocab_size, \n",
 780 |     "                                            output_layer,\n",
 781 |     "                                            batch_size,\n",
 782 |     "                                            keep_prob)\n",
 783 |     "\n",
 784 |     "    return (train_output, infer_output)"
 785 |    ]
 786 |   },
 787 |   {
 788 |    "cell_type": "markdown",
 789 |    "metadata": {},
 790 |    "source": [
 791 |     "### Build the Seq2Seq model (7)\n",
 792 |     "\n",
 793 |     "In this section, previously defined functions, `encoding_layer`, `process_decoder_input`, and `decoding_layer` are put together to build the big picture, Sequence to Sequence model. "
 794 |    ]
 795 |   },
 796 |   {
 797 |    "cell_type": "code",
 798 |    "execution_count": 27,
 799 |    "metadata": {},
 800 |    "outputs": [],
 801 |    "source": [
 802 |     "def seq2seq_model(input_data, target_data, keep_prob, batch_size,\n",
 803 |     "                  target_sequence_length,\n",
 804 |     "                  max_target_sentence_length,\n",
 805 |     "                  source_vocab_size, target_vocab_size,\n",
 806 |     "                  enc_embedding_size, dec_embedding_size,\n",
 807 |     "                  rnn_size, num_layers, target_vocab_to_int):\n",
 808 |     "    \"\"\"\n",
 809 |     "    Build the Sequence-to-Sequence model\n",
 810 |     "    :return: Tuple of (Training BasicDecoderOutput, Inference BasicDecoderOutput)\n",
 811 |     "    \"\"\"\n",
 812 |     "    enc_outputs, enc_states = encoding_layer(input_data, \n",
 813 |     "                                             rnn_size, \n",
 814 |     "                                             num_layers, \n",
 815 |     "                                             keep_prob, \n",
 816 |     "                                             source_vocab_size, \n",
 817 |     "                                             enc_embedding_size)\n",
 818 |     "    \n",
 819 |     "    dec_input = process_decoder_input(target_data, \n",
 820 |     "                                      target_vocab_to_int, \n",
 821 |     "                                      batch_size)\n",
 822 |     "    \n",
 823 |     "    train_output, infer_output = decoding_layer(dec_input,\n",
 824 |     "                                               enc_states, \n",
 825 |     "                                               target_sequence_length, \n",
 826 |     "                                               max_target_sentence_length,\n",
 827 |     "                                               rnn_size,\n",
 828 |     "                                              num_layers,\n",
 829 |     "                                              target_vocab_to_int,\n",
 830 |     "                                              target_vocab_size,\n",
 831 |     "                                              batch_size,\n",
 832 |     "                                              keep_prob,\n",
 833 |     "                                              dec_embedding_size)\n",
 834 |     "    \n",
 835 |     "    return train_output, infer_output"
 836 |    ]
 837 |   },
 838 |   {
 839 |    "cell_type": "markdown",
 840 |    "metadata": {},
 841 |    "source": [
 842 |     "## Neural Network Training\n",
 843 |     "### Hyperparameters"
 844 |    ]
 845 |   },
 846 |   {
 847 |    "cell_type": "code",
 848 |    "execution_count": 28,
 849 |    "metadata": {},
 850 |    "outputs": [],
 851 |    "source": [
 852 |     "display_step = 300\n",
 853 |     "\n",
 854 |     "epochs = 13\n",
 855 |     "batch_size = 128\n",
 856 |     "\n",
 857 |     "rnn_size = 128\n",
 858 |     "num_layers = 3\n",
 859 |     "\n",
 860 |     "encoding_embedding_size = 200\n",
 861 |     "decoding_embedding_size = 200\n",
 862 |     "\n",
 863 |     "learning_rate = 0.001\n",
 864 |     "keep_probability = 0.5"
 865 |    ]
 866 |   },
 867 |   {
 868 |    "cell_type": "markdown",
 869 |    "metadata": {},
 870 |    "source": [
 871 |     "### Build the Graph\n",
 872 |     "`seq2seq_model` function creates the model. It defines how the feedforward and backpropagation should flow. The last step for this model to be trainable is deciding and applying what optimization algorithms to use. In this section, [TF contrib.seq2seq.sequence_loss](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/sequence_loss) is used to calculate the loss, then [TF train.AdamOptimizer](https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer) is applied to calculate the gradient descent on the loss. Let's go over eatch steps in the code cell below.\n",
 873 |     "\n",
 874 |     "__load data from the checkpoint__\n",
 875 |     "- (source_int_text, target_int_text) are the input data, and (source_vocab_to_int, target_vocab_to_int) is the dictionary to lookup the index number of each words.\n",
 876 |     "- max_target_sentence_length is the length of the longest sentence from the source input data. This will be used for GreedyEmbeddingHelper when building inference process in the decoder mode.\n",
 877 |     "\n",
 878 |     "__create inputs__\n",
 879 |     "- inputs (input_data, targets, target_sequence_length, max_target_sequence_length) from enc_dec_model_inputs function\n",
 880 |     "- inputs (lr, keep_prob) from hyperparam_inputs function\n",
 881 |     "\n",
 882 |     "__build seq2seq model__\n",
 883 |     "- build the model by seq2seq_model function. It will return train_logits(logits to calculate the loss) and inference_logits(logits from prediction).\n",
 884 |     "\n",
 885 |     "__cost function__\n",
 886 |     "- [TF contrib.seq2seq.sequence_loss](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/sequence_loss) is used. This loss function is just a weighted softmax cross entropy loss function, but it is particularly designed to be applied in time series model (RNN). Weights should be explicitly provided as an argument, and it can be created by [TF sequence_mask](https://www.tensorflow.org/api_docs/python/tf/sequence_mask). In this project, [TF sequence_mask](https://www.tensorflow.org/api_docs/python/tf/sequence_mask) creates \\[batch_size, max_target_sequence_length\\] size of variable, then maks only the first target_sequence_length number of elements to 1. It means <PAD\\> parts will have less weight than others.\n",
 887 |     "\n",
 888 |     "__Optimizer__\n",
 889 |     "- [TF train.AdamOptimizer](https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer) is used, and this is where the learning rate should be specified. You can choose other algorithms as well, this is just a choice.\n",
 890 |     "\n",
 891 |     "__Gradient Clipping__\n",
 892 |     "- Since recurrent neural networks is notorious about vanishing/exploding gradient, gradient clipping technique is believed to improve the issues. \n",
 893 |     "- The concept is really easy. You decide thresholds to keep the gradient to be in a certain boundary. In this project, the range of the threshold is between -1 and 1.\n",
 894 |     "- Now, you need to apply this conceptual knowledge to the TensorFlow code. Luckily, there is the official guide for this [TF Gradient Clipping How?](https://www.tensorflow.org/api_guides/python/train#Gradient_Clipping). In breif, you get the gradient values from the optimizer manually by calling [compute_gradients](https://www.tensorflow.org/api_docs/python/tf/train/Optimizer#compute_gradients), then manipulate the gradient values with [clip_by_value](https://www.tensorflow.org/api_docs/python/tf/clip_by_value). Lastly, you need to put back the modified gradients into the optimizer by calling [apply_gradients](https://www.tensorflow.org/api_docs/python/tf/train/Optimizer#apply_gradients)\n",
 895 |     "\n",
 896 |     "<img src=\"./gradient_clipping.png\" style=\"width:700px;\"/>\n",
 897 |     "<div style=\"text-align:center;\">Fig 4. Gradient Clipping</div>\n",
 898 |     "<br/>"
 899 |    ]
 900 |   },
 901 |   {
 902 |    "cell_type": "code",
 903 |    "execution_count": 29,
 904 |    "metadata": {},
 905 |    "outputs": [
 906 |     {
 907 |      "name": "stdout",
 908 |      "output_type": "stream",
 909 |      "text": [
 910 |       "WARNING:tensorflow:From /anaconda/envs/test/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.\n",
 911 |       "Instructions for updating:\n",
 912 |       "Use the retry module or similar alternatives.\n"
 913 |      ]
 914 |     }
 915 |    ],
 916 |    "source": [
 917 |     "save_path = 'checkpoints/dev'\n",
 918 |     "(source_int_text, target_int_text), (source_vocab_to_int, target_vocab_to_int), _ = load_preprocess()\n",
 919 |     "max_target_sentence_length = max([len(sentence) for sentence in source_int_text])\n",
 920 |     "\n",
 921 |     "train_graph = tf.Graph()\n",
 922 |     "with train_graph.as_default():\n",
 923 |     "    input_data, targets, target_sequence_length, max_target_sequence_length = enc_dec_model_inputs()\n",
 924 |     "    lr, keep_prob = hyperparam_inputs()\n",
 925 |     "    \n",
 926 |     "    train_logits, inference_logits = seq2seq_model(tf.reverse(input_data, [-1]),\n",
 927 |     "                                                   targets,\n",
 928 |     "                                                   keep_prob,\n",
 929 |     "                                                   batch_size,\n",
 930 |     "                                                   target_sequence_length,\n",
 931 |     "                                                   max_target_sequence_length,\n",
 932 |     "                                                   len(source_vocab_to_int),\n",
 933 |     "                                                   len(target_vocab_to_int),\n",
 934 |     "                                                   encoding_embedding_size,\n",
 935 |     "                                                   decoding_embedding_size,\n",
 936 |     "                                                   rnn_size,\n",
 937 |     "                                                   num_layers,\n",
 938 |     "                                                   target_vocab_to_int)\n",
 939 |     "    \n",
 940 |     "    training_logits = tf.identity(train_logits.rnn_output, name='logits')\n",
 941 |     "    inference_logits = tf.identity(inference_logits.sample_id, name='predictions')\n",
 942 |     "\n",
 943 |     "    # https://www.tensorflow.org/api_docs/python/tf/sequence_mask\n",
 944 |     "    # - Returns a mask tensor representing the first N positions of each cell.\n",
 945 |     "    masks = tf.sequence_mask(target_sequence_length, max_target_sequence_length, dtype=tf.float32, name='masks')\n",
 946 |     "\n",
 947 |     "    with tf.name_scope(\"optimization\"):\n",
 948 |     "        # Loss function - weighted softmax cross entropy\n",
 949 |     "        cost = tf.contrib.seq2seq.sequence_loss(\n",
 950 |     "            training_logits,\n",
 951 |     "            targets,\n",
 952 |     "            masks)\n",
 953 |     "\n",
 954 |     "        # Optimizer\n",
 955 |     "        optimizer = tf.train.AdamOptimizer(lr)\n",
 956 |     "\n",
 957 |     "        # Gradient Clipping\n",
 958 |     "        gradients = optimizer.compute_gradients(cost)\n",
 959 |     "        capped_gradients = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gradients if grad is not None]\n",
 960 |     "        train_op = optimizer.apply_gradients(capped_gradients)"
 961 |    ]
 962 |   },
 963 |   {
 964 |    "cell_type": "markdown",
 965 |    "metadata": {},
 966 |    "source": [
 967 |     "### Get Batches and Pad the source and target sequences\n",
 968 |     "<br/>\n",
 969 |     "<img src=\"./pad_insert.png\" style=\"width:300px;\"/>\n",
 970 |     "<div style=\"text-align:center;\">Fig 5. Padding character in empty space of sentences shorter than the longest one in a batch</div>\n",
 971 |     "<br/>"
 972 |    ]
 973 |   },
 974 |   {
 975 |    "cell_type": "code",
 976 |    "execution_count": 30,
 977 |    "metadata": {},
 978 |    "outputs": [],
 979 |    "source": [
 980 |     "def pad_sentence_batch(sentence_batch, pad_int):\n",
 981 |     "    \"\"\"Pad sentences with <PAD> so that each sentence of a batch has the same length\"\"\"\n",
 982 |     "    max_sentence = max([len(sentence) for sentence in sentence_batch])\n",
 983 |     "    return [sentence + [pad_int] * (max_sentence - len(sentence)) for sentence in sentence_batch]\n",
 984 |     "\n",
 985 |     "\n",
 986 |     "def get_batches(sources, targets, batch_size, source_pad_int, target_pad_int):\n",
 987 |     "    \"\"\"Batch targets, sources, and the lengths of their sentences together\"\"\"\n",
 988 |     "    for batch_i in range(0, len(sources)//batch_size):\n",
 989 |     "        start_i = batch_i * batch_size\n",
 990 |     "\n",
 991 |     "        # Slice the right amount for the batch\n",
 992 |     "        sources_batch = sources[start_i:start_i + batch_size]\n",
 993 |     "        targets_batch = targets[start_i:start_i + batch_size]\n",
 994 |     "\n",
 995 |     "        # Pad\n",
 996 |     "        pad_sources_batch = np.array(pad_sentence_batch(sources_batch, source_pad_int))\n",
 997 |     "        pad_targets_batch = np.array(pad_sentence_batch(targets_batch, target_pad_int))\n",
 998 |     "\n",
 999 |     "        # Need the lengths for the _lengths parameters\n",
1000 |     "        pad_targets_lengths = []\n",
1001 |     "        for target in pad_targets_batch:\n",
1002 |     "            pad_targets_lengths.append(len(target))\n",
1003 |     "\n",
1004 |     "        pad_source_lengths = []\n",
1005 |     "        for source in pad_sources_batch:\n",
1006 |     "            pad_source_lengths.append(len(source))\n",
1007 |     "\n",
1008 |     "        yield pad_sources_batch, pad_targets_batch, pad_source_lengths, pad_targets_lengths"
1009 |    ]
1010 |   },
1011 |   {
1012 |    "cell_type": "markdown",
1013 |    "metadata": {},
1014 |    "source": [
1015 |     "### Train\n",
1016 |     "\n",
1017 |     "`get_accuracy`\n",
1018 |     "- compare the lengths of target(label) and logits(prediction)\n",
1019 |     "- add(pad) 0s at the end of the ones having the shorter length\n",
1020 |     "  - `[(0,0),(0,max_seq - target.shape[1])]` indicates the 2D array. The first (0,0) means no padding for the first dimension. The second (0, ...) means there is no pads in front of the second dimension but pads at the end. And pad as many times as ... .\n",
1021 |     "- above process is to makes two entities to have the same shape (length)\n",
1022 |     "- finally, returns the average of where the target and logits have the same value (1)\n",
1023 |     "\n",
1024 |     "[numpy pad function](https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.pad.html)"
1025 |    ]
1026 |   },
1027 |   {
1028 |    "cell_type": "code",
1029 |    "execution_count": 31,
1030 |    "metadata": {
1031 |     "scrolled": true
1032 |    },
1033 |    "outputs": [
1034 |     {
1035 |      "name": "stdout",
1036 |      "output_type": "stream",
1037 |      "text": [
1038 |       "Epoch   0 Batch  300/1077 - Train Accuracy: 0.4293, Validation Accuracy: 0.5064, Loss: 1.9365\n",
1039 |       "Epoch   0 Batch  600/1077 - Train Accuracy: 0.4974, Validation Accuracy: 0.5241, Loss: 1.1117\n",
1040 |       "Epoch   0 Batch  900/1077 - Train Accuracy: 0.5410, Validation Accuracy: 0.5614, Loss: 0.9078\n",
1041 |       "Epoch   1 Batch  300/1077 - Train Accuracy: 0.5958, Validation Accuracy: 0.6044, Loss: 0.6807\n",
1042 |       "Epoch   1 Batch  600/1077 - Train Accuracy: 0.6786, Validation Accuracy: 0.6346, Loss: 0.5282\n",
1043 |       "Epoch   1 Batch  900/1077 - Train Accuracy: 0.7066, Validation Accuracy: 0.6779, Loss: 0.5059\n",
1044 |       "Epoch   2 Batch  300/1077 - Train Accuracy: 0.7512, Validation Accuracy: 0.7383, Loss: 0.4164\n",
1045 |       "Epoch   2 Batch  600/1077 - Train Accuracy: 0.7894, Validation Accuracy: 0.7784, Loss: 0.3555\n",
1046 |       "Epoch   2 Batch  900/1077 - Train Accuracy: 0.8164, Validation Accuracy: 0.7869, Loss: 0.3256\n",
1047 |       "Epoch   3 Batch  300/1077 - Train Accuracy: 0.8651, Validation Accuracy: 0.8224, Loss: 0.2444\n",
1048 |       "Epoch   3 Batch  600/1077 - Train Accuracy: 0.8519, Validation Accuracy: 0.8171, Loss: 0.2170\n",
1049 |       "Epoch   3 Batch  900/1077 - Train Accuracy: 0.8676, Validation Accuracy: 0.8509, Loss: 0.2241\n",
1050 |       "Epoch   4 Batch  300/1077 - Train Accuracy: 0.9219, Validation Accuracy: 0.8750, Loss: 0.1508\n",
1051 |       "Epoch   4 Batch  600/1077 - Train Accuracy: 0.9022, Validation Accuracy: 0.8984, Loss: 0.1493\n",
1052 |       "Epoch   4 Batch  900/1077 - Train Accuracy: 0.9105, Validation Accuracy: 0.8803, Loss: 0.1448\n",
1053 |       "Epoch   5 Batch  300/1077 - Train Accuracy: 0.9359, Validation Accuracy: 0.9091, Loss: 0.1008\n",
1054 |       "Epoch   5 Batch  600/1077 - Train Accuracy: 0.9364, Validation Accuracy: 0.9130, Loss: 0.1137\n",
1055 |       "Epoch   5 Batch  900/1077 - Train Accuracy: 0.9449, Validation Accuracy: 0.9205, Loss: 0.1105\n",
1056 |       "Epoch   6 Batch  300/1077 - Train Accuracy: 0.9548, Validation Accuracy: 0.9375, Loss: 0.0767\n",
1057 |       "Epoch   6 Batch  600/1077 - Train Accuracy: 0.9472, Validation Accuracy: 0.9347, Loss: 0.0794\n",
1058 |       "Epoch   6 Batch  900/1077 - Train Accuracy: 0.9441, Validation Accuracy: 0.9446, Loss: 0.0855\n",
1059 |       "Epoch   7 Batch  300/1077 - Train Accuracy: 0.9692, Validation Accuracy: 0.9364, Loss: 0.0530\n",
1060 |       "Epoch   7 Batch  600/1077 - Train Accuracy: 0.9483, Validation Accuracy: 0.9322, Loss: 0.0719\n",
1061 |       "Epoch   7 Batch  900/1077 - Train Accuracy: 0.9535, Validation Accuracy: 0.9513, Loss: 0.0616\n",
1062 |       "Epoch   8 Batch  300/1077 - Train Accuracy: 0.9667, Validation Accuracy: 0.9542, Loss: 0.0461\n",
1063 |       "Epoch   8 Batch  600/1077 - Train Accuracy: 0.9528, Validation Accuracy: 0.9506, Loss: 0.0595\n",
1064 |       "Epoch   8 Batch  900/1077 - Train Accuracy: 0.9656, Validation Accuracy: 0.9489, Loss: 0.0506\n",
1065 |       "Epoch   9 Batch  300/1077 - Train Accuracy: 0.9720, Validation Accuracy: 0.9563, Loss: 0.0383\n",
1066 |       "Epoch   9 Batch  600/1077 - Train Accuracy: 0.9624, Validation Accuracy: 0.9645, Loss: 0.0488\n",
1067 |       "Epoch   9 Batch  900/1077 - Train Accuracy: 0.9727, Validation Accuracy: 0.9638, Loss: 0.0511\n",
1068 |       "Epoch  10 Batch  300/1077 - Train Accuracy: 0.9753, Validation Accuracy: 0.9631, Loss: 0.0343\n",
1069 |       "Epoch  10 Batch  600/1077 - Train Accuracy: 0.9494, Validation Accuracy: 0.9748, Loss: 0.0482\n",
1070 |       "Epoch  10 Batch  900/1077 - Train Accuracy: 0.9746, Validation Accuracy: 0.9688, Loss: 0.0416\n",
1071 |       "Epoch  11 Batch  300/1077 - Train Accuracy: 0.9778, Validation Accuracy: 0.9737, Loss: 0.0258\n",
1072 |       "Epoch  11 Batch  600/1077 - Train Accuracy: 0.9513, Validation Accuracy: 0.9680, Loss: 0.0353\n",
1073 |       "Epoch  11 Batch  900/1077 - Train Accuracy: 0.9602, Validation Accuracy: 0.9712, Loss: 0.0419\n",
1074 |       "Epoch  12 Batch  300/1077 - Train Accuracy: 0.9774, Validation Accuracy: 0.9656, Loss: 0.0252\n",
1075 |       "Epoch  12 Batch  600/1077 - Train Accuracy: 0.9710, Validation Accuracy: 0.9698, Loss: 0.0413\n",
1076 |       "Epoch  12 Batch  900/1077 - Train Accuracy: 0.9770, Validation Accuracy: 0.9684, Loss: 0.0352\n",
1077 |       "Model Trained and Saved\n"
1078 |      ]
1079 |     }
1080 |    ],
1081 |    "source": [
1082 |     "def get_accuracy(target, logits):\n",
1083 |     "    \"\"\"\n",
1084 |     "    Calculate accuracy\n",
1085 |     "    \"\"\"\n",
1086 |     "    max_seq = max(target.shape[1], logits.shape[1])\n",
1087 |     "    if max_seq - target.shape[1]:\n",
1088 |     "        target = np.pad(\n",
1089 |     "            target,\n",
1090 |     "            [(0,0),(0,max_seq - target.shape[1])],\n",
1091 |     "            'constant')\n",
1092 |     "    if max_seq - logits.shape[1]:\n",
1093 |     "        logits = np.pad(\n",
1094 |     "            logits,\n",
1095 |     "            [(0,0),(0,max_seq - logits.shape[1])],\n",
1096 |     "            'constant')\n",
1097 |     "\n",
1098 |     "    return np.mean(np.equal(target, logits))\n",
1099 |     "\n",
1100 |     "# Split data to training and validation sets\n",
1101 |     "train_source = source_int_text[batch_size:]\n",
1102 |     "train_target = target_int_text[batch_size:]\n",
1103 |     "valid_source = source_int_text[:batch_size]\n",
1104 |     "valid_target = target_int_text[:batch_size]\n",
1105 |     "(valid_sources_batch, valid_targets_batch, valid_sources_lengths, valid_targets_lengths ) = next(get_batches(valid_source,\n",
1106 |     "                                                                                                             valid_target,\n",
1107 |     "                                                                                                             batch_size,\n",
1108 |     "                                                                                                             source_vocab_to_int['<PAD>'],\n",
1109 |     "                                                                                                             target_vocab_to_int['<PAD>']))                                                                                                  \n",
1110 |     "with tf.Session(graph=train_graph) as sess:\n",
1111 |     "    sess.run(tf.global_variables_initializer())\n",
1112 |     "\n",
1113 |     "    for epoch_i in range(epochs):\n",
1114 |     "        for batch_i, (source_batch, target_batch, sources_lengths, targets_lengths) in enumerate(\n",
1115 |     "                get_batches(train_source, train_target, batch_size,\n",
1116 |     "                            source_vocab_to_int['<PAD>'],\n",
1117 |     "                            target_vocab_to_int['<PAD>'])):\n",
1118 |     "\n",
1119 |     "            _, loss = sess.run(\n",
1120 |     "                [train_op, cost],\n",
1121 |     "                {input_data: source_batch,\n",
1122 |     "                 targets: target_batch,\n",
1123 |     "                 lr: learning_rate,\n",
1124 |     "                 target_sequence_length: targets_lengths,\n",
1125 |     "                 keep_prob: keep_probability})\n",
1126 |     "\n",
1127 |     "\n",
1128 |     "            if batch_i % display_step == 0 and batch_i > 0:\n",
1129 |     "                batch_train_logits = sess.run(\n",
1130 |     "                    inference_logits,\n",
1131 |     "                    {input_data: source_batch,\n",
1132 |     "                     target_sequence_length: targets_lengths,\n",
1133 |     "                     keep_prob: 1.0})\n",
1134 |     "\n",
1135 |     "                batch_valid_logits = sess.run(\n",
1136 |     "                    inference_logits,\n",
1137 |     "                    {input_data: valid_sources_batch,\n",
1138 |     "                     target_sequence_length: valid_targets_lengths,\n",
1139 |     "                     keep_prob: 1.0})\n",
1140 |     "\n",
1141 |     "                train_acc = get_accuracy(target_batch, batch_train_logits)\n",
1142 |     "                valid_acc = get_accuracy(valid_targets_batch, batch_valid_logits)\n",
1143 |     "\n",
1144 |     "                print('Epoch {:>3} Batch {:>4}/{} - Train Accuracy: {:>6.4f}, Validation Accuracy: {:>6.4f}, Loss: {:>6.4f}'\n",
1145 |     "                      .format(epoch_i, batch_i, len(source_int_text) // batch_size, train_acc, valid_acc, loss))\n",
1146 |     "\n",
1147 |     "    # Save Model\n",
1148 |     "    saver = tf.train.Saver()\n",
1149 |     "    saver.save(sess, save_path)\n",
1150 |     "    print('Model Trained and Saved')"
1151 |    ]
1152 |   },
1153 |   {
1154 |    "cell_type": "markdown",
1155 |    "metadata": {},
1156 |    "source": [
1157 |     "### Save Parameters\n",
1158 |     "Save the `batch_size` and `save_path` parameters for inference."
1159 |    ]
1160 |   },
1161 |   {
1162 |    "cell_type": "code",
1163 |    "execution_count": 32,
1164 |    "metadata": {},
1165 |    "outputs": [],
1166 |    "source": [
1167 |     "def save_params(params):\n",
1168 |     "    with open('params.p', 'wb') as out_file:\n",
1169 |     "        pickle.dump(params, out_file)\n",
1170 |     "\n",
1171 |     "\n",
1172 |     "def load_params():\n",
1173 |     "    with open('params.p', mode='rb') as in_file:\n",
1174 |     "        return pickle.load(in_file)"
1175 |    ]
1176 |   },
1177 |   {
1178 |    "cell_type": "code",
1179 |    "execution_count": 33,
1180 |    "metadata": {},
1181 |    "outputs": [],
1182 |    "source": [
1183 |     "# Save parameters for checkpoint\n",
1184 |     "save_params(save_path)"
1185 |    ]
1186 |   },
1187 |   {
1188 |    "cell_type": "markdown",
1189 |    "metadata": {},
1190 |    "source": [
1191 |     "# Checkpoint"
1192 |    ]
1193 |   },
1194 |   {
1195 |    "cell_type": "code",
1196 |    "execution_count": 34,
1197 |    "metadata": {},
1198 |    "outputs": [],
1199 |    "source": [
1200 |     "import tensorflow as tf\n",
1201 |     "import numpy as np\n",
1202 |     "import problem_unittests as tests\n",
1203 |     "\n",
1204 |     "_, (source_vocab_to_int, target_vocab_to_int), (source_int_to_vocab, target_int_to_vocab) = load_preprocess()\n",
1205 |     "load_path = load_params()"
1206 |    ]
1207 |   },
1208 |   {
1209 |    "cell_type": "markdown",
1210 |    "metadata": {},
1211 |    "source": [
1212 |     "## Translate\n",
1213 |     "This will translate `translate_sentence` from English to French."
1214 |    ]
1215 |   },
1216 |   {
1217 |    "cell_type": "code",
1218 |    "execution_count": 39,
1219 |    "metadata": {},
1220 |    "outputs": [
1221 |     {
1222 |      "name": "stdout",
1223 |      "output_type": "stream",
1224 |      "text": [
1225 |       "INFO:tensorflow:Restoring parameters from checkpoints/dev\n",
1226 |       "Input\n",
1227 |       "  Word Ids:      [158, 189, 82, 152, 206, 176, 101]\n",
1228 |       "  English Words: ['he', 'saw', 'a', 'old', 'yellow', 'truck', '.']\n",
1229 |       "\n",
1230 |       "Prediction\n",
1231 |       "  Word Ids:      [18, 212, 182, 26, 220, 48, 317, 126, 1]\n",
1232 |       "  French Words: il a vu un nouveau camion jaune . <EOS>\n"
1233 |      ]
1234 |     }
1235 |    ],
1236 |    "source": [
1237 |     "def sentence_to_seq(sentence, vocab_to_int):\n",
1238 |     "    results = []\n",
1239 |     "    for word in sentence.split(\" \"):\n",
1240 |     "        if word in vocab_to_int:\n",
1241 |     "            results.append(vocab_to_int[word])\n",
1242 |     "        else:\n",
1243 |     "            results.append(vocab_to_int['<UNK>'])\n",
1244 |     "            \n",
1245 |     "    return results\n",
1246 |     "\n",
1247 |     "translate_sentence = 'he saw a old yellow truck .'\n",
1248 |     "\n",
1249 |     "translate_sentence = sentence_to_seq(translate_sentence, source_vocab_to_int)\n",
1250 |     "\n",
1251 |     "loaded_graph = tf.Graph()\n",
1252 |     "with tf.Session(graph=loaded_graph) as sess:\n",
1253 |     "    # Load saved model\n",
1254 |     "    loader = tf.train.import_meta_graph(load_path + '.meta')\n",
1255 |     "    loader.restore(sess, load_path)\n",
1256 |     "\n",
1257 |     "    input_data = loaded_graph.get_tensor_by_name('input:0')\n",
1258 |     "    logits = loaded_graph.get_tensor_by_name('predictions:0')\n",
1259 |     "    target_sequence_length = loaded_graph.get_tensor_by_name('target_sequence_length:0')\n",
1260 |     "    keep_prob = loaded_graph.get_tensor_by_name('keep_prob:0')\n",
1261 |     "\n",
1262 |     "    translate_logits = sess.run(logits, {input_data: [translate_sentence]*batch_size,\n",
1263 |     "                                         target_sequence_length: [len(translate_sentence)*2]*batch_size,\n",
1264 |     "                                         keep_prob: 1.0})[0]\n",
1265 |     "\n",
1266 |     "print('Input')\n",
1267 |     "print('  Word Ids:      {}'.format([i for i in translate_sentence]))\n",
1268 |     "print('  English Words: {}'.format([source_int_to_vocab[i] for i in translate_sentence]))\n",
1269 |     "\n",
1270 |     "print('\\nPrediction')\n",
1271 |     "print('  Word Ids:      {}'.format([i for i in translate_logits]))\n",
1272 |     "print('  French Words: {}'.format(\" \".join([target_int_to_vocab[i] for i in translate_logits])))"
1273 |    ]
1274 |   },
1275 |   {
1276 |    "cell_type": "code",
1277 |    "execution_count": null,
1278 |    "metadata": {},
1279 |    "outputs": [],
1280 |    "source": []
1281 |   }
1282 |  ],
1283 |  "metadata": {
1284 |   "anaconda-cloud": {},
1285 |   "kernelspec": {
1286 |    "display_name": "test",
1287 |    "language": "python",
1288 |    "name": "test"
1289 |   },
1290 |   "language_info": {
1291 |    "codemirror_mode": {
1292 |     "name": "ipython",
1293 |     "version": 3
1294 |    },
1295 |    "file_extension": ".py",
1296 |    "mimetype": "text/x-python",
1297 |    "name": "python",
1298 |    "nbconvert_exporter": "python",
1299 |    "pygments_lexer": "ipython3",
1300 |    "version": "3.6.5"
1301 |   }
1302 |  },
1303 |  "nbformat": 4,
1304 |  "nbformat_minor": 1
1305 | }
1306 | 


--------------------------------------------------------------------------------
/encoding_model.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/deep-diver/EN-FR-MLT-tensorflow/787400286758263cd127229e446bf8a28b8e9ca2/encoding_model.PNG


--------------------------------------------------------------------------------
/go_insert.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/deep-diver/EN-FR-MLT-tensorflow/787400286758263cd127229e446bf8a28b8e9ca2/go_insert.png


--------------------------------------------------------------------------------
/gradient_clipping.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/deep-diver/EN-FR-MLT-tensorflow/787400286758263cd127229e446bf8a28b8e9ca2/gradient_clipping.PNG


--------------------------------------------------------------------------------
/inference_phase.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/deep-diver/EN-FR-MLT-tensorflow/787400286758263cd127229e446bf8a28b8e9ca2/inference_phase.PNG


--------------------------------------------------------------------------------
/lookup.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/deep-diver/EN-FR-MLT-tensorflow/787400286758263cd127229e446bf8a28b8e9ca2/lookup.png


--------------------------------------------------------------------------------
/pad_insert.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/deep-diver/EN-FR-MLT-tensorflow/787400286758263cd127229e446bf8a28b8e9ca2/pad_insert.png


--------------------------------------------------------------------------------
/params.p:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/deep-diver/EN-FR-MLT-tensorflow/787400286758263cd127229e446bf8a28b8e9ca2/params.p


--------------------------------------------------------------------------------
/training_phase.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/deep-diver/EN-FR-MLT-tensorflow/787400286758263cd127229e446bf8a28b8e9ca2/training_phase.PNG


--------------------------------------------------------------------------------