├── README.md
└── summarize_reviews.ipynb
/README.md:
--------------------------------------------------------------------------------
1 | # Summarizing Text with Amazon Reviews
2 |
3 | Updated to work with TensorFlow Version: 1.3.0
4 |
5 | The objective of this project is to build a model that can create relevant summaries for reviews written about fine foods sold on Amazon. This dataset contains more than 500,000 reviews, and is hosted on [Kaggle](https://www.kaggle.com/snap/amazon-fine-food-reviews).
6 |
7 | Here are two examples of what the data looks like:
8 | ```
9 | Review # 1
10 | Good Quality Dog Food
11 | I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than most.
12 |
13 | Review # 2
14 | Not as Advertised
15 | Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as "Jumbo".
16 | ```
17 | To build our model we will use a two-layer bidirectional RNN with LSTMs on the input data and a two-layer LSTM decoder with Bahdanau attention on the target data.
18 |
19 | The sections of this project are:
20 | - 1. Inspecting the Data
21 | - 2. Preparing the Data
22 | - 3. Building the Model
23 | - 4. Training the Model
24 | - 5. Making Our Own Summaries
25 |
26 | ## Download data
27 | Amazon Reviews data: download [Reviews.csv](https://www.kaggle.com/snap/amazon-fine-food-reviews/downloads/Reviews.csv) and copy it to **./Reviews.csv**
28 |
29 | Word embeddings: download [numberbatch-en-17.06.txt.gz](https://conceptnet.s3.amazonaws.com/downloads/2017/numberbatch/numberbatch-en-17.06.txt.gz)
30 | and extract it to **./model/numberbatch-en-17.06.txt**
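
Alternatively, a minimal Python sketch like the one below (not part of the project code) can fetch and extract the embeddings to the expected path:

```
import gzip, os, shutil, urllib.request

url = ("https://conceptnet.s3.amazonaws.com/downloads/2017/"
       "numberbatch/numberbatch-en-17.06.txt.gz")
os.makedirs("model", exist_ok=True)
archive = "model/numberbatch-en-17.06.txt.gz"
urllib.request.urlretrieve(url, archive)  # download the gzip archive
with gzip.open(archive, "rb") as src, open("model/numberbatch-en-17.06.txt", "wb") as dst:
    shutil.copyfileobj(src, dst)  # extract to ./model/numberbatch-en-17.06.txt
```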
31 |
32 | ## Dependencies
33 | Python 3.5 packages: tensorflow v1.3, pandas, numpy, nltk
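
The notebook removes English stopwords with NLTK, so the `stopwords` corpus needs to be downloaded once beforehand (a one-off setup step):

```
import nltk
nltk.download('stopwords')
```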
34 |
35 | ### How to Run
36 | `cd` into the project directory on the command line and start Jupyter:
37 | ```
38 | jupyter notebook
39 | ```
40 | then open this notebook in the browser:
41 |
42 | **summarize_reviews.ipynb**
43 |
44 |
45 | Inspired by the post [Text Summarization with Amazon Reviews](https://medium.com/towards-data-science/text-summarization-with-amazon-reviews-41801c2210b), with a few improvements.
46 |
47 | I wrote an [article](https://www.dlology.com/blog/tutorial-summarizing-text-with-amazon-reviews/) about this project that explains parts of it in detail.
--------------------------------------------------------------------------------
/summarize_reviews.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Summarizing Text with Amazon Reviews"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "The objective of this project is to build a model that can create relevant summaries for reviews written about fine foods sold on Amazon. This dataset contains above 500,000 reviews, and is hosted on [Kaggle](https://www.kaggle.com/snap/amazon-fine-food-reviews).\n",
15 | "\n",
16 | "To build our model we will use a two-layered bidirectional RNN with LSTMs on the input data and two layers, each with an LSTM using bahdanau attention on the target data.\n",
17 | "\n",
18 | "The sections of this project are:\n",
19 | "- [1.Inspecting the Data](#1.-Insepcting-the-Data)\n",
20 | "- [2.Preparing the Data](#2.-Preparing-the-Data)\n",
21 | "- [3.Building the Model](#3.-Building-the-Model)\n",
22 | "- [4.Training the Model](#4.-Training-the-Model)\n",
23 | "- [5.Making Our Own Summaries](#5.-Making-Our-Own-Summaries)\n",
24 | "\n",
25 | "## Download data\n",
26 | "Amazon Reviews Data: [Reviews.csv](https://www.kaggle.com/snap/amazon-fine-food-reviews/downloads/Reviews.csv)\n",
27 | "\n",
28 | "word embeddings [numberbatch-en-17.06.txt.gz](https://conceptnet.s3.amazonaws.com/downloads/2017/numberbatch/numberbatch-en-17.06.txt.gz)\n",
29 | "after download, extract to **./model/numberbatch-en-17.06.txt**"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": 1,
35 | "metadata": {},
36 | "outputs": [
37 | {
38 | "name": "stdout",
39 | "output_type": "stream",
40 | "text": [
41 | "TensorFlow Version: 1.3.0\n"
42 | ]
43 | }
44 | ],
45 | "source": [
46 | "import pandas as pd\n",
47 | "import numpy as np\n",
48 | "import tensorflow as tf\n",
49 | "import re\n",
50 | "from nltk.corpus import stopwords\n",
51 | "import time\n",
52 | "from tensorflow.python.layers.core import Dense\n",
53 | "from tensorflow.python.ops.rnn_cell_impl import _zero_state_tensors\n",
54 | "from tensorflow.python.ops import array_ops\n",
55 | "from tensorflow.python.ops import tensor_array_ops\n",
56 | "print('TensorFlow Version: {}'.format(tf.__version__))"
57 | ]
58 | },
59 | {
60 | "cell_type": "code",
61 | "execution_count": 2,
62 | "metadata": {
63 | "collapsed": true
64 | },
65 | "outputs": [],
66 | "source": [
67 | "import pickle\n",
68 | "def __pickleStuff(filename, stuff):\n",
69 | " save_stuff = open(filename, \"wb\")\n",
70 | " pickle.dump(stuff, save_stuff)\n",
71 | " save_stuff.close()\n",
72 | "def __loadStuff(filename):\n",
73 | " saved_stuff = open(filename,\"rb\")\n",
74 | " stuff = pickle.load(saved_stuff)\n",
75 | " saved_stuff.close()\n",
76 | " return stuff"
77 | ]
78 | },
79 | {
80 | "cell_type": "markdown",
81 | "metadata": {},
82 | "source": [
83 | "## Load those prepared data and skip to section \"[3. Building the Model](#3.-Building-the-Model)\"\n",
84 | "Once we have run through the \"[2.Preparing the Data](#2.-Preparing-the-Data)\" section, we should have those data, uncomment and run those lines."
85 | ]
86 | },
87 | {
88 | "cell_type": "code",
89 | "execution_count": 3,
90 | "metadata": {
91 | "collapsed": true
92 | },
93 | "outputs": [],
94 | "source": [
95 | "clean_summaries = __loadStuff(\"./data/clean_summaries.p\")\n",
96 | "clean_texts = __loadStuff(\"./data/clean_texts.p\")\n",
97 | "\n",
98 | "sorted_summaries = __loadStuff(\"./data/sorted_summaries.p\")\n",
99 | "sorted_texts = __loadStuff(\"./data/sorted_texts.p\")\n",
100 | "word_embedding_matrix = __loadStuff(\"./data/word_embedding_matrix.p\")\n",
101 | "\n",
102 | "vocab_to_int = __loadStuff(\"./data/vocab_to_int.p\")\n",
103 | "int_to_vocab = __loadStuff(\"./data/int_to_vocab.p\")\n"
104 | ]
105 | },
106 | {
107 | "cell_type": "markdown",
108 | "metadata": {},
109 | "source": [
110 | "## 1. Insepcting the Data"
111 | ]
112 | },
113 | {
114 | "cell_type": "code",
115 | "execution_count": 3,
116 | "metadata": {
117 | "collapsed": true
118 | },
119 | "outputs": [],
120 | "source": [
121 | "reviews = pd.read_csv(\"Reviews.csv\")"
122 | ]
123 | },
124 | {
125 | "cell_type": "code",
126 | "execution_count": 4,
127 | "metadata": {},
128 | "outputs": [
129 | {
130 | "data": {
131 | "text/plain": [
132 | "(568454, 10)"
133 | ]
134 | },
135 | "execution_count": 4,
136 | "metadata": {},
137 | "output_type": "execute_result"
138 | }
139 | ],
140 | "source": [
141 | "reviews.shape"
142 | ]
143 | },
144 | {
145 | "cell_type": "code",
146 | "execution_count": 5,
147 | "metadata": {},
148 | "outputs": [
149 | {
150 | "data": {
151 | "text/html": [
152 | "
\n",
153 | "
\n",
154 | " \n",
155 | " \n",
156 | " | \n",
157 | " Id | \n",
158 | " ProductId | \n",
159 | " UserId | \n",
160 | " ProfileName | \n",
161 | " HelpfulnessNumerator | \n",
162 | " HelpfulnessDenominator | \n",
163 | " Score | \n",
164 | " Time | \n",
165 | " Summary | \n",
166 | " Text | \n",
167 | "
\n",
168 | " \n",
169 | " \n",
170 | " \n",
171 | " | 0 | \n",
172 | " 1 | \n",
173 | " B001E4KFG0 | \n",
174 | " A3SGXH7AUHU8GW | \n",
175 | " delmartian | \n",
176 | " 1 | \n",
177 | " 1 | \n",
178 | " 5 | \n",
179 | " 1303862400 | \n",
180 | " Good Quality Dog Food | \n",
181 | " I have bought several of the Vitality canned d... | \n",
182 | "
\n",
183 | " \n",
184 | " | 1 | \n",
185 | " 2 | \n",
186 | " B00813GRG4 | \n",
187 | " A1D87F6ZCVE5NK | \n",
188 | " dll pa | \n",
189 | " 0 | \n",
190 | " 0 | \n",
191 | " 1 | \n",
192 | " 1346976000 | \n",
193 | " Not as Advertised | \n",
194 | " Product arrived labeled as Jumbo Salted Peanut... | \n",
195 | "
\n",
196 | " \n",
197 | " | 2 | \n",
198 | " 3 | \n",
199 | " B000LQOCH0 | \n",
200 | " ABXLMWJIXXAIN | \n",
201 | " Natalia Corres \"Natalia Corres\" | \n",
202 | " 1 | \n",
203 | " 1 | \n",
204 | " 4 | \n",
205 | " 1219017600 | \n",
206 | " \"Delight\" says it all | \n",
207 | " This is a confection that has been around a fe... | \n",
208 | "
\n",
209 | " \n",
210 | " | 3 | \n",
211 | " 4 | \n",
212 | " B000UA0QIQ | \n",
213 | " A395BORC6FGVXV | \n",
214 | " Karl | \n",
215 | " 3 | \n",
216 | " 3 | \n",
217 | " 2 | \n",
218 | " 1307923200 | \n",
219 | " Cough Medicine | \n",
220 | " If you are looking for the secret ingredient i... | \n",
221 | "
\n",
222 | " \n",
223 | " | 4 | \n",
224 | " 5 | \n",
225 | " B006K2ZZ7K | \n",
226 | " A1UQRSCLF8GW1T | \n",
227 | " Michael D. Bigham \"M. Wassir\" | \n",
228 | " 0 | \n",
229 | " 0 | \n",
230 | " 5 | \n",
231 | " 1350777600 | \n",
232 | " Great taffy | \n",
233 | " Great taffy at a great price. There was a wid... | \n",
234 | "
\n",
235 | " \n",
236 | "
\n",
237 | "
"
238 | ],
239 | "text/plain": [
240 | " Id ProductId UserId ProfileName \\\n",
241 | "0 1 B001E4KFG0 A3SGXH7AUHU8GW delmartian \n",
242 | "1 2 B00813GRG4 A1D87F6ZCVE5NK dll pa \n",
243 | "2 3 B000LQOCH0 ABXLMWJIXXAIN Natalia Corres \"Natalia Corres\" \n",
244 | "3 4 B000UA0QIQ A395BORC6FGVXV Karl \n",
245 | "4 5 B006K2ZZ7K A1UQRSCLF8GW1T Michael D. Bigham \"M. Wassir\" \n",
246 | "\n",
247 | " HelpfulnessNumerator HelpfulnessDenominator Score Time \\\n",
248 | "0 1 1 5 1303862400 \n",
249 | "1 0 0 1 1346976000 \n",
250 | "2 1 1 4 1219017600 \n",
251 | "3 3 3 2 1307923200 \n",
252 | "4 0 0 5 1350777600 \n",
253 | "\n",
254 | " Summary Text \n",
255 | "0 Good Quality Dog Food I have bought several of the Vitality canned d... \n",
256 | "1 Not as Advertised Product arrived labeled as Jumbo Salted Peanut... \n",
257 | "2 \"Delight\" says it all This is a confection that has been around a fe... \n",
258 | "3 Cough Medicine If you are looking for the secret ingredient i... \n",
259 | "4 Great taffy Great taffy at a great price. There was a wid... "
260 | ]
261 | },
262 | "execution_count": 5,
263 | "metadata": {},
264 | "output_type": "execute_result"
265 | }
266 | ],
267 | "source": [
268 | "reviews.head()"
269 | ]
270 | },
271 | {
272 | "cell_type": "code",
273 | "execution_count": 6,
274 | "metadata": {},
275 | "outputs": [
276 | {
277 | "data": {
278 | "text/plain": [
279 | "Id 0\n",
280 | "ProductId 0\n",
281 | "UserId 0\n",
282 | "ProfileName 16\n",
283 | "HelpfulnessNumerator 0\n",
284 | "HelpfulnessDenominator 0\n",
285 | "Score 0\n",
286 | "Time 0\n",
287 | "Summary 26\n",
288 | "Text 0\n",
289 | "dtype: int64"
290 | ]
291 | },
292 | "execution_count": 6,
293 | "metadata": {},
294 | "output_type": "execute_result"
295 | }
296 | ],
297 | "source": [
298 | "# Check for any nulls values\n",
299 | "reviews.isnull().sum()"
300 | ]
301 | },
302 | {
303 | "cell_type": "code",
304 | "execution_count": 7,
305 | "metadata": {
306 | "collapsed": true
307 | },
308 | "outputs": [],
309 | "source": [
310 | "# Remove null values and unneeded features\n",
311 | "reviews = reviews.dropna()\n",
312 | "reviews = reviews.drop(['Id','ProductId','UserId','ProfileName','HelpfulnessNumerator','HelpfulnessDenominator',\n",
313 | " 'Score','Time'], 1)\n",
314 | "reviews = reviews.reset_index(drop=True)"
315 | ]
316 | },
317 | {
318 | "cell_type": "code",
319 | "execution_count": 8,
320 | "metadata": {},
321 | "outputs": [
322 | {
323 | "data": {
324 | "text/plain": [
325 | "(568412, 2)"
326 | ]
327 | },
328 | "execution_count": 8,
329 | "metadata": {},
330 | "output_type": "execute_result"
331 | }
332 | ],
333 | "source": [
334 | "reviews.shape"
335 | ]
336 | },
337 | {
338 | "cell_type": "code",
339 | "execution_count": 9,
340 | "metadata": {},
341 | "outputs": [
342 | {
343 | "data": {
344 | "text/html": [
345 | "\n",
346 | "
\n",
347 | " \n",
348 | " \n",
349 | " | \n",
350 | " Summary | \n",
351 | " Text | \n",
352 | "
\n",
353 | " \n",
354 | " \n",
355 | " \n",
356 | " | 0 | \n",
357 | " Good Quality Dog Food | \n",
358 | " I have bought several of the Vitality canned d... | \n",
359 | "
\n",
360 | " \n",
361 | " | 1 | \n",
362 | " Not as Advertised | \n",
363 | " Product arrived labeled as Jumbo Salted Peanut... | \n",
364 | "
\n",
365 | " \n",
366 | " | 2 | \n",
367 | " \"Delight\" says it all | \n",
368 | " This is a confection that has been around a fe... | \n",
369 | "
\n",
370 | " \n",
371 | " | 3 | \n",
372 | " Cough Medicine | \n",
373 | " If you are looking for the secret ingredient i... | \n",
374 | "
\n",
375 | " \n",
376 | " | 4 | \n",
377 | " Great taffy | \n",
378 | " Great taffy at a great price. There was a wid... | \n",
379 | "
\n",
380 | " \n",
381 | "
\n",
382 | "
"
383 | ],
384 | "text/plain": [
385 | " Summary Text\n",
386 | "0 Good Quality Dog Food I have bought several of the Vitality canned d...\n",
387 | "1 Not as Advertised Product arrived labeled as Jumbo Salted Peanut...\n",
388 | "2 \"Delight\" says it all This is a confection that has been around a fe...\n",
389 | "3 Cough Medicine If you are looking for the secret ingredient i...\n",
390 | "4 Great taffy Great taffy at a great price. There was a wid..."
391 | ]
392 | },
393 | "execution_count": 9,
394 | "metadata": {},
395 | "output_type": "execute_result"
396 | }
397 | ],
398 | "source": [
399 | "reviews.head()"
400 | ]
401 | },
402 | {
403 | "cell_type": "code",
404 | "execution_count": 10,
405 | "metadata": {},
406 | "outputs": [
407 | {
408 | "name": "stdout",
409 | "output_type": "stream",
410 | "text": [
411 | "Review # 1\n",
412 | "Good Quality Dog Food\n",
413 | "I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than most.\n",
414 | "\n",
415 | "Review # 2\n",
416 | "Not as Advertised\n",
417 | "Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as \"Jumbo\".\n",
418 | "\n",
419 | "Review # 3\n",
420 | "\"Delight\" says it all\n",
421 | "This is a confection that has been around a few centuries. It is a light, pillowy citrus gelatin with nuts - in this case Filberts. And it is cut into tiny squares and then liberally coated with powdered sugar. And it is a tiny mouthful of heaven. Not too chewy, and very flavorful. I highly recommend this yummy treat. If you are familiar with the story of C.S. Lewis' \"The Lion, The Witch, and The Wardrobe\" - this is the treat that seduces Edmund into selling out his Brother and Sisters to the Witch.\n",
422 | "\n",
423 | "Review # 4\n",
424 | "Cough Medicine\n",
425 | "If you are looking for the secret ingredient in Robitussin I believe I have found it. I got this in addition to the Root Beer Extract I ordered (which was good) and made some cherry soda. The flavor is very medicinal.\n",
426 | "\n",
427 | "Review # 5\n",
428 | "Great taffy\n",
429 | "Great taffy at a great price. There was a wide assortment of yummy taffy. Delivery was very quick. If your a taffy lover, this is a deal.\n",
430 | "\n"
431 | ]
432 | }
433 | ],
434 | "source": [
435 | "# Inspecting some of the reviews\n",
436 | "for i in range(5):\n",
437 | " print(\"Review #\",i+1)\n",
438 | " print(reviews.Summary[i])\n",
439 | " print(reviews.Text[i])\n",
440 | " print()"
441 | ]
442 | },
443 | {
444 | "cell_type": "markdown",
445 | "metadata": {},
446 | "source": [
447 | "## 2. Preparing the Data"
448 | ]
449 | },
450 | {
451 | "cell_type": "code",
452 | "execution_count": 15,
453 | "metadata": {
454 | "collapsed": true
455 | },
456 | "outputs": [],
457 | "source": [
458 | "# A list of contractions from http://stackoverflow.com/questions/19790188/expanding-english-language-contractions-in-python\n",
459 | "contractions = { \n",
460 | "\"ain't\": \"am not\",\n",
461 | "\"aren't\": \"are not\",\n",
462 | "\"can't\": \"cannot\",\n",
463 | "\"can't've\": \"cannot have\",\n",
464 | "\"'cause\": \"because\",\n",
465 | "\"could've\": \"could have\",\n",
466 | "\"couldn't\": \"could not\",\n",
467 | "\"couldn't've\": \"could not have\",\n",
468 | "\"didn't\": \"did not\",\n",
469 | "\"doesn't\": \"does not\",\n",
470 | "\"don't\": \"do not\",\n",
471 | "\"hadn't\": \"had not\",\n",
472 | "\"hadn't've\": \"had not have\",\n",
473 | "\"hasn't\": \"has not\",\n",
474 | "\"haven't\": \"have not\",\n",
475 | "\"he'd\": \"he would\",\n",
476 | "\"he'd've\": \"he would have\",\n",
477 | "\"he'll\": \"he will\",\n",
478 | "\"he's\": \"he is\",\n",
479 | "\"how'd\": \"how did\",\n",
480 | "\"how'll\": \"how will\",\n",
481 | "\"how's\": \"how is\",\n",
482 | "\"i'd\": \"i would\",\n",
483 | "\"i'll\": \"i will\",\n",
484 | "\"i'm\": \"i am\",\n",
485 | "\"i've\": \"i have\",\n",
486 | "\"isn't\": \"is not\",\n",
487 | "\"it'd\": \"it would\",\n",
488 | "\"it'll\": \"it will\",\n",
489 | "\"it's\": \"it is\",\n",
490 | "\"let's\": \"let us\",\n",
491 | "\"ma'am\": \"madam\",\n",
492 | "\"mayn't\": \"may not\",\n",
493 | "\"might've\": \"might have\",\n",
494 | "\"mightn't\": \"might not\",\n",
495 | "\"must've\": \"must have\",\n",
496 | "\"mustn't\": \"must not\",\n",
497 | "\"needn't\": \"need not\",\n",
498 | "\"oughtn't\": \"ought not\",\n",
499 | "\"shan't\": \"shall not\",\n",
500 | "\"sha'n't\": \"shall not\",\n",
501 | "\"she'd\": \"she would\",\n",
502 | "\"she'll\": \"she will\",\n",
503 | "\"she's\": \"she is\",\n",
504 | "\"should've\": \"should have\",\n",
505 | "\"shouldn't\": \"should not\",\n",
506 | "\"that'd\": \"that would\",\n",
507 | "\"that's\": \"that is\",\n",
508 | "\"there'd\": \"there had\",\n",
509 | "\"there's\": \"there is\",\n",
510 | "\"they'd\": \"they would\",\n",
511 | "\"they'll\": \"they will\",\n",
512 | "\"they're\": \"they are\",\n",
513 | "\"they've\": \"they have\",\n",
514 | "\"wasn't\": \"was not\",\n",
515 | "\"we'd\": \"we would\",\n",
516 | "\"we'll\": \"we will\",\n",
517 | "\"we're\": \"we are\",\n",
518 | "\"we've\": \"we have\",\n",
519 | "\"weren't\": \"were not\",\n",
520 | "\"what'll\": \"what will\",\n",
521 | "\"what're\": \"what are\",\n",
522 | "\"what's\": \"what is\",\n",
523 | "\"what've\": \"what have\",\n",
524 | "\"where'd\": \"where did\",\n",
525 | "\"where's\": \"where is\",\n",
526 | "\"who'll\": \"who will\",\n",
527 | "\"who's\": \"who is\",\n",
528 | "\"won't\": \"will not\",\n",
529 | "\"wouldn't\": \"would not\",\n",
530 | "\"you'd\": \"you would\",\n",
531 | "\"you'll\": \"you will\",\n",
532 | "\"you're\": \"you are\"\n",
533 | "}"
534 | ]
535 | },
536 | {
537 | "cell_type": "code",
538 | "execution_count": 16,
539 | "metadata": {
540 | "collapsed": true
541 | },
542 | "outputs": [],
543 | "source": [
544 | "def clean_text(text, remove_stopwords = True):\n",
545 | " '''Remove unwanted characters, stopwords, and format the text to create fewer nulls word embeddings'''\n",
546 | " \n",
547 | " # Convert words to lower case\n",
548 | " text = text.lower()\n",
549 | " \n",
550 | " # Replace contractions with their longer forms \n",
551 | " if True:\n",
552 | " # We are not using \"text.split()\" here\n",
553 | " #since it is not fool proof, e.g. words followed by punctuations \"Are you kidding?I think you aren't.\"\n",
554 | " text = re.findall(r\"[\\w']+\", text)\n",
555 | " new_text = []\n",
556 | " for word in text:\n",
557 | " if word in contractions:\n",
558 | " new_text.append(contractions[word])\n",
559 | " else:\n",
560 | " new_text.append(word)\n",
561 | " text = \" \".join(new_text)\n",
562 | " \n",
563 | " # Format words and remove unwanted characters\n",
564 | " text = re.sub(r'https?:\\/\\/.*[\\r\\n]*', '', text, flags=re.MULTILINE)# remove links\n",
565 | " text = re.sub(r'\\', ' ', text)\n",
569 | " text = re.sub(r'\\'', ' ', text)\n",
570 | " \n",
571 | " # Optionally, remove stop words\n",
572 | " if remove_stopwords:\n",
573 | " text = text.split()\n",
574 | " stops = set(stopwords.words(\"english\"))\n",
575 | " text = [w for w in text if not w in stops]\n",
576 | " text = \" \".join(text)\n",
577 | "\n",
578 | " return text"
579 | ]
580 | },
581 | {
582 | "cell_type": "code",
583 | "execution_count": 17,
584 | "metadata": {},
585 | "outputs": [
586 | {
587 | "data": {
588 | "text/plain": [
589 | "'great movie believe may'"
590 | ]
591 | },
592 | "execution_count": 17,
593 | "metadata": {},
594 | "output_type": "execute_result"
595 | }
596 | ],
597 | "source": [
598 | "clean_text(\"That's a great movie,Can you believe it?I've.But you may not.\")"
599 | ]
600 | },
601 | {
602 | "cell_type": "markdown",
603 | "metadata": {},
604 | "source": [
605 | "### Clean the summaries and texts\n",
606 | "We will remove the stopwords from the texts because they do not provide much use for training our model. However, we will keep them for our summaries so that they sound more like natural phrases. "
607 | ]
608 | },
609 | {
610 | "cell_type": "code",
611 | "execution_count": 14,
612 | "metadata": {},
613 | "outputs": [
614 | {
615 | "name": "stdout",
616 | "output_type": "stream",
617 | "text": [
618 | "Summaries are complete.\n",
619 | "Texts are complete.\n"
620 | ]
621 | }
622 | ],
623 | "source": [
624 | "clean_summaries = []\n",
625 | "for summary in reviews.Summary:\n",
626 | " clean_summaries.append(clean_text(summary, remove_stopwords=False))\n",
627 | "print(\"Summaries are complete.\")\n",
628 | "\n",
629 | "clean_texts = []\n",
630 | "for text in reviews.Text:\n",
631 | " clean_texts.append(clean_text(text))\n",
632 | "print(\"Texts are complete.\")"
633 | ]
634 | },
635 | {
636 | "cell_type": "code",
637 | "execution_count": 15,
638 | "metadata": {},
639 | "outputs": [
640 | {
641 | "name": "stdout",
642 | "output_type": "stream",
643 | "text": [
644 | "Clean Review # 1\n",
645 | "good quality dog food\n",
646 | "bought several vitality canned dog food products found good quality product looks like stew processed meat smells better labrador finicky appreciates product better\n",
647 | "\n",
648 | "Clean Review # 2\n",
649 | "not as advertised\n",
650 | "product arrived labeled jumbo salted peanuts peanuts actually small sized unsalted sure error vendor intended represent product jumbo\n",
651 | "\n",
652 | "Clean Review # 3\n",
653 | "delight says it all\n",
654 | "confection around centuries light pillowy citrus gelatin nuts case filberts cut tiny squares liberally coated powdered sugar tiny mouthful heaven chewy flavorful highly recommend yummy treat familiar story c lewis lion witch wardrobe treat seduces edmund selling brother sisters witch\n",
655 | "\n",
656 | "Clean Review # 4\n",
657 | "cough medicine\n",
658 | "looking secret ingredient robitussin believe found got addition root beer extract ordered good made cherry soda flavor medicinal\n",
659 | "\n",
660 | "Clean Review # 5\n",
661 | "great taffy\n",
662 | "great taffy great price wide assortment yummy taffy delivery quick taffy lover deal\n",
663 | "\n"
664 | ]
665 | }
666 | ],
667 | "source": [
668 | "# Inspect the cleaned summaries and texts to ensure they have been cleaned well\n",
669 | "for i in range(5):\n",
670 | " print(\"Clean Review #\",i+1)\n",
671 | " print(clean_summaries[i])\n",
672 | " print(clean_texts[i])\n",
673 | " print()"
674 | ]
675 | },
676 | {
677 | "cell_type": "markdown",
678 | "metadata": {},
679 | "source": [
680 | "### Count the number of occurrences of each word in a set of text"
681 | ]
682 | },
683 | {
684 | "cell_type": "code",
685 | "execution_count": 16,
686 | "metadata": {
687 | "collapsed": true
688 | },
689 | "outputs": [],
690 | "source": [
691 | "def count_words(count_dict, text):\n",
692 | " for sentence in text:\n",
693 | " for word in sentence.split():\n",
694 | " if word not in count_dict:\n",
695 | " count_dict[word] = 1\n",
696 | " else:\n",
697 | " count_dict[word] += 1"
698 | ]
699 | },
700 | {
701 | "cell_type": "markdown",
702 | "metadata": {},
703 | "source": [
704 | "#### Give the function a try"
705 | ]
706 | },
707 | {
708 | "cell_type": "code",
709 | "execution_count": 17,
710 | "metadata": {},
711 | "outputs": [
712 | {
713 | "data": {
714 | "text/plain": [
715 | "{'a': 2, 'dog': 2, 'great': 4, 'have': 1, 'is': 1, 'that': 1, 'you': 1}"
716 | ]
717 | },
718 | "execution_count": 17,
719 | "metadata": {},
720 | "output_type": "execute_result"
721 | }
722 | ],
723 | "source": [
724 | "mydict = {}\n",
725 | "count_words(mydict, [\"that is a great great great dog\",\"you have a great dog\"])\n",
726 | "mydict"
727 | ]
728 | },
729 | {
730 | "cell_type": "code",
731 | "execution_count": 18,
732 | "metadata": {},
733 | "outputs": [
734 | {
735 | "name": "stdout",
736 | "output_type": "stream",
737 | "text": [
738 | "Size of Vocabulary: 125880\n"
739 | ]
740 | }
741 | ],
742 | "source": [
743 | "word_counts = {}\n",
744 | "count_words(word_counts, clean_summaries)\n",
745 | "count_words(word_counts, clean_texts)\n",
746 | "print(\"Size of Vocabulary:\", len(word_counts))"
747 | ]
748 | },
749 | {
750 | "cell_type": "markdown",
751 | "metadata": {},
752 | "source": [
753 | "Let's see how may \"hero\" occurs in the data"
754 | ]
755 | },
756 | {
757 | "cell_type": "code",
758 | "execution_count": 19,
759 | "metadata": {},
760 | "outputs": [
761 | {
762 | "data": {
763 | "text/plain": [
764 | "114"
765 | ]
766 | },
767 | "execution_count": 19,
768 | "metadata": {},
769 | "output_type": "execute_result"
770 | }
771 | ],
772 | "source": [
773 | "word_counts[\"hero\"]"
774 | ]
775 | },
776 | {
777 | "cell_type": "markdown",
778 | "metadata": {
779 | "collapsed": true
780 | },
781 | "source": [
782 | "### Load Conceptnet Numberbatch's (CN) embeddings, similar to GloVe, but probably better \n",
783 | " (https://github.com/commonsense/conceptnet-numberbatch)"
784 | ]
785 | },
786 | {
787 | "cell_type": "code",
788 | "execution_count": 20,
789 | "metadata": {},
790 | "outputs": [
791 | {
792 | "name": "stdout",
793 | "output_type": "stream",
794 | "text": [
795 | "Word embeddings: 417195\n"
796 | ]
797 | }
798 | ],
799 | "source": [
800 | "\n",
801 | "embeddings_index = {}\n",
802 | "with open('./model/numberbatch-en-17.06.txt', encoding='utf-8') as f:\n",
803 | " for line in f:\n",
804 | " values = line.split(' ')\n",
805 | " word = values[0]\n",
806 | " embedding = np.asarray(values[1:], dtype='float32')\n",
807 | " embeddings_index[word] = embedding\n",
808 | "\n",
809 | "print('Word embeddings:', len(embeddings_index))"
810 | ]
811 | },
812 | {
813 | "cell_type": "markdown",
814 | "metadata": {},
815 | "source": [
816 | "### Take a look at the CN embedding dimension"
817 | ]
818 | },
819 | {
820 | "cell_type": "code",
821 | "execution_count": 21,
822 | "metadata": {},
823 | "outputs": [
824 | {
825 | "data": {
826 | "text/plain": [
827 | "(300,)"
828 | ]
829 | },
830 | "execution_count": 21,
831 | "metadata": {},
832 | "output_type": "execute_result"
833 | }
834 | ],
835 | "source": [
836 | "embeddings_index[\"hero\"].shape"
837 | ]
838 | },
839 | {
840 | "cell_type": "markdown",
841 | "metadata": {},
842 | "source": [
843 | "### Find the number of words that are missing from CN, and are used more than our threshold.\n",
844 | "\n",
845 | "I use a **threshold** of 20, so that words not in CN can be added to our **word_embedding_matrix**, but they need to be common enough in the reviews so that the model can understand their meaning."
846 | ]
847 | },
848 | {
849 | "cell_type": "code",
850 | "execution_count": 22,
851 | "metadata": {},
852 | "outputs": [
853 | {
854 | "name": "stdout",
855 | "output_type": "stream",
856 | "text": [
857 | "Number of words missing from CN: 2608\n",
858 | "Percent of words that are missing from vocabulary: 2.07%\n"
859 | ]
860 | }
861 | ],
862 | "source": [
863 | "missing_words = 0\n",
864 | "threshold = 20\n",
865 | "\n",
866 | "for word, count in word_counts.items():\n",
867 | " if count > threshold:\n",
868 | " if word not in embeddings_index:\n",
869 | " missing_words += 1\n",
870 | " \n",
871 | "missing_ratio = round(missing_words/len(word_counts),4)*100\n",
872 | " \n",
873 | "print(\"Number of words missing from CN:\", missing_words)\n",
874 | "print(\"Percent of words that are missing from vocabulary: {}%\".format(missing_ratio))"
875 | ]
876 | },
877 | {
878 | "cell_type": "markdown",
879 | "metadata": {},
880 | "source": [
881 | "### What are those missing words in the CN\n",
882 | "Looks mostly products' brand."
883 | ]
884 | },
885 | {
886 | "cell_type": "code",
887 | "execution_count": 23,
888 | "metadata": {},
889 | "outputs": [
890 | {
891 | "data": {
892 | "text/plain": [
893 | "[('wafu', 29),\n",
894 | " ('wasaibi', 24),\n",
895 | " ('sauage', 23),\n",
896 | " ('diabetisweet', 27),\n",
897 | " ('aerogrow', 99),\n",
898 | " ('lowfat', 298),\n",
899 | " ('deliverd', 21),\n",
900 | " ('bullysticks', 21),\n",
901 | " ('keurigs', 72),\n",
902 | " ('pepitas', 42),\n",
903 | " ('wellpet', 27),\n",
904 | " ('undertaste', 24),\n",
905 | " ('50g', 44),\n",
906 | " ('ammount', 45),\n",
907 | " ('400', 461),\n",
908 | " ('toniq', 21),\n",
909 | " ('gummis', 161),\n",
910 | " ('teasan', 81),\n",
911 | " ('27th', 28),\n",
912 | " ('iherb', 66),\n",
913 | " ('fage', 34),\n",
914 | " ('droste', 70),\n",
915 | " ('wholefoods', 145),\n",
916 | " ('marzanos', 29),\n",
917 | " ('discusting', 28),\n",
918 | " ('foojoy', 41),\n",
919 | " ('91', 75),\n",
920 | " ('indomie', 36),\n",
921 | " ('5hour', 64),\n",
922 | " ('ec155', 34)]"
923 | ]
924 | },
925 | "execution_count": 23,
926 | "metadata": {},
927 | "output_type": "execute_result"
928 | }
929 | ],
930 | "source": [
931 | "missing_words = []\n",
932 | "for word, count in word_counts.items():\n",
933 | " if count > threshold and word not in embeddings_index:\n",
934 | " missing_words.append((word,count))\n",
935 | "missing_words[:30]"
936 | ]
937 | },
938 | {
939 | "cell_type": "markdown",
940 | "metadata": {},
941 | "source": [
942 | "### Words to indexes, indexes to words dicts\n",
943 | "Limit the vocab that we will use to words that appear ≥ threshold or are in CN"
944 | ]
945 | },
946 | {
947 | "cell_type": "code",
948 | "execution_count": 24,
949 | "metadata": {},
950 | "outputs": [
951 | {
952 | "name": "stdout",
953 | "output_type": "stream",
954 | "text": [
955 | "Total number of unique words: 125880\n",
956 | "Number of words we will use: 59072\n",
957 | "Percent of words we will use: 46.93%\n"
958 | ]
959 | }
960 | ],
961 | "source": [
962 | "#dictionary to convert words to integers\n",
963 | "vocab_to_int = {} \n",
964 | "# Index words from 0\n",
965 | "value = 0\n",
966 | "for word, count in word_counts.items():\n",
967 | " if count >= threshold or word in embeddings_index:\n",
968 | " vocab_to_int[word] = value\n",
969 | " value += 1\n",
970 | "\n",
971 | "# Special tokens that will be added to our vocab\n",
972 | "codes = [\"\",\"\",\"\",\"\"] \n",
973 | "\n",
974 | "# Add codes to vocab\n",
975 | "for code in codes:\n",
976 | " vocab_to_int[code] = len(vocab_to_int)\n",
977 | "\n",
978 | "# Dictionary to convert integers to words\n",
979 | "int_to_vocab = {}\n",
980 | "for word, value in vocab_to_int.items():\n",
981 | " int_to_vocab[value] = word\n",
982 | "\n",
983 | "usage_ratio = round(len(vocab_to_int) / len(word_counts),4)*100\n",
984 | "\n",
985 | "print(\"Total number of unique words:\", len(word_counts))\n",
986 | "print(\"Number of words we will use:\", len(vocab_to_int))\n",
987 | "print(\"Percent of words we will use: {}%\".format(usage_ratio))"
988 | ]
989 | },
990 | {
991 | "cell_type": "markdown",
992 | "metadata": {},
993 | "source": [
994 | "### Create word embedding matrix\n",
995 | "It has shape (nb_words, embedding_dim) i.e. (59072, 300) in this case. 1st dim is word index, 2nd dim is from CN or random generated."
996 | ]
997 | },
998 | {
999 | "cell_type": "code",
1000 | "execution_count": 25,
1001 | "metadata": {},
1002 | "outputs": [
1003 | {
1004 | "name": "stdout",
1005 | "output_type": "stream",
1006 | "text": [
1007 | "59072\n"
1008 | ]
1009 | }
1010 | ],
1011 | "source": [
1012 | "# Need to use 300 for embedding dimensions to match CN's vectors.\n",
1013 | "embedding_dim = 300\n",
1014 | "nb_words = len(vocab_to_int)\n",
1015 | "\n",
1016 | "# Create matrix with default values of zero\n",
1017 | "word_embedding_matrix = np.zeros((nb_words, embedding_dim), dtype=np.float32)\n",
1018 | "for word, i in vocab_to_int.items():\n",
1019 | " if word in embeddings_index:\n",
1020 | " word_embedding_matrix[i] = embeddings_index[word]\n",
1021 | " else:\n",
1022 | " # If word not in CN, create a random embedding for it\n",
1023 | " new_embedding = np.array(np.random.uniform(-1.0, 1.0, embedding_dim))\n",
1024 | " embeddings_index[word] = new_embedding\n",
1025 | " word_embedding_matrix[i] = new_embedding\n",
1026 | "\n",
1027 | "# Check if value matches len(vocab_to_int)\n",
1028 | "print(len(word_embedding_matrix))"
1029 | ]
1030 | },
1031 | {
1032 | "cell_type": "markdown",
1033 | "metadata": {},
1034 | "source": [
1035 | "### Function to convert sentences to sequence of words indexes\n",
1036 | "It also use `` index to replace unknown words, append `` (End of Sentence) to the sequences if eos is set True"
1037 | ]
1038 | },
1039 | {
1040 | "cell_type": "code",
1041 | "execution_count": 26,
1042 | "metadata": {
1043 | "collapsed": true
1044 | },
1045 | "outputs": [],
1046 | "source": [
1047 | "def convert_to_ints(text, word_count, unk_count, eos=False):\n",
1048 | " '''Convert words in text to an integer.\n",
1049 | " If word is not in vocab_to_int, use UNK's integer.\n",
1050 | " Total the number of words and UNKs.\n",
1051 | " Add EOS token to the end of texts'''\n",
1052 | " ints = []\n",
1053 | " for sentence in text:\n",
1054 | " sentence_ints = []\n",
1055 | " for word in sentence.split():\n",
1056 | " word_count += 1\n",
1057 | " if word in vocab_to_int:\n",
1058 | " sentence_ints.append(vocab_to_int[word])\n",
1059 | " else:\n",
1060 | " sentence_ints.append(vocab_to_int[\"\"])\n",
1061 | " unk_count += 1\n",
1062 | " if eos:\n",
1063 | " sentence_ints.append(vocab_to_int[\"\"])\n",
1064 | " ints.append(sentence_ints)\n",
1065 | " return ints, word_count, unk_count"
1066 | ]
1067 | },
1068 | {
1069 | "cell_type": "markdown",
1070 | "metadata": {},
1071 | "source": [
1072 | "Apply convert_to_ints to clean_summaries and clean_texts"
1073 | ]
1074 | },
1075 | {
1076 | "cell_type": "code",
1077 | "execution_count": 27,
1078 | "metadata": {},
1079 | "outputs": [
1080 | {
1081 | "name": "stdout",
1082 | "output_type": "stream",
1083 | "text": [
1084 | "Total number of words in headlines: 26232576\n",
1085 | "Total number of UNKs in headlines: 163594\n",
1086 | "Percent of words that are UNK: 0.62%\n"
1087 | ]
1088 | }
1089 | ],
1090 | "source": [
1091 | "\n",
1092 | "word_count = 0\n",
1093 | "unk_count = 0\n",
1094 | "\n",
1095 | "int_summaries, word_count, unk_count = convert_to_ints(clean_summaries, word_count, unk_count)\n",
1096 | "int_texts, word_count, unk_count = convert_to_ints(clean_texts, word_count, unk_count, eos=True)\n",
1097 | "\n",
1098 | "unk_percent = round(unk_count/word_count,4)*100\n",
1099 | "\n",
1100 | "print(\"Total number of words in headlines:\", word_count)\n",
1101 | "print(\"Total number of UNKs in headlines:\", unk_count)\n",
1102 | "print(\"Percent of words that are UNK: {}%\".format(unk_percent))"
1103 | ]
1104 | },
1105 | {
1106 | "cell_type": "markdown",
1107 | "metadata": {},
1108 | "source": [
1109 | "### Take a look at what the sequence looks like\n",
1110 | "Each number here represents a word"
1111 | ]
1112 | },
1113 | {
1114 | "cell_type": "code",
1115 | "execution_count": 28,
1116 | "metadata": {},
1117 | "outputs": [
1118 | {
1119 | "data": {
1120 | "text/plain": [
1121 | "[[32681, 40810, 26787, 54872],\n",
1122 | " [2229, 54986, 47923],\n",
1123 | " [23867, 38191, 14436, 39262]]"
1124 | ]
1125 | },
1126 | "execution_count": 28,
1127 | "metadata": {},
1128 | "output_type": "execute_result"
1129 | }
1130 | ],
1131 | "source": [
1132 | "int_summaries[:3]"
1133 | ]
1134 | },
1135 | {
1136 | "cell_type": "markdown",
1137 | "metadata": {},
1138 | "source": [
1139 | "### Function to get the length of each sequence"
1140 | ]
1141 | },
1142 | {
1143 | "cell_type": "code",
1144 | "execution_count": 29,
1145 | "metadata": {
1146 | "collapsed": true
1147 | },
1148 | "outputs": [],
1149 | "source": [
1150 | "def create_lengths(text):\n",
1151 | " '''Create a data frame of the sentence lengths from a text'''\n",
1152 | " lengths = []\n",
1153 | " for sentence in text:\n",
1154 | " lengths.append(len(sentence))\n",
1155 | " return pd.DataFrame(lengths, columns=['counts'])"
1156 | ]
1157 | },
1158 | {
1159 | "cell_type": "code",
1160 | "execution_count": 30,
1161 | "metadata": {},
1162 | "outputs": [
1163 | {
1164 | "data": {
1165 | "text/html": [
1166 | "\n",
1167 | "
\n",
1168 | " \n",
1169 | " \n",
1170 | " | \n",
1171 | " counts | \n",
1172 | "
\n",
1173 | " \n",
1174 | " \n",
1175 | " \n",
1176 | " | 0 | \n",
1177 | " 4 | \n",
1178 | "
\n",
1179 | " \n",
1180 | " | 1 | \n",
1181 | " 3 | \n",
1182 | "
\n",
1183 | " \n",
1184 | " | 2 | \n",
1185 | " 4 | \n",
1186 | "
\n",
1187 | " \n",
1188 | "
\n",
1189 | "
"
1190 | ],
1191 | "text/plain": [
1192 | " counts\n",
1193 | "0 4\n",
1194 | "1 3\n",
1195 | "2 4"
1196 | ]
1197 | },
1198 | "execution_count": 30,
1199 | "metadata": {},
1200 | "output_type": "execute_result"
1201 | }
1202 | ],
1203 | "source": [
1204 | "create_lengths(int_summaries[:3])"
1205 | ]
1206 | },
1207 | {
1208 | "cell_type": "markdown",
1209 | "metadata": {},
1210 | "source": [
1211 | "Get statistic summary of the length of summaries and texts"
1212 | ]
1213 | },
1214 | {
1215 | "cell_type": "code",
1216 | "execution_count": 31,
1217 | "metadata": {},
1218 | "outputs": [
1219 | {
1220 | "name": "stdout",
1221 | "output_type": "stream",
1222 | "text": [
1223 | "Summaries:\n",
1224 | " counts\n",
1225 | "count 568412.000000\n",
1226 | "mean 4.181208\n",
1227 | "std 2.657212\n",
1228 | "min 0.000000\n",
1229 | "25% 2.000000\n",
1230 | "50% 4.000000\n",
1231 | "75% 5.000000\n",
1232 | "max 48.000000\n",
1233 | "\n",
1234 | "Texts:\n",
1235 | " counts\n",
1236 | "count 568412.000000\n",
1237 | "mean 42.969429\n",
1238 | "std 44.166421\n",
1239 | "min 2.000000\n",
1240 | "25% 18.000000\n",
1241 | "50% 30.000000\n",
1242 | "75% 51.000000\n",
1243 | "max 2063.000000\n"
1244 | ]
1245 | }
1246 | ],
1247 | "source": [
1248 | "lengths_summaries = create_lengths(int_summaries)\n",
1249 | "lengths_texts = create_lengths(int_texts)\n",
1250 | "\n",
1251 | "print(\"Summaries:\")\n",
1252 | "print(lengths_summaries.describe())\n",
1253 | "print()\n",
1254 | "print(\"Texts:\")\n",
1255 | "print(lengths_texts.describe())"
1256 | ]
1257 | },
1258 | {
1259 | "cell_type": "markdown",
1260 | "metadata": {},
1261 | "source": [
1262 | "### See what's the max squence length we can cover by percentile"
1263 | ]
1264 | },
1265 | {
1266 | "cell_type": "code",
1267 | "execution_count": 32,
1268 | "metadata": {},
1269 | "outputs": [
1270 | {
1271 | "name": "stdout",
1272 | "output_type": "stream",
1273 | "text": [
1274 | "84.0\n",
1275 | "118.0\n",
1276 | "216.0\n"
1277 | ]
1278 | }
1279 | ],
1280 | "source": [
1281 | "# Inspect the length of texts\n",
1282 | "print(np.percentile(lengths_texts.counts, 89.5))\n",
1283 | "print(np.percentile(lengths_texts.counts, 95))\n",
1284 | "print(np.percentile(lengths_texts.counts, 99))"
1285 | ]
1286 | },
1287 | {
1288 | "cell_type": "code",
1289 | "execution_count": 33,
1290 | "metadata": {},
1291 | "outputs": [
1292 | {
1293 | "name": "stdout",
1294 | "output_type": "stream",
1295 | "text": [
1296 | "8.0\n",
1297 | "9.0\n",
1298 | "13.0\n"
1299 | ]
1300 | }
1301 | ],
1302 | "source": [
1303 | "# Inspect the length of summaries\n",
1304 | "print(np.percentile(lengths_summaries.counts, 90))\n",
1305 | "print(np.percentile(lengths_summaries.counts, 95))\n",
1306 | "print(np.percentile(lengths_summaries.counts, 99))"
1307 | ]
1308 | },
1309 | {
1310 | "cell_type": "markdown",
1311 | "metadata": {},
1312 | "source": [
1313 | "## Function to counts the number of time `` appears in a sentence"
1314 | ]
1315 | },
1316 | {
1317 | "cell_type": "code",
1318 | "execution_count": 34,
1319 | "metadata": {
1320 | "collapsed": true
1321 | },
1322 | "outputs": [],
1323 | "source": [
1324 | "def unk_counter(sentence):\n",
1325 | " '''Counts the number of time UNK appears in a sentence.'''\n",
1326 | " unk_count = 0\n",
1327 | " for word in sentence:\n",
1328 | " if word == vocab_to_int[\"\"]:\n",
1329 | " unk_count += 1\n",
1330 | " return unk_count"
1331 | ]
1332 | },
1333 | {
1334 | "cell_type": "markdown",
1335 | "metadata": {},
1336 | "source": [
1337 | "**Filter** for length limit and number of ``s\n",
1338 | "\n",
1339 | "**Sort** the summaries and texts by the length of the element in **texts** from shortest to longest\n"
1340 | ]
1341 | },
1342 | {
1343 | "cell_type": "code",
1344 | "execution_count": 39,
1345 | "metadata": {},
1346 | "outputs": [
1347 | {
1348 | "name": "stdout",
1349 | "output_type": "stream",
1350 | "text": [
1351 | "428278\n",
1352 | "428278\n"
1353 | ]
1354 | }
1355 | ],
1356 | "source": [
1357 | "max_text_length = 83 # This will cover up to 89.5% lengthes\n",
1358 | "max_summary_length = 13 # This will cover up to 99% lengthes\n",
1359 | "min_length = 2\n",
1360 | "unk_text_limit = 1 # text can contain up to 1 UNK word\n",
1361 | "unk_summary_limit = 0 # Summary should not contain any UNK word\n",
1362 | "\n",
1363 | "def filter_condition(item):\n",
1364 | " int_summary = item[0]\n",
1365 | " int_text = item[1]\n",
1366 | " if(len(int_summary) >= min_length and \n",
1367 | " len(int_summary) <= max_summary_length and \n",
1368 | " len(int_text) >= min_length and \n",
1369 | " len(int_text) <= max_text_length and \n",
1370 | " unk_counter(int_summary) <= unk_summary_limit and \n",
1371 | " unk_counter(int_text) <= unk_text_limit):\n",
1372 | " return True\n",
1373 | " else:\n",
1374 | " return False\n",
1375 | "\n",
1376 | "int_text_summaries = list(zip(int_summaries , int_texts))\n",
1377 | "int_text_summaries_filtered = list(filter(filter_condition, int_text_summaries))\n",
1378 | "sorted_int_text_summaries = sorted(int_text_summaries_filtered, key=lambda item: len(item[1]))\n",
1379 | "sorted_int_text_summaries = list(zip(*sorted_int_text_summaries))\n",
1380 | "sorted_summaries = list(sorted_int_text_summaries[0])\n",
1381 | "sorted_texts = list(sorted_int_text_summaries[1])\n",
1382 | "# Delete those temporary varaibles\n",
1383 | "del int_text_summaries, sorted_int_text_summaries, int_text_summaries_filtered\n",
1384 | "# Compare lengths to ensure they match\n",
1385 | "print(len(sorted_summaries))\n",
1386 | "print(len(sorted_texts))"
1387 | ]
1388 | },
1389 | {
1390 | "cell_type": "markdown",
1391 | "metadata": {},
1392 | "source": [
1393 | "### Inspect the length of text in sorted_texts"
1394 | ]
1395 | },
1396 | {
1397 | "cell_type": "code",
1398 | "execution_count": 40,
1399 | "metadata": {},
1400 | "outputs": [
1401 | {
1402 | "data": {
1403 | "text/plain": [
1404 | "[2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]"
1405 | ]
1406 | },
1407 | "execution_count": 40,
1408 | "metadata": {},
1409 | "output_type": "execute_result"
1410 | }
1411 | ],
1412 | "source": [
1413 | "lengths_texts = [len(text) for text in sorted_texts]\n",
1414 | "lengths_texts[:20]"
1415 | ]
1416 | },
1417 | {
1418 | "cell_type": "markdown",
1419 | "metadata": {},
1420 | "source": [
1421 | "## Save data for later"
1422 | ]
1423 | },
1424 | {
1425 | "cell_type": "code",
1426 | "execution_count": 41,
1427 | "metadata": {
1428 | "collapsed": true
1429 | },
1430 | "outputs": [],
1431 | "source": [
1432 | "__pickleStuff(\"./data/clean_summaries.p\",clean_summaries)\n",
1433 | "__pickleStuff(\"./data/clean_texts.p\",clean_texts)\n",
1434 | "\n",
1435 | "__pickleStuff(\"./data/sorted_summaries.p\",sorted_summaries)\n",
1436 | "__pickleStuff(\"./data/sorted_texts.p\",sorted_texts)\n",
1437 | "__pickleStuff(\"./data/word_embedding_matrix.p\",word_embedding_matrix)\n",
1438 | "\n",
1439 | "__pickleStuff(\"./data/vocab_to_int.p\",vocab_to_int)\n",
1440 | "__pickleStuff(\"./data/int_to_vocab.p\",int_to_vocab)"
1441 | ]
1442 | },
1443 | {
1444 | "cell_type": "markdown",
1445 | "metadata": {},
1446 | "source": [
1447 | "## 3. Building the Model"
1448 | ]
1449 | },
1450 | {
1451 | "cell_type": "markdown",
1452 | "metadata": {},
1453 | "source": [
1454 | "Create palceholders for inputs to the model\n",
1455 | "\n",
1456 | "**summary_length** and **text_length** are the sentence lengths in a batch, and **max_summary_length** is the maximum length of a summary in a batch."
1457 | ]
1458 | },
1459 | {
1460 | "cell_type": "code",
1461 | "execution_count": 4,
1462 | "metadata": {
1463 | "collapsed": true
1464 | },
1465 | "outputs": [],
1466 | "source": [
1467 | "def model_inputs():\n",
1468 | " input_data = tf.placeholder(tf.int32, [None, None], name='input')\n",
1469 | " targets = tf.placeholder(tf.int32, [None, None], name='targets')\n",
1470 | " lr = tf.placeholder(tf.float32, name='learning_rate')\n",
1471 | " keep_prob = tf.placeholder(tf.float32, name='keep_prob')\n",
1472 | " summary_length = tf.placeholder(tf.int32, (None,), name='summary_length')\n",
1473 | " max_summary_length = tf.reduce_max(summary_length, name='max_dec_len')\n",
1474 | " text_length = tf.placeholder(tf.int32, (None,), name='text_length')\n",
1475 | "\n",
1476 | " return input_data, targets, lr, keep_prob, summary_length, max_summary_length, text_length"
1477 | ]
1478 | },
1479 | {
1480 | "cell_type": "markdown",
1481 | "metadata": {},
1482 | "source": [
1483 | "Remove the last word id from each batch and concatenate the id of `` to the begining of each batch"
1484 | ]
1485 | },
1486 | {
1487 | "cell_type": "code",
1488 | "execution_count": 5,
1489 | "metadata": {
1490 | "collapsed": true
1491 | },
1492 | "outputs": [],
1493 | "source": [
1494 | "def process_encoding_input(target_data, vocab_to_int, batch_size): \n",
1495 | " ending = tf.strided_slice(target_data, [0, 0], [batch_size, -1], [1, 1]) # slice it to target_data[0:batch_size, 0: -1]\n",
1496 | " dec_input = tf.concat([tf.fill([batch_size, 1], vocab_to_int['']), ending], 1)\n",
1497 | "\n",
1498 | " return dec_input"
1499 | ]
1500 | },
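{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal NumPy sketch of what `process_encoding_input` does to a toy target batch (the `<GO>` id used below is hypothetical, for illustration only):\n",
"```python\n",
"import numpy as np\n",
"target_batch = np.array([[11, 12, 13],\n",
"                         [21, 22, 23]])\n",
"go_id = 59071  # hypothetical <GO> index\n",
"ending = target_batch[:, :-1]  # drop the last word id of each row\n",
"dec_input = np.concatenate([np.full((2, 1), go_id), ending], axis=1)\n",
"# dec_input -> [[59071, 11, 12],\n",
"#               [59071, 21, 22]]\n",
"```"
]
},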
1501 | {
1502 | "cell_type": "markdown",
1503 | "metadata": {},
1504 | "source": [
1505 | "### Create the encoding layers\n",
1506 | "\n",
1507 | "bidirectional_dynamic_rnn\n",
1508 | "use **tf.variable_scope** so that variables are reused with each layer\n",
1509 | "\n",
1510 | "parameters\n",
1511 | "- **rnn_size**: The number of units in the LSTM cell\n",
1512 | "- **sequence_length**: size [batch_size], containing the actual lengths for each of the sequences in the batch\n",
1513 | "- **num_layers**: number of bidirectional RNN layer\n",
1514 | "- **rnn_inputs**: number of bidirectional RNN layer\n",
1515 | "- **keep_prob**: RNN dropout input keep probability"
1516 | ]
1517 | },
1518 | {
1519 | "cell_type": "code",
1520 | "execution_count": 6,
1521 | "metadata": {
1522 | "collapsed": true
1523 | },
1524 | "outputs": [],
1525 | "source": [
1526 | "def encoding_layer(rnn_size, sequence_length, num_layers, rnn_inputs, keep_prob):\n",
1527 | " for layer in range(num_layers):\n",
1528 | " with tf.variable_scope('encoder_{}'.format(layer)):\n",
1529 | " cell_fw = tf.contrib.rnn.LSTMCell(rnn_size,\n",
1530 | " initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=2))\n",
1531 | " cell_fw = tf.contrib.rnn.DropoutWrapper(cell_fw, \n",
1532 | " input_keep_prob = keep_prob)\n",
1533 | "\n",
1534 | " cell_bw = tf.contrib.rnn.LSTMCell(rnn_size,\n",
1535 | " initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=2))\n",
1536 | " cell_bw = tf.contrib.rnn.DropoutWrapper(cell_bw, \n",
1537 | " input_keep_prob = keep_prob)\n",
1538 | "\n",
1539 | " enc_output, enc_state = tf.nn.bidirectional_dynamic_rnn(cell_fw, \n",
1540 | " cell_bw, \n",
1541 | " rnn_inputs,\n",
1542 | " sequence_length,\n",
1543 | " dtype=tf.float32)\n",
1544 | " enc_output = tf.concat(enc_output,2)\n",
1545 | " # original code is missing this line below, that is how we connect layers \n",
1546 | " # by feeding the current layer's output to next layer's input\n",
1547 | " rnn_inputs = enc_output\n",
1548 | " return enc_output, enc_state"
1549 | ]
1550 | },
1551 | {
1552 | "cell_type": "markdown",
1553 | "metadata": {},
1554 | "source": [
1555 | "### Create the training decoding layer\n",
1556 | "parameters\n",
1557 | "- **dec_embed_input**: output of embedding_lookup for a batch of inputs\n",
1558 | "- **summary_length**: length of each padded summary sequences in batch, since padded, all lengths should be same number \n",
1559 | "- **dec_cell**: the decoder RNN cells' output with attention wapper\n",
1560 | "- **output_layer**: fully connected layer to apply to the RNN output\n",
1561 | "- **vocab_size**: vocabulary size i.e. len(vocab_to_int)+1\n",
1562 | "- **max_summary_length**: the maximum length of a summary in a batch\n",
1563 | "- **batch_size**: number of input sequences in a batch\n",
1564 | "\n",
1565 | "Three components\n",
1566 | "\n",
1567 | "- **TraingHelper** reads a sequence of integers from the encoding layer.\n",
1568 | "- **BasicDecoder** processes the sequence with the decoding cell, and an output layer, which is a fully connected layer. **initial_state** set to zero state.\n",
1569 | "- **dynamic_decode** creates our outputs that will be used for training."
1570 | ]
1571 | },
1572 | {
1573 | "cell_type": "code",
1574 | "execution_count": 7,
1575 | "metadata": {
1576 | "collapsed": true
1577 | },
1578 | "outputs": [],
1579 | "source": [
1580 | "def training_decoding_layer(dec_embed_input, summary_length, dec_cell, output_layer,\n",
1581 | " vocab_size, max_summary_length,batch_size):\n",
1582 | " training_helper = tf.contrib.seq2seq.TrainingHelper(inputs=dec_embed_input,\n",
1583 | " sequence_length=summary_length,\n",
1584 | " time_major=False)\n",
1585 | "\n",
1586 | " training_decoder = tf.contrib.seq2seq.BasicDecoder(cell=dec_cell,\n",
1587 | " helper=training_helper,\n",
1588 | " initial_state=dec_cell.zero_state(dtype=tf.float32, batch_size=batch_size),\n",
1589 | " output_layer = output_layer)\n",
1590 | "\n",
1591 | " training_logits = tf.contrib.seq2seq.dynamic_decode(training_decoder,\n",
1592 | " output_time_major=False,\n",
1593 | " impute_finished=True,\n",
1594 | " maximum_iterations=max_summary_length)\n",
1595 | " return training_logits"
1596 | ]
1597 | },
1598 | {
1599 | "cell_type": "markdown",
1600 | "metadata": {},
1601 | "source": [
1602 | "### Create infer decoding layer\n",
1603 | "\n",
1604 | "parameters\n",
1605 | "- **embeddings**: the CN's word_embedding_matrix\n",
1606 | "- **start_token**: the id of ``\n",
1607 | "- **end_token**: the id of ``\n",
1608 | "- **dec_cell**: the decoder RNN cells' output with attention wapper\n",
1609 | "- **output_layer**: fully connected layer to apply to the RNN output\n",
1610 | "- **max_summary_length**: the maximum length of a summary in a batch\n",
1611 | "- **batch_size**: number of input sequences in a batch\n",
1612 | "\n",
1613 | "**GreedyEmbeddingHelper** argument **start_tokens**: int32 vector shaped [batch_size], the start tokens."
1614 | ]
1615 | },
1616 | {
1617 | "cell_type": "code",
1618 | "execution_count": 8,
1619 | "metadata": {
1620 | "collapsed": true
1621 | },
1622 | "outputs": [],
1623 | "source": [
1624 | "def inference_decoding_layer(embeddings, start_token, end_token, dec_cell, output_layer,\n",
1625 | " max_summary_length, batch_size):\n",
1626 | " '''Create the inference logits'''\n",
1627 | " \n",
1628 | " start_tokens = tf.tile(tf.constant([start_token], dtype=tf.int32), [batch_size], name='start_tokens')\n",
1629 | " \n",
1630 | " inference_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(embeddings,\n",
1631 | " start_tokens,\n",
1632 | " end_token)\n",
1633 | " \n",
1634 | " inference_decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell,\n",
1635 | " inference_helper,\n",
1636 | " dec_cell.zero_state(dtype=tf.float32, batch_size=batch_size),\n",
1637 | " output_layer)\n",
1638 | " \n",
1639 | " inference_logits = tf.contrib.seq2seq.dynamic_decode(inference_decoder,\n",
1640 | " output_time_major=False,\n",
1641 | " impute_finished=True,\n",
1642 | " maximum_iterations=max_summary_length)\n",
1643 | " \n",
1644 | " return inference_logits"
1645 | ]
1646 | },
1647 | {
1648 | "cell_type": "markdown",
1649 | "metadata": {},
1650 | "source": [
1651 | "### Create Decoding layer\n",
1652 | "3 parts: decoding cell, attention, and getting our logits.\n",
1653 | "#### Decoding Cell: \n",
1654 | "Just a two layer LSTM with dropout.\n",
1655 | "#### Attention: \n",
1656 | "Using Bhadanau, since trains faster than Luong. \n",
1657 | "\n",
1658 | "**AttentionWrapper** applies the attention mechanism to our decoding cell.\n",
1659 | "\n",
1660 | "parameters\n",
1661 | "- **dec_embed_input**: output of embedding_lookup for a batch of inputs\n",
1662 | "- **embeddings**: the CN's word_embedding_matrix\n",
1663 | "- **enc_output**: encoder layer output, containing the forward and the backward rnn output\n",
1664 | "- **enc_state**: encoder layer state, a tuple containing the forward and the backward final states of bidirectional rnn.\n",
1665 | "- **vocab_size**: vocabulary size i.e. len(vocab_to_int)+1\n",
1666 | "- **text_length**: the actual lengths for each of the input text sequences in the batch\n",
1667 | "- **summary_length**: the actual lengths for each of the input summary sequences in the batch\n",
1668 | "- **max_summary_length**: the maximum length of a summary in a batch\n",
1669 | "- **rnn_size**: The number of units in the LSTM cell\n",
1670 | "- **vocab_to_int**: vocab_to_int the dictionary\n",
1671 | "- **keep_prob**: RNN dropout input keep probability\n",
1672 | "- **batch_size**: number of input sequences in a batch\n",
1673 | "- **num_layers**: number of decoder RNN layer"
1674 | ]
1675 | },
1676 | {
1677 | "cell_type": "code",
1678 | "execution_count": 9,
1679 | "metadata": {
1680 | "collapsed": true
1681 | },
1682 | "outputs": [],
1683 | "source": [
1684 | "def lstm_cell(lstm_size, keep_prob):\n",
1685 | " cell = tf.contrib.rnn.BasicLSTMCell(lstm_size)\n",
1686 | " return tf.contrib.rnn.DropoutWrapper(cell, input_keep_prob = keep_prob)\n",
1687 | "\n",
1688 | "def decoding_layer(dec_embed_input, embeddings, enc_output, enc_state, vocab_size, text_length, summary_length,\n",
1689 | " max_summary_length, rnn_size, vocab_to_int, keep_prob, batch_size, num_layers):\n",
1690 | " '''Create the decoding cell and attention for the training and inference decoding layers'''\n",
1691 | " dec_cell = tf.contrib.rnn.MultiRNNCell([lstm_cell(rnn_size, keep_prob) for _ in range(num_layers)])\n",
1692 | " output_layer = Dense(vocab_size,kernel_initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.1))\n",
1693 | " attn_mech = tf.contrib.seq2seq.BahdanauAttention(rnn_size,\n",
1694 | " enc_output,\n",
1695 | " text_length,\n",
1696 | " normalize=False,\n",
1697 | " name='BahdanauAttention')\n",
1698 | " dec_cell = tf.contrib.seq2seq.AttentionWrapper(dec_cell,attn_mech,rnn_size)\n",
1699 | " with tf.variable_scope(\"decode\"):\n",
1700 | " training_logits = training_decoding_layer(dec_embed_input,summary_length,dec_cell,\n",
1701 | " output_layer,\n",
1702 | " vocab_size,\n",
1703 | " max_summary_length,\n",
1704 | " batch_size)\n",
1705 | " with tf.variable_scope(\"decode\", reuse=True):\n",
1706 | " inference_logits = inference_decoding_layer(embeddings,\n",
1707 | " vocab_to_int[''],\n",
1708 | " vocab_to_int[''],\n",
1709 | " dec_cell,\n",
1710 | " output_layer,\n",
1711 | " max_summary_length,\n",
1712 | " batch_size)\n",
1713 | " return training_logits, inference_logits"
1714 | ]
1715 | },
1716 | {
1717 | "cell_type": "code",
1718 | "execution_count": 10,
1719 | "metadata": {
1720 | "collapsed": true
1721 | },
1722 | "outputs": [],
1723 | "source": [
1724 | "def seq2seq_model(input_data, target_data, keep_prob, text_length, summary_length, max_summary_length, \n",
1725 | " vocab_size, rnn_size, num_layers, vocab_to_int, batch_size):\n",
1726 | " '''Use the previous functions to create the training and inference logits'''\n",
1727 | " \n",
1728 | " # Use Numberbatch's embeddings and the newly created ones as our embeddings\n",
1729 | " embeddings = word_embedding_matrix\n",
1730 | " enc_embed_input = tf.nn.embedding_lookup(embeddings, input_data)\n",
1731 | " enc_output, enc_state = encoding_layer(rnn_size, text_length, num_layers, enc_embed_input, keep_prob)\n",
1732 | " dec_input = process_encoding_input(target_data, vocab_to_int, batch_size) #shape=(batch_size, senquence length) each seq start with index of\n",
1733 | " dec_embed_input = tf.nn.embedding_lookup(embeddings, dec_input)\n",
1734 | " training_logits, inference_logits = decoding_layer(dec_embed_input, \n",
1735 | " embeddings,\n",
1736 | " enc_output,\n",
1737 | " enc_state, \n",
1738 | " vocab_size, \n",
1739 | " text_length, \n",
1740 | " summary_length, \n",
1741 | " max_summary_length,\n",
1742 | " rnn_size, \n",
1743 | " vocab_to_int, \n",
1744 | " keep_prob, \n",
1745 | " batch_size,\n",
1746 | " num_layers)\n",
1747 | " return training_logits, inference_logits"
1748 | ]
1749 | },
1750 | {
1751 | "cell_type": "markdown",
1752 | "metadata": {},
1753 | "source": [
1754 | "### Pad sentences for batch\n",
1755 | "Pad so the actual lengths for each of the sequences in the batch have the same length."
1756 | ]
1757 | },
1758 | {
1759 | "cell_type": "code",
1760 | "execution_count": 11,
1761 | "metadata": {
1762 | "collapsed": true
1763 | },
1764 | "outputs": [],
1765 | "source": [
1766 | "def pad_sentence_batch(sentence_batch):\n",
1767 | " \"\"\"Pad sentences with so that each sentence of a batch has the same length\"\"\"\n",
1768 | " max_sentence = max([len(sentence) for sentence in sentence_batch])\n",
1769 | " return [sentence + [vocab_to_int['']] * (max_sentence - len(sentence)) for sentence in sentence_batch]"
1770 | ]
1771 | },
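{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick illustration (a hypothetical toy batch, not part of the original run): the shorter sentence below is filled with `vocab_to_int['<PAD>']` until it matches the longest sentence in the batch."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Hypothetical toy batch, just to illustrate pad_sentence_batch (not executed in the original run)\n",
"toy_batch = [[5, 6], [7, 8, 9]]\n",
"print(pad_sentence_batch(toy_batch))  # expected: [[5, 6, vocab_to_int['<PAD>']], [7, 8, 9]]"
]
},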
1772 | {
1773 | "cell_type": "markdown",
1774 | "metadata": {},
1775 | "source": [
1776 | "### Function to generate batch data for training"
1777 | ]
1778 | },
1779 | {
1780 | "cell_type": "code",
1781 | "execution_count": 12,
1782 | "metadata": {
1783 | "collapsed": true
1784 | },
1785 | "outputs": [],
1786 | "source": [
1787 | "def get_batches(summaries, texts, batch_size):\n",
1788 | " \"\"\"Batch summaries, texts, and the lengths of their sentences together\"\"\"\n",
1789 | " for batch_i in range(0, len(texts)//batch_size):\n",
1790 | " start_i = batch_i * batch_size\n",
1791 | " summaries_batch = summaries[start_i:start_i + batch_size]\n",
1792 | " texts_batch = texts[start_i:start_i + batch_size]\n",
1793 | " pad_summaries_batch = np.array(pad_sentence_batch(summaries_batch))\n",
1794 | " pad_texts_batch = np.array(pad_sentence_batch(texts_batch))\n",
1795 | " \n",
1796 | " # Need the lengths for the _lengths parameters\n",
1797 | " pad_summaries_lengths = []\n",
1798 | " for summary in pad_summaries_batch:\n",
1799 | " pad_summaries_lengths.append(len(summary))\n",
1800 | " \n",
1801 | " pad_texts_lengths = []\n",
1802 | " for text in pad_texts_batch:\n",
1803 | " pad_texts_lengths.append(len(text))\n",
1804 | " \n",
1805 | " yield pad_summaries_batch, pad_texts_batch, pad_summaries_lengths, pad_texts_lengths"
1806 | ]
1807 | },
1808 | {
1809 | "cell_type": "markdown",
1810 | "metadata": {},
1811 | "source": [
1812 | "#### Just to test \"get_batches\" function\n",
1813 | "Here we generate a batch with size of 5\n",
1814 | "\n",
1815 | "Checkout those \"59069\" they are ``s, also all sequences' lengths are the same."
1816 | ]
1817 | },
1818 | {
1819 | "cell_type": "code",
1820 | "execution_count": 13,
1821 | "metadata": {},
1822 | "outputs": [
1823 | {
1824 | "name": "stdout",
1825 | "output_type": "stream",
1826 | "text": [
1827 | "'' has id: 59069\n",
1828 | "pad summaries batch samples:\n",
1829 | "\r",
1830 | " [[ 9218 18733 13131 39434 39434 4082 2454 29838 26219 33088 26752 4]\n",
1831 | " [ 1417 42487 4397 22892 20719 59069 59069 59069 59069 59069 59069 59069]\n",
1832 | " [ 2229 54986 19050 54986 44366 56008 8293 46449 6045 20974 4269 41958]\n",
1833 | " [39205 16127 2875 26752 33799 58931 58335 5156 12490 59069 59069 59069]\n",
1834 | " [54984 47044 12490 43359 46111 59069 59069 59069 59069 59069 59069 59069]]\n"
1835 | ]
1836 | }
1837 | ],
1838 | "source": [
1839 | "print(\"'' has id: {}\".format(vocab_to_int['']))\n",
1840 | "sorted_summaries_samples = sorted_summaries[7:50]\n",
1841 | "sorted_texts_samples = sorted_texts[7:50]\n",
1842 | "pad_summaries_batch_samples, pad_texts_batch_samples, pad_summaries_lengths_samples, pad_texts_lengths_samples = next(get_batches(\n",
1843 | " sorted_summaries_samples, sorted_texts_samples, 5))\n",
1844 | "print(\"pad summaries batch samples:\\n\\r {}\".format(pad_summaries_batch_samples))"
1845 | ]
1846 | },
1847 | {
1848 | "cell_type": "code",
1849 | "execution_count": 18,
1850 | "metadata": {
1851 | "collapsed": true
1852 | },
1853 | "outputs": [],
1854 | "source": [
1855 | "# Set the Hyperparameters\n",
1856 | "epochs = 100\n",
1857 | "batch_size = 64\n",
1858 | "rnn_size = 256\n",
1859 | "num_layers = 2\n",
1860 | "learning_rate = 0.005\n",
1861 | "keep_probability = 0.95"
1862 | ]
1863 | },
1864 | {
1865 | "cell_type": "markdown",
1866 | "metadata": {},
1867 | "source": [
1868 | "## Build graph"
1869 | ]
1870 | },
1871 | {
1872 | "cell_type": "code",
1873 | "execution_count": 15,
1874 | "metadata": {},
1875 | "outputs": [
1876 | {
1877 | "name": "stdout",
1878 | "output_type": "stream",
1879 | "text": [
1880 | "Graph is built.\n",
1881 | "./graph\n"
1882 | ]
1883 | }
1884 | ],
1885 | "source": [
1886 | "# Build the graph\n",
1887 | "train_graph = tf.Graph()\n",
1888 | "# Set the graph to default to ensure that it is ready for training\n",
1889 | "with train_graph.as_default():\n",
1890 | " \n",
1891 | " # Load the model inputs \n",
1892 | " input_data, targets, lr, keep_prob, summary_length, max_summary_length, text_length = model_inputs()\n",
1893 | "\n",
1894 | " # Create the training and inference logits\n",
1895 | " training_logits, inference_logits = seq2seq_model(tf.reverse(input_data, [-1]),\n",
1896 | " targets, \n",
1897 | " keep_prob, \n",
1898 | " text_length,\n",
1899 | " summary_length,\n",
1900 | " max_summary_length,\n",
1901 | " len(vocab_to_int)+1,\n",
1902 | " rnn_size, \n",
1903 | " num_layers, \n",
1904 | " vocab_to_int,\n",
1905 | " batch_size)\n",
1906 | " \n",
1907 | " # Create tensors for the training logits and inference logits\n",
1908 | " training_logits = tf.identity(training_logits[0].rnn_output, 'logits')\n",
1909 | " inference_logits = tf.identity(inference_logits[0].sample_id, name='predictions')\n",
1910 | " \n",
1911 | " # Create the weights for sequence_loss, the sould be all True across since each batch is padded\n",
1912 | " masks = tf.sequence_mask(summary_length, max_summary_length, dtype=tf.float32, name='masks')\n",
1913 | "\n",
1914 | " with tf.name_scope(\"optimization\"):\n",
1915 | " # Loss function\n",
1916 | " cost = tf.contrib.seq2seq.sequence_loss(\n",
1917 | " training_logits,\n",
1918 | " targets,\n",
1919 | " masks)\n",
1920 | "\n",
1921 | " # Optimizer\n",
1922 | " optimizer = tf.train.AdamOptimizer(learning_rate)\n",
1923 | "\n",
1924 | " # Gradient Clipping\n",
1925 | " gradients = optimizer.compute_gradients(cost)\n",
1926 | " capped_gradients = [(tf.clip_by_value(grad, -5., 5.), var) for grad, var in gradients if grad is not None]\n",
1927 | " train_op = optimizer.apply_gradients(capped_gradients)\n",
1928 | "print(\"Graph is built.\")\n",
1929 | "graph_location = \"./graph\"\n",
1930 | "print(graph_location)\n",
1931 | "train_writer = tf.summary.FileWriter(graph_location)\n",
1932 | "train_writer.add_graph(train_graph)"
1933 | ]
1934 | },
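{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `FileWriter` above saves the graph definition to `./graph`, so the computation graph can optionally be inspected with TensorBoard, e.g. by running `tensorboard --logdir ./graph` from the project directory."
]
},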
1935 | {
1936 | "cell_type": "markdown",
1937 | "metadata": {},
1938 | "source": [
1939 | "## 4. Training the Model\n",
1940 | "\n",
1941 | "Only going to use a subset of the data to reduce the traing time for this demo.\n",
1942 | "\n",
1943 | "We chose not use use the start of the subset because because those are shorter sequences and we don't want to make it too easy for the model."
1944 | ]
1945 | },
1946 | {
1947 | "cell_type": "code",
1948 | "execution_count": 16,
1949 | "metadata": {},
1950 | "outputs": [
1951 | {
1952 | "name": "stdout",
1953 | "output_type": "stream",
1954 | "text": [
1955 | "The shortest text length: 25\n",
1956 | "The longest text length: 31\n"
1957 | ]
1958 | }
1959 | ],
1960 | "source": [
1961 | "# Subset the data for training\n",
1962 | "start = 200000\n",
1963 | "end = start + 50000\n",
1964 | "sorted_summaries_short = sorted_summaries[start:end]\n",
1965 | "sorted_texts_short = sorted_texts[start:end]\n",
1966 | "print(\"The shortest text length:\", len(sorted_texts_short[0]))\n",
1967 | "print(\"The longest text length:\",len(sorted_texts_short[-1]))"
1968 | ]
1969 | },
1970 | {
1971 | "cell_type": "code",
1972 | "execution_count": 17,
1973 | "metadata": {
1974 | "scrolled": true
1975 | },
1976 | "outputs": [
1977 | {
1978 | "name": "stdout",
1979 | "output_type": "stream",
1980 | "text": [
1981 | "Epoch 1/100 Batch 20/781 - Loss: 5.205, Seconds: 4.33\n",
1982 | "Epoch 1/100 Batch 40/781 - Loss: 2.856, Seconds: 3.35\n",
1983 | "Epoch 1/100 Batch 60/781 - Loss: 2.914, Seconds: 4.61\n",
1984 | "Epoch 1/100 Batch 80/781 - Loss: 2.825, Seconds: 4.13\n",
1985 | "Epoch 1/100 Batch 100/781 - Loss: 2.698, Seconds: 4.21\n",
1986 | "Epoch 1/100 Batch 120/781 - Loss: 2.711, Seconds: 3.79\n",
1987 | "Epoch 1/100 Batch 140/781 - Loss: 2.587, Seconds: 3.97\n",
1988 | "Epoch 1/100 Batch 160/781 - Loss: 2.844, Seconds: 3.27\n",
1989 | "Epoch 1/100 Batch 180/781 - Loss: 2.685, Seconds: 3.65\n",
1990 | "Epoch 1/100 Batch 200/781 - Loss: 2.676, Seconds: 4.35\n",
1991 | "Epoch 1/100 Batch 220/781 - Loss: 2.608, Seconds: 4.05\n",
1992 | "Epoch 1/100 Batch 240/781 - Loss: 2.471, Seconds: 4.09\n",
1993 | "Average loss for this update: 2.896\n",
1994 | "New Record!\n",
1995 | "Epoch 1/100 Batch 260/781 - Loss: 2.540, Seconds: 4.59\n",
1996 | "Epoch 1/100 Batch 280/781 - Loss: 2.611, Seconds: 4.21\n",
1997 | "Epoch 1/100 Batch 300/781 - Loss: 2.674, Seconds: 4.13\n",
1998 | "Epoch 1/100 Batch 320/781 - Loss: 2.685, Seconds: 4.13\n",
1999 | "Epoch 1/100 Batch 340/781 - Loss: 2.483, Seconds: 4.67\n",
2000 | "Epoch 1/100 Batch 360/781 - Loss: 2.591, Seconds: 4.37\n",
2001 | "Epoch 1/100 Batch 380/781 - Loss: 2.430, Seconds: 4.67\n",
2002 | "Epoch 1/100 Batch 400/781 - Loss: 2.550, Seconds: 4.61\n",
2003 | "Epoch 1/100 Batch 420/781 - Loss: 2.512, Seconds: 3.83\n",
2004 | "Epoch 1/100 Batch 440/781 - Loss: 2.619, Seconds: 4.63\n",
2005 | "Epoch 1/100 Batch 460/781 - Loss: 2.652, Seconds: 4.51\n",
2006 | "Epoch 1/100 Batch 480/781 - Loss: 2.440, Seconds: 3.95\n",
2007 | "Epoch 1/100 Batch 500/781 - Loss: 2.464, Seconds: 4.23\n",
2008 | "Average loss for this update: 2.55\n",
2009 | "New Record!\n",
2010 | "Epoch 1/100 Batch 520/781 - Loss: 2.462, Seconds: 3.95\n",
2011 | "Epoch 1/100 Batch 540/781 - Loss: 2.477, Seconds: 4.53\n",
2012 | "Epoch 1/100 Batch 560/781 - Loss: 2.419, Seconds: 4.05\n",
2013 | "Epoch 1/100 Batch 580/781 - Loss: 2.470, Seconds: 4.31\n",
2014 | "Epoch 1/100 Batch 600/781 - Loss: 2.654, Seconds: 4.25\n",
2015 | "Epoch 1/100 Batch 620/781 - Loss: 2.529, Seconds: 4.21\n",
2016 | "Epoch 1/100 Batch 640/781 - Loss: 2.432, Seconds: 4.29\n",
2017 | "Epoch 1/100 Batch 660/781 - Loss: 2.368, Seconds: 4.69\n",
2018 | "Epoch 1/100 Batch 680/781 - Loss: 2.284, Seconds: 4.27\n",
2019 | "Epoch 1/100 Batch 700/781 - Loss: 2.440, Seconds: 4.29\n",
2020 | "Epoch 1/100 Batch 720/781 - Loss: 2.582, Seconds: 4.53\n",
2021 | "Epoch 1/100 Batch 740/781 - Loss: 2.421, Seconds: 4.62\n",
2022 | "Epoch 1/100 Batch 760/781 - Loss: 2.419, Seconds: 4.23\n",
2023 | "Average loss for this update: 2.442\n",
2024 | "New Record!\n",
2025 | "Epoch 1/100 Batch 780/781 - Loss: 2.229, Seconds: 3.87\n",
2026 | "Epoch 2/100 Batch 20/781 - Loss: 2.374, Seconds: 4.19\n",
2027 | "Epoch 2/100 Batch 40/781 - Loss: 2.232, Seconds: 3.49\n",
2028 | "Epoch 2/100 Batch 60/781 - Loss: 2.322, Seconds: 4.31\n",
2029 | "Epoch 2/100 Batch 80/781 - Loss: 2.287, Seconds: 4.25\n",
2030 | "Epoch 2/100 Batch 100/781 - Loss: 2.184, Seconds: 4.35\n",
2031 | "Epoch 2/100 Batch 120/781 - Loss: 2.215, Seconds: 3.87\n",
2032 | "Epoch 2/100 Batch 140/781 - Loss: 2.079, Seconds: 4.07\n",
2033 | "Epoch 2/100 Batch 160/781 - Loss: 2.356, Seconds: 3.31\n",
2034 | "Epoch 2/100 Batch 180/781 - Loss: 2.215, Seconds: 3.65\n",
2035 | "Epoch 2/100 Batch 200/781 - Loss: 2.221, Seconds: 4.41\n",
2036 | "Epoch 2/100 Batch 220/781 - Loss: 2.154, Seconds: 4.36\n",
2037 | "Epoch 2/100 Batch 240/781 - Loss: 2.013, Seconds: 4.11\n",
2038 | "Average loss for this update: 2.211\n",
2039 | "New Record!\n",
2040 | "Epoch 2/100 Batch 260/781 - Loss: 2.074, Seconds: 4.61\n",
2041 | "Epoch 2/100 Batch 280/781 - Loss: 2.124, Seconds: 4.41\n",
2042 | "Epoch 2/100 Batch 300/781 - Loss: 2.238, Seconds: 4.11\n",
2043 | "Epoch 2/100 Batch 320/781 - Loss: 2.276, Seconds: 4.33\n",
2044 | "Epoch 2/100 Batch 340/781 - Loss: 2.091, Seconds: 4.67\n",
2045 | "Epoch 2/100 Batch 360/781 - Loss: 2.186, Seconds: 4.27\n",
2046 | "Epoch 2/100 Batch 380/781 - Loss: 2.033, Seconds: 4.63\n",
2047 | "Epoch 2/100 Batch 400/781 - Loss: 2.148, Seconds: 4.62\n",
2048 | "Epoch 2/100 Batch 420/781 - Loss: 2.130, Seconds: 4.09\n",
2049 | "Epoch 2/100 Batch 440/781 - Loss: 2.233, Seconds: 4.75\n",
2050 | "Epoch 2/100 Batch 460/781 - Loss: 2.285, Seconds: 4.33\n",
2051 | "Epoch 2/100 Batch 480/781 - Loss: 2.098, Seconds: 4.21\n",
2052 | "Epoch 2/100 Batch 500/781 - Loss: 2.132, Seconds: 4.57\n",
2053 | "Average loss for this update: 2.157\n",
2054 | "New Record!\n",
2055 | "Epoch 2/100 Batch 520/781 - Loss: 2.093, Seconds: 4.24\n",
2056 | "Epoch 2/100 Batch 540/781 - Loss: 2.097, Seconds: 4.29\n",
2057 | "Epoch 2/100 Batch 560/781 - Loss: 2.055, Seconds: 3.91\n",
2058 | "Epoch 2/100 Batch 580/781 - Loss: 2.177, Seconds: 4.27\n",
2059 | "Epoch 2/100 Batch 600/781 - Loss: 2.329, Seconds: 4.23\n",
2060 | "Epoch 2/100 Batch 620/781 - Loss: 2.219, Seconds: 3.99\n",
2061 | "Epoch 2/100 Batch 640/781 - Loss: 2.115, Seconds: 4.27\n",
2062 | "Epoch 2/100 Batch 660/781 - Loss: 2.047, Seconds: 4.50\n",
2063 | "Epoch 2/100 Batch 680/781 - Loss: 1.999, Seconds: 4.35\n",
2064 | "Epoch 2/100 Batch 700/781 - Loss: 2.156, Seconds: 4.31\n",
2065 | "Epoch 2/100 Batch 720/781 - Loss: 2.297, Seconds: 4.51\n",
2066 | "Epoch 2/100 Batch 740/781 - Loss: 2.160, Seconds: 4.79\n",
2067 | "Epoch 2/100 Batch 760/781 - Loss: 2.161, Seconds: 4.41\n",
2068 | "Average loss for this update: 2.138\n",
2069 | "New Record!\n",
2070 | "Epoch 2/100 Batch 780/781 - Loss: 1.973, Seconds: 3.99\n",
2071 | "Epoch 3/100 Batch 20/781 - Loss: 2.163, Seconds: 4.33\n",
2072 | "Epoch 3/100 Batch 40/781 - Loss: 2.015, Seconds: 3.43\n",
2073 | "Epoch 3/100 Batch 60/781 - Loss: 2.076, Seconds: 4.43\n",
2074 | "Epoch 3/100 Batch 80/781 - Loss: 2.057, Seconds: 4.21\n",
2075 | "Epoch 3/100 Batch 100/781 - Loss: 1.947, Seconds: 4.15\n",
2076 | "Epoch 3/100 Batch 120/781 - Loss: 1.995, Seconds: 3.79\n",
2077 | "Epoch 3/100 Batch 140/781 - Loss: 1.837, Seconds: 3.99\n",
2078 | "Epoch 3/100 Batch 160/781 - Loss: 2.138, Seconds: 3.31\n",
2079 | "Epoch 3/100 Batch 180/781 - Loss: 2.005, Seconds: 3.89\n",
2080 | "Epoch 3/100 Batch 200/781 - Loss: 2.011, Seconds: 4.29\n",
2081 | "Epoch 3/100 Batch 220/781 - Loss: 1.942, Seconds: 4.05\n",
2082 | "Epoch 3/100 Batch 240/781 - Loss: 1.796, Seconds: 4.19\n",
2083 | "Average loss for this update: 1.99\n",
2084 | "New Record!\n",
2085 | "Epoch 3/100 Batch 260/781 - Loss: 1.877, Seconds: 4.49\n",
2086 | "Epoch 3/100 Batch 280/781 - Loss: 1.905, Seconds: 4.19\n",
2087 | "Epoch 3/100 Batch 300/781 - Loss: 2.033, Seconds: 4.19\n",
2088 | "Epoch 3/100 Batch 320/781 - Loss: 2.075, Seconds: 4.27\n",
2089 | "Epoch 3/100 Batch 340/781 - Loss: 1.902, Seconds: 4.75\n",
2090 | "Epoch 3/100 Batch 360/781 - Loss: 1.997, Seconds: 4.39\n",
2091 | "Epoch 3/100 Batch 380/781 - Loss: 1.816, Seconds: 4.47\n",
2092 | "Epoch 3/100 Batch 400/781 - Loss: 1.941, Seconds: 4.63\n",
2093 | "Epoch 3/100 Batch 420/781 - Loss: 1.911, Seconds: 4.03\n",
2094 | "Epoch 3/100 Batch 440/781 - Loss: 2.010, Seconds: 4.53\n",
2095 | "Epoch 3/100 Batch 460/781 - Loss: 2.071, Seconds: 4.39\n",
2096 | "Epoch 3/100 Batch 480/781 - Loss: 1.883, Seconds: 3.99\n",
2097 | "Epoch 3/100 Batch 500/781 - Loss: 1.921, Seconds: 4.25\n",
2098 | "Average loss for this update: 1.947\n",
2099 | "New Record!\n",
2100 | "Epoch 3/100 Batch 520/781 - Loss: 1.861, Seconds: 3.91\n",
2101 | "Epoch 3/100 Batch 540/781 - Loss: 1.889, Seconds: 4.41\n",
2102 | "Epoch 3/100 Batch 560/781 - Loss: 1.842, Seconds: 4.03\n",
2103 | "Epoch 3/100 Batch 580/781 - Loss: 1.997, Seconds: 4.35\n",
2104 | "Epoch 3/100 Batch 600/781 - Loss: 2.124, Seconds: 4.35\n",
2105 | "Epoch 3/100 Batch 620/781 - Loss: 2.016, Seconds: 4.03\n",
2106 | "Epoch 3/100 Batch 640/781 - Loss: 1.915, Seconds: 4.49\n",
2107 | "Epoch 3/100 Batch 660/781 - Loss: 1.837, Seconds: 4.45\n",
2108 | "Epoch 3/100 Batch 680/781 - Loss: 1.817, Seconds: 4.27\n",
2109 | "Epoch 3/100 Batch 700/781 - Loss: 1.956, Seconds: 4.45\n",
2110 | "Epoch 3/100 Batch 720/781 - Loss: 2.116, Seconds: 4.51\n",
2111 | "Epoch 3/100 Batch 740/781 - Loss: 1.975, Seconds: 4.79\n",
2112 | "Epoch 3/100 Batch 760/781 - Loss: 1.988, Seconds: 4.33\n",
2113 | "Average loss for this update: 1.944\n",
2114 | "New Record!\n",
2115 | "Epoch 3/100 Batch 780/781 - Loss: 1.783, Seconds: 3.99\n",
2116 | "Epoch 4/100 Batch 20/781 - Loss: 2.024, Seconds: 4.29\n",
2117 | "Epoch 4/100 Batch 40/781 - Loss: 1.855, Seconds: 3.57\n",
2118 | "Epoch 4/100 Batch 60/781 - Loss: 1.903, Seconds: 4.35\n",
2119 | "Epoch 4/100 Batch 80/781 - Loss: 1.899, Seconds: 4.19\n",
2120 | "Epoch 4/100 Batch 100/781 - Loss: 1.763, Seconds: 4.21\n",
2121 | "Epoch 4/100 Batch 120/781 - Loss: 1.821, Seconds: 3.85\n",
2122 | "Epoch 4/100 Batch 140/781 - Loss: 1.684, Seconds: 4.01\n",
2123 | "Epoch 4/100 Batch 160/781 - Loss: 1.981, Seconds: 3.29\n",
2124 | "Epoch 4/100 Batch 180/781 - Loss: 1.853, Seconds: 3.67\n",
2125 | "Epoch 4/100 Batch 200/781 - Loss: 1.856, Seconds: 4.27\n",
2126 | "Epoch 4/100 Batch 220/781 - Loss: 1.793, Seconds: 4.09\n",
2127 | "Epoch 4/100 Batch 240/781 - Loss: 1.635, Seconds: 4.19\n",
2128 | "Average loss for this update: 1.831\n",
2129 | "New Record!\n",
2130 | "Epoch 4/100 Batch 260/781 - Loss: 1.724, Seconds: 4.55\n",
2131 | "Epoch 4/100 Batch 280/781 - Loss: 1.746, Seconds: 4.33\n"
2132 | ]
2133 | },
2134 | {
2135 | "name": "stdout",
2136 | "output_type": "stream",
2137 | "text": [
2138 | "Epoch 4/100 Batch 300/781 - Loss: 1.882, Seconds: 4.23\n",
2139 | "Epoch 4/100 Batch 320/781 - Loss: 1.929, Seconds: 4.11\n",
2140 | "Epoch 4/100 Batch 340/781 - Loss: 1.754, Seconds: 4.65\n",
2141 | "Epoch 4/100 Batch 360/781 - Loss: 1.842, Seconds: 4.25\n",
2142 | "Epoch 4/100 Batch 380/781 - Loss: 1.654, Seconds: 4.67\n",
2143 | "Epoch 4/100 Batch 400/781 - Loss: 1.781, Seconds: 4.69\n",
2144 | "Epoch 4/100 Batch 420/781 - Loss: 1.760, Seconds: 4.03\n",
2145 | "Epoch 4/100 Batch 440/781 - Loss: 1.869, Seconds: 4.53\n",
2146 | "Epoch 4/100 Batch 460/781 - Loss: 1.923, Seconds: 4.55\n",
2147 | "Epoch 4/100 Batch 480/781 - Loss: 1.737, Seconds: 4.01\n",
2148 | "Epoch 4/100 Batch 500/781 - Loss: 1.772, Seconds: 4.09\n",
2149 | "Average loss for this update: 1.793\n",
2150 | "New Record!\n",
2151 | "Epoch 4/100 Batch 520/781 - Loss: 1.681, Seconds: 3.91\n",
2152 | "Epoch 4/100 Batch 540/781 - Loss: 1.740, Seconds: 4.49\n",
2153 | "Epoch 4/100 Batch 560/781 - Loss: 1.693, Seconds: 4.33\n",
2154 | "Epoch 4/100 Batch 580/781 - Loss: 1.859, Seconds: 4.25\n",
2155 | "Epoch 4/100 Batch 600/781 - Loss: 1.982, Seconds: 4.47\n",
2156 | "Epoch 4/100 Batch 620/781 - Loss: 1.871, Seconds: 4.01\n",
2157 | "Epoch 4/100 Batch 640/781 - Loss: 1.775, Seconds: 4.25\n",
2158 | "Epoch 4/100 Batch 660/781 - Loss: 1.680, Seconds: 4.67\n",
2159 | "Epoch 4/100 Batch 680/781 - Loss: 1.678, Seconds: 4.45\n",
2160 | "Epoch 4/100 Batch 700/781 - Loss: 1.821, Seconds: 4.35\n",
2161 | "Epoch 4/100 Batch 720/781 - Loss: 1.983, Seconds: 4.53\n",
2162 | "Epoch 4/100 Batch 740/781 - Loss: 1.857, Seconds: 4.91\n",
2163 | "Epoch 4/100 Batch 760/781 - Loss: 1.840, Seconds: 4.41\n",
2164 | "Average loss for this update: 1.803\n",
2165 | "No Improvement.\n",
2166 | "Epoch 4/100 Batch 780/781 - Loss: 1.640, Seconds: 4.05\n",
2167 | "Epoch 5/100 Batch 20/781 - Loss: 1.914, Seconds: 4.47\n",
2168 | "Epoch 5/100 Batch 40/781 - Loss: 1.728, Seconds: 3.60\n",
2169 | "Epoch 5/100 Batch 60/781 - Loss: 1.778, Seconds: 4.61\n",
2170 | "Epoch 5/100 Batch 80/781 - Loss: 1.771, Seconds: 4.45\n",
2171 | "Epoch 5/100 Batch 100/781 - Loss: 1.626, Seconds: 4.25\n",
2172 | "Epoch 5/100 Batch 120/781 - Loss: 1.683, Seconds: 3.89\n",
2173 | "Epoch 5/100 Batch 140/781 - Loss: 1.561, Seconds: 4.01\n",
2174 | "Epoch 5/100 Batch 160/781 - Loss: 1.854, Seconds: 3.41\n",
2175 | "Epoch 5/100 Batch 180/781 - Loss: 1.734, Seconds: 3.83\n",
2176 | "Epoch 5/100 Batch 200/781 - Loss: 1.722, Seconds: 4.33\n",
2177 | "Epoch 5/100 Batch 220/781 - Loss: 1.658, Seconds: 4.25\n",
2178 | "Epoch 5/100 Batch 240/781 - Loss: 1.502, Seconds: 4.13\n",
2179 | "Average loss for this update: 1.702\n",
2180 | "New Record!\n",
2181 | "Epoch 5/100 Batch 260/781 - Loss: 1.586, Seconds: 4.65\n",
2182 | "Epoch 5/100 Batch 280/781 - Loss: 1.648, Seconds: 4.23\n",
2183 | "Epoch 5/100 Batch 300/781 - Loss: 1.751, Seconds: 4.27\n",
2184 | "Epoch 5/100 Batch 320/781 - Loss: 1.813, Seconds: 4.11\n",
2185 | "Epoch 5/100 Batch 340/781 - Loss: 1.650, Seconds: 4.67\n",
2186 | "Epoch 5/100 Batch 360/781 - Loss: 1.718, Seconds: 4.49\n",
2187 | "Epoch 5/100 Batch 380/781 - Loss: 1.521, Seconds: 4.45\n",
2188 | "Epoch 5/100 Batch 400/781 - Loss: 1.666, Seconds: 4.57\n",
2189 | "Epoch 5/100 Batch 420/781 - Loss: 1.639, Seconds: 3.97\n",
2190 | "Epoch 5/100 Batch 440/781 - Loss: 1.749, Seconds: 4.57\n",
2191 | "Epoch 5/100 Batch 460/781 - Loss: 1.810, Seconds: 4.36\n",
2192 | "Epoch 5/100 Batch 480/781 - Loss: 1.624, Seconds: 4.17\n",
2193 | "Epoch 5/100 Batch 500/781 - Loss: 1.654, Seconds: 4.31\n",
2194 | "Average loss for this update: 1.676\n",
2195 | "New Record!\n",
2196 | "Epoch 5/100 Batch 520/781 - Loss: 1.565, Seconds: 3.97\n",
2197 | "Epoch 5/100 Batch 540/781 - Loss: 1.616, Seconds: 4.43\n",
2198 | "Epoch 5/100 Batch 560/781 - Loss: 1.576, Seconds: 3.95\n",
2199 | "Epoch 5/100 Batch 580/781 - Loss: 1.736, Seconds: 4.37\n",
2200 | "Epoch 5/100 Batch 600/781 - Loss: 1.860, Seconds: 4.23\n",
2201 | "Epoch 5/100 Batch 620/781 - Loss: 1.747, Seconds: 4.33\n",
2202 | "Epoch 5/100 Batch 640/781 - Loss: 1.660, Seconds: 4.45\n",
2203 | "Epoch 5/100 Batch 660/781 - Loss: 1.549, Seconds: 4.45\n",
2204 | "Epoch 5/100 Batch 680/781 - Loss: 1.560, Seconds: 4.17\n",
2205 | "Epoch 5/100 Batch 700/781 - Loss: 1.698, Seconds: 4.29\n",
2206 | "Epoch 5/100 Batch 720/781 - Loss: 1.861, Seconds: 4.61\n",
2207 | "Epoch 5/100 Batch 740/781 - Loss: 1.749, Seconds: 4.89\n",
2208 | "Epoch 5/100 Batch 760/781 - Loss: 1.736, Seconds: 4.47\n",
2209 | "Average loss for this update: 1.683\n",
2210 | "No Improvement.\n",
2211 | "Epoch 5/100 Batch 780/781 - Loss: 1.523, Seconds: 4.29\n",
2212 | "Epoch 6/100 Batch 20/781 - Loss: 1.793, Seconds: 4.27\n",
2213 | "Epoch 6/100 Batch 40/781 - Loss: 1.642, Seconds: 3.55\n",
2214 | "Epoch 6/100 Batch 60/781 - Loss: 1.658, Seconds: 4.39\n",
2215 | "Epoch 6/100 Batch 80/781 - Loss: 1.662, Seconds: 4.23\n",
2216 | "Epoch 6/100 Batch 100/781 - Loss: 1.504, Seconds: 4.37\n",
2217 | "Epoch 6/100 Batch 120/781 - Loss: 1.577, Seconds: 4.01\n",
2218 | "Epoch 6/100 Batch 140/781 - Loss: 1.461, Seconds: 3.99\n",
2219 | "Epoch 6/100 Batch 160/781 - Loss: 1.759, Seconds: 3.33\n",
2220 | "Epoch 6/100 Batch 180/781 - Loss: 1.633, Seconds: 3.73\n",
2221 | "Epoch 6/100 Batch 200/781 - Loss: 1.612, Seconds: 4.19\n",
2222 | "Epoch 6/100 Batch 220/781 - Loss: 1.565, Seconds: 4.07\n",
2223 | "Epoch 6/100 Batch 240/781 - Loss: 1.395, Seconds: 4.07\n",
2224 | "Average loss for this update: 1.597\n",
2225 | "New Record!\n",
2226 | "Epoch 6/100 Batch 260/781 - Loss: 1.490, Seconds: 4.61\n",
2227 | "Epoch 6/100 Batch 280/781 - Loss: 1.541, Seconds: 4.25\n",
2228 | "Epoch 6/100 Batch 300/781 - Loss: 1.661, Seconds: 4.39\n",
2229 | "Epoch 6/100 Batch 320/781 - Loss: 1.706, Seconds: 4.23\n",
2230 | "Epoch 6/100 Batch 340/781 - Loss: 1.544, Seconds: 4.51\n",
2231 | "Epoch 6/100 Batch 360/781 - Loss: 1.623, Seconds: 4.45\n",
2232 | "Epoch 6/100 Batch 380/781 - Loss: 1.430, Seconds: 4.61\n",
2233 | "Epoch 6/100 Batch 400/781 - Loss: 1.567, Seconds: 4.49\n",
2234 | "Epoch 6/100 Batch 420/781 - Loss: 1.535, Seconds: 3.97\n",
2235 | "Epoch 6/100 Batch 440/781 - Loss: 1.642, Seconds: 4.67\n",
2236 | "Epoch 6/100 Batch 460/781 - Loss: 1.701, Seconds: 4.35\n",
2237 | "Epoch 6/100 Batch 480/781 - Loss: 1.529, Seconds: 4.07\n",
2238 | "Epoch 6/100 Batch 500/781 - Loss: 1.554, Seconds: 4.33\n",
2239 | "Average loss for this update: 1.575\n",
2240 | "New Record!\n",
2241 | "Epoch 6/100 Batch 520/781 - Loss: 1.462, Seconds: 3.91\n",
2242 | "Epoch 6/100 Batch 540/781 - Loss: 1.530, Seconds: 4.59\n",
2243 | "Epoch 6/100 Batch 560/781 - Loss: 1.482, Seconds: 4.11\n",
2244 | "Epoch 6/100 Batch 580/781 - Loss: 1.658, Seconds: 4.07\n",
2245 | "Epoch 6/100 Batch 600/781 - Loss: 1.717, Seconds: 4.17\n",
2246 | "Epoch 6/100 Batch 620/781 - Loss: 1.624, Seconds: 4.05\n",
2247 | "Epoch 6/100 Batch 640/781 - Loss: 1.551, Seconds: 4.29\n",
2248 | "Epoch 6/100 Batch 660/781 - Loss: 1.433, Seconds: 4.39\n",
2249 | "Epoch 6/100 Batch 680/781 - Loss: 1.468, Seconds: 4.35\n",
2250 | "Epoch 6/100 Batch 700/781 - Loss: 1.611, Seconds: 4.63\n",
2251 | "Epoch 6/100 Batch 720/781 - Loss: 1.763, Seconds: 4.45\n",
2252 | "Epoch 6/100 Batch 740/781 - Loss: 1.646, Seconds: 4.79\n",
2253 | "Epoch 6/100 Batch 760/781 - Loss: 1.631, Seconds: 4.33\n",
2254 | "Average loss for this update: 1.582\n",
2255 | "No Improvement.\n",
2256 | "Epoch 6/100 Batch 780/781 - Loss: 1.434, Seconds: 3.85\n",
2257 | "Epoch 7/100 Batch 20/781 - Loss: 1.688, Seconds: 4.19\n",
2258 | "Epoch 7/100 Batch 40/781 - Loss: 1.547, Seconds: 3.43\n",
2259 | "Epoch 7/100 Batch 60/781 - Loss: 1.570, Seconds: 4.53\n",
2260 | "Epoch 7/100 Batch 80/781 - Loss: 1.564, Seconds: 4.23\n",
2261 | "Epoch 7/100 Batch 100/781 - Loss: 1.413, Seconds: 4.31\n",
2262 | "Epoch 7/100 Batch 120/781 - Loss: 1.472, Seconds: 3.81\n",
2263 | "Epoch 7/100 Batch 140/781 - Loss: 1.395, Seconds: 4.07\n",
2264 | "Epoch 7/100 Batch 160/781 - Loss: 1.645, Seconds: 3.47\n",
2265 | "Epoch 7/100 Batch 180/781 - Loss: 1.555, Seconds: 3.79\n",
2266 | "Epoch 7/100 Batch 200/781 - Loss: 1.534, Seconds: 4.39\n",
2267 | "Epoch 7/100 Batch 220/781 - Loss: 1.498, Seconds: 4.09\n",
2268 | "Epoch 7/100 Batch 240/781 - Loss: 1.307, Seconds: 4.27\n",
2269 | "Average loss for this update: 1.508\n",
2270 | "New Record!\n",
2271 | "Epoch 7/100 Batch 260/781 - Loss: 1.407, Seconds: 4.59\n",
2272 | "Epoch 7/100 Batch 280/781 - Loss: 1.459, Seconds: 4.33\n",
2273 | "Epoch 7/100 Batch 300/781 - Loss: 1.557, Seconds: 4.29\n",
2274 | "Epoch 7/100 Batch 320/781 - Loss: 1.616, Seconds: 4.31\n",
2275 | "Epoch 7/100 Batch 340/781 - Loss: 1.462, Seconds: 4.47\n",
2276 | "Epoch 7/100 Batch 360/781 - Loss: 1.534, Seconds: 4.47\n",
2277 | "Epoch 7/100 Batch 380/781 - Loss: 1.341, Seconds: 4.47\n",
2278 | "Epoch 7/100 Batch 400/781 - Loss: 1.497, Seconds: 4.53\n",
2279 | "Epoch 7/100 Batch 420/781 - Loss: 1.442, Seconds: 4.05\n",
2280 | "Epoch 7/100 Batch 440/781 - Loss: 1.561, Seconds: 4.61\n",
2281 | "Epoch 7/100 Batch 460/781 - Loss: 1.603, Seconds: 4.35\n",
2282 | "Epoch 7/100 Batch 480/781 - Loss: 1.442, Seconds: 3.99\n",
2283 | "Epoch 7/100 Batch 500/781 - Loss: 1.476, Seconds: 4.15\n",
2284 | "Average loss for this update: 1.488\n",
2285 | "New Record!\n",
2286 | "Epoch 7/100 Batch 520/781 - Loss: 1.375, Seconds: 3.99\n",
2287 | "Epoch 7/100 Batch 540/781 - Loss: 1.457, Seconds: 4.47\n",
2288 | "Epoch 7/100 Batch 560/781 - Loss: 1.406, Seconds: 3.95\n"
2289 | ]
2290 | },
2291 | {
2292 | "name": "stdout",
2293 | "output_type": "stream",
2294 | "text": [
2295 | "Epoch 7/100 Batch 580/781 - Loss: 1.564, Seconds: 4.35\n",
2296 | "Epoch 7/100 Batch 600/781 - Loss: 1.615, Seconds: 4.39\n",
2297 | "Epoch 7/100 Batch 620/781 - Loss: 1.533, Seconds: 3.99\n",
2298 | "Epoch 7/100 Batch 640/781 - Loss: 1.448, Seconds: 4.19\n",
2299 | "Epoch 7/100 Batch 660/781 - Loss: 1.343, Seconds: 4.67\n",
2300 | "Epoch 7/100 Batch 680/781 - Loss: 1.388, Seconds: 4.23\n",
2301 | "Epoch 7/100 Batch 700/781 - Loss: 1.504, Seconds: 4.37\n",
2302 | "Epoch 7/100 Batch 720/781 - Loss: 1.675, Seconds: 4.79\n",
2303 | "Epoch 7/100 Batch 740/781 - Loss: 1.553, Seconds: 4.89\n",
2304 | "Epoch 7/100 Batch 760/781 - Loss: 1.534, Seconds: 4.39\n",
2305 | "Average loss for this update: 1.491\n",
2306 | "No Improvement.\n",
2307 | "Epoch 7/100 Batch 780/781 - Loss: 1.354, Seconds: 4.03\n",
2308 | "Epoch 8/100 Batch 20/781 - Loss: 1.612, Seconds: 4.39\n",
2309 | "Epoch 8/100 Batch 40/781 - Loss: 1.480, Seconds: 3.47\n",
2310 | "Epoch 8/100 Batch 60/781 - Loss: 1.498, Seconds: 4.51\n",
2311 | "Epoch 8/100 Batch 80/781 - Loss: 1.474, Seconds: 4.25\n",
2312 | "Epoch 8/100 Batch 100/781 - Loss: 1.334, Seconds: 4.35\n",
2313 | "Epoch 8/100 Batch 120/781 - Loss: 1.410, Seconds: 3.91\n",
2314 | "Epoch 8/100 Batch 140/781 - Loss: 1.320, Seconds: 4.25\n",
2315 | "Epoch 8/100 Batch 160/781 - Loss: 1.559, Seconds: 3.19\n",
2316 | "Epoch 8/100 Batch 180/781 - Loss: 1.465, Seconds: 3.67\n",
2317 | "Epoch 8/100 Batch 200/781 - Loss: 1.444, Seconds: 4.35\n",
2318 | "Epoch 8/100 Batch 220/781 - Loss: 1.420, Seconds: 4.05\n",
2319 | "Epoch 8/100 Batch 240/781 - Loss: 1.253, Seconds: 4.05\n",
2320 | "Average loss for this update: 1.432\n",
2321 | "New Record!\n",
2322 | "Epoch 8/100 Batch 260/781 - Loss: 1.336, Seconds: 4.77\n",
2323 | "Epoch 8/100 Batch 280/781 - Loss: 1.372, Seconds: 4.29\n",
2324 | "Epoch 8/100 Batch 300/781 - Loss: 1.496, Seconds: 4.19\n",
2325 | "Epoch 8/100 Batch 320/781 - Loss: 1.536, Seconds: 4.11\n",
2326 | "Epoch 8/100 Batch 340/781 - Loss: 1.385, Seconds: 4.57\n",
2327 | "Epoch 8/100 Batch 360/781 - Loss: 1.452, Seconds: 4.25\n",
2328 | "Epoch 8/100 Batch 380/781 - Loss: 1.284, Seconds: 4.59\n",
2329 | "Epoch 8/100 Batch 400/781 - Loss: 1.421, Seconds: 4.84\n",
2330 | "Epoch 8/100 Batch 420/781 - Loss: 1.361, Seconds: 4.09\n",
2331 | "Epoch 8/100 Batch 440/781 - Loss: 1.476, Seconds: 4.67\n",
2332 | "Epoch 8/100 Batch 460/781 - Loss: 1.515, Seconds: 4.53\n",
2333 | "Epoch 8/100 Batch 480/781 - Loss: 1.371, Seconds: 4.11\n",
2334 | "Epoch 8/100 Batch 500/781 - Loss: 1.401, Seconds: 4.27\n",
2335 | "Average loss for this update: 1.413\n",
2336 | "New Record!\n",
2337 | "Epoch 8/100 Batch 520/781 - Loss: 1.309, Seconds: 3.93\n",
2338 | "Epoch 8/100 Batch 540/781 - Loss: 1.379, Seconds: 4.45\n",
2339 | "Epoch 8/100 Batch 560/781 - Loss: 1.333, Seconds: 3.97\n",
2340 | "Epoch 8/100 Batch 580/781 - Loss: 1.476, Seconds: 4.23\n",
2341 | "Epoch 8/100 Batch 600/781 - Loss: 1.532, Seconds: 4.17\n",
2342 | "Epoch 8/100 Batch 620/781 - Loss: 1.465, Seconds: 4.19\n",
2343 | "Epoch 8/100 Batch 640/781 - Loss: 1.369, Seconds: 4.27\n",
2344 | "Epoch 8/100 Batch 660/781 - Loss: 1.268, Seconds: 4.65\n",
2345 | "Epoch 8/100 Batch 680/781 - Loss: 1.312, Seconds: 4.53\n",
2346 | "Epoch 8/100 Batch 700/781 - Loss: 1.419, Seconds: 4.55\n",
2347 | "Epoch 8/100 Batch 720/781 - Loss: 1.588, Seconds: 4.61\n",
2348 | "Epoch 8/100 Batch 740/781 - Loss: 1.482, Seconds: 4.87\n",
2349 | "Epoch 8/100 Batch 760/781 - Loss: 1.471, Seconds: 4.53\n",
2350 | "Average loss for this update: 1.414\n",
2351 | "No Improvement.\n",
2352 | "Epoch 8/100 Batch 780/781 - Loss: 1.285, Seconds: 4.13\n",
2353 | "Epoch 9/100 Batch 20/781 - Loss: 1.542, Seconds: 4.19\n",
2354 | "Epoch 9/100 Batch 40/781 - Loss: 1.404, Seconds: 3.47\n",
2355 | "Epoch 9/100 Batch 60/781 - Loss: 1.421, Seconds: 4.47\n",
2356 | "Epoch 9/100 Batch 80/781 - Loss: 1.407, Seconds: 4.31\n",
2357 | "Epoch 9/100 Batch 100/781 - Loss: 1.268, Seconds: 4.25\n",
2358 | "Epoch 9/100 Batch 120/781 - Loss: 1.336, Seconds: 3.81\n",
2359 | "Epoch 9/100 Batch 140/781 - Loss: 1.259, Seconds: 4.01\n",
2360 | "Epoch 9/100 Batch 160/781 - Loss: 1.500, Seconds: 3.33\n",
2361 | "Epoch 9/100 Batch 180/781 - Loss: 1.395, Seconds: 3.79\n",
2362 | "Epoch 9/100 Batch 200/781 - Loss: 1.392, Seconds: 4.47\n",
2363 | "Epoch 9/100 Batch 220/781 - Loss: 1.341, Seconds: 4.05\n",
2364 | "Epoch 9/100 Batch 240/781 - Loss: 1.192, Seconds: 4.09\n",
2365 | "Average loss for this update: 1.364\n",
2366 | "New Record!\n",
2367 | "Epoch 9/100 Batch 260/781 - Loss: 1.268, Seconds: 4.59\n",
2368 | "Epoch 9/100 Batch 280/781 - Loss: 1.320, Seconds: 4.33\n",
2369 | "Epoch 9/100 Batch 300/781 - Loss: 1.412, Seconds: 4.11\n",
2370 | "Epoch 9/100 Batch 320/781 - Loss: 1.479, Seconds: 4.25\n",
2371 | "Epoch 9/100 Batch 340/781 - Loss: 1.317, Seconds: 4.67\n",
2372 | "Epoch 9/100 Batch 360/781 - Loss: 1.379, Seconds: 4.53\n",
2373 | "Epoch 9/100 Batch 380/781 - Loss: 1.222, Seconds: 4.81\n",
2374 | "Epoch 9/100 Batch 400/781 - Loss: 1.358, Seconds: 4.67\n",
2375 | "Epoch 9/100 Batch 420/781 - Loss: 1.299, Seconds: 4.03\n",
2376 | "Epoch 9/100 Batch 440/781 - Loss: 1.388, Seconds: 4.67\n",
2377 | "Epoch 9/100 Batch 460/781 - Loss: 1.451, Seconds: 4.45\n",
2378 | "Epoch 9/100 Batch 480/781 - Loss: 1.303, Seconds: 4.01\n",
2379 | "Epoch 9/100 Batch 500/781 - Loss: 1.344, Seconds: 4.35\n",
2380 | "Average loss for this update: 1.346\n",
2381 | "New Record!\n",
2382 | "Epoch 9/100 Batch 520/781 - Loss: 1.245, Seconds: 3.93\n",
2383 | "Epoch 9/100 Batch 540/781 - Loss: 1.300, Seconds: 4.61\n",
2384 | "Epoch 9/100 Batch 560/781 - Loss: 1.257, Seconds: 4.07\n",
2385 | "Epoch 9/100 Batch 580/781 - Loss: 1.409, Seconds: 4.23\n",
2386 | "Epoch 9/100 Batch 600/781 - Loss: 1.465, Seconds: 4.37\n",
2387 | "Epoch 9/100 Batch 620/781 - Loss: 1.373, Seconds: 3.99\n",
2388 | "Epoch 9/100 Batch 640/781 - Loss: 1.316, Seconds: 4.29\n",
2389 | "Epoch 9/100 Batch 660/781 - Loss: 1.198, Seconds: 4.61\n",
2390 | "Epoch 9/100 Batch 680/781 - Loss: 1.263, Seconds: 4.31\n",
2391 | "Epoch 9/100 Batch 700/781 - Loss: 1.369, Seconds: 4.55\n",
2392 | "Epoch 9/100 Batch 720/781 - Loss: 1.511, Seconds: 4.77\n",
2393 | "Epoch 9/100 Batch 740/781 - Loss: 1.425, Seconds: 4.73\n",
2394 | "Epoch 9/100 Batch 760/781 - Loss: 1.413, Seconds: 4.19\n",
2395 | "Average loss for this update: 1.349\n",
2396 | "No Improvement.\n",
2397 | "Epoch 9/100 Batch 780/781 - Loss: 1.234, Seconds: 3.89\n",
2398 | "Epoch 10/100 Batch 20/781 - Loss: 1.470, Seconds: 4.47\n",
2399 | "Epoch 10/100 Batch 40/781 - Loss: 1.351, Seconds: 3.53\n",
2400 | "Epoch 10/100 Batch 60/781 - Loss: 1.343, Seconds: 4.73\n",
2401 | "Epoch 10/100 Batch 80/781 - Loss: 1.350, Seconds: 4.45\n",
2402 | "Epoch 10/100 Batch 100/781 - Loss: 1.202, Seconds: 4.25\n",
2403 | "Epoch 10/100 Batch 120/781 - Loss: 1.274, Seconds: 3.79\n",
2404 | "Epoch 10/100 Batch 140/781 - Loss: 1.189, Seconds: 4.07\n",
2405 | "Epoch 10/100 Batch 160/781 - Loss: 1.432, Seconds: 3.25\n",
2406 | "Epoch 10/100 Batch 180/781 - Loss: 1.339, Seconds: 3.65\n",
2407 | "Epoch 10/100 Batch 200/781 - Loss: 1.334, Seconds: 4.65\n",
2408 | "Epoch 10/100 Batch 220/781 - Loss: 1.299, Seconds: 4.01\n",
2409 | "Epoch 10/100 Batch 240/781 - Loss: 1.140, Seconds: 4.07\n",
2410 | "Average loss for this update: 1.304\n",
2411 | "New Record!\n",
2412 | "Epoch 10/100 Batch 260/781 - Loss: 1.210, Seconds: 4.53\n",
2413 | "Epoch 10/100 Batch 280/781 - Loss: 1.272, Seconds: 4.45\n",
2414 | "Epoch 10/100 Batch 300/781 - Loss: 1.351, Seconds: 4.19\n",
2415 | "Epoch 10/100 Batch 320/781 - Loss: 1.399, Seconds: 4.25\n",
2416 | "Epoch 10/100 Batch 340/781 - Loss: 1.273, Seconds: 4.63\n",
2417 | "Epoch 10/100 Batch 360/781 - Loss: 1.301, Seconds: 4.59\n",
2418 | "Epoch 10/100 Batch 380/781 - Loss: 1.164, Seconds: 4.69\n",
2419 | "Epoch 10/100 Batch 400/781 - Loss: 1.284, Seconds: 4.75\n",
2420 | "Epoch 10/100 Batch 420/781 - Loss: 1.221, Seconds: 3.95\n",
2421 | "Epoch 10/100 Batch 440/781 - Loss: 1.333, Seconds: 4.67\n",
2422 | "Epoch 10/100 Batch 460/781 - Loss: 1.383, Seconds: 4.61\n",
2423 | "Epoch 10/100 Batch 480/781 - Loss: 1.240, Seconds: 4.19\n",
2424 | "Epoch 10/100 Batch 500/781 - Loss: 1.298, Seconds: 4.15\n",
2425 | "Average loss for this update: 1.282\n",
2426 | "New Record!\n",
2427 | "Epoch 10/100 Batch 520/781 - Loss: 1.167, Seconds: 4.07\n",
2428 | "Epoch 10/100 Batch 540/781 - Loss: 1.242, Seconds: 4.47\n",
2429 | "Epoch 10/100 Batch 560/781 - Loss: 1.201, Seconds: 4.03\n",
2430 | "Epoch 10/100 Batch 580/781 - Loss: 1.343, Seconds: 4.33\n",
2431 | "Epoch 10/100 Batch 600/781 - Loss: 1.399, Seconds: 4.63\n",
2432 | "Epoch 10/100 Batch 620/781 - Loss: 1.308, Seconds: 4.37\n",
2433 | "Epoch 10/100 Batch 640/781 - Loss: 1.260, Seconds: 4.37\n",
2434 | "Epoch 10/100 Batch 660/781 - Loss: 1.139, Seconds: 4.35\n",
2435 | "Epoch 10/100 Batch 680/781 - Loss: 1.203, Seconds: 4.41\n",
2436 | "Epoch 10/100 Batch 700/781 - Loss: 1.299, Seconds: 4.39\n",
2437 | "Epoch 10/100 Batch 720/781 - Loss: 1.437, Seconds: 4.59\n",
2438 | "Epoch 10/100 Batch 740/781 - Loss: 1.346, Seconds: 4.73\n",
2439 | "Epoch 10/100 Batch 760/781 - Loss: 1.340, Seconds: 4.27\n",
2440 | "Average loss for this update: 1.284\n",
2441 | "No Improvement.\n",
2442 | "Epoch 10/100 Batch 780/781 - Loss: 1.176, Seconds: 4.33\n",
2443 | "Epoch 11/100 Batch 20/781 - Loss: 1.408, Seconds: 4.49\n",
2444 | "Epoch 11/100 Batch 40/781 - Loss: 1.280, Seconds: 3.41\n",
2445 | "Epoch 11/100 Batch 60/781 - Loss: 1.287, Seconds: 4.65\n"
2446 | ]
2447 | },
2448 | {
2449 | "name": "stdout",
2450 | "output_type": "stream",
2451 | "text": [
2452 | "Epoch 11/100 Batch 80/781 - Loss: 1.300, Seconds: 4.27\n",
2453 | "Epoch 11/100 Batch 100/781 - Loss: 1.147, Seconds: 4.39\n",
2454 | "Epoch 11/100 Batch 120/781 - Loss: 1.242, Seconds: 3.79\n",
2455 | "Epoch 11/100 Batch 140/781 - Loss: 1.150, Seconds: 4.17\n",
2456 | "Epoch 11/100 Batch 160/781 - Loss: 1.364, Seconds: 3.25\n",
2457 | "Epoch 11/100 Batch 180/781 - Loss: 1.293, Seconds: 3.67\n",
2458 | "Epoch 11/100 Batch 200/781 - Loss: 1.263, Seconds: 4.33\n",
2459 | "Epoch 11/100 Batch 220/781 - Loss: 1.238, Seconds: 4.09\n",
2460 | "Epoch 11/100 Batch 240/781 - Loss: 1.088, Seconds: 4.01\n",
2461 | "Average loss for this update: 1.249\n",
2462 | "New Record!\n",
2463 | "Epoch 11/100 Batch 260/781 - Loss: 1.169, Seconds: 4.55\n",
2464 | "Epoch 11/100 Batch 280/781 - Loss: 1.206, Seconds: 4.41\n",
2465 | "Epoch 11/100 Batch 300/781 - Loss: 1.294, Seconds: 4.35\n",
2466 | "Epoch 11/100 Batch 320/781 - Loss: 1.366, Seconds: 4.17\n",
2467 | "Epoch 11/100 Batch 340/781 - Loss: 1.212, Seconds: 4.67\n",
2468 | "Epoch 11/100 Batch 360/781 - Loss: 1.248, Seconds: 4.39\n",
2469 | "Epoch 11/100 Batch 380/781 - Loss: 1.105, Seconds: 4.61\n",
2470 | "Epoch 11/100 Batch 400/781 - Loss: 1.236, Seconds: 4.71\n",
2471 | "Epoch 11/100 Batch 420/781 - Loss: 1.175, Seconds: 3.97\n",
2472 | "Epoch 11/100 Batch 440/781 - Loss: 1.276, Seconds: 4.83\n",
2473 | "Epoch 11/100 Batch 460/781 - Loss: 1.346, Seconds: 4.59\n",
2474 | "Epoch 11/100 Batch 480/781 - Loss: 1.198, Seconds: 4.01\n",
2475 | "Epoch 11/100 Batch 500/781 - Loss: 1.244, Seconds: 4.39\n",
2476 | "Average loss for this update: 1.233\n",
2477 | "New Record!\n",
2478 | "Epoch 11/100 Batch 520/781 - Loss: 1.132, Seconds: 3.83\n",
2479 | "Epoch 11/100 Batch 540/781 - Loss: 1.182, Seconds: 4.65\n",
2480 | "Epoch 11/100 Batch 560/781 - Loss: 1.153, Seconds: 3.89\n",
2481 | "Epoch 11/100 Batch 580/781 - Loss: 1.290, Seconds: 4.23\n",
2482 | "Epoch 11/100 Batch 600/781 - Loss: 1.335, Seconds: 4.17\n",
2483 | "Epoch 11/100 Batch 620/781 - Loss: 1.250, Seconds: 4.09\n",
2484 | "Epoch 11/100 Batch 640/781 - Loss: 1.202, Seconds: 4.25\n",
2485 | "Epoch 11/100 Batch 660/781 - Loss: 1.085, Seconds: 4.45\n",
2486 | "Epoch 11/100 Batch 680/781 - Loss: 1.150, Seconds: 4.27\n",
2487 | "Epoch 11/100 Batch 700/781 - Loss: 1.248, Seconds: 4.35\n",
2488 | "Epoch 11/100 Batch 720/781 - Loss: 1.394, Seconds: 4.65\n",
2489 | "Epoch 11/100 Batch 740/781 - Loss: 1.287, Seconds: 4.75\n",
2490 | "Epoch 11/100 Batch 760/781 - Loss: 1.294, Seconds: 4.61\n",
2491 | "Average loss for this update: 1.231\n",
2492 | "New Record!\n",
2493 | "Epoch 11/100 Batch 780/781 - Loss: 1.121, Seconds: 3.97\n",
2494 | "Epoch 12/100 Batch 20/781 - Loss: 1.349, Seconds: 4.37\n",
2495 | "Epoch 12/100 Batch 40/781 - Loss: 1.242, Seconds: 3.45\n",
2496 | "Epoch 12/100 Batch 60/781 - Loss: 1.216, Seconds: 4.39\n",
2497 | "Epoch 12/100 Batch 80/781 - Loss: 1.235, Seconds: 4.41\n",
2498 | "Epoch 12/100 Batch 100/781 - Loss: 1.096, Seconds: 4.31\n",
2499 | "Epoch 12/100 Batch 120/781 - Loss: 1.184, Seconds: 3.99\n",
2500 | "Epoch 12/100 Batch 140/781 - Loss: 1.113, Seconds: 4.25\n",
2501 | "Epoch 12/100 Batch 160/781 - Loss: 1.303, Seconds: 3.27\n",
2502 | "Epoch 12/100 Batch 180/781 - Loss: 1.239, Seconds: 3.71\n",
2503 | "Epoch 12/100 Batch 200/781 - Loss: 1.202, Seconds: 4.39\n",
2504 | "Epoch 12/100 Batch 220/781 - Loss: 1.175, Seconds: 4.11\n",
2505 | "Epoch 12/100 Batch 240/781 - Loss: 1.029, Seconds: 4.22\n",
2506 | "Average loss for this update: 1.193\n",
2507 | "New Record!\n",
2508 | "Epoch 12/100 Batch 260/781 - Loss: 1.113, Seconds: 4.53\n",
2509 | "Epoch 12/100 Batch 280/781 - Loss: 1.160, Seconds: 4.39\n",
2510 | "Epoch 12/100 Batch 300/781 - Loss: 1.231, Seconds: 4.23\n",
2511 | "Epoch 12/100 Batch 320/781 - Loss: 1.298, Seconds: 4.17\n",
2512 | "Epoch 12/100 Batch 340/781 - Loss: 1.161, Seconds: 4.51\n",
2513 | "Epoch 12/100 Batch 360/781 - Loss: 1.190, Seconds: 4.43\n",
2514 | "Epoch 12/100 Batch 380/781 - Loss: 1.045, Seconds: 4.59\n",
2515 | "Epoch 12/100 Batch 400/781 - Loss: 1.182, Seconds: 4.51\n",
2516 | "Epoch 12/100 Batch 420/781 - Loss: 1.127, Seconds: 3.95\n",
2517 | "Epoch 12/100 Batch 440/781 - Loss: 1.213, Seconds: 4.75\n",
2518 | "Epoch 12/100 Batch 460/781 - Loss: 1.280, Seconds: 4.53\n",
2519 | "Epoch 12/100 Batch 480/781 - Loss: 1.149, Seconds: 4.19\n",
2520 | "Epoch 12/100 Batch 500/781 - Loss: 1.196, Seconds: 4.19\n",
2521 | "Average loss for this update: 1.177\n",
2522 | "New Record!\n",
2523 | "Epoch 12/100 Batch 520/781 - Loss: 1.083, Seconds: 3.99\n",
2524 | "Epoch 12/100 Batch 540/781 - Loss: 1.148, Seconds: 4.39\n",
2525 | "Epoch 12/100 Batch 560/781 - Loss: 1.103, Seconds: 3.95\n",
2526 | "Epoch 12/100 Batch 580/781 - Loss: 1.230, Seconds: 4.19\n",
2527 | "Epoch 12/100 Batch 600/781 - Loss: 1.295, Seconds: 4.21\n",
2528 | "Epoch 12/100 Batch 620/781 - Loss: 1.208, Seconds: 4.09\n",
2529 | "Epoch 12/100 Batch 640/781 - Loss: 1.152, Seconds: 4.41\n",
2530 | "Epoch 12/100 Batch 660/781 - Loss: 1.040, Seconds: 4.51\n",
2531 | "Epoch 12/100 Batch 680/781 - Loss: 1.108, Seconds: 4.37\n",
2532 | "Epoch 12/100 Batch 700/781 - Loss: 1.192, Seconds: 4.31\n",
2533 | "Epoch 12/100 Batch 720/781 - Loss: 1.338, Seconds: 4.39\n",
2534 | "Epoch 12/100 Batch 740/781 - Loss: 1.234, Seconds: 4.79\n",
2535 | "Epoch 12/100 Batch 760/781 - Loss: 1.241, Seconds: 4.33\n",
2536 | "Average loss for this update: 1.184\n",
2537 | "No Improvement.\n",
2538 | "Epoch 12/100 Batch 780/781 - Loss: 1.095, Seconds: 3.89\n",
2539 | "Epoch 13/100 Batch 20/781 - Loss: 1.297, Seconds: 4.33\n",
2540 | "Epoch 13/100 Batch 40/781 - Loss: 1.190, Seconds: 3.41\n",
2541 | "Epoch 13/100 Batch 60/781 - Loss: 1.192, Seconds: 4.51\n",
2542 | "Epoch 13/100 Batch 80/781 - Loss: 1.197, Seconds: 4.49\n",
2543 | "Epoch 13/100 Batch 100/781 - Loss: 1.058, Seconds: 4.13\n",
2544 | "Epoch 13/100 Batch 120/781 - Loss: 1.144, Seconds: 3.99\n",
2545 | "Epoch 13/100 Batch 140/781 - Loss: 1.065, Seconds: 4.09\n",
2546 | "Epoch 13/100 Batch 160/781 - Loss: 1.246, Seconds: 3.27\n",
2547 | "Epoch 13/100 Batch 180/781 - Loss: 1.194, Seconds: 3.65\n",
2548 | "Epoch 13/100 Batch 200/781 - Loss: 1.164, Seconds: 4.35\n",
2549 | "Epoch 13/100 Batch 220/781 - Loss: 1.142, Seconds: 4.25\n",
2550 | "Epoch 13/100 Batch 240/781 - Loss: 0.991, Seconds: 3.99\n",
2551 | "Average loss for this update: 1.151\n",
2552 | "New Record!\n",
2553 | "Epoch 13/100 Batch 260/781 - Loss: 1.075, Seconds: 4.61\n",
2554 | "Epoch 13/100 Batch 280/781 - Loss: 1.125, Seconds: 4.33\n",
2555 | "Epoch 13/100 Batch 300/781 - Loss: 1.210, Seconds: 4.13\n",
2556 | "Epoch 13/100 Batch 320/781 - Loss: 1.260, Seconds: 4.29\n",
2557 | "Epoch 13/100 Batch 340/781 - Loss: 1.127, Seconds: 4.67\n",
2558 | "Epoch 13/100 Batch 360/781 - Loss: 1.152, Seconds: 4.37\n",
2559 | "Epoch 13/100 Batch 380/781 - Loss: 1.007, Seconds: 4.73\n",
2560 | "Epoch 13/100 Batch 400/781 - Loss: 1.149, Seconds: 4.59\n",
2561 | "Epoch 13/100 Batch 420/781 - Loss: 1.071, Seconds: 3.97\n",
2562 | "Epoch 13/100 Batch 440/781 - Loss: 1.166, Seconds: 4.61\n",
2563 | "Epoch 13/100 Batch 460/781 - Loss: 1.235, Seconds: 4.59\n",
2564 | "Epoch 13/100 Batch 480/781 - Loss: 1.106, Seconds: 4.33\n",
2565 | "Epoch 13/100 Batch 500/781 - Loss: 1.142, Seconds: 4.33\n",
2566 | "Average loss for this update: 1.137\n",
2567 | "New Record!\n",
2568 | "Epoch 13/100 Batch 520/781 - Loss: 1.041, Seconds: 3.83\n",
2569 | "Epoch 13/100 Batch 540/781 - Loss: 1.099, Seconds: 4.39\n",
2570 | "Epoch 13/100 Batch 560/781 - Loss: 1.055, Seconds: 3.87\n",
2571 | "Epoch 13/100 Batch 580/781 - Loss: 1.182, Seconds: 4.33\n",
2572 | "Epoch 13/100 Batch 600/781 - Loss: 1.233, Seconds: 4.35\n",
2573 | "Epoch 13/100 Batch 620/781 - Loss: 1.164, Seconds: 4.01\n",
2574 | "Epoch 13/100 Batch 640/781 - Loss: 1.122, Seconds: 4.29\n",
2575 | "Epoch 13/100 Batch 660/781 - Loss: 0.998, Seconds: 4.57\n",
2576 | "Epoch 13/100 Batch 680/781 - Loss: 1.072, Seconds: 4.31\n",
2577 | "Epoch 13/100 Batch 700/781 - Loss: 1.151, Seconds: 4.43\n",
2578 | "Epoch 13/100 Batch 720/781 - Loss: 1.302, Seconds: 4.71\n",
2579 | "Epoch 13/100 Batch 740/781 - Loss: 1.185, Seconds: 4.77\n",
2580 | "Epoch 13/100 Batch 760/781 - Loss: 1.195, Seconds: 4.29\n",
2581 | "Average loss for this update: 1.139\n",
2582 | "No Improvement.\n",
2583 | "Epoch 13/100 Batch 780/781 - Loss: 1.048, Seconds: 3.97\n",
2584 | "Epoch 14/100 Batch 20/781 - Loss: 1.255, Seconds: 4.41\n",
2585 | "Epoch 14/100 Batch 40/781 - Loss: 1.164, Seconds: 3.59\n",
2586 | "Epoch 14/100 Batch 60/781 - Loss: 1.130, Seconds: 4.61\n",
2587 | "Epoch 14/100 Batch 80/781 - Loss: 1.142, Seconds: 4.33\n",
2588 | "Epoch 14/100 Batch 100/781 - Loss: 1.009, Seconds: 4.31\n",
2589 | "Epoch 14/100 Batch 120/781 - Loss: 1.100, Seconds: 4.03\n",
2590 | "Epoch 14/100 Batch 140/781 - Loss: 1.025, Seconds: 4.05\n",
2591 | "Epoch 14/100 Batch 160/781 - Loss: 1.202, Seconds: 3.21\n",
2592 | "Epoch 14/100 Batch 180/781 - Loss: 1.192, Seconds: 3.63\n",
2593 | "Epoch 14/100 Batch 200/781 - Loss: 1.128, Seconds: 4.41\n",
2594 | "Epoch 14/100 Batch 220/781 - Loss: 1.109, Seconds: 4.23\n",
2595 | "Epoch 14/100 Batch 240/781 - Loss: 0.962, Seconds: 4.15\n",
2596 | "Average loss for this update: 1.113\n",
2597 | "New Record!\n",
2598 | "Epoch 14/100 Batch 260/781 - Loss: 1.041, Seconds: 4.73\n",
2599 | "Epoch 14/100 Batch 280/781 - Loss: 1.082, Seconds: 4.46\n",
2600 | "Epoch 14/100 Batch 300/781 - Loss: 1.156, Seconds: 4.21\n",
2601 | "Epoch 14/100 Batch 320/781 - Loss: 1.214, Seconds: 4.25\n",
2602 | "Epoch 14/100 Batch 340/781 - Loss: 1.083, Seconds: 4.58\n"
2603 | ]
2604 | },
2605 | {
2606 | "name": "stdout",
2607 | "output_type": "stream",
2608 | "text": [
2609 | "Epoch 14/100 Batch 360/781 - Loss: 1.113, Seconds: 4.67\n",
2610 | "Epoch 14/100 Batch 380/781 - Loss: 0.979, Seconds: 4.51\n",
2611 | "Epoch 14/100 Batch 400/781 - Loss: 1.119, Seconds: 4.61\n",
2612 | "Epoch 14/100 Batch 420/781 - Loss: 1.048, Seconds: 3.91\n",
2613 | "Epoch 14/100 Batch 440/781 - Loss: 1.154, Seconds: 4.63\n",
2614 | "Epoch 14/100 Batch 460/781 - Loss: 1.194, Seconds: 4.39\n",
2615 | "Epoch 14/100 Batch 480/781 - Loss: 1.065, Seconds: 4.07\n",
2616 | "Epoch 14/100 Batch 500/781 - Loss: 1.117, Seconds: 4.23\n",
2617 | "Average loss for this update: 1.101\n",
2618 | "New Record!\n",
2619 | "Epoch 14/100 Batch 520/781 - Loss: 1.005, Seconds: 3.85\n",
2620 | "Epoch 14/100 Batch 540/781 - Loss: 1.053, Seconds: 4.37\n",
2621 | "Epoch 14/100 Batch 560/781 - Loss: 1.018, Seconds: 3.95\n",
2622 | "Epoch 14/100 Batch 580/781 - Loss: 1.157, Seconds: 4.49\n",
2623 | "Epoch 14/100 Batch 600/781 - Loss: 1.189, Seconds: 4.21\n",
2624 | "Epoch 14/100 Batch 620/781 - Loss: 1.123, Seconds: 4.21\n",
2625 | "Epoch 14/100 Batch 640/781 - Loss: 1.087, Seconds: 4.49\n",
2626 | "Epoch 14/100 Batch 660/781 - Loss: 0.954, Seconds: 4.53\n",
2627 | "Epoch 14/100 Batch 680/781 - Loss: 1.025, Seconds: 4.35\n",
2628 | "Epoch 14/100 Batch 700/781 - Loss: 1.118, Seconds: 4.49\n",
2629 | "Epoch 14/100 Batch 720/781 - Loss: 1.248, Seconds: 4.71\n",
2630 | "Epoch 14/100 Batch 740/781 - Loss: 1.166, Seconds: 4.79\n",
2631 | "Epoch 14/100 Batch 760/781 - Loss: 1.166, Seconds: 4.27\n",
2632 | "Average loss for this update: 1.102\n",
2633 | "No Improvement.\n",
2634 | "Epoch 14/100 Batch 780/781 - Loss: 1.011, Seconds: 3.85\n",
2635 | "Epoch 15/100 Batch 20/781 - Loss: 1.214, Seconds: 4.19\n",
2636 | "Epoch 15/100 Batch 40/781 - Loss: 1.116, Seconds: 3.43\n",
2637 | "Epoch 15/100 Batch 60/781 - Loss: 1.103, Seconds: 4.61\n",
2638 | "Epoch 15/100 Batch 80/781 - Loss: 1.109, Seconds: 4.43\n",
2639 | "Epoch 15/100 Batch 100/781 - Loss: 0.976, Seconds: 4.29\n",
2640 | "Epoch 15/100 Batch 120/781 - Loss: 1.064, Seconds: 3.89\n",
2641 | "Epoch 15/100 Batch 140/781 - Loss: 1.001, Seconds: 3.99\n",
2642 | "Epoch 15/100 Batch 160/781 - Loss: 1.159, Seconds: 3.27\n",
2643 | "Epoch 15/100 Batch 180/781 - Loss: 1.123, Seconds: 3.79\n",
2644 | "Epoch 15/100 Batch 200/781 - Loss: 1.082, Seconds: 4.39\n",
2645 | "Epoch 15/100 Batch 220/781 - Loss: 1.071, Seconds: 4.23\n",
2646 | "Epoch 15/100 Batch 240/781 - Loss: 0.923, Seconds: 4.15\n",
2647 | "Average loss for this update: 1.074\n",
2648 | "New Record!\n",
2649 | "Epoch 15/100 Batch 260/781 - Loss: 1.008, Seconds: 4.67\n",
2650 | "Epoch 15/100 Batch 280/781 - Loss: 1.035, Seconds: 4.19\n",
2651 | "Epoch 15/100 Batch 300/781 - Loss: 1.130, Seconds: 4.13\n",
2652 | "Epoch 15/100 Batch 320/781 - Loss: 1.169, Seconds: 4.25\n",
2653 | "Epoch 15/100 Batch 340/781 - Loss: 1.041, Seconds: 4.71\n",
2654 | "Epoch 15/100 Batch 360/781 - Loss: 1.092, Seconds: 4.57\n",
2655 | "Epoch 15/100 Batch 380/781 - Loss: 0.944, Seconds: 4.49\n",
2656 | "Epoch 15/100 Batch 400/781 - Loss: 1.076, Seconds: 4.73\n",
2657 | "Epoch 15/100 Batch 420/781 - Loss: 1.013, Seconds: 3.97\n",
2658 | "Epoch 15/100 Batch 440/781 - Loss: 1.091, Seconds: 4.75\n",
2659 | "Epoch 15/100 Batch 460/781 - Loss: 1.146, Seconds: 4.49\n",
2660 | "Epoch 15/100 Batch 480/781 - Loss: 1.036, Seconds: 4.03\n",
2661 | "Epoch 15/100 Batch 500/781 - Loss: 1.069, Seconds: 4.27\n",
2662 | "Average loss for this update: 1.062\n",
2663 | "New Record!\n",
2664 | "Epoch 15/100 Batch 520/781 - Loss: 0.970, Seconds: 4.00\n",
2665 | "Epoch 15/100 Batch 540/781 - Loss: 1.020, Seconds: 4.40\n",
2666 | "Epoch 15/100 Batch 560/781 - Loss: 0.996, Seconds: 3.93\n",
2667 | "Epoch 15/100 Batch 580/781 - Loss: 1.105, Seconds: 4.41\n",
2668 | "Epoch 15/100 Batch 600/781 - Loss: 1.139, Seconds: 4.35\n",
2669 | "Epoch 15/100 Batch 620/781 - Loss: 1.069, Seconds: 4.37\n",
2670 | "Epoch 15/100 Batch 640/781 - Loss: 1.056, Seconds: 4.35\n",
2671 | "Epoch 15/100 Batch 660/781 - Loss: 0.931, Seconds: 4.43\n",
2672 | "Epoch 15/100 Batch 680/781 - Loss: 0.996, Seconds: 4.57\n",
2673 | "Epoch 15/100 Batch 700/781 - Loss: 1.086, Seconds: 4.29\n",
2674 | "Epoch 15/100 Batch 720/781 - Loss: 1.216, Seconds: 4.43\n",
2675 | "Epoch 15/100 Batch 740/781 - Loss: 1.116, Seconds: 4.77\n",
2676 | "Epoch 15/100 Batch 760/781 - Loss: 1.131, Seconds: 4.51\n",
2677 | "Average loss for this update: 1.065\n",
2678 | "No Improvement.\n",
2679 | "Epoch 15/100 Batch 780/781 - Loss: 0.985, Seconds: 3.83\n",
2680 | "Epoch 16/100 Batch 20/781 - Loss: 1.174, Seconds: 4.33\n",
2681 | "Epoch 16/100 Batch 40/781 - Loss: 1.077, Seconds: 3.51\n",
2682 | "Epoch 16/100 Batch 60/781 - Loss: 1.082, Seconds: 4.41\n",
2683 | "Epoch 16/100 Batch 80/781 - Loss: 1.067, Seconds: 4.49\n",
2684 | "Epoch 16/100 Batch 100/781 - Loss: 0.946, Seconds: 4.49\n",
2685 | "Epoch 16/100 Batch 120/781 - Loss: 1.019, Seconds: 3.83\n",
2686 | "Epoch 16/100 Batch 140/781 - Loss: 0.968, Seconds: 4.01\n",
2687 | "Epoch 16/100 Batch 160/781 - Loss: 1.127, Seconds: 3.33\n",
2688 | "Epoch 16/100 Batch 180/781 - Loss: 1.093, Seconds: 3.91\n",
2689 | "Epoch 16/100 Batch 200/781 - Loss: 1.045, Seconds: 4.43\n",
2690 | "Epoch 16/100 Batch 220/781 - Loss: 1.044, Seconds: 4.09\n",
2691 | "Epoch 16/100 Batch 240/781 - Loss: 0.892, Seconds: 4.35\n",
2692 | "Average loss for this update: 1.04\n",
2693 | "New Record!\n",
2694 | "Epoch 16/100 Batch 260/781 - Loss: 0.973, Seconds: 4.65\n",
2695 | "Epoch 16/100 Batch 280/781 - Loss: 1.016, Seconds: 4.39\n",
2696 | "Epoch 16/100 Batch 300/781 - Loss: 1.095, Seconds: 4.27\n",
2697 | "Epoch 16/100 Batch 320/781 - Loss: 1.139, Seconds: 4.05\n",
2698 | "Epoch 16/100 Batch 340/781 - Loss: 1.002, Seconds: 4.85\n",
2699 | "Epoch 16/100 Batch 360/781 - Loss: 1.058, Seconds: 4.49\n",
2700 | "Epoch 16/100 Batch 380/781 - Loss: 0.909, Seconds: 4.65\n",
2701 | "Epoch 16/100 Batch 400/781 - Loss: 1.042, Seconds: 4.63\n",
2702 | "Epoch 16/100 Batch 420/781 - Loss: 0.985, Seconds: 4.01\n",
2703 | "Epoch 16/100 Batch 440/781 - Loss: 1.053, Seconds: 4.59\n",
2704 | "Epoch 16/100 Batch 460/781 - Loss: 1.115, Seconds: 4.41\n",
2705 | "Epoch 16/100 Batch 480/781 - Loss: 1.001, Seconds: 4.27\n",
2706 | "Epoch 16/100 Batch 500/781 - Loss: 1.052, Seconds: 4.31\n",
2707 | "Average loss for this update: 1.031\n",
2708 | "New Record!\n",
2709 | "Epoch 16/100 Batch 520/781 - Loss: 0.940, Seconds: 3.91\n",
2710 | "Epoch 16/100 Batch 540/781 - Loss: 1.000, Seconds: 4.61\n",
2711 | "Epoch 16/100 Batch 560/781 - Loss: 0.961, Seconds: 4.21\n",
2712 | "Epoch 16/100 Batch 580/781 - Loss: 1.082, Seconds: 4.31\n",
2713 | "Epoch 16/100 Batch 600/781 - Loss: 1.110, Seconds: 4.33\n",
2714 | "Epoch 16/100 Batch 620/781 - Loss: 1.035, Seconds: 4.13\n",
2715 | "Epoch 16/100 Batch 640/781 - Loss: 1.026, Seconds: 4.25\n",
2716 | "Epoch 16/100 Batch 660/781 - Loss: 0.902, Seconds: 4.53\n",
2717 | "Epoch 16/100 Batch 680/781 - Loss: 0.970, Seconds: 4.41\n",
2718 | "Epoch 16/100 Batch 700/781 - Loss: 1.062, Seconds: 4.60\n",
2719 | "Epoch 16/100 Batch 720/781 - Loss: 1.192, Seconds: 4.63\n",
2720 | "Epoch 16/100 Batch 740/781 - Loss: 1.094, Seconds: 4.85\n",
2721 | "Epoch 16/100 Batch 760/781 - Loss: 1.111, Seconds: 4.49\n",
2722 | "Average loss for this update: 1.04\n",
2723 | "No Improvement.\n",
2724 | "Epoch 16/100 Batch 780/781 - Loss: 0.965, Seconds: 3.83\n",
2725 | "Epoch 17/100 Batch 20/781 - Loss: 1.147, Seconds: 4.21\n",
2726 | "Epoch 17/100 Batch 40/781 - Loss: 1.055, Seconds: 3.43\n",
2727 | "Epoch 17/100 Batch 60/781 - Loss: 1.053, Seconds: 4.41\n",
2728 | "Epoch 17/100 Batch 80/781 - Loss: 1.040, Seconds: 4.51\n",
2729 | "Epoch 17/100 Batch 100/781 - Loss: 0.918, Seconds: 4.23\n",
2730 | "Epoch 17/100 Batch 120/781 - Loss: 0.988, Seconds: 3.89\n",
2731 | "Epoch 17/100 Batch 140/781 - Loss: 0.944, Seconds: 4.09\n",
2732 | "Epoch 17/100 Batch 160/781 - Loss: 1.078, Seconds: 3.36\n",
2733 | "Epoch 17/100 Batch 180/781 - Loss: 1.053, Seconds: 3.69\n",
2734 | "Epoch 17/100 Batch 200/781 - Loss: 1.022, Seconds: 4.40\n",
2735 | "Epoch 17/100 Batch 220/781 - Loss: 1.007, Seconds: 4.20\n",
2736 | "Epoch 17/100 Batch 240/781 - Loss: 0.854, Seconds: 4.22\n",
2737 | "Average loss for this update: 1.009\n",
2738 | "New Record!\n",
2739 | "Epoch 17/100 Batch 260/781 - Loss: 0.950, Seconds: 4.57\n",
2740 | "Epoch 17/100 Batch 280/781 - Loss: 0.971, Seconds: 4.33\n",
2741 | "Epoch 17/100 Batch 300/781 - Loss: 1.070, Seconds: 4.21\n",
2742 | "Epoch 17/100 Batch 320/781 - Loss: 1.125, Seconds: 4.45\n",
2743 | "Epoch 17/100 Batch 340/781 - Loss: 0.983, Seconds: 4.49\n",
2744 | "Epoch 17/100 Batch 360/781 - Loss: 1.026, Seconds: 4.33\n",
2745 | "Epoch 17/100 Batch 380/781 - Loss: 0.881, Seconds: 4.97\n",
2746 | "Epoch 17/100 Batch 400/781 - Loss: 1.019, Seconds: 4.67\n",
2747 | "Epoch 17/100 Batch 420/781 - Loss: 0.962, Seconds: 4.03\n",
2748 | "Epoch 17/100 Batch 440/781 - Loss: 1.011, Seconds: 4.61\n",
2749 | "Epoch 17/100 Batch 460/781 - Loss: 1.076, Seconds: 4.45\n",
2750 | "Epoch 17/100 Batch 480/781 - Loss: 0.975, Seconds: 4.05\n",
2751 | "Epoch 17/100 Batch 500/781 - Loss: 1.024, Seconds: 4.23\n",
2752 | "Average loss for this update: 1.002\n",
2753 | "New Record!\n",
2754 | "Epoch 17/100 Batch 520/781 - Loss: 0.915, Seconds: 4.03\n",
2755 | "Epoch 17/100 Batch 540/781 - Loss: 0.963, Seconds: 4.41\n",
2756 | "Epoch 17/100 Batch 560/781 - Loss: 0.913, Seconds: 4.09\n",
2757 | "Epoch 17/100 Batch 580/781 - Loss: 1.042, Seconds: 4.29\n",
2758 | "Epoch 17/100 Batch 600/781 - Loss: 1.078, Seconds: 4.29\n",
2759 | "Epoch 17/100 Batch 620/781 - Loss: 1.010, Seconds: 4.01\n"
2760 | ]
2761 | },
2762 | {
2763 | "name": "stdout",
2764 | "output_type": "stream",
2765 | "text": [
2766 | "Epoch 17/100 Batch 640/781 - Loss: 1.001, Seconds: 4.23\n",
2767 | "Epoch 17/100 Batch 660/781 - Loss: 0.882, Seconds: 4.47\n",
2768 | "Epoch 17/100 Batch 680/781 - Loss: 0.947, Seconds: 4.37\n",
2769 | "Epoch 17/100 Batch 700/781 - Loss: 1.028, Seconds: 4.31\n",
2770 | "Epoch 17/100 Batch 720/781 - Loss: 1.147, Seconds: 4.53\n",
2771 | "Epoch 17/100 Batch 740/781 - Loss: 1.047, Seconds: 4.77\n",
2772 | "Epoch 17/100 Batch 760/781 - Loss: 1.062, Seconds: 4.31\n",
2773 | "Average loss for this update: 1.003\n",
2774 | "No Improvement.\n",
2775 | "Epoch 17/100 Batch 780/781 - Loss: 0.922, Seconds: 3.91\n",
2776 | "Epoch 18/100 Batch 20/781 - Loss: 1.115, Seconds: 4.41\n",
2777 | "Epoch 18/100 Batch 40/781 - Loss: 1.022, Seconds: 3.31\n",
2778 | "Epoch 18/100 Batch 60/781 - Loss: 1.016, Seconds: 4.41\n",
2779 | "Epoch 18/100 Batch 80/781 - Loss: 1.003, Seconds: 4.19\n",
2780 | "Epoch 18/100 Batch 100/781 - Loss: 0.893, Seconds: 4.23\n",
2781 | "Epoch 18/100 Batch 120/781 - Loss: 0.970, Seconds: 3.77\n",
2782 | "Epoch 18/100 Batch 140/781 - Loss: 0.917, Seconds: 4.09\n",
2783 | "Epoch 18/100 Batch 160/781 - Loss: 1.062, Seconds: 3.33\n",
2784 | "Epoch 18/100 Batch 180/781 - Loss: 1.011, Seconds: 3.73\n",
2785 | "Epoch 18/100 Batch 200/781 - Loss: 0.982, Seconds: 4.37\n",
2786 | "Epoch 18/100 Batch 220/781 - Loss: 0.982, Seconds: 4.03\n",
2787 | "Epoch 18/100 Batch 240/781 - Loss: 0.839, Seconds: 4.09\n",
2788 | "Average loss for this update: 0.98\n",
2789 | "New Record!\n",
2790 | "Epoch 18/100 Batch 260/781 - Loss: 0.918, Seconds: 4.45\n",
2791 | "Epoch 18/100 Batch 280/781 - Loss: 0.930, Seconds: 4.29\n",
2792 | "Epoch 18/100 Batch 300/781 - Loss: 1.033, Seconds: 4.07\n",
2793 | "Epoch 18/100 Batch 320/781 - Loss: 1.063, Seconds: 4.23\n",
2794 | "Epoch 18/100 Batch 340/781 - Loss: 0.951, Seconds: 4.59\n",
2795 | "Epoch 18/100 Batch 360/781 - Loss: 1.000, Seconds: 4.23\n",
2796 | "Epoch 18/100 Batch 380/781 - Loss: 0.851, Seconds: 4.55\n",
2797 | "Epoch 18/100 Batch 400/781 - Loss: 0.980, Seconds: 4.45\n",
2798 | "Epoch 18/100 Batch 420/781 - Loss: 0.916, Seconds: 3.95\n",
2799 | "Epoch 18/100 Batch 440/781 - Loss: 0.991, Seconds: 4.53\n",
2800 | "Epoch 18/100 Batch 460/781 - Loss: 1.049, Seconds: 4.35\n",
2801 | "Epoch 18/100 Batch 480/781 - Loss: 0.943, Seconds: 3.99\n",
2802 | "Epoch 18/100 Batch 500/781 - Loss: 0.993, Seconds: 4.13\n",
2803 | "Average loss for this update: 0.968\n",
2804 | "New Record!\n",
2805 | "Epoch 18/100 Batch 520/781 - Loss: 0.890, Seconds: 3.87\n",
2806 | "Epoch 18/100 Batch 540/781 - Loss: 0.939, Seconds: 4.31\n",
2807 | "Epoch 18/100 Batch 560/781 - Loss: 0.906, Seconds: 3.95\n",
2808 | "Epoch 18/100 Batch 580/781 - Loss: 1.027, Seconds: 4.31\n",
2809 | "Epoch 18/100 Batch 600/781 - Loss: 1.034, Seconds: 4.33\n",
2810 | "Epoch 18/100 Batch 620/781 - Loss: 0.987, Seconds: 4.03\n",
2811 | "Epoch 18/100 Batch 640/781 - Loss: 0.987, Seconds: 4.27\n",
2812 | "Epoch 18/100 Batch 660/781 - Loss: 0.855, Seconds: 4.53\n",
2813 | "Epoch 18/100 Batch 680/781 - Loss: 0.926, Seconds: 4.25\n",
2814 | "Epoch 18/100 Batch 700/781 - Loss: 1.014, Seconds: 4.33\n",
2815 | "Epoch 18/100 Batch 720/781 - Loss: 1.130, Seconds: 4.47\n",
2816 | "Epoch 18/100 Batch 740/781 - Loss: 1.033, Seconds: 4.69\n",
2817 | "Epoch 18/100 Batch 760/781 - Loss: 1.041, Seconds: 4.29\n",
2818 | "Average loss for this update: 0.984\n",
2819 | "No Improvement.\n",
2820 | "Epoch 18/100 Batch 780/781 - Loss: 0.910, Seconds: 3.87\n",
2821 | "Epoch 19/100 Batch 20/781 - Loss: 1.078, Seconds: 4.23\n",
2822 | "Epoch 19/100 Batch 40/781 - Loss: 1.007, Seconds: 3.37\n",
2823 | "Epoch 19/100 Batch 60/781 - Loss: 0.987, Seconds: 4.41\n",
2824 | "Epoch 19/100 Batch 80/781 - Loss: 0.983, Seconds: 4.21\n",
2825 | "Epoch 19/100 Batch 100/781 - Loss: 0.880, Seconds: 4.15\n",
2826 | "Epoch 19/100 Batch 120/781 - Loss: 0.949, Seconds: 3.79\n",
2827 | "Epoch 19/100 Batch 140/781 - Loss: 0.886, Seconds: 3.99\n",
2828 | "Epoch 19/100 Batch 160/781 - Loss: 1.047, Seconds: 3.33\n",
2829 | "Epoch 19/100 Batch 180/781 - Loss: 1.013, Seconds: 3.65\n",
2830 | "Epoch 19/100 Batch 200/781 - Loss: 0.972, Seconds: 4.27\n",
2831 | "Epoch 19/100 Batch 220/781 - Loss: 0.967, Seconds: 4.31\n",
2832 | "Epoch 19/100 Batch 240/781 - Loss: 0.824, Seconds: 4.12\n",
2833 | "Average loss for this update: 0.962\n",
2834 | "New Record!\n",
2835 | "Epoch 19/100 Batch 260/781 - Loss: 0.912, Seconds: 4.67\n",
2836 | "Epoch 19/100 Batch 280/781 - Loss: 0.927, Seconds: 4.35\n",
2837 | "Epoch 19/100 Batch 300/781 - Loss: 1.008, Seconds: 4.09\n",
2838 | "Epoch 19/100 Batch 320/781 - Loss: 1.041, Seconds: 4.11\n",
2839 | "Epoch 19/100 Batch 340/781 - Loss: 0.929, Seconds: 4.63\n",
2840 | "Epoch 19/100 Batch 360/781 - Loss: 1.007, Seconds: 4.51\n",
2841 | "Epoch 19/100 Batch 380/781 - Loss: 0.860, Seconds: 4.61\n",
2842 | "Epoch 19/100 Batch 400/781 - Loss: 0.965, Seconds: 4.67\n",
2843 | "Epoch 19/100 Batch 420/781 - Loss: 0.903, Seconds: 3.85\n",
2844 | "Epoch 19/100 Batch 440/781 - Loss: 0.976, Seconds: 4.73\n",
2845 | "Epoch 19/100 Batch 460/781 - Loss: 1.033, Seconds: 4.45\n",
2846 | "Epoch 19/100 Batch 480/781 - Loss: 0.912, Seconds: 4.11\n",
2847 | "Epoch 19/100 Batch 500/781 - Loss: 0.976, Seconds: 4.31\n",
2848 | "Average loss for this update: 0.954\n",
2849 | "New Record!\n",
2850 | "Epoch 19/100 Batch 520/781 - Loss: 0.868, Seconds: 4.03\n",
2851 | "Epoch 19/100 Batch 540/781 - Loss: 0.922, Seconds: 4.59\n",
2852 | "Epoch 19/100 Batch 560/781 - Loss: 0.893, Seconds: 4.07\n",
2853 | "Epoch 19/100 Batch 580/781 - Loss: 1.004, Seconds: 4.57\n",
2854 | "Epoch 19/100 Batch 600/781 - Loss: 1.024, Seconds: 4.43\n",
2855 | "Epoch 19/100 Batch 620/781 - Loss: 0.947, Seconds: 4.23\n",
2856 | "Epoch 19/100 Batch 640/781 - Loss: 0.949, Seconds: 4.55\n",
2857 | "Epoch 19/100 Batch 660/781 - Loss: 0.840, Seconds: 4.59\n",
2858 | "Epoch 19/100 Batch 680/781 - Loss: 0.917, Seconds: 4.31\n",
2859 | "Epoch 19/100 Batch 700/781 - Loss: 0.994, Seconds: 4.47\n",
2860 | "Epoch 19/100 Batch 720/781 - Loss: 1.095, Seconds: 4.61\n",
2861 | "Epoch 19/100 Batch 740/781 - Loss: 0.991, Seconds: 4.73\n",
2862 | "Epoch 19/100 Batch 760/781 - Loss: 1.018, Seconds: 4.45\n",
2863 | "Average loss for this update: 0.96\n",
2864 | "No Improvement.\n",
2865 | "Epoch 19/100 Batch 780/781 - Loss: 0.882, Seconds: 4.09\n",
2866 | "Epoch 20/100 Batch 20/781 - Loss: 1.101, Seconds: 4.27\n",
2867 | "Epoch 20/100 Batch 40/781 - Loss: 0.987, Seconds: 3.55\n",
2868 | "Epoch 20/100 Batch 60/781 - Loss: 0.966, Seconds: 4.51\n",
2869 | "Epoch 20/100 Batch 80/781 - Loss: 0.978, Seconds: 4.27\n",
2870 | "Epoch 20/100 Batch 100/781 - Loss: 0.863, Seconds: 4.31\n",
2871 | "Epoch 20/100 Batch 120/781 - Loss: 0.927, Seconds: 3.91\n",
2872 | "Epoch 20/100 Batch 140/781 - Loss: 0.872, Seconds: 4.03\n",
2873 | "Epoch 20/100 Batch 160/781 - Loss: 1.023, Seconds: 3.25\n",
2874 | "Epoch 20/100 Batch 180/781 - Loss: 0.974, Seconds: 3.89\n",
2875 | "Epoch 20/100 Batch 200/781 - Loss: 0.935, Seconds: 4.39\n",
2876 | "Epoch 20/100 Batch 220/781 - Loss: 0.957, Seconds: 4.11\n",
2877 | "Epoch 20/100 Batch 240/781 - Loss: 0.805, Seconds: 4.15\n",
2878 | "Average loss for this update: 0.946\n",
2879 | "New Record!\n",
2880 | "Epoch 20/100 Batch 260/781 - Loss: 0.898, Seconds: 4.69\n",
2881 | "Epoch 20/100 Batch 280/781 - Loss: 0.909, Seconds: 4.50\n",
2882 | "Epoch 20/100 Batch 300/781 - Loss: 0.989, Seconds: 4.23\n",
2883 | "Epoch 20/100 Batch 320/781 - Loss: 1.021, Seconds: 4.23\n",
2884 | "Epoch 20/100 Batch 340/781 - Loss: 0.911, Seconds: 4.71\n",
2885 | "Epoch 20/100 Batch 360/781 - Loss: 0.960, Seconds: 4.53\n",
2886 | "Epoch 20/100 Batch 380/781 - Loss: 0.824, Seconds: 4.61\n",
2887 | "Epoch 20/100 Batch 400/781 - Loss: 0.934, Seconds: 4.73\n",
2888 | "Epoch 20/100 Batch 420/781 - Loss: 0.874, Seconds: 4.03\n",
2889 | "Epoch 20/100 Batch 440/781 - Loss: 0.921, Seconds: 4.77\n",
2890 | "Epoch 20/100 Batch 460/781 - Loss: 0.990, Seconds: 4.51\n",
2891 | "Epoch 20/100 Batch 480/781 - Loss: 0.893, Seconds: 4.03\n",
2892 | "Epoch 20/100 Batch 500/781 - Loss: 0.938, Seconds: 4.17\n",
2893 | "Average loss for this update: 0.923\n",
2894 | "New Record!\n",
2895 | "Epoch 20/100 Batch 520/781 - Loss: 0.842, Seconds: 4.01\n",
2896 | "Epoch 20/100 Batch 540/781 - Loss: 0.898, Seconds: 4.63\n",
2897 | "Epoch 20/100 Batch 560/781 - Loss: 0.860, Seconds: 4.07\n",
2898 | "Epoch 20/100 Batch 580/781 - Loss: 0.974, Seconds: 4.37\n",
2899 | "Epoch 20/100 Batch 600/781 - Loss: 0.980, Seconds: 4.47\n",
2900 | "Epoch 20/100 Batch 620/781 - Loss: 0.929, Seconds: 4.17\n",
2901 | "Epoch 20/100 Batch 640/781 - Loss: 0.922, Seconds: 4.29\n",
2902 | "Epoch 20/100 Batch 660/781 - Loss: 0.827, Seconds: 4.47\n",
2903 | "Epoch 20/100 Batch 680/781 - Loss: 0.882, Seconds: 4.27\n",
2904 | "Epoch 20/100 Batch 700/781 - Loss: 0.960, Seconds: 4.31\n",
2905 | "Epoch 20/100 Batch 720/781 - Loss: 1.073, Seconds: 4.61\n",
2906 | "Epoch 20/100 Batch 740/781 - Loss: 0.974, Seconds: 4.65\n",
2907 | "Epoch 20/100 Batch 760/781 - Loss: 0.989, Seconds: 4.23\n",
2908 | "Average loss for this update: 0.933\n",
2909 | "No Improvement.\n",
2910 | "Epoch 20/100 Batch 780/781 - Loss: 0.858, Seconds: 3.91\n",
2911 | "Epoch 21/100 Batch 20/781 - Loss: 1.041, Seconds: 4.27\n",
2912 | "Epoch 21/100 Batch 40/781 - Loss: 0.954, Seconds: 3.39\n",
2913 | "Epoch 21/100 Batch 60/781 - Loss: 0.946, Seconds: 4.37\n",
2914 | "Epoch 21/100 Batch 80/781 - Loss: 0.942, Seconds: 4.33\n",
2915 | "Epoch 21/100 Batch 100/781 - Loss: 0.827, Seconds: 4.53\n",
2916 | "Epoch 21/100 Batch 120/781 - Loss: 0.897, Seconds: 3.81\n"
2917 | ]
2918 | },
2919 | {
2920 | "name": "stdout",
2921 | "output_type": "stream",
2922 | "text": [
2923 | "Epoch 21/100 Batch 140/781 - Loss: 0.851, Seconds: 4.17\n",
2924 | "Epoch 21/100 Batch 160/781 - Loss: 0.983, Seconds: 3.27\n",
2925 | "Epoch 21/100 Batch 180/781 - Loss: 0.949, Seconds: 3.59\n",
2926 | "Epoch 21/100 Batch 200/781 - Loss: 0.904, Seconds: 4.35\n",
2927 | "Epoch 21/100 Batch 220/781 - Loss: 0.918, Seconds: 4.11\n",
2928 | "Epoch 21/100 Batch 240/781 - Loss: 0.773, Seconds: 4.01\n",
2929 | "Average loss for this update: 0.913\n",
2930 | "New Record!\n",
2931 | "Epoch 21/100 Batch 260/781 - Loss: 0.874, Seconds: 4.55\n",
2932 | "Epoch 21/100 Batch 280/781 - Loss: 0.875, Seconds: 4.27\n",
2933 | "Epoch 21/100 Batch 300/781 - Loss: 0.943, Seconds: 4.15\n",
2934 | "Epoch 21/100 Batch 320/781 - Loss: 0.996, Seconds: 4.25\n",
2935 | "Epoch 21/100 Batch 340/781 - Loss: 0.879, Seconds: 4.65\n",
2936 | "Epoch 21/100 Batch 360/781 - Loss: 0.925, Seconds: 4.41\n",
2937 | "Epoch 21/100 Batch 380/781 - Loss: 0.787, Seconds: 4.57\n",
2938 | "Epoch 21/100 Batch 400/781 - Loss: 0.915, Seconds: 4.57\n",
2939 | "Epoch 21/100 Batch 420/781 - Loss: 0.850, Seconds: 4.07\n",
2940 | "Epoch 21/100 Batch 440/781 - Loss: 0.908, Seconds: 4.73\n",
2941 | "Epoch 21/100 Batch 460/781 - Loss: 0.966, Seconds: 4.61\n",
2942 | "Epoch 21/100 Batch 480/781 - Loss: 0.866, Seconds: 3.97\n",
2943 | "Epoch 21/100 Batch 500/781 - Loss: 0.915, Seconds: 4.51\n",
2944 | "Average loss for this update: 0.894\n",
2945 | "New Record!\n",
2946 | "Epoch 21/100 Batch 520/781 - Loss: 0.812, Seconds: 3.93\n",
2947 | "Epoch 21/100 Batch 540/781 - Loss: 0.880, Seconds: 4.59\n",
2948 | "Epoch 21/100 Batch 560/781 - Loss: 0.843, Seconds: 4.11\n",
2949 | "Epoch 21/100 Batch 580/781 - Loss: 0.945, Seconds: 4.31\n",
2950 | "Epoch 21/100 Batch 600/781 - Loss: 0.969, Seconds: 4.56\n",
2951 | "Epoch 21/100 Batch 620/781 - Loss: 0.906, Seconds: 4.19\n",
2952 | "Epoch 21/100 Batch 640/781 - Loss: 0.896, Seconds: 4.25\n",
2953 | "Epoch 21/100 Batch 660/781 - Loss: 0.793, Seconds: 4.51\n",
2954 | "Epoch 21/100 Batch 680/781 - Loss: 0.849, Seconds: 4.21\n",
2955 | "Epoch 21/100 Batch 700/781 - Loss: 0.944, Seconds: 4.25\n",
2956 | "Epoch 21/100 Batch 720/781 - Loss: 1.048, Seconds: 4.51\n",
2957 | "Epoch 21/100 Batch 740/781 - Loss: 0.944, Seconds: 4.71\n",
2958 | "Epoch 21/100 Batch 760/781 - Loss: 0.972, Seconds: 4.45\n",
2959 | "Average loss for this update: 0.911\n",
2960 | "No Improvement.\n",
2961 | "Epoch 21/100 Batch 780/781 - Loss: 0.850, Seconds: 4.11\n",
2962 | "Epoch 22/100 Batch 20/781 - Loss: 1.015, Seconds: 4.25\n",
2963 | "Epoch 22/100 Batch 40/781 - Loss: 0.936, Seconds: 3.51\n",
2964 | "Epoch 22/100 Batch 60/781 - Loss: 0.925, Seconds: 4.27\n",
2965 | "Epoch 22/100 Batch 80/781 - Loss: 0.913, Seconds: 4.15\n",
2966 | "Epoch 22/100 Batch 100/781 - Loss: 0.818, Seconds: 4.25\n",
2967 | "Epoch 22/100 Batch 120/781 - Loss: 0.871, Seconds: 3.81\n",
2968 | "Epoch 22/100 Batch 140/781 - Loss: 0.843, Seconds: 4.05\n",
2969 | "Epoch 22/100 Batch 160/781 - Loss: 0.968, Seconds: 3.33\n",
2970 | "Epoch 22/100 Batch 180/781 - Loss: 0.926, Seconds: 3.69\n",
2971 | "Epoch 22/100 Batch 200/781 - Loss: 0.896, Seconds: 4.29\n",
2972 | "Epoch 22/100 Batch 220/781 - Loss: 0.891, Seconds: 4.09\n",
2973 | "Epoch 22/100 Batch 240/781 - Loss: 0.756, Seconds: 4.11\n",
2974 | "Average loss for this update: 0.893\n",
2975 | "New Record!\n",
2976 | "Epoch 22/100 Batch 260/781 - Loss: 0.849, Seconds: 4.61\n",
2977 | "Epoch 22/100 Batch 280/781 - Loss: 0.846, Seconds: 4.27\n",
2978 | "Epoch 22/100 Batch 300/781 - Loss: 0.935, Seconds: 4.09\n",
2979 | "Epoch 22/100 Batch 320/781 - Loss: 0.980, Seconds: 4.25\n",
2980 | "Epoch 22/100 Batch 340/781 - Loss: 0.857, Seconds: 4.55\n",
2981 | "Epoch 22/100 Batch 360/781 - Loss: 0.914, Seconds: 4.67\n",
2982 | "Epoch 22/100 Batch 380/781 - Loss: 0.786, Seconds: 4.63\n",
2983 | "Epoch 22/100 Batch 400/781 - Loss: 0.895, Seconds: 4.61\n",
2984 | "Epoch 22/100 Batch 420/781 - Loss: 0.837, Seconds: 3.91\n",
2985 | "Epoch 22/100 Batch 440/781 - Loss: 0.901, Seconds: 4.63\n",
2986 | "Epoch 22/100 Batch 460/781 - Loss: 0.944, Seconds: 4.39\n",
2987 | "Epoch 22/100 Batch 480/781 - Loss: 0.842, Seconds: 3.97\n",
2988 | "Epoch 22/100 Batch 500/781 - Loss: 0.908, Seconds: 4.15\n",
2989 | "Average loss for this update: 0.88\n",
2990 | "New Record!\n",
2991 | "Epoch 22/100 Batch 520/781 - Loss: 0.796, Seconds: 3.85\n",
2992 | "Epoch 22/100 Batch 540/781 - Loss: 0.856, Seconds: 4.37\n",
2993 | "Epoch 22/100 Batch 560/781 - Loss: 0.817, Seconds: 4.01\n",
2994 | "Epoch 22/100 Batch 580/781 - Loss: 0.931, Seconds: 4.31\n",
2995 | "Epoch 22/100 Batch 600/781 - Loss: 0.962, Seconds: 4.27\n",
2996 | "Epoch 22/100 Batch 620/781 - Loss: 0.903, Seconds: 4.03\n",
2997 | "Epoch 22/100 Batch 640/781 - Loss: 0.872, Seconds: 4.13\n",
2998 | "Epoch 22/100 Batch 660/781 - Loss: 0.788, Seconds: 4.43\n",
2999 | "Epoch 22/100 Batch 680/781 - Loss: 0.839, Seconds: 4.37\n",
3000 | "Epoch 22/100 Batch 700/781 - Loss: 0.928, Seconds: 4.31\n",
3001 | "Epoch 22/100 Batch 720/781 - Loss: 1.030, Seconds: 4.49\n",
3002 | "Epoch 22/100 Batch 740/781 - Loss: 0.934, Seconds: 4.85\n",
3003 | "Epoch 22/100 Batch 760/781 - Loss: 0.955, Seconds: 4.33\n",
3004 | "Average loss for this update: 0.895\n",
3005 | "No Improvement.\n",
3006 | "Epoch 22/100 Batch 780/781 - Loss: 0.830, Seconds: 3.89\n",
3007 | "Epoch 23/100 Batch 20/781 - Loss: 0.994, Seconds: 4.33\n",
3008 | "Epoch 23/100 Batch 40/781 - Loss: 0.901, Seconds: 3.47\n",
3009 | "Epoch 23/100 Batch 60/781 - Loss: 0.897, Seconds: 4.35\n",
3010 | "Epoch 23/100 Batch 80/781 - Loss: 0.893, Seconds: 4.27\n",
3011 | "Epoch 23/100 Batch 100/781 - Loss: 0.796, Seconds: 4.31\n",
3012 | "Epoch 23/100 Batch 120/781 - Loss: 0.861, Seconds: 3.71\n",
3013 | "Epoch 23/100 Batch 140/781 - Loss: 0.816, Seconds: 4.05\n",
3014 | "Epoch 23/100 Batch 160/781 - Loss: 0.952, Seconds: 3.41\n",
3015 | "Epoch 23/100 Batch 180/781 - Loss: 0.905, Seconds: 3.65\n",
3016 | "Epoch 23/100 Batch 200/781 - Loss: 0.885, Seconds: 4.23\n",
3017 | "Epoch 23/100 Batch 220/781 - Loss: 0.880, Seconds: 4.15\n",
3018 | "Epoch 23/100 Batch 240/781 - Loss: 0.738, Seconds: 4.13\n",
3019 | "Average loss for this update: 0.874\n",
3020 | "New Record!\n",
3021 | "Epoch 23/100 Batch 260/781 - Loss: 0.833, Seconds: 4.49\n",
3022 | "Epoch 23/100 Batch 280/781 - Loss: 0.838, Seconds: 4.29\n",
3023 | "Epoch 23/100 Batch 300/781 - Loss: 0.915, Seconds: 4.25\n",
3024 | "Epoch 23/100 Batch 320/781 - Loss: 0.953, Seconds: 4.13\n",
3025 | "Epoch 23/100 Batch 340/781 - Loss: 0.847, Seconds: 4.49\n",
3026 | "Epoch 23/100 Batch 360/781 - Loss: 0.889, Seconds: 4.33\n",
3027 | "Epoch 23/100 Batch 380/781 - Loss: 0.775, Seconds: 4.55\n",
3028 | "Epoch 23/100 Batch 400/781 - Loss: 0.904, Seconds: 4.59\n",
3029 | "Epoch 23/100 Batch 420/781 - Loss: 0.824, Seconds: 3.95\n",
3030 | "Epoch 23/100 Batch 440/781 - Loss: 0.883, Seconds: 4.57\n",
3031 | "Epoch 23/100 Batch 460/781 - Loss: 0.936, Seconds: 4.47\n",
3032 | "Epoch 23/100 Batch 480/781 - Loss: 0.862, Seconds: 3.85\n",
3033 | "Epoch 23/100 Batch 500/781 - Loss: 0.913, Seconds: 4.29\n",
3034 | "Average loss for this update: 0.871\n",
3035 | "New Record!\n",
3036 | "Epoch 23/100 Batch 520/781 - Loss: 0.793, Seconds: 3.91\n",
3037 | "Epoch 23/100 Batch 540/781 - Loss: 0.833, Seconds: 4.41\n",
3038 | "Epoch 23/100 Batch 560/781 - Loss: 0.812, Seconds: 3.97\n",
3039 | "Epoch 23/100 Batch 580/781 - Loss: 0.905, Seconds: 4.25\n",
3040 | "Epoch 23/100 Batch 600/781 - Loss: 0.934, Seconds: 4.19\n",
3041 | "Epoch 23/100 Batch 620/781 - Loss: 0.874, Seconds: 4.07\n",
3042 | "Epoch 23/100 Batch 640/781 - Loss: 0.867, Seconds: 4.37\n",
3043 | "Epoch 23/100 Batch 660/781 - Loss: 0.762, Seconds: 4.41\n",
3044 | "Epoch 23/100 Batch 680/781 - Loss: 0.821, Seconds: 4.25\n",
3045 | "Epoch 23/100 Batch 700/781 - Loss: 0.917, Seconds: 4.37\n",
3046 | "Epoch 23/100 Batch 720/781 - Loss: 1.014, Seconds: 4.47\n",
3047 | "Epoch 23/100 Batch 740/781 - Loss: 0.926, Seconds: 4.61\n",
3048 | "Epoch 23/100 Batch 760/781 - Loss: 0.963, Seconds: 4.35\n",
3049 | "Average loss for this update: 0.881\n",
3050 | "No Improvement.\n",
3051 | "Epoch 23/100 Batch 780/781 - Loss: 0.827, Seconds: 3.87\n",
3052 | "Epoch 24/100 Batch 20/781 - Loss: 0.965, Seconds: 4.19\n",
3053 | "Epoch 24/100 Batch 40/781 - Loss: 0.886, Seconds: 3.41\n",
3054 | "Epoch 24/100 Batch 60/781 - Loss: 0.874, Seconds: 4.35\n",
3055 | "Epoch 24/100 Batch 80/781 - Loss: 0.885, Seconds: 4.19\n",
3056 | "Epoch 24/100 Batch 100/781 - Loss: 0.778, Seconds: 4.18\n",
3057 | "Epoch 24/100 Batch 120/781 - Loss: 0.851, Seconds: 3.73\n",
3058 | "Epoch 24/100 Batch 140/781 - Loss: 0.797, Seconds: 4.05\n",
3059 | "Epoch 24/100 Batch 160/781 - Loss: 0.932, Seconds: 3.29\n",
3060 | "Epoch 24/100 Batch 180/781 - Loss: 0.893, Seconds: 3.63\n",
3061 | "Epoch 24/100 Batch 200/781 - Loss: 0.867, Seconds: 4.21\n",
3062 | "Epoch 24/100 Batch 220/781 - Loss: 0.865, Seconds: 4.05\n",
3063 | "Epoch 24/100 Batch 240/781 - Loss: 0.719, Seconds: 4.03\n",
3064 | "Average loss for this update: 0.856\n",
3065 | "New Record!\n",
3066 | "Epoch 24/100 Batch 260/781 - Loss: 0.811, Seconds: 4.65\n",
3067 | "Epoch 24/100 Batch 280/781 - Loss: 0.830, Seconds: 4.31\n",
3068 | "Epoch 24/100 Batch 300/781 - Loss: 0.902, Seconds: 4.23\n",
3069 | "Epoch 24/100 Batch 320/781 - Loss: 0.935, Seconds: 4.23\n",
3070 | "Epoch 24/100 Batch 340/781 - Loss: 0.819, Seconds: 4.49\n",
3071 | "Epoch 24/100 Batch 360/781 - Loss: 0.871, Seconds: 4.31\n",
3072 | "Epoch 24/100 Batch 380/781 - Loss: 0.738, Seconds: 4.67\n",
3073 | "Epoch 24/100 Batch 400/781 - Loss: 0.857, Seconds: 4.53\n"
3074 | ]
3075 | },
3076 | {
3077 | "name": "stdout",
3078 | "output_type": "stream",
3079 | "text": [
3080 | "Epoch 24/100 Batch 420/781 - Loss: 0.796, Seconds: 3.85\n",
3081 | "Epoch 24/100 Batch 440/781 - Loss: 0.865, Seconds: 4.51\n",
3082 | "Epoch 24/100 Batch 460/781 - Loss: 0.914, Seconds: 4.45\n",
3083 | "Epoch 24/100 Batch 480/781 - Loss: 0.823, Seconds: 3.89\n",
3084 | "Epoch 24/100 Batch 500/781 - Loss: 0.879, Seconds: 4.27\n",
3085 | "Average loss for this update: 0.846\n",
3086 | "New Record!\n",
3087 | "Epoch 24/100 Batch 520/781 - Loss: 0.766, Seconds: 3.89\n",
3088 | "Epoch 24/100 Batch 540/781 - Loss: 0.818, Seconds: 4.47\n",
3089 | "Epoch 24/100 Batch 560/781 - Loss: 0.784, Seconds: 4.13\n",
3090 | "Epoch 24/100 Batch 580/781 - Loss: 0.897, Seconds: 4.19\n",
3091 | "Epoch 24/100 Batch 600/781 - Loss: 0.902, Seconds: 4.43\n",
3092 | "Epoch 24/100 Batch 620/781 - Loss: 0.859, Seconds: 4.15\n",
3093 | "Epoch 24/100 Batch 640/781 - Loss: 0.847, Seconds: 4.17\n",
3094 | "Epoch 24/100 Batch 660/781 - Loss: 0.744, Seconds: 4.47\n",
3095 | "Epoch 24/100 Batch 680/781 - Loss: 0.788, Seconds: 4.31\n",
3096 | "Epoch 24/100 Batch 700/781 - Loss: 0.886, Seconds: 4.31\n",
3097 | "Epoch 24/100 Batch 720/781 - Loss: 0.987, Seconds: 4.43\n",
3098 | "Epoch 24/100 Batch 740/781 - Loss: 0.898, Seconds: 4.79\n",
3099 | "Epoch 24/100 Batch 760/781 - Loss: 0.924, Seconds: 4.33\n",
3100 | "Average loss for this update: 0.857\n",
3101 | "No Improvement.\n",
3102 | "Epoch 24/100 Batch 780/781 - Loss: 0.808, Seconds: 3.99\n",
3103 | "Epoch 25/100 Batch 20/781 - Loss: 0.941, Seconds: 4.21\n",
3104 | "Epoch 25/100 Batch 40/781 - Loss: 0.873, Seconds: 3.41\n",
3105 | "Epoch 25/100 Batch 60/781 - Loss: 0.868, Seconds: 4.49\n",
3106 | "Epoch 25/100 Batch 80/781 - Loss: 0.858, Seconds: 4.25\n",
3107 | "Epoch 25/100 Batch 100/781 - Loss: 0.761, Seconds: 4.25\n",
3108 | "Epoch 25/100 Batch 120/781 - Loss: 0.809, Seconds: 3.85\n",
3109 | "Epoch 25/100 Batch 140/781 - Loss: 0.784, Seconds: 3.99\n",
3110 | "Epoch 25/100 Batch 160/781 - Loss: 0.925, Seconds: 3.25\n",
3111 | "Epoch 25/100 Batch 180/781 - Loss: 0.886, Seconds: 3.77\n",
3112 | "Epoch 25/100 Batch 200/781 - Loss: 0.860, Seconds: 4.17\n",
3113 | "Epoch 25/100 Batch 220/781 - Loss: 0.846, Seconds: 4.03\n",
3114 | "Epoch 25/100 Batch 240/781 - Loss: 0.709, Seconds: 4.19\n",
3115 | "Average loss for this update: 0.84\n",
3116 | "New Record!\n",
3117 | "Epoch 25/100 Batch 260/781 - Loss: 0.790, Seconds: 4.61\n",
3118 | "Epoch 25/100 Batch 280/781 - Loss: 0.822, Seconds: 4.27\n",
3119 | "Epoch 25/100 Batch 300/781 - Loss: 0.880, Seconds: 4.09\n",
3120 | "Epoch 25/100 Batch 320/781 - Loss: 0.922, Seconds: 4.09\n",
3121 | "Epoch 25/100 Batch 340/781 - Loss: 0.816, Seconds: 4.59\n",
3122 | "Epoch 25/100 Batch 360/781 - Loss: 0.845, Seconds: 4.31\n",
3123 | "Epoch 25/100 Batch 380/781 - Loss: 0.732, Seconds: 4.51\n",
3124 | "Epoch 25/100 Batch 400/781 - Loss: 0.851, Seconds: 4.47\n",
3125 | "Epoch 25/100 Batch 420/781 - Loss: 0.787, Seconds: 3.91\n",
3126 | "Epoch 25/100 Batch 440/781 - Loss: 0.848, Seconds: 4.51\n",
3127 | "Epoch 25/100 Batch 460/781 - Loss: 0.897, Seconds: 4.39\n",
3128 | "Epoch 25/100 Batch 480/781 - Loss: 0.808, Seconds: 3.97\n",
3129 | "Epoch 25/100 Batch 500/781 - Loss: 0.867, Seconds: 4.23\n",
3130 | "Average loss for this update: 0.832\n",
3131 | "New Record!\n",
3132 | "Epoch 25/100 Batch 520/781 - Loss: 0.746, Seconds: 3.95\n",
3133 | "Epoch 25/100 Batch 540/781 - Loss: 0.806, Seconds: 4.33\n",
3134 | "Epoch 25/100 Batch 560/781 - Loss: 0.777, Seconds: 4.09\n",
3135 | "Epoch 25/100 Batch 580/781 - Loss: 0.870, Seconds: 4.15\n",
3136 | "Epoch 25/100 Batch 600/781 - Loss: 0.891, Seconds: 4.23\n",
3137 | "Epoch 25/100 Batch 620/781 - Loss: 0.844, Seconds: 4.15\n",
3138 | "Epoch 25/100 Batch 640/781 - Loss: 0.840, Seconds: 4.23\n",
3139 | "Epoch 25/100 Batch 660/781 - Loss: 0.743, Seconds: 4.45\n",
3140 | "Epoch 25/100 Batch 680/781 - Loss: 0.783, Seconds: 4.17\n",
3141 | "Epoch 25/100 Batch 700/781 - Loss: 0.871, Seconds: 4.39\n",
3142 | "Epoch 25/100 Batch 720/781 - Loss: 0.966, Seconds: 4.43\n",
3143 | "Epoch 25/100 Batch 740/781 - Loss: 0.879, Seconds: 4.67\n",
3144 | "Epoch 25/100 Batch 760/781 - Loss: 0.910, Seconds: 4.31\n",
3145 | "Average loss for this update: 0.843\n",
3146 | "No Improvement.\n",
3147 | "Epoch 25/100 Batch 780/781 - Loss: 0.786, Seconds: 3.91\n",
3148 | "Epoch 26/100 Batch 20/781 - Loss: 0.923, Seconds: 4.27\n",
3149 | "Epoch 26/100 Batch 40/781 - Loss: 0.857, Seconds: 3.35\n",
3150 | "Epoch 26/100 Batch 60/781 - Loss: 0.850, Seconds: 4.35\n",
3151 | "Epoch 26/100 Batch 80/781 - Loss: 0.843, Seconds: 4.37\n",
3152 | "Epoch 26/100 Batch 100/781 - Loss: 0.743, Seconds: 4.27\n",
3153 | "Epoch 26/100 Batch 120/781 - Loss: 0.816, Seconds: 3.77\n",
3154 | "Epoch 26/100 Batch 140/781 - Loss: 0.775, Seconds: 3.97\n",
3155 | "Epoch 26/100 Batch 160/781 - Loss: 0.898, Seconds: 3.25\n",
3156 | "Epoch 26/100 Batch 180/781 - Loss: 0.877, Seconds: 3.71\n",
3157 | "Epoch 26/100 Batch 200/781 - Loss: 0.820, Seconds: 4.27\n",
3158 | "Epoch 26/100 Batch 220/781 - Loss: 0.836, Seconds: 3.97\n",
3159 | "Epoch 26/100 Batch 240/781 - Loss: 0.703, Seconds: 4.03\n",
3160 | "Average loss for this update: 0.825\n",
3161 | "New Record!\n",
3162 | "Epoch 26/100 Batch 260/781 - Loss: 0.785, Seconds: 4.65\n",
3163 | "Epoch 26/100 Batch 280/781 - Loss: 0.797, Seconds: 4.33\n",
3164 | "Epoch 26/100 Batch 300/781 - Loss: 0.868, Seconds: 4.09\n",
3165 | "Epoch 26/100 Batch 320/781 - Loss: 0.898, Seconds: 4.13\n",
3166 | "Epoch 26/100 Batch 340/781 - Loss: 0.794, Seconds: 4.71\n",
3167 | "Epoch 26/100 Batch 360/781 - Loss: 0.831, Seconds: 4.35\n",
3168 | "Epoch 26/100 Batch 380/781 - Loss: 0.710, Seconds: 4.59\n",
3169 | "Epoch 26/100 Batch 400/781 - Loss: 0.826, Seconds: 4.51\n",
3170 | "Epoch 26/100 Batch 420/781 - Loss: 0.773, Seconds: 3.85\n",
3171 | "Epoch 26/100 Batch 440/781 - Loss: 0.824, Seconds: 4.55\n",
3172 | "Epoch 26/100 Batch 460/781 - Loss: 0.861, Seconds: 4.31\n",
3173 | "Epoch 26/100 Batch 480/781 - Loss: 0.785, Seconds: 3.93\n",
3174 | "Epoch 26/100 Batch 500/781 - Loss: 0.842, Seconds: 4.21\n",
3175 | "Average loss for this update: 0.811\n",
3176 | "New Record!\n",
3177 | "Epoch 26/100 Batch 520/781 - Loss: 0.744, Seconds: 3.99\n",
3178 | "Epoch 26/100 Batch 540/781 - Loss: 0.791, Seconds: 4.33\n",
3179 | "Epoch 26/100 Batch 560/781 - Loss: 0.762, Seconds: 4.01\n",
3180 | "Epoch 26/100 Batch 580/781 - Loss: 0.857, Seconds: 4.23\n",
3181 | "Epoch 26/100 Batch 600/781 - Loss: 0.875, Seconds: 4.25\n",
3182 | "Epoch 26/100 Batch 620/781 - Loss: 0.818, Seconds: 4.09\n",
3183 | "Epoch 26/100 Batch 640/781 - Loss: 0.813, Seconds: 4.33\n",
3184 | "Epoch 26/100 Batch 660/781 - Loss: 0.710, Seconds: 4.51\n",
3185 | "Epoch 26/100 Batch 680/781 - Loss: 0.775, Seconds: 4.37\n",
3186 | "Epoch 26/100 Batch 700/781 - Loss: 0.865, Seconds: 4.19\n",
3187 | "Epoch 26/100 Batch 720/781 - Loss: 0.955, Seconds: 4.51\n",
3188 | "Epoch 26/100 Batch 740/781 - Loss: 0.865, Seconds: 4.77\n",
3189 | "Epoch 26/100 Batch 760/781 - Loss: 0.887, Seconds: 4.33\n",
3190 | "Average loss for this update: 0.827\n",
3191 | "No Improvement.\n",
3192 | "Epoch 26/100 Batch 780/781 - Loss: 0.771, Seconds: 3.85\n",
3193 | "Epoch 27/100 Batch 20/781 - Loss: 0.913, Seconds: 4.17\n",
3194 | "Epoch 27/100 Batch 40/781 - Loss: 0.848, Seconds: 3.47\n",
3195 | "Epoch 27/100 Batch 60/781 - Loss: 0.840, Seconds: 4.37\n",
3196 | "Epoch 27/100 Batch 80/781 - Loss: 0.842, Seconds: 4.23\n",
3197 | "Epoch 27/100 Batch 100/781 - Loss: 0.741, Seconds: 4.23\n",
3198 | "Epoch 27/100 Batch 120/781 - Loss: 0.789, Seconds: 3.77\n",
3199 | "Epoch 27/100 Batch 140/781 - Loss: 0.757, Seconds: 4.07\n",
3200 | "Epoch 27/100 Batch 160/781 - Loss: 0.889, Seconds: 3.21\n",
3201 | "Epoch 27/100 Batch 180/781 - Loss: 0.848, Seconds: 3.71\n",
3202 | "Epoch 27/100 Batch 200/781 - Loss: 0.809, Seconds: 4.29\n",
3203 | "Epoch 27/100 Batch 220/781 - Loss: 0.824, Seconds: 4.25\n",
3204 | "Epoch 27/100 Batch 240/781 - Loss: 0.706, Seconds: 4.11\n",
3205 | "Average loss for this update: 0.815\n",
3206 | "No Improvement.\n",
3207 | "Epoch 27/100 Batch 260/781 - Loss: 0.789, Seconds: 4.33\n",
3208 | "Epoch 27/100 Batch 280/781 - Loss: 0.785, Seconds: 4.31\n",
3209 | "Epoch 27/100 Batch 300/781 - Loss: 0.856, Seconds: 4.23\n",
3210 | "Epoch 27/100 Batch 320/781 - Loss: 0.884, Seconds: 4.15\n",
3211 | "Epoch 27/100 Batch 340/781 - Loss: 0.783, Seconds: 4.53\n",
3212 | "Epoch 27/100 Batch 360/781 - Loss: 0.825, Seconds: 4.31\n",
3213 | "Epoch 27/100 Batch 380/781 - Loss: 0.704, Seconds: 4.66\n",
3214 | "Epoch 27/100 Batch 400/781 - Loss: 0.812, Seconds: 4.61\n",
3215 | "Epoch 27/100 Batch 420/781 - Loss: 0.760, Seconds: 3.87\n",
3216 | "Epoch 27/100 Batch 440/781 - Loss: 0.813, Seconds: 4.59\n",
3217 | "Epoch 27/100 Batch 460/781 - Loss: 0.856, Seconds: 4.43\n",
3218 | "Epoch 27/100 Batch 480/781 - Loss: 0.803, Seconds: 4.21\n",
3219 | "Epoch 27/100 Batch 500/781 - Loss: 0.867, Seconds: 4.11\n",
3220 | "Average loss for this update: 0.807\n",
3221 | "New Record!\n",
3222 | "Epoch 27/100 Batch 520/781 - Loss: 0.753, Seconds: 3.87\n",
3223 | "Epoch 27/100 Batch 540/781 - Loss: 0.791, Seconds: 4.47\n",
3224 | "Epoch 27/100 Batch 560/781 - Loss: 0.760, Seconds: 3.95\n",
3225 | "Epoch 27/100 Batch 580/781 - Loss: 0.853, Seconds: 4.25\n",
3226 | "Epoch 27/100 Batch 600/781 - Loss: 0.880, Seconds: 4.31\n",
3227 | "Epoch 27/100 Batch 620/781 - Loss: 0.826, Seconds: 3.99\n",
3228 | "Epoch 27/100 Batch 640/781 - Loss: 0.817, Seconds: 4.29\n",
3229 | "Epoch 27/100 Batch 660/781 - Loss: 0.726, Seconds: 4.45\n",
3230 | "Epoch 27/100 Batch 680/781 - Loss: 0.761, Seconds: 4.25\n"
3231 | ]
3232 | },
3233 | {
3234 | "name": "stdout",
3235 | "output_type": "stream",
3236 | "text": [
3237 | "Epoch 27/100 Batch 700/781 - Loss: 0.853, Seconds: 4.31\n",
3238 | "Epoch 27/100 Batch 720/781 - Loss: 0.944, Seconds: 4.51\n",
3239 | "Epoch 27/100 Batch 740/781 - Loss: 0.856, Seconds: 4.76\n",
3240 | "Epoch 27/100 Batch 760/781 - Loss: 0.891, Seconds: 4.27\n",
3241 | "Average loss for this update: 0.826\n",
3242 | "No Improvement.\n",
3243 | "Epoch 27/100 Batch 780/781 - Loss: 0.777, Seconds: 3.85\n",
3244 | "Epoch 28/100 Batch 20/781 - Loss: 0.902, Seconds: 4.25\n",
3245 | "Epoch 28/100 Batch 40/781 - Loss: 0.824, Seconds: 3.43\n",
3246 | "Epoch 28/100 Batch 60/781 - Loss: 0.823, Seconds: 4.31\n",
3247 | "Epoch 28/100 Batch 80/781 - Loss: 0.837, Seconds: 4.45\n",
3248 | "Epoch 28/100 Batch 100/781 - Loss: 0.743, Seconds: 4.27\n",
3249 | "Epoch 28/100 Batch 120/781 - Loss: 0.792, Seconds: 3.81\n",
3250 | "Epoch 28/100 Batch 140/781 - Loss: 0.759, Seconds: 4.01\n",
3251 | "Epoch 28/100 Batch 160/781 - Loss: 0.870, Seconds: 3.25\n",
3252 | "Epoch 28/100 Batch 180/781 - Loss: 0.830, Seconds: 3.61\n",
3253 | "Epoch 28/100 Batch 200/781 - Loss: 0.792, Seconds: 4.23\n",
3254 | "Epoch 28/100 Batch 220/781 - Loss: 0.810, Seconds: 4.13\n",
3255 | "Epoch 28/100 Batch 240/781 - Loss: 0.684, Seconds: 4.15\n",
3256 | "Average loss for this update: 0.802\n",
3257 | "New Record!\n",
3258 | "Epoch 28/100 Batch 260/781 - Loss: 0.763, Seconds: 4.75\n",
3259 | "Epoch 28/100 Batch 280/781 - Loss: 0.780, Seconds: 4.21\n",
3260 | "Epoch 28/100 Batch 300/781 - Loss: 0.837, Seconds: 4.11\n",
3261 | "Epoch 28/100 Batch 320/781 - Loss: 0.875, Seconds: 4.03\n",
3262 | "Epoch 28/100 Batch 340/781 - Loss: 0.781, Seconds: 4.57\n",
3263 | "Epoch 28/100 Batch 360/781 - Loss: 0.808, Seconds: 4.31\n",
3264 | "Epoch 28/100 Batch 380/781 - Loss: 0.704, Seconds: 4.49\n",
3265 | "Epoch 28/100 Batch 400/781 - Loss: 0.797, Seconds: 4.59\n",
3266 | "Epoch 28/100 Batch 420/781 - Loss: 0.742, Seconds: 3.97\n",
3267 | "Epoch 28/100 Batch 440/781 - Loss: 0.800, Seconds: 4.61\n",
3268 | "Epoch 28/100 Batch 460/781 - Loss: 0.844, Seconds: 4.31\n",
3269 | "Epoch 28/100 Batch 480/781 - Loss: 0.752, Seconds: 3.93\n",
3270 | "Epoch 28/100 Batch 500/781 - Loss: 0.810, Seconds: 4.29\n",
3271 | "Average loss for this update: 0.788\n",
3272 | "New Record!\n",
3273 | "Epoch 28/100 Batch 520/781 - Loss: 0.717, Seconds: 3.89\n",
3274 | "Epoch 28/100 Batch 540/781 - Loss: 0.779, Seconds: 4.41\n",
3275 | "Epoch 28/100 Batch 560/781 - Loss: 0.740, Seconds: 3.93\n",
3276 | "Epoch 28/100 Batch 580/781 - Loss: 0.830, Seconds: 4.27\n",
3277 | "Epoch 28/100 Batch 600/781 - Loss: 0.835, Seconds: 4.17\n",
3278 | "Epoch 28/100 Batch 620/781 - Loss: 0.788, Seconds: 4.15\n",
3279 | "Epoch 28/100 Batch 640/781 - Loss: 0.792, Seconds: 4.35\n",
3280 | "Epoch 28/100 Batch 660/781 - Loss: 0.702, Seconds: 4.47\n",
3281 | "Epoch 28/100 Batch 680/781 - Loss: 0.764, Seconds: 4.25\n",
3282 | "Epoch 28/100 Batch 700/781 - Loss: 0.842, Seconds: 4.61\n",
3283 | "Epoch 28/100 Batch 720/781 - Loss: 0.926, Seconds: 4.67\n",
3284 | "Epoch 28/100 Batch 740/781 - Loss: 0.836, Seconds: 4.65\n",
3285 | "Epoch 28/100 Batch 760/781 - Loss: 0.866, Seconds: 4.19\n",
3286 | "Average loss for this update: 0.804\n",
3287 | "No Improvement.\n",
3288 | "Epoch 28/100 Batch 780/781 - Loss: 0.761, Seconds: 3.81\n",
3289 | "Epoch 29/100 Batch 20/781 - Loss: 0.882, Seconds: 4.27\n",
3290 | "Epoch 29/100 Batch 40/781 - Loss: 0.814, Seconds: 3.37\n",
3291 | "Epoch 29/100 Batch 60/781 - Loss: 0.814, Seconds: 4.31\n",
3292 | "Epoch 29/100 Batch 80/781 - Loss: 0.804, Seconds: 4.17\n",
3293 | "Epoch 29/100 Batch 100/781 - Loss: 0.712, Seconds: 4.37\n",
3294 | "Epoch 29/100 Batch 120/781 - Loss: 0.774, Seconds: 3.93\n",
3295 | "Epoch 29/100 Batch 140/781 - Loss: 0.743, Seconds: 3.89\n",
3296 | "Epoch 29/100 Batch 160/781 - Loss: 0.861, Seconds: 3.21\n",
3297 | "Epoch 29/100 Batch 180/781 - Loss: 0.816, Seconds: 3.63\n",
3298 | "Epoch 29/100 Batch 200/781 - Loss: 0.773, Seconds: 4.29\n",
3299 | "Epoch 29/100 Batch 220/781 - Loss: 0.793, Seconds: 4.17\n",
3300 | "Epoch 29/100 Batch 240/781 - Loss: 0.657, Seconds: 4.03\n",
3301 | "Average loss for this update: 0.785\n",
3302 | "New Record!\n",
3303 | "Epoch 29/100 Batch 260/781 - Loss: 0.748, Seconds: 4.57\n",
3304 | "Epoch 29/100 Batch 280/781 - Loss: 0.745, Seconds: 4.29\n",
3305 | "Epoch 29/100 Batch 300/781 - Loss: 0.803, Seconds: 4.21\n",
3306 | "Epoch 29/100 Batch 320/781 - Loss: 0.860, Seconds: 4.19\n",
3307 | "Epoch 29/100 Batch 340/781 - Loss: 0.759, Seconds: 4.69\n",
3308 | "Epoch 29/100 Batch 360/781 - Loss: 0.794, Seconds: 4.37\n",
3309 | "Epoch 29/100 Batch 380/781 - Loss: 0.679, Seconds: 4.61\n",
3310 | "Epoch 29/100 Batch 400/781 - Loss: 0.794, Seconds: 4.55\n",
3311 | "Epoch 29/100 Batch 420/781 - Loss: 0.739, Seconds: 3.99\n",
3312 | "Epoch 29/100 Batch 440/781 - Loss: 0.791, Seconds: 4.69\n",
3313 | "Epoch 29/100 Batch 460/781 - Loss: 0.820, Seconds: 4.59\n",
3314 | "Epoch 29/100 Batch 480/781 - Loss: 0.747, Seconds: 4.05\n",
3315 | "Epoch 29/100 Batch 500/781 - Loss: 0.804, Seconds: 4.33\n",
3316 | "Average loss for this update: 0.771\n",
3317 | "New Record!\n",
3318 | "Epoch 29/100 Batch 520/781 - Loss: 0.701, Seconds: 4.03\n",
3319 | "Epoch 29/100 Batch 540/781 - Loss: 0.757, Seconds: 4.37\n",
3320 | "Epoch 29/100 Batch 560/781 - Loss: 0.724, Seconds: 3.91\n",
3321 | "Epoch 29/100 Batch 580/781 - Loss: 0.813, Seconds: 4.25\n",
3322 | "Epoch 29/100 Batch 600/781 - Loss: 0.825, Seconds: 4.19\n",
3323 | "Epoch 29/100 Batch 620/781 - Loss: 0.785, Seconds: 4.03\n",
3324 | "Epoch 29/100 Batch 640/781 - Loss: 0.786, Seconds: 4.13\n",
3325 | "Epoch 29/100 Batch 660/781 - Loss: 0.690, Seconds: 4.35\n",
3326 | "Epoch 29/100 Batch 680/781 - Loss: 0.734, Seconds: 4.23\n",
3327 | "Epoch 29/100 Batch 700/781 - Loss: 0.821, Seconds: 4.27\n",
3328 | "Epoch 29/100 Batch 720/781 - Loss: 0.905, Seconds: 4.53\n",
3329 | "Epoch 29/100 Batch 740/781 - Loss: 0.836, Seconds: 4.73\n",
3330 | "Epoch 29/100 Batch 760/781 - Loss: 0.854, Seconds: 4.33\n",
3331 | "Average loss for this update: 0.79\n",
3332 | "No Improvement.\n",
3333 | "Epoch 29/100 Batch 780/781 - Loss: 0.741, Seconds: 3.85\n",
3334 | "Epoch 30/100 Batch 20/781 - Loss: 0.855, Seconds: 4.25\n",
3335 | "Epoch 30/100 Batch 40/781 - Loss: 0.793, Seconds: 3.69\n",
3336 | "Epoch 30/100 Batch 60/781 - Loss: 0.793, Seconds: 4.39\n",
3337 | "Epoch 30/100 Batch 80/781 - Loss: 0.791, Seconds: 4.11\n",
3338 | "Epoch 30/100 Batch 100/781 - Loss: 0.690, Seconds: 4.23\n",
3339 | "Epoch 30/100 Batch 120/781 - Loss: 0.766, Seconds: 3.85\n",
3340 | "Epoch 30/100 Batch 140/781 - Loss: 0.733, Seconds: 3.99\n",
3341 | "Epoch 30/100 Batch 160/781 - Loss: 0.845, Seconds: 3.33\n",
3342 | "Epoch 30/100 Batch 180/781 - Loss: 0.793, Seconds: 3.71\n",
3343 | "Epoch 30/100 Batch 200/781 - Loss: 0.761, Seconds: 4.27\n",
3344 | "Epoch 30/100 Batch 220/781 - Loss: 0.771, Seconds: 4.13\n",
3345 | "Epoch 30/100 Batch 240/781 - Loss: 0.646, Seconds: 4.03\n",
3346 | "Average loss for this update: 0.768\n",
3347 | "New Record!\n",
3348 | "Epoch 30/100 Batch 260/781 - Loss: 0.744, Seconds: 4.63\n",
3349 | "Epoch 30/100 Batch 280/781 - Loss: 0.734, Seconds: 4.27\n",
3350 | "Epoch 30/100 Batch 300/781 - Loss: 0.788, Seconds: 4.09\n",
3351 | "Epoch 30/100 Batch 320/781 - Loss: 0.839, Seconds: 4.11\n",
3352 | "Epoch 30/100 Batch 340/781 - Loss: 0.742, Seconds: 4.53\n",
3353 | "Epoch 30/100 Batch 360/781 - Loss: 0.792, Seconds: 4.33\n",
3354 | "Epoch 30/100 Batch 380/781 - Loss: 0.669, Seconds: 4.61\n",
3355 | "Epoch 30/100 Batch 400/781 - Loss: 0.777, Seconds: 4.55\n",
3356 | "Epoch 30/100 Batch 420/781 - Loss: 0.738, Seconds: 3.91\n",
3357 | "Epoch 30/100 Batch 440/781 - Loss: 0.824, Seconds: 4.59\n",
3358 | "Epoch 30/100 Batch 460/781 - Loss: 0.821, Seconds: 4.41\n",
3359 | "Epoch 30/100 Batch 480/781 - Loss: 0.727, Seconds: 4.03\n",
3360 | "Epoch 30/100 Batch 500/781 - Loss: 0.807, Seconds: 4.23\n",
3361 | "Average loss for this update: 0.765\n",
3362 | "New Record!\n",
3363 | "Epoch 30/100 Batch 520/781 - Loss: 0.693, Seconds: 4.07\n",
3364 | "Epoch 30/100 Batch 540/781 - Loss: 0.748, Seconds: 4.49\n",
3365 | "Epoch 30/100 Batch 560/781 - Loss: 0.724, Seconds: 4.01\n",
3366 | "Epoch 30/100 Batch 580/781 - Loss: 0.797, Seconds: 4.53\n",
3367 | "Epoch 30/100 Batch 600/781 - Loss: 0.810, Seconds: 4.31\n",
3368 | "Epoch 30/100 Batch 620/781 - Loss: 0.770, Seconds: 4.25\n",
3369 | "Epoch 30/100 Batch 640/781 - Loss: 0.767, Seconds: 4.21\n",
3370 | "Epoch 30/100 Batch 660/781 - Loss: 0.670, Seconds: 4.39\n",
3371 | "Epoch 30/100 Batch 680/781 - Loss: 0.722, Seconds: 4.33\n",
3372 | "Epoch 30/100 Batch 700/781 - Loss: 0.789, Seconds: 4.25\n",
3373 | "Epoch 30/100 Batch 720/781 - Loss: 0.894, Seconds: 4.77\n",
3374 | "Epoch 30/100 Batch 740/781 - Loss: 0.818, Seconds: 4.69\n",
3375 | "Epoch 30/100 Batch 760/781 - Loss: 0.856, Seconds: 4.43\n",
3376 | "Average loss for this update: 0.777\n",
3377 | "No Improvement.\n",
3378 | "Epoch 30/100 Batch 780/781 - Loss: 0.735, Seconds: 3.85\n",
3379 | "Epoch 31/100 Batch 20/781 - Loss: 0.863, Seconds: 4.15\n",
3380 | "Epoch 31/100 Batch 40/781 - Loss: 0.789, Seconds: 3.47\n",
3381 | "Epoch 31/100 Batch 60/781 - Loss: 0.772, Seconds: 4.39\n",
3382 | "Epoch 31/100 Batch 80/781 - Loss: 0.780, Seconds: 4.25\n",
3383 | "Epoch 31/100 Batch 100/781 - Loss: 0.694, Seconds: 4.17\n",
3384 | "Epoch 31/100 Batch 120/781 - Loss: 0.744, Seconds: 4.01\n",
3385 | "Epoch 31/100 Batch 140/781 - Loss: 0.725, Seconds: 4.09\n",
3386 | "Epoch 31/100 Batch 160/781 - Loss: 0.822, Seconds: 3.27\n",
3387 | "Epoch 31/100 Batch 180/781 - Loss: 0.778, Seconds: 3.63\n"
3388 | ]
3389 | },
3390 | {
3391 | "name": "stdout",
3392 | "output_type": "stream",
3393 | "text": [
3394 | "Epoch 31/100 Batch 200/781 - Loss: 0.736, Seconds: 4.51\n",
3395 | "Epoch 31/100 Batch 220/781 - Loss: 0.775, Seconds: 4.03\n",
3396 | "Epoch 31/100 Batch 240/781 - Loss: 0.647, Seconds: 3.99\n",
3397 | "Average loss for this update: 0.758\n",
3398 | "New Record!\n",
3399 | "Epoch 31/100 Batch 260/781 - Loss: 0.730, Seconds: 4.55\n",
3400 | "Epoch 31/100 Batch 280/781 - Loss: 0.752, Seconds: 4.39\n",
3401 | "Epoch 31/100 Batch 300/781 - Loss: 0.792, Seconds: 4.21\n",
3402 | "Epoch 31/100 Batch 320/781 - Loss: 0.832, Seconds: 4.11\n",
3403 | "Epoch 31/100 Batch 340/781 - Loss: 0.749, Seconds: 4.51\n",
3404 | "Epoch 31/100 Batch 360/781 - Loss: 0.782, Seconds: 4.31\n",
3405 | "Epoch 31/100 Batch 380/781 - Loss: 0.669, Seconds: 4.55\n",
3406 | "Epoch 31/100 Batch 400/781 - Loss: 0.775, Seconds: 4.59\n",
3407 | "Epoch 31/100 Batch 420/781 - Loss: 0.716, Seconds: 3.89\n",
3408 | "Epoch 31/100 Batch 440/781 - Loss: 0.769, Seconds: 4.59\n",
3409 | "Epoch 31/100 Batch 460/781 - Loss: 0.786, Seconds: 4.33\n",
3410 | "Epoch 31/100 Batch 480/781 - Loss: 0.713, Seconds: 3.97\n",
3411 | "Epoch 31/100 Batch 500/781 - Loss: 0.787, Seconds: 4.19\n",
3412 | "Average loss for this update: 0.753\n",
3413 | "New Record!\n",
3414 | "Epoch 31/100 Batch 520/781 - Loss: 0.672, Seconds: 3.99\n",
3415 | "Epoch 31/100 Batch 540/781 - Loss: 0.739, Seconds: 4.31\n",
3416 | "Epoch 31/100 Batch 560/781 - Loss: 0.700, Seconds: 3.99\n",
3417 | "Epoch 31/100 Batch 580/781 - Loss: 0.783, Seconds: 4.29\n",
3418 | "Epoch 31/100 Batch 600/781 - Loss: 0.796, Seconds: 4.29\n",
3419 | "Epoch 31/100 Batch 620/781 - Loss: 0.750, Seconds: 3.97\n",
3420 | "Epoch 31/100 Batch 640/781 - Loss: 0.757, Seconds: 4.31\n",
3421 | "Epoch 31/100 Batch 660/781 - Loss: 0.676, Seconds: 4.59\n",
3422 | "Epoch 31/100 Batch 680/781 - Loss: 0.748, Seconds: 4.21\n",
3423 | "Epoch 31/100 Batch 700/781 - Loss: 0.798, Seconds: 4.37\n",
3424 | "Epoch 31/100 Batch 720/781 - Loss: 0.883, Seconds: 4.55\n",
3425 | "Epoch 31/100 Batch 740/781 - Loss: 0.805, Seconds: 4.73\n",
3426 | "Epoch 31/100 Batch 760/781 - Loss: 0.840, Seconds: 4.29\n",
3427 | "Average loss for this update: 0.768\n",
3428 | "No Improvement.\n",
3429 | "Epoch 31/100 Batch 780/781 - Loss: 0.722, Seconds: 4.03\n",
3430 | "Epoch 32/100 Batch 20/781 - Loss: 0.836, Seconds: 4.23\n",
3431 | "Epoch 32/100 Batch 40/781 - Loss: 0.777, Seconds: 3.33\n",
3432 | "Epoch 32/100 Batch 60/781 - Loss: 0.766, Seconds: 4.51\n",
3433 | "Epoch 32/100 Batch 80/781 - Loss: 0.770, Seconds: 4.21\n",
3434 | "Epoch 32/100 Batch 100/781 - Loss: 0.688, Seconds: 4.27\n",
3435 | "Epoch 32/100 Batch 120/781 - Loss: 0.738, Seconds: 3.95\n",
3436 | "Epoch 32/100 Batch 140/781 - Loss: 0.711, Seconds: 3.93\n",
3437 | "Epoch 32/100 Batch 160/781 - Loss: 0.825, Seconds: 3.23\n",
3438 | "Epoch 32/100 Batch 180/781 - Loss: 0.780, Seconds: 3.75\n",
3439 | "Epoch 32/100 Batch 200/781 - Loss: 0.765, Seconds: 4.25\n",
3440 | "Epoch 32/100 Batch 220/781 - Loss: 0.762, Seconds: 4.15\n",
3441 | "Epoch 32/100 Batch 240/781 - Loss: 0.636, Seconds: 4.09\n",
3442 | "Average loss for this update: 0.752\n",
3443 | "New Record!\n",
3444 | "Epoch 32/100 Batch 260/781 - Loss: 0.721, Seconds: 4.59\n",
3445 | "Epoch 32/100 Batch 280/781 - Loss: 0.724, Seconds: 4.35\n",
3446 | "Epoch 32/100 Batch 300/781 - Loss: 0.788, Seconds: 4.19\n",
3447 | "Epoch 32/100 Batch 320/781 - Loss: 0.835, Seconds: 4.13\n",
3448 | "Epoch 32/100 Batch 340/781 - Loss: 0.720, Seconds: 4.55\n",
3449 | "Epoch 32/100 Batch 360/781 - Loss: 0.752, Seconds: 4.41\n",
3450 | "Epoch 32/100 Batch 380/781 - Loss: 0.658, Seconds: 4.57\n",
3451 | "Epoch 32/100 Batch 400/781 - Loss: 0.763, Seconds: 4.61\n",
3452 | "Epoch 32/100 Batch 420/781 - Loss: 0.711, Seconds: 3.93\n",
3453 | "Epoch 32/100 Batch 440/781 - Loss: 0.760, Seconds: 4.47\n",
3454 | "Epoch 32/100 Batch 460/781 - Loss: 0.793, Seconds: 4.33\n",
3455 | "Epoch 32/100 Batch 480/781 - Loss: 0.696, Seconds: 4.09\n",
3456 | "Epoch 32/100 Batch 500/781 - Loss: 0.772, Seconds: 4.15\n",
3457 | "Average loss for this update: 0.741\n",
3458 | "New Record!\n",
3459 | "Epoch 32/100 Batch 520/781 - Loss: 0.661, Seconds: 4.13\n",
3460 | "Epoch 32/100 Batch 540/781 - Loss: 0.740, Seconds: 4.45\n",
3461 | "Epoch 32/100 Batch 560/781 - Loss: 0.702, Seconds: 3.97\n",
3462 | "Epoch 32/100 Batch 580/781 - Loss: 0.796, Seconds: 4.25\n",
3463 | "Epoch 32/100 Batch 600/781 - Loss: 0.806, Seconds: 4.33\n",
3464 | "Epoch 32/100 Batch 620/781 - Loss: 0.738, Seconds: 4.03\n",
3465 | "Epoch 32/100 Batch 640/781 - Loss: 0.746, Seconds: 4.23\n",
3466 | "Epoch 32/100 Batch 660/781 - Loss: 0.654, Seconds: 4.43\n",
3467 | "Epoch 32/100 Batch 680/781 - Loss: 0.701, Seconds: 4.27\n",
3468 | "Epoch 32/100 Batch 700/781 - Loss: 0.779, Seconds: 4.29\n",
3469 | "Epoch 32/100 Batch 720/781 - Loss: 0.862, Seconds: 4.59\n",
3470 | "Epoch 32/100 Batch 740/781 - Loss: 0.777, Seconds: 4.63\n",
3471 | "Epoch 32/100 Batch 760/781 - Loss: 0.833, Seconds: 4.29\n",
3472 | "Average loss for this update: 0.757\n",
3473 | "No Improvement.\n",
3474 | "Epoch 32/100 Batch 780/781 - Loss: 0.708, Seconds: 3.89\n",
3475 | "Epoch 33/100 Batch 20/781 - Loss: 0.820, Seconds: 4.23\n",
3476 | "Epoch 33/100 Batch 40/781 - Loss: 0.761, Seconds: 3.45\n",
3477 | "Epoch 33/100 Batch 60/781 - Loss: 0.758, Seconds: 4.45\n",
3478 | "Epoch 33/100 Batch 80/781 - Loss: 0.744, Seconds: 4.23\n",
3479 | "Epoch 33/100 Batch 100/781 - Loss: 0.672, Seconds: 4.23\n",
3480 | "Epoch 33/100 Batch 120/781 - Loss: 0.730, Seconds: 3.89\n",
3481 | "Epoch 33/100 Batch 140/781 - Loss: 0.700, Seconds: 3.97\n",
3482 | "Epoch 33/100 Batch 160/781 - Loss: 0.809, Seconds: 3.25\n",
3483 | "Epoch 33/100 Batch 180/781 - Loss: 0.758, Seconds: 3.65\n",
3484 | "Epoch 33/100 Batch 200/781 - Loss: 0.734, Seconds: 4.29\n",
3485 | "Epoch 33/100 Batch 220/781 - Loss: 0.748, Seconds: 4.01\n",
3486 | "Epoch 33/100 Batch 240/781 - Loss: 0.630, Seconds: 4.07\n",
3487 | "Average loss for this update: 0.737\n",
3488 | "New Record!\n",
3489 | "Epoch 33/100 Batch 260/781 - Loss: 0.714, Seconds: 4.61\n",
3490 | "Epoch 33/100 Batch 280/781 - Loss: 0.712, Seconds: 4.23\n",
3491 | "Epoch 33/100 Batch 300/781 - Loss: 0.764, Seconds: 4.13\n",
3492 | "Epoch 33/100 Batch 320/781 - Loss: 0.814, Seconds: 4.13\n",
3493 | "Epoch 33/100 Batch 340/781 - Loss: 0.717, Seconds: 5.39\n",
3494 | "Epoch 33/100 Batch 360/781 - Loss: 0.753, Seconds: 4.31\n",
3495 | "Epoch 33/100 Batch 380/781 - Loss: 0.661, Seconds: 5.16\n",
3496 | "Epoch 33/100 Batch 400/781 - Loss: 0.742, Seconds: 4.57\n",
3497 | "Epoch 33/100 Batch 420/781 - Loss: 0.698, Seconds: 3.89\n",
3498 | "Epoch 33/100 Batch 440/781 - Loss: 0.740, Seconds: 4.59\n",
3499 | "Epoch 33/100 Batch 460/781 - Loss: 0.768, Seconds: 4.37\n",
3500 | "Epoch 33/100 Batch 480/781 - Loss: 0.690, Seconds: 3.99\n",
3501 | "Epoch 33/100 Batch 500/781 - Loss: 0.765, Seconds: 4.37\n",
3502 | "Average loss for this update: 0.73\n",
3503 | "New Record!\n",
3504 | "Epoch 33/100 Batch 520/781 - Loss: 0.673, Seconds: 3.91\n",
3505 | "Epoch 33/100 Batch 540/781 - Loss: 0.712, Seconds: 4.45\n",
3506 | "Epoch 33/100 Batch 560/781 - Loss: 0.678, Seconds: 3.93\n",
3507 | "Epoch 33/100 Batch 580/781 - Loss: 0.765, Seconds: 4.19\n",
3508 | "Epoch 33/100 Batch 600/781 - Loss: 0.771, Seconds: 4.29\n",
3509 | "Epoch 33/100 Batch 620/781 - Loss: 0.731, Seconds: 4.09\n",
3510 | "Epoch 33/100 Batch 640/781 - Loss: 0.730, Seconds: 4.31\n",
3511 | "Epoch 33/100 Batch 660/781 - Loss: 0.645, Seconds: 4.53\n",
3512 | "Epoch 33/100 Batch 680/781 - Loss: 0.696, Seconds: 4.19\n",
3513 | "Epoch 33/100 Batch 700/781 - Loss: 0.773, Seconds: 4.37\n",
3514 | "Epoch 33/100 Batch 720/781 - Loss: 0.841, Seconds: 4.59\n",
3515 | "Epoch 33/100 Batch 740/781 - Loss: 0.766, Seconds: 4.55\n",
3516 | "Epoch 33/100 Batch 760/781 - Loss: 0.817, Seconds: 4.29\n",
3517 | "Average loss for this update: 0.741\n",
3518 | "No Improvement.\n",
3519 | "Epoch 33/100 Batch 780/781 - Loss: 0.702, Seconds: 3.91\n",
3520 | "Epoch 34/100 Batch 20/781 - Loss: 0.809, Seconds: 4.27\n",
3521 | "Epoch 34/100 Batch 40/781 - Loss: 0.749, Seconds: 3.41\n",
3522 | "Epoch 34/100 Batch 60/781 - Loss: 0.742, Seconds: 4.49\n",
3523 | "Epoch 34/100 Batch 80/781 - Loss: 0.743, Seconds: 4.19\n",
3524 | "Epoch 34/100 Batch 100/781 - Loss: 0.664, Seconds: 4.21\n",
3525 | "Epoch 34/100 Batch 120/781 - Loss: 0.713, Seconds: 3.93\n",
3526 | "Epoch 34/100 Batch 140/781 - Loss: 0.698, Seconds: 3.93\n",
3527 | "Epoch 34/100 Batch 160/781 - Loss: 0.784, Seconds: 3.29\n",
3528 | "Epoch 34/100 Batch 180/781 - Loss: 0.742, Seconds: 3.79\n",
3529 | "Epoch 34/100 Batch 200/781 - Loss: 0.713, Seconds: 4.31\n",
3530 | "Epoch 34/100 Batch 220/781 - Loss: 0.746, Seconds: 4.01\n",
3531 | "Epoch 34/100 Batch 240/781 - Loss: 0.616, Seconds: 4.05\n",
3532 | "Average loss for this update: 0.724\n",
3533 | "New Record!\n",
3534 | "Epoch 34/100 Batch 260/781 - Loss: 0.692, Seconds: 4.61\n",
3535 | "Epoch 34/100 Batch 280/781 - Loss: 0.710, Seconds: 4.17\n",
3536 | "Epoch 34/100 Batch 300/781 - Loss: 0.768, Seconds: 4.07\n",
3537 | "Epoch 34/100 Batch 320/781 - Loss: 0.811, Seconds: 4.11\n",
3538 | "Epoch 34/100 Batch 340/781 - Loss: 0.706, Seconds: 4.73\n",
3539 | "Epoch 34/100 Batch 360/781 - Loss: 0.750, Seconds: 4.41\n",
3540 | "Epoch 34/100 Batch 380/781 - Loss: 0.641, Seconds: 4.51\n",
3541 | "Epoch 34/100 Batch 400/781 - Loss: 0.742, Seconds: 4.47\n",
3542 | "Epoch 34/100 Batch 420/781 - Loss: 0.695, Seconds: 3.91\n",
3543 | "Epoch 34/100 Batch 440/781 - Loss: 0.730, Seconds: 4.65\n",
3544 | "Epoch 34/100 Batch 460/781 - Loss: 0.763, Seconds: 4.47\n"
3545 | ]
3546 | },
3547 | {
3548 | "name": "stdout",
3549 | "output_type": "stream",
3550 | "text": [
3551 | "Epoch 34/100 Batch 480/781 - Loss: 0.696, Seconds: 4.09\n",
3552 | "Epoch 34/100 Batch 500/781 - Loss: 0.751, Seconds: 4.17\n",
3553 | "Average loss for this update: 0.725\n",
3554 | "No Improvement.\n",
3555 | "Epoch 34/100 Batch 520/781 - Loss: 0.662, Seconds: 3.85\n",
3556 | "Epoch 34/100 Batch 540/781 - Loss: 0.696, Seconds: 4.43\n",
3557 | "Epoch 34/100 Batch 560/781 - Loss: 0.672, Seconds: 4.03\n",
3558 | "Epoch 34/100 Batch 580/781 - Loss: 0.756, Seconds: 4.29\n",
3559 | "Epoch 34/100 Batch 600/781 - Loss: 0.766, Seconds: 4.31\n",
3560 | "Epoch 34/100 Batch 620/781 - Loss: 0.728, Seconds: 4.07\n",
3561 | "Epoch 34/100 Batch 640/781 - Loss: 0.726, Seconds: 4.33\n",
3562 | "Epoch 34/100 Batch 660/781 - Loss: 0.633, Seconds: 4.39\n",
3563 | "Epoch 34/100 Batch 680/781 - Loss: 0.698, Seconds: 4.21\n",
3564 | "Epoch 34/100 Batch 700/781 - Loss: 0.761, Seconds: 4.35\n",
3565 | "Epoch 34/100 Batch 720/781 - Loss: 0.833, Seconds: 4.47\n",
3566 | "Epoch 34/100 Batch 740/781 - Loss: 0.756, Seconds: 4.73\n",
3567 | "Epoch 34/100 Batch 760/781 - Loss: 0.788, Seconds: 4.35\n",
3568 | "Average loss for this update: 0.731\n",
3569 | "No Improvement.\n",
3570 | "Epoch 34/100 Batch 780/781 - Loss: 0.689, Seconds: 3.93\n",
3571 | "Epoch 35/100 Batch 20/781 - Loss: 0.788, Seconds: 4.21\n",
3572 | "Epoch 35/100 Batch 40/781 - Loss: 0.730, Seconds: 3.47\n",
3573 | "Epoch 35/100 Batch 60/781 - Loss: 0.731, Seconds: 4.37\n",
3574 | "Epoch 35/100 Batch 80/781 - Loss: 0.729, Seconds: 4.25\n",
3575 | "Epoch 35/100 Batch 100/781 - Loss: 0.648, Seconds: 4.25\n",
3576 | "Epoch 35/100 Batch 120/781 - Loss: 0.700, Seconds: 3.91\n",
3577 | "Epoch 35/100 Batch 140/781 - Loss: 0.668, Seconds: 4.23\n",
3578 | "Epoch 35/100 Batch 160/781 - Loss: 0.766, Seconds: 3.33\n",
3579 | "Epoch 35/100 Batch 180/781 - Loss: 0.726, Seconds: 3.75\n",
3580 | "Epoch 35/100 Batch 200/781 - Loss: 0.691, Seconds: 4.19\n",
3581 | "Epoch 35/100 Batch 220/781 - Loss: 0.733, Seconds: 4.11\n",
3582 | "Epoch 35/100 Batch 240/781 - Loss: 0.605, Seconds: 4.21\n",
3583 | "Average loss for this update: 0.708\n",
3584 | "New Record!\n",
3585 | "Epoch 35/100 Batch 260/781 - Loss: 0.689, Seconds: 4.51\n",
3586 | "Epoch 35/100 Batch 280/781 - Loss: 0.693, Seconds: 4.31\n",
3587 | "Epoch 35/100 Batch 300/781 - Loss: 0.746, Seconds: 4.17\n",
3588 | "Epoch 35/100 Batch 320/781 - Loss: 0.790, Seconds: 4.27\n",
3589 | "Epoch 35/100 Batch 340/781 - Loss: 0.690, Seconds: 4.55\n",
3590 | "Epoch 35/100 Batch 360/781 - Loss: 0.729, Seconds: 4.79\n",
3591 | "Epoch 35/100 Batch 380/781 - Loss: 0.638, Seconds: 4.61\n",
3592 | "Epoch 35/100 Batch 400/781 - Loss: 0.719, Seconds: 4.89\n",
3593 | "Epoch 35/100 Batch 420/781 - Loss: 0.684, Seconds: 3.93\n",
3594 | "Epoch 35/100 Batch 440/781 - Loss: 0.722, Seconds: 4.57\n",
3595 | "Epoch 35/100 Batch 460/781 - Loss: 0.738, Seconds: 4.35\n",
3596 | "Epoch 35/100 Batch 480/781 - Loss: 0.678, Seconds: 3.95\n",
3597 | "Epoch 35/100 Batch 500/781 - Loss: 0.744, Seconds: 4.37\n",
3598 | "Average loss for this update: 0.708\n",
3599 | "New Record!\n",
3600 | "Epoch 35/100 Batch 520/781 - Loss: 0.632, Seconds: 3.77\n",
3601 | "Epoch 35/100 Batch 540/781 - Loss: 0.686, Seconds: 4.45\n",
3602 | "Epoch 35/100 Batch 560/781 - Loss: 0.671, Seconds: 4.17\n",
3603 | "Epoch 35/100 Batch 580/781 - Loss: 0.750, Seconds: 4.29\n",
3604 | "Epoch 35/100 Batch 600/781 - Loss: 0.756, Seconds: 4.27\n",
3605 | "Epoch 35/100 Batch 620/781 - Loss: 0.696, Seconds: 4.01\n",
3606 | "Epoch 35/100 Batch 640/781 - Loss: 0.709, Seconds: 4.29\n",
3607 | "Epoch 35/100 Batch 660/781 - Loss: 0.626, Seconds: 4.55\n",
3608 | "Epoch 35/100 Batch 680/781 - Loss: 0.685, Seconds: 4.33\n",
3609 | "Epoch 35/100 Batch 700/781 - Loss: 0.753, Seconds: 4.55\n",
3610 | "Epoch 35/100 Batch 720/781 - Loss: 0.834, Seconds: 4.57\n",
3611 | "Epoch 35/100 Batch 740/781 - Loss: 0.744, Seconds: 4.71\n",
3612 | "Epoch 35/100 Batch 760/781 - Loss: 0.787, Seconds: 4.33\n",
3613 | "Average loss for this update: 0.721\n",
3614 | "No Improvement.\n",
3615 | "Epoch 35/100 Batch 780/781 - Loss: 0.681, Seconds: 4.13\n",
3616 | "Epoch 36/100 Batch 20/781 - Loss: 0.797, Seconds: 4.27\n",
3617 | "Epoch 36/100 Batch 40/781 - Loss: 0.727, Seconds: 3.61\n",
3618 | "Epoch 36/100 Batch 60/781 - Loss: 0.722, Seconds: 4.31\n",
3619 | "Epoch 36/100 Batch 80/781 - Loss: 0.723, Seconds: 4.27\n",
3620 | "Epoch 36/100 Batch 100/781 - Loss: 0.639, Seconds: 4.25\n",
3621 | "Epoch 36/100 Batch 120/781 - Loss: 0.690, Seconds: 3.85\n",
3622 | "Epoch 36/100 Batch 140/781 - Loss: 0.663, Seconds: 4.07\n",
3623 | "Epoch 36/100 Batch 160/781 - Loss: 0.760, Seconds: 3.31\n",
3624 | "Epoch 36/100 Batch 180/781 - Loss: 0.718, Seconds: 3.89\n",
3625 | "Epoch 36/100 Batch 200/781 - Loss: 0.713, Seconds: 4.31\n",
3626 | "Epoch 36/100 Batch 220/781 - Loss: 0.738, Seconds: 4.17\n",
3627 | "Epoch 36/100 Batch 240/781 - Loss: 0.602, Seconds: 4.05\n",
3628 | "Average loss for this update: 0.706\n",
3629 | "New Record!\n",
3630 | "Epoch 36/100 Batch 260/781 - Loss: 0.684, Seconds: 4.53\n",
3631 | "Epoch 36/100 Batch 280/781 - Loss: 0.704, Seconds: 4.49\n",
3632 | "Epoch 36/100 Batch 300/781 - Loss: 0.725, Seconds: 4.27\n",
3633 | "Epoch 36/100 Batch 320/781 - Loss: 0.785, Seconds: 4.07\n",
3634 | "Epoch 36/100 Batch 340/781 - Loss: 0.683, Seconds: 4.47\n",
3635 | "Epoch 36/100 Batch 360/781 - Loss: 0.704, Seconds: 4.53\n",
3636 | "Epoch 36/100 Batch 380/781 - Loss: 0.626, Seconds: 4.57\n",
3637 | "Epoch 36/100 Batch 400/781 - Loss: 0.699, Seconds: 4.45\n",
3638 | "Epoch 36/100 Batch 420/781 - Loss: 0.668, Seconds: 3.95\n",
3639 | "Epoch 36/100 Batch 440/781 - Loss: 0.704, Seconds: 4.77\n",
3640 | "Epoch 36/100 Batch 460/781 - Loss: 0.732, Seconds: 4.63\n",
3641 | "Epoch 36/100 Batch 480/781 - Loss: 0.707, Seconds: 4.01\n",
3642 | "Epoch 36/100 Batch 500/781 - Loss: 0.790, Seconds: 4.22\n",
3643 | "Average loss for this update: 0.706\n",
3644 | "No Improvement.\n",
3645 | "Epoch 36/100 Batch 520/781 - Loss: 0.663, Seconds: 3.85\n",
3646 | "Epoch 36/100 Batch 540/781 - Loss: 0.703, Seconds: 4.35\n",
3647 | "Epoch 36/100 Batch 560/781 - Loss: 0.681, Seconds: 4.11\n",
3648 | "Epoch 36/100 Batch 580/781 - Loss: 0.759, Seconds: 4.37\n",
3649 | "Epoch 36/100 Batch 600/781 - Loss: 0.743, Seconds: 4.29\n",
3650 | "Epoch 36/100 Batch 620/781 - Loss: 0.711, Seconds: 4.05\n",
3651 | "Epoch 36/100 Batch 640/781 - Loss: 0.722, Seconds: 4.21\n",
3652 | "Epoch 36/100 Batch 660/781 - Loss: 0.619, Seconds: 4.43\n",
3653 | "Epoch 36/100 Batch 680/781 - Loss: 0.677, Seconds: 4.29\n",
3654 | "Epoch 36/100 Batch 700/781 - Loss: 0.740, Seconds: 4.31\n",
3655 | "Epoch 36/100 Batch 720/781 - Loss: 0.825, Seconds: 4.43\n",
3656 | "Epoch 36/100 Batch 740/781 - Loss: 0.738, Seconds: 5.05\n",
3657 | "Epoch 36/100 Batch 760/781 - Loss: 0.772, Seconds: 4.27\n",
3658 | "Average loss for this update: 0.721\n",
3659 | "No Improvement.\n",
3660 | "Epoch 36/100 Batch 780/781 - Loss: 0.689, Seconds: 4.35\n",
3661 | "Epoch 37/100 Batch 20/781 - Loss: 0.849, Seconds: 4.29\n",
3662 | "Epoch 37/100 Batch 40/781 - Loss: 0.742, Seconds: 3.49\n",
3663 | "Epoch 37/100 Batch 60/781 - Loss: 0.720, Seconds: 4.41\n",
3664 | "Epoch 37/100 Batch 80/781 - Loss: 0.724, Seconds: 4.39\n",
3665 | "Epoch 37/100 Batch 100/781 - Loss: 0.643, Seconds: 4.33\n",
3666 | "Epoch 37/100 Batch 120/781 - Loss: 0.690, Seconds: 3.81\n",
3667 | "Epoch 37/100 Batch 140/781 - Loss: 0.673, Seconds: 4.03\n",
3668 | "Epoch 37/100 Batch 160/781 - Loss: 0.757, Seconds: 3.33\n",
3669 | "Epoch 37/100 Batch 180/781 - Loss: 0.716, Seconds: 3.83\n",
3670 | "Epoch 37/100 Batch 200/781 - Loss: 0.690, Seconds: 4.33\n",
3671 | "Epoch 37/100 Batch 220/781 - Loss: 0.715, Seconds: 4.09\n",
3672 | "Epoch 37/100 Batch 240/781 - Loss: 0.603, Seconds: 4.23\n",
3673 | "Average loss for this update: 0.707\n",
3674 | "No Improvement.\n",
3675 | "Stopping Training.\n"
3676 | ]
3677 | }
3678 | ],
3679 | "source": [
3680 | "# Train the Model\n",
3681 | "learning_rate_decay = 0.95\n",
3682 | "min_learning_rate = 0.0005\n",
3683 | "display_step = 20 # Check training loss after every 20 batches\n",
3684 | "stop_early = 0 \n",
3685 | "stop = 3 # If the update loss does not decrease in 3 consecutive update checks, stop training\n",
3686 | "per_epoch = 3 # Make 3 update checks per epoch\n",
3687 | "update_check = (len(sorted_texts_short)//batch_size//per_epoch)-1\n",
3688 | "\n",
3689 | "update_loss = 0 \n",
3690 | "batch_loss = 0\n",
3691 | "summary_update_loss = [] # Record the update losses for saving improvements in the model\n",
3692 | "\n",
3693 | "checkpoint = \"./best_model.ckpt\" \n",
3694 | "with tf.Session(graph=train_graph) as sess:\n",
3695 | " sess.run(tf.global_variables_initializer())\n",
3696 | " \n",
3697 | " # If we want to continue training a previous session\n",
3698 | "    #loader = tf.train.import_meta_graph(checkpoint + '.meta')\n",
3699 | " #loader.restore(sess, checkpoint)\n",
3700 | " \n",
3701 | " for epoch_i in range(1, epochs+1):\n",
3702 | " update_loss = 0\n",
3703 | " batch_loss = 0\n",
3704 | " for batch_i, (summaries_batch, texts_batch, summaries_lengths, texts_lengths) in enumerate(\n",
3705 | " get_batches(sorted_summaries_short, sorted_texts_short, batch_size)):\n",
3706 | " start_time = time.time()\n",
3707 | " _, loss = sess.run(\n",
3708 | " [train_op, cost],\n",
3709 | " {input_data: texts_batch,\n",
3710 | " targets: summaries_batch,\n",
3711 | " lr: learning_rate,\n",
3712 | " summary_length: summaries_lengths,\n",
3713 | " text_length: texts_lengths,\n",
3714 | " keep_prob: keep_probability})\n",
3715 | "\n",
3716 | " batch_loss += loss\n",
3717 | " update_loss += loss\n",
3718 | " end_time = time.time()\n",
3719 | " batch_time = end_time - start_time\n",
3720 | "\n",
3721 | " if batch_i % display_step == 0 and batch_i > 0:\n",
3722 | " print('Epoch {:>3}/{} Batch {:>4}/{} - Loss: {:>6.3f}, Seconds: {:>4.2f}'\n",
3723 | " .format(epoch_i,\n",
3724 | " epochs, \n",
3725 | " batch_i, \n",
3726 | " len(sorted_texts_short) // batch_size, \n",
3727 | " batch_loss / display_step, \n",
3728 | " batch_time*display_step))\n",
3729 | " batch_loss = 0\n",
3730 | "\n",
3731 | " if batch_i % update_check == 0 and batch_i > 0:\n",
3732 | " print(\"Average loss for this update:\", round(update_loss/update_check,3))\n",
3733 | " summary_update_loss.append(update_loss)\n",
3734 | " \n",
3735 | " # If the update loss is at a new minimum, save the model\n",
3736 | " if update_loss <= min(summary_update_loss):\n",
3737 | " print('New Record!') \n",
3738 | " stop_early = 0\n",
3739 | " saver = tf.train.Saver() \n",
3740 | " saver.save(sess, checkpoint)\n",
3741 | "\n",
3742 | " else:\n",
3743 | " print(\"No Improvement.\")\n",
3744 | " stop_early += 1\n",
3745 | " if stop_early == stop:\n",
3746 | " break\n",
3747 | " update_loss = 0\n",
3748 | " \n",
3749 | " \n",
3750 | " # Reduce learning rate, but not below its minimum value\n",
3751 | " learning_rate *= learning_rate_decay\n",
3752 | " if learning_rate < min_learning_rate:\n",
3753 | " learning_rate = min_learning_rate\n",
3754 | " \n",
3755 | " if stop_early == stop:\n",
3756 | " print(\"Stopping Training.\")\n",
3757 | " break"
3758 | ]
3759 | },
3760 | {
3761 | "cell_type": "markdown",
3762 | "metadata": {},
3763 | "source": [
3764 | "## 5. Making Our Own Summaries"
3765 | ]
3766 | },
3767 | {
3768 | "cell_type": "markdown",
3769 | "metadata": {},
3770 | "source": [
3771 | "To see the quality of the summaries this model can generate, you can either write your own review or use one from the dataset. You can set the length of each summary to a fixed value, as in the code below, or use a random value (see the commented-out np.random.randint option)."
3772 | ]
3773 | },
3774 | {
3775 | "cell_type": "code",
3776 | "execution_count": 20,
3777 | "metadata": {
3778 | "collapsed": true
3779 | },
3780 | "outputs": [],
3781 | "source": [
3782 | "def text_to_seq(text):\n",
3783 | " '''Prepare the text for the model'''\n",
3784 | " \n",
3785 | " text = clean_text(text)\n",
3786 | "    return [vocab_to_int.get(word, vocab_to_int['<UNK>']) for word in text.split()]"
3787 | ]
3788 | },
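{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick, hypothetical sanity check of `text_to_seq` (it assumes `clean_text` and `vocab_to_int` from the data-preparation section are in scope): in-vocabulary words map to their integer ids, and anything out of vocabulary falls back to the `<UNK>` id."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical example sentence; 'zzzzzz' is almost certainly out of vocabulary\n",
"example_ids = text_to_seq(\"Good quality dog food zzzzzz\")\n",
"print(example_ids)\n",
"print(vocab_to_int['<UNK>'] in example_ids)  # True if any word fell back to <UNK>"
]
},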
3789 | {
3790 | "cell_type": "markdown",
3791 | "metadata": {},
3792 | "source": [
3793 | "\n",
3794 | "- **input_sentences**: a list of review strings to summarize\n",
3795 | "- **generate_summary_length**: an int or a list; if a list, it must be the same length as input_sentences\n"
3796 | ]
3797 | },
3798 | {
3799 | "cell_type": "code",
3800 | "execution_count": 23,
3801 | "metadata": {},
3802 | "outputs": [
3803 | {
3804 | "name": "stdout",
3805 | "output_type": "stream",
3806 | "text": [
3807 | "INFO:tensorflow:Restoring parameters from ./best_model.ckpt\n",
3808 | "- Review:\n",
3809 | " The coffee tasted great and was at such a good price! I highly recommend this to everyone!\n",
3810 | "- Summary:\n",
3811 | " great great coffee\n",
3812 | "\n",
3813 | "\n",
3814 | "- Review:\n",
3815 | " love individual oatmeal cups found years ago sam quit selling sound big lots quit selling found target expensive buy individually trilled get entire case time go anywhere need water microwave spoon know quaker flavor packets\n",
3816 | "- Summary:\n",
3817 | " great taste\n",
3818 | "\n",
3819 | "\n"
3820 | ]
3821 | }
3822 | ],
3823 | "source": [
3824 | "input_sentences=[\"The coffee tasted great and was at such a good price! I highly recommend this to everyone!\",\n",
3825 | " \"love individual oatmeal cups found years ago sam quit selling sound big lots quit selling found target expensive buy individually trilled get entire case time go anywhere need water microwave spoon know quaker flavor packets\"]\n",
3826 | "generate_summary_length = [3,2]  # summary length (in words) for each input review\n",
3827 | "\n",
3828 | "texts = [text_to_seq(input_sentence) for input_sentence in input_sentences]\n",
3829 | "checkpoint = \"./best_model.ckpt\"\n",
3830 | "if isinstance(generate_summary_length, list):\n",
3831 | "    if len(input_sentences) != len(generate_summary_length):\n",
3832 | "        raise Exception(\"[Error] generate_summary_length must be the same length as input_sentences or an integer\")\n",
3833 | "    generate_summary_length_list = generate_summary_length\n",
3834 | "else:\n",
3835 | "    generate_summary_length_list = [generate_summary_length] * len(texts)\n",
3836 | "loaded_graph = tf.Graph()\n",
3837 | "with tf.Session(graph=loaded_graph) as sess:\n",
3838 | " # Load saved model\n",
3839 | " loader = tf.train.import_meta_graph(checkpoint + '.meta')\n",
3840 | " loader.restore(sess, checkpoint)\n",
3841 | " input_data = loaded_graph.get_tensor_by_name('input:0')\n",
3842 | " logits = loaded_graph.get_tensor_by_name('predictions:0')\n",
3843 | " text_length = loaded_graph.get_tensor_by_name('text_length:0')\n",
3844 | " summary_length = loaded_graph.get_tensor_by_name('summary_length:0')\n",
3845 | " keep_prob = loaded_graph.get_tensor_by_name('keep_prob:0')\n",
3846 | "    # Tile each input by batch_size to match the batch shape the graph expects\n",
3847 | "    for i, text in enumerate(texts):\n",
3848 | "        generate_summary_length = generate_summary_length_list[i]\n",
3849 | "        answer_logits = sess.run(logits, {input_data: [text]*batch_size, \n",
3850 | "                                      summary_length: [generate_summary_length], #summary_length: [np.random.randint(5,8)], \n",
3851 | "                                      text_length: [len(text)]*batch_size,\n",
3852 | "                                      keep_prob: 1.0})[0] \n",
3853 | "        # Remove the padding from the summaries\n",
3854 | "        pad = vocab_to_int[\"<PAD>\"]\n",
3855 | "        print('- Review:\\n\\r {}'.format(input_sentences[i]))\n",
3856 | "        print('- Summary:\\n\\r {}\\n\\r\\n\\r'.format(\" \".join([int_to_vocab[word_id] for word_id in answer_logits if word_id != pad])))"
3857 | ]
3858 | },
3859 | {
3860 | "cell_type": "markdown",
3861 | "metadata": {},
3862 | "source": [
3863 | "## Summary"
3864 | ]
3865 | },
3866 | {
3867 | "cell_type": "markdown",
3868 | "metadata": {},
3869 | "source": [
3870 | "I hope that you found this project interesting and informative. One of my main recommendations for working with this dataset and model is to use a GPU, a subset of the dataset, or plenty of time to train your model (a minimal sketch of training on a subset follows below). As you might expect, the model cannot make good predictions just by seeing many reviews once; it needs to see the reviews many times to learn the relationships between words and between descriptions and summaries. \n",
3871 | "\n",
3872 | "In short, I'm pleased with how well this model performs. After writing a number of my own reviews and checking reviews from the dataset, I can happily say that most of the generated summaries are appropriate, some of them are great, and some of them make mistakes. I'll keep trying to improve this model, and if it gets better, I'll update my GitHub.\n",
3873 | "\n",
3874 | "Thanks for reading!"
3875 | ]
3876 | },
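{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is a minimal, hypothetical sketch of the \"subset of the dataset\" idea mentioned above. It assumes the `sorted_summaries_short` and `sorted_texts_short` lists from the data-preparation section; `subset_size` is an arbitrary example value, not a tuned setting. To use it, re-run the training cell with the subset lists in place of the full ones."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A minimal sketch, not a tuned configuration: slice the prepared data so the\n",
"# training loop above runs in a reasonable time on limited hardware.\n",
"subset_size = 50000  # hypothetical value; adjust to your hardware and time budget\n",
"\n",
"sorted_texts_subset = sorted_texts_short[:subset_size]\n",
"sorted_summaries_subset = sorted_summaries_short[:subset_size]\n",
"\n",
"# Recompute the update-check interval for the smaller dataset before re-running\n",
"# the training cell with these subset lists.\n",
"update_check = (len(sorted_texts_subset)//batch_size//per_epoch)-1\n",
"print(len(sorted_texts_subset), 'training examples,', update_check, 'batches per update check')"
]
},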
3877 | {
3878 | "cell_type": "code",
3879 | "execution_count": null,
3880 | "metadata": {
3881 | "collapsed": true
3882 | },
3883 | "outputs": [],
3884 | "source": []
3885 | }
3886 | ],
3887 | "metadata": {
3888 | "anaconda-cloud": {},
3889 | "kernelspec": {
3890 | "display_name": "Python 3",
3891 | "language": "python",
3892 | "name": "python3"
3893 | },
3894 | "language_info": {
3895 | "codemirror_mode": {
3896 | "name": "ipython",
3897 | "version": 3
3898 | },
3899 | "file_extension": ".py",
3900 | "mimetype": "text/x-python",
3901 | "name": "python",
3902 | "nbconvert_exporter": "python",
3903 | "pygments_lexer": "ipython3",
3904 | "version": "3.5.2"
3905 | }
3906 | },
3907 | "nbformat": 4,
3908 | "nbformat_minor": 1
3909 | }
3910 |
--------------------------------------------------------------------------------