├── README.md
└── summarize_reviews.ipynb
/README.md:
--------------------------------------------------------------------------------
1 | # Summarizing Text with Amazon Reviews
2 |
3 | Updated to work with TensorFlow Version: 1.3.0
4 |
5 | The objective of this project is to build a model that can create relevant summaries for reviews written about fine foods sold on Amazon. This dataset contains more than 500,000 reviews, and is hosted on [Kaggle](https://www.kaggle.com/snap/amazon-fine-food-reviews).
6 |
7 | Here are two examples of what the data looks like:
8 | ```
9 | Review # 1
10 | Good Quality Dog Food
11 | I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than most.
12 |
13 | Review # 2
14 | Not as Advertised
15 | Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as "Jumbo".
16 | ```
17 | To build our model we will use a two-layer bidirectional RNN with LSTMs on the input data and a two-layer LSTM decoder with Bahdanau attention on the target data.
18 |
19 | The sections of this project are:
20 | - 1. Inspecting the Data
21 | - 2. Preparing the Data
22 | - 3. Building the Model
23 | - 4. Training the Model
24 | - 5. Making Our Own Summaries
25 |
26 | ## Download data
27 | Amazon Reviews data: download [Reviews.csv](https://www.kaggle.com/snap/amazon-fine-food-reviews/downloads/Reviews.csv) and copy it to **./Reviews.csv**
28 |
29 | Word embeddings: download [numberbatch-en-17.06.txt.gz](https://conceptnet.s3.amazonaws.com/downloads/2017/numberbatch/numberbatch-en-17.06.txt.gz)
30 | and extract it to **./model/numberbatch-en-17.06.txt**
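
Alternatively, a minimal Python sketch like the one below (not part of the project code) can fetch and extract the embeddings to the expected path:

```
import gzip, os, shutil, urllib.request

url = ("https://conceptnet.s3.amazonaws.com/downloads/2017/"
       "numberbatch/numberbatch-en-17.06.txt.gz")
os.makedirs("model", exist_ok=True)
archive = "model/numberbatch-en-17.06.txt.gz"
urllib.request.urlretrieve(url, archive)  # download the gzip archive
with gzip.open(archive, "rb") as src, open("model/numberbatch-en-17.06.txt", "wb") as dst:
    shutil.copyfileobj(src, dst)  # extract to ./model/numberbatch-en-17.06.txt
```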
31 |
32 | ## Dependencies
33 | Python 3.5 packages: tensorflow v1.3, pandas, numpy, nltk
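
The notebook removes English stopwords with NLTK, so the `stopwords` corpus needs to be downloaded once beforehand (a one-off setup step):

```
import nltk
nltk.download('stopwords')
```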
34 |
35 | ### How to Run
36 | `cd` into the project directory on the command line and start Jupyter:
37 | ```
38 | jupyter notebook
39 | ```
40 | then open this notebook in the browser:
41 |
42 | **summarize_reviews.ipynb**
43 |
44 |
45 | Inspired by the post [Text Summarization with Amazon Reviews](https://medium.com/towards-data-science/text-summarization-with-amazon-reviews-41801c2210b), with a few improvements.
46 |
47 | I wrote an [article](https://www.dlology.com/blog/tutorial-summarizing-text-with-amazon-reviews/) about this project that explains parts of it in detail.
--------------------------------------------------------------------------------
/summarize_reviews.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Summarizing Text with Amazon Reviews"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "The objective of this project is to build a model that can create relevant summaries for reviews written about fine foods sold on Amazon. This dataset contains above 500,000 reviews, and is hosted on [Kaggle](https://www.kaggle.com/snap/amazon-fine-food-reviews).\n",
15 | "\n",
16 | "To build our model we will use a two-layered bidirectional RNN with LSTMs on the input data and two layers, each with an LSTM using bahdanau attention on the target data.\n",
17 | "\n",
18 | "The sections of this project are:\n",
19 | "- [1.Inspecting the Data](#1.-Insepcting-the-Data)\n",
20 | "- [2.Preparing the Data](#2.-Preparing-the-Data)\n",
21 | "- [3.Building the Model](#3.-Building-the-Model)\n",
22 | "- [4.Training the Model](#4.-Training-the-Model)\n",
23 | "- [5.Making Our Own Summaries](#5.-Making-Our-Own-Summaries)\n",
24 | "\n",
25 | "## Download data\n",
26 | "Amazon Reviews Data: [Reviews.csv](https://www.kaggle.com/snap/amazon-fine-food-reviews/downloads/Reviews.csv)\n",
27 | "\n",
28 | "word embeddings [numberbatch-en-17.06.txt.gz](https://conceptnet.s3.amazonaws.com/downloads/2017/numberbatch/numberbatch-en-17.06.txt.gz)\n",
29 | "after download, extract to **./model/numberbatch-en-17.06.txt**"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": 1,
35 | "metadata": {},
36 | "outputs": [
37 | {
38 | "name": "stdout",
39 | "output_type": "stream",
40 | "text": [
41 | "TensorFlow Version: 1.3.0\n"
42 | ]
43 | }
44 | ],
45 | "source": [
46 | "import pandas as pd\n",
47 | "import numpy as np\n",
48 | "import tensorflow as tf\n",
49 | "import re\n",
50 | "from nltk.corpus import stopwords\n",
51 | "import time\n",
52 | "from tensorflow.python.layers.core import Dense\n",
53 | "from tensorflow.python.ops.rnn_cell_impl import _zero_state_tensors\n",
54 | "from tensorflow.python.ops import array_ops\n",
55 | "from tensorflow.python.ops import tensor_array_ops\n",
56 | "print('TensorFlow Version: {}'.format(tf.__version__))"
57 | ]
58 | },
59 | {
60 | "cell_type": "code",
61 | "execution_count": 2,
62 | "metadata": {
63 | "collapsed": true
64 | },
65 | "outputs": [],
66 | "source": [
67 | "import pickle\n",
68 | "def __pickleStuff(filename, stuff):\n",
69 | " save_stuff = open(filename, \"wb\")\n",
70 | " pickle.dump(stuff, save_stuff)\n",
71 | " save_stuff.close()\n",
72 | "def __loadStuff(filename):\n",
73 | " saved_stuff = open(filename,\"rb\")\n",
74 | " stuff = pickle.load(saved_stuff)\n",
75 | " saved_stuff.close()\n",
76 | " return stuff"
77 | ]
78 | },
79 | {
80 | "cell_type": "markdown",
81 | "metadata": {},
82 | "source": [
83 | "## Load those prepared data and skip to section \"[3. Building the Model](#3.-Building-the-Model)\"\n",
84 | "Once we have run through the \"[2.Preparing the Data](#2.-Preparing-the-Data)\" section, we should have those data, uncomment and run those lines."
85 | ]
86 | },
87 | {
88 | "cell_type": "code",
89 | "execution_count": 3,
90 | "metadata": {
91 | "collapsed": true
92 | },
93 | "outputs": [],
94 | "source": [
95 | "clean_summaries = __loadStuff(\"./data/clean_summaries.p\")\n",
96 | "clean_texts = __loadStuff(\"./data/clean_texts.p\")\n",
97 | "\n",
98 | "sorted_summaries = __loadStuff(\"./data/sorted_summaries.p\")\n",
99 | "sorted_texts = __loadStuff(\"./data/sorted_texts.p\")\n",
100 | "word_embedding_matrix = __loadStuff(\"./data/word_embedding_matrix.p\")\n",
101 | "\n",
102 | "vocab_to_int = __loadStuff(\"./data/vocab_to_int.p\")\n",
103 | "int_to_vocab = __loadStuff(\"./data/int_to_vocab.p\")\n"
104 | ]
105 | },
106 | {
107 | "cell_type": "markdown",
108 | "metadata": {},
109 | "source": [
110 | "## 1. Insepcting the Data"
111 | ]
112 | },
113 | {
114 | "cell_type": "code",
115 | "execution_count": 3,
116 | "metadata": {
117 | "collapsed": true
118 | },
119 | "outputs": [],
120 | "source": [
121 | "reviews = pd.read_csv(\"Reviews.csv\")"
122 | ]
123 | },
124 | {
125 | "cell_type": "code",
126 | "execution_count": 4,
127 | "metadata": {},
128 | "outputs": [
129 | {
130 | "data": {
131 | "text/plain": [
132 | "(568454, 10)"
133 | ]
134 | },
135 | "execution_count": 4,
136 | "metadata": {},
137 | "output_type": "execute_result"
138 | }
139 | ],
140 | "source": [
141 | "reviews.shape"
142 | ]
143 | },
144 | {
145 | "cell_type": "code",
146 | "execution_count": 5,
147 | "metadata": {},
148 | "outputs": [
149 | {
150 | "data": {
151 | "text/html": [
152 | "
\n",
153 | "
\n",
154 | " \n",
155 | " \n",
156 | " | \n",
157 | " Id | \n",
158 | " ProductId | \n",
159 | " UserId | \n",
160 | " ProfileName | \n",
161 | " HelpfulnessNumerator | \n",
162 | " HelpfulnessDenominator | \n",
163 | " Score | \n",
164 | " Time | \n",
165 | " Summary | \n",
166 | " Text | \n",
167 | "
\n",
168 | " \n",
169 | " \n",
170 | " \n",
171 | " | 0 | \n",
172 | " 1 | \n",
173 | " B001E4KFG0 | \n",
174 | " A3SGXH7AUHU8GW | \n",
175 | " delmartian | \n",
176 | " 1 | \n",
177 | " 1 | \n",
178 | " 5 | \n",
179 | " 1303862400 | \n",
180 | " Good Quality Dog Food | \n",
181 | " I have bought several of the Vitality canned d... | \n",
182 | "
\n",
183 | " \n",
184 | " | 1 | \n",
185 | " 2 | \n",
186 | " B00813GRG4 | \n",
187 | " A1D87F6ZCVE5NK | \n",
188 | " dll pa | \n",
189 | " 0 | \n",
190 | " 0 | \n",
191 | " 1 | \n",
192 | " 1346976000 | \n",
193 | " Not as Advertised | \n",
194 | " Product arrived labeled as Jumbo Salted Peanut... | \n",
195 | "
\n",
196 | " \n",
197 | " | 2 | \n",
198 | " 3 | \n",
199 | " B000LQOCH0 | \n",
200 | " ABXLMWJIXXAIN | \n",
201 | " Natalia Corres \"Natalia Corres\" | \n",
202 | " 1 | \n",
203 | " 1 | \n",
204 | " 4 | \n",
205 | " 1219017600 | \n",
206 | " \"Delight\" says it all | \n",
207 | " This is a confection that has been around a fe... | \n",
208 | "
\n",
209 | " \n",
210 | " | 3 | \n",
211 | " 4 | \n",
212 | " B000UA0QIQ | \n",
213 | " A395BORC6FGVXV | \n",
214 | " Karl | \n",
215 | " 3 | \n",
216 | " 3 | \n",
217 | " 2 | \n",
218 | " 1307923200 | \n",
219 | " Cough Medicine | \n",
220 | " If you are looking for the secret ingredient i... | \n",
221 | "
\n",
222 | " \n",
223 | " | 4 | \n",
224 | " 5 | \n",
225 | " B006K2ZZ7K | \n",
226 | " A1UQRSCLF8GW1T | \n",
227 | " Michael D. Bigham \"M. Wassir\" | \n",
228 | " 0 | \n",
229 | " 0 | \n",
230 | " 5 | \n",
231 | " 1350777600 | \n",
232 | " Great taffy | \n",
233 | " Great taffy at a great price. There was a wid... | \n",
234 | "
\n",
235 | " \n",
236 | "
\n",
237 | "
"
238 | ],
239 | "text/plain": [
240 | " Id ProductId UserId ProfileName \\\n",
241 | "0 1 B001E4KFG0 A3SGXH7AUHU8GW delmartian \n",
242 | "1 2 B00813GRG4 A1D87F6ZCVE5NK dll pa \n",
243 | "2 3 B000LQOCH0 ABXLMWJIXXAIN Natalia Corres \"Natalia Corres\" \n",
244 | "3 4 B000UA0QIQ A395BORC6FGVXV Karl \n",
245 | "4 5 B006K2ZZ7K A1UQRSCLF8GW1T Michael D. Bigham \"M. Wassir\" \n",
246 | "\n",
247 | " HelpfulnessNumerator HelpfulnessDenominator Score Time \\\n",
248 | "0 1 1 5 1303862400 \n",
249 | "1 0 0 1 1346976000 \n",
250 | "2 1 1 4 1219017600 \n",
251 | "3 3 3 2 1307923200 \n",
252 | "4 0 0 5 1350777600 \n",
253 | "\n",
254 | " Summary Text \n",
255 | "0 Good Quality Dog Food I have bought several of the Vitality canned d... \n",
256 | "1 Not as Advertised Product arrived labeled as Jumbo Salted Peanut... \n",
257 | "2 \"Delight\" says it all This is a confection that has been around a fe... \n",
258 | "3 Cough Medicine If you are looking for the secret ingredient i... \n",
259 | "4 Great taffy Great taffy at a great price. There was a wid... "
260 | ]
261 | },
262 | "execution_count": 5,
263 | "metadata": {},
264 | "output_type": "execute_result"
265 | }
266 | ],
267 | "source": [
268 | "reviews.head()"
269 | ]
270 | },
271 | {
272 | "cell_type": "code",
273 | "execution_count": 6,
274 | "metadata": {},
275 | "outputs": [
276 | {
277 | "data": {
278 | "text/plain": [
279 | "Id 0\n",
280 | "ProductId 0\n",
281 | "UserId 0\n",
282 | "ProfileName 16\n",
283 | "HelpfulnessNumerator 0\n",
284 | "HelpfulnessDenominator 0\n",
285 | "Score 0\n",
286 | "Time 0\n",
287 | "Summary 26\n",
288 | "Text 0\n",
289 | "dtype: int64"
290 | ]
291 | },
292 | "execution_count": 6,
293 | "metadata": {},
294 | "output_type": "execute_result"
295 | }
296 | ],
297 | "source": [
298 | "# Check for any nulls values\n",
299 | "reviews.isnull().sum()"
300 | ]
301 | },
302 | {
303 | "cell_type": "code",
304 | "execution_count": 7,
305 | "metadata": {
306 | "collapsed": true
307 | },
308 | "outputs": [],
309 | "source": [
310 | "# Remove null values and unneeded features\n",
311 | "reviews = reviews.dropna()\n",
312 | "reviews = reviews.drop(['Id','ProductId','UserId','ProfileName','HelpfulnessNumerator','HelpfulnessDenominator',\n",
313 | " 'Score','Time'], 1)\n",
314 | "reviews = reviews.reset_index(drop=True)"
315 | ]
316 | },
317 | {
318 | "cell_type": "code",
319 | "execution_count": 8,
320 | "metadata": {},
321 | "outputs": [
322 | {
323 | "data": {
324 | "text/plain": [
325 | "(568412, 2)"
326 | ]
327 | },
328 | "execution_count": 8,
329 | "metadata": {},
330 | "output_type": "execute_result"
331 | }
332 | ],
333 | "source": [
334 | "reviews.shape"
335 | ]
336 | },
337 | {
338 | "cell_type": "code",
339 | "execution_count": 9,
340 | "metadata": {},
341 | "outputs": [
342 | {
343 | "data": {
344 | "text/html": [
345 | "\n",
346 | "
\n",
347 | " \n",
348 | " \n",
349 | " | \n",
350 | " Summary | \n",
351 | " Text | \n",
352 | "
\n",
353 | " \n",
354 | " \n",
355 | " \n",
356 | " | 0 | \n",
357 | " Good Quality Dog Food | \n",
358 | " I have bought several of the Vitality canned d... | \n",
359 | "
\n",
360 | " \n",
361 | " | 1 | \n",
362 | " Not as Advertised | \n",
363 | " Product arrived labeled as Jumbo Salted Peanut... | \n",
364 | "
\n",
365 | " \n",
366 | " | 2 | \n",
367 | " \"Delight\" says it all | \n",
368 | " This is a confection that has been around a fe... | \n",
369 | "
\n",
370 | " \n",
371 | " | 3 | \n",
372 | " Cough Medicine | \n",
373 | " If you are looking for the secret ingredient i... | \n",
374 | "
\n",
375 | " \n",
376 | " | 4 | \n",
377 | " Great taffy | \n",
378 | " Great taffy at a great price. There was a wid... | \n",
379 | "
\n",
380 | " \n",
381 | "
\n",
382 | "
"
383 | ],
384 | "text/plain": [
385 | " Summary Text\n",
386 | "0 Good Quality Dog Food I have bought several of the Vitality canned d...\n",
387 | "1 Not as Advertised Product arrived labeled as Jumbo Salted Peanut...\n",
388 | "2 \"Delight\" says it all This is a confection that has been around a fe...\n",
389 | "3 Cough Medicine If you are looking for the secret ingredient i...\n",
390 | "4 Great taffy Great taffy at a great price. There was a wid..."
391 | ]
392 | },
393 | "execution_count": 9,
394 | "metadata": {},
395 | "output_type": "execute_result"
396 | }
397 | ],
398 | "source": [
399 | "reviews.head()"
400 | ]
401 | },
402 | {
403 | "cell_type": "code",
404 | "execution_count": 10,
405 | "metadata": {},
406 | "outputs": [
407 | {
408 | "name": "stdout",
409 | "output_type": "stream",
410 | "text": [
411 | "Review # 1\n",
412 | "Good Quality Dog Food\n",
413 | "I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than most.\n",
414 | "\n",
415 | "Review # 2\n",
416 | "Not as Advertised\n",
417 | "Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as \"Jumbo\".\n",
418 | "\n",
419 | "Review # 3\n",
420 | "\"Delight\" says it all\n",
421 | "This is a confection that has been around a few centuries. It is a light, pillowy citrus gelatin with nuts - in this case Filberts. And it is cut into tiny squares and then liberally coated with powdered sugar. And it is a tiny mouthful of heaven. Not too chewy, and very flavorful. I highly recommend this yummy treat. If you are familiar with the story of C.S. Lewis' \"The Lion, The Witch, and The Wardrobe\" - this is the treat that seduces Edmund into selling out his Brother and Sisters to the Witch.\n",
422 | "\n",
423 | "Review # 4\n",
424 | "Cough Medicine\n",
425 | "If you are looking for the secret ingredient in Robitussin I believe I have found it. I got this in addition to the Root Beer Extract I ordered (which was good) and made some cherry soda. The flavor is very medicinal.\n",
426 | "\n",
427 | "Review # 5\n",
428 | "Great taffy\n",
429 | "Great taffy at a great price. There was a wide assortment of yummy taffy. Delivery was very quick. If your a taffy lover, this is a deal.\n",
430 | "\n"
431 | ]
432 | }
433 | ],
434 | "source": [
435 | "# Inspecting some of the reviews\n",
436 | "for i in range(5):\n",
437 | " print(\"Review #\",i+1)\n",
438 | " print(reviews.Summary[i])\n",
439 | " print(reviews.Text[i])\n",
440 | " print()"
441 | ]
442 | },
443 | {
444 | "cell_type": "markdown",
445 | "metadata": {},
446 | "source": [
447 | "## 2. Preparing the Data"
448 | ]
449 | },
450 | {
451 | "cell_type": "code",
452 | "execution_count": 15,
453 | "metadata": {
454 | "collapsed": true
455 | },
456 | "outputs": [],
457 | "source": [
458 | "# A list of contractions from http://stackoverflow.com/questions/19790188/expanding-english-language-contractions-in-python\n",
459 | "contractions = { \n",
460 | "\"ain't\": \"am not\",\n",
461 | "\"aren't\": \"are not\",\n",
462 | "\"can't\": \"cannot\",\n",
463 | "\"can't've\": \"cannot have\",\n",
464 | "\"'cause\": \"because\",\n",
465 | "\"could've\": \"could have\",\n",
466 | "\"couldn't\": \"could not\",\n",
467 | "\"couldn't've\": \"could not have\",\n",
468 | "\"didn't\": \"did not\",\n",
469 | "\"doesn't\": \"does not\",\n",
470 | "\"don't\": \"do not\",\n",
471 | "\"hadn't\": \"had not\",\n",
472 | "\"hadn't've\": \"had not have\",\n",
473 | "\"hasn't\": \"has not\",\n",
474 | "\"haven't\": \"have not\",\n",
475 | "\"he'd\": \"he would\",\n",
476 | "\"he'd've\": \"he would have\",\n",
477 | "\"he'll\": \"he will\",\n",
478 | "\"he's\": \"he is\",\n",
479 | "\"how'd\": \"how did\",\n",
480 | "\"how'll\": \"how will\",\n",
481 | "\"how's\": \"how is\",\n",
482 | "\"i'd\": \"i would\",\n",
483 | "\"i'll\": \"i will\",\n",
484 | "\"i'm\": \"i am\",\n",
485 | "\"i've\": \"i have\",\n",
486 | "\"isn't\": \"is not\",\n",
487 | "\"it'd\": \"it would\",\n",
488 | "\"it'll\": \"it will\",\n",
489 | "\"it's\": \"it is\",\n",
490 | "\"let's\": \"let us\",\n",
491 | "\"ma'am\": \"madam\",\n",
492 | "\"mayn't\": \"may not\",\n",
493 | "\"might've\": \"might have\",\n",
494 | "\"mightn't\": \"might not\",\n",
495 | "\"must've\": \"must have\",\n",
496 | "\"mustn't\": \"must not\",\n",
497 | "\"needn't\": \"need not\",\n",
498 | "\"oughtn't\": \"ought not\",\n",
499 | "\"shan't\": \"shall not\",\n",
500 | "\"sha'n't\": \"shall not\",\n",
501 | "\"she'd\": \"she would\",\n",
502 | "\"she'll\": \"she will\",\n",
503 | "\"she's\": \"she is\",\n",
504 | "\"should've\": \"should have\",\n",
505 | "\"shouldn't\": \"should not\",\n",
506 | "\"that'd\": \"that would\",\n",
507 | "\"that's\": \"that is\",\n",
508 | "\"there'd\": \"there had\",\n",
509 | "\"there's\": \"there is\",\n",
510 | "\"they'd\": \"they would\",\n",
511 | "\"they'll\": \"they will\",\n",
512 | "\"they're\": \"they are\",\n",
513 | "\"they've\": \"they have\",\n",
514 | "\"wasn't\": \"was not\",\n",
515 | "\"we'd\": \"we would\",\n",
516 | "\"we'll\": \"we will\",\n",
517 | "\"we're\": \"we are\",\n",
518 | "\"we've\": \"we have\",\n",
519 | "\"weren't\": \"were not\",\n",
520 | "\"what'll\": \"what will\",\n",
521 | "\"what're\": \"what are\",\n",
522 | "\"what's\": \"what is\",\n",
523 | "\"what've\": \"what have\",\n",
524 | "\"where'd\": \"where did\",\n",
525 | "\"where's\": \"where is\",\n",
526 | "\"who'll\": \"who will\",\n",
527 | "\"who's\": \"who is\",\n",
528 | "\"won't\": \"will not\",\n",
529 | "\"wouldn't\": \"would not\",\n",
530 | "\"you'd\": \"you would\",\n",
531 | "\"you'll\": \"you will\",\n",
532 | "\"you're\": \"you are\"\n",
533 | "}"
534 | ]
535 | },
536 | {
537 | "cell_type": "code",
538 | "execution_count": 16,
539 | "metadata": {
540 | "collapsed": true
541 | },
542 | "outputs": [],
543 | "source": [
544 | "def clean_text(text, remove_stopwords = True):\n",
545 | " '''Remove unwanted characters, stopwords, and format the text to create fewer nulls word embeddings'''\n",
546 | " \n",
547 | " # Convert words to lower case\n",
548 | " text = text.lower()\n",
549 | " \n",
550 | " # Replace contractions with their longer forms \n",
551 | " if True:\n",
552 | " # We are not using \"text.split()\" here\n",
553 | " #since it is not fool proof, e.g. words followed by punctuations \"Are you kidding?I think you aren't.\"\n",
554 | " text = re.findall(r\"[\\w']+\", text)\n",
555 | " new_text = []\n",
556 | " for word in text:\n",
557 | " if word in contractions:\n",
558 | " new_text.append(contractions[word])\n",
559 | " else:\n",
560 | " new_text.append(word)\n",
561 | " text = \" \".join(new_text)\n",
562 | " \n",
563 | " # Format words and remove unwanted characters\n",
564 | " text = re.sub(r'https?:\\/\\/.*[\\r\\n]*', '', text, flags=re.MULTILINE)# remove links\n",
565 | " text = re.sub(r'\\', ' ', text)\n",
569 | " text = re.sub(r'\\'', ' ', text)\n",
570 | " \n",
571 | " # Optionally, remove stop words\n",
572 | " if remove_stopwords:\n",
573 | " text = text.split()\n",
574 | " stops = set(stopwords.words(\"english\"))\n",
575 | " text = [w for w in text if not w in stops]\n",
576 | " text = \" \".join(text)\n",
577 | "\n",
578 | " return text"
579 | ]
580 | },
581 | {
582 | "cell_type": "code",
583 | "execution_count": 17,
584 | "metadata": {},
585 | "outputs": [
586 | {
587 | "data": {
588 | "text/plain": [
589 | "'great movie believe may'"
590 | ]
591 | },
592 | "execution_count": 17,
593 | "metadata": {},
594 | "output_type": "execute_result"
595 | }
596 | ],
597 | "source": [
598 | "clean_text(\"That's a great movie,Can you believe it?I've.But you may not.\")"
599 | ]
600 | },
601 | {
602 | "cell_type": "markdown",
603 | "metadata": {},
604 | "source": [
605 | "### Clean the summaries and texts\n",
606 | "We will remove the stopwords from the texts because they do not provide much use for training our model. However, we will keep them for our summaries so that they sound more like natural phrases. "
607 | ]
608 | },
609 | {
610 | "cell_type": "code",
611 | "execution_count": 14,
612 | "metadata": {},
613 | "outputs": [
614 | {
615 | "name": "stdout",
616 | "output_type": "stream",
617 | "text": [
618 | "Summaries are complete.\n",
619 | "Texts are complete.\n"
620 | ]
621 | }
622 | ],
623 | "source": [
624 | "clean_summaries = []\n",
625 | "for summary in reviews.Summary:\n",
626 | " clean_summaries.append(clean_text(summary, remove_stopwords=False))\n",
627 | "print(\"Summaries are complete.\")\n",
628 | "\n",
629 | "clean_texts = []\n",
630 | "for text in reviews.Text:\n",
631 | " clean_texts.append(clean_text(text))\n",
632 | "print(\"Texts are complete.\")"
633 | ]
634 | },
635 | {
636 | "cell_type": "code",
637 | "execution_count": 15,
638 | "metadata": {},
639 | "outputs": [
640 | {
641 | "name": "stdout",
642 | "output_type": "stream",
643 | "text": [
644 | "Clean Review # 1\n",
645 | "good quality dog food\n",
646 | "bought several vitality canned dog food products found good quality product looks like stew processed meat smells better labrador finicky appreciates product better\n",
647 | "\n",
648 | "Clean Review # 2\n",
649 | "not as advertised\n",
650 | "product arrived labeled jumbo salted peanuts peanuts actually small sized unsalted sure error vendor intended represent product jumbo\n",
651 | "\n",
652 | "Clean Review # 3\n",
653 | "delight says it all\n",
654 | "confection around centuries light pillowy citrus gelatin nuts case filberts cut tiny squares liberally coated powdered sugar tiny mouthful heaven chewy flavorful highly recommend yummy treat familiar story c lewis lion witch wardrobe treat seduces edmund selling brother sisters witch\n",
655 | "\n",
656 | "Clean Review # 4\n",
657 | "cough medicine\n",
658 | "looking secret ingredient robitussin believe found got addition root beer extract ordered good made cherry soda flavor medicinal\n",
659 | "\n",
660 | "Clean Review # 5\n",
661 | "great taffy\n",
662 | "great taffy great price wide assortment yummy taffy delivery quick taffy lover deal\n",
663 | "\n"
664 | ]
665 | }
666 | ],
667 | "source": [
668 | "# Inspect the cleaned summaries and texts to ensure they have been cleaned well\n",
669 | "for i in range(5):\n",
670 | " print(\"Clean Review #\",i+1)\n",
671 | " print(clean_summaries[i])\n",
672 | " print(clean_texts[i])\n",
673 | " print()"
674 | ]
675 | },
676 | {
677 | "cell_type": "markdown",
678 | "metadata": {},
679 | "source": [
680 | "### Count the number of occurrences of each word in a set of text"
681 | ]
682 | },
683 | {
684 | "cell_type": "code",
685 | "execution_count": 16,
686 | "metadata": {
687 | "collapsed": true
688 | },
689 | "outputs": [],
690 | "source": [
691 | "def count_words(count_dict, text):\n",
692 | " for sentence in text:\n",
693 | " for word in sentence.split():\n",
694 | " if word not in count_dict:\n",
695 | " count_dict[word] = 1\n",
696 | " else:\n",
697 | " count_dict[word] += 1"
698 | ]
699 | },
700 | {
701 | "cell_type": "markdown",
702 | "metadata": {},
703 | "source": [
704 | "#### Give the function a try"
705 | ]
706 | },
707 | {
708 | "cell_type": "code",
709 | "execution_count": 17,
710 | "metadata": {},
711 | "outputs": [
712 | {
713 | "data": {
714 | "text/plain": [
715 | "{'a': 2, 'dog': 2, 'great': 4, 'have': 1, 'is': 1, 'that': 1, 'you': 1}"
716 | ]
717 | },
718 | "execution_count": 17,
719 | "metadata": {},
720 | "output_type": "execute_result"
721 | }
722 | ],
723 | "source": [
724 | "mydict = {}\n",
725 | "count_words(mydict, [\"that is a great great great dog\",\"you have a great dog\"])\n",
726 | "mydict"
727 | ]
728 | },
729 | {
730 | "cell_type": "code",
731 | "execution_count": 18,
732 | "metadata": {},
733 | "outputs": [
734 | {
735 | "name": "stdout",
736 | "output_type": "stream",
737 | "text": [
738 | "Size of Vocabulary: 125880\n"
739 | ]
740 | }
741 | ],
742 | "source": [
743 | "word_counts = {}\n",
744 | "count_words(word_counts, clean_summaries)\n",
745 | "count_words(word_counts, clean_texts)\n",
746 | "print(\"Size of Vocabulary:\", len(word_counts))"
747 | ]
748 | },
749 | {
750 | "cell_type": "markdown",
751 | "metadata": {},
752 | "source": [
753 | "Let's see how may \"hero\" occurs in the data"
754 | ]
755 | },
756 | {
757 | "cell_type": "code",
758 | "execution_count": 19,
759 | "metadata": {},
760 | "outputs": [
761 | {
762 | "data": {
763 | "text/plain": [
764 | "114"
765 | ]
766 | },
767 | "execution_count": 19,
768 | "metadata": {},
769 | "output_type": "execute_result"
770 | }
771 | ],
772 | "source": [
773 | "word_counts[\"hero\"]"
774 | ]
775 | },
776 | {
777 | "cell_type": "markdown",
778 | "metadata": {
779 | "collapsed": true
780 | },
781 | "source": [
782 | "### Load Conceptnet Numberbatch's (CN) embeddings, similar to GloVe, but probably better \n",
783 | " (https://github.com/commonsense/conceptnet-numberbatch)"
784 | ]
785 | },
786 | {
787 | "cell_type": "code",
788 | "execution_count": 20,
789 | "metadata": {},
790 | "outputs": [
791 | {
792 | "name": "stdout",
793 | "output_type": "stream",
794 | "text": [
795 | "Word embeddings: 417195\n"
796 | ]
797 | }
798 | ],
799 | "source": [
800 | "\n",
801 | "embeddings_index = {}\n",
802 | "with open('./model/numberbatch-en-17.06.txt', encoding='utf-8') as f:\n",
803 | " for line in f:\n",
804 | " values = line.split(' ')\n",
805 | " word = values[0]\n",
806 | " embedding = np.asarray(values[1:], dtype='float32')\n",
807 | " embeddings_index[word] = embedding\n",
808 | "\n",
809 | "print('Word embeddings:', len(embeddings_index))"
810 | ]
811 | },
812 | {
813 | "cell_type": "markdown",
814 | "metadata": {},
815 | "source": [
816 | "### Take a look at the CN embedding dimension"
817 | ]
818 | },
819 | {
820 | "cell_type": "code",
821 | "execution_count": 21,
822 | "metadata": {},
823 | "outputs": [
824 | {
825 | "data": {
826 | "text/plain": [
827 | "(300,)"
828 | ]
829 | },
830 | "execution_count": 21,
831 | "metadata": {},
832 | "output_type": "execute_result"
833 | }
834 | ],
835 | "source": [
836 | "embeddings_index[\"hero\"].shape"
837 | ]
838 | },
839 | {
840 | "cell_type": "markdown",
841 | "metadata": {},
842 | "source": [
843 | "### Find the number of words that are missing from CN, and are used more than our threshold.\n",
844 | "\n",
845 | "I use a **threshold** of 20, so that words not in CN can be added to our **word_embedding_matrix**, but they need to be common enough in the reviews so that the model can understand their meaning."
846 | ]
847 | },
848 | {
849 | "cell_type": "code",
850 | "execution_count": 22,
851 | "metadata": {},
852 | "outputs": [
853 | {
854 | "name": "stdout",
855 | "output_type": "stream",
856 | "text": [
857 | "Number of words missing from CN: 2608\n",
858 | "Percent of words that are missing from vocabulary: 2.07%\n"
859 | ]
860 | }
861 | ],
862 | "source": [
863 | "missing_words = 0\n",
864 | "threshold = 20\n",
865 | "\n",
866 | "for word, count in word_counts.items():\n",
867 | " if count > threshold:\n",
868 | " if word not in embeddings_index:\n",
869 | " missing_words += 1\n",
870 | " \n",
871 | "missing_ratio = round(missing_words/len(word_counts),4)*100\n",
872 | " \n",
873 | "print(\"Number of words missing from CN:\", missing_words)\n",
874 | "print(\"Percent of words that are missing from vocabulary: {}%\".format(missing_ratio))"
875 | ]
876 | },
877 | {
878 | "cell_type": "markdown",
879 | "metadata": {},
880 | "source": [
881 | "### What are those missing words in the CN\n",
882 | "Looks mostly products' brand."
883 | ]
884 | },
885 | {
886 | "cell_type": "code",
887 | "execution_count": 23,
888 | "metadata": {},
889 | "outputs": [
890 | {
891 | "data": {
892 | "text/plain": [
893 | "[('wafu', 29),\n",
894 | " ('wasaibi', 24),\n",
895 | " ('sauage', 23),\n",
896 | " ('diabetisweet', 27),\n",
897 | " ('aerogrow', 99),\n",
898 | " ('lowfat', 298),\n",
899 | " ('deliverd', 21),\n",
900 | " ('bullysticks', 21),\n",
901 | " ('keurigs', 72),\n",
902 | " ('pepitas', 42),\n",
903 | " ('wellpet', 27),\n",
904 | " ('undertaste', 24),\n",
905 | " ('50g', 44),\n",
906 | " ('ammount', 45),\n",
907 | " ('400', 461),\n",
908 | " ('toniq', 21),\n",
909 | " ('gummis', 161),\n",
910 | " ('teasan', 81),\n",
911 | " ('27th', 28),\n",
912 | " ('iherb', 66),\n",
913 | " ('fage', 34),\n",
914 | " ('droste', 70),\n",
915 | " ('wholefoods', 145),\n",
916 | " ('marzanos', 29),\n",
917 | " ('discusting', 28),\n",
918 | " ('foojoy', 41),\n",
919 | " ('91', 75),\n",
920 | " ('indomie', 36),\n",
921 | " ('5hour', 64),\n",
922 | " ('ec155', 34)]"
923 | ]
924 | },
925 | "execution_count": 23,
926 | "metadata": {},
927 | "output_type": "execute_result"
928 | }
929 | ],
930 | "source": [
931 | "missing_words = []\n",
932 | "for word, count in word_counts.items():\n",
933 | " if count > threshold and word not in embeddings_index:\n",
934 | " missing_words.append((word,count))\n",
935 | "missing_words[:30]"
936 | ]
937 | },
938 | {
939 | "cell_type": "markdown",
940 | "metadata": {},
941 | "source": [
942 | "### Words to indexes, indexes to words dicts\n",
943 | "Limit the vocab that we will use to words that appear ≥ threshold or are in CN"
944 | ]
945 | },
946 | {
947 | "cell_type": "code",
948 | "execution_count": 24,
949 | "metadata": {},
950 | "outputs": [
951 | {
952 | "name": "stdout",
953 | "output_type": "stream",
954 | "text": [
955 | "Total number of unique words: 125880\n",
956 | "Number of words we will use: 59072\n",
957 | "Percent of words we will use: 46.93%\n"
958 | ]
959 | }
960 | ],
961 | "source": [
962 | "#dictionary to convert words to integers\n",
963 | "vocab_to_int = {} \n",
964 | "# Index words from 0\n",
965 | "value = 0\n",
966 | "for word, count in word_counts.items():\n",
967 | " if count >= threshold or word in embeddings_index:\n",
968 | " vocab_to_int[word] = value\n",
969 | " value += 1\n",
970 | "\n",
971 | "# Special tokens that will be added to our vocab\n",
972 | "codes = [\"\",\"\",\"\",\"\"] \n",
973 | "\n",
974 | "# Add codes to vocab\n",
975 | "for code in codes:\n",
976 | " vocab_to_int[code] = len(vocab_to_int)\n",
977 | "\n",
978 | "# Dictionary to convert integers to words\n",
979 | "int_to_vocab = {}\n",
980 | "for word, value in vocab_to_int.items():\n",
981 | " int_to_vocab[value] = word\n",
982 | "\n",
983 | "usage_ratio = round(len(vocab_to_int) / len(word_counts),4)*100\n",
984 | "\n",
985 | "print(\"Total number of unique words:\", len(word_counts))\n",
986 | "print(\"Number of words we will use:\", len(vocab_to_int))\n",
987 | "print(\"Percent of words we will use: {}%\".format(usage_ratio))"
988 | ]
989 | },
990 | {
991 | "cell_type": "markdown",
992 | "metadata": {},
993 | "source": [
994 | "### Create word embedding matrix\n",
995 | "It has shape (nb_words, embedding_dim) i.e. (59072, 300) in this case. 1st dim is word index, 2nd dim is from CN or random generated."
996 | ]
997 | },
998 | {
999 | "cell_type": "code",
1000 | "execution_count": 25,
1001 | "metadata": {},
1002 | "outputs": [
1003 | {
1004 | "name": "stdout",
1005 | "output_type": "stream",
1006 | "text": [
1007 | "59072\n"
1008 | ]
1009 | }
1010 | ],
1011 | "source": [
1012 | "# Need to use 300 for embedding dimensions to match CN's vectors.\n",
1013 | "embedding_dim = 300\n",
1014 | "nb_words = len(vocab_to_int)\n",
1015 | "\n",
1016 | "# Create matrix with default values of zero\n",
1017 | "word_embedding_matrix = np.zeros((nb_words, embedding_dim), dtype=np.float32)\n",
1018 | "for word, i in vocab_to_int.items():\n",
1019 | " if word in embeddings_index:\n",
1020 | " word_embedding_matrix[i] = embeddings_index[word]\n",
1021 | " else:\n",
1022 | " # If word not in CN, create a random embedding for it\n",
1023 | " new_embedding = np.array(np.random.uniform(-1.0, 1.0, embedding_dim))\n",
1024 | " embeddings_index[word] = new_embedding\n",
1025 | " word_embedding_matrix[i] = new_embedding\n",
1026 | "\n",
1027 | "# Check if value matches len(vocab_to_int)\n",
1028 | "print(len(word_embedding_matrix))"
1029 | ]
1030 | },
1031 | {
1032 | "cell_type": "markdown",
1033 | "metadata": {},
1034 | "source": [
1035 | "### Function to convert sentences to sequence of words indexes\n",
1036 | "It also use `` index to replace unknown words, append `` (End of Sentence) to the sequences if eos is set True"
1037 | ]
1038 | },
1039 | {
1040 | "cell_type": "code",
1041 | "execution_count": 26,
1042 | "metadata": {
1043 | "collapsed": true
1044 | },
1045 | "outputs": [],
1046 | "source": [
1047 | "def convert_to_ints(text, word_count, unk_count, eos=False):\n",
1048 | " '''Convert words in text to an integer.\n",
1049 | " If word is not in vocab_to_int, use UNK's integer.\n",
1050 | " Total the number of words and UNKs.\n",
1051 | " Add EOS token to the end of texts'''\n",
1052 | " ints = []\n",
1053 | " for sentence in text:\n",
1054 | " sentence_ints = []\n",
1055 | " for word in sentence.split():\n",
1056 | " word_count += 1\n",
1057 | " if word in vocab_to_int:\n",
1058 | " sentence_ints.append(vocab_to_int[word])\n",
1059 | " else:\n",
1060 | " sentence_ints.append(vocab_to_int[\"\"])\n",
1061 | " unk_count += 1\n",
1062 | " if eos:\n",
1063 | " sentence_ints.append(vocab_to_int[\"\"])\n",
1064 | " ints.append(sentence_ints)\n",
1065 | " return ints, word_count, unk_count"
1066 | ]
1067 | },
1068 | {
1069 | "cell_type": "markdown",
1070 | "metadata": {},
1071 | "source": [
1072 | "Apply convert_to_ints to clean_summaries and clean_texts"
1073 | ]
1074 | },
1075 | {
1076 | "cell_type": "code",
1077 | "execution_count": 27,
1078 | "metadata": {},
1079 | "outputs": [
1080 | {
1081 | "name": "stdout",
1082 | "output_type": "stream",
1083 | "text": [
1084 | "Total number of words in headlines: 26232576\n",
1085 | "Total number of UNKs in headlines: 163594\n",
1086 | "Percent of words that are UNK: 0.62%\n"
1087 | ]
1088 | }
1089 | ],
1090 | "source": [
1091 | "\n",
1092 | "word_count = 0\n",
1093 | "unk_count = 0\n",
1094 | "\n",
1095 | "int_summaries, word_count, unk_count = convert_to_ints(clean_summaries, word_count, unk_count)\n",
1096 | "int_texts, word_count, unk_count = convert_to_ints(clean_texts, word_count, unk_count, eos=True)\n",
1097 | "\n",
1098 | "unk_percent = round(unk_count/word_count,4)*100\n",
1099 | "\n",
1100 | "print(\"Total number of words in headlines:\", word_count)\n",
1101 | "print(\"Total number of UNKs in headlines:\", unk_count)\n",
1102 | "print(\"Percent of words that are UNK: {}%\".format(unk_percent))"
1103 | ]
1104 | },
1105 | {
1106 | "cell_type": "markdown",
1107 | "metadata": {},
1108 | "source": [
1109 | "### Take a look at what the sequence looks like\n",
1110 | "Each number here represents a word"
1111 | ]
1112 | },
1113 | {
1114 | "cell_type": "code",
1115 | "execution_count": 28,
1116 | "metadata": {},
1117 | "outputs": [
1118 | {
1119 | "data": {
1120 | "text/plain": [
1121 | "[[32681, 40810, 26787, 54872],\n",
1122 | " [2229, 54986, 47923],\n",
1123 | " [23867, 38191, 14436, 39262]]"
1124 | ]
1125 | },
1126 | "execution_count": 28,
1127 | "metadata": {},
1128 | "output_type": "execute_result"
1129 | }
1130 | ],
1131 | "source": [
1132 | "int_summaries[:3]"
1133 | ]
1134 | },
1135 | {
1136 | "cell_type": "markdown",
1137 | "metadata": {},
1138 | "source": [
1139 | "### Function to get the length of each sequence"
1140 | ]
1141 | },
1142 | {
1143 | "cell_type": "code",
1144 | "execution_count": 29,
1145 | "metadata": {
1146 | "collapsed": true
1147 | },
1148 | "outputs": [],
1149 | "source": [
1150 | "def create_lengths(text):\n",
1151 | " '''Create a data frame of the sentence lengths from a text'''\n",
1152 | " lengths = []\n",
1153 | " for sentence in text:\n",
1154 | " lengths.append(len(sentence))\n",
1155 | " return pd.DataFrame(lengths, columns=['counts'])"
1156 | ]
1157 | },
1158 | {
1159 | "cell_type": "code",
1160 | "execution_count": 30,
1161 | "metadata": {},
1162 | "outputs": [
1163 | {
1164 | "data": {
1165 | "text/html": [
1166 | "\n",
1167 | "
\n",
1168 | " \n",
1169 | " \n",
1170 | " | \n",
1171 | " counts | \n",
1172 | "
\n",
1173 | " \n",
1174 | " \n",
1175 | " \n",
1176 | " | 0 | \n",
1177 | " 4 | \n",
1178 | "
\n",
1179 | " \n",
1180 | " | 1 | \n",
1181 | " 3 | \n",
1182 | "
\n",
1183 | " \n",
1184 | " | 2 | \n",
1185 | " 4 | \n",
1186 | "
\n",
1187 | " \n",
1188 | "
\n",
1189 | "
"
1190 | ],
1191 | "text/plain": [
1192 | " counts\n",
1193 | "0 4\n",
1194 | "1 3\n",
1195 | "2 4"
1196 | ]
1197 | },
1198 | "execution_count": 30,
1199 | "metadata": {},
1200 | "output_type": "execute_result"
1201 | }
1202 | ],
1203 | "source": [
1204 | "create_lengths(int_summaries[:3])"
1205 | ]
1206 | },
1207 | {
1208 | "cell_type": "markdown",
1209 | "metadata": {},
1210 | "source": [
1211 | "Get statistic summary of the length of summaries and texts"
1212 | ]
1213 | },
1214 | {
1215 | "cell_type": "code",
1216 | "execution_count": 31,
1217 | "metadata": {},
1218 | "outputs": [
1219 | {
1220 | "name": "stdout",
1221 | "output_type": "stream",
1222 | "text": [
1223 | "Summaries:\n",
1224 | " counts\n",
1225 | "count 568412.000000\n",
1226 | "mean 4.181208\n",
1227 | "std 2.657212\n",
1228 | "min 0.000000\n",
1229 | "25% 2.000000\n",
1230 | "50% 4.000000\n",
1231 | "75% 5.000000\n",
1232 | "max 48.000000\n",
1233 | "\n",
1234 | "Texts:\n",
1235 | " counts\n",
1236 | "count 568412.000000\n",
1237 | "mean 42.969429\n",
1238 | "std 44.166421\n",
1239 | "min 2.000000\n",
1240 | "25% 18.000000\n",
1241 | "50% 30.000000\n",
1242 | "75% 51.000000\n",
1243 | "max 2063.000000\n"
1244 | ]
1245 | }
1246 | ],
1247 | "source": [
1248 | "lengths_summaries = create_lengths(int_summaries)\n",
1249 | "lengths_texts = create_lengths(int_texts)\n",
1250 | "\n",
1251 | "print(\"Summaries:\")\n",
1252 | "print(lengths_summaries.describe())\n",
1253 | "print()\n",
1254 | "print(\"Texts:\")\n",
1255 | "print(lengths_texts.describe())"
1256 | ]
1257 | },
1258 | {
1259 | "cell_type": "markdown",
1260 | "metadata": {},
1261 | "source": [
1262 | "### See what's the max squence length we can cover by percentile"
1263 | ]
1264 | },
1265 | {
1266 | "cell_type": "code",
1267 | "execution_count": 32,
1268 | "metadata": {},
1269 | "outputs": [
1270 | {
1271 | "name": "stdout",
1272 | "output_type": "stream",
1273 | "text": [
1274 | "84.0\n",
1275 | "118.0\n",
1276 | "216.0\n"
1277 | ]
1278 | }
1279 | ],
1280 | "source": [
1281 | "# Inspect the length of texts\n",
1282 | "print(np.percentile(lengths_texts.counts, 89.5))\n",
1283 | "print(np.percentile(lengths_texts.counts, 95))\n",
1284 | "print(np.percentile(lengths_texts.counts, 99))"
1285 | ]
1286 | },
1287 | {
1288 | "cell_type": "code",
1289 | "execution_count": 33,
1290 | "metadata": {},
1291 | "outputs": [
1292 | {
1293 | "name": "stdout",
1294 | "output_type": "stream",
1295 | "text": [
1296 | "8.0\n",
1297 | "9.0\n",
1298 | "13.0\n"
1299 | ]
1300 | }
1301 | ],
1302 | "source": [
1303 | "# Inspect the length of summaries\n",
1304 | "print(np.percentile(lengths_summaries.counts, 90))\n",
1305 | "print(np.percentile(lengths_summaries.counts, 95))\n",
1306 | "print(np.percentile(lengths_summaries.counts, 99))"
1307 | ]
1308 | },
1309 | {
1310 | "cell_type": "markdown",
1311 | "metadata": {},
1312 | "source": [
1313 | "## Function to counts the number of time `` appears in a sentence"
1314 | ]
1315 | },
1316 | {
1317 | "cell_type": "code",
1318 | "execution_count": 34,
1319 | "metadata": {
1320 | "collapsed": true
1321 | },
1322 | "outputs": [],
1323 | "source": [
1324 | "def unk_counter(sentence):\n",
1325 | " '''Counts the number of time UNK appears in a sentence.'''\n",
1326 | " unk_count = 0\n",
1327 | " for word in sentence:\n",
1328 | " if word == vocab_to_int[\"\"]:\n",
1329 | " unk_count += 1\n",
1330 | " return unk_count"
1331 | ]
1332 | },
1333 | {
1334 | "cell_type": "markdown",
1335 | "metadata": {},
1336 | "source": [
1337 | "**Filter** for length limit and number of ``s\n",
1338 | "\n",
1339 | "**Sort** the summaries and texts by the length of the element in **texts** from shortest to longest\n"
1340 | ]
1341 | },
1342 | {
1343 | "cell_type": "code",
1344 | "execution_count": 39,
1345 | "metadata": {},
1346 | "outputs": [
1347 | {
1348 | "name": "stdout",
1349 | "output_type": "stream",
1350 | "text": [
1351 | "428278\n",
1352 | "428278\n"
1353 | ]
1354 | }
1355 | ],
1356 | "source": [
1357 | "max_text_length = 83 # This will cover up to 89.5% lengthes\n",
1358 | "max_summary_length = 13 # This will cover up to 99% lengthes\n",
1359 | "min_length = 2\n",
1360 | "unk_text_limit = 1 # text can contain up to 1 UNK word\n",
1361 | "unk_summary_limit = 0 # Summary should not contain any UNK word\n",
1362 | "\n",
1363 | "def filter_condition(item):\n",
1364 | " int_summary = item[0]\n",
1365 | " int_text = item[1]\n",
1366 | " if(len(int_summary) >= min_length and \n",
1367 | " len(int_summary) <= max_summary_length and \n",
1368 | " len(int_text) >= min_length and \n",
1369 | " len(int_text) <= max_text_length and \n",
1370 | " unk_counter(int_summary) <= unk_summary_limit and \n",
1371 | " unk_counter(int_text) <= unk_text_limit):\n",
1372 | " return True\n",
1373 | " else:\n",
1374 | " return False\n",
1375 | "\n",
1376 | "int_text_summaries = list(zip(int_summaries , int_texts))\n",
1377 | "int_text_summaries_filtered = list(filter(filter_condition, int_text_summaries))\n",
1378 | "sorted_int_text_summaries = sorted(int_text_summaries_filtered, key=lambda item: len(item[1]))\n",
1379 | "sorted_int_text_summaries = list(zip(*sorted_int_text_summaries))\n",
1380 | "sorted_summaries = list(sorted_int_text_summaries[0])\n",
1381 | "sorted_texts = list(sorted_int_text_summaries[1])\n",
1382 | "# Delete those temporary varaibles\n",
1383 | "del int_text_summaries, sorted_int_text_summaries, int_text_summaries_filtered\n",
1384 | "# Compare lengths to ensure they match\n",
1385 | "print(len(sorted_summaries))\n",
1386 | "print(len(sorted_texts))"
1387 | ]
1388 | },
1389 | {
1390 | "cell_type": "markdown",
1391 | "metadata": {},
1392 | "source": [
1393 | "### Inspect the length of text in sorted_texts"
1394 | ]
1395 | },
1396 | {
1397 | "cell_type": "code",
1398 | "execution_count": 40,
1399 | "metadata": {},
1400 | "outputs": [
1401 | {
1402 | "data": {
1403 | "text/plain": [
1404 | "[2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]"
1405 | ]
1406 | },
1407 | "execution_count": 40,
1408 | "metadata": {},
1409 | "output_type": "execute_result"
1410 | }
1411 | ],
1412 | "source": [
1413 | "lengths_texts = [len(text) for text in sorted_texts]\n",
1414 | "lengths_texts[:20]"
1415 | ]
1416 | },
1417 | {
1418 | "cell_type": "markdown",
1419 | "metadata": {},
1420 | "source": [
1421 | "## Save data for later"
1422 | ]
1423 | },
1424 | {
1425 | "cell_type": "code",
1426 | "execution_count": 41,
1427 | "metadata": {
1428 | "collapsed": true
1429 | },
1430 | "outputs": [],
1431 | "source": [
1432 | "__pickleStuff(\"./data/clean_summaries.p\",clean_summaries)\n",
1433 | "__pickleStuff(\"./data/clean_texts.p\",clean_texts)\n",
1434 | "\n",
1435 | "__pickleStuff(\"./data/sorted_summaries.p\",sorted_summaries)\n",
1436 | "__pickleStuff(\"./data/sorted_texts.p\",sorted_texts)\n",
1437 | "__pickleStuff(\"./data/word_embedding_matrix.p\",word_embedding_matrix)\n",
1438 | "\n",
1439 | "__pickleStuff(\"./data/vocab_to_int.p\",vocab_to_int)\n",
1440 | "__pickleStuff(\"./data/int_to_vocab.p\",int_to_vocab)"
1441 | ]
1442 | },
1443 | {
1444 | "cell_type": "markdown",
1445 | "metadata": {},
1446 | "source": [
1447 | "## 3. Building the Model"
1448 | ]
1449 | },
1450 | {
1451 | "cell_type": "markdown",
1452 | "metadata": {},
1453 | "source": [
1454 | "Create palceholders for inputs to the model\n",
1455 | "\n",
1456 | "**summary_length** and **text_length** are the sentence lengths in a batch, and **max_summary_length** is the maximum length of a summary in a batch."
1457 | ]
1458 | },
1459 | {
1460 | "cell_type": "code",
1461 | "execution_count": 4,
1462 | "metadata": {
1463 | "collapsed": true
1464 | },
1465 | "outputs": [],
1466 | "source": [
1467 | "def model_inputs():\n",
1468 | " input_data = tf.placeholder(tf.int32, [None, None], name='input')\n",
1469 | " targets = tf.placeholder(tf.int32, [None, None], name='targets')\n",
1470 | " lr = tf.placeholder(tf.float32, name='learning_rate')\n",
1471 | " keep_prob = tf.placeholder(tf.float32, name='keep_prob')\n",
1472 | " summary_length = tf.placeholder(tf.int32, (None,), name='summary_length')\n",
1473 | " max_summary_length = tf.reduce_max(summary_length, name='max_dec_len')\n",
1474 | " text_length = tf.placeholder(tf.int32, (None,), name='text_length')\n",
1475 | "\n",
1476 | " return input_data, targets, lr, keep_prob, summary_length, max_summary_length, text_length"
1477 | ]
1478 | },
1479 | {
1480 | "cell_type": "markdown",
1481 | "metadata": {},
1482 | "source": [
1483 | "Remove the last word id from each batch and concatenate the id of `` to the begining of each batch"
1484 | ]
1485 | },
1486 | {
1487 | "cell_type": "code",
1488 | "execution_count": 5,
1489 | "metadata": {
1490 | "collapsed": true
1491 | },
1492 | "outputs": [],
1493 | "source": [
1494 | "def process_encoding_input(target_data, vocab_to_int, batch_size): \n",
1495 | " ending = tf.strided_slice(target_data, [0, 0], [batch_size, -1], [1, 1]) # slice it to target_data[0:batch_size, 0: -1]\n",
1496 | " dec_input = tf.concat([tf.fill([batch_size, 1], vocab_to_int['']), ending], 1)\n",
1497 | "\n",
1498 | " return dec_input"
1499 | ]
1500 | },
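{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal NumPy sketch of what `process_encoding_input` does to a toy target batch (the `<GO>` id used below is hypothetical, for illustration only):\n",
"```python\n",
"import numpy as np\n",
"target_batch = np.array([[11, 12, 13],\n",
"                         [21, 22, 23]])\n",
"go_id = 59071  # hypothetical <GO> index\n",
"ending = target_batch[:, :-1]  # drop the last word id of each row\n",
"dec_input = np.concatenate([np.full((2, 1), go_id), ending], axis=1)\n",
"# dec_input -> [[59071, 11, 12],\n",
"#               [59071, 21, 22]]\n",
"```"
]
},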
1501 | {
1502 | "cell_type": "markdown",
1503 | "metadata": {},
1504 | "source": [
1505 | "### Create the encoding layers\n",
1506 | "\n",
1507 | "bidirectional_dynamic_rnn\n",
1508 | "use **tf.variable_scope** so that variables are reused with each layer\n",
1509 | "\n",
1510 | "parameters\n",
1511 | "- **rnn_size**: The number of units in the LSTM cell\n",
1512 | "- **sequence_length**: size [batch_size], containing the actual lengths for each of the sequences in the batch\n",
1513 | "- **num_layers**: number of bidirectional RNN layer\n",
1514 | "- **rnn_inputs**: number of bidirectional RNN layer\n",
1515 | "- **keep_prob**: RNN dropout input keep probability"
1516 | ]
1517 | },
1518 | {
1519 | "cell_type": "code",
1520 | "execution_count": 6,
1521 | "metadata": {
1522 | "collapsed": true
1523 | },
1524 | "outputs": [],
1525 | "source": [
1526 | "def encoding_layer(rnn_size, sequence_length, num_layers, rnn_inputs, keep_prob):\n",
1527 | " for layer in range(num_layers):\n",
1528 | " with tf.variable_scope('encoder_{}'.format(layer)):\n",
1529 | " cell_fw = tf.contrib.rnn.LSTMCell(rnn_size,\n",
1530 | " initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=2))\n",
1531 | " cell_fw = tf.contrib.rnn.DropoutWrapper(cell_fw, \n",
1532 | " input_keep_prob = keep_prob)\n",
1533 | "\n",
1534 | " cell_bw = tf.contrib.rnn.LSTMCell(rnn_size,\n",
1535 | " initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=2))\n",
1536 | " cell_bw = tf.contrib.rnn.DropoutWrapper(cell_bw, \n",
1537 | " input_keep_prob = keep_prob)\n",
1538 | "\n",
1539 | " enc_output, enc_state = tf.nn.bidirectional_dynamic_rnn(cell_fw, \n",
1540 | " cell_bw, \n",
1541 | " rnn_inputs,\n",
1542 | " sequence_length,\n",
1543 | " dtype=tf.float32)\n",
1544 | " enc_output = tf.concat(enc_output,2)\n",
1545 | " # original code is missing this line below, that is how we connect layers \n",
1546 | " # by feeding the current layer's output to next layer's input\n",
1547 | " rnn_inputs = enc_output\n",
1548 | " return enc_output, enc_state"
1549 | ]
1550 | },
1551 | {
1552 | "cell_type": "markdown",
1553 | "metadata": {},
1554 | "source": [
1555 | "### Create the training decoding layer\n",
1556 | "parameters\n",
1557 | "- **dec_embed_input**: output of embedding_lookup for a batch of inputs\n",
1558 | "- **summary_length**: length of each padded summary sequences in batch, since padded, all lengths should be same number \n",
1559 | "- **dec_cell**: the decoder RNN cells' output with attention wapper\n",
1560 | "- **output_layer**: fully connected layer to apply to the RNN output\n",
1561 | "- **vocab_size**: vocabulary size i.e. len(vocab_to_int)+1\n",
1562 | "- **max_summary_length**: the maximum length of a summary in a batch\n",
1563 | "- **batch_size**: number of input sequences in a batch\n",
1564 | "\n",
1565 | "Three components\n",
1566 | "\n",
1567 | "- **TraingHelper** reads a sequence of integers from the encoding layer.\n",
1568 | "- **BasicDecoder** processes the sequence with the decoding cell, and an output layer, which is a fully connected layer. **initial_state** set to zero state.\n",
1569 | "- **dynamic_decode** creates our outputs that will be used for training."
1570 | ]
1571 | },
1572 | {
1573 | "cell_type": "code",
1574 | "execution_count": 7,
1575 | "metadata": {
1576 | "collapsed": true
1577 | },
1578 | "outputs": [],
1579 | "source": [
1580 | "def training_decoding_layer(dec_embed_input, summary_length, dec_cell, output_layer,\n",
1581 | " vocab_size, max_summary_length,batch_size):\n",
1582 | " training_helper = tf.contrib.seq2seq.TrainingHelper(inputs=dec_embed_input,\n",
1583 | " sequence_length=summary_length,\n",
1584 | " time_major=False)\n",
1585 | "\n",
1586 | " training_decoder = tf.contrib.seq2seq.BasicDecoder(cell=dec_cell,\n",
1587 | " helper=training_helper,\n",
1588 | " initial_state=dec_cell.zero_state(dtype=tf.float32, batch_size=batch_size),\n",
1589 | " output_layer = output_layer)\n",
1590 | "\n",
1591 | " training_logits = tf.contrib.seq2seq.dynamic_decode(training_decoder,\n",
1592 | " output_time_major=False,\n",
1593 | " impute_finished=True,\n",
1594 | " maximum_iterations=max_summary_length)\n",
1595 | " return training_logits"
1596 | ]
1597 | },
1598 | {
1599 | "cell_type": "markdown",
1600 | "metadata": {},
1601 | "source": [
1602 | "### Create infer decoding layer\n",
1603 | "\n",
1604 | "parameters\n",
1605 | "- **embeddings**: the CN's word_embedding_matrix\n",
1606 | "- **start_token**: the id of ``\n",
1607 | "- **end_token**: the id of ``\n",
1608 | "- **dec_cell**: the decoder RNN cells' output with attention wapper\n",
1609 | "- **output_layer**: fully connected layer to apply to the RNN output\n",
1610 | "- **max_summary_length**: the maximum length of a summary in a batch\n",
1611 | "- **batch_size**: number of input sequences in a batch\n",
1612 | "\n",
1613 | "**GreedyEmbeddingHelper** argument **start_tokens**: int32 vector shaped [batch_size], the start tokens."
1614 | ]
1615 | },
1616 | {
1617 | "cell_type": "code",
1618 | "execution_count": 8,
1619 | "metadata": {
1620 | "collapsed": true
1621 | },
1622 | "outputs": [],
1623 | "source": [
1624 | "def inference_decoding_layer(embeddings, start_token, end_token, dec_cell, output_layer,\n",
1625 | " max_summary_length, batch_size):\n",
1626 | " '''Create the inference logits'''\n",
1627 | " \n",
1628 | " start_tokens = tf.tile(tf.constant([start_token], dtype=tf.int32), [batch_size], name='start_tokens')\n",
1629 | " \n",
1630 | " inference_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(embeddings,\n",
1631 | " start_tokens,\n",
1632 | " end_token)\n",
1633 | " \n",
1634 | " inference_decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell,\n",
1635 | " inference_helper,\n",
1636 | " dec_cell.zero_state(dtype=tf.float32, batch_size=batch_size),\n",
1637 | " output_layer)\n",
1638 | " \n",
1639 | " inference_logits = tf.contrib.seq2seq.dynamic_decode(inference_decoder,\n",
1640 | " output_time_major=False,\n",
1641 | " impute_finished=True,\n",
1642 | " maximum_iterations=max_summary_length)\n",
1643 | " \n",
1644 | " return inference_logits"
1645 | ]
1646 | },
1647 | {
1648 | "cell_type": "markdown",
1649 | "metadata": {},
1650 | "source": [
1651 | "### Create Decoding layer\n",
1652 | "3 parts: decoding cell, attention, and getting our logits.\n",
1653 | "#### Decoding Cell: \n",
1654 | "Just a two layer LSTM with dropout.\n",
1655 | "#### Attention: \n",
1656 | "Using Bhadanau, since trains faster than Luong. \n",
1657 | "\n",
1658 | "**AttentionWrapper** applies the attention mechanism to our decoding cell.\n",
1659 | "\n",
1660 | "parameters\n",
1661 | "- **dec_embed_input**: output of embedding_lookup for a batch of inputs\n",
1662 | "- **embeddings**: the CN's word_embedding_matrix\n",
1663 | "- **enc_output**: encoder layer output, containing the forward and the backward rnn output\n",
1664 | "- **enc_state**: encoder layer state, a tuple containing the forward and the backward final states of bidirectional rnn.\n",
1665 | "- **vocab_size**: vocabulary size i.e. len(vocab_to_int)+1\n",
1666 | "- **text_length**: the actual lengths for each of the input text sequences in the batch\n",
1667 | "- **summary_length**: the actual lengths for each of the input summary sequences in the batch\n",
1668 | "- **max_summary_length**: the maximum length of a summary in a batch\n",
1669 | "- **rnn_size**: The number of units in the LSTM cell\n",
1670 | "- **vocab_to_int**: vocab_to_int the dictionary\n",
1671 | "- **keep_prob**: RNN dropout input keep probability\n",
1672 | "- **batch_size**: number of input sequences in a batch\n",
1673 | "- **num_layers**: number of decoder RNN layer"
1674 | ]
1675 | },
1676 | {
1677 | "cell_type": "code",
1678 | "execution_count": 9,
1679 | "metadata": {
1680 | "collapsed": true
1681 | },
1682 | "outputs": [],
1683 | "source": [
1684 | "def lstm_cell(lstm_size, keep_prob):\n",
1685 | " cell = tf.contrib.rnn.BasicLSTMCell(lstm_size)\n",
1686 | " return tf.contrib.rnn.DropoutWrapper(cell, input_keep_prob = keep_prob)\n",
1687 | "\n",
1688 | "def decoding_layer(dec_embed_input, embeddings, enc_output, enc_state, vocab_size, text_length, summary_length,\n",
1689 | " max_summary_length, rnn_size, vocab_to_int, keep_prob, batch_size, num_layers):\n",
1690 | " '''Create the decoding cell and attention for the training and inference decoding layers'''\n",
1691 | " dec_cell = tf.contrib.rnn.MultiRNNCell([lstm_cell(rnn_size, keep_prob) for _ in range(num_layers)])\n",
1692 | " output_layer = Dense(vocab_size,kernel_initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.1))\n",
1693 | " attn_mech = tf.contrib.seq2seq.BahdanauAttention(rnn_size,\n",
1694 | " enc_output,\n",
1695 | " text_length,\n",
1696 | " normalize=False,\n",
1697 | " name='BahdanauAttention')\n",
1698 | " dec_cell = tf.contrib.seq2seq.AttentionWrapper(dec_cell,attn_mech,rnn_size)\n",
1699 | " with tf.variable_scope(\"decode\"):\n",
1700 | " training_logits = training_decoding_layer(dec_embed_input,summary_length,dec_cell,\n",
1701 | " output_layer,\n",
1702 | " vocab_size,\n",
1703 | " max_summary_length,\n",
1704 | " batch_size)\n",
1705 | " with tf.variable_scope(\"decode\", reuse=True):\n",
1706 | " inference_logits = inference_decoding_layer(embeddings,\n",
1707 | " vocab_to_int[''],\n",
1708 | " vocab_to_int[''],\n",
1709 | " dec_cell,\n",
1710 | " output_layer,\n",
1711 | " max_summary_length,\n",
1712 | " batch_size)\n",
1713 | " return training_logits, inference_logits"
1714 | ]
1715 | },
1716 | {
1717 | "cell_type": "code",
1718 | "execution_count": 10,
1719 | "metadata": {
1720 | "collapsed": true
1721 | },
1722 | "outputs": [],
1723 | "source": [
1724 | "def seq2seq_model(input_data, target_data, keep_prob, text_length, summary_length, max_summary_length, \n",
1725 | " vocab_size, rnn_size, num_layers, vocab_to_int, batch_size):\n",
1726 | " '''Use the previous functions to create the training and inference logits'''\n",
1727 | " \n",
1728 | " # Use Numberbatch's embeddings and the newly created ones as our embeddings\n",
1729 | " embeddings = word_embedding_matrix\n",
1730 | " enc_embed_input = tf.nn.embedding_lookup(embeddings, input_data)\n",
1731 | " enc_output, enc_state = encoding_layer(rnn_size, text_length, num_layers, enc_embed_input, keep_prob)\n",
1732 | " dec_input = process_encoding_input(target_data, vocab_to_int, batch_size) #shape=(batch_size, senquence length) each seq start with index of\n",
1733 | " dec_embed_input = tf.nn.embedding_lookup(embeddings, dec_input)\n",
1734 | " training_logits, inference_logits = decoding_layer(dec_embed_input, \n",
1735 | " embeddings,\n",
1736 | " enc_output,\n",
1737 | " enc_state, \n",
1738 | " vocab_size, \n",
1739 | " text_length, \n",
1740 | " summary_length, \n",
1741 | " max_summary_length,\n",
1742 | " rnn_size, \n",
1743 | " vocab_to_int, \n",
1744 | " keep_prob, \n",
1745 | " batch_size,\n",
1746 | " num_layers)\n",
1747 | " return training_logits, inference_logits"
1748 | ]
1749 | },
1750 | {
1751 | "cell_type": "markdown",
1752 | "metadata": {},
1753 | "source": [
1754 | "### Pad sentences for batch\n",
1755 | "Pad so the actual lengths for each of the sequences in the batch have the same length."
1756 | ]
1757 | },
1758 | {
1759 | "cell_type": "code",
1760 | "execution_count": 11,
1761 | "metadata": {
1762 | "collapsed": true
1763 | },
1764 | "outputs": [],
1765 | "source": [
1766 | "def pad_sentence_batch(sentence_batch):\n",
1767 | " \"\"\"Pad sentences with so that each sentence of a batch has the same length\"\"\"\n",
1768 | " max_sentence = max([len(sentence) for sentence in sentence_batch])\n",
1769 | " return [sentence + [vocab_to_int['']] * (max_sentence - len(sentence)) for sentence in sentence_batch]"
1770 | ]
1771 | },
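{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick illustration (a hypothetical toy batch, not part of the original run): the shorter sentence below is filled with `vocab_to_int['<PAD>']` until it matches the longest sentence in the batch."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Hypothetical toy batch, just to illustrate pad_sentence_batch (not executed in the original run)\n",
"toy_batch = [[5, 6], [7, 8, 9]]\n",
"print(pad_sentence_batch(toy_batch))  # expected: [[5, 6, vocab_to_int['<PAD>']], [7, 8, 9]]"
]
},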
1772 | {
1773 | "cell_type": "markdown",
1774 | "metadata": {},
1775 | "source": [
1776 | "### Function to generate batch data for training"
1777 | ]
1778 | },
1779 | {
1780 | "cell_type": "code",
1781 | "execution_count": 12,
1782 | "metadata": {
1783 | "collapsed": true
1784 | },
1785 | "outputs": [],
1786 | "source": [
1787 | "def get_batches(summaries, texts, batch_size):\n",
1788 | " \"\"\"Batch summaries, texts, and the lengths of their sentences together\"\"\"\n",
1789 | " for batch_i in range(0, len(texts)//batch_size):\n",
1790 | " start_i = batch_i * batch_size\n",
1791 | " summaries_batch = summaries[start_i:start_i + batch_size]\n",
1792 | " texts_batch = texts[start_i:start_i + batch_size]\n",
1793 | " pad_summaries_batch = np.array(pad_sentence_batch(summaries_batch))\n",
1794 | " pad_texts_batch = np.array(pad_sentence_batch(texts_batch))\n",
1795 | " \n",
1796 | " # Need the lengths for the _lengths parameters\n",
1797 | " pad_summaries_lengths = []\n",
1798 | " for summary in pad_summaries_batch:\n",
1799 | " pad_summaries_lengths.append(len(summary))\n",
1800 | " \n",
1801 | " pad_texts_lengths = []\n",
1802 | " for text in pad_texts_batch:\n",
1803 | " pad_texts_lengths.append(len(text))\n",
1804 | " \n",
1805 | " yield pad_summaries_batch, pad_texts_batch, pad_summaries_lengths, pad_texts_lengths"
1806 | ]
1807 | },
1808 | {
1809 | "cell_type": "markdown",
1810 | "metadata": {},
1811 | "source": [
1812 | "#### Just to test \"get_batches\" function\n",
1813 | "Here we generate a batch with size of 5\n",
1814 | "\n",
1815 | "Checkout those \"59069\" they are ``s, also all sequences' lengths are the same."
1816 | ]
1817 | },
1818 | {
1819 | "cell_type": "code",
1820 | "execution_count": 13,
1821 | "metadata": {},
1822 | "outputs": [
1823 | {
1824 | "name": "stdout",
1825 | "output_type": "stream",
1826 | "text": [
1827 | "'' has id: 59069\n",
1828 | "pad summaries batch samples:\n",
1829 | "\r",
1830 | " [[ 9218 18733 13131 39434 39434 4082 2454 29838 26219 33088 26752 4]\n",
1831 | " [ 1417 42487 4397 22892 20719 59069 59069 59069 59069 59069 59069 59069]\n",
1832 | " [ 2229 54986 19050 54986 44366 56008 8293 46449 6045 20974 4269 41958]\n",
1833 | " [39205 16127 2875 26752 33799 58931 58335 5156 12490 59069 59069 59069]\n",
1834 | " [54984 47044 12490 43359 46111 59069 59069 59069 59069 59069 59069 59069]]\n"
1835 | ]
1836 | }
1837 | ],
1838 | "source": [
1839 | "print(\"'' has id: {}\".format(vocab_to_int['']))\n",
1840 | "sorted_summaries_samples = sorted_summaries[7:50]\n",
1841 | "sorted_texts_samples = sorted_texts[7:50]\n",
1842 | "pad_summaries_batch_samples, pad_texts_batch_samples, pad_summaries_lengths_samples, pad_texts_lengths_samples = next(get_batches(\n",
1843 | " sorted_summaries_samples, sorted_texts_samples, 5))\n",
1844 | "print(\"pad summaries batch samples:\\n\\r {}\".format(pad_summaries_batch_samples))"
1845 | ]
1846 | },
1847 | {
1848 | "cell_type": "code",
1849 | "execution_count": 18,
1850 | "metadata": {
1851 | "collapsed": true
1852 | },
1853 | "outputs": [],
1854 | "source": [
1855 | "# Set the Hyperparameters\n",
1856 | "epochs = 100\n",
1857 | "batch_size = 64\n",
1858 | "rnn_size = 256\n",
1859 | "num_layers = 2\n",
1860 | "learning_rate = 0.005\n",
1861 | "keep_probability = 0.95"
1862 | ]
1863 | },
1864 | {
1865 | "cell_type": "markdown",
1866 | "metadata": {},
1867 | "source": [
1868 | "## Build graph"
1869 | ]
1870 | },
1871 | {
1872 | "cell_type": "code",
1873 | "execution_count": 15,
1874 | "metadata": {},
1875 | "outputs": [
1876 | {
1877 | "name": "stdout",
1878 | "output_type": "stream",
1879 | "text": [
1880 | "Graph is built.\n",
1881 | "./graph\n"
1882 | ]
1883 | }
1884 | ],
1885 | "source": [
1886 | "# Build the graph\n",
1887 | "train_graph = tf.Graph()\n",
1888 | "# Set the graph to default to ensure that it is ready for training\n",
1889 | "with train_graph.as_default():\n",
1890 | " \n",
1891 | " # Load the model inputs \n",
1892 | " input_data, targets, lr, keep_prob, summary_length, max_summary_length, text_length = model_inputs()\n",
1893 | "\n",
1894 | " # Create the training and inference logits\n",
1895 | " training_logits, inference_logits = seq2seq_model(tf.reverse(input_data, [-1]),\n",
1896 | " targets, \n",
1897 | " keep_prob, \n",
1898 | " text_length,\n",
1899 | " summary_length,\n",
1900 | " max_summary_length,\n",
1901 | " len(vocab_to_int)+1,\n",
1902 | " rnn_size, \n",
1903 | " num_layers, \n",
1904 | " vocab_to_int,\n",
1905 | " batch_size)\n",
1906 | " \n",
1907 | " # Create tensors for the training logits and inference logits\n",
1908 | " training_logits = tf.identity(training_logits[0].rnn_output, 'logits')\n",
1909 | " inference_logits = tf.identity(inference_logits[0].sample_id, name='predictions')\n",
1910 | " \n",
1911 | " # Create the weights for sequence_loss, the sould be all True across since each batch is padded\n",
1912 | " masks = tf.sequence_mask(summary_length, max_summary_length, dtype=tf.float32, name='masks')\n",
1913 | "\n",
1914 | " with tf.name_scope(\"optimization\"):\n",
1915 | " # Loss function\n",
1916 | " cost = tf.contrib.seq2seq.sequence_loss(\n",
1917 | " training_logits,\n",
1918 | " targets,\n",
1919 | " masks)\n",
1920 | "\n",
1921 | " # Optimizer\n",
1922 | " optimizer = tf.train.AdamOptimizer(learning_rate)\n",
1923 | "\n",
1924 | " # Gradient Clipping\n",
1925 | " gradients = optimizer.compute_gradients(cost)\n",
1926 | " capped_gradients = [(tf.clip_by_value(grad, -5., 5.), var) for grad, var in gradients if grad is not None]\n",
1927 | " train_op = optimizer.apply_gradients(capped_gradients)\n",
1928 | "print(\"Graph is built.\")\n",
1929 | "graph_location = \"./graph\"\n",
1930 | "print(graph_location)\n",
1931 | "train_writer = tf.summary.FileWriter(graph_location)\n",
1932 | "train_writer.add_graph(train_graph)"
1933 | ]
1934 | },
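{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `FileWriter` above saves the graph definition to `./graph`, so the computation graph can optionally be inspected with TensorBoard, e.g. by running `tensorboard --logdir ./graph` from the project directory."
]
},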
1935 | {
1936 | "cell_type": "markdown",
1937 | "metadata": {},
1938 | "source": [
1939 | "## 4. Training the Model\n",
1940 | "\n",
1941 | "Only going to use a subset of the data to reduce the traing time for this demo.\n",
1942 | "\n",
1943 | "We chose not use use the start of the subset because because those are shorter sequences and we don't want to make it too easy for the model."
1944 | ]
1945 | },
1946 | {
1947 | "cell_type": "code",
1948 | "execution_count": 16,
1949 | "metadata": {},
1950 | "outputs": [
1951 | {
1952 | "name": "stdout",
1953 | "output_type": "stream",
1954 | "text": [
1955 | "The shortest text length: 25\n",
1956 | "The longest text length: 31\n"
1957 | ]
1958 | }
1959 | ],
1960 | "source": [
1961 | "# Subset the data for training\n",
1962 | "start = 200000\n",
1963 | "end = start + 50000\n",
1964 | "sorted_summaries_short = sorted_summaries[start:end]\n",
1965 | "sorted_texts_short = sorted_texts[start:end]\n",
1966 | "print(\"The shortest text length:\", len(sorted_texts_short[0]))\n",
1967 | "print(\"The longest text length:\",len(sorted_texts_short[-1]))"
1968 | ]
1969 | },
1970 | {
1971 | "cell_type": "code",
1972 | "execution_count": 17,
1973 | "metadata": {
1974 | "scrolled": true
1975 | },
1976 | "outputs": [
1977 | {
1978 | "name": "stdout",
1979 | "output_type": "stream",
1980 | "text": [
1981 | "Epoch 1/100 Batch 20/781 - Loss: 5.205, Seconds: 4.33\n",
1982 | "Epoch 1/100 Batch 40/781 - Loss: 2.856, Seconds: 3.35\n",
1983 | "Epoch 1/100 Batch 60/781 - Loss: 2.914, Seconds: 4.61\n",
1984 | "Epoch 1/100 Batch 80/781 - Loss: 2.825, Seconds: 4.13\n",
1985 | "Epoch 1/100 Batch 100/781 - Loss: 2.698, Seconds: 4.21\n",
1986 | "Epoch 1/100 Batch 120/781 - Loss: 2.711, Seconds: 3.79\n",
1987 | "Epoch 1/100 Batch 140/781 - Loss: 2.587, Seconds: 3.97\n",
1988 | "Epoch 1/100 Batch 160/781 - Loss: 2.844, Seconds: 3.27\n",
1989 | "Epoch 1/100 Batch 180/781 - Loss: 2.685, Seconds: 3.65\n",
1990 | "Epoch 1/100 Batch 200/781 - Loss: 2.676, Seconds: 4.35\n",
1991 | "Epoch 1/100 Batch 220/781 - Loss: 2.608, Seconds: 4.05\n",
1992 | "Epoch 1/100 Batch 240/781 - Loss: 2.471, Seconds: 4.09\n",
1993 | "Average loss for this update: 2.896\n",
1994 | "New Record!\n",
1995 | "Epoch 1/100 Batch 260/781 - Loss: 2.540, Seconds: 4.59\n",
1996 | "Epoch 1/100 Batch 280/781 - Loss: 2.611, Seconds: 4.21\n",
1997 | "Epoch 1/100 Batch 300/781 - Loss: 2.674, Seconds: 4.13\n",
1998 | "Epoch 1/100 Batch 320/781 - Loss: 2.685, Seconds: 4.13\n",
1999 | "Epoch 1/100 Batch 340/781 - Loss: 2.483, Seconds: 4.67\n",
2000 | "Epoch 1/100 Batch 360/781 - Loss: 2.591, Seconds: 4.37\n",
2001 | "Epoch 1/100 Batch 380/781 - Loss: 2.430, Seconds: 4.67\n",
2002 | "Epoch 1/100 Batch 400/781 - Loss: 2.550, Seconds: 4.61\n",
2003 | "Epoch 1/100 Batch 420/781 - Loss: 2.512, Seconds: 3.83\n",
2004 | "Epoch 1/100 Batch 440/781 - Loss: 2.619, Seconds: 4.63\n",
2005 | "Epoch 1/100 Batch 460/781 - Loss: 2.652, Seconds: 4.51\n",
2006 | "Epoch 1/100 Batch 480/781 - Loss: 2.440, Seconds: 3.95\n",
2007 | "Epoch 1/100 Batch 500/781 - Loss: 2.464, Seconds: 4.23\n",
2008 | "Average loss for this update: 2.55\n",
2009 | "New Record!\n",
2010 | "Epoch 1/100 Batch 520/781 - Loss: 2.462, Seconds: 3.95\n",
2011 | "Epoch 1/100 Batch 540/781 - Loss: 2.477, Seconds: 4.53\n",
2012 | "Epoch 1/100 Batch 560/781 - Loss: 2.419, Seconds: 4.05\n",
2013 | "Epoch 1/100 Batch 580/781 - Loss: 2.470, Seconds: 4.31\n",
2014 | "Epoch 1/100 Batch 600/781 - Loss: 2.654, Seconds: 4.25\n",
2015 | "Epoch 1/100 Batch 620/781 - Loss: 2.529, Seconds: 4.21\n",
2016 | "Epoch 1/100 Batch 640/781 - Loss: 2.432, Seconds: 4.29\n",
2017 | "Epoch 1/100 Batch 660/781 - Loss: 2.368, Seconds: 4.69\n",
2018 | "Epoch 1/100 Batch 680/781 - Loss: 2.284, Seconds: 4.27\n",
2019 | "Epoch 1/100 Batch 700/781 - Loss: 2.440, Seconds: 4.29\n",
2020 | "Epoch 1/100 Batch 720/781 - Loss: 2.582, Seconds: 4.53\n",
2021 | "Epoch 1/100 Batch 740/781 - Loss: 2.421, Seconds: 4.62\n",
2022 | "Epoch 1/100 Batch 760/781 - Loss: 2.419, Seconds: 4.23\n",
2023 | "Average loss for this update: 2.442\n",
2024 | "New Record!\n",
2025 | "Epoch 1/100 Batch 780/781 - Loss: 2.229, Seconds: 3.87\n",
2026 | "Epoch 2/100 Batch 20/781 - Loss: 2.374, Seconds: 4.19\n",
2027 | "Epoch 2/100 Batch 40/781 - Loss: 2.232, Seconds: 3.49\n",
2028 | "Epoch 2/100 Batch 60/781 - Loss: 2.322, Seconds: 4.31\n",
2029 | "Epoch 2/100 Batch 80/781 - Loss: 2.287, Seconds: 4.25\n",
2030 | "Epoch 2/100 Batch 100/781 - Loss: 2.184, Seconds: 4.35\n",
2031 | "Epoch 2/100 Batch 120/781 - Loss: 2.215, Seconds: 3.87\n",
2032 | "Epoch 2/100 Batch 140/781 - Loss: 2.079, Seconds: 4.07\n",
2033 | "Epoch 2/100 Batch 160/781 - Loss: 2.356, Seconds: 3.31\n",
2034 | "Epoch 2/100 Batch 180/781 - Loss: 2.215, Seconds: 3.65\n",
2035 | "Epoch 2/100 Batch 200/781 - Loss: 2.221, Seconds: 4.41\n",
2036 | "Epoch 2/100 Batch 220/781 - Loss: 2.154, Seconds: 4.36\n",
2037 | "Epoch 2/100 Batch 240/781 - Loss: 2.013, Seconds: 4.11\n",
2038 | "Average loss for this update: 2.211\n",
2039 | "New Record!\n",
2040 | "Epoch 2/100 Batch 260/781 - Loss: 2.074, Seconds: 4.61\n",
2041 | "Epoch 2/100 Batch 280/781 - Loss: 2.124, Seconds: 4.41\n",
2042 | "Epoch 2/100 Batch 300/781 - Loss: 2.238, Seconds: 4.11\n",
2043 | "Epoch 2/100 Batch 320/781 - Loss: 2.276, Seconds: 4.33\n",
2044 | "Epoch 2/100 Batch 340/781 - Loss: 2.091, Seconds: 4.67\n",
2045 | "Epoch 2/100 Batch 360/781 - Loss: 2.186, Seconds: 4.27\n",
2046 | "Epoch 2/100 Batch 380/781 - Loss: 2.033, Seconds: 4.63\n",
2047 | "Epoch 2/100 Batch 400/781 - Loss: 2.148, Seconds: 4.62\n",
2048 | "Epoch 2/100 Batch 420/781 - Loss: 2.130, Seconds: 4.09\n",
2049 | "Epoch 2/100 Batch 440/781 - Loss: 2.233, Seconds: 4.75\n",
2050 | "Epoch 2/100 Batch 460/781 - Loss: 2.285, Seconds: 4.33\n",
2051 | "Epoch 2/100 Batch 480/781 - Loss: 2.098, Seconds: 4.21\n",
2052 | "Epoch 2/100 Batch 500/781 - Loss: 2.132, Seconds: 4.57\n",
2053 | "Average loss for this update: 2.157\n",
2054 | "New Record!\n",
2055 | "Epoch 2/100 Batch 520/781 - Loss: 2.093, Seconds: 4.24\n",
2056 | "Epoch 2/100 Batch 540/781 - Loss: 2.097, Seconds: 4.29\n",
2057 | "Epoch 2/100 Batch 560/781 - Loss: 2.055, Seconds: 3.91\n",
2058 | "Epoch 2/100 Batch 580/781 - Loss: 2.177, Seconds: 4.27\n",
2059 | "Epoch 2/100 Batch 600/781 - Loss: 2.329, Seconds: 4.23\n",
2060 | "Epoch 2/100 Batch 620/781 - Loss: 2.219, Seconds: 3.99\n",
2061 | "Epoch 2/100 Batch 640/781 - Loss: 2.115, Seconds: 4.27\n",
2062 | "Epoch 2/100 Batch 660/781 - Loss: 2.047, Seconds: 4.50\n",
2063 | "Epoch 2/100 Batch 680/781 - Loss: 1.999, Seconds: 4.35\n",
2064 | "Epoch 2/100 Batch 700/781 - Loss: 2.156, Seconds: 4.31\n",
2065 | "Epoch 2/100 Batch 720/781 - Loss: 2.297, Seconds: 4.51\n",
2066 | "Epoch 2/100 Batch 740/781 - Loss: 2.160, Seconds: 4.79\n",
2067 | "Epoch 2/100 Batch 760/781 - Loss: 2.161, Seconds: 4.41\n",
2068 | "Average loss for this update: 2.138\n",
2069 | "New Record!\n",
2070 | "Epoch 2/100 Batch 780/781 - Loss: 1.973, Seconds: 3.99\n",
2071 | "Epoch 3/100 Batch 20/781 - Loss: 2.163, Seconds: 4.33\n",
2072 | "Epoch 3/100 Batch 40/781 - Loss: 2.015, Seconds: 3.43\n",
2073 | "Epoch 3/100 Batch 60/781 - Loss: 2.076, Seconds: 4.43\n",
2074 | "Epoch 3/100 Batch 80/781 - Loss: 2.057, Seconds: 4.21\n",
2075 | "Epoch 3/100 Batch 100/781 - Loss: 1.947, Seconds: 4.15\n",
2076 | "Epoch 3/100 Batch 120/781 - Loss: 1.995, Seconds: 3.79\n",
2077 | "Epoch 3/100 Batch 140/781 - Loss: 1.837, Seconds: 3.99\n",
2078 | "Epoch 3/100 Batch 160/781 - Loss: 2.138, Seconds: 3.31\n",
2079 | "Epoch 3/100 Batch 180/781 - Loss: 2.005, Seconds: 3.89\n",
2080 | "Epoch 3/100 Batch 200/781 - Loss: 2.011, Seconds: 4.29\n",
2081 | "Epoch 3/100 Batch 220/781 - Loss: 1.942, Seconds: 4.05\n",
2082 | "Epoch 3/100 Batch 240/781 - Loss: 1.796, Seconds: 4.19\n",
2083 | "Average loss for this update: 1.99\n",
2084 | "New Record!\n",
2085 | "Epoch 3/100 Batch 260/781 - Loss: 1.877, Seconds: 4.49\n",
2086 | "Epoch 3/100 Batch 280/781 - Loss: 1.905, Seconds: 4.19\n",
2087 | "Epoch 3/100 Batch 300/781 - Loss: 2.033, Seconds: 4.19\n",
2088 | "Epoch 3/100 Batch 320/781 - Loss: 2.075, Seconds: 4.27\n",
2089 | "Epoch 3/100 Batch 340/781 - Loss: 1.902, Seconds: 4.75\n",
2090 | "Epoch 3/100 Batch 360/781 - Loss: 1.997, Seconds: 4.39\n",
2091 | "Epoch 3/100 Batch 380/781 - Loss: 1.816, Seconds: 4.47\n",
2092 | "Epoch 3/100 Batch 400/781 - Loss: 1.941, Seconds: 4.63\n",
2093 | "Epoch 3/100 Batch 420/781 - Loss: 1.911, Seconds: 4.03\n",
2094 | "Epoch 3/100 Batch 440/781 - Loss: 2.010, Seconds: 4.53\n",
2095 | "Epoch 3/100 Batch 460/781 - Loss: 2.071, Seconds: 4.39\n",
2096 | "Epoch 3/100 Batch 480/781 - Loss: 1.883, Seconds: 3.99\n",
2097 | "Epoch 3/100 Batch 500/781 - Loss: 1.921, Seconds: 4.25\n",
2098 | "Average loss for this update: 1.947\n",
2099 | "New Record!\n",
2100 | "Epoch 3/100 Batch 520/781 - Loss: 1.861, Seconds: 3.91\n",
2101 | "Epoch 3/100 Batch 540/781 - Loss: 1.889, Seconds: 4.41\n",
2102 | "Epoch 3/100 Batch 560/781 - Loss: 1.842, Seconds: 4.03\n",
2103 | "Epoch 3/100 Batch 580/781 - Loss: 1.997, Seconds: 4.35\n",
2104 | "Epoch 3/100 Batch 600/781 - Loss: 2.124, Seconds: 4.35\n",
2105 | "Epoch 3/100 Batch 620/781 - Loss: 2.016, Seconds: 4.03\n",
2106 | "Epoch 3/100 Batch 640/781 - Loss: 1.915, Seconds: 4.49\n",
2107 | "Epoch 3/100 Batch 660/781 - Loss: 1.837, Seconds: 4.45\n",
2108 | "Epoch 3/100 Batch 680/781 - Loss: 1.817, Seconds: 4.27\n",
2109 | "Epoch 3/100 Batch 700/781 - Loss: 1.956, Seconds: 4.45\n",
2110 | "Epoch 3/100 Batch 720/781 - Loss: 2.116, Seconds: 4.51\n",
2111 | "Epoch 3/100 Batch 740/781 - Loss: 1.975, Seconds: 4.79\n",
2112 | "Epoch 3/100 Batch 760/781 - Loss: 1.988, Seconds: 4.33\n",
2113 | "Average loss for this update: 1.944\n",
2114 | "New Record!\n",
2115 | "Epoch 3/100 Batch 780/781 - Loss: 1.783, Seconds: 3.99\n",
2116 | "Epoch 4/100 Batch 20/781 - Loss: 2.024, Seconds: 4.29\n",
2117 | "Epoch 4/100 Batch 40/781 - Loss: 1.855, Seconds: 3.57\n",
2118 | "Epoch 4/100 Batch 60/781 - Loss: 1.903, Seconds: 4.35\n",
2119 | "Epoch 4/100 Batch 80/781 - Loss: 1.899, Seconds: 4.19\n",
2120 | "Epoch 4/100 Batch 100/781 - Loss: 1.763, Seconds: 4.21\n",
2121 | "Epoch 4/100 Batch 120/781 - Loss: 1.821, Seconds: 3.85\n",
2122 | "Epoch 4/100 Batch 140/781 - Loss: 1.684, Seconds: 4.01\n",
2123 | "Epoch 4/100 Batch 160/781 - Loss: 1.981, Seconds: 3.29\n",
2124 | "Epoch 4/100 Batch 180/781 - Loss: 1.853, Seconds: 3.67\n",
2125 | "Epoch 4/100 Batch 200/781 - Loss: 1.856, Seconds: 4.27\n",
2126 | "Epoch 4/100 Batch 220/781 - Loss: 1.793, Seconds: 4.09\n",
2127 | "Epoch 4/100 Batch 240/781 - Loss: 1.635, Seconds: 4.19\n",
2128 | "Average loss for this update: 1.831\n",
2129 | "New Record!\n",
2130 | "Epoch 4/100 Batch 260/781 - Loss: 1.724, Seconds: 4.55\n",
2131 | "Epoch 4/100 Batch 280/781 - Loss: 1.746, Seconds: 4.33\n"
2132 | ]
2133 | },
2134 | {
2135 | "name": "stdout",
2136 | "output_type": "stream",
2137 | "text": [
2138 | "Epoch 4/100 Batch 300/781 - Loss: 1.882, Seconds: 4.23\n",
2139 | "Epoch 4/100 Batch 320/781 - Loss: 1.929, Seconds: 4.11\n",
2140 | "Epoch 4/100 Batch 340/781 - Loss: 1.754, Seconds: 4.65\n",
2141 | "Epoch 4/100 Batch 360/781 - Loss: 1.842, Seconds: 4.25\n",
2142 | "Epoch 4/100 Batch 380/781 - Loss: 1.654, Seconds: 4.67\n",
2143 | "Epoch 4/100 Batch 400/781 - Loss: 1.781, Seconds: 4.69\n",
2144 | "Epoch 4/100 Batch 420/781 - Loss: 1.760, Seconds: 4.03\n",
2145 | "Epoch 4/100 Batch 440/781 - Loss: 1.869, Seconds: 4.53\n",
2146 | "Epoch 4/100 Batch 460/781 - Loss: 1.923, Seconds: 4.55\n",
2147 | "Epoch 4/100 Batch 480/781 - Loss: 1.737, Seconds: 4.01\n",
2148 | "Epoch 4/100 Batch 500/781 - Loss: 1.772, Seconds: 4.09\n",
2149 | "Average loss for this update: 1.793\n",
2150 | "New Record!\n",
2151 | "Epoch 4/100 Batch 520/781 - Loss: 1.681, Seconds: 3.91\n",
2152 | "Epoch 4/100 Batch 540/781 - Loss: 1.740, Seconds: 4.49\n",
2153 | "Epoch 4/100 Batch 560/781 - Loss: 1.693, Seconds: 4.33\n",
2154 | "Epoch 4/100 Batch 580/781 - Loss: 1.859, Seconds: 4.25\n",
2155 | "Epoch 4/100 Batch 600/781 - Loss: 1.982, Seconds: 4.47\n",
2156 | "Epoch 4/100 Batch 620/781 - Loss: 1.871, Seconds: 4.01\n",
2157 | "Epoch 4/100 Batch 640/781 - Loss: 1.775, Seconds: 4.25\n",
2158 | "Epoch 4/100 Batch 660/781 - Loss: 1.680, Seconds: 4.67\n",
2159 | "Epoch 4/100 Batch 680/781 - Loss: 1.678, Seconds: 4.45\n",
2160 | "Epoch 4/100 Batch 700/781 - Loss: 1.821, Seconds: 4.35\n",
2161 | "Epoch 4/100 Batch 720/781 - Loss: 1.983, Seconds: 4.53\n",
2162 | "Epoch 4/100 Batch 740/781 - Loss: 1.857, Seconds: 4.91\n",
2163 | "Epoch 4/100 Batch 760/781 - Loss: 1.840, Seconds: 4.41\n",
2164 | "Average loss for this update: 1.803\n",
2165 | "No Improvement.\n",
2166 | "Epoch 4/100 Batch 780/781 - Loss: 1.640, Seconds: 4.05\n",
2167 | "Epoch 5/100 Batch 20/781 - Loss: 1.914, Seconds: 4.47\n",
2168 | "Epoch 5/100 Batch 40/781 - Loss: 1.728, Seconds: 3.60\n",
2169 | "Epoch 5/100 Batch 60/781 - Loss: 1.778, Seconds: 4.61\n",
2170 | "Epoch 5/100 Batch 80/781 - Loss: 1.771, Seconds: 4.45\n",
2171 | "Epoch 5/100 Batch 100/781 - Loss: 1.626, Seconds: 4.25\n",
2172 | "Epoch 5/100 Batch 120/781 - Loss: 1.683, Seconds: 3.89\n",
2173 | "Epoch 5/100 Batch 140/781 - Loss: 1.561, Seconds: 4.01\n",
2174 | "Epoch 5/100 Batch 160/781 - Loss: 1.854, Seconds: 3.41\n",
2175 | "Epoch 5/100 Batch 180/781 - Loss: 1.734, Seconds: 3.83\n",
2176 | "Epoch 5/100 Batch 200/781 - Loss: 1.722, Seconds: 4.33\n",
2177 | "Epoch 5/100 Batch 220/781 - Loss: 1.658, Seconds: 4.25\n",
2178 | "Epoch 5/100 Batch 240/781 - Loss: 1.502, Seconds: 4.13\n",
2179 | "Average loss for this update: 1.702\n",
2180 | "New Record!\n",
2181 | "Epoch 5/100 Batch 260/781 - Loss: 1.586, Seconds: 4.65\n",
2182 | "Epoch 5/100 Batch 280/781 - Loss: 1.648, Seconds: 4.23\n",
2183 | "Epoch 5/100 Batch 300/781 - Loss: 1.751, Seconds: 4.27\n",
2184 | "Epoch 5/100 Batch 320/781 - Loss: 1.813, Seconds: 4.11\n",
2185 | "Epoch 5/100 Batch 340/781 - Loss: 1.650, Seconds: 4.67\n",
2186 | "Epoch 5/100 Batch 360/781 - Loss: 1.718, Seconds: 4.49\n",
2187 | "Epoch 5/100 Batch 380/781 - Loss: 1.521, Seconds: 4.45\n",
2188 | "Epoch 5/100 Batch 400/781 - Loss: 1.666, Seconds: 4.57\n",
2189 | "Epoch 5/100 Batch 420/781 - Loss: 1.639, Seconds: 3.97\n",
2190 | "Epoch 5/100 Batch 440/781 - Loss: 1.749, Seconds: 4.57\n",
2191 | "Epoch 5/100 Batch 460/781 - Loss: 1.810, Seconds: 4.36\n",
2192 | "Epoch 5/100 Batch 480/781 - Loss: 1.624, Seconds: 4.17\n",
2193 | "Epoch 5/100 Batch 500/781 - Loss: 1.654, Seconds: 4.31\n",
2194 | "Average loss for this update: 1.676\n",
2195 | "New Record!\n",
2196 | "Epoch 5/100 Batch 520/781 - Loss: 1.565, Seconds: 3.97\n",
2197 | "Epoch 5/100 Batch 540/781 - Loss: 1.616, Seconds: 4.43\n",
2198 | "Epoch 5/100 Batch 560/781 - Loss: 1.576, Seconds: 3.95\n",
2199 | "Epoch 5/100 Batch 580/781 - Loss: 1.736, Seconds: 4.37\n",
2200 | "Epoch 5/100 Batch 600/781 - Loss: 1.860, Seconds: 4.23\n",
2201 | "Epoch 5/100 Batch 620/781 - Loss: 1.747, Seconds: 4.33\n",
2202 | "Epoch 5/100 Batch 640/781 - Loss: 1.660, Seconds: 4.45\n",
2203 | "Epoch 5/100 Batch 660/781 - Loss: 1.549, Seconds: 4.45\n",
2204 | "Epoch 5/100 Batch 680/781 - Loss: 1.560, Seconds: 4.17\n",
2205 | "Epoch 5/100 Batch 700/781 - Loss: 1.698, Seconds: 4.29\n",
2206 | "Epoch 5/100 Batch 720/781 - Loss: 1.861, Seconds: 4.61\n",
2207 | "Epoch 5/100 Batch 740/781 - Loss: 1.749, Seconds: 4.89\n",
2208 | "Epoch 5/100 Batch 760/781 - Loss: 1.736, Seconds: 4.47\n",
2209 | "Average loss for this update: 1.683\n",
2210 | "No Improvement.\n",
2211 | "Epoch 5/100 Batch 780/781 - Loss: 1.523, Seconds: 4.29\n",
2212 | "Epoch 6/100 Batch 20/781 - Loss: 1.793, Seconds: 4.27\n",
2213 | "Epoch 6/100 Batch 40/781 - Loss: 1.642, Seconds: 3.55\n",
2214 | "Epoch 6/100 Batch 60/781 - Loss: 1.658, Seconds: 4.39\n",
2215 | "Epoch 6/100 Batch 80/781 - Loss: 1.662, Seconds: 4.23\n",
2216 | "Epoch 6/100 Batch 100/781 - Loss: 1.504, Seconds: 4.37\n",
2217 | "Epoch 6/100 Batch 120/781 - Loss: 1.577, Seconds: 4.01\n",
2218 | "Epoch 6/100 Batch 140/781 - Loss: 1.461, Seconds: 3.99\n",
2219 | "Epoch 6/100 Batch 160/781 - Loss: 1.759, Seconds: 3.33\n",
2220 | "Epoch 6/100 Batch 180/781 - Loss: 1.633, Seconds: 3.73\n",
2221 | "Epoch 6/100 Batch 200/781 - Loss: 1.612, Seconds: 4.19\n",
2222 | "Epoch 6/100 Batch 220/781 - Loss: 1.565, Seconds: 4.07\n",
2223 | "Epoch 6/100 Batch 240/781 - Loss: 1.395, Seconds: 4.07\n",
2224 | "Average loss for this update: 1.597\n",
2225 | "New Record!\n",
2226 | "Epoch 6/100 Batch 260/781 - Loss: 1.490, Seconds: 4.61\n",
2227 | "Epoch 6/100 Batch 280/781 - Loss: 1.541, Seconds: 4.25\n",
2228 | "Epoch 6/100 Batch 300/781 - Loss: 1.661, Seconds: 4.39\n",
2229 | "Epoch 6/100 Batch 320/781 - Loss: 1.706, Seconds: 4.23\n",
2230 | "Epoch 6/100 Batch 340/781 - Loss: 1.544, Seconds: 4.51\n",
2231 | "Epoch 6/100 Batch 360/781 - Loss: 1.623, Seconds: 4.45\n",
2232 | "Epoch 6/100 Batch 380/781 - Loss: 1.430, Seconds: 4.61\n",
2233 | "Epoch 6/100 Batch 400/781 - Loss: 1.567, Seconds: 4.49\n",
2234 | "Epoch 6/100 Batch 420/781 - Loss: 1.535, Seconds: 3.97\n",
2235 | "Epoch 6/100 Batch 440/781 - Loss: 1.642, Seconds: 4.67\n",
2236 | "Epoch 6/100 Batch 460/781 - Loss: 1.701, Seconds: 4.35\n",
2237 | "Epoch 6/100 Batch 480/781 - Loss: 1.529, Seconds: 4.07\n",
2238 | "Epoch 6/100 Batch 500/781 - Loss: 1.554, Seconds: 4.33\n",
2239 | "Average loss for this update: 1.575\n",
2240 | "New Record!\n",
2241 | "Epoch 6/100 Batch 520/781 - Loss: 1.462, Seconds: 3.91\n",
2242 | "Epoch 6/100 Batch 540/781 - Loss: 1.530, Seconds: 4.59\n",
2243 | "Epoch 6/100 Batch 560/781 - Loss: 1.482, Seconds: 4.11\n",
2244 | "Epoch 6/100 Batch 580/781 - Loss: 1.658, Seconds: 4.07\n",
2245 | "Epoch 6/100 Batch 600/781 - Loss: 1.717, Seconds: 4.17\n",
2246 | "Epoch 6/100 Batch 620/781 - Loss: 1.624, Seconds: 4.05\n",
2247 | "Epoch 6/100 Batch 640/781 - Loss: 1.551, Seconds: 4.29\n",
2248 | "Epoch 6/100 Batch 660/781 - Loss: 1.433, Seconds: 4.39\n",
2249 | "Epoch 6/100 Batch 680/781 - Loss: 1.468, Seconds: 4.35\n",
2250 | "Epoch 6/100 Batch 700/781 - Loss: 1.611, Seconds: 4.63\n",
2251 | "Epoch 6/100 Batch 720/781 - Loss: 1.763, Seconds: 4.45\n",
2252 | "Epoch 6/100 Batch 740/781 - Loss: 1.646, Seconds: 4.79\n",
2253 | "Epoch 6/100 Batch 760/781 - Loss: 1.631, Seconds: 4.33\n",
2254 | "Average loss for this update: 1.582\n",
2255 | "No Improvement.\n",
2256 | "Epoch 6/100 Batch 780/781 - Loss: 1.434, Seconds: 3.85\n",
2257 | "Epoch 7/100 Batch 20/781 - Loss: 1.688, Seconds: 4.19\n",
2258 | "Epoch 7/100 Batch 40/781 - Loss: 1.547, Seconds: 3.43\n",
2259 | "Epoch 7/100 Batch 60/781 - Loss: 1.570, Seconds: 4.53\n",
2260 | "Epoch 7/100 Batch 80/781 - Loss: 1.564, Seconds: 4.23\n",
2261 | "Epoch 7/100 Batch 100/781 - Loss: 1.413, Seconds: 4.31\n",
2262 | "Epoch 7/100 Batch 120/781 - Loss: 1.472, Seconds: 3.81\n",
2263 | "Epoch 7/100 Batch 140/781 - Loss: 1.395, Seconds: 4.07\n",
2264 | "Epoch 7/100 Batch 160/781 - Loss: 1.645, Seconds: 3.47\n",
2265 | "Epoch 7/100 Batch 180/781 - Loss: 1.555, Seconds: 3.79\n",
2266 | "Epoch 7/100 Batch 200/781 - Loss: 1.534, Seconds: 4.39\n",
2267 | "Epoch 7/100 Batch 220/781 - Loss: 1.498, Seconds: 4.09\n",
2268 | "Epoch 7/100 Batch 240/781 - Loss: 1.307, Seconds: 4.27\n",
2269 | "Average loss for this update: 1.508\n",
2270 | "New Record!\n",
2271 | "Epoch 7/100 Batch 260/781 - Loss: 1.407, Seconds: 4.59\n",
2272 | "Epoch 7/100 Batch 280/781 - Loss: 1.459, Seconds: 4.33\n",
2273 | "Epoch 7/100 Batch 300/781 - Loss: 1.557, Seconds: 4.29\n",
2274 | "Epoch 7/100 Batch 320/781 - Loss: 1.616, Seconds: 4.31\n",
2275 | "Epoch 7/100 Batch 340/781 - Loss: 1.462, Seconds: 4.47\n",
2276 | "Epoch 7/100 Batch 360/781 - Loss: 1.534, Seconds: 4.47\n",
2277 | "Epoch 7/100 Batch 380/781 - Loss: 1.341, Seconds: 4.47\n",
2278 | "Epoch 7/100 Batch 400/781 - Loss: 1.497, Seconds: 4.53\n",
2279 | "Epoch 7/100 Batch 420/781 - Loss: 1.442, Seconds: 4.05\n",
2280 | "Epoch 7/100 Batch 440/781 - Loss: 1.561, Seconds: 4.61\n",
2281 | "Epoch 7/100 Batch 460/781 - Loss: 1.603, Seconds: 4.35\n",
2282 | "Epoch 7/100 Batch 480/781 - Loss: 1.442, Seconds: 3.99\n",
2283 | "Epoch 7/100 Batch 500/781 - Loss: 1.476, Seconds: 4.15\n",
2284 | "Average loss for this update: 1.488\n",
2285 | "New Record!\n",
2286 | "Epoch 7/100 Batch 520/781 - Loss: 1.375, Seconds: 3.99\n",
2287 | "Epoch 7/100 Batch 540/781 - Loss: 1.457, Seconds: 4.47\n",
2288 | "Epoch 7/100 Batch 560/781 - Loss: 1.406, Seconds: 3.95\n"
2289 | ]
2290 | },
2291 | {
2292 | "name": "stdout",
2293 | "output_type": "stream",
2294 | "text": [
2295 | "Epoch 7/100 Batch 580/781 - Loss: 1.564, Seconds: 4.35\n",
2296 | "Epoch 7/100 Batch 600/781 - Loss: 1.615, Seconds: 4.39\n",
2297 | "Epoch 7/100 Batch 620/781 - Loss: 1.533, Seconds: 3.99\n",
2298 | "Epoch 7/100 Batch 640/781 - Loss: 1.448, Seconds: 4.19\n",
2299 | "Epoch 7/100 Batch 660/781 - Loss: 1.343, Seconds: 4.67\n",
2300 | "Epoch 7/100 Batch 680/781 - Loss: 1.388, Seconds: 4.23\n",
2301 | "Epoch 7/100 Batch 700/781 - Loss: 1.504, Seconds: 4.37\n",
2302 | "Epoch 7/100 Batch 720/781 - Loss: 1.675, Seconds: 4.79\n",
2303 | "Epoch 7/100 Batch 740/781 - Loss: 1.553, Seconds: 4.89\n",
2304 | "Epoch 7/100 Batch 760/781 - Loss: 1.534, Seconds: 4.39\n",
2305 | "Average loss for this update: 1.491\n",
2306 | "No Improvement.\n",
2307 | "Epoch 7/100 Batch 780/781 - Loss: 1.354, Seconds: 4.03\n",
2308 | "Epoch 8/100 Batch 20/781 - Loss: 1.612, Seconds: 4.39\n",
2309 | "Epoch 8/100 Batch 40/781 - Loss: 1.480, Seconds: 3.47\n",
2310 | "Epoch 8/100 Batch 60/781 - Loss: 1.498, Seconds: 4.51\n",
2311 | "Epoch 8/100 Batch 80/781 - Loss: 1.474, Seconds: 4.25\n",
2312 | "Epoch 8/100 Batch 100/781 - Loss: 1.334, Seconds: 4.35\n",
2313 | "Epoch 8/100 Batch 120/781 - Loss: 1.410, Seconds: 3.91\n",
2314 | "Epoch 8/100 Batch 140/781 - Loss: 1.320, Seconds: 4.25\n",
2315 | "Epoch 8/100 Batch 160/781 - Loss: 1.559, Seconds: 3.19\n",
2316 | "Epoch 8/100 Batch 180/781 - Loss: 1.465, Seconds: 3.67\n",
2317 | "Epoch 8/100 Batch 200/781 - Loss: 1.444, Seconds: 4.35\n",
2318 | "Epoch 8/100 Batch 220/781 - Loss: 1.420, Seconds: 4.05\n",
2319 | "Epoch 8/100 Batch 240/781 - Loss: 1.253, Seconds: 4.05\n",
2320 | "Average loss for this update: 1.432\n",
2321 | "New Record!\n",
2322 | "Epoch 8/100 Batch 260/781 - Loss: 1.336, Seconds: 4.77\n",
2323 | "Epoch 8/100 Batch 280/781 - Loss: 1.372, Seconds: 4.29\n",
2324 | "Epoch 8/100 Batch 300/781 - Loss: 1.496, Seconds: 4.19\n",
2325 | "Epoch 8/100 Batch 320/781 - Loss: 1.536, Seconds: 4.11\n",
2326 | "Epoch 8/100 Batch 340/781 - Loss: 1.385, Seconds: 4.57\n",
2327 | "Epoch 8/100 Batch 360/781 - Loss: 1.452, Seconds: 4.25\n",
2328 | "Epoch 8/100 Batch 380/781 - Loss: 1.284, Seconds: 4.59\n",
2329 | "Epoch 8/100 Batch 400/781 - Loss: 1.421, Seconds: 4.84\n",
2330 | "Epoch 8/100 Batch 420/781 - Loss: 1.361, Seconds: 4.09\n",
2331 | "Epoch 8/100 Batch 440/781 - Loss: 1.476, Seconds: 4.67\n",
2332 | "Epoch 8/100 Batch 460/781 - Loss: 1.515, Seconds: 4.53\n",
2333 | "Epoch 8/100 Batch 480/781 - Loss: 1.371, Seconds: 4.11\n",
2334 | "Epoch 8/100 Batch 500/781 - Loss: 1.401, Seconds: 4.27\n",
2335 | "Average loss for this update: 1.413\n",
2336 | "New Record!\n",
2337 | "Epoch 8/100 Batch 520/781 - Loss: 1.309, Seconds: 3.93\n",
2338 | "Epoch 8/100 Batch 540/781 - Loss: 1.379, Seconds: 4.45\n",
2339 | "Epoch 8/100 Batch 560/781 - Loss: 1.333, Seconds: 3.97\n",
2340 | "Epoch 8/100 Batch 580/781 - Loss: 1.476, Seconds: 4.23\n",
2341 | "Epoch 8/100 Batch 600/781 - Loss: 1.532, Seconds: 4.17\n",
2342 | "Epoch 8/100 Batch 620/781 - Loss: 1.465, Seconds: 4.19\n",
2343 | "Epoch 8/100 Batch 640/781 - Loss: 1.369, Seconds: 4.27\n",
2344 | "Epoch 8/100 Batch 660/781 - Loss: 1.268, Seconds: 4.65\n",
2345 | "Epoch 8/100 Batch 680/781 - Loss: 1.312, Seconds: 4.53\n",
2346 | "Epoch 8/100 Batch 700/781 - Loss: 1.419, Seconds: 4.55\n",
2347 | "Epoch 8/100 Batch 720/781 - Loss: 1.588, Seconds: 4.61\n",
2348 | "Epoch 8/100 Batch 740/781 - Loss: 1.482, Seconds: 4.87\n",
2349 | "Epoch 8/100 Batch 760/781 - Loss: 1.471, Seconds: 4.53\n",
2350 | "Average loss for this update: 1.414\n",
2351 | "No Improvement.\n",
2352 | "Epoch 8/100 Batch 780/781 - Loss: 1.285, Seconds: 4.13\n",
2353 | "Epoch 9/100 Batch 20/781 - Loss: 1.542, Seconds: 4.19\n",
2354 | "Epoch 9/100 Batch 40/781 - Loss: 1.404, Seconds: 3.47\n",
2355 | "Epoch 9/100 Batch 60/781 - Loss: 1.421, Seconds: 4.47\n",
2356 | "Epoch 9/100 Batch 80/781 - Loss: 1.407, Seconds: 4.31\n",
2357 | "Epoch 9/100 Batch 100/781 - Loss: 1.268, Seconds: 4.25\n",
2358 | "Epoch 9/100 Batch 120/781 - Loss: 1.336, Seconds: 3.81\n",
2359 | "Epoch 9/100 Batch 140/781 - Loss: 1.259, Seconds: 4.01\n",
2360 | "Epoch 9/100 Batch 160/781 - Loss: 1.500, Seconds: 3.33\n",
2361 | "Epoch 9/100 Batch 180/781 - Loss: 1.395, Seconds: 3.79\n",
2362 | "Epoch 9/100 Batch 200/781 - Loss: 1.392, Seconds: 4.47\n",
2363 | "Epoch 9/100 Batch 220/781 - Loss: 1.341, Seconds: 4.05\n",
2364 | "Epoch 9/100 Batch 240/781 - Loss: 1.192, Seconds: 4.09\n",
2365 | "Average loss for this update: 1.364\n",
2366 | "New Record!\n",
2367 | "Epoch 9/100 Batch 260/781 - Loss: 1.268, Seconds: 4.59\n",
2368 | "Epoch 9/100 Batch 280/781 - Loss: 1.320, Seconds: 4.33\n",
2369 | "Epoch 9/100 Batch 300/781 - Loss: 1.412, Seconds: 4.11\n",
2370 | "Epoch 9/100 Batch 320/781 - Loss: 1.479, Seconds: 4.25\n",
2371 | "Epoch 9/100 Batch 340/781 - Loss: 1.317, Seconds: 4.67\n",
2372 | "Epoch 9/100 Batch 360/781 - Loss: 1.379, Seconds: 4.53\n",
2373 | "Epoch 9/100 Batch 380/781 - Loss: 1.222, Seconds: 4.81\n",
2374 | "Epoch 9/100 Batch 400/781 - Loss: 1.358, Seconds: 4.67\n",
2375 | "Epoch 9/100 Batch 420/781 - Loss: 1.299, Seconds: 4.03\n",
2376 | "Epoch 9/100 Batch 440/781 - Loss: 1.388, Seconds: 4.67\n",
2377 | "Epoch 9/100 Batch 460/781 - Loss: 1.451, Seconds: 4.45\n",
2378 | "Epoch 9/100 Batch 480/781 - Loss: 1.303, Seconds: 4.01\n",
2379 | "Epoch 9/100 Batch 500/781 - Loss: 1.344, Seconds: 4.35\n",
2380 | "Average loss for this update: 1.346\n",
2381 | "New Record!\n",
2382 | "Epoch 9/100 Batch 520/781 - Loss: 1.245, Seconds: 3.93\n",
2383 | "Epoch 9/100 Batch 540/781 - Loss: 1.300, Seconds: 4.61\n",
2384 | "Epoch 9/100 Batch 560/781 - Loss: 1.257, Seconds: 4.07\n",
2385 | "Epoch 9/100 Batch 580/781 - Loss: 1.409, Seconds: 4.23\n",
2386 | "Epoch 9/100 Batch 600/781 - Loss: 1.465, Seconds: 4.37\n",
2387 | "Epoch 9/100 Batch 620/781 - Loss: 1.373, Seconds: 3.99\n",
2388 | "Epoch 9/100 Batch 640/781 - Loss: 1.316, Seconds: 4.29\n",
2389 | "Epoch 9/100 Batch 660/781 - Loss: 1.198, Seconds: 4.61\n",
2390 | "Epoch 9/100 Batch 680/781 - Loss: 1.263, Seconds: 4.31\n",
2391 | "Epoch 9/100 Batch 700/781 - Loss: 1.369, Seconds: 4.55\n",
2392 | "Epoch 9/100 Batch 720/781 - Loss: 1.511, Seconds: 4.77\n",
2393 | "Epoch 9/100 Batch 740/781 - Loss: 1.425, Seconds: 4.73\n",
2394 | "Epoch 9/100 Batch 760/781 - Loss: 1.413, Seconds: 4.19\n",
2395 | "Average loss for this update: 1.349\n",
2396 | "No Improvement.\n",
2397 | "Epoch 9/100 Batch 780/781 - Loss: 1.234, Seconds: 3.89\n",
2398 | "Epoch 10/100 Batch 20/781 - Loss: 1.470, Seconds: 4.47\n",
2399 | "Epoch 10/100 Batch 40/781 - Loss: 1.351, Seconds: 3.53\n",
2400 | "Epoch 10/100 Batch 60/781 - Loss: 1.343, Seconds: 4.73\n",
2401 | "Epoch 10/100 Batch 80/781 - Loss: 1.350, Seconds: 4.45\n",
2402 | "Epoch 10/100 Batch 100/781 - Loss: 1.202, Seconds: 4.25\n",
2403 | "Epoch 10/100 Batch 120/781 - Loss: 1.274, Seconds: 3.79\n",
2404 | "Epoch 10/100 Batch 140/781 - Loss: 1.189, Seconds: 4.07\n",
2405 | "Epoch 10/100 Batch 160/781 - Loss: 1.432, Seconds: 3.25\n",
2406 | "Epoch 10/100 Batch 180/781 - Loss: 1.339, Seconds: 3.65\n",
2407 | "Epoch 10/100 Batch 200/781 - Loss: 1.334, Seconds: 4.65\n",
2408 | "Epoch 10/100 Batch 220/781 - Loss: 1.299, Seconds: 4.01\n",
2409 | "Epoch 10/100 Batch 240/781 - Loss: 1.140, Seconds: 4.07\n",
2410 | "Average loss for this update: 1.304\n",
2411 | "New Record!\n",
2412 | "Epoch 10/100 Batch 260/781 - Loss: 1.210, Seconds: 4.53\n",
2413 | "Epoch 10/100 Batch 280/781 - Loss: 1.272, Seconds: 4.45\n",
2414 | "Epoch 10/100 Batch 300/781 - Loss: 1.351, Seconds: 4.19\n",
2415 | "Epoch 10/100 Batch 320/781 - Loss: 1.399, Seconds: 4.25\n",
2416 | "Epoch 10/100 Batch 340/781 - Loss: 1.273, Seconds: 4.63\n",
2417 | "Epoch 10/100 Batch 360/781 - Loss: 1.301, Seconds: 4.59\n",
2418 | "Epoch 10/100 Batch 380/781 - Loss: 1.164, Seconds: 4.69\n",
2419 | "Epoch 10/100 Batch 400/781 - Loss: 1.284, Seconds: 4.75\n",
2420 | "Epoch 10/100 Batch 420/781 - Loss: 1.221, Seconds: 3.95\n",
2421 | "Epoch 10/100 Batch 440/781 - Loss: 1.333, Seconds: 4.67\n",
2422 | "Epoch 10/100 Batch 460/781 - Loss: 1.383, Seconds: 4.61\n",
2423 | "Epoch 10/100 Batch 480/781 - Loss: 1.240, Seconds: 4.19\n",
2424 | "Epoch 10/100 Batch 500/781 - Loss: 1.298, Seconds: 4.15\n",
2425 | "Average loss for this update: 1.282\n",
2426 | "New Record!\n",
2427 | "Epoch 10/100 Batch 520/781 - Loss: 1.167, Seconds: 4.07\n",
2428 | "Epoch 10/100 Batch 540/781 - Loss: 1.242, Seconds: 4.47\n",
2429 | "Epoch 10/100 Batch 560/781 - Loss: 1.201, Seconds: 4.03\n",
2430 | "Epoch 10/100 Batch 580/781 - Loss: 1.343, Seconds: 4.33\n",
2431 | "Epoch 10/100 Batch 600/781 - Loss: 1.399, Seconds: 4.63\n",
2432 | "Epoch 10/100 Batch 620/781 - Loss: 1.308, Seconds: 4.37\n",
2433 | "Epoch 10/100 Batch 640/781 - Loss: 1.260, Seconds: 4.37\n",
2434 | "Epoch 10/100 Batch 660/781 - Loss: 1.139, Seconds: 4.35\n",
2435 | "Epoch 10/100 Batch 680/781 - Loss: 1.203, Seconds: 4.41\n",
2436 | "Epoch 10/100 Batch 700/781 - Loss: 1.299, Seconds: 4.39\n",
2437 | "Epoch 10/100 Batch 720/781 - Loss: 1.437, Seconds: 4.59\n",
2438 | "Epoch 10/100 Batch 740/781 - Loss: 1.346, Seconds: 4.73\n",
2439 | "Epoch 10/100 Batch 760/781 - Loss: 1.340, Seconds: 4.27\n",
2440 | "Average loss for this update: 1.284\n",
2441 | "No Improvement.\n",
2442 | "Epoch 10/100 Batch 780/781 - Loss: 1.176, Seconds: 4.33\n",
2443 | "Epoch 11/100 Batch 20/781 - Loss: 1.408, Seconds: 4.49\n",
2444 | "Epoch 11/100 Batch 40/781 - Loss: 1.280, Seconds: 3.41\n",
2445 | "Epoch 11/100 Batch 60/781 - Loss: 1.287, Seconds: 4.65\n"
2446 | ]
2447 | },
2448 | {
2449 | "name": "stdout",
2450 | "output_type": "stream",
2451 | "text": [
2452 | "Epoch 11/100 Batch 80/781 - Loss: 1.300, Seconds: 4.27\n",
2453 | "Epoch 11/100 Batch 100/781 - Loss: 1.147, Seconds: 4.39\n",
2454 | "Epoch 11/100 Batch 120/781 - Loss: 1.242, Seconds: 3.79\n",
2455 | "Epoch 11/100 Batch 140/781 - Loss: 1.150, Seconds: 4.17\n",
2456 | "Epoch 11/100 Batch 160/781 - Loss: 1.364, Seconds: 3.25\n",
2457 | "Epoch 11/100 Batch 180/781 - Loss: 1.293, Seconds: 3.67\n",
2458 | "Epoch 11/100 Batch 200/781 - Loss: 1.263, Seconds: 4.33\n",
2459 | "Epoch 11/100 Batch 220/781 - Loss: 1.238, Seconds: 4.09\n",
2460 | "Epoch 11/100 Batch 240/781 - Loss: 1.088, Seconds: 4.01\n",
2461 | "Average loss for this update: 1.249\n",
2462 | "New Record!\n",
2463 | "Epoch 11/100 Batch 260/781 - Loss: 1.169, Seconds: 4.55\n",
2464 | "Epoch 11/100 Batch 280/781 - Loss: 1.206, Seconds: 4.41\n",
2465 | "Epoch 11/100 Batch 300/781 - Loss: 1.294, Seconds: 4.35\n",
2466 | "Epoch 11/100 Batch 320/781 - Loss: 1.366, Seconds: 4.17\n",
2467 | "Epoch 11/100 Batch 340/781 - Loss: 1.212, Seconds: 4.67\n",
2468 | "Epoch 11/100 Batch 360/781 - Loss: 1.248, Seconds: 4.39\n",
2469 | "Epoch 11/100 Batch 380/781 - Loss: 1.105, Seconds: 4.61\n",
2470 | "Epoch 11/100 Batch 400/781 - Loss: 1.236, Seconds: 4.71\n",
2471 | "Epoch 11/100 Batch 420/781 - Loss: 1.175, Seconds: 3.97\n",
2472 | "Epoch 11/100 Batch 440/781 - Loss: 1.276, Seconds: 4.83\n",
2473 | "Epoch 11/100 Batch 460/781 - Loss: 1.346, Seconds: 4.59\n",
2474 | "Epoch 11/100 Batch 480/781 - Loss: 1.198, Seconds: 4.01\n",
2475 | "Epoch 11/100 Batch 500/781 - Loss: 1.244, Seconds: 4.39\n",
2476 | "Average loss for this update: 1.233\n",
2477 | "New Record!\n",
2478 | "Epoch 11/100 Batch 520/781 - Loss: 1.132, Seconds: 3.83\n",
2479 | "Epoch 11/100 Batch 540/781 - Loss: 1.182, Seconds: 4.65\n",
2480 | "Epoch 11/100 Batch 560/781 - Loss: 1.153, Seconds: 3.89\n",
2481 | "Epoch 11/100 Batch 580/781 - Loss: 1.290, Seconds: 4.23\n",
2482 | "Epoch 11/100 Batch 600/781 - Loss: 1.335, Seconds: 4.17\n",
2483 | "Epoch 11/100 Batch 620/781 - Loss: 1.250, Seconds: 4.09\n",
2484 | "Epoch 11/100 Batch 640/781 - Loss: 1.202, Seconds: 4.25\n",
2485 | "Epoch 11/100 Batch 660/781 - Loss: 1.085, Seconds: 4.45\n",
2486 | "Epoch 11/100 Batch 680/781 - Loss: 1.150, Seconds: 4.27\n",
2487 | "Epoch 11/100 Batch 700/781 - Loss: 1.248, Seconds: 4.35\n",
2488 | "Epoch 11/100 Batch 720/781 - Loss: 1.394, Seconds: 4.65\n",
2489 | "Epoch 11/100 Batch 740/781 - Loss: 1.287, Seconds: 4.75\n",
2490 | "Epoch 11/100 Batch 760/781 - Loss: 1.294, Seconds: 4.61\n",
2491 | "Average loss for this update: 1.231\n",
2492 | "New Record!\n",
2493 | "Epoch 11/100 Batch 780/781 - Loss: 1.121, Seconds: 3.97\n",
2494 | "Epoch 12/100 Batch 20/781 - Loss: 1.349, Seconds: 4.37\n",
2495 | "Epoch 12/100 Batch 40/781 - Loss: 1.242, Seconds: 3.45\n",
2496 | "Epoch 12/100 Batch 60/781 - Loss: 1.216, Seconds: 4.39\n",
2497 | "Epoch 12/100 Batch 80/781 - Loss: 1.235, Seconds: 4.41\n",
2498 | "Epoch 12/100 Batch 100/781 - Loss: 1.096, Seconds: 4.31\n",
2499 | "Epoch 12/100 Batch 120/781 - Loss: 1.184, Seconds: 3.99\n",
2500 | "Epoch 12/100 Batch 140/781 - Loss: 1.113, Seconds: 4.25\n",
2501 | "Epoch 12/100 Batch 160/781 - Loss: 1.303, Seconds: 3.27\n",
2502 | "Epoch 12/100 Batch 180/781 - Loss: 1.239, Seconds: 3.71\n",
2503 | "Epoch 12/100 Batch 200/781 - Loss: 1.202, Seconds: 4.39\n",
2504 | "Epoch 12/100 Batch 220/781 - Loss: 1.175, Seconds: 4.11\n",
2505 | "Epoch 12/100 Batch 240/781 - Loss: 1.029, Seconds: 4.22\n",
2506 | "Average loss for this update: 1.193\n",
2507 | "New Record!\n",
2508 | "Epoch 12/100 Batch 260/781 - Loss: 1.113, Seconds: 4.53\n",
2509 | "Epoch 12/100 Batch 280/781 - Loss: 1.160, Seconds: 4.39\n",
2510 | "Epoch 12/100 Batch 300/781 - Loss: 1.231, Seconds: 4.23\n",
2511 | "Epoch 12/100 Batch 320/781 - Loss: 1.298, Seconds: 4.17\n",
2512 | "Epoch 12/100 Batch 340/781 - Loss: 1.161, Seconds: 4.51\n",
2513 | "Epoch 12/100 Batch 360/781 - Loss: 1.190, Seconds: 4.43\n",
2514 | "Epoch 12/100 Batch 380/781 - Loss: 1.045, Seconds: 4.59\n",
2515 | "Epoch 12/100 Batch 400/781 - Loss: 1.182, Seconds: 4.51\n",
2516 | "Epoch 12/100 Batch 420/781 - Loss: 1.127, Seconds: 3.95\n",
2517 | "Epoch 12/100 Batch 440/781 - Loss: 1.213, Seconds: 4.75\n",
2518 | "Epoch 12/100 Batch 460/781 - Loss: 1.280, Seconds: 4.53\n",
2519 | "Epoch 12/100 Batch 480/781 - Loss: 1.149, Seconds: 4.19\n",
2520 | "Epoch 12/100 Batch 500/781 - Loss: 1.196, Seconds: 4.19\n",
2521 | "Average loss for this update: 1.177\n",
2522 | "New Record!\n",
2523 | "Epoch 12/100 Batch 520/781 - Loss: 1.083, Seconds: 3.99\n",
2524 | "Epoch 12/100 Batch 540/781 - Loss: 1.148, Seconds: 4.39\n",
2525 | "Epoch 12/100 Batch 560/781 - Loss: 1.103, Seconds: 3.95\n",
2526 | "Epoch 12/100 Batch 580/781 - Loss: 1.230, Seconds: 4.19\n",
2527 | "Epoch 12/100 Batch 600/781 - Loss: 1.295, Seconds: 4.21\n",
2528 | "Epoch 12/100 Batch 620/781 - Loss: 1.208, Seconds: 4.09\n",
2529 | "Epoch 12/100 Batch 640/781 - Loss: 1.152, Seconds: 4.41\n",
2530 | "Epoch 12/100 Batch 660/781 - Loss: 1.040, Seconds: 4.51\n",
2531 | "Epoch 12/100 Batch 680/781 - Loss: 1.108, Seconds: 4.37\n",
2532 | "Epoch 12/100 Batch 700/781 - Loss: 1.192, Seconds: 4.31\n",
2533 | "Epoch 12/100 Batch 720/781 - Loss: 1.338, Seconds: 4.39\n",
2534 | "Epoch 12/100 Batch 740/781 - Loss: 1.234, Seconds: 4.79\n",
2535 | "Epoch 12/100 Batch 760/781 - Loss: 1.241, Seconds: 4.33\n",
2536 | "Average loss for this update: 1.184\n",
2537 | "No Improvement.\n",
2538 | "Epoch 12/100 Batch 780/781 - Loss: 1.095, Seconds: 3.89\n",
2539 | "Epoch 13/100 Batch 20/781 - Loss: 1.297, Seconds: 4.33\n",
2540 | "Epoch 13/100 Batch 40/781 - Loss: 1.190, Seconds: 3.41\n",
2541 | "Epoch 13/100 Batch 60/781 - Loss: 1.192, Seconds: 4.51\n",
2542 | "Epoch 13/100 Batch 80/781 - Loss: 1.197, Seconds: 4.49\n",
2543 | "Epoch 13/100 Batch 100/781 - Loss: 1.058, Seconds: 4.13\n",
2544 | "Epoch 13/100 Batch 120/781 - Loss: 1.144, Seconds: 3.99\n",
2545 | "Epoch 13/100 Batch 140/781 - Loss: 1.065, Seconds: 4.09\n",
2546 | "Epoch 13/100 Batch 160/781 - Loss: 1.246, Seconds: 3.27\n",
2547 | "Epoch 13/100 Batch 180/781 - Loss: 1.194, Seconds: 3.65\n",
2548 | "Epoch 13/100 Batch 200/781 - Loss: 1.164, Seconds: 4.35\n",
2549 | "Epoch 13/100 Batch 220/781 - Loss: 1.142, Seconds: 4.25\n",
2550 | "Epoch 13/100 Batch 240/781 - Loss: 0.991, Seconds: 3.99\n",
2551 | "Average loss for this update: 1.151\n",
2552 | "New Record!\n",
2553 | "Epoch 13/100 Batch 260/781 - Loss: 1.075, Seconds: 4.61\n",
2554 | "Epoch 13/100 Batch 280/781 - Loss: 1.125, Seconds: 4.33\n",
2555 | "Epoch 13/100 Batch 300/781 - Loss: 1.210, Seconds: 4.13\n",
2556 | "Epoch 13/100 Batch 320/781 - Loss: 1.260, Seconds: 4.29\n",
2557 | "Epoch 13/100 Batch 340/781 - Loss: 1.127, Seconds: 4.67\n",
2558 | "Epoch 13/100 Batch 360/781 - Loss: 1.152, Seconds: 4.37\n",
2559 | "Epoch 13/100 Batch 380/781 - Loss: 1.007, Seconds: 4.73\n",
2560 | "Epoch 13/100 Batch 400/781 - Loss: 1.149, Seconds: 4.59\n",
2561 | "Epoch 13/100 Batch 420/781 - Loss: 1.071, Seconds: 3.97\n",
2562 | "Epoch 13/100 Batch 440/781 - Loss: 1.166, Seconds: 4.61\n",
2563 | "Epoch 13/100 Batch 460/781 - Loss: 1.235, Seconds: 4.59\n",
2564 | "Epoch 13/100 Batch 480/781 - Loss: 1.106, Seconds: 4.33\n",
2565 | "Epoch 13/100 Batch 500/781 - Loss: 1.142, Seconds: 4.33\n",
2566 | "Average loss for this update: 1.137\n",
2567 | "New Record!\n",
2568 | "Epoch 13/100 Batch 520/781 - Loss: 1.041, Seconds: 3.83\n",
2569 | "Epoch 13/100 Batch 540/781 - Loss: 1.099, Seconds: 4.39\n",
2570 | "Epoch 13/100 Batch 560/781 - Loss: 1.055, Seconds: 3.87\n",
2571 | "Epoch 13/100 Batch 580/781 - Loss: 1.182, Seconds: 4.33\n",
2572 | "Epoch 13/100 Batch 600/781 - Loss: 1.233, Seconds: 4.35\n",
2573 | "Epoch 13/100 Batch 620/781 - Loss: 1.164, Seconds: 4.01\n",
2574 | "Epoch 13/100 Batch 640/781 - Loss: 1.122, Seconds: 4.29\n",
2575 | "Epoch 13/100 Batch 660/781 - Loss: 0.998, Seconds: 4.57\n",
2576 | "Epoch 13/100 Batch 680/781 - Loss: 1.072, Seconds: 4.31\n",
2577 | "Epoch 13/100 Batch 700/781 - Loss: 1.151, Seconds: 4.43\n",
2578 | "Epoch 13/100 Batch 720/781 - Loss: 1.302, Seconds: 4.71\n",
2579 | "Epoch 13/100 Batch 740/781 - Loss: 1.185, Seconds: 4.77\n",
2580 | "Epoch 13/100 Batch 760/781 - Loss: 1.195, Seconds: 4.29\n",
2581 | "Average loss for this update: 1.139\n",
2582 | "No Improvement.\n",
2583 | "Epoch 13/100 Batch 780/781 - Loss: 1.048, Seconds: 3.97\n",
2584 | "Epoch 14/100 Batch 20/781 - Loss: 1.255, Seconds: 4.41\n",
2585 | "Epoch 14/100 Batch 40/781 - Loss: 1.164, Seconds: 3.59\n",
2586 | "Epoch 14/100 Batch 60/781 - Loss: 1.130, Seconds: 4.61\n",
2587 | "Epoch 14/100 Batch 80/781 - Loss: 1.142, Seconds: 4.33\n",
2588 | "Epoch 14/100 Batch 100/781 - Loss: 1.009, Seconds: 4.31\n",
2589 | "Epoch 14/100 Batch 120/781 - Loss: 1.100, Seconds: 4.03\n",
2590 | "Epoch 14/100 Batch 140/781 - Loss: 1.025, Seconds: 4.05\n",
2591 | "Epoch 14/100 Batch 160/781 - Loss: 1.202, Seconds: 3.21\n",
2592 | "Epoch 14/100 Batch 180/781 - Loss: 1.192, Seconds: 3.63\n",
2593 | "Epoch 14/100 Batch 200/781 - Loss: 1.128, Seconds: 4.41\n",
2594 | "Epoch 14/100 Batch 220/781 - Loss: 1.109, Seconds: 4.23\n",
2595 | "Epoch 14/100 Batch 240/781 - Loss: 0.962, Seconds: 4.15\n",
2596 | "Average loss for this update: 1.113\n",
2597 | "New Record!\n",
2598 | "Epoch 14/100 Batch 260/781 - Loss: 1.041, Seconds: 4.73\n",
2599 | "Epoch 14/100 Batch 280/781 - Loss: 1.082, Seconds: 4.46\n",
2600 | "Epoch 14/100 Batch 300/781 - Loss: 1.156, Seconds: 4.21\n",
2601 | "Epoch 14/100 Batch 320/781 - Loss: 1.214, Seconds: 4.25\n",
2602 | "Epoch 14/100 Batch 340/781 - Loss: 1.083, Seconds: 4.58\n"
2603 | ]
2604 | },
2605 | {
2606 | "name": "stdout",
2607 | "output_type": "stream",
2608 | "text": [
2609 | "Epoch 14/100 Batch 360/781 - Loss: 1.113, Seconds: 4.67\n",
2610 | "Epoch 14/100 Batch 380/781 - Loss: 0.979, Seconds: 4.51\n",
2611 | "Epoch 14/100 Batch 400/781 - Loss: 1.119, Seconds: 4.61\n",
2612 | "Epoch 14/100 Batch 420/781 - Loss: 1.048, Seconds: 3.91\n",
2613 | "Epoch 14/100 Batch 440/781 - Loss: 1.154, Seconds: 4.63\n",
2614 | "Epoch 14/100 Batch 460/781 - Loss: 1.194, Seconds: 4.39\n",
2615 | "Epoch 14/100 Batch 480/781 - Loss: 1.065, Seconds: 4.07\n",
2616 | "Epoch 14/100 Batch 500/781 - Loss: 1.117, Seconds: 4.23\n",
2617 | "Average loss for this update: 1.101\n",
2618 | "New Record!\n",
2619 | "Epoch 14/100 Batch 520/781 - Loss: 1.005, Seconds: 3.85\n",
2620 | "Epoch 14/100 Batch 540/781 - Loss: 1.053, Seconds: 4.37\n",
2621 | "Epoch 14/100 Batch 560/781 - Loss: 1.018, Seconds: 3.95\n",
2622 | "Epoch 14/100 Batch 580/781 - Loss: 1.157, Seconds: 4.49\n",
2623 | "Epoch 14/100 Batch 600/781 - Loss: 1.189, Seconds: 4.21\n",
2624 | "Epoch 14/100 Batch 620/781 - Loss: 1.123, Seconds: 4.21\n",
2625 | "Epoch 14/100 Batch 640/781 - Loss: 1.087, Seconds: 4.49\n",
2626 | "Epoch 14/100 Batch 660/781 - Loss: 0.954, Seconds: 4.53\n",
2627 | "Epoch 14/100 Batch 680/781 - Loss: 1.025, Seconds: 4.35\n",
2628 | "Epoch 14/100 Batch 700/781 - Loss: 1.118, Seconds: 4.49\n",
2629 | "Epoch 14/100 Batch 720/781 - Loss: 1.248, Seconds: 4.71\n",
2630 | "Epoch 14/100 Batch 740/781 - Loss: 1.166, Seconds: 4.79\n",
2631 | "Epoch 14/100 Batch 760/781 - Loss: 1.166, Seconds: 4.27\n",
2632 | "Average loss for this update: 1.102\n",
2633 | "No Improvement.\n",
2634 | "Epoch 14/100 Batch 780/781 - Loss: 1.011, Seconds: 3.85\n",
2635 | "Epoch 15/100 Batch 20/781 - Loss: 1.214, Seconds: 4.19\n",
2636 | "Epoch 15/100 Batch 40/781 - Loss: 1.116, Seconds: 3.43\n",
2637 | "Epoch 15/100 Batch 60/781 - Loss: 1.103, Seconds: 4.61\n",
2638 | "Epoch 15/100 Batch 80/781 - Loss: 1.109, Seconds: 4.43\n",
2639 | "Epoch 15/100 Batch 100/781 - Loss: 0.976, Seconds: 4.29\n",
2640 | "Epoch 15/100 Batch 120/781 - Loss: 1.064, Seconds: 3.89\n",
2641 | "Epoch 15/100 Batch 140/781 - Loss: 1.001, Seconds: 3.99\n",
2642 | "Epoch 15/100 Batch 160/781 - Loss: 1.159, Seconds: 3.27\n",
2643 | "Epoch 15/100 Batch 180/781 - Loss: 1.123, Seconds: 3.79\n",
2644 | "Epoch 15/100 Batch 200/781 - Loss: 1.082, Seconds: 4.39\n",
2645 | "Epoch 15/100 Batch 220/781 - Loss: 1.071, Seconds: 4.23\n",
2646 | "Epoch 15/100 Batch 240/781 - Loss: 0.923, Seconds: 4.15\n",
2647 | "Average loss for this update: 1.074\n",
2648 | "New Record!\n",
2649 | "Epoch 15/100 Batch 260/781 - Loss: 1.008, Seconds: 4.67\n",
2650 | "Epoch 15/100 Batch 280/781 - Loss: 1.035, Seconds: 4.19\n",
2651 | "Epoch 15/100 Batch 300/781 - Loss: 1.130, Seconds: 4.13\n",
2652 | "Epoch 15/100 Batch 320/781 - Loss: 1.169, Seconds: 4.25\n",
2653 | "Epoch 15/100 Batch 340/781 - Loss: 1.041, Seconds: 4.71\n",
2654 | "Epoch 15/100 Batch 360/781 - Loss: 1.092, Seconds: 4.57\n",
2655 | "Epoch 15/100 Batch 380/781 - Loss: 0.944, Seconds: 4.49\n",
2656 | "Epoch 15/100 Batch 400/781 - Loss: 1.076, Seconds: 4.73\n",
2657 | "Epoch 15/100 Batch 420/781 - Loss: 1.013, Seconds: 3.97\n",
2658 | "Epoch 15/100 Batch 440/781 - Loss: 1.091, Seconds: 4.75\n",
2659 | "Epoch 15/100 Batch 460/781 - Loss: 1.146, Seconds: 4.49\n",
2660 | "Epoch 15/100 Batch 480/781 - Loss: 1.036, Seconds: 4.03\n",
2661 | "Epoch 15/100 Batch 500/781 - Loss: 1.069, Seconds: 4.27\n",
2662 | "Average loss for this update: 1.062\n",
2663 | "New Record!\n",
2664 | "Epoch 15/100 Batch 520/781 - Loss: 0.970, Seconds: 4.00\n",
2665 | "Epoch 15/100 Batch 540/781 - Loss: 1.020, Seconds: 4.40\n",
2666 | "Epoch 15/100 Batch 560/781 - Loss: 0.996, Seconds: 3.93\n",
2667 | "Epoch 15/100 Batch 580/781 - Loss: 1.105, Seconds: 4.41\n",
2668 | "Epoch 15/100 Batch 600/781 - Loss: 1.139, Seconds: 4.35\n",
2669 | "Epoch 15/100 Batch 620/781 - Loss: 1.069, Seconds: 4.37\n",
2670 | "Epoch 15/100 Batch 640/781 - Loss: 1.056, Seconds: 4.35\n",
2671 | "Epoch 15/100 Batch 660/781 - Loss: 0.931, Seconds: 4.43\n",
2672 | "Epoch 15/100 Batch 680/781 - Loss: 0.996, Seconds: 4.57\n",
2673 | "Epoch 15/100 Batch 700/781 - Loss: 1.086, Seconds: 4.29\n",
2674 | "Epoch 15/100 Batch 720/781 - Loss: 1.216, Seconds: 4.43\n",
2675 | "Epoch 15/100 Batch 740/781 - Loss: 1.116, Seconds: 4.77\n",
2676 | "Epoch 15/100 Batch 760/781 - Loss: 1.131, Seconds: 4.51\n",
2677 | "Average loss for this update: 1.065\n",
2678 | "No Improvement.\n",
2679 | "Epoch 15/100 Batch 780/781 - Loss: 0.985, Seconds: 3.83\n",
2680 | "Epoch 16/100 Batch 20/781 - Loss: 1.174, Seconds: 4.33\n",
2681 | "Epoch 16/100 Batch 40/781 - Loss: 1.077, Seconds: 3.51\n",
2682 | "Epoch 16/100 Batch 60/781 - Loss: 1.082, Seconds: 4.41\n",
2683 | "Epoch 16/100 Batch 80/781 - Loss: 1.067, Seconds: 4.49\n",
2684 | "Epoch 16/100 Batch 100/781 - Loss: 0.946, Seconds: 4.49\n",
2685 | "Epoch 16/100 Batch 120/781 - Loss: 1.019, Seconds: 3.83\n",
2686 | "Epoch 16/100 Batch 140/781 - Loss: 0.968, Seconds: 4.01\n",
2687 | "Epoch 16/100 Batch 160/781 - Loss: 1.127, Seconds: 3.33\n",
2688 | "Epoch 16/100 Batch 180/781 - Loss: 1.093, Seconds: 3.91\n",
2689 | "Epoch 16/100 Batch 200/781 - Loss: 1.045, Seconds: 4.43\n",
2690 | "Epoch 16/100 Batch 220/781 - Loss: 1.044, Seconds: 4.09\n",
2691 | "Epoch 16/100 Batch 240/781 - Loss: 0.892, Seconds: 4.35\n",
2692 | "Average loss for this update: 1.04\n",
2693 | "New Record!\n",
2694 | "Epoch 16/100 Batch 260/781 - Loss: 0.973, Seconds: 4.65\n",
2695 | "Epoch 16/100 Batch 280/781 - Loss: 1.016, Seconds: 4.39\n",
2696 | "Epoch 16/100 Batch 300/781 - Loss: 1.095, Seconds: 4.27\n",
2697 | "Epoch 16/100 Batch 320/781 - Loss: 1.139, Seconds: 4.05\n",
2698 | "Epoch 16/100 Batch 340/781 - Loss: 1.002, Seconds: 4.85\n",
2699 | "Epoch 16/100 Batch 360/781 - Loss: 1.058, Seconds: 4.49\n",
2700 | "Epoch 16/100 Batch 380/781 - Loss: 0.909, Seconds: 4.65\n",
2701 | "Epoch 16/100 Batch 400/781 - Loss: 1.042, Seconds: 4.63\n",
2702 | "Epoch 16/100 Batch 420/781 - Loss: 0.985, Seconds: 4.01\n",
2703 | "Epoch 16/100 Batch 440/781 - Loss: 1.053, Seconds: 4.59\n",
2704 | "Epoch 16/100 Batch 460/781 - Loss: 1.115, Seconds: 4.41\n",
2705 | "Epoch 16/100 Batch 480/781 - Loss: 1.001, Seconds: 4.27\n",
2706 | "Epoch 16/100 Batch 500/781 - Loss: 1.052, Seconds: 4.31\n",
2707 | "Average loss for this update: 1.031\n",
2708 | "New Record!\n",
2709 | "Epoch 16/100 Batch 520/781 - Loss: 0.940, Seconds: 3.91\n",
2710 | "Epoch 16/100 Batch 540/781 - Loss: 1.000, Seconds: 4.61\n",
2711 | "Epoch 16/100 Batch 560/781 - Loss: 0.961, Seconds: 4.21\n",
2712 | "Epoch 16/100 Batch 580/781 - Loss: 1.082, Seconds: 4.31\n",
2713 | "Epoch 16/100 Batch 600/781 - Loss: 1.110, Seconds: 4.33\n",
2714 | "Epoch 16/100 Batch 620/781 - Loss: 1.035, Seconds: 4.13\n",
2715 | "Epoch 16/100 Batch 640/781 - Loss: 1.026, Seconds: 4.25\n",
2716 | "Epoch 16/100 Batch 660/781 - Loss: 0.902, Seconds: 4.53\n",
2717 | "Epoch 16/100 Batch 680/781 - Loss: 0.970, Seconds: 4.41\n",
2718 | "Epoch 16/100 Batch 700/781 - Loss: 1.062, Seconds: 4.60\n",
2719 | "Epoch 16/100 Batch 720/781 - Loss: 1.192, Seconds: 4.63\n",
2720 | "Epoch 16/100 Batch 740/781 - Loss: 1.094, Seconds: 4.85\n",
2721 | "Epoch 16/100 Batch 760/781 - Loss: 1.111, Seconds: 4.49\n",
2722 | "Average loss for this update: 1.04\n",
2723 | "No Improvement.\n",
2724 | "Epoch 16/100 Batch 780/781 - Loss: 0.965, Seconds: 3.83\n",
2725 | "Epoch 17/100 Batch 20/781 - Loss: 1.147, Seconds: 4.21\n",
2726 | "Epoch 17/100 Batch 40/781 - Loss: 1.055, Seconds: 3.43\n",
2727 | "Epoch 17/100 Batch 60/781 - Loss: 1.053, Seconds: 4.41\n",
2728 | "Epoch 17/100 Batch 80/781 - Loss: 1.040, Seconds: 4.51\n",
2729 | "Epoch 17/100 Batch 100/781 - Loss: 0.918, Seconds: 4.23\n",
2730 | "Epoch 17/100 Batch 120/781 - Loss: 0.988, Seconds: 3.89\n",
2731 | "Epoch 17/100 Batch 140/781 - Loss: 0.944, Seconds: 4.09\n",
2732 | "Epoch 17/100 Batch 160/781 - Loss: 1.078, Seconds: 3.36\n",
2733 | "Epoch 17/100 Batch 180/781 - Loss: 1.053, Seconds: 3.69\n",
2734 | "Epoch 17/100 Batch 200/781 - Loss: 1.022, Seconds: 4.40\n",
2735 | "Epoch 17/100 Batch 220/781 - Loss: 1.007, Seconds: 4.20\n",
2736 | "Epoch 17/100 Batch 240/781 - Loss: 0.854, Seconds: 4.22\n",
2737 | "Average loss for this update: 1.009\n",
2738 | "New Record!\n",
2739 | "Epoch 17/100 Batch 260/781 - Loss: 0.950, Seconds: 4.57\n",
2740 | "Epoch 17/100 Batch 280/781 - Loss: 0.971, Seconds: 4.33\n",
2741 | "Epoch 17/100 Batch 300/781 - Loss: 1.070, Seconds: 4.21\n",
2742 | "Epoch 17/100 Batch 320/781 - Loss: 1.125, Seconds: 4.45\n",
2743 | "Epoch 17/100 Batch 340/781 - Loss: 0.983, Seconds: 4.49\n",
2744 | "Epoch 17/100 Batch 360/781 - Loss: 1.026, Seconds: 4.33\n",
2745 | "Epoch 17/100 Batch 380/781 - Loss: 0.881, Seconds: 4.97\n",
2746 | "Epoch 17/100 Batch 400/781 - Loss: 1.019, Seconds: 4.67\n",
2747 | "Epoch 17/100 Batch 420/781 - Loss: 0.962, Seconds: 4.03\n",
2748 | "Epoch 17/100 Batch 440/781 - Loss: 1.011, Seconds: 4.61\n",
2749 | "Epoch 17/100 Batch 460/781 - Loss: 1.076, Seconds: 4.45\n",
2750 | "Epoch 17/100 Batch 480/781 - Loss: 0.975, Seconds: 4.05\n",
2751 | "Epoch 17/100 Batch 500/781 - Loss: 1.024, Seconds: 4.23\n",
2752 | "Average loss for this update: 1.002\n",
2753 | "New Record!\n",
2754 | "Epoch 17/100 Batch 520/781 - Loss: 0.915, Seconds: 4.03\n",
2755 | "Epoch 17/100 Batch 540/781 - Loss: 0.963, Seconds: 4.41\n",
2756 | "Epoch 17/100 Batch 560/781 - Loss: 0.913, Seconds: 4.09\n",
2757 | "Epoch 17/100 Batch 580/781 - Loss: 1.042, Seconds: 4.29\n",
2758 | "Epoch 17/100 Batch 600/781 - Loss: 1.078, Seconds: 4.29\n",
2759 | "Epoch 17/100 Batch 620/781 - Loss: 1.010, Seconds: 4.01\n"
2760 | ]
2761 | },
2762 | {
2763 | "name": "stdout",
2764 | "output_type": "stream",
2765 | "text": [
2766 | "Epoch 17/100 Batch 640/781 - Loss: 1.001, Seconds: 4.23\n",
2767 | "Epoch 17/100 Batch 660/781 - Loss: 0.882, Seconds: 4.47\n",
2768 | "Epoch 17/100 Batch 680/781 - Loss: 0.947, Seconds: 4.37\n",
2769 | "Epoch 17/100 Batch 700/781 - Loss: 1.028, Seconds: 4.31\n",
2770 | "Epoch 17/100 Batch 720/781 - Loss: 1.147, Seconds: 4.53\n",
2771 | "Epoch 17/100 Batch 740/781 - Loss: 1.047, Seconds: 4.77\n",
2772 | "Epoch 17/100 Batch 760/781 - Loss: 1.062, Seconds: 4.31\n",
2773 | "Average loss for this update: 1.003\n",
2774 | "No Improvement.\n",
2775 | "Epoch 17/100 Batch 780/781 - Loss: 0.922, Seconds: 3.91\n",
2776 | "Epoch 18/100 Batch 20/781 - Loss: 1.115, Seconds: 4.41\n",
2777 | "Epoch 18/100 Batch 40/781 - Loss: 1.022, Seconds: 3.31\n",
2778 | "Epoch 18/100 Batch 60/781 - Loss: 1.016, Seconds: 4.41\n",
2779 | "Epoch 18/100 Batch 80/781 - Loss: 1.003, Seconds: 4.19\n",
2780 | "Epoch 18/100 Batch 100/781 - Loss: 0.893, Seconds: 4.23\n",
2781 | "Epoch 18/100 Batch 120/781 - Loss: 0.970, Seconds: 3.77\n",
2782 | "Epoch 18/100 Batch 140/781 - Loss: 0.917, Seconds: 4.09\n",
2783 | "Epoch 18/100 Batch 160/781 - Loss: 1.062, Seconds: 3.33\n",
2784 | "Epoch 18/100 Batch 180/781 - Loss: 1.011, Seconds: 3.73\n",
2785 | "Epoch 18/100 Batch 200/781 - Loss: 0.982, Seconds: 4.37\n",
2786 | "Epoch 18/100 Batch 220/781 - Loss: 0.982, Seconds: 4.03\n",
2787 | "Epoch 18/100 Batch 240/781 - Loss: 0.839, Seconds: 4.09\n",
2788 | "Average loss for this update: 0.98\n",
2789 | "New Record!\n",
2790 | "Epoch 18/100 Batch 260/781 - Loss: 0.918, Seconds: 4.45\n",
2791 | "Epoch 18/100 Batch 280/781 - Loss: 0.930, Seconds: 4.29\n",
2792 | "Epoch 18/100 Batch 300/781 - Loss: 1.033, Seconds: 4.07\n",
2793 | "Epoch 18/100 Batch 320/781 - Loss: 1.063, Seconds: 4.23\n",
2794 | "Epoch 18/100 Batch 340/781 - Loss: 0.951, Seconds: 4.59\n",
2795 | "Epoch 18/100 Batch 360/781 - Loss: 1.000, Seconds: 4.23\n",
2796 | "Epoch 18/100 Batch 380/781 - Loss: 0.851, Seconds: 4.55\n",
2797 | "Epoch 18/100 Batch 400/781 - Loss: 0.980, Seconds: 4.45\n",
2798 | "Epoch 18/100 Batch 420/781 - Loss: 0.916, Seconds: 3.95\n",
2799 | "Epoch 18/100 Batch 440/781 - Loss: 0.991, Seconds: 4.53\n",
2800 | "Epoch 18/100 Batch 460/781 - Loss: 1.049, Seconds: 4.35\n",
2801 | "Epoch 18/100 Batch 480/781 - Loss: 0.943, Seconds: 3.99\n",
2802 | "Epoch 18/100 Batch 500/781 - Loss: 0.993, Seconds: 4.13\n",
2803 | "Average loss for this update: 0.968\n",
2804 | "New Record!\n",
2805 | "Epoch 18/100 Batch 520/781 - Loss: 0.890, Seconds: 3.87\n",
2806 | "Epoch 18/100 Batch 540/781 - Loss: 0.939, Seconds: 4.31\n",
2807 | "Epoch 18/100 Batch 560/781 - Loss: 0.906, Seconds: 3.95\n",
2808 | "Epoch 18/100 Batch 580/781 - Loss: 1.027, Seconds: 4.31\n",
2809 | "Epoch 18/100 Batch 600/781 - Loss: 1.034, Seconds: 4.33\n",
2810 | "Epoch 18/100 Batch 620/781 - Loss: 0.987, Seconds: 4.03\n",
2811 | "Epoch 18/100 Batch 640/781 - Loss: 0.987, Seconds: 4.27\n",
2812 | "Epoch 18/100 Batch 660/781 - Loss: 0.855, Seconds: 4.53\n",
2813 | "Epoch 18/100 Batch 680/781 - Loss: 0.926, Seconds: 4.25\n",
2814 | "Epoch 18/100 Batch 700/781 - Loss: 1.014, Seconds: 4.33\n",
2815 | "Epoch 18/100 Batch 720/781 - Loss: 1.130, Seconds: 4.47\n",
2816 | "Epoch 18/100 Batch 740/781 - Loss: 1.033, Seconds: 4.69\n",
2817 | "Epoch 18/100 Batch 760/781 - Loss: 1.041, Seconds: 4.29\n",
2818 | "Average loss for this update: 0.984\n",
2819 | "No Improvement.\n",
2820 | "Epoch 18/100 Batch 780/781 - Loss: 0.910, Seconds: 3.87\n",
2821 | "Epoch 19/100 Batch 20/781 - Loss: 1.078, Seconds: 4.23\n",
2822 | "Epoch 19/100 Batch 40/781 - Loss: 1.007, Seconds: 3.37\n",
2823 | "Epoch 19/100 Batch 60/781 - Loss: 0.987, Seconds: 4.41\n",
2824 | "Epoch 19/100 Batch 80/781 - Loss: 0.983, Seconds: 4.21\n",
2825 | "Epoch 19/100 Batch 100/781 - Loss: 0.880, Seconds: 4.15\n",
2826 | "Epoch 19/100 Batch 120/781 - Loss: 0.949, Seconds: 3.79\n",
2827 | "Epoch 19/100 Batch 140/781 - Loss: 0.886, Seconds: 3.99\n",
2828 | "Epoch 19/100 Batch 160/781 - Loss: 1.047, Seconds: 3.33\n",
2829 | "Epoch 19/100 Batch 180/781 - Loss: 1.013, Seconds: 3.65\n",
2830 | "Epoch 19/100 Batch 200/781 - Loss: 0.972, Seconds: 4.27\n",
2831 | "Epoch 19/100 Batch 220/781 - Loss: 0.967, Seconds: 4.31\n",
2832 | "Epoch 19/100 Batch 240/781 - Loss: 0.824, Seconds: 4.12\n",
2833 | "Average loss for this update: 0.962\n",
2834 | "New Record!\n",
2835 | "Epoch 19/100 Batch 260/781 - Loss: 0.912, Seconds: 4.67\n",
2836 | "Epoch 19/100 Batch 280/781 - Loss: 0.927, Seconds: 4.35\n",
2837 | "Epoch 19/100 Batch 300/781 - Loss: 1.008, Seconds: 4.09\n",
2838 | "Epoch 19/100 Batch 320/781 - Loss: 1.041, Seconds: 4.11\n",
2839 | "Epoch 19/100 Batch 340/781 - Loss: 0.929, Seconds: 4.63\n",
2840 | "Epoch 19/100 Batch 360/781 - Loss: 1.007, Seconds: 4.51\n",
2841 | "Epoch 19/100 Batch 380/781 - Loss: 0.860, Seconds: 4.61\n",
2842 | "Epoch 19/100 Batch 400/781 - Loss: 0.965, Seconds: 4.67\n",
2843 | "Epoch 19/100 Batch 420/781 - Loss: 0.903, Seconds: 3.85\n",
2844 | "Epoch 19/100 Batch 440/781 - Loss: 0.976, Seconds: 4.73\n",
2845 | "Epoch 19/100 Batch 460/781 - Loss: 1.033, Seconds: 4.45\n",
2846 | "Epoch 19/100 Batch 480/781 - Loss: 0.912, Seconds: 4.11\n",
2847 | "Epoch 19/100 Batch 500/781 - Loss: 0.976, Seconds: 4.31\n",
2848 | "Average loss for this update: 0.954\n",
2849 | "New Record!\n",
2850 | "Epoch 19/100 Batch 520/781 - Loss: 0.868, Seconds: 4.03\n",
2851 | "Epoch 19/100 Batch 540/781 - Loss: 0.922, Seconds: 4.59\n",
2852 | "Epoch 19/100 Batch 560/781 - Loss: 0.893, Seconds: 4.07\n",
2853 | "Epoch 19/100 Batch 580/781 - Loss: 1.004, Seconds: 4.57\n",
2854 | "Epoch 19/100 Batch 600/781 - Loss: 1.024, Seconds: 4.43\n",
2855 | "Epoch 19/100 Batch 620/781 - Loss: 0.947, Seconds: 4.23\n",
2856 | "Epoch 19/100 Batch 640/781 - Loss: 0.949, Seconds: 4.55\n",
2857 | "Epoch 19/100 Batch 660/781 - Loss: 0.840, Seconds: 4.59\n",
2858 | "Epoch 19/100 Batch 680/781 - Loss: 0.917, Seconds: 4.31\n",
2859 | "Epoch 19/100 Batch 700/781 - Loss: 0.994, Seconds: 4.47\n",
2860 | "Epoch 19/100 Batch 720/781 - Loss: 1.095, Seconds: 4.61\n",
2861 | "Epoch 19/100 Batch 740/781 - Loss: 0.991, Seconds: 4.73\n",
2862 | "Epoch 19/100 Batch 760/781 - Loss: 1.018, Seconds: 4.45\n",
2863 | "Average loss for this update: 0.96\n",
2864 | "No Improvement.\n",
2865 | "Epoch 19/100 Batch 780/781 - Loss: 0.882, Seconds: 4.09\n",
2866 | "Epoch 20/100 Batch 20/781 - Loss: 1.101, Seconds: 4.27\n",
2867 | "Epoch 20/100 Batch 40/781 - Loss: 0.987, Seconds: 3.55\n",
2868 | "Epoch 20/100 Batch 60/781 - Loss: 0.966, Seconds: 4.51\n",
2869 | "Epoch 20/100 Batch 80/781 - Loss: 0.978, Seconds: 4.27\n",
2870 | "Epoch 20/100 Batch 100/781 - Loss: 0.863, Seconds: 4.31\n",
2871 | "Epoch 20/100 Batch 120/781 - Loss: 0.927, Seconds: 3.91\n",
2872 | "Epoch 20/100 Batch 140/781 - Loss: 0.872, Seconds: 4.03\n",
2873 | "Epoch 20/100 Batch 160/781 - Loss: 1.023, Seconds: 3.25\n",
2874 | "Epoch 20/100 Batch 180/781 - Loss: 0.974, Seconds: 3.89\n",
2875 | "Epoch 20/100 Batch 200/781 - Loss: 0.935, Seconds: 4.39\n",
2876 | "Epoch 20/100 Batch 220/781 - Loss: 0.957, Seconds: 4.11\n",
2877 | "Epoch 20/100 Batch 240/781 - Loss: 0.805, Seconds: 4.15\n",
2878 | "Average loss for this update: 0.946\n",
2879 | "New Record!\n",
2880 | "Epoch 20/100 Batch 260/781 - Loss: 0.898, Seconds: 4.69\n",
2881 | "Epoch 20/100 Batch 280/781 - Loss: 0.909, Seconds: 4.50\n",
2882 | "Epoch 20/100 Batch 300/781 - Loss: 0.989, Seconds: 4.23\n",
2883 | "Epoch 20/100 Batch 320/781 - Loss: 1.021, Seconds: 4.23\n",
2884 | "Epoch 20/100 Batch 340/781 - Loss: 0.911, Seconds: 4.71\n",
2885 | "Epoch 20/100 Batch 360/781 - Loss: 0.960, Seconds: 4.53\n",
2886 | "Epoch 20/100 Batch 380/781 - Loss: 0.824, Seconds: 4.61\n",
2887 | "Epoch 20/100 Batch 400/781 - Loss: 0.934, Seconds: 4.73\n",
2888 | "Epoch 20/100 Batch 420/781 - Loss: 0.874, Seconds: 4.03\n",
2889 | "Epoch 20/100 Batch 440/781 - Loss: 0.921, Seconds: 4.77\n",
2890 | "Epoch 20/100 Batch 460/781 - Loss: 0.990, Seconds: 4.51\n",
2891 | "Epoch 20/100 Batch 480/781 - Loss: 0.893, Seconds: 4.03\n",
2892 | "Epoch 20/100 Batch 500/781 - Loss: 0.938, Seconds: 4.17\n",
2893 | "Average loss for this update: 0.923\n",
2894 | "New Record!\n",
2895 | "Epoch 20/100 Batch 520/781 - Loss: 0.842, Seconds: 4.01\n",
2896 | "Epoch 20/100 Batch 540/781 - Loss: 0.898, Seconds: 4.63\n",
2897 | "Epoch 20/100 Batch 560/781 - Loss: 0.860, Seconds: 4.07\n",
2898 | "Epoch 20/100 Batch 580/781 - Loss: 0.974, Seconds: 4.37\n",
2899 | "Epoch 20/100 Batch 600/781 - Loss: 0.980, Seconds: 4.47\n",
2900 | "Epoch 20/100 Batch 620/781 - Loss: 0.929, Seconds: 4.17\n",
2901 | "Epoch 20/100 Batch 640/781 - Loss: 0.922, Seconds: 4.29\n",
2902 | "Epoch 20/100 Batch 660/781 - Loss: 0.827, Seconds: 4.47\n",
2903 | "Epoch 20/100 Batch 680/781 - Loss: 0.882, Seconds: 4.27\n",
2904 | "Epoch 20/100 Batch 700/781 - Loss: 0.960, Seconds: 4.31\n",
2905 | "Epoch 20/100 Batch 720/781 - Loss: 1.073, Seconds: 4.61\n",
2906 | "Epoch 20/100 Batch 740/781 - Loss: 0.974, Seconds: 4.65\n",
2907 | "Epoch 20/100 Batch 760/781 - Loss: 0.989, Seconds: 4.23\n",
2908 | "Average loss for this update: 0.933\n",
2909 | "No Improvement.\n",
2910 | "Epoch 20/100 Batch 780/781 - Loss: 0.858, Seconds: 3.91\n",
2911 | "Epoch 21/100 Batch 20/781 - Loss: 1.041, Seconds: 4.27\n",
2912 | "Epoch 21/100 Batch 40/781 - Loss: 0.954, Seconds: 3.39\n",
2913 | "Epoch 21/100 Batch 60/781 - Loss: 0.946, Seconds: 4.37\n",
2914 | "Epoch 21/100 Batch 80/781 - Loss: 0.942, Seconds: 4.33\n",
2915 | "Epoch 21/100 Batch 100/781 - Loss: 0.827, Seconds: 4.53\n",
2916 | "Epoch 21/100 Batch 120/781 - Loss: 0.897, Seconds: 3.81\n"
2917 | ]
2918 | },
2919 | {
2920 | "name": "stdout",
2921 | "output_type": "stream",
2922 | "text": [
2923 | "Epoch 21/100 Batch 140/781 - Loss: 0.851, Seconds: 4.17\n",
2924 | "Epoch 21/100 Batch 160/781 - Loss: 0.983, Seconds: 3.27\n",
2925 | "Epoch 21/100 Batch 180/781 - Loss: 0.949, Seconds: 3.59\n",
2926 | "Epoch 21/100 Batch 200/781 - Loss: 0.904, Seconds: 4.35\n",
2927 | "Epoch 21/100 Batch 220/781 - Loss: 0.918, Seconds: 4.11\n",
2928 | "Epoch 21/100 Batch 240/781 - Loss: 0.773, Seconds: 4.01\n",
2929 | "Average loss for this update: 0.913\n",
2930 | "New Record!\n",
2931 | "Epoch 21/100 Batch 260/781 - Loss: 0.874, Seconds: 4.55\n",
2932 | "Epoch 21/100 Batch 280/781 - Loss: 0.875, Seconds: 4.27\n",
2933 | "Epoch 21/100 Batch 300/781 - Loss: 0.943, Seconds: 4.15\n",
2934 | "Epoch 21/100 Batch 320/781 - Loss: 0.996, Seconds: 4.25\n",
2935 | "Epoch 21/100 Batch 340/781 - Loss: 0.879, Seconds: 4.65\n",
2936 | "Epoch 21/100 Batch 360/781 - Loss: 0.925, Seconds: 4.41\n",
2937 | "Epoch 21/100 Batch 380/781 - Loss: 0.787, Seconds: 4.57\n",
2938 | "Epoch 21/100 Batch 400/781 - Loss: 0.915, Seconds: 4.57\n",
2939 | "Epoch 21/100 Batch 420/781 - Loss: 0.850, Seconds: 4.07\n",
2940 | "Epoch 21/100 Batch 440/781 - Loss: 0.908, Seconds: 4.73\n",
2941 | "Epoch 21/100 Batch 460/781 - Loss: 0.966, Seconds: 4.61\n",
2942 | "Epoch 21/100 Batch 480/781 - Loss: 0.866, Seconds: 3.97\n",
2943 | "Epoch 21/100 Batch 500/781 - Loss: 0.915, Seconds: 4.51\n",
2944 | "Average loss for this update: 0.894\n",
2945 | "New Record!\n",
2946 | "Epoch 21/100 Batch 520/781 - Loss: 0.812, Seconds: 3.93\n",
2947 | "Epoch 21/100 Batch 540/781 - Loss: 0.880, Seconds: 4.59\n",
2948 | "Epoch 21/100 Batch 560/781 - Loss: 0.843, Seconds: 4.11\n",
2949 | "Epoch 21/100 Batch 580/781 - Loss: 0.945, Seconds: 4.31\n",
2950 | "Epoch 21/100 Batch 600/781 - Loss: 0.969, Seconds: 4.56\n",
2951 | "Epoch 21/100 Batch 620/781 - Loss: 0.906, Seconds: 4.19\n",
2952 | "Epoch 21/100 Batch 640/781 - Loss: 0.896, Seconds: 4.25\n",
2953 | "Epoch 21/100 Batch 660/781 - Loss: 0.793, Seconds: 4.51\n",
2954 | "Epoch 21/100 Batch 680/781 - Loss: 0.849, Seconds: 4.21\n",
2955 | "Epoch 21/100 Batch 700/781 - Loss: 0.944, Seconds: 4.25\n",
2956 | "Epoch 21/100 Batch 720/781 - Loss: 1.048, Seconds: 4.51\n",
2957 | "Epoch 21/100 Batch 740/781 - Loss: 0.944, Seconds: 4.71\n",
2958 | "Epoch 21/100 Batch 760/781 - Loss: 0.972, Seconds: 4.45\n",
2959 | "Average loss for this update: 0.911\n",
2960 | "No Improvement.\n",
2961 | "Epoch 21/100 Batch 780/781 - Loss: 0.850, Seconds: 4.11\n",
2962 | "Epoch 22/100 Batch 20/781 - Loss: 1.015, Seconds: 4.25\n",
2963 | "Epoch 22/100 Batch 40/781 - Loss: 0.936, Seconds: 3.51\n",
2964 | "Epoch 22/100 Batch 60/781 - Loss: 0.925, Seconds: 4.27\n",
2965 | "Epoch 22/100 Batch 80/781 - Loss: 0.913, Seconds: 4.15\n",
2966 | "Epoch 22/100 Batch 100/781 - Loss: 0.818, Seconds: 4.25\n",
2967 | "Epoch 22/100 Batch 120/781 - Loss: 0.871, Seconds: 3.81\n",
2968 | "Epoch 22/100 Batch 140/781 - Loss: 0.843, Seconds: 4.05\n",
2969 | "Epoch 22/100 Batch 160/781 - Loss: 0.968, Seconds: 3.33\n",
2970 | "Epoch 22/100 Batch 180/781 - Loss: 0.926, Seconds: 3.69\n",
2971 | "Epoch 22/100 Batch 200/781 - Loss: 0.896, Seconds: 4.29\n",
2972 | "Epoch 22/100 Batch 220/781 - Loss: 0.891, Seconds: 4.09\n",
2973 | "Epoch 22/100 Batch 240/781 - Loss: 0.756, Seconds: 4.11\n",
2974 | "Average loss for this update: 0.893\n",
2975 | "New Record!\n",
2976 | "Epoch 22/100 Batch 260/781 - Loss: 0.849, Seconds: 4.61\n",
2977 | "Epoch 22/100 Batch 280/781 - Loss: 0.846, Seconds: 4.27\n",
2978 | "Epoch 22/100 Batch 300/781 - Loss: 0.935, Seconds: 4.09\n",
2979 | "Epoch 22/100 Batch 320/781 - Loss: 0.980, Seconds: 4.25\n",
2980 | "Epoch 22/100 Batch 340/781 - Loss: 0.857, Seconds: 4.55\n",
2981 | "Epoch 22/100 Batch 360/781 - Loss: 0.914, Seconds: 4.67\n",
2982 | "Epoch 22/100 Batch 380/781 - Loss: 0.786, Seconds: 4.63\n",
2983 | "Epoch 22/100 Batch 400/781 - Loss: 0.895, Seconds: 4.61\n",
2984 | "Epoch 22/100 Batch 420/781 - Loss: 0.837, Seconds: 3.91\n",
2985 | "Epoch 22/100 Batch 440/781 - Loss: 0.901, Seconds: 4.63\n",
2986 | "Epoch 22/100 Batch 460/781 - Loss: 0.944, Seconds: 4.39\n",
2987 | "Epoch 22/100 Batch 480/781 - Loss: 0.842, Seconds: 3.97\n",
2988 | "Epoch 22/100 Batch 500/781 - Loss: 0.908, Seconds: 4.15\n",
2989 | "Average loss for this update: 0.88\n",
2990 | "New Record!\n",
2991 | "Epoch 22/100 Batch 520/781 - Loss: 0.796, Seconds: 3.85\n",
2992 | "Epoch 22/100 Batch 540/781 - Loss: 0.856, Seconds: 4.37\n",
2993 | "Epoch 22/100 Batch 560/781 - Loss: 0.817, Seconds: 4.01\n",
2994 | "Epoch 22/100 Batch 580/781 - Loss: 0.931, Seconds: 4.31\n",
2995 | "Epoch 22/100 Batch 600/781 - Loss: 0.962, Seconds: 4.27\n",
2996 | "Epoch 22/100 Batch 620/781 - Loss: 0.903, Seconds: 4.03\n",
2997 | "Epoch 22/100 Batch 640/781 - Loss: 0.872, Seconds: 4.13\n",
2998 | "Epoch 22/100 Batch 660/781 - Loss: 0.788, Seconds: 4.43\n",
2999 | "Epoch 22/100 Batch 680/781 - Loss: 0.839, Seconds: 4.37\n",
3000 | "Epoch 22/100 Batch 700/781 - Loss: 0.928, Seconds: 4.31\n",
3001 | "Epoch 22/100 Batch 720/781 - Loss: 1.030, Seconds: 4.49\n",
3002 | "Epoch 22/100 Batch 740/781 - Loss: 0.934, Seconds: 4.85\n",
3003 | "Epoch 22/100 Batch 760/781 - Loss: 0.955, Seconds: 4.33\n",
3004 | "Average loss for this update: 0.895\n",
3005 | "No Improvement.\n",
3006 | "Epoch 22/100 Batch 780/781 - Loss: 0.830, Seconds: 3.89\n",
3007 | "Epoch 23/100 Batch 20/781 - Loss: 0.994, Seconds: 4.33\n",
3008 | "Epoch 23/100 Batch 40/781 - Loss: 0.901, Seconds: 3.47\n",
3009 | "Epoch 23/100 Batch 60/781 - Loss: 0.897, Seconds: 4.35\n",
3010 | "Epoch 23/100 Batch 80/781 - Loss: 0.893, Seconds: 4.27\n",
3011 | "Epoch 23/100 Batch 100/781 - Loss: 0.796, Seconds: 4.31\n",
3012 | "Epoch 23/100 Batch 120/781 - Loss: 0.861, Seconds: 3.71\n",
3013 | "Epoch 23/100 Batch 140/781 - Loss: 0.816, Seconds: 4.05\n",
3014 | "Epoch 23/100 Batch 160/781 - Loss: 0.952, Seconds: 3.41\n",
3015 | "Epoch 23/100 Batch 180/781 - Loss: 0.905, Seconds: 3.65\n",
3016 | "Epoch 23/100 Batch 200/781 - Loss: 0.885, Seconds: 4.23\n",
3017 | "Epoch 23/100 Batch 220/781 - Loss: 0.880, Seconds: 4.15\n",
3018 | "Epoch 23/100 Batch 240/781 - Loss: 0.738, Seconds: 4.13\n",
3019 | "Average loss for this update: 0.874\n",
3020 | "New Record!\n",
3021 | "Epoch 23/100 Batch 260/781 - Loss: 0.833, Seconds: 4.49\n",
3022 | "Epoch 23/100 Batch 280/781 - Loss: 0.838, Seconds: 4.29\n",
3023 | "Epoch 23/100 Batch 300/781 - Loss: 0.915, Seconds: 4.25\n",
3024 | "Epoch 23/100 Batch 320/781 - Loss: 0.953, Seconds: 4.13\n",
3025 | "Epoch 23/100 Batch 340/781 - Loss: 0.847, Seconds: 4.49\n",
3026 | "Epoch 23/100 Batch 360/781 - Loss: 0.889, Seconds: 4.33\n",
3027 | "Epoch 23/100 Batch 380/781 - Loss: 0.775, Seconds: 4.55\n",
3028 | "Epoch 23/100 Batch 400/781 - Loss: 0.904, Seconds: 4.59\n",
3029 | "Epoch 23/100 Batch 420/781 - Loss: 0.824, Seconds: 3.95\n",
3030 | "Epoch 23/100 Batch 440/781 - Loss: 0.883, Seconds: 4.57\n",
3031 | "Epoch 23/100 Batch 460/781 - Loss: 0.936, Seconds: 4.47\n",
3032 | "Epoch 23/100 Batch 480/781 - Loss: 0.862, Seconds: 3.85\n",
3033 | "Epoch 23/100 Batch 500/781 - Loss: 0.913, Seconds: 4.29\n",
3034 | "Average loss for this update: 0.871\n",
3035 | "New Record!\n",
3036 | "Epoch 23/100 Batch 520/781 - Loss: 0.793, Seconds: 3.91\n",
3037 | "Epoch 23/100 Batch 540/781 - Loss: 0.833, Seconds: 4.41\n",
3038 | "Epoch 23/100 Batch 560/781 - Loss: 0.812, Seconds: 3.97\n",
3039 | "Epoch 23/100 Batch 580/781 - Loss: 0.905, Seconds: 4.25\n",
3040 | "Epoch 23/100 Batch 600/781 - Loss: 0.934, Seconds: 4.19\n",
3041 | "Epoch 23/100 Batch 620/781 - Loss: 0.874, Seconds: 4.07\n",
3042 | "Epoch 23/100 Batch 640/781 - Loss: 0.867, Seconds: 4.37\n",
3043 | "Epoch 23/100 Batch 660/781 - Loss: 0.762, Seconds: 4.41\n",
3044 | "Epoch 23/100 Batch 680/781 - Loss: 0.821, Seconds: 4.25\n",
3045 | "Epoch 23/100 Batch 700/781 - Loss: 0.917, Seconds: 4.37\n",
3046 | "Epoch 23/100 Batch 720/781 - Loss: 1.014, Seconds: 4.47\n",
3047 | "Epoch 23/100 Batch 740/781 - Loss: 0.926, Seconds: 4.61\n",
3048 | "Epoch 23/100 Batch 760/781 - Loss: 0.963, Seconds: 4.35\n",
3049 | "Average loss for this update: 0.881\n",
3050 | "No Improvement.\n",
3051 | "Epoch 23/100 Batch 780/781 - Loss: 0.827, Seconds: 3.87\n",
3052 | "Epoch 24/100 Batch 20/781 - Loss: 0.965, Seconds: 4.19\n",
3053 | "Epoch 24/100 Batch 40/781 - Loss: 0.886, Seconds: 3.41\n",
3054 | "Epoch 24/100 Batch 60/781 - Loss: 0.874, Seconds: 4.35\n",
3055 | "Epoch 24/100 Batch 80/781 - Loss: 0.885, Seconds: 4.19\n",
3056 | "Epoch 24/100 Batch 100/781 - Loss: 0.778, Seconds: 4.18\n",
3057 | "Epoch 24/100 Batch 120/781 - Loss: 0.851, Seconds: 3.73\n",
3058 | "Epoch 24/100 Batch 140/781 - Loss: 0.797, Seconds: 4.05\n",
3059 | "Epoch 24/100 Batch 160/781 - Loss: 0.932, Seconds: 3.29\n",
3060 | "Epoch 24/100 Batch 180/781 - Loss: 0.893, Seconds: 3.63\n",
3061 | "Epoch 24/100 Batch 200/781 - Loss: 0.867, Seconds: 4.21\n",
3062 | "Epoch 24/100 Batch 220/781 - Loss: 0.865, Seconds: 4.05\n",
3063 | "Epoch 24/100 Batch 240/781 - Loss: 0.719, Seconds: 4.03\n",
3064 | "Average loss for this update: 0.856\n",
3065 | "New Record!\n",
3066 | "Epoch 24/100 Batch 260/781 - Loss: 0.811, Seconds: 4.65\n",
3067 | "Epoch 24/100 Batch 280/781 - Loss: 0.830, Seconds: 4.31\n",
3068 | "Epoch 24/100 Batch 300/781 - Loss: 0.902, Seconds: 4.23\n",
3069 | "Epoch 24/100 Batch 320/781 - Loss: 0.935, Seconds: 4.23\n",
3070 | "Epoch 24/100 Batch 340/781 - Loss: 0.819, Seconds: 4.49\n",
3071 | "Epoch 24/100 Batch 360/781 - Loss: 0.871, Seconds: 4.31\n",
3072 | "Epoch 24/100 Batch 380/781 - Loss: 0.738, Seconds: 4.67\n",
3073 | "Epoch 24/100 Batch 400/781 - Loss: 0.857, Seconds: 4.53\n"
3074 | ]
3075 | },
3076 | {
3077 | "name": "stdout",
3078 | "output_type": "stream",
3079 | "text": [
3080 | "Epoch 24/100 Batch 420/781 - Loss: 0.796, Seconds: 3.85\n",
3081 | "Epoch 24/100 Batch 440/781 - Loss: 0.865, Seconds: 4.51\n",
3082 | "Epoch 24/100 Batch 460/781 - Loss: 0.914, Seconds: 4.45\n",
3083 | "Epoch 24/100 Batch 480/781 - Loss: 0.823, Seconds: 3.89\n",
3084 | "Epoch 24/100 Batch 500/781 - Loss: 0.879, Seconds: 4.27\n",
3085 | "Average loss for this update: 0.846\n",
3086 | "New Record!\n",
3087 | "Epoch 24/100 Batch 520/781 - Loss: 0.766, Seconds: 3.89\n",
3088 | "Epoch 24/100 Batch 540/781 - Loss: 0.818, Seconds: 4.47\n",
3089 | "Epoch 24/100 Batch 560/781 - Loss: 0.784, Seconds: 4.13\n",
3090 | "Epoch 24/100 Batch 580/781 - Loss: 0.897, Seconds: 4.19\n",
3091 | "Epoch 24/100 Batch 600/781 - Loss: 0.902, Seconds: 4.43\n",
3092 | "Epoch 24/100 Batch 620/781 - Loss: 0.859, Seconds: 4.15\n",
3093 | "Epoch 24/100 Batch 640/781 - Loss: 0.847, Seconds: 4.17\n",
3094 | "Epoch 24/100 Batch 660/781 - Loss: 0.744, Seconds: 4.47\n",
3095 | "Epoch 24/100 Batch 680/781 - Loss: 0.788, Seconds: 4.31\n",
3096 | "Epoch 24/100 Batch 700/781 - Loss: 0.886, Seconds: 4.31\n",
3097 | "Epoch 24/100 Batch 720/781 - Loss: 0.987, Seconds: 4.43\n",
3098 | "Epoch 24/100 Batch 740/781 - Loss: 0.898, Seconds: 4.79\n",
3099 | "Epoch 24/100 Batch 760/781 - Loss: 0.924, Seconds: 4.33\n",
3100 | "Average loss for this update: 0.857\n",
3101 | "No Improvement.\n",
3102 | "Epoch 24/100 Batch 780/781 - Loss: 0.808, Seconds: 3.99\n",
3103 | "Epoch 25/100 Batch 20/781 - Loss: 0.941, Seconds: 4.21\n",
3104 | "Epoch 25/100 Batch 40/781 - Loss: 0.873, Seconds: 3.41\n",
3105 | "Epoch 25/100 Batch 60/781 - Loss: 0.868, Seconds: 4.49\n",
3106 | "Epoch 25/100 Batch 80/781 - Loss: 0.858, Seconds: 4.25\n",
3107 | "Epoch 25/100 Batch 100/781 - Loss: 0.761, Seconds: 4.25\n",
3108 | "Epoch 25/100 Batch 120/781 - Loss: 0.809, Seconds: 3.85\n",
3109 | "Epoch 25/100 Batch 140/781 - Loss: 0.784, Seconds: 3.99\n",
3110 | "Epoch 25/100 Batch 160/781 - Loss: 0.925, Seconds: 3.25\n",
3111 | "Epoch 25/100 Batch 180/781 - Loss: 0.886, Seconds: 3.77\n",
3112 | "Epoch 25/100 Batch 200/781 - Loss: 0.860, Seconds: 4.17\n",
3113 | "Epoch 25/100 Batch 220/781 - Loss: 0.846, Seconds: 4.03\n",
3114 | "Epoch 25/100 Batch 240/781 - Loss: 0.709, Seconds: 4.19\n",
3115 | "Average loss for this update: 0.84\n",
3116 | "New Record!\n",
3117 | "Epoch 25/100 Batch 260/781 - Loss: 0.790, Seconds: 4.61\n",
3118 | "Epoch 25/100 Batch 280/781 - Loss: 0.822, Seconds: 4.27\n",
3119 | "Epoch 25/100 Batch 300/781 - Loss: 0.880, Seconds: 4.09\n",
3120 | "Epoch 25/100 Batch 320/781 - Loss: 0.922, Seconds: 4.09\n",
3121 | "Epoch 25/100 Batch 340/781 - Loss: 0.816, Seconds: 4.59\n",
3122 | "Epoch 25/100 Batch 360/781 - Loss: 0.845, Seconds: 4.31\n",
3123 | "Epoch 25/100 Batch 380/781 - Loss: 0.732, Seconds: 4.51\n",
3124 | "Epoch 25/100 Batch 400/781 - Loss: 0.851, Seconds: 4.47\n",
3125 | "Epoch 25/100 Batch 420/781 - Loss: 0.787, Seconds: 3.91\n",
3126 | "Epoch 25/100 Batch 440/781 - Loss: 0.848, Seconds: 4.51\n",
3127 | "Epoch 25/100 Batch 460/781 - Loss: 0.897, Seconds: 4.39\n",
3128 | "Epoch 25/100 Batch 480/781 - Loss: 0.808, Seconds: 3.97\n",
3129 | "Epoch 25/100 Batch 500/781 - Loss: 0.867, Seconds: 4.23\n",
3130 | "Average loss for this update: 0.832\n",
3131 | "New Record!\n",
3132 | "Epoch 25/100 Batch 520/781 - Loss: 0.746, Seconds: 3.95\n",
3133 | "Epoch 25/100 Batch 540/781 - Loss: 0.806, Seconds: 4.33\n",
3134 | "Epoch 25/100 Batch 560/781 - Loss: 0.777, Seconds: 4.09\n",
3135 | "Epoch 25/100 Batch 580/781 - Loss: 0.870, Seconds: 4.15\n",
3136 | "Epoch 25/100 Batch 600/781 - Loss: 0.891, Seconds: 4.23\n",
3137 | "Epoch 25/100 Batch 620/781 - Loss: 0.844, Seconds: 4.15\n",
3138 | "Epoch 25/100 Batch 640/781 - Loss: 0.840, Seconds: 4.23\n",
3139 | "Epoch 25/100 Batch 660/781 - Loss: 0.743, Seconds: 4.45\n",
3140 | "Epoch 25/100 Batch 680/781 - Loss: 0.783, Seconds: 4.17\n",
3141 | "Epoch 25/100 Batch 700/781 - Loss: 0.871, Seconds: 4.39\n",
3142 | "Epoch 25/100 Batch 720/781 - Loss: 0.966, Seconds: 4.43\n",
3143 | "Epoch 25/100 Batch 740/781 - Loss: 0.879, Seconds: 4.67\n",
3144 | "Epoch 25/100 Batch 760/781 - Loss: 0.910, Seconds: 4.31\n",
3145 | "Average loss for this update: 0.843\n",
3146 | "No Improvement.\n",
3147 | "Epoch 25/100 Batch 780/781 - Loss: 0.786, Seconds: 3.91\n",
3148 | "Epoch 26/100 Batch 20/781 - Loss: 0.923, Seconds: 4.27\n",
3149 | "Epoch 26/100 Batch 40/781 - Loss: 0.857, Seconds: 3.35\n",
3150 | "Epoch 26/100 Batch 60/781 - Loss: 0.850, Seconds: 4.35\n",
3151 | "Epoch 26/100 Batch 80/781 - Loss: 0.843, Seconds: 4.37\n",
3152 | "Epoch 26/100 Batch 100/781 - Loss: 0.743, Seconds: 4.27\n",
3153 | "Epoch 26/100 Batch 120/781 - Loss: 0.816, Seconds: 3.77\n",
3154 | "Epoch 26/100 Batch 140/781 - Loss: 0.775, Seconds: 3.97\n",
3155 | "Epoch 26/100 Batch 160/781 - Loss: 0.898, Seconds: 3.25\n",
3156 | "Epoch 26/100 Batch 180/781 - Loss: 0.877, Seconds: 3.71\n",
3157 | "Epoch 26/100 Batch 200/781 - Loss: 0.820, Seconds: 4.27\n",
3158 | "Epoch 26/100 Batch 220/781 - Loss: 0.836, Seconds: 3.97\n",
3159 | "Epoch 26/100 Batch 240/781 - Loss: 0.703, Seconds: 4.03\n",
3160 | "Average loss for this update: 0.825\n",
3161 | "New Record!\n",
3162 | "Epoch 26/100 Batch 260/781 - Loss: 0.785, Seconds: 4.65\n",
3163 | "Epoch 26/100 Batch 280/781 - Loss: 0.797, Seconds: 4.33\n",
3164 | "Epoch 26/100 Batch 300/781 - Loss: 0.868, Seconds: 4.09\n",
3165 | "Epoch 26/100 Batch 320/781 - Loss: 0.898, Seconds: 4.13\n",
3166 | "Epoch 26/100 Batch 340/781 - Loss: 0.794, Seconds: 4.71\n",
3167 | "Epoch 26/100 Batch 360/781 - Loss: 0.831, Seconds: 4.35\n",
3168 | "Epoch 26/100 Batch 380/781 - Loss: 0.710, Seconds: 4.59\n",
3169 | "Epoch 26/100 Batch 400/781 - Loss: 0.826, Seconds: 4.51\n",
3170 | "Epoch 26/100 Batch 420/781 - Loss: 0.773, Seconds: 3.85\n",
3171 | "Epoch 26/100 Batch 440/781 - Loss: 0.824, Seconds: 4.55\n",
3172 | "Epoch 26/100 Batch 460/781 - Loss: 0.861, Seconds: 4.31\n",
3173 | "Epoch 26/100 Batch 480/781 - Loss: 0.785, Seconds: 3.93\n",
3174 | "Epoch 26/100 Batch 500/781 - Loss: 0.842, Seconds: 4.21\n",
3175 | "Average loss for this update: 0.811\n",
3176 | "New Record!\n",
3177 | "Epoch 26/100 Batch 520/781 - Loss: 0.744, Seconds: 3.99\n",
3178 | "Epoch 26/100 Batch 540/781 - Loss: 0.791, Seconds: 4.33\n",
3179 | "Epoch 26/100 Batch 560/781 - Loss: 0.762, Seconds: 4.01\n",
3180 | "Epoch 26/100 Batch 580/781 - Loss: 0.857, Seconds: 4.23\n",
3181 | "Epoch 26/100 Batch 600/781 - Loss: 0.875, Seconds: 4.25\n",
3182 | "Epoch 26/100 Batch 620/781 - Loss: 0.818, Seconds: 4.09\n",
3183 | "Epoch 26/100 Batch 640/781 - Loss: 0.813, Seconds: 4.33\n",
3184 | "Epoch 26/100 Batch 660/781 - Loss: 0.710, Seconds: 4.51\n",
3185 | "Epoch 26/100 Batch 680/781 - Loss: 0.775, Seconds: 4.37\n",
3186 | "Epoch 26/100 Batch 700/781 - Loss: 0.865, Seconds: 4.19\n",
3187 | "Epoch 26/100 Batch 720/781 - Loss: 0.955, Seconds: 4.51\n",
3188 | "Epoch 26/100 Batch 740/781 - Loss: 0.865, Seconds: 4.77\n",
3189 | "Epoch 26/100 Batch 760/781 - Loss: 0.887, Seconds: 4.33\n",
3190 | "Average loss for this update: 0.827\n",
3191 | "No Improvement.\n",
3192 | "Epoch 26/100 Batch 780/781 - Loss: 0.771, Seconds: 3.85\n",
3193 | "Epoch 27/100 Batch 20/781 - Loss: 0.913, Seconds: 4.17\n",
3194 | "Epoch 27/100 Batch 40/781 - Loss: 0.848, Seconds: 3.47\n",
3195 | "Epoch 27/100 Batch 60/781 - Loss: 0.840, Seconds: 4.37\n",
3196 | "Epoch 27/100 Batch 80/781 - Loss: 0.842, Seconds: 4.23\n",
3197 | "Epoch 27/100 Batch 100/781 - Loss: 0.741, Seconds: 4.23\n",
3198 | "Epoch 27/100 Batch 120/781 - Loss: 0.789, Seconds: 3.77\n",
3199 | "Epoch 27/100 Batch 140/781 - Loss: 0.757, Seconds: 4.07\n",
3200 | "Epoch 27/100 Batch 160/781 - Loss: 0.889, Seconds: 3.21\n",
3201 | "Epoch 27/100 Batch 180/781 - Loss: 0.848, Seconds: 3.71\n",
3202 | "Epoch 27/100 Batch 200/781 - Loss: 0.809, Seconds: 4.29\n",
3203 | "Epoch 27/100 Batch 220/781 - Loss: 0.824, Seconds: 4.25\n",
3204 | "Epoch 27/100 Batch 240/781 - Loss: 0.706, Seconds: 4.11\n",
3205 | "Average loss for this update: 0.815\n",
3206 | "No Improvement.\n",
3207 | "Epoch 27/100 Batch 260/781 - Loss: 0.789, Seconds: 4.33\n",
3208 | "Epoch 27/100 Batch 280/781 - Loss: 0.785, Seconds: 4.31\n",
3209 | "Epoch 27/100 Batch 300/781 - Loss: 0.856, Seconds: 4.23\n",
3210 | "Epoch 27/100 Batch 320/781 - Loss: 0.884, Seconds: 4.15\n",
3211 | "Epoch 27/100 Batch 340/781 - Loss: 0.783, Seconds: 4.53\n",
3212 | "Epoch 27/100 Batch 360/781 - Loss: 0.825, Seconds: 4.31\n",
3213 | "Epoch 27/100 Batch 380/781 - Loss: 0.704, Seconds: 4.66\n",
3214 | "Epoch 27/100 Batch 400/781 - Loss: 0.812, Seconds: 4.61\n",
3215 | "Epoch 27/100 Batch 420/781 - Loss: 0.760, Seconds: 3.87\n",
3216 | "Epoch 27/100 Batch 440/781 - Loss: 0.813, Seconds: 4.59\n",
3217 | "Epoch 27/100 Batch 460/781 - Loss: 0.856, Seconds: 4.43\n",
3218 | "Epoch 27/100 Batch 480/781 - Loss: 0.803, Seconds: 4.21\n",
3219 | "Epoch 27/100 Batch 500/781 - Loss: 0.867, Seconds: 4.11\n",
3220 | "Average loss for this update: 0.807\n",
3221 | "New Record!\n",
3222 | "Epoch 27/100 Batch 520/781 - Loss: 0.753, Seconds: 3.87\n",
3223 | "Epoch 27/100 Batch 540/781 - Loss: 0.791, Seconds: 4.47\n",
3224 | "Epoch 27/100 Batch 560/781 - Loss: 0.760, Seconds: 3.95\n",
3225 | "Epoch 27/100 Batch 580/781 - Loss: 0.853, Seconds: 4.25\n",
3226 | "Epoch 27/100 Batch 600/781 - Loss: 0.880, Seconds: 4.31\n",
3227 | "Epoch 27/100 Batch 620/781 - Loss: 0.826, Seconds: 3.99\n",
3228 | "Epoch 27/100 Batch 640/781 - Loss: 0.817, Seconds: 4.29\n",
3229 | "Epoch 27/100 Batch 660/781 - Loss: 0.726, Seconds: 4.45\n",
3230 | "Epoch 27/100 Batch 680/781 - Loss: 0.761, Seconds: 4.25\n"
3231 | ]
3232 | },
3233 | {
3234 | "name": "stdout",
3235 | "output_type": "stream",
3236 | "text": [
3237 | "Epoch 27/100 Batch 700/781 - Loss: 0.853, Seconds: 4.31\n",
3238 | "Epoch 27/100 Batch 720/781 - Loss: 0.944, Seconds: 4.51\n",
3239 | "Epoch 27/100 Batch 740/781 - Loss: 0.856, Seconds: 4.76\n",
3240 | "Epoch 27/100 Batch 760/781 - Loss: 0.891, Seconds: 4.27\n",
3241 | "Average loss for this update: 0.826\n",
3242 | "No Improvement.\n",
3243 | "Epoch 27/100 Batch 780/781 - Loss: 0.777, Seconds: 3.85\n",
3244 | "Epoch 28/100 Batch 20/781 - Loss: 0.902, Seconds: 4.25\n",
3245 | "Epoch 28/100 Batch 40/781 - Loss: 0.824, Seconds: 3.43\n",
3246 | "Epoch 28/100 Batch 60/781 - Loss: 0.823, Seconds: 4.31\n",
3247 | "Epoch 28/100 Batch 80/781 - Loss: 0.837, Seconds: 4.45\n",
3248 | "Epoch 28/100 Batch 100/781 - Loss: 0.743, Seconds: 4.27\n",
3249 | "Epoch 28/100 Batch 120/781 - Loss: 0.792, Seconds: 3.81\n",
3250 | "Epoch 28/100 Batch 140/781 - Loss: 0.759, Seconds: 4.01\n",
3251 | "Epoch 28/100 Batch 160/781 - Loss: 0.870, Seconds: 3.25\n",
3252 | "Epoch 28/100 Batch 180/781 - Loss: 0.830, Seconds: 3.61\n",
3253 | "Epoch 28/100 Batch 200/781 - Loss: 0.792, Seconds: 4.23\n",
3254 | "Epoch 28/100 Batch 220/781 - Loss: 0.810, Seconds: 4.13\n",
3255 | "Epoch 28/100 Batch 240/781 - Loss: 0.684, Seconds: 4.15\n",
3256 | "Average loss for this update: 0.802\n",
3257 | "New Record!\n",
3258 | "Epoch 28/100 Batch 260/781 - Loss: 0.763, Seconds: 4.75\n",
3259 | "Epoch 28/100 Batch 280/781 - Loss: 0.780, Seconds: 4.21\n",
3260 | "Epoch 28/100 Batch 300/781 - Loss: 0.837, Seconds: 4.11\n",
3261 | "Epoch 28/100 Batch 320/781 - Loss: 0.875, Seconds: 4.03\n",
3262 | "Epoch 28/100 Batch 340/781 - Loss: 0.781, Seconds: 4.57\n",
3263 | "Epoch 28/100 Batch 360/781 - Loss: 0.808, Seconds: 4.31\n",
3264 | "Epoch 28/100 Batch 380/781 - Loss: 0.704, Seconds: 4.49\n",
3265 | "Epoch 28/100 Batch 400/781 - Loss: 0.797, Seconds: 4.59\n",
3266 | "Epoch 28/100 Batch 420/781 - Loss: 0.742, Seconds: 3.97\n",
3267 | "Epoch 28/100 Batch 440/781 - Loss: 0.800, Seconds: 4.61\n",
3268 | "Epoch 28/100 Batch 460/781 - Loss: 0.844, Seconds: 4.31\n",
3269 | "Epoch 28/100 Batch 480/781 - Loss: 0.752, Seconds: 3.93\n",
3270 | "Epoch 28/100 Batch 500/781 - Loss: 0.810, Seconds: 4.29\n",
3271 | "Average loss for this update: 0.788\n",
3272 | "New Record!\n",
3273 | "Epoch 28/100 Batch 520/781 - Loss: 0.717, Seconds: 3.89\n",
3274 | "Epoch 28/100 Batch 540/781 - Loss: 0.779, Seconds: 4.41\n",
3275 | "Epoch 28/100 Batch 560/781 - Loss: 0.740, Seconds: 3.93\n",
3276 | "Epoch 28/100 Batch 580/781 - Loss: 0.830, Seconds: 4.27\n",
3277 | "Epoch 28/100 Batch 600/781 - Loss: 0.835, Seconds: 4.17\n",
3278 | "Epoch 28/100 Batch 620/781 - Loss: 0.788, Seconds: 4.15\n",
3279 | "Epoch 28/100 Batch 640/781 - Loss: 0.792, Seconds: 4.35\n",
3280 | "Epoch 28/100 Batch 660/781 - Loss: 0.702, Seconds: 4.47\n",
3281 | "Epoch 28/100 Batch 680/781 - Loss: 0.764, Seconds: 4.25\n",
3282 | "Epoch 28/100 Batch 700/781 - Loss: 0.842, Seconds: 4.61\n",
3283 | "Epoch 28/100 Batch 720/781 - Loss: 0.926, Seconds: 4.67\n",
3284 | "Epoch 28/100 Batch 740/781 - Loss: 0.836, Seconds: 4.65\n",
3285 | "Epoch 28/100 Batch 760/781 - Loss: 0.866, Seconds: 4.19\n",
3286 | "Average loss for this update: 0.804\n",
3287 | "No Improvement.\n",
3288 | "Epoch 28/100 Batch 780/781 - Loss: 0.761, Seconds: 3.81\n",
3289 | "Epoch 29/100 Batch 20/781 - Loss: 0.882, Seconds: 4.27\n",
3290 | "Epoch 29/100 Batch 40/781 - Loss: 0.814, Seconds: 3.37\n",
3291 | "Epoch 29/100 Batch 60/781 - Loss: 0.814, Seconds: 4.31\n",
3292 | "Epoch 29/100 Batch 80/781 - Loss: 0.804, Seconds: 4.17\n",
3293 | "Epoch 29/100 Batch 100/781 - Loss: 0.712, Seconds: 4.37\n",
3294 | "Epoch 29/100 Batch 120/781 - Loss: 0.774, Seconds: 3.93\n",
3295 | "Epoch 29/100 Batch 140/781 - Loss: 0.743, Seconds: 3.89\n",
3296 | "Epoch 29/100 Batch 160/781 - Loss: 0.861, Seconds: 3.21\n",
3297 | "Epoch 29/100 Batch 180/781 - Loss: 0.816, Seconds: 3.63\n",
3298 | "Epoch 29/100 Batch 200/781 - Loss: 0.773, Seconds: 4.29\n",
3299 | "Epoch 29/100 Batch 220/781 - Loss: 0.793, Seconds: 4.17\n",
3300 | "Epoch 29/100 Batch 240/781 - Loss: 0.657, Seconds: 4.03\n",
3301 | "Average loss for this update: 0.785\n",
3302 | "New Record!\n",
3303 | "Epoch 29/100 Batch 260/781 - Loss: 0.748, Seconds: 4.57\n",
3304 | "Epoch 29/100 Batch 280/781 - Loss: 0.745, Seconds: 4.29\n",
3305 | "Epoch 29/100 Batch 300/781 - Loss: 0.803, Seconds: 4.21\n",
3306 | "Epoch 29/100 Batch 320/781 - Loss: 0.860, Seconds: 4.19\n",
3307 | "Epoch 29/100 Batch 340/781 - Loss: 0.759, Seconds: 4.69\n",
3308 | "Epoch 29/100 Batch 360/781 - Loss: 0.794, Seconds: 4.37\n",
3309 | "Epoch 29/100 Batch 380/781 - Loss: 0.679, Seconds: 4.61\n",
3310 | "Epoch 29/100 Batch 400/781 - Loss: 0.794, Seconds: 4.55\n",
3311 | "Epoch 29/100 Batch 420/781 - Loss: 0.739, Seconds: 3.99\n",
3312 | "Epoch 29/100 Batch 440/781 - Loss: 0.791, Seconds: 4.69\n",
3313 | "Epoch 29/100 Batch 460/781 - Loss: 0.820, Seconds: 4.59\n",
3314 | "Epoch 29/100 Batch 480/781 - Loss: 0.747, Seconds: 4.05\n",
3315 | "Epoch 29/100 Batch 500/781 - Loss: 0.804, Seconds: 4.33\n",
3316 | "Average loss for this update: 0.771\n",
3317 | "New Record!\n",
3318 | "Epoch 29/100 Batch 520/781 - Loss: 0.701, Seconds: 4.03\n",
3319 | "Epoch 29/100 Batch 540/781 - Loss: 0.757, Seconds: 4.37\n",
3320 | "Epoch 29/100 Batch 560/781 - Loss: 0.724, Seconds: 3.91\n",
3321 | "Epoch 29/100 Batch 580/781 - Loss: 0.813, Seconds: 4.25\n",
3322 | "Epoch 29/100 Batch 600/781 - Loss: 0.825, Seconds: 4.19\n",
3323 | "Epoch 29/100 Batch 620/781 - Loss: 0.785, Seconds: 4.03\n",
3324 | "Epoch 29/100 Batch 640/781 - Loss: 0.786, Seconds: 4.13\n",
3325 | "Epoch 29/100 Batch 660/781 - Loss: 0.690, Seconds: 4.35\n",
3326 | "Epoch 29/100 Batch 680/781 - Loss: 0.734, Seconds: 4.23\n",
3327 | "Epoch 29/100 Batch 700/781 - Loss: 0.821, Seconds: 4.27\n",
3328 | "Epoch 29/100 Batch 720/781 - Loss: 0.905, Seconds: 4.53\n",
3329 | "Epoch 29/100 Batch 740/781 - Loss: 0.836, Seconds: 4.73\n",
3330 | "Epoch 29/100 Batch 760/781 - Loss: 0.854, Seconds: 4.33\n",
3331 | "Average loss for this update: 0.79\n",
3332 | "No Improvement.\n",
3333 | "Epoch 29/100 Batch 780/781 - Loss: 0.741, Seconds: 3.85\n",
3334 | "Epoch 30/100 Batch 20/781 - Loss: 0.855, Seconds: 4.25\n",
3335 | "Epoch 30/100 Batch 40/781 - Loss: 0.793, Seconds: 3.69\n",
3336 | "Epoch 30/100 Batch 60/781 - Loss: 0.793, Seconds: 4.39\n",
3337 | "Epoch 30/100 Batch 80/781 - Loss: 0.791, Seconds: 4.11\n",
3338 | "Epoch 30/100 Batch 100/781 - Loss: 0.690, Seconds: 4.23\n",
3339 | "Epoch 30/100 Batch 120/781 - Loss: 0.766, Seconds: 3.85\n",
3340 | "Epoch 30/100 Batch 140/781 - Loss: 0.733, Seconds: 3.99\n",
3341 | "Epoch 30/100 Batch 160/781 - Loss: 0.845, Seconds: 3.33\n",
3342 | "Epoch 30/100 Batch 180/781 - Loss: 0.793, Seconds: 3.71\n",
3343 | "Epoch 30/100 Batch 200/781 - Loss: 0.761, Seconds: 4.27\n",
3344 | "Epoch 30/100 Batch 220/781 - Loss: 0.771, Seconds: 4.13\n",
3345 | "Epoch 30/100 Batch 240/781 - Loss: 0.646, Seconds: 4.03\n",
3346 | "Average loss for this update: 0.768\n",
3347 | "New Record!\n",
3348 | "Epoch 30/100 Batch 260/781 - Loss: 0.744, Seconds: 4.63\n",
3349 | "Epoch 30/100 Batch 280/781 - Loss: 0.734, Seconds: 4.27\n",
3350 | "Epoch 30/100 Batch 300/781 - Loss: 0.788, Seconds: 4.09\n",
3351 | "Epoch 30/100 Batch 320/781 - Loss: 0.839, Seconds: 4.11\n",
3352 | "Epoch 30/100 Batch 340/781 - Loss: 0.742, Seconds: 4.53\n",
3353 | "Epoch 30/100 Batch 360/781 - Loss: 0.792, Seconds: 4.33\n",
3354 | "Epoch 30/100 Batch 380/781 - Loss: 0.669, Seconds: 4.61\n",
3355 | "Epoch 30/100 Batch 400/781 - Loss: 0.777, Seconds: 4.55\n",
3356 | "Epoch 30/100 Batch 420/781 - Loss: 0.738, Seconds: 3.91\n",
3357 | "Epoch 30/100 Batch 440/781 - Loss: 0.824, Seconds: 4.59\n",
3358 | "Epoch 30/100 Batch 460/781 - Loss: 0.821, Seconds: 4.41\n",
3359 | "Epoch 30/100 Batch 480/781 - Loss: 0.727, Seconds: 4.03\n",
3360 | "Epoch 30/100 Batch 500/781 - Loss: 0.807, Seconds: 4.23\n",
3361 | "Average loss for this update: 0.765\n",
3362 | "New Record!\n",
3363 | "Epoch 30/100 Batch 520/781 - Loss: 0.693, Seconds: 4.07\n",
3364 | "Epoch 30/100 Batch 540/781 - Loss: 0.748, Seconds: 4.49\n",
3365 | "Epoch 30/100 Batch 560/781 - Loss: 0.724, Seconds: 4.01\n",
3366 | "Epoch 30/100 Batch 580/781 - Loss: 0.797, Seconds: 4.53\n",
3367 | "Epoch 30/100 Batch 600/781 - Loss: 0.810, Seconds: 4.31\n",
3368 | "Epoch 30/100 Batch 620/781 - Loss: 0.770, Seconds: 4.25\n",
3369 | "Epoch 30/100 Batch 640/781 - Loss: 0.767, Seconds: 4.21\n",
3370 | "Epoch 30/100 Batch 660/781 - Loss: 0.670, Seconds: 4.39\n",
3371 | "Epoch 30/100 Batch 680/781 - Loss: 0.722, Seconds: 4.33\n",
3372 | "Epoch 30/100 Batch 700/781 - Loss: 0.789, Seconds: 4.25\n",
3373 | "Epoch 30/100 Batch 720/781 - Loss: 0.894, Seconds: 4.77\n",
3374 | "Epoch 30/100 Batch 740/781 - Loss: 0.818, Seconds: 4.69\n",
3375 | "Epoch 30/100 Batch 760/781 - Loss: 0.856, Seconds: 4.43\n",
3376 | "Average loss for this update: 0.777\n",
3377 | "No Improvement.\n",
3378 | "Epoch 30/100 Batch 780/781 - Loss: 0.735, Seconds: 3.85\n",
3379 | "Epoch 31/100 Batch 20/781 - Loss: 0.863, Seconds: 4.15\n",
3380 | "Epoch 31/100 Batch 40/781 - Loss: 0.789, Seconds: 3.47\n",
3381 | "Epoch 31/100 Batch 60/781 - Loss: 0.772, Seconds: 4.39\n",
3382 | "Epoch 31/100 Batch 80/781 - Loss: 0.780, Seconds: 4.25\n",
3383 | "Epoch 31/100 Batch 100/781 - Loss: 0.694, Seconds: 4.17\n",
3384 | "Epoch 31/100 Batch 120/781 - Loss: 0.744, Seconds: 4.01\n",
3385 | "Epoch 31/100 Batch 140/781 - Loss: 0.725, Seconds: 4.09\n",
3386 | "Epoch 31/100 Batch 160/781 - Loss: 0.822, Seconds: 3.27\n",
3387 | "Epoch 31/100 Batch 180/781 - Loss: 0.778, Seconds: 3.63\n"
3388 | ]
3389 | },
3390 | {
3391 | "name": "stdout",
3392 | "output_type": "stream",
3393 | "text": [
3394 | "Epoch 31/100 Batch 200/781 - Loss: 0.736, Seconds: 4.51\n",
3395 | "Epoch 31/100 Batch 220/781 - Loss: 0.775, Seconds: 4.03\n",
3396 | "Epoch 31/100 Batch 240/781 - Loss: 0.647, Seconds: 3.99\n",
3397 | "Average loss for this update: 0.758\n",
3398 | "New Record!\n",
3399 | "Epoch 31/100 Batch 260/781 - Loss: 0.730, Seconds: 4.55\n",
3400 | "Epoch 31/100 Batch 280/781 - Loss: 0.752, Seconds: 4.39\n",
3401 | "Epoch 31/100 Batch 300/781 - Loss: 0.792, Seconds: 4.21\n",
3402 | "Epoch 31/100 Batch 320/781 - Loss: 0.832, Seconds: 4.11\n",
3403 | "Epoch 31/100 Batch 340/781 - Loss: 0.749, Seconds: 4.51\n",
3404 | "Epoch 31/100 Batch 360/781 - Loss: 0.782, Seconds: 4.31\n",
3405 | "Epoch 31/100 Batch 380/781 - Loss: 0.669, Seconds: 4.55\n",
3406 | "Epoch 31/100 Batch 400/781 - Loss: 0.775, Seconds: 4.59\n",
3407 | "Epoch 31/100 Batch 420/781 - Loss: 0.716, Seconds: 3.89\n",
3408 | "Epoch 31/100 Batch 440/781 - Loss: 0.769, Seconds: 4.59\n",
3409 | "Epoch 31/100 Batch 460/781 - Loss: 0.786, Seconds: 4.33\n",
3410 | "Epoch 31/100 Batch 480/781 - Loss: 0.713, Seconds: 3.97\n",
3411 | "Epoch 31/100 Batch 500/781 - Loss: 0.787, Seconds: 4.19\n",
3412 | "Average loss for this update: 0.753\n",
3413 | "New Record!\n",
3414 | "Epoch 31/100 Batch 520/781 - Loss: 0.672, Seconds: 3.99\n",
3415 | "Epoch 31/100 Batch 540/781 - Loss: 0.739, Seconds: 4.31\n",
3416 | "Epoch 31/100 Batch 560/781 - Loss: 0.700, Seconds: 3.99\n",
3417 | "Epoch 31/100 Batch 580/781 - Loss: 0.783, Seconds: 4.29\n",
3418 | "Epoch 31/100 Batch 600/781 - Loss: 0.796, Seconds: 4.29\n",
3419 | "Epoch 31/100 Batch 620/781 - Loss: 0.750, Seconds: 3.97\n",
3420 | "Epoch 31/100 Batch 640/781 - Loss: 0.757, Seconds: 4.31\n",
3421 | "Epoch 31/100 Batch 660/781 - Loss: 0.676, Seconds: 4.59\n",
3422 | "Epoch 31/100 Batch 680/781 - Loss: 0.748, Seconds: 4.21\n",
3423 | "Epoch 31/100 Batch 700/781 - Loss: 0.798, Seconds: 4.37\n",
3424 | "Epoch 31/100 Batch 720/781 - Loss: 0.883, Seconds: 4.55\n",
3425 | "Epoch 31/100 Batch 740/781 - Loss: 0.805, Seconds: 4.73\n",
3426 | "Epoch 31/100 Batch 760/781 - Loss: 0.840, Seconds: 4.29\n",
3427 | "Average loss for this update: 0.768\n",
3428 | "No Improvement.\n",
3429 | "Epoch 31/100 Batch 780/781 - Loss: 0.722, Seconds: 4.03\n",
3430 | "Epoch 32/100 Batch 20/781 - Loss: 0.836, Seconds: 4.23\n",
3431 | "Epoch 32/100 Batch 40/781 - Loss: 0.777, Seconds: 3.33\n",
3432 | "Epoch 32/100 Batch 60/781 - Loss: 0.766, Seconds: 4.51\n",
3433 | "Epoch 32/100 Batch 80/781 - Loss: 0.770, Seconds: 4.21\n",
3434 | "Epoch 32/100 Batch 100/781 - Loss: 0.688, Seconds: 4.27\n",
3435 | "Epoch 32/100 Batch 120/781 - Loss: 0.738, Seconds: 3.95\n",
3436 | "Epoch 32/100 Batch 140/781 - Loss: 0.711, Seconds: 3.93\n",
3437 | "Epoch 32/100 Batch 160/781 - Loss: 0.825, Seconds: 3.23\n",
3438 | "Epoch 32/100 Batch 180/781 - Loss: 0.780, Seconds: 3.75\n",
3439 | "Epoch 32/100 Batch 200/781 - Loss: 0.765, Seconds: 4.25\n",
3440 | "Epoch 32/100 Batch 220/781 - Loss: 0.762, Seconds: 4.15\n",
3441 | "Epoch 32/100 Batch 240/781 - Loss: 0.636, Seconds: 4.09\n",
3442 | "Average loss for this update: 0.752\n",
3443 | "New Record!\n",
3444 | "Epoch 32/100 Batch 260/781 - Loss: 0.721, Seconds: 4.59\n",
3445 | "Epoch 32/100 Batch 280/781 - Loss: 0.724, Seconds: 4.35\n",
3446 | "Epoch 32/100 Batch 300/781 - Loss: 0.788, Seconds: 4.19\n",
3447 | "Epoch 32/100 Batch 320/781 - Loss: 0.835, Seconds: 4.13\n",
3448 | "Epoch 32/100 Batch 340/781 - Loss: 0.720, Seconds: 4.55\n",
3449 | "Epoch 32/100 Batch 360/781 - Loss: 0.752, Seconds: 4.41\n",
3450 | "Epoch 32/100 Batch 380/781 - Loss: 0.658, Seconds: 4.57\n",
3451 | "Epoch 32/100 Batch 400/781 - Loss: 0.763, Seconds: 4.61\n",
3452 | "Epoch 32/100 Batch 420/781 - Loss: 0.711, Seconds: 3.93\n",
3453 | "Epoch 32/100 Batch 440/781 - Loss: 0.760, Seconds: 4.47\n",
3454 | "Epoch 32/100 Batch 460/781 - Loss: 0.793, Seconds: 4.33\n",
3455 | "Epoch 32/100 Batch 480/781 - Loss: 0.696, Seconds: 4.09\n",
3456 | "Epoch 32/100 Batch 500/781 - Loss: 0.772, Seconds: 4.15\n",
3457 | "Average loss for this update: 0.741\n",
3458 | "New Record!\n",
3459 | "Epoch 32/100 Batch 520/781 - Loss: 0.661, Seconds: 4.13\n",
3460 | "Epoch 32/100 Batch 540/781 - Loss: 0.740, Seconds: 4.45\n",
3461 | "Epoch 32/100 Batch 560/781 - Loss: 0.702, Seconds: 3.97\n",
3462 | "Epoch 32/100 Batch 580/781 - Loss: 0.796, Seconds: 4.25\n",
3463 | "Epoch 32/100 Batch 600/781 - Loss: 0.806, Seconds: 4.33\n",
3464 | "Epoch 32/100 Batch 620/781 - Loss: 0.738, Seconds: 4.03\n",
3465 | "Epoch 32/100 Batch 640/781 - Loss: 0.746, Seconds: 4.23\n",
3466 | "Epoch 32/100 Batch 660/781 - Loss: 0.654, Seconds: 4.43\n",
3467 | "Epoch 32/100 Batch 680/781 - Loss: 0.701, Seconds: 4.27\n",
3468 | "Epoch 32/100 Batch 700/781 - Loss: 0.779, Seconds: 4.29\n",
3469 | "Epoch 32/100 Batch 720/781 - Loss: 0.862, Seconds: 4.59\n",
3470 | "Epoch 32/100 Batch 740/781 - Loss: 0.777, Seconds: 4.63\n",
3471 | "Epoch 32/100 Batch 760/781 - Loss: 0.833, Seconds: 4.29\n",
3472 | "Average loss for this update: 0.757\n",
3473 | "No Improvement.\n",
3474 | "Epoch 32/100 Batch 780/781 - Loss: 0.708, Seconds: 3.89\n",
3475 | "Epoch 33/100 Batch 20/781 - Loss: 0.820, Seconds: 4.23\n",
3476 | "Epoch 33/100 Batch 40/781 - Loss: 0.761, Seconds: 3.45\n",
3477 | "Epoch 33/100 Batch 60/781 - Loss: 0.758, Seconds: 4.45\n",
3478 | "Epoch 33/100 Batch 80/781 - Loss: 0.744, Seconds: 4.23\n",
3479 | "Epoch 33/100 Batch 100/781 - Loss: 0.672, Seconds: 4.23\n",
3480 | "Epoch 33/100 Batch 120/781 - Loss: 0.730, Seconds: 3.89\n",
3481 | "Epoch 33/100 Batch 140/781 - Loss: 0.700, Seconds: 3.97\n",
3482 | "Epoch 33/100 Batch 160/781 - Loss: 0.809, Seconds: 3.25\n",
3483 | "Epoch 33/100 Batch 180/781 - Loss: 0.758, Seconds: 3.65\n",
3484 | "Epoch 33/100 Batch 200/781 - Loss: 0.734, Seconds: 4.29\n",
3485 | "Epoch 33/100 Batch 220/781 - Loss: 0.748, Seconds: 4.01\n",
3486 | "Epoch 33/100 Batch 240/781 - Loss: 0.630, Seconds: 4.07\n",
3487 | "Average loss for this update: 0.737\n",
3488 | "New Record!\n",
3489 | "Epoch 33/100 Batch 260/781 - Loss: 0.714, Seconds: 4.61\n",
3490 | "Epoch 33/100 Batch 280/781 - Loss: 0.712, Seconds: 4.23\n",
3491 | "Epoch 33/100 Batch 300/781 - Loss: 0.764, Seconds: 4.13\n",
3492 | "Epoch 33/100 Batch 320/781 - Loss: 0.814, Seconds: 4.13\n",
3493 | "Epoch 33/100 Batch 340/781 - Loss: 0.717, Seconds: 5.39\n",
3494 | "Epoch 33/100 Batch 360/781 - Loss: 0.753, Seconds: 4.31\n",
3495 | "Epoch 33/100 Batch 380/781 - Loss: 0.661, Seconds: 5.16\n",
3496 | "Epoch 33/100 Batch 400/781 - Loss: 0.742, Seconds: 4.57\n",
3497 | "Epoch 33/100 Batch 420/781 - Loss: 0.698, Seconds: 3.89\n",
3498 | "Epoch 33/100 Batch 440/781 - Loss: 0.740, Seconds: 4.59\n",
3499 | "Epoch 33/100 Batch 460/781 - Loss: 0.768, Seconds: 4.37\n",
3500 | "Epoch 33/100 Batch 480/781 - Loss: 0.690, Seconds: 3.99\n",
3501 | "Epoch 33/100 Batch 500/781 - Loss: 0.765, Seconds: 4.37\n",
3502 | "Average loss for this update: 0.73\n",
3503 | "New Record!\n",
3504 | "Epoch 33/100 Batch 520/781 - Loss: 0.673, Seconds: 3.91\n",
3505 | "Epoch 33/100 Batch 540/781 - Loss: 0.712, Seconds: 4.45\n",
3506 | "Epoch 33/100 Batch 560/781 - Loss: 0.678, Seconds: 3.93\n",
3507 | "Epoch 33/100 Batch 580/781 - Loss: 0.765, Seconds: 4.19\n",
3508 | "Epoch 33/100 Batch 600/781 - Loss: 0.771, Seconds: 4.29\n",
3509 | "Epoch 33/100 Batch 620/781 - Loss: 0.731, Seconds: 4.09\n",
3510 | "Epoch 33/100 Batch 640/781 - Loss: 0.730, Seconds: 4.31\n",
3511 | "Epoch 33/100 Batch 660/781 - Loss: 0.645, Seconds: 4.53\n",
3512 | "Epoch 33/100 Batch 680/781 - Loss: 0.696, Seconds: 4.19\n",
3513 | "Epoch 33/100 Batch 700/781 - Loss: 0.773, Seconds: 4.37\n",
3514 | "Epoch 33/100 Batch 720/781 - Loss: 0.841, Seconds: 4.59\n",
3515 | "Epoch 33/100 Batch 740/781 - Loss: 0.766, Seconds: 4.55\n",
3516 | "Epoch 33/100 Batch 760/781 - Loss: 0.817, Seconds: 4.29\n",
3517 | "Average loss for this update: 0.741\n",
3518 | "No Improvement.\n",
3519 | "Epoch 33/100 Batch 780/781 - Loss: 0.702, Seconds: 3.91\n",
3520 | "Epoch 34/100 Batch 20/781 - Loss: 0.809, Seconds: 4.27\n",
3521 | "Epoch 34/100 Batch 40/781 - Loss: 0.749, Seconds: 3.41\n",
3522 | "Epoch 34/100 Batch 60/781 - Loss: 0.742, Seconds: 4.49\n",
3523 | "Epoch 34/100 Batch 80/781 - Loss: 0.743, Seconds: 4.19\n",
3524 | "Epoch 34/100 Batch 100/781 - Loss: 0.664, Seconds: 4.21\n",
3525 | "Epoch 34/100 Batch 120/781 - Loss: 0.713, Seconds: 3.93\n",
3526 | "Epoch 34/100 Batch 140/781 - Loss: 0.698, Seconds: 3.93\n",
3527 | "Epoch 34/100 Batch 160/781 - Loss: 0.784, Seconds: 3.29\n",
3528 | "Epoch 34/100 Batch 180/781 - Loss: 0.742, Seconds: 3.79\n",
3529 | "Epoch 34/100 Batch 200/781 - Loss: 0.713, Seconds: 4.31\n",
3530 | "Epoch 34/100 Batch 220/781 - Loss: 0.746, Seconds: 4.01\n",
3531 | "Epoch 34/100 Batch 240/781 - Loss: 0.616, Seconds: 4.05\n",
3532 | "Average loss for this update: 0.724\n",
3533 | "New Record!\n",
3534 | "Epoch 34/100 Batch 260/781 - Loss: 0.692, Seconds: 4.61\n",
3535 | "Epoch 34/100 Batch 280/781 - Loss: 0.710, Seconds: 4.17\n",
3536 | "Epoch 34/100 Batch 300/781 - Loss: 0.768, Seconds: 4.07\n",
3537 | "Epoch 34/100 Batch 320/781 - Loss: 0.811, Seconds: 4.11\n",
3538 | "Epoch 34/100 Batch 340/781 - Loss: 0.706, Seconds: 4.73\n",
3539 | "Epoch 34/100 Batch 360/781 - Loss: 0.750, Seconds: 4.41\n",
3540 | "Epoch 34/100 Batch 380/781 - Loss: 0.641, Seconds: 4.51\n",
3541 | "Epoch 34/100 Batch 400/781 - Loss: 0.742, Seconds: 4.47\n",
3542 | "Epoch 34/100 Batch 420/781 - Loss: 0.695, Seconds: 3.91\n",
3543 | "Epoch 34/100 Batch 440/781 - Loss: 0.730, Seconds: 4.65\n",
3544 | "Epoch 34/100 Batch 460/781 - Loss: 0.763, Seconds: 4.47\n"
3545 | ]
3546 | },
3547 | {
3548 | "name": "stdout",
3549 | "output_type": "stream",
3550 | "text": [
3551 | "Epoch 34/100 Batch 480/781 - Loss: 0.696, Seconds: 4.09\n",
3552 | "Epoch 34/100 Batch 500/781 - Loss: 0.751, Seconds: 4.17\n",
3553 | "Average loss for this update: 0.725\n",
3554 | "No Improvement.\n",
3555 | "Epoch 34/100 Batch 520/781 - Loss: 0.662, Seconds: 3.85\n",
3556 | "Epoch 34/100 Batch 540/781 - Loss: 0.696, Seconds: 4.43\n",
3557 | "Epoch 34/100 Batch 560/781 - Loss: 0.672, Seconds: 4.03\n",
3558 | "Epoch 34/100 Batch 580/781 - Loss: 0.756, Seconds: 4.29\n",
3559 | "Epoch 34/100 Batch 600/781 - Loss: 0.766, Seconds: 4.31\n",
3560 | "Epoch 34/100 Batch 620/781 - Loss: 0.728, Seconds: 4.07\n",
3561 | "Epoch 34/100 Batch 640/781 - Loss: 0.726, Seconds: 4.33\n",
3562 | "Epoch 34/100 Batch 660/781 - Loss: 0.633, Seconds: 4.39\n",
3563 | "Epoch 34/100 Batch 680/781 - Loss: 0.698, Seconds: 4.21\n",
3564 | "Epoch 34/100 Batch 700/781 - Loss: 0.761, Seconds: 4.35\n",
3565 | "Epoch 34/100 Batch 720/781 - Loss: 0.833, Seconds: 4.47\n",
3566 | "Epoch 34/100 Batch 740/781 - Loss: 0.756, Seconds: 4.73\n",
3567 | "Epoch 34/100 Batch 760/781 - Loss: 0.788, Seconds: 4.35\n",
3568 | "Average loss for this update: 0.731\n",
3569 | "No Improvement.\n",
3570 | "Epoch 34/100 Batch 780/781 - Loss: 0.689, Seconds: 3.93\n",
3571 | "Epoch 35/100 Batch 20/781 - Loss: 0.788, Seconds: 4.21\n",
3572 | "Epoch 35/100 Batch 40/781 - Loss: 0.730, Seconds: 3.47\n",
3573 | "Epoch 35/100 Batch 60/781 - Loss: 0.731, Seconds: 4.37\n",
3574 | "Epoch 35/100 Batch 80/781 - Loss: 0.729, Seconds: 4.25\n",
3575 | "Epoch 35/100 Batch 100/781 - Loss: 0.648, Seconds: 4.25\n",
3576 | "Epoch 35/100 Batch 120/781 - Loss: 0.700, Seconds: 3.91\n",
3577 | "Epoch 35/100 Batch 140/781 - Loss: 0.668, Seconds: 4.23\n",
3578 | "Epoch 35/100 Batch 160/781 - Loss: 0.766, Seconds: 3.33\n",
3579 | "Epoch 35/100 Batch 180/781 - Loss: 0.726, Seconds: 3.75\n",
3580 | "Epoch 35/100 Batch 200/781 - Loss: 0.691, Seconds: 4.19\n",
3581 | "Epoch 35/100 Batch 220/781 - Loss: 0.733, Seconds: 4.11\n",
3582 | "Epoch 35/100 Batch 240/781 - Loss: 0.605, Seconds: 4.21\n",
3583 | "Average loss for this update: 0.708\n",
3584 | "New Record!\n",
3585 | "Epoch 35/100 Batch 260/781 - Loss: 0.689, Seconds: 4.51\n",
3586 | "Epoch 35/100 Batch 280/781 - Loss: 0.693, Seconds: 4.31\n",
3587 | "Epoch 35/100 Batch 300/781 - Loss: 0.746, Seconds: 4.17\n",
3588 | "Epoch 35/100 Batch 320/781 - Loss: 0.790, Seconds: 4.27\n",
3589 | "Epoch 35/100 Batch 340/781 - Loss: 0.690, Seconds: 4.55\n",
3590 | "Epoch 35/100 Batch 360/781 - Loss: 0.729, Seconds: 4.79\n",
3591 | "Epoch 35/100 Batch 380/781 - Loss: 0.638, Seconds: 4.61\n",
3592 | "Epoch 35/100 Batch 400/781 - Loss: 0.719, Seconds: 4.89\n",
3593 | "Epoch 35/100 Batch 420/781 - Loss: 0.684, Seconds: 3.93\n",
3594 | "Epoch 35/100 Batch 440/781 - Loss: 0.722, Seconds: 4.57\n",
3595 | "Epoch 35/100 Batch 460/781 - Loss: 0.738, Seconds: 4.35\n",
3596 | "Epoch 35/100 Batch 480/781 - Loss: 0.678, Seconds: 3.95\n",
3597 | "Epoch 35/100 Batch 500/781 - Loss: 0.744, Seconds: 4.37\n",
3598 | "Average loss for this update: 0.708\n",
3599 | "New Record!\n",
3600 | "Epoch 35/100 Batch 520/781 - Loss: 0.632, Seconds: 3.77\n",
3601 | "Epoch 35/100 Batch 540/781 - Loss: 0.686, Seconds: 4.45\n",
3602 | "Epoch 35/100 Batch 560/781 - Loss: 0.671, Seconds: 4.17\n",
3603 | "Epoch 35/100 Batch 580/781 - Loss: 0.750, Seconds: 4.29\n",
3604 | "Epoch 35/100 Batch 600/781 - Loss: 0.756, Seconds: 4.27\n",
3605 | "Epoch 35/100 Batch 620/781 - Loss: 0.696, Seconds: 4.01\n",
3606 | "Epoch 35/100 Batch 640/781 - Loss: 0.709, Seconds: 4.29\n",
3607 | "Epoch 35/100 Batch 660/781 - Loss: 0.626, Seconds: 4.55\n",
3608 | "Epoch 35/100 Batch 680/781 - Loss: 0.685, Seconds: 4.33\n",
3609 | "Epoch 35/100 Batch 700/781 - Loss: 0.753, Seconds: 4.55\n",
3610 | "Epoch 35/100 Batch 720/781 - Loss: 0.834, Seconds: 4.57\n",
3611 | "Epoch 35/100 Batch 740/781 - Loss: 0.744, Seconds: 4.71\n",
3612 | "Epoch 35/100 Batch 760/781 - Loss: 0.787, Seconds: 4.33\n",
3613 | "Average loss for this update: 0.721\n",
3614 | "No Improvement.\n",
3615 | "Epoch 35/100 Batch 780/781 - Loss: 0.681, Seconds: 4.13\n",
3616 | "Epoch 36/100 Batch 20/781 - Loss: 0.797, Seconds: 4.27\n",
3617 | "Epoch 36/100 Batch 40/781 - Loss: 0.727, Seconds: 3.61\n",
3618 | "Epoch 36/100 Batch 60/781 - Loss: 0.722, Seconds: 4.31\n",
3619 | "Epoch 36/100 Batch 80/781 - Loss: 0.723, Seconds: 4.27\n",
3620 | "Epoch 36/100 Batch 100/781 - Loss: 0.639, Seconds: 4.25\n",
3621 | "Epoch 36/100 Batch 120/781 - Loss: 0.690, Seconds: 3.85\n",
3622 | "Epoch 36/100 Batch 140/781 - Loss: 0.663, Seconds: 4.07\n",
3623 | "Epoch 36/100 Batch 160/781 - Loss: 0.760, Seconds: 3.31\n",
3624 | "Epoch 36/100 Batch 180/781 - Loss: 0.718, Seconds: 3.89\n",
3625 | "Epoch 36/100 Batch 200/781 - Loss: 0.713, Seconds: 4.31\n",
3626 | "Epoch 36/100 Batch 220/781 - Loss: 0.738, Seconds: 4.17\n",
3627 | "Epoch 36/100 Batch 240/781 - Loss: 0.602, Seconds: 4.05\n",
3628 | "Average loss for this update: 0.706\n",
3629 | "New Record!\n",
3630 | "Epoch 36/100 Batch 260/781 - Loss: 0.684, Seconds: 4.53\n",
3631 | "Epoch 36/100 Batch 280/781 - Loss: 0.704, Seconds: 4.49\n",
3632 | "Epoch 36/100 Batch 300/781 - Loss: 0.725, Seconds: 4.27\n",
3633 | "Epoch 36/100 Batch 320/781 - Loss: 0.785, Seconds: 4.07\n",
3634 | "Epoch 36/100 Batch 340/781 - Loss: 0.683, Seconds: 4.47\n",
3635 | "Epoch 36/100 Batch 360/781 - Loss: 0.704, Seconds: 4.53\n",
3636 | "Epoch 36/100 Batch 380/781 - Loss: 0.626, Seconds: 4.57\n",
3637 | "Epoch 36/100 Batch 400/781 - Loss: 0.699, Seconds: 4.45\n",
3638 | "Epoch 36/100 Batch 420/781 - Loss: 0.668, Seconds: 3.95\n",
3639 | "Epoch 36/100 Batch 440/781 - Loss: 0.704, Seconds: 4.77\n",
3640 | "Epoch 36/100 Batch 460/781 - Loss: 0.732, Seconds: 4.63\n",
3641 | "Epoch 36/100 Batch 480/781 - Loss: 0.707, Seconds: 4.01\n",
3642 | "Epoch 36/100 Batch 500/781 - Loss: 0.790, Seconds: 4.22\n",
3643 | "Average loss for this update: 0.706\n",
3644 | "No Improvement.\n",
3645 | "Epoch 36/100 Batch 520/781 - Loss: 0.663, Seconds: 3.85\n",
3646 | "Epoch 36/100 Batch 540/781 - Loss: 0.703, Seconds: 4.35\n",
3647 | "Epoch 36/100 Batch 560/781 - Loss: 0.681, Seconds: 4.11\n",
3648 | "Epoch 36/100 Batch 580/781 - Loss: 0.759, Seconds: 4.37\n",
3649 | "Epoch 36/100 Batch 600/781 - Loss: 0.743, Seconds: 4.29\n",
3650 | "Epoch 36/100 Batch 620/781 - Loss: 0.711, Seconds: 4.05\n",
3651 | "Epoch 36/100 Batch 640/781 - Loss: 0.722, Seconds: 4.21\n",
3652 | "Epoch 36/100 Batch 660/781 - Loss: 0.619, Seconds: 4.43\n",
3653 | "Epoch 36/100 Batch 680/781 - Loss: 0.677, Seconds: 4.29\n",
3654 | "Epoch 36/100 Batch 700/781 - Loss: 0.740, Seconds: 4.31\n",
3655 | "Epoch 36/100 Batch 720/781 - Loss: 0.825, Seconds: 4.43\n",
3656 | "Epoch 36/100 Batch 740/781 - Loss: 0.738, Seconds: 5.05\n",
3657 | "Epoch 36/100 Batch 760/781 - Loss: 0.772, Seconds: 4.27\n",
3658 | "Average loss for this update: 0.721\n",
3659 | "No Improvement.\n",
3660 | "Epoch 36/100 Batch 780/781 - Loss: 0.689, Seconds: 4.35\n",
3661 | "Epoch 37/100 Batch 20/781 - Loss: 0.849, Seconds: 4.29\n",
3662 | "Epoch 37/100 Batch 40/781 - Loss: 0.742, Seconds: 3.49\n",
3663 | "Epoch 37/100 Batch 60/781 - Loss: 0.720, Seconds: 4.41\n",
3664 | "Epoch 37/100 Batch 80/781 - Loss: 0.724, Seconds: 4.39\n",
3665 | "Epoch 37/100 Batch 100/781 - Loss: 0.643, Seconds: 4.33\n",
3666 | "Epoch 37/100 Batch 120/781 - Loss: 0.690, Seconds: 3.81\n",
3667 | "Epoch 37/100 Batch 140/781 - Loss: 0.673, Seconds: 4.03\n",
3668 | "Epoch 37/100 Batch 160/781 - Loss: 0.757, Seconds: 3.33\n",
3669 | "Epoch 37/100 Batch 180/781 - Loss: 0.716, Seconds: 3.83\n",
3670 | "Epoch 37/100 Batch 200/781 - Loss: 0.690, Seconds: 4.33\n",
3671 | "Epoch 37/100 Batch 220/781 - Loss: 0.715, Seconds: 4.09\n",
3672 | "Epoch 37/100 Batch 240/781 - Loss: 0.603, Seconds: 4.23\n",
3673 | "Average loss for this update: 0.707\n",
3674 | "No Improvement.\n",
3675 | "Stopping Training.\n"
3676 | ]
3677 | }
3678 | ],
3679 | "source": [
3680 | "# Train the Model\n",
3681 | "learning_rate_decay = 0.95\n",
3682 | "min_learning_rate = 0.0005\n",
3683 | "display_step = 20 # Check training loss after every 20 batches\n",
3684 | "stop_early = 0 \n",
3685 | "stop = 3 # If the update loss does not decrease in 3 consecutive update checks, stop training\n",
3686 | "per_epoch = 3 # Make 3 update checks per epoch\n",
3687 | "update_check = (len(sorted_texts_short)//batch_size//per_epoch)-1\n",
3688 | "\n",
3689 | "update_loss = 0 \n",
3690 | "batch_loss = 0\n",
3691 | "summary_update_loss = [] # Record the update losses for saving improvements in the model\n",
3692 | "\n",
3693 | "checkpoint = \"./best_model.ckpt\" \n",
3694 | "with tf.Session(graph=train_graph) as sess:\n",
3695 | " sess.run(tf.global_variables_initializer())\n",
3696 | " \n",
3697 | " # If we want to continue training a previous session\n",
3698 | "    #loader = tf.train.import_meta_graph(checkpoint + '.meta')\n",
3699 | " #loader.restore(sess, checkpoint)\n",
3700 | " \n",
3701 | " for epoch_i in range(1, epochs+1):\n",
3702 | " update_loss = 0\n",
3703 | " batch_loss = 0\n",
3704 | " for batch_i, (summaries_batch, texts_batch, summaries_lengths, texts_lengths) in enumerate(\n",
3705 | " get_batches(sorted_summaries_short, sorted_texts_short, batch_size)):\n",
3706 | " start_time = time.time()\n",
3707 | " _, loss = sess.run(\n",
3708 | " [train_op, cost],\n",
3709 | " {input_data: texts_batch,\n",
3710 | " targets: summaries_batch,\n",
3711 | " lr: learning_rate,\n",
3712 | " summary_length: summaries_lengths,\n",
3713 | " text_length: texts_lengths,\n",
3714 | " keep_prob: keep_probability})\n",
3715 | "\n",
3716 | " batch_loss += loss\n",
3717 | " update_loss += loss\n",
3718 | " end_time = time.time()\n",
3719 | " batch_time = end_time - start_time\n",
3720 | "\n",
3721 | " if batch_i % display_step == 0 and batch_i > 0:\n",
3722 | " print('Epoch {:>3}/{} Batch {:>4}/{} - Loss: {:>6.3f}, Seconds: {:>4.2f}'\n",
3723 | " .format(epoch_i,\n",
3724 | " epochs, \n",
3725 | " batch_i, \n",
3726 | " len(sorted_texts_short) // batch_size, \n",
3727 | " batch_loss / display_step, \n",
3728 | " batch_time*display_step))\n",
3729 | " batch_loss = 0\n",
3730 | "\n",
3731 | " if batch_i % update_check == 0 and batch_i > 0:\n",
3732 | " print(\"Average loss for this update:\", round(update_loss/update_check,3))\n",
3733 | " summary_update_loss.append(update_loss)\n",
3734 | " \n",
3735 | " # If the update loss is at a new minimum, save the model\n",
3736 | " if update_loss <= min(summary_update_loss):\n",
3737 | " print('New Record!') \n",
3738 | " stop_early = 0\n",
3739 | " saver = tf.train.Saver() \n",
3740 | " saver.save(sess, checkpoint)\n",
3741 | "\n",
3742 | " else:\n",
3743 | " print(\"No Improvement.\")\n",
3744 | " stop_early += 1\n",
3745 | " if stop_early == stop:\n",
3746 | " break\n",
3747 | " update_loss = 0\n",
3748 | " \n",
3749 | " \n",
3750 | " # Reduce learning rate, but not below its minimum value\n",
3751 | " learning_rate *= learning_rate_decay\n",
3752 | " if learning_rate < min_learning_rate:\n",
3753 | " learning_rate = min_learning_rate\n",
3754 | " \n",
3755 | " if stop_early == stop:\n",
3756 | " print(\"Stopping Training.\")\n",
3757 | " break"
3758 | ]
3759 | },
3760 | {
3761 | "cell_type": "markdown",
3762 | "metadata": {},
3763 | "source": [
3764 | "## 5. Making Our Own Summaries"
3765 | ]
3766 | },
3767 | {
3768 | "cell_type": "markdown",
3769 | "metadata": {},
3770 | "source": [
3771 | "To see the quality of the summaries this model can generate, you can either write your own review or use one from the dataset. You can set the length of each summary to a fixed value, as in the code below, or use a random value (see the commented-out np.random.randint option)."
3772 | ]
3773 | },
3774 | {
3775 | "cell_type": "code",
3776 | "execution_count": 20,
3777 | "metadata": {
3778 | "collapsed": true
3779 | },
3780 | "outputs": [],
3781 | "source": [
3782 | "def text_to_seq(text):\n",
3783 | " '''Prepare the text for the model'''\n",
3784 | " \n",
3785 | " text = clean_text(text)\n",
3786 | "    return [vocab_to_int.get(word, vocab_to_int['<UNK>']) for word in text.split()]"
3787 | ]
3788 | },
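{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick, hypothetical sanity check of `text_to_seq` (it assumes `clean_text` and `vocab_to_int` from the data-preparation section are in scope): in-vocabulary words map to their integer ids, and anything out of vocabulary falls back to the `<UNK>` id."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical example sentence; 'zzzzzz' is almost certainly out of vocabulary\n",
"example_ids = text_to_seq(\"Good quality dog food zzzzzz\")\n",
"print(example_ids)\n",
"print(vocab_to_int['<UNK>'] in example_ids)  # True if any word fell back to <UNK>"
]
},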
3789 | {
3790 | "cell_type": "markdown",
3791 | "metadata": {},
3792 | "source": [
3793 | "\n",
3794 | "- **input_sentences**: a list of review strings to summarize\n",
3795 | "- **generate_summary_length**: an int or a list; if a list, it must be the same length as input_sentences\n"
3796 | ]
3797 | },
3798 | {
3799 | "cell_type": "code",
3800 | "execution_count": 23,
3801 | "metadata": {},
3802 | "outputs": [
3803 | {
3804 | "name": "stdout",
3805 | "output_type": "stream",
3806 | "text": [
3807 | "INFO:tensorflow:Restoring parameters from ./best_model.ckpt\n",
3808 | "- Review:\n",
3809 | " The coffee tasted great and was at such a good price! I highly recommend this to everyone!\n",
3810 | "- Summary:\n",
3811 | " great great coffee\n",
3812 | "\n",
3813 | "\n",
3814 | "- Review:\n",
3815 | " love individual oatmeal cups found years ago sam quit selling sound big lots quit selling found target expensive buy individually trilled get entire case time go anywhere need water microwave spoon know quaker flavor packets\n",
3816 | "- Summary:\n",
3817 | " great taste\n",
3818 | "\n",
3819 | "\n"
3820 | ]
3821 | }
3822 | ],
3823 | "source": [
3824 | "input_sentences=[\"The coffee tasted great and was at such a good price! I highly recommend this to everyone!\",\n",
3825 | " \"love individual oatmeal cups found years ago sam quit selling sound big lots quit selling found target expensive buy individually trilled get entire case time go anywhere need water microwave spoon know quaker flavor packets\"]\n",
3826 | "generate_summary_length = [3,2]  # summary length (in words) for each input review\n",
3827 | "\n",
3828 | "texts = [text_to_seq(input_sentence) for input_sentence in input_sentences]\n",
3829 | "checkpoint = \"./best_model.ckpt\"\n",
3830 | "if isinstance(generate_summary_length, list):\n",
3831 | "    if len(input_sentences) != len(generate_summary_length):\n",
3832 | "        raise Exception(\"[Error] generate_summary_length must be the same length as input_sentences or an integer\")\n",
3833 | "    generate_summary_length_list = generate_summary_length\n",
3834 | "else:\n",
3835 | "    generate_summary_length_list = [generate_summary_length] * len(texts)\n",
3836 | "loaded_graph = tf.Graph()\n",
3837 | "with tf.Session(graph=loaded_graph) as sess:\n",
3838 | " # Load saved model\n",
3839 | " loader = tf.train.import_meta_graph(checkpoint + '.meta')\n",
3840 | " loader.restore(sess, checkpoint)\n",
3841 | " input_data = loaded_graph.get_tensor_by_name('input:0')\n",
3842 | " logits = loaded_graph.get_tensor_by_name('predictions:0')\n",
3843 | " text_length = loaded_graph.get_tensor_by_name('text_length:0')\n",
3844 | " summary_length = loaded_graph.get_tensor_by_name('summary_length:0')\n",
3845 | " keep_prob = loaded_graph.get_tensor_by_name('keep_prob:0')\n",
3846 | "    # Tile each input by batch_size to match the batch shape the graph expects\n",
3847 | "    for i, text in enumerate(texts):\n",
3848 | "        generate_summary_length = generate_summary_length_list[i]\n",
3849 | "        answer_logits = sess.run(logits, {input_data: [text]*batch_size, \n",
3850 | "                                      summary_length: [generate_summary_length], #summary_length: [np.random.randint(5,8)], \n",
3851 | "                                      text_length: [len(text)]*batch_size,\n",
3852 | "                                      keep_prob: 1.0})[0] \n",
3853 | "        # Remove the padding from the summaries\n",
3854 | "        pad = vocab_to_int[\"<PAD>\"]\n",
3855 | "        print('- Review:\\n\\r {}'.format(input_sentences[i]))\n",
3856 | "        print('- Summary:\\n\\r {}\\n\\r\\n\\r'.format(\" \".join([int_to_vocab[word_id] for word_id in answer_logits if word_id != pad])))"
3857 | ]
3858 | },
3859 | {
3860 | "cell_type": "markdown",
3861 | "metadata": {},
3862 | "source": [
3863 | "## Summary"
3864 | ]
3865 | },
3866 | {
3867 | "cell_type": "markdown",
3868 | "metadata": {},
3869 | "source": [
3870 | "I hope that you found this project interesting and informative. One of my main recommendations for working with this dataset and model is to use a GPU, a subset of the dataset, or plenty of time to train your model (a minimal sketch of training on a subset follows below). As you might expect, the model cannot make good predictions just by seeing many reviews once; it needs to see the reviews many times to learn the relationships between words and between descriptions and summaries. \n",
3871 | "\n",
3872 | "In short, I'm pleased with how well this model performs. After writing a number of my own reviews and checking reviews from the dataset, I can happily say that most of the generated summaries are appropriate, some of them are great, and some of them make mistakes. I'll keep trying to improve this model, and if it gets better, I'll update my GitHub.\n",
3873 | "\n",
3874 | "Thanks for reading!"
3875 | ]
3876 | },
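{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is a minimal, hypothetical sketch of the \"subset of the dataset\" idea mentioned above. It assumes the `sorted_summaries_short` and `sorted_texts_short` lists from the data-preparation section; `subset_size` is an arbitrary example value, not a tuned setting. To use it, re-run the training cell with the subset lists in place of the full ones."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A minimal sketch, not a tuned configuration: slice the prepared data so the\n",
"# training loop above runs in a reasonable time on limited hardware.\n",
"subset_size = 50000  # hypothetical value; adjust to your hardware and time budget\n",
"\n",
"sorted_texts_subset = sorted_texts_short[:subset_size]\n",
"sorted_summaries_subset = sorted_summaries_short[:subset_size]\n",
"\n",
"# Recompute the update-check interval for the smaller dataset before re-running\n",
"# the training cell with these subset lists.\n",
"update_check = (len(sorted_texts_subset)//batch_size//per_epoch)-1\n",
"print(len(sorted_texts_subset), 'training examples,', update_check, 'batches per update check')"
]
},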
3877 | {
3878 | "cell_type": "code",
3879 | "execution_count": null,
3880 | "metadata": {
3881 | "collapsed": true
3882 | },
3883 | "outputs": [],
3884 | "source": []
3885 | }
3886 | ],
3887 | "metadata": {
3888 | "anaconda-cloud": {},
3889 | "kernelspec": {
3890 | "display_name": "Python 3",
3891 | "language": "python",
3892 | "name": "python3"
3893 | },
3894 | "language_info": {
3895 | "codemirror_mode": {
3896 | "name": "ipython",
3897 | "version": 3
3898 | },
3899 | "file_extension": ".py",
3900 | "mimetype": "text/x-python",
3901 | "name": "python",
3902 | "nbconvert_exporter": "python",
3903 | "pygments_lexer": "ipython3",
3904 | "version": "3.5.2"
3905 | }
3906 | },
3907 | "nbformat": 4,
3908 | "nbformat_minor": 1
3909 | }
3910 |
--------------------------------------------------------------------------------