└── How_to_build_own_text_summarizer_using_deep_learning.ipynb


/How_to_build_own_text_summarizer_using_deep_learning.ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |   "nbformat": 4,
   3 |   "nbformat_minor": 0,
   4 |   "metadata": {
   5 |     "colab": {
   6 |       "name": "How to build own text summarizer using deep learning.ipynb",
   7 |       "version": "0.3.2",
   8 |       "provenance": [],
   9 |       "collapsed_sections": []
  10 |     },
  11 |     "language_info": {
  12 |       "name": "python",
  13 |       "version": "3.6.4",
  14 |       "mimetype": "text/x-python",
  15 |       "codemirror_mode": {
  16 |         "name": "ipython",
  17 |         "version": 3
  18 |       },
  19 |       "pygments_lexer": "ipython3",
  20 |       "nbconvert_exporter": "python",
  21 |       "file_extension": ".py"
  22 |     },
  23 |     "kernelspec": {
  24 |       "display_name": "Python 3",
  25 |       "language": "python",
  26 |       "name": "python3"
  27 |     }
  28 |   },
  29 |   "cells": [
  30 |     {
  31 |       "cell_type": "markdown",
  32 |       "metadata": {
  33 |         "id": "qFuL-RBgXqgU",
  34 |         "colab_type": "text"
  35 |       },
  36 |       "source": [
  37 |         "In this notebook, we will build an abstractive text summarizer from scratch in Python, using deep learning with Keras.\n",
  38 |         "\n",
  39 |         "I recommend first reading [this article](https://www.analyticsvidhya.com/blog/2019/06/comprehensive-guide-text-summarization-using-deep-learning-python/), which covers the concepts required to build our own summarizer."
  40 |       ]
  41 |     },
  42 |     {
  43 |       "cell_type": "markdown",
  44 |       "metadata": {
  45 |         "id": "F5dSoP8lGMZi",
  46 |         "colab_type": "text"
  47 |       },
  48 |       "source": [
  49 |         "# Understanding the Problem Statement\n",
  50 |         "\n",
  51 |         "Customer reviews can often be long and descriptive. Analyzing these reviews manually, as you can imagine, is really time-consuming. This is where the brilliance of Natural Language Processing can be applied to generate a summary for long reviews.\n",
  52 |         "\n",
  53 |         "We will be working on a really cool dataset. Our objective here is to generate summaries for the Amazon Fine Food Reviews using the abstraction-based approach covered in the article linked above. You can download the dataset from [here](https://www.kaggle.com/snap/amazon-fine-food-reviews).\n",
  54 |         "\n",
  55 |         "It’s time to fire up our Jupyter notebooks! Let’s dive into the implementation details right away.\n",
  56 |         "\n",
  57 |         "# Custom Attention Layer\n",
  58 |         "\n",
  59 |         "Keras does not officially support an attention layer, so we can either implement our own or use a third-party implementation. We will go with the latter option in this notebook. You can download the attention layer from [here](https://github.com/thushv89/attention_keras/blob/master/layers/attention.py) and save it in a separate file called `attention.py`.\n",
  60 |         "\n",
  61 |         "Let’s import it into our environment:"
  62 |       ]
  63 |     },
  64 |     {
  65 |       "cell_type": "code",
  66 |       "metadata": {
  67 |         "trusted": true,
  68 |         "id": "Fi64aA0FFxcS",
  69 |         "colab_type": "code",
  70 |         "colab": {}
  71 |       },
  72 |       "source": [
  73 |         "from attention import AttentionLayer"
  74 |       ],
  75 |       "execution_count": 0,
  76 |       "outputs": []
  77 |     },
  78 |     {
  79 |       "cell_type": "markdown",
  80 |       "metadata": {
  81 |         "id": "JUValOzcHtEK",
  82 |         "colab_type": "text"
  83 |       },
  84 |       "source": [
  85 |         "# Import the Libraries"
  86 |       ]
  87 |     },
  88 |     {
  89 |       "cell_type": "code",
  90 |       "metadata": {
  91 |         "_uuid": "8f2839f25d086af736a60e9eeb907d3b93b6e0e5",
  92 |         "_cell_guid": "b1076dfc-b9ad-4769-8c92-a6c4dae69d19",
  93 |         "trusted": true,
  94 |         "id": "_Jpu8qLEFxcY",
  95 |         "colab_type": "code",
  96 |         "colab": {},
  97 |         "outputId": "95968e01-faac-4911-c802-9c008a4e62cf"
  98 |       },
  99 |       "source": [
 100 |         "import numpy as np\n",
 101 |         "import pandas as pd \n",
 102 |         "import re\n",
 103 |         "from bs4 import BeautifulSoup\n",
 104 |         "from keras.preprocessing.text import Tokenizer \n",
 105 |         "from keras.preprocessing.sequence import pad_sequences\n",
 106 |         "from nltk.corpus import stopwords\n",
 107 |         "from tensorflow.keras.layers import Input, LSTM, Embedding, Dense, Concatenate, TimeDistributed\n",
 108 |         "from tensorflow.keras.models import Model\n",
 109 |         "from tensorflow.keras.callbacks import EarlyStopping\n",
 110 |         "import warnings\n",
 111 |         "pd.set_option(\"display.max_colwidth\", 200)\n",
 112 |         "warnings.filterwarnings(\"ignore\")"
 113 |       ],
 114 |       "execution_count": 0,
 115 |       "outputs": [
 116 |         {
 117 |           "output_type": "stream",
 118 |           "text": [
 119 |             "Using TensorFlow backend.\n"
 120 |           ],
 121 |           "name": "stderr"
 122 |         }
 123 |       ]
 124 |     },
 125 |     {
 126 |       "cell_type": "markdown",
 127 |       "metadata": {
 128 |         "id": "UVakjZ3oICgx",
 129 |         "colab_type": "text"
 130 |       },
 131 |       "source": [
 132 |         "# Read the dataset\n",
 133 |         "\n",
 134 |         "This dataset consists of reviews of fine foods from Amazon. The data spans a period of more than 10 years, including all ~500,000 reviews up to October 2012. These reviews include product and user information, ratings, plain text review, and summary. It also includes reviews from all other Amazon categories.\n",
 135 |         "\n",
 136 |         "We’ll read only the first 100,000 reviews to reduce the training time of our model. Feel free to use the entire dataset for training your model if your machine has that kind of computational power."
 137 |       ]
 138 |     },
 139 |     {
 140 |       "cell_type": "code",
 141 |       "metadata": {
 142 |         "trusted": true,
 143 |         "id": "wnK5o4Z1Fxcj",
 144 |         "colab_type": "code",
 145 |         "colab": {}
 146 |       },
 147 |       "source": [
 148 |         "data=pd.read_csv(\"../input/amazon-fine-food-reviews/Reviews.csv\",nrows=100000)"
 149 |       ],
 150 |       "execution_count": 0,
 151 |       "outputs": []
 152 |     },
 153 |     {
 154 |       "cell_type": "markdown",
 155 |       "metadata": {
 156 |         "id": "kGNQKvCaISIn",
 157 |         "colab_type": "text"
 158 |       },
 159 |       "source": [
 160 |         "# Drop Duplicates and NA values"
 161 |       ]
 162 |     },
 163 |     {
 164 |       "cell_type": "code",
 165 |       "metadata": {
 166 |         "trusted": true,
 167 |         "id": "Cjul88oOFxcr",
 168 |         "colab_type": "code",
 169 |         "colab": {}
 170 |       },
 171 |       "source": [
 172 |         "data.drop_duplicates(subset=['Text'], inplace=True)  # drop duplicate reviews\n",
 173 |         "data.dropna(axis=0, inplace=True)  # drop rows with missing values"
 174 |       ],
 175 |       "execution_count": 0,
 176 |       "outputs": []
 177 |     },
 178 |     {
 179 |       "cell_type": "markdown",
 180 |       "metadata": {
 181 |         "id": "qi0xD6BkIWAm",
 182 |         "colab_type": "text"
 183 |       },
 184 |       "source": [
 185 |         "# Information about dataset\n",
 186 |         "\n",
 187 |         "Let us look at the data types and shape of the dataset:"
 188 |       ]
 189 |     },
 190 |     {
 191 |       "cell_type": "code",
 192 |       "metadata": {
 193 |         "trusted": true,
 194 |         "id": "__fy-JxTFxc9",
 195 |         "colab_type": "code",
 196 |         "colab": {},
 197 |         "outputId": "d42c6e36-bbc8-43c2-de0e-d3effe3e8c4c"
 198 |       },
 199 |       "source": [
 200 |         "data.info()"
 201 |       ],
 202 |       "execution_count": 0,
 203 |       "outputs": [
 204 |         {
 205 |           "output_type": "stream",
 206 |           "text": [
 207 |             "<class 'pandas.core.frame.DataFrame'>\n",
 208 |             "Int64Index: 88421 entries, 0 to 99999\n",
 209 |             "Data columns (total 10 columns):\n",
 210 |             "Id                        88421 non-null int64\n",
 211 |             "ProductId                 88421 non-null object\n",
 212 |             "UserId                    88421 non-null object\n",
 213 |             "ProfileName               88421 non-null object\n",
 214 |             "HelpfulnessNumerator      88421 non-null int64\n",
 215 |             "HelpfulnessDenominator    88421 non-null int64\n",
 216 |             "Score                     88421 non-null int64\n",
 217 |             "Time                      88421 non-null int64\n",
 218 |             "Summary                   88421 non-null object\n",
 219 |             "Text                      88421 non-null object\n",
 220 |             "dtypes: int64(5), object(5)\n",
 221 |             "memory usage: 7.4+ MB\n"
 222 |           ],
 223 |           "name": "stdout"
 224 |         }
 225 |       ]
 226 |     },
 227 |     {
 228 |       "cell_type": "markdown",
 229 |       "metadata": {
 230 |         "id": "r0xLYACiFxdJ",
 231 |         "colab_type": "text"
 232 |       },
 233 |       "source": [
 234 |         "# Preprocessing\n",
 235 |         "\n",
 236 |         "Performing basic preprocessing steps is very important before we get to the model building part. Using messy and uncleaned text data is a potentially disastrous move. So in this step, we will drop all the unwanted symbols, characters, etc. from the text that do not affect the objective of our problem.\n",
 237 |         "\n",
 238 |         "Here is the dictionary that we will use for expanding the contractions:"
 239 |       ]
 240 |     },
 241 |     {
 242 |       "cell_type": "code",
 243 |       "metadata": {
 244 |         "trusted": true,
 245 |         "id": "0s6IY-x2FxdL",
 246 |         "colab_type": "code",
 247 |         "colab": {}
 248 |       },
 249 |       "source": [
 250 |         "contraction_mapping = {\"ain't\": \"is not\", \"aren't\": \"are not\",\"can't\": \"cannot\", \"'cause\": \"because\", \"could've\": \"could have\", \"couldn't\": \"could not\",\n",
 251 |         "                           \"didn't\": \"did not\",  \"doesn't\": \"does not\", \"don't\": \"do not\", \"hadn't\": \"had not\", \"hasn't\": \"has not\", \"haven't\": \"have not\",\n",
 252 |         "                           \"he'd\": \"he would\",\"he'll\": \"he will\", \"he's\": \"he is\", \"how'd\": \"how did\", \"how'd'y\": \"how do you\", \"how'll\": \"how will\", \"how's\": \"how is\",\n",
 253 |         "                           \"I'd\": \"I would\", \"I'd've\": \"I would have\", \"I'll\": \"I will\", \"I'll've\": \"I will have\",\"I'm\": \"I am\", \"I've\": \"I have\", \"i'd\": \"i would\",\n",
 254 |         "                           \"i'd've\": \"i would have\", \"i'll\": \"i will\",  \"i'll've\": \"i will have\",\"i'm\": \"i am\", \"i've\": \"i have\", \"isn't\": \"is not\", \"it'd\": \"it would\",\n",
 255 |         "                           \"it'd've\": \"it would have\", \"it'll\": \"it will\", \"it'll've\": \"it will have\",\"it's\": \"it is\", \"let's\": \"let us\", \"ma'am\": \"madam\",\n",
 256 |         "                           \"mayn't\": \"may not\", \"might've\": \"might have\",\"mightn't\": \"might not\",\"mightn't've\": \"might not have\", \"must've\": \"must have\",\n",
 257 |         "                           \"mustn't\": \"must not\", \"mustn't've\": \"must not have\", \"needn't\": \"need not\", \"needn't've\": \"need not have\",\"o'clock\": \"of the clock\",\n",
 258 |         "                           \"oughtn't\": \"ought not\", \"oughtn't've\": \"ought not have\", \"shan't\": \"shall not\", \"sha'n't\": \"shall not\", \"shan't've\": \"shall not have\",\n",
 259 |         "                           \"she'd\": \"she would\", \"she'd've\": \"she would have\", \"she'll\": \"she will\", \"she'll've\": \"she will have\", \"she's\": \"she is\",\n",
 260 |         "                           \"should've\": \"should have\", \"shouldn't\": \"should not\", \"shouldn't've\": \"should not have\", \"so've\": \"so have\",\"so's\": \"so as\",\n",
 261 |         "                           \"this's\": \"this is\",\"that'd\": \"that would\", \"that'd've\": \"that would have\", \"that's\": \"that is\", \"there'd\": \"there would\",\n",
 262 |         "                           \"there'd've\": \"there would have\", \"there's\": \"there is\", \"here's\": \"here is\",\"they'd\": \"they would\", \"they'd've\": \"they would have\",\n",
 263 |         "                           \"they'll\": \"they will\", \"they'll've\": \"they will have\", \"they're\": \"they are\", \"they've\": \"they have\", \"to've\": \"to have\",\n",
 264 |         "                           \"wasn't\": \"was not\", \"we'd\": \"we would\", \"we'd've\": \"we would have\", \"we'll\": \"we will\", \"we'll've\": \"we will have\", \"we're\": \"we are\",\n",
 265 |         "                           \"we've\": \"we have\", \"weren't\": \"were not\", \"what'll\": \"what will\", \"what'll've\": \"what will have\", \"what're\": \"what are\",\n",
 266 |         "                           \"what's\": \"what is\", \"what've\": \"what have\", \"when's\": \"when is\", \"when've\": \"when have\", \"where'd\": \"where did\", \"where's\": \"where is\",\n",
 267 |         "                           \"where've\": \"where have\", \"who'll\": \"who will\", \"who'll've\": \"who will have\", \"who's\": \"who is\", \"who've\": \"who have\",\n",
 268 |         "                           \"why's\": \"why is\", \"why've\": \"why have\", \"will've\": \"will have\", \"won't\": \"will not\", \"won't've\": \"will not have\",\n",
 269 |         "                           \"would've\": \"would have\", \"wouldn't\": \"would not\", \"wouldn't've\": \"would not have\", \"y'all\": \"you all\",\n",
 270 |         "                           \"y'all'd\": \"you all would\",\"y'all'd've\": \"you all would have\",\"y'all're\": \"you all are\",\"y'all've\": \"you all have\",\n",
 271 |         "                           \"you'd\": \"you would\", \"you'd've\": \"you would have\", \"you'll\": \"you will\", \"you'll've\": \"you will have\",\n",
 272 |         "                           \"you're\": \"you are\", \"you've\": \"you have\"}"
 273 |       ],
 274 |       "execution_count": 0,
 275 |       "outputs": []
 276 |     },
 277 |     {
 278 |       "cell_type": "markdown",
 279 |       "metadata": {
 280 |         "id": "2JFRXFHmI7Mj",
 281 |         "colab_type": "text"
 282 |       },
 283 |       "source": [
 284 |         "We will perform the below preprocessing tasks for our data:\n",
 285 |         "\n",
 286 |         "1. Convert everything to lowercase\n",
 287 |         "\n",
 288 |         "2. Remove HTML tags\n",
 289 |         "\n",
 290 |         "3. Remove any text inside parentheses ( )\n",
 291 |         "\n",
 292 |         "4. Expand contractions using the mapping above\n",
 293 |         "\n",
 294 |         "5. Remove the possessive 's\n",
 295 |         "\n",
 296 |         "6. Eliminate punctuation and special characters\n",
 297 |         "\n",
 298 |         "7. Remove stopwords (from the reviews only; the summaries keep them)\n",
 299 |         "\n",
 300 |         "8. Remove short (one-character) words\n",
 301 |         "\n",
 302 |         "Let’s define the function:"
 303 |       ]
 304 |     },
 305 |     {
 306 |       "cell_type": "code",
 307 |       "metadata": {
 308 |         "trusted": true,
 309 |         "id": "XZr-u3OEFxdT",
 310 |         "colab_type": "code",
 311 |         "colab": {}
 312 |       },
 313 |       "source": [
 314 |         "stop_words = set(stopwords.words('english')) \n",
 315 |         "\n",
 316 |         "def text_cleaner(text, num):\n",
 317 |         "    newString = text.lower()  # lowercase everything\n",
 318 |         "    newString = BeautifulSoup(newString, \"lxml\").text  # strip HTML tags\n",
 319 |         "    newString = re.sub(r'\\([^)]*\\)', '', newString)  # drop text inside parentheses\n",
 320 |         "    newString = re.sub('\"', '', newString)  # drop double quotes\n",
 321 |         "    newString = ' '.join([contraction_mapping.get(t, t) for t in newString.split(\" \")])  # expand contractions\n",
 322 |         "    newString = re.sub(r\"'s\\b\", \"\", newString)  # drop possessive 's\n",
 323 |         "    newString = re.sub(\"[^a-zA-Z]\", \" \", newString)  # keep letters only\n",
 324 |         "    newString = re.sub('[m]{2,}', 'mm', newString)  # collapse runs of 'm' (e.g. 'mmmm' -> 'mm')\n",
 325 |         "    if num == 0:  # review text: remove stopwords\n",
 326 |         "        tokens = [w for w in newString.split() if w not in stop_words]\n",
 327 |         "    else:  # summary: keep stopwords\n",
 328 |         "        tokens = newString.split()\n",
 329 |         "    long_words = []\n",
 330 |         "    for i in tokens:\n",
 331 |         "        if len(i) > 1:  # drop one-character words\n",
 332 |         "            long_words.append(i)\n",
 333 |         "    return \" \".join(long_words).strip()"
 334 |       ],
 335 |       "execution_count": 0,
 336 |       "outputs": []
 337 |     },
 338 |     {
 339 |       "cell_type": "code",
 340 |       "metadata": {
 341 |         "trusted": true,
 342 |         "id": "A2QAeCHWFxdY",
 343 |         "colab_type": "code",
 344 |         "colab": {}
 345 |       },
 346 |       "source": [
 347 |         "# clean the review text (num=0 removes stopwords)\n",
 348 |         "cleaned_text = []\n",
 349 |         "for t in data['Text']:\n",
 350 |         "    cleaned_text.append(text_cleaner(t, 0))"
 351 |       ],
 352 |       "execution_count": 0,
 353 |       "outputs": []
 354 |     },
 355 |     {
 356 |       "cell_type": "markdown",
 357 |       "metadata": {
 358 |         "id": "snRZY8wjLao2",
 359 |         "colab_type": "text"
 360 |       },
 361 |       "source": [
 362 |         "Let us look at the first five preprocessed reviews:"
 363 |       ]
 364 |     },
 365 |     {
 366 |       "cell_type": "code",
 367 |       "metadata": {
 368 |         "trusted": true,
 369 |         "id": "NCAIkhWbFxdh",
 370 |         "colab_type": "code",
 371 |         "colab": {},
 372 |         "outputId": "c2da1a36-4488-4e32-ef9e-fcfe496e374d"
 373 |       },
 374 |       "source": [
 375 |         "cleaned_text[:5]  "
 376 |       ],
 377 |       "execution_count": 0,
 378 |       "outputs": [
 379 |         {
 380 |           "output_type": "execute_result",
 381 |           "data": {
 382 |             "text/plain": [
 383 |               "['bought several vitality canned dog food products found good quality product looks like stew processed meat smells better labrador finicky appreciates product better',\n",
 384 |               " 'product arrived labeled jumbo salted peanuts peanuts actually small sized unsalted sure error vendor intended represent product jumbo',\n",
 385 |               " 'confection around centuries light pillowy citrus gelatin nuts case filberts cut tiny squares liberally coated powdered sugar tiny mouthful heaven chewy flavorful highly recommend yummy treat familiar story lewis lion witch wardrobe treat seduces edmund selling brother sisters witch',\n",
 386 |               " 'looking secret ingredient robitussin believe found got addition root beer extract ordered made cherry soda flavor medicinal',\n",
 387 |               " 'great taffy great price wide assortment yummy taffy delivery quick taffy lover deal']"
 388 |             ]
 389 |           },
 390 |           "metadata": {
 391 |             "tags": []
 392 |           },
 393 |           "execution_count": 9
 394 |         }
 395 |       ]
 396 |     },
 397 |     {
 398 |       "cell_type": "code",
 399 |       "metadata": {
 400 |         "trusted": true,
 401 |         "id": "GsRXocxoFxd-",
 402 |         "colab_type": "code",
 403 |         "colab": {}
 404 |       },
 405 |       "source": [
 406 |         "# clean the summaries (num=1 keeps stopwords)\n",
 407 |         "cleaned_summary = []\n",
 408 |         "for t in data['Summary']:\n",
 409 |         "    cleaned_summary.append(text_cleaner(t, 1))"
 410 |       ],
 411 |       "execution_count": 0,
 412 |       "outputs": []
 413 |     },
 414 |     {
 415 |       "cell_type": "markdown",
 416 |       "metadata": {
 417 |         "id": "oZeD0gs6Lnb-",
 418 |         "colab_type": "text"
 419 |       },
 420 |       "source": [
 421 |         "Let us look at the first 10 preprocessed summaries:"
 422 |       ]
 423 |     },
 424 |     {
 425 |       "cell_type": "code",
 426 |       "metadata": {
 427 |         "trusted": true,
 428 |         "id": "jQJdZcAzFxee",
 429 |         "colab_type": "code",
 430 |         "colab": {},
 431 |         "outputId": "a1fbe683-c03f-4afb-addf-e075021c121b"
 432 |       },
 433 |       "source": [
 434 |         "cleaned_summary[:10]"
 435 |       ],
 436 |       "execution_count": 0,
 437 |       "outputs": [
 438 |         {
 439 |           "output_type": "execute_result",
 440 |           "data": {
 441 |             "text/plain": [
 442 |               "['good quality dog food',\n",
 443 |               " 'not as advertised',\n",
 444 |               " 'delight says it all',\n",
 445 |               " 'cough medicine',\n",
 446 |               " 'great taffy',\n",
 447 |               " 'nice taffy',\n",
 448 |               " 'great just as good as the expensive brands',\n",
 449 |               " 'wonderful tasty taffy',\n",
 450 |               " 'yay barley',\n",
 451 |               " 'healthy dog food']"
 452 |             ]
 453 |           },
 454 |           "metadata": {
 455 |             "tags": []
 456 |           },
 457 |           "execution_count": 11
 458 |         }
 459 |       ]
 460 |     },
 461 |     {
 462 |       "cell_type": "code",
 463 |       "metadata": {
 464 |         "trusted": true,
 465 |         "id": "L1zLpnqsFxey",
 466 |         "colab_type": "code",
 467 |         "colab": {}
 468 |       },
 469 |       "source": [
 470 |         "data['cleaned_text']=cleaned_text\n",
 471 |         "data['cleaned_summary']=cleaned_summary"
 472 |       ],
 473 |       "execution_count": 0,
 474 |       "outputs": []
 475 |     },
 476 |     {
 477 |       "cell_type": "markdown",
 478 |       "metadata": {
 479 |         "id": "KT_D2cLiLy77",
 480 |         "colab_type": "text"
 481 |       },
 482 |       "source": [
 483 |         "# Drop empty rows"
 484 |       ]
 485 |     },
 486 |     {
 487 |       "cell_type": "code",
 488 |       "metadata": {
 489 |         "trusted": true,
 490 |         "id": "sYK390unFxfA",
 491 |         "colab_type": "code",
 492 |         "colab": {}
 493 |       },
 494 |       "source": [
 495 |         "data.replace('', np.nan, inplace=True)\n",
 496 |         "data.dropna(axis=0,inplace=True)"
 497 |       ],
 498 |       "execution_count": 0,
 499 |       "outputs": []
 500 |     },
 501 |     {
 502 |       "cell_type": "markdown",
 503 |       "metadata": {
 504 |         "id": "Vm8Fk2TCL7Sp",
 505 |         "colab_type": "text"
 506 |       },
 507 |       "source": [
 508 |         "# Understanding the distribution of the sequences\n",
 509 |         "\n",
 510 |         "Here, we will analyze the word counts of the cleaned reviews and summaries to get an overall idea of how their lengths are distributed. This will help us fix the maximum sequence length for each:"
 511 |       ]
 512 |     },
 513 |     {
 514 |       "cell_type": "code",
 515 |       "metadata": {
 516 |         "trusted": true,
 517 |         "id": "MdF76AHHFxgw",
 518 |         "colab_type": "code",
 519 |         "colab": {},
 520 |         "outputId": "e3bbe165-4235-482f-bfd4-36a3f1d95290"
 521 |       },
 522 |       "source": [
 523 |         "import matplotlib.pyplot as plt\n",
 524 |         "\n",
 525 |         "text_word_count = []\n",
 526 |         "summary_word_count = []\n",
 527 |         "\n",
 528 |         "# populate the lists with word counts\n",
 529 |         "for i in data['cleaned_text']:\n",
 530 |         "    text_word_count.append(len(i.split()))\n",
 531 |         "\n",
 532 |         "for i in data['cleaned_summary']:\n",
 533 |         "    summary_word_count.append(len(i.split()))\n",
 534 |         "\n",
 535 |         "length_df = pd.DataFrame({'text':text_word_count, 'summary':summary_word_count})\n",
 536 |         "\n",
 537 |         "length_df.hist(bins = 30)\n",
 538 |         "plt.show()"
 539 |       ],
 540 |       "execution_count": 0,
 541 |       "outputs": [
 542 |         {
 543 |           "output_type": "display_data",
 544 |           "data": {
 545 |             "text/plain": [
 546 |               "<Figure size 432x288 with 2 Axes>"
 547 |             ],
 548 |             "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYcAAAEICAYAAAC0+DhzAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAIABJREFUeJzt3X+01XWd7/HnK03zWgZonRBssImaZTKRcJVZdZtTJiI1YXeVQd5AY0kttbF1WRU2rUWjOZfupI3OeC1KrtCY6NVMpjA6kXuZ6w4KJImgDkfC62EhFKB0qCzoff/4fnZ82d999tlwfuwfvB5r7bX3fn8/3+/+fs767vPe38/38/18FBGYmZnlvaLRO2BmZs3HycHMzAqcHMzMrMDJwczMCpwczMyswMnBzMwKnBzMzKzAycHMmpqkbZLeNwjbuUPSlwdjn44FTg5WN0nHN3ofzGx4ODkMM0mfl7Rd0q8lPSPp/MpfNJI6JfXk3m+T9FlJT0jaL+l2SR2SHkzb+bGkkansOEkh6XJJz0vaK+lTkv5zWv9FSf+S2/afS/qJpN2SfiXpTkkjKj7785KeAPan/bivok63SLp5SP9wdkyS9G3gjcC/SeqV9DlJUyT933Qs/1xSZyo7SlKPpL9J718tqVvSbEnzgEuBz6Xt/FvDKtUqIsKPYXoAbwWeB05P78cBfw7cAXw5V64T6Mm93wasATqAMcAu4GfAO4BXAT8BFua2GcDX07KpwO+A7wGvz63/16n8m4ELgBOB1wEPA/9U8dkbgDOAk4DRwH5gRFp+fNrepEb/ff1oz0c6Bt+XXo8BdgPTyX7cXpDevy4tnwq8kI71bwL35rZz2PfMj9oPnzkMr4Nk/4TPkvTKiNgWEc/Wue4/R8TOiNgO/BR4NCIej4jfAfeTJYq86yPidxHxI7J/5ndFxK7c+u8AiIjuiOiKiJcj4pfATcBfV2zrloh4PiJ+GxE7yBLIR9KyacCvImL9Ef0lzI7OfwNWRsTKiPhjRHQB68iSBel4/z/A6hT7ZMP2tMU5OQyjiOgGPgN8Cdglabmk0+tcfWfu9W+rvH/10ZRPzVPLU1PXPuBfgdMqtvV8xfulZF9S0vO366yD2UD9GfCR1KT0oqQXgXeRndGWLQbOBu6IiN2N2Ml24OQwzCLiOxHxLrKDPICvkP2y/0+5Ym8Yxl36h7QfEyLiFLJ/9qooUzl07/eAv5R0NvAB4M4h30s7luWPv+eBb0fEiNzj5IhYBCDpOLLksAy4UtKb+9iO9cPJYRhJequk90o6kew6wG+BP5K16U9PF9TeQHZ2MVxeA/QCL0kaA3y2vxVSU9a9wHeAxyLi/w3tLtoxbifwpvT6X4G/kXShpOMkvSp14Bibln+BLAl8AvhHYFlKGJXbsX44OQyvE4FFwK84dNHsWrJmmZ+TXXj7EXD3MO7T3wPnAC8BPwC+W+d6S4EJuEnJht7/AL6YmpA+CswgSwK/JDuT+CzwCkmTgP8OzI6Ig2Rn5QEsSNu5nex634uSvjfMdWg5SlfxzY6IpDcCTwNviIh9jd4fMxtcPnOwIybpFWS/0JY7MZi1J9/xakdE0slkbbfPkXVjNbM25GYlMzMr6LdZSdIZkh6StFnSJknXpPgoSV2StqTn8vANSsMpdKfhGs7JbWtOKr9F0pxcfJKkjWmdWyRVdqU0M7Nh1O+Zg6TRwOiI+Jmk1wDrgYuBy4A9EbFI0gJgZER8XtJ04NNkdyeeB9wcEedJGkV2J+Nksh4E68mGXNgr6THgb4FHgZVkd+Q+WGu/TjvttBg3bhz79+/n5JNPPuo/QDNwHRpj/fr1v4qI1zV6P+pVPuYrteLfvh6u19Co+7g/0vE2gAfIxjN5hixpQHZ34jPp9TeAWbnyz6Tls4Bv5OLfSLHRwNO5+GHl+npMmjQpIiIeeuihaHWuQ2MA66IJxrC
p91E+5iu14t++Hq7X0Kj3uD+iC9KSxpGNyfMo0BHZODuQ9dnvSK/HcPhwCz0pViveUyVe7fPnAfMAOjo6KJVK9Pb2UiqVjqQaTcd1MLNmU3dykPRq4D7gMxGxL39ZICJC0pBf2Y6IxWS3xjN58uTo7OykVCrR2dk51B89pFwHM2s2dd3nIOmVZInhzogo30G7M12PKF+X2JXi28mGdy4bm2K14mOrxM3MrEHq6a0kstvOn4qIm3KLVgDlHkdzyK5FlOOzU6+lKcBLqflpFTBV0sjUs2kqsCot25cm8BAwO7ctMzNrgHqald4JfBzYKGlDin2BbIygeyTNJbsh6pK0bCVZT6Vu4DfA5QARsUfS9cDaVO66iNiTXl9JNhHHScCD6WFmZg3Sb3KIiEcoDuFcdn6V8gFc1ce2lgBLqsTXkY2/bmZmTcBjK5mZWYGTg5mZFTg5mJlZwTExKuu4BT847P22Re9v0J6YDQ0f4zbYfOZgZmYFTg5mZlbg5GBmZgVODmZmVuDkYGZmBU4OZmZW4ORgVoWkEZLulfS0pKck/ZWnxrVjiZODWXU3Az+MiL8A3g48BSwAVkfEeGB1eg9wETA+PeYBt0E2zzqwkGy63HOBheWEkspckVtv2jDUyaxuTg5mFSS9Fng32VD1RMTvI+JFYAawNBVbSjaXOim+LM3CuAYYkeY4uRDoiog9EbEX6AKmpWWnRMSaNFDlsty2zJrCMXGHtNkROhP4JfC/Jb0dWA9cQ5NMjVupt7eX+RMOHhZrhylb23Xq2Vapl5ODWdHxwDnApyPiUUk3c6gJCWjs1LiVSqUSNz6y/7DYtkuL5VpNu0492yr1crOSWVEP0BMRj6b395IlC0+Na8cMJwezChHxAvC8pLem0PnAZjw1rh1D3KxkVt2ngTslnQBsJZvu9hV4alw7RvSbHCQtAT4A7IqIs1PsbqD8q2oE8GJETJQ0jqzL3zNp2ZqI+FRaZxKHvgwrgWtSu+0o4G5gHLANuCT17DBrmIjYAEyusshT49oxoZ5mpTuo6IMdER+NiIkRMRG4D/hubvGz5WXlxJD01a+7r77jZmbWIP0mh4h4GNhTbVlqL70EuKvWNvrp191X33EzM2uQgV5z+C/AzojYkoudKelxYB/wxYj4KbX7dffVd7ygWp/vevoMz59w4LD3zdbHuFX6PdfSDnUws0MGmhxmcfhZww7gjRGxO11j+J6kt9W7sf76jlfr811Pn+HLKqdQbLI+4K3S77mWdqiDmR1y1MlB0vHAfwUmlWMR8TLwcnq9XtKzwFuo3a97p6TREbGjou+4mZk1yEDuc3gf8HRE/Km5SNLrJB2XXr+J7MLz1n76dffVd9zMzBqk3+Qg6S7g34G3SupJfbwBZlK8EP1u4AlJG8juKv1URb/ub5H1BX+WQ/26FwEXSNpClnAWDaA+ZmY2CPptVoqIWX3EL6sSu4+sa2u18lX7dUfEbqr0HTczs8bx8BlmZlbg5GBmZgVODmZmVnBMDrw3ruK+B4Bti97fgD0xM2tOPnMwM7MCJwczMytwcjAzswInBzMzK3ByMDOzAicHMzMrcHIwM7MCJwczMytwcjAzswInBzMzK3ByMDOzAicHMzMrcHIwq0LSNkkbJW2QtC7FRknqkrQlPY9McUm6RVK3pCcknZPbzpxUfoukObn4pLT97rSuhr+WZn1zcjDr23siYmJETE7vFwCrI2I8sDq9B7iIbL708cA84DbIkgmwEDgPOBdYWE4oqcwVufWmDX11zOpXzxzSSyTtkvRkLvYlSdvTr6oNkqbnll2bfg09I+nCXHxainVLWpCLnynp0RS/W9IJg1lBs0E0A1iaXi8FLs7Fl0VmDTBC0mjgQqArIvZExF6gC5iWlp0SEWsiIoBluW2ZNYV65nO4A/gXsgM472sR8dV8QNJZwEzgbcDpwI8lvSUtvhW4AOgB1kpaERGbga+kbS2X9HVgLumXl1kDBfAjSQF8IyIWAx0RsSM
tfwHoSK/HAM/n1u1JsVrxnirxAknzyM5G6OjooFQqFcr09vYyf8LBw2LVyrWa3t7etqhHpVapV7/JISIeljSuzu3NAJZHxMvALyR1k51OA3RHxFYAScuBGZKeAt4LfCyVWQp8CScHa7x3RcR2Sa8HuiQ9nV8YEZESx5BKSWkxwOTJk6Ozs7NQplQqceMj+w+Lbbu0WK7VlEolqtW31bVKvQYyE9zVkmYD64D56bR5DLAmVyb/i6jyF9R5wKnAixFxoEr5gmq/ourJwvMnHKi5HBr7S6tVfknU0g51yIuI7el5l6T7yX7k7JQ0OiJ2pKahXan4duCM3OpjU2w70FkRL6X42CrlzZrG0SaH24DryU69rwduBD4xWDvVl2q/ourJwpdVmRa0UiN/abXKL4la2qEOZZJOBl4REb9Or6cC1wErgDnAovT8QFplBdmPpeVkP3peSglkFfAPuYvQU4FrI2KPpH2SpgCPArOBfx6u+pnV46iSQ0TsLL+W9E3g++ltX7+g6CO+m+zi3fHp7MG/oKwZdAD3p96lxwPfiYgfSloL3CNpLvAccEkqvxKYDnQDvwEuB0hJ4HpgbSp3XUTsSa+vJLuedxLwYHqYNY2jSg7lU+v09kNAuSfTCuA7km4iuyA9HngMEDBe0plk//xnAh9L7bYPAR8GlnP4rzGzhkjXxt5eJb4bOL9KPICr+tjWEmBJlfg64OwB76zZEOk3OUi6i6zd9DRJPWT9tjslTSRrVtoGfBIgIjZJugfYDBwAroqIg2k7VwOrgOOAJRGxKX3E54Hlkr4MPA7cPmi1MzOzo1JPb6VZVcJ9/gOPiBuAG6rEV5KdflfGt3KoR5OZmTUB3yFtZmYFTg5mZlbg5GBmZgVODmZmVuDkYGZmBU4OZmZWMJCxldrKuIohNrYten+D9sTMrPF85mBmZgVODmZmVuDkYGZmBU4OZmZW4ORgZmYFTg5mZlbg5GBmZgVODmZmVuDkYGZmBU4OZmZW4ORgZmYF/SYHSUsk7ZL0ZC72j5KelvSEpPsljUjxcZJ+K2lDenw9t84kSRsldUu6RZJSfJSkLklb0vPIoaiomZnVr54zhzuAaRWxLuDsiPhL4D+Aa3PLno2IienxqVz8NuAKYHx6lLe5AFgdEeOB1em9mZk1UL/JISIeBvZUxH4UEQfS2zXA2FrbkDQaOCUi1kREAMuAi9PiGcDS9HppLm5mZg0yGEN2fwK4O/f+TEmPA/uAL0bET4ExQE+uTE+KAXRExI70+gWgo68PkjQPmAfQ0dFBqVSit7eXUqlUcwfnTzhQc3k1/W1zMNVTh2bXDnWoJOk4YB2wPSI+IOlMYDlwKrAe+HhE/F7SiWQ/eCYBu4GPRsS2tI1rgbnAQeBvI2JVik8DbgaOA74VEYuGtXJm/RhQcpD0d8AB4M4U2gG8MSJ2S5oEfE/S2+rdXkSEpKixfDGwGGDy5MnR2dlJqVSis7Oz5nYvq5iroR7bLq29zcFUTx2aXTvUoYprgKeAU9L7rwBfi4jl6XraXLLm0rnA3oh4s6SZqdxHJZ0FzATeBpwO/FjSW9K2bgUuIPuhtFbSiojYPFwVM+vPUfdWknQZ8AHg0tRURES8HBG70+v1wLPAW4DtHN70NDbFAHamZqdy89Ouo90ns8EiaSzwfuBb6b2A9wL3piL5JtB80+i9wPmp/Axgefpe/ALoBs5Nj+6I2BoRvyc7G5kx9LUyq99RJYd0Svw54IMR8Ztc/HXpVBxJbyK78Lw1NRvtkzQlfWlmAw+k1VYAc9LrObm4WSP9E9kx/sf0/lTgxdy1tnzT6BjgeYC0/KVU/k/xinX6ips1jX6blSTdBXQCp0nqARaS9U46EehKPVLXpJ5J7wauk/QHsi/VpyKifDH7SrKeTycBD6YHwCLgHklzgeeASwalZmZHSdIHgF0RsV5SZ4P3pXCdrVJvby/zJxw8LNYO13/a8ToWtE69+k0OETGrSvj2PsreB9zXx7J1wNl
V4ruB8/vbD7Nh9E7gg5KmA68iu+ZwMzBC0vHp7CDfNLodOAPokXQ88FqyC9PleFl+nb7ih6l2na1SqVTixkf2HxYbzmtmQ6VNr2O1TL18h7RZhYi4NiLGRsQ4sgvKP4mIS4GHgA+nYvkm0HzT6IdT+UjxmZJOTD2dxgOPAWuB8ZLOlHRC+owVw1A1s7oNRldWs2PF54Hlkr4MPM6hM+jbgW9L6ia7J2gmQERsknQPsJmsV99VEXEQQNLVwCqyrqxLImLTsNbErB9ODmY1REQJKKXXW8l6GlWW+R3wkT7WvwG4oUp8JbByEHfVbFC5WcnMzAqcHMzMrMDJwczMCpwczMyswMnBzMwKnBzMzKzAycHMzAqcHMzMrMDJwczMCpwczMyswMnBzMwKnBzMzKzAycHMzAqcHMzMrMDJwczMCupKDpKWSNol6clcbJSkLklb0vPIFJekWyR1S3pC0jm5deak8lskzcnFJ0namNa5RWliajMza4x6zxzuAKZVxBYAqyNiPLA6vQe4iGw6xPFkE6PfBlkyARYC55FNmLKwnFBSmSty61V+1rAbt+AHhz3MzI4ldSWHiHiYbPrDvBnA0vR6KXBxLr4sMmvIJmUfDVwIdEXEnojYC3QB09KyUyJiTZp3d1luW2Zm1gADmSa0IyJ2pNcvAB3p9Rjg+Vy5nhSrFe+pEi+QNI/sbISOjg5KpRK9vb2USqWaOzp/woE6qlNbf58xEPXUodm1Qx3M7JBBmUM6IkJSDMa2+vmcxcBigMmTJ0dnZyelUonOzs6a6102CM1C2y6t/RkDUU8dml071MHMDhlIb6WdqUmI9LwrxbcDZ+TKjU2xWvGxVeJmZtYgA0kOK4Byj6M5wAO5+OzUa2kK8FJqfloFTJU0Ml2IngqsSsv2SZqSeinNzm3LzMwaoK5mJUl3AZ3AaZJ6yHodLQLukTQXeA64JBVfCUwHuoHfAJcDRMQeSdcDa1O56yKifJH7SrIeUScBD6aHmZk1SF3JISJm9bHo/CplA7iqj+0sAZZUia8Dzq5nX8zMbOj5DmmzKiS9StJjkn4uaZOkv0/xMyU9mm7YvFvSCSl+YnrfnZaPy23r2hR/RtKFufi0FOuWtKByH8waycnBrLqXgfdGxNuBiWT35EwBvgJ8LSLeDOwF5qbyc4G9Kf61VA5JZwEzgbeR3dz5vyQdJ+k44Faym0bPAmalsmZNwcnBrIp0E2dvevvK9AjgvcC9KV5582f5ptB7gfNTB4sZwPKIeDkifkF2Le7c9OiOiK0R8XtgeSpr1hQG5T4Hs3aUft2vB95M9iv/WeDFiCjfVZm/YfNPN3lGxAFJLwGnpvia3Gbz61TeFHpelX0o3PhZqbe3l/kTDh4Wa4cbEtv1xspWqZeTg1kfIuIgMFHSCOB+4C8asA+FGz8rlUolbnxk/2Gxobxpc7i0642VrVIvNyuZ9SMiXgQeAv6KbKyw8o+q/A2bf7rJMy1/LbCbI78p1KwptN2Zg0dQtcEg6XXAHyLiRUknAReQXWR+CPgw2TWCyps/5wD/npb/JA0rswL4jqSbgNPJRh1+DBAwXtKZZElhJvCx4aqfWX/aLjmYDZLRwNJ03eEVwD0R8X1Jm4Hlkr4MPA7cnsrfDnxbUjfZCMYzASJik6R7gM3AAeCq1FyFpKvJRg44DlgSEZuGr3pmtTk5mFUREU8A76gS30rW06gy/jvgI31s6wbghirxlWQjCpg1HV9zMDOzAicHMzMrcHIwM7MCJwczMytwcjAzswInBzMzK3ByMDOzAicHMzMrcHIwM7OCo04Okt4qaUPusU/SZyR9SdL2XHx6bh3PiGVm1gKOeviMiHiGbIas8rj328mGNb6cbKasr+bLV8yIdTrwY0lvSYtvJRvYrAdYK2lFRGw+2n0zM7OBGayxlc4Hno2I57LJr6r604xYwC/SAGXlMWq605g1SCrPiOXkYGbWIIOVHGYCd+XeXy1pNrAOmB8RexngjFhQfVa
sylmV5k84UG3VARvKmZtaZWaoWtqhDmZ2yICTg6QTgA8C16bQbcD1ZPPtXg/cCHxioJ8D1WfFqpxV6bIhms9hKGfWapWZoWpphzqY2SGDceZwEfCziNgJUH4GkPRN4Pvpba2ZrzwjlplZExmMrqyzyDUpSRqdW/Yh4Mn0egUwU9KJafar8oxYa0kzYqWzkJmprJmZNciAzhwknUzWy+iTufD/lDSRrFlpW3mZZ8QyM2sdA0oOEbEfOLUi9vEa5Vt2Rqxqc1NvW/T+BuyJmdnQ8x3SZmZW4ORgZmYFTg5mZlbg5GBmZgVODmZmVuDkYGZmBU4OZhUknSHpIUmbJW2SdE2Kj5LUJWlLeh6Z4pJ0Sxpy/glJ5+S2NSeV3yJpTi4+SdLGtM4tqjFipVkjODmYFR0gGzDyLGAKcFUacn4BsDoixgOr03vIhpAZnx7zyMYXQ9IoYCHZQJLnAgvLCSWVuSK33rRhqJdZ3ZwczCpExI6I+Fl6/WvgKbIRhGcAS1OxpcDF6fUMYFlk1gAj0jAyFwJdEbEnjUzcBUxLy06JiDUREcCy3LbMmsJgDdlt1pYkjQPeATwKdETEjrToBaAjvR5Dcdj5Mf3Ee6rEq31+YZj6Sr29vcyfcPCwWDsMn96uw8C3Sr2cHMz6IOnVwH3AZyJiX/6yQESEpBjqfag2TH2lUqnEjY/sPyw2lEPMD5d2HQa+VerlZiWzKiS9kiwx3BkR303hneVRh9PzrhTvazj6WvGxVeJmTcPJwaxC6jl0O/BURNyUW7QCKPc4mgM8kIvPTr2WpgAvpeanVcBUSSPTheipwKq0bJ+kKemzZue2ZdYU3KxkVvRO4OPARkkbUuwLwCLgHklzgeeAS9KylcB0oBv4DXA5QETskXQ92ZwlANdFxJ70+krgDuAk4MH0MGsaTg5mFSLiEaCv+w7Or1I+gKv62NYSYEmV+Drg7AHsptmQcrOSmZkV+MzBrA1VTk7liansSPnMwczMCgacHCRtS2PEbJC0LsUGbQwaMzMbfoN15vCeiJgYEZPT+8Ecg8bMzIbZUDUrDcoYNEO0b2Zm1o/BuCAdwI/SUALfSLf7D9YYNIepNs5M5Tgl8yccGIQq1WewxkdplbFWammHOpjZIYORHN4VEdslvR7okvR0fuFgjkFTbZyZynFKLqvopTGUBmv8mlYZa6WWdqiDmR0y4GaliNienncB95NdMxisMWjMzKwBBpQcJJ0s6TXl12RjxzzJII1BM5B9MzOzozfQZqUO4P40lPHxwHci4oeS1jJ4Y9CYmdkwG1ByiIitwNurxHczSGPQmJnZ8PPwGQPgIQrMrF15+AwzMytwcjAzswInBzMzK3ByMDOzAicHMzMrcHIwM7MCJwczMytwcjAzswInBzMzK3ByMDOzAicHMzMrcHIwq0LSEkm7JD2Zi42S1CVpS3oemeKSdIukbklPSDont86cVH6LpDm5+CRJG9M6tygNbWzWLJwczKq7g+I85guA1RExHlid3gNcBIxPj3nAbZAlE2AhcB7ZJFgLywkllbkit57nTLem4uRgVkVEPAxUzikyA1iaXi8FLs7Fl0VmDTAizYB4IdAVEXsiYi/QBUxLy06JiDVpGPtluW2ZNQUP2W1Wv440cyHAC2STXQGMAZ7PletJsVrxnirxAknzyM5G6OjooFQqFcr09vYyf8LBmjtebb1m19vb25L73Z9WqZeTwyCqnN8BPMdDu4qIkBTD8DmLgcUAkydPjs7OzkKZUqnEjY/sr7mdbZcW12t2pVKJavVtda1SLzcrmdVvZ2oSIj3vSvHtwBm5cmNTrFZ8bJW4WdM46uQg6QxJD0naLGmTpGtS/EuStkvakB7Tc+tcm3pnPCPpwlx8Wop1S1pQ7fPMmsAKoNzjaA7wQC4+O/VamgK8lJqfVgFTJY1MF6KnAqvSsn2SpqReSrNz2zJrCgNpVjoAzI+In0l6DbBeUlda9rWI+Gq+sKSzgJnA24DTgR9Lekt
afCtwAVnb61pJKyJi8wD2zWxAJN0FdAKnSeoh63W0CLhH0lzgOeCSVHwlMB3oBn4DXA4QEXskXQ+sTeWui4jyRe4ryXpEnQQ8mB5mTeOok0P69bMjvf61pKfo46JaMgNYHhEvA7+Q1E3WvQ+gOyK2Akhanso6OVjDRMSsPhadX6VsAFf1sZ0lwJIq8XXA2QPZR7OhNCgXpCWNA94BPAq8E7ha0mxgHdnZxV6yxLEmt1q+h0Zlj47z+vicQs+Nyiv/8yccGHiFBlE9vRJapfdCLe1QBzM7ZMDJQdKrgfuAz0TEPkm3AdcDkZ5vBD4x0M+B6j03Kq/8X1alx1Aj1dNLpFV6L9TSDnUws0MGlBwkvZIsMdwZEd8FiIidueXfBL6f3vbVc4MacTMza4CB9FYScDvwVETclIuPzhX7EFAem2YFMFPSiZLOJBsy4DGyi3XjJZ0p6QSyi9Yrjna/zMxs4AZy5vBO4OPARkkbUuwLwCxJE8malbYBnwSIiE2S7iG70HwAuCoiDgJIupqs299xwJKI2DSA/TIzswEaSG+lR4BqI0murLHODcANVeIra63XyirvmvYd02bWCnyHtJmZFTg5mJlZgZODmZkVODmYmVmBk4OZmRU4OZiZWYEn+zE7BngiKjtSPnMwM7MCnzk0gY3bXzpswED/ojOzRvOZg5mZFTg5mJlZgZODmZkVODmYmVmBL0g3IY/kamaN5jMHMzMrcHIwM7MCNyu1AN/dakPBzZdWi88czMysoGnOHCRNA24mm0f6WxGxqMG71NT8q6/1+Zi3ZtYUyUHSccCtwAVAD7BW0oqI2NzYPWsdbnpqLc14zPsYsrymSA7AuUB3RGwFkLQcmAE4OQxAtS/7kfI/hyHTEsd8PceQj5H21CzJYQzwfO59D3BeZSFJ84B56W2vpGeA04BfDfkeDhJ9pWq4aevQx/5W07R1qOHPGvjZAznmKzX0b38Ex8iRasVjqh6Nrlddx32zJIe6RMRiYHE+JmldRExu0C4NCtfB+lLtmK/Urn9716uxmqW30nbgjNz7sSlm1q58zFtTa5bksBYYL+lMSScAM4EVDd4ns6HkY96aWlM0K0XEAUlXA6vIuvUtiYhNda5e85S7RbgOx5gBHvOV2vVv73o1kCKi0ftw05ypAAACzElEQVRgZmZNplmalczMrIk4OZiZWUHLJgdJ0yQ9I6lb0oJG7089JC2RtEvSk7nYKEldkrak55GN3Mf+SDpD0kOSNkvaJOmaFG+perSLVvwelEnaJmmjpA2S1qVY1eNImVtSPZ+QdE5j9/6QI/le16qHpDmp/BZJcxpRl7yWTA65oQcuAs4CZkk6q7F7VZc7gGkVsQXA6ogYD6xO75vZAWB+RJwFTAGuSn/7VqtHy2vh70HeeyJiYq7ff1/H0UXA+PSYB9w27Hvatzuo/3tdtR6SRgELyW6EPBdY2OgfWC2ZHMgNPRARvwfKQw80tYh4GNhTEZ4BLE2vlwIXD+tOHaGI2BERP0uvfw08RXa3b0vVo0205PegH30dRzOAZZFZA4yQNLoRO1jpCL/XfdXjQqArIvZExF6gi2LCGVatmhyqDT0wpkH7MlAdEbEjvX4B6GjkzhwJSeOAdwCP0sL1aGGt/j0I4EeS1qdhQqDv46jV6nqk9Wi6+jXFfQ6WiYiQ1BJ9iyW9GrgP+ExE7JP0p2WtVA9rqHdFxHZJrwe6JD2dX9gux1Gr1qNVzxzaaeiBneXT4/S8q8H70y9JryRLDHdGxHdTuOXq0QZa+nsQEdvT8y7gfrJmsr6Oo1ar65HWo+nq16rJoZ2GHlgBlHsmzAEeaOC+9EvZKcLtwFMRcVNuUUvVo0207PdA0smSXlN+DUwFnqTv42gFMDv19pkCvJRrtmlGR1qPVcBUSSPTheipKdY4EdGSD2A68B/As8DfNXp/6tznu4AdwB/I2hTnAqeS9WbYAvwYGNXo/eynDu8iayt+AtiQHtNbrR7t8mjF70Ha7zc
BP0+PTeV97+s4AkTWM+tZYCMwudF1yNWl7u91rXoAnwC60+PyRtfLw2eYmVlBqzYrmZnZEHJyMDOzAicHMzMrcHIwM7MCJwczMytwcjAzswInBzMzK/j/GVUGM5tv3DMAAAAASUVORK5CYII=\n"
 549 |           },
 550 |           "metadata": {
 551 |             "tags": []
 552 |           }
 553 |         }
 554 |       ]
 555 |     },
 556 |     {
 557 |       "cell_type": "markdown",
 558 |       "metadata": {
 559 |         "id": "QwdSGIhGMEbz",
 560 |         "colab_type": "text"
 561 |       },
 562 |       "source": [
 563 |         "Interesting. We can fix the maximum summary length at 8 words, since most summaries appear to fall within that length.\n",
 564 |         "\n",
 565 |         "Let us check what proportion of the summaries contain 8 words or fewer:"
 566 |       ]
 567 |     },
 568 |     {
 569 |       "cell_type": "code",
 570 |       "metadata": {
 571 |         "trusted": true,
 572 |         "id": "7JRjwdIOFxg3",
 573 |         "colab_type": "code",
 574 |         "colab": {},
 575 |         "outputId": "f968be82-c539-471d-ce23-16f18b059ea0"
 576 |       },
 577 |       "source": [
 578 |         "cnt=0\n",
 579 |         "for i in data['cleaned_summary']:\n",
 580 |         "    if(len(i.split())<=8):\n",
 581 |         "        cnt=cnt+1\n",
 582 |         "print(cnt/len(data['cleaned_summary']))"
 583 |       ],
 584 |       "execution_count": 0,
 585 |       "outputs": [
 586 |         {
 587 |           "output_type": "stream",
 588 |           "text": [
 589 |             "0.9424907471335922\n"
 590 |           ],
 591 |           "name": "stdout"
 592 |         }
 593 |       ]
 594 |     },
 595 |     {
 596 |       "cell_type": "markdown",
 597 |       "metadata": {
 598 |         "id": "yYB4Ga9KMjEu",
 599 |         "colab_type": "text"
 600 |       },
 601 |       "source": [
 602 |         "We observe that about 94% of the summaries contain 8 words or fewer. So, we can fix the maximum summary length at 8.\n",
 603 |         "\n",
 604 |         "Let us also fix the maximum review length at 30 words:"
 605 |       ]
 606 |     },
 607 |     {
 608 |       "cell_type": "code",
 609 |       "metadata": {
 610 |         "trusted": true,
 611 |         "id": "ZKD5VOWqFxhC",
 612 |         "colab_type": "code",
 613 |         "colab": {}
 614 |       },
 615 |       "source": [
 616 |         "max_text_len=30\n",
 617 |         "max_summary_len=8"
 618 |       ],
 619 |       "execution_count": 0,
 620 |       "outputs": []
 621 |     },
 622 |     {
 623 |       "cell_type": "markdown",
 624 |       "metadata": {
 625 |         "id": "E6d48E-8M4VO",
 626 |         "colab_type": "text"
 627 |       },
 628 |       "source": [
 629 |         "Let us keep only the reviews and summaries whose lengths are at most **max_text_len** and **max_summary_len** words, respectively:"
 630 |       ]
 631 |     },
 632 |     {
 633 |       "cell_type": "code",
 634 |       "metadata": {
 635 |         "trusted": true,
 636 |         "id": "yY0tEJP0FxhI",
 637 |         "colab_type": "code",
 638 |         "colab": {}
 639 |       },
 640 |       "source": [
 641 |         "cleaned_text =np.array(data['cleaned_text'])\n",
 642 |         "cleaned_summary=np.array(data['cleaned_summary'])\n",
 643 |         "\n",
 644 |         "short_text=[]\n",
 645 |         "short_summary=[]\n",
 646 |         "\n",
 647 |         "for i in range(len(cleaned_text)):\n",
 648 |         "    if(len(cleaned_summary[i].split())<=max_summary_len and len(cleaned_text[i].split())<=max_text_len):\n",
 649 |         "        short_text.append(cleaned_text[i])\n",
 650 |         "        short_summary.append(cleaned_summary[i])\n",
 651 |         "        \n",
 652 |         "df=pd.DataFrame({'text':short_text,'summary':short_summary})"
 653 |       ],
 654 |       "execution_count": 0,
 655 |       "outputs": []
 656 |     },
 657 |     {
 658 |       "cell_type": "markdown",
 659 |       "metadata": {
 660 |         "id": "tR1uh8xSNUma",
 661 |         "colab_type": "text"
 662 |       },
 663 |       "source": [
 664 |         "Remember to add the **START** and **END** special tokens at the beginning and end of each summary. Here, I have chosen **sostok** and **eostok** as the START and END tokens.\n",
 665 |         "\n",
 666 |         "**Note:** Make sure that the chosen special tokens never appear as ordinary words in the summaries."
 667 |       ]
 668 |     },
 669 |     {
 670 |       "cell_type": "code",
 671 |       "metadata": {
 672 |         "trusted": true,
 673 |         "id": "EwLUH78CFxhg",
 674 |         "colab_type": "code",
 675 |         "colab": {}
 676 |       },
 677 |       "source": [
 678 |         "df['summary'] = df['summary'].apply(lambda x : 'sostok '+ x + ' eostok')"
 679 |       ],
 680 |       "execution_count": 0,
 681 |       "outputs": []
 682 |     },
 683 |     {
 684 |       "cell_type": "markdown",
 685 |       "metadata": {
 686 |         "id": "1GlcX4RFOh13",
 687 |         "colab_type": "text"
 688 |       },
 689 |       "source": [
 690 |         "We are getting closer to the model building part. Before that, we need to split our dataset into a training and validation set. We’ll use 90% of the dataset as the training data and evaluate the performance on the remaining 10% (holdout set):"
 691 |       ]
 692 |     },
 693 |     {
 694 |       "cell_type": "code",
 695 |       "metadata": {
 696 |         "trusted": true,
 697 |         "id": "RakakKHcFxhl",
 698 |         "colab_type": "code",
 699 |         "colab": {}
 700 |       },
 701 |       "source": [
 702 |         "from sklearn.model_selection import train_test_split\n",
 703 |         "x_tr,x_val,y_tr,y_val=train_test_split(np.array(df['text']),np.array(df['summary']),test_size=0.1,random_state=0,shuffle=True) "
 704 |       ],
 705 |       "execution_count": 0,
 706 |       "outputs": []
 707 |     },
 708 |     {
 709 |       "cell_type": "markdown",
 710 |       "metadata": {
 711 |         "id": "Vq1mqyOHOtIl",
 712 |         "colab_type": "text"
 713 |       },
 714 |       "source": [
 715 |         "# Preparing the Tokenizer\n",
 716 |         "\n",
 717 |         "A tokenizer builds the vocabulary and converts a word sequence to an integer sequence. Go ahead and build tokenizers for text and summary:\n",
 718 |         "\n",
 719 |         "# Text Tokenizer"
 720 |       ]
 721 |     },
 722 |     {
 723 |       "cell_type": "code",
 724 |       "metadata": {
 725 |         "trusted": true,
 726 |         "id": "oRHTgX6hFxhq",
 727 |         "colab_type": "code",
 728 |         "colab": {}
 729 |       },
 730 |       "source": [
 731 |         "from keras.preprocessing.text import Tokenizer \n",
 732 |         "from keras.preprocessing.sequence import pad_sequences\n",
 733 |         "\n",
 734 |         "#prepare a tokenizer for reviews on training data\n",
 735 |         "x_tokenizer = Tokenizer() \n",
 736 |         "x_tokenizer.fit_on_texts(list(x_tr))"
 737 |       ],
 738 |       "execution_count": 0,
 739 |       "outputs": []
 740 |     },
 741 |     {
 742 |       "cell_type": "markdown",
 743 |       "metadata": {
 744 |         "id": "RzvLwYL_PDcx",
 745 |         "colab_type": "text"
 746 |       },
 747 |       "source": [
 748 |         "# Rare Words and Their Coverage\n",
 749 |         "\n",
 750 |         "Let us look at the proportion of rare words in the vocabulary and their total coverage, i.e. the share of all word occurrences in the reviews that they account for.\n",
 751 |         "\n",
 752 |         "Here, I am setting the threshold to 4, which means that a word occurring fewer than 4 times is considered a rare word."
 753 |       ]
 754 |     },
 755 |     {
 756 |       "cell_type": "code",
 757 |       "metadata": {
 758 |         "trusted": true,
 759 |         "id": "y8KronV2Fxhx",
 760 |         "colab_type": "code",
 761 |         "colab": {},
 762 |         "outputId": "d2eb2f27-fbbc-4e61-9556-3c3ff5e4327b"
 763 |       },
 764 |       "source": [
 765 |         "thresh=4\n",
 766 |         "\n",
 767 |         "cnt=0\n",
 768 |         "tot_cnt=0\n",
 769 |         "freq=0\n",
 770 |         "tot_freq=0\n",
 771 |         "\n",
 772 |         "for key,value in x_tokenizer.word_counts.items():\n",
 773 |         "    tot_cnt=tot_cnt+1\n",
 774 |         "    tot_freq=tot_freq+value\n",
 775 |         "    if(value<thresh):\n",
 776 |         "        cnt=cnt+1\n",
 777 |         "        freq=freq+value\n",
 778 |         "    \n",
 779 |         "print(\"% of rare words in vocabulary:\",(cnt/tot_cnt)*100)\n",
 780 |         "print(\"Total Coverage of rare words:\",(freq/tot_freq)*100)"
 781 |       ],
 782 |       "execution_count": 0,
 783 |       "outputs": [
 784 |         {
 785 |           "output_type": "stream",
 786 |           "text": [
 787 |             "% of rare words in vocabulary: 66.12339930151339\n",
 788 |             "Total Coverage of rare words: 2.953684513790566\n"
 789 |           ],
 790 |           "name": "stdout"
 791 |         }
 792 |       ]
 793 |     },
 794 |     {
 795 |       "cell_type": "markdown",
 796 |       "metadata": {
 797 |         "id": "So-J-5kzQIeO",
 798 |         "colab_type": "text"
 799 |       },
 800 |       "source": [
 801 |         "**Remember**:\n",
 802 |         "\n",
 803 |         "\n",
 804 |         "* **tot_cnt** gives the size of the vocabulary (the number of unique words in the text)\n",
 805 |         "\n",
 806 |         "* **cnt** gives the number of rare words, i.e. words whose count falls below the threshold\n",
 807 |         "\n",
 808 |         "* **tot_cnt - cnt** gives the number of remaining, more common words\n",
 809 |         "\n",
 810 |         "Let us define the tokenizer so that it keeps only these most common words for the reviews."
 811 |       ]
 812 |     },
 813 |     {
 814 |       "cell_type": "code",
 815 |       "metadata": {
 816 |         "trusted": true,
 817 |         "id": "J2giEsF3Fxh3",
 818 |         "colab_type": "code",
 819 |         "colab": {}
 820 |       },
 821 |       "source": [
 822 |         "#prepare a tokenizer for reviews on training data\n",
 823 |         "x_tokenizer = Tokenizer(num_words=tot_cnt-cnt) \n",
 824 |         "x_tokenizer.fit_on_texts(list(x_tr))\n",
 825 |         "\n",
 826 |         "#convert text sequences into integer sequences\n",
 827 |         "x_tr_seq    =   x_tokenizer.texts_to_sequences(x_tr) \n",
 828 |         "x_val_seq   =   x_tokenizer.texts_to_sequences(x_val)\n",
 829 |         "\n",
 830 |         "#pad with zeros up to the maximum length\n",
 831 |         "x_tr    =   pad_sequences(x_tr_seq,  maxlen=max_text_len, padding='post')\n",
 832 |         "x_val   =   pad_sequences(x_val_seq, maxlen=max_text_len, padding='post')\n",
 833 |         "\n",
 834 |         "#size of vocabulary ( +1 for padding token)\n",
 835 |         "x_voc   =  x_tokenizer.num_words + 1"
 836 |       ],
 837 |       "execution_count": 0,
 838 |       "outputs": []
 839 |     },
 840 |     {
 841 |       "cell_type": "code",
 842 |       "metadata": {
 843 |         "trusted": true,
 844 |         "id": "DCbGMsm4FxiA",
 845 |         "colab_type": "code",
 846 |         "colab": {},
 847 |         "outputId": "2d9165f0-e542-4114-91f3-e070d483fce9"
 848 |       },
 849 |       "source": [
 850 |         "x_voc"
 851 |       ],
 852 |       "execution_count": 0,
 853 |       "outputs": [
 854 |         {
 855 |           "output_type": "execute_result",
 856 |           "data": {
 857 |             "text/plain": [
 858 |               "8440"
 859 |             ]
 860 |           },
 861 |           "metadata": {
 862 |             "tags": []
 863 |           },
 864 |           "execution_count": 24
 865 |         }
 866 |       ]
 867 |     },
 868 |     {
 869 |       "cell_type": "markdown",
 870 |       "metadata": {
 871 |         "id": "uQfKP3sqRxi9",
 872 |         "colab_type": "text"
 873 |       },
 874 |       "source": [
 875 |         "# Summary Tokenizer"
 876 |       ]
 877 |     },
 878 |     {
 879 |       "cell_type": "code",
 880 |       "metadata": {
 881 |         "trusted": true,
 882 |         "id": "eRHqyBkBFxiJ",
 883 |         "colab_type": "code",
 884 |         "colab": {}
 885 |       },
 886 |       "source": [
 887 |         "#prepare a tokenizer for summaries on training data\n",
 888 |         "y_tokenizer = Tokenizer()   \n",
 889 |         "y_tokenizer.fit_on_texts(list(y_tr))"
 890 |       ],
 891 |       "execution_count": 0,
 892 |       "outputs": []
 893 |     },
 894 |     {
 895 |       "cell_type": "markdown",
 896 |       "metadata": {
 897 |         "id": "KInA6O6ZSkJz",
 898 |         "colab_type": "text"
 899 |       },
 900 |       "source": [
 901 |         "# Rare Words and Their Coverage\n",
 902 |         "\n",
 903 |         "Let us look at the proportion of rare words in the vocabulary and their total coverage, i.e. the share of all word occurrences in the summaries that they account for.\n",
 904 |         "\n",
 905 |         "Here, I am setting the threshold to 6, which means that a word occurring fewer than 6 times is considered a rare word."
 906 |       ]
 907 |     },
 908 |     {
 909 |       "cell_type": "code",
 910 |       "metadata": {
 911 |         "trusted": true,
 912 |         "id": "yzE5OiRLFxiM",
 913 |         "colab_type": "code",
 914 |         "colab": {},
 915 |         "outputId": "7f7a4f89-b088-4847-8172-09e5a2383d0e"
 916 |       },
 917 |       "source": [
 918 |         "thresh=6\n",
 919 |         "\n",
 920 |         "cnt=0\n",
 921 |         "tot_cnt=0\n",
 922 |         "freq=0\n",
 923 |         "tot_freq=0\n",
 924 |         "\n",
 925 |         "for key,value in y_tokenizer.word_counts.items():\n",
 926 |         "    tot_cnt=tot_cnt+1\n",
 927 |         "    tot_freq=tot_freq+value\n",
 928 |         "    if(value<thresh):\n",
 929 |         "        cnt=cnt+1\n",
 930 |         "        freq=freq+value\n",
 931 |         "    \n",
 932 |         "print(\"% of rare words in vocabulary:\",(cnt/tot_cnt)*100)\n",
 933 |         "print(\"Total Coverage of rare words:\",(freq/tot_freq)*100)"
 934 |       ],
 935 |       "execution_count": 0,
 936 |       "outputs": [
 937 |         {
 938 |           "output_type": "stream",
 939 |           "text": [
 940 |             "% of rare words in vocabulary: 78.12740675541863\n",
 941 |             "Total Coverage of rare words: 5.3921899389571895\n"
 942 |           ],
 943 |           "name": "stdout"
 944 |         }
 945 |       ]
 946 |     },
 947 |     {
 948 |       "cell_type": "markdown",
 949 |       "metadata": {
 950 |         "id": "0PBhzKuRSw_9",
 951 |         "colab_type": "text"
 952 |       },
 953 |       "source": [
 954 |         "Let us define the tokenizer so that it keeps only the most common words for the summaries."
 955 |       ]
 956 |     },
 957 |     {
 958 |       "cell_type": "code",
 959 |       "metadata": {
 960 |         "trusted": true,
 961 |         "id": "-fswLvIgFxiR",
 962 |         "colab_type": "code",
 963 |         "colab": {}
 964 |       },
 965 |       "source": [
 966 |         "#prepare a tokenizer for summaries on training data\n",
 967 |         "y_tokenizer = Tokenizer(num_words=tot_cnt-cnt) \n",
 968 |         "y_tokenizer.fit_on_texts(list(y_tr))\n",
 969 |         "\n",
 970 |         "#convert summary text sequences into integer sequences\n",
 971 |         "y_tr_seq    =   y_tokenizer.texts_to_sequences(y_tr) \n",
 972 |         "y_val_seq   =   y_tokenizer.texts_to_sequences(y_val) \n",
 973 |         "\n",
 974 |         "#pad with zeros up to the maximum length\n",
 975 |         "y_tr    =   pad_sequences(y_tr_seq, maxlen=max_summary_len, padding='post')\n",
 976 |         "y_val   =   pad_sequences(y_val_seq, maxlen=max_summary_len, padding='post')\n",
 977 |         "\n",
 978 |         "#size of vocabulary (+1 for padding token)\n",
 979 |         "y_voc  =   y_tokenizer.num_words +1"
 980 |       ],
 981 |       "execution_count": 0,
 982 |       "outputs": []
 983 |     },
 984 |     {
 985 |       "cell_type": "markdown",
 986 |       "metadata": {
 987 |         "id": "qqwDUT5oTFmn",
 988 |         "colab_type": "text"
 989 |       },
 990 |       "source": [
 991 |         "Let us check that the word count of the start token equals the number of training examples, as expected when every summary begins with it:"
 992 |       ]
 993 |     },
 994 |     {
 995 |       "cell_type": "code",
 996 |       "metadata": {
 997 |         "trusted": true,
 998 |         "id": "pR8IX9FRFxiY",
 999 |         "colab_type": "code",
1000 |         "colab": {},
1001 |         "outputId": "b116cdbd-42c4-4ede-9f6d-46284115393e"
1002 |       },
1003 |       "source": [
1004 |         "y_tokenizer.word_counts['sostok'],len(y_tr)   "
1005 |       ],
1006 |       "execution_count": 0,
1007 |       "outputs": [
1008 |         {
1009 |           "output_type": "execute_result",
1010 |           "data": {
1011 |             "text/plain": [
1012 |               "(42453, 42453)"
1013 |             ]
1014 |           },
1015 |           "metadata": {
1016 |             "tags": []
1017 |           },
1018 |           "execution_count": 28
1019 |         }
1020 |       ]
1021 |     },
1022 |     {
1023 |       "cell_type": "markdown",
1024 |       "metadata": {
1025 |         "id": "LVFhFVguTTtw",
1026 |         "colab_type": "text"
1027 |       },
1028 |       "source": [
1029 |         "Here, I am deleting the rows whose summaries contain only the **START** and **END** tokens, i.e. summaries with no remaining in-vocabulary words:"
1030 |       ]
1031 |     },
1032 |     {
1033 |       "cell_type": "code",
1034 |       "metadata": {
1035 |         "trusted": true,
1036 |         "id": "kZ-vW82sFxih",
1037 |         "colab_type": "code",
1038 |         "colab": {}
1039 |       },
1040 |       "source": [
1041 |         "ind=[]\n",
1042 |         "for i in range(len(y_tr)):\n",
1043 |         "    cnt=0\n",
1044 |         "    for j in y_tr[i]:\n",
1045 |         "        if j!=0:\n",
1046 |         "            cnt=cnt+1\n",
1047 |         "    if(cnt==2):\n",
1048 |         "        ind.append(i)\n",
1049 |         "\n",
1050 |         "y_tr=np.delete(y_tr,ind, axis=0)\n",
1051 |         "x_tr=np.delete(x_tr,ind, axis=0)"
1052 |       ],
1053 |       "execution_count": 0,
1054 |       "outputs": []
1055 |     },
1056 |     {
1057 |       "cell_type": "code",
1058 |       "metadata": {
1059 |         "trusted": true,
1060 |         "id": "cx5NISuMFxik",
1061 |         "colab_type": "code",
1062 |         "colab": {}
1063 |       },
1064 |       "source": [
1065 |         "ind=[]\n",
1066 |         "for i in range(len(y_val)):\n",
1067 |         "    cnt=0\n",
1068 |         "    for j in y_val[i]:\n",
1069 |         "        if j!=0:\n",
1070 |         "            cnt=cnt+1\n",
1071 |         "    if(cnt==2):\n",
1072 |         "        ind.append(i)\n",
1073 |         "\n",
1074 |         "y_val=np.delete(y_val,ind, axis=0)\n",
1075 |         "x_val=np.delete(x_val,ind, axis=0)"
1076 |       ],
1077 |       "execution_count": 0,
1078 |       "outputs": []
1079 |     },
1080 |     {
1081 |       "cell_type": "markdown",
1082 |       "metadata": {
1083 |         "id": "wOtlDcthFxip",
1084 |         "colab_type": "text"
1085 |       },
1086 |       "source": [
1087 |         "# Model building\n",
1088 |         "\n",
1089 |         "We are finally at the model building part. But before we do that, we need to familiarize ourselves with a few terms which are required prior to building the model.\n",
1090 |         "\n",
1091 |         "**Return Sequences = True**: When the return sequences parameter is set to True, the LSTM returns its hidden state (output) for every timestep instead of only the last one\n",
1092 |         "\n",
1093 |         "**Return State = True**: When return state is set to True, the LSTM additionally returns the hidden state and cell state of the last timestep\n",
1094 |         "\n",
1095 |         "**Initial State**: This is used to initialize the internal states of the LSTM for the first timestep\n",
1096 |         "\n",
1097 |         "**Stacked LSTM**: A stacked LSTM has multiple LSTM layers stacked on top of each other. \n",
1098 |         "This can lead to a better representation of the sequence. I encourage you to experiment with different numbers of stacked LSTM layers (it’s a great way to learn this)\n",
1099 |         "\n",
1100 |         "Here, we are building a 3-layer stacked LSTM for the encoder:"
1101 |       ]
1102 |     },
1103 |     {
1104 |       "cell_type": "code",
1105 |       "metadata": {
1106 |         "trusted": true,
1107 |         "id": "zXef38nBFxir",
1108 |         "colab_type": "code",
1109 |         "colab": {},
1110 |         "outputId": "7ae99521-46f8-4c6f-9cba-4979deffeee8"
1111 |       },
1112 |       "source": [
1113 |         "from keras import backend as K \n",
1114 |         "K.clear_session()\n",
1115 |         "\n",
1116 |         "latent_dim = 300\n",
1117 |         "embedding_dim=100\n",
1118 |         "\n",
1119 |         "# Encoder\n",
1120 |         "encoder_inputs = Input(shape=(max_text_len,))\n",
1121 |         "\n",
1122 |         "#embedding layer\n",
1123 |         "enc_emb =  Embedding(x_voc, embedding_dim,trainable=True)(encoder_inputs)\n",
1124 |         "\n",
1125 |         "#encoder lstm 1\n",
1126 |         "encoder_lstm1 = LSTM(latent_dim,return_sequences=True,return_state=True,dropout=0.4,recurrent_dropout=0.4)\n",
1127 |         "encoder_output1, state_h1, state_c1 = encoder_lstm1(enc_emb)\n",
1128 |         "\n",
1129 |         "#encoder lstm 2\n",
1130 |         "encoder_lstm2 = LSTM(latent_dim,return_sequences=True,return_state=True,dropout=0.4,recurrent_dropout=0.4)\n",
1131 |         "encoder_output2, state_h2, state_c2 = encoder_lstm2(encoder_output1)\n",
1132 |         "\n",
1133 |         "#encoder lstm 3\n",
1134 |         "encoder_lstm3=LSTM(latent_dim, return_state=True, return_sequences=True,dropout=0.4,recurrent_dropout=0.4)\n",
1135 |         "encoder_outputs, state_h, state_c= encoder_lstm3(encoder_output2)\n",
1136 |         "\n",
1137 |         "# Set up the decoder, using the final encoder states (state_h, state_c) as initial state.\n",
1138 |         "decoder_inputs = Input(shape=(None,))\n",
1139 |         "\n",
1140 |         "#embedding layer\n",
1141 |         "dec_emb_layer = Embedding(y_voc, embedding_dim,trainable=True)\n",
1142 |         "dec_emb = dec_emb_layer(decoder_inputs)\n",
1143 |         "\n",
1144 |         "decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True,dropout=0.4,recurrent_dropout=0.2)\n",
1145 |         "decoder_outputs,decoder_fwd_state, decoder_back_state = decoder_lstm(dec_emb,initial_state=[state_h, state_c])\n",
1146 |         "\n",
1147 |         "# Attention layer\n",
1148 |         "attn_layer = AttentionLayer(name='attention_layer')\n",
1149 |         "attn_out, attn_states = attn_layer([encoder_outputs, decoder_outputs])\n",
1150 |         "\n",
1151 |         "# Concatenate the attention output with the decoder LSTM output\n",
1152 |         "decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_outputs, attn_out])\n",
1153 |         "\n",
1154 |         "#dense layer\n",
1155 |         "decoder_dense =  TimeDistributed(Dense(y_voc, activation='softmax'))\n",
1156 |         "decoder_outputs = decoder_dense(decoder_concat_input)\n",
1157 |         "\n",
1158 |         "# Define the model \n",
1159 |         "model = Model([encoder_inputs, decoder_inputs], decoder_outputs)\n",
1160 |         "\n",
1161 |         "model.summary() "
1162 |       ],
1163 |       "execution_count": 0,
1164 |       "outputs": [
1165 |         {
1166 |           "output_type": "stream",
1167 |           "text": [
1168 |             "WARNING:tensorflow:From /opt/conda/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.\n",
1169 |             "Instructions for updating:\n",
1170 |             "Colocations handled automatically by placer.\n",
1171 |             "WARNING:tensorflow:From /opt/conda/lib/python3.6/site-packages/tensorflow/python/keras/backend.py:4010: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.\n",
1172 |             "Instructions for updating:\n",
1173 |             "Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.\n",
1174 |             "__________________________________________________________________________________________________\n",
1175 |             "Layer (type)                    Output Shape         Param #     Connected to                     \n",
1176 |             "==================================================================================================\n",
1177 |             "input_1 (InputLayer)            (None, 30)           0                                            \n",
1178 |             "__________________________________________________________________________________________________\n",
1179 |             "embedding (Embedding)           (None, 30, 100)      844000      input_1[0][0]                    \n",
1180 |             "__________________________________________________________________________________________________\n",
1181 |             "lstm (LSTM)                     [(None, 30, 300), (N 481200      embedding[0][0]                  \n",
1182 |             "__________________________________________________________________________________________________\n",
1183 |             "input_2 (InputLayer)            (None, None)         0                                            \n",
1184 |             "__________________________________________________________________________________________________\n",
1185 |             "lstm_1 (LSTM)                   [(None, 30, 300), (N 721200      lstm[0][0]                       \n",
1186 |             "__________________________________________________________________________________________________\n",
1187 |             "embedding_1 (Embedding)         (None, None, 100)    198900      input_2[0][0]                    \n",
1188 |             "__________________________________________________________________________________________________\n",
1189 |             "lstm_2 (LSTM)                   [(None, 30, 300), (N 721200      lstm_1[0][0]                     \n",
1190 |             "__________________________________________________________________________________________________\n",
1191 |             "lstm_3 (LSTM)                   [(None, None, 300),  481200      embedding_1[0][0]                \n",
1192 |             "                                                                 lstm_2[0][1]                     \n",
1193 |             "                                                                 lstm_2[0][2]                     \n",
1194 |             "__________________________________________________________________________________________________\n",
1195 |             "attention_layer (AttentionLayer [(None, None, 300),  180300      lstm_2[0][0]                     \n",
1196 |             "                                                                 lstm_3[0][0]                     \n",
1197 |             "__________________________________________________________________________________________________\n",
1198 |             "concat_layer (Concatenate)      (None, None, 600)    0           lstm_3[0][0]                     \n",
1199 |             "                                                                 attention_layer[0][0]            \n",
1200 |             "__________________________________________________________________________________________________\n",
1201 |             "time_distributed (TimeDistribut (None, None, 1989)   1195389     concat_layer[0][0]               \n",
1202 |             "==================================================================================================\n",
1203 |             "Total params: 4,823,389\n",
1204 |             "Trainable params: 4,823,389\n",
1205 |             "Non-trainable params: 0\n",
1206 |             "__________________________________________________________________________________________________\n"
1207 |           ],
1208 |           "name": "stdout"
1209 |         }
1210 |       ]
1211 |     },
1212 |     {
1213 |       "cell_type": "markdown",
1214 |       "metadata": {
1215 |         "id": "0ZVlfRuMUcoP",
1216 |         "colab_type": "text"
1217 |       },
1218 |       "source": [
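1219 |         "I am using sparse categorical cross-entropy as the loss function because it accepts the integer-encoded target sequence directly, so the targets never have to be stored as one-hot vectors over the whole vocabulary. This avoids the memory cost of one-hot encoding every target token."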
1220 |       ]
1221 |     },
1222 |     {
1223 |       "cell_type": "code",
1224 |       "metadata": {
1225 |         "trusted": true,
1226 |         "id": "Lwfi1Fm8Fxiz",
1227 |         "colab_type": "code",
1228 |         "colab": {}
1229 |       },
1230 |       "source": [
1231 |         "model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')"
1232 |       ],
1233 |       "execution_count": 0,
1234 |       "outputs": []
1235 |     },
1236 |     {
1237 |       "cell_type": "markdown",
1238 |       "metadata": {
1239 |         "id": "p0ykDbxfUhyw",
1240 |         "colab_type": "text"
1241 |       },
1242 |       "source": [
1243 |         "Remember the concept of early stopping? It stops training the neural network at the right time by monitoring a user-specified metric. Here, I am monitoring the validation loss (val_loss). With patience=2, training stops once the validation loss has failed to improve for 2 consecutive epochs:\n"
1244 |       ]
1245 |     },
1246 |     {
1247 |       "cell_type": "code",
1248 |       "metadata": {
1249 |         "id": "s-A3J92MUljB",
1250 |         "colab_type": "code",
1251 |         "colab": {}
1252 |       },
1253 |       "source": [
1254 |         "es = EarlyStopping(monitor='val_loss', mode='min', verbose=1,patience=2)"
1255 |       ],
1256 |       "execution_count": 0,
1257 |       "outputs": []
1258 |     },
1259 |     {
1260 |       "cell_type": "markdown",
1261 |       "metadata": {
1262 |         "id": "Mw6CVECaUq5b",
1263 |         "colab_type": "text"
1264 |       },
1265 |       "source": [
1266 |         "We’ll train the model with a batch size of 128 and validate it on the holdout set (which is 10% of our dataset):"
1267 |       ]
1268 |     },
1269 |     {
1270 |       "cell_type": "code",
1271 |       "metadata": {
1272 |         "trusted": true,
1273 |         "id": "ETnPzA4OFxi3",
1274 |         "colab_type": "code",
1275 |         "colab": {},
1276 |         "outputId": "477e374f-7cf2-4d60-f86e-2c49c9cebedb"
1277 |       },
1278 |       "source": [
1279 |         "history=model.fit([x_tr,y_tr[:,:-1]], y_tr.reshape(y_tr.shape[0],y_tr.shape[1], 1)[:,1:] ,epochs=50,callbacks=[es],batch_size=128, validation_data=([x_val,y_val[:,:-1]], y_val.reshape(y_val.shape[0],y_val.shape[1], 1)[:,1:]))"
1280 |       ],
1281 |       "execution_count": 0,
1282 |       "outputs": [
1283 |         {
1284 |           "output_type": "stream",
1285 |           "text": [
1286 |             "Train on 41346 samples, validate on 4588 samples\n",
1287 |             "WARNING:tensorflow:From /opt/conda/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n",
1288 |             "Instructions for updating:\n",
1289 |             "Use tf.cast instead.\n",
1290 |             "Epoch 1/50\n",
1291 |             "41346/41346 [==============================] - 85s 2ms/sample - loss: 2.8152 - val_loss: 2.5780\n",
1292 |             "Epoch 2/50\n",
1293 |             "41346/41346 [==============================] - 79s 2ms/sample - loss: 2.4859 - val_loss: 2.4072\n",
1294 |             "Epoch 3/50\n",
1295 |             "41346/41346 [==============================] - 81s 2ms/sample - loss: 2.3259 - val_loss: 2.3232\n",
1296 |             "Epoch 4/50\n",
1297 |             "41346/41346 [==============================] - 80s 2ms/sample - loss: 2.2281 - val_loss: 2.2534\n",
1298 |             "Epoch 5/50\n",
1299 |             "41346/41346 [==============================] - 79s 2ms/sample - loss: 2.1604 - val_loss: 2.1862\n",
1300 |             "Epoch 6/50\n",
1301 |             "41346/41346 [==============================] - 80s 2ms/sample - loss: 2.1065 - val_loss: 2.1549\n",
1302 |             "Epoch 7/50\n",
1303 |             "41346/41346 [==============================] - 80s 2ms/sample - loss: 2.0616 - val_loss: 2.1177\n",
1304 |             "Epoch 8/50\n",
1305 |             "41346/41346 [==============================] - 80s 2ms/sample - loss: 2.0202 - val_loss: 2.0992\n",
1306 |             "Epoch 9/50\n",
1307 |             "41346/41346 [==============================] - 79s 2ms/sample - loss: 1.9835 - val_loss: 2.0822\n",
1308 |             "Epoch 10/50\n",
1309 |             "41346/41346 [==============================] - 80s 2ms/sample - loss: 1.9476 - val_loss: 2.0636\n",
1310 |             "Epoch 11/50\n",
1311 |             "41346/41346 [==============================] - 80s 2ms/sample - loss: 1.9145 - val_loss: 2.0606\n",
1312 |             "Epoch 12/50\n",
1313 |             "41346/41346 [==============================] - 79s 2ms/sample - loss: 1.8826 - val_loss: 2.0672\n",
1314 |             "Epoch 13/50\n",
1315 |             "41346/41346 [==============================] - 79s 2ms/sample - loss: 1.8553 - val_loss: 2.0444\n",
1316 |             "Epoch 14/50\n",
1317 |             "41346/41346 [==============================] - 80s 2ms/sample - loss: 1.8267 - val_loss: 2.0422\n",
1318 |             "Epoch 15/50\n",
1319 |             "41346/41346 [==============================] - 80s 2ms/sample - loss: 1.7980 - val_loss: 2.0456\n",
1320 |             "Epoch 16/50\n",
1321 |             "41346/41346 [==============================] - 79s 2ms/sample - loss: 1.7745 - val_loss: 2.0409\n",
1322 |             "Epoch 17/50\n",
1323 |             "41346/41346 [==============================] - 79s 2ms/sample - loss: 1.7518 - val_loss: 2.0374\n",
1324 |             "Epoch 18/50\n",
1325 |             "41346/41346 [==============================] - 80s 2ms/sample - loss: 1.7299 - val_loss: 2.0434\n",
1326 |             "Epoch 19/50\n",
1327 |             "41346/41346 [==============================] - 80s 2ms/sample - loss: 1.7070 - val_loss: 2.0398\n",
1328 |             "Epoch 00019: early stopping\n"
1329 |           ],
1330 |           "name": "stdout"
1331 |         }
1332 |       ]
1333 |     },
1334 |     {
1335 |       "cell_type": "markdown",
1336 |       "metadata": {
1337 |         "id": "0ezKYOp2UxG5",
1338 |         "colab_type": "text"
1339 |       },
1340 |       "source": [
1341 |         "#Understanding the Diagnostic plot\n",
1342 |         "\n",
1343 |         "Now, we will plot the training and validation loss per epoch to understand the behavior of the model over time:"
1344 |       ]
1345 |     },
1346 |     {
1347 |       "cell_type": "code",
1348 |       "metadata": {
1349 |         "trusted": true,
1350 |         "id": "tDTNLAURFxjE",
1351 |         "colab_type": "code",
1352 |         "colab": {},
1353 |         "outputId": "e2ea6e44-3931-4014-97a1-03fa2a441228"
1354 |       },
1355 |       "source": [
1356 |         "from matplotlib import pyplot\n",
1357 |         "pyplot.plot(history.history['loss'], label='train')\n",
1358 |         "pyplot.plot(history.history['val_loss'], label='test')\n",
1359 |         "pyplot.legend()\n",
1360 |         "pyplot.show()"
1361 |       ],
1362 |       "execution_count": 0,
1363 |       "outputs": [
1364 |         {
1365 |           "output_type": "display_data",
1366 |           "data": {
1367 |             "text/plain": [
1368 |               "<Figure size 432x288 with 1 Axes>"
1369 |             ],
1370 |             "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAD8CAYAAACMwORRAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAIABJREFUeJzt3Xd8VFX+//HXSU9ISEglIaTQQoIQSmiCGkSKoKhfFdfeVlZd96H7df3qupZ11993i7t+d3VXWQu2dW3YEBvFICg1IBFIIZBCQkI6qaSf3x93AiGkTGAyLZ/n4zGPmcw9M/PJMLzn5Nxzz1Vaa4QQQjgXF1sXIIQQwvIk3IUQwglJuAshhBOScBdCCCck4S6EEE5Iwl0IIZyQhLsQQjghCXchhHBCEu5CCOGE3Gz1wsHBwTomJsZWLy+EEA5p9+7d5VrrkL7a2SzcY2JiSE1NtdXLCyGEQ1JK5ZvTToZlhBDCCUm4CyGEE5JwF0IIJ2SzMXchhDgbLS0tFBYW0tjYaOtSBpSXlxeRkZG4u7uf1eMl3IUQDqWwsBA/Pz9iYmJQStm6nAGhtaaiooLCwkJiY2PP6jlkWEYI4VAaGxsJCgpy2mAHUEoRFBR0Tn+dSLgLIRyOMwd7h3P9HfsMd6XUSKVUilIqXSl1QCl1fzdt/JVSnyml0kxtbj+nqnpxsKSW369Np6m1baBeQgghHJ45PfdW4EGtdQIwC/i5UiqhS5ufA+la60QgGfirUsrDopWaFFY18Op3uezMrRyIpxdCiF4dP36cF154od+PW7JkCcePHx+AirrXZ7hrrYu11ntMt2uBDGBE12aAnzL+jvAFKjG+FCxu9qhgPNxcSMksG4inF0KIXvUU7q2tvUfeF198QUBAwECVdYZ+jbkrpWKAKcCOLpv+AcQDRcA+4H6tdbsF6juDt4crs0YFselg6UA8vRBC9OqRRx7h8OHDTJ48menTp3PBBRewbNkyEhKMAY0rr7ySadOmMWHCBF566aWTj4uJiaG8vJy8vDzi4+O56667mDBhAgsXLuTEiRMWr9PsqZBKKV/gQ+ABrXVNl82LgL3AxcBoYL1SakvXdkqpFcAKgKioqLMuel5cCE99lk5+RT3RQUPO+nmEEI7tqc8OkF7UNY7OTULEUJ68fEKP2//4xz+yf/9+9u7dy6ZNm1i6dCn79+8/OWVx1apVBAYGcuLECaZPn87VV19NUFDQac+RnZ3NO++8w8svv8zy5cv58MMPuemmmyz6e5jVc1dKuWME+9ta64+6aXI78JE2HAJygfFdG2mtX9JaJ2mtk0JC+lzUrEfz4kIB2JQlQzNCCNuaMWPGaXPRn3vuORITE5k1axYFBQVkZ2ef8ZjY2FgmT54MwLRp08jLy7N4XX323E3j6K8CGVrrZ3todgSYD2xRSoUBcUCOxarsIiZ4CLHBQ9iUVcqt58cM1MsIIexcbz1saxky5NTowaZNm9iwYQPbtm3Dx8eH5OTkbueqe3p6nrzt6upqs2GZOcDNwD6l1F7TfY8CUQBa65XA74HXlVL7AAU8rLUut3i1nVw0LoR3dh6hsaUNL3fXgXwpIYQ4yc/Pj9ra2m63VVdXM2zYMHx8fMjMzGT79u1Wru6UPsNda/0dRmD31qYIWGiposwxb3wor2/NY1tOxclhGiGEGGhBQUHMmTOH8847D29vb8LCwk5uW7x4MStXriQ+Pp64uDhmzZplszoddm2ZmbGBeLm78G1WmYS7EMKq/vOf/3R7v6enJ19++WW32zrG1YODg9m/f//J+3/1q19ZvD5w4OUHvNxdOX90MN9klqK1tnU5QghhVxw23MGYEnmksoHc8npblyKEEHbFocM9WaZECiFEtxw63EcG+jA6ZAgpWXK0qhBCdObQ4Q7GAU07cippaB6QpWyEEMIhOX64jw+lua2dbYcrbF2KEELYDYc
P96SYYfh4uMrQjBDCKs52yV+Av/3tbzQ0NFi4ou45fLh7urkyZ0wwKZllMiVSCDHgHCXcHfYgps7mxYWyPr2EQ6V1jA3zs3U5Qggn1nnJ3wULFhAaGsr7779PU1MTV111FU899RT19fUsX76cwsJC2traePzxxykpKaGoqIh58+YRHBxMSkrKgNbpFOGeHGesMLkpq0zCXYjB5MtH4Ng+yz7n8Ilw6R973Nx5yd9169axevVqdu7cidaaZcuWsXnzZsrKyoiIiODzzz8HjDVn/P39efbZZ0lJSSE4ONiyNXfD4YdlACICvIkL85NxdyGEVa1bt45169YxZcoUpk6dSmZmJtnZ2UycOJH169fz8MMPs2XLFvz9/a1em1P03AGSx4ew6rtc6ppa8fV0ml9LCNGbXnrY1qC15te//jU/+9nPzti2Z88evvjiCx577DHmz5/PE088YdXanKLnDsa4e0ub5vtDA7rSsBBikOu85O+iRYtYtWoVdXV1ABw9epTS0lKKiorw8fHhpptu4qGHHmLPnj1nPHagOU0Xd1r0MPw83diUVcqiCcNtXY4Qwkl1XvL30ksv5YYbbmD27NkA+Pr68u9//5tDhw7x0EMP4eLigru7Oy+++CIAK1asYPHixURERAz4DlVlq+mDSUlJOjU11aLPec+/d/PDkeNs+/XFGCeQEkI4m4yMDOLj421dhlV097sqpXZrrZP6eqzTDMuAMTRzrKaRrBLr/NkjhBD2yqnC/SLTlMiUTFklUggxuDlVuIcN9SIhfKhMiRTCyQ2Go9HP9Xd0qnAHmDc+hN35VdQ0tti6FCHEAPDy8qKiosKpA15rTUVFBV5eXmf9HE4zW6ZDclwo/0w5zHfZ5SyZGG7rcoQQFhYZGUlhYSFlZc49/Orl5UVkZORZP97pwn3KyACGermRklkq4S6EE3J3dyc2NtbWZdg9pxuWcXN14cJxIWw6KKtECiEGL6cLdzCGZspqmzhQVGPrUoQQwiacMtwvGtexSqTMmhFCDE5OGe4hfp5MivRnU5Zz73ARQoieOGW4gzE0s+dIFccbmm1dihBCWF2f4a6UGqmUSlFKpSulDiil7u+hXbJSaq+pzbeWL7V/kuNCaNewOVtWiRRCDD7m9NxbgQe11gnALODnSqmEzg2UUgHAC8AyrfUE4FqLV9pPiZEBDPNxZ1OmjLsLIQafPsNda12std5jul0LZAAjujS7AfhIa33E1M7mierqorhoXAjfHiyjvV2mRAohBpd+jbkrpWKAKcCOLpvGAcOUUpuUUruVUrdYprxzkxwXSkV9M/uOVtu6FCGEsCqzw10p5Qt8CDygte46gdwNmAYsBRYBjyulxnXzHCuUUqlKqVRrHDp84bgQlEIWEhNCDDpmhbtSyh0j2N/WWn/UTZNC4Gutdb3WuhzYDCR2baS1fklrnaS1TgoJCTmXus0SOMSDySMDZEqkEGLQMWe2jAJeBTK01s/20OxTYK5Syk0p5QPMxBibt7z2dsjdbHbz5HGhpBUep6KuaUDKEUIIe2ROz30OcDNwsWmq416l1BKl1N1KqbsBtNYZwFfAj8BO4BWt9f4BqfiHt+CNyyF3i1nN540PQWvYnC29dyHE4NHnqpBa6++APk9IqrV+BnjGEkX1atJy+PZPsO4xuCsFXHr/fjovwp9gXw82ZZVx1ZSzXz5TCCEcieMdoeruDfOfgOK9sH91n81dXBQXmqZEtsmUSCHEIOF44Q4wcTmEJ8LG30HLiT6bz4sL5XhDC3sLjluhOCGEsD3HDHcXF1j4NFQXwI6VfTa/cGwILgq+lSmRQohBwjHDHSD2Qhh3KWx5Fup7Xz/G38edadHDSJEpkUKIQcJxwx1gwVPQXG/sYO1Dclwo+45WU1rbaIXChBDCthw73EPiYNptkLoKyrN7bZocZxw0tfmgrBIphHB+jh3uAMm/Bjdv2PDbXpslhA8l1M9TliIQQgwKjh/uviEw9wHIXAt53/fYTCl
FclwIWw6W0drWbsUChRDC+hw/3AFm3QtDRxgHNrX3HNzz4kKpaWzlB5kSKYRwcs4R7h4+cPHjULQHDnS3rplhzthg3FwUKXICDyGEk3OOcAeYdB0MnwgbnoKW7mfEDPWSKZFCiMHBecLdxQUW/j+oPgI7/9Vjs3njQ8koruFYtUyJFEI4L+cJd4BRF8HYRbD5r1Bf0W2TeXGhAHx7UIZmhBDOy7nCHWDB76C5Fjb/udvN48J8Cff3IiVThmaEEM7L+cI9dDxMvRV2vQLlh87YbEyJDOW7Q+U0trTZoEAhhBh4zhfuYDqwyQs2PNnt5qunjqCuqZWV3x62cmFCCGEdzhnufmEwx3RgU/7WMzYnxQRyeWIEL246TEFlgw0KFEKIgeWc4Q4w++fgFw5f/6bbA5seXTIeVxfF79em26A4IYQYWM4b7n0c2BTu780vLh7LuvQSvj0oO1eFEM7FecMdIPEnENbzgU13zI0hNngIT605QHOrrDcjhHAezh3uLq6w6GnTgU0vnbHZ082VJy9PIKe8nlXf59qgQCGEGBjOHe4Ao5Jh7ELY/BdoqDxjc3JcKAsSwnh+Y7YctSqEcBrOH+5w6sCmb7s/sOnxpQm0tGv+8GWGlQsTQoiBMTjCPTQept4Cu16GijPntkcF+XD3RaP5dG8RO3K6X7ZACCEcyeAId4DkR8HVs8czNt1z0WhGBHjz5JoDcjIPIYTDGzzh7hdmnLEpYw0c2X7GZm8PVx6/LJ7MY7W8veOIDQoUQgjLGTzhDqcf2KT1GZsXTRjOBWOD+eu6LCrqmmxQoBBCWEaf4a6UGqmUSlFKpSulDiil7u+l7XSlVKtS6hrLlmkhHkPg4sfgaCoc+PiMzUopnrx8Ag3NbTzzdZYNChRCCMswp+feCjyotU4AZgE/V0oldG2klHIF/gSss2yJFpZ4PYSdZywqVl9+xuYxob7cMTeW91IL2CvnWhVCOKg+w11rXay13mO6XQtkACO6afoL4EPAvs+C4eIKS/4CdaXw6gKozDmjyS8uHkOIrydPfrqf9vYzh2+EEMLe9WvMXSkVA0wBdnS5fwRwFfBiH49foZRKVUqllpXZcD2X6Nlw62dw4ji8sgAKd5+22c/LnV8vGU9aYTWrdxfaqEghhDh7Zoe7UsoXo2f+gNa6psvmvwEPa617nUOotX5Ja52ktU4KCQnpf7WWNHIG3LneGId/fSlkfXXa5isnj2B6zDD+9FUm1Q0tNipSCCHOjlnhrpRyxwj2t7XWZy6xCEnAu0qpPOAa4AWl1JUWq3KgBI+Bn26AkDh493pIfe3kJqUUv102gaqGZv5vw0EbFimEEP1nzmwZBbwKZGitn+2ujdY6Vmsdo7WOAVYD92qtP7FopQPFNxRu+xxGz4e1D8A3T5+cJjkhwp8bZ0bz5rY8Moq7/rEihBD2y5ye+xzgZuBipdRe02WJUupupdTdA1yfdXj6wvXvwpSbYfMz8Mm90GYMxTy4cBz+3u48ueYAupu58UIIYY/c+mqgtf4OUOY+odb6tnMpyGZc3WDZ8+AfCZv+AHXHYPmbBPj48dCi8Tz68T7WpBVxxeTuJgoJIYR9GVxHqPZFKUh+BJb9A3K+hdeWQO0xrps+kokj/PnfLzKob2q1dZVCCNEnCffuTL0ZbnjPWEHylQW4VmTz1BUTKKlp4vlvDtm6OiGE6JOEe0/GLoDb1kLrCVi1kKlkcc20SF79LofDZXW2rk4IIXol4d6bEVONufDegfDGMh4ffQgvN1ee+ixddq4KIeyahHtfAmONgA9PxH/NnbwUt5vNB8tYn15i68qEEKJHEu7mGBIEt66B8UuZffBPPDP0A37/2X4aW9psXZkQQnRLwt1c7t6w/E2YfhfXNn/MQ/V/4eUUOeeqEMI+Sbj3h4srLHkGLvkty1y3MfO7n7I1Ld3WVQkhxBkk3PtLKZj7SxouW8kklxzGfbSYA1vOPPGHEELYkoT7WfJJup7G2zdS6xrAhI2
3Ubz64ZNLFgghhK1JuJ+DgOiJ+N63mc/cFxG+fyX1KxdAVb6tyxJCCAn3cxUSGMD0+97kt56/or0sk7YX50L6p7YuSwgxyEm4W8Bwfy/uuudX3OHxLBnNIfD+LbD2v6Gl0dalCSEGKQl3CxkR4M1ffnYFd7v/L2+qZZD6KrwyH8rkRB9CCOuTcLeg6KAhvLFiLs+53sr9rr+hrboIXroIfnj75AlAhBDCGiTcLWx0iC//uWsmW5jCf+lnaAybAp/eCx+tgKZaW5cnhBgkJNwHwLgwP966cwa5TX4sqvgltbP/B/avhn9dCEV7bV2eEGIQkHAfIBMi/HnrzplUNLSzbN8cqq792NjB+uoC2L5ShmmEEANKwn0AJY4M4I07plNS08jyrxSVt3xjnIj7q4fh3RugodLWJQohnJSE+wCbFh3IqtumU1DVwI3/OcTxK96AxX+E7PWwci7kb7V1iUIIJyThbgWzRgXx8i1JHC6r45bXdlEz+afw0/Xg5gmvL4VvnpalC4QQFiXhbiUXjA3hxRunklFcw22rdlIXNBF+thkSr4fNz8CqRcY5W4UQwgIk3K1ofnwYz18/hbTCau58fRcnlA9c+QJc+zpUHIKVF8Cet2RnqxDinEm4W9ni88L5v+smsyuvkrveTDXO5jThKrhnq3HO1jX3GcsXyM5WIcQ5kHC3gWWJEfz5mkS+O1TOXW+mUlnfDP6RcMsaWPA7yPoSXjwfDqfYulQhhIOScLeRa6ZF8udrJrEjp5JFf9vMluwycHGBOffDXRvB0w/euhK+/g20Ntm6XCGEg+kz3JVSI5VSKUqpdKXUAaXU/d20uVEp9aNSap9SaqtSKnFgynUuy5NG8ul9cwjwdufmV3fy9Np0mlrbIDwRVnwL038K2/4BL8+HUjlfqxDCfOb03FuBB7XWCcAs4OdKqYQubXKBi7TWE4HfAy9ZtkznFR8+lM9+MZdbZkfzyne5XPnPrWSX1IKHDyz9K1z/HtQWw0vJsOMl2dkqhDBLn+GutS7WWu8x3a4FMoARXdps1VpXmX7cDkRaulBn5uXuyu+uOI9Xb02itKaRy57/jre256O1hrjFcO82iLkAvnwI/rMc6kptXbIQws71a8xdKRUDTAF29NLsTuDLsy9p8JofH8aXD1zArFFBPP7Jfu56M5WKuibwDYUbP4BLn4HczfDCbMj6ytblCiHsmNnhrpTyBT4EHtBa1/TQZh5GuD/cw/YVSqlUpVRqWVnZ2dTr9EL9vHjttuk8cVkCmw+Ws/jvW9h8sAyUgpkrYMUm8BsO71xnnO2pucHWJQsh7JDSZozhKqXcgbXA11rrZ3toMwn4GLhUa93n6YeSkpJ0ampqP8sdXDKKa7j/3R84WFLHHXNi+Z/FcXi5uxqzZzb+ztjZGjwOLv87RM02vgCEEE5NKbVba53UZ7u+wl0ppYA3gEqt9QM9tIkCvgFu0VqbtRKWhLt5Glva+MMXGbyxLZ/xw/14/vopjA3zMzYeToFP7jF2uPpHQcIy44CoEdMk6IVwUpYM97nAFmAf0G66+1EgCkBrvVIp9QpwNZBv2t7a14tLuPfPN5klPPTBj9Q1tfLY0nhumhWNUgoaqyHjM0j/1Aj79hbwHwkJVxiXEUnG/HkhhFOwWLgPFAn3/iurbeKh1Wlsyipj/vhQ/nTNJIJ9PU81OFFlHN164BM4/I0R9ENHmIL+SoicLkEvhIOTcHdSWmte35rHH77MZKiXO39dnshF40LObHjiOBz8yhT0G6GtGfwijKGbhCth5EwJeiEckIS7k8s8VsP97+wlq6SWm2dF8+DCcQT4eHTfuLEaDn5tBP2hDdDWBH7hEL8MJnQEvat1fwEhxFmRcB8EGlva+PNXWby+NRc/L3d+eclYbpwVjbtrLz3yxhrIXgcHPjaCvrURfMPgvKth6i0QGm+9X0AI0W8S7oNI1rFanv48nS3Z5YwOGcJjSxNIjgsxdrj2pqnW1KP/2Lhub4HIGUb
IT7gKPH2t8wsIIcwm4T7IaK1JySrl6bUZ5JTXc+G4EB5bGs+4jmmTfakvh7R3Yc8bUH4QPPxgoqk3HzFVplYKYSck3AeplrZ23tqWz982HKS+uY0bZkTxywXjCBzSw3h8V1pDwQ7Y/YbRo289AWETjZCfdC14DxvYX0AI0SsJ90Guqr6Zv2/M5q3t+fh4uHL//LHcMjsGD7d+zJBprIZ9q43efHEauHkZ0yqn3grR50tvXggbkHAXABwqreXpzzPYlFVGbPAQHl0SzyXxoX2Px3dVtBf2vAn7PoCmGggaY/TmE683FjYTQliFhLs4zaasUp7+PINDpXXMGRPEY0sTiA8f2v8nam6A9E+MoD+yDVzcIG6J0ZsflQyubpYuXQjRiYS7OENLWzvv7DzCs+sPUnOiheumR/HgwnGnH+XaH2VZRsinvQMNFeDmDcPPM84k1XEJiQc3M8f7hRB9knAXPapuaOG5b7J5Y2se3u6u3HfxGG6bE4On21keyNTaZEylPLLdGJs/9qMxdAPg4g5hCafCfngihE0wzjQlhOg3CXfRp5yyOv73iww2ZJQS7u/FXReM4iczRuLjcY5DK+3tUJVrBH3ny4lKY7tygeC403v4wyeC11kMEwkxyEi4C7N9f6ic5zZmsyO3kmE+7tw+J5ZbZ8fg7+NuuRfRGqoLzwz8umOn2gSONoZ1ws4zevehCRAQLWvgCNGJhLvot935lbyQcpiNmaUM8XDlplnR3Dk3ltChXgP3orUlxjBO8V5jRk7JAaPX38HD1wj5sAmnLqEJ4B0wcDUJYcck3MVZyzxWw4ubDvNZWhFuri5cMy2Suy8cTVSQlcbJm+qgLBNK9hthX5Ju3G48fqqN/8hTQR82wejtB42R2TrC6Um4i3OWX1HPvzbnsDq1kNb2di5PjOCe5NGMH26DsXGtoabICPvSA6bQP2AsldDearRx9YSQccZO2/BEiJhs2nk7xPr1CjFAJNyFxZTWNPLqd7n8e3s+9c1tzB8fyr3zRjMtOtDWpUFrsxHwJQdO9fSL06Ch3NiuXIzzzIYnQvhk2XkrHJ6Eu7C46oYW3tiWx2vf51LV0MLM2EDunTeGC8cG9/+I14HU0cs/beftXuNcsx0CRxs9+5OzdSaBjx18WXVoazGmmMrKnKILCXcxYBqaW3lnZwGvbMmhuLqR80YM5Z6LxrD4vOG4uthRyHfVeedtcRoUpUH1kVPbA6JPBb13gLGWjpsXuHmaee11+sweraG5zjj94YnjpusqY99Bx+0z7jf93FxnPEfwOGMdn6jzIXo2BERZ9z0TdkfCXQy45tZ2PvnhKCu/PUxOeT3RQT7cdn4M1yaNxNfTQXZsNlSe3rsvToPKnLN/Phd3I+Rd3Yz18jv2B/TU1ntYp0vAqdteAcbCbIW7jIPDOg4K8x9pCvvZED0HgsfKAm6DjIS7sJq2ds3XB46x6rtcUvOr8PN049qkkdx2foz1ZthYUnMDNNcbZ6lqbep0faKb+7q5bmk0TnziOfTMwO4c5O4+5gVze5uxLyF/KxzZCvnboL7U2OYTbPToo843Qn/4RDllopOTcBc2kVZwnNe+z2Xtj8W0ac2C+DDumBvLzNhA+xqXd2RaQ8VhU9CbLsfzjW0efjByhhH00ecbJ1pxH8DjFLrW1dYMLaYvwc7XaOOYBQ9fY/aSh69MWz1LEu7CpkpqGnlrWz5v78inqqGFhPCh3DE3lssTw89+DRvRs+qjxiqdHWFflnFqm3IBVw/j4uJ26rare5frjttd7lcupwd1d+HdOcTN5eZ1Kuw9/U6Ffk8/u/uAu3eni4/xHF3vd/M276jmjn0ijdXGvo7G6m4uXe83HWtx2mt7dfrZu5v7Ov3cUd/QCBga3q9/4pP/nBLuwh40trTxyQ9Hee37PLJKagn29eDGmdHcNCuaEL+zXI1S9K2h0gj7kgPGcFF7izEDp63ZdGnpdN3D/e2m2+1tplDqFFRu3l2uvTq16eYaZQR
pc71x3VRn+tl0X7c/1xrXuq3/v7+r55mB7+5t7APpHNi6vffn8fA1htO8/E2Xocbv0nri1BfdyS+3BmNIrqWBPr/k5twPC37X/98LCXdhZ7TWbD1cwarvctmYWYqHqwuXJ0Zw+5wYzhvhb+vyhL3S2vhy6vhS6PzXQ0tDl59PdArdTkF7sk0DKFdjf8dpgW26eHe+L8DYZ3I2Q0cnh6caev4CCIiC0PFn9ZZIuAu7lVNWxxtb8/hgdyENzW3MjA3k9jmxLEgIs++plELYAQl3YfeqT7Tw/q4CXt+ax9HjJxgZ6M0NM6K5etoIQv2stBNQCAdjsXBXSo0E3gTCMAaSXtJa/71LGwX8HVgCNAC3aa339Pa8Eu6iQ2tbO+vTS3htax47cytxdVHMiwvluukjmRcXgpurLPkrRAdzw92cAaVW4EGt9R6llB+wWym1Xmud3qnNpcBY02Um8KLpWog+ubm6cOnEcC6dGM7hsjreTy3gw91H2ZBRQoifJ1dPjWR5UiSjQuRQfCHM1e9hGaXUp8A/tNbrO933L2CT1vod089ZQLLWuriHp5Geu+hVS1s7KZmlvJ9aSEpWKW3tmhkxgSyfPpIlE4ef+9mihHBQluy5d37SGGAKsKPLphFAQaefC0339RjuQvTG3dWFhROGs3DCcEprGvlwz1HeTy3gVx+k8ds1B7g8MYLrpo8kMdJfDo4Sohtmh7tSyhf4EHhAa11zNi+mlFoBrACIipIFkIR5Qod6cU/yaO6+aBS78qp4b1cBn/xwlHd2HiEuzI/l00dy1ZQRBA7xsHWpQtgNs4ZllFLuwFrga631s91sl2EZYVW1jS18llbMe6kFpBUcx91VsTBhONcmRXLB2BCZUimclsWGZUwzYV4FMroLdpM1wH1KqXcxdqRW9xbsQpwrPy93bpgZxQ0zo8g6Vst7uwr4+IdCPt9XTLi/F1dPjeSaaZHEBMtZmMTgZM5UyLnAFmAf0HGs7qNAFIDWeqXpC+AfwGKMqZC3a6177ZZLz11YWlNrGxvSS/lgdwGbD5bRrmFGTCDXJEWydGI4QxxlGWIheiEHMYlB7Vh1Ix/9UMjq1EJyyuvx8XBlycRwrp0WyQxZoVI4MAl3ITDWtNlzpIoPUgv5LK2I+uY2ooN8uGbs0LwPAAANzElEQVRqJFdPiyQiwNvWJQrRLxLuQnTR0NzKl/uO8cHuArbnVKIUzB0TzDXTIlk0YThe7rIUsbB/Eu5C9OJIRQOr9xTy4e5Cjh4/wVAvN5ZNjuDaaSOZJHPnhR2TcBfCDO3tmm05FXyQWsCX+4/R1NrOuDBfrpwygssnRTAy0AFPEyicmoS7EP1U09jC2rRiVu8uYM8R44w7U6MCuGLyCJZMDJeTiwi7IOEuxDkoqGxgTVoRn6UVkXmsFhcFc8YEsywxgkXnDWeol7utSxSDlIS7EBZysKSWNXuL+DTtKAWVJ/Bwc2FeXAjLEkcwPz5UdsQKq5JwF8LCtNbsLTjOmrQi1v5YTFltE76ebixMCOPyyRHMHROMu6w9LwaYhLsQA6itXbM9p4I1e4v4cn8xNY2tBA7xYMnE4SxLHEFS9DBcZH0bMQAk3IWwkqbWNr7NKmNNWhEbMkpobGknwt+LpZPCWTopQpYlFhYl4S6EDdQ3tbI+vYTP0orYnF1GS5tmRIA3l00KZ+mkcCaOkKAX50bCXQgbqz7Rwvr0Ej7/sYgt2eW0tmuiAn2MHv3EcCZEDJWgF/0m4S6EHTne0My69BI+/7GY7w8ZQR8T1BH0EcSH+0nQC7NIuAthp6rqm1mXfoy1Pxaz9XAFbe2aUcFDTGP04cSFSdCLnkm4C+EAKuub+frAMdb+WMS2wxW0axgdMoSlkyJYOjGccWG+EvTiNBLuQjiY8romvtp/jM9/LGZHrhH0kcO8mRcXSnJcCLNHB+HjISccGewk3IVwYKW1jaxPL2FTVhnfHyqnobkNDzcXZo0KYl5cCPPiQuUUgoOUhLsQTqK
ptY1duVWkZJWyKauUw2X1AMQE+ZAcF8q88aHMjA2UZRAGCQl3IZzUkYoGNh0sJSWzlK2HK2hqbcfL3YU5o4NJjgshOS5Ulip2YhLuQgwCjS1tbMup4NusMr7JLOVIZQMAY0J9Tw7fTI8NlDVvnIiEuxCDjNaa3PJ6UrLK2JRVyo6cSprb2vHzcuOicSFcEh9GclwIAT4eti5VnAMJdyEGufqmVr4/VM7GjFI2ZpZQXteMq4siKXoYl8SHcUlCGLGyU9bhSLgLIU5qb9ekFR5nY0YpGzJKyDxWC8CokCFG0MeHMTUqADcZvrF7Eu5CiB4VVDbwTaYR9NtzKmhp0wT4uDMvLpRL4sO4cFwwfnK2Kbsk4S6EMEttYwtbssvZkF5CSlYpVQ0tuLsqZsYGMT/eCHuZfWM/JNyFEP3W2tbOniPH2ZhRwoaMkpNz6scP92N+fCjz48OYHBkgJyKxIQl3IcQ5yy2vZ0O6EfSp+VW0tWuCfT25eLwx+2bu2GBZEsHKLBbuSqlVwGVAqdb6vG62+wP/BqIAN+AvWuvX+nphCXchHMvxhmY2ZZWxIaOEb7PKqG1qxcPNhTmjg7gkIYz548MY7u9l6zKdniXD/UKgDnizh3B/FPDXWj+slAoBsoDhWuvm3p5Xwl0Ix9Xc2s6uvEo2mIZvCipPADBxhP/JcXo5GcnAMDfc+/x7Smu9WSkV01sTwE8Z/4q+QCXQamadQggH5OHmwpwxwcwZE8wTlyWQXVrH+vQSNmaU8PeN2fxtQzbh/l4nx+lnjwqStW+szKwxd1O4r+2h5+4HrAHGA37AdVrrz3t4nhXACoCoqKhp+fn5Z124EMI+ldc1GdMs00vYkl3OiZY2PN1cmBo1jJmjApk1KojJIwMk7M+SRXeo9hHu1wBzgP8GRgPrgUStdU1vzynDMkI4v8aWNrYdrmBLdjk7citIL65Ba6PnP2VkADNHBTFrVCBTo4ZJ2JvJYsMyZrgd+KM2viUOKaVyMXrxOy3w3EIIB+bl7sq88cayxADVDS3syqtke04FO3Ir+cc32Ty3ETxcXZg8MuBkz35q1DC8PSTsz4Ulwv0IMB/YopQKA+KAHAs8rxDCyfj7uHNJgrGuDUBNYwupeZVsz6lkR04F/0w5xPPfHMLdVZEYeSrsp0UPkymX/WTObJl3gGQgGCgBngTcAbTWK5VSEcDrQDigMHrx/+7rhWVYRgjRVW1jC6n5VezIMXr3+45W09aucXNRTIz0Z0ZMINNNF3+fwbk8ghzEJIRweHVNrezOr2JHTgU7cytJKzxOS5tGKYgL82NGrBH0M2IDCRs6OObYS7gLIZxOY0sbewuOsyu3kp15lezOr6KhuQ2A6CAfI+hjApkeG0hMkI9TzrO35g5VIYSwCi93V2aNCmLWqCDAWAsnvbiGnbmV7MytZGNGCat3FwIQ4ufJDFOvfnpMIHHD/XAdRGviSM9dCOE02ts1h8vq2JlXafTucyspqm4EwM/LjaToYSTFBDItehiJkQEOOSNHeu5CiEHHxUUxNsyPsWF+3DgzGoDCqgZ25layK6+SXXlVpGRlAeDmopgwwt8I/OhhTIsZRqif84zbS89dCDGoVNU3s+dIFan5VezOqyKt8DhNre0ARAX6nAz66TGBjAnxtbvljWWHqhBCmKG5tZ39RdXszqsiNd/YSVteZ6x7ONTLjWl2NpQjwzJCCGEGD9O6N1OjhnEXo9Bak1/RYPTs8ytJ7TKU0zHffkZsIEnR9jvfXnruQgjRh+MNxlDOrrwqUvMqSSuoprmt/eR8+5mxgcyIDWJ67MCP28uwjBBCDJDe5tvHBg85Odd+ZmwgkcO8LTrfXoZlhBBigHSdb9/S1k56kTHffkduJV+nH+O91AIAwv29Th5FOzM2kDGhvlY5uEp67kIIYWHt7Zrs0jp25lawM89YPqG0tgmAYT7u3Js8hrsuHHVWzy09dyGEsBEXF0XccD/ihvtx8+wYtNYcqWw4eSRtmBX
ONSvhLoQQA0wpRXTQEKKDhnBt0kirvKaLVV5FCCGEVUm4CyGEE5JwF0IIJyThLoQQTkjCXQghnJCEuxBCOCEJdyGEcEIS7kII4YRstvyAUqoMyD/LhwcD5RYsZyA5Sq1Sp+U5Sq1Sp2UNdJ3RWuuQvhrZLNzPhVIq1Zy1FeyBo9QqdVqeo9QqdVqWvdQpwzJCCOGEJNyFEMIJOWq4v2TrAvrBUWqVOi3PUWqVOi3LLup0yDF3IYQQvXPUnrsQQohe2HW4K6UWK6WylFKHlFKPdLPdUyn1nmn7DqVUjA1qHKmUSlFKpSulDiil7u+mTbJSqloptdd0ecLadXaqJU8ptc9UxxmnwlKG50zv6Y9Kqak2qDGu03u1VylVo5R6oEsbm72nSqlVSqlSpdT+TvcFKqXWK6WyTdfDenjsraY22UqpW21Q5zNKqUzTv+3HSqmAHh7b6+fECnX+Vil1tNO/75IeHttrRlihzvc61ZinlNrbw2Ot9n6epLW2ywvgChwGRgEeQBqQ0KXNvcBK0+2fAO/ZoM5wYKrpth9wsJs6k4G1tn5PTbXkAcG9bF8CfAkoYBawww4+B8cw5vbaxXsKXAhMBfZ3uu/PwCOm248Af+rmcYFAjul6mOn2MCvXuRBwM93+U3d1mvM5sUKdvwV+ZcZno9eMGOg6u2z/K/CErd/Pjos999xnAIe01jla62bgXeCKLm2uAN4w3V4NzFfWOPNsJ1rrYq31HtPtWiADGGHNGizsCuBNbdgOBCilwm1Yz3zgsNb6bA94szit9WagssvdnT+LbwBXdvPQRcB6rXWl1roKWA8stmadWut1WutW04/bgciBen1z9fB+msOcjLCY3uo05c5y4J2Bev3+sudwHwEUdPq5kDND82Qb0we2GgiySnXdMA0LTQF2dLN5tlIqTSn1pVJqglULO50G1imldiulVnSz3Zz33Zp+Qs//YezlPQUI01oXm24fA8K6aWNv7+0dGH+ldaevz4k13GcaPlrVwzCXPb2fFwAlWuvsHrZb/f2053B3KEopX+BD4AGtdU2XzXswhhUSgeeBT6xdXydztdZTgUuBnyulLrRhLb1SSnkAy4APutlsT+/pabTxd7hdT0NTSv0GaAXe7qGJrT8nLwKjgclAMcaQhz27nt577VZ/P+053I8Cnc8kG2m6r9s2Sik3wB+osEp1nSil3DGC/W2t9Uddt2uta7TWdabbXwDuSqlgK5fZUctR03Up8DHGn7admfO+W8ulwB6tdUnXDfb0npqUdAxfma5Lu2ljF++tUuo24DLgRtMX0RnM+JwMKK11ida6TWvdDrzcw+vby/vpBvwX8F5PbWzxftpzuO8CxiqlYk09uJ8Aa7q0WQN0zDi4Bvimpw/rQDGNtb0KZGitn+2hzfCOfQFKqRkY77stvoSGKKX8Om5j7Fzb36XZGuAW06yZWUB1p+EGa+uxN2Qv72knnT+LtwKfdtPma2ChUmqYaZhhoek+q1FKLQb+B1imtW7ooY05n5MB1WU/z1U9vL45GWENlwCZWuvC7jba7P205t7b/l4wZm4cxNgj/hvTfb/D+GACeGH8yX4I2AmMskGNczH+BP8R2Gu6LAHuBu42tbkPOICxN387cL6N3s9RphrSTPV0vKeda1XAP03v+T4gyUa1DsEIa/9O99nFe4rxhVMMtGCM896Jsa9nI5ANbAACTW2TgFc6PfYO0+f1EHC7Deo8hDFO3fFZ7ZhtFgF80dvnxMp1vmX6/P2IEdjhXes0/XxGRlizTtP9r3d8Lju1tdn72XGRI1SFEMIJ2fOwjBBCiLMk4S6EEE5Iwl0IIZyQhLsQQjghCXchhHBCEu5CCOGEJNyFEMIJSbgLIYQT+v9RbSbbr55Y9gAAAABJRU5ErkJggg==\n"
1371 |           },
1372 |           "metadata": {
1373 |             "tags": []
1374 |           }
1375 |         }
1376 |       ]
1377 |     },
1378 |     {
1379 |       "cell_type": "markdown",
1380 |       "metadata": {
1381 |         "id": "HSyx-HvpUz2o",
1382 |         "colab_type": "text"
1383 |       },
1384 |       "source": [
1385 |         "From the plot, we can infer that the validation loss reached its minimum at epoch 17 (2.0374) and did not improve over the next 2 epochs. Hence, early stopping (patience=2) ended training at epoch 19.\n",
1386 |         "\n",
1387 |         "Next, let’s build the dictionaries that map indices to words for the target and source vocabularies, along with the word-to-index dictionary for the target vocabulary:"
1388 |       ]
1389 |     },
1390 |     {
1391 |       "cell_type": "code",
1392 |       "metadata": {
1393 |         "trusted": true,
1394 |         "id": "sBX0zZnOFxjW",
1395 |         "colab_type": "code",
1396 |         "colab": {}
1397 |       },
1398 |       "source": [
1399 |         "reverse_target_word_index=y_tokenizer.index_word\n",
1400 |         "reverse_source_word_index=x_tokenizer.index_word\n",
1401 |         "target_word_index=y_tokenizer.word_index"
1402 |       ],
1403 |       "execution_count": 0,
1404 |       "outputs": []
1405 |     },
1406 |     {
1407 |       "cell_type": "markdown",
1408 |       "metadata": {
1409 |         "id": "eM_nU_VvFxjq",
1410 |         "colab_type": "text"
1411 |       },
1412 |       "source": [
1413 |         "# Inference\n",
1414 |         "\n",
1415 |         "Set up the inference for the encoder and decoder:"
1416 |       ]
1417 |     },
1418 |     {
1419 |       "cell_type": "code",
1420 |       "metadata": {
1421 |         "trusted": true,
1422 |         "id": "9QkrNV-4Fxjt",
1423 |         "colab_type": "code",
1424 |         "colab": {}
1425 |       },
1426 |       "source": [
1427 |         "# Encode the input sequence to get the feature vector\n",
1428 |         "encoder_model = Model(inputs=encoder_inputs,outputs=[encoder_outputs, state_h, state_c])\n",
1429 |         "\n",
1430 |         "# Decoder setup\n",
1431 |         "# The tensors below will hold the states from the previous time step\n",
1432 |         "decoder_state_input_h = Input(shape=(latent_dim,))\n",
1433 |         "decoder_state_input_c = Input(shape=(latent_dim,))\n",
1434 |         "decoder_hidden_state_input = Input(shape=(max_text_len,latent_dim))\n",
1435 |         "\n",
1436 |         "# Get the embeddings of the decoder sequence\n",
1437 |         "dec_emb2= dec_emb_layer(decoder_inputs) \n",
1438 |         "# To predict the next word in the sequence, set the initial states to the states from the previous time step\n",
1439 |         "decoder_outputs2, state_h2, state_c2 = decoder_lstm(dec_emb2, initial_state=[decoder_state_input_h, decoder_state_input_c])\n",
1440 |         "\n",
1441 |         "# Attention inference\n",
1442 |         "attn_out_inf, attn_states_inf = attn_layer([decoder_hidden_state_input, decoder_outputs2])\n",
1443 |         "decoder_inf_concat = Concatenate(axis=-1, name='concat')([decoder_outputs2, attn_out_inf])\n",
1444 |         "\n",
1445 |         "# A dense softmax layer to generate prob dist. over the target vocabulary\n",
1446 |         "decoder_outputs2 = decoder_dense(decoder_inf_concat) \n",
1447 |         "\n",
1448 |         "# Final decoder model\n",
1449 |         "decoder_model = Model(\n",
1450 |         "    [decoder_inputs] + [decoder_hidden_state_input,decoder_state_input_h, decoder_state_input_c],\n",
1451 |         "    [decoder_outputs2] + [state_h2, state_c2])"
1452 |       ],
1453 |       "execution_count": 0,
1454 |       "outputs": []
1455 |     },
1456 |     {
1457 |       "cell_type": "markdown",
1458 |       "metadata": {
1459 |         "id": "zOiyk4ToWe74",
1460 |         "colab_type": "text"
1461 |       },
1462 |       "source": [
1463 |         "The function below implements the inference process (which we covered [here](https://www.analyticsvidhya.com/blog/2019/06/comprehensive-guide-text-summarization-using-deep-learning-python/)):"
1464 |       ]
1465 |     },
1466 |     {
1467 |       "cell_type": "code",
1468 |       "metadata": {
1469 |         "trusted": true,
1470 |         "id": "6f6TTFnBFxj6",
1471 |         "colab_type": "code",
1472 |         "colab": {}
1473 |       },
1474 |       "source": [
1475 |         "def decode_sequence(input_seq):\n",
1476 |         "    # Encode the input as state vectors.\n",
1477 |         "    e_out, e_h, e_c = encoder_model.predict(input_seq)\n",
1478 |         "    \n",
1479 |         "    # Generate empty target sequence of length 1.\n",
1480 |         "    target_seq = np.zeros((1,1))\n",
1481 |         "    \n",
1482 |         "    # Populate the first word of target sequence with the start word.\n",
1483 |         "    target_seq[0, 0] = target_word_index['sostok']\n",
1484 |         "\n",
1485 |         "    stop_condition = False\n",
1486 |         "    decoded_sentence = ''\n",
1487 |         "    while not stop_condition:\n",
1488 |         "      \n",
1489 |         "        output_tokens, h, c = decoder_model.predict([target_seq] + [e_out, e_h, e_c])\n",
1490 |         "\n",
1491 |         "        # Sample a token\n",
1492 |         "        sampled_token_index = np.argmax(output_tokens[0, -1, :])\n",
1493 |         "        sampled_token = reverse_target_word_index[sampled_token_index]\n",
1494 |         "        \n",
1495 |         "        if(sampled_token!='eostok'):\n",
1496 |         "            decoded_sentence += ' '+sampled_token\n",
1497 |         "\n",
1498 |         "        # Exit condition: either hit max length or find stop word.\n",
1499 |         "        if (sampled_token == 'eostok'  or len(decoded_sentence.split()) >= (max_summary_len-1)):\n",
1500 |         "            stop_condition = True\n",
1501 |         "\n",
1502 |         "        # Update the target sequence (of length 1).\n",
1503 |         "        target_seq = np.zeros((1,1))\n",
1504 |         "        target_seq[0, 0] = sampled_token_index\n",
1505 |         "\n",
1506 |         "        # Update internal states\n",
1507 |         "        e_h, e_c = h, c\n",
1508 |         "\n",
1509 |         "    return decoded_sentence"
1510 |       ],
1511 |       "execution_count": 0,
1512 |       "outputs": []
1513 |     },
1514 |     {
1515 |       "cell_type": "markdown",
1516 |       "metadata": {
1517 |         "id": "6GuDf4TPWt6_",
1518 |         "colab_type": "text"
1519 |       },
1520 |       "source": [
1521 |         "Let us define functions that convert an integer sequence back to a word sequence, one for the summaries and one for the reviews:"
1522 |       ]
1523 |     },
1524 |     {
1525 |       "cell_type": "code",
1526 |       "metadata": {
1527 |         "trusted": true,
1528 |         "id": "aAUntznIFxj9",
1529 |         "colab_type": "code",
1530 |         "colab": {}
1531 |       },
1532 |       "source": [
1533 |         "def seq2summary(input_seq):\n",
1534 |         "    newString=''\n",
1535 |         "    for i in input_seq:\n",
1536 |         "        if((i!=0 and i!=target_word_index['sostok']) and i!=target_word_index['eostok']):\n",
1537 |         "            newString=newString+reverse_target_word_index[i]+' '\n",
1538 |         "    return newString\n",
1539 |         "\n",
1540 |         "def seq2text(input_seq):\n",
1541 |         "    newString=''\n",
1542 |         "    for i in input_seq:\n",
1543 |         "        if(i!=0):\n",
1544 |         "            newString=newString+reverse_source_word_index[i]+' '\n",
1545 |         "    return newString"
1546 |       ],
1547 |       "execution_count": 0,
1548 |       "outputs": []
1549 |     },
1550 |     {
1551 |       "cell_type": "markdown",
1552 |       "metadata": {
1553 |         "id": "9gM4ALyfWwA9",
1554 |         "colab_type": "text"
1555 |       },
1556 |       "source": [
1557 |         "Here are a few summaries generated by the model:"
1558 |       ]
1559 |     },
1560 |     {
1561 |       "cell_type": "code",
1562 |       "metadata": {
1563 |         "trusted": true,
1564 |         "id": "BUtQmQTmFxkI",
1565 |         "colab_type": "code",
1566 |         "colab": {},
1567 |         "outputId": "f407d9fc-e0cd-4082-98f5-bd1f562dc26f"
1568 |       },
1569 |       "source": [
1570 |         "for i in range(0,100):\n",
1571 |         "    print(\"Review:\",seq2text(x_tr[i]))\n",
1572 |         "    print(\"Original summary:\",seq2summary(y_tr[i]))\n",
1573 |         "    print(\"Predicted summary:\",decode_sequence(x_tr[i].reshape(1,max_text_len)))\n",
1574 |         "    print(\"\\n\")"
1575 |       ],
1576 |       "execution_count": 0,
1577 |       "outputs": [
1578 |         {
1579 |           "output_type": "stream",
1580 |           "text": [
1581 |             "Review: gave caffeine shakes heart anxiety attack plus tastes unbelievably bad stick coffee tea soda thanks \n",
1582 |             "Original summary: hour \n",
1583 |             "Predicted summary:  not worth the money\n",
1584 |             "\n",
1585 |             "\n",
1586 |             "Review: got great course good belgian chocolates better \n",
1587 |             "Original summary: would like to give it stars but \n",
1588 |             "Predicted summary:  good\n",
1589 |             "\n",
1590 |             "\n",
1591 |             "Review: one best flavored coffees tried usually like flavored coffees one great serve company love \n",
1592 |             "Original summary: delicious \n",
1593 |             "Predicted summary:  great coffee\n",
1594 |             "\n",
1595 |             "\n",
1596 |             "Review: salt separate area pain makes hard regulate salt putting like salt go ahead get product \n",
1597 |             "Original summary: tastes ok packaging \n",
1598 |             "Predicted summary:  salt\n",
1599 |             "\n",
1600 |             "\n",
1601 |             "Review: really like product super easy order online delivered much cheaper buying gas station stocking good long drives \n",
1602 |             "Original summary: turkey jerky is great \n",
1603 |             "Predicted summary:  great product\n",
1604 |             "\n",
1605 |             "\n",
1606 |             "Review: best salad dressing delivered promptly quantities last vidalia onion dressing compares made oak hill farms sometimes find costco order front door want even orders cut shipping costs \n",
1607 |             "Original summary: my favorite salad dressing \n",
1608 |             "Predicted summary:  great product\n",
1609 |             "\n",
1610 |             "\n",
1611 |             "Review: think sitting around warehouse long time took long time send got tea tasted like cardboard red rasberry leaf tea know supposed taste like \n",
1612 |             "Original summary: stale \n",
1613 |             "Predicted summary:  not as good as\n",
1614 |             "\n",
1615 |             "\n",
1616 |             "Review: year old cat special diet digestive problems also diabetes stopped eating usual special formula food tried different kinds catfood one liked easy digestion diabetes thank newman \n",
1617 |             "Original summary: wonderful \n",
1618 |             "Predicted summary:  cat food\n",
1619 |             "\n",
1620 |             "\n",
1621 |             "Review: always perfect snack dog loves knows exactly starts ask time evening gets greenie snack thank excellent product fast delivery \n",
1622 |             "Original summary: greenies buddy treat \n",
1623 |             "Predicted summary:  great treat\n",
1624 |             "\n",
1625 |             "\n",
1626 |             "Review: dog loves tiny treats keep one car one house \n",
1627 |             "Original summary: dog loves them \n",
1628 |             "Predicted summary:  my dog loves these\n",
1629 |             "\n",
1630 |             "\n",
1631 |             "Review: liked coffee much subscribing dark rich smooth \n",
1632 |             "Original summary: makes great cup of java \n",
1633 |             "Predicted summary:  good coffee\n",
1634 |             "\n",
1635 |             "\n",
1636 |             "Review: far dog tried chicken peanut butter flavor absolutely loves love natural makes happy giving dog something healthy treats small soft big plus calories \n",
1637 |             "Original summary: love zuke mini naturals \n",
1638 |             "Predicted summary:  my dog loves these\n",
1639 |             "\n",
1640 |             "\n",
1641 |             "Review: absolutely delicious satisfy something sweet really filling great early morning time make breakfast great afternoon snack work feeling sluggish \n",
1642 |             "Original summary: protein bar \n",
1643 |             "Predicted summary:  yummy\n",
1644 |             "\n",
1645 |             "\n",
1646 |             "Review: aware decaf coffee although showed search decaf cups intended purchase gift kept recipient drink caffeine favorite means \n",
1647 |             "Original summary: not decaf \n",
1648 |             "Predicted summary:  decaf decaf\n",
1649 |             "\n",
1650 |             "\n",
1651 |             "Review: wonderful wrote perfect iced cookie one pen writing cookies names happy ca \n",
1652 |             "Original summary: cookie \n",
1653 |             "Predicted summary:  delicious\n",
1654 |             "\n",
1655 |             "\n",
1656 |             "Review: truffle oil quite good prefer brand france urbani italy expensive oh delicious tried black white good black bit stronger pungent event healthy alternative butter enjoy \n",
1657 |             "Original summary: delicious but not the best \n",
1658 |             "Predicted summary:  great flavor\n",
1659 |             "\n",
1660 |             "\n",
1661 |             "Review: enjoy coffee office split right middle loving think worth try order regularly \n",
1662 |             "Original summary: hit or miss \n",
1663 |             "Predicted summary:  good coffee\n",
1664 |             "\n",
1665 |             "\n",
1666 |             "Review: husband gluten free food several years tried several different bread mixes first actually enjoys buying amazon saves loaf \n",
1667 |             "Original summary: really good gluten free bread \n",
1668 |             "Predicted summary:  great gluten free bread\n",
1669 |             "\n",
1670 |             "\n",
1671 |             "Review: hubby eats says good snacks morning done apple flavor \n",
1672 |             "Original summary: really good nice snack \n",
1673 |             "Predicted summary:  great snack\n",
1674 |             "\n",
1675 |             "\n",
1676 |             "Review: waste money disgusting product chocolate taste tastes like plastic lining paper carton using milk treated ultra high temperatures like fresh milk go get fresh milk hershey syrup want chocolate milk \n",
1677 |             "Original summary: please do not waste your money \n",
1678 |             "Predicted summary:  yuck\n",
1679 |             "\n",
1680 |             "\n",
1681 |             "Review: absolutely loves apple chicken happy hips looks forward one morning one night gets soooo excited would eat allowed \n",
1682 |             "Original summary: healthy treats \n",
1683 |             "Predicted summary:  great for training\n",
1684 |             "\n",
1685 |             "\n",
1686 |             "Review: strong much flavor little aroma tried purchase another time similiar brands met standards expected \n",
1687 |             "Original summary: no flavor \n",
1688 |             "Predicted summary:  san francisco bay coffee\n",
1689 |             "\n",
1690 |             "\n",
1691 |             "Review: company wanted chose order anyway \n",
1692 |             "Original summary: water \n",
1693 |             "Predicted summary:  not good\n",
1694 |             "\n",
1695 |             "\n",
1696 |             "Review: introduced number people hooked best sour gummy ever great flavors got great price \n",
1697 |             "Original summary: new favorite \n",
1698 |             "Predicted summary:  best licorice\n",
1699 |             "\n",
1700 |             "\n",
1701 |             "Review: new price attractive however tastes horrible maybe old zico coconut water brands might find acceptable \n",
1702 |             "Original summary: do not be by the price \n",
1703 |             "Predicted summary:  terrible\n",
1704 |             "\n",
1705 |             "\n",
1706 |             "Review: sure ever going buy product way expensive market price \n",
1707 |             "Original summary: too expensive \n",
1708 |             "Predicted summary:  good value\n",
1709 |             "\n",
1710 |             "\n",
1711 |             "Review: flavor normally find local stores plus buy bulk things take savings add veggies even stir egg noodles cook add nutrition quick meals lot extra \n",
1712 |             "Original summary: good value \n",
1713 |             "Predicted summary:  great deal\n",
1714 |             "\n",
1715 |             "\n",
1716 |             "Review: order tea labeled decaff must caffeine residue levels tested tea caffeine decaff non decaff tea anywhere caffeine caffeine caffeinated tea caffeine slightly less naturally present tea leaf \n",
1717 |             "Original summary: caffeine is not \n",
1718 |             "Predicted summary:  not as good as\n",
1719 |             "\n",
1720 |             "\n",
1721 |             "Review: excellent babies toddler really best offer little one delicious rich vitamins calcium protein low fat sorry products available website \n",
1722 |             "Original summary: excellent product for babies and toddler \n",
1723 |             "Predicted summary:  great product\n",
1724 |             "\n",
1725 |             "\n",
1726 |             "Review: purchased item dented would bet run dented product clearing ship ones \n",
1727 |             "Original summary: sometimes dented \n",
1728 |             "Predicted summary:  dented cans\n",
1729 |             "\n",
1730 |             "\n",
1731 |             "Review: almost tastes like mini blueberry pie love one favorite thoroughly fallen love \n",
1732 |             "Original summary: excellent love the blueberry pecan \n",
1733 |             "Predicted summary:  yummy\n",
1734 |             "\n",
1735 |             "\n",
1736 |             "Review: dog loves keeps busy minutes long time chew hound \n",
1737 |             "Original summary: chew away \n",
1738 |             "Predicted summary:  dog loves it\n",
1739 |             "\n",
1740 |             "\n",
1741 |             "Review: plant came quickly looks great office nice pot plant thriving well \n",
1742 |             "Original summary: very nice office plant \n",
1743 |             "Predicted summary:  plant\n",
1744 |             "\n",
1745 |             "\n",
1746 |             "Review: dog loves lickety stik bacon flavor since likes much plan getting flavors great liquid treat dog highly recommend lickety stik \n",
1747 |             "Original summary: great dog treat \n",
1748 |             "Predicted summary:  dog loves them\n",
1749 |             "\n",
1750 |             "\n",
1751 |             "Review: great toy dogs chew everything else little literally eats toys one toys yet destroy loves carries around everywhere got rex cutest thing \n",
1752 |             "Original summary: good for chewers \n",
1753 |             "Predicted summary:  dogs love it\n",
1754 |             "\n",
1755 |             "\n",
1756 |             "Review: really search good deals tea tea great price tea amazon almost cup price cup coffee herbal varieties low caffine good option wife used dinner coffe \n",
1757 |             "Original summary: great price for great tea \n",
1758 |             "Predicted summary:  great tea\n",
1759 |             "\n",
1760 |             "\n",
1761 |             "Review: pricey essentially small bag hard crumbs maybe dog spoiled treats like third class treats definitely bottom doggie treat often simply walk away glad people like buying \n",
1762 |             "Original summary: waste of money \n",
1763 |             "Predicted summary:  not for dogs\n",
1764 |             "\n",
1765 |             "\n",
1766 |             "Review: little pricey consider sugar low cal caffine really rich flavor best chai ever found \n",
1767 |             "Original summary: fabulous product \n",
1768 |             "Predicted summary:  delicious\n",
1769 |             "\n",
1770 |             "\n",
1771 |             "Review: loves taste beef freeze dried dog treats use training really works \n",
1772 |             "Original summary: dog lover \n",
1773 |             "Predicted summary:  great training treat\n",
1774 |             "\n",
1775 |             "\n",
1776 |             "Review: three dogs cairn terriers year old border collie proud greenies like taste helps keep gums teeth good shape \n",
1777 |             "Original summary: our dogs love greenies \n",
1778 |             "Predicted summary:  greenies\n",
1779 |             "\n",
1780 |             "\n",
1781 |             "Review: good soft drink smooth strawberry cream soda tasty \n",
1782 |             "Original summary: good stuff \n",
1783 |             "Predicted summary:  good stuff\n",
1784 |             "\n",
1785 |             "\n",
1786 |             "Review: item arrived sugar free shipped regular version caramel syrup small internal sticker bottle stated sugar free although company label bottle stated regular version \n",
1787 |             "Original summary: wrong item \n",
1788 |             "Predicted summary:  not as good as\n",
1789 |             "\n",
1790 |             "\n",
1791 |             "Review: like strong coffee coffee rated found weak sickening taste \n",
1792 |             "Original summary: disapointed \n",
1793 |             "Predicted summary:  not bad\n",
1794 |             "\n",
1795 |             "\n",
1796 |             "Review: saw peanut butter chocolate cereal knew try pleased eat chocolate breakfast feel guilty two kids love cereal well great eat alone favorite milk product yogurt mix homemade granola well \n",
1797 |             "Original summary: the yummy \n"
1798 |           ],
1799 |           "name": "stdout"
1800 |         },
1801 |         {
1802 |           "output_type": "stream",
1803 |           "text": [
1804 |             "Predicted summary:  yummy\n",
1805 |             "\n",
1806 |             "\n",
1807 |             "Review: begging time loves used buy small bottle buying every weeks since saw oz buying last lot longer gas money cheaper buy online \n",
1808 |             "Original summary: my dog loves it \n",
1809 |             "Predicted summary:  great product\n",
1810 |             "\n",
1811 |             "\n",
1812 |             "Review: true also need decent scale tried caviar recipe everything worked perfectly first try fun easy make kit comes large enough samples looks like good uses \n",
1813 |             "Original summary: great to \n",
1814 |             "Predicted summary:  works great\n",
1815 |             "\n",
1816 |             "\n",
1817 |             "Review: dog really likes treats like buy run mill treats loaded fat fillers continue buy \n",
1818 |             "Original summary: buddy biscuits \n",
1819 |             "Predicted summary:  my dog loves these\n",
1820 |             "\n",
1821 |             "\n",
1822 |             "Review: tulsi green tea great good iced tea well \n",
1823 |             "Original summary: green tea \n",
1824 |             "Predicted summary:  green tea\n",
1825 |             "\n",
1826 |             "\n",
1827 |             "Review: always put something market couple poof gone best tasting product pepsi \n",
1828 |             "Original summary: best taste \n",
1829 |             "Predicted summary:  great taste\n",
1830 |             "\n",
1831 |             "\n",
1832 |             "Review: like tomatoes fresh flavorful also come carton welcome alternative metal cans impart flavor sometimes lined plastic containing \n",
1833 |             "Original summary: yummy tomatoes good packaging \n",
1834 |             "Predicted summary:  good product\n",
1835 |             "\n",
1836 |             "\n",
1837 |             "Review: great get habit forming careful bought whole case save overall versus going supermarket rich dark chocolate crisp cookie worth every penny oreo eat heart \n",
1838 |             "Original summary: delicious \n",
1839 |             "Predicted summary:  love these\n",
1840 |             "\n",
1841 |             "\n",
1842 |             "Review: else say arrived promptly perhaps time expected expiration date like next day good go \n",
1843 |             "Original summary: baby loves it \n",
1844 |             "Predicted summary:  not what expected\n",
1845 |             "\n",
1846 |             "\n",
1847 |             "Review: bought local recently advertised cheesy flavor detectable product even salt flavor avoid product \n",
1848 |             "Original summary: no cheese flavor \n",
1849 |             "Predicted summary:  good but not great\n",
1850 |             "\n",
1851 |             "\n",
1852 |             "Review: big volume coffee morning one great \n",
1853 |             "Original summary: great morning coffee \n",
1854 |             "Predicted summary:  great coffee\n",
1855 |             "\n",
1856 |             "\n",
1857 |             "Review: drank try keep awake fell asleep minutes drinking feel anything \n",
1858 |             "Original summary: it made me fall \n",
1859 |             "Predicted summary:  great taste\n",
1860 |             "\n",
1861 |             "\n",
1862 |             "Review: drink cups day verona italian french roast coffee wanted try lower acid version brand coffee smells tastes like vinegar totally unpalatable better drinking water acid coffee bothers \n",
1863 |             "Original summary: single worst coffee ever \n",
1864 |             "Predicted summary:  not very good\n",
1865 |             "\n",
1866 |             "\n",
1867 |             "Review: getting price however afraid stocking anymore reduced price think one trying eat crackers low calorie string cheese breakfast every total calories put breakfast baggie go \n",
1868 |             "Original summary: am addicted to these \n",
1869 |             "Predicted summary:  not as good as expected\n",
1870 |             "\n",
1871 |             "\n",
1872 |             "Review: first time using fondarific fondant general one really easy use baby shower cake worked indicated also colored made two tier cake final product looked great greasy \n",
1873 |             "Original summary: easy to use \n",
1874 |             "Predicted summary:  great product\n",
1875 |             "\n",
1876 |             "\n",
1877 |             "Review: work home drink cups cup coffee day good tasting coffee lowest price cup market \n",
1878 |             "Original summary: great coffee great price \n",
1879 |             "Predicted summary:  great coffee\n",
1880 |             "\n",
1881 |             "\n",
1882 |             "Review: guys say natural really tastes great pleasantly surprised stand flavor carbonated think would even better product time come fed sweet juices aftertaste make obvious really natural switch really gets vote \n",
1883 |             "Original summary: great taste all natural \n",
1884 |             "Predicted summary:  not very good\n",
1885 |             "\n",
1886 |             "\n",
1887 |             "Review: product good goes long way quite good one dd good product less \n",
1888 |             "Original summary: very good \n",
1889 |             "Predicted summary:  good stuff\n",
1890 |             "\n",
1891 |             "\n",
1892 |             "Review: tea wonderful soothing even soothing get shipped house found hard find decaffeinated tea grocery store much easier \n",
1893 |             "Original summary: decaffeinated french vanilla tea yummy \n",
1894 |             "Predicted summary:  great tea\n",
1895 |             "\n",
1896 |             "\n",
1897 |             "Review: wow little calorie espresso sugar serve cold delicious little shot espresso sugar overly sweet sugar helps offset taste espresso caffe bitter sweet tastes good really gave afternoon kick pants \n",
1898 |             "Original summary: nice little pick me up \n",
1899 |             "Predicted summary:  best water ever\n",
1900 |             "\n",
1901 |             "\n",
1902 |             "Review: mayonnaise delicious side side taste test would give hellman edge hellman richer taste \n",
1903 |             "Original summary: excellent but \n",
1904 |             "Predicted summary:  good stuff\n",
1905 |             "\n",
1906 |             "\n",
1907 |             "Review: love medium full flavored roast smooth taste bitter acidic taste excellent coffee good value also try timothy kona good also \n",
1908 |             "Original summary: wonderful coffee \n",
1909 |             "Predicted summary:  great coffee\n",
1910 |             "\n",
1911 |             "\n",
1912 |             "Review: nice item chunks meat good gravy cat fond varieties nice little treat nonetheless think item bit pricy per ounce \n",
1913 |             "Original summary: nice but pricey \n",
1914 |             "Predicted summary:  good cat food\n",
1915 |             "\n",
1916 |             "\n",
1917 |             "Review: bought cookies gifts open last long good make great gifts would definitely buy \n",
1918 |             "Original summary: mouth watery cookies \n",
1919 |             "Predicted summary:  cookies\n",
1920 |             "\n",
1921 |             "\n",
1922 |             "Review: great price fast shipping best chips better ingredients less calories snack foods plus taste like real chips \n",
1923 |             "Original summary: pop chips are the best \n",
1924 |             "Predicted summary:  great chips\n",
1925 |             "\n",
1926 |             "\n",
1927 |             "Review: taco bell chipotle sauce bold flavorful tried chicken wings tacos salad made dish extremely tasty glad sampled new sauce staple condiment \n",
1928 |             "Original summary: bold flavor \n",
1929 |             "Predicted summary:  great taste\n",
1930 |             "\n",
1931 |             "\n",
1932 |             "Review: bought seeds make centerpieces really surprised fast grow planted seeds potting soil without ny preparation anything kept watering days super tall ready displayed centerpieces perfect \n",
1933 |             "Original summary: perfect for in days \n",
1934 |             "Predicted summary:  great seeds\n",
1935 |             "\n",
1936 |             "\n",
1937 |             "Review: every time need sun dried tomatoes local grocery stores conveniently small pouches ensure always hand called recipe \n",
1938 |             "Original summary: sun dried tomato bliss \n",
1939 |             "Predicted summary:  great product\n",
1940 |             "\n",
1941 |             "\n",
1942 |             "Review: love soup eat plain use recipe cannot find area glad amazon \n",
1943 |             "Original summary: soup chicken cheese \n",
1944 |             "Predicted summary:  soup\n",
1945 |             "\n",
1946 |             "\n",
1947 |             "Review: size quite good dog training smell strong cannot put open bag must seal everytime gave treat otherwise dog stand trying fetch believe taste great puppy purchase sure \n",
1948 |             "Original summary: strong smell and my puppy loves it \n",
1949 |             "Predicted summary:  my dog loves these\n",
1950 |             "\n",
1951 |             "\n",
1952 |             "Review: love chips auto order every months taste great whole bag calories bag every day sure helped weight loss little bags eat huge amount \n",
1953 |             "Original summary: great purchase \n",
1954 |             "Predicted summary:  great chips\n",
1955 |             "\n",
1956 |             "\n",
1957 |             "Review: many kit wines cost three four times made many kits find fine table wine recommend adding water five gallon mark flavor \n",
1958 |             "Original summary: good wine \n",
1959 |             "Predicted summary:  great product\n",
1960 |             "\n",
1961 |             "\n",
1962 |             "Review: sooo much pepper heavy salt reminds adams trick food cannot eat seriously fresh nuts seasoned \n",
1963 |             "Original summary: over the top seasoning \n",
1964 |             "Predicted summary:  great salt\n",
1965 |             "\n",
1966 |             "\n",
1967 |             "Review: loved brand best vanilla flavor others tried would buy better price \n",
1968 |             "Original summary: wolfgang puck coffee vanilla \n",
1969 |             "Predicted summary:  great taste\n",
1970 |             "\n",
1971 |             "\n",
1972 |             "Review: another brand cinammon carried amazon much better tasting brand maybe packaging part problem simple plastic bag tie amazon brand comes carefully set plastic box \n",
1973 |             "Original summary: edible have had much better \n",
1974 |             "Predicted summary:  not as good as expected\n",
1975 |             "\n",
1976 |             "\n",
1977 |             "Review: throw pack one actually taste bad especially compared orange tangerine like carbonation adds juice flavors need work switch drinks best worst watermelon strawberry kiwi berry black cherry orange tangerine \n",
1978 |             "Original summary: my favorite of the four tried \n",
1979 |             "Predicted summary:  tastes like\n",
1980 |             "\n",
1981 |             "\n",
1982 |             "Review: daughter drinking since months old months old still loves snack time healthy delicious great addition menu \n",
1983 |             "Original summary: great snack \n",
1984 |             "Predicted summary:  great snack\n",
1985 |             "\n",
1986 |             "\n",
1987 |             "Review: live guinea africa order products delivered boat every months sometimes disappointed time zero calories zero carbs taste great price zero delivery costs prime ordered different flavors one favorite love \n",
1988 |             "Original summary: love it \n",
1989 |             "Predicted summary:  great product\n",
1990 |             "\n",
1991 |             "\n",
1992 |             "Review: purchased larger size love size perfect keep purse snack especially times others dessert snack cannot eat must gluten free spouse touch diet food loves \n",
1993 |             "Original summary: cannot get enough \n",
1994 |             "Predicted summary:  great snack\n",
1995 |             "\n",
1996 |             "\n",
1997 |             "Review: always house drink favorite mix sprite oh good every day mind larger bottles use much bring \n",
1998 |             "Original summary: am an adult still love this \n",
1999 |             "Predicted summary:  great taste\n",
2000 |             "\n",
2001 |             "\n",
2002 |             "Review: ginger snaps overpowering ginger go great milk really enjoyed house great buy affordable compared alternative diet foods last least week store well \n",
2003 |             "Original summary: you can eat ginger again \n",
2004 |             "Predicted summary:  ginger\n",
2005 |             "\n",
2006 |             "\n",
2007 |             "Review: give squid one star use might thoroughly disappointed quite possibly call crazy \n",
2008 |             "Original summary: can for your \n"
2009 |           ],
2010 |           "name": "stdout"
2011 |         },
2012 |         {
2013 |           "output_type": "stream",
2014 |           "text": [
2015 |             "Predicted summary:  good stuff\n",
2016 |             "\n",
2017 |             "\n",
2018 |             "Review: quality seeds excellent begin germinate hours days ready use never sprouted seeds results good easily recommend sprouter whether human consumption four legged friends \n",
2019 |             "Original summary: wheat grass seeds \n",
2020 |             "Predicted summary:  great product\n",
2021 |             "\n",
2022 |             "\n",
2023 |             "Review: love stuff great store bought homemade baked goods kicking things professional level works colored dark light frosting also used dusting powdered sugar pretty fine texture \n",
2024 |             "Original summary: fun like dust \n",
2025 |             "Predicted summary:  perfect\n",
2026 |             "\n",
2027 |             "\n",
2028 |             "Review: bought jumbo greenies black lab loved way expensive regular use notice difference breath primary reason buying \n",
2029 |             "Original summary: jumbo greenies good but very expensive \n",
2030 |             "Predicted summary:  greenies\n",
2031 |             "\n",
2032 |             "\n",
2033 |             "Review: also bought costco per box included bags oz kids fighting remaining bags good buying due price high price prevent product reaching mass distribution \n",
2034 |             "Original summary: very good but too pricey \n",
2035 |             "Predicted summary:  good but not the best\n",
2036 |             "\n",
2037 |             "\n",
2038 |             "Review: originally found mints whole foods taste superb get lot money plus comes cute little tin uses dog loves go organic \n",
2039 |             "Original summary: wonderful \n",
2040 |             "Predicted summary:  my dog loves this\n",
2041 |             "\n",
2042 |             "\n",
2043 |             "Review: regular spam awful almost inedible would give tastes like animal know mean fellow spam turkey spam pretty good great would give worth try \n",
2044 |             "Original summary: better than regular \n",
2045 |             "Predicted summary:  not bad\n",
2046 |             "\n",
2047 |             "\n",
2048 |             "Review: really need know many cans also whitefish tuna buffet canned cat food thanks \n",
2049 |             "Original summary: need to know how many in case \n",
2050 |             "Predicted summary:  great product\n",
2051 |             "\n",
2052 |             "\n",
2053 |             "Review: great tasting rich flavor perfect making nice hot cup mocha bought test hershey syrup mocha incredible distinct taste difference noticeable much richer tastes like chocolate less sugary hershey syrup \n",
2054 |             "Original summary: great taste \n",
2055 |             "Predicted summary:  great flavor\n",
2056 |             "\n",
2057 |             "\n",
2058 |             "Review: number one japan number one great save get shipped automatically every month lugging car \n",
2059 |             "Original summary: great tea \n",
2060 |             "Predicted summary:  great price\n",
2061 |             "\n",
2062 |             "\n",
2063 |             "Review: bought item read best mayo sold yes even better worlds favorite hellman well review good bit better hellman fact put empty hellman jar said nothing family never knew difference \n",
2064 |             "Original summary: blue mayo \n",
2065 |             "Predicted summary:  good stuff\n",
2066 |             "\n",
2067 |             "\n",
2068 |             "Review: gum great makes car smell good leave refreshing sweet tart smooth \n",
2069 |             "Original summary: love the gum and the price \n",
2070 |             "Predicted summary:  gum\n",
2071 |             "\n",
2072 |             "\n",
2073 |             "Review: flavorful smells like heaven great price compared stores arrived fast \n",
2074 |             "Original summary: divine \n",
2075 |             "Predicted summary:  great product\n",
2076 |             "\n",
2077 |             "\n",
2078 |             "Review: love low calorie organic doctors recommend grams fiber daily smart bran grams per serving fruits veggies set day eat dry vanilla frozen yogurt cinnamon \n",
2079 |             "Original summary: yes to smart bran \n",
2080 |             "Predicted summary:  love these\n",
2081 |             "\n",
2082 |             "\n",
2083 |             "Review: found spice blend dallas years back tell restaurant using grilled shrimp like cajun spice grilling fish recommend store dry place replace every year least lose flavor \n",
2084 |             "Original summary: good stuff \n",
2085 |             "Predicted summary:  good stuff\n",
2086 |             "\n",
2087 |             "\n",
2088 |             "Review: plain riceselect couscous delicious easy quick prepare great side item base main course far found bad product riceselect \n",
2089 |             "Original summary: yummy \n",
2090 |             "Predicted summary:  good\n",
2091 |             "\n",
2092 |             "\n"
2093 |           ],
2094 |           "name": "stdout"
2095 |         }
2096 |       ]
2097 |     },
2098 |     {
2099 |       "cell_type": "markdown",
2100 |       "metadata": {
2101 |         "id": "OTkaYNjHW4lC",
2102 |         "colab_type": "text"
2103 |       },
2104 |       "source": [
2105 |         "This is really cool stuff. Even though the original summary and the summary predicted by our model often do not match word for word, in many cases both convey a similar meaning (for example, *great taste* and *great flavor*). Our model is able to generate a legible summary based on the context present in the text.\n",
2106 |         "\n",
2107 |         "This is how we can perform text summarization using deep learning concepts in Python.\n",
2108 |         "\n",
2109 |         "#How can we Improve the Model’s Performance Even Further?\n",
2110 |         "\n",
2111 |         "Your learning doesn’t stop here! There’s a lot more you can do to play around and experiment with the model:\n",
2112 |         "\n",
2113 |         "I recommend that you **increase the training dataset** size and rebuild the model. A deep learning model generally generalizes better as the training dataset grows.\n",
2114 |         "\n",
2115 |         "Try implementing a **Bidirectional LSTM**, which captures context from both directions and can produce a better context vector.\n",
2116 |         "\n",
2117 |         "Use the **beam search strategy** to decode the test sequence instead of the greedy approach (argmax).\n",
2118 |         "\n",
2119 |         "Evaluate the performance of your model using the **BLEU score**.\n",
2120 |         "\n",
2121 |         "Implement **pointer-generator networks** and **coverage mechanisms**.\n",
2122 |         " \n",
2123 |         "\n"
2124 |       ]
2125 |     },
2126 |     {
2127 |       "cell_type": "markdown",
2128 |       "metadata": {
2129 |         "id": "R_qIecuvY5GT",
2130 |         "colab_type": "text"
2131 |       },
2132 |       "source": [
2133 |         "#End Notes\n",
2134 |         "\n",
2135 |         "If you have any feedback or questions about this article, please share them in the comments section over [here](https://www.analyticsvidhya.com/blog/2019/06/comprehensive-guide-text-summarization-using-deep-learning-python/) and I will get back to you. Be sure to experiment with the model we built here and share your results with me!"
2136 |       ]
2137 |     }
2138 |   ]
2139 | }
2140 | 


--------------------------------------------------------------------------------