└── How_to_build_own_text_summarizer_using_deep_learning.ipynb /How_to_build_own_text_summarizer_using_deep_learning.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "How to build own text summarizer using deep learning.ipynb", 7 | "version": "0.3.2", 8 | "provenance": [], 9 | "collapsed_sections": [] 10 | }, 11 | "language_info": { 12 | "name": "python", 13 | "version": "3.6.4", 14 | "mimetype": "text/x-python", 15 | "codemirror_mode": { 16 | "name": "ipython", 17 | "version": 3 18 | }, 19 | "pygments_lexer": "ipython3", 20 | "nbconvert_exporter": "python", 21 | "file_extension": ".py" 22 | }, 23 | "kernelspec": { 24 | "display_name": "Python 3", 25 | "language": "python", 26 | "name": "python3" 27 | } 28 | }, 29 | "cells": [ 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "id": "qFuL-RBgXqgU", 34 | "colab_type": "text" 35 | }, 36 | "source": [ 37 | "In this notebook, we will build an abstractive text summarizer from scratch in Python, using deep learning with Keras.\n", 38 | "\n", 39 | "I recommend reading [this article](https://www.analyticsvidhya.com/blog/2019/06/comprehensive-guide-text-summarization-using-deep-learning-python/) first, since it covers the concepts required to build our own summarizer." 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": { 45 | "id": "F5dSoP8lGMZi", 46 | "colab_type": "text" 47 | }, 48 | "source": [ 49 | "# Understanding the Problem Statement\n", 50 | "\n", 51 | "Customer reviews can often be long and descriptive. Analyzing these reviews manually, as you can imagine, is really time-consuming. This is where Natural Language Processing can be applied to generate a summary for long reviews.\n", 52 | "\n", 53 | "We will be working on a really cool dataset. 
Our objective here is to generate a summary for the Amazon Fine Food reviews using the abstraction-based approach described in the article linked above. You can download the dataset from [here](https://www.kaggle.com/snap/amazon-fine-food-reviews).\n", 54 | "\n", 55 | "It’s time to fire up our Jupyter notebooks! Let’s dive into the implementation details right away.\n", 56 | "\n", 57 | "# Custom Attention Layer\n", 58 | "\n", 59 | "When this notebook was written, Keras did not officially provide an attention layer (newer TensorFlow releases include `tf.keras.layers.Attention` and `tf.keras.layers.AdditiveAttention`). So, we can either implement our own attention layer or use a third-party implementation. We will go with the latter option for this article. You can download the attention layer from [here](https://github.com/thushv89/attention_keras/blob/master/layers/attention.py) and save it in a separate file named attention.py.\n", 60 | "\n", 61 | "Let’s import it into our environment:" 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "metadata": { 67 | "trusted": true, 68 | "id": "Fi64aA0FFxcS", 69 | "colab_type": "code", 70 | "colab": {} 71 | }, 72 | "source": [ 73 | "from attention import AttentionLayer" 74 | ], 75 | "execution_count": 0, 76 | "outputs": [] 77 | }, 78 | { 79 | "cell_type": "markdown", 80 | "metadata": { 81 | "id": "JUValOzcHtEK", 82 | "colab_type": "text" 83 | }, 84 | "source": [ 85 | "# Import the Libraries" 86 | ] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "metadata": { 91 | "_uuid": "8f2839f25d086af736a60e9eeb907d3b93b6e0e5", 92 | "_cell_guid": "b1076dfc-b9ad-4769-8c92-a6c4dae69d19", 93 | "trusted": true, 94 | "id": "_Jpu8qLEFxcY", 95 | "colab_type": "code", 96 | "colab": {}, 97 | "outputId": "95968e01-faac-4911-c802-9c008a4e62cf" 98 | }, 99 | "source": [ 100 | "import numpy as np\n", 101 | "import pandas as pd \n", 102 | "import re\n", 103 | "from bs4 import BeautifulSoup\n", 104 | "from keras.preprocessing.text import Tokenizer \n", 105 | "from keras.preprocessing.sequence import pad_sequences\n", 106 | "from nltk.corpus import stopwords\n", 107 | "from tensorflow.keras.layers 
import Input, LSTM, Embedding, Dense, Concatenate, TimeDistributed\n", 108 | "from tensorflow.keras.models import Model\n", 109 | "from tensorflow.keras.callbacks import EarlyStopping\n", 110 | "import warnings\n", 111 | "pd.set_option(\"display.max_colwidth\", 200)\n", 112 | "warnings.filterwarnings(\"ignore\")" 113 | ], 114 | "execution_count": 0, 115 | "outputs": [ 116 | { 117 | "output_type": "stream", 118 | "text": [ 119 | "Using TensorFlow backend.\n" 120 | ], 121 | "name": "stderr" 122 | } 123 | ] 124 | }, 125 | { 126 | "cell_type": "markdown", 127 | "metadata": { 128 | "id": "UVakjZ3oICgx", 129 | "colab_type": "text" 130 | }, 131 | "source": [ 132 | "# Read the dataset\n", 133 | "\n", 134 | "This dataset consists of reviews of fine foods from Amazon. The data spans a period of more than 10 years, including all ~500,000 reviews up to October 2012. Each review includes product and user information, a rating, the plain-text review, and a summary. It also includes reviews from all other Amazon categories.\n", 135 | "\n", 136 | "We’ll load only the first 100,000 reviews to reduce the training time of our model. Feel free to use the entire dataset for training your model if your machine has that kind of computational power." 
137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "metadata": { 142 | "trusted": true, 143 | "id": "wnK5o4Z1Fxcj", 144 | "colab_type": "code", 145 | "colab": {} 146 | }, 147 | "source": [ 148 | "data=pd.read_csv(\"../input/amazon-fine-food-reviews/Reviews.csv\",nrows=100000)" 149 | ], 150 | "execution_count": 0, 151 | "outputs": [] 152 | }, 153 | { 154 | "cell_type": "markdown", 155 | "metadata": { 156 | "id": "kGNQKvCaISIn", 157 | "colab_type": "text" 158 | }, 159 | "source": [ 160 | "# Drop Duplicates and NA values" 161 | ] 162 | }, 163 | { 164 | "cell_type": "code", 165 | "metadata": { 166 | "trusted": true, 167 | "id": "Cjul88oOFxcr", 168 | "colab_type": "code", 169 | "colab": {} 170 | }, 171 | "source": [ 172 | "data.drop_duplicates(subset=['Text'],inplace=True)#dropping duplicates\n", 173 | "data.dropna(axis=0,inplace=True)#dropping na" 174 | ], 175 | "execution_count": 0, 176 | "outputs": [] 177 | }, 178 | { 179 | "cell_type": "markdown", 180 | "metadata": { 181 | "id": "qi0xD6BkIWAm", 182 | "colab_type": "text" 183 | }, 184 | "source": [ 185 | "# Information about dataset\n", 186 | "\n", 187 | "Let us look at datatypes and shape of the dataset" 188 | ] 189 | }, 190 | { 191 | "cell_type": "code", 192 | "metadata": { 193 | "trusted": true, 194 | "id": "__fy-JxTFxc9", 195 | "colab_type": "code", 196 | "colab": {}, 197 | "outputId": "d42c6e36-bbc8-43c2-de0e-d3effe3e8c4c" 198 | }, 199 | "source": [ 200 | "data.info()" 201 | ], 202 | "execution_count": 0, 203 | "outputs": [ 204 | { 205 | "output_type": "stream", 206 | "text": [ 207 | "\n", 208 | "Int64Index: 88421 entries, 0 to 99999\n", 209 | "Data columns (total 10 columns):\n", 210 | "Id 88421 non-null int64\n", 211 | "ProductId 88421 non-null object\n", 212 | "UserId 88421 non-null object\n", 213 | "ProfileName 88421 non-null object\n", 214 | "HelpfulnessNumerator 88421 non-null int64\n", 215 | "HelpfulnessDenominator 88421 non-null int64\n", 216 | "Score 88421 non-null int64\n", 217 | "Time 88421 
non-null int64\n", 218 | "Summary 88421 non-null object\n", 219 | "Text 88421 non-null object\n", 220 | "dtypes: int64(5), object(5)\n", 221 | "memory usage: 7.4+ MB\n" 222 | ], 223 | "name": "stdout" 224 | } 225 | ] 226 | }, 227 | { 228 | "cell_type": "markdown", 229 | "metadata": { 230 | "id": "r0xLYACiFxdJ", 231 | "colab_type": "text" 232 | }, 233 | "source": [ 234 | "#Preprocessing\n", 235 | "\n", 236 | "Performing basic preprocessing steps is very important before we get to the model building part. Using messy and uncleaned text data is a potentially disastrous move. So in this step, we will drop all the unwanted symbols, characters, etc. from the text that do not affect the objective of our problem.\n", 237 | "\n", 238 | "Here is the dictionary that we will use for expanding the contractions:" 239 | ] 240 | }, 241 | { 242 | "cell_type": "code", 243 | "metadata": { 244 | "trusted": true, 245 | "id": "0s6IY-x2FxdL", 246 | "colab_type": "code", 247 | "colab": {} 248 | }, 249 | "source": [ 250 | "contraction_mapping = {\"ain't\": \"is not\", \"aren't\": \"are not\",\"can't\": \"cannot\", \"'cause\": \"because\", \"could've\": \"could have\", \"couldn't\": \"could not\",\n", 251 | " \"didn't\": \"did not\", \"doesn't\": \"does not\", \"don't\": \"do not\", \"hadn't\": \"had not\", \"hasn't\": \"has not\", \"haven't\": \"have not\",\n", 252 | " \"he'd\": \"he would\",\"he'll\": \"he will\", \"he's\": \"he is\", \"how'd\": \"how did\", \"how'd'y\": \"how do you\", \"how'll\": \"how will\", \"how's\": \"how is\",\n", 253 | " \"I'd\": \"I would\", \"I'd've\": \"I would have\", \"I'll\": \"I will\", \"I'll've\": \"I will have\",\"I'm\": \"I am\", \"I've\": \"I have\", \"i'd\": \"i would\",\n", 254 | " \"i'd've\": \"i would have\", \"i'll\": \"i will\", \"i'll've\": \"i will have\",\"i'm\": \"i am\", \"i've\": \"i have\", \"isn't\": \"is not\", \"it'd\": \"it would\",\n", 255 | " \"it'd've\": \"it would have\", \"it'll\": \"it will\", \"it'll've\": \"it will 
have\",\"it's\": \"it is\", \"let's\": \"let us\", \"ma'am\": \"madam\",\n", 256 | " \"mayn't\": \"may not\", \"might've\": \"might have\",\"mightn't\": \"might not\",\"mightn't've\": \"might not have\", \"must've\": \"must have\",\n", 257 | " \"mustn't\": \"must not\", \"mustn't've\": \"must not have\", \"needn't\": \"need not\", \"needn't've\": \"need not have\",\"o'clock\": \"of the clock\",\n", 258 | " \"oughtn't\": \"ought not\", \"oughtn't've\": \"ought not have\", \"shan't\": \"shall not\", \"sha'n't\": \"shall not\", \"shan't've\": \"shall not have\",\n", 259 | " \"she'd\": \"she would\", \"she'd've\": \"she would have\", \"she'll\": \"she will\", \"she'll've\": \"she will have\", \"she's\": \"she is\",\n", 260 | " \"should've\": \"should have\", \"shouldn't\": \"should not\", \"shouldn't've\": \"should not have\", \"so've\": \"so have\",\"so's\": \"so as\",\n", 261 | " \"this's\": \"this is\",\"that'd\": \"that would\", \"that'd've\": \"that would have\", \"that's\": \"that is\", \"there'd\": \"there would\",\n", 262 | " \"there'd've\": \"there would have\", \"there's\": \"there is\", \"here's\": \"here is\",\"they'd\": \"they would\", \"they'd've\": \"they would have\",\n", 263 | " \"they'll\": \"they will\", \"they'll've\": \"they will have\", \"they're\": \"they are\", \"they've\": \"they have\", \"to've\": \"to have\",\n", 264 | " \"wasn't\": \"was not\", \"we'd\": \"we would\", \"we'd've\": \"we would have\", \"we'll\": \"we will\", \"we'll've\": \"we will have\", \"we're\": \"we are\",\n", 265 | " \"we've\": \"we have\", \"weren't\": \"were not\", \"what'll\": \"what will\", \"what'll've\": \"what will have\", \"what're\": \"what are\",\n", 266 | " \"what's\": \"what is\", \"what've\": \"what have\", \"when's\": \"when is\", \"when've\": \"when have\", \"where'd\": \"where did\", \"where's\": \"where is\",\n", 267 | " \"where've\": \"where have\", \"who'll\": \"who will\", \"who'll've\": \"who will have\", \"who's\": \"who is\", \"who've\": \"who 
have\",\n", 268 | " \"why's\": \"why is\", \"why've\": \"why have\", \"will've\": \"will have\", \"won't\": \"will not\", \"won't've\": \"will not have\",\n", 269 | " \"would've\": \"would have\", \"wouldn't\": \"would not\", \"wouldn't've\": \"would not have\", \"y'all\": \"you all\",\n", 270 | " \"y'all'd\": \"you all would\",\"y'all'd've\": \"you all would have\",\"y'all're\": \"you all are\",\"y'all've\": \"you all have\",\n", 271 | " \"you'd\": \"you would\", \"you'd've\": \"you would have\", \"you'll\": \"you will\", \"you'll've\": \"you will have\",\n", 272 | " \"you're\": \"you are\", \"you've\": \"you have\"}" 273 | ], 274 | "execution_count": 0, 275 | "outputs": [] 276 | }, 277 | { 278 | "cell_type": "markdown", 279 | "metadata": { 280 | "id": "2JFRXFHmI7Mj", 281 | "colab_type": "text" 282 | }, 283 | "source": [ 284 | "We will perform the following preprocessing tasks on our data, in the order the function applies them:\n", 285 | "\n", 286 | "1. Convert everything to lowercase\n", 287 | "\n", 288 | "2. Remove HTML tags\n", 289 | "\n", 290 | "3. Remove any text inside parentheses ( )\n", 291 | "\n", 292 | "4. Expand contractions using the mapping above\n", 293 | "\n", 294 | "5. Remove the possessive ’s\n", 295 | "\n", 296 | "6. Remove punctuation and special characters\n", 297 | "\n", 298 | "7. Remove stopwords (review text only, not summaries)\n", 299 | "\n", 300 | "8. Remove short words (single characters)\n", 301 | "\n", 302 | "Let’s define the function:" 303 | ] 304 | }, 305 | { 306 | "cell_type": "code", 307 | "metadata": { 308 | "trusted": true, 309 | "id": "XZr-u3OEFxdT", 310 | "colab_type": "code", 311 | "colab": {} 312 | }, 313 | "source": [ 314 | "stop_words = set(stopwords.words('english')) \n", 315 | "\n", 316 | "def text_cleaner(text,num):\n", 317 | "    newString = text.lower()\n", 318 | "    newString = BeautifulSoup(newString, \"lxml\").text\n", 319 | "    newString = re.sub(r'\\([^)]*\\)', '', newString)\n", 320 | "    newString = re.sub('\"','', newString)\n", 321 | "    newString = ' '.join([contraction_mapping[t] if t in contraction_mapping else t for t in newString.split(\" \")])
\n", 322 | "    newString = re.sub(r\"'s\\b\",\"\",newString)\n", 323 | "    newString = re.sub(\"[^a-zA-Z]\", \" \", newString) \n", 324 | "    newString = re.sub('[m]{2,}', 'mm', newString)\n", 325 | "    if(num==0):\n", 326 | "        tokens = [w for w in newString.split() if w not in stop_words]\n", 327 | "    else:\n", 328 | "        tokens=newString.split()\n", 329 | "    long_words=[]\n", 330 | "    for i in tokens:\n", 331 | "        if len(i)>1: #removing short words\n", 332 | "            long_words.append(i) \n", 333 | "    return (\" \".join(long_words)).strip()" 334 | ], 335 | "execution_count": 0, 336 | "outputs": [] 337 | }, 338 | { 339 | "cell_type": "code", 340 | "metadata": { 341 | "trusted": true, 342 | "id": "A2QAeCHWFxdY", 343 | "colab_type": "code", 344 | "colab": {} 345 | }, 346 | "source": [ 347 | "#call the function\n", 348 | "cleaned_text = []\n", 349 | "for t in data['Text']:\n", 350 | "    cleaned_text.append(text_cleaner(t,0)) " 351 | ], 352 | "execution_count": 0, 353 | "outputs": [] 354 | }, 355 | { 356 | "cell_type": "markdown", 357 | "metadata": { 358 | "id": "snRZY8wjLao2", 359 | "colab_type": "text" 360 | }, 361 | "source": [ 362 | "Let us look at the first five preprocessed reviews:" 363 | ] 364 | }, 365 | { 366 | "cell_type": "code", 367 | "metadata": { 368 | "trusted": true, 369 | "id": "NCAIkhWbFxdh", 370 | "colab_type": "code", 371 | "colab": {}, 372 | "outputId": "c2da1a36-4488-4e32-ef9e-fcfe496e374d" 373 | }, 374 | "source": [ 375 | "cleaned_text[:5] " 376 | ], 377 | "execution_count": 0, 378 | "outputs": [ 379 | { 380 | "output_type": "execute_result", 381 | "data": { 382 | "text/plain": [ 383 | "['bought several vitality canned dog food products found good quality product looks like stew processed meat smells better labrador finicky appreciates product better',\n", 384 | " 'product arrived labeled jumbo salted peanuts peanuts actually small sized unsalted sure error vendor intended represent product jumbo',\n", 385 | " 'confection around centuries light pillowy citrus gelatin nuts 
case filberts cut tiny squares liberally coated powdered sugar tiny mouthful heaven chewy flavorful highly recommend yummy treat familiar story lewis lion witch wardrobe treat seduces edmund selling brother sisters witch',\n", 386 | " 'looking secret ingredient robitussin believe found got addition root beer extract ordered made cherry soda flavor medicinal',\n", 387 | " 'great taffy great price wide assortment yummy taffy delivery quick taffy lover deal']" 388 | ] 389 | }, 390 | "metadata": { 391 | "tags": [] 392 | }, 393 | "execution_count": 9 394 | } 395 | ] 396 | }, 397 | { 398 | "cell_type": "code", 399 | "metadata": { 400 | "trusted": true, 401 | "id": "GsRXocxoFxd-", 402 | "colab_type": "code", 403 | "colab": {} 404 | }, 405 | "source": [ 406 | "#call the function\n", 407 | "cleaned_summary = []\n", 408 | "for t in data['Summary']:\n", 409 | " cleaned_summary.append(text_cleaner(t,1))" 410 | ], 411 | "execution_count": 0, 412 | "outputs": [] 413 | }, 414 | { 415 | "cell_type": "markdown", 416 | "metadata": { 417 | "id": "oZeD0gs6Lnb-", 418 | "colab_type": "text" 419 | }, 420 | "source": [ 421 | "Let us look at the first 10 preprocessed summaries" 422 | ] 423 | }, 424 | { 425 | "cell_type": "code", 426 | "metadata": { 427 | "trusted": true, 428 | "id": "jQJdZcAzFxee", 429 | "colab_type": "code", 430 | "colab": {}, 431 | "outputId": "a1fbe683-c03f-4afb-addf-e075021c121b" 432 | }, 433 | "source": [ 434 | "cleaned_summary[:10]" 435 | ], 436 | "execution_count": 0, 437 | "outputs": [ 438 | { 439 | "output_type": "execute_result", 440 | "data": { 441 | "text/plain": [ 442 | "['good quality dog food',\n", 443 | " 'not as advertised',\n", 444 | " 'delight says it all',\n", 445 | " 'cough medicine',\n", 446 | " 'great taffy',\n", 447 | " 'nice taffy',\n", 448 | " 'great just as good as the expensive brands',\n", 449 | " 'wonderful tasty taffy',\n", 450 | " 'yay barley',\n", 451 | " 'healthy dog food']" 452 | ] 453 | }, 454 | "metadata": { 455 | "tags": [] 456 | }, 
457 | "execution_count": 11 458 | } 459 | ] 460 | }, 461 | { 462 | "cell_type": "code", 463 | "metadata": { 464 | "trusted": true, 465 | "id": "L1zLpnqsFxey", 466 | "colab_type": "code", 467 | "colab": {} 468 | }, 469 | "source": [ 470 | "data['cleaned_text']=cleaned_text\n", 471 | "data['cleaned_summary']=cleaned_summary" 472 | ], 473 | "execution_count": 0, 474 | "outputs": [] 475 | }, 476 | { 477 | "cell_type": "markdown", 478 | "metadata": { 479 | "id": "KT_D2cLiLy77", 480 | "colab_type": "text" 481 | }, 482 | "source": [ 483 | "#Drop empty rows" 484 | ] 485 | }, 486 | { 487 | "cell_type": "code", 488 | "metadata": { 489 | "trusted": true, 490 | "id": "sYK390unFxfA", 491 | "colab_type": "code", 492 | "colab": {} 493 | }, 494 | "source": [ 495 | "data.replace('', np.nan, inplace=True)\n", 496 | "data.dropna(axis=0,inplace=True)" 497 | ], 498 | "execution_count": 0, 499 | "outputs": [] 500 | }, 501 | { 502 | "cell_type": "markdown", 503 | "metadata": { 504 | "id": "Vm8Fk2TCL7Sp", 505 | "colab_type": "text" 506 | }, 507 | "source": [ 508 | "#Understanding the distribution of the sequences\n", 509 | "\n", 510 | "Here, we will analyze the length of the reviews and the summary to get an overall idea about the distribution of length of the text. 
This will help us choose the maximum sequence lengths:" 511 | ] 512 | }, 513 | { 514 | "cell_type": "code", 515 | "metadata": { 516 | "trusted": true, 517 | "id": "MdF76AHHFxgw", 518 | "colab_type": "code", 519 | "colab": {}, 520 | "outputId": "e3bbe165-4235-482f-bfd4-36a3f1d95290" 521 | }, 522 | "source": [ 523 | "import matplotlib.pyplot as plt\n", 524 | "\n", 525 | "text_word_count = []\n", 526 | "summary_word_count = []\n", 527 | "\n", 528 | "# populate the lists with word counts\n", 529 | "for i in data['cleaned_text']:\n", 530 | "    text_word_count.append(len(i.split()))\n", 531 | "\n", 532 | "for i in data['cleaned_summary']:\n", 533 | "    summary_word_count.append(len(i.split()))\n", 534 | "\n", 535 | "length_df = pd.DataFrame({'text':text_word_count, 'summary':summary_word_count})\n", 536 | "\n", 537 | "length_df.hist(bins = 30)\n", 538 | "plt.show()" 539 | ], 540 | "execution_count": 0, 541 | "outputs": [ 542 | { 543 | "output_type": "display_data", 544 | "data": { 545 | "text/plain": [ 546 | "[Figure: two histograms with 30 bins each, showing word counts per row in panels titled 'summary' and 'text']" 547 | ]
549 | }, 550 | "metadata": { 551 | "tags": [] 552 | } 553 | } 554 | ] 555 | }, 556 | { 557 | "cell_type": "markdown", 558 | "metadata": { 559 | "id": "QwdSGIhGMEbz", 560 | "colab_type": "text" 561 | }, 562 | "source": [ 563 | "Interesting. We can fix the maximum length of the summary at 8 words, since most summaries appear to be no longer than that.\n", 564 | "\n", 565 | "Let us check what proportion of the summaries contain 8 words or fewer:" 566 | ] 567 | }, 568 | { 569 | "cell_type": "code", 570 | "metadata": { 571 | "trusted": true, 572 | "id": "7JRjwdIOFxg3", 573 | "colab_type": "code", 574 | "colab": {}, 575 | "outputId": "f968be82-c539-471d-ce23-16f18b059ea0" 576 | }, 577 | "source": [ 578 | "cnt=0\n", 579 | "for i in data['cleaned_summary']:\n", 580 | "    if(len(i.split())<=8):\n", 581 | "        cnt=cnt+1\n", 582 | "print(cnt/len(data['cleaned_summary']))" 583 | ], 584 | "execution_count": 0, 585 | "outputs": [ 586 | { 587 | "output_type": "stream", 588 | "text": [ 589 | "0.9424907471335922\n" 590 | ], 591 | "name": "stdout" 592 | } 593 | ] 594 | }, 595 | { 596 | "cell_type": "markdown", 597 | "metadata": { 598 | "id": "yYB4Ga9KMjEu", 599 | "colab_type": "text" 600 | }, 601 | "source": [ 602 | "We observe that about 94% of the summaries contain 8 words or fewer. 
So, we can fix the maximum summary length at 8.\n", 603 | "\n", 604 | "Let us fix the maximum review length at 30." 605 | ] 606 | }, 607 | { 608 | "cell_type": "code", 609 | "metadata": { 610 | "trusted": true, 611 | "id": "ZKD5VOWqFxhC", 612 | "colab_type": "code", 613 | "colab": {} 614 | }, 615 | "source": [ 616 | "max_text_len=30\n", 617 | "max_summary_len=8" 618 | ], 619 | "execution_count": 0, 620 | "outputs": [] 621 | }, 622 | { 623 | "cell_type": "markdown", 624 | "metadata": { 625 | "id": "E6d48E-8M4VO", 626 | "colab_type": "text" 627 | }, 628 | "source": [ 629 | "Let us select the reviews and summaries whose lengths are less than or equal to **max_text_len** and **max_summary_len**, respectively." 630 | ] 631 | }, 632 | { 633 | "cell_type": "code", 634 | "metadata": { 635 | "trusted": true, 636 | "id": "yY0tEJP0FxhI", 637 | "colab_type": "code", 638 | "colab": {} 639 | }, 640 | "source": [ 641 | "cleaned_text=np.array(data['cleaned_text'])\n", 642 | "cleaned_summary=np.array(data['cleaned_summary'])\n", 643 | "\n", 644 | "short_text=[]\n", 645 | "short_summary=[]\n", 646 | "\n", 647 | "for i in range(len(cleaned_text)):\n", 648 | "    if(len(cleaned_summary[i].split())<=max_summary_len and len(cleaned_text[i].split())<=max_text_len):\n", 649 | "        short_text.append(cleaned_text[i])\n", 650 | "        short_summary.append(cleaned_summary[i])\n", 651 | "\n", 652 | "df=pd.DataFrame({'text':short_text,'summary':short_summary})" 653 | ], 654 | "execution_count": 0, 655 | "outputs": [] 656 | }, 657 | { 658 | "cell_type": "markdown", 659 | "metadata": { 660 | "id": "tR1uh8xSNUma", 661 | "colab_type": "text" 662 | }, 663 | "source": [ 664 | "Remember to add the **START** and **END** special tokens at the beginning and end of each summary. 
Here, I have chosen **sostok** and **eostok** as the START and END tokens.\n", 665 | "\n", 666 | "**Note:** Be sure that the chosen special tokens never appear in the summaries themselves." 667 | ] 668 | }, 669 | { 670 | "cell_type": "code", 671 | "metadata": { 672 | "trusted": true, 673 | "id": "EwLUH78CFxhg", 674 | "colab_type": "code", 675 | "colab": {} 676 | }, 677 | "source": [ 678 | "df['summary'] = df['summary'].apply(lambda x : 'sostok '+ x + ' eostok')" 679 | ], 680 | "execution_count": 0, 681 | "outputs": [] 682 | }, 683 | { 684 | "cell_type": "markdown", 685 | "metadata": { 686 | "id": "1GlcX4RFOh13", 687 | "colab_type": "text" 688 | }, 689 | "source": [ 690 | "We are getting closer to the model-building part. Before that, we need to split our dataset into a training set and a validation set. We’ll use 90% of the dataset as the training data and evaluate the performance on the remaining 10% (holdout set):" 691 | ] 692 | }, 693 | { 694 | "cell_type": "code", 695 | "metadata": { 696 | "trusted": true, 697 | "id": "RakakKHcFxhl", 698 | "colab_type": "code", 699 | "colab": {} 700 | }, 701 | "source": [ 702 | "from sklearn.model_selection import train_test_split\n", 703 | "x_tr,x_val,y_tr,y_val=train_test_split(np.array(df['text']),np.array(df['summary']),test_size=0.1,random_state=0,shuffle=True)" 704 | ], 705 | "execution_count": 0, 706 | "outputs": [] 707 | }, 708 | { 709 | "cell_type": "markdown", 710 | "metadata": { 711 | "id": "Vq1mqyOHOtIl", 712 | "colab_type": "text" 713 | }, 714 | "source": [ 715 | "#Preparing the Tokenizer\n", 716 | "\n", 717 | "A tokenizer builds the vocabulary and converts a word sequence into an integer sequence. 
Go ahead and build tokenizers for the text and the summary:\n", 718 | "\n", 719 | "#Text Tokenizer" 720 | ] 721 | }, 722 | { 723 | "cell_type": "code", 724 | "metadata": { 725 | "trusted": true, 726 | "id": "oRHTgX6hFxhq", 727 | "colab_type": "code", 728 | "colab": {} 729 | }, 730 | "source": [ 731 | "from keras.preprocessing.text import Tokenizer\n", 732 | "from keras.preprocessing.sequence import pad_sequences\n", 733 | "\n", 734 | "#prepare a tokenizer for reviews on training data\n", 735 | "x_tokenizer = Tokenizer()\n", 736 | "x_tokenizer.fit_on_texts(list(x_tr))" 737 | ], 738 | "execution_count": 0, 739 | "outputs": [] 740 | }, 741 | { 742 | "cell_type": "markdown", 743 | "metadata": { 744 | "id": "RzvLwYL_PDcx", 745 | "colab_type": "text" 746 | }, 747 | "source": [ 748 | "#Rare Words and Their Coverage\n", 749 | "\n", 750 | "Let us look at the proportion of rare words and their total coverage in the entire text.\n", 751 | "\n", 752 | "Here, I am setting the threshold to 4, which means that a word whose count is below 4 is considered a rare word." 753 | ] 754 | }, 755 | { 756 | "cell_type": "code", 757 | "metadata": { 758 | "trusted": true, 759 | "id": "y8KronV2Fxhx", 760 | "colab_type": "code", 761 | "colab": {}, 762 | "outputId": "d2eb2f27-fbbc-4e61-9556-3c3ff5e4327b" 763 | }, 764 | "source": [ 765 | "thresh=4\n", 766 | "\n", 767 | "cnt=0\n", 768 | "tot_cnt=0\n", 769 | "freq=0\n", 770 | "tot_freq=0\n", 771 | "\n", 772 | "for key,value in x_tokenizer.word_counts.items():\n", 773 | "    tot_cnt=tot_cnt+1\n", 774 | "    tot_freq=tot_freq+value\n", 775 | "    if(value