├── Chapter 01
    └── NLTK Chapter 1 - Exercises.ipynb
├── Chapter 02
    ├── NLTK Chapter 2 - Exercises.ipynb
    ├── NLTK Chapter 2 - Notes.ipynb
    └── alice.txt
├── Chapter 03
    ├── NLTK Chapter 3 - Exercises.ipynb
    ├── NLTK Chapter 3 - Notes.ipynb
    ├── Nationalities.csv
    └── tranzlator.yml
├── Chapter 04
    ├── NLTK Chapter 4 - Exercises.ipynb
    └── NLTK Chapter 4 - Notes.ipynb
├── Chapter 05
    ├── NLTK Chapter 5 - Exercises.ipynb
    └── NLTK Chapter 5 - Notes.ipynb
├── Chapter 06
    └── NLTK Chapter 6 - Notes and Exercises.ipynb
├── Chapter 07
    └── NLTK Chapter 7 - Notes and Exercises.ipynb
├── Chapter 08
    ├── NLTK Chapter 8 - Exercises.ipynb
    └── NLTK Chapter 8 - Notes.ipynb
├── Chapter 09
    └── NLTK Chapter 9 - Notes and Exercises.ipynb
├── Chapter 10
    └── NLTK Chapter 10 - Notes .ipynb
├── Chapter 11
    ├── NLTK Chapter 11 - Notes & Exercises.ipynb
    └── Swadesh Lists.csv
└── README.md

/Chapter 01/NLTK Chapter 1 - Exercises.ipynb:
--------------------------------------------------------------------------------
1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# NLTK Chapter 1\n", 8 | "\n", 9 | "## Language Processing and Python\n", 10 | "\n", 11 | "*The html version of this chapter in the NLTK book is available [here](https://www.nltk.org/book/ch01.html#exercises \"Ch01 Exercises\").*\n", 12 | "\n", 13 | "### 8 Exercises\n", 14 | "\n", 15 | "###### 1. \n", 16 | "\n", 17 | "☼ Try using the Python interpreter as a calculator, and typing expressions like 12 / (4 + 1)." 18 | ] 19 | }, 20 | { 21 | "cell_type": "code", 22 | "execution_count": 2, 23 | "metadata": {}, 24 | "outputs": [ 25 | { 26 | "data": { 27 | "text/plain": [ 28 | "8" 29 | ] 30 | }, 31 | "execution_count": 2, 32 | "metadata": {}, 33 | "output_type": "execute_result" 34 | } 35 | ], 36 | "source": [ 37 | "8008 // ((10 * 10 * 10) + 1)" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "###### 2. 
\n", 45 | "\n", 46 | "☼ Given an alphabet of 26 letters, there are 26 to the power 10, or `26 ** 10`, ten-letter strings we can form. That works out to `141167095653376`. How many hundred-letter strings are possible?\n", 47 | "\n", 48 | "*`str.format` has not yet been introduced in the book, but its `{:,}` format specifier inserts thousands separators, which makes very large numbers much more legible.*" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 4, 54 | "metadata": {}, 55 | "outputs": [ 56 | { 57 | "name": "stdout", 58 | "output_type": "stream", 59 | "text": [ 60 | "3,142,930,641,582,938,830,174,357,788,501,626,427,282,669,988,762,475,256,374,173,175,398,995,908,420,104,023,465,432,599,069,702,289,330,964,075,081,611,719,197,835,869,803,511,992,549,376\n" 61 | ] 62 | } 63 | ], 64 | "source": [ 65 | "print(\"{:,}\".format(26 ** 100))" 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": {}, 71 | "source": [ 72 | "###### 3. \n", 73 | "☼ The Python multiplication operation can be applied to lists. What happens when you type `['Monty', 'Python'] * 20`, or `3 * sent1`?" 
74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": 13, 79 | "metadata": {}, 80 | "outputs": [ 81 | { 82 | "name": "stdout", 83 | "output_type": "stream", 84 | "text": [ 85 | "['Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python', 'Monty', 'Python']" 86 | ] 87 | } 88 | ], 89 | "source": [ 90 | "print(['Monty', 'Python'] * 20, end = '')" 91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": 10, 96 | "metadata": {}, 97 | "outputs": [ 98 | { 99 | "name": "stdout", 100 | "output_type": "stream", 101 | "text": [ 102 | "*** Introductory Examples for the NLTK Book ***\n", 103 | "Loading text1, ..., text9 and sent1, ..., sent9\n", 104 | "Type the name of the text or sentence to view it.\n", 105 | "Type: 'texts()' or 'sents()' to list the materials.\n", 106 | "text1: Moby Dick by Herman Melville 1851\n", 107 | "text2: Sense and Sensibility by Jane Austen 1811\n", 108 | "text3: The Book of Genesis\n", 109 | "text4: Inaugural Address Corpus\n", 110 | "text5: Chat Corpus\n", 111 | "text6: Monty Python and the Holy Grail\n", 112 | "text7: Wall Street Journal\n", 113 | "text8: Personals Corpus\n", 114 | "text9: The Man Who Was Thursday by G . K . 
Chesterton 1908\n" 115 | ] 116 | } 117 | ], 118 | "source": [ 119 | "import nltk\n", 120 | "\n", 121 | "from nltk.book import *\n" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": 12, 127 | "metadata": {}, 128 | "outputs": [ 129 | { 130 | "name": "stdout", 131 | "output_type": "stream", 132 | "text": [ 133 | "['Call', 'me', 'Ishmael', '.', 'Call', 'me', 'Ishmael', '.', 'Call', 'me', 'Ishmael', '.']" 134 | ] 135 | } 136 | ], 137 | "source": [ 138 | "print(3 * sent1, end = '')" 139 | ] 140 | }, 141 | { 142 | "cell_type": "markdown", 143 | "metadata": {}, 144 | "source": [ 145 | "###### 4.\n", 146 | "\n", 147 | "☼ Review [1](https://www.nltk.org/book/ch01.html#sec-computing-with-language-texts-and-words \"computing with language\") on computing with language. How many words are there in `text2`? How many distinct words are there?" 148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": 30, 153 | "metadata": {}, 154 | "outputs": [ 155 | { 156 | "data": { 157 | "text/plain": [ 158 | "6833" 159 | ] 160 | }, 161 | "execution_count": 30, 162 | "metadata": {}, 163 | "output_type": "execute_result" 164 | } 165 | ], 166 | "source": [ 167 | "# number of distinct words (types); len(text2) would give the total token count\n", 168 | "\n", 169 | "len(set(text2))" 170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": {}, 175 | "source": [ 176 | "*This isn't entirely accurate, as punctuation marks (e.g., periods and commas) are included as tokens. To eliminate non-alphabetic strings, we can use a list comprehension with the method `.isalpha()`.* \n", 177 | "\n", 178 | "*However, another problem is that the tokenizer splits words with apostrophes (i.e., contractions and possessives) into their constituent parts, so that a word such as `didn't` would become `didn`, `'`, and `t`. `.isalpha()` will remove apostrophes from the count, but it would be inaccurate to separately count all the enclitics `d`, `ll`, `m`, `re`, `s`, `t`, and `ve` that result from tokenization. 
Therefore, we'll add the condition `and w not in (\"d\", \"ll\", \"m\", \"re\", \"s\", \"t\", \"ve\")` to the list comprehension to account for these:*" 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": 31, 184 | "metadata": {}, 185 | "outputs": [ 186 | { 187 | "data": { 188 | "text/plain": [ 189 | "6713" 190 | ] 191 | }, 192 | "execution_count": 31, 193 | "metadata": {}, 194 | "output_type": "execute_result" 195 | } 196 | ], 197 | "source": [ 198 | "len(set([w for w in text2 if w.isalpha()]))" 199 | ] 200 | }, 201 | { 202 | "cell_type": "code", 203 | "execution_count": 307, 204 | "metadata": {}, 205 | "outputs": [ 206 | { 207 | "data": { 208 | "text/plain": [ 209 | "6706" 210 | ] 211 | }, 212 | "execution_count": 307, 213 | "metadata": {}, 214 | "output_type": "execute_result" 215 | } 216 | ], 217 | "source": [ 218 | "len(set([w for w in text2 if w.isalpha() and w not in (\"d\", \"ll\", \"m\", \"re\", \"s\", \"t\", \"ve\")]))" 219 | ] 220 | }, 221 | { 222 | "cell_type": "markdown", 223 | "metadata": {}, 224 | "source": [ 225 | "*If by chance we want to include numbers (note that this cell excludes only `d`, `ll`, `s`, and `t`, so its count is not directly comparable with the one above):*" 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": 33, 231 | "metadata": {}, 232 | "outputs": [ 233 | { 234 | "data": { 235 | "text/plain": [ 236 | "6762" 237 | ] 238 | }, 239 | "execution_count": 33, 240 | "metadata": {}, 241 | "output_type": "execute_result" 242 | } 243 | ], 244 | "source": [ 245 | "len(set([w for w in text2 if w.isalnum() and w not in (\"d\", \"ll\", \"s\", \"t\")]))" 246 | ] 247 | }, 248 | { 249 | "cell_type": "markdown", 250 | "metadata": {}, 251 | "source": [ 252 | "###### 5.\n", 253 | "\n", 254 | "☼ Compare the lexical diversity scores for humor and romance fiction in [1.1](https://www.nltk.org/book/ch01.html#tab-brown-types \"Brown types\"). 
Which genre is more lexically diverse?\n", 255 | "\n", 256 | "*Humor (0.231) is about twice as lexically diverse as romance fiction (0.121).*" 257 | ] 258 | }, 259 | { 260 | "cell_type": "markdown", 261 | "metadata": {}, 262 | "source": [ 263 | "###### 6. \n", 264 | "\n", 265 | "☼ Produce a dispersion plot of the four main protagonists in *Sense and Sensibility*: Elinor, Marianne, Edward, and Willoughby. What can you observe about the different roles played by the males and females in this novel? Can you identify the couples?" 266 | ] 267 | }, 268 | { 269 | "cell_type": "code", 270 | "execution_count": 35, 271 | "metadata": {}, 272 | "outputs": [ 273 | { 274 | "data": { 275 | "image/png": "[base64-encoded PNG omitted: dispersion plot produced by this cell's text2.dispersion_plot call]", 276 | "text/plain": [ 277 | "
" 278 | ] 279 | }, 280 | "metadata": { 281 | "needs_background": "light" 282 | }, 283 | "output_type": "display_data" 284 | } 285 | ], 286 | "source": [ 287 | "%matplotlib inline \n", 288 | "\n", 289 | "text2.dispersion_plot([\"Elinor\", \"Marianne\", \"Edward\", \"Willoughby\"])" 290 | ] 291 | }, 292 | { 293 | "cell_type": "markdown", 294 | "metadata": {}, 295 | "source": [ 296 | "*I haven't read anything by Jane Austen, but it appears that the female characters dominate the novel. It also appears that Elinor and Edward are a couple, as it seems Marianne and Willoughby are.*" 297 | ] 298 | }, 299 | { 300 | "cell_type": "markdown", 301 | "metadata": {}, 302 | "source": [ 303 | "###### 7.\n", 304 | "\n", 305 | "☼ Find the collocations in text5." 306 | ] 307 | }, 308 | { 309 | "cell_type": "code", 310 | "execution_count": 36, 311 | "metadata": {}, 312 | "outputs": [ 313 | { 314 | "name": "stdout", 315 | "output_type": "stream", 316 | "text": [ 317 | "wanna chat; PART JOIN; MODE #14-19teens; JOIN PART; PART PART;\n", 318 | "cute.-ass MP3; MP3 player; JOIN JOIN; times .. .; ACTION watches; guys\n", 319 | "wanna; song lasts; last night; ACTION sits; -...)...- S.M.R.; Lime\n", 320 | "Player; Player 12%; dont know; lez gurls; long time\n" 321 | ] 322 | } 323 | ], 324 | "source": [ 325 | "text5.collocations()" 326 | ] 327 | }, 328 | { 329 | "cell_type": "markdown", 330 | "metadata": {}, 331 | "source": [ 332 | "###### 8. \n", 333 | "\n", 334 | "☼ Consider the following Python expression: `len(set(text4))`. State the purpose of this expression. Describe the two steps involved in performing this computation." 
335 | ] 336 | }, 337 | { 338 | "cell_type": "code", 339 | "execution_count": 38, 340 | "metadata": {}, 341 | "outputs": [ 342 | { 343 | "data": { 344 | "text/plain": [ 345 | "9913" 346 | ] 347 | }, 348 | "execution_count": 38, 349 | "metadata": {}, 350 | "output_type": "execute_result" 351 | } 352 | ], 353 | "source": [ 354 | "len(set(text4))" 355 | ] 356 | }, 357 | { 358 | "cell_type": "markdown", 359 | "metadata": {}, 360 | "source": [ 361 | "*This expression returns the number of types (unique strings) in `text4`. `set` compiles all the unique strings in `text4`, and `len` returns the number of items in that set.*" 362 | ] 363 | }, 364 | { 365 | "cell_type": "markdown", 366 | "metadata": {}, 367 | "source": [ 368 | "###### 9.\n", 369 | "\n", 370 | "☼ Review [2](https://www.nltk.org/book/ch01.html#sec-a-closer-look-at-python-texts-as-lists-of-words \"Lists and strings\") on lists and strings.\n", 371 | "\n", 372 | "+ a. Define a string and assign it to a variable, e.g.,` my_string = 'My String'` (but put something more interesting in the string). Print the contents of this variable in two ways, first by simply typing the variable name and pressing enter, then by using the `print` statement.\n", 373 | "\n", 374 | "+ b. Try adding the string to itself using `my_string + my_string`, or multiplying it by a number, e.g., `my_string * 3`. Notice that the strings are joined together without any spaces. 
How could you fix this?\n" 375 | ] 376 | }, 377 | { 378 | "cell_type": "code", 379 | "execution_count": 39, 380 | "metadata": {}, 381 | "outputs": [ 382 | { 383 | "data": { 384 | "text/plain": [ 385 | "\"Something more interesting than 'My String'\"" 386 | ] 387 | }, 388 | "execution_count": 39, 389 | "metadata": {}, 390 | "output_type": "execute_result" 391 | } 392 | ], 393 | "source": [ 394 | "# a\n", 395 | "\n", 396 | "my_string = \"Something more interesting than 'My String'\"\n", 397 | "\n", 398 | "my_string" 399 | ] 400 | }, 401 | { 402 | "cell_type": "code", 403 | "execution_count": 40, 404 | "metadata": {}, 405 | "outputs": [ 406 | { 407 | "name": "stdout", 408 | "output_type": "stream", 409 | "text": [ 410 | "Something more interesting than 'My String'\n" 411 | ] 412 | } 413 | ], 414 | "source": [ 415 | "print(my_string)" 416 | ] 417 | }, 418 | { 419 | "cell_type": "code", 420 | "execution_count": 41, 421 | "metadata": {}, 422 | "outputs": [ 423 | { 424 | "data": { 425 | "text/plain": [ 426 | "\"Something more interesting than 'My String'Something more interesting than 'My String'Something more interesting than 'My String'\"" 427 | ] 428 | }, 429 | "execution_count": 41, 430 | "metadata": {}, 431 | "output_type": "execute_result" 432 | } 433 | ], 434 | "source": [ 435 | "# b\n", 436 | "\n", 437 | "my_string * 3" 438 | ] 439 | }, 440 | { 441 | "cell_type": "code", 442 | "execution_count": 44, 443 | "metadata": {}, 444 | "outputs": [ 445 | { 446 | "name": "stdout", 447 | "output_type": "stream", 448 | "text": [ 449 | "Something more interesting than 'My String' Something more interesting than 'My String' Something more interesting than 'My String' \n" 450 | ] 451 | } 452 | ], 453 | "source": [ 454 | "print((my_string + \" \") * 3)" 455 | ] 456 | }, 457 | { 458 | "cell_type": "code", 459 | "execution_count": 49, 460 | "metadata": {}, 461 | "outputs": [ 462 | { 463 | "name": "stdout", 464 | "output_type": "stream", 465 | "text": [ 466 | "Something more 
interesting than 'My String' Something more interesting than 'My String' Something more interesting than 'My String'\n" 467 | ] 468 | } 469 | ], 470 | "source": [ 471 | "print(' '.join([my_string] * 3))" 472 | ] 473 | }, 474 | { 475 | "cell_type": "markdown", 476 | "metadata": {}, 477 | "source": [ 478 | "###### 10.\n", 479 | "\n", 480 | "☼ Define a variable my_sent to be a list of words, using the syntax `my_sent = [\"My\", \"sent\"]` (but with your own words, or a favorite saying).\n", 481 | "\n", 482 | "+ a. Use `' '.join(my_sent)` to convert this into a string.\n", 483 | " \n", 484 | "+ b. Use `split()` to split the string back into the list form you had to start with." 485 | ] 486 | }, 487 | { 488 | "cell_type": "code", 489 | "execution_count": 52, 490 | "metadata": {}, 491 | "outputs": [ 492 | { 493 | "data": { 494 | "text/plain": [ 495 | "\"You can lead a horse to water but you can't saddle a duck.\"" 496 | ] 497 | }, 498 | "execution_count": 52, 499 | "metadata": {}, 500 | "output_type": "execute_result" 501 | } 502 | ], 503 | "source": [ 504 | "my_sent = [\"You\", \"can\", \"lead\", \"a\", \"horse\", \"to\", \"water\", \"but\", \n", 505 | " \"you\", \"can't\", \"saddle\", \"a\", \"duck.\"]\n", 506 | "\n", 507 | "my_string = ' '.join(my_sent)\n", 508 | "my_string" 509 | ] 510 | }, 511 | { 512 | "cell_type": "code", 513 | "execution_count": 58, 514 | "metadata": {}, 515 | "outputs": [ 516 | { 517 | "name": "stdout", 518 | "output_type": "stream", 519 | "text": [ 520 | "['You', 'can', 'lead', 'a', 'horse', 'to', 'water', 'but', 'you', \"can't\", 'saddle', 'a', 'duck.'] " 521 | ] 522 | } 523 | ], 524 | "source": [ 525 | "print(my_string.split(), end = ' ')" 526 | ] 527 | }, 528 | { 529 | "cell_type": "markdown", 530 | "metadata": {}, 531 | "source": [ 532 | "###### 11.\n", 533 | "\n", 534 | "☼ Define several variables containing lists of words, e.g., `phrase1`, `phrase2`, and so on. 
Join them together in various combinations (using the plus operator) to form whole sentences. What is the relationship between `len(phrase1 + phrase2)` and `len(phrase1) + len(phrase2)`?" 535 | ] 536 | }, 537 | { 538 | "cell_type": "code", 539 | "execution_count": 60, 540 | "metadata": {}, 541 | "outputs": [ 542 | { 543 | "name": "stdout", 544 | "output_type": "stream", 545 | "text": [ 546 | "I once shot an elephant in my pajamas. What he was doing in my pajamas I don't know.\n" 547 | ] 548 | } 549 | ], 550 | "source": [ 551 | "phrase1 = [\"I\", \"once\", \"shot\", \"an\", \"elephant\", \"in\", \"my\", \"pajamas.\"]\n", 552 | "phrase2 = [\"What\", \"he\", \"was\", \"doing\", \"in\", \"my\", \"pajamas\", \n", 553 | " \"I\", \"don't\", \"know.\"]\n", 554 | "\n", 555 | "print(' '.join(phrase1 + phrase2))" 556 | ] 557 | }, 558 | { 559 | "cell_type": "code", 560 | "execution_count": 61, 561 | "metadata": {}, 562 | "outputs": [ 563 | { 564 | "data": { 565 | "text/plain": [ 566 | "18" 567 | ] 568 | }, 569 | "execution_count": 61, 570 | "metadata": {}, 571 | "output_type": "execute_result" 572 | } 573 | ], 574 | "source": [ 575 | "len(phrase1 + phrase2)" 576 | ] 577 | }, 578 | { 579 | "cell_type": "code", 580 | "execution_count": 62, 581 | "metadata": {}, 582 | "outputs": [ 583 | { 584 | "data": { 585 | "text/plain": [ 586 | "18" 587 | ] 588 | }, 589 | "execution_count": 62, 590 | "metadata": {}, 591 | "output_type": "execute_result" 592 | } 593 | ], 594 | "source": [ 595 | "len(phrase1) + len(phrase2)" 596 | ] 597 | }, 598 | { 599 | "cell_type": "markdown", 600 | "metadata": {}, 601 | "source": [ 602 | "*The two values are equal: `+` concatenates the lists and keeps every element, so the length of the combined list is the sum of the individual lengths.*" 603 | ] 604 | }, 605 | { 606 | "cell_type": "markdown", 607 | "metadata": {}, 608 | "source": [ 609 | "###### 12. \n", 610 | "\n", 611 | "☼ Consider the following two expressions, which have the same value. Which one will typically be more relevant in NLP? Why?\n", 612 | "\n", 613 | "+ a. 
`\"Monty Python\"[6:12]`\n", 614 | " \n", 615 | "+ b. `[\"Monty\", \"Python\"][1]`\n", 616 | " \n", 617 | "*List indexing will surely be more relevant than string indexing in NLP. We're much more likely to be dealing with long lists of individual words than long strings of individual characters.*\n", 618 | "\n", 619 | "###### 13. \n", 620 | "\n", 621 | "☼ We have seen how to represent a sentence as a list of words, where each word is a sequence of characters. What does `sent1[2][2]` do? Why? Experiment with other index values." 622 | ] 623 | }, 624 | { 625 | "cell_type": "code", 626 | "execution_count": 63, 627 | "metadata": {}, 628 | "outputs": [ 629 | { 630 | "data": { 631 | "text/plain": [ 632 | "'h'" 633 | ] 634 | }, 635 | "execution_count": 63, 636 | "metadata": {}, 637 | "output_type": "execute_result" 638 | } 639 | ], 640 | "source": [ 641 | "sent1[2][2]" 642 | ] 643 | }, 644 | { 645 | "cell_type": "markdown", 646 | "metadata": {}, 647 | "source": [ 648 | "`sent1[2][2]` returns the third letter of the third word in `sent1`." 649 | ] 650 | }, 651 | { 652 | "cell_type": "code", 653 | "execution_count": 65, 654 | "metadata": {}, 655 | "outputs": [ 656 | { 657 | "data": { 658 | "text/plain": [ 659 | "'e'" 660 | ] 661 | }, 662 | "execution_count": 65, 663 | "metadata": {}, 664 | "output_type": "execute_result" 665 | } 666 | ], 667 | "source": [ 668 | "sent9[-2][-2]" 669 | ] 670 | }, 671 | { 672 | "cell_type": "markdown", 673 | "metadata": {}, 674 | "source": [ 675 | "###### 14. \n", 676 | "\n", 677 | "☼ The first sentence of `text3` is provided to you in the variable `sent3`. The index of *the* in `sent3` is 1, because `sent3[1]` gives us `'the'`. What are the indexes of the two other occurrences of this word in `sent3`?\n", 678 | "\n", 679 | "*There's no easy way to do this using only those methods introduced so far in the book, since `index` will only return the first instance of an item. 
We could cheat and just print the sentence, manually counting the indices of the instances:*" 680 | ] 681 | }, 682 | { 683 | "cell_type": "code", 684 | "execution_count": 82, 685 | "metadata": {}, 686 | "outputs": [ 687 | { 688 | "name": "stdout", 689 | "output_type": "stream", 690 | "text": [ 691 | "['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.'] " 692 | ] 693 | } 694 | ], 695 | "source": [ 696 | "print(sent3, end = ' ')" 697 | ] 698 | }, 699 | { 700 | "cell_type": "markdown", 701 | "metadata": {}, 702 | "source": [ 703 | "*But I highly doubt that's what the authors had in mind. Instead, we can take advantage of the fact that `index` allows us to specify as arguments where the search should begin and end. Then we can use a `for`-loop and a conditional to iteratively begin the search just after the point where the previous instance of `the` was found:*" 704 | ] 705 | }, 706 | { 707 | "cell_type": "code", 708 | "execution_count": 84, 709 | "metadata": {}, 710 | "outputs": [ 711 | { 712 | "name": "stdout", 713 | "output_type": "stream", 714 | "text": [ 715 | "1\n", 716 | "5\n", 717 | "8\n" 718 | ] 719 | } 720 | ], 721 | "source": [ 722 | "i = 0\n", 723 | "for w in sent3:\n", 724 | " if w == 'the':\n", 725 | " i = sent3.index(w, i, -1)\n", 726 | " print(i)\n", 727 | " i += 1" 728 | ] 729 | }, 730 | { 731 | "cell_type": "markdown", 732 | "metadata": {}, 733 | "source": [ 734 | "*I don't believe that augmented assignment operators have been covered in the book yet. If you haven't seen them before, `i += 1` just adds one to the current value of `i`. 
It's basically a shortcut for `i = i + 1`.*\n", 735 | "\n", 736 | "*For what it's worth, we don't need to specify the ending point of the search. Leaving it out is actually safer, since an end of `-1` stops the search before the last item of the list:*" 737 | ] 738 | }, 739 | { 740 | "cell_type": "code", 741 | "execution_count": 85, 742 | "metadata": {}, 743 | "outputs": [ 744 | { 745 | "name": "stdout", 746 | "output_type": "stream", 747 | "text": [ 748 | "1\n", 749 | "5\n", 750 | "8\n" 751 | ] 752 | } 753 | ], 754 | "source": [ 755 | "i = 0\n", 756 | "for w in sent3:\n", 757 | " if w == 'the':\n", 758 | " i = sent3.index(w, i)\n", 759 | " print(i)\n", 760 | " i += 1" 761 | ] 762 | }, 763 | { 764 | "cell_type": "markdown", 765 | "metadata": {}, 766 | "source": [ 767 | "###### 15. \n", 768 | "\n", 769 | "☼ Review the discussion of conditionals in 4. Find all words in the Chat Corpus (`text5`) starting with the letter *b*. Show them in alphabetical order.\n", 770 | "\n", 771 | "*We need to find words that begin with either `b` or `B`, so we'll use the tuple `('b', 'B')` with `.startswith()` to make sure matches for both characters are returned. 
Also, `sorted` will always place words beginning with uppercase letters before those starting with lowercase ones, so to deactivate this behavior we have to add the argument `key = str.casefold`.*" 772 | ] 773 | }, 774 | { 775 | "cell_type": "code", 776 | "execution_count": 95, 777 | "metadata": {}, 778 | "outputs": [ 779 | { 780 | "name": "stdout", 781 | "output_type": "stream", 782 | "text": [ 783 | "['B', 'b', 'b-day', 'b/c', 'b4', 'BAAAAALLLLLLLLIIIIIIINNNNNNNNNNN', 'babay', 'babble', 'babblein', 'babe', 'babes', 'babi', 'babies', 'babiess', 'baby', 'babycakeses', 'bachelorette', 'back', 'Back', 'backatchya', 'backfrontsidewaysandallaroundtheworld', 'backroom', 'backup', 'bacl', 'bad', 'bag', 'bagel', 'bagels', 'bahahahaa', 'bak', 'baked', 'balad', 'balance', 'balck', 'ball', 'ballin', 'balls', 'ban', 'band', 'bandito', 'bandsaw', 'banjoes', 'banned', 'baord', 'bar', 'barbie', 'Barbieee', 'bare', 'barely', 'bares', 'barfights', 'barks', 'barn', 'Barometer', 'barrel', 'base', 'bases', 'basically', 'basket', 'battery', 'bay', 'bbbbbyyyyyyyeeeeeeeee', 'bbiam', 'bbl', 'bbs', 'bc', 'BE', 'be', 'Beach', 'beach', 'beachhhh', 'beam', 'beams', 'beanbag', 'beans', 'bear', 'bears', 'beat', 'beaten', 'beatles', 'beats', 'beattles', 'beautiful', 'because', 'Because', 'beckley', 'become', 'bed', 'bedford', 'bedroom', 'beeeeehave', 'beeehave', 'Been', 'been', 'beer', 'before', 'beg', 'begin', 'behave', 'behind', 'bein', 'being', 'beleive', 'believe', 'belive', 'bell', 'belly', 'belong', 'belongings', 'ben', 'Ben', 'bend', 'Benjamin', 'benz', 'bes', 'beside', 'besides', 'best', 'bet', 'betrayal', 'betta', 'better', 'Better', 'between', 'beuty', 'bf', 'bi', 'biatch', 'Bible', 'bible', 'biebsa', 'bied', 'big', 'BIG', 'bigest', 'biggest', 'biiiatch', 'Biiiiiitch', 'bike', 'bikes', 'bikini', 'bio', 'Biographys', 'bird', 'Birdgang', 'birfday', 'birthday', 'bisexual', 'bishes', 'bit', 'bitch', 'bitches', 'bitdh', 'bite', 'bites', 'biyatch', 'biz', 'bj', 'black', 'blade', 'blah', 
'blank', 'blankie', 'blazed', 'bleach', 'blech', 'bless', 'blessings', 'blew', 'blind', 'blinks', 'bliss', 'blocking', 'bloe', 'BLONDES', 'blood', 'blooded', 'bloody', 'Bloooooooood', 'Bloooooooooood', 'Bloooooooooooood', 'blow', 'blowing', 'blowjob', 'blowup', 'blue', 'blueberry', 'bluer', 'blues', 'blunt', 'board', 'bob', 'bodies', 'body', 'boed', 'boght', 'boi', 'boing', 'boinked', 'bois', 'bomb', 'bone', 'Bone', 'boned', 'bones', 'bong', 'boning', 'Bonus', 'bonus', 'boo', 'booboo', 'boobs', 'book', 'Books', 'boom', 'Boone', 'boooooooooooglyyyyyy', 'boost', 'boot', 'bootay', 'booted', 'boots', 'BOOTS', 'booty', 'BOOTY', 'Booyah', 'Borat', 'border', 'borderline', 'bored', 'boredom', 'boring', 'Born', 'born', 'born-again', 'bosom', 'boss', 'bossy', 'bot', 'both', 'bother', 'bothering', 'bottle', 'bought', 'bounced', 'bouncer', 'bouncers', 'bound', 'bout', 'bouts', 'bow', 'bowl', 'box', 'Box', 'boy', 'BOY', 'boyfriend', 'boys', 'Boyz', 'bra', 'brad', 'brady', 'brain', 'brakes', 'brass', 'brat', 'brb', 'brbbb', 'bread', 'Break', 'break', 'Breaking', 'breaks', 'breath', 'breathe', 'bred', 'breeding', 'bright', 'brightened', 'bring', 'brings', 'bro', 'broke', 'Broken', 'brooklyn', 'brother', 'brothers', 'brought', 'brown', 'brrrrrrr', 'bruises', 'brunswick', 'brwn', 'btw', 'bucks', 'Bud', 'buddyyyyyy', 'buff', 'buffalo', 'bug', 'bugs', 'buh', 'build', 'builds', 'built', 'bull', 'bulls', 'bum', 'bumber', 'bummer', 'bumped', 'bumper', 'bunch', 'bunny', 'Burger', 'burger', 'burito', 'burned', 'burns', 'burp', 'burpin', 'burps', 'burried', 'burryed', 'bus', 'buses', 'bust', 'busted', 'busy', 'BUt', 'But', 'but', 'BUT', 'butt', 'butter', 'butterscotch', 'button', 'buttons', 'buy', 'buying', 'bwahahahahahahahahahaha', 'Bwhaha', 'by', 'byb', 'bye', 'BYE', 'Bye', 'byeee', 'byeeee', 'byeeeeeeee', 'byeeeeeeeeeeeee', 'byes']\n" 784 | ] 785 | } 786 | ], 787 | "source": [ 788 | "\n", 789 | "print(sorted(set([w for w in text5 if w.startswith(('b', 'B'))]), key = str.casefold))" 790 
| ] 791 | }, 792 | { 793 | "cell_type": "markdown", 794 | "metadata": {}, 795 | "source": [ 796 | "16. ☼ Type the expression `list(range(10))` at the interpreter prompt. Now try `list(range(10, 20))`, `list(range(10, 20, 2))`, and `list(range(20, 10, -2))`. We will see a variety of uses for this built-in function in later chapters." 797 | ] 798 | }, 799 | { 800 | "cell_type": "code", 801 | "execution_count": 96, 802 | "metadata": {}, 803 | "outputs": [ 804 | { 805 | "data": { 806 | "text/plain": [ 807 | "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]" 808 | ] 809 | }, 810 | "execution_count": 96, 811 | "metadata": {}, 812 | "output_type": "execute_result" 813 | } 814 | ], 815 | "source": [ 816 | "list(range(10))" 817 | ] 818 | }, 819 | { 820 | "cell_type": "code", 821 | "execution_count": 97, 822 | "metadata": {}, 823 | "outputs": [ 824 | { 825 | "data": { 826 | "text/plain": [ 827 | "[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]" 828 | ] 829 | }, 830 | "execution_count": 97, 831 | "metadata": {}, 832 | "output_type": "execute_result" 833 | } 834 | ], 835 | "source": [ 836 | "list(range(10, 20))" 837 | ] 838 | }, 839 | { 840 | "cell_type": "code", 841 | "execution_count": 98, 842 | "metadata": {}, 843 | "outputs": [ 844 | { 845 | "data": { 846 | "text/plain": [ 847 | "[10, 12, 14, 16, 18]" 848 | ] 849 | }, 850 | "execution_count": 98, 851 | "metadata": {}, 852 | "output_type": "execute_result" 853 | } 854 | ], 855 | "source": [ 856 | "list(range(10, 20, 2))" 857 | ] 858 | }, 859 | { 860 | "cell_type": "code", 861 | "execution_count": 99, 862 | "metadata": {}, 863 | "outputs": [ 864 | { 865 | "data": { 866 | "text/plain": [ 867 | "[20, 18, 16, 14, 12]" 868 | ] 869 | }, 870 | "execution_count": 99, 871 | "metadata": {}, 872 | "output_type": "execute_result" 873 | } 874 | ], 875 | "source": [ 876 | "list(range(20, 10, -2))" 877 | ] 878 | }, 879 | { 880 | "cell_type": "markdown", 881 | "metadata": {}, 882 | "source": [ 883 | "###### 17. 
\n", 884 | "\n", 885 | "◑ Use `text9.index()` to find the index of the word *sunset*. You'll need to insert this word as an argument between the parentheses. By a process of trial and error, find the slice for the complete sentence that contains this word." 886 | ] 887 | }, 888 | { 889 | "cell_type": "code", 890 | "execution_count": 100, 891 | "metadata": {}, 892 | "outputs": [ 893 | { 894 | "data": { 895 | "text/plain": [ 896 | "629" 897 | ] 898 | }, 899 | "execution_count": 100, 900 | "metadata": {}, 901 | "output_type": "execute_result" 902 | } 903 | ], 904 | "source": [ 905 | "text9.index('sunset')" 906 | ] 907 | }, 908 | { 909 | "cell_type": "markdown", 910 | "metadata": {}, 911 | "source": [ 912 | "*We __could__ just use trial and error, but we could also use `for`-loops and conditionals to find the periods preceding and following the word `sunset`.*" 913 | ] 914 | }, 915 | { 916 | "cell_type": "code", 917 | "execution_count": 118, 918 | "metadata": {}, 919 | "outputs": [ 920 | { 921 | "name": "stdout", 922 | "output_type": "stream", 923 | "text": [ 924 | "612\n" 925 | ] 926 | } 927 | ], 928 | "source": [ 929 | "for i in range(629, 600, -1):\n", 930 | " if text9[i] == '.':\n", 931 | " print(i)\n", 932 | " break" 933 | ] 934 | }, 935 | { 936 | "cell_type": "code", 937 | "execution_count": 119, 938 | "metadata": {}, 939 | "outputs": [ 940 | { 941 | "name": "stdout", 942 | "output_type": "stream", 943 | "text": [ 944 | "643\n" 945 | ] 946 | } 947 | ], 948 | "source": [ 949 | "for i in range(630, 650):\n", 950 | " if text9[i] == '.':\n", 951 | " print(i)\n", 952 | " break" 953 | ] 954 | }, 955 | { 956 | "cell_type": "markdown", 957 | "metadata": {}, 958 | "source": [ 959 | "*Simply printing the slice won't give us anything very legible:*" 960 | ] 961 | }, 962 | { 963 | "cell_type": "code", 964 | "execution_count": 133, 965 | "metadata": {}, 966 | "outputs": [ 967 | { 968 | "name": "stdout", 969 | "output_type": "stream", 970 | "text": [ 971 | "['CHAPTER', 'I', 
'THE', 'TWO', 'POETS', 'OF', 'SAFFRON', 'PARK', 'THE', 'suburb', 'of', 'Saffron', 'Park', 'lay', 'on', 'the', 'sunset', 'side', 'of', 'London', ',', 'as', 'red', 'and', 'ragged', 'as', 'a', 'cloud', 'of', 'sunset', '.'] " 972 | ] 973 | } 974 | ], 975 | "source": [ 976 | "print(text9[613:644], end = ' ')" 977 | ] 978 | }, 979 | { 980 | "cell_type": "markdown", 981 | "metadata": {}, 982 | "source": [ 983 | "*But using `' '.join()` will add spaces before the punctuation:*" 984 | ] 985 | }, 986 | { 987 | "cell_type": "code", 988 | "execution_count": 136, 989 | "metadata": {}, 990 | "outputs": [ 991 | { 992 | "data": { 993 | "text/plain": [ 994 | "'CHAPTER I THE TWO POETS OF SAFFRON PARK THE suburb of Saffron Park lay on the sunset side of London , as red and ragged as a cloud of sunset .'" 995 | ] 996 | }, 997 | "execution_count": 136, 998 | "metadata": {}, 999 | "output_type": "execute_result" 1000 | } 1001 | ], 1002 | "source": [ 1003 | "' '.join(text9[613:644])" 1004 | ] 1005 | }, 1006 | { 1007 | "cell_type": "markdown", 1008 | "metadata": {}, 1009 | "source": [ 1010 | "*Which we can fix with manual concatenation:*" 1011 | ] 1012 | }, 1013 | { 1014 | "cell_type": "code", 1015 | "execution_count": 131, 1016 | "metadata": {}, 1017 | "outputs": [ 1018 | { 1019 | "data": { 1020 | "text/plain": [ 1021 | "'CHAPTER I THE TWO POETS OF SAFFRON PARK THE suburb of Saffron Park lay on the sunset side of London, as red and ragged as a cloud of sunset.'" 1022 | ] 1023 | }, 1024 | "execution_count": 131, 1025 | "metadata": {}, 1026 | "output_type": "execute_result" 1027 | } 1028 | ], 1029 | "source": [ 1030 | "' '.join(text9[613:633]) + text9[633] + ' ' + ' '.join(text9[634:643]) + text9[643]" 1031 | ] 1032 | }, 1033 | { 1034 | "cell_type": "markdown", 1035 | "metadata": {}, 1036 | "source": [ 1037 | "*Or, with a `for`-loop and a conditional that could be generalized for other sentences:* " 1038 | ] 1039 | }, 1040 | { 1041 | "cell_type": "code", 1042 | "execution_count": 135, 1043 
| "metadata": {}, 1044 | "outputs": [ 1045 | { 1046 | "name": "stdout", 1047 | "output_type": "stream", 1048 | "text": [ 1049 | "CHAPTER I THE TWO POETS OF SAFFRON PARK THE suburb of Saffron Park lay on the sunset side of London, as red and ragged as a cloud of sunset. \n" 1050 | ] 1051 | } 1052 | ], 1053 | "source": [ 1054 | "full_sent = \"\"\n", 1055 | "alpha_text = []\n", 1056 | "\n", 1057 | "for i in range(613, 644):\n", 1058 | " \n", 1059 | " if text9[i].isalpha():\n", 1060 | " alpha_text.append(text9[i])\n", 1061 | " else:\n", 1062 | " full_sent += (' '.join(alpha_text) + text9[i] + ' ')\n", 1063 | " alpha_text = []\n", 1064 | " \n", 1065 | "print(full_sent)" 1066 | ] 1067 | }, 1068 | { 1069 | "cell_type": "markdown", 1070 | "metadata": {}, 1071 | "source": [ 1072 | "*I don't think `list.append` has been used in the book yet, but it's just a method that adds a single item to the end of a list. You might think `alpha_text += text9[i]` would do the same thing, but `+=` on a list extends it with each element of the right-hand side, and the elements of a string are its individual characters. Observe:*" 1073 | ] 1074 | }, 1075 | { 1076 | "cell_type": "code", 1077 | "execution_count": 334, 1078 | "metadata": {}, 1079 | "outputs": [ 1080 | { 1081 | "name": "stdout", 1082 | "output_type": "stream", 1083 | "text": [ 1084 | "C H A P T E R I T H E T W O P O E T S O F S A F F R O N P A R K T H E s u b u r b o f S a f f r o n P a r k l a y o n t h e s u n s e t s i d e o f L o n d o n, a s r e d a n d r a g g e d a s a c l o u d o f s u n s e t. 
\n" 1085 | ] 1086 | } 1087 | ], 1088 | "source": [ 1089 | "full_sent = \"\"\n", 1090 | "alpha_text = []\n", 1091 | "\n", 1092 | "for i in range(613, 644):\n", 1093 | " \n", 1094 | " if text9[i].isalpha():\n", 1095 | " alpha_text += text9[i]\n", 1096 | " else:\n", 1097 | " full_sent += (' '.join(alpha_text) + text9[i] + ' ')\n", 1098 | " alpha_text = []\n", 1099 | " \n", 1100 | "print(full_sent)" 1101 | ] 1102 | }, 1103 | { 1104 | "cell_type": "markdown", 1105 | "metadata": {}, 1106 | "source": [ 1107 | "###### 18.\n", 1108 | "\n", 1109 | "◑ Using list addition, and the `set` and `sorted` operations, compute the vocabulary of the sentences `sent1` ... `sent8`." 1110 | ] 1111 | }, 1112 | { 1113 | "cell_type": "code", 1114 | "execution_count": 145, 1115 | "metadata": {}, 1116 | "outputs": [ 1117 | { 1118 | "data": { 1119 | "text/plain": [ 1120 | "75" 1121 | ] 1122 | }, 1123 | "execution_count": 145, 1124 | "metadata": {}, 1125 | "output_type": "execute_result" 1126 | } 1127 | ], 1128 | "source": [ 1129 | "len(set(sent1 + sent2 + sent3 + sent4 + sent5 + sent6 + sent7 + sent8))\n" 1130 | ] 1131 | }, 1132 | { 1133 | "cell_type": "markdown", 1134 | "metadata": {}, 1135 | "source": [ 1136 | "*Using `sorted` makes no difference to the number of types:*" 1137 | ] 1138 | }, 1139 | { 1140 | "cell_type": "code", 1141 | "execution_count": 146, 1142 | "metadata": {}, 1143 | "outputs": [ 1144 | { 1145 | "data": { 1146 | "text/plain": [ 1147 | "75" 1148 | ] 1149 | }, 1150 | "execution_count": 146, 1151 | "metadata": {}, 1152 | "output_type": "execute_result" 1153 | } 1154 | ], 1155 | "source": [ 1156 | "len(sorted(set(sent1 + sent2 + sent3 + sent4 + sent5 + sent6 + sent7 + sent8)))\n" 1157 | ] 1158 | }, 1159 | { 1160 | "cell_type": "markdown", 1161 | "metadata": {}, 1162 | "source": [ 1163 | "###### 19.\n", 1164 | "\n", 1165 | "◑ What is the difference between the following two lines? Which one will give a larger value? 
Will this be the case for other texts?\n", 1166 | "\n", 1167 | "`sorted(set(w.lower() for w in text1))`\n", 1168 | "`sorted(w.lower() for w in set(text1))`" 1169 | ] 1170 | }, 1171 | { 1172 | "cell_type": "markdown", 1173 | "metadata": {}, 1174 | "source": [ 1175 | "*The second line will give a larger value. It will treat lower- and uppercase versions of the same word as distinct strings and include both of them in the set. The conversion to lowercase comes after this, so we'll ultimately end up with two lowercase versions of the same word. This will be easier to observe with a toy example:*" 1176 | ] 1177 | }, 1178 | { 1179 | "cell_type": "code", 1180 | "execution_count": 152, 1181 | "metadata": {}, 1182 | "outputs": [ 1183 | { 1184 | "data": { 1185 | "text/plain": [ 1186 | "['aye', 'captain']" 1187 | ] 1188 | }, 1189 | "execution_count": 152, 1190 | "metadata": {}, 1191 | "output_type": "execute_result" 1192 | } 1193 | ], 1194 | "source": [ 1195 | "test = [\"Aye\", \"aye\", \"captain\"]\n", 1196 | "sorted(set(w.lower() for w in test))" 1197 | ] 1198 | }, 1199 | { 1200 | "cell_type": "code", 1201 | "execution_count": 153, 1202 | "metadata": {}, 1203 | "outputs": [ 1204 | { 1205 | "data": { 1206 | "text/plain": [ 1207 | "['aye', 'aye', 'captain']" 1208 | ] 1209 | }, 1210 | "execution_count": 153, 1211 | "metadata": {}, 1212 | "output_type": "execute_result" 1213 | } 1214 | ], 1215 | "source": [ 1216 | "sorted(w.lower() for w in set(test))" 1217 | ] 1218 | }, 1219 | { 1220 | "cell_type": "markdown", 1221 | "metadata": {}, 1222 | "source": [ 1223 | "*Trying it with `text1`:*" 1224 | ] 1225 | }, 1226 | { 1227 | "cell_type": "code", 1228 | "execution_count": 155, 1229 | "metadata": {}, 1230 | "outputs": [ 1231 | { 1232 | "data": { 1233 | "text/plain": [ 1234 | "17231" 1235 | ] 1236 | }, 1237 | "execution_count": 155, 1238 | "metadata": {}, 1239 | "output_type": "execute_result" 1240 | } 1241 | ], 1242 | "source": [ 1243 | "len(sorted(set(w.lower() for w in text1)))" 
1244 | ] 1245 | }, 1246 | { 1247 | "cell_type": "code", 1248 | "execution_count": 156, 1249 | "metadata": {}, 1250 | "outputs": [ 1251 | { 1252 | "data": { 1253 | "text/plain": [ 1254 | "19317" 1255 | ] 1256 | }, 1257 | "execution_count": 156, 1258 | "metadata": {}, 1259 | "output_type": "execute_result" 1260 | } 1261 | ], 1262 | "source": [ 1263 | "len(sorted(w.lower() for w in set(text1)))" 1264 | ] 1265 | }, 1266 | { 1267 | "cell_type": "markdown", 1268 | "metadata": {}, 1269 | "source": [ 1270 | "*For shits & giggles, trying it again with `text2`:*" 1271 | ] 1272 | }, 1273 | { 1274 | "cell_type": "code", 1275 | "execution_count": 157, 1276 | "metadata": {}, 1277 | "outputs": [ 1278 | { 1279 | "data": { 1280 | "text/plain": [ 1281 | "6403" 1282 | ] 1283 | }, 1284 | "execution_count": 157, 1285 | "metadata": {}, 1286 | "output_type": "execute_result" 1287 | } 1288 | ], 1289 | "source": [ 1290 | "len(sorted(set(w.lower() for w in text2)))" 1291 | ] 1292 | }, 1293 | { 1294 | "cell_type": "code", 1295 | "execution_count": 158, 1296 | "metadata": {}, 1297 | "outputs": [ 1298 | { 1299 | "data": { 1300 | "text/plain": [ 1301 | "6833" 1302 | ] 1303 | }, 1304 | "execution_count": 158, 1305 | "metadata": {}, 1306 | "output_type": "execute_result" 1307 | } 1308 | ], 1309 | "source": [ 1310 | "len(sorted(w.lower() for w in set(text2)))" 1311 | ] 1312 | }, 1313 | { 1314 | "cell_type": "markdown", 1315 | "metadata": {}, 1316 | "source": [ 1317 | "*The same would be true for any text in which the same word occurs with more than one capitalization.*" 1318 | ] 1319 | }, 1320 | { 1321 | "cell_type": "markdown", 1322 | "metadata": {}, 1323 | "source": [ 1324 | "###### 20.\n", 1325 | "\n", 1326 | "◑ What is the difference between the following two tests: `w.isupper()` and `not w.islower()`?\n", 1327 | "\n", 1328 | "*`w.isupper()` is `True` only when `w` contains at least one cased character and every cased character is uppercase. `not w.islower()` is also `True` for strings with no cased characters at all (punctuation, digits) and for mixed-case strings such as `'Title'`:*" 1329 | ] 1330 | }, 1331 | { 1332 | "cell_type": "code", 1333 | "execution_count": 165, 1334 | 
"metadata": {}, 1335 | "outputs": [ 1336 | { 1337 | "data": { 1338 | "text/plain": [ 1339 | "True" 1340 | ] 1341 | }, 1342 | "execution_count": 165, 1343 | "metadata": {}, 1344 | "output_type": "execute_result" 1345 | } 1346 | ], 1347 | "source": [ 1348 | "x = 'X'\n", 1349 | "x.isupper()" 1350 | ] 1351 | }, 1352 | { 1353 | "cell_type": "code", 1354 | "execution_count": 166, 1355 | "metadata": {}, 1356 | "outputs": [ 1357 | { 1358 | "data": { 1359 | "text/plain": [ 1360 | "True" 1361 | ] 1362 | }, 1363 | "execution_count": 166, 1364 | "metadata": {}, 1365 | "output_type": "execute_result" 1366 | } 1367 | ], 1368 | "source": [ 1369 | "not x.islower()" 1370 | ] 1371 | }, 1372 | { 1373 | "cell_type": "code", 1374 | "execution_count": 167, 1375 | "metadata": {}, 1376 | "outputs": [ 1377 | { 1378 | "data": { 1379 | "text/plain": [ 1380 | "False" 1381 | ] 1382 | }, 1383 | "execution_count": 167, 1384 | "metadata": {}, 1385 | "output_type": "execute_result" 1386 | } 1387 | ], 1388 | "source": [ 1389 | "y = '!'\n", 1390 | "y.isupper()" 1391 | ] 1392 | }, 1393 | { 1394 | "cell_type": "code", 1395 | "execution_count": 168, 1396 | "metadata": {}, 1397 | "outputs": [ 1398 | { 1399 | "data": { 1400 | "text/plain": [ 1401 | "True" 1402 | ] 1403 | }, 1404 | "execution_count": 168, 1405 | "metadata": {}, 1406 | "output_type": "execute_result" 1407 | } 1408 | ], 1409 | "source": [ 1410 | "not y.islower()" 1411 | ] 1412 | }, 1413 | { 1414 | "cell_type": "markdown", 1415 | "metadata": {}, 1416 | "source": [ 1417 | "###### 21.\n", 1418 | "\n", 1419 | "◑ Write the slice expression that extracts the last two words of `text2`." 
1420 | ] 1421 | }, 1422 | { 1423 | "cell_type": "code", 1424 | "execution_count": 169, 1425 | "metadata": {}, 1426 | "outputs": [ 1427 | { 1428 | "data": { 1429 | "text/plain": [ 1430 | "['THE', 'END']" 1431 | ] 1432 | }, 1433 | "execution_count": 169, 1434 | "metadata": {}, 1435 | "output_type": "execute_result" 1436 | } 1437 | ], 1438 | "source": [ 1439 | "text2[-2:]" 1440 | ] 1441 | }, 1442 | { 1443 | "cell_type": "markdown", 1444 | "metadata": {}, 1445 | "source": [ 1446 | "###### 22.\n", 1447 | "\n", 1448 | "◑ Find all the four-letter words in the Chat Corpus (`text5`). With the help of a frequency distribution (`FreqDist`), show these words in decreasing order of frequency.\n", 1449 | "\n", 1450 | "*There are too many words in the distribution for us to display:*" 1451 | ] 1452 | }, 1453 | { 1454 | "cell_type": "code", 1455 | "execution_count": 170, 1456 | "metadata": {}, 1457 | "outputs": [], 1458 | "source": [ 1459 | "four_letter_fd = FreqDist([w for w in text5 if len(w) == 4])" 1460 | ] 1461 | }, 1462 | { 1463 | "cell_type": "code", 1464 | "execution_count": 181, 1465 | "metadata": {}, 1466 | "outputs": [ 1467 | { 1468 | "data": { 1469 | "text/plain": [ 1470 | "1181" 1471 | ] 1472 | }, 1473 | "execution_count": 181, 1474 | "metadata": {}, 1475 | "output_type": "execute_result" 1476 | } 1477 | ], 1478 | "source": [ 1479 | "len(four_letter_fd)" 1480 | ] 1481 | }, 1482 | { 1483 | "cell_type": "markdown", 1484 | "metadata": {}, 1485 | "source": [ 1486 | "*So we'll just print the first hundred:*" 1487 | ] 1488 | }, 1489 | { 1490 | "cell_type": "code", 1491 | "execution_count": 180, 1492 | "metadata": {}, 1493 | "outputs": [ 1494 | { 1495 | "name": "stdout", 1496 | "output_type": "stream", 1497 | "text": [ 1498 | "[('JOIN', 1021), ('PART', 1016), ('that', 274), ('what', 183), ('here', 181), ('....', 170), ('have', 164), ('like', 156), ('with', 152), ('chat', 142), ('your', 137), ('good', 130), ('just', 125), ('lmao', 107), ('know', 103), ('room', 98), ('from', 
92), ('this', 86), ('well', 81), ('back', 78), ('hiya', 78), ('they', 77), ('dont', 75), ('yeah', 75), ('want', 71), ('love', 60), ('guys', 58), ('some', 58), ('been', 57), ('talk', 56), ('nice', 52), ('time', 50), ('when', 48), ('haha', 44), ('make', 44), ('girl', 43), ('need', 43), ('U122', 42), ('MODE', 41), ('will', 40), ('much', 40), ('then', 40), ('over', 39), ('work', 38), ('were', 38), ('take', 37), ('U121', 36), ('U115', 36), ('song', 36), ('even', 35), ('does', 35), ('seen', 35), ('U156', 35), ('U105', 35), ('more', 34), ('damn', 34), ('only', 33), ('come', 33), ('hell', 29), ('long', 28), ('them', 28), ('name', 27), ('tell', 27), ('away', 26), ('sure', 26), ('look', 26), ('baby', 26), ('call', 26), ('play', 25), ('U110', 25), ('U114', 25), ('NICK', 24), ('down', 24), ('cool', 24), ('sexy', 23), ('many', 23), ('hate', 23), ('said', 23), ('last', 22), ('ever', 22), ('hear', 21), ('life', 21), ('live', 20), ('feel', 19), ('very', 19), ('mean', 19), ('give', 19), ('same', 19), ('must', 19), ('stop', 19), ('LMAO', 19), ('!!!!', 18), ('hugs', 18), ('What', 18), ('find', 18), ('cant', 18), ('left', 17), ('????', 17), ('shit', 17), ('nite', 17)]" 1499 | ] 1500 | } 1501 | ], 1502 | "source": [ 1503 | "print(four_letter_fd.most_common(100), end = '')" 1504 | ] 1505 | }, 1506 | { 1507 | "cell_type": "markdown", 1508 | "metadata": {}, 1509 | "source": [ 1510 | "*However, it's probably advisable to ignore case:*" 1511 | ] 1512 | }, 1513 | { 1514 | "cell_type": "code", 1515 | "execution_count": 182, 1516 | "metadata": {}, 1517 | "outputs": [ 1518 | { 1519 | "name": "stdout", 1520 | "output_type": "stream", 1521 | "text": [ 1522 | "[('part', 1022), ('join', 1021), ('that', 284), ('what', 201), ('here', 185), ('have', 171), ('....', 170), ('like', 160), ('with', 154), ('chat', 146), ('your', 142), ('good', 132), ('lmao', 128), ('just', 128), ('know', 104), ('room', 103), ('this', 98), ('from', 96), ('well', 91), ('yeah', 85), ('hiya', 85), ('they', 84), ('back', 79), 
('dont', 77), ('want', 71), ('love', 63), ('guys', 59), ('some', 59), ('talk', 58), ('been', 58), ('nice', 54), ('when', 54), ('time', 52), ('haha', 46), ('girl', 45), ('make', 44), ('need', 44), ('will', 42), ('song', 42), ('mode', 42), ('u122', 42), ('then', 41), ('much', 40), ('over', 40), ('were', 39), ('take', 38), ('does', 38), ('work', 38), ('even', 37), ('u121', 36), ('u115', 36), ('only', 36), ('damn', 36), ('seen', 36), ('come', 36), ('more', 35), ('u156', 35), ('u105', 35), ('nick', 33), ('long', 30), ('hell', 29), ('last', 29), ('tell', 29), ('name', 28), ('away', 28), ('sure', 28), ('them', 28), ('look', 27), ('baby', 26), ('call', 26), ('down', 26), ('cool', 26), ('sexy', 25), ('play', 25), ('u110', 25), ('u114', 25), ('many', 23), ('hate', 23), ('said', 23), ('same', 23), ('live', 23), ('stop', 23), ('life', 22), ('ever', 22), ('hear', 21), ('very', 20), ('feel', 19), ('hugs', 19), ('mean', 19), ('give', 19), ('must', 19), ('!!!!', 18), ('find', 18), ('cant', 18), ('left', 17), ('fine', 17), ('lets', 17), ('????', 17), ('shit', 17), ('nite', 17)]" 1523 | ] 1524 | } 1525 | ], 1526 | "source": [ 1527 | "four_letter_fd = FreqDist(w.lower() for w in text5 if len(w) == 4)\n", 1528 | "print(four_letter_fd.most_common(100), end = '')" 1529 | ] 1530 | }, 1531 | { 1532 | "cell_type": "markdown", 1533 | "metadata": {}, 1534 | "source": [ 1535 | "*This would be easier to analyze as a plot:*" 1536 | ] 1537 | }, 1538 | { 1539 | "cell_type": "code", 1540 | "execution_count": 190, 1541 | "metadata": {}, 1542 | "outputs": [ 1543 | { 1544 | "data": { 1545 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYsAAAEYCAYAAACtEtpmAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nO2deZhcRbXAf2f2GbKTQCYQEggBEsKWCXtcAEVABJRFERER5fH0CcgTUVFR1Kc8Fx6igjxQWX2sAgn7EghbgJkkZGXJQvaQfZ2ZzHbeH6d6+k737dmXZPr8vq+/7q5bt6ruvXXrVJ1zqkpUFcdxHMdpjpyeLoDjOI6z8+PCwnEcx2kRFxaO4zhOi7iwcBzHcVrEhYXjOI7TInk9XYCuYPDgwTpy5Mh2n19VVUVxcXG3h/f2vLsjD8+7+/Pujjw8747n3RoqKirWqeqQ2IOq2us+ZWVl2hHKy8t7JLy3590deXjevTMPz7tzzmkJoFwztKuuhnIcx3FaxIWF4ziO0yIuLBzHcZwWcWHhOI7jtIgLC8dxHKdFXFg4juM4LeLCwnEcx2mRXjkpr73c++YS/vTiArS+lgGvTKUwL4fCvFwK83Mozs/luCE1lPV0IR3HcXoAFxYRNlXWsmpzNQCrt21NO75iSD5fO7W7S+U4jtPzdJmwEJG/AacDa1R1XAgbBNwPjAQ+BM5T1Y0iIsBNwGlAJfA1VZ0ezrkI+HFI9peqemdXlflrx43krCP2YvrMWex/4Biqa+vZUdfA+x9t5aePzaWy1jeKchwnO+lKm8U/gFNSwn4AvKCqo4EXwn+AU4HR4XMpcAs0CpfrgKOBo4DrRGRgVxV4t8I89hpQzLC+eYwp7ccR+wzkmP1257hRuwOwo86FheM42UmXCQtVnQpsSAk+E0iMDO4EzoqE3xWWJ5kGDBCRUuAzwHOqukFVNwLPkS6AupziAhuAVde7sHAcJzvpbm+oPVV1FUD43iOE7wUsi8RbHsIyhXcrJfm5ANT4yMJxnCxFbKHBLkpcZCQwOWKz2KSqAyLHN6rqQBF5Avi1qr4awl8Avg+cCBSq6i9D+E+ASlX9fUxel2IqLEpLS8smTZrU7nJXVlZSUlLS+H9HvfLlRz4iLwfuP3toi/HbG96Zae2MeXdHHp539+fdHXl43h3PuzVMmDChQlUnxB7MtBxtZ3wwQ/acyP/3gNLwuxR4L/z+K3B+ajzgfOCvkfAm8TJ9OnuJ8oaGBh35g8k64prJWltX32L89oZ3Zlo7Y97dkYfn3Tvz8Lw755yWYCdaovxx4KLw+yLgsUj4V8U4BtispqZ6BjhZRAYGw/bJIaxbEZFGVVRlbX13Z+84jtPjdKXr7D+BTwKDRWQ55tX0G+ABEbkEWAqcG6I/ibnNLsBcZy8GUNUNIvIL4O0Q73pVTTWadwvFBXlsr6mnuqaefkX5PVEEx3GcHqPLhIWqnp/h0EkxcRX4doZ0/gb8rROL1i5KCsLIosZHFo7jZB++NlQrKc53YeE4TvbiwqKVFIeRRVVtXQ+XxHEcp/txYdFKEmqoqpqGHi6J4zhO9+PCopUkbRY+snAcJ/twYdFKivITaii3WTiOk324sGgl7g3lOE4248KilZSExQSrXFg4jpOFuLBoJUlvKBcWjuNkHy4sWklynoUbuB3HyT5cWLQSt1k4jpPNuLBoJQk1VLWroRzHyUJcWLQSH1k4jpPNuLBoJcX55g3lwsJxnGzEhUUrafSGcmHhOE4W4sKilZS466zjOFmMC4tW4kuUO46TzbiwaCXJVWd9noXjONmHC4tWUuzeUI7jZDEuLFpJSb6vDeU4TvbiwqKV+NpQjuNkMy4sWklBXg65AnUNSk2d75bnOE524cKiDRTmCuCqKMdxsg8XFm2gMM+ERWWte0Q5jpNduLBoAwlh4SMLx3GyDRcWbaAoqKHcfdZxnGzDhUUbaBxZuEeU4zhZhguLNlDoIwvHcbI
UFxZtwG0WjuNkKy4s2kBRoxrKvaEcx8kuXFi0AVdDOY6TrbiwaAOuhnIcJ1txYdEGfAa34zjZSo8ICxH5rojMFZE5IvJPESkSkX1F5E0R+UBE7heRghC3MPxfEI6P7IkyQ9JmUemus47jZBndLixEZC/gcmCCqo4DcoEvATcAN6rqaGAjcEk45RJgo6ruD9wY4vUIroZyHCdb6Sk1VB5QLCJ5QAmwCjgReCgcvxM4K/w+M/wnHD9JRKQby9pI0sDt3lCO42QXoqrdn6nIFcCvgCrgWeAKYFoYPSAiw4GnVHWciMwBTlHV5eHYQuBoVV2XkualwKUApaWlZZMmTWp3+SorKykpKUkLf/6Dzdwys4rjhxdx1TEDWozf1vDOTGtnzLs78vC8uz/v7sjD8+543q1hwoQJFao6IfagqnbrBxgIvAgMAfKBR4ELgQWROMOB2eH3XGDvyLGFwO7N5VFWVqYdoby8PDb8L4+9qiOumayX/OOtVsVva3hnprUz5t0deXjevTMPz7tzzmkJoFwztKs9oYb6FLBYVdeqai3wCHAcMCCopQD2BlaG38sx4UE43h/Y0L1FNhqXKHebheM4WUZPCIulwDEiUhJsDycB84ApwDkhzkXAY+H34+E/4fiLQQJ2Oz4pz3GcbKXbhYWqvokZqqcDs0MZbgOuAa4SkQXA7sAd4ZQ7gN1D+FXAD7q7zAkSrrPV7jrrOE6WkddylM5HVa8DrksJXgQcFRO3Gji3O8rVEq6GchwnW/EZ3G3ANz9yHCdbcWHRBpKT8nyeheM42YULizbQaOCuraeHbOyO4zg9gguLNpCbIxTk5qAKO+oaero4juM43YYLizZSXJAL+PpQjuNkFy4s2khJEBa+8qzjONmEC4s2UpyfGFm4kdtxnOzBhUUbSaqh3GbhOE724MKijTSqoXxk4ThOFuHCoo0UF9ikd7dZOI6TTbiwaCPF+XbL3BvKcZxswoVFGykJIwsXFo7jZBMuLNpIsbvOOo6ThbiwaCMl7jrrOE4W4sKijSS9oXxk4ThO9uDCoo0UJeZZuBrKcZwswoVFG0mqoVxYOI6TPbiwaCMJbyhXQzmOk024sGgjvuqs4zjZiAuLNtK4kKDbLBzHySJcWLQRXxvKcZxsxIVFG3E1lOM42YgLizbiBm7HcbIRFxZtxG0WjuNkIy4s2oiroRzHyUZcWLQRX+7DcZxsxIVFG4mqoRoatIdL4ziO0z24sGgjOTlCUdgAaUed78PtOE524MKiHSRGFz7XwnGcbMGFRTtw91nHcbINFxbtoNiXKXccJ8voEWEhIgNE5CEReVdE5ovIsSIySESeE5EPwvfAEFdE5I8iskBEZonI+J4ocxT3iHIcJ9voqZHFTcDTqnoQcBgwH/gB8IKqjgZeCP8BTgVGh8+lwC3dX9ymFPmeFo7jZBltFhYiMlBEDm1vhiLSD/g4cAeAqtao6ibgTODOEO1O4Kzw+0zgLjWmAQNEpLS9+XcGJY1qKDdwO46THYhqy3MFROQl4AwgD5gJrAVeVtWr2pyhyOHAbcA8bFRRAVwBrFDVAZF4G1V1oIhMBn6jqq+G8BeAa1S1PCXdS7GRB6WlpWWTJk1qa9EaqayspKSkJGP4797YyBvLd3DVMf05fnhxi/FbG96ec3alvLsjD8+7+/Pujjw8747n3RomTJhQoaoTYg+qaosfYEb4/gbw8/B7VmvOjUlrAlAHHB3+3wT8AtiUEm9j+H4CmBgJfwEoay6PsrIy7Qjl5eXNhl91/0wdcc1kvf/tpa2K39rwzkxrZ8y7O/LwvHtnHp5355zTEkC5ZmhXW6uGyguqn/OAyW2VViksB5ar6pvh/0PAeOCjhHopfK+JxB8eOX9vYGUHy9AhigvstlW7N5TjOFlCa4XFz4FngAWq+raI7Ad80J4MVXU1sExEDgxBJ2EqqceBi0LYRcBj4ffjwFeDV9QxwGZVXdWevDsLn2fhOE62kdfKeKtUtdGoraqLROQ
[base64-encoded PNG data omitted: plot produced by four_letter_fd.plot(50)]\n", 1546 | "text/plain": [ 1547 | "
" 1548 | ] 1549 | }, 1550 | "metadata": { 1551 | "needs_background": "light" 1552 | }, 1553 | "output_type": "display_data" 1554 | } 1555 | ], 1556 | "source": [ 1557 | "four_letter_fd.plot(50);" 1558 | ] 1559 | }, 1560 | { 1561 | "cell_type": "markdown", 1562 | "metadata": {}, 1563 | "source": [ 1564 | "###### 23.\n", 1565 | "\n", 1566 | "◑ Review the discussion of looping with conditions in [4](https://www.nltk.org/book/ch01.html#sec-making-decisions \"Looping with conditions\"). Use a combination of `for` and `if` statements to loop over the words of the movie script for *Monty Python and the Holy Grail* (`text6`) and `print` all the uppercase words, one per line.\n", 1567 | "\n", 1568 | "*To save space in this notebook, I only consider the first 300 words of the script. To examine the whole script, remove the `[:300]` slice.*" 1569 | ] 1570 | }, 1571 | { 1572 | "cell_type": "code", 1573 | "execution_count": 199, 1574 | "metadata": {}, 1575 | "outputs": [ 1576 | { 1577 | "name": "stdout", 1578 | "output_type": "stream", 1579 | "text": [ 1580 | "SCENE\n", 1581 | "KING\n", 1582 | "ARTHUR\n", 1583 | "SOLDIER\n", 1584 | "ARTHUR\n", 1585 | "I\n", 1586 | "SOLDIER\n", 1587 | "ARTHUR\n", 1588 | "I\n", 1589 | "I\n", 1590 | "SOLDIER\n", 1591 | "ARTHUR\n", 1592 | "SOLDIER\n", 1593 | "ARTHUR\n", 1594 | "SOLDIER\n", 1595 | "ARTHUR\n", 1596 | "SOLDIER\n", 1597 | "ARTHUR\n", 1598 | "SOLDIER\n", 1599 | "ARTHUR\n", 1600 | "SOLDIER\n", 1601 | "ARTHUR\n", 1602 | "SOLDIER\n", 1603 | "ARTHUR\n" 1605 | ] 1606 | } 1607 | ], 1608 | "source": [ 1609 | "for w in text6[:300]:\n", 1610 | "    if w.isupper():\n", 1611 | "        print(w)" 1612 | ] 1613 | }, 1614 | { 1615 | "cell_type": "markdown", 1616 | "metadata": {}, 1617 | "source": [ 1618 | "If we preferred a list comprehension, we could unpack the list with `*` and pass `sep` to `print` so that each word still appears on its own line."
 1619 | ] 1620 | }, 1621 | { 1622 | "cell_type": "code", 1623 | "execution_count": 219, 1624 | "metadata": {}, 1625 | "outputs": [ 1626 | { 1627 | "name": "stdout", 1628 | "output_type": "stream", 1629 | "text": [ 1630 | "SCENE\n", 1631 | "KING\n", 1632 | "ARTHUR\n", 1633 | "SOLDIER\n", 1634 | "ARTHUR\n", 1635 | "I\n", 1636 | "SOLDIER\n", 1637 | "ARTHUR\n", 1638 | "I\n", 1639 | "I\n", 1640 | "SOLDIER\n", 1641 | "ARTHUR\n", 1642 | "SOLDIER\n", 1643 | "ARTHUR\n", 1644 | "SOLDIER\n", 1645 | "ARTHUR\n", 1646 | "SOLDIER\n", 1647 | "ARTHUR\n", 1648 | "SOLDIER\n", 1649 | "ARTHUR\n", 1650 | "SOLDIER\n", 1651 | "ARTHUR\n", 1652 | "SOLDIER\n", 1653 | "ARTHUR\n" 1654 | ] 1655 | } 1656 | ], 1657 | "source": [ 1658 | "print(*[w for w in text6[:300] if w.isupper()], sep = \"\\n\")" 1659 | ] 1660 | }, 1661 | { 1662 | "cell_type": "markdown", 1663 | "metadata": {}, 1664 | "source": [ 1665 | "###### 24.\n", 1666 | "\n", 1667 | "◑ Write expressions for finding all words in `text6` that meet the conditions listed below. The result should be in the form of a list of words: `['word1', 'word2', ...]`.\n", 1668 | "\n", 1669 | "+ a. Ending in *ise*\n", 1670 | " \n", 1671 | "+ b. Containing the letter *z*\n", 1672 | " \n", 1673 | "+ c. Containing the sequence of letters *pt*\n", 1674 | " \n", 1675 | "+ d. Having all lowercase letters except for an initial capital (i.e., `titlecase`)\n", 1676 | " \n", 1677 | "*The most basic way of doing this is to use a series of `if...elif` conditionals, one per condition. One thing we need to watch out for is that a word may contain either a lower- or an uppercase `z` (the same goes for `pt`). In practice this is a minor concern: the `w.istitle()` conditional will already catch any word that begins with a capital `Z`. However, `w.istitle()` would miss words in all caps (e.g., `ZZZZ`) or in mixed case (e.g., `tHIS`). So we'll use the conditional `if 'z' in w.lower()` to handle this. 
The conditional will change the word to lowercase for the check, but pass the unaltered form to the list, so the words in the list will appear as they did in the text.*" 1678 | ] 1679 | }, 1680 | { 1681 | "cell_type": "code", 1682 | "execution_count": 244, 1683 | "metadata": {}, 1684 | "outputs": [ 1685 | { 1686 | "name": "stdout", 1687 | "output_type": "stream", 1688 | "text": [ 1689 | "['A', 'AMAZING', 'Aaaaaaaaah', 'Aaaaaaaah', 'Aaaaaah', 'Aaaah', 'Aaaaugh', 'Aaagh', 'Aaah', 'Aaauggh', 'Aaaugh', 'Aaauugh', 'Aagh', 'Aah', 'Aauuggghhh', 'Aauuugh', 'Aauuuuugh', 'Aauuuves', 'Action', 'Actually', 'African', 'Ages', 'Aggh', 'Agh', 'Ah', 'Ahh', 'Alice', 'All', 'Allo', 'Almighty', 'Alright', 'Am', 'Amen', 'An', 'Anarcho', 'And', 'Angnor', 'Anthrax', 'Antioch', 'Anybody', 'Anyway', 'Apples', 'Aramaic', 'Are', 'Arimathea', 'Armaments', 'Arthur', 'As', 'Ask', 'Assyria', 'At', 'Attila', 'Augh', 'Autumn', 'Auuuuuuuugh', 'Away', 'Ay', 'Ayy', 'B', 'Back', 'Bad', 'Badon', 'Battle', 'Be', 'Beast', 'Bedevere', 'Bedwere', 'Behold', 'Between', 'Beyond', 'Black', 'Bloody', 'Blue', 'Bon', 'Bones', 'Book', 'Bors', 'Brave', 'Bravely', 'Bravest', 'Bread', 'Bridge', 'Bring', 'Bristol', 'Britain', 'Britons', 'Brother', 'Build', 'Burn', 'But', 'By', 'C', 'Caerbannog', 'Camaaaaaargue', 'Camelot', 'Castle', 'Chapter', 'Charge', 'Chaste', 'Cherries', 'Chicken', 'Chickennn', 'Chop', 'Christ', 'Churches', 'Cider', 'Clark', 'Clear', 'Come', 'Concorde', 'Consult', 'Cornwall', 'Could', 'Course', 'Court', 'Crapper', 'Cut', 'Dappy', 'Death', 'Defeat', 'Dennis', 'Did', 'Didn', 'Dingo', 'Dis', 'Divine', 'Do', 'Doctor', 'Does', 'Don', 'Dragon', 'Dramatically', 'Ecky', 'Ector', 'Eee', 'Eh', 'Enchanter', 'England', 'English', 'Erbert', 'Ere', 'Erm', 'Eternal', 'European', 'Even', 'Every', 'Everything', 'Ewing', 'Exactly', 'Excalibur', 'Excuse', 'Explain', 'Far', 'Farewell', 'Father', 'Fetchez', 'Fiends', 'Fine', 'First', 'Firstly', 'Five', 'Follow', 'For', 'Forgive', 'Forward', 'Found', 'Four', 'France', 
'Frank', 'French', 'Gable', 'Galahad', 'Gallahad', 'Gawain', 'Get', 'Go', 'God', 'Good', 'Gorge', 'Grail', 'Great', 'Greetings', 'Grenade', 'Guards', 'Guy', 'Ha', 'Hah', 'Hallo', 'Halt', 'Hand', 'Hang', 'Have', 'Haw', 'He', 'Hee', 'Heee', 'Heh', 'Hello', 'Help', 'Herbert', 'Here', 'Hey', 'Hic', 'Hill', 'Himself', 'His', 'Hiyaah', 'Hiyah', 'Hiyya', 'Hm', 'Hmm', 'Ho', 'Hoa', 'Hold', 'Holy', 'Honestly', 'Hoo', 'Hooray', 'How', 'Huh', 'Hurry', 'Huy', 'Huyah', 'Hya', 'Hyy', 'I', 'Idiom', 'Iesu', 'If', 'Iiiiives', 'Iiiives', 'In', 'Is', 'Isn', 'It', 'Ives', 'Jesus', 'Joseph', 'Just', 'Keep', 'King', 'Knight', 'Knights', 'Lady', 'Lake', 'Lancelot', 'Launcelot', 'Lead', 'Leaving', 'Let', 'Lie', 'Like', 'Listen', 'Loimbard', 'Look', 'Looks', 'Lord', 'Lucky', 'Make', 'Man', 'May', 'Maynard', 'Meanwhile', 'Mercea', 'Message', 'Midget', 'Mind', 'Mine', 'Mmm', 'Monsieur', 'More', 'Morning', 'Most', 'Mother', 'Mud', 'Must', 'My', 'N', 'Nador', 'Nay', 'Neee', 'Never', 'Ni', 'Nine', 'Ninepence', 'No', 'None', 'Not', 'Nothing', 'Now', 'Nu', 'O', 'Of', 'Off', 'Oh', 'Ohh', 'Old', 'Olfin', 'On', 'Once', 'One', 'Ooh', 'Oooh', 'Oooo', 'Oooohoohohooo', 'Oooooooh', 'Open', 'Or', 'Order', 'Other', 'Oui', 'Our', 'Over', 'Ow', 'Packing', 'Patsy', 'Pendragon', 'Peng', 'Perhaps', 'Peril', 'Picture', 'Pie', 'Piglet', 'Pin', 'Please', 'Practice', 'Prepare', 'Prince', 'Princess', 'Providence', 'Psalms', 'Pull', 'Pure', 'Put', 'Quick', 'Quickly', 'Quiet', 'Quite', 'Quoi', 'Rather', 'Really', 'Recently', 'Remove', 'Rheged', 'Ridden', 'Right', 'Riiight', 'Robin', 'Robinson', 'Roger', 'Round', 'Run', 'Running', 'S', 'Said', 'Saint', 'Saxons', 'Say', 'Schools', 'See', 'Seek', 'Shall', 'She', 'Shh', 'Shrubber', 'Shrubberies', 'Shut', 'Silence', 'Silly', 'Since', 'Sir', 'Skip', 'So', 'Sorry', 'Speak', 'Splendid', 'Spring', 'Stand', 'Stay', 'Steady', 'Stop', 'Summer', 'Supposing', 'Supreme', 'Surely', 'Swamp', 'Table', 'Tale', 'Tall', 'Tell', 'Thank', 'That', 'The', 'Thee', 'Then', 'There', 'Therefore', 
'They', 'This', 'Those', 'Thou', 'Thpppppt', 'Thppppt', 'Thpppt', 'Thppt', 'Three', 'Throw', 'Thsss', 'Thursday', 'Thy', 'Til', 'Tim', 'Tis', 'To', 'Today', 'Together', 'Too', 'Torment', 'Tower', 'True', 'Try', 'Twenty', 'Two', 'U', 'Uh', 'Uhh', 'Ulk', 'Um', 'Umhm', 'Umm', 'Un', 'Unfortunately', 'Until', 'Use', 'Uther', 'Uugh', 'Uuh', 'Very', 'Victory', 'W', 'Waa', 'Wait', 'Walk', 'Wayy', 'We', 'Welcome', 'Well', 'What', 'When', 'Where', 'Which', 'Who', 'Whoa', 'Why', 'Will', 'Winston', 'Winter', 'With', 'Woa', 'Wood', 'Would', 'Y', 'Yapping', 'Yay', 'Yeaaah', 'Yeaah', 'Yeah', 'Yes', 'You', 'Your', 'Yup', 'ZOOT', 'Zoot', 'amazes', 'apologise', 'aptly', 'empty', 'excepting', 'frozen', 'noise', 'ptoo', 'surprise', 'temptation', 'temptress', 'wise', 'zhiv', 'zone', 'zoo', 'zoop', 'zoosh']\n" 1690 | ] 1691 | } 1692 | ], 1693 | "source": [ 1694 | "my_list = []\n", 1695 | "\n", 1696 | "for w in text6:\n", 1697 | "    if w.endswith('ise'):\n", 1698 | "        my_list.append(w)\n", 1699 | "    elif 'z' in w.lower():\n", 1700 | "        my_list.append(w)\n", 1701 | "    elif 'pt' in w.lower():\n", 1702 | "        my_list.append(w)\n", 1703 | "    elif w.istitle():\n", 1704 | "        my_list.append(w)\n", 1705 | " \n", 1706 | "\n", 1707 | "print(sorted(set(my_list)))" 1708 | ] 1709 | }, 1710 | { 1711 | "cell_type": "markdown", 1712 | "metadata": {}, 1713 | "source": [ 1714 | "*There are two ways we could make the above more concise: 1) use a list comprehension with `or` to connect the conditions; 2) use `any` with a generator expression so we can check for either `pt` or `z` in a single condition. The cell below does both.*" 1715 | ] 1716 | }, 1717 | { 1718 | "cell_type": "code", 1719 | "execution_count": 245, 1720 | "metadata": {}, 1721 | "outputs": [ 1722 | { 1723 | "name": "stdout", 1724 | "output_type": "stream", 1725 | "text": [ 1726 | "['A', 'AMAZING', 'Aaaaaaaaah', 'Aaaaaaaah', 'Aaaaaah', 'Aaaah', 'Aaaaugh', 'Aaagh', 'Aaah', 'Aaauggh', 'Aaaugh', 'Aaauugh', 'Aagh', 'Aah', 'Aauuggghhh', 
'Aauuugh', 'Aauuuuugh', 'Aauuuves', 'Action', 'Actually', 'African', 'Ages', 'Aggh', 'Agh', 'Ah', 'Ahh', 'Alice', 'All', 'Allo', 'Almighty', 'Alright', 'Am', 'Amen', 'An', 'Anarcho', 'And', 'Angnor', 'Anthrax', 'Antioch', 'Anybody', 'Anyway', 'Apples', 'Aramaic', 'Are', 'Arimathea', 'Armaments', 'Arthur', 'As', 'Ask', 'Assyria', 'At', 'Attila', 'Augh', 'Autumn', 'Auuuuuuuugh', 'Away', 'Ay', 'Ayy', 'B', 'Back', 'Bad', 'Badon', 'Battle', 'Be', 'Beast', 'Bedevere', 'Bedwere', 'Behold', 'Between', 'Beyond', 'Black', 'Bloody', 'Blue', 'Bon', 'Bones', 'Book', 'Bors', 'Brave', 'Bravely', 'Bravest', 'Bread', 'Bridge', 'Bring', 'Bristol', 'Britain', 'Britons', 'Brother', 'Build', 'Burn', 'But', 'By', 'C', 'Caerbannog', 'Camaaaaaargue', 'Camelot', 'Castle', 'Chapter', 'Charge', 'Chaste', 'Cherries', 'Chicken', 'Chickennn', 'Chop', 'Christ', 'Churches', 'Cider', 'Clark', 'Clear', 'Come', 'Concorde', 'Consult', 'Cornwall', 'Could', 'Course', 'Court', 'Crapper', 'Cut', 'Dappy', 'Death', 'Defeat', 'Dennis', 'Did', 'Didn', 'Dingo', 'Dis', 'Divine', 'Do', 'Doctor', 'Does', 'Don', 'Dragon', 'Dramatically', 'Ecky', 'Ector', 'Eee', 'Eh', 'Enchanter', 'England', 'English', 'Erbert', 'Ere', 'Erm', 'Eternal', 'European', 'Even', 'Every', 'Everything', 'Ewing', 'Exactly', 'Excalibur', 'Excuse', 'Explain', 'Far', 'Farewell', 'Father', 'Fetchez', 'Fiends', 'Fine', 'First', 'Firstly', 'Five', 'Follow', 'For', 'Forgive', 'Forward', 'Found', 'Four', 'France', 'Frank', 'French', 'Gable', 'Galahad', 'Gallahad', 'Gawain', 'Get', 'Go', 'God', 'Good', 'Gorge', 'Grail', 'Great', 'Greetings', 'Grenade', 'Guards', 'Guy', 'Ha', 'Hah', 'Hallo', 'Halt', 'Hand', 'Hang', 'Have', 'Haw', 'He', 'Hee', 'Heee', 'Heh', 'Hello', 'Help', 'Herbert', 'Here', 'Hey', 'Hic', 'Hill', 'Himself', 'His', 'Hiyaah', 'Hiyah', 'Hiyya', 'Hm', 'Hmm', 'Ho', 'Hoa', 'Hold', 'Holy', 'Honestly', 'Hoo', 'Hooray', 'How', 'Huh', 'Hurry', 'Huy', 'Huyah', 'Hya', 'Hyy', 'I', 'Idiom', 'Iesu', 'If', 'Iiiiives', 'Iiiives', 'In', 'Is', 'Isn', 
'It', 'Ives', 'Jesus', 'Joseph', 'Just', 'Keep', 'King', 'Knight', 'Knights', 'Lady', 'Lake', 'Lancelot', 'Launcelot', 'Lead', 'Leaving', 'Let', 'Lie', 'Like', 'Listen', 'Loimbard', 'Look', 'Looks', 'Lord', 'Lucky', 'Make', 'Man', 'May', 'Maynard', 'Meanwhile', 'Mercea', 'Message', 'Midget', 'Mind', 'Mine', 'Mmm', 'Monsieur', 'More', 'Morning', 'Most', 'Mother', 'Mud', 'Must', 'My', 'N', 'Nador', 'Nay', 'Neee', 'Never', 'Ni', 'Nine', 'Ninepence', 'No', 'None', 'Not', 'Nothing', 'Now', 'Nu', 'O', 'Of', 'Off', 'Oh', 'Ohh', 'Old', 'Olfin', 'On', 'Once', 'One', 'Ooh', 'Oooh', 'Oooo', 'Oooohoohohooo', 'Oooooooh', 'Open', 'Or', 'Order', 'Other', 'Oui', 'Our', 'Over', 'Ow', 'Packing', 'Patsy', 'Pendragon', 'Peng', 'Perhaps', 'Peril', 'Picture', 'Pie', 'Piglet', 'Pin', 'Please', 'Practice', 'Prepare', 'Prince', 'Princess', 'Providence', 'Psalms', 'Pull', 'Pure', 'Put', 'Quick', 'Quickly', 'Quiet', 'Quite', 'Quoi', 'Rather', 'Really', 'Recently', 'Remove', 'Rheged', 'Ridden', 'Right', 'Riiight', 'Robin', 'Robinson', 'Roger', 'Round', 'Run', 'Running', 'S', 'Said', 'Saint', 'Saxons', 'Say', 'Schools', 'See', 'Seek', 'Shall', 'She', 'Shh', 'Shrubber', 'Shrubberies', 'Shut', 'Silence', 'Silly', 'Since', 'Sir', 'Skip', 'So', 'Sorry', 'Speak', 'Splendid', 'Spring', 'Stand', 'Stay', 'Steady', 'Stop', 'Summer', 'Supposing', 'Supreme', 'Surely', 'Swamp', 'Table', 'Tale', 'Tall', 'Tell', 'Thank', 'That', 'The', 'Thee', 'Then', 'There', 'Therefore', 'They', 'This', 'Those', 'Thou', 'Thpppppt', 'Thppppt', 'Thpppt', 'Thppt', 'Three', 'Throw', 'Thsss', 'Thursday', 'Thy', 'Til', 'Tim', 'Tis', 'To', 'Today', 'Together', 'Too', 'Torment', 'Tower', 'True', 'Try', 'Twenty', 'Two', 'U', 'Uh', 'Uhh', 'Ulk', 'Um', 'Umhm', 'Umm', 'Un', 'Unfortunately', 'Until', 'Use', 'Uther', 'Uugh', 'Uuh', 'Very', 'Victory', 'W', 'Waa', 'Wait', 'Walk', 'Wayy', 'We', 'Welcome', 'Well', 'What', 'When', 'Where', 'Which', 'Who', 'Whoa', 'Why', 'Will', 'Winston', 'Winter', 'With', 'Woa', 'Wood', 'Would', 'Y', 
'Yapping', 'Yay', 'Yeaaah', 'Yeaah', 'Yeah', 'Yes', 'You', 'Your', 'Yup', 'ZOOT', 'Zoot', 'amazes', 'apologise', 'aptly', 'empty', 'excepting', 'frozen', 'noise', 'ptoo', 'surprise', 'temptation', 'temptress', 'wise', 'zhiv', 'zone', 'zoo', 'zoop', 'zoosh']\n" 1727 | ] 1728 | } 1729 | ], 1730 | "source": [ 1731 | "print(sorted(set([w for w in text6 if w.endswith('ise') or any(s in w.lower() for s in ('pt', 'z')) or w.istitle()])))" 1732 | ] 1733 | }, 1734 | { 1735 | "cell_type": "markdown", 1736 | "metadata": {}, 1737 | "source": [ 1738 | "###### 25. \n", 1739 | "◑ Define `sent` to be the list of words `['she', 'sells', 'sea', 'shells', 'by', 'the', 'sea', 'shore']`. Now write code to perform the following tasks:\n", 1740 | "\n", 1741 | "+ a. Print all words beginning with *sh*\n", 1742 | " \n", 1743 | "+ b. Print all words longer than four characters" 1744 | ] 1745 | }, 1746 | { 1747 | "cell_type": "code", 1748 | "execution_count": 246, 1749 | "metadata": {}, 1750 | "outputs": [ 1751 | { 1752 | "name": "stdout", 1753 | "output_type": "stream", 1754 | "text": [ 1755 | "['she', 'shells', 'shore']\n" 1756 | ] 1757 | } 1758 | ], 1759 | "source": [ 1760 | "sent = ['she', 'sells', 'sea', 'shells', 'by', 'the', 'sea', 'shore']\n", 1761 | "\n", 1762 | "print([w for w in sent if w.startswith('sh')])" 1763 | ] 1764 | }, 1765 | { 1766 | "cell_type": "code", 1767 | "execution_count": 247, 1768 | "metadata": {}, 1769 | "outputs": [ 1770 | { 1771 | "name": "stdout", 1772 | "output_type": "stream", 1773 | "text": [ 1774 | "['sells', 'shells', 'shore']\n" 1775 | ] 1776 | } 1777 | ], 1778 | "source": [ 1779 | "print([w for w in sent if len(w) > 4])" 1780 | ] 1781 | }, 1782 | { 1783 | "cell_type": "markdown", 1784 | "metadata": {}, 1785 | "source": [ 1786 | "###### 26. \n", 1787 | "\n", 1788 | "◑ What does the following Python code do? 
`sum(len(w) for w in text1)` Can you use it to work out the average word length of a text?\n", 1789 | "\n", 1790 | "*`sum(len(w) for w in text1)` gives us the total number of characters across all tokens in `text1`. Dividing this sum by `len(text1)`, the number of tokens, gives us a rough idea of the average word length of the text. However, this number will not be entirely accurate, since punctuation tokens are counted as words:*" 1791 | ] 1792 | }, 1793 | { 1794 | "cell_type": "code", 1795 | "execution_count": 266, 1796 | "metadata": {}, 1797 | "outputs": [ 1798 | { 1799 | "name": "stdout", 1800 | "output_type": "stream", 1801 | "text": [ 1802 | "Total number of characters: 999044\n" 1803 | ] 1804 | } 1805 | ], 1806 | "source": [ 1807 | "tot_chars = sum(len(w) for w in text1)\n", 1808 | "\n", 1809 | "print(\"Total number of characters: {}\".format(tot_chars))" 1810 | ] 1811 | }, 1812 | { 1813 | "cell_type": "markdown", 1814 | "metadata": {}, 1815 | "source": [ 1816 | "*Average word length:*" 1817 | ] 1818 | }, 1819 | { 1820 | "cell_type": "code", 1821 | "execution_count": 256, 1822 | "metadata": {}, 1823 | "outputs": [ 1824 | { 1825 | "data": { 1826 | "text/plain": [ 1827 | "3.830411128023649" 1828 | ] 1829 | }, 1830 | "execution_count": 256, 1831 | "metadata": {}, 1832 | "output_type": "execute_result" 1833 | } 1834 | ], 1835 | "source": [ 1836 | "sum(len(w) for w in text1)/len(text1)" 1837 | ] 1838 | }, 1839 | { 1840 | "cell_type": "markdown", 1841 | "metadata": {}, 1842 | "source": [ 1843 | "*If we run a similar calculation on the non-alphanumeric tokens, we can see that they make up nearly 5% of the characters in the text, which skews the average word length:*" 1844 | ] 1845 | }, 1846 | { 1847 | "cell_type": "code", 1848 | "execution_count": 301, 1849 | "metadata": {}, 1850 | "outputs": [ 1851 | { 1852 | "name": "stdout", 1853 | "output_type": "stream", 1854 | "text": [ 1855 | "Total number of non-alphanumeric characters: 46415\n", 1856 | "Percentage of 
non-alphanumeric characters: 4.645942%\n" 1857 | ] 1858 | } 1859 | ], 1860 | "source": [ 1861 | "tot_na_chars = sum(len(w) for w in text1 if not w.isalnum())\n", 1862 | "perc_na_chars = tot_na_chars/tot_chars\n", 1863 | "\n", 1864 | "print(\"Total number of non-alphanumeric characters: {}\".format(tot_na_chars))\n", 1865 | "print(\"Percentage of non-alphanumeric characters: {:%}\".format(perc_na_chars))" 1866 | ] 1867 | }, 1868 | { 1869 | "cell_type": "markdown", 1870 | "metadata": {}, 1871 | "source": [ 1872 | "*This would be the average word length if we counted only alphabetic tokens:*" 1873 | ] 1874 | }, 1875 | { 1876 | "cell_type": "code", 1877 | "execution_count": 299, 1878 | "metadata": {}, 1879 | "outputs": [ 1880 | { 1881 | "data": { 1882 | "text/plain": [ 1883 | "4.35945521407211" 1884 | ] 1885 | }, 1886 | "execution_count": 299, 1887 | "metadata": {}, 1888 | "output_type": "execute_result" 1889 | } 1890 | ], 1891 | "source": [ 1892 | "sum(len(w) for w in text1 if w.isalpha())/len([w for w in text1 if w.isalpha()])" 1893 | ] 1894 | }, 1895 | { 1896 | "cell_type": "markdown", 1897 | "metadata": {}, 1898 | "source": [ 1899 | "*4.36 vs. 3.83 is a sizeable difference.*\n", 1900 | "\n", 1901 | "*A further consideration is how we'd handle words with apostrophes. Currently, these words are split into three parts: the main part, the apostrophe, and the enclitic (e.g., `didn't` => `didn`, `'`, `t`). Even if we remove the apostrophes from the final counts, we'd still have the problem of enclitics (i.e., \"d\", \"ll\", \"m\", \"re\", \"s\", \"t\", and \"ve\") - one- or two-letter tokens that would be counted as individual words, which would drive the average word length down. I'd argue that these endings should be rejoined to the words they were originally a part of. 
There are some complicated ways of dealing with this (e.g., going through the text and reattaching these enclitics), but a simpler solution is to leave these endings in the total character count and remove them from the total word count.*" 1902 | ] 1903 | }, 1904 | { 1905 | "cell_type": "code", 1906 | "execution_count": 304, 1907 | "metadata": {}, 1908 | "outputs": [ 1909 | { 1910 | "data": { 1911 | "text/plain": [ 1912 | "4.407147288400819" 1913 | ] 1914 | }, 1915 | "execution_count": 304, 1916 | "metadata": {}, 1917 | "output_type": "execute_result" 1918 | } 1919 | ], 1920 | "source": [ 1921 | "char_len = sum(len(w) for w in text1 if w.isalpha())\n", 1922 | "total_words = len([w for w in text1 if w.isalpha() and w.lower() not in (\"d\", \"ll\", \"m\", \"re\", \"s\", \"t\", \"ve\")])\n", 1923 | "\n", 1924 | "char_len/total_words" 1925 | ] 1926 | }, 1927 | { 1928 | "cell_type": "markdown", 1929 | "metadata": {}, 1930 | "source": [ 1931 | "*The difference here isn't nearly as great - partly because contractions aren't especially common in __Moby Dick__ - but if we care about the accuracy of our figures, we need to keep such things in mind.*" 1932 | ] 1933 | }, 1934 | { 1935 | "cell_type": "markdown", 1936 | "metadata": {}, 1937 | "source": [ 1938 | "###### 27. \n", 1939 | "\n", 1940 | "◑ Define a function called `vocab_size(text)` that has a single parameter for the text, and which returns the vocabulary size of the text.\n", 1941 | "\n", 1942 | "*Here is a basic version of that function. 
Also, the book hasn't yet covered docstrings for hand-crafted functions, but IMHO it's a good idea to get in the habit of writing descriptions of what your function does.*" 1943 | ] 1944 | }, 1945 | { 1946 | "cell_type": "code", 1947 | "execution_count": 302, 1948 | "metadata": {}, 1949 | "outputs": [], 1950 | "source": [ 1951 | "def vocab_size(text):\n", 1952 | " \"\"\"\n", 1953 | " Returns the number of words in a text.\n", 1954 | " \"\"\"\n", 1955 | " return len(text)" 1956 | ] 1957 | }, 1958 | { 1959 | "cell_type": "code", 1960 | "execution_count": 303, 1961 | "metadata": {}, 1962 | "outputs": [ 1963 | { 1964 | "data": { 1965 | "text/plain": [ 1966 | "260819" 1967 | ] 1968 | }, 1969 | "execution_count": 303, 1970 | "metadata": {}, 1971 | "output_type": "execute_result" 1972 | } 1973 | ], 1974 | "source": [ 1975 | "vocab_size(text1)" 1976 | ] 1977 | }, 1978 | { 1979 | "cell_type": "markdown", 1980 | "metadata": {}, 1981 | "source": [ 1982 | "*But as outlined above, this count includes punctuation and enclitics. This more complex version eliminates these:*" 1983 | ] 1984 | }, 1985 | { 1986 | "cell_type": "code", 1987 | "execution_count": 305, 1988 | "metadata": {}, 1989 | "outputs": [], 1990 | "source": [ 1991 | "def vocab_size(text):\n", 1992 | " \"\"\"\n", 1993 | " Returns the number of words in a text. Only alphabetic strings\n", 1994 | " are considered. 
Enclitics are removed from the count.\n", 1995 | " \"\"\"\n", 1996 | " \n", 1997 | " return len([w for w in text if w.isalpha() and w.lower() not in (\"d\", \"ll\", \"m\", \"re\", \"s\", \"t\", \"ve\")])" 1998 | ] 1999 | }, 2000 | { 2001 | "cell_type": "code", 2002 | "execution_count": 306, 2003 | "metadata": {}, 2004 | "outputs": [ 2005 | { 2006 | "data": { 2007 | "text/plain": [ 2008 | "218361" 2009 | ] 2010 | }, 2011 | "execution_count": 306, 2012 | "metadata": {}, 2013 | "output_type": "execute_result" 2014 | } 2015 | ], 2016 | "source": [ 2017 | "vocab_size(text1)" 2018 | ] 2019 | }, 2020 | { 2021 | "cell_type": "markdown", 2022 | "metadata": {}, 2023 | "source": [ 2024 | "###### 28. \n", 2025 | "◑ Define a function `percent(word, text)` that calculates how often a given word occurs in a text, and expresses the result as a percentage.\n", 2026 | "\n", 2027 | "*We'll want to tabulate words regardless of their case, so I'm adding the method `.lower()` to the list comprehension.*" 2028 | ] 2029 | }, 2030 | { 2031 | "cell_type": "code", 2032 | "execution_count": 314, 2033 | "metadata": {}, 2034 | "outputs": [], 2035 | "source": [ 2036 | "def percent(word, text):\n", 2037 | " \"\"\"\n", 2038 | " Returns the percentage that a given word comprises in a text.\n", 2039 | " \"\"\"\n", 2040 | " word_count = len([w for w in text if w.lower() == word.lower()])\n", 2041 | " total_words = vocab_size(text)\n", 2042 | " return 100 * word_count/total_words " 2043 | ] 2044 | }, 2045 | { 2046 | "cell_type": "code", 2047 | "execution_count": 315, 2048 | "metadata": {}, 2049 | "outputs": [ 2050 | { 2051 | "data": { 2052 | "text/plain": [ 2053 | "0.561455571278754" 2054 | ] 2055 | }, 2056 | "execution_count": 315, 2057 | "metadata": {}, 2058 | "output_type": "execute_result" 2059 | } 2060 | ], 2061 | "source": [ 2062 | "percent(\"whale\", text1)" 2063 | ] 2064 | }, 2065 | { 2066 | "cell_type": "code", 2067 | "execution_count": 319, 2068 | "metadata": {}, 2069 | "outputs": [ 2070 | { 
2071 | "data": { 2072 | "text/plain": [ 2073 | "0.6000779321989869" 2074 | ] 2075 | }, 2076 | "execution_count": 319, 2077 | "metadata": {}, 2078 | "output_type": "execute_result" 2079 | } 2080 | ], 2081 | "source": [ 2082 | "percent(\"God\", text3)" 2083 | ] 2084 | }, 2085 | { 2086 | "cell_type": "markdown", 2087 | "metadata": {}, 2088 | "source": [ 2089 | "###### 29. \n", 2090 | "◑ We have been using sets to store vocabularies. Try the following Python expression: `set(sent3)` < `set(text1)`. Experiment with this using different arguments to `set()`. What does it do? Can you think of a practical application for this?" 2091 | ] 2092 | }, 2093 | { 2094 | "cell_type": "code", 2095 | "execution_count": 332, 2096 | "metadata": {}, 2097 | "outputs": [ 2098 | { 2099 | "data": { 2100 | "text/plain": [ 2101 | "['In',\n", 2102 | " 'the',\n", 2103 | " 'beginning',\n", 2104 | " 'God',\n", 2105 | " 'created',\n", 2106 | " 'the',\n", 2107 | " 'heaven',\n", 2108 | " 'and',\n", 2109 | " 'the',\n", 2110 | " 'earth',\n", 2111 | " '.']" 2112 | ] 2113 | }, 2114 | "execution_count": 332, 2115 | "metadata": {}, 2116 | "output_type": "execute_result" 2117 | } 2118 | ], 2119 | "source": [ 2120 | "sent3" 2121 | ] 2122 | }, 2123 | { 2124 | "cell_type": "code", 2125 | "execution_count": 329, 2126 | "metadata": {}, 2127 | "outputs": [ 2128 | { 2129 | "data": { 2130 | "text/plain": [ 2131 | "True" 2132 | ] 2133 | }, 2134 | "execution_count": 329, 2135 | "metadata": {}, 2136 | "output_type": "execute_result" 2137 | } 2138 | ], 2139 | "source": [ 2140 | "set(sent3) < set(text1)" 2141 | ] 2142 | }, 2143 | { 2144 | "cell_type": "code", 2145 | "execution_count": 330, 2146 | "metadata": {}, 2147 | "outputs": [ 2148 | { 2149 | "data": { 2150 | "text/plain": [ 2151 | "True" 2152 | ] 2153 | }, 2154 | "execution_count": 330, 2155 | "metadata": {}, 2156 | "output_type": "execute_result" 2157 | } 2158 | ], 2159 | "source": [ 2160 | "set(sent3) < set(text3)" 2161 | ] 2162 | }, 2163 | { 2164 | 
"cell_type": "code", 2165 | "execution_count": 333, 2166 | "metadata": {}, 2167 | "outputs": [ 2168 | { 2169 | "data": { 2170 | "text/plain": [ 2171 | "True" 2172 | ] 2173 | }, 2174 | "execution_count": 333, 2175 | "metadata": {}, 2176 | "output_type": "execute_result" 2177 | } 2178 | ], 2179 | "source": [ 2180 | "set(sent3) < set(text4)" 2181 | ] 2182 | }, 2183 | { 2184 | "cell_type": "code", 2185 | "execution_count": 323, 2186 | "metadata": {}, 2187 | "outputs": [ 2188 | { 2189 | "data": { 2190 | "text/plain": [ 2191 | "False" 2192 | ] 2193 | }, 2194 | "execution_count": 323, 2195 | "metadata": {}, 2196 | "output_type": "execute_result" 2197 | } 2198 | ], 2199 | "source": [ 2200 | "set(sent3) < set(text5)" 2201 | ] 2202 | }, 2203 | { 2204 | "cell_type": "markdown", 2205 | "metadata": {}, 2206 | "source": [ 2207 | "*The expression evaluates whether the set on the left is a proper subset of the set on the right. It could be used to determine if one set was drawn from another, i.e., to try to locate which text a sentence was taken from. However, this would generate a large number of false positives if the sentence were sufficiently short, as can be seen above. 
It could also be used to compare the similarities between two texts - with the caveat that one text would have to be significantly shorter than the other.*" 2208 | ] 2209 | } 2210 | ], 2211 | "metadata": { 2212 | "kernelspec": { 2213 | "display_name": "Python 3", 2214 | "language": "python", 2215 | "name": "python3" 2216 | }, 2217 | "language_info": { 2218 | "codemirror_mode": { 2219 | "name": "ipython", 2220 | "version": 3 2221 | }, 2222 | "file_extension": ".py", 2223 | "mimetype": "text/x-python", 2224 | "name": "python", 2225 | "nbconvert_exporter": "python", 2226 | "pygments_lexer": "ipython3", 2227 | "version": "3.7.3" 2228 | } 2229 | }, 2230 | "nbformat": 4, 2231 | "nbformat_minor": 2 2232 | } 2233 | -------------------------------------------------------------------------------- /Chapter 03/Nationalities.csv: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/STRZGR/Natural-Language-Processing-with-Python-Analyzing-Text-with-the-Natural-Language-Toolkit/07835b68d1f7e7d274baf24f409e5345a73f3110/Chapter 03/Nationalities.csv -------------------------------------------------------------------------------- /Chapter 03/tranzlator.yml: -------------------------------------------------------------------------------- 1 | a: "" 2 | abbatial: abbashul 3 | abhorred: abhorrd 4 | about: bout 5 | absorbed: absorbd 6 | abstinential: abstinenshul 7 | accelerometer: accelerometr 8 | accepts: acceptz 9 | access: acces 10 | accessibility: accesibility 11 | accessories: accessoriez 12 | accidential: accidenshul 13 | accompanying: accompanyin 14 | accomplished: accomplishd 15 | according: accordin 16 | account: akownt 17 | accounting: accouting 18 | accrementitial: accrementishul 19 | accumulating: accumulatin 20 | accusation: accusashun 21 | accustomed: accustomd 22 | achilles: achillez 23 | acquired: acquird 24 | acquiring: acquirin 25 | acquisition: acquizishun 26 | across: acros 27 | action: akshun 28 | actions: 
akshuns 29 | actual: akshul 30 | actually: akshully 31 | added: addd 32 | adding: addin 33 | addition: addishun 34 | additions: addishuns 35 | address: addres 36 | addresses: addressez 37 | adds: addz 38 | administration: administrashun 39 | admission: admishun 40 | admitted: admittd 41 | adopters: adopterz 42 | ads: adz 43 | advanced: advancd 44 | advantages: advantagez 45 | advential: advenshul 46 | advertisers: advertiserz 47 | advertising: advertisin 48 | advised: advisd 49 | advisors: advisorz 50 | affiliation: affiliashun 51 | aforelinked: aforelinkd 52 | aforementioned: aforemenshuned 53 | after: aftr 54 | agential: agenshul 55 | ages: agez 56 | agreed: agred 57 | agrees: agreez 58 | agrestial: agresshul 59 | aims: aimz 60 | airs: airz 61 | airtunes: airtunez 62 | allowed: allowd 63 | allowing: allowin 64 | allows: allowz 65 | alpha: alfa 66 | alternating: alternatin 67 | alternative: alternativ 68 | alternatives: alternativez 69 | always: alwayz 70 | amazing: amazin 71 | analyzing: analyzin 72 | anchored: anchord 73 | and: an 74 | animated: animatd 75 | animation: animashun 76 | animations: animashuns 77 | annoucing: annoucin 78 | announced: announcd 79 | announcing: announcin 80 | annoyed: annoyd 81 | annoying: annoyin 82 | another: anothr 83 | answer: anzwr 84 | answered: anzwerd 85 | answering: answerin 86 | answers: anzwrs 87 | anteinitial: anteinishul 88 | antenatalitial: antenatalishul 89 | antialiasing: antialiasin 90 | anticipated: anticipatd 91 | anticipation: anticipashun 92 | antipestilential: antipestilenshul 93 | antiprudential: antiprudenshul 94 | anything: anythin 95 | aperature: aperschur 96 | apologies: apologiez 97 | apostrophe: apostrofe 98 | appalling: appallin 99 | appealed: appeald 100 | appeared: appeard 101 | appears: appearz 102 | application: applicashun 103 | applications: applicashuns 104 | applies: appliez 105 | approved: approvd 106 | approving: approvin 107 | apps: appz 108 | aquantive: aquantiv 109 | architecture: 
architecchur 110 | archive: archiv 111 | are: r 112 | argued: argud 113 | arising: arisin 114 | arrived: arrivd 115 | articles: articlez 116 | asisted: assistd 117 | asked: askd 118 | asking: askin 119 | asks: askz 120 | assembling: assemblin 121 | assets: assetz 122 | assimilation: assimilashun 123 | associated: associatd 124 | assumed: assumd 125 | assuming: assumin 126 | assumption: assumpshun 127 | astonishing: astonishin 128 | astounding: astoundin 129 | atrophy: atrofy 130 | attached: attachd 131 | attacker: attackr 132 | attacks: attackz 133 | attempted: attemptd 134 | attempting: attemptin 135 | attendees: attendeez 136 | attention: attenshun 137 | attitudes: attitudez 138 | authentication: authenticashun 139 | authored: authord 140 | authorities: authoritiez 141 | authorized: authorizd 142 | authors: authorz 143 | autosaving: autosavin 144 | avoiding: avoidin 145 | awesome: awsum 146 | back: bak 147 | backed: backd 148 | background: bakground 149 | backing: backin 150 | backup: bakup 151 | backups: backupz 152 | baffled: baffld 153 | baked: bakd 154 | balanopreputial: balanoprepushul 155 | balked: balkd 156 | balkled: balkd 157 | ballmer: ballmr 158 | bandwidth: bandwif 159 | banker: bankr 160 | banking: bankin 161 | banning: bannin 162 | based: basd 163 | bastards: bastardz 164 | be: bees 165 | beaing: bean 166 | beaking: breakin 167 | because: cuz 168 | becomes: becomez 169 | before: "b4" 170 | beginning: beginnin 171 | beijing: beijin 172 | being: bein 173 | beleaguered: beleaguerd 174 | believe: bleev 175 | believes: believez 176 | believing: believin 177 | belongs: belongz 178 | benchmarks: benchmarkz 179 | bends: bendz 180 | beneficential: beneficenshul 181 | benefits: benefitz 182 | bestial: besshul 183 | betrays: betrayz 184 | better: bettr 185 | between: tween 186 | bidential: bidenshul 187 | big: hooj 188 | bigential: bigenshul 189 | bigger: biggr 190 | billboards: billboardz 191 | billed: billd 192 | binaries: binariez 193 | birthday: birfdai 
194 | bitangential: bitangenshul 195 | blocks: blockz 196 | bloggers: bloggerz 197 | blogging: bloggin 198 | blows: blowz 199 | board: bord 200 | boing: boin 201 | "book's": bookz 202 | booked: bookd 203 | bookies: bookiez 204 | bookmarks: bookmarkz 205 | books: bookz 206 | bootlegged: bootleggd 207 | bootlegging: bootleggin 208 | borders: borderz 209 | boring: borin 210 | boss: bos 211 | bought: buyd 212 | bow: bowage 213 | boy: boi 214 | branching: branchin 215 | branded: brandd 216 | branding: brandin 217 | breaks: breakz 218 | breed: bred 219 | brewed: brewd 220 | bricks: brickz 221 | briefed: briefd 222 | brightness: brightnes 223 | bring: brin 224 | bringing: bringin 225 | brings: bringz 226 | broken: brokd 227 | browser: browsr 228 | browsers: browserz 229 | browsing: browsin 230 | building: buildin 231 | burned: burnd 232 | burrows: burrowz 233 | business: busines 234 | businessweek: businesweek 235 | busted: bustd 236 | buying: buyin 237 | buys: buyz 238 | bye: kthxbye 239 | cables: cablez 240 | cadential: cadenshul 241 | calculated: calculatd 242 | called: calld 243 | calling: callin 244 | calls: callz 245 | candidates: candidatez 246 | canned: cannd 247 | cans: canz 248 | capillitial: capillishul 249 | capitalization: capitalizashun 250 | capping: cappin 251 | caps: capz 252 | capture: capchure 253 | captured: capturd 254 | cardinalitial: cardinalishul 255 | carrier: carrir 256 | cars: carz 257 | cat: kitteh 258 | caucuses: caucusez 259 | caution: caushun 260 | celebrating: celebratin 261 | celestial: celesshul 262 | cellphone: cellfone 263 | center: centr 264 | cents: sents 265 | challenges: challengez 266 | champions: championz 267 | changed: changd 268 | changes: changez 269 | changing: changin 270 | channels: channelz 271 | characters: characterz 272 | charged: chargd 273 | charges: chargez 274 | charging: chargin 275 | chatter: chattr 276 | cheaper: cheapr 277 | checking: checkin 278 | cheer: cher 279 | cheeseburger: cheezburger 280 | cheeseburgers: 
cheezburgers 281 | chipped: chippd 282 | choose: chooz 283 | choosing: choosin 284 | christopher: christophr 285 | circumferential: circumferenshul 286 | circumspatial: circumspashul 287 | circumstantial: circumstanshul 288 | cited: citd 289 | cities: citiez 290 | claims: claimz 291 | class: clas 292 | clearer: clearr 293 | clever: clevr 294 | clicking: clickin 295 | clients: clientz 296 | closes: closez 297 | closing: closin 298 | coders: coderz 299 | coding: codin 300 | coeffluential: coeffluenshul 301 | coessential: coessenshul 302 | coincides: coincidez 303 | coinitial: coinishul 304 | collaboration: collaborashun 305 | colleagues: colleaguez 306 | collection: collecshun 307 | collections: collecshuns 308 | collects: collectz 309 | columns: columnz 310 | combination: combinashun 311 | comes: comez 312 | coming: comin 313 | comitial: comishul 314 | commands: commandz 315 | comments: commentz 316 | commercial: commershul 317 | commercials: commershuls 318 | commission: commishun 319 | communicated: speekd 320 | communicating: communicatin 321 | commuter: commutr 322 | companies: companiez 323 | compares: comparez 324 | comparing: comparin 325 | compelling: compellin 326 | compensation: compensashun 327 | competing: competin 328 | competition: competishun 329 | competitors: competitorz 330 | compilation: compilashun 331 | compiled: compild 332 | complacential: complacenshul 333 | complained: complaind 334 | complaining: complainin 335 | complains: complainz 336 | complaints: complaintz 337 | completed: completd 338 | complicated: complicatd 339 | components: componentz 340 | comprised: comprisd 341 | compromise: compromize 342 | compromised: compromisd 343 | computer: computr 344 | computers: computerz 345 | computing: computin 346 | concerns: concernz 347 | concluded: concludd 348 | concludes: concludez 349 | conclusion: conclushun 350 | concordantial: concordanshul 351 | condenses: condensez 352 | condition: condishun 353 | conditions: condishuns 354 | 
conferential: conferenshul 355 | confessing: confessin 356 | confidential: confidenshul 357 | configuration: configurashun 358 | confines: confinez 359 | confirmation: confirmashun 360 | confirmed: confirmd 361 | confronted: confrontd 362 | confusion: confushun 363 | congratulations: congratulashuns 364 | congress: congres 365 | congruential: congruenshul 366 | connected: connectd 367 | connection: connecshun 368 | connections: connecshuns 369 | consciousness: consciousnes 370 | consequential: consequenshul 371 | consider: considr 372 | considerations: considerashuns 373 | considered: considerd 374 | considering: considerin 375 | considers: considerz 376 | considred: considerd 377 | consiousness: sonsciousnes 378 | consists: consistz 379 | consortial: consorshul 380 | constrained: constraind 381 | consubstantial: consubstanshul 382 | consulting: consultin 383 | consumer: consumr 384 | consumers: consumerz 385 | consumes: consumez 386 | consuming: consumin 387 | consummated: consummatd 388 | contacted: contactd 389 | contains: containz 390 | contangential: contangenshul 391 | contents: contentz 392 | contingential: contingenshul 393 | continued: continud 394 | continues: continuez 395 | contradiction: contradicshun 396 | contrived: contrivd 397 | controlled: controlld 398 | controller: controllr 399 | controls: controlz 400 | controversial: controvershul 401 | convention: convenshun 402 | conventional: convenshunal 403 | conversation: conversashun 404 | conversions: convershuns 405 | converted: convertd 406 | convinced: convincd 407 | coordinates: coordinatez 408 | coordination: coordinashun 409 | copied: copid 410 | copies: copiez 411 | corporation: corporashun 412 | correspondential: correspondenshul 413 | costs: costz 414 | cotangential: cotangenshul 415 | could: cud 416 | covering: coverin 417 | covers: coverz 418 | cracking: crackin 419 | crapped: crappd 420 | creaks: creakz 421 | created: creatd 422 | creating: creatin 423 | creation: creashun 424 | creations: 
creashuns 425 | credential: credenshul 426 | credentials: credenshuls 427 | crendital: credenshul 428 | criticism: critisizm 429 | critics: criticz 430 | crossed: crossd 431 | crowes: crowez 432 | crowned: crownd 433 | crumbling: crumblin 434 | cubes: cubez 435 | curatial: curashul 436 | customer: customr 437 | customers: customerz 438 | cutter: cuttr 439 | cutting: cuttin 440 | damages: damagez 441 | danger: dangr 442 | daring: darin 443 | day: dai 444 | days: dais 445 | dead: ded 446 | deals: dealz 447 | debuted: debutd 448 | december: decembr 449 | deception: decepshun 450 | decided: decidd 451 | decides: decidez 452 | deciding: decidin 453 | decision: decishun 454 | decisions: decishuns 455 | decrypting: decryptin 456 | dedicated: dedicatd 457 | deduction: deducshun 458 | deductions: deducshuns 459 | defaults: defaultz 460 | defeats: defeatz 461 | defective: defectiv 462 | deferential: deferenshul 463 | deficiencies: deficienciez 464 | defining: definin 465 | definition: definishun 466 | definitions: definishuns 467 | delays: delayz 468 | delicious: delishus 469 | deliver: delivr 470 | delivered: deliverd 471 | delivering: deliverin 472 | delivers: deliverz 473 | delphi: delfi 474 | demanding: demandin 475 | demographics: demographicz 476 | demos: demoz 477 | denver: denvr 478 | denying: denyin 479 | dependencies: dependenciez 480 | deposition: deposishun 481 | depreciation: depreciashun 482 | depressing: depresin 483 | described: describd 484 | describes: describez 485 | description: descripshun 486 | descriptions: descripshuns 487 | deserved: deservd 488 | deserves: deservez 489 | designed: designd 490 | designer: designr 491 | designers: designerz 492 | designing: designin 493 | designs: designz 494 | desperation: desperashun 495 | destined: destind 496 | destroyed: destroyd 497 | detailed: detaild 498 | details: detailz 499 | determination: determinashun 500 | determined: determind 501 | developed: developd 502 | developer: developr 503 | developers: 
developrz 504 | developing: developin 505 | devices: devicez 506 | "didn't": didnt 507 | died: did 508 | differences: differencez 509 | differencing: differencin 510 | differential: differenshul 511 | digging: diggin 512 | digitizing: digitizin 513 | diluted: dilutd 514 | dimension: dimenshun 515 | dinner: dinnr 516 | dipped: dippd 517 | direction: direcshun 518 | directions: direcshuns 519 | directories: directoriez 520 | directors: directorz 521 | disabled: diabld 522 | disagreeing: disagreein 523 | disappearing: disapperin 524 | disappointed: disappointd 525 | disappointing: disappointin 526 | discounted: discountd 527 | discovered: discoverd 528 | discussion: discushun 529 | discussions: discushuns 530 | disliked: dislikd 531 | dismiss: dismis 532 | dismissed: dismissd 533 | dismisses: dismisez 534 | dismissing: dismissin 535 | disorder: disordr 536 | dispatches: dispatchez 537 | displayed: displayd 538 | displays: displayz 539 | disputed: disputd 540 | distinction: distincshun 541 | distributed: distributd 542 | distribution: distribushun 543 | ditched: ditchd 544 | division: divishun 545 | divisions: divishuns 546 | do: doez 547 | docs: docz 548 | doctored: doctord 549 | documentation: documentashun 550 | dodging: dodgin 551 | does: doez 552 | dogged: doggd 553 | doing: doin 554 | dollars: dollarz 555 | "don't": dun 556 | donation: donashun 557 | donations: donashuns 558 | done: dun 559 | doomed: doomd 560 | doors: doorz 561 | doubled: doubld 562 | doubters: doubterz 563 | doubting: doubtin 564 | downgrading: downgradin 565 | downloading: downloadin 566 | downloads: downloadz 567 | dressed: dressd 568 | drive: driv 569 | drivers: driverz 570 | drives: drivez 571 | driving: drivin 572 | dropped: droppd 573 | drops: dropz 574 | drowned: drownd 575 | during: durin 576 | dvds: dvdz 577 | eagles: eaglez 578 | earlier: earlir 579 | easier: easir 580 | eclipsing: eclipsin 581 | economics: economicz 582 | edges: edgez 583 | editing: editin 584 | edition: edishun 585 
| editors: editorz 586 | edits: editz 587 | education: edumacashun 588 | edwards: edwardz 589 | effects: effectz 590 | either: eithr 591 | election: elecshun 592 | electropotential: electropotenshul 593 | elements: elementz 594 | elmer: elmr 595 | elongated: elongatd 596 | eloquential: eloquenshul 597 | emailed: emaild 598 | emerging: emergin 599 | emphasis: emfasiz 600 | emphasizing: emphasizin 601 | employed: employd 602 | employees: employeez 603 | emulation: emulashun 604 | emulations: emulashuns 605 | enabled: enabld 606 | enables: enablez 607 | enabling: enablin 608 | encountering: encounterin 609 | encourages: encouragez 610 | encouraging: encouragin 611 | encryption: encrypshun 612 | endeavors: endeavorz 613 | ended: endd 614 | ending: endin 615 | endless: endles 616 | endorsed: endorsd 617 | endorses: endorsez 618 | endorsing: endorsin 619 | ends: endz 620 | enemies: enemiez 621 | engineering: engineerin 622 | engineers: engineerz 623 | enhanced: enhancd 624 | enhances: enhanez 625 | enjoyed: enjoyd 626 | enjoying: enjoyin 627 | enlightened: enlightend 628 | enter: entr 629 | entrenched: entrenchd 630 | entries: entriez 631 | enumeration: enumerashun 632 | epididymodeferential: epididymodeferenshul 633 | equation: equashun 634 | equidistantial: equidistanshul 635 | equinoctial: equinocshul 636 | equipotential: equipotenshul 637 | equipped: equippd 638 | equispatial: equispashul 639 | equitangential: equitangenshul 640 | errors: errorz 641 | escaping: escapin 642 | essential: essenshul 643 | established: establishd 644 | establishing: establishin 645 | ether: ethr 646 | eucharistial: eucharisshul 647 | euphemisms: eufemisms 648 | evaluating: evaluatin 649 | events: eventz 650 | eventually: evenshully 651 | ever: evr 652 | every: evry 653 | everyone: evryone 654 | everything: everythin 655 | evidential: evidenshul 656 | evokes: evokez 657 | evolution: evolushun 658 | evolving: evolvin 659 | exactly: egsaktly 660 | examination: examinashun 661 | examples: 
examplez 662 | exasperated: exasperatd 663 | excellent: awsum 664 | exception: excepshun 665 | excited: excitd 666 | exciting: excitin 667 | excrementitial: excrementishul 668 | excrescential: excrescenshul 669 | executing: executin 670 | executive: executiv 671 | executives: executivez 672 | exhaustive: exhaustiv 673 | existed: existd 674 | existential: existenshul 675 | existing: existin 676 | exists: existz 677 | expanded: expandd 678 | expanding: expandin 679 | expected: expectd 680 | expediential: expedienshul 681 | expenditures: expenditurez 682 | experienced: experiencd 683 | experiences: experiencez 684 | experiential: experienshul 685 | explained: explaind 686 | explains: explainz 687 | explanation: explanashun 688 | explanations: explanashuns 689 | exponential: exponenshul 690 | exported: exportd 691 | express: expres 692 | expression: expreshun 693 | expressions: expreshuns 694 | extends: extendz 695 | extensions: extenshuns 696 | extensive: extensiv 697 | extraessential: extraessenshul 698 | fabrication: fabricashun 699 | faces: facez 700 | factitial: factishul 701 | factories: factoriez 702 | factors: factorz 703 | failed: faild 704 | failing: failin 705 | failures: failurez 706 | fake: faek 707 | fans: fanz 708 | fascinating: fascintatin 709 | faster: fastr 710 | fastscripts: fastscriptz 711 | feathers: featherz 712 | feature: feachur 713 | featured: featurd 714 | features: featurez 715 | featuring: featurin 716 | feed: fed 717 | feel: feelz 718 | feeling: feelin 719 | feels: feelz 720 | fees: feez 721 | fetial: feshul 722 | fever: fevr 723 | fewer: fewr 724 | fiddling: fiddlin 725 | figures: figurez 726 | figuring: figurin 727 | filed: fild 728 | files: filez 729 | filter: filtr 730 | financial: finanshul 731 | finder: findr 732 | finding: findin 733 | fingers: fingerz 734 | finished: finishd 735 | fired: fird 736 | first: furst 737 | fitness: fitnes 738 | fits: fitz 739 | fiver: fivr 740 | fixed: fixd 741 | fixes: fixez 742 | fixing: fixin 743 | 
flavored: flavord 744 | flawed: flawd 745 | floating: floatin 746 | focused: focusd 747 | folder: foldr 748 | folks: folkz 749 | following: followin 750 | fonts: fontz 751 | football: futball 752 | for: "4" 753 | forced: forcd 754 | forcing: forcin 755 | foregoing: foregoin 756 | forestial: foresshul 757 | formats: formatz 758 | formatted: formattd 759 | formed: formd 760 | former: formr 761 | forms: formz 762 | fortunes: fortunez 763 | forums: forumz 764 | found: findz 765 | foundation: foundashun 766 | four: "4" 767 | foxytunes: foxytunez 768 | frames: framez 769 | fraser: frasr 770 | friday: fridai 771 | fried: frid 772 | friend: frend 773 | friends: frenz 774 | from: frum 775 | frommer: frommr 776 | front: frunt 777 | frontier: frontir 778 | frustrated: frustratd 779 | frying: fryin 780 | fuck: "f&^%" 781 | fucking: "f&^%ing" 782 | fueled: fueld 783 | full: fol 784 | function: funcshun 785 | functions: funcshuns 786 | furnished: furnishd 787 | further: furthr 788 | galleries: galleriez 789 | games: gamez 790 | gaping: gapin 791 | gartner: gartnr 792 | gatekeeper: gatekeepr 793 | gave: gaev 794 | gawker: gawkr 795 | generated: generatd 796 | generates: generatez 797 | generating: generatin 798 | generation: generashun 799 | gentilitial: gentilishul 800 | gestures: gesturez 801 | get: git 802 | gets: getz 803 | getting: gettin 804 | giant: hooj 805 | giants: giantz 806 | ginned: ginnd 807 | girl: gurl 808 | give: giv 809 | gives: givez 810 | giving: givin 811 | gloating: gloatin 812 | glorified: glorifid 813 | goes: goez 814 | going: goin 815 | good: gud 816 | goodies: goodiez 817 | graggers: graggerz 818 | graphed: grafd 819 | graphic: grafic 820 | graphics: grafic 821 | gratification: gratificashun 822 | greater: greatr 823 | gripped: grippd 824 | groans: groanz 825 | grounds: groundz 826 | growing: growin 827 | grows: growz 828 | gruber: groober 829 | guaranteed: guaranted 830 | guess: gues 831 | guessed: guessd 832 | guessing: guessin 833 | guided: guidd 834 
| guts: gutz 835 | guys: guyz 836 | hacking: hackin 837 | half: haf 838 | handed: handd 839 | handicaps: handicapz 840 | handles: handlez 841 | hands: hanz 842 | hanging: hangin 843 | happend: happend 844 | happened: happend 845 | happening: happenin 846 | happens: happenz 847 | happy: happeh 848 | harder: hardr 849 | have: has 850 | having: havin 851 | head: hed 852 | headed: headd 853 | heading: headin 854 | heads: headz 855 | heard: herd 856 | hearing: hearin 857 | hearings: hearingz 858 | heartwarming: heartwarmin 859 | hearwarming: heartwarmin 860 | heels: heelz 861 | hello: "y halo thar" 862 | help: halp 863 | helped: helpd 864 | helping: helpin 865 | helps: helpz 866 | here: her 867 | "here's": heers 868 | heroes: heroez 869 | hi: "oh hai" 870 | higher: highr 871 | highlight: highlite 872 | highlighting: highlightin 873 | highlights: highlightz 874 | himself: him 875 | hints: hintz 876 | hired: hird 877 | his: hims 878 | hits: hitz 879 | holder: holdr 880 | holders: holderz 881 | holding: holdin 882 | holdings: holdingz 883 | holds: holdz 884 | holiday: holidai 885 | holidays: holidais 886 | hoped: hopd 887 | hopelessly: hopelesly 888 | hoping: hopin 889 | hosting: hostin 890 | hours: hourz 891 | house: houz 892 | hovering: hoverin 893 | however: howevr 894 | hughes: hughez 895 | humorless: humorles 896 | hundred: hundrd 897 | hundreds: hundredz 898 | hunger: hungr 899 | hurdles: hurdlez 900 | hurts: hurtz 901 | hyperreverential: hyperreverenshul 902 | hyperspatial: hyperspashul 903 | "i'm": me 904 | "i've": "i has" 905 | icons: iconz 906 | ideas: ideaz 907 | identified: identifid 908 | idiots: idiotz 909 | illusion: illushun 910 | illustration: illustrashun 911 | imacs: imacz 912 | images: imagez 913 | imaging: imagin 914 | imaginig: imaginin 915 | imagining: imaginin 916 | impartial: imparshul 917 | implementation: implementashun 918 | implemented: implementd 919 | implied: implid 920 | implies: impliez 921 | impressed: impresd 922 | impressions: 
impreshuns 923 | improve: improov 924 | improved: improovd 925 | improvements: improovements 926 | imprudential: imprudenshul 927 | inception: incepshun 928 | inches: inchez 929 | included: includd 930 | includes: includez 931 | including: includin 932 | inconsequential: inconsequenshul 933 | increased: increasd 934 | increases: increasez 935 | increasing: increasin 936 | indeed: inded 937 | indentation: indentashun 938 | indentations: indentashuns 939 | indicated: indicatd 940 | indicates: indicatez 941 | indifferential: indifferenshul 942 | individuals: individualz 943 | indulgential: indulgenshul 944 | inequipotential: inequipotenshul 945 | inertial: inershul 946 | inessential: inessenshul 947 | infection: infecshun 948 | infects: infectz 949 | inferential: inferenshul 950 | influenced: influencd 951 | influential: influenshul 952 | infographic: infografic 953 | information: informashun 954 | informed: informd 955 | infrabestial: infrabesshul 956 | infracelestial: infracelesshul 957 | infuriating: infriatin 958 | initial: inishul 959 | injection: injecshun 960 | innocent: innosent 961 | inquirer: inquirr 962 | inspection: inspecshun 963 | inspiration: inspirashun 964 | inspired: inspird 965 | inspiring: inspirin 966 | installation: installashun 967 | installations: installashuns 968 | installed: installd 969 | installer: installr 970 | installing: installin 971 | instantial: instanshul 972 | instruction: instrucshun 973 | instructions: instrucshuns 974 | insubstantial: insubstanshul 975 | integrated: integratd 976 | integrates: integratez 977 | integration: integrashn 978 | integrodifferential: integrodifferenshul 979 | intelligential: intelligenshul 980 | intended: intendd 981 | intentions: intenshuns 982 | interaction: interacshun 983 | interactive: interactiv 984 | interequinoctial: interequinocshul 985 | interested: interestd 986 | interesting: interestin 987 | interfaces: interfacez 988 | interfacing: interfacin 989 | interferential: interferenshul 990 | 
intergential: intergenshul 991 | internet: internets 992 | internuptial: internupshul 993 | interpolation: interpolashun 994 | interpolations: interpolashuns 995 | interspatial: interspashul 996 | interstitial: interstishul 997 | interviews: interviewz 998 | introduced: introducd 999 | introduces: introducez 1000 | introducing: introducin 1001 | introduction: introducshun 1002 | inverts: invertz 1003 | invested: investd 1004 | investigating: investigatin 1005 | investigative: investigativ 1006 | investors: investorz 1007 | invokes: invokez 1008 | involved: involvd 1009 | iphone: ifone 1010 | iphonedrive: ifonedrive 1011 | iphones: ifonez 1012 | iphoto: ifoto 1013 | ipods: ipodz 1014 | irredential: irredenshul 1015 | irreverential: irreverenshul 1016 | is: iz 1017 | "isn't": isnt 1018 | isolated: isolatd 1019 | issues: issuez 1020 | "it's": iz 1021 | items: itemz 1022 | iterating: iteratin 1023 | its: itz 1024 | itunes: itunez 1025 | jackass: jackas 1026 | jackassyness: jackasynes 1027 | james: jamez 1028 | jaws: jawz 1029 | jesus: jebus 1030 | jibes: jibez 1031 | jiminy: omgwtfbbq 1032 | jobs: jobz 1033 | joined: joind 1034 | joining: joinin 1035 | joking: jokin 1036 | jones: jonez 1037 | jostling: jostlin 1038 | journalists: journalistz 1039 | judged: judgd 1040 | judging: judgin 1041 | jumped: jumpd 1042 | jumping: jumpin 1043 | jurisprudential: jurisprudenshul 1044 | just: jus 1045 | keeping: keepin 1046 | keeps: keepz 1047 | keyboards: keyboardz 1048 | keys: keyz 1049 | keywords: keywordz 1050 | kicking: kickin 1051 | kidding: kiddin 1052 | kids: kidz 1053 | killed: killd 1054 | killer: killr 1055 | know: knoe 1056 | knows: knowz 1057 | labeled: labeld 1058 | labeling: labelin 1059 | labelled: labeld 1060 | labels: labelz 1061 | laboratories: laboratoriez 1062 | labs: labz 1063 | ladies: ladiez 1064 | landed: landd 1065 | laptops: laptopz 1066 | larger: largr 1067 | laser: lazr 1068 | later: latr 1069 | latter: lattr 1070 | laugh: laff 1071 | launched: launchd 
1072 | laws: lawz 1073 | layers: layerz 1074 | layouts: layoutz 1075 | lazytwitter: lazytwittr 1076 | leaders: leaderz 1077 | leading: leadin 1078 | leaks: leakz 1079 | learned: learnd 1080 | learning: lernin 1081 | leave: leef 1082 | legislature: legislachur 1083 | legs: legz 1084 | lens: lenz 1085 | leopard: lepprd 1086 | less: les 1087 | lets: letz 1088 | letter: lettr 1089 | letters: letterz 1090 | levels: levelz 1091 | leveraging: leveragin 1092 | levereaging: leveragin 1093 | licensed: licensd 1094 | licensing: licensin 1095 | like: liek 1096 | liked: likd 1097 | limitation: limitashun 1098 | limitations: limitashuns 1099 | limited: limitd 1100 | limits: limitz 1101 | lined: lind 1102 | lines: linez 1103 | lingering: lingerin 1104 | linkblogging: linkbloggin 1105 | links: linkz 1106 | listened: listend 1107 | lists: listz 1108 | little: lil 1109 | live: liv 1110 | living: livin 1111 | loaded: loadd 1112 | loading: loadin 1113 | localised: localisd 1114 | location: locashun 1115 | logicians: logicianz 1116 | logs: logz 1117 | longer: longr 1118 | looked: lookd 1119 | looking: lukingz 1120 | looks: lookz 1121 | losing: losin 1122 | lots: lawts 1123 | love: luv 1124 | loved: lovd 1125 | lovely: luvly 1126 | loves: luvz 1127 | lucrative: lucrativ 1128 | lunistitial: lunistishul 1129 | lying: lyin 1130 | macbooks: macbookz 1131 | machine: masheen 1132 | machines: masheens 1133 | macs: macz 1134 | magazines: magazinez 1135 | mailboxes: mailboxez 1136 | maintained: maintaind 1137 | maintaining: maintainin 1138 | make: mak 1139 | maker: makr 1140 | makers: makerz 1141 | makes: makez 1142 | making: makin 1143 | malicious: malishus 1144 | manager: managr 1145 | managers: managerz 1146 | manages: managez 1147 | managing: managin 1148 | manipulation: manipulashun 1149 | manning: manning 1150 | manufacture: manufacchur 1151 | manufacturing: manufacturin 1152 | many: "lotz da" 1153 | mapping: mappin 1154 | maps: mapz 1155 | market: markit 1156 | marketing: marketin 1157 | 
markets: marketz 1158 | marshals: marshalz 1159 | martial: marshul 1160 | masks: maskz 1161 | mass: mas 1162 | matter: mattr 1163 | mattered: matterd 1164 | maximizing: maximizin 1165 | may: cud 1166 | meaning: meanin 1167 | means: meanz 1168 | measures: measurez 1169 | measuring: measurin 1170 | mechanisms: mechanismz 1171 | meeting: mettin 1172 | megaphone: megafone 1173 | member: membr 1174 | members: memberz 1175 | memoirs: memoirz 1176 | memory: mems 1177 | mention: menshun 1178 | mentioned: menshund 1179 | merchandise: merchandize 1180 | mess: mes 1181 | message: mesage 1182 | messages: mesagez 1183 | messenger: messengr 1184 | migration: migrashun 1185 | minded: mindd 1186 | minds: mindz 1187 | miniaturization: miniaturizashun 1188 | minutes: minutez 1189 | minutial: minushul 1190 | mischaracterization: mischaracterizashun 1191 | misd: missd 1192 | misleading: misleadin 1193 | miss: mis 1194 | missed: misd 1195 | misses: misez 1196 | missing: missin 1197 | mission: mishun 1198 | missions: mishuns 1199 | mistakes: mistakez 1200 | misunderstanding: misunderstandin 1201 | mocking: mockn 1202 | models: modelz 1203 | monday: mondai 1204 | months: monthz 1205 | more: moar 1206 | morning: mornin 1207 | mother: mothr 1208 | motion: moshun 1209 | move: moov 1210 | moved: movd 1211 | moves: movez 1212 | movies: moviez 1213 | moving: movin 1214 | mr: mistah 1215 | much: mutch 1216 | multiuser: multiusr 1217 | my: mah 1218 | naked: nakd 1219 | name: naym 1220 | named: namd 1221 | names: namez 1222 | nation: nashun 1223 | nationwide: nashunwide 1224 | nature: nachur 1225 | need: ned 1226 | needed: needd 1227 | needs: needz 1228 | negotiations: negotiashuns 1229 | neither: neithr 1230 | network: netwerk 1231 | never: nevr 1232 | new: nu 1233 | news: newz 1234 | newsletter: newslettr 1235 | newspaper: nespapr 1236 | nomination: nominashun 1237 | noncelestial: noncelesshul 1238 | nonconfidential: nonconfidenshul 1239 | nondeferential: nondeferenshul 1240 | nonessential: 
nonessenshul 1241 | nonetheless: nonetheles 1242 | nonevidential: nonevidenshul 1243 | nonexistential: nonexistenshul 1244 | noninitial: noninishul 1245 | nonmartial: nonmarshul 1246 | nonpartial: nonparshul 1247 | nonpotential: nonpotenshul 1248 | nonpreferential: nonpreferenshul 1249 | nonpresidential: nonpresidenshul 1250 | nonprovidential: nonprovidenshul 1251 | nonresidential: nonresidenshul 1252 | nonserviential: nonservienshul 1253 | nonsubstantial: nonsubstanshul 1254 | nontangential: nontangenshul 1255 | notes: notez 1256 | nothing: nothin 1257 | noticed: noticd 1258 | notification: notificashun 1259 | notifications: notificashuns 1260 | noting: notin 1261 | notion: noshun 1262 | november: novembr 1263 | novitial: novishul 1264 | now: nao 1265 | nowadays: nowadais 1266 | nuclear: nucular 1267 | nuggets: nuggetz 1268 | number: numbr 1269 | numbered: numbrd 1270 | numbers: numberz 1271 | nuptial: nupshul 1272 | nutritial: nutrishul 1273 | nytimes: nytimez 1274 | obediential: obedienshul 1275 | objected: objectd 1276 | objections: objecshuns 1277 | obligation: obligashun 1278 | observation: observashun 1279 | observations: observashuns 1280 | observer: observr 1281 | observers: observerz 1282 | obtained: obtaind 1283 | obtaining: obtainin 1284 | obviating: obviatin 1285 | occasional: occashunal 1286 | occasionally: occashunally 1287 | occasions: occashuns 1288 | occurring: occurrin 1289 | october: octobr 1290 | of: ov 1291 | offended: offendd 1292 | offer: offr 1293 | offered: offrd 1294 | offering: offerin 1295 | offers: offerz 1296 | oh: o 1297 | ok: k 1298 | omission: omishun 1299 | one: wan 1300 | ones: onez 1301 | ongoing: ongoin 1302 | opening: openin 1303 | opens: openz 1304 | operating: operatin 1305 | operation: operashun 1306 | operations: operashuns 1307 | opinions: opinionz 1308 | opposed: opposd 1309 | optimized: optimizd 1310 | option: opshun 1311 | optional: opshunal 1312 | optionally: opshunally 1313 | options: opshuns 1314 | order: ordr 1315 
| ordered: orderd 1316 | orders: orderz 1317 | oriented: orientd 1318 | ostial: osshul 1319 | other: othr 1320 | others: otherz 1321 | otherwise: otherwize 1322 | our: r 1323 | outliner: outlinr 1324 | outstanding: outstandin 1325 | over: ovar 1326 | overcarged: overchargd 1327 | overheating: overheatin 1328 | overinfluential: overinfluenshul 1329 | overlooked: overlookd 1330 | overpartial: overparshul 1331 | overpotential: overpotenshul 1332 | overpowered: overpowerd 1333 | overrated: overratd 1334 | overrepresented: overrepresentd 1335 | overriding: overridin 1336 | overtures: overturez 1337 | overwhelming: overwhelmin 1338 | owed: owd 1339 | owned: ownd 1340 | owner: ownr 1341 | owners: ownerz 1342 | owns: ownz 1343 | packaging: packagin 1344 | page: paeg 1345 | pages: pagez 1346 | pageviews: pageviewz 1347 | palatial: palashul 1348 | panning: pannin 1349 | pants: pantz 1350 | paper: papr 1351 | paragraph: paragraf 1352 | parallels: parallelz 1353 | parameter: parametr 1354 | parses: parsez 1355 | parsing: parsin 1356 | partial: parshul 1357 | participated: participatd 1358 | partitioning: partitionin 1359 | partners: partnerz 1360 | parts: partz 1361 | pass: pas 1362 | passengers: passengerz 1363 | passes: pasez 1364 | passwords: passwordz 1365 | patents: patentz 1366 | patriots: patriotz 1367 | penitential: penitenshul 1368 | people: peeps 1369 | percent: persent 1370 | perfect: perfik 1371 | performed: performd 1372 | performs: performz 1373 | perhaps: perhapz 1374 | perichaetial: perichaeshul 1375 | pericytial: pericyshul 1376 | peripheral: periferal 1377 | perkins: perkinz 1378 | permission: permishun 1379 | permissions: permishuns 1380 | perplexed: perplexd 1381 | persons: personz 1382 | perspective: perspectiv 1383 | pestilential: pestilenshul 1384 | peter: petr 1385 | phil: fil 1386 | philadelphia: filadelfia 1387 | philip: filip 1388 | phill: fill 1389 | philosophy: filosofy 1390 | phone: fone 1391 | phones: fonez 1392 | photo: foto 1393 | photograph: 
fotograf 1394 | photographers: fotografers 1395 | photographic: fotografic 1396 | photographs: fotografs 1397 | photonic: fotonic 1398 | photos: fotos 1399 | photoshop: fotoshop 1400 | phrase: frase 1401 | phrasing: frasin 1402 | physical: fysical 1403 | physicist: fysicist 1404 | physics: fysics 1405 | picked: pickd 1406 | picking: pickin 1407 | picks: pickz 1408 | pictures: picturez 1409 | pieces: piecez 1410 | pinching: pinchin 1411 | pirated: piratd 1412 | places: placez 1413 | plagued: plagud 1414 | planning: plannin 1415 | plans: planz 1416 | platforms: platformz 1417 | played: playd 1418 | player: playr 1419 | players: playerz 1420 | playing: playin 1421 | plays: playz 1422 | playstation: playstashun 1423 | please: plz 1424 | pleasing: pleasin 1425 | plenipotential: plenipotenshul 1426 | plugging: pluggin 1427 | plugins: pluginz 1428 | podcasts: podcastz 1429 | pointing: pointin 1430 | points: pointz 1431 | polling: pollin 1432 | pondering: ponderin 1433 | pops: popz 1434 | portals: portalz 1435 | ported: portd 1436 | portion: porshun 1437 | portions: porshuns 1438 | ports: portz 1439 | position: posishun 1440 | positive: positiv 1441 | possession: posesshun 1442 | possibilities: possibilitiez 1443 | possible: posible 1444 | postcomitial: postcomishul 1445 | posted: postd 1446 | poster: postr 1447 | postnuptial: postnupshul 1448 | posts: postz 1449 | potential: potenshul 1450 | pounced: pouncd 1451 | power: powr 1452 | powered: powerd 1453 | praises: praisez 1454 | precedential: precedenshul 1455 | preceding: precedin 1456 | precious: preshus 1457 | predicted: predictd 1458 | predicting: predictin 1459 | prediction: predicshun 1460 | prefatial: prefashul 1461 | prefer: prefr 1462 | preferences: preferencez 1463 | preferential: preferenshul 1464 | preferred: preferrd 1465 | prefs: prefz 1466 | preinitial: preinishul 1467 | prelatial: prelashul 1468 | prenuptial: prenupshul 1469 | preparation: preparashun 1470 | preparations: preparashuns 1471 | prepotential: 
prepotenshul 1472 | preprocessing: preprocessin 1473 | preputial: prepushul 1474 | presentation: presentashun 1475 | presentations: presentashuns 1476 | presential: presenshul 1477 | presention: presenshun 1478 | president: prezident 1479 | presidential: presidenshul 1480 | press: pres 1481 | pressing: pressin 1482 | presubstantial: presubstanshul 1483 | preteressential: preteressenshul 1484 | preternuptial: preternupshul 1485 | pretty: pritee 1486 | prevented: preventd 1487 | prevents: preventz 1488 | priced: pricd 1489 | prices: pricez 1490 | primaries: primariez 1491 | primatial: primashul 1492 | primitial: primishul 1493 | principles: principlez 1494 | printed: printd 1495 | printing: printin 1496 | priorities: prioritiez 1497 | privileged: privilegd 1498 | privileges: privilegez 1499 | prizes: prizez 1500 | problems: problemz 1501 | proceed: proced 1502 | process: proces 1503 | processing: processin 1504 | proclaiming: proclaimin 1505 | produced: producd 1506 | producer: producr 1507 | produces: producez 1508 | production: producshun 1509 | products: productz 1510 | professional: profeshunal 1511 | professionals: profeshunals 1512 | profressional: profeshunal 1513 | profressionals: profeshunals 1514 | programming: programmin 1515 | progress: progres 1516 | projects: projectz 1517 | promising: promisun 1518 | promoting: promotin 1519 | promotional: promoshunal 1520 | proper: propr 1521 | properties: propertiez 1522 | propitial: propishul 1523 | prospects: prospectz 1524 | protective: protectiv 1525 | protuberantial: protuberanshul 1526 | proves: provez 1527 | provided: providd 1528 | providential: providenshul 1529 | provider: providr 1530 | provides: providez 1531 | providing: providin 1532 | proving: provin 1533 | prudential: prudenshul 1534 | pseudoimpartial: pseudoimparshul 1535 | publication: publicashun 1536 | publications: publicashuns 1537 | publishers: publisherz 1538 | publishes: publishez 1539 | publishing: publishin 1540 | pulled: pulld 1541 | 
pulling: pullin 1542 | pulls: pullz 1543 | pumping: pumpin 1544 | purchased: purchasd 1545 | purported: purportd 1546 | purposes: purposez 1547 | pursuing: pursuin 1548 | pushed: pushd 1549 | pussy: pusy 1550 | putting: puttin 1551 | puutting: puttin 1552 | quarter: quartr 1553 | quarters: quarterz 1554 | querulential: querulenshul 1555 | question: queshun 1556 | questions: queshuns 1557 | queued: queud 1558 | quicksilver: quicksilvr 1559 | quintessential: quintessenshul 1560 | quitting: quittin 1561 | radioactive: radioactiv 1562 | rails: railz 1563 | raised: raisd 1564 | raises: raisez 1565 | rands: randz 1566 | ranks: rankz 1567 | rather: rathr 1568 | rating: ratin 1569 | reached: reachd 1570 | reaction: reacshun 1571 | reader: readr 1572 | readers: readerz 1573 | reading: readin 1574 | reads: readz 1575 | realizing: realizin 1576 | really: rly 1577 | reasons: reasonz 1578 | recalls: recallz 1579 | receipts: receiptz 1580 | received: receivd 1581 | receiver: receivr 1582 | recent: resent 1583 | reception: recepshun 1584 | recodring: recordin 1585 | recommended: recommendd 1586 | recorder: recordr 1587 | recording: recordin 1588 | recrementitial: recrementishul 1589 | recurring: recurrin 1590 | reducing: reducin 1591 | reduction: reducshun 1592 | referential: referenshul 1593 | referring: referrin 1594 | refusing: refusin 1595 | regarding: regardin 1596 | regardless: regardles 1597 | register: registr 1598 | reinforcing: reinforcin 1599 | rejected: rejectd 1600 | rejection: rejecshun 1601 | related: relatd 1602 | relations: relashuns 1603 | relatives: relativez 1604 | released: releasd 1605 | releases: releasez 1606 | releasing: releasin 1607 | remaining: remainin 1608 | remains: remainz 1609 | remarks: remarkz 1610 | remember: remembr 1611 | reminder: remindr 1612 | reminding: remindin 1613 | reminds: remindz 1614 | reminiscential: reminiscenshul 1615 | removed: removd 1616 | render: rendr 1617 | rendering: rendrin 1618 | renowned: renownd 1619 | rented: rentd 
1620 | repairing: repairin 1621 | repeated: repeatd 1622 | repeats: repeatz 1623 | repercussions: repercushuns 1624 | replied: replid 1625 | replies: repliez 1626 | reporing: reportin 1627 | reported: reportd 1628 | reporting: reportin 1629 | reports: reportz 1630 | representatives: representativez 1631 | representing: representin 1632 | requested: requestd 1633 | required: requird 1634 | requirements: requirementz 1635 | requires: requirez 1636 | requiring: requirin 1637 | resentations: presentashuns 1638 | residential: residenshul 1639 | residents: residentz 1640 | resisting: resistn 1641 | resolution: resolushun 1642 | resolved: resolvd 1643 | resources: resourcez 1644 | respected: respectd 1645 | responds: respondz 1646 | responses: responsez 1647 | restless: restles 1648 | restoration: restorashun 1649 | restriction: restricshun 1650 | restrictions: restricshuns 1651 | resulted: resultd 1652 | results: resultz 1653 | retailer: retailr 1654 | retained: retaind 1655 | retention: retenshun 1656 | retial: reshul 1657 | reveals: revealz 1658 | revenues: revenuez 1659 | reverential: reverenshul 1660 | reverses: reversez 1661 | reviewed: reviewd 1662 | reviews: reviewz 1663 | revises: revisez 1664 | revolution: revolushun 1665 | revolutionary: revolushunary 1666 | rewards: rewardz 1667 | rewriting: rewritin 1668 | ridiculous: redikulus 1669 | right: rite 1670 | ripper: rippr 1671 | ripping: rippin 1672 | risks: riskz 1673 | robbed: robbd 1674 | robotics: roboticz 1675 | rodential: rodenshul 1676 | rolling: rollin 1677 | rotation: rotashun 1678 | rows: rowz 1679 | ruffling: rufflin 1680 | rugged: ruggd 1681 | rules: rulez 1682 | rumored: rumord 1683 | rumors: rumorz 1684 | running: runnin 1685 | runs: runz 1686 | said: sed 1687 | sales: salez 1688 | sampling: samplin 1689 | sanctioned: sancshund 1690 | sandwiches: sandwichez 1691 | sans: sanz 1692 | sapiential: sapienshul 1693 | sarbanes: sarbanez 1694 | satisfaction: satisfacshun 1695 | satisfying: satisfyin 1696 | 
saturday: katurdai 1697 | saving: savin 1698 | savviness: savvines 1699 | say: sai 1700 | saying: sayin 1701 | says: sez 1702 | scaling: scalin 1703 | scared: skerd 1704 | scarification: scarificashun 1705 | scathing: scathin 1706 | scenes: scneez 1707 | scheduled: scheduld 1708 | scheduling: schedulin 1709 | schizophrenia: schizofrenia 1710 | schools: schoolz 1711 | sciential: scienshul 1712 | sclerotial: scleroshul 1713 | score: skor 1714 | scored: scord 1715 | screencasting: screencastin 1716 | screenshots: screenshotz 1717 | screwed: screwd 1718 | screws: screwz 1719 | script: skript 1720 | scripting: scriptin 1721 | scripts: scriptz 1722 | scripture: scripchur 1723 | scrolling: scrollin 1724 | seamless: seamles 1725 | searches: searchez 1726 | searching: searchin 1727 | seating: seatin 1728 | seats: seatz 1729 | seconds: secondz 1730 | secret: seekret 1731 | secrets: seekretz 1732 | section: secshun 1733 | sections: secshuns 1734 | see: c 1735 | seeded: seedd 1736 | seeds: seedz 1737 | seeing: seein 1738 | seeking: seekin 1739 | seeks: seekz 1740 | seemed: seemd 1741 | seems: seemz 1742 | sees: seez 1743 | selected: selectd 1744 | selecting: selectin 1745 | selection: selecshun 1746 | "self's": selfs 1747 | selling: sellin 1748 | sells: sellz 1749 | semicelestial: semicelesshul 1750 | sending: sendin 1751 | sentential: sentenshul 1752 | separation: separashun 1753 | sepbember: septembr 1754 | sequential: sequenshul 1755 | series: seriez 1756 | seriously: srsly 1757 | served: servd 1758 | server: servr 1759 | servers: servrz 1760 | serves: servez 1761 | service: survis 1762 | services: servicez 1763 | serviential: servienshul 1764 | sesquitertial: sesquitershul 1765 | session: seshun 1766 | sessions: seshuns 1767 | settings: settingz 1768 | shared: shard 1769 | shareholder: shareholdr 1770 | shareholders: shareholderz 1771 | shares: sharez 1772 | sharing: sharin 1773 | sheds: shedz 1774 | shipped: shippd 1775 | shipping: shippin 1776 | ships: shipz 1777 | 
shitting: shittin 1778 | shocking: shockin 1779 | shortcomings: shortcomingz 1780 | shortcuts: shortcutz 1781 | shorter: shortr 1782 | should: shud 1783 | showed: showd 1784 | showing: showin 1785 | shows: showz 1786 | shrinks: shrinkz 1787 | sidelines: sidelinez 1788 | signals: signalz 1789 | signing: signin 1790 | signs: signz 1791 | signups: signupz 1792 | silential: silenshul 1793 | silver: silvr 1794 | sinking: sinkin 1795 | site: siet 1796 | sites: siets 1797 | situation: situashun 1798 | situations: situashuns 1799 | six: "6" 1800 | sized: sizd 1801 | sketching: sketchin 1802 | skewed: skewd 1803 | slides: slidez 1804 | slimmed: slimmd 1805 | slower: slowr 1806 | slumping: slumpin 1807 | smaller: smallr 1808 | smartphone: smartfone 1809 | smells: smellz 1810 | smudged: smudgd 1811 | smuggled: smuggld 1812 | soaked: soakd 1813 | social: soshul 1814 | soldering: solderin 1815 | solstitial: solstishul 1816 | solution: solushun 1817 | solutions: solushuns 1818 | solved: solvd 1819 | some: sum 1820 | something: somethin 1821 | sometimes: sometimez 1822 | sophisticated: sofisticatd 1823 | sorry: sry 1824 | sounding: soundin 1825 | sounds: soundz 1826 | sources: sourcez 1827 | sourcing: sourcin 1828 | spaces: spacez 1829 | spangled: spangld 1830 | spatial: spashul 1831 | speakers: speakerz 1832 | speaking: speakin 1833 | specced: speccd 1834 | special: speshul 1835 | specialized: specializd 1836 | specification: specificashun 1837 | specifications: specificashuns 1838 | specs: specz 1839 | spectators: spectatorz 1840 | specter: spectr 1841 | speculating: speculatin 1842 | speculation: speculashun 1843 | specutlitve: speculativ 1844 | speed: sped 1845 | spends: spendz 1846 | spoilers: spoilerz 1847 | sponsoring: sponsorin 1848 | sporting: sportin 1849 | spreading: spreadin 1850 | spreads: spreadz 1851 | spreadsheets: spreadsheetz 1852 | sprinkles: sprinklez 1853 | squeezed: squeezd 1854 | stacks: stackz 1855 | standardized: standardizd 1856 | standards: standardz 
1857 | stands: standz 1858 | starbucks: starbuckz 1859 | stars: starz 1860 | started: startd 1861 | starting: startin 1862 | starts: startz 1863 | stashing: stashin 1864 | stated: statd 1865 | states: statez 1866 | station: stashun 1867 | stations: stashuns 1868 | stats: statz 1869 | staying: stayin 1870 | stays: stayz 1871 | steal: steel 1872 | stealing: stealin 1873 | stephenson: stefenson 1874 | stepped: steppd 1875 | steps: stepz 1876 | sticking: stickin 1877 | stocked: stockd 1878 | stocks: stockz 1879 | stored: stord 1880 | stores: storez 1881 | stories: storiez 1882 | strained: straind 1883 | strapped: strappd 1884 | strategies: strategiez 1885 | streaming: streamin 1886 | streamlined: streamlind 1887 | strengths: strengthz 1888 | strikes: strikez 1889 | striking: strikin 1890 | string: strin 1891 | strips: stripz 1892 | stronger: strongr 1893 | structure: strucchur 1894 | struggled: struggld 1895 | students: studentz 1896 | stunning: stunnin 1897 | subcelestial: subcelesshul 1898 | subessential: subessenshul 1899 | subinitial: subinishul 1900 | subintelligential: subintelligenshul 1901 | submissions: submishuns 1902 | subscribers: subscriberz 1903 | subscription: subscripshun 1904 | subsequential: subsequenshul 1905 | subsistential: subsistenshul 1906 | substantial: substanshul 1907 | substring: substrin 1908 | subtercelestial: subtercelesshul 1909 | subversion: subvershun 1910 | succeed: succed 1911 | succeeds: succeedz 1912 | success: succes 1913 | sucks: suckz 1914 | suffer: suffr 1915 | suffers: sufferz 1916 | suggested: suggestd 1917 | suggestions: sugesshuns 1918 | suggests: suggestz 1919 | suited: suitd 1920 | summarizes: summarizez 1921 | super: supr 1922 | supercelestial: supercelesshul 1923 | superduper: superdupr 1924 | superessential: superessenshul 1925 | superintendential: superintendenshul 1926 | supersesquitertial: supersesquitershul 1927 | supersubstantial: supersubstanshul 1928 | supplied: supplid 1929 | suppliers: supplierz 1930 | 
supported: supportd 1931 | supporters: supporterz 1932 | supports: supportz 1933 | supposed: supposd 1934 | suprise: surprize 1935 | surfing: surfin 1936 | surprised: surprisd 1937 | surprises: surprisez 1938 | surprising: surprisn 1939 | surrounding: surroundin 1940 | sutied: suitd 1941 | sweater: sweatr 1942 | swelled: swelld 1943 | switched: switchd 1944 | switches: switchez 1945 | switching: switchin 1946 | swizzling: swizzlin 1947 | synching: synchin 1948 | syncytial: syncyshul 1949 | system: sistem 1950 | sytems: sistemz 1951 | tagging: taggin 1952 | take: taek 1953 | takeover: takeovr 1954 | takes: takez 1955 | taking: takin 1956 | takover: takeovr 1957 | talented: talentd 1958 | talked: talkd 1959 | talking: talkin 1960 | talks: talkz 1961 | tangential: tangenshul 1962 | tapes: tapez 1963 | taping: tapin 1964 | targeting: targetin 1965 | targets: targetz 1966 | tasks: taskz 1967 | tasting: tastin 1968 | teaching: teachin 1969 | teams: teamz 1970 | techniques: techniquez 1971 | technologies: technologiez 1972 | telephotos: telefotoz 1973 | telestial: telesshul 1974 | telling: tellin 1975 | tendential: tendenshul 1976 | teretial: tereshul 1977 | terms: termz 1978 | tertial: tershul 1979 | tester: testr 1980 | testing: testin 1981 | tests: testz 1982 | textwrangler: textwranglr 1983 | thanks: thx 1984 | that: dat 1985 | the: teh 1986 | theater: theatr 1987 | their: der 1988 | themselves: themselvez 1989 | there: thar 1990 | these: thees 1991 | thing: ting 1992 | things: tings 1993 | think: finks 1994 | thinking: thinkin 1995 | thinkpads: thinkpadz 1996 | thinnes: thinnes 1997 | this: dis 1998 | those: dose 1999 | thought: thot 2000 | thoughts: thoughtz 2001 | thousands: thousandz 2002 | three: "3" 2003 | thrives: thrivez 2004 | thriving: thrivin 2005 | through: thru 2006 | throwing: throwin 2007 | thursday: thursdai 2008 | tiered: tierd 2009 | tiering: tierin 2010 | tiles: tilez 2011 | time: tiem 2012 | times: tiems 2013 | timing: timin 2014 | tinker: tinkr 
2015 | titles: titlez 2016 | to: "2" 2017 | today: todai 2018 | together: togehtr 2019 | tomorrow: "2morrow" 2020 | tonight: tonite 2021 | too: "2" 2022 | tools: toolz 2023 | topics: topicz 2024 | torrential: torrenshul 2025 | tossup: tosup 2026 | totipotential: totipotenshul 2027 | touches: touchez 2028 | touted: toutd 2029 | towards: towardz 2030 | tower: towr 2031 | toys: toyz 2032 | tracking: trackin 2033 | trackpads: trackpadz 2034 | tracks: trackz 2035 | traction: tracshun 2036 | trademarks: trademarkz 2037 | tradional: tradishunal 2038 | trafficked: traffickd 2039 | transaction: transacshun 2040 | transcript: tranzscript 2041 | transexperiential: transexperienshul 2042 | transfer: transfr 2043 | transferential: transferenshul 2044 | transformation: transformashun 2045 | transition: transishun 2046 | translate: tranzlate 2047 | translated: translatd 2048 | translation: translashun 2049 | translator: tranzlator 2050 | transmitted: transmittd 2051 | transubstantial: transubstanshul 2052 | treated: treatd 2053 | trekked: trekkd 2054 | triangulation: triangulashun 2055 | tribunitial: tribunishul 2056 | tricks: trickz 2057 | tried: trid 2058 | triggering: triggerin 2059 | trimming: trimmin 2060 | trinoctial: trinocshul 2061 | tritangential: tritangenshul 2062 | troubled: troubld 2063 | troubling: troublin 2064 | trying: tryin 2065 | tuesday: tuesdai 2066 | turned: turnd 2067 | turns: turnz 2068 | tweaking: tweakin 2069 | tweeted: tweetd 2070 | tweets: tweetz 2071 | twitter: twittr 2072 | two: "2" 2073 | tying: tyin 2074 | typing: typin 2075 | typograhy: tyopgrafy 2076 | typography: typografy 2077 | unanchored: unanchord 2078 | unanswered: unanzwerd 2079 | unapproved: unapprovd 2080 | unboxing: unboxin 2081 | uncelestial: uncelesshul 2082 | uncircumstantial: uncircumstanshul 2083 | unconfidential: unconfidenshul 2084 | unconsequential: unconsequenshul 2085 | undefeated: undefeatd 2086 | undeferential: undeferenshul 2087 | under: undr 2088 | underlying: underlyin 
2089 | underpowered: underpowerd 2090 | underscores: underscorez 2091 | understanding: understandin 2092 | undertaking: undertakin 2093 | undervalued: undervalud 2094 | undifferential: undifferenshul 2095 | unessential: unessenshul 2096 | unevidential: unevidenshul 2097 | unexpected: unexpectd 2098 | unexperiential: unexperienshul 2099 | unimpartial: unimparshul 2100 | uninfluential: uninfluenshul 2101 | unipotential: unipotenshul 2102 | united: unitd 2103 | units: unitz 2104 | unless: unles 2105 | unlocked: unlockd 2106 | unlocking: unlockin 2107 | unmartial: unmarshul 2108 | unnamed: unnamd 2109 | unpalatial: unpalashul 2110 | unpartial: unparshul 2111 | unpestilential: unpestilenshul 2112 | unprecedential: unprecedenshul 2113 | unpresidential: unpresidenshul 2114 | unprovidential: unprovidenshul 2115 | unreverential: unreverenshul 2116 | unsequential: unsequenshul 2117 | unsolicited: unsolicitd 2118 | unspatial: unspashul 2119 | unsubstantial: unsubstanshul 2120 | untangential: untangenshul 2121 | until: til 2122 | unveiled: unveild 2123 | unwise: unwize 2124 | upcoming: upcomin 2125 | updated: updatd 2126 | updates: updatez 2127 | upgraded: upgradd 2128 | upgrades: upgradez 2129 | uploads: uploadz 2130 | upper: uppr 2131 | upscaled: upscald 2132 | used: usd 2133 | useless: useles 2134 | user: usr 2135 | users: userz 2136 | uses: usez 2137 | using: usin 2138 | utilities: utilitiez 2139 | utter: uttr 2140 | vader: vadr 2141 | variables: variablez 2142 | vending: vendin 2143 | ventripotential: ventripotenshul 2144 | version: vershun 2145 | versionable: vershunable 2146 | versions: vershuns 2147 | very: vary 2148 | videos: videoz 2149 | videotapes: videotapez 2150 | videotaping: videotapin 2151 | viewed: viewd 2152 | viewers: viewerz 2153 | vindicated: vindicatd 2154 | violations: violashuns 2155 | virtualization: virtualizashun 2156 | vision: vishun 2157 | visionless: vishunles 2158 | visions: vishuns 2159 | visited: visietd 2160 | visualization: visualizashun 
2161 | vivisectin: vivisecshun 2162 | voters: voterz 2163 | voting: votin 2164 | vowed: vowd 2165 | waiting: waitin 2166 | waking: wakin 2167 | want: wants 2168 | wanted: wantd 2169 | wanting: wantin 2170 | wants: wantz 2171 | warming: warmin 2172 | warned: warnd 2173 | warner: warnr 2174 | warranties: warrantiez 2175 | wars: warz 2176 | was: wuz 2177 | watching: watchin 2178 | watered: waterd 2179 | way: wai 2180 | ways: wayz 2181 | "we'll": well 2182 | "we're": were 2183 | weakness: weaknes 2184 | wearing: wearin 2185 | weather: weathr 2186 | webapps: webappz 2187 | website: websiet 2188 | websites: websiets 2189 | weekday: weekdai 2190 | weeks: weekz 2191 | welcome: welcom 2192 | were: wuz 2193 | "weren't": "wuz not" 2194 | what: wut 2195 | whatever: whatevr 2196 | whatsoever: whatsoevr 2197 | whenever: whenevr 2198 | where: wer 2199 | whether: whethr 2200 | whipped: whippd 2201 | who: hoo 2202 | whoever: whoevr 2203 | why: y 2204 | widgets: widgetz 2205 | wiggling: wigglin 2206 | willing: willin 2207 | windows: windowz 2208 | winner: winnr 2209 | winners: winnerz 2210 | winning: winnin 2211 | wins: winz 2212 | wired: wird 2213 | wished: wishd 2214 | wishes: wishez 2215 | with: wif 2216 | wnated: wantd 2217 | wonder: wondr 2218 | wondering: wonderin 2219 | word: werd 2220 | wordpress: werdpres 2221 | words: werdz 2222 | work: werk 2223 | workaround: werkaround 2224 | worked: werkd 2225 | worker: werkr 2226 | working: werkin 2227 | works: werkz 2228 | workstation: workstashun 2229 | workstations: werkstashuns 2230 | world: wurld 2231 | worst: wurst 2232 | worth: wurth 2233 | would: wud 2234 | write: rite 2235 | writer: writr 2236 | writes: writez 2237 | writing: writin 2238 | written: writteh 2239 | yards: yardz 2240 | yeah: ya 2241 | year: yer 2242 | years: yeers 2243 | "yes": yez 2244 | yesterday: yesterdai 2245 | you: u 2246 | "you'd": ud 2247 | younger: youngr 2248 | your: ur 2249 | yours: urz 2250 | zoocytial: zoocyshul 2251 | zoomer: zoomr 2252 | zooming: 
zommin 2253 | -------------------------------------------------------------------------------- /Chapter 10/NLTK Chapter 10 - Notes .ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# NLTK Chapter 10\n", 8 | "\n", 9 | "## Analyzing the Meaning of Sentences\n", 10 | "\n", 11 | "*The html version of this chapter in the book is available [here](https://www.nltk.org/book/ch10.html \"ch10\").*\n", 12 | "\n", 13 | "__*I skipped this chapter, as I have scant experience with propositional logic, and didn't feel that any of the techniques therein had any practical value for me.*__" 14 | ] 15 | } 16 | ], 17 | "metadata": { 18 | "kernelspec": { 19 | "display_name": "Python 3", 20 | "language": "python", 21 | "name": "python3" 22 | }, 23 | "language_info": { 24 | "codemirror_mode": { 25 | "name": "ipython", 26 | "version": 3 27 | }, 28 | "file_extension": ".py", 29 | "mimetype": "text/x-python", 30 | "name": "python", 31 | "nbconvert_exporter": "python", 32 | "pygments_lexer": "ipython3", 33 | "version": "3.7.3" 34 | } 35 | }, 36 | "nbformat": 4, 37 | "nbformat_minor": 2 38 | } 39 | -------------------------------------------------------------------------------- /Chapter 11/Swadesh Lists.csv: -------------------------------------------------------------------------------- 1 | i,English,je,French,ich,German,io,Italian,ego,Latin,ja,Russian,yo,Spanish,ik,Dutch,jag,Swedish,ahám,Sanskrit 2 | you,English,tu,French,du,German,voi,Italian,vos,Latin,vy,Russian,usted,Spanish,jij,Dutch,du,Swedish,tvám,Sanskrit 3 | thou,English,vous ,French,sie,German,tu,Italian,tu,Latin,ty,Russian,tú,Spanish,u,Dutch,du,Swedish,tvám,Sanskrit 4 | he,English,il,French,er,German,egli,Italian,ipse,Latin,on,Russian,él,Spanish,hij,Dutch,han,Swedish,sá,Sanskrit 5 | 
we,English,nous,French,wir,German,noi,Italian,nos,Latin,my,Russian,nosotros,Spanish,wij,Dutch,vi,Swedish,vayám,Sanskrit 6 | you,English,vous,French,ihr,German,voi,Italian,vos,Latin,vy,Russian,usted,Spanish,jullie,Dutch,ni,Swedish,yūyám,Sanskrit 7 | they,English,ils,French,sie,German,essi,Italian,ipsi,Latin,oni,Russian,ellos,Spanish,zij,Dutch,de,Swedish,té,Sanskrit 8 | this,English,ceci,French,dieses,German,questo,Italian,hoc,Latin,e'tot,Russian,esta,Spanish,deze,Dutch,det här,Swedish,idám,Sanskrit 9 | that,English,cela,French,jenes,German,che,Italian,quod,Latin,tot,Russian,que,Spanish,die,Dutch,det där,Swedish,tát,Sanskrit 10 | here,English,ici,French,hier,German,qui,Italian,hic,Latin,zdes',Russian,aquí,Spanish,hier,Dutch,här,Swedish,átra,Sanskrit 11 | there,English,là,French,dort,German,ci,Italian,ibi,Latin,tam,Russian,hay,Spanish,daar,Dutch,där,Swedish,tátra,Sanskrit 12 | who,English,qui,French,wer,German,che,Italian,qui,Latin,kto,Russian,que,Spanish,wie,Dutch,vem,Swedish,ká,Sanskrit 13 | what,English,quoi,French,was,German,cosa,Italian,quid,Latin,chto,Russian,¿qué,Spanish,wat,Dutch,vad,Swedish,kím,Sanskrit 14 | where,English,où,French,wo,German,dove,Italian,ubi,Latin,gde,Russian,donde,Spanish,waar,Dutch,var,Swedish,kútra,Sanskrit 15 | when,English,quand,French,wann,German,quando,Italian,quando,Latin,kogda,Russian,cuando,Spanish,wanneer,Dutch,när,Swedish,kadā́,Sanskrit 16 | how,English,comment,French,wie,German,come,Italian,quomodo,Latin,kak,Russian,cómo,Spanish,hoe,Dutch,hur,Swedish,kathám,Sanskrit 17 | not,English,ne...pas,French,nicht,German,non,Italian,non,Latin,ne,Russian,no,Spanish,niet,Dutch,inte,Swedish,ná,Sanskrit 18 | all,English,tout,French,alle,German,tutti,Italian,omnes,Latin,vse,Russian,todos,Spanish,al,Dutch,alla,Swedish,sárva,Sanskrit 19 | many,English,plusieurs,French,viele,German,molti,Italian,multis,Latin,mnogo,Russian,muchos,Spanish,veel,Dutch,många,Swedish,bahú,Sanskrit 20 | some,English,quelques,French,einige,German,un po 
'di,Italian,aliqua,Latin,nekotorye,Russian,algunos,Spanish,enkele,Dutch,några,Swedish,kiñcid,Sanskrit 21 | few,English,peu,French,wenige,German,pochi,Italian,pauci,Latin,malo,Russian,pocos,Spanish,weinig,Dutch,få,Swedish,álpa,Sanskrit 22 | other,English,autre,French,andere,German,altro,Italian,aliis,Latin,drugoj,Russian,otro,Spanish,ander,Dutch,annan,Swedish,anyá,Sanskrit 23 | one,English,un,French,eins,German,uno,Italian,unum,Latin,odin,Russian,uno,Spanish,een,Dutch,ett,Swedish,éka,Sanskrit 24 | two,English,deux,French,zwei,German,due,Italian,duo,Latin,dva,Russian,dos,Spanish,twee,Dutch,två,Swedish,dví,Sanskrit 25 | three,English,trois,French,drei,German,tre,Italian,tres,Latin,tri,Russian,tres,Spanish,drie,Dutch,tre,Swedish,trí,Sanskrit 26 | four,English,quatre,French,vier,German,quattro,Italian,quattuor,Latin,chetyre,Russian,cuatro,Spanish,vier,Dutch,fyra,Swedish,cátur,Sanskrit 27 | five,English,cinq,French,fünf,German,cinque,Italian,quinque,Latin,pjat',Russian,cinco,Spanish,vijf,Dutch,fem,Swedish,páñcan,Sanskrit 28 | big,English,grand,French,groß,German,grande,Italian,magnus,Latin,bol'shoj,Russian,grande,Spanish,groot,Dutch,stor,Swedish,mahát,Sanskrit 29 | long,English,long,French,lang,German,lungo,Italian,diu,Latin,dlinnyj,Russian,largo,Spanish,lang,Dutch,lång,Swedish,dīrghá,Sanskrit 30 | wide,English,large,French,breit,German,ampio,Italian,lata,Latin,shirokij,Russian,ancho,Spanish,breed,Dutch,bred,Swedish,urú,Sanskrit 31 | thick,English,épais,French,dick,German,spesso,Italian,crassus,Latin,tolstyj,Russian,espesor,Spanish,dik,Dutch,tjock,Swedish,ghaná,Sanskrit 32 | heavy,English,lourd,French,schwer,German,pesante,Italian,gravis,Latin,tjazhjolyj,Russian,pesado,Spanish,zwaar,Dutch,tung,Swedish,gurú,Sanskrit 33 | small,English,petit,French,klein,German,piccolo,Italian,parva,Latin,malen'kij,Russian,pequeño,Spanish,klein,Dutch,liten,Swedish,álpa,Sanskrit 34 | 
short,English,court,French,kurz,German,breve,Italian,brevis,Latin,korotkij,Russian,corto,Spanish,kort,Dutch,kort,Swedish,hrasvá,Sanskrit 35 | narrow,English,étroit,French,eng,German,stretto,Italian,angustus,Latin,uzkij,Russian,estrecho,Spanish,smal,Dutch,trång,Swedish,aṃhú,Sanskrit 36 | thin,English,mince,French,dünn,German,sottile,Italian,tenues,Latin,tonkij,Russian,delgado,Spanish,dun,Dutch,tunn,Swedish,tanú,Sanskrit 37 | woman,English,femme,French,frau,German,donna,Italian,mulier,Latin,zhenschina,Russian,mujer,Spanish,vrouw,Dutch,kvinna,Swedish,strī́,Sanskrit 38 | man,English,homme,French,mann,German,uomo,Italian,homo,Latin,muzhchina,Russian,hombre,Spanish,man,Dutch,man,Swedish,púruṣa,Sanskrit 39 | man,English,homme,French,mensch,German,uomo,Italian,homo,Latin,chelovek,Russian,hombre,Spanish,mens,Dutch,människa,Swedish,manuṣyá,Sanskrit 40 | child,English,enfant,French,kind,German,bambino,Italian,puer,Latin,rebjonok,Russian,niño,Spanish,kind,Dutch,barn,Swedish,bā́la,Sanskrit 41 | wife,English,femme,French,frau,German,moglie,Italian,uxor,Latin,zhena,Russian,esposa,Spanish,vrouw,Dutch,hustru,Swedish,pátnī,Sanskrit 42 | husband,English,mari,French,mann,German,marito,Italian,maritus,Latin,muzh,Russian,marido,Spanish,man,Dutch,man,Swedish,páti,Sanskrit 43 | mother,English,mère,French,mutter,German,madre,Italian,mater,Latin,mat',Russian,madre,Spanish,moeder,Dutch,mamma,Swedish,mā́tṛ,Sanskrit 44 | father,English,père,French,vater,German,padre,Italian,pater,Latin,otec,Russian,padre,Spanish,vader,Dutch,pappa,Swedish,pitṛ́,Sanskrit 45 | animal,English,animal,French,tier,German,animale,Italian,animalis,Latin,zhivotnoe,Russian,animal,Spanish,dier,Dutch,djur,Swedish,páśu,Sanskrit 46 | fish,English,poisson,French,fisch,German,pesce,Italian,piscis,Latin,ryba,Russian,pescado,Spanish,vis,Dutch,fisk,Swedish,mátsya,Sanskrit 47 | bird,English,oiseau,French,vogel,German,uccello,Italian,avis,Latin,ptica,Russian,pájaro,Spanish,vogel,Dutch,fågel,Swedish,ví,Sanskrit 48 | 
dog,English,chien,French,hund,German,cane,Italian,canis,Latin,sobaka,Russian,perro,Spanish,hond,Dutch,hund,Swedish,śván,Sanskrit 49 | louse,English,pou,French,laus,German,pidocchio,Italian,pedis,Latin,vosh',Russian,piojo,Spanish,luis,Dutch,lus,Swedish,yūkā,Sanskrit 50 | snake,English,serpent,French,schlange,German,serpente,Italian,anguis,Latin,zmeja,Russian,serpiente,Spanish,slang,Dutch,orm,Swedish,sarpá,Sanskrit 51 | worm,English,ver,French,wurm,German,verme,Italian,vermis,Latin,cherv',Russian,gusano,Spanish,worm,Dutch,mask,Swedish,kṛ́mi,Sanskrit 52 | tree,English,arbre,French,baum,German,albero,Italian,arbor,Latin,derevo,Russian,árbol,Spanish,boom,Dutch,träd,Swedish,vṛkṣá,Sanskrit 53 | forest,English,forêt,French,wald,German,foresta,Italian,silva,Latin,les,Russian,bosque,Spanish,woud,Dutch,skog,Swedish,vána,Sanskrit 54 | stick,English,bâton,French,stock,German,attaccare,Italian,inhaero,Latin,palka,Russian,palo,Spanish,stok,Dutch,pinne,Swedish,daṇḍá,Sanskrit 55 | fruit,English,fruit,French,frucht,German,frutta,Italian,fructus,Latin,plod,Russian,fruta,Spanish,fruit,Dutch,frukt,Swedish,phála,Sanskrit 56 | seed,English,graine,French,samen,German,seme,Italian,semen,Latin,semja,Russian,semilla,Spanish,zaad,Dutch,frö,Swedish,bī́ja,Sanskrit 57 | leaf,English,feuille,French,blatt,German,foglia,Italian,folium,Latin,list,Russian,hoja,Spanish,blad,Dutch,löv,Swedish,páttra,Sanskrit 58 | root,English,racine,French,wurzel,German,radice,Italian,radix,Latin,koren',Russian,raíz,Spanish,wortel,Dutch,rot,Swedish,mū́la,Sanskrit 59 | bark,English,écorce,French,rinde,German,corteccia,Italian,cortex,Latin,kora,Russian,ladrido,Spanish,schors,Dutch,bark,Swedish,tvác,Sanskrit 60 | flower,English,fleur,French,blume,German,fiore,Italian,flos,Latin,cvetok,Russian,flor,Spanish,bloem,Dutch,blomma,Swedish,púṣpa,Sanskrit 61 | grass,English,herbe,French,gras,German,erba,Italian,herba,Latin,trava,Russian,hierba,Spanish,gras,Dutch,gräs,Swedish,tṛ́ṇa,Sanskrit 62 | 
rope,English,corde,French,seil,German,corda,Italian,funis,Latin,verjovka,Russian,cuerda,Spanish,touw,Dutch,rep,Swedish,rájju,Sanskrit 63 | skin,English,peau,French,haut,German,pelle,Italian,cutis,Latin,kozha,Russian,piel,Spanish,huid,Dutch,hud,Swedish,cárman,Sanskrit 64 | meat,English,viande,French,fleisch,German,carne,Italian,cibus,Latin,mjaso,Russian,carne,Spanish,vlees,Dutch,kött,Swedish,māṃsá,Sanskrit 65 | blood,English,sang,French,blut,German,sangue,Italian,sanguis,Latin,krov',Russian,sangre,Spanish,bloed,Dutch,blod,Swedish,rakta,Sanskrit 66 | bone,English,os,French,knochen,German,osso,Italian,os,Latin,kost',Russian,hueso,Spanish,been,Dutch,ben,Swedish,ásthi,Sanskrit 67 | fat,English,graisse,French,fett,German,grasso,Italian,adipem,Latin,zhir,Russian,grasa,Spanish,vet,Dutch,fett,Swedish,pī́vas,Sanskrit 68 | egg,English,œuf,French,ei,German,uovo,Italian,ovum,Latin,jajco,Russian,huevo,Spanish,ei,Dutch,ägg,Swedish,aṇḍá,Sanskrit 69 | horn,English,corne,French,horn,German,corno,Italian,cornu,Latin,rog,Russian,cuerno,Spanish,hoorn,Dutch,horn,Swedish,śṛṅgá,Sanskrit 70 | tail,English,queue,French,schwanz,German,coda,Italian,cauda,Latin,hvost,Russian,cola,Spanish,staart,Dutch,svans,Swedish,púccha,Sanskrit 71 | feather,English,plume,French,feder,German,piuma,Italian,pinnam,Latin,pero,Russian,pluma,Spanish,veer,Dutch,fjäder,Swedish,parṇá,Sanskrit 72 | hair,English,cheveu,French,haar,German,capelli,Italian,capillus,Latin,volos,Russian,cabello,Spanish,haar,Dutch,hår,Swedish,kéśa,Sanskrit 73 | head,English,tête,French,kopf ,German,testa,Italian,caput,Latin,golova,Russian,cabeza,Spanish,hoofd,Dutch,huvud,Swedish,śíras,Sanskrit 74 | ear,English,oreille,French,ohr,German,orecchio,Italian,auris,Latin,uho,Russian,oído,Spanish,oor,Dutch,öra,Swedish,kárṇa,Sanskrit 75 | eye,English,œil,French,auge,German,occhio,Italian,oculus,Latin,glaz,Russian,ojo,Spanish,oog,Dutch,öga,Swedish,ákṣi,Sanskrit 76 | 
nose,English,nez,French,nase,German,naso,Italian,nasus,Latin,nos,Russian,nariz,Spanish,neus,Dutch,näsa,Swedish,nā́sā,Sanskrit 77 | mouth,English,bouche,French,mund,German,bocca,Italian,os,Latin,rot,Russian,boca,Spanish,mond,Dutch,mun,Swedish,vaktra,Sanskrit 78 | tooth,English,dent,French,zahn,German,dente,Italian,dentem,Latin,zub,Russian,diente,Spanish,tand,Dutch,tand,Swedish,dánta,Sanskrit 79 | tongue,English,langue,French,zunge,German,lingua,Italian,lingua,Latin,jazyk,Russian,lengua,Spanish,tong,Dutch,tunga,Swedish,jihvā́,Sanskrit 80 | fingernail,English,ongle,French,fingernagel,German,unghia,Italian,unguem,Latin,nogot',Russian,uña,Spanish,vingernagel,Dutch,nagel,Swedish,nakhá,Sanskrit 81 | foot,English,pied,French,fuß,German,piede,Italian,pes,Latin,stupnja,Russian,pie,Spanish,voet,Dutch,fot,Swedish,padá,Sanskrit 82 | leg,English,jambe,French,bein,German,gamba,Italian,crus,Latin,noga,Russian,pierna,Spanish,been,Dutch,ben,Swedish,jáṅgha,Sanskrit 83 | knee,English,genou,French,knie,German,ginocchio,Italian,genu,Latin,koleno,Russian,rodilla,Spanish,knie,Dutch,knä,Swedish,jā́nu,Sanskrit 84 | hand,English,main,French,hand,German,mano,Italian,manu,Latin,kist' ruki,Russian,mano,Spanish,hand,Dutch,hand,Swedish,hásta,Sanskrit 85 | wing,English,aile,French,flügel,German,ala,Italian,ala,Latin,krylo,Russian,ala,Spanish,vleugel,Dutch,vinge,Swedish,pakṣá,Sanskrit 86 | belly,English,ventre,French,bauch,German,pancia,Italian,venter,Latin,zhivot,Russian,vientre,Spanish,buik,Dutch,mage,Swedish,udára,Sanskrit 87 | guts,English,entrailles,French,eingeweide,German,budella,Italian,viscera,Latin,kishki,Russian,agallas,Spanish,ingewanden,Dutch,inälvor,Swedish,ántra,Sanskrit 88 | neck,English,cou,French,hals,German,collo,Italian,collum,Latin,sheja,Russian,cuello,Spanish,nek,Dutch,hals,Swedish,gala,Sanskrit 89 | back,English,dos,French,rücken,German,indietro,Italian,retro,Latin,spina,Russian,espalda,Spanish,rug,Dutch,rygg,Swedish,pṛṣṭhá,Sanskrit 90 | 
breast,English,sein,French,brust,German,seno,Italian,pectus,Latin,grud',Russian,pecho,Spanish,borst,Dutch,bröst,Swedish,stána,Sanskrit 91 | heart,English,cœur,French,herz,German,cuore,Italian,cor,Latin,serdce,Russian,corazón,Spanish,hart,Dutch,hjärta,Swedish,hṛ́daya,Sanskrit 92 | liver,English,foie,French,leber,German,fegato,Italian,iecur,Latin,pechen',Russian,hígado,Spanish,lever,Dutch,lever,Swedish,yákṛt,Sanskrit 93 | drink,English,boire,French,trinken,German,bere,Italian,bibere,Latin,pit',Russian,beber,Spanish,drinken,Dutch,dricka,Swedish,píbati,Sanskrit 94 | eat,English,manger,French,essen,German,mangiare,Italian,manducare,Latin,est',Russian,comer,Spanish,eten,Dutch,äta,Swedish,khā́dati,Sanskrit 95 | bite,English,mordre,French,beißen,German,mordere,Italian,mordere,Latin,kusat',Russian,morder,Spanish,bijten,Dutch,bita,Swedish,dáśati,Sanskrit 96 | suck,English,sucer,French,saugen,German,succhiare,Italian,nutrientibus,Latin,sosat',Russian,chupar,Spanish,zuigen,Dutch,suga,Swedish,dháyati,Sanskrit 97 | spit,English,cracher,French,spucken,German,sputare,Italian,conspuentibus,Latin,plevat',Russian,escupir,Spanish,spuwen,Dutch,spotta,Swedish,ṣṭhī́vati,Sanskrit 98 | vomit,English,vomir,French,erbrechen,German,vomitare,Italian,evomere,Latin,blevat',Russian,vomitar,Spanish,braken,Dutch,kräkas,Swedish,vámati,Sanskrit 99 | blow,English,souffler,French,blasen,German,soffiare,Italian,flarent,Latin,dut',Russian,soplar,Spanish,blazen,Dutch,blåsa,Swedish,vā́ti,Sanskrit 100 | breathe,English,respirer,French,atmen,German,respirare,Italian,spirant,Latin,dyshat',Russian,respirar,Spanish,ademen,Dutch,andas,Swedish,ániti,Sanskrit 101 | laugh,English,rire,French,lachen,German,ridere,Italian,"ridere,",Latin,smejat'sja,Russian,reír,Spanish,lachen,Dutch,skratta,Swedish,smáyate,Sanskrit 102 | see,English,voir,French,sehen,German,vedere,Italian,videre,Latin,videt',Russian,ver,Spanish,zien,Dutch,se,Swedish,páśyati,Sanskrit 103 | 
hear,English,entendre,French,hören,German,sentire,Italian,audire,Latin,slyshat',Russian,escuchar,Spanish,horen,Dutch,höra,Swedish,śṛṇóti,Sanskrit 104 | know,English,savoir,French,wissen,German,sapere,Italian,scire,Latin,znat',Russian,saber,Spanish,weten,Dutch,veta,Swedish,jānā́ti,Sanskrit 105 | think,English,penser,French,denken,German,pensare,Italian,cogitare,Latin,dumat',Russian,pensar,Spanish,denken,Dutch,tänka,Swedish,mányate,Sanskrit 106 | smell,English,sentir,French,riechen,German,odore,Italian,odor,Latin,njuhat',Russian,olor,Spanish,ruiken,Dutch,lukta,Swedish,jíghrati,Sanskrit 107 | fear,English,craindre,French,fürchten,German,paura,Italian,timor,Latin,bojat'sja,Russian,miedo,Spanish,vrezen,Dutch,frukta,Swedish,bibhéti,Sanskrit 108 | sleep,English,dormir,French,schlafen,German,dormire,Italian,somnus,Latin,spat',Russian,dormir,Spanish,slapen,Dutch,sova,Swedish,svápiti,Sanskrit 109 | live,English,vivre,French,leben,German,vivere,Italian,vivere,Latin,zhit',Russian,vivir,Spanish,leven,Dutch,leva,Swedish,jī́vati,Sanskrit 110 | die,English,mourir,French,sterben,German,morire,Italian,mori,Latin,umirat',Russian,morir,Spanish,sterven,Dutch,dö,Swedish,mriyáte,Sanskrit 111 | kill,English,tuer,French,töten,German,uccidere,Italian,occidere,Latin,ubivat',Russian,matar,Spanish,doden,Dutch,döda,Swedish,hánti,Sanskrit 112 | fight,English,se battre,French,kämpfen,German,combattere,Italian,pugnare,Latin,borot'sja,Russian,luchar,Spanish,vechten,Dutch,strida,Swedish,yúdhyate,Sanskrit 113 | hunt,English,chasser,French,jagen,German,caccia,Italian,hunt,Latin,ohotit'sja,Russian,caza,Spanish,jagen,Dutch,jaga,Swedish,veti,Sanskrit 114 | hit,English,frapper,French,schlagen,German,colpire,Italian,ledo,Latin,bit',Russian,golpear,Spanish,slaan,Dutch,slå,Swedish,hánti,Sanskrit 115 | cut,English,couper,French,schneiden,German,taglio,Italian,secare,Latin,rezat',Russian,cortar,Spanish,snijden,Dutch,skära,Swedish,kṛntáti,Sanskrit 116 | 
split,English,fendre,French,spalten,German,dividere,Italian,scindendae,Latin,raskalyvat',Russian,dividido,Spanish,splijten,Dutch,dela,Swedish,bhinátti,Sanskrit 117 | stab,English,poignarder,French,stechen,German,pugnalata,Italian,confodisti,Latin,votknut',Russian,puñalada,Spanish,steken,Dutch,sticka,Swedish,vídhyati,Sanskrit 118 | scratch,English,gratter,French,kratzen,German,graffiare,Italian,scalpere,Latin,carapat',Russian,rayar,Spanish,krabben,Dutch,klia,Swedish,likháti,Sanskrit 119 | dig,English,creuser,French,graben,German,scavare,Italian,fodere,Latin,kopat',Russian,cavar,Spanish,graven,Dutch,gräva,Swedish,khánati,Sanskrit 120 | swim,English,nager,French,schwimmen,German,nuotare,Italian,natare,Latin,plavat',Russian,nadar,Spanish,zwemmen,Dutch,simma,Swedish,plávate,Sanskrit 121 | fly,English,voler,French,fliegen,German,volare,Italian,volare,Latin,letat',Russian,volar,Spanish,vliegen,Dutch,flyga,Swedish,pátati,Sanskrit 122 | walk,English,marcher,French,gehen,German,camminare,Italian,ambulare,Latin,hodit',Russian,caminar,Spanish,lopen,Dutch,gå,Swedish,éti,Sanskrit 123 | come,English,venir,French,kommen,German,venire,Italian,venire,Latin,prihodit',Russian,venir,Spanish,komen,Dutch,komma,Swedish,ā́gacchati,Sanskrit 124 | lie,English,s'étendre,French,liegen,German,mentire,Italian,mentiri,Latin,lezhat',Russian,mentir,Spanish,liggen,Dutch,ligga,Swedish,śéte,Sanskrit 125 | sit,English,s'asseoir,French,sitzen,German,sedersi,Italian,sedere,Latin,sidet',Russian,sentarse,Spanish,zitten,Dutch,sitta,Swedish,sī́dati,Sanskrit 126 | stand,English,se lever,French,stehen,German,stare in piedi,Italian,stare,Latin,stojat',Russian,estar de pie,Spanish,staan,Dutch,stå,Swedish,tíṣṭhati,Sanskrit 127 | turn,English,tourner,French,drehen,German,girare,Italian,converte,Latin,povorachivat',Russian,vuelta,Spanish,draaien,Dutch,svänga,Swedish,vártate,Sanskrit 128 | 
fall,English,tomber,French,fallen,German,caduta,Italian,cadere,Latin,padat',Russian,caer,Spanish,vallen,Dutch,falla,Swedish,pádyate,Sanskrit 129 | give,English,donner,French,geben,German,dare,Italian,dare,Latin,davat',Russian,dar,Spanish,geven,Dutch,ge,Swedish,dádāti,Sanskrit 130 | hold,English,tenir,French,halten,German,tenere,Italian,tenere,Latin,derzhat',Russian,mantener,Spanish,houden,Dutch,hålla,Swedish,dhárati,Sanskrit 131 | squeeze,English,serrer,French,quetschen,German,spremere,Italian,exprimito,Latin,szhimat',Russian,apretar,Spanish,knijpen,Dutch,klämma,Swedish,mṛdnā́ti,Sanskrit 132 | rub,English,frotter,French,reiben,German,strofinare,Italian,fricare,Latin,teret',Russian,frotar,Spanish,wrijven,Dutch,gnida,Swedish,ghárṣati,Sanskrit 133 | wash,English,laver,French,waschen,German,lavare,Italian,lavabit,Latin,myt',Russian,lavar,Spanish,wassen,Dutch,tvätta,Swedish,kṣā́layati,Sanskrit 134 | wipe,English,essuyer,French,wischen,German,pulire,Italian,extergunt,Latin,vytirat',Russian,limpiar,Spanish,vegen,Dutch,rensa,Swedish,mā́rṣṭi,Sanskrit 135 | pull,English,tirer,French,ziehen,German,tirare,Italian,attrahendam,Latin,tjanut',Russian,tirar,Spanish,trekken,Dutch,dra,Swedish,kárṣati,Sanskrit 136 | push,English,pousser,French,drücken,German,spingere,Italian,ventilabis,Latin,tolkat',Russian,empuje,Spanish,duwen,Dutch,trycka,Swedish,nudáti,Sanskrit 137 | throw,English,jeter,French,werfen,German,buttare,Italian,subvertat,Latin,brosat',Russian,lanzar,Spanish,werpen,Dutch,kasta,Swedish,kṣipáti,Sanskrit 138 | tie,English,lier,French,binden,German,legare,Italian,iungite,Latin,svjazyvat',Russian,atar,Spanish,knopen,Dutch,knyta,Swedish,badhnā́ti,Sanskrit 139 | sew,English,coudre,French,nähen,German,cucire,Italian,consuendi,Latin,shit',Russian,coser,Spanish,naaien,Dutch,sy,Swedish,sī́vyati,Sanskrit 140 | count,English,compter,French,zählen,German,contare,Italian,numerabit,Latin,schitat',Russian,contar,Spanish,tellen,Dutch,räkna,Swedish,gáṇayati,Sanskrit 141 | 
say,English,dire,French,sagen,German,dire,Italian,dicere,Latin,skazat',Russian,decir,Spanish,zeggen,Dutch,säga,Swedish,vákti,Sanskrit 142 | sing,English,chanter,French,singen,German,cantare,Italian,cantare,Latin,pet',Russian,cantar,Spanish,zingen,Dutch,sjunga,Swedish,gā́yati,Sanskrit 143 | play,English,jouer,French,spielen,German,giocare,Italian,ludere,Latin,igrat',Russian,jugar,Spanish,spelen,Dutch,leka,Swedish,dī́vyati,Sanskrit 144 | float,English,flotter,French,schweben,German,galleggiante,Italian,supernatet,Latin,plavat',Russian,flotar,Spanish,zweven,Dutch,flyta,Swedish,plávate,Sanskrit 145 | flow,English,couler,French,fließen,German,flusso,Italian,fluere,Latin,tech',Russian,flujo,Spanish,vloeien,Dutch,rinna,Swedish,sárati,Sanskrit 146 | freeze,English,geler,French,frieren,German,congelare,Italian,duratus,Latin,zamerzat',Russian,congelar,Spanish,vriezen,Dutch,frysa,Swedish,śī́yate,Sanskrit 147 | swell,English,gonfler,French,schwellen,German,gonfiarsi,Italian,intumescunt,Latin,puhnut',Russian,hincharse,Spanish,zwellen,Dutch,svälla,Swedish,śváyati,Sanskrit 148 | sun,English,soleil,French,sonne,German,sole,Italian,sol,Latin,solnce,Russian,sol,Spanish,zon,Dutch,sol,Swedish,sū́rya,Sanskrit 149 | moon,English,lune,French,mond,German,luna,Italian,luna,Latin,luna,Russian,luna,Spanish,maan,Dutch,måne,Swedish,mā́sa,Sanskrit 150 | star,English,étoile,French,stern,German,stella,Italian,stella,Latin,zvezda,Russian,estrella,Spanish,ster,Dutch,stjärna,Swedish,nákṣatra,Sanskrit 151 | water,English,eau,French,wasser,German,acqua,Italian,aqua,Latin,voda,Russian,agua,Spanish,water,Dutch,vatten,Swedish,jalá,Sanskrit 152 | rain,English,pluie,French,regen,German,pioggia,Italian,pluvia,Latin,dozhd',Russian,lluvia,Spanish,regen,Dutch,regn,Swedish,varṣá,Sanskrit 153 | river,English,rivière,French,fluß,German,fiume,Italian,flumen,Latin,reka,Russian,río,Spanish,rivier,Dutch,flod,Swedish,nadī́,Sanskrit 154 | 
lake,English,lac,French,see,German,lago,Italian,lacus,Latin,ozero,Russian,lago,Spanish,meer,Dutch,sjö,Swedish,sáras,Sanskrit 155 | sea,English,mer,French,meer,German,mare,Italian,mare,Latin,more,Russian,mar,Spanish,zee,Dutch,hav,Swedish,samudrá,Sanskrit 156 | salt,English,sel,French,salz,German,sale,Italian,sal,Latin,sol',Russian,sal,Spanish,zout,Dutch,salt,Swedish,lavaṇá,Sanskrit 157 | stone,English,pierre,French,stein,German,pietra,Italian,lapis,Latin,kamen',Russian,piedra,Spanish,steen,Dutch,sten,Swedish,áśman,Sanskrit 158 | sand,English,sable,French,sand,German,sabbia,Italian,harena,Latin,pesok,Russian,arena,Spanish,zand,Dutch,sand,Swedish,pāṃsú,Sanskrit 159 | dust,English,poussière,French,staub,German,polvere,Italian,pulvis,Latin,pyl',Russian,polvo,Spanish,stof,Dutch,damm,Swedish,reṇú,Sanskrit 160 | earth,English,terre,French,erde,German,terra,Italian,terra,Latin,zemlja,Russian,tierra,Spanish,aarde,Dutch,jord,Swedish,kṣám,Sanskrit 161 | cloud,English,nuage,French,wolke,German,nuvola,Italian,nubes,Latin,oblako,Russian,nube,Spanish,wolk,Dutch,moln,Swedish,nábhas,Sanskrit 162 | fog,English,brouillard,French,nebel,German,nebbia,Italian,caligo,Latin,tuman,Russian,niebla,Spanish,mist,Dutch,dimma,Swedish,míh,Sanskrit 163 | sky,English,ciel,French,himmel,German,cielo,Italian,caelum,Latin,nebo,Russian,cielo,Spanish,lucht,Dutch,himmel,Swedish,ākāśá,Sanskrit 164 | wind,English,vent,French,wind,German,vento,Italian,ventus,Latin,veter,Russian,viento,Spanish,wind,Dutch,vind,Swedish,vāyú,Sanskrit 165 | snow,English,neige,French,schnee,German,neve,Italian,nix,Latin,sneg,Russian,nieve,Spanish,sneeuw,Dutch,snö,Swedish,himá,Sanskrit 166 | ice,English,glace,French,eis,German,ghiaccio,Italian,glacies,Latin,ljod,Russian,hielo,Spanish,ijs,Dutch,is,Swedish,himá,Sanskrit 167 | smoke,English,fumée,French,rauch,German,fumare,Italian,fumigant,Latin,dym,Russian,fumar,Spanish,rook,Dutch,rök,Swedish,dhūmá,Sanskrit 168 | 
fire,English,feu,French,feuer,German,fuoco,Italian,ignis,Latin,ogon',Russian,fuego,Spanish,vuur,Dutch,eld,Swedish,agní,Sanskrit 169 | ashes,English,cendres,French,asche,German,ceneri,Italian,cinere,Latin,zola,Russian,cenizas,Spanish,as,Dutch,aska,Swedish,ā́sa,Sanskrit 170 | burn,English,brûler,French,brennen,German,bruciare,Italian,adolebit,Latin,goret',Russian,quemar,Spanish,branden,Dutch,brinna,Swedish,dahati,Sanskrit 171 | road,English,route,French,straße,German,strada,Italian,via,Latin,doroga,Russian,carretera,Spanish,weg,Dutch,väg,Swedish,patha,Sanskrit 172 | mountain,English,montagne,French,berg,German,montagna,Italian,mons,Latin,gora,Russian,montaña,Spanish,berg,Dutch,berg,Swedish,girí,Sanskrit 173 | red,English,rouge,French,rot,German,rosso,Italian,ruber,Latin,krasnyj,Russian,rojo,Spanish,rood,Dutch,röd,Swedish,rakta,Sanskrit 174 | green,English,vert,French,grün,German,verde,Italian,viridi,Latin,zeljonyj,Russian,verde,Spanish,groen,Dutch,grön,Swedish,harít,Sanskrit 175 | yellow,English,jaune,French,gelb,German,giallo,Italian,flavis,Latin,zhjoltyj,Russian,amarillo,Spanish,geel,Dutch,gul,Swedish,pīta,Sanskrit 176 | white,English,blanc,French,weiß,German,bianco,Italian,albus,Latin,belyj,Russian,blanco,Spanish,wit,Dutch,vit,Swedish,śvetá,Sanskrit 177 | black,English,noir,French,schwarz,German,nero,Italian,niger,Latin,chjornyj,Russian,negro,Spanish,zwart,Dutch,svart,Swedish,kṛṣṇá,Sanskrit 178 | night,English,nuit,French,nacht,German,notte,Italian,nocte,Latin,noch',Russian,noche,Spanish,nacht,Dutch,natt,Swedish,rā́tri,Sanskrit 179 | day,English,jour,French,tag,German,giorno,Italian,die,Latin,den',Russian,día,Spanish,dag,Dutch,dag,Swedish,diná,Sanskrit 180 | year,English,an,French,jahr,German,anno,Italian,anno,Latin,god,Russian,año,Spanish,jaar,Dutch,år,Swedish,varṣá,Sanskrit 181 | warm,English,chaud,French,warm,German,caldo,Italian,tepido,Latin,tjoplyj,Russian,caliente,Spanish,warm,Dutch,varm,Swedish,taptá,Sanskrit 182 | 
cold,English,froid,French,kalt,German,freddo,Italian,frigus,Latin,holodnyj,Russian,frío,Spanish,koud,Dutch,kall,Swedish,śītá,Sanskrit 183 | full,English,plein,French,voll,German,completo,Italian,plenus,Latin,polnyj,Russian,completo,Spanish,vol,Dutch,full,Swedish,pūrṇá,Sanskrit 184 | new,English,nouveau,French,neu,German,nuovo,Italian,novus,Latin,novyj,Russian,nuevo,Spanish,nieuw,Dutch,ny,Swedish,náva,Sanskrit 185 | old,English,vieux,French,alt,German,vecchio,Italian,vetus,Latin,staryj,Russian,viejo,Spanish,oud,Dutch,gammal,Swedish,jīrṇá,Sanskrit 186 | good,English,bon,French,gut,German,buono,Italian,bonum,Latin,horoshij,Russian,bueno,Spanish,goed,Dutch,bra,Swedish,vásu,Sanskrit 187 | bad,English,mauvais,French,schlecht,German,male,Italian,malus,Latin,plohoj,Russian,malo,Spanish,slecht,Dutch,dålig,Swedish,pāpá,Sanskrit 188 | rotten,English,pourri,French,verrottet,German,marcio,Italian,putrida,Latin,gniloj,Russian,podrido,Spanish,rot,Dutch,rutten,Swedish,pū́ti,Sanskrit 189 | dirty,English,sale,French,schmutzig,German,sporco,Italian,sordida,Latin,grjaznyj,Russian,sucio,Spanish,vies,Dutch,smutsig,Swedish,malina,Sanskrit 190 | straight,English,droit,French,gerade,German,dritto,Italian,recta,Latin,prjamoj,Russian,recto,Spanish,recht,Dutch,rak,Swedish,ṛjú,Sanskrit 191 | round,English,rond,French,rund,German,intorno,Italian,circum,Latin,kruglyj,Russian,redondear,Spanish,rond,Dutch,rund,Swedish,vṛttá,Sanskrit 192 | sharp,English,tranchant,French,scharf,German,affilato,Italian,acutus,Latin,ostryj,Russian,agudo,Spanish,scherp,Dutch,vass,Swedish,tīkṣṇá,Sanskrit 193 | dull,English,émoussé,French,stumpf,German,opaco,Italian,hebetes,Latin,tupoj,Russian,aburrido,Spanish,stomp,Dutch,slö,Swedish,kuṇṭha,Sanskrit 194 | smooth,English,lisse,French,glatt,German,lisciare,Italian,aequabis,Latin,gladkij,Russian,suavizar,Spanish,glad,Dutch,len,Swedish,ślakṣṇá,Sanskrit 195 | 
wet,English,mouillé,French,nass,German,bagnato,Italian,udo,Latin,mokryj,Russian,húmedo,Spanish,nat,Dutch,våt,Swedish,ārdrá,Sanskrit 196 | dry,English,sec,French,trocken,German,asciugare,Italian,siccabis,Latin,suhoj,Russian,secar,Spanish,droog,Dutch,torr,Swedish,śúṣka,Sanskrit 197 | correct,English,juste,French,richtig,German,correggere,Italian,corrigere,Latin,pravil'nyj,Russian,corregir,Spanish,juist,Dutch,rätt,Swedish,śuddhá,Sanskrit 198 | near,English,proche,French,nah,German,vicino,Italian,prope,Latin,blizkij,Russian,cerca,Spanish,dichtbij,Dutch,nära,Swedish,neda,Sanskrit 199 | far,English,loin,French,weit,German,lontano,Italian,longe,Latin,daljokij,Russian,lejos,Spanish,ver,Dutch,långt bort,Swedish,dūrá,Sanskrit 200 | right,English,à droite,French,rechts,German,destra,Italian,ius,Latin,pravyj,Russian,derecho,Spanish,rechts,Dutch,höger,Swedish,dákṣiṇa,Sanskrit 201 | left,English,à gauche,French,links,German,a sinistra,Italian,reliquit,Latin,levyj,Russian,izquierda,Spanish,links,Dutch,vänster,Swedish,savyá,Sanskrit 202 | at,English,à,French,bei,German,a,Italian,ad,Latin,pri,Russian,en,Spanish,aan,Dutch,hos,Swedish,,Sanskrit 203 | in,English,dans,French,in,German,in,Italian,in,Latin,v,Russian,en,Spanish,in,Dutch,i,Swedish,,Sanskrit 204 | with,English,avec,French,mit,German,con,Italian,cum,Latin,s,Russian,con,Spanish,met,Dutch,med,Swedish,sahá,Sanskrit 205 | and,English,et,French,und,German,e,Italian,et,Latin,i,Russian,y,Spanish,en,Dutch,och,Swedish,ca,Sanskrit 206 | if,English,si,French,wenn,German,se,Italian,si,Latin,esli,Russian,si,Spanish,als,Dutch,om,Swedish,yádi,Sanskrit 207 | because,English,parce que,French,weil,German,perché,Italian,quia,Latin,potomu chto,Russian,porque,Spanish,omdat,Dutch,eftersom,Swedish,hí,Sanskrit 208 | name,English,nom,French,name,German,nome,Italian,nomen,Latin,imja,Russian,nombre,Spanish,naam,Dutch,namn,Swedish,nā́man,Sanskrit 209 | -------------------------------------------------------------------------------- /README.md: 
-------------------------------------------------------------------------------- 1 | # Natural-Language-Processing-with-Python-Analyzing-Text-with-the-Natural-Language-Toolkit 2 | 3 | My solutions to selected exercises from "Natural Language Processing with Python 4 | – Analyzing Text with the Natural Language Toolkit" by Steven Bird, Ewan Klein, and Edward Loper. Unless noted otherwise, all solutions are my own and represent original material. The HTML version of the NLTK book is available at https://www.nltk.org/book/. 5 | --------------------------------------------------------------------------------