├── README.md ├── Data Cleaning 5.ipynb ├── Data Cleaning 3.ipynb ├── Data Cleaning 4.ipynb └── Data Cleaning 2.ipynb /README.md: -------------------------------------------------------------------------------- 1 | # Kaggle-Data-Cleaning-Challenge 2 | Learn professional data cleaning techniques! Data cleaning is a key part of data science, but it can be deeply frustrating. Why are some of your text fields garbled? What should you do about those missing values? Why aren’t your dates formatted correctly? How can you quickly clean up inconsistent data entry? In this five day challenge, you'll learn why you've run into these problems and, more importantly, how to fix them! In this challenge we’ll learn how to tackle some of the most common data cleaning problems so you can get to actually analyzing your data faster. 3 | 4 | 5 | We’ll work through five hands-on exercises with real, messy data and answer some of your most commonly-asked data cleaning questions. Here's a day-by-day breakdown of what we'll be learning each day: 6 | - Day 1: Handling missing values 7 | - Day 2: Data scaling and normalization 8 | - Day 3: Cleaning and parsing dates 9 | - Day 4: Character encoding errors (no more messed up text fields!) 
10 | - Day 5: Fixing inconsistent data entry & spelling errors 11 | 12 | -------------------------------------------------------------------------------- /Data Cleaning 5.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "metadata": { 5 | "_uuid": "6ac53f18b4f4ec0fc44348cedb5d1c319fa127c0", 6 | "_cell_guid": "b91a74ba-85f4-486e-b5f9-d0898f0626bf" 7 | }, 8 | "cell_type": "markdown", 9 | "source": "### Previous days\n\n* [Day 1: Handling missing values](https://www.kaggle.com/rtatman/data-cleaning-challenge-handling-missing-values)\n* [Day 2: Scaling and normalization](https://www.kaggle.com/rtatman/data-cleaning-challenge-scale-and-normalize-data)\n* [Day 3: Parsing dates](https://www.kaggle.com/rtatman/data-cleaning-challenge-parsing-dates/)\n* [Day 4: Character encodings](https://www.kaggle.com/rtatman/data-cleaning-challenge-character-encodings/)\n___\n\nWelcome to day 5 of the 5-Day Data Challenge! (Can you believe it's already been five days??) Today, we're going to learn how to clean up inconsistent text entries. To get started, click the blue \"Fork Notebook\" button in the upper, right hand corner. This will create a private copy of this notebook that you can edit and play with. Once you're finished with the exercises, you can choose to make your notebook public to share with others. :)\n\n> **Your turn!** As we work through this notebook, you'll see some notebook cells (a block of either code or text) that has \"Your Turn!\" written in it. These are exercises for you to do to help cement your understanding of the concepts we're talking about. Once you've written the code to answer a specific question, you can run the code by clicking inside the cell (box with code in it) with the code you want to run and then hit CTRL + ENTER (CMD + ENTER on a Mac). You can also click in a cell and then click on the right \"play\" arrow to the left of the code. 
If you want to run all the code in your notebook, you can use the double, \"fast forward\" arrows at the bottom of the notebook editor.\n\nHere's what we're going to do today:\n\n* [Get our environment set up](#Get-our-environment-set-up)\n* [Do some preliminary text pre-processing](#Do-some-preliminary-text-pre-processing)\n* [Use fuzzy matching to correct inconsistent data entry](#Use-fuzzy-matching-to-correct-inconsistent-data-entry)\n\n\nLet's get started!" 10 | }, 11 | { 12 | "metadata": { 13 | "_uuid": "9d82bf13584b8e682962fbb96131f2447d741679", 14 | "_cell_guid": "5cd5061f-ae30-4837-a53b-690ffd5c5830" 15 | }, 16 | "cell_type": "markdown", 17 | "source": "# Get our environment set up\n________\n\nThe first thing we'll need to do is load in the libraries we'll be using. Not our datasets, though: we'll get to those later!\n\n> **Important!** Make sure you run this cell yourself or the rest of your code won't work!" 18 | }, 19 | { 20 | "metadata": { 21 | "_uuid": "835cbe0834b935fb0fd40c75b9c39454836f4d5f", 22 | "collapsed": true, 23 | "_cell_guid": "135a7804-b5f5-40aa-8657-4a15774e3666", 24 | "trusted": true 25 | }, 26 | "cell_type": "code", 27 | "source": "# modules we'll use\nimport pandas as pd\nimport numpy as np\n\n# helpful modules\nimport fuzzywuzzy\nfrom fuzzywuzzy import process\nimport chardet\n\n# set seed for reproducibility\nnp.random.seed(0)", 28 | "execution_count": 1, 29 | "outputs": [] 30 | }, 31 | { 32 | "metadata": { 33 | "_uuid": "ed09d242e94e22f1bac2dc446d7545b1d1f5d5c5", 34 | "_cell_guid": "5169ae8c-6210-400a-ace2-e5fbe00378fc" 35 | }, 36 | "cell_type": "markdown", 37 | "source": "When I tried to read in the `PakistanSuicideAttacks Ver 11 (30-November-2017).csv`file the first time, I got a character encoding error, so I'm going to quickly check out what the encoding should be..." 
38 | }, 39 | { 40 | "metadata": { 41 | "_uuid": "d2578d4d4bc7e42f5ab6157d9c3eb40e68d42e9b", 42 | "_cell_guid": "ee54b6ee-0869-438a-9b6f-57c6d67f923f", 43 | "trusted": true 44 | }, 45 | "cell_type": "code", 46 | "source": "# look at the first hundred thousand bytes to guess the character encoding\nwith open(\"../input/PakistanSuicideAttacks Ver 11 (30-November-2017).csv\", 'rb') as rawdata:\n result = chardet.detect(rawdata.read(100000))\n\n# check what the character encoding might be\nprint(result)", 47 | "execution_count": 2, 48 | "outputs": [ 49 | { 50 | "output_type": "stream", 51 | "text": "{'encoding': 'Windows-1252', 'confidence': 0.73, 'language': ''}\n", 52 | "name": "stdout" 53 | } 54 | ] 55 | }, 56 | { 57 | "metadata": { 58 | "_uuid": "71d00770de8e42e926d8dc5a3a8b48b2c368ea43", 59 | "_cell_guid": "6a60be35-cd57-4dcc-9b98-c365de041332" 60 | }, 61 | "cell_type": "markdown", 62 | "source": "And then read it in with the correct encoding. (If this looks unfamiliar to you, check out [yesterday's challenge](https://www.kaggle.com/rtatman/data-cleaning-challenge-character-encodings/).) " 63 | }, 64 | { 65 | "metadata": { 66 | "_uuid": "c82584427932f3f0ccd21c7d1eca92f62476ed0a", 67 | "collapsed": true, 68 | "_cell_guid": "0f40ed87-fc61-4a61-b230-6af1f4618114", 69 | "trusted": true 70 | }, 71 | "cell_type": "code", 72 | "source": "# read in our data\nsuicide_attacks = pd.read_csv(\"../input/PakistanSuicideAttacks Ver 11 (30-November-2017).csv\", \n encoding='Windows-1252')", 73 | "execution_count": 3, 74 | "outputs": [] 75 | }, 76 | { 77 | "metadata": { 78 | "_uuid": "a3f42cea88795426f036e35d30d5c079f3c6152c", 79 | "_cell_guid": "83630dd4-6775-4ba5-a290-077c6f503f64" 80 | }, 81 | "cell_type": "markdown", 82 | "source": "Now we're ready to get started! You can, as always, take a moment here to look at the data and get familiar with it. 
:)\n\n\n# Do some preliminary text pre-processing\n___\n\nFor this exercise, I'm interested in cleaning up the \"City\" column to make sure there's no data entry inconsistencies in it. We could go through and check each row by hand, of course, and hand-correct inconsistencies when we find them. There's a more efficient way to do this though!" 83 | }, 84 | { 85 | "metadata": { 86 | "_uuid": "4bced8b6f6a985ded2c991f46ed0145ac1d8b722", 87 | "_cell_guid": "b3d4b17e-77c4-46d8-9681-a94801969b49", 88 | "trusted": true 89 | }, 90 | "cell_type": "code", 91 | "source": "# get all the unique values in the 'City' column\ncities = suicide_attacks['City'].unique()\n\n# sort them alphabetically and then take a closer look\ncities.sort()\ncities", 92 | "execution_count": 4, 93 | "outputs": [ 94 | { 95 | "output_type": "execute_result", 96 | "execution_count": 4, 97 | "data": { 98 | "text/plain": "array(['ATTOCK', 'Attock ', 'Bajaur Agency', 'Bannu', 'Bhakkar ', 'Buner',\n 'Chakwal ', 'Chaman', 'Charsadda', 'Charsadda ', 'D. 
I Khan',\n 'D.G Khan', 'D.G Khan ', 'D.I Khan', 'D.I Khan ', 'Dara Adam Khel',\n 'Dara Adam khel', 'Fateh Jang', 'Ghallanai, Mohmand Agency ',\n 'Gujrat', 'Hangu', 'Haripur', 'Hayatabad', 'Islamabad',\n 'Islamabad ', 'Jacobabad', 'KURRAM AGENCY', 'Karachi', 'Karachi ',\n 'Karak', 'Khanewal', 'Khuzdar', 'Khyber Agency', 'Khyber Agency ',\n 'Kohat', 'Kohat ', 'Kuram Agency ', 'Lahore', 'Lahore ',\n 'Lakki Marwat', 'Lakki marwat', 'Lasbela', 'Lower Dir', 'MULTAN',\n 'Malakand ', 'Mansehra', 'Mardan', 'Mohmand Agency',\n 'Mohmand Agency ', 'Mohmand agency', 'Mosal Kor, Mohmand Agency',\n 'Multan', 'Muzaffarabad', 'North Waziristan', 'North waziristan',\n 'Nowshehra', 'Orakzai Agency', 'Peshawar', 'Peshawar ', 'Pishin',\n 'Poonch', 'Quetta', 'Quetta ', 'Rawalpindi', 'Sargodha',\n 'Sehwan town', 'Shabqadar-Charsadda', 'Shangla ', 'Shikarpur',\n 'Sialkot', 'South Waziristan', 'South waziristan', 'Sudhanoti',\n 'Sukkur', 'Swabi ', 'Swat', 'Swat ', 'Taftan',\n 'Tangi, Charsadda District', 'Tank', 'Tank ', 'Taunsa',\n 'Tirah Valley', 'Totalai', 'Upper Dir', 'Wagah', 'Zhob', 'bannu',\n 'karachi', 'karachi ', 'lakki marwat', 'peshawar', 'swat'],\n dtype=object)" 99 | }, 100 | "metadata": {} 101 | } 102 | ] 103 | }, 104 | { 105 | "metadata": { 106 | "_uuid": "8785e8cc59b40e6ac7a824184132460e22a99f87", 107 | "_cell_guid": "c11d7808-e677-4ec3-a357-0a3e9bed4cf5" 108 | }, 109 | "cell_type": "markdown", 110 | "source": "Just looking at this, I can see some problems due to inconsistent data entry: 'Lahore' and 'Lahore ', for example, or 'Lakki Marwat' and 'Lakki marwat'.\n\nThe first thing I'm going to do is make everything lower case (I can change it back at the end if I like) and remove any white spaces at the beginning and end of cells. Inconsistencies in capitalizations and trailing white spaces are very common in text data and you can fix a good 80% of your text data entry inconsistencies by doing this." 
111 | }, 112 | { 113 | "metadata": { 114 | "_uuid": "2b604c74492419f89a43262d1f811e272646f9a6", 115 | "collapsed": true, 116 | "_cell_guid": "61651d57-f28c-4b81-bd05-b82720a8ed18", 117 | "trusted": true 118 | }, 119 | "cell_type": "code", 120 | "source": "# convert to lower case\nsuicide_attacks['City'] = suicide_attacks['City'].str.lower()\n# remove leading and trailing white spaces\nsuicide_attacks['City'] = suicide_attacks['City'].str.strip()", 121 | "execution_count": 5, 122 | "outputs": [] 123 | }, 124 | { 125 | "metadata": { 126 | "_uuid": "29388ff41b320262a8fe17a8f2a347ae919bad7c", 127 | "_cell_guid": "4c11e916-981a-41c3-b79f-9ac60521d6a2" 128 | }, 129 | "cell_type": "markdown", 130 | "source": "Next we're going to tackle more difficult inconsistencies." 131 | }, 132 | { 133 | "metadata": { 134 | "_uuid": "27aeda660f0e95ccb24bf8c5c1e1d5cfb22be7a8", 135 | "collapsed": true, 136 | "_cell_guid": "3deb3f1b-80e0-4a94-9bf7-1c9cd4882c18", 137 | "trusted": true 138 | }, 139 | "cell_type": "code", 140 | "source": "# Your turn! Take a look at all the unique values in the \"Province\" column.\nprovince = suicide_attacks['Province'].unique()\n\n# sort them alphabetically and then take a closer look\nprovince.sort()\nprovince\n# Then convert the column to lowercase and remove any trailing white spaces\nsuicide_attacks['Province'] = suicide_attacks['Province'].str.lower()\n# remove leading and trailing white spaces\nsuicide_attacks['Province'] = suicide_attacks['Province'].str.strip()", 141 | "execution_count": 6, 142 | "outputs": [] 143 | }, 144 | { 145 | "metadata": { 146 | "_uuid": "3639865348f499faa25b75a46438807ed70d4173", 147 | "_cell_guid": "a612e0fa-1361-4e8e-a6aa-5008b631d076" 148 | }, 149 | "cell_type": "markdown", 150 | "source": "# Use fuzzy matching to correct inconsistent data entry\n___\n\nAlright, let's take another look at the city column and see if there's any more data cleaning we need to do."
151 | }, 152 | { 153 | "metadata": { 154 | "_uuid": "1408dacdd7b76f306bd1c0c534b991d76243d7cc", 155 | "_cell_guid": "8f20fd24-33a4-472d-ba22-be0abc2a1e5b", 156 | "trusted": true 157 | }, 158 | "cell_type": "code", 159 | "source": "# get all the unique values in the 'City' column\ncities = suicide_attacks['City'].unique()\n\n# sort them alphabetically and then take a closer look\ncities.sort()\ncities", 160 | "execution_count": 7, 161 | "outputs": [ 162 | { 163 | "output_type": "execute_result", 164 | "execution_count": 7, 165 | "data": { 166 | "text/plain": "array(['attock', 'bajaur agency', 'bannu', 'bhakkar', 'buner', 'chakwal',\n 'chaman', 'charsadda', 'd. i khan', 'd.g khan', 'd.i khan',\n 'dara adam khel', 'fateh jang', 'ghallanai, mohmand agency',\n 'gujrat', 'hangu', 'haripur', 'hayatabad', 'islamabad',\n 'jacobabad', 'karachi', 'karak', 'khanewal', 'khuzdar',\n 'khyber agency', 'kohat', 'kuram agency', 'kurram agency',\n 'lahore', 'lakki marwat', 'lasbela', 'lower dir', 'malakand',\n 'mansehra', 'mardan', 'mohmand agency',\n 'mosal kor, mohmand agency', 'multan', 'muzaffarabad',\n 'north waziristan', 'nowshehra', 'orakzai agency', 'peshawar',\n 'pishin', 'poonch', 'quetta', 'rawalpindi', 'sargodha',\n 'sehwan town', 'shabqadar-charsadda', 'shangla', 'shikarpur',\n 'sialkot', 'south waziristan', 'sudhanoti', 'sukkur', 'swabi',\n 'swat', 'taftan', 'tangi, charsadda district', 'tank', 'taunsa',\n 'tirah valley', 'totalai', 'upper dir', 'wagah', 'zhob'],\n dtype=object)" 167 | }, 168 | "metadata": {} 169 | } 170 | ] 171 | }, 172 | { 173 | "metadata": { 174 | "_uuid": "b092eca650105d8fe8b15f85fbe2747003b4f170", 175 | "_cell_guid": "dcbefc7e-702c-4b5a-86ab-f0c2f93f3873" 176 | }, 177 | "cell_type": "markdown", 178 | "source": "It does look like there are some remaining inconsistencies: 'd. i khan' and 'd.i khan' should probably be the same. 
(I [looked it up](https://en.wikipedia.org/wiki/List_of_most_populous_cities_in_Pakistan) and 'd.g khan' is a separate city, so I shouldn't combine those.) \n\nI'm going to use the [fuzzywuzzy](https://github.com/seatgeek/fuzzywuzzy) package to help identify which strings are closest to each other. This dataset is small enough that we could probably correct errors by hand, but that approach doesn't scale well. (Would you want to correct a thousand errors by hand? What about ten thousand? Automating things as early as possible is generally a good idea. Plus, it’s fun! :)\n\n> **Fuzzy matching:** The process of automatically finding text strings that are very similar to the target string. In general, a string is considered \"closer\" to another one the fewer characters you'd need to change if you were transforming one string into another. So \"apple\" and \"snapple\" are two changes away from each other (add \"s\" and \"n\") while \"in\" and \"on\" are one change away (replace \"i\" with \"o\"). You won't always be able to rely on fuzzy matching 100%, but it will usually end up saving you at least a little time.\n\nFuzzywuzzy returns a ratio given two strings. The closer the ratio is to 100, the smaller the edit distance between the two strings. Here, we're going to get the ten strings from our list of cities that have the closest distance to \"d.i khan\"." 179 | }, 180 | { 181 | "metadata": { 182 | "_uuid": "a53c6f011f5c9144e9a48f329d5cf15e2feddec8", 183 | "_cell_guid": "4fdcd726-4a4f-4348-b745-1e42c3338100", 184 | "trusted": true 185 | }, 186 | "cell_type": "code", 187 | "source": "# get the top 10 closest matches to \"d.i khan\"\nmatches = fuzzywuzzy.process.extract(\"d.i khan\", cities, limit=10, scorer=fuzzywuzzy.fuzz.token_sort_ratio)\n\n# take a look at them\nmatches", 188 | "execution_count": 8, 189 | "outputs": [ 190 | { 191 | "output_type": "execute_result", 192 | "execution_count": 8, 193 | "data": { 194 | "text/plain": "[('d. 
i khan', 100),\n ('d.i khan', 100),\n ('d.g khan', 88),\n ('khanewal', 50),\n ('sudhanoti', 47),\n ('hangu', 46),\n ('kohat', 46),\n ('dara adam khel', 45),\n ('chaman', 43),\n ('mardan', 43)]" 195 | }, 196 | "metadata": {} 197 | } 198 | ] 199 | }, 200 | { 201 | "metadata": { 202 | "_uuid": "e31474068514e35c65bb9d16d58bbb7e5f1226ce", 203 | "_cell_guid": "43bf991e-8c5c-4ac6-b412-de36c494b40f" 204 | }, 205 | "cell_type": "markdown", 206 | "source": "We can see that two of the items in the cities list are very close to \"d.i khan\": \"d. i khan\" and \"d.i khan\". We can also see that \"d.g khan\", which is a separate city, has a ratio of 88. Since we don't want to replace \"d.g khan\" with \"d.i khan\", let's replace all rows in our City column that have a ratio of at least 90 with \"d.i khan\". \n\nTo do this, I'm going to write a function. (It's a good idea to write a general purpose function you can reuse if you think you might have to do a specific task more than once or twice. This keeps you from having to copy and paste code too often, which saves time and can help prevent mistakes.)" 207 | }, 208 | { 209 | "metadata": { 210 | "_uuid": "e518a51a3969956e8259e323bd03c62fc99a830c", 211 | "collapsed": true, 212 | "_cell_guid": "2d1b7f9b-5fe0-4de3-865d-b3fdc4355b17", 213 | "trusted": true 214 | }, 215 | "cell_type": "code", 216 | "source": "# function to replace rows in the provided column of the provided dataframe\n# that match the provided string above the provided ratio with the provided string\ndef replace_matches_in_column(df, column, string_to_match, min_ratio = 90):\n # get a list of unique strings\n strings = df[column].unique()\n \n # get the top 10 closest matches to our input string\n matches = fuzzywuzzy.process.extract(string_to_match, strings, \n limit=10, scorer=fuzzywuzzy.fuzz.token_sort_ratio)\n\n # only keep matches with a ratio >= min_ratio\n close_matches = [match[0] for match in matches if match[1] >= min_ratio]\n\n # get the rows of all the close matches in 
our dataframe\n rows_with_matches = df[column].isin(close_matches)\n\n # replace all rows with close matches with the input string\n df.loc[rows_with_matches, column] = string_to_match\n \n # let us know the function's done\n print(\"All done!\")", 217 | "execution_count": 9, 218 | "outputs": [] 219 | }, 220 | { 221 | "metadata": { 222 | "_uuid": "555c4f9d53db48869becbf5efd054e6e73570990", 223 | "_cell_guid": "72081a02-025a-4ccb-b08b-2a6e6018b2f9" 224 | }, 225 | "cell_type": "markdown", 226 | "source": "Now that we have a function, we can put it to the test!" 227 | }, 228 | { 229 | "metadata": { 230 | "_uuid": "846464842c3537f6bf41eb1db6d09c11fedc1f99", 231 | "_cell_guid": "989a8f2e-8bca-4a6a-a64c-a5ec0ea57606", 232 | "trusted": true 233 | }, 234 | "cell_type": "code", 235 | "source": "# use the function we just wrote to replace close matches to \"d.i khan\" with \"d.i khan\"\nreplace_matches_in_column(df=suicide_attacks, column='City', string_to_match=\"d.i khan\")", 236 | "execution_count": 10, 237 | "outputs": [ 238 | { 239 | "output_type": "stream", 240 | "text": "All done!\n", 241 | "name": "stdout" 242 | } 243 | ] 244 | }, 245 | { 246 | "metadata": { 247 | "_uuid": "2c284b82c0d22189e998a034807f98e9a01fe228", 248 | "_cell_guid": "dd6d23bc-2c43-4fc1-bbd0-cbba4c545557" 249 | }, 250 | "cell_type": "markdown", 251 | "source": "And now let's check the unique values in our City column again and make sure we've tidied up d.i khan correctly."
252 | }, 253 | { 254 | "metadata": { 255 | "_uuid": "ef869fbc043758259d6eafe599532468692eb15c", 256 | "_cell_guid": "7a2c8300-4795-43ba-9ed8-dbd3f1fb22a7", 257 | "trusted": true 258 | }, 259 | "cell_type": "code", 260 | "source": "# get all the unique values in the 'City' column\ncities = suicide_attacks['City'].unique()\n\n# sort them alphabetically and then take a closer look\ncities.sort()\ncities", 261 | "execution_count": 11, 262 | "outputs": [ 263 | { 264 | "output_type": "execute_result", 265 | "execution_count": 11, 266 | "data": { 267 | "text/plain": "array(['attock', 'bajaur agency', 'bannu', 'bhakkar', 'buner', 'chakwal',\n 'chaman', 'charsadda', 'd.g khan', 'd.i khan', 'dara adam khel',\n 'fateh jang', 'ghallanai, mohmand agency', 'gujrat', 'hangu',\n 'haripur', 'hayatabad', 'islamabad', 'jacobabad', 'karachi',\n 'karak', 'khanewal', 'khuzdar', 'khyber agency', 'kohat',\n 'kuram agency', 'kurram agency', 'lahore', 'lakki marwat',\n 'lasbela', 'lower dir', 'malakand', 'mansehra', 'mardan',\n 'mohmand agency', 'mosal kor, mohmand agency', 'multan',\n 'muzaffarabad', 'north waziristan', 'nowshehra', 'orakzai agency',\n 'peshawar', 'pishin', 'poonch', 'quetta', 'rawalpindi', 'sargodha',\n 'sehwan town', 'shabqadar-charsadda', 'shangla', 'shikarpur',\n 'sialkot', 'south waziristan', 'sudhanoti', 'sukkur', 'swabi',\n 'swat', 'taftan', 'tangi, charsadda district', 'tank', 'taunsa',\n 'tirah valley', 'totalai', 'upper dir', 'wagah', 'zhob'],\n dtype=object)" 268 | }, 269 | "metadata": {} 270 | } 271 | ] 272 | }, 273 | { 274 | "metadata": { 275 | "_uuid": "4d43bc9b0bc6997a6c6454ff2a21aa0a296a8571", 276 | "_cell_guid": "6a7c360d-24f5-44ef-8efd-a04c56caa95d" 277 | }, 278 | "cell_type": "markdown", 279 | "source": "Excellent! Now we only have \"d.i khan\" in our dataframe and we didn't have to change anything by hand. 
" 280 | }, 281 | { 282 | "metadata": { 283 | "_uuid": "bfb366a27a3995fe253a662dd09f453afba117f6", 284 | "_cell_guid": "0922e215-9abb-4b44-9060-7b52080fae90", 285 | "trusted": true 286 | }, 287 | "cell_type": "code", 288 | "source": "# Your turn! It looks like 'kuram agency' and 'kurram agency' should\n# be the same city. Correct the dataframe so that they are.\n\nreplace_matches_in_column(df=suicide_attacks, column='City', string_to_match=\"kurram agency\")", 289 | "execution_count": 16, 290 | "outputs": [ 291 | { 292 | "output_type": "stream", 293 | "text": "All done!\n", 294 | "name": "stdout" 295 | } 296 | ] 297 | }, 298 | { 299 | "metadata": { 300 | "trusted": true, 301 | "_uuid": "1b152352c99531803562c93e802961a63e65d495" 302 | }, 303 | "cell_type": "code", 304 | "source": "# get all the unique values in the 'City' column\ncities = suicide_attacks['City'].unique()\n\n# sort them alphabetically and then take a closer look\ncities.sort()\ncities", 305 | "execution_count": 17, 306 | "outputs": [ 307 | { 308 | "output_type": "execute_result", 309 | "execution_count": 17, 310 | "data": { 311 | "text/plain": "array(['attock', 'bajaur agency', 'bannu', 'bhakkar', 'buner', 'chakwal',\n 'chaman', 'charsadda', 'd.g khan', 'd.i khan', 'dara adam khel',\n 'fateh jang', 'ghallanai, mohmand agency', 'gujrat', 'hangu',\n 'haripur', 'hayatabad', 'islamabad', 'jacobabad', 'karachi',\n 'karak', 'khanewal', 'khuzdar', 'khyber agency', 'kohat',\n 'kurram agency', 'lahore', 'lakki marwat', 'lasbela', 'lower dir',\n 'malakand', 'mansehra', 'mardan', 'mohmand agency',\n 'mosal kor, mohmand agency', 'multan', 'muzaffarabad',\n 'north waziristan', 'nowshehra', 'orakzai agency', 'peshawar',\n 'pishin', 'poonch', 'quetta', 'rawalpindi', 'sargodha',\n 'sehwan town', 'shabqadar-charsadda', 'shangla', 'shikarpur',\n 'sialkot', 'south waziristan', 'sudhanoti', 'sukkur', 'swabi',\n 'swat', 'taftan', 'tangi, charsadda district', 'tank', 'taunsa',\n 'tirah valley', 'totalai', 'upper dir', 
'wagah', 'zhob'],\n dtype=object)" 312 | }, 313 | "metadata": {} 314 | } 315 | ] 316 | }, 317 | { 318 | "metadata": { 319 | "_uuid": "52b0af56e3c77db96056e9acd785f8f435f7caf5", 320 | "_cell_guid": "b4f37fce-4d08-409e-bbbd-6a26c3bbc6ee" 321 | }, 322 | "cell_type": "markdown", 323 | "source": "And that's it for today! If you have any questions, be sure to post them in the comments below or [on the forums](https://www.kaggle.com/questions-and-answers). \n\nRemember that your notebook is private by default, and in order to share it with other people or ask for help with it, you'll need to make it public. First, you'll need to save a version of your notebook that shows your current work by hitting the \"Commit & Run\" button. (Your work is saved automatically, but versioning your work lets you go back and look at what it was like at the point you saved it. It also lets you share a nice compiled notebook instead of just the raw code.) Then, once your notebook is finished running, you can go to the Settings tab in the panel to the left (you may have to expand it by hitting the [<] button next to the \"Commit & Run\" button) and setting the \"Visibility\" dropdown to \"Public\".\n\n# More practice!\n___\n\nDo any other columns in this dataframe have inconsistent data entry? If you can find any, try to tidy them up.\n\nYou can also try reading in the `PakistanSuicideAttacks Ver 6 (10-October-2017).csv` file from this dataset and tidying up any inconsistent columns in that data file." 
324 | }, 325 | { 326 | "metadata": { 327 | "trusted": true, 328 | "collapsed": true, 329 | "_uuid": "964c1d71a3143953d1fe5599b7c7eebf2f9274cb" 330 | }, 331 | "cell_type": "code", 332 | "source": "", 333 | "execution_count": null, 334 | "outputs": [] 335 | } 336 | ], 337 | "metadata": { 338 | "kernelspec": { 339 | "display_name": "Python 3", 340 | "language": "python", 341 | "name": "python3" 342 | }, 343 | "language_info": { 344 | "name": "python", 345 | "version": "3.6.4", 346 | "mimetype": "text/x-python", 347 | "codemirror_mode": { 348 | "name": "ipython", 349 | "version": 3 350 | }, 351 | "pygments_lexer": "ipython3", 352 | "nbconvert_exporter": "python", 353 | "file_extension": ".py" 354 | } 355 | }, 356 | "nbformat": 4, 357 | "nbformat_minor": 1 358 | } -------------------------------------------------------------------------------- /Data Cleaning 3.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "metadata": { 5 | "_uuid": "6ac53f18b4f4ec0fc44348cedb5d1c319fa127c0", 6 | "_cell_guid": "b91a74ba-85f4-486e-b5f9-d0898f0626bf" 7 | }, 8 | "cell_type": "markdown", 9 | "source": "### Previous days\n\n* [Day 1: Handling missing values](https://www.kaggle.com/rtatman/data-cleaning-challenge-handling-missing-values)\n* [Day 2: Scaling and normalization](https://www.kaggle.com/rtatman/data-cleaning-challenge-scale-and-normalize-data)\n___\nWelcome to day 3 of the 5-Day Data Challenge! Today, we're going to work with dates. To get started, click the blue \"Fork Notebook\" button in the upper, right hand corner. This will create a private copy of this notebook that you can edit and play with. Once you're finished with the exercises, you can choose to make your notebook public to share with others. :)\n\n> **Your turn!** As we work through this notebook, you'll see some notebook cells (a block of either code or text) that has \"Your Turn!\" written in it. 
These are exercises for you to do to help cement your understanding of the concepts we're talking about. Once you've written the code to answer a specific question, you can run the code by clicking inside the cell (box with code in it) with the code you want to run and then hit CTRL + ENTER (CMD + ENTER on a Mac). You can also click in a cell and then click on the right \"play\" arrow to the left of the code. If you want to run all the code in your notebook, you can use the double, \"fast forward\" arrows at the bottom of the notebook editor.\n\nHere's what we're going to do today:\n\n* [Get our environment set up](#Get-our-environment-set-up)\n* [Check the data type of our date column](#Check-the-data-type-of-our-date-column)\n* [Convert our date columns to datetime](#Convert-our-date-columns-to-datetime)\n* [Select just the day of the month from our column](#Select-just-the-day-of-the-month-from-our-column)\n* [Plot the day of the month to check the date parsing](#Plot-the-day-of-the-month-to-the-date-parsing)\n\nLet's get started!" 10 | }, 11 | { 12 | "metadata": { 13 | "_uuid": "9d82bf13584b8e682962fbb96131f2447d741679", 14 | "_cell_guid": "5cd5061f-ae30-4837-a53b-690ffd5c5830" 15 | }, 16 | "cell_type": "markdown", 17 | "source": "# Get our environment set up\n________\n\nThe first thing we'll need to do is load in the libraries and datasets we'll be using. For today, we'll be working with two datasets: one containing information on earthquakes that occurred between 1965 and 2016, and another that contains information on landslides that occurred between 2007 and 2016.\n\n> **Important!** Make sure you run this cell yourself or the rest of your code won't work!"
18 | }, 19 | { 20 | "metadata": { 21 | "_uuid": "835cbe0834b935fb0fd40c75b9c39454836f4d5f", 22 | "_cell_guid": "135a7804-b5f5-40aa-8657-4a15774e3666", 23 | "collapsed": true, 24 | "trusted": true 25 | }, 26 | "cell_type": "code", 27 | "source": "# modules we'll use\nimport pandas as pd\nimport numpy as np\nimport seaborn as sns\nimport datetime\n\n# read in our data\nearthquakes = pd.read_csv(\"../input/earthquake-database/database.csv\")\nlandslides = pd.read_csv(\"../input/landslide-events/catalog.csv\")\nvolcanos = pd.read_csv(\"../input/volcanic-eruptions/database.csv\")\n\n# set seed for reproducibility\nnp.random.seed(0)", 28 | "execution_count": 1, 29 | "outputs": [] 30 | }, 31 | { 32 | "metadata": { 33 | "_uuid": "03ce3b4afe87d98f777172c2c7be066a66a0b237", 34 | "_cell_guid": "604ac3a4-b1d9-4264-b312-4bbeecdeec00" 35 | }, 36 | "cell_type": "markdown", 37 | "source": "Now we're ready to look at some dates! (If you like, you can take this opportunity to take a look at some of the data.)" 38 | }, 39 | { 40 | "metadata": { 41 | "_uuid": "f77382b78577a34eee1f65c0ca00a8872e0c04ab", 42 | "_cell_guid": "eeff85f1-c29e-4e31-8874-7d38428056b8" 43 | }, 44 | "cell_type": "markdown", 45 | "source": "# Check the data type of our date column\n___\n\nFor this part of the challenge, I'll be working with the `date` column from the `landslides` dataframe. The very first thing I'm going to do is take a peek at the first few rows to make sure it actually looks like it contains dates." 
46 | }, 47 | { 48 | "metadata": { 49 | "_uuid": "8168b45d546e78031e4298cb8a4590411385c2d6", 50 | "_cell_guid": "aa8123f5-2897-467f-909c-70f961e16fd9", 51 | "trusted": true 52 | }, 53 | "cell_type": "code", 54 | "source": "# print the first few rows of the date column\nprint(landslides['date'].head())", 55 | "execution_count": 2, 56 | "outputs": [ 57 | { 58 | "output_type": "stream", 59 | "text": "0 3/2/07\n1 3/22/07\n2 4/6/07\n3 4/14/07\n4 4/15/07\nName: date, dtype: object\n", 60 | "name": "stdout" 61 | } 62 | ] 63 | }, 64 | { 65 | "metadata": { 66 | "_uuid": "601f4faa997f1069b35f14d712bb6314f8cbd448", 67 | "_cell_guid": "27fb4839-c036-4a97-b705-7b1f9c387170" 68 | }, 69 | "cell_type": "markdown", 70 | "source": "Yep, those are dates! But just because I, a human, can tell that these are dates doesn't mean that Python knows that they're dates. Notice that at the bottom of the output of `head()`, it says that the data type of this column is \"object\". \n\n> Pandas uses the \"object\" dtype for storing various types of data, but most often when you see a column with the dtype \"object\" it will have strings in it. \n\nIf you check the pandas dtype documentation [here](http://pandas.pydata.org/pandas-docs/stable/basics.html#dtypes), you'll notice that there's also a specific `datetime64` dtype. 
Because the dtype of our column is `object` rather than `datetime64`, we can tell that Python doesn't know that this column contains dates.\n\nWe can also look at just the dtype of your column without printing the first few rows if we like:" 71 | }, 72 | { 73 | "metadata": { 74 | "_uuid": "cc76da0492b27ed55f6f819ecf6b44a7f7dcc47f", 75 | "_cell_guid": "a0ff4ffe-51ac-4395-b8a9-6b04557a797c", 76 | "trusted": true 77 | }, 78 | "cell_type": "code", 79 | "source": "# check the data type of our date column\nlandslides['date'].dtype", 80 | "execution_count": 3, 81 | "outputs": [ 82 | { 83 | "output_type": "execute_result", 84 | "execution_count": 3, 85 | "data": { 86 | "text/plain": "dtype('O')" 87 | }, 88 | "metadata": {} 89 | } 90 | ] 91 | }, 92 | { 93 | "metadata": { 94 | "_uuid": "0466780bc0450aa9ef729b1da3f8aac048ee6f68", 95 | "_cell_guid": "75689fad-c2b5-452f-9e53-b95df363d540" 96 | }, 97 | "cell_type": "markdown", 98 | "source": "You may have to check the [numpy documentation](https://docs.scipy.org/doc/numpy-1.12.0/reference/generated/numpy.dtype.kind.html#numpy.dtype.kind) to match the letter code to the dtype of the object. \"O\" is the code for \"object\", so we can see that these two methods give us the same information." 99 | }, 100 | { 101 | "metadata": { 102 | "_uuid": "049da6c620038cc19f9e71279db9bb4942bb8e48", 103 | "_cell_guid": "cfcc3b84-93b4-4ac3-8706-ee8efa948540", 104 | "trusted": true 105 | }, 106 | "cell_type": "code", 107 | "source": "# Your turn! 
Check the data type of the Date column in the earthquakes dataframe\n# (note the capital 'D' in date!)\nearthquakes['Date'].dtype", 108 | "execution_count": 4, 109 | "outputs": [ 110 | { 111 | "output_type": "execute_result", 112 | "execution_count": 4, 113 | "data": { 114 | "text/plain": "dtype('O')" 115 | }, 116 | "metadata": {} 117 | } 118 | ] 119 | }, 120 | { 121 | "metadata": { 122 | "_uuid": "06ed45a852989dfd54acb855df2454ec43f01e0d", 123 | "_cell_guid": "101a4b0e-da5c-44e1-95c8-525fba292b7b" 124 | }, 125 | "cell_type": "markdown", 126 | "source": "# Convert our date columns to datetime\n___\n\nNow that we know that our date column isn't being recognized as a date, it's time to convert it so that it *is* recognized as a date. This is called \"parsing dates\" because we're taking in a string and identifying its component parts.\n\nWe can tell pandas what the format of our dates is with a guide called a [\"strftime directive\", which you can find more information on at this link](http://strftime.org/). The basic idea is that you need to point out which parts of the date are where and what punctuation is between them. 
There are [lots of possible parts of a date](http://strftime.org/), but the most common are `%d` for day, `%m` for month, `%y` for a two-digit year and `%Y` for a four-digit year.\n\nSome examples:\n\n * 1/17/07 has the format \"%m/%d/%y\"\n * 17-1-2007 has the format \"%d-%m-%Y\"\n \n Looking back up at the head of the `date` column in the landslides dataset, we can see that it's in the format \"month/day/two-digit year\", so we can use the same syntax as the first example to parse our dates: " 127 | }, 128 | { 129 | "metadata": { 130 | "_uuid": "512a2b892c6c0959d98c5cf1534d912547d7f257", 131 | "scrolled": false, 132 | "_cell_guid": "2901e10a-de81-4839-b121-af1529b68844", 133 | "collapsed": true, 134 | "trusted": true 135 | }, 136 | "cell_type": "code", 137 | "source": "# create a new column, date_parsed, with the parsed dates\nlandslides['date_parsed'] = pd.to_datetime(landslides['date'], format = \"%m/%d/%y\")", 138 | "execution_count": 5, 139 | "outputs": [] 140 | }, 141 | { 142 | "metadata": { 143 | "_uuid": "00d8939c4f49f52a71162fa07161d22f57550c58", 144 | "_cell_guid": "b1376c2f-f646-44a8-9fa7-55529577ea0c" 145 | }, 146 | "cell_type": "markdown", 147 | "source": "Now when I check the first few rows of the new column, I can see that the dtype is `datetime64`. I can also see that my dates have been slightly rearranged so that they fit the default order of datetime objects (year-month-day)." 
148 | }, 149 | { 150 | "metadata": { 151 | "_uuid": "38d88e73df0c5c36e74698c135a2ef2ba3ded154", 152 | "_cell_guid": "38b35960-3cb5-4dda-81ac-79b6ad7ac5a7", 153 | "trusted": true 154 | }, 155 | "cell_type": "code", 156 | "source": "# print the first few rows\nlandslides['date_parsed'].head()", 157 | "execution_count": 6, 158 | "outputs": [ 159 | { 160 | "output_type": "execute_result", 161 | "execution_count": 6, 162 | "data": { 163 | "text/plain": "0 2007-03-02\n1 2007-03-22\n2 2007-04-06\n3 2007-04-14\n4 2007-04-15\nName: date_parsed, dtype: datetime64[ns]" 164 | }, 165 | "metadata": {} 166 | } 167 | ] 168 | }, 169 | { 170 | "metadata": { 171 | "_uuid": "82b628147746feb0776e216610bcd8a6022afd45", 172 | "_cell_guid": "89b3896c-6e76-4131-a57b-3fa1428805d0" 173 | }, 174 | "cell_type": "markdown", 175 | "source": "Now that our dates are parsed correctly, we can interact with them in useful ways." 176 | }, 177 | { 178 | "metadata": { 179 | "_uuid": "e7e0753a637b95d437ed5388afa8fb6b152a431c", 180 | "_cell_guid": "e0f2ea5a-584c-408f-974f-f07c369d1317", 181 | "trusted": true 182 | }, 183 | "cell_type": "code", 184 | "source": "# Your turn! Create a new column, date_parsed, in the earthquakes\n# dataset that has correctly parsed dates in it. 
(Don't forget to \n# double-check that the dtype is correct!)\n# note: the recorded output below is all NaT, which means no value matched\n# this format string (errors = 'coerce' hides the failures); the Date strings\n# probably use a four-digit year, in which case \"%m/%d/%Y\" is the format to try\nearthquakes['date_parsed'] = pd.to_datetime(earthquakes['Date'], format = \"%m/%d/%y\", errors = 'coerce')\nearthquakes['date_parsed'].head()", 185 | "execution_count": 7, 186 | "outputs": [ 187 | { 188 | "output_type": "execute_result", 189 | "execution_count": 7, 190 | "data": { 191 | "text/plain": "0 NaT\n1 NaT\n2 NaT\n3 NaT\n4 NaT\nName: date_parsed, dtype: datetime64[ns]" 192 | }, 193 | "metadata": {} 194 | } 195 | ] 196 | }, 197 | { 198 | "metadata": { 199 | "_uuid": "8fd9a5a6da0005e6624176e90515bdc40d99ae4e", 200 | "_cell_guid": "f40e443e-ddab-4761-ad1d-c05ba98b6a47" 201 | }, 202 | "cell_type": "markdown", 203 | "source": "# Select just the day of the month from our column\n___\n\n\"Ok, Rachael,\" you may be saying at this point, \"This messing around with data types is fine, I guess, but what's the *point*?\" To answer your question, let's try to get information on the day of the month that a landslide occurred on from the original \"date\" column, which has an \"object\" dtype: " 204 | }, 205 | { 206 | "metadata": { 207 | "_uuid": "f9afd282db0149e26949d53512d7aa46891f250f", 208 | "_cell_guid": "4510306b-09df-4d71-a682-6809d6b4cf07", 209 | "trusted": true 210 | }, 211 | "cell_type": "code", 212 | "source": "# try to get the day of the month from the date column\nday_of_month_landslides = landslides['date'].dt.day", 213 | "execution_count": 8, 214 | "outputs": [ 215 | { 216 | "output_type": "error", 217 | "ename": "AttributeError", 218 | "evalue": "Can only use .dt accessor with datetimelike values", 219 | "traceback": [ 220 | "---------------------------------------------------------------------------", 221 | "TypeError Traceback (most recent call last)", 222 | "/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/accessors.py in _make_accessor(cls, data)\n 255 try:\n--> 256 return maybe_to_datetimelike(data)\n 257 except Exception:\n", 223 | "/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/accessors.py in maybe_to_datetimelike(data, copy)\n 81 raise TypeError(\"cannot convert an object of type {0} to a \"\n---> 82 \"datetimelike index\".format(type(data)))\n 83 \n", 224 | "TypeError: cannot convert an object of type to a datetimelike index", 225 | "\nDuring handling of the above exception, another exception occurred:\n", 226 | "AttributeError Traceback (most recent call last)", 227 | " in ()\n 1 # try to get the day of the month from the date column\n----> 2 day_of_month_landslides = landslides['date'].dt.day\n", 228 | "/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py in __getattr__(self, name)\n 3608 if (name in self._internal_names_set or name in self._metadata or\n 3609 name in self._accessors):\n-> 3610 return object.__getattribute__(self, name)\n 3611 else:\n 3612 if name in self._info_axis:\n", 229 | "/opt/conda/lib/python3.6/site-packages/pandas/core/accessor.py in __get__(self, instance, owner)\n 52 # this ensures that Series.str. is well defined\n 53 return self.accessor_cls\n---> 54 return self.construct_accessor(instance)\n 55 \n 56 def __set__(self, instance, value):\n", 230 | "/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/accessors.py in _make_accessor(cls, data)\n 256 return maybe_to_datetimelike(data)\n 257 except Exception:\n--> 258 raise AttributeError(\"Can only use .dt accessor with \"\n 259 \"datetimelike values\")\n", 231 | "AttributeError: Can only use .dt accessor with datetimelike values" 232 | ] 233 | } 234 | ] 235 | }, 236 | { 237 | "metadata": { 238 | "_uuid": "e387aa01cde1e5af4d106ad6d44b1bde1d156e1e", 239 | "_cell_guid": "0b38eb7d-ef3c-45ab-9065-60e66172cfcf" 240 | }, 241 | "cell_type": "markdown", 242 | "source": "We got an error! The important part to look at here is the part at the very end that says `AttributeError: Can only use .dt accessor with datetimelike values`. We're getting this error because the `.dt.day` accessor doesn't know how to deal with a column with the dtype \"object\". Even though our dataframe has dates in it, because they haven't been parsed we can't interact with them in a useful way.\n\nLuckily, we have a column that we parsed earlier, and that lets us get the day of the month out no problem:" 243 | }, 244 | { 245 | "metadata": { 246 | "_uuid": "9132dcab4ccfc1f0cc119ca1ecd554b3bdece3f0", 247 | "_cell_guid": "9496661e-01ef-44e4-9811-ad713323ad33", 248 | "collapsed": true, 249 | "trusted": true 250 | }, 251 | "cell_type": "code", 252 | "source": "# get the day of the month from the date_parsed column\nday_of_month_landslides = landslides['date_parsed'].dt.day", 253 | "execution_count": 9, 254 | "outputs": [] 255 | }, 256 | { 257 | "metadata": { 258 | "_uuid": "54cda16eb0c4f77bd77dbbcbdec757016eefe4a6", 259 | "_cell_guid": "6bd1e417-9ac6-45c4-a3d2-019c777ccf36", 260 | "trusted": true 261 | }, 262 | "cell_type": "code", 263 | "source": "# Your turn! 
get the day of the month from the date_parsed column\nday_of_month_landslides.head()", 264 | "execution_count": 10, 265 | "outputs": [ 266 | { 267 | "output_type": "execute_result", 268 | "execution_count": 10, 269 | "data": { 270 | "text/plain": "0 2.0\n1 22.0\n2 6.0\n3 14.0\n4 15.0\nName: date_parsed, dtype: float64" 271 | }, 272 | "metadata": {} 273 | } 274 | ] 275 | }, 276 | { 277 | "metadata": { 278 | "_uuid": "919cdbd2c166287a9b9c591e7d3e357dd1b68006", 279 | "_cell_guid": "3ba0146c-ab79-4f26-842f-1d60b3825819" 280 | }, 281 | "cell_type": "markdown", 282 | "source": "# Plot the day of the month to check the date parsing\n___\n\nOne of the biggest dangers in parsing dates is mixing up the months and days. The to_datetime() function does have very helpful error messages, but it doesn't hurt to double-check that the days of the month we've extracted make sense. \n\nTo do this, let's plot a histogram of the days of the month. We expect it to have values between 1 and 31 and, since there's no reason to suppose the landslides are more common on some days of the month than others, a relatively even distribution. (With a dip on 31 because not all months have 31 days.) 
Let's see if that's the case:" 283 | }, 284 | { 285 | "metadata": { 286 | "_uuid": "158ae77588266631060947c78eada47ab3d4b7dc", 287 | "_cell_guid": "94264561-1884-4f28-ab1e-c0590ca1c0a2", 288 | "trusted": true 289 | }, 290 | "cell_type": "code", 291 | "source": "# remove na's\nday_of_month_landslides = day_of_month_landslides.dropna()\n\n# plot the day of the month\n\nsns.distplot(day_of_month_landslides, kde=False, bins=31)", 292 | "execution_count": 11, 293 | "outputs": [ 294 | { 295 | "output_type": "execute_result", 296 | "execution_count": 11, 297 | "data": { 298 | "text/plain": "" 299 | }, 300 | "metadata": {} 301 | }, 302 | { 303 | "output_type": "display_data", 304 | "data": { 305 | "text/plain": "", 306 | "image/png": "[base64 PNG data omitted: histogram of the landslide day-of-month values, 31 bins]" 307 | }, 308 | "metadata": {} 309 | } 310 | ] 311 | }, 312 | { 313 | "metadata": { 314 | "_uuid": "1a39e52da6c5eb7ef0deeb2285c9d1602b6f36e4", 315 | "_cell_guid": "eb46423c-26ac-4b6e-95a9-efda98e583a4" 316 | }, 317 | "cell_type": "markdown", 318 | "source": "Yep, it looks like we did parse our dates correctly & this graph makes good sense to me. 
Why don't you take a turn checking the dates you parsed earlier?" 319 | }, 320 | { 321 | "metadata": { 322 | "trusted": true, 323 | "_uuid": "2888e79dc89348c930575e0e60587cb06a43b118" 324 | }, 325 | "cell_type": "code", 326 | "source": "day_of_month_earthquakes.head()", 327 | "execution_count": 14, 328 | "outputs": [ 329 | { 330 | "output_type": "execute_result", 331 | "execution_count": 14, 332 | "data": { 333 | "text/plain": "Series([], Name: date_parsed, dtype: float64)" 334 | }, 335 | "metadata": {} 336 | } 337 | ] 338 | }, 339 | { 340 | "metadata": { 341 | "_uuid": "3fad63b7f16333a777c76733b3c8647818f6f1b7", 342 | "_cell_guid": "f382a051-51e4-4bf5-a5b0-58af5da22ace", 343 | "trusted": true 344 | }, 345 | "cell_type": "code", 346 | "source": "# Your turn! Plot the days of the month from your\n# earthquake dataset and make sure they make sense.\n\nday_of_month_earthquakes = earthquakes['date_parsed'].dt.day\nday_of_month_earthquakes = day_of_month_earthquakes.dropna()\n\n# plot the day of the month\nsns.distplot(day_of_month_earthquakes, kde=False, bins=100)\n", 347 | "execution_count": 12, 348 | "outputs": [ 349 | { 350 | "output_type": "stream", 351 | "text": "/opt/conda/lib/python3.6/site-packages/seaborn/distributions.py:195: RuntimeWarning: Mean of empty slice.\n line, = ax.plot(a.mean(), 0)\n/opt/conda/lib/python3.6/site-packages/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars\n ret = ret.dtype.type(ret / rcount)\n", 352 | "name": "stderr" 353 | }, 354 | { 355 | "output_type": "execute_result", 356 | "execution_count": 12, 357 | "data": { 358 | "text/plain": "" 359 | }, 360 | "metadata": {} 361 | }, 362 | { 363 | "output_type": "display_data", 364 | "data": { 365 | "text/plain": "", 366 | "image/png": "[base64 PNG data omitted: plot output for the earthquake day-of-month values]" 367 | }, 368 | "metadata": {} 369 | } 370 | ] 371 | }, 372 | { 373 | "metadata": { 374 | "_uuid": "52b0af56e3c77db96056e9acd785f8f435f7caf5", 375 | "_cell_guid": "b4f37fce-4d08-409e-bbbd-6a26c3bbc6ee" 376 | }, 377 | "cell_type": "markdown", 378 | "source": "And that's it for today! If you have any questions, be sure to post them in the comments below or [on the forums](https://www.kaggle.com/questions-and-answers). \n\nRemember that your notebook is private by default, and in order to share it with other people or ask for help with it, you'll need to make it public. First, you'll need to save a version of your notebook that shows your current work by hitting the \"Commit & Run\" button. (Your work is saved automatically, but versioning your work lets you go back and look at what it was like at the point you saved it. It also lets you share a nice compiled notebook instead of just the raw code.) Then, once your notebook is finished running, you can go to the Settings tab in the panel to the left (you may have to expand it by hitting the [<] button next to the \"Commit & Run\" button) and set the \"Visibility\" dropdown to \"Public\".\n\n# More practice!\n___\n\nIf you're interested in graphing time series, [check out this Learn tutorial](https://www.kaggle.com/residentmario/time-series-plotting-optional).\n\nYou can also look into passing columns that you know have dates in them to the `parse_dates` argument in `read_csv`. 
(The documentation [is here](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html).) Do note that this method can be very slow, but depending on your needs it may sometimes be handy to use.\n\nFor an extra challenge, you can try parsing the column `Last Known Eruption` from the `volcanos` dataframe. This column contains a mixture of text (\"Unknown\") and years both before the common era (BCE, also known as BC) and in the common era (CE, also known as AD)." 379 | }, 380 | { 381 | "metadata": { 382 | "_uuid": "b647eb891e9818d466b484e45e00f9d95ce585b4", 383 | "_cell_guid": "d99dbf1d-1e1d-4304-8cf0-a8874a84ac33", 384 | "trusted": true 385 | }, 386 | "cell_type": "code", 387 | "source": "volcanos['Last Known Eruption'].sample(5)", 388 | "execution_count": 15, 389 | "outputs": [ 390 | { 391 | "output_type": "execute_result", 392 | "execution_count": 15, 393 | "data": { 394 | "text/plain": "764 Unknown\n1069 1996 CE\n34 1855 CE\n489 2016 CE\n9 1302 CE\nName: Last Known Eruption, dtype: object" 395 | }, 396 | "metadata": {} 397 | } 398 | ] 399 | }, 400 | { 401 | "metadata": { 402 | "trusted": true, 403 | "collapsed": true, 404 | "_uuid": "03da2e8fac13fe61b0dcb94da804dfc623e5e1e3" 405 | }, 406 | "cell_type": "code", 407 | "source": "", 408 | "execution_count": null, 409 | "outputs": [] 410 | } 411 | ], 412 | "metadata": { 413 | "kernelspec": { 414 | "display_name": "Python 3", 415 | "language": "python", 416 | "name": "python3" 417 | }, 418 | "language_info": { 419 | "name": "python", 420 | "version": "3.6.4", 421 | "mimetype": "text/x-python", 422 | "codemirror_mode": { 423 | "name": "ipython", 424 | "version": 3 425 | }, 426 | "pygments_lexer": "ipython3", 427 | "nbconvert_exporter": "python", 428 | "file_extension": ".py" 429 | } 430 | }, 431 | "nbformat": 4, 432 | "nbformat_minor": 1 433 | } -------------------------------------------------------------------------------- /Data Cleaning 4.ipynb: 
-------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "metadata": { 5 | "_cell_guid": "b91a74ba-85f4-486e-b5f9-d0898f0626bf", 6 | "_uuid": "6ac53f18b4f4ec0fc44348cedb5d1c319fa127c0" 7 | }, 8 | "cell_type": "markdown", 9 | "source": "### Previous days\n\n* [Day 1: Handling missing values](https://www.kaggle.com/rtatman/data-cleaning-challenge-handling-missing-values)\n* [Day 2: Scaling and normalization](https://www.kaggle.com/rtatman/data-cleaning-challenge-scale-and-normalize-data)\n* [Day 3: Parsing dates](https://www.kaggle.com/rtatman/data-cleaning-challenge-parsing-dates/)\n___\nWelcome to day 4 of the 5-Day Data Challenge! Today, we're going to be working with different character encodings. To get started, click the blue \"Fork Notebook\" button in the upper, right hand corner. This will create a private copy of this notebook that you can edit and play with. Once you're finished with the exercises, you can choose to make your notebook public to share with others. :)\n\n> **Your turn!** As we work through this notebook, you'll see some notebook cells (a block of either code or text) that has \"Your Turn!\" written in it. These are exercises for you to do to help cement your understanding of the concepts we're talking about. Once you've written the code to answer a specific question, you can run the code by clicking inside the cell (box with code in it) with the code you want to run and then hit CTRL + ENTER (CMD + ENTER on a Mac). You can also click in a cell and then click on the right \"play\" arrow to the left of the code. 
If you want to run all the code in your notebook, you can use the double, \"fast forward\" arrows at the bottom of the notebook editor.\n\nHere's what we're going to do today:\n\n* [Get our environment set up](#Get-our-environment-set-up)\n* [What are encodings?](#What-are-encodings?)\n* [Reading in files with encoding problems](#Reading-in-files-with-encoding-problems)\n\nLet's get started!" 10 | }, 11 | { 12 | "metadata": { 13 | "_cell_guid": "5cd5061f-ae30-4837-a53b-690ffd5c5830", 14 | "_uuid": "9d82bf13584b8e682962fbb96131f2447d741679" 15 | }, 16 | "cell_type": "markdown", 17 | "source": "# Get our environment set up\n________\n\nThe first thing we'll need to do is load in the libraries we'll be using. Not our datasets, though: we'll get to those later!\n\n> **Important!** Make sure you run this cell yourself or the rest of your code won't work!" 18 | }, 19 | { 20 | "metadata": { 21 | "_cell_guid": "135a7804-b5f5-40aa-8657-4a15774e3666", 22 | "collapsed": true, 23 | "_uuid": "835cbe0834b935fb0fd40c75b9c39454836f4d5f", 24 | "trusted": true 25 | }, 26 | "cell_type": "code", 27 | "source": "# modules we'll use\nimport pandas as pd\nimport numpy as np\n\n# helpful character encoding module\nimport chardet\n\n# set seed for reproducibility\nnp.random.seed(0)", 28 | "execution_count": 1, 29 | "outputs": [] 30 | }, 31 | { 32 | "metadata": { 33 | "_cell_guid": "604ac3a4-b1d9-4264-b312-4bbeecdeec00", 34 | "_uuid": "03ce3b4afe87d98f777172c2c7be066a66a0b237" 35 | }, 36 | "cell_type": "markdown", 37 | "source": "Now we're ready to work with some character encodings! 
(If you like, you can add a code cell here and take this opportunity to take a look at some of the data.)" 38 | }, 39 | { 40 | "metadata": { 41 | "_cell_guid": "52d1b1fb-b71f-4691-9f49-545bf272354d", 42 | "_uuid": "06093219f80ef491dd51e776a1c0701badaaf67e" 43 | }, 44 | "cell_type": "markdown", 45 | "source": "# What are encodings?\n____\n\nCharacter encodings are specific sets of rules for mapping from raw binary byte strings (that look like this: 0110100001101001) to characters that make up human-readable text (like \"hi\"). There are many different encodings, and if you try to read in text with a different encoding than the one it was originally written in, you end up with scrambled text called \"mojibake\" (said like mo-gee-bah-kay). Here's an example of mojibake:\n\næ–‡å—化ã??\n\nYou might also end up with \"unknown\" characters. These are what get printed when there's no mapping between a particular byte and a character in the encoding you're using to read your byte string in, and they look like this:\n\n����������\n\nCharacter encoding mismatches are less common today than they used to be, but they're definitely still a problem. There are lots of different character encodings, but the main one you need to know is UTF-8.\n\n> UTF-8 is **the** standard text encoding. All Python code is in UTF-8 and, ideally, all your data should be as well. It's when things aren't in UTF-8 that you run into trouble.\n\nIt was pretty hard to deal with encodings in Python 2, but thankfully in Python 3 it's a lot simpler. (Kaggle Kernels only use Python 3.) There are two main data types you'll encounter when working with text in Python 3. One is the string, which is what text is by default." 
46 | }, 47 | { 48 | "metadata": { 49 | "_cell_guid": "579e7b4a-9113-4795-833f-43dfaa7bd223", 50 | "_uuid": "c93b6c3d188e2174d5060255ea6220f52026d6f2", 51 | "trusted": true 52 | }, 53 | "cell_type": "code", 54 | "source": "# start with a string\nbefore = \"This is the euro symbol: €\"\n\n# check to see what datatype it is\ntype(before)", 55 | "execution_count": 2, 56 | "outputs": [ 57 | { 58 | "output_type": "execute_result", 59 | "execution_count": 2, 60 | "data": { 61 | "text/plain": "str" 62 | }, 63 | "metadata": {} 64 | } 65 | ] 66 | }, 67 | { 68 | "metadata": { 69 | "_cell_guid": "411f1c92-beeb-41ae-bf37-689830b18543", 70 | "_uuid": "3744c3f583a9e2cc71a9dddbd40f875cfe118192" 71 | }, 72 | "cell_type": "markdown", 73 | "source": "The other is the [bytes](https://docs.python.org/3.1/library/functions.html#bytes) data type, which is a sequence of integers. You can convert a string into bytes by specifying which encoding to use:" 74 | }, 75 | { 76 | "metadata": { 77 | "_cell_guid": "e2581032-e30d-427a-ade1-e68bd5bbfa26", 78 | "_uuid": "7abd3230e80d30916c7bb2c89a75268c8d943124", 79 | "trusted": true 80 | }, 81 | "cell_type": "code", 82 | "source": "# encode the string as UTF-8 bytes, replacing characters that raise errors\nafter = before.encode(\"utf-8\", errors = \"replace\")\n\n# check the type\ntype(after)", 83 | "execution_count": 3, 84 | "outputs": [ 85 | { 86 | "output_type": "execute_result", 87 | "execution_count": 3, 88 | "data": { 89 | "text/plain": "bytes" 90 | }, 91 | "metadata": {} 92 | } 93 | ] 94 | }, 95 | { 96 | "metadata": { 97 | "_cell_guid": "2163421a-27ec-40b7-8064-7a4ddf2ccbb2", 98 | "_uuid": "561289a2b998601f914ddd548a1f8cc15f6d6452" 99 | }, 100 | "cell_type": "markdown", 101 | "source": "If you look at a bytes object, you'll see that it has a b in front of it, and then maybe some text after. That's because bytes are printed out as if they were characters encoded in ASCII. 
(ASCII is an older character encoding that doesn't really work for writing any language other than English.) Here you can see that our euro symbol, which ASCII can't represent, shows up as the escape sequence \"\\xe2\\x82\\xac\": the three bytes that encode it in UTF-8." 102 | }, 103 | { 104 | "metadata": { 105 | "_cell_guid": "b3aa69d1-7e4a-48a3-b788-a75d71d4dfc4", 106 | "_uuid": "28337794179e4e4b335983027e60789b4664f0d4", 107 | "trusted": true 108 | }, 109 | "cell_type": "code", 110 | "source": "# take a look at what the bytes look like\nafter", 111 | "execution_count": 4, 112 | "outputs": [ 113 | { 114 | "output_type": "execute_result", 115 | "execution_count": 4, 116 | "data": { 117 | "text/plain": "b'This is the euro symbol: \\xe2\\x82\\xac'" 118 | }, 119 | "metadata": {} 120 | } 121 | ] 122 | }, 123 | { 124 | "metadata": { 125 | "_cell_guid": "f56be052-f564-4cea-9aee-e34813a71a3f", 126 | "_uuid": "4c2e8b76861fb724986a7475cb0979d3bc23276b" 127 | }, 128 | "cell_type": "markdown", 129 | "source": "When we convert our bytes back to a string with the correct encoding, we can see that our text is all there correctly, which is great! :)" 130 | }, 131 | { 132 | "metadata": { 133 | "_cell_guid": "8cc169fb-827e-485a-bb6d-f414a46e6c15", 134 | "_uuid": "5d904ea4f724652fbad9b786f2c0aa318601b8fc", 135 | "trusted": true 136 | }, 137 | "cell_type": "code", 138 | "source": "# decode the bytes back to a string using utf-8\nprint(after.decode(\"utf-8\"))", 139 | "execution_count": 5, 140 | "outputs": [ 141 | { 142 | "output_type": "stream", 143 | "text": "This is the euro symbol: €\n", 144 | "name": "stdout" 145 | } 146 | ] 147 | }, 148 | { 149 | "metadata": { 150 | "_cell_guid": "ea3bd345-e139-46cf-bf2a-a479887c112b", 151 | "_uuid": "7ed1ee6a1ae446fc02eb35f01456c9d068fa897d" 152 | }, 153 | "cell_type": "markdown", 154 | "source": "However, when we try to use a different encoding to map our bytes into a string, we get an error. 
This is because the encoding we're trying to use doesn't know what to do with the bytes we're trying to pass it. You need to tell Python the encoding that the byte string is actually supposed to be in.\n\n> You can think of different encodings as different ways of recording music. You can record the same music on a CD, cassette tape or 8-track. While the music may sound more-or-less the same, you need to use the right equipment to play the music from each recording format. The correct decoder is like a cassette player or a cd player. If you try to play a cassette in a CD player, it just won't work. " 155 | }, 156 | { 157 | "metadata": { 158 | "_cell_guid": "0454daad-f3b4-46bb-986e-e1710e6ec45c", 159 | "_uuid": "2ae367e4c83d2d1b1a02e288c9ab9d2a409bbddc", 160 | "trusted": true 161 | }, 162 | "cell_type": "code", 163 | "source": "# try to decode our bytes with the ascii encoding\nprint(after.decode(\"ascii\"))", 164 | "execution_count": 6, 165 | "outputs": [ 166 | { 167 | "output_type": "error", 168 | "ename": "UnicodeDecodeError", 169 | "evalue": "'ascii' codec can't decode byte 0xe2 in position 25: ordinal not in range(128)", 170 | "traceback": [ 171 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 172 | "\u001b[0;31mUnicodeDecodeError\u001b[0m Traceback (most recent call last)", 173 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# try to decode our bytes with the ascii encoding\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mafter\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdecode\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"ascii\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 174 | "\u001b[0;31mUnicodeDecodeError\u001b[0m: 'ascii' codec can't decode byte 0xe2 in position 25: ordinal not in range(128)" 175 | ] 176 | } 
177 | ] 178 | }, 179 | { 180 | "metadata": { 181 | "_cell_guid": "7dde2127-bcbe-46a6-8522-66eaef4fde53", 182 | "_uuid": "c7c9271352fc4564669cf34712096b21fb4b29b7" 183 | }, 184 | "cell_type": "markdown", 185 | "source": "We can also run into trouble if we try to use the wrong encoding to map from a string to bytes. Strings in Python 3 are Unicode text (and UTF-8 is the default encoding), so if we try to encode them with an encoding that can't represent all of their characters we'll create problems. \n\nFor example, if we try to convert a string to ASCII bytes using encode(), we're asking for the bytes to be what they would be if the text was in ASCII. Since our text isn't in ASCII, though, there will be some characters it can't handle. We can automatically replace the characters that ASCII can't handle. If we do that, however, any characters not in ASCII will just be replaced with the unknown character. Then, when we convert the bytes back to a string, the original character is gone and only the unknown character remains. The dangerous part about this is that there's no way to tell which character it *should* have been. That means we may have just made our data unusable!" 186 | }, 187 | { 188 | "metadata": { 189 | "_cell_guid": "abc9901f-4667-4679-a7e1-5589f8cbf7cf", 190 | "_uuid": "7a54834072c291034ddd3f83292e1a36be01388f", 191 | "trusted": true 192 | }, 193 | "cell_type": "code", 194 | "source": "# start with a string\nbefore = \"This is the euro symbol: €\"\n\n# encode it to a different encoding, replacing characters that raise errors\nafter = before.encode(\"ascii\", errors = \"replace\")\n\n# decode the ASCII bytes back to a string\nprint(after.decode(\"ascii\"))\n\n# We've lost the original underlying byte string! 
It's been \n# replaced with the underlying byte string for the unknown character :(", 195 | "execution_count": 7, 196 | "outputs": [ 197 | { 198 | "output_type": "stream", 199 | "text": "This is the euro symbol: ?\n", 200 | "name": "stdout" 201 | } 202 | ] 203 | }, 204 | { 205 | "metadata": { 206 | "_cell_guid": "cdc4438f-4e9f-4d06-bbf5-01f2613af790", 207 | "_uuid": "991c5ca3457deb585ce58bf4ba64d55fe0580ee2" 208 | }, 209 | "cell_type": "markdown", 210 | "source": "This is bad and we want to avoid doing it! It's far better to convert all our text to UTF-8 as soon as we can and keep it in that encoding. The best time to convert non UTF-8 input into UTF-8 is when you read in files, which we'll talk about next.\n\nFirst, however, try converting between bytes and strings with different encodings and see what happens. Notice what this does to your text. Would you want this to happen to data you were trying to analyze?" 211 | }, 212 | { 213 | "metadata": { 214 | "_cell_guid": "6a07f260-d0f4-41d1-8511-e45e8a43bf43", 215 | "_uuid": "1b1f20f32a0647bc65281d3feb1d3124a226ccdd", 216 | "trusted": true 217 | }, 218 | "cell_type": "code", 219 | "source": "# Your turn! Try encoding and decoding different symbols to ASCII and\n# see what happens. I'd recommend $, #, 你好 and नमस्ते but feel free to\n# try other characters. What happens? 
When would this cause problems?\nbefore = \"This is the Hindi Hello: नमस्ते\"\n\n# encode it to a different encoding, replacing characters that raise errors\nafter = before.encode(\"ascii\", errors = \"replace\")\n\n# convert it back to utf-8\nprint(after.decode(\"ascii\"))", 220 | "execution_count": 8, 221 | "outputs": [ 222 | { 223 | "output_type": "stream", 224 | "text": "This is the Hindi Hello: ??????\n", 225 | "name": "stdout" 226 | } 227 | ] 228 | }, 229 | { 230 | "metadata": { 231 | "_cell_guid": "8cdb1777-e518-499c-8941-2c510e1ca785", 232 | "_uuid": "1970b834a6189e19197ffd4d1ad2c56b1a7c705d" 233 | }, 234 | "cell_type": "markdown", 235 | "source": "# Reading in files with encoding problems\n___\n\nMost files you'll encounter will probably be encoded with UTF-8. This is what Python expects by default, so most of the time you won't run into problems. However, sometimes you'll get an error like this: " 236 | }, 237 | { 238 | "metadata": { 239 | "_cell_guid": "7f193412-74f3-4b8c-93d3-61997020b922", 240 | "_uuid": "4833c0ce828c4547d374737f5707401c90ac4597", 241 | "trusted": true 242 | }, 243 | "cell_type": "code", 244 | "source": "# try to read in a file not in UTF-8\nkickstarter_2016 = pd.read_csv(\"../input/kickstarter-projects/ks-projects-201612.csv\")", 245 | "execution_count": 9, 246 | "outputs": [ 247 | { 248 | "output_type": "error", 249 | "ename": "UnicodeDecodeError", 250 | "evalue": "'utf-8' codec can't decode byte 0x99 in position 11: invalid start byte", 251 | "traceback": [ 252 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 253 | "\u001b[0;31mUnicodeDecodeError\u001b[0m Traceback (most recent call last)", 254 | "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._convert_tokens\u001b[0;34m()\u001b[0m\n", 255 | "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._convert_with_dtype\u001b[0;34m()\u001b[0m\n", 256 | 
"\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._string_convert\u001b[0;34m()\u001b[0m\n", 257 | "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers._string_box_utf8\u001b[0;34m()\u001b[0m\n", 258 | "\u001b[0;31mUnicodeDecodeError\u001b[0m: 'utf-8' codec can't decode byte 0x99 in position 11: invalid start byte", 259 | "\nDuring handling of the above exception, another exception occurred:\n", 260 | "\u001b[0;31mUnicodeDecodeError\u001b[0m Traceback (most recent call last)", 261 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# try to read in a file not in UTF-8\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mkickstarter_2016\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread_csv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"../input/kickstarter-projects/ks-projects-201612.csv\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 262 | "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36mparser_f\u001b[0;34m(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)\u001b[0m\n\u001b[1;32m 707\u001b[0m skip_blank_lines=skip_blank_lines)\n\u001b[1;32m 708\u001b[0m 
\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 709\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0m_read\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilepath_or_buffer\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 710\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 711\u001b[0m \u001b[0mparser_f\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__name__\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 263 | "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36m_read\u001b[0;34m(filepath_or_buffer, kwds)\u001b[0m\n\u001b[1;32m 453\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 454\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 455\u001b[0;31m \u001b[0mdata\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mparser\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 456\u001b[0m \u001b[0;32mfinally\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 457\u001b[0m \u001b[0mparser\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mclose\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 264 | "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36mread\u001b[0;34m(self, nrows)\u001b[0m\n\u001b[1;32m 1067\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'skipfooter not supported for iteration'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1068\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1069\u001b[0;31m \u001b[0mret\u001b[0m \u001b[0;34m=\u001b[0m 
\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1070\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1071\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0moptions\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'as_recarray'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 265 | "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36mread\u001b[0;34m(self, nrows)\u001b[0m\n\u001b[1;32m 1837\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnrows\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1838\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1839\u001b[0;31m \u001b[0mdata\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_reader\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1840\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mStopIteration\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1841\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_first_chunk\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 266 | "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader.read\u001b[0;34m()\u001b[0m\n", 267 | "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in 
\u001b[0;36mpandas._libs.parsers.TextReader._read_low_memory\u001b[0;34m()\u001b[0m\n", 268 | "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._read_rows\u001b[0;34m()\u001b[0m\n", 269 | "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._convert_column_data\u001b[0;34m()\u001b[0m\n", 270 | "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._convert_tokens\u001b[0;34m()\u001b[0m\n", 271 | "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._convert_with_dtype\u001b[0;34m()\u001b[0m\n", 272 | "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._string_convert\u001b[0;34m()\u001b[0m\n", 273 | "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers._string_box_utf8\u001b[0;34m()\u001b[0m\n", 274 | "\u001b[0;31mUnicodeDecodeError\u001b[0m: 'utf-8' codec can't decode byte 0x99 in position 11: invalid start byte" 275 | ] 276 | } 277 | ] 278 | }, 279 | { 280 | "metadata": { 281 | "_cell_guid": "8e40ef9c-8973-4df8-b307-10a6c592c715", 282 | "_uuid": "3855e9b70f573aff62a7333001b827b13d349b49" 283 | }, 284 | "cell_type": "markdown", 285 | "source": "Notice that we get the same `UnicodeDecodeError` we got when we tried to decode UTF-8 bytes as if they were ASCII! This tells us that this file isn't actually UTF-8. We don't know what encoding it actually *is* though. One way to figure it out is to try and test a bunch of different character encodings and see if any of them work. A better way, though, is to use the chardet module to try and automatically guess what the right encoding is. It's not 100% guaranteed to be right, but it's usually faster than just trying to guess.\n\nI'm going to just look at the first ten thousand bytes of this file. 
This is usually enough for a good guess about what the encoding is and is much faster than trying to look at the whole file. (Especially with a large file this can be very slow.) Another reason to just look at the first part of the file is that we can see by looking at the error message that the first problem is at position 11. So we probably only need to look at the first little bit of the file to figure out what's going on." 286 | }, 287 | { 288 | "metadata": { 289 | "_cell_guid": "86e058c6-e971-4927-a442-0d67fadca013", 290 | "_uuid": "ef876801a295410c657b2b85ecfef63c8ae0ab09", 291 | "trusted": true 292 | }, 293 | "cell_type": "code", 294 | "source": "# look at the first ten thousand bytes to guess the character encoding\nwith open(\"../input/kickstarter-projects/ks-projects-201612.csv\", 'rb') as rawdata:\n result = chardet.detect(rawdata.read(10000))\n\n# check what the character encoding might be\nprint(result)", 295 | "execution_count": 10, 296 | "outputs": [ 297 | { 298 | "output_type": "stream", 299 | "text": "{'encoding': 'Windows-1252', 'confidence': 0.73, 'language': ''}\n", 300 | "name": "stdout" 301 | } 302 | ] 303 | }, 304 | { 305 | "metadata": { 306 | "_cell_guid": "eb893685-188a-4de6-9f1e-ab42973135a9", 307 | "_uuid": "907cafd0d66144e12af2953467021019cd5c2945" 308 | }, 309 | "cell_type": "markdown", 310 | "source": "So chardet is 73% confident that the right encoding is \"Windows-1252\". 
Let's see if that's correct:" 311 | }, 312 | { 313 | "metadata": { 314 | "_cell_guid": "25e9e59c-d881-4d91-be0f-4425f5c6583d", 315 | "_uuid": "5e09150f25216d09065165845414f308d7445ae2", 316 | "trusted": true 317 | }, 318 | "cell_type": "code", 319 | "source": "# read in the file with the encoding detected by chardet\nkickstarter_2016 = pd.read_csv(\"../input/kickstarter-projects/ks-projects-201612.csv\", encoding='Windows-1252')\n\n# look at the first few lines\nkickstarter_2016.head()", 320 | "execution_count": 11, 321 | "outputs": [ 322 | { 323 | "output_type": "stream", 324 | "text": "/opt/conda/lib/python3.6/site-packages/IPython/core/interactiveshell.py:2698: DtypeWarning: Columns (13,14,15) have mixed types. Specify dtype option on import or set low_memory=False.\n interactivity=interactivity, compiler=compiler, result=result)\n", 325 | "name": "stderr" 326 | }, 327 | { 328 | "output_type": "execute_result", 329 | "execution_count": 11, 330 | "data": { 331 | "text/plain": " ID name \\\n0 1000002330 The Songs of Adelaide & Abullah \n1 1000004038 Where is Hank? \n2 1000007540 ToshiCapital Rekordz Needs Help to Complete Album \n3 1000011046 Community Film Project: The Art of Neighborhoo... 
\n4 1000014025 Monarch Espresso Bar \n\n category main_category currency deadline goal \\\n0 Poetry Publishing GBP 2015-10-09 11:36:00 1000 \n1 Narrative Film Film & Video USD 2013-02-26 00:20:50 45000 \n2 Music Music USD 2012-04-16 04:24:11 5000 \n3 Film & Video Film & Video USD 2015-08-29 01:00:00 19500 \n4 Restaurants Food USD 2016-04-01 13:38:27 50000 \n\n launched pledged state backers country usd pledged \\\n0 2015-08-11 12:12:28 0 failed 0 GB 0 \n1 2013-01-12 00:20:50 220 failed 3 US 220 \n2 2012-03-17 03:24:11 1 failed 1 US 1 \n3 2015-07-04 08:35:03 1283 canceled 14 US 1283 \n4 2016-02-26 13:38:27 52375 successful 224 US 52375 \n\n Unnamed: 13 Unnamed: 14 Unnamed: 15 Unnamed: 16 \n0 NaN NaN NaN NaN \n1 NaN NaN NaN NaN \n2 NaN NaN NaN NaN \n3 NaN NaN NaN NaN \n4 NaN NaN NaN NaN ", 332 | "text/html": "
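<div>\n<table>\n<thead>\n
<tr><th></th><th>ID</th><th>name</th><th>category</th><th>main_category</th><th>currency</th><th>deadline</th><th>goal</th><th>launched</th><th>pledged</th><th>state</th><th>backers</th><th>country</th><th>usd pledged</th><th>Unnamed: 13</th><th>Unnamed: 14</th><th>Unnamed: 15</th><th>Unnamed: 16</th></tr>\n</thead>\n<tbody>\n
<tr><th>0</th><td>1000002330</td><td>The Songs of Adelaide &amp; Abullah</td><td>Poetry</td><td>Publishing</td><td>GBP</td><td>2015-10-09 11:36:00</td><td>1000</td><td>2015-08-11 12:12:28</td><td>0</td><td>failed</td><td>0</td><td>GB</td><td>0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n
<tr><th>1</th><td>1000004038</td><td>Where is Hank?</td><td>Narrative Film</td><td>Film &amp; Video</td><td>USD</td><td>2013-02-26 00:20:50</td><td>45000</td><td>2013-01-12 00:20:50</td><td>220</td><td>failed</td><td>3</td><td>US</td><td>220</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n
<tr><th>2</th><td>1000007540</td><td>ToshiCapital Rekordz Needs Help to Complete Album</td><td>Music</td><td>Music</td><td>USD</td><td>2012-04-16 04:24:11</td><td>5000</td><td>2012-03-17 03:24:11</td><td>1</td><td>failed</td><td>1</td><td>US</td><td>1</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n
<tr><th>3</th><td>1000011046</td><td>Community Film Project: The Art of Neighborhoo...</td><td>Film &amp; Video</td><td>Film &amp; Video</td><td>USD</td><td>2015-08-29 01:00:00</td><td>19500</td><td>2015-07-04 08:35:03</td><td>1283</td><td>canceled</td><td>14</td><td>US</td><td>1283</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n
<tr><th>4</th><td>1000014025</td><td>Monarch Espresso Bar</td><td>Restaurants</td><td>Food</td><td>USD</td><td>2016-04-01 13:38:27</td><td>50000</td><td>2016-02-26 13:38:27</td><td>52375</td><td>successful</td><td>224</td><td>US</td><td>52375</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n
</tbody>\n</table>\n</div>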
" 333 | }, 334 | "metadata": {} 335 | } 336 | ] 337 | }, 338 | { 339 | "metadata": { 340 | "_cell_guid": "200d647e-0d92-48b9-b49b-9d48b5f149ba", 341 | "_uuid": "bbc3c1a70b0d01f314a68b2cf2448cecd8006e89" 342 | }, 343 | "cell_type": "markdown", 344 | "source": "Yep, looks like chardet was right! The file reads in with no problem (although we do get a warning about datatypes) and when we look at the first few rows it seems to be fine. " 345 | }, 346 | { 347 | "metadata": { 348 | "_cell_guid": "a52ac0aa-7d47-4442-88e3-d625d7b42934", 349 | "_uuid": "706e1f985080e9492a47447a34fb9c1203738229", 350 | "trusted": true 351 | }, 352 | "cell_type": "code", 353 | "source": "# Your Turn! Trying to read in this file gives you an error. Figure out\n# what the correct encoding should be and read in the file. :)\npolice_killings = pd.read_csv(\"../input/fatal-police-shootings-in-the-us/PoliceKillingsUS.csv\")", 354 | "execution_count": 12, 355 | "outputs": [ 356 | { 357 | "output_type": "error", 358 | "ename": "UnicodeDecodeError", 359 | "evalue": "'utf-8' codec can't decode byte 0x96 in position 2: invalid start byte", 360 | "traceback": [ 361 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 362 | "\u001b[0;31mUnicodeDecodeError\u001b[0m Traceback (most recent call last)", 363 | "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._convert_tokens\u001b[0;34m()\u001b[0m\n", 364 | "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._convert_with_dtype\u001b[0;34m()\u001b[0m\n", 365 | "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._string_convert\u001b[0;34m()\u001b[0m\n", 366 | "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers._string_box_utf8\u001b[0;34m()\u001b[0m\n", 367 | "\u001b[0;31mUnicodeDecodeError\u001b[0m: 'utf-8' codec can't decode byte 0x96 in position 2: 
invalid start byte", 368 | "\nDuring handling of the above exception, another exception occurred:\n", 369 | "\u001b[0;31mUnicodeDecodeError\u001b[0m Traceback (most recent call last)", 370 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# Your Turn! Trying to read in this file gives you an error. Figure out\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;31m# what the correct encoding should be and read in the file. :)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mpolice_killings\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread_csv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"../input/fatal-police-shootings-in-the-us/PoliceKillingsUS.csv\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 371 | "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36mparser_f\u001b[0;34m(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)\u001b[0m\n\u001b[1;32m 707\u001b[0m skip_blank_lines=skip_blank_lines)\n\u001b[1;32m 708\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 709\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0m_read\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilepath_or_buffer\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 710\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 711\u001b[0m \u001b[0mparser_f\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__name__\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 372 | "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36m_read\u001b[0;34m(filepath_or_buffer, kwds)\u001b[0m\n\u001b[1;32m 453\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 454\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 455\u001b[0;31m \u001b[0mdata\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mparser\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 456\u001b[0m \u001b[0;32mfinally\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 457\u001b[0m \u001b[0mparser\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mclose\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 373 | "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36mread\u001b[0;34m(self, nrows)\u001b[0m\n\u001b[1;32m 1067\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'skipfooter not supported for iteration'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1068\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1069\u001b[0;31m \u001b[0mret\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1070\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1071\u001b[0m 
\u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0moptions\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'as_recarray'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 374 | "\u001b[0;32m/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36mread\u001b[0;34m(self, nrows)\u001b[0m\n\u001b[1;32m 1837\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnrows\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1838\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1839\u001b[0;31m \u001b[0mdata\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_reader\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1840\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mStopIteration\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1841\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_first_chunk\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 375 | "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader.read\u001b[0;34m()\u001b[0m\n", 376 | "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._read_low_memory\u001b[0;34m()\u001b[0m\n", 377 | "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._read_rows\u001b[0;34m()\u001b[0m\n", 378 | "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._convert_column_data\u001b[0;34m()\u001b[0m\n", 379 | 
"\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._convert_tokens\u001b[0;34m()\u001b[0m\n", 380 | "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._convert_with_dtype\u001b[0;34m()\u001b[0m\n", 381 | "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._string_convert\u001b[0;34m()\u001b[0m\n", 382 | "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers._string_box_utf8\u001b[0;34m()\u001b[0m\n", 383 | "\u001b[0;31mUnicodeDecodeError\u001b[0m: 'utf-8' codec can't decode byte 0x96 in position 2: invalid start byte" 384 | ] 385 | } 386 | ] 387 | }, 388 | { 389 | "metadata": { 390 | "_cell_guid": "deb7ec96-9ef5-4f40-ab93-42075912d9a3", 391 | "_uuid": "f2918952ade0ab4943f826fda9f955a79cdb5d0b", 392 | "trusted": true 393 | }, 394 | "cell_type": "code", 395 | "source": "# look at the first ten thousand bytes to guess the character encoding\nwith open(\"../input/fatal-police-shootings-in-the-us/PoliceKillingsUS.csv\", 'rb') as rawdata:\n result = chardet.detect(rawdata.read(10000))\n\n# check what the character encoding might be\nprint(result)", 396 | "execution_count": 13, 397 | "outputs": [ 398 | { 399 | "output_type": "stream", 400 | "text": "{'encoding': 'ascii', 'confidence': 1.0, 'language': ''}\n", 401 | "name": "stdout" 402 | } 403 | ] 404 | }, 405 | { 406 | "metadata": { 407 | "_cell_guid": "b4788484-d875-4df2-b3ae-4f8c718d7e43", 408 | "_uuid": "509fff816a3ffbd2e3ec97c7af58e8a6d97d7620", 409 | "trusted": true 410 | }, 411 | "cell_type": "code", 412 | "source": "# Your Turn! Trying to read in this file gives you an error. Figure out\n# what the correct encoding should be and read in the file. 
:)\nwith open(\"../input/fatal-police-shootings-in-the-us/PoliceKillingsUS.csv\", 'rb') as rawdata:\n result = chardet.detect(rawdata.read(100000))\n\nprint (result)\npolice_killings = pd.read_csv(\"../input/fatal-police-shootings-in-the-us/PoliceKillingsUS.csv\", encoding='Windows-1252')\npolice_killings.head()", 413 | "execution_count": 15, 414 | "outputs": [ 415 | { 416 | "output_type": "stream", 417 | "text": "{'encoding': 'Windows-1252', 'confidence': 0.73, 'language': ''}\n", 418 | "name": "stdout" 419 | }, 420 | { 421 | "output_type": "execute_result", 422 | "execution_count": 15, 423 | "data": { 424 | "text/plain": " id name date manner_of_death armed age \\\n0 3 Tim Elliot 02/01/15 shot gun 53.0 \n1 4 Lewis Lee Lembke 02/01/15 shot gun 47.0 \n2 5 John Paul Quintero 03/01/15 shot and Tasered unarmed 23.0 \n3 8 Matthew Hoffman 04/01/15 shot toy weapon 32.0 \n4 9 Michael Rodriguez 04/01/15 shot nail gun 39.0 \n\n gender race city state signs_of_mental_illness threat_level \\\n0 M A Shelton WA True attack \n1 M W Aloha OR False attack \n2 M H Wichita KS False other \n3 M W San Francisco CA True attack \n4 M H Evans CO False attack \n\n flee body_camera \n0 Not fleeing False \n1 Not fleeing False \n2 Not fleeing False \n3 Not fleeing False \n4 Not fleeing False ", 425 | "text/html": "
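<table>
<thead>
<tr><th></th><th>id</th><th>name</th><th>date</th><th>manner_of_death</th><th>armed</th><th>age</th><th>gender</th><th>race</th><th>city</th><th>state</th><th>signs_of_mental_illness</th><th>threat_level</th><th>flee</th><th>body_camera</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>3</td><td>Tim Elliot</td><td>02/01/15</td><td>shot</td><td>gun</td><td>53.0</td><td>M</td><td>A</td><td>Shelton</td><td>WA</td><td>True</td><td>attack</td><td>Not fleeing</td><td>False</td></tr>
<tr><th>1</th><td>4</td><td>Lewis Lee Lembke</td><td>02/01/15</td><td>shot</td><td>gun</td><td>47.0</td><td>M</td><td>W</td><td>Aloha</td><td>OR</td><td>False</td><td>attack</td><td>Not fleeing</td><td>False</td></tr>
<tr><th>2</th><td>5</td><td>John Paul Quintero</td><td>03/01/15</td><td>shot and Tasered</td><td>unarmed</td><td>23.0</td><td>M</td><td>H</td><td>Wichita</td><td>KS</td><td>False</td><td>other</td><td>Not fleeing</td><td>False</td></tr>
<tr><th>3</th><td>8</td><td>Matthew Hoffman</td><td>04/01/15</td><td>shot</td><td>toy weapon</td><td>32.0</td><td>M</td><td>W</td><td>San Francisco</td><td>CA</td><td>True</td><td>attack</td><td>Not fleeing</td><td>False</td></tr>
<tr><th>4</th><td>9</td><td>Michael Rodriguez</td><td>04/01/15</td><td>shot</td><td>nail gun</td><td>39.0</td><td>M</td><td>H</td><td>Evans</td><td>CO</td><td>False</td><td>attack</td><td>Not fleeing</td><td>False</td></tr>
</tbody>
</table>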
" 426 | }, 427 | "metadata": {} 428 | } 429 | ] 430 | }, 431 | { 432 | "metadata": { 433 | "_cell_guid": "02f54e31-ec04-425b-9fcd-d85b11742f8e", 434 | "_uuid": "51e11611fae94a9f704440275905e719ee801a0a" 435 | }, 436 | "cell_type": "markdown", 437 | "source": "# Saving your files with UTF-8 encoding\n___\n\nFinally, once you've gone through all the trouble of getting your file into UTF-8, you'll probably want to keep it that way. The easiest way to do that is to save your files with UTF-8 encoding. The good news is, since UTF-8 is the standard encoding in Python, when you save a file it will be saved as UTF-8 by default:" 438 | }, 439 | { 440 | "metadata": { 441 | "_cell_guid": "affcfb28-b6a8-426a-b690-0c717073ad09", 442 | "collapsed": true, 443 | "_uuid": "8f72b89b5ea80a1fc9890c3eac89614757c16b47", 444 | "trusted": true 445 | }, 446 | "cell_type": "code", 447 | "source": "# save our file (will be saved as UTF-8 by default!)\nkickstarter_2016.to_csv(\"ks-projects-201801-utf8.csv\")", 448 | "execution_count": 16, 449 | "outputs": [] 450 | }, 451 | { 452 | "metadata": { 453 | "_cell_guid": "036fb925-21ec-4b15-8eac-4d29879003f5", 454 | "_uuid": "5ff9d834741ea3f9a6ed0276dc02b7a59948ae1e" 455 | }, 456 | "cell_type": "markdown", 457 | "source": "Pretty easy, huh? :)\n\n> If you haven't saved a file in a kernel before, you need to hit the commit & run button and wait for your notebook to finish running first before you can see or access the file you've saved out. If you don't see it at first, wait a couple minutes and it should show up. The files you save will be in the directory \"../output/\", and you can download them from your notebook." 458 | }, 459 | { 460 | "metadata": { 461 | "_cell_guid": "a56266f2-b957-459b-a2d4-3587b6637e70", 462 | "collapsed": true, 463 | "_uuid": "b6610da48bc0a43c5934b4970c394e9de596fe97", 464 | "trusted": true 465 | }, 466 | "cell_type": "code", 467 | "source": "# Your turn! 
Save out a version of the police_killings dataset with UTF-8 encoding\npolice_killings.to_csv(\"Police Killings-utf8.csv\")", 468 | "execution_count": 17, 469 | "outputs": [] 470 | }, 471 | { 472 | "metadata": { 473 | "_cell_guid": "b4f37fce-4d08-409e-bbbd-6a26c3bbc6ee", 474 | "_uuid": "52b0af56e3c77db96056e9acd785f8f435f7caf5" 475 | }, 476 | "cell_type": "markdown", 477 | "source": "And that's it for today! We didn't do quite as much coding, but take my word for it: if you don't have the right tools, figuring out what encoding a file is in can be a huge time sink. If you have any questions, be sure to post them in the comments below or [on the forums](https://www.kaggle.com/questions-and-answers). \n\nRemember that your notebook is private by default, and in order to share it with other people or ask for help with it, you'll need to make it public. First, you'll need to save a version of your notebook that shows your current work by hitting the \"Commit & Run\" button. (Your work is saved automatically, but versioning your work lets you go back and look at what it was like at the point you saved it. It also lets you share a nice compiled notebook instead of just the raw code.) Then, once your notebook is finished running, you can go to the Settings tab in the panel to the left (you may have to expand it by hitting the [<] button next to the \"Commit & Run\" button) and set the \"Visibility\" dropdown to \"Public\".\n\n# More practice!\n___\n\nCheck out [this dataset of files in different character encodings](https://www.kaggle.com/rtatman/character-encoding-examples). Can you read in all the files with their original encodings and then save them out as UTF-8 files?\n\nIf you have a file that's in UTF-8 but has just a couple of weird-looking characters in it, you can try out the [ftfy module](https://ftfy.readthedocs.io/en/latest/#) and see if it helps. 
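If you'd like a starting point for the practice exercise, here is a minimal sketch of the read-then-save round trip using only the Python standard library. It assumes you already know (or have guessed with chardet) the original encoding. The helper name `transcode_to_utf8` and the example file names are made up for this illustration; they are not part of pandas or chardet.

```python
import os
import tempfile


def transcode_to_utf8(src_path, dst_path, src_encoding):
    # Hypothetical helper: decode the file with its original encoding,
    # then write the same text back out as UTF-8.
    with open(src_path, 'r', encoding=src_encoding, newline='') as src:
        text = src.read()
    with open(dst_path, 'w', encoding='utf-8', newline='') as dst:
        dst.write(text)
    return text


# Build a tiny Windows-1252 file containing byte 0x96 (an en dash in that
# encoding), the byte named in the UnicodeDecodeError earlier on.
workdir = tempfile.mkdtemp()
src_file = os.path.join(workdir, 'example-cp1252.csv')
dst_file = os.path.join(workdir, 'example-utf8.csv')
with open(src_file, 'wb') as f:
    f.write(b'name,years\nTim,2015\x962016\n')

converted = transcode_to_utf8(src_file, dst_file, 'Windows-1252')

# The new file should now decode cleanly as UTF-8.
with open(dst_file, 'r', encoding='utf-8', newline='') as f:
    round_trip = f.read()
```

Passing the encoding explicitly when writing is a deliberate choice in this sketch: it keeps the result the same even on a machine whose default text encoding is not UTF-8.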
" 478 | }, 479 | { 480 | "metadata": { 481 | "trusted": true, 482 | "collapsed": true, 483 | "_uuid": "6efe80da4e087451ce0b538fcc36759488a6ad4b" 484 | }, 485 | "cell_type": "code", 486 | "source": "", 487 | "execution_count": null, 488 | "outputs": [] 489 | } 490 | ], 491 | "metadata": { 492 | "language_info": { 493 | "name": "python", 494 | "version": "3.6.4", 495 | "mimetype": "text/x-python", 496 | "codemirror_mode": { 497 | "name": "ipython", 498 | "version": 3 499 | }, 500 | "pygments_lexer": "ipython3", 501 | "nbconvert_exporter": "python", 502 | "file_extension": ".py" 503 | }, 504 | "kernelspec": { 505 | "display_name": "Python 3", 506 | "language": "python", 507 | "name": "python3" 508 | } 509 | }, 510 | "nbformat": 4, 511 | "nbformat_minor": 1 512 | } -------------------------------------------------------------------------------- /Data Cleaning 2.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "metadata": { 5 | "_cell_guid": "b91a74ba-85f4-486e-b5f9-d0898f0626bf", 6 | "_uuid": "6ac53f18b4f4ec0fc44348cedb5d1c319fa127c0" 7 | }, 8 | "cell_type": "markdown", 9 | "source": "### Previous days\n\n* [Day 1: Handling missing values](https://www.kaggle.com/rtatman/data-cleaning-challenge-handling-missing-values)\n___\nWelcome to day 2 of the 5-Day Data Challenge! Today, we're going to be looking at how to scale and normalize data (and what the difference is between the two!). To get started, click the blue \"Fork Notebook\" button in the upper, right hand corner. This will create a private copy of this notebook that you can edit and play with. Once you're finished with the exercises, you can choose to make your notebook public to share with others. :)\n\n> **Your turn!** As we work through this notebook, you'll see some notebook cells (a block of either code or text) that has \"Your Turn!\" written in it. 
These are exercises for you to do to help cement your understanding of the concepts we're talking about. Once you've written the code to answer a specific question, you can run the code by clicking inside the cell (box with code in it) with the code you want to run and then hit CTRL + ENTER (CMD + ENTER on a Mac). You can also click in a cell and then click on the right \"play\" arrow to the left of the code. If you want to run all the code in your notebook, you can use the double, \"fast forward\" arrows at the bottom of the notebook editor.\n\nHere's what we're going to do today:\n\n* [Get our environment set up](#Get-our-environment-set-up)\n* [Scaling vs. Normalization: What's the difference?](#Scaling-vs.-Normalization:-What's-the-difference?)\n* [Practice scaling](#Practice-scaling)\n* [Practice normalization](#Practice-normalization)\n\nLet's get started!" 10 | }, 11 | { 12 | "metadata": { 13 | "_cell_guid": "5cd5061f-ae30-4837-a53b-690ffd5c5830", 14 | "_uuid": "9d82bf13584b8e682962fbb96131f2447d741679" 15 | }, 16 | "cell_type": "markdown", 17 | "source": "# Get our environment set up\n________\n\nThe first thing we'll need to do is load in the libraries and datasets we'll be using. \n\n> **Important!** Make sure you run this cell yourself or the rest of your code won't work!" 
18 | }, 19 | { 20 | "metadata": { 21 | "_cell_guid": "135a7804-b5f5-40aa-8657-4a15774e3666", 22 | "_uuid": "835cbe0834b935fb0fd40c75b9c39454836f4d5f", 23 | "collapsed": true, 24 | "trusted": true 25 | }, 26 | "cell_type": "code", 27 | "source": "# modules we'll use\nimport pandas as pd\nimport numpy as np\n\n# for Box-Cox Transformation\nfrom scipy import stats\n\n# for min_max scaling\nfrom mlxtend.preprocessing import minmax_scaling\n\n# plotting modules\nimport seaborn as sns\nimport matplotlib.pyplot as plt\n\n# read in all our data\nkickstarters_2017 = pd.read_csv(\"../input/kickstarter-projects/ks-projects-201801.csv\")\n\n# set seed for reproducibility\nnp.random.seed(0)", 28 | "execution_count": 1, 29 | "outputs": [] 30 | }, 31 | { 32 | "metadata": { 33 | "_cell_guid": "604ac3a4-b1d9-4264-b312-4bbeecdeec00", 34 | "_uuid": "03ce3b4afe87d98f777172c2c7be066a66a0b237" 35 | }, 36 | "cell_type": "markdown", 37 | "source": "Now that we're set up, let's learn about scaling & normalization. (If you like, you can take this opportunity to take a look at some of the data.)" 38 | }, 39 | { 40 | "metadata": { 41 | "_cell_guid": "62b9f021-5b80-43e2-bf60-8e0d5e22d572", 42 | "_uuid": "032a618abb98a28e60ab84376cf21402178f995d" 43 | }, 44 | "cell_type": "markdown", 45 | "source": "# Scaling vs. Normalization: What's the difference?\n____\n\nOne of the reasons that it's easy to get confused between scaling and normalization is because the terms are sometimes used interchangeably and, to make it even more confusing, they are very similar! In both cases, you're transforming the values of numeric variables so that the transformed data points have specific helpful properties. The difference is that, in scaling, you're changing the *range* of your data while in normalization you're changing the *shape of the distribution* of your data. Let's talk a little more in-depth about each of these options. 
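To make that distinction concrete, here is a rough standard-library sketch. The helpers `min_max_scale` and `skewness` are hypothetical names written just for this illustration, and the log transform is only a stand-in for a shape-changing transformation; it is not the Box-Cox transformation imported above.

```python
import math
import random


def min_max_scale(values):
    # Rescale so the smallest value maps to 0 and the largest maps to 1.
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def skewness(values):
    # Population skewness: a rough measure of how lopsided the shape is.
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return sum((v - mean) ** 3 for v in values) / n / var ** 1.5


random.seed(0)
original = [random.expovariate(1.0) for _ in range(1000)]

scaled = min_max_scale(original)            # new range, same shape
logged = [math.log1p(v) for v in original]  # new shape (stand-in transform)

print('range after scaling:', min(scaled), max(scaled))
print('skewness (original, scaled, logged):',
      skewness(original), skewness(scaled), skewness(logged))
```

The scaled data should span 0 to 1 while keeping essentially the same skewness as the original, whereas the log-transformed data should come out noticeably less skewed.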
\n\n___\n\n## **Scaling**\n\nThis means that you're transforming your data so that it fits within a specific scale, like 0-100 or 0-1. You want to scale data when you're using methods based on measures of how far apart data points are, like [support vector machines, or SVM](https://en.wikipedia.org/wiki/Support_vector_machine) or [k-nearest neighbors, or KNN](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm). With these algorithms, a change of \"1\" in any numeric feature is given the same importance. \n\nFor example, you might be looking at the prices of some products in both Yen and US Dollars. One US Dollar is worth about 100 Yen, but if you don't scale your prices, methods like SVM or KNN will consider a difference in price of 1 Yen as important as a difference of 1 US Dollar! This clearly doesn't fit with our intuitions of the world. With currency, you can convert between currencies. But what if you're looking at something like height and weight? It's not entirely clear how many pounds should equal one inch (or how many kilograms should equal one meter).\n\nBy scaling your variables, you can help compare different variables on equal footing. To help solidify what scaling looks like, let's look at a made-up example. 
(Don't worry, we'll work with real data in just a second; this is just to help illustrate my point.)\n" 46 | }, 47 | { 48 | "metadata": { 49 | "_cell_guid": "e0942c00-e306-4c64-a53a-e76d07cd937f", 50 | "_uuid": "e35280c753de7b963c4d812624c816c766ef4367", 51 | "trusted": true 52 | }, 53 | "cell_type": "code", 54 | "source": "# generate 1000 data points randomly drawn from an exponential distribution\noriginal_data = np.random.exponential(size = 1000)\n\n# min-max scale the data between 0 and 1\nscaled_data = minmax_scaling(original_data, columns = [0])\n\n# plot both together to compare\nfig, ax=plt.subplots(1,2)\nsns.distplot(original_data, ax=ax[0])\nax[0].set_title(\"Original Data\")\nsns.distplot(scaled_data, ax=ax[1])\nax[1].set_title(\"Scaled data\")", 55 | "execution_count": 2, 56 | "outputs": [ 57 | { 58 | "output_type": "execute_result", 59 | "execution_count": 2, 60 | "data": { 61 | "text/plain": "Text(0.5,1,'Scaled data')" 62 | }, 63 | "metadata": {} 64 | }, 65 | { 66 | "output_type": "display_data", 67 | "data": { 68 | "text/plain": "", 69 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXcAAAEICAYAAACktLTqAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAIABJREFUeJzt3Xl8XGd56PHfM6ORRrtsS7bl3Uls\nx45DNmchOyTQJJCktw3gAAmhKbmFhkKB9lK4DZD29lMobaEkEExIQoEESELApCEb2cjmxHGcxXYW\n75Zlx7Js7Zr9uX+cM/JYHkkz0mxn9Hw/H30saY7OeUd+z6PnXc77iqpijDGmvPiKXQBjjDG5Z8Hd\nGGPKkAV3Y4wpQxbcjTGmDFlwN8aYMmTB3RhjypAF9xwQka+IyK25PjaDc6mIHJOLcxkzUSJyjYg8\nXYifFZHtInLheK41WVhwH8atZK+JyICI7BWRH4hI02g/o6r/oqp/mcn5szl2IkTkCREJiUiviPSI\nyEsi8mURqcriHPbHo0yJyNki8qyIdIvIARF5RkROLXa58mGy1mML7ilE5IvAN4G/AxqBM4D5wCMi\nUjnCz1QUroRZu15V64FW4IvASuABEZHiFssUk4g0APcD3wOmArOBbwDhYpbL5JYFd5db4b8BfFZV\nH1TVqKpuBz6ME+A/7h73dRG5R0R+JiI9wDXu936Wcq6rRWSHiHSKyD+mNiFTjxWRBW5W8QkR2Ski\n+0XkqynnOU1EnhORLhHZIyI3jfRHZjSq2q+qTwCXAe8GPjDW+UXkKffHXxGRPhH5iIhMEZH7RaRD\nRA66n8/Jtjym6BYDqOpdqhpX1UFVfVhVX00eICKfEpFNbstvo4ic7H7/yyKyJeX7/2uki4jIsSLy\niNsyeFNEPpzy2jQRWe22Kl8Ajh6twCJyVco99dVhr1k9TsOC+yFnAkHg16nfVNU+4PfA+1K+fTlw\nD9AE/Dz1eBFZBnwf+BhOxtyIkxmN5mxgCXABcIOILHW/Hwf+FmjGCcoXAJ/J8n2lvpedwFrgnLHO\nr6rnusecoKp1qvpLnPpyO84fu3nAIHDTeMtjiuYtIC4iPxGRi0VkSuqLIvIh4OvA1UADTlLQ6b68\nBaf+NOIkQz8TkdbhFxCRWuAR4E5gOnAl8H0ROc495GYghHOP/IX7kZZ7T/0AuAqYBUwDUoOx1eM0\nLLgf0gzsV9VYmtf2uK8nPaeqv1HVhKoODjv2CuB3qvq0qkaAG4CxFvD5hps9vQK8ApwAoKovqerz\nqhpzWxE/BM7L/q0dph2nKZ71+VW1U1XvVdUBVe0F/l8OymMKTFV7cBIKBX4EdLhZ9Az3kL8EvqWq\nL6pjs6rucH/2blVtd+v+L4G3gdPSXOaDwHZVvd2tX+uAe4ErRMQP/Dlwg9uqfB34yShFvgK4X1Wf\nUtUw8I9AIuX9WD1Ow4L7IfuB5hH60Fvd15N2jXKeWamvq+oAh7KekexN+XwAqAMQkcVuk3Gv2wX0\nLxz+R2Y8ZgMHxnN+EakRkR+6zeMe4Cmgyb1ZjYeo6iZVvUZV5wDLcertd9yX5+Jk6EdwuxzXu10g\nXe7Ppqsz84HTk8e5x34MmAm0ABUcfh/tGKW4w++pflLuKavH6VlwP+Q5nAGlP0v9ptu8vBj4Q8q3\nR8vE95DSZBSRapxm5Hj8AHgDWKSqDcBXgHEPhorIXOAU4I/jPP8XcbqPTnePTzZ5bYDWw1T1DeAO\nnEANTiA9og9cRObjZPrXA9NUtQl4nfT//7uAJ1W1KeWjTlU/DXQAMZw/IknzRinintRjRaSGw+8p\nq8dpWHB3qWo3Th/i90TkIhEJiMgC4G6gDfhphqe6B7hURM50B3W+wfgrTT3QA/SJyLHAp8dzEjdT\nOQ/4LfAC8ECG538HOGpYeQaBLhGZCnxtPOUxxeUOdH4xOYjo/
tG/EnjePeRW4Esicoo4jnEDey1O\nYtPh/twnOfQHYbj7gcXuQGjA/ThVRJaqahxnbOvrbt1cBnxilCLfA3xQnOmblcCNHB67rB6nYcE9\nhap+C+ev/rdxKssanAzkArevL5NzbAA+C/wCJ+PoBfYxvmlmXwI+6p7jR8Avs/z5m0SkF6dyfwen\nz/MiVU32V451/q8DP3Gb1R92z1GN00X1PPBgtm/IlIRe4HRgjYj04/xfvo6T0aKqd+P0Q9/pHvsb\nYKqqbgT+HaeV+w5wPPBMugu4fdnvx5l+247T9fhNIPmcxfU43Y97cVoNt49UWPee+mu3PHuAgzgJ\nV5LV4zTENuvILxGpA7pwmozbil0eY8zkYJl7HojIpW5zsxanFfAasL24pTLGTCYW3PPjcpymaDuw\nCFip1kQyxhSQdcsYY0wZsszdGGPKUNEWvWpubtYFCxYU6/KmzL300kv7VbWlGNe2um3yKdO6XbTg\nvmDBAtauXVusy5syJyKjPfGYV1a3TT5lWretW8YYY8qQBXdjjClDFtyNMaYMWXA3Zgwi8rciskFE\nXheRu0QkWOwyGTMWC+7GjEJEZgN/A6xQ1eWAH2e9FGNKmgV3Y8ZWAVS7a/3X4Dx5bExJs+BuzChU\ndTfO+kA7cVYk7FbVh4cfJyLXichaEVnb0dFR6GIacwQL7saMwt1f9HJgIc6OQLUi8vHhx6nqKlVd\noaorWlqK8uyUMYex4G7M6C4Etqlqh6pGcTaZOLPIZTJmTEV7QnWi7lyzc+jzj54+2g5dxkzITuAM\nd2u3QeACIG+Pn1q9Nrlimbsxo1DVNTjbvK3DWZffB6wqaqGMyYBnM3djCkVVv0aZ7rNpypdl7sYY\nU4YsuBtjTBmy4G6MMWXIgrsxxpQhC+7GGFOGLLgbY0wZsuBujDFlyIK7McaUIQvuxhhThiy4G2NM\nGbLgbowxZciCuzHGlCEL7sYYU4YsuBtjTBmy4G6MMWXIgrsxxpQhC+7GGFOGLLgbMwoRWSIi61M+\nekTk88UulzFjsW32jBmFqr4JnAggIn5gN3BfUQtlTAYsczcmcxcAW1R1R7ELYsxYLLgbk7mVwF3F\nLoQxmcgouIvIRSLypohsFpEvp3l9nog8LiIvi8irInJJ7otqTPGISCVwGXD3CK9fJyJrRWRtR0dH\nYQtnTBpjBne3n/Fm4GJgGXCliCwbdtj/BX6lqifhZDffz3VBjSmyi4F1qvpOuhdVdZWqrlDVFS0t\nLQUumjFHyiRzPw3YrKpbVTUC/AK4fNgxCjS4nzcC7bkrojEl4UqsS8Z4SCbBfTawK+XrNvd7qb4O\nfFxE2oAHgM/mpHTGlAARqQHeB/y62GUxJlOZBHdJ8z0d9vWVwB2qOge4BPipiBxxbuuXNF6kqgOq\nOk1Vu4tdFmMylUlwbwPmpnw9hyO7Xa4FfgWgqs8BQaB5+ImsX9IYYwojk+D+IrBIRBa6MwZWAquH\nHbMTZw4wIrIUJ7hbam6MMUUyZnBX1RhwPfAQsAlnVswGEblRRC5zD/si8CkReQVn0OkaVR3edWOM\nMaZAMlp+QFUfwBkoTf3eDSmfbwTOym3RjDHGjJc9oWqMMWXIgrsxxpQhC+7GGFOGLLgbY0wZsuBu\njDFlyIK7McaUIQvuxhhThspim7071+wc+vyjp88rYkmMyR2r12YiLHM3xpgyZMHdGGPKkAV3Y4wp\nQxbcjTGmDFlwN8aYMmTB3RhjypAFd2PGICJNInKPiLwhIptE5N3FLpMxYymL4N49GGV/X7jYxTDl\n67vAg6p6LHACzqY1BbGzs59oPFGoy5kyUhbB/dfr2lj11Fa7CUzOiUgDcC7wYwBVjahqVyGu3TUQ\n4YdPbeWRje8U4nKmzHg+uMfiCbZ39tMXjrF+Z0HuOTO5HIWzH/DtIvKyiNwqIrXDDxKR60RkrYis\n7ejIzfbBW/f3o8AL2w/QP
RDNyTnN5OH54N52cJBoXAn4hWe37i92cUz5qQBOBn6gqicB/cCXhx+k\nqqtUdYWqrmhpacnJhbd19BPwC5FYgrtf2pWTc5rJw/PBfev+fgQ4dcFU9vWECUXjxS6SKS9tQJuq\nrnG/vgcn2Ofd1v19LJpeT1N1gNd3dxfikqaMeD64b+/sZ0ZDkLlTa1BgR+dAsYtkyoiq7gV2icgS\n91sXABvzfd3uwSgHB6IsbK6lub6Kbfv7831JU2Y8vypk10CEmY3VNNdVAfCTZ7ezfHYjYCvpmZz5\nLPBzEakEtgKfzPcFuwYiALTUV9HZH+blnV38/PkdiIjVa5MRzwf3vnCMuqoKmusqAWxKpMk5VV0P\nrCjkNfvCMQC3blcRjiXoC8eoDwYKWQzjYZ7ulonFE4SiCeqq/FRV+GkIVtDRa8HdeF8yuNdWVdDi\ntko7LHExWfB0cD+U3TjZTHN9lWXupiwcCu5+muud4L6/N1LMIhmP8XRw7w87M2Pqqpzepea6Kstu\nTFnoC8WoDvip8PlorA5Q4RNLXExWPB3c+8LOgx11QSe4T62pJBRN2HRI43nJsSQAnwhTaio5OGCZ\nu8mcx4P7oUEngMZqp3ume9Ce5jPe1h+ODSUt4NTtHqvXJgseD+6Hd8s0uMG9J2Q3gfG2vnCM2qpD\nwb2huoKeUKyIJTJe4+3gHopS6fdRWeG8jWTm3jNoN4HxttRuGXASl95QlIRqEUtlvMTTwb0/Eqe2\nyj/0db3bjLVuGeNlh6b4pgT3YICEOgOtxmTC08G9L3R4dhPw+6ip9Fu3jPG04WNJkNIqtbptMuTt\n4B6OUTfsiT0beDJely64N9hkAZMlTwf3gUiMmoD/sO81BC24G28bjDgTBWoqD9XtQ+NJVrdNZjwd\n3MOxBMHA4W+hoTpg2Y3xtFDM2VEsmJK41FT68YvQbZMFTIY8G9wTqoRjCaqGZ+7VFfRH4sRsyz3j\nUWH3IbyqikO3p0+E+uoK63M3GcsouIvIRSLypohsFpEjdqFxj/mwiGwUkQ0icmdui3mkiJvdpN4A\nAI1uH3yvzSowHhVO1u3hrVLrcjRZGHPJXxHxAzcD78PZleZFEVmtqhtTjlkE/ANwlqoeFJHp+Spw\nUvIGCFYcnrknp0P2hi24G28KxZKZ+5F1e5+temoylEnmfhqwWVW3qmoE+AVw+bBjPgXcrKoHAVR1\nX26LeaTk+jHDs5u6oczdMhzjTeFogoBf8PvksO/XByusXpuMZRLcZwOpu/O2ud9LtRhYLCLPiMjz\nInJRuhPlcof4oabrSJm7dcsYjwrH4kfUa4D6YMAWxjMZyyS4S5rvDX8GugJYBJwPXAncKiJNR/xQ\nDneITw46DZ8tU1dVgWDB3eSOiGwXkddEZL2IrM339ULRxBFjSQD17rx325DGZCKT4N4GzE35eg7Q\nnuaY36pqVFW3AW/iBPu8CY2QuftEqK2y5qvJufeo6omqmvft9sKx+GHTIJOSrVLrdzeZyCS4vwgs\nEpGF7gbBK4HVw475DfAeABFpxumm2ZrLgg4XHqHPHZJ9k5a5G28Kj5S5u+NJHb2hQhfJeNCYwV1V\nY8D1wEPAJuBXqrpBRG4Ukcvcwx4COkVkI/A48Heq2pmvQsPIs2XACe59NlvG5I4CD4vISyJyXboD\ncj2eNPz5DTiUuVu3jMnEmFMhAVT1AeCBYd+7IeVzBb7gfhREcrpY5QgZzt5uy25Mzpylqu3uFN9H\nROQNVX0q9QBVXQWsAlixYsWE1uUNxeIE09TrWnc8ybplTCY8+4TqSNPFwBl46gvHiCds7Wszcara\n7v67D7gPZ3pw3oSjibTdjT4R6qoq2Ndjwd2MzbvBPRZP2yUDTvM1oXCg3/acNBMjIrUiUp/8HHg/\n8Hq+rqeqI06FhOSDTNYqNWPLqFumFIVGyG4gdeApTEt9VSGLZcrPDOA+EQHnfrlTVR/M18W
icSWh\npO2WAadud/RZ5m7G5tngPlZ2A7CvN8QyGgpZLFNmVHUrcEKhrhdOLj2QZkAVoC5Ywa4DA4UqjvEw\nD3fLpJ8uBocydxt4Ml4THmFBvKT6YAX7+8I2nmTG5N3gHk0/XQxsypjxrnD0yLXcU9W7e6naeJIZ\ni3eD+wjTxcDZSzUY8FlwN55zaEXIETL3qkNdjsaMxrPBfbQBVYC6qoDdAMZzkpn7WK1S63I0Y/Fs\ncI/EElT6098A4NwElrkbrxkaUPWPPRPMmNF4MrhH4wniqlRWpFuw0mEbGxgvirjbQwZGGVAFC+5m\nbJ4M7gPu7vCVI2Q34PRN7usJ46yMYIw3RN3ZMiPV7YDf5yQuPdblaEbnyeCe3KxgpOwGnObrYDRu\nC4gZT4nEnWQk3ZpJSdPrq6xVasbkyeCeUeZuzVfjQdF4Ar+kXzMpaXp90Oq1GZNHg7uTjQdGDe72\nIJPxnkgsQWCUsSSAFsvcTQY8GdwHIyMv95tkU8aMF0XiiVFbpJDslgnZeJIZlTeDe9S6ZUx5isYT\no7ZIAaY3VBGKJmw8yYzKk8F9IIPMvTrgp9LvsweZjKdEYolR6zU4fe5grVIzOk8G92S3zGgZjojQ\nUl9Fh21sYDwkkkHmnlzG2jbtMKPxZHDPJHMHG3gy3hPNKHN3g7u1Ss0oPBncM+lzB+cmsD534yXR\nuGZQr51uGavbZjTeDO7JqZAZTRmz7MZ4RyQ+dubeUF1BZYWtempG58ngPhCJ4xPwy+jBfXp9kIMD\nUSLuI93GjJeI+EXkZRG5P5/XicScjd/HKAstddblaEbn2eAe8PuQsYJ7g9M3ud/2nDQT9zlgU74v\nEs1gnjs4ddsydzMaTwb3wUh8zKYrpA482U1gxk9E5gAfAG7N53VU1X1CNbO6bV2OZjTeDO7ReEbZ\nzaEpY3YTmAn5DvD3wIj9eyJynYisFZG1HR0d47pIOJZAGXuiANhMMDM2Twb3gQwz95kNzqyCdyy4\nm3ESkQ8C+1T1pdGOU9VVqrpCVVe0tLSM61qZLKuRNLMhSNdAdGiFVGOG82RwH4zGxnzQA6C5roqA\nX9jdZcHdjNtZwGUish34BfBeEflZPi6UnOKbSd2e1VQNQHvXYD6KYsqAJ4P7QCSzbhmfT5jZGGRP\nt90AZnxU9R9UdY6qLgBWAo+p6sfzca1MlrJOam10gvuebktcTHqeDO6DkXhGg04AsxqrLbsxnpBN\nt8xsN3PfbXXbjMCbwT0ap3KMucBJs5qqabduGZMDqvqEqn4wX+fPZJ+CpBmNzmSBPVa3zQg8GdyT\n89wzMaspyN6eEPGErX1tStuhZTXGTlyqKvy01FdZq9SMyJPBPdN57uD0TcYTag98mJI3tNppxl2O\nQdptPMmMwHPBXVUznucO1jdpvCObAVVIdjlavTbpeS64R+IJ4gnNPHNvcua624wZU+oGopkPqILT\nKt3TbdvtmfQ8F9wz2agjVXI+cNtBC+6mtIWyrttBBiJxDg5E81ks41GeC+7ZNl0bggGm1ATYeWAg\nn8UyZsIy3YQmaf60WgCr2yatjGqRiFwkIm+KyGYR+fIox10hIioiK3JXxMMNPcWX4Q0AMG9aLTs7\n7QYwpW0gGqPCJ/jGWO00af60GgB2dPbns1jGo8aMkCLiB24GLgaWAVeKyLI0x9UDfwOsyXUhUw1m\nmbkDzJ9aw44DdgOY0jaYxRRfgHlTk8HdEhdzpExq0mnAZlXdqqoRnPU1Lk9z3D8B3wLy+lTFwNB0\nscyyG4AF02rYfXDQNu0wJS2bKb4AwYCfmQ1BC+4mrYoMjpkN7Er5ug04PfUAETkJmKuq94vIl0Y6\nkYhcB1wHMG/evOxLy6Gn+DLJ3O9csxOA9q4QCXWmQy5srh3XdY3Jt4FoZpl7sl4DzJtWw05rlZo0\nMkkT0qXIQ3OvRMQH/CfwxbFOlItlUUNZThcDmFpbCVj
fpCltTuaeeYsUnC7H7Za5mzQyiZBtwNyU\nr+cA7Slf1wPLgSfcZVHPAFbna1A129kyAFPrnOB+77rdh2U9xpSSgUgsq3oN0DUYpaM3zB3PbM9P\noYxnZVKTXgQWichCEanEWfZ0dfJFVe1W1WZVXeAui/o8cJmqrs1HgQeynAsMUF9VQaXfZ3upmpI2\nGE1kVa8BprmtUqvbZrgxa5KqxoDrgYdwNgj+lapuEJEbReSyfBdwuGyWRU0SEVrqbUNhU9oGI7Gs\n6jXA9HrnCWyr22a4TAZUUdUHgAeGfe+GEY49f+LFGlk2u9WkmtFQxeZ9ffkokjE5MRCJ01KX0S05\npLmuEp/AO7ZZthnGk0+oVvp9+H3ZDTxNrw/SE4oNZf7GlJpsNqFJqvD7mFpbxb4ey9zN4TwX3Acj\nMaor/Vn/3PR6Z3ODfZbhmBKVzWqnqWY0VFm9NkfwXHAfiMSpDowjuDc4fZP7rG/SZEFEgiLygoi8\nIiIbROQb+bhOcinrbLsbwUlcOvsihGPWKjWHeC+4R+PUjCNzb6oJEPAL+3oswzFZCQPvVdUTgBOB\ni0TkjFxfJBRNoJrdRIGk6Q1BFNjaYc9xmEM8F9xDkfi4umV8IsxoCNJuu8WbLKgjORIfcD9yvoB6\nNlvsDTfTbZVubO/JaZmMt3kuuA9Expe5g7MrU3vXIAnbT9VkQUT8IrIe2Ac8oqpHLI4nIteJyFoR\nWdvR0ZH1NYaW1RhH5t5SX0Wl38dru7uz/llTvrwX3KNxguPocwcnuIdjCbbbMgQmC6oaV9UTcZ7O\nPk1Elqc5ZkJLa2S7CU0qnwitjUEL7uYwngvug5HY+DP3Kc6uTHYTmPFQ1S7gCeCiXJ97PMtqpJo1\npZqN7T3ErVVqXN4L7tE4NZXZPeiRNL0+SIVPeN2Cu8mQiLSISJP7eTVwIfBGrq8znk1oUs1uqmYw\nGmdLhz2oZxzeC+7jHFAF8PuEmdZ8NdlpBR4XkVdx1ll6RFXvz/VFxrMJTarZ7l7Br7VZ3TaO8aXA\nRTTeee5Js5uq2bC7h0RC8WX5lKuZfFT1VeCkfF/n0CY04wvuLfVVVAf8vLa7mz8/ZU4ui2Y8ylOZ\ne/JBj/H2uYMT3HvDMRtUNSUlm01o0vGJcNysBmuVmiGeCu7hmPOgx3i7ZcAGVU1pGs8mNMMtn91o\ng6pmiKeCe7LpWjOBbpnp9UEqK3w2qGpKykRnywAcP7vRBlXNEI8Fd6fpOpHM3e8TlrZa89WUlmRw\nrxjHE6pJx89pBGxQ1Tg8FdyTMwrGOxUyKVjh4+WdXfakqikZg1FnooBPxh/cX9h2gIBfuGddWw5L\nZrzKU8G9fyi4jz9zh0NPqu44YBsLm9LQHx7/w3lJzpOq1bQfHMxRqYyXeSq4J7tlJpq5z2qyQVVT\nWgYjcWqqJhbcwV0/qXvQBlWNx4J72Mncayd4E8xosCdVTWnpj8SonWDSAk5wj8aVrTaoOul5K7hH\nc9MtM/Skqg08mRIxMIEnr1PNsqm+xuWt4B7OTbcMwKzGal5v70bVmq+m+AYi8Zxk7i11VQT8YsHd\neCu4JwdUc9V87Q3F2GmDqqYE5GJAFZxWaWujs8SGmdw8FdwHczDPPcmar6aUTHRZjVSzmqrZ0N5t\nU30nOU8F9/5InIBfJvSIdtKMBtu9xpSO/nCcmqrcrOM3u6ma/kicrftt/aTJzFPBfTAy/rXch6vw\n+Vgys95mzJiSMBCJUZuzzN3ZU9Xq9uTmqeCeq37JpOWzG3itzQZVTXElEs5qp9U5Slym1wepqrBW\n6WTnqeA+kc2x0zluViM9oRht9kSfKaJQLI4qOcvc/T7h2NYGNrRbcJ/MPBbcYznrlgFniVTAbgJT\nVP3h3Dy/kWr5rAY2tPdYq3QS81Rw789x5n7szHr8PmFDu00bM+mJyFwReVxENonIBhH5XK6vkasF\n8VIdN6uR3lCMXQe
sVTpZeSq4D0bi1OZoRgFAMODn6JZaC+5mNDHgi6q6FDgD+GsRWZbLC/S7U3wn\nuqxGquNmNQDWKp3MPBXc+yOxnMxxT7V8VqPNKjAjUtU9qrrO/bwX2ATMzuU1Du1TkLvEZYm1Sic9\nTwX3gXA8Z4NOSctmNbCvN0xHbzin5zXlR0QW4GyWvSbNa9eJyFoRWdvR0ZHVeQeGnrzOXd0OBvwc\n01Jnmfsk5q3gnuMB1TvX7GRvdwiw5qsZnYjUAfcCn1fVI9JhVV2lqitUdUVLS0tW5z40oJrbul1T\n6Wft9oM5O6fxFo8F99wOqAK0NjrLEFjz1YxERAI4gf3nqvrrXJ9/MJpcEC+3dXtWUzW94Rj7ekM5\nPa/xBs8E90gsQSyhOR1QBWedmqm1lZa5m7RERIAfA5tU9T/ycY2hzD2HA6oAre6Tqpa4TE6eCe5D\ng06B3N4AAK2NQbsBzEjOAq4C3isi692PS3J5gVztMDbcLLdVutHq9qSUUXAXkYtE5E0R2SwiX07z\n+hdEZKOIvCoifxCR+bku6NCgU46zG3AWWtrROUBPKJrzcxtvU9WnVVVU9V2qeqL78UAur5Gs27lO\nXIIBp1Vqs8EmpzGDu4j4gZuBi4FlwJVp5vm+DKxQ1XcB9wDfynVB85XdQMqeqrYzkymCgUic6oAf\nv09yfu7ZTdW8avV6Usokcz8N2KyqW1U1AvwCuDz1AFV9XFWTu148D8zJbTHz84h20twpNQCs22Ez\nC0zh5XpBvFTzptawu2uQd3psUHWyySS4zwZ2pXzdxugPcVwL/D7dCxOZC9znbrFXl+MBVXAGVRdN\nr2PdTgvupvD6wjHqgrmv1+AEd7DEZTLKJLinayumXY1IRD4OrAD+Ld3rE5kL3Ov2h9cHA1n9XKZO\nnjeFl3d12UJLpuB6QzHq8xTcW5uCVFb4LHGZhDIJ7m3A3JSv5wDtww8SkQuBrwKXqWrOH/fsCTmZ\ne75ugpPnN9E1EGVLh+1eYwqrNxSlvio/SUuFz8fxsxt5yTL3SSeT4P4isEhEFopIJbASWJ16gIic\nBPwQJ7Dvy30xnewG8hfcT1s4DYDntuzPy/mNGUlvKH/dMgCnLZzKq23dQ61fMzmMGdxVNQZcDzyE\ns2jSr1R1g4jcKCKXuYf9G1AH3O3OA149wunGLVkx89HnDrBgWg1zplTz5FsW3E1h5bNbBuCcRc3E\nEspzWzrzdg1TejKqUe683geGfe+GlM8vzHG5jtAbcmYUVPjz89yViHDu4hZ++/JuovEEgTxdx5jh\nekJRGvI0lgSwYv5Uair9/PHt/bz/uJl5u44pLZ6JYL2haF6zG4BzF7XQH4nbYkumYBIJpS+c38y9\nssLHu4+axpNvddiEgUnEQ8GAKNlnAAAQtUlEQVQ9lreZMuCsorene5CAX1j9yhHjxcbkRX8khmr+\nxpLAqduN1QF2Hhhg/a6uvF3HlBaPBff8Zu5VFX6Wz2rk/lfbCUXjeb2WMZA6USB/iQs4+wUH/MK9\n69ryeh1TOjwU3KN5vwEATpo3hd5QjP95dQ/gZD3JD2NyLd+zwJKCAT/HzWpk9fr2oQcCrW6XNw8F\n9/xn7gBHtdSytLWB/3z0LcveTd7l++G8VGcePY2eUIxbntiS92uZ4vNMcO8JxWgoQHD3ifDVS5bS\ndnCQ7zz6dt6vZya3QmXuAHOm1HD5ibP40R+32v4Fk4BngnuhumUAzl7UzMpT53LLk1t4eMNeYolE\nQa5rJp/kMtOFSFwAvnLJUqbWVvKJ215ka0dfQa5pisMTwT0SSxCOJajP0wNM6fzzny7nilPm8MRb\nHdz02GbaDg6M/UPGZKlQA6pJMxqC/PTa06mt8nPr09v49bo2wtb9WJY8EdyTA0CFaLomVfh9fPtD\nJ3D1GfMJxxKsemorj258p2DXN5NDMer2MdPrePBz53LOomZe2nGQHz29lc6+nC8HZ
" 70 | }, 71 | "metadata": {} 72 | } 73 | ] 74 | }, 75 | { 76 | "metadata": { 77 | "_cell_guid": "ed530656-2707-4978-835c-c665a9e25ec0", 78 | "_uuid": "a2523383e47af8d7902b75c5da7829b85553dcae" 79 | }, 80 | "cell_type": "markdown", 81 | "source": "Notice that the *shape* of the data doesn't change, but that instead of ranging from 0 to 8ish, it now ranges from 0 to 1.\n\n___\n## Normalization\n\nScaling just changes the range of your data. Normalization is a more radical transformation. 
The point of normalization is to change your observations so that they can be described as a normal distribution.\n\n> **[Normal distribution:](https://en.wikipedia.org/wiki/Normal_distribution)** Also known as the \"bell curve\", this is a specific statistical distribution where roughly equal numbers of observations fall above and below the mean, the mean and the median are the same, and there are more observations closer to the mean. The normal distribution is also known as the Gaussian distribution.\n\nIn general, you'll only want to normalize your data if you're going to be using a machine learning or statistics technique that assumes your data is normally distributed. Some examples of these include t-tests, ANOVAs, linear regression, linear discriminant analysis (LDA) and Gaussian naive Bayes. (Pro tip: any method with \"Gaussian\" in the name probably assumes normality.)\n\nThe method we're using to normalize here is called the [Box-Cox Transformation](https://en.wikipedia.org/wiki/Power_transform#Box%E2%80%93Cox_transformation). Note that Box-Cox only works on strictly positive values. 
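To make the idea behind Box-Cox more concrete, here is a minimal sketch in plain Python. It assumes strictly positive data and uses a coarse grid search to pick lambda, whereas `scipy.stats.boxcox` chooses lambda by maximizing the same log-likelihood with a numerical optimizer. The helper names (`box_cox`, `log_likelihood`, `skewness`) and the sample are illustrative assumptions, not part of the notebook.

```python
import math
import random
import statistics


def box_cox(values, lam):
    # Box-Cox transform for strictly positive values and a fixed lambda.
    if any(v <= 0 for v in values):
        raise ValueError('Box-Cox requires strictly positive values')
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def log_likelihood(values, lam):
    # Profile log-likelihood of lambda under a normal model (up to a constant).
    transformed = box_cox(values, lam)
    n = len(values)
    variance = statistics.pvariance(transformed)
    return -n / 2 * math.log(variance) + (lam - 1) * sum(math.log(v) for v in values)


def skewness(values):
    # Sample skewness: near 0 for symmetric data, positive for a long right tail.
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


random.seed(0)
# Hypothetical stand-in for the notebook's exponential sample.
data = [random.expovariate(1.0) for _ in range(1000)]

# Coarse grid search over lambda in [-2, 2]; an assumption made for simplicity.
candidates = [i / 100 for i in range(-200, 201)]
best_lam = max(candidates, key=lambda lam: log_likelihood(data, lam))
normalized = box_cox(data, best_lam)

print('chosen lambda:', best_lam)
print('skewness before:', round(skewness(data), 2))
print('skewness after:', round(skewness(normalized), 2))
```

The skewness printed after the transform should be much closer to zero than before, which is one rough way to see that the long right tail has been pulled in.
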
Let's take a quick peek at what normalizing some data looks like:" 82 | }, 83 | { 84 | "metadata": { 85 | "_cell_guid": "851dc531-ea15-46f4-ba59-2e9be614856c", 86 | "_uuid": "e1484f70203b1a9335a557939398beb45b3a4fbd", 87 | "scrolled": true, 88 | "trusted": true 89 | }, 90 | "cell_type": "code", 91 | "source": "# normalize the exponential data with boxcox\n# (stats.boxcox returns the transformed data and the fitted lambda)\nnormalized_data = stats.boxcox(original_data)\n\n# plot both together to compare\nfig, ax=plt.subplots(1,2)\nsns.distplot(original_data, ax=ax[0])\nax[0].set_title(\"Original Data\")\nsns.distplot(normalized_data[0], ax=ax[1])\nax[1].set_title(\"Normalized data\")", 92 | "execution_count": 3, 93 | "outputs": [ 94 | { 95 | "output_type": "execute_result", 96 | "execution_count": 3, 97 | "data": { 98 | "text/plain": "Text(0.5,1,'Normalized data')" 99 | }, 100 | "metadata": {} 101 | }, 102 | { 103 | "output_type": "display_data", 104 | "data": { 105 | "text/plain": "", 106 | "image/png": "" 107 | }, 108 | "metadata": {} 109 | } 110 | ] 111 | }, 112 | { 113 | "metadata": { 114 | "_cell_guid": "52011105-e1e3-4bb0-9b59-59614a96e3d4", 115 | "_uuid": "5975eb63a310ca983facc4a8b969e235fee58c74" 116 | }, 117 | "cell_type": "markdown", 118 | "source": "Notice that the *shape* of our data has changed. Before normalizing it was almost L-shaped. But after normalizing it looks more like the outline of a bell (hence \"bell curve\").\n\n___\n## Your turn!\n\nFor each of the following examples, decide whether scaling or normalization makes more sense.\n\n* You want to build a linear regression model to predict someone's grades given how much time they spend on various activities during a normal school week. You notice that your measurements for how much time students spend studying aren't normally distributed: some students spend almost no time studying and others study for four or more hours every day. Should you scale or normalize this variable?\n* You're still working on your grades study, but you want to include information on how students perform on several fitness tests as well. You have information on how many jumping jacks and push-ups each student can complete in a minute. However, you notice that students perform far more jumping jacks than push-ups: the average for the former is 40, and for the latter only 10. Should you scale or normalize these variables?" 
119 | }, 120 | { 121 | "metadata": { 122 | "_cell_guid": "fc728697-ce3e-4890-b14d-597b2281f30d", 123 | "_uuid": "0c4d06413046e632dd1936095028587af3be0e47" 124 | }, 125 | "cell_type": "markdown", 126 | "source": "# Practice scaling\n___\n\nTo practice scaling and normalization, we're going to be using a dataset of Kickstarter campaigns. (Kickstarter is a website where people can ask others to invest in various projects and concept products.)\n\nLet's start by scaling the goals of each campaign, which is how much money they were asking for." 127 | }, 128 | { 129 | "metadata": { 130 | "_cell_guid": "148dbbac-e38b-443f-8240-fee4259231e2", 131 | "_uuid": "9c6aaa573dbd346106b120c499b967718919d520", 132 | "trusted": true 133 | }, 134 | "cell_type": "code", 135 | "source": "# select the usd_goal_real column\nusd_goal = kickstarters_2017.usd_goal_real\n\n# scale the goals from 0 to 1\nscaled_data = minmax_scaling(usd_goal, columns = [0])\n\n# plot the original & scaled data together to compare\nfig, ax=plt.subplots(1,2)\nsns.distplot(usd_goal, ax=ax[0])\nax[0].set_title(\"Original Data\")\nsns.distplot(scaled_data, ax=ax[1])\nax[1].set_title(\"Scaled data\")", 136 | "execution_count": 4, 137 | "outputs": [ 138 | { 139 | "output_type": "execute_result", 140 | "execution_count": 4, 141 | "data": { 142 | "text/plain": "Text(0.5,1,'Scaled data')" 143 | }, 144 | "metadata": {} 145 | }, 146 | { 147 | "output_type": "display_data", 148 | "data": { 149 | "text/plain": "", 150 | "image/png": 
"
uHO3a\nIoZq0HuyRrxA7A0cBtws6cYy7rPAl4AfSJoH3Au8u0P1RbQsAR8B2L4aUD+T9xvNWiLaJZtoIiJq\nKgEfEVFTCfiIiJpKwEdE1FQCPiKipgYNeElnSnpA0i39TJekr0laLukmSbu1v8yIiGhVM2vwZwGz\nBph+ADC9PI4ATh1+WRERMVyDBrztRcDDAzSZDZztyrXAxJ5TuyMionPasQ1+KnBfw/MVZdzz5Ip7\nERGjpx0B39fZf+6rYa64FxExetoR8CuArRuebwWsbMN8IyJiGNoR8AuB95ejafYEHuu5QUJERHTO\noBcbk3QOsA8wSdIK4AvABADbpwGXAAcCy4EngQ+MVLEREdG8QQPe9qGDTDfwsbZVFBERbZEzWSMi\naioBHxFRUwn4iIiaSsBHRNRUAj4ioqYS8BERNZWAj4ioqQR8RERNJeAjImoqAR8RUVMJ+IiImkrA\nR0TUVAI+IqKmEvARETWVgI+IqKkEfERETSXgIyJqKgEfEVFTCfgIQNKZkh6QdEvDuM0lXS7pjvJz\ns07WGNGqBHxE5SxgVq9xRwNX2J4OXFGeR4wZCfgIwPYi4OFeo2cD88vwfODgUS0qYpgS8BH9e6nt\nVQDl55b9NZR0hKTFkhavXr161AqMGEgCPqINbJ9ue4btGZMnT+50ORFAAj5iIPdLmgJQfj7Q4Xoi\nWpKAj+jfQmBuGZ4LXNjBWiJa1lTAS5ol6XZJyyU970gCSYdLWi3pxvL4UPtLjRg5ks4BrgFeI2mF\npHnAl4C3SroDeGt5HjFmjB+sgaRxwClUHXwF8CtJC23f1qvpebaPHIEaI0ac7UP7mbTfqBYS0UbN\nrMHPBJbbvsv2n4BzqQ4fi4iILtZMwE8F7mt4vqKM6+1dkm6StEDS1n3NKIeSRUSMnmYCXn2Mc6/n\nFwHTbL8O+CnrTg557otyKFlExKhpJuBXAI1r5FsBKxsb2H7I9tPl6TeB3dtTXkREDFUzAf8rYLqk\n7SStD8yhOnzsL3qOFS7eASxrX4kRETEUgx5FY3utpCOBnwDjgDNt3yrpeGCx7YXAP0h6B7CW6noe\nh49gzRER0YRBAx7A9iXAJb3GHdMw/BngM+0tLSIihiNnskZE1FQCPiKiphLwERE1lYCPiKipBHxE\nRE0l4CMiaioBHxFRUwn4iIiaSsBHRNRUAj4ioqYS8BERNZWAj4ioqQR8RERNJeAjImoqAR8RUVMJ\n+IiImkrAR0TUVAI+IqKmEvARETWVgI+IqKkEfERETSXgIyJqKgEfEVFTCfiIiJpKwEdE1FQCPiKi\nppoKeEmzJN0uabmko/uYvoGk88r06yRNa3ehEZ0yWP+P6FaDBrykccApwAHADsChknbo1Wwe8Ijt\nVwEnAie0u9CITmiy/0d0pfFNtJkJLLd9F4Ckc4HZwG0NbWYDx5bhBcDJkmTbrRZ0/EW38e+/XsHa\nZ80fnl7LsRfdyvrj1mP8ODF+vfVYTyDBehKq6kFlnCjDTb5Xy8VF1/nG+3Zjx5dvOpJv0Uz/H9Sz\nz5o3nvAznvrzMzz5p2f48zPPcvzFtzJh3HpMGLce49ZT1bcb+nBj3+7p781Ivx77pm2xEfM/OHPY\n82km4KcC9zU8XwHs0V8b22slPQZsATzY2EjSEcAR5ekTkm7v5z0n9X5tlxoLdY6FGmGIde70j/1O\n2nY4xTRopv+Pxb7dLXVA99TSLXWwCCadPa/fWpru280EfF8rDr1XEpppg+3TgdMHfUNpse0ZTdTW\nUWOhzrFQI3R1nbXs291SB3RPLd1SB7SvlmZ2sq4Atm54vhWwsr82ksYDmwIPD7e4iC7QTP+P6ErN\nBPyvgOmStpO0PjAHWNirzUJgbhk+BPjZULa/R3ShZvp/RFcadBNN2aZ+JPATYBxwpu1bJR0PLLa9\nEDgD+I6k5VRr7nOGWdegX3W7xFiocyzUCF1aZ3/9fxi
z7Jbfs1vqgO6ppVvqgDbVoqxoR0TUU85k\njYioqQR8RERNdTTgx8IlEJqo8XBJqyXdWB4f6kCNZ0p6QNIt/UyXpK+V3+EmSbuNdo2ljsHq3EfS\nYw3L8pjRrnGohtOXJX2mjL9d0t+MQi2fknRb6QtXSNq2YdozDct/WDuTh/PZkTRX0h3lMbf3a0eg\nlhMb6vitpEcbprVzmQz5szqkZWK7Iw+qHVZ3Aq8A1geWAjv0avP3wGlleA5wXhfWeDhwcqeWY6nh\nTcBuwC39TD8QuJTqmO49geu6tM59gIs7uSxHsJ/02ZepLn+wFNgA2K7MZ9wI1/IW4MVl+KONnyvg\niVFcJn1+doDNgbvKz83K8GYjWUuv9h+n2pne1mVS5jWkz+pQl0kn1+D/cgq47T8BPaeAN5oNzC/D\nC4D9JDV7xvZo1dhxthcx8HkHs4GzXbkWmChpyuhUt04TdY5Vw+nLs4FzbT9t+25geZnfiNVi+0rb\nT5an11Id299uw/ns/A1wue2HbT8CXA7MGsVaDgXOGcb79WsYn9UhLZNOBnxfp4BP7a+N7bVAzyUQ\nRkszNQK8q3ydWiBp6z6md1qzv0c32EvSUkmXStqx08U0aTh9ud1/m1bnN49qjbHHhpIWS7pW0sGj\nUEdfn52OLZOyuWo74GcNo9u1TJrRX61DWiadDPi2XQJhBDXz/hcB02y/Dvgp69bSukmnl2OzbgC2\ntb0z8HXggg7X06zh9OV2/22anp+kvwNmAP+vYfQ2rk6Rfy9wkqRXjmAd/X12OrZMqDafLbD9TMO4\ndi2TZrS1n3Qy4MfCJRAGrdH2Q7afLk+/Cew+SrW1Ykycbm97je0nyvAlwARJkzpcVjOG05fb/bdp\nan6S9gc+B7yjof9ie2X5eRdwFbDrSNUxwGenI8ukmEOvzTNtXCbN6K/WoS2Tdu08GMLOhvFUOwq2\nY92Ojx17tfkYz90x9YMurHFKw/DfAtd2aHlOo/8dN2/juTturu/g332gOl/GupPvZgL39jzv5sdw\n+jKwI8/dyXoXw9vJ2kwtu1LtdJzea/xmwAZleBJwBwPsjGxDHX1+dqh2JN5d6tmsDG8+ksuktHsN\ncE9jn2vnMmnyM9DnZ3Woy6TTH4wDgd+Wzva5Mu54qrUKgA2B86l2PF0PvKILa/y/wK2l01wJvLYD\nNZ4DrAL+TPWffh7wEeAjZbqoblpxJ3AzMKNDf+/B6jyyYVleC7yhk/2zzf2k375MtSZ9J3A7cMAo\n1PJT4H7gxvJYWMa/ofSPpeXnvBGuo9/PDvDBsqyWAx8Y6WVSnh8LfKnX69q9TIb8WR3KMsmlCiIi\naipnskZE1FQCPiKiphLwERE1lYCPiKipBHx0ncEuyNSr7TaSrpT063JG5IGjUWPEWJCAj250Fs1f\ne+R/UR1TvivV8eXfGKmiIsaaBPwIk3TPaJyNKelYSZ8ewfnvI+nikZp/I/dxQSZJr5T0Y0lLJP1C\n0mt7mgMvKcOb0oVn6EZ0yqD3ZI36kTTe1QWvxpLTqU4GuUPSHlRr6vtSnZxymaSPAxsB+3euxIju\nkoBvQrk5w8W2dyrPPw1sTLWW+RFgLXCb7TmStqA6W20y1RmLA17eWNLngfdRXSnuQWCJ7X+RtAtw\nGvBiqrPaPmj7EUkfBo6gOuV6OXCY1136daD3uQr4T2BvYKGks8v8tylNPmn7l5JmAicBLwKeojpj\n7vZBF9IIkrQx1RmF5zdcLXqD8vNQ4CzbX5G0F9XN33ey/WwHSo3oKtlEMzxHA7u6uhreR8q4LwBX\nl23CC1kXoM8jaQbwLqprg7yT6sp+Pc4G/qnM++YyX4Af2X69qysuLqM61blZE22/2fZXgH8FTrT9\n+lLDt0qb3wBvKvUfA3yxhfmPlPWAR23v0vDYvkybB/wAwPY1VJcEGAsXKIsYcVmDH56bgO9JuoB1\nl7Z9E1VYY/s/JD0
ywOvfCFxo+ykASReVn5tShfHPS7v5VNcxAdhJ0j8DE6m+RfykhXrPaxjeH9ih\nYY34JZI2odqOPV/SdKrt2xNamP+IsL1G0t2S3m37/HKjjNfZXkp1QbL9gLMkbU8V8Ks7WW9Et8ga\nfHPW8txltWH5+TaqCwPtDiwpl4GF5q9dPZS7U50FHGn7r4DjGmppxh8ahtcD9mpYI55q+3HgfwNX\nls1Rb29x/m0h6RzgGuA1klZImke1GWuepKVUF6jquSPPUcCHy/hzgMOdCyxFAAn4Zt0PbClpC0kb\nAAdRLbutbV8J/CPr1qgXUYURkg6gurRnf64G3i5pw7Kd+W0Ath8DHpH016XdYUDP2vwmwCpJE3re\nZ4guo7p6I6XWXcrgpsB/leHDhzH/IbN9qO0ptifY3sr2Gbbvtj3L9s62d7B9fGl7m+29y/hdbF/W\niZojulE20TTB9p8lHQ9cR3Ud5t9Q3cj3u2Vziqi2Zz8q6TjgHEk3UIXyvQPM91flLu1Lgd8Bi6lu\n5QYwFzhN0ouprmX9gTL+86WO31Ftm99kiL/WPwCnSLqJqh8sotqP8GWqTTSf4rm3LYuIMSaXC+4w\nSRvbfqIE+SLgCNs3dLquiBj7sgbfeadL2oFqW/f8hHtEtEvW4EdBOTb+ij4m7Wf7oTa/1ylUx7o3\n+lfb327n+0RE90vAR0TUVI6iiYioqQR8RERNJeAjImoqAR8RUVP/H4HNQpTW/jFnAAAAAElFTkSu\nQmCC\n" 151 | }, 152 | "metadata": {} 153 | } 154 | ] 155 | }, 156 | { 157 | "metadata": { 158 | "_cell_guid": "a26e3103-83d7-41b1-a1f6-437ea51becd9", 159 | "_uuid": "71d69ec4508b1b7048cd9592605e17884e6aed25" 160 | }, 161 | "cell_type": "markdown", 162 | "source": "You can see that scaling changed the scales of the plots dramatically (but not the shape of the data: it looks like most campaigns have small goals but a few have very large ones)" 163 | }, 164 | { 165 | "metadata": { 166 | "_cell_guid": "0d896669-f8a7-4a78-a22f-278065c33724", 167 | "_uuid": "6ab743a9bb0a40ca7921fc506f39f41217e47ab3", 168 | "trusted": true 169 | }, 170 | "cell_type": "code", 171 | "source": "# Your turn! \n\n# We just scaled the \"usd_goal_real\" column. 
What about the \"goal\" column?\n\n# select the goal column\nusd_goal = kickstarters_2017.goal\n\n# scale the goals from 0 to 1\nscaled_data = minmax_scaling(usd_goal, columns = [0])\n\n# plot the original & scaled data together to compare\nfig, ax=plt.subplots(1,2)\nsns.distplot(kickstarters_2017.goal, ax=ax[0])\nax[0].set_title(\"Original Data\")\nsns.distplot(scaled_data, ax=ax[1])\nax[1].set_title(\"Scaled data\")\n", 172 | "execution_count": 5, 173 | "outputs": [ 174 | { 175 | "output_type": "execute_result", 176 | "execution_count": 5, 177 | "data": { 178 | "text/plain": "Text(0.5,1,'Scaled data')" 179 | }, 180 | "metadata": {} 181 | }, 182 | { 183 | "output_type": "display_data", 184 | "data": { 185 | "text/plain": "", 186 | "image/png": "[base64 PNG data omitted: side-by-side distribution plots titled 'Original Data' and 'Scaled data']" 187 | }, 188 | "metadata": {} 189 | } 190 | ] 191 | }, 192 | { 193 | "metadata": { 194 | "_cell_guid": "cfe26726-a690-4d65-ba37-b0940a411747", 195 | "_uuid": "e19939624c42f1e3a0ca371d883ac417adb31ab7" 196 | }, 197 | "cell_type": "markdown", 198 | "source": "# Practice normalization\n___\n\nOK, now let's practice normalization. We're going to normalize the amount of money pledged to each campaign." 
199 | }, 200 | { 201 | "metadata": { 202 | "_cell_guid": "ae7450a4-bf5e-41ac-9aa6-d98451969535", 203 | "_uuid": "4b45fd281c4b2004ad9e02b7b4391100cca7023a", 204 | "trusted": true 205 | }, 206 | "cell_type": "code", 207 | "source": "# get the index of all positive pledges (Box-Cox only takes positive values)\nindex_of_positive_pledges = kickstarters_2017.usd_pledged_real > 0\n\n# get only positive pledges (using their indexes)\npositive_pledges = kickstarters_2017.usd_pledged_real.loc[index_of_positive_pledges]\n\n# normalize the pledges (w/ Box-Cox)\nnormalized_pledges = stats.boxcox(positive_pledges)[0]\n\n# plot both together to compare\nfig, ax=plt.subplots(1,2)\nsns.distplot(positive_pledges, ax=ax[0])\nax[0].set_title(\"Original Data\")\nsns.distplot(normalized_pledges, ax=ax[1])\nax[1].set_title(\"Normalized data\")", 208 | "execution_count": 6, 209 | "outputs": [ 210 | { 211 | "output_type": "execute_result", 212 | "execution_count": 6, 213 | "data": { 214 | "text/plain": "Text(0.5,1,'Normalized data')" 215 | }, 216 | "metadata": {} 217 | }, 218 | { 219 | "output_type": "display_data", 220 | "data": { 221 | "text/plain": "", 222 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAZ0AAAEXCAYAAAB29JkcAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAIABJREFUeJzt3XmcXFWd+P3Pt6q6el/S3UnIQhYk\nCEFlMT/AHYliUCHMI44BRtAfMzgz4vg86m+EnwMCAz4PzjyDoygzKCiCMWAcxigRRBYVhISwk4RA\nk4Qme3d6X6qX6u/vj3uqqa5UdVV3V9Wt6v6+X696pfrec885VblV3zrLPVdUFWOMMSYfAn5XwBhj\nzMxhQccYY0zeWNAxxhiTNxZ0jDHG5I0FHWOMMXljQccYY0zeWNCZxkTkf4vIj7KdNoO8VESOzUZe\nxuSCiFwrIne754tEpEdEglkuY7eIfCTDtJ8TkcezWX6hsqBTJNxJ+ZKI9InIARG5VUTqxjtGVb+l\nqn+dSf4TSTsVIvKYiEREpFtEukTkGRG5UkRKJ5CHBbUC575wD4pIZdy2vxaRx3ysVlKq2qyqVaoa\n9bsumYgPmMXIgk4REJGvAjcB/wuoBc4AFgMPiUg4xTGh/NVwwq5Q1WpgHvBVYA2wUUTE32qZLAsB\nX55qJuKx76ppwv4jC5yI1ADXAV9S1QdUdUhVdwN/iRd4/sqlu1ZE1ovI3SLSBXwu8ReRiFwiIm+I\nyGERuTq++Z/Q3bDEtSYuFZFmEWkVkW/E5XOaiDwpIh0isl9EbkkV/Majqr2q+hhwHvAe4BPp8heR\nP7rDX3BdIp8RkVki8hsRaRGRdvd84UTrY7LuX4CvpWqRi8h7ReRpEel0/743bt9jInKjiDwB9AHH\nuG03iMif3f/9r0WkQUR+5lrNT4vIkrg8/l1E3oxrUX8gRT1i53tIRN7j8o49IiKy26ULuFb56+4z\ndK+I1Mfl89m4z9c3kpUVl7ZBRDa4um0G3pawP2ndRWQV8L+Bz7j6veC2f15EtrsehJ0i8oXxyveT\nBZ3C916gDPiv+I2q2gP8Fvho3ObVwHqgDvhZfHoRWQ78ALgYr4VRCyxIU/b7gbcDK4FrROQEtz0K\n/D9AI16wWAn8/QRfV/xraQa2ALEvhZT5q+oHXZqTXJfIPXjn8Y/xgvAioB+4ZbL1MVmzBXgM+Fri\nDvdlfT/wXaAB+DfgfhFpiEv2WeByoBp4w21b47YvwPuifhLv/74e2A58M+74p4GT3b61wC9EpGy8\nCqvqk+68qgJmAU8BP3e7/wE4H/gQMB9oB77vXs9y4FZXt/nuNY33w+f7QATvs/g/3SNe0rqr6gPA\nt4B7XD1PcukPAZ8EaoDPAzeLyKnjvVa/WNApfI1Aq6oOJ9m33+2PeVJV/1tVR1S1PyHtBcCvVfVx\nVR0ErgHSLbx3nar2q+oLwAvASQCq+oyqPqWqw67V9Z94H8Sp2If3AZtw/qp6WFV/qap9qtoN3JiF\n+pjsuAb4kojMTtj+CeA1Vb3L/T//HHgFODcuzU9UdavbP+S2/VhVX1fVTrwfXa+r6u/d5+MXwCmx\ng1X1bnduDKvq/w+U4v2IytR3gV4g1mr5AvANVd2jqgPAtcAFriv7AuA3qvpHt+9qYCRZpuJNWPgU\ncI1r7b8M3BmfZqJ1V9X73fuiqvoH4He89SOuoFjQKXytQGOKMZp5bn/Mm+PkMz9+v6r2AYfTlH0g\n7nkfUAUgIse5LqwDrivvW4wNfpOxAGibTP4iUiEi/+m6NrqAPwJ1kuXZSGbi3Bfqb4ArE3bN563W\nS8wbjG19JzufD8Y970/yd1XsDxH5quty6hSRDrzWfUbnqeueOhO4SFVjwWMxcJ/r9u3Aa1lFgbkc\n+fnqJfXnazbeeFf86xvzXky07iJyjog8JSJtLv3HM32t+WZBp
/A9CQwA/1f8RvFmBZ0DPBy3ebyW\ny37imvsiUo7XBTAZt+L9Kl2mqjV4fcyTngQgIkcD7wb+NMn8v4r3K/B0lz7WBWcTEwrDN4G/YWxA\n2Yf3JR5vEbA37u9JL4HvxkC+jjf2OUtV64BOMjgn3LH/DKx2LaqYN4FzVLUu7lGmqnvxPl9Hx+VR\nQerPVwswHJ8e77VnWvcx74t4Mz9/CfwrMNel35jJa/WDBZ0C507664DvicgqESlxg6W/APYAd2WY\n1XrgXDd4G3Z5TvakrAa6gB4ROR74u8lk4looHwJ+BWzG+6Bkkv9B4JiE+vQDHW6s4JuYgqGqTcA9\neGMiMRuB40TkIjeA/xlgOV6rKBuq8b7YW4CQiFyDN94xLvcD6B7gElV9NWH3fwA3ishil3a2iKx2\n+9YDnxSR97vP1/Wk+H51U7P/C7jWfQaWA5dOoO4HgSXy1oy+MF73WwswLCLnAGene61+saBTBFT1\n23i/9v8V78t4E96vrpWu/ziTPLYCXwLW4f0q68YbfMzo+ARfAy5yefwQ70M6EbeISDfeh+c7eL/S\nVsV1Y6TL/1rgTtfN8Zcuj3K8rsangAcm+oJMzl0PjF6zo6qH8Qa+v4rXDfWPwCdVtTX54RP2IN6Y\nz6t4XVcRxu9+jlkJHAWsj5vBttXt+3dgA/A7d/4+BZzuXs9W4It4g/778SYZ7BmnnCvwugIPAD/B\nmwyRad1/4f49LCLPunHMfwDudeVe5OpZkMRu4jYziUgV0IHXhbXL7/oYY2YGa+nMICJyrmvOV+K1\nml4CdvtbK2PMTGJBZ2ZZjTeAuw9YBqxRa+oaY/LIuteMMcbkjbV0jDHG5E0hLwrpi8bGRl2yZInf\n1TDT1DPPPNOqqolX5+ecndcmlyZyXlvQSbBkyRK2bNnidzXMNCUiiVfh54Wd1yaXJnJeW/eaMcaY\nvMko6Lgr4XeISJOIJK6hhIiUisg9bv8mGbu8+FVu+w4R+Vi6PEVkqcvjNZdneLwyROSj4i39/ZL7\n96y4vB5zZTzvHnMm/hYZY4zJlrRBxy2a+H28db6WAxe6ZRviXQa0q+qxwM14NxyLLfe9BjgRWAX8\nQESCafK8CbhZVZfhXV172Xhl4F2Ffq6qvhNvKYnEZWEuVtWT3eNQ2nfEGGNMzmTS0jkNaFLVnW5J\n/HV413vEW81bS3OvB1aKiLjt61R1wF313uTyS5qnO+Yslwcuz/PHK0NVn1PVfW77VqBMJnDrY2OM\nMfmTSdBZwNh1f/Zw5M2/RtO4+1p04q2wmurYVNsbgI64e8fEl5WqjHifAp5LWI/sx65r7WoX1Iwx\nxvgkk6CT7Is68YrSVGmytT1tPUTkRLwut/jbtF7sut0+4B6fTZIHInK5iGwRkS0tLS3JkhhjjMmC\nTILOHsbe92Eh3jIqSdOId7OxWrwbcqU6NtX2Vrybb4USto9XBiKyELgPbzny12OZuvtc4FZhXYvX\nrXcEVb1NVVeo6orZs/N+CYUxxswYmQSdp4FlblZZGG9iQOKy2Rt4634QFwCPuDW9NgBr3MyzpXjr\nfW1Olac75lGXBy7PX41XhojU4d1r/SpVfSJWIXePjkb3vARvGfWXM3i9xhhjciRt0HHjJ1fg3eNh\nO3Cvqm4VketF5DyX7HagQUSagK/gbk3r7jFxL7AN7x4nX1TVaKo8XV5fB77i8mpweacsw+VzLHB1\nwtToUuBBEXkReB7vjoQ/nPhbZIwpdE80tbLqO39k8642v6ti0rAFPxOsWLFCU125vXZTMwAXnb4o\n6X5j0hGRZ1R1hXu+Cu/GYEHgR6r6/yWk/SDeDerehbci+Pq4fZcC/+T+vEFV72Qc453X08Gld2zm\nD6+2EA4GePR/ncmCunK/qzSjxJ/X6diKBMb4IMPr35qBz+GNR8YfG7sl9+l445TfFJFZua5zoTrQ\nGeFPr7XwkRPmMBgdYctua
+0UMgs6xvgj7fVvqrpbVV8ERhKO/RjwkKq2qWo78BDexdcz0q9f2MeI\nwjvm1xIOBti2r8vvKplxWNAxxh+ZXP82pWNnyqUAf2pqZU51KXNqymisDvPw9kOjXeGm8FjQMcYf\nmVz/NqVjZ8KlAIPDIzy9q41jZlcBML+2nH2d/dhYdeGyoGOMPzK5/i0Xx04rL+7poH8oyttmVwIw\nr66cvsEoXZHhNEcav1jQMcYfmVz/lsqDwNkiMstNIDjbbZtx/vz6YURgaaMXdObXlgGwp73Pz2qZ\ncVjQMcYHmVz/JiL/Q0T2AJ8G/lNEtrpj24B/xgtcTwPXu20zzp9fb2X5vBoqwt4iJgvqygkFhN2t\nvT7XzKRidw41xiequhHYmLDtmrjnT+N1nSU79g7gjpxWsMBFhqI829zBJWcsHt0WCgZYVF/BLgs6\nBctaOsaYovTsG+0MDo/w3mPHLja/dHYl+zsj3P6nXTaLrQBZ0DHGFKUf/mknAYHdrWPHb5Y2VqJA\nU0uPPxUz47LuNWNM0Yi1XIajIzzb3MHihkrKSoJj0iyur6ShMszvtx1k+bwaP6ppxmEtHWNM0dm8\nu43O/iHOfPuR1x8FA8In3jmPlp4Bnn+zw4famfFY0DHGFJWndh5m40v7OaaxkmPdRaGJ3n5UNeFg\ngAOd/XmunUnHuteMMUVjeGSEB7YeYEljJX91+mJS3YFeRKivDNPWO5jnGpp0rKVjjCkazW193oy1\nYxqOGMtJNKuihPa+oTzVzGTKgo4xpmg0HewhIIyutTaeWEvH1mErLBZ0jDFF47VDPRxdX5G2lQMw\nqzLMYHSEw9bFVlAs6BhjisLAcJR9Hf0sbajMKH19RRiAN9tsHbZCYkHHGFMUmg/3ocDs6tKM0s+q\n9IJOswWdgmJBxxhTFF5v8dZTyzjoWEunIFnQMcYUhZ2t3rI2jVWZBZ1wKEB5SZBD3QO5rJaZIAs6\nxpiisLOll+qyUEaTCGIqwkE6bNp0QbGgY4wpCq+39GTcyokpDwfp6LegU0gs6BhjCp6qsrOlN+Px\nnJiKcJDOPpsyXUgs6BhjCt6+zgid/UPMnWDQKS8J2qoEBcaCjjGm4D29y7sb9+IMr9GJqQiH6LCW\nTkGxoGOMKXibd7dRXRriqNqyCR1XHg7SFRkmOmJL4RQKCzrGmIK3ZXcbpy6eRSDFqtKpVIS9mW5d\nNpmgYFjQMcYUtAOdEV492MNpS+snfGy5m15tM9gKhwUdY0xB+9ff7SAcDHDuu+ZP+NhYS6fdxnUK\nhgUdY0zB2tnSwy+f3cPn37eERQ0VEz6+Iuzdp7LTZrAVDAs6xpiC9URTK6pw8emLJ3V8eTjWvWYt\nnUJhQccYU7A2725nbk0pR9eXT+r4itiYjrV0CkbI7woYY0wyP3vqDf6w4xCLGyr5+eY3J5VH2eiY\njgWdQpFRS0dEVonIDhFpEpErk+wvFZF73P5NIrIkbt9VbvsOEflYujxFZKnL4zWXZ3i8MkTkoyLy\njIi85P49Ky6vd7vtTSLyXZEJzrc0xvimvW+IrsgwSxondkFovIAINWUhWwqngKQNOiISBL4PnAMs\nBy4UkeUJyS4D2lX1WOBm4CZ37HJgDXAisAr4gYgE0+R5E3Czqi4D2l3eKcsAWoFzVfWdwKXAXXH1\nuhW4HFjmHqvSviPGmIKwt6MfgEX1E59AEG9WZdimTBeQTFo6pwFNqrpTVQeBdcDqhDSrgTvd8/XA\nSteqWA2sU9UBVd0FNLn8kubpjjnL5YHL8/zxylDV51R1n9u+FShzraJ5QI2qPqmqCvw0Li9jTIFr\n7fHugzN7gitLJ6orL7ExnQKSSdBZAMR3qO5x25KmUdVhoBNoGOfYVNsbgA6XR2JZqcqI9yngOVUd\ncOn3pKk3ACJyuYhsEZEtLS0tyZIYY/LscM8ANWUhwqGpzXeqrQjb+msFJJP/zWTjIIkLGaV
Kk63t\naeshIifidbl9IZP0Yzaq3qaqK1R1xezZs5MlMcbkWWvPIA1TbOWAa+lY91rByCTo7AGOjvt7IbAv\nVRoRCQG1QNs4x6ba3grUuTwSy0pVBiKyELgPuERVX49LvzBNvY0xBepwzwANleEp51NXYd1rhSST\noPM0sMzNKgvjTQzYkJBmA94gPsAFwCNuHGUDsMaNsSzFG8zfnCpPd8yjLg9cnr8arwwRqQPuB65S\n1SdiFVLV/UC3iJzhxoouicvLGN9NdlaoiJSIyJ1uZuZ2Ebkq33XPta7IEL2D0QnfKTSZuoowXZEh\nW2m6QKQNOm785ArgQWA7cK+qbhWR60XkPJfsdqBBRJqArwBXumO3AvcC24AHgC+qajRVni6vrwNf\ncXk1uLxTluHyORa4WkSed485bt/fAT/Cm8DwOvDbib09xuTGVGaFAp8GSt2MzXcDX4i/TGE62N3a\nC0BDVRZaOuUlqEJ3xFo7hSCji0NVdSOwMWHbNXHPI3gfhGTH3gjcmEmebvtOvNltiduTlqGqNwA3\npCh7C/COZPuM8dnoDE4AEYnNCt0Wl2Y1cK17vh64xbXaFah03czlwCDQlad658Wu0aCTjZZOCeBd\n91NXMfUgZqbGlsExxh9TmRW6HugF9gPNwL+qaltiAcU8K3P7/m6CIjRmaUwHsBlsBcKCjjH+mMqs\n0NOAKDAfWAp8VUSOOSJhEc/KfHlvJ3NrSwkFp/4VVVvuBS6bwVYYLOgY44+pzAq9CHhAVYdU9RDw\nBLAi5zXOE1Xlpb2dLKib3CKfiWa5lo7d3qAwWNAxxh9TmRXaDJwlnkrgDOCVPNU75/a099PZP8T8\nLAWd2DiO3citMFjQMcYHU5kVijfrrQp4GS94/VhVX8zrC8ihl/Z2AmStpVNT5s2Xsmt1CoPd2sAY\nn0x2Vqiq9iTbPl080dRKWUmAo2rKspJfKBiguixEp43pFARr6RhjCsbAcJRfv7CPVScelZVJBDHe\nqgTWvVYILOgYYwrGw9sP0RUZZlYWpkrHm1URps261wqCBR1jTMH4r2f3UFMW4m2zq7Ka77zaMva5\n+/MYf1nQMcYUhNaeAR7b0cLJR9cRyPJNfhfOqmBPex/e5D/jJws6xpiCsOH5fQyPKCcvmpXVfNdu\nauZgV4TI0AiHe21cx28WdIwxBeGBlw9wwryarM1aizfLXauzp9262PxmQccY47s7Ht/FljfamFM9\n9QU+k4kFnTfb+nKSv8mcBR1jjO9eb+lhROG4udU5yT+2FI61dPxnQccY47vXDvZQGgqwqL4iJ/mX\nlgSpCAfZ024tHb9Z0DHG+GpweISX93WybE4VwUB2Z63Fm1URtpZOAbBlcIwxeTcyonzjv1/mhHnV\n1FeG6RuMsmJJfU7LnFUZZvfh3pyWYdKzoGOMybudrb38fHMzAAHxbil97JzsXhCaaE51Kdv2dRIZ\nilJWEsxpWSY1CzrGmLx7trkdgC+vXMaBzgglQcn6BaGJ5lSXMqLerbBPmFeT07JMahZ0jDF5tXZT\n8+hyN19euYxAQFi7qTnn5c6p9q7/ee1QjwUdH9lEAmNM3jW39XHKolkEcjhxIFFjVRgB7nt2b16C\nnEnOgo4xJq8iQ1Faugc4NcvL3aQTCgaorwzT0h3Ja7lmLAs6xpi8au8bRIFlc3M7cSCZOTVlHOoe\nyHu55i0WdIwxeRW7g+dRtdlfYy2d+ooSOuwOor6yiQTGmLyKBZ1NO9t4ZX93XsuuKg0xODzC4PBI\nXss1b7GWjjEmrzr7hwgIVJfl/zdvZalXZu/AcN7LNh4LOsaYvOrsG6K6rCTn1+UkU+UCXY8FHd9Y\n0DHG5FVnZIja8hJfyq4qtaDjNws6xpi86uofosaCzoxlQccYkzeqSmf/EHU+BZ1KCzq+s6BjjMmb\nzv4hhqLqW0unJBigNBSwoOMjCzrGmLzZ2+Hdz8avMR3
wuth6IhZ0/GJBxxiTNzsOeNflzKku9a0O\nVaUha+n4KKOgIyKrRGSHiDSJyJVJ9peKyD1u/yYRWRK37yq3fYeIfCxdniKy1OXxmsszPF4ZItIg\nIo+KSI+I3JJQr8dcGc+7x5yJvT3GmGzatq+LUEBorPIx6JSF7DodH6UNOiISBL4PnAMsBy4UkeUJ\nyS4D2lX1WOBm4CZ37HJgDXAisAr4gYgE0+R5E3Czqi4D2l3eKcsAIsDVwNdSvISLVfVk9ziU7vUa\nY3Jn2/4u5taU5fS21OlUWkvHV5m0dE4DmlR1p6oOAuuA1QlpVgN3uufrgZUiIm77OlUdUNVdQJPL\nL2me7pizXB64PM8frwxV7VXVx/GCjzGmQKkq2/d3Mc+HNdfiVZWG6BuMMhy1pXD8kEnQWQC8Gff3\nHrctaRpVHQY6gYZxjk21vQHocHkklpWqjHR+7LrWrnZB7QgicrmIbBGRLS0tLRlkaYyZqANdEdr7\nhnwPOhVh71bVnbbwpy8yCTrJvqg1wzTZ2p5pPRJdrKrvBD7gHp9NlkhVb1PVFaq6Yvbs2WmyNMZM\nxuZdbQDMryv3tR4VYe9anfY+Czp+yCTo7AGOjvt7IbAvVRoRCQG1QNs4x6ba3grUuTwSy0pVRkqq\nutf92w2sxevWM6YgTHGCzrtE5EkR2SoiL4mIv82HDNz15Bssbqjg6PoKX+sRa+l09A36Wo+ZKpOg\n8zSwzM0qC+NNDNiQkGYDcKl7fgHwiKqq277GfXiWAsuAzanydMc86vLA5fmrNGUkJSIhEWl0z0uA\nTwIvZ/B6jcm5KU7QCQF3A3+rqicCZwIF/bN9675OtrzRzmfPWOzLQp/xYkHHWjr+SLu2uKoOi8gV\nwINAELhDVbeKyPXAFlXdANwO3CUiTXitjzXu2K0ici+wDRgGvqiqUYBkeboivw6sE5EbgOdc3qQq\nw+W1G6gBwiJyPnA28AbwoAs4QeD3wA8n8R4Zkwujk2kARCQ2QWdbXJrVwLXu+XrgFjcueTbwoqq+\nAKCqh/NV6cn63sNNAKT+mZg/b3WvWUvHDxnd0EJVNwIbE7ZdE/c8Anw6xbE3AjdmkqfbvpMk3WBp\nyliSourvTrHdGL8lm0xzeqo07sdfbPLMcYCKyIPAbLwZot9OLEBELgcuB1i0aFHWX8BE7Ovsp66i\nZHTtMz9Z95q/bEUCY/wxlQk6IeD9wMXu378QkZVHJCygCTL7OyLMq/V3AkFMaShAQKx7zS8WdIzx\nx1Qn6PxBVVtVtQ+vx+DUnNd4kvoGh2ntGWC+z1OlY0SEinDIWjo+saBjjD+mMkHnQeBdIlLhgtGH\nGDsWVFC27+9GoWBaOuB1sbX3WkvHD/53sBozA01xgk67iPwbXuBSYKOq3u/LC8nAtn2dAMyrK4yW\nDniTCWwigT8s6BjjkylO0Lkbb9p0wXui6TC15SW+3bgtmYpwkA4b0/GFda8ZY3JmKDrCE02tHDe3\nihSrUPmiIhy0lo5PLOgYY3JiKDrCw9sP0j0wzLI51X5XZwxvIsEQ41xfbnLEuteMMTlx7Yat/GxT\nM6GAcOycKr+rM0ZFOMhgdITewShVBXDt0ExiLR1jTE4819zB8UdV88NLVlBWEvS7OmPEAk1bj3Wx\n5ZsFHWNM1kVHlKaWHj6wrJEPH194N+ytKvOCTkuP3YYr3yzoGGOy7o3DvQwOj3Dc3MIay4mJtXRa\nuq2lk28WdIwxWffqwW6Awg06rqXT2jPgc01mHgs6xpis23GgB4BlcwtrAkFMZTiECLR0W9DJNws6\nxpise/VgN4vqK0ZvI1BoggGhviJsLR0fWNAxxmTdga4IC3y+LXU6jVWl1tLxgQUdY0zWdfYPUVdR\nOMveJNNYbS0dPxRm29cYU9Q6+go/6PQORGlu6/O7GjOOtXSMMVmlqnT1D1FTQAt8JlNVGqI7Yot+\n5psFHWNMVt355zc
YjI6wu7WwWxFVpSGGokrvwLDfVZlRrHvNGJNV/UNRAMpLgqzd1OxzbVIbXZWg\ne4BKW38tb6ylY4zJqv5BF3TChbXeWqJqF3QO2Qy2vLKgY4zJqr4hr7uqvMAW+UxUU+aNOR3ssvXX\n8smCjjEmqyJF0tKxoOMPCzrGmKyKjelUFHhLp6wkQCgg1r2WZxZ0jDFZ1VckLR0Roaa8hAOd1tLJ\nJws6xpis6h+KEhAoDRX+10tNWci61/Ks8M8KY0xR6R+MUlYSRET8rkpa1WUl1r2WZxZ0jDFZ1T8U\nLfiZazGxlo6q+l2VGcOCjjEmq/oHowU/nhNTU15C32CUHluVIG8s6BhjsqqYWjrVNm067yzoGGOy\nqq+YWjpuVYIDnTauky8WdIwxWdU3OExlgd4xNFGtWwl7X2e/zzWZOSzoGGOyZjg6QmRohIoiaenE\ngs7+Dutey5eMgo6IrBKRHSLSJCJXJtlfKiL3uP2bRGRJ3L6r3PYdIvKxdHmKyFKXx2suz/B4ZYhI\ng4g8KiI9InJLQr3eLSIvuWO+K8Uwh9OYItbR792fpqJIVm0OBQM0VpWy31o6eZM26IhIEPg+cA6w\nHLhQRJYnJLsMaFfVY4GbgZvcscuBNcCJwCrgByISTJPnTcDNqroMaHd5pywDiABXA19LUv1bgcuB\nZe6xKt3rNcZMXnvvIEDRtHQA5teVsbfDgk6+ZNLSOQ1oUtWdqjoIrANWJ6RZDdzpnq8HVrpWxWpg\nnaoOqOouoMnllzRPd8xZLg9cnuePV4aq9qrq43jBZ5SIzANqVPVJ9Sbh/zQuL2NMBtZuah59ZKK9\nz2vpFMuYDsD82nL221I4eZNJ0FkAvBn39x63LWkaVR0GOoGGcY5Ntb0B6HB5JJaVqozx6r0nTb0B\nEJHLRWSLiGxpaWkZJ0tjsmcq3dZu/yLXrZysle+LtiJs6cyrK2N/R79dIJonmQSdZOMgif87qdJk\na3um9cikTkduVL1NVVeo6orZs2ePk6Ux2TGVbus4NwO/zXVdJ6Kjr/iCzvzacnoHo3T12wWi+ZBJ\nG3gPcHTc3wuBfSnS7BGREFALtKU5Ntn2VqBOREKuNROfPlUZ49V7YZp6G+OX0S5mABGJdVtvi0uz\nGrjWPV8P3OK6lFVEzgd2Ar35q3Jqse63P7zq9RRUFFP3Wl054E2brq0o8bk2018mLZ2ngWVuVlkY\nb2LAhoQ0G4BL3fMLgEfcOMoGYI3rJliKN5i/OVWe7phHXR64PH+VpoykVHU/0C0iZ7ixokvi8jLG\nb5PuthaRSuDrwHXjFeBHt3HfwDChgBAughWmY+bVlQGwzyYT5EXanyOqOiwiVwAPAkHgDlXdKiLX\nA1tUdQNwO3CXiDThtT7WuGMTFg+xAAAZ2UlEQVS3isi9eL/ehoEvqmoUIFmersivA+tE5AbgOZc3\nqcpwee0GaoCw+wV4tqpuA/4O+AlQjtcNUVBdEWZGm0q39XV4Mzx7xrsKQFVvA24DWLFiRV4GLPoG\no1QWyXTpmIWzvJbOnnYLOvmQ0dmhqhuBjQnbrol7HgE+neLYG4EbM8nTbd+J1/WQuH28Mpak2L4F\neEeyfcb4bCrd1qcDF4jIt4E6YEREIqp6Cz7rHRwuqvEcgNlVpZSXBGlu6/O7KjNCcf0kMWb6GO1i\nBvbitdwvSkgT61J+krFdyh+IJRCRa4GeQgg4UFzrrsWICIvqK3jjsAWdfCiejldjphE3RhPrYt4O\n3BvrthaR81yy2/HGcJqArwBHTKsuNH2D0aK6Rifm6PoKmtsKYk7GtFd8Z4cx08RUuq3j0lybk8pN\ngqrSHRni2DlVfldlQtZuaqZ/cJhdrb2oalHc8bSYWUvHGJMVHf1DDAyPMLem1O+qTFh9VSlDUaXF\nbl2dcxZ0jDFZccAtJTOvpsznmkxcfUUYgDdsMkHOWdAxxmTFAXf3zblFGHQaKr2g0
2yTCXLOgo4x\nJiv2d0aorwxTWiS3qo5XV1mCYC2dfLCgY4zJigOdEY4qwlYOQCgQoLa8hObDNoMt1yzoGGOmbHB4\nhMM9AxxVW5xBB6C+MmwXiOaBBR1jzJQd6o6gULQtHbCgky8WdIwxKQ2PjDCSwX1mYjPXir2l09oz\nSM+A3eIglyzoGGNS+s8/7OT32w+mTXegK0JJUKh3s8CKUazub1prJ6cs6BhjklJVDnZFONiV/oLJ\n2CSCQBFfzR8LOrYGW25Z0DHGJNU7GGV4ROlN092kqhzoihR11xpAQ6W3koKtwZZbFnSMMUkd7vFa\nOOmCTmvPIH2DUeZUF3fQKQ8HKS8J8vvth/yuyrRmQccYk9Th3kGAtAPrB91KBHXT4FbPDVXh0WBr\ncsOCjjEmqbYeL+gMDI8wHB1JmS4WdGrKij/oNFaVcti9bpMbFnSMMUkd7n3rF3/vYDRluthEg+qy\n4r9TSn1lmM7+ISJDqV+vmRoLOsaYpGLdazD+uE6spVM9LVo6YRSbNp1LFnSMMUm1xXUzjTeuc6g7\nQmVpiGCgeKdLx8RmsO1qtRlsuWJBxxiT1OHeQWJhZLyWzqGuAWqmQdcaeGM6ALtt4c+csaBjjEnq\ncO/g6AWT43avdUemxSQCeGva9G67QDRnLOgYY5I63DNAY1UpQZG0EwmmwySCmMaqMLutey1nLOgY\nY5Jq6x2ksjREZWkw5ZjOcHSE1p4BasqnR0sHoKGq1IJODlnQMcYk5QWdIJWloZTda609g6hOj+nS\nMQ1VYfZ1RmzadI5Y0DHGHGE4OsLA8AiloQBlJcGUX8B7O7yxj9rp1NIZXYPNxnVywYKOMeYI/S7I\nhIMBykuCRIaSr0iwq9X7Ym50X9TTQWOVN3nCpk3nhgUdY8wRYkGnJBSgrCSQsqWzu7WXYECYVcT3\n0UkUa+nYuE5uWNAxxhwhMui1bEqCXvdaf4qgs+twLwtnlU+LC0NjysNB6ivDdq1OjljQMcYcoW/I\nmzgQdkFnYHiE6MiRt63e3drL0sbKfFcv55Y0VLC71cZ0csGCjjHmCP3uupwSN6YDRy6Fo6rsau1l\nScN0DDqV1tLJEQs6xpgjvDWmI5SVeF8TXf1DY9K0dA/QNxidni2dxkr2d0ZGg6/JHgs6xpgjROJm\nr5W5lk5XZGzQ2ekG2pdM06ADNm06FzIKOiKySkR2iEiTiFyZZH+piNzj9m8SkSVx+65y23eIyMfS\n5SkiS10er7k8w1MoY7eIvCQiz4vIlom9NcbMXH1x3WuxoNMdGdu9tuNANwBvn1ud38rlwVLXZWjT\nprMvbdARkSDwfeAcYDlwoYgsT0h2GdCuqscCNwM3uWOXA2uAE4FVwA9EJJgmz5uAm1V1GdDu8p5w\nGXF1+7CqnqyqKzJ8T4yZ8WLdSmNaOgnda9v3dzGrooS5NdPnGp2YxY0VgK02nQuZtHROA5pUdaeq\nDgLrgNUJaVYDd7rn64GVIiJu+zpVHVDVXUCTyy9pnu6Ys1weuDzPn2QZxhS0yfYgiMhHReQZ14p/\nRkTOynbdInHX6ZSPdq95LZ21m5pZu6mZx5taOWFeDd7HcHqpKSuhodIW/syFTILOAuDNuL/3uG1J\n06jqMNAJNIxzbKrtDUCHyyOxrImWAaDA79wH8/IMXqsxeTGVHgSgFThXVd8JXArcle36jU4kCL41\nkaA7bkwnOqIc6IxwwryabBddENZuaqayNMRTO9v8rsq0k0nQSfYzJnHCfqo02do+mTIA3qeqp+J9\nsL8oIh9MkhYRuVxEtojIlpaWlmRJjMm2SfcgqOpzqrrPbd8KlIlIVvu44sd0SkOx7rW3xnQO9www\nPKLTNugAHFVbxv7OfkaSXJ9kJi+ToLMHODru74XAvlRpRCQE1AJt4xybansrUOfySCxromUQ+2Cq\n6iHgPlJ0u6nqbaq6QlVXzJ49O8XbYExWTaUHI
d6ngOdUdSCxgKn8mOofilIaChAQIRgQwqHAmNlr\nB7oiABx/1PSbRBBz9KxyBoZHRmfpmezIJOg8DSxzs8rCeIP2GxLSbMBr5gNcADyiquq2r3F900uB\nZcDmVHm6Yx51eeDy/NVkyhCRShGpBhCRSuBs4OXM3hZjcm4qPQjeTpET8brcvpCsgKn8mIoMRikP\nvzUfp7wkOKZ7raPPe764oWJC+RaThbO81/b8mx0+12R6SXsTDFUdFpErgAeBIHCHqm4VkeuBLaq6\nAbgduEtEmvBaH2vcsVtF5F5gGzAMfFFVowDJ8nRFfh1YJyI3AM+5vJloGSIyF7jPDXKGgLWq+sCk\n3yljsmsiPQh7Elr3iMhCvNb7Jar6erYr1zcYHZ1AAFAaCozpXmvvG6SsJED1NLlNdTKzq0spDQV4\n/s12Lnj3Qr+rM21kdOclVd0IbEzYdk3c8wjw6RTH3gjcmEmebvtOknSDTbQMl89JydIbUwBGW/vA\nXrwfURclpIm17p8krnUvInXA/cBVqvpELirXP3RkSye+e62zf4hZFdNnZelkAiIsmFVuLZ0ssxUJ\njPGBG6OJtfa3A/fGehBE5DyX7HagwbXuvwLEplVfARwLXO0ufH5eROZks36RobEtnfJwkLbewdG/\nO/qGptWN21JZ0lDJtn1ddPYNpU9sMjJ97jFrTJGZbA+Cqt4A3JDLuvUnBJ26ihK27esa/bujf3Ba\nLn+T6G2zq3jklUM8ufMwq95xlN/VmRaspWOMOUJfwkSCuvIwXZFhuiNDRIaiRIZGqJsBLZ1F9RVU\nhoM83mSXUmSLBR1jzBH6B49s6QDs7egfnbkW2zadBQPC6cc08PhrrX5XZdqwoGOMOUIkYSJBnZs0\nsLe9n47+wTHbpruykiC7D/dx62NZnyQ4I1nQMcYcIdmYDngtnfZYS2cGdK8Bo/cLsnXYssOCjjHm\nCIljOlWlIcLBAHvb+znQGXHX6MyMeUjzassoDQXsNgdZYkHHGHOExCnTARHm15Wxp6OffR39zK8r\nn5arSycTEGFJQyW77DYHWTEzfqoYYzI2FB1hKKpjgg7Aglnl7G7t5UBXhPe+zVsCbu2mZj+qmHdL\nGivZsbWb1p4BGqum3/2D8slaOsaYMWK3NYjvXgNYNqearfu6iI4oC+rK/aiab2LjOk/vslsdTJUF\nHWPMGL0D3hprlaVjO0Iue/9SwiHvK2OmBZ35dWWUBIVNFnSmzIKOMWaM2MKeNQmLeR5dX8Hfn/k2\n6ivDzKqcGdOlY0KBAIvqK9hsQWfKbEzHGDNG7BYG1WUhOvvHrjn25ZXLmF1VOmMmEcRb2ljJw68c\norNviNoZcGFsrlhLxxgzRnfEa+kkToleu6mZn29+c0YGHIBj51SjCo/uOOR3VYqaBR1jzBhdoy0d\n+zUfb+Gsco6qKWPjS/v9rkpRs6BjjBkj1tKpKbfe93gBEZY2VvLIK4dGJ1uYibOgY4wZI9bSSZxI\nYOCdC2oZHlE2vJB4k1eTKQs6xpgxuiPDlASF0pB9PSRa3FDBvNoybn98FyMj6nd1ipKdVcaYMboj\nQ1SXlczYCQPjERHef2wjTYd6+N22g35XpyhZ0DHGjNEdGZ4xi3lOxrsW1rFsThXf2ridiFu9wWTO\ngo4xZozuyLCN54wjGBCuPe9Emtv6+NGfdvpdnaJjQccYM0ZX/5C1dNJ443AfJ86v4d8ffo19Hf1+\nV6eoWNAxxoxh3WuZ+fg756EKN27c7ndViooFHWPMGLGJBGZ8syrCfOi42dz/4n7+/Hqr39UpGhZ0\njDFj2JhO5j543GwW1JVz3YZtDEdH/K5OUbCgY4wZFR1Rugesey1TJcEAHzpuNjsOdvPTJ9/wuzpF\nwYKOMWZUz0DyxT5NaifOr+G4uVXc9MAr7DjQ7Xd1Cp4FHWPMqG5bAmfCRIRPnbqQ6rIQV6x9lv5B\nu3ZnPBZ0j
DGjDnYNAPDy3k7Wbmr2uTbFo7qshO985hSaWnr4p/9+GVVbIicVCzrGmFGvH+oBYHZ1\nqc81KT7NbX18+O1z+OWze/i3h161wJOCddwaY0a9dqibUEBm3O2os2Xl8XPo7B/ie480sbejn6s/\nsdzeywQWdIwxo5oO9TC7upSALfY5KSLCX5yygNryEu57di8bX9rPRact5rPvWczSxkq/q1cQLOgY\nY0Y1tfRY19oUBUT4yAlzeceCWh595RA/+fMu7nhiFx9Y1sgl71nCWcfPIRiYuUHdgo4xBoD+wSh7\n2vs5bm6131WZFo6qKePC0xbRFRni6d1tvLy3k7/56RYW1JVz8RmLOO+k+SyoK59xt5DIaCKBiKwS\nkR0i0iQiVybZXyoi97j9m0RkSdy+q9z2HSLysXR5ishSl8drLs9wtsswphDk4nM1Fdv2d6EKc6rL\nspGdcWrKSlh5/Fyu+PAyLjptEaWhAN9+YAfvv+lRTv/Ww1yx9lnue24Pbb2Dflc1L9K2dEQkCHwf\n+CiwB3haRDao6ra4ZJcB7ap6rIisAW4CPiMiy4E1wInAfOD3InKcOyZVnjcBN6vqOhH5D5f3rVku\nY1KiI8rLezsZHhlh4axyTpxfy5zq0hn3S8VMXS4+V6o66QtEBodHuPH+bdSWl9jYQ44EA8I7FtTy\njgW1HOqO8HpLL2+29fHYjhZ+8+J+ABoqwyxprKS+MkxdeQm15SXUVZRQW+H9XVdRQl15mMrSICXB\nAKGgEAwIoYD3PBR7HhACBdqFl0n32mlAk6ruBBCRdcBqIP7DsRq41j1fD9wi3jfxamCdqg4Au0Sk\nyeVHsjxFZDtwFnCRS3Ony/fWbJWRUO+M7OvoZ9V3/shQVOkfisKWt/aFAkJABBEQ8fpzY3/H/xsQ\nb5BRSPjbHZcJ7+gM0mWQrDBPx+nhtKX1fPuCk9ImI/ufqycnWtcf/WknP/nzbg73DNI/FOU7nzmZ\nPru4MefmVJcxp7qM9xzTwIgqe9v72dXaS2vPAG29g+xt76d/KEr/YJTBSa7pFhBGg1EwIJQEAy5A\nvfVNkviDOfZn/OZYahG49twT+fDxcyZVn5hMgs4C4M24v/cAp6dKo6rDItIJNLjtTyUcu8A9T5Zn\nA9ChqsNJ0merjCOIyOXA5e7PHhHZkSwd0AgUwnKyVo+xCqoefwD+JXWaxe7fXH2uRk3gvB71Fzcl\n3Vwo7+94rI7Zk7KeZ/1jymMWp9yTIJOgk+xHceJVT6nSpNqebCxpvPTZLOPIjaq3Abcl2xdPRLao\n6op06XLN6jEt6pGLz9XYDRme1+kUyvs7Hqtj9uS6nplMJNgDHB3390JgX6o0IhICaoG2cY5Ntb0V\nqHN5JJaVrTKMKQS5+FwZU/AyCTpPA8vcrLIw3gDmhoQ0G4BL3fMLgEfUWwNiA7DGzcJZCiwDNqfK\n0x3zqMsDl+evsllGZm+LMTmXi8+VMYVPVdM+gI8DrwKvA99w264HznPPy4BfAE14J/8xccd+wx23\nAzhnvDzd9mNcHk0uz9JslzHZB3D5VPPIxsPqMT3qkYvPVSG8rmJ4762O/tVTXCHGGGNMztkq08YY\nY/LGgo4xxpi8saCTYCpLk+S5Hp8TkRYRed49/jpH9bhDRA6JyMsp9ouIfNfV80UROdWnepwpIp1x\n78c1OajD0SLyqIhsF5GtIvLlJGny8n7kQ7pz0C/JzgURqReRh8RbPushEZnlcx2TniuFVE8RKROR\nzSLygqvjdW77UkmyFFnW+D1oVUgPIIg3OHsMEAZeAJYnpPl74D/c8zXAPT7V43PALXl4Tz4InAq8\nnGL/x4Hf4l07cgawyad6nAn8JsfvxTzgVPe8Gm8SQOL/S17ejzz8v6c9B32s2xHnAvBt4Er3/Erg\nJp/rmPRcKaR6unO0yj0vATa5c/ZeYI3b/h/A32WzXGvpjDW6NImqDgKxpUn
ircZbnge8pUlWSuJa\nEvmpR16o6h/xrg1JZTXwU/U8hXed1Twf6pFzqrpfVZ91z7uB7Ry5EkBe3o88KJhzMFGKcyH+c3kn\ncH5eK5VgnHOlYOrpztEe92eJeyjeUmTr3fas19GCzljJliZJ/FIZszQJEFuaJN/1APiU68JZLyJH\nJ9mfD5nWNR/e47oKfisiJ+ayINetegrer8N4hfR+TEWxvY65qrofvC98YGoLhGVRwrlSUPUUkaCI\nPA8cAh7Ca92mWoosKyzojDWVpUnyXY9fA0tU9V3A73nr11O+5eP9yMSzwGJVPQn4HvDfuSpIRKqA\nXwL/t6p2Je5OckgxXpcwXV6Hr9KcK75T1aiqnoy3qsVpwAnJkmWzTAs6Y01laZK81kNVD6u3yjDA\nD4F3Z7kOmSqIJVlUtSvWVaCqG4ESEWnMdjkiUoL3JfIzVf2vJEkK4v3IgmJ7HQdj3Zju30M+1yfV\nuVJw9QRQ1Q7gMbwxnVRLkWWFBZ2xprI0SV7rkTBOcB5en7EfNgCXuFlbZwCdse6DfBKRo2JjayJy\nGt65fTjLZQhwO7BdVf8tRbKCeD+yoNiWkYr/XMYvn+WLcc6VgqmniMwWkTr3vBz4CN73SKqlyLLD\nr5kThfpgCkuT5Lke/y+wFW9W0aPA8Tmqx8+B/cAQ3q/fy4C/Bf7W7Re8m5G9DrwErPCpHlfEvR9P\nAe/NQR3ej9fV8CLwvHt83I/3Ix+PZOdgITxSnAsNwMPAa+7fep/rmOpcKZh6Au8CnnN1fBm4xm1P\nuhRZth62DI4xxpi8se41Y4wxeWNBxxhjTN5Y0DHGGJM3FnSMMcbkjQUdYzKQbsHRhLQ3xy08+qqI\ndOSjjsYUA5u9ZkwGROSDQA/eumrvmMBxXwJOUdX/mbPKGVNErKUzzYjI7slciS8ij4nIigmkP1NE\nfjPRclLk9TkRuSUbeaXIf0kmLZTxaJJFJkXkbSLygIg8IyJ/EpHjkxx6Id51JcYYIJQ+iTGFQ0RC\n+tZihH67De+i0NdE5HTgB3gr9AIgIouBpcAjPtXPmIJjQacAuVVpfxPrxhGRrwFVeL+0/xYYBrap\n6hoRacD7JT0b7yrilLdZcPk+gLfa7Sl4V5tfoqp9CenOBq4DSvGuRv+8qvaIyCrgO0Ar3gKbsfSz\ngbV4V1s/DawC3q2qrSLyV8A/4N2TZRPw96oaFZHPA1fhXVn+KhBbRy5ZvX/iXvspwLPi3aDte8A7\n8c7ha1X1V+713QVUukOvUNU/p8p3KtxCju8FfhF3Z4vShGRrgPWqGs1FHYwpRta9VlyuxBsfeBde\n8AH4JvC4qp6Ct67TojR5vB24zeXRhXdTulGua+6fgI+o6qnAFuArIlKGt7DoucAHgKPiDvsm3hp0\npwL3xeogIicAnwHep95KtlHgYrdu3HXA+4CP4t3cKp3jXJ2+CnzDlfc/gA8D/yIilXiLJ37U1eMz\nwHczyHeyAnhLwJ8c90hcoXcN1rVmzBgWdIrLi8DPXOsh1sX0QeBuAFW9H2hPk8ebqvqEe3433hpR\n8c7ACwJPuPtsXAosBo4Hdqnqa+rNPrk77pj3493kC1V9IK4OK/FWv37a5bUSb12n04HHVLVFvRuE\n3ZPBa/9FXIvhbOBKl+djeOvhLcK7CdUPReQlvDWjMglmk6LeMvW7ROTTMHqb6pNi+0Xk7cAs4Mlc\n1cGYYmTda4VpmLE/CMrcv5/ACzLnAVfH3ahsIlMQE9Mmu1/QQ6p64ZiNIiePU06qLj0B7lTVqxLy\nOn+cvFLpTcj3U6q6IyHfa4GDwEl4719kgmWkJCI/x7sldqOI7MFr3V0M3Coi/4QX8NbhLTgK3gSC\ndWrTQ40Zw1o6hekgMEdEGkSkFPgk3v/V0ar6KPCPQB3eOM8f8b78EJFz8H5dj2eRiLzHPb8QeDxh\n/1PA+0TkWJdnhYgcB7wCLBWRt8UdG/M
\n" 223 | }, 224 | "metadata": {} 225 | } 226 | ] 227 | }, 228 | { 229 | "metadata": { 230 | "_cell_guid": "0cc2a766-a2e0-4a7c-b144-beccc0b474af", 231 | "_uuid": "06252c91946d610e8487023d1c8fff79a8a4677f" 232 | }, 233 | "cell_type": "markdown", 234 | "source": "It's not perfect (it looks like a lot of campaigns got very few pledges), but it is much closer to normal!" 235 | }, 236 | { 237 | "metadata": { 238 | "_cell_guid": "59848836-a7d5-4adf-8456-5d98d62eef62", 239 | "_uuid": "6dd21ff124b05826e5ef104f44e1dbf055154e2f", 240 | "trusted": true 241 | }, 242 | "cell_type": "code", 243 | "source": "# Your turn! \n# We looked at the usd_pledged_real column. What about the \"pledged\" column?
Does it have the same info?\n# get the index of all positive pledges (Box-Cox only takes positive values)\nindex_of_positive_pledges = kickstarters_2017.pledged > 0\n\n# get only positive pledges (using their indexes)\npositive_pledges = kickstarters_2017.pledged.loc[index_of_positive_pledges]\n\n# normalize the pledges (w/ Box-Cox)\nnormalized_pledges = stats.boxcox(positive_pledges)[0]\n\n# plot both together to compare\nfig, ax = plt.subplots(1, 2)\nsns.distplot(positive_pledges, ax=ax[0])\nax[0].set_title(\"Original Data\")\nsns.distplot(normalized_pledges, ax=ax[1])\nax[1].set_title(\"Normalized data\")", 244 | "execution_count": 7, 245 | "outputs": [ 246 | { 247 | "output_type": "execute_result", 248 | "execution_count": 7, 249 | "data": { 250 | "text/plain": "Text(0.5,1,'Normalized data')" 251 | }, 252 | "metadata": {} 253 | }, 254 | { 255 | "output_type": "display_data", 256 | "data": { 257 | "text/plain": "", 258 | "image/png": "[base64-encoded PNG omitted: two side-by-side distribution plots titled 'Original Data' and 'Normalized data']" 259 | }, 260 | "metadata": {} 261 | } 262 | ] 263 | }, 264 | { 265 | "metadata": { 266 | "_cell_guid": "b4f37fce-4d08-409e-bbbd-6a26c3bbc6ee", 267 | "_uuid": "52b0af56e3c77db96056e9acd785f8f435f7caf5" 268 | }, 269 | "cell_type": "markdown", 270 | "source": "And that's it for today! If you have any questions, be sure to post them in the comments below or [on the forums](https://www.kaggle.com/questions-and-answers). \n\nRemember that your notebook is private by default, and in order to share it with other people or ask for help with it, you'll need to make it public. First, you'll need to save a version of your notebook that shows your current work by hitting the \"Commit & Run\" button. (Your work is saved automatically, but versioning your work lets you go back and look at what it was like at the point you saved it. It also lets you share a nice compiled notebook instead of just the raw code.)
Then, once your notebook is finished running, you can go to the Settings tab in the panel to the left (you may have to expand it by hitting the [<] button next to the \"Commit & Run\" button) and set the \"Visibility\" dropdown to \"Public\".\n\n# More practice!\n___\n\nTry finding a new dataset and pretend you're preparing to perform a [regression analysis](https://www.kaggle.com/rtatman/the-5-day-regression-challenge). ([These datasets are a good start!](https://www.kaggle.com/rtatman/datasets-for-regression-analysis)) Pick three or four variables and decide if you need to normalize or scale any of them and, if you think you should, practice applying the correct technique." 271 | }, 272 | { 273 | "metadata": { 274 | "trusted": true, 275 | "collapsed": true, 276 | "_uuid": "7db4b06adef1514f80355c24d37e19e80a83aed4" 277 | }, 278 | "cell_type": "code", 279 | "source": "", 280 | "execution_count": null, 281 | "outputs": [] 282 | } 283 | ], 284 | "metadata": { 285 | "kernelspec": { 286 | "display_name": "Python 3", 287 | "language": "python", 288 | "name": "python3" 289 | }, 290 | "language_info": { 291 | "name": "python", 292 | "version": "3.6.4", 293 | "mimetype": "text/x-python", 294 | "codemirror_mode": { 295 | "name": "ipython", 296 | "version": 3 297 | }, 298 | "pygments_lexer": "ipython3", 299 | "nbconvert_exporter": "python", 300 | "file_extension": ".py" 301 | } 302 | }, 303 | "nbformat": 4, 304 | "nbformat_minor": 1 305 | } --------------------------------------------------------------------------------
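For readers who want to see what the Box-Cox step in the exercise above is doing, here is a rough, standard-library-only sketch. It is not the notebook's code: `pledges` is synthetic log-normal data standing in for the Kaggle column, every helper name is made up for illustration, and the coarse grid search over lambda is a simplification of the likelihood maximization that `scipy.stats.boxcox` performs when no lambda is supplied. Min-max scaling is included only for contrast, since it changes the range of the data but not its shape.

```python
import math
import random


def box_cox(values, lam):
    """Apply the Box-Cox transform with a fixed lambda to positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox only takes positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def log_likelihood(values, lam):
    """Box-Cox log-likelihood (up to a constant) for a candidate lambda."""
    transformed = box_cox(values, lam)
    n = len(values)
    mean = sum(transformed) / n
    variance = sum((t - mean) ** 2 for t in transformed) / n
    return (lam - 1) * sum(math.log(v) for v in values) - n / 2 * math.log(variance)


def best_lambda(values, candidates):
    """Pick the candidate lambda with the highest log-likelihood."""
    return max(candidates, key=lambda lam: log_likelihood(values, lam))


def skewness(values):
    """Sample skewness: close to 0 for a symmetric distribution."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(((v - mean) / std) ** 3 for v in values) / n


def min_max_scale(values):
    """Scaling, for contrast: squeeze the range into [0, 1], shape unchanged."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical stand-in for a skewed "pledged" column (log-normal draws).
random.seed(0)
pledges = [random.lognormvariate(5, 2) for _ in range(2000)]

lambdas = [i / 10 for i in range(-20, 21)]  # coarse grid from -2.0 to 2.0
lam = best_lambda(pledges, lambdas)
normalized = box_cox(pledges, lam)
scaled = min_max_scale(pledges)

print("chosen lambda:", lam)
print("skew before:", round(skewness(pledges), 2))
print("skew after Box-Cox:", round(skewness(normalized), 2))
print("skew after min-max scaling:", round(skewness(scaled), 2))
```

On skewed data like this, the skewness should drop toward zero after Box-Cox and stay the same after min-max scaling, which is one quick way to judge whether a variable needs normalizing or only scaling.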