├── ASL Recognition with Deep Learning.ipynb ├── README.md ├── a_network_analysis_of_game_of_thrones.ipynb ├── a_new_era_of_data_analysis_in_baseball.ipynb ├── a_visual_history_of_nobel_prize_winners.ipynb ├── a_visual_history_of_nobel_prize_winners_python.ipynb ├── bad_passwords_and_the_NIST_guidelines.ipynb ├── classify_song_genres_from_audio_data.ipynb ├── clustering_heart_disease_patient_data.ipynb ├── degrees_that_pay_you_back.ipynb ├── dr_semmelweis_and_the_discovery_of_handwashing.ipynb ├── exploring_67_years_of_lego.ipynb ├── exploring_the_bitcoin_cryptocurrency_market.ipynb ├── exploring_the_evolution_of_linux.ipynb ├── exploring_the_kaggle_data_science_survey.ipynb ├── find_movie_similarity_from_plot_summaries.ipynb ├── functions_for_food_price_forecasts.ipynb ├── generating_keywords_for_google_adwords.ipynb ├── give_life_predict_blood_donations.ipynb ├── level_difficulty_in_candy_crush_saga.ipynb ├── mobile_games_ab-testing_with_cookie_cats.ipynb ├── naive_Bees_predict_species_from_images.ipynb ├── naive_bees_deep_learning_with_images.ipynb ├── naive_bees_image_loading_and_processing.ipynb ├── name_game_genderprediction_using_sound.ipynb ├── phyllotaxis_draw_flowers_using_mathematics.ipynb ├── predict_taxi_fares_with_random_forests.ipynb ├── predicting_credit_card_approvals.ipynb ├── recreating_john_snow_s_ghost_map.ipynb ├── reducing_traffic_mortality_in_the_USA.ipynb ├── rise_and_fall_of_programming_languages.ipynb ├── risk_and_returns_the_sharpe_ratio.ipynb ├── scout_your_athletics_fantasy_team.ipynb ├── the_android_app_market_on_google_play.ipynb ├── the_gitHub_history_of_the_scala_language.ipynb ├── the_hottest_topics_in_machine_learning.ipynb ├── tv_halftime_shows_and_the_big_game.ipynb ├── up_and_down_with_the_kardashians.ipynb ├── visualizing_inequalities_in_life_expectancy.ipynb ├── what_makes_a_pokémon_legendary.ipynb ├── whats_in_a_name.ipynb ├── which_debts_are_worth_the_Bank_s_effort.ipynb ├── who_is_drunk_and_when_in_ames_iowa.ipynb ├── who_is_drunk_and_when_in_ames_iowa_python.ipynb ├── word_frequency_in_moby_dick.ipynb └── wrangling_and_visualizing_musical_data.ipynb /README.md: -------------------------------------------------------------------------------- 1 | # DataCamp Projects 2 | 3 | Each file is the solution to a project in [DataCamp](https://www.datacamp.com). 
4 | 5 | ## R Projects 6 | - [Rise and Fall of Programming Languages](https://github.com/ChristianNogueira/datacamp_projects/blob/master/rise_and_fall_of_programming_languages.ipynb) 7 | - [Predict Taxi Fares with Random Forests](https://github.com/ChristianNogueira/datacamp_projects/blob/master/predict_taxi_fares_with_random_forests.ipynb) 8 | - [Scout Your Athletics Fantasy Team](https://github.com/ChristianNogueira/datacamp_projects/blob/master/scout_your_athletics_fantasy_team.ipynb) 9 | - [Visualizing Inequalities in Life Expectancy](https://github.com/ChristianNogueira/datacamp_projects/blob/master/visualizing_inequalities_in_life_expectancy.ipynb) 10 | - [A Visual History of Nobel Prize Winners](https://github.com/ChristianNogueira/datacamp_projects/blob/master/a_visual_history_of_nobel_prize_winners.ipynb) 11 | - [Who Is Drunk and When in Ames, Iowa?](https://github.com/ChristianNogueira/datacamp_projects/blob/master/who_is_drunk_and_when_in_ames_iowa.ipynb) 12 | - [Bad passwords and the NIST guidelines] 13 | - [Wrangling and Visualizing Musical Data](https://github.com/ChristianNogueira/datacamp_projects/blob/master/wrangling_and_visualizing_musical_data.ipynb) 14 | - [Level difficulty in Candy Crush Saga](https://github.com/ChristianNogueira/datacamp_projects/blob/master/level_difficulty_in_candy_crush_saga.ipynb) 15 | - [Exploring the Kaggle Data Science Survey](https://github.com/ChristianNogueira/datacamp_projects/blob/master/exploring_the_kaggle_data_science_survey.ipynb) 16 | - [Phyllotaxis: Draw flowers using mathematics](https://github.com/ChristianNogueira/datacamp_projects/blob/master/phyllotaxis_draw_flowers_using_mathematics.ipynb) 17 | - [Dr. Semmelweis and the discovery of handwashing] 18 | - [Introduction to DataCamp Projects] 19 | 20 | ## Python Projects 21 | - [Predicting Credit Card Approvals](https://github.com/ChristianNogueira/datacamp_projects/blob/master/predicting_credit_card_approvals.ipynb) 22 | - [Give Life, Predict Blood Donations](https://github.com/ChristianNogueira/datacamp_projects/blob/master/give_life_predict_blood_donations.ipynb) 23 | - [Classify Song Genres from Audio Data](https://github.com/ChristianNogueira/datacamp_projects/blob/master/classify_song_genres_from_audio_data.ipynb) 24 | - [A Visual History of Nobel Prize Winners](https://github.com/ChristianNogueira/datacamp_projects/blob/master/a_visual_history_of_nobel_prize_winners_python.ipynb) 25 | - [Who Is Drunk and When in Ames, Iowa? 
(Python version)](https://github.com/ChristianNogueira/datacamp_projects/blob/master/who_is_drunk_and_when_in_ames_iowa_python.ipynb) 26 | - [Naïve Bees: Predict Species from Images](https://github.com/ChristianNogueira/datacamp_projects/blob/master/naive_Bees_predict_species_from_images.ipynb) 27 | - [Generating Keywords for Google AdWords](https://github.com/ChristianNogueira/datacamp_projects/blob/master/generating_keywords_for_google_adwords.ipynb) 28 | - [Naïve Bees: Image Loading and Processing](https://github.com/ChristianNogueira/datacamp_projects/blob/master/naive_bees_image_loading_and_processing.ipynb) 29 | - [The GitHub History of the Scala Language](https://github.com/ChristianNogueira/datacamp_projects/blob/master/the_gitHub_history_of_the_scala_language.ipynb) 30 | - [The Hottest Topics in Machine Learning](https://github.com/ChristianNogueira/datacamp_projects/blob/master/the_hottest_topics_in_machine_learning.ipynb) 31 | - [Recreating John Snow's Ghost Map](https://github.com/ChristianNogueira/datacamp_projects/blob/master/recreating_john_snow_s_ghost_map.ipynb) 32 | - [A New Era of Data Analysis in Baseball](https://github.com/ChristianNogueira/datacamp_projects/blob/master/a_new_era_of_data_analysis_in_baseball.ipynb) 33 | - [Mobile Games A/B Testing with Cookie Cats](https://github.com/ChristianNogueira/datacamp_projects/blob/master/mobile_games_ab-testing_with_cookie_cats.ipynb) 34 | - [A Network Analysis of Game of Thrones](https://github.com/ChristianNogueira/datacamp_projects/blob/master/a_network_analysis_of_game_of_thrones.ipynb) 35 | - [Name Game: Gender Prediction using Sound](https://github.com/ChristianNogueira/datacamp_projects/blob/master/name_game_genderprediction_using_sound.ipynb) 36 | - [Risk and Returns: The Sharpe Ratio](https://github.com/ChristianNogueira/datacamp_projects/blob/master/risk_and_returns_the_sharpe_ratio.ipynb) 37 | - [Exploring the Bitcoin cryptocurrency market](https://github.com/ChristianNogueira/datacamp_projects/blob/master/exploring_the_bitcoin_cryptocurrency_market.ipynb) 38 | - [Word frequency in Moby Dick](https://github.com/ChristianNogueira/datacamp_projects/blob/master/word_frequency_in_moby_dick.ipynb) 39 | - [Bad passwords and the NIST guidelines](https://github.com/ChristianNogueira/datacamp_projects/blob/master/bad_passwords_and_the_NIST_guidelines.ipynb) 40 | - [Dr. Semmelweis and the discovery of handwashing](https://github.com/ChristianNogueira/datacamp_projects/blob/master/dr_semmelweis_and_the_discovery_of_handwashing.ipynb) 41 | - [Exploring the evolution of Linux](https://github.com/ChristianNogueira/datacamp_projects/blob/master/exploring_the_evolution_of_linux.ipynb) 42 | - [Exploring 67 years of LEGO](https://github.com/ChristianNogueira/datacamp_projects/blob/master/exploring_67_years_of_lego.ipynb) 43 | - [Introduction to DataCamp Projects] 44 | 45 | ## Projects no longer available on DataCamp (or that I missed) 46 | - [What's in a Name](https://github.com/ChristianNogueira/datacamp_projects/blob/master/whats_in_a_name.ipynb) 47 | -------------------------------------------------------------------------------- /bad_passwords_and_the_NIST_guidelines.ipynb: -------------------------------------------------------------------------------- 1 | {"cells":[{"cell_type":"markdown","source":"## 1. The NIST Special Publication 800-63B\n
If you – 50 years ago – needed to come up with a secret password you were probably part of a secret espionage organization or (more likely) you were pretending to be a spy when playing as a kid. Today, many of us are forced to come up with new passwords all the time when signing into sites and apps. As a password inventor, it is your responsibility to come up with good, hard-to-crack passwords. But it is also in the interest of sites and apps to make sure that you use good passwords. The problem is that it's really hard to define what makes a good password. However, the National Institute of Standards and Technology (NIST) knows what the second-best thing is: To make sure you're at least not using a bad password.
\nIn this notebook, we will go through the rules in NIST Special Publication 800-63B, which details what checks a verifier (what the NIST calls a second party responsible for storing and verifying passwords) should perform to make sure users don't pick bad passwords. We will go through the passwords of users from a fictional company and use Python to flag the users with bad passwords. But the very fact that we can do this means the fictional company is already breaking one of the rules of 800-63B:
\n\n\nVerifiers SHALL store memorized secrets in a form that is resistant to offline attacks. Memorized secrets SHALL be salted and hashed using a suitable one-way key derivation function.
\n
That is, never store users' passwords in plaintext; always salt and hash them! Keeping this in mind for the next time we're building a password management system, let's load in the data.
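As a minimal sketch of what that rule means in practice (using only Python's standard library; a production verifier would typically reach for a dedicated scheme such as bcrypt, scrypt or Argon2 instead of this bare-bones PBKDF2 call):

```python
import hashlib
import hmac
import os

def hash_password(password):
    """Salt and hash a password with a one-way key derivation function."""
    salt = os.urandom(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, 100000)
    return salt, digest

def verify_password(password, salt, digest):
    """Re-derive the hash from the candidate password and compare safely."""
    candidate = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, 100000)
    return hmac.compare_digest(candidate, digest)
```

With this scheme the verifier never needs to keep the plaintext around: only the salt and the digest are stored.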
\nWarning: The list of passwords and the fictional user database both contain real passwords leaked from real websites. These passwords have not been filtered in any way and include words that are explicit, derogatory and offensive.
","metadata":{"editable":false,"tags":["context"],"run_control":{"frozen":true},"deletable":false,"dc":{"key":"3"}}},{"cell_type":"code","source":"# Importing the pandas module\nimport pandas as pd\n\n# Loading in datasets/users.csv \nusers = pd.read_csv('datasets/users.csv')\n\n# Printing out how many users we've got\nprint(len(users.index))\n\n# Taking a look at the 12 first users\nusers.head(n=12)","metadata":{"dc":{"key":"3"},"trusted":true,"tags":["sample_code"]},"execution_count":98,"outputs":[{"text":"982\n","name":"stdout","output_type":"stream"},{"data":{"text/plain":" id user_name password\n0 1 vance.jennings joobheco\n1 2 consuelo.eaton 0869347314\n2 3 mitchel.perkins fabypotter\n3 4 odessa.vaughan aharney88\n4 5 araceli.wilder acecdn3000\n5 6 shawn.harrington 5278049\n6 7 evelyn.gay master\n7 8 noreen.hale murphy\n8 9 gladys.ward lwsves2\n9 10 brant.zimmerman 1190KAREN5572497\n10 11 leanna.abbott aivlys24\n11 12 milford.hubbard hubbard","text/html":"\n | id | \nuser_name | \npassword | \n
---|---|---|---|
0 | \n1 | \nvance.jennings | \njoobheco | \n
1 | \n2 | \nconsuelo.eaton | \n0869347314 | \n
2 | \n3 | \nmitchel.perkins | \nfabypotter | \n
3 | \n4 | \nodessa.vaughan | \naharney88 | \n
4 | \n5 | \naraceli.wilder | \nacecdn3000 | \n
5 | \n6 | \nshawn.harrington | \n5278049 | \n
6 | \n7 | \nevelyn.gay | \nmaster | \n
7 | \n8 | \nnoreen.hale | \nmurphy | \n
8 | \n9 | \ngladys.ward | \nlwsves2 | \n
9 | \n10 | \nbrant.zimmerman | \n1190KAREN5572497 | \n
10 | \n11 | \nleanna.abbott | \naivlys24 | \n
11 | \n12 | \nmilford.hubbard | \nhubbard | \n
If we take a look at the first 12 users above, we already see some bad passwords. But let's not get ahead of ourselves and start flagging passwords manually. What is the first thing we should check according to the NIST Special Publication 800-63B?
\n\n\nVerifiers SHALL require subscriber-chosen memorized secrets to be at least 8 characters in length.
\n
Ok, so the passwords of our users shouldn't be too short. Let's start by checking that!
","metadata":{"editable":false,"tags":["context"],"run_control":{"frozen":true},"deletable":false,"dc":{"key":"10"}}},{"cell_type":"code","source":"# Calculating the lengths of users' passwords\nusers['length'] = users['password'].str.len()\n\n# Flagging the users with too short passwords\nusers['too_short'] = users['length'] < 8\n\n# Counting and printing the number of users with too short passwords\nprint(users['too_short'].sum())\n\n# Taking a look at the 12 first rows\nusers.head(n=12)","metadata":{"dc":{"key":"10"},"trusted":true,"tags":["sample_code"]},"execution_count":100,"outputs":[{"text":"376\n","name":"stdout","output_type":"stream"},{"data":{"text/plain":" id user_name password length too_short\n0 1 vance.jennings joobheco 8 False\n1 2 consuelo.eaton 0869347314 10 False\n2 3 mitchel.perkins fabypotter 10 False\n3 4 odessa.vaughan aharney88 9 False\n4 5 araceli.wilder acecdn3000 10 False\n5 6 shawn.harrington 5278049 7 True\n6 7 evelyn.gay master 6 True\n7 8 noreen.hale murphy 6 True\n8 9 gladys.ward lwsves2 7 True\n9 10 brant.zimmerman 1190KAREN5572497 16 False\n10 11 leanna.abbott aivlys24 8 False\n11 12 milford.hubbard hubbard 7 True","text/html":"\n | id | \nuser_name | \npassword | \nlength | \ntoo_short | \n
---|---|---|---|---|---|
0 | \n1 | \nvance.jennings | \njoobheco | \n8 | \nFalse | \n
1 | \n2 | \nconsuelo.eaton | \n0869347314 | \n10 | \nFalse | \n
2 | \n3 | \nmitchel.perkins | \nfabypotter | \n10 | \nFalse | \n
3 | \n4 | \nodessa.vaughan | \naharney88 | \n9 | \nFalse | \n
4 | \n5 | \naraceli.wilder | \nacecdn3000 | \n10 | \nFalse | \n
5 | \n6 | \nshawn.harrington | \n5278049 | \n7 | \nTrue | \n
6 | \n7 | \nevelyn.gay | \nmaster | \n6 | \nTrue | \n
7 | \n8 | \nnoreen.hale | \nmurphy | \n6 | \nTrue | \n
8 | \n9 | \ngladys.ward | \nlwsves2 | \n7 | \nTrue | \n
9 | \n10 | \nbrant.zimmerman | \n1190KAREN5572497 | \n16 | \nFalse | \n
10 | \n11 | \nleanna.abbott | \naivlys24 | \n8 | \nFalse | \n
11 | \n12 | \nmilford.hubbard | \nhubbard | \n7 | \nTrue | \n
Already this simple rule flagged a couple of offenders among the first 12 users. Next up in Special Publication 800-63B is the rule that
\n\n\nverifiers SHALL compare the prospective secrets against a list that contains values known to be commonly-used, expected, or compromised.
\n\n
\n- Passwords obtained from previous breach corpuses.
\n- Dictionary words.
\n- Repetitive or sequential characters (e.g. ‘aaaaaa’, ‘1234abcd’).
\n- Context-specific words, such as the name of the service, the username, and derivatives thereof.
\n
We're going to check these in order and start with Passwords obtained from previous breach corpuses, that is, websites where hackers have leaked all the users' passwords. Because many websites don't follow the NIST guidelines and store passwords unhashed, there now exist large lists of the most popular passwords. Let's start by loading in the 10,000 most common passwords, which I've taken from here.
","metadata":{"editable":false,"tags":["context"],"run_control":{"frozen":true},"deletable":false,"dc":{"key":"17"}}},{"cell_type":"code","source":"# Reading in the top 10000 passwords\ncommon_passwords = pd.read_csv('datasets/10_million_password_list_top_10000.txt', header=None, squeeze=True)\n\n# Taking a look at the top 20\ncommon_passwords.head(n=20)","metadata":{"dc":{"key":"17"},"trusted":true,"tags":["sample_code"]},"execution_count":102,"outputs":[{"data":{"text/plain":"0 123456\n1 password\n2 12345678\n3 qwerty\n4 123456789\n5 12345\n6 1234\n7 111111\n8 1234567\n9 dragon\n10 123123\n11 baseball\n12 abc123\n13 football\n14 monkey\n15 letmein\n16 696969\n17 shadow\n18 master\n19 666666\nName: 0, dtype: object"},"output_type":"execute_result","metadata":{},"execution_count":102}]},{"cell_type":"markdown","source":"## 4. Passwords should not be common passwords\nThe list of passwords was ordered, with the most common passwords first, and so we shouldn't be surprised to see passwords like 123456
and qwerty
above. As hackers also have access to this list of common passwords, it's important that none of our users use these passwords!
Let's flag all the passwords in our user database that are among the top 10,000 used passwords.
","metadata":{"editable":false,"tags":["context"],"run_control":{"frozen":true},"deletable":false,"dc":{"key":"24"}}},{"cell_type":"code","source":"# Flagging the users with passwords that are common passwords\nusers['common_password'] = users['password'].isin(common_passwords)\n\n# Counting and printing the number of users using common passwords\nprint(users['common_password'].sum())\n\n# Taking a look at the 12 first rows\nusers.head(n=12)","metadata":{"dc":{"key":"24"},"trusted":true,"tags":["sample_code"]},"execution_count":104,"outputs":[{"text":"129\n","name":"stdout","output_type":"stream"},{"data":{"text/plain":" id user_name password length too_short common_password\n0 1 vance.jennings joobheco 8 False False\n1 2 consuelo.eaton 0869347314 10 False False\n2 3 mitchel.perkins fabypotter 10 False False\n3 4 odessa.vaughan aharney88 9 False False\n4 5 araceli.wilder acecdn3000 10 False False\n5 6 shawn.harrington 5278049 7 True False\n6 7 evelyn.gay master 6 True True\n7 8 noreen.hale murphy 6 True True\n8 9 gladys.ward lwsves2 7 True False\n9 10 brant.zimmerman 1190KAREN5572497 16 False False\n10 11 leanna.abbott aivlys24 8 False False\n11 12 milford.hubbard hubbard 7 True False","text/html":"\n | id | \nuser_name | \npassword | \nlength | \ntoo_short | \ncommon_password | \n
---|---|---|---|---|---|---|
0 | \n1 | \nvance.jennings | \njoobheco | \n8 | \nFalse | \nFalse | \n
1 | \n2 | \nconsuelo.eaton | \n0869347314 | \n10 | \nFalse | \nFalse | \n
2 | \n3 | \nmitchel.perkins | \nfabypotter | \n10 | \nFalse | \nFalse | \n
3 | \n4 | \nodessa.vaughan | \naharney88 | \n9 | \nFalse | \nFalse | \n
4 | \n5 | \naraceli.wilder | \nacecdn3000 | \n10 | \nFalse | \nFalse | \n
5 | \n6 | \nshawn.harrington | \n5278049 | \n7 | \nTrue | \nFalse | \n
6 | \n7 | \nevelyn.gay | \nmaster | \n6 | \nTrue | \nTrue | \n
7 | \n8 | \nnoreen.hale | \nmurphy | \n6 | \nTrue | \nTrue | \n
8 | \n9 | \ngladys.ward | \nlwsves2 | \n7 | \nTrue | \nFalse | \n
9 | \n10 | \nbrant.zimmerman | \n1190KAREN5572497 | \n16 | \nFalse | \nFalse | \n
10 | \n11 | \nleanna.abbott | \naivlys24 | \n8 | \nFalse | \nFalse | \n
11 | \n12 | \nmilford.hubbard | \nhubbard | \n7 | \nTrue | \nFalse | \n
Ay ay ay! It turns out many of our users use common passwords; two of the first 12 users alone are among them. However, as most common passwords also tend to be short, they were already flagged as being too short. What is the next thing we should check?
\n\n\nVerifiers SHALL compare the prospective secrets against a list that contains [...] dictionary words.
\n
This follows the same logic as before: it is easy for hackers to check users' passwords against common English words, and common English words therefore make bad passwords. Let's check our users' passwords against the top 10,000 English words from Google's Trillion Word Corpus.
","metadata":{"editable":false,"tags":["context"],"run_control":{"frozen":true},"deletable":false,"dc":{"key":"31"}}},{"cell_type":"code","source":"# Reading in a list of the 10000 most common words\nwords = pd.read_csv('datasets/google-10000-english.txt', header=None, squeeze=True)\n\n# Flagging the users with passwords that are common words\nusers['common_word'] = users['password'].str.lower().isin(words)\n\n# Counting and printing the number of users using common words as passwords\nprint(users['common_word'].sum())\n\n# Taking a look at the 12 first rows\nusers.head(n=12)","metadata":{"dc":{"key":"31"},"trusted":true,"tags":["sample_code"]},"execution_count":106,"outputs":[{"text":"137\n","name":"stdout","output_type":"stream"},{"data":{"text/plain":" id user_name password length too_short common_password \\\n0 1 vance.jennings joobheco 8 False False \n1 2 consuelo.eaton 0869347314 10 False False \n2 3 mitchel.perkins fabypotter 10 False False \n3 4 odessa.vaughan aharney88 9 False False \n4 5 araceli.wilder acecdn3000 10 False False \n5 6 shawn.harrington 5278049 7 True False \n6 7 evelyn.gay master 6 True True \n7 8 noreen.hale murphy 6 True True \n8 9 gladys.ward lwsves2 7 True False \n9 10 brant.zimmerman 1190KAREN5572497 16 False False \n10 11 leanna.abbott aivlys24 8 False False \n11 12 milford.hubbard hubbard 7 True False \n\n common_word \n0 False \n1 False \n2 False \n3 False \n4 False \n5 False \n6 True \n7 True \n8 False \n9 False \n10 False \n11 False ","text/html":"\n | id | \nuser_name | \npassword | \nlength | \ntoo_short | \ncommon_password | \ncommon_word | \n
---|---|---|---|---|---|---|---|
0 | \n1 | \nvance.jennings | \njoobheco | \n8 | \nFalse | \nFalse | \nFalse | \n
1 | \n2 | \nconsuelo.eaton | \n0869347314 | \n10 | \nFalse | \nFalse | \nFalse | \n
2 | \n3 | \nmitchel.perkins | \nfabypotter | \n10 | \nFalse | \nFalse | \nFalse | \n
3 | \n4 | \nodessa.vaughan | \naharney88 | \n9 | \nFalse | \nFalse | \nFalse | \n
4 | \n5 | \naraceli.wilder | \nacecdn3000 | \n10 | \nFalse | \nFalse | \nFalse | \n
5 | \n6 | \nshawn.harrington | \n5278049 | \n7 | \nTrue | \nFalse | \nFalse | \n
6 | \n7 | \nevelyn.gay | \nmaster | \n6 | \nTrue | \nTrue | \nTrue | \n
7 | \n8 | \nnoreen.hale | \nmurphy | \n6 | \nTrue | \nTrue | \nTrue | \n
8 | \n9 | \ngladys.ward | \nlwsves2 | \n7 | \nTrue | \nFalse | \nFalse | \n
9 | \n10 | \nbrant.zimmerman | \n1190KAREN5572497 | \n16 | \nFalse | \nFalse | \nFalse | \n
10 | \n11 | \nleanna.abbott | \naivlys24 | \n8 | \nFalse | \nFalse | \nFalse | \n
11 | \n12 | \nmilford.hubbard | \nhubbard | \n7 | \nTrue | \nFalse | \nFalse | \n
It turns out many of our passwords were common English words too! Next up on the NIST list:
\n\n\nVerifiers SHALL compare the prospective secrets against a list that contains [...] context-specific words, such as the name of the service, the username, and derivatives thereof.
\n
Ok, so there are many things we could check here. One thing to notice is that our users' usernames consist of their first names and last names separated by a dot. For now, let's just flag passwords that are the same as either a user's first or last name.
","metadata":{"editable":false,"tags":["context"],"run_control":{"frozen":true},"deletable":false,"dc":{"key":"38"}}},{"cell_type":"code","source":"# Extracting first and last names into their own columns\nusers['first_name'] = users['user_name'].str.extract('(^\\w+)')\nusers['last_name'] = users['user_name'].str.extract('(\\w+$)')\n\n# Flagging the users with passwords that matches their names\nusers['uses_name'] = (users['password'] == users['first_name']) | (users['password'] == users['last_name'])\n\n# Counting and printing the number of users using names as passwords\nprint(users['uses_name'].sum())\n\n# Taking a look at the 12 first rows\nusers.head(n=12)","metadata":{"dc":{"key":"38"},"trusted":true,"tags":["sample_code"]},"execution_count":108,"outputs":[{"text":"50\n","name":"stdout","output_type":"stream"},{"data":{"text/plain":" id user_name password length too_short common_password \\\n0 1 vance.jennings joobheco 8 False False \n1 2 consuelo.eaton 0869347314 10 False False \n2 3 mitchel.perkins fabypotter 10 False False \n3 4 odessa.vaughan aharney88 9 False False \n4 5 araceli.wilder acecdn3000 10 False False \n5 6 shawn.harrington 5278049 7 True False \n6 7 evelyn.gay master 6 True True \n7 8 noreen.hale murphy 6 True True \n8 9 gladys.ward lwsves2 7 True False \n9 10 brant.zimmerman 1190KAREN5572497 16 False False \n10 11 leanna.abbott aivlys24 8 False False \n11 12 milford.hubbard hubbard 7 True False \n\n common_word first_name last_name uses_name \n0 False vance jennings False \n1 False consuelo eaton False \n2 False mitchel perkins False \n3 False odessa vaughan False \n4 False araceli wilder False \n5 False shawn harrington False \n6 True evelyn gay False \n7 True noreen hale False \n8 False gladys ward False \n9 False brant zimmerman False \n10 False leanna abbott False \n11 False milford hubbard True ","text/html":"\n | id | \nuser_name | \npassword | \nlength | \ntoo_short | \ncommon_password | \ncommon_word | \nfirst_name | \nlast_name | \nuses_name | \n
---|---|---|---|---|---|---|---|---|---|---|
0 | \n1 | \nvance.jennings | \njoobheco | \n8 | \nFalse | \nFalse | \nFalse | \nvance | \njennings | \nFalse | \n
1 | \n2 | \nconsuelo.eaton | \n0869347314 | \n10 | \nFalse | \nFalse | \nFalse | \nconsuelo | \neaton | \nFalse | \n
2 | \n3 | \nmitchel.perkins | \nfabypotter | \n10 | \nFalse | \nFalse | \nFalse | \nmitchel | \nperkins | \nFalse | \n
3 | \n4 | \nodessa.vaughan | \naharney88 | \n9 | \nFalse | \nFalse | \nFalse | \nodessa | \nvaughan | \nFalse | \n
4 | \n5 | \naraceli.wilder | \nacecdn3000 | \n10 | \nFalse | \nFalse | \nFalse | \naraceli | \nwilder | \nFalse | \n
5 | \n6 | \nshawn.harrington | \n5278049 | \n7 | \nTrue | \nFalse | \nFalse | \nshawn | \nharrington | \nFalse | \n
6 | \n7 | \nevelyn.gay | \nmaster | \n6 | \nTrue | \nTrue | \nTrue | \nevelyn | \ngay | \nFalse | \n
7 | \n8 | \nnoreen.hale | \nmurphy | \n6 | \nTrue | \nTrue | \nTrue | \nnoreen | \nhale | \nFalse | \n
8 | \n9 | \ngladys.ward | \nlwsves2 | \n7 | \nTrue | \nFalse | \nFalse | \ngladys | \nward | \nFalse | \n
9 | \n10 | \nbrant.zimmerman | \n1190KAREN5572497 | \n16 | \nFalse | \nFalse | \nFalse | \nbrant | \nzimmerman | \nFalse | \n
10 | \n11 | \nleanna.abbott | \naivlys24 | \n8 | \nFalse | \nFalse | \nFalse | \nleanna | \nabbott | \nFalse | \n
11 | \n12 | \nmilford.hubbard | \nhubbard | \n7 | \nTrue | \nFalse | \nFalse | \nmilford | \nhubbard | \nTrue | \n
Milford Hubbard (user number 12 above), what were you thinking!? Ok, so the last thing we are going to check is a bit tricky:
\n\n\nverifiers SHALL compare the prospective secrets [so that they don't contain] repetitive or sequential characters (e.g. ‘aaaaaa’, ‘1234abcd’).
\n
This is tricky to check because what is repetitive is hard to define. Is 11111
repetitive? Yes! Is 12345
repetitive? Well, kind of. Is 13579
repetitive? Maybe not..? Checking for repetitiveness can get arbitrarily complex, but here we're only going to do something simple: we're going to flag all passwords that contain 4 or more repeated characters.
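A minimal sketch of such a check, using a regex backreference; the column name too_many_repeats matches the output below:

```python
# Flag passwords containing the same character 4 or more times in a row:
# (.) captures any single character, \1\1\1 requires three more copies of it
users['too_many_repeats'] = users['password'].str.contains(r'(.)\1\1\1')

# Take a look at the users that were flagged
users[users['too_many_repeats']]
```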
\n | id | \nuser_name | \npassword | \nlength | \ntoo_short | \ncommon_password | \ncommon_word | \nfirst_name | \nlast_name | \nuses_name | \ntoo_many_repeats | \n
---|---|---|---|---|---|---|---|---|---|---|---|
146 | \n147 | \npatti.dixon | \n555555 | \n6 | \nTrue | \nTrue | \nFalse | \npatti | \ndixon | \nFalse | \nTrue | \n
572 | \n573 | \ncornelia.bradley | \n555555 | \n6 | \nTrue | \nTrue | \nFalse | \ncornelia | \nbradley | \nFalse | \nTrue | \n
644 | \n645 | \nessie.lopez | \n11111 | \n5 | \nTrue | \nTrue | \nFalse | \nessie | \nlopez | \nFalse | \nTrue | \n
798 | \n799 | \ncharley.key | \n888888 | \n6 | \nTrue | \nTrue | \nFalse | \ncharley | \nkey | \nFalse | \nTrue | \n
941 | \n942 | \nmitch.ferguson | \naaaaaa | \n6 | \nTrue | \nTrue | \nFalse | \nmitch | \nferguson | \nFalse | \nTrue | \n
Now we have implemented all the basic tests for bad passwords suggested by NIST Special Publication 800-63B! What's left is just to flag all bad passwords and maybe send these users an e-mail that strongly suggests they change their password.
","metadata":{"editable":false,"tags":["context"],"run_control":{"frozen":true},"deletable":false,"dc":{"key":"52"}}},{"cell_type":"code","source":"# Flagging all passwords that are bad\nusers['bad_password'] = users['too_short'] |\\\n users['common_password'] |\\\n users['common_word'] |\\\n users['uses_name'] |\\\n users['too_many_repeats']\n\n# Counting and printing the number of bad passwords\nprint(users['bad_password'].sum())\n\n# Looking at the first 25 bad passwords\nusers.loc[users['bad_password'] == True]['password'].head(n=25)","metadata":{"dc":{"key":"52"},"trusted":true,"tags":["sample_code"]},"execution_count":112,"outputs":[{"text":"423\n","name":"stdout","output_type":"stream"},{"data":{"text/plain":"5 5278049\n6 master\n7 murphy\n8 lwsves2\n11 hubbard\n13 310356\n15 oZ4k0QE\n16 chelsea\n17 zvc1939\n18 nickgd\n21 cocacola\n22 woodard\n25 AJ9Da\n26 ewokzs\n28 YyGjz8E\n30 reid\n34 jOYZBs8\n38 wwewwf1\n43 225377\n45 NdZ7E6\n47 CQB3Z\n48 diffo\n51 123456789\n52 y8uM7D6\n56 mikeloo\nName: password, dtype: object"},"output_type":"execute_result","metadata":{},"execution_count":112}]},{"cell_type":"markdown","source":"## 9. Otherwise, the password should be up to the user\nIn this notebook, we've implemented the password checks recommended by the NIST Special Publication 800-63B. It's certainly possible to better implement these checks, for example, by using a longer list of common passwords. Also note that the NIST checks in no way guarantee that a chosen password is good, just that it's not obviously bad.
\nApart from the checks we've implemented above, the NIST is also clear about which password rules should not be imposed:
\n\n\nVerifiers SHOULD NOT impose other composition rules (e.g., requiring mixtures of different character types or prohibiting consecutively repeated characters) for memorized secrets. Verifiers SHOULD NOT require memorized secrets to be changed arbitrarily (e.g., periodically).
\n
So the next time a website or app tells you to \"include both a number, symbol and an upper and lower case character in your password\", you should send them a copy of NIST Special Publication 800-63B.
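As a wrap-up, here is a hypothetical helper that bundles the notebook's checks into one function; the name and signature are our own, and it assumes usernames follow the first.last pattern used in this dataset:

```python
import re

def passes_nist_checks(password, username, common_passwords, common_words):
    """Return True if a password survives the basic 800-63B checks above."""
    first_name, last_name = username.split('.')
    if len(password) < 8:                    # too short
        return False
    if password in common_passwords:         # known breached password
        return False
    if password.lower() in common_words:     # dictionary word
        return False
    if password in (first_name, last_name):  # context-specific word
        return False
    if re.search(r'(.)\1\1\1', password):    # 4+ repeated characters
        return False
    return True
```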
","metadata":{"editable":false,"tags":["context"],"run_control":{"frozen":true},"deletable":false,"dc":{"key":"59"}}},{"cell_type":"code","source":"# Enter a password that passes the NIST requirements\n# PLEASE DO NOT USE AN EXISTING PASSWORD HERE\nnew_password = \"i_like_pie\"","metadata":{"dc":{"key":"59"},"collapsed":true,"trusted":true,"tags":["sample_code"]},"execution_count":114,"outputs":[]}],"metadata":{"kernelspec":{"display_name":"Python 3","name":"python3","language":"python"},"language_info":{"name":"python","mimetype":"text/x-python","pygments_lexer":"ipython3","nbconvert_exporter":"python","version":"3.5.2","file_extension":".py","codemirror_mode":{"name":"ipython","version":3}}},"nbformat":4,"nbformat_minor":2} -------------------------------------------------------------------------------- /exploring_67_years_of_lego.ipynb: -------------------------------------------------------------------------------- 1 | {"nbformat_minor":2,"metadata":{"kernelspec":{"display_name":"Python 3","name":"python3","language":"python"},"language_info":{"nbconvert_exporter":"python","mimetype":"text/x-python","version":"3.5.2","pygments_lexer":"ipython3","name":"python","codemirror_mode":{"version":3,"name":"ipython"},"file_extension":".py"}},"cells":[{"source":"## Introduction\nEveryone loves Lego (unless you ever stepped on one). Did you know by the way that \"Lego\" was derived from the Danish phrase leg godt, which means \"play well\"? Unless you speak Danish, probably not.
\nIn this project, we will analyze a fascinating dataset on every single lego block that has ever been built!
\nThis comprehensive database of lego blocks is provided by Rebrickable. The data is available as csv files and the schema is shown below.
\nLet us start by reading in the colors data to get a sense of the diversity of lego sets!
","metadata":{"tags":["context"],"run_control":{"frozen":true},"deletable":false,"editable":false,"dc":{"key":"044b2cef41"}},"cell_type":"markdown"},{"source":"# Import modules\nimport pandas as pd\n\n# Read colors data\ncolors = pd.read_csv('datasets/colors.csv')\n\n# Print the first few rows\ncolors.head()","metadata":{"trusted":true,"tags":["sample_code"],"dc":{"key":"044b2cef41"}},"execution_count":65,"cell_type":"code","outputs":[{"data":{"text/html":"\n | id | \nname | \nrgb | \nis_trans | \n
---|---|---|---|---|
0 | \n-1 | \nUnknown | \n0033B2 | \nf | \n
1 | \n0 | \nBlack | \n05131D | \nf | \n
2 | \n1 | \nBlue | \n0055BF | \nf | \n
3 | \n2 | \nGreen | \n237841 | \nf | \n
4 | \n3 | \nDark Turquoise | \n008F9B | \nf | \n
Now that we have read the colors
data, we can start exploring it! Let us start by understanding the number of colors available.
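A minimal sketch of that count (the variable name num_colors is an assumption):

```python
# Count the number of distinct colors available
num_colors = len(colors)
print(num_colors)
```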
The colors
data has a column named is_trans
that indicates whether a color is transparent or not. It would be interesting to explore the distribution of transparent vs. non-transparent colors.
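A minimal sketch of that summary, grouping on the is_trans flag:

```python
# Summarize colors grouped by their transparency flag
colors_summary = colors.groupby('is_trans').count()
print(colors_summary)
```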
Another interesting dataset available in this database is the sets
data. It contains a comprehensive list of sets over the years and the number of parts that each of these sets contained.
Let us use this data to explore how the average number of parts in lego sets has varied over the years.
","metadata":{"tags":["context"],"run_control":{"frozen":true},"deletable":false,"editable":false,"dc":{"key":"c9d0e58653"}},"cell_type":"markdown"},{"source":"%matplotlib inline\n# Read sets data as `sets`\nsets = pd.read_csv('datasets/sets.csv')\n# Create a summary of average number of parts by year: `parts_by_year`\nparts_by_year = sets.groupby('year')['num_parts'].mean().reset_index()\nprint(parts_by_year.head())\n# Plot trends in average number of parts by year\nimport matplotlib.pyplot as plt\n\nplt.scatter(x = parts_by_year['year'], y = parts_by_year['num_parts'])","metadata":{"trusted":true,"tags":["sample_code"],"dc":{"key":"c9d0e58653"}},"execution_count":71,"cell_type":"code","outputs":[{"output_type":"stream","text":" year num_parts\n0 1950 10.142857\n1 1953 16.500000\n2 1954 12.357143\n3 1955 36.857143\n4 1956 18.500000\n","name":"stdout"},{"data":{"text/plain":"Lego blocks ship under multiple themes. Let us try to get a sense of how the number of themes shipped has varied over the years.
","metadata":{"tags":["context"],"run_control":{"frozen":true},"deletable":false,"editable":false,"dc":{"key":"266a3f390c"}},"cell_type":"markdown"},{"source":"# themes_by_year: Number of themes shipped by year\nthemes_by_year = sets.groupby('year')['theme_id'].nunique().reset_index()\nprint(themes_by_year.head())","metadata":{"trusted":true,"tags":["sample_code"],"dc":{"key":"266a3f390c"}},"execution_count":73,"cell_type":"code","outputs":[{"output_type":"stream","text":" year theme_id\n0 1950 2\n1 1953 1\n2 1954 2\n3 1955 4\n4 1956 3\n","name":"stdout"}]},{"source":"## Wrapping It All Up!\nLego blocks offer an unlimited amoung of fun across ages. We explored some interesting trends around colors, parts and themes.
","metadata":{"tags":["context"],"run_control":{"frozen":true},"deletable":false,"editable":false,"dc":{"key":"a293e5076e"}},"cell_type":"markdown"},{"source":"# Nothing to do here","metadata":{"collapsed":true,"trusted":true,"tags":["sample_code"],"dc":{"key":"a293e5076e"}},"execution_count":75,"cell_type":"code","outputs":[]}],"nbformat":4} -------------------------------------------------------------------------------- /exploring_the_evolution_of_linux.ipynb: -------------------------------------------------------------------------------- 1 | {"metadata":{"kernelspec":{"language":"python","name":"python3","display_name":"Python 3"},"language_info":{"nbconvert_exporter":"python","name":"python","file_extension":".py","mimetype":"text/x-python","version":"3.5.2","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3"}},"cells":[{"metadata":{"editable":false,"tags":["context"],"deletable":false,"run_control":{"frozen":true},"dc":{"key":"4"}},"cell_type":"markdown","source":"## 1. Introduction\nVersion control repositories like CVS, Subversion or Git can be a real gold mine for software developers. They contain every change to the source code including the date (the \"when\"), the responsible developer (the \"who\"), as well as little message that describes the intention (the \"what\") of a change.
\n\nIn this notebook, we will analyze the evolution of a very famous open-source project – the Linux kernel. The Linux kernel is the heart of some Linux distributions like Debian, Ubuntu or CentOS.
\nWe get some first insights into the work of the development efforts by
\nLinus Torvalds, the (spoiler alert!) main contributor to the Linux kernel (and also the creator of Git), created a mirror of the Linux repository on GitHub. It contains the complete history of kernel development for the last 13 years.
\nFor our analysis, we will use a Git log file with the following content:
"},{"execution_count":80,"outputs":[{"output_type":"stream","name":"stdout","text":"['1502382966#Linus Torvalds\\n', '1501368308#Max Gurtovoy\\n', '1501625560#James Smart\\n', '1501625559#James Smart\\n', '1500568442#Martin Wilck\\n']\n"}],"metadata":{"tags":["sample_code"],"trusted":true,"dc":{"key":"4"}},"cell_type":"code","source":"# Printing the content of git_log_excerpt.csv\nwith open(\"datasets/git_log_excerpt.csv\") as myfile:\n firstNlines=myfile.readlines()[0:5]\n \nprint(firstNlines)"},{"metadata":{"editable":false,"tags":["context"],"deletable":false,"run_control":{"frozen":true},"dc":{"key":"11"}},"cell_type":"markdown","source":"## 2. Reading in the dataset\nThe dataset was created by using the command git log --encoding=latin-1 --pretty=\"%at#%aN\"
. The latin-1
encoded text output was saved in a header-less csv file. In this file, each row is a commit entry with the following information:
timestamp
: the time of the commit as a UNIX timestamp in seconds since 1970-01-01 00:00:00 (Git log placeholder \"%at
\")author
: the name of the author that performed the commit (Git log placeholder \"%aN
\")The columns are separated by the number sign #
. The complete dataset is in the datasets/
directory. It is a gz
-compressed csv file named git_log.gz
.
\n | timestamp | \nauthor | \n
---|---|---|
0 | \n1502826583 | \nLinus Torvalds | \n
1 | \n1501749089 | \nAdrian Hunter | \n
2 | \n1501749088 | \nAdrian Hunter | \n
3 | \n1501882480 | \nKees Cook | \n
4 | \n1497271395 | \nRob Clark | \n
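Given the description above (a '#'-separated, latin-1 encoded, gz-compressed, header-less csv), a minimal sketch of the loading step that would produce the preview just shown:

```python
import pandas as pd

# Read the compressed log; pandas infers gzip compression from the extension
git_log = pd.read_csv(
    'datasets/git_log.gz',
    sep='#',
    encoding='latin-1',
    header=None,
    names=['timestamp', 'author'])

# Print out the first 5 rows
git_log.head(5)
```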
The dataset contains the information about every single code contribution (a \"commit\") to the Linux kernel over the last 13 years. We'll first take a look at the number of authors and their commits to the repository.
"},{"execution_count":84,"outputs":[{"output_type":"stream","name":"stdout","text":"17385 authors committed 699071 code changes.\n"}],"metadata":{"tags":["sample_code"],"trusted":true,"dc":{"key":"18"}},"cell_type":"code","source":"# calculating number of commits\nnumber_of_commits = len(git_log.index)\n\n# calculating number of authors\nnumber_of_authors = git_log['author'].nunique()\n\n# printing out the results\nprint(\"%s authors committed %s code changes.\" % (number_of_authors, number_of_commits))"},{"metadata":{"editable":false,"tags":["context"],"deletable":false,"run_control":{"frozen":true},"dc":{"key":"25"}},"cell_type":"markdown","source":"## 4. Finding the TOP 10 contributors\nThere are some very important people that changed the Linux kernel very often. To see if there are any bottlenecks, we take a look at the TOP 10 authors with the most commits.
"},{"execution_count":86,"outputs":[{"execution_count":86,"metadata":{},"output_type":"execute_result","data":{"text/html":"\n | count | \n
---|---|
author | \n\n |
Linus Torvalds | \n23361 | \n
David S. Miller | \n9106 | \n
Mark Brown | \n6802 | \n
Takashi Iwai | \n6209 | \n
Al Viro | \n6006 | \n
H Hartley Sweeten | \n5938 | \n
Ingo Molnar | \n5344 | \n
Mauro Carvalho Chehab | \n5204 | \n
Arnd Bergmann | \n4890 | \n
Greg Kroah-Hartman | \n4580 | \n
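A minimal sketch of how this table can be computed (the variable name is an assumption):

```python
# Identify the TOP 10 authors by number of commits
top_10_authors = git_log['author'].value_counts().head(10)
top_10_authors
```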
For our analysis, we want to visualize the contributions over time. For this, we use the information in the timestamp
column to create a time series-based column.
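A minimal sketch of that conversion, assuming the git_log frame from the loading sketch above:

```python
# Convert the UNIX epoch seconds into proper datetime values
git_log['timestamp'] = pd.to_datetime(git_log['timestamp'], unit='s')

# Summarize the converted column
git_log['timestamp'].describe()
```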
As we can see from the results above, some contributors had their operating system's time incorrectly set when they committed to the repository. We'll clean up the timestamp
column by dropping the rows with the incorrect timestamps.
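A minimal sketch of the clean-up; the exact bounds are assumptions (the kernel's Git history starts in April 2005, and the text above mentions a roughly 13-year window):

```python
# Keep only commits with plausible timestamps
first_commit = pd.to_datetime('2005-04-16')  # assumed start of the Git history
last_commit = pd.to_datetime('2018-01-01')   # assumed upper bound for the snapshot
corrected_log = git_log[
    (git_log['timestamp'] >= first_commit) &
    (git_log['timestamp'] <= last_commit)]
```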
To find out how the development activity has increased over time, we'll group the commits by year and count them up.
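A minimal sketch of the grouping that would yield the table below (the variable name commits_per_year is an assumption):

```python
# Count commits per year on the cleaned log
commits_per_year = corrected_log.groupby(
    pd.Grouper(key='timestamp', freq='AS')).count()
commits_per_year.head()
```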
"},{"execution_count":92,"outputs":[{"execution_count":92,"metadata":{},"output_type":"execute_result","data":{"text/html":"\n | author | \n
---|---|
timestamp | \n\n |
2005-01-01 | \n16229 | \n
2006-01-01 | \n29255 | \n
2007-01-01 | \n33759 | \n
2008-01-01 | \n48847 | \n
2009-01-01 | \n52572 | \n
Finally, we'll make a plot out of these counts to better see how the development effort on Linux has increased over the the last few years.
"},{"execution_count":94,"outputs":[{"metadata":{},"output_type":"display_data","data":{"text/plain":"Thanks to the solid foundation and caretaking of Linux Torvalds, many other developers are now able to contribute to the Linux kernel as well. There is no decrease of development activity at sight!
"},{"execution_count":96,"outputs":[],"metadata":{"collapsed":true,"tags":["sample_code"],"trusted":true,"dc":{"key":"60"}},"cell_type":"code","source":"# calculating or setting the year with the most commits to Linux\nyear_with_most_commits = 2016 "}],"nbformat":4,"nbformat_minor":2} -------------------------------------------------------------------------------- /generating_keywords_for_google_adwords.ipynb: -------------------------------------------------------------------------------- 1 | {"nbformat":4,"metadata":{"language_info":{"name":"python","file_extension":".py","codemirror_mode":{"name":"ipython","version":3},"nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.5.2","mimetype":"text/x-python"},"kernelspec":{"name":"python3","display_name":"Python 3","language":"python"}},"nbformat_minor":2,"cells":[{"cell_type":"markdown","metadata":{"run_control":{"frozen":true},"dc":{"key":"4"},"deletable":false,"tags":["context"],"editable":false},"source":"## 1. The brief\nImagine working for a digital marketing agency, and the agency is approached by a massive online retailer of furniture. They want to test our skills at creating large campaigns for all of their website. We are tasked with creating a prototype set of keywords for search campaigns for their sofas section. The client says that they want us to generate keywords for the following products:
\nThe brief: The client is generally a low-cost retailer, offering many promotions and discounts. We will need to focus on such keywords. We will also need to move away from luxury keywords and topics, as we are targeting price-sensitive customers. Because we are going to be tight on budget, it would be good to focus on a tightly targeted set of keywords and make sure they are all set to exact and phrase match.
\nBased on the brief above we will first need to generate a list of words, that together with the products given above would make for good keywords. Here are some examples:
\nThe resulting keywords: 'buy sofas', 'sofas buy', 'buy recliners', 'recliners buy',\n 'prices sofas', 'sofas prices', 'prices recliners', 'recliners prices'.
\nAs a final result, we want to have a DataFrame that looks like this:
\nCampaign | \nAd Group | \nKeyword | \nCriterion Type | \n
---|---|---|---|
Campaign1 | \nAdGroup_1 | \nkeyword 1a | \nExact | \n
Campaign1 | \nAdGroup_1 | \nkeyword 1a | \nPhrase | \n
Campaign1 | \nAdGroup_1 | \nkeyword 1b | \nExact | \n
Campaign1 | \nAdGroup_1 | \nkeyword 1b | \nPhrase | \n
Campaign1 | \nAdGroup_2 | \nkeyword 2a | \nExact | \n
Campaign1 | \nAdGroup_2 | \nkeyword 2a | \nPhrase | \n
The first step is to come up with a list of words that users might use to express their desire in buying low-cost sofas.
"},{"cell_type":"code","execution_count":176,"metadata":{"trusted":true,"dc":{"key":"4"},"tags":["sample_code"]},"outputs":[{"name":"stdout","text":"['buy', 'discount', 'promotion', 'price', 'promo', 'shop', 'cheap']\n","output_type":"stream"}],"source":"# List of words to pair with products\nwords = ['buy', 'discount', 'promotion', 'price', 'promo', 'shop', 'cheap']\n\n# Print list of words\nprint(words)"},{"cell_type":"markdown","metadata":{"run_control":{"frozen":true},"dc":{"key":"11"},"deletable":false,"tags":["context"],"editable":false},"source":"## 2. Combine the words with the product names\nImagining all the possible combinations of keywords can be stressful! But not for us, because we are keyword ninjas! We know how to translate campaign briefs into Python data structures and can imagine the resulting DataFrames that we need to create.
\nNow that we have brainstormed the words that work well with the brief that we received, it is now time to combine them with the product names to generate meaningful search keywords. We want to combine every word with every product once before, and once after, as seen in the example above.
\nAs a quick reminder, for the product 'recliners' and the words 'buy' and 'price' for example, we would want to generate the following combinations:
\nbuy recliners
\nrecliners buy
\nprice recliners
\nrecliners price
\n...
and so on for all the words and products that we have.
"},{"cell_type":"code","execution_count":178,"metadata":{"trusted":true,"dc":{"key":"11"},"tags":["sample_code"]},"outputs":[{"name":"stdout","text":"[['sofas', 'sofas buy'], ['sofas', 'buy sofas'], ['sofas', 'sofas discount'], ['sofas', 'discount sofas'], ['sofas', 'sofas promotion'], ['sofas', 'promotion sofas'], ['sofas', 'sofas price'], ['sofas', 'price sofas'], ['sofas', 'sofas promo'], ['sofas', 'promo sofas'], ['sofas', 'sofas shop'], ['sofas', 'shop sofas'], ['sofas', 'sofas cheap'], ['sofas', 'cheap sofas'], ['convertible sofas', 'convertible sofas buy'], ['convertible sofas', 'buy convertible sofas'], ['convertible sofas', 'convertible sofas discount'], ['convertible sofas', 'discount convertible sofas'], ['convertible sofas', 'convertible sofas promotion'], ['convertible sofas', 'promotion convertible sofas'], ['convertible sofas', 'convertible sofas price'], ['convertible sofas', 'price convertible sofas'], ['convertible sofas', 'convertible sofas promo'], ['convertible sofas', 'promo convertible sofas'], ['convertible sofas', 'convertible sofas shop'], ['convertible sofas', 'shop convertible sofas'], ['convertible sofas', 'convertible sofas cheap'], ['convertible sofas', 'cheap convertible sofas'], ['love seats', 'love seats buy'], ['love seats', 'buy love seats'], ['love seats', 'love seats discount'], ['love seats', 'discount love seats'], ['love seats', 'love seats promotion'], ['love seats', 'promotion love seats'], ['love seats', 'love seats price'], ['love seats', 'price love seats'], ['love seats', 'love seats promo'], ['love seats', 'promo love seats'], ['love seats', 'love seats shop'], ['love seats', 'shop love seats'], ['love seats', 'love seats cheap'], ['love seats', 'cheap love seats'], ['recliners', 'recliners buy'], ['recliners', 'buy recliners'], ['recliners', 'recliners discount'], ['recliners', 'discount recliners'], ['recliners', 'recliners promotion'], ['recliners', 'promotion recliners'], ['recliners', 'recliners price'], ['recliners', 'price recliners'], ['recliners', 'recliners promo'], ['recliners', 'promo recliners'], ['recliners', 'recliners shop'], ['recliners', 'shop recliners'], ['recliners', 'recliners cheap'], ['recliners', 'cheap recliners'], ['sofa beds', 'sofa beds buy'], ['sofa beds', 'buy sofa beds'], ['sofa beds', 'sofa beds discount'], ['sofa beds', 'discount sofa beds'], ['sofa beds', 'sofa beds promotion'], ['sofa beds', 'promotion sofa beds'], ['sofa beds', 'sofa beds price'], ['sofa beds', 'price sofa beds'], ['sofa beds', 'sofa beds promo'], ['sofa beds', 'promo sofa beds'], ['sofa beds', 'sofa beds shop'], ['sofa beds', 'shop sofa beds'], ['sofa beds', 'sofa beds cheap'], ['sofa beds', 'cheap sofa beds']]\n","output_type":"stream"}],"source":"products = ['sofas', 'convertible sofas', 'love seats', 'recliners', 'sofa beds']\n\n# Create an empty list\nkeywords_list = []\n\n# Loop through products\nfor product in products:\n # Loop through words\n for word in words:\n # Append combinations\n keywords_list.append([product, product + ' ' + word])\n keywords_list.append([product, word + ' ' + product])\n \n# Inspect keyword list\nprint(keywords_list)"},{"cell_type":"markdown","metadata":{"run_control":{"frozen":true},"dc":{"key":"18"},"deletable":false,"tags":["context"],"editable":false},"source":"## 3. Convert the list of lists into a DataFrame\nNow we want to convert this list of lists into a DataFrame so we can easily manipulate it and manage the final output.
"},{"cell_type":"code","execution_count":180,"metadata":{"trusted":true,"dc":{"key":"18"},"tags":["sample_code"]},"outputs":[{"name":"stdout","text":" 0 1\n0 sofas sofas buy\n1 sofas buy sofas\n2 sofas sofas discount\n3 sofas discount sofas\n4 sofas sofas promotion\n","output_type":"stream"}],"source":"# Load library\nimport pandas as pd\n\n# Create a DataFrame from list\nkeywords_df = pd.DataFrame.from_records(keywords_list)\n\n# Print the keywords DataFrame to explore it\nprint(keywords_df.head())"},{"cell_type":"markdown","metadata":{"run_control":{"frozen":true},"dc":{"key":"25"},"deletable":false,"tags":["context"],"editable":false},"source":"## 4. Rename the columns of the DataFrame\nBefore we can upload this table of keywords, we will need to give the columns meaningful names. If we inspect the DataFrame we just created above, we can see that the columns are currently named 0
and 1
. Ad Group
(example: \"sofas\") and Keyword
(example: \"sofas buy\") are much more appropriate names.
Now we need to add some additional information to our DataFrame. \nWe need a new column called Campaign
for the campaign name. We want campaign names to be descriptive of our group of keywords and products, so let's call this campaign 'SEM_Sofas'.
There are different keyword match types. One is exact match, which is for matching the exact term or are close variations of that exact term. Another match type is broad match, which means ads may show on searches that include misspellings, synonyms, related searches, and other relevant variations.
\nStraight from Google's AdWords documentation:
\n\n\nIn general, the broader the match type, the more traffic potential that keyword will have, since your ads may be triggered more often. Conversely, a narrower match type means that your ads may show less often—but when they do, they’re likely to be more related to someone’s search.
\n
Since the client is tight on budget, we want to make sure all the keywords are in exact match at the beginning.
"},{"cell_type":"code","execution_count":186,"metadata":{"trusted":true,"dc":{"key":"39"},"tags":["sample_code"],"collapsed":true},"outputs":[],"source":"# Add a criterion type column\nkeywords_df['Criterion Type'] = 'Exact'"},{"cell_type":"markdown","metadata":{"run_control":{"frozen":true},"dc":{"key":"46"},"deletable":false,"tags":["context"],"editable":false},"source":"## 7. Duplicate all the keywords into 'phrase' match\nThe great thing about exact match is that it is very specific, and we can control the process very well. The tradeoff, however, is that:
\nSo it's good to use another match called phrase match as a discovery mechanism to allow our ads to be triggered by keywords that include our exact match keywords, together with anything before (or after) them.
\nLater on, when we launch the campaign, we can explore with modified broad match, broad match, and negative match types, for better visibility and control of our campaigns.
"},{"cell_type":"code","execution_count":188,"metadata":{"trusted":true,"dc":{"key":"46"},"tags":["sample_code"],"collapsed":true},"outputs":[],"source":"# Make a copy of the keywords DataFrame\nkeywords_phrase = keywords_df.copy()\n\n# Change criterion type match to phrase\nkeywords_phrase['Criterion Type'] = 'Phrase'\n\n# Append the DataFrames\nkeywords_df_final = keywords_df.append(keywords_phrase)"},{"cell_type":"markdown","metadata":{"run_control":{"frozen":true},"dc":{"key":"53"},"deletable":false,"tags":["context"],"editable":false},"source":"## 8. Save and summarize!\nTo upload our campaign, we need to save it as a CSV file. Then we will be able to import it to AdWords editor or BingAds editor. There is also the option of pasting the data into the editor if we want, but having easy access to the saved data is great so let's save to a CSV file!
\nLooking at a summary of our campaign structure is good now that we've wrapped up our keyword work. We can do that by grouping by ad group and criterion type and counting by keyword. This summary shows us that we assigned specific keywords to specific ad groups, which are each part of a campaign. In essence, we are telling Google (or Bing, etc.) that we want any of the words in each ad group to trigger one of the ads in the same ad group. Separately, we will have to create another table for ads, which is a task for another day and would look something like this:
\nCampaign | \nAd Group | \nHeadline 1 | \nHeadline 2 | \nDescription | \nFinal URL | \n
---|---|---|---|---|---|
SEM_Sofas | \nSofas | \nLooking for Quality Sofas? | \nExplore Our Massive Collection | \n30-day Returns With Free Delivery Within the US. Start Shopping Now | \nDataCampSofas.com/sofas | \n
SEM_Sofas | \nSofas | \nLooking for Affordable Sofas? | \nCheck Out Our Weekly Offers | \n30-day Returns With Free Delivery Within the US. Start Shopping Now | \nDataCampSofas.com/sofas | \n
SEM_Sofas | \nRecliners | \nLooking for Quality Recliners? | \nExplore Our Massive Collection | \n30-day Returns With Free Delivery Within the US. Start Shopping Now | \nDataCampSofas.com/recliners | \n
SEM_Sofas | \nRecliners | \nNeed Affordable Recliners? | \nCheck Out Our Weekly Offers | \n30-day Returns With Free Delivery Within the US. Start Shopping Now | \nDataCampSofas.com/recliners | \n
Together, these tables get us the sample keywords -> ads -> landing pages mapping shown in the diagram below.
\nBlood transfusion saves lives - from replacing lost blood during major surgery or a serious injury to treating various illnesses and blood disorders. Ensuring that there's enough blood in supply whenever needed is a serious challenge for the health professionals. According to WebMD, \"about 5 million Americans need a blood transfusion every year\".
\n", 22 | "Our dataset is from a mobile blood donation vehicle in Taiwan. The Blood Transfusion Service Center drives to different universities and collects blood as part of a blood drive. We want to predict whether or not a donor will give blood the next time the vehicle comes to campus.
\n", 23 | "The data is stored in datasets/transfusion.data
and it is structured according to RFMTC marketing model (a variation of RFM). We'll explore what that means later in this notebook. First, let's inspect the data.
We now know that we are working with a typical CSV file (i.e., the delimiter is ,
, etc.). We proceed to loading the data into memory.
\n", 113 | " | Recency (months) | \n", 114 | "Frequency (times) | \n", 115 | "Monetary (c.c. blood) | \n", 116 | "Time (months) | \n", 117 | "whether he/she donated blood in March 2007 | \n", 118 | "
---|---|---|---|---|---|
0 | \n", 123 | "2 | \n", 124 | "50 | \n", 125 | "12500 | \n", 126 | "98 | \n", 127 | "1 | \n", 128 | "
1 | \n", 131 | "0 | \n", 132 | "13 | \n", 133 | "3250 | \n", 134 | "28 | \n", 135 | "1 | \n", 136 | "
2 | \n", 139 | "1 | \n", 140 | "16 | \n", 141 | "4000 | \n", 142 | "35 | \n", 143 | "1 | \n", 144 | "
3 | \n", 147 | "2 | \n", 148 | "20 | \n", 149 | "5000 | \n", 150 | "45 | \n", 151 | "1 | \n", 152 | "
4 | \n", 155 | "1 | \n", 156 | "24 | \n", 157 | "6000 | \n", 158 | "77 | \n", 159 | "0 | \n", 160 | "
Let's briefly return to our discussion of RFM model. RFM stands for Recency, Frequency and Monetary Value and it is commonly used in marketing for identifying your best customers. In our case, our customers are blood donors.
\n", 215 | "RFMTC is a variation of the RFM model. Below is a description of what each column means in our dataset:
\n", 216 | "It looks like every column in our DataFrame has the numeric type, which is exactly what we want when building a machine learning model. Let's verify our hypothesis.
" 224 | ] 225 | }, 226 | { 227 | "cell_type": "code", 228 | "execution_count": 157, 229 | "metadata": { 230 | "dc": { 231 | "key": "17" 232 | }, 233 | "tags": [ 234 | "sample_code" 235 | ] 236 | }, 237 | "outputs": [ 238 | { 239 | "name": "stdout", 240 | "output_type": "stream", 241 | "text": [ 242 | "We are aiming to predict the value in whether he/she donated blood in March 2007
column. Let's rename this it to target
so that it's more convenient to work with.
\n", 313 | " | Recency (months) | \n", 314 | "Frequency (times) | \n", 315 | "Monetary (c.c. blood) | \n", 316 | "Time (months) | \n", 317 | "target | \n", 318 | "
---|---|---|---|---|---|
0 | \n", 323 | "2 | \n", 324 | "50 | \n", 325 | "12500 | \n", 326 | "98 | \n", 327 | "1 | \n", 328 | "
1 | \n", 331 | "0 | \n", 332 | "13 | \n", 333 | "3250 | \n", 334 | "28 | \n", 335 | "1 | \n", 336 | "
We want to predict whether or not the same donor will give blood the next time the vehicle comes to campus. The model for this is a binary classifier, meaning that there are only 2 possible outcomes:
\n", 385 | "0
- the donor will not give blood1
- the donor will give bloodTarget incidence is defined as the number of cases of each individual target value in a dataset. That is, how many 0s in the target column compared to how many 1s? Target incidence gives us an idea of how balanced (or imbalanced) is our dataset.
" 390 | ] 391 | }, 392 | { 393 | "cell_type": "code", 394 | "execution_count": 161, 395 | "metadata": { 396 | "dc": { 397 | "key": "31" 398 | }, 399 | "tags": [ 400 | "sample_code" 401 | ] 402 | }, 403 | "outputs": [ 404 | { 405 | "data": { 406 | "text/plain": [ 407 | "0 0.762\n", 408 | "1 0.238\n", 409 | "Name: target, dtype: float64" 410 | ] 411 | }, 412 | "execution_count": 161, 413 | "metadata": {}, 414 | "output_type": "execute_result" 415 | } 416 | ], 417 | "source": [ 418 | "# Print target incidence proportions, rounding output to 3 decimal places\n", 419 | "transfusion.target.value_counts(normalize=True).round(3)" 420 | ] 421 | }, 422 | { 423 | "cell_type": "markdown", 424 | "metadata": { 425 | "dc": { 426 | "key": "38" 427 | }, 428 | "deletable": false, 429 | "editable": false, 430 | "run_control": { 431 | "frozen": true 432 | }, 433 | "tags": [ 434 | "context" 435 | ] 436 | }, 437 | "source": [ 438 | "## 6. Splitting transfusion into train and test datasets\n", 439 | "We'll now use train_test_split()
method to split transfusion
DataFrame.
Target incidence informed us that in our dataset 0
s appear 76% of the time. We want to keep the same structure in train and test datasets, i.e., both datasets must have 0 target incidence of 76%. This is very easy to do using the train_test_split()
method from the scikit learn
library - all we need to do is specify the stratify
parameter. In our case, we'll stratify on the target
column.
\n", 476 | " | Recency (months) | \n", 477 | "Frequency (times) | \n", 478 | "Monetary (c.c. blood) | \n", 479 | "Time (months) | \n", 480 | "
---|---|---|---|---|
334 | \n", 485 | "16 | \n", 486 | "2 | \n", 487 | "500 | \n", 488 | "16 | \n", 489 | "
99 | \n", 492 | "5 | \n", 493 | "7 | \n", 494 | "1750 | \n", 495 | "26 | \n", 496 | "
TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
\n", 549 | "TPOT will automatically explore hundreds of possible pipelines to find the best one for our dataset. Note, the outcome of this search will be a scikit-learn pipeline, meaning it will include any pre-processing steps as well as the model.
\n", 551 | "We are using TPOT to help us zero in on one model that we can then explore and optimize further.
" 552 | ] 553 | }, 554 | { 555 | "cell_type": "code", 556 | "execution_count": 165, 557 | "metadata": { 558 | "dc": { 559 | "key": "45" 560 | }, 561 | "tags": [ 562 | "sample_code" 563 | ] 564 | }, 565 | "outputs": [ 566 | { 567 | "data": { 568 | "application/vnd.jupyter.widget-view+json": { 569 | "model_id": "85bd7069fff74a75a8c36926bb8b9586", 570 | "version_major": 2, 571 | "version_minor": 0 572 | }, 573 | "text/plain": [ 574 | "HBox(children=(IntProgress(value=0, description='Optimization Progress', max=120, style=ProgressStyle(descript…" 575 | ] 576 | }, 577 | "metadata": {}, 578 | "output_type": "display_data" 579 | }, 580 | { 581 | "name": "stdout", 582 | "output_type": "stream", 583 | "text": [ 584 | "Generation 1 - Current best internal CV score: 0.7433977184592779\n", 585 | "Generation 2 - Current best internal CV score: 0.7433977184592779\n", 586 | "Generation 3 - Current best internal CV score: 0.7433977184592779\n", 587 | "Generation 4 - Current best internal CV score: 0.7433977184592779\n", 588 | "Generation 5 - Current best internal CV score: 0.7433977184592779\n", 589 | "\n", 590 | "Best pipeline: LogisticRegression(input_matrix, C=0.5, dual=False, penalty=l2)\n", 591 | "\n", 592 | "AUC score: 0.7850\n", 593 | "\n", 594 | "Best pipeline steps:\n", 595 | "1. LogisticRegression(C=0.5, class_weight=None, dual=False, fit_intercept=True,\n", 596 | " intercept_scaling=1, max_iter=100, multi_class='warn',\n", 597 | " n_jobs=None, penalty='l2', random_state=None, solver='warn',\n", 598 | " tol=0.0001, verbose=0, warm_start=False)\n" 599 | ] 600 | } 601 | ], 602 | "source": [ 603 | "# Import TPOTClassifier and roc_auc_score\n", 604 | "from tpot import TPOTClassifier\n", 605 | "from sklearn.metrics import roc_auc_score\n", 606 | "\n", 607 | "# Instantiate TPOTClassifier\n", 608 | "tpot = TPOTClassifier(\n", 609 | " generations=5,\n", 610 | " population_size=20,\n", 611 | " verbosity=2,\n", 612 | " scoring='roc_auc',\n", 613 | " random_state=42,\n", 614 | " disable_update_check=True,\n", 615 | " config_dict='TPOT light'\n", 616 | ")\n", 617 | "tpot.fit(X_train, y_train)\n", 618 | "\n", 619 | "# AUC score for tpot model\n", 620 | "tpot_auc_score = roc_auc_score(y_test, tpot.predict_proba(X_test)[:, 1])\n", 621 | "print(f'\\nAUC score: {tpot_auc_score:.4f}')\n", 622 | "\n", 623 | "# Print best pipeline steps\n", 624 | "print('\\nBest pipeline steps:', end='\\n')\n", 625 | "for idx, (name, transform) in enumerate(tpot.fitted_pipeline_.steps, start=1):\n", 626 | " # Print idx and transform\n", 627 | " print(f'{idx}. {transform}')" 628 | ] 629 | }, 630 | { 631 | "cell_type": "markdown", 632 | "metadata": { 633 | "dc": { 634 | "key": "52" 635 | }, 636 | "deletable": false, 637 | "editable": false, 638 | "run_control": { 639 | "frozen": true 640 | }, 641 | "tags": [ 642 | "context" 643 | ] 644 | }, 645 | "source": [ 646 | "## 8. Checking the variance\n", 647 | "TPOT picked LogisticRegression
as the best model for our dataset with no pre-processing steps, giving us the AUC score of 0.7850. This is a great starting point. Let's see if we can make it better.
One of the assumptions for linear regression models is that the data and the features we are giving it are related in a linear fashion, or can be measured with a linear distance metric. If a feature in our dataset has a high variance that's an order of magnitude or more greater than the other features, this could impact the model's ability to learn from other features in the dataset.
\n", 649 | "Correcting for high variance is called normalization. It is one of the possible transformations you do before training a model. Let's check the variance to see if such transformation is needed.
" 650 | ] 651 | }, 652 | { 653 | "cell_type": "code", 654 | "execution_count": 167, 655 | "metadata": { 656 | "dc": { 657 | "key": "52" 658 | }, 659 | "tags": [ 660 | "sample_code" 661 | ] 662 | }, 663 | "outputs": [ 664 | { 665 | "data": { 666 | "text/plain": [ 667 | "Recency (months) 66.929\n", 668 | "Frequency (times) 33.830\n", 669 | "Monetary (c.c. blood) 2114363.700\n", 670 | "Time (months) 611.147\n", 671 | "dtype: float64" 672 | ] 673 | }, 674 | "execution_count": 167, 675 | "metadata": {}, 676 | "output_type": "execute_result" 677 | } 678 | ], 679 | "source": [ 680 | "# X_train's variance, rounding the output to 3 decimal places\n", 681 | "X_train.var().round(3)" 682 | ] 683 | }, 684 | { 685 | "cell_type": "markdown", 686 | "metadata": { 687 | "dc": { 688 | "key": "59" 689 | }, 690 | "deletable": false, 691 | "editable": false, 692 | "run_control": { 693 | "frozen": true 694 | }, 695 | "tags": [ 696 | "context" 697 | ] 698 | }, 699 | "source": [ 700 | "## 9. Log normalization\n", 701 | "Monetary (c.c. blood)
's variance is very high in comparison to any other column in the dataset. This means that, unless accounted for, this feature may get more weight by the model (i.e., be seen as more important) than any other feature.
One way to correct for high variance is to use log normalization.
" 703 | ] 704 | }, 705 | { 706 | "cell_type": "code", 707 | "execution_count": 169, 708 | "metadata": { 709 | "dc": { 710 | "key": "59" 711 | }, 712 | "tags": [ 713 | "sample_code" 714 | ] 715 | }, 716 | "outputs": [ 717 | { 718 | "data": { 719 | "text/plain": [ 720 | "Recency (months) 66.929\n", 721 | "Frequency (times) 33.830\n", 722 | "Time (months) 611.147\n", 723 | "monetary_log 0.837\n", 724 | "dtype: float64" 725 | ] 726 | }, 727 | "execution_count": 169, 728 | "metadata": {}, 729 | "output_type": "execute_result" 730 | } 731 | ], 732 | "source": [ 733 | "# Import numpy\n", 734 | "import numpy as np\n", 735 | "\n", 736 | "# Copy X_train and X_test into X_train_normed and X_test_normed\n", 737 | "X_train_normed, X_test_normed = X_train.copy(), X_test.copy()\n", 738 | "\n", 739 | "# Specify which column to normalize\n", 740 | "col_to_normalize = 'Monetary (c.c. blood)'\n", 741 | "\n", 742 | "# Log normalization\n", 743 | "for df_ in [X_train_normed, X_test_normed]:\n", 744 | " # Add log normalized column\n", 745 | " df_['monetary_log'] = np.log(df_[col_to_normalize])\n", 746 | " # Drop the original column\n", 747 | " df_.drop(columns=col_to_normalize, inplace=True)\n", 748 | "\n", 749 | "# Check the variance for X_train_normed\n", 750 | "X_train_normed.var().round(3)" 751 | ] 752 | }, 753 | { 754 | "cell_type": "markdown", 755 | "metadata": { 756 | "dc": { 757 | "key": "66" 758 | }, 759 | "deletable": false, 760 | "editable": false, 761 | "run_control": { 762 | "frozen": true 763 | }, 764 | "tags": [ 765 | "context" 766 | ] 767 | }, 768 | "source": [ 769 | "## 10. Training the linear regression model\n", 770 | "The variance looks much better now. Notice that now Time (months)
has the largest variance, but it's not the orders of magnitude higher than the rest of the variables, so we'll leave it as is.
We are now ready to train the linear regression model.
" 772 | ] 773 | }, 774 | { 775 | "cell_type": "code", 776 | "execution_count": 171, 777 | "metadata": { 778 | "dc": { 779 | "key": "66" 780 | }, 781 | "tags": [ 782 | "sample_code" 783 | ] 784 | }, 785 | "outputs": [ 786 | { 787 | "name": "stdout", 788 | "output_type": "stream", 789 | "text": [ 790 | "\n", 791 | "AUC score: 0.7891\n" 792 | ] 793 | } 794 | ], 795 | "source": [ 796 | "# Importing modules\n", 797 | "from sklearn import linear_model\n", 798 | "\n", 799 | "# Instantiate LogisticRegression\n", 800 | "logreg = linear_model.LogisticRegression(\n", 801 | " solver='liblinear',\n", 802 | " random_state=42\n", 803 | ")\n", 804 | "\n", 805 | "# Train the model\n", 806 | "logreg.fit(X_train_normed, y_train)\n", 807 | "\n", 808 | "# AUC score for tpot model\n", 809 | "logreg_auc_score = roc_auc_score(y_test, logreg.predict_proba(X_test_normed)[:, 1])\n", 810 | "print(f'\\nAUC score: {logreg_auc_score:.4f}')" 811 | ] 812 | }, 813 | { 814 | "cell_type": "markdown", 815 | "metadata": { 816 | "dc": { 817 | "key": "73" 818 | }, 819 | "deletable": false, 820 | "editable": false, 821 | "run_control": { 822 | "frozen": true 823 | }, 824 | "tags": [ 825 | "context" 826 | ] 827 | }, 828 | "source": [ 829 | "## 11. Conclusion\n", 830 | "The demand for blood fluctuates throughout the year. As one prominent example, blood donations slow down during busy holiday seasons. An accurate forecast for the future supply of blood allows for an appropriate action to be taken ahead of time and therefore saving more lives.
\n", 831 | "In this notebook, we explored automatic model selection using TPOT and AUC score we got was 0.7850. This is better than simply choosing 0
all the time (the target incidence suggests that such a model would have 76% success rate). We then log normalized our training data and improved the AUC score by 0.5%. In the field of machine learning, even small improvements in accuracy can be important, depending on the purpose.
Another benefit of using logistic regression model is that it is interpretable. We can analyze how much of the variance in the response variable (target
) can be explained by other variables in our dataset.
Grey and Gray. Colour and Color. Words like these have been the cause of many heated arguments between Brits and Americans. Accents (and jokes) aside, there are many words that are pronounced the same way but have different spellings. While it is easy for us to realize their equivalence, basic programming commands will fail to equate such two strings.
\nMore extreme than word spellings are names because people have more flexibility in choosing to spell a name in a certain way. To some extent, tradition sometimes governs the way a name is spelled, which limits the number of variations of any given English name. But if we consider global names and their associated English spellings, you can only imagine how many ways they can be spelled out.
\nOne way to tackle this challenge is to write a program that checks if two strings sound the same, instead of checking for equivalence in spellings. We'll do that here using fuzzy name matching.
"},{"outputs":[{"text":"TANAR\nTANAR\n","output_type":"stream","name":"stdout"}],"metadata":{"dc":{"key":"3"},"trusted":true,"tags":["sample_code"]},"execution_count":95,"cell_type":"code","source":"import fuzzy\n\n# Exploring the output of fuzzy.nysiis\nprint(fuzzy.nysiis('tomorrow'))\n\n# Testing equivalence of similar sounding words (misspelled word)\nprint(fuzzy.nysiis('tommorow'))"},{"metadata":{"run_control":{"frozen":true},"dc":{"key":"10"},"deletable":false,"editable":false,"tags":["context"]},"cell_type":"markdown","source":"## 2. Authoring the authors\nThe New York Times puts out a weekly list of best-selling books from different genres, and which has been published since the 1930’s. We’ll focus on Children’s Picture Books, and analyze the gender distribution of authors to see if there have been changes over time. We'll begin by reading in the data on the best selling authors from 2008 to 2017.
"},{"outputs":[{"data":{"text/html":"\n | Year | \nBook Title | \nAuthor | \nBesteller this year | \nfirst_name | \n
---|---|---|---|---|---|
0 | \n2017 | \nDRAGONS LOVE TACOS | \nAdam Rubin | \n49 | \nAdam | \n
1 | \n2017 | \nTHE WONDERFUL THINGS YOU WILL BE | \nEmily Winfield Martin | \n48 | \nEmily | \n
2 | \n2017 | \nTHE DAY THE CRAYONS QUIT | \nDrew Daywalt | \n44 | \nDrew | \n
3 | \n2017 | \nROSIE REVERE, ENGINEER | \nAndrea Beaty | \n38 | \nAndrea | \n
4 | \n2017 | \nADA TWIST, SCIENTIST | \nAndrea Beaty | \n28 | \nAndrea | \n
When we were young children, we were taught to read using phonics; sounding out the letters that compose words. So let's relive history and do that again, but using python this time. We will now create a new column or list that contains the phonetic equivalent of every first name that we just extracted.
\nTo make sure we're on the right track, let's compare the number of unique values in the first_name
column and the number of unique values in the nysiis coded column. As a rule of thumb, the number of unique nysiis first names should be less than or equal to the number of actual first names.
We'll use babynames_nysiis.csv
, a dataset that is derived from the Social Security Administration’s baby name data, to identify author genders. The dataset contains unique NYSIIS versions of baby names, and also includes the percentage of times the name appeared as a female name (perc_female
) and the percentage of times it appeared as a male name (perc_male
).
We'll use this data to create a list of gender
. Let's make the following simplifying assumption: For each name, if perc_female
is greater than perc_male
then assume the name is female, if perc_female
is less than perc_male
then assume it is a male name, and if the percentages are equal then it's a \"neutral\" name.
\n | babynysiis | \nperc_female | \nperc_male | \ngender | \n
---|---|---|---|---|
0 | \nNaN | \n62.50 | \n37.50 | \nF | \n
1 | \nRAX | \n63.64 | \n36.36 | \nF | \n
2 | \nESAR | \n44.44 | \n55.56 | \nM | \n
3 | \nDJANG | \n0.00 | \n100.00 | \nM | \n
4 | \nPARCAL | \n25.00 | \n75.00 | \nM | \n
5 | \nVALCARY | \n100.00 | \n0.00 | \nF | \n
6 | \nFRANCASC | \n63.64 | \n36.36 | \nF | \n
7 | \nCABAT | \n50.00 | \n50.00 | \nN | \n
8 | \nXANDAR | \n16.67 | \n83.33 | \nM | \n
9 | \nRACSAN | \n33.33 | \n66.67 | \nM | \n
Now that we have identified the likely genders of different names, let's find author genders by searching for each author's name in the babies_df
DataFrame, and extracting the associated gender.
From the results above see that there are more female authors on the New York Times best seller's list than male authors. Our dataset spans 2008 to 2017. Let's find out if there have been changes over time.
"},{"outputs":[{"text":"[15, 45, 48, 51, 46, 51, 34, 30, 32, 43]\n[8, 19, 27, 21, 21, 11, 21, 18, 25, 20]\n[1, 0, 1, 1, 2, 1, 1, 0, 1, 0]\n","output_type":"stream","name":"stdout"}],"metadata":{"dc":{"key":"38"},"trusted":true,"tags":["sample_code"]},"execution_count":105,"cell_type":"code","source":"# Creating a list of unique years, sorted in ascending order.\nyears = np.unique(author_df['Year'])\n\n# Initializing lists\nmales_by_yr = []\nfemales_by_yr = []\nunknown_by_yr = []\n\n# Looping through years to find the number of male, female and unknown authors per year\nfor year in years:\n females_by_yr.append(len(author_df[(author_df['author_gender']=='F') & (author_df['Year']==year)]))\n males_by_yr.append(len(author_df[(author_df['author_gender']=='M') & (author_df['Year']==year)]))\n unknown_by_yr.append(len(author_df[(author_df['author_gender']=='N') & (author_df['Year']==year)]))\n\n# Printing out yearly values to examine changes over time\nprint(females_by_yr)\nprint(males_by_yr)\nprint(unknown_by_yr)"},{"metadata":{"run_control":{"frozen":true},"dc":{"key":"45"},"deletable":false,"editable":false,"tags":["context"]},"cell_type":"markdown","source":"## 7. Foreign-born authors?\nOur gender data comes from social security applications of individuals born in the US. Hence, one possible explanation for why there are \"unknown\" genders associated with some author names is because these authors were foreign-born. While making this assumption, we should note that these are only a subset of foreign-born authors as others will have names that have a match in baby_df
(and in the social security dataset).
Using a bar chart, let's explore the trend of foreign-born authors with no name matches in the social security dataset.
"},{"outputs":[{"data":{"text/plain":"Text(0.5,0,'year')"},"metadata":{},"execution_count":107,"output_type":"execute_result"},{"data":{"image/png":"iVBORw0KGgoAAAANSUhEUgAAAYwAAAEWCAYAAAB1xKBvAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4xLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvAOZPmwAAG7xJREFUeJzt3XuUHWWd7vHvY7ipMBBMRMiFoKKCR7nYA14HvGFAJXocl2FEUWFldASv4xwYZwHieAQ93mXEHIyIDqCDohlFkSMwjCKaBAEFBEIMJhFMJNxBMPCcP+pt3TTd6beTXb130s9nrVq9663L/r3dnTxdb9Wukm0iIiJG85heFxAREZuGBEZERFRJYERERJUERkREVElgRERElQRGRERUSWDEuJP0dElXSrpb0rta2P89kp7c7f2WfVvSU9vY93iTdImko3pdR2w6tuh1ATEh/RNwse2929i57W3b2O+mTNKJwFNtH97rWmLTlSOM6IVdgWs2ZENJm8UfOZtLP2Dz6kusXwIjxpWki4AXA58vQ0dPk7S9pDMlrZF0s6R/kfSYsv5bJP1E0qck3QacWNrfJuk6SbdLukDSrh3v8edhI0lPkPSfku6StEjSv0r68ZB13y7pRkl3SDpVkkbpxiGSlkn6g6SPd9T6mFL7zZJWlz5tX5bNKu91pKTfAhd1tB0h6bdlfx9cz/fulZJ+Ufqyohw1DC47UNLKIesvl/QySbOBfwbeUL7nV3Wstmv5/t4t6YeSpnRsf6ika8r35RJJewzZ9/+SdDVwr6Qtyvyqsq/rJb10lO9jbGpsZ8o0rhNwCXBUx/yZwHeA7YBZwA3AkWXZW4B1wDE0Q6iPBeYAS4E9Stu/AJd17M80wy8A55TpccCewArgx0PW/S6wAzATWAPMXk/tBi4Gdizr3zDYF+Btpa4nA9sC3wK+WpbNKtueCTy+9GOw7f+W+b2AB4A9RnjvA4Fn0fyh92zg98BrOpatHLL+cuBl5fWJwNeG+TncBDytvP8lwMll2dOAe4GXA1vSDCMuBbbq2PeVwIyy7dPL93aXjv4+pde/a5m6O+UII3pK0iRgLnCc7bttLwc+AbypY7Xf2f6c7XW27wfeDnzU9nW21wH/G9i78yijY9+vA06wfZ/ta4GvDFPGybbvsP1bmjAY7dzKKbbXlvU/DRxW2t8IfNL2Mtv3AMcBc4cM2Zxo+97Sj0Efsn2/7auAq2iC41FsX2L7l7Yftn01cDZwwCi1jubLtm8o9XyDv/T9DcD3bF9o+0/A/6EJhud3bPtZ2yvKtg8BWwN7StrS9nLbN21kbdFnEhjRa1No/oK9uaPtZmBax/yKIdvsCnymDJXcAawFNGQbgKk0RyCd2w/dF8CtHa/vozk6oAzH3FOmF42wj5uBXcrrXYbpxxbAThvy/kNJ2l/SxWXo7k6a4Jwy3LpjMNJ7P6Ivth+mqX3Yn4vtpcB7aI5kVks6R9IuxGYlgRG99gfgTzQhMGgmsKpjfugtlVcAf297h47psbYvG7LeGprhrOkdbTNqC7P9TNvblum/R9jHTOB35fXvhunHOpqho5H6MhZnAQuBGba3B06jCUpoho8eN7hiObqauhHv+4i+lPM6M1jPz8X2WbZfWLYzcMoY3zP6XAIjesr2QzRDIR+RtF0ZVnof8LX1bHYacJykZwKUk+avH2Hf3wJOlPQ4Sc8A3tyFsj8gabKkGcC7ga+X9rOB90raTdK2NENlXy/DZt2wHbDW9h8l7Qf8XceyG4BtyonxLWnO62zdsfz3wKzBE/QVvgG8UtJLy/7eT3N+ZWgoA3/+bM1LJG0N/BG4H3h4LJ2L/pfAiH5wDM1fyMuAH9P8Jb1gpJVtn0fz1+s5ku4CfgUcPMLqRwPb0wy9fJXmP/UHNrLe7wBLaE76fg/4UmlfUN7jUuA3NP9xHrOR79XpH4CTJN0NHE/znzoAtu8sy0+nOQq4F+i8auo/ytfbJF0x2hvZvh44HPgczVHgq4FX235whE22Bk4u694KPJHmHE5sRmTnAUoxcUg6BXiS7SN6XUvEpiZHGLFZk/QMSc9WYz/gSOC8XtcVsSnKJzRjc7cdzTDULjTj+J+gGVKKiDHKkFRERFTJkFRERFTZrIakpkyZ4lmzZvW6jIiITcaSJUv+YHvq6GtuZoExa9YsFi9e3OsyIiI2GZJuHn2tRoakIiKiSgIjIiKqJDAiIqJKAiMiIqokMCIiokoCIyIiqrQWGJJmlIe9XFseRPPuYdaRpM9KWirpakn7diw7ojxn+UZJuVFcRESPtfk5jHXA+21fIWk7YImkC8tjMgcdDOxepv2BLwD7S9oROAEYoHkQyxJJC23f3mK9ERGxHq0dYdi+xfYV5fXdwHU8+hGac4Az3bgc2EHSzsArgAvLc5NvBy4EZrdVa0REjG5cPuktaRawD/CzIYum8chnHK8sbSO1D7fvecA8gJkzZ3al3tj8zTr2e62/x/KTX9n6e0SMp9ZPepdHVX4TeI/tu7q9f9vzbQ/YHpg6tep2KBERsQFaDYzyLOBvAv9u+1vDrLKK5sHyg6aXtpHaIyKiR9q8Sko0zzq+zvYnR1htIfDmcrXUc4E7bd8CXAAcJGmypMnAQaUtIiJ6pM1zGC8A3gT8UtKVpe2fgZkAtk8DzgcOAZYC9wFvLcvWSvowsKhsd5LttS3WGhERo2gtMGz/GNAo6xh45wjLFgALWigtIiI2QD7pHRERVRIYERFRJYERERFVEhgREVElgREREVUSGBERUSWBERERVRIYERFRJYERERFVEhgREVElgREREVUSGBERUSWBERERVRIYERFRJYERERFVEhgREVGltQcoSVoAvApYbft/DLP8A8AbO+rYA5hanra3HLgbeAhYZ3ugrTojIqJOm0cYZwCzR1po++O297a9N3Ac8F9DHsP64rI8YRER0QdaCwzblwK1z+E+DDi7rVoiImLj9fwchqTH0RyJfLOj2cAPJS2RNK83lUVERKfWzmGMwauBnwwZjnqh7VWSnghcKOnX5YjlUUqgzAOYOXNm+9VGRExQPT/CAOYyZDjK9qrydTVwHrDfSBvbnm97wPbA1KlTWy00ImIi62lgSNoeOAD4Tkfb4yVtN/gaOAj4VW8qjIiIQW1eVns2cCAwRdJK4ARgSwDbp5XVXgv80Pa9HZvuBJwnabC+s2z/oK06IyKiTmuBYfuwinXOoLn8trNtGbBXO1VFRMSG6odzGBERsQlIYERERJUERkREVElgRERElQRGRERUSWBERESVBEZERFRJYERERJUERkREVElgRERElQRGRERUSWBERESVBEZERFRJYERERJUERkREVElgRERElQRGRERUaS0wJC2QtFrSsM/jlnSgpDslXVmm4z
uWzZZ0vaSlko5tq8aIiKjX5hHGGcDsUdb5b9t7l+kkAEmTgFOBg4E9gcMk7dlinRERUaG1wLB9KbB2AzbdD1hqe5ntB4FzgDldLS4iIsas1+cwnifpKknfl/TM0jYNWNGxzsrSNixJ8yQtlrR4zZo1bdYaETGh9TIwrgB2tb0X8Dng2xuyE9vzbQ/YHpg6dWpXC4yIiL/oWWDYvsv2PeX1+cCWkqYAq4AZHatOL20REdFDPQsMSU+SpPJ6v1LLbcAiYHdJu0naCpgLLOxVnRER0diirR1LOhs4EJgiaSVwArAlgO3TgL8F3iFpHXA/MNe2gXWSjgYuACYBC2xf01adERFRp7XAsH3YKMs/D3x+hGXnA+e3UVdERGyYXl8lFRERm4gERkREVElgRERElQRGRERUSWBERESVBEZERFRJYERERJUERkREVElgRERElQRGRERUSWBERESVBEZERFRJYERERJUERkREVElgRERElQRGRERUSWBERESV1gJD0gJJqyX9aoTlb5R0taRfSrpM0l4dy5aX9islLW6rxoiIqNfmEcYZwOz1LP8NcIDtZwEfBuYPWf5i23vbHmipvoiIGIM2n+l9qaRZ61l+Wcfs5cD0tmqJiIiN1y/nMI4Evt8xb+CHkpZImre+DSXNk7RY0uI1a9a0WmRExETW2hFGLUkvpgmMF3Y0v9D2KklPBC6U9Gvblw63ve35lOGsgYEBt15wRMQE1dMjDEnPBk4H5ti+bbDd9qrydTVwHrBfbyqMiIhBVYEh6dWSuhoukmYC3wLeZPuGjvbHS9pu8DVwEDDslVYRETF+aoek3gB8WtI3gQW2fz3aBpLOBg4EpkhaCZwAbAlg+zTgeOAJwL9JAlhXrojaCTivtG0BnGX7B2PpVEREdF9VYNg+XNJfAYcBZ0gy8GXgbNt3j7DNYaPs8yjgqGHalwF7PXqLiIjopephJtt3AecC5wA7A68FrpB0TEu1RUREH6k9hzFH0nnAJTTDSvvZPpjmSOD97ZUXERH9ovYcxv8EPjX00lbb90k6svtlRUREv6kdkrp1aFhIOgXA9o+6XlVERPSd2sB4+TBtB3ezkIiI6G/rHZKS9A7gH4CnSLq6Y9F2wE/aLCwiIvrLaOcwzqK5x9NHgWM72u+2vba1qiIiou+MFhi2vVzSO4cukLRjQiMiYuKoOcJ4FbCE5g6y6lhm4Mkt1RUREX1mvYFh+1Xl627jU05ERPSr2g/uPerS2eHaIiJi8zXaVVLbAI+juYHgZP4yJPVXwLSWa4uIiD4y2jmMvwfeA+xCcx5jMDDuAj7fYl0REdFnRjuH8RngM5KOsf25caopIiL6UO3tzT8n6fnArM5tbJ/ZUl0REdFnqgJD0leBpwBXAg+VZgMJjIiICaL2brUDwJ623WYxERHRv2pvPvgr4Elj3bmkBZJWSxr2mdxqfFbSUklXS9q3Y9kRkm4s0xFjfe+IiOiu2iOMKcC1kn4OPDDYaPvQUbY7g+ZqqpGGrg4Gdi/T/sAXgP0l7UjzDPABmqGvJZIW2r69st6IiOiy2sA4cUN2bvtSSbPWs8oc4Mwy1HW5pB0k7QwcCFw4eK8qSRcCs4GzN6SOiIjYeLVXSf1XS+8/DVjRMb+ytI3U/iiS5gHzAGbOnLnBhcw69nsbvG2t5Se/svX3GKte9jvf8/bke/5IE7Xf3VZ7a5DnSlok6R5JD0p6SNJdbRdXw/Z82wO2B6ZOndrrciIiNlu1J70/DxwG3Ag8FjgKOLUL778KmNExP720jdQeERE9UhsY2F4KTLL9kO0v05xT2FgLgTeXq6WeC9xp+xbgAuAgSZPLPawOKm0REdEjtSe975O0FXClpI8Bt1ARNpLOpjmBPUXSSporn7YEsH0acD5wCLAUuA94a1m2VtKHgUVlVyflYU0REb1VGxhvAiYBRwPvpRkuet1oG9k+bJTlBh71NL+ybAGwoLK+iIhoWe1VUjeXl/cDH2qvnIiI6Fe195L6Dc0H6B7Bdh7RGhExQYzlXlKDtgFeD+zY/XIiIqJfVV0lZfu2jmmV7U8Dm/+nVCIi4s9qh6T27Zh9DM0RR+3RSUREbAZq/9P/BH85h7EOWE4zLBURERNEbWB8lyYwBp/pbeBVUjNr+5PdLy0iIvpJbWA8B/hr4Ds0ofFq4Oc0twqJiIgJoDYwpgP72r4bQNKJwPdsH95WYRER0V9q7yW1E/Bgx/yDpS0iIiaI2iOMM4GfSzqvzL+G5ml6ERExQdTeGuQjkr4PvKg0vdX2L9orKyIi+k31ZylsXwFc0WItERHRx6qfhxERERNbAiMiIqokMCIiokoCIyIiqrQaGJJmS7pe0lJJxw6z/FOSrizTDZLu6Fj2UMeyhW3WGRERo2vtjrOSJgGnAi8HVgKLJC20fe3gOrbf27H+McA+Hbu43/bebdUXERFj0+YRxn7AUtvLbD8InAPMWc/6hwFnt1hPRERshDYDYxqwomN+ZWl7FEm7ArsBF3U0byNpsaTLJb1mpDeRNK+st3jNmjXdqDsiIobRLye95wLn2n6oo21X2wPA3wGflvSU4Ta0Pd/2gO2BqVOnjketERETUpuBsQqY0TE/vbQNZy5DhqNsrypflwGX8MjzGxERMc7aDIxFwO6SdpO0FU0oPOpqJ0nPACYDP+1omyxp6/J6CvAC4Nqh20ZExPhp7Sop2+skHQ1cAEwCFti+RtJJwGLbg+ExFzjHtjs23wP4oqSHaULt5M6rqyIiYvy1FhgAts8Hzh/SdvyQ+ROH2e4y4Flt1hYREWPTLye9IyKizyUwIiKiSgIjIiKqJDAiIqJKAiMiIqokMCIiokoCIyIiqiQwIiKiSgIjIiKqJDAiIqJKAiMiIqokMCIiokoCIyIiqiQwIiKiSgIjIiKqJDAiIqJKAiMiIqq0GhiSZku6XtJSSccOs/wtktZIurJMR3UsO0LSjWU6os06IyJidK09olXSJOBU4OXASmCRpIXDPJv767aPHrLtjsAJwABgYEnZ9va26o2IiPVr8whjP2Cp7WW2HwTOAeZUbvsK4ELba0tIXAjMbqnOiIio0GZgTANWdMyvLG1DvU7S1ZLOlTRjjNsiaZ6kxZIWr1mzpht1R0TEMHp90vs/gVm2n01zFPGVse7A9nzbA7YHpk6d2vUCIyKi0WZgrAJmdMxPL21/Zvs22w+U2dOB59RuGxER46vNwFgE7C5pN0lbAXOBhZ0rSNq5Y/ZQ4Lry+gLgIEmTJU0GDiptERHRI61dJWV7naSjaf6jnwQssH2NpJOAxbYXAu+SdCiwDlgLvKVsu1bSh2lCB+Ak22vbqjUiIkbXWmAA2D4fOH9I2/Edr48Djhth2wXAgjbri4iIer0+6R0REZuIBEZERFRJYERERJUERkREVElgRERElQRGRERUSWBERESVBEZERFRJYERERJUERkREVElgRERElQRGRERUSWBERESVBEZERFRJYERERJUERkREVElgRERElVYDQ9JsSddLWirp2GGWv0/StZKulvQjSbt2LHtI0pVlWjh024iIGF+tPaJV0
iTgVODlwEpgkaSFtq/tWO0XwIDt+yS9A/gY8Iay7H7be7dVX0REjE2bRxj7AUttL7P9IHAOMKdzBdsX276vzF4OTG+xnoiI2AhtBsY0YEXH/MrSNpIjge93zG8jabGkyyW9ZqSNJM0r6y1es2bNxlUcEREjam1IaiwkHQ4MAAd0NO9qe5WkJwMXSfql7ZuGbmt7PjAfYGBgwONScETEBNTmEcYqYEbH/PTS9giSXgZ8EDjU9gOD7bZXla/LgEuAfVqsNSIiRtFmYCwCdpe0m6StgLnAI652krQP8EWasFjd0T5Z0tbl9RTgBUDnyfKIiBhnrQ1J2V4n6WjgAmASsMD2NZJOAhbbXgh8HNgW+A9JAL+1fSiwB/BFSQ/ThNrJQ66uioiIcdbqOQzb5wPnD2k7vuP1y0bY7jLgWW3WFhERY5NPekdERJUERkREVElgRERElQRGRERUSWBERESVBEZERFRJYERERJUERkREVElgRERElQRGRERUSWBERESVBEZERFRJYERERJUERkREVElgRERElQRGRERUSWBERESVVgND0mxJ10taKunYYZZvLenrZfnPJM3qWHZcab9e0ivarDMiIkbXWmBImgScChwM7AkcJmnPIasdCdxu+6nAp4BTyrZ7AnOBZwKzgX8r+4uIiB5p8whjP2Cp7WW2HwTOAeYMWWcO8JXy+lzgpZJU2s+x/YDt3wBLy/4iIqJHtmhx39OAFR3zK4H9R1rH9jpJdwJPKO2XD9l22nBvImkeMK/M3iPp+o0vvcoU4A9j2UCntFTJ+Nqk+t3F956o/YYx9n0z+T2HidPvXWtXbDMwxoXt+cD88X5fSYttD4z3+/Za+j3xTNS+T9R+r0+bQ1KrgBkd89NL27DrSNoC2B64rXLbiIgYR20GxiJgd0m7SdqK5iT2wiHrLASOKK//FrjItkv73HIV1W7A7sDPW6w1IiJG0dqQVDkncTRwATAJWGD7GkknAYttLwS+BHxV0lJgLU2oUNb7BnAtsA54p+2H2qp1A437MFifSL8nnona94na7xGp+YM+IiJi/fJJ74iIqJLAiIiIKgmMQtIMSRdLulbSNZLeXdp3lHShpBvL18mlXZI+W25fcrWkfTv29bGyj+vKOupVv0azAf1+hqSfSnpA0j8O2dd6bwXTb7rV95H206+6+TMvyydJ+oWk7453X8aiy7/rO0g6V9Kvy7/z5/WiT+POdqbmPM7OwL7l9XbADTS3NPkYcGxpPxY4pbw+BPg+IOC5wM9K+/OBn9Cc6J8E/BQ4sNf962K/nwj8NfAR4B879jMJuAl4MrAVcBWwZ6/7N059H3Y/ve5f2/3u2N/7gLOA7/a6b+PVb5o7VBxVXm8F7NDr/o3HlCOMwvYttq8or+8GrqP5dHnn7Uu+ArymvJ4DnOnG5cAOknYGDGxD80u0NbAl8Ptx68gYjbXftlfbXgT8aciuam4F01e61ff17KcvdfFnjqTpwCuB08eh9I3SrX5L2h74G5qrPLH9oO07xqUTPZbAGIaau+buA/wM2Mn2LWXRrcBO5fVwtz6ZZvunwMXALWW6wPZ141D2Rqvs90iG/X50ucTWbGTfR9pP3+tCvz8N/BPwcBv1tWUj+70bsAb4chmKO13S49uqtZ8kMIaQtC3wTeA9tu/qXObm+HO91yFLeiqwB82n06cBL5H0opbK7ZqN7femrFt9X99++lEXftdfBay2vaS9KruvCz/vLYB9gS/Y3ge4l2Yoa7OXwOggaUuaX6R/t/2t0vz7MtRE+bq6tI90+5LXApfbvsf2PTTnOfr6hNgY+z2STfJ2Ll3q+0j76Vtd6vcLgEMlLacZgnyJpK+1VHJXdKnfK4GVtgePIs+lCZDNXgKjKFcyfQm4zvYnOxZ13r7kCOA7He1vLldLPRe4sxzW/hY4QNIW5ZfzAJqx0r60Af0eSc2tYPpKt/q+nv30pW712/ZxtqfbnkXz877I9uEtlNwVXez3rcAKSU8vTS+luSvF5q/XZ937ZQJeSHMoejVwZZkOobnd+o+AG4H/B+xY1hfNA6JuAn4JDJT2ScAXaULiWuCTve5bl/v9JJq/sO4C7iiv/6osO4TmypObgA/2um/j1feR9tPr/o3Hz7xjnwfS/1dJdfN3fW9gcdnXt4HJve7feEy5NUhERFTJkFRERFRJYERERJUERkREVElgRERElQRGRERUSWBERESVBEZEH5E0qdc1RIwkgRGxgSSdJOk9HfMfkfRuSR+QtEjNc1I+1LH825KWlGcxzOtov0fSJyRdRZ/fRiYmtgRGxIZbALwZQNJjaG6PcSuwO83t3vcGniPpb8r6b7P9HGAAeJekJ5T2x9M8T2Uv2z8ezw5EjMUWvS4gYlNle7mk2yTtQ3NL7F/QPHDnoPIaYFuaALmUJiReW9pnlPbbgIdobogX0dcSGBEb53TgLTT3HVpAcyO6j9r+YudKkg4EXgY8z/Z9ki6hedAWwB9tPzReBUdsqAxJRWyc84DZNEcWF5TpbeWZC0iaJumJwPbA7SUsnkHzWN+ITUqOMCI2gu0HJV0M3FGOEn4oaQ/gp83dtLkHOBz4AfB2SdcB1wOX96rmiA2Vu9VGbIRysvsK4PW2b+x1PRFtypBUxAaStCewFPhRwiImghxhRERElRxhRERElQRGRERUSWBERESVBEZERFRJYERERJX/D/M7CLuliBgWAAAAAElFTkSuQmCC\n","text/plain":"What’s more exciting than a bar chart is a grouped bar chart. This type of chart is good for displaying changes over time while also comparing two or more groups. Let’s use a grouped bar chart to look at the distribution of male and female authors over time.
"},{"outputs":[{"data":{"text/plain":"Text(0.5,0,'year')"},"metadata":{},"execution_count":109,"output_type":"execute_result"},{"data":{"image/png":"iVBORw0KGgoAAAANSUhEUgAAAYIAAAEWCAYAAABrDZDcAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4xLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvAOZPmwAAF6BJREFUeJzt3Xu0ZGV9p/HnSzdIBBSQtkWatjE4GmYyiraI8QIRzSCiIFFjxksbMR1nEqOJl6CZmeCMjJe1NJLociSigHjBG8Iwo0ZRdKKgNIiotARQCCBNt0jLzUjA3/yx9wnl4VzqdNeuOof9fNaqdfZ9/95z6tS39rtr70pVIUnqrx0mXYAkabIMAknqOYNAknrOIJCknjMIJKnnDAJJ6jmDQP8qySlJ3rLY60jyliQ/SbJpzHUdn+T0ce5zNkkqyf4dbfvQJNd1sW0tTgaBlpQkq4HXAgdU1UMmXc84JDkvySsmXYfuuwwCLTWrgZuqavOkC7kvSLJ80jWMQpJlk65hKTMIlrgkVyd5fZJLk9ye5OQkK5N8LsmtSb6UZI+B5T+ZZFOSnyX5WpJ/O8e2j0xySZKtSb6R5N/PseyJSa5NckuSi5I8ZWDe8Uk+keS0tqbvJ1k7MP/AJBe3884Adp5lH08Hvgg8NMltSU5ppx/c1rc1yXeSHDqwznltV9I32nX+d5IHJflIW+uFSdYM044Z6pl1vzMse1ySq9o2XpbkudN+P6cPjK9pu36WJzkBeArwnrb+9wxs9ulJrmj3/94kadffIcl/SXJNks3t7/2B07Z9bJJ/Ar48R81varvgrk7yonba45PcOPjCm+SYJN+ZYf05l23rnPq93NQ+R/YcWHbW52qa7sP3Jfm/SW4Hfnu2dmgIVeVjCT+Aq4ELgJXAPsBm4GLgQJoX1C8DfzWw/MuB3YD7Ae8GLhmYdwrwlnb4wHZbTwCWAevafd1vljpeDDwIWE7TdbMJ2Lmddzzwz8AR7bbeClzQztsJuAb4M2BH4HnAv0zVMcN+DgWuGxjfB7ip3fYOwDPa8RXt/POAK4FfBx4IXAb8I/D0ttbTgA8toB2nD7PfGep+PvDQdtnfA24H9p6+3XZ8DVDA8oE2vGLa9go4B9id5ihpC3D4wN/4SuDhwK7AZ4APT9v2acAuwK/N8ju+C3hX+zw5pK33ke38y4BnDix/JvDaWdo967LAq2meu6va/bwf+NgCnqs/A57U/k53nvT/4lJ+TLwAH9v5B2xenF80MP5p4H0D468CPjvLuru3LwoPbMdP4Z4geB/wP6YtfzlwyJB13Qw8uh0+HvjSwLwDgJ+3w08FfgxkYP43GD4I/mLqRW5g2heAde3wecBfDsx7J/C5gfFnD77ADNGO04fZ7xC/n0uAo6Zvtx1fw3BB8OSB8U8Ax7XD5wL/eWDeI2nCdfnAth8+R22H0gTBLtO2/18H2v6RdnhP4A7aUJthW7MuC2wEDhtYdu+pOod8rp42jv+xPjzsGrpvuHFg+OczjO8KTT9qkre1h+K30IQIwF4zbPNhwGvbboetSbYC+9K8q72XJK9LsrE9jN9K8+57cLuDn/C5A9g5Tf/0Q4Hrq/3vbl0zT3un1/n8aXU+meZFZcpQv58h27GQ/f6rJC8d6GbbCvy7Wba7ENN/p1PteCi/+ju8hiYEVg5Mu3aebd9cVbdP28bU3/504NlJdgFeAPy/qrphlu3MtezDgDMHficbgbuBlUM+V+drg4Z0nzhRpKH9R+Aomm6Rq2le5G4GMsOy1wInVNUJ82207Ud/A3AY8P2q+mWS2bY73Q3APkkyEAargauGWHeqzg9X1R8OufysFtiOofeb5GHA37XbPb+q7k5yycB2bwfuP7DK9E9DLfQWwT+meZGdsprmHf6NNN0ww2xzjyS7DITBauB7AFV1fZLzgWOAl9AcPc5onmWvBV5eVV+fvl6SlzD/c9VbJ4+IRwT9shvwC5q+7PsD/3OOZf8OeGWSJ6SxS5JnJdltlu3eRdNPvTzJfwMeMGRN57fr/mmSHZMcAxw05LpwzzvO/9C+i9w5zefgV8275r0tpB0L2e8uNC9aWwCS/AHNEcGUS4CnJlndntR947T1b6Tp7x/Wx4A/S7Jfkl1p/s5nVNVdC9gGwJuT7NQG5JHAJwfmnUYTmr9Jcw5iLrMt+7+AE9qgJMmKJEe18xbyXNV2Mgj65TSaQ/zraU7iXTDbglW1AfhD4D0078SuBF42y+JfAD5PcxL2GpoTw0MdtlfVnTTvFl8G/JTmROp8LyyD619L887xTTQvtNcCr2fbnttDt2Mh+62qy2jOTZxP86L+m8DXB+Z/ETgDuBS4iOYk8KATgecluTnJ3wzRjg8CHwa+Bvyobcerhlhv0Caav/uPgY8Ar6yqHwzMP5O2a6eq7phnW7MteyJwNvD3SW6leT4+oZ039HNV2y+/2jUrScNJchXwR1X1pVEuq/HziEDSgiX5XZrurlmvQ9iWZTUZniyWtCBJzqP5CPBLquqXo1pWk2PXkCT1nF1DktRzS6JraK+99qo1a9ZMugxJWlIuuuiin1TVivmWWxJBsGbNGjZs2DDpMiRpSUky1FX6dg1JUs8ZBJLUcwaBJPWcQSBJPWcQSFLPGQSS1HMGgST1nEEgST1nEEhSzy2JK4t1H/bVea4YP2TteOqYlL63X4tCp0GQ5GrgVpovpL6rqtYm2ZPm25jW0HwX6Quq6uYu65AkzW4cXUO/XVWPqaqptzbHAedW1SOAc9txSdKETOIcwVHAqe3wqcDRE6hBktTqOgiK5oupL0qyvp22sqpuaIc3AStnWjHJ+iQbkmzYsmVLx2VKUn91fbL4yVV1fZIHA19M8oPBmVVVSWb8irSqOgk4CWDt2rV+jZokdaTTI4Kqur79uRk4EzgIuDHJ3gDtz81d1iBJmltnQZBklyS7TQ0DvwN8DzgbWNcutg44q6saJEnz67JraCVwZpKp/Xy0qj6f5ELgE0mOBa4BXtBhDZKkeXQWBFX1Q+DRM0y/CTisq/1KkhbGW0xIUs8ZBJLUc95rqO+8143Uex4RSFLPGQSS1HMGgST1nEEgST1nEEhSzxkEktRzBoEk9ZzXEajfvI5C8ohAkvrOIJCknjMIJKnnPEcwafZRS5owjwgkqecMAknqOYNAknrOIJCknjMIJKnnDAJJ6jmDQJJ6zusIJGlSFsl1RB4RSFLPGQSS1HMGgST1nEEgST1nEEhSzxkEktRzBoEk9ZxBIEk913kQJFmW5NtJzmnH90vyzSRXJjkjyU5d1yBJmt04jgheDWwcGH878NdVtT9wM3DsGGqQJM2i0yBIsgp4FvCBdjzA04BPtYucChzdZQ2SpLl1fUTwbuANwC/b8QcBW6vqrnb8OmCfmV
ZMsj7JhiQbtmzZ0nGZktRfnQVBkiOBzVV10basX1UnVdXaqlq7YsWKEVcnSZrS5d1HnwQ8J8kRwM7AA4ATgd2TLG+PClYB13dYgyRpHp0dEVTVG6tqVVWtAV4IfLmqXgR8BXheu9g64KyuapAkzW8S1xH8BfDnSa6kOWdw8gRqkCS1xvLFNFV1HnBeO/xD4KBx7FeSND+vLJaknjMIJKnnDAJJ6jmDQJJ6ziCQpJ4zCCSp5wwCSeo5g0CSes4gkKSeMwgkqecMAknqOYNAknrOIJCknjMIJKnnDAJJ6jmDQJJ6ziCQpJ4zCCSp5wwCSeq5sXxnsaRF6qsb5p5/yNrx1KGJ8ohAknrOIJCknjMIJKnnDAJJ6jmDQJJ6ziCQpJ4zCCSp57yOQFJ/eR0F4BGBJPWeQSBJPTdUECR5dhJDQ5Lug4Z9cf894Iok70jyqC4LkiSN11BBUFUvBg4ErgJOSXJ+kvVJdpttnSQ7J/lWku8k+X6SN7fT90vyzSRXJjkjyU4jaYkkaZsM3d1TVbcAnwI+DuwNPBe4OMmrZlnlF8DTqurRwGOAw5McDLwd+Ouq2h+4GTh2O+qXJG2nYc8RHJXkTOA8YEfgoKp6JvBo4LUzrVON29rRHdtHAU+jCRSAU4Gjt7l6SdJ2G/Y6gmNo3sV/bXBiVd2RZNZ39EmWARcB+wPvpela2lpVd7WLXAfsM8u664H1AKtXrx6yTElLip/jXxSG7RraND0EkrwdoKrOnW2lqrq7qh4DrAIOAoY+0VxVJ1XV2qpau2LFimFXkyQt0LBB8IwZpj1z2J1U1VbgK8ATgd2TTB2JrAKuH3Y7kqTRmzMIkvynJN8FHpXk0oHHj4BL51l3RZLd2+FfowmTjTSB8Lx2sXXAWdvbCEnStpvvHMFHgc8BbwWOG5h+a1X9dJ519wZObc8T7AB8oqrOSXIZ8PEkbwG+DZy8baVLkkZhviCoqro6yR9Pn5Fkz7nCoKoupbn2YPr0H9KcL5AkLQLDHBEcSfPJnwIyMK+Ah3dUlyRpTOYMgqo6sv2533jKkSSN27AXlN3rI6IzTZMkLT1zHhEk2Rm4P7BXkj24p2voAcxyIZgkaWmZ7xzBHwGvAR5Kc55gKghuAd7TYV2SpDGZ7xzBicCJSV5VVX87ppokSWM01L2Gqupvk/wWsGZwnao6raO6NCKfufyGOecfM6Y6JC1eQwVBkg8Dvw5cAtzdTi7AIJCkJW7Yu4+uBQ6oquqyGEnS+A1707nvAQ/pshBJ0mQMe0SwF3BZkm/RfPMYAFX1nE6qkiSNzbBBcHyXRUiSJmfYTw19tetCJEmTMewtJg5OcmGS25LcmeTuJLd0XZwkqXvDdg29B3gh8EmaTxC9FPg3XRUlSaPgdTTDGfZTQ1TVlcCy9nuIPwQc3l1ZkqRxGfaI4I4kOwGXJHkHcAMLCBFJ0uI17Iv5S4BlwJ8AtwP7Ar/bVVGSpPEZ9lND17SDPwfe3F05kqRxG/ZeQz+iubfQr6gqv6pSkpa4hdxraMrOwPOBPUdfjiRp3IY6R1BVNw08rq+qdwPP6rg2SdIYDNs19NiB0R1ojhCGPZqQJC1iw76Yv5N7zhHcBVxN0z0kSVrihg2Cc2iCYOo7iws4MmlGq+pdoy9NkjQOwwbB44DHA2fRhMGzgW8BV3RUlyRpTIYNglXAY6vqVoAkxwP/p6pe3FVhkqTxGPbK4pXAnQPjd7bTJElL3LBHBKcB30pyZjt+NHBKJxVJksZq2FtMnJDkc8BT2kl/UFXf7q4sSdK4DH0tQFVdDFzcYS2SpAno7KKwJPvSdCmtpPm46UlVdWKSPYEzgDU01yO8oKpu7qoOTdakvxhk0vuftL63X8Pp8jsF7gJeW1UHAAcDf5zkAOA44NyqegRwbjsuSZqQzoKgqm5ou5NoP3a6EdgHOAo4tV3sVJoTz5KkCRnLt4wlWQMcCHwTWFlVU8erm/BjqJI0UZ3fOC7JrsCngddU1S1Tt6UAqKpKcq/vOWjXWw+sB1i9enXXZUrSyC2VczSdHhEk2ZEmBD5SVZ9pJ9+YZO92/t7A5pnWraqTqmptVa1dsWJFl2VKUq91FgRp3vqfDGycdlO6s4F17fA6mvsXSZImpMuuoSfRfOn9d5Nc0k57E/A24BNJjgWuAV7QYQ2SpHl0FgRV9Q/cc9vq6Q7rar+LzVLpI5TUX2P51JAkafEyCCSp5wwCSeo5v4BeUmc8R7Y0eEQgST1nEEhSzxkEktRzBoEk9ZxBIEk9ZxBIUs8ZBJLUcwaBJPWcQSBJPWcQSFLPGQSS1HPea+irG+aef8ja8dQhSRPiEYEk9ZxBIEk9ZxBIUs8ZBJLUcwaBJPWcQSBJPWcQSFLPGQSS1HMGgST1nEEgST1nEEhSzxkEktRzBoEk9ZxBIEk9ZxBIUs/5fQRShz5z+Q1zzj9mTHVIc+nsiCDJB5NsTvK9gWl7Jvlikivan3t0tX9J0nC67Bo6BTh82rTjgHOr6hHAue24JGmCOguCqvoa8NNpk48CTm2HTwWO7mr/kqThjPtk8cqqmuo03QSsnG3BJOuTbEiyYcuWLeOpTpJ6aGKfGqqqAmqO+SdV1dqqWrtixYoxViZJ/TLuILgxyd4A7c/NY96/JGmacQfB2cC6dngdcNaY9y9JmqbLj49+DDgfeGSS65IcC7wNeEaSK4Cnt+OSpAnq7IKyqvr9WWYd1tU+JUkL5y0mJKnnDAJJ6rn7/L2GvNeLJM3NIwJJ6jmDQJJ6ziCQpJ4zCCSp5wwCSeo5g0CSes4gkKSeMwgkqecMAknqOYNAknrOIJCknjMIJKnnDAJJ6jmDQJJ6ziCQpJ4zCCSp5wwCSeo5g0CSes4gkKSeMwgkqecMAknqOYNAknrOIJCknjMIJKnnDAJJ6jmDQJJ6ziCQpJ4zCCSp5wwCSeq5iQRBksOTXJ7kyiTHTaIGSVJj7EGQZBnwXuCZwAHA7yc5YNx1SJIakzgiOAi4sqp+WFV3Ah8HjppAHZIkYPkE9rkPcO3A+HXAE6YvlGQ9sL4dvS3J5SPa/17AT0a0raXI9tt+298fDxtmoUkEwVCq6iTgpFFvN8mGqlo76u0uFbbf9tv+/rZ/NpPoGroe2HdgfFU7TZI0AZMIgguBRyTZL8lOwAuBsydQhySJCXQNVdVdSf4E+AKwDPhgVX1/jCWMvLtpibH9/Wb7dS+pqknXIEmaIK8slqSeMwgkqeeWfBAk2TfJV5JcluT7SV7dTt8zyReTXNH+3KOdniR/097e4tIkjx3Y1jvabWxsl8mk2jWsbWj/o5Kcn+QXSV43bVtL7tYfo2r/bNtZ7Eb592/nL0vy7STnjLst22LEz//dk3wqyQ/a14AnTqJNE1FVS/oB7A08th3eDfhHmltXvAM4rp1+HPD2dvgI4HNAgIOBb7bTfwv4Os0J7GXA+cChk25fB+1/MPB44ATgdQPbWQZcBTwc2An4DnDApNs3xvbPuJ1Jt29c7R/Y3p8DHwXOmXTbxt1+4FTgFe3wTsDuk27fuB5L/oigqm6oqovb4VuBjTRXLx9F84el/
Xl0O3wUcFo1LgB2T7I3UMDONE+A+wE7AjeOrSHbaKHtr6rNVXUh8C/TNrUkb/0xqvbPsZ1FbYR/f5KsAp4FfGAMpY/EqNqf5IHAU4GT2+XurKqtY2nEIrDkg2BQkjXAgcA3gZVVdUM7axOwsh2e6RYX+1TV+cBXgBvaxxeqauMYyh6ZIds/mxl/LyMusVPb2f7ZtrNkjKD97wbeAPyyi/q6tp3t3w/YAnyo7Rr7QJJduqp1sbnPBEGSXYFPA6+pqlsG51VzrDfn52ST7A/8Bs2VzvsAT0vylI7KHbntbf9SN6r2z7WdxWwEz/8jgc1VdVF3VXZnBH//5cBjgfdV1YHA7TRdSr1wnwiCJDvSPAk+UlWfaSff2Hb50P7c3E6f7RYXzwUuqKrbquo2mvMIS+Jk0QLbP5sle+uPEbV/tu0seiNq/5OA5yS5mqZb8GlJTu+o5JEaUfuvA66rqqmjwE/RBEMvLPkgaD/ZczKwsareNTDrbGBdO7wOOGtg+kvbTw8dDPysPYT8J+CQJMvbJ9YhNP2Ni9o2tH82S/LWH6Nq/xzbWdRG1f6qemNVraqqNTR/+y9X1Ys7KHmkRtj+TcC1SR7ZTjoMuGzE5S5ekz5bvb0P4Mk0h32XApe0jyOABwHnAlcAXwL2bJcPzRfjXAV8F1jbTl8GvJ/mxf8y4F2TbltH7X8IzbufW4Ct7fAD2nlH0Hzq4irgLyfdtnG2f7btTLp94/z7D2zzUJbOp4ZG+fx/DLCh3dZngT0m3b5xPbzFhCT13JLvGpIkbR+DQJJ6ziCQpJ4zCCSp5wwCSeo5g0CSes4gkMYgybJJ1yDNxiCQpkny35O8ZmD8hCSvTvL6JBem+R6LNw/M/2ySi9r74a8fmH5bkncm+Q5L5HYl6ieDQLq3DwIvBUiyA80tFzYBj6C5XfdjgMcleWq7/Mur6nHAWuBPkzyonb4LzfddPLqq/mGcDZAWYvmkC5AWm6q6OslNSQ6kuX3xt2m+zOR32mGAXWmC4Ws0L/7Pbafv206/Cbib5mZo0qJmEEgz+wDwMpp703yQ5iZkb62q9w8ulORQ4OnAE6vqjiTn0XzBEcA/V9Xd4ypY2lZ2DUkzOxM4nOZI4Avt4+Xtfe9Jsk+SBwMPBG5uQ+BRNF9/Ki0pHhFIM6iqO5N8Bdjavqv/+yS/AZzf3PmY24AXA58HXplkI3A5cMGkapa2lXcflWbQniS+GHh+VV0x6XqkLtk1JE2T5ADgSuBcQ0B94BGBJPWcRwSS1HMGgST1nEEgST1nEEhSzxkEktRz/x8bXmhPNw4i1gAAAABJRU5ErkJggg==\n","text/plain":"
What are the most frequent words in Herman Melville's novel Moby Dick and how often do they occur?
\nIn this notebook, we'll scrape the novel Moby Dick from the website Project Gutenberg (which contains a large corpus of books) using the Python package requests
. Then we'll extract words from this web data using BeautifulSoup
. Finally, we'll dive into analyzing the distribution of words using the Natural Language ToolKit (nltk
).
The Data Science pipeline we'll build in this notebook can be used to visualize the word frequency distributions of any novel that you can find on Project Gutenberg. The natural language processing tools used here apply to much of the data that data scientists encounter as a vast proportion of the world's data is unstructured data and includes a great deal of text.
\nLet's start by loading in the three main python packages we are going to use.
","cell_type":"markdown"},{"execution_count":2,"outputs":[],"metadata":{"tags":["sample_code"],"dc":{"key":"3"},"trusted":true,"collapsed":true},"source":"# Importing requests, BeautifulSoup and nltk\nimport requests\nimport nltk\nfrom bs4 import BeautifulSoup","cell_type":"code"},{"metadata":{"tags":["context"],"editable":false,"dc":{"key":"10"},"run_control":{"frozen":true},"deletable":false},"source":"## 2. Request Moby Dick\nTo analyze Moby Dick, we need to get the contents of Moby Dick from somewhere. Luckily, the text is freely available online at Project Gutenberg as an HTML file: https://www.gutenberg.org/files/2701/2701-h/2701-h.htm .
\nNote that HTML stands for Hypertext Markup Language and is the standard markup language for the web.
\nTo fetch the HTML file with Moby Dick we're going to use the request
package to make a GET
request for the website, which means we're getting data from it. This is what you're doing through a browser when visiting a webpage, but now we're getting the requested page directly into python instead.
\\r\\n\\r\\nThe Project Gutenberg EBook of Moby Dick; or The Whale, by Herman Melville\\r\\n\\r\\nThis eBook is for the use of anyone anywh'"},"metadata":{}}],"metadata":{"tags":["sample_code"],"dc":{"key":"10"},"trusted":true},"source":"# Getting the Moby Dick HTML \nr = requests.get('https://s3.amazonaws.com/assets.datacamp.com/production/project_147/datasets/2701-h.htm')\n\n# Setting the correct text encoding of the HTML page\nr.encoding = 'utf-8'\n\n# Extracting the HTML from the request object\nhtml = r.text\n\n# Printing the first 2000 characters in html\nhtml[0:2000]","cell_type":"code"},{"metadata":{"tags":["context"],"editable":false,"dc":{"key":"17"},"run_control":{"frozen":true},"deletable":false},"source":"## 3. Get the text from the HTML\nThis HTML is not quite what we want. However, it does contain what we want: the text of Moby Dick. What we need to do now is wrangle this HTML to extract the text of the novel. For this we'll use the package
\nBeautifulSoup
.Firstly, a word on the name of the package: Beautiful Soup? In web development, the term \"tag soup\" refers to structurally or syntactically incorrect HTML code written for a web page. What Beautiful Soup does best is to make tag soup beautiful again and to extract information from it with ease! In fact, the main object created and queried when using this package is called
","cell_type":"markdown"},{"execution_count":6,"outputs":[{"output_type":"execute_result","execution_count":6,"data":{"text/plain":"'r which the beech tree\\r\\n extended its branches.” —Darwin’s Voyage of a Naturalist.\\r\\n \\n\\r\\n “‘Stern all!’ exclaimed the mate, as upon turning his head, he saw the\\r\\n distended jaws of a large Sperm Whale close to the head of the boat,\\r\\n threatening it with instant destruction;—‘Stern all, for your\\r\\n lives!’” —Wharton the Whale Killer.\\r\\n \\n\\r\\n “So be cheery, my lads, let your hearts never fail, While the bold\\r\\n harpooneer is striking the whale!” —Nantucket Song.\\r\\n \\n\\r\\n “Oh, the rare old Whale, mid storm and gale\\r\\n In his ocean home will be\\r\\n A giant in might, where might is right,\\r\\n And King of the boundless sea.”\\r\\n —Whale Song.\\r\\n\\n\\n\\n\\n\\n \\n\\n\\n\\n\\n\\r\\n CHAPTER 1. Loomings.\\r\\n \\n\\r\\n Call me Ishmael. Some years ago—never mind how long precisely—having\\r\\n little or no money in my purse, and nothing particular to interest me on\\r\\n shore, I thought I would sail about a little and see the watery part of\\r\\n the world. It is a way I have of driving off the spleen and regulating the\\r\\n circulation. Whenever I find myself growing grim about the mouth; whenever\\r\\n it is a damp, drizzly November in my soul; whenever I find myself\\r\\n involuntarily pausing before coffin warehouses, and bringing up the rear\\r\\n of every funeral I meet; and especially whenever my hypos get such an\\r\\n upper hand of me, that it requires a strong moral principle to prevent me\\r\\n from deliberately stepping into the street, and methodically knocking\\r\\n people’s hats off—then, I account it high time to get to sea as soon\\r\\n as I can. This is my substitute for pistol and ball. With a philosophical\\r\\n flourish Cato throws himself upon his sword; I quietly take to the ship.\\r\\n There is nothing surprising in this. If they but knew it, almost all men\\r\\n in their degree, some time or other, cherish very nearly the same feelings\\r\\n towards the ocean with me.\\r\\n \\n\\r\\n Ther'"},"metadata":{}}],"metadata":{"tags":["sample_code"],"dc":{"key":"17"},"trusted":true},"source":"# Creating a BeautifulSoup object from the HTML\nsoup = BeautifulSoup(html, 'html.parser')\n\n# Getting the text out of the soup\ntext = soup.get_text()\n\n# Printing out text between characters 32000 and 34000\ntext[32000:34000]","cell_type":"code"},{"metadata":{"tags":["context"],"editable":false,"dc":{"key":"24"},"run_control":{"frozen":true},"deletable":false},"source":"## 4. Extract the words\nBeautifulSoup
. After creating the soup, we can use its.get_text()
method to extract the text.We now have the text of the novel! There is some unwanted stuff at the start and some unwanted stuff at the end. We could remove it, but this content is so much smaller in amount than the text of Moby Dick that, to a first approximation, it is okay to leave it in.
\nNow that we have the text of interest, it's time to count how many times each word appears, and for this we'll use
","cell_type":"markdown"},{"execution_count":8,"outputs":[{"output_type":"execute_result","execution_count":8,"data":{"text/plain":"['Moby', 'Dick', 'Or', 'the', 'Whale', 'by', 'Herman', 'Melville']"},"metadata":{}}],"metadata":{"tags":["sample_code"],"dc":{"key":"24"},"trusted":true},"source":"# Creating a tokenizer\ntokenizer = nltk.tokenize.RegexpTokenizer('\\w+')\n\n# Tokenizing the text\ntokens = tokenizer.tokenize(text)\n\n# Printing out the first 8 words / tokens \ntokens[:8]","cell_type":"code"},{"metadata":{"tags":["context"],"editable":false,"dc":{"key":"31"},"run_control":{"frozen":true},"deletable":false},"source":"## 5. Make the words lowercase\nnltk
– the Natural Language Toolkit. We'll start by tokenizing the text, that is, remove everything that isn't a word (whitespace, punctuation, etc.) and then split the text into a list of words.OK! We're nearly there. Note that in the above 'Or' has a capital 'O' and that in other places it may not, but both 'Or' and 'or' should be counted as the same word. For this reason, we should build a list of all words in Moby Dick in which all capital letters have been made lower case.
","cell_type":"markdown"},{"execution_count":10,"outputs":[{"output_type":"execute_result","execution_count":10,"data":{"text/plain":"['moby', 'dick', 'or', 'the', 'whale', 'by', 'herman', 'melville']"},"metadata":{}}],"metadata":{"tags":["sample_code"],"dc":{"key":"31"},"trusted":true},"source":"# A new list to hold the lowercased words\nwords = []\n\n# Looping through the tokens and make them lower case\nfor word in tokens:\n words.append(word.lower())\n\n# Printing out the first 8 words / tokens \nwords[:8]","cell_type":"code"},{"metadata":{"tags":["context"],"editable":false,"dc":{"key":"38"},"run_control":{"frozen":true},"deletable":false},"source":"## 6. Load in stop words\nIt is common practice to remove words that appear a lot in the English language such as 'the', 'of' and 'a' because they're not so interesting. Such words are known as stop words. The package
","cell_type":"markdown"},{"execution_count":12,"outputs":[{"output_type":"execute_result","execution_count":12,"data":{"text/plain":"['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves']"},"metadata":{}}],"metadata":{"tags":["sample_code"],"dc":{"key":"38"},"trusted":true},"source":"# Getting the English stop words from nltk\nsw = nltk.corpus.stopwords.words('english')\n\n# Printing out the first eight stop words\nsw[:8]","cell_type":"code"},{"metadata":{"tags":["context"],"editable":false,"dc":{"key":"45"},"run_control":{"frozen":true},"deletable":false},"source":"## 7. Remove stop words in Moby Dick\nnltk
includes a good list of stop words in English that we can use.We now want to create a new list with all
","cell_type":"markdown"},{"execution_count":14,"outputs":[{"output_type":"execute_result","execution_count":14,"data":{"text/plain":"['moby', 'dick', 'whale', 'herman', 'melville', 'body', 'background', 'faebd0']"},"metadata":{}}],"metadata":{"tags":["sample_code"],"dc":{"key":"45"},"trusted":true},"source":"# A new list to hold Moby Dick with No Stop words\nwords_ns = []\n\n# Appending to words_ns all words that are in words but not in sw\nfor word in words:\n if not word in sw:\n words_ns.append(word)\n\n# Printing the first 5 words_ns to check that stop words are gone\nwords_ns[:8]","cell_type":"code"},{"metadata":{"tags":["context"],"editable":false,"dc":{"key":"52"},"run_control":{"frozen":true},"deletable":false},"source":"## 8. We have the answer\nwords
in Moby Dick, except those that are stop words (that is, those words listed insw
). One way to get this list is to loop over all elements ofwords
and add each word to a new list if they are not insw
.Our original question was:
\n\n\nWhat are the most frequent words in Herman Melville's novel Moby Dick and how often do they occur?
\nWe are now ready to answer that! Let's create a word frequency distribution plot using
","cell_type":"markdown"},{"execution_count":16,"outputs":[{"output_type":"display_data","metadata":{},"data":{"image/png":"iVBORw0KGgoAAAANSUhEUgAAAagAAAEYCAYAAAAJeGK1AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAIABJREFUeJztnXmcFNW1x78HEIZFGEQFV1QUFwwOBoREjHvUxGhixBh31ERjjBp8KhqNMfp8LkmMZtNE3DVR9BlxQ54KbrggMIAIiLugQkQGBRURzvvjVjE9zUxPV/ft7ts15/v51Ke7blf9+vZWp+85554rqophGIZhhEa7SnfAMAzDMJrDDJRhGIYRJGagDMMwjCAxA2UYhmEEiRkowzAMI0jMQBmGYRhBUlIDJSJjRGSRiMxs5rGzRWSNiGyQ0XadiMwXkXoRqctoP15EXhOReSJyXCn7bBiGYYRBqUdQNwMHZDeKyObA/sA7GW0HAf1UdTvgFOD6qL0n8GtgCDAUuFhEepS434ZhGEaFKamBUtVngaXNPHQNcE5W26HAbdF5LwI9RKQ3zsBNUNVlqtoATAAOLF2vDcMwjBAoewxKRA4B3lPVWVkPbQa8l7G/IGrLbl8YtRmGYRgppkM5n0xEOgMX4Nx7rR6eVH/bbbfV5cuXs2jRIgD69evH+uuvT319PQB1dS6sZfu2b/u2b/uV3e/duzfA2uu1qq57zVfVkm5AX2BmdH9n4EPgTeAtYBXwNrAxLub0o4zz5gK9gSOB6zPamxyX9Vzqi4svvjiVOj61QtPxqRWajk+ttOr41ApNx6dWaDqqqtG1e51rejlcfBJtqOorqtpHVbdR1a1xbrxBqroYGAccByAiw4AGVV0EPAbsLyI9ooSJ/aM2wzAMI8WUOs38LmAy0F9E3hWRkVmHKI3G6xHgLRF5HbgBOC1qXwpcCrwMvAhcoi5ZYh3iIaMPvvjii1Tq+NQKTcenVmg6PrXSquNTKzQdn1qh6eSipDEoVT2qlce3ydo/vYXjbgFuae35unXrlqB3uRk+fHgqdXxqhabjUys0HZ9aadXxqRWajk+t0HRyIZqi9aBERNP0egzDMNoCItJskoSVOjIMwzCCJFUGKk5j9EFDQ7NhrqrX8akVmo5PrdB0fGqlVcenVmg6PrVC08lFqgyUYRiGkR4sBmUYhmFUFItBGYZhGFVFqgyUxaDKqxWajk+t0HR8aqVVx6dWaDo+tULTyUWqDJRhGIaRHiwGZRiGYVQUi0EZhmEYVUWqDJTFoMqrFZqOT63QdHxqpVXHp1ZoOj61QtPJRaoMlGEYhpEeLAZlGIZhVBSLQRmGYRhVRaoMlMWgyqsVmo5PrdB0fGqlVcenVmg6PrVC08lFqgyUYRiGkR4sBmUYhmFUFItBGYZhGFVFqgyUxaDKqxWajk+t0HR8aqVVx6dWaDo+tULTyUWqDJRhGIaRHlIXg3rjDWWbbSrdE8MwDCNf2kwM6u67K90DwzAMwwepMlB1dXXcc48frdD8tObLLq9WaDo+tdKq41MrNB2fWqHp5CJVBgqgvh5ee63SvTAMwzCKJXUxKFAuuwx+9atK98YwDMPIh4rEoERkjIgsEpGZGW1XicgcEakXkftEpHvGY+eLyPzo8W9ntB8oInNF5DUROa+15/Xl5jMMwzAqR6ldfDcDB2S1TQAGqGodMB84H0BEdgKOAHYEDgL+Ko52wJ8jnQHAj0Vkh+aerK6ujh49YOZMmDu3uI6H5qc1X3Z5tULT8amVVh2fWqHp+NQKTScXJTVQqvossDSr7XFVXRPtvgBsHt0/BPiXqn6lqm/jjNdu0TZfVd9R1VXAv4BDW3rOH/zA3Y4d6+91GIZhGOWn0kkSJwKPRPc3A97LeGxh1JbdviBqW4f6+nqOOMLdL9ZA1dbWFicQqI5PrdB0fGqFpuNTK606PrVC0/GpFZpOLjqU/BlaQER+BaxS1X/60uzXrx9PPjmaTp1qmDULbrxxMIcfPnztGxkPSW3f9m3f9m2/cvuTJk1i/PjxANTU1NAiqlrSDegLzMxqOwF4DuiU0TYaOC9jfzwwFBgGjG/puMytrq5OVVVHjlQF1Usu0YJZunRp4ScHrONTKzQdn1qh6fjUSquOT63QdHxqhaajqupM0brX9HK4+CTa3I7IgcA5wCGqujLjuHHAkSLSUUS2BrYFXgKmANuKSF8R6QgcGR3bIrGbz7L5DMMwqpeSzoMSkbuAvYBewCLgYuACoCOwJDrsBVU9LTr+fOAkYBVwpqpOiNoPBK7FxczGqOoVLTyfqiqrVkHv3rB0KbzyCgwYULKXaBiGYRRJS/OgUjdRN349J58MY8bAxRfDb35T2X4ZhmEYLdMmisVmrgeV6eYrxAaHNlfA5lOUVys0HZ9aadXxqRWajk+t0HRykSoDlcnee0OvXjBnDsyeXeneGIZhGElJrYsP4Kc/hX/8A379a7jkkgp2zDAMw2iRNuHiy2bECHdbqJvPMAzDqBypMlCZMShodPPNneuy+ZIQmp/WfNnl1QpNx6dWWnV8aoWm41MrNJ1cpMpAZdOhA/zwh+6+zYkyDMOoLlIdgwJ44gnYbz/o39+NpGQdL6dhGIZRSdrcPKiYr76CTTeF//zHrba7yy4V6pxhGIbRLG0iSSI7BgWFu/lC89OaL7u8WqHp+NRKq45PrdB0fGqFppOLVBmolih20q5hGIZRflLv4gNYvdq5+RYvhmnTYNCgCnTOMAzDaJY24eJrifbtG918ttKuYRhGdZAqA9VcDComqZsvND+t+bLLqxWajk+ttOr41ApNx6dWaDq5SJWBysUee7glON54A6ZPr3RvDMMwjNZoEzGomNNPh7/8Bc47D65odkUpwzAMo9y06RhUjGXzGYZhVA+pMlC5YlAAu+8Om2wCb70FU6fm1grNT2u+7PJqhabjUyutOj61QtPxqRWaTi5SZaBao317OPxwd99q8xmGYYRNm4pBATzzDHzrW9C3rxtJWW0+wzCMymIxqIjYzffOOzBlSqV7YxiGYbREqgxUazEogHbtGhcyzDVpNzQ/rfmyy6sVmo5PrbTq+NQKTcenVmg6uUiVgcoXy+YzDMMInzYXgwJYswa23BIWLoQXXoChQ8vQOcMwDKNZLAaVQaabz7L5DMMwwiRVBiqfGFRM7OYbO9aNqLIJzU9rvuzyaoWm41MrrTo+tULT8akVmk4uSmqgRGSMiCwSkZkZbT1FZIKIzBORx0SkR8Zj14nIfBGpF5G6jPbjReS16JzjfPRt6FDYYgt47z148UUfioZhGIZPShqDEpHhwHLgNlUdGLVdCSxR1atE5Dygp6qOFpGDgNNV9bsiMhS4VlWHiUhP4GVgV0CAqcCuqrqsmefLKwYVM2oUXHMNnHWWuzUMwzDKT0ViUKr6LLA0q/lQ4Nbo/q3Rftx+W3Tei0APEekNHABMUNVlqtoATAAO9NG/1tx8hmEYRuWoRAxqY1VdBKCqHwK9o/bNgPcyjls
QtWW3L4za1iFJDAqcmy/O5nv++aaPheanNV92ebVC0/GplVYdn1qh6fjUCk0nFx1K/gyt05JPLnERou7duzN69GhqamoAGDx4MMOHD6e2thZofEPj/WXLGjj1VLjgglrGjoUBA5o+nn18IfvLly/3qudjP6ZYveXLlwfVnxDfb5/9Ce39Dq0/9vlXz+c/adIkxo8fD7D2et0cJZ8HJSJ9gQczYlBzgL1UdZGI9AEmquqOInJ9dP/u6Li5wJ7A3tHxp0btTY7Leq5EMSiAl15yI6lNN3UJE+1SlddoGIYRPpWcByU0HQ2NA06I7p8APJDRfhyAiAwDGiJX4GPA/iLSI0qY2D9q88KQIa5w7Pvvw+TJvlQNwzCMYil1mvldwGSgv4i8KyIjgStwBmcesE+0j6o+ArwlIq8DNwCnRe1LgUtxmXwvApdEyRLrkDQG5frYtPRRTGh+WvNll1crNB2fWmnV8akVmo5PrdB0clHSGJSqHtXCQ/u1cPzpLbTfAtzip1frcsQRcPXVcO+9Lt28fftSPZNhGIaRL22yFl82qrDNNvD22/DUU269KMMwDKM8WC2+HLTk5jMMwzAqR6oMVCExqJjYQN17L6xeHZ6f1nzZ5dUKTcenVlp1fGqFpuNTKzSdXKTKQBXDrrs6N9+iRW5ZeMMwDKOyWAwqg/PPhyuugNNOg7/8xWPHDMMwjBZpKQZlBiqD6dPdSGrjjd28KMvmMwzDKD1tIkmimBiUOx+23RYWL4annw7LT2u+7PJqhabjUyutOj61QtPxqRWaTi5SZaCKJTObb+LEyvbFMAyjrWMuvixmzoRddoGePZ2bL0cdQ8MwDMMDbcLF54OBA10caulSuP/+SvfGMAyj7ZIqA1VsDCrmpJOgrq6BG28sXitEf29ofbLXVl6ttOr41ApNx6dWaDq5SJWB8sVRR0HHjvDkk/Dmm5XujWEYRtvEYlAtcOyxcMcdcOGFcOmlXiQNwzCMZrAYVEJOOsnd3nKLK31kGIZhlJdUGShfMSiAXXZpoF8/WLAAJkwoXCdEf29ofbLXVl6ttOr41ApNx6dWaDq5SGygRKSniAwsRWdCQgROPNHdHzOmsn0xDMNoi+QVgxKRScAhuAUOpwKLgedUdVRJe5cQnzEogIULYcstXcmjBQtcCSTDMAzDL8XGoHqo6ifAYcBtqjqUFlbFTRObbQbf+Q6sWgW3317p3hiGYbQt8jVQHURkE+AI4KES9qcofMagYv9qnCwxZoxbebdQHV/9CUkrNB2fWqHp+NRKq45PrdB0fGqFppOLfA3UJcBjwOuqOkVEtgHml65b4fDd70Lv3jBnDrzwQqV7YxiG0XbINwa1u6o+11pbpfEdg4o591y4+mo3mvJRXcIwDMNopKj1oERkmqru2lpbpSmVgZo7F3bcEbp1gw8+cLeGYRiGHwpKkhCRb4jI2cBGIjIqY/sNENxyfqWIQQHssAPsvjssXw733FO4jq/+hKIVmo5PrdB0fGqlVcenVmg6PrVC08lFazGojkA3XHr5+hnbJ8Dhpe1aWJx8sru1OVGGYRjlIV8XX19VfacM/SmKUrn4AFasgE02gU8/hVdfdS4/wzAMo3iKnQfVSUT+LiITROTJeCuyQ78UkVdEZKaI3CkiHUVkKxF5QUReE5F/ikiH6NiOIvIvEZkvIs+LyJbFPHchdO0KRx7p7tsoyjAMo/Tka6DGAtOBC4FzMraCEJFNgV8Au6rqQJwL8cfAlcDvVbU/0ABEs5A4CfhYVbcD/ghc1ZxuqWJQMfGcqNtugy+/LFzHV38qrRWajk+t0HR8aqVVx6dWaDo+tULTyUW+BuorVf2bqr6kqlPjrcjnbg90jUZJnYH3gb2B+6LHbwW+H90/NNoHuBfYt8jnLojddoMBA+A//4GHgp2ubBiGkQ7yjUH9Bld/735gZdyuqh8X/MQiZwD/DXwGTADOAp6PRk+IyObAI6o6UERmAQeo6vvRY/OBodnPX8oYVMw118CoUa4E0sMPl/SpDMMw2gQtxaA65Hn+8dFtpltPgW0K7EwtblTUF1iGcyEemESiucZ+/foxevRoampqABg8eDDDhw+ntrYWaBySFrN/2GFw3nm1jB8P8+c3sNFGxenZvu3bvu23tf1JkyYxfvx4gLXX62ZR1bJvuBT1f2TsHwv8FTdKaxe1DQMeje6Px42YwLkGFzenW1dXp75YunRpi4+NGKEKqpddVpyOr/5USis0HZ9aoen41Eqrjk+t0HR8aoWmo6rqTNG61/S8YlAiclxzWz7ntsC7wDARqRERwcWUZgMTgRHRMccDD0T3x9E4ihsBFJVBWCyZBWTXrKlkTwzDMNJLvjGoP2Xs1uAMyjRVLXiyrohcDBwJrMJlCJ4MbA78C+gZtR2jqqtEpBNwOzAIWAIcqapvN6Op+byeYlm9GrbeGt57D554AvbZp+RPaRiGkVqKqsXXjFgt8C9VTRI3KjnlMlAAF18Mv/0tHHUU3HlnWZ7SMAwjlRQ7UTebFcDWxXXJP6WeB5XJyJFuWfj77oOlSwvX8dWfSmiFpuNTKzQdn1pp1fGpFZqOT63QdHKRbwzqQREZF20PA/NwKedtlq22gv32g5Ur4a67Kt0bwzCM9JFvDGrPjN2vgHdUdUHJelUg5XTxAdx9tyt/NGgQTJtWtqc1DMNIFUXHoESkNzAk2n1JVRd77J8Xym2gVq6ETTeFjz+GqVNh16BWxzIMw6gOiopBicgRwEu4FO8jgBdFJLjlNsoZgwLo1AmOOcbdb6mAbIj+3tD6ZK+tvFpp1fGpFZqOT63QdHKRb5LEr4Ahqnq8qh4H7AZcVLpuVQ/xnKg774TPP69sXwzDMNJEvjGoWar6tYz9dsCMzLYQKLeLL2bIEHj5ZbjjDjj66LI/vWEYRlVTbJr5eBF5TEROEJETgIeBR3x2sJqx1XYNwzD8k9NAici2IrK7qp4D3AAMjLbngb+XoX+JKHcMKubII6FzZ5g4Ed54o3AdX/0pl1ZoOj61QtPxqZVWHZ9aoen41ApNJxetjaD+CHwCoKr/q6qjVHUUbg7UH0vduWqhRw8YEVUQvPnmyvbFMAwjLeSMQYnIFFUd0sJjsywG1cjTT8Oee8Jmm8Hbb0OHfBcyMQzDaOMUGoOqzfFY5+K6lC722AO22w4WLoTHHqt0bwzDMKqf1gzUyyLyk+xGETkZKHbJd+9UKgYFri7fiSe6+5nJEiH6e0Prk7228mqlVcenVmg6PrVC08lFa46os4D7ReRoGg3SYKAj8INSdqwaOf54uPBCePBBWLQIeveudI8MwzCql3znQe0N7BztzlbVii4Y2BKVjEHFHHoojBsHV18N//VfFe2KYRhGVeB1PahQCcFAjRvnjNQOO8CrrzrXn2EYhtEyvteDCpJKxqBivvMd6NMH5s6F558P098bWp/stZVXK606PrVC0/GpFZpOLlJloEKgQwcXiwKrLGEYhlEM5uIrAa+9BttvD127wgcfwPrrV7pHhmEY4dImXHyh0L+/mxe1YoVb1NAwDMNITqoMVAgxqJi4gOzjjz
[...base64-encoded PNG truncated: frequency distribution plot of the 20 most common words in Moby Dick...]","text/plain":"<frequency distribution plot of the 20 most common words>"}}],"metadata":{"tags":["sample_code"],"dc":{"key":"52"},"trusted":true},"source":"# This command displays figures inline\n%matplotlib inline\n\n# Creating the word frequency distribution\nfreqdist = nltk.FreqDist(words_ns)\n\n# Plotting the frequency distribution of the 20 most common words\nfreqdist.plot(20)","cell_type":"code"},{"metadata":{"tags":["context"],"editable":false,"dc":{"key":"59"},"run_control":{"frozen":true},"deletable":false},"source":"## 9. The most common word\nNice! The frequency distribution plot above is the answer to our question.
\nThe natural language processing skills we used in this notebook are also applicable to much of the data that data scientists encounter, since the vast majority of the world's data is unstructured and includes a great deal of text.
\nSo, which word (not surprisingly) turned out to be the most common in Moby Dick?
","cell_type":"markdown"},{"execution_count":18,"outputs":[],"metadata":{"tags":["sample_code"],"dc":{"key":"59"},"trusted":true,"collapsed":true},"source":"# What's the most common word in Moby Dick?\nmost_common_word = 'whale'","cell_type":"code"}],"nbformat":4,"metadata":{"language_info":{"codemirror_mode":{"name":"ipython","version":3},"nbconvert_exporter":"python","file_extension":".py","mimetype":"text/x-python","pygments_lexer":"ipython3","name":"python","version":"3.5.2"},"kernelspec":{"language":"python","display_name":"Python 3","name":"python3"}}} --------------------------------------------------------------------------------