├── .ipynb_checkpoints ├── lesson1-checkpoint.ipynb └── lesson_4-checkpoint.ipynb ├── Project ├── .ipynb_checkpoints │ └── FinalProject_StudentFriendly-checkpoint.ipynb ├── convictions_by_state.csv ├── crime_by_state.csv ├── finalproject.ipynb ├── hindu_img.png └── literacy_by_state.csv ├── case_study ├── .DS_Store ├── .ipynb_checkpoints │ ├── case_study-checkpoint.ipynb │ └── datmo_demo-checkpoint.ipynb ├── case_study.ipynb ├── cities.csv ├── city_gdp.csv ├── cost_of_living.csv ├── datmo_demo.ipynb ├── engineering_data.csv ├── engineering_salaries_test.csv └── housing_price_index.csv ├── dsi.png ├── lesson1 ├── .ipynb_checkpoints │ └── lesson1-checkpoint.ipynb ├── child_mortality.csv ├── countries.csv ├── diseases.csv ├── fertility.csv ├── lesson1.ipynb ├── life_expectancy.csv ├── pew_population_projection.png ├── population.csv ├── poverty.csv └── water_quality.csv ├── lesson2 ├── .ipynb_checkpoints │ ├── Lesson_2_Programming_Intro-checkpoint.ipynb │ └── lesson2-checkpoint.ipynb └── lesson2.ipynb ├── lesson3.ipynb ├── lesson4 ├── .ipynb_checkpoints │ └── lesson4-checkpoint.ipynb └── lesson4.ipynb ├── lesson5 ├── cricket_tiers.csv ├── doctor_salaries.csv ├── engineering_data.csv └── lesson_5.ipynb ├── lesson6 ├── .ipynb_checkpoints │ └── lesson6-checkpoint.ipynb └── lesson6.ipynb ├── lesson7 ├── .ipynb_checkpoints │ └── Lesson_7_Visualization-checkpoint.ipynb ├── Lesson_7_Visualization.ipynb ├── british_india_troops.csv └── foreign_tourists.csv ├── lesson8 ├── .ipynb_checkpoints │ └── Lesson8_Correlation-checkpoint.ipynb ├── Lesson8_Correlation.ipynb ├── child_mortality.csv ├── fertility.csv ├── life_expectancy.csv └── population.csv ├── lesson9 ├── .ipynb_checkpoints │ ├── Notebook 9-checkpoint.ipynb │ └── lesson9-checkpoint.ipynb ├── Notebook 9.ipynb ├── circular.csv ├── cities_r2.csv ├── family.csv └── lesson9.ipynb └── tutorial └── tutorial.ipynb /.ipynb_checkpoints/lesson_4-checkpoint.ipynb: 
-------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Lesson 4: Exercises" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Today, we'll be going over a couple of Python exercises to reinforce your knowledge about tables and basic statistics." 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 2, 20 | "metadata": { 21 | "collapsed": true 22 | }, 23 | "outputs": [], 24 | "source": [ 25 | "import numpy as np\n", 26 | "from datascience import *" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": {}, 32 | "source": [ 33 | "**RCB's Cricket Confusion** Should this be a \"challenge problem?\"\n", 34 | "\n", 35 | "Royal Challengers Bangalore finished last in IPL 2017. The coach wants to build a strong team next year to win the trophy by buying the best possible players in the auction. He’s a great cricketer and coach, but he’s not very good at math! Your job is to write a program to help him put a team together that's within his budget. There are 3 categories of players with the following costs in crores.\n", 36 | "Tier 1: 1\n", 37 | "Tier 2: 0.5\n", 38 | "Tier 3: 0.25\n", 39 | " \n", 40 | "The coach has 10 crores to spend and can purchase any 11 to 16 players he wants from this set of 28. 
Write a function that accepts the coach's input for the number of players he wants from each tier and checks that the selection is within his budget.\n", 41 | "*Return* the players' total salary and *print* their names.\n", 42 | "REMEMBER: We refer to the first value in a column with 0!\n", 43 | "\n", 44 | " \n", 45 | "Sample Input: select_players(2, 4, 5)\n", 46 | "Sample Output: [MS Dhoni, Virat Kohli, Suresh Raina, Ambati Rayudu, Rohit Sharma, Murali Vijay, Amit Mishra, Axar Patel, Stuart Binny, Wriddhiman Saha, Mohit Sharma]\n", 47 | "5.25\n", 48 | "\n", 49 | "Sample Input: select_players(2, 2, 5)\n", 50 | "Sample Output: Too few players!\n", 51 | "\n", 52 | "Sample Input: select_players(10, 3, 1)\n", 53 | "Sample Output: Too many players!" 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": 17, 59 | "metadata": { 60 | "collapsed": false 61 | }, 62 | "outputs": [ 63 | { 64 | "data": { 65 | "text/html": [ 66 | "\n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | "
PLAYER Salary Tier
MS Dhoni 1 1
Virat Kohli 1 1
Ajinkya Rahan 1 1
Ravi Ashwin 1 1
Suresh Raina 0.5 2
Ambati Rayudu 0.5 2
Rohit Sharma 0.5 2
Murali Vijay 0.5 2
Shikhar Dhawan 0.5 2
Bhuvneshwar Kumar 0.5 2
\n", 114 | "

... (18 rows omitted) 16:\n", 185 | " print(\"Too many players!\")\n", 186 | " else:\n", 187 | " t1 = players._________(\"______\", are.equal_to(_______)).take(range(0, tier_1)).column(\"PLAYER\")\n", 188 | " t2 =\n", 189 | " t3 = \n", 190 | " \n", 191 | " #How will you access the players' names using the column() function?\n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " total_salary = __________+_____________+______________\n", 196 | " if total_salary > 10:\n", 197 | " return \"Too expensive! Select a different combination!\"\n", 198 | " return total_salary" 199 | ] 200 | }, 201 | { 202 | "cell_type": "markdown", 203 | "metadata": {}, 204 | "source": [ 205 | "**Building a Better Estimate**\n", 206 | "\n", 207 | "Raju the builder has made N measurements. Now, he wants to know the average value of the measurements made. To make the average more representative of the measurements, he first wants to remove the highest K and the lowest K measurements. After that, he will calculate the average value among the remaining N - 2K measurements.\n", 208 | "Could you help Raju find the average value he will get after these manipulations?\n", 209 | "\n", 210 | "\n", 211 | "Sample Input: \n", 212 | "N - 5 \n", 213 | "K - 1\n", 214 | "N values - 2 9 -10 25 1\n", 215 | "Sample Output: \n", 216 | "4.00000\n" 217 | ] 218 | }, 219 | { 220 | "cell_type": "code", 221 | "execution_count": null, 222 | "metadata": { 223 | "collapsed": false 224 | }, 225 | "outputs": [], 226 | "source": [ 227 | "def new_measurements(n, k, arr):\n", 228 | " \n", 229 | " for i in range(___, ___):\n", 230 | " " 231 | ] 232 | }, 233 | { 234 | "cell_type": "markdown", 235 | "metadata": {}, 236 | "source": [ 237 | "**Aditi’s Career Planning**\n", 238 | " \n", 239 | "Aditi can’t decide what field she wants to work in when she grows up! 
She likes medicine and engineering equally so her father advised her to pick the field that pays the most to an average worker. Aditi has collected tables containing the necessary data on the salaries of professionals in these fields and stored them in 2 unsorted arrays. Can you help her find out which job to pick as per her father’s advice? \n", 240 | "\n", 241 | "Hint: use the sort() function.\n", 242 | "\n" 243 | ] 244 | }, 245 | { 246 | "cell_type": "code", 247 | "execution_count": 6, 248 | "metadata": { 249 | "collapsed": true 250 | }, 251 | "outputs": [], 252 | "source": [ 253 | "engg_salaries = Table.read_table(\"engineering_data.csv\").column(\"Salary\")\n", 254 | "#Source: http://research.aspiringminds.com/resources/#datasets\n", 255 | "doc_salaries = Table.read_table(\"doctor_salaries.csv\").column(\"Salary\")\n", 256 | "#Source: https://www.glassdoor.com/Salaries/india-doctor-salary-SRCH_IL.0,5_IN115_KO6,12_IP6.htm\n" 257 | ] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "execution_count": null, 262 | "metadata": { 263 | "collapsed": false 264 | }, 265 | "outputs": [], 266 | "source": [ 267 | "def salary(med, engg, law):\n", 268 | " #In Python, we can define functions inside our own functions. 
\n", 269 | " #This function will compute a certain quantity from each array for you to help you compare the salaries.\n", 270 | " #What quantity do you think it is?\n", 271 | " def helper(array):\n", 272 | " \n", 273 | " _________________\n", 274 | " \n", 275 | " _________________\n", 276 | " length = len(array)\n", 277 | " if ___________:\n", 278 | " return array[___]\n", 279 | " else:\n", 280 | " return __________\n", 281 | " \n", 282 | " med_salary = _____________\n", 283 | " engg_salary = ____________\n", 284 | " law_salary = _____________\n", 285 | " #The max() function takes the maximum of all the values you put into it\n", 286 | " best_salary = max(___________, ______________, ___________) \n", 287 | " if best_salary == engg_salary:\n", 288 | " print(\"Engineering\")\n", 289 | " elif best_salary == med_salary:\n", 290 | " print(\"Medicine\")\n", 291 | " else:\n", 292 | " print(law)" 293 | ] 294 | } 295 | ], 296 | "metadata": { 297 | "kernelspec": { 298 | "display_name": "Python 3", 299 | "language": "python", 300 | "name": "python3" 301 | }, 302 | "language_info": { 303 | "codemirror_mode": { 304 | "name": "ipython", 305 | "version": 3 306 | }, 307 | "file_extension": ".py", 308 | "mimetype": "text/x-python", 309 | "name": "python", 310 | "nbconvert_exporter": "python", 311 | "pygments_lexer": "ipython3", 312 | "version": "3.6.0" 313 | } 314 | }, 315 | "nbformat": 4, 316 | "nbformat_minor": 2 317 | } 318 | -------------------------------------------------------------------------------- /Project/.ipynb_checkpoints/FinalProject_StudentFriendly-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Final Project: Crime in India" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | 
"Congratulations on reaching the final project stage of DSI! As your final project today, we want you to look at data from India about crimes committed against women. Safety of women is a critical issue across India right now and we want you to take a data-centric approach to begin unpacking this topic. \n" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": null, 27 | "metadata": { 28 | "collapsed": true 29 | }, 30 | "outputs": [], 31 | "source": [ 32 | "import matplotlib\n", 33 | "matplotlib.use('Agg')\n", 34 | "from datascience import Table, predicates\n", 35 | "%matplotlib inline\n", 36 | "import matplotlib.pyplot as plt\n", 37 | "import numpy as np\n", 38 | "plt.style.use('fivethirtyeight')" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": {}, 44 | "source": [ 45 | "First, let's make a table of crimes committed in each state in 2012." 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": null, 51 | "metadata": { 52 | "collapsed": false 53 | }, 54 | "outputs": [], 55 | "source": [ 56 | "crime_data = Table.read_table('crime_by_state.csv')\n", 57 | "crime_data" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "metadata": {}, 63 | "source": [ 64 | "## Question 1: Warm-Up" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "**A.** How many rows are there in this table? Write a line of code that tells us the number of rows in the table. " 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": null, 77 | "metadata": { 78 | "collapsed": true 79 | }, 80 | "outputs": [], 81 | "source": [] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "metadata": {}, 86 | "source": [ 87 | "**B.** Find the population of all the states reported." 
88 | ] 89 | }, 90 | { 91 | "cell_type": "code", 92 | "execution_count": null, 93 | "metadata": { 94 | "collapsed": true 95 | }, 96 | "outputs": [], 97 | "source": [ 98 | "total_pop = sum(crime_data.____(_____))\n", 99 | "total_pop" 100 | ] 101 | }, 102 | { 103 | "cell_type": "markdown", 104 | "metadata": {}, 105 | "source": [ 106 | "**C.** Find the total number of arrests in 2012." 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": null, 112 | "metadata": { 113 | "collapsed": true 114 | }, 115 | "outputs": [], 116 | "source": [ 117 | "total_arrests = \n", 118 | "total_arrests" 119 | ] 120 | }, 121 | { 122 | "cell_type": "markdown", 123 | "metadata": {}, 124 | "source": [ 125 | "**D.** Calculate the total number of arrests per hundred thousand people in 2012." 126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "execution_count": null, 131 | "metadata": { 132 | "collapsed": true 133 | }, 134 | "outputs": [], 135 | "source": [ 136 | "100000 * _____ / ______" 137 | ] 138 | }, 139 | { 140 | "cell_type": "markdown", 141 | "metadata": {}, 142 | "source": [ 143 | "**E.** Do you think it’s a high fraction of the population? Consider the entire population of India. What would this rate imply if it were applied to the whole country?\n" 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "metadata": {}, 149 | "source": [ 150 | "**Answer Here** (double click to edit)" 151 | ] 152 | }, 153 | { 154 | "cell_type": "markdown", 155 | "metadata": {}, 156 | "source": [ 157 | "## Question 2: Visualizing our Data" 158 | ] 159 | }, 160 | { 161 | "cell_type": "markdown", 162 | "metadata": {}, 163 | "source": [ 164 | "**A.** Before we do this, we need to add one column to our table. 
" 165 | ] 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "metadata": {}, 170 | "source": [ 171 | "Calculate the weighted crime by state (crimes per 100,000) using two columns from the table, and add this information as a new column to the crime_data table as a column called `\"Weighted Crime\"`." 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": null, 177 | "metadata": { 178 | "collapsed": false 179 | }, 180 | "outputs": [], 181 | "source": [ 182 | "total_crime_by_state = \n", 183 | "population_by_state = \n", 184 | "weighted_crime_by_state = np.divide(100000.0 * total_crime_by_state, population_by_state)\n", 185 | "crime_data = crime_data.with_column(\"________\", _________)\n", 186 | "crime_data" 187 | ] 188 | }, 189 | { 190 | "cell_type": "markdown", 191 | "metadata": {}, 192 | "source": [ 193 | "**B.** Make a plot of the total male count and total female count of arrested peoples by state below. \n", 194 | "\n", 195 | "Hint: Put the names of the columns representing male and female arrests in the array." 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": null, 201 | "metadata": { 202 | "collapsed": false 203 | }, 204 | "outputs": [], 205 | "source": [ 206 | "crime_data.____(\"_____\", np.array([\"____\", \"______\"]))" 207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "**C.** Now we want to look at the *normalized* crime per state (per 100,000 people). This will let us look at how the states rank in terms of crimes relative to their populations. Make this plot below.\n" 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": null, 219 | "metadata": { 220 | "collapsed": true 221 | }, 222 | "outputs": [], 223 | "source": [] 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "metadata": {}, 228 | "source": [ 229 | "**D.** Do the states rank similarly? 
Do they seem to follow a similar pattern in terms of number of crimes?\n", 230 | "\n" 231 | ] 232 | }, 233 | { 234 | "cell_type": "markdown", 235 | "metadata": {}, 236 | "source": [ 237 | "## Question 3: Convictions" 238 | ] 239 | }, 240 | { 241 | "cell_type": "markdown", 242 | "metadata": {}, 243 | "source": [ 244 | "Now that we have looked at some raw crime data, let’s think about its consequences. Is any action being taken against offenders? What kind of data do you think you could look at that would help you answer this question?\n", 245 | "\n", 246 | "Let’s try looking at the number of persons arrested and the corresponding numbers of acquittals and convictions.\n" 247 | ] 248 | }, 249 | { 250 | "cell_type": "code", 251 | "execution_count": null, 252 | "metadata": { 253 | "collapsed": false 254 | }, 255 | "outputs": [], 256 | "source": [ 257 | "conviction_data = Table.read_table(\"convictions_by_state.csv\")\n", 258 | "conviction_data" 259 | ] 260 | }, 261 | { 262 | "cell_type": "markdown", 263 | "metadata": {}, 264 | "source": [ 265 | "**A.** \n", 266 | "First, let's calculate the percentage of *arrested persons* who were *acquitted* and add this information to the conviction_data table as a new column called \"Percentage Acquitted\". Recall how you did this for the crime_data table." 267 | ] 268 | }, 269 | { 270 | "cell_type": "code", 271 | "execution_count": null, 272 | "metadata": { 273 | "collapsed": true 274 | }, 275 | "outputs": [], 276 | "source": [ 277 | "number_acquitted = \n", 278 | "number_arrested = \n", 279 | "percentage_acquitted = (_______ / _______) * ______\n", 280 | "conviction_data = conviction_data.with_column(\"__________\", ___________)\n", 281 | "conviction_data" 282 | ] 283 | }, 284 | { 285 | "cell_type": "markdown", 286 | "metadata": {}, 287 | "source": [ 288 | "**B.** Conduct your own calculations and make appropriate plots to try to answer the questions we have posed. 
Discuss your approach and potential results with your peers and instructor. We have left some blank space below for you to do this work.\n" 289 | ] 290 | }, 291 | { 292 | "cell_type": "code", 293 | "execution_count": null, 294 | "metadata": { 295 | "collapsed": true 296 | }, 297 | "outputs": [], 298 | "source": [] 299 | }, 300 | { 301 | "cell_type": "code", 302 | "execution_count": null, 303 | "metadata": { 304 | "collapsed": true 305 | }, 306 | "outputs": [], 307 | "source": [] 308 | }, 309 | { 310 | "cell_type": "markdown", 311 | "metadata": {}, 312 | "source": [ 313 | "*Caution! Do you think looking at the acquittal rate was a fair way to assess action being taken against offenders? What does an acquittal really mean? We must understand the difference between an acquittal and a false case - no matter the crime. An acquittal occurs when there is not enough proof against the accused - not necessarily when the accuser is determined to be a liar. In fact, the acquittal rate for all crimes is approximately the same. The rate of acquittal for attempted murder for example is 73.4%. That doesn’t mean there is a false murder case epidemic! For rape it is 72.9%. So perhaps we should not have looked at the acquittal rate data to assess the conviction rate. What other types of data do you think we could have looked at to help us?*\n" 314 | ] 315 | }, 316 | { 317 | "cell_type": "markdown", 318 | "metadata": {}, 319 | "source": [ 320 | "## Question 3: Literacy" 321 | ] 322 | }, 323 | { 324 | "cell_type": "markdown", 325 | "metadata": {}, 326 | "source": [ 327 | "Get the literacy data as a table from the raw literacy csv file." 
328 | ] 329 | }, 330 | { 331 | "cell_type": "code", 332 | "execution_count": null, 333 | "metadata": { 334 | "collapsed": false 335 | }, 336 | "outputs": [], 337 | "source": [ 338 | "literacy_data = Table.read_table(\"literacy_by_state.csv\")\n", 339 | "literacy_data" 340 | ] 341 | }, 342 | { 343 | "cell_type": "markdown", 344 | "metadata": {}, 345 | "source": [ 346 | "**A.** Add two columns to your `literacy_data` table called `Percentage Acquitted` and `Weighted Crime`. Use the `Percentage Acquitted` column from your `conviction_data` table and the `Weighted Crime` column from your `crime_data` table." 347 | ] 348 | }, 349 | { 350 | "cell_type": "code", 351 | "execution_count": null, 352 | "metadata": { 353 | "collapsed": false 354 | }, 355 | "outputs": [], 356 | "source": [ 357 | "literacy_data = \n", 358 | "literacy_data" 359 | ] 360 | }, 361 | { 362 | "cell_type": "markdown", 363 | "metadata": {}, 364 | "source": [ 365 | "**B.** Make a scatter plot comparing literacy rates in each state to the percentage of people acquitted. " 366 | ] 367 | }, 368 | { 369 | "cell_type": "code", 370 | "execution_count": null, 371 | "metadata": { 372 | "collapsed": true 373 | }, 374 | "outputs": [], 375 | "source": [ 376 | "literacy_data.scatter(\"_______\", \"___________\")" 377 | ] 378 | }, 379 | { 380 | "cell_type": "markdown", 381 | "metadata": {}, 382 | "source": [ 383 | "**C.** Calculate the correlation coefficient between these two factors. Use the `standard_units` function to help you." 
384 | ] 385 | }, 386 | { 387 | "cell_type": "code", 388 | "execution_count": null, 389 | "metadata": { 390 | "collapsed": true 391 | }, 392 | "outputs": [], 393 | "source": [ 394 | "#Calculating distance from the mean, dividing by standard deviation\n", 395 | "def standard_units(nums):\n", 396 | " return (nums - np.mean(nums))/np.std(nums)\n", 397 | "#Average of the product of x and y values in standard units, takes in a table and strings representing two column names.\n", 398 | "def correlation(tbl, col_1, col_2):\n", 399 | " return np.mean(standard_units(tbl.column(col_1)) * standard_units(tbl.column(col_2)))" 400 | ] 401 | }, 402 | { 403 | "cell_type": "code", 404 | "execution_count": null, 405 | "metadata": { 406 | "collapsed": true 407 | }, 408 | "outputs": [], 409 | "source": [ 410 | "correlation(literacy_data, \"____\", \"_____\")" 411 | ] 412 | }, 413 | { 414 | "cell_type": "markdown", 415 | "metadata": {}, 416 | "source": [ 417 | "**D.** What results did you find? Are they correlated? Is it a strong correlation? Even if they are correlated, does that say anything about causation?\n" 418 | ] 419 | }, 420 | { 421 | "cell_type": "markdown", 422 | "metadata": {}, 423 | "source": [ 424 | "**Your Answer Here**" 425 | ] 426 | }, 427 | { 428 | "cell_type": "markdown", 429 | "metadata": {}, 430 | "source": [ 431 | "**E.** Bonus– Plot predictions generated by the regression line (Use the `minimize` function for least-squares).\n" 432 | ] 433 | }, 434 | { 435 | "cell_type": "markdown", 436 | "metadata": {}, 437 | "source": [ 438 | "## Question 4: Qualitative Analysis" 439 | ] 440 | }, 441 | { 442 | "cell_type": "markdown", 443 | "metadata": {}, 444 | "source": [ 445 | "Discuss these questions with your peers and the instructor." 446 | ] 447 | }, 448 | { 449 | "cell_type": "markdown", 450 | "metadata": {}, 451 | "source": [ 452 | "**A.** What can you take away from the calculations and visualizations you have made today?" 
453 | ] 454 | }, 455 | { 456 | "cell_type": "markdown", 457 | "metadata": {}, 458 | "source": [ 459 | "**Your Answer**" 460 | ] 461 | }, 462 | { 463 | "cell_type": "markdown", 464 | "metadata": {}, 465 | "source": [ 466 | "**B.** Where do you think there are potential holes in the data and in the calculations we have made? \n" 467 | ] 468 | }, 469 | { 470 | "cell_type": "markdown", 471 | "metadata": {}, 472 | "source": [ 473 | "**Your Answer**" 474 | ] 475 | }, 476 | { 477 | "cell_type": "markdown", 478 | "metadata": {}, 479 | "source": [ 480 | "**C.** According to *The Hindu* newspaper, marital and other rape is grossly underreported in India.\n" 481 | ] 482 | }, 483 | { 484 | "cell_type": "markdown", 485 | "metadata": {}, 486 | "source": [ 487 | "" 488 | ] 489 | }, 490 | { 491 | "cell_type": "markdown", 492 | "metadata": {}, 493 | "source": [ 494 | "Why would there be motivation to keep the reported crime rate low? Who is held accountable for these crime rates? Politicians, police, citizens?\n" 495 | ] 496 | }, 497 | { 498 | "cell_type": "markdown", 499 | "metadata": {}, 500 | "source": [ 501 | "**Your Answer**" 502 | ] 503 | }, 504 | { 505 | "cell_type": "markdown", 506 | "metadata": {}, 507 | "source": [ 508 | "## Conclusion" 509 | ] 510 | }, 511 | { 512 | "cell_type": "markdown", 513 | "metadata": {}, 514 | "source": [ 515 | "Today, we were able to see one of the many uses of data in today's world and to connect our knowledge of data science with an important issue in society. You'll be able to use the material from the course *anywhere* and *anytime* in life, and we hope you learned a lot about how to look at the world and use numbers to describe it, computers to process it, and your brain to draw conclusions!" 
516 | ] 517 | } 518 | ], 519 | "metadata": { 520 | "kernelspec": { 521 | "display_name": "Python 3", 522 | "language": "python", 523 | "name": "python3" 524 | }, 525 | "language_info": { 526 | "codemirror_mode": { 527 | "name": "ipython", 528 | "version": 3 529 | }, 530 | "file_extension": ".py", 531 | "mimetype": "text/x-python", 532 | "name": "python", 533 | "nbconvert_exporter": "python", 534 | "pygments_lexer": "ipython3", 535 | "version": "3.6.0" 536 | } 537 | }, 538 | "nbformat": 4, 539 | "nbformat_minor": 2 540 | } 541 | -------------------------------------------------------------------------------- /Project/convictions_by_state.csv: -------------------------------------------------------------------------------- 1 | STATE/UT,Persons Arrested,Persons Chargesheeted,Persons Convicted,Persons Acquitted 2 | A & N ISLANDS,73,73,5,68 3 | ANDHRA PRADESH,39288,39191,3527,35761 4 | ARUNACHAL PRADESH,202,130,24,178 5 | ASSAM,12346,7694,637,11709 6 | BIHAR,20147,19282,1317,18830 7 | CHANDIGARH,268,265,38,230 8 | CHHATTISGARH,6594,6566,1605,4989 9 | D & N HAVELI,30,38,4,26 10 | DAMAN & DIU,45,54,1,44 11 | DELHI,3981,3397,1771,2210 12 | GOA,286,127,7,279 13 | GUJARAT,23965,23525,434,23531 14 | HARYANA,7264,7429,1266,5998 15 | HIMACHAL PRADESH,1325,1317,107,1218 16 | JAMMU & KASHMIR,5204,5203,338,4866 17 | JHARKHAND,6549,5720,1152,5397 18 | KARNATAKA,16680,15849,859,15821 19 | KERALA,13517,13187,862,12655 20 | LAKSHADWEEP,1,0,0,1 21 | MADHYA PRADESH,29247,29234,5529,23718 22 | MAHARASHTRA,41048,39535,1047,40001 23 | MANIPUR,202,28,0,202 24 | MEGHALAYA,271,160,9,262 25 | MIZORAM,215,185,118,97 26 | NAGALAND,75,69,58,17 27 | ODISHA,17183,17142,974,16209 28 | PUDUCHERRY,110,103,26,84 29 | PUNJAB,5048,3439,904,4144 30 | RAJASTHAN,17095,17087,4582,12513 31 | SIKKIM,69,47,35,34 32 | TAMIL NADU,10913,9393,2046,8867 33 | TRIPURA,1946,2088,349,1597 34 | UTTAR PRADESH,77745,43775,12971,64774 35 | UTTARAKHAND,1420,1343,813,607 36 | WEST BENGAL,34023,33694,915,33108 37 | 
-------------------------------------------------------------------------------- /Project/crime_by_state.csv: -------------------------------------------------------------------------------- 1 | STATE/UT,Total Male,Total Female,Grand Total,Total population 2 | ANDHRA PRADESH,64916,13660,78576,1015986396 3 | UTTAR PRADESH,70157,7588,77745,2195396247 4 | MAHARASHTRA,29897,11151,41048,1236102692 5 | WEST BENGAL,25332,8691,34023,1004825096 6 | MADHYA PRADESH,25227,4020,29247,798573215 7 | GUJARAT,18026,5939,23965,664219908 8 | BIHAR,17420,2727,20147,1141851007 9 | ODISHA,14719,2464,17183,461420938 10 | RAJASTHAN,14795,2300,17095,754831132 11 | KARNATAKA,12946,3734,16680,672437744 12 | KERALA,11740,1777,13517,367264447 13 | ASSAM,12223,123,12346,342861992 14 | TAMIL NADU,8732,2181,10913,793528538 15 | HARYANA,6268,996,7264,278883891 16 | CHHATTISGARH,5742,852,6594,280942156 17 | JHARKHAND,5808,741,6549,362628618 18 | JAMMU & KASHMIR,4679,525,5204,138038186 19 | PUNJAB,4023,1025,5048,304746596 20 | NCT OF DELHI,3657,324,3981,184285585 21 | TRIPURA,1656,290,1946,40381352 22 | UTTARAKHAND,1309,111,1420,111284272 23 | HIMACHAL PRADESH,1064,261,1325,75421599 24 | GOA,234,52,286,16034953 25 | MEGHALAYA,252,19,271,32604077 26 | CHANDIGARH,257,11,268,11601546 27 | MIZORAM,214,1,215,12001154 28 | ARUNACHAL PRADESH,202,0,202,15208721 29 | MANIPUR,181,21,202,29939316 30 | PUDUCHERRY,93,17,110,13689104 31 | NAGALAND,60,15,75,21786622 32 | ANDAMAN & NICOBAR ISLANDS,60,13,73,4179384 33 | SIKKIM,66,3,69,6684568 34 | DAMAN & DIU,26,19,45,2672021 35 | DADRA & NAGAR HAVELI,24,6,30,3771383 36 | LAKSHADWEEP,1,0,1,708719 37 | -------------------------------------------------------------------------------- /Project/finalproject.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | 
"source": [ 14 | "# Final Project: Crime in India" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "Congratulations on reaching the final project stage of DSI! As your final project today, we want you to look at data from India about crimes committed against women. Safety of women is a critical issue across India right now and we want you to take a data-centric approach to begin unpacking this topic. \n" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": null, 27 | "metadata": { 28 | "collapsed": true 29 | }, 30 | "outputs": [], 31 | "source": [ 32 | "import matplotlib\n", 33 | "matplotlib.use('Agg')\n", 34 | "from datascience import Table, predicates\n", 35 | "%matplotlib inline\n", 36 | "import matplotlib.pyplot as plt\n", 37 | "import numpy as np\n", 38 | "plt.style.use('fivethirtyeight')" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": {}, 44 | "source": [ 45 | "First, let's make a table of crimes committed in each state in 2012." 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": null, 51 | "metadata": { 52 | "collapsed": false 53 | }, 54 | "outputs": [], 55 | "source": [ 56 | "crime_data = Table.read_table('crime_by_state.csv')\n", 57 | "crime_data" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "metadata": {}, 63 | "source": [ 64 | "## Question 1: Warm-Up" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "**A.** How many rows are there in this table? Write a line of code that tells us the number of rows in the table. " 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": null, 77 | "metadata": { 78 | "collapsed": true 79 | }, 80 | "outputs": [], 81 | "source": [] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "metadata": {}, 86 | "source": [ 87 | "**B.** Find the population of all the states reported." 
88 | ] 89 | }, 90 | { 91 | "cell_type": "code", 92 | "execution_count": null, 93 | "metadata": { 94 | "collapsed": true 95 | }, 96 | "outputs": [], 97 | "source": [ 98 | "total_pop = sum(crime_data.____(_____))\n", 99 | "total_pop" 100 | ] 101 | }, 102 | { 103 | "cell_type": "markdown", 104 | "metadata": {}, 105 | "source": [ 106 | "**C.** Find the total number of arrests in 2012." 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": null, 112 | "metadata": { 113 | "collapsed": true 114 | }, 115 | "outputs": [], 116 | "source": [ 117 | "total_arrests = \n", 118 | "total_arrests" 119 | ] 120 | }, 121 | { 122 | "cell_type": "markdown", 123 | "metadata": {}, 124 | "source": [ 125 | "**D.** Calculate the total number of arrests per hundred thousand people in 2012." 126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "execution_count": null, 131 | "metadata": { 132 | "collapsed": true 133 | }, 134 | "outputs": [], 135 | "source": [ 136 | "100000 * _____ / ______" 137 | ] 138 | }, 139 | { 140 | "cell_type": "markdown", 141 | "metadata": {}, 142 | "source": [ 143 | "**E.** Do you think it’s a high fraction of the population? Consider the entire population of India. What would this rate imply if it were applied to the whole country?\n" 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "metadata": {}, 149 | "source": [ 150 | "**Answer Here** (double click to edit)" 151 | ] 152 | }, 153 | { 154 | "cell_type": "markdown", 155 | "metadata": {}, 156 | "source": [ 157 | "## Question 2: Visualizing our Data" 158 | ] 159 | }, 160 | { 161 | "cell_type": "markdown", 162 | "metadata": {}, 163 | "source": [ 164 | "**A.** Before we do this, we need to add one column to our table. 
" 165 | ] 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "metadata": {}, 170 | "source": [ 171 | "Calculate the weighted crime by state (crimes per 100,000) using two columns from the table, and add this information as a new column to the crime_data table as a column called `\"Weighted Crime\"`." 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": null, 177 | "metadata": { 178 | "collapsed": false 179 | }, 180 | "outputs": [], 181 | "source": [ 182 | "total_crime_by_state = \n", 183 | "population_by_state = \n", 184 | "weighted_crime_by_state = np.divide(100000.0 * total_crime_by_state, population_by_state)\n", 185 | "crime_data = crime_data.with_column(\"________\", _________)\n", 186 | "crime_data" 187 | ] 188 | }, 189 | { 190 | "cell_type": "markdown", 191 | "metadata": {}, 192 | "source": [ 193 | "**B.** Make a plot of the total male count and total female count of arrested peoples by state below. \n", 194 | "\n", 195 | "Hint: Put the names of the columns representing male and female arrests in the array." 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": null, 201 | "metadata": { 202 | "collapsed": false 203 | }, 204 | "outputs": [], 205 | "source": [ 206 | "crime_data.____(\"_____\", np.array([\"____\", \"______\"]))" 207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "**C.** Now we want to look at the *normalized* crime per state (per 100,000 people). This will let us look at how the states rank in terms of crimes relative to their populations. Make this plot below.\n" 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": null, 219 | "metadata": { 220 | "collapsed": true 221 | }, 222 | "outputs": [], 223 | "source": [] 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "metadata": {}, 228 | "source": [ 229 | "**D.** Do the states rank similarly? 
Do they seem to follow a similar pattern in terms of number of crimes?\n", 230 | "\n" 231 | ] 232 | }, 233 | { 234 | "cell_type": "markdown", 235 | "metadata": {}, 236 | "source": [ 237 | "## Question 3: Convictions" 238 | ] 239 | }, 240 | { 241 | "cell_type": "markdown", 242 | "metadata": {}, 243 | "source": [ 244 | "Now that we have looked at some raw crime data, let’s think about its consequences. Is any action being taken against offenders? What kind of data do you think you could look at that would help you answer this question?\n", 245 | "\n", 246 | "Let’s try looking at the number of persons convicted and corresponding numbers of acquittals and convictions.\n" 247 | ] 248 | }, 249 | { 250 | "cell_type": "code", 251 | "execution_count": null, 252 | "metadata": { 253 | "collapsed": false 254 | }, 255 | "outputs": [], 256 | "source": [ 257 | "conviction_data = Table.read_table(\"convictions_by_state.csv\")\n", 258 | "conviction_data" 259 | ] 260 | }, 261 | { 262 | "cell_type": "markdown", 263 | "metadata": {}, 264 | "source": [ 265 | "**A.** \n", 266 | "First, let's calculate the percentage of *arrests* that ended in *acquittal* and add it to the conviction_data table as a new column called \"Percentage Acquitted\". Recall how you did this for the crime_data table." 267 | ] 268 | }, 269 | { 270 | "cell_type": "code", 271 | "execution_count": null, 272 | "metadata": { 273 | "collapsed": true 274 | }, 275 | "outputs": [], 276 | "source": [ 277 | "number_acquitted = \n", 278 | "number_arrested = \n", 279 | "percentage_acquitted = (_______ / _______) * ______\n", 280 | "conviction_data = conviction_data.with_column(\"__________\", ___________)\n", 281 | "conviction_data" 282 | ] 283 | }, 284 | { 285 | "cell_type": "markdown", 286 | "metadata": {}, 287 | "source": [ 288 | "**B.** Conduct your own calculations and make appropriate plots to try to answer the questions we have posed. 
Discuss your approach and potential results with your peers and instructor. We have left some blank space below for you to do this work.\n" 289 | ] 290 | }, 291 | { 292 | "cell_type": "code", 293 | "execution_count": null, 294 | "metadata": { 295 | "collapsed": true 296 | }, 297 | "outputs": [], 298 | "source": [] 299 | }, 300 | { 301 | "cell_type": "code", 302 | "execution_count": null, 303 | "metadata": { 304 | "collapsed": true 305 | }, 306 | "outputs": [], 307 | "source": [] 308 | }, 309 | { 310 | "cell_type": "markdown", 311 | "metadata": {}, 312 | "source": [ 313 | "*Caution! Do you think looking at the acquittal rate was a fair way to assess action being taken against offenders? What does an acquittal really mean? We must understand the difference between an acquittal and a false case - no matter the crime. An acquittal occurs when there is not enough proof against the accused - not necessarily when the accuser is determined to be a liar. In fact, the acquittal rate for all crimes is approximately the same. The rate of acquittal for attempted murder for example is 73.4%. That doesn’t mean there is a false murder case epidemic! For rape it is 72.9%. So perhaps we should not have looked at the acquittal rate data to assess the conviction rate. What other types of data do you think we could have looked at to help us?*\n" 314 | ] 315 | }, 316 | { 317 | "cell_type": "markdown", 318 | "metadata": {}, 319 | "source": [ 320 | "## Question 3: Literacy" 321 | ] 322 | }, 323 | { 324 | "cell_type": "markdown", 325 | "metadata": {}, 326 | "source": [ 327 | "Get the literacy data as a table from the raw literacy csv file." 
328 | ] 329 | }, 330 | { 331 | "cell_type": "code", 332 | "execution_count": null, 333 | "metadata": { 334 | "collapsed": false 335 | }, 336 | "outputs": [], 337 | "source": [ 338 | "literacy_data = Table.read_table(\"literacy_by_state.csv\")\n", 339 | "literacy_data" 340 | ] 341 | }, 342 | { 343 | "cell_type": "markdown", 344 | "metadata": {}, 345 | "source": [ 346 | "**A.** Add two columns to your `literacy_data` table called `Percentage Acquitted` and `Weighted Crime`. Use the `Percentage Acquitted` column from your `conviction_data` table and the `Weighted Crime` column from your `crime_data` table." 347 | ] 348 | }, 349 | { 350 | "cell_type": "code", 351 | "execution_count": null, 352 | "metadata": { 353 | "collapsed": false 354 | }, 355 | "outputs": [], 356 | "source": [ 357 | "literacy_data = \n", 358 | "literacy_data" 359 | ] 360 | }, 361 | { 362 | "cell_type": "markdown", 363 | "metadata": {}, 364 | "source": [ 365 | "**B.** Make a scatter plot comparing literacy rates in each state to the percentage of people acquitted. " 366 | ] 367 | }, 368 | { 369 | "cell_type": "code", 370 | "execution_count": null, 371 | "metadata": { 372 | "collapsed": true 373 | }, 374 | "outputs": [], 375 | "source": [ 376 | "literacy_data.scatter(\"_______\", \"___________\")" 377 | ] 378 | }, 379 | { 380 | "cell_type": "markdown", 381 | "metadata": {}, 382 | "source": [ 383 | "**C.** Calculate the correlation coefficient between these two factors. Use the `standard_units` function to help you." 
384 | ] 385 | }, 386 | { 387 | "cell_type": "code", 388 | "execution_count": null, 389 | "metadata": { 390 | "collapsed": true 391 | }, 392 | "outputs": [], 393 | "source": [ 394 | "#Calculating distance from the mean, dividing by standard deviation\n", 395 | "def standard_units(nums):\n", 396 | " return (nums - np.mean(nums))/np.std(nums)\n", 397 | "#Average of the product of x and y values in standard units, takes in a table and strings representing two column names.\n", 398 | "def correlation(tbl, col_1, col_2):\n", 399 | " return np.mean(standard_units(tbl.column(col_1)) * standard_units(tbl.column(col_2)))" 400 | ] 401 | }, 402 | { 403 | "cell_type": "code", 404 | "execution_count": null, 405 | "metadata": { 406 | "collapsed": true 407 | }, 408 | "outputs": [], 409 | "source": [ 410 | "correlation(literacy_data, \"____\", \"_____\")" 411 | ] 412 | }, 413 | { 414 | "cell_type": "markdown", 415 | "metadata": {}, 416 | "source": [ 417 | "**D.** What results did you find? Are they correlated? Is it a strong correlation? Even if they are correlated, does that say anything about causation?\n" 418 | ] 419 | }, 420 | { 421 | "cell_type": "markdown", 422 | "metadata": {}, 423 | "source": [ 424 | "**Your Answer Here**" 425 | ] 426 | }, 427 | { 428 | "cell_type": "markdown", 429 | "metadata": {}, 430 | "source": [ 431 | "**E.** Bonus– Plot predictions generated by the regression line (Use the `minimize` function for least-squares).\n" 432 | ] 433 | }, 434 | { 435 | "cell_type": "markdown", 436 | "metadata": {}, 437 | "source": [ 438 | "## Question 4: Qualitative Analysis" 439 | ] 440 | }, 441 | { 442 | "cell_type": "markdown", 443 | "metadata": {}, 444 | "source": [ 445 | "Discuss these questions with your peers and the instructor." 446 | ] 447 | }, 448 | { 449 | "cell_type": "markdown", 450 | "metadata": {}, 451 | "source": [ 452 | "**A.** What can you take away from the calculations and visualizations you have made today?" 
453 | ] 454 | }, 455 | { 456 | "cell_type": "markdown", 457 | "metadata": {}, 458 | "source": [ 459 | "**Your Answer**" 460 | ] 461 | }, 462 | { 463 | "cell_type": "markdown", 464 | "metadata": {}, 465 | "source": [ 466 | "**B.** Where do you think there are potential holes in the data and in the calculations we have made? \n" 467 | ] 468 | }, 469 | { 470 | "cell_type": "markdown", 471 | "metadata": {}, 472 | "source": [ 473 | "**Your Answer**" 474 | ] 475 | }, 476 | { 477 | "cell_type": "markdown", 478 | "metadata": {}, 479 | "source": [ 480 | "**C.** According to *The Hindu* newspaper, marital and other rape is grossly underreported in India.\n" 481 | ] 482 | }, 483 | { 484 | "cell_type": "markdown", 485 | "metadata": {}, 486 | "source": [ 487 | "" 488 | ] 489 | }, 490 | { 491 | "cell_type": "markdown", 492 | "metadata": {}, 493 | "source": [ 494 | "Why would there be motivation to keep the reported crime rate low? Who is held accountable for these crime rates? Politicians, police, citizens?\n" 495 | ] 496 | }, 497 | { 498 | "cell_type": "markdown", 499 | "metadata": {}, 500 | "source": [ 501 | "**Your Answer**" 502 | ] 503 | }, 504 | { 505 | "cell_type": "markdown", 506 | "metadata": {}, 507 | "source": [ 508 | "## Conclusion" 509 | ] 510 | }, 511 | { 512 | "cell_type": "markdown", 513 | "metadata": {}, 514 | "source": [ 515 | "Today, we saw one of the many uses of data in today's world and connected our knowledge of data science with an important issue in society. You'll be able to use the material from the course *anywhere* and *anytime* in life, and we hope you learned a lot about how to look at the world and use numbers to describe it, computers to process it, and your brain to draw conclusions!" 
516 | ] 517 | } 518 | ], 519 | "metadata": { 520 | "kernelspec": { 521 | "display_name": "Python 3", 522 | "language": "python", 523 | "name": "python3" 524 | }, 525 | "language_info": { 526 | "codemirror_mode": { 527 | "name": "ipython", 528 | "version": 3 529 | }, 530 | "file_extension": ".py", 531 | "mimetype": "text/x-python", 532 | "name": "python", 533 | "nbconvert_exporter": "python", 534 | "pygments_lexer": "ipython3", 535 | "version": "3.6.0" 536 | } 537 | }, 538 | "nbformat": 4, 539 | "nbformat_minor": 2 540 | } 541 | -------------------------------------------------------------------------------- /Project/hindu_img.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dsindia/notebooks/2bf4ada929274e0472edd1f896659364b353d184/Project/hindu_img.png -------------------------------------------------------------------------------- /Project/literacy_by_state.csv: -------------------------------------------------------------------------------- 1 | State,Literacy Rate 2 | Andaman & Nicobar Islands,86.3 3 | Andhra Pradesh,67.89 4 | Arunachal Pradesh,67 5 | Assam,73.2 6 | Bihar,63.8 7 | Chandigarh,86.4 8 | Chhattisgarh,71 9 | Dadra & Nagar Haveli,77.7 10 | Daman & Diu,87.1 11 | Delhi,86.3 12 | Goa,87.4 13 | Gujarat,77.3 14 | Haryana,76.6 15 | Himachal Pradesh,83.8 16 | Jammu and Kashmir,68.7 17 | Jharkhand,67.6 18 | Karnataka,75.6 19 | Kerala,93.91 20 | Lakshadweep,92.3 21 | Madhya Pradesh,70.6 22 | Maharashtra,80.1 23 | Manipur,79.8 24 | Meghalaya,75.5 25 | Mizoram,91.6 26 | Nagaland,82.9 27 | Odisha,73.45 28 | Puducherry,86.5 29 | Punjab,79.7 30 | Rajasthan,67.1 31 | Sikkim,82.2 32 | Tamil Nadu,80.3 33 | Tripura,87.8 34 | Uttar Pradesh,71.7 35 | Uttarakhand,79.6 36 | West Bengal,77.1 -------------------------------------------------------------------------------- /case_study/.DS_Store: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/dsindia/notebooks/2bf4ada929274e0472edd1f896659364b353d184/case_study/.DS_Store -------------------------------------------------------------------------------- /case_study/city_gdp.csv: -------------------------------------------------------------------------------- 1 | Rank,City,State or union territory,"GDP per capita (nominal)","GDP per capita (PPP)" 2 | 1,New Delhi,National Capital Territory of Delhi,"$3,580 ","$12,747 " 3 | 2,Mumbai,Maharashtra,"$1,990 ","$7,005 " 4 | 3,Chennai,Tamil Nadu,"$1,870 ","$6,469 " 5 | 4,Hyderabad,Telangana,"$1,430 ","$5,063 " 6 | 5,Bangalore,Karnataka,"$1,420 ","$5,051 " 7 | 6,Kolkata,West Bengal,"$1,110 ","$4,036 " -------------------------------------------------------------------------------- /case_study/cost_of_living.csv: -------------------------------------------------------------------------------- 1 | Rank,City,Cost of Living Index,Rent Index,Cost of Living Plus Rent Index,Groceries Index,Restaurant Price Index,Local Purchasing Power Index 2 | 1,Ahmedabad,29.26,6.61,18.18,31.59,20.84,55.58 3 | 2,Bangalore,30.64,9.68,20.39,33.42,19.5,85.76 4 | 3,Bhubaneswar,25.91,4.66,15.52,29.79,14.91,51.16 5 | 4,Chandigarh,29.65,6.41,18.29,31.42,20.38,63.36 6 | 5,Chennai,29,7.34,18.41,31.48,19.21,68.68 7 | 6,Coimbatore,25.12,5.73,15.64,26.61,15.49,46.49 8 | 7,New Delhi,33.1,9.97,21.79,33.04,26.02,68.13 9 | 8,Goa,28.93,7.65,18.53,31.22,21.98,52.73 10 | 9,Gurgaon,36.29,10.53,23.69,35.6,32.21,94.28 11 | 10,Hyderabad,27.16,6.49,17.05,29.35,17.82,65.45 12 | 11,Indore,27.82,4.48,16.41,27.7,18.18,43.68 13 | 12,Jaipur,28.46,4.72,16.85,30.22,16.91,65.7 14 | 13,Kochi,25.99,6.45,16.44,28.94,14.98,55.8 15 | 14,Kolkata,28.48,6.92,17.94,30.05,23.36,46.81 16 | 15,Lucknow,28.45,4.72,16.85,29.48,18.27,62.69 17 | 16,Mangalore,24.2,5.82,15.21,25.67,13.22,91.24 18 | 17,Mumbai,33.25,23.72,28.59,35.84,23.96,63.7 19 | 18,Mysore,26.48,4.57,15.77,32.2,13.13,35.57 20 | 19,Nagpur,27.9,5.06,16.73,29.69,19.61,79.41 21 | 20,Navi 
Mumbai,29.94,9.97,20.18,32.22,20.8,72.11 22 | 21,Noida,33.11,7.02,20.35,34.77,22.69,82.21 23 | 22,Pune,30.87,7.89,19.64,32.92,22.1,85.31 24 | 23,Surat,28.28,4.61,16.71,29.5,20.91,54.56 25 | 24,Thane,30.34,8.73,19.78,31.99,20.49,58.83 26 | 25,Thiruvananthapuram,22.01,5.31,13.84,23.96,12.28,62.01 27 | 26,Vadodara,27.79,3.95,16.14,32.19,16.52,67.02 28 | 27,Visakhapatnam,25.99,5.27,15.86,28.85,17.14,52.77 -------------------------------------------------------------------------------- /case_study/housing_price_index.csv: -------------------------------------------------------------------------------- 1 | Particulars, 06-2011, 09-2011, 12-2011, 03-2012, 06-2012, 09-2012, 12-2012, 03-2013, 06-2013, 09-2013 2 | All India,116.00,119.40,125.50,134.10,142.60,147.10,157.00,160.80,162.30,169.20 3 | Ahmedabad,121.30,130.40,137.10,141.00,140.80,146.40,150.60,155.00,161.90,171.70 4 | Bangalore,110.70,107.80,138.60,133.30,133.30,136.60,141.20,141.90,142.30,150.40 5 | Chennai,101.20,110.40,110.70,108.20,119.20,117.80,137.60,137.40,138.30,150.00 6 | Delhi,126.80,124.80,136.70,158.20,177.30,183.20,200.70,213.10,214.80,215.70 7 | Kolkata,103.00,105.00,103.20,106.10,135.20,149.10,162.50,169.40,171.80,173.50 8 | Mumbai,122.10,131.40,122.80,143.50,147.60,148.10,158.90,159.50,160.00,169.20 9 | -------------------------------------------------------------------------------- /dsi.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dsindia/notebooks/2bf4ada929274e0472edd1f896659364b353d184/dsi.png -------------------------------------------------------------------------------- /lesson1/diseases.csv: -------------------------------------------------------------------------------- 1 | Year,State/UTs,Acute Diarrhoeal Diseases - Cases,Acute Diarrhoeal Diseases - Deaths,Malaria - Cases,Malaria - Deaths,Acute Respiaratory Infection - Cases,Acute Respiaratory Infection - Deaths,Japanese Encephalitis - Cases,Japanese Encephalitis - 
Deaths,Viral Hepatitis - Cases,Viral Hepatitis - Deaths 2 | 2011(P),GRAND TOTAL,10231049,1269,1278760,463,26300208,2492,8249,1169,94402,520 3 | 2011(P),Andhra Pradesh,2235614,107,39559,5,3089290,236,73,1,11050,61 4 | 2011(P),Arunachal Pradesh,32228,11,10961,NA,48602,9,NULL,NULL,636,4 5 | 2011(P),Assam,96816,16,47397,42,314824,NA,1319,250,2557,25 6 | 2011(P),Bihar,130276,NULL,2390,0,87486,NULL,821,197,202,NULL 7 | 2011(P),Chhattisgarh,64575,5,131179,18,155743,18,NULL,NULL,139,1 8 | 2011(P),Delhi,102983,62,413,NA,198541,102,9,NULL,8347,68 9 | 2011(P),Goa,15146,2,1231,1,61029,6,91,1,118,NA 10 | 2011(P),Gujarat,367450,0,86005,15,604076,NA,NULL,NULL,4328,NA 11 | 2011(P),Haryana,224223,21,33345,1,1275035,48,90,14,2557,2 12 | 2011(P),Himachal Pradesh,310227,51,247,NA,1484149,154,NULL,NULL,1248,10 13 | 2011(P),Jammu & Kashmir,544711,0,1031,NA,528409,6,NULL,NULL,5129,2 14 | 2011(P),Jharkhand,98258,1,152061,16,205496,5,303,19,384,2 15 | 2011(P),Karnataka,591989,49,24487,0,1629997,182,397,0,6049,8 16 | 2011(P),Kerala,260938,0,1339,2,5034506,128,88,6,5336,7 17 | 2011(P),Madhya Pradesh,290705,92,89304,71,578783,182,35,0,3851,12 18 | 2011(P),Maharashtra,507046,4,96632,114,571947,28,NULL,9,5994,30 19 | 2011(P),Manipur,17605,39,714,0,25441,55,11,0,229,NA 20 | 2011(P),Meghalaya,148801,20,24507,47,295146,5,NULL,NULL,87,3 21 | 2011(P),Mizoram,16192,11,8849,26,26817,33,NULL,NULL,812,14 22 | 2011(P),Nagaland,30458,1,3363,2,48566,NA,44,6,64,NA 23 | 2011(P),Orissa,632493,143,294759,73,1372208,269,NULL,NULL,3272,89 24 | 2011(P),Punjab,190022,15,2693,NA,656544,10,NULL,NULL,5041,12 25 | 2011(P),Rajasthan,227571,7,46457,5,1089640,62,NULL,NULL,967,0 26 | 2011(P),Sikkim,44094,2,51,NA,92736,12,NULL,NULL,484,0 27 | 2011(P),Tamil Nadu,210074,24,22139,0,2410214,22,762,29,5940,0 28 | 2011(P),Tripura,109777,83,14295,9,160438,135,NULL,NULL,404,0 29 | 2011(P),Uttar Pradesh,554770,185,56438,NA,1183992,196,3492,579,7749,28 30 | 2011(P),Uttarakhand,79643,26,1162,2,130283,56,0,NA,3143,19 31 | 2011(P),West 
Bengal,1854651,288,66465,14,1991660,528,714,58,5480,105 32 | 2011(P),A & N Islands,19679,0,5939,NA,69151,3,0,NULL,208,5 33 | 2011(P),Chandigarh,42615,NULL,582,NA,49649,NULL,NULL,NULL,1309,NULL 34 | 2011(P),D & N Haveli,81322,1,12331,NA,104447,NA,NULL,NULL,269,0 35 | 2011(P),Daman & Diu,12638,NA,268,NA,42350,NA,NULL,NULL,484,NA 36 | 2011(P),Lakshadweep,4693,NA,15,NA,28129,NA,NULL,NULL,15,1 37 | 2011(P),Puducherry,80766,3,152,NA,654884,2,NULL,NULL,520,12 -------------------------------------------------------------------------------- /lesson1/pew_population_projection.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dsindia/notebooks/2bf4ada929274e0472edd1f896659364b353d184/lesson1/pew_population_projection.png -------------------------------------------------------------------------------- /lesson2/.ipynb_checkpoints/lesson2-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "deletable": true, 7 | "editable": true 8 | }, 9 | "source": [ 10 | "" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "metadata": { 16 | "deletable": true, 17 | "editable": true 18 | }, 19 | "source": [ 20 | "## Lesson 2: Introduction to Statistics and Python Programming" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": { 26 | "collapsed": false, 27 | "deletable": true, 28 | "editable": true 29 | }, 30 | "source": [ 31 | "In the last lesson, we saw some cool ways in which computers can do amazing things with data. Today, we're going to be learning a little bit about Python, a computer programming language which is used by people of various professions including scientists, business people, and engineers. We will be using it to practice what we learned earlier in the session." 
32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": { 37 | "deletable": true, 38 | "editable": true 39 | }, 40 | "source": [ 41 | "**Computer Programming and Programming Languages**\n", 42 | "\n", 43 | "One of the reasons why computers are useful is that they can store and run *programs*, a set of instructions that tells a computer what to do. In fact, you're running a computer program, called a Jupyter Notebook, right now! It's a program that lets you write your own programs. How cool is that?\n", 44 | "\n", 45 | "How can you tell a computer what to do? It doesn't understand Hindi, English, or any of the languages that we speak. This is where programming languages come in. They are the medium through which the computer understands what you want it to do. Think of them as the step between what you want to do and how you're going to tell the computer to do it. You may have heard of languages like Java and C++. Python is just like these. They may look different, but you can write programs to do the same things in different languages, just like we have different words in different spoken languages, for the same things.\n" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": { 51 | "deletable": true, 52 | "editable": true 53 | }, 54 | "source": [ 55 | "**Data Types and Variables in Python**" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": { 61 | "deletable": true, 62 | "editable": true 63 | }, 64 | "source": [ 65 | "Python can handle several different types of data. The most important ones that we'll be talking about are *integers*, *doubles*, *booleans* and *strings*.\n", 66 | "\n", 67 | "**integers** are whole numbers (without decimals) like 0, 1, 2 ...\n", 68 | "\n", 69 | "**doubles** are decimal numbers like 1.234, 56.8, 9.5. In Python, they are termed *float*.\n", 70 | "\n", 71 | "**booleans** are special values of only two types: either *True* or *False*. 
Some other values behave like booleans when they are tested: 0 and *None* count as False, while 1 counts as True. In general, every number other than 0 counts as a 'true' value. \n", 72 | "\n", 73 | "**strings** are anything placed between quotation marks like these: \" \". For example, \"hello\", \"data\"." 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": null, 79 | "metadata": { 80 | "collapsed": false, 81 | "deletable": true, 82 | "editable": true 83 | }, 84 | "outputs": [], 85 | "source": [ 86 | "type(212412)" 87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "execution_count": null, 92 | "metadata": { 93 | "collapsed": false, 94 | "deletable": true, 95 | "editable": true 96 | }, 97 | "outputs": [], 98 | "source": [ 99 | "type(9.9999)" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": null, 105 | "metadata": { 106 | "collapsed": false, 107 | "deletable": true, 108 | "editable": true 109 | }, 110 | "outputs": [], 111 | "source": [ 112 | "type(True)" 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": null, 118 | "metadata": { 119 | "collapsed": false, 120 | "deletable": true, 121 | "editable": true 122 | }, 123 | "outputs": [], 124 | "source": [ 125 | "type(\"data science\")" 126 | ] 127 | }, 128 | { 129 | "cell_type": "markdown", 130 | "metadata": { 131 | "collapsed": true, 132 | "deletable": true, 133 | "editable": true 134 | }, 135 | "source": [ 136 | "It is a tradition in programming to print the string \"Hello World!\" as your first program. Luckily, Python makes it easy for us to do this. If you ever want to print something, write print( ), with whatever you want in between the parentheses. Try printing \"Hello World!\" below." 
137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": null, 142 | "metadata": { 143 | "collapsed": false, 144 | "deletable": true, 145 | "editable": true 146 | }, 147 | "outputs": [], 148 | "source": [ 149 | "print(\"Hello World!\"\n", 150 | " )" 151 | ] 152 | }, 153 | { 154 | "cell_type": "markdown", 155 | "metadata": { 156 | "collapsed": false, 157 | "deletable": true, 158 | "editable": true 159 | }, 160 | "source": [ 161 | "You've heard of *variables* in math class before, but they are also an essential part of programming. They allow you to store a value, like a word or number, in the computer's *memory,* and also give it a name, so you can use this value later on in your program. You can set a variable with the help of an equal sign ( = ). The variable name comes before the equal sign and its value comes after. In the example below, we are setting the variable *pi* to be equal to 3.142. " 162 | ] 163 | }, 164 | { 165 | "cell_type": "code", 166 | "execution_count": null, 167 | "metadata": { 168 | "collapsed": false, 169 | "deletable": true, 170 | "editable": true 171 | }, 172 | "outputs": [], 173 | "source": [ 174 | "pi = 3.142\n", 175 | "pi" 176 | ] 177 | }, 178 | { 179 | "cell_type": "markdown", 180 | "metadata": { 181 | "collapsed": false, 182 | "deletable": true, 183 | "editable": true 184 | }, 185 | "source": [ 186 | "Try setting the variable \"name\" to be equal to your name. Remember to put your name in quotes; that's how Python will recognize it!" 187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": null, 192 | "metadata": { 193 | "collapsed": true, 194 | "deletable": true, 195 | "editable": true 196 | }, 197 | "outputs": [], 198 | "source": [ 199 | "name = #your name here" 200 | ] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": { 205 | "deletable": true, 206 | "editable": true 207 | }, 208 | "source": [ 209 | "Run this code." 
210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": null, 215 | "metadata": { 216 | "collapsed": true, 217 | "deletable": true, 218 | "editable": true 219 | }, 220 | "outputs": [], 221 | "source": [ 222 | "print(\"Hello \" + name + \"!\")" 223 | ] 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "metadata": { 228 | "collapsed": true, 229 | "deletable": true, 230 | "editable": true 231 | }, 232 | "source": [ 233 | "You might have noticed how we used the + sign to add, or *concatenate,* these strings together. This is why the + sign is called an *operator*: it performs an *operation* on strings. This is just like an operation in math, such as addition, subtraction, multiplication, or division." 234 | ] 235 | }, 236 | { 237 | "cell_type": "markdown", 238 | "metadata": { 239 | "deletable": true, 240 | "editable": true 241 | }, 242 | "source": [ 243 | "Here are some examples of operations on *integers* and *doubles*. Because computers are great at working with big numbers, let's see your computer in action." 244 | ] 245 | }, 246 | { 247 | "cell_type": "markdown", 248 | "metadata": { 249 | "deletable": true, 250 | "editable": true 251 | }, 252 | "source": [ 253 | "The text after the \"#\" sign is called a *comment*, which means it doesn't get run. Comments are useful to explain your code." 
254 | ] 255 | }, 256 | { 257 | "cell_type": "code", 258 | "execution_count": null, 259 | "metadata": { 260 | "collapsed": false, 261 | "deletable": true, 262 | "editable": true 263 | }, 264 | "outputs": [], 265 | "source": [ 266 | "print(127834 + 7345) # Addition\n", 267 | "print(8237645 - 10634) #Subtraction\n", 268 | "print(1234.5678 * 5678.9101112) #Multiplication\n", 269 | "print(341345.234028 / 7345.332) #Division" 270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": { 275 | "collapsed": false, 276 | "deletable": true, 277 | "editable": true 278 | }, 279 | "source": [ 280 | "Using our *pi* variable, let's calculate the area of a circle of radius 5 cm." 281 | ] 282 | }, 283 | { 284 | "cell_type": "code", 285 | "execution_count": null, 286 | "metadata": { 287 | "collapsed": false, 288 | "deletable": true, 289 | "editable": true 290 | }, 291 | "outputs": [], 292 | "source": [ 293 | "pi = 3.142\n", 294 | "pi * 5 * 5\n" 295 | ] 296 | }, 297 | { 298 | "cell_type": "markdown", 299 | "metadata": { 300 | "collapsed": false, 301 | "deletable": true, 302 | "editable": true 303 | }, 304 | "source": [ 305 | "Now that we've learnt about data types and variables, let's test ourselves. In the next cell, there will be examples of variables. Fill in the variable \"type_guess\" with the data type you think it is and run the cell." 
306 | ] 307 | }, 308 | { 309 | "cell_type": "code", 310 | "execution_count": null, 311 | "metadata": { 312 | "collapsed": true, 313 | "deletable": true, 314 | "editable": true 315 | }, 316 | "outputs": [], 317 | "source": [ 318 | "data_example1 = \"Data Science\"\n", 319 | "example1_type = type(data_example1)\n", 320 | "type_guess1 = \"______\"\n", 321 | "\n", 322 | "print(\"Your answer: \" + type_guess1)\n", 323 | "\n", 324 | "print(\"Correct answer: \")\n", 325 | "example1_type" 326 | ] 327 | }, 328 | { 329 | "cell_type": "code", 330 | "execution_count": null, 331 | "metadata": { 332 | "collapsed": false, 333 | "deletable": true, 334 | "editable": true 335 | }, 336 | "outputs": [], 337 | "source": [ 338 | "data_example2 = 2678\n", 339 | "example2_type = type(data_example2)\n", 340 | "type_guess2 = \"_______\"\n", 341 | "\n", 342 | "print(\"Your answer: \" + type_guess2)\n", 343 | "\n", 344 | "print(\"Correct answer: \")\n", 345 | "example2_type" 346 | ] 347 | }, 348 | { 349 | "cell_type": "code", 350 | "execution_count": null, 351 | "metadata": { 352 | "collapsed": true, 353 | "deletable": true, 354 | "editable": true 355 | }, 356 | "outputs": [], 357 | "source": [ 358 | "data_example3 = 0\n", 359 | "example3_type = type(data_example3)\n", 360 | "type_guess3 = \"______\"\n", 361 | "\n", 362 | "print(\"Your answer: \" + type_guess3)\n", 363 | "\n", 364 | "print(\"Correct answer: \")\n", 365 | "example3_type" 366 | ] 367 | }, 368 | { 369 | "cell_type": "markdown", 370 | "metadata": { 371 | "collapsed": true, 372 | "deletable": true, 373 | "editable": true 374 | }, 375 | "source": [ 376 | "You might be wondering what exactly boolean values do in Python. Essentially, a boolean value tells us if something is true or false. Some expressions, such as comparisons, produce boolean values. 
Here are some examples:" 377 | ] 378 | }, 379 | { 380 | "cell_type": "code", 381 | "execution_count": null, 382 | "metadata": { 383 | "collapsed": false, 384 | "deletable": true, 385 | "editable": true, 386 | "scrolled": true 387 | }, 388 | "outputs": [], 389 | "source": [ 390 | "4 == 4 #we use two equal signs to check if two things are equal; remember one equal sign is used for setting variables!" 391 | ] 392 | }, 393 | { 394 | "cell_type": "code", 395 | "execution_count": null, 396 | "metadata": { 397 | "collapsed": false, 398 | "deletable": true, 399 | "editable": true 400 | }, 401 | "outputs": [], 402 | "source": [ 403 | "type(4 == 4)" 404 | ] 405 | }, 406 | { 407 | "cell_type": "code", 408 | "execution_count": null, 409 | "metadata": { 410 | "collapsed": false, 411 | "deletable": true, 412 | "editable": true 413 | }, 414 | "outputs": [], 415 | "source": [ 416 | "4 > 5" 417 | ] 418 | }, 419 | { 420 | "cell_type": "code", 421 | "execution_count": null, 422 | "metadata": { 423 | "collapsed": false, 424 | "deletable": true, 425 | "editable": true 426 | }, 427 | "outputs": [], 428 | "source": [ 429 | "type(4 > 5)" 430 | ] 431 | }, 432 | { 433 | "cell_type": "code", 434 | "execution_count": null, 435 | "metadata": { 436 | "collapsed": false, 437 | "deletable": true, 438 | "editable": true 439 | }, 440 | "outputs": [], 441 | "source": [ 442 | "5 + 1 == 6" 443 | ] 444 | }, 445 | { 446 | "cell_type": "markdown", 447 | "metadata": { 448 | "deletable": true, 449 | "editable": true 450 | }, 451 | "source": [ 452 | "As you can see, a comparison of two integer values will output a boolean value. A similar concept can be applied to other data types, but that will be covered later. For now, we will learn exactly why this is important. A very important part of Python is the *if-statement*. This statement lets you take a boolean value and if True, compute some action. 
If it's False, the code under the *if* will be ignored.\n", 453 | "\n", 454 | "The basic syntax of an if statement looks like this:\n", 455 | "\n", 456 | " if (boolean):\n", 457 | " \n", 458 | " some code\n", 459 | "\n", 460 | "You may have noticed that the code below the if is indented. This is very important! In Python, indentation groups lines of code into blocks. All the code inside a block has to be at the same indentation level; otherwise Python can't read it. In order for Python to understand that the action you want belongs to the \"if\" case, you must indent it exactly one level beyond where the \"if\" is. Here are some examples of the \"if\".\n", 461 | " \n" 462 | ] 463 | }, 464 | { 465 | "cell_type": "code", 466 | "execution_count": null, 467 | "metadata": { 468 | "collapsed": false, 469 | "deletable": true, 470 | "editable": true 471 | }, 472 | "outputs": [], 473 | "source": [ 474 | "if 4 == 4:\n", 475 | " print(\"4 is equal to four!\")" 476 | ] 477 | }, 478 | { 479 | "cell_type": "code", 480 | "execution_count": null, 481 | "metadata": { 482 | "collapsed": false, 483 | "deletable": true, 484 | "editable": true 485 | }, 486 | "outputs": [], 487 | "source": [ 488 | "example_bool = 5 > 4\n", 489 | "if example_bool:\n", 490 | " print(\"5 is greater than 4\")" 491 | ] 492 | }, 493 | { 494 | "cell_type": "markdown", 495 | "metadata": { 496 | "deletable": true, 497 | "editable": true 498 | }, 499 | "source": [ 500 | "Notice that we didn't need to compare example_bool to anything. This is because example_bool is already True, and an if-statement runs its code whenever its condition is True." 
501 | ] 502 | }, 503 | { 504 | "cell_type": "code", 505 | "execution_count": null, 506 | "metadata": { 507 | "collapsed": false, 508 | "deletable": true, 509 | "editable": true 510 | }, 511 | "outputs": [], 512 | "source": [ 513 | "example_bool = 4 > 5\n", 514 | "if example_bool == False:\n", 515 | " print(\"4 is not greater than 5\")" 516 | ] 517 | }, 518 | { 519 | "cell_type": "markdown", 520 | "metadata": { 521 | "deletable": true, 522 | "editable": true 523 | }, 524 | "source": [ 525 | "Notice that in the last example, \"4 > 5\" is a false value, but the if-statement asks whether that value is equal to False. Thus, the condition actually evaluates \"False == False\", which is \"True\", so the code under the if runs! Make sure you watch for tricks like this one!\n", 526 | "\n", 527 | "\n", 528 | "Now try it yourself! Fill in the if-statement with a true value and watch the code print.\n" 529 | ] 530 | }, 531 | { 532 | "cell_type": "code", 533 | "execution_count": null, 534 | "metadata": { 535 | "collapsed": true, 536 | "deletable": true, 537 | "editable": true 538 | }, 539 | "outputs": [], 540 | "source": [ 541 | "if ___________:\n", 542 | " print(\"You have succeeded!\")" 543 | ] 544 | }, 545 | { 546 | "cell_type": "markdown", 547 | "metadata": { 548 | "deletable": true, 549 | "editable": true 550 | }, 551 | "source": [ 552 | "Set a variable \"mood\" to any number.
If \"mood\" is greater than 5, print \"I am glad you are in a good mood today!\"" 553 | ] 554 | }, 555 | { 556 | "cell_type": "code", 557 | "execution_count": null, 558 | "metadata": { 559 | "collapsed": true, 560 | "deletable": true, 561 | "editable": true 562 | }, 563 | "outputs": [], 564 | "source": [ 565 | "mood = _______\n", 566 | "if ________:\n", 567 | " print(____________)" 568 | ] 569 | }, 570 | { 571 | "cell_type": "markdown", 572 | "metadata": { 573 | "deletable": true, 574 | "editable": true 575 | }, 576 | "source": [ 577 | "So far, the \"if\" statements that we've seen have only had one condition. That is, we checked a single boolean value, and that was it. However, we can chain any number of checks by following the \"if\" with *elif* (short for \"else if\") branches and a final *else* that runs when none of the conditions are true. For example, if we have the score of a student, and we want to check which letter grade corresponds to their score, we could write the following:" 578 | ] 579 | }, 580 | { 581 | "cell_type": "code", 582 | "execution_count": null, 583 | "metadata": { 584 | "collapsed": false, 585 | "deletable": true, 586 | "editable": true 587 | }, 588 | "outputs": [], 589 | "source": [ 590 | "score = 76\n", 591 | "if score >= 90:\n", 592 | " print(\"A\")\n", 593 | "elif 80 <= score < 90:\n", 594 | " print(\"B\")\n", 595 | "elif 70 <= score < 80:\n", 596 | " print(\"C\")\n", 597 | "elif 60 <= score < 70:\n", 598 | " print(\"D\")\n", 599 | "else:\n", 600 | " print(\"F\")" 601 | ] 602 | }, 603 | { 604 | "cell_type": "markdown", 605 | "metadata": { 606 | "deletable": true, 607 | "editable": true 608 | }, 609 | "source": [ 610 | "Python evaluates these conditions in order, and stops at the first one that is true. For example, in this case, the first two conditions are false (76 is not at least 90, nor is it between 80 and 90), but the third is true (76 is between 70 and 80). Therefore, the code prints out C, and then skips the rest of this sequence of statements.
" 611 | ] 612 | }, 613 | { 614 | "cell_type": "markdown", 615 | "metadata": { 616 | "deletable": true, 617 | "editable": true 618 | }, 619 | "source": [ 620 | "We will build upon these foundational skills that you've learned in today's notebook in the next session. We encourage that you practice these tools on your own; you can do so by creating a blank notebook." 621 | ] 622 | } 623 | ], 624 | "metadata": { 625 | "anaconda-cloud": {}, 626 | "kernelspec": { 627 | "display_name": "Python 3", 628 | "language": "python", 629 | "name": "python3" 630 | }, 631 | "language_info": { 632 | "codemirror_mode": { 633 | "name": "ipython", 634 | "version": 3 635 | }, 636 | "file_extension": ".py", 637 | "mimetype": "text/x-python", 638 | "name": "python", 639 | "nbconvert_exporter": "python", 640 | "pygments_lexer": "ipython3", 641 | "version": "3.6.0" 642 | } 643 | }, 644 | "nbformat": 4, 645 | "nbformat_minor": 2 646 | } 647 | -------------------------------------------------------------------------------- /lesson2/lesson2.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "deletable": true, 7 | "editable": true 8 | }, 9 | "source": [ 10 | "" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "metadata": { 16 | "deletable": true, 17 | "editable": true 18 | }, 19 | "source": [ 20 | "## Lesson 2: Introduction to Statistics and Python Programming" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": { 26 | "collapsed": false, 27 | "deletable": true, 28 | "editable": true 29 | }, 30 | "source": [ 31 | "In the last lesson, we saw some cool ways in which computers can do amazing things with data. Today, we're going to be learning a little bit about Python, a computer programming language which is used by people of various professions including scientists, business people, and engineers. 
We will be using it to practice what we learned earlier in the session." 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": { 37 | "deletable": true, 38 | "editable": true 39 | }, 40 | "source": [ 41 | "**Computer Programming and Programming Languages**\n", 42 | "\n", 43 | "One of the reasons why computers are useful is that they can store and run *programs*, a set of instructions that tells a computer what to do. In fact, you're running a computer program, called a Jupyter Notebook, right now! It's a program that lets you write your own programs. How cool is that?\n", 44 | "\n", 45 | "How can you tell a computer what to do? It doesn't understand Hindi, English, or any of the languages that we speak. This is where programming languages come in. They are the medium through which the computer understands what you want it to do. Think of them as the step between what you want to do and how you're going to tell the computer to do it. You may have heard of languages like Java and C++. Python is just like these. They may look different, but you can write programs to do the same things in different languages, just like we have different words in different spoken languages, for the same things.\n" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": { 51 | "deletable": true, 52 | "editable": true 53 | }, 54 | "source": [ 55 | "**Data Types and Variables in Python**" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": { 61 | "deletable": true, 62 | "editable": true 63 | }, 64 | "source": [ 65 | "Python can handle several different types of data. The most important ones that we'll be talking about are *integers*, *doubles*, *booleans* and *strings*.\n", 66 | "\n", 67 | "**integers** are whole numbers (without decimals) like 0, 1, 2 ...\n", 68 | "\n", 69 | "**doubles** are decimal numbers like 1.234, 56.8, 9.5. 
In Python, they are termed *float*.\n", 70 | "\n", 71 | "**booleans** are special values that can only be one of two things: either *True* or *False*. Other values can also be treated as booleans: 0 and *None* count as False, while 1 counts as True. In general, besides 0, all numbers are 'true' values. \n", 72 | "\n", 73 | "**strings** are anything placed between quotation marks like these: \" \". E.g. \"hello\", \"data\"." 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": null, 79 | "metadata": { 80 | "collapsed": false, 81 | "deletable": true, 82 | "editable": true 83 | }, 84 | "outputs": [], 85 | "source": [ 86 | "type(212412)" 87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "execution_count": null, 92 | "metadata": { 93 | "collapsed": false, 94 | "deletable": true, 95 | "editable": true 96 | }, 97 | "outputs": [], 98 | "source": [ 99 | "type(9.9999)" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": null, 105 | "metadata": { 106 | "collapsed": false, 107 | "deletable": true, 108 | "editable": true 109 | }, 110 | "outputs": [], 111 | "source": [ 112 | "type(True)" 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": null, 118 | "metadata": { 119 | "collapsed": false, 120 | "deletable": true, 121 | "editable": true 122 | }, 123 | "outputs": [], 124 | "source": [ 125 | "type(\"data science\")" 126 | ] 127 | }, 128 | { 129 | "cell_type": "markdown", 130 | "metadata": { 131 | "collapsed": true, 132 | "deletable": true, 133 | "editable": true 134 | }, 135 | "source": [ 136 | "It is a tradition in programming to print the string \"Hello World!\" as your first program. Luckily, Python makes it easy for us to do this. If you ever want to print something, write print( ), with whatever you want in between the parentheses. Try printing \"Hello World!\" below."
137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": null, 142 | "metadata": { 143 | "collapsed": false, 144 | "deletable": true, 145 | "editable": true 146 | }, 147 | "outputs": [], 148 | "source": [ 149 | "print(\"Hello World!\"\n", 150 | " )" 151 | ] 152 | }, 153 | { 154 | "cell_type": "markdown", 155 | "metadata": { 156 | "collapsed": false, 157 | "deletable": true, 158 | "editable": true 159 | }, 160 | "source": [ 161 | "You've heard of *variables* in math class before, but they are also an essential part of programming. They allow you to store a value, like a word or number, in the computer's *memory,* and also give it a name, so you can use this value later on in your program. You can set a variable with the help of an equal sign ( = ). The variable name comes before the equal sign and its value comes after. In the example below, we are setting the variable *pi* to be equal to 3.142. " 162 | ] 163 | }, 164 | { 165 | "cell_type": "code", 166 | "execution_count": null, 167 | "metadata": { 168 | "collapsed": false, 169 | "deletable": true, 170 | "editable": true 171 | }, 172 | "outputs": [], 173 | "source": [ 174 | "pi = 3.142\n", 175 | "pi" 176 | ] 177 | }, 178 | { 179 | "cell_type": "markdown", 180 | "metadata": { 181 | "collapsed": false, 182 | "deletable": true, 183 | "editable": true 184 | }, 185 | "source": [ 186 | "Try setting the variable \"name\" to be equal to your name. Remember to put your name in quotes; that's how Python will recognize it as a string!" 187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": null, 192 | "metadata": { 193 | "collapsed": true, 194 | "deletable": true, 195 | "editable": true 196 | }, 197 | "outputs": [], 198 | "source": [ 199 | "name = #your name here" 200 | ] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": { 205 | "deletable": true, 206 | "editable": true 207 | }, 208 | "source": [ 209 | "Run this code."
210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": null, 215 | "metadata": { 216 | "collapsed": true, 217 | "deletable": true, 218 | "editable": true 219 | }, 220 | "outputs": [], 221 | "source": [ 222 | "print(\"Hello \" + name + \"!\")" 223 | ] 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "metadata": { 228 | "collapsed": true, 229 | "deletable": true, 230 | "editable": true 231 | }, 232 | "source": [ 233 | "You might have noticed how we used the + sign to add, or *concatenate,* these strings together. This is why the + sign is called an *operator*: it performs an *operation* on strings. This is just like an operation in math, such as addition, subtraction, multiplication, or division." 234 | ] 235 | }, 236 | { 237 | "cell_type": "markdown", 238 | "metadata": { 239 | "deletable": true, 240 | "editable": true 241 | }, 242 | "source": [ 243 | "Here are some examples of operations on *integers* and *doubles*. Because computers are great at working with big numbers, let's see your computer in action." 244 | ] 245 | }, 246 | { 247 | "cell_type": "markdown", 248 | "metadata": { 249 | "deletable": true, 250 | "editable": true 251 | }, 252 | "source": [ 253 | "The text after the \"#\" sign is called a *comment*, which means it doesn't get run. Comments are useful to explain your code."
254 | ] 255 | }, 256 | { 257 | "cell_type": "code", 258 | "execution_count": null, 259 | "metadata": { 260 | "collapsed": false, 261 | "deletable": true, 262 | "editable": true 263 | }, 264 | "outputs": [], 265 | "source": [ 266 | "print(127834 + 7345) # Addition\n", 267 | "print(8237645 - 10634) #Subtraction\n", 268 | "print(1234.5678 * 5678.9101112) #Multiplication\n", 269 | "print(341345.234028 / 7345.332) #Division" 270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": { 275 | "collapsed": false, 276 | "deletable": true, 277 | "editable": true 278 | }, 279 | "source": [ 280 | "Using our *pi* variable, let's calculate the area of a circle of radius 5 cm." 281 | ] 282 | }, 283 | { 284 | "cell_type": "code", 285 | "execution_count": null, 286 | "metadata": { 287 | "collapsed": false, 288 | "deletable": true, 289 | "editable": true 290 | }, 291 | "outputs": [], 292 | "source": [ 293 | "pi = 3.142\n", 294 | "pi * 5 * 5\n" 295 | ] 296 | }, 297 | { 298 | "cell_type": "markdown", 299 | "metadata": { 300 | "collapsed": false, 301 | "deletable": true, 302 | "editable": true 303 | }, 304 | "source": [ 305 | "Now that we've learnt about data types and variables, let's test ourselves. In the next cell, there will be examples of variables. Fill in the variable \"type_guess\" with the data type you think it is and run the cell." 
306 | ] 307 | }, 308 | { 309 | "cell_type": "code", 310 | "execution_count": null, 311 | "metadata": { 312 | "collapsed": true, 313 | "deletable": true, 314 | "editable": true 315 | }, 316 | "outputs": [], 317 | "source": [ 318 | "data_example1 = \"Data Science\"\n", 319 | "example1_type = type(data_example1)\n", 320 | "type_guess1 = \"______\"\n", 321 | "\n", 322 | "print(\"Your answer: \" + type_guess1)\n", 323 | "\n", 324 | "print(\"Correct answer:\")\n", 325 | "example1_type" 326 | ] 327 | }, 328 | { 329 | "cell_type": "code", 330 | "execution_count": null, 331 | "metadata": { 332 | "collapsed": false, 333 | "deletable": true, 334 | "editable": true 335 | }, 336 | "outputs": [], 337 | "source": [ 338 | "data_example2 = 2678\n", 339 | "example2_type = type(data_example2)\n", 340 | "type_guess2 = \"_______\"\n", 341 | "\n", 342 | "print(\"Your answer: \" + type_guess2)\n", 343 | "\n", 344 | "print(\"Correct answer: \")\n", 345 | "example2_type" 346 | ] 347 | }, 348 | { 349 | "cell_type": "code", 350 | "execution_count": null, 351 | "metadata": { 352 | "collapsed": true, 353 | "deletable": true, 354 | "editable": true 355 | }, 356 | "outputs": [], 357 | "source": [ 358 | "data_example3 = 0\n", 359 | "example3_type = type(data_example3)\n", 360 | "type_guess3 = \"______\"\n", 361 | "\n", 362 | "print(\"Your answer:\" + type_guess3)\n", 363 | "\n", 364 | "print(\"Correct answer: \")\n", 365 | "example3_type" 366 | ] 367 | }, 368 | { 369 | "cell_type": "markdown", 370 | "metadata": { 371 | "collapsed": true, 372 | "deletable": true, 373 | "editable": true 374 | }, 375 | "source": [ 376 | "You might be wondering what exactly boolean values do in Python. Essentially, a boolean value tells us if something is true or false. Therefore, some expressions can output boolean values. 
Here are some examples:" 377 | ] 378 | }, 379 | { 380 | "cell_type": "code", 381 | "execution_count": null, 382 | "metadata": { 383 | "collapsed": false, 384 | "deletable": true, 385 | "editable": true, 386 | "scrolled": true 387 | }, 388 | "outputs": [], 389 | "source": [ 390 | "4 == 4 #we use two equal signs to check if two things are equal; remember one equal sign is used for setting variables!" 391 | ] 392 | }, 393 | { 394 | "cell_type": "code", 395 | "execution_count": null, 396 | "metadata": { 397 | "collapsed": false, 398 | "deletable": true, 399 | "editable": true 400 | }, 401 | "outputs": [], 402 | "source": [ 403 | "type(4 == 4)" 404 | ] 405 | }, 406 | { 407 | "cell_type": "code", 408 | "execution_count": null, 409 | "metadata": { 410 | "collapsed": false, 411 | "deletable": true, 412 | "editable": true 413 | }, 414 | "outputs": [], 415 | "source": [ 416 | "4 > 5" 417 | ] 418 | }, 419 | { 420 | "cell_type": "code", 421 | "execution_count": null, 422 | "metadata": { 423 | "collapsed": false, 424 | "deletable": true, 425 | "editable": true 426 | }, 427 | "outputs": [], 428 | "source": [ 429 | "type(4 > 5)" 430 | ] 431 | }, 432 | { 433 | "cell_type": "code", 434 | "execution_count": null, 435 | "metadata": { 436 | "collapsed": false, 437 | "deletable": true, 438 | "editable": true 439 | }, 440 | "outputs": [], 441 | "source": [ 442 | "5 + 1 == 6" 443 | ] 444 | }, 445 | { 446 | "cell_type": "markdown", 447 | "metadata": { 448 | "deletable": true, 449 | "editable": true 450 | }, 451 | "source": [ 452 | "As you can see, a comparison of two integer values will output a boolean value. A similar concept can be applied to other data types, but that will be covered later. For now, we will learn exactly why this is important. A very important part of Python is the *if-statement*. This statement lets you take a boolean value and if True, compute some action. 
If it's False, the code under the *if* will be ignored.\n", 453 | "\n", 454 | "The basic syntax of an if statement looks like this:\n", 455 | "\n", 456 | " if (boolean):\n", 457 | " \n", 458 | " some code\n", 459 | "\n", 460 | "You may have noticed that the code below the if is indented. This is very important! In Python, indentation is used to group lines of code into blocks. All the code inside a block has to be at the same indentation level; otherwise Python can't read it. For Python to understand that an action belongs to the \"if\" case, you must indent it exactly one level beyond where the \"if\" is. Here are some examples of the \"if\".\n", 461 | " \n" 462 | ] 463 | }, 464 | { 465 | "cell_type": "code", 466 | "execution_count": null, 467 | "metadata": { 468 | "collapsed": false, 469 | "deletable": true, 470 | "editable": true 471 | }, 472 | "outputs": [], 473 | "source": [ 474 | "if 4 == 4:\n", 475 | " print(\"4 is equal to four!\")" 476 | ] 477 | }, 478 | { 479 | "cell_type": "code", 480 | "execution_count": null, 481 | "metadata": { 482 | "collapsed": false, 483 | "deletable": true, 484 | "editable": true 485 | }, 486 | "outputs": [], 487 | "source": [ 488 | "example_bool = 5 > 4\n", 489 | "if example_bool:\n", 490 | " print(\"5 is greater than 4\")" 491 | ] 492 | }, 493 | { 494 | "cell_type": "markdown", 495 | "metadata": { 496 | "deletable": true, 497 | "editable": true 498 | }, 499 | "source": [ 500 | "Notice that we didn't need to specify whether example_bool was equal to, less than, or greater than anything. This is because example_bool already holds the value True, and Python runs the code under an if-statement whenever its condition is True."
501 | ] 502 | }, 503 | { 504 | "cell_type": "code", 505 | "execution_count": null, 506 | "metadata": { 507 | "collapsed": false, 508 | "deletable": true, 509 | "editable": true 510 | }, 511 | "outputs": [], 512 | "source": [ 513 | "example_bool = 4 > 5\n", 514 | "if example_bool == False:\n", 515 | " print(\"4 is not greater than 5\")" 516 | ] 517 | }, 518 | { 519 | "cell_type": "markdown", 520 | "metadata": { 521 | "deletable": true, 522 | "editable": true 523 | }, 524 | "source": [ 525 | "Notice that in the last example, \"4 > 5\" is a false value, but the if-statement asks whether that value is equal to False. Thus, the condition actually evaluates \"False == False\", which is \"True\", so the code under the if runs! Make sure you watch for tricks like this one!\n", 526 | "\n", 527 | "\n", 528 | "Now try it yourself! Fill in the if-statement with a true value and watch the code print.\n" 529 | ] 530 | }, 531 | { 532 | "cell_type": "code", 533 | "execution_count": null, 534 | "metadata": { 535 | "collapsed": true, 536 | "deletable": true, 537 | "editable": true 538 | }, 539 | "outputs": [], 540 | "source": [ 541 | "if ___________:\n", 542 | " print(\"You have succeeded!\")" 543 | ] 544 | }, 545 | { 546 | "cell_type": "markdown", 547 | "metadata": { 548 | "deletable": true, 549 | "editable": true 550 | }, 551 | "source": [ 552 | "Set a variable \"mood\" to any number.
If \"mood\" is greater than 5, print \"I am glad you are in a good mood today!\"" 553 | ] 554 | }, 555 | { 556 | "cell_type": "code", 557 | "execution_count": null, 558 | "metadata": { 559 | "collapsed": true, 560 | "deletable": true, 561 | "editable": true 562 | }, 563 | "outputs": [], 564 | "source": [ 565 | "mood = _______\n", 566 | "if ________:\n", 567 | " print(____________)" 568 | ] 569 | }, 570 | { 571 | "cell_type": "markdown", 572 | "metadata": { 573 | "deletable": true, 574 | "editable": true 575 | }, 576 | "source": [ 577 | "So far, the \"if\" statements that we've seen have only had one condition. That is, we checked a single boolean value, and that was it. However, we can chain any number of checks by following the \"if\" with *elif* (short for \"else if\") branches and a final *else* that runs when none of the conditions are true. For example, if we have the score of a student, and we want to check which letter grade corresponds to their score, we could write the following:" 578 | ] 579 | }, 580 | { 581 | "cell_type": "code", 582 | "execution_count": null, 583 | "metadata": { 584 | "collapsed": false, 585 | "deletable": true, 586 | "editable": true 587 | }, 588 | "outputs": [], 589 | "source": [ 590 | "score = 76\n", 591 | "if score >= 90:\n", 592 | " print(\"A\")\n", 593 | "elif 80 <= score < 90:\n", 594 | " print(\"B\")\n", 595 | "elif 70 <= score < 80:\n", 596 | " print(\"C\")\n", 597 | "elif 60 <= score < 70:\n", 598 | " print(\"D\")\n", 599 | "else:\n", 600 | " print(\"F\")" 601 | ] 602 | }, 603 | { 604 | "cell_type": "markdown", 605 | "metadata": { 606 | "deletable": true, 607 | "editable": true 608 | }, 609 | "source": [ 610 | "Python evaluates these conditions in order, and stops at the first one that is true. For example, in this case, the first two conditions are false (76 is not at least 90, nor is it between 80 and 90), but the third is true (76 is between 70 and 80). Therefore, the code prints out C, and then skips the rest of this sequence of statements.
" 611 | ] 612 | }, 613 | { 614 | "cell_type": "markdown", 615 | "metadata": { 616 | "deletable": true, 617 | "editable": true 618 | }, 619 | "source": [ 620 | "We will build upon these foundational skills that you've learned in today's notebook in the next session. We encourage that you practice these tools on your own; you can do so by creating a blank notebook." 621 | ] 622 | } 623 | ], 624 | "metadata": { 625 | "anaconda-cloud": {}, 626 | "kernelspec": { 627 | "display_name": "Python 3", 628 | "language": "python", 629 | "name": "python3" 630 | }, 631 | "language_info": { 632 | "codemirror_mode": { 633 | "name": "ipython", 634 | "version": 3 635 | }, 636 | "file_extension": ".py", 637 | "mimetype": "text/x-python", 638 | "name": "python", 639 | "nbconvert_exporter": "python", 640 | "pygments_lexer": "ipython3", 641 | "version": "3.6.0" 642 | } 643 | }, 644 | "nbformat": 4, 645 | "nbformat_minor": 2 646 | } 647 | -------------------------------------------------------------------------------- /lesson5/cricket_tiers.csv: -------------------------------------------------------------------------------- 1 | PLAYER,Salary,Tier 2 | MS Dhoni,1,1 3 | Virat Kohli,1,1 4 | Ajinkya Rahan,1,1 5 | Ravi Ashwin,1,1 6 | Suresh Raina,0.5,2 7 | Ambati Rayudu,0.5,2 8 | Rohit Sharma,0.5,2 9 | Murali Vijay,0.5,2 10 | Shikhar Dhawan,0.5,2 11 | Bhuvneshwar Kumar,0.5,2 12 | Umesh Yadav,0.5,2 13 | Ishant Sharma,0.5,2 14 | Cheteshwar Pujara,0.5,2 15 | Mohammed Shami,0.5,2 16 | Amit Mishra,0.25,3 17 | Axar Patel,0.25,3 18 | Stuart Binny,0.25,3 19 | Wriddhiman Saha,0.25,3 20 | Mohit Sharma,0.25,3 21 | Vinay Kumar,0.25,3 22 | Mohit Sharma,0.25,3 23 | Varun Aaron,0.25,3 24 | Karn Sharma,0.25,3 25 | Ravindra Jadeja,0.25,3 26 | KL Rahul,0.25,3 27 | Dhawal Kulkarni,0.25,3 28 | Harbhajan Singh,0.25,3 29 | S Aravind,0.25,3 -------------------------------------------------------------------------------- /lesson5/doctor_salaries.csv: 
-------------------------------------------------------------------------------- 1 | Job Title,Location,Salary 2 | Goverment of Tamil Nadu Doctor - Monthly,India,"59,447" 3 | Paras Hospital Doctor,India,"1,175,091" 4 | Indian Navy Doctor - Monthly,India,"122,566" 5 | Fortis Healthcare Doctor - Monthly,India,"319,328" 6 | Isha Foundation Doctor,India,"128,659" 7 | Border Security Force Doctor - Monthly Contractor,India,"47,065" 8 | Govt of Gujarat Doctor,India,"974,240" 9 | Tangedco Doctor,India,"826,281" 10 | Shathayu Ayurveda Wellness Centre Doctor,India,"1,269,676" 11 | DH&FWS Doctor - Monthly,India,"38,773" 12 | Agartala Government Medical College Doctor - Monthly,India,"64,477" 13 | Haryana Department of Health Doctor Intern - Monthly,India,"11,796" 14 | Odisha Government Doctor - Monthly,India,"39,845" 15 | Vardhman Mahavir Medical College Doctor - Monthly,India,"47,334" 16 | Ayushya Hospital Doctor - Monthly,India,"52,967" 17 | Government of Assam Doctor - Monthly,India,"32,988" 18 | Riyahd Military Hospital Doctor - Monthly,India,"128,411" 19 | "Fortis Hospital, NOIDA Doctor - Monthly",India,"51,686" 20 | Rajendra Institute of Medical Sciences Doctor - Monthly,India,"47,376" 21 | Government of Haryana Doctor - Monthly,India,"50,140" 22 | Best Doctors Doctor - Monthly,India,"82,389" 23 | Apollo Hospitals Doctor - Monthly,India,"118,980" 24 | All India Institute of Medical Sciences Doctor - Monthly,India,"247,352" 25 | Doctor's Exchange of Washington Doctor - Monthly,India,"40,841" 26 | Best Doctors Doctor,India,"7,387,274" 27 | Government of India Doctor - Monthly,India,"111,044" 28 | Government of India Doctor Intern - Monthly,India,"17,979" 29 | Municipal of Delhi Doctor - Monthly,India,"35,859" 30 | Best Doctors Doctor Intern - Monthly,India,"20,084" 31 | Government of India Doctor - Monthly Contractor,India,"47,095" 32 | Government of India Doctor,India,"516,600" 33 | Self-Employment and Entrepreneur Development Society (SEEDS) Doctor - 
Monthly,India,"155,303" 34 | Doctors Hospital at Renaissance Doctor - Monthly,India,"128,672" 35 | Fortis Hospitals Doctor - Monthly,India,"29,254" 36 | Fortis Hospitals Doctor,India,"705,009" 37 | Cancer Cross Doctor - Monthly,India,"49,763" 38 | Dr. Lal PathLabs Doctor - Monthly,India,"100,159" 39 | Govt of Uttarakhand Doctor - Monthly Contractor,India,"61,469" 40 | Apollo Health and Lifestyle Doctor - Monthly,India,"40,122" 41 | MGM medical college Doctor - Contractor,India,"385,930" 42 | Vijaya Hospital Doctor,India,"712,630" 43 | Sahyadri Speciality Hospital Doctor - Monthly,India,"17,759" 44 | Life Force Doctor - Monthly,India,"24,834" 45 | Indian Army Doctor,India,"1,149,396" 46 | Rajasthan High Court Doctor - Monthly,India,"71,984" 47 | Global Hospitals Group Doctor - Monthly,India,"33,327" 48 | Columbia Asia Doctor - Monthly,India,"69,998" 49 | Government of Odisha Doctor,India,"776,421" 50 | KMC Hospital Doctor - Monthly,India,"128,962" 51 | Government of NCT of Delhi Doctor - Monthly,India,"69,312" 52 | "Department of Health & Family Welfare, Punjab, India Doctor",India,"768,644" 53 | UPRVUNL Doctor - Monthly Contractor,India,"31,159" 54 | Dr. 
Batras' Positive Health Clinic Doctor - Monthly,India,"21,100" 55 | "Himachal Pradesh Government, India Doctor Intern - Monthly",India,"19,474" 56 | Pmch Doctor - Monthly,India,"128,970" 57 | Park Hospital Doctor,India,"420,063" 58 | Bad Hospitals in Delhi Doctor - Monthly,India,"19,999" 59 | Geeta Infotech Doctor - Monthly,India,"51,754" 60 | National Rural Health Mission Andhra Pradesh Doctor - Monthly Contractor,India,"25,966" 61 | The Jaypee Group Doctor - Monthly,India,"11,839" 62 | Pragya Doctor,India,"670,158" 63 | Lok Hospital Doctor - Monthly,India,"47,259" 64 | Bharat Vikas Parishad Hospital & Research Center Doctor,India,"1,295,646" 65 | Médecins Sans Frontières India Doctor - Monthly,India,"50,869" 66 | East Delhi Municipal Doctor,India,"1,415,971" 67 | ASRAM Hospital Doctor - Monthly,India,"29,456" 68 | KAMALA REDDY MEMORIAL FOUNDATION Doctor,India,"415,383" 69 | Select Medical Doctor - Monthly,India,"58,739" 70 | Apollo Doctor,India,"1,184,035" 71 | UT Southwestern Medical Center Doctor,India,"904,750" 72 | HealthSpring Doctor - Monthly Contractor,India,"115,694" 73 | Capgemini Doctor - Monthly,India,"96,512" 74 | Rmo Doctor - Monthly,India,"59,169" 75 | GURUKRUPA Doctor - Monthly,India,"38,506" 76 | Transocean Offshore Deepwater Drilling Doctor,India,"1,938,982" 77 | Maxim Healthcare Services Doctor - Monthly,India,"99,616" 78 | CMC Doctor - Monthly,India,"38,825" 79 | Alere Doctor - Monthly,India,"64,635" 80 | Apollo Education Group Doctor - Monthly,India,"41,390" 81 | Jessica McClintock Doctor Intern - Monthly,India,"128,809" 82 | Ayurveda Pura Doctor - Monthly,India,"50,176" 83 | Government Procurement Service Doctor - Monthly,India,"25,576" 84 | Home Office Doctor - Monthly,India,"19,401" 85 | Department of Health UK Doctor - Monthly Contractor,India,"34,714" 86 | Communities & Local Government Doctor - Monthly Contractor,India,"51,258" 87 | Star Hospitals Nurse Cum Physician Assistant - Monthly,India,"19,370" 88 | Subharti University Doctor ENT
Surgeon - Monthly,India,"115,666" 89 | BARC Jrd Doctor Pathology - Monthly,India,"51,613" 90 | GVK Industries Emergency Response Care Physician,India,"440,721" 91 | ONGC Occupational Health Physician - Monthly,India,"103,755" 92 | "Rajiv Gandhi Cancer Institute & Researchcentre Physician, Investigator - Clinical Research - Monthly",India,"130,251" 93 | Ganga Medical Centre & Hospitals Doctor's Assistant,India,"231,833" 94 | Shabbir Tiles Doctor Clinic - Monthly,India,"128,848" 95 | BMC DOCTOR INMEDICAL COLLGE,India,"462,962" 96 | Dr.MANU'S HOMEOPATHY Branch Head Doctor,India,"192,913" 97 | MGM medical college Junior Resident Doctor - Monthly,India,"12,866" 98 | Dr. Netra Mandir Consultant Doctor,India,"948,358" 99 | Government of Maharashtra Doctor/Physion - Monthly,India,"58,846" 100 | Excel Homeo Medical Centre Consultant Physician,India,"141,097" 101 | Apollo Health Street A R Physician,India,"375,787" 102 | Govt. Medical College Junior Resident Doctor Department of General Surgery - Monthly,India,"51,427" 103 | Asian Heart Institute and Research Centre Physician Assistant - Monthly Contractor,India,"29,770" 104 | Madras Medical Mission Physician Assistant - Monthly,India,"31,971" 105 | Government of Kerala Veterinary Doctor - Monthly,India,"64,326" 106 | BanasDairy Veterinary Doctor - Monthly,India,"28,430" 107 | Blossom Hospital Consultant Diabetologist & Physician,India,"514,132" 108 | Dr.MANU'S HOMEOPATHY Homeopathy Doctor,India,"175,534" 109 | VS Hospital Junior Doctor Intern - Monthly,India,"8,524" 110 | Levioza Health Care Medical Doctor & Consultant - Monthly Contractor,India,"24,616" 111 | Merck KGaA Medical Affairs Physician,India,"773,302" 112 | Government of India Resident Doctor - Monthly,India,"33,404" 113 | Vardhman Mahavir Medical College Resident Doctor - Contractor,India,"774,587" 114 | Civil Hospital Amdavad Resident Doctor,India,"323,373" 115 | TNMC & Nair Hospital Resident Doctor In Surgery Intern - Monthly,India,"40,059" 116 | Apollo 
Hospitals Resident Doctor - Monthly,India,"40,498" -------------------------------------------------------------------------------- /lesson5/lesson_5.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## Lesson 5: Exercises" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "We've given you a lot of material in the past few sessions. Let's work on a key concept we covered last session: tables." 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": 1, 27 | "metadata": { 28 | "collapsed": true 29 | }, 30 | "outputs": [], 31 | "source": [ 32 | "import numpy as np\n", 33 | "from datascience import *" 34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": {}, 39 | "source": [ 40 | "**RCB's Cricket Confusion** \n", 41 | "\n", 42 | "Royal Challengers Bangalore finished last in IPL 2017. The coach wants to build a strong team next year to win the trophy by buying the best possible players in the auction. You're a sports analyst whose job is to help him put a team together that's within his budget. There are 3 categories of players with the following costs in crores.\n", 43 | "Tier 1: 1\n", 44 | "Tier 2: 0.5\n", 45 | "Tier 3: 0.25\n", 46 | " \n", 47 | "The coach has 10 crores to spend and can purchase any 11 to 16 players he wants from this set of 28.
Write a function that accepts the coach's input for the number of players he wants that's within his budget.\n", 48 | "\n", 49 | "*Return* the players' total salary and *print* their names.\n", 50 | "REMEMBER: We refer to the first value in a column with 0!\n", 51 | "\n", 52 | "Sample Output: select_players(2, 4, 5)\n", 53 | "[MS Dhoni, Virat Kohli, Suresh Raina, Ambati Rayudu, Rohit Sharma, Murali Vijay, Amit Mishra, Axar Patel, Stuart Binny, Wriddhiman Saha, Mohit Sharma]\n", 54 | "5.25\n", 55 | "\n", 56 | "Sample Input: select_players(2, 2, 5)\n", 57 | "Sample Output: Too few players!\n", 58 | "\n", 59 | "Sample Input: select_players(10, 3, 1)\n", 60 | "Sample Output: Too many players!" 61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": null, 66 | "metadata": { 67 | "collapsed": false 68 | }, 69 | "outputs": [], 70 | "source": [ 71 | "players = Table().read_table(\"cricket_tiers.csv\")\n", 72 | "#Source: http://www.totalsportek.com/cricket/indian-player-salaries-central-contracts-2015/\n", 73 | "players" 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": {}, 79 | "source": [ 80 | "How can you find the number of players in each tier? 
Hint: Once you have this table, pass this number in as the first parameter of the *range* function when splitting up the players table" 81 | ] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "execution_count": null, 86 | "metadata": { 87 | "collapsed": false 88 | }, 89 | "outputs": [], 90 | "source": [ 91 | "players.______(\"Tier\")" 92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": null, 97 | "metadata": { 98 | "collapsed": false, 99 | "scrolled": true 100 | }, 101 | "outputs": [], 102 | "source": [ 103 | "def select_players(tier_1, tier_2, tier_3):\n", 104 | " total_players = tier_1 + tier_2 + tier_3\n", 105 | " if ______________:\n", 106 | " print(\"Too few players!\")\n", 107 | " elif ____________:\n", 108 | " print(\"Too many players!\")\n", 109 | " else:\n", 110 | " t1 = players._________(\"______\", are.equal_to(_______)).take(range(0, tier_1)).column(\"PLAYER\")\n", 111 | " t2 =\n", 112 | " t3 = \n", 113 | " \n", 114 | " #How will you access the players' names using the column() function?\n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " total_salary = __________+_____________+______________\n", 119 | " if total_salary > 10:\n", 120 | " return \"Too expensive! Select a different combination!\"\n", 121 | " return total_salary" 122 | ] 123 | }, 124 | { 125 | "cell_type": "markdown", 126 | "metadata": {}, 127 | "source": [ 128 | "Test out your function here." 129 | ] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "execution_count": null, 134 | "metadata": { 135 | "collapsed": true 136 | }, 137 | "outputs": [], 138 | "source": [ 139 | "select_players(____, ____, _____)" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "**Building a Better Estimate**\n", 147 | "\n", 148 | "Raju the builder has made N measurements. Now, he wants to know the average value of the measurements made. 
In order to make the average value a better representative of the measurements, before calculating the average, he wants first to remove the highest K and the lowest K measurements. After that, he will calculate the average value among the remaining N - 2K measurements.\n", 149 | "Could you help Raju find the average value he will get after these manipulations?\n", 150 | "\n", 151 | "\n", 152 | "Sample Input: \n", 153 | "N - 5 \n", 154 | "K - 1\n", 155 | "N values - 2 9 -10 25 1\n", 156 | "Sample Output: \n", 157 | "4.00000\n" 158 | ] 159 | }, 160 | { 161 | "cell_type": "code", 162 | "execution_count": null, 163 | "metadata": { 164 | "collapsed": false 165 | }, 166 | "outputs": [], 167 | "source": [ 168 | "def new_measurements(n, k, arr):\n", 169 | " \n", 170 | " for i in range(___, ___):\n", 171 | " " 172 | ] 173 | }, 174 | { 175 | "cell_type": "markdown", 176 | "metadata": {}, 177 | "source": [ 178 | "**Aditi’s Career Planning**\n", 179 | " \n", 180 | "Aditi can’t decide what field she wants to work in when she grows up! She likes medicine and engineering equally so her father advised her to pick the field that pays the most to an average worker. Aditi has collected tables containing the necessary data on the salaries of professionals in these fields and stored them in 2 unsorted arrays. Can you help her find out which job to pick as per her father’s advice? \n", 181 | "\n", 182 | "Hint: use the sort() function.\n", 183 | "\n" 184 | ] 185 | }, 186 | { 187 | "cell_type": "code", 188 | "execution_count": null, 189 | "metadata": { 190 | "collapsed": true 191 | }, 192 | "outputs": [], 193 | "source": [ 194 | "def salary(med, engg):\n", 195 | " #In Python, we can define functions inside our own functions. 
\n", 196 | " #This function will compute a certain quantity from each array for you to help you compare the salaries.\n", 197 | " #What quantity do you think it is?\n", 198 | " def helper(array):\n", 199 | " \n", 200 | " _________________\n", 201 | " \n", 202 | " _________________\n", 203 | " length = len(array)\n", 204 | " if ___________:\n", 205 | " return array[___]\n", 206 | " else:\n", 207 | " return __________\n", 208 | " \n", 209 | " med_salary = _____________\n", 210 | " engg_salary = ____________\n", 211 | " law_salary = _____________\n", 212 | " #The max() function takes the maximum of all the values you put into it\n", 213 | " best_salary = max(___________, ______________, ___________) \n", 214 | " if best_salary == engg_salary:\n", 215 | " print(\"Engineering\")\n", 216 | " else:\n", 217 | " print(\"Medicine\")\n" 218 | ] 219 | }, 220 | { 221 | "cell_type": "markdown", 222 | "metadata": {}, 223 | "source": [ 224 | "Now, try to find an array of salaries from these two tables." 
225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": 21, 230 | "metadata": { 231 | "collapsed": false 232 | }, 233 | "outputs": [ 234 | { 235 | "data": { 236 | "text/plain": [ 237 | "array([ 59447., 1175091., 122566., 319328., 128659., 47065.,\n", 238 | " 974240., 826281., 1269676., 38773., 64477., 11796.,\n", 239 | " 39845., 47334., 52967., 32988., 128411., 51686.,\n", 240 | " 47376., 50140., 82389., 118980., 247352., 40841.,\n", 241 | " 7387274., 111044., 17979., 35859., 20084., 47095.,\n", 242 | " 516600., 155303., 128672., 29254., 705009., 49763.,\n", 243 | " 100159., 61469., 40122., 385930., 712630., 17759.,\n", 244 | " 24834., 1149396., 71984., 33327., 69998., 776421.,\n", 245 | " 128962., 69312., 768644., 31159., 21100., 19474.,\n", 246 | " 128970., 420063., 19999., 51754., 25966., 11839.,\n", 247 | " 670158., 47259., 1295646., 50869., 1415971., 29456.,\n", 248 | " 415383., 58739., 1184035., 904750., 115694., 96512.,\n", 249 | " 59169., 38506., 1938982., 99616., 38825., 64635.,\n", 250 | " 41390., 128809., 50176., 25576., 19401., 34714.,\n", 251 | " 51258., 19370., 115666., 51613., 440721., 103755.,\n", 252 | " 130251., 231833., 128848., 462962., 192913., 12866.,\n", 253 | " 948358., 58846., 141097., 375787., 51427., 29770.,\n", 254 | " 31971., 64326., 28430., 514132., 175534., 8524.,\n", 255 | " 24616., 773302., 33404., 774587., 323373., 40059.,\n", 256 | " 40498.])" 257 | ] 258 | }, 259 | "execution_count": 21, 260 | "metadata": {}, 261 | "output_type": "execute_result" 262 | } 263 | ], 264 | "source": [ 265 | "engineers = Table.read_table(\"engineering_data.csv\")\n", 266 | "#Source: http://research.aspiringminds.com/resources/#datasets\n", 267 | "doctors = Table.read_table(\"doctor_salaries.csv\")\n", 268 | "#Source: https://www.glassdoor.com/Salaries/india-doctor-salary-SRCH_IL.0,5_IN115_KO6,12_IP6.htm\n", 269 | "med_strip = list(map(lambda s : s.replace(\",\",\"\"), doctors.column(\"Salary\")))\n", 270 | "\n", 271 | "#This space 
is for you to see what's inside each table.\n" 272 | ] 273 | }, 274 | { 275 | "cell_type": "code", 276 | "execution_count": null, 277 | "metadata": { 278 | "collapsed": false 279 | }, 280 | "outputs": [], 281 | "source": [ 282 | "engg_salaries =\n", 283 | "#We needed to clean up the table of doctors' salaries so that you could do calculations with it\n", 284 | "med_salaries = np.asarray(med_strip).astype(float)" 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": 2, 290 | "metadata": { 291 | "collapsed": true 292 | }, 293 | "outputs": [], 294 | "source": [ 295 | "#Call your salary function here.\n", 296 | "\n", 297 | "__________(________, ___________)" 298 | ] 299 | }, 300 | { 301 | "cell_type": "markdown", 302 | "metadata": {}, 303 | "source": [ 304 | "We hope you got some great practice today! In the next session, we're going to look at how tables can be applied when we're looking at probability." 305 | ] 306 | } 307 | ], 308 | "metadata": { 309 | "kernelspec": { 310 | "display_name": "Python 3", 311 | "language": "python", 312 | "name": "python3" 313 | }, 314 | "language_info": { 315 | "codemirror_mode": { 316 | "name": "ipython", 317 | "version": 3 318 | }, 319 | "file_extension": ".py", 320 | "mimetype": "text/x-python", 321 | "name": "python", 322 | "nbconvert_exporter": "python", 323 | "pygments_lexer": "ipython3", 324 | "version": "3.6.0" 325 | } 326 | }, 327 | "nbformat": 4, 328 | "nbformat_minor": 2 329 | } 330 | -------------------------------------------------------------------------------- /lesson6/.ipynb_checkpoints/lesson6-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": { 7 | "collapsed": true 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "import matplotlib.pyplot as plt\n", 12 | "import math\n", 13 | "import numpy as np\n", 14 | "import matplotlib.mlab as mlab\n", 15 | 
"import random\n", 16 | "from datascience import *\n", 17 | "%matplotlib inline\n", 18 | "import matplotlib.pyplot as plots\n", 19 | "plots.style.use('fivethirtyeight')\n", 20 | "import math" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "# Intro to Probability -- Digging Deeper" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "## Probability Distributions\n", 35 | "\n", 36 | "In the lesson, we ended with an experiment about coin flipping and we compared the empirical probabilities with the theoretical probabilities. Here, the same experiment is simulated, with an extra column for the theoretical probability. The experiment has been run 10 times, 100 times, 1000 times, and 10,000 times so you can see how the difference between empirical and theoretical probability changes as the number of flips grows. Run the cells and note what you see!\n", 37 | "\n", 38 | "Remember the table that we make in this way is known as a \"probability distribution.\" That's just a fancy way of saying that it tells us what the probability of each outcome is." 
39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": null, 44 | "metadata": { 45 | "collapsed": false 46 | }, 47 | "outputs": [], 48 | "source": [ 49 | "run_amount=10\n", 50 | "heads,tails=0,0\n", 51 | "for i in range(run_amount):\n", 52 | " coin=random.randint(0,1)\n", 53 | " if coin==0:\n", 54 | " heads+=1\n", 55 | " else:\n", 56 | " tails+=1\n", 57 | "theoretical_prob=(.5*run_amount)/run_amount\n", 58 | "Table().with_columns(\n", 59 | " 'Side', make_array('heads','tails'),\n", 60 | " 'Count', make_array(heads,tails),\n", 61 | " 'Empirical Probability', make_array(heads/run_amount,tails/run_amount),\n", 62 | " 'Theoretical Probability', make_array(theoretical_prob,theoretical_prob)\n", 63 | ")" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": null, 69 | "metadata": { 70 | "collapsed": false 71 | }, 72 | "outputs": [], 73 | "source": [ 74 | "run_amount=100\n", 75 | "heads,tails=0,0\n", 76 | "for i in range(run_amount):\n", 77 | " coin=random.randint(0,1)\n", 78 | " if coin==0:\n", 79 | " heads+=1\n", 80 | " else:\n", 81 | " tails+=1\n", 82 | "theoretical_prob=(.5*run_amount)/run_amount\n", 83 | "Table().with_columns(\n", 84 | " 'Side', make_array('heads','tails'),\n", 85 | " 'Count', make_array(heads,tails),\n", 86 | " 'Empirical Probability', make_array(heads/run_amount,tails/run_amount),\n", 87 | " 'Theoretical Probability', make_array(theoretical_prob,theoretical_prob)\n", 88 | ")" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": null, 94 | "metadata": { 95 | "collapsed": false 96 | }, 97 | "outputs": [], 98 | "source": [ 99 | "run_amount=1000\n", 100 | "heads,tails=0,0\n", 101 | "for i in range(run_amount):\n", 102 | " coin=random.randint(0,1)\n", 103 | " if coin==0:\n", 104 | " heads+=1\n", 105 | " else:\n", 106 | " tails+=1\n", 107 | "theoretical_prob=(.5*run_amount)/run_amount\n", 108 | "Table().with_columns(\n", 109 | " 'Side', make_array('heads','tails'),\n", 110 | " 'Count', 
make_array(heads,tails),\n", 111 | " 'Empirical Probability', make_array(heads/run_amount,tails/run_amount),\n", 112 | " 'Theoretical Probability', make_array(theoretical_prob,theoretical_prob)\n", 113 | ")" 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "execution_count": null, 119 | "metadata": { 120 | "collapsed": false 121 | }, 122 | "outputs": [], 123 | "source": [ 124 | "run_amount=10000\n", 125 | "heads,tails=0,0\n", 126 | "for i in range(run_amount):\n", 127 | " coin=random.randint(0,1)\n", 128 | " if coin==0:\n", 129 | " heads+=1\n", 130 | " else:\n", 131 | " tails+=1\n", 132 | "theoretical_prob=(.5*run_amount)/run_amount\n", 133 | "Table().with_columns(\n", 134 | " 'Side', make_array('heads','tails'),\n", 135 | " 'Count', make_array(heads,tails),\n", 136 | " 'Empirical Probability', make_array(heads/run_amount,tails/run_amount),\n", 137 | " 'Theoretical Probability', make_array(theoretical_prob,theoretical_prob)\n", 138 | ")\n" 139 | ] 140 | }, 141 | { 142 | "cell_type": "markdown", 143 | "metadata": {}, 144 | "source": [ 145 | "You probably noticed that as the experiment is run more times, the empirical and theoretical probabilities begin to align more and more. This is known as the law of large numbers.\n", 146 | "\n", 147 | "Now it's your turn! Let's consider a different scenario. Suppose we were rolling a die with numbers 1 through 6 on it instead. What's the theoretical probability of rolling any number? What is the empirical probability of rolling a 6 if you do this experiment 100 times? Fill in the code in the cell below with the right values to answer this question! After you do that, try increasing the run_amount to see what happens to the probability distribution." 
148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": null, 153 | "metadata": { 154 | "collapsed": false 155 | }, 156 | "outputs": [], 157 | "source": [ 158 | "run_amount = # fill in this value\n", 159 | "ones, others = 0, 0\n", 160 | "for i in range(run_amount):\n", 161 | " die_roll = random.randint( ) # fill in this function\n", 162 | " if die_roll == 0:\n", 163 | " ones += 1\n", 164 | " else:\n", 165 | " others += 1\n", 166 | "theoretical_prob_ones = # What's the probability of getting a 6 (or any one number)?\n", 167 | "theoretical_prob_others = # What's the probability of getting a value that's not a 6 (or your desired number)? \n", 168 | "#Hint: Use the complement rule\n", 169 | "Table().with_columns(\n", 170 | " 'Side', make_array(\"ones\", \"others\"),\n", 171 | " 'Count', make_array(ones, others),\n", 172 | " 'Empirical Probability', make_array(ones/run_amount,others/run_amount),\n", 173 | " 'Theoretical Probability', make_array(theoretical_prob_ones,theoretical_prob_others)\n", 174 | ")" 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": {}, 180 | "source": [ 181 | "## Normal Distribution\n", 182 | "\n", 183 | "Coin tosses and die rolls are examples of \"discrete\" random variables. There are only a few specific outcomes: a coin toss can be either heads or tails, and a die roll can be 1, 2, 3, 4, 5, or 6. But some random variables are \"continuous\": they can take on any value in a range.\n", 184 | "\n", 185 | "For these continuous random variables, we can't display the probability distribution in a table like we could for coin tosses. Instead, we need to graph the distribution. In the lesson today, you learned about one such graph: the normal distribution. 
Let's take a look at it by running the following cells:" 186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": null, 191 | "metadata": { 192 | "collapsed": false 193 | }, 194 | "outputs": [], 195 | "source": [ 196 | "mu = 100\n", 197 | "variance = 225\n", 198 | "sigma = 15\n", 199 | "x = np.linspace(mu-3*variance,mu+3*variance, 100)\n", 200 | "plt.plot(x,mlab.normpdf(x, mu, sigma))\n", 201 | "plt.axis([0,200, 0, 0.025])\n", 202 | "plt.show()\n" 203 | ] 204 | }, 205 | { 206 | "cell_type": "markdown", 207 | "metadata": {}, 208 | "source": [ 209 | "As you can see, the shape of the normal distribution is affected by two parameters: mu, or its mean, and sigma, or its standard deviation. Here are a few more examples of normal distributions that have different values for mu and sigma:" 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": null, 215 | "metadata": { 216 | "collapsed": false 217 | }, 218 | "outputs": [], 219 | "source": [ 220 | "mu = 0\n", 221 | "variance = 1\n", 222 | "sigma = math.sqrt(variance)\n", 223 | "x = np.linspace(mu-3*variance,mu+3*variance, 100)\n", 224 | "plt.plot(x,mlab.normpdf(x, mu, sigma))\n", 225 | "plt.show()" 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": null, 231 | "metadata": { 232 | "collapsed": false, 233 | "scrolled": true 234 | }, 235 | "outputs": [], 236 | "source": [ 237 | "mu = 0\n", 238 | "variance = 100\n", 239 | "sigma = math.sqrt(variance)\n", 240 | "x = np.linspace(mu-3*variance,mu+3*variance, 100)\n", 241 | "plt.plot(x,mlab.normpdf(x, mu, sigma))\n", 242 | "plt.show()" 243 | ] 244 | }, 245 | { 246 | "cell_type": "markdown", 247 | "metadata": {}, 248 | "source": [ 249 | "Now make your own! 
Insert values for mu and variance, run the cell, and see what happens!\n" 250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": null, 255 | "metadata": { 256 | "collapsed": false 257 | }, 258 | "outputs": [], 259 | "source": [ 260 | "mu = ##\n", 261 | "variance = ##\n", 262 | "sigma = math.sqrt(variance)\n", 263 | "x = np.linspace(mu-3*variance,mu+3*variance, 100)\n", 264 | "plt.plot(x,mlab.normpdf(x, mu, sigma))\n", 265 | "plt.show()" 266 | ] 267 | }, 268 | { 269 | "cell_type": "markdown", 270 | "metadata": { 271 | "collapsed": true 272 | }, 273 | "source": [ 274 | "## Normal Distributions and Coin Tosses\n", 275 | "\n", 276 | "Now we're gonna try an experiment. Earlier, we flipped a coin 100 times and saw how many times it came heads up. Now we're gonna repeat that same experiment a LOT of times and see what happens by plotting the resulting values in a histogram. First run the following cell:" 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": null, 282 | "metadata": { 283 | "collapsed": false 284 | }, 285 | "outputs": [], 286 | "source": [ 287 | "num_trials = 100\n", 288 | "results = [0]*101\n", 289 | "run_amount = 100\n", 290 | "for j in range(num_trials):\n", 291 | " heads = 0\n", 292 | " for i in range(run_amount):\n", 293 | " coin = random.randint(0,1)\n", 294 | " if coin == 0:\n", 295 | " heads += 1\n", 296 | " results[heads] = results[heads] + 1\n", 297 | "\n", 298 | "ticks = np.arange(len(results)) # this tells matplotlib how to arrange the bars\n", 299 | "plt.bar(ticks, results)\n", 300 | "plt.xlabel(\"Number of Heads\")\n", 301 | "plt.ylabel(\"Count\")" 302 | ] 303 | }, 304 | { 305 | "cell_type": "markdown", 306 | "metadata": {}, 307 | "source": [ 308 | "Does the outline of the histogram look familiar? What if we increase the number of trials?" 
309 | ] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": null, 314 | "metadata": { 315 | "collapsed": false 316 | }, 317 | "outputs": [], 318 | "source": [ 319 | "num_trials = 1000\n", 320 | "run_amount = 100\n", 321 | "results = [0]*(run_amount + 1)\n", 322 | "for j in range(num_trials):\n", 323 | " heads = 0\n", 324 | " for i in range(run_amount):\n", 325 | " coin = random.randint(0,1)\n", 326 | " if coin == 0:\n", 327 | " heads += 1\n", 328 | " results[heads] = results[heads] + 1\n", 329 | "\n", 330 | "ticks = np.arange(len(results))\n", 331 | "plt.bar(ticks, results)\n", 332 | "plt.xlabel(\"Number of Heads\")\n", 333 | "plt.ylabel(\"Count\")" 334 | ] 335 | }, 336 | { 337 | "cell_type": "markdown", 338 | "metadata": {}, 339 | "source": [ 340 | "It turns out that the shape of the graph is actually very similar to a normal distribution! This is what makes the normal distribution so powerful - it is a good approximation of what happens when we run a trial with a fixed probability of success many times.\n", 341 | "\n", 342 | "Now, one last exercise. Try to implement the same experiment for a die roll - that is, instead of counting the number of heads, count the number of \"one's\" on a die rolled 100 times for many trials. What happens to the graph? Does it still look like a normal curve? Try experimenting with different values of num_trials too." 
343 | ] 344 | }, 345 | { 346 | "cell_type": "code", 347 | "execution_count": null, 348 | "metadata": { 349 | "collapsed": true 350 | }, 351 | "outputs": [], 352 | "source": [ 353 | "num_trials = #\n", 354 | "run_amount = #\n", 355 | "results = [0]*(run_amount + 1)\n", 356 | "for j in range(num_trials):\n", 357 | " ones = 0\n", 358 | " for i in range(run_amount):\n", 359 | " die_roll = random.randint( ) # fill in this value so that the experiment is a die roll, not a coin toss\n", 360 | " if die_roll == 0:\n", 361 | " ones += 1\n", 362 | " results[ones] = results[ones] + 1\n", 363 | "\n", 364 | "ticks = np.arange(len(results))\n", 365 | "plt.bar(ticks, results)\n", 366 | "plt.xlabel(\"Number of Ones\")\n", 367 | "plt.ylabel(\"Count\")" 368 | ] 369 | } 370 | ], 371 | "metadata": { 372 | "kernelspec": { 373 | "display_name": "Python 3", 374 | "language": "python", 375 | "name": "python3" 376 | }, 377 | "language_info": { 378 | "codemirror_mode": { 379 | "name": "ipython", 380 | "version": 3 381 | }, 382 | "file_extension": ".py", 383 | "mimetype": "text/x-python", 384 | "name": "python", 385 | "nbconvert_exporter": "python", 386 | "pygments_lexer": "ipython3", 387 | "version": "3.6.0" 388 | } 389 | }, 390 | "nbformat": 4, 391 | "nbformat_minor": 2 392 | } 393 | -------------------------------------------------------------------------------- /lesson6/lesson6.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": { 7 | "collapsed": true 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "import matplotlib.pyplot as plt\n", 12 | "import math\n", 13 | "import numpy as np\n", 14 | "import matplotlib.mlab as mlab\n", 15 | "import random\n", 16 | "from datascience import *\n", 17 | "%matplotlib inline\n", 18 | "import matplotlib.pyplot as plots\n", 19 | "plots.style.use('fivethirtyeight')\n", 20 | "import math" 21 | ] 22 | }, 23 | { 24 | "cell_type": 
"markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "# Intro to Probability -- Digging Deeper" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "## Probability Distributions\n", 35 | "\n", 36 | "In the lesson, we ended with an experiment about coin flipping and we compared the empirical probabilities with the theoretical probabilities. Here, the same experiment is simulated, with an extra column for the theoretical probability. The experiment has been run 10 times, 100 times, 1000 times, and 10,000 times so you can see how the difference between empirical and theoretical probability changes as the number of flips grows. Run the cells and note what you see!\n", 37 | "\n", 38 | "Remember the table that we make in this way is known as a \"probability distribution.\" That's just a fancy way of saying that it tells us what the probability of each outcome is." 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": null, 44 | "metadata": { 45 | "collapsed": false 46 | }, 47 | "outputs": [], 48 | "source": [ 49 | "run_amount=10\n", 50 | "heads,tails=0,0\n", 51 | "for i in range(run_amount):\n", 52 | " coin=random.randint(0,1)\n", 53 | " if coin==0:\n", 54 | " heads+=1\n", 55 | " else:\n", 56 | " tails+=1\n", 57 | "theoretical_prob=(.5*run_amount)/run_amount\n", 58 | "Table().with_columns(\n", 59 | " 'Side', make_array('heads','tails'),\n", 60 | " 'Count', make_array(heads,tails),\n", 61 | " 'Empirical Probability', make_array(heads/run_amount,tails/run_amount),\n", 62 | " 'Theoretical Probability', make_array(theoretical_prob,theoretical_prob)\n", 63 | ")" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": null, 69 | "metadata": { 70 | "collapsed": false 71 | }, 72 | "outputs": [], 73 | "source": [ 74 | "run_amount=100\n", 75 | "heads,tails=0,0\n", 76 | "for i in range(run_amount):\n", 77 | " coin=random.randint(0,1)\n", 78 | " if coin==0:\n", 79 | " heads+=1\n", 80 | " else:\n", 81 | " 
tails+=1\n", 82 | "theoretical_prob=(.5*run_amount)/run_amount\n", 83 | "Table().with_columns(\n", 84 | " 'Side', make_array('heads','tails'),\n", 85 | " 'Count', make_array(heads,tails),\n", 86 | " 'Empirical Probability', make_array(heads/run_amount,tails/run_amount),\n", 87 | " 'Theoretical Probability', make_array(theoretical_prob,theoretical_prob)\n", 88 | ")" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": null, 94 | "metadata": { 95 | "collapsed": false 96 | }, 97 | "outputs": [], 98 | "source": [ 99 | "run_amount=1000\n", 100 | "heads,tails=0,0\n", 101 | "for i in range(run_amount):\n", 102 | " coin=random.randint(0,1)\n", 103 | " if coin==0:\n", 104 | " heads+=1\n", 105 | " else:\n", 106 | " tails+=1\n", 107 | "theoretical_prob=(.5*run_amount)/run_amount\n", 108 | "Table().with_columns(\n", 109 | " 'Side', make_array('heads','tails'),\n", 110 | " 'Count', make_array(heads,tails),\n", 111 | " 'Empirical Probability', make_array(heads/run_amount,tails/run_amount),\n", 112 | " 'Theoretical Probability', make_array(theoretical_prob,theoretical_prob)\n", 113 | ")" 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "execution_count": null, 119 | "metadata": { 120 | "collapsed": false 121 | }, 122 | "outputs": [], 123 | "source": [ 124 | "run_amount=10000\n", 125 | "heads,tails=0,0\n", 126 | "for i in range(run_amount):\n", 127 | " coin=random.randint(0,1)\n", 128 | " if coin==0:\n", 129 | " heads+=1\n", 130 | " else:\n", 131 | " tails+=1\n", 132 | "theoretical_prob=(.5*run_amount)/run_amount\n", 133 | "Table().with_columns(\n", 134 | " 'Side', make_array('heads','tails'),\n", 135 | " 'Count', make_array(heads,tails),\n", 136 | " 'Empirical Probability', make_array(heads/run_amount,tails/run_amount),\n", 137 | " 'Theoretical Probability', make_array(theoretical_prob,theoretical_prob)\n", 138 | ")\n" 139 | ] 140 | }, 141 | { 142 | "cell_type": "markdown", 143 | "metadata": {}, 144 | "source": [ 145 | "You probably noticed that as the 
experiment is run more times, the empirical and theoretical probabilities begin to align more and more. This is known as the law of large numbers.\n", 146 | "\n", 147 | "Now it's your turn! Let's consider a different scenario. Suppose we were rolling a die with numbers 1 through 6 on it instead. What's the theoretical probability of rolling any number? What is the empirical probability of rolling a 6 if you do this experiment 100 times? Fill in the code in the cell below with the right values to answer this question! After you do that, try increasing the run_amount to see what happens to the probability distribution." 148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": null, 153 | "metadata": { 154 | "collapsed": false 155 | }, 156 | "outputs": [], 157 | "source": [ 158 | "run_amount = # fill in this value\n", 159 | "ones, others = 0, 0\n", 160 | "for i in range(run_amount):\n", 161 | " die_roll = random.randint( ) # fill in this function\n", 162 | " if die_roll == 0:\n", 163 | " ones += 1\n", 164 | " else:\n", 165 | " others += 1\n", 166 | "theoretical_prob_ones = # What's the probability of getting a 6 (or any one number)?\n", 167 | "theoretical_prob_others = # What's the probability of getting a value that's not a 6 (or your desired number)? \n", 168 | "#Hint: Use the complement rule\n", 169 | "Table().with_columns(\n", 170 | " 'Side', make_array(\"ones\", \"others\"),\n", 171 | " 'Count', make_array(ones, others),\n", 172 | " 'Empirical Probability', make_array(ones/run_amount,others/run_amount),\n", 173 | " 'Theoretical Probability', make_array(theoretical_prob_ones,theoretical_prob_others)\n", 174 | ")" 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": {}, 180 | "source": [ 181 | "## Normal Distribution\n", 182 | "\n", 183 | "Coin tosses and die rolls are examples of \"discrete\" random variables. 
There are only a few specific outcomes: a coin toss can be either heads or tails, and a die roll can be 1, 2, 3, 4, 5, or 6. But some random variables are \"continuous\": they can take on any value in a range.\n", 184 | "\n", 185 | "For these continuous random variables, we can't display the probability distribution in a table like we could for coin tosses. Instead, we need to graph the distribution. In the lesson today, you learned about one such graph: the normal distribution. Let's take a look at it by running the following cells:" 186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": null, 191 | "metadata": { 192 | "collapsed": false 193 | }, 194 | "outputs": [], 195 | "source": [ 196 | "mu = 100\n", 197 | "variance = 225\n", 198 | "sigma = 15\n", 199 | "x = np.linspace(mu-3*variance,mu+3*variance, 100)\n", 200 | "plt.plot(x,mlab.normpdf(x, mu, sigma))\n", 201 | "plt.axis([0,200, 0, 0.025])\n", 202 | "plt.show()\n" 203 | ] 204 | }, 205 | { 206 | "cell_type": "markdown", 207 | "metadata": {}, 208 | "source": [ 209 | "As you can see, the shape of the normal distribution is affected by two parameters: mu, or its mean, and sigma, or its standard deviation. 
Here are a few more examples of normal distributions that have different values for mu and sigma:" 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": null, 215 | "metadata": { 216 | "collapsed": false 217 | }, 218 | "outputs": [], 219 | "source": [ 220 | "mu = 0\n", 221 | "variance = 1\n", 222 | "sigma = math.sqrt(variance)\n", 223 | "x = np.linspace(mu-3*variance,mu+3*variance, 100)\n", 224 | "plt.plot(x,mlab.normpdf(x, mu, sigma))\n", 225 | "plt.show()" 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": null, 231 | "metadata": { 232 | "collapsed": false, 233 | "scrolled": true 234 | }, 235 | "outputs": [], 236 | "source": [ 237 | "mu = 0\n", 238 | "variance = 100\n", 239 | "sigma = math.sqrt(variance)\n", 240 | "x = np.linspace(mu-3*variance,mu+3*variance, 100)\n", 241 | "plt.plot(x,mlab.normpdf(x, mu, sigma))\n", 242 | "plt.show()" 243 | ] 244 | }, 245 | { 246 | "cell_type": "markdown", 247 | "metadata": {}, 248 | "source": [ 249 | "Now make your own! Insert values for mu and variance, run the cell, and see what happens!\n" 250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": null, 255 | "metadata": { 256 | "collapsed": false 257 | }, 258 | "outputs": [], 259 | "source": [ 260 | "mu = ##\n", 261 | "variance = ##\n", 262 | "sigma = math.sqrt(variance)\n", 263 | "x = np.linspace(mu-3*variance,mu+3*variance, 100)\n", 264 | "plt.plot(x,mlab.normpdf(x, mu, sigma))\n", 265 | "plt.show()" 266 | ] 267 | }, 268 | { 269 | "cell_type": "markdown", 270 | "metadata": { 271 | "collapsed": true 272 | }, 273 | "source": [ 274 | "## Normal Distributions and Coin Tosses\n", 275 | "\n", 276 | "Now we're gonna try an experiment. Earlier, we flipped a coin 100 times and saw how many times it came heads up. Now we're gonna repeat that same experiment a LOT of times and see what happens by plotting the resulting values in a histogram. 
First run the following cell:" 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": null, 282 | "metadata": { 283 | "collapsed": false 284 | }, 285 | "outputs": [], 286 | "source": [ 287 | "num_trials = 100\n", 288 | "results = [0]*101\n", 289 | "run_amount = 100\n", 290 | "for j in range(num_trials):\n", 291 | "    heads = 0\n", 292 | "    for i in range(run_amount):\n", 293 | "        coin = random.randint(0,1)\n", 294 | "        if coin == 0:\n", 295 | "            heads += 1\n", 296 | "    results[heads] = results[heads] + 1\n", 297 | "\n", 298 | "ticks = np.arange(len(results)) # this tells matplotlib how to arrange the bars\n", 299 | "plt.bar(ticks, results)\n", 300 | "plt.xlabel(\"Number of Heads\")\n", 301 | "plt.ylabel(\"Count\")" 302 | ] 303 | }, 304 | { 305 | "cell_type": "markdown", 306 | "metadata": {}, 307 | "source": [ 308 | "Does the outline of the histogram look familiar? What if we increase the number of trials?" 309 | ] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": null, 314 | "metadata": { 315 | "collapsed": false 316 | }, 317 | "outputs": [], 318 | "source": [ 319 | "num_trials = 1000\n", 320 | "run_amount = 100\n", 321 | "results = [0]*(run_amount + 1)\n", 322 | "for j in range(num_trials):\n", 323 | "    heads = 0\n", 324 | "    for i in range(run_amount):\n", 325 | "        coin = random.randint(0,1)\n", 326 | "        if coin == 0:\n", 327 | "            heads += 1\n", 328 | "    results[heads] = results[heads] + 1\n", 329 | "\n", 330 | "ticks = np.arange(len(results))\n", 331 | "plt.bar(ticks, results)\n", 332 | "plt.xlabel(\"Number of Heads\")\n", 333 | "plt.ylabel(\"Count\")" 334 | ] 335 | }, 336 | { 337 | "cell_type": "markdown", 338 | "metadata": {}, 339 | "source": [ 340 | "It turns out that the shape of the graph is actually very similar to a normal distribution! 
This is what makes the normal distribution so powerful - it is a good approximation of what happens when we run a trial with a fixed probability of success many times.\n", 341 | "\n", 342 | "Now, one last exercise. Try to implement the same experiment for a die roll - that is, instead of counting the number of heads, count the number of ones on a die rolled 100 times for many trials. What happens to the graph? Does it still look like a normal curve? Try experimenting with different values of num_trials too." 343 | ] 344 | }, 345 | { 346 | "cell_type": "code", 347 | "execution_count": null, 348 | "metadata": { 349 | "collapsed": true 350 | }, 351 | "outputs": [], 352 | "source": [ 353 | "num_trials = #\n", 354 | "run_amount = #\n", 355 | "results = [0]*(run_amount + 1)\n", 356 | "for j in range(num_trials):\n", 357 | "    ones = 0\n", 358 | "    for i in range(run_amount):\n", 359 | "        die_roll = random.randint( ) # fill in the arguments so that the experiment is a die roll, not a coin toss\n", 360 | "        if die_roll == 1:\n", 361 | "            ones += 1\n", 362 | "    results[ones] = results[ones] + 1\n", 363 | "\n", 364 | "ticks = np.arange(len(results))\n", 365 | "plt.bar(ticks, results)\n", 366 | "plt.xlabel(\"Number of Ones\")\n", 367 | "plt.ylabel(\"Count\")" 368 | ] 369 | } 370 | ], 371 | "metadata": { 372 | "kernelspec": { 373 | "display_name": "Python 3", 374 | "language": "python", 375 | "name": "python3" 376 | }, 377 | "language_info": { 378 | "codemirror_mode": { 379 | "name": "ipython", 380 | "version": 3 381 | }, 382 | "file_extension": ".py", 383 | "mimetype": "text/x-python", 384 | "name": "python", 385 | "nbconvert_exporter": "python", 386 | "pygments_lexer": "ipython3", 387 | "version": "3.6.0" 388 | } 389 | }, 390 | "nbformat": 4, 391 | "nbformat_minor": 2 392 | } 393 | -------------------------------------------------------------------------------- /lesson7/british_india_troops.csv: -------------------------------------------------------------------------------- 1 | 
year,british_bengal,native_bengal,total_bengal,british_madras,native_madras,total_madras,british_bombay,native_bombay,total_bombay 2 | 1840,16303,102055,118358,12371,59711,72082,6930,38073,45003 3 | 1841,18873,106907,125780,11979,63183,75162,7554,42526,50080 4 | 1842,21114,109078,130192,12183,61378,73561,8816,42168,50984 5 | 1843,22007,113762,135769,14113,63804,77917,10606,43381,53987 6 | 1844,21645,112034,133679,14078,62547,76625,10517,41999,52516 7 | 1845,21783,133525,155308,14354,61953,76307,9974,44832,54806 8 | 1846,20445,133561,154006,12794,63217,76011,10775,43955,54730 9 | 1847,20898,132848,153746,12775,60904,73679,10650,53721,64371 10 | 1848,20596,114577,135173,12650,54806,67456,11024,51508,62532 11 | 1849,22727,124917,147644,12031,53697,65728,13135,50516,63651 12 | 1850,26803,126910,153713,11662,53867,65529,10815,47671,58486 13 | 1851,27159,138142,165301,11584,53667,65251,10665,48312,58977 14 | 1852,26089,139807,165896,11687,53714,65401,10933,45552,56485 15 | 1853,24986,139246,164232,11370,53787,65157,10577,45312,55889 16 | 1854,26531,138674,165205,11172,53254,64426,9443,44921,54364 17 | 1855,25344,139162,164506,10927,53031,63958,9822,44898,54720 18 | 1856,24594,137109,161703,10352,53201,63553,10158,44911,55069 19 | 1857,24366,135767,160133,10726,51244,61970,10430,45213,55643 20 | 1859,62167,82687,144854,17091,67141,84232,27032,46415,73447 21 | 1860,57778,91898,149676,17851,78440,96291,17237,42664,59901 22 | 1861,51791,86620,138411,18257,63727,81984,14246,34325,48571 23 | 1862,47912,39210,87122,16421,55687,72108,13841,31016,44857 24 | 1863,46614,40945,87559,15113,50964,66077,14358,28866,43224 25 | 1864,45283,42938,88221,15583,50131,65714,14095,27991,42086 26 | 1865,42128,43796,85924,16002,46693,62695,13750,27826,41576 -------------------------------------------------------------------------------- /lesson7/foreign_tourists.csv: -------------------------------------------------------------------------------- 1 | 
Nationality,Region,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010 2 | Canada,NORTH AMERICA,88600,93598,107671,135884,157643,176567,208214,222364,224069,242372 3 | U.S.A.,NORTH AMERICA,329147,348182,410803,526120,611165,696739,799062,804933,827140,931292 4 | Total,NORTH AMERICA,417747,441780,518474,662004,768808,873306,1007276,1027297,1051209,1173664 5 | Argentina,CENTRAL AND SOUTH AMERICA,2906,1359,1805,2799,3313,4493,4992,5087,6011,7626 6 | Brazil,CENTRAL AND SOUTH AMERICA,3819,3622,4528,7397,7005,9148,10788,11530,13964,15219 7 | Mexico,CENTRAL AND SOUTH AMERICA,3473,3105,3563,4570,5398,6502,8299,9272,8185,10458 8 | Others,CENTRAL AND SOUTH AMERICA,11727,9586,11758,13399,19870,18602,18240,17616,18444,29425 9 | Total,CENTRAL AND SOUTH AMERICA,21925,17672,21654,28165,35586,38745,42319,43505,46604,62728 10 | Austria,WESTERN EUROPE,17787,13801,16903,21093,27187,28045,26692,25900,27930,32620 11 | Belgium,WESTERN EUROPE,18851,13945,17309,24007,25596,29156,34207,36277,34759,37709 12 | Denmark,WESTERN EUROPE,14531,10230,11327,15805,20170,21592,28347,34253,30857,35541 13 | Finland,WESTERN EUROPE,8186,7673,8001,12525,16258,22860,34364,29223,24874,24089 14 | France,WESTERN EUROPE,102434,78194,97654,131824,152258,175345,204827,207802,196462,225232 15 | Germany,WESTERN EUROPE,80011,64891,76868,116679,120243,156808,184195,204344,191616,227720 16 | Greece,WESTERN EUROPE,3996,3207,3455,4468,4793,5146,6455,6672,6664,7441 17 | Ireland,WESTERN EUROPE,6136,5793,7083,8996,10052,14936,18376,18924,19223,20329 18 | Italy,WESTERN EUROPE,41351,37136,46908,65561,67642,79978,93540,85766,77873,94100 19 | Netherland,WESTERN EUROPE,42368,31669,40565,51211,52755,58611,67429,71605,64580,70756 20 | Norway,WESTERN EUROPE,7667,7475,8400,10631,11194,14216,19484,22369,22092,22229 21 | Portugal,WESTERN EUROPE,7028,7262,8158,10648,11457,13108,15756,15415,17184,21038 22 | Spain,WESTERN EUROPE,23073,19567,30551,42895,45247,53520,63357,62535,59047,72591 23 | Sweden,WESTERN 
EUROPE,14446,15330,18098,26154,28799,36013,47090,58961,43327,45028 24 | Switzerland,WESTERN EUROPE,25308,21606,24463,28260,34311,37446,41172,42107,38290,43134 25 | U.K.,WESTERN EUROPE,405472,387846,430917,555907,651083,734240,796191,776530,769251,759494 26 | Others,WESTERN EUROPE,1328,1158,1306,1633,3074,6251,4601,10842,10013,11291 27 | Total,WESTERN EUROPE,819973,726783,847966,1128297,1282119,1487271,1686083,1709525,1634042,1750342 28 | Czechoslovakia, EASTERN EUROPE,3197,2561,3466,4114,4783,5760,7764,8549,8328,9918 29 | Hungary, EASTERN EUROPE,1939,1557,1997,3527,3704,4262,5073,5263,4980,6022 30 | Poland, EASTERN EUROPE,5181,4468,6336,8445,10983,14808,20166,23517,19656,25424 31 | CIS, EASTERN EUROPE,25032,28304,38947,61816,75863,87433,109769,140341,135854,170112 32 | Others, EASTERN EUROPE,3514,3738,4506,5153,6077,7928,7946,10607,11565,12603 33 | Total, EASTERN EUROPE,38863,40628,55252,82426,101445,121309,152764,191110,183475,227650 34 | Egypt,AFRICA,2479,2688,3382,3781,4048,5528,6328,5326,5869,8017 35 | Ethiopia,AFRICA,2897,2535,2301,2661,3248,3140,3588,3306,3936,3797 36 | Kenya,AFRICA,15973,17275,16563,17538,19816,20313,25397,14941,22704,29223 37 | Mali,AFRICA,85,54,57,2541,114,162,238,232,273,495 38 | Mauritius,AFRICA,16039,14425,16308,19823,19760,20607,21522,19713,18866,21672 39 | Nigeria,AFRICA,7539,5997,5713,6659,10049,9348,10863,13997,18338,23893 40 | South Africa,AFRICA,21162,18238,23873,32148,39229,41954,46042,42337,44308,55688 41 | Sudan,AFRICA,2323,1899,2025,2487,3660,4355,4381,3473,4987,7418 42 | Tanzania,AFRICA,6579,7459,8515,9953,11193,11954,13960,14872,17020,17645 43 | Zambia,AFRICA,1290,1126,1383,1468,1848,2069,2814,1995,2249,2621 44 | Others,AFRICA,14596,11761,13233,16434,21836,23383,22352,21558,25924,34056 45 | Total,AFRICA,90962,83457,93353,115493,134801,142813,157485,141750,164474,204525 46 | Bahrein, WEST ASIA,3945,3754,4182,4414,4923,4793,6674,7224,7901,7766 47 | Israel, WEST ASIA,28774,25503,32157,39083,42866,42735,47553,42720,40581,43456 
48 | Jordan, WEST ASIA,1428,1768,1686,2400,3333,3933,4537,4154,4301,4640 49 | Kuwait, WEST ASIA,1850,1838,2361,2965,3103,3773,4129,5302,5208,4764 50 | Oman, WEST ASIA,13114,13256,12352,14927,14979,17849,22284,34042,32971,35485 51 | Qatar, WEST ASIA,1361,1215,1434,1788,2176,2392,2606,2934,2765,2735 52 | Saudi Arabia, WEST ASIA,9851,8663,9961,11929,12444,14006,16352,16983,15552,21599 53 | Syria, WEST ASIA,1501,1452,1661,2289,2385,2645,2928,2883,3215,3586 54 | Turkey, WEST ASIA,2432,3354,5528,7008,7906,10221,11212,10934,10282,15483 55 | U.A.E., WEST ASIA,21483,22027,21374,22668,24560,27593,32750,63502,47234,45482 56 | Yemen Arab Rep., WEST ASIA,7773,6772,7717,8826,9423,9573,10898,11583,12695,14931 57 | Others, WEST ASIA,2912,2962,3183,4511,5723,7180,9738,13281,22138,35390 58 | Total, WEST ASIA,96424,92562,103596,122808,133821,146693,171661,215542,204843,235317 59 | Afghanistan,SOUTH ASIA,1248,6012,10079,12705,14025,18799,23045,32438,50446,73389 60 | Iran,SOUTH ASIA,11728,11815,17539,24733,28691,29771,33223,30149,34652,49265 61 | Maldives,SOUTH ASIA,17564,18826,18345,21099,33915,37652,45787,54956,55159,58152 62 | Nepal,SOUTH ASIA,41135,43056,42771,51534,77024,91552,83037,78133,88785,104374 63 | Pakistan,SOUTH ASIA,52762,2946,10364,67416,88609,83426,106283,85529,53137,51739 64 | Bangladesh,SOUTH ASIA,431312,435867,454611,477446,456371,484401,480240,541884,468899,431962 65 | Sri Lanka,SOUTH ASIA,112813,108008,109098,128711,136400,154813,204084,218805,239995,266515 66 | Bhutan,SOUTH ASIA,3571,4123,4082,7054,6934,8502,6729,9952,10328,12048 67 | Total,SOUTH ASIA,672133,630653,666889,790698,841969,908916,982428,1051846,1001401,1047444 68 | Indonesia,SOUTH EAST ASIA,7767,8694,9078,11408,12640,16990,17818,19609,20068,26171 69 | Malayasia,SOUTH EAST ASIA,57869,63748,70750,84390,96276,107286,112741,115794,135343,179077 70 | Maynnar,SOUTH EAST ASIA,3417,3037,3609,4932,5652,7734,7977,12147,12849,14719 71 | Philippines,SOUTH EAST 
ASIA,7199,7647,8091,10492,11422,15644,15567,17222,21987,24534 72 | Singapore,SOUTH EAST ASIA,42824,44306,48368,60710,68666,82574,92908,97851,95328,107487 73 | Thailand,SOUTH EAST ASIA,18623,19649,25754,33442,41978,46623,50037,58065,67309,76617 74 | Others,SOUTH EAST ASIA,2276,2210,3276,3736,4774,4875,6427,12237,7307,10438 75 | Total,SOUTH EAST ASIA,139975,149291,168926,209110,241408,281726,303475,332925,360191,439043 76 | China,EAST ASIA,18657,23207,33837,52279,63791,88833,118127,127032,123673,143445 77 | Hong Kong,EAST ASIA,918,581,1070,1965,1908,1466,914,519,1396,1507 78 | Japan,EAST ASIA,80634,59709,77996,96851,103082,119292,145538,145352,124756,168019 79 | Korea,EAST ASIA,29685,31552,36982,49314,53585,71920,85292,79824,70489,95612 80 | Others,EAST ASIA,570,375,621,1218,1201,1474,2166,2503,2483,3364 81 | Total,EAST ASIA,130464,115424,150506,201627,223567,282985,352037,355230,322797,411947 82 | Australia,AUSTRALASIA,52691,50743,58730,81608,96258,109867,135925,146209,149074,169647 83 | New Zealand,AUSTRALASIA,11700,10811,13283,16762,20463,23493,27498,29261,30876,37024 84 | Fiji,AUSTRALASIA,1422,1499,1519,2003,2326,2412,2635,2129,2031,2508 85 | Others,AUSTRALASIA,291,208,317,571,731,1664,1005,709,470,1096 86 | Total,AUSTRALASIA,66104,63261,73849,100944,119778,137436,167063,178308,182451,210275 87 | STATELESS, STATELESS,9393,13487,15516,1434,13490,647,26237,1025,624,670 88 | Others, Others,33319,9366,10232,13692,21818,25320,32676,34540,15588,12087 89 | -------------------------------------------------------------------------------- /lesson8/.ipynb_checkpoints/Lesson8_Correlation-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": { 13 | "deletable": true, 14 | "editable": true 15 | }, 16 | "source": [ 17 | "# Lesson 8: Correlation\n", 18 | "\n", 19 | "In 
our last notebook, we learned how to use the power of programming to visualize data. But that's not all we can do with these Python libraries. Today we'll take our programming toolkit one step further and see how we can use the same libraries to calculate numerical information about our data after we've plotted it. Visualizations and numerical information together can tell us detailed stories about our data and what it means.\n", 20 | "\n", 21 | "We'll begin by using matplotlib - the same library we used to make line graphs, bar graphs, and pie charts - to explore the process of visualizing data using scatter plots. We'll then see how we can use another powerful library called numpy to numerically investigate the relationships between variables in our data by computing correlation coefficients. Finally, we'll also look into plotting lines of best fit as an introduction to the concept of regression." 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": null, 27 | "metadata": { 28 | "collapsed": true, 29 | "deletable": true, 30 | "editable": true 31 | }, 32 | "outputs": [], 33 | "source": [ 34 | "import matplotlib\n", 35 | "matplotlib.use('Agg')\n", 36 | "from datascience import Table\n", 37 | "%matplotlib inline\n", 38 | "import matplotlib.pyplot as plt\n", 39 | "import numpy as np\n", 40 | "plt.style.use('fivethirtyeight')" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": null, 46 | "metadata": { 47 | "collapsed": true, 48 | "deletable": true, 49 | "editable": true 50 | }, 51 | "outputs": [], 52 | "source": [ 53 | "def checker(strong, positive):\n", 54 | "    if strong:\n", 55 | "        if positive:\n", 56 | "            print(\"Try Again!\")\n", 57 | "        else:\n", 58 | "            print(\"You are correct! 
You now know how to make and read scatterplots to analyze trends in data!\") \n", 59 | "    else:\n", 60 | "        print(\"Try Again!\")" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": { 66 | "deletable": true, 67 | "editable": true 68 | }, 69 | "source": [ 70 | "## Scatterplots\n", 71 | "\n", 72 | "Remember from today's lesson that when we have data about two variables, the best way to visualize it is through something called a \"scatterplot.\" We can use our handy datascience library to quickly make lots of scatterplots.\n", 73 | "\n", 74 | "Let's start with the example of classroom participation from the lesson. Below we'll create a new Table using that data and then use the datascience \".scatter('Column Name 1', 'Column Name 2')\" method to create a scatterplot." 75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": null, 80 | "metadata": { 81 | "collapsed": false, 82 | "deletable": true, 83 | "editable": true 84 | }, 85 | "outputs": [], 86 | "source": [ 87 | "classroom_participation = Table().with_columns([\n", 88 | "    'Student',['Sathvik','Anjali', 'Shreyan','Chaaru','Rishi','Divya'],\n", 89 | "    '1st Week', [3,1,1,2,2,3],\n", 90 | "    '12th Week', [7,2,4,5,6,5]\n", 91 | "])\n", 92 | "classroom_participation" 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": null, 98 | "metadata": { 99 | "collapsed": false, 100 | "deletable": true, 101 | "editable": true 102 | }, 103 | "outputs": [], 104 | "source": [ 105 | "classroom_participation.scatter('1st Week', '12th Week')" 106 | ] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": { 111 | "deletable": true, 112 | "editable": true 113 | }, 114 | "source": [ 115 | "Scatterplots are useful for illustrating \"trends\" in our data. 
As we can see from this scatterplot, students who tended to participate less in the first week also tended to participate less in the twelfth week - like Anjali, for example, who participated the least out of all the students in the 1st and 12th weeks. Similarly, students who tended to participate more in the first week also participated more in the twelfth week. This is exactly what we mean by a trend. \n", 116 | "\n", 117 | "This is an example of a \"weak positive\" correlation. It's weak because the data points don't form exactly a straight line, and it's positive because the data points roughly go from the bottom left to the top right in our scatterplot (positive slope). Now let's look at another example. Fill in the following cells to make a scatterplot of classroom absences vs. final grade in the class." 118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": null, 123 | "metadata": { 124 | "collapsed": false, 125 | "deletable": true, 126 | "editable": true 127 | }, 128 | "outputs": [], 129 | "source": [ 130 | "student_abscences = Table().with_columns([\n", 131 | "    'Student',['Rashi','Amit', 'Simaran','Shruti','Eesha','Arpit'],\n", 132 | "    'Absences', [0,1,2,3,4,7],\n", 133 | "    'Grades', [99,95,90,80,75,68]\n", 134 | "])\n", 135 | "student_abscences" 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": null, 141 | "metadata": { 142 | "collapsed": false, 143 | "deletable": true, 144 | "editable": true 145 | }, 146 | "outputs": [], 147 | "source": [ 148 | "student_abscences.scatter( ) # fill in this line to make the scatterplot" 149 | ] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": { 154 | "deletable": true, 155 | "editable": true 156 | }, 157 | "source": [ 158 | "Is this data strongly or weakly correlated? And is the correlation positive or negative? Put your answers in the cell below and run it to find out if you're right!" 
159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": null, 164 | "metadata": { 165 | "collapsed": false, 166 | "deletable": true, 167 | "editable": true 168 | }, 169 | "outputs": [], 170 | "source": [ 171 | "strong = # put True here if you think it's a strong correlation and False if it's weak\n", 172 | "positive = # put True here if you think the data is positively correlated, and False if it's negatively correlated\n", 173 | "\n", 174 | "checker(strong, positive)" 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": { 180 | "deletable": true, 181 | "editable": true 182 | }, 183 | "source": [ 184 | "## Correlation Coefficients\n", 185 | "\n", 186 | "Scatterplots can tell us a lot about trends in our data, but just knowing whether our data is strongly or weakly correlated and whether the correlation is positive or negative isn't enough. You might ask yourself, \"How strong is the correlation? Are some correlations stronger than others?\"\n", 187 | "\n", 188 | "This is where the idea of a correlation coefficient comes in. The correlation coefficient is a number, known as r, between -1 and 1 that can be calculated for any two variables. If r is negative the correlation is negative, and if it's positive the correlation is positive. And the \"magnitude\" of r (in other words, how close |r| is to 1) tells us how strong the correlation is.\n", 189 | "\n", 190 | "In the following cells, we'll walk you through an example of calculating the correlation coefficient for a dataset. For our dataset, let's take a look at some data we first saw all the way back in lesson 1. We'll use the datascience library to create a Table with fertility, population, life expectancy, and child mortality data for all the countries in the world for the year 2015." 
191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": null, 196 | "metadata": { 197 | "collapsed": false, 198 | "deletable": true, 199 | "editable": true 200 | }, 201 | "outputs": [], 202 | "source": [ 203 | "fertility_data = Table.read_table('fertility.csv').where('time', 2015).drop(\"time\")\n", 204 | "population_data = Table.read_table('population.csv').where('time', 2015).drop(\"time\")\n", 205 | "life_expectancy_data = Table.read_table('life_expectancy.csv').where('time', 2015).drop(\"time\")\n", 206 | "child_mortality_data = Table.read_table('child_mortality.csv').where('time', 2015).drop(\"time\")\n", 207 | "\n", 208 | "joined_data = fertility_data.join(\"geo\", population_data)\\\n", 209 | " .join(\"geo\", life_expectancy_data)\\\n", 210 | " .join(\"geo\", child_mortality_data)\n", 211 | "joined_data = joined_data.relabeled(\"children_per_woman_total_fertility\", \"fertility\")\\\n", 212 | " .relabeled(\"population_total\", \"population\")\\\n", 213 | " .relabeled(\"life_expectancy_years\", \"life expectancy\")\\\n", 214 | " .relabeled(\"child_mortality_0_5_year_olds_dying_per_1000_born\", \"child mortality\")\n", 215 | "joined_data" 216 | ] 217 | }, 218 | { 219 | "cell_type": "markdown", 220 | "metadata": { 221 | "deletable": true, 222 | "editable": true 223 | }, 224 | "source": [ 225 | "Let's take a look at the \"child mortality\" and \"life expectancy\" columns. We'd expect these two to be negatively correlated, since the higher the risk of dying as a child, the lower the expected lifetime should be. We can see if this trend holds true by first plotting the data." 
226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": null, 231 | "metadata": { 232 | "collapsed": false, 233 | "deletable": true, 234 | "editable": true 235 | }, 236 | "outputs": [], 237 | "source": [ 238 | "joined_data.scatter(\"life expectancy\", \"child mortality\")" 239 | ] 240 | }, 241 | { 242 | "cell_type": "markdown", 243 | "metadata": { 244 | "deletable": true, 245 | "editable": true 246 | }, 247 | "source": [ 248 | "Yup, in fact the data seems negatively correlated. How strong is the correlation exactly? Well to answer that let's compute the correlation coefficient, using the numpy library. First, let's get the two columns we want." 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "execution_count": null, 254 | "metadata": { 255 | "collapsed": true, 256 | "deletable": true, 257 | "editable": true 258 | }, 259 | "outputs": [], 260 | "source": [ 261 | "life_expectancy = joined_data.column(\"life expectancy\")\n", 262 | "child_mortality = joined_data.column(\"child mortality\")" 263 | ] 264 | }, 265 | { 266 | "cell_type": "markdown", 267 | "metadata": { 268 | "deletable": true, 269 | "editable": true 270 | }, 271 | "source": [ 272 | "Remember that the next step in computing the correlation coefficient is calculating the mean and standard deviation of our two variables. Numpy lets us do this with the convenient \"np.mean\" and \"np.std\" functions. Here's how to use them." 
273 | ] 274 | }, 275 | { 276 | "cell_type": "code", 277 | "execution_count": null, 278 | "metadata": { 279 | "collapsed": true, 280 | "deletable": true, 281 | "editable": true 282 | }, 283 | "outputs": [], 284 | "source": [ 285 | "avg_life_expectancy = np.mean(life_expectancy)\n", 286 | "stddev_life_expectancy = np.std(life_expectancy)\n", 287 | "avg_child_mortality = np.mean(child_mortality)\n", 288 | "stddev_child_mortality = np.std(child_mortality)" 289 | ] 290 | }, 291 | { 292 | "cell_type": "markdown", 293 | "metadata": { 294 | "deletable": true, 295 | "editable": true 296 | }, 297 | "source": [ 298 | "The next step is to transform the data by calculating z_x = (x - x_mean)/x_stddev and z_y = (y - y_mean)/y_stddev for each x and y value, then multiplying together to get z_x * z_y. We can use the numpy \"subtract,\" \"divide,\" and \"multiply\" functions for this." 299 | ] 300 | }, 301 | { 302 | "cell_type": "code", 303 | "execution_count": null, 304 | "metadata": { 305 | "collapsed": true, 306 | "deletable": true, 307 | "editable": true 308 | }, 309 | "outputs": [], 310 | "source": [ 311 | "transformed_life_expectancy = np.divide(np.subtract(life_expectancy, avg_life_expectancy), stddev_life_expectancy)\n", 312 | "transformed_child_mortality = np.divide(np.subtract(child_mortality, avg_child_mortality), stddev_child_mortality)\n", 313 | "products = np.multiply(transformed_life_expectancy, transformed_child_mortality)" 314 | ] 315 | }, 316 | { 317 | "cell_type": "markdown", 318 | "metadata": { 319 | "deletable": true, 320 | "editable": true 321 | }, 322 | "source": [ 323 | "Finally, we add up the values in this array and divide by n, where n is the number of values, to get the final correlation coefficient." 
324 | ] 325 | }, 326 | { 327 | "cell_type": "code", 328 | "execution_count": null, 329 | "metadata": { 330 | "collapsed": false, 331 | "deletable": true, 332 | "editable": true 333 | }, 334 | "outputs": [], 335 | "source": [ 336 | "correlation_coefficient = np.sum(products) / len(products)\n", 337 | "print(\"Correlation coefficient: \", correlation_coefficient)" 338 | ] 339 | }, 340 | { 341 | "cell_type": "markdown", 342 | "metadata": { 343 | "deletable": true, 344 | "editable": true 345 | }, 346 | "source": [ 347 | "As you can see the correlation coefficient is negative and very close to -1, indicating that this is a strong negative correlation as expected. \n", 348 | "\n", 349 | "Now it's your turn. Let's take a look at another pair of variables: fertility and population. Fill in the following cells to make the scatterplot." 350 | ] 351 | }, 352 | { 353 | "cell_type": "code", 354 | "execution_count": null, 355 | "metadata": { 356 | "collapsed": false, 357 | "deletable": true, 358 | "editable": true 359 | }, 360 | "outputs": [], 361 | "source": [ 362 | "joined_data.scatter( ) # fill this in to get a scatterplot of fertility vs population" 363 | ] 364 | }, 365 | { 366 | "cell_type": "markdown", 367 | "metadata": { 368 | "deletable": true, 369 | "editable": true 370 | }, 371 | "source": [ 372 | "Now guess whether the correlation is strong or weak, and if it's positive or negative. Then fill in the following cells to find out if you're right by computing the correlation coefficient!" 
373 | ] 374 | }, 375 | { 376 | "cell_type": "code", 377 | "execution_count": null, 378 | "metadata": { 379 | "collapsed": false, 380 | "deletable": true, 381 | "editable": true 382 | }, 383 | "outputs": [], 384 | "source": [ 385 | "fertility = joined_data.column( )\n", 386 | "population = joined_data.column( )\n", 387 | "\n", 388 | "avg_fertility = \n", 389 | "stddev_fertility = \n", 390 | "avg_population = \n", 391 | "stddev_population = \n", 392 | "\n", 393 | "transformed_fertility = np.divide(np.subtract( ), )\n", 394 | "transformed_population = np.divide(np.subtract( ), )\n", 395 | "products = \n", 396 | "\n", 397 | "correlation_coefficient = np.sum( ) / ( )\n", 398 | "print(\"This is the correlation coefficient: \", correlation_coefficient)" 399 | ] 400 | }, 401 | { 402 | "cell_type": "markdown", 403 | "metadata": { 404 | "deletable": true, 405 | "editable": true 406 | }, 407 | "source": [ 408 | "A much quicker way to compute the correlation coefficient is to use the np.corrcoef() function. To check if you filled in the previous cells correctly and found the right value, run the next cell to see the right answer." 409 | ] 410 | }, 411 | { 412 | "cell_type": "code", 413 | "execution_count": null, 414 | "metadata": { 415 | "collapsed": false, 416 | "deletable": true, 417 | "editable": true, 418 | "scrolled": true 419 | }, 420 | "outputs": [], 421 | "source": [ 422 | "print(\"This is the correlation coefficient: \", \n", 423 | " np.corrcoef(joined_data.column(\"population\"), joined_data.column(\"fertility\"))[1,0])" 424 | ] 425 | }, 426 | { 427 | "cell_type": "markdown", 428 | "metadata": { 429 | "deletable": true, 430 | "editable": true 431 | }, 432 | "source": [ 433 | "## Line of Best Fit\n", 434 | "\n", 435 | "The last thing you learned about correlation is the idea of a line of best fit. This is a line that roughly \"fits\" the data by describing the ideal linear relationship between the data points. 
Luckily, we can plot lines of best fit easily using the datascience library by adding an additional argument, \"fit_line = True\", to our call to the scatter function. Here's an example." 436 | ] 437 | }, 438 | { 439 | "cell_type": "code", 440 | "execution_count": null, 441 | "metadata": { 442 | "collapsed": false, 443 | "deletable": true, 444 | "editable": true 445 | }, 446 | "outputs": [], 447 | "source": [ 448 | "joined_data.scatter(\"life expectancy\", \"child mortality\", fit_line=True)" 449 | ] 450 | }, 451 | { 452 | "cell_type": "markdown", 453 | "metadata": { 454 | "deletable": true, 455 | "editable": true 456 | }, 457 | "source": [ 458 | "When the data is strongly correlated (correlation coefficient close to 1 or -1), the line fits most of the data points closely. But when the data is not strongly correlated (correlation coefficient close to 0), the line of best fit doesn't seem to fit the data at all. Let's see what the line of best fit looks like for population and fertility. Fill in the following cell to create a scatterplot of population and fertility with a best fit line." 
459 | ] 460 | }, 461 | { 462 | "cell_type": "code", 463 | "execution_count": null, 464 | "metadata": { 465 | "collapsed": false, 466 | "deletable": true, 467 | "editable": true 468 | }, 469 | "outputs": [], 470 | "source": [ 471 | "joined_data.scatter( ) # fill this in" 472 | ] 473 | }, 474 | { 475 | "cell_type": "markdown", 476 | "metadata": { 477 | "deletable": true, 478 | "editable": true 479 | }, 480 | "source": [ 481 | "In the next lesson, we'll learn how to actually mathematically find the equation for this line of best fit by using a technique known as \"regression.\"" 482 | ] 483 | } 484 | ], 485 | "metadata": { 486 | "celltoolbar": "Raw Cell Format", 487 | "kernelspec": { 488 | "display_name": "Python 3", 489 | "language": "python", 490 | "name": "python3" 491 | }, 492 | "language_info": { 493 | "codemirror_mode": { 494 | "name": "ipython", 495 | "version": 3 496 | }, 497 | "file_extension": ".py", 498 | "mimetype": "text/x-python", 499 | "name": "python", 500 | "nbconvert_exporter": "python", 501 | "pygments_lexer": "ipython3", 502 | "version": "3.6.0" 503 | } 504 | }, 505 | "nbformat": 4, 506 | "nbformat_minor": 2 507 | } 508 | -------------------------------------------------------------------------------- /lesson8/Lesson8_Correlation.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": { 13 | "deletable": true, 14 | "editable": true 15 | }, 16 | "source": [ 17 | "# Lesson 8: Correlation\n", 18 | "\n", 19 | "In our last notebook, we learned how to use the power of programming to visualize data. But that's not all we can do with these Python libraries. Today we'll take our programming toolkit one step further and see how we can use the same libraries to calculate numerical information about our data after we've plotted it. 
Visualizations and numerical information together can tell us detailed stories about our data and what it means.\n", 20 | "\n", 21 | "We'll begin by using matplotlib - the same library we used to make line graphs, bar graphs, and pie charts - to explore the process of visualizing data using scatter plots. We'll then see how we can use another powerful library called numpy to numerically investigate the relationships between variables in our data by computing correlation coefficients. Finally we'll also look into plotting lines of best fit as an introduction to the concept of regression." 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": null, 27 | "metadata": { 28 | "collapsed": true, 29 | "deletable": true, 30 | "editable": true 31 | }, 32 | "outputs": [], 33 | "source": [ 34 | "import matplotlib\n", 35 | "matplotlib.use('Agg')\n", 36 | "from datascience import Table\n", 37 | "%matplotlib inline\n", 38 | "import matplotlib.pyplot as plt\n", 39 | "import numpy as np\n", 40 | "plt.style.use('fivethirtyeight')" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": null, 46 | "metadata": { 47 | "collapsed": true, 48 | "deletable": true, 49 | "editable": true 50 | }, 51 | "outputs": [], 52 | "source": [ 53 | "def checker(strong, positive):\n", 54 | " if strong:\n", 55 | " if positive:\n", 56 | " print(\"Try Again!\")\n", 57 | " else:\n", 58 | " print(\"You are correct! 
You now know how to make and read scatterplots to analyze trends in data!\") \n", 59 | " else:\n", 60 | " print(\"Try Again!\")" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": { 66 | "deletable": true, 67 | "editable": true 68 | }, 69 | "source": [ 70 | "## Scatterplots\n", 71 | "\n", 72 | "Remember from today's lesson that when we have data about two variables, the best way to visualize it is through something called a \"scatterplot.\" We can use our handy datascience library to quickly make lots of scatterplots.\n", 73 | "\n", 74 | "Let's start with the example of classroom participation from the lesson. Below we'll create a new Table using that data and then use the datascience \".scatter('Column Name 1', 'Column Name 2')\" function to create a scatterplot." 75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": null, 80 | "metadata": { 81 | "collapsed": false, 82 | "deletable": true, 83 | "editable": true 84 | }, 85 | "outputs": [], 86 | "source": [ 87 | "classroom_participation = Table().with_columns([\n", 88 | " 'Student',['Sathvik','Anjali', 'Shreyan','Chaaru','Rishi','Divya'],\n", 89 | " '1st Week', [3,1,1,2,2,3],\n", 90 | " '12th Week', [7,2,4,5,6,5]\n", 91 | "])\n", 92 | "classroom_participation" 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": null, 98 | "metadata": { 99 | "collapsed": false, 100 | "deletable": true, 101 | "editable": true 102 | }, 103 | "outputs": [], 104 | "source": [ 105 | "classroom_participation.scatter('1st Week', '12th Week')" 106 | ] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": { 111 | "deletable": true, 112 | "editable": true 113 | }, 114 | "source": [ 115 | "Scatterplots are useful for illustrating \"trends\" in our data.
As we can see from this scatterplot, students who tended to participate less in the first week also tended to participate less in the twelfth week - like Anjali for example, who participated the least out of all the students in the 1st and 12th weeks. Similarly, students who tended to participate more in the first week also participated more in the twelfth week. This is exactly what we mean by a trend. \n", 116 | "\n", 117 | "This is an example of a \"weak positive\" correlation. It's weak because the data points don't form exactly a straight line, and it's positive because the data points roughly go from the bottom left to the top right in our scatterplot (positive slope). Now let's look at another example. Fill in the following cells to make a scatter plot of classroom absences vs. final grade in the class." 118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": null, 123 | "metadata": { 124 | "collapsed": false, 125 | "deletable": true, 126 | "editable": true 127 | }, 128 | "outputs": [], 129 | "source": [ 130 | "student_abscences = Table().with_columns([\n", 131 | " 'Student',['Rashi','Amit', 'Simaran','Shruti','Eesha','Arpit'],\n", 132 | " 'Absences', [0,1,2,3,4,7],\n", 133 | " 'Grades', [99,95,90,80,75,68]\n", 134 | "])\n", 135 | "student_abscences" 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": null, 141 | "metadata": { 142 | "collapsed": false, 143 | "deletable": true, 144 | "editable": true 145 | }, 146 | "outputs": [], 147 | "source": [ 148 | "student_abscences.scatter( ) # fill in this line to make the scatterplot" 149 | ] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": { 154 | "deletable": true, 155 | "editable": true 156 | }, 157 | "source": [ 158 | "Is this data strongly or weakly correlated? And is the correlation positive or negative? Put your answers in the cell below and run it to find out if you're right!"
159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": null, 164 | "metadata": { 165 | "collapsed": false, 166 | "deletable": true, 167 | "editable": true 168 | }, 169 | "outputs": [], 170 | "source": [ 171 | "strong = # put True here if you think it's a strong correlation and False if it's weak\n", 172 | "positive = # put True here if you think the data is positively correlated, and False if it's negatively correlated\n", 173 | "\n", 174 | "checker(strong, positive)" 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": { 180 | "deletable": true, 181 | "editable": true 182 | }, 183 | "source": [ 184 | "## Correlation Coefficients\n", 185 | "\n", 186 | "Scatterplots can tell us a lot about trends in our data, but just knowing whether our data is strongly or weakly correlated and whether the correlation is positive or negative isn't enough. You might ask yourself, \"How strong is the correlation? Are some correlations stronger than others?\"\n", 187 | "\n", 188 | "This is where the idea of a correlation coefficient comes in. The correlation coefficient is a number, known as r, between -1 and 1 that can be calculated for any two variables. If r is negative the correlation is negative, and if it's positive the correlation is positive. And the \"magnitude\" of r (in other words, how close |r| is to 1) tells us how strong the correlation is.\n", 189 | "\n", 190 | "In the following cells, we'll walk you through an example of calculating the correlation coefficient for a dataset. For our dataset, let's take a look at some data we first saw all the way back in lesson 1. We'll use the datascience library to create a Table with fertility, population, life expectancy, and child mortality data for all the countries in the world for the year 2015." 
191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": null, 196 | "metadata": { 197 | "collapsed": false, 198 | "deletable": true, 199 | "editable": true 200 | }, 201 | "outputs": [], 202 | "source": [ 203 | "fertility_data = Table.read_table('fertility.csv').where('time', 2015).drop(\"time\")\n", 204 | "population_data = Table.read_table('population.csv').where('time', 2015).drop(\"time\")\n", 205 | "life_expectancy_data = Table.read_table('life_expectancy.csv').where('time', 2015).drop(\"time\")\n", 206 | "child_mortality_data = Table.read_table('child_mortality.csv').where('time', 2015).drop(\"time\")\n", 207 | "\n", 208 | "joined_data = fertility_data.join(\"geo\", population_data)\\\n", 209 | " .join(\"geo\", life_expectancy_data)\\\n", 210 | " .join(\"geo\", child_mortality_data)\n", 211 | "joined_data = joined_data.relabeled(\"children_per_woman_total_fertility\", \"fertility\")\\\n", 212 | " .relabeled(\"population_total\", \"population\")\\\n", 213 | " .relabeled(\"life_expectancy_years\", \"life expectancy\")\\\n", 214 | " .relabeled(\"child_mortality_0_5_year_olds_dying_per_1000_born\", \"child mortality\")\n", 215 | "joined_data" 216 | ] 217 | }, 218 | { 219 | "cell_type": "markdown", 220 | "metadata": { 221 | "deletable": true, 222 | "editable": true 223 | }, 224 | "source": [ 225 | "Let's take a look at the \"child mortality\" and \"life expectancy\" columns. We'd expect these two to be negatively correlated since the higher the risk of dying as a child, the lower your expected lifetime should be. We can see if this trend holds true by first plotting the data."
226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": null, 231 | "metadata": { 232 | "collapsed": false, 233 | "deletable": true, 234 | "editable": true 235 | }, 236 | "outputs": [], 237 | "source": [ 238 | "joined_data.scatter(\"life expectancy\", \"child mortality\")" 239 | ] 240 | }, 241 | { 242 | "cell_type": "markdown", 243 | "metadata": { 244 | "deletable": true, 245 | "editable": true 246 | }, 247 | "source": [ 248 | "Yup, in fact the data seems negatively correlated. How strong is the correlation exactly? Well to answer that let's compute the correlation coefficient, using the numpy library. First, let's get the two columns we want." 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "execution_count": null, 254 | "metadata": { 255 | "collapsed": true, 256 | "deletable": true, 257 | "editable": true 258 | }, 259 | "outputs": [], 260 | "source": [ 261 | "life_expectancy = joined_data.column(\"life expectancy\")\n", 262 | "child_mortality = joined_data.column(\"child mortality\")" 263 | ] 264 | }, 265 | { 266 | "cell_type": "markdown", 267 | "metadata": { 268 | "deletable": true, 269 | "editable": true 270 | }, 271 | "source": [ 272 | "Remember that the next step in computing the correlation coefficient is calculating the mean and standard deviation of our two variables. Numpy lets us do this with the convenient \"np.mean\" and \"np.std\" functions. Here's how to use them." 
273 | ] 274 | }, 275 | { 276 | "cell_type": "code", 277 | "execution_count": null, 278 | "metadata": { 279 | "collapsed": true, 280 | "deletable": true, 281 | "editable": true 282 | }, 283 | "outputs": [], 284 | "source": [ 285 | "avg_life_expectancy = np.mean(life_expectancy)\n", 286 | "stddev_life_expectancy = np.std(life_expectancy)\n", 287 | "avg_child_mortality = np.mean(child_mortality)\n", 288 | "stddev_child_mortality = np.std(child_mortality)" 289 | ] 290 | }, 291 | { 292 | "cell_type": "markdown", 293 | "metadata": { 294 | "deletable": true, 295 | "editable": true 296 | }, 297 | "source": [ 298 | "The next step is to transform the data by calculating z_x = (x - x_mean)/x_stddev and z_y = (y - y_mean)/y_stddev for each x and y value, then multiplying together to get z_x * z_y. We can use the numpy \"subtract,\" \"divide,\" and \"multiply\" functions for this." 299 | ] 300 | }, 301 | { 302 | "cell_type": "code", 303 | "execution_count": null, 304 | "metadata": { 305 | "collapsed": true, 306 | "deletable": true, 307 | "editable": true 308 | }, 309 | "outputs": [], 310 | "source": [ 311 | "transformed_life_expectancy = np.divide(np.subtract(life_expectancy, avg_life_expectancy), stddev_life_expectancy)\n", 312 | "transformed_child_mortality = np.divide(np.subtract(child_mortality, avg_child_mortality), stddev_child_mortality)\n", 313 | "products = np.multiply(transformed_life_expectancy, transformed_child_mortality)" 314 | ] 315 | }, 316 | { 317 | "cell_type": "markdown", 318 | "metadata": { 319 | "deletable": true, 320 | "editable": true 321 | }, 322 | "source": [ 323 | "Finally, we add up the values in this array and divide by n, where n is the number of values, to get the final correlation coefficient." 
324 | ] 325 | }, 326 | { 327 | "cell_type": "code", 328 | "execution_count": null, 329 | "metadata": { 330 | "collapsed": false, 331 | "deletable": true, 332 | "editable": true 333 | }, 334 | "outputs": [], 335 | "source": [ 336 | "correlation_coefficient = np.sum(products) / len(products)\n", 337 | "print(\"Correlation coefficient: \", correlation_coefficient)" 338 | ] 339 | }, 340 | { 341 | "cell_type": "markdown", 342 | "metadata": { 343 | "deletable": true, 344 | "editable": true 345 | }, 346 | "source": [ 347 | "As you can see the correlation coefficient is negative and very close to -1, indicating that this is a strong negative correlation as expected. \n", 348 | "\n", 349 | "Now it's your turn. Let's take a look at another pair of variables: fertility and population. Fill in the following cells to make the scatterplot." 350 | ] 351 | }, 352 | { 353 | "cell_type": "code", 354 | "execution_count": null, 355 | "metadata": { 356 | "collapsed": false, 357 | "deletable": true, 358 | "editable": true 359 | }, 360 | "outputs": [], 361 | "source": [ 362 | "joined_data.scatter( ) # fill this in to get a scatterplot of fertility vs population" 363 | ] 364 | }, 365 | { 366 | "cell_type": "markdown", 367 | "metadata": { 368 | "deletable": true, 369 | "editable": true 370 | }, 371 | "source": [ 372 | "Now guess whether the correlation is strong or weak, and if it's positive or negative. Then fill in the following cells to find out if you're right by computing the correlation coefficient!" 
373 | ] 374 | }, 375 | { 376 | "cell_type": "code", 377 | "execution_count": null, 378 | "metadata": { 379 | "collapsed": false, 380 | "deletable": true, 381 | "editable": true 382 | }, 383 | "outputs": [], 384 | "source": [ 385 | "fertility = joined_data.column( )\n", 386 | "population = joined_data.column( )\n", 387 | "\n", 388 | "avg_fertility = \n", 389 | "stddev_fertility = \n", 390 | "avg_population = \n", 391 | "stddev_population = \n", 392 | "\n", 393 | "transformed_fertility = np.divide(np.subtract( ), )\n", 394 | "transformed_population = np.divide(np.subtract( ), )\n", 395 | "products = \n", 396 | "\n", 397 | "correlation_coefficient = np.sum( ) / ( )\n", 398 | "print(\"This is the correlation coefficient: \", correlation_coefficient)" 399 | ] 400 | }, 401 | { 402 | "cell_type": "markdown", 403 | "metadata": { 404 | "deletable": true, 405 | "editable": true 406 | }, 407 | "source": [ 408 | "A much quicker way to compute the correlation coefficient is to use the np.corrcoef() function. To check if you filled in the previous cells correctly and found the right value, run the next cell to see the right answer." 409 | ] 410 | }, 411 | { 412 | "cell_type": "code", 413 | "execution_count": null, 414 | "metadata": { 415 | "collapsed": false, 416 | "deletable": true, 417 | "editable": true, 418 | "scrolled": true 419 | }, 420 | "outputs": [], 421 | "source": [ 422 | "print(\"This is the correlation coefficient: \", \n", 423 | " np.corrcoef(joined_data.column(\"population\"), joined_data.column(\"fertility\"))[1,0])" 424 | ] 425 | }, 426 | { 427 | "cell_type": "markdown", 428 | "metadata": { 429 | "deletable": true, 430 | "editable": true 431 | }, 432 | "source": [ 433 | "## Line of Best Fit\n", 434 | "\n", 435 | "The last thing you learned about correlation is the idea of a line of best fit. This is a line that roughly \"fits\" the data by describing the ideal linear relationship between the data points. 
Luckily we can plot lines of best fit easily using the datascience library by adding an additional argument, \"fit_line = True\", to our call to the scatter function. Here's an example." 436 | ] 437 | }, 438 | { 439 | "cell_type": "code", 440 | "execution_count": null, 441 | "metadata": { 442 | "collapsed": false, 443 | "deletable": true, 444 | "editable": true 445 | }, 446 | "outputs": [], 447 | "source": [ 448 | "joined_data.scatter(\"life expectancy\", \"child mortality\", fit_line=True)" 449 | ] 450 | }, 451 | { 452 | "cell_type": "markdown", 453 | "metadata": { 454 | "deletable": true, 455 | "editable": true 456 | }, 457 | "source": [ 458 | "When the data is strongly correlated (correlation coefficient close to 1 or -1), the line fits most of the data points well. But when the data is not strongly correlated (correlation coefficient close to 0), the line of best fit doesn't seem to fit the data at all. Let's see what the line of best fit looks like for population and fertility. Fill in the following cell to create a scatterplot of population and fertility with a best fit line."
459 | ] 460 | }, 461 | { 462 | "cell_type": "code", 463 | "execution_count": null, 464 | "metadata": { 465 | "collapsed": false, 466 | "deletable": true, 467 | "editable": true 468 | }, 469 | "outputs": [], 470 | "source": [ 471 | "joined_data.scatter( ) # fill this in" 472 | ] 473 | }, 474 | { 475 | "cell_type": "markdown", 476 | "metadata": { 477 | "deletable": true, 478 | "editable": true 479 | }, 480 | "source": [ 481 | "In the next lesson, we'll learn how to actually mathematically find the equation for this line of best fit by using a technique known as \"regression.\"" 482 | ] 483 | } 484 | ], 485 | "metadata": { 486 | "celltoolbar": "Raw Cell Format", 487 | "kernelspec": { 488 | "display_name": "Python 3", 489 | "language": "python", 490 | "name": "python3" 491 | }, 492 | "language_info": { 493 | "codemirror_mode": { 494 | "name": "ipython", 495 | "version": 3 496 | }, 497 | "file_extension": ".py", 498 | "mimetype": "text/x-python", 499 | "name": "python", 500 | "nbconvert_exporter": "python", 501 | "pygments_lexer": "ipython3", 502 | "version": "3.6.0" 503 | } 504 | }, 505 | "nbformat": 4, 506 | "nbformat_minor": 2 507 | } 508 | -------------------------------------------------------------------------------- /lesson9/circular.csv: -------------------------------------------------------------------------------- 1 | x,y 2 | -187,38 3 | -65,15 4 | -47,4 5 | -53,46 6 | -12,-31 7 | -48,-93 8 | -126,59 9 | -71,13 10 | -183,78 11 | -88,47 12 | -129,8 13 | -172,-88 14 | -23,-18 15 | -3,10 16 | -19,-48 17 | -86,-57 18 | -49,-10 19 | -69,-69 20 | -172,21 21 | -131,101 22 | -118,119 23 | -130,101 24 | -55,106 25 | -117,-125 26 | -137,-78 27 | -132,-3 28 | -141,-1 29 | -74,-123 30 | -74,-68 31 | -154,36 32 | -141,26 33 | -118,-76 34 | -98,-44 35 | -157,-61 36 | -49,80 37 | 19,62 38 | -116,-5 39 | -18,33 40 | -63,-84 41 | -64,-98 42 | 13,-19 43 | -32,13 44 | -152,-47 45 | -56,2 46 | -49,-17 47 | -43,-90 48 | -141,-36 49 | -4,108 50 | -86,-36 51 | -82,-69 52 | -91,-13 
53 | -47,-12 54 | -82,156 55 | -127,33 56 | -142,171 57 | -74,-162 58 | -8,-109 59 | -78,-131 60 | -55,-40 61 | -115,32 62 | 4,-62 63 | -14,-97 64 | -85,-98 65 | -16,-23 66 | -125,19 67 | -125,6 68 | -13,9 69 | -129,33 70 | -86,127 71 | -47,101 72 | 46,43 73 | 2,63 74 | -137,-36 75 | -124,-84 76 | 55,-84 77 | -125,-52 78 | -77,43 79 | -40,-100 80 | 49,-1 81 | -51,-47 82 | -13,51 83 | -49,-15 84 | -104,17 85 | 59,2 86 | 24,14 87 | 28,51 88 | -27,1 89 | -114,94 90 | -25,160 91 | -50,94 92 | -53,54 93 | -40,-138 94 | -48,-71 95 | 28,12 96 | 25,-158 97 | 6,-54 98 | -40,-109 99 | -60,-94 100 | -75,-29 101 | 3,75 102 | -80,68 103 | 66,-77 104 | 44,97 105 | 23,122 106 | -100,116 107 | -62,82 108 | 1,130 109 | -22,143 110 | -25,116 111 | -69,181 112 | 31,-114 113 | -77,-118 114 | -20,-132 115 | -109,-85 116 | 10,32 117 | 49,-17 118 | -69,17 119 | -92,-56 120 | -39,51 121 | 67,57 122 | 55,40 123 | -20,102 124 | 13,-44 125 | -23,106 126 | -50,-32 127 | 20,26 128 | -114,22 129 | -42,59 130 | 26,182 131 | -82,-112 132 | 58,-56 133 | -55,20 134 | -71,-82 135 | -50,-73 136 | -29,14 137 | 89,-53 138 | 63,44 139 | -51,-34 140 | 49,-61 141 | 9,47 142 | 4,65 143 | -34,103 144 | 83,-56 145 | -66,-3 146 | -46,81 147 | -21,26 148 | 8,41 149 | 59,49 150 | 20,-83 151 | 26,1 152 | -50,-160 153 | -71,-10 154 | 54,29 155 | 55,-15 156 | -42,-99 157 | -98,-31 158 | -67,-90 159 | 98,-36 160 | -49,-66 161 | 20,18 162 | 31,75 163 | 57,95 164 | 43,91 165 | 78,-26 166 | -31,-21 167 | -42,6 168 | 67,149 169 | -43,32 170 | -45,-180 171 | 0,-112 172 | -18,-132 173 | 30,-23 174 | 9,-143 175 | -16,3 176 | 61,-128 177 | -77,-36 178 | -30,-54 179 | 83,-14 180 | 29,-77 181 | -90,111 182 | -16,-25 183 | 92,113 184 | -6,54 185 | -57,95 186 | -58,60 187 | -74,148 188 | 49,100 189 | 47,-9 190 | 98,-37 191 | 61,-115 192 | -7,33 193 | 19,-137 194 | 47,-53 195 | 40,-86 196 | -29,-102 197 | 5,-51 198 | -24,-20 199 | -1,71 200 | 74,67 201 | 18,80 202 | 17,73 203 | 99,127 204 | 112,-24 205 | 20,145 206 | -24,-12 
207 | 102,78 208 | -27,-51 209 | -12,-144 210 | 58,-35 211 | 6,-160 212 | 87,-22 213 | -48,-67 214 | 45,29 215 | -47,-32 216 | 10,-32 217 | 113,-85 218 | -24,95 219 | 55,15 220 | 103,-10 221 | -44,110 222 | 87,-5 223 | 60,57 224 | 51,114 225 | 122,160 226 | -15,170 227 | 68,-171 228 | -24,-2 229 | 77,7 230 | 57,36 231 | 12,31 232 | 133,-116 233 | 134,-77 234 | 45,-51 235 | 7,-9 236 | 53,46 237 | 97,106 238 | 124,117 239 | 30,-6 240 | 55,49 241 | -42,84 242 | 72,158 243 | 69,43 244 | -12,140 245 | 43,27 246 | -48,-74 247 | 96,-96 248 | 65,-9 249 | -50,2 250 | 123,-130 251 | 6,-111 252 | -45,30 253 | 27,54 254 | -30,-98 255 | 144,10 256 | 36,58 257 | -12,108 258 | 61,-23 259 | 47,70 260 | 33,70 261 | 43,126 262 | 100,44 263 | 148,-136 264 | 9,-68 265 | 2,-144 266 | 128,-14 267 | 37,-30 268 | 117,36 269 | 86,9 270 | -33,76 271 | 125,73 272 | 33,81 273 | 65,103 274 | 28,105 275 | -14,73 276 | 40,-48 277 | 111,-28 278 | 114,31 279 | 113,50 280 | -15,-161 281 | 114,-77 282 | 100,-57 283 | 88,-101 284 | 122,-103 285 | 138,-114 286 | 101,-19 287 | 25,-3 288 | 156,66 289 | 52,98 290 | 0,81 291 | 150,107 292 | 122,122 293 | 71,149 294 | 162,113 295 | 70,2 296 | 93,-137 297 | 31,-78 298 | 131,-117 299 | 179,-78 300 | 126,-32 301 | 158,13 302 | -7,25 303 | 171,4 304 | 122,20 305 | 92,12 306 | 160,81 307 | 17,-24 308 | 136,43 309 | 94,3 310 | 23,-68 311 | 144,-32 312 | 110,43 313 | 11,70 314 | 107,14 315 | 183,31 316 | 112,-26 -------------------------------------------------------------------------------- /lesson9/family.csv: -------------------------------------------------------------------------------- 1 | Family,Father,Mother,Gender,Height,Kids 1,78.5,67,M,73.2,4 1,78.5,67,F,69.2,4 1,78.5,67,F,69,4 1,78.5,67,F,69,4 2,75.5,66.5,M,73.5,4 2,75.5,66.5,M,72.5,4 2,75.5,66.5,F,65.5,4 2,75.5,66.5,F,65.5,4 3,75,64,M,71,2 3,75,64,F,68,2 4,75,64,M,70.5,5 4,75,64,M,68.5,5 4,75,64,F,67,5 4,75,64,F,64.5,5 4,75,64,F,63,5 5,75,58.5,M,72,6 5,75,58.5,M,69,6 5,75,58.5,M,68,6 
5,75,58.5,F,66.5,6 5,75,58.5,F,62.5,6 5,75,58.5,F,62.5,6 6,74,68,F,69.5,1 7,74,68,M,76.5,6 7,74,68,M,74,6 7,74,68,M,73,6 7,74,68,M,73,6 7,74,68,F,70.5,6 7,74,68,F,64,6 8,74,66.5,F,70.5,3 8,74,66.5,F,68,3 8,74,66.5,F,66,3 9,74.5,66,F,66,1 10,74,65.5,F,65.5,1 11,74,62,M,74,8 11,74,62,M,70,8 11,74,62,F,68,8 11,74,62,F,67,8 11,74,62,F,67,8 11,74,62,F,66,8 11,74,62,F,63.5,8 11,74,62,F,63,8 12,74,61,F,65,1 14,73,67,M,68,2 14,73,67,M,67,2 15,73,66.5,M,71,3 15,73,66.5,M,70.5,3 15,73,66.5,F,66.7,3 16,73,65,M,72,9 16,73,65,M,70.5,9 16,73,65,M,70.2,9 16,73,65,M,70.2,9 16,73,65,M,69.2,9 16,73,65,F,68.7,9 16,73,65,F,66.5,9 16,73,65,F,64.5,9 16,73,65,F,63.5,9 17,73,64.5,M,74,6 17,73,64.5,M,73,6 17,73,64.5,M,71.5,6 17,73,64.5,M,62.5,6 17,73,64.5,F,66.5,6 17,73,64.5,F,62.3,6 18,73,64,F,66,3 18,73,64,F,64.5,3 18,73,64,F,64,3 19,73.2,63,F,62.7,1 20,72.7,69,M,73.2,8 20,72.7,69,M,73,8 20,72.7,69,M,72.7,8 20,72.7,69,F,70,8 20,72.7,69,F,69,8 20,72.7,69,F,68.5,8 20,72.7,69,F,68,8 20,72.7,69,F,66,8 21,72,68,M,73,3 21,72,68,F,68.5,3 21,72,68,F,68,3 22,72,67,M,73,3 22,72,67,M,71,3 22,72,67,F,67,3 23,72,65,M,74.2,7 23,72,65,M,70.5,7 23,72,65,M,69.5,7 23,72,65,F,66,7 23,72,65,F,65.5,7 23,72,65,F,65,7 23,72,65,F,65,7 24,72,65.5,F,65.5,1 25,72,64,F,66,2 25,72,64,F,63,2 26,72,63,M,70.5,5 26,72,63,M,70.5,5 26,72,63,M,69,5 26,72,63,F,65,5 26,72,63,F,63,5 27,72,63,M,69,3 27,72,63,M,67,3 27,72,63,F,63,3 28,72,63,M,73,6 28,72,63,M,67,6 28,72,63,F,70.5,6 28,72,63,F,70,6 28,72,63,F,66.5,6 28,72,63,F,63,6 29,72.5,63.5,F,67.5,3 29,72.5,63.5,F,67.2,3 29,72.5,63.5,F,66.7,3 30,72,62,F,64,1 31,72.5,62,M,71,6 31,72.5,62,M,70,6 31,72.5,62,M,70,6 31,72.5,62,F,66,6 31,72.5,62,F,65,6 31,72.5,62,F,65,6 32,72,62,M,74,5 32,72,62,M,72,5 32,72,62,M,69,5 32,72,62,F,67.5,5 32,72,62,F,63.5,5 33,72,62,M,72,5 33,72,62,M,71.5,5 33,72,62,M,71.5,5 33,72,62,M,70,5 33,72,62,F,68,5 34,72,61,F,65.7,1 35,71,69,M,78,5 35,71,69,M,74,5 35,71,69,M,73,5 35,71,69,M,72,5 35,71,69,F,67,5 36,71,67,M,73.2,4 36,71,67,M,73,4 36,71,67,M,69,4 
36,71,67,F,67,4 37,71,66,M,70,4 37,71,66,F,67,4 37,71,66,F,67,4 37,71,66,F,66.5,4 38,71,66,M,70,6 38,71,66,M,69,6 38,71,66,M,68.5,6 38,71,66,F,66,6 38,71,66,F,64.5,6 38,71,66,F,63,6 39,71,66,M,71,2 39,71,66,F,67,2 40,71,66,M,76,5 40,71,66,M,72,5 40,71,66,M,71,5 40,71,66,M,66,5 40,71,66,F,66,5 41,71.7,65.5,M,70.5,1 42,71,65.5,M,72,6 42,71,65.5,M,72,6 42,71,65.5,M,71,6 42,71,65.5,M,69,6 42,71,65.5,F,66,6 42,71,65.5,F,65,6 43,71.5,65.5,M,73,2 43,71.5,65.5,F,65.2,2 44,71.5,65,M,68.5,2 44,71.5,65,M,67.7,2 45,71,65,M,68,3 45,71,65,M,68,3 45,71,65,F,62,3 46,71,64,F,68,8 46,71,64,F,68,8 46,71,64,F,67.5,8 46,71,64,F,66.5,8 46,71,64,F,66.5,8 46,71,64,F,66,8 46,71,64,F,65.5,8 46,71,64,F,65,8 47,71.7,64.5,M,72,4 47,71.7,64.5,M,71,4 47,71.7,64.5,M,70.5,4 47,71.7,64.5,F,67,4 48,71,64,M,68,3 48,71,64,M,68,3 48,71,64,M,68,3 49,71.5,64.5,M,72,7 49,71.5,64.5,M,71,7 49,71.5,64.5,M,70,7 49,71.5,64.5,F,66,7 49,71.5,64.5,F,64.5,7 49,71.5,64.5,F,64.5,7 49,71.5,64.5,F,62,7 51,71.2,63,F,67.5,2 51,71.2,63,F,64.5,2 52,71,63.5,M,71,5 52,71,63.5,M,67,5 52,71,63.5,F,66,5 52,71,63.5,F,65,5 52,71,63.5,F,63.5,5 53,71,63,M,71,9 53,71,63,M,70,9 53,71,63,M,70,9 53,71,63,M,64,9 53,71,63,F,65,9 53,71,63,F,65,9 53,71,63,F,64,9 53,71,63,F,63,9 53,71,63,F,63,9 54,71,63,M,71,4 54,71,63,M,71,4 54,71,63,M,70,4 54,71,63,F,63.5,4 55,71,62,M,71,5 55,71,62,M,70,5 55,71,62,F,64.5,5 55,71,62,F,62.5,5 55,71,62,F,61.5,5 56,71,62,M,72,5 56,71,62,M,70.5,5 56,71,62,M,70.5,5 56,71,62,F,64.5,5 56,71,62,F,60,5 57,71,62.5,M,70,5 57,71,62.5,F,64,5 57,71,62.5,F,64,5 57,71,62.5,F,64,5 57,71,62.5,F,62.5,5 58,71,62,M,70.5,7 58,71,62,M,70,7 58,71,62,M,69,7 58,71,62,M,69,7 58,71,62,M,66,7 58,71,62,F,64.5,7 58,71,62,F,64,7 59,71,61,F,62,1 60,71,58,M,71.5,2 60,71,58,M,69,2 61,70,69,M,71,4 61,70,69,M,70,4 61,70,69,M,69,4 61,70,69,F,69,4 62,70,69,M,70,6 62,70,69,M,68.7,6 62,70,69,F,68,6 62,70,69,F,66,6 62,70,69,F,64,6 62,70,69,F,62,6 63,70,68,M,75,1 64,70,67,M,70,5 64,70,67,M,69,5 64,70,67,F,66,5 64,70,67,F,64,5 64,70,67,F,60,5 
65,70,67,F,67.5,1 66,70,66.5,M,73,11 66,70,66.5,M,72,11 66,70,66.5,M,72,11 66,70,66.5,M,66.5,11 66,70,66.5,F,69.2,11 66,70,66.5,F,67.2,11 66,70,66.5,F,66.5,11 66,70,66.5,F,66,11 66,70,66.5,F,66,11 66,70,66.5,F,64.2,11 66,70,66.5,F,63.7,11 67,70.5,65,M,72,4 67,70.5,65,M,70.2,4 67,70.5,65,M,69,4 67,70.5,65,M,68.5,4 68,70.5,65,F,68,5 68,70.5,65,F,65,5 68,70.5,65,F,61.5,5 68,70.5,65,F,61,5 68,70.5,65,F,61,5 69,70,65,M,73,8 69,70,65,M,72,8 69,70,65,M,70.5,8 69,70,65,M,65,8 69,70,65,M,65,8 69,70,65,F,64.5,8 69,70,65,F,63,8 69,70,65,F,62,8 70,70,65,M,67,5 70,70,65,M,65,5 70,70,65,F,64.5,5 70,70,65,F,62.5,5 70,70,65,F,62.5,5 71,70,65,M,70,6 71,70,65,M,70,6 71,70,65,F,67,6 71,70,65,F,65,6 71,70,65,F,65,6 71,70,65,F,63,6 72,70,65,M,79,7 72,70,65,M,75,7 72,70,65,M,71,7 72,70,65,F,69,7 72,70,65,F,67,7 72,70,65,F,65.7,7 72,70,65,F,62,7 73,70,65,M,73,3 73,70,65,M,72.5,3 73,70,65,F,65,3 74,70,65,M,69,2 74,70,65,M,69,2 75,70,64.7,M,72,7 75,70,64.7,M,70,7 75,70,64.7,M,68.7,7 75,70,64.7,F,66.5,7 75,70,64.7,F,65.5,7 75,70,64.7,F,64.7,7 75,70,64.7,F,64.5,7 76,70,64,M,70.7,7 76,70,64,M,70,7 76,70,64,M,68,7 76,70,64,M,67,7 76,70,64,M,66,7 76,70,64,M,65,7 76,70,64,F,67,7 77,70,64,M,70,4 77,70,64,M,68,4 77,70,64,M,66.7,4 77,70,64,F,65.5,4 78,70,64.2,M,72,5 78,70,64.2,M,70,5 78,70,64.2,F,62.5,5 78,70,64.2,F,61.2,5 78,70,64.2,F,60.1,5 79,70.5,64,M,74,8 79,70.5,64,M,69.5,8 79,70.5,64,M,69,8 79,70.5,64,M,68,8 79,70.5,64,M,68,8 79,70.5,64,M,68,8 79,70.5,64,F,65.5,8 79,70.5,64,F,65,8 80,70.5,64.5,F,60,1 81,70,64,M,68,4 81,70,64,F,65,4 81,70,64,F,64,4 81,70,64,F,62,4 82,70,64,M,71,9 82,70,64,M,70,9 82,70,64,M,70,9 82,70,64,M,70,9 82,70,64,M,69.5,9 82,70,64,M,68.5,9 82,70,64,F,69,9 82,70,64,F,65,9 82,70,64,F,64,9 83,70,63.7,M,70,8 83,70,63.7,M,67,8 83,70,63.7,M,65.5,8 83,70,63.7,F,63.7,8 83,70,63.7,F,63.2,8 83,70,63.7,F,62.5,8 83,70,63.7,F,62.2,8 83,70,63.7,F,61,8 85,70.5,63,M,72.5,5 85,70.5,63,M,69,5 85,70.5,63,M,67,5 85,70.5,63,F,64.5,5 85,70.5,63,F,64,5 86,70,63.5,M,71,4 86,70,63.5,M,67.5,4 
86,70,63.5,F,67.5,4 86,70,63.5,F,63.5,4 87,70,63,M,68,4 87,70,63,M,67,4 87,70,63,F,63.7,4 87,70,63,F,62,4 88,70,63,M,70,4 88,70,63,M,66.5,4 88,70,63,F,62,4 88,70,63,F,61,4 89,70.5,62,M,72,8 89,70.5,62,M,70,8 89,70.5,62,M,69.5,8 89,70.5,62,M,69.5,8 89,70.5,62,M,68,8 89,70.5,62,F,65,8 89,70.5,62,F,64,8 89,70.5,62,F,63,8 90,70.3,62.7,M,70.7,7 90,70.3,62.7,M,69.7,7 90,70.3,62.7,M,69.2,7 90,70.3,62.7,M,65.2,7 90,70.3,62.7,F,64,7 90,70.3,62.7,F,63.5,7 90,70.3,62.7,F,63.2,7 91,70.5,62,M,72,3 91,70.5,62,M,72,3 91,70.5,62,F,60,3 92,70,61,M,71.2,2 92,70,61,M,67,2 93,70,60,M,67,4 93,70,60,M,64.5,4 93,70,60,F,65,4 93,70,60,F,63,4 94,70,60,F,65,2 94,70,60,F,65,2 95,70,58.5,M,71.5,3 95,70,58.5,M,64.5,3 95,70,58.5,F,63,3 96,70,58,M,72,5 96,70,58,M,66,5 96,70,58,F,66,5 96,70,58,F,65,5 96,70,58,F,63,5 97,69,68.5,M,75,10 97,69,68.5,M,71,10 97,69,68.5,M,70,10 97,69,68.5,F,66,10 97,69,68.5,F,66,10 97,69,68.5,F,65.5,10 97,69,68.5,F,65,10 97,69,68.5,F,65,10 97,69,68.5,F,64,10 97,69,68.5,F,64,10 98,69,67,F,64,1 99,69,66,M,73,8 99,69,66,M,72,8 99,69,66,M,71.7,8 99,69,66,M,71.5,8 99,69,66,F,65.5,8 99,69,66,F,65,8 99,69,66,F,62.7,8 99,69,66,F,62.5,8 100,69,66,M,71.2,3 100,69,66,M,71,3 100,69,66,M,70,3 101,69,66.7,M,75,6 101,69,66.7,M,74,6 101,69,66.7,M,72,6 101,69,66.7,M,68.5,6 101,69,66.7,M,67,6 101,69,66.7,M,66,6 102,69,66,M,70,6 102,69,66,M,68.5,6 102,69,66,M,68,6 102,69,66,F,65,6 102,69,66,F,63,6 102,69,66,F,62.5,6 103,69,66.5,M,73,5 103,69,66.5,M,71,5 103,69,66.5,M,70.5,5 103,69,66.5,M,70.5,5 103,69,66.5,F,61,5 104,69.5,66.5,M,70.5,4 104,69.5,66.5,M,67.5,4 104,69.5,66.5,F,64.5,4 104,69.5,66.5,F,64,4 105,69,66.5,M,71,6 105,69,66.5,F,68.5,6 105,69,66.5,F,67.5,6 105,69,66.5,F,66,6 105,69,66.5,F,63,6 105,69,66.5,F,63,6 106,69.5,66,M,71,7 106,69.5,66,M,71,7 106,69.5,66,M,70.5,7 106,69.5,66,M,70.5,7 106,69.5,66,F,66.5,7 106,69.5,66,F,65.5,7 106,69.5,66,F,64.5,7 107,69,66,M,73,9 107,69,66,M,72,9 107,69,66,M,69,9 107,69,66,M,69,9 107,69,66,F,66.5,9 107,69,66,F,65.5,9 107,69,66,F,65.5,9 
107,69,66,F,65,9 107,69,66,F,64,9 108,69,65,M,70,7 108,69,65,M,68.5,7 108,69,65,M,67,7 108,69,65,F,65,7 108,69,65,F,64,7 108,69,65,F,63.5,7 108,69,65,F,61,7 109,69.5,64.5,M,69.7,7 109,69.5,64.5,M,68,7 109,69.5,64.5,M,60,7 109,69.5,64.5,F,65.2,7 109,69.5,64.5,F,64.5,7 109,69.5,64.5,F,63.7,7 109,69.5,64.5,F,60,7 110,69.2,64,M,71.7,4 110,69.2,64,M,66.5,4 110,69.2,64,F,65,4 110,69.2,64,F,63.5,4 112,69,63,M,69,3 112,69,63,F,67.5,3 112,69,63,F,63.5,3 113,69,63,M,72,1 114,69,63,M,73,6 114,69,63,M,70,6 114,69,63,M,70,6 114,69,63,M,64,6 114,69,63,F,66,6 114,69,63,F,62,6 115,69,63.5,M,70.5,7 115,69,63.5,M,67,7 115,69,63.5,M,66,7 115,69,63.5,F,65,7 115,69,63.5,F,63,7 115,69,63.5,F,62,7 115,69,63.5,F,61,7 116,69,63.5,M,70.5,3 116,69,63.5,F,63.7,3 116,69,63.5,F,63,3 117,69.7,62,F,62.5,1 118,69.5,62,M,73,3 118,69.5,62,M,72,3 118,69.5,62,M,69,3 119,69,62,M,73,5 119,69,62,M,71,5 119,69,62,M,71,5 119,69,62,M,69,5 119,69,62,F,63,5 121,69,62.5,M,71,8 121,69,62.5,M,70,8 121,69,62.5,M,70,8 121,69,62.5,M,69,8 121,69,62.5,F,63.5,8 121,69,62.5,F,62.5,8 121,69,62.5,F,62.5,8 121,69,62.5,F,62,8 122,69,62,M,72,4 122,69,62,M,68,4 122,69,62,F,66,4 122,69,62,F,66,4 123,69.5,61,M,70,5 123,69.5,61,M,69.5,5 123,69.5,61,M,69,5 123,69.5,61,F,63,5 123,69.5,61,F,62,5 124,69,61,M,68,9 124,69,61,M,68,9 124,69,61,M,67.5,9 124,69,61,M,64,9 124,69,61,M,63,9 124,69,61,M,63,9 124,69,61,F,63.5,9 124,69,61,F,62,9 124,69,61,F,62,9 125,69,60,M,70.5,3 125,69,60,F,68,3 125,69,60,F,62.5,3 126,69,60,M,69,4 126,69,60,M,66,4 126,69,60,F,61.7,4 126,69,60,F,60.5,4 127,69,60.5,M,69.5,1 128,68.7,70.5,M,71,2 128,68.7,70.5,F,61.7,2 129,68.5,67,M,73,3 129,68.5,67,M,71,3 129,68.5,67,F,67,3 130,68.5,66.5,M,70,11 130,68.5,66.5,M,69,11 130,68.5,66.5,M,69,11 130,68.5,66.5,M,68.7,11 130,68.5,66.5,M,68.5,11 130,68.5,66.5,M,68.5,11 130,68.5,66.5,M,68,11 130,68.5,66.5,M,68,11 130,68.5,66.5,M,68,11 130,68.5,66.5,F,63.2,11 131,68,65,M,67.5,2 131,68,65,M,66,2 132,68,65.5,M,66,2 132,68,65.5,F,64,2 133,68,65.5,M,71.7,7 133,68,65.5,M,71.5,7 
133,68,65.5,M,70.7,7 133,68,65.5,M,65.5,7 133,68,65.5,F,66.5,7 133,68,65.5,F,65.2,7 133,68,65.5,F,61.5,7 134,68,65,M,72,4 134,68,65,M,72,4 134,68,65,F,68,4 134,68,65,F,66,4 135,68.5,65,M,69.2,8 135,68.5,65,M,68,8 135,68.5,65,M,66,8 135,68.5,65,M,66,8 135,68.5,65,F,62,8 135,68.5,65,F,61.5,8 135,68.5,65,F,61,8 135,68.5,65,F,60,8 136,68,64,M,71,10 136,68,64,M,68,10 136,68,64,M,68,10 136,68,64,M,67,10 136,68,64,F,65,10 136,68,64,F,64,10 136,68,64,F,63,10 136,68,64,F,63,10 136,68,64,F,62,10 136,68,64,F,61,10 137,68,64,M,66,4 137,68,64,M,63,4 137,68,64,F,65.5,4 137,68,64,F,62,4 138,68,64,M,71.2,5 138,68,64,M,71.2,5 138,68,64,M,69,5 138,68,64,M,68.5,5 138,68,64,F,62.5,5 139,68,64.5,F,62,1 140,68,64,M,69,10 140,68,64,M,67,10 140,68,64,M,66,10 140,68,64,F,66,10 140,68,64,F,66,10 140,68,64,F,65,10 140,68,64,F,65,10 140,68,64,F,65,10 140,68,64,F,64,10 140,68,64,F,63,10 141,68,63,M,70.5,8 141,68,63,M,70,8 141,68,63,M,68,8 141,68,63,M,66,8 141,68,63,M,66,8 141,68,63,F,66,8 141,68,63,F,62,8 141,68,63,F,61.5,8 142,68.5,63.5,M,73.5,4 142,68.5,63.5,M,70,4 142,68.5,63.5,M,69.5,4 142,68.5,63.5,F,65.5,4 143,68,63,M,67,1 144,68,63,M,70,4 144,68,63,M,68,4 144,68,63,F,64.5,4 144,68,63,F,64,4 145,68,63,M,71,8 145,68,63,M,68,8 145,68,63,M,66,8 145,68,63,M,65.5,8 145,68,63,M,65,8 145,68,63,F,63,8 145,68,63,F,62,8 145,68,63,F,62,8 146,68,63,M,67,6 146,68,63,M,67,6 146,68,63,M,66,6 146,68,63,F,64,6 146,68,63,F,63.5,6 146,68,63,F,61,6 147,68.5,63.5,M,68.2,1 148,68,63,M,70,1 149,68.2,63.5,M,70,5 149,68.2,63.5,M,69,5 149,68.2,63.5,M,67,5 149,68.2,63.5,M,65.5,5 149,68.2,63.5,F,64.5,5 150,68,62.5,M,68.5,1 151,68.7,62,M,67.7,2 151,68.7,62,F,61.7,2 152,68,62.5,M,66.5,1 153,68,61,M,68.5,5 153,68,61,M,68,5 153,68,61,M,64,5 153,68,61,F,63.5,5 153,68,61,F,63,5 154,68,60.2,M,66.7,1 155,68,60,M,64,7 155,68,60,F,61,7 155,68,60,F,61,7 155,68,60,F,60,7 155,68,60,F,60,7 155,68,60,F,60,7 155,68,60,F,56,7 156,68,60,M,67.5,4 156,68,60,M,67,4 156,68,60,M,66.5,4 156,68,60,F,60,4 157,68.5,59,M,69,1 
158,68,59,M,68,10 158,68,59,M,65,10 158,68,59,M,64.7,10 158,68,59,M,64,10 158,68,59,M,64,10 158,68,59,M,63,10 158,68,59,F,65,10 158,68,59,F,65,10 158,68,59,F,62,10 158,68,59,F,61,10 159,67,66.2,M,72.7,5 159,67,66.2,M,72.7,5 159,67,66.2,M,71.5,5 159,67,66.2,F,65.5,5 159,67,66.2,F,63.5,5 160,67,66.5,M,71,1 162,67,65,M,69.7,6 162,67,65,M,67.5,6 162,67,65,F,65.5,6 162,67,65,F,65,6 162,67,65,F,64.5,6 162,67,65,F,63.5,6 163,67,65.5,M,70,5 163,67,65.5,M,69,5 163,67,65.5,F,65.5,5 163,67,65.5,F,65.5,5 163,67,65.5,F,63,5 164,67,65.5,M,70,4 164,67,65.5,M,67.7,4 164,67,65.5,F,63,4 164,67,65.5,F,60,4 165,67,65,M,65,3 165,67,65,F,62,3 165,67,65,F,62,3 166,67.5,65,M,71,11 166,67.5,65,M,69,11 166,67.5,65,F,64,11 166,67.5,65,F,64,11 166,67.5,65,F,63,11 166,67.5,65,F,63,11 166,67.5,65,F,63,11 166,67.5,65,F,63,11 166,67.5,65,F,63,11 166,67.5,65,F,62.5,11 166,67.5,65,F,62,11 167,67,64,M,71.5,4 167,67,64,M,70,4 167,67,64,M,67,4 167,67,64,M,67,4 168,67,63.5,M,71,8 168,67,63.5,M,70.2,8 168,67,63.5,M,69.2,8 168,67,63.5,M,68.5,8 168,67,63.5,M,68,8 168,67,63.5,M,67,8 168,67,63.5,M,65.5,8 168,67,63.5,F,63.5,8 169,67,63,M,69,3 169,67,63,M,68,3 169,67,63,F,63,3 170,67.5,62,M,70,5 170,67.5,62,M,69.5,5 170,67.5,62,M,69,5 170,67.5,62,M,68.5,5 170,67.5,62,F,66,5 171,67,61,M,67,1 172,66,67,M,70.5,8 172,66,67,M,70.5,8 172,66,67,M,67,8 172,66,67,M,66,8 172,66,67,M,66,8 172,66,67,F,62,8 172,66,67,F,62,8 172,66,67,F,61.5,8 173,66,67,M,72,9 173,66,67,M,65,9 173,66,67,M,65,9 173,66,67,F,67,9 173,66,67,F,64,9 173,66,67,F,64,9 173,66,67,F,62,9 173,66,67,F,60,9 173,66,67,F,60,9 174,66,66,M,66,5 174,66,66,M,65,5 174,66,66,F,67,5 174,66,66,F,66.5,5 174,66,66,F,65.5,5 175,66,66,M,72,6 175,66,66,M,68,6 175,66,66,F,66,6 175,66,66,F,65,6 175,66,66,F,62,6 175,66,66,F,61,6 176,66.5,65,M,68.7,8 176,66.5,65,M,68.5,8 176,66.5,65,M,66.5,8 176,66.5,65,M,64.5,8 176,66.5,65,F,62.5,8 176,66.5,65,F,60.5,8 176,66.5,65,F,60.5,8 176,66.5,65,F,57.5,8 177,66,65.5,M,72,5 177,66,65.5,M,71,5 177,66,65.5,M,67,5 177,66,65.5,F,66,5 
177,66,65.5,F,65,5 178,66,63,M,70,1 179,66,63.5,F,64.5,2 179,66,63.5,F,62,2 180,66.5,63,M,67.2,6 180,66.5,63,M,67,6 180,66.5,63,M,65,6 180,66.5,63,F,65,6 180,66.5,63,F,65,6 180,66.5,63,F,63,6 181,66.5,62.5,M,70,7 181,66.5,62.5,M,68,7 181,66.5,62.5,F,63.5,7 181,66.5,62.5,F,62.5,7 181,66.5,62.5,F,62.5,7 181,66.5,62.5,F,62.5,7 181,66.5,62.5,F,62.5,7 182,66,61.5,M,70,1 183,66,60,M,68,4 183,66,60,M,67,4 183,66,60,M,65,4 183,66,60,F,60,4 184,66,60,M,65,1 185,66,59,M,68,15 185,66,59,M,67,15 185,66,59,M,66.5,15 185,66,59,M,66,15 185,66,59,M,65.7,15 185,66,59,M,65.5,15 185,66,59,M,65,15 185,66,59,F,65,15 185,66,59,F,64,15 185,66,59,F,63,15 185,66,59,F,62,15 185,66,59,F,61,15 185,66,59,F,60,15 185,66,59,F,58,15 185,66,59,F,57,15 186,65,67,M,66.5,4 186,65,67,M,66,4 186,65,67,M,66,4 186,65,67,F,65,4 187,65,67,F,63,1 188,65,66,M,63,4 188,65,66,F,63,4 188,65,66,F,63,4 188,65,66,F,60,4 190,65,65,M,69,9 190,65,65,M,68,9 190,65,65,M,68,9 190,65,65,F,65,9 190,65,65,F,65,9 190,65,65,F,62,9 190,65,65,F,62,9 190,65,65,F,61,9 190,65,65,F,59,9 191,65,65.5,M,70.7,2 191,65,65.5,F,65.5,2 192,65,65,M,69.2,6 192,65,65,M,69,6 192,65,65,M,68,6 192,65,65,M,67.7,6 192,65,65,F,64.5,6 192,65,65,F,60.5,6 193,65,64,M,67,6 193,65,64,M,67,6 193,65,64,F,64,6 193,65,64,F,64,6 193,65,64,F,62.5,6 193,65,64,F,60.5,6 194,65,63,M,70,2 194,65,63,F,63,2 195,65,63,M,66,3 195,65,63,M,66,3 195,65,63,F,63,3 196,65.5,63,M,71,4 196,65.5,63,M,71,4 196,65.5,63,M,69,4 196,65.5,63,F,63.5,4 197,65.5,60,M,68,5 197,65.5,60,M,68,5 197,65.5,60,M,67,5 197,65.5,60,M,67,5 197,65.5,60,F,62,5 198,64,64,M,71.5,7 198,64,64,M,68,7 198,64,64,F,65.5,7 198,64,64,F,64,7 198,64,64,F,62,7 198,64,64,F,62,7 198,64,64,F,61,7 199,64,64,M,70.5,7 199,64,64,M,68,7 199,64,64,F,67,7 199,64,64,F,65,7 199,64,64,F,64,7 199,64,64,F,64,7 199,64,64,F,60,7 200,64,63,M,64.5,1 201,64,60,M,66,2 201,64,60,F,60,2 203,62,66,M,64,3 203,62,66,F,62,3 203,62,66,F,61,3 204,62.5,63,M,66.5,2 204,62.5,63,F,57,2 136A,68.5,65,M,72,8 136A,68.5,65,M,70.5,8 
136A,68.5,65,M,68.7,8 136A,68.5,65,M,68.5,8 136A,68.5,65,M,67.7,8 136A,68.5,65,F,64,8 136A,68.5,65,F,63.5,8 136A,68.5,65,F,63,8 -------------------------------------------------------------------------------- /tutorial/tutorial.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## This is a Heading" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Explanatory text" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 9, 20 | "metadata": { 21 | "collapsed": true 22 | }, 23 | "outputs": [], 24 | "source": [ 25 | "import matplotlib.pyplot as plt" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 10, 31 | "metadata": { 32 | "collapsed": false 33 | }, 34 | "outputs": [ 35 | { 36 | "data": { 37 | "text/plain": [ 38 | "3" 39 | ] 40 | }, 41 | "execution_count": 10, 42 | "metadata": {}, 43 | "output_type": "execute_result" 44 | } 45 | ], 46 | "source": [ 47 | "def add(a, b):\n", 48 | " return a + b\n", 49 | "add(1, 2)" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 11, 55 | "metadata": { 56 | "collapsed": false 57 | }, 58 | "outputs": [ 59 | { 60 | "data": { 61 | "image/png": 
"<base64-encoded PNG omitted: line plot of ys against xs>", 62 | "text/plain": [ 63 | "" 64 | ] 65 | }, 66 | "metadata": {}, 67 | "output_type": "display_data" 68 | } 69 | ], 70 | "source": [ 71 | "xs = [1, 2, 3, 4, 5]\n", 72 | "ys = [1, 2, 3, 4, 5]\n", 73 | "plt.plot(xs, ys)\n", 74 | "plt.show()" 75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": null, 80 | "metadata": { 81 | "collapsed": true 82 | }, 83 | "outputs": [], 84 | "source": [] 85 | } 86 | ], 87 | "metadata": { 88 | "kernelspec": { 89 | "display_name": "Python 3", 90 | "language": "python", 91 | "name": "python3" 92 | }, 93 | "language_info": { 94 | "codemirror_mode": { 95 | "name": "ipython", 96 | "version": 3 97 | }, 98 | "file_extension": ".py", 99 | "mimetype": "text/x-python", 100 | "name": "python", 101 | "nbconvert_exporter": "python", 102 | "pygments_lexer": "ipython3", 103 | "version": "3.6.0" 104 | } 105 | }, 106 | "nbformat": 4, 107 | "nbformat_minor": 2 108 | } 109 | --------------------------------------------------------------------------------