├── .ipynb_checkpoints
│   ├── lesson1-checkpoint.ipynb
│   └── lesson_4-checkpoint.ipynb
├── Project
│   ├── .ipynb_checkpoints
│   │   └── FinalProject_StudentFriendly-checkpoint.ipynb
│   ├── convictions_by_state.csv
│   ├── crime_by_state.csv
│   ├── finalproject.ipynb
│   ├── hindu_img.png
│   └── literacy_by_state.csv
├── case_study
│   ├── .DS_Store
│   ├── .ipynb_checkpoints
│   │   ├── case_study-checkpoint.ipynb
│   │   └── datmo_demo-checkpoint.ipynb
│   ├── case_study.ipynb
│   ├── cities.csv
│   ├── city_gdp.csv
│   ├── cost_of_living.csv
│   ├── datmo_demo.ipynb
│   ├── engineering_data.csv
│   ├── engineering_salaries_test.csv
│   └── housing_price_index.csv
├── dsi.png
├── lesson1
│   ├── .ipynb_checkpoints
│   │   └── lesson1-checkpoint.ipynb
│   ├── child_mortality.csv
│   ├── countries.csv
│   ├── diseases.csv
│   ├── fertility.csv
│   ├── lesson1.ipynb
│   ├── life_expectancy.csv
│   ├── pew_population_projection.png
│   ├── population.csv
│   ├── poverty.csv
│   └── water_quality.csv
├── lesson2
│   ├── .ipynb_checkpoints
│   │   ├── Lesson_2_Programming_Intro-checkpoint.ipynb
│   │   └── lesson2-checkpoint.ipynb
│   └── lesson2.ipynb
├── lesson3.ipynb
├── lesson4
│   ├── .ipynb_checkpoints
│   │   └── lesson4-checkpoint.ipynb
│   └── lesson4.ipynb
├── lesson5
│   ├── cricket_tiers.csv
│   ├── doctor_salaries.csv
│   ├── engineering_data.csv
│   └── lesson_5.ipynb
├── lesson6
│   ├── .ipynb_checkpoints
│   │   └── lesson6-checkpoint.ipynb
│   └── lesson6.ipynb
├── lesson7
│   ├── .ipynb_checkpoints
│   │   └── Lesson_7_Visualization-checkpoint.ipynb
│   ├── Lesson_7_Visualization.ipynb
│   ├── british_india_troops.csv
│   └── foreign_tourists.csv
├── lesson8
│   ├── .ipynb_checkpoints
│   │   └── Lesson8_Correlation-checkpoint.ipynb
│   ├── Lesson8_Correlation.ipynb
│   ├── child_mortality.csv
│   ├── fertility.csv
│   ├── life_expectancy.csv
│   └── population.csv
├── lesson9
│   ├── .ipynb_checkpoints
│   │   ├── Notebook 9-checkpoint.ipynb
│   │   └── lesson9-checkpoint.ipynb
│   ├── Notebook 9.ipynb
│   ├── circular.csv
│   ├── cities_r2.csv
│   ├── family.csv
│   └── lesson9.ipynb
└── tutorial
    └── tutorial.ipynb
/.ipynb_checkpoints/lesson_4-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Lesson 4: Exercises"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "Today, we'll be going over a couple Python exercises to reinforce your knowledge about tables and basic statistics."
15 | ]
16 | },
17 | {
18 | "cell_type": "code",
19 | "execution_count": 2,
20 | "metadata": {
21 | "collapsed": true
22 | },
23 | "outputs": [],
24 | "source": [
25 | "import numpy as np\n",
26 | "from datascience import *"
27 | ]
28 | },
29 | {
30 | "cell_type": "markdown",
31 | "metadata": {},
32 | "source": [
33 | "**RCB's Cricket Confusion** Should this be a \"challenge problem?\"\n",
34 | "\n",
35 | "Royal Challengers Bangalore finished last in IPL 2017. The coach wants to build a strong team next year to win the trophy by buying the best possible players in the auction. He’s a great cricketer and coach, but he’s not very good at math! Your job is write a program to help him put a team together that's within his budget. There are 3 categories of players with the following costs in crores.\n",
36 | "Tier 1: 1\n",
37 | "Tier 2: 0.5\n",
38 | "Tier 3: 0.25\n",
39 | " \n",
40 | "The coach has 10 crores to spend and can purchase 11 to 16 players he wants from this set of 28. Write a function that accepts the coach's input for the number of players he wants that's within his budget.\n",
41 | "*Return* the players' total salary and *print* their names.\n",
42 | "REMEMBER: We refer to the first value in a column with 0!\n",
43 | "\n",
44 | " \n",
45 | "Sample Output: select_players(2, 4, 5)\n",
46 | "[MS Dhoni, Virat Kohli, Suresh Raina, Ambati Rayudu, Rohit Sharma, Murali Vijay, Amit Mishra, Axar Patel, Stuart Binny, Wriddhiman Saha, Mohit Sharma]\n",
47 | "5.25\n",
48 | "\n",
49 | "Sample Input: select_players(2, 2, 5)\n",
50 | "Sample Output: Too few players!\n",
51 | "\n",
52 | "Sample Input: select_players(10, 3, 1)\n",
53 | "Sample Output: Too many players!"
54 | ]
55 | },
56 | {
57 | "cell_type": "code",
58 | "execution_count": 17,
59 | "metadata": {
60 | "collapsed": false
61 | },
62 | "outputs": [
63 | {
64 | "data": {
65 | "text/html": [
66 | "
\n",
67 | " \n",
68 | " \n",
69 | " PLAYER | Salary | Tier | \n",
70 | "
\n",
71 | " \n",
72 | " \n",
73 | " \n",
74 | " MS Dhoni | 1 | 1 | \n",
75 | "
\n",
76 | " \n",
77 | " \n",
78 | " Virat Kohli | 1 | 1 | \n",
79 | "
\n",
80 | " \n",
81 | " \n",
82 | " Ajinkya Rahan | 1 | 1 | \n",
83 | "
\n",
84 | " \n",
85 | " \n",
86 | " Ravi Ashwin | 1 | 1 | \n",
87 | "
\n",
88 | " \n",
89 | " \n",
90 | " Suresh Raina | 0.5 | 2 | \n",
91 | "
\n",
92 | " \n",
93 | " \n",
94 | " Ambati Rayudu | 0.5 | 2 | \n",
95 | "
\n",
96 | " \n",
97 | " \n",
98 | " Rohit Sharma | 0.5 | 2 | \n",
99 | "
\n",
100 | " \n",
101 | " \n",
102 | " Murali Vijay | 0.5 | 2 | \n",
103 | "
\n",
104 | " \n",
105 | " \n",
106 | " Shikhar Dhawan | 0.5 | 2 | \n",
107 | "
\n",
108 | " \n",
109 | " \n",
110 | " Bhuvneshwar Kumar | 0.5 | 2 | \n",
111 | "
\n",
112 | " \n",
113 | "
\n",
114 | "... (18 rows omitted)
16:\n",
185 | " print(\"Too many players!\")\n",
186 | " else:\n",
187 | " t1 = players._________(\"______\", are.equal_to(_______)).take(range(0, tier_1)).column(\"PLAYER\")\n",
188 | " t2 =\n",
189 | " t3 = \n",
190 | " \n",
191 | " #How will you access the players' names using the column() function?\n",
192 | " \n",
193 | " \n",
194 | " \n",
195 | " total_salary = __________+_____________+______________\n",
196 | " if total_salary > 100:\n",
197 | " return \"Too expensive! Select a different combination!\"\n",
198 | " return total_salary"
199 | ]
200 | },
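  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you get stuck, here is one possible way to finish the function. It is only a sketch: it assumes the `players` table has the `PLAYER`, `Salary` and `Tier` columns shown above, and that the coach's budget is 10 crores."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#One possible completion of select_players (a sketch, not the only answer).\n",
    "def select_players(tier_1, tier_2, tier_3):\n",
    "    total_players = tier_1 + tier_2 + tier_3\n",
    "    if total_players < 11:\n",
    "        print(\"Too few players!\")\n",
    "    elif total_players > 16:\n",
    "        print(\"Too many players!\")\n",
    "    else:\n",
    "        #Keep the first tier_1 players of Tier 1, the first tier_2 of Tier 2, and so on.\n",
    "        t1 = players.where(\"Tier\", are.equal_to(1)).take(range(0, tier_1))\n",
    "        t2 = players.where(\"Tier\", are.equal_to(2)).take(range(0, tier_2))\n",
    "        t3 = players.where(\"Tier\", are.equal_to(3)).take(range(0, tier_3))\n",
    "        names = list(t1.column(\"PLAYER\")) + list(t2.column(\"PLAYER\")) + list(t3.column(\"PLAYER\"))\n",
    "        total_salary = sum(t1.column(\"Salary\")) + sum(t2.column(\"Salary\")) + sum(t3.column(\"Salary\"))\n",
    "        if total_salary > 10:\n",
    "            return \"Too expensive! Select a different combination!\"\n",
    "        print(names)\n",
    "        return total_salary"
   ]
  },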
201 | {
202 | "cell_type": "markdown",
203 | "metadata": {},
204 | "source": [
205 | "**Building a Better Estimate**\n",
206 | "\n",
207 | "Raju the builder has made N measurements. Now, he wants to know the average value of the measurements made. In order to make the average value a better representative of the measurements, before calculating the average, he wants first to remove the highest K and the lowest K measurements. After that, he will calculate the average value among the remaining N - 2K measurements.\n",
208 | "Could you help Raju find the average value he will get after these manipulations?\n",
209 | "\n",
210 | "\n",
211 | "Sample Input: \n",
212 | "N - 5 \n",
213 | "K - 1\n",
214 | "N values - 2 9 -10 25 1\n",
215 | "Sample Output: \n",
216 | "4.00000\n"
217 | ]
218 | },
219 | {
220 | "cell_type": "code",
221 | "execution_count": null,
222 | "metadata": {
223 | "collapsed": false
224 | },
225 | "outputs": [],
226 | "source": [
227 | "def new_measurements(n, k, arr):\n",
228 | " \n",
229 | " for i in range(___, ___):\n",
230 | " "
231 | ]
232 | },
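  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to fill in the loop above (a sketch): sort the measurements first, then add up only the values between the k lowest and the k highest."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#A sketch of Raju's trimmed average.\n",
    "def new_measurements(n, k, arr):\n",
    "    ordered = sorted(arr)          #smallest to largest\n",
    "    total = 0\n",
    "    for i in range(k, n - k):      #skip the k lowest and the k highest values\n",
    "        total = total + ordered[i]\n",
    "    return total / (n - 2 * k)\n",
    "\n",
    "new_measurements(5, 1, [2, 9, -10, 25, 1])   #the sample input above; should give 4.0"
   ]
  },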
233 | {
234 | "cell_type": "markdown",
235 | "metadata": {},
236 | "source": [
237 | "**Aditi’s Career Planning**\n",
238 | " \n",
239 | "Aditi can’t decide what field she wants to work in when she grows up! She likes medicine and engineering equally so her father advised her to pick the field that pays the most to an average worker. Aditi has collected tables containing the necessary data on the salaries of professionals in these fields and stored them in 2 unsorted arrays. Can you help her find out which job to pick as per her father’s advice? \n",
240 | "\n",
241 | "Hint: use the sort() function.\n",
242 | "\n"
243 | ]
244 | },
245 | {
246 | "cell_type": "code",
247 | "execution_count": 6,
248 | "metadata": {
249 | "collapsed": true
250 | },
251 | "outputs": [],
252 | "source": [
253 | "engg_salaries = Table.read_table(\"engineering_data.csv\").column(\"Salary\")\n",
254 | "#Source: http://research.aspiringminds.com/resources/#datasets\n",
255 | "doc_salaries = Table.read_table(\"doctor_salaries.csv\").column(\"Salary\")\n",
256 | "#Source: https://www.glassdoor.com/Salaries/india-doctor-salary-SRCH_IL.0,5_IN115_KO6,12_IP6.htm\n"
257 | ]
258 | },
259 | {
260 | "cell_type": "code",
261 | "execution_count": null,
262 | "metadata": {
263 | "collapsed": false
264 | },
265 | "outputs": [],
266 | "source": [
267 | "def salary(med, engg, law):\n",
268 | " #In Python, we can define functions inside our own functions. \n",
269 | " #This function will compute a certain quantity from each array for you to help you compare the salaries.\n",
270 | " #What quantity do you think it is?\n",
271 | " def helper(array):\n",
272 | " \n",
273 | " _________________\n",
274 | " \n",
275 | " _________________\n",
276 | " length = len(array)\n",
277 | " if ___________:\n",
278 | " return array[___]\n",
279 | " else:\n",
280 | " return __________\n",
281 | " \n",
282 | " med_salary = _____________\n",
283 | " engg_salary = ____________\n",
284 | " law_salary = _____________\n",
285 | " #The max() function takes the maximum of all the values you put into it\n",
286 | " best_salary = max(___________, ______________, ___________) \n",
287 | " if best_salary == engg_salary:\n",
288 | " print(\"Engineering\")\n",
289 | " elif best_salary == med_salary:\n",
290 | " print(\"Medicine\")\n",
291 | " else:\n",
292 | " print(law)"
293 | ]
294 | }
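  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible helper (a sketch): the quantity being computed is most likely the median salary. Sorting first puts the middle value in the middle of the array. This sketch uses a standalone function and the salary arrays loaded above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#A sketch of the helper: the median of an unsorted array.\n",
    "def median(array):\n",
    "    ordered = np.sort(array)        #sort() puts the salaries in increasing order\n",
    "    length = len(array)\n",
    "    if length % 2 == 1:             #odd number of values: take the middle one\n",
    "        return ordered[length // 2]\n",
    "    else:                           #even number of values: average the two middle ones\n",
    "        return (ordered[length // 2 - 1] + ordered[length // 2]) / 2\n",
    "\n",
    "median(engg_salaries), median(doc_salaries)"
   ]
  }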
295 | ],
296 | "metadata": {
297 | "kernelspec": {
298 | "display_name": "Python 3",
299 | "language": "python",
300 | "name": "python3"
301 | },
302 | "language_info": {
303 | "codemirror_mode": {
304 | "name": "ipython",
305 | "version": 3
306 | },
307 | "file_extension": ".py",
308 | "mimetype": "text/x-python",
309 | "name": "python",
310 | "nbconvert_exporter": "python",
311 | "pygments_lexer": "ipython3",
312 | "version": "3.6.0"
313 | }
314 | },
315 | "nbformat": 4,
316 | "nbformat_minor": 2
317 | }
318 |
--------------------------------------------------------------------------------
/Project/.ipynb_checkpoints/FinalProject_StudentFriendly-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "
"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# Final Project: Crime in India"
15 | ]
16 | },
17 | {
18 | "cell_type": "markdown",
19 | "metadata": {},
20 | "source": [
21 | "Congratulations on reaching the final project stage of DSI! As your final project today, we want you to look at data from India about crimes committed against women. Safety of women is a critical issue across India right now and we want you to take a data-centric approach to begin unpacking this topic. \n"
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": null,
27 | "metadata": {
28 | "collapsed": true
29 | },
30 | "outputs": [],
31 | "source": [
32 | "import matplotlib\n",
33 | "matplotlib.use('Agg')\n",
34 | "from datascience import Table, predicates\n",
35 | "%matplotlib inline\n",
36 | "import matplotlib.pyplot as plt\n",
37 | "import numpy as np\n",
38 | "plt.style.use('fivethirtyeight')"
39 | ]
40 | },
41 | {
42 | "cell_type": "markdown",
43 | "metadata": {},
44 | "source": [
45 | "First, let's make a table of crimes committed in each state in 2012."
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": null,
51 | "metadata": {
52 | "collapsed": false
53 | },
54 | "outputs": [],
55 | "source": [
56 | "crime_data = Table.read_table('crime_by_state.csv')\n",
57 | "crime_data"
58 | ]
59 | },
60 | {
61 | "cell_type": "markdown",
62 | "metadata": {},
63 | "source": [
64 | "## Question 1: Warm-Up"
65 | ]
66 | },
67 | {
68 | "cell_type": "markdown",
69 | "metadata": {},
70 | "source": [
71 | "**A.** How many rows are there in this table? Write a line of code that tells us the number of rows in the table. "
72 | ]
73 | },
74 | {
75 | "cell_type": "code",
76 | "execution_count": null,
77 | "metadata": {
78 | "collapsed": true
79 | },
80 | "outputs": [],
81 | "source": []
82 | },
83 | {
84 | "cell_type": "markdown",
85 | "metadata": {},
86 | "source": [
87 | "**B.** Find the population of all the states reported."
88 | ]
89 | },
90 | {
91 | "cell_type": "code",
92 | "execution_count": null,
93 | "metadata": {
94 | "collapsed": true
95 | },
96 | "outputs": [],
97 | "source": [
98 | "total_pop = sum(crime_data.____(_____))\n",
99 | "total_pop"
100 | ]
101 | },
102 | {
103 | "cell_type": "markdown",
104 | "metadata": {},
105 | "source": [
106 | "**C.** Find the total amount of arrests in 2012."
107 | ]
108 | },
109 | {
110 | "cell_type": "code",
111 | "execution_count": null,
112 | "metadata": {
113 | "collapsed": true
114 | },
115 | "outputs": [],
116 | "source": [
117 | "total_arrests = \n",
118 | "total_arrests"
119 | ]
120 | },
121 | {
122 | "cell_type": "markdown",
123 | "metadata": {},
124 | "source": [
125 | "**D.** Calculate the total amount of arrests per hundred thousand people in 2012."
126 | ]
127 | },
128 | {
129 | "cell_type": "code",
130 | "execution_count": null,
131 | "metadata": {
132 | "collapsed": true
133 | },
134 | "outputs": [],
135 | "source": [
136 | "10000 * _____ / ______"
137 | ]
138 | },
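  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you are stuck on 1B-1D, here is one possible set of answers (a sketch, assuming the column names used in crime_by_state.csv: \"Grand Total\" and \"Total population\")."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#A sketch for 1B-1D, using the column names from crime_by_state.csv.\n",
    "total_pop = sum(crime_data.column(\"Total population\"))\n",
    "total_arrests = sum(crime_data.column(\"Grand Total\"))\n",
    "arrests_per_hundred_thousand = 100000 * total_arrests / total_pop\n",
    "arrests_per_hundred_thousand"
   ]
  },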
139 | {
140 | "cell_type": "markdown",
141 | "metadata": {},
142 | "source": [
143 | "**E.** Do you think it’s a high fraction of the population? Consider the entire population of India. How would this reflect if the rate were applied to the whole country?\n"
144 | ]
145 | },
146 | {
147 | "cell_type": "markdown",
148 | "metadata": {},
149 | "source": [
150 | "**Answer Here** (double click to edit)"
151 | ]
152 | },
153 | {
154 | "cell_type": "markdown",
155 | "metadata": {},
156 | "source": [
157 | "## Question 2: Visualizing our Data"
158 | ]
159 | },
160 | {
161 | "cell_type": "markdown",
162 | "metadata": {},
163 | "source": [
164 | "**A.** Before we do this, we need to add one column to our table. "
165 | ]
166 | },
167 | {
168 | "cell_type": "markdown",
169 | "metadata": {},
170 | "source": [
171 | "Calculate the weighted crime by state (crimes per 100,000) using two columns from the table, and add this information as a new column to the crime_data table as a column called `\"Weighted Crime\"`."
172 | ]
173 | },
174 | {
175 | "cell_type": "code",
176 | "execution_count": null,
177 | "metadata": {
178 | "collapsed": false
179 | },
180 | "outputs": [],
181 | "source": [
182 | "total_crime_by_state = \n",
183 | "population_by_state = \n",
184 | "weighted_crime_by_state = np.divide(100000.0 * total_crime_by_state, population_by_state)\n",
185 | "crime_data = crime_data.with_column(\"________\", _________)\n",
186 | "crime_data"
187 | ]
188 | },
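  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible way to fill in the cell above (a sketch, again assuming the crime_by_state.csv column names)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#A sketch for Question 2A: crimes per 100,000 people in each state.\n",
    "total_crime_by_state = crime_data.column(\"Grand Total\")\n",
    "population_by_state = crime_data.column(\"Total population\")\n",
    "weighted_crime_by_state = np.divide(100000.0 * total_crime_by_state, population_by_state)\n",
    "crime_data = crime_data.with_column(\"Weighted Crime\", weighted_crime_by_state)\n",
    "crime_data"
   ]
  },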
189 | {
190 | "cell_type": "markdown",
191 | "metadata": {},
192 | "source": [
193 | "**B.** Make a plot of the total male count and total female count of arrested peoples by state below. \n",
194 | "\n",
195 | "Hint: Put the names of the columns representing male and female arrests in the array."
196 | ]
197 | },
198 | {
199 | "cell_type": "code",
200 | "execution_count": null,
201 | "metadata": {
202 | "collapsed": false
203 | },
204 | "outputs": [],
205 | "source": [
206 | "crime_data.____(\"_____\", np.array([\"____\", \"______\"]))"
207 | ]
208 | },
209 | {
210 | "cell_type": "markdown",
211 | "metadata": {},
212 | "source": [
213 | "**C.** Now we want to look at the *normalized* crime per state (per 100,000 people). This will let us look at how the states rank in terms of crimes relative to their populations. Make this plot below.\n"
214 | ]
215 | },
216 | {
217 | "cell_type": "code",
218 | "execution_count": null,
219 | "metadata": {
220 | "collapsed": true
221 | },
222 | "outputs": [],
223 | "source": []
224 | },
225 | {
226 | "cell_type": "markdown",
227 | "metadata": {},
228 | "source": [
229 | "**D.** Do the states rank similarly? Do they seem to follow a similar pattern in terms of number of crimes?\n",
230 | "\n"
231 | ]
232 | },
233 | {
234 | "cell_type": "markdown",
235 | "metadata": {},
236 | "source": [
237 | "## Question 3: Convictions"
238 | ]
239 | },
240 | {
241 | "cell_type": "markdown",
242 | "metadata": {},
243 | "source": [
244 | "Now that we have looked at some raw crime data, let’s think about its consequences. Is any action being taken against offenders? What kind of data do you think you could look at that would help you answer this question?\n",
245 | "\n",
246 | "Let’s try looking at the number of persons convicted and corresponding numbers of acquittals and convictions.\n"
247 | ]
248 | },
249 | {
250 | "cell_type": "code",
251 | "execution_count": null,
252 | "metadata": {
253 | "collapsed": false
254 | },
255 | "outputs": [],
256 | "source": [
257 | "conviction_data = Table.read_table(\"convictions_by_state.csv\")\n",
258 | "conviction_data"
259 | ]
260 | },
261 | {
262 | "cell_type": "markdown",
263 | "metadata": {},
264 | "source": [
265 | "**A.** \n",
266 | "First, let's calculate the percentage of *arrests* that were *aquitted* and add this information as a new column to the crime_data table as a column called \"Percentage Acquitted\". Recall how you did this for the crime_data table."
267 | ]
268 | },
269 | {
270 | "cell_type": "code",
271 | "execution_count": null,
272 | "metadata": {
273 | "collapsed": true
274 | },
275 | "outputs": [],
276 | "source": [
277 | "number_acquitted = \n",
278 | "number_arrested = \n",
279 | "percentage_acquitted = (_______ / _______) * ______\n",
280 | "conviction_data = conviction_data.with_column(\"__________\", ___________)\n",
281 | "conviction_data"
282 | ]
283 | },
284 | {
285 | "cell_type": "markdown",
286 | "metadata": {},
287 | "source": [
288 | "**B.** Conduct your own calculations and make appropriate plots to try to answer the questions we have posed. Discuss your approach and potential results with your peers and instructor. We have left some blank space below for you to do this work.\n"
289 | ]
290 | },
291 | {
292 | "cell_type": "code",
293 | "execution_count": null,
294 | "metadata": {
295 | "collapsed": true
296 | },
297 | "outputs": [],
298 | "source": []
299 | },
300 | {
301 | "cell_type": "code",
302 | "execution_count": null,
303 | "metadata": {
304 | "collapsed": true
305 | },
306 | "outputs": [],
307 | "source": []
308 | },
309 | {
310 | "cell_type": "markdown",
311 | "metadata": {},
312 | "source": [
313 | "*Caution! Do you think looking at the acquittal rate was a fair way to assess action being taken against offenders? What does an acquittal really mean? We must understand the difference between an acquittal and a false case - no matter the crime. An acquittal occurs when there is not enough proof against the accused - not necessarily when the accuser is determined to be a liar. In fact, the acquittal rate for all crimes is approximately the same. The rate of acquittal for attempted murder for example is 73.4%. That doesn’t mean there is a false murder case epidemic! For rape it is 72.9%. So perhaps we should not have looked at the acquittal rate data to assess the conviction rate. What other types of data do you think we could have looked at to help us?*\n"
314 | ]
315 | },
316 | {
317 | "cell_type": "markdown",
318 | "metadata": {},
319 | "source": [
320 | "## Question 3: Literacy"
321 | ]
322 | },
323 | {
324 | "cell_type": "markdown",
325 | "metadata": {},
326 | "source": [
327 | "Get the literacy data as a table from the raw literacy csv file."
328 | ]
329 | },
330 | {
331 | "cell_type": "code",
332 | "execution_count": null,
333 | "metadata": {
334 | "collapsed": false
335 | },
336 | "outputs": [],
337 | "source": [
338 | "literacy_data = Table.read_table(\"literacy_by_state.csv\")\n",
339 | "literacy_data"
340 | ]
341 | },
342 | {
343 | "cell_type": "markdown",
344 | "metadata": {},
345 | "source": [
346 | "**A.** Add two columns to your `literacy_data` table called `Percentage Acquitted` and `Weighted Crime`. Use the `Percentage Acquitted` column from your `conviction_data` table and the `Weighted Crime` column from your `crime_data` table."
347 | ]
348 | },
349 | {
350 | "cell_type": "code",
351 | "execution_count": null,
352 | "metadata": {
353 | "collapsed": false
354 | },
355 | "outputs": [],
356 | "source": [
357 | "literacy_data = \n",
358 | "literacy_data"
359 | ]
360 | },
361 | {
362 | "cell_type": "markdown",
363 | "metadata": {},
364 | "source": [
365 | "**B.** Make a scatter plot comparing literacy rates in each state to the percentage of people acquitted. "
366 | ]
367 | },
368 | {
369 | "cell_type": "code",
370 | "execution_count": null,
371 | "metadata": {
372 | "collapsed": true
373 | },
374 | "outputs": [],
375 | "source": [
376 | "literacy_data.scatter(\"_______\", \"___________\")"
377 | ]
378 | },
379 | {
380 | "cell_type": "markdown",
381 | "metadata": {},
382 | "source": [
383 | "**C.** Calculate the correlation coefficient between these two factors. Use the `standard_units` function to help you."
384 | ]
385 | },
386 | {
387 | "cell_type": "code",
388 | "execution_count": null,
389 | "metadata": {
390 | "collapsed": true
391 | },
392 | "outputs": [],
393 | "source": [
394 | "#Calculating distance from the mean, dividing by standard deviation\n",
395 | "def standard_units(nums):\n",
396 | " return (nums - np.mean(nums))/np.std(nums)\n",
397 | "#Average of the product of x and y values in standard units, takes in a table and strings representing two column names.\n",
398 | "def correlation(tbl, col_1, col_2):\n",
399 | " return np.mean(standard_units(tbl.column(col_1)) * standard_units(tbl.column(col_2)))"
400 | ]
401 | },
402 | {
403 | "cell_type": "code",
404 | "execution_count": null,
405 | "metadata": {
406 | "collapsed": true
407 | },
408 | "outputs": [],
409 | "source": [
410 | "correlation(literacy_data, \"____\", \"_____\")"
411 | ]
412 | },
413 | {
414 | "cell_type": "markdown",
415 | "metadata": {},
416 | "source": [
417 | "**D.** What results did you find? Are they correlated? Is it a strong correlation? Even if they are correlated, does that say anything about causation?\n"
418 | ]
419 | },
420 | {
421 | "cell_type": "markdown",
422 | "metadata": {},
423 | "source": [
424 | "**Your Answer Here**"
425 | ]
426 | },
427 | {
428 | "cell_type": "markdown",
429 | "metadata": {},
430 | "source": [
431 | "**E.** Bonus– Plot predictions generated by the regression line (Use the `minimize` function for least-squares).\n"
432 | ]
433 | },
434 | {
435 | "cell_type": "markdown",
436 | "metadata": {},
437 | "source": [
438 | "## Question 4: Qualitative Analysis"
439 | ]
440 | },
441 | {
442 | "cell_type": "markdown",
443 | "metadata": {},
444 | "source": [
445 | "Discuss these questions with your peers and the instructor."
446 | ]
447 | },
448 | {
449 | "cell_type": "markdown",
450 | "metadata": {},
451 | "source": [
452 | "**A.** What can you take away from the calculations and visualizations you have made today?"
453 | ]
454 | },
455 | {
456 | "cell_type": "markdown",
457 | "metadata": {},
458 | "source": [
459 | "**Your Answer**"
460 | ]
461 | },
462 | {
463 | "cell_type": "markdown",
464 | "metadata": {},
465 | "source": [
466 | "**B.** Where do you think there are potential holes in the data and in the calculations we have made? \n"
467 | ]
468 | },
469 | {
470 | "cell_type": "markdown",
471 | "metadata": {},
472 | "source": [
473 | "**Your Answer**"
474 | ]
475 | },
476 | {
477 | "cell_type": "markdown",
478 | "metadata": {},
479 | "source": [
480 | "**C.** According to *The Hindu* newspaper, marital and other rape is grossly underreported in India.\n"
481 | ]
482 | },
483 | {
484 | "cell_type": "markdown",
485 | "metadata": {},
486 | "source": [
487 | "
"
488 | ]
489 | },
490 | {
491 | "cell_type": "markdown",
492 | "metadata": {},
493 | "source": [
494 | "Why would there be motivation to keep the reported crime rate low? Who is held accountable for these crime rates? Politicians, police, citizens?\n"
495 | ]
496 | },
497 | {
498 | "cell_type": "markdown",
499 | "metadata": {},
500 | "source": [
501 | "**Your Answer**"
502 | ]
503 | },
504 | {
505 | "cell_type": "markdown",
506 | "metadata": {},
507 | "source": [
508 | "## Conclusion"
509 | ]
510 | },
511 | {
512 | "cell_type": "markdown",
513 | "metadata": {},
514 | "source": [
515 | "Today, wewere able to see one of the many uses of data in today's world and connected our knowledge of data science with an important issue in society. You'll be able to use the material from the course *anywhere* and *anytime* in life, and we hope you learned a lot about how to look at the world and use numbers to describe it, computers to process it, and your brain to draw conclusions!"
516 | ]
517 | }
518 | ],
519 | "metadata": {
520 | "kernelspec": {
521 | "display_name": "Python 3",
522 | "language": "python",
523 | "name": "python3"
524 | },
525 | "language_info": {
526 | "codemirror_mode": {
527 | "name": "ipython",
528 | "version": 3
529 | },
530 | "file_extension": ".py",
531 | "mimetype": "text/x-python",
532 | "name": "python",
533 | "nbconvert_exporter": "python",
534 | "pygments_lexer": "ipython3",
535 | "version": "3.6.0"
536 | }
537 | },
538 | "nbformat": 4,
539 | "nbformat_minor": 2
540 | }
541 |
--------------------------------------------------------------------------------
/Project/convictions_by_state.csv:
--------------------------------------------------------------------------------
1 | STATE/UT,Persons Arrested,Persons Chargesheeted,Persons Convicted,Persons Acquitted
2 | A & N ISLANDS,73,73,5,68
3 | ANDHRA PRADESH,39288,39191,3527,35761
4 | ARUNACHAL PRADESH,202,130,24,178
5 | ASSAM,12346,7694,637,11709
6 | BIHAR,20147,19282,1317,18830
7 | CHANDIGARH,268,265,38,230
8 | CHHATTISGARH,6594,6566,1605,4989
9 | D & N HAVELI,30,38,4,26
10 | DAMAN & DIU,45,54,1,44
11 | DELHI,3981,3397,1771,2210
12 | GOA,286,127,7,279
13 | GUJARAT,23965,23525,434,23531
14 | HARYANA,7264,7429,1266,5998
15 | HIMACHAL PRADESH,1325,1317,107,1218
16 | JAMMU & KASHMIR,5204,5203,338,4866
17 | JHARKHAND,6549,5720,1152,5397
18 | KARNATAKA,16680,15849,859,15821
19 | KERALA,13517,13187,862,12655
20 | LAKSHADWEEP,1,0,0,1
21 | MADHYA PRADESH,29247,29234,5529,23718
22 | MAHARASHTRA,41048,39535,1047,40001
23 | MANIPUR,202,28,0,202
24 | MEGHALAYA,271,160,9,262
25 | MIZORAM,215,185,118,97
26 | NAGALAND,75,69,58,17
27 | ODISHA,17183,17142,974,16209
28 | PUDUCHERRY,110,103,26,84
29 | PUNJAB,5048,3439,904,4144
30 | RAJASTHAN,17095,17087,4582,12513
31 | SIKKIM,69,47,35,34
32 | TAMIL NADU,10913,9393,2046,8867
33 | TRIPURA,1946,2088,349,1597
34 | UTTAR PRADESH,77745,43775,12971,64774
35 | UTTARAKHAND,1420,1343,813,607
36 | WEST BENGAL,34023,33694,915,33108
37 |
--------------------------------------------------------------------------------
/Project/crime_by_state.csv:
--------------------------------------------------------------------------------
1 | STATE/UT,Total Male,Total Female,Grand Total,Total population
2 | ANDHRA PRADESH,64916,13660,78576,1015986396
3 | UTTAR PRADESH,70157,7588,77745,2195396247
4 | MAHARASHTRA,29897,11151,41048,1236102692
5 | WEST BENGAL,25332,8691,34023,1004825096
6 | MADHYA PRADESH,25227,4020,29247,798573215
7 | GUJARAT,18026,5939,23965,664219908
8 | BIHAR,17420,2727,20147,1141851007
9 | ODISHA,14719,2464,17183,461420938
10 | RAJASTHAN,14795,2300,17095,754831132
11 | KARNATAKA,12946,3734,16680,672437744
12 | KERALA,11740,1777,13517,367264447
13 | ASSAM,12223,123,12346,342861992
14 | TAMIL NADU,8732,2181,10913,793528538
15 | HARYANA,6268,996,7264,278883891
16 | CHHATTISGARH,5742,852,6594,280942156
17 | JHARKHAND,5808,741,6549,362628618
18 | JAMMU & KASHMIR,4679,525,5204,138038186
19 | PUNJAB,4023,1025,5048,304746596
20 | NCT OF DELHI,3657,324,3981,184285585
21 | TRIPURA,1656,290,1946,40381352
22 | UTTARAKHAND,1309,111,1420,111284272
23 | HIMACHAL PRADESH,1064,261,1325,75421599
24 | GOA,234,52,286,16034953
25 | MEGHALAYA,252,19,271,32604077
26 | CHANDIGARH,257,11,268,11601546
27 | MIZORAM,214,1,215,12001154
28 | ARUNACHAL PRADESH,202,0,202,15208721
29 | MANIPUR,181,21,202,29939316
30 | PUDUCHERRY,93,17,110,13689104
31 | NAGALAND,60,15,75,21786622
32 | ANDAMAN & NICOBAR ISLANDS,60,13,73,4179384
33 | SIKKIM,66,3,69,6684568
34 | DAMAN & DIU,26,19,45,2672021
35 | DADRA & NAGAR HAVELI,24,6,30,3771383
36 | LAKSHADWEEP,1,0,1,708719
37 |
--------------------------------------------------------------------------------
/Project/finalproject.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "
"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# Final Project: Crime in India"
15 | ]
16 | },
17 | {
18 | "cell_type": "markdown",
19 | "metadata": {},
20 | "source": [
21 | "Congratulations on reaching the final project stage of DSI! As your final project today, we want you to look at data from India about crimes committed against women. Safety of women is a critical issue across India right now and we want you to take a data-centric approach to begin unpacking this topic. \n"
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": null,
27 | "metadata": {
28 | "collapsed": true
29 | },
30 | "outputs": [],
31 | "source": [
32 | "import matplotlib\n",
33 | "matplotlib.use('Agg')\n",
34 | "from datascience import Table, predicates\n",
35 | "%matplotlib inline\n",
36 | "import matplotlib.pyplot as plt\n",
37 | "import numpy as np\n",
38 | "plt.style.use('fivethirtyeight')"
39 | ]
40 | },
41 | {
42 | "cell_type": "markdown",
43 | "metadata": {},
44 | "source": [
45 | "First, let's make a table of crimes committed in each state in 2012."
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": null,
51 | "metadata": {
52 | "collapsed": false
53 | },
54 | "outputs": [],
55 | "source": [
56 | "crime_data = Table.read_table('crime_by_state.csv')\n",
57 | "crime_data"
58 | ]
59 | },
60 | {
61 | "cell_type": "markdown",
62 | "metadata": {},
63 | "source": [
64 | "## Question 1: Warm-Up"
65 | ]
66 | },
67 | {
68 | "cell_type": "markdown",
69 | "metadata": {},
70 | "source": [
71 | "**A.** How many rows are there in this table? Write a line of code that tells us the number of rows in the table. "
72 | ]
73 | },
74 | {
75 | "cell_type": "code",
76 | "execution_count": null,
77 | "metadata": {
78 | "collapsed": true
79 | },
80 | "outputs": [],
81 | "source": []
82 | },
83 | {
84 | "cell_type": "markdown",
85 | "metadata": {},
86 | "source": [
87 | "**B.** Find the population of all the states reported."
88 | ]
89 | },
90 | {
91 | "cell_type": "code",
92 | "execution_count": null,
93 | "metadata": {
94 | "collapsed": true
95 | },
96 | "outputs": [],
97 | "source": [
98 | "total_pop = sum(crime_data.____(_____))\n",
99 | "total_pop"
100 | ]
101 | },
102 | {
103 | "cell_type": "markdown",
104 | "metadata": {},
105 | "source": [
106 | "**C.** Find the total amount of arrests in 2012."
107 | ]
108 | },
109 | {
110 | "cell_type": "code",
111 | "execution_count": null,
112 | "metadata": {
113 | "collapsed": true
114 | },
115 | "outputs": [],
116 | "source": [
117 | "total_arrests = \n",
118 | "total_arrests"
119 | ]
120 | },
121 | {
122 | "cell_type": "markdown",
123 | "metadata": {},
124 | "source": [
125 | "**D.** Calculate the total amount of arrests per hundred thousand people in 2012."
126 | ]
127 | },
128 | {
129 | "cell_type": "code",
130 | "execution_count": null,
131 | "metadata": {
132 | "collapsed": true
133 | },
134 | "outputs": [],
135 | "source": [
136 | "10000 * _____ / ______"
137 | ]
138 | },
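  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you are stuck on 1B-1D, here is one possible set of answers (a sketch, assuming the column names used in crime_by_state.csv: \"Grand Total\" and \"Total population\")."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#A sketch for 1B-1D, using the column names from crime_by_state.csv.\n",
    "total_pop = sum(crime_data.column(\"Total population\"))\n",
    "total_arrests = sum(crime_data.column(\"Grand Total\"))\n",
    "arrests_per_hundred_thousand = 100000 * total_arrests / total_pop\n",
    "arrests_per_hundred_thousand"
   ]
  },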
139 | {
140 | "cell_type": "markdown",
141 | "metadata": {},
142 | "source": [
143 | "**E.** Do you think it’s a high fraction of the population? Consider the entire population of India. How would this reflect if the rate were applied to the whole country?\n"
144 | ]
145 | },
146 | {
147 | "cell_type": "markdown",
148 | "metadata": {},
149 | "source": [
150 | "**Answer Here** (double click to edit)"
151 | ]
152 | },
153 | {
154 | "cell_type": "markdown",
155 | "metadata": {},
156 | "source": [
157 | "## Question 2: Visualizing our Data"
158 | ]
159 | },
160 | {
161 | "cell_type": "markdown",
162 | "metadata": {},
163 | "source": [
164 | "**A.** Before we do this, we need to add one column to our table. "
165 | ]
166 | },
167 | {
168 | "cell_type": "markdown",
169 | "metadata": {},
170 | "source": [
171 | "Calculate the weighted crime by state (crimes per 100,000) using two columns from the table, and add this information as a new column to the crime_data table as a column called `\"Weighted Crime\"`."
172 | ]
173 | },
174 | {
175 | "cell_type": "code",
176 | "execution_count": null,
177 | "metadata": {
178 | "collapsed": false
179 | },
180 | "outputs": [],
181 | "source": [
182 | "total_crime_by_state = \n",
183 | "population_by_state = \n",
184 | "weighted_crime_by_state = np.divide(100000.0 * total_crime_by_state, population_by_state)\n",
185 | "crime_data = crime_data.with_column(\"________\", _________)\n",
186 | "crime_data"
187 | ]
188 | },
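  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible way to fill in the cell above (a sketch, again assuming the crime_by_state.csv column names)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#A sketch for Question 2A: crimes per 100,000 people in each state.\n",
    "total_crime_by_state = crime_data.column(\"Grand Total\")\n",
    "population_by_state = crime_data.column(\"Total population\")\n",
    "weighted_crime_by_state = np.divide(100000.0 * total_crime_by_state, population_by_state)\n",
    "crime_data = crime_data.with_column(\"Weighted Crime\", weighted_crime_by_state)\n",
    "crime_data"
   ]
  },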
189 | {
190 | "cell_type": "markdown",
191 | "metadata": {},
192 | "source": [
193 | "**B.** Make a plot of the total male count and total female count of arrested peoples by state below. \n",
194 | "\n",
195 | "Hint: Put the names of the columns representing male and female arrests in the array."
196 | ]
197 | },
198 | {
199 | "cell_type": "code",
200 | "execution_count": null,
201 | "metadata": {
202 | "collapsed": false
203 | },
204 | "outputs": [],
205 | "source": [
206 | "crime_data.____(\"_____\", np.array([\"____\", \"______\"]))"
207 | ]
208 | },
209 | {
210 | "cell_type": "markdown",
211 | "metadata": {},
212 | "source": [
213 | "**C.** Now we want to look at the *normalized* crime per state (per 100,000 people). This will let us look at how the states rank in terms of crimes relative to their populations. Make this plot below.\n"
214 | ]
215 | },
216 | {
217 | "cell_type": "code",
218 | "execution_count": null,
219 | "metadata": {
220 | "collapsed": true
221 | },
222 | "outputs": [],
223 | "source": []
224 | },
225 | {
226 | "cell_type": "markdown",
227 | "metadata": {},
228 | "source": [
229 | "**D.** Do the states rank similarly? Do they seem to follow a similar pattern in terms of number of crimes?\n",
230 | "\n"
231 | ]
232 | },
233 | {
234 | "cell_type": "markdown",
235 | "metadata": {},
236 | "source": [
237 | "## Question 3: Convictions"
238 | ]
239 | },
240 | {
241 | "cell_type": "markdown",
242 | "metadata": {},
243 | "source": [
244 | "Now that we have looked at some raw crime data, let’s think about its consequences. Is any action being taken against offenders? What kind of data do you think you could look at that would help you answer this question?\n",
245 | "\n",
246 | "Let’s try looking at the number of persons convicted and corresponding numbers of acquittals and convictions.\n"
247 | ]
248 | },
249 | {
250 | "cell_type": "code",
251 | "execution_count": null,
252 | "metadata": {
253 | "collapsed": false
254 | },
255 | "outputs": [],
256 | "source": [
257 | "conviction_data = Table.read_table(\"convictions_by_state.csv\")\n",
258 | "conviction_data"
259 | ]
260 | },
261 | {
262 | "cell_type": "markdown",
263 | "metadata": {},
264 | "source": [
265 | "**A.** \n",
266 | "First, let's calculate the percentage of *arrests* that were *aquitted* and add this information as a new column to the crime_data table as a column called \"Percentage Acquitted\". Recall how you did this for the crime_data table."
267 | ]
268 | },
269 | {
270 | "cell_type": "code",
271 | "execution_count": null,
272 | "metadata": {
273 | "collapsed": true
274 | },
275 | "outputs": [],
276 | "source": [
277 | "number_acquitted = \n",
278 | "number_arrested = \n",
279 | "percentage_acquitted = (_______ / _______) * ______\n",
280 | "conviction_data = conviction_data.with_column(\"__________\", ___________)\n",
281 | "conviction_data"
282 | ]
283 | },
284 | {
285 | "cell_type": "markdown",
286 | "metadata": {},
287 | "source": [
288 | "**B.** Conduct your own calculations and make appropriate plots to try to answer the questions we have posed. Discuss your approach and potential results with your peers and instructor. We have left some blank space below for you to do this work.\n"
289 | ]
290 | },
291 | {
292 | "cell_type": "code",
293 | "execution_count": null,
294 | "metadata": {
295 | "collapsed": true
296 | },
297 | "outputs": [],
298 | "source": []
299 | },
300 | {
301 | "cell_type": "code",
302 | "execution_count": null,
303 | "metadata": {
304 | "collapsed": true
305 | },
306 | "outputs": [],
307 | "source": []
308 | },
309 | {
310 | "cell_type": "markdown",
311 | "metadata": {},
312 | "source": [
313 | "*Caution! Do you think looking at the acquittal rate was a fair way to assess action being taken against offenders? What does an acquittal really mean? We must understand the difference between an acquittal and a false case - no matter the crime. An acquittal occurs when there is not enough proof against the accused - not necessarily when the accuser is determined to be a liar. In fact, the acquittal rate for all crimes is approximately the same. The rate of acquittal for attempted murder for example is 73.4%. That doesn’t mean there is a false murder case epidemic! For rape it is 72.9%. So perhaps we should not have looked at the acquittal rate data to assess the conviction rate. What other types of data do you think we could have looked at to help us?*\n"
314 | ]
315 | },
316 | {
317 | "cell_type": "markdown",
318 | "metadata": {},
319 | "source": [
320 | "## Question 3: Literacy"
321 | ]
322 | },
323 | {
324 | "cell_type": "markdown",
325 | "metadata": {},
326 | "source": [
327 | "Get the literacy data as a table from the raw literacy csv file."
328 | ]
329 | },
330 | {
331 | "cell_type": "code",
332 | "execution_count": null,
333 | "metadata": {
334 | "collapsed": false
335 | },
336 | "outputs": [],
337 | "source": [
338 | "literacy_data = Table.read_table(\"literacy_by_state.csv\")\n",
339 | "literacy_data"
340 | ]
341 | },
342 | {
343 | "cell_type": "markdown",
344 | "metadata": {},
345 | "source": [
346 | "**A.** Add two columns to your `literacy_data` table called `Percentage Acquitted` and `Weighted Crime`. Use the `Percentage Acquitted` column from your `conviction_data` table and the `Weighted Crime` column from your `crime_data` table."
347 | ]
348 | },
349 | {
350 | "cell_type": "code",
351 | "execution_count": null,
352 | "metadata": {
353 | "collapsed": false
354 | },
355 | "outputs": [],
356 | "source": [
357 | "literacy_data = \n",
358 | "literacy_data"
359 | ]
360 | },
361 | {
362 | "cell_type": "markdown",
363 | "metadata": {},
364 | "source": [
365 | "**B.** Make a scatter plot comparing literacy rates in each state to the percentage of people acquitted. "
366 | ]
367 | },
368 | {
369 | "cell_type": "code",
370 | "execution_count": null,
371 | "metadata": {
372 | "collapsed": true
373 | },
374 | "outputs": [],
375 | "source": [
376 | "literacy_data.scatter(\"_______\", \"___________\")"
377 | ]
378 | },
379 | {
380 | "cell_type": "markdown",
381 | "metadata": {},
382 | "source": [
383 | "**C.** Calculate the correlation coefficient between these two factors. Use the `standard_units` function to help you."
384 | ]
385 | },
386 | {
387 | "cell_type": "code",
388 | "execution_count": null,
389 | "metadata": {
390 | "collapsed": true
391 | },
392 | "outputs": [],
393 | "source": [
394 | "#Calculating distance from the mean, dividing by standard deviation\n",
395 | "def standard_units(nums):\n",
396 | " return (nums - np.mean(nums))/np.std(nums)\n",
397 | "#Average of the product of x and y values in standard units, takes in a table and strings representing two column names.\n",
398 | "def correlation(tbl, col_1, col_2):\n",
399 | " return np.mean(standard_units(tbl.column(col_1)) * standard_units(tbl.column(col_2)))"
400 | ]
401 | },
402 | {
403 | "cell_type": "code",
404 | "execution_count": null,
405 | "metadata": {
406 | "collapsed": true
407 | },
408 | "outputs": [],
409 | "source": [
410 | "correlation(literacy_data, \"____\", \"_____\")"
411 | ]
412 | },
413 | {
414 | "cell_type": "markdown",
415 | "metadata": {},
416 | "source": [
417 | "**D.** What results did you find? Are they correlated? Is it a strong correlation? Even if they are correlated, does that say anything about causation?\n"
418 | ]
419 | },
420 | {
421 | "cell_type": "markdown",
422 | "metadata": {},
423 | "source": [
424 | "**Your Answer Here**"
425 | ]
426 | },
427 | {
428 | "cell_type": "markdown",
429 | "metadata": {},
430 | "source": [
431 | "**E.** Bonus– Plot predictions generated by the regression line (Use the `minimize` function for least-squares).\n"
432 | ]
433 | },
434 | {
435 | "cell_type": "markdown",
436 | "metadata": {},
437 | "source": [
438 | "## Question 4: Qualitative Analysis"
439 | ]
440 | },
441 | {
442 | "cell_type": "markdown",
443 | "metadata": {},
444 | "source": [
445 | "Discuss these questions with your peers and the instructor."
446 | ]
447 | },
448 | {
449 | "cell_type": "markdown",
450 | "metadata": {},
451 | "source": [
452 | "**A.** What can you take away from the calculations and visualizations you have made today?"
453 | ]
454 | },
455 | {
456 | "cell_type": "markdown",
457 | "metadata": {},
458 | "source": [
459 | "**Your Answer**"
460 | ]
461 | },
462 | {
463 | "cell_type": "markdown",
464 | "metadata": {},
465 | "source": [
466 | "**B.** Where do you think there are potential holes in the data and in the calculations we have made? \n"
467 | ]
468 | },
469 | {
470 | "cell_type": "markdown",
471 | "metadata": {},
472 | "source": [
473 | "**Your Answer**"
474 | ]
475 | },
476 | {
477 | "cell_type": "markdown",
478 | "metadata": {},
479 | "source": [
480 | "**C.** According to *The Hindu* newspaper, marital and other rape is grossly underreported in India.\n"
481 | ]
482 | },
483 | {
484 | "cell_type": "markdown",
485 | "metadata": {},
486 | "source": [
487 | "
"
488 | ]
489 | },
490 | {
491 | "cell_type": "markdown",
492 | "metadata": {},
493 | "source": [
494 | "Why would there be motivation to keep the reported crime rate low? Who is held accountable for these crime rates? Politicians, police, citizens?\n"
495 | ]
496 | },
497 | {
498 | "cell_type": "markdown",
499 | "metadata": {},
500 | "source": [
501 | "**Your Answer**"
502 | ]
503 | },
504 | {
505 | "cell_type": "markdown",
506 | "metadata": {},
507 | "source": [
508 | "## Conclusion"
509 | ]
510 | },
511 | {
512 | "cell_type": "markdown",
513 | "metadata": {},
514 | "source": [
515 | "Today, wewere able to see one of the many uses of data in today's world and connected our knowledge of data science with an important issue in society. You'll be able to use the material from the course *anywhere* and *anytime* in life, and we hope you learned a lot about how to look at the world and use numbers to describe it, computers to process it, and your brain to draw conclusions!"
516 | ]
517 | }
518 | ],
519 | "metadata": {
520 | "kernelspec": {
521 | "display_name": "Python 3",
522 | "language": "python",
523 | "name": "python3"
524 | },
525 | "language_info": {
526 | "codemirror_mode": {
527 | "name": "ipython",
528 | "version": 3
529 | },
530 | "file_extension": ".py",
531 | "mimetype": "text/x-python",
532 | "name": "python",
533 | "nbconvert_exporter": "python",
534 | "pygments_lexer": "ipython3",
535 | "version": "3.6.0"
536 | }
537 | },
538 | "nbformat": 4,
539 | "nbformat_minor": 2
540 | }
541 |
--------------------------------------------------------------------------------
/Project/hindu_img.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dsindia/notebooks/2bf4ada929274e0472edd1f896659364b353d184/Project/hindu_img.png
--------------------------------------------------------------------------------
/Project/literacy_by_state.csv:
--------------------------------------------------------------------------------
1 | State,Literacy Rate
2 | Andaman & Nicobar Islands,86.3
3 | Andhra Pradesh,67.89
4 | Arunachal Pradesh,67
5 | Assam,73.2
6 | Bihar,63.8
7 | Chandigarh,86.4
8 | Chhattisgarh,71
9 | Dadra & Nagar Haveli,77.7
10 | Daman & Diu,87.1
11 | Delhi,86.3
12 | Goa,87.4
13 | Gujarat,77.3
14 | Haryana,76.6
15 | Himachal Pradesh,83.8
16 | Jammu and Kashmir,68.7
17 | Jharkhand,67.6
18 | Karnataka,75.6
19 | Kerala,93.91
20 | Lakshadweep,92.3
21 | Madhya Pradesh,70.6
22 | Maharashtra,80.1
23 | Manipur,79.8
24 | Meghalaya,75.5
25 | Mizoram,91.6
26 | Nagaland,82.9
27 | Odisha,73.45
28 | Puducherry,86.5
29 | Punjab,79.7
30 | Rajasthan,67.1
31 | Sikkim,82.2
32 | Tamil Nadu,80.3
33 | Tripura,87.8
34 | Uttar Pradesh,71.7
35 | Uttarakhand,79.6
36 | West Bengal,77.1
--------------------------------------------------------------------------------
/case_study/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dsindia/notebooks/2bf4ada929274e0472edd1f896659364b353d184/case_study/.DS_Store
--------------------------------------------------------------------------------
/case_study/city_gdp.csv:
--------------------------------------------------------------------------------
1 | Rank,City,State or union territory,"GDP per capita
(nominal)","GDP per capita
(PPP)"
2 | 1,New Delhi,National Capital Territory of Delhi,"$3,580 ","$12,747 "
3 | 2,Mumbai,Maharashtra,"$1,990 ","$7,005 "
4 | 3,Chennai,Tamil Nadu,"$1,870 ","$6,469 "
5 | 4,Hyderabad,Telangana,"$1,430 ","$5,063 "
6 | 5,Bangalore,Karnataka,"$1,420 ","$5,051 "
7 | 6,Kolkata,West Bengal,"$1,110 ","$4,036 "
--------------------------------------------------------------------------------
/case_study/cost_of_living.csv:
--------------------------------------------------------------------------------
1 | Rank,City,Cost of Living Index,Rent Index,Cost of Living Plus Rent Index,Groceries Index,Restaurant Price Index,Local Purchasing Power Index
2 | 1,Ahmedabad,29.26,6.61,18.18,31.59,20.84,55.58
3 | 2,Bangalore,30.64,9.68,20.39,33.42,19.5,85.76
4 | 3,Bhubaneswar,25.91,4.66,15.52,29.79,14.91,51.16
5 | 4,Chandigarh,29.65,6.41,18.29,31.42,20.38,63.36
6 | 5,Chennai,29,7.34,18.41,31.48,19.21,68.68
7 | 6,Coimbatore,25.12,5.73,15.64,26.61,15.49,46.49
8 | 7,New Delhi,33.1,9.97,21.79,33.04,26.02,68.13
9 | 8,Goa,28.93,7.65,18.53,31.22,21.98,52.73
10 | 9,Gurgaon,36.29,10.53,23.69,35.6,32.21,94.28
11 | 10,Hyderabad,27.16,6.49,17.05,29.35,17.82,65.45
12 | 11,Indore,27.82,4.48,16.41,27.7,18.18,43.68
13 | 12,Jaipur,28.46,4.72,16.85,30.22,16.91,65.7
14 | 13,Kochi,25.99,6.45,16.44,28.94,14.98,55.8
15 | 14,Kolkata,28.48,6.92,17.94,30.05,23.36,46.81
16 | 15,Lucknow,28.45,4.72,16.85,29.48,18.27,62.69
17 | 16,Mangalore,24.2,5.82,15.21,25.67,13.22,91.24
18 | 17,Mumbai,33.25,23.72,28.59,35.84,23.96,63.7
19 | 18,Mysore,26.48,4.57,15.77,32.2,13.13,35.57
20 | 19,Nagpur,27.9,5.06,16.73,29.69,19.61,79.41
21 | 20,Navi Mumbai,29.94,9.97,20.18,32.22,20.8,72.11
22 | 21,Noida,33.11,7.02,20.35,34.77,22.69,82.21
23 | 22,Pune,30.87,7.89,19.64,32.92,22.1,85.31
24 | 23,Surat,28.28,4.61,16.71,29.5,20.91,54.56
25 | 24,Thane,30.34,8.73,19.78,31.99,20.49,58.83
26 | 25,Thiruvananthapuram,22.01,5.31,13.84,23.96,12.28,62.01
27 | 26,Vadodara,27.79,3.95,16.14,32.19,16.52,67.02
28 | 27,Visakhapatnam,25.99,5.27,15.86,28.85,17.14,52.77
--------------------------------------------------------------------------------
/case_study/housing_price_index.csv:
--------------------------------------------------------------------------------
1 | Particulars, 06-2011, 09-2011, 12-2011, 03-2012, 06-2012, 09-2012, 12-2012, 03-2013, 06-2013, 09-2013
2 | All India,116.00,119.40,125.50,134.10,142.60,147.10,157.00,160.80,162.30,169.20
3 | Ahmedabad,121.30,130.40,137.10,141.00,140.80,146.40,150.60,155.00,161.90,171.70
4 | Bangalore,110.70,107.80,138.60,133.30,133.30,136.60,141.20,141.90,142.30,150.40
5 | Chennai,101.20,110.40,110.70,108.20,119.20,117.80,137.60,137.40,138.30,150.00
6 | Delhi,126.80,124.80,136.70,158.20,177.30,183.20,200.70,213.10,214.80,215.70
7 | Kolkata,103.00,105.00,103.20,106.10,135.20,149.10,162.50,169.40,171.80,173.50
8 | Mumbai,122.10,131.40,122.80,143.50,147.60,148.10,158.90,159.50,160.00,169.20
9 |
--------------------------------------------------------------------------------
/dsi.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dsindia/notebooks/2bf4ada929274e0472edd1f896659364b353d184/dsi.png
--------------------------------------------------------------------------------
/lesson1/diseases.csv:
--------------------------------------------------------------------------------
1 | Year,State/UTs,Acute Diarrhoeal Diseases - Cases,Acute Diarrhoeal Diseases - Deaths,Malaria - Cases,Malaria - Deaths,Acute Respiaratory Infection - Cases,Acute Respiaratory Infection - Deaths,Japanese Encephalitis - Cases,Japanese Encephalitis - Deaths,Viral Hepatitis - Cases,Viral Hepatitis - Deaths
2 | 2011(P),GRAND TOTAL,10231049,1269,1278760,463,26300208,2492,8249,1169,94402,520
3 | 2011(P),Andhra Pradesh,2235614,107,39559,5,3089290,236,73,1,11050,61
4 | 2011(P),Arunachal Pradesh,32228,11,10961,NA,48602,9,NULL,NULL,636,4
5 | 2011(P),Assam,96816,16,47397,42,314824,NA,1319,250,2557,25
6 | 2011(P),Bihar,130276,NULL,2390,0,87486,NULL,821,197,202,NULL
7 | 2011(P),Chhattisgarh,64575,5,131179,18,155743,18,NULL,NULL,139,1
8 | 2011(P),Delhi,102983,62,413,NA,198541,102,9,NULL,8347,68
9 | 2011(P),Goa,15146,2,1231,1,61029,6,91,1,118,NA
10 | 2011(P),Gujarat,367450,0,86005,15,604076,NA,NULL,NULL,4328,NA
11 | 2011(P),Haryana,224223,21,33345,1,1275035,48,90,14,2557,2
12 | 2011(P),Himachal Pradesh,310227,51,247,NA,1484149,154,NULL,NULL,1248,10
13 | 2011(P),Jammu & Kashmir,544711,0,1031,NA,528409,6,NULL,NULL,5129,2
14 | 2011(P),Jharkhand,98258,1,152061,16,205496,5,303,19,384,2
15 | 2011(P),Karnataka,591989,49,24487,0,1629997,182,397,0,6049,8
16 | 2011(P),Kerala,260938,0,1339,2,5034506,128,88,6,5336,7
17 | 2011(P),Madhya Pradesh,290705,92,89304,71,578783,182,35,0,3851,12
18 | 2011(P),Maharashtra,507046,4,96632,114,571947,28,NULL,9,5994,30
19 | 2011(P),Manipur,17605,39,714,0,25441,55,11,0,229,NA
20 | 2011(P),Meghalaya,148801,20,24507,47,295146,5,NULL,NULL,87,3
21 | 2011(P),Mizoram,16192,11,8849,26,26817,33,NULL,NULL,812,14
22 | 2011(P),Nagaland,30458,1,3363,2,48566,NA,44,6,64,NA
23 | 2011(P),Orissa,632493,143,294759,73,1372208,269,NULL,NULL,3272,89
24 | 2011(P),Punjab,190022,15,2693,NA,656544,10,NULL,NULL,5041,12
25 | 2011(P),Rajasthan,227571,7,46457,5,1089640,62,NULL,NULL,967,0
26 | 2011(P),Sikkim,44094,2,51,NA,92736,12,NULL,NULL,484,0
27 | 2011(P),Tamil Nadu,210074,24,22139,0,2410214,22,762,29,5940,0
28 | 2011(P),Tripura,109777,83,14295,9,160438,135,NULL,NULL,404,0
29 | 2011(P),Uttar Pradesh,554770,185,56438,NA,1183992,196,3492,579,7749,28
30 | 2011(P),Uttarakhand,79643,26,1162,2,130283,56,0,NA,3143,19
31 | 2011(P),West Bengal,1854651,288,66465,14,1991660,528,714,58,5480,105
32 | 2011(P),A & N Islands,19679,0,5939,NA,69151,3,0,NULL,208,5
33 | 2011(P),Chandigarh,42615,NULL,582,NA,49649,NULL,NULL,NULL,1309,NULL
34 | 2011(P),D & N Haveli,81322,1,12331,NA,104447,NA,NULL,NULL,269,0
35 | 2011(P),Daman & Diu,12638,NA,268,NA,42350,NA,NULL,NULL,484,NA
36 | 2011(P),Lakshadweep,4693,NA,15,NA,28129,NA,NULL,NULL,15,1
37 | 2011(P),Puducherry,80766,3,152,NA,654884,2,NULL,NULL,520,12
--------------------------------------------------------------------------------
/lesson1/pew_population_projection.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dsindia/notebooks/2bf4ada929274e0472edd1f896659364b353d184/lesson1/pew_population_projection.png
--------------------------------------------------------------------------------
/lesson2/.ipynb_checkpoints/lesson2-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "deletable": true,
7 | "editable": true
8 | },
9 | "source": [
10 | "
"
11 | ]
12 | },
13 | {
14 | "cell_type": "markdown",
15 | "metadata": {
16 | "deletable": true,
17 | "editable": true
18 | },
19 | "source": [
20 | "## Lesson 2: Introduction to Statistics and Python Programming"
21 | ]
22 | },
23 | {
24 | "cell_type": "markdown",
25 | "metadata": {
26 | "collapsed": false,
27 | "deletable": true,
28 | "editable": true
29 | },
30 | "source": [
31 | "In the last lesson, we saw some cool ways in which computers can do amazing things with data. Today, we're going to be learning a little bit about Python, a computer programming language which is used by people of various professions including scientists, business people, and engineers. We will be using it to practice what we learned earlier in the session."
32 | ]
33 | },
34 | {
35 | "cell_type": "markdown",
36 | "metadata": {
37 | "deletable": true,
38 | "editable": true
39 | },
40 | "source": [
41 | "**Computer Programming and Programming Languages**\n",
42 | "\n",
43 | "One of the reasons why computers are useful is that they can store and run *programs*, a set of instructions that tells a computer what to do. In fact, you're running a computer program, called a Jupyter Notebook, right now! It's a program that lets you write your own programs. How cool is that?\n",
44 | "\n",
45 | "How can you tell a computer what to do? It doesn't understand Hindi, English, or any of the languages that we speak. This is where programming languages come in. They are the medium through which the computer understands what you want it to do. Think of them as the step between what you want to do and how you're going to tell the computer to do it. You may have heard of languages like Java and C++. Python is just like these. They may look different, but you can write programs to do the same things in different languages, just like we have different words in different spoken languages, for the same things.\n"
46 | ]
47 | },
48 | {
49 | "cell_type": "markdown",
50 | "metadata": {
51 | "deletable": true,
52 | "editable": true
53 | },
54 | "source": [
55 | "**Data Types and Variables in Python**"
56 | ]
57 | },
58 | {
59 | "cell_type": "markdown",
60 | "metadata": {
61 | "deletable": true,
62 | "editable": true
63 | },
64 | "source": [
65 | "Python can handle several different types of data. The most important ones that we'll be talking about are *integers*, *doubles*, *booleans* and *strings*.\n",
66 | "\n",
67 | "**integers** are whole numbers (without decimals) like 0, 1, 2 ...\n",
68 | "\n",
69 | "**doubles** are decimal numbers like 1.234, 56.8, 9.5. In Python, they are termed *float*.\n",
70 | "\n",
71 | "**booleans** are special values of only two types: either *True* or *False*. Examples of booleans are 0 = False, *nil* = False, 1 = True. In general, besides 0, all numbers are 'true' values. \n",
72 | "\n",
73 | "**strings** are anything placed between quotation marks like these: \" \". E.g \"hello\", \"data\"."
74 | ]
75 | },
76 | {
77 | "cell_type": "code",
78 | "execution_count": null,
79 | "metadata": {
80 | "collapsed": false,
81 | "deletable": true,
82 | "editable": true
83 | },
84 | "outputs": [],
85 | "source": [
86 | "type(212412)"
87 | ]
88 | },
89 | {
90 | "cell_type": "code",
91 | "execution_count": null,
92 | "metadata": {
93 | "collapsed": false,
94 | "deletable": true,
95 | "editable": true
96 | },
97 | "outputs": [],
98 | "source": [
99 | "type(9.9999)"
100 | ]
101 | },
102 | {
103 | "cell_type": "code",
104 | "execution_count": null,
105 | "metadata": {
106 | "collapsed": false,
107 | "deletable": true,
108 | "editable": true
109 | },
110 | "outputs": [],
111 | "source": [
112 | "type(True)"
113 | ]
114 | },
115 | {
116 | "cell_type": "code",
117 | "execution_count": null,
118 | "metadata": {
119 | "collapsed": false,
120 | "deletable": true,
121 | "editable": true
122 | },
123 | "outputs": [],
124 | "source": [
125 | "type(\"data science\")"
126 | ]
127 | },
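  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "You can also check the claim about 'true' and 'false' values yourself: Python's built-in bool() function tells you whether a value counts as *True* or *False*."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "#0 and None count as False; every other number counts as True.\n",
    "print(bool(0))\n",
    "print(bool(None))\n",
    "print(bool(1))\n",
    "print(bool(-7.5))"
   ]
  },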
128 | {
129 | "cell_type": "markdown",
130 | "metadata": {
131 | "collapsed": true,
132 | "deletable": true,
133 | "editable": true
134 | },
135 | "source": [
136 | "It is a tradition in programming to print the string \"Hello World!\" as your first program. Luckily, Python makes it easy for us to do this. If you ever want to print something, write print( ), with whatever you want in between the parentheses. Try printing \"Hello World!\" below."
137 | ]
138 | },
139 | {
140 | "cell_type": "code",
141 | "execution_count": null,
142 | "metadata": {
143 | "collapsed": false,
144 | "deletable": true,
145 | "editable": true
146 | },
147 | "outputs": [],
148 | "source": [
149 | "print(\"Hello World!\"\n",
150 | " )"
151 | ]
152 | },
153 | {
154 | "cell_type": "markdown",
155 | "metadata": {
156 | "collapsed": false,
157 | "deletable": true,
158 | "editable": true
159 | },
160 | "source": [
161 | "You've heard of *variables* in math class before, but they are also an essential part of programming. They allow you to store a value, like a word or number, in the computer's *memory,* and also give it a name, so you can use this value later on in your program. You can set a variable with the help of an equal sign ( = ). The variable name comes before the equal sign and it's value comes after. In the example below, we are setting the variable *pi* to be equal to 3.142. "
162 | ]
163 | },
164 | {
165 | "cell_type": "code",
166 | "execution_count": null,
167 | "metadata": {
168 | "collapsed": false,
169 | "deletable": true,
170 | "editable": true
171 | },
172 | "outputs": [],
173 | "source": [
174 | "pi = 3.142\n",
175 | "pi"
176 | ]
177 | },
178 | {
179 | "cell_type": "markdown",
180 | "metadata": {
181 | "collapsed": false,
182 | "deletable": true,
183 | "editable": true
184 | },
185 | "source": [
186 | "Try setting the variable \"name\" to be equal to your name. Remember to put your name in quotes; that's how Python will recognize it!"
187 | ]
188 | },
189 | {
190 | "cell_type": "code",
191 | "execution_count": null,
192 | "metadata": {
193 | "collapsed": true,
194 | "deletable": true,
195 | "editable": true
196 | },
197 | "outputs": [],
198 | "source": [
199 | "name = #your name here"
200 | ]
201 | },
202 | {
203 | "cell_type": "markdown",
204 | "metadata": {
205 | "deletable": true,
206 | "editable": true
207 | },
208 | "source": [
209 | "Run this code."
210 | ]
211 | },
212 | {
213 | "cell_type": "code",
214 | "execution_count": null,
215 | "metadata": {
216 | "collapsed": true,
217 | "deletable": true,
218 | "editable": true
219 | },
220 | "outputs": [],
221 | "source": [
222 | "print(\"Hello\" + your name + \"!\")"
223 | ]
224 | },
225 | {
226 | "cell_type": "markdown",
227 | "metadata": {
228 | "collapsed": true,
229 | "deletable": true,
230 | "editable": true
231 | },
232 | "source": [
233 | "You might have noticed how we used the + sign to add, or *concatenate,* these strings together. This is why the + sign is called an *operator*, it performs an *operation* on strings. This is just like an operation in math, such as addition, subtraction, multiplication, or division."
234 | ]
235 | },
236 | {
237 | "cell_type": "markdown",
238 | "metadata": {
239 | "deletable": true,
240 | "editable": true
241 | },
242 | "source": [
243 | "Here are some examples of operations on *integers* and *doubles*. Because computers are great with working with big numbers, let's see your computer in action."
244 | ]
245 | },
246 | {
247 | "cell_type": "markdown",
248 | "metadata": {
249 | "deletable": true,
250 | "editable": true
251 | },
252 | "source": [
253 | "The text after the \"#\" sign is called a *comment*, which means it doesn't get run. Comments are useful to explain your code."
254 | ]
255 | },
256 | {
257 | "cell_type": "code",
258 | "execution_count": null,
259 | "metadata": {
260 | "collapsed": false,
261 | "deletable": true,
262 | "editable": true
263 | },
264 | "outputs": [],
265 | "source": [
266 | "print(127834 + 7345) # Addition\n",
267 | "print(8237645 - 10634) #Subtraction\n",
268 | "print(1234.5678 * 5678.9101112) #Multiplication\n",
269 | "print(341345.234028 / 7345.332) #Division"
270 | ]
271 | },
272 | {
273 | "cell_type": "markdown",
274 | "metadata": {
275 | "collapsed": false,
276 | "deletable": true,
277 | "editable": true
278 | },
279 | "source": [
280 | "Using our *pi* variable, let's calculate the area of a circle of radius 5 cm."
281 | ]
282 | },
283 | {
284 | "cell_type": "code",
285 | "execution_count": null,
286 | "metadata": {
287 | "collapsed": false,
288 | "deletable": true,
289 | "editable": true
290 | },
291 | "outputs": [],
292 | "source": [
293 | "pi = 3.142\n",
294 | "pi * 5 * 5\n"
295 | ]
296 | },
297 | {
298 | "cell_type": "markdown",
299 | "metadata": {
300 | "collapsed": false,
301 | "deletable": true,
302 | "editable": true
303 | },
304 | "source": [
305 | "Now that we've learnt about data types and variables, let's test ourselves. In the next cell, there will be examples of variables. Fill in the variable \"type_guess\" with the data type you think it is and run the cell."
306 | ]
307 | },
308 | {
309 | "cell_type": "code",
310 | "execution_count": null,
311 | "metadata": {
312 | "collapsed": true,
313 | "deletable": true,
314 | "editable": true
315 | },
316 | "outputs": [],
317 | "source": [
318 | "data_example1 = \"Data Science\"\n",
319 | "example1_type = type(data_example1)\n",
320 | "type_guess1 = \"______\"\n",
321 | "\n",
322 | "print(\"Your answer: \" + type_guess1)\n",
323 | "\n",
324 | "print(\"Correct answer:\")\n",
325 | "example1_type"
326 | ]
327 | },
328 | {
329 | "cell_type": "code",
330 | "execution_count": null,
331 | "metadata": {
332 | "collapsed": false,
333 | "deletable": true,
334 | "editable": true
335 | },
336 | "outputs": [],
337 | "source": [
338 | "data_example2 = 2678\n",
339 | "example2_type = type(data_example2)\n",
340 | "type_guess2 = \"_______\"\n",
341 | "\n",
342 | "print(\"Your answer: \" + type_guess2)\n",
343 | "\n",
344 | "print(\"Correct answer: \")\n",
345 | "example2_type"
346 | ]
347 | },
348 | {
349 | "cell_type": "code",
350 | "execution_count": null,
351 | "metadata": {
352 | "collapsed": true,
353 | "deletable": true,
354 | "editable": true
355 | },
356 | "outputs": [],
357 | "source": [
358 | "data_example3 = 0\n",
359 | "example3_type = type(data_example3)\n",
360 | "type_guess3 = \"______\"\n",
361 | "\n",
362 | "print(\"Your answer:\" + type_guess3)\n",
363 | "\n",
364 | "print(\"Correct answer: \")\n",
365 | "example3_type"
366 | ]
367 | },
368 | {
369 | "cell_type": "markdown",
370 | "metadata": {
371 | "collapsed": true,
372 | "deletable": true,
373 | "editable": true
374 | },
375 | "source": [
376 | "You might be wondering what exactly boolean values do in Python. Essentially, a boolean value tells us if something is true or false. Therefore, some expressions can output boolean values. Here are some examples:"
377 | ]
378 | },
379 | {
380 | "cell_type": "code",
381 | "execution_count": null,
382 | "metadata": {
383 | "collapsed": false,
384 | "deletable": true,
385 | "editable": true,
386 | "scrolled": true
387 | },
388 | "outputs": [],
389 | "source": [
390 | "4 == 4 #we use two equal signs to check if two things are equal; remember one equal sign is used for setting variables!"
391 | ]
392 | },
393 | {
394 | "cell_type": "code",
395 | "execution_count": null,
396 | "metadata": {
397 | "collapsed": false,
398 | "deletable": true,
399 | "editable": true
400 | },
401 | "outputs": [],
402 | "source": [
403 | "type(4 == 4)"
404 | ]
405 | },
406 | {
407 | "cell_type": "code",
408 | "execution_count": null,
409 | "metadata": {
410 | "collapsed": false,
411 | "deletable": true,
412 | "editable": true
413 | },
414 | "outputs": [],
415 | "source": [
416 | "4 > 5"
417 | ]
418 | },
419 | {
420 | "cell_type": "code",
421 | "execution_count": null,
422 | "metadata": {
423 | "collapsed": false,
424 | "deletable": true,
425 | "editable": true
426 | },
427 | "outputs": [],
428 | "source": [
429 | "type(4 > 5)"
430 | ]
431 | },
432 | {
433 | "cell_type": "code",
434 | "execution_count": null,
435 | "metadata": {
436 | "collapsed": false,
437 | "deletable": true,
438 | "editable": true
439 | },
440 | "outputs": [],
441 | "source": [
442 | "5 + 1 == 6"
443 | ]
444 | },
445 | {
446 | "cell_type": "markdown",
447 | "metadata": {
448 | "deletable": true,
449 | "editable": true
450 | },
451 | "source": [
452 | "As you can see, a comparison of two integer values will output a boolean value. A similar concept can be applied to other data types, but that will be covered later. For now, we will learn exactly why this is important. A very important part of Python is the *if-statement*. This statement lets you take a boolean value and if True, compute some action. If it's False, the code under the *if* will be ignored.\n",
453 | "\n",
454 | "The basic syntax of an if statement looks like this:\n",
455 | "\n",
456 | " if (boolean):\n",
457 | " \n",
458 | " some code\n",
459 | "\n",
460 | "You may have noticed that the code below the if is indented. This is very important! In Python, the indent is used to separate lines of code. All the code inside any command has to be at the same indent line otherwise Python can't read it. In order for python to understand that the action you want is for the \"if\" case, you must indent exactly one indent beyond where the \"if\" is. Here are some examples of the \"if\".\n",
461 | " \n"
462 | ]
463 | },
464 | {
465 | "cell_type": "code",
466 | "execution_count": null,
467 | "metadata": {
468 | "collapsed": false,
469 | "deletable": true,
470 | "editable": true
471 | },
472 | "outputs": [],
473 | "source": [
474 | "if 4 == 4:\n",
475 | " print(\"4 is equal to four!\")"
476 | ]
477 | },
478 | {
479 | "cell_type": "code",
480 | "execution_count": null,
481 | "metadata": {
482 | "collapsed": false,
483 | "deletable": true,
484 | "editable": true
485 | },
486 | "outputs": [],
487 | "source": [
488 | "example_bool = 5 > 4\n",
489 | "if example_bool:\n",
490 | " print(\"5 is greater than 4\")"
491 | ]
492 | },
493 | {
494 | "cell_type": "markdown",
495 | "metadata": {
496 | "deletable": true,
497 | "editable": true
498 | },
499 | "source": [
500 | "Notice that it wasn't needed to specify whether or not example_bool was equal, less than, or greater than anything. This is because example_bool is a true statement and Python will allow true if-statements to execute."
501 | ]
502 | },
503 | {
504 | "cell_type": "code",
505 | "execution_count": null,
506 | "metadata": {
507 | "collapsed": false,
508 | "deletable": true,
509 | "editable": true
510 | },
511 | "outputs": [],
512 | "source": [
513 | "example_bool = 4 > 5\n",
514 | "if example_bool == False:\n",
515 | " print(\"4 is greater than 5\")"
516 | ]
517 | },
518 | {
519 | "cell_type": "markdown",
520 | "metadata": {
521 | "deletable": true,
522 | "editable": true
523 | },
524 | "source": [
525 | "Notice in the last example, \"4 > 5\" seems to be a false value, but in the if-statement, it is asking if the value is equal to False. Thus, what the statement actually executes to is \"False == False\", which computes to \"True\"! Make sure you watch for tricks like this one!\n",
526 | "\n",
527 | "\n",
528 | "Now try it yourself! Fill in the if-statement with a true value and watch the code print.\n"
529 | ]
530 | },
531 | {
532 | "cell_type": "code",
533 | "execution_count": null,
534 | "metadata": {
535 | "collapsed": true,
536 | "deletable": true,
537 | "editable": true
538 | },
539 | "outputs": [],
540 | "source": [
541 | "if ___________:\n",
542 | " print(\"You have succeeded!\")"
543 | ]
544 | },
545 | {
546 | "cell_type": "markdown",
547 | "metadata": {
548 | "deletable": true,
549 | "editable": true
550 | },
551 | "source": [
552 | "Set a variable \"mood\" to any number. If \"mood\" is greater than 5, print \"I am glad you are in a good mood today!\""
553 | ]
554 | },
555 | {
556 | "cell_type": "code",
557 | "execution_count": null,
558 | "metadata": {
559 | "collapsed": true,
560 | "deletable": true,
561 | "editable": true
562 | },
563 | "outputs": [],
564 | "source": [
565 | "mood = _______\n",
566 | "if ________:\n",
567 | " print(____________)"
568 | ]
569 | },
570 | {
571 | "cell_type": "markdown",
572 | "metadata": {
573 | "deletable": true,
574 | "editable": true
575 | },
576 | "source": [
577 | "So far, the \"if\" statements that we've seen have only had one condition. That means, we only checked one boolean variable, and that was it. However, we can actually have any number of \"if\" checks. For example, if we have the score of a student, and we want to check which letter grade corresponds to their score, we could write the following:"
578 | ]
579 | },
580 | {
581 | "cell_type": "code",
582 | "execution_count": null,
583 | "metadata": {
584 | "collapsed": false,
585 | "deletable": true,
586 | "editable": true
587 | },
588 | "outputs": [],
589 | "source": [
590 | "score = 76\n",
591 | "if score >= 90:\n",
592 | " print(\"A\")\n",
593 | "elif 80 <= score < 90:\n",
594 | " print(\"B\")\n",
595 | "elif 70 <= score < 80:\n",
596 | " print(\"C\")\n",
597 | "elif 60 <= score < 70:\n",
598 | " print(\"D\")\n",
599 | "else:\n",
600 | " print(\"F\")"
601 | ]
602 | },
603 | {
604 | "cell_type": "markdown",
605 | "metadata": {
606 | "deletable": true,
607 | "editable": true
608 | },
609 | "source": [
610 | "Python evaluates these statements in order, and stops at the first one that is true. For example, in this case, the first two booleans are false (76 is not greater than 90 nor is it between 80 and 90), but the third is true (76 is between 70 and 80). Therefore, the code prints out C, and then stops evaluating this sequence of statements. "
611 | ]
612 | },
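  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "*(Added example.)* Here is one more small sketch showing that the order of the checks matters: for a score of 95, both conditions below are true, but only the first matching branch runs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "#Added example: Python stops at the first true condition, even if later ones are also true.\n",
    "score = 95\n",
    "if score >= 90:\n",
    "    print(\"A\")\n",
    "elif score >= 60:\n",
    "    print(\"Pass\")   # never reached for 95, because the first check already matched"
   ]
  },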
613 | {
614 | "cell_type": "markdown",
615 | "metadata": {
616 | "deletable": true,
617 | "editable": true
618 | },
619 | "source": [
620 | "We will build upon these foundational skills that you've learned in today's notebook in the next session. We encourage that you practice these tools on your own; you can do so by creating a blank notebook."
621 | ]
622 | }
623 | ],
624 | "metadata": {
625 | "anaconda-cloud": {},
626 | "kernelspec": {
627 | "display_name": "Python 3",
628 | "language": "python",
629 | "name": "python3"
630 | },
631 | "language_info": {
632 | "codemirror_mode": {
633 | "name": "ipython",
634 | "version": 3
635 | },
636 | "file_extension": ".py",
637 | "mimetype": "text/x-python",
638 | "name": "python",
639 | "nbconvert_exporter": "python",
640 | "pygments_lexer": "ipython3",
641 | "version": "3.6.0"
642 | }
643 | },
644 | "nbformat": 4,
645 | "nbformat_minor": 2
646 | }
647 |
--------------------------------------------------------------------------------
/lesson2/lesson2.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "deletable": true,
7 | "editable": true
8 | },
9 | "source": [
10 | "
"
11 | ]
12 | },
13 | {
14 | "cell_type": "markdown",
15 | "metadata": {
16 | "deletable": true,
17 | "editable": true
18 | },
19 | "source": [
20 | "## Lesson 2: Introduction to Statistics and Python Programming"
21 | ]
22 | },
23 | {
24 | "cell_type": "markdown",
25 | "metadata": {
26 | "collapsed": false,
27 | "deletable": true,
28 | "editable": true
29 | },
30 | "source": [
31 | "In the last lesson, we saw some cool ways in which computers can do amazing things with data. Today, we're going to be learning a little bit about Python, a computer programming language which is used by people of various professions including scientists, business people, and engineers. We will be using it to practice what we learned earlier in the session."
32 | ]
33 | },
34 | {
35 | "cell_type": "markdown",
36 | "metadata": {
37 | "deletable": true,
38 | "editable": true
39 | },
40 | "source": [
41 | "**Computer Programming and Programming Languages**\n",
42 | "\n",
43 | "One of the reasons why computers are useful is that they can store and run *programs*, a set of instructions that tells a computer what to do. In fact, you're running a computer program, called a Jupyter Notebook, right now! It's a program that lets you write your own programs. How cool is that?\n",
44 | "\n",
45 | "How can you tell a computer what to do? It doesn't understand Hindi, English, or any of the languages that we speak. This is where programming languages come in. They are the medium through which the computer understands what you want it to do. Think of them as the step between what you want to do and how you're going to tell the computer to do it. You may have heard of languages like Java and C++. Python is just like these. They may look different, but you can write programs to do the same things in different languages, just like we have different words in different spoken languages, for the same things.\n"
46 | ]
47 | },
48 | {
49 | "cell_type": "markdown",
50 | "metadata": {
51 | "deletable": true,
52 | "editable": true
53 | },
54 | "source": [
55 | "**Data Types and Variables in Python**"
56 | ]
57 | },
58 | {
59 | "cell_type": "markdown",
60 | "metadata": {
61 | "deletable": true,
62 | "editable": true
63 | },
64 | "source": [
65 | "Python can handle several different types of data. The most important ones that we'll be talking about are *integers*, *doubles*, *booleans* and *strings*.\n",
66 | "\n",
67 | "**integers** are whole numbers (without decimals) like 0, 1, 2 ...\n",
68 | "\n",
69 | "**doubles** are decimal numbers like 1.234, 56.8, 9.5. In Python, they are termed *float*.\n",
70 | "\n",
71 | "**booleans** are special values of only two types: either *True* or *False*. Examples of booleans are 0 = False, *nil* = False, 1 = True. In general, besides 0, all numbers are 'true' values. \n",
72 | "\n",
73 | "**strings** are anything placed between quotation marks like these: \" \". E.g \"hello\", \"data\"."
74 | ]
75 | },
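  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "*(Added example.)* You can check the claim about 0, 1, and None yourself: Python's built-in bool( ) function tells you whether a value counts as True or False. The cell below is a small sketch of that."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "#Added example: bool( ) shows how Python treats a value when it is used as a boolean.\n",
    "print(bool(0))      # False\n",
    "print(bool(1))      # True\n",
    "print(bool(42.5))   # any non-zero number counts as True\n",
    "print(bool(None))   # None also counts as False"
   ]
  },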
76 | {
77 | "cell_type": "code",
78 | "execution_count": null,
79 | "metadata": {
80 | "collapsed": false,
81 | "deletable": true,
82 | "editable": true
83 | },
84 | "outputs": [],
85 | "source": [
86 | "type(212412)"
87 | ]
88 | },
89 | {
90 | "cell_type": "code",
91 | "execution_count": null,
92 | "metadata": {
93 | "collapsed": false,
94 | "deletable": true,
95 | "editable": true
96 | },
97 | "outputs": [],
98 | "source": [
99 | "type(9.9999)"
100 | ]
101 | },
102 | {
103 | "cell_type": "code",
104 | "execution_count": null,
105 | "metadata": {
106 | "collapsed": false,
107 | "deletable": true,
108 | "editable": true
109 | },
110 | "outputs": [],
111 | "source": [
112 | "type(True)"
113 | ]
114 | },
115 | {
116 | "cell_type": "code",
117 | "execution_count": null,
118 | "metadata": {
119 | "collapsed": false,
120 | "deletable": true,
121 | "editable": true
122 | },
123 | "outputs": [],
124 | "source": [
125 | "type(\"data science\")"
126 | ]
127 | },
128 | {
129 | "cell_type": "markdown",
130 | "metadata": {
131 | "collapsed": true,
132 | "deletable": true,
133 | "editable": true
134 | },
135 | "source": [
136 | "It is a tradition in programming to print the string \"Hello World!\" as your first program. Luckily, Python makes it easy for us to do this. If you ever want to print something, write print( ), with whatever you want in between the parentheses. Try printing \"Hello World!\" below."
137 | ]
138 | },
139 | {
140 | "cell_type": "code",
141 | "execution_count": null,
142 | "metadata": {
143 | "collapsed": false,
144 | "deletable": true,
145 | "editable": true
146 | },
147 | "outputs": [],
148 | "source": [
149 | "print(\"Hello World!\"\n",
150 | " )"
151 | ]
152 | },
153 | {
154 | "cell_type": "markdown",
155 | "metadata": {
156 | "collapsed": false,
157 | "deletable": true,
158 | "editable": true
159 | },
160 | "source": [
161 | "You've heard of *variables* in math class before, but they are also an essential part of programming. They allow you to store a value, like a word or number, in the computer's *memory,* and also give it a name, so you can use this value later on in your program. You can set a variable with the help of an equal sign ( = ). The variable name comes before the equal sign and it's value comes after. In the example below, we are setting the variable *pi* to be equal to 3.142. "
162 | ]
163 | },
164 | {
165 | "cell_type": "code",
166 | "execution_count": null,
167 | "metadata": {
168 | "collapsed": false,
169 | "deletable": true,
170 | "editable": true
171 | },
172 | "outputs": [],
173 | "source": [
174 | "pi = 3.142\n",
175 | "pi"
176 | ]
177 | },
178 | {
179 | "cell_type": "markdown",
180 | "metadata": {
181 | "collapsed": false,
182 | "deletable": true,
183 | "editable": true
184 | },
185 | "source": [
186 | "Try setting the variable \"name\" to be equal to your name. Remember to put your name in quotes; that's how Python will recognize it!"
187 | ]
188 | },
189 | {
190 | "cell_type": "code",
191 | "execution_count": null,
192 | "metadata": {
193 | "collapsed": true,
194 | "deletable": true,
195 | "editable": true
196 | },
197 | "outputs": [],
198 | "source": [
199 | "name = #your name here"
200 | ]
201 | },
202 | {
203 | "cell_type": "markdown",
204 | "metadata": {
205 | "deletable": true,
206 | "editable": true
207 | },
208 | "source": [
209 | "Run this code."
210 | ]
211 | },
212 | {
213 | "cell_type": "code",
214 | "execution_count": null,
215 | "metadata": {
216 | "collapsed": true,
217 | "deletable": true,
218 | "editable": true
219 | },
220 | "outputs": [],
221 | "source": [
222 | "print(\"Hello\" + your name + \"!\")"
223 | ]
224 | },
225 | {
226 | "cell_type": "markdown",
227 | "metadata": {
228 | "collapsed": true,
229 | "deletable": true,
230 | "editable": true
231 | },
232 | "source": [
233 | "You might have noticed how we used the + sign to add, or *concatenate,* these strings together. This is why the + sign is called an *operator*, it performs an *operation* on strings. This is just like an operation in math, such as addition, subtraction, multiplication, or division."
234 | ]
235 | },
236 | {
237 | "cell_type": "markdown",
238 | "metadata": {
239 | "deletable": true,
240 | "editable": true
241 | },
242 | "source": [
243 | "Here are some examples of operations on *integers* and *doubles*. Because computers are great with working with big numbers, let's see your computer in action."
244 | ]
245 | },
246 | {
247 | "cell_type": "markdown",
248 | "metadata": {
249 | "deletable": true,
250 | "editable": true
251 | },
252 | "source": [
253 | "The text after the \"#\" sign is called a *comment*, which means it doesn't get run. Comments are useful to explain your code."
254 | ]
255 | },
256 | {
257 | "cell_type": "code",
258 | "execution_count": null,
259 | "metadata": {
260 | "collapsed": false,
261 | "deletable": true,
262 | "editable": true
263 | },
264 | "outputs": [],
265 | "source": [
266 | "print(127834 + 7345) # Addition\n",
267 | "print(8237645 - 10634) #Subtraction\n",
268 | "print(1234.5678 * 5678.9101112) #Multiplication\n",
269 | "print(341345.234028 / 7345.332) #Division"
270 | ]
271 | },
272 | {
273 | "cell_type": "markdown",
274 | "metadata": {
275 | "collapsed": false,
276 | "deletable": true,
277 | "editable": true
278 | },
279 | "source": [
280 | "Using our *pi* variable, let's calculate the area of a circle of radius 5 cm."
281 | ]
282 | },
283 | {
284 | "cell_type": "code",
285 | "execution_count": null,
286 | "metadata": {
287 | "collapsed": false,
288 | "deletable": true,
289 | "editable": true
290 | },
291 | "outputs": [],
292 | "source": [
293 | "pi = 3.142\n",
294 | "pi * 5 * 5\n"
295 | ]
296 | },
297 | {
298 | "cell_type": "markdown",
299 | "metadata": {
300 | "collapsed": false,
301 | "deletable": true,
302 | "editable": true
303 | },
304 | "source": [
305 | "Now that we've learnt about data types and variables, let's test ourselves. In the next cell, there will be examples of variables. Fill in the variable \"type_guess\" with the data type you think it is and run the cell."
306 | ]
307 | },
308 | {
309 | "cell_type": "code",
310 | "execution_count": null,
311 | "metadata": {
312 | "collapsed": true,
313 | "deletable": true,
314 | "editable": true
315 | },
316 | "outputs": [],
317 | "source": [
318 | "data_example1 = \"Data Science\"\n",
319 | "example1_type = type(data_example1)\n",
320 | "type_guess1 = \"______\"\n",
321 | "\n",
322 | "print(\"Your answer: \" + type_guess1)\n",
323 | "\n",
324 | "print(\"Correct answer:\")\n",
325 | "example1_type"
326 | ]
327 | },
328 | {
329 | "cell_type": "code",
330 | "execution_count": null,
331 | "metadata": {
332 | "collapsed": false,
333 | "deletable": true,
334 | "editable": true
335 | },
336 | "outputs": [],
337 | "source": [
338 | "data_example2 = 2678\n",
339 | "example2_type = type(data_example2)\n",
340 | "type_guess2 = \"_______\"\n",
341 | "\n",
342 | "print(\"Your answer: \" + type_guess2)\n",
343 | "\n",
344 | "print(\"Correct answer: \")\n",
345 | "example2_type"
346 | ]
347 | },
348 | {
349 | "cell_type": "code",
350 | "execution_count": null,
351 | "metadata": {
352 | "collapsed": true,
353 | "deletable": true,
354 | "editable": true
355 | },
356 | "outputs": [],
357 | "source": [
358 | "data_example3 = 0\n",
359 | "example3_type = type(data_example3)\n",
360 | "type_guess3 = \"______\"\n",
361 | "\n",
362 | "print(\"Your answer:\" + type_guess3)\n",
363 | "\n",
364 | "print(\"Correct answer: \")\n",
365 | "example3_type"
366 | ]
367 | },
368 | {
369 | "cell_type": "markdown",
370 | "metadata": {
371 | "collapsed": true,
372 | "deletable": true,
373 | "editable": true
374 | },
375 | "source": [
376 | "You might be wondering what exactly boolean values do in Python. Essentially, a boolean value tells us if something is true or false. Therefore, some expressions can output boolean values. Here are some examples:"
377 | ]
378 | },
379 | {
380 | "cell_type": "code",
381 | "execution_count": null,
382 | "metadata": {
383 | "collapsed": false,
384 | "deletable": true,
385 | "editable": true,
386 | "scrolled": true
387 | },
388 | "outputs": [],
389 | "source": [
390 | "4 == 4 #we use two equal signs to check if two things are equal; remember one equal sign is used for setting variables!"
391 | ]
392 | },
393 | {
394 | "cell_type": "code",
395 | "execution_count": null,
396 | "metadata": {
397 | "collapsed": false,
398 | "deletable": true,
399 | "editable": true
400 | },
401 | "outputs": [],
402 | "source": [
403 | "type(4 == 4)"
404 | ]
405 | },
406 | {
407 | "cell_type": "code",
408 | "execution_count": null,
409 | "metadata": {
410 | "collapsed": false,
411 | "deletable": true,
412 | "editable": true
413 | },
414 | "outputs": [],
415 | "source": [
416 | "4 > 5"
417 | ]
418 | },
419 | {
420 | "cell_type": "code",
421 | "execution_count": null,
422 | "metadata": {
423 | "collapsed": false,
424 | "deletable": true,
425 | "editable": true
426 | },
427 | "outputs": [],
428 | "source": [
429 | "type(4 > 5)"
430 | ]
431 | },
432 | {
433 | "cell_type": "code",
434 | "execution_count": null,
435 | "metadata": {
436 | "collapsed": false,
437 | "deletable": true,
438 | "editable": true
439 | },
440 | "outputs": [],
441 | "source": [
442 | "5 + 1 == 6"
443 | ]
444 | },
445 | {
446 | "cell_type": "markdown",
447 | "metadata": {
448 | "deletable": true,
449 | "editable": true
450 | },
451 | "source": [
452 | "As you can see, a comparison of two integer values will output a boolean value. A similar concept can be applied to other data types, but that will be covered later. For now, we will learn exactly why this is important. A very important part of Python is the *if-statement*. This statement lets you take a boolean value and if True, compute some action. If it's False, the code under the *if* will be ignored.\n",
453 | "\n",
454 | "The basic syntax of an if statement looks like this:\n",
455 | "\n",
456 | " if (boolean):\n",
457 | " \n",
458 | " some code\n",
459 | "\n",
460 | "You may have noticed that the code below the if is indented. This is very important! In Python, the indent is used to separate lines of code. All the code inside any command has to be at the same indent line otherwise Python can't read it. In order for python to understand that the action you want is for the \"if\" case, you must indent exactly one indent beyond where the \"if\" is. Here are some examples of the \"if\".\n",
461 | " \n"
462 | ]
463 | },
464 | {
465 | "cell_type": "code",
466 | "execution_count": null,
467 | "metadata": {
468 | "collapsed": false,
469 | "deletable": true,
470 | "editable": true
471 | },
472 | "outputs": [],
473 | "source": [
474 | "if 4 == 4:\n",
475 | " print(\"4 is equal to four!\")"
476 | ]
477 | },
478 | {
479 | "cell_type": "code",
480 | "execution_count": null,
481 | "metadata": {
482 | "collapsed": false,
483 | "deletable": true,
484 | "editable": true
485 | },
486 | "outputs": [],
487 | "source": [
488 | "example_bool = 5 > 4\n",
489 | "if example_bool:\n",
490 | " print(\"5 is greater than 4\")"
491 | ]
492 | },
493 | {
494 | "cell_type": "markdown",
495 | "metadata": {
496 | "deletable": true,
497 | "editable": true
498 | },
499 | "source": [
500 | "Notice that it wasn't needed to specify whether or not example_bool was equal, less than, or greater than anything. This is because example_bool is a true statement and Python will allow true if-statements to execute."
501 | ]
502 | },
503 | {
504 | "cell_type": "code",
505 | "execution_count": null,
506 | "metadata": {
507 | "collapsed": false,
508 | "deletable": true,
509 | "editable": true
510 | },
511 | "outputs": [],
512 | "source": [
513 | "example_bool = 4 > 5\n",
514 | "if example_bool == False:\n",
515 | " print(\"4 is greater than 5\")"
516 | ]
517 | },
518 | {
519 | "cell_type": "markdown",
520 | "metadata": {
521 | "deletable": true,
522 | "editable": true
523 | },
524 | "source": [
525 | "Notice in the last example, \"4 > 5\" seems to be a false value, but in the if-statement, it is asking if the value is equal to False. Thus, what the statement actually executes to is \"False == False\", which computes to \"True\"! Make sure you watch for tricks like this one!\n",
526 | "\n",
527 | "\n",
528 | "Now try it yourself! Fill in the if-statement with a true value and watch the code print.\n"
529 | ]
530 | },
531 | {
532 | "cell_type": "code",
533 | "execution_count": null,
534 | "metadata": {
535 | "collapsed": true,
536 | "deletable": true,
537 | "editable": true
538 | },
539 | "outputs": [],
540 | "source": [
541 | "if ___________:\n",
542 | " print(\"You have succeeded!\")"
543 | ]
544 | },
545 | {
546 | "cell_type": "markdown",
547 | "metadata": {
548 | "deletable": true,
549 | "editable": true
550 | },
551 | "source": [
552 | "Set a variable \"mood\" to any number. If \"mood\" is greater than 5, print \"I am glad you are in a good mood today!\""
553 | ]
554 | },
555 | {
556 | "cell_type": "code",
557 | "execution_count": null,
558 | "metadata": {
559 | "collapsed": true,
560 | "deletable": true,
561 | "editable": true
562 | },
563 | "outputs": [],
564 | "source": [
565 | "mood = _______\n",
566 | "if ________:\n",
567 | " print(____________)"
568 | ]
569 | },
570 | {
571 | "cell_type": "markdown",
572 | "metadata": {
573 | "deletable": true,
574 | "editable": true
575 | },
576 | "source": [
577 | "So far, the \"if\" statements that we've seen have only had one condition. That means, we only checked one boolean variable, and that was it. However, we can actually have any number of \"if\" checks. For example, if we have the score of a student, and we want to check which letter grade corresponds to their score, we could write the following:"
578 | ]
579 | },
580 | {
581 | "cell_type": "code",
582 | "execution_count": null,
583 | "metadata": {
584 | "collapsed": false,
585 | "deletable": true,
586 | "editable": true
587 | },
588 | "outputs": [],
589 | "source": [
590 | "score = 76\n",
591 | "if score >= 90:\n",
592 | " print(\"A\")\n",
593 | "elif 80 <= score < 90:\n",
594 | " print(\"B\")\n",
595 | "elif 70 <= score < 80:\n",
596 | " print(\"C\")\n",
597 | "elif 60 <= score < 70:\n",
598 | " print(\"D\")\n",
599 | "else:\n",
600 | " print(\"F\")"
601 | ]
602 | },
603 | {
604 | "cell_type": "markdown",
605 | "metadata": {
606 | "deletable": true,
607 | "editable": true
608 | },
609 | "source": [
610 | "Python evaluates these statements in order, and stops at the first one that is true. For example, in this case, the first two booleans are false (76 is not greater than 90 nor is it between 80 and 90), but the third is true (76 is between 70 and 80). Therefore, the code prints out C, and then stops evaluating this sequence of statements. "
611 | ]
612 | },
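  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "*(Added example.)* Here is one more small sketch showing that the order of the checks matters: for a score of 95, both conditions below are true, but only the first matching branch runs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "#Added example: Python stops at the first true condition, even if later ones are also true.\n",
    "score = 95\n",
    "if score >= 90:\n",
    "    print(\"A\")\n",
    "elif score >= 60:\n",
    "    print(\"Pass\")   # never reached for 95, because the first check already matched"
   ]
  },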
613 | {
614 | "cell_type": "markdown",
615 | "metadata": {
616 | "deletable": true,
617 | "editable": true
618 | },
619 | "source": [
620 | "We will build upon these foundational skills that you've learned in today's notebook in the next session. We encourage that you practice these tools on your own; you can do so by creating a blank notebook."
621 | ]
622 | }
623 | ],
624 | "metadata": {
625 | "anaconda-cloud": {},
626 | "kernelspec": {
627 | "display_name": "Python 3",
628 | "language": "python",
629 | "name": "python3"
630 | },
631 | "language_info": {
632 | "codemirror_mode": {
633 | "name": "ipython",
634 | "version": 3
635 | },
636 | "file_extension": ".py",
637 | "mimetype": "text/x-python",
638 | "name": "python",
639 | "nbconvert_exporter": "python",
640 | "pygments_lexer": "ipython3",
641 | "version": "3.6.0"
642 | }
643 | },
644 | "nbformat": 4,
645 | "nbformat_minor": 2
646 | }
647 |
--------------------------------------------------------------------------------
/lesson5/cricket_tiers.csv:
--------------------------------------------------------------------------------
1 | PLAYER,Salary,Tier
2 | MS Dhoni,1,1
3 | Virat Kohli,1,1
4 | Ajinkya Rahan,1,1
5 | Ravi Ashwin,1,1
6 | Suresh Raina,0.5,2
7 | Ambati Rayudu,0.5,2
8 | Rohit Sharma,0.5,2
9 | Murali Vijay,0.5,2
10 | Shikhar Dhawan,0.5,2
11 | Bhuvneshwar Kumar,0.5,2
12 | Umesh Yadav,0.5,2
13 | Ishant Sharma,0.5,2
14 | Cheteshwar Pujara,0.5,2
15 | Mohammed Shami,0.5,2
16 | Amit Mishra,0.25,3
17 | Axar Patel,0.25,3
18 | Stuart Binny,0.25,3
19 | Wriddhiman Saha,0.25,3
20 | Mohit Sharma,0.25,3
21 | Vinay Kumar,0.25,3
22 | Mohit Sharma,0.25,3
23 | Varun Aaron,0.25,3
24 | Karn Sharma,0.25,3
25 | Ravindra Jadeja,0.25,3
26 | KL Rahul,0.25,3
27 | Dhawal Kulkarni,0.25,3
28 | Harbhajan Singh,0.25,3
29 | S Aravind,0.25,3
--------------------------------------------------------------------------------
/lesson5/doctor_salaries.csv:
--------------------------------------------------------------------------------
1 | Job Title,Location,Salary
2 | Goverment of Tamil Nadu Doctor - Monthly,India,"59,447"
3 | Paras Hospital Doctor,India,"1,175,091"
4 | Indian Navy Doctor - Monthly,India,"122,566"
5 | Fortis Healthcare Doctor - Monthly,India,"319,328"
6 | Isha Foundation Doctor,India,"128,659"
7 | Border Security Force Doctor - Monthly Contractor,India,"47,065"
8 | Govt of Gujarat Doctor,India,"974,240"
9 | Tangedco Doctor,India,"826,281"
10 | Shathayu Ayurveda Wellness Centre Doctor,India,"1,269,676"
11 | DH&FWS Doctor - Monthly,India,"38,773"
12 | Agartala Government Medical College Doctor - Monthly,India,"64,477"
13 | Haryana Department of Health Doctor Intern - Monthly,India,"11,796"
14 | Odisha Government Doctor - Monthly,India,"39,845"
15 | Vardhman Mahavir Medical College Doctor - Monthly,India,"47,334"
16 | Ayushya Hospital Doctor - Monthly,India,"52,967"
17 | Government of Assam Doctor - Monthly,India,"32,988"
18 | Riyahd Military Hospital Doctor - Monthly,India,"128,411"
19 | "Fortis Hospital, NOIDA Doctor - Monthly",India,"51,686"
20 | Rajendra Institute of Medical Sciences Doctor - Monthly,India,"47,376"
21 | Government of Haryana Doctor - Monthly,India,"50,140"
22 | Best Doctors Doctor - Monthly,India,"82,389"
23 | Apollo Hospitals Doctor - Monthly,India,"118,980"
24 | All India Institute of Medical Sciences Doctor - Monthly,India,"247,352"
25 | Doctor's Exchange of Washington Doctor - Monthly,India,"40,841"
26 | Best Doctors Doctor,India,"7,387,274"
27 | Government of India Doctor - Monthly,India,"111,044"
28 | Government of India Doctor Intern - Monthly,India,"17,979"
29 | Municipal of Delhi Doctor - Monthly,India,"35,859"
30 | Best Doctors Doctor Intern - Monthly,India,"20,084"
31 | Government of India Doctor - Monthly Contractor,India,"47,095"
32 | Government of India Doctor,India,"516,600"
33 | Self-Employment and Entrepreneur Development Society (SEEDS) Doctor - Monthly,India,"155,303"
34 | Doctors Hospital at Renaissance Doctor - Monthly,India,"128,672"
35 | Fortis Hospitals Doctor - Monthly,India,"29,254"
36 | Fortis Hospitals Doctor,India,"705,009"
37 | Cancer Cross Doctor - Monthly,India,"49,763"
38 | Dr. Lal PathLabs Doctor - Monthly,India,"100,159"
39 | Govt of Uttarakhand Doctor - Monthly Contractor,India,"61,469"
40 | Apollo Health and Lifestyle Doctor - Monthly,India,"40,122"
41 | MGM medical college Doctor - Contractor,India,"385,930"
42 | Vijaya Hospital Doctor,India,"712,630"
43 | Sahyadri Speciality Hospital Doctor - Monthly,India,"17,759"
44 | Life Force Doctor - Monthly,India,"24,834"
45 | Indian Army Doctor,India,"1,149,396"
46 | Rajasthan High Court Doctor - Monthly,India,"71,984"
47 | Global Hospitals Group Doctor - Monthly,India,"33,327"
48 | Columbia Asia Doctor - Monthly,India,"69,998"
49 | Government of Odisha Doctor,India,"776,421"
50 | KMC Hospital Doctor - Monthly,India,"128,962"
51 | Government of NCT of Delhi Doctor - Monthly,India,"69,312"
52 | "Department of Health & Family Welfare, Punjab, India Doctor",India,"768,644"
53 | UPRVUNL Doctor - Monthly Contractor,India,"31,159"
54 | Dr. Batras' Positive Health Clinic Doctor - Monthly,India,"21,100"
55 | "Himachal Pradesh Government, India Doctor Intern - Monthly",India,"19,474"
56 | Pmch Doctor - Monthly,India,"128,970"
57 | Park Hospital Doctor,India,"420,063"
58 | Bad Hospitals in Delhi Doctor - Monthly,India,"19,999"
59 | Geeta Infotech Doctor - Monthly,India,"51,754"
60 | National Rural Health Mission Andhra Pradesh Doctor - Monthly Contractor,India,"25,966"
61 | The Jaypee Group Doctor - Monthly,India,"11,839"
62 | Pragya Doctor,India,"670,158"
63 | Lok Hospital Doctor - Monthly,India,"47,259"
64 | Bharat Vikas Parishad Hospital & Research Center Doctor,India,"1,295,646"
65 | Médecins Sans Frontières India Doctor - Monthly,India,"50,869"
66 | East Delhi Municipal Doctor,India,"1,415,971"
67 | ASRAM Hospital Doctor - Monthly,India,"29,456"
68 | KAMALA REDDY MEMORIAL FOUNDATION Doctor,India,"415,383"
69 | Select Medical Doctor - Monthly,India,"58,739"
70 | Apollo Doctor,India,"1,184,035"
71 | UT Southwestern Medical Center Doctor,India,"904,750"
72 | HealthSpring Doctor - Monthly Contractor,India,"115,694"
73 | Capgemini Doctor - Monthly,India,"96,512"
74 | Rmo Doctor - Monthly,India,"59,169"
75 | GURUKRUPA Doctor - Monthly,India,"38,506"
76 | Transocean Offshore Deepwater Drilling Doctor,India,"1,938,982"
77 | Maxim Healthcare Services Doctor - Monthly,India,"99,616"
78 | CMC Doctor - Monthly,India,"38,825"
79 | Alere Doctor - Monthly,India,"64,635"
80 | Apollo Education Group Doctor - Monthly,India,"41,390"
81 | Jessica McClintock Doctor Intern - Monthly,India,"128,809"
82 | Ayurveda Pura Doctor - Monthly,India,"50,176"
83 | Government Procurement Service Doctor - Monthly,India,"25,576"
84 | Home Office Doctor - Monthly,India,"19,401"
85 | Department of Health UK Doctor - Monthly Contractor,India,"34,714"
86 | Communities & Local Government Doctor - Monthly Contractor,India,"51,258"
87 | Star Hospitals Nurse Cum Physician Assistant - Monthly,India,"19,370"
88 | Subharti University Doctor ENT Surgeon - Monthly,India,"115,666"
89 | BARC Jrd Doctor Pathology - Monthly,India,"51,613"
90 | GVK Industries Emergency Response Care Physician,India,"440,721"
91 | ONGC Occupational Health Physician - Monthly,India,"103,755"
92 | "Rajiv Gandhi Cancer Institute & Researchcentre Physician, Investigator - Clinical Research - Monthly",India,"130,251"
93 | Ganga Medical Centre & Hospitals Doctor's Assistant,India,"231,833"
94 | Shabbir Tiles Doctor Clinic - Monthly,India,"128,848"
95 | BMC DOCTOR INMEDICAL COLLGE,India,"462,962"
96 | Dr.MANU'S HOMEOPATHY Branch Head Doctor,India,"192,913"
97 | MGM medical college Junior Resident Doctor - Monthly,India,"12,866"
98 | Dr. Netra Mandir Consultant Doctor,India,"948,358"
99 | Government of Maharashtra Doctor/Physion - Monthly,India,"58,846"
100 | Excel Homeo Medical Centre Consultant Physician,India,"141,097"
101 | Apollo Health Street A R Physician,India,"375,787"
102 | Govt. Medical College Junior Resident Doctor Department of General Surgery - Monthly,India,"51,427"
103 | Asian Heart Institute and Research Centre Physician Assistant - Monthly Contractor,India,"29,770"
104 | Madras Medical Mission Physician Assistant - Monthly,India,"31,971"
105 | Government of Kerala Veterinary Doctor - Monthly,India,"64,326"
106 | BanasDairy Veterinary Doctor - Monthly,India,"28,430"
107 | Blossom Hospital Consultant Diabetologist & Physician,India,"514,132"
108 | Dr.MANU'S HOMEOPATHY Homeopathy Doctor,India,"175,534"
109 | VS Hospital Junior Doctor Intern - Monthly,India,"8,524"
110 | Levioza Health Care Medical Doctor & Consultant - Monthly Contractor,India,"24,616"
111 | Merck KGaA Medical Affairs Physician,India,"773,302"
112 | Government of India Resident Doctor - Monthly,India,"33,404"
113 | Vardhman Mahavir Medical College Resident Doctor - Contractor,India,"774,587"
114 | Civil Hospital Amdavad Resident Doctor,India,"323,373"
115 | TNMC & Nair Hospital Resident Doctor In Surgery Intern - Monthly,India,"40,059"
116 | Apollo Hospitals Resident Doctor - Monthly,India,"40,498"
--------------------------------------------------------------------------------
/lesson5/lesson_5.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "
"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "## Lesson 5: Exercises"
15 | ]
16 | },
17 | {
18 | "cell_type": "markdown",
19 | "metadata": {},
20 | "source": [
21 | "We've given you a lot of material in the past few sessions. Let's work on a key concept we covered last session: Ttables."
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": 1,
27 | "metadata": {
28 | "collapsed": true
29 | },
30 | "outputs": [],
31 | "source": [
32 | "import numpy as np\n",
33 | "from datascience import *"
34 | ]
35 | },
36 | {
37 | "cell_type": "markdown",
38 | "metadata": {},
39 | "source": [
40 | "**RCB's Cricket Confusion** \n",
41 | "\n",
42 | "Royal Challengers Bangalore finished last in IPL 2017. The coach wants to build a strong team next year to win the trophy by buying the best possible players in the auction. You're a sports analyst whose job is to help him put a team together that's within his budget. There are 3 categories of players with the following costs in crores.\n",
43 | "Tier 1: 1\n",
44 | "Tier 2: 0.5\n",
45 | "Tier 3: 0.25\n",
46 | " \n",
47 | "The coach has 10 crores to spend and can purchase 11 to 16 players he wants from this set of 28. Write a function that accepts the coach's input for the number of players he wants that's within his budget.\n",
48 | "\n",
49 | "*Return* the players' total salary and *print* their names.\n",
50 | "REMEMBER: We refer to the first value in a column with 0!\n",
51 | "\n",
52 | "Sample Output: select_players(2, 4, 5)\n",
53 | "[MS Dhoni, Virat Kohli, Suresh Raina, Ambati Rayudu, Rohit Sharma, Murali Vijay, Amit Mishra, Axar Patel, Stuart Binny, Wriddhiman Saha, Mohit Sharma]\n",
54 | "5.25\n",
55 | "\n",
56 | "Sample Input: select_players(2, 2, 5)\n",
57 | "Sample Output: Too few players!\n",
58 | "\n",
59 | "Sample Input: select_players(10, 3, 1)\n",
60 | "Sample Output: Too many players!"
61 | ]
62 | },
63 | {
64 | "cell_type": "code",
65 | "execution_count": null,
66 | "metadata": {
67 | "collapsed": false
68 | },
69 | "outputs": [],
70 | "source": [
71 | "players = Table().read_table(\"cricket_tiers.csv\")\n",
72 | "#Source: http://www.totalsportek.com/cricket/indian-player-salaries-central-contracts-2015/\n",
73 | "players"
74 | ]
75 | },
76 | {
77 | "cell_type": "markdown",
78 | "metadata": {},
79 | "source": [
80 | "How can you find the number of players in each tier? Hint: Once you have this table, pass this number in as the first parameter of the *range* function when splitting up the players table"
81 | ]
82 | },
83 | {
84 | "cell_type": "code",
85 | "execution_count": null,
86 | "metadata": {
87 | "collapsed": false
88 | },
89 | "outputs": [],
90 | "source": [
91 | "players.______(\"Tier\")"
92 | ]
93 | },
94 | {
95 | "cell_type": "code",
96 | "execution_count": null,
97 | "metadata": {
98 | "collapsed": false,
99 | "scrolled": true
100 | },
101 | "outputs": [],
102 | "source": [
103 | "def select_players(tier_1, tier_2, tier_3):\n",
104 | " total_players = tier_1 + tier_2 + tier_3\n",
105 | " if ______________:\n",
106 | " print(\"Too few players!\")\n",
107 | " elif ____________:\n",
108 | " print(\"Too many players!\")\n",
109 | " else:\n",
110 | " t1 = players._________(\"______\", are.equal_to(_______)).take(range(0, tier_1)).column(\"PLAYER\")\n",
111 | " t2 =\n",
112 | " t3 = \n",
113 | " \n",
114 | " #How will you access the players' names using the column() function?\n",
115 | " \n",
116 | " \n",
117 | " \n",
118 | " total_salary = __________+_____________+______________\n",
119 | " if total_salary > 100:\n",
120 | " return \"Too expensive! Select a different combination!\"\n",
121 | " return total_salary"
122 | ]
123 | },
124 | {
125 | "cell_type": "markdown",
126 | "metadata": {},
127 | "source": [
128 | "Test out your function here."
129 | ]
130 | },
131 | {
132 | "cell_type": "code",
133 | "execution_count": null,
134 | "metadata": {
135 | "collapsed": true
136 | },
137 | "outputs": [],
138 | "source": [
139 | "select_players(____, ____, _____)"
140 | ]
141 | },
142 | {
143 | "cell_type": "markdown",
144 | "metadata": {},
145 | "source": [
146 | "**Building a Better Estimate**\n",
147 | "\n",
148 | "Raju the builder has made N measurements. Now, he wants to know the average value of the measurements made. In order to make the average value a better representative of the measurements, before calculating the average, he wants first to remove the highest K and the lowest K measurements. After that, he will calculate the average value among the remaining N - 2K measurements.\n",
149 | "Could you help Raju find the average value he will get after these manipulations?\n",
150 | "\n",
151 | "\n",
152 | "Sample Input: \n",
153 | "N - 5 \n",
154 | "K - 1\n",
155 | "N values - 2 9 -10 25 1\n",
156 | "Sample Output: \n",
157 | "4.00000\n"
158 | ]
159 | },
160 | {
161 | "cell_type": "code",
162 | "execution_count": null,
163 | "metadata": {
164 | "collapsed": false
165 | },
166 | "outputs": [],
167 | "source": [
168 | "def new_measurements(n, k, arr):\n",
169 | " \n",
170 | " for i in range(___, ___):\n",
171 | " "
172 | ]
173 | },
174 | {
175 | "cell_type": "markdown",
176 | "metadata": {},
177 | "source": [
178 | "**Aditi’s Career Planning**\n",
179 | " \n",
180 | "Aditi can’t decide what field she wants to work in when she grows up! She likes medicine and engineering equally so her father advised her to pick the field that pays the most to an average worker. Aditi has collected tables containing the necessary data on the salaries of professionals in these fields and stored them in 2 unsorted arrays. Can you help her find out which job to pick as per her father’s advice? \n",
181 | "\n",
182 | "Hint: use the sort() function.\n",
183 | "\n"
184 | ]
185 | },
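  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*(Added example.)* If you would like to see what sorting does before filling in the exercise, the cell below is a minimal sketch: it builds a small array with make_array and sorts it with np.sort. The values used are made up just for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#Added illustration (not part of the original exercise): np.sort returns a new array\n",
    "#with the values in increasing order, which makes it easy to pick out the middle value(s).\n",
    "demo = make_array(70, 10, 50, 30)\n",
    "np.sort(demo)"
   ]
  },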
186 | {
187 | "cell_type": "code",
188 | "execution_count": null,
189 | "metadata": {
190 | "collapsed": true
191 | },
192 | "outputs": [],
193 | "source": [
194 | "def salary(med, engg):\n",
195 | " #In Python, we can define functions inside our own functions. \n",
196 | " #This function will compute a certain quantity from each array for you to help you compare the salaries.\n",
197 | " #What quantity do you think it is?\n",
198 | " def helper(array):\n",
199 | " \n",
200 | " _________________\n",
201 | " \n",
202 | " _________________\n",
203 | " length = len(array)\n",
204 | " if ___________:\n",
205 | " return array[___]\n",
206 | " else:\n",
207 | " return __________\n",
208 | " \n",
209 | " med_salary = _____________\n",
210 | " engg_salary = ____________\n",
211 | " law_salary = _____________\n",
212 | " #The max() function takes the maximum of all the values you put into it\n",
213 | " best_salary = max(___________, ______________, ___________) \n",
214 | " if best_salary == engg_salary:\n",
215 | " print(\"Engineering\")\n",
216 | " else:\n",
217 | " print(\"Medicine\")\n"
218 | ]
219 | },
220 | {
221 | "cell_type": "markdown",
222 | "metadata": {},
223 | "source": [
224 | "Now, try to find an array of salaries from these two tables."
225 | ]
226 | },
227 | {
228 | "cell_type": "code",
229 | "execution_count": 21,
230 | "metadata": {
231 | "collapsed": false
232 | },
233 | "outputs": [
234 | {
235 | "data": {
236 | "text/plain": [
237 | "array([ 59447., 1175091., 122566., 319328., 128659., 47065.,\n",
238 | " 974240., 826281., 1269676., 38773., 64477., 11796.,\n",
239 | " 39845., 47334., 52967., 32988., 128411., 51686.,\n",
240 | " 47376., 50140., 82389., 118980., 247352., 40841.,\n",
241 | " 7387274., 111044., 17979., 35859., 20084., 47095.,\n",
242 | " 516600., 155303., 128672., 29254., 705009., 49763.,\n",
243 | " 100159., 61469., 40122., 385930., 712630., 17759.,\n",
244 | " 24834., 1149396., 71984., 33327., 69998., 776421.,\n",
245 | " 128962., 69312., 768644., 31159., 21100., 19474.,\n",
246 | " 128970., 420063., 19999., 51754., 25966., 11839.,\n",
247 | " 670158., 47259., 1295646., 50869., 1415971., 29456.,\n",
248 | " 415383., 58739., 1184035., 904750., 115694., 96512.,\n",
249 | " 59169., 38506., 1938982., 99616., 38825., 64635.,\n",
250 | " 41390., 128809., 50176., 25576., 19401., 34714.,\n",
251 | " 51258., 19370., 115666., 51613., 440721., 103755.,\n",
252 | " 130251., 231833., 128848., 462962., 192913., 12866.,\n",
253 | " 948358., 58846., 141097., 375787., 51427., 29770.,\n",
254 | " 31971., 64326., 28430., 514132., 175534., 8524.,\n",
255 | " 24616., 773302., 33404., 774587., 323373., 40059.,\n",
256 | " 40498.])"
257 | ]
258 | },
259 | "execution_count": 21,
260 | "metadata": {},
261 | "output_type": "execute_result"
262 | }
263 | ],
264 | "source": [
265 | "engineers = Table.read_table(\"engineering_data.csv\")\n",
266 | "#Source: http://research.aspiringminds.com/resources/#datasets\n",
267 | "doctors = Table.read_table(\"doctor_salaries.csv\")\n",
268 | "#Source: https://www.glassdoor.com/Salaries/india-doctor-salary-SRCH_IL.0,5_IN115_KO6,12_IP6.htm\n",
269 | "med_strip = list(map(lambda s : s.replace(\",\",\"\"), doctors.column(\"Salary\")))\n",
270 | "\n",
271 | "#This space is for you to see what's inside each table.\n"
272 | ]
273 | },
274 | {
275 | "cell_type": "code",
276 | "execution_count": null,
277 | "metadata": {
278 | "collapsed": false
279 | },
280 | "outputs": [],
281 | "source": [
282 | "engg_salaries =\n",
283 | "#We needed to clean up the table of doctors' salaries so that you could do calculations with it\n",
284 | "med_salaries = np.asarray(med_strip).astype(np.float)"
285 | ]
286 | },
287 | {
288 | "cell_type": "code",
289 | "execution_count": 2,
290 | "metadata": {
291 | "collapsed": true
292 | },
293 | "outputs": [],
294 | "source": [
295 | "#Call your salary function here.\n",
296 | "\n",
297 | "__________(________, ___________)"
298 | ]
299 | },
300 | {
301 | "cell_type": "markdown",
302 | "metadata": {},
303 | "source": [
304 | "We hope you got some great practice today! In the next session, we're going to look at how tables can be applied when we're looking at probability."
305 | ]
306 | }
307 | ],
308 | "metadata": {
309 | "kernelspec": {
310 | "display_name": "Python 3",
311 | "language": "python",
312 | "name": "python3"
313 | },
314 | "language_info": {
315 | "codemirror_mode": {
316 | "name": "ipython",
317 | "version": 3
318 | },
319 | "file_extension": ".py",
320 | "mimetype": "text/x-python",
321 | "name": "python",
322 | "nbconvert_exporter": "python",
323 | "pygments_lexer": "ipython3",
324 | "version": "3.6.0"
325 | }
326 | },
327 | "nbformat": 4,
328 | "nbformat_minor": 2
329 | }
330 |
--------------------------------------------------------------------------------
/lesson6/.ipynb_checkpoints/lesson6-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": null,
6 | "metadata": {
7 | "collapsed": true
8 | },
9 | "outputs": [],
10 | "source": [
11 | "import matplotlib.pyplot as plt\n",
12 | "import math\n",
13 | "import numpy as np\n",
14 | "import matplotlib.mlab as mlab\n",
15 | "import random\n",
16 | "from datascience import *\n",
17 | "%matplotlib inline\n",
18 | "import matplotlib.pyplot as plots\n",
19 | "plots.style.use('fivethirtyeight')\n",
20 | "import math"
21 | ]
22 | },
23 | {
24 | "cell_type": "markdown",
25 | "metadata": {},
26 | "source": [
27 | "# Intro to Probability -- Digging Deeper"
28 | ]
29 | },
30 | {
31 | "cell_type": "markdown",
32 | "metadata": {},
33 | "source": [
34 | "## Probability Distributions\n",
35 | "\n",
36 | "In the lesson, we ended with an experiment about coin flipping and we compared the empirical probabilities with the theoretical probabilites. Here, the same experiement is simulated, with an extra column for the theoretical probability. The experiment has been run 10 times, 100 times, 1000 times, and 10,000 times so you can see the difference between experimental and theoretical probablity as the numbers get extremely high. Run the cells and note what you see!\n",
37 | "\n",
38 | "Remember the table that we make in this way is known as a \"probability distribution.\" That's just a fancy way of saying that it tells us what the probability of each outcome is."
39 | ]
40 | },
41 | {
42 | "cell_type": "code",
43 | "execution_count": null,
44 | "metadata": {
45 | "collapsed": false
46 | },
47 | "outputs": [],
48 | "source": [
49 | "run_amount=10\n",
50 | "heads,tails=0,0\n",
51 | "for i in range(run_amount):\n",
52 | " coin=random.randint(0,1)\n",
53 | " if coin==0:\n",
54 | " heads+=1\n",
55 | " else:\n",
56 | " tails+=1\n",
57 | "theoretical_prob=(.5*run_amount)/run_amount\n",
58 | "Table().with_columns(\n",
59 | " 'Side', make_array('heads','tails'),\n",
60 | " 'Count', make_array(heads,tails),\n",
61 | " 'Empirical Probability', make_array(heads/run_amount,tails/run_amount),\n",
62 | " 'Theoretical Probability', make_array(theoretical_prob,theoretical_prob)\n",
63 | ")"
64 | ]
65 | },
66 | {
67 | "cell_type": "code",
68 | "execution_count": null,
69 | "metadata": {
70 | "collapsed": false
71 | },
72 | "outputs": [],
73 | "source": [
74 | "run_amount=100\n",
75 | "heads,tails=0,0\n",
76 | "for i in range(run_amount):\n",
77 | " coin=random.randint(0,1)\n",
78 | " if coin==0:\n",
79 | " heads+=1\n",
80 | " else:\n",
81 | " tails+=1\n",
82 | "theoretical_prob=(.5*run_amount)/run_amount\n",
83 | "Table().with_columns(\n",
84 | " 'Side', make_array('heads','tails'),\n",
85 | " 'Count', make_array(heads,tails),\n",
86 | " 'Empirical Probability', make_array(heads/run_amount,tails/run_amount),\n",
87 | " 'Theoretical Probability', make_array(theoretical_prob,theoretical_prob)\n",
88 | ")"
89 | ]
90 | },
91 | {
92 | "cell_type": "code",
93 | "execution_count": null,
94 | "metadata": {
95 | "collapsed": false
96 | },
97 | "outputs": [],
98 | "source": [
99 | "run_amount=1000\n",
100 | "heads,tails=0,0\n",
101 | "for i in range(run_amount):\n",
102 | " coin=random.randint(0,1)\n",
103 | " if coin==0:\n",
104 | " heads+=1\n",
105 | " else:\n",
106 | " tails+=1\n",
107 | "theoretical_prob=(.5*run_amount)/run_amount\n",
108 | "Table().with_columns(\n",
109 | " 'Side', make_array('heads','tails'),\n",
110 | " 'Count', make_array(heads,tails),\n",
111 | " 'Empirical Probability', make_array(heads/run_amount,tails/run_amount),\n",
112 | " 'Theoretical Probability', make_array(theoretical_prob,theoretical_prob)\n",
113 | ")"
114 | ]
115 | },
116 | {
117 | "cell_type": "code",
118 | "execution_count": null,
119 | "metadata": {
120 | "collapsed": false
121 | },
122 | "outputs": [],
123 | "source": [
124 | "run_amount=10000\n",
125 | "heads,tails=0,0\n",
126 | "for i in range(run_amount):\n",
127 | " coin=random.randint(0,1)\n",
128 | " if coin==0:\n",
129 | " heads+=1\n",
130 | " else:\n",
131 | " tails+=1\n",
132 | "theoretical_prob=(.5*run_amount)/run_amount\n",
133 | "Table().with_columns(\n",
134 | " 'Side', make_array('heads','tails'),\n",
135 | " 'Count', make_array(heads,tails),\n",
136 | " 'Empirical Probability', make_array(heads/run_amount,tails/run_amount),\n",
137 | " 'Theoretical Probability', make_array(theoretical_prob,theoretical_prob)\n",
138 | ")\n"
139 | ]
140 | },
141 | {
142 | "cell_type": "markdown",
143 | "metadata": {},
144 | "source": [
145 | "You probably noticed that as the experiement is run more times, the experimental and theoretical probabilities begin to align more and more. This is known as the law of large numbers.\n",
146 | "\n",
147 | "Now it's your turn! Let's consider a different scenario. Suppose we were rolling a die with numbers 1 through 6 on it instead. What's the theoretical probability of rolling any number? What does the empirical probability of rolling a 6 if you do this experiment 100 times? Fill in the code in the cell below with the right values to answer this question! After you do that, try increasing the run_amount to see what happens to the probability distribution."
148 | ]
149 | },
150 | {
151 | "cell_type": "code",
152 | "execution_count": null,
153 | "metadata": {
154 | "collapsed": false
155 | },
156 | "outputs": [],
157 | "source": [
158 | "run_amount = # fill in this value\n",
159 | "ones, others = 0, 0\n",
160 | "for i in range(run_amount):\n",
161 | " die_roll = random.randint( ) # fill in this function\n",
162 | " if die_roll == 0:\n",
163 | " ones += 1\n",
164 | " else:\n",
165 | " others += 1\n",
166 | "theoretical_prob_ones = # What's the probability of getting a 6 (or any one number)?\n",
167 | "theoretical_prob_others = # What's the probability of getting a value that's not a 6 (or your desired number)? \n",
168 | "#Hint: Use the complement rule\n",
169 | "Table().with_columns(\n",
170 | " 'Side', make_array(\"ones\", \"others\"),\n",
171 | " 'Count', make_array(ones, others),\n",
172 | " 'Empirical Probability', make_array(ones/run_amount,others/run_amount),\n",
173 | " 'Theoretical Probability', make_array(theoretical_prob_ones,theoretical_prob_others)\n",
174 | ")"
175 | ]
176 | },
177 | {
178 | "cell_type": "markdown",
179 | "metadata": {},
180 | "source": [
181 | "## Normal Distribution\n",
182 | "\n",
183 | "Coin tosses and die rolls are examples of \"discrete\" random variables. There are only a few specific outcomes: a coin toss can be either heads or tails, and a die roll can be 1, 2, 3, 4, 5, or 6. But some random variables are \"continuous\": they can take on any value in a range.\n",
184 | "\n",
185 | "For these continuous random variables, we can't display the probability distribution in a table like we could for coin tosses. Instead, we need to graph the distribution. In the lesson today, you learned about one such graph: the normal distribution. Let's take a look at it by running the following cells:"
186 | ]
187 | },
188 | {
189 | "cell_type": "code",
190 | "execution_count": null,
191 | "metadata": {
192 | "collapsed": false
193 | },
194 | "outputs": [],
195 | "source": [
196 | "mu = 100\n",
197 | "variance = 225\n",
198 | "sigma = 15\n",
199 | "x = np.linspace(mu-3*sigma,mu+3*sigma, 100)\n",
200 | "plt.plot(x,mlab.normpdf(x, mu, sigma))\n",
201 | "plt.axis([0,200, 0, 0.025])\n",
202 | "plt.show()\n"
203 | ]
204 | },
205 | {
206 | "cell_type": "markdown",
207 | "metadata": {},
208 | "source": [
209 | "As you can see, the shape of the normal distribution is determined by two parameters: mu, its mean, and sigma, its standard deviation. Here are a few more examples of normal distributions that have different values for mu and sigma:"
210 | ]
211 | },
212 | {
213 | "cell_type": "code",
214 | "execution_count": null,
215 | "metadata": {
216 | "collapsed": false
217 | },
218 | "outputs": [],
219 | "source": [
220 | "mu = 0\n",
221 | "variance = 1\n",
222 | "sigma = math.sqrt(variance)\n",
223 | "x = np.linspace(mu-3*sigma,mu+3*sigma, 100)\n",
224 | "plt.plot(x,mlab.normpdf(x, mu, sigma))\n",
225 | "plt.show()"
226 | ]
227 | },
228 | {
229 | "cell_type": "code",
230 | "execution_count": null,
231 | "metadata": {
232 | "collapsed": false,
233 | "scrolled": true
234 | },
235 | "outputs": [],
236 | "source": [
237 | "mu = 0\n",
238 | "variance = 100\n",
239 | "sigma = math.sqrt(variance)\n",
240 | "x = np.linspace(mu-3*sigma,mu+3*sigma, 100)\n",
241 | "plt.plot(x,mlab.normpdf(x, mu, sigma))\n",
242 | "plt.show()"
243 | ]
244 | },
245 | {
246 | "cell_type": "markdown",
247 | "metadata": {},
248 | "source": [
249 | "Now make your own! Insert values for mu and variance, run the cell, and see what happens!\n"
250 | ]
251 | },
252 | {
253 | "cell_type": "code",
254 | "execution_count": null,
255 | "metadata": {
256 | "collapsed": false
257 | },
258 | "outputs": [],
259 | "source": [
260 | "mu = ##\n",
261 | "variance = ##\n",
262 | "sigma = math.sqrt(variance)\n",
263 | "x = np.linspace(mu-3*sigma,mu+3*sigma, 100)\n",
264 | "plt.plot(x,mlab.normpdf(x, mu, sigma))\n",
265 | "plt.show()"
266 | ]
267 | },
268 | {
269 | "cell_type": "markdown",
270 | "metadata": {
271 | "collapsed": true
272 | },
273 | "source": [
274 | "## Normal Distributions and Coin Tosses\n",
275 | "\n",
276 | "Now we're going to try an experiment. Earlier, we flipped a coin 100 times and saw how many times it came up heads. Now we're going to repeat that same experiment a LOT of times and see what happens by plotting the resulting values in a histogram. First run the following cell:"
277 | ]
278 | },
279 | {
280 | "cell_type": "code",
281 | "execution_count": null,
282 | "metadata": {
283 | "collapsed": false
284 | },
285 | "outputs": [],
286 | "source": [
287 | "num_trials = 100\n",
288 | "results = [0]*101\n",
289 | "run_amount = 100\n",
290 | "for j in range(num_trials):\n",
291 | " heads = 0\n",
292 | " for i in range(run_amount):\n",
293 | " coin = random.randint(0,1)\n",
294 | " if coin == 0:\n",
295 | " heads += 1\n",
296 | " results[heads] = results[heads] + 1\n",
297 | "\n",
298 | "ticks = np.arange(len(results)) # this tells matplotlib how to arrange the bars\n",
299 | "plt.bar(ticks, results)\n",
300 | "plt.xlabel(\"Number of Heads\")\n",
301 | "plt.ylabel(\"Count\")"
302 | ]
303 | },
304 | {
305 | "cell_type": "markdown",
306 | "metadata": {},
307 | "source": [
308 | "Does the outline of the histogram look familiar? What if we increase the number of trials?"
309 | ]
310 | },
311 | {
312 | "cell_type": "code",
313 | "execution_count": null,
314 | "metadata": {
315 | "collapsed": false
316 | },
317 | "outputs": [],
318 | "source": [
319 | "num_trials = 1000\n",
320 | "run_amount = 100\n",
321 | "results = [0]*(run_amount + 1)\n",
322 | "for j in range(num_trials):\n",
323 | " heads = 0\n",
324 | " for i in range(run_amount):\n",
325 | " coin = random.randint(0,1)\n",
326 | " if coin == 0:\n",
327 | " heads += 1\n",
328 | " results[heads] = results[heads] + 1\n",
329 | "\n",
330 | "ticks = np.arange(len(results))\n",
331 | "plt.bar(ticks, results)\n",
332 | "plt.xlabel(\"Number of Heads\")\n",
333 | "plt.ylabel(\"Count\")"
334 | ]
335 | },
336 | {
337 | "cell_type": "markdown",
338 | "metadata": {},
339 | "source": [
340 | "It turns out that the shape of the graph is actually very similar to a normal distribution! This is what makes the normal distribution so powerful - it is a good approximation of what happens when we run a trial with a fixed probability of success many times.\n",
341 | "\n",
342 | "Now, one last exercise. Try to implement the same experiment for a die roll - that is, instead of counting the number of heads, count the number of \"ones\" on a die rolled 100 times for many trials. What happens to the graph? Does it still look like a normal curve? Try experimenting with different values of num_trials too."
343 | ]
344 | },
345 | {
346 | "cell_type": "code",
347 | "execution_count": null,
348 | "metadata": {
349 | "collapsed": true
350 | },
351 | "outputs": [],
352 | "source": [
353 | "num_trials = #\n",
354 | "run_amount = #\n",
355 | "results = [0]*(run_amount + 1)\n",
356 | "for j in range(num_trials):\n",
357 | " ones = 0\n",
358 | " for i in range(run_amount):\n",
359 | " die_roll = random.randint( ) # fill in this value so that the experiment is a die roll, not a coin toss\n",
360 | " if die_roll == 0:\n",
361 | " ones += 1\n",
362 | " results[ones] = results[ones] + 1\n",
363 | "\n",
364 | "ticks = np.arange(len(results))\n",
365 | "plt.bar(ticks, results)\n",
366 | "plt.xlabel(\"Number of Ones\")\n",
367 | "plt.ylabel(\"Count\")"
368 | ]
369 | }
370 | ],
371 | "metadata": {
372 | "kernelspec": {
373 | "display_name": "Python 3",
374 | "language": "python",
375 | "name": "python3"
376 | },
377 | "language_info": {
378 | "codemirror_mode": {
379 | "name": "ipython",
380 | "version": 3
381 | },
382 | "file_extension": ".py",
383 | "mimetype": "text/x-python",
384 | "name": "python",
385 | "nbconvert_exporter": "python",
386 | "pygments_lexer": "ipython3",
387 | "version": "3.6.0"
388 | }
389 | },
390 | "nbformat": 4,
391 | "nbformat_minor": 2
392 | }
393 |
--------------------------------------------------------------------------------
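
A minimal standalone sketch of the law-of-large-numbers comparison that the coin-flip cells above build up with the datascience Table class; this version uses only Python's standard library, and the function name and optional seed argument are illustrative rather than part of the lesson code:

```python
import random

def empirical_heads_probability(flips, seed=None):
    """Flip a fair coin `flips` times and return the observed share of heads."""
    rng = random.Random(seed)  # seed is optional; pass one for reproducible runs
    heads = sum(rng.randint(0, 1) == 0 for _ in range(flips))
    return heads / flips

theoretical = 0.5
for flips in (10, 100, 1000, 10000):
    empirical = empirical_heads_probability(flips)
    print(f"{flips:>6} flips: empirical={empirical:.4f}  "
          f"theoretical={theoretical:.2f}  gap={abs(empirical - theoretical):.4f}")
```

As in the notebook, the gap between the empirical and theoretical probability tends to shrink as the number of flips grows.
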
/lesson6/lesson6.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": null,
6 | "metadata": {
7 | "collapsed": true
8 | },
9 | "outputs": [],
10 | "source": [
11 | "import matplotlib.pyplot as plt\n",
12 | "import math\n",
13 | "import numpy as np\n",
14 | "import matplotlib.mlab as mlab\n",
15 | "import random\n",
16 | "from datascience import *\n",
17 | "%matplotlib inline\n",
18 | "import matplotlib.pyplot as plots\n",
19 | "plots.style.use('fivethirtyeight')\n",
20 | "import math"
21 | ]
22 | },
23 | {
24 | "cell_type": "markdown",
25 | "metadata": {},
26 | "source": [
27 | "# Intro to Probability -- Digging Deeper"
28 | ]
29 | },
30 | {
31 | "cell_type": "markdown",
32 | "metadata": {},
33 | "source": [
34 | "## Probability Distributions\n",
35 | "\n",
36 | "In the lesson, we ended with an experiment about coin flipping, and we compared the empirical probabilities with the theoretical probabilities. Here, the same experiment is simulated, with an extra column for the theoretical probability. The experiment has been run 10 times, 100 times, 1000 times, and 10,000 times so you can see the difference between experimental and theoretical probability as the number of runs gets very large. Run the cells and note what you see!\n",
37 | "\n",
38 | "Remember, the table that we make in this way is known as a \"probability distribution.\" That's just a fancy way of saying that it tells us the probability of each outcome."
39 | ]
40 | },
41 | {
42 | "cell_type": "code",
43 | "execution_count": null,
44 | "metadata": {
45 | "collapsed": false
46 | },
47 | "outputs": [],
48 | "source": [
49 | "run_amount=10\n",
50 | "heads,tails=0,0\n",
51 | "for i in range(run_amount):\n",
52 | " coin=random.randint(0,1)\n",
53 | " if coin==0:\n",
54 | " heads+=1\n",
55 | " else:\n",
56 | " tails+=1\n",
57 | "theoretical_prob=(.5*run_amount)/run_amount\n",
58 | "Table().with_columns(\n",
59 | " 'Side', make_array('heads','tails'),\n",
60 | " 'Count', make_array(heads,tails),\n",
61 | " 'Empirical Probability', make_array(heads/run_amount,tails/run_amount),\n",
62 | " 'Theoretical Probability', make_array(theoretical_prob,theoretical_prob)\n",
63 | ")"
64 | ]
65 | },
66 | {
67 | "cell_type": "code",
68 | "execution_count": null,
69 | "metadata": {
70 | "collapsed": false
71 | },
72 | "outputs": [],
73 | "source": [
74 | "run_amount=100\n",
75 | "heads,tails=0,0\n",
76 | "for i in range(run_amount):\n",
77 | " coin=random.randint(0,1)\n",
78 | " if coin==0:\n",
79 | " heads+=1\n",
80 | " else:\n",
81 | " tails+=1\n",
82 | "theoretical_prob=(.5*run_amount)/run_amount\n",
83 | "Table().with_columns(\n",
84 | " 'Side', make_array('heads','tails'),\n",
85 | " 'Count', make_array(heads,tails),\n",
86 | " 'Empirical Probability', make_array(heads/run_amount,tails/run_amount),\n",
87 | " 'Theoretical Probability', make_array(theoretical_prob,theoretical_prob)\n",
88 | ")"
89 | ]
90 | },
91 | {
92 | "cell_type": "code",
93 | "execution_count": null,
94 | "metadata": {
95 | "collapsed": false
96 | },
97 | "outputs": [],
98 | "source": [
99 | "run_amount=1000\n",
100 | "heads,tails=0,0\n",
101 | "for i in range(run_amount):\n",
102 | " coin=random.randint(0,1)\n",
103 | " if coin==0:\n",
104 | " heads+=1\n",
105 | " else:\n",
106 | " tails+=1\n",
107 | "theoretical_prob=(.5*run_amount)/run_amount\n",
108 | "Table().with_columns(\n",
109 | " 'Side', make_array('heads','tails'),\n",
110 | " 'Count', make_array(heads,tails),\n",
111 | " 'Empirical Probability', make_array(heads/run_amount,tails/run_amount),\n",
112 | " 'Theoretical Probability', make_array(theoretical_prob,theoretical_prob)\n",
113 | ")"
114 | ]
115 | },
116 | {
117 | "cell_type": "code",
118 | "execution_count": null,
119 | "metadata": {
120 | "collapsed": false
121 | },
122 | "outputs": [],
123 | "source": [
124 | "run_amount=10000\n",
125 | "heads,tails=0,0\n",
126 | "for i in range(run_amount):\n",
127 | " coin=random.randint(0,1)\n",
128 | " if coin==0:\n",
129 | " heads+=1\n",
130 | " else:\n",
131 | " tails+=1\n",
132 | "theoretical_prob=(.5*run_amount)/run_amount\n",
133 | "Table().with_columns(\n",
134 | " 'Side', make_array('heads','tails'),\n",
135 | " 'Count', make_array(heads,tails),\n",
136 | " 'Empirical Probability', make_array(heads/run_amount,tails/run_amount),\n",
137 | " 'Theoretical Probability', make_array(theoretical_prob,theoretical_prob)\n",
138 | ")\n"
139 | ]
140 | },
141 | {
142 | "cell_type": "markdown",
143 | "metadata": {},
144 | "source": [
145 | "You probably noticed that as the experiment is run more times, the experimental and theoretical probabilities align more and more closely. This is known as the law of large numbers.\n",
146 | "\n",
147 | "Now it's your turn! Let's consider a different scenario. Suppose we were rolling a die with the numbers 1 through 6 on it instead. What's the theoretical probability of rolling any particular number? What is the empirical probability of rolling a 6 if you run this experiment 100 times? Fill in the code in the cell below with the right values to answer this question! After you do that, try increasing the run_amount to see what happens to the probability distribution."
148 | ]
149 | },
150 | {
151 | "cell_type": "code",
152 | "execution_count": null,
153 | "metadata": {
154 | "collapsed": false
155 | },
156 | "outputs": [],
157 | "source": [
158 | "run_amount = # fill in this value\n",
159 | "ones, others = 0, 0\n",
160 | "for i in range(run_amount):\n",
161 | " die_roll = random.randint( ) # fill in this function\n",
162 | " if die_roll == 0:\n",
163 | " ones += 1\n",
164 | " else:\n",
165 | " others += 1\n",
166 | "theoretical_prob_ones = # What's the probability of getting a 6 (or any one number)?\n",
167 | "theoretical_prob_others = # What's the probability of getting a value that's not a 6 (or your desired number)? \n",
168 | "#Hint: Use the complement rule\n",
169 | "Table().with_columns(\n",
170 | " 'Side', make_array(\"ones\", \"others\"),\n",
171 | " 'Count', make_array(ones, others),\n",
172 | " 'Empirical Probability', make_array(ones/run_amount,others/run_amount),\n",
173 | " 'Theoretical Probability', make_array(theoretical_prob_ones,theoretical_prob_others)\n",
174 | ")"
175 | ]
176 | },
177 | {
178 | "cell_type": "markdown",
179 | "metadata": {},
180 | "source": [
181 | "## Normal Distribution\n",
182 | "\n",
183 | "Coin tosses and die rolls are examples of \"discrete\" random variables. There are only a few specific outcomes: a coin toss can be either heads or tails, and a die roll can be 1, 2, 3, 4, 5, or 6. But some random variables are \"continuous\": they can take on any value in a range.\n",
184 | "\n",
185 | "For these continuous random variables, we can't display the probability distribution in a table like we could for coin tosses. Instead, we need to graph the distribution. In the lesson today, you learned about one such graph: the normal distribution. Let's take a look at it by running the following cells:"
186 | ]
187 | },
188 | {
189 | "cell_type": "code",
190 | "execution_count": null,
191 | "metadata": {
192 | "collapsed": false
193 | },
194 | "outputs": [],
195 | "source": [
196 | "mu = 100\n",
197 | "variance = 225\n",
198 | "sigma = 15\n",
199 | "x = np.linspace(mu-3*sigma,mu+3*sigma, 100)\n",
200 | "plt.plot(x,mlab.normpdf(x, mu, sigma))\n",
201 | "plt.axis([0,200, 0, 0.025])\n",
202 | "plt.show()\n"
203 | ]
204 | },
205 | {
206 | "cell_type": "markdown",
207 | "metadata": {},
208 | "source": [
209 | "As you can see, the shape of the normal distribution is determined by two parameters: mu, its mean, and sigma, its standard deviation. Here are a few more examples of normal distributions that have different values for mu and sigma:"
210 | ]
211 | },
212 | {
213 | "cell_type": "code",
214 | "execution_count": null,
215 | "metadata": {
216 | "collapsed": false
217 | },
218 | "outputs": [],
219 | "source": [
220 | "mu = 0\n",
221 | "variance = 1\n",
222 | "sigma = math.sqrt(variance)\n",
223 | "x = np.linspace(mu-3*sigma,mu+3*sigma, 100)\n",
224 | "plt.plot(x,mlab.normpdf(x, mu, sigma))\n",
225 | "plt.show()"
226 | ]
227 | },
228 | {
229 | "cell_type": "code",
230 | "execution_count": null,
231 | "metadata": {
232 | "collapsed": false,
233 | "scrolled": true
234 | },
235 | "outputs": [],
236 | "source": [
237 | "mu = 0\n",
238 | "variance = 100\n",
239 | "sigma = math.sqrt(variance)\n",
240 | "x = np.linspace(mu-3*sigma,mu+3*sigma, 100)\n",
241 | "plt.plot(x,mlab.normpdf(x, mu, sigma))\n",
242 | "plt.show()"
243 | ]
244 | },
245 | {
246 | "cell_type": "markdown",
247 | "metadata": {},
248 | "source": [
249 | "Now make your own! Insert values for mu and variance, run the cell, and see what happens!\n"
250 | ]
251 | },
252 | {
253 | "cell_type": "code",
254 | "execution_count": null,
255 | "metadata": {
256 | "collapsed": false
257 | },
258 | "outputs": [],
259 | "source": [
260 | "mu = ##\n",
261 | "variance = ##\n",
262 | "sigma = math.sqrt(variance)\n",
263 | "x = np.linspace(mu-3*sigma,mu+3*sigma, 100)\n",
264 | "plt.plot(x,mlab.normpdf(x, mu, sigma))\n",
265 | "plt.show()"
266 | ]
267 | },
268 | {
269 | "cell_type": "markdown",
270 | "metadata": {
271 | "collapsed": true
272 | },
273 | "source": [
274 | "## Normal Distributions and Coin Tosses\n",
275 | "\n",
276 | "Now we're going to try an experiment. Earlier, we flipped a coin 100 times and saw how many times it came up heads. Now we're going to repeat that same experiment a LOT of times and see what happens by plotting the resulting values in a histogram. First run the following cell:"
277 | ]
278 | },
279 | {
280 | "cell_type": "code",
281 | "execution_count": null,
282 | "metadata": {
283 | "collapsed": false
284 | },
285 | "outputs": [],
286 | "source": [
287 | "num_trials = 100\n",
288 | "results = [0]*101\n",
289 | "run_amount = 100\n",
290 | "for j in range(num_trials):\n",
291 | " heads = 0\n",
292 | " for i in range(run_amount):\n",
293 | " coin = random.randint(0,1)\n",
294 | " if coin == 0:\n",
295 | " heads += 1\n",
296 | " results[heads] = results[heads] + 1\n",
297 | "\n",
298 | "ticks = np.arange(len(results)) # this tells matplotlib how to arrange the bars\n",
299 | "plt.bar(ticks, results)\n",
300 | "plt.xlabel(\"Number of Heads\")\n",
301 | "plt.ylabel(\"Count\")"
302 | ]
303 | },
304 | {
305 | "cell_type": "markdown",
306 | "metadata": {},
307 | "source": [
308 | "Does the outline of the histogram look familiar? What if we increase the number of trials?"
309 | ]
310 | },
311 | {
312 | "cell_type": "code",
313 | "execution_count": null,
314 | "metadata": {
315 | "collapsed": false
316 | },
317 | "outputs": [],
318 | "source": [
319 | "num_trials = 1000\n",
320 | "run_amount = 100\n",
321 | "results = [0]*(run_amount + 1)\n",
322 | "for j in range(num_trials):\n",
323 | " heads = 0\n",
324 | " for i in range(run_amount):\n",
325 | " coin = random.randint(0,1)\n",
326 | " if coin == 0:\n",
327 | " heads += 1\n",
328 | " results[heads] = results[heads] + 1\n",
329 | "\n",
330 | "ticks = np.arange(len(results))\n",
331 | "plt.bar(ticks, results)\n",
332 | "plt.xlabel(\"Number of Heads\")\n",
333 | "plt.ylabel(\"Count\")"
334 | ]
335 | },
336 | {
337 | "cell_type": "markdown",
338 | "metadata": {},
339 | "source": [
340 | "It turns out that the shape of the graph is actually very similar to a normal distribution! This is what makes the normal distribution so powerful - it is a good approximation of what happens when we run a trial with a fixed probability of success many times.\n",
341 | "\n",
342 | "Now, one last exercise. Try to implement the same experiment for a die roll - that is, instead of counting the number of heads, count the number of \"ones\" on a die rolled 100 times for many trials. What happens to the graph? Does it still look like a normal curve? Try experimenting with different values of num_trials too."
343 | ]
344 | },
345 | {
346 | "cell_type": "code",
347 | "execution_count": null,
348 | "metadata": {
349 | "collapsed": true
350 | },
351 | "outputs": [],
352 | "source": [
353 | "num_trials = #\n",
354 | "run_amount = #\n",
355 | "results = [0]*(run_amount + 1)\n",
356 | "for j in range(num_trials):\n",
357 | " ones = 0\n",
358 | " for i in range(run_amount):\n",
359 | " die_roll = random.randint( ) # fill in this value so that the experiment is a die roll, not a coin toss\n",
360 | " if die_roll == 0:\n",
361 | " ones += 1\n",
362 | " results[ones] = results[ones] + 1\n",
363 | "\n",
364 | "ticks = np.arange(len(results))\n",
365 | "plt.bar(ticks, results)\n",
366 | "plt.xlabel(\"Number of Ones\")\n",
367 | "plt.ylabel(\"Count\")"
368 | ]
369 | }
370 | ],
371 | "metadata": {
372 | "kernelspec": {
373 | "display_name": "Python 3",
374 | "language": "python",
375 | "name": "python3"
376 | },
377 | "language_info": {
378 | "codemirror_mode": {
379 | "name": "ipython",
380 | "version": 3
381 | },
382 | "file_extension": ".py",
383 | "mimetype": "text/x-python",
384 | "name": "python",
385 | "nbconvert_exporter": "python",
386 | "pygments_lexer": "ipython3",
387 | "version": "3.6.0"
388 | }
389 | },
390 | "nbformat": 4,
391 | "nbformat_minor": 2
392 | }
393 |
--------------------------------------------------------------------------------
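
The final exercise in lesson6.ipynb above leaves the die-roll histogram for students to fill in. One possible completion, assuming the die is modeled with random.randint(1, 6) and the experiment counts how often a 1 comes up; this is a hedged illustration rather than the official solution:

```python
import random
import numpy as np
import matplotlib.pyplot as plt

num_trials = 1000    # how many times the whole experiment is repeated
run_amount = 100     # die rolls per experiment
results = [0] * (run_amount + 1)

for _ in range(num_trials):
    ones = 0
    for _ in range(run_amount):
        die_roll = random.randint(1, 6)  # fair six-sided die
        if die_roll == 1:
            ones += 1
    results[ones] += 1  # record how many ones this experiment produced

ticks = np.arange(len(results))
plt.bar(ticks, results)
plt.xlim(0, 40)  # counts cluster around run_amount / 6, roughly 17
plt.xlabel("Number of Ones in 100 Rolls")
plt.ylabel("Count")
plt.show()
```

The histogram is centered near 17 rather than 50 and is narrower than the coin-flip version, but its outline is still roughly bell-shaped, which is the point the notebook is making.
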
/lesson7/british_india_troops.csv:
--------------------------------------------------------------------------------
1 | year,british_bengal,native_bengal,total_bengal,british_madras,native_madras,total_madras,british_bombay,native_bombay,total_bombay
2 | 1840,16303,102055,118358,12371,59711,72082,6930,38073,45003
3 | 1841,18873,106907,125780,11979,63183,75162,7554,42526,50080
4 | 1842,21114,109078,130192,12183,61378,73561,8816,42168,50984
5 | 1843,22007,113762,135769,14113,63804,77917,10606,43381,53987
6 | 1844,21645,112034,133679,14078,62547,76625,10517,41999,52516
7 | 1845,21783,133525,155308,14354,61953,76307,9974,44832,54806
8 | 1846,20445,133561,154006,12794,63217,76011,10775,43955,54730
9 | 1847,20898,132848,153746,12775,60904,73679,10650,53721,64371
10 | 1848,20596,114577,135173,12650,54806,67456,11024,51508,62532
11 | 1849,22727,124917,147644,12031,53697,65728,13135,50516,63651
12 | 1850,26803,126910,153713,11662,53867,65529,10815,47671,58486
13 | 1851,27159,138142,165301,11584,53667,65251,10665,48312,58977
14 | 1852,26089,139807,165896,11687,53714,65401,10933,45552,56485
15 | 1853,24986,139246,164232,11370,53787,65157,10577,45312,55889
16 | 1854,26531,138674,165205,11172,53254,64426,9443,44921,54364
17 | 1855,25344,139162,164506,10927,53031,63958,9822,44898,54720
18 | 1856,24594,137109,161703,10352,53201,63553,10158,44911,55069
19 | 1857,24366,135767,160133,10726,51244,61970,10430,45213,55643
20 | 1859,62167,82687,144854,17091,67141,84232,27032,46415,73447
21 | 1860,57778,91898,149676,17851,78440,96291,17237,42664,59901
22 | 1861,51791,86620,138411,18257,63727,81984,14246,34325,48571
23 | 1862,47912,39210,87122,16421,55687,72108,13841,31016,44857
24 | 1863,46614,40945,87559,15113,50964,66077,14358,28866,43224
25 | 1864,45283,42938,88221,15583,50131,65714,14095,27991,42086
26 | 1865,42128,43796,85924,16002,46693,62695,13750,27826,41576
--------------------------------------------------------------------------------
/lesson7/foreign_tourists.csv:
--------------------------------------------------------------------------------
1 | Nationality,Region,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010
2 | Canada,NORTH AMERICA,88600,93598,107671,135884,157643,176567,208214,222364,224069,242372
3 | U.S.A.,NORTH AMERICA,329147,348182,410803,526120,611165,696739,799062,804933,827140,931292
4 | Total,NORTH AMERICA,417747,441780,518474,662004,768808,873306,1007276,1027297,1051209,1173664
5 | Argentina,CENTRAL AND SOUTH AMERICA,2906,1359,1805,2799,3313,4493,4992,5087,6011,7626
6 | Brazil,CENTRAL AND SOUTH AMERICA,3819,3622,4528,7397,7005,9148,10788,11530,13964,15219
7 | Mexico,CENTRAL AND SOUTH AMERICA,3473,3105,3563,4570,5398,6502,8299,9272,8185,10458
8 | Others,CENTRAL AND SOUTH AMERICA,11727,9586,11758,13399,19870,18602,18240,17616,18444,29425
9 | Total,CENTRAL AND SOUTH AMERICA,21925,17672,21654,28165,35586,38745,42319,43505,46604,62728
10 | Austria,WESTERN EUROPE,17787,13801,16903,21093,27187,28045,26692,25900,27930,32620
11 | Belgium,WESTERN EUROPE,18851,13945,17309,24007,25596,29156,34207,36277,34759,37709
12 | Denmark,WESTERN EUROPE,14531,10230,11327,15805,20170,21592,28347,34253,30857,35541
13 | Finland,WESTERN EUROPE,8186,7673,8001,12525,16258,22860,34364,29223,24874,24089
14 | France,WESTERN EUROPE,102434,78194,97654,131824,152258,175345,204827,207802,196462,225232
15 | Germany,WESTERN EUROPE,80011,64891,76868,116679,120243,156808,184195,204344,191616,227720
16 | Greece,WESTERN EUROPE,3996,3207,3455,4468,4793,5146,6455,6672,6664,7441
17 | Ireland,WESTERN EUROPE,6136,5793,7083,8996,10052,14936,18376,18924,19223,20329
18 | Italy,WESTERN EUROPE,41351,37136,46908,65561,67642,79978,93540,85766,77873,94100
19 | Netherland,WESTERN EUROPE,42368,31669,40565,51211,52755,58611,67429,71605,64580,70756
20 | Norway,WESTERN EUROPE,7667,7475,8400,10631,11194,14216,19484,22369,22092,22229
21 | Portugal,WESTERN EUROPE,7028,7262,8158,10648,11457,13108,15756,15415,17184,21038
22 | Spain,WESTERN EUROPE,23073,19567,30551,42895,45247,53520,63357,62535,59047,72591
23 | Sweden,WESTERN EUROPE,14446,15330,18098,26154,28799,36013,47090,58961,43327,45028
24 | Switzerland,WESTERN EUROPE,25308,21606,24463,28260,34311,37446,41172,42107,38290,43134
25 | U.K.,WESTERN EUROPE,405472,387846,430917,555907,651083,734240,796191,776530,769251,759494
26 | Others,WESTERN EUROPE,1328,1158,1306,1633,3074,6251,4601,10842,10013,11291
27 | Total,WESTERN EUROPE,819973,726783,847966,1128297,1282119,1487271,1686083,1709525,1634042,1750342
28 | Czechoslovakia, EASTERN EUROPE,3197,2561,3466,4114,4783,5760,7764,8549,8328,9918
29 | Hungary, EASTERN EUROPE,1939,1557,1997,3527,3704,4262,5073,5263,4980,6022
30 | Poland, EASTERN EUROPE,5181,4468,6336,8445,10983,14808,20166,23517,19656,25424
31 | CIS, EASTERN EUROPE,25032,28304,38947,61816,75863,87433,109769,140341,135854,170112
32 | Others, EASTERN EUROPE,3514,3738,4506,5153,6077,7928,7946,10607,11565,12603
33 | Total, EASTERN EUROPE,38863,40628,55252,82426,101445,121309,152764,191110,183475,227650
34 | Egypt,AFRICA,2479,2688,3382,3781,4048,5528,6328,5326,5869,8017
35 | Ethiopia,AFRICA,2897,2535,2301,2661,3248,3140,3588,3306,3936,3797
36 | Kenya,AFRICA,15973,17275,16563,17538,19816,20313,25397,14941,22704,29223
37 | Mali,AFRICA,85,54,57,2541,114,162,238,232,273,495
38 | Mauritius,AFRICA,16039,14425,16308,19823,19760,20607,21522,19713,18866,21672
39 | Nigeria,AFRICA,7539,5997,5713,6659,10049,9348,10863,13997,18338,23893
40 | South Africa,AFRICA,21162,18238,23873,32148,39229,41954,46042,42337,44308,55688
41 | Sudan,AFRICA,2323,1899,2025,2487,3660,4355,4381,3473,4987,7418
42 | Tanzania,AFRICA,6579,7459,8515,9953,11193,11954,13960,14872,17020,17645
43 | Zambia,AFRICA,1290,1126,1383,1468,1848,2069,2814,1995,2249,2621
44 | Others,AFRICA,14596,11761,13233,16434,21836,23383,22352,21558,25924,34056
45 | Total,AFRICA,90962,83457,93353,115493,134801,142813,157485,141750,164474,204525
46 | Bahrein, WEST ASIA,3945,3754,4182,4414,4923,4793,6674,7224,7901,7766
47 | Israel, WEST ASIA,28774,25503,32157,39083,42866,42735,47553,42720,40581,43456
48 | Jordan, WEST ASIA,1428,1768,1686,2400,3333,3933,4537,4154,4301,4640
49 | Kuwait, WEST ASIA,1850,1838,2361,2965,3103,3773,4129,5302,5208,4764
50 | Oman, WEST ASIA,13114,13256,12352,14927,14979,17849,22284,34042,32971,35485
51 | Qatar, WEST ASIA,1361,1215,1434,1788,2176,2392,2606,2934,2765,2735
52 | Saudi Arabia, WEST ASIA,9851,8663,9961,11929,12444,14006,16352,16983,15552,21599
53 | Syria, WEST ASIA,1501,1452,1661,2289,2385,2645,2928,2883,3215,3586
54 | Turkey, WEST ASIA,2432,3354,5528,7008,7906,10221,11212,10934,10282,15483
55 | U.A.E., WEST ASIA,21483,22027,21374,22668,24560,27593,32750,63502,47234,45482
56 | Yemen Arab Rep., WEST ASIA,7773,6772,7717,8826,9423,9573,10898,11583,12695,14931
57 | Others, WEST ASIA,2912,2962,3183,4511,5723,7180,9738,13281,22138,35390
58 | Total, WEST ASIA,96424,92562,103596,122808,133821,146693,171661,215542,204843,235317
59 | Afghanistan,SOUTH ASIA,1248,6012,10079,12705,14025,18799,23045,32438,50446,73389
60 | Iran,SOUTH ASIA,11728,11815,17539,24733,28691,29771,33223,30149,34652,49265
61 | Maldives,SOUTH ASIA,17564,18826,18345,21099,33915,37652,45787,54956,55159,58152
62 | Nepal,SOUTH ASIA,41135,43056,42771,51534,77024,91552,83037,78133,88785,104374
63 | Pakistan,SOUTH ASIA,52762,2946,10364,67416,88609,83426,106283,85529,53137,51739
64 | Bangladesh,SOUTH ASIA,431312,435867,454611,477446,456371,484401,480240,541884,468899,431962
65 | Sri Lanka,SOUTH ASIA,112813,108008,109098,128711,136400,154813,204084,218805,239995,266515
66 | Bhutan,SOUTH ASIA,3571,4123,4082,7054,6934,8502,6729,9952,10328,12048
67 | Total,SOUTH ASIA,672133,630653,666889,790698,841969,908916,982428,1051846,1001401,1047444
68 | Indonesia,SOUTH EAST ASIA,7767,8694,9078,11408,12640,16990,17818,19609,20068,26171
69 | Malayasia,SOUTH EAST ASIA,57869,63748,70750,84390,96276,107286,112741,115794,135343,179077
70 | Maynnar,SOUTH EAST ASIA,3417,3037,3609,4932,5652,7734,7977,12147,12849,14719
71 | Philippines,SOUTH EAST ASIA,7199,7647,8091,10492,11422,15644,15567,17222,21987,24534
72 | Singapore,SOUTH EAST ASIA,42824,44306,48368,60710,68666,82574,92908,97851,95328,107487
73 | Thailand,SOUTH EAST ASIA,18623,19649,25754,33442,41978,46623,50037,58065,67309,76617
74 | Others,SOUTH EAST ASIA,2276,2210,3276,3736,4774,4875,6427,12237,7307,10438
75 | Total,SOUTH EAST ASIA,139975,149291,168926,209110,241408,281726,303475,332925,360191,439043
76 | China,EAST ASIA,18657,23207,33837,52279,63791,88833,118127,127032,123673,143445
77 | Hong Kong,EAST ASIA,918,581,1070,1965,1908,1466,914,519,1396,1507
78 | Japan,EAST ASIA,80634,59709,77996,96851,103082,119292,145538,145352,124756,168019
79 | Korea,EAST ASIA,29685,31552,36982,49314,53585,71920,85292,79824,70489,95612
80 | Others,EAST ASIA,570,375,621,1218,1201,1474,2166,2503,2483,3364
81 | Total,EAST ASIA,130464,115424,150506,201627,223567,282985,352037,355230,322797,411947
82 | Australia,AUSTRALASIA,52691,50743,58730,81608,96258,109867,135925,146209,149074,169647
83 | New Zealand,AUSTRALASIA,11700,10811,13283,16762,20463,23493,27498,29261,30876,37024
84 | Fiji,AUSTRALASIA,1422,1499,1519,2003,2326,2412,2635,2129,2031,2508
85 | Others,AUSTRALASIA,291,208,317,571,731,1664,1005,709,470,1096
86 | Total,AUSTRALASIA,66104,63261,73849,100944,119778,137436,167063,178308,182451,210275
87 | STATELESS, STATELESS,9393,13487,15516,1434,13490,647,26237,1025,624,670
88 | Others, Others,33319,9366,10232,13692,21818,25320,32676,34540,15588,12087
89 |
--------------------------------------------------------------------------------
/lesson8/.ipynb_checkpoints/Lesson8_Correlation-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | ""
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {
13 | "deletable": true,
14 | "editable": true
15 | },
16 | "source": [
17 | "# Lesson 8: Correlation\n",
18 | "\n",
19 | "In our last notebook, we learned how to use the power of programming to visualize data. But that's not all we can do with these Python libraries. Today we'll take our programming toolkit one step further and see how we can use the same libraries to calculate numerical information about our data after we've plotted it. Visualizations and numerical information together can tell us detailed stories about our data and what it means.\n",
20 | "\n",
21 | "We'll begin by using matplotlib - the same library we used to make line graphs, bar graphs, and pie charts - to explore the process of visualizing data using scatter plots. We'll then see how we can use another powerful library called numpy to numerically investigate the relationships between variables in our data by computing correlation coefficients. Finally we'll also look into plotting lines of best fit as an introduction to the concept of regression."
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": null,
27 | "metadata": {
28 | "collapsed": true,
29 | "deletable": true,
30 | "editable": true
31 | },
32 | "outputs": [],
33 | "source": [
34 | "import matplotlib\n",
35 | "matplotlib.use('Agg')\n",
36 | "from datascience import Table\n",
37 | "%matplotlib inline\n",
38 | "import matplotlib.pyplot as plt\n",
39 | "import numpy as np\n",
40 | "plt.style.use('fivethirtyeight')"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": null,
46 | "metadata": {
47 | "collapsed": true,
48 | "deletable": true,
49 | "editable": true
50 | },
51 | "outputs": [],
52 | "source": [
53 | "def checker(strong, positive):\n",
54 | " if strong:\n",
55 | " if positive:\n",
56 | " print(\"Try Again!\")\n",
57 | " else:\n",
58 | " print(\"You are correct! You now know how to make and read scatterplots to analyze trends in data!\") \n",
59 | " else:\n",
60 | " print(\"Try Again!\")"
61 | ]
62 | },
63 | {
64 | "cell_type": "markdown",
65 | "metadata": {
66 | "deletable": true,
67 | "editable": true
68 | },
69 | "source": [
70 | "## Scatterplots\n",
71 | "\n",
72 | "Remember from today's lesson that when we have data about two variables, the best way to visualize it is through something called a \"scatterplot.\" We can use our handy datascience library to quickly make lots of scatterplots.\n",
73 | "\n",
74 | "Let's start with the example of classroom participation from the lesson. Below we'll create a new Table using that data and then use the datascience \".scatter('Column Name 1', 'Column Name 2')\" function to create a scatterplot."
75 | ]
76 | },
77 | {
78 | "cell_type": "code",
79 | "execution_count": null,
80 | "metadata": {
81 | "collapsed": false,
82 | "deletable": true,
83 | "editable": true
84 | },
85 | "outputs": [],
86 | "source": [
87 | "classroom_participation = Table().with_columns([\n",
88 | " 'Student',['Sathvik','Anjali', 'Shreyan','Chaaru','Rishi','Divya'],\n",
89 | " '1st Week', [3,1,1,2,2,3],\n",
90 | " '12th Week', [7,2,4,5,6,5]\n",
91 | "])\n",
92 | "classroom_participation"
93 | ]
94 | },
95 | {
96 | "cell_type": "code",
97 | "execution_count": null,
98 | "metadata": {
99 | "collapsed": false,
100 | "deletable": true,
101 | "editable": true
102 | },
103 | "outputs": [],
104 | "source": [
105 | "classroom_participation.scatter('1st Week', '12th Week')"
106 | ]
107 | },
108 | {
109 | "cell_type": "markdown",
110 | "metadata": {
111 | "deletable": true,
112 | "editable": true
113 | },
114 | "source": [
115 | "Scatterplots are useful for illustrating \"trends\" in our data. As we can see from this scatterplot, students who tended to participate less in the first week also tended to participate less in the twelfth week - like Anjali for example, who participated the least out of all the students in both the 1st and 12th weeks. Similarly, students who tended to participate more in the first week also participated more in the twelfth week. This is exactly what we mean by a trend. \n",
116 | "\n",
117 | "This is an example of a \"weak positive\" correlation. It's weak because the data points don't fall exactly on a straight line, and it's positive because the data points roughly go from the bottom left to the top right of our scatterplot (positive slope). Now let's look at another example. Fill in the following cells to make a scatter plot of classroom absences vs. final grade in the class."
118 | ]
119 | },
120 | {
121 | "cell_type": "code",
122 | "execution_count": null,
123 | "metadata": {
124 | "collapsed": false,
125 | "deletable": true,
126 | "editable": true
127 | },
128 | "outputs": [],
129 | "source": [
130 | "student_abscences = Table().with_columns([\n",
131 | " 'Student',['Rashi','Amit', 'Simaran','Shruti','Eesha','Arpit'],\n",
132 | " 'Absences', [0,1,2,3,4,7],\n",
133 | " 'Grades', [99,95,90,80,75,68]\n",
134 | "])\n",
135 | "student_abscences"
136 | ]
137 | },
138 | {
139 | "cell_type": "code",
140 | "execution_count": null,
141 | "metadata": {
142 | "collapsed": false,
143 | "deletable": true,
144 | "editable": true
145 | },
146 | "outputs": [],
147 | "source": [
148 | "student_abscences.scatter( ) # fill in this line to make the scatterplot"
149 | ]
150 | },
151 | {
152 | "cell_type": "markdown",
153 | "metadata": {
154 | "deletable": true,
155 | "editable": true
156 | },
157 | "source": [
158 | "Is this data strongly or weakly correlated? And is the correlation positive or negative? Put your answers in the cell below and run it to find out if you're right!"
159 | ]
160 | },
161 | {
162 | "cell_type": "code",
163 | "execution_count": null,
164 | "metadata": {
165 | "collapsed": false,
166 | "deletable": true,
167 | "editable": true
168 | },
169 | "outputs": [],
170 | "source": [
171 | "strong = # put True here if you think it's a strong correlation and False if it's weak\n",
172 | "positive = # put True here if you think the data is positively correlated, and False if it's negatively correlated\n",
173 | "\n",
174 | "checker(strong, positive)"
175 | ]
176 | },
177 | {
178 | "cell_type": "markdown",
179 | "metadata": {
180 | "deletable": true,
181 | "editable": true
182 | },
183 | "source": [
184 | "## Correlation Coefficients\n",
185 | "\n",
186 | "Scatterplots can tell us a lot about trends in our data, but just knowing whether our data is strongly or weakly correlated and whether the correlation is positive or negative isn't enough. You might ask yourself, \"How strong is the correlation? Are some correlations stronger than others?\"\n",
187 | "\n",
188 | "This is where the idea of a correlation coefficient comes in. The correlation coefficient is a number, known as r, between -1 and 1 that can be calculated for any two variables. If r is negative the correlation is negative, and if it's positive the correlation is positive. And the \"magnitude\" of r (in other words, how close |r| is to 1) tells us how strong the correlation is.\n",
189 | "\n",
190 | "In the following cells, we'll walk you through an example of calculating the correlation coefficient for a dataset. For our dataset, let's take a look at some data we first saw all the way back in lesson 1. We'll use the datascience library to create a Table with fertility, population, life expectancy, and child mortality data for all the countries in the world for the year 2015."
191 | ]
192 | },
193 | {
194 | "cell_type": "code",
195 | "execution_count": null,
196 | "metadata": {
197 | "collapsed": false,
198 | "deletable": true,
199 | "editable": true
200 | },
201 | "outputs": [],
202 | "source": [
203 | "fertility_data = Table.read_table('fertility.csv').where('time', 2015).drop(\"time\")\n",
204 | "population_data = Table.read_table('population.csv').where('time', 2015).drop(\"time\")\n",
205 | "life_expectancy_data = Table.read_table('life_expectancy.csv').where('time', 2015).drop(\"time\")\n",
206 | "child_mortality_data = Table.read_table('child_mortality.csv').where('time', 2015).drop(\"time\")\n",
207 | "\n",
208 | "joined_data = fertility_data.join(\"geo\", population_data)\\\n",
209 | " .join(\"geo\", life_expectancy_data)\\\n",
210 | " .join(\"geo\", child_mortality_data)\n",
211 | "joined_data = joined_data.relabeled(\"children_per_woman_total_fertility\", \"fertility\")\\\n",
212 | " .relabeled(\"population_total\", \"population\")\\\n",
213 | " .relabeled(\"life_expectancy_years\", \"life expectancy\")\\\n",
214 | " .relabeled(\"child_mortality_0_5_year_olds_dying_per_1000_born\", \"child mortality\")\n",
215 | "joined_data"
216 | ]
217 | },
218 | {
219 | "cell_type": "markdown",
220 | "metadata": {
221 | "deletable": true,
222 | "editable": true
223 | },
224 | "source": [
225 | "Let's take a look at the \"child mortality\" and \"life expectancy\" columns. We'd expect these two to be negatively correlated, since the higher the risk of dying as a child, the lower the expected lifetime should be. We can see if this trend holds true by first plotting the data."
226 | ]
227 | },
228 | {
229 | "cell_type": "code",
230 | "execution_count": null,
231 | "metadata": {
232 | "collapsed": false,
233 | "deletable": true,
234 | "editable": true
235 | },
236 | "outputs": [],
237 | "source": [
238 | "joined_data.scatter(\"life expectancy\", \"child mortality\")"
239 | ]
240 | },
241 | {
242 | "cell_type": "markdown",
243 | "metadata": {
244 | "deletable": true,
245 | "editable": true
246 | },
247 | "source": [
248 | "Yup, in fact the data seems negatively correlated. How strong is the correlation exactly? Well to answer that let's compute the correlation coefficient, using the numpy library. First, let's get the two columns we want."
249 | ]
250 | },
251 | {
252 | "cell_type": "code",
253 | "execution_count": null,
254 | "metadata": {
255 | "collapsed": true,
256 | "deletable": true,
257 | "editable": true
258 | },
259 | "outputs": [],
260 | "source": [
261 | "life_expectancy = joined_data.column(\"life expectancy\")\n",
262 | "child_mortality = joined_data.column(\"child mortality\")"
263 | ]
264 | },
265 | {
266 | "cell_type": "markdown",
267 | "metadata": {
268 | "deletable": true,
269 | "editable": true
270 | },
271 | "source": [
272 | "Remember that the next step in computing the correlation coefficient is calculating the mean and standard deviation of our two variables. Numpy lets us do this with the convenient \"np.mean\" and \"np.std\" functions. Here's how to use them."
273 | ]
274 | },
275 | {
276 | "cell_type": "code",
277 | "execution_count": null,
278 | "metadata": {
279 | "collapsed": true,
280 | "deletable": true,
281 | "editable": true
282 | },
283 | "outputs": [],
284 | "source": [
285 | "avg_life_expectancy = np.mean(life_expectancy)\n",
286 | "stddev_life_expectancy = np.std(life_expectancy)\n",
287 | "avg_child_mortality = np.mean(child_mortality)\n",
288 | "stddev_child_mortality = np.std(child_mortality)"
289 | ]
290 | },
291 | {
292 | "cell_type": "markdown",
293 | "metadata": {
294 | "deletable": true,
295 | "editable": true
296 | },
297 | "source": [
298 | "The next step is to transform the data by calculating z_x = (x - x_mean)/x_stddev and z_y = (y - y_mean)/y_stddev for each x and y value, then multiplying together to get z_x * z_y. We can use the numpy \"subtract,\" \"divide,\" and \"multiply\" functions for this."
299 | ]
300 | },
301 | {
302 | "cell_type": "code",
303 | "execution_count": null,
304 | "metadata": {
305 | "collapsed": true,
306 | "deletable": true,
307 | "editable": true
308 | },
309 | "outputs": [],
310 | "source": [
311 | "transformed_life_expectancy = np.divide(np.subtract(life_expectancy, avg_life_expectancy), stddev_life_expectancy)\n",
312 | "transformed_child_mortality = np.divide(np.subtract(child_mortality, avg_child_mortality), stddev_child_mortality)\n",
313 | "products = np.multiply(transformed_life_expectancy, transformed_child_mortality)"
314 | ]
315 | },
316 | {
317 | "cell_type": "markdown",
318 | "metadata": {
319 | "deletable": true,
320 | "editable": true
321 | },
322 | "source": [
323 | "Finally, we add up the values in this array and divide by n, where n is the number of values, to get the final correlation coefficient."
324 | ]
325 | },
326 | {
327 | "cell_type": "code",
328 | "execution_count": null,
329 | "metadata": {
330 | "collapsed": false,
331 | "deletable": true,
332 | "editable": true
333 | },
334 | "outputs": [],
335 | "source": [
336 | "correlation_coefficient = np.sum(products) / len(products)\n",
337 | "print(\"Correlation coefficient: \", correlation_coefficient)"
338 | ]
339 | },
340 | {
341 | "cell_type": "markdown",
342 | "metadata": {
343 | "deletable": true,
344 | "editable": true
345 | },
346 | "source": [
347 | "As you can see the correlation coefficient is negative and very close to -1, indicating that this is a strong negative correlation as expected. \n",
348 | "\n",
349 | "Now it's your turn. Let's take a look at another pair of variables: fertility and population. Fill in the following cells to make the scatterplot."
350 | ]
351 | },
352 | {
353 | "cell_type": "code",
354 | "execution_count": null,
355 | "metadata": {
356 | "collapsed": false,
357 | "deletable": true,
358 | "editable": true
359 | },
360 | "outputs": [],
361 | "source": [
362 | "joined_data.scatter( ) # fill this in to get a scatterplot of fertility vs population"
363 | ]
364 | },
365 | {
366 | "cell_type": "markdown",
367 | "metadata": {
368 | "deletable": true,
369 | "editable": true
370 | },
371 | "source": [
372 | "Now guess whether the correlation is strong or weak, and if it's positive or negative. Then fill in the following cells to find out if you're right by computing the correlation coefficient!"
373 | ]
374 | },
375 | {
376 | "cell_type": "code",
377 | "execution_count": null,
378 | "metadata": {
379 | "collapsed": false,
380 | "deletable": true,
381 | "editable": true
382 | },
383 | "outputs": [],
384 | "source": [
385 | "fertility = joined_data.column( )\n",
386 | "population = joined_data.column( )\n",
387 | "\n",
388 | "avg_fertility = \n",
389 | "stddev_fertility = \n",
390 | "avg_population = \n",
391 | "stddev_population = \n",
392 | "\n",
393 | "transformed_fertility = np.divide(np.subtract( ), )\n",
394 | "transformed_population = np.divide(np.subtract( ), )\n",
395 | "products = \n",
396 | "\n",
397 | "correlation_coefficient = np.sum( ) / ( )\n",
398 | "print(\"This is the correlation coefficient: \", correlation_coefficient)"
399 | ]
400 | },
401 | {
402 | "cell_type": "markdown",
403 | "metadata": {
404 | "deletable": true,
405 | "editable": true
406 | },
407 | "source": [
408 | "A much quicker way to compute the correlation coefficient is to use the np.corrcoef() function. To check if you filled in the previous cells correctly and found the right value, run the next cell to see the right answer."
409 | ]
410 | },
411 | {
412 | "cell_type": "code",
413 | "execution_count": null,
414 | "metadata": {
415 | "collapsed": false,
416 | "deletable": true,
417 | "editable": true,
418 | "scrolled": true
419 | },
420 | "outputs": [],
421 | "source": [
422 | "print(\"This is the correlation coefficient: \", \n",
423 | " np.corrcoef(joined_data.column(\"population\"), joined_data.column(\"fertility\"))[1,0])"
424 | ]
425 | },
426 | {
427 | "cell_type": "markdown",
428 | "metadata": {
429 | "deletable": true,
430 | "editable": true
431 | },
432 | "source": [
433 | "## Line of Best Fit\n",
434 | "\n",
435 | "The last thing you learned about correlation is the idea of a line of best fit. This is a line that roughly \"fits\" the data by describing the ideal linear relationship between the data points. Luckily we can plot lines of best fit easily using the datascience library by adding an additional argument, \"fit_line = True\", to our call to the scatter function. Here's an example."
436 | ]
437 | },
438 | {
439 | "cell_type": "code",
440 | "execution_count": null,
441 | "metadata": {
442 | "collapsed": false,
443 | "deletable": true,
444 | "editable": true
445 | },
446 | "outputs": [],
447 | "source": [
448 | "joined_data.scatter(\"life expectancy\", \"child mortality\", fit_line=True)"
449 | ]
450 | },
451 | {
452 | "cell_type": "markdown",
453 | "metadata": {
454 | "deletable": true,
455 | "editable": true
456 | },
457 | "source": [
458 | "When the data is strongly correlated (correlation coefficient close to 1 or -1), the line fits most of the data points well. But when the data is not strongly correlated (correlation coefficient close to 0), the line of best fit doesn't seem to fit the data at all. Let's see what the line of best fit looks like for population and fertility. Fill in the following cell to create a scatterplot of population and fertility with a best fit line."
459 | ]
460 | },
461 | {
462 | "cell_type": "code",
463 | "execution_count": null,
464 | "metadata": {
465 | "collapsed": false,
466 | "deletable": true,
467 | "editable": true
468 | },
469 | "outputs": [],
470 | "source": [
471 | "joined_data.scatter( ) # fill this in"
472 | ]
473 | },
474 | {
475 | "cell_type": "markdown",
476 | "metadata": {
477 | "deletable": true,
478 | "editable": true
479 | },
480 | "source": [
481 | "In the next lesson, we'll learn how to actually mathematically find the equation for this line of best fit by using a technique known as \"regression.\""
482 | ]
483 | }
484 | ],
485 | "metadata": {
486 | "celltoolbar": "Raw Cell Format",
487 | "kernelspec": {
488 | "display_name": "Python 3",
489 | "language": "python",
490 | "name": "python3"
491 | },
492 | "language_info": {
493 | "codemirror_mode": {
494 | "name": "ipython",
495 | "version": 3
496 | },
497 | "file_extension": ".py",
498 | "mimetype": "text/x-python",
499 | "name": "python",
500 | "nbconvert_exporter": "python",
501 | "pygments_lexer": "ipython3",
502 | "version": "3.6.0"
503 | }
504 | },
505 | "nbformat": 4,
506 | "nbformat_minor": 2
507 | }
508 |
--------------------------------------------------------------------------------
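
The correlation walkthrough in Lesson8_Correlation above standardizes each column, multiplies the z-scores, and averages the products. A compact sketch of that recipe, reusing the absences and grades values from the notebook's absences-and-grades example so it runs without the lesson's CSV files; the helper function name is ours, not the notebook's:

```python
import numpy as np

absences = np.array([0, 1, 2, 3, 4, 7])
grades = np.array([99, 95, 90, 80, 75, 68])

def correlation(x, y):
    """Pearson r as the mean of z_x * z_y, using population standard deviations."""
    z_x = (x - np.mean(x)) / np.std(x)
    z_y = (y - np.mean(y)) / np.std(y)
    return np.sum(z_x * z_y) / len(x)

r_manual = correlation(absences, grades)
r_shortcut = np.corrcoef(absences, grades)[1, 0]
print(f"manual r = {r_manual:.4f}, np.corrcoef r = {r_shortcut:.4f}")  # both about -0.97
```

Both routes give the same value, and the strongly negative result matches the downward trend in the absences-versus-grades scatterplot.
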
/lesson8/Lesson8_Correlation.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | ""
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {
13 | "deletable": true,
14 | "editable": true
15 | },
16 | "source": [
17 | "# Lesson 8: Correlation\n",
18 | "\n",
19 | "In our last notebook, we learned how to use the power of programming to visualize data. But that's not all we can do with these Python libraries. Today we'll take our programming toolkit one step further and see how we can use the same libraries to calculate numerical information about our data after we've plotted it. Visualizations and numerical information together can tell us detailed stories about our data and what it means.\n",
20 | "\n",
21 | "We'll begin by using matplotlib - the same library we used to make line graphs, bar graphs, and pie charts - to explore the process of visualizing data using scatter plots. We'll then see how we can use another powerful library called numpy to numerically investigate the relationships between variables in our data by computing correlation coefficients. Finally we'll also look into plotting lines of best fit as an introduction to the concept of regression."
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": null,
27 | "metadata": {
28 | "collapsed": true,
29 | "deletable": true,
30 | "editable": true
31 | },
32 | "outputs": [],
33 | "source": [
34 | "import matplotlib\n",
35 | "matplotlib.use('Agg')\n",
36 | "from datascience import Table\n",
37 | "%matplotlib inline\n",
38 | "import matplotlib.pyplot as plt\n",
39 | "import numpy as np\n",
40 | "plt.style.use('fivethirtyeight')"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": null,
46 | "metadata": {
47 | "collapsed": true,
48 | "deletable": true,
49 | "editable": true
50 | },
51 | "outputs": [],
52 | "source": [
53 | "def checker(strong, positive):\n",
54 | " if strong:\n",
55 | " if positive:\n",
56 | " print(\"Try Again!\")\n",
57 | " else:\n",
58 | " print(\"You are correct! You now know how to make and read scatterplots to analyze trends in data!\") \n",
59 | " else:\n",
60 | " print(\"Try Again!\")"
61 | ]
62 | },
63 | {
64 | "cell_type": "markdown",
65 | "metadata": {
66 | "deletable": true,
67 | "editable": true
68 | },
69 | "source": [
70 | "## Scatterplots\n",
71 | "\n",
72 | "Remember from today's lesson that when we have data about two variables, the best way to visualize it is through something called a \"scatterplot.\" We can use our handy datascience library to quickly make lots of scatterplots.\n",
73 | "\n",
74 | "Let's start with the example of classroom participation from the lesson. Below we'll create a new Table using that data and then use the datascience \".scatter('Column Name 1', 'Column Name 2')\" function to create a scatterplot."
75 | ]
76 | },
77 | {
78 | "cell_type": "code",
79 | "execution_count": null,
80 | "metadata": {
81 | "collapsed": false,
82 | "deletable": true,
83 | "editable": true
84 | },
85 | "outputs": [],
86 | "source": [
87 | "classroom_participation = Table().with_columns([\n",
88 | " 'Student',['Sathvik','Anjali', 'Shreyan','Chaaru','Rishi','Divya'],\n",
89 | " '1st Week', [3,1,1,2,2,3],\n",
90 | " '12th Week', [7,2,4,5,6,5]\n",
91 | "])\n",
92 | "classroom_participation"
93 | ]
94 | },
95 | {
96 | "cell_type": "code",
97 | "execution_count": null,
98 | "metadata": {
99 | "collapsed": false,
100 | "deletable": true,
101 | "editable": true
102 | },
103 | "outputs": [],
104 | "source": [
105 | "classroom_participation.scatter('1st Week', '12th Week')"
106 | ]
107 | },
108 | {
109 | "cell_type": "markdown",
110 | "metadata": {
111 | "deletable": true,
112 | "editable": true
113 | },
114 | "source": [
115 | "Scatterplots are useful for illustrating \"trends\" in our data. As we can see from this scatterplot, students who tended to participate less in the first week also tended to participate less in the twelfth week - like Anjali for example, who participated the least out of all the students in both the 1st and 12th weeks. Similarly, students who tended to participate more in the first week also participated more in the twelfth week. This is exactly what we mean by a trend. \n",
116 | "\n",
117 | "This is an example of a \"weak positive\" correlation. It's weak because the data points don't fall exactly on a straight line, and it's positive because the data points roughly go from the bottom left to the top right of our scatterplot (positive slope). Now let's look at another example. Fill in the following cells to make a scatter plot of classroom absences vs. final grade in the class."
118 | ]
119 | },
120 | {
121 | "cell_type": "code",
122 | "execution_count": null,
123 | "metadata": {
124 | "collapsed": false,
125 | "deletable": true,
126 | "editable": true
127 | },
128 | "outputs": [],
129 | "source": [
130 | "student_abscences = Table().with_columns([\n",
131 | " 'Student',['Rashi','Amit', 'Simaran','Shruti','Eesha','Arpit'],\n",
132 | " 'Absences', [0,1,2,3,4,7],\n",
133 | " 'Grades', [99,95,90,80,75,68]\n",
134 | "])\n",
135 | "student_abscences"
136 | ]
137 | },
138 | {
139 | "cell_type": "code",
140 | "execution_count": null,
141 | "metadata": {
142 | "collapsed": false,
143 | "deletable": true,
144 | "editable": true
145 | },
146 | "outputs": [],
147 | "source": [
148 | "student_abscences.scatter( ) # fill in this line to make the scatterplot"
149 | ]
150 | },
151 | {
152 | "cell_type": "markdown",
153 | "metadata": {
154 | "deletable": true,
155 | "editable": true
156 | },
157 | "source": [
158 | "Is this data strongly or weakly correlated? And is the correlation positive or negative? Put your answers in the cell below and run it to find out if you're right!"
159 | ]
160 | },
161 | {
162 | "cell_type": "code",
163 | "execution_count": null,
164 | "metadata": {
165 | "collapsed": false,
166 | "deletable": true,
167 | "editable": true
168 | },
169 | "outputs": [],
170 | "source": [
171 | "strong = # put True here if you think it's a strong correlation and False if it's weak\n",
172 | "positive = # put True here if you think the data is positively correlated, and False if it's negatively correlated\n",
173 | "\n",
174 | "checker(strong, positive)"
175 | ]
176 | },
177 | {
178 | "cell_type": "markdown",
179 | "metadata": {
180 | "deletable": true,
181 | "editable": true
182 | },
183 | "source": [
184 | "## Correlation Coefficients\n",
185 | "\n",
186 | "Scatterplots can tell us a lot about trends in our data, but just knowing whether our data is strongly or weakly correlated and whether the correlation is positive or negative isn't enough. You might ask yourself, \"How strong is the correlation? Are some correlations stronger than others?\"\n",
187 | "\n",
188 | "This is where the idea of a correlation coefficient comes in. The correlation coefficient is a number, known as r, between -1 and 1 that can be calculated for any two variables. If r is negative the correlation is negative, and if it's positive the correlation is positive. And the \"magnitude\" of r (in other words, how close |r| is to 1) tells us how strong the correlation is.\n",
189 | "\n",
190 | "In the following cells, we'll walk you through an example of calculating the correlation coefficient for a dataset. For our dataset, let's take a look at some data we first saw all the way back in lesson 1. We'll use the datascience library to create a Table with fertility, population, life expectancy, and child mortality data for all the countries in the world for the year 2015."
191 | ]
192 | },
193 | {
194 | "cell_type": "code",
195 | "execution_count": null,
196 | "metadata": {
197 | "collapsed": false,
198 | "deletable": true,
199 | "editable": true
200 | },
201 | "outputs": [],
202 | "source": [
203 | "fertility_data = Table.read_table('fertility.csv').where('time', 2015).drop(\"time\")\n",
204 | "population_data = Table.read_table('population.csv').where('time', 2015).drop(\"time\")\n",
205 | "life_expectancy_data = Table.read_table('life_expectancy.csv').where('time', 2015).drop(\"time\")\n",
206 | "child_mortality_data = Table.read_table('child_mortality.csv').where('time', 2015).drop(\"time\")\n",
207 | "\n",
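"# Join the four tables on the 'geo' column, then give the columns shorter names\n",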
208 | "joined_data = fertility_data.join(\"geo\", population_data)\\\n",
209 | " .join(\"geo\", life_expectancy_data)\\\n",
210 | " .join(\"geo\", child_mortality_data)\n",
211 | "joined_data = joined_data.relabeled(\"children_per_woman_total_fertility\", \"fertility\")\\\n",
212 | " .relabeled(\"population_total\", \"population\")\\\n",
213 | " .relabeled(\"life_expectancy_years\", \"life expectancy\")\\\n",
214 | " .relabeled(\"child_mortality_0_5_year_olds_dying_per_1000_born\", \"child mortality\")\n",
215 | "joined_data"
216 | ]
217 | },
218 | {
219 | "cell_type": "markdown",
220 | "metadata": {
221 | "deletable": true,
222 | "editable": true
223 | },
224 | "source": [
225 | "Let's take a look at the \"child mortality\" and \"life expectancy\" columns. We'd expect these two to be negatively correlated, since the higher the risk of dying as a child, the lower the expected lifetime should be. We can see whether this trend holds by first plotting the data."
226 | ]
227 | },
228 | {
229 | "cell_type": "code",
230 | "execution_count": null,
231 | "metadata": {
232 | "collapsed": false,
233 | "deletable": true,
234 | "editable": true
235 | },
236 | "outputs": [],
237 | "source": [
238 | "joined_data.scatter(\"life expectancy\", \"child mortality\")"
239 | ]
240 | },
241 | {
242 | "cell_type": "markdown",
243 | "metadata": {
244 | "deletable": true,
245 | "editable": true
246 | },
247 | "source": [
248 | "Yup, the data does indeed look negatively correlated. But how strong is the correlation, exactly? To answer that, let's compute the correlation coefficient using the numpy library. First, let's get the two columns we want."
249 | ]
250 | },
251 | {
252 | "cell_type": "code",
253 | "execution_count": null,
254 | "metadata": {
255 | "collapsed": true,
256 | "deletable": true,
257 | "editable": true
258 | },
259 | "outputs": [],
260 | "source": [
261 | "life_expectancy = joined_data.column(\"life expectancy\")\n",
262 | "child_mortality = joined_data.column(\"child mortality\")"
263 | ]
264 | },
265 | {
266 | "cell_type": "markdown",
267 | "metadata": {
268 | "deletable": true,
269 | "editable": true
270 | },
271 | "source": [
272 | "Remember that the next step in computing the correlation coefficient is calculating the mean and standard deviation of our two variables. Numpy lets us do this with the convenient \"np.mean\" and \"np.std\" functions. Here's how to use them."
273 | ]
274 | },
275 | {
276 | "cell_type": "code",
277 | "execution_count": null,
278 | "metadata": {
279 | "collapsed": true,
280 | "deletable": true,
281 | "editable": true
282 | },
283 | "outputs": [],
284 | "source": [
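"# Mean and standard deviation of each variable (np.std computes the population standard deviation by default)\n",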
285 | "avg_life_expectancy = np.mean(life_expectancy)\n",
286 | "stddev_life_expectancy = np.std(life_expectancy)\n",
287 | "avg_child_mortality = np.mean(child_mortality)\n",
288 | "stddev_child_mortality = np.std(child_mortality)"
289 | ]
290 | },
291 | {
292 | "cell_type": "markdown",
293 | "metadata": {
294 | "deletable": true,
295 | "editable": true
296 | },
297 | "source": [
298 | "The next step is to transform the data into standard units by calculating z_x = (x - x_mean)/x_stddev and z_y = (y - y_mean)/y_stddev for each x and y value, then multiplying them together to get z_x * z_y. We can use the numpy \"subtract,\" \"divide,\" and \"multiply\" functions for this."
299 | ]
300 | },
301 | {
302 | "cell_type": "code",
303 | "execution_count": null,
304 | "metadata": {
305 | "collapsed": true,
306 | "deletable": true,
307 | "editable": true
308 | },
309 | "outputs": [],
310 | "source": [
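"# Convert each variable to standard units, then multiply the two arrays element-wise\n",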
311 | "transformed_life_expectancy = np.divide(np.subtract(life_expectancy, avg_life_expectancy), stddev_life_expectancy)\n",
312 | "transformed_child_mortality = np.divide(np.subtract(child_mortality, avg_child_mortality), stddev_child_mortality)\n",
313 | "products = np.multiply(transformed_life_expectancy, transformed_child_mortality)"
314 | ]
315 | },
316 | {
317 | "cell_type": "markdown",
318 | "metadata": {
319 | "deletable": true,
320 | "editable": true
321 | },
322 | "source": [
323 | "Finally, we add up the values in this array and divide by n, where n is the number of values, to get the final correlation coefficient."
324 | ]
325 | },
326 | {
327 | "cell_type": "code",
328 | "execution_count": null,
329 | "metadata": {
330 | "collapsed": false,
331 | "deletable": true,
332 | "editable": true
333 | },
334 | "outputs": [],
335 | "source": [
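"# r is the average of the products of the standard units\n",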
336 | "correlation_coefficient = np.sum(products) / len(products)\n",
337 | "print(\"Correlation coefficient: \", correlation_coefficient)"
338 | ]
339 | },
340 | {
341 | "cell_type": "markdown",
342 | "metadata": {
343 | "deletable": true,
344 | "editable": true
345 | },
346 | "source": [
347 | "As you can see, the correlation coefficient is negative and very close to -1, indicating that this is a strong negative correlation, as expected.\n",
348 | "\n",
349 | "Now it's your turn. Let's take a look at another pair of variables: fertility and population. Fill in the following cells to make the scatterplot."
350 | ]
351 | },
352 | {
353 | "cell_type": "code",
354 | "execution_count": null,
355 | "metadata": {
356 | "collapsed": false,
357 | "deletable": true,
358 | "editable": true
359 | },
360 | "outputs": [],
361 | "source": [
362 | "joined_data.scatter( ) # fill this in to get a scatterplot of fertility vs population"
363 | ]
364 | },
365 | {
366 | "cell_type": "markdown",
367 | "metadata": {
368 | "deletable": true,
369 | "editable": true
370 | },
371 | "source": [
372 | "Now guess whether the correlation is strong or weak, and if it's positive or negative. Then fill in the following cells to find out if you're right by computing the correlation coefficient!"
373 | ]
374 | },
375 | {
376 | "cell_type": "code",
377 | "execution_count": null,
378 | "metadata": {
379 | "collapsed": false,
380 | "deletable": true,
381 | "editable": true
382 | },
383 | "outputs": [],
384 | "source": [
385 | "fertility = joined_data.column( )\n",
386 | "population = joined_data.column( )\n",
387 | "\n",
388 | "avg_fertility = \n",
389 | "stddev_fertility = \n",
390 | "avg_population = \n",
391 | "stddev_population = \n",
392 | "\n",
393 | "transformed_fertility = np.divide(np.subtract( ), )\n",
394 | "transformed_population = np.divide(np.subtract( ), )\n",
395 | "products = \n",
396 | "\n",
397 | "correlation_coefficient = np.sum( ) / ( )\n",
398 | "print(\"This is the correlation coefficient: \", correlation_coefficient)"
399 | ]
400 | },
401 | {
402 | "cell_type": "markdown",
403 | "metadata": {
404 | "deletable": true,
405 | "editable": true
406 | },
407 | "source": [
408 | "A much quicker way to compute the correlation coefficient is to use the np.corrcoef() function. To check if you filled in the previous cells correctly and found the right value, run the next cell to see the right answer."
409 | ]
410 | },
411 | {
412 | "cell_type": "code",
413 | "execution_count": null,
414 | "metadata": {
415 | "collapsed": false,
416 | "deletable": true,
417 | "editable": true,
418 | "scrolled": true
419 | },
420 | "outputs": [],
421 | "source": [
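"# np.corrcoef returns a 2x2 correlation matrix; its [1,0] entry is the correlation between the two inputs\n",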
422 | "print(\"This is the correlation coefficient: \", \n",
423 | " np.corrcoef(joined_data.column(\"population\"), joined_data.column(\"fertility\"))[1,0])"
424 | ]
425 | },
426 | {
427 | "cell_type": "markdown",
428 | "metadata": {
429 | "deletable": true,
430 | "editable": true
431 | },
432 | "source": [
433 | "## Line of Best Fit\n",
434 | "\n",
435 | "The last thing you learned about correlation is the idea of a line of best fit. This is the single straight line that best \"fits\" the data, summarizing the overall linear relationship between the data points. Luckily, the datascience library makes it easy to plot a line of best fit: just add the extra argument \"fit_line = True\" to the call to the scatter function. Here's an example."
436 | ]
437 | },
438 | {
439 | "cell_type": "code",
440 | "execution_count": null,
441 | "metadata": {
442 | "collapsed": false,
443 | "deletable": true,
444 | "editable": true
445 | },
446 | "outputs": [],
447 | "source": [
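"# The extra fit_line=True argument overlays a line of best fit on the scatterplot\n",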
448 | "joined_data.scatter(\"life expectancy\", \"child mortality\", fit_line=True)"
449 | ]
450 | },
451 | {
452 | "cell_type": "markdown",
453 | "metadata": {
454 | "deletable": true,
455 | "editable": true
456 | },
457 | "source": [
458 | "When the data is strongly correlated (correlation coefficient close to 1 or -1), the line fits most of the data points closely. But when the data is not strongly correlated (correlation coefficient close to 0), the line of best fit doesn't seem to fit the data at all. Let's see what the line of best fit looks like for population and fertility. Fill in the following cell to create a scatterplot of population and fertility with a best fit line."
459 | ]
460 | },
461 | {
462 | "cell_type": "code",
463 | "execution_count": null,
464 | "metadata": {
465 | "collapsed": false,
466 | "deletable": true,
467 | "editable": true
468 | },
469 | "outputs": [],
470 | "source": [
471 | "joined_data.scatter( ) # fill this in"
472 | ]
473 | },
474 | {
475 | "cell_type": "markdown",
476 | "metadata": {
477 | "deletable": true,
478 | "editable": true
479 | },
480 | "source": [
481 | "In the next lesson, we'll learn how to mathematically find the equation of this line of best fit, using a technique known as \"regression.\""
482 | ]
483 | }
484 | ],
485 | "metadata": {
486 | "celltoolbar": "Raw Cell Format",
487 | "kernelspec": {
488 | "display_name": "Python 3",
489 | "language": "python",
490 | "name": "python3"
491 | },
492 | "language_info": {
493 | "codemirror_mode": {
494 | "name": "ipython",
495 | "version": 3
496 | },
497 | "file_extension": ".py",
498 | "mimetype": "text/x-python",
499 | "name": "python",
500 | "nbconvert_exporter": "python",
501 | "pygments_lexer": "ipython3",
502 | "version": "3.6.0"
503 | }
504 | },
505 | "nbformat": 4,
506 | "nbformat_minor": 2
507 | }
508 |
--------------------------------------------------------------------------------
/lesson9/circular.csv:
--------------------------------------------------------------------------------
1 | x,y
2 | -187,38
3 | -65,15
4 | -47,4
5 | -53,46
6 | -12,-31
7 | -48,-93
8 | -126,59
9 | -71,13
10 | -183,78
11 | -88,47
12 | -129,8
13 | -172,-88
14 | -23,-18
15 | -3,10
16 | -19,-48
17 | -86,-57
18 | -49,-10
19 | -69,-69
20 | -172,21
21 | -131,101
22 | -118,119
23 | -130,101
24 | -55,106
25 | -117,-125
26 | -137,-78
27 | -132,-3
28 | -141,-1
29 | -74,-123
30 | -74,-68
31 | -154,36
32 | -141,26
33 | -118,-76
34 | -98,-44
35 | -157,-61
36 | -49,80
37 | 19,62
38 | -116,-5
39 | -18,33
40 | -63,-84
41 | -64,-98
42 | 13,-19
43 | -32,13
44 | -152,-47
45 | -56,2
46 | -49,-17
47 | -43,-90
48 | -141,-36
49 | -4,108
50 | -86,-36
51 | -82,-69
52 | -91,-13
53 | -47,-12
54 | -82,156
55 | -127,33
56 | -142,171
57 | -74,-162
58 | -8,-109
59 | -78,-131
60 | -55,-40
61 | -115,32
62 | 4,-62
63 | -14,-97
64 | -85,-98
65 | -16,-23
66 | -125,19
67 | -125,6
68 | -13,9
69 | -129,33
70 | -86,127
71 | -47,101
72 | 46,43
73 | 2,63
74 | -137,-36
75 | -124,-84
76 | 55,-84
77 | -125,-52
78 | -77,43
79 | -40,-100
80 | 49,-1
81 | -51,-47
82 | -13,51
83 | -49,-15
84 | -104,17
85 | 59,2
86 | 24,14
87 | 28,51
88 | -27,1
89 | -114,94
90 | -25,160
91 | -50,94
92 | -53,54
93 | -40,-138
94 | -48,-71
95 | 28,12
96 | 25,-158
97 | 6,-54
98 | -40,-109
99 | -60,-94
100 | -75,-29
101 | 3,75
102 | -80,68
103 | 66,-77
104 | 44,97
105 | 23,122
106 | -100,116
107 | -62,82
108 | 1,130
109 | -22,143
110 | -25,116
111 | -69,181
112 | 31,-114
113 | -77,-118
114 | -20,-132
115 | -109,-85
116 | 10,32
117 | 49,-17
118 | -69,17
119 | -92,-56
120 | -39,51
121 | 67,57
122 | 55,40
123 | -20,102
124 | 13,-44
125 | -23,106
126 | -50,-32
127 | 20,26
128 | -114,22
129 | -42,59
130 | 26,182
131 | -82,-112
132 | 58,-56
133 | -55,20
134 | -71,-82
135 | -50,-73
136 | -29,14
137 | 89,-53
138 | 63,44
139 | -51,-34
140 | 49,-61
141 | 9,47
142 | 4,65
143 | -34,103
144 | 83,-56
145 | -66,-3
146 | -46,81
147 | -21,26
148 | 8,41
149 | 59,49
150 | 20,-83
151 | 26,1
152 | -50,-160
153 | -71,-10
154 | 54,29
155 | 55,-15
156 | -42,-99
157 | -98,-31
158 | -67,-90
159 | 98,-36
160 | -49,-66
161 | 20,18
162 | 31,75
163 | 57,95
164 | 43,91
165 | 78,-26
166 | -31,-21
167 | -42,6
168 | 67,149
169 | -43,32
170 | -45,-180
171 | 0,-112
172 | -18,-132
173 | 30,-23
174 | 9,-143
175 | -16,3
176 | 61,-128
177 | -77,-36
178 | -30,-54
179 | 83,-14
180 | 29,-77
181 | -90,111
182 | -16,-25
183 | 92,113
184 | -6,54
185 | -57,95
186 | -58,60
187 | -74,148
188 | 49,100
189 | 47,-9
190 | 98,-37
191 | 61,-115
192 | -7,33
193 | 19,-137
194 | 47,-53
195 | 40,-86
196 | -29,-102
197 | 5,-51
198 | -24,-20
199 | -1,71
200 | 74,67
201 | 18,80
202 | 17,73
203 | 99,127
204 | 112,-24
205 | 20,145
206 | -24,-12
207 | 102,78
208 | -27,-51
209 | -12,-144
210 | 58,-35
211 | 6,-160
212 | 87,-22
213 | -48,-67
214 | 45,29
215 | -47,-32
216 | 10,-32
217 | 113,-85
218 | -24,95
219 | 55,15
220 | 103,-10
221 | -44,110
222 | 87,-5
223 | 60,57
224 | 51,114
225 | 122,160
226 | -15,170
227 | 68,-171
228 | -24,-2
229 | 77,7
230 | 57,36
231 | 12,31
232 | 133,-116
233 | 134,-77
234 | 45,-51
235 | 7,-9
236 | 53,46
237 | 97,106
238 | 124,117
239 | 30,-6
240 | 55,49
241 | -42,84
242 | 72,158
243 | 69,43
244 | -12,140
245 | 43,27
246 | -48,-74
247 | 96,-96
248 | 65,-9
249 | -50,2
250 | 123,-130
251 | 6,-111
252 | -45,30
253 | 27,54
254 | -30,-98
255 | 144,10
256 | 36,58
257 | -12,108
258 | 61,-23
259 | 47,70
260 | 33,70
261 | 43,126
262 | 100,44
263 | 148,-136
264 | 9,-68
265 | 2,-144
266 | 128,-14
267 | 37,-30
268 | 117,36
269 | 86,9
270 | -33,76
271 | 125,73
272 | 33,81
273 | 65,103
274 | 28,105
275 | -14,73
276 | 40,-48
277 | 111,-28
278 | 114,31
279 | 113,50
280 | -15,-161
281 | 114,-77
282 | 100,-57
283 | 88,-101
284 | 122,-103
285 | 138,-114
286 | 101,-19
287 | 25,-3
288 | 156,66
289 | 52,98
290 | 0,81
291 | 150,107
292 | 122,122
293 | 71,149
294 | 162,113
295 | 70,2
296 | 93,-137
297 | 31,-78
298 | 131,-117
299 | 179,-78
300 | 126,-32
301 | 158,13
302 | -7,25
303 | 171,4
304 | 122,20
305 | 92,12
306 | 160,81
307 | 17,-24
308 | 136,43
309 | 94,3
310 | 23,-68
311 | 144,-32
312 | 110,43
313 | 11,70
314 | 107,14
315 | 183,31
316 | 112,-26
--------------------------------------------------------------------------------
/lesson9/family.csv:
--------------------------------------------------------------------------------
1 | Family,Father,Mother,Gender,Height,Kids
1,78.5,67,M,73.2,4
1,78.5,67,F,69.2,4
1,78.5,67,F,69,4
1,78.5,67,F,69,4
2,75.5,66.5,M,73.5,4
2,75.5,66.5,M,72.5,4
2,75.5,66.5,F,65.5,4
2,75.5,66.5,F,65.5,4
3,75,64,M,71,2
3,75,64,F,68,2
4,75,64,M,70.5,5
4,75,64,M,68.5,5
4,75,64,F,67,5
4,75,64,F,64.5,5
4,75,64,F,63,5
5,75,58.5,M,72,6
5,75,58.5,M,69,6
5,75,58.5,M,68,6
5,75,58.5,F,66.5,6
5,75,58.5,F,62.5,6
5,75,58.5,F,62.5,6
6,74,68,F,69.5,1
7,74,68,M,76.5,6
7,74,68,M,74,6
7,74,68,M,73,6
7,74,68,M,73,6
7,74,68,F,70.5,6
7,74,68,F,64,6
8,74,66.5,F,70.5,3
8,74,66.5,F,68,3
8,74,66.5,F,66,3
9,74.5,66,F,66,1
10,74,65.5,F,65.5,1
11,74,62,M,74,8
11,74,62,M,70,8
11,74,62,F,68,8
11,74,62,F,67,8
11,74,62,F,67,8
11,74,62,F,66,8
11,74,62,F,63.5,8
11,74,62,F,63,8
12,74,61,F,65,1
14,73,67,M,68,2
14,73,67,M,67,2
15,73,66.5,M,71,3
15,73,66.5,M,70.5,3
15,73,66.5,F,66.7,3
16,73,65,M,72,9
16,73,65,M,70.5,9
16,73,65,M,70.2,9
16,73,65,M,70.2,9
16,73,65,M,69.2,9
16,73,65,F,68.7,9
16,73,65,F,66.5,9
16,73,65,F,64.5,9
16,73,65,F,63.5,9
17,73,64.5,M,74,6
17,73,64.5,M,73,6
17,73,64.5,M,71.5,6
17,73,64.5,M,62.5,6
17,73,64.5,F,66.5,6
17,73,64.5,F,62.3,6
18,73,64,F,66,3
18,73,64,F,64.5,3
18,73,64,F,64,3
19,73.2,63,F,62.7,1
20,72.7,69,M,73.2,8
20,72.7,69,M,73,8
20,72.7,69,M,72.7,8
20,72.7,69,F,70,8
20,72.7,69,F,69,8
20,72.7,69,F,68.5,8
20,72.7,69,F,68,8
20,72.7,69,F,66,8
21,72,68,M,73,3
21,72,68,F,68.5,3
21,72,68,F,68,3
22,72,67,M,73,3
22,72,67,M,71,3
22,72,67,F,67,3
23,72,65,M,74.2,7
23,72,65,M,70.5,7
23,72,65,M,69.5,7
23,72,65,F,66,7
23,72,65,F,65.5,7
23,72,65,F,65,7
23,72,65,F,65,7
24,72,65.5,F,65.5,1
25,72,64,F,66,2
25,72,64,F,63,2
26,72,63,M,70.5,5
26,72,63,M,70.5,5
26,72,63,M,69,5
26,72,63,F,65,5
26,72,63,F,63,5
27,72,63,M,69,3
27,72,63,M,67,3
27,72,63,F,63,3
28,72,63,M,73,6
28,72,63,M,67,6
28,72,63,F,70.5,6
28,72,63,F,70,6
28,72,63,F,66.5,6
28,72,63,F,63,6
29,72.5,63.5,F,67.5,3
29,72.5,63.5,F,67.2,3
29,72.5,63.5,F,66.7,3
30,72,62,F,64,1
31,72.5,62,M,71,6
31,72.5,62,M,70,6
31,72.5,62,M,70,6
31,72.5,62,F,66,6
31,72.5,62,F,65,6
31,72.5,62,F,65,6
32,72,62,M,74,5
32,72,62,M,72,5
32,72,62,M,69,5
32,72,62,F,67.5,5
32,72,62,F,63.5,5
33,72,62,M,72,5
33,72,62,M,71.5,5
33,72,62,M,71.5,5
33,72,62,M,70,5
33,72,62,F,68,5
34,72,61,F,65.7,1
35,71,69,M,78,5
35,71,69,M,74,5
35,71,69,M,73,5
35,71,69,M,72,5
35,71,69,F,67,5
36,71,67,M,73.2,4
36,71,67,M,73,4
36,71,67,M,69,4
36,71,67,F,67,4
37,71,66,M,70,4
37,71,66,F,67,4
37,71,66,F,67,4
37,71,66,F,66.5,4
38,71,66,M,70,6
38,71,66,M,69,6
38,71,66,M,68.5,6
38,71,66,F,66,6
38,71,66,F,64.5,6
38,71,66,F,63,6
39,71,66,M,71,2
39,71,66,F,67,2
40,71,66,M,76,5
40,71,66,M,72,5
40,71,66,M,71,5
40,71,66,M,66,5
40,71,66,F,66,5
41,71.7,65.5,M,70.5,1
42,71,65.5,M,72,6
42,71,65.5,M,72,6
42,71,65.5,M,71,6
42,71,65.5,M,69,6
42,71,65.5,F,66,6
42,71,65.5,F,65,6
43,71.5,65.5,M,73,2
43,71.5,65.5,F,65.2,2
44,71.5,65,M,68.5,2
44,71.5,65,M,67.7,2
45,71,65,M,68,3
45,71,65,M,68,3
45,71,65,F,62,3
46,71,64,F,68,8
46,71,64,F,68,8
46,71,64,F,67.5,8
46,71,64,F,66.5,8
46,71,64,F,66.5,8
46,71,64,F,66,8
46,71,64,F,65.5,8
46,71,64,F,65,8
47,71.7,64.5,M,72,4
47,71.7,64.5,M,71,4
47,71.7,64.5,M,70.5,4
47,71.7,64.5,F,67,4
48,71,64,M,68,3
48,71,64,M,68,3
48,71,64,M,68,3
49,71.5,64.5,M,72,7
49,71.5,64.5,M,71,7
49,71.5,64.5,M,70,7
49,71.5,64.5,F,66,7
49,71.5,64.5,F,64.5,7
49,71.5,64.5,F,64.5,7
49,71.5,64.5,F,62,7
51,71.2,63,F,67.5,2
51,71.2,63,F,64.5,2
52,71,63.5,M,71,5
52,71,63.5,M,67,5
52,71,63.5,F,66,5
52,71,63.5,F,65,5
52,71,63.5,F,63.5,5
53,71,63,M,71,9
53,71,63,M,70,9
53,71,63,M,70,9
53,71,63,M,64,9
53,71,63,F,65,9
53,71,63,F,65,9
53,71,63,F,64,9
53,71,63,F,63,9
53,71,63,F,63,9
54,71,63,M,71,4
54,71,63,M,71,4
54,71,63,M,70,4
54,71,63,F,63.5,4
55,71,62,M,71,5
55,71,62,M,70,5
55,71,62,F,64.5,5
55,71,62,F,62.5,5
55,71,62,F,61.5,5
56,71,62,M,72,5
56,71,62,M,70.5,5
56,71,62,M,70.5,5
56,71,62,F,64.5,5
56,71,62,F,60,5
57,71,62.5,M,70,5
57,71,62.5,F,64,5
57,71,62.5,F,64,5
57,71,62.5,F,64,5
57,71,62.5,F,62.5,5
58,71,62,M,70.5,7
58,71,62,M,70,7
58,71,62,M,69,7
58,71,62,M,69,7
58,71,62,M,66,7
58,71,62,F,64.5,7
58,71,62,F,64,7
59,71,61,F,62,1
60,71,58,M,71.5,2
60,71,58,M,69,2
61,70,69,M,71,4
61,70,69,M,70,4
61,70,69,M,69,4
61,70,69,F,69,4
62,70,69,M,70,6
62,70,69,M,68.7,6
62,70,69,F,68,6
62,70,69,F,66,6
62,70,69,F,64,6
62,70,69,F,62,6
63,70,68,M,75,1
64,70,67,M,70,5
64,70,67,M,69,5
64,70,67,F,66,5
64,70,67,F,64,5
64,70,67,F,60,5
65,70,67,F,67.5,1
66,70,66.5,M,73,11
66,70,66.5,M,72,11
66,70,66.5,M,72,11
66,70,66.5,M,66.5,11
66,70,66.5,F,69.2,11
66,70,66.5,F,67.2,11
66,70,66.5,F,66.5,11
66,70,66.5,F,66,11
66,70,66.5,F,66,11
66,70,66.5,F,64.2,11
66,70,66.5,F,63.7,11
67,70.5,65,M,72,4
67,70.5,65,M,70.2,4
67,70.5,65,M,69,4
67,70.5,65,M,68.5,4
68,70.5,65,F,68,5
68,70.5,65,F,65,5
68,70.5,65,F,61.5,5
68,70.5,65,F,61,5
68,70.5,65,F,61,5
69,70,65,M,73,8
69,70,65,M,72,8
69,70,65,M,70.5,8
69,70,65,M,65,8
69,70,65,M,65,8
69,70,65,F,64.5,8
69,70,65,F,63,8
69,70,65,F,62,8
70,70,65,M,67,5
70,70,65,M,65,5
70,70,65,F,64.5,5
70,70,65,F,62.5,5
70,70,65,F,62.5,5
71,70,65,M,70,6
71,70,65,M,70,6
71,70,65,F,67,6
71,70,65,F,65,6
71,70,65,F,65,6
71,70,65,F,63,6
72,70,65,M,79,7
72,70,65,M,75,7
72,70,65,M,71,7
72,70,65,F,69,7
72,70,65,F,67,7
72,70,65,F,65.7,7
72,70,65,F,62,7
73,70,65,M,73,3
73,70,65,M,72.5,3
73,70,65,F,65,3
74,70,65,M,69,2
74,70,65,M,69,2
75,70,64.7,M,72,7
75,70,64.7,M,70,7
75,70,64.7,M,68.7,7
75,70,64.7,F,66.5,7
75,70,64.7,F,65.5,7
75,70,64.7,F,64.7,7
75,70,64.7,F,64.5,7
76,70,64,M,70.7,7
76,70,64,M,70,7
76,70,64,M,68,7
76,70,64,M,67,7
76,70,64,M,66,7
76,70,64,M,65,7
76,70,64,F,67,7
77,70,64,M,70,4
77,70,64,M,68,4
77,70,64,M,66.7,4
77,70,64,F,65.5,4
78,70,64.2,M,72,5
78,70,64.2,M,70,5
78,70,64.2,F,62.5,5
78,70,64.2,F,61.2,5
78,70,64.2,F,60.1,5
79,70.5,64,M,74,8
79,70.5,64,M,69.5,8
79,70.5,64,M,69,8
79,70.5,64,M,68,8
79,70.5,64,M,68,8
79,70.5,64,M,68,8
79,70.5,64,F,65.5,8
79,70.5,64,F,65,8
80,70.5,64.5,F,60,1
81,70,64,M,68,4
81,70,64,F,65,4
81,70,64,F,64,4
81,70,64,F,62,4
82,70,64,M,71,9
82,70,64,M,70,9
82,70,64,M,70,9
82,70,64,M,70,9
82,70,64,M,69.5,9
82,70,64,M,68.5,9
82,70,64,F,69,9
82,70,64,F,65,9
82,70,64,F,64,9
83,70,63.7,M,70,8
83,70,63.7,M,67,8
83,70,63.7,M,65.5,8
83,70,63.7,F,63.7,8
83,70,63.7,F,63.2,8
83,70,63.7,F,62.5,8
83,70,63.7,F,62.2,8
83,70,63.7,F,61,8
85,70.5,63,M,72.5,5
85,70.5,63,M,69,5
85,70.5,63,M,67,5
85,70.5,63,F,64.5,5
85,70.5,63,F,64,5
86,70,63.5,M,71,4
86,70,63.5,M,67.5,4
86,70,63.5,F,67.5,4
86,70,63.5,F,63.5,4
87,70,63,M,68,4
87,70,63,M,67,4
87,70,63,F,63.7,4
87,70,63,F,62,4
88,70,63,M,70,4
88,70,63,M,66.5,4
88,70,63,F,62,4
88,70,63,F,61,4
89,70.5,62,M,72,8
89,70.5,62,M,70,8
89,70.5,62,M,69.5,8
89,70.5,62,M,69.5,8
89,70.5,62,M,68,8
89,70.5,62,F,65,8
89,70.5,62,F,64,8
89,70.5,62,F,63,8
90,70.3,62.7,M,70.7,7
90,70.3,62.7,M,69.7,7
90,70.3,62.7,M,69.2,7
90,70.3,62.7,M,65.2,7
90,70.3,62.7,F,64,7
90,70.3,62.7,F,63.5,7
90,70.3,62.7,F,63.2,7
91,70.5,62,M,72,3
91,70.5,62,M,72,3
91,70.5,62,F,60,3
92,70,61,M,71.2,2
92,70,61,M,67,2
93,70,60,M,67,4
93,70,60,M,64.5,4
93,70,60,F,65,4
93,70,60,F,63,4
94,70,60,F,65,2
94,70,60,F,65,2
95,70,58.5,M,71.5,3
95,70,58.5,M,64.5,3
95,70,58.5,F,63,3
96,70,58,M,72,5
96,70,58,M,66,5
96,70,58,F,66,5
96,70,58,F,65,5
96,70,58,F,63,5
97,69,68.5,M,75,10
97,69,68.5,M,71,10
97,69,68.5,M,70,10
97,69,68.5,F,66,10
97,69,68.5,F,66,10
97,69,68.5,F,65.5,10
97,69,68.5,F,65,10
97,69,68.5,F,65,10
97,69,68.5,F,64,10
97,69,68.5,F,64,10
98,69,67,F,64,1
99,69,66,M,73,8
99,69,66,M,72,8
99,69,66,M,71.7,8
99,69,66,M,71.5,8
99,69,66,F,65.5,8
99,69,66,F,65,8
99,69,66,F,62.7,8
99,69,66,F,62.5,8
100,69,66,M,71.2,3
100,69,66,M,71,3
100,69,66,M,70,3
101,69,66.7,M,75,6
101,69,66.7,M,74,6
101,69,66.7,M,72,6
101,69,66.7,M,68.5,6
101,69,66.7,M,67,6
101,69,66.7,M,66,6
102,69,66,M,70,6
102,69,66,M,68.5,6
102,69,66,M,68,6
102,69,66,F,65,6
102,69,66,F,63,6
102,69,66,F,62.5,6
103,69,66.5,M,73,5
103,69,66.5,M,71,5
103,69,66.5,M,70.5,5
103,69,66.5,M,70.5,5
103,69,66.5,F,61,5
104,69.5,66.5,M,70.5,4
104,69.5,66.5,M,67.5,4
104,69.5,66.5,F,64.5,4
104,69.5,66.5,F,64,4
105,69,66.5,M,71,6
105,69,66.5,F,68.5,6
105,69,66.5,F,67.5,6
105,69,66.5,F,66,6
105,69,66.5,F,63,6
105,69,66.5,F,63,6
106,69.5,66,M,71,7
106,69.5,66,M,71,7
106,69.5,66,M,70.5,7
106,69.5,66,M,70.5,7
106,69.5,66,F,66.5,7
106,69.5,66,F,65.5,7
106,69.5,66,F,64.5,7
107,69,66,M,73,9
107,69,66,M,72,9
107,69,66,M,69,9
107,69,66,M,69,9
107,69,66,F,66.5,9
107,69,66,F,65.5,9
107,69,66,F,65.5,9
107,69,66,F,65,9
107,69,66,F,64,9
108,69,65,M,70,7
108,69,65,M,68.5,7
108,69,65,M,67,7
108,69,65,F,65,7
108,69,65,F,64,7
108,69,65,F,63.5,7
108,69,65,F,61,7
109,69.5,64.5,M,69.7,7
109,69.5,64.5,M,68,7
109,69.5,64.5,M,60,7
109,69.5,64.5,F,65.2,7
109,69.5,64.5,F,64.5,7
109,69.5,64.5,F,63.7,7
109,69.5,64.5,F,60,7
110,69.2,64,M,71.7,4
110,69.2,64,M,66.5,4
110,69.2,64,F,65,4
110,69.2,64,F,63.5,4
112,69,63,M,69,3
112,69,63,F,67.5,3
112,69,63,F,63.5,3
113,69,63,M,72,1
114,69,63,M,73,6
114,69,63,M,70,6
114,69,63,M,70,6
114,69,63,M,64,6
114,69,63,F,66,6
114,69,63,F,62,6
115,69,63.5,M,70.5,7
115,69,63.5,M,67,7
115,69,63.5,M,66,7
115,69,63.5,F,65,7
115,69,63.5,F,63,7
115,69,63.5,F,62,7
115,69,63.5,F,61,7
116,69,63.5,M,70.5,3
116,69,63.5,F,63.7,3
116,69,63.5,F,63,3
117,69.7,62,F,62.5,1
118,69.5,62,M,73,3
118,69.5,62,M,72,3
118,69.5,62,M,69,3
119,69,62,M,73,5
119,69,62,M,71,5
119,69,62,M,71,5
119,69,62,M,69,5
119,69,62,F,63,5
121,69,62.5,M,71,8
121,69,62.5,M,70,8
121,69,62.5,M,70,8
121,69,62.5,M,69,8
121,69,62.5,F,63.5,8
121,69,62.5,F,62.5,8
121,69,62.5,F,62.5,8
121,69,62.5,F,62,8
122,69,62,M,72,4
122,69,62,M,68,4
122,69,62,F,66,4
122,69,62,F,66,4
123,69.5,61,M,70,5
123,69.5,61,M,69.5,5
123,69.5,61,M,69,5
123,69.5,61,F,63,5
123,69.5,61,F,62,5
124,69,61,M,68,9
124,69,61,M,68,9
124,69,61,M,67.5,9
124,69,61,M,64,9
124,69,61,M,63,9
124,69,61,M,63,9
124,69,61,F,63.5,9
124,69,61,F,62,9
124,69,61,F,62,9
125,69,60,M,70.5,3
125,69,60,F,68,3
125,69,60,F,62.5,3
126,69,60,M,69,4
126,69,60,M,66,4
126,69,60,F,61.7,4
126,69,60,F,60.5,4
127,69,60.5,M,69.5,1
128,68.7,70.5,M,71,2
128,68.7,70.5,F,61.7,2
129,68.5,67,M,73,3
129,68.5,67,M,71,3
129,68.5,67,F,67,3
130,68.5,66.5,M,70,11
130,68.5,66.5,M,69,11
130,68.5,66.5,M,69,11
130,68.5,66.5,M,68.7,11
130,68.5,66.5,M,68.5,11
130,68.5,66.5,M,68.5,11
130,68.5,66.5,M,68,11
130,68.5,66.5,M,68,11
130,68.5,66.5,M,68,11
130,68.5,66.5,F,63.2,11
131,68,65,M,67.5,2
131,68,65,M,66,2
132,68,65.5,M,66,2
132,68,65.5,F,64,2
133,68,65.5,M,71.7,7
133,68,65.5,M,71.5,7
133,68,65.5,M,70.7,7
133,68,65.5,M,65.5,7
133,68,65.5,F,66.5,7
133,68,65.5,F,65.2,7
133,68,65.5,F,61.5,7
134,68,65,M,72,4
134,68,65,M,72,4
134,68,65,F,68,4
134,68,65,F,66,4
135,68.5,65,M,69.2,8
135,68.5,65,M,68,8
135,68.5,65,M,66,8
135,68.5,65,M,66,8
135,68.5,65,F,62,8
135,68.5,65,F,61.5,8
135,68.5,65,F,61,8
135,68.5,65,F,60,8
136,68,64,M,71,10
136,68,64,M,68,10
136,68,64,M,68,10
136,68,64,M,67,10
136,68,64,F,65,10
136,68,64,F,64,10
136,68,64,F,63,10
136,68,64,F,63,10
136,68,64,F,62,10
136,68,64,F,61,10
137,68,64,M,66,4
137,68,64,M,63,4
137,68,64,F,65.5,4
137,68,64,F,62,4
138,68,64,M,71.2,5
138,68,64,M,71.2,5
138,68,64,M,69,5
138,68,64,M,68.5,5
138,68,64,F,62.5,5
139,68,64.5,F,62,1
140,68,64,M,69,10
140,68,64,M,67,10
140,68,64,M,66,10
140,68,64,F,66,10
140,68,64,F,66,10
140,68,64,F,65,10
140,68,64,F,65,10
140,68,64,F,65,10
140,68,64,F,64,10
140,68,64,F,63,10
141,68,63,M,70.5,8
141,68,63,M,70,8
141,68,63,M,68,8
141,68,63,M,66,8
141,68,63,M,66,8
141,68,63,F,66,8
141,68,63,F,62,8
141,68,63,F,61.5,8
142,68.5,63.5,M,73.5,4
142,68.5,63.5,M,70,4
142,68.5,63.5,M,69.5,4
142,68.5,63.5,F,65.5,4
143,68,63,M,67,1
144,68,63,M,70,4
144,68,63,M,68,4
144,68,63,F,64.5,4
144,68,63,F,64,4
145,68,63,M,71,8
145,68,63,M,68,8
145,68,63,M,66,8
145,68,63,M,65.5,8
145,68,63,M,65,8
145,68,63,F,63,8
145,68,63,F,62,8
145,68,63,F,62,8
146,68,63,M,67,6
146,68,63,M,67,6
146,68,63,M,66,6
146,68,63,F,64,6
146,68,63,F,63.5,6
146,68,63,F,61,6
147,68.5,63.5,M,68.2,1
148,68,63,M,70,1
149,68.2,63.5,M,70,5
149,68.2,63.5,M,69,5
149,68.2,63.5,M,67,5
149,68.2,63.5,M,65.5,5
149,68.2,63.5,F,64.5,5
150,68,62.5,M,68.5,1
151,68.7,62,M,67.7,2
151,68.7,62,F,61.7,2
152,68,62.5,M,66.5,1
153,68,61,M,68.5,5
153,68,61,M,68,5
153,68,61,M,64,5
153,68,61,F,63.5,5
153,68,61,F,63,5
154,68,60.2,M,66.7,1
155,68,60,M,64,7
155,68,60,F,61,7
155,68,60,F,61,7
155,68,60,F,60,7
155,68,60,F,60,7
155,68,60,F,60,7
155,68,60,F,56,7
156,68,60,M,67.5,4
156,68,60,M,67,4
156,68,60,M,66.5,4
156,68,60,F,60,4
157,68.5,59,M,69,1
158,68,59,M,68,10
158,68,59,M,65,10
158,68,59,M,64.7,10
158,68,59,M,64,10
158,68,59,M,64,10
158,68,59,M,63,10
158,68,59,F,65,10
158,68,59,F,65,10
158,68,59,F,62,10
158,68,59,F,61,10
159,67,66.2,M,72.7,5
159,67,66.2,M,72.7,5
159,67,66.2,M,71.5,5
159,67,66.2,F,65.5,5
159,67,66.2,F,63.5,5
160,67,66.5,M,71,1
162,67,65,M,69.7,6
162,67,65,M,67.5,6
162,67,65,F,65.5,6
162,67,65,F,65,6
162,67,65,F,64.5,6
162,67,65,F,63.5,6
163,67,65.5,M,70,5
163,67,65.5,M,69,5
163,67,65.5,F,65.5,5
163,67,65.5,F,65.5,5
163,67,65.5,F,63,5
164,67,65.5,M,70,4
164,67,65.5,M,67.7,4
164,67,65.5,F,63,4
164,67,65.5,F,60,4
165,67,65,M,65,3
165,67,65,F,62,3
165,67,65,F,62,3
166,67.5,65,M,71,11
166,67.5,65,M,69,11
166,67.5,65,F,64,11
166,67.5,65,F,64,11
166,67.5,65,F,63,11
166,67.5,65,F,63,11
166,67.5,65,F,63,11
166,67.5,65,F,63,11
166,67.5,65,F,63,11
166,67.5,65,F,62.5,11
166,67.5,65,F,62,11
167,67,64,M,71.5,4
167,67,64,M,70,4
167,67,64,M,67,4
167,67,64,M,67,4
168,67,63.5,M,71,8
168,67,63.5,M,70.2,8
168,67,63.5,M,69.2,8
168,67,63.5,M,68.5,8
168,67,63.5,M,68,8
168,67,63.5,M,67,8
168,67,63.5,M,65.5,8
168,67,63.5,F,63.5,8
169,67,63,M,69,3
169,67,63,M,68,3
169,67,63,F,63,3
170,67.5,62,M,70,5
170,67.5,62,M,69.5,5
170,67.5,62,M,69,5
170,67.5,62,M,68.5,5
170,67.5,62,F,66,5
171,67,61,M,67,1
172,66,67,M,70.5,8
172,66,67,M,70.5,8
172,66,67,M,67,8
172,66,67,M,66,8
172,66,67,M,66,8
172,66,67,F,62,8
172,66,67,F,62,8
172,66,67,F,61.5,8
173,66,67,M,72,9
173,66,67,M,65,9
173,66,67,M,65,9
173,66,67,F,67,9
173,66,67,F,64,9
173,66,67,F,64,9
173,66,67,F,62,9
173,66,67,F,60,9
173,66,67,F,60,9
174,66,66,M,66,5
174,66,66,M,65,5
174,66,66,F,67,5
174,66,66,F,66.5,5
174,66,66,F,65.5,5
175,66,66,M,72,6
175,66,66,M,68,6
175,66,66,F,66,6
175,66,66,F,65,6
175,66,66,F,62,6
175,66,66,F,61,6
176,66.5,65,M,68.7,8
176,66.5,65,M,68.5,8
176,66.5,65,M,66.5,8
176,66.5,65,M,64.5,8
176,66.5,65,F,62.5,8
176,66.5,65,F,60.5,8
176,66.5,65,F,60.5,8
176,66.5,65,F,57.5,8
177,66,65.5,M,72,5
177,66,65.5,M,71,5
177,66,65.5,M,67,5
177,66,65.5,F,66,5
177,66,65.5,F,65,5
178,66,63,M,70,1
179,66,63.5,F,64.5,2
179,66,63.5,F,62,2
180,66.5,63,M,67.2,6
180,66.5,63,M,67,6
180,66.5,63,M,65,6
180,66.5,63,F,65,6
180,66.5,63,F,65,6
180,66.5,63,F,63,6
181,66.5,62.5,M,70,7
181,66.5,62.5,M,68,7
181,66.5,62.5,F,63.5,7
181,66.5,62.5,F,62.5,7
181,66.5,62.5,F,62.5,7
181,66.5,62.5,F,62.5,7
181,66.5,62.5,F,62.5,7
182,66,61.5,M,70,1
183,66,60,M,68,4
183,66,60,M,67,4
183,66,60,M,65,4
183,66,60,F,60,4
184,66,60,M,65,1
185,66,59,M,68,15
185,66,59,M,67,15
185,66,59,M,66.5,15
185,66,59,M,66,15
185,66,59,M,65.7,15
185,66,59,M,65.5,15
185,66,59,M,65,15
185,66,59,F,65,15
185,66,59,F,64,15
185,66,59,F,63,15
185,66,59,F,62,15
185,66,59,F,61,15
185,66,59,F,60,15
185,66,59,F,58,15
185,66,59,F,57,15
186,65,67,M,66.5,4
186,65,67,M,66,4
186,65,67,M,66,4
186,65,67,F,65,4
187,65,67,F,63,1
188,65,66,M,63,4
188,65,66,F,63,4
188,65,66,F,63,4
188,65,66,F,60,4
190,65,65,M,69,9
190,65,65,M,68,9
190,65,65,M,68,9
190,65,65,F,65,9
190,65,65,F,65,9
190,65,65,F,62,9
190,65,65,F,62,9
190,65,65,F,61,9
190,65,65,F,59,9
191,65,65.5,M,70.7,2
191,65,65.5,F,65.5,2
192,65,65,M,69.2,6
192,65,65,M,69,6
192,65,65,M,68,6
192,65,65,M,67.7,6
192,65,65,F,64.5,6
192,65,65,F,60.5,6
193,65,64,M,67,6
193,65,64,M,67,6
193,65,64,F,64,6
193,65,64,F,64,6
193,65,64,F,62.5,6
193,65,64,F,60.5,6
194,65,63,M,70,2
194,65,63,F,63,2
195,65,63,M,66,3
195,65,63,M,66,3
195,65,63,F,63,3
196,65.5,63,M,71,4
196,65.5,63,M,71,4
196,65.5,63,M,69,4
196,65.5,63,F,63.5,4
197,65.5,60,M,68,5
197,65.5,60,M,68,5
197,65.5,60,M,67,5
197,65.5,60,M,67,5
197,65.5,60,F,62,5
198,64,64,M,71.5,7
198,64,64,M,68,7
198,64,64,F,65.5,7
198,64,64,F,64,7
198,64,64,F,62,7
198,64,64,F,62,7
198,64,64,F,61,7
199,64,64,M,70.5,7
199,64,64,M,68,7
199,64,64,F,67,7
199,64,64,F,65,7
199,64,64,F,64,7
199,64,64,F,64,7
199,64,64,F,60,7
200,64,63,M,64.5,1
201,64,60,M,66,2
201,64,60,F,60,2
203,62,66,M,64,3
203,62,66,F,62,3
203,62,66,F,61,3
204,62.5,63,M,66.5,2
204,62.5,63,F,57,2
136A,68.5,65,M,72,8
136A,68.5,65,M,70.5,8
136A,68.5,65,M,68.7,8
136A,68.5,65,M,68.5,8
136A,68.5,65,M,67.7,8
136A,68.5,65,F,64,8
136A,68.5,65,F,63.5,8
136A,68.5,65,F,63,8
--------------------------------------------------------------------------------
/tutorial/tutorial.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## This is a Heading"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "Explanatory text"
15 | ]
16 | },
17 | {
18 | "cell_type": "code",
19 | "execution_count": 9,
20 | "metadata": {
21 | "collapsed": true
22 | },
23 | "outputs": [],
24 | "source": [
25 | "import matplotlib.pyplot as plt"
26 | ]
27 | },
28 | {
29 | "cell_type": "code",
30 | "execution_count": 10,
31 | "metadata": {
32 | "collapsed": false
33 | },
34 | "outputs": [
35 | {
36 | "data": {
37 | "text/plain": [
38 | "3"
39 | ]
40 | },
41 | "execution_count": 10,
42 | "metadata": {},
43 | "output_type": "execute_result"
44 | }
45 | ],
46 | "source": [
47 | "def add(a, b):\n",
48 | " return a + b\n",
49 | "add(1, 2)"
50 | ]
51 | },
52 | {
53 | "cell_type": "code",
54 | "execution_count": 11,
55 | "metadata": {
56 | "collapsed": false
57 | },
58 | "outputs": [
59 | {
60 | "data": {
61 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAD8CAYAAACMwORRAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xl4VPW9x/H3Dwj7TlgCIYR9DSKERbCKSxWQK6K2tVaq\nWEvxaterLOKCO2prtXWh9FovXK3aEjZZRKkiuIACShbWsBMCASIJkIRs3/tHpveJMZAJTHJmJp/X\n8+TJzJmTnI+H5OPJycn3ODNDRETCSy2vA4iISOCp3EVEwpDKXUQkDKncRUTCkMpdRCQMqdxFRMKQ\nyl1EJAyp3EVEwpDKXUQkDNXxasORkZEWGxvr1eZFRELSxo0bj5lZ64rW86zcY2Nj2bBhg1ebFxEJ\nSc65ff6sp9MyIiJhSOUuIhKGVO4iImFI5S4iEoZU7iIiYcivcnfO7XXOJTnnvnbOfecSF1fiT865\nVOdconNuYOCjioiIvypzKeQVZnbsLK+NBrr73oYCr/rei4iIBwJ1WmYcMM9KrAOaO+eiAvS5RUTC\nQkFRMa+sTmXzgRNVvi1/y92AVc65jc65SeW83gE4UOr5Qd+yb3HOTXLObXDObTh69Gjl04qIhKjk\ntCxuePlTnn1vOyuSD1f59vw9LXOpmaU559oAHzjntpnZmspuzMzmAHMA4uPjdWduEQl7eQVF/PnD\nncz+eDctGtbl1Z8MZHRc1Z/Y8KvczSzN9z7DObcQGAKULvc0oGOp59G+ZSIiNdaGvZlMSUhk99HT\n/GBQNA9e14dmDSOqZdsVlrtzrhFQy8xO+h5fAzxWZrUlwL3Oubcp+UVqlpmlBzytiEgIOHWmkOfe\n28a8dfto36wB8+4cwmU9Kpz1FVD+HLm3BRY65/69/t/N7D3n3GQAM5sNLAfGAKlADjCxauKKiAS3\nj3cc5YEFSRzKyuX2S2K5/9qeNKpX/TMaK9yime0GLipn+exSjw24J7DRRERCx4mcfB5fupWETQfp\n2roR//zFJcTHtvQsj2cjf0VEwsWKpHQeWpzCiZx87r2iG/de2Y36EbU9zaRyFxE5TxnZeTy8OIX3\nUg7Tr0NT5t45mL7tm3kdC1C5i4hUmpnxz40HeWLpFvIKi5k6qhc//15n6tQOnnFdKncRkUo4kJnD\nAwuTWLvzGENiWzLrpji6tG7sdazvULmLiPihqNiY9/lenlu5HQc8fkM/fjIkhlq1nNfRyqVyFxGp\nQGrGSabMT2TT/hOM7NmaJ8fH0aF5A69jnZPKXUTkLAqKivnLx7v4079SaVivNn/80UXcMKADvr/7\nCWoqdxGRciQdzOL++ZvZdvgk1/WP4tHr+xLZuJ7XsfymchcRKSWvoIgXVu3kr2t306pRXf4yYRDX\n9m3ndaxKU7mLiPis332caQuS2HPsNLcM7sj0Mb1p1qB6Bn0FmspdRGq8k3kFPPPeNt5Yt5+OLRvw\n5l1DGdEt0utYF0TlLiI12kfbMpixMIn07Dx+dmln/uuaHjSsG/rVGPr/BSIi5yHzdD6PL93Cwq/S\n6N6mMQl3D2dgTAuvYwWMyl1EahQzY1lSOo8sTiErt4BfXdWde67oSr063g76CjSVu4jUGEey83hw\nUTIfbDlC/+hmvHHXUHpHNfU6VpXwu9ydc7WBDUCamY0t89pIYDGwx7dogZmVvVuTiIgnzIx3vjzA\nk8u3kl9YzIwxvZk4IjaoBn0FWmWO3H8NbAXO9r+5tWVLX0TEa/uP5zBtQSKf7TrO0M4teeam/sRG\nNvI6VpXzq9ydc9HAdcCTwO+qNJGISAAUFRuvf7qH37+/nTq1avHU+DhuGdwxaAd9BZq/R+4vAFOA\nJudYZ7hzLhFIA+4zs5QLDScicj52HCkZ9PX1gRNc2asNT47vR1Sz4B70FWgVlrtzbiyQYWYbfefW\ny7MJiDGzU865McAioHs5n2sSMAkgJibmvEOLiJQnv7CYV1fv4qWPdtKkfgQv3jKA6y9qHxKDvgLN\nldzb+hwrOPc0MAEoBOpTcs59gZnddo6P2QvEm9mxs60THx9vGzZsOJ/MIiLfsfnACabMT2T7kZOM\nG9Ceh8f2oVUIDfryl3Nuo5nFV7RehUfuZjYdmO77pCMpOeXyrWJ3zrUDjpiZOeeGALWA4+cTXESk\nMnLzi3j+g+289ske2jSpz3//NJ6r+7T1Opbnzvs6d+fcZAAzmw3cDNztnCsEcoFbrKIfCURELtDn\nu44zbUEi+47ncOvQGKaN7kXT+qE56CvQKjwtU1V0WkZEzld2XgFPL9/GW1/sp1Orhjx9YxzDu4b2\noC9/Bey0jIhIMPnX1iPMWJhMxsk8Jl3Whd9e3YMGdcNrdEAgqNxFJCQcP3WGR9/dwpLNh+jVrgl/\nmTCIizo29zpW0FK5i0hQMzOWbD7EzCUpnDpTyG+v7sHdI7tSt074jg4IBJW7iASt9KxcHlyYzL+2\nZTCgY3Oevbk/Pdqe628p5d9U7iISdIqLjbe+3M/Ty7dRWFzMg9f1ZuKIztSuIaMDAkHlLiJBZe+x\n00xbkMi63ZkM79qKWTf2J6ZVQ69jhRyVu4gEhcKiYv726R7+8P4O6tapxTM3xfHD+I41cnRAIKjc\nRcRzW9OzmZqQSOLBLL7fpy1P3NCPtk3rex0rpKncRcQzZwqLePmjXbzyUSrNGkTw0q0Xc11clI7W\nA0DlLiKe2LT/G6bOT2RnxinGX9yBh8f2oUWjul7HChsqdxGpVjn5hfzh/R387dM9RDWtz+t3DOaK\nXm28jhV2VO4iUm0+TT3GtAWJHMjMZcKwTkwZ1ZMmGvRVJVTuIlLlsnILeGrZVt7ZcIDOkY14Z9Iw\nhnZp5XWssKZyF5Eq9X7KYR5clMzx0/lMvrwrv7m6O/UjNOirqqncRaRKHD15hpnvprAsMZ3eUU15\n7fbBxEU38zpWjaFyF5GAMjMWfpXGY0u3kHOmiPuv7cmky7oQUVuDvqqT3+XunKsNbADSzGxsmdcc\n8CIwBsgB7jCzTYEMKiLBL+1ELjMWJrF6+1EGxpQM+urWRoO+vFCZI/dfA1spuUF2WaOB7r63ocCr\nvvciUgMUFxtvrt/HrBXbMGDmf/RhwiWxGvTlIb/K3TkXDVwHPAn8rpxVxgHzfPdNXeeca+6cizKz\n9MBFFZFgtPvoKaYlJPHF3ky+1z2Sp8bH0bGlBn15zd8j9xeAKcDZfr7qABwo9fygb5nKXSRMFRYV\n89e1e/jjqh3Ur1OL527uz82DojU6IEhUWO7OubFAhpltdM6NvJCNOecmAZMAYmJiLuRTiYiHUg5l\nMTUhkeS0bEb1bcdj4/rSRoO+goo/R+4jgOudc2OA+kBT59wbZnZbqXXSgI6lnkf7ln2Lmc0B5gDE\nx8fbeacWEU/kFRTx5w93Mvvj3bRoWJdXfzKQ0XFRXseSclRY7mY2HZgO4Dtyv69MsQMsAe51zr1N\nyS9Ss3S+XSS8bNyXyZT5iew6ep
qbBkbz0NjeNG+oQV/B6ryvc3fOTQYws9nAckoug0yl5FLIiQFJ\nJyKeO32mkOdWbmfu53tp36wBc+8cwuU9WnsdSypQqXI3s9XAat/j2aWWG3BPIIOJiPfW7DjK9AVJ\nHMrK5afDOnH/qF40rqe/fQwF+lcSke/Iying8WVbmL/xIF1aN+Ifv7iEwbEtvY4llaByF5FveS85\nnYcWp5B5Op97rujKL6/UoK9QpHIXEQAyTubxyOIUViQfpm/7pvzPxMH0ba9BX6FK5S5Sw5kZ8zce\n5IllW8ktKGLKqJ78/Hsa9BXqVO4iNdiBzBweWJjE2p3HGBzbglk39adr68Zex5IAULmL1EDFxca8\nz/fy7MrtOOCxcX25bWgnamnQV9hQuYvUMKkZp5iWkMiGfd9wWY/WPDW+H9EtNOgr3KjcRWqIgqJi\n5qzZzYurdtKgbm3+8IOLuHFgBw36ClMqd5EaIDktiynzE9mSns2YuHY8en0/Wjep53UsqUIqd5Ew\nlldQxIv/2smcNbtp2agus28bxKh+7byOJdVA5S4Spr7cm8nU+YnsPnaaH8ZHM2NMH5o1jPA6llQT\nlbtImDl1ppBn39vGvM/3Ed2iAW/8bCiXdo/0OpZUM5W7SBhZvT2DGQuTOZSVy8QRsdx3TU8aadBX\njaR/dZEw8M3pfB5ftoUFm9Lo1qYx8ycPZ1CnFl7HEg+p3EVCmJmxPOkwjyxJ5kROAb+8shv3XtmN\nenU06KumU7mLhKiM7DweXJTM+1uOENehGfPuHEqf9k29jiVBwp8bZNcH1gD1fOvPN7NHyqwzElgM\n7PEtWmBmjwU2qohAydH6Pzcc5PFlW8gvLGb66F787NLO1NGgLynFnyP3M8CVZnbKORcBfOKcW2Fm\n68qst9bMxgY+ooj824HMHKYvSOKT1GMM6dySWTfG0UWDvqQc/twg24BTvqcRvjerylAi8m1Fxcbc\nz/by3Mrt1K7leOKGftw6JEaDvuSs/Drn7pyrDWwEugEvm9n6clYb7pxLBNKA+8wspZzPMwmYBBAT\nE3PeoUVqkp1HTjIlIZGv9p9gZM/WPDU+jvbNG3gdS4KcX+VuZkXAAOdcc2Chc66fmSWXWmUTEOM7\ndTMGWAR0L+fzzAHmAMTHx+voX+Qc8guLmf3xLl76MJVG9Wrzwo8GMG5Aew36Er9U6moZMzvhnPsI\nGAUkl1qeXerxcufcK865SDM7FrioIjVH4sETTJmfyLbDJxnbP4qZ1/clsrEGfYn//LlapjVQ4Cv2\nBsD3gWfKrNMOOGJm5pwbAtQCjldFYJFwlldQxB8/2MFf1+4msnE95kwYxDV9NehLKs+fI/coYK7v\nvHst4B9mttQ5NxnAzGYDNwN3O+cKgVzgFt8vYkXET+t2H2daQiJ7j+fw4yEdmTa6N80aaNCXnB9/\nrpZJBC4uZ/nsUo9fAl4KbDSRmuFkXgGzVmzjzfX7iWnZkL/fNZTh3TToSy6M/kJVxEMfbjvCjIXJ\nHMnO465LO/O7a3rQsK6+LeXC6atIxAOZp/N57N0UFn19iO5tGvPK3cO5OEaDviRwVO4i1cjMeDcx\nnZlLUsjOLeDXV3XnP6/oqkFfEnAqd5FqcjirZNDXqq1H6B/djGd/PpRe7TToS6qGyl2kipkZb395\ngKeWbSW/qJgZY3ozcUSsBn1JlVK5i1ShfcdPMy0hic93H2dYl5bMurE/sZGNvI4lNYDKXaQKFBUb\nr3+6h9+/v52IWrV4anwctwzuqEFfUm1U7iIBtv1wyaCvzQdOcFWvNjwxvh9RzTToS6qXyl0kQPIL\ni3lldSovf5RKk/oRvHjLAK6/SIO+xBsqd5EA+PrACabOT2T7kZOMG9Ceh8f2oZUGfYmHVO4iFyA3\nv4jnP9jOa5/soU2T+vz3T+O5uk9br2OJqNxFztdnu44xLSGJ/Zk53Do0hmmje9G0vgZ9SXBQuYtU\nUnZeAU8v38ZbX+ynU6uG/P3nQxneVYO+JLio3EUqYdWWI8xYlMTRk2eYdFkXfnt1DxrU1egACT4q\ndxE/HD91hpnvbuHdzYfo1a4JcybEc1HH5l7HEjkrf+7EVB9YA9TzrT/fzB4ps44DXgTGADnAHWa2\nKfBxRaqXmbFk8yFmLknh1JlCfnt1D+4e2ZW6dTQ6QIKbP0fuZ4ArfTe/jgA+cc6tMLN1pdYZTckN\nsbsDQ4FXfe9FQtahE7k8uCiZD7dlMKBjc569uT892jbxOpaIX/y5E5MBp3xPI3xvZW+hNw6Y51t3\nnXOuuXMuyszSA5pWpBoUFxtvfbmfp5dvo7C4mAev683EEZ2prdEBEkL8Oufuu3/qRqAb8LKZrS+z\nSgfgQKnnB33LVO4SUvYcO820hETW78lkeNdWzLqxPzGtGnodS6TS/Cp3MysCBjjnmgMLnXP9zCy5\nshtzzk0CJgHExMRU9sNFqkxhUTGvfbKH5z/YQd3atZh1Yxw/GtxRowMkZFXqahkzO+Gc+wgYBZQu\n9zSgY6nn0b5lZT9+DjAHID4+vuypHRFPbE3PZmpCIokHs7i6d1ueuKEf7ZrV9zqWyAXx52qZ1kCB\nr9gbAN8Hnimz2hLgXufc25T8IjVL59sl2J0pLOLlD1N5ZfUumjWI4KVbL+a6uCgdrUtY8OfIPQqY\n6zvvXgv4h5ktdc5NBjCz2cBySi6DTKXkUsiJVZRXJCA27f+GqfMT2ZlxivEXd+DhsX1o0aiu17FE\nAsafq2USgYvLWT671GMD7glsNJHAy8kv5Pcrd/D6Z3to17Q+r98xmCt6tfE6lkjA6S9Upcb4NPUY\n0xYkciAzl9uGxTB1VC+aaNCXhCmVu4S9rNwCnlq2lXc2HKBzZCPemTSMoV1aeR1LpEqp3CWsrUw5\nzEOLkjl+Op/Jl3flN1d3p36EBn1J+FO5S1g6evIMM5eksCwpnd5RTXnt9sHERTfzOpZItVG5S1gx\nMxZ+lcZjS7eQc6aI+67pwS8u70pEbQ36kppF5S5hI+1ELjMWJrF6+1EGxpQM+urWRoO+pGZSuUvI\nKy423ly/j1krtlFs8Mh/9OGnl8Rq0JfUaCp3CWm7j55iWkISX+zN5NJukTx9YxwdW2rQl4jKXUJS\nYVExf127hz+u2kH9OrV49ub+/GBQtEYHiPio3CXkpBzKYmpCIslp2Vzbty2Pj+tHm6Ya9CVSmspd\nQkZeQRF//nAnsz/eTYuGdXn1JwMZHRfldSyRoKRyl5CwcV8mU+YnsuvoaW4aGM1DY3vTvKEGfYmc\njcpdgtrpM4U8t3I7cz/fS/tmDZh75xAu79Ha61giQU/lLkFrzY6jTF+QRNqJXG6/pBP3j+pF43r6\nkhXxh75TJOhk5RTw+LItzN94kC6tG/HPyZcwOLal17FEQorKXYLKe8npPLQ4hczT+fznyK786ioN\n+hI5H/7cZq8jMA9oCxgwx8xeLLPOSGAxsMe3aIGZPRbYqBLOMk7m8cjiFFYkH6ZPVFNev2M
w/Tpo\n0JfI+fLnyL0Q+C8z2+ScawJsdM59YGZbyqy31szGBj6ihDMzY/7GgzyxbCu5BUXcf21PJl3WRYO+\nRC6QP7fZSwfSfY9POue2Ah2AsuUuUikHMnN4YGESa3ceI75TC2bd1J9ubRp7HUskLFTqnLtzLpaS\n+6muL+fl4c65RCANuM/MUi44nYSl4mJj3ud7eXbldgAevb4vE4Z1opYGfYkEjN/l7pxrDCQAvzGz\n7DIvbwJizOyUc24MsAjoXs7nmARMAoiJiTnv0BK6UjNOMS0hkQ37vuGyHq15anw/olto0JdIoDkz\nq3gl5yKApcBKM3vej/X3AvFmduxs68THx9uGDRsqEVVCWUFRMXPW7ObFVTtpULc2D4/tw40DO2jQ\nl0glOec2mll8Rev5c7WMA14Dtp6t2J1z7YAjZmbOuSFALeB4JTNLmEpOy2LK/ES2pGczJq4dj17f\nj9ZN6nkdSySs+XNaZgQwAUhyzn3tW/YAEANgZrOBm4G7nXOFQC5wi/nzI4GEtbyCIl78107mrNlN\ny0Z1mX3bQEb106Avkergz9UynwDn/NnZzF4CXgpUKAl9X+7NZOr8RHYfO80PBkXz4HV9aNYwwutY\nIjWG/kJVAurUmUKefW8b8z7fR3SLBvzvz4bwve4a9CVS3VTuEjCrt2cwY2Eyh7JymTgilvuu6Ukj\nDfoS8YS+8+SCfXM6n8eXbWHBpjS6tWnM/MnDGdSphdexRGo0lbucNzNjRfJhHl6czImcAn55ZTfu\nvbIb9epo0JeI11Tucl4ysvN4aHEyK1OOENehGfPuHEqf9k29jiUiPip3qRQz458bDvLEsi2cKSxm\n2uhe3HVpZ+po0JdIUFG5i98OZOYwfUESn6QeY0jnlsy6MY4urTXoSyQYqdylQkXFxtzP9vLcyu3U\nruV44oZ+3DokRoO+RIKYyl3OaeeRk0xNSGTT/hOM7Nmap8bH0b55A69jiUgFVO5SroKiYmav3sWf\nP0ylUb3avPCjAYwb0F6DvkRChMpdviPpYBb3z9/MtsMnGds/ipnX9yWysQZ9iYQSlbv8v7yCIv64\nagd/XbObyMb1mDNhENf0bed1LBE5Dyp3AWD97uNMW5DEnmOn+fGQjkwb3ZtmDTToSyRUqdxruJN5\nBTzz3jbeWLefmJYN+ftdQxneLdLrWCJygVTuNdhH2zJ4YGESR7LzuOvSzvzumh40rKsvCZFwoO/k\nGijzdD6PvZvCoq8P0b1NY165ezgXx2jQl0g48ec2ex2BeUBbwIA5ZvZimXUc8CIwBsgB7jCzTYGP\nKxfCzFiamM7MJSlk5Rbw66u6859XdNWgL5Ew5M+ReyHwX2a2yTnXBNjonPvAzLaUWmc00N33NhR4\n1fdegsSR7DxmLExm1dYj9I9uxps/H0qvdhr0JRKu/LnNXjqQ7nt80jm3FegAlC73ccA8331T1znn\nmjvnonwfKx4yM9758gBPLt9KQVExM8b0ZuKIWA36EglzlTrn7pyLBS4G1pd5qQNwoNTzg75l3yp3\n59wkYBJATExM5ZJKpe07fprpC5L4bNdxhnVpyawb+xMb2cjrWCJSDfwud+dcYyAB+I2ZZZ/Pxsxs\nDjAHID4+3s7nc0jFioqN1z/dw+/f305ErVo8Ob4fPx6sQV8iNYlf5e6ci6Ck2N80swXlrJIGdCz1\nPNq3TKrZ9sMnmZKQyOYDJ7iqVxueGN+PqGYa9CVS0/hztYwDXgO2mtnzZ1ltCXCvc+5tSn6RmqXz\n7dUrv7CYV1an8vJHqTSpH8GLtwzg+os06EukpvLnyH0EMAFIcs597Vv2ABADYGazgeWUXAaZSsml\nkBMDH1XOZvOBE0yZn8j2IycZN6A9D4/tQysN+hKp0fy5WuYT4JyHf76rZO4JVCjxT25+Ec9/sJ3X\nPtlDmyb1ee32eK7q3dbrWCISBPQXqiHqs13HmJaQxP7MHG4dGsO00b1oWl+DvkSkhMo9xGTnFfD0\n8m289cV+OrVqyFs/H8YlXVt5HUtEgozKPYSs2nKEGYuSOHryDJMu68Jvr+5Bg7oaHSAi36VyDwHH\nT53h0Xe3sGTzIXq1a8KcCfFc1LG517FEJIip3IOYmbFk8yFmLknh1JlCfvf9Hky+vCt162h0gIic\nm8o9SB06kcuDi5L5cFsGAzo259mb+9OjbROvY4lIiFC5B5niYuOtL/fz9PJtFBUbD43twx3DY6mt\n0QEiUgkq9yCy59hppiUksn5PJiO6teLp8f2JadXQ61giEoJU7kGgsKiYv326hz+8v4O6dWrxzE1x\n/DC+o0YHiMh5U7l7bGt6NlMTEkk8mMX3+7TliRv60bZpfa9jiUiIU7l75ExhES9/mMorq3fRrEEE\nL916MdfFReloXUQCQuXugU37v2Hq/ER2Zpzixos78NDYPrRoVNfrWCISRlTu1Sgnv5Dfr9zB65/t\nIappfV6fOJgrerbxOpaIhCGVezX5NPUY0xYkciAzlwnDOjFlVE+aaNCXiFQRlXsVy8ot4KllW3ln\nwwE6RzbinUnDGNpFg75EpGqp3KvQypTDPLQomeOn85l8eVd+c3V36kdo0JeIVD1/brP3N2AskGFm\n/cp5fSSwGNjjW7TAzB4LZMhQc/TkGWYuSWFZUjq9o5ry2u2DiYtu5nUsEalB/Dly/x/gJWDeOdZZ\na2ZjA5IohJkZC79K47GlW8g5U8T91/Zk0mVdiKitQV8iUr38uc3eGudcbNVHCW1pJ3KZsTCJ1duP\nMjCmZNBXtzYa9CUi3gjUOffhzrlEIA24z8xSylvJOTcJmAQQExMToE17q7jYeHP9Pmat2IYBM/+j\nDxMu0aAvEfFWIMp9ExBjZqecc2OARUD38lY0sznAHID4+HgLwLY9tevoKaYlJPLl3m/4XvdInhof\nR8eWGvQlIt674HI3s+xSj5c7515xzkWa2bEL/dzBqrComDlrd/PCqp3Ur1OL527uz82DojU6QESC\nxgWXu3OuHXDEzMw5NwSoBRy/4GRBKuVQFlMTEklOy2ZU33Y8dkNf2jTRoC8RCS7+XAr5FjASiHTO\nHQQeASIAzGw2cDNwt3OuEMgFbjGzkD/lUlZeQRF//nAnsz/eTYuGdXn1JwMZHRfldSwRkXL5c7XM\njyt4/SVKLpUMWxv2ZjI1IZFdR09z08BoHhrbm+YNNehLRIKX/kL1HE6fKeS5lduZ+/le2jdrwNw7\nh3B5j9ZexxIRqZDK/SzW7DjK9AVJHMrK5fZLYrnv2p40rqfdJSKhQW1VxomcfJ5YtpX5Gw/SpXUj\n/vmLS4iPbel1LBGRSlG5l7IiKZ2HFqfwTU4+91zRlV9eqUFfIhKaVO5ARnYeDy9O4b2Uw/Rt35S5\ndw6mb3sN+hKR0FWjy93MmL/xII8v3UJeYTFTRvXk59/ToC8RCX01ttwPZObwwMIk1u48xuDYFsy6\nqT9dWzf2OpaISEDUuHIvLjbmfb6XZ1duxwGPj+
vLT4Z2opYGfYlIGKlR5Z6acZKpCUls3PcNl/do\nzZPj+xHdQoO+RCT81IhyLygq5i8f7+JP/0qlYb3aPP/Dixh/cQcN+hKRsBX25Z6clsX98xPZmp7N\ndXFRzLy+L62b1PM6lohIlQrbcs8rKOKFVTv569rdtGxUl9m3DWJUv3ZexxIRqRZhWe5f7MlkWkIi\nu4+d5kfxHXlgTG+aNYzwOpaISLUJq3I/mVfAs+9t53/X7SO6RQPe+NlQLu0e6XUsEZFqFzbl/tH2\nDGYsSCI9O487R3Tmvmt70LBu2PzniYhUij836/gbMBbIMLN+5bzugBeBMUAOcIeZbQp00LP55nQ+\njy/dwoKv0ujWpjHzJw9nUKcW1bV5EZGg5M+h7f9QcjOOeWd5fTQlN8TuDgwFXvW9r1JmxrKkdB5Z\nnEJWbgG/urIb91zZjXp1NOhLRMSfOzGtcc7FnmOVccA836311jnnmjvnoswsPUAZv+NIdh4PLUrm\n/S1HiOvQjDfuGkrvqKZVtTkRkZATiJPSHYADpZ4f9C2rknL/aFsGv3r7K/ILi5k+uhc/u7QzdTTo\nS0TkW6r1N47OuUnAJICYmJjz+hydIxsxMKYFM6/vS+fIRoGMJyISNgJxyJsGdCz1PNq37DvMbI6Z\nxZtZfOu11I9+AAAEOElEQVTW53cv0tjIRsy9c4iKXUTkHAJR7kuAn7oSw4CsqjzfLiIiFfPnUsi3\ngJFApHPuIPAIEAFgZrOB5ZRcBplKyaWQE6sqrIiI+Mefq2V+XMHrBtwTsEQiInLBdJmJiEgYUrmL\niIQhlbuISBhSuYuIhCGVu4hIGHIlF7t4sGHnjgL7zvPDI4FjAYwTKMGaC4I3m3JVjnJVTjjm6mRm\nFf4VqGflfiGccxvMLN7rHGUFay4I3mzKVTnKVTk1OZdOy4iIhCGVu4hIGArVcp/jdYCzCNZcELzZ\nlKtylKtyamyukDznLiIi5xaqR+4iInIOQV3uzrm/OecynHPJZ3ndOef+5JxLdc4lOucGBkmukc65\nLOfc1763h6shU0fn3EfOuS3OuRTn3K/LWafa95efubzYX/Wdc1845zb7cj1azjpe7C9/clX7/iq1\n7drOua+cc0vLec2T70c/cnm5v/Y655J8291QzutVt8/MLGjfgMuAgUDyWV4fA6wAHDAMWB8kuUYC\nS6t5X0UBA32PmwA7gD5e7y8/c3mxvxzQ2Pc4AlgPDAuC/eVPrmrfX6W2/Tvg7+Vt36vvRz9yebm/\n9gKR53i9yvZZUB+5m9kaIPMcq/z/zbnNbB3Q3DkXFQS5qp2ZpZvZJt/jk8BWSu5lW1q17y8/c1U7\n3z445Xsa4Xsr+wsoL/aXP7k84ZyLBq4D/vssq3jy/ehHrmBWZfssqMvdD2e7OXcwGO77MWuFc65v\ndW7YORcLXEzJUV9pnu6vc+QCD/aX70f5r4EM4AMzC4r95Ucu8Obr6wVgClB8lte9+vqqKBd49/1o\nwCrn3EZXcg/psqpsn4V6uQerTUCMmfUH/gwsqq4NO+caAwnAb8wsu7q2W5EKcnmyv8ysyMwGUHLf\n3yHOuX7Vsd2K+JGr2veXc24skGFmG6t6W5XhZy7Pvh+BS33/lqOBe5xzl1XXhkO93P2+OXd1MrPs\nf/9obWbLgQjnXGRVb9c5F0FJgb5pZgvKWcWT/VVRLq/2V6ntnwA+AkaVecnTr6+z5fJof40ArnfO\n7QXeBq50zr1RZh0v9leFubz8+jKzNN/7DGAhMKTMKlW2z0K93IPy5tzOuXbOOed7PISS/Xy8irfp\ngNeArWb2/FlWq/b95U8uj/ZXa+dcc9/jBsD3gW1lVvNif1WYy4v9ZWbTzSzazGKBW4APzey2MqtV\n+/7yJ5cX+8u3rUbOuSb/fgxcA5S9wq7K9lmF91D1kgvSm3P7ketm4G7nXCGQC9xivl+NV6ERwAQg\nyXe+FuABIKZULi/2lz+5vNhfUcBc51xtSr7Z/2FmS51zk0vl8mJ/+ZPLi/1VriDYX/7k8mp/tQUW\n+v6/Ugf4u5m9V137TH+hKiIShkL9tIyIiJRD5S4iEoZU7iIiYUjlLiIShlTuIiJhSOUuIhKGVO4i\nImFI5S4iEob+D2m1x85kw/dLAAAAAElFTkSuQmCC\n",
62 | "text/plain": [
63 | ""
64 | ]
65 | },
66 | "metadata": {},
67 | "output_type": "display_data"
68 | }
69 | ],
70 | "source": [
71 | "xs = [1, 2, 3, 4, 5]\n",
72 | "ys = [1, 2, 3, 4, 5]\n",
73 | "plt.plot(xs, ys)\n",
74 | "plt.show()"
75 | ]
76 | },
77 | {
78 | "cell_type": "code",
79 | "execution_count": null,
80 | "metadata": {
81 | "collapsed": true
82 | },
83 | "outputs": [],
84 | "source": []
85 | }
86 | ],
87 | "metadata": {
88 | "kernelspec": {
89 | "display_name": "Python 3",
90 | "language": "python",
91 | "name": "python3"
92 | },
93 | "language_info": {
94 | "codemirror_mode": {
95 | "name": "ipython",
96 | "version": 3
97 | },
98 | "file_extension": ".py",
99 | "mimetype": "text/x-python",
100 | "name": "python",
101 | "nbconvert_exporter": "python",
102 | "pygments_lexer": "ipython3",
103 | "version": "3.6.0"
104 | }
105 | },
106 | "nbformat": 4,
107 | "nbformat_minor": 2
108 | }
109 |
--------------------------------------------------------------------------------