├── penalty.png
├── table_ronaldo.png
├── picture_ronaldo.jpg
├── README.md
├── Analysis of Variance - ANOVA.ipynb
├── Advanced Regression Analysis.ipynb
└── Two-sample Inference for the Difference Between Groups with Python.ipynb
/penalty.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/luisroque/College_Statistics_with_Python/HEAD/penalty.png
--------------------------------------------------------------------------------
/table_ronaldo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/luisroque/College_Statistics_with_Python/HEAD/table_ronaldo.png
--------------------------------------------------------------------------------
/picture_ronaldo.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/luisroque/College_Statistics_with_Python/HEAD/picture_ronaldo.jpg
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # College_Statistics_with_Python
2 | We use Python to get intuition on complex concepts, empirically test theoretical proofs, or build algorithms from scratch
3 |
4 | In a series of weekly articles, I will be covering some important topics of statistics with a twist.
5 |
6 | The goal is to use Python to help us get intuition on complex concepts, empirically test theoretical proofs, or build algorithms from scratch. In this series, you will find articles covering topics such as random variables, sampling distributions, confidence intervals, significance tests, and more.
7 |
8 | At the end of each article, you can find exercises to test your knowledge. The solutions will be shared in the article of the following week.
9 |
--------------------------------------------------------------------------------
/Analysis of Variance - ANOVA.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "smooth-harvey",
6 | "metadata": {},
7 | "source": [
8 | "# Introduction\n",
9 | "In a series of weekly articles, I will be covering some important topics of statistics with a twist.\n",
10 | "\n",
11 | "The goal is to use Python to help us get intuition on complex concepts, empirically test theoretical proofs, or build algorithms from scratch. In this series, you will find articles covering topics such as random variables, sampling distributions, confidence intervals, significance tests, and more.\n",
12 | "\n",
13 | "At the end of each article, you can find exercises to test your knowledge. The solutions will be shared in the article of the following week.\n",
14 | "\n",
15 | "Articles published so far:\n",
16 | "- [Bernoulli and Binomial Random Variables with Python](https://towardsdatascience.com/bernoulli-and-binomial-random-variables-d0698288dd36)\n",
17 | "- [From Binomial to Geometric and Poisson Random Variables with Python](https://towardsdatascience.com/geometric-and-poisson-random-variables-with-python-e5dcb63d6b55)\n",
18 | "- [Sampling Distribution of a Sample Proportion with Python](https://towardsdatascience.com/sampling-distributions-with-python-f5a5f268f636)\n",
19 | "- [Confidence Intervals with Python](https://towardsdatascience.com/confidence-intervals-with-python-bfa28ebb81c)\n",
20 | "- [Significance Tests with Python](https://towardsdatascience.com/significance-or-hypothesis-tests-with-python-7ed35e9ac9b6)\n",
21 | "- [Two-sample Inference for the Difference Between Groups with Python](https://towardsdatascience.com/two-sample-inference-for-the-difference-between-groups-with-python-de91fbee32f9)\n",
22 | "- [Inference for Categorical Data](https://towardsdatascience.com/inference-for-categorical-data-9f3c6034aa57)\n",
23 | "- [Advanced Regression](https://towardsdatascience.com/advanced-regression-f74090014f3)\n",
24 | "- Analysis of Variance - ANOVA\n",
25 | "\n",
26 | "As usual the code is available on my [GitHub](https://github.com/luisroque/College_Statistics_with_Python)."
27 | ]
28 | },
29 | {
30 | "cell_type": "markdown",
31 | "id": "announced-population",
32 | "metadata": {},
33 | "source": [
34 | "# Analysis of Variance step by step"
35 | ]
36 | },
37 | {
38 | "cell_type": "markdown",
39 | "id": "apparent-definition",
40 | "metadata": {},
41 | "source": [
42 | "Once again we are working with the salaries of Data Scientists. This time, we are not interested in predicting the salary from some independent feature. Instead, we want to understand whether salaries differ between 3 groups of Data Scientists with distinct backgrounds: the first sample consists of Computer Science graduates, the second of Economics graduates and the third of Informatics Engineering graduates (notice that the salary unit is 10,000€)."
43 | ]
44 | },
45 | {
46 | "cell_type": "code",
47 | "execution_count": 15,
48 | "id": "defined-horizontal",
49 | "metadata": {},
50 | "outputs": [],
51 | "source": [
52 | "import pandas as pd\n",
53 | "import numpy as np\n",
54 | "from scipy.stats import f"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 79,
60 | "id": "binding-chance",
61 | "metadata": {},
62 | "outputs": [
63 | {
64 | "data": {
136 | "text/plain": [
137 | " g1 g2 g3\n",
138 | "0 5 5 9\n",
139 | "1 9 4 8\n",
140 | "2 10 4 5\n",
141 | "3 12 5 6\n",
142 | "4 8 5 7\n",
143 | "5 8 4 7\n",
144 | "6 9 8 6"
145 | ]
146 | },
147 | "execution_count": 79,
148 | "metadata": {},
149 | "output_type": "execute_result"
150 | }
151 | ],
152 | "source": [
153 | "df = pd.DataFrame.from_dict({'g1': [5,9,10,12,8,8,9], \n",
154 | " 'g2': [5,4, 4, 5,5,4,8], \n",
155 | " 'g3': [9,8, 5, 6,7,7,6]})\n",
156 | "\n",
157 | "df"
158 | ]
159 | },
160 | {
161 | "cell_type": "markdown",
162 | "id": "unavailable-country",
163 | "metadata": {},
164 | "source": [
165 | "The first step in an ANOVA test is to calculate the SST (total sum of squares), the SSW (sum of squares within groups), and the SSB (sum of squares between groups), together with the corresponding degrees of freedom. They are calculated as follows:\n",
166 | "* SST is the sum of the squared distances between each data point and the mean of the whole dataset. Its degrees of freedom are the number of groups $m$ times the number of data points $n$ in each group, minus 1, i.e. $m\\cdot n -1$.\n",
167 | "* SSW is the sum of the squared distances between each data point and its own group mean. Its degrees of freedom are the number of groups times one less than the number of data points in each group, i.e. $m\\cdot (n -1)$.\n",
168 | "* SSB is the sum of the squared distances between each group mean and the mean of the whole dataset, counted once for every data point in the group (that is, $n$ times per group). Its degrees of freedom are the number of groups minus 1, i.e. $m -1$. Both the sums of squares and the degrees of freedom add up: $SST = SSW + SSB$ and $m\\cdot n - 1 = m\\cdot (n-1) + (m-1)$."
169 | ]
170 | },
171 | {
172 | "cell_type": "code",
173 | "execution_count": 80,
174 | "id": "similar-configuration",
175 | "metadata": {},
176 | "outputs": [],
177 | "source": [
178 | "m = df.shape[1]\n",
179 | "n = df.shape[0]"
180 | ]
181 | },
182 | {
183 | "cell_type": "code",
184 | "execution_count": 81,
185 | "id": "acting-municipality",
186 | "metadata": {},
187 | "outputs": [
188 | {
189 | "data": {
190 | "text/plain": [
191 | "98.57142857142858"
192 | ]
193 | },
194 | "execution_count": 81,
195 | "metadata": {},
196 | "output_type": "execute_result"
197 | }
198 | ],
199 | "source": [
200 | "SST = np.sum(np.sum((df - np.mean(np.mean(df)))**2))\n",
201 | "SST"
202 | ]
203 | },
204 | {
205 | "cell_type": "code",
206 | "execution_count": 82,
207 | "id": "thermal-straight",
208 | "metadata": {},
209 | "outputs": [
210 | {
211 | "data": {
212 | "text/plain": [
213 | "20"
214 | ]
215 | },
216 | "execution_count": 82,
217 | "metadata": {},
218 | "output_type": "execute_result"
219 | }
220 | ],
221 | "source": [
222 | "df_sst = m*n-1\n",
223 | "df_sst"
224 | ]
225 | },
226 | {
227 | "cell_type": "code",
228 | "execution_count": 83,
229 | "id": "active-potter",
230 | "metadata": {},
231 | "outputs": [
232 | {
233 | "data": {
234 | "text/plain": [
235 | "50.28571428571429"
236 | ]
237 | },
238 | "execution_count": 83,
239 | "metadata": {},
240 | "output_type": "execute_result"
241 | }
242 | ],
243 | "source": [
244 | "SSW = np.sum(np.sum((df - np.mean(df))**2))\n",
245 | "SSW"
246 | ]
247 | },
248 | {
249 | "cell_type": "code",
250 | "execution_count": 84,
251 | "id": "fitted-filing",
252 | "metadata": {},
253 | "outputs": [
254 | {
255 | "data": {
256 | "text/plain": [
257 | "18"
258 | ]
259 | },
260 | "execution_count": 84,
261 | "metadata": {},
262 | "output_type": "execute_result"
263 | }
264 | ],
265 | "source": [
266 | "df_ssw = m*(n-1)\n",
267 | "df_ssw"
268 | ]
269 | },
270 | {
271 | "cell_type": "code",
272 | "execution_count": null,
273 | "id": "separated-lawyer",
274 | "metadata": {},
275 | "outputs": [],
287 | "source": [
288 | "SSB = np.sum(np.sum((np.tile(np.mean(df), (n,1)) - np.mean(np.mean(df)))**2))  # repeat the group means n times, once per data point\n",
289 | "SSB"
290 | ]
291 | },
292 | {
293 | "cell_type": "code",
294 | "execution_count": 86,
295 | "id": "accessory-dressing",
296 | "metadata": {},
297 | "outputs": [
298 | {
299 | "data": {
300 | "text/plain": [
301 | "2"
302 | ]
303 | },
304 | "execution_count": 86,
305 | "metadata": {},
306 | "output_type": "execute_result"
307 | }
308 | ],
309 | "source": [
310 | "df_ssb = m-1\n",
311 | "df_ssb"
312 | ]
313 | },
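{
"cell_type": "markdown",
"id": "decomposition-check-note",
"metadata": {},
"source": [
"The three sums of squares are tied together by the identity $SST = SSW + SSB$, and the degrees of freedom add up in the same way. The cell below is a minimal sketch of that sanity check, assuming the same three samples as above. It re-enters the data so that it runs on its own, and the names `groups`, `group_means`, `n_groups`, `n_per_group`, `sst`, `ssw` and `ssb` are illustrative rather than part of the original analysis."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "decomposition-check-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Same salary samples as above (unit: 10,000 EUR), one row per group\n",
"groups = np.array([[5, 9, 10, 12, 8, 8, 9],\n",
"                   [5, 4, 4, 5, 5, 4, 8],\n",
"                   [9, 8, 5, 6, 7, 7, 6]], dtype=float)\n",
"n_groups, n_per_group = groups.shape\n",
"grand_mean = groups.mean()\n",
"group_means = groups.mean(axis=1, keepdims=True)\n",
"\n",
"sst = ((groups - grand_mean) ** 2).sum()\n",
"ssw = ((groups - group_means) ** 2).sum()\n",
"# Each group mean is counted once per data point in its group\n",
"ssb = n_per_group * ((group_means - grand_mean) ** 2).sum()\n",
"\n",
"# The total variation splits exactly into within-group and between-group parts\n",
"assert np.isclose(sst, ssw + ssb)\n",
"assert n_groups * n_per_group - 1 == n_groups * (n_per_group - 1) + (n_groups - 1)"
]
},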
314 | {
315 | "cell_type": "markdown",
316 | "id": "color-hampshire",
317 | "metadata": {},
318 | "source": [
319 | "# Hypothesis Test"
320 | ]
321 | },
322 | {
323 | "cell_type": "markdown",
324 | "id": "prompt-evanescence",
325 | "metadata": {},
326 | "source": [
327 | "Let's define our hypothesis test. Our null hypothesis is the scenario where the background does not make a difference, while our alternative hypothesis states that the background makes a difference to the salary of a Data Scientist, i.e. at least one of the group means differs from the others.\n",
328 | "\n",
329 | "$$\\begin{align}\n",
330 | "H_0: \\mu_1 = \\mu_2 = \\mu_3 \\\\\n",
331 | "H_1: \\text{at least one } \\mu_i \\text{ differs from the others}\n",
332 | "\\end{align}$$\n",
333 | "\n",
334 | "As usual, we are going to assume that our null hypothesis is true and figure out the probability of getting a statistic as extreme as or more extreme than the one we get from the observed data. For that we will use an F-statistic, which is basically a ratio of two chi-square statistics, each divided by its degrees of freedom. Here it is the ratio of two of the metrics calculated above, SSB and SSW, divided by their respective degrees of freedom:\n",
335 | "\n",
336 | "$$\\begin{align}\n",
337 | "\\text{F-statistic} = \\frac{\\frac{SSB}{m-1}}{\\frac{SSW}{m(n-1)}}\n",
338 | "\\end{align}$$\n",
339 | "\n",
340 | "The idea is that if the numerator is significantly larger than the denominator, this should make us believe that there is a difference between the true population means. Conversely, if the denominator is significantly larger, it means that the variation within each sample is a bigger share of the total variation than the variation between the samples. Thus, any difference that we observe in the means is probably just a result of random chance. Because only large values of the F-statistic count as evidence against the null hypothesis, the test is right-tailed."
341 | ]
342 | },
343 | {
344 | "cell_type": "code",
345 | "execution_count": null,
346 | "id": "satellite-marine",
347 | "metadata": {},
348 | "outputs": [],
360 | "source": [
361 | "F = (SSB/df_ssb)/(SSW/df_ssw)\n",
362 | "F"
363 | ]
364 | },
365 | {
366 | "cell_type": "code",
367 | "execution_count": 91,
368 | "id": "about-boring",
369 | "metadata": {},
370 | "outputs": [
371 | {
372 | "data": {
373 | "text/plain": [
374 | "3.554557145661787"
375 | ]
376 | },
377 | "execution_count": 91,
378 | "metadata": {},
379 | "output_type": "execute_result"
380 | }
381 | ],
382 | "source": [
383 | "f.ppf(0.95, dfn=df_ssb, dfd=df_ssw)"
384 | ]
385 | },
386 | {
387 | "cell_type": "markdown",
388 | "id": "domestic-independence",
389 | "metadata": {},
390 | "source": [
391 | "Now, we can calculate our p-value. Let's use a significance level of 0.1."
392 | ]
393 | },
394 | {
395 | "cell_type": "code",
396 | "execution_count": null,
397 | "id": "afraid-concern",
398 | "metadata": {},
399 | "outputs": [],
411 | "source": [
412 | "p_value = 1 - f.cdf(F, dfn=df_ssb, dfd=df_ssw)  # the F-test is right-tailed, so the tail area is not doubled\n",
413 | "p_value"
414 | ]
415 | },
416 | {
417 | "cell_type": "code",
418 | "execution_count": 97,
419 | "id": "legendary-pharmacology",
420 | "metadata": {},
421 | "outputs": [
422 | {
423 | "name": "stdout",
424 | "output_type": "stream",
425 | "text": [
426 | "Reject H_0\n"
427 | ]
428 | }
429 | ],
430 | "source": [
431 | "if p_value < 0.1:\n",
432 | "    print('Reject H_0')\n",
433 | "else:\n",
434 | "    print('Fail to reject H_0')"
435 | ]
436 | },
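{
"cell_type": "markdown",
"id": "f-oneway-cross-check-note",
"metadata": {},
"source": [
"As a cross-check, SciPy ships a one-way ANOVA routine, `scipy.stats.f_oneway`, which returns the F-statistic and its right-tailed p-value directly. The cell below is a minimal sketch that compares it with a manual computation. It re-enters the three samples so that it runs on its own, and the names `g1`, `g2`, `g3`, `samples`, `F_manual` and `p_manual` are illustrative rather than part of the original analysis."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f-oneway-cross-check-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from scipy.stats import f, f_oneway\n",
"\n",
"g1 = [5, 9, 10, 12, 8, 8, 9]\n",
"g2 = [5, 4, 4, 5, 5, 4, 8]\n",
"g3 = [9, 8, 5, 6, 7, 7, 6]\n",
"\n",
"# Library result: F-statistic and right-tailed p-value\n",
"F_scipy, p_scipy = f_oneway(g1, g2, g3)\n",
"\n",
"# Manual computation from the sums of squares, for comparison\n",
"samples = np.array([g1, g2, g3], dtype=float)\n",
"n_groups, n_per_group = samples.shape\n",
"ssb = n_per_group * ((samples.mean(axis=1) - samples.mean()) ** 2).sum()\n",
"ssw = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum()\n",
"F_manual = (ssb / (n_groups - 1)) / (ssw / (n_groups * (n_per_group - 1)))\n",
"p_manual = f.sf(F_manual, dfn=n_groups - 1, dfd=n_groups * (n_per_group - 1))\n",
"\n",
"assert np.isclose(F_scipy, F_manual)\n",
"assert np.isclose(p_scipy, p_manual)"
]
},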
437 | {
438 | "cell_type": "markdown",
439 | "id": "superior-torture",
440 | "metadata": {},
441 | "source": [
442 | "We see that the p-value is smaller than the significance level, which makes us reject the null hypothesis. There is enough evidence to conclude that at least one of the population means differs from the others, and that this difference is unlikely to be explained by the variation within each group alone. In other words, the data suggest that the salary of a Data Scientist differs depending on the graduation background."
443 | ]
444 | },
445 | {
446 | "cell_type": "markdown",
447 | "id": "short-parliament",
448 | "metadata": {},
449 | "source": [
450 | "# Conclusion\n",
451 | "\n",
452 | "This article covered analysis of variance (ANOVA), which is a collection of methods for comparing multiple means across different groups. We also introduced a new statistic, called the F-statistic, which we used to conduct a hypothesis test on the difference between the means of our groups.\n",
453 | "\n",
454 | "This is the final article of this first series on \"College Statistics with Python\". Hope that you enjoyed it!"
455 | ]
456 | },
457 | {
458 | "cell_type": "markdown",
459 | "id": "floating-malawi",
460 | "metadata": {},
461 | "source": [
462 | "### Answers from last week"
463 | ]
464 | },
465 | {
466 | "cell_type": "markdown",
467 | "id": "tamil-vegetarian",
468 | "metadata": {},
469 | "source": [
470 | "1. Márcia collected data on the battery life and price of a random sample of Portable Computers. Based on the data presented below, what is the test statistic for the null hypothesis that the population slope is 0? "
471 | ]
472 | },
473 | {
474 | "cell_type": "code",
475 | "execution_count": 10,
476 | "id": "structured-player",
477 | "metadata": {},
478 | "outputs": [],
479 | "source": [
480 | "data = {'Intercept': [200.312, 92.618],\n",
481 | " 'Battery': [7.546,4.798]}"
482 | ]
483 | },
484 | {
485 | "cell_type": "code",
486 | "execution_count": 11,
487 | "id": "industrial-bedroom",
488 | "metadata": {},
489 | "outputs": [
490 | {
491 | "data": {
530 | "text/plain": [
531 | " Coef SE Coef\n",
532 | "Intercept 200.312 92.618\n",
533 | "Battery 7.546 4.798"
534 | ]
535 | },
536 | "execution_count": 11,
537 | "metadata": {},
538 | "output_type": "execute_result"
539 | }
540 | ],
541 | "source": [
542 | "df = pd.DataFrame.from_dict(data, columns=['Coef', 'SE Coef'], orient='index')\n",
543 | "df"
544 | ]
545 | },
546 | {
547 | "cell_type": "code",
548 | "execution_count": 12,
549 | "id": "clean-mistress",
550 | "metadata": {},
551 | "outputs": [
552 | {
553 | "data": {
554 | "text/plain": [
555 | "1.5727386411004585"
556 | ]
557 | },
558 | "execution_count": 12,
559 | "metadata": {},
560 | "output_type": "execute_result"
561 | }
562 | ],
563 | "source": [
564 | "t = (df.loc['Battery', 'Coef'] - 0) / df.loc['Battery', 'SE Coef']  # test statistic for H_0: slope = 0\n",
565 | "t"
566 | ]
567 | },
568 | {
569 | "cell_type": "markdown",
570 | "id": "smart-museum",
571 | "metadata": {},
572 | "source": [
573 | "2. Rui obtained a random sample of colleagues at work and noticed a positive linear relationship between their ages and the number of kilometers they said they walked yesterday. A 95% confidence interval for the slope of the regression line was (15.4, 155.2). Rui wants to use this interval to test $H_0: \\beta=0$ vs. $H_1: \\beta \\neq 0$ at the 5% level of significance. Assume that all conditions for inference have been met. What should Rui conclude?"
574 | ]
575 | },
576 | {
577 | "cell_type": "markdown",
578 | "id": "prospective-sustainability",
579 | "metadata": {},
580 | "source": [
581 | "Rui should reject $H_0$: the 95% confidence interval for the slope does not contain 0, so at the 5% level of significance the data suggest that there is a linear relationship between age and the number of kilometers walked yesterday."
582 | ]
583 | }
584 | ],
585 | "metadata": {
586 | "kernelspec": {
587 | "display_name": "Python 3",
588 | "language": "python",
589 | "name": "python3"
590 | },
591 | "language_info": {
592 | "codemirror_mode": {
593 | "name": "ipython",
594 | "version": 3
595 | },
596 | "file_extension": ".py",
597 | "mimetype": "text/x-python",
598 | "name": "python",
599 | "nbconvert_exporter": "python",
600 | "pygments_lexer": "ipython3",
601 | "version": "3.8.5"
602 | }
603 | },
604 | "nbformat": 4,
605 | "nbformat_minor": 5
606 | }
607 |
--------------------------------------------------------------------------------
/Advanced Regression Analysis.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "smooth-harvey",
6 | "metadata": {},
7 | "source": [
8 | "# Introduction\n",
9 | "In a series of weekly articles, I will be covering some important topics of statistics with a twist.\n",
10 | "\n",
11 | "The goal is to use Python to help us get intuition on complex concepts, empirically test theoretical proofs, or build algorithms from scratch. In this series, you will find articles covering topics such as random variables, sampling distributions, confidence intervals, significance tests, and more.\n",
12 | "\n",
13 | "At the end of each article, you can find exercises to test your knowledge. The solutions will be shared in the article of the following week.\n",
14 | "\n",
15 | "Articles published so far:\n",
16 | "- [Bernoulli and Binomial Random Variables with Python](https://towardsdatascience.com/bernoulli-and-binomial-random-variables-d0698288dd36)\n",
17 | "- [From Binomial to Geometric and Poisson Random Variables with Python](https://towardsdatascience.com/geometric-and-poisson-random-variables-with-python-e5dcb63d6b55)\n",
18 | "- [Sampling Distribution of a Sample Proportion with Python](https://towardsdatascience.com/sampling-distributions-with-python-f5a5f268f636)\n",
19 | "- [Confidence Intervals with Python](https://towardsdatascience.com/confidence-intervals-with-python-bfa28ebb81c)\n",
20 | "- [Significance Tests with Python](https://towardsdatascience.com/significance-or-hypothesis-tests-with-python-7ed35e9ac9b6)\n",
21 | "- [Two-sample Inference for the Difference Between Groups with Python](https://towardsdatascience.com/two-sample-inference-for-the-difference-between-groups-with-python-de91fbee32f9)\n",
22 | "- [Inference for Categorical Data](https://towardsdatascience.com/inference-for-categorical-data-9f3c6034aa57)\n",
23 | "- Advanced Regression\n",
24 | "\n",
25 | "As usual the code is available on my [GitHub](https://github.com/luisroque/College_Statistics_with_Python)."
26 | ]
27 | },
28 | {
29 | "cell_type": "markdown",
30 | "id": "vital-voice",
31 | "metadata": {},
32 | "source": [
33 | "# Inference about Slope in Linear Regression"
34 | ]
35 | },
36 | {
37 | "cell_type": "markdown",
38 | "id": "alien-catalyst",
39 | "metadata": {},
40 | "source": [
41 | "Imagine that you want to predict the salary of a Data Scientist based on the number of years they have been coding. To build this relationship, we can sample 20 random people from the population of Data Scientists and plot the relationship (notice that the salary unit is 10,000€). At the same time, we can calculate the line of best fit."
42 | ]
43 | },
44 | {
45 | "cell_type": "code",
46 | "execution_count": 280,
47 | "id": "chief-springfield",
48 | "metadata": {},
49 | "outputs": [],
50 | "source": [
51 | "import pandas as pd\n",
52 | "import matplotlib.pyplot as plt\n",
53 | "import numpy as np\n",
54 | "from scipy.stats import t, chi2"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 281,
60 | "id": "needed-curve",
61 | "metadata": {},
62 | "outputs": [],
63 | "source": [
64 | "salaries = [[4, 5.5, 3.5, 6, 7, 9, 3.4, 5.3, 4.3, 5.3, 6, 6.2, 6.5, 7, 7.1, 4.3, 5.2, 5, 5.7, 3.2]]\n",
65 | "yearsCoding = [[2, 5, 1, 4, 5, 10, 2, 2, 4, 3, 4, 6, 5, 6, 7, 4, 3, 4, 4, 3]]\n",
66 | "n = 20"
67 | ]
68 | },
69 | {
70 | "cell_type": "code",
71 | "execution_count": 282,
72 | "id": "academic-steel",
73 | "metadata": {},
74 | "outputs": [
75 | {
76 | "data": {
77 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXgAAAEGCAYAAABvtY4XAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAUZElEQVR4nO3df5BdZX3H8feXTSgbFGMlWhOIkepELajYFRSUCtHGX8UUtcKMP4qdZnQUqdPGMR1bVHRQo63O2Goj/qBi448Qo0VLoIpaxxLcEDUoBBWQsEFZqgGFKGH59o97FzbLZrMb7rnn3Oe+XzOZ3T33x/PlDvnk7Pc853kiM5EkleeguguQJFXDgJekQhnwklQoA16SCmXAS1Kh5tRdwESHH354LlmypO4yJKlnbNmy5bbMXDDVY40K+CVLljA8PFx3GZLUMyLiZ/t6zBaNJBXKgJekQhnwklQoA16SCmXAS1KhKp1FExFnA38NBPCxzPxgleNJUi/ZuHWENZu2s3PXbhbOH2TV8qWsOHZRx96/soCPiKNphftxwN3AJRHxlcz8cVVjSlKv2Lh1hNUbtrF7zxgAI7t2s3rDNoCOhXyVLZonAldk5l2ZeQ/wTeDPKxxPknrGmk3b7wv3cbv3jLFm0/aOjVFlwF8NnBQRj4iIecALgSMnPykiVkbEcEQMj46OVliOJDXHzl27Z3X8QFQW8Jl5DfBe4DLgEuD7wD1TPG9tZg5l5tCCBVPebStJxVk4f3BWxw9EpbNoMvPjmfm0zDwJ+CVg/12SgFXLlzI4d2CvY4NzB1i1fGnHxqh6Fs0jM/PWiFgMnAY8s8rxJKlXjF9I7clZNG0XRcQjgD3AGzLzVxWPJ0k9Y8Wxizoa6JNVGvCZ+ewq31+StG/eySpJhTLgJalQBrwkFcqAl6RCGfCSVCgDXpIKZcBLUqEMeEkqlAEvSYUy4CWpUAa8JBXKgJekQhnwklQoA16SCmXAS1KhDHhJKpQBL0mFMuAlqVAGvCQVyoCXpEIZ8JJUKANekgplwEtSoQx4SSqUAS9Jhao04CPizRHxw4i4OiLWRcQhVY4nSbpfZQEfEYuANwFDmXk0MACcXtV4kqS9Vd2imQMMRsQcYB6ws+LxJEltlQV8Zo4A7wduAm4Bbs/MSyc/LyJWRsRwRAyPjo5WVY4k9Z0qWzQPB14CPBZYCBwaEa+c/LzMXJuZQ5k5tGDBgqrKkaS+U2WL5rnADZk5mpl7gA3ACRWOJ0maoMqAvwl4RkTMi4gAlgHXVDieJGmCKnvwm4H1wFXAtvZYa6saT5K0tzlVvnlmngOcU+UYkqSpeSerJBXKgJekQhnwklQoA16SCmXAS1KhDHhJKpQBL0mFMuAlqVAGvCQVyoCXpEIZ8JJUKANekgplwEtSoQx4SSqUAS9JhTLgJalQBrwkFcqAl6RCGfCSVCgDXpIKZcBLUqEMeEkq1Jy6C5DUHRu3jrBm03Z27trNwvmDrFq+lBXHLqq7LFXIgJf6wMatI6zesI3de8YAGNm1m9UbtgEY8gWzRSP1gTWbtt8X7uN27xljzabtNVWkbqjsDD4ilgKfm3DoKOAfM/ODVY0pTdSUlkQT6ti5a/esjqsMlQV8Zm4HngoQEQPACPDFqsaTJmpKS6IpdSycP8jIFGG+cP5g12pQ93WrRbMM+Glm/qxL46nPNaUl0ZQ6Vi1fyuDcgb2ODc4dYNXypV2tQ93VrYuspwPrpnogIlYCKwEWL17cpXJUuqa0JJpSx/hvC3W3itRdlQd8RBwMnAqsnurxzFwLrAUYGhrKqutRf2hKS6IpdUAr5A30/tKNFs0LgKsy8xddGEsCmtOSaEod6k/daNGcwT7aM1JVmtKSaEod6k+RWV1XJCLmATuAozLz9v09f2hoKIeHhyurR5JKExFbMnNoqscqPYPPzLuAR1Q5hiRpat7JKkmFMuAlqVAGvCQVyoCXpEIZ8JJUKANekgplw
EtSoQx4SSqUW/apWE3YaEOqkwGvIjVlow2pTrZoVKSmbLQh1cmAV5GastGGVKcZBXx7T1WpZ+xrQw33IFU/mekZ/E8iYk1EPKnSaqQOcaMNaeYB/2TgOuD8iLgiIlZGxGEV1iU9KCuOXcR5px3DovmDBLBo/iDnnXaMF1jVV2a94UdEnERrh6b5wHrg3Mz8SSeKccMPSZqd6Tb8mHEPPiJOjYgvAh8CPgAcBfwn8NWOVSpJ6piZzoP/MXA5sCYzvzPh+Pr2Gb0kqWH2G/DtGTSfysx3TvV4Zr6p41VJkh60/QZ8Zo5FxMnAlAEvaXoumaC6zLRF852I+DDwOeDO8YOZeVUlVUmFcMkE1WmmAX9C++vEs/gETulsOVJZplsywYBX1WYU8Jl5ctWFSCVyyQTVacarSUbEi4A/Ag4ZP7avC6+SWhbOH2RkijB3yQR1w0znwX8UeAVwFhDAy4HHVFiXVASXTFCdZtyDz8wnR8QPMvMdEfEBYMP+XhQR84HzgaNp9exfm5n/e8DVqie8beM21m3ewVgmAxGccfyRvGvFMXWXVYvxPruzaFSHmQb8+O+Yd0XEQuD/gMfO4HUfAi7JzJdFxMHAvAOoUT3kbRu3ceEVN93381jmfT/3c8gb6KrDTBcbu7h9Nr4GuAq4EfjsdC9oL0Z2EvBxgMy8OzN3HWih6g3rNu+Y1XFJ1ZnpLJpz299eFBEXA4dk5u37edlRwCjwyYh4CrAFODsz75z4pIhYCawEWLx48WxqVwON7WPxun0dl1SdaQM+Ik6b5jEyc7o+/BzgacBZmbk5Ij4EvBX4h4lPysy1wFporSY508LVTAMRU4b5QEQN1Uj9bX9n8H82zWPJ9BdabwZuzszN7Z/X0wp4FeyM44/cqwc/8bik7po24DPzzAN948z8eUTsiIilmbkdWAb86EDfT71h/EKqs2ik+s14w48DudEpIp5Ka5rkwcD1wJmZ+at9Pd8NPyRpdqbb8GNGF1nbNzrNA06mFdgvA67c3+sy83vAlANLkqo102mSJ2Tmq4FfZeY7gGcCNlUlqcFmGvCTb3S6h5nd6CRJqslM72Qdv9HpfbTms0OrVSM9gBtcSM2wv3nwTwd2jN/oFBEPAbYB1wL/XH156jVucCE1x/5aNP8G3A3Q3lz7Pe1jt9O+OUmaaLoNLiR11/5aNAOZ+cv2968A1mbmRbSWLPhepZWpJ7nBhdQc+zuDH4iI8X8ElgFfn/DYjDcLUf/Y10YWbnAhdd/+An4d8M2I+BKtmTT/AxARj6PVppH24gYXUnPsb6mCd0fE14BHA5fm/be9HkRrdydpL25wITXHftssmXnFFMeuq6YclaApG1w4XVP9zj66iuR0TWnmd7JKPcXpmpIBr0I5XVMy4FUop2tKBrwK5XRNyYusHdOEGRtNqKEpnK4pGfAd0YQZG02ooWmaMl1Tqostmg5owoyNJtQgqVkM+A5owoyNJtQgqVkM+A5owoyNJtQgqVkM+A5owoyNk5+wYFbHJZXPi6wd0IQZG5dfOzqr45LKZ8B3SN0zNuzBS5rMFk0h7MFLmsyAL0QTrgNIapZKWzQRcSPwa2AMuCczh6ocr5814TqApGbpRg/+5My8rQvj9L26rwOMc8kEqRm8yKqOcskEqTmq7sEncGlEbImIlRWPpQZwyQSpOao+gz8xM3dGxCOByyLi2sz81sQntIN/JcDixYsrLkdVc7qm1ByVnsFn5s7211uBLwLHTfGctZk5lJlDCxZ412Wvc7qm1ByVBXxEHBoRDx3/HvhT4OqqxlMzOF1Tao4qWzSPAr4YEePj/EdmXlLheGoAp2tKzVFZwGfm9cBTqnp/NVdTpmtK/c47WSWpUAa8JBXKgJekQhnwklQoA16SCmXAS1KhDHhJKpQBL0mFMuAlqVAGvCQVyoCXpEIZ8JJUKANekgplwEtSoQx4SSpU1Xuyqg+9beM21m3ewVgmAxGccfyRvGvFMXWXJfUdA14d9baN2
7jwipvu+3ks876fDXmpu2zRqKPWbd4xq+OSqmPAq6PGMmd1XFJ1DHh11EBrk/UZH5dUHQNeHXXG8UfO6rik6niRVR01fiHVWTRS/SIb1BsdGhrK4eHhusuQpJ4REVsyc2iqx2zRSFKhDHhJKlTlAR8RAxGxNSIurnosSdL9unGR9WzgGuCwLoxVm41bR1izaTs7d+1m4fxBVi1fyopjF9VdlqQ+VukZfEQcAbwIOL/Kceq2cesIqzdsY2TXbhIY2bWb1Ru2sXHrSN2lSepjVbdoPgi8Bbi34nFqtWbTdnbvGdvr2O49Y6zZtL2miiSpwoCPiBcDt2bmlv08b2VEDEfE8OjoaFXlVGrnrt2zOi5J3VDlGfyJwKkRcSPwWeCUiLhw8pMyc21mDmXm0IIFCyospzoL5w/O6rgkdUNlAZ+ZqzPziMxcApwOfD0zX1nVeHVatXwpg3MH9jo2OHeAVcuX1lSRJLlUQUeMz5ZxFo2kJnGpAknqYS5VIEl9yICXpEIZ8JJUKANekgplwEtSoQx4SSqUAS9JhTLgJalQBrwkFarnlypwow1JmlpPB/z4Rhvja7GPb7QBGPKS+l5Pt2jcaEOS9q2nz+CbtNGGrSJJTdPTZ/BN2WjDPVklNVFPB3xTNtqwVSSpiXq6RdOUjTaa1CqSpHE9HfDQCvm6e90L5w8yMkWYuyerpDr1dIumKZrSKpKkiXr+DL4JmtIqkqSJDPgOaUKryKmakiYy4AvhXb2SJrMHXwinakqazIAvhFM1JU1mwBeiKXf1SmoOA74QTtWUNJkXWQvhVE1Jk1UW8BFxCPAt4Pfa46zPzHOqGk/NmKopqTmqPIP/HXBKZv4mIuYC346I/8rMKyocU5LUVlnAZ2YCv2n/OLf9J6saT5K0t0ovskbEQER8D7gVuCwzN0/xnJURMRwRw6Ojo1WWI0l9pdKAz8yxzHwqcARwXEQcPcVz1mbmUGYOLViwoMpyJKmvdGWaZGbuAr4BPL8b40mSIFqt8greOGIBsCczd0XEIHAp8N7MvHia14wCP6ukoO45HLit7iIaws9ib34ee/PzuN+D+Swek5lTtj+qnEXzaOCCiBig9ZvC56cLd4B9FdlLImI4M4fqrqMJ/Cz25uexNz+P+1X1WVQ5i+YHwLFVvb8kaXouVSBJhTLgO29t3QU0iJ/F3vw89ubncb9KPovKLrJKkurlGbwkFcqAl6RCGfAdEBFHRsTlEXFNRPwwIs6uu6a6tZep2BoR006N7QcRMT8i1kfEte3/R55Zd011iog3t/+eXB0R69orz/aNiPhERNwaEVdPOPb7EXFZRPy4/fXhnRjLgO+Me4C/zcwnAs8A3hART6q5prqdDVxTdxEN8SHgksx8AvAU+vhziYhFwJuAocw8GhgATq+3qq77FA+8q/+twNcy8/HA19o/P2gGfAdk5i2ZeVX7+1/T+gvctwuzR8QRwIuA8+uupW4RcRhwEvBxgMy8u710Rz+bAwxGxBxgHrCz5nq6KjO/Bfxy0uGXABe0v78AWNGJsQz4DouIJbRu8HrAypl95IPAW4B7a66jCY4CRoFPtltW50fEoXUXVZfMHAHeD9wE3ALcnpmX1ltVIzwqM2+B1gkj8MhOvKkB30ER8RDgIuBvMvOOuuupQ0S8GLg1M7fUXUtDzAGeBnwkM48F7qRDv373onZv+SXAY4GFwKER8cp6qyqXAd8h7V2rLgI+k5kb6q6nRicCp0bEjcBngVMi4sJ6S6rVzcDNE/ZCWE8r8PvVc4EbMnM0M/cAG4ATaq6pCX4REY8GaH+9tRNvasB3QEQErR7rNZn5T3XXU6fMXJ2ZR2TmEloXz76emX17hpaZPwd2RMTS9qFlwI9qLKluNwHPiIh57b83y+jji84TfBl4Tfv71wBf6sSbVrmaZD85EXgVsK29gxXA32fmV+srSQ1yFvCZiDgYuB44s+Z6apOZmyNiPXAVrdlnW+mzJQsiYh3wHODwiLgZOAd4D
/D5iPgrWv8IvrwjY7lUgSSVyRaNJBXKgJekQhnwklQoA16SCmXAS1KhDHj1hGj5dkS8YMKxv4iIS7ow9t+1V4K8OiK+HxGvnsVrnzO+omZEnBoRfXsXq7rPefDqCZmZEfE64AsRcTmtVQjfzQNX5ZuRiBjIzLEZPO91wPOA4zLzjoh4GAe4EFRmfpnWDS1SVzgPXj0lIt5Haz2XQ9tfHwMcQ+tk5e2Z+aX2gm+fbj8H4I2Z+Z2IeA6tm0puAZ4KPB34PHAErX8wzs3Mz00a7ybg5Mz86RS1LKO1cNYc4LvA6zPzdxHxfFoLrt1G64aeozLzxRHxl7SWyX1jRHwKuAMYAv4AeEtmro+Ig4APA38C3EDrt+xPZOb6A//U1K88g1eveQet0LwbuJjWUgivjYj5wJUR8d+01vF4Xmb+NiIeD6yjFaQAxwFHZ+YNEfFSYGdmvgigfXZ+n4h4KPDQfYT7IbTW9V6WmddFxL8Dr4+IjwIfA04BfgJ8bvJrJ3g08CzgCbTO7NcDpwFLaP2j9Uhat/F/YuYfj3Q/e/DqKZl5J63Q/DSt1slb28tDfAM4BFgMzAU+FhHbgC8AEzdfuTIzb2h/vw14bkS8NyKenZm3TxougH39iruU1qJZ17V/voDWuu9PaB//cbZ+PZ5uobWNmXlvZv4IeFT72LOAL7SP/xy4fJrXS9PyDF696N72nwBempnbJz4YEW8HfkFr96SDgN9OePjO8W/aZ95/DLwQOC8iLs3Md054/I6IuDMijsrM6yfVENPUN9O+5++meL/p3leaFc/g1cs2AWe1VyUkIo5tH38YcEtm3ktrEbiBqV4cEQuBuzLzQlq99KmW8T0P+Jf2zkxExGERsRK4FlgSEY9rP+9VwDfbxx8bEX/YPn7GLP+bvg28NCIOiohH0VqUSjognsGrl51L62LmD9ohfyPwYuBfgYsi4uW0Whx37uP1xwBrIuJeYA/w+ime8xHgIcB3I2JP+3kfaPf3z6Q1q2f8IutH2xdZVwJfiYjbaAX20bP4b7qI1hK6VwPX0doZbHLrSJoRZ9FIDRMRD8nM30TEI4ArgRPb/XhpVjyDl5rn4vasoINpTd003HVAPIOXpEJ5kVWSCmXAS1KhDHhJKpQBL0mFMuAlqVD/D2Epsa+BJDOEAAAAAElFTkSuQmCC\n",
78 | "text/plain": [
79 | "<Figure size 432x288 with 1 Axes>"
80 | ]
81 | },
82 | "metadata": {
83 | "needs_background": "light"
84 | },
85 | "output_type": "display_data"
86 | }
87 | ],
88 | "source": [
89 | "plt.scatter(yearsCoding[0], salaries[0])\n",
90 | "plt.xlabel('Years Coding')\n",
91 | "plt.ylabel('Salary');"
92 | ]
93 | },
94 | {
95 | "cell_type": "markdown",
96 | "id": "outdoor-discrimination",
97 | "metadata": {},
98 | "source": [
99 | "The line of best fit can be calculated by minimizing the squared distance between the data points and the line. The equation for the regression line is the following:\n",
100 | "\n",
101 | "$$\\begin{align}\n",
102 | "\\hat y = b \\cdot X + a ,\n",
103 | "\\end{align}$$\n",
104 | "\n",
105 | "where $b$ is the slope and $a$ is the intercept of the regression line. Notice that by calculating this line for our data, we are calculating the line of best fit for a single sample of 20 Data Scientists. A different sample from the same population would give a different line of best fit, because a sample only lets us estimate the population parameters. If you could get the salaries and number of years coding for all the Data Scientists in the world, you would obtain the true population parameters instead of estimates. In that case we replace $a$ and $b$ by $\\alpha$ and $\\beta$:\n",
106 | "\n",
107 | "$$\\begin{align}\n",
108 | "\\hat y = \\beta \\cdot X + \\alpha\n",
109 | "\\end{align}$$\n",
110 | "\n",
111 | "Since we are using samples to estimate the population parameters, we can make inferences based on those samples. With that in mind, we know that $b$ will not be exactly equal to $\\beta$, but can we say that there is a positive linear relationship or a non-zero linear relationship between salaries and years of coding? As a matter of fact, we can do it by defining a confidence interval around this statistic. This way we will have a good sense of where the true parameter might actually be. In previous articles we saw how to compute such an interval: we subtract/add the critical value $t^*$ multiplied by the standard error of $b$. For the slope of a simple linear regression, $t^*$ comes from a t-distribution with $n-2$ degrees of freedom.\n",
112 | "\n",
113 | "$$\\begin{align}\n",
114 | "b \\pm t^* SE_b\n",
115 | "\\end{align}$$\n",
116 | "\n",
117 | "In the same way, we could define a hypothesis test for the slope parameter. One that states that there is no relationship between the variables as a null hypothesis and, as the alternative, that there is a relationship between the variables (you could also be more specific and define the alternative hypothesis as considering only a positive linear relationship between the variables).\n",
118 | "\n",
119 | "$$\\begin{align}\n",
120 | "H_0: \\beta = 0 \\\\\n",
121 | "H_1: \\beta \\neq 0\n",
122 | "\\end{align}$$\n",
123 | "\n",
124 | "As in any other inference procedure, certain conditions must hold for the results to be valid. For the slope of a linear regression, we must ensure that:\n",
125 | "* The data come from a random sample or a randomized experiment;\n",
126 | "* The individual observations should be independent (or approximately independent - remember the 10% rule);\n",
127 | "* The relationship between the variables must be a linear relationship;\n",
128 | "* For any given x in the population, the distribution of y must be normal;\n",
129 | "* The variance should be constant for any given x."
130 | ]
131 | },
132 | {
133 | "cell_type": "markdown",
134 | "id": "latest-distance",
135 | "metadata": {},
136 | "source": [
137 | "# Fitting a Regression Line from Scratch"
138 | ]
139 | },
140 | {
141 | "cell_type": "markdown",
142 | "id": "polyphonic-resort",
143 | "metadata": {},
144 | "source": [
145 | "We already defined our equation for the regression line. Now we need to define our cost function and a method to update our parameters. For our cost function we will be using the Mean Squared Error:\n",
146 | "\n",
147 | "$$MSE = J(a, b) = \\frac{1}{N} \\sum^N_{i=1}(y_i-(b x_i + a))^2$$\n",
148 | "\n",
149 | "To update our parameters we will be using gradient descent. I will not cover the method in depth; I just want to give some intuition. The first step is to compute the gradient of the cost function with respect to each parameter:\n",
150 | "\n",
151 | "$$\\begin{align}\n",
152 | "\\frac{\\partial}{\\partial a} {J} = \\frac{1}{N}\\sum -2(y_i - (b x_i + a)) \\\\\n",
153 | "\\frac{\\partial}{\\partial b} {J} = \\frac{1}{N}\\sum -2 x_i(y_i - (b x_i + a)) \n",
154 | "\\end{align}$$\n",
155 | "\n",
156 | "Then we update the parameters accordingly:\n",
157 | "\n",
158 | "$$\\begin{align}\n",
159 | "a = a - \\alpha \\cdot da \\\\\n",
160 | "b = b - \\alpha \\cdot db,\n",
161 | "\\end{align}$$\n",
162 | "\n",
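163 | "where $da$ and $db$ are the partial derivatives computed above and $\\alpha$ here represents the learning rate (not the population intercept used earlier)."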
164 | ]
165 | },
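{
"cell_type": "markdown",
"id": "added-closed-form-note",
"metadata": {},
"source": [
"As a side note that is not part of the original derivation: setting both partial derivatives to zero gives the closed-form least-squares solution, which gradient descent approaches as the number of iterations grows. The symbols $\\bar x$, $\\bar y$ and $SS_{xy}$ are introduced here only for illustration:\n",
"\n",
"$$\\begin{align}\n",
"b &= \\frac{SS_{xy}}{SS_{xx}} = \\frac{\\sum_{i=1}^N (x_i - \\bar x)(y_i - \\bar y)}{\\sum_{i=1}^N (x_i - \\bar x)^2} \\\\\n",
"a &= \\bar y - b \\, \\bar x\n",
"\\end{align}$$"
]
},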
166 | {
167 | "cell_type": "code",
168 | "execution_count": 283,
169 | "id": "romantic-masters",
170 | "metadata": {},
171 | "outputs": [],
172 | "source": [
173 | "def fit_lr(X, y, num_iter=1000, lr=0.01):\n",
174 | "    n_samples = X.shape[0]\n",
175 | "    slope = 0\n",
176 | "    intercept = 0\n",
177 | "\n",
178 | "    for _ in range(num_iter):\n",
179 | "        y_predicted = np.dot(X, slope) + intercept\n",
180 | "\n",
181 | "        ds = (1/n_samples) * np.dot(X.T, (y_predicted - y))  # dJ/db (constant factor 2 absorbed into lr)\n",
182 | "        di = (1/n_samples) * np.sum(y_predicted - y)  # dJ/da (constant factor 2 absorbed into lr)\n",
183 | "\n",
184 | "        slope -= lr * ds\n",
185 | "        intercept -= lr * di\n",
186 | "\n",
187 | "    return (slope, intercept)"
188 | ]
189 | },
190 | {
191 | "cell_type": "code",
192 | "execution_count": 284,
193 | "id": "explicit-change",
194 | "metadata": {},
195 | "outputs": [],
196 | "source": [
197 | "slope, intercept = fit_lr(np.array(yearsCoding[0]), np.array(salaries[0]))"
198 | ]
199 | },
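{
"cell_type": "markdown",
"id": "added-gd-sanity-check-note",
"metadata": {},
"source": [
"A quick sanity check, sketched under assumptions: the cell below compares `fit_lr` (defined above) with the closed-form least-squares solution on synthetic data. The data, the seed and every name ending in `_demo` are hypothetical and not part of the original analysis. With the default 1000 iterations the gradient-descent estimates may still be some distance from the closed-form values; a larger iteration budget should bring them closer."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-gd-sanity-check",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical synthetic data (not from the article): y = 0.7 x + 2.5 plus noise\n",
"rng_demo = np.random.default_rng(0)\n",
"x_demo = rng_demo.uniform(1, 10, size=20)\n",
"y_demo = 0.7 * x_demo + 2.5 + rng_demo.normal(0, 0.5, size=20)\n",
"\n",
"# Closed-form least-squares solution: b = SS_xy / SS_xx, a = mean(y) - b * mean(x)\n",
"ss_xx_demo = np.sum((x_demo - x_demo.mean()) ** 2)\n",
"ss_xy_demo = np.sum((x_demo - x_demo.mean()) * (y_demo - y_demo.mean()))\n",
"b_closed_demo = ss_xy_demo / ss_xx_demo\n",
"a_closed_demo = y_demo.mean() - b_closed_demo * x_demo.mean()\n",
"\n",
"# Gradient descent with fit_lr (defined in an earlier cell) for two iteration budgets\n",
"for iters_demo in (1000, 100000):\n",
"    b_gd_demo, a_gd_demo = fit_lr(x_demo, y_demo, num_iter=iters_demo)\n",
"    print(iters_demo, 'iterations: slope =', round(b_gd_demo, 4), 'intercept =', round(a_gd_demo, 4))\n",
"print('closed form: slope =', round(b_closed_demo, 4), 'intercept =', round(a_closed_demo, 4))"
]
},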
200 | {
201 | "cell_type": "code",
202 | "execution_count": 285,
203 | "id": "defensive-terminal",
204 | "metadata": {},
205 | "outputs": [
206 | {
207 | "data": {
208 | "image/png": "\n",
209 | "text/plain": [
210 | "[figure: scatter plot of Salary vs. Years Coding with the fitted regression line in red]"
211 | ]
212 | },
213 | "metadata": {
214 | "needs_background": "light"
215 | },
216 | "output_type": "display_data"
217 | }
218 | ],
219 | "source": [
220 | "plt.scatter(yearsCoding[0], salaries[0])\n",
221 | "plt.xlabel('Years Coding')\n",
222 | "plt.ylabel('Salary')\n",
223 | "plt.plot(yearsCoding[0],slope*np.array(yearsCoding[0]) + intercept, color='r');"
224 | ]
225 | },
226 | {
227 | "cell_type": "code",
228 | "execution_count": 286,
229 | "id": "functional-talent",
230 | "metadata": {},
231 | "outputs": [
232 | {
233 | "name": "stdout",
234 | "output_type": "stream",
235 | "text": [
236 | "Slope=0.7096702503648314\n",
237 | "Intercept=2.4147010493601404\n"
238 | ]
239 | }
240 | ],
241 | "source": [
242 | "print('Slope=' + str(slope))\n",
243 | "print('Intercept=' + str(intercept))"
244 | ]
245 | },
246 | {
247 | "cell_type": "markdown",
248 | "id": "noticed-minimum",
249 | "metadata": {},
250 | "source": [
251 | "What would be the predicted salary of a Data Scientist who has been coding for 15 years?"
252 | ]
253 | },
254 | {
255 | "cell_type": "code",
256 | "execution_count": 287,
257 | "id": "aggregate-mainland",
258 | "metadata": {},
259 | "outputs": [
260 | {
261 | "data": {
262 | "text/plain": [
263 | "13.059754804832611"
264 | ]
265 | },
266 | "execution_count": 287,
267 | "metadata": {},
268 | "output_type": "execute_result"
269 | }
270 | ],
271 | "source": [
272 | "slope * 15 + intercept"
273 | ]
274 | },
275 | {
276 | "cell_type": "markdown",
277 | "id": "immediate-player",
278 | "metadata": {},
279 | "source": [
280 | "It would be around 130,000€."
281 | ]
282 | },
283 | {
284 | "cell_type": "markdown",
285 | "id": "leading-involvement",
286 | "metadata": {},
287 | "source": [
288 | "# Confidence Interval of the Slope"
289 | ]
290 | },
291 | {
292 | "cell_type": "markdown",
293 | "id": "dried-sweet",
294 | "metadata": {},
295 | "source": [
296 | "We can treat the sample slope $b$ as a normally distributed random variable with a mean of $\\beta$ and a variance equal to $\\sigma^2$ divided by the sum of squares of $X$, $SS_{xx} = \\sum (x_i - \\bar x)^2$.\n",
297 | "\n",
298 | "$$\n",
299 | "b \\sim \\mathcal{N}\\left(\\beta, \\frac{\\sigma^2}{SS_{xx}}\\right)$$\n",
300 | "\n",
301 | "As we do not know the population variance $\\sigma^2$, we estimate it from the residuals of our sample to calculate the Standard Error (SE):\n",
302 | "\n",
303 | "$$SE_{b} = \\frac{ \\sqrt{\\frac{\\sum{(y_i - \\hat y_i)^2}}{n-2}}}{\\sqrt{SS_{xx}}}$$\n",
304 | "\n",
305 | "We can describe the SE as the estimated standard deviation of the sampling distribution of the slope of the regression line."
306 | ]
307 | },
308 | {
309 | "cell_type": "code",
310 | "execution_count": 288,
311 | "id": "capable-beach",
312 | "metadata": {},
313 | "outputs": [
314 | {
315 | "data": {
316 | "text/plain": [
317 | "79.20000000000002"
318 | ]
319 | },
320 | "execution_count": 288,
321 | "metadata": {},
322 | "output_type": "execute_result"
323 | }
324 | ],
325 | "source": [
326 | "SS_xx = np.sum((np.array(yearsCoding[0]) - np.mean(np.array(yearsCoding[0])))**2)\n",
327 | "SS_xx"
328 | ]
329 | },
330 | {
331 | "cell_type": "code",
332 | "execution_count": 289,
333 | "id": "under-donor",
334 | "metadata": {},
335 | "outputs": [
336 | {
337 | "data": {
338 | "text/plain": [
339 | "0.08687984291368046"
340 | ]
341 | },
342 | "execution_count": 289,
343 | "metadata": {},
344 | "output_type": "execute_result"
345 | }
346 | ],
347 | "source": [
348 | "SE_b = (np.sqrt(np.sum((np.array(salaries[0])-(np.array(yearsCoding[0])*slope+intercept))**2)/(n-2))) / np.sqrt(SS_xx)\n",
349 | "SE_b"
350 | ]
351 | },
352 | {
353 | "cell_type": "markdown",
354 | "id": "british-marketing",
355 | "metadata": {},
356 | "source": [
357 | "We can finally calculate the 95% confidence interval for the slope of our linear regression."
358 | ]
359 | },
360 | {
361 | "cell_type": "code",
362 | "execution_count": 290,
363 | "id": "equal-vienna",
364 | "metadata": {},
365 | "outputs": [
366 | {
367 | "name": "stdout",
368 | "output_type": "stream",
369 | "text": [
370 | "95% Confidence interval=[0.53,0.89]\n"
371 | ]
372 | }
373 | ],
374 | "source": [
375 | "print('95% Confidence interval=[' + str(np.round(slope - t.ppf(0.975, df=n-2)*SE_b,2)) + ',' + str(np.round(slope + t.ppf(0.975, df=n-2)*SE_b,2)) + ']')"
376 | ]
377 | },
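{
"cell_type": "markdown",
"id": "added-ci-worked-example",
"metadata": {},
"source": [
"To make the computation above easier to follow, here it is by hand with the rounded values printed earlier. With $n = 20$ observations there are $n - 2 = 18$ degrees of freedom, for which $t^* \\approx 2.101$:\n",
"\n",
"$$b \\pm t^* SE_b \\approx 0.710 \\pm 2.101 \\times 0.0869 \\approx 0.710 \\pm 0.183 = [0.53, 0.89]$$"
]
},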
378 | {
379 | "cell_type": "markdown",
380 | "id": "editorial-characteristic",
381 | "metadata": {},
382 | "source": [
383 | "Recall that a 95% confidence level means that, if we repeated the sampling many times, about 95% of the intervals built this way would contain the true slope.\n",
384 | "\n",
385 | "We can use this interval to test the following hypothesis at the $\\alpha = 0.05$ level of significance:\n",
386 | "\n",
387 | "$$\\begin{align}\n",
388 | "H_0: \\beta = 0 \\\\\n",
389 | "H_1: \\beta \\neq 0\n",
390 | "\\end{align}$$\n",
391 | "\n",
392 | "What can we conclude then? Our 95% interval, [0.53, 0.89], does not contain 0, the value of $\\beta$ under $H_0$. If $H_0$ were true, an interval that excludes 0 would occur less than 5% of the time. Thus, we reject $H_0$ in favor of $H_1$: there is some (non-zero) linear relationship between the number of years coding and the salary of a Data Scientist."
393 | ]
394 | },
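{
"cell_type": "markdown",
"id": "added-slope-t-test-note",
"metadata": {},
"source": [
"The same conclusion can be reached with an explicit test statistic. The cell below is a minimal sketch that hard-codes the rounded slope and standard error printed above (so it stands alone); the names `b_hat`, `se_b_hat` and `n_obs` are introduced here for illustration. The statistic comes out at roughly 8.2 standard errors away from zero, which is why the p-value is far below 0.05."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-slope-t-test",
"metadata": {},
"outputs": [],
"source": [
"from scipy.stats import t as student_t\n",
"\n",
"# Rounded values printed earlier in this notebook (hard-coded so the cell stands alone)\n",
"b_hat = 0.7097     # estimated slope\n",
"se_b_hat = 0.0869  # standard error of the slope\n",
"n_obs = 20         # sample size\n",
"\n",
"# Test statistic for H_0: beta = 0 and its two-sided p-value with n - 2 degrees of freedom\n",
"t_stat = (b_hat - 0) / se_b_hat\n",
"p_value = 2 * student_t.sf(abs(t_stat), df=n_obs - 2)\n",
"print('t statistic =', round(t_stat, 2))\n",
"print('p-value =', p_value)"
]
},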
395 | {
396 | "cell_type": "markdown",
397 | "id": "secondary-anchor",
398 | "metadata": {},
399 | "source": [
400 | "# Nonlinear Regression"
401 | ]
402 | },
403 | {
404 | "cell_type": "markdown",
405 | "id": "governing-seattle",
406 | "metadata": {},
407 | "source": [
408 | "Now, suppose that the relationship between the salary of a Data Scientist and the number of years of coding is not linear. In this case we see that it is closer to an exponential relationship. What happens if we try to fit a line to these data?"
409 | ]
410 | },
411 | {
412 | "cell_type": "code",
413 | "execution_count": 291,
414 | "id": "referenced-conservative",
415 | "metadata": {},
416 | "outputs": [],
417 | "source": [
418 | "salaries = [[3.5, 7.5, 2.5, 7, 9, 40, 3.4, 4.3, 5.3, 5.3, 4, 12.2, 8.5, 10, 18.1, 4.3, 5.2, 5, 5.7, 5.2]]\n",
419 | "yearsCoding = [[2, 5, 1 , 4, 5, 10, 2 , 2 , 4, 3 , 4, 6 , 5 , 6 , 7 , 4 , 3 , 4, 4 , 3 ]]\n",
420 | "n = 20"
421 | ]
422 | },
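{
"cell_type": "markdown",
"id": "added-refit-note",
"metadata": {},
"source": [
"The slope and intercept printed below differ from the earlier fit, which suggests that the model was refit on the new data in a cell that is not part of the saved notebook. The cell below is a minimal sketch of that step, assuming `fit_lr` was called with its default arguments. For reference it also prints the exact least-squares line, whose slope is about 3.60 and intercept about -6.81 for these data, so 1000 gradient-descent steps stop somewhat short of it. The names `slope_exact` and `intercept_exact` are introduced here for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-refit",
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical reconstruction of the missing refit step (default arguments assumed)\n",
"slope, intercept = fit_lr(np.array(yearsCoding[0]), np.array(salaries[0]))\n",
"\n",
"# Exact least-squares line for reference; np.polyfit returns [slope, intercept] for degree 1\n",
"slope_exact, intercept_exact = np.polyfit(yearsCoding[0], salaries[0], 1)\n",
"print('Least squares: slope =', round(slope_exact, 2), 'intercept =', round(intercept_exact, 2))"
]
},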
423 | {
424 | "cell_type": "code",
425 | "execution_count": 292,
426 | "id": "reliable-memorabilia",
427 | "metadata": {},
428 | "outputs": [
429 | {
430 | "data": {
431 | "image/png": "\n",
432 | "text/plain": [
433 | "[figure: scatter plot of Salary vs. Years Coding for the nonlinear data, with the fitted regression line in red]"
469 | ]
470 | },
471 | "metadata": {
472 | "needs_background": "light"
473 | },
474 | "output_type": "display_data"
475 | }
476 | ],
477 | "source": [
478 | "plt.scatter(yearsCoding[0], salaries[0])\n",
479 | "plt.xlabel('Years Coding')\n",
480 | "plt.ylabel('Salary')\n",
481 | "plt.plot(yearsCoding[0],slope*np.array(yearsCoding[0]) + intercept, color='r');"
482 | ]
483 | },
484 | {
485 | "cell_type": "code",
486 | "execution_count": 295,
487 | "id": "raised-pilot",
488 | "metadata": {},
489 | "outputs": [
490 | {
491 | "name": "stdout",
492 | "output_type": "stream",
493 | "text": [
494 | "Slope=3.35501894004377\n",
495 | "Intercept=-5.5729005963397205\n"
496 | ]
497 | }
498 | ],
499 | "source": [
500 | "print('Slope=' + str(slope))\n",
501 | "print('Intercept=' + str(intercept))"
502 | ]
503 | },
504 | {
505 | "cell_type": "markdown",
506 | "id": "alleged-major",
507 | "metadata": {},
508 | "source": [
509 | "We see that the line that we fit describes the data poorly. We even get a negative salary for someone that just started coding, and we would heavily underestimate the salary of someone who has been coding for 10 years. We have two options here. The first is to fit an exponential curve to the data. The problem with this approach is that we lose the tools that we have been developing to fit and analyze a linear relationship between two variables. The second is to transform our data before fitting the regression line; here we take the natural logarithm of the salary."
510 | ]
511 | },
512 | {
513 | "cell_type": "code",
514 | "execution_count": 296,
515 | "id": "plastic-priority",
516 | "metadata": {},
517 | "outputs": [
518 | {
519 | "data": {
520 | "image/png": "\n",
521 | "text/plain": [
522 | "[figure: scatter plot of ln(Salary) vs. Years Coding with the fitted regression line in red]"
558 | ]
559 | },
560 | "metadata": {
561 | "needs_background": "light"
562 | },
563 | "output_type": "display_data"
564 | }
565 | ],
566 | "source": [
567 | "plt.scatter(yearsCoding[0], np.log(salaries[0]))\n",
568 | "plt.xlabel('Years Coding')\n",
569 | "plt.ylabel('ln(Salary)')\n",
570 | "plt.plot(yearsCoding[0],slope*np.array(yearsCoding[0]) + intercept, color='r');"
571 | ]
572 | },
573 | {
574 | "cell_type": "code",
575 | "execution_count": 299,
576 | "id": "occupied-large",
577 | "metadata": {},
578 | "outputs": [
579 | {
580 | "name": "stdout",
581 | "output_type": "stream",
582 | "text": [
583 | "Slope=0.3173211894646696\n",
584 | "Intercept=0.5195321580809141\n"
585 | ]
586 | }
587 | ],
588 | "source": [
589 | "print('Slope=' + str(slope))\n",
590 | "print('Intercept=' + str(intercept))"
591 | ]
592 | },
593 | {
594 | "cell_type": "markdown",
595 | "id": "stuffed-italic",
596 | "metadata": {},
597 | "source": [
598 | "With the transformation we see that the line of best fit actually explains the relationship between the variable years of coding and the transformed variable salary. But notice that the relationship between the original variables is not linear, but seems to be exponential."
599 | ]
600 | },
601 | {
602 | "cell_type": "markdown",
603 | "id": "external-disposition",
604 | "metadata": {},
605 | "source": [
606 | "What would be the predicted salary of a Data Scientist who has been coding for 15 years in this scenario?"
607 | ]
608 | },
609 | {
610 | "cell_type": "code",
611 | "execution_count": 300,
612 | "id": "comparative-navigation",
613 | "metadata": {},
614 | "outputs": [
615 | {
616 | "data": {
617 | "text/plain": [
618 | "196.24227641696947"
619 | ]
620 | },
621 | "execution_count": 300,
622 | "metadata": {},
623 | "output_type": "execute_result"
624 | }
625 | ],
626 | "source": [
627 | "np.exp(slope * 15+intercept)"
628 | ]
629 | },
630 | {
631 | "cell_type": "markdown",
632 | "id": "specific-messenger",
633 | "metadata": {},
634 | "source": [
635 | "It would be almost 2M€! Compare it with the result from the linear dataset that we used before."
636 | ]
637 | },
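{
"cell_type": "markdown",
"id": "added-back-transform-note",
"metadata": {},
"source": [
"To see where this number comes from, note that the line was fitted to $\\ln(y)$, so the prediction has to be transformed back with the exponential:\n",
"\n",
"$$\\ln \\hat y = b \\cdot x + a \\quad \\Longrightarrow \\quad \\hat y = e^{a} \\cdot e^{b \\cdot x}$$\n",
"\n",
"With $b \\approx 0.317$, each additional year of coding multiplies the predicted salary by $e^{0.317} \\approx 1.37$, that is, by roughly 37%. For $x = 15$ we get $\\hat y = e^{0.3173 \\cdot 15 + 0.5195} = e^{5.279} \\approx 196.2$. Keep in mind that 15 years lies outside the range of the data (the largest value of years coding is 10), so this prediction is an extrapolation."
]
},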
638 | {
639 | "cell_type": "markdown",
640 | "id": "civil-institute",
641 | "metadata": {},
642 | "source": [
643 | "# Conclusion"
644 | ]
645 | },
646 | {
647 | "cell_type": "markdown",
648 | "id": "appropriate-stranger",
649 | "metadata": {},
650 | "source": [
651 | "This article covered how to solve a linear regression problem and then \n",
652 | "use our line of best fit to make inferences about our parameters. In this case we focused on the slope parameter, using confidence intervals and hypothesis testing to evaluate the type of association between two variables of interest. Finally, we introduced the concept of transforming variables to deal with nonlinear patterns in the data."
653 | ]
654 | },
655 | {
656 | "cell_type": "markdown",
657 | "id": "optical-certification",
658 | "metadata": {},
659 | "source": [
660 | "# Exercises\n",
661 | "\n",
662 | "You will get the solutions in next week's article."
663 | ]
664 | },
665 | {
666 | "cell_type": "markdown",
667 | "id": "tamil-vegetarian",
668 | "metadata": {},
669 | "source": [
670 | "1. Márcia collected data on the battery life and price of a random sample of Portable Computers. Based on the data presented below, what is the test statistic for the null hypothesis that the population slope is 0? "
671 | ]
672 | },
673 | {
674 | "cell_type": "markdown",
675 | "id": "periodic-departure",
676 | "metadata": {},
677 | "source": [
678 | "2. Rui obtained a random sample of colleagues at work and noticed a positive linear relationship between their ages and the number of kilometers they said they walked yesterday. A 95% confidence interval for the slope of the regression line was (15.4, 155.2). Rui wants to use this interval to test $H_0: \\beta=0$ vs. $H_1: \\beta \\neq 0$ at the 5% level of significance. Assume that all conditions for inference have been met. What should Rui conclude?"
679 | ]
680 | },
681 | {
682 | "cell_type": "markdown",
683 | "id": "floating-malawi",
684 | "metadata": {},
685 | "source": [
686 | "### Answers from last week"
687 | ]
688 | },
689 | {
690 | "cell_type": "markdown",
691 | "id": "mechanical-shopping",
692 | "metadata": {},
693 | "source": [
694 | "1. According to a distributor of surf boards, 66% of the boards are common, 25% are uncommon, and 9% are rare. José wondered if the rarity levels of the boards he and his friends owned followed this distribution, so he took a random sample of 500 boards and recorded their rarity levels. The results are presented in the table below. Carry out a goodness-of-fit test to determine if the distribution of rarity levels of surf boards José and his friends own disagrees with the claimed percentages."
695 | ]
696 | },
697 | {
698 | "cell_type": "code",
699 | "execution_count": 301,
700 | "id": "sticky-religion",
701 | "metadata": {},
702 | "outputs": [
703 | {
704 | "data": {
746 | "text/plain": [
747 | "              Common  Uncommon  Rare\n",
748 | "Rarity level                        \n",
749 | "Cards            345       125    30"
750 | ]
751 | },
752 | "execution_count": 301,
753 | "metadata": {},
754 | "output_type": "execute_result"
755 | }
756 | ],
757 | "source": [
758 | "table = [['Cards', 345, 125, 30]]\n",
759 | "alpha = 0.05\n",
760 | "df = pd.DataFrame(table)\n",
761 | "df.columns = ['Rarity level', 'Common', 'Uncommon', 'Rare']\n",
762 | "df = df.set_index('Rarity level')\n",
763 | "df"
764 | ]
765 | },
766 | {
767 | "cell_type": "code",
768 | "execution_count": 302,
769 | "id": "manufactured-paper",
770 | "metadata": {},
771 | "outputs": [],
772 | "source": [
773 | "arr = df.to_numpy()\n",
774 | "\n",
775 | "arr = np.concatenate((arr, (np.sum(arr)*np.asarray([0.66, 0.25, 0.09])).reshape(1,-1)))"
776 | ]
777 | },
778 | {
779 | "cell_type": "code",
780 | "execution_count": 303,
781 | "id": "controversial-message",
782 | "metadata": {},
783 | "outputs": [
784 | {
785 | "data": {
786 | "text/plain": [
787 | "5.681818181818182"
788 | ]
789 | },
790 | "execution_count": 303,
791 | "metadata": {},
792 | "output_type": "execute_result"
793 | }
794 | ],
795 | "source": [
796 | "chi_sq_statistic = np.sum((arr[0]-arr[1])**2/arr[1])\n",
797 | "chi_sq_statistic"
798 | ]
799 | },
800 | {
801 | "cell_type": "code",
802 | "execution_count": 304,
803 | "id": "confirmed-greeting",
804 | "metadata": {},
805 | "outputs": [
806 | {
807 | "name": "stdout",
808 | "output_type": "stream",
809 | "text": [
810 | "P-value = 0.0584\n"
811 | ]
812 | }
813 | ],
814 | "source": [
815 | "print('P-value = ' + str(np.round(1-chi2.cdf(chi_sq_statistic, df =2), 4)))"
816 | ]
817 | },
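{
"cell_type": "markdown",
"id": "added-chisquare-check-note",
"metadata": {},
"source": [
"As a cross-check of the manual computation above, the same goodness-of-fit test can be run with `scipy.stats.chisquare`. This is a sketch that hard-codes the counts from the table; the names `observed_counts`, `expected_counts`, `chi_sq_check` and `p_value_check` are introduced here for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-chisquare-check",
"metadata": {},
"outputs": [],
"source": [
"from scipy.stats import chisquare\n",
"\n",
"observed_counts = [345, 125, 30]  # sample counts from the table above\n",
"expected_counts = [500 * share for share in (0.66, 0.25, 0.09)]  # counts implied by the claimed percentages\n",
"\n",
"# By default chisquare uses len(observed_counts) - 1 = 2 degrees of freedom\n",
"chi_sq_check, p_value_check = chisquare(f_obs=observed_counts, f_exp=expected_counts)\n",
"print('Chi-square statistic =', round(chi_sq_check, 4))\n",
"print('P-value =', round(p_value_check, 4))"
]
},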
818 | {
819 | "cell_type": "code",
820 | "execution_count": 305,
821 | "id": "annual-rochester",
822 | "metadata": {},
823 | "outputs": [
824 | {
825 | "name": "stdout",
826 | "output_type": "stream",
827 | "text": [
828 | "Fail to reject H_0\n"
829 | ]
830 | }
831 | ],
832 | "source": [
833 | "if 1-chi2.cdf(chi_sq_statistic, df =2) < alpha:\n",
834 | "    print('Reject H_0')\n",
835 | "else:\n",
836 | "    print('Fail to reject H_0')"
837 | ]
838 | }
839 | ],
840 | "metadata": {
841 | "kernelspec": {
842 | "display_name": "Python 3",
843 | "language": "python",
844 | "name": "python3"
845 | },
846 | "language_info": {
847 | "codemirror_mode": {
848 | "name": "ipython",
849 | "version": 3
850 | },
851 | "file_extension": ".py",
852 | "mimetype": "text/x-python",
853 | "name": "python",
854 | "nbconvert_exporter": "python",
855 | "pygments_lexer": "ipython3",
856 | "version": "3.8.5"
857 | }
858 | },
859 | "nbformat": 4,
860 | "nbformat_minor": 5
861 | }
862 |
--------------------------------------------------------------------------------
/Two-sample Inference for the Difference Between Groups with Python.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "smooth-harvey",
6 | "metadata": {},
7 | "source": [
8 | "# Introduction\n",
9 | "In a series of weekly articles, I will be covering some important topics of statistics with a twist.\n",
10 | "\n",
11 | "The goal is to use Python to help us get intuition on complex concepts, empirically test theoretical proofs, or build algorithms from scratch. In this series, you will find articles covering topics such as random variables, sampling distributions, confidence intervals, significance tests, and more.\n",
12 | "\n",
13 | "At the end of each article, you can find exercises to test your knowledge. The solutions will be shared in the article of the following week.\n",
14 | "\n",
15 | "Articles published so far:\n",
16 | "- [Bernoulli and Binomial Random Variables with Python](https://towardsdatascience.com/bernoulli-and-binomial-random-variables-d0698288dd36)\n",
17 | "- [From Binomial to Geometric and Poisson Random Variables with Python](https://towardsdatascience.com/geometric-and-poisson-random-variables-with-python-e5dcb63d6b55)\n",
18 | "- [Sampling Distribution of a Sample Proportion with Python](https://towardsdatascience.com/sampling-distributions-with-python-f5a5f268f636)\n",
19 | "- [Confidence Intervals with Python](https://towardsdatascience.com/confidence-intervals-with-python-bfa28ebb81c)\n",
20 | "- [Significance Tests with Python](https://towardsdatascience.com/significance-or-hypothesis-tests-with-python-7ed35e9ac9b6)\n",
21 | "- Two-sample Inference for the Difference Between Groups with Python\n",
22 | "\n",
23 | "As usual the code is available on my [GitHub](https://github.com/luisroque/College_Statistics_with_Python)."
24 | ]
25 | },
26 | {
27 | "cell_type": "markdown",
28 | "id": "impressive-monday",
29 | "metadata": {},
30 | "source": [
31 | "# Comparing Population Proportions"
32 | ]
33 | },
34 | {
35 | "cell_type": "markdown",
36 | "id": "robust-museum",
37 | "metadata": {},
38 | "source": [
39 | "The Portuguese local elections of 2021 will be held on September 26. Imagine that one of the candidates for the city of Porto was interested in comparing the proportions of men and women that will vote for him. During that weekend a new poll comes out with the following information: from a sample of 1000 men, 62% will vote for him, and from a sample of 1000 women, 57% will vote for him. These are in reality two Bernoulli distributions, which we can define by the following parameters (shown here for men; the same holds for women):\n",
40 | "\n",
41 | "$$\\begin{align}\n",
42 | "\\mu_{men} &= p_{men} \\\\\n",
43 | "\\sigma_{men}^2 &= p_{men} \\cdot(1-p_{men})\n",
44 | "\\end{align}$$"
45 | ]
46 | },
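{
"cell_type": "markdown",
"id": "added-bernoulli-params-note",
"metadata": {},
"source": [
"To make these parameters concrete, the cell below evaluates the mean and variance of each Bernoulli distribution for the poll proportions quoted above. It is a minimal sketch; the names `p_men_poll`, `p_women_poll`, `mu_hat` and `var_hat` are introduced here for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-bernoulli-params",
"metadata": {},
"outputs": [],
"source": [
"from scipy.stats import bernoulli\n",
"\n",
"# Poll proportions quoted in the text\n",
"p_men_poll, p_women_poll = 0.62, 0.57\n",
"\n",
"for label, p_hat in (('men', p_men_poll), ('women', p_women_poll)):\n",
"    # moments='mv' returns the mean p and the variance p * (1 - p)\n",
"    mu_hat, var_hat = bernoulli.stats(p_hat, moments='mv')\n",
"    print(label, ': mean =', round(float(mu_hat), 4), 'variance =', round(float(var_hat), 4))"
]
},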
47 | {
48 | "cell_type": "code",
49 | "execution_count": 1,
50 | "id": "nearby-discipline",
51 | "metadata": {},
52 | "outputs": [],
53 | "source": [
54 | "from scipy.stats import bernoulli, norm\n",
55 | "import numpy as np\n",
56 | "import seaborn as sns\n",
57 | "import math\n",
58 | "import matplotlib.pyplot as plt\n",
59 | "from graphviz import Digraph"
60 | ]
61 | },
62 | {
63 | "cell_type": "code",
64 | "execution_count": 10,
65 | "id": "super-holmes",
66 | "metadata": {},
67 | "outputs": [
68 | {
69 | "data": {
70 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAD4CAYAAAAD6PrjAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAARpElEQVR4nO3de4xcZ33G8e+DTQItN6fZRK7t1KZ1AaciQJcASYsoqRpDqzpIBEwpWCitqUgRvYiSUKkIVZaoVFX0FsACinsjuFwagyjUNbdWQIwD4eKENC4Be2s3NmkpFKQgm1//mMPLxF57x/GeGa/3+5FW55z3vOfM79Wu5tlzZuadVBWSJAE8ZNIFSJLOHoaCJKkxFCRJjaEgSWoMBUlSs3TSBZyJCy+8sFavXj3pMiRpQbntttu+XlVTs+1b0KGwevVq9uzZM+kyJGlBSfK1k+3z9pEkqTEUJEmNoSBJagwFSVJjKEiSGkNBktQYCpKkxlCQJDWGgiSpMRQkaR6sWHUJScb2s2LVJb2MY0FPcyFJZ4uDMwd44Vs+ObbHe9fLr+jlvF4pSJIaQ0GS1BgKkqTGUJAkNYaCJKkxFCRJjaEgSWoMBUlSYyhIkhpDQZLU9BoKSR6T5N1JvpzkziTPSHJBkp1J7u6Wy4b635hkX5K7klzdZ22SpBP1faXwp8CHqurxwGXAncANwK6qWgvs6rZJsg7YCFwKrAduSrKk5/okSUN6C4UkjwKeCbwNoKq+W1XfADYA27pu24BruvUNwM1VdX9V3QPsAy7vqz5J0on6vFJ4LHAE+Kskn0vy1iQ/DFxcVYcAuuVFXf8VwIGh42e6NknSmPQZCkuBpwBvqqonA9+mu1V0EpmlrU7olGxOsifJniNHjsxPpZIkoN9QmAFmqurWbvvdDELi3iTLAbrl4aH+q4aOXwkcPP6kVbW1qqaranpqaqq34iVpMeotFKrqv4ADSR7XNV0F3AHsADZ1bZuAW7r1HcDGJOcnWQOsBXb3VZ8k6UR9f/PaK4G/S3Ie8BXgZQyCaHuS64D9wLUAVbU3yXYGwXEUuL6qjvVcnyRpSK+hUFW3A9Oz7LrqJP23AFv6rEmSdHJ+olmS1BgKkqTGUJAkNYaCJKkxFCRJjaEgSWoMBUlSYyhIkhpDQZLUGAqSpMZQkCQ1hoIkqTEUJEmNoSBJagwFSVJjKEiSGkNBktQYCpKkxlCQJDWGgiSpMRQkSY2hIElqDAVJUtNrKCT5apIvJrk9yZ6u7YIkO5Pc3S2XDfW/Mcm+JHclubrP2iRJJxrHlcLPVdWTqmq6274B2FVVa4Fd3TZJ1gEbgUuB9cBNSZaMoT5JUmcSt482ANu69W3ANUPtN1fV/VV1D7APuHz85UnS4tV3KBTwz0luS7K5a7u4qg4BdMuLuvYVwIGhY2e6NknSmCzt+fxXVtXBJBcBO5N8+RR9M0tbndBpEC6bAS655JL5qVKSBPR8pVBVB7vlYeB9DG4H3ZtkOUC3PNx1nwFWDR2+Ejg4yzm3VtV0VU1PTU31Wb4kLTq9hUKSH07yyO+vA78AfAnYAWzqum0CbunWdwAbk5yfZA2wFtjdV32SpBP1efvoYuB9Sb7/OH9fVR9K8hlge5LrgP3AtQBVtTfJduAO4ChwfVUd67E+SdJxeguFqvoKcNks7fcBV53kmC3Alr5qkiSdmp9oliQ1hoIkqTEUJEmNoSBJagwFSVJjKEiSGkNBktQYCpKkxlCQJDWGgiSpMRQkSY2hIElqDAVJUmMoSJIaQ0GS1BgKkqTGUJAkNYaCJKkxFCRJjaEgSWoMBUlSYyhIkhpDQZLU9B4KSZYk+VySD3TbFyTZmeTubrlsqO+NSfYluSvJ1X3XJkl6oHFcKbwKuHNo+wZgV1WtBXZ12yRZB2wELgXWAzclWTKG+iRJnV5DIclK4BeBtw41bwC2devbgGuG2m+uqvur6h5gH3B5n/VJkh6o7yuFNwK/B
3xvqO3iqjoE0C0v6tpXAAeG+s10bQ+QZHOSPUn2HDlypJeiJWmx6i0UkvwScLiqbhv1kFna6oSGqq1VNV1V01NTU2dUoyTpgZb2eO4rgV9O8lzgYcCjkvwtcG+S5VV1KMly4HDXfwZYNXT8SuBgj/VJko7T25VCVd1YVSurajWDF5A/UlW/CuwANnXdNgG3dOs7gI1Jzk+yBlgL7O6rPknSifq8UjiZNwDbk1wH7AeuBaiqvUm2A3cAR4Hrq+rYBOqTpEVrLKFQVR8DPtat3wdcdZJ+W4At46hJknQiP9EsSWpGCoUkV47SJkla2Ea9UvjzEdskSQvYKV9TSPIM4ApgKsnvDO16FOAUFJJ0jpnrhebzgEd0/R451P5N4Pl9FSVJmoxThkJVfRz4eJJ3VNXXxlSTJGlCRn1L6vlJtgKrh4+pqmf3UZQkaTJGDYV/AN7MYLZTP1AmSeeoUUPhaFW9qddKJEkTN+pbUt+f5BVJlnffnHZBkgt6rUySNHajXil8fwK7Vw+1FfDY+S1nvFasuoSDMwfm7jgPfnTlKv7zwP6xPJYkPVgjhUJVrem7kEk4OHOAF77lk2N5rHe9/IqxPI4knYmRQiHJS2drr6q/nt9yJEmTNOrto6cOrT+MwSynnwUMBUk6h4x6++iVw9tJHg38TS8VSZIm5sFOnf0dBt+MJkk6h4z6msL7GbzbCAYT4T0B2N5XUZKkyRj1NYU/Hlo/CnytqmZ6qEeSNEEj3T7qJsb7MoOZUpcB3+2zKEnSZIz6zWsvAHYD1wIvAG5N4tTZknSOGfX20e8DT62qwwBJpoB/Ad7dV2GSpPEb9d1HD/l+IHTuO41jJUkLxKhXCh9K8mHgnd32C4EP9lOSJGlSTvnffpKfSHJlVb0aeAvwROAy4FPA1jmOfViS3Uk+n2Rvktd37Rck2Znk7m65bOiYG5PsS3JXkqvPeHSSpNMy1y2gNwLfAqiq91bV71TVbzO4SnjjHMfeDzy7qi4DngSsT/J04AZgV1WtBXZ12yRZB2wELgXWAzclWfIgxiRJepDmCoXVVfWF4xurag+Dr+Y8qRr4v27zod1PARuAbV37NuCabn0DcHNV3V9V9wD7gMtHGIMkaZ7MFQoPO8W+h8918iRLktwOHAZ2VtWtwMVVdQigW17UdV8BDH+5wUzXdvw5NyfZk2TPkSNH5ipBknQa5gqFzyT59eMbk1wH3DbXyavqWFU9CVgJXJ7kp07RPbOdYpZzbq2q6aqanpqamqsESdJpmOvdR78FvC/Ji/lBCEwD5wHPG/VBquobST7G4LWCe5Msr6pDSZYzuIqAwZXBqqHDVgIHR30MSdKZO+WVQlXdW1VXAK8Hvtr9vL6qnlFV/3WqY5NMJXlMt/5w4OcZTJWxgx98vecm4JZufQewMcn5SdYwmIV194MYkyTpQRr1+xQ+Cnz0NM+9HNjWvYPoIcD2qvpAkk8B27tbUPsZTJ1BVe1Nsh24g8Gke9dX1bHTfExJ0hkY9cNrp61719KTZ2m/j8E3t812zBZgS181SZJOzakqJEmNoSBJagwFSVJjKEiSGkNBktQYCpKkxlCQJDWGgiSpMRQkSY2hIElqDAVJUmMoSJIaQ0GS1BgKkqTGUJAkNYaCJKkxFCRJjaEgSWoMBUlSYyhIkhpDQZLUGAqSpMZQkCQ1vYVCklVJPprkziR7k7yqa78gyc4kd3fLZUPH3JhkX5K7klzdV22SpNn1eaVwFPjdqnoC8HTg+iTrgBuAXVW1FtjVbdPt2whcCqwHbkqypMf6JEnH6S0UqupQVX22W/8WcCewAtgAbOu6bQOu6dY3ADdX1f1VdQ+wD7i8r/okSScay2sKSVYDTwZuBS6uqkMwCA7goq7bCuDA0GEzXdvx59qcZE+SPUeOHOm1bklabHoPhSSPAN4D/FZVffNUXWdpqxMaqrZW1XRVTU9NTc1XmZIkeg6FJA9lEAh/V1Xv7ZrvTbK8278cONy1zwCrhg5fC
Rzssz5J0gP1+e6jAG8D7qyqPxnatQPY1K1vAm4Zat+Y5Pwka4C1wO6+6pMknWhpj+e+EngJ8MUkt3dtrwXeAGxPch2wH7gWoKr2JtkO3MHgnUvXV9WxHuuTJB2nt1Coqn9j9tcJAK46yTFbgC191SRJOjU/0SxJagwFSVJjKEiSGkNBktQYCpKkxlCQJDWGgiSpMRQkSY2hIElqDAVJUmMoSJIaQ0GS1BgKkqTGUJAkNYaCJKkxFCRJjaEgSWoMBUlSYyhIkhpDQZLUGAqSpMZQkCQ1hoIkqektFJK8PcnhJF8aarsgyc4kd3fLZUP7bkyyL8ldSa7uqy5J0sn1eaXwDmD9cW03ALuqai2wq9smyTpgI3Bpd8xNSZb0WJskaRa9hUJVfQL47+OaNwDbuvVtwDVD7TdX1f1VdQ+wD7i8r9okSbMb92sKF1fVIYBueVHXvgI4MNRvpms7QZLNSfYk2XPkyJFei5WkxeZseaE5s7TVbB2ramtVTVfV9NTUVM9lSdLiMu5QuDfJcoBuebhrnwFWDfVbCRwcc22StOiNOxR2AJu69U3ALUPtG5Ocn2QNsBbYPebaJGnRW9rXiZO8E3gWcGGSGeB1wBuA7UmuA/YD1wJU1d4k24E7gKPA9VV1rK/aJEmz6y0UqupFJ9l11Un6bwG29FWPJGluZ8sLzZKks4ChIElqDAVJUmMoSJIaQ0GS1BgKkqTGUJAkNYaCJKkxFCRJjaEgSWoMBUlSYyhIkhpDQZLUGAqSpMZQkCQ1hoIkqTEUJEmNoSBJagwFSVJjKEiSGkNBktQYCpKkxlCQJDVnXSgkWZ/kriT7ktww6XokaTE5q0IhyRLgL4HnAOuAFyVZN9mqJGnxOKtCAbgc2FdVX6mq7wI3AxsmXJMkLRqpqknX0CR5PrC+qn6t234J8LSq+s2hPpuBzd3m44C7zuAhLwS+fgbHLzSLbbzgmBcLx3x6fqyqpmbbsfTB19OLzNL2gNSqqq3A1nl5sGRPVU3Px7kWgsU2XnDMi4Vjnj9n2+2jGWDV0PZK4OCEapGkRedsC4XPAGuTrElyHrAR2DHhmiRp0Tirbh9V1dEkvwl8GFgCvL2q9vb4kPNyG2oBWWzjBce8WDjmeXJWvdAsSZqss+32kSRpggwFSVJzzofCXNNmZODPuv1fSPKUSdQ5n0YY84u7sX4hySeTXDaJOufTqNOjJHlqkmPdZ2IWtFHGnORZSW5PsjfJx8dd43wb4W/70Unen+Tz3ZhfNok650uStyc5nORLJ9k//89fVXXO/jB4sfo/gMcC5wGfB9Yd1+e5wD8x+IzE04FbJ133GMZ8BbCsW3/OYhjzUL+PAB8Enj/pusfwe34McAdwSbd90aTrHsOYXwv8Ubc+Bfw3cN6kaz+DMT8TeArwpZPsn/fnr3P9SmGUaTM2AH9dA58GHpNk+bgLnUdzjrmqPllV/9NtfprB50EWslGnR3kl8B7g8DiL68koY/4V4L1VtR+gqhb6uEcZcwGPTBLgEQxC4eh4y5w/VfUJBmM4mXl//jrXQ2EFcGBoe6ZrO90+C8npjuc6Bv9pLGRzjjnJCuB5wJvHWFefRvk9/ySwLMnHktyW5KVjq64fo4z5L4AnMPjQ6xeBV1XV98ZT3kTM+/PXWfU5hR7MOW3GiH0WkpHHk+TnGITCz/RaUf9GGfMbgddU1bHBP5EL3ihjXgr8NHAV8HDgU0k+XVX/3ndxPRllzFcDtwPPBn4c2JnkX6vqmz3XNinz/vx1rofCKNNmnGtTa4w0niRPBN4KPKeq7htTbX0ZZczTwM1dIFwIPDfJ0ar6x7FUOP9G/dv+elV9G/h2kk8AlwELNRRGGfPLgDfU4Ib7viT3AI8Hdo+nxLGb9+evc/320SjTZuwAXtq9iv904H+r6tC4C51Hc445ySXAe4GXLOD/GofNOeaqWlNVq6tqNfBu4BULOBBgtL/tW4CfTbI0yQ8BTwPuHHOd82mUMe9ncGVEkosZz
KT8lbFWOV7z/vx1Tl8p1EmmzUjyG93+NzN4J8pzgX3Adxj8p7FgjTjmPwB+BLip+8/5aC3gGSZHHPM5ZZQxV9WdST4EfAH4HvDWqpr1rY0LwYi/5z8E3pHkiwxurbymqhbslNpJ3gk8C7gwyQzwOuCh0N/zl9NcSJKac/32kSTpNBgKkqTGUJAkNYaCJKkxFCRJjaEgSWoMBUlS8/8yVAcGzrFlGgAAAABJRU5ErkJggg==\n",
71 | "text/plain": [
72 | ""
73 | ]
74 | },
75 | "metadata": {
76 | "needs_background": "light"
77 | },
78 | "output_type": "display_data"
79 | }
80 | ],
81 | "source": [
82 | "# Theoretical parameters for men\n",
83 | "p1 = 0.65\n",
84 | "n1 = 1000\n",
85 | "n = n1\n",
86 | "\n",
87 | "μ_1 = p1\n",
88 | "var_1 = p1*(1-p1)\n",
89 | "\n",
90 | "dist_men = np.concatenate((np.ones(round(p1*n)), np.zeros(n - round(p1*n))))\n",
91 | "sns.histplot(dist_men);"
92 | ]
93 | },
94 | {
95 | "cell_type": "code",
96 | "execution_count": 11,
97 | "id": "continued-offering",
98 | "metadata": {},
99 | "outputs": [
100 | {
101 | "data": {
102 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAD5CAYAAADItClGAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAARu0lEQVR4nO3deYxdZ33G8e9Tm4QdksaJjJfatIbioLJ0SGmgCEjVBFrVAREwpWBRq6YiZSkVjUOl8kdlKVWrCpU2gBUopqUkLktjKEuDISxiCQ6ExQkpLoF4ZDc2AcomBdn8+sc9ebm2x56bZM69Y8/3I43OOe95z7m/V7buM2edVBWSJAH8wqQLkCTNH4aCJKkxFCRJjaEgSWoMBUlSYyhIkprFfe48ycOBq4DHAgX8EXArcA2wCvgW8Pyq+l7X/3JgI3AYeGVVfeRE+z/rrLNq1apV/RQvSaeoG2+88TtVtWSmdenzOYUk24BPVdVVSU4DHgi8DvhuVV2RZDNwRlVdlmQt8C7gPOARwEeBR1XV4ePtf2pqqnbt2tVb/ZJ0KkpyY1VNzbSut9NHSR4KPA14K0BV/bSqvg+sA7Z13bYBF3fz64Crq+quqroN2MMgICRJY9LnNYVHAgeBf07ypSRXJXkQcE5V7Qfopmd3/ZcBe4e2n+7aJElj0mcoLAaeCLypqp4A/BjYfIL+maHtmHNbSTYl2ZVk18GDB+emUkkS0G8oTAPTVfX5bvndDELijiRLAbrpgaH+K4a2Xw7sO3qnVbW1qqaqamrJkhmvk0iS7qXeQqGq/hfYm+TRXdMFwM3ADmBD17YBuLab3wGsT3J6ktXAGuCGvuqTJB2r11tSgVcA7+zuPPom8FIGQbQ9yUbgduASgKranWQ7g+A4BFx6ojuPJElzr9dQqKqbgJlue7rgOP23AFv6rEmSdHw+0SxJagwFSVJjKEjSHFi2YiVJxvazbMXKXsbR94VmSVoQ9k3v5QVv+czYPu+al53fy349UpAkNYaCJKkxFCRJjaEgSWoMBUlSYyhIkhpDQZLUGAqSpMZQkCQ1hoIkqTEUJEmNoSBJagwFSVJjKEiSGkNBktQYCpKkxlCQJDWGgiSpMRQkSY2hIElqDAVJUmMoSJIaQ0GS1PQaCkm+leSrSW5KsqtrOzPJdUm+0U3PGOp/eZI9SW5NcmGftUmSjjWOI4VnVNXjq2qqW94M7KyqNcDObpkka4H1wLnARcCVSRaNoT5JUmcSp4/WAdu6+W3AxUPtV1fVXVV1G7AHOG/85UnSwtV3KBTwX0luTLKpazunqvYDdNOzu/ZlwN6hbae7tiMk2ZRkV5JdBw8e7LF0SVp4Fve8/6dU1b4kZwPXJfn6CfpmhrY6pqFqK7AVYGpq6pj1kqR7r9cjhara100PAO9jcDrojiRLAbrpga77NLBiaPPlwL4+65MkHam3UEjyoCQPuXse+B3ga8AOYEPXbQNwbTe/A1if5PQkq4E1wA191SdJOlafp4/OAd6X5O7P+beq+nCSLwDbk2wEbgcuAaiq3Um2AzcDh4BLq+pwj/VJko7SWyhU1TeBx83QfidwwXG22QJs6asmSdKJ+USzJKkxFCRJjaEgSWoMBUlSYyhIkhpDQZLUGAqSpMZQkCQ1hoIkqTEUJEmNoSBJagwFSVJjKEiSGkNBktQYCpKkxlCQJDWGgiSpMRQkSY2hIElqDAVJUmMoSJIaQ0GS1BgKkqRmQYfCshUrSTKWn2UrVk56uJI0q8WTLmCS9k3v5QVv+cxYPuual50/ls+RpPtiQR8pSJKOZChIkpreQyHJoiRfSvKBbvnMJNcl+UY3PWOo7+VJ9iS5NcmFfdcmSTrSOI4UXgXcMrS8GdhZVWuAnd0ySdYC64FzgYuAK5MsGkN9kqROr6GQZDnwu8BVQ83rgG3d/Dbg4qH2q6vqrqq6DdgDnNdnfZKkI/V9pPAG4C+Anw21nVNV+wG66dld+zJg
71C/6a7tCEk2JdmVZNfBgwd7KVqSFqreQiHJ7wEHqurGUTeZoa2OaajaWlVTVTW1ZMmS+1SjJOlIfT6n8BTg95M8G7g/8NAk/wrckWRpVe1PshQ40PWfBlYMbb8c2NdjfZKko/R2pFBVl1fV8qpaxeAC8seq6g+BHcCGrtsG4NpufgewPsnpSVYDa4Ab+qpPknSsSTzRfAWwPclG4HbgEoCq2p1kO3AzcAi4tKoOT6A+SVqwxhIKVXU9cH03fydwwXH6bQG2jKMmSdKxfKJZktQYCpKkxlCQJDWGgiSpMRQkSY2hIElqDAVJUmMoSJIaQ0GS1BgKkqTGUJAkNYaCJKkxFCRJjaEgSWoMBUlSM1IoJHnKKG2SpJPbqEcKbxyxTZJ0EjvhX15L8pvA+cCSJK8ZWvVQYFGfhUmSxm+2P8d5GvDgrt9Dhtp/ADyvr6IkSZNxwlCoqk8An0jy9qr69phqkiRNyGxHCnc7PclWYNXwNlX1zD6KkiRNxqih8O/Am4GrgMP9lSNJmqRRQ+FQVb2p10okSRM36i2p70/y8iRLk5x590+vlUmSxm7UI4UN3fS1Q20FPHJuy5EkTdJIoVBVq/suRJI0eSOFQpKXzNReVe84wTb3Bz4JnN59zrur6vXdaadrGNzJ9C3g+VX1vW6by4GNDC5mv7KqPjLySCRJ99mop4+eNDR/f+AC4IvAcUMBuAt4ZlX9KMn9gE8n+RDwXGBnVV2RZDOwGbgsyVpgPXAu8Ajgo0keVVXe7SRJYzLq6aNXDC8neRjwL7NsU8CPusX7dT8FrAOe3rVvA64HLuvar66qu4DbkuwBzgM+O0qNkqT77t6+OvsnwJrZOiVZlOQm4ABwXVV9HjinqvYDdNOzu+7LgL1Dm093bZKkMRn1msL7GfyWD4MX4T0G2D7bdt2pn8cneTjwviSPPdHHzLSLGWrZBGwCWLly5WwlSJLugVGvKfzd0Pwh4NtVNT3qh1TV95NcD1wE3JFkaVXtT7KUwVEEDI4MVgxtthzYN8O+tgJbAaampo4JDUnSvTfS6aPuxXhfZ/Cm1DOAn862TZIl3RECSR4A/Ha3jx38/LmHDcC13fwOYH2S05OsZnB66oaRRyJJus9GPX30fOBvGVwUDvDGJK+tqnefYLOlwLYkixiEz/aq+kCSzwLbk2wEbgcuAaiq3Um2AzczOBq51DuPJGm8Rj199JfAk6rqAAyOAoCPAscNhar6CvCEGdrvZHBL60zbbAG2jFiTJGmOjXr30S/cHQidO+/BtpKkk8SoRwofTvIR4F3d8guAD/ZTkiRpUmb7G82/wuC5gtcmeS7wVAbXFD4LvHMM9UmSxmi2U0BvAH4IUFXvrarXVNWfMThKeEO/pUmSxm22UFjVXTA+QlXtYvBCO0nSKWS2ULj/CdY9YC4LkSRN3myh8IUkf3x0Y/eMwY39lCRJmpTZ7j56NYN3Fr2In4fAFHAa8Jwe65IkTcAJQ6Gq7gDOT/IM4O6X2f1nVX2s98okSWM36t9T+Djw8Z5rkSRNmE8lS5IaQ0GS1BgKkqTGUJAkNYaCJKkxFCRJjaEgSWoMBUlSYyhIkhpDQZLUGAqSpMZQkCQ1hoIkqTEUJEmNoSBJagwFSVJjKEiSmt5CIcmKJB9PckuS3Ule1bWfmeS6JN/opmcMbXN5kj1Jbk1yYV+1SZJm1ueRwiHgz6vqMcCTgUuTrAU2Azurag2ws1umW7ceOBe4CLgyyaIe65MkHaW3UKiq/VX1xW7+h8AtwDJgHbCt67YNuLibXwdcXVV3VdVtwB7gvL7qkyQdayzXFJKsAp4AfB44p6r2wyA4gLO7bsuAvUObTXdtR+9rU5JdSXYdPHiw17olaaHpPRSSPBh4D/DqqvrBibrO0FbHNFRtraqpqppasmTJXJUpSaLnUEhyPwaB8M6qem/XfEeSpd36pcCBrn0aWDG0+XJgX5/1SZKO1OfdRwHeCtxSVX8/tGoHsKGb3wBcO9S+Psnp
SVYDa4Ab+qpPknSsxT3u+ynAi4GvJrmpa3sdcAWwPclG4HbgEoCq2p1kO3AzgzuXLq2qwz3WJ0k6Sm+hUFWfZubrBAAXHGebLcCWvmqSJJ2YTzRLkhpDQZLUGAqSpMZQkCQ1hoIkqTEUJEmNoSBJagwFSVJjKEiSGkNBktQYCpKkxlCQJDWGgiSpMRQkSY2hIElqDAVJUmMoSJIaQ0GS1BgKkqTGUJAkNYaCJKkxFCRJjaEgSWoMBUlSYyhIkhpDQZLUGAqSpKa3UEjytiQHknxtqO3MJNcl+UY3PWNo3eVJ9iS5NcmFfdUlSTq+Po8U3g5cdFTbZmBnVa0BdnbLJFkLrAfO7ba5MsmiHmuTJM2gt1Coqk8C3z2qeR2wrZvfBlw81H51Vd1VVbcBe4Dz+qpNkjSzcV9TOKeq9gN007O79mXA3qF+013bMZJsSrIrya6DBw/2WqwkLTTz5UJzZmirmTpW1daqmqqqqSVLlvRcliQtLOMOhTuSLAXopge69mlgxVC/5cC+MdcmSQveuENhB7Chm98AXDvUvj7J6UlWA2uAG8ZcmyQteIv72nGSdwFPB85KMg28HrgC2J5kI3A7cAlAVe1Osh24GTgEXFpVh/uqTZI0s95CoapeeJxVFxyn/xZgS1/1SJJmN18uNEuS5gFDQZLUGAqSpMZQkCQ1hoIkqTEUJEmNoSBJagwFSVJjKEiSGkNBktQYCpKkxlCQJDWGgiSpMRQkSY2hIElqDAVJUmMoSJIaQ0GS1BgKkqTGUJAkNYaCJKkxFCRJjaEgSWoMBUlSYyhIkhpDQZLUzLtQSHJRkluT7EmyedL1SNJCMq9CIcki4J+AZwFrgRcmWTvZqiRp4ZhXoQCcB+ypqm9W1U+Bq4F1E65JkhaM+RYKy4C9Q8vTXZskaQwWT7qAo2SGtjqiQ7IJ2NQt/ijJrffh88665mXnf+c+bH+PJDMNb6zOAsY23nnCMS8M82LM17zs/HF+3FlJ7u2Yf+l4K+ZbKEwDK4aWlwP7hjtU1VZg61x8WJJdVTU1F/s6GSy08YJjXigc89yZb6ePvgCsSbI6yWnAemDHhGuSpAVjXh0pVNWhJH8KfARYBLytqnZPuCxJWjDmVSgAVNUHgQ+O6ePm5DTUSWShjRcc80LhmOdIqmr2XpKkBWG+XVOQJE3QKR8Ks702IwP/0K3/SpInTqLOuTTCmF/UjfUrST6T5HGTqHMujfp6lCRPSnI4yfPGWV8fRhlzkqcnuSnJ7iSfGHeNc22E/9sPS/L+JF/uxvzSSdQ5V5K8LcmBJF87zvq5//6qqlP2h8HF6v8BHgmcBnwZWHtUn2cDH2LwjMSTgc9Puu4xjPl84Ixu/lkLYcxD/T7G4JrV8yZd9xj+nR8O3Ays7JbPnnTdYxjz64C/6eaXAN8FTpt07fdhzE8Dngh87Tjr5/z761Q/UhjltRnrgHfUwOeAhydZOu5C59CsY66qz1TV97rFzzF4HuRkNurrUV4BvAc4MM7iejLKmP8AeG9V3Q5QVSf7uEcZcwEPyeBJ0QczCIVD4y1z7lTVJxmM4Xjm/PvrVA+FUV6bcaq9WuOejmcjg980TmazjjnJMuA5wJvHWFefRvl3fhRwRpLrk9yY5CVjq64fo4z5H4HHMHjo9avAq6rqZ+MpbyLm/Ptr3t2SOsdmfW3GiH1OJiOPJ8kzGITCU3utqH+jjPkNwGVVdXgevG5kLowy5sXArwMXAA8APpvkc1X1330X15NRxnwhcBPwTOCXgeuSfKqqftBzbZMy599fp3oozPrajBH7nExGGk+SXwOuAp5VVXeOqba+jDLmKeDqLhDOAp6d5FBV/cdYKpx7o/7f/k5V/Rj4cZJPAo8DTtZQGGXMLwWuqMEJ9z1JbgN+FbhhPCWO3Zx/f53qp49GeW3GDuAl3VX8JwP/V1X7x13oHJp1zElWAu8FXnwS/9Y4bNYxV9XqqlpVVauAdwMvP4kDAUb7v30t8FtJFid5IPAb
wC1jrnMujTLm2xkcGZHkHODRwDfHWuV4zfn31yl9pFDHeW1Gkj/p1r+ZwZ0ozwb2AD9h8JvGSWvEMf8V8IvAld1vzofqJH6Z2IhjPqWMMuaquiXJh4GvAD8DrqqqGW9tPBmM+O/818Dbk3yVwamVy6pq4m9PvbeSvAt4OoM3ok4DrwfuB/19f/lEsySpOdVPH0mS7gFDQZLUGAqSpMZQkCQ1hoIkqTEUJEmNoSBJagwFSVLz/ylzDcqAhFQIAAAAAElFTkSuQmCC\n",
103 | "text/plain": [
104 | ""
105 | ]
106 | },
107 | "metadata": {
108 | "needs_background": "light"
109 | },
110 | "output_type": "display_data"
111 | }
112 | ],
113 | "source": [
114 | "# Theoretical parameters for women\n",
115 | "p2 = 0.58\n",
116 | "n2 = n\n",
117 | "\n",
118 | "μ_2 = p2\n",
119 | "var_2 = p2*(1-p2)\n",
120 | "\n",
121 | "dist_women = np.concatenate((np.ones(round(p2*n)), np.zeros(n - round(p2*n))))\n",
122 | "sns.histplot(dist_women);"
123 | ]
124 | },
125 | {
126 | "cell_type": "markdown",
127 | "id": "african-costs",
128 | "metadata": {},
129 | "source": [
130 | "Remember that we want to figure out whether there is a meaningful difference in the way men and women vote for the candidate. More precisely, we want to come up with a 95% confidence interval for that difference.\n",
131 | "\n",
132 | "Because the sample size is large enough, the sampling distribution of each sample proportion can be approximated by a normal distribution. In previous articles, we already saw some of the properties of these distributions:\n",
133 | "\n",
134 | "$$\\begin{align}\n",
135 | "\\mu_{\\bar p_1}&=p_1 \\\\\n",
136 | "\\sigma_{\\bar p_1}^2 &= \\frac{p_1(1-p_1)}{n_1}\n",
137 | "\\end{align}$$\n",
138 | "\n",
139 | "Notice that to create the poll, they took a sample of 1000 observations from the original distribution and then calculated the mean of that sample. This procedure is equivalent to drawing a single value from the sampling distribution of the sample proportion."
140 | ]
141 | },
142 | {
143 | "cell_type": "code",
144 | "execution_count": 12,
145 | "id": "sensitive-budget",
146 | "metadata": {},
147 | "outputs": [
148 | {
149 | "data": {
150 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXAAAAD4CAYAAAD1jb0+AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAobElEQVR4nO3dd3yV9d3/8dcnewcyCZmEvVfYIAhSEavUUVcVqQu91VZrbW17t7faX3t721brahUVGQpOrKMOEFAEQiAge0MGGYQMkkB2cr6/PxKVKiMk55zrjM/z8TiPnFw5yXnnIuf9uLjO9/p+xRiDUkop9+NjdQCllFIdowWulFJuSgtcKaXclBa4Ukq5KS1wpZRyU37OfLKYmBiTlpbmzKdUSim3t3nz5jJjTOx3tzu1wNPS0sjOznbmUyqllNsTkbzTbddTKEop5aa0wJVSyk1pgSullJvSAldKKTelBa6UUm5KC1wppdyUFrhSSrkpp44DV8pVNLXY2FFYxZa841TXNX2zPalrCKN6RJEWHYKIWJhQqXPTAldewxjD2oNlvLw2hw2Hy6lvsgHwdU+fOjV+TFggPxgYz9wL0kmNDrUgrVLnpgWuPJ4xhhW7S3h29UG2F1QRHxHIdaNSGN0jilFpUcSGB37zuEOlJ9mYc5wNh8t5e3MBr2/M59Ih3fnZ1F70jg+3+DdR6j+JM1fkycjIMHopvXKm4zWN/O5fO/hox1FSo0O4a3JPrhiRSKCf7zm/91h1PS+vy+HVzDwaW2w8eHFfbpuYjo+PnlpRziUim40xGd/brgWuPNXaA2U88NZWKmoa+cX0vtw+qQd+vuf/vn35yQZ+s2wHy3eXMC49mr9dM5TuXYIdkFip0ztTgesoFOWRFmXmctP8LMIC/Xj3vyZw15SeHSpvgOiwQF64aSSPXzWEbQWVXPbMWnYWVtk5sVLnTwtceRRjDE+vPMAf3tvFtH7xfHjvJAYlRnb654oI14xK5oN7JxLk78v18zaQdbjcDomV6jgtcOUxbDbDHz/cwxMr9nPliESev3EEwQHnPtd9PnrGhvH2XeOIiwhk9vyNrNxTYtefr9T50AJXHuMvy/cxf10OP52Qxl+vHtrhUybnkhAZzFt3jqdvt3DufHUz6w+VOeR5lDqXc/6Fi0iyiKwWkT0isktEft62/WERKRSRrW23mY6Pq9Tpvbohj39+fogbxqTwhx8OcPhIkajQABbfMobU6FDmLt7M/pITDn0+pU6nPYcozcADxpj+wFjgbhEZ0Pa1J40xw9puHzkspVJnsXJPCX94bydT+8Xx6OUDnXYFZWSIPwt+Ooogf1/mzN9ISXW9U55Xqa+ds8CNMcXGmC1t908Ae4BERwdTqj12FVVxz5KvGNg9kmeuH+6w0yZnktQ1hFfmjKKyrolbFmyivqnFqc+vvNt5/bWLSBowHMhq23SPiGwXkfki0vUM33OHiGSLSHZpaWnn0ip1ipMNzdyz5Csigv14eU4GoYHWXFg8KDGSZ28Yzq6iah75YLclGZR3aneBi0gY8A5wnzGmGvgn0BMYBhQDfzvd9xlj5hljMowxGbGx31tUWakOMcbw22U7yCuv4enrhhMXHmRpnqn94rlzck+Wbszn/W1FlmZR3qNdBS4i/rSW92vGmGUAxpgSY0yLMcYGvAiMdlxMpf7T65uO8P62In4xvQ9j0qOtjgPAAz/ow8jUrvzmne3klNVYHUd5gfaMQhHgZWCPMeaJU7YnnPKwK4Cd9o+n1PftO3qCh9/fxaTeMfzXlF5Wx/mGv68Pz1w/HH8/H+5+bQsNzXo+XDlWe47AJwA3AVO/M2TwcRHZISLbgQuB+x0ZVCmA5hYbv3xrG2GBfjxxzTCXm1iqe5dg/nL1UHYXV/Pc6kNWx1Ee7pzv+hhj1gKne5XosEHldC+tzWFHYRXP3TDim2lgXc30AfFcMTyRf6w+yCWDutE/IcLqSMpD6ZWYym0c
Kj3JEyv2c/HAeGYO7mZ1nLP6ww8H0CXEnwff3kZzi83qOMpDaYErt2CzGR56ZzvB/r78cdYgl1/urGtoAI/OGsTOwmrmfXnY6jjKQ2mBK7fw2sZ8NuUe5/c/HEBchLVDBttr5uAEZgzsxt8/O6CjUpRDaIErl1dR08hfP93H+J7RXDXCvS4CfnTWQAJ8ffjjh3qBj7I/LXDl8v66fB8nG5p52InznNhLXEQQP5vWi1V7j7Fqr049q+xLC1y5tJ2FVSzdmM/scan0cdNFheeM70F6bCiPfrBbx4Yru9ICVy7LGMP/vL+LqJAA7ruoj9VxOizAz4f/uWwgueW1vLw2x+o4yoNogSuX9a+thWzOO86vZ/QjMtjf6jidMrlPLNMHxPPsqoMcrdJpZ5V9aIErl1Tf1MJfP93PoMQIrh6ZZHUcu/j9pQNoarHx1Mr9VkdRHkILXLmkxZl5FFbW8ZtL+rvc5fIdlRIdwk/GpPLGpiMcPKYr+KjO0wJXLqeqrolnVx/kgj6xTOgVY3Ucu7p3ai9CAvx4/JN9VkdRHkALXLmc5784RFVdE7+e0dfqKHYXHRbI3AvSWb67hM15FVbHUW5OC1y5lKNV9cxfm8OPhnVnYPdIq+M4xK2TehATFshjH+/FGGN1HOXGtMCVS3lq5QGMgQd+4HlH318LCfDjvot6syn3OKv2HrM6jnJjWuDKZRypqOWt7CNcNzqZ5KgQq+M41LWjkkmJCuHvnx3Qo3DVYVrgymU8t/ogPj7iUqvsOIq/rw/3TO3FjsIqVu7Ro3DVMVrgyiUcqajl7c0F3DA6hW6R7jHbYGddOTyR1OgQ/r5yvx6Fqw7RAlcu4ZlVB/DxEe6a0tPqKE7j5+vDPRf2YmdhNSt260RX6vxpgSvL5ZXX8M6WQm4YnUK8m8z1bS9XDE8kLVrPhauO0QJXlntu9UH8fIT/8qKj76/5+fpw79Te7C6uZrkehavzpAWuLFVYWceyLYVcPzrFbVbasbdZw7qTEhXCP1Yf1KNwdV60wJWlXlzTul7k7RekW5zEOn6+Ptw5uSfbCqpYd7Dc6jjKjWiBK8uUn2zg9U35XDE8kcQuwVbHsdRVIxOJCw/kH58ftDqKciNa4Moyr6zLpaHZxp1eeO77uwL9fLl9UjrrD5WzJf+41XGUm9ACV5aorm9iYWYulwzqRs/YMKvjuIQbxqQQGezPP1YfsjqKchNa4MoSr27I40R9s1dcddleoYF+zBmfxmd7Sth7tNrqOMoNaIErp6tvamH+2lwm9Y5hUKJnzjjYUXPGpxES4Mu8tjd3lTobLXDldO9tLaTsZAN3TtZz39/VNTSAazKSeX9rEcVVdVbHUS5OC1w5lc1mmLfmMAMSIhjfM9rqOC7p1ok9sBnDgnW5VkdRLu6cBS4iySKyWkT2iMguEfl52/YoEVkhIgfaPnZ1fFzl7lbvO8ah0hrmTk5HxDPWurS35KgQZg5OYElWPifqm6yOo1xYe47Am4EHjDH9gbHA3SIyAHgIWGmM6Q2sbPtcqbOat+Yw3SODmDk4weooLu2OC9I50dDM6xuPWB1FubBzFrgxptgYs6Xt/glgD5AIzAIWtj1sIfAjB2VUHmLbkUqyciq4ZWIP/H317N3ZDEnqwpgeUcxfl0NTi83qOMpFnderSETSgOFAFhBvjCmG1pIH4s7wPXeISLaIZJeWlnYyrnJn8748THiQH9eNTrE6iluYOzmd4qp6PtxeZHUU5aLaXeAiEga8A9xnjGn3IFVjzDxjTIYxJiM2NrYjGZUHKDhey8c7irlhdAphgX5Wx3ELU/rE0TM2lJfX5ugkV+q02lXgIuJPa3m/ZoxZ1ra5REQS2r6eAOi6UOqMFmXmISLMHp9mdRS34eMj3DKxBzsLq9mUq5fXq+9rzygUAV4G9hhjnjjlS+8DN7fdvxl4z/7xlCeoaWhm6cZ8Zgzq5vWTVp2vK4cn0SXEn5fX6oU96vvacwQ+AbgJmCoiW9tuM4HH
gOkicgCY3va5Ut/z9uYCTtQ3c+vEHlZHcTvBAb7cMDqF5btLyC+vtTqOcjHtGYWy1hgjxpghxphhbbePjDHlxphpxpjebR8rnBFYuRebzfDKuhyGJXdhRIpeKtARs8el4SvCgvW5VkdRLkbHcimHWrX3GLnltXr03QndIoO4dEgCb2Yf0Qt71H/QAlcONX9dDgmRQcwY1M3qKG7t1ok9ONnQzJvZBVZHUS5EC1w5zN6j1aw/VM7scWl64U4nDUnqQkZqVxZl5mKz6ZBC1UpfVcphFq7PI9DPh+tGJVsdxSPMmZBGXnktn+/XEbuqlRa4cojK2kbe/aqAK4Yn0jU0wOo4HuHigd3oFhHEKzpLoWqjBa4c4o1NR6hvsnGzXrhjN/6+Ptw4NoUvD5Rx8NhJq+MoF6AFruyuxWZYlJnH2PQo+idEWB3Ho1w/OoUAPx8WZeZaHUW5AC1wZXef7SmhsLKOOXr0bXfRYYFcPrQ7b28uoFqHFHo9LXBldwvW5ZLYJZiL+sdbHcUjzRmfRm1jC2/pkEKvpwWu7Gp/yQkyD5dz49hU/HTooEMMSowkI7Uri3VIodfTV5iyq4Xrc3XooBPMHp9GbnktXxzQOfa9mRa4spuquiaWbSnk8qHddeigg80Y2I3Y8EAW6fwoXk0LXNnNO5sLqGtq0aGDThDg58MNo1P4fH8puWU1VsdRFtECV3ZhsxkWb8hjREoXBiVGWh3HK9wwJgVfEV7dkGd1FGURLXBlF18eLCOnrEaPvp0oPqJ1krA3s49Q29hsdRxlAS1wZReL1ucSExbIJYMSrI7iVW4en0Z1fTPvbdWFj72RFrjqtCMVtazad4zrRycT4Kd/Us6UkdqV/gkRLFyfqwsfeyF9talOe3VDHj4i/GRMqtVRvI6IMHtcKnuPniA7Txc+9jZa4KpT6ptaeCP7CD8YEE+3yCCr43ilWcO6ExHkx0IdUuh1tMBVp7y/rYjK2iZuGqdH31YJCfDjxxnJfLLzKMeq662Oo5xIC1x1mDGGxZl59I4LY1x6tNVxvNqNY1NpthmWbjxidRTlRFrgqsO2HqlkR2EVs8elIiJWx/FqPWJCuaBPLEs25tHUYrM6jnISLXDVYYsz8wgL9OOKEUlWR1HA7LGplFQ3sGJ3idVRlJNogasOKT/ZwIfbi7lyRCJhgX5Wx1HAhf3iSOoarG9mehEtcNUhr286QmOLjZvG6puXrsLXR7hxbCpZORXsO3rC6jjKCbTA1XlrsRmWZOUzLj2a3vHhVsdRp7gmo/ViqsUbcq2OopxAC1ydt5VtS6bdPF6Pvl1NVGgAlw3pzrtbCjmhS655PC1wdd4Wb8gjITJIl0xzUbPHpVLT2MKyLYVWR1EOpgWuzsuh0pN8eaCMG0an6JJpLmpocheGJndhUabOj+LpzvkKFJH5InJMRHaesu1hESkUka1tt5mOjalcxeLMPPx9hetGp1gdRZ3F7LGpHCqtYf2hcqujKAdqzyHUAmDGabY/aYwZ1nb7yL6xlCuqaWjmnc0FzBycQGx4oNVx1FlcOiSBqNAAFmXmWh1FOdA5C9wYswaocEIW5eL+tbWQEw3NzNZ5T1xekL8v12Qks2J36xvOyjN15iTmPSKyve0US9czPUhE7hCRbBHJLi3VFbTdlTGGRevzGNg9ghEpZ/znVi7kJ2NaT3MtydIl1zxVRwv8n0BPYBhQDPztTA80xswzxmQYYzJiY2M7+HTKalk5FewrOcHN49J03hM3kRwVwrT+8SzdeIT6phar4ygH6FCBG2NKjDEtxhgb8CIw2r6xlKtZlJlLlxB/Lh/W3eoo6jzcPC6NippGPtpRbHUU5QAdKnAROXXhwyuAnWd6rHJ/xVV1fLqrhGszkgny97U6jjoPE3pFkx4bysJMPY3iidozjHApkAn0FZECEbkVeFxEdojIduBC4H4H51QWWpqVj80YbtR5T9yOiHDzuDS2Halk65FKq+MoOzvnNHLGmOtPs/llB2RR
LqihuYUlG/OZ1i+O5KgQq+OoDrhyRCKPf7KXRZm5DEseZnUcZUd6KZ06q493HKXsZCM3jUuzOorqoPAgf64amcSH24opO9lgdRxlR1rg6qwWrM8lPTaUSb1irI6iOmH2uFQaW2y8vjHf6ijKjrTA1RltbTtvevO4NHx8dOigO+sVF86k3jEs3qBLrnkSLXB1RgvX5xIW6MdVI3XJNE8wZ3waJdUNfLrrqNVRlJ1ogavTOnaing+3F3H1yCRdMs1DTOkbR0pUiC655kG0wNVpLc06QlOL0XlPPIivjzB7XCqbco+zs7DK6jjKDrTA1fc0Ntt4NSuPyX1iSY8NszqOsqMfZyQT7O+rR+EeQgtcfc/HO4spPdHAnPFpVkdRdhYZ7M9VIxN5b1sR5Tqk0O1pgav/YIxh/toc0mNCmdxHJx/zRHPGp9HYbGOpDil0e1rg6j9sya9kW0EVcybo0EFP1SsunAv6xLIoM4/GZh1S6M60wNV/eGVdDuFBflw1QocOerJbJqRx7ESDzlLo5rTA1TeKKuv4eOdRrh+dQqgOHfRoF/SOJT02lPnrcnThYzemBa6+sSgzD2N06KA38PERfjqhB9sLqtiSf9zqOKqDtMAVAHWNLSzdmM/FA7uR1FVnHfQGV41IJCLIj/lrc62OojpIC1wB8M6WAqrqmrhlYg+roygnCQnw4/rRKXy8s5iC47VWx1EdoAWusNlahw4OSYokI1UXLPYmN49vXeN0wbpcq6OoDtACV6zae4zDZTXcNildFyz2Mt27BHPp4ARe33SE6vomq+Oo86QFrnjxy8Mkdglm5qBuVkdRFrhtUg9ONjTzxsYjVkdR50kL3MvtKKgiK6eCOePT8PPVPwdvNCSpC6N7RPHKuhyada5wt6KvWC/34peHCQv049rRyVZHURa6fVI6RVX1fLRT5wp3J1rgXqywso5/7yjmulHJRAT5Wx1HWWhavzh6xITy0peH9cIeN6IF7sUWrMsBYM6ENGuDKMv5+Ai3TGy9sCcrp8LqOKqdtMC9VFVtE0uy8rl0cIJeuKMAuHpEElGhAbzwxSGro6h20gL3Uq9m5VHT2MLcyelWR1EuIjjAlznj01i9r5S9R6utjqPaQQvcC9U3tfDKulwm9Y5hYPdIq+MoF3LT2FSC/X2Z98Vhq6OodtAC90LvflVI2ckG7prc0+ooysV0DQ3gutHJvL+tiMLKOqvjqHPQAvcyLTbDvDWHGZwYybie0VbHUS7o1ok9MMDLX+ZYHUWdgxa4l1mx+yg5ZTXcObmnXjavTiupawiXD+3O65vyqaxttDqOOotzFriIzBeRYyKy85RtUSKyQkQOtH3UGZDcgDGGf3x+iNToEGboZfPqLOZOTqe2sYWF6/OsjqLOoj1H4AuAGd/Z9hCw0hjTG1jZ9rlycWsOlLG9oIq7JvfEV9e7VGfRr1sEF/WP45X1OdQ0NFsdR53BOQvcGLMG+O7I/lnAwrb7C4Ef2TeWcoTnVh0kITKIK3W9S9UOd1/Yi8raJl7L0qNwV9XRc+DxxphigLaPcfaLpBxhY04FG3MrmHtBOgF++taHOrfhKV2Z2CuGF7/Mob6pxeo46jQc/koWkTtEJFtEsktLSx39dOoMnl19kJiwAK4bnWJ1FOVG7r6wF6UnGngrW6eadUUdLfASEUkAaPt47EwPNMbMM8ZkGGMyYmNjO/h0qjO2F1SyZn8pt05MJ8jf1+o4yo2MTY9iZGpXnv/iME061azL6WiBvw/c3Hb/ZuA9+8RRjvDMqoNEBPlx41g9+lbnR0S458JeFFbW8e6WQqvjqO9ozzDCpUAm0FdECkTkVuAxYLqIHACmt32uXNDOwipW7C7htknphOuUsaoDpvSNZUhSJM+sPqBH4S6mPaNQrjfGJBhj/I0xScaYl40x5caYacaY3m0fdf5JF/X3zw4QEeSnU8aqDhMR7ruoN0cq6li2pcDqOOoUOhzBg+0oqOKzPSXcPildF2xQnXJh3ziG
JkXyzKqDNDbrUbir0AL3YE+t3E9ksD8369G36qTWo/A+FBzXo3BXogXuobYXVPLZnmPcNrGHHn0ru5jSN5ahyV30KNyFaIF7qCdXtB5967lvZS9fnwsvrKzjTR0X7hK0wD3QptwKVu8rZe5kHXmi7GtKn1hGpnblmVUH9OpMF6AF7mGMMTz+yV5iwwP56fgeVsdRHkZEePDivpRUN7Bwfa7VcbyeFriH+XxfKZtyj/Ozqb0IDtCrLpX9jU2P5oI+sfzzi0NU1zdZHceraYF7EJvN8Pin+0iOCubaUXrVpXKcX13cl8raJl5co2tnWkkL3IN8uKOYPcXVPDC9r844qBxqUGIklw5J4OW1OZSeaLA6jtfSV7mHaGy28bfl++jXLZzLh3a3Oo7yAg9M70NDs41nVh2wOorX0gL3EIs35JFXXsuvL+mHj662o5wgPTaM60YlsyQrn0OlJ62O45W0wD1AZW0jT688wKTeMUzpo1P2Kue5f3ofgvx9+d+P9lodxStpgXuAZ1cdpLq+id9c0l9XmldOFRMWyF1TevLZnhIyD5VbHcfraIG7ubzyGhZm5vLjkUkM6B5hdRzlhW6d2IPukUH86aPd2GzG6jheRQvczf3fJ3vx8/HhgR/0tTqK8lJB/r48OKMvOwur+ddWXfTBmbTA3dj6Q2V8tOMocyenEx8RZHUc5cVmDU1kaFIkj328l5MNzVbH8Rpa4G6qqcXGI+/vJqlrMHdO7ml1HOXlfHyEhy8fyLETDTqs0Im0wN3U4sw89pWc4Pc/HKALFSuXMDylKz8emcT8tTk6rNBJtMDdUNnJBp78bD+TesfwgwHxVsdR6hu/mtGPID9fHvlgN8boG5qOpgXuhh7/ZC91jS38z2UDddigcimx4YHcP70Pa/aXsnx3idVxPJ4WuJvZmFPBm9kF3DqxB73iwqyOo9T33DQulb7x4Tzy/i5q9A1Nh9ICdyMNzS38Ztl2ErsE8/OLelsdR6nT8vf14c9XDqKoqp6/Ld9vdRyPpgXuRp7//DCHSmv4f1cMIiTAz+o4Sp3RyNQobhybwoL1OewoqLI6jsfSAncTh0pP8tzqg1w2tDsX9o2zOo5S5/SrGf2ICQvkoWXbaW7RRZAdQQvcDdhsht8u20GQvw9/+OEAq+Mo1S4RQf48cvlAdhVVM39djtVxPJIWuBt4NSuPrJwKfjuzP7HhgVbHUardZgzqxvQB8fxt+X4dG+4AWuAuLreshv/9aC+T+8Ry7ahkq+ModV5EhD9dMYjgAF9++dY2WnSyK7vSAndhLTbDg29vw89XeOyqwTrmW7mluPAgHp01iK/yK5mna2jalRa4C3tlXQ6bco/z8GUDSYgMtjqOUh122ZAEZg7uxpMr9rPv6Amr43iMThW4iOSKyA4R2Soi2fYKpWDf0RP85dN9TB8Qz5UjEq2Oo1SniAh/nDWI8CA/7n9jKw3NLVZH8gj2OAK/0BgzzBiTYYefpYC6xhbuXbqF8CB//nyFnjpRniE6LJD/u2oIu4ur+b+P91kdxyPoKRQX9Md/72Z/yUmeuGaojjpRHuWiAfHMGZ/G/HU5rNqrc6V0VmcL3ADLRWSziNxxugeIyB0iki0i2aWlpZ18Os/38Y5ilmTlM3dyOhfoAsXKAz10ST/6J0Twy7e2U1Jdb3Uct9bZAp9gjBkBXALcLSIXfPcBxph5xpgMY0xGbKwW0tkcqajl1+9sZ2hSJA9M1yXSlGcK8vflmeuHU9fYwn2vb9WrNDuhUwVujClq+3gMeBcYbY9Q3qiusYW5izcD8PT1wwnw07NbynP1igvjjz8aRObhcv6yXM+Hd1SHW0JEQkUk/Ov7wA+AnfYK5k2MMfzu3R3sOVrNU9cNJzU61OpISjnc1SOTuHFsCi98cZh/by+2Oo5b6sxhXjywVkS2ARuBfxtjPrFPLO+yeEMey74q5L5pfbiwn05UpbzHH344kOEpXXjw7W0cKNHx4eerwwVujDlsjBnadhto
jPmTPYN5iw2Hy3n0g91M6xfHvVN7WR1HKacK8PPhnz8ZSUiAH3cs3kxlbaPVkdyKnmi10KHSk8xdvJnU6BCeuHYYPj463lt5n26RQTx/4wgKj9cxd/FmGpv1Tc320gK3SEVNI7cs2ISfj/DKnNFEBvtbHUkpy2SkRfGXHw8hK6eCh5Zt1wWR20mXdbFAfVMLcxdnU1xVz9Lbx5ISHWJ1JKUsN2tYIrlltTz52X7SokP52TRdNvBctMCdrLnFxs+WfsWm3OM8e8NwRqZ2tTqSUi7jZ9N6kVdRwxMr9hMbHsj1o1OsjuTStMCdyGYz/Oqd7SzfXcLDlw3gh0O6Wx1JKZciIjx25RAqahr57bs7CAv047Kh+jo5Ez0H7iTGGB79cDfLthTyi+l9mDOhh9WRlHJJX49MGZUaxf1vbGX13mNWR3JZWuBOYIzhsY/3smB9LrdN7KHDBZU6h+AAX16ak0G/hHDufHUzX+zXeZRORwvcwYwxPPLBbl5Yc5ibxqbyu0v76/SwSrVDRJA/i24ZQ3psGLcvzGblHp298Lu0wB3IZjP87l87WbA+l1sn9uDRWQO1vJU6D1GhASy9fcw3R+Kf7NRL7k+lBe4gDc0t3P/mVpZk5XPXlJ78tx55K9UhXUICePW2MQxOjOTuJV/xxqZ8qyO5DC1wB6iqa2LO/E28t7WIBy/uy68u7qvlrVQnRAT5s+jWMYzvGc2v39nBkyv268U+aIHbXVFlHdc8n0l2XgVPXjuUuy/speWtlB2EBfoxf84orh6ZxFMrD/Crt7d7/WX3Og7cjjYcLufu17bQ0GxjwU9HM6FXjNWRlPIo/r4+/OXqIXTvEszTKw+QV17Lsz8ZTlx4kNXRLKFH4HZgjOGVdTn85KUsIkP8+dfd47W8lXIQEeEX0/vw1HXD2F5YyWXPrOWr/ONWx7KEFngnVdc38fPXt/LIB7uZ2i+O9+6eQK+4cKtjKeXxZg1LZNldEwjw8+HaFzbwyrocrzsvrgXeCdm5Fcx86kv+vaOYBy/uyws3jiQ8SGcVVMpZBnSP4IN7JjKpdwyPfLCbny7YROmJBqtjOY0WeAfUN7Xw10/3cc0LmfiI8Nad47j7wl46n7dSFugSEsBLN2fw6KyBZB4q55Kn1vDJzqNWx3IKLfDztDGngplPf8mzqw9y5Ygk/v2ziYxI0RkFlbKSiDB7XBof3DuRuPAg7nx1M3cu3kxJdb3V0RxKnHnOKCMjw2RnZzvt+eyp7GQDf1u+j6Ubj5DYJZg/XzmYyX1irY6llPqOphYbL355mL9/doBAPx8emN6Hn4xNxd/XfY9XRWSzMSbje9u1wM+uobmFBetyeXbVQeqaWpgzPo37p/chNFBHYCrlynLKavjvf+1g3cFyesWF8d+X9mdKX/dcNFwL/Dw1tdh4d0shT686QMHxOqb2i+O3M/vTKy7M6mhKqXYyxrBidwl/+mgPeeW1jO8ZzS+m9yEjLcrqaOdFC7ydGppbeG9rEc+tPkheeS2DEyP55cV99XSJUm6sobmFxZl5PP/FIcpONjKpdwz3Tu3NqLSubnGltBb4OVTWNvJaVj4L1+dy7EQDA7tHcP9FfZjWP84t/oGVUudW29jM4sw8XlhzmIqaRoYmRXLbpHQuGdQNPxc+R64FfhrGGDbmVPDGpiP8e0cxDc02JvWO4fZJ6UzqHaPFrZSHqmts4e0tBcxfm0NOWQ3xEYH8eGQy145KJjnK9RYZ1wI/xcFjJ/lgWxEfbCvicFkNYYF+zBrWnZvGpdKvW4TV8ZRSTmKzGVbuPcaSrDw+31+KMTAuPZrLh3VnxsBudA0NsDoi4OUFbrMZdhZV8dmeY6zYXcKe4mpEYHRaFFePTOLSIQmEBOioEqW8WVFlHW9mH+G9rUXklNXg5yNM6BXDRf3jmNo/nsQuwZZl87oCL6qsY93BMtYfKmftwTJKTzTgIzAytSuXDErg0iEJxEd45wxmSqkzM8awq6iaD7YV8emuo+SW
1wLQr1s443vGMKFXNKN7RDl12gyPLvDaxmb2FFezvaCKLfmVbMk7TmFlHQDRoQGM6xnN1H5xTOkbR5SL/JdIKeX6jDEcKq1h5Z4S1hwoZVPucRqbbfgI9O0WwcjULgxP7srgpEjSY0Id9kaoRxR4XWMLeRU1HC6tYX/JCQ6UnGTv0WoOl9Xw9a+REBnEiNSujEzpyrie0fSND9c5SpRSdlHf1MKWvONsyKngq/zjfJVfycmGZgCC/H3o1y2CPvFh9IkPp3d8OD2iQ+neJajTxe6QAheRGcBTgC/wkjHmsbM9vqMF/vTKAyzJyufoKfMaiEBKVAi948IZlBjBwO6RDEqMICHSuvNUSinv0mIzHDx2kl1FVewqqmZXURUHj52k7GTjN4/x9xWSu4bw5ysHMzY9ukPPc6YC7/A7dyLiCzwHTAcKgE0i8r4xZndHf+aZxEcEMqFXDGnRIaTFhNIjJpSesWEEB/ja+6mUUqrdfH2Evt3C6dstnCtHfLu9oqaRAyUnyCuvJae8hrzyGoecvu3M0IvRwEFjzGEAEXkdmAXYvcCvHZXCtaNS7P1jlVLKIaJCAxiTHs2YDh5xt1dnTswkAkdO+bygbdt/EJE7RCRbRLJLS0s78XRKKaVO1ZkCP907g987oW6MmWeMyTDGZMTG6nwiSillL50p8AIg+ZTPk4CizsVRSinVXp0p8E1AbxHpISIBwHXA+/aJpZRS6lw6/CamMaZZRO4BPqV1GOF8Y8wuuyVTSil1Vp2aAMQY8xHwkZ2yKKWUOg+uOwGuUkqps9ICV0opN+XUuVBEpBTIc9oTuq4YoMzqEC5E98e3dF98S/fFt1KNMd8bh+3UAletRCT7dPMaeCvdH9/SffEt3RfnpqdQlFLKTWmBK6WUm9ICt8Y8qwO4GN0f39J98S3dF+eg58CVUspN6RG4Ukq5KS1wpZRyU1rgdiYiM0Rkn4gcFJGHzvCYKSKyVUR2icgXbduSRWS1iOxp2/5z5ya3v47ui1O+5isiX4nIh85J7Did2Rci0kVE3haRvW1/H+Ocl9wxOrk/7m/btlNElopIkPOSuxhjjN7sdKN1Uq9DQDoQAGwDBnznMV1oXbUope3zuLaPCcCItvvhwP7vfq873TqzL075+i+AJcCHVv8+Vu4LYCFwW9v9AKCL1b+TVfuD1kVjcoDgts/fBOZY/TtZddMjcPv6Zpk5Y0wj8PUyc6e6AVhmjMkHMMYca/tYbIzZ0nb/BLCH06xw5EY6vC8ARCQJuBR4yUl5HanD+0JEIoALgJfbtjcaYyqdFdxBOvW3QeskfMEi4geE4MXrEGiB21d7lpnrA3QVkc9FZLOIzP7uDxGRNGA4kOWooE7Q2X3xd+BXgM2hKZ2jM/siHSgFXmk7nfSSiIQ6PrJDdXh/GGMKgb8C+UAxUGWMWe6EzC6pU9PJqu9pzzJzfsBIYBoQDGSKyAZjzH4AEQkD3gHuM8ZUOzKsg3V4X9D64j1mjNksIlMcGdJJOrMv/IARwL3GmCwReQp4CPi9A/M6Wmf2RymtR+s9gErgLRG50RjzquPiui4tcPtqzzJzBUCZMaYGqBGRNcBQYL+I+NNa3q8ZY5Y5I7ADdWZfjAAuF5GZQBAQISKvGmNudEJuR+jMvvgSKDDGfP2/sbdpLXB31pn9AZBjjCkFEJFlwHjAKwtcT6HYV3uWmXsPmCQifiISAowB9oiI0Hqec48x5gmnpnaMDu8LY8xvjDFJxpi0tu9b5cblDZ3bF0eBIyLSt+1x02h9c8+ddXh/0HrqZKyIhLS9Zqa1bfdKegRuR+YMy8yJyJ1tX3/eGLNHRD4BttN6fvclY8xOEZkI3ATsEJGtbT/yt6Z11SO305l9YV1qx7DDvrgXeK2t7A4DP3X+b2E/nd0fIvI2sAVoBr7Ciy+510vplVLKTekpFKWUclNa4Eop5aa0wJVSyk1pgSullJvSAldKKTelBa6UUm5KC1wppdzU/wdJ2ww6GHM+tgAA
AABJRU5ErkJggg==\n",
151 | "text/plain": [
152 | ""
153 | ]
154 | },
155 | "metadata": {
156 | "needs_background": "light"
157 | },
158 | "output_type": "display_data"
159 | }
160 | ],
161 | "source": [
162 | "# Sampling distribution of sample proportion for men\n",
163 | "\n",
164 | "mu = p1\n",
165 | "variance = p1*(1-p1)/n\n",
166 | "sigma = math.sqrt(variance)\n",
167 | "x = np.linspace(mu - 3*sigma, mu + 3*sigma, 100)\n",
168 | "sns.lineplot(x = x, y = norm.pdf(x, mu, sigma));"
169 | ]
170 | },
171 | {
172 | "cell_type": "code",
173 | "execution_count": 13,
174 | "id": "reserved-baseball",
175 | "metadata": {},
176 | "outputs": [
177 | {
178 | "data": {
179 | "image/png": "\n",
180 | "text/plain": [
181 | ""
182 | ]
183 | },
184 | "metadata": {
185 | "needs_background": "light"
186 | },
187 | "output_type": "display_data"
188 | }
189 | ],
190 | "source": [
191 | "# Sampling distribution of sample proportion for women\n",
192 | "\n",
193 | "mu = p2\n",
194 | "variance = p2*(1-p2)/n\n",
195 | "sigma = math.sqrt(variance)\n",
196 | "x = np.linspace(mu - 3*sigma, mu + 3*sigma, 100)\n",
197 | "sns.lineplot(x = x, y = norm.pdf(x, mu, sigma));"
198 | ]
199 | },
200 | {
201 | "cell_type": "markdown",
202 | "id": "british-foster",
203 | "metadata": {},
204 | "source": [
205 | "In fact, we are not interested in the individual distributions for men and women. Our interest is in their difference, so let's create the sampling distribution of $\\bar p_1 - \\bar p_2$. For that, we need to define its parameters:\n",
206 | "\n",
207 | "$$\\begin{align}\n",
208 | "\\mu_{\\bar p_1 - \\bar p_2} &= p_1 - p_2 \\approx \\bar p_1 - \\bar p_2 \\\\\n",
209 | "\\sigma_{\\bar p_1 - \\bar p_2}^2 &= \\frac{p_1(1-p_1)}{n_1} + \\frac{p_2(1-p_2)}{n_2} \\approx \\frac{\\bar p_1(1-\\bar p_1)}{n_1} + \\frac{\\bar p_2(1-\\bar p_2)}{n_2}\n",
210 | "\\end{align}$$"
211 | ]
212 | },
213 | {
214 | "cell_type": "code",
215 | "execution_count": 14,
216 | "id": "laden-press",
217 | "metadata": {},
218 | "outputs": [
219 | {
220 | "data": {
221 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYEAAAD4CAYAAAAKA1qZAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAswklEQVR4nO3dd3yW9b3/8dcnOyEJARJIyGZPWWEElOIsouKsggOlWEftsD12nNNzTsdpT/21x9ZaLQqKilocdRSLOCvKCIQwDYaRSQYZJIQsMu/v74/cemJOgOQeue479+f5eOSR+75GrjchVz75Xtf3+n7FGINSSinf5Gd1AKWUUtbRIqCUUj5Mi4BSSvkwLQJKKeXDtAgopZQPC7A6QE+io6NNSkqK1TGUUspr7Nmz56QxJqav+3lkEUhJSSErK8vqGEop5TVEpMiR/fRykFJK+TAtAkop5cO0CCillA/TIqCUUj5Mi4BSSvkwLQJKKeXDtAgopZQP88jnBJTyFKcaW8kqOsXnZXV02GwABPj7MSU+kllJQxkcFmhxQqWco0VAqW4aW9rZkHmcV7OKOVrR8OVykc7PX0zBIQLjR0SwfE4St8xOJCTQ34K0SjlHPHFSmbS0NKNPDKv+VtfcxrptBTy3o5DapjZmpwxh0fjhzE4ZygUJg7/8Jd/c1sG+47XsLqzhn4cr2V9cS3R4ECsXpLJyQQphQfq3lep/IrLHGJPW5/20CCgFu/Kr+eGrByitPcPlk0bw7UWjmZE0pFf7ZhbU8MTHuXxytIrU6EH88ZbpTE+Mcm9gpbrRIqCUA1rbbTz64VFWf5JH0tAw/nDzdGYl9+6Xf3c78k7y0KsHqKhv4cFLx3L/otEE+GvfC9U/HC0C+hOqfFZDSzt3PZvJX7bkcfOsRDZ97yKHCwDA/NHRbH5wIVdNjeORD45y7wt7aG7rcGFipVxPi4DySacaW7lt7U52FdTwP9+Yxv+76QLCg52/lj84NJDHls/gv66dzD+PVLJiXSZ1zW0uSKyUe2gRUD6n/HQz33gqg5zyep66fRY3zUpw+THuSE/hT8tmsLfoFMvX7ORkQ4vLj6GUK5y3CIjIOhGpFJHsLsteEZH99o9CEdl/ln0LReQz+3Z6kV9Z7nRTG7c/s4vy0808v3IOl00a4bZjLZ02krUr0sirauDOdZk0trS77VhKOao3LYHngMVdFxhjbjHGTDfGTAdeB944x/4X27ft8w0LpVyppb2Db72QxfHqJp6+M4300cPcfsyLJwxn9W2zyDlRxwN/3Ut7h83tx1SqL85bBIwxnwI1Pa0TEQFuBja4OJdSLmWzGR567SCZBTX8/hsXMG+U+wvAFy6eMJxfXzeVLUeq+Pe3svHEHnnKdzl7T+AioMIYc+ws6w3wvojsEZF7zvWFROQeEckSkayqqionYyn1VX/44ChvHyjjJ4sncO30+H4//q1zk3jg4tG8vLuYpz7N7/fjK3U2zhaB5Zy7FbDAGDMTuBJ4QEQWnm1DY8waY0yaMSYtJqbPcyUrdVYfH6nk8Y9zuSUtkfu+NsqyHA9dMZ6rLojjd+8eJrOgx8a1Uv3O4SIgIgHADcArZ9vGGFNm/1wJvAnMcfR4Sjmi/HQz//LqASbERvDLaycjXwwAZAER4eEbppI0NIzvbdhHTWOrZVmU+oIzLYHLgMPGmJKeVorIIBGJ+OI1cAWQ3dO2SrlDe4eN723YR3NbB0/cNtMjBniLCAnk8VtnUtPYyg9f3Y/NpvcHlLV600V0A5ABjBeREhFZZV+1jG6XgkRkpIi8Y387AtgmIgeATGCTMeZd10VX6twe++gYmYU1/Ob6KYyOCbc6zpemxA/mP66eyJYjVazdqvcHlLXO+4ikMWb5WZbf1cOyMmCJ/XU+MM3JfEo55LOS0zyxJY8bZyZw/QzXPwzmrNvnJbM9t5pH3j/KpROHM2Z4hNWRlI/SJ4bVgNPabuNHfzvAsEFB
/Oc1k6yO0yMR4b+um0JYsD8//ttBOvSykLKIFgE14Kzeksfh8np+c/1UBod67sxfMRHB/PyaSew9XstzOwqtjqN8lBYBNaAcKa/n8Y+PsXTaSC5345AQrnLd9HgumTCc3793mKLqRqvjKB+kRUANGDab4SevHyQiJJCfe+hloO5EhN9cP4VAPz/+7c3P9Gli1e+0CKgB4297S9hfXMvPlkxkWHiw1XF6LW5wKD9aPJ7tudVszi63Oo7yMVoE1IBQ19zG7949zMykKG6Y2f/DQjjr1jlJTIiN4DebcjjTqhPRqP6jRUANCI9+cIzqxlZ+de0US58KdlSAvx+/WDqZ0tozrP4kz+o4yodoEVBe72hFPc9nFLJsdhJT4gdbHcdh80YN45ppI3nykzyKa5qsjqN8hBYB5dWMMfzy7UOEBwfwo6+PtzqO0/5tyQT8Rfj1ps+tjqJ8hBYB5dU+OVrF9txqHrxsLEMHBVkdx2lxg0P59qLRvHeogqxCHWlUuZ8WAeW1OmyGhzcfJmloGLfNTbY6jsusuiiV4RHB/HbzYe0yqtxOi4DyWn/fX8rh8noe+vp4ggIGzo9yWFAAD142jj1Fp/jg8wqr46gBbuCcOcqnNLd18Mj7R5kSH8nVU+OsjuNyN6clMCp6EL9774jOS6zcSouA8kov7iyitPYMP108ET8/7+sSej4B/n78ePF4cisbeH1vj1N2KOUSWgSU16lvbuOJj3O5aGw0F46NtjqO23x9ciwzkqL44wfHaG7TB8iUe2gRUF7n+R2FnGpq46ErvL9L6LmICD+6Yjzldc28nHnc6jhqgNIioLxKfXMba7cWcOmE4UxLjLI6jtuljx7GnNSh/GVLnrYGlFtoEVBe5bnthZw+08aDl42zOkq/EBF+cNk4Kutb2KCtAeUGvZljeJ2IVIpIdpdlvxCRUhHZb/9YcpZ9F4vIERHJFZGfujK48j11zW2s3ZrPZROHMzXBe4eH6Kv00cOYq60B5Sa9aQk8ByzuYfkfjTHT7R/vdF8pIv7AE8CVwCRguYh4xyDvyiM9u62QuuZ2n2kFdPWDy8dRVd/CS7u0NaBc67xFwBjzKeDI8+tzgFxjTL4xphV4GbjWga+jFHXNbTyzLZ/LJ43w6kHiHDVv1DDSRw1jtbYGlIs5c0/gOyJy0H65aEgP6+OB4i7vS+zLeiQi94hIlohkVVVVORFLDUQv7iyirrmd710y1uoolvnepWM52dDCa1nF599YqV5ytAisBkYD04ETwCM9bNPTEzxnHQjFGLPGGJNmjEmLiYlxMJYaiJrbOli3rYCF42J86l5Ad/NGDWVmUhRPfpJPmz5FrFzEoSJgjKkwxnQYY2zAWjov/XRXAiR2eZ8AlDlyPOXbXs0q5mRDKw8sGm11FEuJCA9cPIbS2jO8fUBPJeUaDhUBEek6WMv1QHYPm+0GxopIqogEAcuAjY4cT/mutg4bT32Sz6zkIcxJHWp1HMtdMmE4E2Ij+MuWPGw2HWFUOa83XUQ3ABnAeBEpEZFVwO9E5DMROQhcDPzAvu1IEXkHwBjTDnwHeA/IAV41xhxy079DDVAb95dRWnuGBy4e7ZXTRrqaiHD/otHkVjbwvo4wqlxAPHG88rS0NJOVlWV1DGUxm81wxaOfEuAnbP7+RVoE7No7bFzyyCdEhQXy9wcW6PdFASAie4wxaX3dT58YVh7rw5wKcisbuH+RtgK6CvD3496vjeJgyWky8qutjqO8nBYB5bHWbs0nPiqUqwbgfAHOunFmAtHhQaz5NN/qKMrLaRFQHmnv8VPsLjzFqgtTCfDXH9PuQgL9WZGewpYjVRwpr7c6jvJienYpj7T203wiQwK4ZXbi+Tf2UXfMSyYk0I+nt2prQDlOi4DyOEXVjbx7qJzb5yUzKDjA6jgea8igIG5OS+St/aVU1DVbHUd5KS0CyuM8vbWAQD8/7pqfYnUUj7fqwlQ6bIbndhRaHUV5KS0CyqPUNLby2p5irpsx
kuGRIVbH8XjJwwaxeEosL+4soqGl3eo4ygtpEVAe5a+7imhus3H3RaOsjuI17r5oFPXN7fxNB5ZTDtAioDxGa7uN9RlFXDQ2mnEjIqyO4zVmJg1hRlIUz+4o1KEkVJ9pEVAeY9NnZVTWt/DNC1OtjuJ1Vl2YSlF1Ex8drrQ6ivIyWgSURzDG8My2AkbHDOJrY3Uo8b5aPDmWkYNDWLetwOooystoEVAeYXfhKbJL61i5IBU/Px0ioq8C/P24c34KGfnVHCo7bXUc5UW0CCiPsG5bAYNDA7lxZoLVUbzWstlJhAb68+z2QqujKC+iRUBZrrimifc/L+fWuUmEBvlbHcdrDQ4L5BtpCWzcX0ZVfYvVcZSX0CKgLLc+oxARYUV6stVRvN5d81No7bDx113HrY6ivIQWAWWpptZ2XtldzOIpscQNDrU6jtcbFRPOovExvLSriNZ2nYdYnZ8WAWWpN/eVUtfczkodIsJl7pqfQmV9C5uzT1gdRXkBLQLKMsYYnt9RyJT4SGYlD7E6zoCxcGwMqdGDdDwh1Su9mWN4nYhUikh2l2W/F5HDInJQRN4Ukaiz7Fton4t4v4jofJHqKzLyqjla0cBd81N15jAX8vMT7kxPZt/xWg4U11odR3m43rQEngMWd1v2ATDFGHMBcBT413Psf7ExZrojc1+qge3ZHYUMHRTE1RfozGGuduOsBMKDA3heWwPqPM5bBIwxnwI13Za9b4z5YsjCnYB27lZ9UlzTxIc5Fdw6J4mQQO0W6moRIYHcNCuBtw9qd1F1bq64J/BNYPNZ1hngfRHZIyL3nOuLiMg9IpIlIllVVVUuiKU82Ys7i/AT4bZ5SVZHGbBWpCfT1mF4OVO7i6qzc6oIiMjPgHbgpbNsssAYMxO4EnhARBae7WsZY9YYY9KMMWkxMTp2zEB2prWDl3cXs3iydgt1p1Ex4SwcF8OLu4po69DuoqpnDhcBEbkTuBq4zRjT4/i1xpgy++dK4E1gjqPHUwPHxgOlnD7Tpg+H9YM705OpqGvh/UMVVkdRHsqhIiAii4GfAEuNMU1n2WaQiER88Rq4AsjuaVvlOzq7hRYxITaCOalDrY4z4C0aP5zEoaE8n1FodRTloXrTRXQDkAGMF5ESEVkFPA5EAB/Yu38+ad92pIi8Y991BLBNRA4AmcAmY8y7bvlXKK+xp+gUn5+oY0V6inYL7Qf+fsId85LJLKgh50Sd1XGUBwo43wbGmOU9LH7mLNuWAUvsr/OBaU6lUwPO8xlFRIQEcN2MkVZH8Rk3pyXyyPtHWZ9RxG9vmGp1HOVh9Ilh1W8q65rZ/NkJbk5LJCzovH9/KBeJCgviuunxvLWvlNNNbVbHUR5Gi4DqN3/NPE67zXDHPL0h3N9WzE/mTFsHr+3RyejVV2kRUP2izT688dfGxZASPcjqOD5n8sjBzEoewos7i3QyevUVWgRUv3jvUDmV9S3aLdRCK9KTKaxu4tNj+jCm+l9aBFS/WJ9RRMKQUBaNH251FJ915ZQ4osODeSGjyOooyoNoEVBud7i8jsyCGm6fl4y/TiJvmaAAP5bPSeSfRyoprunx8R7lg7QIKLd7IaOI4AA/bklLtDqKz7t1bhJ+Iry4S1sDqpMWAeVWdc1tvLmvlGumjWTIoCCr4/i8uMGhXD5xBK/uLqa5rcPqOMoDaBFQbvXGnhKaWjv0hrAHWTE/mVNNbbx9oMzqKMoDaBFQbmOMYf3OIqYlRnFBQpTVcZRd+qhhjB0ezgs79ZKQ0iKg3Gh7bjX5VY2s0IfDPIqIcEd6MgdLTrNfp5/0eVoElNusz+icPvIqnT7S41w/I55BQf6s19FFfZ4WAeUWpbVn+DCngltmJ+r0kR4oIiSQG2Ym8I+DJ6hu0OknfZkWAeUWL9mvN982V6eP9FQr0pNpbbfxSpaOJ+TLtAgol2tp7+CV3cVcMmEECUPCrI6jzmLsiAjSRw3jpZ3H6dDxhHyWFgHlcu98doLqxlbunK83
hD3divRkSmvP8M/DlVZHURbRIqBcbn1GEaOiB7FgdLTVUdR5XD5pBLGRIXqD2IdpEVAudbCkln3Ha7kjPRk/HSfI4wX4+3Hb3CS2HjtJXlWD1XGUBXozx/A6EakUkewuy4aKyAcicsz+echZ9l0sIkdEJFdEfurK4Mozrc8oIizInxtnJVgdRfXSsjlJBPqLji7qo3rTEngOWNxt2U+Bj4wxY4GP7O+/QkT8gSeAK4FJwHIRmeRUWuXRahpb2XigjBtmxhMZEmh1HNVLMRHBXDU1jtf3lNDQ0m51HNXPzlsEjDGfAjXdFl8LPG9//TxwXQ+7zgFyjTH5xphW4GX7fmqAemV3Ma3tNlakp1gdRfXRivkp1Le08+a+UqujqH7m6D2BEcaYEwD2zz3NFBIPdO2AXGJf1iMRuUdEskQkq6pKZz7yNh02w4s7i0gfNYxxIyKsjqP6aEZiFFPjB7N+RyHGaHdRX+LOG8M93RU860+XMWaNMSbNGJMWExPjxljKHf55uJLS2jPaLdRLiQgr0pM5VtlARn611XFUP3K0CFSISByA/XNPnYxLgK6ziCQAOnbtAPX8jkLiBodw2cQRVkdRDrpm2kiGhAXy/I5Cq6OofuRoEdgI3Gl/fSfw9x622Q2MFZFUEQkCltn3UwPMsYp6tuWe5PZ5yQT4a69jbxUS6M+yOUl88HkFJad0+klf0ZsuohuADGC8iJSIyCrgYeByETkGXG5/j4iMFJF3AIwx7cB3gPeAHOBVY8wh9/wzlJWezyi0z1+r4wR5u9vtw37rXAO+I+B8Gxhjlp9l1aU9bFsGLOny/h3gHYfTKY93+kwbb+wtZem0kQzV6SO9XnxUKF+fHMsru4v5wWXjdARYH6Btd+WU17KKaWrt4K75KVZHUS5y5/wUapva+Pt+7S7qC7QIKId12AzrM4pISx7ClPjBVsdRLjI3dSgTYiN4drt2F/UFWgSUw7YcqeR4TRN3aitgQBER7pqfwuHyejILuj8nqgYaLQLKYc/tKGREZDCLp8RaHUW52LXT44kKC+Q57S464GkRUA45WlHP1mMnWZGeQqB2Cx1wQoP8WTY7ifcOlWt30QFOz17lkGe3FxKs3UIHtBXpyYgI63V00QFNi4Dqs1ONrby5r4TrZ8Rrt9ABbGRUKFdOiWVD5nEadXTRAUuLgOqzDbuP09xmY+WCVKujKDdbuSCV+uZ2Xt9bYnUU5SZaBFSftHXYWL+jiAvHRDM+VkcLHehmJkUxLTGK57YXYtPJ6AckLQKqT97NLqe8rpmVC1KsjqL6gYjwzQUp5J9s5JOjOsT7QKRFQPWaMYZnthWQMiyMi8f3NIWEGoiWTI1jRGQwz2wrsDqKcgMtAqrX9hSdYn9xLd+8MFUnkfchgf5+rEhPYVvuSXJO1FkdR7mYFgHVa09vLWBwaCA36STyPue2uUmEBvrz9FZtDQw0WgRUrxRVN/Le5+XcPi+JsKDzDj6rBpiosCBuTktg44FSKuqarY6jXEiLgOqVddsKCPATnUTeh61ckEq7zbA+o9DqKMqFtAio86ptauXVrBKWTotnRGSI1XGURVKiB3HFpBG8uPM4Ta368NhAoUVAnddLu45zpq2Duy/Sh8N83bcuGsXpM238bY8+PDZQaBFQ59TS3sHzOwq5cEw0E+MirY6jLDYreQjTEqN4ZlsBHfrw2IDgcBEQkfEisr/LR52IPNhtm0UicrrLNv/pdGLVr97aV0plfQv3LBxldRTlAUSEexeOoqi6iXezy62Oo1zA4W4expgjwHQAEfEHSoE3e9h0qzHmakePo6xjsxme+jSfSXGRXDQ22uo4ykN8fXIsKcPCePKTPJZMjUVEnxnxZq66HHQpkGeM0TFnB5APcyrIr2rk3q+N0hNdfcnfT/jWwlF8VnqajLxqq+MoJ7mqCCwDNpxlXbqIHBCRzSIy2UXHU/3gqU/zSRgSylVT46yOojzMjTMTiA4P4slP862OopzkdBEQkSBg
KfBaD6v3AsnGmGnAn4G3zvF17hGRLBHJqqrSgaqstruwhj1Fp/jWRaMI0JnDVDchgf6sXJDKp0erOFR22uo4ygmuOLuvBPYaYyq6rzDG1BljGuyv3wECRaTHi8vGmDXGmDRjTFpMTIwLYilnPPVJHkMHBXFzWqLVUZSHun1uMoOC/HnqE20NeDNXFIHlnOVSkIjEiv1isojMsR9PLyJ6uMPldXyYU8md6SmEBvlbHUd5qMFhgdw6N4lNn53geLXOQ+ytnCoCIhIGXA680WXZfSJyn/3tTUC2iBwAHgOWGWO0c7GHe+LjPAYF+XPn/GSroygPd/dFo/AXYfUneVZHUQ5yqggYY5qMMcOMMae7LHvSGPOk/fXjxpjJxphpxph5xpgdzgZW7lVwspFNB8u4Iz2FqDCdP1id24jIEG6encDre0ooP60Dy3kjveOnvmL1llwC/f1YdaEOEaF6596Fo+kwhjXaU8graRFQXyqtPcMbe0tZPieJmIhgq+MoL5E4NIzrpsfz18wiqhtarI6j+kiLgPrSmk/yEEGHiFB99u2LR9PSbmPddp10xttoEVAAVNY18/LuYm6YkcDIqFCr4ygvMzomnCVT4li/o4japlar46g+0CKgAFj9SR7tNsMDF4+xOoryUt+9dAz1Le2s0wnpvYoWAUVFXTMv7TrOjTPjSRoWZnUc5aUmxEayZGos67YXamvAi2gRUKzekofNZvjOxWOtjqK83PcvHUdDS7tOSO9FtAj4uPLTzfw18zg3zkzQVoBy2vjYCK6aGsdzOwo51aitAW+gRcDHrd6S29kKuETvBSjX+N6lY2lsbefpbfrcgDfQIuDDymrPsCGzmJtmJZA4VFsByjXGx0awZGocz20v1OcGvIAWAR/2pw+PAWgrQLncDy4by5m2Dv6yRccU8nRaBHxUXlUDr+0p5ta5SSQM0VaAcq0xwyO4YWYCL+wsoqz2jNVx1DloEfBRf3j/KCGB/toKUG7z4GVjwfxvi1N5Ji0CPii79DSbPjvBqgtTiQ7XMYKUeyQMCePWuUm8tqeYvKoGq+Oos9Ai4IN+994RosIC+ZaOEaTc7DuXjCEk0J8/vH/U6ijqLLQI+JgdeSf59GgV3140msiQQKvjqAEuOjyYuy9MZdNnJzhQXGt1HNUDLQI+xGYz/GZTDvFRoaxIT7E6jvIR31o4iujwIH6zKQedWNDzaBHwIW/sK+VQWR0/XjyekECdO1j1j4iQQB68bByZhTW8d6jC6jiqGy0CPuJMawf/894RLkgYzDUXjLQ6jvIxy2YnMmZ4OA9vzqG13WZ1HNWFsxPNF4rIZyKyX0SyelgvIvKYiOSKyEERmenM8ZTjnt6aT3ldMz9bMhE/P7E6jvIxAf5+/NuSCRRWN/HSriKr46guXNESuNgYM90Yk9bDuiuBsfaPe4DVLjie6qPK+mae/CSPKyaNYO6oYVbHUT7q4vHDWTBmGH/66Binm9qsjqPs3H056Fpgvem0E4gSkTg3H1N18/82H6G1w8a/LplodRTlw0SEny2ZRN2ZNv74oXYZ9RTOFgEDvC8ie0Tknh7WxwPFXd6X2Jf9HyJyj4hkiUhWVVWVk7HUF/YUneL1vSWsunAUqdGDrI6jfNykkZHcNjeZ9RmF5JyoszqOwvkisMAYM5POyz4PiMjCbut7uvjcYx8xY8waY0yaMSYtJibGyVgKoMNm+PnGbEZEBvNdHR5CeYh/uWIcg0MD+fnGQ9pl1AM4VQSMMWX2z5XAm8CcbpuUAIld3icAZc4cU/XeK7uLyS6t49+WTGRQcIDVcZQCICosiIe+Pp7MghrePnjC6jg+z+EiICKDRCTii9fAFUB2t802AivsvYTmAaeNMfq/3g9qm1r5/XuHmZM6lKXTtEuo8izLZicxJT6S/96UQ2NLu9VxfJozLYERwDYROQBkApuMMe+KyH0icp99m3eAfCAXWAt826m0qtce3nyYuuZ2frl0MiLaJVR5Fn8/4ZdL
p1Be18wfP9CbxFZy+BqBMSYfmNbD8ie7vDbAA44eQzlmV341L+8u5t6Fo5gYF2l1HKV6NCt5CLfOTWLd9gKunR7P1ITBVkfySfrE8ADT0t7Bv775GQlDQvn+ZWOtjqPUOf1k8QSGhQfzr28epL1DnyS2ghaBAWb1ljzyqxr59XVTCAvSm8HKsw0ODeQX10wmu7SO53YUWh3HJ2kRGEByK+v5y8d5LJ02kkXjh1sdR6leWTI1lksnDOeR949SXNNkdRyfo0VggGjvsPHQawcJC/bnP66eZHUcpXpNRPjVdVPwE/jpGwex2fTZgf6kRWCAWLM1n/3Ftfzq2inEROiUkcq7xEeF8u9XT2J7brUOMNfPtAgMAEfK63n0g2MsmRrLNRfo0EzKOy2bncjXxsXw3+8cpqi60eo4PkOLgJdr67Dxw1f3ExESwH9dO0WfCVBeS0R4+MapBPgLP3pNLwv1Fy0CXu6xj45xqKyO/75hKsPC9TKQ8m5xg0P5xTWTySysYe3WfKvj+AQtAl4sI6+axz/O5aZZCXx9cqzVcZRyiRtmxnPllFh+/94RnZy+H2gR8FI1ja08+Mo+UqMH8culk62Oo5TLiAgP33ABIyJD+O6GfdQ36wQ07qRFwAsZY/jx3w5wqrGNx5bN0BFC1YAzOCyQPy2bTsmpJv79rWwdctqNtAh4oWe3F/JhTiU/vXICU+J1vBU1MKWlDOXBy8bx9/1lvLanxOo4A5YWAS+TWVDDf7+Tw2UTR7ByQYrVcZRyqwcuHsP80cP497eyyS49bXWcAUmLgBepqGvm2y/tJXFoGH+4ZZp2B1UDnr+f8OflM4geFMS9L+yhprHV6kgDjhYBL9HabuP+F/fQ1NrOU3fMIjIk0OpISvWLYeHBPHnHLKoaWvjehn106PMDLqVFwAsYY/j5xkPsPV7L72+axrgREVZHUqpfXZAQxa+vncK23JM8vDnH6jgDinYr8QJrt+azIfM49y8azVU6LITyUTfPTiS77DRrtxaQGh3OrXOTrI40IGgR8HDvZpfz282HuWpqHD+6YrzVcZSy1H9ePYnjNU38x9+zSRgSysJxMVZH8nrOTDSfKCIfi0iOiBwSke/3sM0iETktIvvtH//pXFzfcqC4lgdf2cf0xCgeuXkafn56I1j5tgB/Px6/dSZjh4fzwEt7OVJeb3Ukr+fMPYF24F+MMROBecADItLTQPZbjTHT7R+/cuJ4PiW3soGVz+0mOjyYtSvSCAn0tzqSUh4hPDiAdXfNJjTInxXrdulENE5yuAgYY04YY/baX9cDOUC8q4L5spJTTdzxzC78RHhh1VyidWA4pb5iZFQo61fNobnNxu3P7KKyrtnqSF7LJb2DRCQFmAHs6mF1uogcEJHNInLWQW5E5B4RyRKRrKqqKlfE8kpV9S3c/vQuGlvaeWHVHFKjB1kdSSmPNCE2kmdXzqaqvoU7nsmktkmfIXCE00VARMKB14EHjTF13VbvBZKNMdOAPwNvne3rGGPWGGPSjDFpMTG+ebPniwJQUdfCsyvnMDEu0upISnm0mUlDWLsijYKTjVoIHORUERCRQDoLwEvGmDe6rzfG1BljGuyv3wECRSTamWMOVOWnm7llTQbHa5p45s40ZiUPsTqSUl5hwZhonrxjJkfK61m+dhfVDS1WR/IqzvQOEuAZIMcY84ezbBNr3w4RmWM/XrWjxxyoSmvPcMuaDCpON/P8N+cwf4zWSaX64pIJI3j6zjTyqxpYvnYnlfV6j6C3nGkJLADuAC7p0gV0iYjcJyL32be5CcgWkQPAY8Ayo2PCfsXRinq+sXoHNY2tvHD3XOakDrU6klJeaeG4GJ5dOZvimjPc/GSGzlPcS+KJv5PT0tJMVlaW1THcbmd+NfeszyI40J9n75qtw0Ir5QJ7ik6x6vnd+IvwzF2zmZ4YZXWkfiEie4wxaX3dT8cOssjGA2WseCaTmIhg3rh/vhYApVxkVvIQXr9/PmHB/ixbk8EH
n1dYHcmjaRHoZx02w+/ePcz3NuxjWuJgXr9/PolDw6yOpdSAMjomnDfuX8C4ERHc80IWj//zGDYdfbRHWgT6UW1TKyuf281ftuSxfE4SL949l6iwIKtjKTUgxUQE88o96Vw7bST/8/5R7n9pDw0t7VbH8jhaBPrJvuOnuObxbezMq+a3N0zltzdMJThAh4JQyp1Cg/z54y3T+Y+rJ/FhTiVLH9/GoTKdoawrLQJu1mEz/PmjY9z0ZAY2G7x87zyWz9EhcJXqLyLCqgtTeXHVXBpb2rn+iR08vTVfLw/ZaRFwo4KTjSxfs5NHPjjK1RfEsfnBi5iZpA+BKWWF9NHDePf7C1k0PoZfb8phxbpMSk7p4HPaRdQN2jpsrN2az6MfHiM4wI9fLp3MDTMTrI6llKJzpr4NmcX8ZtPnGOChK8Zz5/wU/L18qHZHu4hqEXCxXfnV/OLtz8k5UceVU2L55dLJDI8MsTqWUqqb0toz/OzNz9hypIppCYP5xdLJzPDilroWAYsdr27it5tz2JxdTtzgEH5+zWQWT4m1OpZS6hyMMWw8UMavN+VQVd/CddNH8uPFExgZFWp1tD5ztAjo9JJOqqhr5i8f57Ihsxh/P+GHl4/jWxeNIjRIe/4o5elEhGunx3PpxBGs3pLL2q0FvHuonDvmJXPv10b7xFwe2hJw0InTZ1j7aQEv7iqiw2a4aWYCD14+lrjB3vcXhFKqU3FNE3/44Ch/319KcIA/K+Yns2pBqldc0tXLQf0ku/Q0a7fms+ngCQxww4x4vnvJWJKG6VO/Sg0UeVUNPPbRMTYeKCPQz4+l00dy90WpTIj13Dk+tAi4UWNLO/84WMaGzGL2F9cSHhzALbMTuWt+ig75oNQAVnCykWe3F/BaVgln2jqYkzKUW2YnsmRqnMdd8tUi4GJtHTa25Z7k7QNlvJddTmNrB2OGh7NsdiI3z04kMiTQ0nxKqf5T29TKy7uLeTnzOIXVTUSEBLBkShzXTBvJvFFDCfC3/pErLQIu0NjSztZjJ/kop4IPcyo41dRGREgAiyfHcsvsRGYlD8E+R45SygcZY9iZX8OrWcW8f6jzj8Po8GAunzScSyeMYMGYaMtaCNo7yAHtHTayy+rYnnuSHXkn2V1witYOGxEhAVwyYThXXzCSheOidYwfpRTQ2ZsoffQw0kcPo7mtg48PV/KPgyd4+8AJNmQWExzgx5zUoSwYE82C0dFMGhnp8Q+h+UxLwBhDRV0L2aWnOVBSy56iU+wvrqWptQOACbERXDgmmksnjiAtZQiBHtC8U0p5h9Z2G7sKqvkop5LtuSc5VtkAQHhwADOSopiZNITpiVFMjo9keIR7ehrp5SA7YwzVja0Unmwkt7KBoxUNHKusJ+dEHScbWgHw9xMmxkUwK2kIs1KGMn/0MJ/oD6yU6h+Vdc3syKsmq6iGPUW1HCmv44vx6oZHBDMhLpJxw8MZNyKC0cPDSY0exJCwQKcuN1tSBERkMfAnwB942hjzcLf1Yl+/BGgC7jLG7D3f13WkCHTYDNf/ZTsFVY3UdxkzPCTQj7HDIxgfG8GUkZFMiR/MpJGRhAX59JUwpVQ/amhp5/OyOrJLT5NdeprD5fXkVTXQ0m77cpvIkAAmxEbyyr3zHCoG/X5PQET8gSeAy4ESYLeIbDTGfN5lsyuBsfaPucBq+2eX8/cTRkUPYkZiFCnRg0gZNojRMeEkDAnFz8OvySmlBrbw4ADmpA5lTurQL5d12AzFNU3kn2yg4GQTRdWNtHXY+r3ziTN/Ds8Bco0x+QAi8jJwLdC1CFwLrDedzY2dIhIlInHGmBNOHPesHl02wx1fVimlXM7fTzr/YI0eZGkOZ+5+xgPFXd6X2Jf1dRsAROQeEckSkayqqionYimllOotZ4pAT22W7jcYerNN50Jj1hhj0owxaTExMU7EUkop1VvOFIESILHL+wSgzIFtlFJKWcSZIrAbGCsiqSISBCwDNnbbZiOwQjrNA067
636AUkqpvnP4xrAxpl1EvgO8R2cX0XXGmEMicp99/ZPAO3R2D82ls4voSucjK6WUchWnOssbY96h8xd912VPdnltgAecOYZSSin30bERlFLKh2kRUEopH+aRYweJSBVQZH8bDZy0MI4zNLs1vDW7t+YGzW6VrtmTjTF97l/vkUWgKxHJcmQ8DE+g2a3hrdm9NTdodqu4IrteDlJKKR+mRUAppXyYNxSBNVYHcIJmt4a3ZvfW3KDZreJ0do+/J6CUUsp9vKEloJRSyk20CCillA+ztAiIyGIROSIiuSLy0x7Wi4g8Zl9/UERm9nZfd3M0u4gkisjHIpIjIodE5PvekLvLen8R2Sci/+i/1F8e25mflygR+ZuIHLZ/79O9KPsP7D8r2SKyQUTcM1O549kniEiGiLSIyEN92dedHM1t9Tlqz+Dw99y+vvfnqTHGkg86B53LA0YBQcABYFK3bZYAm+mcl2AesKu3+3pw9jhgpv11BHC0v7I7k7vL+h8CfwX+4S0/L/Z1zwN3218HAVHekJ3OSZgKgFD7+1fpnKvbk7IPB2YDvwEe6su+HprbsnPU2exd1vf6PLWyJfDl9JTGmFbgi+kpu/pyekpjzE4gSkTiermvR2Y3xpwwxuwFMMbUAzmcZbY1T8oNICIJwFXA0/2UtyuHs4tIJLAQeAbAGNNqjKn1huz2dQFAqIgEAGH075wc581ujKk0xuwG2vq6rxs5nNvicxSc+573+Ty1sgg4Mz1lr6etdBOXTK0pIinADGCX6yP2yNncjwI/BmxuyncuzmQfBVQBz9qbyE+LSH9O7OpwdmNMKfA/wHHgBJ1zcrzvxqzdOXOuWXmeuuTYFpyj4Hz2R+nDeWplEXBmespeT1vpJk5PrSki4cDrwIPGmDoXZjsXh3OLyNVApTFmj+tj9Yoz3/MAYCaw2hgzA2gE+vP6tDPf9yF0/hWYCowEBonI7S7Ody7OnGtWnqdOH9uicxScyO7IeWplEXBmekqrp610ampNEQmk84frJWPMG27M2Z0zuRcAS0WkkM7m6SUi8qL7ov4fzv68lBhjvvhr7m90FoX+4kz2y4ACY0yVMaYNeAOY78as3Tlzrll5njp1bAvPUXAue9/P0/662dHDjYsAIJ/Ov3C+uPkxuds2V/HVm2WZvd3Xg7MLsB541Ju+5922WUT/3xh2KjuwFRhvf/0L4PfekB2YCxyi816A0HmD+7uelL3Ltr/gqzdYLTtPncxt2TnqbPZu63p1nvb7P7BbyCV03nnPA35mX3YfcF+X/4wn7Os/A9LOta83ZAcupLNpdxDYb/9Y4um5Hfnh8qTswHQgy/59fwsY4kXZfwkcBrKBF4BgD8seS+dfr3VArf115Nn29fTcVp+jzn7Pu3yNXp2nOmyEUkr5MH1iWCmlfJgWAaWU8mFaBJRSyodpEVBKKR+mRUAppXyYFgGllPJhWgSUUsqH/X/kZL9RtUXIuAAAAABJRU5ErkJggg==\n",
222 | "text/plain": [
223 | ""
224 | ]
225 | },
226 | "metadata": {
227 | "needs_background": "light"
228 | },
229 | "output_type": "display_data"
230 | }
231 | ],
232 | "source": [
233 | "# Sampling distribution of sample proportion for the difference between men and women\n",
234 | "\n",
235 | "mu = p1-p2\n",
236 | "variance = p1*(1-p1)/n + p2*(1-p2)/n\n",
237 | "sigma = math.sqrt(variance)\n",
238 | "x = np.linspace(mu - 3*sigma, mu + 3*sigma, 100)\n",
239 | "sns.lineplot(x = x, y = norm.pdf(x, mu, sigma));"
240 | ]
241 | },
242 | {
243 | "cell_type": "markdown",
244 | "id": "magnetic-positive",
245 | "metadata": {},
246 | "source": [
247 | "The confidence interval that we want to calculate states that there is a 95% chance that $p_1 - p_2$ is within $1.96 \\cdot \\sigma_{\\bar p_1 - \\bar p_2}$ of the observed difference $\\bar p_1 - \\bar p_2$."
248 | ]
249 | },
250 | {
251 | "cell_type": "code",
252 | "execution_count": 15,
253 | "id": "local-longer",
254 | "metadata": {},
255 | "outputs": [
256 | {
257 | "data": {
258 | "text/plain": [
259 | "1.959963984540054"
260 | ]
261 | },
262 | "execution_count": 15,
263 | "metadata": {},
264 | "output_type": "execute_result"
265 | }
266 | ],
267 | "source": [
268 | "# Calculating the number of standard deviations required for a 95% interval\n",
269 | "\n",
270 | "norm.ppf(0.975)"
271 | ]
272 | },
273 | {
274 | "cell_type": "markdown",
275 | "id": "usual-breach",
276 | "metadata": {},
277 | "source": [
278 | "Let's calculate our interval."
279 | ]
280 | },
281 | {
282 | "cell_type": "code",
283 | "execution_count": 16,
284 | "id": "square-answer",
285 | "metadata": {},
286 | "outputs": [
287 | {
288 | "data": {
289 | "text/plain": [
290 | "0.042540701104107376"
291 | ]
292 | },
293 | "execution_count": 16,
294 | "metadata": {},
295 | "output_type": "execute_result"
296 | }
297 | ],
298 | "source": [
299 | "np.sqrt(variance) * norm.ppf(0.975)"
300 | ]
301 | },
302 | {
303 | "cell_type": "markdown",
304 | "id": "living-editor",
305 | "metadata": {},
306 | "source": [
307 | "Now we know that there is a 95% chance that the true difference of the proportions is within 0.04254 of the actual difference of the sample proportions."
308 | ]
309 | },
310 | {
311 | "cell_type": "code",
312 | "execution_count": 17,
313 | "id": "closing-greene",
314 | "metadata": {},
315 | "outputs": [
316 | {
317 | "name": "stdout",
318 | "output_type": "stream",
319 | "text": [
320 | "The 95% CI is: [0.027,0.113]\n"
321 | ]
322 | }
323 | ],
324 | "source": [
325 | "print(f'The 95% CI is: [{np.round(p1-p2 - np.sqrt(variance) * norm.ppf(0.975), 3)},{np.round(p1-p2 + np.sqrt(variance) * norm.ppf(0.975), 3)}]')"
326 | ]
327 | },
328 | {
329 | "cell_type": "markdown",
330 | "id": "continuing-devices",
331 | "metadata": {},
332 | "source": [
333 | "The candidate can conclude, with 95% confidence, that men are more likely to vote for him than women. Notice that the value 0 (no difference) is not contained in the interval."
334 | ]
335 | },
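{
"cell_type": "markdown",
"id": "sketch-ci-helper-note",
"metadata": {},
"source": [
"The interval above can be wrapped in a small helper. This is only a minimal sketch: the function name `diff_proportion_ci` and its parameters are introduced here for illustration and are not part of the original analysis. It assumes independent samples, sample sizes large enough for the normal approximation, and that `np`, `norm`, `p1`, `p2` and `n` are defined as in the cells above."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-ci-helper-code",
"metadata": {},
"outputs": [],
"source": [
"def diff_proportion_ci(p1, p2, n1, n2, confidence=0.95):\n",
"    # Standard error of the difference between two independent sample proportions\n",
"    se = np.sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2)\n",
"    # Two-sided critical value, e.g. about 1.96 for 95% confidence\n",
"    z_star = norm.ppf(1 - (1 - confidence)/2)\n",
"    diff = p1 - p2\n",
"    return diff - z_star*se, diff + z_star*se\n",
"\n",
"# With equal sample sizes this uses the same formula as the interval printed above\n",
"diff_proportion_ci(p1, p2, n, n)"
]
},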
336 | {
337 | "cell_type": "markdown",
338 | "id": "phantom-turning",
339 | "metadata": {},
340 | "source": [
341 | "# Hypothesis test comparing population proportions"
342 | ]
343 | },
344 | {
345 | "cell_type": "markdown",
346 | "id": "broken-prevention",
347 | "metadata": {},
348 | "source": [
349 | "We can be even more direct in our approach. We can define a hypothesis test to evaluate whether there is in fact a difference.\n",
350 | "\n",
351 | "We start by defining our null hypothesis, which represents the no difference scenario. Conversely, our alternative hypothesis states that there is a difference.\n",
352 | "\n",
353 | "$$\\begin{align}\n",
354 | "H_0: p_1 - p_2 = 0 \\\\\n",
355 | "H_a: p_1 - p_2 \\neq 0\n",
356 | "\\end{align}$$\n",
357 | "\n",
358 | "Thus, we want to calculate $P(\\bar p_1 - \\bar p_2 | H_0)$. If this probability is less than the significance level defined for the problem, we reject our null hypothesis and accept that there is a difference between how men and women vote for the candidate. In other words, assuming that the true proportions are equal, we want to know the probability of getting a difference at least as extreme as the observed value of $\\bar p_1 - \\bar p_2$.\n",
359 | "\n",
360 | "$$\\begin{align}\n",
361 | "p_0 &= \\frac{n_1 \\cdot p_1 + n_2 \\cdot p_2}{n_1 + n_2} \\\\\n",
362 | "n &= n_1 = n_2 \\\\\n",
363 | "z &= \\frac{(\\bar p_1 - \\bar p_2) - 0}{\\sqrt{2 \\cdot \\frac{p_0(1-p_0)}{n}}}\n",
364 | "\\end{align}$$"
365 | ]
366 | },
367 | {
368 | "cell_type": "code",
369 | "execution_count": 28,
370 | "id": "everyday-short",
371 | "metadata": {},
372 | "outputs": [
373 | {
374 | "data": {
375 | "text/plain": [
376 | "3.2167337783899304"
377 | ]
378 | },
379 | "execution_count": 28,
380 | "metadata": {},
381 | "output_type": "execute_result"
382 | }
383 | ],
384 | "source": [
385 | "p0 = (n1*p1+n2*p2)/(n1+n2)\n",
386 | "z = ((p1 - p2) - 0) /(np.sqrt(2*(p0*(1-p0)/n)))\n",
387 | "z"
388 | ]
389 | },
390 | {
391 | "cell_type": "markdown",
392 | "id": "respective-command",
393 | "metadata": {},
394 | "source": [
395 | "We can now compare our z-statistic with the critical value for a significance level of 5%."
396 | ]
397 | },
398 | {
399 | "cell_type": "code",
400 | "execution_count": 34,
401 | "id": "neural-opportunity",
402 | "metadata": {},
403 | "outputs": [
404 | {
405 | "data": {
406 | "text/plain": [
407 | "1.959963984540054"
408 | ]
409 | },
410 | "execution_count": 34,
411 | "metadata": {},
412 | "output_type": "execute_result"
413 | }
414 | ],
415 | "source": [
416 | "z_critical = norm.ppf(0.975)\n",
417 | "z_critical"
418 | ]
419 | },
420 | {
421 | "cell_type": "code",
422 | "execution_count": 35,
423 | "id": "global-plaintiff",
424 | "metadata": {},
425 | "outputs": [
426 | {
427 | "data": {
428 | "text/plain": [
429 | "True"
430 | ]
431 | },
432 | "execution_count": 35,
433 | "metadata": {},
434 | "output_type": "execute_result"
435 | }
436 | ],
437 | "source": [
438 | "z > z_critical"
439 | ]
440 | },
441 | {
442 | "cell_type": "markdown",
443 | "id": "going-russia",
444 | "metadata": {},
445 | "source": [
446 | "Our z-statistic is greater than the critical value. In other words, assuming our null hypothesis, there is only a 5% chance of sampling a z-statistic whose absolute value exceeds the critical value. In this scenario we reject $H_0$, which means that we accept that there is a difference between men and women voting for the candidate."
447 | ]
448 | },
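{
"cell_type": "markdown",
"id": "sketch-p-value-note",
"metadata": {},
"source": [
"The same decision can also be phrased in terms of a p-value. A minimal sketch, assuming `z` and `norm` from the cells above; the names `alpha` and `p_value` are introduced here for illustration only."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-p-value-code",
"metadata": {},
"outputs": [],
"source": [
"alpha = 0.05\n",
"# Two-sided p-value: probability, under H0, of a statistic at least as extreme as |z|\n",
"p_value = 2 * (1 - norm.cdf(abs(z)))\n",
"# True means we reject H0 at the chosen significance level\n",
"p_value < alpha"
]
},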
449 | {
450 | "cell_type": "markdown",
451 | "id": "average-madness",
452 | "metadata": {},
453 | "source": [
454 | "# Statistical Significance of Experiment"
455 | ]
456 | },
457 | {
458 | "cell_type": "markdown",
459 | "id": "unnecessary-jerusalem",
460 | "metadata": {},
461 | "source": [
462 | "In an experiment aimed at studying the effect of fake news on engagement, a group of 500 people was randomly assigned to two different groups. After randomization, each person received a smartphone with only a social media app that they could use to follow the news and updates from his or her friends.\n",
463 | "\n",
464 | "The first group received fake news at least two times a day, while the second group did not receive any. After 30 days, the time spent on the social media app was measured. The conductors of the experiment found that the average time per day spent on the app by the first group was approximately 12 minutes greater than the second group."
465 | ]
466 | },
467 | {
468 | "cell_type": "code",
469 | "execution_count": 42,
470 | "id": "applied-publisher",
471 | "metadata": {},
472 | "outputs": [],
473 | "source": [
474 | "gra = Digraph()\n",
475 | "\n",
476 | "gra.node('a', '500')\n",
477 | "gra.node('b', '#1\\n Fake news')\n",
478 | "gra.node('c', '#2\\n No fake news')\n",
479 | "gra.edges(['ab', 'ac'])"
480 | ]
481 | },
482 | {
483 | "cell_type": "code",
484 | "execution_count": 43,
485 | "id": "genuine-explorer",
486 | "metadata": {},
487 | "outputs": [
488 | {
489 | "data": {
490 | "image/svg+xml": [
491 | "\n",
492 | "\n",
494 | "\n",
496 | "\n",
497 | "\n"
536 | ],
537 | "text/plain": [
538 | ""
539 | ]
540 | },
541 | "execution_count": 43,
542 | "metadata": {},
543 | "output_type": "execute_result"
544 | }
545 | ],
546 | "source": [
547 | "gra"
548 | ]
549 | },
550 | {
551 | "cell_type": "code",
552 | "execution_count": 128,
553 | "id": "rough-interstate",
554 | "metadata": {},
555 | "outputs": [],
556 | "source": [
557 | "g1 = np.random.normal(3, 1, size=250)\n",
558 | "g2 = np.random.normal(2.2, 1, size=250)"
559 | ]
560 | },
561 | {
562 | "cell_type": "code",
563 | "execution_count": 129,
564 | "id": "environmental-williams",
565 | "metadata": {},
566 | "outputs": [
567 | {
568 | "name": "stdout",
569 | "output_type": "stream",
570 | "text": [
571 | "Group 1: 3.0h\n",
572 | "Group 2: 2.2h\n"
573 | ]
574 | }
575 | ],
576 | "source": [
577 | "print('Group 1: ' + str(np.round(g1.mean(),1)) + 'h')\n",
578 | "print('Group 2: ' + str(np.round(g2.mean(),1)) + 'h')"
579 | ]
580 | },
581 | {
582 | "cell_type": "markdown",
583 | "id": "upset-punishment",
584 | "metadata": {},
585 | "source": [
586 | "To be able to understand the significance of this result, we need to re-randomize the results into two new groups and measure the difference between the mean of the new groups. We repeat the simulation 200 times."
587 | ]
588 | },
589 | {
590 | "cell_type": "code",
591 | "execution_count": 130,
592 | "id": "lesbian-utility",
593 | "metadata": {},
594 | "outputs": [],
595 | "source": [
596 | "groups_ind = np.zeros((200, 500))\n",
597 | "for i in range(200):\n",
598 | "    groups_ind[i] = np.random.choice(np.arange(500, dtype=np.int32), size=500, replace=False)"
599 | ]
600 | },
601 | {
602 | "cell_type": "code",
603 | "execution_count": 131,
604 | "id": "incorporated-capitol",
605 | "metadata": {},
606 | "outputs": [],
607 | "source": [
608 | "groups_ind = groups_ind.astype('int32')"
609 | ]
610 | },
611 | {
612 | "cell_type": "code",
613 | "execution_count": 132,
614 | "id": "recreational-overall",
615 | "metadata": {},
616 | "outputs": [],
617 | "source": [
618 | "g = np.concatenate((g1,g2))"
619 | ]
620 | },
621 | {
622 | "cell_type": "code",
623 | "execution_count": 133,
624 | "id": "recovered-tuesday",
625 | "metadata": {},
626 | "outputs": [],
627 | "source": [
628 | "g1_rand = g[groups_ind][:,:250]\n",
629 | "g2_rand = g[groups_ind][:,250:]\n",
630 | "\n",
631 | "g1_rand_mean = g1_rand.mean(axis=1)  # one mean per re-randomization (row)\n",
632 | "g2_rand_mean = g2_rand.mean(axis=1)  # one mean per re-randomization (row)\n",
633 | "\n",
634 | "diff_g = g1_rand_mean - g2_rand_mean"
635 | ]
636 | },
637 | {
638 | "cell_type": "code",
639 | "execution_count": 147,
640 | "id": "young-document",
641 | "metadata": {},
642 | "outputs": [
643 | {
644 | "data": {
645 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAD4CAYAAADrRI2NAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAOwklEQVR4nO3df4wcZ33H8fcndkJQA8VuLsFc7mooES1C/JKh1EQICJGStKpDBQREwVXTOqiiglIobvmrqiqlVYVQq6qNFRCmRTRpCYr50SJjElDlEHBoCEQOGCoRG1uxCdCAKgEm3/6xk+biX7d33tm9vef9kkYz8+zszPfJKZ8dz848m6pCktSOcyZdgCRpvAx+SWqMwS9JjTH4JakxBr8kNWbtpAsYxoUXXlgbN26cdBmSNFXuvvvu71bVzIntUxH8GzduZN++fZMuQ5KmSpJvn6rdSz2S1BiDX5IaY/BLUmMMfklqjMEvSY0x+CWpMQa/JDXG4Jekxhj8ktQYg1/NmJ2bJ8lIp9m5+Ul3S1qyqRiyQRqFw4cOcu2Ne0e6z5uv3zzS/Unj4Bm/JDXG4Jekxhj8ktQYg1+SGmPwS1JjDH5JaozBL0mNMfglqTEGvyQ1xuCXpMYY/JLUGINfkhpj8EtSYwx+SWqMwS9JjTH4JakxBr8kNcbgl6TG9B78SdYk+a8kn+jW1yfZneRAN1/Xdw2SpMeM44z/bcD+BevbgT1VdSmwp1uXJI1Jr8Gf5BLg14GbFjRvAXZ2yzuBa/qsQZL0eH2f8b8P+BPgkQVtF1fVEYBuftGp3phkW5J9SfYdO3as5zK1kszOzZNk5JOkgbV97TjJbwBHq+ruJC9f6vuragewA2DTpk012uq0kh0+dJBrb9w78v3efP3mke9Tmka9BT/wUuA3k1wNnA88Ock/Aw8m2VBVR5JsAI72WIMk6QS9Xeqpqj+tqkuqaiPweuCzVfXbwC5ga7fZVuC2vmqQJJ1sEvfx3wBckeQAcEW3Lkkakz4v9fy/qroDuKNbfgi4fBzHlSSdzCd3JakxBr8kNcbgl6TGGPyS1BiDX5IaY/BLUmMMfklqjMEvSY0x+CWpMQa/JDXG4Jekxhj8ktQYg1+SGmPwS1JjDH5JaozBL0mNMfglqTEGvyQ1xuCXpMYY/JLUGINfkhpj8EtSYwx+SWqMwS9JjTH4JakxBr8kNcbg17LNzs2TZOSTpH6tnXQBml6HDx3k2hv3jny/N1+/eeT7lPQYz/glqTEGvyQ1xuCXpMYY/JLUGINfkhpj8EtSYwx+SWqMwS9JjTH4JakxvQV/kvOTfDHJV5Lcl+TPu/b1SXYnOdDN1/VVgyTpZH2e8f8YeGVVPQ94PnBlkpcA24E9VXUpsKdblySNSW/BXwM/6lbP7aYCtgA7u/adwDV91SBJOlmv1/iTrElyD3AU2F1VdwEXV9URgG5+0Wneuy3JviT7jh071meZktSUXoO/qn5WVc8HLgFenOQ5S3jvjqraVFWbZmZmeqtRklozlrt6quoHwB3AlcCDSTYAdPOj46hBkjTQ5109M0me0i0/EXgVcD+wC9jabbYVuK2vGiRJJ+vzh1g2ADuTrGHwAXNLVX0iyZ3ALUmuAx4AXttjDVK/zlnby6+GPe2SOb5z8IGR71eCHoO/qu4FXnCK9oeAy/s6rjRWjxz3V8g0dXxyV5IaY/BLUmMMfklqjMEvSY0x+CWpMQa/JDXG4Jekxhj8ktQYg1+SGmPwS1JjDH5JasxQwZ/kpcO0SZJWvmHP+P9uyDZJo9CN+jnqaXZuftI90wpwxtE5k/wasBmYSfKOBS89GVjTZ2FS0xz1Uz1abFjm84ALuu2etKD9YeA1fRUlSerPGYO/qj4HfC7JB6vq22OqSZLUo2F/iOUJSXYAGxe+p6pe2UdRkqT+DBv8/wr8I3AT8LP+ypEk9W3Y4D9eVf/QayWSpLEY9nbOjyf5gyQbkqx/dOq1MklS
L4Y949/azd+1oK2AZ4y2HElS34YK/qp6et+FSJLGY6jgT/LmU7VX1YdGW44kqW/DXup50YLl84HLgS8DBr8kTZlhL/X84cL1JD8P/FMvFUmSerXcYZn/F7h0lIVIksZj2Gv8H2dwFw8MBmf7FeCWvoqSJPVn2Gv8f7Ng+Tjw7ao61EM9kqSeDXWppxus7X4GI3SuA37SZ1GSpP4M+wtcrwO+CLwWeB1wVxKHZZakKTTspZ73AC+qqqMASWaAzwD/1ldhkqR+DHtXzzmPhn7noSW8V5K0ggx7xv8fST4NfKRbvxb4VD8lSZL6tNhv7j4TuLiq3pXkt4DLgAB3Ah8eQ32SpBFb7HLN+4AfAlTVrVX1jqr6IwZn++/rtzRJUh8WC/6NVXXviY1VtY/BzzBKkqbMYsF//hlee+IoC5Ekjcdiwf+lJL9/YmOS64C7z/TGJHNJbk+yP8l9Sd7Wta9PsjvJgW6+bvnlS5KWarG7et4OfCzJG3ks6DcB5wGvXuS9x4E/rqovJ3kScHeS3cDvAHuq6oYk24HtwLuXWb8kaYnOGPxV9SCwOckrgOd0zZ+sqs8utuOqOgIc6ZZ/mGQ/MAtsAV7ebbYTuAODX5LGZtjx+G8Hbl/uQZJsBF4A3MXg9tBHPxCOJLnoNO/ZBmwDmJ+fX+6hJUkn6P3p2yQXAB8F3l5VDw/7vqraUVWbqmrTzMxMfwVKUmN6Df4k5zII/Q9X1a1d84NJNnSvbwCOnu79kqTR6y34kwR4P7C/qt674KVdwNZueStwW181SJJONuxYPcvxUuBNwFeT3NO1/RlwA3BLd0voAwyGepYkjUlvwV9V/8lgXJ9Tubyv40qSzsyhlSWpMQa/JDXG4Jekxhj8ktQYg1+SGmPwS1JjDH5JaozBL0mNMfgbMDs3T5KRT5KmU59DNmiFOHzoINfeuHfk+735+s0j36ek/nnGL0mNMfglqTEGv9SSc9b28n3P7Jy/kjdNvMYvteSR437fI8/4Jak1Br8kNcbgl6TGGPyS1BiDX5IaY/BLUmMMfklqjMEvSY0x+CWpMQa/JDXG4Jekxhj8ktQYg1/S2eth1E9H/OyPo3NKOns9jPrpiJ/98Yxfkhpj8EtSYwx+SWqMwb/CzM7Nj/xLMklayC93V5jDhw76JZmkXnnGL0mNMfglqTEGvyQ1xuCXpMb0FvxJPpDkaJKvLWhbn2R3kgPdfF1fx5cknVqfZ/wfBK48oW07sKeqLgX2dOuSpDHqLfir6vPA905o3gLs7JZ3Atf0dXxJ0qmN+xr/xVV1BKCbX3S6DZNsS7Ivyb5jx46NrUBJWu1W7Je7VbWjqjZV1aaZmZlJlyNJq8a4g//BJBsAuvnRMR9fkpo37uDfBWztlrcCt435+JLUvD5v5/wIcCfwrCSHklwH3ABckeQAcEW3Lkkao94GaauqN5zmpcv7OqYkaXEr9stdSVI/DH5JaozBL0mNMfglqTEGvyQ1xuCXpMYY/JLUGINfkhpj8Etamc5ZS5KRT7Nz85Pu2cT19uSuJJ2VR45z7Y17R77bm6/fPPJ9ThvP+CWpMQa/JDXG4Jekxhj8ktQYg1+SGmPwS1JjDH5JaozBv0yzc/O9PFwiSX3zAa5lOnzooA+XSJpKnvFLUmMMfklqjMEvSY0x+CWpMQa/JDXG4Jekxhj8ktSYVR/8PmglSY+36h/g8kErSXq8VX/GL0l6PINfkhpj8EtSY1b9NX5Jepxz1vZyg8aac5/Az37645Hv92mXzPGdgw+MdJ8Gv6S2PHK8txs+puVGEi/1SFJjDH5JaozBL0mNMfglqTETCf4kVyb5epJvJtk+iRokqVVjD/4ka4C/B64Cng28Icmzx12HJLVqEmf8Lwa+WVX/XVU/Af4F2DKBOiSpSamq8R4weQ1wZVX9Xrf+JuBXq+qtJ2y3DdjWrT4L+PpYCx2NC4HvTrqIHtiv6bJa+wWrt2+j6tcvVtXMiY2TeIDrVI/MnfTpU1U7gB39
l9OfJPuqatOk6xg1+zVdVmu/YPX2re9+TeJSzyFgbsH6JcDhCdQhSU2aRPB/Cbg0ydOTnAe8Htg1gTokqUljv9RTVceTvBX4NLAG+EBV3TfuOsZkqi9VnYH9mi6rtV+wevvWa7/G/uWuJGmyfHJXkhpj8EtSYwz+EUmyPsnuJAe6+bpTbDOX5PYk+5Pcl+Rtk6h1qYbpW7fdB5IcTfK1cde4FIsNGZKBv+1evzfJCydR51IN0a9fTnJnkh8neeckalyOIfr1xu7vdG+SvUmeN4k6l2OIvm3p+nVPkn1JLhvJgavKaQQT8NfA9m55O/BXp9hmA/DCbvlJwDeAZ0+69lH0rXvtZcALga9NuuYz9GUN8C3gGcB5wFdO/BsAVwP/zuCZk5cAd0267hH16yLgRcBfAu+cdM0j7NdmYF23fNU0/L2W0LcLeOy72OcC94/i2J7xj84WYGe3vBO45sQNqupIVX25W/4hsB+YHVeBZ2HRvgFU1eeB742ppuUaZsiQLcCHauALwFOSbBh3oUu0aL+q6mhVfQn46SQKXKZh+rW3qr7frX6BwbNB02CYvv2outQHfo5TPOy6HAb/6FxcVUdgEPAMzq5OK8lG4AXAXf2XdtaW1LcVbhY4uGD9ECd/+A6zzUozjTUPY6n9uo7Bv9amwVB9S/LqJPcDnwR+dxQH9jd3lyDJZ4CnnuKl9yxxPxcAHwXeXlUPj6K2szWqvk2BYYYMGWpYkRVmGmsextD9SvIKBsE/muvg/Rt2+JqPAR9L8jLgL4BXne2BDf4lqKrT/gdP8mCSDVV1pLsscPQ0253LIPQ/XFW39lTqko2ib1NimCFDpnFYkWmseRhD9SvJc4GbgKuq6qEx1Xa2lvQ3q6rPJ/mlJBdW1VkN4OalntHZBWztlrcCt524QZIA7wf2V9V7x1jb2Vq0b1NkmCFDdgFv7u7ueQnwP49e6lrBVutQKIv2K8k8cCvwpqr6xgRqXK5h+vbMLjfo7i47Dzj7D7ZJf7O9WibgF4A9wIFuvr5rfxrwqW75Mgb/lLsXuKebrp507aPoW7f+EeAIgy8PDwHXTbr20/TnagZ3VH0LeE/X9hbgLd1yGPxY0LeArwKbJl3ziPr11O7v8jDwg275yZOuewT9ugn4/oL/p/ZNuuYR9u3dwH1dv+4ELhvFcR2yQZIa46UeSWqMwS9JjTH4JakxBr8kNcbgl6TGGPyS1BiDX5Ia83/w+A6+eiXdvAAAAABJRU5ErkJggg==\n",
646 | "text/plain": [
647 | ""
648 | ]
649 | },
650 | "metadata": {
651 | "needs_background": "light"
652 | },
653 | "output_type": "display_data"
654 | }
655 | ],
656 | "source": [
657 | "sns.histplot(diff_g);"
658 | ]
659 | },
660 | {
661 | "cell_type": "markdown",
662 | "id": "paperback-quest",
663 | "metadata": {},
664 | "source": [
665 | "We can see that the number of times that we had a difference of at least 0.2 hours (12 minutes) is quite small. But is it statistically significant?"
666 | ]
667 | },
668 | {
669 | "cell_type": "code",
670 | "execution_count": 137,
671 | "id": "athletic-filename",
672 | "metadata": {},
673 | "outputs": [
674 | {
675 | "data": {
676 | "text/plain": [
677 | "0.024"
678 | ]
679 | },
680 | "execution_count": 137,
681 | "metadata": {},
682 | "output_type": "execute_result"
683 | }
684 | ],
685 | "source": [
686 | "diff_g[diff_g>=0.2].shape[0]/diff_g.shape[0]"
687 | ]
688 | },
689 | {
690 | "cell_type": "markdown",
691 | "id": "residential-uzbekistan",
692 | "metadata": {},
693 | "source": [
694 | "Suppose that a significance level of 5% was established prior to the start of the experiment. In that case, we see that our result is statistically significant, as the probability of observing a difference of at least 12 minutes in the 200 simulations with the re-randomized groups is only 2.4%. If this were due to chance alone, it would happen in only roughly 5 of 200 re-randomizations."
695 | ]
696 | },
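{
"cell_type": "markdown",
"id": "sketch-rerandomization-note",
"metadata": {},
"source": [
"The re-randomization steps above can be collected into one function. This is a minimal sketch rather than the notebook's original code: the function name `rerandomization_p_value`, its parameters, and the seed are assumptions introduced for illustration. It reuses `g1`, `g2`, and `np` from the cells above and computes one difference of means per shuffle."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-rerandomization-code",
"metadata": {},
"outputs": [],
"source": [
"def rerandomization_p_value(a, b, threshold, n_sim=200, seed=0):\n",
"    rng = np.random.default_rng(seed)  # arbitrary seed, only for reproducibility\n",
"    pooled = np.concatenate((a, b))\n",
"    diffs = np.empty(n_sim)\n",
"    for i in range(n_sim):\n",
"        # Shuffle all observations and split them into two new groups\n",
"        shuffled = rng.permutation(pooled)\n",
"        diffs[i] = shuffled[:len(a)].mean() - shuffled[len(a):].mean()\n",
"    # Share of re-randomizations with a difference at least as large as the threshold\n",
"    return np.mean(diffs >= threshold)\n",
"\n",
"rerandomization_p_value(g1, g2, threshold=0.2)"
]
},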
697 | {
698 | "cell_type": "markdown",
699 | "id": "verified-asian",
700 | "metadata": {},
701 | "source": [
702 | "# Conclusion\n",
703 | "\n",
704 | "In this article, we covered how confidence intervals and hypothesis tests can be applied to compare differences between samples from two populations. They give us a way to understand if the differences are actually statistically significant. We applied the ideas to an experiment that uses simulation and re-randomization to test the difference between a treatment and a control group."
705 | ]
706 | },
707 | {
708 | "cell_type": "markdown",
709 | "id": "optical-certification",
710 | "metadata": {},
711 | "source": [
712 | "# Exercises\n",
713 | "\n",
714 | "You will get the solutions in next week's article."
715 | ]
716 | },
717 | {
718 | "cell_type": "markdown",
719 | "id": "fallen-broadway",
720 | "metadata": {},
721 | "source": [
722 | "1. Physicians hypothesized that the vaccine changed the mean time spent in the hospital due to Covid-19. A group of 1,000 patients was randomized between a treatment group and a control group. The treatment group had already taken the vaccine while the control group had not. The results show that the mean time spent in the hospital by the treatment group was 10 days less than the time spent by the control group. The table below summarizes the results for the 1,000 re-randomizations of the data. Based on the data, what is the probability that the treatment group's mean is smaller than the control group's by 10 days or more? What can you conclude from the experiment's result (assuming a 5% significance level)?"
723 | ]
724 | },
725 | {
726 | "cell_type": "code",
727 | "execution_count": 160,
728 | "id": "quick-correlation",
729 | "metadata": {},
730 | "outputs": [],
731 | "source": [
732 | "diff = [[-17.5,1],\n",
733 | "[-15.0, 6],\n",
734 | "[-12.5, 15],\n",
735 | "[-10.0, 41],\n",
736 | "[-7.5, 82],\n",
737 | "[-5.0, 43],\n",
738 | "[-2.5, 150],\n",
739 | "[0., 167],\n",
740 | "[2.5, 132],\n",
741 | "[5.0, 127],\n",
742 | "[7.5, 173],\n",
743 | "[10.0, 38],\n",
744 | "[12.5, 18],\n",
745 | "[15.0, 6],\n",
746 | "[17.5, 1]]"
747 | ]
748 | },
749 | {
750 | "cell_type": "markdown",
751 | "id": "floating-malawi",
752 | "metadata": {},
753 | "source": [
754 | "### Answers from last week"
755 | ]
756 | },
757 | {
758 | "cell_type": "markdown",
759 | "id": "unsigned-columbia",
760 | "metadata": {},
761 | "source": [
762 | "1. According to a large poll from last year, about 85% of houses in Porto have access to high-speed internet. Marco wondered if the proportion had changed and took a random sample of 80 houses and found that 75 had access to high-speed internet. He wants to use this sample data to test if the proportion actually changed. Assuming that the conditions for inference were met, what would you conclude about the proportion of houses with high-speed internet considering a significance level of 1%?"
763 | ]
764 | },
765 | {
766 | "cell_type": "code",
767 | "execution_count": 437,
768 | "id": "stupid-medicare",
769 | "metadata": {},
770 | "outputs": [
771 | {
772 | "data": {
773 | "text/plain": [
774 | "2.191785018798024"
775 | ]
776 | },
777 | "execution_count": 437,
778 | "metadata": {},
779 | "output_type": "execute_result"
780 | }
781 | ],
782 | "source": [
783 | "p_0 = 0.85 # proportion under the null hypothesis\n",
784 | "p = 75/80 # sample proportion\n",
785 | "n = 80 # sample size\n",
786 | "α = 0.01\n",
787 | "\n",
788 | "z = (p-p_0)/np.sqrt(p_0*(1-p_0)/n)\n",
789 | "z"
790 | ]
791 | },
792 | {
793 | "cell_type": "code",
794 | "execution_count": 438,
795 | "id": "fifth-sunday",
796 | "metadata": {},
797 | "outputs": [],
798 | "source": [
799 | "p_value = 2 * (1 - norm.cdf(abs(z))) # two-tailed test: Marco asks whether the proportion changed in either direction"
800 | ]
801 | },
802 | {
803 | "cell_type": "code",
804 | "execution_count": 439,
805 | "id": "constant-poultry",
806 | "metadata": {},
807 | "outputs": [
808 | {
809 | "name": "stdout",
810 | "output_type": "stream",
811 | "text": [
812 | "Fail to Reject H0\n"
813 | ]
814 | }
815 | ],
816 | "source": [
817 | "if p_value < α:\n",
818 | "    print(\"Reject H0\")\n",
819 | "else:\n",
820 | "    print(\"Fail to Reject H0\")"
821 | ]
822 | },
823 | {
824 | "cell_type": "markdown",
825 | "id": "great-passport",
826 | "metadata": {},
827 | "source": [
828 | "2. Marta owns a fruit store and receives watermelons weekly. The supplier states that they are supposed to weigh 1kg. Marta decides to weigh a random sample of 100 watermelons and finds a mean weight of 850g and a standard deviation of 200g. She wants to use this sample data to test if the mean is smaller than the one claimed by the supplier and, if this is the case, renegotiate their contract. Assuming that the conditions for inference were met, what should Marta do (consider a significance level of 5%)?"
829 | ]
830 | },
831 | {
832 | "cell_type": "code",
833 | "execution_count": 453,
834 | "id": "toxic-assumption",
835 | "metadata": {},
836 | "outputs": [
837 | {
838 | "data": {
839 | "text/plain": [
840 | "-7.500000000000001"
841 | ]
842 | },
843 | "execution_count": 453,
844 | "metadata": {},
845 | "output_type": "execute_result"
846 | }
847 | ],
848 | "source": [
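849 | "μ_0 = 1 # mean weight claimed by the supplier (kg)\n",
850 | "μ = 0.850 # sample mean weight (kg)\n",
851 | "s = 0.2 # sample standard deviation (kg)\n",
852 | "n = 100 # sample size\n",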
853 | "α = 0.05\n",
854 | "\n",
855 | "t_star = (μ-μ_0)/(s/np.sqrt(n))\n",
856 | "t_star"
857 | ]
858 | },
859 | {
860 | "cell_type": "code",
861 | "execution_count": 454,
862 | "id": "theoretical-butterfly",
863 | "metadata": {},
864 | "outputs": [],
865 | "source": [
866 | "p_value = t.cdf(t_star, df=n-1) # left-tailed test: the alternative is that the mean is smaller than 1 kg"
867 | ]
868 | },
869 | {
870 | "cell_type": "code",
871 | "execution_count": 456,
872 | "id": "spread-season",
873 | "metadata": {},
874 | "outputs": [
875 | {
876 | "name": "stdout",
877 | "output_type": "stream",
878 | "text": [
879 | "Reject H0\n"
880 | ]
881 | }
882 | ],
883 | "source": [
884 | "if p_value < α:\n",
885 | "    print(\"Reject H0\")\n",
886 | "else:\n",
887 | "    print(\"Fail to Reject H0\")"
888 | ]
889 | },
890 | {
891 | "cell_type": "markdown",
892 | "id": "bacterial-cable",
893 | "metadata": {},
894 | "source": [
895 | "Because the p-value is below the 5% significance level, we reject the null hypothesis that the mean weight is 1 kg in favor of the alternative that it is smaller. Marta has statistical evidence to renegotiate the contract with the supplier."
896 | ]
897 | },
898 | {
899 | "cell_type": "code",
900 | "execution_count": null,
901 | "id": "large-limit",
902 | "metadata": {},
903 | "outputs": [],
904 | "source": []
905 | }
906 | ],
907 | "metadata": {
908 | "kernelspec": {
909 | "display_name": "Python 3",
910 | "language": "python",
911 | "name": "python3"
912 | },
913 | "language_info": {
914 | "codemirror_mode": {
915 | "name": "ipython",
916 | "version": 3
917 | },
918 | "file_extension": ".py",
919 | "mimetype": "text/x-python",
920 | "name": "python",
921 | "nbconvert_exporter": "python",
922 | "pygments_lexer": "ipython3",
923 | "version": "3.8.5"
924 | }
925 | },
926 | "nbformat": 4,
927 | "nbformat_minor": 5
928 | }
929 |
--------------------------------------------------------------------------------