├── AB_Testing.ipynb
├── Conversion Rate.ipynb
├── Employee_Retention_PeopleAnalytics.ipynb
├── Identify_Fraudulent_Activities.ipynb
├── Machine_Learning_Algorithms_Python.ipynb
├── README.md
└── raw-data
    └── readme
/AB_Testing.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# A/B Test"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "A/B testing is a controlled experiment with two variants, A and B: a control group and an experiment group. It is a hypothesis test that checks whether there is a statistically or practically significant difference between the control and experiment groups.\n",
15 | "A/B testing plays a vital role in website optimization."
16 | ]
17 | },
18 | {
19 | "cell_type": "markdown",
20 | "metadata": {},
21 | "source": [
22 | "## Goal:\n",
23 | "### 1. Analyze results from an A/B Test\n",
24 | "### 2. Design an algorithm to automate some steps"
25 | ]
26 | },
27 | {
28 | "cell_type": "markdown",
29 | "metadata": {},
30 | "source": [
31 | "#### Problem description:\n",
32 | "Company XYZ is a worldwide e-commerce company, and its Spain-based users have a much higher conversion rate than users in any other Spanish-speaking country. The websites for all Spanish-speaking countries were translated by a Spaniard.\n",
33 | "The company hypothesizes that websites translated by local people will have a higher conversion rate. Therefore, it designed an A/B test to test this hypothesis."
34 | ]
35 | },
36 | {
37 | "cell_type": "code",
38 | "execution_count": 348,
39 | "metadata": {},
40 | "outputs": [],
41 | "source": [
42 | "import pandas as pd\n",
43 | "import numpy as np\n",
44 | "from scipy import stats\n",
45 | "import matplotlib.pyplot as plt\n",
46 | "%matplotlib inline\n",
47 | "import seaborn as sns"
48 | ]
49 | },
50 | {
51 | "cell_type": "code",
52 | "execution_count": 288,
53 | "metadata": {},
54 | "outputs": [],
55 | "source": [
56 | "# load the two tables into pandas DataFrames\n",
57 | "test = pd.read_csv(r'C:\\Users\\lshen\\Downloads\\Translation_Test\\test_table.csv')\n",
58 | "user = pd.read_csv(r'C:\\Users\\lshen\\Downloads\\Translation_Test\\user_table.csv')"
59 | ]
60 | },
61 | {
62 | "cell_type": "markdown",
63 | "metadata": {},
64 | "source": [
65 | "### Step 1: Data Exploration"
66 | ]
67 | },
68 | {
69 | "cell_type": "code",
70 | "execution_count": 289,
71 | "metadata": {},
72 | "outputs": [
73 | {
74 | "data": {
170 | "text/plain": [
171 | " user_id date source device browser_language ads_channel \\\n",
172 | "0 315281 2015-12-03 Direct Web ES NaN \n",
173 | "1 497851 2015-12-04 Ads Web ES Google \n",
174 | "2 848402 2015-12-04 Ads Web ES Facebook \n",
175 | "3 290051 2015-12-03 Ads Mobile Other Facebook \n",
176 | "4 548435 2015-11-30 Ads Web ES Google \n",
177 | "\n",
178 | " browser conversion test \n",
179 | "0 IE 1 0 \n",
180 | "1 IE 0 1 \n",
181 | "2 Chrome 0 0 \n",
182 | "3 Android_App 0 1 \n",
183 | "4 FireFox 0 1 "
184 | ]
185 | },
186 | "execution_count": 289,
187 | "metadata": {},
188 | "output_type": "execute_result"
189 | }
190 | ],
191 | "source": [
192 | "test.head()"
193 | ]
194 | },
195 | {
196 | "cell_type": "code",
197 | "execution_count": 290,
198 | "metadata": {},
199 | "outputs": [
200 | {
201 | "data": {
267 | "text/plain": [
268 | " user_id sex age country\n",
269 | "0 765821 M 20 Mexico\n",
270 | "1 343561 F 27 Nicaragua\n",
271 | "2 118744 M 23 Colombia\n",
272 | "3 987753 F 27 Venezuela\n",
273 | "4 554597 F 20 Spain"
274 | ]
275 | },
276 | "execution_count": 290,
277 | "metadata": {},
278 | "output_type": "execute_result"
279 | }
280 | ],
281 | "source": [
282 | "user.head()"
283 | ]
284 | },
285 | {
286 | "cell_type": "code",
287 | "execution_count": 291,
288 | "metadata": {},
289 | "outputs": [
290 | {
291 | "name": "stdout",
292 | "output_type": "stream",
293 | "text": [
294 | "Total number of user_id: 453321\n",
295 | "Number of unique user_id: 453321\n"
296 | ]
297 | }
298 | ],
299 | "source": [
300 | "# check whether user_id is unique in the test table: yes, each user_id has exactly one record\n",
301 | "print('Total number of user_id: {}'.format(test.user_id.size))\n",
302 | "print('Number of unique user_id: {}'.format(test.user_id.nunique()))"
303 | ]
304 | },
305 | {
306 | "cell_type": "code",
307 | "execution_count": 292,
308 | "metadata": {},
309 | "outputs": [
310 | {
311 | "name": "stdout",
312 | "output_type": "stream",
313 | "text": [
314 | "Total records in test table: 453321\n",
315 | "Total records in user table: 452867\n"
316 | ]
317 | }
318 | ],
319 | "source": [
320 | "print ('Total records in test table: {}'.format(len(test)))\n",
321 | "print ('Total records in user table: {}'.format(len(user)))"
322 | ]
323 | },
324 | {
325 | "cell_type": "markdown",
326 | "metadata": {},
327 | "source": [
328 | "The counts above show that some user_id values in the test table do not exist in the user table (453321 vs. 452867 records). Because the analysis is broken down by country, which is a very important variable, we will drop the records that lack demographic information."
329 | ]
330 | },
331 | {
332 | "cell_type": "code",
333 | "execution_count": 293,
334 | "metadata": {},
335 | "outputs": [],
336 | "source": [
337 | "# inner-join the two tables on user_id, keeping only the records that have demographic info\n",
338 | "data = test.merge(user, how='inner', on='user_id')"
339 | ]
340 | },
341 | {
342 | "cell_type": "code",
343 | "execution_count": 294,
344 | "metadata": {},
345 | "outputs": [
346 | {
347 | "data": {
461 | "text/plain": [
462 | " user_id date source device browser_language ads_channel \\\n",
463 | "0 315281 2015-12-03 Direct Web ES NaN \n",
464 | "1 497851 2015-12-04 Ads Web ES Google \n",
465 | "2 848402 2015-12-04 Ads Web ES Facebook \n",
466 | "3 290051 2015-12-03 Ads Mobile Other Facebook \n",
467 | "4 548435 2015-11-30 Ads Web ES Google \n",
468 | "\n",
469 | " browser conversion test sex age country \n",
470 | "0 IE 1 0 M 32 Spain \n",
471 | "1 IE 0 1 M 21 Mexico \n",
472 | "2 Chrome 0 0 M 34 Spain \n",
473 | "3 Android_App 0 1 F 22 Mexico \n",
474 | "4 FireFox 0 1 M 19 Mexico "
475 | ]
476 | },
477 | "execution_count": 294,
478 | "metadata": {},
479 | "output_type": "execute_result"
480 | }
481 | ],
482 | "source": [
483 | "data.head()"
484 | ]
485 | },
486 | {
487 | "cell_type": "code",
488 | "execution_count": 295,
489 | "metadata": {},
490 | "outputs": [
491 | {
492 | "data": {
493 | "text/plain": [
494 | "(452867, 12)"
495 | ]
496 | },
497 | "execution_count": 295,
498 | "metadata": {},
499 | "output_type": "execute_result"
500 | }
501 | ],
502 | "source": [
503 | "data.shape"
504 | ]
505 | },
506 | {
507 | "cell_type": "code",
508 | "execution_count": 296,
509 | "metadata": {},
510 | "outputs": [
511 | {
512 | "data": {
513 | "text/plain": [
514 | "user_id int64\n",
515 | "date object\n",
516 | "source object\n",
517 | "device object\n",
518 | "browser_language object\n",
519 | "ads_channel object\n",
520 | "browser object\n",
521 | "conversion int64\n",
522 | "test int64\n",
523 | "sex object\n",
524 | "age int64\n",
525 | "country object\n",
526 | "dtype: object"
527 | ]
528 | },
529 | "execution_count": 296,
530 | "metadata": {},
531 | "output_type": "execute_result"
532 | }
533 | ],
534 | "source": [
535 | "# check columns' data types\n",
536 | "data.dtypes"
537 | ]
538 | },
539 | {
540 | "cell_type": "code",
541 | "execution_count": 297,
542 | "metadata": {},
543 | "outputs": [
544 | {
545 | "data": {
749 | "text/plain": [
750 | " user_id date source device browser_language \\\n",
751 | "count 452867.000000 452867 452867 452867 452867 \n",
752 | "unique NaN 5 3 2 3 \n",
753 | "top NaN 2015-12-04 Ads Web ES \n",
754 | "freq NaN 141024 181693 251316 377160 \n",
755 | "mean 499944.805166 NaN NaN NaN NaN \n",
756 | "std 288676.264784 NaN NaN NaN NaN \n",
757 | "min 1.000000 NaN NaN NaN NaN \n",
758 | "25% 249819.000000 NaN NaN NaN NaN \n",
759 | "50% 500019.000000 NaN NaN NaN NaN \n",
760 | "75% 749543.000000 NaN NaN NaN NaN \n",
761 | "max 1000000.000000 NaN NaN NaN NaN \n",
762 | "\n",
763 | " ads_channel browser conversion test sex \\\n",
764 | "count 181693 452867 452867.000000 452867.000000 452867 \n",
765 | "unique 5 7 NaN NaN 2 \n",
766 | "top Facebook Android_App NaN NaN M \n",
767 | "freq 68358 154977 NaN NaN 264485 \n",
768 | "mean NaN NaN 0.049560 0.476462 NaN \n",
769 | "std NaN NaN 0.217034 0.499446 NaN \n",
770 | "min NaN NaN 0.000000 0.000000 NaN \n",
771 | "25% NaN NaN 0.000000 0.000000 NaN \n",
772 | "50% NaN NaN 0.000000 0.000000 NaN \n",
773 | "75% NaN NaN 0.000000 1.000000 NaN \n",
774 | "max NaN NaN 1.000000 1.000000 NaN \n",
775 | "\n",
776 | " age country \n",
777 | "count 452867.000000 452867 \n",
778 | "unique NaN 17 \n",
779 | "top NaN Mexico \n",
780 | "freq NaN 128484 \n",
781 | "mean 27.130740 NaN \n",
782 | "std 6.776678 NaN \n",
783 | "min 18.000000 NaN \n",
784 | "25% 22.000000 NaN \n",
785 | "50% 26.000000 NaN \n",
786 | "75% 31.000000 NaN \n",
787 | "max 70.000000 NaN "
788 | ]
789 | },
790 | "execution_count": 297,
791 | "metadata": {},
792 | "output_type": "execute_result"
793 | }
794 | ],
795 | "source": [
796 | "data.describe(include = 'all')"
797 | ]
798 | },
799 | {
800 | "cell_type": "code",
801 | "execution_count": 298,
802 | "metadata": {},
803 | "outputs": [
804 | {
805 | "data": {
806 | "text/plain": [
807 | "user_id 0\n",
808 | "date 0\n",
809 | "source 0\n",
810 | "device 0\n",
811 | "browser_language 0\n",
812 | "ads_channel 271174\n",
813 | "browser 0\n",
814 | "conversion 0\n",
815 | "test 0\n",
816 | "sex 0\n",
817 | "age 0\n",
818 | "country 0\n",
819 | "dtype: int64"
820 | ]
821 | },
822 | "execution_count": 298,
823 | "metadata": {},
824 | "output_type": "execute_result"
825 | }
826 | ],
827 | "source": [
828 | "# check for null values\n",
829 | "# about 60% of ads_channel values are missing\n",
830 | "data.isnull().sum()"
831 | ]
832 | },
833 | {
834 | "cell_type": "code",
835 | "execution_count": 299,
836 | "metadata": {},
837 | "outputs": [
838 | {
839 | "data": {
840 | "text/plain": [
841 | "2015-12-04 141024\n",
842 | "2015-12-03 99399\n",
843 | "2015-11-30 70948\n",
844 | "2015-12-01 70915\n",
845 | "2015-12-02 70581\n",
846 | "Name: date, dtype: int64"
847 | ]
848 | },
849 | "execution_count": 299,
850 | "metadata": {},
851 | "output_type": "execute_result"
852 | }
853 | ],
854 | "source": [
855 | "data.date.value_counts()"
856 | ]
857 | },
858 | {
859 | "cell_type": "code",
860 | "execution_count": 300,
861 | "metadata": {},
862 | "outputs": [
863 | {
864 | "data": {
865 | "text/plain": [
866 | "Ads 181693\n",
867 | "SEO 180436\n",
868 | "Direct 90738\n",
869 | "Name: source, dtype: int64"
870 | ]
871 | },
872 | "execution_count": 300,
873 | "metadata": {},
874 | "output_type": "execute_result"
875 | }
876 | ],
877 | "source": [
878 | "data.source.value_counts()"
879 | ]
880 | },
881 | {
882 | "cell_type": "code",
883 | "execution_count": 301,
884 | "metadata": {},
885 | "outputs": [
886 | {
887 | "data": {
888 | "text/plain": [
889 | "Web 251316\n",
890 | "Mobile 201551\n",
891 | "Name: device, dtype: int64"
892 | ]
893 | },
894 | "execution_count": 301,
895 | "metadata": {},
896 | "output_type": "execute_result"
897 | }
898 | ],
899 | "source": [
900 | "data.device.value_counts()"
901 | ]
902 | },
903 | {
904 | "cell_type": "code",
905 | "execution_count": 302,
906 | "metadata": {},
907 | "outputs": [
908 | {
909 | "data": {
910 | "text/plain": [
911 | "ES 377160\n",
912 | "EN 63079\n",
913 | "Other 12628\n",
914 | "Name: browser_language, dtype: int64"
915 | ]
916 | },
917 | "execution_count": 302,
918 | "metadata": {},
919 | "output_type": "execute_result"
920 | }
921 | ],
922 | "source": [
923 | "data.browser_language.value_counts()"
924 | ]
925 | },
926 | {
927 | "cell_type": "code",
928 | "execution_count": 303,
929 | "metadata": {},
930 | "outputs": [
931 | {
932 | "data": {
933 | "text/plain": [
934 | "Facebook 68358\n",
935 | "Google 68113\n",
936 | "Yahoo 27409\n",
937 | "Bing 13670\n",
938 | "Other 4143\n",
939 | "Name: ads_channel, dtype: int64"
940 | ]
941 | },
942 | "execution_count": 303,
943 | "metadata": {},
944 | "output_type": "execute_result"
945 | }
946 | ],
947 | "source": [
948 | "data.ads_channel.value_counts()"
949 | ]
950 | },
951 | {
952 | "cell_type": "code",
953 | "execution_count": 304,
954 | "metadata": {},
955 | "outputs": [
956 | {
957 | "data": {
958 | "text/plain": [
959 | "Android_App 154977\n",
960 | "Chrome 101822\n",
961 | "IE 61656\n",
962 | "Iphone_App 46574\n",
963 | "Safari 41033\n",
964 | "FireFox 40721\n",
965 | "Opera 6084\n",
966 | "Name: browser, dtype: int64"
967 | ]
968 | },
969 | "execution_count": 304,
970 | "metadata": {},
971 | "output_type": "execute_result"
972 | }
973 | ],
974 | "source": [
975 | "data.browser.value_counts()"
976 | ]
977 | },
978 | {
979 | "cell_type": "code",
980 | "execution_count": 305,
981 | "metadata": {},
982 | "outputs": [
983 | {
984 | "data": {
985 | "text/plain": [
986 | "0 430423\n",
987 | "1 22444\n",
988 | "Name: conversion, dtype: int64"
989 | ]
990 | },
991 | "execution_count": 305,
992 | "metadata": {},
993 | "output_type": "execute_result"
994 | }
995 | ],
996 | "source": [
997 | "data.conversion.value_counts()"
998 | ]
999 | },
1000 | {
1001 | "cell_type": "code",
1002 | "execution_count": 306,
1003 | "metadata": {},
1004 | "outputs": [
1005 | {
1006 | "data": {
1007 | "text/plain": [
1008 | "0 237093\n",
1009 | "1 215774\n",
1010 | "Name: test, dtype: int64"
1011 | ]
1012 | },
1013 | "execution_count": 306,
1014 | "metadata": {},
1015 | "output_type": "execute_result"
1016 | }
1017 | ],
1018 | "source": [
1019 | "data.test.value_counts()"
1020 | ]
1021 | },
1022 | {
1023 | "cell_type": "code",
1024 | "execution_count": 307,
1025 | "metadata": {},
1026 | "outputs": [
1027 | {
1028 | "data": {
1029 | "text/plain": [
1030 | "Mexico 128484\n",
1031 | "Colombia 54060\n",
1032 | "Spain 51782\n",
1033 | "Argentina 46733\n",
1034 | "Peru 33666\n",
1035 | "Venezuela 32054\n",
1036 | "Chile 19737\n",
1037 | "Ecuador 15895\n",
1038 | "Guatemala 15125\n",
1039 | "Bolivia 11124\n",
1040 | "Honduras 8568\n",
1041 | "El Salvador 8175\n",
1042 | "Paraguay 7347\n",
1043 | "Nicaragua 6723\n",
1044 | "Costa Rica 5309\n",
1045 | "Uruguay 4134\n",
1046 | "Panama 3951\n",
1047 | "Name: country, dtype: int64"
1048 | ]
1049 | },
1050 | "execution_count": 307,
1051 | "metadata": {},
1052 | "output_type": "execute_result"
1053 | }
1054 | ],
1055 | "source": [
1056 | "data.country.value_counts()"
1057 | ]
1058 | },
1059 | {
1060 | "cell_type": "markdown",
1061 | "metadata": {},
1062 | "source": [
1063 | "Let's first confirm that, in the control group (test == 0, no localized translation), Spain converts at a higher rate than the other countries."
1064 | ]
1065 | },
1066 | {
1067 | "cell_type": "code",
1068 | "execution_count": 358,
1069 | "metadata": {},
1070 | "outputs": [
1071 | {
1072 | "data": {
1073 | "text/plain": [
1074 | "country\n",
1075 | "Spain 0.079719\n",
1076 | "El Salvador 0.053554\n",
1077 | "Nicaragua 0.052647\n",
1078 | "Costa Rica 0.052256\n",
1079 | "Colombia 0.052089\n",
1080 | "Honduras 0.050906\n",
1081 | "Guatemala 0.050643\n",
1082 | "Venezuela 0.050344\n",
1083 | "Peru 0.049914\n",
1084 | "Mexico 0.049495\n",
1085 | "Bolivia 0.049369\n",
1086 | "Ecuador 0.049154\n",
1087 | "Paraguay 0.048493\n",
1088 | "Chile 0.048107\n",
1089 | "Panama 0.046796\n",
1090 | "Argentina 0.015071\n",
1091 | "Uruguay 0.012048\n",
1092 | "Name: conversion, dtype: float64"
1093 | ]
1094 | },
1095 | "execution_count": 358,
1096 | "metadata": {},
1097 | "output_type": "execute_result"
1098 | }
1099 | ],
1100 | "source": [
1101 | "# Yes, Spain has the highest conversion rate.\n",
1102 | "data[data['test']==0].groupby('country').conversion.mean().sort_values(ascending = False)"
1103 | ]
1104 | },
1105 | {
1106 | "cell_type": "code",
1107 | "execution_count": 362,
1108 | "metadata": {},
1109 | "outputs": [],
1110 | "source": [
1111 | "# group by country, and do NOT set country as index\n",
1112 | "data_country = data[data['test']==0].groupby('country', as_index = False).conversion.mean()"
1113 | ]
1114 | },
1115 | {
1116 | "cell_type": "code",
1117 | "execution_count": 368,
1118 | "metadata": {},
1119 | "outputs": [
1120 | {
1121 | "data": {
1122 | "text/plain": [
1123 | ""
1124 | ]
1125 | },
1126 | "execution_count": 368,
1127 | "metadata": {},
1128 | "output_type": "execute_result"
1129 | },
1130 | {
1131 | "data": {
1132 | "image/png": "iVBORw0KGgoAAAANSUhEUgAABDAAAADrCAYAAACfImbpAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAIABJREFUeJzs3Xe4XFW5x/HvjyQk9GZooYSOgKgQAaVXKUJAinSkCCgoiiDlAiJgARUuSr8CIh1BMEqkqoB0BBVBwYgFRJEmKBbae/9413B2xgM5OTlz9p5zfp/nmSez9+w5WWtmz95rvaspIjAzMzMzMzMza7KZ6k6AmZmZmZmZmdm0OIBhZmZmZmZmZo3nAIaZmZmZmZmZNZ4DGGZmZmZmZmbWeA5gmJmZmZmZmVnjOYBhZmZmZmZmZo3nAIaZmZmZmZmZNZ4DGGZmZmZmZmbWeA5gmJmZmZmZmVnjjaw7AQNl0003jeuuu67uZJiZmZmZmZnZ9FFfDhoyPTCeeeaZupNgZmZmZmZmZh0yZAIYZmZmZmZmZjZ0OYBhZmZmZmZmZo3X0QCGpE0lPSJpiqTDe3l9tKTLy+t3Sxpf9o+SdIGkByX9StIRnUynmZmZmZmZmTVbxwIYkkYApwObASsAO0laoe2wvYHnI2Jp4BTgxLJ/e2B0RLwDWBXYrxXcMDMzMzMzM7Php5M9MFYDpkTEYxHxMnAZMLHtmInABeX5lcCGkgQEMJukkcAswMvAix1Mq5mZmZmZmZk1WCcDGOOAxyvbT5R9vR4TEa8CLwDzkcGMl4A/A38EvhIRz7X/B5L2lXSfpPuefvrpgc+BmZmZmZmZmTXCyA7+7d7WcY0+HrMa8BqwMDAPcJukmyLisakOjDgHOAdgwoQJ7X/bzMzMzMxsWPrNaU/VnYR+W+bABepOgjVUJ3tgPAEsWtleBHjyzY4pw0XmAp4Ddgaui4hXIuKvwO3AhA6m1czMzMzMzMwarJMBjHuBZSQtIWlmYEdgUtsxk4A9yvPtgB9GRJDDRjZQmg1YA/h1B9NqZmZmZmZmZg3WsQBGmdPiQOB64FfAFRHxkKTjJG1VDjsXmE/SFOBgoLXU6unA7MAvyUDI+RHxi06l1czMzMzMzMyarZNzYBARk4HJbfuOqTz/N7lkavv7/tHbfjMzMzMzMzMbnjo5hMTMzMzMzMzMbEA4gGFmZmZmZmZmjecAhpmZmZmZmZk1ngMYZmZmZmZmZtZ4DmCYmZmZmZmZWeM5gGFmZmZmZmZmjecAhpmZmZmZmZk1ngMYZmZmZmZmZtZ4DmCYmZmZmZmZWeM5gGFmZmZmZmZmjecAhpmZmZmZmZk1ngMYZmZmZmZmZtZ4DmCYmZmZmZmZWeM5gGFmZmZmZmZmjecAhpmZmZmZmZk1ngMYZmZmZmZmZtZ4DmCYmZmZmZmZWeM5gGFmZmZmZmZmjecAhpmZmZmZmZk1ngMYZmZmZmZmZtZ4DmCYmZmZmZmZWeM5gGFmZmZmZmZmjecAhpmZmZmZmZk1ngMYZmZmZmZmZtZ4DmCYmZmZmZmZWeM5gGFmZmZmZmZmjecAhpmZmZmZmZk1ngMYZmZmZmZmZtZ4DmCYmZmZmZmZWeN1NIAhaVNJj0iaIunwXl4fLeny8vrdksZXXltZ0p2SHpL0oKQxnUyrmZmZmZmZmTVXxwIYkkYApwObASsAO0laoe2wvYHnI2Jp4BTgxPLekcBFwP4RsSKwHvBKp9JqZmZmZmZmZs3WyR4YqwFTIuKxiHgZuAyY2HbMROCC8vxKYENJAjYBfhERPweIiGcj4rUOptXMzMzMzMzMGqyTAYxxwOOV7SfKvl6PiYhXgReA+YBlgZB0vaT7JX2mg+k0MzMzMzMzs4Yb2cG/rV72RR+PGQmsBbwH+Cdws6SfRsTNU71Z2hfYF2CxxRab4QSbmZmZmZmZWTN1sgfGE8Cile1FgCff7Jgy78VcwHNl/y0R8UxE/BOYDKzS/h9ExDkRMSEiJowdO7YDWTAzMzMzMzOzJuhkAONeYBlJS0iaGdgRmNR2zCRgj/J8O+C
HERHA9cDKkmYtgY11gYc7mFYzMzMzMzMza7CODSGJiFclHUgGI0YA50XEQ5KOA+6LiEnAucCFkqaQPS92LO99XtLJZBAkgMkRcW2n0mpmZmZmZmZmzdbJOTCIiMnk8I/qvmMqz/8NbP8m772IXErVzMzMzMzMzIa5Tg4hMTMzMzMzMzMbEA5gmJmZmZmZmVnjOYBhZmZmZmZmZo3nAIaZmZmZmZmZNZ4DGGZmZmZmZmbWeH1ahUTSWOAjwPjqeyJir84ky8zMzMzMzMysR1+XUf0ucBtwE/Ba55JjZmZmZmZmZvbf+hrAmDUiDutoSszMbMjb7Ls71Z2EGfKDiZfWnQQzMzOzYauvc2B8X9LmHU2JmZmZmZmZmdmb6GsPjIOAIyW9DLxS9kVEzNmZZJnZcHbeBZvUnYQZstceN9SdBDOzAbPNVT+qOwkz5Opt1687CWZmNkD6FMCIiDk6nRAze2uTz+3eTlCb7z257iSYmZmZmVmX62sPDCRtBaxTNn8cEd/vTJLMzMzMzMzMzKbW12VUvwS8B7i47DpI0loRcXjHUmZmZtblNr/6xLqTMEMmb9P3+bs/cNW5HUxJ531/273rToKZmZlNQ197YGwOvCsiXgeQdAHwAOAAhpmZmZmZmdkg+evXb647Cf02/8c3nKH393kICTA38Fx5PtcM/a9mZmZmZmaD6IcXP113EmbIBruMrTsJZrXrawDji8ADkn4EiJwL44iOpcr67c9n/E/dSei3hT72+bqTYFaLEy5/f91J6LejPnR93Ukwq8WWV15VdxJmyPe227buJJiZmU23vq5CcqmkH5PzYAg4LCL+0smEmZmZmZmZmZm1vGUAQ9LyEfFrSauUXU+UfxeWtHBE3N/Z5Jm9uQfO2rLuJMyQd+//vbqTYGZmZsPYOd/5a91JmCH7fnD+upNgZoNsWj0wDgb2Bb7ay2sBbDDgKTIzMzMzMzMza/OWAYyI2Lf8u/7gJMfMzMzMzMzM7L/1aQ4MSdsD10XE3yUdBawCHB8RD3Q0dWZmZmZmg+hDVz1adxJmyOXbLlt3EszMOmamPh53dAlerAW8H7gAOKtzyTIzMzMzMzMz69HXAMZr5d8tgDMj4rvAzJ1JkpmZmZmZmZnZ1PoawPiTpLOBHYDJkkZPx3vNzMzMzMzMzGZIX4MQOwDXA5tGxN+AeYFDO5YqMzMzMzMzM7OKaU7iKWkm4J6IWKm1LyL+DPy5kwkzMzMzMzMzM2uZZg+MiHgd+LmkxQYhPWZmZmZmZmZm/6VPy6gCCwEPSboHeKm1MyK26kiqzMzMzMzMzMwq+hrA+FxHU2FmZmZmZmZm9hb6NIlnRNwC/B4YVZ7fC9w/rfdJ2lTSI5KmSDq8l9dHS7q8vH63pPFtry8m6R+SDulLOs3MzMzMzMxsaOpTAEPSR4ArgbPLrnHANdN4zwjgdGAzYAVgJ0krtB22N/B8RCwNnAKc2Pb6KcAP+pJGMzMzMzMzMxu6+rqM6gHAmsCLABHxG2D+abxnNWBKRDwWES8DlwET246ZCFxQnl8JbChJAJK2Bh4DHupjGs3MzMzMzMxsiOprAOM/JQgBgKSRQEzjPeOAxyvbT5R9vR4TEa8CLwDzSZoNOAzPvWFmZmZmZmZm9D2AcYukI4FZJG0MfBv43jTeo172tQc93uyYzwGnRMQ/3vI/kPaVdJ+k+55++ulpJMfMzMzMzMzMulVfAxiHA08DDwL7AZOBo6bxnieARSvbiwBPvtkxpVfHXMBzwOrASZJ+D3wSOFLSge3/QUScExETImLC2LFj+5gVMzMzMzMzM+s2fV1GdSLwrYj4v+n42/cCy0haAvgTsCOwc9sxk4A9gDuB7YAfRkQAa7cOkHQs8I+IOG06/m8zMzMzMzMzG0L62gNjK+BRSRdK2qL0lnhLZU6LA4HrgV8BV0TEQ5KOk7RVOexccs6LKcDBZE8PMzMzMzMzM7Op9KkHRkTsKWkUuSTqzsAZkm6MiH2m8b7
J5HCT6r5jKs//DWw/jb9xbF/SaGZmZmZmZmZDV1+HkBARr0j6ATnJ5izksJK3DGCYmZmZmZmZmQ2EPg0hkbSppG8CU8i5Kr4BLNTBdJmZmZmZmZmZvaGvPTA+DFwG7BcR/+lccszMzMzMzMzM/ltf58DYsdMJMTMzMzMzMzN7M30dQvJBSb+R9IKkFyX9XdKLnU6cmZmZmZmZmRn0fQjJScCWEfGrTibGzMzMzMzMzKw3feqBATzl4IWZmZmZmZmZ1aWvPTDuk3Q5cA3wxiSeEfGdjqTKzMzMzMzMzKyirwGMOYF/AptU9gXgAIaZmZmZmZmZdVxfVyHZs9MJMTMzMzMzMzN7M31dhWQRSVdL+qukpyRdJWmRTifOzMzMzMzMzAz6Ponn+cAkYGFgHPC9ss/MzMzMzMzMrOP6GsAYGxHnR8Sr5fFNYGwH02VmZmZmZmZm9oa+BjCekbSrpBHlsSvwbCcTZmZmZmZmZmbW0tcAxl7ADsBfgD8D2wGe2NPMzMzMzMzMBkVfl1E9HtgjIp4HkDQv8BUysGFmZmZmZmZm1lF97YGxcit4ARARzwHv7kySzMzMzMzMzMym1tcAxkyS5mltlB4Yfe29YWZmZmZmZmY2Q/oahPgqcIekK4Eg58P4fMdSZWZmZmZmZmZW0acARkR8S9J9wAaAgA9GxMMdTZmZmZmZmZmZWdHnYSAlYOGghZmZmZmZmZkNur7OgWFmZmZmZmZmVhsHMMzMzMzMzMys8RzAMDMzMzMzM7PGcwDDzMzMzMzMzBrPAQwzMzMzMzMzazwHMMzMzMzMzMys8RzAMDMzMzMzM7PGcwDDzMzMzMzMzBqvowEMSZtKekTSFEmH9/L6aEmXl9fvljS+7N9Y0k8lPVj+3aCT6TQzMzMzMzOzZutYAEPSCOB0YDNgBWAnSSu0HbY38HxELA2cApxY9j8DbBkR7wD2AC7sVDrNzMzMzMzMrPk62QNjNWBKRDwWES8DlwET246ZCFxQnl8JbChJEfFARDxZ9j8EjJE0uoNpNTMzMzMzM7MG62QAYxzweGX7ibKv12Mi4lXgBWC+tmO2BR6IiP90KJ1mZmZmZmZm1nAjO/i31cu+mJ5jJK1IDivZpNf/QNoX2BdgscUW618qzczMzMzMzKzxOtkD4wlg0cr2IsCTb3aMpJHAXMBzZXsR4Gpg94j4bW//QUScExETImLC2LFjBzj5ZmZmZmZmZtYUnQxg3AssI2kJSTMDOwKT2o6ZRE7SCbAd8MOICElzA9cCR0TE7R1Mo5mZmZmZmZl1gY4FMMqcFgcC1wO/Aq6IiIckHSdpq3LYucB8kqYABwOtpVYPBJYGjpb0s/KYv1NpNTMzMzMzM7Nm6+QcGETEZGBy275jKs//DWzfy/tOAE7oZNrMzMzMzMzMrHt0cgiJmZmZmZmZmdmAcADDzMzMzMzMzBrPAQwzMzMzMzMzazwHMMzMzMzMzMys8To6iWcTPH3mRXUnYYaM/eiudSfBzMzMzMzMrHbugWFmZmZmZmZmjecAhpmZmZmZmZk1ngMYZmZmZmZmZtZ4DmCYmZmZmZmZWeM5gGFmZmZmZmZmjTfkVyExMzMzMzOzoesvJz9UdxJmyIIHr1h3ErqGe2CYmZmZmZmZWeM5gGFmZmZmZmZmjecAhpmZmZmZmZk1ngMYZmZmZmZmZtZ4DmCYmZmZmZmZWeM5gGFmZmZmZmZmjecAhpmZmZmZmZk1ngMYZmZmZmZmZtZ4DmCYmZmZmZmZWeM5gGFmZmZmZmZmjecAhpmZmZmZmZk1ngMYZmZmZmZmZtZ4DmCYmZmZmZmZWeM5gGFmZmZmZmZmjecAhpmZmZmZmZk1ngMYZmZmZmZmZtZ4DmCYmZmZmZmZWeM5gGFmZmZmZmZmjecAhpmZmZmZmZk1XkcDGJI2lfSIpCmSDu/l9dGSLi+v3y1pfOW1I8r+RyS9v5PpNDMzMzMzM7Nm61gAQ9II4HRgM2AFYCdJK7Qdtjf
wfEQsDZwCnFjeuwKwI7AisClwRvl7ZmZmZmZmZjYMdbIHxmrAlIh4LCJeBi4DJrYdMxG4oDy/EthQksr+yyLiPxHxO2BK+XtmZmZmZmZmNgwpIjrzh6XtgE0jYp+yvRuwekQcWDnml+WYJ8r2b4HVgWOBuyLiorL/XOAHEXFl2/+xL7Bv2VwOeKQjmXlrbwOeqeH/rctwyu9wyisMr/wOp7zC8MrvcMorDK/8Dqe8wvDK73DKKwyv/A6nvMLwyu9wyisMr/zWlddnImLTaR00soMJUC/72qMlb3ZMX95LRJwDnDP9SRs4ku6LiAl1pmEwDaf8Dqe8wvDK73DKKwyv/A6nvMLwyu9wyisMr/wOp7zC8MrvcMorDK/8Dqe8wvDKb9Pz2skhJE8Ai1a2FwGefLNjJI0E5gKe6+N7zczMzMzMzGyY6GQA415gGUlLSJqZnJRzUtsxk4A9yvPtgB9GjmmZBOxYVilZAlgGuKeDaTUzMzMzMzOzBuvYEJKIeFXSgcD1wAjgvIh4SNJxwH0RMQk4F7hQ0hSy58WO5b0PSboCeBh4FTggIl7rVFpnUK1DWGownPI7nPIKwyu/wymvMLzyO5zyCsMrv8MprzC88juc8grDK7/DKa8wvPI7nPIKwyu/jc5rxybxNDMzMzMzMzMbKJ0cQmJmZmZmZmZmNiAcwDAzMzMzMzOzxnMAw8zMeiWptyWthxVJI+pOg5mZmZklBzBqJGnIff5KQy5fMLTzNiOG62cy1Cu2khTDeJKk8ntXgyeQHjDD5domaaahFpQbDt9bX7V+s3WnoxOGar5saBrq5aPp5ev0wPMHWoPWiRwRr7ft7+oblKSZIr0uaX5Jy9adpoHSqsyVvM0hafG601S31g2qfCbD7lrSqthKWn8o3qwjIiSNk/R/kjauOz2DrfzeQ9Kykm6RtHbdaRpIrfuNpBGVa9vo1rnc7fejdiWfr5fvdK6h8Jst96XXy/P5huN1uKVyHsdQ+xxaeas7HYOp/fojaVVJH6orPQNB0kZ1p6HTKuXCVvloKUljyvMhdU+ZHuX+OkbSInWnZTAMxjV4SF3ku0WlwLGjpDMk7V72d+UNStLakuaq5OsEcvncT0s6SNJc9aaw/yQtAD3fjaSjgJuBIyXtI2m+OtNXB0mzwlQ3qI8D35K0iaQ5yr4hd6PqpUC1jaQ7gAOAz0t6fz0pGxjtFVdJmwMnAndHxI11pm2wVCu1pTF3e+AbwGkRcVt9KeuIOWCq3/FRwHXAVyTN0a33ozcTEa9Jml3SOcDXgQXqTtOMKpX1sZK+A1wAnClpsbrTVYfKeXwMcMJQqiyWc3eUpP+RdKCksXWnqdOq1x9JI4HVgfG1JWgGSVoG+LKkDetOSydIWhGm+h1uJOk35LX2m+W1IXVP6YcvAp+God0jozRmt+qDEyVt34lr1pD9AJumUilQacG/BNiRLBwfJekjkkbXmsh+kLQaMDfw97L9YXJ53ncDTwK7AjvVlsAZIOljwKTK9gbA2IhYDfgzsB+wfE3JG3Sl8H8AsEbZXljS94GVgIuBfYCPwNC7UVVbwMpveFFgHfL8Pg5YHzigFdzpJird6lsFj1IpGkl+zxOAu8pxXd9i/WZa1+dK4Wvm8n2/CqwCPFr2j6wtkQOoVPJ+VJ7PWirACwAbk/k9rcbkDYj2AqKktwE3kvelfSPiyVoSNgPaf4PlfPwf4GfAB4FZgP0kvauG5A266ncsaQVJt5HlkZuBb0paoxuD6a18Vf7dCLgdGAksAXxW0gr1pbAz2r7PkZJ2K41jr5Lf6/ztxzVN9XyTtIR6eu49Tpb3968lYR0kaXVgLUmzSJpN0rHAHsCuEbE5MEHSnuXYIVuOgF4bupas5PkqYAlJo9p74HezUiaeW9LXSlny9XLuf4c831cELi334AHT2IvAUKGe4SJRvtiIiL8DFwG7AGsDs5LBjK65IVUK/PcAPwT2UA6ruBw4WdI3yUrdt4F11AXDScqPcHZ
Jn5U0b0ScAYxST7fFccB8kr4GbAQcHhG3D5VKTR+8RAYq7pC0IPAi8Dng48A2wOLA1pJWgWYXMqZXpfX2i8AeEfE4cDSwFvlbvgR4AfhojcmcLpLmL09b3a7XknSdpM+S+ToF+AkZxHijcj+UqGcoWKuisK2kO4GzJW0N3ACcDuwJUArSXaty3T4OWFDS+yPin8DhZCDuHCCAdSVtWV9KZ0xbC1Dr+jwzcCfwS+CdknaStGa3VHBLnloBtq0lTSC/q5WAeyLiZeB4QGQld8grBeVWl+w/kefxScD25G96b7qsp0313K1UcsYAuwNnkNfjVYFturHRqzfqfVj1fMCmwFmlYeBSYA1Vevs2UbmXjinlxq2BCyWNjoh/A98lL8N71JvKgVEp491P3jvWjYiXyEDqYmSZCLKM+AUYmuWIFrUN9SoNXZcAn5S0EPAc8AdgSA27LvXav5H1ogPK7oWBayJiM+Bl4L3A7gPZyDdkPsAmaitEHQJ8TdK25eXryZaTFSJiEeB5sgW3kcMteinkqURatyQr9usBG0bEv4DlgH9GxHpkD4aVgY8NYnL7pfwI/wF8j6ycA3weOKIUFJ4C3g08FxFrRcTNkpYEGh+c6a9qtLxcmIOsuH+mfFb3Af8HPAOsCfyW0sLQ5ELGtCiHd21d2X4ncDcwiqzUArwCvAfYPSJOBV4D9pG09GCnd3pJWgO4TNJ8pcC1CfC/ZFfP3wHfIit8dwDvKPkfUkODSgXwd/BGgGp94EAyCPWD8u8GwNXAopLWLe/ruhYk9YxLbvWugQw+fq3sfxQ4CHixXLfPI7s7z15DcvutEqB5XdLSkiYBX5K0V+lx8ThwGLAZ2UL4CbLXSSNJWrQVSCp5Gi/pauBQ8nozB3ArPa16vwEWJO/BQ+r32ptSCbhO0k4R8QLwMHAWWaEaD6wLbClp5vpS2Teaek6pBSWdJekTkmaPiO+Xw64mA6onkRWCtWpK7oCqlJO3lHSZpE+R16JdgN+TwfTtySFujWowUk8vmdYQzA8DhwCrRsQp5Dl5SDn8r+S5uZPKcNtuVD1Xy7+vkOfiPpI2BU4le7q1rks/AH4p6Rvl/UPyulQZ6nWUcqjQ02QQa256ejVuAczRzeXjll6CMB8FPl7KlbcD1yt7ac8O7EDecwesod4BjAGmnLzyxFaUWDkB3CnAu4B7gXMlvaNEIRcmWzghK35r0NyWk1mqG+XHtzAZdf0jWal7h7KnxbzAxHLo+8kC1l2Dl9Tp11Yp+S1wjaS1I+Iq8qZzIPBrYDIwt9IBwLXAooOe4A4rrfGHVFr8tpK0eCkk3gjMppzYcU6yW+fnSuv0TGR3wXfXlviB8RDwo8qNdn3g8og4pNL1fCSwIbCApPWA/5ABgL8Pclr7rHKe/4psiW51Z50fuDYiLouIb5FBvKOBy8gW3S3UM6yiq1UKX/cBN0o6qbw0Dxmc/FlEXEH+1lcnh4/cDnymvK/rWpAqv+OdyCEiRMQ5wMsluA4wmhwaB/Bvsmfg2wc5qf2iqXs6jlAObbyG7B11PnCqpHeWCsVaEXEssBv5m/1NTcl+S+XasxwwpbJ7XeBPEbFmRDxQWr2mkL0w9irHjKYnMNf1v1fotVv2GpKWLOWQrwJ7l5a9mcnKwndKj5SnyGv0vIOd5r6qnLut3+hC5Fj558geuh9XzrX1LuDpcm16lOzxuJGk2WpJ+AzS1MNFRpXr8MfIvL+fnFdq3og4guzJuzcZcGzMPFuaurdM677wOWCDiPhM2T6S7KG8ZDknXwbeBmw16AkeIJVz9QPKYT7zkffIm8h8PUfWbTYGlilv+ygwe/nMhsR1qZ2kiWSj3jxkT6mLIuIvEXE08ASwM1mXemd9qRwY0lQTSC9fAlW3kA19x5XD5gdmiYgjI+Ja8jq8m6R5BiINDmAMkMrF9GWyVetf5QJ9ANk74dBSMTgNOEzSLGRAYydJ3yODAXtHxM8GPfHTIOlgshCAcqLG1sn3GNklbmV
yzOmrwOalteA6Sa0L2PERcdngp/ytleBSq2v4a2V7dKmk/5q82EC2dn2UzN9XyYvTlWQkdfuIuH7wU98ZlfP4n8B2kjaTdCHZW+jrkv6XDEj9hvyuXyCvI6dKupu8SH8gIh6oIfn9ppwp+1hll/JREfEgWSFoRc1fBxYugatWBfgl4ARyKNgZwLcj4sSIeKqOPPRFpZC1Dhlo2VDSwsBsZE+qluPJQqTI69QTZO+brlcpfE0k5w74qHJI1HPAH9Uz3O0aco6T18iAzhWlclx7wXlaeqnwrS/px2TPg4Mkfb289FGyh9lIsuD1Tkm/Ilvx3xcR9w5isvutUpDamWyh/hPZIvgMcCYZsGvdg+aStD/Zy+ZFsqWwUUrhMCLiJmBm5QSrkI0cvyjHtO7D3yfP1a3KNfhZssLX9cr19o0Kj6R5JI0jy1UbAUTE+eRvdF/gX8ADZEPRTWSl6pCI+EstGeiDyrn7QUm/AD4LPBURR5JlyfnJMuSPyKFdx5L3m3OBL5T7UNcpDXwjJC1dWvAnkb0sVicrvSsAm5b78U3knGM3k73iag3Ota6vJQ9LSfqScqJCAduSvWMox/yMvH+cIOkWcvjPbhFxcR1pHwiSFir1lo+RdZevkz2efkCWlXYkG3MWJs/Z2SJiSkTsOBR6HrRTmeyfLC9NJO85GwDvVU5yD9kgdANZR5xqjptuVBoLFivnwanA5yStBXwK2EzZa/dvwEvKYflnkCMPLoiI5wcqEX7M4IPsNbF+eT6CrAhMJqNsK5GFiR3K66PICNUW5Mm+I/CxuvPQS57mBY4rz8cAc5Gt7V8h50HYorx2MbB6eb4dOZyg9VksWHc+3iRvi5NjZQ8hCzjvISdXepjsOr9QyesPgW3Ke04HLq58x/PUnY8OfC4j2raPJQsMrfNgfrJ3ympk4eIsMto+hpxAbqe689CfPJPDhH5KFggnAaeU11Yo+V2y/F6/SLasQLYC7VN+w7PUnY/pyO9M5Vy/GfhkOf+/WL7DXwBbl+PWAc7t7bzotgc5qXD7vm+Wa/QmZEuQ0F9DAAAgAElEQVTRt8keNf9XzvvZyCFRFwOz1p2H6czvf31fZOVufbKF+kpybPJa5bXvAP9Xni/bun43/VH9XoFFyNndLwPGl33jyzX8PWX7BbIL68xkYfI9deeht/wAM5V/NyXvu1sAF5IToe0M3N/2vneWf+cG5qs7Hx36bBYu5/Dkcg3bgyyLtL7bDcneFuPLb/fTlDJK+7lS96PcM6rn7gIlvZeSrbZHA3+pvH4QOWRkUTKA9UVgvbrzMUCfRSvfC5XtPekpZx1NVnjeVTn+S8CmdXynZFlhN7K8P7rs2xZ4kJy8/CzgrLJ/EnBm5b0jyeHHe9X9mfcj3zP1sm8dYGJ5fjLZI+gbZXs78v66KLA5pcxU/RzrztNAfh7le72XnOAfMnDxM+AD5dr9SyplCHJlt6/WnY8B+BxGkw14E8v5fRc9ZcYTyB7LkAHJq4BPD3Qaujb600Afk3QcGYkaRxaK942IX5In98qSloqMNH+TjLKPiOyyfQY0Lhr3GjlJ1M6Rkw99hlxK8BCysP8p5Rjy39PTU+FGsuvqPwCiYa0eytUWTiLHUc5KpvVO8gJ8d0SsQEaP9yZ7IHwT2FnSnMARwKqSFo2I12KgIogNUGlNaLVMb1Be+gr5eYyRNGtE/JW8SZ8QEQ+Twyy2AEZFxHci4tLBT/0M248cmzcxIvYmu39OkLRIyeN3gWOA28jx82dIOpqs+C9D5v1f9ST9ran3uRrmBxaJiA0j4n+Bg8lhAsuRv/GtJV1HzodxM3TnkIkWta0eU/6dgzyv942IG8hu+euSQyu+QE4edzXZ++bCyEkuG6/SM+g1STMrV7ZqLdl3Ltk6fRd5Tp9KVgYgW9G2UY5bfTQifjTYae+PiIjSIr8SWZjapOz+fTlkPHl/eljSeHLs+Ucj4uWIOD4a1rukdZ5GTwvlweQ15i6yQLxXRFxCtmi
dLGk75RwfhyrnSfhbRDxbS+I7SLks+yTy9zmOXNXsqvJya4WHuymTdkbESxHx1cguy61u/o3oPda6HpVzt3V9XojMx8iIuC8ijgeeUc4DAdktfz5g44i4KyKOiIgfD37q+09TDxdZUtI2ZfM88re7btlegRyyCzl/wLzA4uV6tiA5kel8MLg9MCTtTZbpdyN74R5YXlqq7LuD7CXTKvPuB+woabmy/VrkkK/zBivNM6r0En/jeiRpv9LLZLGIuBW4S9INZFl6B2BJ5Wo5N5A9394eEZMj4ofVv9ut5Qm1TTQrqTUs7Q/kNbr1e10O+GZkT/QgezMeW/lT85A9ArtCpdzU+vcDJe+vkBO1rkrWqR6kzPUSEUcBayvnJrqbbMD/6oAnru4oTjc+yAj6iMr28mTl5vfAsmXfimRL18blSz4V2K/ynqXa/2YD8tUeWdyWLBiILEhdSQ4PgIyyXkd2lfoWMFtvf6NJD/JG820qLeZkD5nHgR3L9lpkC3Wrh8nNwMHl+ei68zDAn0d7S9BK5I3452RFbiGyZ8WVwJLlmHFlW2Sr5xx156O/eS//zk3ecDcr2wuSE4aNLNszk8OJNizb65Ld7hvVejuNvO5PDoeZtWw/AGxZns9XzvfzK5/JRq38D4UH2evtc2TQeLly7j4IrF055izgvsr2inWnezryp7btWcp16+Jy7u5Z9n8Y+Hp5/k4yiLNL2R5Tdz76ke+ZSp7uKN/pHmRPuXeU198BnE0OsXgA2KTuNPeSh2o5opWfg8nWy3uBucpra5VzdGMyCLk3cAUN7L05g99n+7k8PxmsaLV4707OOzWWbAX+VrkfTS7X5Tkq7629TPUWeT2EbCDZuOR7B3JYyBrl9XXJSuCosr0KDS5bvUU+q+d361zel1yZYZmyvUfZXoIM5NxHDrm4vu0aPS85ZHew8zB/uVa2yvbbAxeU68tx5BC0ycCa5fU5yr+nkY1+tX8P05nfDcmea60ywkpkBfwGsmx0P1lOWh44pxwzG9kY+BOyp0pX9VqcxufR23XpSOCm8lxkz6kfkQGtA8i5lw4v1+jtgfnLsSuTDaZvqztfM/B5/IisH4wkh7j9FFilev5UzqPlOpqWuj+MbntUbyLlwtY6MT9cTtZ1yvacZd9l5Yv+GFmAnoeeikJjbrBt+Zqt8vw7wFHl+UfIwsSYsr1huWBdSLnRNvVRLqqXA1uV7VYexpBzXFxcOfYz5eYzlizoL1t3+jvweVQDF6uRY/0PJwtO85JzILSGjlxCtkhvB9zCEOj+VvLV6qr9EbIb68oln8+Sw0o2qLz+u7rT24/vdU3gx+VcPp2ccHdxMpD3ncpxp5bf8WZ1p78Dn8eaZEXwGLIr9hVky/yHyVbtVqDqIHKZ4A3rTvMM5HUNMqh8BPCJsm8LMljTuh9dQFYgziJbEpeoO919yNfby/e3UiVPrcrQAmQl9iMlj6eUe22r4jcvGYgfV3c+2vLUXiAeQZYTtiCDT6eRXY9XK6+PIYMWlwKLtt5Tdz468XmUe+6q5fl85MSyrTyPIysH/1O2FyWDk9UCdGMq+u1pKdffG8o1d3OyN9TuZCv2l8lW3NnLsT8GPl53Hgboc9iQbHXelCwDnwgcUXn9brKSPIqcP2LvtvfXeq6TQwsPKM+XIOd6GF9+kxfSM2xtSXI+iMXr/sz7kccxZDnhrnJOzlzy+HfgvMpxp5Xr0IJkL+Ztyn3lSMpw8sqxjanj9PMzqV6XliHrCmuUz+aXwPvLa7OWe89Xyva2ZN1vnbrz0M98VwOPs5C97Fvl4ZXIINZyZf+ZZJBm3nJeTGaQAjS1f1Dd+iDHj/+JbN05rOz7GNnCP6Jsjy8n8X40vIJf0rtoyc8lwEll36pkAXgc2Vr9DXJC0tZ7uqYFnmyRPLA8r16YFit5brVGLk8WJhpfuJ/Bz2MWeuZB+Dq58kCr4Lg+WeFdk+xN9AeyYrRj3enuZ17fsgBEBremkBWl95ABjN/RUxH
crf28aeqDnhagzcmW25nJwOOjZG+MhckWrkvJCu+53X6ut38v5LjLcWSX5LeTFd2byNb6s8sxF5GtoL8o3/fCdedjOvLbCr6pFBw+TQ77OajccybRE6T9HhmcnJsMYtxGDS2Z/f1OS/4uB3Yt29dWvsOZyDHHPy7527JctzaqO/1vkqe52/K2YTknT6ZnnqxWnl4ngxkfIitGS5JliSEzzwVTF5THkpXB28ly1P5kxeA4SnmkHHce2Tr/nra/NVWPwrofTF3GaP1eFyfHzM9KBt7uLL/HRcp3fg498241vszYh89gTTJwcVI5l28o+7coeX1f2T6HnBx8yTc7P2rOx2xkj5gx5Eoo15Llp2XJCc4fKHn8OaXhp9seZO+BayvbreD+sWSPjNb2GLIsMQ8ZvDibXIWu9jwM4GdR/e3ORgb8v0tOZt6aL2pP4PbKcceSvfA3fqu/1+QH/x1w3YUM2JxGBm9aDQenkEG9MeW3fFW5ln12UNNb9wfW9Adt3YfIyu3JZKF/VjIK9SLZ3X5esiL8UeB9ZLfAd1CZ8LH9BKkpTyPICZJ2oGdowNvIQsMnyYLEreX528hJo1o/2u3KD3nOuvMxnXkWWfg7lZ7JdmYp/44jWyNvA+atO60dyn/7hWnW8v0+WjkHrgK+XJ7PTXaFu7CcL5s34dzt73dfeT5v22utYOOa5SZd/a1+CPg4DZ6ks/3GSM+ESa3urkuSvSv2KTejm8hK/Riyt03XTSrWlt9eu6qWG2yrkrsI2e1xV3IpwpvomYBsOXI1ndrz0sf89jah2gRyPOrHy/a6ZDCjFZBdlpyXqKPdOTuc79PIYU6Llnvtb+gZKjI/GaQ5plyrjgJWrjvNlbSLHOd/FVnxaQWWNi/pXrZca/5Rno8o16Mvkl3qTyZb7bu+Qlv9TNq2FyCHzuxXtr9BVga3IctQD5EBjaPJ1t6zW+d3b3+vKY9ynT2tPFqNA6PIgNzR5fkV9DQYHUnPxKSNzNNb5LW3yYM/Sc47A1lB/i5wUNk+lKz0XE+24q7wVudI3Y9yD32tfJeLt722PlnuX6zudM5A/saRZaD1yPmEPlG+o+3IBqz1ynFjyMbAecp2tfd2V5YRp/G57AdcVZ5vSDbytoaY30gOpziODEhOpG1YYN3p72eetyfLja+TDXrrknWn1rCiseScWq1J38dQeo4N5qNJk0Y2Tpn86fWIiLLUIGQUdjwZfY2IeISs5J0VEc+R4xj3IVv0HoyIByPi+eqyS4OekQpJ+5AXqZXLozWp0CtkpfXaiHiabLFbmezCeRqwsaR1IuJKcrWJFwc98TMg8lf2YzIgtUvZ15p8cRty4qjzgFe6YZnE6VEmDmtNPLSxpHeTF6ZryV4Gm5ZDDyYn81s5Iv5Gjm27lSxwT6773J1elUmXQtKqkm4FTpb0ydYxUSaUiojbgUeA/6ksT3hFRHw9GjhJZzVvZfvdZftucpmuTcokccsDL0XEN8jeFm8nuzeOjohboosmFWsn6e3k8r3vLdvbV16+DfhDWR50deCFiLiIXCp1ZmAHSfNExCMRMXmw095fld/xhyR9Q9J6EXEfOeHuGuWwB8gVld5XJlx7lCxcP9X0a1svE4btIOnn5AoimwP7RMSfycLi8eVtT5P35S3J3gknRMQvBj3xbyLSf8gGjmXIbueQZYhLyXx9Gjg2chLV18iK7bJkC9/BEbFJ5ATgQ0LlurWWpB+Q392ZwPck3UwGca4jl3L+HdlzaEHyM9yL/Hz+3P736tTLubsnOdfFv8lz9EBJ65PlrDkjJ5J9hVyefeMy0ewXo0wu24Q8TY/IyYPnKRM9zlp2r04GHSErwecBe5cJg79MVorOjYiPRk6aXf17Tcv/ueQkncdHxB8kja6U638UEWdGxB/rTeIMeZrsvXcBOcRncXIljfeR5cWvSPo0GXCbnTyviVxKVpLUbWXElup9sWRlNUmfL7tGUJbajoibyaDbrmWS073IXhdLkb0Pvlt
+B42o701LyeuItn2nkvejI8iGkCfIeWn+Sk7MOZbslfIwsHr53v8dEf8Y3NTjHhjTepDdpM4kK3InkUMq1iEvxBNa92HgeXqiU42MwtIzGVFrLPE4smvUKLJb+dlkhLHVKn0rsHt5vj1DYC4IsrJ+F1n4nUgWkm6gtOYNlUc5b4+vbL+NDLTdR3adv4mMou5NBqjGl+O+CtxWd/pnMO9r0dNFeyRZsbuTbM1cgewxtRU9XSJb5/vyZNf7BerOw1vkTbS1eJAtJt+mTHhWfsPXlGvVEuQQsKPLvi82OX/T+VmMJ7ttHksua/s7srVk7vIZTC7HzV2+8y8A95BD/bpi6BuV+YjK9ohyP7qCbBm7gpyzZ07gMXqWtF6F7JbdFUsbt5/Xlf3n0jN8a5ty7Wp1O7+xvP5zsqLYqInRyMJ/q5fIrGTPkGPJXhcLk/N2PEH2sFigHLdYOXZ/cnLDRg2LmMHvt32Z7o3K73K7yr4dKOPty7X6WXLFh9acJkuRvReuoyHlrLc4dx8HbizP5yF7851etm8h5xu4j+w51Mgl56eV77btj5OTBn+DbMhbiWwE+w09rfWbkPNdnNjL32t8azU5P8fddaejw3lcngyutr6zvcgeBreRveBWqTN9HchvtYfuzOXfRcm60iJkcPVL5KoqkL02/0yuhtR+TevKazUZGN6oPK8ucvBLeuoGK5HLo/6MHPZY+3xh7oFRod6XMf0sWQHYmPxht37I/yFbuBaM/HaPLq8TJQrbHtmqW+QymOeRcxpAjv9ek6zM/40sAG9Gz5JWT5E3JCLi25EteV0tIq4juzW+QHbbvSaydevBelM24F4nl8RsLXG7ItnqMyEidqWnEn8POXnhDgAR8WmygtTNNgTOlbQHWTi8m6xMzE+e83eTwwkWhjdajmaKiF+TQwueqiXVb0HSbPBGa+7rksZLupo8hx8he8tsXvJxMxlQ3YdsmT6ADGR8K3IJvsblry/aWwsil8u8gyxkLE9eu8YD/1s+g7GS3hPZm2gTsoXpsIg4IyL+Ptjpn16SRgEHSVq8tPatSS7L9p+I2IEsUCwF/DKyR9xZZIAKslL/xeiCpY3LOds6r1dSLg+6dHn5JbIwTURcTfYU3Kd8Nh8ig3KfiYivRMQztWSgF5IWIIOhX5e0cORSvGPK4zqysncx+TudFBFPSVqZDEhtSQ7ZvKB8Lk1riZ4upYUuynV2LkkLl303ka2X764c/ldgXeXSuDuTkyXeERGvlGvgtsD1EbFpNKC1u5K31yUtJWkfSa2eULtSekVFLrt+FzBK0lbktepW8jd6XDRsyflpUWVpWkkrSloXmIsc7vNNsmyxP9lj4VrgIkn7kcGoa4DF1LMMJdD81mqAiLgTeK38VoekiPh1RPyrnLOQgcaHyR66a5DDuRpXv+mviAhJY0pZeTdJ80bE4+Qw1DPI83cO4BPlurQtWYnfNCpLwqpByzVPD0kHk/PTrCfperKXLqW88cfoWZb8hcjlUT8NrFvKWPWqO4LShAf/3aq5PHlTHUN2p9qCHKd3CT3zBaxDrtDRVTP3k11/XiEnrjuM7JZ5ERlZHUNWdq4jZ5n9Ut3p7fT3XncaBjg/7dHgD5IBipnI3iZn0RNVX4/saj6CjCRfQIm0dtuj5K/6+12f7Jb7A8pcLWQldzI9s9n/iZx8a7bBTu/0fqdkgPGr9EygNIEMqh5Gz3j6DciJWLevfPdP08uEUt34YOpWknmA95bns5AFq6/Rs0zsFeV6dg1l3G43Pai0upPB1ifJAtOh5FwKD5Ktu8dRmcen/Psg8O6689CPPI8iW/p+TQbZHyWH+hxMDmdcvhz3YeC3ZAWp0ddvstXucbJX0K5kl+zLyB5ik8hyxublXL2eDELuXHe6BzD/7S30h5Kt8f8HnFz2vb+cs63JTWcn54G4i+zG3rjlnMs1eTcyKNy6/rZaJj9JVvDWK/sn0bPU5BzlXDi/6fedPn4OS5C92a4hhxqPKd/
dHWQ58lx6lqffkWwYWbr8dk+uO/0z8v3XnYYO529k+W4PJBt7LqBntcXrKL2yu/VBNtruVNnepVxrjiN7D61Ree05sq43J9lIfQtZ7lq4fBZdMxcgvfQUK9ekL5G9/t5L9ixpTVy/Kln23JpsEPlU3Xn4rzzVnYAmPchW6q+Qc0QcU/ZNJqPIK1eOa3W1ORBYqu1vdEM3uD2ZehnFUWSL/BJlezwN647rxzS/06lmTa48/w5ZCRhfClMrVV67kQxgjR0K3zfZIv0BslB8OnBr2T+i5P82stIwttyITqTBQwnIHhS3lALEfJX9n2rdZCr75iSHA91KVhQuL/82dgLSPn4G6zL1xKpHkIG3M8kWkgXIFoMzyPkRWp/FMWQQa4O68zAdee2tgLEr8Ed6VroaXQpS1WXtdiKHN6obvu9e8jgXWdmZRJkcr1y3vkZWaP+X0tOCHH6xL10woSUZXHuRbAyZRBYUjyOHNB0EXFA5dkLd6R3AfPd2Hu9EBqZmJocCvQasX147Czi1/Zx4s/Ol5rztTQ5HvKGco58q+w8p1533kJWea8hu2QuQvWxaAbiFeJOJh5v8oJeu8uTwnivoGYo5liwvt4KvPyrbq5TtOcnK3y/okqFtw/VBDv05h7YGAMoQi25+lPvI18iG6S+UfC5UXvsaWb5apGyfCTxZee9s5Xp+Q7med0Uwi6knFl2Cnoa8dcgy5vnkpJ2bVY47gOzJ/V3KcrFNe9SegBq/0PYb7DpkS8COZGH4++TkNRPIAuSC5DwCp5YvtGvHkZMt1k8AS5ft1cvNeGzdafNjur7H8WQldbmyPY7/Xga31Vo/Vyk8XEQGNC4phY/GVwT68DmMJAOPD5cCZmtOiz/QM4fLHOWzupfsXdToZSRpm6+m7Fu+/HYPBq4u+0ZVXh9JtmKfT9sSg932IKP+95EtIq05e9Yjh0iMIltNfl05z3cnK0Kt38IsdOlyk2Qg7gh6Vi1Yjyz0t1qplyeDjxeR3Vt/QpcEatoKUhMoS9eW++7N9MzjMg9ZEVymbH+ELFyuNJjpHYD87k8uyT2WrNQ+WM7NZUo5oiu+t37mfcWS93nJwNscZC+x68j5tm4v17PlyZa/5dreP9UKcHU/Ktfk1gpP25OTybbGxh9Wfperl3+PKPtPo8x90W2P9s+frLwtWJ5/DHiq9V2Vf6eQ99mdyvn+QXrmL3kf2ZuqsY0GfvR+DtCgIGI/0r9Q6zdatpcie0wdSvb0upDS6Ec2mJxPNhyMJ3vp3kdP4+7by3Xrg3Xnqx+fw+iS59+TvZM/WL7bn1NWmSzHLVbyvxiwb93pfss81Z2Auh9kZWDNcsE9v+xrdWM8vnzBR5WT+qdkAGP2yvsbc4Odzny/l6zIHVd+oF29nOJwepSC3Ulkhf0LZPBtKd58GdwTga+V965bClSfrDsf/c17L/taeX9b2W4NJdgS+HN5vh3ZzXUNuqT7Lll5bw0JuYSM+o8nu+F+j55JgxcBjq47vQOY741LwWKz6ndOVmpnJ4NVd5CTHN5MVhjGkpWifepO/3TmtTr0aQTwuXJdPoAMTGxbXruSypA+erp87jKY6e1nHucCdqhsjyXnf7if7F10Ztl/LtnraN6y/QVgSt3pn9HvlxyutkwpS6xH9kKYmSHQ662Sx22pTO5HFpR/So6Xbk2Mtyo530frmP+07kN0SQ8UcvjLAeX5EmQwptWaeQllYjuyEexXdOnSxeREhdVr08bkcIJLS55bQYxHgI9VjlutHPN9Gr4sqh99Og+6NnBRycNBwH1t+3YngxOnkr1Vx1de27ncl34P7Fp3+vuZ5/YeU4uX6/EJZXsfsh6wGjmk61fl3nRQef6Jbvi9jmSYKMtX/T4ifle2l6bcXMlWrLmAlyUtEhFPSHqYXAr1rog4oUzwuXBEPFHe/8YSq7VkaAZFxJ2SXiC7s64ZucybdYddyMLTqlGW+CzLls1FWQZX0uHkRWo+suXydkmTIuImST+JyuR
D3SR6lpFcjxxz+wuyZe854GpJtwNrSLotIo6WdL2ku8luvLdGxF01Jb0/DgL+JulosrX95Ih4tSzRdTnwVUnvp2dM/RuTytWW4oGxAbnu+g/KUmVzSnqGnGh4IbJFd+3ISQH3I2/KG0v6QkT8ocZ0T7fK+Twr2br7JNlS+V6yEvEhSQ+RFcJry+psW5EtQHeS3dkbp+08XAvYVtKzkRN/rQG8EhGrKJcsvlXSTmR33QPIStFtEXGkpD/28ve6RuTkjtuRE+i+l1zKu6Uxk472l3LpzyeApyPifkkzR8TLZMBmz5h6Odv5gH9JehfZ6+LH5HxcRC4F3A0+SS7PfC4ZIH8F+He5Tj0G7KFc1vlt5ASdj9SX1OlXvs8vkCvA3CNpMjkMZF9y2OLdkv5CNu59hOzx+G1JZ0fEaxFxj6R9o0ySXFlOMrrx9zvcRRdMrNoH3yAnON8zIs4v+64lK+m/Ia9VW0n6RkT8MyIukfR9sgL/AuSEpd1QZpb0toh4ppVWSYsCT0Qu+/sSsGQ59FqyIeEDEXGMpPnIITWLkI0NXbGowbBYhaTMdnwJcIGkvcrut5PdZC4tN5mXyJUpti2vP0zekN4paa4SrHiizIQ/0xD5YW8UEZ9w8KJ7lJmfPwBcGBH/kjS6vLQAObndYuViewfZUr96RPyJHKP7R8hVN2pIer+U1TYOlbScpBGSZpb0NTJyvg1wNXkdO43snnxBeT6fpHHkPDW7RM5a/9eastEvEfESWXD8TUScFBGvlv0REReRq63cSrkJtV6rLcEzSD2rQN1Jrr5xBBm4OZOs7BxMtmKvTZ7n+5Ctgt8t53zjgxe9zdwu6Stkd+zXI+Ls8vwQskDxCtkK9DsyIPk8OdHjrwcv1f2yUOX5T8l5pbaQNJJs9fk9vLE6w4HkHB/3ka3yG5TABhFxVvm3a8/rEmiKobRyQSkH7UL2VF0sIm6VdCiwTVktZGPgH+XYWcvb/kBPl+3tgI9ExOmDn/r+K9fkw8ny4rJkb4ynS0PCOWRvmyXJniXfqi+l068ELx4ju5evQQ7R25oMxuwALCzp52SFcBlJH4qIn5CVwK+1/k4leDHCgQurW/nNngYc2CovR8SzZA/A8eQcSzvTszojEfFiRLzQul83vcxcrsfbkvNjIWmREoQ5HThe0lJk8PVdkhaNiD+TZaexknaPiPMj4tCI2KlbghcwTAIY5IRR95Nd2z5Sbrw/IW84OwJExC3kuMUPSrqGnC38J8CmwMutP1Sux0MheNH4H6X9t/KdvUqOT4NybpYKzhPk8mxrldeqy+BeGV20DK6kmSSdRLb+zEMGaEaTsz+/PSLeGRH7ATeRLdLPRsQVEfErsuX6ncC/IuIfETGlnlwMiAuA1UuPMSRNkHShpO0i4vGS5ydqTmO/SBon6fOS5oee1p6ImESOnX8X+f1eQw6t+DxZwT2HDGDtBnw9Ik5r+rVMuRTqSZQAuaQ51LNU6O1khWH50mK5NvCViLiHDFhsJGndiLgjIk6MiJ/VkYe+krQw2QAAQOQSkTORExyuTg5/2qfylkfJnlSQBc1vRc8SfkPF2m29EbqSpHkkzVkqpVPI83PD8vIs5G8WcsjTsQAR8U9JywKjI+IUYJuI2C4i/thqEBrcXMywc8mJ3Y8veRgDELms65ERsVdEPFlrCvshcrnEX5DDtv5NBh3nI4dMBxnM+EzkUop/AA6XNJZsUDm6l7/X6GuyDSvfJ+9Jn23tiIjvkhO+315e/6/ztVvO4fL7XIpcZh3y93guOc/FzuTCDQ+T5enjyjG3A78kV8p6o7dUN+m2G8d0K11PXyC7mM9OdhtaF/go2Xo7q6SNACLiRrJ7/un0zBHxODlpnFntykXmVrIFZGxERGnxgqzojQCOkHQ/8FipBHWj6jCZIyPi1oj4J2W4iHI9bsgK/sbAKEmrSLqBvJBvFRHP1ZLyAVQq9dsDV0g6ntIbISKurDdl/SdpUUkbAit+/lwAAAlOSUR
BVH8ll7xdv1WJaf0bEadGxIci4syI+FZE3ETOc7IMOQfG7hGxbkQ8VlM2+qQE4r5MTpL8d+B1SauRPSw+DhARV5NL3m5DVgLvB46UdALZQvT5EmDvCqXy9m1JewJIOofsav4rcuzxvcBPJJ2nHAp2IqXwGBG/LMHYIaVbCsJvpbTiPQucLGlMRNxNLh36bklvJ+c1mYMsNB8PrCbpGEmnkK36ywK0AsqVFvquahAqlYXtKMP2SmW/+lo325q8p0IOrx4FvFB6Tr0ErFt6Es1ElkNeL63Vz3VhIMqGifK7PB7YTdKakkaXHmO/JocznhAR99ebyukjaXnl0LWW64CtJS1Arn61DBmw+DE52fm/yZXb1pW0cWTP+7Mi4kfQndeu4XTBuZqcDfleMur0KXJN+Z9SaQ0qrZmPkBfxo4CvRsSLg55as16Ui8yPyd/uLmXfS+Xljcmly/YHNomIw+pI44x6i2EykN2SXyAvzpReJX8jx/M9CuxdWsCeHeRkd0xkF/QXyDlO1oqIc2tOUr+UIUDHkZWZcRHxCtn1eBdKj6L2ykzru5f0BTIw9btS6fnboCa+/3YiA3FrR8TxpSfUPWTvvpA0sRx3FlnxW53s0vpDcqLOPSLiezWke0Z9Ejhb0i/IQOoEslXoJbLnzF7APeQ8L7+NiD1rS6n1SUT8lhzeNRE4VNL+5Eoqz5IV38fI8tQqZGPRZmRZ6j/Aau1B124O6pRr8mtDaVgQvNEL41blHHBfJlcneIqcn+dSMgj1TXKOooOq99luC0TZ8FLKil8G9iOXDl2RXAb5ZeiuHgiS3kYO5fp0ZVjqH8hJzdcjeyxvBRxTysMvSlotcjj5oeXYrr4GQ896zUOepN3ISZdeB95BruIwEViU7Ia/favbn6TZgfdFxA01JdfsLUnalOyieyO5isz+ZIvJJyLioRqTNiAkXQzcGRGnlV5UUXntw2Qvqr+RLX5Lkr/fIRO0aKcumUTqzSgnG72QvOmeXm6krdcuBh4gV8p5ubJ/FnJ+hD2B28ju2V3zHZeCxSXAxZHDYlr73032qnmEHOp0WES8Iukecjz5/lHGkXez0gPjAxHRGjYziqzobku2CN0vaXR4DqauIWkRcuz02mSPv7PJIX4iKwUPkcO9/gJ8ufrddvs1rN1Qy09Lue6+CMwVOfznjXxKmicqw7s0dOaDs2Gi9CYaX+kJ1pXnsKTlyWG1L5BBmdHk5O5HkUGMcWQvsZ+S5a4xZM/Vri9btAynAMY8ZAvBRRHx8bJvCWDpMnTErKtIWoOc72IVcoWNs2pO0oAokfB9yZnQT4hcVWUM2ZI3O7AyOb/HDv/f3v2FWlaWcRz//tQJMgLRDI2ooAsvvBKHzJTIqanwxgjpD0ZYhHnTkIbeKBpJ0Y0WVgRazthkEGkiRsMQgxU50ISY4p8gClNR1GESKpgDc+bp4n03bE4dznHmnLPW3vv7uTn7rLX22s/icNbe+3nf93mApaq6fbBgtS5JLqIlpCZLRXYAH6VVvD8P+DbwtVpR3yHJduDYyu2zoidnDlbVD3pC4zra9PO30ZaVLNE+aPyT1l1ld1X9fqh4N1KfUv48cHn1+g/9PXcnsL9moOiq/leSPbSRvodprYyvp81q/Suts9sO4EhVPTb1nJnsIrOo0ro7XVhV1yQ5rXoB6cnfcV6TN1oss5q8mEhrULGbdu+9ndbFaxut0PDVtFmd5wD7qhd6nyeLlMAIcAfw66r6zcobsDdkzap5/HCY5Dza6Pvfquq7U9uvo7Xs++k8Xvc8S/IAbQbcC8AHaSO0P+/77qAVp71tXkYI+nvOl2kdr75VVa8kuahaK8IP0IqRfgy4EThcVbcNGO6mSGsp+Z2qev/QsWhj9JpLzwPv7Ev8rqLdq98LXNyXmmiG9eTjEVoSw7+nNFJpBZI/SWtI8T1gO60L0lJPcByfoSW3b8iiJTAeAu4GfuUXH2ncVlkms412c35qwNB0ApKcQWszuLeqru3
bTqmq40nOBfYDN1TV/iHj3EhTibi/V+vAMNn+WdqXvV2zPgq0liQHactiZr4Lh5ok19CW2V7dfz+dtuTg5UED04ZJ8vaqetWBAmn8+iDQlcDrVTVXtXlWszAJDGjTbWoOOhNIi2Jel8ksqiRfpxW0/PDUto/QigO+D3hi3u7RPRF3K63A7p9oRaOLVmBrpiqfnwhnN86flSP0Li2QpK03de99M/Bx4Kyq+tHQcW2FhUpgTJhRlmaL/7PzI8k/aIU5n6at31wGvtIr4M+lnoi7BLgAeLSqfjhwSNJJmYzQDx2HJGnxLGQCQ5I0jCSfprXkOwTcVVX3DBzSljERJ0mSdHJMYEiStlSSL9E6Qh0dOhZJkiTNDhMYkiRJkiRp9E4ZOgBJkiRJkqS1mMCQJEmSJEmjZwJDkiRJkiSNngkMSZIkSZI0eiYwJEnSzEry1SSnDx2HJEnafHYhkSRJMyvJc8D2qjr8f/adWlXLWx+VJEnaDM7AkCRJmyrJ55M8meSJJHuTvDvJgb7tQJJ39eP2JLly6nn/7j8/lOS3Se5P8pck96XZBbwDeCTJI5PnJPlGkj8CNyd5cOp8O5P8cksvXpIkbZjThg5AkiTNryTnAzcBl1TV4SRnAvcCP6mqe5N8EbgT+MQap7oAOB94CXi0n+/OJNcDl03NwHgL8FRV3ZIkwLNJzq6q14AvALs3/CIlSdKWcAaGJEnaTDuA+ycJhqo6AlwM/Kzv3wtcuo7zHKqqF6vqOPBn4D2rHLcMPNBfq/r5P5fkjP66+07wOiRJ0sCcgSFJkjZTgLUKbk32H6MPrvTZE2+aOmZp6vEyq3+GObqi7sVu4GHgKPCLqjq2zrglSdLIOANDkiRtpgPAp5KcBdCXkBwEPtP3XwX8oT9+DriwP74C2LaO8/8LeOtqO6vqJdqyk5uBPW8sdEmSNCbOwJAkSZumqp5O8k3gd0mWgceBXcA9SW4AJrUpAO4GHkpyiJb4+M86XuIuYF+Sl6vqslWOuQ84u6qeOZlrkSRJw7KNqiRJmmtJvg88XlU/HjoWSZJ04kxgSJKkuZXkMdpMjp1VtbTW8ZIkabxMYEiSJEmSpNGziKckSZIkSRo9ExiSJEmSJGn0TGBIkiRJkqTRM4EhSZIkSZJGzwSGJEmSJEkaPRMYkiRJkiRp9P4LvhgGn7aLHDYAAAAASUVORK5CYII=\n",
1133 | "text/plain": [
1134 | ""
1135 | ]
1136 | },
1137 | "metadata": {},
1138 | "output_type": "display_data"
1139 | }
1140 | ],
1141 | "source": [
1142 | "g = sns.catplot(x = 'country', y = 'conversion', \\\n",
1143 | " data = data[data['test']==0].groupby('country', as_index = False).conversion.mean(),\\\n",
1144 | " kind = 'bar', height = 3, aspect = 5)\n",
1145 | "g.set_xticklabels(rotation=30)"
1146 | ]
1147 | },
1148 | {
1149 | "cell_type": "markdown",
1150 | "metadata": {},
1151 | "source": [
1152 | "### Step2: Calculate t statistics"
1153 | ]
1154 | },
1155 | {
1156 | "cell_type": "markdown",
1157 | "metadata": {},
1158 | "source": [
1159 | "#### Hypothesis:\n",
1160 | "Null hypothesis: the population mean (conversion rate) of the local translation equals the population mean of the Spaniard translation, mu1 = mu2.\n",
1161 | "Alternative hypothesis: mu1 != mu2.\n",
1162 | "We use a significance level of alpha = 0.05 and a two-tailed test."
1163 | ]
1164 | },
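Written out in symbols, as a sketch: $\bar{x}_i$, $s_i^2$ and $n_i$ denote the sample mean, sample variance and size of group $i$ (notation introduced here for illustration). The statistic is given in its Welch form, which is what `scipy.stats.ttest_ind` computes when `equal_var=False`, the setting used further down in this notebook.

$$
H_0:\ \mu_1 = \mu_2 \qquad H_1:\ \mu_1 \neq \mu_2
$$

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}, \qquad \text{reject } H_0 \text{ if } p < \alpha = 0.05 \text{ (two-tailed)}
$$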
1165 | {
1166 | "cell_type": "markdown",
1167 | "metadata": {},
1168 | "source": [
1169 | "Break the data into two groups, control and experiment, excluding the Spain data."
1170 | ]
1171 | },
1172 | {
1173 | "cell_type": "code",
1174 | "execution_count": 309,
1175 | "metadata": {},
1176 | "outputs": [],
1177 | "source": [
1178 | "controll = data[(data['test']==0)&(data['country']!='Spain')]\n",
1179 | "exp = data[(data['test']==1)&(data['country']!='Spain')]"
1180 | ]
1181 | },
1182 | {
1183 | "cell_type": "markdown",
1184 | "metadata": {},
1185 | "source": [
1186 | "#### The example below illustrates Simpson's Paradox"
1187 | ]
1188 | },
1189 | {
1190 | "cell_type": "markdown",
1191 | "metadata": {},
1192 | "source": [
1193 | "Compare the control and test groups across all countries as a whole."
1194 | ]
1195 | },
1196 | {
1197 | "cell_type": "code",
1198 | "execution_count": 314,
1199 | "metadata": {},
1200 | "outputs": [
1201 | {
1202 | "name": "stdout",
1203 | "output_type": "stream",
1204 | "text": [
1205 | "The avg conversion rate of control group: 0.04829179055749524\n",
1206 | "The avg conversion rate of exp group: 0.043411161678422794\n"
1207 | ]
1208 | }
1209 | ],
1210 | "source": [
1211 | "# calculate the mean conversion rate for both groups\n",
1212 | "print ('The avg conversion rate of control group: {}'.format(controll.conversion.mean()))\n",
1213 | "print ('The avg conversion rate of exp group: {}'.format(exp.conversion.mean()))"
1214 | ]
1215 | },
1216 | {
1217 | "cell_type": "code",
1218 | "execution_count": 315,
1219 | "metadata": {},
1220 | "outputs": [],
1221 | "source": [
1222 | "# Welch's t-test (equal_var=False does not assume equal variances): t statistic and p-value\n",
1223 | "t, p = stats.ttest_ind(a=controll['conversion'], b=exp['conversion'], equal_var=False)"
1224 | ]
1225 | },
1226 | {
1227 | "cell_type": "code",
1228 | "execution_count": 316,
1229 | "metadata": {},
1230 | "outputs": [
1231 | {
1232 | "name": "stdout",
1233 | "output_type": "stream",
1234 | "text": [
1235 | "7.35389520308 1.92891785778e-13\n"
1236 | ]
1237 | }
1238 | ],
1239 | "source": [
1240 | "print (t,p)"
1241 | ]
1242 | },
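For reference, `equal_var=False` makes `ttest_ind` run Welch's t-test. The sketch below recomputes Welch's statistic by hand using only the standard library. The helper name `welch_t` and the two small 0/1 samples are made up for illustration and are not the notebook's data. The p-value uses a normal approximation to the t distribution, which is only reasonable when the degrees of freedom are large, as they are for samples of this size.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and its degrees of freedom for two independent samples."""
    # Squared standard error of each sample mean (sample variance / n).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up binary conversion samples (1 = converted), not the notebook's data.
control_sample = [1] * 50 + [0] * 950  # 5.0% conversion
exp_sample = [1] * 40 + [0] * 960      # 4.0% conversion

t_stat, dof = welch_t(control_sample, exp_sample)
# Two-sided p-value via the normal approximation (assumes large degrees of freedom).
p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))
print(t_stat, dof, p_approx)
```

With hundreds of thousands of rows, as in this notebook, the normal approximation and the exact t distribution give practically the same p-value.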
1243 | {
1244 | "cell_type": "markdown",
1245 | "metadata": {},
1246 | "source": [
1247 | "Taken at face value, the test group does significantly worse than the control group: the conversion rate appears to drop significantly after the change.\n",
1248 | "A result this far from what we expected suggests that something may be wrong with the experiment setup.\n",
1249 | "Let's dive deeper into the sample."
1250 | ]
1251 | },
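To make the suspicion concrete, here is a minimal illustration of Simpson's paradox. All counts are hypothetical and chosen only to show the effect; they are not taken from this dataset. In each of the two made-up countries the test group converts slightly better than control, yet the pooled test rate is lower because the test group draws most of its users from the low-converting country.

```python
# Hypothetical counts, chosen only to illustrate the effect (not taken from the data).
# country -> group -> (users, conversions)
counts = {
    "high_cr_country": {"control": (8000, 480), "test": (2000, 130)},  # 6.0% vs 6.5%
    "low_cr_country": {"control": (2000, 20), "test": (8000, 120)},    # 1.0% vs 1.5%
}


def rate(users, conversions):
    return conversions / users


# Within each country, the test group converts better than control.
per_country = {
    country: {group: rate(*uc) for group, uc in groups.items()}
    for country, groups in counts.items()
}

# Pooled over countries, the ordering flips because of the different country mix.
pooled = {}
for group in ("control", "test"):
    users = sum(counts[c][group][0] for c in counts)
    conversions = sum(counts[c][group][1] for c in counts)
    pooled[group] = rate(users, conversions)

for country, rates in per_country.items():
    print(country, rates)
print("pooled", pooled)
```

A pooled comparison is therefore only trustworthy when both groups have a similar composition, which is why the per-country sample sizes are worth checking.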
1252 | {
1253 | "cell_type": "code",
1254 | "execution_count": 319,
1255 | "metadata": {},
1256 | "outputs": [
1257 | {
1258 | "data": {
1259 | "text/plain": [
1260 | "date\n",
1261 | "2015-11-30 0.051204\n",
1262 | "2015-12-01 0.046249\n",
1263 | "2015-12-02 0.048472\n",
1264 | "2015-12-03 0.049255\n",
1265 | "2015-12-04 0.047085\n",
1266 | "Name: conversion, dtype: float64"
1267 | ]
1268 | },
1269 | "execution_count": 319,
1270 | "metadata": {},
1271 | "output_type": "execute_result"
1272 | }
1273 | ],
1274 | "source": [
1275 | "# the conversion rate in the test group is consistently lower on every day\n",
1276 | "controll.groupby('date').conversion.mean()"
1277 | ]
1278 | },
1279 | {
1280 | "cell_type": "code",
1281 | "execution_count": 320,
1282 | "metadata": {},
1283 | "outputs": [
1284 | {
1285 | "data": {
1286 | "text/plain": [
1287 | "date\n",
1288 | "2015-11-30 0.043878\n",
1289 | "2015-12-01 0.041371\n",
1290 | "2015-12-02 0.044216\n",
1291 | "2015-12-03 0.043898\n",
1292 | "2015-12-04 0.043459\n",
1293 | "Name: conversion, dtype: float64"
1294 | ]
1295 | },
1296 | "execution_count": 320,
1297 | "metadata": {},
1298 | "output_type": "execute_result"
1299 | }
1300 | ],
1301 | "source": [
1302 | "exp.groupby('date').conversion.mean()"
1303 | ]
1304 | },
1305 | {
1306 | "cell_type": "code",
1307 | "execution_count": 321,
1308 | "metadata": {},
1309 | "outputs": [],
1310 | "source": [
1311 | "c_country = pd.Series(controll.groupby('country').size(), name = 'controll')"
1312 | ]
1313 | },
1314 | {
1315 | "cell_type": "code",
1316 | "execution_count": 322,
1317 | "metadata": {},
1318 | "outputs": [],
1319 | "source": [
1320 | "e_country = pd.Series(exp.groupby('country').size(), name = 'exp')"
1321 | ]
1322 | },
1323 | {
1324 | "cell_type": "code",
1325 | "execution_count": 323,
1326 | "metadata": {},
1327 | "outputs": [
1328 | {
1329 | "data": {
1330 | "text/html": [
1331 | "<div>\n",
1345 | "<table border=\"1\" class=\"dataframe\">\n",
1346 | "  <thead>\n",
1347 | "    <tr style=\"text-align: right;\">\n",
1348 | "      <th></th>\n",
1349 | "      <th>controll</th>\n",
1350 | "      <th>exp</th>\n",
1351 | "    </tr>\n",
1352 | "    <tr>\n",
1353 | "      <th>country</th>\n",
1354 | "      <th></th>\n",
1355 | "      <th></th>\n",
1356 | "    </tr>\n",
1357 | "  </thead>\n",
1358 | "  <tbody>\n",
1359 | "    <tr>\n",
1360 | "      <th>Argentina</th>\n",
1361 | "      <td>9356</td>\n",
1362 | "      <td>37377</td>\n",
1363 | "    </tr>\n",
1364 | "    <tr>\n",
1365 | "      <th>Bolivia</th>\n",
1366 | "      <td>5550</td>\n",
1367 | "      <td>5574</td>\n",
1368 | "    </tr>\n",
1369 | "    <tr>\n",
1370 | "      <th>Chile</th>\n",
1371 | "      <td>9853</td>\n",
1372 | "      <td>9884</td>\n",
1373 | "    </tr>\n",
1374 | "    <tr>\n",
1375 | "      <th>Colombia</th>\n",
1376 | "      <td>27088</td>\n",
1377 | "      <td>26972</td>\n",
1378 | "    </tr>\n",
1379 | "    <tr>\n",
1380 | "      <th>Costa Rica</th>\n",
1381 | "      <td>2660</td>\n",
1382 | "      <td>2649</td>\n",
1383 | "    </tr>\n",
1384 | "    <tr>\n",
1385 | "      <th>Ecuador</th>\n",
1386 | "      <td>8036</td>\n",
1387 | "      <td>7859</td>\n",
1388 | "    </tr>\n",
1389 | "    <tr>\n",
1390 | "      <th>El Salvador</th>\n",
1391 | "      <td>4108</td>\n",
1392 | "      <td>4067</td>\n",
1393 | "    </tr>\n",
1394 | "    <tr>\n",
1395 | "      <th>Guatemala</th>\n",
1396 | "      <td>7622</td>\n",
1397 | "      <td>7503</td>\n",
1398 | "    </tr>\n",
1399 | "    <tr>\n",
1400 | "      <th>Honduras</th>\n",
1401 | "      <td>4361</td>\n",
1402 | "      <td>4207</td>\n",
1403 | "    </tr>\n",
1404 | "    <tr>\n",
1405 | "      <th>Mexico</th>\n",
1406 | "      <td>64209</td>\n",
1407 | "      <td>64275</td>\n",
1408 | "    </tr>\n",
1409 | "    <tr>\n",
1410 | "      <th>Nicaragua</th>\n",
1411 | "      <td>3419</td>\n",
1412 | "      <td>3304</td>\n",
1413 | "    </tr>\n",
1414 | "    <tr>\n",
1415 | "      <th>Panama</th>\n",
1416 | "      <td>1966</td>\n",
1417 | "      <td>1985</td>\n",
1418 | "    </tr>\n",
1419 | "    <tr>\n",
1420 | "      <th>Paraguay</th>\n",
1421 | "      <td>3650</td>\n",
1422 | "      <td>3697</td>\n",
1423 | "    </tr>\n",
1424 | "    <tr>\n",
1425 | "      <th>Peru</th>\n",
1426 | "      <td>16869</td>\n",
1427 | "      <td>16797</td>\n",
1428 | "    </tr>\n",
1429 | "    <tr>\n",
1430 | "      <th>Uruguay</th>\n",
1431 | "      <td>415</td>\n",
1432 | "      <td>3719</td>\n",
1433 | "    </tr>\n",
1434 | "    <tr>\n",
1435 | "      <th>Venezuela</th>\n",
1436 | "      <td>16149</td>\n",
1437 | "      <td>15905</td>\n",
1438 | "    </tr>\n",
1439 | "  </tbody>\n",
1440 | "</table>\n",
1441 | "</div>"
1442 | ],
1443 | "text/plain": [
1444 | " controll exp\n",
1445 | "country \n",
1446 | "Argentina 9356 37377\n",
1447 | "Bolivia 5550 5574\n",
1448 | "Chile 9853 9884\n",
1449 | "Colombia 27088 26972\n",
1450 | "Costa Rica 2660 2649\n",
1451 | "Ecuador 8036 7859\n",
1452 | "El Salvador 4108 4067\n",
1453 | "Guatemala 7622 7503\n",
1454 | "Honduras 4361 4207\n",
1455 | "Mexico 64209 64275\n",
1456 | "Nicaragua 3419 3304\n",
1457 | "Panama 1966 1985\n",
1458 | "Paraguay 3650 3697\n",
1459 | "Peru 16869 16797\n",
1460 | "Uruguay 415 3719\n",
1461 | "Venezuela 16149 15905"
1462 | ]
1463 | },
1464 | "execution_count": 323,
1465 | "metadata": {},
1466 | "output_type": "execute_result"
1467 | }
1468 | ],
1469 | "source": [
1470 | "pd.concat([c_country,e_country],axis = 1)"
1471 | ]
1472 | },
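In the table above, most countries are split almost evenly between the two groups, but Argentina and Uruguay are not. One way to flag such countries automatically is a per-country chi-square goodness-of-fit test against an expected split. The sketch below uses only the standard library; the 50/50 expectation, the 0.001 threshold, and the helper name `split_p_value` are assumptions made for illustration, while the counts are copied from the table.

```python
import math

# (control, experiment) user counts per country, copied from the table above.
country_counts = {
    "Argentina": (9356, 37377),
    "Bolivia": (5550, 5574),
    "Chile": (9853, 9884),
    "Colombia": (27088, 26972),
    "Costa Rica": (2660, 2649),
    "Ecuador": (8036, 7859),
    "El Salvador": (4108, 4067),
    "Guatemala": (7622, 7503),
    "Honduras": (4361, 4207),
    "Mexico": (64209, 64275),
    "Nicaragua": (3419, 3304),
    "Panama": (1966, 1985),
    "Paraguay": (3650, 3697),
    "Peru": (16869, 16797),
    "Uruguay": (415, 3719),
    "Venezuela": (16149, 15905),
}


def split_p_value(n_control, n_exp, expected_share=0.5):
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for the observed split."""
    total = n_control + n_exp
    exp_control = total * expected_share
    exp_exp = total * (1 - expected_share)
    chi2 = (n_control - exp_control) ** 2 / exp_control + (n_exp - exp_exp) ** 2 / exp_exp
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


ALPHA = 0.001  # assumed, deliberately strict threshold because 16 countries are tested
unbalanced = [
    country
    for country, (n_c, n_e) in country_counts.items()
    if split_p_value(n_c, n_e) < ALPHA
]
print(unbalanced)  # prints ['Argentina', 'Uruguay']
```

A check like this is one candidate for the steps that could be automated before any conversion-rate comparison is run.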
1473 | {
1474 | "cell_type": "code",
1475 | "execution_count": 324,
1476 | "metadata": {},
1477 | "outputs": [
1478 | {
1479 | "data": {
1480 | "text/plain": [
1481 | ""
1482 | ]
1483 | },
1484 | "execution_count": 324,
1485 | "metadata": {},
1486 | "output_type": "execute_result"
1487 | },
1488 | {
1489 | "data": {
1490 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYcAAAE4CAYAAACwgj/eAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmcXFWZ//HPN2EJQtgDOgQmqFEIGBYDMqAgZIgwIKDCsAhEWaIOCuqA4jgOCKK4MsYNUQIB2VcRQUB2QoAkEMIS+BEhQAaEQBBZRAw8vz/OqaTStzrdfe+t9ML3/XrVq+ueuvXUqe6ueu4921VEYGZm1mxQb1fAzMz6HicHMzMrcHIwM7MCJwczMytwcjAzswInBzMzK3ByMDOzAicHMzMrcHIwM7OC5Xq7AmWtvfbaMWLEiN6uhplZvzFjxoznImJYd/btt8lhxIgRTJ8+vberYWbWb0h6vLv7ulnJzMwKnBzMzKzAycHMzAr6bZ+Dmb11/eMf/2DevHm89tprvV2VPmnIkCEMHz6c5ZdfvnQMJwcz63fmzZvH0KFDGTFiBJJ6uzp9SkTw/PPPM2/ePDbccMPScdysZGb9zmuvvcZaa63lxNCCJNZaa63KZ1VODmbWLzkxdK6O342Tg5mZFbjPwcz6vRHH/r7WeHNP3q3WeC1fY+5cbr/9dg444IAeP2/33Xfn/vvv56abbuIHP/gBV155Ze31c3Iw66dafSHOHdLJF83xL7a5NtZTc+fO5dxzz22ZHBYuXMhyy/Xu17OblczMSjjrrLMYPXo0m222GQcddBCPP/44Y8eOZfTo0YwdO5YnnngCgE996lMceeSRbLvttrzzne/k4osvBuDYY4/l1ltvZfPNN+eUU07hzDPPZJ999uGjH/0o48aNIyI45phj2HTTTXnf+97HBRdcsEzfn88czMx66IEHHuCkk05iypQprL322ixYsIDx48dz8MEHM378eCZNmsSRRx7J5ZdfDsDTTz/NbbfdxkMPPcQee+zB3nvvzcknn7xEk9CZZ57J1KlTmTVrFmuuuSaXXHIJM2fO5N577+W5555jq622Yvvtt19m79FnDmZmPXTDDTew9957s/baawOw5pprMnXq1EVNRAcddBC33Xbbov332msvBg0axKhRo3jmmWc6jbvzzjuz5pprAnDbbbex//77M3jwYNZdd1122GEHpk2b1sZ3tSQnBzOzHoqILoeLNj++4oorLvHczqy88srd2m9ZcHIwM+uhsWPHcuGFF/L8888DsGDBArbddlvOP/98AM455xw++MEPLjXG0KFDeemllzp9fPvtt+eCCy7gjTfeYP78+dxyyy1svfXW9b2JLrjPwcz6vWUx9LTZJptswte//nV22GEHBg8ezBZbbMHEiRM55JBD+P73v8+wYcM444wzlhpj9OjRLLfccmy22WZ86lOfYo011lji8Y997GNMnTqVzTbbDEl873vf4+1vfztz585t4ztbTL196lLWmDFjwhf7sbeyt/JQ1tmzZ7Pxxhv3djX6tFa/I0kzImJMd57vZiUzMytwcjAzs4JuJQdJq0u6WNJDkmZL+hdJa0q6TtIj+ecaeV9JmihpjqRZkrZsijM+7/+IpPFN5e+XdF9+zkR5RS0zs17V3TOHHwN/iIiNgM2A2cCxwPURMRK4Pm8D7AqMzLcJwC8AJK0JHAd8ANgaOK6RUPI+E5qet0u1t2VmZlV0mRwkrQpsD5wOEBGvR8RfgD2ByXm3ycBe+f6ewFmR3AGsLukdwEeA6yJiQUS8AFwH7JIfWzUipkbqHT+rKZaZmfWC7pw5vBOYD5wh6R5Jv5a0MrBuRDwNkH+uk/dfD3iy6fnzctnSyue1KC+QNEHSdEnT58+f342qm5lZGd2Z57AcsCXwhYi4U9KPWdyE1Eqr/oIoUV4sjDgNOA3SUNalVdrM3kKOX63meANr6G8Z3TlzmAfMi4g78/bFpGTxTG4SIv98tmn/9ZuePxx4qovy4S3Kzcysl3SZHCLiz8CTkt6bi8YCDwJXAI0RR+OB3+b7VwA
H51FL2wAv5mana4BxktbIHdHjgGvyYy9J2iaPUjq4KZaZWZ/0m9/8hq233prNN9+cz3zmMzz++OOMHDmS5557jjfffJMPfehDXHvttcydO5eNNtqI8ePHM3r0aPbee29effXV3q5+l7o7WukLwDmSZgGbA98GTgZ2lvQIsHPeBrgKeBSYA/wK+A+AiFgAnAhMy7cTchnA54Bf5+f8Cbi62tsyM2uf2bNnc8EFFzBlyhRmzpzJ4MGDufnmm/nqV7/KZz/7WX74wx8yatQoxo0bB8DDDz/MhAkTmDVrFquuuio///nPe/kddK1baytFxEyg1ZTrsS32DeCITuJMAia1KJ8ObNqdupiZ9bbrr7+eGTNmsNVWWwHwt7/9jXXWWYfjjz+eiy66iFNPPZWZM2cu2n/99ddnu+22A+DAAw9k4sSJHH300b1S9+7ywntmZj0UEYwfP57vfOc7S5S/+uqrzJuXBl++/PLLDB06FKCwvHd/mOfr5TPMzHpo7NixXHzxxTz7bBqHs2DBAh5//HG++tWv8slPfpITTjiBww8/fNH+TzzxBFOnTgXgvPPO63I5777AZw5m1v8t46Gno0aN4lvf+hbjxo3jzTffZPnll+dHP/oR06ZNY8qUKQwePJhLLrmEM844gx133JGNN96YyZMn85nPfIaRI0fyuc99bpnWtwwnBzOzEvbdd1/23XffJcruuOOORfcvvfRSAObOncugQYM49dRTl2n9qnKzkpmZFTg5mJm10YgRI7j//vt7uxo95uRgZv1Sf72K5bJQx+/GycHM+p0hQ4bw/PPPO0G0EBE8//zzDBkypFIcd0ibWb8zfPhw5s2bh1dnbm3IkCEMHz686x2XwsnBzPqd5Zdfng033LC3qzGguVnJzMwKnBzMzKzAycHMzAqcHMzMrMDJwczMCpwczMyswMnBzMwKnBzMzKzAycHMzAqcHMzMrMDJwczMCpwczMyswMnBzMwKupUcJM2VdJ+kmZKm57I1JV0n6ZH8c41cLkkTJc2RNEvSlk1xxuf9H5E0vqn8/Tn+nPxc1f1Gzcys+3py5rBjRGweEWPy9rHA9RExErg+bwPsCozMtwnALyAlE+A44APA1sBxjYSS95nQ9LxdSr8jMzOrrEqz0p7A5Hx/MrBXU/lZkdwBrC7pHcBHgOsiYkFEvABcB+ySH1s1IqZGuqzTWU2xzMysF3Q3OQRwraQZkibksnUj4mmA/HOdXL4e8GTTc+flsqWVz2tRbmZmvaS7V4LbLiKekrQOcJ2kh5ayb6v+gihRXgycEtMEgA022GDpNTYzs9K6deYQEU/ln88Cl5H6DJ7JTULkn8/m3ecB6zc9fTjwVBflw1uUt6rHaRExJiLGDBs2rDtVNzOzErpMDpJWljS0cR8YB9wPXAE0RhyNB36b718BHJxHLW0DvJibna4BxklaI3dEjwOuyY+9JGmbPErp4KZYZmbWC7rTrLQucFkeXboccG5E/EHSNOBCSYcCTwD75P2vAv4NmAO8CnwaICIWSDoRmJb3OyEiFuT7nwPOBFYCrs43MzPrJV0mh4h4FNisRfnzwNgW5QEc0UmsScCkFuXTgU27UV8zM1sGPEPazMwKnBzMzKzAycHMzAqcHMzMrMDJwczMCpwczMyswMnBzMwKnBzMzKzAycHMzAqcHMzMrMDJwczMCpwczMyswMnBzMwKnBzMzKzAycHMzAqcHMzMrMDJwczMCpwczMyswMnBzMwKnBzMzKzAycHMzAqcHMzMrMDJwczMCrqdHCQNlnSPpCvz9oaS7pT0iKQLJK2Qy1fM23Py4yOaYnwtlz8s6SNN5bvksjmSjq3v7ZmZWRk9OXM4CpjdtP1d4JSIGAm8AByayw8FXoiIdwOn5P2QNArYD9gE2AX4eU44g4GfAbsCo4D9875mZtZLupUcJA0HdgN+nbcF7ARcnHeZDOyV7++Zt8mPj8377wmcHxF/j4jHgDnA1vk2JyIejYjXgfPzvmZ
m1ku6e+bwv8BXgDfz9lrAXyJiYd6eB6yX768HPAmQH38x77+ovMNzOis3M7Ne0mVykLQ78GxEzGgubrFrdPFYT8tb1WWCpOmSps+fP38ptTYzsyq6c+awHbCHpLmkJp+dSGcSq0taLu8zHHgq358HrA+QH18NWNBc3uE5nZUXRMRpETEmIsYMGzasG1U3M7MyukwOEfG1iBgeESNIHco3RMQngRuBvfNu44Hf5vtX5G3y4zdEROTy/fJopg2BkcBdwDRgZB79tEJ+jStqeXdmZlbKcl3v0qmvAudL+hZwD3B6Lj8dOFvSHNIZw34AEfGApAuBB4GFwBER8QaApM8D1wCDgUkR8UCFepmZWUU9Sg4RcRNwU77/KGmkUcd9XgP26eT5JwEntSi/CriqJ3UxM7P28QxpMzMrcHIwM7MCJwczMyuo0iHdPxy/WouyF5d9PczM+hGfOZiZWYGTg5mZFTg5mJlZgZODmZkVODmYmVmBk4OZmRU4OZiZWYGTg5mZFTg5mJlZgZODmZkVODmYmVmBk4OZmRU4OZiZWYGTg5mZFTg5mJlZgZODmZkVODmYmVmBk4OZmRU4OZiZWUGXyUHSEEl3SbpX0gOSvpnLN5R0p6RHJF0gaYVcvmLenpMfH9EU62u5/GFJH2kq3yWXzZF0bP1v08zMeqI7Zw5/B3aKiM2AzYFdJG0DfBc4JSJGAi8Ah+b9DwVeiIh3A6fk/ZA0CtgP2ATYBfi5pMGSBgM/A3YFRgH7533NzKyXdJkcInk5by6fbwHsBFycyycDe+X7e+Zt8uNjJSmXnx8Rf4+Ix4A5wNb5NiciHo2I14Hz875mZtZLutXnkI/wZwLPAtcBfwL+EhEL8y7zgPXy/fWAJwHy4y8CazWXd3hOZ+VmZtZLupUcIuKNiNgcGE460t+41W75pzp5rKflBZImSJouafr8+fO7rriZmZXSo9FKEfEX4CZgG2B1Scvlh4YDT+X784D1AfLjqwELmss7PKez8lavf1pEjImIMcOGDetJ1c3MrAe6M1ppmKTV8/2VgH8FZgM3Anvn3cYDv833r8jb5MdviIjI5fvl0UwbAiOBu4BpwMg8+mkFUqf1FXW8OTMzK2e5rnfhHcDkPKpoEHBhRFwp6UHgfEnfAu4BTs/7nw6cLWkO6YxhP4CIeEDShcCDwELgiIh4A0DS54FrgMHApIh4oLZ3aGZmPdZlcoiIWcAWLcofJfU/dCx/Ddink1gnASe1KL8KuKob9TUzs2XAM6TNzKzAycHMzAqcHMzMrMDJwczMCpwczMysoDtDWa0fGHHs7wtlc4ccUNzx+BeXQW3MrL/zmYOZmRU4OZiZWYGTg5mZFTg5mJlZgZODmZkVODmYmVmBk4OZmRU4OZiZWYGTg5mZFTg5mJlZgZODmZkVODmYmVmBk4OZmRU4OZiZWYGTg5mZFTg5mJlZgZODmZkVODmYmVlBl8lB0vqSbpQ0W9IDko7K5WtKuk7SI/nnGrlckiZKmiNplqQtm2KNz/s/Iml8U/n7Jd2XnzNRktrxZs3MrHu6c+awEPjPiNgY2AY4QtIo4Fjg+ogYCVyftwF2BUbm2wTgF5CSCXAc8AFga+C4RkLJ+0xoet4u1d+amZmV1WVyiIinI+LufP8lYDawHrAnMDnvNhnYK9/fEzgrkjuA1SW9A/gIcF1ELIiIF4DrgF3yY6tGxNSICOCsplhmZtYLetTnIGkEsAVwJ7BuRDwNKYEA6+Td1gOebHravFy2tPJ5Lcpbvf4ESdMlTZ8/f35Pqm5mZj3Q7eQgaRXgEuCLEfHXpe3aoixKlBcLI06LiDERMWbYsGFdVdnMzErqVnKQtDwpMZwTEZfm4mdykxD557O5fB6wftPThwNPdVE+vEW5mZn1ku6MVhJwOjA7In7U9NAVQGPE0Xjgt03lB+dRS9sAL+Zmp2uAcZLWyB3R44Br8mMvSdomv9bBTbHMzKwXLNeNfbYDDgLukzQzl/0XcDJwoaR
DgSeAffJjVwH/BswBXgU+DRARCySdCEzL+50QEQvy/c8BZwIrAVfnm5mZ9ZIuk0NE3EbrfgGAsS32D+CITmJNAia1KJ8ObNpVXczMbNnwDGkzMytwcjAzswInBzMzK3ByMDOzgu6MVjIzG/BGHPv7Qtnck3frhZr0DU4OZmadOX61FmUvLvt69AI3K5mZWYHPHMzM+pFl1fzl5GBm1t+1ofnLzUpmZlbg5GBmZgVODmZmVuDkYGZmBU4OZmZW4ORgZmYFTg5mZlbg5GBmZgVODmZmVuDkYGZmBU4OZmZW4ORgZmYFTg5mZlbg5GBmZgVdJgdJkyQ9K+n+prI1JV0n6ZH8c41cLkkTJc2RNEvSlk3PGZ/3f0TS+Kby90u6Lz9noiTV/SbNzKxnunPmcCawS4eyY4HrI2IkcH3eBtgVGJlvE4BfQEomwHHAB4CtgeMaCSXvM6HpeR1fy8zMlrEuk0NE3AIs6FC8JzA5358M7NVUflYkdwCrS3oH8BHguohYEBEvANcBu+THVo2IqRERwFlNsczMrJeU7XNYNyKeBsg/18nl6wFPNu03L5ctrXxei3IzM+tFdV8mtFV/QZQobx1cmkBqgmKDDTYoU78+oeU1YIccUNyx4mX+zMzKKnvm8ExuEiL/fDaXzwPWb9pvOPBUF+XDW5S3FBGnRcSYiBgzbNiwklU3M7OulD1zuAIYD5ycf/62qfzzks4ndT6/GBFPS7oG+HZTJ/Q44GsRsUDSS5K2Ae4EDgZ+UrJOnRyRl41mZvbW1WVykHQe8GFgbUnzSKOOTgYulHQo8ASwT979KuDfgDnAq8CnAXISOBGYlvc7ISIandyfI42IWgm4Ot/MzKwXdZkcImL/Th4a22LfAI7oJM4kYFKL8unApl3Vw8zMlh3PkDYzs4K6RyuZdapVnxB4pJZZX+TkYNZBp0ns5N2WcU3Meo+Tg1l3Hb9aizKf4djA5D4HMzMrcHIwM7MCJwczMytwcjAzswInBzMzK/BopS54BVUzeyvymYOZmRU4OZiZWYGTg5mZFbjPwWwZ6HbfFbj/yvoEJwfr17yYn1l7ODmYWb/Tr0cR9pM1upwczGwRN39ZgzukzcyswGcO1ql+fepuZpX4zMHMzAp85mBmbeUz0P7JZw5mZlbg5GBmZgV9JjlI2kXSw5LmSDq2t+tjZvZW1ieSg6TBwM+AXYFRwP6SRvVurczM3rr6Sof01sCciHgUQNL5wJ7Ag71aKzOzClp3xvdCRUroE2cOwHrAk03b83KZmZn1AkVEb9cBSfsAH4mIw/L2QcDWEfGFDvtNACbkzfcCD3cj/NrAczVW1zEds6/Gc0zH7Mo/R8Sw7gTsK81K84D1m7aHA0913CkiTgNO60lgSdMjYky16jmmY9Yfsz/U0THfujH7SrPSNGCkpA0lrQDsB1zRy3UyM3vL6hNnDhGxUNLngWuAwcCkiHigl6tlZvaW1SeSA0BEXAVc1YbQPWqGckzHXIYx+0MdHfMtGrNPdEibmVnf0lf6HMzMrA9xcjAzswInB+v3JA2SdH9v18NsIOkzHdJ1krQGMBJYNFE9Im7pvRoVSRLwSeCdEXGCpA2At0fEXb1ctSVIGgZ8lbTmVfPvc6cKMQcDkyPiwOo1hIh4U9K9kjaIiCfqiNlM0jos+d5Lv4ak7YCZEfGKpAOBLYEfR8TjFeu4GfChvHlrRNxbJV6O2R8+R5cAk4CrI+LN3q5PZyRtGhG1HsC0++8z4DqkJR0GHEWaSDcT2AaYWuXLLMcdAhwKbMKSf4xDSsb7BfAmsFNEbJz/0NdGxFYV6tiOL/JrgQuAo4HPAuOB+RHx1bIxc9xrgI9GxOtV4jTFuwHYCrgLeKVRHhF7VIi5B/BD4J+AZ4F/BmZHxCYVYs4CNgNGA2cDpwMfj4gdKsQ8CjgcuDQXfQw4LSJ+UiFmuz5Hu1H8DJ1QId6/Ap/O9bsIODMiHqpYxxuBwhdjxc/RbcAKwJnAuRHxl9IVpH1
/nyVExIC6AfeR/vFm5u2NgAtqiHsRcCLwJ9IX5LWkI76y8e7OP+9pKru3Yh2vJSWw2cAOpCOq71aMOSP/nNVUdnMNv89fkiY/fgP4cuNWId4OrW4V63gvsFbjbwTsSPrSrRKz8Xf/H+DQ5rIKMWcBKzdtr9z89yoZs/bPEXAqcBZpHbXj8mucXvV/KcdejXTw8iRwOylhLF8y1vubbtsBPwK+V0MdRwLfAeYA5wI796W/T8fbQOxzeC0iXgOQtGKko4j31hD33RHxDeCViJgM7Aa8r0K8f+TmlYBFR/1VT4vXiojTgX9ExM2Rzmq2qRjzH/nn05J2k7QF6WilqqeAK0n9XkObbqVExM3AQ01xZueyKv4REc8DgyQNiogbgc0rxnxJ0teAA4Hf5/+B5SvGFPBG0/YbuayKdnyOto2Ig4EXIuKbwL+w5LI5pUhaC/gUcBhwD/BjUnPddWXiRcSMptuUiPgy8IGq9YyIR4D/Jp3d7wBMlPSQpI+XCNeu77lFBmKfwzxJqwOXA9dJeoEW6zSV0PiS/IukTYE/AyMqxJsIXAasI+kkYG/SP04VS3yRk9531S/yb0laDfhP4CfAqsCXKsYkfzkgaWjajJerxJP078D3gZtIX4w/kXRMRFxcIexfJK0C3AKcI+lZYGGVegL7AgeQzhr+nPuavl8x5hnAnZIuy9t7kZqrqmjH5+hv+eerkv4JeB7YsEpASZeSjprPJjVTPp0fukDS9JIx12zaHEQ6g3h7xXqOJp3N7EZKWh+NiLvz72Eqi5sEu6td33OL1Xka0tdupOy8B7BCDbEOA9bIMR8ltUF/tmLMjYAjgM8DG9dQx91Jp9ebAjcCM4A9evvv0EldNyUd5T2ebzOATSrEuxdYp2l7GNWb6VYmLeeyHKkp8UjS2Vmv//5a1HXLXL+jgC1qjl3L54jUhLg68AnSwdXTwIkVY+7Uht/lY/kz/hjwCKm59oMVY94CHASs1OKxg/rC36fjbcB1SMOi0TDr0nRmFG0YxVJGh6OSgohYsKzq0h2SJgNHRe5Ayx3nP4ySHfFNcW8Hvh6pqQZJHwa+HRHblox3X0S8r2l7ECk5VGn6q52kbUhnYBuTOigHAy9HxGoVYz4QES/l7aHAqIi4s0LMDVqV1/U5krQiMCQiXqwh1qYUB2GcVTLWIOBfImJK1Xq1w7L8/hhwzUqSvkDq7HqGxW34QRodUibegRHxG0lfbvV4RPyohyFn5Po0twk3tgN4Z4k6fiUivifpJ7QeZXFkT2M2GR1NIysi4oXc71DVyo3EkOPeJGnlCvH+kEdAnZe396XkWl2SXqLF75H8N4qIVctVEYCfklYdvggYAxxM6qis4hekM4eGV1qU9dTvWfx/OYTU/PMwaaRRKfmgbTdSc+xyuazMZ6g55nHAh0nJ4SrSpYZvI3V891ikYdE/IPWH1EZSozO6YxLr6ee99u+Pzgy45EA6rX5vpI7EOjS+sEp3ljaLiEptrJ2YnX+WamPtwiBJa0TEC7DoyKWO/5tHJX2D1FYMqYP2sbLBIuIYSZ8gjS4RaVTRZV08rbNYtfytlxJ/jqTBEfEGcEY+i6pC0dQEkL/gKv2NOp5xSdoS+EyVmMDvgNdII23qmpOwN2lo8D0R8WlJ6wK/rhjz2vy/dGnz77WiM0gHraeQRr19mhKDBtr0/dHSQEwOTwKVT1UbIuKX+e7PI2J+1XiSNoqIh/KHrdXr3d3TmBHxu/xzcn6NVdNmamao6IfA7ZIaHbv7ACfVEPcQ4JukjjiR2mQ/XSVgRFwCXFK9akuqcxIcqTN2BWCmpO+R2t2rnDFBSrRHks4WAP6D1GZem0idp6Xn4GTDI6LUGfxS/C0nw4X5//5Zqh89f5n0N3lD0t+o54xxpYi4XpIiTXg8XtKtpITRY02TaDeMiBPbMYl2ICaHR4GbJP0e+HujsMqpa3a7pMdIE8IubRxJl/Bl0qVOf9jisQCqTLQZQzpCGZo29RfgkIiYUTZ
mRJyVR33sRPqQfDwiHiwbrynuC6QO1EqW0gTUeJ3SH+jOJsFRoWmF1Ck5iDQI4UukoZyfqBAP0vj+iaTRbgFcz+LL6ZbSoRl1EKmJqurB0dWSxkXEtRXjNJueR+38itTk8jJpImRpbTpzfC33ZzyidO2a/wPWqRDv5+RJtKT5Vy+RDoyqJvBFBlyHdG6DLIg8dLJi7K1J7cV7AQ8C50fEb6rGrUuefXtERNyatz9IOuPp8dGapFUj4q+ddYCV7fiS9DuW/mVeakazpBNII2DOJiWxTwJDI+J7ZeLlmPeSPnx/jIgtJO0I7B8Rpb54VfOyIe3U4XO0EJgLXBJ5bH3JmB8DfkNKNv+gniPy5vgjgFUjYlbFOB2PytcH3lHlqDyfdc0mjdY6kTSq8HsRcUfJeHdHxJaS7omILXLZvRGxWdk6Fl5joCWHZUHS2qRZk5+MiMEV4mxLU+cclB9lkeNNiYjtuirrZqwrI2L3fLbU/E/S+ECXOnWX1Fgm4uOkseON5Lo/MDci/qtk3Dsj4gNdlfUw5vSIGJOTxBa5+eKuiNi6Qszalg1p80CE2kl6lHRgdV9dbfmStm9VHhXWGFIblrapm6Q7gW2BaTlJDCPVsY7BIsAAalaS9L8R8cXOjkzLHpE2xV+VtGbNfsC7SBPYqnxJnJ3jzGTx7NagxCiLpv6LuyT9kjRiJ0gjdm4qU7+I2D3/rLUDLPKsZUknRkTzB/t3kqosGvaGpE8C55Pe+/4sOWu4jHZMgpsLTJF0BUuuAVWm2bNtAxFy/Tp6Mb/WL0ueQTwC3F9jJy/AMU33h5A+kzOo0DwLfKBxVA6LRuitUCFeO9Zrasck2iUMmOTA4lEvP2hT/HtJsxFPiIipNcQbQxqLXscHpWP/RXOTQK2nhpLeCxwdEYdXDDVM0jsj4tEcd0PSxLWyDiAtm/Bj0nueksuq2JM0uuZLpGaG1YDSi8RlT+VbY9mQ0hoDEUhr6izxZZ3Pbqt4jPT3aB4a/AzwHlL7/kElYj5N6g+8mpr6AyPio83buQmodFNi1o6lbY7+qBdwAAAV4UlEQVRuuj+E1M9U+kAjIs6RNAMYSzqb3ysiZnfxtB4ZMMmhqdN184j4cfNjSqtWVl1n5501H/HcT2pWebqrHbsSETtWr86SlKb7/4DUGXs5aeLWz0lrzLTqTO+pL5G+KBqjakZQYahkRMwlfZnXJiJegUVnjb/rYvfuxqzc99XCXZImNNqv8zDM75C+yMvaotWZXURsL+mBkjEfy7cV8q0d5pFm31dR+1F5i0EhUySV/k7Ko5Nepen/UjUvWT/g+hwaHTUdyhZ12pSIV2tzVVOcoaRF3O5iyaOoKktMr06aVDWCJfsxetz2nNs0f0Fa92UX4CuklSS/UaVTssNrrEhaQgTgoYj4+9L27yJWrUuq55ifIZ0p/I105FipvyXHbMdy0O8jrcB7EymZrwUcFhHzKsScDXyk8WWTv4z+EBGjqnye6tahv2UQ6TM1t2qnv6SNWHxUfn3Vo3K1Xq9pYkSUWixP0n20mKQYFZaT72jAnDlI2p/UjLBhh/bSoaQFvsqqu7nqCtLSHrd2KN+BNLytiquAO6hnktGKEXFmvv+wpKOBYyNN3KrLSNJKkkOAzZRmy5btkD+btCrrR0hf6J9kcZt8WUeT1nt6rmKcjjEbKjcvAETEffkI92zSkMbtqySG7D+B2yT9ifQFtCHwH0qz2CeXCZibZ75CMYFX6R9o7m9ZCJwXJZe+yAcYnwXeTfoM/TIiqvYxNTTPbF5IOoM6tGywaM8kxSUMmDMHSf9M+gf+DnBs00Mvkda2r+uPXImkK4H/6jjcLs9ROK5jG2oPYxfOmirEeojUqduYxXkOKfkKyk3W6xC/5bIHEbF3yXj35OGmsyJitKTlgWsqHpH/gTSv49WyMbr5OjdHtYv9nE4a3PBpUlPS/wI/jYifVaxX48xOpDO7SmeMatOFo+oi6QLSENtbSf+PcyPii71bq+6
r8/MPA+jMIdKsw8epf02Uxulb4aH0sj2eQzCi1TjsiJiex2lXcbakw0nXSWhuqiozJ+Fp0nDdhj83bVearJfVvexB3UuqA3yNNPnxTpb8fZYeItpJ80Kl5aBJ/VeH5T6xx5QW4qs66ROWPLMbXfHMDvL1RiQdlUet3Vyl3R06/Xw2RlV9K3q2jM6oxhF5Tri1zTZW62s2vEga1vtsiXjtmKS4hAGTHBryH+G7pNmHovpEm93rqls2ZCmPrVQx9uukawN8ncUfmFKLcbWjk7uDupc9OC2PR/8GqelulXy/il8CN1DvWkC1Ni8ARMQpklbKHZIPR1rptFLMzs7sKLmgXdaO641cTRqyfG7e3i///Cvpkpw9ORNv1I+IWChVvV7SEg4lHbg2Fpv8MKkJ+D2SToiIszt7YieaR7otJC2UWO/SMVHzWui9fSNdgq/ytRE6ib0uKVnsTtO1A3oY4zzg8Bblh1L9Mox/Atbu7b9BN+v6c9Js0c+Sxr/fA5xRId7gNtTx9t7+PXWznh8lrZj6WN7eHLiiYsz7SEek9+btdYHfVYxZ+/VGgCmdlZGOynsS6w1SUvkrqTl6YdP9v1as5++AdZu21yWtK7Ymae5HT+N9qOP/PLBlnf9XA+7MAXgmah7vC6D6rjT2ReCyPGGrMbxtDGlo38cqVvMB0vC2Pi8i/iPfPTW37Vdd9uCxHOcC4IbIn5aKbpQ0gfTBrtRM10mzwiIR0dMrgTU7njT566Yca2aeN1JF7QvaRcSV+e6LpJVJ67CKpA9EvnaF0hI3q+THetTPGBVWO+iGERHxTNP2s8B7ImKBpH909qSluAaYJunfm+L+mmrLtC9hICaH6blj6XKW/EBX+fBBaqrZKnL7YB558UegR8kh/yG3VVqnpzEe+/cRcUPF+kE68pmZh0vW0kbeLmqx7IGk7aP8sgfvJR1BHwFMykOGz4+I2ypUszGJ7mtNZWXXzG80b6xDWvag8ffekfSlXuX/c2FEvNihGaRqcqx9Qbv8mTmc4lDrKheOOoz0916FdND2V+CwPKrqOxXi1u3WPBjlorz9CeCWXM+/dP60Tj1MPliVdGhE3A49XwJ8aQbMaKUGSWe0KI6K/4CoH1xpTNL4VuWRl/KuEHcNUudk8/DDKktdNOZ7NCxa9iCqDWtsxF6DNFO60tpX7ZC/IA6PfK1jSe8AfhYRZS4y34h5Omkl1mNJXzpHAstHxGdLxhNpee0n8/YI6lnQ7nbSSKAZNC1tEmmp9UqUrnOuaLowVV+Sf6fN1xu5jbSQYakvYC1eeG8k6Wx5EmkF5trOHAZccmgXSd8nXU2ueTmBWdFHhuE1KK0B05gZ+3BElDllbY53GOkCSsNJ60BtA0yt40u8w+usT1qlcv8KMXYg/V12BaaR+nBKf/FIehtpifUNImJC/iC+t6l5pEzM+yNi06btQaT/o9KzenM9vw6MI33xXEO6NnOVFVRnRMT7yz6/k5gzI2LzOmPmuLtRnDtRdZmTPk1Lrsb6NlLn+8cjorbWoAGXHCS9hzSzd92I2FRpGYg9IuJbNcT+OPBB0gfwlih5pbF2UboO82TS4m4iXStgfJWj/DxUcCvgjojYPM8c/WZE7Fu9xku8jkhfkqXOxJRWj50JXEjqjH2li6d0J+YFpKPcg/P/0kqkxFj6C07ST0lnYY3FEfcD5kTEF6rWt06SfgacGRHTaoz5LVInf6nLt3YS81TgbaTmuV+ThkjfFRGVRmvVTUted2QFYHnglahpufL8Gl4+Y2nyuOljSLMbG5n1/ipHZi1eY23g+Zo6PWujtBDXARHxcN5+D2nGaOkjQEnTImIrSTNJq1X+vY4jQNW87IHy9Seq1KlFzMaS3bWumZ8PMj6UN0sfZKj1yqmLRLWlWB4knYE+Tlo9tuy8nuaYL5GusPZ3arqegxZPemz8XIV0Ma5xZWMuC5L2AraO8kvUt+0guGEgdki/LSLu6tA5V3p2dJ5QdDKwgHSRjrOBtUn
XVj44Iv5QpbI1W76RGAAi4v8pzRSuYl7umLwcuE7SC6Tx6VXVsuxBc5JpNS69Ymf86/lsoRH/XTR19JeVB0dUHSABadz8k6SzkDupt0Ny1xpjAe27wlr++aqkfyItlbPMrrNcVkRcLunYrvfs1K/IB8E53ixJ5wJODkvxXP4QNz7Qe1Nt5dOfAv9FGp99A7BrRNyRm1fOA/pScpieOycbE2qah8uWEhGN4bXH51FQq5EmHlV1MfBa5LWaJA2W9Lbo+VIVtV/LoMnxpL/v+pLOIXUmVrrOteqdpPl2YGfSMicHkCZCnRcRZVdNXSTSigOow/Wzq2rD4Ibf5YOX7wN3kz73v6pUyTboMJR5EGn4epWWh1oPgluKGidN9IUbaZjhH0nj/f+PNCpgRIV4M5vuz+7w2D29/X471GdFUgfqpaQlh79EWkCvSsyzu1NWIu4dwCpN26vQByedkVY43Y00gavyBEPaNEkz/+0/RVpC4Qs1xNuDNDnxFdIs7jeBByrGPIw0ue4F0iS4v5HmpJSNNwjYtsPvYLXe/p/ppK5nNN1+RRpAUGoibY53NWk9rbvz9t7A1bXWubd/aW38Y6xMuoZw1Th3t7rfansg3lq858HAgzXEndmdsh7EG0ZaOfcq0hneDVW+eHLM67tT1sOYhRm9FeOtSLrk6kWkEVrfANarIe69OTHek7d3BE6rGPM+0hnDzLy9EdVXBZha5++zHbf8mflSzTFrPQhudRtwzUpackGqRjv0i6Qx9DNLhNxM0l9Jp/8r5fvk7dpOt6tQ54sDAhAlOhElfY3UnNbxPb8OnFamnh28ImnLyKu7Sno/6UiyrHNI4713o2nFzzKBlJZufhuwdm4GaZy7r0q6XkIVtU3SlDSZNJHyatIIsvsr1q3ZPyLieUmDJA2KiBslfbdizNci4jVJSFoxIh5SurJgFdcqXdzo0sjfmn1NRLwhaQ/glBpjPgr8a55ENygiXqordsNAHK10Lqk9rzHJajfSEdVGwEURUfUSgn2O0nLlnYrcflwy9nci4mtd79njuFuRrvfc6Nx+B7BfRJTqQ2iMy2+MWsllpZbCVrpy4BdJieD/WJwc/gr8KiJ+WqaOOXZtkzQlvcni61A3f5DrGAX0R2Av0izjtUnLPWwVEdtWiHkZqc/mi6RVfV8gDaL4twoxGyOgFpI6pyu/93ZQut7GaqQDmOZrh5da+l5pOfVPUJxtXtv8joGYHK4BPhERL+ftVUidnx8jnT2M6s36tZvS0tdb5c27osRywB3ibUdqBnhF0oGktVt+XCXhNMVenrTsReN6AaUn7Em6IyK2yX//iaSkc3FEvKtCzC9ExE/KPr8/y0ekfyO16zeun31O9GwJ7KXF3yHH/ENEvF5HzL4sD+aAxUm8kcRKTSZVWkfsRYqzzeu4hG96jQGYHGYDmzX+4XKGnRkRG6sPXd6wHVRcHPBDQJnFAZtjziJdd2E0aRTU6aSZmKUuTiPpK42zN0n7RMRFTY99O8qP+96dtDTD+qTrXa9KampZ6lyAbsTdlLRsdfPomtLLVksanuu3HemL4jbgqKh+5bZaKS3c93TkWdZ5SO+6ka7V3dNYHa+wdnrUePGtNoyAqk1TM3fj7DNIzZ23RcRjFeLWOnerlUHtDN5LzgXukHSc0pr0U4Dz8pHQg71btbZrLA44PiIOJq1XVPWaBgtzW+6epDOGH7PkWvI9tV/T/Y7NVbuUDRoRV0bEixFxf0TsGBHvryExHEf6Iv8JqUP2e6RRPFWcQbrexD8B65GaP1s1NfW2i1jyGhZvsHjRuJ6aTGrqvY80f6K+o9u0vMstpCVDvpl/Hl9X/BoMzbdV8m0o6XdxtaT9lvbELtyudO3w9qmzd7uv3EhX12q0G4/p7fosw/d9X4ftQR3LSsS8mfQl/ghpXP3gKjFpGv5Lh6HAHbe7Ge9wYGS+L9IX7YvALGCLqr9P6r+mQa2jtNr4v9SqnveW/T023V+OGkf50YYRUMv
o97tmld8D6UD3ddLqrLPy72FWnXUcUKOVtOQiZpUmf/VTf8ht7s2LA1Zdx2Zf0gSrQyLiz5I2IDVdlRWd3G+13R1HkRYdgzQZbDRpmN8WpL6HD7V+WrfUfk0D0iTNA1n8N9qfNKu3r5kvaY/IZ1+S9gSeKxmrnVdYa8cIqLaLdB2HKr+I2mewdzSgkkP+IN+rmheg6uskvZvUHnyMllwccCppiGdpOSGcA2yV2/XvimrXEa57aPDCWNyRvTtwVqRO0z9KqjoyrfZrGgCHkGbdn0JKhrfnsr7ms8A5SgsFirRMx8ElY23W4e+8UtP/QES1kUXtWt6lrSQ1RmuVEhGPS/og6az5DKVrZazS1fN6YiB2SN9AGq1zF03D/CJiz96rVXspXSPgv6LDevuSxgDHRURPrqPbMXbtndx1knQ3abjyC6RF4naKvHyEpNkRsXFNrzOCGq5p0N/k0X6KNoyjr1tfHAHVyRykNUkJ7OCIeKhk3ONIfRfvjYj3KK0rdVFEbFepwk0G1JlD9s2m+yIdRZe+RkA/MaLVl1ZETM9falXUcgW8Nvof0vpKg0lLdTcSww7Ao1UCq8ar1WnJVWgLoo9crU/SgRHxm04mkxIRP+qVinXQyQiom3u3Vi3t3mE7SCs6V11S/mOkptO7ASLiKUm1Lmw44JJDRNwsaXNSO/m/k9aFObV3a9V2S2uOWali7EGx5FyJ5+lDo9wi4so8CXBoRDSfpk8n9ZdUcUzT/UVXqyNN4Oqp5sl93wSOq1Cvdlo5/2zHCqp1mkzqy7iV1P4+itT/1KdEDfOBOvF6RISkxgKjK3f1hJ4aMM1KSuub78fiDr4LgKMjYqmzhwcCSeeR1hH6VYfyQ4FxUeHCPGp9Bbz7IuIrZWP2V6rhanU5zoCeb7MsqOmyvZKWI/WF1XaJzL5O0tGkuR07k2axHwKcGzVO2hxIyeFN0lHEoRExJ5c9GhFVR5f0eXlW9GWkoW2NUVpjSFec+lhE/Lli/D59BbxlJY8uKX21uqY4d/fVLzJJ/7OUhyMiTlxmlVmKjr/Dvvw7rVMeIHBuRNwuaWeaLg0bEdfV+VoDqVnpE6Qzhxvz1PLzodaLn/RZEfEMsK2kHUkLsQH8PiJuKBuzaQTUlGi6OI2k7SW9KyL+VLnifZyKV6vbgrRa6UDWqi18ZeBQ0iqtfSI50N4RUH3ZI8APJb2D1DpyTpRbULRLA+bMoSG3ve1Fal7aidQ2eVlEXNurFetn2jkCqk6Slnq0GCUXNsuxxzfCkBZ2mxsRt5eM1XwN4beRllqGPvxlljs4jyIlhguBH0bFtbqsHrmfbb98G0Jq9j0/Iv5fba8x0JJDM0lrAvsA+0bJBa7eqpa2dktze29va1rQrJUo83fPE76GR8TP8vZdpOtFBPCVvjKMt13y5+bLpAX3JpOWTSk9Jt/aS9IWwCRgdEQMri3uQE4OVp6kORHx7p4+NhBImkJaPvzJvD2TdBa6CnBGRIztzfq1Ux6A8HHSNTt+Fnl1Y+tblFY03oV05jCWtMzNeRFxeV2v0WeGJFqfM03S4R0L8wioPrM0iaSvNN3fp8Nj3y4ZdoVGYshui4gFedZ97UMG+5j/JC0K+N/AU5L+mm8vNbXxWy+RtLOkScA8YAJpeZx3RcS+dSYG8JmDdaLdI6Dq0jxKpa4RLF2cNf0pKlwjwqyK3Ix6LnBJRCxo52sNpNFKVqN2jIBqE3Vyv9V2d90p6fAW80Y+Q/W1lcxKi4gdl9VrOTnYUkXEjcDSOn17W92rvAJ8Cbhc0gHk5QlIy8CvSBoJZzbguVnJ+jVJb5DG5ou0VEjzENEhEbF8hdg7AZvkzQf64FmTWds4OZiZWYFHK5mZWYGTg5mZFTg5mJlZgZOD2TIg6YuS3tbb9TDrLndImy0DkuYCYyLiuRaPDY6IN5Z9rcw65zMHs0zSwZJmSbpX0tmS/lnS9bnsekkb5P3OlLR30/Nezj8/LOkmSRdLekjSOUqOJC1JcWN
joUBJL0s6QdKdwH9Luqwp3s6SLl2mb96sA0+CMwMkbUK6XvZ2EfFcXpl0MnBWREyWdAgwka4nwW1BmhvxFDAlx5uYr8m8Y9OZw8rA/RHxP/kiQrMlDYuI+cCngTNqf5NmPeAzB7NkJ+Dixpd3XrfmX0jr2ACcTboaXlfuioh5EfEmMBMY0cl+bwCX5NeKHP9ASavn17265Pswq4XPHMwS0fVyG43HF5IPrPJR/wpN+/y96f4bdP4Ze61DP8MZwO+A14CLImJhN+tt1hY+czBLrgf+XdJasOiCN7eT1suHdOGb2/L9uaS1lgD2BLqzRMdLwNDOHoyIp0hNUf8NnNmzqpvVz2cOZkBEPCDpJODmvF7TPcCRwCRJxwCNvgCAXwG/zVeIu57W113u6DTgaklPL2VlzXOAYRHxYJX3YlYHD2U16yMk/RS4JyJO7+26mDk5mPUBkmaQzkB2joi/d7W/Wbs5OZiZWYE7pM3MrMDJwczMCpwczMyswMnBzMwKnBzMzKzAycHMzAr+P7z9xgjlNa7UAAAAAElFTkSuQmCC\n",
1491 | "text/plain": [
1492 | ""
1493 | ]
1494 | },
1495 | "metadata": {},
1496 | "output_type": "display_data"
1497 | }
1498 | ],
1499 | "source": [
1500 | "(pd.concat([c_country,e_country],axis = 1)).plot(kind='bar')"
1501 | ]
1502 | },
1503 | {
1504 | "cell_type": "markdown",
1505 | "metadata": {},
1506 | "source": [
1507 | "The sample is biased: the split between the two groups is not even across countries. For example, in Argentina and Uruguay the experiment group is much larger than the control group."
1508 | ]
1509 | },
1510 | {
1511 | "cell_type": "markdown",
1512 | "metadata": {},
1513 | "source": [
1514 | "#### We should compare the two groups within each segment (country)"
1515 | ]
1516 | },
1517 | {
1518 | "cell_type": "code",
1519 | "execution_count": 327,
1520 | "metadata": {},
1521 | "outputs": [],
1522 | "source": [
1523 | "# get the conversion rate for each country in the control group\n",
1524 | "c_cr = pd.Series(controll.groupby('country').conversion.mean(),name = 'controll conversion rate')"
1525 | ]
1526 | },
1527 | {
1528 | "cell_type": "code",
1529 | "execution_count": 328,
1530 | "metadata": {},
1531 | "outputs": [],
1532 | "source": [
1533 | "# get the conversion rate for each country in experiment group\n",
1534 | "e_cr = pd.Series(exp.groupby('country').conversion.mean(), name = 'exp conversion rate')"
1535 | ]
1536 | },
1537 | {
1538 | "cell_type": "code",
1539 | "execution_count": 329,
1540 | "metadata": {},
1541 | "outputs": [
1542 | {
1543 | "data": {
1544 | "text/plain": [
1545 | "country\n",
1546 | "Argentina 0.015071\n",
1547 | "Bolivia 0.049369\n",
1548 | "Chile 0.048107\n",
1549 | "Colombia 0.052089\n",
1550 | "Costa Rica 0.052256\n",
1551 | "Ecuador 0.049154\n",
1552 | "El Salvador 0.053554\n",
1553 | "Guatemala 0.050643\n",
1554 | "Honduras 0.050906\n",
1555 | "Mexico 0.049495\n",
1556 | "Nicaragua 0.052647\n",
1557 | "Panama 0.046796\n",
1558 | "Paraguay 0.048493\n",
1559 | "Peru 0.049914\n",
1560 | "Uruguay 0.012048\n",
1561 | "Venezuela 0.050344\n",
1562 | "Name: controll conversion rate, dtype: float64"
1563 | ]
1564 | },
1565 | "execution_count": 329,
1566 | "metadata": {},
1567 | "output_type": "execute_result"
1568 | }
1569 | ],
1570 | "source": [
1571 | "c_cr"
1572 | ]
1573 | },
1574 | {
1575 | "cell_type": "markdown",
1576 | "metadata": {},
1577 | "source": [
1578 | "Get the t statistic and p value for each country"
1579 | ]
1580 | },
1581 | {
1582 | "cell_type": "code",
1583 | "execution_count": 330,
1584 | "metadata": {},
1585 | "outputs": [],
1586 | "source": [
1587 | "country_list =list(controll.country.unique())"
1588 | ]
1589 | },
1590 | {
1591 | "cell_type": "code",
1592 | "execution_count": 331,
1593 | "metadata": {},
1594 | "outputs": [
1595 | {
1596 | "data": {
1597 | "text/plain": [
1598 | "['Mexico',\n",
1599 | " 'Colombia',\n",
1600 | " 'El Salvador',\n",
1601 | " 'Nicaragua',\n",
1602 | " 'Peru',\n",
1603 | " 'Chile',\n",
1604 | " 'Argentina',\n",
1605 | " 'Ecuador',\n",
1606 | " 'Venezuela',\n",
1607 | " 'Guatemala',\n",
1608 | " 'Honduras',\n",
1609 | " 'Panama',\n",
1610 | " 'Paraguay',\n",
1611 | " 'Costa Rica',\n",
1612 | " 'Bolivia',\n",
1613 | " 'Uruguay']"
1614 | ]
1615 | },
1616 | "execution_count": 331,
1617 | "metadata": {},
1618 | "output_type": "execute_result"
1619 | }
1620 | ],
1621 | "source": [
1622 | "country_list"
1623 | ]
1624 | },
1625 | {
1626 | "cell_type": "code",
1627 | "execution_count": 332,
1628 | "metadata": {},
1629 | "outputs": [],
1630 | "source": [
1631 | "# Welch's t-test (equal_var=False) per country; collect [t, p] pairs\n",
1632 | "lin = []\n",
1633 | "for c in country_list:\n",
1634 | "    t, p = stats.ttest_ind(a=controll[controll['country']==c].conversion,\n",
1635 | "                           b=exp[exp['country']==c].conversion,\n",
1636 | "                           equal_var=False)\n",
1637 | "    lin.append([t, p])"
1638 | ]
1639 | },
1640 | {
1641 | "cell_type": "code",
1642 | "execution_count": 333,
1643 | "metadata": {},
1644 | "outputs": [
1645 | {
1646 | "data": {
1647 | "text/plain": [
1648 | "[[-1.3866735952325449, 0.16554372211039645],\n",
1649 | " [0.79999178223708245, 0.42371907413141141],\n",
1650 | " [1.1549940887832975, 0.2481266743266678],\n",
1651 | " [-0.27880850314757355, 0.78040038589047944],\n",
1652 | " [-0.28982358545511927, 0.77195298851535477],\n",
1653 | " [-1.0303728644383661, 0.30284764308444695],\n",
1654 | " [0.9638326839451179, 0.33514654687468659],\n",
1655 | " [0.048257426198918048, 0.96151169060066222],\n",
1656 | " [0.56261424690935702, 0.57370152343872549],\n",
1657 | " [0.56496315146205101, 0.57210720819120686],\n",
1658 | " [0.72013284328217941, 0.47146285652575859],\n",
1659 | " [-0.378167043801935, 0.70532683727258894],\n",
1660 | " [-0.14628996329799995, 0.88369650349623641],\n",
1661 | " [-0.40176067651471453, 0.68787635370739864],\n",
1662 | " [0.35995817724402418, 0.71888524684510746],\n",
1663 | " [-0.15134316107212104, 0.87976397365142245]]"
1664 | ]
1665 | },
1666 | "execution_count": 333,
1667 | "metadata": {},
1668 | "output_type": "execute_result"
1669 | }
1670 | ],
1671 | "source": [
1672 | "lin"
1673 | ]
1674 | },
1675 | {
1676 | "cell_type": "code",
1677 | "execution_count": 334,
1678 | "metadata": {},
1679 | "outputs": [],
1680 | "source": [
1681 | "stats = pd.DataFrame(lin, columns=['t', 'p'], index = country_list)"
1682 | ]
1683 | },
1684 | {
1685 | "cell_type": "code",
1686 | "execution_count": 335,
1687 | "metadata": {},
1688 | "outputs": [
1689 | {
1690 | "data": {
1799 | "text/plain": [
1800 | " t p\n",
1801 | "Mexico -1.386674 0.165544\n",
1802 | "Colombia 0.799992 0.423719\n",
1803 | "El Salvador 1.154994 0.248127\n",
1804 | "Nicaragua -0.278809 0.780400\n",
1805 | "Peru -0.289824 0.771953\n",
1806 | "Chile -1.030373 0.302848\n",
1807 | "Argentina 0.963833 0.335147\n",
1808 | "Ecuador 0.048257 0.961512\n",
1809 | "Venezuela 0.562614 0.573702\n",
1810 | "Guatemala 0.564963 0.572107\n",
1811 | "Honduras 0.720133 0.471463\n",
1812 | "Panama -0.378167 0.705327\n",
1813 | "Paraguay -0.146290 0.883697\n",
1814 | "Costa Rica -0.401761 0.687876\n",
1815 | "Bolivia 0.359958 0.718885\n",
1816 | "Uruguay -0.151343 0.879764"
1817 | ]
1818 | },
1819 | "execution_count": 335,
1820 | "metadata": {},
1821 | "output_type": "execute_result"
1822 | }
1823 | ],
1824 | "source": [
1825 | "stats"
1826 | ]
1827 | },
1828 | {
1829 | "cell_type": "code",
1830 | "execution_count": 336,
1831 | "metadata": {},
1832 | "outputs": [
1833 | {
1834 | "data": {
1977 | "text/plain": [
1978 | " controll conversion rate exp conversion rate t p\n",
1979 | "Argentina 0.015071 0.013725 0.963833 0.335147\n",
1980 | "Bolivia 0.049369 0.047901 0.359958 0.718885\n",
1981 | "Chile 0.048107 0.051295 -1.030373 0.302848\n",
1982 | "Colombia 0.052089 0.050571 0.799992 0.423719\n",
1983 | "Costa Rica 0.052256 0.054738 -0.401761 0.687876\n",
1984 | "Ecuador 0.049154 0.048988 0.048257 0.961512\n",
1985 | "El Salvador 0.053554 0.047947 1.154994 0.248127\n",
1986 | "Guatemala 0.050643 0.048647 0.564963 0.572107\n",
1987 | "Honduras 0.050906 0.047540 0.720133 0.471463\n",
1988 | "Mexico 0.049495 0.051186 -1.386674 0.165544\n",
1989 | "Nicaragua 0.052647 0.054177 -0.278809 0.780400\n",
1990 | "Panama 0.046796 0.049370 -0.378167 0.705327\n",
1991 | "Paraguay 0.048493 0.049229 -0.146290 0.883697\n",
1992 | "Peru 0.049914 0.050604 -0.289824 0.771953\n",
1993 | "Uruguay 0.012048 0.012907 -0.151343 0.879764\n",
1994 | "Venezuela 0.050344 0.048978 0.562614 0.573702"
1995 | ]
1996 | },
1997 | "execution_count": 336,
1998 | "metadata": {},
1999 | "output_type": "execute_result"
2000 | }
2001 | ],
2002 | "source": [
2003 | "pd.concat([c_cr,e_cr,stats],axis = 1)"
2004 | ]
2005 | },
2006 | {
2007 | "cell_type": "markdown",
2008 | "metadata": {},
2009 | "source": [
2010 | "### Conclusion:\n",
2011 | "If we look at the A/B test results in each segment, none of the p-values is below alpha = 0.05, so we cannot reject the null hypothesis in any country.\n",
2012 | "Therefore, there is no statistically significant improvement in the conversion rate after the change.\n",
2013 | "There is also no evidence that it became worse after the change."
2014 | ]
2015 | },
2016 | {
2017 | "cell_type": "markdown",
2018 | "metadata": {},
2019 | "source": [
2020 | "#### Some extra\n",
2021 | "Below is a step-by-step calculation of the t-statistic without using the stats.ttest_ind() function"
2022 | ]
2023 | },
2024 | {
2025 | "cell_type": "code",
2026 | "execution_count": 337,
2027 | "metadata": {},
2028 | "outputs": [],
2029 | "source": [
2030 | "# take Mexico as an example\n",
2031 | "controll_m = controll[controll['country']=='Mexico']\n",
2032 | "exp_m = exp[exp['country']=='Mexico']"
2033 | ]
2034 | },
2035 | {
2036 | "cell_type": "markdown",
2037 | "metadata": {},
2038 | "source": [
2039 | "Calculate sample size"
2040 | ]
2041 | },
2042 | {
2043 | "cell_type": "code",
2044 | "execution_count": 338,
2045 | "metadata": {},
2046 | "outputs": [],
2047 | "source": [
2048 | "na = len(controll_m)\n",
2049 | "nb = len(exp_m)"
2050 | ]
2051 | },
2052 | {
2053 | "cell_type": "code",
2054 | "execution_count": 339,
2055 | "metadata": {},
2056 | "outputs": [
2057 | {
2058 | "name": "stdout",
2059 | "output_type": "stream",
2060 | "text": [
2061 | "Sample size of controll group: 64209\n",
2062 | "Sample size of experiment group: 64275\n"
2063 | ]
2064 | }
2065 | ],
2066 | "source": [
2067 | "print ('Sample size of controll group: {}'.format(na))\n",
2068 | "print ('Sample size of experiment group: {}'.format(nb))"
2069 | ]
2070 | },
2071 | {
2072 | "cell_type": "markdown",
2073 | "metadata": {},
2074 | "source": [
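2075 | "Degrees of freedom (pooled formula, na + nb - 2; Welch's test uses the Welch-Satterthwaite approximation instead, but with samples this large the two are nearly identical)"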
2076 | ]
2077 | },
2078 | {
2079 | "cell_type": "code",
2080 | "execution_count": 340,
2081 | "metadata": {},
2082 | "outputs": [
2083 | {
2084 | "name": "stdout",
2085 | "output_type": "stream",
2086 | "text": [
2087 | "128482\n"
2088 | ]
2089 | }
2090 | ],
2091 | "source": [
2092 | "df = na+nb-2\n",
2093 | "print (df)"
2094 | ]
2095 | },
2096 | {
2097 | "cell_type": "markdown",
2098 | "metadata": {},
2099 | "source": [
2100 | "Calculate the conversion rate (sample mean) of the control and experiment groups for Mexico"
2101 | ]
2102 | },
2103 | {
2104 | "cell_type": "code",
2105 | "execution_count": 341,
2106 | "metadata": {},
2107 | "outputs": [],
2108 | "source": [
2109 | "xa = controll_m.conversion.mean()\n",
2110 | "xb = exp_m.conversion.mean()"
2111 | ]
2112 | },
2113 | {
2114 | "cell_type": "code",
2115 | "execution_count": 343,
2116 | "metadata": {},
2117 | "outputs": [
2118 | {
2119 | "name": "stdout",
2120 | "output_type": "stream",
2121 | "text": [
2122 | "Conversion rate of controll group: 0.04949461913438926\n",
2123 | "Conversion rate of experiment group: 0.05118630882924932\n"
2124 | ]
2125 | }
2126 | ],
2127 | "source": [
2128 | "# the experiment group's conversion rate is about 0.17 percentage points higher; is that significant, or just due to chance?\n",
2129 | "print ('Conversion rate of controll group: {}'.format(xa))\n",
2130 | "print ('Conversion rate of experiment group: {}'.format(xb))"
2131 | ]
2132 | },
2133 | {
2134 | "cell_type": "markdown",
2135 | "metadata": {},
2136 | "source": [
2137 | "Calculate standard deviation"
2138 | ]
2139 | },
2140 | {
2141 | "cell_type": "code",
2142 | "execution_count": 344,
2143 | "metadata": {},
2144 | "outputs": [],
2145 | "source": [
2146 | "# tip: in a Jupyter notebook, press Shift+Tab on a function to see its docstring\n",
2147 | "# pandas' std() uses ddof=1 by default (sample std); ddof=0 gives the population std\n",
2148 | "sa = controll_m.conversion.std()\n",
2149 | "sb = exp_m.conversion.std()"
2150 | ]
2151 | },
2152 | {
2153 | "cell_type": "code",
2154 | "execution_count": 345,
2155 | "metadata": {},
2156 | "outputs": [],
2157 | "source": [
2158 | "# squared standard error of the difference in means (its square root is taken when computing t)\n",
2159 | "se = pow(sa,2)/na+pow(sb,2)/nb"
2160 | ]
2161 | },
2162 | {
2163 | "cell_type": "code",
2164 | "execution_count": 346,
2165 | "metadata": {},
2166 | "outputs": [],
2167 | "source": [
2168 | "t = (xb-xa)/np.sqrt(se)"
2169 | ]
2170 | },
2171 | {
2172 | "cell_type": "code",
2173 | "execution_count": 347,
2174 | "metadata": {},
2175 | "outputs": [
2176 | {
2177 | "name": "stdout",
2178 | "output_type": "stream",
2179 | "text": [
2180 | "1.38667359523\n"
2181 | ]
2182 | }
2183 | ],
2184 | "source": [
2185 | "print (t)"
2186 | ]
2187 | },
2188 | {
2189 | "cell_type": "markdown",
2190 | "metadata": {},
2191 | "source": [
2192 | "Look up the t-table for a two-sided test with alpha = 0.05 and df = 128482 to get the critical value: t = 1.96.\n",
2193 | "Since 1.39 is less than the critical value 1.96, we cannot reject the null hypothesis."
2194 | ]
2195 | }
2196 | ],
2197 | "metadata": {
2198 | "kernelspec": {
2199 | "display_name": "Python 3",
2200 | "language": "python",
2201 | "name": "python3"
2202 | },
2203 | "language_info": {
2204 | "codemirror_mode": {
2205 | "name": "ipython",
2206 | "version": 3
2207 | },
2208 | "file_extension": ".py",
2209 | "mimetype": "text/x-python",
2210 | "name": "python",
2211 | "nbconvert_exporter": "python",
2212 | "pygments_lexer": "ipython3",
2213 | "version": "3.5.4"
2214 | }
2215 | },
2216 | "nbformat": 4,
2217 | "nbformat_minor": 2
2218 | }
2219 |
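The manual Mexico calculation near the end of AB_Testing.ipynb can be sketched as a small standalone function. This is a minimal sketch, not the notebook author's code: the function name `welch_t_from_rates` is hypothetical, it works from the printed sample sizes and conversion rates rather than the raw data, and it swaps the t-table lookup for a normal approximation of the two-sided p-value, an assumption that is only reasonable for samples as large as these.

```python
import math


def welch_t_from_rates(n_a, rate_a, n_b, rate_b):
    """Welch's t-test for two groups of 0/1 outcomes, from summary figures.

    Hypothetical helper for illustration. Returns (t, df, p), where p is a
    two-sided p-value from the normal approximation (assumes large df).
    """
    # Sample variance (ddof=1) of a 0/1 variable with mean r over n rows.
    var_a = rate_a * (1 - rate_a) * n_a / (n_a - 1)
    var_b = rate_b * (1 - rate_b) * n_b / (n_b - 1)

    # Per-group contributions to the squared standard error of the difference.
    se2_a = var_a / n_a
    se2_b = var_b / n_b
    se = math.sqrt(se2_a + se2_b)

    # Experiment minus control, matching the notebook's manual calculation.
    t = (rate_b - rate_a) / se

    # Welch-Satterthwaite degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )

    # Two-sided p-value under the standard normal approximation.
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, df, p


# Mexico figures as printed in the notebook output.
t, df, p = welch_t_from_rates(64209, 0.04949461913438926,
                              64275, 0.05118630882924932)
print(round(t, 3), round(df), round(p, 3))
```

With the Mexico figures the result should land close to the t and p values that `stats.ttest_ind` reported in the notebook, up to sign, because the sketch subtracts the control rate from the experiment rate.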
--------------------------------------------------------------------------------
/Employee_Retention_PeopleAnalytics.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Employee Retention - People Analytics"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "## Goal:\n",
15 | "### 1. Predict Employee Retention\n",
16 | "#### ----create a table with 3 columns: day, employee_headcount, company_id\n",
17 | "### 2. What are the main factors that drive employee churn?"
18 | ]
19 | },
20 | {
21 | "cell_type": "code",
22 | "execution_count": 811,
23 | "metadata": {},
24 | "outputs": [],
25 | "source": [
26 | "import pandas as pd\n",
27 | "import numpy as np\n",
28 | "import matplotlib.pyplot as plt\n",
29 | "%matplotlib inline\n",
30 | "import datetime\n",
31 | "from ggplot import *"
32 | ]
33 | },
34 | {
35 | "cell_type": "code",
36 | "execution_count": 812,
37 | "metadata": {},
38 | "outputs": [],
39 | "source": [
40 | "data = pd.read_csv(r'C:\\Users\\lshen\\Downloads\\employee_retention_data.csv')"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": 813,
46 | "metadata": {},
47 | "outputs": [
48 | {
49 | "data": {
133 | "text/plain": [
134 | " employee_id company_id dept seniority salary join_date \\\n",
135 | "0 13021.0 7 customer_service 28 89000.0 2014-03-24 \n",
136 | "1 825355.0 7 marketing 20 183000.0 2013-04-29 \n",
137 | "2 927315.0 4 marketing 14 101000.0 2014-10-13 \n",
138 | "3 662910.0 7 customer_service 20 115000.0 2012-05-14 \n",
139 | "4 256971.0 2 data_science 23 276000.0 2011-10-17 \n",
140 | "\n",
141 | " quit_date \n",
142 | "0 2015-10-30 \n",
143 | "1 2014-04-04 \n",
144 | "2 NaN \n",
145 | "3 2013-06-07 \n",
146 | "4 2014-08-22 "
147 | ]
148 | },
149 | "execution_count": 813,
150 | "metadata": {},
151 | "output_type": "execute_result"
152 | }
153 | ],
154 | "source": [
155 | "data.head()"
156 | ]
157 | },
158 | {
159 | "cell_type": "code",
160 | "execution_count": 814,
161 | "metadata": {},
162 | "outputs": [
163 | {
164 | "data": {
165 | "text/plain": [
166 | "employee_id float64\n",
167 | "company_id int64\n",
168 | "dept object\n",
169 | "seniority int64\n",
170 | "salary float64\n",
171 | "join_date object\n",
172 | "quit_date object\n",
173 | "dtype: object"
174 | ]
175 | },
176 | "execution_count": 814,
177 | "metadata": {},
178 | "output_type": "execute_result"
179 | }
180 | ],
181 | "source": [
182 | "data.dtypes"
183 | ]
184 | },
185 | {
186 | "cell_type": "code",
187 | "execution_count": 815,
188 | "metadata": {},
189 | "outputs": [
190 | {
191 | "data": {
192 | "text/plain": [
193 | "(24702, 7)"
194 | ]
195 | },
196 | "execution_count": 815,
197 | "metadata": {},
198 | "output_type": "execute_result"
199 | }
200 | ],
201 | "source": [
202 | "data.shape"
203 | ]
204 | },
205 | {
206 | "cell_type": "code",
207 | "execution_count": 816,
208 | "metadata": {},
209 | "outputs": [
210 | {
211 | "data": {
212 | "text/plain": [
213 | "employee_id 0\n",
214 | "company_id 0\n",
215 | "dept 0\n",
216 | "seniority 0\n",
217 | "salary 0\n",
218 | "join_date 0\n",
219 | "quit_date 11192\n",
220 | "dtype: int64"
221 | ]
222 | },
223 | "execution_count": 816,
224 | "metadata": {},
225 | "output_type": "execute_result"
226 | }
227 | ],
228 | "source": [
229 | "data.isnull().sum()"
230 | ]
231 | },
232 | {
233 | "cell_type": "markdown",
234 | "metadata": {},
235 | "source": [
236 | "### Convert columns to proper data types"
237 | ]
238 | },
239 | {
240 | "cell_type": "code",
241 | "execution_count": 817,
242 | "metadata": {},
243 | "outputs": [],
244 | "source": [
245 | "# convert join_date and quit_date from strings to datetime64\n",
246 | "# alternative: data['join_date'] = data.join_date.astype('datetime64[ns]')\n",
247 | "data['join_date'] = pd.to_datetime(data.join_date)\n",
248 | "data['quit_date'] = pd.to_datetime(data.quit_date)"
249 | ]
250 | },
251 | {
252 | "cell_type": "code",
253 | "execution_count": 818,
254 | "metadata": {},
255 | "outputs": [
256 | {
257 | "data": {
258 | "text/plain": [
259 | "employee_id float64\n",
260 | "company_id int64\n",
261 | "dept object\n",
262 | "seniority int64\n",
263 | "salary float64\n",
264 | "join_date datetime64[ns]\n",
265 | "quit_date datetime64[ns]\n",
266 | "dtype: object"
267 | ]
268 | },
269 | "execution_count": 818,
270 | "metadata": {},
271 | "output_type": "execute_result"
272 | }
273 | ],
274 | "source": [
275 | "data.dtypes"
276 | ]
277 | },
278 | {
279 | "cell_type": "code",
280 | "execution_count": 819,
281 | "metadata": {},
282 | "outputs": [
283 | {
284 | "data": {
448 | "text/plain": [
449 | " employee_id company_id dept seniority \\\n",
450 | "count 24702.000000 24702.000000 24702 24702.000000 \n",
451 | "unique NaN NaN 6 NaN \n",
452 | "top NaN NaN customer_service NaN \n",
453 | "freq NaN NaN 9180 NaN \n",
454 | "first NaN NaN NaN NaN \n",
455 | "last NaN NaN NaN NaN \n",
456 | "mean 501604.403530 3.426969 NaN 14.127803 \n",
457 | "std 288909.026101 2.700011 NaN 8.089520 \n",
458 | "min 36.000000 1.000000 NaN 1.000000 \n",
459 | "25% 250133.750000 1.000000 NaN 7.000000 \n",
460 | "50% 500793.000000 2.000000 NaN 14.000000 \n",
461 | "75% 753137.250000 5.000000 NaN 21.000000 \n",
462 | "max 999969.000000 12.000000 NaN 99.000000 \n",
463 | "\n",
464 | " salary join_date quit_date \n",
465 | "count 24702.000000 24702 13510 \n",
466 | "unique NaN 995 664 \n",
467 | "top NaN 2012-01-03 00:00:00 2015-05-08 00:00:00 \n",
468 | "freq NaN 105 111 \n",
469 | "first NaN 2011-01-24 00:00:00 2011-10-13 00:00:00 \n",
470 | "last NaN 2015-12-10 00:00:00 2015-12-09 00:00:00 \n",
471 | "mean 138183.345478 NaN NaN \n",
472 | "std 76058.184573 NaN NaN \n",
473 | "min 17000.000000 NaN NaN \n",
474 | "25% 79000.000000 NaN NaN \n",
475 | "50% 123000.000000 NaN NaN \n",
476 | "75% 187000.000000 NaN NaN \n",
477 | "max 408000.000000 NaN NaN "
478 | ]
479 | },
480 | "execution_count": 819,
481 | "metadata": {},
482 | "output_type": "execute_result"
483 | }
484 | ],
485 | "source": [
486 | "data.describe(include = 'all')"
487 | ]
488 | },
489 | {
490 | "cell_type": "markdown",
491 | "metadata": {},
492 | "source": [
493 | "### Get the number of new hires per company per day"
494 | ]
495 | },
496 | {
497 | "cell_type": "code",
498 | "execution_count": 820,
499 | "metadata": {},
500 | "outputs": [],
501 | "source": [
502 | "new_hire_by_date = data.groupby(['company_id','join_date'], as_index = False).employee_id.count()"
503 | ]
504 | },
505 | {
506 | "cell_type": "code",
507 | "execution_count": 821,
508 | "metadata": {},
509 | "outputs": [],
510 | "source": [
511 | "new_hire_by_date.columns = ['company_id','day','new_hire_count']"
512 | ]
513 | },
514 | {
515 | "cell_type": "code",
516 | "execution_count": 822,
517 | "metadata": {},
518 | "outputs": [
519 | {
520 | "data": {
580 | "text/plain": [
581 | " company_id day new_hire_count\n",
582 | "0 1 2011-01-24 25\n",
583 | "1 1 2011-01-25 2\n",
584 | "2 1 2011-01-26 2\n",
585 | "3 1 2011-01-31 30\n",
586 | "4 1 2011-02-01 7"
587 | ]
588 | },
589 | "execution_count": 822,
590 | "metadata": {},
591 | "output_type": "execute_result"
592 | }
593 | ],
594 | "source": [
595 | "new_hire_by_date.head()"
596 | ]
597 | },
598 | {
599 | "cell_type": "code",
600 | "execution_count": 823,
601 | "metadata": {},
602 | "outputs": [
603 | {
604 | "data": {
664 | "text/plain": [
665 | " company_id day new_hire_count\n",
666 | "5125 12 2014-05-19 2\n",
667 | "5126 12 2014-10-13 1\n",
668 | "5127 12 2015-03-23 1\n",
669 | "5128 12 2015-07-06 1\n",
670 | "5129 12 2015-07-27 1"
671 | ]
672 | },
673 | "execution_count": 823,
674 | "metadata": {},
675 | "output_type": "execute_result"
676 | }
677 | ],
678 | "source": [
679 | "new_hire_by_date.tail()"
680 | ]
681 | },
682 | {
683 | "cell_type": "markdown",
684 | "metadata": {},
685 | "source": [
686 | "### Get the number of employees who quit per company per day"
687 | ]
688 | },
689 | {
690 | "cell_type": "code",
691 | "execution_count": 824,
692 | "metadata": {},
693 | "outputs": [],
694 | "source": [
695 | "quit_by_date = data.groupby(['company_id','quit_date'],as_index=False).employee_id.count()"
696 | ]
697 | },
698 | {
699 | "cell_type": "code",
700 | "execution_count": 825,
701 | "metadata": {},
702 | "outputs": [],
703 | "source": [
704 | "quit_by_date.columns = ['company_id','day','quit_count']"
705 | ]
706 | },
707 | {
708 | "cell_type": "code",
709 | "execution_count": 826,
710 | "metadata": {},
711 | "outputs": [
712 | {
713 | "data": {
773 | "text/plain": [
774 | " company_id day quit_count\n",
775 | "0 1 2011-10-21 1\n",
776 | "1 1 2011-11-11 1\n",
777 | "2 1 2011-11-22 1\n",
778 | "3 1 2011-11-25 1\n",
779 | "4 1 2011-12-09 1"
780 | ]
781 | },
782 | "execution_count": 826,
783 | "metadata": {},
784 | "output_type": "execute_result"
785 | }
786 | ],
787 | "source": [
788 | "quit_by_date.head()"
789 | ]
790 | },
791 | {
792 | "cell_type": "markdown",
793 | "metadata": {},
794 | "source": [
795 | "### Create a DataFrame holding every date from start to end"
796 | ]
797 | },
798 | {
799 | "cell_type": "code",
800 | "execution_count": 827,
801 | "metadata": {},
802 | "outputs": [],
803 | "source": [
804 | "start_date = '2011-01-23'\n",
805 | "end_date = '2015-12-13'"
806 | ]
807 | },
808 | {
809 | "cell_type": "code",
810 | "execution_count": 828,
811 | "metadata": {},
812 | "outputs": [],
813 | "source": [
814 | "# continuous day dataframe\n",
815 | "d = pd.DataFrame(pd.date_range(start_date, end_date),columns = ['day'])"
816 | ]
817 | },
818 | {
819 | "cell_type": "code",
820 | "execution_count": 829,
821 | "metadata": {},
822 | "outputs": [
823 | {
824 | "data": {
872 | "text/plain": [
873 | " day\n",
874 | "0 2011-01-23\n",
875 | "1 2011-01-24\n",
876 | "2 2011-01-25\n",
877 | "3 2011-01-26\n",
878 | "4 2011-01-27"
879 | ]
880 | },
881 | "execution_count": 829,
882 | "metadata": {},
883 | "output_type": "execute_result"
884 | }
885 | ],
886 | "source": [
887 | "d.head()"
888 | ]
889 | },
890 | {
891 | "cell_type": "markdown",
892 | "metadata": {},
893 | "source": [
894 | "### Get the company list"
895 | ]
896 | },
897 | {
898 | "cell_type": "code",
899 | "execution_count": 830,
900 | "metadata": {},
901 | "outputs": [],
902 | "source": [
903 | "company_list = data.company_id.unique()"
904 | ]
905 | },
906 | {
907 | "cell_type": "code",
908 | "execution_count": 831,
909 | "metadata": {},
910 | "outputs": [
911 | {
912 | "data": {
913 | "text/plain": [
914 | "array([ 7, 4, 2, 9, 1, 6, 10, 5, 3, 8, 11, 12], dtype=int64)"
915 | ]
916 | },
917 | "execution_count": 831,
918 | "metadata": {},
919 | "output_type": "execute_result"
920 | }
921 | ],
922 | "source": [
923 | "company_list"
924 | ]
925 | },
926 | {
927 | "cell_type": "code",
928 | "execution_count": 832,
929 | "metadata": {},
930 | "outputs": [],
931 | "source": [
932 | "company_list.sort()"
933 | ]
934 | },
935 | {
936 | "cell_type": "code",
937 | "execution_count": 833,
938 | "metadata": {},
939 | "outputs": [
940 | {
941 | "data": {
942 | "text/plain": [
943 | "array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], dtype=int64)"
944 | ]
945 | },
946 | "execution_count": 833,
947 | "metadata": {},
948 | "output_type": "execute_result"
949 | }
950 | ],
951 | "source": [
952 | "company_list"
953 | ]
954 | },
955 | {
956 | "cell_type": "code",
957 | "execution_count": 834,
958 | "metadata": {},
959 | "outputs": [],
960 | "source": [
961 | "c = pd.DataFrame(company_list,columns=['company_id'])"
962 | ]
963 | },
964 | {
965 | "cell_type": "markdown",
966 | "metadata": {},
967 | "source": [
968 | "### Cross join the dates and the company list"
969 | ]
970 | },
971 | {
972 | "cell_type": "code",
973 | "execution_count": 835,
974 | "metadata": {},
975 | "outputs": [],
976 | "source": [
977 | "# cross join: merge on a constant dummy column, then drop it\n",
978 | "headcount = d.assign(foo=1).merge(c.assign(foo=1), on='foo').drop(columns='foo')"
979 | ]
980 | },
981 | {
982 | "cell_type": "code",
983 | "execution_count": 836,
984 | "metadata": {},
985 | "outputs": [
986 | {
987 | "data": {
1041 | "text/plain": [
1042 | " day company_id\n",
1043 | "21427 2015-12-13 8\n",
1044 | "21428 2015-12-13 9\n",
1045 | "21429 2015-12-13 10\n",
1046 | "21430 2015-12-13 11\n",
1047 | "21431 2015-12-13 12"
1048 | ]
1049 | },
1050 | "execution_count": 836,
1051 | "metadata": {},
1052 | "output_type": "execute_result"
1053 | }
1054 | ],
1055 | "source": [
1056 | "headcount.tail()"
1057 | ]
1058 | },
1059 | {
1060 | "cell_type": "markdown",
1061 | "metadata": {},
1062 | "source": [
1063 | "### Merge with the new-hire and quit data"
1064 | ]
1065 | },
1066 | {
1067 | "cell_type": "code",
1068 | "execution_count": 837,
1069 | "metadata": {},
1070 | "outputs": [],
1071 | "source": [
1072 | "headcount = (headcount.merge(new_hire_by_date, how='left', on=['day','company_id'])\n",
1073 | "             .merge(quit_by_date, how='left', on=['day','company_id'])\n",
1074 | "             .fillna(0))"
1075 | ]
1076 | },
1077 | {
1078 | "cell_type": "code",
1079 | "execution_count": 838,
1080 | "metadata": {},
1081 | "outputs": [
1082 | {
1083 | "data": {
1282 | "text/plain": [
1283 | " day company_id new_hire_count quit_count\n",
1284 | "0 2011-01-23 1 0.0 0.0\n",
1285 | "1 2011-01-23 2 0.0 0.0\n",
1286 | "2 2011-01-23 3 0.0 0.0\n",
1287 | "3 2011-01-23 4 0.0 0.0\n",
1288 | "4 2011-01-23 5 0.0 0.0\n",
1289 | "5 2011-01-23 6 0.0 0.0\n",
1290 | "6 2011-01-23 7 0.0 0.0\n",
1291 | "7 2011-01-23 8 0.0 0.0\n",
1292 | "8 2011-01-23 9 0.0 0.0\n",
1293 | "9 2011-01-23 10 0.0 0.0\n",
1294 | "10 2011-01-23 11 0.0 0.0\n",
1295 | "11 2011-01-23 12 0.0 0.0\n",
1296 | "12 2011-01-24 1 25.0 0.0\n",
1297 | "13 2011-01-24 2 17.0 0.0\n",
1298 | "14 2011-01-24 3 9.0 0.0\n",
1299 | "15 2011-01-24 4 12.0 0.0\n",
1300 | "16 2011-01-24 5 5.0 0.0\n",
1301 | "17 2011-01-24 6 3.0 0.0\n",
1302 | "18 2011-01-24 7 1.0 0.0\n",
1303 | "19 2011-01-24 8 6.0 0.0\n",
1304 | "20 2011-01-24 9 3.0 0.0\n",
1305 | "21 2011-01-24 10 0.0 0.0\n",
1306 | "22 2011-01-24 11 0.0 0.0\n",
1307 | "23 2011-01-24 12 0.0 0.0"
1308 | ]
1309 | },
1310 | "execution_count": 838,
1311 | "metadata": {},
1312 | "output_type": "execute_result"
1313 | }
1314 | ],
1315 | "source": [
1316 | "headcount.head(24)"
1317 | ]
1318 | },
1319 | {
1320 | "cell_type": "markdown",
1321 | "metadata": {},
1322 | "source": [
1323 | "Calculate net headcount change per day"
1324 | ]
1325 | },
1326 | {
1327 | "cell_type": "code",
1328 | "execution_count": 839,
1329 | "metadata": {},
1330 | "outputs": [],
1331 | "source": [
1332 | "headcount['head_count_net_change']=headcount.new_hire_count - headcount.quit_count"
1333 | ]
1334 | },
1335 | {
1336 | "cell_type": "markdown",
1337 | "metadata": {},
1338 | "source": [
1339 | "### Answer #1: Get the headcount per day per company"
1340 | ]
1341 | },
1342 | {
1343 | "cell_type": "markdown",
1344 | "metadata": {},
1345 | "source": [
1346 | "Get the cumulative sum of the net headcount change per day for each company"
1347 | ]
1348 | },
1349 | {
1350 | "cell_type": "code",
1351 | "execution_count": 840,
1352 | "metadata": {},
1353 | "outputs": [],
1354 | "source": [
1355 | "cumsums = headcount[['company_id','head_count_net_change']].groupby(['company_id']).cumsum()"
1356 | ]
1357 | },
1358 | {
1359 | "cell_type": "code",
1360 | "execution_count": 841,
1361 | "metadata": {},
1362 | "outputs": [],
1363 | "source": [
1364 | "cumsums.columns = ['head_count']"
1365 | ]
1366 | },
1367 | {
1368 | "cell_type": "code",
1369 | "execution_count": 842,
1370 | "metadata": {},
1371 | "outputs": [
1372 | {
1373 | "data": {
1374 | "text/plain": [
1375 | "21432"
1376 | ]
1377 | },
1378 | "execution_count": 842,
1379 | "metadata": {},
1380 | "output_type": "execute_result"
1381 | }
1382 | ],
1383 | "source": [
1384 | "len(cumsums)"
1385 | ]
1386 | },
1387 | {
1388 | "cell_type": "code",
1389 | "execution_count": 843,
1390 | "metadata": {},
1391 | "outputs": [],
1392 | "source": [
1393 | "headcount = pd.concat([headcount,cumsums], axis = 1)"
1394 | ]
1395 | },
1396 | {
1397 | "cell_type": "code",
1398 | "execution_count": 844,
1399 | "metadata": {},
1400 | "outputs": [
1401 | {
1402 | "data": {
1480 | "text/plain": [
1481 | " day company_id new_hire_count quit_count \\\n",
1482 | "21427 2015-12-13 8 0.0 0.0 \n",
1483 | "21428 2015-12-13 9 0.0 0.0 \n",
1484 | "21429 2015-12-13 10 0.0 0.0 \n",
1485 | "21430 2015-12-13 11 0.0 0.0 \n",
1486 | "21431 2015-12-13 12 0.0 0.0 \n",
1487 | "\n",
1488 | " head_count_net_change head_count \n",
1489 | "21427 0.0 468.0 \n",
1490 | "21428 0.0 432.0 \n",
1491 | "21429 0.0 385.0 \n",
1492 | "21430 0.0 4.0 \n",
1493 | "21431 0.0 12.0 "
1494 | ]
1495 | },
1496 | "execution_count": 844,
1497 | "metadata": {},
1498 | "output_type": "execute_result"
1499 | }
1500 | ],
1501 | "source": [
1502 | "headcount.tail()"
1503 | ]
1504 | },
1505 | {
1506 | "cell_type": "markdown",
1507 | "metadata": {},
1508 | "source": [
1509 | "### Check the factors that drive employee churn"
1510 | ]
1511 | },
1512 | {
1513 | "cell_type": "markdown",
1514 | "metadata": {},
1515 | "source": [
1516 | "### Check employment length"
1517 | ]
1518 | },
1519 | {
1520 | "cell_type": "markdown",
1521 | "metadata": {},
1522 | "source": [
1523 | "Get the timedelta between join date and quit date"
1524 | ]
1525 | },
1526 | {
1527 | "cell_type": "code",
1528 | "execution_count": 845,
1529 | "metadata": {},
1530 | "outputs": [],
1531 | "source": [
1532 | "data['emp_length'] = data.quit_date-data.join_date"
1533 | ]
1534 | },
1535 | {
1536 | "cell_type": "code",
1537 | "execution_count": 846,
1538 | "metadata": {},
1539 | "outputs": [
1540 | {
1541 | "data": {
1631 | "text/plain": [
1632 | " employee_id company_id dept seniority salary join_date \\\n",
1633 | "0 13021.0 7 customer_service 28 89000.0 2014-03-24 \n",
1634 | "1 825355.0 7 marketing 20 183000.0 2013-04-29 \n",
1635 | "2 927315.0 4 marketing 14 101000.0 2014-10-13 \n",
1636 | "3 662910.0 7 customer_service 20 115000.0 2012-05-14 \n",
1637 | "4 256971.0 2 data_science 23 276000.0 2011-10-17 \n",
1638 | "\n",
1639 | " quit_date emp_length \n",
1640 | "0 2015-10-30 585 days \n",
1641 | "1 2014-04-04 340 days \n",
1642 | "2 NaT NaT \n",
1643 | "3 2013-06-07 389 days \n",
1644 | "4 2014-08-22 1040 days "
1645 | ]
1646 | },
1647 | "execution_count": 846,
1648 | "metadata": {},
1649 | "output_type": "execute_result"
1650 | }
1651 | ],
1652 | "source": [
1653 | "data.head()"
1654 | ]
1655 | },
1656 | {
1657 | "cell_type": "code",
1658 | "execution_count": 847,
1659 | "metadata": {},
1660 | "outputs": [
1661 | {
1662 | "data": {
1663 | "text/plain": [
1664 | "employee_id float64\n",
1665 | "company_id int64\n",
1666 | "dept object\n",
1667 | "seniority int64\n",
1668 | "salary float64\n",
1669 | "join_date datetime64[ns]\n",
1670 | "quit_date datetime64[ns]\n",
1671 | "emp_length timedelta64[ns]\n",
1672 | "dtype: object"
1673 | ]
1674 | },
1675 | "execution_count": 847,
1676 | "metadata": {},
1677 | "output_type": "execute_result"
1678 | }
1679 | ],
1680 | "source": [
1681 | "data.dtypes"
1682 | ]
1683 | },
1684 | {
1685 | "cell_type": "code",
1686 | "execution_count": 848,
1687 | "metadata": {},
1688 | "outputs": [
1689 | {
1690 | "data": {
1691 | "text/plain": [
1692 | "count 13510\n",
1693 | "mean 613 days 11:41:01.643227\n",
1694 | "std 328 days 14:56:33.800149\n",
1695 | "min 102 days 00:00:00\n",
1696 | "25% 361 days 00:00:00\n",
1697 | "50% 417 days 00:00:00\n",
1698 | "75% 781 days 00:00:00\n",
1699 | "max 1726 days 00:00:00\n",
1700 | "Name: emp_length, dtype: object"
1701 | ]
1702 | },
1703 | "execution_count": 848,
1704 | "metadata": {},
1705 | "output_type": "execute_result"
1706 | }
1707 | ],
1708 | "source": [
1709 | "data.emp_length.describe()"
1710 | ]
1711 | },
1712 | {
1713 | "cell_type": "code",
1714 | "execution_count": 862,
1715 | "metadata": {},
1716 | "outputs": [
1717 | {
1718 | "data": {
1719 | "text/plain": [
1720 | "375 days 370\n",
1721 | "361 days 368\n",
1722 | "354 days 367\n",
1723 | "368 days 333\n",
1724 | "382 days 325\n",
1725 | "Name: emp_length, dtype: int64"
1726 | ]
1727 | },
1728 | "execution_count": 862,
1729 | "metadata": {},
1730 | "output_type": "execute_result"
1731 | }
1732 | ],
1733 | "source": [
1734 | "data.emp_length.value_counts().head()"
1735 | ]
1736 | },
1737 | {
1738 | "cell_type": "code",
1739 | "execution_count": 860,
1740 | "metadata": {},
1741 | "outputs": [
1742 | {
1743 | "data": {
1744 | "text/plain": [
1745 | ""
1746 | ]
1747 | },
1748 | "execution_count": 860,
1749 | "metadata": {},
1750 | "output_type": "execute_result"
1751 | },
1752 | {
1753 | "data": {
1754 | "image/png": "[base64 PNG data omitted: histogram of emp_length in days, 100 bins]",
1755 | "text/plain": [
1756 | ""
1757 | ]
1758 | },
1759 | "metadata": {},
1760 | "output_type": "display_data"
1761 | }
1762 | ],
1763 | "source": [
1764 | "# convert the timedelta to a number of days before plotting\n",
1765 | "((data.emp_length.dropna() / np.timedelta64(1, 'D'))).hist(bins=100)"
1766 | ]
1767 | },
1768 | {
1769 | "cell_type": "markdown",
1770 | "metadata": {},
1771 | "source": [
1772 | "Observation:\n",
1773 | "- Very high churn rate at the beginning of the second year of employment\n",
1774 | "- Relatively high churn rate between 1.5 and 2 years of employment"
1775 | ]
1776 | },
1777 | {
1778 | "cell_type": "markdown",
1779 | "metadata": {},
1780 | "source": [
1781 | "### Dig deeper"
1782 | ]
1783 | },
1784 | {
1785 | "cell_type": "markdown",
1786 | "metadata": {},
1787 | "source": [
1788 | "Since the pattern is so clear, let's dig deeper.\n",
1789 | "Split employees into two groups: those who quit early and those who did not (employees who have not been at their current\n",
1790 | "company for at least 13 months are excluded).\n",
1791 | "Define early quitters as those who quit before 13 months."
1792 | ]
1793 | },
1794 | {
1795 | "cell_type": "code",
1796 | "execution_count": 930,
1797 | "metadata": {},
1798 | "outputs": [],
1799 | "source": [
1800 | "# employees who quit before 13 months (365 + 30 days)\n",
1801 | "early_quitter = data[data.emp_length/np.timedelta64(1,'D') < 365+30]"
1802 | ]
1803 | },
1804 | {
1805 | "cell_type": "code",
1806 | "execution_count": 929,
1807 | "metadata": {},
1808 | "outputs": [
1809 | {
1810 | "data": {
1900 | "text/plain": [
1901 | " employee_id company_id dept seniority salary join_date \\\n",
1902 | "1 825355.0 7 marketing 20 183000.0 2013-04-29 \n",
1903 | "3 662910.0 7 customer_service 20 115000.0 2012-05-14 \n",
1904 | "12 939058.0 1 marketing 1 48000.0 2012-12-10 \n",
1905 | "14 461248.0 2 sales 20 201000.0 2013-09-16 \n",
1906 | "21 219944.0 6 customer_service 15 98000.0 2012-06-25 \n",
1907 | "\n",
1908 | " quit_date emp_length \n",
1909 | "1 2014-04-04 340 days \n",
1910 | "3 2013-06-07 389 days \n",
1911 | "12 2013-11-15 340 days \n",
1912 | "14 2014-08-22 340 days \n",
1913 | "21 2013-05-31 340 days "
1914 | ]
1915 | },
1916 | "execution_count": 929,
1917 | "metadata": {},
1918 | "output_type": "execute_result"
1919 | }
1920 | ],
1921 | "source": [
1922 | "early_quitter.head()"
1923 | ]
1924 | },
1925 | {
1926 | "cell_type": "code",
1927 | "execution_count": 931,
1928 | "metadata": {},
1929 | "outputs": [],
1930 | "source": [
1931 | "last_day = pd.to_datetime(\"2015-12-13\")"
1932 | ]
1933 | },
1934 | {
1935 | "cell_type": "code",
1936 | "execution_count": 932,
1937 | "metadata": {},
1938 | "outputs": [
1939 | {
1940 | "data": {
1941 | "text/plain": [
1942 | "Timestamp('2015-12-13 00:00:00')"
1943 | ]
1944 | },
1945 | "execution_count": 932,
1946 | "metadata": {},
1947 | "output_type": "execute_result"
1948 | }
1949 | ],
1950 | "source": [
1951 | "last_day"
1952 | ]
1953 | },
1954 | {
1955 | "cell_type": "code",
1956 | "execution_count": 944,
1957 | "metadata": {},
1958 | "outputs": [],
1959 | "source": [
1960 | "# joined more than 13 months before last_day and quit after more than 13 months (rows with NaT emp_length compare False and are dropped)\n",
1961 | "longer_emp = data[((last_day - data.join_date)/np.timedelta64(1,'D') >365+30)\\\n",
1962 | " &(data.emp_length/np.timedelta64(1,'D') > 365+30)]"
1963 | ]
1964 | },
1965 | {
1966 | "cell_type": "code",
1967 | "execution_count": 945,
1968 | "metadata": {},
1969 | "outputs": [
1970 | {
1971 | "data": {
2061 | "text/plain": [
2062 | " employee_id company_id dept seniority salary join_date \\\n",
2063 | "0 13021.0 7 customer_service 28 89000.0 2014-03-24 \n",
2064 | "4 256971.0 2 data_science 23 276000.0 2011-10-17 \n",
2065 | "5 509529.0 4 data_science 14 165000.0 2012-01-30 \n",
2066 | "8 172999.0 9 engineer 7 160000.0 2012-12-10 \n",
2067 | "10 892155.0 6 customer_service 13 72000.0 2012-11-12 \n",
2068 | "\n",
2069 | " quit_date emp_length \n",
2070 | "0 2015-10-30 585 days \n",
2071 | "4 2014-08-22 1040 days \n",
2072 | "5 2013-08-30 578 days \n",
2073 | "8 2015-10-23 1047 days \n",
2074 | "10 2015-02-27 837 days "
2075 | ]
2076 | },
2077 | "execution_count": 945,
2078 | "metadata": {},
2079 | "output_type": "execute_result"
2080 | }
2081 | ],
2082 | "source": [
2083 | "longer_emp.head()"
2084 | ]
2085 | },
2086 | {
2087 | "cell_type": "markdown",
2088 | "metadata": {},
2089 | "source": [
2090 | "A decision tree might be a reasonable way to model this."
2091 | ]
2092 | },
2093 | {
2094 | "cell_type": "code",
2095 | "execution_count": 949,
2096 | "metadata": {},
2097 | "outputs": [
2098 | {
2099 | "data": {
2100 | "text/plain": [
2101 | "count 5654.000000\n",
2102 | "mean 131393.880439\n",
2103 | "std 65464.211853\n",
2104 | "min 17000.000000\n",
2105 | "25% 81000.000000\n",
2106 | "50% 122000.000000\n",
2107 | "75% 173000.000000\n",
2108 | "max 372000.000000\n",
2109 | "Name: salary, dtype: float64"
2110 | ]
2111 | },
2112 | "execution_count": 949,
2113 | "metadata": {},
2114 | "output_type": "execute_result"
2115 | }
2116 | ],
2117 | "source": [
2118 | "early_quitter.salary.describe()"
2119 | ]
2120 | },
2121 | {
2122 | "cell_type": "code",
2123 | "execution_count": 950,
2124 | "metadata": {},
2125 | "outputs": [
2126 | {
2127 | "data": {
2128 | "text/plain": [
2129 | "count 7795.000000\n",
2130 | "mean 138768.313021\n",
2131 | "std 75379.904785\n",
2132 | "min 19000.000000\n",
2133 | "25% 80000.000000\n",
2134 | "50% 123000.000000\n",
2135 | "75% 187000.000000\n",
2136 | "max 379000.000000\n",
2137 | "Name: salary, dtype: float64"
2138 | ]
2139 | },
2140 | "execution_count": 950,
2141 | "metadata": {},
2142 | "output_type": "execute_result"
2143 | }
2144 | ],
2145 | "source": [
2146 | "longer_emp.salary.describe()"
2147 | ]
2148 | },
2149 | {
2150 | "cell_type": "markdown",
2151 | "metadata": {},
2152 | "source": [
2153 | "### Check week of the year of quit date"
2154 | ]
2155 | },
2156 | {
2157 | "cell_type": "code",
2158 | "execution_count": 850,
2159 | "metadata": {},
2160 | "outputs": [
2161 | {
2162 | "data": {
2252 | "text/plain": [
2253 | " employee_id company_id dept seniority salary join_date \\\n",
2254 | "0 13021.0 7 customer_service 28 89000.0 2014-03-24 \n",
2255 | "1 825355.0 7 marketing 20 183000.0 2013-04-29 \n",
2256 | "2 927315.0 4 marketing 14 101000.0 2014-10-13 \n",
2257 | "3 662910.0 7 customer_service 20 115000.0 2012-05-14 \n",
2258 | "4 256971.0 2 data_science 23 276000.0 2011-10-17 \n",
2259 | "\n",
2260 | " quit_date emp_length \n",
2261 | "0 2015-10-30 585 days \n",
2262 | "1 2014-04-04 340 days \n",
2263 | "2 NaT NaT \n",
2264 | "3 2013-06-07 389 days \n",
2265 | "4 2014-08-22 1040 days "
2266 | ]
2267 | },
2268 | "execution_count": 850,
2269 | "metadata": {},
2270 | "output_type": "execute_result"
2271 | }
2272 | ],
2273 | "source": [
2274 | "data.head()"
2275 | ]
2276 | },
2277 | {
2278 | "cell_type": "code",
2279 | "execution_count": 868,
2280 | "metadata": {},
2281 | "outputs": [],
2282 | "source": [
2283 | "# ISO week of the year of each quit date; Series.dt.week was removed in pandas 2.0\n",
2284 | "week = data.quit_date.dropna().dt.isocalendar().week.astype(int)"
2285 | ]
2286 | },
2287 | {
2288 | "cell_type": "code",
2289 | "execution_count": 869,
2290 | "metadata": {},
2291 | "outputs": [
2292 | {
2293 | "data": {
2294 | "text/plain": [
2295 | ""
2296 | ]
2297 | },
2298 | "execution_count": 869,
2299 | "metadata": {},
2300 | "output_type": "execute_result"
2301 | },
2302 | {
2303 | "data": {
2304 | "image/png": "[base64 PNG data omitted: histogram of quit-date week of year, 100 bins]",
2305 | "text/plain": [
2306 | ""
2307 | ]
2308 | },
2309 | "metadata": {},
2310 | "output_type": "display_data"
2311 | }
2312 | ],
2313 | "source": [
2314 | "week.hist(bins = 100)"
2315 | ]
2316 | },
2317 | {
2318 | "cell_type": "markdown",
2319 | "metadata": {},
2320 | "source": [
2321 | "Observation:\n",
2322 | "No significant pattern by week of the year"
2323 | ]
2324 | },
2325 | {
2326 | "cell_type": "markdown",
2327 | "metadata": {},
2328 | "source": [
2329 | "### Check whether department matters"
2330 | ]
2331 | },
2332 | {
2333 | "cell_type": "code",
2334 | "execution_count": 876,
2335 | "metadata": {},
2336 | "outputs": [],
2337 | "source": [
2338 | "# department of each employee who has quit\n",
2339 | "dept_q = data[data['quit_date'].notnull()].dept"
2340 | ]
2341 | },
2342 | {
2343 | "cell_type": "code",
2344 | "execution_count": 879,
2345 | "metadata": {},
2346 | "outputs": [
2347 | {
2348 | "data": {
2349 | "text/plain": [
2350 | "customer_service 0.554902\n",
2351 | "data_science 0.527273\n",
2352 | "design 0.563768\n",
2353 | "engineer 0.512031\n",
2354 | "marketing 0.562993\n",
2355 | "sales 0.570933\n",
2356 | "Name: dept, dtype: float64"
2357 | ]
2358 | },
2359 | "execution_count": 879,
2360 | "metadata": {},
2361 | "output_type": "execute_result"
2362 | }
2363 | ],
2364 | "source": [
2365 | "# fraction of employees who quit, per department\n",
2366 | "dept_q.value_counts()/data.dept.value_counts()"
2367 | ]
2368 | },
2369 | {
2370 | "cell_type": "markdown",
2371 | "metadata": {},
2372 | "source": [
2373 | "Observation:\n",
2374 | "No large difference in quit rate across departments (roughly 51% to 57%)"
2375 | ]
2376 | },
2377 | {
2378 | "cell_type": "markdown",
2379 | "metadata": {},
2380 | "source": [
2381 | "### Seniority"
2382 | ]
2383 | },
2384 | {
2385 | "cell_type": "code",
2386 | "execution_count": 888,
2387 | "metadata": {},
2388 | "outputs": [],
2389 | "source": [
2390 | "s = data[data['quit_date'].notnull()].seniority.value_counts()/data.seniority.value_counts()"
2391 | ]
2392 | },
2393 | {
2394 | "cell_type": "code",
2395 | "execution_count": 889,
2396 | "metadata": {},
2397 | "outputs": [
2398 | {
2399 | "data": {
2400 | "text/plain": [
2401 | "1 0.499419\n",
2402 | "2 0.530786\n",
2403 | "3 0.507378\n",
2404 | "4 0.471508\n",
2405 | "5 0.569444\n",
2406 | "6 0.601053\n",
2407 | "7 0.550647\n",
2408 | "8 0.581349\n",
2409 | "9 0.552966\n",
2410 | "10 0.564186\n",
2411 | "11 0.554113\n",
2412 | "12 0.590081\n",
2413 | "13 0.559284\n",
2414 | "14 0.552174\n",
2415 | "15 0.554336\n",
2416 | "16 0.570513\n",
2417 | "17 0.535274\n",
2418 | "18 0.524083\n",
2419 | "19 0.546154\n",
2420 | "20 0.555687\n",
2421 | "21 0.581841\n",
2422 | "22 0.530105\n",
2423 | "23 0.547771\n",
2424 | "24 0.535666\n",
2425 | "25 0.563636\n",
2426 | "26 0.517291\n",
2427 | "27 0.532710\n",
2428 | "28 0.547009\n",
2429 | "29 0.492013\n",
2430 | "98 1.000000\n",
2431 | "99 1.000000\n",
2432 | "Name: seniority, dtype: float64"
2433 | ]
2434 | },
2435 | "execution_count": 889,
2436 | "metadata": {},
2437 | "output_type": "execute_result"
2438 | }
2439 | ],
2440 | "source": [
2441 | "s"
2442 | ]
2443 | },
2444 | {
2445 | "cell_type": "code",
2446 | "execution_count": 890,
2447 | "metadata": {},
2448 | "outputs": [],
2449 | "source": [
2450 | "s_df = pd.DataFrame(s)"
2451 | ]
2452 | },
2453 | {
2454 | "cell_type": "code",
2455 | "execution_count": 892,
2456 | "metadata": {
2457 | "scrolled": true
2458 | },
2459 | "outputs": [
2460 | {
2461 | "data": {
2462 | "text/plain": [
2463 | ""
2464 | ]
2465 | },
2466 | "execution_count": 892,
2467 | "metadata": {},
2468 | "output_type": "execute_result"
2469 | },
2470 | {
2471 | "data": {
2472 | "image/png": "[base64-encoded PNG omitted: bar chart rendered by s_df.plot(kind = 'bar')]",
2473 | "text/plain": [
2474 | ""
2475 | ]
2476 | },
2477 | "metadata": {},
2478 | "output_type": "display_data"
2479 | }
2480 | ],
2481 | "source": [
2482 | "s_df.plot(kind = 'bar')"
2483 | ]
2484 | },
2485 | {
2486 | "cell_type": "markdown",
2487 | "metadata": {},
2488 | "source": [
2489 | "Observation:\n",
2490 | "No significant difference."
2491 | ]
2492 | },
2493 | {
2494 | "cell_type": "markdown",
2495 | "metadata": {},
2496 | "source": [
2497 | "## Conclusions:"
2498 | ]
2499 | },
2500 | {
2501 | "cell_type": "markdown",
2502 | "metadata": {},
2503 | "source": [
2504 | "1. Employees tend to quit around their work anniversaries, and the churn rate is extremely high in the first and second years.\n",
2505 | "2. Salary is an important factor (needs a deeper look). Employees with low and high salaries are less likely to quit, probably because employees with high\n",
2506 | "salaries are happy there and employees with low salaries are less marketable, so they have a\n",
2507 | "hard time finding a new job."
2508 | ]
2509 | }
2510 | ],
2511 | "metadata": {
2512 | "kernelspec": {
2513 | "display_name": "Python 3",
2514 | "language": "python",
2515 | "name": "python3"
2516 | },
2517 | "language_info": {
2518 | "codemirror_mode": {
2519 | "name": "ipython",
2520 | "version": 3
2521 | },
2522 | "file_extension": ".py",
2523 | "mimetype": "text/x-python",
2524 | "name": "python",
2525 | "nbconvert_exporter": "python",
2526 | "pygments_lexer": "ipython3",
2527 | "version": "3.5.4"
2528 | }
2529 | },
2530 | "nbformat": 4,
2531 | "nbformat_minor": 2
2532 | }
2533 |
--------------------------------------------------------------------------------
/Machine_Learning_Algorithms_Python.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Machine Learning Algorithms"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "3 Types of ML Algorithms\n",
15 | "1. Supervised Learning - consists of a target variable and predictors. Tasks: regression and classification. Models: Linear Regression, KNN, Decision Tree, Random Forest, Logistic Regression\n",
16 | "2. Unsupervised Learning - there is no outcome variable to predict. Used, for example, to segment customers or images. Models: K-means (clustering), Apriori (association rules)\n",
17 | "3. Reinforcement Learning - the machine is trained to make specific decisions, commonly modeled as a Markov Decision Process"
18 | ]
19 | },
20 | {
21 | "cell_type": "markdown",
22 | "metadata": {},
23 | "source": [
24 | "- Training data: data used to fit the model\n",
25 | "- Test data: held-out data used to evaluate the fitted model"
26 | ]
27 | },
28 | {
29 | "cell_type": "markdown",
30 | "metadata": {},
31 | "source": [
32 | "Split the data into training and test sets"
33 | ]
34 | },
35 | {
36 | "cell_type": "code",
37 | "execution_count": 3,
38 | "metadata": {},
39 | "outputs": [],
40 | "source": [
41 | "from sklearn.model_selection import train_test_split\n",
42 | "# X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)"
43 | ]
44 | },
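The split step can be illustrated without scikit-learn. The following is a minimal sketch under stated assumptions, not how `train_test_split` is implemented: the helper name `simple_train_test_split`, the 25% test fraction (chosen to mirror scikit-learn's default test size), and the toy `rows` are all hypothetical.

```python
import random

def simple_train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle row indices and hold out a fraction of them as the test set."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)          # seeded for reproducibility
    n_test = max(1, round(len(rows) * test_fraction))
    test_idx = set(indices[:n_test])
    train = [r for i, r in enumerate(rows) if i not in test_idx]
    test = [r for i, r in enumerate(rows) if i in test_idx]
    return train, test

# Hypothetical toy data: 20 rows identified by their index
rows = list(range(20))
train_rows, test_rows = simple_train_test_split(rows)
```

Every row lands in exactly one of the two sets, which is the property that keeps the test score an honest estimate.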
45 | {
46 | "cell_type": "markdown",
47 | "metadata": {},
48 | "source": [
49 | "## Linear Regression"
50 | ]
51 | },
52 | {
53 | "cell_type": "markdown",
54 | "metadata": {},
55 | "source": [
56 | "Minimizes the sum of squared differences between the observed values and the estimated values."
57 | ]
58 | },
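For a single feature, the least-squares criterion has a closed-form solution, which can be sketched in plain Python. This is an illustration only and not scikit-learn's `LinearRegression` code; the function name `fit_simple_ols` and the toy data are assumptions made for the example.

```python
def fit_simple_ols(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # One-feature closed form: slope = cov(x, y) / var(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical toy data lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_simple_ols(xs, ys)
print(slope, intercept)
```

Because the toy points sit on a straight line, the fitted slope and intercept recover that line and the residuals are zero.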
59 | {
60 | "cell_type": "code",
61 | "execution_count": 1,
62 | "metadata": {},
63 | "outputs": [],
64 | "source": [
65 | "from sklearn import linear_model"
66 | ]
67 | },
68 | {
69 | "cell_type": "markdown",
70 | "metadata": {},
71 | "source": [
72 | "- Identify the feature and response variable(s); values must be numeric (NumPy arrays or array-like)\n",
73 | "- Load the train and test data sets"
74 | ]
75 | },
76 | {
77 | "cell_type": "code",
78 | "execution_count": 3,
79 | "metadata": {},
80 | "outputs": [],
81 | "source": [
82 | "# instantiate a model--make an instance\n",
83 | "lr = linear_model.LinearRegression()"
84 | ]
85 | },
86 | {
87 | "cell_type": "code",
88 | "execution_count": 5,
89 | "metadata": {},
90 | "outputs": [],
91 | "source": [
92 | "# Train the model on the training set and make predictions\n",
93 | "\n",
94 | "# lr.fit(X_train, y_train)\n",
95 | "# y_pred = lr.predict(X_test)"
96 | ]
97 | },
98 | {
99 | "cell_type": "code",
100 | "execution_count": 6,
101 | "metadata": {},
102 | "outputs": [],
103 | "source": [
104 | "# Equation coefficients and intercept\n",
105 | "\n",
106 | "# print('Coefficient: \\n', lr.coef_)\n",
107 | "# print('Intercept: \\n', lr.intercept_)"
108 | ]
109 | },
110 | {
111 | "cell_type": "markdown",
112 | "metadata": {},
113 | "source": [
114 | "## Logistic Regression"
115 | ]
116 | },
117 | {
118 | "cell_type": "markdown",
119 | "metadata": {},
120 | "source": [
121 | "Classification: used to estimate discrete values, typically binary outcomes such as 0/1, yes/no, or true/false"
122 | ]
123 | },
124 | {
125 | "cell_type": "markdown",
126 | "metadata": {},
127 | "source": [
128 | "Fits the data to a logistic (sigmoid) function and predicts the probability of the positive class"
129 | ]
130 | },
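How a fitted logistic regression turns a linear score into a probability can be sketched as follows. The coefficient and intercept values are hypothetical stand-ins for what a fitted model would supply, and the 0.5 cutoff is an assumption for the example.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real score to the (0, 1) interval."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, coef, intercept):
    """Probability of the positive class for a single numeric feature x."""
    return sigmoid(coef * x + intercept)

def classify(x, coef, intercept, threshold=0.5):
    """Convert the probability into a 0/1 label using a cutoff."""
    return 1 if predict_proba(x, coef, intercept) >= threshold else 0

# Hypothetical fitted parameters, for illustration only
coef, intercept = 1.5, -1.0
p_mid = predict_proba(0.0, 1.5, 0.0)   # a score of 0 maps to probability 0.5
label_high = classify(2.0, coef, intercept)
label_low = classify(-2.0, coef, intercept)
```

The decision boundary sits where the linear score equals zero, so the model stays linear in the features even though its output is a probability.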
131 | {
132 | "cell_type": "code",
133 | "execution_count": 9,
134 | "metadata": {},
135 | "outputs": [],
136 | "source": [
137 | "from sklearn.linear_model import LogisticRegression"
138 | ]
139 | },
140 | {
141 | "cell_type": "code",
142 | "execution_count": 10,
143 | "metadata": {},
144 | "outputs": [],
145 | "source": [
146 | "# instantiate a model--make an instance\n",
147 | "logreg = LogisticRegression()"
148 | ]
149 | },
150 | {
151 | "cell_type": "code",
152 | "execution_count": 11,
153 | "metadata": {},
154 | "outputs": [],
155 | "source": [
156 | "# logreg.fit(X_train, y_train)\n",
157 | "# y_pred = logreg.predict(X_test)"
158 | ]
159 | },
160 | {
161 | "cell_type": "code",
162 | "execution_count": 12,
163 | "metadata": {},
164 | "outputs": [],
165 | "source": [
166 | "#Equation coefficient and Intercept\n",
167 | "# print('Coefficient: \\n', logreg.coef_)\n",
168 | "# print('Intercept: \\n', logreg.intercept_)"
169 | ]
170 | },
171 | {
172 | "cell_type": "markdown",
173 | "metadata": {},
174 | "source": [
175 | "## Decision Tree"
176 | ]
177 | },
178 | {
179 | "cell_type": "markdown",
180 | "metadata": {},
181 | "source": [
182 | "Both classification and regression"
183 | ]
184 | },
185 | {
186 | "cell_type": "markdown",
187 | "metadata": {},
188 | "source": [
189 | "- Splits the population into two or more homogeneous sets, based on the most significant attributes (independent variables), so that the groups are as distinct as possible.\n",
190 | "- Tree depth: the number of splits the tree makes before reaching a prediction"
191 | ]
192 | },
193 | {
194 | "cell_type": "code",
195 | "execution_count": 17,
196 | "metadata": {},
197 | "outputs": [],
198 | "source": [
199 | "from sklearn import tree"
200 | ]
201 | },
202 | {
203 | "cell_type": "code",
204 | "execution_count": 23,
205 | "metadata": {},
206 | "outputs": [],
207 | "source": [
208 | "# For Classification\n",
209 | "# The default criterion is 'gini'; 'entropy' is another option\n",
210 | "dt = tree.DecisionTreeClassifier(criterion = 'gini')"
211 | ]
212 | },
213 | {
214 | "cell_type": "code",
215 | "execution_count": 24,
216 | "metadata": {},
217 | "outputs": [],
218 | "source": [
219 | "# For Regression\n",
220 | "dt = tree.DecisionTreeRegressor()"
221 | ]
222 | },
223 | {
224 | "cell_type": "markdown",
225 | "metadata": {},
226 | "source": [
227 | "## KNN - K-nearest Neighbors"
228 | ]
229 | },
230 | {
231 | "cell_type": "markdown",
232 | "metadata": {},
233 | "source": [
234 | "Used for both classification (more common) and regression"
235 | ]
236 | },
237 | {
238 | "cell_type": "markdown",
239 | "metadata": {},
240 | "source": [
241 | "- KNN is computationally expensive\n",
242 | "- Variables should be normalized; otherwise variables with larger ranges can bias the distance calculation\n",
243 | "- Remove outliers and noise before applying KNN"
244 | ]
245 | },
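The normalization point can be made concrete with a small distance calculation. This sketch uses plain Python rather than `KNeighborsClassifier`; the helper names, the min-max scaling choice, and the age/income rows are all hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_max_scale(rows):
    """Rescale each column to the [0, 1] range."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def nearest(query_idx, rows):
    """Index of the row closest to rows[query_idx], excluding itself."""
    others = [i for i in range(len(rows)) if i != query_idx]
    return min(others, key=lambda i: euclidean(rows[query_idx], rows[i]))

# Hypothetical rows: (age in years, income in dollars)
people = [
    [25, 50_000],   # query row
    [26, 54_000],   # almost the same age, income 4,000 higher
    [60, 51_000],   # 35 years older, income 1,000 higher
    [40, 90_000],
]
raw_neighbor = nearest(0, people)                    # income dominates
scaled_neighbor = nearest(0, min_max_scale(people))  # both features count
```

On the raw values the income column decides the neighbor almost by itself; after scaling, the row with a similar age and a similar income becomes the nearest one.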
246 | {
247 | "cell_type": "code",
248 | "execution_count": 25,
249 | "metadata": {},
250 | "outputs": [],
251 | "source": [
252 | "from sklearn.neighbors import KNeighborsClassifier"
253 | ]
254 | },
255 | {
256 | "cell_type": "code",
257 | "execution_count": 26,
258 | "metadata": {},
259 | "outputs": [],
260 | "source": [
261 | "# default value of n_neighbors is 5\n",
262 | "knn = KNeighborsClassifier(n_neighbors=6)"
263 | ]
264 | },
265 | {
266 | "cell_type": "markdown",
267 | "metadata": {},
268 | "source": [
269 | "## Random Forest"
270 | ]
271 | },
272 | {
273 | "cell_type": "markdown",
274 | "metadata": {},
275 | "source": [
276 | "A Random Forest is a collection (ensemble) of decision trees whose predictions are combined by voting (classification) or averaging (regression)."
277 | ]
278 | },
279 | {
280 | "cell_type": "code",
281 | "execution_count": 46,
282 | "metadata": {},
283 | "outputs": [],
284 | "source": [
285 | "from sklearn.ensemble import RandomForestClassifier"
286 | ]
287 | },
288 | {
289 | "cell_type": "code",
290 | "execution_count": 47,
291 | "metadata": {},
292 | "outputs": [],
293 | "source": [
294 | "rf = RandomForestClassifier()"
295 | ]
296 | },
297 | {
298 | "cell_type": "code",
299 | "execution_count": null,
300 | "metadata": {},
301 | "outputs": [],
302 | "source": [
303 | "from sklearn.ensemble import RandomForestRegressor\n",
304 | "rf = RandomForestRegressor()"
305 | ]
306 | },
307 | {
308 | "cell_type": "code",
309 | "execution_count": 51,
310 | "metadata": {},
311 | "outputs": [],
312 | "source": [
313 | "if 0:\n",
314 | " print ('Lin')"
315 | ]
316 | },
317 | {
318 | "cell_type": "markdown",
319 | "metadata": {},
320 | "source": [
321 | "## K-Means"
322 | ]
323 | },
324 | {
325 | "cell_type": "markdown",
326 | "metadata": {},
327 | "source": [
328 | "Unsupervised learning - Clustering"
329 | ]
330 | },
331 | {
332 | "cell_type": "markdown",
333 | "metadata": {},
334 | "source": [
335 | "How K-means forms clusters:\n",
336 | "1. K-means picks k points, known as centroids, one for each cluster.\n",
337 | "2. Each data point is assigned to its closest centroid, forming k clusters.\n",
338 | "3. The centroid of each cluster is recomputed from its current members, giving new centroids.\n",
339 | "4. Steps 2 and 3 are repeated with the new centroids: each data point is reassigned to its closest centroid, until convergence, i.e. the centroids no longer change."
340 | ]
341 | },
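The four steps listed above can be sketched from scratch in plain Python. This is a minimal illustration of the iteration, not scikit-learn's `KMeans` (which uses a smarter initialization by default); the function names, the random initialization, and the toy points are assumptions.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)           # step 1: pick k starting centroids
    clusters = []
    for _ in range(max_iters):
        # step 2: assign every point to its closest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            closest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[closest].append(p)
        # step 3: recompute each centroid as the mean of its members
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        # step 4: stop once the centroids no longer change
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical toy data: two well-separated groups of four points
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, clusters = kmeans(points, k=2)
```

An empty cluster keeps its previous centroid here, which is one simple convention among several.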
342 | {
343 | "cell_type": "markdown",
344 | "metadata": {},
345 | "source": [
346 | "How to determine the value of k:\n",
347 | "Compute the sum of squared distances between each data point and its cluster centroid. As the number of clusters increases, this value keeps decreasing. Plotted against k, it drops sharply up to some value of k and then flattens; that point (the 'elbow') indicates a good number of clusters."
348 | ]
349 | },
350 | {
351 | "cell_type": "code",
352 | "execution_count": 27,
353 | "metadata": {},
354 | "outputs": [],
355 | "source": [
356 | "from sklearn.cluster import KMeans"
357 | ]
358 | },
359 | {
360 | "cell_type": "code",
361 | "execution_count": 28,
362 | "metadata": {},
363 | "outputs": [],
364 | "source": [
365 | "kmeans = KMeans(n_clusters = 3, random_state = 0)"
366 | ]
367 | },
368 | {
369 | "cell_type": "code",
370 | "execution_count": 4,
371 | "metadata": {},
372 | "outputs": [],
373 | "source": [
374 | "# kmeans.fit(X_train)  # unsupervised: labels are not used\n",
375 | "# kmeans.predict(X_test)"
376 | ]
377 | },
385 | {
386 | "cell_type": "markdown",
387 | "metadata": {},
388 | "source": [
389 | "## Model Validation"
390 | ]
391 | },
392 | {
393 | "cell_type": "markdown",
394 | "metadata": {},
395 | "source": [
396 | "- MAE (Mean Absolute Error): the average absolute difference between predicted and actual values"
397 | ]
398 | },
399 | {
400 | "cell_type": "code",
401 | "execution_count": 1,
402 | "metadata": {},
403 | "outputs": [],
404 | "source": [
405 | "from sklearn.metrics import mean_absolute_error\n",
406 | "# mean_absolute_error(y,predicted_y)"
407 | ]
408 | },
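As a worked example of the metric, MAE can be computed by hand in a few lines. The function name `mae` and the numbers are hypothetical; `sklearn.metrics.mean_absolute_error` should give the same value for the same inputs.

```python
def mae(y_true, y_pred):
    """Mean absolute error: the average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical actual and predicted values
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]
error = mae(y_true, y_pred)   # absolute errors are 1, 0, 2, 1
print(error)
```

The result is in the same units as the target, which makes it easy to interpret.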
409 | {
410 | "cell_type": "markdown",
411 | "metadata": {},
412 | "source": [
413 | "## Cross Validation"
414 | ]
415 | },
416 | {
417 | "cell_type": "markdown",
418 | "metadata": {},
419 | "source": [
420 | "Tuning a model to improve its accuracy score on the training data can lead to overfitting. Cross-validation gives a more reliable estimate of how well the model generalizes."
421 | ]
422 | },
423 | {
424 | "cell_type": "markdown",
425 | "metadata": {},
426 | "source": [
427 | "#### Method: k-fold cross validation"
428 | ]
429 | },
430 | {
431 | "cell_type": "markdown",
432 | "metadata": {},
433 | "source": [
434 | "Steps:\n",
435 | "1. Randomly split the data into k folds.\n",
436 | "2. For each fold, train the model on the other k-1 folds and test it on the held-out fold.\n",
437 | "3. Record the error/accuracy.\n",
438 | "4. Repeat until each of the k folds has served as the test set.\n",
439 | "5. The average of the k recorded errors is called the cross-validation error and serves as the performance metric for the model."
440 | ]
441 | },
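The k-fold procedure can be sketched by hand. The "model" below simply predicts the mean of the training targets, a deliberately trivial stand-in so that the fold logic stays visible; the helper name `k_fold_indices` and the toy targets are assumptions.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # step 1: random split
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

# Hypothetical targets; the stand-in model predicts the training mean
ys = [float(v) for v in range(10)]
fold_errors = []
for train_idx, test_idx in k_fold_indices(len(ys), k=5):
    mean_pred = sum(ys[j] for j in train_idx) / len(train_idx)             # "train"
    err = sum(abs(ys[j] - mean_pred) for j in test_idx) / len(test_idx)    # test
    fold_errors.append(err)                                                # record
cv_error = sum(fold_errors) / len(fold_errors)    # average over the k folds
```

In practice the stand-in would be replaced by fitting a real estimator on the training indices and scoring it on the held-out fold.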
442 | {
443 | "cell_type": "markdown",
444 | "metadata": {},
445 | "source": [
446 | "How to choose the value of k (k = 10 is a common choice):\n",
447 | "- A lower value of k gives a more biased estimate.\n",
448 | "- A larger value of k is less biased, but can suffer from large variability."
449 | ]
450 | },
451 | {
452 | "cell_type": "code",
453 | "execution_count": 2,
454 | "metadata": {},
455 | "outputs": [],
456 | "source": [
457 | "from sklearn.model_selection import KFold\n",
458 | "from sklearn.model_selection import cross_val_score"
459 | ]
460 | },
461 | {
462 | "cell_type": "code",
463 | "execution_count": 5,
464 | "metadata": {},
465 | "outputs": [],
466 | "source": [
467 | "kf = KFold(n_splits=10, shuffle=True, random_state=0)\n",
468 | "modelCV = RandomForestClassifier()\n",
469 | "scoring = \"accuracy\"\n",
470 | "# results = cross_val_score(modelCV, X_train, y_train, cv=kf,scoring = scoring)\n",
471 | "# print ('10-fold cross validation average accuracy: {}'.format(results.mean()))"
472 | ]
473 | },
502 | {
503 | "cell_type": "markdown",
504 | "metadata": {},
505 | "source": [
506 | "## Underfitting, Overfitting and Model Optimization"
507 | ]
508 | },
509 | {
510 | "cell_type": "markdown",
511 | "metadata": {},
512 | "source": [
513 | "Now that we have a way to measure model accuracy, we can experiment with alternative models, or with different options (hyperparameters) of the same model, and see which gives the best predictions."
514 | ]
515 | },
516 | {
517 | "cell_type": "markdown",
518 | "metadata": {},
519 | "source": [
520 | "- Overfitting: the model matches the training data almost perfectly, but does poorly on validation and other new data.\n",
521 | "- Underfitting: the model fails to capture important distinctions and patterns in the data.\n",
522 | "- E.g., for a decision tree, increasing max_leaf_nodes moves the model from underfitting toward overfitting."
523 | ]
524 | }
525 | ],
526 | "metadata": {
527 | "kernelspec": {
528 | "display_name": "Python 3",
529 | "language": "python",
530 | "name": "python3"
531 | },
532 | "language_info": {
533 | "codemirror_mode": {
534 | "name": "ipython",
535 | "version": 3
536 | },
537 | "file_extension": ".py",
538 | "mimetype": "text/x-python",
539 | "name": "python",
540 | "nbconvert_exporter": "python",
541 | "pygments_lexer": "ipython3",
542 | "version": "3.6.4"
543 | }
544 | },
545 | "nbformat": 4,
546 | "nbformat_minor": 2
547 | }
548 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Data-Analysis-Machine-Learning-with-Python
2 | Data Analysis and Machine Learning with Python to Solve Business Problems
3 |
--------------------------------------------------------------------------------
/raw-data/readme:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------