├── 9781484261552.jpg
├── Chapter 1
│   └── ReadMe
├── Chapter 3
│   ├── CreditRisk.csv
│   ├── Decision Tree Random Forest Case Study.ipynb
│   ├── Log_ROC.png
│   ├── Logistic Regression Case Study.ipynb
│   ├── Naive Bayes Case Study.ipynb
│   ├── Network_Intrusion.csv
│   ├── ReadMe
│   └── adult.data
├── Chapter 4
│   ├── Chapter4_Boosting.ipynb
│   ├── Chapter4_NLP2.ipynb
│   ├── Chpater4_NLP1.ipynb
│   ├── Imageclassification.ipynb
│   ├── NeuralNetwork_ClassifcationFirst.ipynb
│   ├── ReadMe
│   ├── SVM_Chapter4.ipynb
│   ├── bc2.csv
│   ├── pima-indians-diabetes.csv
│   └── winequality-red-1.csv
├── Chapter 5
│   ├── Chapter5.ipynb
│   ├── Exploratory Data Analysis Notebook.ipynb
│   ├── IRIS.csv
│   ├── ReadMe
│   ├── deliveries.csv
│   ├── matches.csv
│   └── titanic.csv
├── Chapter2
│   ├── Chapter2_PythonCode.ipynb
│   ├── House_data.csv
│   ├── House_data_LR.csv
│   ├── House_data_MLR.csv
│   ├── ReadMe
│   ├── auto-mpg.csv
│   └── petrol_consumption.csv
├── Contributing.md
├── LICENSE.txt
├── README.md
└── errata.md
/9781484261552.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Apress/supervised-learning-w-python/68c94f12d27647fa3dcd6b19d83edfc0bb3c5f39/9781484261552.jpg
--------------------------------------------------------------------------------
/Chapter 1/ReadMe:
--------------------------------------------------------------------------------
1 |
2 | The first chapter of the book. No code in this chapter.
3 |
--------------------------------------------------------------------------------
/Chapter 3/CreditRisk.csv:
--------------------------------------------------------------------------------
1 | Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
2 | LP001002,Male,No,0,Graduate,No,5849,0,0,360,1,Urban,1
3 | LP001003,Male,Yes,1,Graduate,No,4583,1508,128,360,1,Rural,0
4 | LP001005,Male,Yes,0,Graduate,Yes,3000,0,66,360,1,Urban,1
5 | LP001006,Male,Yes,0,Not Graduate,No,2583,2358,120,360,1,Urban,1
6 | LP001008,Male,No,0,Graduate,No,6000,0,141,360,1,Urban,1
7 | LP001011,Male,Yes,2,Graduate,Yes,5417,4196,267,360,1,Urban,1
8 | LP001013,Male,Yes,0,Not Graduate,No,2333,1516,95,360,1,Urban,1
9 | LP001014,Male,Yes,3+,Graduate,No,3036,2504,158,360,0,Semiurban,0
10 | LP001018,Male,Yes,2,Graduate,No,4006,1526,168,360,1,Urban,1
11 | LP001020,Male,Yes,1,Graduate,No,12841,10968,349,360,1,Semiurban,0
12 | LP001024,Male,Yes,2,Graduate,No,3200,700,70,360,1,Urban,1
13 | LP001027,Male,Yes,2,Graduate,,2500,1840,109,360,1,Urban,1
14 | LP001028,Male,Yes,2,Graduate,No,3073,8106,200,360,1,Urban,1
15 | LP001029,Male,No,0,Graduate,No,1853,2840,114,360,1,Rural,0
16 | LP001030,Male,Yes,2,Graduate,No,1299,1086,17,120,1,Urban,1
17 | LP001032,Male,No,0,Graduate,No,4950,0,125,360,1,Urban,1
18 | LP001034,Male,No,1,Not Graduate,No,3596,0,100,240,,Urban,1
19 | LP001036,Female,No,0,Graduate,No,3510,0,76,360,0,Urban,0
20 | LP001038,Male,Yes,0,Not Graduate,No,4887,0,133,360,1,Rural,0
21 | LP001041,Male,Yes,0,Graduate,,2600,3500,115,,1,Urban,1
22 | LP001043,Male,Yes,0,Not Graduate,No,7660,0,104,360,0,Urban,0
23 | LP001046,Male,Yes,1,Graduate,No,5955,5625,315,360,1,Urban,1
24 | LP001047,Male,Yes,0,Not Graduate,No,2600,1911,116,360,0,Semiurban,0
25 | LP001050,,Yes,2,Not Graduate,No,3365,1917,112,360,0,Rural,0
26 | LP001052,Male,Yes,1,Graduate,,3717,2925,151,360,,Semiurban,0
27 | LP001066,Male,Yes,0,Graduate,Yes,9560,0,191,360,1,Semiurban,1
28 | LP001068,Male,Yes,0,Graduate,No,2799,2253,122,360,1,Semiurban,1
29 | LP001073,Male,Yes,2,Not Graduate,No,4226,1040,110,360,1,Urban,1
30 | LP001086,Male,No,0,Not Graduate,No,1442,0,35,360,1,Urban,0
31 | LP001087,Female,No,2,Graduate,,3750,2083,120,360,1,Semiurban,1
32 | LP001091,Male,Yes,1,Graduate,,4166,3369,201,360,,Urban,0
33 | LP001095,Male,No,0,Graduate,No,3167,0,74,360,1,Urban,0
34 | LP001097,Male,No,1,Graduate,Yes,4692,0,106,360,1,Rural,0
35 | LP001098,Male,Yes,0,Graduate,No,3500,1667,114,360,1,Semiurban,1
36 | LP001100,Male,No,3+,Graduate,No,12500,3000,320,360,1,Rural,0
37 | LP001106,Male,Yes,0,Graduate,No,2275,2067,0,360,1,Urban,1
38 | LP001109,Male,Yes,0,Graduate,No,1828,1330,100,,0,Urban,0
39 | LP001112,Female,Yes,0,Graduate,No,3667,1459,144,360,1,Semiurban,1
40 | LP001114,Male,No,0,Graduate,No,4166,7210,184,360,1,Urban,1
41 | LP001116,Male,No,0,Not Graduate,No,3748,1668,110,360,1,Semiurban,1
42 | LP001119,Male,No,0,Graduate,No,3600,0,80,360,1,Urban,0
43 | LP001120,Male,No,0,Graduate,No,1800,1213,47,360,1,Urban,1
44 | LP001123,Male,Yes,0,Graduate,No,2400,0,75,360,,Urban,1
45 | LP001131,Male,Yes,0,Graduate,No,3941,2336,134,360,1,Semiurban,1
46 | LP001136,Male,Yes,0,Not Graduate,Yes,4695,0,96,,1,Urban,1
47 | LP001137,Female,No,0,Graduate,No,3410,0,88,,1,Urban,1
48 | LP001138,Male,Yes,1,Graduate,No,5649,0,44,360,1,Urban,1
49 | LP001144,Male,Yes,0,Graduate,No,5821,0,144,360,1,Urban,1
50 | LP001146,Female,Yes,0,Graduate,No,2645,3440,120,360,0,Urban,0
51 | LP001151,Female,No,0,Graduate,No,4000,2275,144,360,1,Semiurban,1
52 | LP001155,Female,Yes,0,Not Graduate,No,1928,1644,100,360,1,Semiurban,1
53 | LP001157,Female,No,0,Graduate,No,3086,0,120,360,1,Semiurban,1
54 | LP001164,Female,No,0,Graduate,No,4230,0,112,360,1,Semiurban,0
55 | LP001179,Male,Yes,2,Graduate,No,4616,0,134,360,1,Urban,0
56 | LP001186,Female,Yes,1,Graduate,Yes,11500,0,286,360,0,Urban,0
57 | LP001194,Male,Yes,2,Graduate,No,2708,1167,97,360,1,Semiurban,1
58 | LP001195,Male,Yes,0,Graduate,No,2132,1591,96,360,1,Semiurban,1
59 | LP001197,Male,Yes,0,Graduate,No,3366,2200,135,360,1,Rural,0
60 | LP001198,Male,Yes,1,Graduate,No,8080,2250,180,360,1,Urban,1
61 | LP001199,Male,Yes,2,Not Graduate,No,3357,2859,144,360,1,Urban,1
62 | LP001205,Male,Yes,0,Graduate,No,2500,3796,120,360,1,Urban,1
63 | LP001206,Male,Yes,3+,Graduate,No,3029,0,99,360,1,Urban,1
64 | LP001207,Male,Yes,0,Not Graduate,Yes,2609,3449,165,180,0,Rural,0
65 | LP001213,Male,Yes,1,Graduate,No,4945,0,0,360,0,Rural,0
66 | LP001222,Female,No,0,Graduate,No,4166,0,116,360,0,Semiurban,0
67 | LP001225,Male,Yes,0,Graduate,No,5726,4595,258,360,1,Semiurban,0
68 | LP001228,Male,No,0,Not Graduate,No,3200,2254,126,180,0,Urban,0
69 | LP001233,Male,Yes,1,Graduate,No,10750,0,312,360,1,Urban,1
70 | LP001238,Male,Yes,3+,Not Graduate,Yes,7100,0,125,60,1,Urban,1
71 | LP001241,Female,No,0,Graduate,No,4300,0,136,360,0,Semiurban,0
72 | LP001243,Male,Yes,0,Graduate,No,3208,3066,172,360,1,Urban,1
73 | LP001245,Male,Yes,2,Not Graduate,Yes,1875,1875,97,360,1,Semiurban,1
74 | LP001248,Male,No,0,Graduate,No,3500,0,81,300,1,Semiurban,1
75 | LP001250,Male,Yes,3+,Not Graduate,No,4755,0,95,,0,Semiurban,0
76 | LP001253,Male,Yes,3+,Graduate,Yes,5266,1774,187,360,1,Semiurban,1
77 | LP001255,Male,No,0,Graduate,No,3750,0,113,480,1,Urban,0
78 | LP001256,Male,No,0,Graduate,No,3750,4750,176,360,1,Urban,0
79 | LP001259,Male,Yes,1,Graduate,Yes,1000,3022,110,360,1,Urban,0
80 | LP001263,Male,Yes,3+,Graduate,No,3167,4000,180,300,0,Semiurban,0
81 | LP001264,Male,Yes,3+,Not Graduate,Yes,3333,2166,130,360,,Semiurban,1
82 | LP001265,Female,No,0,Graduate,No,3846,0,111,360,1,Semiurban,1
83 | LP001266,Male,Yes,1,Graduate,Yes,2395,0,0,360,1,Semiurban,1
84 | LP001267,Female,Yes,2,Graduate,No,1378,1881,167,360,1,Urban,0
85 | LP001273,Male,Yes,0,Graduate,No,6000,2250,265,360,,Semiurban,0
86 | LP001275,Male,Yes,1,Graduate,No,3988,0,50,240,1,Urban,1
87 | LP001279,Male,No,0,Graduate,No,2366,2531,136,360,1,Semiurban,1
88 | LP001280,Male,Yes,2,Not Graduate,No,3333,2000,99,360,,Semiurban,1
89 | LP001282,Male,Yes,0,Graduate,No,2500,2118,104,360,1,Semiurban,1
90 | LP001289,Male,No,0,Graduate,No,8566,0,210,360,1,Urban,1
91 | LP001310,Male,Yes,0,Graduate,No,5695,4167,175,360,1,Semiurban,1
92 | LP001316,Male,Yes,0,Graduate,No,2958,2900,131,360,1,Semiurban,1
93 | LP001318,Male,Yes,2,Graduate,No,6250,5654,188,180,1,Semiurban,1
94 | LP001319,Male,Yes,2,Not Graduate,No,3273,1820,81,360,1,Urban,1
95 | LP001322,Male,No,0,Graduate,No,4133,0,122,360,1,Semiurban,1
96 | LP001325,Male,No,0,Not Graduate,No,3620,0,25,120,1,Semiurban,1
97 | LP001326,Male,No,0,Graduate,,6782,0,0,360,,Urban,0
98 | LP001327,Female,Yes,0,Graduate,No,2484,2302,137,360,1,Semiurban,1
99 | LP001333,Male,Yes,0,Graduate,No,1977,997,50,360,1,Semiurban,1
100 | LP001334,Male,Yes,0,Not Graduate,No,4188,0,115,180,1,Semiurban,1
101 | LP001343,Male,Yes,0,Graduate,No,1759,3541,131,360,1,Semiurban,1
102 | LP001345,Male,Yes,2,Not Graduate,No,4288,3263,133,180,1,Urban,1
103 | LP001349,Male,No,0,Graduate,No,4843,3806,151,360,1,Semiurban,1
104 | LP001350,Male,Yes,,Graduate,No,13650,0,0,360,1,Urban,1
105 | LP001356,Male,Yes,0,Graduate,No,4652,3583,0,360,1,Semiurban,1
106 | LP001357,Male,,,Graduate,No,3816,754,160,360,1,Urban,1
107 | LP001367,Male,Yes,1,Graduate,No,3052,1030,100,360,1,Urban,1
108 | LP001369,Male,Yes,2,Graduate,No,11417,1126,225,360,1,Urban,1
109 | LP001370,Male,No,0,Not Graduate,,7333,0,120,360,1,Rural,0
110 | LP001379,Male,Yes,2,Graduate,No,3800,3600,216,360,0,Urban,0
111 | LP001384,Male,Yes,3+,Not Graduate,No,2071,754,94,480,1,Semiurban,1
112 | LP001385,Male,No,0,Graduate,No,5316,0,136,360,1,Urban,1
113 | LP001387,Female,Yes,0,Graduate,,2929,2333,139,360,1,Semiurban,1
114 | LP001391,Male,Yes,0,Not Graduate,No,3572,4114,152,,0,Rural,0
115 | LP001392,Female,No,1,Graduate,Yes,7451,0,0,360,1,Semiurban,1
116 | LP001398,Male,No,0,Graduate,,5050,0,118,360,1,Semiurban,1
117 | LP001401,Male,Yes,1,Graduate,No,14583,0,185,180,1,Rural,1
118 | LP001404,Female,Yes,0,Graduate,No,3167,2283,154,360,1,Semiurban,1
119 | LP001405,Male,Yes,1,Graduate,No,2214,1398,85,360,,Urban,1
120 | LP001421,Male,Yes,0,Graduate,No,5568,2142,175,360,1,Rural,0
121 | LP001422,Female,No,0,Graduate,No,10408,0,259,360,1,Urban,1
122 | LP001426,Male,Yes,,Graduate,No,5667,2667,180,360,1,Rural,1
123 | LP001430,Female,No,0,Graduate,No,4166,0,44,360,1,Semiurban,1
124 | LP001431,Female,No,0,Graduate,No,2137,8980,137,360,0,Semiurban,1
125 | LP001432,Male,Yes,2,Graduate,No,2957,0,81,360,1,Semiurban,1
126 | LP001439,Male,Yes,0,Not Graduate,No,4300,2014,194,360,1,Rural,1
127 | LP001443,Female,No,0,Graduate,No,3692,0,93,360,,Rural,1
128 | LP001448,,Yes,3+,Graduate,No,23803,0,370,360,1,Rural,1
129 | LP001449,Male,No,0,Graduate,No,3865,1640,0,360,1,Rural,1
130 | LP001451,Male,Yes,1,Graduate,Yes,10513,3850,160,180,0,Urban,0
131 | LP001465,Male,Yes,0,Graduate,No,6080,2569,182,360,,Rural,0
132 | LP001469,Male,No,0,Graduate,Yes,20166,0,650,480,,Urban,1
133 | LP001473,Male,No,0,Graduate,No,2014,1929,74,360,1,Urban,1
134 | LP001478,Male,No,0,Graduate,No,2718,0,70,360,1,Semiurban,1
135 | LP001482,Male,Yes,0,Graduate,Yes,3459,0,25,120,1,Semiurban,1
136 | LP001487,Male,No,0,Graduate,No,4895,0,102,360,1,Semiurban,1
137 | LP001488,Male,Yes,3+,Graduate,No,4000,7750,290,360,1,Semiurban,0
138 | LP001489,Female,Yes,0,Graduate,No,4583,0,84,360,1,Rural,0
139 | LP001491,Male,Yes,2,Graduate,Yes,3316,3500,88,360,1,Urban,1
140 | LP001492,Male,No,0,Graduate,No,14999,0,242,360,0,Semiurban,0
141 | LP001493,Male,Yes,2,Not Graduate,No,4200,1430,129,360,1,Rural,0
142 | LP001497,Male,Yes,2,Graduate,No,5042,2083,185,360,1,Rural,0
143 | LP001498,Male,No,0,Graduate,No,5417,0,168,360,1,Urban,1
144 | LP001504,Male,No,0,Graduate,Yes,6950,0,175,180,1,Semiurban,1
145 | LP001507,Male,Yes,0,Graduate,No,2698,2034,122,360,1,Semiurban,1
146 | LP001508,Male,Yes,2,Graduate,No,11757,0,187,180,1,Urban,1
147 | LP001514,Female,Yes,0,Graduate,No,2330,4486,100,360,1,Semiurban,1
148 | LP001516,Female,Yes,2,Graduate,No,14866,0,70,360,1,Urban,1
149 | LP001518,Male,Yes,1,Graduate,No,1538,1425,30,360,1,Urban,1
150 | LP001519,Female,No,0,Graduate,No,10000,1666,225,360,1,Rural,0
151 | LP001520,Male,Yes,0,Graduate,No,4860,830,125,360,1,Semiurban,1
152 | LP001528,Male,No,0,Graduate,No,6277,0,118,360,0,Rural,0
153 | LP001529,Male,Yes,0,Graduate,Yes,2577,3750,152,360,1,Rural,1
154 | LP001531,Male,No,0,Graduate,No,9166,0,244,360,1,Urban,0
155 | LP001532,Male,Yes,2,Not Graduate,No,2281,0,113,360,1,Rural,0
156 | LP001535,Male,No,0,Graduate,No,3254,0,50,360,1,Urban,1
157 | LP001536,Male,Yes,3+,Graduate,No,39999,0,600,180,0,Semiurban,1
158 | LP001541,Male,Yes,1,Graduate,No,6000,0,160,360,,Rural,1
159 | LP001543,Male,Yes,1,Graduate,No,9538,0,187,360,1,Urban,1
160 | LP001546,Male,No,0,Graduate,,2980,2083,120,360,1,Rural,1
161 | LP001552,Male,Yes,0,Graduate,No,4583,5625,255,360,1,Semiurban,1
162 | LP001560,Male,Yes,0,Not Graduate,No,1863,1041,98,360,1,Semiurban,1
163 | LP001562,Male,Yes,0,Graduate,No,7933,0,275,360,1,Urban,0
164 | LP001565,Male,Yes,1,Graduate,No,3089,1280,121,360,0,Semiurban,0
165 | LP001570,Male,Yes,2,Graduate,No,4167,1447,158,360,1,Rural,1
166 | LP001572,Male,Yes,0,Graduate,No,9323,0,75,180,1,Urban,1
167 | LP001574,Male,Yes,0,Graduate,No,3707,3166,182,,1,Rural,1
168 | LP001577,Female,Yes,0,Graduate,No,4583,0,112,360,1,Rural,0
169 | LP001578,Male,Yes,0,Graduate,No,2439,3333,129,360,1,Rural,1
170 | LP001579,Male,No,0,Graduate,No,2237,0,63,480,0,Semiurban,0
171 | LP001580,Male,Yes,2,Graduate,No,8000,0,200,360,1,Semiurban,1
172 | LP001581,Male,Yes,0,Not Graduate,,1820,1769,95,360,1,Rural,1
173 | LP001585,,Yes,3+,Graduate,No,51763,0,700,300,1,Urban,1
174 | LP001586,Male,Yes,3+,Not Graduate,No,3522,0,81,180,1,Rural,0
175 | LP001594,Male,Yes,0,Graduate,No,5708,5625,187,360,1,Semiurban,1
176 | LP001603,Male,Yes,0,Not Graduate,Yes,4344,736,87,360,1,Semiurban,0
177 | LP001606,Male,Yes,0,Graduate,No,3497,1964,116,360,1,Rural,1
178 | LP001608,Male,Yes,2,Graduate,No,2045,1619,101,360,1,Rural,1
179 | LP001610,Male,Yes,3+,Graduate,No,5516,11300,495,360,0,Semiurban,0
180 | LP001616,Male,Yes,1,Graduate,No,3750,0,116,360,1,Semiurban,1
181 | LP001630,Male,No,0,Not Graduate,No,2333,1451,102,480,0,Urban,0
182 | LP001633,Male,Yes,1,Graduate,No,6400,7250,180,360,0,Urban,0
183 | LP001634,Male,No,0,Graduate,No,1916,5063,67,360,,Rural,0
184 | LP001636,Male,Yes,0,Graduate,No,4600,0,73,180,1,Semiurban,1
185 | LP001637,Male,Yes,1,Graduate,No,33846,0,260,360,1,Semiurban,0
186 | LP001639,Female,Yes,0,Graduate,No,3625,0,108,360,1,Semiurban,1
187 | LP001640,Male,Yes,0,Graduate,Yes,39147,4750,120,360,1,Semiurban,1
188 | LP001641,Male,Yes,1,Graduate,Yes,2178,0,66,300,0,Rural,0
189 | LP001643,Male,Yes,0,Graduate,No,2383,2138,58,360,,Rural,1
190 | LP001644,,Yes,0,Graduate,Yes,674,5296,168,360,1,Rural,1
191 | LP001647,Male,Yes,0,Graduate,No,9328,0,188,180,1,Rural,1
192 | LP001653,Male,No,0,Not Graduate,No,4885,0,48,360,1,Rural,1
193 | LP001656,Male,No,0,Graduate,No,12000,0,164,360,1,Semiurban,0
194 | LP001657,Male,Yes,0,Not Graduate,No,6033,0,160,360,1,Urban,0
195 | LP001658,Male,No,0,Graduate,No,3858,0,76,360,1,Semiurban,1
196 | LP001664,Male,No,0,Graduate,No,4191,0,120,360,1,Rural,1
197 | LP001665,Male,Yes,1,Graduate,No,3125,2583,170,360,1,Semiurban,0
198 | LP001666,Male,No,0,Graduate,No,8333,3750,187,360,1,Rural,1
199 | LP001669,Female,No,0,Not Graduate,No,1907,2365,120,,1,Urban,1
200 | LP001671,Female,Yes,0,Graduate,No,3416,2816,113,360,,Semiurban,1
201 | LP001673,Male,No,0,Graduate,Yes,11000,0,83,360,1,Urban,0
202 | LP001674,Male,Yes,1,Not Graduate,No,2600,2500,90,360,1,Semiurban,1
203 | LP001677,Male,No,2,Graduate,No,4923,0,166,360,0,Semiurban,1
204 | LP001682,Male,Yes,3+,Not Graduate,No,3992,0,0,180,1,Urban,0
205 | LP001688,Male,Yes,1,Not Graduate,No,3500,1083,135,360,1,Urban,1
206 | LP001691,Male,Yes,2,Not Graduate,No,3917,0,124,360,1,Semiurban,1
207 | LP001692,Female,No,0,Not Graduate,No,4408,0,120,360,1,Semiurban,1
208 | LP001693,Female,No,0,Graduate,No,3244,0,80,360,1,Urban,1
209 | LP001698,Male,No,0,Not Graduate,No,3975,2531,55,360,1,Rural,1
210 | LP001699,Male,No,0,Graduate,No,2479,0,59,360,1,Urban,1
211 | LP001702,Male,No,0,Graduate,No,3418,0,127,360,1,Semiurban,0
212 | LP001708,Female,No,0,Graduate,No,10000,0,214,360,1,Semiurban,0
213 | LP001711,Male,Yes,3+,Graduate,No,3430,1250,128,360,0,Semiurban,0
214 | LP001713,Male,Yes,1,Graduate,Yes,7787,0,240,360,1,Urban,1
215 | LP001715,Male,Yes,3+,Not Graduate,Yes,5703,0,130,360,1,Rural,1
216 | LP001716,Male,Yes,0,Graduate,No,3173,3021,137,360,1,Urban,1
217 | LP001720,Male,Yes,3+,Not Graduate,No,3850,983,100,360,1,Semiurban,1
218 | LP001722,Male,Yes,0,Graduate,No,150,1800,135,360,1,Rural,0
219 | LP001726,Male,Yes,0,Graduate,No,3727,1775,131,360,1,Semiurban,1
220 | LP001732,Male,Yes,2,Graduate,,5000,0,72,360,0,Semiurban,0
221 | LP001734,Female,Yes,2,Graduate,No,4283,2383,127,360,,Semiurban,1
222 | LP001736,Male,Yes,0,Graduate,No,2221,0,60,360,0,Urban,0
223 | LP001743,Male,Yes,2,Graduate,No,4009,1717,116,360,1,Semiurban,1
224 | LP001744,Male,No,0,Graduate,No,2971,2791,144,360,1,Semiurban,1
225 | LP001749,Male,Yes,0,Graduate,No,7578,1010,175,,1,Semiurban,1
226 | LP001750,Male,Yes,0,Graduate,No,6250,0,128,360,1,Semiurban,1
227 | LP001751,Male,Yes,0,Graduate,No,3250,0,170,360,1,Rural,0
228 | LP001754,Male,Yes,,Not Graduate,Yes,4735,0,138,360,1,Urban,0
229 | LP001758,Male,Yes,2,Graduate,No,6250,1695,210,360,1,Semiurban,1
230 | LP001760,Male,,,Graduate,No,4758,0,158,480,1,Semiurban,1
231 | LP001761,Male,No,0,Graduate,Yes,6400,0,200,360,1,Rural,1
232 | LP001765,Male,Yes,1,Graduate,No,2491,2054,104,360,1,Semiurban,1
233 | LP001768,Male,Yes,0,Graduate,,3716,0,42,180,1,Rural,1
234 | LP001770,Male,No,0,Not Graduate,No,3189,2598,120,,1,Rural,1
235 | LP001776,Female,No,0,Graduate,No,8333,0,280,360,1,Semiurban,1
236 | LP001778,Male,Yes,1,Graduate,No,3155,1779,140,360,1,Semiurban,1
237 | LP001784,Male,Yes,1,Graduate,No,5500,1260,170,360,1,Rural,1
238 | LP001786,Male,Yes,0,Graduate,,5746,0,255,360,,Urban,0
239 | LP001788,Female,No,0,Graduate,Yes,3463,0,122,360,,Urban,1
240 | LP001790,Female,No,1,Graduate,No,3812,0,112,360,1,Rural,1
241 | LP001792,Male,Yes,1,Graduate,No,3315,0,96,360,1,Semiurban,1
242 | LP001798,Male,Yes,2,Graduate,No,5819,5000,120,360,1,Rural,1
243 | LP001800,Male,Yes,1,Not Graduate,No,2510,1983,140,180,1,Urban,0
244 | LP001806,Male,No,0,Graduate,No,2965,5701,155,60,1,Urban,1
245 | LP001807,Male,Yes,2,Graduate,Yes,6250,1300,108,360,1,Rural,1
246 | LP001811,Male,Yes,0,Not Graduate,No,3406,4417,123,360,1,Semiurban,1
247 | LP001813,Male,No,0,Graduate,Yes,6050,4333,120,180,1,Urban,0
248 | LP001814,Male,Yes,2,Graduate,No,9703,0,112,360,1,Urban,1
249 | LP001819,Male,Yes,1,Not Graduate,No,6608,0,137,180,1,Urban,1
250 | LP001824,Male,Yes,1,Graduate,No,2882,1843,123,480,1,Semiurban,1
251 | LP001825,Male,Yes,0,Graduate,No,1809,1868,90,360,1,Urban,1
252 | LP001835,Male,Yes,0,Not Graduate,No,1668,3890,201,360,0,Semiurban,0
253 | LP001836,Female,No,2,Graduate,No,3427,0,138,360,1,Urban,0
254 | LP001841,Male,No,0,Not Graduate,Yes,2583,2167,104,360,1,Rural,1
255 | LP001843,Male,Yes,1,Not Graduate,No,2661,7101,279,180,1,Semiurban,1
256 | LP001844,Male,No,0,Graduate,Yes,16250,0,192,360,0,Urban,0
257 | LP001846,Female,No,3+,Graduate,No,3083,0,255,360,1,Rural,1
258 | LP001849,Male,No,0,Not Graduate,No,6045,0,115,360,0,Rural,0
259 | LP001854,Male,Yes,3+,Graduate,No,5250,0,94,360,1,Urban,0
260 | LP001859,Male,Yes,0,Graduate,No,14683,2100,304,360,1,Rural,0
261 | LP001864,Male,Yes,3+,Not Graduate,No,4931,0,128,360,,Semiurban,0
262 | LP001865,Male,Yes,1,Graduate,No,6083,4250,330,360,,Urban,1
263 | LP001868,Male,No,0,Graduate,No,2060,2209,134,360,1,Semiurban,1
264 | LP001870,Female,No,1,Graduate,No,3481,0,155,36,1,Semiurban,0
265 | LP001871,Female,No,0,Graduate,No,7200,0,120,360,1,Rural,1
266 | LP001872,Male,No,0,Graduate,Yes,5166,0,128,360,1,Semiurban,1
267 | LP001875,Male,No,0,Graduate,No,4095,3447,151,360,1,Rural,1
268 | LP001877,Male,Yes,2,Graduate,No,4708,1387,150,360,1,Semiurban,1
269 | LP001882,Male,Yes,3+,Graduate,No,4333,1811,160,360,0,Urban,1
270 | LP001883,Female,No,0,Graduate,,3418,0,135,360,1,Rural,0
271 | LP001884,Female,No,1,Graduate,No,2876,1560,90,360,1,Urban,1
272 | LP001888,Female,No,0,Graduate,No,3237,0,30,360,1,Urban,1
273 | LP001891,Male,Yes,0,Graduate,No,11146,0,136,360,1,Urban,1
274 | LP001892,Male,No,0,Graduate,No,2833,1857,126,360,1,Rural,1
275 | LP001894,Male,Yes,0,Graduate,No,2620,2223,150,360,1,Semiurban,1
276 | LP001896,Male,Yes,2,Graduate,No,3900,0,90,360,1,Semiurban,1
277 | LP001900,Male,Yes,1,Graduate,No,2750,1842,115,360,1,Semiurban,1
278 | LP001903,Male,Yes,0,Graduate,No,3993,3274,207,360,1,Semiurban,1
279 | LP001904,Male,Yes,0,Graduate,No,3103,1300,80,360,1,Urban,1
280 | LP001907,Male,Yes,0,Graduate,No,14583,0,436,360,1,Semiurban,1
281 | LP001908,Female,Yes,0,Not Graduate,No,4100,0,124,360,,Rural,1
282 | LP001910,Male,No,1,Not Graduate,Yes,4053,2426,158,360,0,Urban,0
283 | LP001914,Male,Yes,0,Graduate,No,3927,800,112,360,1,Semiurban,1
284 | LP001915,Male,Yes,2,Graduate,No,2301,985.7999878,78,180,1,Urban,1
285 | LP001917,Female,No,0,Graduate,No,1811,1666,54,360,1,Urban,1
286 | LP001922,Male,Yes,0,Graduate,No,20667,0,0,360,1,Rural,0
287 | LP001924,Male,No,0,Graduate,No,3158,3053,89,360,1,Rural,1
288 | LP001925,Female,No,0,Graduate,Yes,2600,1717,99,300,1,Semiurban,0
289 | LP001926,Male,Yes,0,Graduate,No,3704,2000,120,360,1,Rural,1
290 | LP001931,Female,No,0,Graduate,No,4124,0,115,360,1,Semiurban,1
291 | LP001935,Male,No,0,Graduate,No,9508,0,187,360,1,Rural,1
292 | LP001936,Male,Yes,0,Graduate,No,3075,2416,139,360,1,Rural,1
293 | LP001938,Male,Yes,2,Graduate,No,4400,0,127,360,0,Semiurban,0
294 | LP001940,Male,Yes,2,Graduate,No,3153,1560,134,360,1,Urban,1
295 | LP001945,Female,No,,Graduate,No,5417,0,143,480,0,Urban,0
296 | LP001947,Male,Yes,0,Graduate,No,2383,3334,172,360,1,Semiurban,1
297 | LP001949,Male,Yes,3+,Graduate,,4416,1250,110,360,1,Urban,1
298 | LP001953,Male,Yes,1,Graduate,No,6875,0,200,360,1,Semiurban,1
299 | LP001954,Female,Yes,1,Graduate,No,4666,0,135,360,1,Urban,1
300 | LP001955,Female,No,0,Graduate,No,5000,2541,151,480,1,Rural,0
301 | LP001963,Male,Yes,1,Graduate,No,2014,2925,113,360,1,Urban,0
302 | LP001964,Male,Yes,0,Not Graduate,No,1800,2934,93,360,0,Urban,0
303 | LP001972,Male,Yes,,Not Graduate,No,2875,1750,105,360,1,Semiurban,1
304 | LP001974,Female,No,0,Graduate,No,5000,0,132,360,1,Rural,1
305 | LP001977,Male,Yes,1,Graduate,No,1625,1803,96,360,1,Urban,1
306 | LP001978,Male,No,0,Graduate,No,4000,2500,140,360,1,Rural,1
307 | LP001990,Male,No,0,Not Graduate,No,2000,0,0,360,1,Urban,0
308 | LP001993,Female,No,0,Graduate,No,3762,1666,135,360,1,Rural,1
309 | LP001994,Female,No,0,Graduate,No,2400,1863,104,360,0,Urban,0
310 | LP001996,Male,No,0,Graduate,No,20233,0,480,360,1,Rural,0
311 | LP001998,Male,Yes,2,Not Graduate,No,7667,0,185,360,,Rural,1
312 | LP002002,Female,No,0,Graduate,No,2917,0,84,360,1,Semiurban,1
313 | LP002004,Male,No,0,Not Graduate,No,2927,2405,111,360,1,Semiurban,1
314 | LP002006,Female,No,0,Graduate,No,2507,0,56,360,1,Rural,1
315 | LP002008,Male,Yes,2,Graduate,Yes,5746,0,144,84,,Rural,1
316 | LP002024,,Yes,0,Graduate,No,2473,1843,159,360,1,Rural,0
317 | LP002031,Male,Yes,1,Not Graduate,No,3399,1640,111,180,1,Urban,1
318 | LP002035,Male,Yes,2,Graduate,No,3717,0,120,360,1,Semiurban,1
319 | LP002036,Male,Yes,0,Graduate,No,2058,2134,88,360,,Urban,1
320 | LP002043,Female,No,1,Graduate,No,3541,0,112,360,,Semiurban,1
321 | LP002050,Male,Yes,1,Graduate,Yes,10000,0,155,360,1,Rural,0
322 | LP002051,Male,Yes,0,Graduate,No,2400,2167,115,360,1,Semiurban,1
323 | LP002053,Male,Yes,3+,Graduate,No,4342,189,124,360,1,Semiurban,1
324 | LP002054,Male,Yes,2,Not Graduate,No,3601,1590,0,360,1,Rural,1
325 | LP002055,Female,No,0,Graduate,No,3166,2985,132,360,,Rural,1
326 | LP002065,Male,Yes,3+,Graduate,No,15000,0,300,360,1,Rural,1
327 | LP002067,Male,Yes,1,Graduate,Yes,8666,4983,376,360,0,Rural,0
328 | LP002068,Male,No,0,Graduate,No,4917,0,130,360,0,Rural,1
329 | LP002082,Male,Yes,0,Graduate,Yes,5818,2160,184,360,1,Semiurban,1
330 | LP002086,Female,Yes,0,Graduate,No,4333,2451,110,360,1,Urban,0
331 | LP002087,Female,No,0,Graduate,No,2500,0,67,360,1,Urban,1
332 | LP002097,Male,No,1,Graduate,No,4384,1793,117,360,1,Urban,1
333 | LP002098,Male,No,0,Graduate,No,2935,0,98,360,1,Semiurban,1
334 | LP002100,Male,No,,Graduate,No,2833,0,71,360,1,Urban,1
335 | LP002101,Male,Yes,0,Graduate,,63337,0,490,180,1,Urban,1
336 | LP002103,,Yes,1,Graduate,Yes,9833,1833,182,180,1,Urban,1
337 | LP002106,Male,Yes,,Graduate,Yes,5503,4490,70,,1,Semiurban,1
338 | LP002110,Male,Yes,1,Graduate,,5250,688,160,360,1,Rural,1
339 | LP002112,Male,Yes,2,Graduate,Yes,2500,4600,176,360,1,Rural,1
340 | LP002113,Female,No,3+,Not Graduate,No,1830,0,0,360,0,Urban,0
341 | LP002114,Female,No,0,Graduate,No,4160,0,71,360,1,Semiurban,1
342 | LP002115,Male,Yes,3+,Not Graduate,No,2647,1587,173,360,1,Rural,0
343 | LP002116,Female,No,0,Graduate,No,2378,0,46,360,1,Rural,0
344 | LP002119,Male,Yes,1,Not Graduate,No,4554,1229,158,360,1,Urban,1
345 | LP002126,Male,Yes,3+,Not Graduate,No,3173,0,74,360,1,Semiurban,1
346 | LP002128,Male,Yes,2,Graduate,,2583,2330,125,360,1,Rural,1
347 | LP002129,Male,Yes,0,Graduate,No,2499,2458,160,360,1,Semiurban,1
348 | LP002130,Male,Yes,,Not Graduate,No,3523,3230,152,360,0,Rural,0
349 | LP002131,Male,Yes,2,Not Graduate,No,3083,2168,126,360,1,Urban,1
350 | LP002137,Male,Yes,0,Graduate,No,6333,4583,259,360,,Semiurban,1
351 | LP002138,Male,Yes,0,Graduate,No,2625,6250,187,360,1,Rural,1
352 | LP002139,Male,Yes,0,Graduate,No,9083,0,228,360,1,Semiurban,1
353 | LP002140,Male,No,0,Graduate,No,8750,4167,308,360,1,Rural,0
354 | LP002141,Male,Yes,3+,Graduate,No,2666,2083,95,360,1,Rural,1
355 | LP002142,Female,Yes,0,Graduate,Yes,5500,0,105,360,0,Rural,0
356 | LP002143,Female,Yes,0,Graduate,No,2423,505,130,360,1,Semiurban,1
357 | LP002144,Female,No,,Graduate,No,3813,0,116,180,1,Urban,1
358 | LP002149,Male,Yes,2,Graduate,No,8333,3167,165,360,1,Rural,1
359 | LP002151,Male,Yes,1,Graduate,No,3875,0,67,360,1,Urban,0
360 | LP002158,Male,Yes,0,Not Graduate,No,3000,1666,100,480,0,Urban,0
361 | LP002160,Male,Yes,3+,Graduate,No,5167,3167,200,360,1,Semiurban,1
362 | LP002161,Female,No,1,Graduate,No,4723,0,81,360,1,Semiurban,0
363 | LP002170,Male,Yes,2,Graduate,No,5000,3667,236,360,1,Semiurban,1
364 | LP002175,Male,Yes,0,Graduate,No,4750,2333,130,360,1,Urban,1
365 | LP002178,Male,Yes,0,Graduate,No,3013,3033,95,300,,Urban,1
366 | LP002180,Male,No,0,Graduate,Yes,6822,0,141,360,1,Rural,1
367 | LP002181,Male,No,0,Not Graduate,No,6216,0,133,360,1,Rural,0
368 | LP002187,Male,No,0,Graduate,No,2500,0,96,480,1,Semiurban,0
369 | LP002188,Male,No,0,Graduate,No,5124,0,124,,0,Rural,0
370 | LP002190,Male,Yes,1,Graduate,No,6325,0,175,360,1,Semiurban,1
371 | LP002191,Male,Yes,0,Graduate,No,19730,5266,570,360,1,Rural,0
372 | LP002194,Female,No,0,Graduate,Yes,15759,0,55,360,1,Semiurban,1
373 | LP002197,Male,Yes,2,Graduate,No,5185,0,155,360,1,Semiurban,1
374 | LP002201,Male,Yes,2,Graduate,Yes,9323,7873,380,300,1,Rural,1
375 | LP002205,Male,No,1,Graduate,No,3062,1987,111,180,0,Urban,0
376 | LP002209,Female,No,0,Graduate,,2764,1459,110,360,1,Urban,1
377 | LP002211,Male,Yes,0,Graduate,No,4817,923,120,180,1,Urban,1
378 | LP002219,Male,Yes,3+,Graduate,No,8750,4996,130,360,1,Rural,1
379 | LP002223,Male,Yes,0,Graduate,No,4310,0,130,360,,Semiurban,1
380 | LP002224,Male,No,0,Graduate,No,3069,0,71,480,1,Urban,0
381 | LP002225,Male,Yes,2,Graduate,No,5391,0,130,360,1,Urban,1
382 | LP002226,Male,Yes,0,Graduate,,3333,2500,128,360,1,Semiurban,1
383 | LP002229,Male,No,0,Graduate,No,5941,4232,296,360,1,Semiurban,1
384 | LP002231,Female,No,0,Graduate,No,6000,0,156,360,1,Urban,1
385 | LP002234,Male,No,0,Graduate,Yes,7167,0,128,360,1,Urban,1
386 | LP002236,Male,Yes,2,Graduate,No,4566,0,100,360,1,Urban,0
387 | LP002237,Male,No,1,Graduate,,3667,0,113,180,1,Urban,1
388 | LP002239,Male,No,0,Not Graduate,No,2346,1600,132,360,1,Semiurban,1
389 | LP002243,Male,Yes,0,Not Graduate,No,3010,3136,0,360,0,Urban,0
390 | LP002244,Male,Yes,0,Graduate,No,2333,2417,136,360,1,Urban,1
391 | LP002250,Male,Yes,0,Graduate,No,5488,0,125,360,1,Rural,1
392 | LP002255,Male,No,3+,Graduate,No,9167,0,185,360,1,Rural,1
393 | LP002262,Male,Yes,3+,Graduate,No,9504,0,275,360,1,Rural,1
394 | LP002263,Male,Yes,0,Graduate,No,2583,2115,120,360,,Urban,1
395 | LP002265,Male,Yes,2,Not Graduate,No,1993,1625,113,180,1,Semiurban,1
396 | LP002266,Male,Yes,2,Graduate,No,3100,1400,113,360,1,Urban,1
397 | LP002272,Male,Yes,2,Graduate,No,3276,484,135,360,,Semiurban,1
398 | LP002277,Female,No,0,Graduate,No,3180,0,71,360,0,Urban,0
399 | LP002281,Male,Yes,0,Graduate,No,3033,1459,95,360,1,Urban,1
400 | LP002284,Male,No,0,Not Graduate,No,3902,1666,109,360,1,Rural,1
401 | LP002287,Female,No,0,Graduate,No,1500,1800,103,360,0,Semiurban,0
402 | LP002288,Male,Yes,2,Not Graduate,No,2889,0,45,180,0,Urban,0
403 | LP002296,Male,No,0,Not Graduate,No,2755,0,65,300,1,Rural,0
404 | LP002297,Male,No,0,Graduate,No,2500,20000,103,360,1,Semiurban,1
405 | LP002300,Female,No,0,Not Graduate,No,1963,0,53,360,1,Semiurban,1
406 | LP002301,Female,No,0,Graduate,Yes,7441,0,194,360,1,Rural,0
407 | LP002305,Female,No,0,Graduate,No,4547,0,115,360,1,Semiurban,1
408 | LP002308,Male,Yes,0,Not Graduate,No,2167,2400,115,360,1,Urban,1
409 | LP002314,Female,No,0,Not Graduate,No,2213,0,66,360,1,Rural,1
410 | LP002315,Male,Yes,1,Graduate,No,8300,0,152,300,0,Semiurban,0
411 | LP002317,Male,Yes,3+,Graduate,No,81000,0,360,360,0,Rural,0
412 | LP002318,Female,No,1,Not Graduate,Yes,3867,0,62,360,1,Semiurban,0
413 | LP002319,Male,Yes,0,Graduate,,6256,0,160,360,,Urban,1
414 | LP002328,Male,Yes,0,Not Graduate,No,6096,0,218,360,0,Rural,0
415 | LP002332,Male,Yes,0,Not Graduate,No,2253,2033,110,360,1,Rural,1
416 | LP002335,Female,Yes,0,Not Graduate,No,2149,3237,178,360,0,Semiurban,0
417 | LP002337,Female,No,0,Graduate,No,2995,0,60,360,1,Urban,1
418 | LP002341,Female,No,1,Graduate,No,2600,0,160,360,1,Urban,0
419 | LP002342,Male,Yes,2,Graduate,Yes,1600,20000,239,360,1,Urban,0
420 | LP002345,Male,Yes,0,Graduate,No,1025,2773,112,360,1,Rural,1
421 | LP002347,Male,Yes,0,Graduate,No,3246,1417,138,360,1,Semiurban,1
422 | LP002348,Male,Yes,0,Graduate,No,5829,0,138,360,1,Rural,1
423 | LP002357,Female,No,0,Not Graduate,No,2720,0,80,,0,Urban,0
424 | LP002361,Male,Yes,0,Graduate,No,1820,1719,100,360,1,Urban,1
425 | LP002362,Male,Yes,1,Graduate,No,7250,1667,110,,0,Urban,0
426 | LP002364,Male,Yes,0,Graduate,No,14880,0,96,360,1,Semiurban,1
427 | LP002366,Male,Yes,0,Graduate,No,2666,4300,121,360,1,Rural,1
428 | LP002367,Female,No,1,Not Graduate,No,4606,0,81,360,1,Rural,0
429 | LP002368,Male,Yes,2,Graduate,No,5935,0,133,360,1,Semiurban,1
430 | LP002369,Male,Yes,0,Graduate,No,2920,16.12000084,87,360,1,Rural,1
431 | LP002370,Male,No,0,Not Graduate,No,2717,0,60,180,1,Urban,1
432 | LP002377,Female,No,1,Graduate,Yes,8624,0,150,360,1,Semiurban,1
433 | LP002379,Male,No,0,Graduate,No,6500,0,105,360,0,Rural,0
434 | LP002386,Male,No,0,Graduate,,12876,0,405,360,1,Semiurban,1
435 | LP002387,Male,Yes,0,Graduate,No,2425,2340,143,360,1,Semiurban,1
436 | LP002390,Male,No,0,Graduate,No,3750,0,100,360,1,Urban,1
437 | LP002393,Female,,,Graduate,No,10047,0,0,240,1,Semiurban,1
438 | LP002398,Male,No,0,Graduate,No,1926,1851,50,360,1,Semiurban,1
439 | LP002401,Male,Yes,0,Graduate,No,2213,1125,0,360,1,Urban,1
440 | LP002403,Male,No,0,Graduate,Yes,10416,0,187,360,0,Urban,0
441 | LP002407,Female,Yes,0,Not Graduate,Yes,7142,0,138,360,1,Rural,1
442 | LP002408,Male,No,0,Graduate,No,3660,5064,187,360,1,Semiurban,1
443 | LP002409,Male,Yes,0,Graduate,No,7901,1833,180,360,1,Rural,1
444 | LP002418,Male,No,3+,Not Graduate,No,4707,1993,148,360,1,Semiurban,1
445 | LP002422,Male,No,1,Graduate,No,37719,0,152,360,1,Semiurban,1
446 | LP002424,Male,Yes,0,Graduate,No,7333,8333,175,300,,Rural,1
447 | LP002429,Male,Yes,1,Graduate,Yes,3466,1210,130,360,1,Rural,1
448 | LP002434,Male,Yes,2,Not Graduate,No,4652,0,110,360,1,Rural,1
449 | LP002435,Male,Yes,0,Graduate,,3539,1376,55,360,1,Rural,0
450 | LP002443,Male,Yes,2,Graduate,No,3340,1710,150,360,0,Rural,0
451 | LP002444,Male,No,1,Not Graduate,Yes,2769,1542,190,360,,Semiurban,0
452 | LP002446,Male,Yes,2,Not Graduate,No,2309,1255,125,360,0,Rural,0
453 | LP002447,Male,Yes,2,Not Graduate,No,1958,1456,60,300,,Urban,1
454 | LP002448,Male,Yes,0,Graduate,No,3948,1733,149,360,0,Rural,0
455 | LP002449,Male,Yes,0,Graduate,No,2483,2466,90,180,0,Rural,1
456 | LP002453,Male,No,0,Graduate,Yes,7085,0,84,360,1,Semiurban,1
457 | LP002455,Male,Yes,2,Graduate,No,3859,0,96,360,1,Semiurban,1
458 | LP002459,Male,Yes,0,Graduate,No,4301,0,118,360,1,Urban,1
459 | LP002467,Male,Yes,0,Graduate,No,3708,2569,173,360,1,Urban,0
460 | LP002472,Male,No,2,Graduate,No,4354,0,136,360,1,Rural,1
461 | LP002473,Male,Yes,0,Graduate,No,8334,0,160,360,1,Semiurban,0
462 | LP002478,,Yes,0,Graduate,Yes,2083,4083,160,360,,Semiurban,1
463 | LP002484,Male,Yes,3+,Graduate,No,7740,0,128,180,1,Urban,1
464 | LP002487,Male,Yes,0,Graduate,No,3015,2188,153,360,1,Rural,1
465 | LP002489,Female,No,1,Not Graduate,,5191,0,132,360,1,Semiurban,1
466 | LP002493,Male,No,0,Graduate,No,4166,0,98,360,0,Semiurban,0
467 | LP002494,Male,No,0,Graduate,No,6000,0,140,360,1,Rural,1
468 | LP002500,Male,Yes,3+,Not Graduate,No,2947,1664,70,180,0,Urban,0
469 | LP002501,,Yes,0,Graduate,No,16692,0,110,360,1,Semiurban,1
470 | LP002502,Female,Yes,2,Not Graduate,,210,2917,98,360,1,Semiurban,1
471 | LP002505,Male,Yes,0,Graduate,No,4333,2451,110,360,1,Urban,0
472 | LP002515,Male,Yes,1,Graduate,Yes,3450,2079,162,360,1,Semiurban,1
473 | LP002517,Male,Yes,1,Not Graduate,No,2653,1500,113,180,0,Rural,0
474 | LP002519,Male,Yes,3+,Graduate,No,4691,0,100,360,1,Semiurban,1
475 | LP002522,Female,No,0,Graduate,Yes,2500,0,93,360,,Urban,1
476 | LP002524,Male,No,2,Graduate,No,5532,4648,162,360,1,Rural,1
477 | LP002527,Male,Yes,2,Graduate,Yes,16525,1014,150,360,1,Rural,1
478 | LP002529,Male,Yes,2,Graduate,No,6700,1750,230,300,1,Semiurban,1
479 | LP002530,,Yes,2,Graduate,No,2873,1872,132,360,0,Semiurban,0
480 | LP002531,Male,Yes,1,Graduate,Yes,16667,2250,86,360,1,Semiurban,1
481 | LP002533,Male,Yes,2,Graduate,No,2947,1603,0,360,1,Urban,0
482 | LP002534,Female,No,0,Not Graduate,No,4350,0,154,360,1,Rural,1
483 | LP002536,Male,Yes,3+,Not Graduate,No,3095,0,113,360,1,Rural,1
484 | LP002537,Male,Yes,0,Graduate,No,2083,3150,128,360,1,Semiurban,1
485 | LP002541,Male,Yes,0,Graduate,No,10833,0,234,360,1,Semiurban,1
486 | LP002543,Male,Yes,2,Graduate,No,8333,0,246,360,1,Semiurban,1
487 | LP002544,Male,Yes,1,Not Graduate,No,1958,2436,131,360,1,Rural,1
488 | LP002545,Male,No,2,Graduate,No,3547,0,80,360,0,Rural,0
489 | LP002547,Male,Yes,1,Graduate,No,18333,0,500,360,1,Urban,0
490 | LP002555,Male,Yes,2,Graduate,Yes,4583,2083,160,360,1,Semiurban,1
491 | LP002556,Male,No,0,Graduate,No,2435,0,75,360,1,Urban,0
492 | LP002560,Male,No,0,Not Graduate,No,2699,2785,96,360,,Semiurban,1
493 | LP002562,Male,Yes,1,Not Graduate,No,5333,1131,186,360,,Urban,1
494 | LP002571,Male,No,0,Not Graduate,No,3691,0,110,360,1,Rural,1
495 | LP002582,Female,No,0,Not Graduate,Yes,17263,0,225,360,1,Semiurban,1
496 | LP002585,Male,Yes,0,Graduate,No,3597,2157,119,360,0,Rural,0
497 | LP002586,Female,Yes,1,Graduate,No,3326,913,105,84,1,Semiurban,1
498 | LP002587,Male,Yes,0,Not Graduate,No,2600,1700,107,360,1,Rural,1
499 | LP002588,Male,Yes,0,Graduate,No,4625,2857,111,12,,Urban,1
500 | LP002600,Male,Yes,1,Graduate,Yes,2895,0,95,360,1,Semiurban,1
501 | LP002602,Male,No,0,Graduate,No,6283,4416,209,360,0,Rural,0
502 | LP002603,Female,No,0,Graduate,No,645,3683,113,480,1,Rural,1
503 | LP002606,Female,No,0,Graduate,No,3159,0,100,360,1,Semiurban,1
504 | LP002615,Male,Yes,2,Graduate,No,4865,5624,208,360,1,Semiurban,1
505 | LP002618,Male,Yes,1,Not Graduate,No,4050,5302,138,360,,Rural,0
506 | LP002619,Male,Yes,0,Not Graduate,No,3814,1483,124,300,1,Semiurban,1
507 | LP002622,Male,Yes,2,Graduate,No,3510,4416,243,360,1,Rural,1
508 | LP002624,Male,Yes,0,Graduate,No,20833,6667,480,360,,Urban,1
509 | LP002625,,No,0,Graduate,No,3583,0,96,360,1,Urban,0
510 | LP002626,Male,Yes,0,Graduate,Yes,2479,3013,188,360,1,Urban,1
511 | LP002634,Female,No,1,Graduate,No,13262,0,40,360,1,Urban,1
512 | LP002637,Male,No,0,Not Graduate,No,3598,1287,100,360,1,Rural,0
513 | LP002640,Male,Yes,1,Graduate,No,6065,2004,250,360,1,Semiurban,1
514 | LP002643,Male,Yes,2,Graduate,No,3283,2035,148,360,1,Urban,1
515 | LP002648,Male,Yes,0,Graduate,No,2130,6666,70,180,1,Semiurban,0
516 | LP002652,Male,No,0,Graduate,No,5815,3666,311,360,1,Rural,0
517 | LP002659,Male,Yes,3+,Graduate,No,3466,3428,150,360,1,Rural,1
518 | LP002670,Female,Yes,2,Graduate,No,2031,1632,113,480,1,Semiurban,1
519 | LP002682,Male,Yes,,Not Graduate,No,3074,1800,123,360,0,Semiurban,0
520 | LP002683,Male,No,0,Graduate,No,4683,1915,185,360,1,Semiurban,0
521 | LP002684,Female,No,0,Not Graduate,No,3400,0,95,360,1,Rural,0
522 | LP002689,Male,Yes,2,Not Graduate,No,2192,1742,45,360,1,Semiurban,1
523 | LP002690,Male,No,0,Graduate,No,2500,0,55,360,1,Semiurban,1
524 | LP002692,Male,Yes,3+,Graduate,Yes,5677,1424,100,360,1,Rural,1
525 | LP002693,Male,Yes,2,Graduate,Yes,7948,7166,480,360,1,Rural,1
526 | LP002697,Male,No,0,Graduate,No,4680,2087,0,360,1,Semiurban,0
527 | LP002699,Male,Yes,2,Graduate,Yes,17500,0,400,360,1,Rural,1
528 | LP002705,Male,Yes,0,Graduate,No,3775,0,110,360,1,Semiurban,1
529 | LP002706,Male,Yes,1,Not Graduate,No,5285,1430,161,360,0,Semiurban,1
530 | LP002714,Male,No,1,Not Graduate,No,2679,1302,94,360,1,Semiurban,1
531 | LP002716,Male,No,0,Not Graduate,No,6783,0,130,360,1,Semiurban,1
532 | LP002717,Male,Yes,0,Graduate,No,1025,5500,216,360,,Rural,1
533 | LP002720,Male,Yes,3+,Graduate,No,4281,0,100,360,1,Urban,1
534 | LP002723,Male,No,2,Graduate,No,3588,0,110,360,0,Rural,0
535 | LP002729,Male,No,1,Graduate,No,11250,0,196,360,,Semiurban,0
536 | LP002731,Female,No,0,Not Graduate,Yes,18165,0,125,360,1,Urban,1
537 | LP002732,Male,No,0,Not Graduate,,2550,2042,126,360,1,Rural,1
538 | LP002734,Male,Yes,0,Graduate,No,6133,3906,324,360,1,Urban,1
539 | LP002738,Male,No,2,Graduate,No,3617,0,107,360,1,Semiurban,1
540 | LP002739,Male,Yes,0,Not Graduate,No,2917,536,66,360,1,Rural,0
541 | LP002740,Male,Yes,3+,Graduate,No,6417,0,157,180,1,Rural,1
542 | LP002741,Female,Yes,1,Graduate,No,4608,2845,140,180,1,Semiurban,1
543 | LP002743,Female,No,0,Graduate,No,2138,0,99,360,0,Semiurban,0
544 | LP002753,Female,No,1,Graduate,,3652,0,95,360,1,Semiurban,1
545 | LP002755,Male,Yes,1,Not Graduate,No,2239,2524,128,360,1,Urban,1
546 | LP002757,Female,Yes,0,Not Graduate,No,3017,663,102,360,,Semiurban,1
547 | LP002767,Male,Yes,0,Graduate,No,2768,1950,155,360,1,Rural,1
548 | LP002768,Male,No,0,Not Graduate,No,3358,0,80,36,1,Semiurban,0
549 | LP002772,Male,No,0,Graduate,No,2526,1783,145,360,1,Rural,1
550 | LP002776,Female,No,0,Graduate,No,5000,0,103,360,0,Semiurban,0
551 | LP002777,Male,Yes,0,Graduate,No,2785,2016,110,360,1,Rural,1
552 | LP002778,Male,Yes,2,Graduate,Yes,6633,0,0,360,0,Rural,0
553 | LP002784,Male,Yes,1,Not Graduate,No,2492,2375,0,360,1,Rural,1
554 | LP002785,Male,Yes,1,Graduate,No,3333,3250,158,360,1,Urban,1
555 | LP002788,Male,Yes,0,Not Graduate,No,2454,2333,181,360,0,Urban,0
556 | LP002789,Male,Yes,0,Graduate,No,3593,4266,132,180,0,Rural,0
557 | LP002792,Male,Yes,1,Graduate,No,5468,1032,26,360,1,Semiurban,1
558 | LP002794,Female,No,0,Graduate,No,2667,1625,84,360,,Urban,1
559 | LP002795,Male,Yes,3+,Graduate,Yes,10139,0,260,360,1,Semiurban,1
560 | LP002798,Male,Yes,0,Graduate,No,3887,2669,162,360,1,Semiurban,1
561 | LP002804,Female,Yes,0,Graduate,No,4180,2306,182,360,1,Semiurban,1
562 | LP002807,Male,Yes,2,Not Graduate,No,3675,242,108,360,1,Semiurban,1
563 | LP002813,Female,Yes,1,Graduate,Yes,19484,0,600,360,1,Semiurban,1
564 | LP002820,Male,Yes,0,Graduate,No,5923,2054,211,360,1,Rural,1
565 | LP002821,Male,No,0,Not Graduate,Yes,5800,0,132,360,1,Semiurban,1
566 | LP002832,Male,Yes,2,Graduate,No,8799,0,258,360,0,Urban,0
567 | LP002833,Male,Yes,0,Not Graduate,No,4467,0,120,360,,Rural,1
568 | LP002836,Male,No,0,Graduate,No,3333,0,70,360,1,Urban,1
569 | LP002837,Male,Yes,3+,Graduate,No,3400,2500,123,360,0,Rural,0
570 | LP002840,Female,No,0,Graduate,No,2378,0,9,360,1,Urban,0
571 | LP002841,Male,Yes,0,Graduate,No,3166,2064,104,360,0,Urban,0
572 | LP002842,Male,Yes,1,Graduate,No,3417,1750,186,360,1,Urban,1
573 | LP002847,Male,Yes,,Graduate,No,5116,1451,165,360,0,Urban,0
574 | LP002855,Male,Yes,2,Graduate,No,16666,0,275,360,1,Urban,1
575 | LP002862,Male,Yes,2,Not Graduate,No,6125,1625,187,480,1,Semiurban,0
576 | LP002863,Male,Yes,3+,Graduate,No,6406,0,150,360,1,Semiurban,0
577 | LP002868,Male,Yes,2,Graduate,No,3159,461,108,84,1,Urban,1
578 | LP002872,,Yes,0,Graduate,No,3087,2210,136,360,0,Semiurban,0
579 | LP002874,Male,No,0,Graduate,No,3229,2739,110,360,1,Urban,1
580 | LP002877,Male,Yes,1,Graduate,No,1782,2232,107,360,1,Rural,1
581 | LP002888,Male,No,0,Graduate,,3182,2917,161,360,1,Urban,1
582 | LP002892,Male,Yes,2,Graduate,No,6540,0,205,360,1,Semiurban,1
583 | LP002893,Male,No,0,Graduate,No,1836,33837,90,360,1,Urban,0
584 | LP002894,Female,Yes,0,Graduate,No,3166,0,36,360,1,Semiurban,1
585 | LP002898,Male,Yes,1,Graduate,No,1880,0,61,360,,Rural,0
586 | LP002911,Male,Yes,1,Graduate,No,2787,1917,146,360,0,Rural,0
587 | LP002912,Male,Yes,1,Graduate,No,4283,3000,172,84,1,Rural,0
588 | LP002916,Male,Yes,0,Graduate,No,2297,1522,104,360,1,Urban,1
589 | LP002917,Female,No,0,Not Graduate,No,2165,0,70,360,1,Semiurban,1
590 | LP002925,,No,0,Graduate,No,4750,0,94,360,1,Semiurban,1
591 | LP002926,Male,Yes,2,Graduate,Yes,2726,0,106,360,0,Semiurban,0
592 | LP002928,Male,Yes,0,Graduate,No,3000,3416,56,180,1,Semiurban,1
593 | LP002931,Male,Yes,2,Graduate,Yes,6000,0,205,240,1,Semiurban,0
594 | LP002933,,No,3+,Graduate,Yes,9357,0,292,360,1,Semiurban,1
595 | LP002936,Male,Yes,0,Graduate,No,3859,3300,142,180,1,Rural,1
596 | LP002938,Male,Yes,0,Graduate,Yes,16120,0,260,360,1,Urban,1
597 | LP002940,Male,No,0,Not Graduate,No,3833,0,110,360,1,Rural,1
598 | LP002941,Male,Yes,2,Not Graduate,Yes,6383,1000,187,360,1,Rural,0
599 | LP002943,Male,No,,Graduate,No,2987,0,88,360,0,Semiurban,0
600 | LP002945,Male,Yes,0,Graduate,Yes,9963,0,180,360,1,Rural,1
601 | LP002948,Male,Yes,2,Graduate,No,5780,0,192,360,1,Urban,1
602 | LP002949,Female,No,3+,Graduate,,416,41667,350,180,,Urban,0
603 | LP002950,Male,Yes,0,Not Graduate,,2894,2792,155,360,1,Rural,1
604 | LP002953,Male,Yes,3+,Graduate,No,5703,0,128,360,1,Urban,1
605 | LP002958,Male,No,0,Graduate,No,3676,4301,172,360,1,Rural,1
606 | LP002959,Female,Yes,1,Graduate,No,12000,0,496,360,1,Semiurban,1
607 | LP002960,Male,Yes,0,Not Graduate,No,2400,3800,0,180,1,Urban,0
608 | LP002961,Male,Yes,1,Graduate,No,3400,2500,173,360,1,Semiurban,1
609 | LP002964,Male,Yes,2,Not Graduate,No,3987,1411,157,360,1,Rural,1
610 | LP002974,Male,Yes,0,Graduate,No,3232,1950,108,360,1,Rural,1
611 | LP002978,Female,No,0,Graduate,No,2900,0,71,360,1,Rural,1
612 | LP002979,Male,Yes,3+,Graduate,No,4106,0,40,180,1,Rural,1
613 | LP002983,Male,Yes,1,Graduate,No,8072,240,253,360,1,Urban,1
614 | LP002984,Male,Yes,2,Graduate,No,7583,0,187,360,1,Urban,1
615 | LP002990,Female,No,0,Graduate,Yes,4583,0,133,360,0,Semiurban,0
616 |
--------------------------------------------------------------------------------
/Chapter 3/Log_ROC.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Apress/supervised-learning-w-python/68c94f12d27647fa3dcd6b19d83edfc0bb3c5f39/Chapter 3/Log_ROC.png
--------------------------------------------------------------------------------
/Chapter 3/Naive Bayes Case Study.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Income prediction on census data"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# Objective: \n",
15 | "To predict whether income exceeds 50K/yr based on census data"
16 | ]
17 | },
18 | {
19 | "cell_type": "markdown",
20 | "metadata": {},
21 | "source": [
22 | "Dataset: Adult Data Set\n",
23 | "\n",
24 | "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data\n"
25 | ]
26 | },
27 | {
28 | "cell_type": "markdown",
29 | "metadata": {},
30 | "source": [
31 | "Variable description:\n",
32 | " \n",
33 | "age: continuous\n",
34 | "\n",
35 | "workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.\n",
36 | "\n",
37 | "fnlwgt: continuous.\n",
38 | "\n",
39 | "education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.\n",
40 | "\n",
41 | "education-num: continuous.\n",
42 | "\n",
43 | "marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.\n",
44 | "\n",
45 | "occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.\n",
46 | "\n",
47 | "relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.\n",
48 | "\n",
49 | "race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.\n",
50 | "\n",
51 | "sex: Female, Male.\n",
52 | "\n",
53 | "capital-gain: continuous.\n",
54 | "\n",
55 | "capital-loss: continuous.\n",
56 | "\n",
57 | "hours-per-week: continuous.\n",
58 | "\n",
59 | "native-country: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.\n",
60 | "\n",
61 | "class: >50K, <=50K"
62 | ]
63 | },
64 | {
65 | "cell_type": "code",
66 | "execution_count": 1,
67 | "metadata": {},
68 | "outputs": [],
69 | "source": [
70 | "# Pandas and Numpy libraries\n",
71 | "import pandas as pd\n",
72 | "import numpy as np"
73 | ]
74 | },
75 | {
76 | "cell_type": "code",
77 | "execution_count": 2,
78 | "metadata": {},
79 | "outputs": [],
80 | "source": [
81 | "# For preprocessing the data\n",
82 | "#from sklearn.preprocessing import Imputer\n",
83 | "from sklearn import preprocessing"
84 | ]
85 | },
86 | {
87 | "cell_type": "code",
88 | "execution_count": 3,
89 | "metadata": {},
90 | "outputs": [],
91 | "source": [
92 | "# To split the dataset into train and test datasets\n",
93 | "from sklearn.model_selection import train_test_split"
94 | ]
95 | },
96 | {
97 | "cell_type": "code",
98 | "execution_count": 4,
99 | "metadata": {},
100 | "outputs": [],
101 | "source": [
102 | "# To model the Gaussian Naive Bayes classifier\n",
103 | "from sklearn.naive_bayes import GaussianNB"
104 | ]
105 | },
106 | {
107 | "cell_type": "code",
108 | "execution_count": 5,
109 | "metadata": {},
110 | "outputs": [],
111 | "source": [
112 | "# To calculate the accuracy score of the model\n",
113 | "from sklearn.metrics import accuracy_score"
114 | ]
115 | },
116 | {
117 | "cell_type": "code",
118 | "execution_count": 6,
119 | "metadata": {},
120 | "outputs": [],
121 | "source": [
122 | "census_df = pd.read_csv('adult.data', header = None, delimiter=' *, *', engine='python')"
123 | ]
124 | },
125 | {
126 | "cell_type": "markdown",
127 | "metadata": {},
128 | "source": [
129 | "Load the dataset. Observe that this file has a .data extension instead of .csv.\n",
130 | "\n",
131 | "To import the census data, we use the pandas read_csv() method, which is a simple and fast way to import \n",
132 | "delimited text data.\n",
133 | "\n",
134 | "We pass four arguments. ‘adult.data’ is the file name. The header parameter tells pandas\n",
135 | "whether the first row of the file contains column headers. Our dataset has no header row, so we pass None.\n",
136 | "\n",
137 | "The delimiter parameter specifies what separates the values. Here we use the regular expression ‘ *, *’, which matches a comma \n",
138 | "together with any spaces before or after it, so these spaces are stripped from the data values. This is very helpful when \n",
139 | "spacing around the values is inconsistent. Regular-expression delimiters are only supported by the Python parsing engine, so we also set engine='python'."
140 | ]
141 | },
142 | {
143 | "cell_type": "code",
144 | "execution_count": 8,
145 | "metadata": {},
146 | "outputs": [
147 | {
148 | "data": {
149 | "text/plain": [
150 | "Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14], dtype='int64')"
151 | ]
152 | },
153 | "execution_count": 8,
154 | "metadata": {},
155 | "output_type": "execute_result"
156 | }
157 | ],
158 | "source": [
159 | "# Print columns in the adult data set\n",
160 | "census_df.columns"
161 | ]
162 | },
163 | {
164 | "cell_type": "code",
165 | "execution_count": 9,
166 | "metadata": {},
167 | "outputs": [],
168 | "source": [
169 | "# Adding headers to the dataframe \n",
170 | "census_df.columns = ['age', 'workclass', 'fnlwgt', 'education', 'education_num', 'marital_status', 'occupation', 'relationship',\n",
171 | " 'race', 'sex', 'capital_gain', 'capital_loss', 'hours_per_week', 'native_country', 'income']"
172 | ]
173 | },
174 | {
175 | "cell_type": "code",
176 | "execution_count": 10,
177 | "metadata": {},
178 | "outputs": [
179 | {
180 | "data": {
181 | "text/plain": [
182 | "32561"
183 | ]
184 | },
185 | "execution_count": 10,
186 | "metadata": {},
187 | "output_type": "execute_result"
188 | }
189 | ],
190 | "source": [
191 | "# Number of records(rows) in the dataframe\n",
192 | "len(census_df)"
193 | ]
194 | },
195 | {
196 | "cell_type": "code",
197 | "execution_count": 11,
198 | "metadata": {},
199 | "outputs": [
200 | {
201 | "data": {
202 | "text/plain": [
203 | "age 0\n",
204 | "workclass 0\n",
205 | "fnlwgt 0\n",
206 | "education 0\n",
207 | "education_num 0\n",
208 | "marital_status 0\n",
209 | "occupation 0\n",
210 | "relationship 0\n",
211 | "race 0\n",
212 | "sex 0\n",
213 | "capital_gain 0\n",
214 | "capital_loss 0\n",
215 | "hours_per_week 0\n",
216 | "native_country 0\n",
217 | "income 0\n",
218 | "dtype: int64"
219 | ]
220 | },
221 | "execution_count": 11,
222 | "metadata": {},
223 | "output_type": "execute_result"
224 | }
225 | ],
226 | "source": [
227 | "# Handling missing data\n",
228 | "# Check whether the dataset contains any null values, using the isnull() method.\n",
229 | "census_df.isnull().sum()"
230 | ]
231 | },
232 | {
233 | "cell_type": "markdown",
234 | "metadata": {},
235 | "source": [
236 | "The output above shows that there are no null values in our dataset.\n",
237 | "\n",
238 | "However, a dataset sometimes uses “?” or a blank “ ” in place of missing values instead of null. \n",
239 | "The code snippet below tests whether any categorical column of the census_df dataframe \n",
240 | "contains values equal to “?”."
241 | ]
242 | },
243 | {
244 | "cell_type": "code",
245 | "execution_count": 12,
246 | "metadata": {},
247 | "outputs": [
248 | {
249 | "name": "stdout",
250 | "output_type": "stream",
251 | "text": [
252 | "workclass : 1836\n",
253 | "education : 0\n",
254 | "marital_status : 0\n",
255 | "occupation : 1843\n",
256 | "relationship : 0\n",
257 | "race : 0\n",
258 | "sex : 0\n",
259 | "native_country : 583\n",
260 | "income : 0\n"
261 | ]
262 | }
263 | ],
264 | "source": [
265 | "for value in ['workclass','education','marital_status','occupation','relationship','race','sex','native_country','income']:\n",
266 | " print(value,\":\", sum(census_df[value] == '?'))"
267 | ]
268 | },
269 | {
270 | "cell_type": "markdown",
271 | "metadata": {},
272 | "source": [
273 | "The output above shows that there are 1836 missing values in the workclass attribute, 1843 missing values in the \n",
274 | "occupation attribute and 583 missing values in the native_country attribute."
275 | ]
276 | },
277 | {
278 | "cell_type": "markdown",
279 | "metadata": {},
280 | "source": [
281 | "# Data preprocessing\n",
282 | "\n",
283 | "For preprocessing, we make a duplicate copy of our original dataframe, copying census_df to census_df_rev. \n",
284 | "Observe that we use a deep copy. Why? A deep copy also duplicates the underlying data, so changes made to census_df_rev do not affect census_df."
285 | ]
286 | },
287 | {
288 | "cell_type": "code",
289 | "execution_count": 14,
290 | "metadata": {},
291 | "outputs": [],
292 | "source": [
293 | "## Deep copy of census_df\n",
294 | "census_df_rev = census_df.copy(deep=True)"
295 | ]
296 | },
297 | {
298 | "cell_type": "markdown",
299 | "metadata": {},
300 | "source": [
301 | "Before handling the missing values, we need some summary statistics of our dataframe. For this, we can use the describe() \n",
302 | "method, which generates various summary statistics while excluding NaN values. By default, it summarizes only the numeric columns."
303 | ]
304 | },
305 | {
306 | "cell_type": "code",
307 | "execution_count": 15,
308 | "metadata": {},
309 | "outputs": [
310 | {
311 | "data": {
416 | "text/plain": [
417 | " age fnlwgt education_num capital_gain capital_loss \\\n",
418 | "count 32561.000000 3.256100e+04 32561.000000 32561.000000 32561.000000 \n",
419 | "mean 38.581647 1.897784e+05 10.080679 1077.648844 87.303830 \n",
420 | "std 13.640433 1.055500e+05 2.572720 7385.292085 402.960219 \n",
421 | "min 17.000000 1.228500e+04 1.000000 0.000000 0.000000 \n",
422 | "25% 28.000000 1.178270e+05 9.000000 0.000000 0.000000 \n",
423 | "50% 37.000000 1.783560e+05 10.000000 0.000000 0.000000 \n",
424 | "75% 48.000000 2.370510e+05 12.000000 0.000000 0.000000 \n",
425 | "max 90.000000 1.484705e+06 16.000000 99999.000000 4356.000000 \n",
426 | "\n",
427 | " hours_per_week \n",
428 | "count 32561.000000 \n",
429 | "mean 40.437456 \n",
430 | "std 12.347429 \n",
431 | "min 1.000000 \n",
432 | "25% 40.000000 \n",
433 | "50% 40.000000 \n",
434 | "75% 45.000000 \n",
435 | "max 99.000000 "
436 | ]
437 | },
438 | "execution_count": 15,
439 | "metadata": {},
440 | "output_type": "execute_result"
441 | }
442 | ],
443 | "source": [
444 | "census_df_rev.describe()"
445 | ]
446 | },
447 | {
448 | "cell_type": "markdown",
449 | "metadata": {},
450 | "source": [
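451 | "Next, we pass the “include” parameter with the value “all” to get summary statistics for all the attributes, \n",
452 | "including the categorical ones. For categorical attributes, describe() reports count, unique, top (the most frequent value) and freq."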
453 | ]
454 | },
455 | {
456 | "cell_type": "code",
457 | "execution_count": 16,
458 | "metadata": {},
459 | "outputs": [
460 | {
461 | "data": {
701 | "text/plain": [
702 | " age workclass fnlwgt education education_num \\\n",
703 | "count 32561.000000 32561 3.256100e+04 32561 32561.000000 \n",
704 | "unique NaN 9 NaN 16 NaN \n",
705 | "top NaN Private NaN HS-grad NaN \n",
706 | "freq NaN 22696 NaN 10501 NaN \n",
707 | "mean 38.581647 NaN 1.897784e+05 NaN 10.080679 \n",
708 | "std 13.640433 NaN 1.055500e+05 NaN 2.572720 \n",
709 | "min 17.000000 NaN 1.228500e+04 NaN 1.000000 \n",
710 | "25% 28.000000 NaN 1.178270e+05 NaN 9.000000 \n",
711 | "50% 37.000000 NaN 1.783560e+05 NaN 10.000000 \n",
712 | "75% 48.000000 NaN 2.370510e+05 NaN 12.000000 \n",
713 | "max 90.000000 NaN 1.484705e+06 NaN 16.000000 \n",
714 | "\n",
715 | " marital_status occupation relationship race sex \\\n",
716 | "count 32561 32561 32561 32561 32561 \n",
717 | "unique 7 15 6 5 2 \n",
718 | "top Married-civ-spouse Prof-specialty Husband White Male \n",
719 | "freq 14976 4140 13193 27816 21790 \n",
720 | "mean NaN NaN NaN NaN NaN \n",
721 | "std NaN NaN NaN NaN NaN \n",
722 | "min NaN NaN NaN NaN NaN \n",
723 | "25% NaN NaN NaN NaN NaN \n",
724 | "50% NaN NaN NaN NaN NaN \n",
725 | "75% NaN NaN NaN NaN NaN \n",
726 | "max NaN NaN NaN NaN NaN \n",
727 | "\n",
728 | " capital_gain capital_loss hours_per_week native_country income \n",
729 | "count 32561.000000 32561.000000 32561.000000 32561 32561 \n",
730 | "unique NaN NaN NaN 42 2 \n",
731 | "top NaN NaN NaN United-States <=50K \n",
732 | "freq NaN NaN NaN 29170 24720 \n",
733 | "mean 1077.648844 87.303830 40.437456 NaN NaN \n",
734 | "std 7385.292085 402.960219 12.347429 NaN NaN \n",
735 | "min 0.000000 0.000000 1.000000 NaN NaN \n",
736 | "25% 0.000000 0.000000 40.000000 NaN NaN \n",
737 | "50% 0.000000 0.000000 40.000000 NaN NaN \n",
738 | "75% 0.000000 0.000000 45.000000 NaN NaN \n",
739 | "max 99999.000000 4356.000000 99.000000 NaN NaN "
740 | ]
741 | },
742 | "execution_count": 16,
743 | "metadata": {},
744 | "output_type": "execute_result"
745 | }
746 | ],
747 | "source": [
748 | "census_df_rev.describe(include= 'all')"
749 | ]
750 | },
751 | {
752 | "cell_type": "markdown",
753 | "metadata": {},
754 | "source": [
755 | "# Data imputation \n"
756 | ]
757 | },
758 | {
759 | "cell_type": "code",
760 | "execution_count": 17,
761 | "metadata": {},
762 | "outputs": [],
763 | "source": [
764 | "# Replace '?' in each categorical column with that column's most frequent value (the 'top' row of describe())\n",
765 | "for value in ['workclass','education','marital_status','occupation','relationship','race','sex','native_country','income']:\n",
766 | "    replaceValue = census_df_rev.describe(include='all').loc['top', value]\n",
767 | "    census_df_rev.loc[census_df_rev[value] == '?', value] = replaceValue"
779 | ]
780 | },
781 | {
782 | "cell_type": "code",
783 | "execution_count": 18,
784 | "metadata": {},
785 | "outputs": [],
786 | "source": [
787 | "# Label encoding (not one-hot): map each category of the imputed census_df_rev columns to an integer\n",
788 | "le = preprocessing.LabelEncoder()\n",
789 | "workclass_category = le.fit_transform(census_df_rev.workclass)\n",
790 | "education_category = le.fit_transform(census_df_rev.education)\n",
791 | "marital_category = le.fit_transform(census_df_rev.marital_status)\n",
792 | "occupation_category = le.fit_transform(census_df_rev.occupation)\n",
793 | "relationship_category = le.fit_transform(census_df_rev.relationship)\n",
794 | "race_category = le.fit_transform(census_df_rev.race)\n",
795 | "sex_category = le.fit_transform(census_df_rev.sex)\n",
796 | "native_country_category = le.fit_transform(census_df_rev.native_country)"
797 | ]
798 | },
799 | {
800 | "cell_type": "code",
801 | "execution_count": 20,
802 | "metadata": {},
803 | "outputs": [],
804 | "source": [
805 | "# add the label-encoded columns to census_df_rev\n",
806 | "census_df_rev['workclass_category'] = workclass_category\n",
807 | "census_df_rev['education_category'] = education_category\n",
808 | "census_df_rev['marital_category'] = marital_category\n",
809 | "census_df_rev['occupation_category'] = occupation_category\n",
810 | "census_df_rev['relationship_category'] = relationship_category\n",
811 | "census_df_rev['race_category'] = race_category\n",
812 | "census_df_rev['sex_category'] = sex_category\n",
813 | "census_df_rev['native_country_category'] = native_country_category"
814 | ]
815 | },
816 | {
817 | "cell_type": "code",
818 | "execution_count": 21,
819 | "metadata": {},
820 | "outputs": [
821 | {
822 | "data": {
991 | "text/plain": [
992 | " age workclass fnlwgt education education_num \\\n",
993 | "0 39 State-gov 77516 Bachelors 13 \n",
994 | "1 50 Self-emp-not-inc 83311 Bachelors 13 \n",
995 | "2 38 Private 215646 HS-grad 9 \n",
996 | "3 53 Private 234721 11th 7 \n",
997 | "4 28 Private 338409 Bachelors 13 \n",
998 | "\n",
999 | " marital_status occupation relationship race sex ... \\\n",
1000 | "0 Never-married Adm-clerical Not-in-family White Male ... \n",
1001 | "1 Married-civ-spouse Exec-managerial Husband White Male ... \n",
1002 | "2 Divorced Handlers-cleaners Not-in-family White Male ... \n",
1003 | "3 Married-civ-spouse Handlers-cleaners Husband Black Male ... \n",
1004 | "4 Married-civ-spouse Prof-specialty Wife Black Female ... \n",
1005 | "\n",
1006 | " native_country income workclass_category education_category \\\n",
1007 | "0 United-States <=50K 7 9 \n",
1008 | "1 United-States <=50K 6 9 \n",
1009 | "2 United-States <=50K 4 11 \n",
1010 | "3 United-States <=50K 4 1 \n",
1011 | "4 Cuba <=50K 4 9 \n",
1012 | "\n",
1013 | " marital_category occupation_category relationship_category race_category \\\n",
1014 | "0 4 1 1 4 \n",
1015 | "1 2 4 0 4 \n",
1016 | "2 0 6 1 4 \n",
1017 | "3 2 6 0 2 \n",
1018 | "4 2 10 5 2 \n",
1019 | "\n",
1020 | " sex_category native_country_category \n",
1021 | "0 1 39 \n",
1022 | "1 1 39 \n",
1023 | "2 1 39 \n",
1024 | "3 1 39 \n",
1025 | "4 0 5 \n",
1026 | "\n",
1027 | "[5 rows x 23 columns]"
1028 | ]
1029 | },
1030 | "execution_count": 21,
1031 | "metadata": {},
1032 | "output_type": "execute_result"
1033 | }
1034 | ],
1035 | "source": [
1036 | "census_df_rev.head()"
1037 | ]
1038 | },
1039 | {
1040 | "cell_type": "code",
1041 | "execution_count": 22,
1042 | "metadata": {},
1043 | "outputs": [],
1044 | "source": [
1045 | "#drop the old categorical columns from dataframe\n",
1046 | "dummy_fields = ['workclass','education','marital_status','occupation','relationship','race', 'sex', 'native_country']\n",
1047 | "census_df_rev = census_df_rev.drop(dummy_fields, axis = 1)"
1048 | ]
1049 | },
1077 | {
1078 | "cell_type": "code",
1079 | "execution_count": 24,
1080 | "metadata": {},
1081 | "outputs": [
1082 | {
1083 | "data": {
1215 | "text/plain": [
1216 | " age workclass_category fnlwgt education_category education_num \\\n",
1217 | "0 39 7 77516 9 13 \n",
1218 | "1 50 6 83311 9 13 \n",
1219 | "2 38 4 215646 11 9 \n",
1220 | "3 53 4 234721 1 7 \n",
1221 | "4 28 4 338409 9 13 \n",
1222 | "\n",
1223 | " marital_category occupation_category relationship_category \\\n",
1224 | "0 4 1 1 \n",
1225 | "1 2 4 0 \n",
1226 | "2 0 6 1 \n",
1227 | "3 2 6 0 \n",
1228 | "4 2 10 5 \n",
1229 | "\n",
1230 | " race_category sex_category capital_gain capital_loss hours_per_week \\\n",
1231 | "0 4 1 2174 0 40 \n",
1232 | "1 4 1 0 0 13 \n",
1233 | "2 4 1 0 0 40 \n",
1234 | "3 2 1 0 0 40 \n",
1235 | "4 2 0 0 0 40 \n",
1236 | "\n",
1237 | " native_country_category income \n",
1238 | "0 39 <=50K \n",
1239 | "1 39 <=50K \n",
1240 | "2 39 <=50K \n",
1241 | "3 39 <=50K \n",
1242 | "4 5 <=50K "
1243 | ]
1244 | },
1245 | "execution_count": 24,
1246 | "metadata": {},
1247 | "output_type": "execute_result"
1248 | }
1249 | ],
1250 | "source": [
1251 | "census_df_rev = census_df_rev.reindex(['age', 'workclass_category', 'fnlwgt', 'education_category',\n",
1252 | " 'education_num', 'marital_category', 'occupation_category',\n",
1253 | " 'relationship_category', 'race_category', 'sex_category', 'capital_gain',\n",
1254 | " 'capital_loss', 'hours_per_week', 'native_country_category', \n",
1255 | " 'income'], axis= 1)\n",
1256 | "census_df_rev.head(5)"
1257 | ]
1258 | },
1259 | {
1260 | "cell_type": "markdown",
1261 | "metadata": {},
1262 | "source": [
1263 | "# Data Slicing"
1264 | ]
1265 | },
1266 | {
1267 | "cell_type": "code",
1268 | "execution_count": 25,
1269 | "metadata": {},
1270 | "outputs": [],
1271 | "source": [
1272 | "X = census_df_rev.values[:,:14]\n",
1273 | "Y = census_df_rev.values[:,14] "
1274 | ]
1275 | },
1276 | {
1277 | "cell_type": "code",
1278 | "execution_count": 26,
1279 | "metadata": {},
1280 | "outputs": [],
1281 | "source": [
1283 | "X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.25, random_state = 5)"
1284 | ]
1285 | },
1286 | {
1287 | "cell_type": "markdown",
1288 | "metadata": {},
1289 | "source": [
1290 | "# Implementing Gaussian Naive Bayes"
1291 | ]
1292 | },
1293 | {
1294 | "cell_type": "code",
1295 | "execution_count": 27,
1296 | "metadata": {},
1297 | "outputs": [
1298 | {
1299 | "data": {
1300 | "text/plain": [
1301 | "GaussianNB()"
1302 | ]
1303 | },
1304 | "execution_count": 27,
1305 | "metadata": {},
1306 | "output_type": "execute_result"
1307 | }
1308 | ],
1309 | "source": [
1310 | "clf = GaussianNB()\n",
1311 | "clf.fit(X_train, Y_train)"
1312 | ]
1313 | },
1314 | {
1315 | "cell_type": "code",
1316 | "execution_count": 28,
1317 | "metadata": {},
1318 | "outputs": [],
1319 | "source": [
1320 | "Y_pred = clf.predict(X_test)"
1321 | ]
1322 | },
1323 | {
1324 | "cell_type": "code",
1325 | "execution_count": 29,
1326 | "metadata": {},
1327 | "outputs": [
1328 | {
1329 | "data": {
1330 | "text/plain": [
1331 | "0.7903205994349588"
1332 | ]
1333 | },
1334 | "execution_count": 29,
1335 | "metadata": {},
1336 | "output_type": "execute_result"
1337 | }
1338 | ],
1339 | "source": [
1340 | "accuracy_score(Y_test, Y_pred, normalize = True)"
1341 | ]
1342 | },
1343 | {
1344 | "cell_type": "code",
1345 | "execution_count": null,
1346 | "metadata": {},
1347 | "outputs": [],
1348 | "source": []
1349 | }
1350 | ],
1351 | "metadata": {
1352 | "kernelspec": {
1353 | "display_name": "Python 3",
1354 | "language": "python",
1355 | "name": "python3"
1356 | },
1357 | "language_info": {
1358 | "codemirror_mode": {
1359 | "name": "ipython",
1360 | "version": 3
1361 | },
1362 | "file_extension": ".py",
1363 | "mimetype": "text/x-python",
1364 | "name": "python",
1365 | "nbconvert_exporter": "python",
1366 | "pygments_lexer": "ipython3",
1367 | "version": "3.7.4"
1368 | }
1369 | },
1370 | "nbformat": 4,
1371 | "nbformat_minor": 2
1372 | }
1373 |
--------------------------------------------------------------------------------
/Chapter 3/ReadMe:
--------------------------------------------------------------------------------
1 |
2 |
3 | The third chapter of the book covers classification algorithms. All the Python Jupyter notebooks and datasets for the chapter are committed here, and all the code is written as Jupyter notebooks. Happy coding.
4 |
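The Naive Bayes case study replaces '?' placeholders in the census (adult) data with each column's most frequent value and then label-encodes the categorical columns. A minimal sketch of that preprocessing step is below, assuming a pandas DataFrame whose missing categorical entries are marked with '?'. The helper name impute_and_encode and the toy rows are hypothetical and only illustrate the pattern. Unlike describe()'s 'top', this version computes the mode over non-missing values only, and it writes through .loc on an explicit copy so no chained assignment is involved.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def impute_and_encode(df, categorical_cols, missing_marker="?"):
    """Return a copy of df in which missing_marker is replaced by each column's
    most frequent observed value, plus a '<col>_category' integer-code column."""
    out = df.copy()  # explicit copy, so assignments never target a slice
    for col in categorical_cols:
        is_missing = out[col] == missing_marker
        # mode over non-missing values only (assumes at least one exists)
        fill_value = out.loc[~is_missing, col].mode().iloc[0]
        out.loc[is_missing, col] = fill_value
        # LabelEncoder assigns codes in sorted order of the category strings
        out[col + "_category"] = LabelEncoder().fit_transform(out[col])
    return out


# Hypothetical toy rows in the style of the adult census columns
df = pd.DataFrame({
    "workclass": ["Private", "?", "Private", "State-gov"],
    "sex": ["Male", "Female", "?", "Male"],
})
result = impute_and_encode(df, ["workclass", "sex"])
print(result)
```

The encoded "_category" columns can then be sliced into a feature matrix and passed to GaussianNB in the same way as in the notebook.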
--------------------------------------------------------------------------------
/Chapter 4/Chapter4_NLP2.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "colab_type": "text",
7 | "id": "uUpQff5qfTNc"
8 | },
9 | "source": [
10 | "## Complaint Categorization using Word Embeddings"
11 | ]
12 | },
13 | {
14 | "cell_type": "code",
15 | "execution_count": 1,
16 | "metadata": {
17 | "colab": {},
18 | "colab_type": "code",
19 | "id": "zhXsYJwq7-Rs"
20 | },
21 | "outputs": [],
22 | "source": [
23 | "from nltk.tokenize import RegexpTokenizer\n",
24 | "import numpy as np\n",
25 | "import re"
26 | ]
27 | },
28 | {
29 | "cell_type": "code",
30 | "execution_count": 2,
31 | "metadata": {
32 | "colab": {
33 | "base_uri": "https://localhost:8080/",
34 | "height": 204
35 | },
36 | "colab_type": "code",
37 | "executionInfo": {
38 | "elapsed": 7321,
39 | "status": "ok",
40 | "timestamp": 1566387081318,
41 | "user": {
42 | "displayName": "dikshant gupta",
43 | "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mBY-VAEDCe_t9LzKk0g7MBc8rY1qZ-QR-XiCIyRYw=s64",
44 | "userId": "01845807612441668603"
45 | },
46 | "user_tz": -330
47 | },
48 | "id": "s_Bu4lfx7-Rz",
49 | "outputId": "97bd61bd-4a05-4365-f935-4127bf06790f"
50 | },
51 | "outputs": [],
52 | "source": [
53 | "import pandas as pd\n",
54 | "complaints_dataframe = pd.read_csv('complaints.csv') \n"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 4,
60 | "metadata": {},
61 | "outputs": [
62 | {
63 | "data": {
117 | "text/plain": [
118 | " Consumer complaint narrative Product\n",
119 | "0 I have outdated information on my credit repor... Credit reporting\n",
120 | "1 I purchased a new car on XXXX XXXX. The car de... Consumer Loan\n",
121 | "2 An account on my credit report has a mistaken ... Credit reporting\n",
122 | "3 This company refuses to provide me verificatio... Debt collection\n",
123 | "4 This complaint is in regards to Square Two Fin... Debt collection"
124 | ]
125 | },
126 | "execution_count": 4,
127 | "metadata": {},
128 | "output_type": "execute_result"
129 | }
130 | ],
131 | "source": [
132 | "complaints_dataframe.head()"
133 | ]
134 | },
135 | {
136 | "cell_type": "code",
137 | "execution_count": 5,
138 | "metadata": {
139 | "colab": {},
140 | "colab_type": "code",
141 | "id": "v6H1UTDM7-R8"
142 | },
143 | "outputs": [],
144 | "source": [
145 | "def convert_complaint_to_words(comp):\n",
146 | "    # split the complaint into alphanumeric tokens\n",
147 | "    converted_words = RegexpTokenizer(r'\\w+').tokenize(comp)\n",
148 | "    # lowercase, drop redacted tokens made only of x's (e.g. 'XXXX') and strip digits\n",
149 | "    converted_words = [re.sub(r'^x+$|\\d+', '', w.lower()) for w in converted_words]\n",
150 | "    converted_words = [w for w in converted_words if w != '']\n",
151 | "    return converted_words"
152 | ]
153 | },
154 | {
155 | "cell_type": "code",
156 | "execution_count": 6,
157 | "metadata": {
158 | "colab": {},
159 | "colab_type": "code",
160 | "id": "0RgXmo-N7-SC"
161 | },
162 | "outputs": [],
163 | "source": [
164 | "all_words = list()\n",
165 | "for comp in complaints_dataframe['Consumer complaint narrative']:\n",
166 | " for w in convert_complaint_to_words(comp):\n",
167 | " all_words.append(w)"
168 | ]
169 | },
170 | {
171 | "cell_type": "code",
172 | "execution_count": 7,
173 | "metadata": {
174 | "colab": {
175 | "base_uri": "https://localhost:8080/",
176 | "height": 34
177 | },
178 | "colab_type": "code",
179 | "executionInfo": {
180 | "elapsed": 80284,
181 | "status": "ok",
182 | "timestamp": 1566387158514,
183 | "user": {
184 | "displayName": "dikshant gupta",
185 | "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mBY-VAEDCe_t9LzKk0g7MBc8rY1qZ-QR-XiCIyRYw=s64",
186 | "userId": "01845807612441668603"
187 | },
188 | "user_tz": -330
189 | },
190 | "id": "8T_RNzwy7-SF",
191 | "outputId": "cd6c76a2-42d6-43aa-877f-f399f9799130"
192 | },
193 | "outputs": [
194 | {
195 | "name": "stdout",
196 | "output_type": "stream",
197 | "text": [
198 | "Size of vocabulary: 76908\n"
199 | ]
200 | }
201 | ],
202 | "source": [
203 | "print('Size of vocabulary: {}'.format(len(set(all_words))))"
204 | ]
205 | },
206 | {
207 | "cell_type": "code",
208 | "execution_count": 9,
209 | "metadata": {
210 | "colab": {
211 | "base_uri": "https://localhost:8080/",
212 | "height": 190
213 | },
214 | "colab_type": "code",
215 | "executionInfo": {
216 | "elapsed": 79440,
217 | "status": "ok",
218 | "timestamp": 1566387158515,
219 | "user": {
220 | "displayName": "dikshant gupta",
221 | "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mBY-VAEDCe_t9LzKk0g7MBc8rY1qZ-QR-XiCIyRYw=s64",
222 | "userId": "01845807612441668603"
223 | },
224 | "user_tz": -330
225 | },
226 | "id": "Dbh2Y10y7-SL",
227 | "outputId": "da08bf30-04f9-4ec1-c798-0f22a4c44cb9"
228 | },
229 | "outputs": [
230 | {
231 | "name": "stdout",
232 | "output_type": "stream",
233 | "text": [
234 | "Complaint is \n",
235 | " Without provocation, I received notice that my credit line was being decreased by nearly 100 %. My available credit was reduced from $ XXXX to XXXX ( the rough amount of my available balance ). \n",
236 | "\n",
237 | "When I called to question the change, I was provided a nob-descript response referencing my XXXX report. It was my understanding that under the FCRA I was entitled to a copy of this report, but was refused by Citi and have been given no further explanation. \n",
238 | "\n",
239 | "This is predatory in that it affects my utilization of credit, further subjecting me to increase in APrs, etc and a higher cost of credit without any reason. \n",
240 | "\n",
241 | "Tokens are\n",
242 | " ['without', 'provocation', 'i', 'received', 'notice', 'that', 'my', 'credit', 'line', 'was', 'being', 'decreased', 'by', 'nearly', 'my', 'available', 'credit', 'was', 'reduced', 'from', 'to', 'the', 'rough', 'amount', 'of', 'my', 'available', 'balance', 'when', 'i', 'called', 'to', 'question', 'the', 'change', 'i', 'was', 'provided', 'a', 'nob', 'descript', 'response', 'referencing', 'my', 'report', 'it', 'was', 'my', 'understanding', 'that', 'under', 'the', 'fcra', 'i', 'was', 'entitled', 'to', 'a', 'copy', 'of', 'this', 'report', 'but', 'was', 'refused', 'by', 'citi', 'and', 'have', 'been', 'given', 'no', 'further', 'eplanation', 'this', 'is', 'predatory', 'in', 'that', 'it', 'affects', 'my', 'utilization', 'of', 'credit', 'further', 'subjecting', 'me', 'to', 'increase', 'in', 'aprs', 'etc', 'and', 'a', 'higher', 'cost', 'of', 'credit', 'without', 'any', 'reason']\n"
243 | ]
244 | }
245 | ],
246 | "source": [
247 | "print('Complaint is \\n', complaints_dataframe['Consumer complaint narrative'][10], '\\n')\n",
248 | "print('Tokens are\\n', convert_complaint_to_words(complaints_dataframe['Consumer complaint narrative'][10]))"
249 | ]
250 | },
251 | {
252 | "cell_type": "markdown",
253 | "metadata": {
254 | "colab_type": "text",
255 | "id": "YHi4vCGX7-SU"
256 | },
257 | "source": [
258 | "### Indexing\n"
259 | ]
260 | },
261 | {
262 | "cell_type": "code",
263 | "execution_count": 10,
264 | "metadata": {
265 | "colab": {},
266 | "colab_type": "code",
267 | "id": "-ClRX_y07-SW"
268 | },
269 | "outputs": [],
270 | "source": [
271 | "index_dictionary = dict()\n",
272 | "count = 1\n",
273 | "index_dictionary[''] = 0\n",
274 | "for word in set(all_words):\n",
275 | " index_dictionary[word] = count\n",
276 | " count += 1"
277 | ]
278 | },
279 | {
280 | "cell_type": "markdown",
281 | "metadata": {
282 | "colab_type": "text",
283 | "id": "vv8dIbF47-Sa"
284 | },
285 | "source": [
286 | "### Loading GloVe Word Embeddings"
287 | ]
288 | },
289 | {
290 | "cell_type": "code",
291 | "execution_count": 11,
292 | "metadata": {
293 | "colab": {},
294 | "colab_type": "code",
295 | "id": "u1OJzln_7-Sb"
296 | },
297 | "outputs": [],
298 | "source": [
299 | "embeddings_index = {}\n",
300 | "with open('glove.6B.300d.txt', encoding='utf8') as f:  # the with-block closes the file\n",
301 | "    for line in f:\n",
302 | "        values = line.split()  # a word followed by its 300 vector components\n",
303 | "        word = values[0]\n",
304 | "        coefs = np.asarray(values[1:], dtype='float32')\n",
305 | "        embeddings_index[word] = coefs"
307 | ]
308 | },
309 | {
310 | "cell_type": "markdown",
311 | "metadata": {
312 | "colab_type": "text",
313 | "id": "s--9I9d5msCp"
314 | },
315 | "source": [
316 | "#### Taking average of all word embeddings in a sentence to generate the sentence representation."
317 | ]
318 | },
319 | {
320 | "cell_type": "code",
321 | "execution_count": 13,
322 | "metadata": {
323 | "colab": {},
324 | "colab_type": "code",
325 | "id": "aN13GyDe7-Sd"
326 | },
327 | "outputs": [],
328 | "source": [
329 | "complaints_list = list()\n",
330 | "for comp in complaints_dataframe['Consumer complaint narrative']:\n",
331 | "    sentence = np.zeros(300)\n",
332 | "    count = 0\n",
333 | "    for w in convert_complaint_to_words(comp):\n",
334 | "        try:\n",
335 | "            sentence += embeddings_index[w]\n",
336 | "            count += 1\n",
337 | "        except KeyError:\n",
338 | "            continue  # skip words missing from the GloVe vocabulary\n",
339 | "    complaints_list.append(sentence / count if count > 0 else sentence)  # zero vector if no word was found, avoiding 0/0 = NaN"
340 | ]
341 | },
342 | {
343 | "cell_type": "markdown",
344 | "metadata": {
345 | "colab_type": "text",
346 | "id": "OYZ703hVm5Cg"
347 | },
348 | "source": [
349 | "#### Converting categorical labels to numerical format, followed by one-hot encoding of the numerical labels."
350 | ]
351 | },
352 | {
353 | "cell_type": "code",
354 | "execution_count": 14,
355 | "metadata": {
356 | "colab": {
357 | "base_uri": "https://localhost:8080/",
358 | "height": 204
359 | },
360 | "colab_type": "code",
361 | "executionInfo": {
362 | "elapsed": 1796,
363 | "status": "ok",
364 | "timestamp": 1566387500223,
365 | "user": {
366 | "displayName": "dikshant gupta",
367 | "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mBY-VAEDCe_t9LzKk0g7MBc8rY1qZ-QR-XiCIyRYw=s64",
368 | "userId": "01845807612441668603"
369 | },
370 | "user_tz": -330
371 | },
372 | "id": "EoR79rtZ7-Sr",
373 | "outputId": "704c6e1e-dc8e-4d42-851d-0e3724e60fd6"
374 | },
375 | "outputs": [
376 | {
377 | "data": {
437 | "text/plain": [
438 | " Consumer complaint narrative Product Target\n",
439 | "0 I have outdated information on my credit repor... Credit reporting 5\n",
440 | "1 I purchased a new car on XXXX XXXX. The car de... Consumer Loan 2\n",
441 | "2 An account on my credit report has a mistaken ... Credit reporting 5\n",
442 | "3 This company refuses to provide me verificatio... Debt collection 7\n",
443 | "4 This complaint is in regards to Square Two Fin... Debt collection 7"
444 | ]
445 | },
446 | "execution_count": 14,
447 | "metadata": {},
448 | "output_type": "execute_result"
449 | }
450 | ],
451 | "source": [
452 | "from sklearn import preprocessing\n",
453 | "le = preprocessing.LabelEncoder()\n",
454 | "le.fit(complaints_dataframe['Product'])\n",
455 | "complaints_dataframe['Target'] = le.transform(complaints_dataframe['Product'])\n",
456 | "complaints_dataframe.head()"
457 | ]
458 | },
459 | {
460 | "cell_type": "markdown",
461 | "metadata": {
462 | "colab_type": "text",
463 | "id": "atXHKYN27-S0"
464 | },
465 | "source": [
466 | "### Train/Test Split"
467 | ]
468 | },
469 | {
470 | "cell_type": "code",
471 | "execution_count": 15,
472 | "metadata": {
473 | "colab": {},
474 | "colab_type": "code",
475 | "id": "_RwGbO_L7-S4"
476 | },
477 | "outputs": [],
478 | "source": [
479 | "from sklearn.model_selection import train_test_split\n",
480 | "X_train, X_test, y_train, y_test = train_test_split(np.array(complaints_list), complaints_dataframe.Target.values, \n",
481 | " test_size=0.15, random_state=0)"
482 | ]
483 | },
484 | {
485 | "cell_type": "code",
486 | "execution_count": 16,
487 | "metadata": {
488 | "colab": {
489 | "base_uri": "https://localhost:8080/",
490 | "height": 34
491 | },
492 | "colab_type": "code",
493 | "executionInfo": {
494 | "elapsed": 1623,
495 | "status": "ok",
496 | "timestamp": 1566387605397,
497 | "user": {
498 | "displayName": "dikshant gupta",
499 | "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mBY-VAEDCe_t9LzKk0g7MBc8rY1qZ-QR-XiCIyRYw=s64",
500 | "userId": "01845807612441668603"
501 | },
502 | "user_tz": -330
503 | },
504 | "id": "K7XHJpLc7-S7",
505 | "outputId": "2e39d372-d636-467b-cadc-4da71065e2a6"
506 | },
507 | "outputs": [
508 | {
509 | "name": "stdout",
510 | "output_type": "stream",
511 | "text": [
512 | "(152809, 300)\n"
513 | ]
514 | }
515 | ],
516 | "source": [
517 | "print(X_train.shape)"
518 | ]
519 | },
520 | {
521 | "cell_type": "code",
522 | "execution_count": 18,
523 | "metadata": {
524 | "colab": {
525 | "base_uri": "https://localhost:8080/",
526 | "height": 34
527 | },
528 | "colab_type": "code",
529 | "executionInfo": {
530 | "elapsed": 1388,
531 | "status": "ok",
532 | "timestamp": 1566387619059,
533 | "user": {
534 | "displayName": "dikshant gupta",
535 | "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mBY-VAEDCe_t9LzKk0g7MBc8rY1qZ-QR-XiCIyRYw=s64",
536 | "userId": "01845807612441668603"
537 | },
538 | "user_tz": -330
539 | },
540 | "id": "ob2Az_Oq-h0x",
541 | "outputId": "60606497-6f8c-4205-97bb-2deb2c55dbdd"
542 | },
543 | "outputs": [
544 | {
545 | "name": "stdout",
546 | "output_type": "stream",
547 | "text": [
548 | "(152809,)\n"
549 | ]
550 | }
551 | ],
552 | "source": [
553 | "print(y_train.shape)"
554 | ]
555 | },
556 | {
557 | "cell_type": "markdown",
558 | "metadata": {
559 | "colab_type": "text",
560 | "id": "sw7pp-WinSI5"
561 | },
562 | "source": [
563 | "#### Training and testing the classifier"
564 | ]
565 | },
566 | {
567 | "cell_type": "code",
568 | "execution_count": 19,
569 | "metadata": {
570 | "colab": {
571 | "base_uri": "https://localhost:8080/",
572 | "height": 34
573 | },
574 | "colab_type": "code",
575 | "executionInfo": {
576 | "elapsed": 3057,
577 | "status": "ok",
578 | "timestamp": 1566387636476,
579 | "user": {
580 | "displayName": "dikshant gupta",
581 | "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mBY-VAEDCe_t9LzKk0g7MBc8rY1qZ-QR-XiCIyRYw=s64",
582 | "userId": "01845807612441668603"
583 | },
584 | "user_tz": -330
585 | },
586 | "id": "--kThiN07-S_",
587 | "outputId": "cc3e1cc6-3c65-4836-b096-4271a7dda6b5"
588 | },
589 | "outputs": [
590 | {
591 | "name": "stdout",
592 | "output_type": "stream",
593 | "text": [
594 | "0.4839618793340008\n"
595 | ]
596 | }
597 | ],
598 | "source": [
599 | "from sklearn.naive_bayes import BernoulliNB\n",
600 | "from sklearn.metrics import accuracy_score\n",
601 | "clf = BernoulliNB()\n",
602 | "clf.fit(X_train, y_train)\n",
603 | "pred = clf.predict(X_test)\n",
604 | "print(accuracy_score(y_test, pred))"
605 | ]
606 | },
607 | {
608 | "cell_type": "code",
609 | "execution_count": 20,
610 | "metadata": {},
611 | "outputs": [],
612 | "source": [
613 | "from sklearn.tree import DecisionTreeClassifier"
614 | ]
615 | },
616 | {
617 | "cell_type": "code",
618 | "execution_count": 21,
619 | "metadata": {},
620 | "outputs": [
621 | {
622 | "data": {
623 | "text/plain": [
624 | "DecisionTreeClassifier()"
625 | ]
626 | },
627 | "execution_count": 21,
628 | "metadata": {},
629 | "output_type": "execute_result"
630 | }
631 | ],
632 | "source": [
633 | "dt_classifier = DecisionTreeClassifier() \n",
634 | "dt_classifier.fit(X_train, y_train) "
635 | ]
636 | },
644 | {
645 | "cell_type": "code",
646 | "execution_count": null,
647 | "metadata": {},
648 | "outputs": [],
649 | "source": [
650 | "# Use the decision tree's own predictions (`pred` still holds the BernoulliNB output)\n",
651 | "dt_pred = dt_classifier.predict(X_test)\n",
652 | "print(accuracy_score(y_test, dt_pred))"
660 | }
661 | ],
662 | "metadata": {
663 | "anaconda-cloud": {},
664 | "colab": {
665 | "name": "case_study.ipynb",
666 | "provenance": [],
667 | "toc_visible": true
668 | },
669 | "kernelspec": {
670 | "display_name": "Python 3",
671 | "language": "python",
672 | "name": "python3"
673 | },
674 | "language_info": {
675 | "codemirror_mode": {
676 | "name": "ipython",
677 | "version": 3
678 | },
679 | "file_extension": ".py",
680 | "mimetype": "text/x-python",
681 | "name": "python",
682 | "nbconvert_exporter": "python",
683 | "pygments_lexer": "ipython3",
684 | "version": "3.7.4"
685 | }
686 | },
687 | "nbformat": 4,
688 | "nbformat_minor": 1
689 | }
690 |
--------------------------------------------------------------------------------
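When several classifiers are compared on the same split, each score has to come from that model's own predictions; reusing a shared `pred` variable silently reports the earlier model's accuracy again. The sketch below is a minimal illustration, assuming a synthetic `make_classification` dataset as a stand-in for the 300-dimensional complaint features (the real data is not reproduced here), with the same 15% test split and `random_state=0` as the notebook.

```python
# Minimal sketch: score each classifier on its *own* predictions.
# Assumption: make_classification stands in for the 300-dimensional
# complaint feature matrix used in the notebook.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=300, n_informative=20,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=0)

scores = {}
for name, model in [("BernoulliNB", BernoulliNB()),
                    ("DecisionTree", DecisionTreeClassifier(random_state=0))]:
    model.fit(X_train, y_train)
    model_pred = model.predict(X_test)  # fresh predictions for this model only
    scores[name] = accuracy_score(y_test, model_pred)
    print(f"{name}: {scores[name]:.4f}")
```

Keeping the prediction inside the loop makes it impossible to score one model with another model's output.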
/Chapter 4/Chpater4_NLP1.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "colab_type": "text",
7 | "id": "F5RaUZ0iXBaB"
8 | },
9 | "source": [
10 | "## Complaint Categorization Baseline Model\n",
11 | "\n",
12 | "Fast and efficient handling of complaints on consumer forums is vital to the commerce industry today. This notebook presents a baseline approach to this problem, using consumer complaints about financial products as the dataset.\n",
13 | "\n",
14 | "The tf-idf (term frequency times inverse document frequency) scheme for weighting individual tokens is widely used in information retrieval. One advantage of tf-idf is that it reduces the impact of tokens that occur very frequently and therefore carry little or no information.\n",
15 | "With scikit-learn's default settings, the tf-idf of term t in document d is tf-idf(t, d) = tf(t, d) * idf(t), where tf(t, d) is the number of times t occurs in d and idf(t) = ln[(1 + n) / (1 + df(t))] + 1, with n the total number of documents and df(t) the number of documents that contain t. Each document vector is then normalized to unit Euclidean (L2) length."
16 | ]
17 | },
18 | {
19 | "cell_type": "code",
20 | "execution_count": 1,
21 | "metadata": {
22 | "colab": {},
23 | "colab_type": "code",
24 | "id": "KKU3Av-XXBaD"
25 | },
26 | "outputs": [],
27 | "source": [
28 | "from sklearn.feature_extraction.text import TfidfVectorizer\n",
29 | "from sklearn.model_selection import train_test_split\n",
30 | "\n",
31 | "# Import pandas to load and manipulate the dataset\n",
32 | "import pandas as pd\n",
33 | "\n",
34 | "complaints_df = pd.read_csv('complaints.csv')"
35 | ]
36 | },
37 | {
38 | "cell_type": "markdown",
39 | "metadata": {
40 | "colab_type": "text",
41 | "id": "COwaeZO2XBaG"
42 | },
43 | "source": [
44 | "### Typical Complaint"
45 | ]
46 | },
47 | {
48 | "cell_type": "code",
49 | "execution_count": 2,
50 | "metadata": {
51 | "colab": {},
52 | "colab_type": "code",
53 | "id": "jw_3jqF5XBaH",
54 | "outputId": "c3001684-0236-447d-ff64-18f382d01112"
55 | },
56 | "outputs": [
57 | {
58 | "data": {
59 | "text/plain": [
60 | "\"I purchased a new car on XXXX XXXX. The car dealer called Citizens Bank to get a 10 day payoff on my loan, good till XXXX XXXX. The dealer sent the check the next day. When I balanced my checkbook on XXXX XXXX. I noticed that Citizens bank had taken the automatic payment out of my checking account at XXXX XXXX XXXX Bank. I called Citizens and they stated that they did not close the loan until XXXX XXXX. ( stating that they did not receive the check until XXXX. XXXX. ). I told them that I did not believe that the check took that long to arrive. XXXX told me a check was issued to me for the amount overpaid, they deducted additional interest. Today ( XXXX XXXX, ) I called Citizens Bank again and talked to a supervisor named XXXX, because on XXXX XXXX. I received a letter that the loan had been paid in full ( dated XXXX, XXXX ) but no refund check was included. XXXX stated that they hold any over payment for 10 business days after the loan was satisfied and that my check would be mailed out on Wed. the XX/XX/XXXX.. I questioned her about the delay in posting the dealer payment and she first stated that sometimes it takes 3 or 4 business days to post, then she said they did not receive the check till XXXX XXXX I again told her that I did not believe this and asked where is my money. She then stated that they hold the over payment for 10 business days. I asked her why, and she simply said that is their policy. I asked her if I would receive interest on my money and she stated no. I believe that Citizens bank is deliberately delaying the posting of payment and the return of consumer 's money to make additional interest for the bank. If this is not illegal it should be, it does hurt the consumer and is not ethical. My amount of money lost is minimal but if they are doing this on thousands of car loans a month, then the additional interest earned for them could be staggering. 
I still have another car loan from Citizens Bank and I am afraid when I trade that car in another year I will run into the same problem again.\""
61 | ]
62 | },
63 | "execution_count": 2,
64 | "metadata": {},
65 | "output_type": "execute_result"
66 | }
67 | ],
68 | "source": [
69 | "complaints_df['Consumer complaint narrative'][1]"
70 | ]
71 | },
72 | {
73 | "cell_type": "markdown",
74 | "metadata": {
75 | "colab_type": "text",
76 | "id": "4QXHqmFxXBaJ"
77 | },
78 | "source": [
79 | "### Categories"
80 | ]
81 | },
82 | {
83 | "cell_type": "code",
84 | "execution_count": 3,
85 | "metadata": {
86 | "colab": {},
87 | "colab_type": "code",
88 | "id": "5xIzAd7SXBaK",
89 | "outputId": "a7bc8e90-6365-4c82-c245-6f7d8d422630"
90 | },
91 | "outputs": [
92 | {
93 | "name": "stdout",
94 | "output_type": "stream",
95 | "text": [
96 | "['Credit reporting' 'Consumer Loan' 'Debt collection' 'Mortgage'\n",
97 | " 'Credit card' 'Other financial service' 'Bank account or service'\n",
98 | " 'Student loan' 'Money transfers' 'Payday loan' 'Prepaid card'\n",
99 | " 'Virtual currency'\n",
100 | " 'Credit reporting, credit repair services, or other personal consumer reports'\n",
101 | " 'Credit card or prepaid card' 'Checking or savings account'\n",
102 | " 'Payday loan, title loan, or personal loan'\n",
103 | " 'Money transfer, virtual currency, or money service'\n",
104 | " 'Vehicle loan or lease']\n"
105 | ]
106 | }
107 | ],
108 | "source": [
109 | "print(complaints_df.Product.unique())"
110 | ]
111 | },
112 | {
113 | "cell_type": "markdown",
114 | "metadata": {
115 | "colab_type": "text",
116 | "id": "cwS4qyhGXBaM"
117 | },
118 | "source": [
119 | "### Train-test split\n",
120 | "15% of the data is held out for validation and the remaining 85% is used for training. This yields 152809 training instances and 26967 validation instances."
121 | ]
122 | },
123 | {
124 | "cell_type": "code",
125 | "execution_count": 4,
126 | "metadata": {
127 | "colab": {},
128 | "colab_type": "code",
129 | "id": "RcHsVb4GXBaN",
130 | "outputId": "f3ba9442-057d-467a-f518-e6ff0ddc70b9"
131 | },
132 | "outputs": [
133 | {
134 | "name": "stdout",
135 | "output_type": "stream",
136 | "text": [
137 | "Training utterances: 152809\n",
138 | "Validation utterances: 26967\n"
139 | ]
140 | }
141 | ],
142 | "source": [
143 | "X_train, X_test, y_train, y_test = train_test_split(\n",
144 | " complaints_df['Consumer complaint narrative'].values, complaints_df['Product'].values, \n",
145 | " test_size=0.15, random_state=0)\n",
146 | "print('Training utterances: {}'.format(X_train.shape[0]))\n",
147 | "print('Validation utterances: {}'.format(X_test.shape[0]))"
148 | ]
149 | },
150 | {
151 | "cell_type": "markdown",
152 | "metadata": {
153 | "colab_type": "text",
154 | "id": "AxKJJZn8XBaP"
155 | },
156 | "source": [
157 | "### Calculating tf-idf scores\n",
158 | "The vectorizer is fitted on the training set only, so the vocabulary and the idf weight of every unique token come from training data. Both sets are then transformed into sparse tf-idf matrices with one row per complaint."
159 | ]
160 | },
161 | {
162 | "cell_type": "code",
163 | "execution_count": 5,
164 | "metadata": {
165 | "colab": {},
166 | "colab_type": "code",
167 | "id": "Ut2qdu8HXBaP",
168 | "outputId": "f1a62af5-5b7e-44e6-f26c-6b9cfae12525"
169 | },
170 | "outputs": [
171 | {
172 | "data": {
173 | "text/plain": [
174 | "TfidfVectorizer()"
175 | ]
176 | },
177 | "execution_count": 5,
178 | "metadata": {},
179 | "output_type": "execute_result"
180 | }
181 | ],
182 | "source": [
183 | "vectorizer = TfidfVectorizer()\n",
184 | "vectorizer.fit(X_train)"
185 | ]
186 | },
187 | {
188 | "cell_type": "code",
189 | "execution_count": 6,
190 | "metadata": {
191 | "colab": {},
192 | "colab_type": "code",
193 | "id": "luNy0LXQXBaR",
194 | "outputId": "2ddb2395-44ea-4ba5-f190-e3e154a8c3e9"
195 | },
196 | "outputs": [
197 | {
198 | "data": {
199 | "text/plain": [
200 | "(<152809x76350 sparse matrix of type '<class 'numpy.float64'>'\n",
201 | " \twith 13864799 stored elements in Compressed Sparse Row format>,\n",
202 | " <26967x76350 sparse matrix of type '<class 'numpy.float64'>'\n",
203 | " \twith 2447784 stored elements in Compressed Sparse Row format>)"
204 | ]
205 | },
206 | "execution_count": 6,
207 | "metadata": {},
208 | "output_type": "execute_result"
209 | }
210 | ],
211 | "source": [
212 | "X_train = vectorizer.transform(X_train)\n",
213 | "X_test = vectorizer.transform(X_test)\n",
214 | "X_train, X_test"
215 | ]
216 | },
217 | {
218 | "cell_type": "markdown",
219 | "metadata": {
220 | "colab_type": "text",
221 | "id": "XnG6cE6GXBaT"
222 | },
223 | "source": [
224 | "### Feature Selection"
225 | ]
226 | },
227 | {
228 | "cell_type": "code",
229 | "execution_count": 7,
230 | "metadata": {
231 | "colab": {},
232 | "colab_type": "code",
233 | "id": "9eZdPs2fXBaU",
234 | "outputId": "2db6881f-b2bb-49c5-eb02-f8c2765a2295"
235 | },
236 | "outputs": [
237 | {
238 | "data": {
239 | "text/plain": [
240 | "(<152809x5000 sparse matrix of type '<class 'numpy.float64'>'\n",
241 | " \twith 10780400 stored elements in Compressed Sparse Row format>,\n",
242 | " <26967x5000 sparse matrix of type '<class 'numpy.float64'>'\n",
243 | " \twith 1907878 stored elements in Compressed Sparse Row format>)"
244 | ]
245 | },
246 | "execution_count": 7,
247 | "metadata": {},
248 | "output_type": "execute_result"
249 | }
250 | ],
251 | "source": [
252 | "from sklearn.feature_selection import SelectKBest, chi2\n",
253 | "\n",
254 | "ch2 = SelectKBest(chi2, k=5000)\n",
255 | "X_train = ch2.fit_transform(X_train, y_train)\n",
256 | "X_test = ch2.transform(X_test)\n",
257 | "\n",
258 | "X_train, X_test"
259 | ]
260 | },
261 | {
262 | "cell_type": "markdown",
263 | "metadata": {
264 | "colab_type": "text",
265 | "id": "Qez31NMtXBaW"
266 | },
267 | "source": [
268 | "### Naive Bayes"
269 | ]
270 | },
271 | {
272 | "cell_type": "code",
273 | "execution_count": 8,
274 | "metadata": {
275 | "colab": {},
276 | "colab_type": "code",
277 | "id": "VI4oeUxsXBaW",
278 | "outputId": "1a323811-5f4a-4cc8-a5ee-b0d94eb3f86c"
279 | },
280 | "outputs": [
281 | {
282 | "name": "stdout",
283 | "output_type": "stream",
284 | "text": [
285 | "0.7656024029369229\n"
286 | ]
287 | }
288 | ],
289 | "source": [
290 | "from sklearn.naive_bayes import MultinomialNB\n",
291 | "from sklearn.metrics import accuracy_score\n",
292 | "clf = MultinomialNB()\n",
293 | "clf.fit(X_train, y_train)\n",
294 | "pred = clf.predict(X_test)\n",
295 | "print(accuracy_score(y_test, pred))"
296 | ]
297 | }
309 | ],
310 | "metadata": {
311 | "colab": {
312 | "name": "complaint_classification_case_study.ipynb",
313 | "provenance": []
314 | },
315 | "kernelspec": {
316 | "display_name": "Python 3",
317 | "language": "python",
318 | "name": "python3"
319 | },
320 | "language_info": {
321 | "codemirror_mode": {
322 | "name": "ipython",
323 | "version": 3
324 | },
325 | "file_extension": ".py",
326 | "mimetype": "text/x-python",
327 | "name": "python",
328 | "nbconvert_exporter": "python",
329 | "pygments_lexer": "ipython3",
330 | "version": "3.7.4"
331 | }
332 | },
333 | "nbformat": 4,
334 | "nbformat_minor": 1
335 | }
336 |
--------------------------------------------------------------------------------
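scikit-learn's default tf-idf weighting (`smooth_idf=True`, `norm='l2'`) can be reproduced by hand, which makes the formula concrete. The sketch below assumes a tiny hypothetical three-document corpus in place of `complaints.csv`: it counts raw term frequencies, computes the smoothed idf with the natural logarithm, multiplies, L2-normalizes each row, and compares the result with `TfidfVectorizer`.

```python
# Minimal sketch: recompute scikit-learn's default tf-idf by hand.
# Assumption: the three short "complaints" below are hypothetical toy data.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["late fee charged on my loan",
        "loan payoff check not received",
        "credit report shows wrong late payment"]

counts = CountVectorizer().fit(docs)
tf = counts.transform(docs).toarray()      # tf(t, d): raw term counts
n = tf.shape[0]                            # n: number of documents
df = (tf > 0).sum(axis=0)                  # df(t): documents containing t
idf = np.log((1 + n) / (1 + df)) + 1       # smoothed idf, natural log

manual = tf * idf                          # tf-idf(t, d) = tf(t, d) * idf(t)
manual = manual / np.linalg.norm(manual, axis=1, keepdims=True)  # L2 rows

tfidf = TfidfVectorizer().fit(docs)        # same tokenizer, same vocabulary order
auto = tfidf.transform(docs).toarray()

print(np.allclose(tfidf.idf_, idf))        # True
print(np.allclose(manual, auto))           # True
```

Both vectorizers share the default tokenizer and sort their vocabulary the same way, so the columns line up and the two matrices can be compared element by element.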
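The notebook fits `TfidfVectorizer`, `SelectKBest(chi2, k=5000)` and `MultinomialNB` as separate steps, taking care to fit each one on the training data only. One way to make that discipline automatic is to chain the steps in a scikit-learn `Pipeline`. The sketch below is illustrative only: it assumes a hypothetical six-complaint corpus and uses `k=5` instead of 5000 because the toy vocabulary is small.

```python
# Minimal sketch: tf-idf -> chi-squared feature selection -> multinomial naive Bayes,
# chained so every step is fitted on training data only.
# Assumption: the six complaints and labels below are hypothetical toy data.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB

docs = [
    "credit report error late payment listed",
    "credit report shows account not mine",
    "debt collector calls every day",
    "debt collection agency harassing me",
    "mortgage escrow payment increased",
    "mortgage servicer lost my payment",
]
labels = ["Credit reporting", "Credit reporting",
          "Debt collection", "Debt collection",
          "Mortgage", "Mortgage"]

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),       # learns vocabulary and idf from training docs
    ("chi2", SelectKBest(chi2, k=5)),   # keeps the 5 highest-scoring tokens (5000 in the notebook)
    ("nb", MultinomialNB()),            # accepts non-negative tf-idf features
])
pipe.fit(docs, labels)

new_pred = pipe.predict(["late payment on my credit report",
                         "collector keeps calling about a debt"])
print(new_pred)
```

Because `predict` only calls `transform` on the vectorizer and selector, validation text can never leak into the vocabulary, the idf weights, or the chi-squared feature ranking.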
/Chapter 4/ReadMe:
--------------------------------------------------------------------------------
1 |
2 | This folder contains the code and datasets for Chapter 4 of the book. Two additional datasets used in this chapter are larger than 100 MB, so they have been uploaded to Google Drive; the link is given below. Happy coding.
3 | Dataset link: https://drive.google.com/drive/folders/1W0RyG3_1aadOIA2ZwQEOWkj8dzBzcobg?usp=sharing
4 |
--------------------------------------------------------------------------------
/Chapter 4/bc2.csv:
--------------------------------------------------------------------------------
1 | ID,ClumpThickness,Cell Size,Cell Shape,Marginal Adhesion,Single Epithelial Cell Size,Bare Nuclei,Normal Nucleoli,Bland Chromatin,Mitoses,Class
2 | 1000025,5,1,1,1,2,1,3,1,1,2
3 | 1002945,5,4,4,5,7,10,3,2,1,2
4 | 1015425,3,1,1,1,2,2,3,1,1,2
5 | 1016277,6,8,8,1,3,4,3,7,1,2
6 | 1017023,4,1,1,3,2,1,3,1,1,2
7 | 1017122,8,10,10,8,7,10,9,7,1,4
8 | 1018099,1,1,1,1,2,10,3,1,1,2
9 | 1018561,2,1,2,1,2,1,3,1,1,2
10 | 1033078,2,1,1,1,2,1,1,1,5,2
11 | 1033078,4,2,1,1,2,1,2,1,1,2
12 | 1035283,1,1,1,1,1,1,3,1,1,2
13 | 1036172,2,1,1,1,2,1,2,1,1,2
14 | 1041801,5,3,3,3,2,3,4,4,1,4
15 | 1043999,1,1,1,1,2,3,3,1,1,2
16 | 1044572,8,7,5,10,7,9,5,5,4,4
17 | 1047630,7,4,6,4,6,1,4,3,1,4
18 | 1048672,4,1,1,1,2,1,2,1,1,2
19 | 1049815,4,1,1,1,2,1,3,1,1,2
20 | 1050670,10,7,7,6,4,10,4,1,2,4
21 | 1050718,6,1,1,1,2,1,3,1,1,2
22 | 1054590,7,3,2,10,5,10,5,4,4,4
23 | 1054593,10,5,5,3,6,7,7,10,1,4
24 | 1056784,3,1,1,1,2,1,2,1,1,2
25 | 1057013,8,4,5,1,2,?,7,3,1,4
26 | 1059552,1,1,1,1,2,1,3,1,1,2
27 | 1065726,5,2,3,4,2,7,3,6,1,4
28 | 1066373,3,2,1,1,1,1,2,1,1,2
29 | 1066979,5,1,1,1,2,1,2,1,1,2
30 | 1067444,2,1,1,1,2,1,2,1,1,2
31 | 1070935,1,1,3,1,2,1,1,1,1,2
32 | 1070935,3,1,1,1,1,1,2,1,1,2
33 | 1071760,2,1,1,1,2,1,3,1,1,2
34 | 1072179,10,7,7,3,8,5,7,4,3,4
35 | 1074610,2,1,1,2,2,1,3,1,1,2
36 | 1075123,3,1,2,1,2,1,2,1,1,2
37 | 1079304,2,1,1,1,2,1,2,1,1,2
38 | 1080185,10,10,10,8,6,1,8,9,1,4
39 | 1081791,6,2,1,1,1,1,7,1,1,2
40 | 1084584,5,4,4,9,2,10,5,6,1,4
41 | 1091262,2,5,3,3,6,7,7,5,1,4
42 | 1096800,6,6,6,9,6,?,7,8,1,2
43 | 1099510,10,4,3,1,3,3,6,5,2,4
44 | 1100524,6,10,10,2,8,10,7,3,3,4
45 | 1102573,5,6,5,6,10,1,3,1,1,4
46 | 1103608,10,10,10,4,8,1,8,10,1,4
47 | 1103722,1,1,1,1,2,1,2,1,2,2
48 | 1105257,3,7,7,4,4,9,4,8,1,4
49 | 1105524,1,1,1,1,2,1,2,1,1,2
50 | 1106095,4,1,1,3,2,1,3,1,1,2
51 | 1106829,7,8,7,2,4,8,3,8,2,4
52 | 1108370,9,5,8,1,2,3,2,1,5,4
53 | 1108449,5,3,3,4,2,4,3,4,1,4
54 | 1110102,10,3,6,2,3,5,4,10,2,4
55 | 1110503,5,5,5,8,10,8,7,3,7,4
56 | 1110524,10,5,5,6,8,8,7,1,1,4
57 | 1111249,10,6,6,3,4,5,3,6,1,4
58 | 1112209,8,10,10,1,3,6,3,9,1,4
59 | 1113038,8,2,4,1,5,1,5,4,4,4
60 | 1113483,5,2,3,1,6,10,5,1,1,4
61 | 1113906,9,5,5,2,2,2,5,1,1,4
62 | 1115282,5,3,5,5,3,3,4,10,1,4
63 | 1115293,1,1,1,1,2,2,2,1,1,2
64 | 1116116,9,10,10,1,10,8,3,3,1,4
65 | 1116132,6,3,4,1,5,2,3,9,1,4
66 | 1116192,1,1,1,1,2,1,2,1,1,2
67 | 1116998,10,4,2,1,3,2,4,3,10,4
68 | 1117152,4,1,1,1,2,1,3,1,1,2
69 | 1118039,5,3,4,1,8,10,4,9,1,4
70 | 1120559,8,3,8,3,4,9,8,9,8,4
71 | 1121732,1,1,1,1,2,1,3,2,1,2
72 | 1121919,5,1,3,1,2,1,2,1,1,2
73 | 1123061,6,10,2,8,10,2,7,8,10,4
74 | 1124651,1,3,3,2,2,1,7,2,1,2
75 | 1125035,9,4,5,10,6,10,4,8,1,4
76 | 1126417,10,6,4,1,3,4,3,2,3,4
77 | 1131294,1,1,2,1,2,2,4,2,1,2
78 | 1132347,1,1,4,1,2,1,2,1,1,2
79 | 1133041,5,3,1,2,2,1,2,1,1,2
80 | 1133136,3,1,1,1,2,3,3,1,1,2
81 | 1136142,2,1,1,1,3,1,2,1,1,2
82 | 1137156,2,2,2,1,1,1,7,1,1,2
83 | 1143978,4,1,1,2,2,1,2,1,1,2
84 | 1143978,5,2,1,1,2,1,3,1,1,2
85 | 1147044,3,1,1,1,2,2,7,1,1,2
86 | 1147699,3,5,7,8,8,9,7,10,7,4
87 | 1147748,5,10,6,1,10,4,4,10,10,4
88 | 1148278,3,3,6,4,5,8,4,4,1,4
89 | 1148873,3,6,6,6,5,10,6,8,3,4
90 | 1152331,4,1,1,1,2,1,3,1,1,2
91 | 1155546,2,1,1,2,3,1,2,1,1,2
92 | 1156272,1,1,1,1,2,1,3,1,1,2
93 | 1156948,3,1,1,2,2,1,1,1,1,2
94 | 1157734,4,1,1,1,2,1,3,1,1,2
95 | 1158247,1,1,1,1,2,1,2,1,1,2
96 | 1160476,2,1,1,1,2,1,3,1,1,2
97 | 1164066,1,1,1,1,2,1,3,1,1,2
98 | 1165297,2,1,1,2,2,1,1,1,1,2
99 | 1165790,5,1,1,1,2,1,3,1,1,2
100 | 1165926,9,6,9,2,10,6,2,9,10,4
101 | 1166630,7,5,6,10,5,10,7,9,4,4
102 | 1166654,10,3,5,1,10,5,3,10,2,4
103 | 1167439,2,3,4,4,2,5,2,5,1,4
104 | 1167471,4,1,2,1,2,1,3,1,1,2
105 | 1168359,8,2,3,1,6,3,7,1,1,4
106 | 1168736,10,10,10,10,10,1,8,8,8,4
107 | 1169049,7,3,4,4,3,3,3,2,7,4
108 | 1170419,10,10,10,8,2,10,4,1,1,4
109 | 1170420,1,6,8,10,8,10,5,7,1,4
110 | 1171710,1,1,1,1,2,1,2,3,1,2
111 | 1171710,6,5,4,4,3,9,7,8,3,4
112 | 1171795,1,3,1,2,2,2,5,3,2,2
113 | 1171845,8,6,4,3,5,9,3,1,1,4
114 | 1172152,10,3,3,10,2,10,7,3,3,4
115 | 1173216,10,10,10,3,10,8,8,1,1,4
116 | 1173235,3,3,2,1,2,3,3,1,1,2
117 | 1173347,1,1,1,1,2,5,1,1,1,2
118 | 1173347,8,3,3,1,2,2,3,2,1,2
119 | 1173509,4,5,5,10,4,10,7,5,8,4
120 | 1173514,1,1,1,1,4,3,1,1,1,2
121 | 1173681,3,2,1,1,2,2,3,1,1,2
122 | 1174057,1,1,2,2,2,1,3,1,1,2
123 | 1174057,4,2,1,1,2,2,3,1,1,2
124 | 1174131,10,10,10,2,10,10,5,3,3,4
125 | 1174428,5,3,5,1,8,10,5,3,1,4
126 | 1175937,5,4,6,7,9,7,8,10,1,4
127 | 1176406,1,1,1,1,2,1,2,1,1,2
128 | 1176881,7,5,3,7,4,10,7,5,5,4
129 | 1177027,3,1,1,1,2,1,3,1,1,2
130 | 1177399,8,3,5,4,5,10,1,6,2,4
131 | 1177512,1,1,1,1,10,1,1,1,1,2
132 | 1178580,5,1,3,1,2,1,2,1,1,2
133 | 1179818,2,1,1,1,2,1,3,1,1,2
134 | 1180194,5,10,8,10,8,10,3,6,3,4
135 | 1180523,3,1,1,1,2,1,2,2,1,2
136 | 1180831,3,1,1,1,3,1,2,1,1,2
137 | 1181356,5,1,1,1,2,2,3,3,1,2
138 | 1182404,4,1,1,1,2,1,2,1,1,2
139 | 1182410,3,1,1,1,2,1,1,1,1,2
140 | 1183240,4,1,2,1,2,1,2,1,1,2
141 | 1183246,1,1,1,1,1,?,2,1,1,2
142 | 1183516,3,1,1,1,2,1,1,1,1,2
143 | 1183911,2,1,1,1,2,1,1,1,1,2
144 | 1183983,9,5,5,4,4,5,4,3,3,4
145 | 1184184,1,1,1,1,2,5,1,1,1,2
146 | 1184241,2,1,1,1,2,1,2,1,1,2
147 | 1184840,1,1,3,1,2,?,2,1,1,2
148 | 1185609,3,4,5,2,6,8,4,1,1,4
149 | 1185610,1,1,1,1,3,2,2,1,1,2
150 | 1187457,3,1,1,3,8,1,5,8,1,2
151 | 1187805,8,8,7,4,10,10,7,8,7,4
152 | 1188472,1,1,1,1,1,1,3,1,1,2
153 | 1189266,7,2,4,1,6,10,5,4,3,4
154 | 1189286,10,10,8,6,4,5,8,10,1,4
155 | 1190394,4,1,1,1,2,3,1,1,1,2
156 | 1190485,1,1,1,1,2,1,1,1,1,2
157 | 1192325,5,5,5,6,3,10,3,1,1,4
158 | 1193091,1,2,2,1,2,1,2,1,1,2
159 | 1193210,2,1,1,1,2,1,3,1,1,2
160 | 1193683,1,1,2,1,3,?,1,1,1,2
161 | 1196295,9,9,10,3,6,10,7,10,6,4
162 | 1196915,10,7,7,4,5,10,5,7,2,4
163 | 1197080,4,1,1,1,2,1,3,2,1,2
164 | 1197270,3,1,1,1,2,1,3,1,1,2
165 | 1197440,1,1,1,2,1,3,1,1,7,2
166 | 1197510,5,1,1,1,2,?,3,1,1,2
167 | 1197979,4,1,1,1,2,2,3,2,1,2
168 | 1197993,5,6,7,8,8,10,3,10,3,4
169 | 1198128,10,8,10,10,6,1,3,1,10,4
170 | 1198641,3,1,1,1,2,1,3,1,1,2
171 | 1199219,1,1,1,2,1,1,1,1,1,2
172 | 1199731,3,1,1,1,2,1,1,1,1,2
173 | 1199983,1,1,1,1,2,1,3,1,1,2
174 | 1200772,1,1,1,1,2,1,2,1,1,2
175 | 1200847,6,10,10,10,8,10,10,10,7,4
176 | 1200892,8,6,5,4,3,10,6,1,1,4
177 | 1200952,5,8,7,7,10,10,5,7,1,4
178 | 1201834,2,1,1,1,2,1,3,1,1,2
179 | 1201936,5,10,10,3,8,1,5,10,3,4
180 | 1202125,4,1,1,1,2,1,3,1,1,2
181 | 1202812,5,3,3,3,6,10,3,1,1,4
182 | 1203096,1,1,1,1,1,1,3,1,1,2
183 | 1204242,1,1,1,1,2,1,1,1,1,2
184 | 1204898,6,1,1,1,2,1,3,1,1,2
185 | 1205138,5,8,8,8,5,10,7,8,1,4
186 | 1205579,8,7,6,4,4,10,5,1,1,4
187 | 1206089,2,1,1,1,1,1,3,1,1,2
188 | 1206695,1,5,8,6,5,8,7,10,1,4
189 | 1206841,10,5,6,10,6,10,7,7,10,4
190 | 1207986,5,8,4,10,5,8,9,10,1,4
191 | 1208301,1,2,3,1,2,1,3,1,1,2
192 | 1210963,10,10,10,8,6,8,7,10,1,4
193 | 1211202,7,5,10,10,10,10,4,10,3,4
194 | 1212232,5,1,1,1,2,1,2,1,1,2
195 | 1212251,1,1,1,1,2,1,3,1,1,2
196 | 1212422,3,1,1,1,2,1,3,1,1,2
197 | 1212422,4,1,1,1,2,1,3,1,1,2
198 | 1213375,8,4,4,5,4,7,7,8,2,2
199 | 1213383,5,1,1,4,2,1,3,1,1,2
200 | 1214092,1,1,1,1,2,1,1,1,1,2
201 | 1214556,3,1,1,1,2,1,2,1,1,2
202 | 1214966,9,7,7,5,5,10,7,8,3,4
203 | 1216694,10,8,8,4,10,10,8,1,1,4
204 | 1216947,1,1,1,1,2,1,3,1,1,2
205 | 1217051,5,1,1,1,2,1,3,1,1,2
206 | 1217264,1,1,1,1,2,1,3,1,1,2
207 | 1218105,5,10,10,9,6,10,7,10,5,4
208 | 1218741,10,10,9,3,7,5,3,5,1,4
209 | 1218860,1,1,1,1,1,1,3,1,1,2
210 | 1218860,1,1,1,1,1,1,3,1,1,2
211 | 1219406,5,1,1,1,1,1,3,1,1,2
212 | 1219525,8,10,10,10,5,10,8,10,6,4
213 | 1219859,8,10,8,8,4,8,7,7,1,4
214 | 1220330,1,1,1,1,2,1,3,1,1,2
215 | 1221863,10,10,10,10,7,10,7,10,4,4
216 | 1222047,10,10,10,10,3,10,10,6,1,4
217 | 1222936,8,7,8,7,5,5,5,10,2,4
218 | 1223282,1,1,1,1,2,1,2,1,1,2
219 | 1223426,1,1,1,1,2,1,3,1,1,2
220 | 1223793,6,10,7,7,6,4,8,10,2,4
221 | 1223967,6,1,3,1,2,1,3,1,1,2
222 | 1224329,1,1,1,2,2,1,3,1,1,2
223 | 1225799,10,6,4,3,10,10,9,10,1,4
224 | 1226012,4,1,1,3,1,5,2,1,1,4
225 | 1226612,7,5,6,3,3,8,7,4,1,4
226 | 1227210,10,5,5,6,3,10,7,9,2,4
227 | 1227244,1,1,1,1,2,1,2,1,1,2
228 | 1227481,10,5,7,4,4,10,8,9,1,4
229 | 1228152,8,9,9,5,3,5,7,7,1,4
230 | 1228311,1,1,1,1,1,1,3,1,1,2
231 | 1230175,10,10,10,3,10,10,9,10,1,4
232 | 1230688,7,4,7,4,3,7,7,6,1,4
233 | 1231387,6,8,7,5,6,8,8,9,2,4
234 | 1231706,8,4,6,3,3,1,4,3,1,2
235 | 1232225,10,4,5,5,5,10,4,1,1,4
236 | 1236043,3,3,2,1,3,1,3,6,1,2
237 | 1241232,3,1,4,1,2,?,3,1,1,2
238 | 1241559,10,8,8,2,8,10,4,8,10,4
239 | 1241679,9,8,8,5,6,2,4,10,4,4
240 | 1242364,8,10,10,8,6,9,3,10,10,4
241 | 1243256,10,4,3,2,3,10,5,3,2,4
242 | 1270479,5,1,3,3,2,2,2,3,1,2
243 | 1276091,3,1,1,3,1,1,3,1,1,2
244 | 1277018,2,1,1,1,2,1,3,1,1,2
245 | 128059,1,1,1,1,2,5,5,1,1,2
246 | 1285531,1,1,1,1,2,1,3,1,1,2
247 | 1287775,5,1,1,2,2,2,3,1,1,2
248 | 144888,8,10,10,8,5,10,7,8,1,4
249 | 145447,8,4,4,1,2,9,3,3,1,4
250 | 167528,4,1,1,1,2,1,3,6,1,2
251 | 169356,3,1,1,1,2,?,3,1,1,2
252 | 183913,1,2,2,1,2,1,1,1,1,2
253 | 191250,10,4,4,10,2,10,5,3,3,4
254 | 1017023,6,3,3,5,3,10,3,5,3,2
255 | 1100524,6,10,10,2,8,10,7,3,3,4
256 | 1116116,9,10,10,1,10,8,3,3,1,4
257 | 1168736,5,6,6,2,4,10,3,6,1,4
258 | 1182404,3,1,1,1,2,1,1,1,1,2
259 | 1182404,3,1,1,1,2,1,2,1,1,2
260 | 1198641,3,1,1,1,2,1,3,1,1,2
261 | 242970,5,7,7,1,5,8,3,4,1,2
262 | 255644,10,5,8,10,3,10,5,1,3,4
263 | 263538,5,10,10,6,10,10,10,6,5,4
264 | 274137,8,8,9,4,5,10,7,8,1,4
265 | 303213,10,4,4,10,6,10,5,5,1,4
266 | 314428,7,9,4,10,10,3,5,3,3,4
267 | 1182404,5,1,4,1,2,1,3,2,1,2
268 | 1198641,10,10,6,3,3,10,4,3,2,4
269 | 320675,3,3,5,2,3,10,7,1,1,4
270 | 324427,10,8,8,2,3,4,8,7,8,4
271 | 385103,1,1,1,1,2,1,3,1,1,2
272 | 390840,8,4,7,1,3,10,3,9,2,4
273 | 411453,5,1,1,1,2,1,3,1,1,2
274 | 320675,3,3,5,2,3,10,7,1,1,4
275 | 428903,7,2,4,1,3,4,3,3,1,4
276 | 431495,3,1,1,1,2,1,3,2,1,2
277 | 432809,3,1,3,1,2,?,2,1,1,2
278 | 434518,3,1,1,1,2,1,2,1,1,2
279 | 452264,1,1,1,1,2,1,2,1,1,2
280 | 456282,1,1,1,1,2,1,3,1,1,2
281 | 476903,10,5,7,3,3,7,3,3,8,4
282 | 486283,3,1,1,1,2,1,3,1,1,2
283 | 486662,2,1,1,2,2,1,3,1,1,2
284 | 488173,1,4,3,10,4,10,5,6,1,4
285 | 492268,10,4,6,1,2,10,5,3,1,4
286 | 508234,7,4,5,10,2,10,3,8,2,4
287 | 527363,8,10,10,10,8,10,10,7,3,4
288 | 529329,10,10,10,10,10,10,4,10,10,4
289 | 535331,3,1,1,1,3,1,2,1,1,2
290 | 543558,6,1,3,1,4,5,5,10,1,4
291 | 555977,5,6,6,8,6,10,4,10,4,4
292 | 560680,1,1,1,1,2,1,1,1,1,2
293 | 561477,1,1,1,1,2,1,3,1,1,2
294 | 563649,8,8,8,1,2,?,6,10,1,4
295 | 601265,10,4,4,6,2,10,2,3,1,4
296 | 606140,1,1,1,1,2,?,2,1,1,2
297 | 606722,5,5,7,8,6,10,7,4,1,4
298 | 616240,5,3,4,3,4,5,4,7,1,2
299 | 61634,5,4,3,1,2,?,2,3,1,2
300 | 625201,8,2,1,1,5,1,1,1,1,2
301 | 63375,9,1,2,6,4,10,7,7,2,4
302 | 635844,8,4,10,5,4,4,7,10,1,4
303 | 636130,1,1,1,1,2,1,3,1,1,2
304 | 640744,10,10,10,7,9,10,7,10,10,4
305 | 646904,1,1,1,1,2,1,3,1,1,2
306 | 653777,8,3,4,9,3,10,3,3,1,4
307 | 659642,10,8,4,4,4,10,3,10,4,4
308 | 666090,1,1,1,1,2,1,3,1,1,2
309 | 666942,1,1,1,1,2,1,3,1,1,2
310 | 667204,7,8,7,6,4,3,8,8,4,4
311 | 673637,3,1,1,1,2,5,5,1,1,2
312 | 684955,2,1,1,1,3,1,2,1,1,2
313 | 688033,1,1,1,1,2,1,1,1,1,2
314 | 691628,8,6,4,10,10,1,3,5,1,4
315 | 693702,1,1,1,1,2,1,1,1,1,2
316 | 704097,1,1,1,1,1,1,2,1,1,2
317 | 704168,4,6,5,6,7,?,4,9,1,2
318 | 706426,5,5,5,2,5,10,4,3,1,4
319 | 709287,6,8,7,8,6,8,8,9,1,4
320 | 718641,1,1,1,1,5,1,3,1,1,2
321 | 721482,4,4,4,4,6,5,7,3,1,2
322 | 730881,7,6,3,2,5,10,7,4,6,4
323 | 733639,3,1,1,1,2,?,3,1,1,2
324 | 733639,3,1,1,1,2,1,3,1,1,2
325 | 733823,5,4,6,10,2,10,4,1,1,4
326 | 740492,1,1,1,1,2,1,3,1,1,2
327 | 743348,3,2,2,1,2,1,2,3,1,2
328 | 752904,10,1,1,1,2,10,5,4,1,4
329 | 756136,1,1,1,1,2,1,2,1,1,2
330 | 760001,8,10,3,2,6,4,3,10,1,4
331 | 760239,10,4,6,4,5,10,7,1,1,4
332 | 76389,10,4,7,2,2,8,6,1,1,4
333 | 764974,5,1,1,1,2,1,3,1,2,2
334 | 770066,5,2,2,2,2,1,2,2,1,2
335 | 785208,5,4,6,6,4,10,4,3,1,4
336 | 785615,8,6,7,3,3,10,3,4,2,4
337 | 792744,1,1,1,1,2,1,1,1,1,2
338 | 797327,6,5,5,8,4,10,3,4,1,4
339 | 798429,1,1,1,1,2,1,3,1,1,2
340 | 704097,1,1,1,1,1,1,2,1,1,2
341 | 806423,8,5,5,5,2,10,4,3,1,4
342 | 809912,10,3,3,1,2,10,7,6,1,4
343 | 810104,1,1,1,1,2,1,3,1,1,2
344 | 814265,2,1,1,1,2,1,1,1,1,2
345 | 814911,1,1,1,1,2,1,1,1,1,2
346 | 822829,7,6,4,8,10,10,9,5,3,4
347 | 826923,1,1,1,1,2,1,1,1,1,2
348 | 830690,5,2,2,2,3,1,1,3,1,2
349 | 831268,1,1,1,1,1,1,1,3,1,2
350 | 832226,3,4,4,10,5,1,3,3,1,4
351 | 832567,4,2,3,5,3,8,7,6,1,4
352 | 836433,5,1,1,3,2,1,1,1,1,2
353 | 837082,2,1,1,1,2,1,3,1,1,2
354 | 846832,3,4,5,3,7,3,4,6,1,2
355 | 850831,2,7,10,10,7,10,4,9,4,4
356 | 855524,1,1,1,1,2,1,2,1,1,2
357 | 857774,4,1,1,1,3,1,2,2,1,2
358 | 859164,5,3,3,1,3,3,3,3,3,4
359 | 859350,8,10,10,7,10,10,7,3,8,4
360 | 866325,8,10,5,3,8,4,4,10,3,4
361 | 873549,10,3,5,4,3,7,3,5,3,4
362 | 877291,6,10,10,10,10,10,8,10,10,4
363 | 877943,3,10,3,10,6,10,5,1,4,4
364 | 888169,3,2,2,1,4,3,2,1,1,2
365 | 888523,4,4,4,2,2,3,2,1,1,2
366 | 896404,2,1,1,1,2,1,3,1,1,2
367 | 897172,2,1,1,1,2,1,2,1,1,2
368 | 95719,6,10,10,10,8,10,7,10,7,4
369 | 160296,5,8,8,10,5,10,8,10,3,4
370 | 342245,1,1,3,1,2,1,1,1,1,2
371 | 428598,1,1,3,1,1,1,2,1,1,2
372 | 492561,4,3,2,1,3,1,2,1,1,2
373 | 493452,1,1,3,1,2,1,1,1,1,2
374 | 493452,4,1,2,1,2,1,2,1,1,2
375 | 521441,5,1,1,2,2,1,2,1,1,2
376 | 560680,3,1,2,1,2,1,2,1,1,2
377 | 636437,1,1,1,1,2,1,1,1,1,2
378 | 640712,1,1,1,1,2,1,2,1,1,2
379 | 654244,1,1,1,1,1,1,2,1,1,2
380 | 657753,3,1,1,4,3,1,2,2,1,2
381 | 685977,5,3,4,1,4,1,3,1,1,2
382 | 805448,1,1,1,1,2,1,1,1,1,2
383 | 846423,10,6,3,6,4,10,7,8,4,4
384 | 1002504,3,2,2,2,2,1,3,2,1,2
385 | 1022257,2,1,1,1,2,1,1,1,1,2
386 | 1026122,2,1,1,1,2,1,1,1,1,2
387 | 1071084,3,3,2,2,3,1,1,2,3,2
388 | 1080233,7,6,6,3,2,10,7,1,1,4
389 | 1114570,5,3,3,2,3,1,3,1,1,2
390 | 1114570,2,1,1,1,2,1,2,2,1,2
391 | 1116715,5,1,1,1,3,2,2,2,1,2
392 | 1131411,1,1,1,2,2,1,2,1,1,2
393 | 1151734,10,8,7,4,3,10,7,9,1,4
394 | 1156017,3,1,1,1,2,1,2,1,1,2
395 | 1158247,1,1,1,1,1,1,1,1,1,2
396 | 1158405,1,2,3,1,2,1,2,1,1,2
397 | 1168278,3,1,1,1,2,1,2,1,1,2
398 | 1176187,3,1,1,1,2,1,3,1,1,2
399 | 1196263,4,1,1,1,2,1,1,1,1,2
400 | 1196475,3,2,1,1,2,1,2,2,1,2
401 | 1206314,1,2,3,1,2,1,1,1,1,2
402 | 1211265,3,10,8,7,6,9,9,3,8,4
403 | 1213784,3,1,1,1,2,1,1,1,1,2
404 | 1223003,5,3,3,1,2,1,2,1,1,2
405 | 1223306,3,1,1,1,2,4,1,1,1,2
406 | 1223543,1,2,1,3,2,1,1,2,1,2
407 | 1229929,1,1,1,1,2,1,2,1,1,2
408 | 1231853,4,2,2,1,2,1,2,1,1,2
409 | 1234554,1,1,1,1,2,1,2,1,1,2
410 | 1236837,2,3,2,2,2,2,3,1,1,2
411 | 1237674,3,1,2,1,2,1,2,1,1,2
412 | 1238021,1,1,1,1,2,1,2,1,1,2
413 | 1238464,1,1,1,1,1,?,2,1,1,2
414 | 1238633,10,10,10,6,8,4,8,5,1,4
415 | 1238915,5,1,2,1,2,1,3,1,1,2
416 | 1238948,8,5,6,2,3,10,6,6,1,4
417 | 1239232,3,3,2,6,3,3,3,5,1,2
418 | 1239347,8,7,8,5,10,10,7,2,1,4
419 | 1239967,1,1,1,1,2,1,2,1,1,2
420 | 1240337,5,2,2,2,2,2,3,2,2,2
421 | 1253505,2,3,1,1,5,1,1,1,1,2
422 | 1255384,3,2,2,3,2,3,3,1,1,2
423 | 1257200,10,10,10,7,10,10,8,2,1,4
424 | 1257648,4,3,3,1,2,1,3,3,1,2
425 | 1257815,5,1,3,1,2,1,2,1,1,2
426 | 1257938,3,1,1,1,2,1,1,1,1,2
427 | 1258549,9,10,10,10,10,10,10,10,1,4
428 | 1258556,5,3,6,1,2,1,1,1,1,2
429 | 1266154,8,7,8,2,4,2,5,10,1,4
430 | 1272039,1,1,1,1,2,1,2,1,1,2
431 | 1276091,2,1,1,1,2,1,2,1,1,2
432 | 1276091,1,3,1,1,2,1,2,2,1,2
433 | 1276091,5,1,1,3,4,1,3,2,1,2
434 | 1277629,5,1,1,1,2,1,2,2,1,2
435 | 1293439,3,2,2,3,2,1,1,1,1,2
436 | 1293439,6,9,7,5,5,8,4,2,1,2
437 | 1294562,10,8,10,1,3,10,5,1,1,4
438 | 1295186,10,10,10,1,6,1,2,8,1,4
439 | 527337,4,1,1,1,2,1,1,1,1,2
440 | 558538,4,1,3,3,2,1,1,1,1,2
441 | 566509,5,1,1,1,2,1,1,1,1,2
442 | 608157,10,4,3,10,4,10,10,1,1,4
443 | 677910,5,2,2,4,2,4,1,1,1,2
444 | 734111,1,1,1,3,2,3,1,1,1,2
445 | 734111,1,1,1,1,2,2,1,1,1,2
446 | 780555,5,1,1,6,3,1,2,1,1,2
447 | 827627,2,1,1,1,2,1,1,1,1,2
448 | 1049837,1,1,1,1,2,1,1,1,1,2
449 | 1058849,5,1,1,1,2,1,1,1,1,2
450 | 1182404,1,1,1,1,1,1,1,1,1,2
451 | 1193544,5,7,9,8,6,10,8,10,1,4
452 | 1201870,4,1,1,3,1,1,2,1,1,2
453 | 1202253,5,1,1,1,2,1,1,1,1,2
454 | 1227081,3,1,1,3,2,1,1,1,1,2
455 | 1230994,4,5,5,8,6,10,10,7,1,4
456 | 1238410,2,3,1,1,3,1,1,1,1,2
457 | 1246562,10,2,2,1,2,6,1,1,2,4
458 | 1257470,10,6,5,8,5,10,8,6,1,4
459 | 1259008,8,8,9,6,6,3,10,10,1,4
460 | 1266124,5,1,2,1,2,1,1,1,1,2
461 | 1267898,5,1,3,1,2,1,1,1,1,2
462 | 1268313,5,1,1,3,2,1,1,1,1,2
463 | 1268804,3,1,1,1,2,5,1,1,1,2
464 | 1276091,6,1,1,3,2,1,1,1,1,2
465 | 1280258,4,1,1,1,2,1,1,2,1,2
466 | 1293966,4,1,1,1,2,1,1,1,1,2
467 | 1296572,10,9,8,7,6,4,7,10,3,4
468 | 1298416,10,6,6,2,4,10,9,7,1,4
469 | 1299596,6,6,6,5,4,10,7,6,2,4
470 | 1105524,4,1,1,1,2,1,1,1,1,2
471 | 1181685,1,1,2,1,2,1,2,1,1,2
472 | 1211594,3,1,1,1,1,1,2,1,1,2
473 | 1238777,6,1,1,3,2,1,1,1,1,2
474 | 1257608,6,1,1,1,1,1,1,1,1,2
475 | 1269574,4,1,1,1,2,1,1,1,1,2
476 | 1277145,5,1,1,1,2,1,1,1,1,2
477 | 1287282,3,1,1,1,2,1,1,1,1,2
478 | 1296025,4,1,2,1,2,1,1,1,1,2
479 | 1296263,4,1,1,1,2,1,1,1,1,2
480 | 1296593,5,2,1,1,2,1,1,1,1,2
481 | 1299161,4,8,7,10,4,10,7,5,1,4
482 | 1301945,5,1,1,1,1,1,1,1,1,2
483 | 1302428,5,3,2,4,2,1,1,1,1,2
484 | 1318169,9,10,10,10,10,5,10,10,10,4
485 | 474162,8,7,8,5,5,10,9,10,1,4
486 | 787451,5,1,2,1,2,1,1,1,1,2
487 | 1002025,1,1,1,3,1,3,1,1,1,2
488 | 1070522,3,1,1,1,1,1,2,1,1,2
489 | 1073960,10,10,10,10,6,10,8,1,5,4
490 | 1076352,3,6,4,10,3,3,3,4,1,4
491 | 1084139,6,3,2,1,3,4,4,1,1,4
492 | 1115293,1,1,1,1,2,1,1,1,1,2
493 | 1119189,5,8,9,4,3,10,7,1,1,4
494 | 1133991,4,1,1,1,1,1,2,1,1,2
495 | 1142706,5,10,10,10,6,10,6,5,2,4
496 | 1155967,5,1,2,10,4,5,2,1,1,2
497 | 1170945,3,1,1,1,1,1,2,1,1,2
498 | 1181567,1,1,1,1,1,1,1,1,1,2
499 | 1182404,4,2,1,1,2,1,1,1,1,2
500 | 1204558,4,1,1,1,2,1,2,1,1,2
501 | 1217952,4,1,1,1,2,1,2,1,1,2
502 | 1224565,6,1,1,1,2,1,3,1,1,2
503 | 1238186,4,1,1,1,2,1,2,1,1,2
504 | 1253917,4,1,1,2,2,1,2,1,1,2
505 | 1265899,4,1,1,1,2,1,3,1,1,2
506 | 1268766,1,1,1,1,2,1,1,1,1,2
507 | 1277268,3,3,1,1,2,1,1,1,1,2
508 | 1286943,8,10,10,10,7,5,4,8,7,4
509 | 1295508,1,1,1,1,2,4,1,1,1,2
510 | 1297327,5,1,1,1,2,1,1,1,1,2
511 | 1297522,2,1,1,1,2,1,1,1,1,2
512 | 1298360,1,1,1,1,2,1,1,1,1,2
513 | 1299924,5,1,1,1,2,1,2,1,1,2
514 | 1299994,5,1,1,1,2,1,1,1,1,2
515 | 1304595,3,1,1,1,1,1,2,1,1,2
516 | 1306282,6,6,7,10,3,10,8,10,2,4
517 | 1313325,4,10,4,7,3,10,9,10,1,4
518 | 1320077,1,1,1,1,1,1,1,1,1,2
519 | 1320077,1,1,1,1,1,1,2,1,1,2
520 | 1320304,3,1,2,2,2,1,1,1,1,2
521 | 1330439,4,7,8,3,4,10,9,1,1,4
522 | 333093,1,1,1,1,3,1,1,1,1,2
523 | 369565,4,1,1,1,3,1,1,1,1,2
524 | 412300,10,4,5,4,3,5,7,3,1,4
525 | 672113,7,5,6,10,4,10,5,3,1,4
526 | 749653,3,1,1,1,2,1,2,1,1,2
527 | 769612,3,1,1,2,2,1,1,1,1,2
528 | 769612,4,1,1,1,2,1,1,1,1,2
529 | 798429,4,1,1,1,2,1,3,1,1,2
530 | 807657,6,1,3,2,2,1,1,1,1,2
531 | 8233704,4,1,1,1,1,1,2,1,1,2
532 | 837480,7,4,4,3,4,10,6,9,1,4
533 | 867392,4,2,2,1,2,1,2,1,1,2
534 | 869828,1,1,1,1,1,1,3,1,1,2
535 | 1043068,3,1,1,1,2,1,2,1,1,2
536 | 1056171,2,1,1,1,2,1,2,1,1,2
537 | 1061990,1,1,3,2,2,1,3,1,1,2
538 | 1113061,5,1,1,1,2,1,3,1,1,2
539 | 1116192,5,1,2,1,2,1,3,1,1,2
540 | 1135090,4,1,1,1,2,1,2,1,1,2
541 | 1145420,6,1,1,1,2,1,2,1,1,2
542 | 1158157,5,1,1,1,2,2,2,1,1,2
543 | 1171578,3,1,1,1,2,1,1,1,1,2
544 | 1174841,5,3,1,1,2,1,1,1,1,2
545 | 1184586,4,1,1,1,2,1,2,1,1,2
546 | 1186936,2,1,3,2,2,1,2,1,1,2
547 | 1197527,5,1,1,1,2,1,2,1,1,2
548 | 1222464,6,10,10,10,4,10,7,10,1,4
549 | 1240603,2,1,1,1,1,1,1,1,1,2
550 | 1240603,3,1,1,1,1,1,1,1,1,2
551 | 1241035,7,8,3,7,4,5,7,8,2,4
552 | 1287971,3,1,1,1,2,1,2,1,1,2
553 | 1289391,1,1,1,1,2,1,3,1,1,2
554 | 1299924,3,2,2,2,2,1,4,2,1,2
555 | 1306339,4,4,2,1,2,5,2,1,2,2
556 | 1313658,3,1,1,1,2,1,1,1,1,2
557 | 1313982,4,3,1,1,2,1,4,8,1,2
558 | 1321264,5,2,2,2,1,1,2,1,1,2
559 | 1321321,5,1,1,3,2,1,1,1,1,2
560 | 1321348,2,1,1,1,2,1,2,1,1,2
561 | 1321931,5,1,1,1,2,1,2,1,1,2
562 | 1321942,5,1,1,1,2,1,3,1,1,2
563 | 1321942,5,1,1,1,2,1,3,1,1,2
564 | 1328331,1,1,1,1,2,1,3,1,1,2
565 | 1328755,3,1,1,1,2,1,2,1,1,2
566 | 1331405,4,1,1,1,2,1,3,2,1,2
567 | 1331412,5,7,10,10,5,10,10,10,1,4
568 | 1333104,3,1,2,1,2,1,3,1,1,2
569 | 1334071,4,1,1,1,2,3,2,1,1,2
570 | 1343068,8,4,4,1,6,10,2,5,2,4
571 | 1343374,10,10,8,10,6,5,10,3,1,4
572 | 1344121,8,10,4,4,8,10,8,2,1,4
573 | 142932,7,6,10,5,3,10,9,10,2,4
574 | 183936,3,1,1,1,2,1,2,1,1,2
575 | 324382,1,1,1,1,2,1,2,1,1,2
576 | 378275,10,9,7,3,4,2,7,7,1,4
577 | 385103,5,1,2,1,2,1,3,1,1,2
578 | 690557,5,1,1,1,2,1,2,1,1,2
579 | 695091,1,1,1,1,2,1,2,1,1,2
580 | 695219,1,1,1,1,2,1,2,1,1,2
581 | 824249,1,1,1,1,2,1,3,1,1,2
582 | 871549,5,1,2,1,2,1,2,1,1,2
583 | 878358,5,7,10,6,5,10,7,5,1,4
584 | 1107684,6,10,5,5,4,10,6,10,1,4
585 | 1115762,3,1,1,1,2,1,1,1,1,2
586 | 1217717,5,1,1,6,3,1,1,1,1,2
587 | 1239420,1,1,1,1,2,1,1,1,1,2
588 | 1254538,8,10,10,10,6,10,10,10,1,4
589 | 1261751,5,1,1,1,2,1,2,2,1,2
590 | 1268275,9,8,8,9,6,3,4,1,1,4
591 | 1272166,5,1,1,1,2,1,1,1,1,2
592 | 1294261,4,10,8,5,4,1,10,1,1,4
593 | 1295529,2,5,7,6,4,10,7,6,1,4
594 | 1298484,10,3,4,5,3,10,4,1,1,4
595 | 1311875,5,1,2,1,2,1,1,1,1,2
596 | 1315506,4,8,6,3,4,10,7,1,1,4
597 | 1320141,5,1,1,1,2,1,2,1,1,2
598 | 1325309,4,1,2,1,2,1,2,1,1,2
599 | 1333063,5,1,3,1,2,1,3,1,1,2
600 | 1333495,3,1,1,1,2,1,2,1,1,2
601 | 1334659,5,2,4,1,1,1,1,1,1,2
602 | 1336798,3,1,1,1,2,1,2,1,1,2
603 | 1344449,1,1,1,1,1,1,2,1,1,2
604 | 1350568,4,1,1,1,2,1,2,1,1,2
605 | 1352663,5,4,6,8,4,1,8,10,1,4
606 | 188336,5,3,2,8,5,10,8,1,2,4
607 | 352431,10,5,10,3,5,8,7,8,3,4
608 | 353098,4,1,1,2,2,1,1,1,1,2
609 | 411453,1,1,1,1,2,1,1,1,1,2
610 | 557583,5,10,10,10,10,10,10,1,1,4
611 | 636375,5,1,1,1,2,1,1,1,1,2
612 | 736150,10,4,3,10,3,10,7,1,2,4
613 | 803531,5,10,10,10,5,2,8,5,1,4
614 | 822829,8,10,10,10,6,10,10,10,10,4
615 | 1016634,2,3,1,1,2,1,2,1,1,2
616 | 1031608,2,1,1,1,1,1,2,1,1,2
617 | 1041043,4,1,3,1,2,1,2,1,1,2
618 | 1042252,3,1,1,1,2,1,2,1,1,2
619 | 1057067,1,1,1,1,1,?,1,1,1,2
620 | 1061990,4,1,1,1,2,1,2,1,1,2
621 | 1073836,5,1,1,1,2,1,2,1,1,2
622 | 1083817,3,1,1,1,2,1,2,1,1,2
623 | 1096352,6,3,3,3,3,2,6,1,1,2
624 | 1140597,7,1,2,3,2,1,2,1,1,2
625 | 1149548,1,1,1,1,2,1,1,1,1,2
626 | 1174009,5,1,1,2,1,1,2,1,1,2
627 | 1183596,3,1,3,1,3,4,1,1,1,2
628 | 1190386,4,6,6,5,7,6,7,7,3,4
629 | 1190546,2,1,1,1,2,5,1,1,1,2
630 | 1213273,2,1,1,1,2,1,1,1,1,2
631 | 1218982,4,1,1,1,2,1,1,1,1,2
632 | 1225382,6,2,3,1,2,1,1,1,1,2
633 | 1235807,5,1,1,1,2,1,2,1,1,2
634 | 1238777,1,1,1,1,2,1,1,1,1,2
635 | 1253955,8,7,4,4,5,3,5,10,1,4
636 | 1257366,3,1,1,1,2,1,1,1,1,2
637 | 1260659,3,1,4,1,2,1,1,1,1,2
638 | 1268952,10,10,7,8,7,1,10,10,3,4
639 | 1275807,4,2,4,3,2,2,2,1,1,2
640 | 1277792,4,1,1,1,2,1,1,1,1,2
641 | 1277792,5,1,1,3,2,1,1,1,1,2
642 | 1285722,4,1,1,3,2,1,1,1,1,2
643 | 1288608,3,1,1,1,2,1,2,1,1,2
644 | 1290203,3,1,1,1,2,1,2,1,1,2
645 | 1294413,1,1,1,1,2,1,1,1,1,2
646 | 1299596,2,1,1,1,2,1,1,1,1,2
647 | 1303489,3,1,1,1,2,1,2,1,1,2
648 | 1311033,1,2,2,1,2,1,1,1,1,2
649 | 1311108,1,1,1,3,2,1,1,1,1,2
650 | 1315807,5,10,10,10,10,2,10,10,10,4
651 | 1318671,3,1,1,1,2,1,2,1,1,2
652 | 1319609,3,1,1,2,3,4,1,1,1,2
653 | 1323477,1,2,1,3,2,1,2,1,1,2
654 | 1324572,5,1,1,1,2,1,2,2,1,2
655 | 1324681,4,1,1,1,2,1,2,1,1,2
656 | 1325159,3,1,1,1,2,1,3,1,1,2
657 | 1326892,3,1,1,1,2,1,2,1,1,2
658 | 1330361,5,1,1,1,2,1,2,1,1,2
659 | 1333877,5,4,5,1,8,1,3,6,1,2
660 | 1334015,7,8,8,7,3,10,7,2,3,4
661 | 1334667,1,1,1,1,2,1,1,1,1,2
662 | 1339781,1,1,1,1,2,1,2,1,1,2
663 | 1339781,4,1,1,1,2,1,3,1,1,2
664 | 13454352,1,1,3,1,2,1,2,1,1,2
665 | 1345452,1,1,3,1,2,1,2,1,1,2
666 | 1345593,3,1,1,3,2,1,2,1,1,2
667 | 1347749,1,1,1,1,2,1,1,1,1,2
668 | 1347943,5,2,2,2,2,1,1,1,2,2
669 | 1348851,3,1,1,1,2,1,3,1,1,2
670 | 1350319,5,7,4,1,6,1,7,10,3,4
671 | 1350423,5,10,10,8,5,5,7,10,1,4
672 | 1352848,3,10,7,8,5,8,7,4,1,4
673 | 1353092,3,2,1,2,2,1,3,1,1,2
674 | 1354840,2,1,1,1,2,1,3,1,1,2
675 | 1354840,5,3,2,1,3,1,1,1,1,2
676 | 1355260,1,1,1,1,2,1,2,1,1,2
677 | 1365075,4,1,4,1,2,1,1,1,1,2
678 | 1365328,1,1,2,1,2,1,2,1,1,2
679 | 1368267,5,1,1,1,2,1,1,1,1,2
680 | 1368273,1,1,1,1,2,1,1,1,1,2
681 | 1368882,2,1,1,1,2,1,1,1,1,2
682 | 1369821,10,10,10,10,5,10,10,10,7,4
683 | 1371026,5,10,10,10,4,10,5,6,3,4
684 | 1371920,5,1,1,1,2,1,3,2,1,2
685 | 466906,1,1,1,1,2,1,1,1,1,2
686 | 466906,1,1,1,1,2,1,1,1,1,2
687 | 534555,1,1,1,1,2,1,1,1,1,2
688 | 536708,1,1,1,1,2,1,1,1,1,2
689 | 566346,3,1,1,1,2,1,2,3,1,2
690 | 603148,4,1,1,1,2,1,1,1,1,2
691 | 654546,1,1,1,1,2,1,1,1,8,2
692 | 654546,1,1,1,3,2,1,1,1,1,2
693 | 695091,5,10,10,5,4,5,4,4,1,4
694 | 714039,3,1,1,1,2,1,1,1,1,2
695 | 763235,3,1,1,1,2,1,2,1,2,2
696 | 776715,3,1,1,1,3,2,1,1,1,2
697 | 841769,2,1,1,1,2,1,1,1,1,2
698 | 888820,5,10,10,3,7,3,8,10,2,4
699 | 897471,4,8,6,4,3,4,10,6,1,4
700 | 897471,4,8,8,5,4,5,10,4,1,4
--------------------------------------------------------------------------------
/Chapter 4/pima-indians-diabetes.csv:
--------------------------------------------------------------------------------
1 | Preg,Plas,Pres,skin,test,mass,pedi,age,class
2 | 6,148,72,35,0,33.6,0.627,50,1
3 | 1,85,66,29,0,26.6,0.351,31,0
4 | 8,183,64,0,0,23.3,0.672,32,1
5 | 1,89,66,23,94,28.1,0.167,21,0
6 | 0,137,40,35,168,43.1,2.288,33,1
7 | 5,116,74,0,0,25.6,0.201,30,0
8 | 3,78,50,32,88,31,0.248,26,1
9 | 10,115,0,0,0,35.3,0.134,29,0
10 | 2,197,70,45,543,30.5,0.158,53,1
11 | 8,125,96,0,0,0,0.232,54,1
12 | 4,110,92,0,0,37.6,0.191,30,0
13 | 10,168,74,0,0,38,0.537,34,1
14 | 10,139,80,0,0,27.1,1.441,57,0
15 | 1,189,60,23,846,30.1,0.398,59,1
16 | 5,166,72,19,175,25.8,0.587,51,1
17 | 7,100,0,0,0,30,0.484,32,1
18 | 0,118,84,47,230,45.8,0.551,31,1
19 | 7,107,74,0,0,29.6,0.254,31,1
20 | 1,103,30,38,83,43.3,0.183,33,0
21 | 1,115,70,30,96,34.6,0.529,32,1
22 | 3,126,88,41,235,39.3,0.704,27,0
23 | 8,99,84,0,0,35.4,0.388,50,0
24 | 7,196,90,0,0,39.8,0.451,41,1
25 | 9,119,80,35,0,29,0.263,29,1
26 | 11,143,94,33,146,36.6,0.254,51,1
27 | 10,125,70,26,115,31.1,0.205,41,1
28 | 7,147,76,0,0,39.4,0.257,43,1
29 | 1,97,66,15,140,23.2,0.487,22,0
30 | 13,145,82,19,110,22.2,0.245,57,0
31 | 5,117,92,0,0,34.1,0.337,38,0
32 | 5,109,75,26,0,36,0.546,60,0
33 | 3,158,76,36,245,31.6,0.851,28,1
34 | 3,88,58,11,54,24.8,0.267,22,0
35 | 6,92,92,0,0,19.9,0.188,28,0
36 | 10,122,78,31,0,27.6,0.512,45,0
37 | 4,103,60,33,192,24,0.966,33,0
38 | 11,138,76,0,0,33.2,0.42,35,0
39 | 9,102,76,37,0,32.9,0.665,46,1
40 | 2,90,68,42,0,38.2,0.503,27,1
41 | 4,111,72,47,207,37.1,1.39,56,1
42 | 3,180,64,25,70,34,0.271,26,0
43 | 7,133,84,0,0,40.2,0.696,37,0
44 | 7,106,92,18,0,22.7,0.235,48,0
45 | 9,171,110,24,240,45.4,0.721,54,1
46 | 7,159,64,0,0,27.4,0.294,40,0
47 | 0,180,66,39,0,42,1.893,25,1
48 | 1,146,56,0,0,29.7,0.564,29,0
49 | 2,71,70,27,0,28,0.586,22,0
50 | 7,103,66,32,0,39.1,0.344,31,1
51 | 7,105,0,0,0,0,0.305,24,0
52 | 1,103,80,11,82,19.4,0.491,22,0
53 | 1,101,50,15,36,24.2,0.526,26,0
54 | 5,88,66,21,23,24.4,0.342,30,0
55 | 8,176,90,34,300,33.7,0.467,58,1
56 | 7,150,66,42,342,34.7,0.718,42,0
57 | 1,73,50,10,0,23,0.248,21,0
58 | 7,187,68,39,304,37.7,0.254,41,1
59 | 0,100,88,60,110,46.8,0.962,31,0
60 | 0,146,82,0,0,40.5,1.781,44,0
61 | 0,105,64,41,142,41.5,0.173,22,0
62 | 2,84,0,0,0,0,0.304,21,0
63 | 8,133,72,0,0,32.9,0.27,39,1
64 | 5,44,62,0,0,25,0.587,36,0
65 | 2,141,58,34,128,25.4,0.699,24,0
66 | 7,114,66,0,0,32.8,0.258,42,1
67 | 5,99,74,27,0,29,0.203,32,0
68 | 0,109,88,30,0,32.5,0.855,38,1
69 | 2,109,92,0,0,42.7,0.845,54,0
70 | 1,95,66,13,38,19.6,0.334,25,0
71 | 4,146,85,27,100,28.9,0.189,27,0
72 | 2,100,66,20,90,32.9,0.867,28,1
73 | 5,139,64,35,140,28.6,0.411,26,0
74 | 13,126,90,0,0,43.4,0.583,42,1
75 | 4,129,86,20,270,35.1,0.231,23,0
76 | 1,79,75,30,0,32,0.396,22,0
77 | 1,0,48,20,0,24.7,0.14,22,0
78 | 7,62,78,0,0,32.6,0.391,41,0
79 | 5,95,72,33,0,37.7,0.37,27,0
80 | 0,131,0,0,0,43.2,0.27,26,1
81 | 2,112,66,22,0,25,0.307,24,0
82 | 3,113,44,13,0,22.4,0.14,22,0
83 | 2,74,0,0,0,0,0.102,22,0
84 | 7,83,78,26,71,29.3,0.767,36,0
85 | 0,101,65,28,0,24.6,0.237,22,0
86 | 5,137,108,0,0,48.8,0.227,37,1
87 | 2,110,74,29,125,32.4,0.698,27,0
88 | 13,106,72,54,0,36.6,0.178,45,0
89 | 2,100,68,25,71,38.5,0.324,26,0
90 | 15,136,70,32,110,37.1,0.153,43,1
91 | 1,107,68,19,0,26.5,0.165,24,0
92 | 1,80,55,0,0,19.1,0.258,21,0
93 | 4,123,80,15,176,32,0.443,34,0
94 | 7,81,78,40,48,46.7,0.261,42,0
95 | 4,134,72,0,0,23.8,0.277,60,1
96 | 2,142,82,18,64,24.7,0.761,21,0
97 | 6,144,72,27,228,33.9,0.255,40,0
98 | 2,92,62,28,0,31.6,0.13,24,0
99 | 1,71,48,18,76,20.4,0.323,22,0
100 | 6,93,50,30,64,28.7,0.356,23,0
101 | 1,122,90,51,220,49.7,0.325,31,1
102 | 1,163,72,0,0,39,1.222,33,1
103 | 1,151,60,0,0,26.1,0.179,22,0
104 | 0,125,96,0,0,22.5,0.262,21,0
105 | 1,81,72,18,40,26.6,0.283,24,0
106 | 2,85,65,0,0,39.6,0.93,27,0
107 | 1,126,56,29,152,28.7,0.801,21,0
108 | 1,96,122,0,0,22.4,0.207,27,0
109 | 4,144,58,28,140,29.5,0.287,37,0
110 | 3,83,58,31,18,34.3,0.336,25,0
111 | 0,95,85,25,36,37.4,0.247,24,1
112 | 3,171,72,33,135,33.3,0.199,24,1
113 | 8,155,62,26,495,34,0.543,46,1
114 | 1,89,76,34,37,31.2,0.192,23,0
115 | 4,76,62,0,0,34,0.391,25,0
116 | 7,160,54,32,175,30.5,0.588,39,1
117 | 4,146,92,0,0,31.2,0.539,61,1
118 | 5,124,74,0,0,34,0.22,38,1
119 | 5,78,48,0,0,33.7,0.654,25,0
120 | 4,97,60,23,0,28.2,0.443,22,0
121 | 4,99,76,15,51,23.2,0.223,21,0
122 | 0,162,76,56,100,53.2,0.759,25,1
123 | 6,111,64,39,0,34.2,0.26,24,0
124 | 2,107,74,30,100,33.6,0.404,23,0
125 | 5,132,80,0,0,26.8,0.186,69,0
126 | 0,113,76,0,0,33.3,0.278,23,1
127 | 1,88,30,42,99,55,0.496,26,1
128 | 3,120,70,30,135,42.9,0.452,30,0
129 | 1,118,58,36,94,33.3,0.261,23,0
130 | 1,117,88,24,145,34.5,0.403,40,1
131 | 0,105,84,0,0,27.9,0.741,62,1
132 | 4,173,70,14,168,29.7,0.361,33,1
133 | 9,122,56,0,0,33.3,1.114,33,1
134 | 3,170,64,37,225,34.5,0.356,30,1
135 | 8,84,74,31,0,38.3,0.457,39,0
136 | 2,96,68,13,49,21.1,0.647,26,0
137 | 2,125,60,20,140,33.8,0.088,31,0
138 | 0,100,70,26,50,30.8,0.597,21,0
139 | 0,93,60,25,92,28.7,0.532,22,0
140 | 0,129,80,0,0,31.2,0.703,29,0
141 | 5,105,72,29,325,36.9,0.159,28,0
142 | 3,128,78,0,0,21.1,0.268,55,0
143 | 5,106,82,30,0,39.5,0.286,38,0
144 | 2,108,52,26,63,32.5,0.318,22,0
145 | 10,108,66,0,0,32.4,0.272,42,1
146 | 4,154,62,31,284,32.8,0.237,23,0
147 | 0,102,75,23,0,0,0.572,21,0
148 | 9,57,80,37,0,32.8,0.096,41,0
149 | 2,106,64,35,119,30.5,1.4,34,0
150 | 5,147,78,0,0,33.7,0.218,65,0
151 | 2,90,70,17,0,27.3,0.085,22,0
152 | 1,136,74,50,204,37.4,0.399,24,0
153 | 4,114,65,0,0,21.9,0.432,37,0
154 | 9,156,86,28,155,34.3,1.189,42,1
155 | 1,153,82,42,485,40.6,0.687,23,0
156 | 8,188,78,0,0,47.9,0.137,43,1
157 | 7,152,88,44,0,50,0.337,36,1
158 | 2,99,52,15,94,24.6,0.637,21,0
159 | 1,109,56,21,135,25.2,0.833,23,0
160 | 2,88,74,19,53,29,0.229,22,0
161 | 17,163,72,41,114,40.9,0.817,47,1
162 | 4,151,90,38,0,29.7,0.294,36,0
163 | 7,102,74,40,105,37.2,0.204,45,0
164 | 0,114,80,34,285,44.2,0.167,27,0
165 | 2,100,64,23,0,29.7,0.368,21,0
166 | 0,131,88,0,0,31.6,0.743,32,1
167 | 6,104,74,18,156,29.9,0.722,41,1
168 | 3,148,66,25,0,32.5,0.256,22,0
169 | 4,120,68,0,0,29.6,0.709,34,0
170 | 4,110,66,0,0,31.9,0.471,29,0
171 | 3,111,90,12,78,28.4,0.495,29,0
172 | 6,102,82,0,0,30.8,0.18,36,1
173 | 6,134,70,23,130,35.4,0.542,29,1
174 | 2,87,0,23,0,28.9,0.773,25,0
175 | 1,79,60,42,48,43.5,0.678,23,0
176 | 2,75,64,24,55,29.7,0.37,33,0
177 | 8,179,72,42,130,32.7,0.719,36,1
178 | 6,85,78,0,0,31.2,0.382,42,0
179 | 0,129,110,46,130,67.1,0.319,26,1
180 | 5,143,78,0,0,45,0.19,47,0
181 | 5,130,82,0,0,39.1,0.956,37,1
182 | 6,87,80,0,0,23.2,0.084,32,0
183 | 0,119,64,18,92,34.9,0.725,23,0
184 | 1,0,74,20,23,27.7,0.299,21,0
185 | 5,73,60,0,0,26.8,0.268,27,0
186 | 4,141,74,0,0,27.6,0.244,40,0
187 | 7,194,68,28,0,35.9,0.745,41,1
188 | 8,181,68,36,495,30.1,0.615,60,1
189 | 1,128,98,41,58,32,1.321,33,1
190 | 8,109,76,39,114,27.9,0.64,31,1
191 | 5,139,80,35,160,31.6,0.361,25,1
192 | 3,111,62,0,0,22.6,0.142,21,0
193 | 9,123,70,44,94,33.1,0.374,40,0
194 | 7,159,66,0,0,30.4,0.383,36,1
195 | 11,135,0,0,0,52.3,0.578,40,1
196 | 8,85,55,20,0,24.4,0.136,42,0
197 | 5,158,84,41,210,39.4,0.395,29,1
198 | 1,105,58,0,0,24.3,0.187,21,0
199 | 3,107,62,13,48,22.9,0.678,23,1
200 | 4,109,64,44,99,34.8,0.905,26,1
201 | 4,148,60,27,318,30.9,0.15,29,1
202 | 0,113,80,16,0,31,0.874,21,0
203 | 1,138,82,0,0,40.1,0.236,28,0
204 | 0,108,68,20,0,27.3,0.787,32,0
205 | 2,99,70,16,44,20.4,0.235,27,0
206 | 6,103,72,32,190,37.7,0.324,55,0
207 | 5,111,72,28,0,23.9,0.407,27,0
208 | 8,196,76,29,280,37.5,0.605,57,1
209 | 5,162,104,0,0,37.7,0.151,52,1
210 | 1,96,64,27,87,33.2,0.289,21,0
211 | 7,184,84,33,0,35.5,0.355,41,1
212 | 2,81,60,22,0,27.7,0.29,25,0
213 | 0,147,85,54,0,42.8,0.375,24,0
214 | 7,179,95,31,0,34.2,0.164,60,0
215 | 0,140,65,26,130,42.6,0.431,24,1
216 | 9,112,82,32,175,34.2,0.26,36,1
217 | 12,151,70,40,271,41.8,0.742,38,1
218 | 5,109,62,41,129,35.8,0.514,25,1
219 | 6,125,68,30,120,30,0.464,32,0
220 | 5,85,74,22,0,29,1.224,32,1
221 | 5,112,66,0,0,37.8,0.261,41,1
222 | 0,177,60,29,478,34.6,1.072,21,1
223 | 2,158,90,0,0,31.6,0.805,66,1
224 | 7,119,0,0,0,25.2,0.209,37,0
225 | 7,142,60,33,190,28.8,0.687,61,0
226 | 1,100,66,15,56,23.6,0.666,26,0
227 | 1,87,78,27,32,34.6,0.101,22,0
228 | 0,101,76,0,0,35.7,0.198,26,0
229 | 3,162,52,38,0,37.2,0.652,24,1
230 | 4,197,70,39,744,36.7,2.329,31,0
231 | 0,117,80,31,53,45.2,0.089,24,0
232 | 4,142,86,0,0,44,0.645,22,1
233 | 6,134,80,37,370,46.2,0.238,46,1
234 | 1,79,80,25,37,25.4,0.583,22,0
235 | 4,122,68,0,0,35,0.394,29,0
236 | 3,74,68,28,45,29.7,0.293,23,0
237 | 4,171,72,0,0,43.6,0.479,26,1
238 | 7,181,84,21,192,35.9,0.586,51,1
239 | 0,179,90,27,0,44.1,0.686,23,1
240 | 9,164,84,21,0,30.8,0.831,32,1
241 | 0,104,76,0,0,18.4,0.582,27,0
242 | 1,91,64,24,0,29.2,0.192,21,0
243 | 4,91,70,32,88,33.1,0.446,22,0
244 | 3,139,54,0,0,25.6,0.402,22,1
245 | 6,119,50,22,176,27.1,1.318,33,1
246 | 2,146,76,35,194,38.2,0.329,29,0
247 | 9,184,85,15,0,30,1.213,49,1
248 | 10,122,68,0,0,31.2,0.258,41,0
249 | 0,165,90,33,680,52.3,0.427,23,0
250 | 9,124,70,33,402,35.4,0.282,34,0
251 | 1,111,86,19,0,30.1,0.143,23,0
252 | 9,106,52,0,0,31.2,0.38,42,0
253 | 2,129,84,0,0,28,0.284,27,0
254 | 2,90,80,14,55,24.4,0.249,24,0
255 | 0,86,68,32,0,35.8,0.238,25,0
256 | 12,92,62,7,258,27.6,0.926,44,1
257 | 1,113,64,35,0,33.6,0.543,21,1
258 | 3,111,56,39,0,30.1,0.557,30,0
259 | 2,114,68,22,0,28.7,0.092,25,0
260 | 1,193,50,16,375,25.9,0.655,24,0
261 | 11,155,76,28,150,33.3,1.353,51,1
262 | 3,191,68,15,130,30.9,0.299,34,0
263 | 3,141,0,0,0,30,0.761,27,1
264 | 4,95,70,32,0,32.1,0.612,24,0
265 | 3,142,80,15,0,32.4,0.2,63,0
266 | 4,123,62,0,0,32,0.226,35,1
267 | 5,96,74,18,67,33.6,0.997,43,0
268 | 0,138,0,0,0,36.3,0.933,25,1
269 | 2,128,64,42,0,40,1.101,24,0
270 | 0,102,52,0,0,25.1,0.078,21,0
271 | 2,146,0,0,0,27.5,0.24,28,1
272 | 10,101,86,37,0,45.6,1.136,38,1
273 | 2,108,62,32,56,25.2,0.128,21,0
274 | 3,122,78,0,0,23,0.254,40,0
275 | 1,71,78,50,45,33.2,0.422,21,0
276 | 13,106,70,0,0,34.2,0.251,52,0
277 | 2,100,70,52,57,40.5,0.677,25,0
278 | 7,106,60,24,0,26.5,0.296,29,1
279 | 0,104,64,23,116,27.8,0.454,23,0
280 | 5,114,74,0,0,24.9,0.744,57,0
281 | 2,108,62,10,278,25.3,0.881,22,0
282 | 0,146,70,0,0,37.9,0.334,28,1
283 | 10,129,76,28,122,35.9,0.28,39,0
284 | 7,133,88,15,155,32.4,0.262,37,0
285 | 7,161,86,0,0,30.4,0.165,47,1
286 | 2,108,80,0,0,27,0.259,52,1
287 | 7,136,74,26,135,26,0.647,51,0
288 | 5,155,84,44,545,38.7,0.619,34,0
289 | 1,119,86,39,220,45.6,0.808,29,1
290 | 4,96,56,17,49,20.8,0.34,26,0
291 | 5,108,72,43,75,36.1,0.263,33,0
292 | 0,78,88,29,40,36.9,0.434,21,0
293 | 0,107,62,30,74,36.6,0.757,25,1
294 | 2,128,78,37,182,43.3,1.224,31,1
295 | 1,128,48,45,194,40.5,0.613,24,1
296 | 0,161,50,0,0,21.9,0.254,65,0
297 | 6,151,62,31,120,35.5,0.692,28,0
298 | 2,146,70,38,360,28,0.337,29,1
299 | 0,126,84,29,215,30.7,0.52,24,0
300 | 14,100,78,25,184,36.6,0.412,46,1
301 | 8,112,72,0,0,23.6,0.84,58,0
302 | 0,167,0,0,0,32.3,0.839,30,1
303 | 2,144,58,33,135,31.6,0.422,25,1
304 | 5,77,82,41,42,35.8,0.156,35,0
305 | 5,115,98,0,0,52.9,0.209,28,1
306 | 3,150,76,0,0,21,0.207,37,0
307 | 2,120,76,37,105,39.7,0.215,29,0
308 | 10,161,68,23,132,25.5,0.326,47,1
309 | 0,137,68,14,148,24.8,0.143,21,0
310 | 0,128,68,19,180,30.5,1.391,25,1
311 | 2,124,68,28,205,32.9,0.875,30,1
312 | 6,80,66,30,0,26.2,0.313,41,0
313 | 0,106,70,37,148,39.4,0.605,22,0
314 | 2,155,74,17,96,26.6,0.433,27,1
315 | 3,113,50,10,85,29.5,0.626,25,0
316 | 7,109,80,31,0,35.9,1.127,43,1
317 | 2,112,68,22,94,34.1,0.315,26,0
318 | 3,99,80,11,64,19.3,0.284,30,0
319 | 3,182,74,0,0,30.5,0.345,29,1
320 | 3,115,66,39,140,38.1,0.15,28,0
321 | 6,194,78,0,0,23.5,0.129,59,1
322 | 4,129,60,12,231,27.5,0.527,31,0
323 | 3,112,74,30,0,31.6,0.197,25,1
324 | 0,124,70,20,0,27.4,0.254,36,1
325 | 13,152,90,33,29,26.8,0.731,43,1
326 | 2,112,75,32,0,35.7,0.148,21,0
327 | 1,157,72,21,168,25.6,0.123,24,0
328 | 1,122,64,32,156,35.1,0.692,30,1
329 | 10,179,70,0,0,35.1,0.2,37,0
330 | 2,102,86,36,120,45.5,0.127,23,1
331 | 6,105,70,32,68,30.8,0.122,37,0
332 | 8,118,72,19,0,23.1,1.476,46,0
333 | 2,87,58,16,52,32.7,0.166,25,0
334 | 1,180,0,0,0,43.3,0.282,41,1
335 | 12,106,80,0,0,23.6,0.137,44,0
336 | 1,95,60,18,58,23.9,0.26,22,0
337 | 0,165,76,43,255,47.9,0.259,26,0
338 | 0,117,0,0,0,33.8,0.932,44,0
339 | 5,115,76,0,0,31.2,0.343,44,1
340 | 9,152,78,34,171,34.2,0.893,33,1
341 | 7,178,84,0,0,39.9,0.331,41,1
342 | 1,130,70,13,105,25.9,0.472,22,0
343 | 1,95,74,21,73,25.9,0.673,36,0
344 | 1,0,68,35,0,32,0.389,22,0
345 | 5,122,86,0,0,34.7,0.29,33,0
346 | 8,95,72,0,0,36.8,0.485,57,0
347 | 8,126,88,36,108,38.5,0.349,49,0
348 | 1,139,46,19,83,28.7,0.654,22,0
349 | 3,116,0,0,0,23.5,0.187,23,0
350 | 3,99,62,19,74,21.8,0.279,26,0
351 | 5,0,80,32,0,41,0.346,37,1
352 | 4,92,80,0,0,42.2,0.237,29,0
353 | 4,137,84,0,0,31.2,0.252,30,0
354 | 3,61,82,28,0,34.4,0.243,46,0
355 | 1,90,62,12,43,27.2,0.58,24,0
356 | 3,90,78,0,0,42.7,0.559,21,0
357 | 9,165,88,0,0,30.4,0.302,49,1
358 | 1,125,50,40,167,33.3,0.962,28,1
359 | 13,129,0,30,0,39.9,0.569,44,1
360 | 12,88,74,40,54,35.3,0.378,48,0
361 | 1,196,76,36,249,36.5,0.875,29,1
362 | 5,189,64,33,325,31.2,0.583,29,1
363 | 5,158,70,0,0,29.8,0.207,63,0
364 | 5,103,108,37,0,39.2,0.305,65,0
365 | 4,146,78,0,0,38.5,0.52,67,1
366 | 4,147,74,25,293,34.9,0.385,30,0
367 | 5,99,54,28,83,34,0.499,30,0
368 | 6,124,72,0,0,27.6,0.368,29,1
369 | 0,101,64,17,0,21,0.252,21,0
370 | 3,81,86,16,66,27.5,0.306,22,0
371 | 1,133,102,28,140,32.8,0.234,45,1
372 | 3,173,82,48,465,38.4,2.137,25,1
373 | 0,118,64,23,89,0,1.731,21,0
374 | 0,84,64,22,66,35.8,0.545,21,0
375 | 2,105,58,40,94,34.9,0.225,25,0
376 | 2,122,52,43,158,36.2,0.816,28,0
377 | 12,140,82,43,325,39.2,0.528,58,1
378 | 0,98,82,15,84,25.2,0.299,22,0
379 | 1,87,60,37,75,37.2,0.509,22,0
380 | 4,156,75,0,0,48.3,0.238,32,1
381 | 0,93,100,39,72,43.4,1.021,35,0
382 | 1,107,72,30,82,30.8,0.821,24,0
383 | 0,105,68,22,0,20,0.236,22,0
384 | 1,109,60,8,182,25.4,0.947,21,0
385 | 1,90,62,18,59,25.1,1.268,25,0
386 | 1,125,70,24,110,24.3,0.221,25,0
387 | 1,119,54,13,50,22.3,0.205,24,0
388 | 5,116,74,29,0,32.3,0.66,35,1
389 | 8,105,100,36,0,43.3,0.239,45,1
390 | 5,144,82,26,285,32,0.452,58,1
391 | 3,100,68,23,81,31.6,0.949,28,0
392 | 1,100,66,29,196,32,0.444,42,0
393 | 5,166,76,0,0,45.7,0.34,27,1
394 | 1,131,64,14,415,23.7,0.389,21,0
395 | 4,116,72,12,87,22.1,0.463,37,0
396 | 4,158,78,0,0,32.9,0.803,31,1
397 | 2,127,58,24,275,27.7,1.6,25,0
398 | 3,96,56,34,115,24.7,0.944,39,0
399 | 0,131,66,40,0,34.3,0.196,22,1
400 | 3,82,70,0,0,21.1,0.389,25,0
401 | 3,193,70,31,0,34.9,0.241,25,1
402 | 4,95,64,0,0,32,0.161,31,1
403 | 6,137,61,0,0,24.2,0.151,55,0
404 | 5,136,84,41,88,35,0.286,35,1
405 | 9,72,78,25,0,31.6,0.28,38,0
406 | 5,168,64,0,0,32.9,0.135,41,1
407 | 2,123,48,32,165,42.1,0.52,26,0
408 | 4,115,72,0,0,28.9,0.376,46,1
409 | 0,101,62,0,0,21.9,0.336,25,0
410 | 8,197,74,0,0,25.9,1.191,39,1
411 | 1,172,68,49,579,42.4,0.702,28,1
412 | 6,102,90,39,0,35.7,0.674,28,0
413 | 1,112,72,30,176,34.4,0.528,25,0
414 | 1,143,84,23,310,42.4,1.076,22,0
415 | 1,143,74,22,61,26.2,0.256,21,0
416 | 0,138,60,35,167,34.6,0.534,21,1
417 | 3,173,84,33,474,35.7,0.258,22,1
418 | 1,97,68,21,0,27.2,1.095,22,0
419 | 4,144,82,32,0,38.5,0.554,37,1
420 | 1,83,68,0,0,18.2,0.624,27,0
421 | 3,129,64,29,115,26.4,0.219,28,1
422 | 1,119,88,41,170,45.3,0.507,26,0
423 | 2,94,68,18,76,26,0.561,21,0
424 | 0,102,64,46,78,40.6,0.496,21,0
425 | 2,115,64,22,0,30.8,0.421,21,0
426 | 8,151,78,32,210,42.9,0.516,36,1
427 | 4,184,78,39,277,37,0.264,31,1
428 | 0,94,0,0,0,0,0.256,25,0
429 | 1,181,64,30,180,34.1,0.328,38,1
430 | 0,135,94,46,145,40.6,0.284,26,0
431 | 1,95,82,25,180,35,0.233,43,1
432 | 2,99,0,0,0,22.2,0.108,23,0
433 | 3,89,74,16,85,30.4,0.551,38,0
434 | 1,80,74,11,60,30,0.527,22,0
435 | 2,139,75,0,0,25.6,0.167,29,0
436 | 1,90,68,8,0,24.5,1.138,36,0
437 | 0,141,0,0,0,42.4,0.205,29,1
438 | 12,140,85,33,0,37.4,0.244,41,0
439 | 5,147,75,0,0,29.9,0.434,28,0
440 | 1,97,70,15,0,18.2,0.147,21,0
441 | 6,107,88,0,0,36.8,0.727,31,0
442 | 0,189,104,25,0,34.3,0.435,41,1
443 | 2,83,66,23,50,32.2,0.497,22,0
444 | 4,117,64,27,120,33.2,0.23,24,0
445 | 8,108,70,0,0,30.5,0.955,33,1
446 | 4,117,62,12,0,29.7,0.38,30,1
447 | 0,180,78,63,14,59.4,2.42,25,1
448 | 1,100,72,12,70,25.3,0.658,28,0
449 | 0,95,80,45,92,36.5,0.33,26,0
450 | 0,104,64,37,64,33.6,0.51,22,1
451 | 0,120,74,18,63,30.5,0.285,26,0
452 | 1,82,64,13,95,21.2,0.415,23,0
453 | 2,134,70,0,0,28.9,0.542,23,1
454 | 0,91,68,32,210,39.9,0.381,25,0
455 | 2,119,0,0,0,19.6,0.832,72,0
456 | 2,100,54,28,105,37.8,0.498,24,0
457 | 14,175,62,30,0,33.6,0.212,38,1
458 | 1,135,54,0,0,26.7,0.687,62,0
459 | 5,86,68,28,71,30.2,0.364,24,0
460 | 10,148,84,48,237,37.6,1.001,51,1
461 | 9,134,74,33,60,25.9,0.46,81,0
462 | 9,120,72,22,56,20.8,0.733,48,0
463 | 1,71,62,0,0,21.8,0.416,26,0
464 | 8,74,70,40,49,35.3,0.705,39,0
465 | 5,88,78,30,0,27.6,0.258,37,0
466 | 10,115,98,0,0,24,1.022,34,0
467 | 0,124,56,13,105,21.8,0.452,21,0
468 | 0,74,52,10,36,27.8,0.269,22,0
469 | 0,97,64,36,100,36.8,0.6,25,0
470 | 8,120,0,0,0,30,0.183,38,1
471 | 6,154,78,41,140,46.1,0.571,27,0
472 | 1,144,82,40,0,41.3,0.607,28,0
473 | 0,137,70,38,0,33.2,0.17,22,0
474 | 0,119,66,27,0,38.8,0.259,22,0
475 | 7,136,90,0,0,29.9,0.21,50,0
476 | 4,114,64,0,0,28.9,0.126,24,0
477 | 0,137,84,27,0,27.3,0.231,59,0
478 | 2,105,80,45,191,33.7,0.711,29,1
479 | 7,114,76,17,110,23.8,0.466,31,0
480 | 8,126,74,38,75,25.9,0.162,39,0
481 | 4,132,86,31,0,28,0.419,63,0
482 | 3,158,70,30,328,35.5,0.344,35,1
483 | 0,123,88,37,0,35.2,0.197,29,0
484 | 4,85,58,22,49,27.8,0.306,28,0
485 | 0,84,82,31,125,38.2,0.233,23,0
486 | 0,145,0,0,0,44.2,0.63,31,1
487 | 0,135,68,42,250,42.3,0.365,24,1
488 | 1,139,62,41,480,40.7,0.536,21,0
489 | 0,173,78,32,265,46.5,1.159,58,0
490 | 4,99,72,17,0,25.6,0.294,28,0
491 | 8,194,80,0,0,26.1,0.551,67,0
492 | 2,83,65,28,66,36.8,0.629,24,0
493 | 2,89,90,30,0,33.5,0.292,42,0
494 | 4,99,68,38,0,32.8,0.145,33,0
495 | 4,125,70,18,122,28.9,1.144,45,1
496 | 3,80,0,0,0,0,0.174,22,0
497 | 6,166,74,0,0,26.6,0.304,66,0
498 | 5,110,68,0,0,26,0.292,30,0
499 | 2,81,72,15,76,30.1,0.547,25,0
500 | 7,195,70,33,145,25.1,0.163,55,1
501 | 6,154,74,32,193,29.3,0.839,39,0
502 | 2,117,90,19,71,25.2,0.313,21,0
503 | 3,84,72,32,0,37.2,0.267,28,0
504 | 6,0,68,41,0,39,0.727,41,1
505 | 7,94,64,25,79,33.3,0.738,41,0
506 | 3,96,78,39,0,37.3,0.238,40,0
507 | 10,75,82,0,0,33.3,0.263,38,0
508 | 0,180,90,26,90,36.5,0.314,35,1
509 | 1,130,60,23,170,28.6,0.692,21,0
510 | 2,84,50,23,76,30.4,0.968,21,0
511 | 8,120,78,0,0,25,0.409,64,0
512 | 12,84,72,31,0,29.7,0.297,46,1
513 | 0,139,62,17,210,22.1,0.207,21,0
514 | 9,91,68,0,0,24.2,0.2,58,0
515 | 2,91,62,0,0,27.3,0.525,22,0
516 | 3,99,54,19,86,25.6,0.154,24,0
517 | 3,163,70,18,105,31.6,0.268,28,1
518 | 9,145,88,34,165,30.3,0.771,53,1
519 | 7,125,86,0,0,37.6,0.304,51,0
520 | 13,76,60,0,0,32.8,0.18,41,0
521 | 6,129,90,7,326,19.6,0.582,60,0
522 | 2,68,70,32,66,25,0.187,25,0
523 | 3,124,80,33,130,33.2,0.305,26,0
524 | 6,114,0,0,0,0,0.189,26,0
525 | 9,130,70,0,0,34.2,0.652,45,1
526 | 3,125,58,0,0,31.6,0.151,24,0
527 | 3,87,60,18,0,21.8,0.444,21,0
528 | 1,97,64,19,82,18.2,0.299,21,0
529 | 3,116,74,15,105,26.3,0.107,24,0
530 | 0,117,66,31,188,30.8,0.493,22,0
531 | 0,111,65,0,0,24.6,0.66,31,0
532 | 2,122,60,18,106,29.8,0.717,22,0
533 | 0,107,76,0,0,45.3,0.686,24,0
534 | 1,86,66,52,65,41.3,0.917,29,0
535 | 6,91,0,0,0,29.8,0.501,31,0
536 | 1,77,56,30,56,33.3,1.251,24,0
537 | 4,132,0,0,0,32.9,0.302,23,1
538 | 0,105,90,0,0,29.6,0.197,46,0
539 | 0,57,60,0,0,21.7,0.735,67,0
540 | 0,127,80,37,210,36.3,0.804,23,0
541 | 3,129,92,49,155,36.4,0.968,32,1
542 | 8,100,74,40,215,39.4,0.661,43,1
543 | 3,128,72,25,190,32.4,0.549,27,1
544 | 10,90,85,32,0,34.9,0.825,56,1
545 | 4,84,90,23,56,39.5,0.159,25,0
546 | 1,88,78,29,76,32,0.365,29,0
547 | 8,186,90,35,225,34.5,0.423,37,1
548 | 5,187,76,27,207,43.6,1.034,53,1
549 | 4,131,68,21,166,33.1,0.16,28,0
550 | 1,164,82,43,67,32.8,0.341,50,0
551 | 4,189,110,31,0,28.5,0.68,37,0
552 | 1,116,70,28,0,27.4,0.204,21,0
553 | 3,84,68,30,106,31.9,0.591,25,0
554 | 6,114,88,0,0,27.8,0.247,66,0
555 | 1,88,62,24,44,29.9,0.422,23,0
556 | 1,84,64,23,115,36.9,0.471,28,0
557 | 7,124,70,33,215,25.5,0.161,37,0
558 | 1,97,70,40,0,38.1,0.218,30,0
559 | 8,110,76,0,0,27.8,0.237,58,0
560 | 11,103,68,40,0,46.2,0.126,42,0
561 | 11,85,74,0,0,30.1,0.3,35,0
562 | 6,125,76,0,0,33.8,0.121,54,1
563 | 0,198,66,32,274,41.3,0.502,28,1
564 | 1,87,68,34,77,37.6,0.401,24,0
565 | 6,99,60,19,54,26.9,0.497,32,0
566 | 0,91,80,0,0,32.4,0.601,27,0
567 | 2,95,54,14,88,26.1,0.748,22,0
568 | 1,99,72,30,18,38.6,0.412,21,0
569 | 6,92,62,32,126,32,0.085,46,0
570 | 4,154,72,29,126,31.3,0.338,37,0
571 | 0,121,66,30,165,34.3,0.203,33,1
572 | 3,78,70,0,0,32.5,0.27,39,0
573 | 2,130,96,0,0,22.6,0.268,21,0
574 | 3,111,58,31,44,29.5,0.43,22,0
575 | 2,98,60,17,120,34.7,0.198,22,0
576 | 1,143,86,30,330,30.1,0.892,23,0
577 | 1,119,44,47,63,35.5,0.28,25,0
578 | 6,108,44,20,130,24,0.813,35,0
579 | 2,118,80,0,0,42.9,0.693,21,1
580 | 10,133,68,0,0,27,0.245,36,0
581 | 2,197,70,99,0,34.7,0.575,62,1
582 | 0,151,90,46,0,42.1,0.371,21,1
583 | 6,109,60,27,0,25,0.206,27,0
584 | 12,121,78,17,0,26.5,0.259,62,0
585 | 8,100,76,0,0,38.7,0.19,42,0
586 | 8,124,76,24,600,28.7,0.687,52,1
587 | 1,93,56,11,0,22.5,0.417,22,0
588 | 8,143,66,0,0,34.9,0.129,41,1
589 | 6,103,66,0,0,24.3,0.249,29,0
590 | 3,176,86,27,156,33.3,1.154,52,1
591 | 0,73,0,0,0,21.1,0.342,25,0
592 | 11,111,84,40,0,46.8,0.925,45,1
593 | 2,112,78,50,140,39.4,0.175,24,0
594 | 3,132,80,0,0,34.4,0.402,44,1
595 | 2,82,52,22,115,28.5,1.699,25,0
596 | 6,123,72,45,230,33.6,0.733,34,0
597 | 0,188,82,14,185,32,0.682,22,1
598 | 0,67,76,0,0,45.3,0.194,46,0
599 | 1,89,24,19,25,27.8,0.559,21,0
600 | 1,173,74,0,0,36.8,0.088,38,1
601 | 1,109,38,18,120,23.1,0.407,26,0
602 | 1,108,88,19,0,27.1,0.4,24,0
603 | 6,96,0,0,0,23.7,0.19,28,0
604 | 1,124,74,36,0,27.8,0.1,30,0
605 | 7,150,78,29,126,35.2,0.692,54,1
606 | 4,183,0,0,0,28.4,0.212,36,1
607 | 1,124,60,32,0,35.8,0.514,21,0
608 | 1,181,78,42,293,40,1.258,22,1
609 | 1,92,62,25,41,19.5,0.482,25,0
610 | 0,152,82,39,272,41.5,0.27,27,0
611 | 1,111,62,13,182,24,0.138,23,0
612 | 3,106,54,21,158,30.9,0.292,24,0
613 | 3,174,58,22,194,32.9,0.593,36,1
614 | 7,168,88,42,321,38.2,0.787,40,1
615 | 6,105,80,28,0,32.5,0.878,26,0
616 | 11,138,74,26,144,36.1,0.557,50,1
617 | 3,106,72,0,0,25.8,0.207,27,0
618 | 6,117,96,0,0,28.7,0.157,30,0
619 | 2,68,62,13,15,20.1,0.257,23,0
620 | 9,112,82,24,0,28.2,1.282,50,1
621 | 0,119,0,0,0,32.4,0.141,24,1
622 | 2,112,86,42,160,38.4,0.246,28,0
623 | 2,92,76,20,0,24.2,1.698,28,0
624 | 6,183,94,0,0,40.8,1.461,45,0
625 | 0,94,70,27,115,43.5,0.347,21,0
626 | 2,108,64,0,0,30.8,0.158,21,0
627 | 4,90,88,47,54,37.7,0.362,29,0
628 | 0,125,68,0,0,24.7,0.206,21,0
629 | 0,132,78,0,0,32.4,0.393,21,0
630 | 5,128,80,0,0,34.6,0.144,45,0
631 | 4,94,65,22,0,24.7,0.148,21,0
632 | 7,114,64,0,0,27.4,0.732,34,1
633 | 0,102,78,40,90,34.5,0.238,24,0
634 | 2,111,60,0,0,26.2,0.343,23,0
635 | 1,128,82,17,183,27.5,0.115,22,0
636 | 10,92,62,0,0,25.9,0.167,31,0
637 | 13,104,72,0,0,31.2,0.465,38,1
638 | 5,104,74,0,0,28.8,0.153,48,0
639 | 2,94,76,18,66,31.6,0.649,23,0
640 | 7,97,76,32,91,40.9,0.871,32,1
641 | 1,100,74,12,46,19.5,0.149,28,0
642 | 0,102,86,17,105,29.3,0.695,27,0
643 | 4,128,70,0,0,34.3,0.303,24,0
644 | 6,147,80,0,0,29.5,0.178,50,1
645 | 4,90,0,0,0,28,0.61,31,0
646 | 3,103,72,30,152,27.6,0.73,27,0
647 | 2,157,74,35,440,39.4,0.134,30,0
648 | 1,167,74,17,144,23.4,0.447,33,1
649 | 0,179,50,36,159,37.8,0.455,22,1
650 | 11,136,84,35,130,28.3,0.26,42,1
651 | 0,107,60,25,0,26.4,0.133,23,0
652 | 1,91,54,25,100,25.2,0.234,23,0
653 | 1,117,60,23,106,33.8,0.466,27,0
654 | 5,123,74,40,77,34.1,0.269,28,0
655 | 2,120,54,0,0,26.8,0.455,27,0
656 | 1,106,70,28,135,34.2,0.142,22,0
657 | 2,155,52,27,540,38.7,0.24,25,1
658 | 2,101,58,35,90,21.8,0.155,22,0
659 | 1,120,80,48,200,38.9,1.162,41,0
660 | 11,127,106,0,0,39,0.19,51,0
661 | 3,80,82,31,70,34.2,1.292,27,1
662 | 10,162,84,0,0,27.7,0.182,54,0
663 | 1,199,76,43,0,42.9,1.394,22,1
664 | 8,167,106,46,231,37.6,0.165,43,1
665 | 9,145,80,46,130,37.9,0.637,40,1
666 | 6,115,60,39,0,33.7,0.245,40,1
667 | 1,112,80,45,132,34.8,0.217,24,0
668 | 4,145,82,18,0,32.5,0.235,70,1
669 | 10,111,70,27,0,27.5,0.141,40,1
670 | 6,98,58,33,190,34,0.43,43,0
671 | 9,154,78,30,100,30.9,0.164,45,0
672 | 6,165,68,26,168,33.6,0.631,49,0
673 | 1,99,58,10,0,25.4,0.551,21,0
674 | 10,68,106,23,49,35.5,0.285,47,0
675 | 3,123,100,35,240,57.3,0.88,22,0
676 | 8,91,82,0,0,35.6,0.587,68,0
677 | 6,195,70,0,0,30.9,0.328,31,1
678 | 9,156,86,0,0,24.8,0.23,53,1
679 | 0,93,60,0,0,35.3,0.263,25,0
680 | 3,121,52,0,0,36,0.127,25,1
681 | 2,101,58,17,265,24.2,0.614,23,0
682 | 2,56,56,28,45,24.2,0.332,22,0
683 | 0,162,76,36,0,49.6,0.364,26,1
684 | 0,95,64,39,105,44.6,0.366,22,0
685 | 4,125,80,0,0,32.3,0.536,27,1
686 | 5,136,82,0,0,0,0.64,69,0
687 | 2,129,74,26,205,33.2,0.591,25,0
688 | 3,130,64,0,0,23.1,0.314,22,0
689 | 1,107,50,19,0,28.3,0.181,29,0
690 | 1,140,74,26,180,24.1,0.828,23,0
691 | 1,144,82,46,180,46.1,0.335,46,1
692 | 8,107,80,0,0,24.6,0.856,34,0
693 | 13,158,114,0,0,42.3,0.257,44,1
694 | 2,121,70,32,95,39.1,0.886,23,0
695 | 7,129,68,49,125,38.5,0.439,43,1
696 | 2,90,60,0,0,23.5,0.191,25,0
697 | 7,142,90,24,480,30.4,0.128,43,1
698 | 3,169,74,19,125,29.9,0.268,31,1
699 | 0,99,0,0,0,25,0.253,22,0
700 | 4,127,88,11,155,34.5,0.598,28,0
701 | 4,118,70,0,0,44.5,0.904,26,0
702 | 2,122,76,27,200,35.9,0.483,26,0
703 | 6,125,78,31,0,27.6,0.565,49,1
704 | 1,168,88,29,0,35,0.905,52,1
705 | 2,129,0,0,0,38.5,0.304,41,0
706 | 4,110,76,20,100,28.4,0.118,27,0
707 | 6,80,80,36,0,39.8,0.177,28,0
708 | 10,115,0,0,0,0,0.261,30,1
709 | 2,127,46,21,335,34.4,0.176,22,0
710 | 9,164,78,0,0,32.8,0.148,45,1
711 | 2,93,64,32,160,38,0.674,23,1
712 | 3,158,64,13,387,31.2,0.295,24,0
713 | 5,126,78,27,22,29.6,0.439,40,0
714 | 10,129,62,36,0,41.2,0.441,38,1
715 | 0,134,58,20,291,26.4,0.352,21,0
716 | 3,102,74,0,0,29.5,0.121,32,0
717 | 7,187,50,33,392,33.9,0.826,34,1
718 | 3,173,78,39,185,33.8,0.97,31,1
719 | 10,94,72,18,0,23.1,0.595,56,0
720 | 1,108,60,46,178,35.5,0.415,24,0
721 | 5,97,76,27,0,35.6,0.378,52,1
722 | 4,83,86,19,0,29.3,0.317,34,0
723 | 1,114,66,36,200,38.1,0.289,21,0
724 | 1,149,68,29,127,29.3,0.349,42,1
725 | 5,117,86,30,105,39.1,0.251,42,0
726 | 1,111,94,0,0,32.8,0.265,45,0
727 | 4,112,78,40,0,39.4,0.236,38,0
728 | 1,116,78,29,180,36.1,0.496,25,0
729 | 0,141,84,26,0,32.4,0.433,22,0
730 | 2,175,88,0,0,22.9,0.326,22,0
731 | 2,92,52,0,0,30.1,0.141,22,0
732 | 3,130,78,23,79,28.4,0.323,34,1
733 | 8,120,86,0,0,28.4,0.259,22,1
734 | 2,174,88,37,120,44.5,0.646,24,1
735 | 2,106,56,27,165,29,0.426,22,0
736 | 2,105,75,0,0,23.3,0.56,53,0
737 | 4,95,60,32,0,35.4,0.284,28,0
738 | 0,126,86,27,120,27.4,0.515,21,0
739 | 8,65,72,23,0,32,0.6,42,0
740 | 2,99,60,17,160,36.6,0.453,21,0
741 | 1,102,74,0,0,39.5,0.293,42,1
742 | 11,120,80,37,150,42.3,0.785,48,1
743 | 3,102,44,20,94,30.8,0.4,26,0
744 | 1,109,58,18,116,28.5,0.219,22,0
745 | 9,140,94,0,0,32.7,0.734,45,1
746 | 13,153,88,37,140,40.6,1.174,39,0
747 | 12,100,84,33,105,30,0.488,46,0
748 | 1,147,94,41,0,49.3,0.358,27,1
749 | 1,81,74,41,57,46.3,1.096,32,0
750 | 3,187,70,22,200,36.4,0.408,36,1
751 | 6,162,62,0,0,24.3,0.178,50,1
752 | 4,136,70,0,0,31.2,1.182,22,1
753 | 1,121,78,39,74,39,0.261,28,0
754 | 3,108,62,24,0,26,0.223,25,0
755 | 0,181,88,44,510,43.3,0.222,26,1
756 | 8,154,78,32,0,32.4,0.443,45,1
757 | 1,128,88,39,110,36.5,1.057,37,1
758 | 7,137,90,41,0,32,0.391,39,0
759 | 0,123,72,0,0,36.3,0.258,52,1
760 | 1,106,76,0,0,37.5,0.197,26,0
761 | 6,190,92,0,0,35.5,0.278,66,1
762 | 2,88,58,26,16,28.4,0.766,22,0
763 | 9,170,74,31,0,44,0.403,43,1
764 | 9,89,62,0,0,22.5,0.142,33,0
765 | 10,101,76,48,180,32.9,0.171,63,0
766 | 2,122,70,27,0,36.8,0.34,27,0
767 | 5,121,72,23,112,26.2,0.245,30,0
768 | 1,126,60,0,0,30.1,0.349,47,1
769 | 1,93,70,31,0,30.4,0.315,23,0
770 |
--------------------------------------------------------------------------------
/Chapter 5/Chapter5.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Remove duplicates"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "from pandas import read_csv\n"
17 | ]
18 | },
19 | {
20 | "cell_type": "code",
21 | "execution_count": 6,
22 | "metadata": {},
23 | "outputs": [],
24 | "source": [
25 | "# load the dataset; IRIS.csv has a header row, so pandas uses it for the column names\n",
26 | "data_frame = read_csv(\"IRIS.csv\")"
27 | ]
28 | },
29 | {
30 | "cell_type": "code",
31 | "execution_count": 7,
32 | "metadata": {},
33 | "outputs": [
34 | {
35 | "name": "stdout",
36 | "output_type": "stream",
37 | "text": [
38 | "(150, 5)\n"
39 | ]
40 | }
41 | ],
42 | "source": [
43 | "print(data_frame.shape)"
44 | ]
45 | },
46 | {
47 | "cell_type": "code",
48 | "execution_count": 8,
49 | "metadata": {},
50 | "outputs": [],
51 | "source": [
52 | "# flag every row that repeats an earlier row\n",
53 | "duplicates = data_frame.duplicated()"
54 | ]
55 | },
56 | {
57 | "cell_type": "code",
58 | "execution_count": 9,
59 | "metadata": {},
60 | "outputs": [
61 | {
62 | "name": "stdout",
63 | "output_type": "stream",
64 | "text": [
65 | "True\n",
66 | " sepal_length sepal_width petal_length petal_width species\n",
67 | "34 4.9 3.1 1.5 0.1 Iris-setosa\n",
68 | "37 4.9 3.1 1.5 0.1 Iris-setosa\n",
69 | "142 5.8 2.7 5.1 1.9 Iris-virginica\n"
70 | ]
71 | }
72 | ],
73 | "source": [
74 | "# check whether any duplicates exist\n",
75 | "print(duplicates.any())\n",
76 | "# list all duplicate rows\n",
77 | "print(data_frame[duplicates])"
78 | ]
79 | },
80 | {
81 | "cell_type": "code",
82 | "execution_count": 10,
83 | "metadata": {},
84 | "outputs": [
85 | {
86 | "name": "stdout",
87 | "output_type": "stream",
88 | "text": [
89 | "(147, 5)\n"
90 | ]
91 | }
92 | ],
93 | "source": [
94 | "data_frame.drop_duplicates(inplace=True)\n",
95 | "print(data_frame.shape)"
96 | ]
97 | },
98 | {
99 | "cell_type": "markdown",
100 | "metadata": {},
101 | "source": [
102 | "## Impute the missing values"
103 | ]
104 | },
105 | {
106 | "cell_type": "code",
107 | "execution_count": 1,
108 | "metadata": {},
109 | "outputs": [
110 | {
111 | "name": "stdout",
112 | "output_type": "stream",
113 | "text": [
114 | "[[3. 2. ]\n",
115 | " [6. 6.33333333]\n",
116 | " [7. 6. ]]\n"
117 | ]
118 | }
119 | ],
120 | "source": [
121 | "import numpy as np\n",
122 | "from sklearn.impute import SimpleImputer\n",
123 | "impute = SimpleImputer(missing_values=np.nan, strategy='mean')\n",
124 | "impute.fit([[2, 5], [np.nan, 8], [4, 6]])\n",
126 | "X = [[np.nan, 2], [6, np.nan], [7, 6]]\n",
127 | "print(impute.transform(X))"
128 | ]
129 | },
130 | {
131 | "cell_type": "markdown",
133 | "metadata": {},
135 | "source": [
136 | "`SimpleImputer` also works with sparse matrices."
137 | ]
138 | },
139 | {
140 | "cell_type": "code",
141 | "execution_count": 2,
142 | "metadata": {},
143 | "outputs": [
144 | {
145 | "name": "stdout",
146 | "output_type": "stream",
147 | "text": [
148 | "[[2.66666667 2. ]\n",
149 | " [6. 1.33333333]\n",
150 | " [7. 6. ]]\n"
151 | ]
152 | }
153 | ],
154 | "source": [
155 | "import scipy.sparse as sp\n",
156 | "matrix = sp.csc_matrix([[2, 4], [0, -2], [6, 2]])\n",
157 | "impute = SimpleImputer(missing_values=-1, strategy='mean')\n",
158 | "impute.fit(matrix)\n",
160 | "matrix_test = sp.csc_matrix([[-1, 2], [6, -1], [7, 6]])\n",
161 | "print(impute.transform(matrix_test).toarray())"
162 | ]
163 | },
164 | {
165 | "cell_type": "code",
166 | "execution_count": 4,
167 | "metadata": {},
168 | "outputs": [
169 | {
170 | "name": "stdout",
171 | "output_type": "stream",
172 | "text": [
173 | "[['New York' 'New Delhi']\n",
174 | " ['New York' 'Tokyo']\n",
175 | " ['New York' 'Tokyo']\n",
176 | " ['New York' 'Tokyo']]\n"
177 | ]
178 | }
179 | ],
180 | "source": [
181 | "import pandas as pd\n",
182 | "data_frame = pd.DataFrame([[\"New York\", \"New Delhi\"],[np.nan, \"Tokyo\"],[\"New York\", np.nan],[\"New York\", \"Tokyo\"]], dtype=\"category\")\n",
183 | "\n",
184 | "impute = SimpleImputer(strategy=\"most_frequent\")\n",
185 | "print(impute.fit_transform(data_frame))\n"
186 | ]
187 | },
195 | {
196 | "cell_type": "markdown",
197 | "metadata": {},
198 | "source": [
199 | "## Impute missing values using machine learning"
200 | ]
201 | },
202 | {
203 | "cell_type": "code",
204 | "execution_count": 57,
205 | "metadata": {},
206 | "outputs": [],
207 | "source": [
208 | "import numpy as np\n",
209 | "import pandas as pd"
210 | ]
211 | },
212 | {
213 | "cell_type": "code",
214 | "execution_count": 62,
215 | "metadata": {},
216 | "outputs": [],
217 | "source": [
218 | "missing_dictionary = {'Variable_A': [200, 190, 90, 149, np.nan],\n",
219 | " 'Variable_B': [400, np.nan, 149, 200, 205],\n",
220 | " 'Variable_C': [200,149, np.nan, 155, 165],\n",
221 | " 'Variable_D': [200, np.nan, 90, 149,100],\n",
222 | " 'Variable_E': [200, 190, 90, 149, np.nan],}"
223 | ]
224 | },
225 | {
226 | "cell_type": "code",
227 | "execution_count": 63,
228 | "metadata": {},
229 | "outputs": [],
230 | "source": [
231 | "missing_df = pd.DataFrame(missing_dictionary)"
232 | ]
233 | },
234 | {
235 | "cell_type": "code",
236 | "execution_count": 64,
237 | "metadata": {},
238 | "outputs": [
239 | {
240 | "data": {
312 | "text/plain": [
313 | " Variable_A Variable_B Variable_C Variable_D Variable_E\n",
314 | "0 200.0 400.0 200.0 200.0 200.0\n",
315 | "1 190.0 NaN 149.0 NaN 190.0\n",
316 | "2 90.0 149.0 NaN 90.0 90.0\n",
317 | "3 149.0 200.0 155.0 149.0 149.0\n",
318 | "4 NaN 205.0 165.0 100.0 NaN"
319 | ]
320 | },
321 | "execution_count": 64,
322 | "metadata": {},
323 | "output_type": "execute_result"
324 | }
325 | ],
326 | "source": [
327 | "missing_df"
328 | ]
329 | },
330 | {
331 | "cell_type": "code",
332 | "execution_count": 67,
333 | "metadata": {},
334 | "outputs": [],
335 | "source": [
336 | "from sklearn.impute import KNNImputer"
337 | ]
338 | },
339 | {
340 | "cell_type": "code",
341 | "execution_count": 69,
342 | "metadata": {},
343 | "outputs": [],
344 | "source": [
345 | "missing_imputer = KNNImputer(n_neighbors=2)"
346 | ]
347 | },
348 | {
349 | "cell_type": "code",
350 | "execution_count": 70,
351 | "metadata": {},
352 | "outputs": [],
353 | "source": [
354 | "imputed_df = missing_imputer.fit_transform(missing_df)"
355 | ]
356 | },
357 | {
358 | "cell_type": "code",
359 | "execution_count": 71,
360 | "metadata": {},
361 | "outputs": [
362 | {
363 | "data": {
364 | "text/plain": [
365 | "array([[200. , 400. , 200. , 200. , 200. ],\n",
366 | " [190. , 302.5, 149. , 150. , 190. ],\n",
367 | " [ 90. , 149. , 160. , 90. , 90. ],\n",
368 | " [149. , 200. , 155. , 149. , 149. ],\n",
369 | " [169.5, 205. , 165. , 100. , 169.5]])"
370 | ]
371 | },
372 | "execution_count": 71,
373 | "metadata": {},
374 | "output_type": "execute_result"
375 | }
376 | ],
377 | "source": [
378 | "imputed_df"
379 | ]
380 | },
381 | {
402 | {
403 | "cell_type": "markdown",
404 | "metadata": {},
405 | "source": [
406 | "## Handle data imbalance using SMOTE"
407 | ]
408 | },
409 | {
410 | "cell_type": "code",
411 | "execution_count": 83,
412 | "metadata": {},
413 | "outputs": [],
414 | "source": [
415 | "import pandas as pd\n",
416 | "from imblearn.over_sampling import SMOTE\n",
417 | "\n",
418 | "from imblearn.combine import SMOTETomek"
419 | ]
420 | },
421 | {
422 | "cell_type": "code",
423 | "execution_count": 111,
424 | "metadata": {},
425 | "outputs": [],
426 | "source": [
427 | "# Load the data (creditcard.csv is not included in this repository)\n",
428 | "credit_card_data_set = pd.read_csv('creditcard.csv')\n"
429 | ]
430 | },
431 | {
432 | "cell_type": "code",
433 | "execution_count": 127,
434 | "metadata": {},
435 | "outputs": [],
436 | "source": [
437 | "X = credit_card_data_set.iloc[:,:-1]\n",
438 | "y = credit_card_data_set.iloc[:,-1].map({1:'Fraud', 0:'No Fraud'})\n",
439 | "\n",
440 | "# Resample data\n",
441 | "X_resampled, y_resampled = SMOTE(sampling_strategy={\"Fraud\":500}).fit_resample(X, y)\n",
442 | "X_resampled = pd.DataFrame(X_resampled, columns=X.columns)"
443 | ]
444 | },
445 | {
446 | "cell_type": "code",
447 | "execution_count": 120,
448 | "metadata": {},
449 | "outputs": [],
450 | "source": [
451 | "class_0_original = len(credit_card_data_set[credit_card_data_set.Class==0])"
452 | ]
453 | },
454 | {
455 | "cell_type": "code",
456 | "execution_count": 121,
457 | "metadata": {},
458 | "outputs": [],
459 | "source": [
460 | "class_1_original = len(credit_card_data_set[credit_card_data_set.Class==1])"
461 | ]
462 | },
463 | {
464 | "cell_type": "code",
465 | "execution_count": 123,
466 | "metadata": {},
467 | "outputs": [
468 | {
469 | "name": "stdout",
470 | "output_type": "stream",
471 | "text": [
472 | "0.001727485630620034\n"
473 | ]
474 | }
475 | ],
476 | "source": [
477 | "print(class_1_original/(class_0_original+class_1_original))"
478 | ]
479 | },
480 | {
481 | "cell_type": "code",
482 | "execution_count": 128,
483 | "metadata": {},
484 | "outputs": [],
485 | "source": [
486 | "sampled_0 = len(y_resampled[y_resampled=='No Fraud'])\n"
487 | ]
488 | },
489 | {
490 | "cell_type": "code",
491 | "execution_count": 129,
492 | "metadata": {},
493 | "outputs": [],
494 | "source": [
495 | "sampled_1 = len(y_resampled[y_resampled=='Fraud'])"
496 | ]
497 | },
498 | {
499 | "cell_type": "code",
500 | "execution_count": 130,
501 | "metadata": {},
502 | "outputs": [
503 | {
504 | "name": "stdout",
505 | "output_type": "stream",
506 | "text": [
507 | "0.5\n"
508 | ]
509 | }
510 | ],
511 | "source": [
512 | "print(sampled_1/(sampled_0+sampled_1))  # 0.5 means fully balanced classes; with sampling_strategy={'Fraud': 500} the ratio stays far below 0.5"
513 | ]
514 | }
515 | ],
516 | "metadata": {
517 | "kernelspec": {
518 | "display_name": "Python 3",
519 | "language": "python",
520 | "name": "python3"
521 | },
522 | "language_info": {
523 | "codemirror_mode": {
524 | "name": "ipython",
525 | "version": 3
526 | },
527 | "file_extension": ".py",
528 | "mimetype": "text/x-python",
529 | "name": "python",
530 | "nbconvert_exporter": "python",
531 | "pygments_lexer": "ipython3",
532 | "version": "3.7.4"
533 | }
534 | },
535 | "nbformat": 4,
536 | "nbformat_minor": 2
537 | }
538 |
--------------------------------------------------------------------------------
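The `KNNImputer(n_neighbors=2)` cell in Chapter5.ipynb fills each gap with the mean of that column over the two nearest rows. The following is a minimal from-scratch sketch, not scikit-learn's implementation, that re-derives the stored `imputed_df` so the numbers can be checked by hand. It assumes uniform weights and the nan-Euclidean distance (squared differences over the coordinates both rows share, scaled by total/shared coordinate count); `nan_euclidean` and `knn_impute` are hypothetical helper names, and ties are broken by row order, which may differ from scikit-learn.

```python
import numpy as np


def nan_euclidean(a, b):
    """Euclidean distance over coordinates present in both rows,
    scaled up by (total coordinates / shared coordinates)."""
    shared = ~np.isnan(a) & ~np.isnan(b)
    if not shared.any():
        return np.inf
    squared = np.sum((a[shared] - b[shared]) ** 2)
    return np.sqrt(squared * a.size / shared.sum())


def knn_impute(X, n_neighbors=2):
    """Hypothetical helper: replace each NaN with the uniform mean of the
    n_neighbors closest rows that have a value in that column."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    for i, row in enumerate(X):
        for j in np.flatnonzero(np.isnan(row)):
            # donors: other rows that actually observed column j
            donors = [k for k in range(len(X)) if k != i and not np.isnan(X[k, j])]
            # sort donors by distance computed on the original (unimputed) rows
            donors.sort(key=lambda k: nan_euclidean(row, X[k]))
            out[i, j] = X[donors[:n_neighbors], j].mean()
    return out


# Same values as missing_dictionary in the notebook (rows x Variable_A..E)
missing = np.array([
    [200, 400, 200, 200, 200],
    [190, np.nan, 149, np.nan, 190],
    [90, 149, np.nan, 90, 90],
    [149, 200, 155, 149, 149],
    [np.nan, 205, 165, 100, np.nan],
])

imputed = knn_impute(missing, n_neighbors=2)
print(imputed)
```

For example, row 1 shares only Variable_C with row 4 (distance sqrt(16² × 5/1) ≈ 35.8) and Variable_A, C, E with row 0 (≈ 68.3); those are its two nearest donors, so its missing Variable_B becomes (205 + 400) / 2 = 302.5, as in the notebook output.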
/Chapter 5/IRIS.csv:
--------------------------------------------------------------------------------
1 | sepal_length,sepal_width,petal_length,petal_width,species
2 | 5.1,3.5,1.4,0.2,Iris-setosa
3 | 4.9,3,1.4,0.2,Iris-setosa
4 | 4.7,3.2,1.3,0.2,Iris-setosa
5 | 4.6,3.1,1.5,0.2,Iris-setosa
6 | 5,3.6,1.4,0.2,Iris-setosa
7 | 5.4,3.9,1.7,0.4,Iris-setosa
8 | 4.6,3.4,1.4,0.3,Iris-setosa
9 | 5,3.4,1.5,0.2,Iris-setosa
10 | 4.4,2.9,1.4,0.2,Iris-setosa
11 | 4.9,3.1,1.5,0.1,Iris-setosa
12 | 5.4,3.7,1.5,0.2,Iris-setosa
13 | 4.8,3.4,1.6,0.2,Iris-setosa
14 | 4.8,3,1.4,0.1,Iris-setosa
15 | 4.3,3,1.1,0.1,Iris-setosa
16 | 5.8,4,1.2,0.2,Iris-setosa
17 | 5.7,4.4,1.5,0.4,Iris-setosa
18 | 5.4,3.9,1.3,0.4,Iris-setosa
19 | 5.1,3.5,1.4,0.3,Iris-setosa
20 | 5.7,3.8,1.7,0.3,Iris-setosa
21 | 5.1,3.8,1.5,0.3,Iris-setosa
22 | 5.4,3.4,1.7,0.2,Iris-setosa
23 | 5.1,3.7,1.5,0.4,Iris-setosa
24 | 4.6,3.6,1,0.2,Iris-setosa
25 | 5.1,3.3,1.7,0.5,Iris-setosa
26 | 4.8,3.4,1.9,0.2,Iris-setosa
27 | 5,3,1.6,0.2,Iris-setosa
28 | 5,3.4,1.6,0.4,Iris-setosa
29 | 5.2,3.5,1.5,0.2,Iris-setosa
30 | 5.2,3.4,1.4,0.2,Iris-setosa
31 | 4.7,3.2,1.6,0.2,Iris-setosa
32 | 4.8,3.1,1.6,0.2,Iris-setosa
33 | 5.4,3.4,1.5,0.4,Iris-setosa
34 | 5.2,4.1,1.5,0.1,Iris-setosa
35 | 5.5,4.2,1.4,0.2,Iris-setosa
36 | 4.9,3.1,1.5,0.1,Iris-setosa
37 | 5,3.2,1.2,0.2,Iris-setosa
38 | 5.5,3.5,1.3,0.2,Iris-setosa
39 | 4.9,3.1,1.5,0.1,Iris-setosa
40 | 4.4,3,1.3,0.2,Iris-setosa
41 | 5.1,3.4,1.5,0.2,Iris-setosa
42 | 5,3.5,1.3,0.3,Iris-setosa
43 | 4.5,2.3,1.3,0.3,Iris-setosa
44 | 4.4,3.2,1.3,0.2,Iris-setosa
45 | 5,3.5,1.6,0.6,Iris-setosa
46 | 5.1,3.8,1.9,0.4,Iris-setosa
47 | 4.8,3,1.4,0.3,Iris-setosa
48 | 5.1,3.8,1.6,0.2,Iris-setosa
49 | 4.6,3.2,1.4,0.2,Iris-setosa
50 | 5.3,3.7,1.5,0.2,Iris-setosa
51 | 5,3.3,1.4,0.2,Iris-setosa
52 | 7,3.2,4.7,1.4,Iris-versicolor
53 | 6.4,3.2,4.5,1.5,Iris-versicolor
54 | 6.9,3.1,4.9,1.5,Iris-versicolor
55 | 5.5,2.3,4,1.3,Iris-versicolor
56 | 6.5,2.8,4.6,1.5,Iris-versicolor
57 | 5.7,2.8,4.5,1.3,Iris-versicolor
58 | 6.3,3.3,4.7,1.6,Iris-versicolor
59 | 4.9,2.4,3.3,1,Iris-versicolor
60 | 6.6,2.9,4.6,1.3,Iris-versicolor
61 | 5.2,2.7,3.9,1.4,Iris-versicolor
62 | 5,2,3.5,1,Iris-versicolor
63 | 5.9,3,4.2,1.5,Iris-versicolor
64 | 6,2.2,4,1,Iris-versicolor
65 | 6.1,2.9,4.7,1.4,Iris-versicolor
66 | 5.6,2.9,3.6,1.3,Iris-versicolor
67 | 6.7,3.1,4.4,1.4,Iris-versicolor
68 | 5.6,3,4.5,1.5,Iris-versicolor
69 | 5.8,2.7,4.1,1,Iris-versicolor
70 | 6.2,2.2,4.5,1.5,Iris-versicolor
71 | 5.6,2.5,3.9,1.1,Iris-versicolor
72 | 5.9,3.2,4.8,1.8,Iris-versicolor
73 | 6.1,2.8,4,1.3,Iris-versicolor
74 | 6.3,2.5,4.9,1.5,Iris-versicolor
75 | 6.1,2.8,4.7,1.2,Iris-versicolor
76 | 6.4,2.9,4.3,1.3,Iris-versicolor
77 | 6.6,3,4.4,1.4,Iris-versicolor
78 | 6.8,2.8,4.8,1.4,Iris-versicolor
79 | 6.7,3,5,1.7,Iris-versicolor
80 | 6,2.9,4.5,1.5,Iris-versicolor
81 | 5.7,2.6,3.5,1,Iris-versicolor
82 | 5.5,2.4,3.8,1.1,Iris-versicolor
83 | 5.5,2.4,3.7,1,Iris-versicolor
84 | 5.8,2.7,3.9,1.2,Iris-versicolor
85 | 6,2.7,5.1,1.6,Iris-versicolor
86 | 5.4,3,4.5,1.5,Iris-versicolor
87 | 6,3.4,4.5,1.6,Iris-versicolor
88 | 6.7,3.1,4.7,1.5,Iris-versicolor
89 | 6.3,2.3,4.4,1.3,Iris-versicolor
90 | 5.6,3,4.1,1.3,Iris-versicolor
91 | 5.5,2.5,4,1.3,Iris-versicolor
92 | 5.5,2.6,4.4,1.2,Iris-versicolor
93 | 6.1,3,4.6,1.4,Iris-versicolor
94 | 5.8,2.6,4,1.2,Iris-versicolor
95 | 5,2.3,3.3,1,Iris-versicolor
96 | 5.6,2.7,4.2,1.3,Iris-versicolor
97 | 5.7,3,4.2,1.2,Iris-versicolor
98 | 5.7,2.9,4.2,1.3,Iris-versicolor
99 | 6.2,2.9,4.3,1.3,Iris-versicolor
100 | 5.1,2.5,3,1.1,Iris-versicolor
101 | 5.7,2.8,4.1,1.3,Iris-versicolor
102 | 6.3,3.3,6,2.5,Iris-virginica
103 | 5.8,2.7,5.1,1.9,Iris-virginica
104 | 7.1,3,5.9,2.1,Iris-virginica
105 | 6.3,2.9,5.6,1.8,Iris-virginica
106 | 6.5,3,5.8,2.2,Iris-virginica
107 | 7.6,3,6.6,2.1,Iris-virginica
108 | 4.9,2.5,4.5,1.7,Iris-virginica
109 | 7.3,2.9,6.3,1.8,Iris-virginica
110 | 6.7,2.5,5.8,1.8,Iris-virginica
111 | 7.2,3.6,6.1,2.5,Iris-virginica
112 | 6.5,3.2,5.1,2,Iris-virginica
113 | 6.4,2.7,5.3,1.9,Iris-virginica
114 | 6.8,3,5.5,2.1,Iris-virginica
115 | 5.7,2.5,5,2,Iris-virginica
116 | 5.8,2.8,5.1,2.4,Iris-virginica
117 | 6.4,3.2,5.3,2.3,Iris-virginica
118 | 6.5,3,5.5,1.8,Iris-virginica
119 | 7.7,3.8,6.7,2.2,Iris-virginica
120 | 7.7,2.6,6.9,2.3,Iris-virginica
121 | 6,2.2,5,1.5,Iris-virginica
122 | 6.9,3.2,5.7,2.3,Iris-virginica
123 | 5.6,2.8,4.9,2,Iris-virginica
124 | 7.7,2.8,6.7,2,Iris-virginica
125 | 6.3,2.7,4.9,1.8,Iris-virginica
126 | 6.7,3.3,5.7,2.1,Iris-virginica
127 | 7.2,3.2,6,1.8,Iris-virginica
128 | 6.2,2.8,4.8,1.8,Iris-virginica
129 | 6.1,3,4.9,1.8,Iris-virginica
130 | 6.4,2.8,5.6,2.1,Iris-virginica
131 | 7.2,3,5.8,1.6,Iris-virginica
132 | 7.4,2.8,6.1,1.9,Iris-virginica
133 | 7.9,3.8,6.4,2,Iris-virginica
134 | 6.4,2.8,5.6,2.2,Iris-virginica
135 | 6.3,2.8,5.1,1.5,Iris-virginica
136 | 6.1,2.6,5.6,1.4,Iris-virginica
137 | 7.7,3,6.1,2.3,Iris-virginica
138 | 6.3,3.4,5.6,2.4,Iris-virginica
139 | 6.4,3.1,5.5,1.8,Iris-virginica
140 | 6,3,4.8,1.8,Iris-virginica
141 | 6.9,3.1,5.4,2.1,Iris-virginica
142 | 6.7,3.1,5.6,2.4,Iris-virginica
143 | 6.9,3.1,5.1,2.3,Iris-virginica
144 | 5.8,2.7,5.1,1.9,Iris-virginica
145 | 6.8,3.2,5.9,2.3,Iris-virginica
146 | 6.7,3.3,5.7,2.5,Iris-virginica
147 | 6.7,3,5.2,2.3,Iris-virginica
148 | 6.3,2.5,5,1.9,Iris-virginica
149 | 6.5,3,5.2,2,Iris-virginica
150 | 6.2,3.4,5.4,2.3,Iris-virginica
151 | 5.9,3,5.1,1.8,Iris-virginica
152 |
--------------------------------------------------------------------------------
/Chapter 5/ReadMe:
--------------------------------------------------------------------------------
1 |
2 |
3 | The last chapter of the book. All the Python Jupyter notebooks and datasets for this chapter are committed here; all the code is written as Jupyter notebooks. Happy coding.
4 |
--------------------------------------------------------------------------------
/Chapter2/ReadMe:
--------------------------------------------------------------------------------
1 |
2 | The second chapter of the book. All the Python Jupyter notebooks and datasets for this chapter are committed here; all the code is written as Jupyter notebooks. Happy coding.
3 |
--------------------------------------------------------------------------------
/Chapter2/auto-mpg.csv:
--------------------------------------------------------------------------------
1 | mpg,cylinders,displacement,horsepower,weight,acceleration,model year,origin,car name
2 | 18,8,307,130,3504,12,70,1,chevrolet chevelle malibu
3 | 15,8,350,165,3693,11.5,70,1,buick skylark 320
4 | 18,8,318,150,3436,11,70,1,plymouth satellite
5 | 16,8,304,150,3433,12,70,1,amc rebel sst
6 | 17,8,302,140,3449,10.5,70,1,ford torino
7 | 15,8,429,198,4341,10,70,1,ford galaxie 500
8 | 14,8,454,220,4354,9,70,1,chevrolet impala
9 | 14,8,440,215,4312,8.5,70,1,plymouth fury iii
10 | 14,8,455,225,4425,10,70,1,pontiac catalina
11 | 15,8,390,190,3850,8.5,70,1,amc ambassador dpl
12 | 15,8,383,170,3563,10,70,1,dodge challenger se
13 | 14,8,340,160,3609,8,70,1,plymouth 'cuda 340
14 | 15,8,400,150,3761,9.5,70,1,chevrolet monte carlo
15 | 14,8,455,225,3086,10,70,1,buick estate wagon (sw)
16 | 24,4,113,95,2372,15,70,3,toyota corona mark ii
17 | 22,6,198,95,2833,15.5,70,1,plymouth duster
18 | 18,6,199,97,2774,15.5,70,1,amc hornet
19 | 21,6,200,85,2587,16,70,1,ford maverick
20 | 27,4,97,88,2130,14.5,70,3,datsun pl510
21 | 26,4,97,46,1835,20.5,70,2,volkswagen 1131 deluxe sedan
22 | 25,4,110,87,2672,17.5,70,2,peugeot 504
23 | 24,4,107,90,2430,14.5,70,2,audi 100 ls
24 | 25,4,104,95,2375,17.5,70,2,saab 99e
25 | 26,4,121,113,2234,12.5,70,2,bmw 2002
26 | 21,6,199,90,2648,15,70,1,amc gremlin
27 | 10,8,360,215,4615,14,70,1,ford f250
28 | 10,8,307,200,4376,15,70,1,chevy c20
29 | 11,8,318,210,4382,13.5,70,1,dodge d200
30 | 9,8,304,193,4732,18.5,70,1,hi 1200d
31 | 27,4,97,88,2130,14.5,71,3,datsun pl510
32 | 28,4,140,90,2264,15.5,71,1,chevrolet vega 2300
33 | 25,4,113,95,2228,14,71,3,toyota corona
34 | 25,4,98,?,2046,19,71,1,ford pinto
35 | 19,6,232,100,2634,13,71,1,amc gremlin
36 | 16,6,225,105,3439,15.5,71,1,plymouth satellite custom
37 | 17,6,250,100,3329,15.5,71,1,chevrolet chevelle malibu
38 | 19,6,250,88,3302,15.5,71,1,ford torino 500
39 | 18,6,232,100,3288,15.5,71,1,amc matador
40 | 14,8,350,165,4209,12,71,1,chevrolet impala
41 | 14,8,400,175,4464,11.5,71,1,pontiac catalina brougham
42 | 14,8,351,153,4154,13.5,71,1,ford galaxie 500
43 | 14,8,318,150,4096,13,71,1,plymouth fury iii
44 | 12,8,383,180,4955,11.5,71,1,dodge monaco (sw)
45 | 13,8,400,170,4746,12,71,1,ford country squire (sw)
46 | 13,8,400,175,5140,12,71,1,pontiac safari (sw)
47 | 18,6,258,110,2962,13.5,71,1,amc hornet sportabout (sw)
48 | 22,4,140,72,2408,19,71,1,chevrolet vega (sw)
49 | 19,6,250,100,3282,15,71,1,pontiac firebird
50 | 18,6,250,88,3139,14.5,71,1,ford mustang
51 | 23,4,122,86,2220,14,71,1,mercury capri 2000
52 | 28,4,116,90,2123,14,71,2,opel 1900
53 | 30,4,79,70,2074,19.5,71,2,peugeot 304
54 | 30,4,88,76,2065,14.5,71,2,fiat 124b
55 | 31,4,71,65,1773,19,71,3,toyota corolla 1200
56 | 35,4,72,69,1613,18,71,3,datsun 1200
57 | 27,4,97,60,1834,19,71,2,volkswagen model 111
58 | 26,4,91,70,1955,20.5,71,1,plymouth cricket
59 | 24,4,113,95,2278,15.5,72,3,toyota corona hardtop
60 | 25,4,97.5,80,2126,17,72,1,dodge colt hardtop
61 | 23,4,97,54,2254,23.5,72,2,volkswagen type 3
62 | 20,4,140,90,2408,19.5,72,1,chevrolet vega
63 | 21,4,122,86,2226,16.5,72,1,ford pinto runabout
64 | 13,8,350,165,4274,12,72,1,chevrolet impala
65 | 14,8,400,175,4385,12,72,1,pontiac catalina
66 | 15,8,318,150,4135,13.5,72,1,plymouth fury iii
67 | 14,8,351,153,4129,13,72,1,ford galaxie 500
68 | 17,8,304,150,3672,11.5,72,1,amc ambassador sst
69 | 11,8,429,208,4633,11,72,1,mercury marquis
70 | 13,8,350,155,4502,13.5,72,1,buick lesabre custom
71 | 12,8,350,160,4456,13.5,72,1,oldsmobile delta 88 royale
72 | 13,8,400,190,4422,12.5,72,1,chrysler newport royal
73 | 19,3,70,97,2330,13.5,72,3,mazda rx2 coupe
74 | 15,8,304,150,3892,12.5,72,1,amc matador (sw)
75 | 13,8,307,130,4098,14,72,1,chevrolet chevelle concours (sw)
76 | 13,8,302,140,4294,16,72,1,ford gran torino (sw)
77 | 14,8,318,150,4077,14,72,1,plymouth satellite custom (sw)
78 | 18,4,121,112,2933,14.5,72,2,volvo 145e (sw)
79 | 22,4,121,76,2511,18,72,2,volkswagen 411 (sw)
80 | 21,4,120,87,2979,19.5,72,2,peugeot 504 (sw)
81 | 26,4,96,69,2189,18,72,2,renault 12 (sw)
82 | 22,4,122,86,2395,16,72,1,ford pinto (sw)
83 | 28,4,97,92,2288,17,72,3,datsun 510 (sw)
84 | 23,4,120,97,2506,14.5,72,3,toyouta corona mark ii (sw)
85 | 28,4,98,80,2164,15,72,1,dodge colt (sw)
86 | 27,4,97,88,2100,16.5,72,3,toyota corolla 1600 (sw)
87 | 13,8,350,175,4100,13,73,1,buick century 350
88 | 14,8,304,150,3672,11.5,73,1,amc matador
89 | 13,8,350,145,3988,13,73,1,chevrolet malibu
90 | 14,8,302,137,4042,14.5,73,1,ford gran torino
91 | 15,8,318,150,3777,12.5,73,1,dodge coronet custom
92 | 12,8,429,198,4952,11.5,73,1,mercury marquis brougham
93 | 13,8,400,150,4464,12,73,1,chevrolet caprice classic
94 | 13,8,351,158,4363,13,73,1,ford ltd
95 | 14,8,318,150,4237,14.5,73,1,plymouth fury gran sedan
96 | 13,8,440,215,4735,11,73,1,chrysler new yorker brougham
97 | 12,8,455,225,4951,11,73,1,buick electra 225 custom
98 | 13,8,360,175,3821,11,73,1,amc ambassador brougham
99 | 18,6,225,105,3121,16.5,73,1,plymouth valiant
100 | 16,6,250,100,3278,18,73,1,chevrolet nova custom
101 | 18,6,232,100,2945,16,73,1,amc hornet
102 | 18,6,250,88,3021,16.5,73,1,ford maverick
103 | 23,6,198,95,2904,16,73,1,plymouth duster
104 | 26,4,97,46,1950,21,73,2,volkswagen super beetle
105 | 11,8,400,150,4997,14,73,1,chevrolet impala
106 | 12,8,400,167,4906,12.5,73,1,ford country
107 | 13,8,360,170,4654,13,73,1,plymouth custom suburb
108 | 12,8,350,180,4499,12.5,73,1,oldsmobile vista cruiser
109 | 18,6,232,100,2789,15,73,1,amc gremlin
110 | 20,4,97,88,2279,19,73,3,toyota carina
111 | 21,4,140,72,2401,19.5,73,1,chevrolet vega
112 | 22,4,108,94,2379,16.5,73,3,datsun 610
113 | 18,3,70,90,2124,13.5,73,3,maxda rx3
114 | 19,4,122,85,2310,18.5,73,1,ford pinto
115 | 21,6,155,107,2472,14,73,1,mercury capri v6
116 | 26,4,98,90,2265,15.5,73,2,fiat 124 sport coupe
117 | 15,8,350,145,4082,13,73,1,chevrolet monte carlo s
118 | 16,8,400,230,4278,9.5,73,1,pontiac grand prix
119 | 29,4,68,49,1867,19.5,73,2,fiat 128
120 | 24,4,116,75,2158,15.5,73,2,opel manta
121 | 20,4,114,91,2582,14,73,2,audi 100ls
122 | 19,4,121,112,2868,15.5,73,2,volvo 144ea
123 | 15,8,318,150,3399,11,73,1,dodge dart custom
124 | 24,4,121,110,2660,14,73,2,saab 99le
125 | 20,6,156,122,2807,13.5,73,3,toyota mark ii
126 | 11,8,350,180,3664,11,73,1,oldsmobile omega
127 | 20,6,198,95,3102,16.5,74,1,plymouth duster
128 | 21,6,200,?,2875,17,74,1,ford maverick
129 | 19,6,232,100,2901,16,74,1,amc hornet
130 | 15,6,250,100,3336,17,74,1,chevrolet nova
131 | 31,4,79,67,1950,19,74,3,datsun b210
132 | 26,4,122,80,2451,16.5,74,1,ford pinto
133 | 32,4,71,65,1836,21,74,3,toyota corolla 1200
134 | 25,4,140,75,2542,17,74,1,chevrolet vega
135 | 16,6,250,100,3781,17,74,1,chevrolet chevelle malibu classic
136 | 16,6,258,110,3632,18,74,1,amc matador
137 | 18,6,225,105,3613,16.5,74,1,plymouth satellite sebring
138 | 16,8,302,140,4141,14,74,1,ford gran torino
139 | 13,8,350,150,4699,14.5,74,1,buick century luxus (sw)
140 | 14,8,318,150,4457,13.5,74,1,dodge coronet custom (sw)
141 | 14,8,302,140,4638,16,74,1,ford gran torino (sw)
142 | 14,8,304,150,4257,15.5,74,1,amc matador (sw)
143 | 29,4,98,83,2219,16.5,74,2,audi fox
144 | 26,4,79,67,1963,15.5,74,2,volkswagen dasher
145 | 26,4,97,78,2300,14.5,74,2,opel manta
146 | 31,4,76,52,1649,16.5,74,3,toyota corona
147 | 32,4,83,61,2003,19,74,3,datsun 710
148 | 28,4,90,75,2125,14.5,74,1,dodge colt
149 | 24,4,90,75,2108,15.5,74,2,fiat 128
150 | 26,4,116,75,2246,14,74,2,fiat 124 tc
151 | 24,4,120,97,2489,15,74,3,honda civic
152 | 26,4,108,93,2391,15.5,74,3,subaru
153 | 31,4,79,67,2000,16,74,2,fiat x1.9
154 | 19,6,225,95,3264,16,75,1,plymouth valiant custom
155 | 18,6,250,105,3459,16,75,1,chevrolet nova
156 | 15,6,250,72,3432,21,75,1,mercury monarch
157 | 15,6,250,72,3158,19.5,75,1,ford maverick
158 | 16,8,400,170,4668,11.5,75,1,pontiac catalina
159 | 15,8,350,145,4440,14,75,1,chevrolet bel air
160 | 16,8,318,150,4498,14.5,75,1,plymouth grand fury
161 | 14,8,351,148,4657,13.5,75,1,ford ltd
162 | 17,6,231,110,3907,21,75,1,buick century
163 | 16,6,250,105,3897,18.5,75,1,chevroelt chevelle malibu
164 | 15,6,258,110,3730,19,75,1,amc matador
165 | 18,6,225,95,3785,19,75,1,plymouth fury
166 | 21,6,231,110,3039,15,75,1,buick skyhawk
167 | 20,8,262,110,3221,13.5,75,1,chevrolet monza 2+2
168 | 13,8,302,129,3169,12,75,1,ford mustang ii
169 | 29,4,97,75,2171,16,75,3,toyota corolla
170 | 23,4,140,83,2639,17,75,1,ford pinto
171 | 20,6,232,100,2914,16,75,1,amc gremlin
172 | 23,4,140,78,2592,18.5,75,1,pontiac astro
173 | 24,4,134,96,2702,13.5,75,3,toyota corona
174 | 25,4,90,71,2223,16.5,75,2,volkswagen dasher
175 | 24,4,119,97,2545,17,75,3,datsun 710
176 | 18,6,171,97,2984,14.5,75,1,ford pinto
177 | 29,4,90,70,1937,14,75,2,volkswagen rabbit
178 | 19,6,232,90,3211,17,75,1,amc pacer
179 | 23,4,115,95,2694,15,75,2,audi 100ls
180 | 23,4,120,88,2957,17,75,2,peugeot 504
181 | 22,4,121,98,2945,14.5,75,2,volvo 244dl
182 | 25,4,121,115,2671,13.5,75,2,saab 99le
183 | 33,4,91,53,1795,17.5,75,3,honda civic cvcc
184 | 28,4,107,86,2464,15.5,76,2,fiat 131
185 | 25,4,116,81,2220,16.9,76,2,opel 1900
186 | 25,4,140,92,2572,14.9,76,1,capri ii
187 | 26,4,98,79,2255,17.7,76,1,dodge colt
188 | 27,4,101,83,2202,15.3,76,2,renault 12tl
189 | 17.5,8,305,140,4215,13,76,1,chevrolet chevelle malibu classic
190 | 16,8,318,150,4190,13,76,1,dodge coronet brougham
191 | 15.5,8,304,120,3962,13.9,76,1,amc matador
192 | 14.5,8,351,152,4215,12.8,76,1,ford gran torino
193 | 22,6,225,100,3233,15.4,76,1,plymouth valiant
194 | 22,6,250,105,3353,14.5,76,1,chevrolet nova
195 | 24,6,200,81,3012,17.6,76,1,ford maverick
196 | 22.5,6,232,90,3085,17.6,76,1,amc hornet
197 | 29,4,85,52,2035,22.2,76,1,chevrolet chevette
198 | 24.5,4,98,60,2164,22.1,76,1,chevrolet woody
199 | 29,4,90,70,1937,14.2,76,2,vw rabbit
200 | 33,4,91,53,1795,17.4,76,3,honda civic
201 | 20,6,225,100,3651,17.7,76,1,dodge aspen se
202 | 18,6,250,78,3574,21,76,1,ford granada ghia
203 | 18.5,6,250,110,3645,16.2,76,1,pontiac ventura sj
204 | 17.5,6,258,95,3193,17.8,76,1,amc pacer d/l
205 | 29.5,4,97,71,1825,12.2,76,2,volkswagen rabbit
206 | 32,4,85,70,1990,17,76,3,datsun b-210
207 | 28,4,97,75,2155,16.4,76,3,toyota corolla
208 | 26.5,4,140,72,2565,13.6,76,1,ford pinto
209 | 20,4,130,102,3150,15.7,76,2,volvo 245
210 | 13,8,318,150,3940,13.2,76,1,plymouth volare premier v8
211 | 19,4,120,88,3270,21.9,76,2,peugeot 504
212 | 19,6,156,108,2930,15.5,76,3,toyota mark ii
213 | 16.5,6,168,120,3820,16.7,76,2,mercedes-benz 280s
214 | 16.5,8,350,180,4380,12.1,76,1,cadillac seville
215 | 13,8,350,145,4055,12,76,1,chevy c10
216 | 13,8,302,130,3870,15,76,1,ford f108
217 | 13,8,318,150,3755,14,76,1,dodge d100
218 | 31.5,4,98,68,2045,18.5,77,3,honda accord cvcc
219 | 30,4,111,80,2155,14.8,77,1,buick opel isuzu deluxe
220 | 36,4,79,58,1825,18.6,77,2,renault 5 gtl
221 | 25.5,4,122,96,2300,15.5,77,1,plymouth arrow gs
222 | 33.5,4,85,70,1945,16.8,77,3,datsun f-10 hatchback
223 | 17.5,8,305,145,3880,12.5,77,1,chevrolet caprice classic
224 | 17,8,260,110,4060,19,77,1,oldsmobile cutlass supreme
225 | 15.5,8,318,145,4140,13.7,77,1,dodge monaco brougham
226 | 15,8,302,130,4295,14.9,77,1,mercury cougar brougham
227 | 17.5,6,250,110,3520,16.4,77,1,chevrolet concours
228 | 20.5,6,231,105,3425,16.9,77,1,buick skylark
229 | 19,6,225,100,3630,17.7,77,1,plymouth volare custom
230 | 18.5,6,250,98,3525,19,77,1,ford granada
231 | 16,8,400,180,4220,11.1,77,1,pontiac grand prix lj
232 | 15.5,8,350,170,4165,11.4,77,1,chevrolet monte carlo landau
233 | 15.5,8,400,190,4325,12.2,77,1,chrysler cordoba
234 | 16,8,351,149,4335,14.5,77,1,ford thunderbird
235 | 29,4,97,78,1940,14.5,77,2,volkswagen rabbit custom
236 | 24.5,4,151,88,2740,16,77,1,pontiac sunbird coupe
237 | 26,4,97,75,2265,18.2,77,3,toyota corolla liftback
238 | 25.5,4,140,89,2755,15.8,77,1,ford mustang ii 2+2
239 | 30.5,4,98,63,2051,17,77,1,chevrolet chevette
240 | 33.5,4,98,83,2075,15.9,77,1,dodge colt m/m
241 | 30,4,97,67,1985,16.4,77,3,subaru dl
242 | 30.5,4,97,78,2190,14.1,77,2,volkswagen dasher
243 | 22,6,146,97,2815,14.5,77,3,datsun 810
244 | 21.5,4,121,110,2600,12.8,77,2,bmw 320i
245 | 21.5,3,80,110,2720,13.5,77,3,mazda rx-4
246 | 43.1,4,90,48,1985,21.5,78,2,volkswagen rabbit custom diesel
247 | 36.1,4,98,66,1800,14.4,78,1,ford fiesta
248 | 32.8,4,78,52,1985,19.4,78,3,mazda glc deluxe
249 | 39.4,4,85,70,2070,18.6,78,3,datsun b210 gx
250 | 36.1,4,91,60,1800,16.4,78,3,honda civic cvcc
251 | 19.9,8,260,110,3365,15.5,78,1,oldsmobile cutlass salon brougham
252 | 19.4,8,318,140,3735,13.2,78,1,dodge diplomat
253 | 20.2,8,302,139,3570,12.8,78,1,mercury monarch ghia
254 | 19.2,6,231,105,3535,19.2,78,1,pontiac phoenix lj
255 | 20.5,6,200,95,3155,18.2,78,1,chevrolet malibu
256 | 20.2,6,200,85,2965,15.8,78,1,ford fairmont (auto)
257 | 25.1,4,140,88,2720,15.4,78,1,ford fairmont (man)
258 | 20.5,6,225,100,3430,17.2,78,1,plymouth volare
259 | 19.4,6,232,90,3210,17.2,78,1,amc concord
260 | 20.6,6,231,105,3380,15.8,78,1,buick century special
261 | 20.8,6,200,85,3070,16.7,78,1,mercury zephyr
262 | 18.6,6,225,110,3620,18.7,78,1,dodge aspen
263 | 18.1,6,258,120,3410,15.1,78,1,amc concord d/l
264 | 19.2,8,305,145,3425,13.2,78,1,chevrolet monte carlo landau
265 | 17.7,6,231,165,3445,13.4,78,1,buick regal sport coupe (turbo)
266 | 18.1,8,302,139,3205,11.2,78,1,ford futura
267 | 17.5,8,318,140,4080,13.7,78,1,dodge magnum xe
268 | 30,4,98,68,2155,16.5,78,1,chevrolet chevette
269 | 27.5,4,134,95,2560,14.2,78,3,toyota corona
270 | 27.2,4,119,97,2300,14.7,78,3,datsun 510
271 | 30.9,4,105,75,2230,14.5,78,1,dodge omni
272 | 21.1,4,134,95,2515,14.8,78,3,toyota celica gt liftback
273 | 23.2,4,156,105,2745,16.7,78,1,plymouth sapporo
274 | 23.8,4,151,85,2855,17.6,78,1,oldsmobile starfire sx
275 | 23.9,4,119,97,2405,14.9,78,3,datsun 200-sx
276 | 20.3,5,131,103,2830,15.9,78,2,audi 5000
277 | 17,6,163,125,3140,13.6,78,2,volvo 264gl
278 | 21.6,4,121,115,2795,15.7,78,2,saab 99gle
279 | 16.2,6,163,133,3410,15.8,78,2,peugeot 604sl
280 | 31.5,4,89,71,1990,14.9,78,2,volkswagen scirocco
281 | 29.5,4,98,68,2135,16.6,78,3,honda accord lx
282 | 21.5,6,231,115,3245,15.4,79,1,pontiac lemans v6
283 | 19.8,6,200,85,2990,18.2,79,1,mercury zephyr 6
284 | 22.3,4,140,88,2890,17.3,79,1,ford fairmont 4
285 | 20.2,6,232,90,3265,18.2,79,1,amc concord dl 6
286 | 20.6,6,225,110,3360,16.6,79,1,dodge aspen 6
287 | 17,8,305,130,3840,15.4,79,1,chevrolet caprice classic
288 | 17.6,8,302,129,3725,13.4,79,1,ford ltd landau
289 | 16.5,8,351,138,3955,13.2,79,1,mercury grand marquis
290 | 18.2,8,318,135,3830,15.2,79,1,dodge st. regis
291 | 16.9,8,350,155,4360,14.9,79,1,buick estate wagon (sw)
292 | 15.5,8,351,142,4054,14.3,79,1,ford country squire (sw)
293 | 19.2,8,267,125,3605,15,79,1,chevrolet malibu classic (sw)
294 | 18.5,8,360,150,3940,13,79,1,chrysler lebaron town @ country (sw)
295 | 31.9,4,89,71,1925,14,79,2,vw rabbit custom
296 | 34.1,4,86,65,1975,15.2,79,3,maxda glc deluxe
297 | 35.7,4,98,80,1915,14.4,79,1,dodge colt hatchback custom
298 | 27.4,4,121,80,2670,15,79,1,amc spirit dl
299 | 25.4,5,183,77,3530,20.1,79,2,mercedes benz 300d
300 | 23,8,350,125,3900,17.4,79,1,cadillac eldorado
301 | 27.2,4,141,71,3190,24.8,79,2,peugeot 504
302 | 23.9,8,260,90,3420,22.2,79,1,oldsmobile cutlass salon brougham
303 | 34.2,4,105,70,2200,13.2,79,1,plymouth horizon
304 | 34.5,4,105,70,2150,14.9,79,1,plymouth horizon tc3
305 | 31.8,4,85,65,2020,19.2,79,3,datsun 210
306 | 37.3,4,91,69,2130,14.7,79,2,fiat strada custom
307 | 28.4,4,151,90,2670,16,79,1,buick skylark limited
308 | 28.8,6,173,115,2595,11.3,79,1,chevrolet citation
309 | 26.8,6,173,115,2700,12.9,79,1,oldsmobile omega brougham
310 | 33.5,4,151,90,2556,13.2,79,1,pontiac phoenix
311 | 41.5,4,98,76,2144,14.7,80,2,vw rabbit
312 | 38.1,4,89,60,1968,18.8,80,3,toyota corolla tercel
313 | 32.1,4,98,70,2120,15.5,80,1,chevrolet chevette
314 | 37.2,4,86,65,2019,16.4,80,3,datsun 310
315 | 28,4,151,90,2678,16.5,80,1,chevrolet citation
316 | 26.4,4,140,88,2870,18.1,80,1,ford fairmont
317 | 24.3,4,151,90,3003,20.1,80,1,amc concord
318 | 19.1,6,225,90,3381,18.7,80,1,dodge aspen
319 | 34.3,4,97,78,2188,15.8,80,2,audi 4000
320 | 29.8,4,134,90,2711,15.5,80,3,toyota corona liftback
321 | 31.3,4,120,75,2542,17.5,80,3,mazda 626
322 | 37,4,119,92,2434,15,80,3,datsun 510 hatchback
323 | 32.2,4,108,75,2265,15.2,80,3,toyota corolla
324 | 46.6,4,86,65,2110,17.9,80,3,mazda glc
325 | 27.9,4,156,105,2800,14.4,80,1,dodge colt
326 | 40.8,4,85,65,2110,19.2,80,3,datsun 210
327 | 44.3,4,90,48,2085,21.7,80,2,vw rabbit c (diesel)
328 | 43.4,4,90,48,2335,23.7,80,2,vw dasher (diesel)
329 | 36.4,5,121,67,2950,19.9,80,2,audi 5000s (diesel)
330 | 30,4,146,67,3250,21.8,80,2,mercedes-benz 240d
331 | 44.6,4,91,67,1850,13.8,80,3,honda civic 1500 gl
332 | 40.9,4,85,?,1835,17.3,80,2,renault lecar deluxe
333 | 33.8,4,97,67,2145,18,80,3,subaru dl
334 | 29.8,4,89,62,1845,15.3,80,2,vokswagen rabbit
335 | 32.7,6,168,132,2910,11.4,80,3,datsun 280-zx
336 | 23.7,3,70,100,2420,12.5,80,3,mazda rx-7 gs
337 | 35,4,122,88,2500,15.1,80,2,triumph tr7 coupe
338 | 23.6,4,140,?,2905,14.3,80,1,ford mustang cobra
339 | 32.4,4,107,72,2290,17,80,3,honda accord
340 | 27.2,4,135,84,2490,15.7,81,1,plymouth reliant
341 | 26.6,4,151,84,2635,16.4,81,1,buick skylark
342 | 25.8,4,156,92,2620,14.4,81,1,dodge aries wagon (sw)
343 | 23.5,6,173,110,2725,12.6,81,1,chevrolet citation
344 | 30,4,135,84,2385,12.9,81,1,plymouth reliant
345 | 39.1,4,79,58,1755,16.9,81,3,toyota starlet
346 | 39,4,86,64,1875,16.4,81,1,plymouth champ
347 | 35.1,4,81,60,1760,16.1,81,3,honda civic 1300
348 | 32.3,4,97,67,2065,17.8,81,3,subaru
349 | 37,4,85,65,1975,19.4,81,3,datsun 210 mpg
350 | 37.7,4,89,62,2050,17.3,81,3,toyota tercel
351 | 34.1,4,91,68,1985,16,81,3,mazda glc 4
352 | 34.7,4,105,63,2215,14.9,81,1,plymouth horizon 4
353 | 34.4,4,98,65,2045,16.2,81,1,ford escort 4w
354 | 29.9,4,98,65,2380,20.7,81,1,ford escort 2h
355 | 33,4,105,74,2190,14.2,81,2,volkswagen jetta
356 | 34.5,4,100,?,2320,15.8,81,2,renault 18i
357 | 33.7,4,107,75,2210,14.4,81,3,honda prelude
358 | 32.4,4,108,75,2350,16.8,81,3,toyota corolla
359 | 32.9,4,119,100,2615,14.8,81,3,datsun 200sx
360 | 31.6,4,120,74,2635,18.3,81,3,mazda 626
361 | 28.1,4,141,80,3230,20.4,81,2,peugeot 505s turbo diesel
362 | 30.7,6,145,76,3160,19.6,81,2,volvo diesel
363 | 25.4,6,168,116,2900,12.6,81,3,toyota cressida
364 | 24.2,6,146,120,2930,13.8,81,3,datsun 810 maxima
365 | 22.4,6,231,110,3415,15.8,81,1,buick century
366 | 26.6,8,350,105,3725,19,81,1,oldsmobile cutlass ls
367 | 20.2,6,200,88,3060,17.1,81,1,ford granada gl
368 | 17.6,6,225,85,3465,16.6,81,1,chrysler lebaron salon
369 | 28,4,112,88,2605,19.6,82,1,chevrolet cavalier
370 | 27,4,112,88,2640,18.6,82,1,chevrolet cavalier wagon
371 | 34,4,112,88,2395,18,82,1,chevrolet cavalier 2-door
372 | 31,4,112,85,2575,16.2,82,1,pontiac j2000 se hatchback
373 | 29,4,135,84,2525,16,82,1,dodge aries se
374 | 27,4,151,90,2735,18,82,1,pontiac phoenix
375 | 24,4,140,92,2865,16.4,82,1,ford fairmont futura
376 | 23,4,151,?,3035,20.5,82,1,amc concord dl
377 | 36,4,105,74,1980,15.3,82,2,volkswagen rabbit l
378 | 37,4,91,68,2025,18.2,82,3,mazda glc custom l
379 | 31,4,91,68,1970,17.6,82,3,mazda glc custom
380 | 38,4,105,63,2125,14.7,82,1,plymouth horizon miser
381 | 36,4,98,70,2125,17.3,82,1,mercury lynx l
382 | 36,4,120,88,2160,14.5,82,3,nissan stanza xe
383 | 36,4,107,75,2205,14.5,82,3,honda accord
384 | 34,4,108,70,2245,16.9,82,3,toyota corolla
385 | 38,4,91,67,1965,15,82,3,honda civic
386 | 32,4,91,67,1965,15.7,82,3,honda civic (auto)
387 | 38,4,91,67,1995,16.2,82,3,datsun 310 gx
388 | 25,6,181,110,2945,16.4,82,1,buick century limited
389 | 38,6,262,85,3015,17,82,1,oldsmobile cutlass ciera (diesel)
390 | 26,4,156,92,2585,14.5,82,1,chrysler lebaron medallion
391 | 22,6,232,112,2835,14.7,82,1,ford granada l
392 | 32,4,144,96,2665,13.9,82,3,toyota celica gt
393 | 36,4,135,84,2370,13,82,1,dodge charger 2.2
394 | 27,4,151,90,2950,17.3,82,1,chevrolet camaro
395 | 27,4,140,86,2790,15.6,82,1,ford mustang gl
396 | 44,4,97,52,2130,24.6,82,2,vw pickup
397 | 32,4,135,84,2295,11.6,82,1,dodge rampage
398 | 28,4,120,79,2625,18.6,82,1,ford ranger
399 | 31,4,119,82,2720,19.4,82,1,chevy s-10
400 |
--------------------------------------------------------------------------------
/Chapter2/petrol_consumption.csv:
--------------------------------------------------------------------------------
1 | Petrol_tax,Average_income,Paved_Highways,Population_Driver_licence(%),Petrol_Consumption
2 | 9.00,3571,1976,0.5250,541
3 | 9.00,4092,1250,0.5720,524
4 | 9.00,3865,1586,0.5800,561
5 | 7.50,4870,2351,0.5290,414
6 | 8.00,4399,431,0.5440,410
7 | 10.00,5342,1333,0.5710,457
8 | 8.00,5319,11868,0.4510,344
9 | 8.00,5126,2138,0.5530,467
10 | 8.00,4447,8577,0.5290,464
11 | 7.00,4512,8507,0.5520,498
12 | 8.00,4391,5939,0.5300,580
13 | 7.50,5126,14186,0.5250,471
14 | 7.00,4817,6930,0.5740,525
15 | 7.00,4207,6580,0.5450,508
16 | 7.00,4332,8159,0.6080,566
17 | 7.00,4318,10340,0.5860,635
18 | 7.00,4206,8508,0.5720,603
19 | 7.00,3718,4725,0.5400,714
20 | 7.00,4716,5915,0.7240,865
21 | 8.50,4341,6010,0.6770,640
22 | 7.00,4593,7834,0.6630,649
23 | 8.00,4983,602,0.6020,540
24 | 9.00,4897,2449,0.5110,464
25 | 9.00,4258,4686,0.5170,547
26 | 8.50,4574,2619,0.5510,460
27 | 9.00,3721,4746,0.5440,566
28 | 8.00,3448,5399,0.5480,577
29 | 7.50,3846,9061,0.5790,631
30 | 8.00,4188,5975,0.5630,574
31 | 9.00,3601,4650,0.4930,534
32 | 7.00,3640,6905,0.5180,571
33 | 7.00,3333,6594,0.5130,554
34 | 8.00,3063,6524,0.5780,577
35 | 7.50,3357,4121,0.5470,628
36 | 8.00,3528,3495,0.4870,487
37 | 6.58,3802,7834,0.6290,644
38 | 5.00,4045,17782,0.5660,640
39 | 7.00,3897,6385,0.5860,704
40 | 8.50,3635,3274,0.6630,648
41 | 7.00,4345,3905,0.6720,968
42 | 7.00,4449,4639,0.6260,587
43 | 7.00,3656,3985,0.5630,699
44 | 7.00,4300,3635,0.6030,632
45 | 7.00,3745,2611,0.5080,591
46 | 6.00,5215,2302,0.6720,782
47 | 9.00,4476,3942,0.5710,510
48 | 7.00,4296,4083,0.6230,610
49 | 7.00,5002,9794,0.5930,524
50 |
--------------------------------------------------------------------------------
/Contributing.md:
--------------------------------------------------------------------------------
1 | # Contributing to Apress Source Code
2 |
3 | Copyright for Apress source code belongs to the author(s). However, under fair use you are encouraged to fork and contribute minor corrections and updates for the benefit of the author(s) and other readers.
4 |
5 | ## How to Contribute
6 |
7 | 1. Make sure you have a GitHub account.
8 | 2. Fork the repository for the relevant book.
9 | 3. Create a new branch on which to make your change, e.g.
10 | `git checkout -b my_code_contribution`
11 | 4. Commit your change. Include a commit message describing the correction. Please note that if your commit message is not clear, the correction will not be accepted.
12 | 5. Submit a pull request.
13 |
14 | Thank you for your contribution!
--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
1 | Freeware License, some rights reserved
2 |
3 | Copyright (c) 2020 Vaibhav Verdhan
4 |
5 | Permission is hereby granted, free of charge, to anyone obtaining a copy
6 | of this software and associated documentation files (the "Software"),
7 | to work with the Software within the limits of freeware distribution and fair use.
8 | This includes the rights to use, copy, and modify the Software for personal use.
9 | Users are also allowed and encouraged to submit corrections and modifications
10 | to the Software for the benefit of other users.
11 |
12 | It is not allowed to reuse, modify, or redistribute the Software for
13 | commercial use in any way, or for a user’s educational materials such as books
14 | or blog articles without prior permission from the copyright holder.
15 |
16 | The above copyright notice and this permission notice need to be included
17 | in all copies or substantial portions of the software.
18 |
19 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
20 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
21 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
22 | AUTHORS OR COPYRIGHT HOLDERS OR APRESS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
23 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
24 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
25 | SOFTWARE.
26 |
27 |
28 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Apress Source Code
2 |
3 | This repository accompanies [*Supervised Learning with Python*](https://www.apress.com/9781484261552) by Vaibhav Verdhan (Apress, 2020).
4 |
5 | [comment]: #cover
6 | ![Cover image](9781484261552.jpg)
7 |
8 | Download the files as a zip using the green button, or clone the repository to your machine using Git.
9 |
10 | ## Releases
11 |
12 | Release v1.0 corresponds to the code in the published book, without corrections or updates.
13 |
14 | ## Contributions
15 |
16 | See the file Contributing.md for more information on how you can contribute to this repository.
--------------------------------------------------------------------------------
/errata.md:
--------------------------------------------------------------------------------
1 | # Errata for *Book Title*
2 |
3 | On **page xx** [Summary of error]:
4 |
5 | Details of error here. Highlight key pieces in **bold**.
6 |
7 | ***
8 |
9 | On **page xx** [Summary of error]:
10 |
11 | Details of error here. Highlight key pieces in **bold**.
12 |
13 | ***
--------------------------------------------------------------------------------