├── 9781484261552.jpg
├── Chapter 1
    └── ReadMe
├── Chapter 3
    ├── CreditRisk.csv
    ├── Decision Tree Random Forest Case Study.ipynb
    ├── Log_ROC.png
    ├── Logistic Regression Case Study.ipynb
    ├── Naive Bayes Case Study.ipynb
    ├── Network_Intrusion.csv
    ├── ReadMe
    └── adult.data
├── Chapter 4
    ├── Chapter4_Boosting.ipynb
    ├── Chapter4_NLP2.ipynb
    ├── Chpater4_NLP1.ipynb
    ├── Imageclassification.ipynb
    ├── NeuralNetwork_ClassifcationFirst.ipynb
    ├── ReadMe
    ├── SVM_Chapter4.ipynb
    ├── bc2.csv
    ├── pima-indians-diabetes.csv
    └── winequality-red-1.csv
├── Chapter 5
    ├── Chapter5.ipynb
    ├── Exploratory Data Analysis Notebook.ipynb
    ├── IRIS.csv
    ├── ReadMe
    ├── deliveries.csv
    ├── matches.csv
    └── titanic.csv
├── Chapter2
    ├── Chapter2_PythonCode.ipynb
    ├── House_data.csv
    ├── House_data_LR.csv
    ├── House_data_MLR.csv
    ├── ReadMe
    ├── auto-mpg.csv
    └── petrol_consumption.csv
├── Contributing.md
├── LICENSE.txt
├── README.md
└── errata.md

/9781484261552.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Apress/supervised-learning-w-python/68c94f12d27647fa3dcd6b19d83edfc0bb3c5f39/9781484261552.jpg
--------------------------------------------------------------------------------
/Chapter 1/ReadMe:
--------------------------------------------------------------------------------
1 |
2 | The first chapter of the book. No code in this chapter.
3 | -------------------------------------------------------------------------------- /Chapter 3/CreditRisk.csv: -------------------------------------------------------------------------------- 1 | Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status 2 | LP001002,Male,No,0,Graduate,No,5849,0,0,360,1,Urban,1 3 | LP001003,Male,Yes,1,Graduate,No,4583,1508,128,360,1,Rural,0 4 | LP001005,Male,Yes,0,Graduate,Yes,3000,0,66,360,1,Urban,1 5 | LP001006,Male,Yes,0,Not Graduate,No,2583,2358,120,360,1,Urban,1 6 | LP001008,Male,No,0,Graduate,No,6000,0,141,360,1,Urban,1 7 | LP001011,Male,Yes,2,Graduate,Yes,5417,4196,267,360,1,Urban,1 8 | LP001013,Male,Yes,0,Not Graduate,No,2333,1516,95,360,1,Urban,1 9 | LP001014,Male,Yes,3+,Graduate,No,3036,2504,158,360,0,Semiurban,0 10 | LP001018,Male,Yes,2,Graduate,No,4006,1526,168,360,1,Urban,1 11 | LP001020,Male,Yes,1,Graduate,No,12841,10968,349,360,1,Semiurban,0 12 | LP001024,Male,Yes,2,Graduate,No,3200,700,70,360,1,Urban,1 13 | LP001027,Male,Yes,2,Graduate,,2500,1840,109,360,1,Urban,1 14 | LP001028,Male,Yes,2,Graduate,No,3073,8106,200,360,1,Urban,1 15 | LP001029,Male,No,0,Graduate,No,1853,2840,114,360,1,Rural,0 16 | LP001030,Male,Yes,2,Graduate,No,1299,1086,17,120,1,Urban,1 17 | LP001032,Male,No,0,Graduate,No,4950,0,125,360,1,Urban,1 18 | LP001034,Male,No,1,Not Graduate,No,3596,0,100,240,,Urban,1 19 | LP001036,Female,No,0,Graduate,No,3510,0,76,360,0,Urban,0 20 | LP001038,Male,Yes,0,Not Graduate,No,4887,0,133,360,1,Rural,0 21 | LP001041,Male,Yes,0,Graduate,,2600,3500,115,,1,Urban,1 22 | LP001043,Male,Yes,0,Not Graduate,No,7660,0,104,360,0,Urban,0 23 | LP001046,Male,Yes,1,Graduate,No,5955,5625,315,360,1,Urban,1 24 | LP001047,Male,Yes,0,Not Graduate,No,2600,1911,116,360,0,Semiurban,0 25 | LP001050,,Yes,2,Not Graduate,No,3365,1917,112,360,0,Rural,0 26 | LP001052,Male,Yes,1,Graduate,,3717,2925,151,360,,Semiurban,0 27 | 
LP001066,Male,Yes,0,Graduate,Yes,9560,0,191,360,1,Semiurban,1 28 | LP001068,Male,Yes,0,Graduate,No,2799,2253,122,360,1,Semiurban,1 29 | LP001073,Male,Yes,2,Not Graduate,No,4226,1040,110,360,1,Urban,1 30 | LP001086,Male,No,0,Not Graduate,No,1442,0,35,360,1,Urban,0 31 | LP001087,Female,No,2,Graduate,,3750,2083,120,360,1,Semiurban,1 32 | LP001091,Male,Yes,1,Graduate,,4166,3369,201,360,,Urban,0 33 | LP001095,Male,No,0,Graduate,No,3167,0,74,360,1,Urban,0 34 | LP001097,Male,No,1,Graduate,Yes,4692,0,106,360,1,Rural,0 35 | LP001098,Male,Yes,0,Graduate,No,3500,1667,114,360,1,Semiurban,1 36 | LP001100,Male,No,3+,Graduate,No,12500,3000,320,360,1,Rural,0 37 | LP001106,Male,Yes,0,Graduate,No,2275,2067,0,360,1,Urban,1 38 | LP001109,Male,Yes,0,Graduate,No,1828,1330,100,,0,Urban,0 39 | LP001112,Female,Yes,0,Graduate,No,3667,1459,144,360,1,Semiurban,1 40 | LP001114,Male,No,0,Graduate,No,4166,7210,184,360,1,Urban,1 41 | LP001116,Male,No,0,Not Graduate,No,3748,1668,110,360,1,Semiurban,1 42 | LP001119,Male,No,0,Graduate,No,3600,0,80,360,1,Urban,0 43 | LP001120,Male,No,0,Graduate,No,1800,1213,47,360,1,Urban,1 44 | LP001123,Male,Yes,0,Graduate,No,2400,0,75,360,,Urban,1 45 | LP001131,Male,Yes,0,Graduate,No,3941,2336,134,360,1,Semiurban,1 46 | LP001136,Male,Yes,0,Not Graduate,Yes,4695,0,96,,1,Urban,1 47 | LP001137,Female,No,0,Graduate,No,3410,0,88,,1,Urban,1 48 | LP001138,Male,Yes,1,Graduate,No,5649,0,44,360,1,Urban,1 49 | LP001144,Male,Yes,0,Graduate,No,5821,0,144,360,1,Urban,1 50 | LP001146,Female,Yes,0,Graduate,No,2645,3440,120,360,0,Urban,0 51 | LP001151,Female,No,0,Graduate,No,4000,2275,144,360,1,Semiurban,1 52 | LP001155,Female,Yes,0,Not Graduate,No,1928,1644,100,360,1,Semiurban,1 53 | LP001157,Female,No,0,Graduate,No,3086,0,120,360,1,Semiurban,1 54 | LP001164,Female,No,0,Graduate,No,4230,0,112,360,1,Semiurban,0 55 | LP001179,Male,Yes,2,Graduate,No,4616,0,134,360,1,Urban,0 56 | LP001186,Female,Yes,1,Graduate,Yes,11500,0,286,360,0,Urban,0 57 | 
LP001194,Male,Yes,2,Graduate,No,2708,1167,97,360,1,Semiurban,1 58 | LP001195,Male,Yes,0,Graduate,No,2132,1591,96,360,1,Semiurban,1 59 | LP001197,Male,Yes,0,Graduate,No,3366,2200,135,360,1,Rural,0 60 | LP001198,Male,Yes,1,Graduate,No,8080,2250,180,360,1,Urban,1 61 | LP001199,Male,Yes,2,Not Graduate,No,3357,2859,144,360,1,Urban,1 62 | LP001205,Male,Yes,0,Graduate,No,2500,3796,120,360,1,Urban,1 63 | LP001206,Male,Yes,3+,Graduate,No,3029,0,99,360,1,Urban,1 64 | LP001207,Male,Yes,0,Not Graduate,Yes,2609,3449,165,180,0,Rural,0 65 | LP001213,Male,Yes,1,Graduate,No,4945,0,0,360,0,Rural,0 66 | LP001222,Female,No,0,Graduate,No,4166,0,116,360,0,Semiurban,0 67 | LP001225,Male,Yes,0,Graduate,No,5726,4595,258,360,1,Semiurban,0 68 | LP001228,Male,No,0,Not Graduate,No,3200,2254,126,180,0,Urban,0 69 | LP001233,Male,Yes,1,Graduate,No,10750,0,312,360,1,Urban,1 70 | LP001238,Male,Yes,3+,Not Graduate,Yes,7100,0,125,60,1,Urban,1 71 | LP001241,Female,No,0,Graduate,No,4300,0,136,360,0,Semiurban,0 72 | LP001243,Male,Yes,0,Graduate,No,3208,3066,172,360,1,Urban,1 73 | LP001245,Male,Yes,2,Not Graduate,Yes,1875,1875,97,360,1,Semiurban,1 74 | LP001248,Male,No,0,Graduate,No,3500,0,81,300,1,Semiurban,1 75 | LP001250,Male,Yes,3+,Not Graduate,No,4755,0,95,,0,Semiurban,0 76 | LP001253,Male,Yes,3+,Graduate,Yes,5266,1774,187,360,1,Semiurban,1 77 | LP001255,Male,No,0,Graduate,No,3750,0,113,480,1,Urban,0 78 | LP001256,Male,No,0,Graduate,No,3750,4750,176,360,1,Urban,0 79 | LP001259,Male,Yes,1,Graduate,Yes,1000,3022,110,360,1,Urban,0 80 | LP001263,Male,Yes,3+,Graduate,No,3167,4000,180,300,0,Semiurban,0 81 | LP001264,Male,Yes,3+,Not Graduate,Yes,3333,2166,130,360,,Semiurban,1 82 | LP001265,Female,No,0,Graduate,No,3846,0,111,360,1,Semiurban,1 83 | LP001266,Male,Yes,1,Graduate,Yes,2395,0,0,360,1,Semiurban,1 84 | LP001267,Female,Yes,2,Graduate,No,1378,1881,167,360,1,Urban,0 85 | LP001273,Male,Yes,0,Graduate,No,6000,2250,265,360,,Semiurban,0 86 | LP001275,Male,Yes,1,Graduate,No,3988,0,50,240,1,Urban,1 87 | 
LP001279,Male,No,0,Graduate,No,2366,2531,136,360,1,Semiurban,1 88 | LP001280,Male,Yes,2,Not Graduate,No,3333,2000,99,360,,Semiurban,1 89 | LP001282,Male,Yes,0,Graduate,No,2500,2118,104,360,1,Semiurban,1 90 | LP001289,Male,No,0,Graduate,No,8566,0,210,360,1,Urban,1 91 | LP001310,Male,Yes,0,Graduate,No,5695,4167,175,360,1,Semiurban,1 92 | LP001316,Male,Yes,0,Graduate,No,2958,2900,131,360,1,Semiurban,1 93 | LP001318,Male,Yes,2,Graduate,No,6250,5654,188,180,1,Semiurban,1 94 | LP001319,Male,Yes,2,Not Graduate,No,3273,1820,81,360,1,Urban,1 95 | LP001322,Male,No,0,Graduate,No,4133,0,122,360,1,Semiurban,1 96 | LP001325,Male,No,0,Not Graduate,No,3620,0,25,120,1,Semiurban,1 97 | LP001326,Male,No,0,Graduate,,6782,0,0,360,,Urban,0 98 | LP001327,Female,Yes,0,Graduate,No,2484,2302,137,360,1,Semiurban,1 99 | LP001333,Male,Yes,0,Graduate,No,1977,997,50,360,1,Semiurban,1 100 | LP001334,Male,Yes,0,Not Graduate,No,4188,0,115,180,1,Semiurban,1 101 | LP001343,Male,Yes,0,Graduate,No,1759,3541,131,360,1,Semiurban,1 102 | LP001345,Male,Yes,2,Not Graduate,No,4288,3263,133,180,1,Urban,1 103 | LP001349,Male,No,0,Graduate,No,4843,3806,151,360,1,Semiurban,1 104 | LP001350,Male,Yes,,Graduate,No,13650,0,0,360,1,Urban,1 105 | LP001356,Male,Yes,0,Graduate,No,4652,3583,0,360,1,Semiurban,1 106 | LP001357,Male,,,Graduate,No,3816,754,160,360,1,Urban,1 107 | LP001367,Male,Yes,1,Graduate,No,3052,1030,100,360,1,Urban,1 108 | LP001369,Male,Yes,2,Graduate,No,11417,1126,225,360,1,Urban,1 109 | LP001370,Male,No,0,Not Graduate,,7333,0,120,360,1,Rural,0 110 | LP001379,Male,Yes,2,Graduate,No,3800,3600,216,360,0,Urban,0 111 | LP001384,Male,Yes,3+,Not Graduate,No,2071,754,94,480,1,Semiurban,1 112 | LP001385,Male,No,0,Graduate,No,5316,0,136,360,1,Urban,1 113 | LP001387,Female,Yes,0,Graduate,,2929,2333,139,360,1,Semiurban,1 114 | LP001391,Male,Yes,0,Not Graduate,No,3572,4114,152,,0,Rural,0 115 | LP001392,Female,No,1,Graduate,Yes,7451,0,0,360,1,Semiurban,1 116 | 
LP001398,Male,No,0,Graduate,,5050,0,118,360,1,Semiurban,1 117 | LP001401,Male,Yes,1,Graduate,No,14583,0,185,180,1,Rural,1 118 | LP001404,Female,Yes,0,Graduate,No,3167,2283,154,360,1,Semiurban,1 119 | LP001405,Male,Yes,1,Graduate,No,2214,1398,85,360,,Urban,1 120 | LP001421,Male,Yes,0,Graduate,No,5568,2142,175,360,1,Rural,0 121 | LP001422,Female,No,0,Graduate,No,10408,0,259,360,1,Urban,1 122 | LP001426,Male,Yes,,Graduate,No,5667,2667,180,360,1,Rural,1 123 | LP001430,Female,No,0,Graduate,No,4166,0,44,360,1,Semiurban,1 124 | LP001431,Female,No,0,Graduate,No,2137,8980,137,360,0,Semiurban,1 125 | LP001432,Male,Yes,2,Graduate,No,2957,0,81,360,1,Semiurban,1 126 | LP001439,Male,Yes,0,Not Graduate,No,4300,2014,194,360,1,Rural,1 127 | LP001443,Female,No,0,Graduate,No,3692,0,93,360,,Rural,1 128 | LP001448,,Yes,3+,Graduate,No,23803,0,370,360,1,Rural,1 129 | LP001449,Male,No,0,Graduate,No,3865,1640,0,360,1,Rural,1 130 | LP001451,Male,Yes,1,Graduate,Yes,10513,3850,160,180,0,Urban,0 131 | LP001465,Male,Yes,0,Graduate,No,6080,2569,182,360,,Rural,0 132 | LP001469,Male,No,0,Graduate,Yes,20166,0,650,480,,Urban,1 133 | LP001473,Male,No,0,Graduate,No,2014,1929,74,360,1,Urban,1 134 | LP001478,Male,No,0,Graduate,No,2718,0,70,360,1,Semiurban,1 135 | LP001482,Male,Yes,0,Graduate,Yes,3459,0,25,120,1,Semiurban,1 136 | LP001487,Male,No,0,Graduate,No,4895,0,102,360,1,Semiurban,1 137 | LP001488,Male,Yes,3+,Graduate,No,4000,7750,290,360,1,Semiurban,0 138 | LP001489,Female,Yes,0,Graduate,No,4583,0,84,360,1,Rural,0 139 | LP001491,Male,Yes,2,Graduate,Yes,3316,3500,88,360,1,Urban,1 140 | LP001492,Male,No,0,Graduate,No,14999,0,242,360,0,Semiurban,0 141 | LP001493,Male,Yes,2,Not Graduate,No,4200,1430,129,360,1,Rural,0 142 | LP001497,Male,Yes,2,Graduate,No,5042,2083,185,360,1,Rural,0 143 | LP001498,Male,No,0,Graduate,No,5417,0,168,360,1,Urban,1 144 | LP001504,Male,No,0,Graduate,Yes,6950,0,175,180,1,Semiurban,1 145 | LP001507,Male,Yes,0,Graduate,No,2698,2034,122,360,1,Semiurban,1 146 | 
LP001508,Male,Yes,2,Graduate,No,11757,0,187,180,1,Urban,1 147 | LP001514,Female,Yes,0,Graduate,No,2330,4486,100,360,1,Semiurban,1 148 | LP001516,Female,Yes,2,Graduate,No,14866,0,70,360,1,Urban,1 149 | LP001518,Male,Yes,1,Graduate,No,1538,1425,30,360,1,Urban,1 150 | LP001519,Female,No,0,Graduate,No,10000,1666,225,360,1,Rural,0 151 | LP001520,Male,Yes,0,Graduate,No,4860,830,125,360,1,Semiurban,1 152 | LP001528,Male,No,0,Graduate,No,6277,0,118,360,0,Rural,0 153 | LP001529,Male,Yes,0,Graduate,Yes,2577,3750,152,360,1,Rural,1 154 | LP001531,Male,No,0,Graduate,No,9166,0,244,360,1,Urban,0 155 | LP001532,Male,Yes,2,Not Graduate,No,2281,0,113,360,1,Rural,0 156 | LP001535,Male,No,0,Graduate,No,3254,0,50,360,1,Urban,1 157 | LP001536,Male,Yes,3+,Graduate,No,39999,0,600,180,0,Semiurban,1 158 | LP001541,Male,Yes,1,Graduate,No,6000,0,160,360,,Rural,1 159 | LP001543,Male,Yes,1,Graduate,No,9538,0,187,360,1,Urban,1 160 | LP001546,Male,No,0,Graduate,,2980,2083,120,360,1,Rural,1 161 | LP001552,Male,Yes,0,Graduate,No,4583,5625,255,360,1,Semiurban,1 162 | LP001560,Male,Yes,0,Not Graduate,No,1863,1041,98,360,1,Semiurban,1 163 | LP001562,Male,Yes,0,Graduate,No,7933,0,275,360,1,Urban,0 164 | LP001565,Male,Yes,1,Graduate,No,3089,1280,121,360,0,Semiurban,0 165 | LP001570,Male,Yes,2,Graduate,No,4167,1447,158,360,1,Rural,1 166 | LP001572,Male,Yes,0,Graduate,No,9323,0,75,180,1,Urban,1 167 | LP001574,Male,Yes,0,Graduate,No,3707,3166,182,,1,Rural,1 168 | LP001577,Female,Yes,0,Graduate,No,4583,0,112,360,1,Rural,0 169 | LP001578,Male,Yes,0,Graduate,No,2439,3333,129,360,1,Rural,1 170 | LP001579,Male,No,0,Graduate,No,2237,0,63,480,0,Semiurban,0 171 | LP001580,Male,Yes,2,Graduate,No,8000,0,200,360,1,Semiurban,1 172 | LP001581,Male,Yes,0,Not Graduate,,1820,1769,95,360,1,Rural,1 173 | LP001585,,Yes,3+,Graduate,No,51763,0,700,300,1,Urban,1 174 | LP001586,Male,Yes,3+,Not Graduate,No,3522,0,81,180,1,Rural,0 175 | LP001594,Male,Yes,0,Graduate,No,5708,5625,187,360,1,Semiurban,1 176 | LP001603,Male,Yes,0,Not 
Graduate,Yes,4344,736,87,360,1,Semiurban,0 177 | LP001606,Male,Yes,0,Graduate,No,3497,1964,116,360,1,Rural,1 178 | LP001608,Male,Yes,2,Graduate,No,2045,1619,101,360,1,Rural,1 179 | LP001610,Male,Yes,3+,Graduate,No,5516,11300,495,360,0,Semiurban,0 180 | LP001616,Male,Yes,1,Graduate,No,3750,0,116,360,1,Semiurban,1 181 | LP001630,Male,No,0,Not Graduate,No,2333,1451,102,480,0,Urban,0 182 | LP001633,Male,Yes,1,Graduate,No,6400,7250,180,360,0,Urban,0 183 | LP001634,Male,No,0,Graduate,No,1916,5063,67,360,,Rural,0 184 | LP001636,Male,Yes,0,Graduate,No,4600,0,73,180,1,Semiurban,1 185 | LP001637,Male,Yes,1,Graduate,No,33846,0,260,360,1,Semiurban,0 186 | LP001639,Female,Yes,0,Graduate,No,3625,0,108,360,1,Semiurban,1 187 | LP001640,Male,Yes,0,Graduate,Yes,39147,4750,120,360,1,Semiurban,1 188 | LP001641,Male,Yes,1,Graduate,Yes,2178,0,66,300,0,Rural,0 189 | LP001643,Male,Yes,0,Graduate,No,2383,2138,58,360,,Rural,1 190 | LP001644,,Yes,0,Graduate,Yes,674,5296,168,360,1,Rural,1 191 | LP001647,Male,Yes,0,Graduate,No,9328,0,188,180,1,Rural,1 192 | LP001653,Male,No,0,Not Graduate,No,4885,0,48,360,1,Rural,1 193 | LP001656,Male,No,0,Graduate,No,12000,0,164,360,1,Semiurban,0 194 | LP001657,Male,Yes,0,Not Graduate,No,6033,0,160,360,1,Urban,0 195 | LP001658,Male,No,0,Graduate,No,3858,0,76,360,1,Semiurban,1 196 | LP001664,Male,No,0,Graduate,No,4191,0,120,360,1,Rural,1 197 | LP001665,Male,Yes,1,Graduate,No,3125,2583,170,360,1,Semiurban,0 198 | LP001666,Male,No,0,Graduate,No,8333,3750,187,360,1,Rural,1 199 | LP001669,Female,No,0,Not Graduate,No,1907,2365,120,,1,Urban,1 200 | LP001671,Female,Yes,0,Graduate,No,3416,2816,113,360,,Semiurban,1 201 | LP001673,Male,No,0,Graduate,Yes,11000,0,83,360,1,Urban,0 202 | LP001674,Male,Yes,1,Not Graduate,No,2600,2500,90,360,1,Semiurban,1 203 | LP001677,Male,No,2,Graduate,No,4923,0,166,360,0,Semiurban,1 204 | LP001682,Male,Yes,3+,Not Graduate,No,3992,0,0,180,1,Urban,0 205 | LP001688,Male,Yes,1,Not Graduate,No,3500,1083,135,360,1,Urban,1 206 | 
LP001691,Male,Yes,2,Not Graduate,No,3917,0,124,360,1,Semiurban,1 207 | LP001692,Female,No,0,Not Graduate,No,4408,0,120,360,1,Semiurban,1 208 | LP001693,Female,No,0,Graduate,No,3244,0,80,360,1,Urban,1 209 | LP001698,Male,No,0,Not Graduate,No,3975,2531,55,360,1,Rural,1 210 | LP001699,Male,No,0,Graduate,No,2479,0,59,360,1,Urban,1 211 | LP001702,Male,No,0,Graduate,No,3418,0,127,360,1,Semiurban,0 212 | LP001708,Female,No,0,Graduate,No,10000,0,214,360,1,Semiurban,0 213 | LP001711,Male,Yes,3+,Graduate,No,3430,1250,128,360,0,Semiurban,0 214 | LP001713,Male,Yes,1,Graduate,Yes,7787,0,240,360,1,Urban,1 215 | LP001715,Male,Yes,3+,Not Graduate,Yes,5703,0,130,360,1,Rural,1 216 | LP001716,Male,Yes,0,Graduate,No,3173,3021,137,360,1,Urban,1 217 | LP001720,Male,Yes,3+,Not Graduate,No,3850,983,100,360,1,Semiurban,1 218 | LP001722,Male,Yes,0,Graduate,No,150,1800,135,360,1,Rural,0 219 | LP001726,Male,Yes,0,Graduate,No,3727,1775,131,360,1,Semiurban,1 220 | LP001732,Male,Yes,2,Graduate,,5000,0,72,360,0,Semiurban,0 221 | LP001734,Female,Yes,2,Graduate,No,4283,2383,127,360,,Semiurban,1 222 | LP001736,Male,Yes,0,Graduate,No,2221,0,60,360,0,Urban,0 223 | LP001743,Male,Yes,2,Graduate,No,4009,1717,116,360,1,Semiurban,1 224 | LP001744,Male,No,0,Graduate,No,2971,2791,144,360,1,Semiurban,1 225 | LP001749,Male,Yes,0,Graduate,No,7578,1010,175,,1,Semiurban,1 226 | LP001750,Male,Yes,0,Graduate,No,6250,0,128,360,1,Semiurban,1 227 | LP001751,Male,Yes,0,Graduate,No,3250,0,170,360,1,Rural,0 228 | LP001754,Male,Yes,,Not Graduate,Yes,4735,0,138,360,1,Urban,0 229 | LP001758,Male,Yes,2,Graduate,No,6250,1695,210,360,1,Semiurban,1 230 | LP001760,Male,,,Graduate,No,4758,0,158,480,1,Semiurban,1 231 | LP001761,Male,No,0,Graduate,Yes,6400,0,200,360,1,Rural,1 232 | LP001765,Male,Yes,1,Graduate,No,2491,2054,104,360,1,Semiurban,1 233 | LP001768,Male,Yes,0,Graduate,,3716,0,42,180,1,Rural,1 234 | LP001770,Male,No,0,Not Graduate,No,3189,2598,120,,1,Rural,1 235 | 
LP001776,Female,No,0,Graduate,No,8333,0,280,360,1,Semiurban,1 236 | LP001778,Male,Yes,1,Graduate,No,3155,1779,140,360,1,Semiurban,1 237 | LP001784,Male,Yes,1,Graduate,No,5500,1260,170,360,1,Rural,1 238 | LP001786,Male,Yes,0,Graduate,,5746,0,255,360,,Urban,0 239 | LP001788,Female,No,0,Graduate,Yes,3463,0,122,360,,Urban,1 240 | LP001790,Female,No,1,Graduate,No,3812,0,112,360,1,Rural,1 241 | LP001792,Male,Yes,1,Graduate,No,3315,0,96,360,1,Semiurban,1 242 | LP001798,Male,Yes,2,Graduate,No,5819,5000,120,360,1,Rural,1 243 | LP001800,Male,Yes,1,Not Graduate,No,2510,1983,140,180,1,Urban,0 244 | LP001806,Male,No,0,Graduate,No,2965,5701,155,60,1,Urban,1 245 | LP001807,Male,Yes,2,Graduate,Yes,6250,1300,108,360,1,Rural,1 246 | LP001811,Male,Yes,0,Not Graduate,No,3406,4417,123,360,1,Semiurban,1 247 | LP001813,Male,No,0,Graduate,Yes,6050,4333,120,180,1,Urban,0 248 | LP001814,Male,Yes,2,Graduate,No,9703,0,112,360,1,Urban,1 249 | LP001819,Male,Yes,1,Not Graduate,No,6608,0,137,180,1,Urban,1 250 | LP001824,Male,Yes,1,Graduate,No,2882,1843,123,480,1,Semiurban,1 251 | LP001825,Male,Yes,0,Graduate,No,1809,1868,90,360,1,Urban,1 252 | LP001835,Male,Yes,0,Not Graduate,No,1668,3890,201,360,0,Semiurban,0 253 | LP001836,Female,No,2,Graduate,No,3427,0,138,360,1,Urban,0 254 | LP001841,Male,No,0,Not Graduate,Yes,2583,2167,104,360,1,Rural,1 255 | LP001843,Male,Yes,1,Not Graduate,No,2661,7101,279,180,1,Semiurban,1 256 | LP001844,Male,No,0,Graduate,Yes,16250,0,192,360,0,Urban,0 257 | LP001846,Female,No,3+,Graduate,No,3083,0,255,360,1,Rural,1 258 | LP001849,Male,No,0,Not Graduate,No,6045,0,115,360,0,Rural,0 259 | LP001854,Male,Yes,3+,Graduate,No,5250,0,94,360,1,Urban,0 260 | LP001859,Male,Yes,0,Graduate,No,14683,2100,304,360,1,Rural,0 261 | LP001864,Male,Yes,3+,Not Graduate,No,4931,0,128,360,,Semiurban,0 262 | LP001865,Male,Yes,1,Graduate,No,6083,4250,330,360,,Urban,1 263 | LP001868,Male,No,0,Graduate,No,2060,2209,134,360,1,Semiurban,1 264 | 
LP001870,Female,No,1,Graduate,No,3481,0,155,36,1,Semiurban,0 265 | LP001871,Female,No,0,Graduate,No,7200,0,120,360,1,Rural,1 266 | LP001872,Male,No,0,Graduate,Yes,5166,0,128,360,1,Semiurban,1 267 | LP001875,Male,No,0,Graduate,No,4095,3447,151,360,1,Rural,1 268 | LP001877,Male,Yes,2,Graduate,No,4708,1387,150,360,1,Semiurban,1 269 | LP001882,Male,Yes,3+,Graduate,No,4333,1811,160,360,0,Urban,1 270 | LP001883,Female,No,0,Graduate,,3418,0,135,360,1,Rural,0 271 | LP001884,Female,No,1,Graduate,No,2876,1560,90,360,1,Urban,1 272 | LP001888,Female,No,0,Graduate,No,3237,0,30,360,1,Urban,1 273 | LP001891,Male,Yes,0,Graduate,No,11146,0,136,360,1,Urban,1 274 | LP001892,Male,No,0,Graduate,No,2833,1857,126,360,1,Rural,1 275 | LP001894,Male,Yes,0,Graduate,No,2620,2223,150,360,1,Semiurban,1 276 | LP001896,Male,Yes,2,Graduate,No,3900,0,90,360,1,Semiurban,1 277 | LP001900,Male,Yes,1,Graduate,No,2750,1842,115,360,1,Semiurban,1 278 | LP001903,Male,Yes,0,Graduate,No,3993,3274,207,360,1,Semiurban,1 279 | LP001904,Male,Yes,0,Graduate,No,3103,1300,80,360,1,Urban,1 280 | LP001907,Male,Yes,0,Graduate,No,14583,0,436,360,1,Semiurban,1 281 | LP001908,Female,Yes,0,Not Graduate,No,4100,0,124,360,,Rural,1 282 | LP001910,Male,No,1,Not Graduate,Yes,4053,2426,158,360,0,Urban,0 283 | LP001914,Male,Yes,0,Graduate,No,3927,800,112,360,1,Semiurban,1 284 | LP001915,Male,Yes,2,Graduate,No,2301,985.7999878,78,180,1,Urban,1 285 | LP001917,Female,No,0,Graduate,No,1811,1666,54,360,1,Urban,1 286 | LP001922,Male,Yes,0,Graduate,No,20667,0,0,360,1,Rural,0 287 | LP001924,Male,No,0,Graduate,No,3158,3053,89,360,1,Rural,1 288 | LP001925,Female,No,0,Graduate,Yes,2600,1717,99,300,1,Semiurban,0 289 | LP001926,Male,Yes,0,Graduate,No,3704,2000,120,360,1,Rural,1 290 | LP001931,Female,No,0,Graduate,No,4124,0,115,360,1,Semiurban,1 291 | LP001935,Male,No,0,Graduate,No,9508,0,187,360,1,Rural,1 292 | LP001936,Male,Yes,0,Graduate,No,3075,2416,139,360,1,Rural,1 293 | LP001938,Male,Yes,2,Graduate,No,4400,0,127,360,0,Semiurban,0 294 | 
LP001940,Male,Yes,2,Graduate,No,3153,1560,134,360,1,Urban,1 295 | LP001945,Female,No,,Graduate,No,5417,0,143,480,0,Urban,0 296 | LP001947,Male,Yes,0,Graduate,No,2383,3334,172,360,1,Semiurban,1 297 | LP001949,Male,Yes,3+,Graduate,,4416,1250,110,360,1,Urban,1 298 | LP001953,Male,Yes,1,Graduate,No,6875,0,200,360,1,Semiurban,1 299 | LP001954,Female,Yes,1,Graduate,No,4666,0,135,360,1,Urban,1 300 | LP001955,Female,No,0,Graduate,No,5000,2541,151,480,1,Rural,0 301 | LP001963,Male,Yes,1,Graduate,No,2014,2925,113,360,1,Urban,0 302 | LP001964,Male,Yes,0,Not Graduate,No,1800,2934,93,360,0,Urban,0 303 | LP001972,Male,Yes,,Not Graduate,No,2875,1750,105,360,1,Semiurban,1 304 | LP001974,Female,No,0,Graduate,No,5000,0,132,360,1,Rural,1 305 | LP001977,Male,Yes,1,Graduate,No,1625,1803,96,360,1,Urban,1 306 | LP001978,Male,No,0,Graduate,No,4000,2500,140,360,1,Rural,1 307 | LP001990,Male,No,0,Not Graduate,No,2000,0,0,360,1,Urban,0 308 | LP001993,Female,No,0,Graduate,No,3762,1666,135,360,1,Rural,1 309 | LP001994,Female,No,0,Graduate,No,2400,1863,104,360,0,Urban,0 310 | LP001996,Male,No,0,Graduate,No,20233,0,480,360,1,Rural,0 311 | LP001998,Male,Yes,2,Not Graduate,No,7667,0,185,360,,Rural,1 312 | LP002002,Female,No,0,Graduate,No,2917,0,84,360,1,Semiurban,1 313 | LP002004,Male,No,0,Not Graduate,No,2927,2405,111,360,1,Semiurban,1 314 | LP002006,Female,No,0,Graduate,No,2507,0,56,360,1,Rural,1 315 | LP002008,Male,Yes,2,Graduate,Yes,5746,0,144,84,,Rural,1 316 | LP002024,,Yes,0,Graduate,No,2473,1843,159,360,1,Rural,0 317 | LP002031,Male,Yes,1,Not Graduate,No,3399,1640,111,180,1,Urban,1 318 | LP002035,Male,Yes,2,Graduate,No,3717,0,120,360,1,Semiurban,1 319 | LP002036,Male,Yes,0,Graduate,No,2058,2134,88,360,,Urban,1 320 | LP002043,Female,No,1,Graduate,No,3541,0,112,360,,Semiurban,1 321 | LP002050,Male,Yes,1,Graduate,Yes,10000,0,155,360,1,Rural,0 322 | LP002051,Male,Yes,0,Graduate,No,2400,2167,115,360,1,Semiurban,1 323 | LP002053,Male,Yes,3+,Graduate,No,4342,189,124,360,1,Semiurban,1 324 | 
LP002054,Male,Yes,2,Not Graduate,No,3601,1590,0,360,1,Rural,1 325 | LP002055,Female,No,0,Graduate,No,3166,2985,132,360,,Rural,1 326 | LP002065,Male,Yes,3+,Graduate,No,15000,0,300,360,1,Rural,1 327 | LP002067,Male,Yes,1,Graduate,Yes,8666,4983,376,360,0,Rural,0 328 | LP002068,Male,No,0,Graduate,No,4917,0,130,360,0,Rural,1 329 | LP002082,Male,Yes,0,Graduate,Yes,5818,2160,184,360,1,Semiurban,1 330 | LP002086,Female,Yes,0,Graduate,No,4333,2451,110,360,1,Urban,0 331 | LP002087,Female,No,0,Graduate,No,2500,0,67,360,1,Urban,1 332 | LP002097,Male,No,1,Graduate,No,4384,1793,117,360,1,Urban,1 333 | LP002098,Male,No,0,Graduate,No,2935,0,98,360,1,Semiurban,1 334 | LP002100,Male,No,,Graduate,No,2833,0,71,360,1,Urban,1 335 | LP002101,Male,Yes,0,Graduate,,63337,0,490,180,1,Urban,1 336 | LP002103,,Yes,1,Graduate,Yes,9833,1833,182,180,1,Urban,1 337 | LP002106,Male,Yes,,Graduate,Yes,5503,4490,70,,1,Semiurban,1 338 | LP002110,Male,Yes,1,Graduate,,5250,688,160,360,1,Rural,1 339 | LP002112,Male,Yes,2,Graduate,Yes,2500,4600,176,360,1,Rural,1 340 | LP002113,Female,No,3+,Not Graduate,No,1830,0,0,360,0,Urban,0 341 | LP002114,Female,No,0,Graduate,No,4160,0,71,360,1,Semiurban,1 342 | LP002115,Male,Yes,3+,Not Graduate,No,2647,1587,173,360,1,Rural,0 343 | LP002116,Female,No,0,Graduate,No,2378,0,46,360,1,Rural,0 344 | LP002119,Male,Yes,1,Not Graduate,No,4554,1229,158,360,1,Urban,1 345 | LP002126,Male,Yes,3+,Not Graduate,No,3173,0,74,360,1,Semiurban,1 346 | LP002128,Male,Yes,2,Graduate,,2583,2330,125,360,1,Rural,1 347 | LP002129,Male,Yes,0,Graduate,No,2499,2458,160,360,1,Semiurban,1 348 | LP002130,Male,Yes,,Not Graduate,No,3523,3230,152,360,0,Rural,0 349 | LP002131,Male,Yes,2,Not Graduate,No,3083,2168,126,360,1,Urban,1 350 | LP002137,Male,Yes,0,Graduate,No,6333,4583,259,360,,Semiurban,1 351 | LP002138,Male,Yes,0,Graduate,No,2625,6250,187,360,1,Rural,1 352 | LP002139,Male,Yes,0,Graduate,No,9083,0,228,360,1,Semiurban,1 353 | LP002140,Male,No,0,Graduate,No,8750,4167,308,360,1,Rural,0 354 | 
LP002141,Male,Yes,3+,Graduate,No,2666,2083,95,360,1,Rural,1 355 | LP002142,Female,Yes,0,Graduate,Yes,5500,0,105,360,0,Rural,0 356 | LP002143,Female,Yes,0,Graduate,No,2423,505,130,360,1,Semiurban,1 357 | LP002144,Female,No,,Graduate,No,3813,0,116,180,1,Urban,1 358 | LP002149,Male,Yes,2,Graduate,No,8333,3167,165,360,1,Rural,1 359 | LP002151,Male,Yes,1,Graduate,No,3875,0,67,360,1,Urban,0 360 | LP002158,Male,Yes,0,Not Graduate,No,3000,1666,100,480,0,Urban,0 361 | LP002160,Male,Yes,3+,Graduate,No,5167,3167,200,360,1,Semiurban,1 362 | LP002161,Female,No,1,Graduate,No,4723,0,81,360,1,Semiurban,0 363 | LP002170,Male,Yes,2,Graduate,No,5000,3667,236,360,1,Semiurban,1 364 | LP002175,Male,Yes,0,Graduate,No,4750,2333,130,360,1,Urban,1 365 | LP002178,Male,Yes,0,Graduate,No,3013,3033,95,300,,Urban,1 366 | LP002180,Male,No,0,Graduate,Yes,6822,0,141,360,1,Rural,1 367 | LP002181,Male,No,0,Not Graduate,No,6216,0,133,360,1,Rural,0 368 | LP002187,Male,No,0,Graduate,No,2500,0,96,480,1,Semiurban,0 369 | LP002188,Male,No,0,Graduate,No,5124,0,124,,0,Rural,0 370 | LP002190,Male,Yes,1,Graduate,No,6325,0,175,360,1,Semiurban,1 371 | LP002191,Male,Yes,0,Graduate,No,19730,5266,570,360,1,Rural,0 372 | LP002194,Female,No,0,Graduate,Yes,15759,0,55,360,1,Semiurban,1 373 | LP002197,Male,Yes,2,Graduate,No,5185,0,155,360,1,Semiurban,1 374 | LP002201,Male,Yes,2,Graduate,Yes,9323,7873,380,300,1,Rural,1 375 | LP002205,Male,No,1,Graduate,No,3062,1987,111,180,0,Urban,0 376 | LP002209,Female,No,0,Graduate,,2764,1459,110,360,1,Urban,1 377 | LP002211,Male,Yes,0,Graduate,No,4817,923,120,180,1,Urban,1 378 | LP002219,Male,Yes,3+,Graduate,No,8750,4996,130,360,1,Rural,1 379 | LP002223,Male,Yes,0,Graduate,No,4310,0,130,360,,Semiurban,1 380 | LP002224,Male,No,0,Graduate,No,3069,0,71,480,1,Urban,0 381 | LP002225,Male,Yes,2,Graduate,No,5391,0,130,360,1,Urban,1 382 | LP002226,Male,Yes,0,Graduate,,3333,2500,128,360,1,Semiurban,1 383 | LP002229,Male,No,0,Graduate,No,5941,4232,296,360,1,Semiurban,1 384 | 
LP002231,Female,No,0,Graduate,No,6000,0,156,360,1,Urban,1 385 | LP002234,Male,No,0,Graduate,Yes,7167,0,128,360,1,Urban,1 386 | LP002236,Male,Yes,2,Graduate,No,4566,0,100,360,1,Urban,0 387 | LP002237,Male,No,1,Graduate,,3667,0,113,180,1,Urban,1 388 | LP002239,Male,No,0,Not Graduate,No,2346,1600,132,360,1,Semiurban,1 389 | LP002243,Male,Yes,0,Not Graduate,No,3010,3136,0,360,0,Urban,0 390 | LP002244,Male,Yes,0,Graduate,No,2333,2417,136,360,1,Urban,1 391 | LP002250,Male,Yes,0,Graduate,No,5488,0,125,360,1,Rural,1 392 | LP002255,Male,No,3+,Graduate,No,9167,0,185,360,1,Rural,1 393 | LP002262,Male,Yes,3+,Graduate,No,9504,0,275,360,1,Rural,1 394 | LP002263,Male,Yes,0,Graduate,No,2583,2115,120,360,,Urban,1 395 | LP002265,Male,Yes,2,Not Graduate,No,1993,1625,113,180,1,Semiurban,1 396 | LP002266,Male,Yes,2,Graduate,No,3100,1400,113,360,1,Urban,1 397 | LP002272,Male,Yes,2,Graduate,No,3276,484,135,360,,Semiurban,1 398 | LP002277,Female,No,0,Graduate,No,3180,0,71,360,0,Urban,0 399 | LP002281,Male,Yes,0,Graduate,No,3033,1459,95,360,1,Urban,1 400 | LP002284,Male,No,0,Not Graduate,No,3902,1666,109,360,1,Rural,1 401 | LP002287,Female,No,0,Graduate,No,1500,1800,103,360,0,Semiurban,0 402 | LP002288,Male,Yes,2,Not Graduate,No,2889,0,45,180,0,Urban,0 403 | LP002296,Male,No,0,Not Graduate,No,2755,0,65,300,1,Rural,0 404 | LP002297,Male,No,0,Graduate,No,2500,20000,103,360,1,Semiurban,1 405 | LP002300,Female,No,0,Not Graduate,No,1963,0,53,360,1,Semiurban,1 406 | LP002301,Female,No,0,Graduate,Yes,7441,0,194,360,1,Rural,0 407 | LP002305,Female,No,0,Graduate,No,4547,0,115,360,1,Semiurban,1 408 | LP002308,Male,Yes,0,Not Graduate,No,2167,2400,115,360,1,Urban,1 409 | LP002314,Female,No,0,Not Graduate,No,2213,0,66,360,1,Rural,1 410 | LP002315,Male,Yes,1,Graduate,No,8300,0,152,300,0,Semiurban,0 411 | LP002317,Male,Yes,3+,Graduate,No,81000,0,360,360,0,Rural,0 412 | LP002318,Female,No,1,Not Graduate,Yes,3867,0,62,360,1,Semiurban,0 413 | LP002319,Male,Yes,0,Graduate,,6256,0,160,360,,Urban,1 414 | 
LP002328,Male,Yes,0,Not Graduate,No,6096,0,218,360,0,Rural,0 415 | LP002332,Male,Yes,0,Not Graduate,No,2253,2033,110,360,1,Rural,1 416 | LP002335,Female,Yes,0,Not Graduate,No,2149,3237,178,360,0,Semiurban,0 417 | LP002337,Female,No,0,Graduate,No,2995,0,60,360,1,Urban,1 418 | LP002341,Female,No,1,Graduate,No,2600,0,160,360,1,Urban,0 419 | LP002342,Male,Yes,2,Graduate,Yes,1600,20000,239,360,1,Urban,0 420 | LP002345,Male,Yes,0,Graduate,No,1025,2773,112,360,1,Rural,1 421 | LP002347,Male,Yes,0,Graduate,No,3246,1417,138,360,1,Semiurban,1 422 | LP002348,Male,Yes,0,Graduate,No,5829,0,138,360,1,Rural,1 423 | LP002357,Female,No,0,Not Graduate,No,2720,0,80,,0,Urban,0 424 | LP002361,Male,Yes,0,Graduate,No,1820,1719,100,360,1,Urban,1 425 | LP002362,Male,Yes,1,Graduate,No,7250,1667,110,,0,Urban,0 426 | LP002364,Male,Yes,0,Graduate,No,14880,0,96,360,1,Semiurban,1 427 | LP002366,Male,Yes,0,Graduate,No,2666,4300,121,360,1,Rural,1 428 | LP002367,Female,No,1,Not Graduate,No,4606,0,81,360,1,Rural,0 429 | LP002368,Male,Yes,2,Graduate,No,5935,0,133,360,1,Semiurban,1 430 | LP002369,Male,Yes,0,Graduate,No,2920,16.12000084,87,360,1,Rural,1 431 | LP002370,Male,No,0,Not Graduate,No,2717,0,60,180,1,Urban,1 432 | LP002377,Female,No,1,Graduate,Yes,8624,0,150,360,1,Semiurban,1 433 | LP002379,Male,No,0,Graduate,No,6500,0,105,360,0,Rural,0 434 | LP002386,Male,No,0,Graduate,,12876,0,405,360,1,Semiurban,1 435 | LP002387,Male,Yes,0,Graduate,No,2425,2340,143,360,1,Semiurban,1 436 | LP002390,Male,No,0,Graduate,No,3750,0,100,360,1,Urban,1 437 | LP002393,Female,,,Graduate,No,10047,0,0,240,1,Semiurban,1 438 | LP002398,Male,No,0,Graduate,No,1926,1851,50,360,1,Semiurban,1 439 | LP002401,Male,Yes,0,Graduate,No,2213,1125,0,360,1,Urban,1 440 | LP002403,Male,No,0,Graduate,Yes,10416,0,187,360,0,Urban,0 441 | LP002407,Female,Yes,0,Not Graduate,Yes,7142,0,138,360,1,Rural,1 442 | LP002408,Male,No,0,Graduate,No,3660,5064,187,360,1,Semiurban,1 443 | LP002409,Male,Yes,0,Graduate,No,7901,1833,180,360,1,Rural,1 444 | 
LP002418,Male,No,3+,Not Graduate,No,4707,1993,148,360,1,Semiurban,1 445 | LP002422,Male,No,1,Graduate,No,37719,0,152,360,1,Semiurban,1 446 | LP002424,Male,Yes,0,Graduate,No,7333,8333,175,300,,Rural,1 447 | LP002429,Male,Yes,1,Graduate,Yes,3466,1210,130,360,1,Rural,1 448 | LP002434,Male,Yes,2,Not Graduate,No,4652,0,110,360,1,Rural,1 449 | LP002435,Male,Yes,0,Graduate,,3539,1376,55,360,1,Rural,0 450 | LP002443,Male,Yes,2,Graduate,No,3340,1710,150,360,0,Rural,0 451 | LP002444,Male,No,1,Not Graduate,Yes,2769,1542,190,360,,Semiurban,0 452 | LP002446,Male,Yes,2,Not Graduate,No,2309,1255,125,360,0,Rural,0 453 | LP002447,Male,Yes,2,Not Graduate,No,1958,1456,60,300,,Urban,1 454 | LP002448,Male,Yes,0,Graduate,No,3948,1733,149,360,0,Rural,0 455 | LP002449,Male,Yes,0,Graduate,No,2483,2466,90,180,0,Rural,1 456 | LP002453,Male,No,0,Graduate,Yes,7085,0,84,360,1,Semiurban,1 457 | LP002455,Male,Yes,2,Graduate,No,3859,0,96,360,1,Semiurban,1 458 | LP002459,Male,Yes,0,Graduate,No,4301,0,118,360,1,Urban,1 459 | LP002467,Male,Yes,0,Graduate,No,3708,2569,173,360,1,Urban,0 460 | LP002472,Male,No,2,Graduate,No,4354,0,136,360,1,Rural,1 461 | LP002473,Male,Yes,0,Graduate,No,8334,0,160,360,1,Semiurban,0 462 | LP002478,,Yes,0,Graduate,Yes,2083,4083,160,360,,Semiurban,1 463 | LP002484,Male,Yes,3+,Graduate,No,7740,0,128,180,1,Urban,1 464 | LP002487,Male,Yes,0,Graduate,No,3015,2188,153,360,1,Rural,1 465 | LP002489,Female,No,1,Not Graduate,,5191,0,132,360,1,Semiurban,1 466 | LP002493,Male,No,0,Graduate,No,4166,0,98,360,0,Semiurban,0 467 | LP002494,Male,No,0,Graduate,No,6000,0,140,360,1,Rural,1 468 | LP002500,Male,Yes,3+,Not Graduate,No,2947,1664,70,180,0,Urban,0 469 | LP002501,,Yes,0,Graduate,No,16692,0,110,360,1,Semiurban,1 470 | LP002502,Female,Yes,2,Not Graduate,,210,2917,98,360,1,Semiurban,1 471 | LP002505,Male,Yes,0,Graduate,No,4333,2451,110,360,1,Urban,0 472 | LP002515,Male,Yes,1,Graduate,Yes,3450,2079,162,360,1,Semiurban,1 473 | LP002517,Male,Yes,1,Not 
Graduate,No,2653,1500,113,180,0,Rural,0 474 | LP002519,Male,Yes,3+,Graduate,No,4691,0,100,360,1,Semiurban,1 475 | LP002522,Female,No,0,Graduate,Yes,2500,0,93,360,,Urban,1 476 | LP002524,Male,No,2,Graduate,No,5532,4648,162,360,1,Rural,1 477 | LP002527,Male,Yes,2,Graduate,Yes,16525,1014,150,360,1,Rural,1 478 | LP002529,Male,Yes,2,Graduate,No,6700,1750,230,300,1,Semiurban,1 479 | LP002530,,Yes,2,Graduate,No,2873,1872,132,360,0,Semiurban,0 480 | LP002531,Male,Yes,1,Graduate,Yes,16667,2250,86,360,1,Semiurban,1 481 | LP002533,Male,Yes,2,Graduate,No,2947,1603,0,360,1,Urban,0 482 | LP002534,Female,No,0,Not Graduate,No,4350,0,154,360,1,Rural,1 483 | LP002536,Male,Yes,3+,Not Graduate,No,3095,0,113,360,1,Rural,1 484 | LP002537,Male,Yes,0,Graduate,No,2083,3150,128,360,1,Semiurban,1 485 | LP002541,Male,Yes,0,Graduate,No,10833,0,234,360,1,Semiurban,1 486 | LP002543,Male,Yes,2,Graduate,No,8333,0,246,360,1,Semiurban,1 487 | LP002544,Male,Yes,1,Not Graduate,No,1958,2436,131,360,1,Rural,1 488 | LP002545,Male,No,2,Graduate,No,3547,0,80,360,0,Rural,0 489 | LP002547,Male,Yes,1,Graduate,No,18333,0,500,360,1,Urban,0 490 | LP002555,Male,Yes,2,Graduate,Yes,4583,2083,160,360,1,Semiurban,1 491 | LP002556,Male,No,0,Graduate,No,2435,0,75,360,1,Urban,0 492 | LP002560,Male,No,0,Not Graduate,No,2699,2785,96,360,,Semiurban,1 493 | LP002562,Male,Yes,1,Not Graduate,No,5333,1131,186,360,,Urban,1 494 | LP002571,Male,No,0,Not Graduate,No,3691,0,110,360,1,Rural,1 495 | LP002582,Female,No,0,Not Graduate,Yes,17263,0,225,360,1,Semiurban,1 496 | LP002585,Male,Yes,0,Graduate,No,3597,2157,119,360,0,Rural,0 497 | LP002586,Female,Yes,1,Graduate,No,3326,913,105,84,1,Semiurban,1 498 | LP002587,Male,Yes,0,Not Graduate,No,2600,1700,107,360,1,Rural,1 499 | LP002588,Male,Yes,0,Graduate,No,4625,2857,111,12,,Urban,1 500 | LP002600,Male,Yes,1,Graduate,Yes,2895,0,95,360,1,Semiurban,1 501 | LP002602,Male,No,0,Graduate,No,6283,4416,209,360,0,Rural,0 502 | LP002603,Female,No,0,Graduate,No,645,3683,113,480,1,Rural,1 503 | 
LP002606,Female,No,0,Graduate,No,3159,0,100,360,1,Semiurban,1 504 | LP002615,Male,Yes,2,Graduate,No,4865,5624,208,360,1,Semiurban,1 505 | LP002618,Male,Yes,1,Not Graduate,No,4050,5302,138,360,,Rural,0 506 | LP002619,Male,Yes,0,Not Graduate,No,3814,1483,124,300,1,Semiurban,1 507 | LP002622,Male,Yes,2,Graduate,No,3510,4416,243,360,1,Rural,1 508 | LP002624,Male,Yes,0,Graduate,No,20833,6667,480,360,,Urban,1 509 | LP002625,,No,0,Graduate,No,3583,0,96,360,1,Urban,0 510 | LP002626,Male,Yes,0,Graduate,Yes,2479,3013,188,360,1,Urban,1 511 | LP002634,Female,No,1,Graduate,No,13262,0,40,360,1,Urban,1 512 | LP002637,Male,No,0,Not Graduate,No,3598,1287,100,360,1,Rural,0 513 | LP002640,Male,Yes,1,Graduate,No,6065,2004,250,360,1,Semiurban,1 514 | LP002643,Male,Yes,2,Graduate,No,3283,2035,148,360,1,Urban,1 515 | LP002648,Male,Yes,0,Graduate,No,2130,6666,70,180,1,Semiurban,0 516 | LP002652,Male,No,0,Graduate,No,5815,3666,311,360,1,Rural,0 517 | LP002659,Male,Yes,3+,Graduate,No,3466,3428,150,360,1,Rural,1 518 | LP002670,Female,Yes,2,Graduate,No,2031,1632,113,480,1,Semiurban,1 519 | LP002682,Male,Yes,,Not Graduate,No,3074,1800,123,360,0,Semiurban,0 520 | LP002683,Male,No,0,Graduate,No,4683,1915,185,360,1,Semiurban,0 521 | LP002684,Female,No,0,Not Graduate,No,3400,0,95,360,1,Rural,0 522 | LP002689,Male,Yes,2,Not Graduate,No,2192,1742,45,360,1,Semiurban,1 523 | LP002690,Male,No,0,Graduate,No,2500,0,55,360,1,Semiurban,1 524 | LP002692,Male,Yes,3+,Graduate,Yes,5677,1424,100,360,1,Rural,1 525 | LP002693,Male,Yes,2,Graduate,Yes,7948,7166,480,360,1,Rural,1 526 | LP002697,Male,No,0,Graduate,No,4680,2087,0,360,1,Semiurban,0 527 | LP002699,Male,Yes,2,Graduate,Yes,17500,0,400,360,1,Rural,1 528 | LP002705,Male,Yes,0,Graduate,No,3775,0,110,360,1,Semiurban,1 529 | LP002706,Male,Yes,1,Not Graduate,No,5285,1430,161,360,0,Semiurban,1 530 | LP002714,Male,No,1,Not Graduate,No,2679,1302,94,360,1,Semiurban,1 531 | LP002716,Male,No,0,Not Graduate,No,6783,0,130,360,1,Semiurban,1 532 | 
LP002717,Male,Yes,0,Graduate,No,1025,5500,216,360,,Rural,1 533 | LP002720,Male,Yes,3+,Graduate,No,4281,0,100,360,1,Urban,1 534 | LP002723,Male,No,2,Graduate,No,3588,0,110,360,0,Rural,0 535 | LP002729,Male,No,1,Graduate,No,11250,0,196,360,,Semiurban,0 536 | LP002731,Female,No,0,Not Graduate,Yes,18165,0,125,360,1,Urban,1 537 | LP002732,Male,No,0,Not Graduate,,2550,2042,126,360,1,Rural,1 538 | LP002734,Male,Yes,0,Graduate,No,6133,3906,324,360,1,Urban,1 539 | LP002738,Male,No,2,Graduate,No,3617,0,107,360,1,Semiurban,1 540 | LP002739,Male,Yes,0,Not Graduate,No,2917,536,66,360,1,Rural,0 541 | LP002740,Male,Yes,3+,Graduate,No,6417,0,157,180,1,Rural,1 542 | LP002741,Female,Yes,1,Graduate,No,4608,2845,140,180,1,Semiurban,1 543 | LP002743,Female,No,0,Graduate,No,2138,0,99,360,0,Semiurban,0 544 | LP002753,Female,No,1,Graduate,,3652,0,95,360,1,Semiurban,1 545 | LP002755,Male,Yes,1,Not Graduate,No,2239,2524,128,360,1,Urban,1 546 | LP002757,Female,Yes,0,Not Graduate,No,3017,663,102,360,,Semiurban,1 547 | LP002767,Male,Yes,0,Graduate,No,2768,1950,155,360,1,Rural,1 548 | LP002768,Male,No,0,Not Graduate,No,3358,0,80,36,1,Semiurban,0 549 | LP002772,Male,No,0,Graduate,No,2526,1783,145,360,1,Rural,1 550 | LP002776,Female,No,0,Graduate,No,5000,0,103,360,0,Semiurban,0 551 | LP002777,Male,Yes,0,Graduate,No,2785,2016,110,360,1,Rural,1 552 | LP002778,Male,Yes,2,Graduate,Yes,6633,0,0,360,0,Rural,0 553 | LP002784,Male,Yes,1,Not Graduate,No,2492,2375,0,360,1,Rural,1 554 | LP002785,Male,Yes,1,Graduate,No,3333,3250,158,360,1,Urban,1 555 | LP002788,Male,Yes,0,Not Graduate,No,2454,2333,181,360,0,Urban,0 556 | LP002789,Male,Yes,0,Graduate,No,3593,4266,132,180,0,Rural,0 557 | LP002792,Male,Yes,1,Graduate,No,5468,1032,26,360,1,Semiurban,1 558 | LP002794,Female,No,0,Graduate,No,2667,1625,84,360,,Urban,1 559 | LP002795,Male,Yes,3+,Graduate,Yes,10139,0,260,360,1,Semiurban,1 560 | LP002798,Male,Yes,0,Graduate,No,3887,2669,162,360,1,Semiurban,1 561 | 
LP002804,Female,Yes,0,Graduate,No,4180,2306,182,360,1,Semiurban,1 562 | LP002807,Male,Yes,2,Not Graduate,No,3675,242,108,360,1,Semiurban,1 563 | LP002813,Female,Yes,1,Graduate,Yes,19484,0,600,360,1,Semiurban,1 564 | LP002820,Male,Yes,0,Graduate,No,5923,2054,211,360,1,Rural,1 565 | LP002821,Male,No,0,Not Graduate,Yes,5800,0,132,360,1,Semiurban,1 566 | LP002832,Male,Yes,2,Graduate,No,8799,0,258,360,0,Urban,0 567 | LP002833,Male,Yes,0,Not Graduate,No,4467,0,120,360,,Rural,1 568 | LP002836,Male,No,0,Graduate,No,3333,0,70,360,1,Urban,1 569 | LP002837,Male,Yes,3+,Graduate,No,3400,2500,123,360,0,Rural,0 570 | LP002840,Female,No,0,Graduate,No,2378,0,9,360,1,Urban,0 571 | LP002841,Male,Yes,0,Graduate,No,3166,2064,104,360,0,Urban,0 572 | LP002842,Male,Yes,1,Graduate,No,3417,1750,186,360,1,Urban,1 573 | LP002847,Male,Yes,,Graduate,No,5116,1451,165,360,0,Urban,0 574 | LP002855,Male,Yes,2,Graduate,No,16666,0,275,360,1,Urban,1 575 | LP002862,Male,Yes,2,Not Graduate,No,6125,1625,187,480,1,Semiurban,0 576 | LP002863,Male,Yes,3+,Graduate,No,6406,0,150,360,1,Semiurban,0 577 | LP002868,Male,Yes,2,Graduate,No,3159,461,108,84,1,Urban,1 578 | LP002872,,Yes,0,Graduate,No,3087,2210,136,360,0,Semiurban,0 579 | LP002874,Male,No,0,Graduate,No,3229,2739,110,360,1,Urban,1 580 | LP002877,Male,Yes,1,Graduate,No,1782,2232,107,360,1,Rural,1 581 | LP002888,Male,No,0,Graduate,,3182,2917,161,360,1,Urban,1 582 | LP002892,Male,Yes,2,Graduate,No,6540,0,205,360,1,Semiurban,1 583 | LP002893,Male,No,0,Graduate,No,1836,33837,90,360,1,Urban,0 584 | LP002894,Female,Yes,0,Graduate,No,3166,0,36,360,1,Semiurban,1 585 | LP002898,Male,Yes,1,Graduate,No,1880,0,61,360,,Rural,0 586 | LP002911,Male,Yes,1,Graduate,No,2787,1917,146,360,0,Rural,0 587 | LP002912,Male,Yes,1,Graduate,No,4283,3000,172,84,1,Rural,0 588 | LP002916,Male,Yes,0,Graduate,No,2297,1522,104,360,1,Urban,1 589 | LP002917,Female,No,0,Not Graduate,No,2165,0,70,360,1,Semiurban,1 590 | LP002925,,No,0,Graduate,No,4750,0,94,360,1,Semiurban,1 591 | 
LP002926,Male,Yes,2,Graduate,Yes,2726,0,106,360,0,Semiurban,0 592 | LP002928,Male,Yes,0,Graduate,No,3000,3416,56,180,1,Semiurban,1 593 | LP002931,Male,Yes,2,Graduate,Yes,6000,0,205,240,1,Semiurban,0 594 | LP002933,,No,3+,Graduate,Yes,9357,0,292,360,1,Semiurban,1 595 | LP002936,Male,Yes,0,Graduate,No,3859,3300,142,180,1,Rural,1 596 | LP002938,Male,Yes,0,Graduate,Yes,16120,0,260,360,1,Urban,1 597 | LP002940,Male,No,0,Not Graduate,No,3833,0,110,360,1,Rural,1 598 | LP002941,Male,Yes,2,Not Graduate,Yes,6383,1000,187,360,1,Rural,0 599 | LP002943,Male,No,,Graduate,No,2987,0,88,360,0,Semiurban,0 600 | LP002945,Male,Yes,0,Graduate,Yes,9963,0,180,360,1,Rural,1 601 | LP002948,Male,Yes,2,Graduate,No,5780,0,192,360,1,Urban,1 602 | LP002949,Female,No,3+,Graduate,,416,41667,350,180,,Urban,0 603 | LP002950,Male,Yes,0,Not Graduate,,2894,2792,155,360,1,Rural,1 604 | LP002953,Male,Yes,3+,Graduate,No,5703,0,128,360,1,Urban,1 605 | LP002958,Male,No,0,Graduate,No,3676,4301,172,360,1,Rural,1 606 | LP002959,Female,Yes,1,Graduate,No,12000,0,496,360,1,Semiurban,1 607 | LP002960,Male,Yes,0,Not Graduate,No,2400,3800,0,180,1,Urban,0 608 | LP002961,Male,Yes,1,Graduate,No,3400,2500,173,360,1,Semiurban,1 609 | LP002964,Male,Yes,2,Not Graduate,No,3987,1411,157,360,1,Rural,1 610 | LP002974,Male,Yes,0,Graduate,No,3232,1950,108,360,1,Rural,1 611 | LP002978,Female,No,0,Graduate,No,2900,0,71,360,1,Rural,1 612 | LP002979,Male,Yes,3+,Graduate,No,4106,0,40,180,1,Rural,1 613 | LP002983,Male,Yes,1,Graduate,No,8072,240,253,360,1,Urban,1 614 | LP002984,Male,Yes,2,Graduate,No,7583,0,187,360,1,Urban,1 615 | LP002990,Female,No,0,Graduate,Yes,4583,0,133,360,0,Semiurban,0 616 | -------------------------------------------------------------------------------- /Chapter 3/Log_ROC.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Apress/supervised-learning-w-python/68c94f12d27647fa3dcd6b19d83edfc0bb3c5f39/Chapter 3/Log_ROC.png 
-------------------------------------------------------------------------------- /Chapter 3/Naive Bayes Case Study.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Income prediction on census data" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Objective: \n", 15 | "To predict whether income exceeds 50K/yr based on census data" 16 | ] 17 | }, 18 | { 19 | "cell_type": "markdown", 20 | "metadata": {}, 21 | "source": [ 22 | "Dataset: Adult Data Set\n", 23 | "\n", 24 | "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data\n" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "Variable description:\n", 32 | " \n", 33 | "age: continuous\n", 34 | "\n", 35 | "workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.\n", 36 | "\n", 37 | "fnlwgt: continuous.\n", 38 | "\n", 39 | "education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.\n", 40 | "\n", 41 | "education-num: continuous.\n", 42 | "\n", 43 | "marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.\n", 44 | "\n", 45 | "occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.\n", 46 | "\n", 47 | "relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.\n", 48 | "\n", 49 | "race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.\n", 50 | "\n", 51 | "sex: Female, Male.\n", 52 | "\n", 53 | "capital-gain: continuous.\n", 54 | "\n", 55 | 
"capital-loss: continuous.\n", 56 | "\n", 57 | "hours-per-week: continuous.\n", 58 | "\n", 59 | "native-country: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.\n", 60 | "\n", 61 | "class: >50K, <=50K" 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": 1, 67 | "metadata": {}, 68 | "outputs": [], 69 | "source": [ 70 | "# Pandas and NumPy libraries\n", 71 | "import pandas as pd\n", 72 | "import numpy as np" 73 | ] 74 | }, 75 | { 76 | "cell_type": "code", 77 | "execution_count": 2, 78 | "metadata": {}, 79 | "outputs": [], 80 | "source": [ 81 | "# For preprocessing the data\n", 82 | "#from sklearn.preprocessing import Imputer\n", 83 | "from sklearn import preprocessing" 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": 3, 89 | "metadata": {}, 90 | "outputs": [], 91 | "source": [ 92 | "# To split the dataset into train and test datasets\n", 93 | "from sklearn.model_selection import train_test_split" 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": 4, 99 | "metadata": {}, 100 | "outputs": [], 101 | "source": [ 102 | "# To model the Gaussian Naive Bayes classifier\n", 103 | "from sklearn.naive_bayes import GaussianNB" 104 | ] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "execution_count": 5, 109 | "metadata": {}, 110 | "outputs": [], 111 | "source": [ 112 | "# To calculate the accuracy score of the model\n", 113 | "from sklearn.metrics import accuracy_score" 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "execution_count": 6, 119 | "metadata": {}, 120 | "outputs": [], 121 | "source": [ 122 | "census_df = pd.read_csv('adult.data',
header = None, delimiter=' *, *', engine='python')" 123 | ] 124 | }, 125 | { 126 | "cell_type": "markdown", 127 | "metadata": {}, 128 | "source": [ 129 | "Load the dataset. Observe that this file has a .data extension.\n", 130 | "\n", 131 | "To import the census data, we use the pandas read_csv() method, which is a simple and fast way to load delimited \n", 132 | "data.\n", 133 | "\n", 134 | "We pass four parameters. The first, ‘adult.data’, is the file name. The header parameter tells pandas\n", 135 | "whether the first row of the file contains column headers. Our dataset has no header row, so we pass None.\n", 136 | "\n", 137 | "The delimiter parameter specifies the separator between values. Here we use the regular expression ' *, *', \n", 138 | "which matches a comma together with any spaces before and after it, so stray spaces are stripped from the values. This is \n", 139 | "helpful when the spacing around values is inconsistent. Because the delimiter is a regular expression, we also pass engine='python'."
140 | ] 141 | }, 142 | { 143 | "cell_type": "code", 144 | "execution_count": 8, 145 | "metadata": {}, 146 | "outputs": [ 147 | { 148 | "data": { 149 | "text/plain": [ 150 | "Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14], dtype='int64')" 151 | ] 152 | }, 153 | "execution_count": 8, 154 | "metadata": {}, 155 | "output_type": "execute_result" 156 | } 157 | ], 158 | "source": [ 159 | "# Print columns in the adult data set\n", 160 | "census_df.columns" 161 | ] 162 | }, 163 | { 164 | "cell_type": "code", 165 | "execution_count": 9, 166 | "metadata": {}, 167 | "outputs": [], 168 | "source": [ 169 | "# Adding headers to the dataframe \n", 170 | "census_df.columns = ['age', 'workclass', 'fnlwgt', 'education', 'education_num', 'marital_status', 'occupation', 'relationship',\n", 171 | " 'race', 'sex', 'capital_gain', 'capital_loss', 'hours_per_week', 'native_country', 'income']" 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": 10, 177 | "metadata": {}, 178 | "outputs": [ 179 | { 180 | "data": { 181 | "text/plain": [ 182 | "32561" 183 | ] 184 | }, 185 | "execution_count": 10, 186 | "metadata": {}, 187 | "output_type": "execute_result" 188 | } 189 | ], 190 | "source": [ 191 | "# Number of records(rows) in the dataframe\n", 192 | "len(census_df)" 193 | ] 194 | }, 195 | { 196 | "cell_type": "code", 197 | "execution_count": 11, 198 | "metadata": {}, 199 | "outputs": [ 200 | { 201 | "data": { 202 | "text/plain": [ 203 | "age 0\n", 204 | "workclass 0\n", 205 | "fnlwgt 0\n", 206 | "education 0\n", 207 | "education_num 0\n", 208 | "marital_status 0\n", 209 | "occupation 0\n", 210 | "relationship 0\n", 211 | "race 0\n", 212 | "sex 0\n", 213 | "capital_gain 0\n", 214 | "capital_loss 0\n", 215 | "hours_per_week 0\n", 216 | "native_country 0\n", 217 | "income 0\n", 218 | "dtype: int64" 219 | ] 220 | }, 221 | "execution_count": 11, 222 | "metadata": {}, 223 | "output_type": "execute_result" 224 | } 225 | ], 226 | "source": [ 227 | "# Handling 
missing data\n", 228 | "# Check whether the dataset contains any null values, using the isnull() method.\n", 229 | "census_df.isnull().sum()" 230 | ] 231 | }, 232 | { 233 | "cell_type": "markdown", 234 | "metadata": {}, 235 | "source": [ 236 | "The above output shows that there are no null values in our dataset.\n", 237 | "\n", 238 | "Next, let’s check whether any categorical attribute contains a “?”. Datasets sometimes use “?” or a blank in place of \n", 239 | "missing values. The code snippet below counts how many values equal “?” in each categorical column of the census_df \n", 240 | "dataframe." 241 | ] 242 | }, 243 | { 244 | "cell_type": "code", 245 | "execution_count": 12, 246 | "metadata": {}, 247 | "outputs": [ 248 | { 249 | "name": "stdout", 250 | "output_type": "stream", 251 | "text": [ 252 | "workclass : 1836\n", 253 | "education : 0\n", 254 | "marital_status : 0\n", 255 | "occupation : 1843\n", 256 | "relationship : 0\n", 257 | "race : 0\n", 258 | "sex : 0\n", 259 | "native_country : 583\n", 260 | "income : 0\n" 261 | ] 262 | } 263 | ], 264 | "source": [ 265 | "for value in ['workclass','education','marital_status','occupation','relationship','race','sex','native_country','income']:\n", 266 | " print(value,\":\", sum(census_df[value] == '?'))" 267 | ] 268 | }, 269 | { 270 | "cell_type": "markdown", 271 | "metadata": {}, 272 | "source": [ 273 | "The output of the above code snippet shows that there are 1836 missing values in the workclass attribute, 1843 in the \n", 274 | "occupation attribute and 583 in the native_country attribute." 275 | ] 276 | }, 277 | { 278 | "cell_type": "markdown", 279 | "metadata": {}, 280 | "source": [ 281 | "# Data preprocessing\n", 282 | "\n", 283 | "For preprocessing, we make a duplicate copy of our original dataframe: census_df is copied to census_df_rev. \n", 284 | "Observe that we use a deep copy, so that changes made to census_df_rev during preprocessing do not affect census_df."
285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": 14, 290 | "metadata": {}, 291 | "outputs": [], 292 | "source": [ 293 | "# Deep copy of census_df\n", 294 | "census_df_rev = census_df.copy(deep=True)" 295 | ] 296 | }, 297 | { 298 | "cell_type": "markdown", 299 | "metadata": {}, 300 | "source": [ 301 | "Before handling the missing values, we need some summary statistics of our dataframe. For this, we can use the describe() \n", 302 | "method. It generates various summary statistics, excluding NaN values, and by default covers only the numeric columns." 303 | ] 304 | }, 305 | { 306 | "cell_type": "code", 307 | "execution_count": 15, 308 | "metadata": {}, 309 | "outputs": [ 310 | { 311 | "data": { 312 | "text/html": [ 313 | "
" 415 | ], 416 | "text/plain": [ 417 | " age fnlwgt education_num capital_gain capital_loss \\\n", 418 | "count 32561.000000 3.256100e+04 32561.000000 32561.000000 32561.000000 \n", 419 | "mean 38.581647 1.897784e+05 10.080679 1077.648844 87.303830 \n", 420 | "std 13.640433 1.055500e+05 2.572720 7385.292085 402.960219 \n", 421 | "min 17.000000 1.228500e+04 1.000000 0.000000 0.000000 \n", 422 | "25% 28.000000 1.178270e+05 9.000000 0.000000 0.000000 \n", 423 | "50% 37.000000 1.783560e+05 10.000000 0.000000 0.000000 \n", 424 | "75% 48.000000 2.370510e+05 12.000000 0.000000 0.000000 \n", 425 | "max 90.000000 1.484705e+06 16.000000 99999.000000 4356.000000 \n", 426 | "\n", 427 | " hours_per_week \n", 428 | "count 32561.000000 \n", 429 | "mean 40.437456 \n", 430 | "std 12.347429 \n", 431 | "min 1.000000 \n", 432 | "25% 40.000000 \n", 433 | "50% 40.000000 \n", 434 | "75% 45.000000 \n", 435 | "max 99.000000 " 436 | ] 437 | }, 438 | "execution_count": 15, 439 | "metadata": {}, 440 | "output_type": "execute_result" 441 | } 442 | ], 443 | "source": [ 444 | "census_df_rev.describe()" 445 | ] 446 | }, 447 | { 448 | "cell_type": "markdown", 449 | "metadata": {}, 450 | "source": [ 451 | "We pass the “include” parameter with the value “all” to specify that we want summary statistics for all the \n", 452 | "attributes, categorical as well as numeric." 453 | ] 454 | }, 455 | { 456 | "cell_type": "code", 457 | "execution_count": 16, 458 | "metadata": {}, 459 | "outputs": [ 460 | { 461 | "data": { 462 | "text/html": [ 463 | "
" 700 | ], 701 | "text/plain": [ 702 | " age workclass fnlwgt education education_num \\\n", 703 | "count 32561.000000 32561 3.256100e+04 32561 32561.000000 \n", 704 | "unique NaN 9 NaN 16 NaN \n", 705 | "top NaN Private NaN HS-grad NaN \n", 706 | "freq NaN 22696 NaN 10501 NaN \n", 707 | "mean 38.581647 NaN 1.897784e+05 NaN 10.080679 \n", 708 | "std 13.640433 NaN 1.055500e+05 NaN 2.572720 \n", 709 | "min 17.000000 NaN 1.228500e+04 NaN 1.000000 \n", 710 | "25% 28.000000 NaN 1.178270e+05 NaN 9.000000 \n", 711 | "50% 37.000000 NaN 1.783560e+05 NaN 10.000000 \n", 712 | "75% 48.000000 NaN 2.370510e+05 NaN 12.000000 \n", 713 | "max 90.000000 NaN 1.484705e+06 NaN 16.000000 \n", 714 | "\n", 715 | " marital_status occupation relationship race sex \\\n", 716 | "count 32561 32561 32561 32561 32561 \n", 717 | "unique 7 15 6 5 2 \n", 718 | "top Married-civ-spouse Prof-specialty Husband White Male \n", 719 | "freq 14976 4140 13193 27816 21790 \n", 720 | "mean NaN NaN NaN NaN NaN \n", 721 | "std NaN NaN NaN NaN NaN \n", 722 | "min NaN NaN NaN NaN NaN \n", 723 | "25% NaN NaN NaN NaN NaN \n", 724 | "50% NaN NaN NaN NaN NaN \n", 725 | "75% NaN NaN NaN NaN NaN \n", 726 | "max NaN NaN NaN NaN NaN \n", 727 | "\n", 728 | " capital_gain capital_loss hours_per_week native_country income \n", 729 | "count 32561.000000 32561.000000 32561.000000 32561 32561 \n", 730 | "unique NaN NaN NaN 42 2 \n", 731 | "top NaN NaN NaN United-States <=50K \n", 732 | "freq NaN NaN NaN 29170 24720 \n", 733 | "mean 1077.648844 87.303830 40.437456 NaN NaN \n", 734 | "std 7385.292085 402.960219 12.347429 NaN NaN \n", 735 | "min 0.000000 0.000000 1.000000 NaN NaN \n", 736 | "25% 0.000000 0.000000 40.000000 NaN NaN \n", 737 | "50% 0.000000 0.000000 40.000000 NaN NaN \n", 738 | "75% 0.000000 0.000000 45.000000 NaN NaN \n", 739 | "max 99999.000000 4356.000000 99.000000 NaN NaN " 740 | ] 741 | }, 742 | "execution_count": 16, 743 | "metadata": {}, 744 | "output_type": "execute_result" 745 | } 746 | ], 747 | "source": 
[ 748 | "census_df_rev.describe(include= 'all')" 749 | ] 750 | }, 751 | { 752 | "cell_type": "markdown", 753 | "metadata": {}, 754 | "source": [ 755 | "# Data imputation \n" 756 | ] 757 | }, 758 | { 759 | "cell_type": "code", 760 | "execution_count": 17, 761 | "metadata": {}, 762 | "outputs": [], 775 | "source": [ 776 | "for value in ['workclass','education','marital_status','occupation','relationship','race','sex','native_country','income']:\n", 777 | " replaceValue = census_df_rev.describe(include='all')[value]['top'] # most frequent category\n", 778 | " census_df_rev.loc[census_df_rev[value]=='?', value] = replaceValue # .loc avoids chained assignment" 779 | ] 780 | }, 781 | { 782 | "cell_type": "code", 783 | "execution_count": 18, 784 | "metadata": {}, 785 | "outputs": [], 786 | "source": [ 787 | "# Label encoding: map each category to an integer code. Note that the encoders are fit on census_df, so '?' remains its own category; fit on census_df_rev to encode the imputed values instead\n", 788 | "le = preprocessing.LabelEncoder()\n", 789 | "workclass_category = le.fit_transform(census_df.workclass)\n", 790 | "education_category = le.fit_transform(census_df.education)\n", 791 | "marital_category = le.fit_transform(census_df.marital_status)\n", 792 | "occupation_category = le.fit_transform(census_df.occupation)\n", 793 | "relationship_category = le.fit_transform(census_df.relationship)\n", 794 | "race_category = le.fit_transform(census_df.race)\n", 795 | "sex_category = le.fit_transform(census_df.sex)\n", 796 | "native_country_category = le.fit_transform(census_df.native_country)" 797 | ] 798 | }, 799 | { 800 | "cell_type": "code", 801 | "execution_count": 20,
802 | "metadata": {}, 803 | "outputs": [], 804 | "source": [ 805 | "#initialize the encoded categorical columns\n", 806 | "census_df_rev['workclass_category'] = workclass_category\n", 807 | "census_df_rev['education_category'] = education_category\n", 808 | "census_df_rev['marital_category'] = marital_category\n", 809 | "census_df_rev['occupation_category'] = occupation_category\n", 810 | "census_df_rev['relationship_category'] = relationship_category\n", 811 | "census_df_rev['race_category'] = race_category\n", 812 | "census_df_rev['sex_category'] = sex_category\n", 813 | "census_df_rev['native_country_category'] = native_country_category" 814 | ] 815 | }, 816 | { 817 | "cell_type": "code", 818 | "execution_count": 21, 819 | "metadata": {}, 820 | "outputs": [ 821 | { 822 | "data": { 823 | "text/html": [ 824 | "
" 990 | ], 991 | "text/plain": [ 992 | " age workclass fnlwgt education education_num \\\n", 993 | "0 39 State-gov 77516 Bachelors 13 \n", 994 | "1 50 Self-emp-not-inc 83311 Bachelors 13 \n", 995 | "2 38 Private 215646 HS-grad 9 \n", 996 | "3 53 Private 234721 11th 7 \n", 997 | "4 28 Private 338409 Bachelors 13 \n", 998 | "\n", 999 | " marital_status occupation relationship race sex ... \\\n", 1000 | "0 Never-married Adm-clerical Not-in-family White Male ... \n", 1001 | "1 Married-civ-spouse Exec-managerial Husband White Male ... \n", 1002 | "2 Divorced Handlers-cleaners Not-in-family White Male ... \n", 1003 | "3 Married-civ-spouse Handlers-cleaners Husband Black Male ... \n", 1004 | "4 Married-civ-spouse Prof-specialty Wife Black Female ... \n", 1005 | "\n", 1006 | " native_country income workclass_category education_category \\\n", 1007 | "0 United-States <=50K 7 9 \n", 1008 | "1 United-States <=50K 6 9 \n", 1009 | "2 United-States <=50K 4 11 \n", 1010 | "3 United-States <=50K 4 1 \n", 1011 | "4 Cuba <=50K 4 9 \n", 1012 | "\n", 1013 | " marital_category occupation_category relationship_category race_category \\\n", 1014 | "0 4 1 1 4 \n", 1015 | "1 2 4 0 4 \n", 1016 | "2 0 6 1 4 \n", 1017 | "3 2 6 0 2 \n", 1018 | "4 2 10 5 2 \n", 1019 | "\n", 1020 | " sex_category native_country_category \n", 1021 | "0 1 39 \n", 1022 | "1 1 39 \n", 1023 | "2 1 39 \n", 1024 | "3 1 39 \n", 1025 | "4 0 5 \n", 1026 | "\n", 1027 | "[5 rows x 23 columns]" 1028 | ] 1029 | }, 1030 | "execution_count": 21, 1031 | "metadata": {}, 1032 | "output_type": "execute_result" 1033 | } 1034 | ], 1035 | "source": [ 1036 | "census_df_rev.head()" 1037 | ] 1038 | }, 1039 | { 1040 | "cell_type": "code", 1041 | "execution_count": 22, 1042 | "metadata": {}, 1043 | "outputs": [], 1044 | "source": [ 1045 | "#drop the old categorical columns from dataframe\n", 1046 | "dummy_fields = ['workclass','education','marital_status','occupation','relationship','race', 'sex', 'native_country']\n", 1047 | 
"census_df_rev = census_df_rev.drop(dummy_fields, axis = 1)" 1048 | ] 1049 | }, 1050 | { 1051 | "cell_type": "code", 1052 | "execution_count": 23, 1053 | "metadata": {}, 1054 | "outputs": [ 1055 | { 1056 | "ename": "AttributeError", 1057 | "evalue": "'DataFrame' object has no attribute 'reindex_axis'", 1058 | "output_type": "error", 1059 | "traceback": [ 1060 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 1061 | "\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)", 1062 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m census_df_rev = census_df_rev.reindex_axis(['age', 'workclass_category', 'fnlwgt', 'education_category',\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0;34m'education_num'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'marital_category'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'occupation_category'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;34m'relationship_category'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'race_category'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'sex_category'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'capital_gain'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;34m'capital_loss'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'hours_per_week'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'native_country_category'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m 'income'], axis= 1)\n", 1063 | "\u001b[0;32m~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py\u001b[0m in \u001b[0;36m__getattr__\u001b[0;34m(self, name)\u001b[0m\n\u001b[1;32m 5177\u001b[0m \u001b[0;32mif\u001b[0m 
\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_info_axis\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_can_hold_identifiers_and_holds_name\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5178\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 5179\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mobject\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__getattribute__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 5180\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5181\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__setattr__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mvalue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 1064 | "\u001b[0;31mAttributeError\u001b[0m: 'DataFrame' object has no attribute 'reindex_axis'" 1065 | ] 1066 | } 1067 | ], 1068 | "source": [ 1069 | "census_df_rev = census_df_rev.reindex_axis(['age', 'workclass_category', 'fnlwgt', 'education_category',\n", 1070 | " 'education_num', 'marital_category', 'occupation_category',\n", 1071 | " 'relationship_category', 'race_category', 'sex_category', 'capital_gain',\n", 1072 | " 'capital_loss', 'hours_per_week', 'native_country_category', \n", 1073 | " 'income'], axis= 1)\n", 1074 | "census_df_rev.head(5)" 1075 | ] 1076 | }, 1077 | { 1078 | "cell_type": "code", 1079 | "execution_count": 24, 1080 | "metadata": {}, 1081 | "outputs": [ 1082 | { 1083 | "data": { 1084 | "text/html": [ 1085 | "
" 1214 | ], 1215 | "text/plain": [ 1216 | " age workclass_category fnlwgt education_category education_num \\\n", 1217 | "0 39 7 77516 9 13 \n", 1218 | "1 50 6 83311 9 13 \n", 1219 | "2 38 4 215646 11 9 \n", 1220 | "3 53 4 234721 1 7 \n", 1221 | "4 28 4 338409 9 13 \n", 1222 | "\n", 1223 | " marital_category occupation_category relationship_category \\\n", 1224 | "0 4 1 1 \n", 1225 | "1 2 4 0 \n", 1226 | "2 0 6 1 \n", 1227 | "3 2 6 0 \n", 1228 | "4 2 10 5 \n", 1229 | "\n", 1230 | " race_category sex_category capital_gain capital_loss hours_per_week \\\n", 1231 | "0 4 1 2174 0 40 \n", 1232 | "1 4 1 0 0 13 \n", 1233 | "2 4 1 0 0 40 \n", 1234 | "3 2 1 0 0 40 \n", 1235 | "4 2 0 0 0 40 \n", 1236 | "\n", 1237 | " native_country_category income \n", 1238 | "0 39 <=50K \n", 1239 | "1 39 <=50K \n", 1240 | "2 39 <=50K \n", 1241 | "3 39 <=50K \n", 1242 | "4 5 <=50K " 1243 | ] 1244 | }, 1245 | "execution_count": 24, 1246 | "metadata": {}, 1247 | "output_type": "execute_result" 1248 | } 1249 | ], 1250 | "source": [ 1251 | "census_df_rev = census_df_rev.reindex(['age', 'workclass_category', 'fnlwgt', 'education_category',\n", 1252 | " 'education_num', 'marital_category', 'occupation_category',\n", 1253 | " 'relationship_category', 'race_category', 'sex_category', 'capital_gain',\n", 1254 | " 'capital_loss', 'hours_per_week', 'native_country_category', \n", 1255 | " 'income'], axis= 1)\n", 1256 | "census_df_rev.head(5)" 1257 | ] 1258 | }, 1259 | { 1260 | "cell_type": "markdown", 1261 | "metadata": {}, 1262 | "source": [ 1263 | "# Data Slicing" 1264 | ] 1265 | }, 1266 | { 1267 | "cell_type": "code", 1268 | "execution_count": 25, 1269 | "metadata": {}, 1270 | "outputs": [], 1271 | "source": [ 1272 | "X = census_df_rev.values[:,:14]\n", 1273 | "Y = census_df_rev.values[:,14] " 1274 | ] 1275 | }, 1276 | { 1277 | "cell_type": "code", 1278 | "execution_count": 26, 1279 | "metadata": {}, 1280 | "outputs": [], 1281 | "source": [ 1282 | "\n", 1283 | "X_train, X_test, Y_train, Y_test = 
train_test_split(X, Y, test_size = 0.25, random_state = 5)" 1284 | ] 1285 | }, 1286 | { 1287 | "cell_type": "markdown", 1288 | "metadata": {}, 1289 | "source": [ 1290 | "Implement Gaussian Naive Bayes" 1291 | ] 1292 | }, 1293 | { 1294 | "cell_type": "code", 1295 | "execution_count": 27, 1296 | "metadata": {}, 1297 | "outputs": [ 1298 | { 1299 | "data": { 1300 | "text/plain": [ 1301 | "GaussianNB()" 1302 | ] 1303 | }, 1304 | "execution_count": 27, 1305 | "metadata": {}, 1306 | "output_type": "execute_result" 1307 | } 1308 | ], 1309 | "source": [ 1310 | "clf = GaussianNB()\n", 1311 | "clf.fit(X_train, Y_train)" 1312 | ] 1313 | }, 1314 | { 1315 | "cell_type": "code", 1316 | "execution_count": 28, 1317 | "metadata": {}, 1318 | "outputs": [], 1319 | "source": [ 1320 | "Y_pred = clf.predict(X_test)" 1321 | ] 1322 | }, 1323 | { 1324 | "cell_type": "code", 1325 | "execution_count": 29, 1326 | "metadata": {}, 1327 | "outputs": [ 1328 | { 1329 | "data": { 1330 | "text/plain": [ 1331 | "0.7903205994349588" 1332 | ] 1333 | }, 1334 | "execution_count": 29, 1335 | "metadata": {}, 1336 | "output_type": "execute_result" 1337 | } 1338 | ], 1339 | "source": [ 1340 | "accuracy_score(Y_test, Y_pred, normalize = True)" 1341 | ] 1342 | }, 1343 | { 1344 | "cell_type": "code", 1345 | "execution_count": null, 1346 | "metadata": {}, 1347 | "outputs": [], 1348 | "source": [] 1349 | } 1350 | ], 1351 | "metadata": { 1352 | "kernelspec": { 1353 | "display_name": "Python 3", 1354 | "language": "python", 1355 | "name": "python3" 1356 | }, 1357 | "language_info": { 1358 | "codemirror_mode": { 1359 | "name": "ipython", 1360 | "version": 3 1361 | }, 1362 | "file_extension": ".py", 1363 | "mimetype": "text/x-python", 1364 | "name": "python", 1365 | "nbconvert_exporter": "python", 1366 | "pygments_lexer": "ipython3", 1367 | "version": "3.7.4" 1368 | } 1369 | }, 1370 | "nbformat": 4, 1371 | "nbformat_minor": 2 1372 | } 1373 | 
-------------------------------------------------------------------------------- /Chapter 3/ReadMe: -------------------------------------------------------------------------------- 1 | 2 | 3 | The third chapter of the book, which covers classification algorithms. All the Python Jupyter notebooks and datasets are committed here. All the code is written in Jupyter Notebook. Happy coding. 4 | -------------------------------------------------------------------------------- /Chapter 4/Chapter4_NLP2.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text", 7 | "id": "uUpQff5qfTNc" 8 | }, 9 | "source": [ 10 | "## Complaint Categorization using Word Embeddings" 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": 1, 16 | "metadata": { 17 | "colab": {}, 18 | "colab_type": "code", 19 | "id": "zhXsYJwq7-Rs" 20 | }, 21 | "outputs": [], 22 | "source": [ 23 | "from nltk.tokenize import RegexpTokenizer\n", 24 | "import numpy as np\n", 25 | "import re" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 2, 31 | "metadata": { 32 | "colab": { 33 | "base_uri": "https://localhost:8080/", 34 | "height": 204 35 | }, 36 | "colab_type": "code", 37 | "executionInfo": { 38 | "elapsed": 7321, 39 | "status": "ok", 40 | "timestamp": 1566387081318, 41 | "user": { 42 | "displayName": "dikshant gupta", 43 | "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mBY-VAEDCe_t9LzKk0g7MBc8rY1qZ-QR-XiCIyRYw=s64", 44 | "userId": "01845807612441668603" 45 | }, 46 | "user_tz": -330 47 | }, 48 | "id": "s_Bu4lfx7-Rz", 49 | "outputId": "97bd61bd-4a05-4365-f935-4127bf06790f" 50 | }, 51 | "outputs": [], 52 | "source": [ 53 | "import pandas as pd\n", 54 | "complaints_dataframe = pd.read_csv('complaints.csv') \n" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 4, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | 
"data": { 64 | "text/html": [ 65 | "
" 116 | ], 117 | "text/plain": [ 118 | " Consumer complaint narrative Product\n", 119 | "0 I have outdated information on my credit repor... Credit reporting\n", 120 | "1 I purchased a new car on XXXX XXXX. The car de... Consumer Loan\n", 121 | "2 An account on my credit report has a mistaken ... Credit reporting\n", 122 | "3 This company refuses to provide me verificatio... Debt collection\n", 123 | "4 This complaint is in regards to Square Two Fin... Debt collection" 124 | ] 125 | }, 126 | "execution_count": 4, 127 | "metadata": {}, 128 | "output_type": "execute_result" 129 | } 130 | ], 131 | "source": [ 132 | "complaints_dataframe.head()" 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": 5, 138 | "metadata": { 139 | "colab": {}, 140 | "colab_type": "code", 141 | "id": "v6H1UTDM7-R8" 142 | }, 143 | "outputs": [], 144 | "source": [ 145 | "def convert_complaint_to_words(comp):\n", 146 | " \n", 147 | " converted_words = RegexpTokenizer('\\w+').tokenize(comp)\n", 148 | " converted_words = [re.sub(r'([xx]+)|([XX]+)|(\\d+)', '', w).lower() for w in converted_words]\n", 149 | " converted_words = list(filter(lambda a: a != '', converted_words))\n", 150 | " \n", 151 | " return converted_words" 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": 6, 157 | "metadata": { 158 | "colab": {}, 159 | "colab_type": "code", 160 | "id": "0RgXmo-N7-SC" 161 | }, 162 | "outputs": [], 163 | "source": [ 164 | "all_words = list()\n", 165 | "for comp in complaints_dataframe['Consumer complaint narrative']:\n", 166 | " for w in convert_complaint_to_words(comp):\n", 167 | " all_words.append(w)" 168 | ] 169 | }, 170 | { 171 | "cell_type": "code", 172 | "execution_count": 7, 173 | "metadata": { 174 | "colab": { 175 | "base_uri": "https://localhost:8080/", 176 | "height": 34 177 | }, 178 | "colab_type": "code", 179 | "executionInfo": { 180 | "elapsed": 80284, 181 | "status": "ok", 182 | "timestamp": 1566387158514, 183 | "user": { 184 | 
"displayName": "dikshant gupta", 185 | "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mBY-VAEDCe_t9LzKk0g7MBc8rY1qZ-QR-XiCIyRYw=s64", 186 | "userId": "01845807612441668603" 187 | }, 188 | "user_tz": -330 189 | }, 190 | "id": "8T_RNzwy7-SF", 191 | "outputId": "cd6c76a2-42d6-43aa-877f-f399f9799130" 192 | }, 193 | "outputs": [ 194 | { 195 | "name": "stdout", 196 | "output_type": "stream", 197 | "text": [ 198 | "Size of vocabulary: 76908\n" 199 | ] 200 | } 201 | ], 202 | "source": [ 203 | "print('Size of vocabulary is {}'.format(len(set(all_words))))" 204 | ] 205 | }, 206 | { 207 | "cell_type": "code", 208 | "execution_count": 9, 209 | "metadata": { 210 | "colab": { 211 | "base_uri": "https://localhost:8080/", 212 | "height": 190 213 | }, 214 | "colab_type": "code", 215 | "executionInfo": { 216 | "elapsed": 79440, 217 | "status": "ok", 218 | "timestamp": 1566387158515, 219 | "user": { 220 | "displayName": "dikshant gupta", 221 | "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mBY-VAEDCe_t9LzKk0g7MBc8rY1qZ-QR-XiCIyRYw=s64", 222 | "userId": "01845807612441668603" 223 | }, 224 | "user_tz": -330 225 | }, 226 | "id": "Dbh2Y10y7-SL", 227 | "outputId": "da08bf30-04f9-4ec1-c798-0f22a4c44cb9" 228 | }, 229 | "outputs": [ 230 | { 231 | "name": "stdout", 232 | "output_type": "stream", 233 | "text": [ 234 | "Complaint is \n", 235 | " Without provocation, I received notice that my credit line was being decreased by nearly 100 %. My available credit was reduced from $ XXXX to XXXX ( the rough amount of my available balance ). \n", 236 | "\n", 237 | "When I called to question the change, I was provided a nob-descript response referencing my XXXX report. It was my understanding that under the FCRA I was entitled to a copy of this report, but was refused by Citi and have been given no further explanation. 
\n", 238 | "\n", 239 | "This is predatory in that it affects my utilization of credit, further subjecting me to increase in APrs, etc and a higher cost of credit without any reason. \n", 240 | "\n", 241 | "Tokens are\n", 242 | " ['without', 'provocation', 'i', 'received', 'notice', 'that', 'my', 'credit', 'line', 'was', 'being', 'decreased', 'by', 'nearly', 'my', 'available', 'credit', 'was', 'reduced', 'from', 'to', 'the', 'rough', 'amount', 'of', 'my', 'available', 'balance', 'when', 'i', 'called', 'to', 'question', 'the', 'change', 'i', 'was', 'provided', 'a', 'nob', 'descript', 'response', 'referencing', 'my', 'report', 'it', 'was', 'my', 'understanding', 'that', 'under', 'the', 'fcra', 'i', 'was', 'entitled', 'to', 'a', 'copy', 'of', 'this', 'report', 'but', 'was', 'refused', 'by', 'citi', 'and', 'have', 'been', 'given', 'no', 'further', 'eplanation', 'this', 'is', 'predatory', 'in', 'that', 'it', 'affects', 'my', 'utilization', 'of', 'credit', 'further', 'subjecting', 'me', 'to', 'increase', 'in', 'aprs', 'etc', 'and', 'a', 'higher', 'cost', 'of', 'credit', 'without', 'any', 'reason']\n" 243 | ] 244 | } 245 | ], 246 | "source": [ 247 | "print('Complaint is \\n', complaints_dataframe['Consumer complaint narrative'][10], '\\n')\n", 248 | "print('Tokens are\\n', convert_complaint_to_words(complaints_dataframe['Consumer complaint narrative'][10]))" 249 | ] 250 | }, 251 | { 252 | "cell_type": "markdown", 253 | "metadata": { 254 | "colab_type": "text", 255 | "id": "YHi4vCGX7-SU" 256 | }, 257 | "source": [ 258 | "### Indexing\n" 259 | ] 260 | }, 261 | { 262 | "cell_type": "code", 263 | "execution_count": 10, 264 | "metadata": { 265 | "colab": {}, 266 | "colab_type": "code", 267 | "id": "-ClRX_y07-SW" 268 | }, 269 | "outputs": [], 270 | "source": [ 271 | "index_dictionary = dict()\n", 272 | "count = 1\n", 273 | "index_dictionary[''] = 0\n", 274 | "for word in set(all_words):\n", 275 | " index_dictionary[word] = count\n", 276 | " count += 1" 277 | ] 278 | }, 279 | { 
280 | "cell_type": "markdown", 281 | "metadata": { 282 | "colab_type": "text", 283 | "id": "vv8dIbF47-Sa" 284 | }, 285 | "source": [ 286 | "### Dataset" 287 | ] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "execution_count": 11, 292 | "metadata": { 293 | "colab": {}, 294 | "colab_type": "code", 295 | "id": "u1OJzln_7-Sb" 296 | }, 297 | "outputs": [], 298 | "source": [ 299 | "embeddings_index = {}\n", 300 | "# Load the pre-trained GloVe vectors: each line holds a word followed by its 300 coefficients\n", 301 | "with open('glove.6B.300d.txt', encoding='utf-8') as f:\n", 302 | "    for line in f:\n", 303 | "        values = line.split()\n", 304 | "        word = values[0]\n", 305 | "        coefs = np.asarray(values[1:], dtype='float32')\n", 306 | "        embeddings_index[word] = coefs" 307 | ] 308 | }, 309 | { 310 | "cell_type": "markdown", 311 | "metadata": { 312 | "colab_type": "text", 313 | "id": "s--9I9d5msCp" 314 | }, 315 | "source": [ 316 | "#### Taking the average of all word embeddings in a sentence to generate the sentence representation." 317 | ] 318 | }, 319 | { 320 | "cell_type": "code", 321 | "execution_count": 13, 322 | "metadata": { 323 | "colab": {}, 324 | "colab_type": "code", 325 | "id": "aN13GyDe7-Sd" 326 | }, 327 | "outputs": [], 328 | "source": [ 329 | "complaints_list = list()\n", 330 | "for comp in complaints_dataframe['Consumer complaint narrative']:\n", 331 | "    sentence = np.zeros(300)\n", 332 | "    count = 0\n", 333 | "    for w in convert_complaint_to_words(comp):\n", 334 | "        try:\n", 335 | "            sentence += embeddings_index[w]\n", 336 | "            count += 1\n", 337 | "        except KeyError:\n", 338 | "            continue\n", 339 | "    complaints_list.append(sentence / max(count, 1))  # guard against division by zero when no token has an embedding" 340 | ] 341 | }, 342 | { 343 | "cell_type": "markdown", 344 | "metadata": { 345 | "colab_type": "text", 346 | "id": "OYZ703hVm5Cg" 347 | }, 348 | "source": [ 349 | "#### Converting categorical labels to numerical format and further one hot encoding on the numerical labels." 
350 | ] 351 | }, 352 | { 353 | "cell_type": "code", 354 | "execution_count": 14, 355 | "metadata": { 356 | "colab": { 357 | "base_uri": "https://localhost:8080/", 358 | "height": 204 359 | }, 360 | "colab_type": "code", 361 | "executionInfo": { 362 | "elapsed": 1796, 363 | "status": "ok", 364 | "timestamp": 1566387500223, 365 | "user": { 366 | "displayName": "dikshant gupta", 367 | "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mBY-VAEDCe_t9LzKk0g7MBc8rY1qZ-QR-XiCIyRYw=s64", 368 | "userId": "01845807612441668603" 369 | }, 370 | "user_tz": -330 371 | }, 372 | "id": "EoR79rtZ7-Sr", 373 | "outputId": "704c6e1e-dc8e-4d42-851d-0e3724e60fd6" 374 | }, 375 | "outputs": [ 376 | { 377 | "data": { 378 | "text/html": [ 379 | "
" 436 | ], 437 | "text/plain": [ 438 | " Consumer complaint narrative Product Target\n", 439 | "0 I have outdated information on my credit repor... Credit reporting 5\n", 440 | "1 I purchased a new car on XXXX XXXX. The car de... Consumer Loan 2\n", 441 | "2 An account on my credit report has a mistaken ... Credit reporting 5\n", 442 | "3 This company refuses to provide me verificatio... Debt collection 7\n", 443 | "4 This complaint is in regards to Square Two Fin... Debt collection 7" 444 | ] 445 | }, 446 | "execution_count": 14, 447 | "metadata": {}, 448 | "output_type": "execute_result" 449 | } 450 | ], 451 | "source": [ 452 | "from sklearn import preprocessing\n", 453 | "le = preprocessing.LabelEncoder()\n", 454 | "le.fit(complaints_dataframe['Product'])\n", 455 | "complaints_dataframe['Target'] = le.transform(complaints_dataframe['Product'])\n", 456 | "complaints_dataframe.head()" 457 | ] 458 | }, 459 | { 460 | "cell_type": "markdown", 461 | "metadata": { 462 | "colab_type": "text", 463 | "id": "atXHKYN27-S0" 464 | }, 465 | "source": [ 466 | "### One hot Encoding" 467 | ] 468 | }, 469 | { 470 | "cell_type": "code", 471 | "execution_count": 15, 472 | "metadata": { 473 | "colab": {}, 474 | "colab_type": "code", 475 | "id": "_RwGbO_L7-S4" 476 | }, 477 | "outputs": [], 478 | "source": [ 479 | "from sklearn.model_selection import train_test_split\n", 480 | "X_train, X_test, y_train, y_test = train_test_split(np.array(complaints_list), complaints_dataframe.Target.values, \n", 481 | " test_size=0.15, random_state=0)" 482 | ] 483 | }, 484 | { 485 | "cell_type": "code", 486 | "execution_count": 16, 487 | "metadata": { 488 | "colab": { 489 | "base_uri": "https://localhost:8080/", 490 | "height": 34 491 | }, 492 | "colab_type": "code", 493 | "executionInfo": { 494 | "elapsed": 1623, 495 | "status": "ok", 496 | "timestamp": 1566387605397, 497 | "user": { 498 | "displayName": "dikshant gupta", 499 | "photoUrl": 
"https://lh3.googleusercontent.com/a-/AAuE7mBY-VAEDCe_t9LzKk0g7MBc8rY1qZ-QR-XiCIyRYw=s64", 500 | "userId": "01845807612441668603" 501 | }, 502 | "user_tz": -330 503 | }, 504 | "id": "K7XHJpLc7-S7", 505 | "outputId": "2e39d372-d636-467b-cadc-4da71065e2a6" 506 | }, 507 | "outputs": [ 508 | { 509 | "name": "stdout", 510 | "output_type": "stream", 511 | "text": [ 512 | "(152809, 300)\n" 513 | ] 514 | } 515 | ], 516 | "source": [ 517 | "print(X_train.shape)" 518 | ] 519 | }, 520 | { 521 | "cell_type": "code", 522 | "execution_count": 18, 523 | "metadata": { 524 | "colab": { 525 | "base_uri": "https://localhost:8080/", 526 | "height": 34 527 | }, 528 | "colab_type": "code", 529 | "executionInfo": { 530 | "elapsed": 1388, 531 | "status": "ok", 532 | "timestamp": 1566387619059, 533 | "user": { 534 | "displayName": "dikshant gupta", 535 | "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mBY-VAEDCe_t9LzKk0g7MBc8rY1qZ-QR-XiCIyRYw=s64", 536 | "userId": "01845807612441668603" 537 | }, 538 | "user_tz": -330 539 | }, 540 | "id": "ob2Az_Oq-h0x", 541 | "outputId": "60606497-6f8c-4205-97bb-2deb2c55dbdd" 542 | }, 543 | "outputs": [ 544 | { 545 | "name": "stdout", 546 | "output_type": "stream", 547 | "text": [ 548 | "(152809,)\n" 549 | ] 550 | } 551 | ], 552 | "source": [ 553 | "print(y_train.shape)" 554 | ] 555 | }, 556 | { 557 | "cell_type": "markdown", 558 | "metadata": { 559 | "colab_type": "text", 560 | "id": "sw7pp-WinSI5" 561 | }, 562 | "source": [ 563 | "#### Training and testing the classifier" 564 | ] 565 | }, 566 | { 567 | "cell_type": "code", 568 | "execution_count": 19, 569 | "metadata": { 570 | "colab": { 571 | "base_uri": "https://localhost:8080/", 572 | "height": 34 573 | }, 574 | "colab_type": "code", 575 | "executionInfo": { 576 | "elapsed": 3057, 577 | "status": "ok", 578 | "timestamp": 1566387636476, 579 | "user": { 580 | "displayName": "dikshant gupta", 581 | "photoUrl": 
"https://lh3.googleusercontent.com/a-/AAuE7mBY-VAEDCe_t9LzKk0g7MBc8rY1qZ-QR-XiCIyRYw=s64", 582 | "userId": "01845807612441668603" 583 | }, 584 | "user_tz": -330 585 | }, 586 | "id": "--kThiN07-S_", 587 | "outputId": "cc3e1cc6-3c65-4836-b096-4271a7dda6b5" 588 | }, 589 | "outputs": [ 590 | { 591 | "name": "stdout", 592 | "output_type": "stream", 593 | "text": [ 594 | "0.4839618793340008\n" 595 | ] 596 | } 597 | ], 598 | "source": [ 599 | "from sklearn.naive_bayes import BernoulliNB\n", 600 | "from sklearn.metrics import accuracy_score\n", 601 | "clf = BernoulliNB()\n", 602 | "clf.fit(X_train, y_train)\n", 603 | "pred = clf.predict(X_test)\n", 604 | "print(accuracy_score(y_test, pred))" 605 | ] 606 | }, 607 | { 608 | "cell_type": "code", 609 | "execution_count": 20, 610 | "metadata": {}, 611 | "outputs": [], 612 | "source": [ 613 | "from sklearn.tree import DecisionTreeClassifier" 614 | ] 615 | }, 616 | { 617 | "cell_type": "code", 618 | "execution_count": 21, 619 | "metadata": {}, 620 | "outputs": [ 621 | { 622 | "data": { 623 | "text/plain": [ 624 | "DecisionTreeClassifier()" 625 | ] 626 | }, 627 | "execution_count": 21, 628 | "metadata": {}, 629 | "output_type": "execute_result" 630 | } 631 | ], 632 | "source": [ 633 | "dt_classifier = DecisionTreeClassifier() \n", 634 | "dt_classifier.fit(X_train, y_train) " 635 | ] 636 | }, 637 | { 638 | "cell_type": "code", 639 | "execution_count": null, 640 | "metadata": {}, 641 | "outputs": [], 642 | "source": [] 643 | }, 644 | { 645 | "cell_type": "code", 646 | "execution_count": 22, 647 | "metadata": {}, 648 | "outputs": [ 649 | { 650 | "name": "stdout", 651 | "output_type": "stream", 652 | "text": [ 653 | "0.4839618793340008\n" 654 | ] 655 | } 656 | ], 657 | "source": [ 658 | "print(accuracy_score(y_test, pred))" 659 | ] 660 | } 661 | ], 662 | "metadata": { 663 | "anaconda-cloud": {}, 664 | "colab": { 665 | "name": "case_study.ipynb", 666 | "provenance": [], 667 | "toc_visible": true 668 | }, 669 | "kernelspec": { 670 | 
"display_name": "Python 3", 671 | "language": "python", 672 | "name": "python3" 673 | }, 674 | "language_info": { 675 | "codemirror_mode": { 676 | "name": "ipython", 677 | "version": 3 678 | }, 679 | "file_extension": ".py", 680 | "mimetype": "text/x-python", 681 | "name": "python", 682 | "nbconvert_exporter": "python", 683 | "pygments_lexer": "ipython3", 684 | "version": "3.7.4" 685 | } 686 | }, 687 | "nbformat": 4, 688 | "nbformat_minor": 1 689 | } 690 | -------------------------------------------------------------------------------- /Chapter 4/Chpater4_NLP1.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text", 7 | "id": "F5RaUZ0iXBaB" 8 | }, 9 | "source": [ 10 | "## Complaint Categorization Baseline Model\n", 11 | "\n", 12 | "Fast and efficient handling of complaints on consumer forums is vital to commerce industry today. This notebook presents a baseline approach towards solving this problem. Consumer complaints on financial products is taken as the dataset to establish results.\n", 13 | "\n", 14 | "Tf-idf (term frequency times inverse document frequency) scheme to weight individual tokens is often used in information retrieval. 
One advantage of tf-idf is that it reduces the impact of tokens that occur very frequently and therefore carry little or no information.\n", 15 | "The tf-idf of term 't' in document 'd' is tf-idf(d, t) = tf(t) * idf(d, t), where tf(t) is the number of times t occurs in d and idf is given by idf(d, t) = log [(1 + n) / (1 + df(d, t))] + 1, where n is the total number of documents and df(d, t) is the number of documents that contain t." 16 | ] 17 | }, 18 | { 19 | "cell_type": "code", 20 | "execution_count": 1, 21 | "metadata": { 22 | "colab": {}, 23 | "colab_type": "code", 24 | "id": "KKU3Av-XXBaD" 25 | }, 26 | "outputs": [], 27 | "source": [ 28 | "from sklearn.feature_extraction.text import TfidfVectorizer\n", 29 | "from sklearn.model_selection import train_test_split\n", 30 | "\n", 31 | "# Importing pandas for operating on dataset\n", 32 | "import pandas as pd\n", 33 | "\n", 34 | "complaints_df = pd.read_csv('complaints.csv')" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": { 40 | "colab_type": "text", 41 | "id": "COwaeZO2XBaG" 42 | }, 43 | "source": [ 44 | "### Typical Complaint" 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": 2, 50 | "metadata": { 51 | "colab": {}, 52 | "colab_type": "code", 53 | "id": "jw_3jqF5XBaH", 54 | "outputId": "c3001684-0236-447d-ff64-18f382d01112" 55 | }, 56 | "outputs": [ 57 | { 58 | "data": { 59 | "text/plain": [ 60 | "\"I purchased a new car on XXXX XXXX. The car dealer called Citizens Bank to get a 10 day payoff on my loan, good till XXXX XXXX. The dealer sent the check the next day. When I balanced my checkbook on XXXX XXXX. I noticed that Citizens bank had taken the automatic payment out of my checking account at XXXX XXXX XXXX Bank. I called Citizens and they stated that they did not close the loan until XXXX XXXX. ( stating that they did not receive the check until XXXX. XXXX. ). I told them that I did not believe that the check took that long to arrive. XXXX told me a check was issued to me for the amount overpaid, they deducted additional interest. 
Today ( XXXX XXXX, ) I called Citizens Bank again and talked to a supervisor named XXXX, because on XXXX XXXX. I received a letter that the loan had been paid in full ( dated XXXX, XXXX ) but no refund check was included. XXXX stated that they hold any over payment for 10 business days after the loan was satisfied and that my check would be mailed out on Wed. the XX/XX/XXXX.. I questioned her about the delay in posting the dealer payment and she first stated that sometimes it takes 3 or 4 business days to post, then she said they did not receive the check till XXXX XXXX I again told her that I did not believe this and asked where is my money. She then stated that they hold the over payment for 10 business days. I asked her why, and she simply said that is their policy. I asked her if I would receive interest on my money and she stated no. I believe that Citizens bank is deliberately delaying the posting of payment and the return of consumer 's money to make additional interest for the bank. If this is not illegal it should be, it does hurt the consumer and is not ethical. My amount of money lost is minimal but if they are doing this on thousands of car loans a month, then the additional interest earned for them could be staggering. 
I still have another car loan from Citizens Bank and I am afraid when I trade that car in another year I will run into the same problem again.\"" 61 | ] 62 | }, 63 | "execution_count": 2, 64 | "metadata": {}, 65 | "output_type": "execute_result" 66 | } 67 | ], 68 | "source": [ 69 | "complaints_df['Consumer complaint narrative'][1]" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": { 75 | "colab_type": "text", 76 | "id": "4QXHqmFxXBaJ" 77 | }, 78 | "source": [ 79 | "### Categories" 80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "execution_count": 3, 85 | "metadata": { 86 | "colab": {}, 87 | "colab_type": "code", 88 | "id": "5xIzAd7SXBaK", 89 | "outputId": "a7bc8e90-6365-4c82-c245-6f7d8d422630" 90 | }, 91 | "outputs": [ 92 | { 93 | "name": "stdout", 94 | "output_type": "stream", 95 | "text": [ 96 | "['Credit reporting' 'Consumer Loan' 'Debt collection' 'Mortgage'\n", 97 | " 'Credit card' 'Other financial service' 'Bank account or service'\n", 98 | " 'Student loan' 'Money transfers' 'Payday loan' 'Prepaid card'\n", 99 | " 'Virtual currency'\n", 100 | " 'Credit reporting, credit repair services, or other personal consumer reports'\n", 101 | " 'Credit card or prepaid card' 'Checking or savings account'\n", 102 | " 'Payday loan, title loan, or personal loan'\n", 103 | " 'Money transfer, virtual currency, or money service'\n", 104 | " 'Vehicle loan or lease']\n" 105 | ] 106 | } 107 | ], 108 | "source": [ 109 | "print(complaints_df.Product.unique())" 110 | ] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": { 115 | "colab_type": "text", 116 | "id": "cwS4qyhGXBaM" 117 | }, 118 | "source": [ 119 | "### Train-test split\n", 120 | "15% of the total data is held out as validation data and the remaining 85% is used for training. This yields 152809 training instances and 26967 validation instances." 
121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": 4, 126 | "metadata": { 127 | "colab": {}, 128 | "colab_type": "code", 129 | "id": "RcHsVb4GXBaN", 130 | "outputId": "f3ba9442-057d-467a-f518-e6ff0ddc70b9" 131 | }, 132 | "outputs": [ 133 | { 134 | "name": "stdout", 135 | "output_type": "stream", 136 | "text": [ 137 | "Training utterances: 152809\n", 138 | "Validation utterances: 26967\n" 139 | ] 140 | } 141 | ], 142 | "source": [ 143 | "X_train, X_test, y_train, y_test = train_test_split(\n", 144 | " complaints_df['Consumer complaint narrative'].values, complaints_df['Product'].values, \n", 145 | " test_size=0.15, random_state=0)\n", 146 | "print('Training utterances: {}'.format(X_train.shape[0]))\n", 147 | "print('Validation utterances: {}'.format(X_test.shape[0]))" 148 | ] 149 | }, 150 | { 151 | "cell_type": "markdown", 152 | "metadata": { 153 | "colab_type": "text", 154 | "id": "AxKJJZn8XBaP" 155 | }, 156 | "source": [ 157 | "### Calculating tf-idf scores\n", 158 | "Fit the vectorizer on the training utterances to learn the vocabulary and idf weights, then transform each utterance in the dataset into a sparse vector of tf-idf scores, one entry per unique token." 
159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": 5, 164 | "metadata": { 165 | "colab": {}, 166 | "colab_type": "code", 167 | "id": "Ut2qdu8HXBaP", 168 | "outputId": "f1a62af5-5b7e-44e6-f26c-6b9cfae12525" 169 | }, 170 | "outputs": [ 171 | { 172 | "data": { 173 | "text/plain": [ 174 | "TfidfVectorizer()" 175 | ] 176 | }, 177 | "execution_count": 5, 178 | "metadata": {}, 179 | "output_type": "execute_result" 180 | } 181 | ], 182 | "source": [ 183 | "vectorizer = TfidfVectorizer()\n", 184 | "vectorizer.fit(X_train)" 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": 6, 190 | "metadata": { 191 | "colab": {}, 192 | "colab_type": "code", 193 | "id": "luNy0LXQXBaR", 194 | "outputId": "2ddb2395-44ea-4ba5-f190-e3e154a8c3e9" 195 | }, 196 | "outputs": [ 197 | { 198 | "data": { 199 | "text/plain": [ 200 | "(<152809x76350 sparse matrix of type ''\n", 201 | " \twith 13864799 stored elements in Compressed Sparse Row format>,\n", 202 | " <26967x76350 sparse matrix of type ''\n", 203 | " \twith 2447784 stored elements in Compressed Sparse Row format>)" 204 | ] 205 | }, 206 | "execution_count": 6, 207 | "metadata": {}, 208 | "output_type": "execute_result" 209 | } 210 | ], 211 | "source": [ 212 | "X_train = vectorizer.transform(X_train)\n", 213 | "X_test = vectorizer.transform(X_test)\n", 214 | "X_train, X_test" 215 | ] 216 | }, 217 | { 218 | "cell_type": "markdown", 219 | "metadata": { 220 | "colab_type": "text", 221 | "id": "XnG6cE6GXBaT" 222 | }, 223 | "source": [ 224 | "### Feature Selection" 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": 7, 230 | "metadata": { 231 | "colab": {}, 232 | "colab_type": "code", 233 | "id": "9eZdPs2fXBaU", 234 | "outputId": "2db6881f-b2bb-49c5-eb02-f8c2765a2295" 235 | }, 236 | "outputs": [ 237 | { 238 | "data": { 239 | "text/plain": [ 240 | "(<152809x5000 sparse matrix of type ''\n", 241 | " \twith 10780400 stored elements in Compressed Sparse Row format>,\n", 242 | " 
<26967x5000 sparse matrix of type ''\n", 243 | " \twith 1907878 stored elements in Compressed Sparse Row format>)" 244 | ] 245 | }, 246 | "execution_count": 7, 247 | "metadata": {}, 248 | "output_type": "execute_result" 249 | } 250 | ], 251 | "source": [ 252 | "from sklearn.feature_selection import SelectKBest, chi2\n", 253 | "\n", 254 | "ch2 = SelectKBest(chi2, k=5000)\n", 255 | "X_train = ch2.fit_transform(X_train, y_train)\n", 256 | "X_test = ch2.transform(X_test)\n", 257 | "\n", 258 | "X_train, X_test" 259 | ] 260 | }, 261 | { 262 | "cell_type": "markdown", 263 | "metadata": { 264 | "colab_type": "text", 265 | "id": "Qez31NMtXBaW" 266 | }, 267 | "source": [ 268 | "### Naive Bayes" 269 | ] 270 | }, 271 | { 272 | "cell_type": "code", 273 | "execution_count": 8, 274 | "metadata": { 275 | "colab": {}, 276 | "colab_type": "code", 277 | "id": "VI4oeUxsXBaW", 278 | "outputId": "1a323811-5f4a-4cc8-a5ee-b0d94eb3f86c" 279 | }, 280 | "outputs": [ 281 | { 282 | "name": "stdout", 283 | "output_type": "stream", 284 | "text": [ 285 | "0.7656024029369229\n" 286 | ] 287 | } 288 | ], 289 | "source": [ 290 | "from sklearn.naive_bayes import MultinomialNB\n", 291 | "from sklearn.metrics import accuracy_score\n", 292 | "clf = MultinomialNB()\n", 293 | "clf.fit(X_train, y_train)\n", 294 | "pred = clf.predict(X_test)\n", 295 | "print(accuracy_score(y_test, pred))" 296 | ] 297 | }, 298 | { 299 | "cell_type": "code", 300 | "execution_count": 0, 301 | "metadata": { 302 | "colab": {}, 303 | "colab_type": "code", 304 | "id": "wINTcuMGXBaY" 305 | }, 306 | "outputs": [], 307 | "source": [] 308 | } 309 | ], 310 | "metadata": { 311 | "colab": { 312 | "name": "complaint_classification_case_study.ipynb", 313 | "provenance": [] 314 | }, 315 | "kernelspec": { 316 | "display_name": "Python 3", 317 | "language": "python", 318 | "name": "python3" 319 | }, 320 | "language_info": { 321 | "codemirror_mode": { 322 | "name": "ipython", 323 | "version": 3 324 | }, 325 | "file_extension": ".py", 326 | 
"mimetype": "text/x-python", 327 | "name": "python", 328 | "nbconvert_exporter": "python", 329 | "pygments_lexer": "ipython3", 330 | "version": "3.7.4" 331 | } 332 | }, 333 | "nbformat": 4, 334 | "nbformat_minor": 1 335 | } 336 | -------------------------------------------------------------------------------- /Chapter 4/ReadMe: -------------------------------------------------------------------------------- 1 | 2 | This repository contains the code and the datasets for the Chapter 4 of the book. This chapter has two additional datasets which are more than 100 MB in size and hence they have been uploaded at a Google drive. The address of the datasets is given below. Happy coding. 3 | The link of the dataset is https://drive.google.com/drive/folders/1W0RyG3_1aadOIA2ZwQEOWkj8dzBzcobg?usp=sharing 4 | -------------------------------------------------------------------------------- /Chapter 4/bc2.csv: -------------------------------------------------------------------------------- 1 | ID,ClumpThickness,Cell Size,Cell Shape,Marginal Adhesion,Single Epithelial Cell Size,Bare Nuclei,Normal Nucleoli,Bland Chromatin,Mitoses,Class 2 | 1000025,5,1,1,1,2,1,3,1,1,2 3 | 1002945,5,4,4,5,7,10,3,2,1,2 4 | 1015425,3,1,1,1,2,2,3,1,1,2 5 | 1016277,6,8,8,1,3,4,3,7,1,2 6 | 1017023,4,1,1,3,2,1,3,1,1,2 7 | 1017122,8,10,10,8,7,10,9,7,1,4 8 | 1018099,1,1,1,1,2,10,3,1,1,2 9 | 1018561,2,1,2,1,2,1,3,1,1,2 10 | 1033078,2,1,1,1,2,1,1,1,5,2 11 | 1033078,4,2,1,1,2,1,2,1,1,2 12 | 1035283,1,1,1,1,1,1,3,1,1,2 13 | 1036172,2,1,1,1,2,1,2,1,1,2 14 | 1041801,5,3,3,3,2,3,4,4,1,4 15 | 1043999,1,1,1,1,2,3,3,1,1,2 16 | 1044572,8,7,5,10,7,9,5,5,4,4 17 | 1047630,7,4,6,4,6,1,4,3,1,4 18 | 1048672,4,1,1,1,2,1,2,1,1,2 19 | 1049815,4,1,1,1,2,1,3,1,1,2 20 | 1050670,10,7,7,6,4,10,4,1,2,4 21 | 1050718,6,1,1,1,2,1,3,1,1,2 22 | 1054590,7,3,2,10,5,10,5,4,4,4 23 | 1054593,10,5,5,3,6,7,7,10,1,4 24 | 1056784,3,1,1,1,2,1,2,1,1,2 25 | 1057013,8,4,5,1,2,?,7,3,1,4 26 | 1059552,1,1,1,1,2,1,3,1,1,2 27 | 1065726,5,2,3,4,2,7,3,6,1,4 
28 | 1066373,3,2,1,1,1,1,2,1,1,2 29 | 1066979,5,1,1,1,2,1,2,1,1,2 30 | 1067444,2,1,1,1,2,1,2,1,1,2 31 | 1070935,1,1,3,1,2,1,1,1,1,2 32 | 1070935,3,1,1,1,1,1,2,1,1,2 33 | 1071760,2,1,1,1,2,1,3,1,1,2 34 | 1072179,10,7,7,3,8,5,7,4,3,4 35 | 1074610,2,1,1,2,2,1,3,1,1,2 36 | 1075123,3,1,2,1,2,1,2,1,1,2 37 | 1079304,2,1,1,1,2,1,2,1,1,2 38 | 1080185,10,10,10,8,6,1,8,9,1,4 39 | 1081791,6,2,1,1,1,1,7,1,1,2 40 | 1084584,5,4,4,9,2,10,5,6,1,4 41 | 1091262,2,5,3,3,6,7,7,5,1,4 42 | 1096800,6,6,6,9,6,?,7,8,1,2 43 | 1099510,10,4,3,1,3,3,6,5,2,4 44 | 1100524,6,10,10,2,8,10,7,3,3,4 45 | 1102573,5,6,5,6,10,1,3,1,1,4 46 | 1103608,10,10,10,4,8,1,8,10,1,4 47 | 1103722,1,1,1,1,2,1,2,1,2,2 48 | 1105257,3,7,7,4,4,9,4,8,1,4 49 | 1105524,1,1,1,1,2,1,2,1,1,2 50 | 1106095,4,1,1,3,2,1,3,1,1,2 51 | 1106829,7,8,7,2,4,8,3,8,2,4 52 | 1108370,9,5,8,1,2,3,2,1,5,4 53 | 1108449,5,3,3,4,2,4,3,4,1,4 54 | 1110102,10,3,6,2,3,5,4,10,2,4 55 | 1110503,5,5,5,8,10,8,7,3,7,4 56 | 1110524,10,5,5,6,8,8,7,1,1,4 57 | 1111249,10,6,6,3,4,5,3,6,1,4 58 | 1112209,8,10,10,1,3,6,3,9,1,4 59 | 1113038,8,2,4,1,5,1,5,4,4,4 60 | 1113483,5,2,3,1,6,10,5,1,1,4 61 | 1113906,9,5,5,2,2,2,5,1,1,4 62 | 1115282,5,3,5,5,3,3,4,10,1,4 63 | 1115293,1,1,1,1,2,2,2,1,1,2 64 | 1116116,9,10,10,1,10,8,3,3,1,4 65 | 1116132,6,3,4,1,5,2,3,9,1,4 66 | 1116192,1,1,1,1,2,1,2,1,1,2 67 | 1116998,10,4,2,1,3,2,4,3,10,4 68 | 1117152,4,1,1,1,2,1,3,1,1,2 69 | 1118039,5,3,4,1,8,10,4,9,1,4 70 | 1120559,8,3,8,3,4,9,8,9,8,4 71 | 1121732,1,1,1,1,2,1,3,2,1,2 72 | 1121919,5,1,3,1,2,1,2,1,1,2 73 | 1123061,6,10,2,8,10,2,7,8,10,4 74 | 1124651,1,3,3,2,2,1,7,2,1,2 75 | 1125035,9,4,5,10,6,10,4,8,1,4 76 | 1126417,10,6,4,1,3,4,3,2,3,4 77 | 1131294,1,1,2,1,2,2,4,2,1,2 78 | 1132347,1,1,4,1,2,1,2,1,1,2 79 | 1133041,5,3,1,2,2,1,2,1,1,2 80 | 1133136,3,1,1,1,2,3,3,1,1,2 81 | 1136142,2,1,1,1,3,1,2,1,1,2 82 | 1137156,2,2,2,1,1,1,7,1,1,2 83 | 1143978,4,1,1,2,2,1,2,1,1,2 84 | 1143978,5,2,1,1,2,1,3,1,1,2 85 | 1147044,3,1,1,1,2,2,7,1,1,2 86 | 1147699,3,5,7,8,8,9,7,10,7,4 87 | 
1147748,5,10,6,1,10,4,4,10,10,4 88 | 1148278,3,3,6,4,5,8,4,4,1,4 89 | 1148873,3,6,6,6,5,10,6,8,3,4 90 | 1152331,4,1,1,1,2,1,3,1,1,2 91 | 1155546,2,1,1,2,3,1,2,1,1,2 92 | 1156272,1,1,1,1,2,1,3,1,1,2 93 | 1156948,3,1,1,2,2,1,1,1,1,2 94 | 1157734,4,1,1,1,2,1,3,1,1,2 95 | 1158247,1,1,1,1,2,1,2,1,1,2 96 | 1160476,2,1,1,1,2,1,3,1,1,2 97 | 1164066,1,1,1,1,2,1,3,1,1,2 98 | 1165297,2,1,1,2,2,1,1,1,1,2 99 | 1165790,5,1,1,1,2,1,3,1,1,2 100 | 1165926,9,6,9,2,10,6,2,9,10,4 101 | 1166630,7,5,6,10,5,10,7,9,4,4 102 | 1166654,10,3,5,1,10,5,3,10,2,4 103 | 1167439,2,3,4,4,2,5,2,5,1,4 104 | 1167471,4,1,2,1,2,1,3,1,1,2 105 | 1168359,8,2,3,1,6,3,7,1,1,4 106 | 1168736,10,10,10,10,10,1,8,8,8,4 107 | 1169049,7,3,4,4,3,3,3,2,7,4 108 | 1170419,10,10,10,8,2,10,4,1,1,4 109 | 1170420,1,6,8,10,8,10,5,7,1,4 110 | 1171710,1,1,1,1,2,1,2,3,1,2 111 | 1171710,6,5,4,4,3,9,7,8,3,4 112 | 1171795,1,3,1,2,2,2,5,3,2,2 113 | 1171845,8,6,4,3,5,9,3,1,1,4 114 | 1172152,10,3,3,10,2,10,7,3,3,4 115 | 1173216,10,10,10,3,10,8,8,1,1,4 116 | 1173235,3,3,2,1,2,3,3,1,1,2 117 | 1173347,1,1,1,1,2,5,1,1,1,2 118 | 1173347,8,3,3,1,2,2,3,2,1,2 119 | 1173509,4,5,5,10,4,10,7,5,8,4 120 | 1173514,1,1,1,1,4,3,1,1,1,2 121 | 1173681,3,2,1,1,2,2,3,1,1,2 122 | 1174057,1,1,2,2,2,1,3,1,1,2 123 | 1174057,4,2,1,1,2,2,3,1,1,2 124 | 1174131,10,10,10,2,10,10,5,3,3,4 125 | 1174428,5,3,5,1,8,10,5,3,1,4 126 | 1175937,5,4,6,7,9,7,8,10,1,4 127 | 1176406,1,1,1,1,2,1,2,1,1,2 128 | 1176881,7,5,3,7,4,10,7,5,5,4 129 | 1177027,3,1,1,1,2,1,3,1,1,2 130 | 1177399,8,3,5,4,5,10,1,6,2,4 131 | 1177512,1,1,1,1,10,1,1,1,1,2 132 | 1178580,5,1,3,1,2,1,2,1,1,2 133 | 1179818,2,1,1,1,2,1,3,1,1,2 134 | 1180194,5,10,8,10,8,10,3,6,3,4 135 | 1180523,3,1,1,1,2,1,2,2,1,2 136 | 1180831,3,1,1,1,3,1,2,1,1,2 137 | 1181356,5,1,1,1,2,2,3,3,1,2 138 | 1182404,4,1,1,1,2,1,2,1,1,2 139 | 1182410,3,1,1,1,2,1,1,1,1,2 140 | 1183240,4,1,2,1,2,1,2,1,1,2 141 | 1183246,1,1,1,1,1,?,2,1,1,2 142 | 1183516,3,1,1,1,2,1,1,1,1,2 143 | 1183911,2,1,1,1,2,1,1,1,1,2 144 | 1183983,9,5,5,4,4,5,4,3,3,4 
145 | 1184184,1,1,1,1,2,5,1,1,1,2 146 | 1184241,2,1,1,1,2,1,2,1,1,2 147 | 1184840,1,1,3,1,2,?,2,1,1,2 148 | 1185609,3,4,5,2,6,8,4,1,1,4 149 | 1185610,1,1,1,1,3,2,2,1,1,2 150 | 1187457,3,1,1,3,8,1,5,8,1,2 151 | 1187805,8,8,7,4,10,10,7,8,7,4 152 | 1188472,1,1,1,1,1,1,3,1,1,2 153 | 1189266,7,2,4,1,6,10,5,4,3,4 154 | 1189286,10,10,8,6,4,5,8,10,1,4 155 | 1190394,4,1,1,1,2,3,1,1,1,2 156 | 1190485,1,1,1,1,2,1,1,1,1,2 157 | 1192325,5,5,5,6,3,10,3,1,1,4 158 | 1193091,1,2,2,1,2,1,2,1,1,2 159 | 1193210,2,1,1,1,2,1,3,1,1,2 160 | 1193683,1,1,2,1,3,?,1,1,1,2 161 | 1196295,9,9,10,3,6,10,7,10,6,4 162 | 1196915,10,7,7,4,5,10,5,7,2,4 163 | 1197080,4,1,1,1,2,1,3,2,1,2 164 | 1197270,3,1,1,1,2,1,3,1,1,2 165 | 1197440,1,1,1,2,1,3,1,1,7,2 166 | 1197510,5,1,1,1,2,?,3,1,1,2 167 | 1197979,4,1,1,1,2,2,3,2,1,2 168 | 1197993,5,6,7,8,8,10,3,10,3,4 169 | 1198128,10,8,10,10,6,1,3,1,10,4 170 | 1198641,3,1,1,1,2,1,3,1,1,2 171 | 1199219,1,1,1,2,1,1,1,1,1,2 172 | 1199731,3,1,1,1,2,1,1,1,1,2 173 | 1199983,1,1,1,1,2,1,3,1,1,2 174 | 1200772,1,1,1,1,2,1,2,1,1,2 175 | 1200847,6,10,10,10,8,10,10,10,7,4 176 | 1200892,8,6,5,4,3,10,6,1,1,4 177 | 1200952,5,8,7,7,10,10,5,7,1,4 178 | 1201834,2,1,1,1,2,1,3,1,1,2 179 | 1201936,5,10,10,3,8,1,5,10,3,4 180 | 1202125,4,1,1,1,2,1,3,1,1,2 181 | 1202812,5,3,3,3,6,10,3,1,1,4 182 | 1203096,1,1,1,1,1,1,3,1,1,2 183 | 1204242,1,1,1,1,2,1,1,1,1,2 184 | 1204898,6,1,1,1,2,1,3,1,1,2 185 | 1205138,5,8,8,8,5,10,7,8,1,4 186 | 1205579,8,7,6,4,4,10,5,1,1,4 187 | 1206089,2,1,1,1,1,1,3,1,1,2 188 | 1206695,1,5,8,6,5,8,7,10,1,4 189 | 1206841,10,5,6,10,6,10,7,7,10,4 190 | 1207986,5,8,4,10,5,8,9,10,1,4 191 | 1208301,1,2,3,1,2,1,3,1,1,2 192 | 1210963,10,10,10,8,6,8,7,10,1,4 193 | 1211202,7,5,10,10,10,10,4,10,3,4 194 | 1212232,5,1,1,1,2,1,2,1,1,2 195 | 1212251,1,1,1,1,2,1,3,1,1,2 196 | 1212422,3,1,1,1,2,1,3,1,1,2 197 | 1212422,4,1,1,1,2,1,3,1,1,2 198 | 1213375,8,4,4,5,4,7,7,8,2,2 199 | 1213383,5,1,1,4,2,1,3,1,1,2 200 | 1214092,1,1,1,1,2,1,1,1,1,2 201 | 1214556,3,1,1,1,2,1,2,1,1,2 202 | 
1214966,9,7,7,5,5,10,7,8,3,4 203 | 1216694,10,8,8,4,10,10,8,1,1,4 204 | 1216947,1,1,1,1,2,1,3,1,1,2 205 | 1217051,5,1,1,1,2,1,3,1,1,2 206 | 1217264,1,1,1,1,2,1,3,1,1,2 207 | 1218105,5,10,10,9,6,10,7,10,5,4 208 | 1218741,10,10,9,3,7,5,3,5,1,4 209 | 1218860,1,1,1,1,1,1,3,1,1,2 210 | 1218860,1,1,1,1,1,1,3,1,1,2 211 | 1219406,5,1,1,1,1,1,3,1,1,2 212 | 1219525,8,10,10,10,5,10,8,10,6,4 213 | 1219859,8,10,8,8,4,8,7,7,1,4 214 | 1220330,1,1,1,1,2,1,3,1,1,2 215 | 1221863,10,10,10,10,7,10,7,10,4,4 216 | 1222047,10,10,10,10,3,10,10,6,1,4 217 | 1222936,8,7,8,7,5,5,5,10,2,4 218 | 1223282,1,1,1,1,2,1,2,1,1,2 219 | 1223426,1,1,1,1,2,1,3,1,1,2 220 | 1223793,6,10,7,7,6,4,8,10,2,4 221 | 1223967,6,1,3,1,2,1,3,1,1,2 222 | 1224329,1,1,1,2,2,1,3,1,1,2 223 | 1225799,10,6,4,3,10,10,9,10,1,4 224 | 1226012,4,1,1,3,1,5,2,1,1,4 225 | 1226612,7,5,6,3,3,8,7,4,1,4 226 | 1227210,10,5,5,6,3,10,7,9,2,4 227 | 1227244,1,1,1,1,2,1,2,1,1,2 228 | 1227481,10,5,7,4,4,10,8,9,1,4 229 | 1228152,8,9,9,5,3,5,7,7,1,4 230 | 1228311,1,1,1,1,1,1,3,1,1,2 231 | 1230175,10,10,10,3,10,10,9,10,1,4 232 | 1230688,7,4,7,4,3,7,7,6,1,4 233 | 1231387,6,8,7,5,6,8,8,9,2,4 234 | 1231706,8,4,6,3,3,1,4,3,1,2 235 | 1232225,10,4,5,5,5,10,4,1,1,4 236 | 1236043,3,3,2,1,3,1,3,6,1,2 237 | 1241232,3,1,4,1,2,?,3,1,1,2 238 | 1241559,10,8,8,2,8,10,4,8,10,4 239 | 1241679,9,8,8,5,6,2,4,10,4,4 240 | 1242364,8,10,10,8,6,9,3,10,10,4 241 | 1243256,10,4,3,2,3,10,5,3,2,4 242 | 1270479,5,1,3,3,2,2,2,3,1,2 243 | 1276091,3,1,1,3,1,1,3,1,1,2 244 | 1277018,2,1,1,1,2,1,3,1,1,2 245 | 128059,1,1,1,1,2,5,5,1,1,2 246 | 1285531,1,1,1,1,2,1,3,1,1,2 247 | 1287775,5,1,1,2,2,2,3,1,1,2 248 | 144888,8,10,10,8,5,10,7,8,1,4 249 | 145447,8,4,4,1,2,9,3,3,1,4 250 | 167528,4,1,1,1,2,1,3,6,1,2 251 | 169356,3,1,1,1,2,?,3,1,1,2 252 | 183913,1,2,2,1,2,1,1,1,1,2 253 | 191250,10,4,4,10,2,10,5,3,3,4 254 | 1017023,6,3,3,5,3,10,3,5,3,2 255 | 1100524,6,10,10,2,8,10,7,3,3,4 256 | 1116116,9,10,10,1,10,8,3,3,1,4 257 | 1168736,5,6,6,2,4,10,3,6,1,4 258 | 1182404,3,1,1,1,2,1,1,1,1,2 259 
| 1182404,3,1,1,1,2,1,2,1,1,2 260 | 1198641,3,1,1,1,2,1,3,1,1,2 261 | 242970,5,7,7,1,5,8,3,4,1,2 262 | 255644,10,5,8,10,3,10,5,1,3,4 263 | 263538,5,10,10,6,10,10,10,6,5,4 264 | 274137,8,8,9,4,5,10,7,8,1,4 265 | 303213,10,4,4,10,6,10,5,5,1,4 266 | 314428,7,9,4,10,10,3,5,3,3,4 267 | 1182404,5,1,4,1,2,1,3,2,1,2 268 | 1198641,10,10,6,3,3,10,4,3,2,4 269 | 320675,3,3,5,2,3,10,7,1,1,4 270 | 324427,10,8,8,2,3,4,8,7,8,4 271 | 385103,1,1,1,1,2,1,3,1,1,2 272 | 390840,8,4,7,1,3,10,3,9,2,4 273 | 411453,5,1,1,1,2,1,3,1,1,2 274 | 320675,3,3,5,2,3,10,7,1,1,4 275 | 428903,7,2,4,1,3,4,3,3,1,4 276 | 431495,3,1,1,1,2,1,3,2,1,2 277 | 432809,3,1,3,1,2,?,2,1,1,2 278 | 434518,3,1,1,1,2,1,2,1,1,2 279 | 452264,1,1,1,1,2,1,2,1,1,2 280 | 456282,1,1,1,1,2,1,3,1,1,2 281 | 476903,10,5,7,3,3,7,3,3,8,4 282 | 486283,3,1,1,1,2,1,3,1,1,2 283 | 486662,2,1,1,2,2,1,3,1,1,2 284 | 488173,1,4,3,10,4,10,5,6,1,4 285 | 492268,10,4,6,1,2,10,5,3,1,4 286 | 508234,7,4,5,10,2,10,3,8,2,4 287 | 527363,8,10,10,10,8,10,10,7,3,4 288 | 529329,10,10,10,10,10,10,4,10,10,4 289 | 535331,3,1,1,1,3,1,2,1,1,2 290 | 543558,6,1,3,1,4,5,5,10,1,4 291 | 555977,5,6,6,8,6,10,4,10,4,4 292 | 560680,1,1,1,1,2,1,1,1,1,2 293 | 561477,1,1,1,1,2,1,3,1,1,2 294 | 563649,8,8,8,1,2,?,6,10,1,4 295 | 601265,10,4,4,6,2,10,2,3,1,4 296 | 606140,1,1,1,1,2,?,2,1,1,2 297 | 606722,5,5,7,8,6,10,7,4,1,4 298 | 616240,5,3,4,3,4,5,4,7,1,2 299 | 61634,5,4,3,1,2,?,2,3,1,2 300 | 625201,8,2,1,1,5,1,1,1,1,2 301 | 63375,9,1,2,6,4,10,7,7,2,4 302 | 635844,8,4,10,5,4,4,7,10,1,4 303 | 636130,1,1,1,1,2,1,3,1,1,2 304 | 640744,10,10,10,7,9,10,7,10,10,4 305 | 646904,1,1,1,1,2,1,3,1,1,2 306 | 653777,8,3,4,9,3,10,3,3,1,4 307 | 659642,10,8,4,4,4,10,3,10,4,4 308 | 666090,1,1,1,1,2,1,3,1,1,2 309 | 666942,1,1,1,1,2,1,3,1,1,2 310 | 667204,7,8,7,6,4,3,8,8,4,4 311 | 673637,3,1,1,1,2,5,5,1,1,2 312 | 684955,2,1,1,1,3,1,2,1,1,2 313 | 688033,1,1,1,1,2,1,1,1,1,2 314 | 691628,8,6,4,10,10,1,3,5,1,4 315 | 693702,1,1,1,1,2,1,1,1,1,2 316 | 704097,1,1,1,1,1,1,2,1,1,2 317 | 
704168,4,6,5,6,7,?,4,9,1,2 318 | 706426,5,5,5,2,5,10,4,3,1,4 319 | 709287,6,8,7,8,6,8,8,9,1,4 320 | 718641,1,1,1,1,5,1,3,1,1,2 321 | 721482,4,4,4,4,6,5,7,3,1,2 322 | 730881,7,6,3,2,5,10,7,4,6,4 323 | 733639,3,1,1,1,2,?,3,1,1,2 324 | 733639,3,1,1,1,2,1,3,1,1,2 325 | 733823,5,4,6,10,2,10,4,1,1,4 326 | 740492,1,1,1,1,2,1,3,1,1,2 327 | 743348,3,2,2,1,2,1,2,3,1,2 328 | 752904,10,1,1,1,2,10,5,4,1,4 329 | 756136,1,1,1,1,2,1,2,1,1,2 330 | 760001,8,10,3,2,6,4,3,10,1,4 331 | 760239,10,4,6,4,5,10,7,1,1,4 332 | 76389,10,4,7,2,2,8,6,1,1,4 333 | 764974,5,1,1,1,2,1,3,1,2,2 334 | 770066,5,2,2,2,2,1,2,2,1,2 335 | 785208,5,4,6,6,4,10,4,3,1,4 336 | 785615,8,6,7,3,3,10,3,4,2,4 337 | 792744,1,1,1,1,2,1,1,1,1,2 338 | 797327,6,5,5,8,4,10,3,4,1,4 339 | 798429,1,1,1,1,2,1,3,1,1,2 340 | 704097,1,1,1,1,1,1,2,1,1,2 341 | 806423,8,5,5,5,2,10,4,3,1,4 342 | 809912,10,3,3,1,2,10,7,6,1,4 343 | 810104,1,1,1,1,2,1,3,1,1,2 344 | 814265,2,1,1,1,2,1,1,1,1,2 345 | 814911,1,1,1,1,2,1,1,1,1,2 346 | 822829,7,6,4,8,10,10,9,5,3,4 347 | 826923,1,1,1,1,2,1,1,1,1,2 348 | 830690,5,2,2,2,3,1,1,3,1,2 349 | 831268,1,1,1,1,1,1,1,3,1,2 350 | 832226,3,4,4,10,5,1,3,3,1,4 351 | 832567,4,2,3,5,3,8,7,6,1,4 352 | 836433,5,1,1,3,2,1,1,1,1,2 353 | 837082,2,1,1,1,2,1,3,1,1,2 354 | 846832,3,4,5,3,7,3,4,6,1,2 355 | 850831,2,7,10,10,7,10,4,9,4,4 356 | 855524,1,1,1,1,2,1,2,1,1,2 357 | 857774,4,1,1,1,3,1,2,2,1,2 358 | 859164,5,3,3,1,3,3,3,3,3,4 359 | 859350,8,10,10,7,10,10,7,3,8,4 360 | 866325,8,10,5,3,8,4,4,10,3,4 361 | 873549,10,3,5,4,3,7,3,5,3,4 362 | 877291,6,10,10,10,10,10,8,10,10,4 363 | 877943,3,10,3,10,6,10,5,1,4,4 364 | 888169,3,2,2,1,4,3,2,1,1,2 365 | 888523,4,4,4,2,2,3,2,1,1,2 366 | 896404,2,1,1,1,2,1,3,1,1,2 367 | 897172,2,1,1,1,2,1,2,1,1,2 368 | 95719,6,10,10,10,8,10,7,10,7,4 369 | 160296,5,8,8,10,5,10,8,10,3,4 370 | 342245,1,1,3,1,2,1,1,1,1,2 371 | 428598,1,1,3,1,1,1,2,1,1,2 372 | 492561,4,3,2,1,3,1,2,1,1,2 373 | 493452,1,1,3,1,2,1,1,1,1,2 374 | 493452,4,1,2,1,2,1,2,1,1,2 375 | 521441,5,1,1,2,2,1,2,1,1,2 376 | 
560680,3,1,2,1,2,1,2,1,1,2 377 | 636437,1,1,1,1,2,1,1,1,1,2 378 | 640712,1,1,1,1,2,1,2,1,1,2 379 | 654244,1,1,1,1,1,1,2,1,1,2 380 | 657753,3,1,1,4,3,1,2,2,1,2 381 | 685977,5,3,4,1,4,1,3,1,1,2 382 | 805448,1,1,1,1,2,1,1,1,1,2 383 | 846423,10,6,3,6,4,10,7,8,4,4 384 | 1002504,3,2,2,2,2,1,3,2,1,2 385 | 1022257,2,1,1,1,2,1,1,1,1,2 386 | 1026122,2,1,1,1,2,1,1,1,1,2 387 | 1071084,3,3,2,2,3,1,1,2,3,2 388 | 1080233,7,6,6,3,2,10,7,1,1,4 389 | 1114570,5,3,3,2,3,1,3,1,1,2 390 | 1114570,2,1,1,1,2,1,2,2,1,2 391 | 1116715,5,1,1,1,3,2,2,2,1,2 392 | 1131411,1,1,1,2,2,1,2,1,1,2 393 | 1151734,10,8,7,4,3,10,7,9,1,4 394 | 1156017,3,1,1,1,2,1,2,1,1,2 395 | 1158247,1,1,1,1,1,1,1,1,1,2 396 | 1158405,1,2,3,1,2,1,2,1,1,2 397 | 1168278,3,1,1,1,2,1,2,1,1,2 398 | 1176187,3,1,1,1,2,1,3,1,1,2 399 | 1196263,4,1,1,1,2,1,1,1,1,2 400 | 1196475,3,2,1,1,2,1,2,2,1,2 401 | 1206314,1,2,3,1,2,1,1,1,1,2 402 | 1211265,3,10,8,7,6,9,9,3,8,4 403 | 1213784,3,1,1,1,2,1,1,1,1,2 404 | 1223003,5,3,3,1,2,1,2,1,1,2 405 | 1223306,3,1,1,1,2,4,1,1,1,2 406 | 1223543,1,2,1,3,2,1,1,2,1,2 407 | 1229929,1,1,1,1,2,1,2,1,1,2 408 | 1231853,4,2,2,1,2,1,2,1,1,2 409 | 1234554,1,1,1,1,2,1,2,1,1,2 410 | 1236837,2,3,2,2,2,2,3,1,1,2 411 | 1237674,3,1,2,1,2,1,2,1,1,2 412 | 1238021,1,1,1,1,2,1,2,1,1,2 413 | 1238464,1,1,1,1,1,?,2,1,1,2 414 | 1238633,10,10,10,6,8,4,8,5,1,4 415 | 1238915,5,1,2,1,2,1,3,1,1,2 416 | 1238948,8,5,6,2,3,10,6,6,1,4 417 | 1239232,3,3,2,6,3,3,3,5,1,2 418 | 1239347,8,7,8,5,10,10,7,2,1,4 419 | 1239967,1,1,1,1,2,1,2,1,1,2 420 | 1240337,5,2,2,2,2,2,3,2,2,2 421 | 1253505,2,3,1,1,5,1,1,1,1,2 422 | 1255384,3,2,2,3,2,3,3,1,1,2 423 | 1257200,10,10,10,7,10,10,8,2,1,4 424 | 1257648,4,3,3,1,2,1,3,3,1,2 425 | 1257815,5,1,3,1,2,1,2,1,1,2 426 | 1257938,3,1,1,1,2,1,1,1,1,2 427 | 1258549,9,10,10,10,10,10,10,10,1,4 428 | 1258556,5,3,6,1,2,1,1,1,1,2 429 | 1266154,8,7,8,2,4,2,5,10,1,4 430 | 1272039,1,1,1,1,2,1,2,1,1,2 431 | 1276091,2,1,1,1,2,1,2,1,1,2 432 | 1276091,1,3,1,1,2,1,2,2,1,2 433 | 1276091,5,1,1,3,4,1,3,2,1,2 434 | 
1277629,5,1,1,1,2,1,2,2,1,2 435 | 1293439,3,2,2,3,2,1,1,1,1,2 436 | 1293439,6,9,7,5,5,8,4,2,1,2 437 | 1294562,10,8,10,1,3,10,5,1,1,4 438 | 1295186,10,10,10,1,6,1,2,8,1,4 439 | 527337,4,1,1,1,2,1,1,1,1,2 440 | 558538,4,1,3,3,2,1,1,1,1,2 441 | 566509,5,1,1,1,2,1,1,1,1,2 442 | 608157,10,4,3,10,4,10,10,1,1,4 443 | 677910,5,2,2,4,2,4,1,1,1,2 444 | 734111,1,1,1,3,2,3,1,1,1,2 445 | 734111,1,1,1,1,2,2,1,1,1,2 446 | 780555,5,1,1,6,3,1,2,1,1,2 447 | 827627,2,1,1,1,2,1,1,1,1,2 448 | 1049837,1,1,1,1,2,1,1,1,1,2 449 | 1058849,5,1,1,1,2,1,1,1,1,2 450 | 1182404,1,1,1,1,1,1,1,1,1,2 451 | 1193544,5,7,9,8,6,10,8,10,1,4 452 | 1201870,4,1,1,3,1,1,2,1,1,2 453 | 1202253,5,1,1,1,2,1,1,1,1,2 454 | 1227081,3,1,1,3,2,1,1,1,1,2 455 | 1230994,4,5,5,8,6,10,10,7,1,4 456 | 1238410,2,3,1,1,3,1,1,1,1,2 457 | 1246562,10,2,2,1,2,6,1,1,2,4 458 | 1257470,10,6,5,8,5,10,8,6,1,4 459 | 1259008,8,8,9,6,6,3,10,10,1,4 460 | 1266124,5,1,2,1,2,1,1,1,1,2 461 | 1267898,5,1,3,1,2,1,1,1,1,2 462 | 1268313,5,1,1,3,2,1,1,1,1,2 463 | 1268804,3,1,1,1,2,5,1,1,1,2 464 | 1276091,6,1,1,3,2,1,1,1,1,2 465 | 1280258,4,1,1,1,2,1,1,2,1,2 466 | 1293966,4,1,1,1,2,1,1,1,1,2 467 | 1296572,10,9,8,7,6,4,7,10,3,4 468 | 1298416,10,6,6,2,4,10,9,7,1,4 469 | 1299596,6,6,6,5,4,10,7,6,2,4 470 | 1105524,4,1,1,1,2,1,1,1,1,2 471 | 1181685,1,1,2,1,2,1,2,1,1,2 472 | 1211594,3,1,1,1,1,1,2,1,1,2 473 | 1238777,6,1,1,3,2,1,1,1,1,2 474 | 1257608,6,1,1,1,1,1,1,1,1,2 475 | 1269574,4,1,1,1,2,1,1,1,1,2 476 | 1277145,5,1,1,1,2,1,1,1,1,2 477 | 1287282,3,1,1,1,2,1,1,1,1,2 478 | 1296025,4,1,2,1,2,1,1,1,1,2 479 | 1296263,4,1,1,1,2,1,1,1,1,2 480 | 1296593,5,2,1,1,2,1,1,1,1,2 481 | 1299161,4,8,7,10,4,10,7,5,1,4 482 | 1301945,5,1,1,1,1,1,1,1,1,2 483 | 1302428,5,3,2,4,2,1,1,1,1,2 484 | 1318169,9,10,10,10,10,5,10,10,10,4 485 | 474162,8,7,8,5,5,10,9,10,1,4 486 | 787451,5,1,2,1,2,1,1,1,1,2 487 | 1002025,1,1,1,3,1,3,1,1,1,2 488 | 1070522,3,1,1,1,1,1,2,1,1,2 489 | 1073960,10,10,10,10,6,10,8,1,5,4 490 | 1076352,3,6,4,10,3,3,3,4,1,4 491 | 1084139,6,3,2,1,3,4,4,1,1,4 492 
| 1115293,1,1,1,1,2,1,1,1,1,2 493 | 1119189,5,8,9,4,3,10,7,1,1,4 494 | 1133991,4,1,1,1,1,1,2,1,1,2 495 | 1142706,5,10,10,10,6,10,6,5,2,4 496 | 1155967,5,1,2,10,4,5,2,1,1,2 497 | 1170945,3,1,1,1,1,1,2,1,1,2 498 | 1181567,1,1,1,1,1,1,1,1,1,2 499 | 1182404,4,2,1,1,2,1,1,1,1,2 500 | 1204558,4,1,1,1,2,1,2,1,1,2 501 | 1217952,4,1,1,1,2,1,2,1,1,2 502 | 1224565,6,1,1,1,2,1,3,1,1,2 503 | 1238186,4,1,1,1,2,1,2,1,1,2 504 | 1253917,4,1,1,2,2,1,2,1,1,2 505 | 1265899,4,1,1,1,2,1,3,1,1,2 506 | 1268766,1,1,1,1,2,1,1,1,1,2 507 | 1277268,3,3,1,1,2,1,1,1,1,2 508 | 1286943,8,10,10,10,7,5,4,8,7,4 509 | 1295508,1,1,1,1,2,4,1,1,1,2 510 | 1297327,5,1,1,1,2,1,1,1,1,2 511 | 1297522,2,1,1,1,2,1,1,1,1,2 512 | 1298360,1,1,1,1,2,1,1,1,1,2 513 | 1299924,5,1,1,1,2,1,2,1,1,2 514 | 1299994,5,1,1,1,2,1,1,1,1,2 515 | 1304595,3,1,1,1,1,1,2,1,1,2 516 | 1306282,6,6,7,10,3,10,8,10,2,4 517 | 1313325,4,10,4,7,3,10,9,10,1,4 518 | 1320077,1,1,1,1,1,1,1,1,1,2 519 | 1320077,1,1,1,1,1,1,2,1,1,2 520 | 1320304,3,1,2,2,2,1,1,1,1,2 521 | 1330439,4,7,8,3,4,10,9,1,1,4 522 | 333093,1,1,1,1,3,1,1,1,1,2 523 | 369565,4,1,1,1,3,1,1,1,1,2 524 | 412300,10,4,5,4,3,5,7,3,1,4 525 | 672113,7,5,6,10,4,10,5,3,1,4 526 | 749653,3,1,1,1,2,1,2,1,1,2 527 | 769612,3,1,1,2,2,1,1,1,1,2 528 | 769612,4,1,1,1,2,1,1,1,1,2 529 | 798429,4,1,1,1,2,1,3,1,1,2 530 | 807657,6,1,3,2,2,1,1,1,1,2 531 | 8233704,4,1,1,1,1,1,2,1,1,2 532 | 837480,7,4,4,3,4,10,6,9,1,4 533 | 867392,4,2,2,1,2,1,2,1,1,2 534 | 869828,1,1,1,1,1,1,3,1,1,2 535 | 1043068,3,1,1,1,2,1,2,1,1,2 536 | 1056171,2,1,1,1,2,1,2,1,1,2 537 | 1061990,1,1,3,2,2,1,3,1,1,2 538 | 1113061,5,1,1,1,2,1,3,1,1,2 539 | 1116192,5,1,2,1,2,1,3,1,1,2 540 | 1135090,4,1,1,1,2,1,2,1,1,2 541 | 1145420,6,1,1,1,2,1,2,1,1,2 542 | 1158157,5,1,1,1,2,2,2,1,1,2 543 | 1171578,3,1,1,1,2,1,1,1,1,2 544 | 1174841,5,3,1,1,2,1,1,1,1,2 545 | 1184586,4,1,1,1,2,1,2,1,1,2 546 | 1186936,2,1,3,2,2,1,2,1,1,2 547 | 1197527,5,1,1,1,2,1,2,1,1,2 548 | 1222464,6,10,10,10,4,10,7,10,1,4 549 | 1240603,2,1,1,1,1,1,1,1,1,2 550 | 
1240603,3,1,1,1,1,1,1,1,1,2 551 | 1241035,7,8,3,7,4,5,7,8,2,4 552 | 1287971,3,1,1,1,2,1,2,1,1,2 553 | 1289391,1,1,1,1,2,1,3,1,1,2 554 | 1299924,3,2,2,2,2,1,4,2,1,2 555 | 1306339,4,4,2,1,2,5,2,1,2,2 556 | 1313658,3,1,1,1,2,1,1,1,1,2 557 | 1313982,4,3,1,1,2,1,4,8,1,2 558 | 1321264,5,2,2,2,1,1,2,1,1,2 559 | 1321321,5,1,1,3,2,1,1,1,1,2 560 | 1321348,2,1,1,1,2,1,2,1,1,2 561 | 1321931,5,1,1,1,2,1,2,1,1,2 562 | 1321942,5,1,1,1,2,1,3,1,1,2 563 | 1321942,5,1,1,1,2,1,3,1,1,2 564 | 1328331,1,1,1,1,2,1,3,1,1,2 565 | 1328755,3,1,1,1,2,1,2,1,1,2 566 | 1331405,4,1,1,1,2,1,3,2,1,2 567 | 1331412,5,7,10,10,5,10,10,10,1,4 568 | 1333104,3,1,2,1,2,1,3,1,1,2 569 | 1334071,4,1,1,1,2,3,2,1,1,2 570 | 1343068,8,4,4,1,6,10,2,5,2,4 571 | 1343374,10,10,8,10,6,5,10,3,1,4 572 | 1344121,8,10,4,4,8,10,8,2,1,4 573 | 142932,7,6,10,5,3,10,9,10,2,4 574 | 183936,3,1,1,1,2,1,2,1,1,2 575 | 324382,1,1,1,1,2,1,2,1,1,2 576 | 378275,10,9,7,3,4,2,7,7,1,4 577 | 385103,5,1,2,1,2,1,3,1,1,2 578 | 690557,5,1,1,1,2,1,2,1,1,2 579 | 695091,1,1,1,1,2,1,2,1,1,2 580 | 695219,1,1,1,1,2,1,2,1,1,2 581 | 824249,1,1,1,1,2,1,3,1,1,2 582 | 871549,5,1,2,1,2,1,2,1,1,2 583 | 878358,5,7,10,6,5,10,7,5,1,4 584 | 1107684,6,10,5,5,4,10,6,10,1,4 585 | 1115762,3,1,1,1,2,1,1,1,1,2 586 | 1217717,5,1,1,6,3,1,1,1,1,2 587 | 1239420,1,1,1,1,2,1,1,1,1,2 588 | 1254538,8,10,10,10,6,10,10,10,1,4 589 | 1261751,5,1,1,1,2,1,2,2,1,2 590 | 1268275,9,8,8,9,6,3,4,1,1,4 591 | 1272166,5,1,1,1,2,1,1,1,1,2 592 | 1294261,4,10,8,5,4,1,10,1,1,4 593 | 1295529,2,5,7,6,4,10,7,6,1,4 594 | 1298484,10,3,4,5,3,10,4,1,1,4 595 | 1311875,5,1,2,1,2,1,1,1,1,2 596 | 1315506,4,8,6,3,4,10,7,1,1,4 597 | 1320141,5,1,1,1,2,1,2,1,1,2 598 | 1325309,4,1,2,1,2,1,2,1,1,2 599 | 1333063,5,1,3,1,2,1,3,1,1,2 600 | 1333495,3,1,1,1,2,1,2,1,1,2 601 | 1334659,5,2,4,1,1,1,1,1,1,2 602 | 1336798,3,1,1,1,2,1,2,1,1,2 603 | 1344449,1,1,1,1,1,1,2,1,1,2 604 | 1350568,4,1,1,1,2,1,2,1,1,2 605 | 1352663,5,4,6,8,4,1,8,10,1,4 606 | 188336,5,3,2,8,5,10,8,1,2,4 607 | 352431,10,5,10,3,5,8,7,8,3,4 608 | 
353098,4,1,1,2,2,1,1,1,1,2 609 | 411453,1,1,1,1,2,1,1,1,1,2 610 | 557583,5,10,10,10,10,10,10,1,1,4 611 | 636375,5,1,1,1,2,1,1,1,1,2 612 | 736150,10,4,3,10,3,10,7,1,2,4 613 | 803531,5,10,10,10,5,2,8,5,1,4 614 | 822829,8,10,10,10,6,10,10,10,10,4 615 | 1016634,2,3,1,1,2,1,2,1,1,2 616 | 1031608,2,1,1,1,1,1,2,1,1,2 617 | 1041043,4,1,3,1,2,1,2,1,1,2 618 | 1042252,3,1,1,1,2,1,2,1,1,2 619 | 1057067,1,1,1,1,1,?,1,1,1,2 620 | 1061990,4,1,1,1,2,1,2,1,1,2 621 | 1073836,5,1,1,1,2,1,2,1,1,2 622 | 1083817,3,1,1,1,2,1,2,1,1,2 623 | 1096352,6,3,3,3,3,2,6,1,1,2 624 | 1140597,7,1,2,3,2,1,2,1,1,2 625 | 1149548,1,1,1,1,2,1,1,1,1,2 626 | 1174009,5,1,1,2,1,1,2,1,1,2 627 | 1183596,3,1,3,1,3,4,1,1,1,2 628 | 1190386,4,6,6,5,7,6,7,7,3,4 629 | 1190546,2,1,1,1,2,5,1,1,1,2 630 | 1213273,2,1,1,1,2,1,1,1,1,2 631 | 1218982,4,1,1,1,2,1,1,1,1,2 632 | 1225382,6,2,3,1,2,1,1,1,1,2 633 | 1235807,5,1,1,1,2,1,2,1,1,2 634 | 1238777,1,1,1,1,2,1,1,1,1,2 635 | 1253955,8,7,4,4,5,3,5,10,1,4 636 | 1257366,3,1,1,1,2,1,1,1,1,2 637 | 1260659,3,1,4,1,2,1,1,1,1,2 638 | 1268952,10,10,7,8,7,1,10,10,3,4 639 | 1275807,4,2,4,3,2,2,2,1,1,2 640 | 1277792,4,1,1,1,2,1,1,1,1,2 641 | 1277792,5,1,1,3,2,1,1,1,1,2 642 | 1285722,4,1,1,3,2,1,1,1,1,2 643 | 1288608,3,1,1,1,2,1,2,1,1,2 644 | 1290203,3,1,1,1,2,1,2,1,1,2 645 | 1294413,1,1,1,1,2,1,1,1,1,2 646 | 1299596,2,1,1,1,2,1,1,1,1,2 647 | 1303489,3,1,1,1,2,1,2,1,1,2 648 | 1311033,1,2,2,1,2,1,1,1,1,2 649 | 1311108,1,1,1,3,2,1,1,1,1,2 650 | 1315807,5,10,10,10,10,2,10,10,10,4 651 | 1318671,3,1,1,1,2,1,2,1,1,2 652 | 1319609,3,1,1,2,3,4,1,1,1,2 653 | 1323477,1,2,1,3,2,1,2,1,1,2 654 | 1324572,5,1,1,1,2,1,2,2,1,2 655 | 1324681,4,1,1,1,2,1,2,1,1,2 656 | 1325159,3,1,1,1,2,1,3,1,1,2 657 | 1326892,3,1,1,1,2,1,2,1,1,2 658 | 1330361,5,1,1,1,2,1,2,1,1,2 659 | 1333877,5,4,5,1,8,1,3,6,1,2 660 | 1334015,7,8,8,7,3,10,7,2,3,4 661 | 1334667,1,1,1,1,2,1,1,1,1,2 662 | 1339781,1,1,1,1,2,1,2,1,1,2 663 | 1339781,4,1,1,1,2,1,3,1,1,2 664 | 13454352,1,1,3,1,2,1,2,1,1,2 665 | 1345452,1,1,3,1,2,1,2,1,1,2 666 | 
1345593,3,1,1,3,2,1,2,1,1,2 667 | 1347749,1,1,1,1,2,1,1,1,1,2 668 | 1347943,5,2,2,2,2,1,1,1,2,2 669 | 1348851,3,1,1,1,2,1,3,1,1,2 670 | 1350319,5,7,4,1,6,1,7,10,3,4 671 | 1350423,5,10,10,8,5,5,7,10,1,4 672 | 1352848,3,10,7,8,5,8,7,4,1,4 673 | 1353092,3,2,1,2,2,1,3,1,1,2 674 | 1354840,2,1,1,1,2,1,3,1,1,2 675 | 1354840,5,3,2,1,3,1,1,1,1,2 676 | 1355260,1,1,1,1,2,1,2,1,1,2 677 | 1365075,4,1,4,1,2,1,1,1,1,2 678 | 1365328,1,1,2,1,2,1,2,1,1,2 679 | 1368267,5,1,1,1,2,1,1,1,1,2 680 | 1368273,1,1,1,1,2,1,1,1,1,2 681 | 1368882,2,1,1,1,2,1,1,1,1,2 682 | 1369821,10,10,10,10,5,10,10,10,7,4 683 | 1371026,5,10,10,10,4,10,5,6,3,4 684 | 1371920,5,1,1,1,2,1,3,2,1,2 685 | 466906,1,1,1,1,2,1,1,1,1,2 686 | 466906,1,1,1,1,2,1,1,1,1,2 687 | 534555,1,1,1,1,2,1,1,1,1,2 688 | 536708,1,1,1,1,2,1,1,1,1,2 689 | 566346,3,1,1,1,2,1,2,3,1,2 690 | 603148,4,1,1,1,2,1,1,1,1,2 691 | 654546,1,1,1,1,2,1,1,1,8,2 692 | 654546,1,1,1,3,2,1,1,1,1,2 693 | 695091,5,10,10,5,4,5,4,4,1,4 694 | 714039,3,1,1,1,2,1,1,1,1,2 695 | 763235,3,1,1,1,2,1,2,1,2,2 696 | 776715,3,1,1,1,3,2,1,1,1,2 697 | 841769,2,1,1,1,2,1,1,1,1,2 698 | 888820,5,10,10,3,7,3,8,10,2,4 699 | 897471,4,8,6,4,3,4,10,6,1,4 700 | 897471,4,8,8,5,4,5,10,4,1,4 -------------------------------------------------------------------------------- /Chapter 4/pima-indians-diabetes.csv: -------------------------------------------------------------------------------- 1 | Preg,Plas,Pres,skin,test,mass,pedi,age,class 2 | 6,148,72,35,0,33.6,0.627,50,1 3 | 1,85,66,29,0,26.6,0.351,31,0 4 | 8,183,64,0,0,23.3,0.672,32,1 5 | 1,89,66,23,94,28.1,0.167,21,0 6 | 0,137,40,35,168,43.1,2.288,33,1 7 | 5,116,74,0,0,25.6,0.201,30,0 8 | 3,78,50,32,88,31,0.248,26,1 9 | 10,115,0,0,0,35.3,0.134,29,0 10 | 2,197,70,45,543,30.5,0.158,53,1 11 | 8,125,96,0,0,0,0.232,54,1 12 | 4,110,92,0,0,37.6,0.191,30,0 13 | 10,168,74,0,0,38,0.537,34,1 14 | 10,139,80,0,0,27.1,1.441,57,0 15 | 1,189,60,23,846,30.1,0.398,59,1 16 | 5,166,72,19,175,25.8,0.587,51,1 17 | 7,100,0,0,0,30,0.484,32,1 18 | 
0,118,84,47,230,45.8,0.551,31,1 19 | 7,107,74,0,0,29.6,0.254,31,1 20 | 1,103,30,38,83,43.3,0.183,33,0 21 | 1,115,70,30,96,34.6,0.529,32,1 22 | 3,126,88,41,235,39.3,0.704,27,0 23 | 8,99,84,0,0,35.4,0.388,50,0 24 | 7,196,90,0,0,39.8,0.451,41,1 25 | 9,119,80,35,0,29,0.263,29,1 26 | 11,143,94,33,146,36.6,0.254,51,1 27 | 10,125,70,26,115,31.1,0.205,41,1 28 | 7,147,76,0,0,39.4,0.257,43,1 29 | 1,97,66,15,140,23.2,0.487,22,0 30 | 13,145,82,19,110,22.2,0.245,57,0 31 | 5,117,92,0,0,34.1,0.337,38,0 32 | 5,109,75,26,0,36,0.546,60,0 33 | 3,158,76,36,245,31.6,0.851,28,1 34 | 3,88,58,11,54,24.8,0.267,22,0 35 | 6,92,92,0,0,19.9,0.188,28,0 36 | 10,122,78,31,0,27.6,0.512,45,0 37 | 4,103,60,33,192,24,0.966,33,0 38 | 11,138,76,0,0,33.2,0.42,35,0 39 | 9,102,76,37,0,32.9,0.665,46,1 40 | 2,90,68,42,0,38.2,0.503,27,1 41 | 4,111,72,47,207,37.1,1.39,56,1 42 | 3,180,64,25,70,34,0.271,26,0 43 | 7,133,84,0,0,40.2,0.696,37,0 44 | 7,106,92,18,0,22.7,0.235,48,0 45 | 9,171,110,24,240,45.4,0.721,54,1 46 | 7,159,64,0,0,27.4,0.294,40,0 47 | 0,180,66,39,0,42,1.893,25,1 48 | 1,146,56,0,0,29.7,0.564,29,0 49 | 2,71,70,27,0,28,0.586,22,0 50 | 7,103,66,32,0,39.1,0.344,31,1 51 | 7,105,0,0,0,0,0.305,24,0 52 | 1,103,80,11,82,19.4,0.491,22,0 53 | 1,101,50,15,36,24.2,0.526,26,0 54 | 5,88,66,21,23,24.4,0.342,30,0 55 | 8,176,90,34,300,33.7,0.467,58,1 56 | 7,150,66,42,342,34.7,0.718,42,0 57 | 1,73,50,10,0,23,0.248,21,0 58 | 7,187,68,39,304,37.7,0.254,41,1 59 | 0,100,88,60,110,46.8,0.962,31,0 60 | 0,146,82,0,0,40.5,1.781,44,0 61 | 0,105,64,41,142,41.5,0.173,22,0 62 | 2,84,0,0,0,0,0.304,21,0 63 | 8,133,72,0,0,32.9,0.27,39,1 64 | 5,44,62,0,0,25,0.587,36,0 65 | 2,141,58,34,128,25.4,0.699,24,0 66 | 7,114,66,0,0,32.8,0.258,42,1 67 | 5,99,74,27,0,29,0.203,32,0 68 | 0,109,88,30,0,32.5,0.855,38,1 69 | 2,109,92,0,0,42.7,0.845,54,0 70 | 1,95,66,13,38,19.6,0.334,25,0 71 | 4,146,85,27,100,28.9,0.189,27,0 72 | 2,100,66,20,90,32.9,0.867,28,1 73 | 5,139,64,35,140,28.6,0.411,26,0 74 | 13,126,90,0,0,43.4,0.583,42,1 75 | 
4,129,86,20,270,35.1,0.231,23,0 76 | 1,79,75,30,0,32,0.396,22,0 77 | 1,0,48,20,0,24.7,0.14,22,0 78 | 7,62,78,0,0,32.6,0.391,41,0 79 | 5,95,72,33,0,37.7,0.37,27,0 80 | 0,131,0,0,0,43.2,0.27,26,1 81 | 2,112,66,22,0,25,0.307,24,0 82 | 3,113,44,13,0,22.4,0.14,22,0 83 | 2,74,0,0,0,0,0.102,22,0 84 | 7,83,78,26,71,29.3,0.767,36,0 85 | 0,101,65,28,0,24.6,0.237,22,0 86 | 5,137,108,0,0,48.8,0.227,37,1 87 | 2,110,74,29,125,32.4,0.698,27,0 88 | 13,106,72,54,0,36.6,0.178,45,0 89 | 2,100,68,25,71,38.5,0.324,26,0 90 | 15,136,70,32,110,37.1,0.153,43,1 91 | 1,107,68,19,0,26.5,0.165,24,0 92 | 1,80,55,0,0,19.1,0.258,21,0 93 | 4,123,80,15,176,32,0.443,34,0 94 | 7,81,78,40,48,46.7,0.261,42,0 95 | 4,134,72,0,0,23.8,0.277,60,1 96 | 2,142,82,18,64,24.7,0.761,21,0 97 | 6,144,72,27,228,33.9,0.255,40,0 98 | 2,92,62,28,0,31.6,0.13,24,0 99 | 1,71,48,18,76,20.4,0.323,22,0 100 | 6,93,50,30,64,28.7,0.356,23,0 101 | 1,122,90,51,220,49.7,0.325,31,1 102 | 1,163,72,0,0,39,1.222,33,1 103 | 1,151,60,0,0,26.1,0.179,22,0 104 | 0,125,96,0,0,22.5,0.262,21,0 105 | 1,81,72,18,40,26.6,0.283,24,0 106 | 2,85,65,0,0,39.6,0.93,27,0 107 | 1,126,56,29,152,28.7,0.801,21,0 108 | 1,96,122,0,0,22.4,0.207,27,0 109 | 4,144,58,28,140,29.5,0.287,37,0 110 | 3,83,58,31,18,34.3,0.336,25,0 111 | 0,95,85,25,36,37.4,0.247,24,1 112 | 3,171,72,33,135,33.3,0.199,24,1 113 | 8,155,62,26,495,34,0.543,46,1 114 | 1,89,76,34,37,31.2,0.192,23,0 115 | 4,76,62,0,0,34,0.391,25,0 116 | 7,160,54,32,175,30.5,0.588,39,1 117 | 4,146,92,0,0,31.2,0.539,61,1 118 | 5,124,74,0,0,34,0.22,38,1 119 | 5,78,48,0,0,33.7,0.654,25,0 120 | 4,97,60,23,0,28.2,0.443,22,0 121 | 4,99,76,15,51,23.2,0.223,21,0 122 | 0,162,76,56,100,53.2,0.759,25,1 123 | 6,111,64,39,0,34.2,0.26,24,0 124 | 2,107,74,30,100,33.6,0.404,23,0 125 | 5,132,80,0,0,26.8,0.186,69,0 126 | 0,113,76,0,0,33.3,0.278,23,1 127 | 1,88,30,42,99,55,0.496,26,1 128 | 3,120,70,30,135,42.9,0.452,30,0 129 | 1,118,58,36,94,33.3,0.261,23,0 130 | 1,117,88,24,145,34.5,0.403,40,1 131 | 0,105,84,0,0,27.9,0.741,62,1 
132 | 4,173,70,14,168,29.7,0.361,33,1 133 | 9,122,56,0,0,33.3,1.114,33,1 134 | 3,170,64,37,225,34.5,0.356,30,1 135 | 8,84,74,31,0,38.3,0.457,39,0 136 | 2,96,68,13,49,21.1,0.647,26,0 137 | 2,125,60,20,140,33.8,0.088,31,0 138 | 0,100,70,26,50,30.8,0.597,21,0 139 | 0,93,60,25,92,28.7,0.532,22,0 140 | 0,129,80,0,0,31.2,0.703,29,0 141 | 5,105,72,29,325,36.9,0.159,28,0 142 | 3,128,78,0,0,21.1,0.268,55,0 143 | 5,106,82,30,0,39.5,0.286,38,0 144 | 2,108,52,26,63,32.5,0.318,22,0 145 | 10,108,66,0,0,32.4,0.272,42,1 146 | 4,154,62,31,284,32.8,0.237,23,0 147 | 0,102,75,23,0,0,0.572,21,0 148 | 9,57,80,37,0,32.8,0.096,41,0 149 | 2,106,64,35,119,30.5,1.4,34,0 150 | 5,147,78,0,0,33.7,0.218,65,0 151 | 2,90,70,17,0,27.3,0.085,22,0 152 | 1,136,74,50,204,37.4,0.399,24,0 153 | 4,114,65,0,0,21.9,0.432,37,0 154 | 9,156,86,28,155,34.3,1.189,42,1 155 | 1,153,82,42,485,40.6,0.687,23,0 156 | 8,188,78,0,0,47.9,0.137,43,1 157 | 7,152,88,44,0,50,0.337,36,1 158 | 2,99,52,15,94,24.6,0.637,21,0 159 | 1,109,56,21,135,25.2,0.833,23,0 160 | 2,88,74,19,53,29,0.229,22,0 161 | 17,163,72,41,114,40.9,0.817,47,1 162 | 4,151,90,38,0,29.7,0.294,36,0 163 | 7,102,74,40,105,37.2,0.204,45,0 164 | 0,114,80,34,285,44.2,0.167,27,0 165 | 2,100,64,23,0,29.7,0.368,21,0 166 | 0,131,88,0,0,31.6,0.743,32,1 167 | 6,104,74,18,156,29.9,0.722,41,1 168 | 3,148,66,25,0,32.5,0.256,22,0 169 | 4,120,68,0,0,29.6,0.709,34,0 170 | 4,110,66,0,0,31.9,0.471,29,0 171 | 3,111,90,12,78,28.4,0.495,29,0 172 | 6,102,82,0,0,30.8,0.18,36,1 173 | 6,134,70,23,130,35.4,0.542,29,1 174 | 2,87,0,23,0,28.9,0.773,25,0 175 | 1,79,60,42,48,43.5,0.678,23,0 176 | 2,75,64,24,55,29.7,0.37,33,0 177 | 8,179,72,42,130,32.7,0.719,36,1 178 | 6,85,78,0,0,31.2,0.382,42,0 179 | 0,129,110,46,130,67.1,0.319,26,1 180 | 5,143,78,0,0,45,0.19,47,0 181 | 5,130,82,0,0,39.1,0.956,37,1 182 | 6,87,80,0,0,23.2,0.084,32,0 183 | 0,119,64,18,92,34.9,0.725,23,0 184 | 1,0,74,20,23,27.7,0.299,21,0 185 | 5,73,60,0,0,26.8,0.268,27,0 186 | 4,141,74,0,0,27.6,0.244,40,0 187 | 
7,194,68,28,0,35.9,0.745,41,1 188 | 8,181,68,36,495,30.1,0.615,60,1 189 | 1,128,98,41,58,32,1.321,33,1 190 | 8,109,76,39,114,27.9,0.64,31,1 191 | 5,139,80,35,160,31.6,0.361,25,1 192 | 3,111,62,0,0,22.6,0.142,21,0 193 | 9,123,70,44,94,33.1,0.374,40,0 194 | 7,159,66,0,0,30.4,0.383,36,1 195 | 11,135,0,0,0,52.3,0.578,40,1 196 | 8,85,55,20,0,24.4,0.136,42,0 197 | 5,158,84,41,210,39.4,0.395,29,1 198 | 1,105,58,0,0,24.3,0.187,21,0 199 | 3,107,62,13,48,22.9,0.678,23,1 200 | 4,109,64,44,99,34.8,0.905,26,1 201 | 4,148,60,27,318,30.9,0.15,29,1 202 | 0,113,80,16,0,31,0.874,21,0 203 | 1,138,82,0,0,40.1,0.236,28,0 204 | 0,108,68,20,0,27.3,0.787,32,0 205 | 2,99,70,16,44,20.4,0.235,27,0 206 | 6,103,72,32,190,37.7,0.324,55,0 207 | 5,111,72,28,0,23.9,0.407,27,0 208 | 8,196,76,29,280,37.5,0.605,57,1 209 | 5,162,104,0,0,37.7,0.151,52,1 210 | 1,96,64,27,87,33.2,0.289,21,0 211 | 7,184,84,33,0,35.5,0.355,41,1 212 | 2,81,60,22,0,27.7,0.29,25,0 213 | 0,147,85,54,0,42.8,0.375,24,0 214 | 7,179,95,31,0,34.2,0.164,60,0 215 | 0,140,65,26,130,42.6,0.431,24,1 216 | 9,112,82,32,175,34.2,0.26,36,1 217 | 12,151,70,40,271,41.8,0.742,38,1 218 | 5,109,62,41,129,35.8,0.514,25,1 219 | 6,125,68,30,120,30,0.464,32,0 220 | 5,85,74,22,0,29,1.224,32,1 221 | 5,112,66,0,0,37.8,0.261,41,1 222 | 0,177,60,29,478,34.6,1.072,21,1 223 | 2,158,90,0,0,31.6,0.805,66,1 224 | 7,119,0,0,0,25.2,0.209,37,0 225 | 7,142,60,33,190,28.8,0.687,61,0 226 | 1,100,66,15,56,23.6,0.666,26,0 227 | 1,87,78,27,32,34.6,0.101,22,0 228 | 0,101,76,0,0,35.7,0.198,26,0 229 | 3,162,52,38,0,37.2,0.652,24,1 230 | 4,197,70,39,744,36.7,2.329,31,0 231 | 0,117,80,31,53,45.2,0.089,24,0 232 | 4,142,86,0,0,44,0.645,22,1 233 | 6,134,80,37,370,46.2,0.238,46,1 234 | 1,79,80,25,37,25.4,0.583,22,0 235 | 4,122,68,0,0,35,0.394,29,0 236 | 3,74,68,28,45,29.7,0.293,23,0 237 | 4,171,72,0,0,43.6,0.479,26,1 238 | 7,181,84,21,192,35.9,0.586,51,1 239 | 0,179,90,27,0,44.1,0.686,23,1 240 | 9,164,84,21,0,30.8,0.831,32,1 241 | 0,104,76,0,0,18.4,0.582,27,0 242 | 
1,91,64,24,0,29.2,0.192,21,0 243 | 4,91,70,32,88,33.1,0.446,22,0 244 | 3,139,54,0,0,25.6,0.402,22,1 245 | 6,119,50,22,176,27.1,1.318,33,1 246 | 2,146,76,35,194,38.2,0.329,29,0 247 | 9,184,85,15,0,30,1.213,49,1 248 | 10,122,68,0,0,31.2,0.258,41,0 249 | 0,165,90,33,680,52.3,0.427,23,0 250 | 9,124,70,33,402,35.4,0.282,34,0 251 | 1,111,86,19,0,30.1,0.143,23,0 252 | 9,106,52,0,0,31.2,0.38,42,0 253 | 2,129,84,0,0,28,0.284,27,0 254 | 2,90,80,14,55,24.4,0.249,24,0 255 | 0,86,68,32,0,35.8,0.238,25,0 256 | 12,92,62,7,258,27.6,0.926,44,1 257 | 1,113,64,35,0,33.6,0.543,21,1 258 | 3,111,56,39,0,30.1,0.557,30,0 259 | 2,114,68,22,0,28.7,0.092,25,0 260 | 1,193,50,16,375,25.9,0.655,24,0 261 | 11,155,76,28,150,33.3,1.353,51,1 262 | 3,191,68,15,130,30.9,0.299,34,0 263 | 3,141,0,0,0,30,0.761,27,1 264 | 4,95,70,32,0,32.1,0.612,24,0 265 | 3,142,80,15,0,32.4,0.2,63,0 266 | 4,123,62,0,0,32,0.226,35,1 267 | 5,96,74,18,67,33.6,0.997,43,0 268 | 0,138,0,0,0,36.3,0.933,25,1 269 | 2,128,64,42,0,40,1.101,24,0 270 | 0,102,52,0,0,25.1,0.078,21,0 271 | 2,146,0,0,0,27.5,0.24,28,1 272 | 10,101,86,37,0,45.6,1.136,38,1 273 | 2,108,62,32,56,25.2,0.128,21,0 274 | 3,122,78,0,0,23,0.254,40,0 275 | 1,71,78,50,45,33.2,0.422,21,0 276 | 13,106,70,0,0,34.2,0.251,52,0 277 | 2,100,70,52,57,40.5,0.677,25,0 278 | 7,106,60,24,0,26.5,0.296,29,1 279 | 0,104,64,23,116,27.8,0.454,23,0 280 | 5,114,74,0,0,24.9,0.744,57,0 281 | 2,108,62,10,278,25.3,0.881,22,0 282 | 0,146,70,0,0,37.9,0.334,28,1 283 | 10,129,76,28,122,35.9,0.28,39,0 284 | 7,133,88,15,155,32.4,0.262,37,0 285 | 7,161,86,0,0,30.4,0.165,47,1 286 | 2,108,80,0,0,27,0.259,52,1 287 | 7,136,74,26,135,26,0.647,51,0 288 | 5,155,84,44,545,38.7,0.619,34,0 289 | 1,119,86,39,220,45.6,0.808,29,1 290 | 4,96,56,17,49,20.8,0.34,26,0 291 | 5,108,72,43,75,36.1,0.263,33,0 292 | 0,78,88,29,40,36.9,0.434,21,0 293 | 0,107,62,30,74,36.6,0.757,25,1 294 | 2,128,78,37,182,43.3,1.224,31,1 295 | 1,128,48,45,194,40.5,0.613,24,1 296 | 0,161,50,0,0,21.9,0.254,65,0 297 | 
6,151,62,31,120,35.5,0.692,28,0 298 | 2,146,70,38,360,28,0.337,29,1 299 | 0,126,84,29,215,30.7,0.52,24,0 300 | 14,100,78,25,184,36.6,0.412,46,1 301 | 8,112,72,0,0,23.6,0.84,58,0 302 | 0,167,0,0,0,32.3,0.839,30,1 303 | 2,144,58,33,135,31.6,0.422,25,1 304 | 5,77,82,41,42,35.8,0.156,35,0 305 | 5,115,98,0,0,52.9,0.209,28,1 306 | 3,150,76,0,0,21,0.207,37,0 307 | 2,120,76,37,105,39.7,0.215,29,0 308 | 10,161,68,23,132,25.5,0.326,47,1 309 | 0,137,68,14,148,24.8,0.143,21,0 310 | 0,128,68,19,180,30.5,1.391,25,1 311 | 2,124,68,28,205,32.9,0.875,30,1 312 | 6,80,66,30,0,26.2,0.313,41,0 313 | 0,106,70,37,148,39.4,0.605,22,0 314 | 2,155,74,17,96,26.6,0.433,27,1 315 | 3,113,50,10,85,29.5,0.626,25,0 316 | 7,109,80,31,0,35.9,1.127,43,1 317 | 2,112,68,22,94,34.1,0.315,26,0 318 | 3,99,80,11,64,19.3,0.284,30,0 319 | 3,182,74,0,0,30.5,0.345,29,1 320 | 3,115,66,39,140,38.1,0.15,28,0 321 | 6,194,78,0,0,23.5,0.129,59,1 322 | 4,129,60,12,231,27.5,0.527,31,0 323 | 3,112,74,30,0,31.6,0.197,25,1 324 | 0,124,70,20,0,27.4,0.254,36,1 325 | 13,152,90,33,29,26.8,0.731,43,1 326 | 2,112,75,32,0,35.7,0.148,21,0 327 | 1,157,72,21,168,25.6,0.123,24,0 328 | 1,122,64,32,156,35.1,0.692,30,1 329 | 10,179,70,0,0,35.1,0.2,37,0 330 | 2,102,86,36,120,45.5,0.127,23,1 331 | 6,105,70,32,68,30.8,0.122,37,0 332 | 8,118,72,19,0,23.1,1.476,46,0 333 | 2,87,58,16,52,32.7,0.166,25,0 334 | 1,180,0,0,0,43.3,0.282,41,1 335 | 12,106,80,0,0,23.6,0.137,44,0 336 | 1,95,60,18,58,23.9,0.26,22,0 337 | 0,165,76,43,255,47.9,0.259,26,0 338 | 0,117,0,0,0,33.8,0.932,44,0 339 | 5,115,76,0,0,31.2,0.343,44,1 340 | 9,152,78,34,171,34.2,0.893,33,1 341 | 7,178,84,0,0,39.9,0.331,41,1 342 | 1,130,70,13,105,25.9,0.472,22,0 343 | 1,95,74,21,73,25.9,0.673,36,0 344 | 1,0,68,35,0,32,0.389,22,0 345 | 5,122,86,0,0,34.7,0.29,33,0 346 | 8,95,72,0,0,36.8,0.485,57,0 347 | 8,126,88,36,108,38.5,0.349,49,0 348 | 1,139,46,19,83,28.7,0.654,22,0 349 | 3,116,0,0,0,23.5,0.187,23,0 350 | 3,99,62,19,74,21.8,0.279,26,0 351 | 5,0,80,32,0,41,0.346,37,1 352 | 
4,92,80,0,0,42.2,0.237,29,0 353 | 4,137,84,0,0,31.2,0.252,30,0 354 | 3,61,82,28,0,34.4,0.243,46,0 355 | 1,90,62,12,43,27.2,0.58,24,0 356 | 3,90,78,0,0,42.7,0.559,21,0 357 | 9,165,88,0,0,30.4,0.302,49,1 358 | 1,125,50,40,167,33.3,0.962,28,1 359 | 13,129,0,30,0,39.9,0.569,44,1 360 | 12,88,74,40,54,35.3,0.378,48,0 361 | 1,196,76,36,249,36.5,0.875,29,1 362 | 5,189,64,33,325,31.2,0.583,29,1 363 | 5,158,70,0,0,29.8,0.207,63,0 364 | 5,103,108,37,0,39.2,0.305,65,0 365 | 4,146,78,0,0,38.5,0.52,67,1 366 | 4,147,74,25,293,34.9,0.385,30,0 367 | 5,99,54,28,83,34,0.499,30,0 368 | 6,124,72,0,0,27.6,0.368,29,1 369 | 0,101,64,17,0,21,0.252,21,0 370 | 3,81,86,16,66,27.5,0.306,22,0 371 | 1,133,102,28,140,32.8,0.234,45,1 372 | 3,173,82,48,465,38.4,2.137,25,1 373 | 0,118,64,23,89,0,1.731,21,0 374 | 0,84,64,22,66,35.8,0.545,21,0 375 | 2,105,58,40,94,34.9,0.225,25,0 376 | 2,122,52,43,158,36.2,0.816,28,0 377 | 12,140,82,43,325,39.2,0.528,58,1 378 | 0,98,82,15,84,25.2,0.299,22,0 379 | 1,87,60,37,75,37.2,0.509,22,0 380 | 4,156,75,0,0,48.3,0.238,32,1 381 | 0,93,100,39,72,43.4,1.021,35,0 382 | 1,107,72,30,82,30.8,0.821,24,0 383 | 0,105,68,22,0,20,0.236,22,0 384 | 1,109,60,8,182,25.4,0.947,21,0 385 | 1,90,62,18,59,25.1,1.268,25,0 386 | 1,125,70,24,110,24.3,0.221,25,0 387 | 1,119,54,13,50,22.3,0.205,24,0 388 | 5,116,74,29,0,32.3,0.66,35,1 389 | 8,105,100,36,0,43.3,0.239,45,1 390 | 5,144,82,26,285,32,0.452,58,1 391 | 3,100,68,23,81,31.6,0.949,28,0 392 | 1,100,66,29,196,32,0.444,42,0 393 | 5,166,76,0,0,45.7,0.34,27,1 394 | 1,131,64,14,415,23.7,0.389,21,0 395 | 4,116,72,12,87,22.1,0.463,37,0 396 | 4,158,78,0,0,32.9,0.803,31,1 397 | 2,127,58,24,275,27.7,1.6,25,0 398 | 3,96,56,34,115,24.7,0.944,39,0 399 | 0,131,66,40,0,34.3,0.196,22,1 400 | 3,82,70,0,0,21.1,0.389,25,0 401 | 3,193,70,31,0,34.9,0.241,25,1 402 | 4,95,64,0,0,32,0.161,31,1 403 | 6,137,61,0,0,24.2,0.151,55,0 404 | 5,136,84,41,88,35,0.286,35,1 405 | 9,72,78,25,0,31.6,0.28,38,0 406 | 5,168,64,0,0,32.9,0.135,41,1 407 | 
2,123,48,32,165,42.1,0.52,26,0 408 | 4,115,72,0,0,28.9,0.376,46,1 409 | 0,101,62,0,0,21.9,0.336,25,0 410 | 8,197,74,0,0,25.9,1.191,39,1 411 | 1,172,68,49,579,42.4,0.702,28,1 412 | 6,102,90,39,0,35.7,0.674,28,0 413 | 1,112,72,30,176,34.4,0.528,25,0 414 | 1,143,84,23,310,42.4,1.076,22,0 415 | 1,143,74,22,61,26.2,0.256,21,0 416 | 0,138,60,35,167,34.6,0.534,21,1 417 | 3,173,84,33,474,35.7,0.258,22,1 418 | 1,97,68,21,0,27.2,1.095,22,0 419 | 4,144,82,32,0,38.5,0.554,37,1 420 | 1,83,68,0,0,18.2,0.624,27,0 421 | 3,129,64,29,115,26.4,0.219,28,1 422 | 1,119,88,41,170,45.3,0.507,26,0 423 | 2,94,68,18,76,26,0.561,21,0 424 | 0,102,64,46,78,40.6,0.496,21,0 425 | 2,115,64,22,0,30.8,0.421,21,0 426 | 8,151,78,32,210,42.9,0.516,36,1 427 | 4,184,78,39,277,37,0.264,31,1 428 | 0,94,0,0,0,0,0.256,25,0 429 | 1,181,64,30,180,34.1,0.328,38,1 430 | 0,135,94,46,145,40.6,0.284,26,0 431 | 1,95,82,25,180,35,0.233,43,1 432 | 2,99,0,0,0,22.2,0.108,23,0 433 | 3,89,74,16,85,30.4,0.551,38,0 434 | 1,80,74,11,60,30,0.527,22,0 435 | 2,139,75,0,0,25.6,0.167,29,0 436 | 1,90,68,8,0,24.5,1.138,36,0 437 | 0,141,0,0,0,42.4,0.205,29,1 438 | 12,140,85,33,0,37.4,0.244,41,0 439 | 5,147,75,0,0,29.9,0.434,28,0 440 | 1,97,70,15,0,18.2,0.147,21,0 441 | 6,107,88,0,0,36.8,0.727,31,0 442 | 0,189,104,25,0,34.3,0.435,41,1 443 | 2,83,66,23,50,32.2,0.497,22,0 444 | 4,117,64,27,120,33.2,0.23,24,0 445 | 8,108,70,0,0,30.5,0.955,33,1 446 | 4,117,62,12,0,29.7,0.38,30,1 447 | 0,180,78,63,14,59.4,2.42,25,1 448 | 1,100,72,12,70,25.3,0.658,28,0 449 | 0,95,80,45,92,36.5,0.33,26,0 450 | 0,104,64,37,64,33.6,0.51,22,1 451 | 0,120,74,18,63,30.5,0.285,26,0 452 | 1,82,64,13,95,21.2,0.415,23,0 453 | 2,134,70,0,0,28.9,0.542,23,1 454 | 0,91,68,32,210,39.9,0.381,25,0 455 | 2,119,0,0,0,19.6,0.832,72,0 456 | 2,100,54,28,105,37.8,0.498,24,0 457 | 14,175,62,30,0,33.6,0.212,38,1 458 | 1,135,54,0,0,26.7,0.687,62,0 459 | 5,86,68,28,71,30.2,0.364,24,0 460 | 10,148,84,48,237,37.6,1.001,51,1 461 | 9,134,74,33,60,25.9,0.46,81,0 462 | 
9,120,72,22,56,20.8,0.733,48,0 463 | 1,71,62,0,0,21.8,0.416,26,0 464 | 8,74,70,40,49,35.3,0.705,39,0 465 | 5,88,78,30,0,27.6,0.258,37,0 466 | 10,115,98,0,0,24,1.022,34,0 467 | 0,124,56,13,105,21.8,0.452,21,0 468 | 0,74,52,10,36,27.8,0.269,22,0 469 | 0,97,64,36,100,36.8,0.6,25,0 470 | 8,120,0,0,0,30,0.183,38,1 471 | 6,154,78,41,140,46.1,0.571,27,0 472 | 1,144,82,40,0,41.3,0.607,28,0 473 | 0,137,70,38,0,33.2,0.17,22,0 474 | 0,119,66,27,0,38.8,0.259,22,0 475 | 7,136,90,0,0,29.9,0.21,50,0 476 | 4,114,64,0,0,28.9,0.126,24,0 477 | 0,137,84,27,0,27.3,0.231,59,0 478 | 2,105,80,45,191,33.7,0.711,29,1 479 | 7,114,76,17,110,23.8,0.466,31,0 480 | 8,126,74,38,75,25.9,0.162,39,0 481 | 4,132,86,31,0,28,0.419,63,0 482 | 3,158,70,30,328,35.5,0.344,35,1 483 | 0,123,88,37,0,35.2,0.197,29,0 484 | 4,85,58,22,49,27.8,0.306,28,0 485 | 0,84,82,31,125,38.2,0.233,23,0 486 | 0,145,0,0,0,44.2,0.63,31,1 487 | 0,135,68,42,250,42.3,0.365,24,1 488 | 1,139,62,41,480,40.7,0.536,21,0 489 | 0,173,78,32,265,46.5,1.159,58,0 490 | 4,99,72,17,0,25.6,0.294,28,0 491 | 8,194,80,0,0,26.1,0.551,67,0 492 | 2,83,65,28,66,36.8,0.629,24,0 493 | 2,89,90,30,0,33.5,0.292,42,0 494 | 4,99,68,38,0,32.8,0.145,33,0 495 | 4,125,70,18,122,28.9,1.144,45,1 496 | 3,80,0,0,0,0,0.174,22,0 497 | 6,166,74,0,0,26.6,0.304,66,0 498 | 5,110,68,0,0,26,0.292,30,0 499 | 2,81,72,15,76,30.1,0.547,25,0 500 | 7,195,70,33,145,25.1,0.163,55,1 501 | 6,154,74,32,193,29.3,0.839,39,0 502 | 2,117,90,19,71,25.2,0.313,21,0 503 | 3,84,72,32,0,37.2,0.267,28,0 504 | 6,0,68,41,0,39,0.727,41,1 505 | 7,94,64,25,79,33.3,0.738,41,0 506 | 3,96,78,39,0,37.3,0.238,40,0 507 | 10,75,82,0,0,33.3,0.263,38,0 508 | 0,180,90,26,90,36.5,0.314,35,1 509 | 1,130,60,23,170,28.6,0.692,21,0 510 | 2,84,50,23,76,30.4,0.968,21,0 511 | 8,120,78,0,0,25,0.409,64,0 512 | 12,84,72,31,0,29.7,0.297,46,1 513 | 0,139,62,17,210,22.1,0.207,21,0 514 | 9,91,68,0,0,24.2,0.2,58,0 515 | 2,91,62,0,0,27.3,0.525,22,0 516 | 3,99,54,19,86,25.6,0.154,24,0 517 | 3,163,70,18,105,31.6,0.268,28,1 518 | 
9,145,88,34,165,30.3,0.771,53,1 519 | 7,125,86,0,0,37.6,0.304,51,0 520 | 13,76,60,0,0,32.8,0.18,41,0 521 | 6,129,90,7,326,19.6,0.582,60,0 522 | 2,68,70,32,66,25,0.187,25,0 523 | 3,124,80,33,130,33.2,0.305,26,0 524 | 6,114,0,0,0,0,0.189,26,0 525 | 9,130,70,0,0,34.2,0.652,45,1 526 | 3,125,58,0,0,31.6,0.151,24,0 527 | 3,87,60,18,0,21.8,0.444,21,0 528 | 1,97,64,19,82,18.2,0.299,21,0 529 | 3,116,74,15,105,26.3,0.107,24,0 530 | 0,117,66,31,188,30.8,0.493,22,0 531 | 0,111,65,0,0,24.6,0.66,31,0 532 | 2,122,60,18,106,29.8,0.717,22,0 533 | 0,107,76,0,0,45.3,0.686,24,0 534 | 1,86,66,52,65,41.3,0.917,29,0 535 | 6,91,0,0,0,29.8,0.501,31,0 536 | 1,77,56,30,56,33.3,1.251,24,0 537 | 4,132,0,0,0,32.9,0.302,23,1 538 | 0,105,90,0,0,29.6,0.197,46,0 539 | 0,57,60,0,0,21.7,0.735,67,0 540 | 0,127,80,37,210,36.3,0.804,23,0 541 | 3,129,92,49,155,36.4,0.968,32,1 542 | 8,100,74,40,215,39.4,0.661,43,1 543 | 3,128,72,25,190,32.4,0.549,27,1 544 | 10,90,85,32,0,34.9,0.825,56,1 545 | 4,84,90,23,56,39.5,0.159,25,0 546 | 1,88,78,29,76,32,0.365,29,0 547 | 8,186,90,35,225,34.5,0.423,37,1 548 | 5,187,76,27,207,43.6,1.034,53,1 549 | 4,131,68,21,166,33.1,0.16,28,0 550 | 1,164,82,43,67,32.8,0.341,50,0 551 | 4,189,110,31,0,28.5,0.68,37,0 552 | 1,116,70,28,0,27.4,0.204,21,0 553 | 3,84,68,30,106,31.9,0.591,25,0 554 | 6,114,88,0,0,27.8,0.247,66,0 555 | 1,88,62,24,44,29.9,0.422,23,0 556 | 1,84,64,23,115,36.9,0.471,28,0 557 | 7,124,70,33,215,25.5,0.161,37,0 558 | 1,97,70,40,0,38.1,0.218,30,0 559 | 8,110,76,0,0,27.8,0.237,58,0 560 | 11,103,68,40,0,46.2,0.126,42,0 561 | 11,85,74,0,0,30.1,0.3,35,0 562 | 6,125,76,0,0,33.8,0.121,54,1 563 | 0,198,66,32,274,41.3,0.502,28,1 564 | 1,87,68,34,77,37.6,0.401,24,0 565 | 6,99,60,19,54,26.9,0.497,32,0 566 | 0,91,80,0,0,32.4,0.601,27,0 567 | 2,95,54,14,88,26.1,0.748,22,0 568 | 1,99,72,30,18,38.6,0.412,21,0 569 | 6,92,62,32,126,32,0.085,46,0 570 | 4,154,72,29,126,31.3,0.338,37,0 571 | 0,121,66,30,165,34.3,0.203,33,1 572 | 3,78,70,0,0,32.5,0.27,39,0 573 | 
2,130,96,0,0,22.6,0.268,21,0 574 | 3,111,58,31,44,29.5,0.43,22,0 575 | 2,98,60,17,120,34.7,0.198,22,0 576 | 1,143,86,30,330,30.1,0.892,23,0 577 | 1,119,44,47,63,35.5,0.28,25,0 578 | 6,108,44,20,130,24,0.813,35,0 579 | 2,118,80,0,0,42.9,0.693,21,1 580 | 10,133,68,0,0,27,0.245,36,0 581 | 2,197,70,99,0,34.7,0.575,62,1 582 | 0,151,90,46,0,42.1,0.371,21,1 583 | 6,109,60,27,0,25,0.206,27,0 584 | 12,121,78,17,0,26.5,0.259,62,0 585 | 8,100,76,0,0,38.7,0.19,42,0 586 | 8,124,76,24,600,28.7,0.687,52,1 587 | 1,93,56,11,0,22.5,0.417,22,0 588 | 8,143,66,0,0,34.9,0.129,41,1 589 | 6,103,66,0,0,24.3,0.249,29,0 590 | 3,176,86,27,156,33.3,1.154,52,1 591 | 0,73,0,0,0,21.1,0.342,25,0 592 | 11,111,84,40,0,46.8,0.925,45,1 593 | 2,112,78,50,140,39.4,0.175,24,0 594 | 3,132,80,0,0,34.4,0.402,44,1 595 | 2,82,52,22,115,28.5,1.699,25,0 596 | 6,123,72,45,230,33.6,0.733,34,0 597 | 0,188,82,14,185,32,0.682,22,1 598 | 0,67,76,0,0,45.3,0.194,46,0 599 | 1,89,24,19,25,27.8,0.559,21,0 600 | 1,173,74,0,0,36.8,0.088,38,1 601 | 1,109,38,18,120,23.1,0.407,26,0 602 | 1,108,88,19,0,27.1,0.4,24,0 603 | 6,96,0,0,0,23.7,0.19,28,0 604 | 1,124,74,36,0,27.8,0.1,30,0 605 | 7,150,78,29,126,35.2,0.692,54,1 606 | 4,183,0,0,0,28.4,0.212,36,1 607 | 1,124,60,32,0,35.8,0.514,21,0 608 | 1,181,78,42,293,40,1.258,22,1 609 | 1,92,62,25,41,19.5,0.482,25,0 610 | 0,152,82,39,272,41.5,0.27,27,0 611 | 1,111,62,13,182,24,0.138,23,0 612 | 3,106,54,21,158,30.9,0.292,24,0 613 | 3,174,58,22,194,32.9,0.593,36,1 614 | 7,168,88,42,321,38.2,0.787,40,1 615 | 6,105,80,28,0,32.5,0.878,26,0 616 | 11,138,74,26,144,36.1,0.557,50,1 617 | 3,106,72,0,0,25.8,0.207,27,0 618 | 6,117,96,0,0,28.7,0.157,30,0 619 | 2,68,62,13,15,20.1,0.257,23,0 620 | 9,112,82,24,0,28.2,1.282,50,1 621 | 0,119,0,0,0,32.4,0.141,24,1 622 | 2,112,86,42,160,38.4,0.246,28,0 623 | 2,92,76,20,0,24.2,1.698,28,0 624 | 6,183,94,0,0,40.8,1.461,45,0 625 | 0,94,70,27,115,43.5,0.347,21,0 626 | 2,108,64,0,0,30.8,0.158,21,0 627 | 4,90,88,47,54,37.7,0.362,29,0 628 | 
0,125,68,0,0,24.7,0.206,21,0 629 | 0,132,78,0,0,32.4,0.393,21,0 630 | 5,128,80,0,0,34.6,0.144,45,0 631 | 4,94,65,22,0,24.7,0.148,21,0 632 | 7,114,64,0,0,27.4,0.732,34,1 633 | 0,102,78,40,90,34.5,0.238,24,0 634 | 2,111,60,0,0,26.2,0.343,23,0 635 | 1,128,82,17,183,27.5,0.115,22,0 636 | 10,92,62,0,0,25.9,0.167,31,0 637 | 13,104,72,0,0,31.2,0.465,38,1 638 | 5,104,74,0,0,28.8,0.153,48,0 639 | 2,94,76,18,66,31.6,0.649,23,0 640 | 7,97,76,32,91,40.9,0.871,32,1 641 | 1,100,74,12,46,19.5,0.149,28,0 642 | 0,102,86,17,105,29.3,0.695,27,0 643 | 4,128,70,0,0,34.3,0.303,24,0 644 | 6,147,80,0,0,29.5,0.178,50,1 645 | 4,90,0,0,0,28,0.61,31,0 646 | 3,103,72,30,152,27.6,0.73,27,0 647 | 2,157,74,35,440,39.4,0.134,30,0 648 | 1,167,74,17,144,23.4,0.447,33,1 649 | 0,179,50,36,159,37.8,0.455,22,1 650 | 11,136,84,35,130,28.3,0.26,42,1 651 | 0,107,60,25,0,26.4,0.133,23,0 652 | 1,91,54,25,100,25.2,0.234,23,0 653 | 1,117,60,23,106,33.8,0.466,27,0 654 | 5,123,74,40,77,34.1,0.269,28,0 655 | 2,120,54,0,0,26.8,0.455,27,0 656 | 1,106,70,28,135,34.2,0.142,22,0 657 | 2,155,52,27,540,38.7,0.24,25,1 658 | 2,101,58,35,90,21.8,0.155,22,0 659 | 1,120,80,48,200,38.9,1.162,41,0 660 | 11,127,106,0,0,39,0.19,51,0 661 | 3,80,82,31,70,34.2,1.292,27,1 662 | 10,162,84,0,0,27.7,0.182,54,0 663 | 1,199,76,43,0,42.9,1.394,22,1 664 | 8,167,106,46,231,37.6,0.165,43,1 665 | 9,145,80,46,130,37.9,0.637,40,1 666 | 6,115,60,39,0,33.7,0.245,40,1 667 | 1,112,80,45,132,34.8,0.217,24,0 668 | 4,145,82,18,0,32.5,0.235,70,1 669 | 10,111,70,27,0,27.5,0.141,40,1 670 | 6,98,58,33,190,34,0.43,43,0 671 | 9,154,78,30,100,30.9,0.164,45,0 672 | 6,165,68,26,168,33.6,0.631,49,0 673 | 1,99,58,10,0,25.4,0.551,21,0 674 | 10,68,106,23,49,35.5,0.285,47,0 675 | 3,123,100,35,240,57.3,0.88,22,0 676 | 8,91,82,0,0,35.6,0.587,68,0 677 | 6,195,70,0,0,30.9,0.328,31,1 678 | 9,156,86,0,0,24.8,0.23,53,1 679 | 0,93,60,0,0,35.3,0.263,25,0 680 | 3,121,52,0,0,36,0.127,25,1 681 | 2,101,58,17,265,24.2,0.614,23,0 682 | 2,56,56,28,45,24.2,0.332,22,0 683 | 
0,162,76,36,0,49.6,0.364,26,1 684 | 0,95,64,39,105,44.6,0.366,22,0 685 | 4,125,80,0,0,32.3,0.536,27,1 686 | 5,136,82,0,0,0,0.64,69,0 687 | 2,129,74,26,205,33.2,0.591,25,0 688 | 3,130,64,0,0,23.1,0.314,22,0 689 | 1,107,50,19,0,28.3,0.181,29,0 690 | 1,140,74,26,180,24.1,0.828,23,0 691 | 1,144,82,46,180,46.1,0.335,46,1 692 | 8,107,80,0,0,24.6,0.856,34,0 693 | 13,158,114,0,0,42.3,0.257,44,1 694 | 2,121,70,32,95,39.1,0.886,23,0 695 | 7,129,68,49,125,38.5,0.439,43,1 696 | 2,90,60,0,0,23.5,0.191,25,0 697 | 7,142,90,24,480,30.4,0.128,43,1 698 | 3,169,74,19,125,29.9,0.268,31,1 699 | 0,99,0,0,0,25,0.253,22,0 700 | 4,127,88,11,155,34.5,0.598,28,0 701 | 4,118,70,0,0,44.5,0.904,26,0 702 | 2,122,76,27,200,35.9,0.483,26,0 703 | 6,125,78,31,0,27.6,0.565,49,1 704 | 1,168,88,29,0,35,0.905,52,1 705 | 2,129,0,0,0,38.5,0.304,41,0 706 | 4,110,76,20,100,28.4,0.118,27,0 707 | 6,80,80,36,0,39.8,0.177,28,0 708 | 10,115,0,0,0,0,0.261,30,1 709 | 2,127,46,21,335,34.4,0.176,22,0 710 | 9,164,78,0,0,32.8,0.148,45,1 711 | 2,93,64,32,160,38,0.674,23,1 712 | 3,158,64,13,387,31.2,0.295,24,0 713 | 5,126,78,27,22,29.6,0.439,40,0 714 | 10,129,62,36,0,41.2,0.441,38,1 715 | 0,134,58,20,291,26.4,0.352,21,0 716 | 3,102,74,0,0,29.5,0.121,32,0 717 | 7,187,50,33,392,33.9,0.826,34,1 718 | 3,173,78,39,185,33.8,0.97,31,1 719 | 10,94,72,18,0,23.1,0.595,56,0 720 | 1,108,60,46,178,35.5,0.415,24,0 721 | 5,97,76,27,0,35.6,0.378,52,1 722 | 4,83,86,19,0,29.3,0.317,34,0 723 | 1,114,66,36,200,38.1,0.289,21,0 724 | 1,149,68,29,127,29.3,0.349,42,1 725 | 5,117,86,30,105,39.1,0.251,42,0 726 | 1,111,94,0,0,32.8,0.265,45,0 727 | 4,112,78,40,0,39.4,0.236,38,0 728 | 1,116,78,29,180,36.1,0.496,25,0 729 | 0,141,84,26,0,32.4,0.433,22,0 730 | 2,175,88,0,0,22.9,0.326,22,0 731 | 2,92,52,0,0,30.1,0.141,22,0 732 | 3,130,78,23,79,28.4,0.323,34,1 733 | 8,120,86,0,0,28.4,0.259,22,1 734 | 2,174,88,37,120,44.5,0.646,24,1 735 | 2,106,56,27,165,29,0.426,22,0 736 | 2,105,75,0,0,23.3,0.56,53,0 737 | 4,95,60,32,0,35.4,0.284,28,0 738 | 
0,126,86,27,120,27.4,0.515,21,0 739 | 8,65,72,23,0,32,0.6,42,0 740 | 2,99,60,17,160,36.6,0.453,21,0 741 | 1,102,74,0,0,39.5,0.293,42,1 742 | 11,120,80,37,150,42.3,0.785,48,1 743 | 3,102,44,20,94,30.8,0.4,26,0 744 | 1,109,58,18,116,28.5,0.219,22,0 745 | 9,140,94,0,0,32.7,0.734,45,1 746 | 13,153,88,37,140,40.6,1.174,39,0 747 | 12,100,84,33,105,30,0.488,46,0 748 | 1,147,94,41,0,49.3,0.358,27,1 749 | 1,81,74,41,57,46.3,1.096,32,0 750 | 3,187,70,22,200,36.4,0.408,36,1 751 | 6,162,62,0,0,24.3,0.178,50,1 752 | 4,136,70,0,0,31.2,1.182,22,1 753 | 1,121,78,39,74,39,0.261,28,0 754 | 3,108,62,24,0,26,0.223,25,0 755 | 0,181,88,44,510,43.3,0.222,26,1 756 | 8,154,78,32,0,32.4,0.443,45,1 757 | 1,128,88,39,110,36.5,1.057,37,1 758 | 7,137,90,41,0,32,0.391,39,0 759 | 0,123,72,0,0,36.3,0.258,52,1 760 | 1,106,76,0,0,37.5,0.197,26,0 761 | 6,190,92,0,0,35.5,0.278,66,1 762 | 2,88,58,26,16,28.4,0.766,22,0 763 | 9,170,74,31,0,44,0.403,43,1 764 | 9,89,62,0,0,22.5,0.142,33,0 765 | 10,101,76,48,180,32.9,0.171,63,0 766 | 2,122,70,27,0,36.8,0.34,27,0 767 | 5,121,72,23,112,26.2,0.245,30,0 768 | 1,126,60,0,0,30.1,0.349,47,1 769 | 1,93,70,31,0,30.4,0.315,23,0 770 | -------------------------------------------------------------------------------- /Chapter 5/Chapter5.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Remove duplicates" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "from pandas import read_csv\n" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 6, 22 | "metadata": {}, 23 | "outputs": [], 24 | "source": [ 25 | "# now load the dataset\n", 26 | "data_frame = read_csv(\"IRIS.csv\", header=None)" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 7, 32 | "metadata": {}, 33 | "outputs": [ 34 | { 35 | "name": "stdout", 36 | 
"output_type": "stream", 37 | "text": [ 38 | "(151, 5)\n" 39 | ] 40 | } 41 | ], 42 | "source": [ 43 | "print(data_frame.shape)" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": 8, 49 | "metadata": {}, 50 | "outputs": [], 51 | "source": [ 52 | "# calculate the duplicates present\n", 53 | "duplicates = data_frame.duplicated()" 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": 9, 59 | "metadata": {}, 60 | "outputs": [ 61 | { 62 | "name": "stdout", 63 | "output_type": "stream", 64 | "text": [ 65 | "True\n", 66 | " 0 1 2 3 4\n", 67 | "35 4.9 3.1 1.5 0.1 Iris-setosa\n", 68 | "38 4.9 3.1 1.5 0.1 Iris-setosa\n", 69 | "143 5.8 2.7 5.1 1.9 Iris-virginica\n" 70 | ] 71 | } 72 | ], 73 | "source": [ 74 | "# output the duplicates if there are any duplicates\n", 75 | "print(duplicates.any())\n", 76 | "# list all duplicate rows\n", 77 | "print(data_frame[duplicates])" 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": 10, 83 | "metadata": {}, 84 | "outputs": [ 85 | { 86 | "name": "stdout", 87 | "output_type": "stream", 88 | "text": [ 89 | "(148, 5)\n" 90 | ] 91 | } 92 | ], 93 | "source": [ 94 | "data_frame.drop_duplicates(inplace=True)\n", 95 | "print(data_frame.shape)" 96 | ] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "metadata": {}, 101 | "source": [ 102 | "## Impute the missing values" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": 1, 108 | "metadata": {}, 109 | "outputs": [ 110 | { 111 | "name": "stdout", 112 | "output_type": "stream", 113 | "text": [ 114 | "[[3. 2. ]\n", 115 | " [6. 6.33333333]\n", 116 | " [7. 6. 
]]\n" 117 | ] 118 | } 119 | ], 120 | "source": [ 121 | "import numpy as np\n", 122 | "from sklearn.impute import SimpleImputer\n", 123 | "impute = SimpleImputer(missing_values=np.nan, strategy='mean')\n", 124 | "impute.fit([[2, 5], [np.nan, 8], [4, 6]])\n", 125 | "SimpleImputer()\n", 126 | "X = [[np.nan, 2], [6, np.nan], [7, 6]]\n", 127 | "print(impute.transform(X))" 128 | ] 129 | }, 130 | { 131 | "cell_type": "code", 132 | "execution_count": null, 133 | "metadata": {}, 134 | "outputs": [], 135 | "source": [ 136 | "# SimpleImputer also works with sparse matrices" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": 2, 142 | "metadata": {}, 143 | "outputs": [ 144 | { 145 | "name": "stdout", 146 | "output_type": "stream", 147 | "text": [ 148 | "[[2.66666667 2. ]\n", 149 | " [6. 1.33333333]\n", 150 | " [7. 6. ]]\n" 151 | ] 152 | } 153 | ], 154 | "source": [ 155 | "import scipy.sparse as sp\n", 156 | "matrix = sp.csc_matrix([[2, 4], [0, -2], [6, 2]])\n", 157 | "impute = SimpleImputer(missing_values=-1, strategy='mean')\n", 158 | "impute.fit(matrix)\n", 159 | "SimpleImputer(missing_values=-1)\n", 160 | "matrix_test = sp.csc_matrix([[-1, 2], [6, -1], [7, 6]])\n", 161 | "print(impute.transform(matrix_test).toarray())" 162 | ] 163 | }, 164 | { 165 | "cell_type": "code", 166 | "execution_count": 4, 167 | "metadata": {}, 168 | "outputs": [ 169 | { 170 | "name": "stdout", 171 | "output_type": "stream", 172 | "text": [ 173 | "[['New York' 'New Delhi']\n", 174 | " ['New York' 'Tokyo']\n", 175 | " ['New York' 'Tokyo']\n", 176 | " ['New York' 'Tokyo']]\n" 177 | ] 178 | } 179 | ], 180 | "source": [ 181 | "import pandas as pd\n", 182 | "data_frame = pd.DataFrame([[\"New York\", \"New Delhi\"],[np.nan, \"Tokyo\"],[\"New York\", np.nan],[\"New York\", \"Tokyo\"]], dtype=\"category\")\n", 183 | "\n", 184 | "impute = SimpleImputer(strategy=\"most_frequent\")\n", 185 | "print(impute.fit_transform(data_frame))\n" 186 | ] 187 | }, 188 | { 189 | "cell_type":
"code", 190 | "execution_count": null, 191 | "metadata": {}, 192 | "outputs": [], 193 | "source": [] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "metadata": {}, 198 | "source": [ 199 | "## Impute missing values using machine learning" 200 | ] 201 | }, 202 | { 203 | "cell_type": "code", 204 | "execution_count": 57, 205 | "metadata": {}, 206 | "outputs": [], 207 | "source": [ 208 | "import numpy as np\n", 209 | "import pandas as pd" 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": 62, 215 | "metadata": {}, 216 | "outputs": [], 217 | "source": [ 218 | "missing_dictionary = {'Variable_A': [200, 190, 90, 149, np.nan],\n", 219 | " 'Variable_B': [400, np.nan, 149, 200, 205],\n", 220 | " 'Variable_C': [200,149, np.nan, 155, 165],\n", 221 | " 'Variable_D': [200, np.nan, 90, 149,100],\n", 222 | " 'Variable_E': [200, 190, 90, 149, np.nan],}" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": 63, 228 | "metadata": {}, 229 | "outputs": [], 230 | "source": [ 231 | "missing_df = pd.DataFrame(missing_dictionary)" 232 | ] 233 | }, 234 | { 235 | "cell_type": "code", 236 | "execution_count": 64, 237 | "metadata": {}, 238 | "outputs": [ 239 | { 240 | "data": { 241 | "text/html": [ 242 | "
" 311 | ], 312 | "text/plain": [ 313 | " Variable_A Variable_B Variable_C Variable_D Variable_E\n", 314 | "0 200.0 400.0 200.0 200.0 200.0\n", 315 | "1 190.0 NaN 149.0 NaN 190.0\n", 316 | "2 90.0 149.0 NaN 90.0 90.0\n", 317 | "3 149.0 200.0 155.0 149.0 149.0\n", 318 | "4 NaN 205.0 165.0 100.0 NaN" 319 | ] 320 | }, 321 | "execution_count": 64, 322 | "metadata": {}, 323 | "output_type": "execute_result" 324 | } 325 | ], 326 | "source": [ 327 | "missing_df" 328 | ] 329 | }, 330 | { 331 | "cell_type": "code", 332 | "execution_count": 67, 333 | "metadata": {}, 334 | "outputs": [], 335 | "source": [ 336 | "from sklearn.impute import KNNImputer" 337 | ] 338 | }, 339 | { 340 | "cell_type": "code", 341 | "execution_count": 69, 342 | "metadata": {}, 343 | "outputs": [], 344 | "source": [ 345 | "missing_imputer = KNNImputer(n_neighbors=2)" 346 | ] 347 | }, 348 | { 349 | "cell_type": "code", 350 | "execution_count": 70, 351 | "metadata": {}, 352 | "outputs": [], 353 | "source": [ 354 | "imputed_df = missing_imputer.fit_transform(missing_df)" 355 | ] 356 | }, 357 | { 358 | "cell_type": "code", 359 | "execution_count": 71, 360 | "metadata": {}, 361 | "outputs": [ 362 | { 363 | "data": { 364 | "text/plain": [ 365 | "array([[200. , 400. , 200. , 200. , 200. ],\n", 366 | " [190. , 302.5, 149. , 150. , 190. ],\n", 367 | " [ 90. , 149. , 160. , 90. , 90. ],\n", 368 | " [149. , 200. , 155. , 149. , 149. ],\n", 369 | " [169.5, 205. , 165. , 100. 
, 169.5]])" 370 | ] 371 | }, 372 | "execution_count": 71, 373 | "metadata": {}, 374 | "output_type": "execute_result" 375 | } 376 | ], 377 | "source": [ 378 | "imputed_df" 379 | ] 380 | }, 381 | { 382 | "cell_type": "code", 383 | "execution_count": null, 384 | "metadata": {}, 385 | "outputs": [], 386 | "source": [] 387 | }, 388 | { 389 | "cell_type": "code", 390 | "execution_count": null, 391 | "metadata": {}, 392 | "outputs": [], 393 | "source": [] 394 | }, 395 | { 396 | "cell_type": "code", 397 | "execution_count": null, 398 | "metadata": {}, 399 | "outputs": [], 400 | "source": [] 401 | }, 402 | { 403 | "cell_type": "markdown", 404 | "metadata": {}, 405 | "source": [ 406 | "## Data Imbalance using SMOTE" 407 | ] 408 | }, 409 | { 410 | "cell_type": "code", 411 | "execution_count": 83, 412 | "metadata": {}, 413 | "outputs": [], 414 | "source": [ 415 | "import pandas as pd\n", 416 | "from imblearn.over_sampling import SMOTE\n", 417 | "\n", 418 | "from imblearn.combine import SMOTETomek" 419 | ] 420 | }, 421 | { 422 | "cell_type": "code", 423 | "execution_count": 111, 424 | "metadata": {}, 425 | "outputs": [], 426 | "source": [ 427 | "# Load the data; X and y are created in the next cell\n", 428 | "credit_card_data_set = pd.read_csv('creditcard.csv')\n" 429 | ] 430 | }, 431 | { 432 | "cell_type": "code", 433 | "execution_count": 127, 434 | "metadata": {}, 435 | "outputs": [], 436 | "source": [ 437 | "X = credit_card_data_set.iloc[:,:-1]\n", 438 | "y = credit_card_data_set.iloc[:,-1].map({1:'Fraud', 0:'No Fraud'})\n", 439 | "\n", 440 | "# Oversample the minority class up to 500 'Fraud' rows\n", 441 | "X_resampled, y_resampled = SMOTE(sampling_strategy={\"Fraud\":500}).fit_resample(X, y)\n", 442 | "X_resampled = pd.DataFrame(X_resampled, columns=X.columns)" 443 | ] 444 | }, 445 | { 446 | "cell_type": "code", 447 | "execution_count": 120, 448 | "metadata": {}, 449 | "outputs": [], 450 | "source": [ 451 | "class_0_original = len(credit_card_data_set[credit_card_data_set.Class==0])" 452 | ] 453 | }, 454 | { 455 | "cell_type":
"code", 456 | "execution_count": 121, 457 | "metadata": {}, 458 | "outputs": [], 459 | "source": [ 460 | "class_1_original = len(credit_card_data_set[credit_card_data_set.Class==1])" 461 | ] 462 | }, 463 | { 464 | "cell_type": "code", 465 | "execution_count": 123, 466 | "metadata": {}, 467 | "outputs": [ 468 | { 469 | "name": "stdout", 470 | "output_type": "stream", 471 | "text": [ 472 | "0.001727485630620034\n" 473 | ] 474 | } 475 | ], 476 | "source": [ 477 | "print(class_1_original/(class_0_original+class_1_original))" 478 | ] 479 | }, 480 | { 481 | "cell_type": "code", 482 | "execution_count": 128, 483 | "metadata": {}, 484 | "outputs": [], 485 | "source": [ 486 | "sampled_0 = len(y_resampled[y_resampled=='No Fraud'])\n" 487 | ] 488 | }, 489 | { 490 | "cell_type": "code", 491 | "execution_count": 129, 492 | "metadata": {}, 493 | "outputs": [], 494 | "source": [ 495 | "sampled_1 = len(y_resampled[y_resampled=='Fraud'])" 496 | ] 497 | }, 498 | { 499 | "cell_type": "code", 500 | "execution_count": 130, 501 | "metadata": {}, 502 | "outputs": [ 503 | { 504 | "name": "stdout", 505 | "output_type": "stream", 506 | "text": [ 507 | "0.5\n" 508 | ] 509 | } 510 | ], 511 | "source": [ 512 | "print(sampled_1/(sampled_0+sampled_1))" 513 | ] 514 | } 515 | ], 516 | "metadata": { 517 | "kernelspec": { 518 | "display_name": "Python 3", 519 | "language": "python", 520 | "name": "python3" 521 | }, 522 | "language_info": { 523 | "codemirror_mode": { 524 | "name": "ipython", 525 | "version": 3 526 | }, 527 | "file_extension": ".py", 528 | "mimetype": "text/x-python", 529 | "name": "python", 530 | "nbconvert_exporter": "python", 531 | "pygments_lexer": "ipython3", 532 | "version": "3.7.4" 533 | } 534 | }, 535 | "nbformat": 4, 536 | "nbformat_minor": 2 537 | } 538 | -------------------------------------------------------------------------------- /Chapter 5/IRIS.csv: -------------------------------------------------------------------------------- 1 | sepal_length,sepal_width,petal_length,petal_width,species 2
| 5.1,3.5,1.4,0.2,Iris-setosa 3 | 4.9,3,1.4,0.2,Iris-setosa 4 | 4.7,3.2,1.3,0.2,Iris-setosa 5 | 4.6,3.1,1.5,0.2,Iris-setosa 6 | 5,3.6,1.4,0.2,Iris-setosa 7 | 5.4,3.9,1.7,0.4,Iris-setosa 8 | 4.6,3.4,1.4,0.3,Iris-setosa 9 | 5,3.4,1.5,0.2,Iris-setosa 10 | 4.4,2.9,1.4,0.2,Iris-setosa 11 | 4.9,3.1,1.5,0.1,Iris-setosa 12 | 5.4,3.7,1.5,0.2,Iris-setosa 13 | 4.8,3.4,1.6,0.2,Iris-setosa 14 | 4.8,3,1.4,0.1,Iris-setosa 15 | 4.3,3,1.1,0.1,Iris-setosa 16 | 5.8,4,1.2,0.2,Iris-setosa 17 | 5.7,4.4,1.5,0.4,Iris-setosa 18 | 5.4,3.9,1.3,0.4,Iris-setosa 19 | 5.1,3.5,1.4,0.3,Iris-setosa 20 | 5.7,3.8,1.7,0.3,Iris-setosa 21 | 5.1,3.8,1.5,0.3,Iris-setosa 22 | 5.4,3.4,1.7,0.2,Iris-setosa 23 | 5.1,3.7,1.5,0.4,Iris-setosa 24 | 4.6,3.6,1,0.2,Iris-setosa 25 | 5.1,3.3,1.7,0.5,Iris-setosa 26 | 4.8,3.4,1.9,0.2,Iris-setosa 27 | 5,3,1.6,0.2,Iris-setosa 28 | 5,3.4,1.6,0.4,Iris-setosa 29 | 5.2,3.5,1.5,0.2,Iris-setosa 30 | 5.2,3.4,1.4,0.2,Iris-setosa 31 | 4.7,3.2,1.6,0.2,Iris-setosa 32 | 4.8,3.1,1.6,0.2,Iris-setosa 33 | 5.4,3.4,1.5,0.4,Iris-setosa 34 | 5.2,4.1,1.5,0.1,Iris-setosa 35 | 5.5,4.2,1.4,0.2,Iris-setosa 36 | 4.9,3.1,1.5,0.1,Iris-setosa 37 | 5,3.2,1.2,0.2,Iris-setosa 38 | 5.5,3.5,1.3,0.2,Iris-setosa 39 | 4.9,3.1,1.5,0.1,Iris-setosa 40 | 4.4,3,1.3,0.2,Iris-setosa 41 | 5.1,3.4,1.5,0.2,Iris-setosa 42 | 5,3.5,1.3,0.3,Iris-setosa 43 | 4.5,2.3,1.3,0.3,Iris-setosa 44 | 4.4,3.2,1.3,0.2,Iris-setosa 45 | 5,3.5,1.6,0.6,Iris-setosa 46 | 5.1,3.8,1.9,0.4,Iris-setosa 47 | 4.8,3,1.4,0.3,Iris-setosa 48 | 5.1,3.8,1.6,0.2,Iris-setosa 49 | 4.6,3.2,1.4,0.2,Iris-setosa 50 | 5.3,3.7,1.5,0.2,Iris-setosa 51 | 5,3.3,1.4,0.2,Iris-setosa 52 | 7,3.2,4.7,1.4,Iris-versicolor 53 | 6.4,3.2,4.5,1.5,Iris-versicolor 54 | 6.9,3.1,4.9,1.5,Iris-versicolor 55 | 5.5,2.3,4,1.3,Iris-versicolor 56 | 6.5,2.8,4.6,1.5,Iris-versicolor 57 | 5.7,2.8,4.5,1.3,Iris-versicolor 58 | 6.3,3.3,4.7,1.6,Iris-versicolor 59 | 4.9,2.4,3.3,1,Iris-versicolor 60 | 6.6,2.9,4.6,1.3,Iris-versicolor 61 | 5.2,2.7,3.9,1.4,Iris-versicolor 62 | 
5,2,3.5,1,Iris-versicolor 63 | 5.9,3,4.2,1.5,Iris-versicolor 64 | 6,2.2,4,1,Iris-versicolor 65 | 6.1,2.9,4.7,1.4,Iris-versicolor 66 | 5.6,2.9,3.6,1.3,Iris-versicolor 67 | 6.7,3.1,4.4,1.4,Iris-versicolor 68 | 5.6,3,4.5,1.5,Iris-versicolor 69 | 5.8,2.7,4.1,1,Iris-versicolor 70 | 6.2,2.2,4.5,1.5,Iris-versicolor 71 | 5.6,2.5,3.9,1.1,Iris-versicolor 72 | 5.9,3.2,4.8,1.8,Iris-versicolor 73 | 6.1,2.8,4,1.3,Iris-versicolor 74 | 6.3,2.5,4.9,1.5,Iris-versicolor 75 | 6.1,2.8,4.7,1.2,Iris-versicolor 76 | 6.4,2.9,4.3,1.3,Iris-versicolor 77 | 6.6,3,4.4,1.4,Iris-versicolor 78 | 6.8,2.8,4.8,1.4,Iris-versicolor 79 | 6.7,3,5,1.7,Iris-versicolor 80 | 6,2.9,4.5,1.5,Iris-versicolor 81 | 5.7,2.6,3.5,1,Iris-versicolor 82 | 5.5,2.4,3.8,1.1,Iris-versicolor 83 | 5.5,2.4,3.7,1,Iris-versicolor 84 | 5.8,2.7,3.9,1.2,Iris-versicolor 85 | 6,2.7,5.1,1.6,Iris-versicolor 86 | 5.4,3,4.5,1.5,Iris-versicolor 87 | 6,3.4,4.5,1.6,Iris-versicolor 88 | 6.7,3.1,4.7,1.5,Iris-versicolor 89 | 6.3,2.3,4.4,1.3,Iris-versicolor 90 | 5.6,3,4.1,1.3,Iris-versicolor 91 | 5.5,2.5,4,1.3,Iris-versicolor 92 | 5.5,2.6,4.4,1.2,Iris-versicolor 93 | 6.1,3,4.6,1.4,Iris-versicolor 94 | 5.8,2.6,4,1.2,Iris-versicolor 95 | 5,2.3,3.3,1,Iris-versicolor 96 | 5.6,2.7,4.2,1.3,Iris-versicolor 97 | 5.7,3,4.2,1.2,Iris-versicolor 98 | 5.7,2.9,4.2,1.3,Iris-versicolor 99 | 6.2,2.9,4.3,1.3,Iris-versicolor 100 | 5.1,2.5,3,1.1,Iris-versicolor 101 | 5.7,2.8,4.1,1.3,Iris-versicolor 102 | 6.3,3.3,6,2.5,Iris-virginica 103 | 5.8,2.7,5.1,1.9,Iris-virginica 104 | 7.1,3,5.9,2.1,Iris-virginica 105 | 6.3,2.9,5.6,1.8,Iris-virginica 106 | 6.5,3,5.8,2.2,Iris-virginica 107 | 7.6,3,6.6,2.1,Iris-virginica 108 | 4.9,2.5,4.5,1.7,Iris-virginica 109 | 7.3,2.9,6.3,1.8,Iris-virginica 110 | 6.7,2.5,5.8,1.8,Iris-virginica 111 | 7.2,3.6,6.1,2.5,Iris-virginica 112 | 6.5,3.2,5.1,2,Iris-virginica 113 | 6.4,2.7,5.3,1.9,Iris-virginica 114 | 6.8,3,5.5,2.1,Iris-virginica 115 | 5.7,2.5,5,2,Iris-virginica 116 | 5.8,2.8,5.1,2.4,Iris-virginica 117 | 6.4,3.2,5.3,2.3,Iris-virginica 
118 | 6.5,3,5.5,1.8,Iris-virginica 119 | 7.7,3.8,6.7,2.2,Iris-virginica 120 | 7.7,2.6,6.9,2.3,Iris-virginica 121 | 6,2.2,5,1.5,Iris-virginica 122 | 6.9,3.2,5.7,2.3,Iris-virginica 123 | 5.6,2.8,4.9,2,Iris-virginica 124 | 7.7,2.8,6.7,2,Iris-virginica 125 | 6.3,2.7,4.9,1.8,Iris-virginica 126 | 6.7,3.3,5.7,2.1,Iris-virginica 127 | 7.2,3.2,6,1.8,Iris-virginica 128 | 6.2,2.8,4.8,1.8,Iris-virginica 129 | 6.1,3,4.9,1.8,Iris-virginica 130 | 6.4,2.8,5.6,2.1,Iris-virginica 131 | 7.2,3,5.8,1.6,Iris-virginica 132 | 7.4,2.8,6.1,1.9,Iris-virginica 133 | 7.9,3.8,6.4,2,Iris-virginica 134 | 6.4,2.8,5.6,2.2,Iris-virginica 135 | 6.3,2.8,5.1,1.5,Iris-virginica 136 | 6.1,2.6,5.6,1.4,Iris-virginica 137 | 7.7,3,6.1,2.3,Iris-virginica 138 | 6.3,3.4,5.6,2.4,Iris-virginica 139 | 6.4,3.1,5.5,1.8,Iris-virginica 140 | 6,3,4.8,1.8,Iris-virginica 141 | 6.9,3.1,5.4,2.1,Iris-virginica 142 | 6.7,3.1,5.6,2.4,Iris-virginica 143 | 6.9,3.1,5.1,2.3,Iris-virginica 144 | 5.8,2.7,5.1,1.9,Iris-virginica 145 | 6.8,3.2,5.9,2.3,Iris-virginica 146 | 6.7,3.3,5.7,2.5,Iris-virginica 147 | 6.7,3,5.2,2.3,Iris-virginica 148 | 6.3,2.5,5,1.9,Iris-virginica 149 | 6.5,3,5.2,2,Iris-virginica 150 | 6.2,3.4,5.4,2.3,Iris-virginica 151 | 5.9,3,5.1,1.8,Iris-virginica 152 | -------------------------------------------------------------------------------- /Chapter 5/ReadMe: -------------------------------------------------------------------------------- 1 | 2 | 3 | The last chapter of the book. All the Python Jupyter notebooks and the datasets are committed here. All the code runs in Jupyter Notebook. Happy coding. 4 | -------------------------------------------------------------------------------- /Chapter2/ReadMe: -------------------------------------------------------------------------------- 1 | 2 | The second chapter of the book. All the Python Jupyter notebooks and the datasets are committed here. All the code runs in Jupyter Notebook. Happy coding.
3 | -------------------------------------------------------------------------------- /Chapter2/auto-mpg.csv: -------------------------------------------------------------------------------- 1 | mpg,cylinders,displacement,horsepower,weight,acceleration,model year,origin,car name 2 | 18,8,307,130,3504,12,70,1,chevrolet chevelle malibu 3 | 15,8,350,165,3693,11.5,70,1,buick skylark 320 4 | 18,8,318,150,3436,11,70,1,plymouth satellite 5 | 16,8,304,150,3433,12,70,1,amc rebel sst 6 | 17,8,302,140,3449,10.5,70,1,ford torino 7 | 15,8,429,198,4341,10,70,1,ford galaxie 500 8 | 14,8,454,220,4354,9,70,1,chevrolet impala 9 | 14,8,440,215,4312,8.5,70,1,plymouth fury iii 10 | 14,8,455,225,4425,10,70,1,pontiac catalina 11 | 15,8,390,190,3850,8.5,70,1,amc ambassador dpl 12 | 15,8,383,170,3563,10,70,1,dodge challenger se 13 | 14,8,340,160,3609,8,70,1,plymouth 'cuda 340 14 | 15,8,400,150,3761,9.5,70,1,chevrolet monte carlo 15 | 14,8,455,225,3086,10,70,1,buick estate wagon (sw) 16 | 24,4,113,95,2372,15,70,3,toyota corona mark ii 17 | 22,6,198,95,2833,15.5,70,1,plymouth duster 18 | 18,6,199,97,2774,15.5,70,1,amc hornet 19 | 21,6,200,85,2587,16,70,1,ford maverick 20 | 27,4,97,88,2130,14.5,70,3,datsun pl510 21 | 26,4,97,46,1835,20.5,70,2,volkswagen 1131 deluxe sedan 22 | 25,4,110,87,2672,17.5,70,2,peugeot 504 23 | 24,4,107,90,2430,14.5,70,2,audi 100 ls 24 | 25,4,104,95,2375,17.5,70,2,saab 99e 25 | 26,4,121,113,2234,12.5,70,2,bmw 2002 26 | 21,6,199,90,2648,15,70,1,amc gremlin 27 | 10,8,360,215,4615,14,70,1,ford f250 28 | 10,8,307,200,4376,15,70,1,chevy c20 29 | 11,8,318,210,4382,13.5,70,1,dodge d200 30 | 9,8,304,193,4732,18.5,70,1,hi 1200d 31 | 27,4,97,88,2130,14.5,71,3,datsun pl510 32 | 28,4,140,90,2264,15.5,71,1,chevrolet vega 2300 33 | 25,4,113,95,2228,14,71,3,toyota corona 34 | 25,4,98,?,2046,19,71,1,ford pinto 35 | 19,6,232,100,2634,13,71,1,amc gremlin 36 | 16,6,225,105,3439,15.5,71,1,plymouth satellite custom 37 | 17,6,250,100,3329,15.5,71,1,chevrolet chevelle malibu 38 | 
19,6,250,88,3302,15.5,71,1,ford torino 500 39 | 18,6,232,100,3288,15.5,71,1,amc matador 40 | 14,8,350,165,4209,12,71,1,chevrolet impala 41 | 14,8,400,175,4464,11.5,71,1,pontiac catalina brougham 42 | 14,8,351,153,4154,13.5,71,1,ford galaxie 500 43 | 14,8,318,150,4096,13,71,1,plymouth fury iii 44 | 12,8,383,180,4955,11.5,71,1,dodge monaco (sw) 45 | 13,8,400,170,4746,12,71,1,ford country squire (sw) 46 | 13,8,400,175,5140,12,71,1,pontiac safari (sw) 47 | 18,6,258,110,2962,13.5,71,1,amc hornet sportabout (sw) 48 | 22,4,140,72,2408,19,71,1,chevrolet vega (sw) 49 | 19,6,250,100,3282,15,71,1,pontiac firebird 50 | 18,6,250,88,3139,14.5,71,1,ford mustang 51 | 23,4,122,86,2220,14,71,1,mercury capri 2000 52 | 28,4,116,90,2123,14,71,2,opel 1900 53 | 30,4,79,70,2074,19.5,71,2,peugeot 304 54 | 30,4,88,76,2065,14.5,71,2,fiat 124b 55 | 31,4,71,65,1773,19,71,3,toyota corolla 1200 56 | 35,4,72,69,1613,18,71,3,datsun 1200 57 | 27,4,97,60,1834,19,71,2,volkswagen model 111 58 | 26,4,91,70,1955,20.5,71,1,plymouth cricket 59 | 24,4,113,95,2278,15.5,72,3,toyota corona hardtop 60 | 25,4,97.5,80,2126,17,72,1,dodge colt hardtop 61 | 23,4,97,54,2254,23.5,72,2,volkswagen type 3 62 | 20,4,140,90,2408,19.5,72,1,chevrolet vega 63 | 21,4,122,86,2226,16.5,72,1,ford pinto runabout 64 | 13,8,350,165,4274,12,72,1,chevrolet impala 65 | 14,8,400,175,4385,12,72,1,pontiac catalina 66 | 15,8,318,150,4135,13.5,72,1,plymouth fury iii 67 | 14,8,351,153,4129,13,72,1,ford galaxie 500 68 | 17,8,304,150,3672,11.5,72,1,amc ambassador sst 69 | 11,8,429,208,4633,11,72,1,mercury marquis 70 | 13,8,350,155,4502,13.5,72,1,buick lesabre custom 71 | 12,8,350,160,4456,13.5,72,1,oldsmobile delta 88 royale 72 | 13,8,400,190,4422,12.5,72,1,chrysler newport royal 73 | 19,3,70,97,2330,13.5,72,3,mazda rx2 coupe 74 | 15,8,304,150,3892,12.5,72,1,amc matador (sw) 75 | 13,8,307,130,4098,14,72,1,chevrolet chevelle concours (sw) 76 | 13,8,302,140,4294,16,72,1,ford gran torino (sw) 77 | 14,8,318,150,4077,14,72,1,plymouth satellite 
custom (sw) 78 | 18,4,121,112,2933,14.5,72,2,volvo 145e (sw) 79 | 22,4,121,76,2511,18,72,2,volkswagen 411 (sw) 80 | 21,4,120,87,2979,19.5,72,2,peugeot 504 (sw) 81 | 26,4,96,69,2189,18,72,2,renault 12 (sw) 82 | 22,4,122,86,2395,16,72,1,ford pinto (sw) 83 | 28,4,97,92,2288,17,72,3,datsun 510 (sw) 84 | 23,4,120,97,2506,14.5,72,3,toyouta corona mark ii (sw) 85 | 28,4,98,80,2164,15,72,1,dodge colt (sw) 86 | 27,4,97,88,2100,16.5,72,3,toyota corolla 1600 (sw) 87 | 13,8,350,175,4100,13,73,1,buick century 350 88 | 14,8,304,150,3672,11.5,73,1,amc matador 89 | 13,8,350,145,3988,13,73,1,chevrolet malibu 90 | 14,8,302,137,4042,14.5,73,1,ford gran torino 91 | 15,8,318,150,3777,12.5,73,1,dodge coronet custom 92 | 12,8,429,198,4952,11.5,73,1,mercury marquis brougham 93 | 13,8,400,150,4464,12,73,1,chevrolet caprice classic 94 | 13,8,351,158,4363,13,73,1,ford ltd 95 | 14,8,318,150,4237,14.5,73,1,plymouth fury gran sedan 96 | 13,8,440,215,4735,11,73,1,chrysler new yorker brougham 97 | 12,8,455,225,4951,11,73,1,buick electra 225 custom 98 | 13,8,360,175,3821,11,73,1,amc ambassador brougham 99 | 18,6,225,105,3121,16.5,73,1,plymouth valiant 100 | 16,6,250,100,3278,18,73,1,chevrolet nova custom 101 | 18,6,232,100,2945,16,73,1,amc hornet 102 | 18,6,250,88,3021,16.5,73,1,ford maverick 103 | 23,6,198,95,2904,16,73,1,plymouth duster 104 | 26,4,97,46,1950,21,73,2,volkswagen super beetle 105 | 11,8,400,150,4997,14,73,1,chevrolet impala 106 | 12,8,400,167,4906,12.5,73,1,ford country 107 | 13,8,360,170,4654,13,73,1,plymouth custom suburb 108 | 12,8,350,180,4499,12.5,73,1,oldsmobile vista cruiser 109 | 18,6,232,100,2789,15,73,1,amc gremlin 110 | 20,4,97,88,2279,19,73,3,toyota carina 111 | 21,4,140,72,2401,19.5,73,1,chevrolet vega 112 | 22,4,108,94,2379,16.5,73,3,datsun 610 113 | 18,3,70,90,2124,13.5,73,3,maxda rx3 114 | 19,4,122,85,2310,18.5,73,1,ford pinto 115 | 21,6,155,107,2472,14,73,1,mercury capri v6 116 | 26,4,98,90,2265,15.5,73,2,fiat 124 sport coupe 117 | 
15,8,350,145,4082,13,73,1,chevrolet monte carlo s 118 | 16,8,400,230,4278,9.5,73,1,pontiac grand prix 119 | 29,4,68,49,1867,19.5,73,2,fiat 128 120 | 24,4,116,75,2158,15.5,73,2,opel manta 121 | 20,4,114,91,2582,14,73,2,audi 100ls 122 | 19,4,121,112,2868,15.5,73,2,volvo 144ea 123 | 15,8,318,150,3399,11,73,1,dodge dart custom 124 | 24,4,121,110,2660,14,73,2,saab 99le 125 | 20,6,156,122,2807,13.5,73,3,toyota mark ii 126 | 11,8,350,180,3664,11,73,1,oldsmobile omega 127 | 20,6,198,95,3102,16.5,74,1,plymouth duster 128 | 21,6,200,?,2875,17,74,1,ford maverick 129 | 19,6,232,100,2901,16,74,1,amc hornet 130 | 15,6,250,100,3336,17,74,1,chevrolet nova 131 | 31,4,79,67,1950,19,74,3,datsun b210 132 | 26,4,122,80,2451,16.5,74,1,ford pinto 133 | 32,4,71,65,1836,21,74,3,toyota corolla 1200 134 | 25,4,140,75,2542,17,74,1,chevrolet vega 135 | 16,6,250,100,3781,17,74,1,chevrolet chevelle malibu classic 136 | 16,6,258,110,3632,18,74,1,amc matador 137 | 18,6,225,105,3613,16.5,74,1,plymouth satellite sebring 138 | 16,8,302,140,4141,14,74,1,ford gran torino 139 | 13,8,350,150,4699,14.5,74,1,buick century luxus (sw) 140 | 14,8,318,150,4457,13.5,74,1,dodge coronet custom (sw) 141 | 14,8,302,140,4638,16,74,1,ford gran torino (sw) 142 | 14,8,304,150,4257,15.5,74,1,amc matador (sw) 143 | 29,4,98,83,2219,16.5,74,2,audi fox 144 | 26,4,79,67,1963,15.5,74,2,volkswagen dasher 145 | 26,4,97,78,2300,14.5,74,2,opel manta 146 | 31,4,76,52,1649,16.5,74,3,toyota corona 147 | 32,4,83,61,2003,19,74,3,datsun 710 148 | 28,4,90,75,2125,14.5,74,1,dodge colt 149 | 24,4,90,75,2108,15.5,74,2,fiat 128 150 | 26,4,116,75,2246,14,74,2,fiat 124 tc 151 | 24,4,120,97,2489,15,74,3,honda civic 152 | 26,4,108,93,2391,15.5,74,3,subaru 153 | 31,4,79,67,2000,16,74,2,fiat x1.9 154 | 19,6,225,95,3264,16,75,1,plymouth valiant custom 155 | 18,6,250,105,3459,16,75,1,chevrolet nova 156 | 15,6,250,72,3432,21,75,1,mercury monarch 157 | 15,6,250,72,3158,19.5,75,1,ford maverick 158 | 16,8,400,170,4668,11.5,75,1,pontiac catalina 159 | 
15,8,350,145,4440,14,75,1,chevrolet bel air 160 | 16,8,318,150,4498,14.5,75,1,plymouth grand fury 161 | 14,8,351,148,4657,13.5,75,1,ford ltd 162 | 17,6,231,110,3907,21,75,1,buick century 163 | 16,6,250,105,3897,18.5,75,1,chevroelt chevelle malibu 164 | 15,6,258,110,3730,19,75,1,amc matador 165 | 18,6,225,95,3785,19,75,1,plymouth fury 166 | 21,6,231,110,3039,15,75,1,buick skyhawk 167 | 20,8,262,110,3221,13.5,75,1,chevrolet monza 2+2 168 | 13,8,302,129,3169,12,75,1,ford mustang ii 169 | 29,4,97,75,2171,16,75,3,toyota corolla 170 | 23,4,140,83,2639,17,75,1,ford pinto 171 | 20,6,232,100,2914,16,75,1,amc gremlin 172 | 23,4,140,78,2592,18.5,75,1,pontiac astro 173 | 24,4,134,96,2702,13.5,75,3,toyota corona 174 | 25,4,90,71,2223,16.5,75,2,volkswagen dasher 175 | 24,4,119,97,2545,17,75,3,datsun 710 176 | 18,6,171,97,2984,14.5,75,1,ford pinto 177 | 29,4,90,70,1937,14,75,2,volkswagen rabbit 178 | 19,6,232,90,3211,17,75,1,amc pacer 179 | 23,4,115,95,2694,15,75,2,audi 100ls 180 | 23,4,120,88,2957,17,75,2,peugeot 504 181 | 22,4,121,98,2945,14.5,75,2,volvo 244dl 182 | 25,4,121,115,2671,13.5,75,2,saab 99le 183 | 33,4,91,53,1795,17.5,75,3,honda civic cvcc 184 | 28,4,107,86,2464,15.5,76,2,fiat 131 185 | 25,4,116,81,2220,16.9,76,2,opel 1900 186 | 25,4,140,92,2572,14.9,76,1,capri ii 187 | 26,4,98,79,2255,17.7,76,1,dodge colt 188 | 27,4,101,83,2202,15.3,76,2,renault 12tl 189 | 17.5,8,305,140,4215,13,76,1,chevrolet chevelle malibu classic 190 | 16,8,318,150,4190,13,76,1,dodge coronet brougham 191 | 15.5,8,304,120,3962,13.9,76,1,amc matador 192 | 14.5,8,351,152,4215,12.8,76,1,ford gran torino 193 | 22,6,225,100,3233,15.4,76,1,plymouth valiant 194 | 22,6,250,105,3353,14.5,76,1,chevrolet nova 195 | 24,6,200,81,3012,17.6,76,1,ford maverick 196 | 22.5,6,232,90,3085,17.6,76,1,amc hornet 197 | 29,4,85,52,2035,22.2,76,1,chevrolet chevette 198 | 24.5,4,98,60,2164,22.1,76,1,chevrolet woody 199 | 29,4,90,70,1937,14.2,76,2,vw rabbit 200 | 33,4,91,53,1795,17.4,76,3,honda civic 201 | 
20,6,225,100,3651,17.7,76,1,dodge aspen se 202 | 18,6,250,78,3574,21,76,1,ford granada ghia 203 | 18.5,6,250,110,3645,16.2,76,1,pontiac ventura sj 204 | 17.5,6,258,95,3193,17.8,76,1,amc pacer d/l 205 | 29.5,4,97,71,1825,12.2,76,2,volkswagen rabbit 206 | 32,4,85,70,1990,17,76,3,datsun b-210 207 | 28,4,97,75,2155,16.4,76,3,toyota corolla 208 | 26.5,4,140,72,2565,13.6,76,1,ford pinto 209 | 20,4,130,102,3150,15.7,76,2,volvo 245 210 | 13,8,318,150,3940,13.2,76,1,plymouth volare premier v8 211 | 19,4,120,88,3270,21.9,76,2,peugeot 504 212 | 19,6,156,108,2930,15.5,76,3,toyota mark ii 213 | 16.5,6,168,120,3820,16.7,76,2,mercedes-benz 280s 214 | 16.5,8,350,180,4380,12.1,76,1,cadillac seville 215 | 13,8,350,145,4055,12,76,1,chevy c10 216 | 13,8,302,130,3870,15,76,1,ford f108 217 | 13,8,318,150,3755,14,76,1,dodge d100 218 | 31.5,4,98,68,2045,18.5,77,3,honda accord cvcc 219 | 30,4,111,80,2155,14.8,77,1,buick opel isuzu deluxe 220 | 36,4,79,58,1825,18.6,77,2,renault 5 gtl 221 | 25.5,4,122,96,2300,15.5,77,1,plymouth arrow gs 222 | 33.5,4,85,70,1945,16.8,77,3,datsun f-10 hatchback 223 | 17.5,8,305,145,3880,12.5,77,1,chevrolet caprice classic 224 | 17,8,260,110,4060,19,77,1,oldsmobile cutlass supreme 225 | 15.5,8,318,145,4140,13.7,77,1,dodge monaco brougham 226 | 15,8,302,130,4295,14.9,77,1,mercury cougar brougham 227 | 17.5,6,250,110,3520,16.4,77,1,chevrolet concours 228 | 20.5,6,231,105,3425,16.9,77,1,buick skylark 229 | 19,6,225,100,3630,17.7,77,1,plymouth volare custom 230 | 18.5,6,250,98,3525,19,77,1,ford granada 231 | 16,8,400,180,4220,11.1,77,1,pontiac grand prix lj 232 | 15.5,8,350,170,4165,11.4,77,1,chevrolet monte carlo landau 233 | 15.5,8,400,190,4325,12.2,77,1,chrysler cordoba 234 | 16,8,351,149,4335,14.5,77,1,ford thunderbird 235 | 29,4,97,78,1940,14.5,77,2,volkswagen rabbit custom 236 | 24.5,4,151,88,2740,16,77,1,pontiac sunbird coupe 237 | 26,4,97,75,2265,18.2,77,3,toyota corolla liftback 238 | 25.5,4,140,89,2755,15.8,77,1,ford mustang ii 2+2 239 | 
30.5,4,98,63,2051,17,77,1,chevrolet chevette 240 | 33.5,4,98,83,2075,15.9,77,1,dodge colt m/m 241 | 30,4,97,67,1985,16.4,77,3,subaru dl 242 | 30.5,4,97,78,2190,14.1,77,2,volkswagen dasher 243 | 22,6,146,97,2815,14.5,77,3,datsun 810 244 | 21.5,4,121,110,2600,12.8,77,2,bmw 320i 245 | 21.5,3,80,110,2720,13.5,77,3,mazda rx-4 246 | 43.1,4,90,48,1985,21.5,78,2,volkswagen rabbit custom diesel 247 | 36.1,4,98,66,1800,14.4,78,1,ford fiesta 248 | 32.8,4,78,52,1985,19.4,78,3,mazda glc deluxe 249 | 39.4,4,85,70,2070,18.6,78,3,datsun b210 gx 250 | 36.1,4,91,60,1800,16.4,78,3,honda civic cvcc 251 | 19.9,8,260,110,3365,15.5,78,1,oldsmobile cutlass salon brougham 252 | 19.4,8,318,140,3735,13.2,78,1,dodge diplomat 253 | 20.2,8,302,139,3570,12.8,78,1,mercury monarch ghia 254 | 19.2,6,231,105,3535,19.2,78,1,pontiac phoenix lj 255 | 20.5,6,200,95,3155,18.2,78,1,chevrolet malibu 256 | 20.2,6,200,85,2965,15.8,78,1,ford fairmont (auto) 257 | 25.1,4,140,88,2720,15.4,78,1,ford fairmont (man) 258 | 20.5,6,225,100,3430,17.2,78,1,plymouth volare 259 | 19.4,6,232,90,3210,17.2,78,1,amc concord 260 | 20.6,6,231,105,3380,15.8,78,1,buick century special 261 | 20.8,6,200,85,3070,16.7,78,1,mercury zephyr 262 | 18.6,6,225,110,3620,18.7,78,1,dodge aspen 263 | 18.1,6,258,120,3410,15.1,78,1,amc concord d/l 264 | 19.2,8,305,145,3425,13.2,78,1,chevrolet monte carlo landau 265 | 17.7,6,231,165,3445,13.4,78,1,buick regal sport coupe (turbo) 266 | 18.1,8,302,139,3205,11.2,78,1,ford futura 267 | 17.5,8,318,140,4080,13.7,78,1,dodge magnum xe 268 | 30,4,98,68,2155,16.5,78,1,chevrolet chevette 269 | 27.5,4,134,95,2560,14.2,78,3,toyota corona 270 | 27.2,4,119,97,2300,14.7,78,3,datsun 510 271 | 30.9,4,105,75,2230,14.5,78,1,dodge omni 272 | 21.1,4,134,95,2515,14.8,78,3,toyota celica gt liftback 273 | 23.2,4,156,105,2745,16.7,78,1,plymouth sapporo 274 | 23.8,4,151,85,2855,17.6,78,1,oldsmobile starfire sx 275 | 23.9,4,119,97,2405,14.9,78,3,datsun 200-sx 276 | 20.3,5,131,103,2830,15.9,78,2,audi 5000 277 | 
17,6,163,125,3140,13.6,78,2,volvo 264gl 278 | 21.6,4,121,115,2795,15.7,78,2,saab 99gle 279 | 16.2,6,163,133,3410,15.8,78,2,peugeot 604sl 280 | 31.5,4,89,71,1990,14.9,78,2,volkswagen scirocco 281 | 29.5,4,98,68,2135,16.6,78,3,honda accord lx 282 | 21.5,6,231,115,3245,15.4,79,1,pontiac lemans v6 283 | 19.8,6,200,85,2990,18.2,79,1,mercury zephyr 6 284 | 22.3,4,140,88,2890,17.3,79,1,ford fairmont 4 285 | 20.2,6,232,90,3265,18.2,79,1,amc concord dl 6 286 | 20.6,6,225,110,3360,16.6,79,1,dodge aspen 6 287 | 17,8,305,130,3840,15.4,79,1,chevrolet caprice classic 288 | 17.6,8,302,129,3725,13.4,79,1,ford ltd landau 289 | 16.5,8,351,138,3955,13.2,79,1,mercury grand marquis 290 | 18.2,8,318,135,3830,15.2,79,1,dodge st. regis 291 | 16.9,8,350,155,4360,14.9,79,1,buick estate wagon (sw) 292 | 15.5,8,351,142,4054,14.3,79,1,ford country squire (sw) 293 | 19.2,8,267,125,3605,15,79,1,chevrolet malibu classic (sw) 294 | 18.5,8,360,150,3940,13,79,1,chrysler lebaron town @ country (sw) 295 | 31.9,4,89,71,1925,14,79,2,vw rabbit custom 296 | 34.1,4,86,65,1975,15.2,79,3,maxda glc deluxe 297 | 35.7,4,98,80,1915,14.4,79,1,dodge colt hatchback custom 298 | 27.4,4,121,80,2670,15,79,1,amc spirit dl 299 | 25.4,5,183,77,3530,20.1,79,2,mercedes benz 300d 300 | 23,8,350,125,3900,17.4,79,1,cadillac eldorado 301 | 27.2,4,141,71,3190,24.8,79,2,peugeot 504 302 | 23.9,8,260,90,3420,22.2,79,1,oldsmobile cutlass salon brougham 303 | 34.2,4,105,70,2200,13.2,79,1,plymouth horizon 304 | 34.5,4,105,70,2150,14.9,79,1,plymouth horizon tc3 305 | 31.8,4,85,65,2020,19.2,79,3,datsun 210 306 | 37.3,4,91,69,2130,14.7,79,2,fiat strada custom 307 | 28.4,4,151,90,2670,16,79,1,buick skylark limited 308 | 28.8,6,173,115,2595,11.3,79,1,chevrolet citation 309 | 26.8,6,173,115,2700,12.9,79,1,oldsmobile omega brougham 310 | 33.5,4,151,90,2556,13.2,79,1,pontiac phoenix 311 | 41.5,4,98,76,2144,14.7,80,2,vw rabbit 312 | 38.1,4,89,60,1968,18.8,80,3,toyota corolla tercel 313 | 32.1,4,98,70,2120,15.5,80,1,chevrolet chevette 314 | 
37.2,4,86,65,2019,16.4,80,3,datsun 310 315 | 28,4,151,90,2678,16.5,80,1,chevrolet citation 316 | 26.4,4,140,88,2870,18.1,80,1,ford fairmont 317 | 24.3,4,151,90,3003,20.1,80,1,amc concord 318 | 19.1,6,225,90,3381,18.7,80,1,dodge aspen 319 | 34.3,4,97,78,2188,15.8,80,2,audi 4000 320 | 29.8,4,134,90,2711,15.5,80,3,toyota corona liftback 321 | 31.3,4,120,75,2542,17.5,80,3,mazda 626 322 | 37,4,119,92,2434,15,80,3,datsun 510 hatchback 323 | 32.2,4,108,75,2265,15.2,80,3,toyota corolla 324 | 46.6,4,86,65,2110,17.9,80,3,mazda glc 325 | 27.9,4,156,105,2800,14.4,80,1,dodge colt 326 | 40.8,4,85,65,2110,19.2,80,3,datsun 210 327 | 44.3,4,90,48,2085,21.7,80,2,vw rabbit c (diesel) 328 | 43.4,4,90,48,2335,23.7,80,2,vw dasher (diesel) 329 | 36.4,5,121,67,2950,19.9,80,2,audi 5000s (diesel) 330 | 30,4,146,67,3250,21.8,80,2,mercedes-benz 240d 331 | 44.6,4,91,67,1850,13.8,80,3,honda civic 1500 gl 332 | 40.9,4,85,?,1835,17.3,80,2,renault lecar deluxe 333 | 33.8,4,97,67,2145,18,80,3,subaru dl 334 | 29.8,4,89,62,1845,15.3,80,2,vokswagen rabbit 335 | 32.7,6,168,132,2910,11.4,80,3,datsun 280-zx 336 | 23.7,3,70,100,2420,12.5,80,3,mazda rx-7 gs 337 | 35,4,122,88,2500,15.1,80,2,triumph tr7 coupe 338 | 23.6,4,140,?,2905,14.3,80,1,ford mustang cobra 339 | 32.4,4,107,72,2290,17,80,3,honda accord 340 | 27.2,4,135,84,2490,15.7,81,1,plymouth reliant 341 | 26.6,4,151,84,2635,16.4,81,1,buick skylark 342 | 25.8,4,156,92,2620,14.4,81,1,dodge aries wagon (sw) 343 | 23.5,6,173,110,2725,12.6,81,1,chevrolet citation 344 | 30,4,135,84,2385,12.9,81,1,plymouth reliant 345 | 39.1,4,79,58,1755,16.9,81,3,toyota starlet 346 | 39,4,86,64,1875,16.4,81,1,plymouth champ 347 | 35.1,4,81,60,1760,16.1,81,3,honda civic 1300 348 | 32.3,4,97,67,2065,17.8,81,3,subaru 349 | 37,4,85,65,1975,19.4,81,3,datsun 210 mpg 350 | 37.7,4,89,62,2050,17.3,81,3,toyota tercel 351 | 34.1,4,91,68,1985,16,81,3,mazda glc 4 352 | 34.7,4,105,63,2215,14.9,81,1,plymouth horizon 4 353 | 34.4,4,98,65,2045,16.2,81,1,ford escort 4w 354 | 
29.9,4,98,65,2380,20.7,81,1,ford escort 2h 355 | 33,4,105,74,2190,14.2,81,2,volkswagen jetta 356 | 34.5,4,100,?,2320,15.8,81,2,renault 18i 357 | 33.7,4,107,75,2210,14.4,81,3,honda prelude 358 | 32.4,4,108,75,2350,16.8,81,3,toyota corolla 359 | 32.9,4,119,100,2615,14.8,81,3,datsun 200sx 360 | 31.6,4,120,74,2635,18.3,81,3,mazda 626 361 | 28.1,4,141,80,3230,20.4,81,2,peugeot 505s turbo diesel 362 | 30.7,6,145,76,3160,19.6,81,2,volvo diesel 363 | 25.4,6,168,116,2900,12.6,81,3,toyota cressida 364 | 24.2,6,146,120,2930,13.8,81,3,datsun 810 maxima 365 | 22.4,6,231,110,3415,15.8,81,1,buick century 366 | 26.6,8,350,105,3725,19,81,1,oldsmobile cutlass ls 367 | 20.2,6,200,88,3060,17.1,81,1,ford granada gl 368 | 17.6,6,225,85,3465,16.6,81,1,chrysler lebaron salon 369 | 28,4,112,88,2605,19.6,82,1,chevrolet cavalier 370 | 27,4,112,88,2640,18.6,82,1,chevrolet cavalier wagon 371 | 34,4,112,88,2395,18,82,1,chevrolet cavalier 2-door 372 | 31,4,112,85,2575,16.2,82,1,pontiac j2000 se hatchback 373 | 29,4,135,84,2525,16,82,1,dodge aries se 374 | 27,4,151,90,2735,18,82,1,pontiac phoenix 375 | 24,4,140,92,2865,16.4,82,1,ford fairmont futura 376 | 23,4,151,?,3035,20.5,82,1,amc concord dl 377 | 36,4,105,74,1980,15.3,82,2,volkswagen rabbit l 378 | 37,4,91,68,2025,18.2,82,3,mazda glc custom l 379 | 31,4,91,68,1970,17.6,82,3,mazda glc custom 380 | 38,4,105,63,2125,14.7,82,1,plymouth horizon miser 381 | 36,4,98,70,2125,17.3,82,1,mercury lynx l 382 | 36,4,120,88,2160,14.5,82,3,nissan stanza xe 383 | 36,4,107,75,2205,14.5,82,3,honda accord 384 | 34,4,108,70,2245,16.9,82,3,toyota corolla 385 | 38,4,91,67,1965,15,82,3,honda civic 386 | 32,4,91,67,1965,15.7,82,3,honda civic (auto) 387 | 38,4,91,67,1995,16.2,82,3,datsun 310 gx 388 | 25,6,181,110,2945,16.4,82,1,buick century limited 389 | 38,6,262,85,3015,17,82,1,oldsmobile cutlass ciera (diesel) 390 | 26,4,156,92,2585,14.5,82,1,chrysler lebaron medallion 391 | 22,6,232,112,2835,14.7,82,1,ford granada l 392 | 32,4,144,96,2665,13.9,82,3,toyota celica 
gt 393 | 36,4,135,84,2370,13,82,1,dodge charger 2.2 394 | 27,4,151,90,2950,17.3,82,1,chevrolet camaro 395 | 27,4,140,86,2790,15.6,82,1,ford mustang gl 396 | 44,4,97,52,2130,24.6,82,2,vw pickup 397 | 32,4,135,84,2295,11.6,82,1,dodge rampage 398 | 28,4,120,79,2625,18.6,82,1,ford ranger 399 | 31,4,119,82,2720,19.4,82,1,chevy s-10 400 | -------------------------------------------------------------------------------- /Chapter2/petrol_consumption.csv: -------------------------------------------------------------------------------- 1 | Petrol_tax,Average_income,Paved_Highways,Population_Driver_licence(%),Petrol_Consumption 2 | 9.00,3571,1976,0.5250,541 3 | 9.00,4092,1250,0.5720,524 4 | 9.00,3865,1586,0.5800,561 5 | 7.50,4870,2351,0.5290,414 6 | 8.00,4399,431,0.5440,410 7 | 10.00,5342,1333,0.5710,457 8 | 8.00,5319,11868,0.4510,344 9 | 8.00,5126,2138,0.5530,467 10 | 8.00,4447,8577,0.5290,464 11 | 7.00,4512,8507,0.5520,498 12 | 8.00,4391,5939,0.5300,580 13 | 7.50,5126,14186,0.5250,471 14 | 7.00,4817,6930,0.5740,525 15 | 7.00,4207,6580,0.5450,508 16 | 7.00,4332,8159,0.6080,566 17 | 7.00,4318,10340,0.5860,635 18 | 7.00,4206,8508,0.5720,603 19 | 7.00,3718,4725,0.5400,714 20 | 7.00,4716,5915,0.7240,865 21 | 8.50,4341,6010,0.6770,640 22 | 7.00,4593,7834,0.6630,649 23 | 8.00,4983,602,0.6020,540 24 | 9.00,4897,2449,0.5110,464 25 | 9.00,4258,4686,0.5170,547 26 | 8.50,4574,2619,0.5510,460 27 | 9.00,3721,4746,0.5440,566 28 | 8.00,3448,5399,0.5480,577 29 | 7.50,3846,9061,0.5790,631 30 | 8.00,4188,5975,0.5630,574 31 | 9.00,3601,4650,0.4930,534 32 | 7.00,3640,6905,0.5180,571 33 | 7.00,3333,6594,0.5130,554 34 | 8.00,3063,6524,0.5780,577 35 | 7.50,3357,4121,0.5470,628 36 | 8.00,3528,3495,0.4870,487 37 | 6.58,3802,7834,0.6290,644 38 | 5.00,4045,17782,0.5660,640 39 | 7.00,3897,6385,0.5860,704 40 | 8.50,3635,3274,0.6630,648 41 | 7.00,4345,3905,0.6720,968 42 | 7.00,4449,4639,0.6260,587 43 | 7.00,3656,3985,0.5630,699 44 | 7.00,4300,3635,0.6030,632 45 | 7.00,3745,2611,0.5080,591 46 | 
6.00,5215,2302,0.6720,782 47 | 9.00,4476,3942,0.5710,510 48 | 7.00,4296,4083,0.6230,610 49 | 7.00,5002,9794,0.5930,524 50 | -------------------------------------------------------------------------------- /Contributing.md: -------------------------------------------------------------------------------- 1 | # Contributing to Apress Source Code 2 | 3 | Copyright for Apress source code belongs to the author(s). However, under fair use you are encouraged to fork and contribute minor corrections and updates for the benefit of the author(s) and other readers. 4 | 5 | ## How to Contribute 6 | 7 | 1. Make sure you have a GitHub account. 8 | 2. Fork the repository for the relevant book. 9 | 3. Create a new branch on which to make your change, e.g. 10 | `git checkout -b my_code_contribution` 11 | 4. Commit your change. Include a commit message describing the correction. Please note that if your commit message is not clear, the correction will not be accepted. 12 | 5. Submit a pull request. 13 | 14 | Thank you for your contribution! -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | Freeware License, some rights reserved 2 | 3 | Copyright (c) 2020 Vaibhav Verdhan 4 | 5 | Permission is hereby granted, free of charge, to anyone obtaining a copy 6 | of this software and associated documentation files (the "Software"), 7 | to work with the Software within the limits of freeware distribution and fair use. 8 | This includes the rights to use, copy, and modify the Software for personal use. 9 | Users are also allowed and encouraged to submit corrections and modifications 10 | to the Software for the benefit of other users. 11 | 12 | It is not allowed to reuse, modify, or redistribute the Software for 13 | commercial use in any way, or for a user’s educational materials such as books 14 | or blog articles without prior permission from the copyright holder. 
15 | 16 | The above copyright notice and this permission notice need to be included 17 | in all copies or substantial portions of the software. 18 | 19 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 20 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 21 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 22 | AUTHORS OR COPYRIGHT HOLDERS OR APRESS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 23 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 24 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 25 | SOFTWARE. 26 | 27 | 28 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Apress Source Code 2 | 3 | This repository accompanies [*Supervised Learning with Python*](https://www.apress.com/9781484261552) by Vaibhav Verdhan (Apress, 2020). 4 | 5 | [comment]: #cover 6 | ![Cover image](9781484261552.jpg) 7 | 8 | Download the files as a zip using the green button, or clone the repository to your machine using Git. 9 | 10 | ## Releases 11 | 12 | Release v1.0 corresponds to the code in the published book, without corrections or updates. 13 | 14 | ## Contributions 15 | 16 | See the file Contributing.md for more information on how you can contribute to this repository. -------------------------------------------------------------------------------- /errata.md: -------------------------------------------------------------------------------- 1 | # Errata for *Book Title* 2 | 3 | On **page xx** [Summary of error]: 4 | 5 | Details of error here. Highlight key pieces in **bold**. 6 | 7 | *** 8 | 9 | On **page xx** [Summary of error]: 10 | 11 | Details of error here. Highlight key pieces in **bold**. 12 | 13 | *** --------------------------------------------------------------------------------