├── customer_segments └── customer_segments │ ├── README.md │ ├── customer_segments.ipynb │ ├── customers.csv │ ├── report.html │ └── visuals.py └── finding_donors └── finding_donors ├── README.md ├── census.csv ├── finding_donors.ipynb ├── project_description.md ├── report.html └── visuals.py /customer_segments/customer_segments/README.md: -------------------------------------------------------------------------------- 1 | # Machine Learning Engineer Nanodegree 2 | ## Project 4: Creating Customer Segments 3 | 4 | ### Project Description 5 | 6 | This is the 4th project for the Machine Learning Engineer Nanodegree. In this project I apply unsupervised learning techniques on product spending data collected for customers of a wholesale distributor to identify customer segments hidden in the data. 7 | 8 | Here, I first explore the data to determine if any product categories highly correlate with one another by observing a small subset of the data and also by ploting a scater matrix. Afterwards, I preprocess the data by scaling each product category and then identifying (and removing) unwanted outliers. Then I apply PCA transformations to the data and implement a clustering algorithm (Gaussian Mixture Model)to segment the transformed customer data. Finally, I compare the segmentation found with an additional labeling and consider ways this information could assist the wholesale distributor with future service changes. 9 | 10 | ### Install 11 | 12 | This project requires **Python 2.7** and the following Python libraries installed: 13 | 14 | - [NumPy](http://www.numpy.org/) 15 | - [Pandas](http://pandas.pydata.org) 16 | - [matplotlib](http://matplotlib.org/) 17 | - [scikit-learn](http://scikit-learn.org/stable/) 18 | 19 | You will also need to have software installed to run and execute a [Jupyter Notebook](http://ipython.org/notebook.html) 20 | 21 | If you do not have Python installed yet, it is highly recommended that you install the [Anaconda](http://continuum.io/downloads) distribution of Python, which already has the above packages and more included. Make sure that you select the Python 2.7 installer and not the Python 3.x installer. 22 | 23 | ### Code 24 | 25 | The main code for this project is located in the `customer_segments.ipynb` notebook file. Additional supporting code for visualizing the necessary graphs can be found in `visuals.py`. Additionally, the `Report.html` file contains a snapshot of the main code in the jupyter notebook with all code cells executed 26 | 27 | ### Run 28 | 29 | In a terminal or command window, navigate to the top-level project directory `customer_segments/` (that contains this README) and run one of the following commands: 30 | 31 | ```bash 32 | ipython notebook customer_segments.ipynb 33 | ``` 34 | or 35 | ```bash 36 | jupyter notebook customer_segments.ipynb 37 | ``` 38 | 39 | This will open the Jupyter Notebook software and project file in your browser. 40 | 41 | ## Data 42 | 43 | The customer segments data is included as a selection of 440 data points collected on data found from clients of a wholesale distributor in Lisbon, Portugal. More information can be found on the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Wholesale+customers). 44 | 45 | Note (m.u.) is shorthand for *monetary units*. 46 | 47 | **Features** 48 | 1) `Fresh`: annual spending (m.u.) on fresh products (Continuous); 49 | 2) `Milk`: annual spending (m.u.) on milk products (Continuous); 50 | 3) `Grocery`: annual spending (m.u.) on grocery products (Continuous); 51 | 4) `Frozen`: annual spending (m.u.) on frozen products (Continuous); 52 | 5) `Detergents_Paper`: annual spending (m.u.) on detergents and paper products (Continuous); 53 | 6) `Delicatessen`: annual spending (m.u.) on and delicatessen products (Continuous); 54 | 7) `Channel`: {Hotel/Restaurant/Cafe - 1, Retail - 2} (Nominal) 55 | 8) `Region`: {Lisnon - 1, Oporto - 2, or Other - 3} (Nominal) 56 | -------------------------------------------------------------------------------- /customer_segments/customer_segments/customers.csv: -------------------------------------------------------------------------------- 1 | Channel,Region,Fresh,Milk,Grocery,Frozen,Detergents_Paper,Delicatessen 2 | 2,3,12669,9656,7561,214,2674,1338 3 | 2,3,7057,9810,9568,1762,3293,1776 4 | 2,3,6353,8808,7684,2405,3516,7844 5 | 1,3,13265,1196,4221,6404,507,1788 6 | 2,3,22615,5410,7198,3915,1777,5185 7 | 2,3,9413,8259,5126,666,1795,1451 8 | 2,3,12126,3199,6975,480,3140,545 9 | 2,3,7579,4956,9426,1669,3321,2566 10 | 1,3,5963,3648,6192,425,1716,750 11 | 2,3,6006,11093,18881,1159,7425,2098 12 | 2,3,3366,5403,12974,4400,5977,1744 13 | 2,3,13146,1124,4523,1420,549,497 14 | 2,3,31714,12319,11757,287,3881,2931 15 | 2,3,21217,6208,14982,3095,6707,602 16 | 2,3,24653,9465,12091,294,5058,2168 17 | 1,3,10253,1114,3821,397,964,412 18 | 2,3,1020,8816,12121,134,4508,1080 19 | 1,3,5876,6157,2933,839,370,4478 20 | 2,3,18601,6327,10099,2205,2767,3181 21 | 1,3,7780,2495,9464,669,2518,501 22 | 2,3,17546,4519,4602,1066,2259,2124 23 | 1,3,5567,871,2010,3383,375,569 24 | 1,3,31276,1917,4469,9408,2381,4334 25 | 2,3,26373,36423,22019,5154,4337,16523 26 | 2,3,22647,9776,13792,2915,4482,5778 27 | 2,3,16165,4230,7595,201,4003,57 28 | 1,3,9898,961,2861,3151,242,833 29 | 1,3,14276,803,3045,485,100,518 30 | 2,3,4113,20484,25957,1158,8604,5206 31 | 1,3,43088,2100,2609,1200,1107,823 32 | 1,3,18815,3610,11107,1148,2134,2963 33 | 1,3,2612,4339,3133,2088,820,985 34 | 1,3,21632,1318,2886,266,918,405 35 | 1,3,29729,4786,7326,6130,361,1083 36 | 1,3,1502,1979,2262,425,483,395 37 | 2,3,688,5491,11091,833,4239,436 38 | 1,3,29955,4362,5428,1729,862,4626 39 | 2,3,15168,10556,12477,1920,6506,714 40 | 2,3,4591,15729,16709,33,6956,433 41 | 1,3,56159,555,902,10002,212,2916 42 | 1,3,24025,4332,4757,9510,1145,5864 43 | 1,3,19176,3065,5956,2033,2575,2802 44 | 2,3,10850,7555,14961,188,6899,46 45 | 2,3,630,11095,23998,787,9529,72 46 | 2,3,9670,7027,10471,541,4618,65 47 | 2,3,5181,22044,21531,1740,7353,4985 48 | 2,3,3103,14069,21955,1668,6792,1452 49 | 2,3,44466,54259,55571,7782,24171,6465 50 | 2,3,11519,6152,10868,584,5121,1476 51 | 2,3,4967,21412,28921,1798,13583,1163 52 | 1,3,6269,1095,1980,3860,609,2162 53 | 1,3,3347,4051,6996,239,1538,301 54 | 2,3,40721,3916,5876,532,2587,1278 55 | 2,3,491,10473,11532,744,5611,224 56 | 1,3,27329,1449,1947,2436,204,1333 57 | 1,3,5264,3683,5005,1057,2024,1130 58 | 2,3,4098,29892,26866,2616,17740,1340 59 | 2,3,5417,9933,10487,38,7572,1282 60 | 1,3,13779,1970,1648,596,227,436 61 | 1,3,6137,5360,8040,129,3084,1603 62 | 2,3,8590,3045,7854,96,4095,225 63 | 2,3,35942,38369,59598,3254,26701,2017 64 | 2,3,7823,6245,6544,4154,4074,964 65 | 2,3,9396,11601,15775,2896,7677,1295 66 | 1,3,4760,1227,3250,3724,1247,1145 67 | 2,3,85,20959,45828,36,24231,1423 68 | 1,3,9,1534,7417,175,3468,27 69 | 2,3,19913,6759,13462,1256,5141,834 70 | 1,3,2446,7260,3993,5870,788,3095 71 | 1,3,8352,2820,1293,779,656,144 72 | 1,3,16705,2037,3202,10643,116,1365 73 | 1,3,18291,1266,21042,5373,4173,14472 74 | 1,3,4420,5139,2661,8872,1321,181 75 | 2,3,19899,5332,8713,8132,764,648 76 | 2,3,8190,6343,9794,1285,1901,1780 77 | 1,3,20398,1137,3,4407,3,975 78 | 1,3,717,3587,6532,7530,529,894 79 | 2,3,12205,12697,28540,869,12034,1009 80 | 1,3,10766,1175,2067,2096,301,167 81 | 1,3,1640,3259,3655,868,1202,1653 82 | 1,3,7005,829,3009,430,610,529 83 | 2,3,219,9540,14403,283,7818,156 84 | 2,3,10362,9232,11009,737,3537,2342 85 | 1,3,20874,1563,1783,2320,550,772 86 | 2,3,11867,3327,4814,1178,3837,120 87 | 2,3,16117,46197,92780,1026,40827,2944 88 | 2,3,22925,73498,32114,987,20070,903 89 | 1,3,43265,5025,8117,6312,1579,14351 90 | 1,3,7864,542,4042,9735,165,46 91 | 1,3,24904,3836,5330,3443,454,3178 92 | 1,3,11405,596,1638,3347,69,360 93 | 1,3,12754,2762,2530,8693,627,1117 94 | 2,3,9198,27472,32034,3232,18906,5130 95 | 1,3,11314,3090,2062,35009,71,2698 96 | 2,3,5626,12220,11323,206,5038,244 97 | 1,3,3,2920,6252,440,223,709 98 | 2,3,23,2616,8118,145,3874,217 99 | 1,3,403,254,610,774,54,63 100 | 1,3,503,112,778,895,56,132 101 | 1,3,9658,2182,1909,5639,215,323 102 | 2,3,11594,7779,12144,3252,8035,3029 103 | 2,3,1420,10810,16267,1593,6766,1838 104 | 2,3,2932,6459,7677,2561,4573,1386 105 | 1,3,56082,3504,8906,18028,1480,2498 106 | 1,3,14100,2132,3445,1336,1491,548 107 | 1,3,15587,1014,3970,910,139,1378 108 | 2,3,1454,6337,10704,133,6830,1831 109 | 2,3,8797,10646,14886,2471,8969,1438 110 | 2,3,1531,8397,6981,247,2505,1236 111 | 2,3,1406,16729,28986,673,836,3 112 | 1,3,11818,1648,1694,2276,169,1647 113 | 2,3,12579,11114,17569,805,6457,1519 114 | 1,3,19046,2770,2469,8853,483,2708 115 | 1,3,14438,2295,1733,3220,585,1561 116 | 1,3,18044,1080,2000,2555,118,1266 117 | 1,3,11134,793,2988,2715,276,610 118 | 1,3,11173,2521,3355,1517,310,222 119 | 1,3,6990,3880,5380,1647,319,1160 120 | 1,3,20049,1891,2362,5343,411,933 121 | 1,3,8258,2344,2147,3896,266,635 122 | 1,3,17160,1200,3412,2417,174,1136 123 | 1,3,4020,3234,1498,2395,264,255 124 | 1,3,12212,201,245,1991,25,860 125 | 2,3,11170,10769,8814,2194,1976,143 126 | 1,3,36050,1642,2961,4787,500,1621 127 | 1,3,76237,3473,7102,16538,778,918 128 | 1,3,19219,1840,1658,8195,349,483 129 | 2,3,21465,7243,10685,880,2386,2749 130 | 1,3,140,8847,3823,142,1062,3 131 | 1,3,42312,926,1510,1718,410,1819 132 | 1,3,7149,2428,699,6316,395,911 133 | 1,3,2101,589,314,346,70,310 134 | 1,3,14903,2032,2479,576,955,328 135 | 1,3,9434,1042,1235,436,256,396 136 | 1,3,7388,1882,2174,720,47,537 137 | 1,3,6300,1289,2591,1170,199,326 138 | 1,3,4625,8579,7030,4575,2447,1542 139 | 1,3,3087,8080,8282,661,721,36 140 | 1,3,13537,4257,5034,155,249,3271 141 | 1,3,5387,4979,3343,825,637,929 142 | 1,3,17623,4280,7305,2279,960,2616 143 | 1,3,30379,13252,5189,321,51,1450 144 | 1,3,37036,7152,8253,2995,20,3 145 | 1,3,10405,1596,1096,8425,399,318 146 | 1,3,18827,3677,1988,118,516,201 147 | 2,3,22039,8384,34792,42,12591,4430 148 | 1,3,7769,1936,2177,926,73,520 149 | 1,3,9203,3373,2707,1286,1082,526 150 | 1,3,5924,584,542,4052,283,434 151 | 1,3,31812,1433,1651,800,113,1440 152 | 1,3,16225,1825,1765,853,170,1067 153 | 1,3,1289,3328,2022,531,255,1774 154 | 1,3,18840,1371,3135,3001,352,184 155 | 1,3,3463,9250,2368,779,302,1627 156 | 1,3,622,55,137,75,7,8 157 | 2,3,1989,10690,19460,233,11577,2153 158 | 2,3,3830,5291,14855,317,6694,3182 159 | 1,3,17773,1366,2474,3378,811,418 160 | 2,3,2861,6570,9618,930,4004,1682 161 | 2,3,355,7704,14682,398,8077,303 162 | 2,3,1725,3651,12822,824,4424,2157 163 | 1,3,12434,540,283,1092,3,2233 164 | 1,3,15177,2024,3810,2665,232,610 165 | 2,3,5531,15726,26870,2367,13726,446 166 | 2,3,5224,7603,8584,2540,3674,238 167 | 2,3,15615,12653,19858,4425,7108,2379 168 | 2,3,4822,6721,9170,993,4973,3637 169 | 1,3,2926,3195,3268,405,1680,693 170 | 1,3,5809,735,803,1393,79,429 171 | 1,3,5414,717,2155,2399,69,750 172 | 2,3,260,8675,13430,1116,7015,323 173 | 2,3,200,25862,19816,651,8773,6250 174 | 1,3,955,5479,6536,333,2840,707 175 | 2,3,514,7677,19805,937,9836,716 176 | 1,3,286,1208,5241,2515,153,1442 177 | 2,3,2343,7845,11874,52,4196,1697 178 | 1,3,45640,6958,6536,7368,1532,230 179 | 1,3,12759,7330,4533,1752,20,2631 180 | 1,3,11002,7075,4945,1152,120,395 181 | 1,3,3157,4888,2500,4477,273,2165 182 | 1,3,12356,6036,8887,402,1382,2794 183 | 1,3,112151,29627,18148,16745,4948,8550 184 | 1,3,694,8533,10518,443,6907,156 185 | 1,3,36847,43950,20170,36534,239,47943 186 | 1,3,327,918,4710,74,334,11 187 | 1,3,8170,6448,1139,2181,58,247 188 | 1,3,3009,521,854,3470,949,727 189 | 1,3,2438,8002,9819,6269,3459,3 190 | 2,3,8040,7639,11687,2758,6839,404 191 | 2,3,834,11577,11522,275,4027,1856 192 | 1,3,16936,6250,1981,7332,118,64 193 | 1,3,13624,295,1381,890,43,84 194 | 1,3,5509,1461,2251,547,187,409 195 | 2,3,180,3485,20292,959,5618,666 196 | 1,3,7107,1012,2974,806,355,1142 197 | 1,3,17023,5139,5230,7888,330,1755 198 | 1,1,30624,7209,4897,18711,763,2876 199 | 2,1,2427,7097,10391,1127,4314,1468 200 | 1,1,11686,2154,6824,3527,592,697 201 | 1,1,9670,2280,2112,520,402,347 202 | 2,1,3067,13240,23127,3941,9959,731 203 | 2,1,4484,14399,24708,3549,14235,1681 204 | 1,1,25203,11487,9490,5065,284,6854 205 | 1,1,583,685,2216,469,954,18 206 | 1,1,1956,891,5226,1383,5,1328 207 | 2,1,1107,11711,23596,955,9265,710 208 | 1,1,6373,780,950,878,288,285 209 | 2,1,2541,4737,6089,2946,5316,120 210 | 1,1,1537,3748,5838,1859,3381,806 211 | 2,1,5550,12729,16767,864,12420,797 212 | 1,1,18567,1895,1393,1801,244,2100 213 | 2,1,12119,28326,39694,4736,19410,2870 214 | 1,1,7291,1012,2062,1291,240,1775 215 | 1,1,3317,6602,6861,1329,3961,1215 216 | 2,1,2362,6551,11364,913,5957,791 217 | 1,1,2806,10765,15538,1374,5828,2388 218 | 2,1,2532,16599,36486,179,13308,674 219 | 1,1,18044,1475,2046,2532,130,1158 220 | 2,1,18,7504,15205,1285,4797,6372 221 | 1,1,4155,367,1390,2306,86,130 222 | 1,1,14755,899,1382,1765,56,749 223 | 1,1,5396,7503,10646,91,4167,239 224 | 1,1,5041,1115,2856,7496,256,375 225 | 2,1,2790,2527,5265,5612,788,1360 226 | 1,1,7274,659,1499,784,70,659 227 | 1,1,12680,3243,4157,660,761,786 228 | 2,1,20782,5921,9212,1759,2568,1553 229 | 1,1,4042,2204,1563,2286,263,689 230 | 1,1,1869,577,572,950,4762,203 231 | 1,1,8656,2746,2501,6845,694,980 232 | 2,1,11072,5989,5615,8321,955,2137 233 | 1,1,2344,10678,3828,1439,1566,490 234 | 1,1,25962,1780,3838,638,284,834 235 | 1,1,964,4984,3316,937,409,7 236 | 1,1,15603,2703,3833,4260,325,2563 237 | 1,1,1838,6380,2824,1218,1216,295 238 | 1,1,8635,820,3047,2312,415,225 239 | 1,1,18692,3838,593,4634,28,1215 240 | 1,1,7363,475,585,1112,72,216 241 | 1,1,47493,2567,3779,5243,828,2253 242 | 1,1,22096,3575,7041,11422,343,2564 243 | 1,1,24929,1801,2475,2216,412,1047 244 | 1,1,18226,659,2914,3752,586,578 245 | 1,1,11210,3576,5119,561,1682,2398 246 | 1,1,6202,7775,10817,1183,3143,1970 247 | 2,1,3062,6154,13916,230,8933,2784 248 | 1,1,8885,2428,1777,1777,430,610 249 | 1,1,13569,346,489,2077,44,659 250 | 1,1,15671,5279,2406,559,562,572 251 | 1,1,8040,3795,2070,6340,918,291 252 | 1,1,3191,1993,1799,1730,234,710 253 | 2,1,6134,23133,33586,6746,18594,5121 254 | 1,1,6623,1860,4740,7683,205,1693 255 | 1,1,29526,7961,16966,432,363,1391 256 | 1,1,10379,17972,4748,4686,1547,3265 257 | 1,1,31614,489,1495,3242,111,615 258 | 1,1,11092,5008,5249,453,392,373 259 | 1,1,8475,1931,1883,5004,3593,987 260 | 1,1,56083,4563,2124,6422,730,3321 261 | 1,1,53205,4959,7336,3012,967,818 262 | 1,1,9193,4885,2157,327,780,548 263 | 1,1,7858,1110,1094,6818,49,287 264 | 1,1,23257,1372,1677,982,429,655 265 | 1,1,2153,1115,6684,4324,2894,411 266 | 2,1,1073,9679,15445,61,5980,1265 267 | 1,1,5909,23527,13699,10155,830,3636 268 | 2,1,572,9763,22182,2221,4882,2563 269 | 1,1,20893,1222,2576,3975,737,3628 270 | 2,1,11908,8053,19847,1069,6374,698 271 | 1,1,15218,258,1138,2516,333,204 272 | 1,1,4720,1032,975,5500,197,56 273 | 1,1,2083,5007,1563,1120,147,1550 274 | 1,1,514,8323,6869,529,93,1040 275 | 1,3,36817,3045,1493,4802,210,1824 276 | 1,3,894,1703,1841,744,759,1153 277 | 1,3,680,1610,223,862,96,379 278 | 1,3,27901,3749,6964,4479,603,2503 279 | 1,3,9061,829,683,16919,621,139 280 | 1,3,11693,2317,2543,5845,274,1409 281 | 2,3,17360,6200,9694,1293,3620,1721 282 | 1,3,3366,2884,2431,977,167,1104 283 | 2,3,12238,7108,6235,1093,2328,2079 284 | 1,3,49063,3965,4252,5970,1041,1404 285 | 1,3,25767,3613,2013,10303,314,1384 286 | 1,3,68951,4411,12609,8692,751,2406 287 | 1,3,40254,640,3600,1042,436,18 288 | 1,3,7149,2247,1242,1619,1226,128 289 | 1,3,15354,2102,2828,8366,386,1027 290 | 1,3,16260,594,1296,848,445,258 291 | 1,3,42786,286,471,1388,32,22 292 | 1,3,2708,2160,2642,502,965,1522 293 | 1,3,6022,3354,3261,2507,212,686 294 | 1,3,2838,3086,4329,3838,825,1060 295 | 2,2,3996,11103,12469,902,5952,741 296 | 1,2,21273,2013,6550,909,811,1854 297 | 2,2,7588,1897,5234,417,2208,254 298 | 1,2,19087,1304,3643,3045,710,898 299 | 2,2,8090,3199,6986,1455,3712,531 300 | 2,2,6758,4560,9965,934,4538,1037 301 | 1,2,444,879,2060,264,290,259 302 | 2,2,16448,6243,6360,824,2662,2005 303 | 2,2,5283,13316,20399,1809,8752,172 304 | 2,2,2886,5302,9785,364,6236,555 305 | 2,2,2599,3688,13829,492,10069,59 306 | 2,2,161,7460,24773,617,11783,2410 307 | 2,2,243,12939,8852,799,3909,211 308 | 2,2,6468,12867,21570,1840,7558,1543 309 | 1,2,17327,2374,2842,1149,351,925 310 | 1,2,6987,1020,3007,416,257,656 311 | 2,2,918,20655,13567,1465,6846,806 312 | 1,2,7034,1492,2405,12569,299,1117 313 | 1,2,29635,2335,8280,3046,371,117 314 | 2,2,2137,3737,19172,1274,17120,142 315 | 1,2,9784,925,2405,4447,183,297 316 | 1,2,10617,1795,7647,1483,857,1233 317 | 2,2,1479,14982,11924,662,3891,3508 318 | 1,2,7127,1375,2201,2679,83,1059 319 | 1,2,1182,3088,6114,978,821,1637 320 | 1,2,11800,2713,3558,2121,706,51 321 | 2,2,9759,25071,17645,1128,12408,1625 322 | 1,2,1774,3696,2280,514,275,834 323 | 1,2,9155,1897,5167,2714,228,1113 324 | 1,2,15881,713,3315,3703,1470,229 325 | 1,2,13360,944,11593,915,1679,573 326 | 1,2,25977,3587,2464,2369,140,1092 327 | 1,2,32717,16784,13626,60869,1272,5609 328 | 1,2,4414,1610,1431,3498,387,834 329 | 1,2,542,899,1664,414,88,522 330 | 1,2,16933,2209,3389,7849,210,1534 331 | 1,2,5113,1486,4583,5127,492,739 332 | 1,2,9790,1786,5109,3570,182,1043 333 | 2,2,11223,14881,26839,1234,9606,1102 334 | 1,2,22321,3216,1447,2208,178,2602 335 | 2,2,8565,4980,67298,131,38102,1215 336 | 2,2,16823,928,2743,11559,332,3486 337 | 2,2,27082,6817,10790,1365,4111,2139 338 | 1,2,13970,1511,1330,650,146,778 339 | 1,2,9351,1347,2611,8170,442,868 340 | 1,2,3,333,7021,15601,15,550 341 | 1,2,2617,1188,5332,9584,573,1942 342 | 2,3,381,4025,9670,388,7271,1371 343 | 2,3,2320,5763,11238,767,5162,2158 344 | 1,3,255,5758,5923,349,4595,1328 345 | 2,3,1689,6964,26316,1456,15469,37 346 | 1,3,3043,1172,1763,2234,217,379 347 | 1,3,1198,2602,8335,402,3843,303 348 | 2,3,2771,6939,15541,2693,6600,1115 349 | 2,3,27380,7184,12311,2809,4621,1022 350 | 1,3,3428,2380,2028,1341,1184,665 351 | 2,3,5981,14641,20521,2005,12218,445 352 | 1,3,3521,1099,1997,1796,173,995 353 | 2,3,1210,10044,22294,1741,12638,3137 354 | 1,3,608,1106,1533,830,90,195 355 | 2,3,117,6264,21203,228,8682,1111 356 | 1,3,14039,7393,2548,6386,1333,2341 357 | 1,3,190,727,2012,245,184,127 358 | 1,3,22686,134,218,3157,9,548 359 | 2,3,37,1275,22272,137,6747,110 360 | 1,3,759,18664,1660,6114,536,4100 361 | 1,3,796,5878,2109,340,232,776 362 | 1,3,19746,2872,2006,2601,468,503 363 | 1,3,4734,607,864,1206,159,405 364 | 1,3,2121,1601,2453,560,179,712 365 | 1,3,4627,997,4438,191,1335,314 366 | 1,3,2615,873,1524,1103,514,468 367 | 2,3,4692,6128,8025,1619,4515,3105 368 | 1,3,9561,2217,1664,1173,222,447 369 | 1,3,3477,894,534,1457,252,342 370 | 1,3,22335,1196,2406,2046,101,558 371 | 1,3,6211,337,683,1089,41,296 372 | 2,3,39679,3944,4955,1364,523,2235 373 | 1,3,20105,1887,1939,8164,716,790 374 | 1,3,3884,3801,1641,876,397,4829 375 | 2,3,15076,6257,7398,1504,1916,3113 376 | 1,3,6338,2256,1668,1492,311,686 377 | 1,3,5841,1450,1162,597,476,70 378 | 2,3,3136,8630,13586,5641,4666,1426 379 | 1,3,38793,3154,2648,1034,96,1242 380 | 1,3,3225,3294,1902,282,68,1114 381 | 2,3,4048,5164,10391,130,813,179 382 | 1,3,28257,944,2146,3881,600,270 383 | 1,3,17770,4591,1617,9927,246,532 384 | 1,3,34454,7435,8469,2540,1711,2893 385 | 1,3,1821,1364,3450,4006,397,361 386 | 1,3,10683,21858,15400,3635,282,5120 387 | 1,3,11635,922,1614,2583,192,1068 388 | 1,3,1206,3620,2857,1945,353,967 389 | 1,3,20918,1916,1573,1960,231,961 390 | 1,3,9785,848,1172,1677,200,406 391 | 1,3,9385,1530,1422,3019,227,684 392 | 1,3,3352,1181,1328,5502,311,1000 393 | 1,3,2647,2761,2313,907,95,1827 394 | 1,3,518,4180,3600,659,122,654 395 | 1,3,23632,6730,3842,8620,385,819 396 | 1,3,12377,865,3204,1398,149,452 397 | 1,3,9602,1316,1263,2921,841,290 398 | 2,3,4515,11991,9345,2644,3378,2213 399 | 1,3,11535,1666,1428,6838,64,743 400 | 1,3,11442,1032,582,5390,74,247 401 | 1,3,9612,577,935,1601,469,375 402 | 1,3,4446,906,1238,3576,153,1014 403 | 1,3,27167,2801,2128,13223,92,1902 404 | 1,3,26539,4753,5091,220,10,340 405 | 1,3,25606,11006,4604,127,632,288 406 | 1,3,18073,4613,3444,4324,914,715 407 | 1,3,6884,1046,1167,2069,593,378 408 | 1,3,25066,5010,5026,9806,1092,960 409 | 2,3,7362,12844,18683,2854,7883,553 410 | 2,3,8257,3880,6407,1646,2730,344 411 | 1,3,8708,3634,6100,2349,2123,5137 412 | 1,3,6633,2096,4563,1389,1860,1892 413 | 1,3,2126,3289,3281,1535,235,4365 414 | 1,3,97,3605,12400,98,2970,62 415 | 1,3,4983,4859,6633,17866,912,2435 416 | 1,3,5969,1990,3417,5679,1135,290 417 | 2,3,7842,6046,8552,1691,3540,1874 418 | 2,3,4389,10940,10908,848,6728,993 419 | 1,3,5065,5499,11055,364,3485,1063 420 | 2,3,660,8494,18622,133,6740,776 421 | 1,3,8861,3783,2223,633,1580,1521 422 | 1,3,4456,5266,13227,25,6818,1393 423 | 2,3,17063,4847,9053,1031,3415,1784 424 | 1,3,26400,1377,4172,830,948,1218 425 | 2,3,17565,3686,4657,1059,1803,668 426 | 2,3,16980,2884,12232,874,3213,249 427 | 1,3,11243,2408,2593,15348,108,1886 428 | 1,3,13134,9347,14316,3141,5079,1894 429 | 1,3,31012,16687,5429,15082,439,1163 430 | 1,3,3047,5970,4910,2198,850,317 431 | 1,3,8607,1750,3580,47,84,2501 432 | 1,3,3097,4230,16483,575,241,2080 433 | 1,3,8533,5506,5160,13486,1377,1498 434 | 1,3,21117,1162,4754,269,1328,395 435 | 1,3,1982,3218,1493,1541,356,1449 436 | 1,3,16731,3922,7994,688,2371,838 437 | 1,3,29703,12051,16027,13135,182,2204 438 | 1,3,39228,1431,764,4510,93,2346 439 | 2,3,14531,15488,30243,437,14841,1867 440 | 1,3,10290,1981,2232,1038,168,2125 441 | 1,3,2787,1698,2510,65,477,52 442 | -------------------------------------------------------------------------------- /customer_segments/customer_segments/visuals.py: -------------------------------------------------------------------------------- 1 | ########################################### 2 | # Suppress matplotlib user warnings 3 | # Necessary for newer version of matplotlib 4 | import warnings 5 | warnings.filterwarnings("ignore", category = UserWarning, module = "matplotlib") 6 | # 7 | # Display inline matplotlib plots with IPython 8 | from IPython import get_ipython 9 | get_ipython().run_line_magic('matplotlib', 'inline') 10 | ########################################### 11 | 12 | import matplotlib.pyplot as plt 13 | import matplotlib.cm as cm 14 | import pandas as pd 15 | import numpy as np 16 | 17 | def pca_results(good_data, pca): 18 | ''' 19 | Create a DataFrame of the PCA results 20 | Includes dimension feature weights and explained variance 21 | Visualizes the PCA results 22 | ''' 23 | 24 | # Dimension indexing 25 | dimensions = dimensions = ['Dimension {}'.format(i) for i in range(1,len(pca.components_)+1)] 26 | 27 | # PCA components 28 | components = pd.DataFrame(np.round(pca.components_, 4), columns = good_data.keys()) 29 | components.index = dimensions 30 | 31 | # PCA explained variance 32 | ratios = pca.explained_variance_ratio_.reshape(len(pca.components_), 1) 33 | variance_ratios = pd.DataFrame(np.round(ratios, 4), columns = ['Explained Variance']) 34 | variance_ratios.index = dimensions 35 | 36 | # Create a bar plot visualization 37 | fig, ax = plt.subplots(figsize = (14,8)) 38 | 39 | # Plot the feature weights as a function of the components 40 | components.plot(ax = ax, kind = 'bar'); 41 | ax.set_ylabel("Feature Weights") 42 | ax.set_xticklabels(dimensions, rotation=0) 43 | 44 | 45 | # Display the explained variance ratios 46 | for i, ev in enumerate(pca.explained_variance_ratio_): 47 | ax.text(i-0.40, ax.get_ylim()[1] + 0.05, "Explained Variance\n %.4f"%(ev)) 48 | 49 | # Return a concatenated DataFrame 50 | return pd.concat([variance_ratios, components], axis = 1) 51 | 52 | def cluster_results(reduced_data, preds, centers, pca_samples): 53 | ''' 54 | Visualizes the PCA-reduced cluster data in two dimensions 55 | Adds cues for cluster centers and student-selected sample data 56 | ''' 57 | 58 | predictions = pd.DataFrame(preds, columns = ['Cluster']) 59 | plot_data = pd.concat([predictions, reduced_data], axis = 1) 60 | 61 | # Generate the cluster plot 62 | fig, ax = plt.subplots(figsize = (14,8)) 63 | 64 | # Color map 65 | cmap = cm.get_cmap('gist_rainbow') 66 | 67 | # Color the points based on assigned cluster 68 | for i, cluster in plot_data.groupby('Cluster'): 69 | cluster.plot(ax = ax, kind = 'scatter', x = 'Dimension 1', y = 'Dimension 2', \ 70 | color = cmap((i)*1.0/(len(centers)-1)), label = 'Cluster %i'%(i), s=30); 71 | 72 | # Plot centers with indicators 73 | for i, c in enumerate(centers): 74 | ax.scatter(x = c[0], y = c[1], color = 'white', edgecolors = 'black', \ 75 | alpha = 1, linewidth = 2, marker = 'o', s=200); 76 | ax.scatter(x = c[0], y = c[1], marker='$%d$'%(i), alpha = 1, s=100); 77 | 78 | # Plot transformed sample points 79 | ax.scatter(x = pca_samples[:,0], y = pca_samples[:,1], \ 80 | s = 150, linewidth = 4, color = 'black', marker = 'x'); 81 | 82 | # Set plot title 83 | ax.set_title("Cluster Learning on PCA-Reduced Data - Centroids Marked by Number\nTransformed Sample Data Marked by Black Cross"); 84 | 85 | 86 | def biplot(good_data, reduced_data, pca): 87 | ''' 88 | Produce a biplot that shows a scatterplot of the reduced 89 | data and the projections of the original features. 90 | 91 | good_data: original data, before transformation. 92 | Needs to be a pandas dataframe with valid column names 93 | reduced_data: the reduced data (the first two dimensions are plotted) 94 | pca: pca object that contains the components_ attribute 95 | 96 | return: a matplotlib AxesSubplot object (for any additional customization) 97 | 98 | This procedure is inspired by the script: 99 | https://github.com/teddyroland/python-biplot 100 | ''' 101 | 102 | fig, ax = plt.subplots(figsize = (14,8)) 103 | # scatterplot of the reduced data 104 | ax.scatter(x=reduced_data.loc[:, 'Dimension 1'], y=reduced_data.loc[:, 'Dimension 2'], 105 | facecolors='b', edgecolors='b', s=70, alpha=0.5) 106 | 107 | feature_vectors = pca.components_.T 108 | 109 | # we use scaling factors to make the arrows easier to see 110 | arrow_size, text_pos = 7.0, 8.0, 111 | 112 | # projections of the original features 113 | for i, v in enumerate(feature_vectors): 114 | ax.arrow(0, 0, arrow_size*v[0], arrow_size*v[1], 115 | head_width=0.2, head_length=0.2, linewidth=2, color='red') 116 | ax.text(v[0]*text_pos, v[1]*text_pos, good_data.columns[i], color='black', 117 | ha='center', va='center', fontsize=18) 118 | 119 | ax.set_xlabel("Dimension 1", fontsize=14) 120 | ax.set_ylabel("Dimension 2", fontsize=14) 121 | ax.set_title("PC plane with original feature projections.", fontsize=16); 122 | return ax 123 | 124 | 125 | def channel_results(reduced_data, outliers, pca_samples): 126 | ''' 127 | Visualizes the PCA-reduced cluster data in two dimensions using the full dataset 128 | Data is labeled by "Channel" and cues added for student-selected sample data 129 | ''' 130 | 131 | # Check that the dataset is loadable 132 | try: 133 | full_data = pd.read_csv("customers.csv") 134 | except: 135 | print "Dataset could not be loaded. Is the file missing?" 136 | return False 137 | 138 | # Create the Channel DataFrame 139 | channel = pd.DataFrame(full_data['Channel'], columns = ['Channel']) 140 | channel = channel.drop(channel.index[outliers]).reset_index(drop = True) 141 | labeled = pd.concat([reduced_data, channel], axis = 1) 142 | 143 | # Generate the cluster plot 144 | fig, ax = plt.subplots(figsize = (14,8)) 145 | 146 | # Color map 147 | cmap = cm.get_cmap('gist_rainbow') 148 | 149 | # Color the points based on assigned Channel 150 | labels = ['Hotel/Restaurant/Cafe', 'Retailer'] 151 | grouped = labeled.groupby('Channel') 152 | for i, channel in grouped: 153 | channel.plot(ax = ax, kind = 'scatter', x = 'Dimension 1', y = 'Dimension 2', \ 154 | color = cmap((i-1)*1.0/2), label = labels[i-1], s=30); 155 | 156 | # Plot transformed sample points 157 | for i, sample in enumerate(pca_samples): 158 | ax.scatter(x = sample[0], y = sample[1], \ 159 | s = 200, linewidth = 3, color = 'black', marker = 'o', facecolors = 'none'); 160 | ax.scatter(x = sample[0]+0.25, y = sample[1]+0.3, marker='$%d$'%(i), alpha = 1, s=125); 161 | 162 | # Set plot title 163 | ax.set_title("PCA-Reduced Data Labeled by 'Channel'\nTransformed Sample Data Circled"); -------------------------------------------------------------------------------- /finding_donors/finding_donors/README.md: -------------------------------------------------------------------------------- 1 | # Finding Donors for CharityML 2 | 3 | Investigated factors that affect the likelihood of charity donations being made based on real census data. Developed a naive classifier to compare testing results to. Trained and tested several supervised machine learning models on preprocessed census data to predict the likelihood of donations. Selected the best model based on accuracy, a modified F-scoring metric, and algorithm efficiency. 4 | 5 | See my implementation and report [here](https://github.com/robertyoung2/Finding-Donors-for-CharityML/blob/master/finding_donors.ipynb). 6 | 7 | ## Project Brief 8 | 9 | In this project, you will apply supervised learning techniques and an analytical mind on data collected for the U.S. census to help CharityML (a fictitious charity organization) identify people most likely to donate to their cause. You will first explore the data to learn how the census data is recorded. Next, you will apply a series of transformations and preprocessing techniques to manipulate the data into a workable format. You will then evaluate several supervised learners of your choice on the data, and consider which is best suited for the solution. Afterwards, you will optimize the model you've selected and present it as your solution to CharityML. Finally, you will explore the chosen model and its predictions under the hood, to see just how well it's performing when considering the data it's given. 10 | 11 | ## Project Evaluation 12 | 13 | My project was evaluated against the [Finding Donors for CharityML project rubric.](https://github.com/robertyoung2/Finding-Donors-for-CharityML/blob/master/Finding%20Donors%20for%20CharityML%20project%20rubric.pdf). 14 | 15 | ## Files Submitted 16 | 17 | - The `finding_donors.ipynb` notebook file with all questions answered and all code cells executed and displaying output. 18 | - An HTML export of the project notebook with the name `report.html`. This file must be present for your project to be evaluated. 19 | -------------------------------------------------------------------------------- /finding_donors/finding_donors/project_description.md: -------------------------------------------------------------------------------- 1 | # Content: Supervised Learning 2 | ## Project: Finding Donors for CharityML 3 | 4 | ## Project Overview 5 | In this project, you will apply supervised learning techniques and an analytical mind on data collected for the U.S. census to help CharityML (a fictitious charity organization) identify people most likely to donate to their cause. You will first explore the data to learn how the census data is recorded. Next, you will apply a series of transformations and preprocessing techniques to manipulate the data into a workable format. You will then evaluate several supervised learners of your choice on the data, and consider which is best suited for the solution. Afterwards, you will optimize the model you've selected and present it as your solution to CharityML. Finally, you will explore the chosen model and its predictions under the hood, to see just how well it's performing when considering the data it's given. 6 | predicted selling price to your statistics. 7 | 8 | ## Project Highlights 9 | This project is designed to get you acquainted with the many supervised learning algorithms available in sklearn, and to also provide for a method of evaluating just how each model works and performs on a certain type of data. It is important in machine learning to understand exactly when and where a certain algorithm should be used, and when one should be avoided. 10 | 11 | Things you will learn by completing this project: 12 | - How to identify when preprocessing is needed, and how to apply it. 13 | - How to establish a benchmark for a solution to the problem. 14 | - What each of several supervised learning algorithms accomplishes given a specific dataset. 15 | - How to investigate whether a candidate solution model is adequate for the problem. 16 | 17 | ## Software Requirements 18 | 19 | This project uses the following software and Python libraries: 20 | 21 | - [Python 2.7](https://www.python.org/download/releases/2.7/) 22 | - [NumPy](http://www.numpy.org/) 23 | - [Pandas](http://pandas.pydata.org/) 24 | - [scikit-learn](http://scikit-learn.org/stable/) 25 | - [matplotlib](http://matplotlib.org/) 26 | 27 | You will also need to have software installed to run and execute a [Jupyter Notebook](http://ipython.org/notebook.html) 28 | 29 | If you do not have Python installed yet, it is highly recommended that you install the [Anaconda](http://continuum.io/downloads) distribution of Python, which already has the above packages and more included. Make sure that you select the Python 2.7 installer and not the Python 3.x installer. 30 | 31 | ## Starting the Project 32 | 33 | For this assignment, you can find the `finding_donors` folder containing the necessary project files on the [Machine Learning projects GitHub](https://github.com/udacity/machine-learning), under the `projects` folder. You may download all of the files for projects we'll use in this Nanodegree program directly from this repo. Please make sure that you use the most recent version of project files when completing a project! 34 | 35 | This project contains three files: 36 | 37 | - `finding_donors.ipynb`: This is the main file where you will be performing your work on the project. 38 | - `census.csv`: The project dataset. You'll load this data in the notebook. 39 | - `visuals.py`: A Python file containing visualization code that is run behind-the-scenes. Do not modify 40 | 41 | In the Terminal or Command Prompt, navigate to the folder containing the project files, and then use the command `jupyter notebook finding_donors.ipynb` to open up a browser window or tab to work with your notebook. Alternatively, you can use the command `jupyter notebook` or `ipython notebook` and navigate to the notebook file in the browser window that opens. Follow the instructions in the notebook and answer each question presented to successfully complete the project. A **README** file has also been provided with the project files which may contain additional necessary information or instruction for the project. 42 | 43 | ## Submitting the Project 44 | 45 | ### Evaluation 46 | Your project will be reviewed by a Udacity reviewer against the **Finding Donors for CharityML project rubric**. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must be *meeting specifications* for you to pass. 47 | 48 | ### Submission Files 49 | When you are ready to submit your project, collect the following files and compress them into a single archive for upload. Alternatively, you may supply the following files on your GitHub Repo in a folder named `student_intervention` for ease of access: 50 | - The `finding_donors.ipynb` notebook file with all questions answered and all code cells executed and displaying output. 51 | - An **HTML** export of the project notebook with the name **report.html**. This file *must* be present for your project to be evaluated. 52 | 53 | Once you have collected these files and reviewed the project rubric, proceed to the project submission page. 54 | 55 | ### I'm Ready! 56 | When you're ready to submit your project, click on the **Submit Project** button at the bottom of the page. 57 | 58 | If you are having any problems submitting your project or wish to check on the status of your submission, please email us at **machine-support@udacity.com** or visit us in the discussion forums. 59 | 60 | ### What's Next? 61 | You will get an email as soon as your reviewer has feedback for you. In the meantime, review your next project and feel free to get started on it or the courses supporting it! 62 | -------------------------------------------------------------------------------- /finding_donors/finding_donors/visuals.py: -------------------------------------------------------------------------------- 1 | ########################################### 2 | # Suppress matplotlib user warnings 3 | # Necessary for newer version of matplotlib 4 | import warnings 5 | warnings.filterwarnings("ignore", category = UserWarning, module = "matplotlib") 6 | # 7 | # Display inline matplotlib plots with IPython 8 | from IPython import get_ipython 9 | get_ipython().run_line_magic('matplotlib', 'inline') 10 | ########################################### 11 | 12 | import matplotlib.pyplot as pl 13 | import matplotlib.patches as mpatches 14 | import numpy as np 15 | import pandas as pd 16 | from time import time 17 | from sklearn.metrics import f1_score, accuracy_score 18 | 19 | 20 | def distribution(data, transformed = False): 21 | """ 22 | Visualization code for displaying skewed distributions of features 23 | """ 24 | 25 | # Create figure 26 | fig = pl.figure(figsize = (11,5)); 27 | 28 | # Skewed feature plotting 29 | for i, feature in enumerate(['capital-gain','capital-loss']): 30 | ax = fig.add_subplot(1, 2, i+1) 31 | ax.hist(data[feature], bins = 25, color = '#00A0A0') 32 | ax.set_title("'%s' Feature Distribution"%(feature), fontsize = 14) 33 | ax.set_xlabel("Value") 34 | ax.set_ylabel("Number of Records") 35 | ax.set_ylim((0, 2000)) 36 | ax.set_yticks([0, 500, 1000, 1500, 2000]) 37 | ax.set_yticklabels([0, 500, 1000, 1500, ">2000"]) 38 | 39 | # Plot aesthetics 40 | if transformed: 41 | fig.suptitle("Log-transformed Distributions of Continuous Census Data Features", \ 42 | fontsize = 16, y = 1.03) 43 | else: 44 | fig.suptitle("Skewed Distributions of Continuous Census Data Features", \ 45 | fontsize = 16, y = 1.03) 46 | 47 | fig.tight_layout() 48 | fig.show() 49 | 50 | 51 | def evaluate(results, accuracy, f1): 52 | """ 53 | Visualization code to display results of various learners. 54 | 55 | inputs: 56 | - learners: a list of supervised learners 57 | - stats: a list of dictionaries of the statistic results from 'train_predict()' 58 | - accuracy: The score for the naive predictor 59 | - f1: The score for the naive predictor 60 | """ 61 | 62 | # Create figure 63 | fig, ax = pl.subplots(2, 4, figsize = (11,7)) 64 | 65 | # Constants 66 | bar_width = 0.3 67 | colors = ['#A00000','#00A0A0','#00A000'] 68 | 69 | # Super loop to plot four panels of data 70 | for k, learner in enumerate(results.keys()): 71 | for j, metric in enumerate(['train_time', 'acc_train', 'f_train', 'pred_time', 'acc_test', 'f_test']): 72 | for i in np.arange(3): 73 | 74 | # Creative plot code 75 | ax[j//3, j%3].bar(i+k*bar_width, results[learner][i][metric], width = bar_width, color = colors[k]) 76 | ax[j//3, j%3].set_xticks([0.45, 1.45, 2.45]) 77 | ax[j//3, j%3].set_xticklabels(["1%", "10%", "100%"]) 78 | ax[j//3, j%3].set_xlabel("Training Set Size") 79 | ax[j//3, j%3].set_xlim((-0.1, 3.0)) 80 | 81 | # Add unique y-labels 82 | ax[0, 0].set_ylabel("Time (in seconds)") 83 | ax[0, 1].set_ylabel("Accuracy Score") 84 | ax[0, 2].set_ylabel("F-score") 85 | ax[1, 0].set_ylabel("Time (in seconds)") 86 | ax[1, 1].set_ylabel("Accuracy Score") 87 | ax[1, 2].set_ylabel("F-score") 88 | 89 | # Add titles 90 | ax[0, 0].set_title("Model Training") 91 | ax[0, 1].set_title("Accuracy Score on Training Subset") 92 | ax[0, 2].set_title("F-score on Training Subset") 93 | ax[1, 0].set_title("Model Predicting") 94 | ax[1, 1].set_title("Accuracy Score on Testing Set") 95 | ax[1, 2].set_title("F-score on Testing Set") 96 | 97 | # Add horizontal lines for naive predictors 98 | ax[0, 1].axhline(y = accuracy, xmin = -0.1, xmax = 3.0, linewidth = 1, color = 'k', linestyle = 'dashed') 99 | ax[1, 1].axhline(y = accuracy, xmin = -0.1, xmax = 3.0, linewidth = 1, color = 'k', linestyle = 'dashed') 100 | ax[0, 2].axhline(y = f1, xmin = -0.1, xmax = 3.0, linewidth = 1, color = 'k', linestyle = 'dashed') 101 | ax[1, 2].axhline(y = f1, xmin = -0.1, xmax = 3.0, linewidth = 1, color = 'k', linestyle = 'dashed') 102 | 103 | # Set y-limits for score panels 104 | ax[0, 1].set_ylim((0, 1)) 105 | ax[0, 2].set_ylim((0, 1)) 106 | ax[1, 1].set_ylim((0, 1)) 107 | ax[1, 2].set_ylim((0, 1)) 108 | 109 | # Set additional plots invisibles 110 | ax[0, 3].set_visible(False) 111 | ax[1, 3].axis('off') 112 | 113 | # Create legend 114 | for i, learner in enumerate(results.keys()): 115 | pl.bar(0, 0, color=colors[i], label=learner) 116 | pl.legend() 117 | 118 | # Aesthetics 119 | pl.suptitle("Performance Metrics for Three Supervised Learning Models", fontsize = 16, y = 1.10) 120 | pl.tight_layout() 121 | pl.show() 122 | 123 | 124 | def feature_plot(importances, X_train, y_train): 125 | 126 | # Display the five most important features 127 | indices = np.argsort(importances)[::-1] 128 | columns = X_train.columns.values[indices[:5]] 129 | values = importances[indices][:5] 130 | 131 | # Creat the plot 132 | fig = pl.figure(figsize = (9,5)) 133 | pl.title("Normalized Weights for First Five Most Predictive Features", fontsize = 16) 134 | pl.bar(np.arange(5), values, width = 0.6, align="center", color = '#00A000', \ 135 | label = "Feature Weight") 136 | pl.bar(np.arange(5) - 0.3, np.cumsum(values), width = 0.2, align = "center", color = '#00A0A0', \ 137 | label = "Cumulative Feature Weight") 138 | pl.xticks(np.arange(5), columns) 139 | pl.xlim((-0.5, 4.5)) 140 | pl.ylabel("Weight", fontsize = 12) 141 | pl.xlabel("Feature", fontsize = 12) 142 | 143 | pl.legend(loc = 'upper center') 144 | pl.tight_layout() 145 | pl.show() 146 | --------------------------------------------------------------------------------