├── customer_segments
│   └── customer_segments
│       ├── README.md
│       ├── customer_segments.ipynb
│       ├── customers.csv
│       ├── report.html
│       └── visuals.py
└── finding_donors
    └── finding_donors
        ├── README.md
        ├── census.csv
        ├── finding_donors.ipynb
        ├── project_description.md
        ├── report.html
        └── visuals.py
/customer_segments/customer_segments/README.md:
--------------------------------------------------------------------------------
1 | # Machine Learning Engineer Nanodegree
2 | ## Project 4: Creating Customer Segments
3 |
4 | ### Project Description
5 |
6 | This is the 4th project for the Machine Learning Engineer Nanodegree. In this project I apply unsupervised learning techniques on product spending data collected for customers of a wholesale distributor to identify customer segments hidden in the data.
7 |
8 | Here, I first explore the data to determine whether any product categories correlate highly with one another, both by observing a small subset of the data and by plotting a scatter matrix. Afterwards, I preprocess the data by scaling each product category and then identifying (and removing) unwanted outliers. Then I apply PCA transformations to the data and implement a clustering algorithm (Gaussian Mixture Model) to segment the transformed customer data. Finally, I compare the segmentation found with an additional labeling and consider ways this information could assist the wholesale distributor with future service changes.
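
The preprocessing step described above does not spell out how the values are scaled or how outliers are flagged. As a minimal sketch, assuming natural-log scaling and Tukey's rule (a point is an outlier when it lies more than 1.5 times the interquartile range outside the quartiles), the idea could look like the following. The helper names `percentile` and `tukey_outliers` are hypothetical, and the notebook may handle this step differently.

```python
import math


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    position = (len(sorted_values) - 1) * q / 100.0
    lower = int(math.floor(position))
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def tukey_outliers(values, k=1.5):
    """Return indices whose log-scaled value lies more than k * IQR outside Q1..Q3."""
    logged = [math.log(v) for v in values]  # assumed scaling: natural log
    ordered = sorted(logged)
    q1, q3 = percentile(ordered, 25), percentile(ordered, 75)
    step = k * (q3 - q1)
    return [i for i, x in enumerate(logged) if x < q1 - step or x > q3 + step]


# 'Fresh' spending from the first ten rows of customers.csv, followed by the
# value 3, which also occurs in the 'Fresh' column further down the file.
fresh = [12669, 7057, 6353, 13265, 22615, 9413, 12126, 7579, 5963, 6006, 3]
outlier_indices = tukey_outliers(fresh)
```

Working on the log scale keeps a few very large spenders from dominating the quartiles; whether a flagged point should actually be dropped remains a judgment call for the analyst.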
9 |
10 | ### Install
11 |
12 | This project requires **Python 2.7** and the following Python libraries installed:
13 |
14 | - [NumPy](http://www.numpy.org/)
15 | - [Pandas](http://pandas.pydata.org)
16 | - [matplotlib](http://matplotlib.org/)
17 | - [scikit-learn](http://scikit-learn.org/stable/)
18 |
19 | You will also need to have software installed to run and execute a [Jupyter Notebook](http://ipython.org/notebook.html).
20 |
21 | If you do not have Python installed yet, it is highly recommended that you install the [Anaconda](http://continuum.io/downloads) distribution of Python, which already has the above packages and more included. Make sure that you select the Python 2.7 installer and not the Python 3.x installer.
22 |
23 | ### Code
24 |
25 | The main code for this project is located in the `customer_segments.ipynb` notebook file. Supporting code for the visualizations can be found in `visuals.py`. The `report.html` file contains a snapshot of the Jupyter notebook with all code cells executed.
26 |
27 | ### Run
28 |
29 | In a terminal or command window, navigate to the top-level project directory `customer_segments/` (that contains this README) and run one of the following commands:
30 |
31 | ```bash
32 | ipython notebook customer_segments.ipynb
33 | ```
34 | or
35 | ```bash
36 | jupyter notebook customer_segments.ipynb
37 | ```
38 |
39 | This will open the Jupyter Notebook software and project file in your browser.
40 |
41 | ### Data
42 |
43 | The customer segments data is included as a selection of 440 data points collected from clients of a wholesale distributor in Lisbon, Portugal. More information can be found on the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Wholesale+customers).
44 |
45 | Note: (m.u.) is shorthand for *monetary units*.
46 |
47 | **Features**
48 | 1) `Fresh`: annual spending (m.u.) on fresh products (Continuous);
49 | 2) `Milk`: annual spending (m.u.) on milk products (Continuous);
50 | 3) `Grocery`: annual spending (m.u.) on grocery products (Continuous);
51 | 4) `Frozen`: annual spending (m.u.) on frozen products (Continuous);
52 | 5) `Detergents_Paper`: annual spending (m.u.) on detergents and paper products (Continuous);
53 | 6) `Delicatessen`: annual spending (m.u.) on delicatessen products (Continuous);
54 | 7) `Channel`: {Hotel/Restaurant/Cafe - 1, Retail - 2} (Nominal)
55 | 8) `Region`: {Lisbon - 1, Oporto - 2, or Other - 3} (Nominal)
56 |
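
To make the feature list concrete, here is a small sketch that reads rows in the `customers.csv` layout and separates the two nominal columns from the six spending columns. It uses only the standard library so it runs without pandas; the function name `load_customers` and the lookup tables are illustrative and not part of the project code. The code-to-label mappings follow the feature descriptions above.

```python
import csv
import io

# Header and three data rows copied from customers.csv
SAMPLE = """Channel,Region,Fresh,Milk,Grocery,Frozen,Detergents_Paper,Delicatessen
2,3,12669,9656,7561,214,2674,1338
2,3,7057,9810,9568,1762,3293,1776
1,3,13265,1196,4221,6404,507,1788
"""

CHANNELS = {1: "Hotel/Restaurant/Cafe", 2: "Retail"}
REGIONS = {1: "Lisbon", 2: "Oporto", 3: "Other"}
NOMINAL = ("Channel", "Region")


def load_customers(handle):
    """Split each row into decoded nominal labels and integer spending values."""
    labels, spending = [], []
    for row in csv.DictReader(handle):
        labels.append((CHANNELS[int(row["Channel"])], REGIONS[int(row["Region"])]))
        spending.append({k: int(v) for k, v in row.items() if k not in NOMINAL})
    return labels, spending


labels, spending = load_customers(io.StringIO(SAMPLE))
```

To read the real file instead of the inline sample, pass an open file handle, for example `with open("customers.csv", newline="") as f: load_customers(f)`.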
--------------------------------------------------------------------------------
/customer_segments/customer_segments/customers.csv:
--------------------------------------------------------------------------------
1 | Channel,Region,Fresh,Milk,Grocery,Frozen,Detergents_Paper,Delicatessen
2 | 2,3,12669,9656,7561,214,2674,1338
3 | 2,3,7057,9810,9568,1762,3293,1776
4 | 2,3,6353,8808,7684,2405,3516,7844
5 | 1,3,13265,1196,4221,6404,507,1788
6 | 2,3,22615,5410,7198,3915,1777,5185
7 | 2,3,9413,8259,5126,666,1795,1451
8 | 2,3,12126,3199,6975,480,3140,545
9 | 2,3,7579,4956,9426,1669,3321,2566
10 | 1,3,5963,3648,6192,425,1716,750
11 | 2,3,6006,11093,18881,1159,7425,2098
12 | 2,3,3366,5403,12974,4400,5977,1744
13 | 2,3,13146,1124,4523,1420,549,497
14 | 2,3,31714,12319,11757,287,3881,2931
15 | 2,3,21217,6208,14982,3095,6707,602
16 | 2,3,24653,9465,12091,294,5058,2168
17 | 1,3,10253,1114,3821,397,964,412
18 | 2,3,1020,8816,12121,134,4508,1080
19 | 1,3,5876,6157,2933,839,370,4478
20 | 2,3,18601,6327,10099,2205,2767,3181
21 | 1,3,7780,2495,9464,669,2518,501
22 | 2,3,17546,4519,4602,1066,2259,2124
23 | 1,3,5567,871,2010,3383,375,569
24 | 1,3,31276,1917,4469,9408,2381,4334
25 | 2,3,26373,36423,22019,5154,4337,16523
26 | 2,3,22647,9776,13792,2915,4482,5778
27 | 2,3,16165,4230,7595,201,4003,57
28 | 1,3,9898,961,2861,3151,242,833
29 | 1,3,14276,803,3045,485,100,518
30 | 2,3,4113,20484,25957,1158,8604,5206
31 | 1,3,43088,2100,2609,1200,1107,823
32 | 1,3,18815,3610,11107,1148,2134,2963
33 | 1,3,2612,4339,3133,2088,820,985
34 | 1,3,21632,1318,2886,266,918,405
35 | 1,3,29729,4786,7326,6130,361,1083
36 | 1,3,1502,1979,2262,425,483,395
37 | 2,3,688,5491,11091,833,4239,436
38 | 1,3,29955,4362,5428,1729,862,4626
39 | 2,3,15168,10556,12477,1920,6506,714
40 | 2,3,4591,15729,16709,33,6956,433
41 | 1,3,56159,555,902,10002,212,2916
42 | 1,3,24025,4332,4757,9510,1145,5864
43 | 1,3,19176,3065,5956,2033,2575,2802
44 | 2,3,10850,7555,14961,188,6899,46
45 | 2,3,630,11095,23998,787,9529,72
46 | 2,3,9670,7027,10471,541,4618,65
47 | 2,3,5181,22044,21531,1740,7353,4985
48 | 2,3,3103,14069,21955,1668,6792,1452
49 | 2,3,44466,54259,55571,7782,24171,6465
50 | 2,3,11519,6152,10868,584,5121,1476
51 | 2,3,4967,21412,28921,1798,13583,1163
52 | 1,3,6269,1095,1980,3860,609,2162
53 | 1,3,3347,4051,6996,239,1538,301
54 | 2,3,40721,3916,5876,532,2587,1278
55 | 2,3,491,10473,11532,744,5611,224
56 | 1,3,27329,1449,1947,2436,204,1333
57 | 1,3,5264,3683,5005,1057,2024,1130
58 | 2,3,4098,29892,26866,2616,17740,1340
59 | 2,3,5417,9933,10487,38,7572,1282
60 | 1,3,13779,1970,1648,596,227,436
61 | 1,3,6137,5360,8040,129,3084,1603
62 | 2,3,8590,3045,7854,96,4095,225
63 | 2,3,35942,38369,59598,3254,26701,2017
64 | 2,3,7823,6245,6544,4154,4074,964
65 | 2,3,9396,11601,15775,2896,7677,1295
66 | 1,3,4760,1227,3250,3724,1247,1145
67 | 2,3,85,20959,45828,36,24231,1423
68 | 1,3,9,1534,7417,175,3468,27
69 | 2,3,19913,6759,13462,1256,5141,834
70 | 1,3,2446,7260,3993,5870,788,3095
71 | 1,3,8352,2820,1293,779,656,144
72 | 1,3,16705,2037,3202,10643,116,1365
73 | 1,3,18291,1266,21042,5373,4173,14472
74 | 1,3,4420,5139,2661,8872,1321,181
75 | 2,3,19899,5332,8713,8132,764,648
76 | 2,3,8190,6343,9794,1285,1901,1780
77 | 1,3,20398,1137,3,4407,3,975
78 | 1,3,717,3587,6532,7530,529,894
79 | 2,3,12205,12697,28540,869,12034,1009
80 | 1,3,10766,1175,2067,2096,301,167
81 | 1,3,1640,3259,3655,868,1202,1653
82 | 1,3,7005,829,3009,430,610,529
83 | 2,3,219,9540,14403,283,7818,156
84 | 2,3,10362,9232,11009,737,3537,2342
85 | 1,3,20874,1563,1783,2320,550,772
86 | 2,3,11867,3327,4814,1178,3837,120
87 | 2,3,16117,46197,92780,1026,40827,2944
88 | 2,3,22925,73498,32114,987,20070,903
89 | 1,3,43265,5025,8117,6312,1579,14351
90 | 1,3,7864,542,4042,9735,165,46
91 | 1,3,24904,3836,5330,3443,454,3178
92 | 1,3,11405,596,1638,3347,69,360
93 | 1,3,12754,2762,2530,8693,627,1117
94 | 2,3,9198,27472,32034,3232,18906,5130
95 | 1,3,11314,3090,2062,35009,71,2698
96 | 2,3,5626,12220,11323,206,5038,244
97 | 1,3,3,2920,6252,440,223,709
98 | 2,3,23,2616,8118,145,3874,217
99 | 1,3,403,254,610,774,54,63
100 | 1,3,503,112,778,895,56,132
101 | 1,3,9658,2182,1909,5639,215,323
102 | 2,3,11594,7779,12144,3252,8035,3029
103 | 2,3,1420,10810,16267,1593,6766,1838
104 | 2,3,2932,6459,7677,2561,4573,1386
105 | 1,3,56082,3504,8906,18028,1480,2498
106 | 1,3,14100,2132,3445,1336,1491,548
107 | 1,3,15587,1014,3970,910,139,1378
108 | 2,3,1454,6337,10704,133,6830,1831
109 | 2,3,8797,10646,14886,2471,8969,1438
110 | 2,3,1531,8397,6981,247,2505,1236
111 | 2,3,1406,16729,28986,673,836,3
112 | 1,3,11818,1648,1694,2276,169,1647
113 | 2,3,12579,11114,17569,805,6457,1519
114 | 1,3,19046,2770,2469,8853,483,2708
115 | 1,3,14438,2295,1733,3220,585,1561
116 | 1,3,18044,1080,2000,2555,118,1266
117 | 1,3,11134,793,2988,2715,276,610
118 | 1,3,11173,2521,3355,1517,310,222
119 | 1,3,6990,3880,5380,1647,319,1160
120 | 1,3,20049,1891,2362,5343,411,933
121 | 1,3,8258,2344,2147,3896,266,635
122 | 1,3,17160,1200,3412,2417,174,1136
123 | 1,3,4020,3234,1498,2395,264,255
124 | 1,3,12212,201,245,1991,25,860
125 | 2,3,11170,10769,8814,2194,1976,143
126 | 1,3,36050,1642,2961,4787,500,1621
127 | 1,3,76237,3473,7102,16538,778,918
128 | 1,3,19219,1840,1658,8195,349,483
129 | 2,3,21465,7243,10685,880,2386,2749
130 | 1,3,140,8847,3823,142,1062,3
131 | 1,3,42312,926,1510,1718,410,1819
132 | 1,3,7149,2428,699,6316,395,911
133 | 1,3,2101,589,314,346,70,310
134 | 1,3,14903,2032,2479,576,955,328
135 | 1,3,9434,1042,1235,436,256,396
136 | 1,3,7388,1882,2174,720,47,537
137 | 1,3,6300,1289,2591,1170,199,326
138 | 1,3,4625,8579,7030,4575,2447,1542
139 | 1,3,3087,8080,8282,661,721,36
140 | 1,3,13537,4257,5034,155,249,3271
141 | 1,3,5387,4979,3343,825,637,929
142 | 1,3,17623,4280,7305,2279,960,2616
143 | 1,3,30379,13252,5189,321,51,1450
144 | 1,3,37036,7152,8253,2995,20,3
145 | 1,3,10405,1596,1096,8425,399,318
146 | 1,3,18827,3677,1988,118,516,201
147 | 2,3,22039,8384,34792,42,12591,4430
148 | 1,3,7769,1936,2177,926,73,520
149 | 1,3,9203,3373,2707,1286,1082,526
150 | 1,3,5924,584,542,4052,283,434
151 | 1,3,31812,1433,1651,800,113,1440
152 | 1,3,16225,1825,1765,853,170,1067
153 | 1,3,1289,3328,2022,531,255,1774
154 | 1,3,18840,1371,3135,3001,352,184
155 | 1,3,3463,9250,2368,779,302,1627
156 | 1,3,622,55,137,75,7,8
157 | 2,3,1989,10690,19460,233,11577,2153
158 | 2,3,3830,5291,14855,317,6694,3182
159 | 1,3,17773,1366,2474,3378,811,418
160 | 2,3,2861,6570,9618,930,4004,1682
161 | 2,3,355,7704,14682,398,8077,303
162 | 2,3,1725,3651,12822,824,4424,2157
163 | 1,3,12434,540,283,1092,3,2233
164 | 1,3,15177,2024,3810,2665,232,610
165 | 2,3,5531,15726,26870,2367,13726,446
166 | 2,3,5224,7603,8584,2540,3674,238
167 | 2,3,15615,12653,19858,4425,7108,2379
168 | 2,3,4822,6721,9170,993,4973,3637
169 | 1,3,2926,3195,3268,405,1680,693
170 | 1,3,5809,735,803,1393,79,429
171 | 1,3,5414,717,2155,2399,69,750
172 | 2,3,260,8675,13430,1116,7015,323
173 | 2,3,200,25862,19816,651,8773,6250
174 | 1,3,955,5479,6536,333,2840,707
175 | 2,3,514,7677,19805,937,9836,716
176 | 1,3,286,1208,5241,2515,153,1442
177 | 2,3,2343,7845,11874,52,4196,1697
178 | 1,3,45640,6958,6536,7368,1532,230
179 | 1,3,12759,7330,4533,1752,20,2631
180 | 1,3,11002,7075,4945,1152,120,395
181 | 1,3,3157,4888,2500,4477,273,2165
182 | 1,3,12356,6036,8887,402,1382,2794
183 | 1,3,112151,29627,18148,16745,4948,8550
184 | 1,3,694,8533,10518,443,6907,156
185 | 1,3,36847,43950,20170,36534,239,47943
186 | 1,3,327,918,4710,74,334,11
187 | 1,3,8170,6448,1139,2181,58,247
188 | 1,3,3009,521,854,3470,949,727
189 | 1,3,2438,8002,9819,6269,3459,3
190 | 2,3,8040,7639,11687,2758,6839,404
191 | 2,3,834,11577,11522,275,4027,1856
192 | 1,3,16936,6250,1981,7332,118,64
193 | 1,3,13624,295,1381,890,43,84
194 | 1,3,5509,1461,2251,547,187,409
195 | 2,3,180,3485,20292,959,5618,666
196 | 1,3,7107,1012,2974,806,355,1142
197 | 1,3,17023,5139,5230,7888,330,1755
198 | 1,1,30624,7209,4897,18711,763,2876
199 | 2,1,2427,7097,10391,1127,4314,1468
200 | 1,1,11686,2154,6824,3527,592,697
201 | 1,1,9670,2280,2112,520,402,347
202 | 2,1,3067,13240,23127,3941,9959,731
203 | 2,1,4484,14399,24708,3549,14235,1681
204 | 1,1,25203,11487,9490,5065,284,6854
205 | 1,1,583,685,2216,469,954,18
206 | 1,1,1956,891,5226,1383,5,1328
207 | 2,1,1107,11711,23596,955,9265,710
208 | 1,1,6373,780,950,878,288,285
209 | 2,1,2541,4737,6089,2946,5316,120
210 | 1,1,1537,3748,5838,1859,3381,806
211 | 2,1,5550,12729,16767,864,12420,797
212 | 1,1,18567,1895,1393,1801,244,2100
213 | 2,1,12119,28326,39694,4736,19410,2870
214 | 1,1,7291,1012,2062,1291,240,1775
215 | 1,1,3317,6602,6861,1329,3961,1215
216 | 2,1,2362,6551,11364,913,5957,791
217 | 1,1,2806,10765,15538,1374,5828,2388
218 | 2,1,2532,16599,36486,179,13308,674
219 | 1,1,18044,1475,2046,2532,130,1158
220 | 2,1,18,7504,15205,1285,4797,6372
221 | 1,1,4155,367,1390,2306,86,130
222 | 1,1,14755,899,1382,1765,56,749
223 | 1,1,5396,7503,10646,91,4167,239
224 | 1,1,5041,1115,2856,7496,256,375
225 | 2,1,2790,2527,5265,5612,788,1360
226 | 1,1,7274,659,1499,784,70,659
227 | 1,1,12680,3243,4157,660,761,786
228 | 2,1,20782,5921,9212,1759,2568,1553
229 | 1,1,4042,2204,1563,2286,263,689
230 | 1,1,1869,577,572,950,4762,203
231 | 1,1,8656,2746,2501,6845,694,980
232 | 2,1,11072,5989,5615,8321,955,2137
233 | 1,1,2344,10678,3828,1439,1566,490
234 | 1,1,25962,1780,3838,638,284,834
235 | 1,1,964,4984,3316,937,409,7
236 | 1,1,15603,2703,3833,4260,325,2563
237 | 1,1,1838,6380,2824,1218,1216,295
238 | 1,1,8635,820,3047,2312,415,225
239 | 1,1,18692,3838,593,4634,28,1215
240 | 1,1,7363,475,585,1112,72,216
241 | 1,1,47493,2567,3779,5243,828,2253
242 | 1,1,22096,3575,7041,11422,343,2564
243 | 1,1,24929,1801,2475,2216,412,1047
244 | 1,1,18226,659,2914,3752,586,578
245 | 1,1,11210,3576,5119,561,1682,2398
246 | 1,1,6202,7775,10817,1183,3143,1970
247 | 2,1,3062,6154,13916,230,8933,2784
248 | 1,1,8885,2428,1777,1777,430,610
249 | 1,1,13569,346,489,2077,44,659
250 | 1,1,15671,5279,2406,559,562,572
251 | 1,1,8040,3795,2070,6340,918,291
252 | 1,1,3191,1993,1799,1730,234,710
253 | 2,1,6134,23133,33586,6746,18594,5121
254 | 1,1,6623,1860,4740,7683,205,1693
255 | 1,1,29526,7961,16966,432,363,1391
256 | 1,1,10379,17972,4748,4686,1547,3265
257 | 1,1,31614,489,1495,3242,111,615
258 | 1,1,11092,5008,5249,453,392,373
259 | 1,1,8475,1931,1883,5004,3593,987
260 | 1,1,56083,4563,2124,6422,730,3321
261 | 1,1,53205,4959,7336,3012,967,818
262 | 1,1,9193,4885,2157,327,780,548
263 | 1,1,7858,1110,1094,6818,49,287
264 | 1,1,23257,1372,1677,982,429,655
265 | 1,1,2153,1115,6684,4324,2894,411
266 | 2,1,1073,9679,15445,61,5980,1265
267 | 1,1,5909,23527,13699,10155,830,3636
268 | 2,1,572,9763,22182,2221,4882,2563
269 | 1,1,20893,1222,2576,3975,737,3628
270 | 2,1,11908,8053,19847,1069,6374,698
271 | 1,1,15218,258,1138,2516,333,204
272 | 1,1,4720,1032,975,5500,197,56
273 | 1,1,2083,5007,1563,1120,147,1550
274 | 1,1,514,8323,6869,529,93,1040
275 | 1,3,36817,3045,1493,4802,210,1824
276 | 1,3,894,1703,1841,744,759,1153
277 | 1,3,680,1610,223,862,96,379
278 | 1,3,27901,3749,6964,4479,603,2503
279 | 1,3,9061,829,683,16919,621,139
280 | 1,3,11693,2317,2543,5845,274,1409
281 | 2,3,17360,6200,9694,1293,3620,1721
282 | 1,3,3366,2884,2431,977,167,1104
283 | 2,3,12238,7108,6235,1093,2328,2079
284 | 1,3,49063,3965,4252,5970,1041,1404
285 | 1,3,25767,3613,2013,10303,314,1384
286 | 1,3,68951,4411,12609,8692,751,2406
287 | 1,3,40254,640,3600,1042,436,18
288 | 1,3,7149,2247,1242,1619,1226,128
289 | 1,3,15354,2102,2828,8366,386,1027
290 | 1,3,16260,594,1296,848,445,258
291 | 1,3,42786,286,471,1388,32,22
292 | 1,3,2708,2160,2642,502,965,1522
293 | 1,3,6022,3354,3261,2507,212,686
294 | 1,3,2838,3086,4329,3838,825,1060
295 | 2,2,3996,11103,12469,902,5952,741
296 | 1,2,21273,2013,6550,909,811,1854
297 | 2,2,7588,1897,5234,417,2208,254
298 | 1,2,19087,1304,3643,3045,710,898
299 | 2,2,8090,3199,6986,1455,3712,531
300 | 2,2,6758,4560,9965,934,4538,1037
301 | 1,2,444,879,2060,264,290,259
302 | 2,2,16448,6243,6360,824,2662,2005
303 | 2,2,5283,13316,20399,1809,8752,172
304 | 2,2,2886,5302,9785,364,6236,555
305 | 2,2,2599,3688,13829,492,10069,59
306 | 2,2,161,7460,24773,617,11783,2410
307 | 2,2,243,12939,8852,799,3909,211
308 | 2,2,6468,12867,21570,1840,7558,1543
309 | 1,2,17327,2374,2842,1149,351,925
310 | 1,2,6987,1020,3007,416,257,656
311 | 2,2,918,20655,13567,1465,6846,806
312 | 1,2,7034,1492,2405,12569,299,1117
313 | 1,2,29635,2335,8280,3046,371,117
314 | 2,2,2137,3737,19172,1274,17120,142
315 | 1,2,9784,925,2405,4447,183,297
316 | 1,2,10617,1795,7647,1483,857,1233
317 | 2,2,1479,14982,11924,662,3891,3508
318 | 1,2,7127,1375,2201,2679,83,1059
319 | 1,2,1182,3088,6114,978,821,1637
320 | 1,2,11800,2713,3558,2121,706,51
321 | 2,2,9759,25071,17645,1128,12408,1625
322 | 1,2,1774,3696,2280,514,275,834
323 | 1,2,9155,1897,5167,2714,228,1113
324 | 1,2,15881,713,3315,3703,1470,229
325 | 1,2,13360,944,11593,915,1679,573
326 | 1,2,25977,3587,2464,2369,140,1092
327 | 1,2,32717,16784,13626,60869,1272,5609
328 | 1,2,4414,1610,1431,3498,387,834
329 | 1,2,542,899,1664,414,88,522
330 | 1,2,16933,2209,3389,7849,210,1534
331 | 1,2,5113,1486,4583,5127,492,739
332 | 1,2,9790,1786,5109,3570,182,1043
333 | 2,2,11223,14881,26839,1234,9606,1102
334 | 1,2,22321,3216,1447,2208,178,2602
335 | 2,2,8565,4980,67298,131,38102,1215
336 | 2,2,16823,928,2743,11559,332,3486
337 | 2,2,27082,6817,10790,1365,4111,2139
338 | 1,2,13970,1511,1330,650,146,778
339 | 1,2,9351,1347,2611,8170,442,868
340 | 1,2,3,333,7021,15601,15,550
341 | 1,2,2617,1188,5332,9584,573,1942
342 | 2,3,381,4025,9670,388,7271,1371
343 | 2,3,2320,5763,11238,767,5162,2158
344 | 1,3,255,5758,5923,349,4595,1328
345 | 2,3,1689,6964,26316,1456,15469,37
346 | 1,3,3043,1172,1763,2234,217,379
347 | 1,3,1198,2602,8335,402,3843,303
348 | 2,3,2771,6939,15541,2693,6600,1115
349 | 2,3,27380,7184,12311,2809,4621,1022
350 | 1,3,3428,2380,2028,1341,1184,665
351 | 2,3,5981,14641,20521,2005,12218,445
352 | 1,3,3521,1099,1997,1796,173,995
353 | 2,3,1210,10044,22294,1741,12638,3137
354 | 1,3,608,1106,1533,830,90,195
355 | 2,3,117,6264,21203,228,8682,1111
356 | 1,3,14039,7393,2548,6386,1333,2341
357 | 1,3,190,727,2012,245,184,127
358 | 1,3,22686,134,218,3157,9,548
359 | 2,3,37,1275,22272,137,6747,110
360 | 1,3,759,18664,1660,6114,536,4100
361 | 1,3,796,5878,2109,340,232,776
362 | 1,3,19746,2872,2006,2601,468,503
363 | 1,3,4734,607,864,1206,159,405
364 | 1,3,2121,1601,2453,560,179,712
365 | 1,3,4627,997,4438,191,1335,314
366 | 1,3,2615,873,1524,1103,514,468
367 | 2,3,4692,6128,8025,1619,4515,3105
368 | 1,3,9561,2217,1664,1173,222,447
369 | 1,3,3477,894,534,1457,252,342
370 | 1,3,22335,1196,2406,2046,101,558
371 | 1,3,6211,337,683,1089,41,296
372 | 2,3,39679,3944,4955,1364,523,2235
373 | 1,3,20105,1887,1939,8164,716,790
374 | 1,3,3884,3801,1641,876,397,4829
375 | 2,3,15076,6257,7398,1504,1916,3113
376 | 1,3,6338,2256,1668,1492,311,686
377 | 1,3,5841,1450,1162,597,476,70
378 | 2,3,3136,8630,13586,5641,4666,1426
379 | 1,3,38793,3154,2648,1034,96,1242
380 | 1,3,3225,3294,1902,282,68,1114
381 | 2,3,4048,5164,10391,130,813,179
382 | 1,3,28257,944,2146,3881,600,270
383 | 1,3,17770,4591,1617,9927,246,532
384 | 1,3,34454,7435,8469,2540,1711,2893
385 | 1,3,1821,1364,3450,4006,397,361
386 | 1,3,10683,21858,15400,3635,282,5120
387 | 1,3,11635,922,1614,2583,192,1068
388 | 1,3,1206,3620,2857,1945,353,967
389 | 1,3,20918,1916,1573,1960,231,961
390 | 1,3,9785,848,1172,1677,200,406
391 | 1,3,9385,1530,1422,3019,227,684
392 | 1,3,3352,1181,1328,5502,311,1000
393 | 1,3,2647,2761,2313,907,95,1827
394 | 1,3,518,4180,3600,659,122,654
395 | 1,3,23632,6730,3842,8620,385,819
396 | 1,3,12377,865,3204,1398,149,452
397 | 1,3,9602,1316,1263,2921,841,290
398 | 2,3,4515,11991,9345,2644,3378,2213
399 | 1,3,11535,1666,1428,6838,64,743
400 | 1,3,11442,1032,582,5390,74,247
401 | 1,3,9612,577,935,1601,469,375
402 | 1,3,4446,906,1238,3576,153,1014
403 | 1,3,27167,2801,2128,13223,92,1902
404 | 1,3,26539,4753,5091,220,10,340
405 | 1,3,25606,11006,4604,127,632,288
406 | 1,3,18073,4613,3444,4324,914,715
407 | 1,3,6884,1046,1167,2069,593,378
408 | 1,3,25066,5010,5026,9806,1092,960
409 | 2,3,7362,12844,18683,2854,7883,553
410 | 2,3,8257,3880,6407,1646,2730,344
411 | 1,3,8708,3634,6100,2349,2123,5137
412 | 1,3,6633,2096,4563,1389,1860,1892
413 | 1,3,2126,3289,3281,1535,235,4365
414 | 1,3,97,3605,12400,98,2970,62
415 | 1,3,4983,4859,6633,17866,912,2435
416 | 1,3,5969,1990,3417,5679,1135,290
417 | 2,3,7842,6046,8552,1691,3540,1874
418 | 2,3,4389,10940,10908,848,6728,993
419 | 1,3,5065,5499,11055,364,3485,1063
420 | 2,3,660,8494,18622,133,6740,776
421 | 1,3,8861,3783,2223,633,1580,1521
422 | 1,3,4456,5266,13227,25,6818,1393
423 | 2,3,17063,4847,9053,1031,3415,1784
424 | 1,3,26400,1377,4172,830,948,1218
425 | 2,3,17565,3686,4657,1059,1803,668
426 | 2,3,16980,2884,12232,874,3213,249
427 | 1,3,11243,2408,2593,15348,108,1886
428 | 1,3,13134,9347,14316,3141,5079,1894
429 | 1,3,31012,16687,5429,15082,439,1163
430 | 1,3,3047,5970,4910,2198,850,317
431 | 1,3,8607,1750,3580,47,84,2501
432 | 1,3,3097,4230,16483,575,241,2080
433 | 1,3,8533,5506,5160,13486,1377,1498
434 | 1,3,21117,1162,4754,269,1328,395
435 | 1,3,1982,3218,1493,1541,356,1449
436 | 1,3,16731,3922,7994,688,2371,838
437 | 1,3,29703,12051,16027,13135,182,2204
438 | 1,3,39228,1431,764,4510,93,2346
439 | 2,3,14531,15488,30243,437,14841,1867
440 | 1,3,10290,1981,2232,1038,168,2125
441 | 1,3,2787,1698,2510,65,477,52
442 |
--------------------------------------------------------------------------------
/customer_segments/customer_segments/visuals.py:
--------------------------------------------------------------------------------
1 | ###########################################
2 | # Suppress matplotlib user warnings
3 | # Necessary for newer version of matplotlib
4 | import warnings
5 | warnings.filterwarnings("ignore", category = UserWarning, module = "matplotlib")
6 | #
7 | # Display inline matplotlib plots with IPython
8 | from IPython import get_ipython
9 | get_ipython().run_line_magic('matplotlib', 'inline')
10 | ###########################################
11 |
12 | import matplotlib.pyplot as plt
13 | import matplotlib.cm as cm
14 | import pandas as pd
15 | import numpy as np
16 |
17 | def pca_results(good_data, pca):
18 |     '''
19 |     Create a DataFrame of the PCA results
20 |     Includes dimension feature weights and explained variance
21 |     Visualizes the PCA results
22 |     '''
23 |
24 |     # Dimension indexing
25 |     dimensions = ['Dimension {}'.format(i) for i in range(1,len(pca.components_)+1)]
26 |
27 |     # PCA components
28 |     components = pd.DataFrame(np.round(pca.components_, 4), columns = good_data.keys())
29 |     components.index = dimensions
30 |
31 |     # PCA explained variance
32 |     ratios = pca.explained_variance_ratio_.reshape(len(pca.components_), 1)
33 |     variance_ratios = pd.DataFrame(np.round(ratios, 4), columns = ['Explained Variance'])
34 |     variance_ratios.index = dimensions
35 |
36 |     # Create a bar plot visualization
37 |     fig, ax = plt.subplots(figsize = (14,8))
38 |
39 |     # Plot the feature weights as a function of the components
40 |     components.plot(ax = ax, kind = 'bar');
41 |     ax.set_ylabel("Feature Weights")
42 |     ax.set_xticklabels(dimensions, rotation=0)
43 |
44 |
45 |     # Display the explained variance ratios
46 |     for i, ev in enumerate(pca.explained_variance_ratio_):
47 |         ax.text(i-0.40, ax.get_ylim()[1] + 0.05, "Explained Variance\n %.4f"%(ev))
48 |
49 |     # Return a concatenated DataFrame
50 |     return pd.concat([variance_ratios, components], axis = 1)
51 |
52 | def cluster_results(reduced_data, preds, centers, pca_samples):
53 |     '''
54 |     Visualizes the PCA-reduced cluster data in two dimensions
55 |     Adds cues for cluster centers and student-selected sample data
56 |     '''
57 |
58 |     predictions = pd.DataFrame(preds, columns = ['Cluster'])
59 |     plot_data = pd.concat([predictions, reduced_data], axis = 1)
60 |
61 |     # Generate the cluster plot
62 |     fig, ax = plt.subplots(figsize = (14,8))
63 |
64 |     # Color map
65 |     cmap = cm.get_cmap('gist_rainbow')
66 |
67 |     # Color the points based on assigned cluster (max(..., 1) avoids dividing by zero for a single cluster)
68 |     for i, cluster in plot_data.groupby('Cluster'):
69 |         cluster.plot(ax = ax, kind = 'scatter', x = 'Dimension 1', y = 'Dimension 2', \
70 |                      color = cmap((i)*1.0/max(len(centers)-1, 1)), label = 'Cluster %i'%(i), s=30);
71 |
72 |     # Plot centers with indicators
73 |     for i, c in enumerate(centers):
74 |         ax.scatter(x = c[0], y = c[1], color = 'white', edgecolors = 'black', \
75 |                    alpha = 1, linewidth = 2, marker = 'o', s=200);
76 |         ax.scatter(x = c[0], y = c[1], marker='$%d$'%(i), alpha = 1, s=100);
77 |
78 |     # Plot transformed sample points
79 |     ax.scatter(x = pca_samples[:,0], y = pca_samples[:,1], \
80 |                s = 150, linewidth = 4, color = 'black', marker = 'x');
81 |
82 |     # Set plot title
83 |     ax.set_title("Cluster Learning on PCA-Reduced Data - Centroids Marked by Number\nTransformed Sample Data Marked by Black Cross");
84 |
85 |
86 | def biplot(good_data, reduced_data, pca):
87 |     '''
88 |     Produce a biplot that shows a scatterplot of the reduced
89 |     data and the projections of the original features.
90 |
91 |     good_data: original data, before transformation.
92 |                Needs to be a pandas dataframe with valid column names
93 |     reduced_data: the reduced data (the first two dimensions are plotted)
94 |     pca: pca object that contains the components_ attribute
95 |
96 |     return: a matplotlib AxesSubplot object (for any additional customization)
97 |
98 |     This procedure is inspired by the script:
99 |     https://github.com/teddyroland/python-biplot
100 |     '''
101 |
102 |     fig, ax = plt.subplots(figsize = (14,8))
103 |     # scatterplot of the reduced data
104 |     ax.scatter(x=reduced_data.loc[:, 'Dimension 1'], y=reduced_data.loc[:, 'Dimension 2'],
105 |                facecolors='b', edgecolors='b', s=70, alpha=0.5)
106 |
107 |     feature_vectors = pca.components_.T
108 |
109 |     # we use scaling factors to make the arrows easier to see
110 |     arrow_size, text_pos = 7.0, 8.0
111 |
112 |     # projections of the original features
113 |     for i, v in enumerate(feature_vectors):
114 |         ax.arrow(0, 0, arrow_size*v[0], arrow_size*v[1],
115 |                  head_width=0.2, head_length=0.2, linewidth=2, color='red')
116 |         ax.text(v[0]*text_pos, v[1]*text_pos, good_data.columns[i], color='black',
117 |                 ha='center', va='center', fontsize=18)
118 |
119 |     ax.set_xlabel("Dimension 1", fontsize=14)
120 |     ax.set_ylabel("Dimension 2", fontsize=14)
121 |     ax.set_title("PC plane with original feature projections.", fontsize=16);
122 |     return ax
123 |
124 |
125 | def channel_results(reduced_data, outliers, pca_samples):
126 |     '''
127 |     Visualizes the PCA-reduced cluster data in two dimensions using the full dataset
128 |     Data is labeled by "Channel" and cues added for student-selected sample data
129 |     '''
130 |
131 |     # Check that the dataset is loadable
132 |     try:
133 |         full_data = pd.read_csv("customers.csv")
134 |     except Exception:
135 |         print("Dataset could not be loaded. Is the file missing?")
136 |         return False
137 |
138 |     # Create the Channel DataFrame
139 |     channel = pd.DataFrame(full_data['Channel'], columns = ['Channel'])
140 |     channel = channel.drop(channel.index[outliers]).reset_index(drop = True)
141 |     labeled = pd.concat([reduced_data, channel], axis = 1)
142 |
143 |     # Generate the cluster plot
144 |     fig, ax = plt.subplots(figsize = (14,8))
145 |
146 |     # Color map
147 |     cmap = cm.get_cmap('gist_rainbow')
148 |
149 |     # Color the points based on assigned Channel
150 |     labels = ['Hotel/Restaurant/Cafe', 'Retailer']
151 |     grouped = labeled.groupby('Channel')
152 |     for i, channel in grouped:
153 |         channel.plot(ax = ax, kind = 'scatter', x = 'Dimension 1', y = 'Dimension 2', \
154 |                      color = cmap((i-1)*1.0/2), label = labels[i-1], s=30);
155 |
156 |     # Plot transformed sample points
157 |     for i, sample in enumerate(pca_samples):
158 |         ax.scatter(x = sample[0], y = sample[1], \
159 |                    s = 200, linewidth = 3, color = 'black', marker = 'o', facecolors = 'none');
160 |         ax.scatter(x = sample[0]+0.25, y = sample[1]+0.3, marker='$%d$'%(i), alpha = 1, s=125);
161 |
162 |     # Set plot title
163 |     ax.set_title("PCA-Reduced Data Labeled by 'Channel'\nTransformed Sample Data Circled");
--------------------------------------------------------------------------------
/finding_donors/finding_donors/README.md:
--------------------------------------------------------------------------------
1 | # Finding Donors for CharityML
2 |
3 | Investigated factors that affect the likelihood of charity donations being made, based on real census data. Developed a naive classifier as a baseline against which to compare test results. Trained and tested several supervised machine learning models on preprocessed census data to predict the likelihood of donations. Selected the best model based on accuracy, a modified F-scoring metric, and algorithm efficiency.
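
As an illustration of the baseline and the F-scoring metric mentioned above, the sketch below scores a naive predictor that labels every person as a donor. The value `beta=0.5` (which weights precision more heavily than recall) is an assumption, since this README only says the F-score is "modified"; the function names and the four-element label list are hypothetical and not taken from `census.csv`.

```python
def fbeta_score(true_positives, false_positives, false_negatives, beta=0.5):
    """F-beta score computed from raw confusion-matrix counts."""
    precision = true_positives / float(true_positives + false_positives)
    recall = true_positives / float(true_positives + false_negatives)
    beta_sq = beta ** 2
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


def naive_predictor_scores(labels, beta=0.5):
    """Accuracy and F-beta of a classifier that predicts the positive class for everyone.

    Assumes `labels` contains at least one positive (1) entry.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    accuracy = positives / float(len(labels))
    # Predicting "donor" for everyone yields no false negatives;
    # every true non-donor becomes a false positive.
    return accuracy, fbeta_score(positives, negatives, 0, beta)


# Hypothetical labels: 1 = donor, 0 = non-donor
labels = [1, 0, 0, 0]
accuracy, fscore = naive_predictor_scores(labels)
```

Any trained model worth keeping should beat both numbers; a model that only matches them has learned nothing beyond the class balance.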
4 |
5 | See my implementation and report [here](https://github.com/robertyoung2/Finding-Donors-for-CharityML/blob/master/finding_donors.ipynb).
6 |
7 | ## Project Brief
8 |
9 | In this project, you will apply supervised learning techniques and an analytical mind on data collected for the U.S. census to help CharityML (a fictitious charity organization) identify people most likely to donate to their cause. You will first explore the data to learn how the census data is recorded. Next, you will apply a series of transformations and preprocessing techniques to manipulate the data into a workable format. You will then evaluate several supervised learners of your choice on the data, and consider which is best suited for the solution. Afterwards, you will optimize the model you've selected and present it as your solution to CharityML. Finally, you will explore the chosen model and its predictions under the hood, to see just how well it's performing when considering the data it's given.
10 |
11 | ## Project Evaluation
12 |
13 | My project was evaluated against the [Finding Donors for CharityML project rubric](https://github.com/robertyoung2/Finding-Donors-for-CharityML/blob/master/Finding%20Donors%20for%20CharityML%20project%20rubric.pdf).
14 |
15 | ## Files Submitted
16 |
17 | - The `finding_donors.ipynb` notebook file with all questions answered and all code cells executed and displaying output.
18 | - An HTML export of the project notebook with the name `report.html`. This file must be present for your project to be evaluated.
19 |
--------------------------------------------------------------------------------
/finding_donors/finding_donors/project_description.md:
--------------------------------------------------------------------------------
1 | # Content: Supervised Learning
2 | ## Project: Finding Donors for CharityML
3 |
4 | ## Project Overview
5 | In this project, you will apply supervised learning techniques and an analytical mind on data collected for the U.S. census to help CharityML (a fictitious charity organization) identify people most likely to donate to their cause. You will first explore the data to learn how the census data is recorded. Next, you will apply a series of transformations and preprocessing techniques to manipulate the data into a workable format. You will then evaluate several supervised learners of your choice on the data, and consider which is best suited for the solution. Afterwards, you will optimize the model you've selected and present it as your solution to CharityML. Finally, you will explore the chosen model and its predictions under the hood, to see just how well it's performing when considering the data it's given.
7 |
8 | ## Project Highlights
9 | This project is designed to get you acquainted with the many supervised learning algorithms available in sklearn, and to also provide for a method of evaluating just how each model works and performs on a certain type of data. It is important in machine learning to understand exactly when and where a certain algorithm should be used, and when one should be avoided.
10 |
11 | Things you will learn by completing this project:
12 | - How to identify when preprocessing is needed, and how to apply it.
13 | - How to establish a benchmark for a solution to the problem.
14 | - What each of several supervised learning algorithms accomplishes given a specific dataset.
15 | - How to investigate whether a candidate solution model is adequate for the problem.
16 |
17 | ## Software Requirements
18 |
19 | This project uses the following software and Python libraries:
20 |
21 | - [Python 2.7](https://www.python.org/download/releases/2.7/)
22 | - [NumPy](http://www.numpy.org/)
23 | - [Pandas](http://pandas.pydata.org/)
24 | - [scikit-learn](http://scikit-learn.org/stable/)
25 | - [matplotlib](http://matplotlib.org/)
26 |
27 | You will also need to have software installed to run and execute a [Jupyter Notebook](http://ipython.org/notebook.html).
28 |
29 | If you do not have Python installed yet, it is highly recommended that you install the [Anaconda](http://continuum.io/downloads) distribution of Python, which already has the above packages and more included. Make sure that you select the Python 2.7 installer and not the Python 3.x installer.
30 |
31 | ## Starting the Project
32 |
33 | For this assignment, you can find the `finding_donors` folder containing the necessary project files on the [Machine Learning projects GitHub](https://github.com/udacity/machine-learning), under the `projects` folder. You may download all of the files for projects we'll use in this Nanodegree program directly from this repo. Please make sure that you use the most recent version of project files when completing a project!
34 |
35 | This project contains three files:
36 |
37 | - `finding_donors.ipynb`: This is the main file where you will be performing your work on the project.
38 | - `census.csv`: The project dataset. You'll load this data in the notebook.
39 | - `visuals.py`: A Python file containing visualization code that is run behind-the-scenes. Do not modify this file.
40 |
41 | In the Terminal or Command Prompt, navigate to the folder containing the project files, and then use the command `jupyter notebook finding_donors.ipynb` to open up a browser window or tab to work with your notebook. Alternatively, you can use the command `jupyter notebook` or `ipython notebook` and navigate to the notebook file in the browser window that opens. Follow the instructions in the notebook and answer each question presented to successfully complete the project. A **README** file has also been provided with the project files which may contain additional necessary information or instruction for the project.
42 |
43 | ## Submitting the Project
44 |
45 | ### Evaluation
46 | Your project will be reviewed by a Udacity reviewer against the **Finding Donors for CharityML project rubric**. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must be *meeting specifications* for you to pass.
47 |
48 | ### Submission Files
49 | When you are ready to submit your project, collect the following files and compress them into a single archive for upload. Alternatively, you may supply the following files on your GitHub Repo in a folder named `finding_donors` for ease of access:
50 | - The `finding_donors.ipynb` notebook file with all questions answered and all code cells executed and displaying output.
51 | - An **HTML** export of the project notebook with the name **report.html**. This file *must* be present for your project to be evaluated.
52 |
53 | Once you have collected these files and reviewed the project rubric, proceed to the project submission page.
54 |
55 | ### I'm Ready!
56 | When you're ready to submit your project, click on the **Submit Project** button at the bottom of the page.
57 |
58 | If you are having any problems submitting your project or wish to check on the status of your submission, please email us at **machine-support@udacity.com** or visit us in the discussion forums.
59 |
60 | ### What's Next?
61 | You will get an email as soon as your reviewer has feedback for you. In the meantime, review your next project and feel free to get started on it or the courses supporting it!
62 |
--------------------------------------------------------------------------------
/finding_donors/finding_donors/visuals.py:
--------------------------------------------------------------------------------
1 | ###########################################
2 | # Suppress matplotlib user warnings
3 | # Necessary for newer version of matplotlib
4 | import warnings
5 | warnings.filterwarnings("ignore", category = UserWarning, module = "matplotlib")
6 | #
7 | # Display inline matplotlib plots with IPython
8 | from IPython import get_ipython
9 | get_ipython().run_line_magic('matplotlib', 'inline')
10 | ###########################################
11 |
12 | import matplotlib.pyplot as pl
13 | import matplotlib.patches as mpatches
14 | import numpy as np
15 | import pandas as pd
16 | from time import time
17 | from sklearn.metrics import f1_score, accuracy_score
18 |
19 |
20 | def distribution(data, transformed = False):
21 |     """
22 |     Visualization code for displaying skewed distributions of features
23 |     """
24 |
25 |     # Create figure
26 |     fig = pl.figure(figsize = (11,5))
27 |
28 |     # Skewed feature plotting
29 |     for i, feature in enumerate(['capital-gain','capital-loss']):
30 |         ax = fig.add_subplot(1, 2, i+1)
31 |         ax.hist(data[feature], bins = 25, color = '#00A0A0')
32 |         ax.set_title("'%s' Feature Distribution"%(feature), fontsize = 14)
33 |         ax.set_xlabel("Value")
34 |         ax.set_ylabel("Number of Records")
35 |         ax.set_ylim((0, 2000))
36 |         ax.set_yticks([0, 500, 1000, 1500, 2000])
37 |         ax.set_yticklabels([0, 500, 1000, 1500, ">2000"])
38 |
39 |     # Plot aesthetics
40 |     if transformed:
41 |         fig.suptitle("Log-transformed Distributions of Continuous Census Data Features", \
42 |             fontsize = 16, y = 1.03)
43 |     else:
44 |         fig.suptitle("Skewed Distributions of Continuous Census Data Features", \
45 |             fontsize = 16, y = 1.03)
46 |
47 |     fig.tight_layout()
48 |     fig.show()
49 |
50 |
51 | def evaluate(results, accuracy, f1):
52 |     """
53 |     Visualization code to display results of various learners.
54 |
55 |     inputs:
56 |       - results: a dictionary mapping each learner's name to its results for the
57 |         three training set sizes, each a dictionary of statistics from 'train_predict()'
58 |       - accuracy: The accuracy score for the naive predictor
59 |       - f1: The F-score for the naive predictor
60 |     """
61 |
62 |     # Create figure
63 |     fig, ax = pl.subplots(2, 4, figsize = (11,7))
64 |
65 |     # Constants
66 |     bar_width = 0.3
67 |     colors = ['#A00000','#00A0A0','#00A000']
68 |
69 |     # Super loop to plot six panels of data
70 |     for k, learner in enumerate(results.keys()):
71 |         for j, metric in enumerate(['train_time', 'acc_train', 'f_train', 'pred_time', 'acc_test', 'f_test']):
72 |             for i in np.arange(3):
73 |
74 |                 # Creative plot code
75 |                 ax[j//3, j%3].bar(i+k*bar_width, results[learner][i][metric], width = bar_width, color = colors[k])
76 |                 ax[j//3, j%3].set_xticks([0.45, 1.45, 2.45])
77 |                 ax[j//3, j%3].set_xticklabels(["1%", "10%", "100%"])
78 |                 ax[j//3, j%3].set_xlabel("Training Set Size")
79 |                 ax[j//3, j%3].set_xlim((-0.1, 3.0))
80 |
81 |     # Add unique y-labels
82 |     ax[0, 0].set_ylabel("Time (in seconds)")
83 |     ax[0, 1].set_ylabel("Accuracy Score")
84 |     ax[0, 2].set_ylabel("F-score")
85 |     ax[1, 0].set_ylabel("Time (in seconds)")
86 |     ax[1, 1].set_ylabel("Accuracy Score")
87 |     ax[1, 2].set_ylabel("F-score")
88 |
89 |     # Add titles
90 |     ax[0, 0].set_title("Model Training")
91 |     ax[0, 1].set_title("Accuracy Score on Training Subset")
92 |     ax[0, 2].set_title("F-score on Training Subset")
93 |     ax[1, 0].set_title("Model Predicting")
94 |     ax[1, 1].set_title("Accuracy Score on Testing Set")
95 |     ax[1, 2].set_title("F-score on Testing Set")
96 |
97 |     # Add horizontal lines for naive predictors
98 |     ax[0, 1].axhline(y = accuracy, xmin = -0.1, xmax = 3.0, linewidth = 1, color = 'k', linestyle = 'dashed')
99 |     ax[1, 1].axhline(y = accuracy, xmin = -0.1, xmax = 3.0, linewidth = 1, color = 'k', linestyle = 'dashed')
100 |     ax[0, 2].axhline(y = f1, xmin = -0.1, xmax = 3.0, linewidth = 1, color = 'k', linestyle = 'dashed')
101 |     ax[1, 2].axhline(y = f1, xmin = -0.1, xmax = 3.0, linewidth = 1, color = 'k', linestyle = 'dashed')
102 |
103 |     # Set y-limits for score panels
104 |     ax[0, 1].set_ylim((0, 1))
105 |     ax[0, 2].set_ylim((0, 1))
106 |     ax[1, 1].set_ylim((0, 1))
107 |     ax[1, 2].set_ylim((0, 1))
108 |
109 |     # Hide the unused fourth-column panels (the lower one still hosts the legend)
110 |     ax[0, 3].set_visible(False)
111 |     ax[1, 3].axis('off')
112 |
113 |     # Create legend
114 |     for i, learner in enumerate(results.keys()):
115 |         pl.bar(0, 0, color=colors[i], label=learner)
116 |     pl.legend()
117 |
118 |     # Aesthetics
119 |     pl.suptitle("Performance Metrics for Three Supervised Learning Models", fontsize = 16, y = 1.10)
120 |     pl.tight_layout()
121 |     pl.show()
122 |
123 |
124 | def feature_plot(importances, X_train, y_train):
125 |     """Plot the five largest feature importances and their cumulative sum."""
126 |     # Display the five most important features
127 |     indices = np.argsort(importances)[::-1]
128 |     columns = X_train.columns.values[indices[:5]]
129 |     values = importances[indices][:5]
130 |
131 |     # Create the plot
132 |     fig = pl.figure(figsize = (9,5))
133 |     pl.title("Normalized Weights for First Five Most Predictive Features", fontsize = 16)
134 |     pl.bar(np.arange(5), values, width = 0.6, align="center", color = '#00A000', \
135 |           label = "Feature Weight")
136 |     pl.bar(np.arange(5) - 0.3, np.cumsum(values), width = 0.2, align = "center", color = '#00A0A0', \
137 |           label = "Cumulative Feature Weight")
138 |     pl.xticks(np.arange(5), columns)
139 |     pl.xlim((-0.5, 4.5))
140 |     pl.ylabel("Weight", fontsize = 12)
141 |     pl.xlabel("Feature", fontsize = 12)
142 |
143 |     pl.legend(loc = 'upper center')
144 |     pl.tight_layout()
145 |     pl.show()
146 |
--------------------------------------------------------------------------------