├── .gitignore ├── 1-Introduction-to-machine-learning-software-stack └── 1.3 │ ├── 1.3a NumPy Examples │ ├── 1.3b Matplotlib │ ├── 1.3c Pandas Examples │ ├── 1.3d Machine learning with scikit learn │ ├── Machine Learning with Scikit Learn.docx │ └── Matplotlib.docx ├── 2-intro-to-ml └── README.md ├── 3-project-overview └── README.md ├── 4-regression ├── 4.1 - Linear Regression Ad Sales Revenue.ipynb ├── 4.2 - Multivariate Linear Regression.ipynb ├── 4.3 - Regularization and Model Evaluation.ipynb └── README.md ├── 5-classification ├── 5.2 - Logistic Regression in Classifying Breast Cancer .ipynb ├── 5.3 - Visualizing SVMs with Diabetes.ipynb ├── 5.4 - Artificial Neural Networks in Classifying Breast Cancer.ipynb ├── 5.4.2 - Peeking Inside a Neural Network with MNIST Data.ipynb └── README.md ├── 6-clustering ├── README.md ├── example-clustering-2-dimensions.ipynb ├── example-clustering-states.ipynb └── k means visualization │ ├── d3.v3.min.js │ ├── index.html │ └── k-means.js ├── 7 - Practical Methodologies ├── 7---practical-methodologies └── CV ├── Error Analysis and Classification Measures.ipynb ├── Iris PCA.ipynb ├── LICENSE ├── Learning curves and bias-variance tradeoff.ipynb ├── Ng's Machine Learning Exercise 6.ipynb ├── README.md ├── datasets ├── Advertising.csv ├── breast-cancer-wisconson.csv └── pima-indians-diabetes.csv ├── evaluating estimator performance using cross-validation.ipynb ├── examples ├── Gradient.ipynb └── kNN.ipynb ├── images ├── neuralnet.png ├── train_img.png └── weights.png ├── kaggle-data ├── README.md ├── county_facts.csv ├── county_facts_dictionary.csv └── primary_results.csv └── requirements.txt /.gitignore: -------------------------------------------------------------------------------- 1 | #####=== IPythonNotebook ===##### 2 | # Temporary data 3 | .ipynb_checkpoints/ 4 | 5 | #####=== Python ===##### 6 | 7 | # Byte-compiled / optimized / DLL files 8 | __pycache__/ 9 | *.py[cod] 10 | 11 | # C extensions 12 | *.so 13 | 14 | # Distribution / packaging 15 | .Python 16 | env/ 17 | build/ 18 | develop-eggs/ 19 | dist/ 20 | downloads/ 21 | eggs/ 22 | lib/ 23 | lib64/ 24 | parts/ 25 | sdist/ 26 | var/ 27 | *.egg-info/ 28 | .installed.cfg 29 | *.egg 30 | 31 | # PyInstaller 32 | # Usually these files are written by a python script from a template 33 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 34 | *.manifest 35 | *.spec 36 | 37 | # Installer logs 38 | pip-log.txt 39 | pip-delete-this-directory.txt 40 | 41 | # Unit test / coverage reports 42 | htmlcov/ 43 | .tox/ 44 | .coverage 45 | .cache 46 | nosetests.xml 47 | coverage.xml 48 | 49 | # Translations 50 | *.mo 51 | *.pot 52 | 53 | # Django stuff: 54 | *.log 55 | 56 | # Sphinx documentation 57 | docs/_build/ 58 | 59 | # PyBuilder 60 | target/ 61 | 62 | #####=== OSX ===##### 63 | .DS_Store 64 | .AppleDouble 65 | .LSOverride 66 | 67 | # Icon must end with two \r 68 | Icon 69 | 70 | 71 | # Thumbnails 72 | ._* 73 | 74 | # Files that might appear on external disk 75 | .Spotlight-V100 76 | .Trashes 77 | 78 | # Directories potentially created on remote AFP share 79 | .AppleDB 80 | .AppleDesktop 81 | Network Trash Folder 82 | Temporary Items 83 | .apdisk 84 | 85 | -------------------------------------------------------------------------------- /1-Introduction-to-machine-learning-software-stack/1.3/1.3a NumPy Examples: -------------------------------------------------------------------------------- 1 | ARRAYS 2 | 3 | The central feature of NumPy is the array object class. 
Arrays are similar to lists in Python. 4 | Arrays make operations with large amounts of numeric data very fast and are generally much more efficient than lists. 5 | An array can be created from a list: 6 | In [1]: import numpy as np 7 | 8 | In [2]: a = np.array([1,2,3,4,5], float) 9 | 10 | In [3]: a 11 | Out[3]: array([ 1., 2., 3., 4., 5.]) 12 | 13 | We can also find out the type of each member of the list. 14 | In [4]: type(a) 15 | Out[4]: numpy.ndarray 16 | 17 | Arrays can be accessed with the help of index. 18 | In [5]: a[:2] 19 | Out[5]: array([ 1., 2.]) 20 | This indexing gives the elements from the beginning of the array before the 3rd element. 21 | 22 | In [6]: a[3] 23 | Out[6]: 4.0 24 | 25 | In [7]: a[0] 26 | Out[7]: 1.0 27 | 28 | In [8]: a[0] = 7 29 | 30 | In [9]: a 31 | Out[9]: array([ 7., 2., 3., 4., 5.]) 32 | 33 | An example of two-dimensional array – 34 | In [12]: a = np.array([[1,2,3], [4,5,6]], float) 35 | 36 | In [13]: a 37 | Out[13]: 38 | array([[ 1., 2., 3.], 39 | [ 4., 5., 6.]]) 40 | In [14]: a[0,0] 41 | Out[14]: 1.0 42 | 43 | In [15]: a[0,1] 44 | Out[15]: 2.0 45 | 46 | Array slicing works with multiple dimensions in the same way as usual, applying each slice specification as a filter to a specified dimension. Use of a single ":" in a dimension indicates the use of everything along that dimension 47 | In [16]: a[1,:] 48 | Out[16]: array([ 4., 5., 6.]) 49 | 50 | In [17]: a[:,2] 51 | Out[17]: array([ 3., 6.]) 52 | 53 | In [18]: a[-1:, -2:] 54 | Out[18]: array([[ 5., 6.]]) 55 | 56 | The shape property of an array returns a tuple with the size of each array dimension 57 | In [19]: a.shape 58 | Out[19]: (2, 3) 59 | The dtype property tells you what type of values are stored by the array 60 | In [20]: a.dtype 61 | Out[20]: dtype('float64') 62 | 63 | The len function returns the length of the first axis 64 | In [23]: len(a) 65 | Out[23]: 2 66 | 67 | The in statement can be used to test if values are present in an array 68 | In [24]: 2 in a 69 | Out[24]: True 70 | 71 | In [25]: 10 in a 72 | Out[25]: False 73 | 74 | Arrays can be reshaped using tuples that specify new dimensions. In the following example, we turn a twenty-element one-dimensional array into a two-dimensional array whose first axis has five elements and whose second axis has four elements 75 | In [26]: a = np.array(range(20), float) 76 | 77 | In [27]: a 78 | Out[27]: 79 | array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 80 | 11., 12., 13., 14., 15., 16., 17., 18., 19.]) 81 | 82 | In [28]: a = a.reshape((5,4)) 83 | 84 | In [29]: a 85 | Out[29]: 86 | array([[ 0., 1., 2., 3.], 87 | [ 4., 5., 6., 7.], 88 | [ 8., 9., 10., 11.], 89 | [ 12., 13., 14., 15.], 90 | [ 16., 17., 18., 19.]]) 91 | In [30]: a.shape 92 | Out[30]: (5, 4) 93 | The reshape function creates a new array and does not itself modify the original array. 
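As a quick standalone check (a small sketch, separate from the session above): the object returned by reshape is a new ndarray object, while the original array keeps its shape. Note, however, that the reshaped array may share the same underlying data (a view), so assigning into its elements can still change the original values.

import numpy as np

a = np.arange(6, dtype=float)    # a one-dimensional array of 6 elements
b = a.reshape((2, 3))            # a new 2x3 array object is returned
print a.shape                    # (6,)  -- the original shape is unchanged
print b.shape                    # (2, 3)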
94 | 95 | The copy function can be used to create a new, separate copy of an array in memory 96 | In [31]: a = np.array([1,2,3], float) 97 | 98 | In [32]: b = a 99 | 100 | In [33]: c = a.copy() 101 | 102 | In [34]: a[0] = 0 103 | 104 | In [35]: a 105 | Out[35]: array([ 0., 2., 3.]) 106 | 107 | In [36]: b 108 | Out[36]: array([ 0., 2., 3.]) 109 | 110 | In [37]: c 111 | Out[37]: array([ 1., 2., 3.]) 112 | 113 | Lists can also be created from array 114 | In [38]: a - np.array([1,2,3], float) 115 | Out[38]: array([-1., 0., 0.]) 116 | 117 | In [39]: a.tolist() 118 | Out[39]: [0.0, 2.0, 3.0] 119 | 120 | In [40]: list(a) 121 | Out[40]: [0.0, 2.0, 3.0] 122 | We can convert the raw data in an array to a binary string (i.e., not in human-readable form) using the tostring function. The fromstring function then allows an array to be created from this data later on. These routines are sometimes convenient for saving large amount of array data in files that can be read later on. 123 | In [48]: a = np.array([5,4,3], float) 124 | 125 | In [49]: s = a.tostring() 126 | 127 | In [50]: s 128 | Out[50]: '\x00\x00\x00\x00\x00\x00\x14@\x00\x00\x00\x00\x00\x00\x10@\x00\x00\x00\x00\x00\x00\x08@' 129 | 130 | In [51]: np.fromstring(s) 131 | Out[51]: array([ 5., 4., 3.]) 132 | 133 | Array can also be transposed with the help of transpose function 134 | In [52]: a = np.array(range(6), float).reshape((3,2)) 135 | 136 | In [53]: a 137 | Out[53]: 138 | array([[ 0., 1.], 139 | [ 2., 3.], 140 | [ 4., 5.]]) 141 | 142 | In [54]: a.transpose() 143 | Out[54]: 144 | array([[ 0., 2., 4.], 145 | [ 1., 3., 5.]]) 146 | 147 | One-dimensional versions of multi-dimensional arrays can be generated with flatten 148 | In [55]: a = np.array([[1, 2, 3], [4, 5, 6]], float) 149 | 150 | In [56]: a 151 | Out[56]: 152 | array([[ 1., 2., 3.], 153 | [ 4., 5., 6.]]) 154 | 155 | In [57]: a.flatten() 156 | Out[57]: array([ 1., 2., 3., 4., 5., 6.]) 157 | 158 | Two or more arrays can be concatenated together using the concatenate function with a tuple of the arrays to be joined 159 | In [62]: a = np.array([1,2], float) 160 | 161 | In [63]: b = np.array([3,4,5], float) 162 | 163 | In [64]: c = np.array([6,7,8,9], float) 164 | 165 | In [65]: np.concatenate((a, b, c)) 166 | Out[65]: array([ 1., 2., 3., 4., 5., 6., 7., 8., 9.]) 167 | 168 | If an array has more than one dimension, it is possible to specify the axis along which multiple arrays are concatenated. 
By default (without specifying the axis), NumPy concatenates along the first dimension 169 | In [66]: a = np.array([[1, 2], [3, 4]], float) 170 | 171 | In [67]: b = np.array([[5, 6], [7,8]], float) 172 | 173 | In [68]: np.concatenate((a,b)) 174 | Out[68]: 175 | array([[ 1., 2.], 176 | [ 3., 4.], 177 | [ 5., 6.], 178 | [ 7., 8.]]) 179 | 180 | In [69]: np.concatenate((a,b), axis = 0) 181 | Out[69]: 182 | array([[ 1., 2.], 183 | [ 3., 4.], 184 | [ 5., 6.], 185 | [ 7., 8.]]) 186 | 187 | In [70]: np.concatenate((a,b), axis = 1) 188 | Out[70]: 189 | array([[ 1., 2., 5., 6.], 190 | [ 3., 4., 7., 8.]]) 191 | 192 | The dimensionality of an array can be increased using the newaxis constant in bracket notation 193 | In [75]: a = np.array([1, 2, 3], float) 194 | 195 | In [76]: a 196 | Out[76]: array([ 1., 2., 3.]) 197 | 198 | In [77]: a[:, np.newaxis].shape 199 | Out[77]: (3, 1) 200 | 201 | In [79]: a[np.newaxis,:] 202 | Out[79]: array([[ 1., 2., 3.]]) 203 | 204 | In [80]: a[np.newaxis,:].shape 205 | Out[80]: (1, 3) 206 | 207 | The arange function returns an array 208 | In [82]: np.arange(5, dtype = float) 209 | Out[82]: array([ 0., 1., 2., 3., 4.]) 210 | 211 | In [83]: np.arange(1,6,2, dtype = int) 212 | Out[83]: array([1, 3, 5]) 213 | 214 | The functions zeros and ones create new arrays of specified dimensions filled with these values 215 | In [84]: np.ones((3,2), dtype = float) 216 | Out[84]: 217 | array([[ 1., 1.], 218 | [ 1., 1.], 219 | [ 1., 1.]]) 220 | 221 | In [85]: np.zeros(7, dtype = int) 222 | Out[85]: array([0, 0, 0, 0, 0, 0, 0]) 223 | 224 | The zeros_like and ones_like functions create a new array with the same dimensions and type of an existing one 225 | In [86]: a = np.array([[1,2,3], [4,5,6]], float) 226 | 227 | In [87]: np.zeros_like(a) 228 | Out[87]: 229 | array([[ 0., 0., 0.], 230 | [ 0., 0., 0.]]) 231 | 232 | In [88]: np.ones_like(a) 233 | Out[88]: 234 | array([[ 1., 1., 1.], 235 | [ 1., 1., 1.]]) 236 | 237 | 238 | 239 | 240 | ARRAY MATHEMATICS 241 | 242 | When standard mathematical operations are used with arrays, they are applied on an element-by-element basis. This means that the arrays should be the same size during addition, subtraction 243 | In [90]: a = np.array([1,2,3]) 244 | 245 | In [91]: b = np.array([5,6,7]) 246 | 247 | In [92]: a+b 248 | Out[92]: array([ 6, 8, 10]) 249 | 250 | In [93]: a-b 251 | Out[93]: array([-4, -4, -4]) 252 | 253 | In [94]: a*b 254 | Out[94]: array([ 5, 12, 21]) 255 | 256 | In [95]: b/a 257 | Out[95]: array([5, 3, 2]) 258 | 259 | In [96]: a%b 260 | Out[96]: array([1, 2, 3]) 261 | 262 | In [97]: b**a 263 | Out[97]: array([ 5, 36, 343]) 264 | 265 | For two-dimensional arrays, multiplication remains element wise and does not correspond to matrix multiplication 266 | In [98]: a = np.array([[1,2], [3,4]]) 267 | 268 | In [99]: b = np.array([[2,0], [1,3]]) 269 | 270 | In [100]: a*b 271 | Out[100]: 272 | array([[ 2, 0], 273 | [ 3, 12]]) 274 | We will see Errors when the arrays do not match in size. 
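To make this concrete, here is a small standalone sketch (the exact wording of the error message varies between NumPy versions): adding arrays whose shapes cannot be matched raises a ValueError.

import numpy as np

a = np.array([1, 2, 3], float)   # shape (3,)
b = np.array([1, 2], float)      # shape (2,)
try:
    a + b                        # shapes (3,) and (2,) cannot be broadcast together
except ValueError as e:
    print e                      # prints the shape-mismatch error message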
275 | 276 | Arrays that do not match in the number of dimensions will be broadcasted by Python to perform mathematical operations 277 | In [103]: a = np.array([[1, 2], [3, 4], [5, 6]]) 278 | 279 | In [104]: b = np.array([-1, 3]) 280 | 281 | In [105]: a 282 | Out[105]: 283 | array([[1, 2], 284 | [3, 4], 285 | [5, 6]]) 286 | 287 | In [106]: b 288 | Out[106]: array([-1, 3]) 289 | 290 | In [107]: a + b 291 | Out[107]: 292 | array([[0, 5], 293 | [2, 7], 294 | [4, 9]]) 295 | 296 | In addition to the standard operators, NumPy offers a large library of common mathematical functions that can be applied elementwise to arrays. Among these are the functions: abs, sign, sqrt, log, log10, exp, sin, cos, tan, arcsin, arccos, arctan, sinh, cosh, tanh, arcsinh, arccosh, and arctanh 297 | In [108]: a = np.array([6,8,9]) 298 | 299 | In [109]: np.sqrt(a) 300 | Out[109]: array([ 2.44948974, 2.82842712, 3. ]) 301 | 302 | In [110]: np. sin(a) 303 | Out[110]: array([-0.2794155 , 0.98935825, 0.41211849]) 304 | 305 | In [111]: np.cos(a) 306 | Out[111]: array([ 0.96017029, -0.14550003, -0.91113026]) 307 | 308 | In [112]: np.log(a) 309 | Out[112]: array([ 1.79175947, 2.07944154, 2.19722458]) 310 | 311 | In [113]: np.log10(a) 312 | Out[113]: array([ 0.77815125, 0.90308999, 0.95424251]) 313 | 314 | In [114]: np.tan(a) 315 | Out[114]: array([-0.29100619, -6.79971146, -0.45231566]) 316 | 317 | 318 | The functions floor, ceil, and rint give the lower, upper, or nearest (rounded) integer 319 | In [117]: a = np.array([3.6, 8.4, 9.2, 5.9], float) 320 | 321 | In [118]: np.floor(a) 322 | Out[118]: array([ 3., 8., 9., 5.]) 323 | 324 | In [119]: np.ceil(a) 325 | Out[119]: array([ 4., 9., 10., 6.]) 326 | 327 | In [120]: np.rint(a) 328 | Out[120]: array([ 4., 8., 9., 6.]) 329 | Also included in the NumPy module are two important mathematical constants 330 | In [121]: np.pi 331 | Out[121]: 3.141592653589793 332 | 333 | In [122]: np.e 334 | Out[122]: 2.718281828459045 335 | 336 | 337 | 338 | 339 | ARRAY ITERATION 340 | 341 | It is possible to iterate over arrays in a manner similar to that of lists 342 | In [131]: a = np.array([1,3,6]) 343 | 344 | In [132]: for x in a: 345 | ...: print x 346 | ...: 347 | 1 348 | 3 349 | 6 350 | 351 | For multidimensional arrays, iteration proceeds over the first axis such that each loop returns a subsection of the array 352 | In [133]: a = np.array([[1, 2], [3, 4], [5, 6]], float) 353 | 354 | In [134]: for x in a: 355 | ...: print x 356 | ...: 357 | [ 1. 2.] 358 | [ 3. 4.] 359 | [ 5. 6.] 360 | 361 | 362 | Multiple assignment can also be used with array iteration 363 | In [135]: a = np.array([[1, 2], [3, 4], [5, 6]]) 364 | 365 | In [136]: for (x,y) in a: 366 | ...: print x*y 367 | ...: 368 | 2 369 | 12 370 | 30 371 | 372 | 373 | 374 | 375 | ARRAY OPERATIONS 376 | 377 | The elements in an array can be summed or multiplied 378 | In [137]: a = np.array([2,4,3]) 379 | 380 | In [138]: a.sum() 381 | Out[138]: 9 382 | 383 | In [139]: a.prod() 384 | Out[139]: 24 385 | 386 | Here, member functions of the arrays were used. 
We can also use standalone functions in the NumPy module 387 | In [140]: np.sum(a) 388 | Out[140]: 9 389 | 390 | In [141]: np.prod(a) 391 | Out[141]: 24 392 | 393 | A number of routines enable computation of statistical quantities in array datasets, such as the mean (average), variance, and standard deviation 394 | In [143]: a.mean() 395 | Out[143]: 3.0 396 | 397 | In [144]: a.var() 398 | Out[144]: 0.66666666666666663 399 | 400 | In [145]: a.std() 401 | Out[145]: 0.81649658092772603 402 | 403 | It's also possible to find the minimum and maximum element values 404 | In [150]: a = np.array([4,9,6]) 405 | 406 | In [151]: a.min() 407 | Out[151]: 4 408 | 409 | In [152]: a.max() 410 | Out[152]: 9 411 | 412 | The argmin and argmax functions return the array indices of the minimum and maximum values 413 | In [153]: a = np.array([4,9,6]) 414 | 415 | In [154]: a.argmin() 416 | Out[154]: 0 417 | 418 | In [155]: a.argmax() 419 | Out[155]: 1 420 | 421 | For multidimensional arrays, each of the functions thus far described can take an optional argument axis that will perform an operation along only the specified axis, placing the results in a return array 422 | In [163]: a = np.array([[0,1], [2,-4], [9,7]]) 423 | 424 | In [164]: a.mean(axis=0) 425 | Out[164]: array([ 3.66666667, 1.33333333]) 426 | 427 | In [165]: a.mean(axis=1) 428 | Out[165]: array([ 0.5, -1. , 8. ]) 429 | 430 | In [166]: a.min(axis=1) 431 | Out[166]: array([ 0, -4, 7]) 432 | 433 | In [167]: a.max(axis=0) 434 | Out[167]: array([9, 7]) 435 | 436 | Arrays can also be sorted 437 | In [168]: a = np.array([8,5,9,3,7,0,12]) 438 | 439 | In [169]: sorted(a) 440 | Out[169]: [0, 3, 5, 7, 8, 9, 12] 441 | 442 | In [170]: a.sort() 443 | 444 | In [171]: a 445 | Out[171]: array([ 0, 3, 5, 7, 8, 9, 12]) 446 | 447 | Values in an array can be "clipped" to be within a prespecified range 448 | In [172]: a = np.array([8,5,9,3,7,0,12]) 449 | 450 | In [173]: a.clip(0,5) 451 | Out[173]: array([5, 5, 5, 3, 5, 0, 5]) 452 | 453 | Unique elements can be extracted from an array. 454 | In [176]: a = np.array([7,7,7,7,8,1,3,2,2,5,5,9,9,1]) 455 | 456 | In [177]: np.unique(a) 457 | Out[177]: array([1, 2, 3, 5, 7, 8, 9]) 458 | 459 | 460 | 461 | 462 | BOOLEAN COMPARISONS 463 | 464 | Boolean comparisons can be used to compare members element wise on arrays of equal size. The return value is an array of Boolean True / False values. 
465 | In [178]: a = np.array([6,9,4], float) 466 | 467 | In [179]: b = np.array([0,4,7], float) 468 | 469 | In [180]: a > b 470 | Out[180]: array([ True, True, False], dtype=bool) 471 | In [181]: a == b 472 | Out[181]: array([False, False, False], dtype=bool) 473 | 474 | In [182]: a <= b 475 | Out[182]: array([False, False, True], dtype=bool) 476 | 477 | The results of a Boolean comparison can be stored in an array 478 | In [183]: c = a > b 479 | 480 | In [184]: c 481 | Out[184]: array([ True, True, False], dtype=bool) 482 | 483 | Arrays can be compared to single values using broadcasting 484 | In [185]: a = np.array([2,9,7,5,8]) 485 | 486 | In [186]: a > 4 487 | Out[186]: array([False, True, True, True, True], dtype=bool) 488 | 489 | It is possible to test whether or not values are NaN ("not a number") or finite 490 | In [197]: np.isnan(a) 491 | Out[197]: array([False, True, False], dtype=bool) 492 | 493 | In [198]: np.isfinite(a) 494 | Out[198]: array([ True, False, False], dtype=bool) 495 | 496 | 497 | 498 | 499 | STATISTICS 500 | 501 | In addition to the mean, var, and std functions, NumPy supplies several other methods for returning statistical features of arrays. 502 | In [199]: a = np.array([1,3,5,7,4,6,9,8]) 503 | 504 | In [200]: np.median(a) 505 | Out[200]: 5.5 506 | 507 | The correlation coefficient for multiple variables observed at multiple instances can be found for arrays of the form [[x1, x2, …], [y1, y2, …], [z1, z2, …], …] where x, y, z are different observables and the numbers indicate the observation times 508 | In [202]: a = np.array([[1,2,3,4,5], [6,7,8,9,10]]) 509 | 510 | In [203]: b = np.corrcoef(a) 511 | 512 | In [204]: b 513 | Out[204]: 514 | array([[ 1., 1.], 515 | [ 1., 1.]]) 516 | 517 | The covariance for data can be found 518 | In [208]: np.cov(a) 519 | Out[208]: 520 | array([[ 0.91666667, 2.08333333], 521 | [ 2.08333333, 8.91666667]]) 522 | -------------------------------------------------------------------------------- /1-Introduction-to-machine-learning-software-stack/1.3/1.3b Matplotlib: -------------------------------------------------------------------------------- 1 | LINE PLOT 2 | 3 | In [1]: import matplotlib as mp 4 | 5 | In [2]: import matplotlib.pyplot as plt 6 | 7 | In [3]: plt.plot([1,2,3,4]) 8 | Out[3]: [] 9 | 10 | In [4]: plt.show() 11 | 12 | 13 | 14 | In [5]: plt.plot([1,2,3,4], [2,5,8,15]) 15 | Out[5]: [] 16 | 17 | In [6]: plt.show() 18 | 19 | 20 | 21 | 22 | PLOTTING VALUES WITH DOTS 23 | 24 | In [7]: plt.plot([1,2,3,4], [2,5,8,15], 'ro') 25 | Out[7]: [] 26 | 27 | In [8]: plt.axis([0,6,0,20]) 28 | Out[8]: [0, 6, 0, 20] 29 | 30 | In [9]: plt.show() 31 | 32 | 33 | 34 | 35 | PLOTTING SEVERAL LINES WITH DIFFERENT FORMAT STYLE 36 | 37 | In [11]: import numpy as np 38 | 39 | In [12]: t = np.arange(0., 5., 0.2) 40 | 41 | In [13]: plt.plot(t, t, 'r--', t, t**2, 'bs', t, t**3, 'g^') 42 | Out[13]: 43 | [, 44 | , 45 | ] 46 | 47 | In [14]: plt.show() 48 | 49 | 50 | 51 | 52 | SIMPLE PLOT 53 | 54 | In [16]: x = np.arange(0, 3*np.pi, 0.1) 55 | 56 | In [17]: y = np.sin(x) 57 | 58 | In [18]: plt.plot(x,y) 59 | Out[18]: [] 60 | 61 | In [19]: plt.show() 62 | 63 | 64 | 65 | 66 | SINE AND COSINE PLOT 67 | 68 | In [20]: x = np.arange(0, 3 * np.pi, 0.1) 69 | In [22]: y_sin = np.sin(x) 70 | 71 | In [23]: y_cos = np.cos(x) 72 | 73 | In [24]: plt.plot(x, y_sin) 74 | Out[24]: [] 75 | 76 | In [25]: plt.plot(x, y_cos) 77 | Out[25]: [] 78 | 79 | In [26]: plt.xlabel('x axis label') 80 | Out[26]: 81 | 82 | In [27]: plt.ylabel('y axis label') 83 | Out[27]: 84 | 85 | In [28]: 
plt.title ('Sine and Cosine') 86 | Out[28]: 87 | 88 | In [29]: plt.legend(['Sine', 'Cosine']) 89 | Out[29]: 90 | 91 | In [30]: plt.show() 92 | 93 | 94 | 95 | 96 | SUBPLOTS 97 | 98 | In [31]: x = np.arange(0, 3 * np.pi, 0.1) 99 | 100 | In [32]: y_sin = np.sin(x) 101 | 102 | In [33]: y_cos = np.cos(x) 103 | 104 | In [34]: plt.subplot(2,1,1) 105 | Out[34]: 106 | 107 | In [35]: plt.plot(x, y_sin) 108 | Out[35]: [] 109 | 110 | In [36]: plt.title('Sine') 111 | Out[36]: 112 | 113 | In [37]: plt.subplot(2,1,2) 114 | Out[37]: 115 | 116 | In [38]: plt.plot(x, y_cos) 117 | Out[38]: [] 118 | 119 | In [39]: plt.title('Cosine') 120 | Out[39]: 121 | 122 | In [40]: plt.show() 123 | 124 | 125 | 126 | 127 | HISTOGRAM 128 | 129 | data = np.random.normal(5.0, 3.0, 1000) 130 | 131 | In [12]: plt.hist(data) 132 | Out[12]: 133 | (array([ 2., 12., 44., 100., 200., 249., 214., 123., 43., 13.]), 134 | array([ -5.6815256 , -3.7794175 , -1.8773094 , 0.02479871, 135 | 1.92690681, 3.82901491, 5.73112301, 7.63323111, 136 | 9.53533921, 11.43744731, 13.33955541]), 137 | ) 138 | 139 | In [13]: plt.xlabel('data') 140 | Out[13]: 141 | 142 | In [14]: plt.show() 143 | 144 | 145 | 146 | 147 | 148 | 149 | In [15]: bins = np.arange(-5., 16., 1.) 150 | 151 | In [16]: plt.hist(data, bins) 152 | Out[16]: 153 | (array([ 0., 5., 8., 13., 31., 33., 74., 104., 110., 154 | 138., 120., 113., 103., 74., 25., 27., 13., 5., 155 | 3., 0.]), 156 | array([ -5., -4., -3., -2., -1., 0., 1., 2., 3., 4., 5., 157 | 6., 7., 8., 9., 10., 11., 12., 13., 14., 15.]), 158 | ) 159 | 160 | In [17]: plt.show() 161 | 162 | 163 | 164 | 165 | BAR CHART 166 | 167 | In [3]: year = (2011, 2012, 2013, 2014, 2015) 168 | 169 | In [4]: score = (83, 78, 99, 60, 80) 170 | In [13]: plt.xlabel('year') 171 | Out[13]: 172 | 173 | In [14]: plt.ylabel('score') 174 | Out[14]: 175 | 176 | In [15]: plt.bar(year, score) 177 | Out[15]: 178 | 179 | In [16]: plt.show() 180 | 181 | 182 | 183 | 184 | SCATTER PLOT 185 | 186 | In [14]: year = (2011, 2012, 2013, 2014, 2015) 187 | 188 | In [15]: score = (83, 78, 99, 60, 80) 189 | 190 | In [16]: plt.scatter(year, score) 191 | Out[16]: 192 | 193 | In [17]: plt.xlabel(year) 194 | Out[17]: 195 | 196 | In [18]: plt.ylabel(score) 197 | Out[18]: 198 | 199 | In [19]: plt.show() 200 | 201 | 202 | -------------------------------------------------------------------------------- /1-Introduction-to-machine-learning-software-stack/1.3/1.3c Pandas Examples: -------------------------------------------------------------------------------- 1 | SERIES IN PANDAS: 2 | 3 | A series is an one-dimensional array containing an array of any data type and an associated array of data labels called index. 4 | 5 | 6 | In [1]: import pandas as pd 7 | 8 | In [3]: from pandas import Series, DataFrame 9 | 10 | 11 | 12 | Forming a simple series. 13 | 14 | In [4]: obj = Series([4,7,-3,1]) 15 | 16 | In [5]: obj 17 | Out[5]: 18 | 0 4 19 | 1 7 20 | 2 -3 21 | 3 1 22 | dtype: int64 23 | 24 | The Series is represented normally with an index at the left. Since we did not specify any index for data here, it specified an index on its own starting from 0 through N-1 where N is the length of data (in this case its 4). 25 | 26 | 27 | 28 | We can get an array representation and index objects of the Series via its values and index attributes - 29 | 30 | In [6]: obj.values 31 | Out[6]: array([ 4, 7, -3, 1], dtype=int64) 32 | 33 | In [7]: obj.index 34 | Out[7]: RangeIndex(start=0, stop=4, step=1) 35 | 36 | 37 | 38 | It is good to create a Series with an index for each element in the Series. 
39 | 40 | In [8]: obj2 = Series([4,7,-3,1], index = ['a','b','c','d']) 41 | 42 | In [9]: obj2 43 | Out[9]: 44 | a 4 45 | b 7 46 | c -3 47 | d 1 48 | dtype: int64 49 | 50 | 51 | 52 | Compared to regular Numpy Array, in pandas, we can use values in the index when selecting a single values or a set of values. 53 | 54 | In [10]: obj2.index 55 | Out[10]: Index([u'a', u'b', u'c', u'd'], dtype='object') 56 | 57 | In [11]: obj2['a'] 58 | Out[11]: 4 59 | 60 | In [12]: obj2['d'] - 4 61 | Out[12]: -3 62 | 63 | In [13]: obj2[['c', 'a', 'd']] 64 | Out[13]: 65 | c -3 66 | a 4 67 | d 1 68 | dtype: int64 69 | 70 | 71 | 72 | NumPy array operations, such as filtering with Boolean array, Scalar multiplications, or applying math functions, will preserve the index value. 73 | 74 | In [14]: obj2 75 | Out[14]: 76 | a 4 77 | b 7 78 | c -3 79 | d 1 80 | dtype: int64 81 | 82 | 83 | In [15]: obj2[obj2 > 0] 84 | Out[15]: 85 | a 4 86 | b 7 87 | d 1 88 | dtype: int64 89 | 90 | 91 | In [16]: obj2 * 2 92 | Out[16]: 93 | a 8 94 | b 14 95 | c -6 96 | d 2 97 | dtype: int64 98 | 99 | 100 | In [17]: import numpy as np 101 | 102 | 103 | In [18]: np.exp(obj2) 104 | Out[18]: 105 | a 54.598150 106 | b 1096.633158 107 | c 0.049787 108 | d 2.718282 109 | dtype: float64 110 | 111 | 112 | 113 | We can also think of Series as a fixed-length, ordered dictionary, as it is a mapping of index values to data values. It can be substituted into many functions that expect a dictionary. 114 | 115 | In [19]: 'b' in obj2 116 | Out[19]: True 117 | 118 | In [21]: 'f' in obj2 119 | Out[21]: False 120 | 121 | 122 | 123 | If there is data in Python dictionary, we can create a Series from it by passing the dictionary. 124 | 125 | In [22]: sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000} 126 | 127 | 128 | In [23]: obj3 = Series(sdata) 129 | 130 | 131 | In [24]: obj3 132 | Out[24]: 133 | Ohio 35000 134 | Oregon 16000 135 | Texas 71000 136 | Utah 5000 137 | dtype: int64 138 | 139 | 140 | 141 | 142 | While passing a dictionary, the index of the resulting Series will have the dictionary keys in the sorted order. 143 | 144 | In [25]: states = ['California', 'Ohio', 'Oregon', 'Texas'] 145 | 146 | In [26]: obj4 = Series(sdata, index= states) 147 | 148 | In [27]: obj4 149 | Out[27]: 150 | California NaN 151 | Ohio 35000.0 152 | Oregon 16000.0 153 | Texas 71000.0 154 | dtype: float64 155 | 156 | In this case, three values are found in sdata and were placed in the appropriate location. There was no value for California and hence it is NaN (Not an Number) which is considered in pandas to mark missing or NA values. 157 | 158 | 159 | 160 | The isnull or notnull function in pandas is used to detect missing values. 161 | 162 | In [28]: pd.isnull(obj4) 163 | Out[28]: 164 | California True 165 | Ohio False 166 | Oregon False 167 | Texas False 168 | dtype: bool 169 | 170 | 171 | In [29]: pd.notnull(obj4) 172 | Out[29]: 173 | California False 174 | Ohio True 175 | Oregon True 176 | Texas True 177 | dtype: bool 178 | 179 | 180 | 181 | Pandas Series also has these as instance methods. 182 | 183 | In [30]: obj4.isnull() 184 | Out[30]: 185 | California True 186 | Ohio False 187 | Oregon False 188 | Texas False 189 | dtype: bool 190 | 191 | 192 | 193 | Another important Series feature for many applications is that it automatically aligns differently indexed data in arithmetic operations. 
194 | 195 | In [31]: obj3 196 | Out[31]: 197 | Ohio 35000 198 | Oregon 16000 199 | Texas 71000 200 | Utah 5000 201 | dtype: int64 202 | 203 | 204 | In [32]: obj4 205 | Out[32]: 206 | California NaN 207 | Ohio 35000.0 208 | Oregon 16000.0 209 | Texas 71000.0 210 | dtype: float64 211 | 212 | 213 | In [33]: obj3 + obj4 214 | Out[33]: 215 | California NaN 216 | Ohio 70000.0 217 | Oregon 32000.0 218 | Texas 142000.0 219 | Utah NaN 220 | dtype: float64 221 | 222 | 223 | 224 | Both the Series objects and its index has a name attribute, which integrates with other key areas of pandas functionality. 225 | 226 | In [34]: obj4.name = 'population' 227 | 228 | In [35]: obj4.index.name = 'state' 229 | 230 | In [36]: obj4 231 | Out[36]: 232 | state 233 | California NaN 234 | Ohio 35000.0 235 | Oregon 16000.0 236 | Texas 71000.0 237 | Name: population, dtype: float64 238 | 239 | 240 | 241 | A Series index can be altered by assigning it. 242 | 243 | In [37]: obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan'] 244 | 245 | In [38]: obj 246 | Out[38]: 247 | Bob 4 248 | Steve 7 249 | Jeff -3 250 | Ryan 1 251 | dtype: int64 252 | 253 | 254 | 255 | 256 | 257 | 258 | DATA FRAMES: 259 | 260 | A DataFrame represents a tabular, spreadsheet-like data structure containing an ordered 261 | collection of columns, each of which can be a different value type (numeric, 262 | string, boolean, etc.). The DataFrame has both a row and column index; it can be 263 | thought of as a dict of Series (one for all sharing the same index). 264 | 265 | 266 | 267 | Though there are different ways to construct a DataFrame, the most common way is to form a dictionary of equal length lists or Numpy arrays. 268 | 269 | In [39]: data = {'state' : ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'], 270 | ...: 'year' : [2000, 2001, 2002, 2001, 2002], 271 | ...: 'pop' : [1.5, 1.7, 3.6, 2.4, 2.9]} 272 | 273 | In [40]: frame = DataFrame(data) 274 | 275 | 276 | 277 | The resulting DataFrame will have its index assigned automatically and the columns are placed in sorted order. 278 | 279 | In [41]: frame 280 | Out[41]: 281 | pop state year 282 | 0 1.5 Ohio 2000 283 | 1 1.7 Ohio 2001 284 | 2 3.6 Ohio 2002 285 | 3 2.4 Nevada 2001 286 | 4 2.9 Nevada 2002 287 | 288 | 289 | 290 | If we specify a sequence of columns, the DataFrame’s columns will be exactly in the order of what we pass 291 | 292 | In [42]: DataFrame(data, columns = ['year', 'state', 'pop']) 293 | Out[42]: 294 | year state pop 295 | 0 2000 Ohio 1.5 296 | 1 2001 Ohio 1.7 297 | 2 2002 Ohio 3.6 298 | 3 2001 Nevada 2.4 299 | 4 2002 Nevada 2.9 300 | 301 | 302 | 303 | If we pass a column that is not contained in the data, it will appear as NaN values. 304 | 305 | In [43]: frame2 = DataFrame(data, columns = ['year', 'state', 'pop', 'debt'], 306 | ...: index = ['one', 'two', 'three', 'four', 'five']) 307 | 308 | In [44]: frame2 309 | Out[44]: 310 | year state pop debt 311 | one 2000 Ohio 1.5 NaN 312 | two 2001 Ohio 1.7 NaN 313 | three 2002 Ohio 3.6 NaN 314 | four 2001 Nevada 2.4 NaN 315 | five 2002 Nevada 2.9 NaN 316 | 317 | 318 | 319 | A column in a DataFrame can be retrieved as a Series either by dictionary like notation or by attribute. 
320 | 321 | In [45]: frame2.columns 322 | Out[45]: Index([u'year', u'state', u'pop', u'debt'], dtype='object') 323 | 324 | 325 | In [46]: frame2['state'] 326 | Out[46]: 327 | one Ohio 328 | two Ohio 329 | three Ohio 330 | four Nevada 331 | five Nevada 332 | Name: state, dtype: object 333 | 334 | 335 | In [47]: frame2.year 336 | Out[47]: 337 | one 2000 338 | two 2001 339 | three 2002 340 | four 2001 341 | five 2002 342 | Name: year, dtype: int64 343 | 344 | 345 | Note: The name attribute has the same index as the DataFrame, and their name attribute has been appropriately set. 346 | 347 | 348 | 349 | Rows can also be retrieved by position or name by a couple of methods, such as the 350 | ix indexing field. 351 | 352 | In [48]: frame2.ix['three'] 353 | Out[48]: 354 | year 2002 355 | state Ohio 356 | pop 3.6 357 | debt NaN 358 | Name: three, dtype: object 359 | 360 | 361 | 362 | Columns can also be assigned values and modified. 363 | 364 | In [49]: frame2['debt'] = 16.5 365 | 366 | In [50]: frame2 367 | Out[50]: 368 | year state pop debt 369 | one 2000 Ohio 1.5 16.5 370 | two 2001 Ohio 1.7 16.5 371 | three 2002 Ohio 3.6 16.5 372 | four 2001 Nevada 2.4 16.5 373 | five 2002 Nevada 2.9 16.5 374 | 375 | In [52]: frame2['debt'] = np.arange(5.) 376 | 377 | In [53]: frame2 378 | Out[53]: 379 | year state pop debt 380 | one 2000 Ohio 1.5 0.0 381 | two 2001 Ohio 1.7 1.0 382 | three 2002 Ohio 3.6 2.0 383 | four 2001 Nevada 2.4 3.0 384 | five 2002 Nevada 2.9 4.0 385 | 386 | 387 | 388 | While assigning lists or arrays to a column, the value’s length must match the length 389 | of the DataFrame. If you assign a Series, it will be instead conformed exactly to the 390 | DataFrame’s index, inserting missing values in any holes 391 | 392 | In [56]: val = Series([-1.4, -1.6, -1.7], index = ['two', 'four', 'five']) 393 | 394 | In [57]: frame2['debt'] = val 395 | 396 | In [58]: frame2 397 | Out[58]: 398 | year state pop debt 399 | one 2000 Ohio 1.5 NaN 400 | two 2001 Ohio 1.7 -1.4 401 | three 2002 Ohio 3.6 NaN 402 | four 2001 Nevada 2.4 -1.6 403 | five 2002 Nevada 2.9 -1.7 404 | 405 | 406 | 407 | Assigning a column that doesn’t exist will create a new column. The del keyword will 408 | delete columns as with a dictionary. 409 | 410 | In [59]: frame2['western'] = frame2.state == 'Ohio' 411 | 412 | In [60]: frame2 413 | Out[60]: 414 | year state pop debt western 415 | one 2000 Ohio 1.5 NaN True 416 | two 2001 Ohio 1.7 -1.4 True 417 | three 2002 Ohio 3.6 NaN True 418 | four 2001 Nevada 2.4 -1.6 False 419 | five 2002 Nevada 2.9 -1.7 False 420 | 421 | In [61]: del frame2['western'] 422 | 423 | In [62]: frame2.columns 424 | Out[62]: Index([u'year', u'state', u'pop', u'debt'], dtype='object') 425 | 426 | 427 | 428 | Another common form of data is a nested dict of dicts format 429 | 430 | In [63]: pop = {'Nevada' : {2001: 2.4, 2002: 2.9}, 431 | ...: 'Ohio' : {2000: 1.5, 2001: 1.7, 2002: 3.6}} 432 | 433 | 434 | 435 | If passed to DataFrame, it will interpret the outer dict keys as the columns and the inner 436 | keys as the row indices. 437 | 438 | In [64]: frame3 = DataFrame(pop) 439 | 440 | In [65]: frame3 441 | Out[65]: 442 | Nevada Ohio 443 | 2000 NaN 1.5 444 | 2001 2.4 1.7 445 | 2002 2.9 3.6 446 | 447 | 448 | 449 | We can also transpose the result 450 | 451 | In [66]: frame3.T 452 | Out[66]: 453 | 2000 2001 2002 454 | Nevada NaN 2.4 2.9 455 | Ohio 1.5 1.7 3.6 456 | 457 | 458 | 459 | The keys in the inner dicts are unioned and sorted to form the index in the result. 
This 460 | isn’t true if an explicit index is specified. 461 | 462 | In [67]: DataFrame(pop, index=[2001, 2002, 2003]) 463 | Out[67]: 464 | Nevada Ohio 465 | 2001 2.4 1.7 466 | 2002 2.9 3.6 467 | 2003 NaN NaN 468 | 469 | 470 | 471 | Dicts of Series are treated much in the same way. 472 | 473 | In [68]: pdata = {'Ohio': frame3['Ohio'][:-1], 474 | ...: 'Nevada' : frame3['Nevada'][:2]} 475 | 476 | In [69]: DataFrame(pdata) 477 | Out[69]: 478 | Nevada Ohio 479 | 2000 NaN 1.5 480 | 2001 2.4 1.7 481 | 482 | 483 | 484 | If the DataFrame’s index and column have their name attributes set, these will also be displayed. 485 | 486 | In [70]: frame3.index.name = 'year'; frame3.columns.name = 'state 487 | 488 | In [71]: frame3 489 | Out[71]: 490 | State Nevada Ohio 491 | year 492 | 2000 NaN 1.5 493 | 2001 2.4 1.7 494 | 2002 2.9 3.6 495 | 496 | 497 | 498 | Like Series, the values attribute returns the data contained in the DataFrame as a 2D 499 | ndarray 500 | 501 | In [72]: frame3.values 502 | Out[72]: 503 | array([[ nan, 1.5], 504 | [ 2.4, 1.7], 505 | [ 2.9, 3.6]]) 506 | 507 | 508 | 509 | If the DataFrame’s columns are different dtypes, the dtype of the values array will be 510 | chosen to accomodate all of the columns 511 | 512 | In [73]: frame2.values 513 | Out[73]: 514 | array([[2000L, 'Ohio', 1.5, nan], 515 | [2001L, 'Ohio', 1.7, -1.4], 516 | [2002L, 'Ohio', 3.6, nan], 517 | [2001L, 'Nevada', 2.4, -1.6], 518 | [2002L, 'Nevada', 2.9, -1.7]], dtype=object) 519 | 520 | -------------------------------------------------------------------------------- /1-Introduction-to-machine-learning-software-stack/1.3/1.3d Machine learning with scikit learn: -------------------------------------------------------------------------------- 1 | To begin with, let us load the scikit learn library and import the module that contains the various functions in order to extract the inbuilt datasets 2 | 3 | In [5]: from sklearn.datasets import load_iris,load_boston,make_classification, make_circles, make_moons 4 | 5 | 6 | First, we will look into the ‘Iris flower data set’ by Ronald Fisher. 7 | This dataset would be used for classification problem. 8 | 9 | The load_iris function returns a dictionary object 10 | 11 | In [6]: data = load_iris() 12 | 13 | 14 | The predictor x, response variable y, response variable names and feature names can be extracted by querying the dictionary object with the appropriate keys 15 | 16 | In [7]: x = data['data'] 17 | 18 | In [8]: y = data['target'] 19 | 20 | In [9]: y_labels = data['target_names'] 21 | 22 | In [10]: x_labels = data['feature_names'] 23 | 24 | 25 | 26 | Let’s print to see the values 27 | 28 | In [11]: print 29 | 30 | In [12]: print x.shape 31 | (150, 4) 32 | 33 | In [13]: print y.shape 34 | (150,) 35 | 36 | In [14]: print x_labels 37 | ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)'] 38 | 39 | In [15]: print y_labels 40 | ['setosa' 'versicolor' 'virginica'] 41 | 42 | 43 | We can see that predictors have 150 instances and 4 attributes. 44 | The response variable has 150 instances and a class label for each of the rows in our predictor set. 45 | We are then printing out the attribute names, petal and sepal width and length, and finally, the class labels. 46 | 47 | 48 | 49 | Now we will see some features with the Boston Housing Dataset. 50 | This dataset would be used for regression problem. 51 | The data is loaded pretty much the same way as is done above for iris flower data set. 
52 | 53 | In [16]: data = load_boston() 54 | 55 | 56 | The various components of the data, including the predictors and response variables, are queried using the respective keys from the dictionary. 57 | 58 | In [17]: x = data['data'] 59 | 60 | In [18]: y = data['target'] 61 | 62 | In [19]: x_labels = data['feature_names'] 63 | 64 | 65 | Let's print these variables to understand more. 66 | 67 | In [20]: print 68 | 69 | In [21]: print x.shape 70 | (506, 13) 71 | 72 | In [22]: print y.shape 73 | (506,) 74 | 75 | In [23]: print x_labels 76 | ['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO' 77 | 'B' 'LSTAT'] 78 | 79 | We can see that the predictor set x has 506 instances and 13 attributes. 80 | The response variable has 506 entries. 81 | Finally, we have also printed out the names of the attributes. 82 | 83 | 84 | 85 | Scikit-learn has also provided functions that will help us produce a random classification dataset with some desired properties 86 | 87 | # make some classification dataset 88 | 89 | The make_classification function is a function that can be used to generate a classification dataset. 90 | 91 | In [24]: x,y = make_classification(n_samples = 50, n_features = 5, n_classes = 2) 92 | 93 | In this example, we generated a dataset with 50 instances that are dictated by the n_samples parameter, five attributes, n_features parameters, and two classes set by the n_classes parameter. 94 | 95 | Let’s print to analyze the output of this function. 96 | 97 | In [25]: print 98 | 99 | In [26]: print x.shape 100 | (50, 5) 101 | 102 | In [27]: print y.shape 103 | (50,) 104 | 105 | In [28]: print x[1,:] 106 | [-0.06371735 0.24424535 -0.3383024 0.84448991 0.94607483] 107 | 108 | In [29]: print y[1] 109 | 0 110 | 111 | 112 | We can see that predictor x has 150 instances with 5 features. 113 | The response variable has 150 instances, with a class label for each of the prediction instances. 114 | On printing the second record in our predictor set x, we can see that we have a vector of dimension 5, relating to the five features that we requested. 115 | Also, we will print the response variable y. For the second row of our predictors, the class label is 0. 116 | 117 | 118 | 119 | Scikit learn also provide us with functions that can generate data with non-linear relationships. 120 | 121 | In [30]: x,y = make_circles() 122 | 123 | In [31]: import numpy as np 124 | 125 | In [32]: import matplotlib.pyplot as plt 126 | 127 | In [33]: plt.close('all') 128 | 129 | In [34]: plt.figure(1) 130 | Out[34]: 131 | 132 | In [35]: plt.scatter(x[:,0],x[:,1],c=y) 133 | Out[35]: 134 | 135 | In [36]: plt.show() 136 | 137 | 138 | Let us look at the plot to understand non – linear relationship. (Plot to be found in the docx file) 139 | Our classification has produced two concentric circles. x is a dataset with two variables. Variable y is the class label. As shown by the concentric circle, the relationship between our prediction variable is nonlinear. 140 | 141 | Another interesting function to produce a nonlinear relationship is make_moons from scikit-learn. 142 | 143 | In [39]: x,y = make_moons() 144 | 145 | In [40]: plt.figure(2) 146 | Out[40]: 147 | 148 | In [42]: plt.scatter(x[:,0],x[:,1],c=y) 149 | Out[42]: 150 | 151 | In [43]: plt.show() 152 | 153 | Lets look at the plot to understand more (Plot to be found in the docx file) 154 | 155 | 156 | The crescent-shaped plot shows that the attributes in our predictor set x are nonlinearly related to each other. 
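Both generator functions also accept a few optional keyword arguments that make these toy datasets a little more realistic, such as noise, which jitters the points with Gaussian noise, and random_state, which makes the output reproducible. A small standalone sketch (parameter defaults may differ slightly between scikit-learn versions):

import matplotlib.pyplot as plt
from sklearn.datasets import make_moons

# 200 samples, with Gaussian noise added and a fixed seed for reproducibility
x, y = make_moons(n_samples=200, noise=0.1, random_state=0)

plt.figure(3)
plt.scatter(x[:, 0], x[:, 1], c=y)
plt.title('make_moons with noise')
plt.show()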
157 | 158 | 159 | 160 | 161 | Let us now learn the API structures of Scikit learn. One of the major advantages of using scikit-learn is its clean API structure. All the data modeling classes deriving from the BaseEstimator class have to strictly implement the fit and transform functions. We will see some examples regarding this. 162 | 163 | Lets see how we can put some machine learning functionalities in scikit learn 164 | 165 | In [44]: import numpy as np 166 | 167 | Lets start with the preprocessing module of scikit learn 168 | 169 | In [45]: from sklearn.preprocessing import PolynomialFeatures 170 | 171 | We will use the PolynomialFeatures class in order to demonstrate the ease of using scikit-learn's SDK. 172 | 173 | With a set of predictor variables, we may want to add some more variables to our predictor set in order to see if our model accuracy can be improved. We can use the polynomials of the existing features as a new feature. The PolynomialFeatures class helps us do this. 174 | 175 | We will first create a dataset. In this case, our dataset has two instances and two attributes – 176 | 177 | In [46]: x = np.asmatrix([[1,2],[2,4]]) 178 | 179 | We will proceed to instantiate our PolynomialFeatures class with the required degree of polynomials. In this case, it will be a second degree 180 | 181 | In [47]: poly = PolynomialFeatures(degree = 2) 182 | 183 | There are two functions, fit and transform. The fit function is used to do the necessary calculations for the transformation. The transform function takes the input and, based on the calculations performed by fit, transforms the given input. 184 | 185 | In [48]: poly.fit(x) 186 | Out[48]: PolynomialFeatures(degree=2, include_bias=True, interaction_only=False) 187 | 188 | In [49]: x_poly = poly.transform(x) 189 | 190 | In [50]: print "Original x variable shape", x.shape 191 | Original x variable shape (2, 2) 192 | 193 | 194 | #original x values 195 | 196 | In [51]: print x 197 | [[1 2] 198 | [2 4]] 199 | 200 | In [52]: print 201 | 202 | In [53]: print "Transformed x variables", x_poly.shape 203 | Transformed x variables (2, 6) 204 | 205 | 206 | # transformed x values 207 | 208 | In [54]: print x_poly 209 | [[ 1. 1. 2. 1. 2. 4.] 210 | [ 1. 2. 4. 4. 8. 16.]] 211 | 212 | 213 | Alternatively, in this case, fit and transform can be called in one shot and the output would be the same. 214 | 215 | In [55]: x_poly = poly.fit_transform(x) 216 | 217 | 218 | Any class that implements a machine learning method in scikit-learn has to deliver from BaseEstimator. BaseEstimator expects that the implementation class provides both the fit and transform methods. This way the API is kept very clean. 219 | 220 | Lets see another example. Here we imported a class called DecisionTreeClassifier from the module tree. 
DecisionTreeClassifier implements the decision tree algorithm 221 | 222 | In [56]: from sklearn.tree import DecisionTreeClassifier 223 | 224 | In [57]: from sklearn.datasets import load_iris 225 | 226 | In [58]: data = load_iris() 227 | 228 | In [59]: x = data['data'] 229 | 230 | In [60]: y = data['target'] 231 | 232 | In [61]: estimator = DecisionTreeClassifier() 233 | 234 | In [62]: estimator.fit(x,y) 235 | Out[62]: 236 | DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None, 237 | max_features=None, max_leaf_nodes=None, min_samples_leaf=1, 238 | min_samples_split=2, min_weight_fraction_leaf=0.0, 239 | presort=False, random_state=None, splitter='best') 240 | 241 | In [63]: predicted_y = estimator.predict(x) 242 | 243 | In [64]: predicted_y_prob = estimator.predict_proba(x) 244 | 245 | In [65]: predicted_y_lprob = estimator.predict_log_proba(x) 246 | 247 | We have used the iris dataset to see how the tree algorithm can be used. First, we have loaded the iris dataset in the x and y variables. We then instantiate DecisonTreeClassifier. We proceeded to build the model by invoking the fit function and passing our x predictor and y response variable. This built the tree model. 248 | 249 | Now, we are ready with our model to do some predictions. We have used the predict function in order to predict the class labels for the given input. As we can see, we leveraged the same fit and predict method as in PolynomialFeatures. There are two other methods, predict_proba - which gives the probability of the prediction, and predict_log_proba – which provides the logarithm of the prediction probability. 250 | 251 | 252 | We will see another interesting utility called pipelining. Various machine learning methods can be chained together using pipe lining. 253 | 254 | In [66]: from sklearn.pipeline import Pipeline 255 | 256 | In [67]: poly = PolynomialFeatures(n=3) 257 | 258 | In [70]: tree_estimator = DecisionTreeClassifier() 259 | 260 | 261 | Let's start by instantiating the data processing routines, PolynomialFeatures and DecisionTreeClassifier 262 | 263 | In [71]: steps = [('poly',poly),('tree',tree_estimator)] 264 | 265 | We will define a list of tuples to indicate the order of our chaining. We want to run the polynomial feature generation, followed by our decision tree 266 | 267 | In [72]: estimator = Pipeline(steps = steps) 268 | 269 | In [73]: estimator.fit(x,y) 270 | Out[73]: 271 | Pipeline(steps=[('poly', PolynomialFeatures(degree=2, include_bias=True, interaction_only=False)), ('tree', DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None, 272 | max_features=None, max_leaf_nodes=None, min_samples_leaf=1, 273 | min_samples_split=2, min_weight_fraction_leaf=0.0, 274 | presort=False, random_state=None, splitter='best'))]) 275 | 276 | In [74]: predicted_y = estimator.predict(x) 277 | 278 | We can now instantiate our Pipeline object with the list declared using the steps variable. Now, we can proceed as usual by calling the fit and predict methods. 279 | 280 | We can invoke the named_steps attribute in order to inspect the models in the various stages of our pipeline. 
281 | 282 | In [75]: estimator.named_steps 283 | Out[75]: 284 | {'poly': PolynomialFeatures(degree=2, include_bias=True, interaction_only=False), 285 | 'tree': DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None, 286 | max_features=None, max_leaf_nodes=None, min_samples_leaf=1, 287 | min_samples_split=2, min_weight_fraction_leaf=0.0, 288 | presort=False, random_state=None, splitter='best')} 289 | -------------------------------------------------------------------------------- /1-Introduction-to-machine-learning-software-stack/1.3/Machine Learning with Scikit Learn.docx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/crowd-course/datascience/f5961c20c4052566b1b5a9fc0699c8cadb6147f5/1-Introduction-to-machine-learning-software-stack/1.3/Machine Learning with Scikit Learn.docx -------------------------------------------------------------------------------- /1-Introduction-to-machine-learning-software-stack/1.3/Matplotlib.docx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/crowd-course/datascience/f5961c20c4052566b1b5a9fc0699c8cadb6147f5/1-Introduction-to-machine-learning-software-stack/1.3/Matplotlib.docx -------------------------------------------------------------------------------- /2-intro-to-ml/README.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/crowd-course/datascience/f5961c20c4052566b1b5a9fc0699c8cadb6147f5/2-intro-to-ml/README.md -------------------------------------------------------------------------------- /3-project-overview/README.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/crowd-course/datascience/f5961c20c4052566b1b5a9fc0699c8cadb6147f5/3-project-overview/README.md -------------------------------------------------------------------------------- /4-regression/4.3 - Regularization and Model Evaluation.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Optimizing the Models " 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Welcome to the practical section of module 4.3. Here we'll continue with the advertising-sales dataset to investigate the ideas of regularization and model evaluation. We'll continue with the multivariate regression model we build in the previous module and we'll be looking into tuning the regularization parameter to achieve the most accurate model and we'll evaluate this accuracy using better metrics than MSE which we have been using in the previous modules." 
15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 2, 20 | "metadata": { 21 | "collapsed": true 22 | }, 23 | "outputs": [], 24 | "source": [ 25 | "import pandas as pd\n", 26 | "import numpy as np\n", 27 | "from sklearn.linear_model import SGDRegressor\n", 28 | "from sklearn.preprocessing import StandardScaler\n", 29 | "\n", 30 | "%matplotlib inline\n", 31 | "import matplotlib.pyplot as plt\n", 32 | "plt.rcParams['figure.figsize'] = (10, 10)" 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": {}, 38 | "source": [ 39 | "# Building the Model" 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "In the following you'll see the same code (without visualization) we wrote in the previous module for the rgeression model using both **TV** and **Newspaper** data, so it's nothing new, **except** for the part where we prepare our data. We'll be splitting the dataset into three parts now instead of two:\n", 47 | "* **Training Set**: we'll train the model on this\n", 48 | "* **Validation Set**: we'll be tuning hyperparameters on this (more on that later)\n", 49 | "* **Tests Set**: we'll be evaluating our model on this" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 3, 55 | "metadata": { 56 | "collapsed": true 57 | }, 58 | "outputs": [], 59 | "source": [ 60 | "def scale_features(X, scalar=None):\n", 61 | " if(len(X.shape) == 1):\n", 62 | " X = X.reshape(-1, 1)\n", 63 | " \n", 64 | " if scalar == None:\n", 65 | " scalar = StandardScaler()\n", 66 | " scalar.fit(X)\n", 67 | " \n", 68 | " return scalar.transform(X), scalar" 69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": 3, 74 | "metadata": { 75 | "collapsed": false 76 | }, 77 | "outputs": [], 78 | "source": [ 79 | "# get the advertising data set\n", 80 | "\n", 81 | "dataset = pd.read_csv('../datasets/Advertising.csv')\n", 82 | "dataset = dataset[[\"TV\", \"Radio\", \"Newspaper\", \"Sales\"]] # filtering the Unamed index column out of the dataset" 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": 7, 88 | "metadata": { 89 | "collapsed": true 90 | }, 91 | "outputs": [], 92 | "source": [ 93 | "dataset_size = len(dataset)\n", 94 | "training_size = np.floor(dataset_size * 0.6).astype(int)\n", 95 | "validation_size = np.floor(dataset_size * 0.2).astype(int)\n", 96 | "\n", 97 | "# First we split the shuffled dataset into three parts: training, validation and test\n", 98 | "X_training = dataset[[\"TV\", \"Newspaper\"]][:training_size]\n", 99 | "y_training = dataset[\"Sales\"][:training_size]\n", 100 | "\n", 101 | "X_validation = dataset[[\"TV\", \"Newspaper\"]][training_size:training_size + validation_size]\n", 102 | "y_validation = dataset[\"Sales\"][training_size:training_size + validation_size]\n", 103 | "\n", 104 | "X_test = dataset[[\"TV\", \"Newspaper\"]][training_size:training_size + validation_size:]\n", 105 | "y_test = dataset[\"Sales\"][training_size:training_size + validation_size:]\n", 106 | "\n", 107 | "# Second we apply feature scaling on X_training and X_test\n", 108 | "X_training, training_scalar = scale_features(X_training)\n", 109 | "X_validation,_ = scale_features(X_validation, scalar=training_scalar)\n", 110 | "X_test,_ = scale_features(X_test, scalar=training_scalar)" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": 40, 116 | "metadata": { 117 | "collapsed": false 118 | }, 119 | "outputs": [ 120 | { 121 | "name": "stdout", 122 | "output_type": "stream", 123 | "text": 
[ 124 | "Trained model: y = 11.55 + 3.32x₁ + 0.83x₂\n", 125 | "The Test Data MSE is: 16.852\n" 126 | ] 127 | } 128 | ], 129 | "source": [ 130 | "model = SGDRegressor(loss='squared_loss')\n", 131 | "model.fit(X_training, y_training)\n", 132 | "\n", 133 | "w0 = model.intercept_\n", 134 | "w1 = model.coef_[0] # Notice that model.coef_ is a list now not a single number\n", 135 | "w2 = model.coef_[1]\n", 136 | "\n", 137 | "print \"Trained model: y = %0.2f + %0.2fx₁ + %0.2fx₂\" % (w0, w1, w2)\n", 138 | "\n", 139 | "MSE = np.mean((y_test - model.predict(X_test)) ** 2)\n", 140 | "\n", 141 | "print \"The Test Data MSE is: %0.3f\" % (MSE)" 142 | ] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "metadata": {}, 147 | "source": [ 148 | "# L2 Regularization" 149 | ] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": {}, 154 | "source": [ 155 | "From the videos, we learned that the idea of **regularization** is introduced to prevent the model from overfitting to the data points by adding a penality for large weights values. Such penality is expressed mathematically with the second term of the cost function:\n", 156 | "\n", 157 | "$$ J(W) = \\sum_{i=1}^{m} (h_w(X^{(i)} - y^{(i)})^2 + \\lambda \\sum_{j=1}^{n} w_j^2 $$\n", 158 | "\n", 159 | "This is called **L2 Regularization** and $\\lambda$ is called the **Regularization Parameter** , How can we implment it then with scikit-learn for our models?\n", 160 | "\n", 161 | "Well, no worries, scikit-learn implements that for you and we have been using it all the time.\n", 162 | "The **SGDRegressor** constructs has two arguments that define the behavior of the penality:\n", 163 | "* *penalty*: which is a string specifying the type of penality to use (default to 'l2')\n", 164 | "* *alpha*: which is the value of the $\\lambda$ in the equation above\n", 165 | "\n", 166 | "Now let's play with the value of alpha and see how does that affect our model's accuracy. Let's set alpha to a large number say 1. In this case we give the values of the weights a very harsh penalty so they'll end up smaller than they should be and the accuracy should be worse!" 167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": 41, 172 | "metadata": { 173 | "collapsed": false 174 | }, 175 | "outputs": [ 176 | { 177 | "name": "stdout", 178 | "output_type": "stream", 179 | "text": [ 180 | "Trained model: y = 11.53 + 1.97x₁ + 0.48x₂\n", 181 | "The Test Data MSE is: 19.423\n" 182 | ] 183 | } 184 | ], 185 | "source": [ 186 | "model = SGDRegressor(loss='squared_loss', alpha=1)\n", 187 | "model.fit(X_training, y_training)\n", 188 | "\n", 189 | "w0 = model.intercept_\n", 190 | "w1 = model.coef_[0] # Notice that model.coef_ is a list now not a single number\n", 191 | "w2 = model.coef_[1]\n", 192 | "\n", 193 | "print \"Trained model: y = %0.2f + %0.2fx₁ + %0.2fx₂\" % (w0, w1, w2)\n", 194 | "\n", 195 | "MSE = np.mean((y_test - model.predict(X_test)) ** 2)\n", 196 | "\n", 197 | "print \"The Test Data MSE is: %0.3f\" % (MSE)" 198 | ] 199 | }, 200 | { 201 | "cell_type": "markdown", 202 | "metadata": {}, 203 | "source": [ 204 | "The effect the value of the regularization parameter has on the model's accuracy makes a very good candidate for tuning. We can use the validation data set we created for that purpose. We create a list of possible values for the regularization parameter, we train the model using each of these value and evaluate the model using the validation set. 
The value with the best evaluation (least MSE) is the best value for the regularization parameter." 205 | ] 206 | }, 207 | { 208 | "cell_type": "code", 209 | "execution_count": 61, 210 | "metadata": { 211 | "collapsed": false 212 | }, 213 | "outputs": [ 214 | { 215 | "name": "stdout", 216 | "output_type": "stream", 217 | "text": [ 218 | "The Best alpha is: 0.0001\n", 219 | "The Test Data MSE is: 16.985\n" 220 | ] 221 | } 222 | ], 223 | "source": [ 224 | "alphas = [0.00025, 0.00005, 0.0001, 0.0002, 0.0004]\n", 225 | "best_alpha = alphas[0]\n", 226 | "least_mse = float(\"inf\") #initialized to infinity\n", 227 | "for possible_alpha in alphas:\n", 228 | " model = SGDRegressor(loss='squared_loss', alpha=possible_alpha)\n", 229 | " model.fit(X_training, y_training)\n", 230 | " \n", 231 | " mse = np.mean((y_validation - model.predict(X_validation)) ** 2)\n", 232 | " if mse <= least_mse:\n", 233 | " least_mse = mse\n", 234 | " best_alpha = possible_alpha\n", 235 | "\n", 236 | "print \"The Best alpha is: %.4f\" % (best_alpha) \n", 237 | "\n", 238 | "best_model = SGDRegressor(loss='squared_loss', alpha=best_alpha)\n", 239 | "best_model.fit(X_training, y_training)\n", 240 | "MSE = np.mean((y_test - best_model.predict(X_test)) ** 2) # evaluating the best model on test data\n", 241 | "\n", 242 | "print \"The Test Data MSE is: %0.3f\" % (MSE)" 243 | ] 244 | }, 245 | { 246 | "cell_type": "markdown", 247 | "metadata": {}, 248 | "source": [ 249 | "There's a better way to tune the regularization parameter and possiblby multiple parameters at the same time. This way through scikit-learn's [GridSearchCV](http://scikit-learn.org/stable/modules/grid_search.html). We'll not be working with that here, but you're encouraged to read the documentation and user guides and try for yourself how it could be done. Once you got the hang of it, you can maybe try and tune the learning rate and the regularization parameter at the same time! " 250 | ] 251 | }, 252 | { 253 | "cell_type": "markdown", 254 | "metadata": {}, 255 | "source": [ 256 | "# The R-squared Metric" 257 | ] 258 | }, 259 | { 260 | "cell_type": "markdown", 261 | "metadata": {}, 262 | "source": [ 263 | "The Last thing we have here is to see how we can evaluate our model using the $R^2$ metric. We learned in the videos that the $R^2$ metric measures how close the data points are to our regression line (or plane). We also learned that there's an adjusted version of that metric denoted by $\\overline{R^2}$ that penalizes for the extra features we add to the model that doesn't help the model be more accurate. Those metric can be calculated using the following formulas:\n", 264 | "\n", 265 | "$$R^2 = 1 - \\frac{\\sum_{i=1}^{n}(y_i - f_i)^2}{\\sum_{i=1}^{n}(y_i - \\overline{y})^2}$$" 266 | ] 267 | }, 268 | { 269 | "cell_type": "markdown", 270 | "metadata": {}, 271 | "source": [ 272 | "where $f_i$ is our model prediction and $overline{y}$ is the mean of all n $y_i$s. And for the adjusted version:\n", 273 | "\n", 274 | "$$\\overline{R^2} = R^2 - \\frac{k - 1}{n - k}(1 - R^2)$$" 275 | ] 276 | }, 277 | { 278 | "cell_type": "markdown", 279 | "metadata": {}, 280 | "source": [ 281 | "where $k$ is the number of fatures and $n$ is the number of data samples. Both $R^2$ and $\\overline{R^2}$ take a value less than or equal to **1**.The closer it is to one, the better our model is.\n", 282 | "\n", 283 | "Fortunately, we don't have to do all these calculations by hand to use this metric with scikit-learn. The model's **score** method does that for us. 
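If you want to check this metric by hand, or compute the adjusted version yourself (as far as I can tell from the scikit-learn documentation, `score` returns the plain $R^2$, so the $\overline{R^2}$ correction has to be applied on top of it), both formulas above translate directly into NumPy. This sketch reuses `model`, `X_test` and `y_test` from the earlier cells:

```python
predictions = model.predict(X_test)

ss_res = np.sum((y_test - predictions) ** 2)        # sum of squared residuals
ss_tot = np.sum((y_test - np.mean(y_test)) ** 2)    # total sum of squares
R2 = 1 - ss_res / ss_tot

n = len(y_test)   # number of test samples
k = 2             # number of features in this model (TV and Newspaper)
R2_adjusted = R2 - (k - 1.0) / (n - k) * (1 - R2)   # adjusted R-squared, using the notebook's formula

print("R^2 = %0.3f, adjusted R^2 = %0.3f" % (R2, R2_adjusted))
print("model.score(X_test, y_test) = %0.3f" % model.score(X_test, y_test))
```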
It takes the test Xs and ys and spits out the value of $\\overline{R^2}$" 284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": 64, 289 | "metadata": { 290 | "collapsed": false 291 | }, 292 | "outputs": [ 293 | { 294 | "name": "stdout", 295 | "output_type": "stream", 296 | "text": [ 297 | "Trained model: y = 13.82 + 3.90x₁ + 0.98x₂\n", 298 | "The Model's Adjusted R² on Test Data is 0.65\n" 299 | ] 300 | } 301 | ], 302 | "source": [ 303 | "model = SGDRegressor(loss='squared_loss', eta0=0.02)\n", 304 | "model.fit(X_training, y_training)\n", 305 | "\n", 306 | "w0 = model.intercept_\n", 307 | "w1 = model.coef_[0] # Notice that model.coef_ is a list now not a single number\n", 308 | "w2 = model.coef_[1]\n", 309 | "\n", 310 | "print \"Trained model: y = %0.2f + %0.2fx₁ + %0.2fx₂\" % (w0, w1, w2)\n", 311 | "\n", 312 | "R2_adjusted = model.score(X_test, y_test)\n", 313 | "\n", 314 | "print \"The Model's Adjusted R² on Test Data is %0.2f\" % (R2_adjusted)" 315 | ] 316 | }, 317 | { 318 | "cell_type": "markdown", 319 | "metadata": {}, 320 | "source": [ 321 | "# Exercise\n", 322 | "Apply the ideas of L2 Regularization and $R^2$ metric to the exercises you did in the last two modules.\n", 323 | "\n", 324 | "# Research Idea\n", 325 | "Download [Kaggle's 2016 US Election Dataset](https://www.kaggle.com/benhamner/2016-us-election/) and explore the data using what you learned in Linear Regression. Make assumptions about the data correlations and dependence and test your assumptions using what you learned. If had interesting results, publish your code and your results to the [Script's Repo](https://www.kaggle.com/benhamner/2016-us-election/scripts) and share them with the community." 326 | ] 327 | }, 328 | { 329 | "cell_type": "code", 330 | "execution_count": null, 331 | "metadata": { 332 | "collapsed": true 333 | }, 334 | "outputs": [], 335 | "source": [] 336 | } 337 | ], 338 | "metadata": { 339 | "kernelspec": { 340 | "display_name": "Python 2", 341 | "language": "python", 342 | "name": "python2" 343 | }, 344 | "language_info": { 345 | "codemirror_mode": { 346 | "name": "ipython", 347 | "version": 2 348 | }, 349 | "file_extension": ".py", 350 | "mimetype": "text/x-python", 351 | "name": "python", 352 | "nbconvert_exporter": "python", 353 | "pygments_lexer": "ipython2", 354 | "version": "2.7.6" 355 | } 356 | }, 357 | "nbformat": 4, 358 | "nbformat_minor": 0 359 | } 360 | -------------------------------------------------------------------------------- /4-regression/README.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/crowd-course/datascience/f5961c20c4052566b1b5a9fc0699c8cadb6147f5/4-regression/README.md -------------------------------------------------------------------------------- /5-classification/5.4 - Artificial Neural Networks in Classifying Breast Cancer.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Example 5.4: Classifying Malignant/Benign Breast Tumors with Artificial Neural Networks" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Welcome to the practical section of module 5.4. 
Here we'll be using **Artificial Neural Networks** with the [Wisconsin Breast Cancer Database](http://bit.ly/1IoTs7x) just like in the practical example for module 5.2 to predict whether a patient's tumor is benign or malignant based on tumor cell charactaristics. This is just one example from many to which machine learning and classification could offer great insights and aid. **Make sure** to delete any rows with missing data (which will contain a **\"?\"** character in a feature cell).\n", 15 | "\n", 16 | "By the end of the module, we'll have a trained an artificial neural network model on the a subset of the features presented in the dataset that is very accurate at diagnosing the condition of the tumor based on these features. We'll also see how we can make interseting inferences from the model that could be helpful for the physicians in diagnosing cancer." 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "Since scikit-learn's newest **stable** version does not support neural networks / multi-layer perceptrons, we will be using the [scikit-neuralnetwork](https://scikit-neuralnetwork.readthedocs.io/en/latest/) third-party implementation. To install scikit-neuralnetwork, please consult the [Installation](https://scikit-neuralnetwork.readthedocs.io/en/latest/guide_installation.html) section of the documentation." 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "First, we will import all our dependencies. Make sure to install all of these separately:" 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": 52, 36 | "metadata": { 37 | "collapsed": true 38 | }, 39 | "outputs": [], 40 | "source": [ 41 | "import pandas as pd # we use this library to import a CSV of cancer tumor data\n", 42 | "import numpy as np # we use this library to help us represent traditional Python arrays/lists as matrices/tensors with linear algebra operations\n", 43 | "\n", 44 | "from sknn.mlp import Classifier, Layer # we use this library for the actual neural network code\n", 45 | "from sklearn.utils import shuffle # we use this library for randomly shuffling arrays/tensors\n", 46 | "\n", 47 | "import sys # we use for accessing output window \n", 48 | "import logging # we use this library for outputting real time statistics/updates on training progress\n", 49 | "logging.basicConfig(format=\"%(message)s\", level=logging.DEBUG, stream=sys.stdout) # set the logging mode to DEBUG to output training information, use \"INFO\" for less volume of output" 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": {}, 55 | "source": [ 56 | "# The Data\n", 57 | "\n", 58 | "We'll start off by exploring our dataset to see what attributes we have and how the class of the tumor is represented" 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": {}, 64 | "source": [ 65 | "Before we proceed, ensure to include headers to the dataset provided by the University of Wisconsin. We will use the following headers:\n", 66 | "\n", 67 | "> ID,CT,UCS,UCSh,MA,SECS,BN,BC,NN,M,Class\n", 68 | "\n", 69 | "Add this to the **beginning** line of your .csv file." 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": 2, 75 | "metadata": { 76 | "collapsed": false 77 | }, 78 | "outputs": [ 79 | { 80 | "data": { 81 | "text/html": [ 82 | "
\n", 83 | "\n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | " \n", 199 | " \n", 200 | " \n", 201 | " \n", 202 | " \n", 203 | " \n", 204 | " \n", 205 | " \n", 206 | " \n", 207 | " \n", 208 | " \n", 209 | " \n", 210 | " \n", 211 | " \n", 212 | " \n", 213 | " \n", 214 | " \n", 215 | " \n", 216 | " \n", 217 | " \n", 218 | " \n", 219 | " \n", 220 | " \n", 221 | " \n", 222 | " \n", 223 | " \n", 224 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | "
IDCTUCSUCShMASECSBNBCNNMClass
010000255111213112
1100294554457103212
210154253111223112
310162776881343712
410170234113213112
510171228101087109714
6101809911112103112
710185612121213112
810330782111211152
910330784211212112
\n", 243 | "
" 244 | ], 245 | "text/plain": [ 246 | " ID CT UCS UCSh MA SECS BN BC NN M Class\n", 247 | "0 1000025 5 1 1 1 2 1 3 1 1 2\n", 248 | "1 1002945 5 4 4 5 7 10 3 2 1 2\n", 249 | "2 1015425 3 1 1 1 2 2 3 1 1 2\n", 250 | "3 1016277 6 8 8 1 3 4 3 7 1 2\n", 251 | "4 1017023 4 1 1 3 2 1 3 1 1 2\n", 252 | "5 1017122 8 10 10 8 7 10 9 7 1 4\n", 253 | "6 1018099 1 1 1 1 2 10 3 1 1 2\n", 254 | "7 1018561 2 1 2 1 2 1 3 1 1 2\n", 255 | "8 1033078 2 1 1 1 2 1 1 1 5 2\n", 256 | "9 1033078 4 2 1 1 2 1 2 1 1 2" 257 | ] 258 | }, 259 | "execution_count": 2, 260 | "metadata": {}, 261 | "output_type": "execute_result" 262 | } 263 | ], 264 | "source": [ 265 | "dataset = pd.read_csv('./datasets/breast-cancer-wisconson.csv') # import the CSV data into an array using the panda dependency\n", 266 | "print dataset[:10]" 267 | ] 268 | }, 269 | { 270 | "cell_type": "markdown", 271 | "metadata": {}, 272 | "source": [ 273 | "To understand the meaning of the abbreviations we can consult the [dataset's website](http://bit.ly/1IoTs7x) to find a description of each attribute in order. We are going to train on **all** features unlike the logistic regression example (where we just trained on three). This does mean that we will be unable to visualize the results, but will get a feel for how to work with high-dimensional data.\n", 274 | "\n", 275 | "If you noticed the **Class** attribute at the end (which gives the class of the tumor), you'll find that it takes either 2 or 4, where 2 represents a *benign* tumor while 4 represents a *malignant* tumor. We'll change that to more expressive values and make a benign tumor represented by 0 (false) and mlignants by 1s (true).\n", 276 | "\n", 277 | "You'll notice that the **ID** attribute of data that is useless to our modelling, since it provides no information about the tumor itself, and is instead a way of identifying a specific tumor. We will hence strip this from our dataset before training." 278 | ] 279 | }, 280 | { 281 | "cell_type": "code", 282 | "execution_count": null, 283 | "metadata": { 284 | "collapsed": false 285 | }, 286 | "outputs": [], 287 | "source": [ 288 | "dataset = dataset[[\"CT\", \"UCS\", \"UCSh\", \"MA\", \"SECS\", \"BN\", \"BC\", \"NN\", \"M\", \"Class\"]] # remove the ID attribute from the dataset\n", 289 | "dataset.is_copy = False # this is just to hide a nasty warning!\n", 290 | "[0 if tclass == 2 else 1 for tclass in dataset[\"Class\"]] # convert Class attributes to 0/1 if they are 2/4 in dataset[\"Class\"] column" 291 | ] 292 | }, 293 | { 294 | "cell_type": "markdown", 295 | "metadata": {}, 296 | "source": [ 297 | "We will now need to split up the dataset into two separate tensors: **X** and **y**. **X** will contain the features and their values for each training example, and **y** will contain all the outputs. In this training set, let's say **m** refers to the number of training examples (of which there are just over 600) and **n** refers to the number of features (of which there are 9). Thus, **X** will be a matrix where $X\\in \\mathbb{R}^{m\\:\\cdot \\:n}$ and **y** is a vector where $y\\in \\mathbb{R}^m$ (because we only have one output - a probability of the tumor being malignant).\n", 298 | "\n", 299 | "We simply separate by the **\"Class\"** attribute and the other features." 
300 | ] 301 | }, 302 | { 303 | "cell_type": "code", 304 | "execution_count": null, 305 | "metadata": { 306 | "collapsed": true 307 | }, 308 | "outputs": [], 309 | "source": [ 310 | "X = np.array(dataset[[\"CT\", \"UCS\", \"UCSh\", \"MA\", \"SECS\", \"BN\", \"BC\", \"NN\", \"M\"]]) # X is composed of all the n feature columns\n", 311 | "y = np.array(dataset[\"Class\"]) # y is composed of just the output class column" 312 | ] 313 | }, 314 | { 315 | "cell_type": "markdown", 316 | "metadata": {}, 317 | "source": [ 318 | "# Training the Neural Network Model" 319 | ] 320 | }, 321 | { 322 | "cell_type": "markdown", 323 | "metadata": {}, 324 | "source": [ 325 | "For the training, we are going to split the dataset into a training set and a test set. The training set will be 70% of the original data set, and will be what the neural network will learn from. We will test the accuracy of the neural network's learned weights by using the test set, which is composed of 30% of the original data." 326 | ] 327 | }, 328 | { 329 | "cell_type": "markdown", 330 | "metadata": {}, 331 | "source": [ 332 | "It's **very important** to **shuffle** the dataset before partitioning into training/test sets. Why? Because the data given to us by University of Wisconsin may be in some sorted (apparent or unapparent) order. It may be that most y=0 examples are in the latter half of the data. It may be that most of the nearby recorded patients are similar, or were recorded during similar dates/times. We want none of that - we want to remove any information and ensure that our partitioning is random, so that our test results represent true probabilites of picking a random training case from an entire population of permutations of the feature vector. For example, I once got a 5% difference in test results since the data I had was sorted beforehand. Essentially, we want to make sure that we're being absolutely fair about the partition, and not accidentally making our test results too good/too bad.\n", 333 | "\n", 334 | "We can't use numpy's traditional shuffle function because this shuffles one array only. If we independently shuffled **x** and **y**, the order between them would be lost (ie. if training case x had output of 1 beforehand, this may be accidentally changed to an output of 0, since we use corresponding indicdes to match up the inputs and outputs)." 
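Two small gaps are worth closing while we are preparing the data: the introduction asked us to drop the rows that contain the "?" placeholder for missing values (a handful of **BN** cells have it), and the list comprehension in the earlier cell builds the 0/1 labels but never assigns them back to the dataframe. A minimal pandas sketch of both steps, assuming it runs before **X** and **y** are built:

```python
dataset = dataset.replace('?', np.nan).dropna()    # drop the rows flagged with the '?' placeholder
dataset["BN"] = dataset["BN"].astype(int)          # the '?' entries made this column text; restore integers
dataset["Class"] = [0 if c == 2 else 1 for c in dataset["Class"]]  # assign the 0/1 labels back
```

With that in place, the synchronized shuffle in the next cell sees a fully numeric, consistently labelled dataset.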
335 | ] 336 | }, 337 | { 338 | "cell_type": "code", 339 | "execution_count": null, 340 | "metadata": { 341 | "collapsed": true 342 | }, 343 | "outputs": [], 344 | "source": [ 345 | "X, y = shuffle(X, y, random_state=0) # we use scikit-learn's synchronized shuffle feature to shuffle two arrays in unison" 346 | ] 347 | }, 348 | { 349 | "cell_type": "code", 350 | "execution_count": 48, 351 | "metadata": { 352 | "collapsed": false 353 | }, 354 | "outputs": [], 355 | "source": [ 356 | "dataset_size = len(dataset) # get the size of overall dataset\n", 357 | "training_size = np.floor(dataset_size * 0.7).astype(int) # get the training size as 70% of the dataset size (or roughly 0.7 * dataset_size) and as an integer\n", 358 | "\n", 359 | "X_train = X[:training_size] # extract the first 70% of inputs for training\n", 360 | "y_train = y[:training_size] # extract the first 70% of inputs for training\n", 361 | "\n", 362 | "X_test = X[training_size:] # extract rest 30% of inputs for testing\n", 363 | "y_test = y[training_size:] # extract rest 30% of outputs for testing" 364 | ] 365 | }, 366 | { 367 | "cell_type": "markdown", 368 | "metadata": {}, 369 | "source": [ 370 | "**scikit-neuralnetwork** offers a really neat, easy to use API (from sknn.mlp import Classifier, Layer) for training neural networks. This API has support for many different paradigms like dropout, momentum, weight decay, mini-batch gradient descent etc. and even different neural network types like Convolutional Neural Networks. Today, however, our goal is to get a simple Artificial Neural Network setup!" 371 | ] 372 | }, 373 | { 374 | "cell_type": "markdown", 375 | "metadata": {}, 376 | "source": [ 377 | "Our first job is to configure the architecture for the neural network. We will need to decide:\n", 378 | "* The number of hidden layers\n", 379 | "* The size of each hidden layer\n", 380 | "* The activation function used at each hidden layer\n", 381 | "* The learning rate, number of iterations, and other hyperparameters\n", 382 | "\n", 383 | "Some types of activation functions offered by scikit-neuralnetwork include:\n", 384 | "* Linear\n", 385 | "* Rectifier\n", 386 | "* Sigmoid\n", 387 | "* Softmax\n", 388 | "\n", 389 | "Where Softmax computes a sigmoid probability distribution over multiple outputs. Generally, it is conventional to use Softmax as the activation function for the output layer (when we have 1 output it really is just the same as a Sigmoid layer, but scikit-neuralnetwork will still output a warning). Recall the formula for the sigmoid activation function:\n", 390 | "\n", 391 | "$Sigmoid\\left(z\\right)=\\frac{1}{1+e^{-z}}$\n", 392 | "\n", 393 | "This function \"squeezes\" any real value into a probability of range (0, 1).\n", 394 | "\n", 395 | "The Linear and Rectifier [(ReLU)](https://en.wikipedia.org/wiki/Rectifier_(neural_networks)) may be used, but we obviously need to use Sigmoid/Softmax in our neural network because we are performing a classification task. Generally, I found that using Rectifier units throughout and then having a Softmax (Sigmoid) layer at the end produced the best results." 396 | ] 397 | }, 398 | { 399 | "cell_type": "markdown", 400 | "metadata": {}, 401 | "source": [ 402 | "Now we need to decide on our structure (as in, the hidden layers and their sizes). Our neural network will end up looking like the following:\n", 403 | "\n", 404 | "\n", 405 | "That is, for this example, we will use two hidden layers (both of which are Rectifiers). 
Generally, the greater number of hidden layers you have, the greater complexity. Two should be fine for our task though. In our neural network, we will have **9** input nodes (for our 9 features), and I have chosen 100 neurons for the first hidden layer as well as 50 for the second. The Softmax layer will just have one output because we only want one output (probability of malignancy). You can **play around** with these numbers, bar the Softmax output layer :) We really need not that many hidden units in each hidden layer, but I want to demonstrate the scale we can create with this API.\n", 406 | "\n", 407 | "**NOTE**: In actuality, our Softmax layer for **this coding example** will have **two outputs**. For classifiers, whether its binary classification or multi-class, scikit-neuralnetwork uses a one-hot-encoded representation of the labels with cross-entropy-loss. This requires that the output layer (the softmax layer) to be a probability distribution over all labels, hence the number of the units in the softmax layer needs to be the number of labels. In binary classification, we have 2 labels: 0 and 1, so the expected behavior is for the softmax layer to have 2 units.\n", 408 | "\n", 409 | "Finally, we need to choose our **hyperparameters**. A **learning rate** of **0.001** and number of iterations/epochs of **100** should suffice. Our code will look like the following:" 410 | ] 411 | }, 412 | { 413 | "cell_type": "code", 414 | "execution_count": null, 415 | "metadata": { 416 | "collapsed": true 417 | }, 418 | "outputs": [], 419 | "source": [ 420 | "nn = Classifier(layers=[ # create a new Classifier object (neural network classifier), and pass all the layers as parameters\n", 421 | " Layer(\"Rectifier\", units=100), # create the first post-input hidden layer, a Rectifier (ReLU) layer of 100 units\n", 422 | " Layer(\"Rectifier\", units=50), # create the second hidden layer, a Rectifier layer of 50 units\n", 423 | " Layer(\"Softmax\"), units=2], # create the final output layer, a Softmax layer that will output two probabilities, as mentioned before\n", 424 | " learning_rate=0.001, n_iter=100) # pass in hyperparameters as a separate parameter to the layers" 425 | ] 426 | }, 427 | { 428 | "cell_type": "markdown", 429 | "metadata": {}, 430 | "source": [ 431 | "Our neural network has been configured and built! Now, we just need to train it using our training set, using the intuitively named function \"fit\":" 432 | ] 433 | }, 434 | { 435 | "cell_type": "code", 436 | "execution_count": null, 437 | "metadata": { 438 | "collapsed": true 439 | }, 440 | "outputs": [], 441 | "source": [ 442 | "nn.fit(X_train, y_train) # begin training using backpropagation!" 443 | ] 444 | }, 445 | { 446 | "cell_type": "markdown", 447 | "metadata": {}, 448 | "source": [ 449 | "The output in my console due to the DEBUG logging looked like this (it's fun to see training in action!):\n", 450 | "\n", 451 | "" 452 | ] 453 | }, 454 | { 455 | "cell_type": "markdown", 456 | "metadata": {}, 457 | "source": [ 458 | "**NOTE**: We do **not** need to prefix 1s to the dataset beforehand to achieve bias terms that vertically shift the decision boundary regions. The API provides this by default." 
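One detail to watch in the construction cell above: as printed, the closing parenthesis of the Softmax layer sits before `units=2` (`Layer("Softmax"), units=2]`), which Python would reject. The call the surrounding text describes would look something like this sketch, with the same layer sizes and hyperparameters:

```python
nn = Classifier(
    layers=[
        Layer("Rectifier", units=100),  # first hidden layer
        Layer("Rectifier", units=50),   # second hidden layer
        Layer("Softmax", units=2)       # output layer: one unit per class label (benign/malignant)
    ],
    learning_rate=0.001,
    n_iter=100)
```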
459 | ] 460 | }, 461 | { 462 | "cell_type": "markdown", 463 | "metadata": {}, 464 | "source": [ 465 | "# Results " 466 | ] 467 | }, 468 | { 469 | "cell_type": "markdown", 470 | "metadata": {}, 471 | "source": [ 472 | "We can see the weights that were produced (including the bias terms) using the following line of code:" 473 | ] 474 | }, 475 | { 476 | "cell_type": "code", 477 | "execution_count": null, 478 | "metadata": { 479 | "collapsed": true 480 | }, 481 | "outputs": [], 482 | "source": [ 483 | "print nn.get_parameters() # output the weights of the neural network" 484 | ] 485 | }, 486 | { 487 | "cell_type": "markdown", 488 | "metadata": {}, 489 | "source": [ 490 | "Unlike logistic regression, the weights of neural networks, unfortunately, are not very interpretable. There are also a high number of these weights. For example, the weights my system produced were outputted as so:\n", 491 | "\n", 492 | "" 493 | ] 494 | }, 495 | { 496 | "cell_type": "markdown", 497 | "metadata": {}, 498 | "source": [ 499 | "However, it is more important to output the **acurracy** of these results, and how much error is made. Unlike logistic regression where we used a cost function that outputted a real number, we are going to find the **percent** error of our system like so:\n", 500 | "\n", 501 | "1. Create predictions in the form of probabilities for each test tumor being malignant\n", 502 | "2. Iterate through the test set with index i\n", 503 | "3. Fetch the predicted output probability of this test tumor\n", 504 | "4. Round this probability to either a 1 (malignant) or a 0 (benign)\n", 505 | "5. Compare this to the correct test output at the ith index\n", 506 | "6. If an error occurs, increment some pre-initialized error counter\n", 507 | "7. Get the percentage error by dividing the error counter by the total number of test examples, multiplying by 100" 508 | ] 509 | }, 510 | { 511 | "cell_type": "code", 512 | "execution_count": null, 513 | "metadata": { 514 | "collapsed": true 515 | }, 516 | "outputs": [], 517 | "source": [ 518 | "error_count = 0.0 # initialize the error counter\n", 519 | "prob_predictions = nn.predict_proba(X_test) # predict the outputs of the X_test instances, in the form of probabilities (hence predict_proba)\n", 520 | "for i in xrange(len(prob_predictions)): # iterate through all the predictions (equal in size to test set)\n", 521 | " # create a discrete decision for the tumor being malignant or benign, using 0.5 as the lowest probability needed for a predicted malignancy (general rounding)\n", 522 | " # as discussed before, our network actually outputs [probability_of_benign, probability_of_malignant], so we will want to\n", 523 | " # fetch the probability_of_malignant value and round this one (that's how it would be for a single output network if it worked!)\n", 524 | " discrete_prediction = 0 if prob_predictions[i][1] < 0.5 else 1\n", 525 | "\tif not y_test[i] == discrete_prediction: # if the actual, correct value for this test tumor does not equal the discrete prediction \n", 526 | "\t\terror_count += 1.0 # increment the number of errors\n", 527 | " \n", 528 | "error_rate = error_count / len(prob_predictions) * 100 # get the percentage of errors by dividing total errors by number of instances, multiplying by 100 \n", 529 | "print error_count # print number of raw errors\n", 530 | "print str(error_rate) + \"%\" # output this error percentage " 531 | ] 532 | }, 533 | { 534 | "cell_type": "markdown", 535 | "metadata": {}, 536 | "source": [ 537 | "My program produced **8** errors in 
total, with an error rate of **3.9%**. This means my neural network had a success/accuracy rate of **96.1%**. This is pretty good! In the future, we can make it even better. We could introduce more complex models with a greater number of hidden layers, or we could perform pre-processing on the input eg. by using normalization. We may also want to apply **regularization**, and ensure that our model is not too complex for the data (we could use more test data to do that - but right now it looks good!). Lastly, we may want to look at each individual error and try to minimize **false negatives** (we predict a patient **does not** have a malignant tumor when they **do** - risky business!) over **false positives**.\n", 538 | "\n", 539 | "There are many other things we can do from here, and this practical example demonstrated the power of neural networks and how we can use an API like scikit-neuralnetwork to get one up and running in no time. Hope y'all had fun!" 540 | ] 541 | } 542 | ], 543 | "metadata": { 544 | "kernelspec": { 545 | "display_name": "Python 2", 546 | "language": "python", 547 | "name": "python2" 548 | }, 549 | "language_info": { 550 | "codemirror_mode": { 551 | "name": "ipython", 552 | "version": 2 553 | }, 554 | "file_extension": ".py", 555 | "mimetype": "text/x-python", 556 | "name": "python", 557 | "nbconvert_exporter": "python", 558 | "pygments_lexer": "ipython2", 559 | "version": "2.7.11" 560 | } 561 | }, 562 | "nbformat": 4, 563 | "nbformat_minor": 0 564 | } 565 | -------------------------------------------------------------------------------- /5-classification/README.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/crowd-course/datascience/f5961c20c4052566b1b5a9fc0699c8cadb6147f5/5-classification/README.md -------------------------------------------------------------------------------- /6-clustering/README.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/crowd-course/datascience/f5961c20c4052566b1b5a9fc0699c8cadb6147f5/6-clustering/README.md -------------------------------------------------------------------------------- /6-clustering/k means visualization/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 |
[The markup of index.html was stripped during extraction; only the page heading "K means visualization" survives. Judging from the element IDs referenced in k-means.js below, the page supplies a #kmeans container holding an svg, numeric inputs #N and #K for the point and cluster counts, buttons #step, #restart and #reset, and script tags loading d3.v3.min.js and k-means.js.]
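The script that follows (k-means.js) alternates the two standard Lloyd steps: `updateGroups` assigns every point to its nearest centre, and `moveCenter` moves each centre to the mean of the points assigned to it. For readers who want the same loop outside the browser, here is a minimal NumPy sketch; the 200 random 2-D points, `k = 3` and 10 iterations are illustrative assumptions, not values taken from the page:

```python
import numpy as np

points = np.random.rand(200, 2)     # random 2-D points, like the dots on the page
centers = np.random.rand(3, 2)      # k = 3 centres at random positions, as the script does

for _ in range(10):                 # a few Lloyd iterations
    # assignment step: index of the nearest centre for every point
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    # update step: move each centre to the mean of its assigned points (skip empty clusters)
    for j in range(len(centers)):
        if np.any(labels == j):
            centers[j] = points[labels == j].mean(axis=0)
```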
42 | -------------------------------------------------------------------------------- /6-clustering/k means visualization/k-means.js: -------------------------------------------------------------------------------- 1 | var flag = false; 2 | var WIDTH = d3.select("#kmeans")[0][0].offsetWidth - 20; 3 | var HEIGHT = Math.max(300, WIDTH * .7); 4 | var svg = d3.select("#kmeans svg") 5 | .attr('width', WIDTH) 6 | .attr('height', HEIGHT) 7 | .style('padding', '10px') 8 | .style('background', '#223344') 9 | .style('cursor', 'pointer') 10 | .style('-webkit-user-select', 'none') 11 | .style('-khtml-user-select', 'none') 12 | .style('-moz-user-select', 'none') 13 | .style('-ms-user-select', 'none') 14 | .style('user-select', 'none') 15 | .on('click', function() { 16 | d3.event.preventDefault(); 17 | step(); 18 | }); 19 | 20 | d3.selectAll("#kmeans button") 21 | .style('padding', '.5em .8em'); 22 | 23 | d3.selectAll("#kmeans label") 24 | .style('display', 'inline-block') 25 | .style('width', '15em'); 26 | 27 | var lineg = svg.append('g'); 28 | var dotg = svg.append('g'); 29 | var centerg = svg.append('g'); 30 | d3.select("#step") 31 | .on('click', function() { step(); draw(); }); 32 | d3.select("#restart") 33 | .on('click', function() { restart(); draw(); }); 34 | d3.select("#reset") 35 | .on('click', function() { init(); draw(); }); 36 | 37 | 38 | var groups = [], dots = []; 39 | 40 | function step() { 41 | d3.select("#restart").attr("disabled", null); 42 | if (flag) { 43 | moveCenter(); 44 | draw(); 45 | } else { 46 | updateGroups(); 47 | draw(); 48 | } 49 | flag = !flag; 50 | } 51 | 52 | function init() { 53 | d3.select("#restart").attr("disabled", "disabled"); 54 | 55 | var N = parseInt(d3.select('#N')[0][0].value, 10); 56 | var K = parseInt(d3.select('#K')[0][0].value, 10); 57 | groups = []; 58 | for (var i = 0; i < K; i++) { 59 | var g = { 60 | dots: [], 61 | color: 'hsl(' + (i * 360 / K) + ',100%,50%)', 62 | center: { 63 | x: Math.random() * WIDTH, 64 | y: Math.random() * HEIGHT 65 | }, 66 | init: { 67 | center: {} 68 | } 69 | }; 70 | g.init.center = { 71 | x: g.center.x, 72 | y: g.center.y 73 | }; 74 | groups.push(g); 75 | } 76 | 77 | dots = []; 78 | flag = false; 79 | for (i = 0; i < N; i++) { 80 | var dot ={ 81 | x: Math.random() * WIDTH, 82 | y: Math.random() * HEIGHT, 83 | group: undefined 84 | }; 85 | dot.init = { 86 | x: dot.x, 87 | y: dot.y, 88 | group: dot.group 89 | }; 90 | dots.push(dot); 91 | } 92 | } 93 | 94 | function restart() { 95 | flag = false; 96 | d3.select("#restart").attr("disabled", "disabled"); 97 | 98 | groups.forEach(function(g) { 99 | g.dots = []; 100 | g.center.x = g.init.center.x; 101 | g.center.y = g.init.center.y; 102 | }); 103 | 104 | for (var i = 0; i < dots.length; i++) { 105 | var dot = dots[i]; 106 | dots[i] = { 107 | x: dot.init.x, 108 | y: dot.init.y, 109 | group: undefined, 110 | init: dot.init 111 | }; 112 | } 113 | } 114 | 115 | 116 | function draw() { 117 | var circles = dotg.selectAll('circle') 118 | .data(dots); 119 | circles.enter() 120 | .append('circle'); 121 | circles.exit().remove(); 122 | circles 123 | .transition() 124 | .duration(500) 125 | .attr('cx', function(d) { return d.x; }) 126 | .attr('cy', function(d) { return d.y; }) 127 | .attr('fill', function(d) { return d.group ? 
d.group.color : '#ffffff'; }) 128 | .attr('r', 5); 129 | 130 | if (dots[0].group) { 131 | var l = lineg.selectAll('line') 132 | .data(dots); 133 | var updateLine = function(lines) { 134 | lines 135 | .attr('x1', function(d) { return d.x; }) 136 | .attr('y1', function(d) { return d.y; }) 137 | .attr('x2', function(d) { return d.group.center.x; }) 138 | .attr('y2', function(d) { return d.group.center.y; }) 139 | .attr('stroke', function(d) { return d.group.color; }); 140 | }; 141 | updateLine(l.enter().append('line')); 142 | updateLine(l.transition().duration(500)); 143 | l.exit().remove(); 144 | } else { 145 | lineg.selectAll('line').remove(); 146 | } 147 | 148 | var c = centerg.selectAll('path') 149 | .data(groups); 150 | var updateCenters = function(centers) { 151 | centers 152 | .attr('transform', function(d) { return "translate(" + d.center.x + "," + d.center.y + ") rotate(45)";}) 153 | .attr('fill', function(d,i) { return d.color; }) 154 | .attr('stroke', '#aabbcc'); 155 | }; 156 | c.exit().remove(); 157 | updateCenters(c.enter() 158 | .append('path') 159 | .attr('d', d3.svg.symbol().type('cross')) 160 | .attr('stroke', '#aabbcc')); 161 | updateCenters(c 162 | .transition() 163 | .duration(500));} 164 | 165 | function moveCenter() { 166 | groups.forEach(function(group, i) { 167 | if (group.dots.length == 0) return; 168 | 169 | // get center of gravity 170 | var x = 0, y = 0; 171 | group.dots.forEach(function(dot) { 172 | x += dot.x; 173 | y += dot.y; 174 | }); 175 | 176 | group.center = { 177 | x: x / group.dots.length, 178 | y: y / group.dots.length 179 | }; 180 | }); 181 | 182 | } 183 | 184 | function updateGroups() { 185 | groups.forEach(function(g) { g.dots = []; }); 186 | dots.forEach(function(dot) { 187 | // find the nearest group 188 | var min = Infinity; 189 | var group; 190 | groups.forEach(function(g) { 191 | var d = Math.pow(g.center.x - dot.x, 2) + Math.pow(g.center.y - dot.y, 2); 192 | if (d < min) { 193 | min = d; 194 | group = g; 195 | } 196 | }); 197 | 198 | // update group 199 | group.dots.push(dot); 200 | dot.group = group; 201 | }); 202 | } 203 | 204 | init(); draw(); 205 | -------------------------------------------------------------------------------- /7 - Practical Methodologies: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /7---practical-methodologies/CV: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2016 Stanford Crowd Course Initiative 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 
14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # datascience 2 | 3 | [![Binder](http://mybinder.org/badge.svg)](http://mybinder.org/repo/crowd-course/datascience) 4 | 5 | * Add Jupyter notebooks to the corresponding folder. 6 | * To maintain the order, create one notebook per module. 7 | e.g. 2-1-this-is-module-1-for-chapter-2, 3-3-this-is-module-3-for-chapter-3. 8 | * If you need to add extra info, add it to the README file inside each folder. 9 | * Use scikit-learn, numpy and Pandas only. 10 | -------------------------------------------------------------------------------- /datasets/Advertising.csv: -------------------------------------------------------------------------------- 1 | "","TV","Radio","Newspaper","Sales" 2 | "1",230.1,37.8,69.2,22.1 3 | "2",44.5,39.3,45.1,10.4 4 | "3",17.2,45.9,69.3,9.3 5 | "4",151.5,41.3,58.5,18.5 6 | "5",180.8,10.8,58.4,12.9 7 | "6",8.7,48.9,75,7.2 8 | "7",57.5,32.8,23.5,11.8 9 | "8",120.2,19.6,11.6,13.2 10 | "9",8.6,2.1,1,4.8 11 | "10",199.8,2.6,21.2,10.6 12 | "11",66.1,5.8,24.2,8.6 13 | "12",214.7,24,4,17.4 14 | "13",23.8,35.1,65.9,9.2 15 | "14",97.5,7.6,7.2,9.7 16 | "15",204.1,32.9,46,19 17 | "16",195.4,47.7,52.9,22.4 18 | "17",67.8,36.6,114,12.5 19 | "18",281.4,39.6,55.8,24.4 20 | "19",69.2,20.5,18.3,11.3 21 | "20",147.3,23.9,19.1,14.6 22 | "21",218.4,27.7,53.4,18 23 | "22",237.4,5.1,23.5,12.5 24 | "23",13.2,15.9,49.6,5.6 25 | "24",228.3,16.9,26.2,15.5 26 | "25",62.3,12.6,18.3,9.7 27 | "26",262.9,3.5,19.5,12 28 | "27",142.9,29.3,12.6,15 29 | "28",240.1,16.7,22.9,15.9 30 | "29",248.8,27.1,22.9,18.9 31 | "30",70.6,16,40.8,10.5 32 | "31",292.9,28.3,43.2,21.4 33 | "32",112.9,17.4,38.6,11.9 34 | "33",97.2,1.5,30,9.6 35 | "34",265.6,20,0.3,17.4 36 | "35",95.7,1.4,7.4,9.5 37 | "36",290.7,4.1,8.5,12.8 38 | "37",266.9,43.8,5,25.4 39 | "38",74.7,49.4,45.7,14.7 40 | "39",43.1,26.7,35.1,10.1 41 | "40",228,37.7,32,21.5 42 | "41",202.5,22.3,31.6,16.6 43 | "42",177,33.4,38.7,17.1 44 | "43",293.6,27.7,1.8,20.7 45 | "44",206.9,8.4,26.4,12.9 46 | "45",25.1,25.7,43.3,8.5 47 | "46",175.1,22.5,31.5,14.9 48 | "47",89.7,9.9,35.7,10.6 49 | "48",239.9,41.5,18.5,23.2 50 | "49",227.2,15.8,49.9,14.8 51 | "50",66.9,11.7,36.8,9.7 52 | "51",199.8,3.1,34.6,11.4 53 | "52",100.4,9.6,3.6,10.7 54 | "53",216.4,41.7,39.6,22.6 55 | "54",182.6,46.2,58.7,21.2 56 | "55",262.7,28.8,15.9,20.2 57 | "56",198.9,49.4,60,23.7 58 | "57",7.3,28.1,41.4,5.5 59 | "58",136.2,19.2,16.6,13.2 60 | "59",210.8,49.6,37.7,23.8 61 | "60",210.7,29.5,9.3,18.4 62 | "61",53.5,2,21.4,8.1 63 | "62",261.3,42.7,54.7,24.2 64 | "63",239.3,15.5,27.3,15.7 65 | "64",102.7,29.6,8.4,14 66 | "65",131.1,42.8,28.9,18 67 | "66",69,9.3,0.9,9.3 68 | "67",31.5,24.6,2.2,9.5 69 | "68",139.3,14.5,10.2,13.4 70 | "69",237.4,27.5,11,18.9 71 | "70",216.8,43.9,27.2,22.3 72 | "71",199.1,30.6,38.7,18.3 73 | "72",109.8,14.3,31.7,12.4 74 | "73",26.8,33,19.3,8.8 75 | "74",129.4,5.7,31.3,11 76 | "75",213.4,24.6,13.1,17 77 | 
"76",16.9,43.7,89.4,8.7 78 | "77",27.5,1.6,20.7,6.9 79 | "78",120.5,28.5,14.2,14.2 80 | "79",5.4,29.9,9.4,5.3 81 | "80",116,7.7,23.1,11 82 | "81",76.4,26.7,22.3,11.8 83 | "82",239.8,4.1,36.9,12.3 84 | "83",75.3,20.3,32.5,11.3 85 | "84",68.4,44.5,35.6,13.6 86 | "85",213.5,43,33.8,21.7 87 | "86",193.2,18.4,65.7,15.2 88 | "87",76.3,27.5,16,12 89 | "88",110.7,40.6,63.2,16 90 | "89",88.3,25.5,73.4,12.9 91 | "90",109.8,47.8,51.4,16.7 92 | "91",134.3,4.9,9.3,11.2 93 | "92",28.6,1.5,33,7.3 94 | "93",217.7,33.5,59,19.4 95 | "94",250.9,36.5,72.3,22.2 96 | "95",107.4,14,10.9,11.5 97 | "96",163.3,31.6,52.9,16.9 98 | "97",197.6,3.5,5.9,11.7 99 | "98",184.9,21,22,15.5 100 | "99",289.7,42.3,51.2,25.4 101 | "100",135.2,41.7,45.9,17.2 102 | "101",222.4,4.3,49.8,11.7 103 | "102",296.4,36.3,100.9,23.8 104 | "103",280.2,10.1,21.4,14.8 105 | "104",187.9,17.2,17.9,14.7 106 | "105",238.2,34.3,5.3,20.7 107 | "106",137.9,46.4,59,19.2 108 | "107",25,11,29.7,7.2 109 | "108",90.4,0.3,23.2,8.7 110 | "109",13.1,0.4,25.6,5.3 111 | "110",255.4,26.9,5.5,19.8 112 | "111",225.8,8.2,56.5,13.4 113 | "112",241.7,38,23.2,21.8 114 | "113",175.7,15.4,2.4,14.1 115 | "114",209.6,20.6,10.7,15.9 116 | "115",78.2,46.8,34.5,14.6 117 | "116",75.1,35,52.7,12.6 118 | "117",139.2,14.3,25.6,12.2 119 | "118",76.4,0.8,14.8,9.4 120 | "119",125.7,36.9,79.2,15.9 121 | "120",19.4,16,22.3,6.6 122 | "121",141.3,26.8,46.2,15.5 123 | "122",18.8,21.7,50.4,7 124 | "123",224,2.4,15.6,11.6 125 | "124",123.1,34.6,12.4,15.2 126 | "125",229.5,32.3,74.2,19.7 127 | "126",87.2,11.8,25.9,10.6 128 | "127",7.8,38.9,50.6,6.6 129 | "128",80.2,0,9.2,8.8 130 | "129",220.3,49,3.2,24.7 131 | "130",59.6,12,43.1,9.7 132 | "131",0.7,39.6,8.7,1.6 133 | "132",265.2,2.9,43,12.7 134 | "133",8.4,27.2,2.1,5.7 135 | "134",219.8,33.5,45.1,19.6 136 | "135",36.9,38.6,65.6,10.8 137 | "136",48.3,47,8.5,11.6 138 | "137",25.6,39,9.3,9.5 139 | "138",273.7,28.9,59.7,20.8 140 | "139",43,25.9,20.5,9.6 141 | "140",184.9,43.9,1.7,20.7 142 | "141",73.4,17,12.9,10.9 143 | "142",193.7,35.4,75.6,19.2 144 | "143",220.5,33.2,37.9,20.1 145 | "144",104.6,5.7,34.4,10.4 146 | "145",96.2,14.8,38.9,11.4 147 | "146",140.3,1.9,9,10.3 148 | "147",240.1,7.3,8.7,13.2 149 | "148",243.2,49,44.3,25.4 150 | "149",38,40.3,11.9,10.9 151 | "150",44.7,25.8,20.6,10.1 152 | "151",280.7,13.9,37,16.1 153 | "152",121,8.4,48.7,11.6 154 | "153",197.6,23.3,14.2,16.6 155 | "154",171.3,39.7,37.7,19 156 | "155",187.8,21.1,9.5,15.6 157 | "156",4.1,11.6,5.7,3.2 158 | "157",93.9,43.5,50.5,15.3 159 | "158",149.8,1.3,24.3,10.1 160 | "159",11.7,36.9,45.2,7.3 161 | "160",131.7,18.4,34.6,12.9 162 | "161",172.5,18.1,30.7,14.4 163 | "162",85.7,35.8,49.3,13.3 164 | "163",188.4,18.1,25.6,14.9 165 | "164",163.5,36.8,7.4,18 166 | "165",117.2,14.7,5.4,11.9 167 | "166",234.5,3.4,84.8,11.9 168 | "167",17.9,37.6,21.6,8 169 | "168",206.8,5.2,19.4,12.2 170 | "169",215.4,23.6,57.6,17.1 171 | "170",284.3,10.6,6.4,15 172 | "171",50,11.6,18.4,8.4 173 | "172",164.5,20.9,47.4,14.5 174 | "173",19.6,20.1,17,7.6 175 | "174",168.4,7.1,12.8,11.7 176 | "175",222.4,3.4,13.1,11.5 177 | "176",276.9,48.9,41.8,27 178 | "177",248.4,30.2,20.3,20.2 179 | "178",170.2,7.8,35.2,11.7 180 | "179",276.7,2.3,23.7,11.8 181 | "180",165.6,10,17.6,12.6 182 | "181",156.6,2.6,8.3,10.5 183 | "182",218.5,5.4,27.4,12.2 184 | "183",56.2,5.7,29.7,8.7 185 | "184",287.6,43,71.8,26.2 186 | "185",253.8,21.3,30,17.6 187 | "186",205,45.1,19.6,22.6 188 | "187",139.5,2.1,26.6,10.3 189 | "188",191.1,28.7,18.2,17.3 190 | "189",286,13.9,3.7,15.9 191 | "190",18.7,12.1,23.4,6.7 192 | 
"191",39.5,41.1,5.8,10.8 193 | "192",75.5,10.8,6,9.9 194 | "193",17.2,4.1,31.6,5.9 195 | "194",166.8,42,3.6,19.6 196 | "195",149.7,35.6,6,17.3 197 | "196",38.2,3.7,13.8,7.6 198 | "197",94.2,4.9,8.1,9.7 199 | "198",177,9.3,6.4,12.8 200 | "199",283.6,42,66.2,25.5 201 | "200",232.1,8.6,8.7,13.4 202 | -------------------------------------------------------------------------------- /datasets/breast-cancer-wisconson.csv: -------------------------------------------------------------------------------- 1 | ID,CT,UCS,UCSh,MA,SECS,BN,BC,NN,M,Class 2 | 1000025,5,1,1,1,2,1,3,1,1,2 3 | 1002945,5,4,4,5,7,10,3,2,1,2 4 | 1015425,3,1,1,1,2,2,3,1,1,2 5 | 1016277,6,8,8,1,3,4,3,7,1,2 6 | 1017023,4,1,1,3,2,1,3,1,1,2 7 | 1017122,8,10,10,8,7,10,9,7,1,4 8 | 1018099,1,1,1,1,2,10,3,1,1,2 9 | 1018561,2,1,2,1,2,1,3,1,1,2 10 | 1033078,2,1,1,1,2,1,1,1,5,2 11 | 1033078,4,2,1,1,2,1,2,1,1,2 12 | 1035283,1,1,1,1,1,1,3,1,1,2 13 | 1036172,2,1,1,1,2,1,2,1,1,2 14 | 1041801,5,3,3,3,2,3,4,4,1,4 15 | 1043999,1,1,1,1,2,3,3,1,1,2 16 | 1044572,8,7,5,10,7,9,5,5,4,4 17 | 1047630,7,4,6,4,6,1,4,3,1,4 18 | 1048672,4,1,1,1,2,1,2,1,1,2 19 | 1049815,4,1,1,1,2,1,3,1,1,2 20 | 1050670,10,7,7,6,4,10,4,1,2,4 21 | 1050718,6,1,1,1,2,1,3,1,1,2 22 | 1054590,7,3,2,10,5,10,5,4,4,4 23 | 1054593,10,5,5,3,6,7,7,10,1,4 24 | 1056784,3,1,1,1,2,1,2,1,1,2 25 | 1057013,8,4,5,1,2,?,7,3,1,4 26 | 1059552,1,1,1,1,2,1,3,1,1,2 27 | 1065726,5,2,3,4,2,7,3,6,1,4 28 | 1066373,3,2,1,1,1,1,2,1,1,2 29 | 1066979,5,1,1,1,2,1,2,1,1,2 30 | 1067444,2,1,1,1,2,1,2,1,1,2 31 | 1070935,1,1,3,1,2,1,1,1,1,2 32 | 1070935,3,1,1,1,1,1,2,1,1,2 33 | 1071760,2,1,1,1,2,1,3,1,1,2 34 | 1072179,10,7,7,3,8,5,7,4,3,4 35 | 1074610,2,1,1,2,2,1,3,1,1,2 36 | 1075123,3,1,2,1,2,1,2,1,1,2 37 | 1079304,2,1,1,1,2,1,2,1,1,2 38 | 1080185,10,10,10,8,6,1,8,9,1,4 39 | 1081791,6,2,1,1,1,1,7,1,1,2 40 | 1084584,5,4,4,9,2,10,5,6,1,4 41 | 1091262,2,5,3,3,6,7,7,5,1,4 42 | 1096800,6,6,6,9,6,?,7,8,1,2 43 | 1099510,10,4,3,1,3,3,6,5,2,4 44 | 1100524,6,10,10,2,8,10,7,3,3,4 45 | 1102573,5,6,5,6,10,1,3,1,1,4 46 | 1103608,10,10,10,4,8,1,8,10,1,4 47 | 1103722,1,1,1,1,2,1,2,1,2,2 48 | 1105257,3,7,7,4,4,9,4,8,1,4 49 | 1105524,1,1,1,1,2,1,2,1,1,2 50 | 1106095,4,1,1,3,2,1,3,1,1,2 51 | 1106829,7,8,7,2,4,8,3,8,2,4 52 | 1108370,9,5,8,1,2,3,2,1,5,4 53 | 1108449,5,3,3,4,2,4,3,4,1,4 54 | 1110102,10,3,6,2,3,5,4,10,2,4 55 | 1110503,5,5,5,8,10,8,7,3,7,4 56 | 1110524,10,5,5,6,8,8,7,1,1,4 57 | 1111249,10,6,6,3,4,5,3,6,1,4 58 | 1112209,8,10,10,1,3,6,3,9,1,4 59 | 1113038,8,2,4,1,5,1,5,4,4,4 60 | 1113483,5,2,3,1,6,10,5,1,1,4 61 | 1113906,9,5,5,2,2,2,5,1,1,4 62 | 1115282,5,3,5,5,3,3,4,10,1,4 63 | 1115293,1,1,1,1,2,2,2,1,1,2 64 | 1116116,9,10,10,1,10,8,3,3,1,4 65 | 1116132,6,3,4,1,5,2,3,9,1,4 66 | 1116192,1,1,1,1,2,1,2,1,1,2 67 | 1116998,10,4,2,1,3,2,4,3,10,4 68 | 1117152,4,1,1,1,2,1,3,1,1,2 69 | 1118039,5,3,4,1,8,10,4,9,1,4 70 | 1120559,8,3,8,3,4,9,8,9,8,4 71 | 1121732,1,1,1,1,2,1,3,2,1,2 72 | 1121919,5,1,3,1,2,1,2,1,1,2 73 | 1123061,6,10,2,8,10,2,7,8,10,4 74 | 1124651,1,3,3,2,2,1,7,2,1,2 75 | 1125035,9,4,5,10,6,10,4,8,1,4 76 | 1126417,10,6,4,1,3,4,3,2,3,4 77 | 1131294,1,1,2,1,2,2,4,2,1,2 78 | 1132347,1,1,4,1,2,1,2,1,1,2 79 | 1133041,5,3,1,2,2,1,2,1,1,2 80 | 1133136,3,1,1,1,2,3,3,1,1,2 81 | 1136142,2,1,1,1,3,1,2,1,1,2 82 | 1137156,2,2,2,1,1,1,7,1,1,2 83 | 1143978,4,1,1,2,2,1,2,1,1,2 84 | 1143978,5,2,1,1,2,1,3,1,1,2 85 | 1147044,3,1,1,1,2,2,7,1,1,2 86 | 1147699,3,5,7,8,8,9,7,10,7,4 87 | 1147748,5,10,6,1,10,4,4,10,10,4 88 | 1148278,3,3,6,4,5,8,4,4,1,4 89 | 1148873,3,6,6,6,5,10,6,8,3,4 90 | 1152331,4,1,1,1,2,1,3,1,1,2 91 | 
1155546,2,1,1,2,3,1,2,1,1,2 92 | 1156272,1,1,1,1,2,1,3,1,1,2 93 | 1156948,3,1,1,2,2,1,1,1,1,2 94 | 1157734,4,1,1,1,2,1,3,1,1,2 95 | 1158247,1,1,1,1,2,1,2,1,1,2 96 | 1160476,2,1,1,1,2,1,3,1,1,2 97 | 1164066,1,1,1,1,2,1,3,1,1,2 98 | 1165297,2,1,1,2,2,1,1,1,1,2 99 | 1165790,5,1,1,1,2,1,3,1,1,2 100 | 1165926,9,6,9,2,10,6,2,9,10,4 101 | 1166630,7,5,6,10,5,10,7,9,4,4 102 | 1166654,10,3,5,1,10,5,3,10,2,4 103 | 1167439,2,3,4,4,2,5,2,5,1,4 104 | 1167471,4,1,2,1,2,1,3,1,1,2 105 | 1168359,8,2,3,1,6,3,7,1,1,4 106 | 1168736,10,10,10,10,10,1,8,8,8,4 107 | 1169049,7,3,4,4,3,3,3,2,7,4 108 | 1170419,10,10,10,8,2,10,4,1,1,4 109 | 1170420,1,6,8,10,8,10,5,7,1,4 110 | 1171710,1,1,1,1,2,1,2,3,1,2 111 | 1171710,6,5,4,4,3,9,7,8,3,4 112 | 1171795,1,3,1,2,2,2,5,3,2,2 113 | 1171845,8,6,4,3,5,9,3,1,1,4 114 | 1172152,10,3,3,10,2,10,7,3,3,4 115 | 1173216,10,10,10,3,10,8,8,1,1,4 116 | 1173235,3,3,2,1,2,3,3,1,1,2 117 | 1173347,1,1,1,1,2,5,1,1,1,2 118 | 1173347,8,3,3,1,2,2,3,2,1,2 119 | 1173509,4,5,5,10,4,10,7,5,8,4 120 | 1173514,1,1,1,1,4,3,1,1,1,2 121 | 1173681,3,2,1,1,2,2,3,1,1,2 122 | 1174057,1,1,2,2,2,1,3,1,1,2 123 | 1174057,4,2,1,1,2,2,3,1,1,2 124 | 1174131,10,10,10,2,10,10,5,3,3,4 125 | 1174428,5,3,5,1,8,10,5,3,1,4 126 | 1175937,5,4,6,7,9,7,8,10,1,4 127 | 1176406,1,1,1,1,2,1,2,1,1,2 128 | 1176881,7,5,3,7,4,10,7,5,5,4 129 | 1177027,3,1,1,1,2,1,3,1,1,2 130 | 1177399,8,3,5,4,5,10,1,6,2,4 131 | 1177512,1,1,1,1,10,1,1,1,1,2 132 | 1178580,5,1,3,1,2,1,2,1,1,2 133 | 1179818,2,1,1,1,2,1,3,1,1,2 134 | 1180194,5,10,8,10,8,10,3,6,3,4 135 | 1180523,3,1,1,1,2,1,2,2,1,2 136 | 1180831,3,1,1,1,3,1,2,1,1,2 137 | 1181356,5,1,1,1,2,2,3,3,1,2 138 | 1182404,4,1,1,1,2,1,2,1,1,2 139 | 1182410,3,1,1,1,2,1,1,1,1,2 140 | 1183240,4,1,2,1,2,1,2,1,1,2 141 | 1183246,1,1,1,1,1,?,2,1,1,2 142 | 1183516,3,1,1,1,2,1,1,1,1,2 143 | 1183911,2,1,1,1,2,1,1,1,1,2 144 | 1183983,9,5,5,4,4,5,4,3,3,4 145 | 1184184,1,1,1,1,2,5,1,1,1,2 146 | 1184241,2,1,1,1,2,1,2,1,1,2 147 | 1184840,1,1,3,1,2,?,2,1,1,2 148 | 1185609,3,4,5,2,6,8,4,1,1,4 149 | 1185610,1,1,1,1,3,2,2,1,1,2 150 | 1187457,3,1,1,3,8,1,5,8,1,2 151 | 1187805,8,8,7,4,10,10,7,8,7,4 152 | 1188472,1,1,1,1,1,1,3,1,1,2 153 | 1189266,7,2,4,1,6,10,5,4,3,4 154 | 1189286,10,10,8,6,4,5,8,10,1,4 155 | 1190394,4,1,1,1,2,3,1,1,1,2 156 | 1190485,1,1,1,1,2,1,1,1,1,2 157 | 1192325,5,5,5,6,3,10,3,1,1,4 158 | 1193091,1,2,2,1,2,1,2,1,1,2 159 | 1193210,2,1,1,1,2,1,3,1,1,2 160 | 1193683,1,1,2,1,3,?,1,1,1,2 161 | 1196295,9,9,10,3,6,10,7,10,6,4 162 | 1196915,10,7,7,4,5,10,5,7,2,4 163 | 1197080,4,1,1,1,2,1,3,2,1,2 164 | 1197270,3,1,1,1,2,1,3,1,1,2 165 | 1197440,1,1,1,2,1,3,1,1,7,2 166 | 1197510,5,1,1,1,2,?,3,1,1,2 167 | 1197979,4,1,1,1,2,2,3,2,1,2 168 | 1197993,5,6,7,8,8,10,3,10,3,4 169 | 1198128,10,8,10,10,6,1,3,1,10,4 170 | 1198641,3,1,1,1,2,1,3,1,1,2 171 | 1199219,1,1,1,2,1,1,1,1,1,2 172 | 1199731,3,1,1,1,2,1,1,1,1,2 173 | 1199983,1,1,1,1,2,1,3,1,1,2 174 | 1200772,1,1,1,1,2,1,2,1,1,2 175 | 1200847,6,10,10,10,8,10,10,10,7,4 176 | 1200892,8,6,5,4,3,10,6,1,1,4 177 | 1200952,5,8,7,7,10,10,5,7,1,4 178 | 1201834,2,1,1,1,2,1,3,1,1,2 179 | 1201936,5,10,10,3,8,1,5,10,3,4 180 | 1202125,4,1,1,1,2,1,3,1,1,2 181 | 1202812,5,3,3,3,6,10,3,1,1,4 182 | 1203096,1,1,1,1,1,1,3,1,1,2 183 | 1204242,1,1,1,1,2,1,1,1,1,2 184 | 1204898,6,1,1,1,2,1,3,1,1,2 185 | 1205138,5,8,8,8,5,10,7,8,1,4 186 | 1205579,8,7,6,4,4,10,5,1,1,4 187 | 1206089,2,1,1,1,1,1,3,1,1,2 188 | 1206695,1,5,8,6,5,8,7,10,1,4 189 | 1206841,10,5,6,10,6,10,7,7,10,4 190 | 1207986,5,8,4,10,5,8,9,10,1,4 191 | 1208301,1,2,3,1,2,1,3,1,1,2 192 | 1210963,10,10,10,8,6,8,7,10,1,4 193 | 
1211202,7,5,10,10,10,10,4,10,3,4 194 | 1212232,5,1,1,1,2,1,2,1,1,2 195 | 1212251,1,1,1,1,2,1,3,1,1,2 196 | 1212422,3,1,1,1,2,1,3,1,1,2 197 | 1212422,4,1,1,1,2,1,3,1,1,2 198 | 1213375,8,4,4,5,4,7,7,8,2,2 199 | 1213383,5,1,1,4,2,1,3,1,1,2 200 | 1214092,1,1,1,1,2,1,1,1,1,2 201 | 1214556,3,1,1,1,2,1,2,1,1,2 202 | 1214966,9,7,7,5,5,10,7,8,3,4 203 | 1216694,10,8,8,4,10,10,8,1,1,4 204 | 1216947,1,1,1,1,2,1,3,1,1,2 205 | 1217051,5,1,1,1,2,1,3,1,1,2 206 | 1217264,1,1,1,1,2,1,3,1,1,2 207 | 1218105,5,10,10,9,6,10,7,10,5,4 208 | 1218741,10,10,9,3,7,5,3,5,1,4 209 | 1218860,1,1,1,1,1,1,3,1,1,2 210 | 1218860,1,1,1,1,1,1,3,1,1,2 211 | 1219406,5,1,1,1,1,1,3,1,1,2 212 | 1219525,8,10,10,10,5,10,8,10,6,4 213 | 1219859,8,10,8,8,4,8,7,7,1,4 214 | 1220330,1,1,1,1,2,1,3,1,1,2 215 | 1221863,10,10,10,10,7,10,7,10,4,4 216 | 1222047,10,10,10,10,3,10,10,6,1,4 217 | 1222936,8,7,8,7,5,5,5,10,2,4 218 | 1223282,1,1,1,1,2,1,2,1,1,2 219 | 1223426,1,1,1,1,2,1,3,1,1,2 220 | 1223793,6,10,7,7,6,4,8,10,2,4 221 | 1223967,6,1,3,1,2,1,3,1,1,2 222 | 1224329,1,1,1,2,2,1,3,1,1,2 223 | 1225799,10,6,4,3,10,10,9,10,1,4 224 | 1226012,4,1,1,3,1,5,2,1,1,4 225 | 1226612,7,5,6,3,3,8,7,4,1,4 226 | 1227210,10,5,5,6,3,10,7,9,2,4 227 | 1227244,1,1,1,1,2,1,2,1,1,2 228 | 1227481,10,5,7,4,4,10,8,9,1,4 229 | 1228152,8,9,9,5,3,5,7,7,1,4 230 | 1228311,1,1,1,1,1,1,3,1,1,2 231 | 1230175,10,10,10,3,10,10,9,10,1,4 232 | 1230688,7,4,7,4,3,7,7,6,1,4 233 | 1231387,6,8,7,5,6,8,8,9,2,4 234 | 1231706,8,4,6,3,3,1,4,3,1,2 235 | 1232225,10,4,5,5,5,10,4,1,1,4 236 | 1236043,3,3,2,1,3,1,3,6,1,2 237 | 1241232,3,1,4,1,2,?,3,1,1,2 238 | 1241559,10,8,8,2,8,10,4,8,10,4 239 | 1241679,9,8,8,5,6,2,4,10,4,4 240 | 1242364,8,10,10,8,6,9,3,10,10,4 241 | 1243256,10,4,3,2,3,10,5,3,2,4 242 | 1270479,5,1,3,3,2,2,2,3,1,2 243 | 1276091,3,1,1,3,1,1,3,1,1,2 244 | 1277018,2,1,1,1,2,1,3,1,1,2 245 | 128059,1,1,1,1,2,5,5,1,1,2 246 | 1285531,1,1,1,1,2,1,3,1,1,2 247 | 1287775,5,1,1,2,2,2,3,1,1,2 248 | 144888,8,10,10,8,5,10,7,8,1,4 249 | 145447,8,4,4,1,2,9,3,3,1,4 250 | 167528,4,1,1,1,2,1,3,6,1,2 251 | 169356,3,1,1,1,2,?,3,1,1,2 252 | 183913,1,2,2,1,2,1,1,1,1,2 253 | 191250,10,4,4,10,2,10,5,3,3,4 254 | 1017023,6,3,3,5,3,10,3,5,3,2 255 | 1100524,6,10,10,2,8,10,7,3,3,4 256 | 1116116,9,10,10,1,10,8,3,3,1,4 257 | 1168736,5,6,6,2,4,10,3,6,1,4 258 | 1182404,3,1,1,1,2,1,1,1,1,2 259 | 1182404,3,1,1,1,2,1,2,1,1,2 260 | 1198641,3,1,1,1,2,1,3,1,1,2 261 | 242970,5,7,7,1,5,8,3,4,1,2 262 | 255644,10,5,8,10,3,10,5,1,3,4 263 | 263538,5,10,10,6,10,10,10,6,5,4 264 | 274137,8,8,9,4,5,10,7,8,1,4 265 | 303213,10,4,4,10,6,10,5,5,1,4 266 | 314428,7,9,4,10,10,3,5,3,3,4 267 | 1182404,5,1,4,1,2,1,3,2,1,2 268 | 1198641,10,10,6,3,3,10,4,3,2,4 269 | 320675,3,3,5,2,3,10,7,1,1,4 270 | 324427,10,8,8,2,3,4,8,7,8,4 271 | 385103,1,1,1,1,2,1,3,1,1,2 272 | 390840,8,4,7,1,3,10,3,9,2,4 273 | 411453,5,1,1,1,2,1,3,1,1,2 274 | 320675,3,3,5,2,3,10,7,1,1,4 275 | 428903,7,2,4,1,3,4,3,3,1,4 276 | 431495,3,1,1,1,2,1,3,2,1,2 277 | 432809,3,1,3,1,2,?,2,1,1,2 278 | 434518,3,1,1,1,2,1,2,1,1,2 279 | 452264,1,1,1,1,2,1,2,1,1,2 280 | 456282,1,1,1,1,2,1,3,1,1,2 281 | 476903,10,5,7,3,3,7,3,3,8,4 282 | 486283,3,1,1,1,2,1,3,1,1,2 283 | 486662,2,1,1,2,2,1,3,1,1,2 284 | 488173,1,4,3,10,4,10,5,6,1,4 285 | 492268,10,4,6,1,2,10,5,3,1,4 286 | 508234,7,4,5,10,2,10,3,8,2,4 287 | 527363,8,10,10,10,8,10,10,7,3,4 288 | 529329,10,10,10,10,10,10,4,10,10,4 289 | 535331,3,1,1,1,3,1,2,1,1,2 290 | 543558,6,1,3,1,4,5,5,10,1,4 291 | 555977,5,6,6,8,6,10,4,10,4,4 292 | 560680,1,1,1,1,2,1,1,1,1,2 293 | 561477,1,1,1,1,2,1,3,1,1,2 294 | 563649,8,8,8,1,2,?,6,10,1,4 295 | 
601265,10,4,4,6,2,10,2,3,1,4 296 | 606140,1,1,1,1,2,?,2,1,1,2 297 | 606722,5,5,7,8,6,10,7,4,1,4 298 | 616240,5,3,4,3,4,5,4,7,1,2 299 | 61634,5,4,3,1,2,?,2,3,1,2 300 | 625201,8,2,1,1,5,1,1,1,1,2 301 | 63375,9,1,2,6,4,10,7,7,2,4 302 | 635844,8,4,10,5,4,4,7,10,1,4 303 | 636130,1,1,1,1,2,1,3,1,1,2 304 | 640744,10,10,10,7,9,10,7,10,10,4 305 | 646904,1,1,1,1,2,1,3,1,1,2 306 | 653777,8,3,4,9,3,10,3,3,1,4 307 | 659642,10,8,4,4,4,10,3,10,4,4 308 | 666090,1,1,1,1,2,1,3,1,1,2 309 | 666942,1,1,1,1,2,1,3,1,1,2 310 | 667204,7,8,7,6,4,3,8,8,4,4 311 | 673637,3,1,1,1,2,5,5,1,1,2 312 | 684955,2,1,1,1,3,1,2,1,1,2 313 | 688033,1,1,1,1,2,1,1,1,1,2 314 | 691628,8,6,4,10,10,1,3,5,1,4 315 | 693702,1,1,1,1,2,1,1,1,1,2 316 | 704097,1,1,1,1,1,1,2,1,1,2 317 | 704168,4,6,5,6,7,?,4,9,1,2 318 | 706426,5,5,5,2,5,10,4,3,1,4 319 | 709287,6,8,7,8,6,8,8,9,1,4 320 | 718641,1,1,1,1,5,1,3,1,1,2 321 | 721482,4,4,4,4,6,5,7,3,1,2 322 | 730881,7,6,3,2,5,10,7,4,6,4 323 | 733639,3,1,1,1,2,?,3,1,1,2 324 | 733639,3,1,1,1,2,1,3,1,1,2 325 | 733823,5,4,6,10,2,10,4,1,1,4 326 | 740492,1,1,1,1,2,1,3,1,1,2 327 | 743348,3,2,2,1,2,1,2,3,1,2 328 | 752904,10,1,1,1,2,10,5,4,1,4 329 | 756136,1,1,1,1,2,1,2,1,1,2 330 | 760001,8,10,3,2,6,4,3,10,1,4 331 | 760239,10,4,6,4,5,10,7,1,1,4 332 | 76389,10,4,7,2,2,8,6,1,1,4 333 | 764974,5,1,1,1,2,1,3,1,2,2 334 | 770066,5,2,2,2,2,1,2,2,1,2 335 | 785208,5,4,6,6,4,10,4,3,1,4 336 | 785615,8,6,7,3,3,10,3,4,2,4 337 | 792744,1,1,1,1,2,1,1,1,1,2 338 | 797327,6,5,5,8,4,10,3,4,1,4 339 | 798429,1,1,1,1,2,1,3,1,1,2 340 | 704097,1,1,1,1,1,1,2,1,1,2 341 | 806423,8,5,5,5,2,10,4,3,1,4 342 | 809912,10,3,3,1,2,10,7,6,1,4 343 | 810104,1,1,1,1,2,1,3,1,1,2 344 | 814265,2,1,1,1,2,1,1,1,1,2 345 | 814911,1,1,1,1,2,1,1,1,1,2 346 | 822829,7,6,4,8,10,10,9,5,3,4 347 | 826923,1,1,1,1,2,1,1,1,1,2 348 | 830690,5,2,2,2,3,1,1,3,1,2 349 | 831268,1,1,1,1,1,1,1,3,1,2 350 | 832226,3,4,4,10,5,1,3,3,1,4 351 | 832567,4,2,3,5,3,8,7,6,1,4 352 | 836433,5,1,1,3,2,1,1,1,1,2 353 | 837082,2,1,1,1,2,1,3,1,1,2 354 | 846832,3,4,5,3,7,3,4,6,1,2 355 | 850831,2,7,10,10,7,10,4,9,4,4 356 | 855524,1,1,1,1,2,1,2,1,1,2 357 | 857774,4,1,1,1,3,1,2,2,1,2 358 | 859164,5,3,3,1,3,3,3,3,3,4 359 | 859350,8,10,10,7,10,10,7,3,8,4 360 | 866325,8,10,5,3,8,4,4,10,3,4 361 | 873549,10,3,5,4,3,7,3,5,3,4 362 | 877291,6,10,10,10,10,10,8,10,10,4 363 | 877943,3,10,3,10,6,10,5,1,4,4 364 | 888169,3,2,2,1,4,3,2,1,1,2 365 | 888523,4,4,4,2,2,3,2,1,1,2 366 | 896404,2,1,1,1,2,1,3,1,1,2 367 | 897172,2,1,1,1,2,1,2,1,1,2 368 | 95719,6,10,10,10,8,10,7,10,7,4 369 | 160296,5,8,8,10,5,10,8,10,3,4 370 | 342245,1,1,3,1,2,1,1,1,1,2 371 | 428598,1,1,3,1,1,1,2,1,1,2 372 | 492561,4,3,2,1,3,1,2,1,1,2 373 | 493452,1,1,3,1,2,1,1,1,1,2 374 | 493452,4,1,2,1,2,1,2,1,1,2 375 | 521441,5,1,1,2,2,1,2,1,1,2 376 | 560680,3,1,2,1,2,1,2,1,1,2 377 | 636437,1,1,1,1,2,1,1,1,1,2 378 | 640712,1,1,1,1,2,1,2,1,1,2 379 | 654244,1,1,1,1,1,1,2,1,1,2 380 | 657753,3,1,1,4,3,1,2,2,1,2 381 | 685977,5,3,4,1,4,1,3,1,1,2 382 | 805448,1,1,1,1,2,1,1,1,1,2 383 | 846423,10,6,3,6,4,10,7,8,4,4 384 | 1002504,3,2,2,2,2,1,3,2,1,2 385 | 1022257,2,1,1,1,2,1,1,1,1,2 386 | 1026122,2,1,1,1,2,1,1,1,1,2 387 | 1071084,3,3,2,2,3,1,1,2,3,2 388 | 1080233,7,6,6,3,2,10,7,1,1,4 389 | 1114570,5,3,3,2,3,1,3,1,1,2 390 | 1114570,2,1,1,1,2,1,2,2,1,2 391 | 1116715,5,1,1,1,3,2,2,2,1,2 392 | 1131411,1,1,1,2,2,1,2,1,1,2 393 | 1151734,10,8,7,4,3,10,7,9,1,4 394 | 1156017,3,1,1,1,2,1,2,1,1,2 395 | 1158247,1,1,1,1,1,1,1,1,1,2 396 | 1158405,1,2,3,1,2,1,2,1,1,2 397 | 1168278,3,1,1,1,2,1,2,1,1,2 398 | 1176187,3,1,1,1,2,1,3,1,1,2 399 | 1196263,4,1,1,1,2,1,1,1,1,2 400 | 
1196475,3,2,1,1,2,1,2,2,1,2 401 | 1206314,1,2,3,1,2,1,1,1,1,2 402 | 1211265,3,10,8,7,6,9,9,3,8,4 403 | 1213784,3,1,1,1,2,1,1,1,1,2 404 | 1223003,5,3,3,1,2,1,2,1,1,2 405 | 1223306,3,1,1,1,2,4,1,1,1,2 406 | 1223543,1,2,1,3,2,1,1,2,1,2 407 | 1229929,1,1,1,1,2,1,2,1,1,2 408 | 1231853,4,2,2,1,2,1,2,1,1,2 409 | 1234554,1,1,1,1,2,1,2,1,1,2 410 | 1236837,2,3,2,2,2,2,3,1,1,2 411 | 1237674,3,1,2,1,2,1,2,1,1,2 412 | 1238021,1,1,1,1,2,1,2,1,1,2 413 | 1238464,1,1,1,1,1,?,2,1,1,2 414 | 1238633,10,10,10,6,8,4,8,5,1,4 415 | 1238915,5,1,2,1,2,1,3,1,1,2 416 | 1238948,8,5,6,2,3,10,6,6,1,4 417 | 1239232,3,3,2,6,3,3,3,5,1,2 418 | 1239347,8,7,8,5,10,10,7,2,1,4 419 | 1239967,1,1,1,1,2,1,2,1,1,2 420 | 1240337,5,2,2,2,2,2,3,2,2,2 421 | 1253505,2,3,1,1,5,1,1,1,1,2 422 | 1255384,3,2,2,3,2,3,3,1,1,2 423 | 1257200,10,10,10,7,10,10,8,2,1,4 424 | 1257648,4,3,3,1,2,1,3,3,1,2 425 | 1257815,5,1,3,1,2,1,2,1,1,2 426 | 1257938,3,1,1,1,2,1,1,1,1,2 427 | 1258549,9,10,10,10,10,10,10,10,1,4 428 | 1258556,5,3,6,1,2,1,1,1,1,2 429 | 1266154,8,7,8,2,4,2,5,10,1,4 430 | 1272039,1,1,1,1,2,1,2,1,1,2 431 | 1276091,2,1,1,1,2,1,2,1,1,2 432 | 1276091,1,3,1,1,2,1,2,2,1,2 433 | 1276091,5,1,1,3,4,1,3,2,1,2 434 | 1277629,5,1,1,1,2,1,2,2,1,2 435 | 1293439,3,2,2,3,2,1,1,1,1,2 436 | 1293439,6,9,7,5,5,8,4,2,1,2 437 | 1294562,10,8,10,1,3,10,5,1,1,4 438 | 1295186,10,10,10,1,6,1,2,8,1,4 439 | 527337,4,1,1,1,2,1,1,1,1,2 440 | 558538,4,1,3,3,2,1,1,1,1,2 441 | 566509,5,1,1,1,2,1,1,1,1,2 442 | 608157,10,4,3,10,4,10,10,1,1,4 443 | 677910,5,2,2,4,2,4,1,1,1,2 444 | 734111,1,1,1,3,2,3,1,1,1,2 445 | 734111,1,1,1,1,2,2,1,1,1,2 446 | 780555,5,1,1,6,3,1,2,1,1,2 447 | 827627,2,1,1,1,2,1,1,1,1,2 448 | 1049837,1,1,1,1,2,1,1,1,1,2 449 | 1058849,5,1,1,1,2,1,1,1,1,2 450 | 1182404,1,1,1,1,1,1,1,1,1,2 451 | 1193544,5,7,9,8,6,10,8,10,1,4 452 | 1201870,4,1,1,3,1,1,2,1,1,2 453 | 1202253,5,1,1,1,2,1,1,1,1,2 454 | 1227081,3,1,1,3,2,1,1,1,1,2 455 | 1230994,4,5,5,8,6,10,10,7,1,4 456 | 1238410,2,3,1,1,3,1,1,1,1,2 457 | 1246562,10,2,2,1,2,6,1,1,2,4 458 | 1257470,10,6,5,8,5,10,8,6,1,4 459 | 1259008,8,8,9,6,6,3,10,10,1,4 460 | 1266124,5,1,2,1,2,1,1,1,1,2 461 | 1267898,5,1,3,1,2,1,1,1,1,2 462 | 1268313,5,1,1,3,2,1,1,1,1,2 463 | 1268804,3,1,1,1,2,5,1,1,1,2 464 | 1276091,6,1,1,3,2,1,1,1,1,2 465 | 1280258,4,1,1,1,2,1,1,2,1,2 466 | 1293966,4,1,1,1,2,1,1,1,1,2 467 | 1296572,10,9,8,7,6,4,7,10,3,4 468 | 1298416,10,6,6,2,4,10,9,7,1,4 469 | 1299596,6,6,6,5,4,10,7,6,2,4 470 | 1105524,4,1,1,1,2,1,1,1,1,2 471 | 1181685,1,1,2,1,2,1,2,1,1,2 472 | 1211594,3,1,1,1,1,1,2,1,1,2 473 | 1238777,6,1,1,3,2,1,1,1,1,2 474 | 1257608,6,1,1,1,1,1,1,1,1,2 475 | 1269574,4,1,1,1,2,1,1,1,1,2 476 | 1277145,5,1,1,1,2,1,1,1,1,2 477 | 1287282,3,1,1,1,2,1,1,1,1,2 478 | 1296025,4,1,2,1,2,1,1,1,1,2 479 | 1296263,4,1,1,1,2,1,1,1,1,2 480 | 1296593,5,2,1,1,2,1,1,1,1,2 481 | 1299161,4,8,7,10,4,10,7,5,1,4 482 | 1301945,5,1,1,1,1,1,1,1,1,2 483 | 1302428,5,3,2,4,2,1,1,1,1,2 484 | 1318169,9,10,10,10,10,5,10,10,10,4 485 | 474162,8,7,8,5,5,10,9,10,1,4 486 | 787451,5,1,2,1,2,1,1,1,1,2 487 | 1002025,1,1,1,3,1,3,1,1,1,2 488 | 1070522,3,1,1,1,1,1,2,1,1,2 489 | 1073960,10,10,10,10,6,10,8,1,5,4 490 | 1076352,3,6,4,10,3,3,3,4,1,4 491 | 1084139,6,3,2,1,3,4,4,1,1,4 492 | 1115293,1,1,1,1,2,1,1,1,1,2 493 | 1119189,5,8,9,4,3,10,7,1,1,4 494 | 1133991,4,1,1,1,1,1,2,1,1,2 495 | 1142706,5,10,10,10,6,10,6,5,2,4 496 | 1155967,5,1,2,10,4,5,2,1,1,2 497 | 1170945,3,1,1,1,1,1,2,1,1,2 498 | 1181567,1,1,1,1,1,1,1,1,1,2 499 | 1182404,4,2,1,1,2,1,1,1,1,2 500 | 1204558,4,1,1,1,2,1,2,1,1,2 501 | 1217952,4,1,1,1,2,1,2,1,1,2 502 | 1224565,6,1,1,1,2,1,3,1,1,2 
503 | 1238186,4,1,1,1,2,1,2,1,1,2 504 | 1253917,4,1,1,2,2,1,2,1,1,2 505 | 1265899,4,1,1,1,2,1,3,1,1,2 506 | 1268766,1,1,1,1,2,1,1,1,1,2 507 | 1277268,3,3,1,1,2,1,1,1,1,2 508 | 1286943,8,10,10,10,7,5,4,8,7,4 509 | 1295508,1,1,1,1,2,4,1,1,1,2 510 | 1297327,5,1,1,1,2,1,1,1,1,2 511 | 1297522,2,1,1,1,2,1,1,1,1,2 512 | 1298360,1,1,1,1,2,1,1,1,1,2 513 | 1299924,5,1,1,1,2,1,2,1,1,2 514 | 1299994,5,1,1,1,2,1,1,1,1,2 515 | 1304595,3,1,1,1,1,1,2,1,1,2 516 | 1306282,6,6,7,10,3,10,8,10,2,4 517 | 1313325,4,10,4,7,3,10,9,10,1,4 518 | 1320077,1,1,1,1,1,1,1,1,1,2 519 | 1320077,1,1,1,1,1,1,2,1,1,2 520 | 1320304,3,1,2,2,2,1,1,1,1,2 521 | 1330439,4,7,8,3,4,10,9,1,1,4 522 | 333093,1,1,1,1,3,1,1,1,1,2 523 | 369565,4,1,1,1,3,1,1,1,1,2 524 | 412300,10,4,5,4,3,5,7,3,1,4 525 | 672113,7,5,6,10,4,10,5,3,1,4 526 | 749653,3,1,1,1,2,1,2,1,1,2 527 | 769612,3,1,1,2,2,1,1,1,1,2 528 | 769612,4,1,1,1,2,1,1,1,1,2 529 | 798429,4,1,1,1,2,1,3,1,1,2 530 | 807657,6,1,3,2,2,1,1,1,1,2 531 | 8233704,4,1,1,1,1,1,2,1,1,2 532 | 837480,7,4,4,3,4,10,6,9,1,4 533 | 867392,4,2,2,1,2,1,2,1,1,2 534 | 869828,1,1,1,1,1,1,3,1,1,2 535 | 1043068,3,1,1,1,2,1,2,1,1,2 536 | 1056171,2,1,1,1,2,1,2,1,1,2 537 | 1061990,1,1,3,2,2,1,3,1,1,2 538 | 1113061,5,1,1,1,2,1,3,1,1,2 539 | 1116192,5,1,2,1,2,1,3,1,1,2 540 | 1135090,4,1,1,1,2,1,2,1,1,2 541 | 1145420,6,1,1,1,2,1,2,1,1,2 542 | 1158157,5,1,1,1,2,2,2,1,1,2 543 | 1171578,3,1,1,1,2,1,1,1,1,2 544 | 1174841,5,3,1,1,2,1,1,1,1,2 545 | 1184586,4,1,1,1,2,1,2,1,1,2 546 | 1186936,2,1,3,2,2,1,2,1,1,2 547 | 1197527,5,1,1,1,2,1,2,1,1,2 548 | 1222464,6,10,10,10,4,10,7,10,1,4 549 | 1240603,2,1,1,1,1,1,1,1,1,2 550 | 1240603,3,1,1,1,1,1,1,1,1,2 551 | 1241035,7,8,3,7,4,5,7,8,2,4 552 | 1287971,3,1,1,1,2,1,2,1,1,2 553 | 1289391,1,1,1,1,2,1,3,1,1,2 554 | 1299924,3,2,2,2,2,1,4,2,1,2 555 | 1306339,4,4,2,1,2,5,2,1,2,2 556 | 1313658,3,1,1,1,2,1,1,1,1,2 557 | 1313982,4,3,1,1,2,1,4,8,1,2 558 | 1321264,5,2,2,2,1,1,2,1,1,2 559 | 1321321,5,1,1,3,2,1,1,1,1,2 560 | 1321348,2,1,1,1,2,1,2,1,1,2 561 | 1321931,5,1,1,1,2,1,2,1,1,2 562 | 1321942,5,1,1,1,2,1,3,1,1,2 563 | 1321942,5,1,1,1,2,1,3,1,1,2 564 | 1328331,1,1,1,1,2,1,3,1,1,2 565 | 1328755,3,1,1,1,2,1,2,1,1,2 566 | 1331405,4,1,1,1,2,1,3,2,1,2 567 | 1331412,5,7,10,10,5,10,10,10,1,4 568 | 1333104,3,1,2,1,2,1,3,1,1,2 569 | 1334071,4,1,1,1,2,3,2,1,1,2 570 | 1343068,8,4,4,1,6,10,2,5,2,4 571 | 1343374,10,10,8,10,6,5,10,3,1,4 572 | 1344121,8,10,4,4,8,10,8,2,1,4 573 | 142932,7,6,10,5,3,10,9,10,2,4 574 | 183936,3,1,1,1,2,1,2,1,1,2 575 | 324382,1,1,1,1,2,1,2,1,1,2 576 | 378275,10,9,7,3,4,2,7,7,1,4 577 | 385103,5,1,2,1,2,1,3,1,1,2 578 | 690557,5,1,1,1,2,1,2,1,1,2 579 | 695091,1,1,1,1,2,1,2,1,1,2 580 | 695219,1,1,1,1,2,1,2,1,1,2 581 | 824249,1,1,1,1,2,1,3,1,1,2 582 | 871549,5,1,2,1,2,1,2,1,1,2 583 | 878358,5,7,10,6,5,10,7,5,1,4 584 | 1107684,6,10,5,5,4,10,6,10,1,4 585 | 1115762,3,1,1,1,2,1,1,1,1,2 586 | 1217717,5,1,1,6,3,1,1,1,1,2 587 | 1239420,1,1,1,1,2,1,1,1,1,2 588 | 1254538,8,10,10,10,6,10,10,10,1,4 589 | 1261751,5,1,1,1,2,1,2,2,1,2 590 | 1268275,9,8,8,9,6,3,4,1,1,4 591 | 1272166,5,1,1,1,2,1,1,1,1,2 592 | 1294261,4,10,8,5,4,1,10,1,1,4 593 | 1295529,2,5,7,6,4,10,7,6,1,4 594 | 1298484,10,3,4,5,3,10,4,1,1,4 595 | 1311875,5,1,2,1,2,1,1,1,1,2 596 | 1315506,4,8,6,3,4,10,7,1,1,4 597 | 1320141,5,1,1,1,2,1,2,1,1,2 598 | 1325309,4,1,2,1,2,1,2,1,1,2 599 | 1333063,5,1,3,1,2,1,3,1,1,2 600 | 1333495,3,1,1,1,2,1,2,1,1,2 601 | 1334659,5,2,4,1,1,1,1,1,1,2 602 | 1336798,3,1,1,1,2,1,2,1,1,2 603 | 1344449,1,1,1,1,1,1,2,1,1,2 604 | 1350568,4,1,1,1,2,1,2,1,1,2 605 | 1352663,5,4,6,8,4,1,8,10,1,4 606 | 
188336,5,3,2,8,5,10,8,1,2,4 607 | 352431,10,5,10,3,5,8,7,8,3,4 608 | 353098,4,1,1,2,2,1,1,1,1,2 609 | 411453,1,1,1,1,2,1,1,1,1,2 610 | 557583,5,10,10,10,10,10,10,1,1,4 611 | 636375,5,1,1,1,2,1,1,1,1,2 612 | 736150,10,4,3,10,3,10,7,1,2,4 613 | 803531,5,10,10,10,5,2,8,5,1,4 614 | 822829,8,10,10,10,6,10,10,10,10,4 615 | 1016634,2,3,1,1,2,1,2,1,1,2 616 | 1031608,2,1,1,1,1,1,2,1,1,2 617 | 1041043,4,1,3,1,2,1,2,1,1,2 618 | 1042252,3,1,1,1,2,1,2,1,1,2 619 | 1057067,1,1,1,1,1,?,1,1,1,2 620 | 1061990,4,1,1,1,2,1,2,1,1,2 621 | 1073836,5,1,1,1,2,1,2,1,1,2 622 | 1083817,3,1,1,1,2,1,2,1,1,2 623 | 1096352,6,3,3,3,3,2,6,1,1,2 624 | 1140597,7,1,2,3,2,1,2,1,1,2 625 | 1149548,1,1,1,1,2,1,1,1,1,2 626 | 1174009,5,1,1,2,1,1,2,1,1,2 627 | 1183596,3,1,3,1,3,4,1,1,1,2 628 | 1190386,4,6,6,5,7,6,7,7,3,4 629 | 1190546,2,1,1,1,2,5,1,1,1,2 630 | 1213273,2,1,1,1,2,1,1,1,1,2 631 | 1218982,4,1,1,1,2,1,1,1,1,2 632 | 1225382,6,2,3,1,2,1,1,1,1,2 633 | 1235807,5,1,1,1,2,1,2,1,1,2 634 | 1238777,1,1,1,1,2,1,1,1,1,2 635 | 1253955,8,7,4,4,5,3,5,10,1,4 636 | 1257366,3,1,1,1,2,1,1,1,1,2 637 | 1260659,3,1,4,1,2,1,1,1,1,2 638 | 1268952,10,10,7,8,7,1,10,10,3,4 639 | 1275807,4,2,4,3,2,2,2,1,1,2 640 | 1277792,4,1,1,1,2,1,1,1,1,2 641 | 1277792,5,1,1,3,2,1,1,1,1,2 642 | 1285722,4,1,1,3,2,1,1,1,1,2 643 | 1288608,3,1,1,1,2,1,2,1,1,2 644 | 1290203,3,1,1,1,2,1,2,1,1,2 645 | 1294413,1,1,1,1,2,1,1,1,1,2 646 | 1299596,2,1,1,1,2,1,1,1,1,2 647 | 1303489,3,1,1,1,2,1,2,1,1,2 648 | 1311033,1,2,2,1,2,1,1,1,1,2 649 | 1311108,1,1,1,3,2,1,1,1,1,2 650 | 1315807,5,10,10,10,10,2,10,10,10,4 651 | 1318671,3,1,1,1,2,1,2,1,1,2 652 | 1319609,3,1,1,2,3,4,1,1,1,2 653 | 1323477,1,2,1,3,2,1,2,1,1,2 654 | 1324572,5,1,1,1,2,1,2,2,1,2 655 | 1324681,4,1,1,1,2,1,2,1,1,2 656 | 1325159,3,1,1,1,2,1,3,1,1,2 657 | 1326892,3,1,1,1,2,1,2,1,1,2 658 | 1330361,5,1,1,1,2,1,2,1,1,2 659 | 1333877,5,4,5,1,8,1,3,6,1,2 660 | 1334015,7,8,8,7,3,10,7,2,3,4 661 | 1334667,1,1,1,1,2,1,1,1,1,2 662 | 1339781,1,1,1,1,2,1,2,1,1,2 663 | 1339781,4,1,1,1,2,1,3,1,1,2 664 | 13454352,1,1,3,1,2,1,2,1,1,2 665 | 1345452,1,1,3,1,2,1,2,1,1,2 666 | 1345593,3,1,1,3,2,1,2,1,1,2 667 | 1347749,1,1,1,1,2,1,1,1,1,2 668 | 1347943,5,2,2,2,2,1,1,1,2,2 669 | 1348851,3,1,1,1,2,1,3,1,1,2 670 | 1350319,5,7,4,1,6,1,7,10,3,4 671 | 1350423,5,10,10,8,5,5,7,10,1,4 672 | 1352848,3,10,7,8,5,8,7,4,1,4 673 | 1353092,3,2,1,2,2,1,3,1,1,2 674 | 1354840,2,1,1,1,2,1,3,1,1,2 675 | 1354840,5,3,2,1,3,1,1,1,1,2 676 | 1355260,1,1,1,1,2,1,2,1,1,2 677 | 1365075,4,1,4,1,2,1,1,1,1,2 678 | 1365328,1,1,2,1,2,1,2,1,1,2 679 | 1368267,5,1,1,1,2,1,1,1,1,2 680 | 1368273,1,1,1,1,2,1,1,1,1,2 681 | 1368882,2,1,1,1,2,1,1,1,1,2 682 | 1369821,10,10,10,10,5,10,10,10,7,4 683 | 1371026,5,10,10,10,4,10,5,6,3,4 684 | 1371920,5,1,1,1,2,1,3,2,1,2 685 | 466906,1,1,1,1,2,1,1,1,1,2 686 | 466906,1,1,1,1,2,1,1,1,1,2 687 | 534555,1,1,1,1,2,1,1,1,1,2 688 | 536708,1,1,1,1,2,1,1,1,1,2 689 | 566346,3,1,1,1,2,1,2,3,1,2 690 | 603148,4,1,1,1,2,1,1,1,1,2 691 | 654546,1,1,1,1,2,1,1,1,8,2 692 | 654546,1,1,1,3,2,1,1,1,1,2 693 | 695091,5,10,10,5,4,5,4,4,1,4 694 | 714039,3,1,1,1,2,1,1,1,1,2 695 | 763235,3,1,1,1,2,1,2,1,2,2 696 | 776715,3,1,1,1,3,2,1,1,1,2 697 | 841769,2,1,1,1,2,1,1,1,1,2 698 | 888820,5,10,10,3,7,3,8,10,2,4 699 | 897471,4,8,6,4,3,4,10,6,1,4 700 | 897471,4,8,8,5,4,5,10,4,1,4 701 | -------------------------------------------------------------------------------- /datasets/pima-indians-diabetes.csv: -------------------------------------------------------------------------------- 1 | Pregnancy Count,Blood Glucose,Diastolic BP,Triceps Skin Fold Thickness,Serum 
Insulin,BMI,Diabetes Pedigree Function,Age,Class 2 | 6,148,72,35,0,33.6,0.627,50,1 3 | 1,85,66,29,0,26.6,0.351,31,0 4 | 8,183,64,0,0,23.3,0.672,32,1 5 | 1,89,66,23,94,28.1,0.167,21,0 6 | 0,137,40,35,168,43.1,2.288,33,1 7 | 5,116,74,0,0,25.6,0.201,30,0 8 | 3,78,50,32,88,31.0,0.248,26,1 9 | 10,115,0,0,0,35.3,0.134,29,0 10 | 2,197,70,45,543,30.5,0.158,53,1 11 | 8,125,96,0,0,0.0,0.232,54,1 12 | 4,110,92,0,0,37.6,0.191,30,0 13 | 10,168,74,0,0,38.0,0.537,34,1 14 | 10,139,80,0,0,27.1,1.441,57,0 15 | 1,189,60,23,846,30.1,0.398,59,1 16 | 5,166,72,19,175,25.8,0.587,51,1 17 | 7,100,0,0,0,30.0,0.484,32,1 18 | 0,118,84,47,230,45.8,0.551,31,1 19 | 7,107,74,0,0,29.6,0.254,31,1 20 | 1,103,30,38,83,43.3,0.183,33,0 21 | 1,115,70,30,96,34.6,0.529,32,1 22 | 3,126,88,41,235,39.3,0.704,27,0 23 | 8,99,84,0,0,35.4,0.388,50,0 24 | 7,196,90,0,0,39.8,0.451,41,1 25 | 9,119,80,35,0,29.0,0.263,29,1 26 | 11,143,94,33,146,36.6,0.254,51,1 27 | 10,125,70,26,115,31.1,0.205,41,1 28 | 7,147,76,0,0,39.4,0.257,43,1 29 | 1,97,66,15,140,23.2,0.487,22,0 30 | 13,145,82,19,110,22.2,0.245,57,0 31 | 5,117,92,0,0,34.1,0.337,38,0 32 | 5,109,75,26,0,36.0,0.546,60,0 33 | 3,158,76,36,245,31.6,0.851,28,1 34 | 3,88,58,11,54,24.8,0.267,22,0 35 | 6,92,92,0,0,19.9,0.188,28,0 36 | 10,122,78,31,0,27.6,0.512,45,0 37 | 4,103,60,33,192,24.0,0.966,33,0 38 | 11,138,76,0,0,33.2,0.420,35,0 39 | 9,102,76,37,0,32.9,0.665,46,1 40 | 2,90,68,42,0,38.2,0.503,27,1 41 | 4,111,72,47,207,37.1,1.390,56,1 42 | 3,180,64,25,70,34.0,0.271,26,0 43 | 7,133,84,0,0,40.2,0.696,37,0 44 | 7,106,92,18,0,22.7,0.235,48,0 45 | 9,171,110,24,240,45.4,0.721,54,1 46 | 7,159,64,0,0,27.4,0.294,40,0 47 | 0,180,66,39,0,42.0,1.893,25,1 48 | 1,146,56,0,0,29.7,0.564,29,0 49 | 2,71,70,27,0,28.0,0.586,22,0 50 | 7,103,66,32,0,39.1,0.344,31,1 51 | 7,105,0,0,0,0.0,0.305,24,0 52 | 1,103,80,11,82,19.4,0.491,22,0 53 | 1,101,50,15,36,24.2,0.526,26,0 54 | 5,88,66,21,23,24.4,0.342,30,0 55 | 8,176,90,34,300,33.7,0.467,58,1 56 | 7,150,66,42,342,34.7,0.718,42,0 57 | 1,73,50,10,0,23.0,0.248,21,0 58 | 7,187,68,39,304,37.7,0.254,41,1 59 | 0,100,88,60,110,46.8,0.962,31,0 60 | 0,146,82,0,0,40.5,1.781,44,0 61 | 0,105,64,41,142,41.5,0.173,22,0 62 | 2,84,0,0,0,0.0,0.304,21,0 63 | 8,133,72,0,0,32.9,0.270,39,1 64 | 5,44,62,0,0,25.0,0.587,36,0 65 | 2,141,58,34,128,25.4,0.699,24,0 66 | 7,114,66,0,0,32.8,0.258,42,1 67 | 5,99,74,27,0,29.0,0.203,32,0 68 | 0,109,88,30,0,32.5,0.855,38,1 69 | 2,109,92,0,0,42.7,0.845,54,0 70 | 1,95,66,13,38,19.6,0.334,25,0 71 | 4,146,85,27,100,28.9,0.189,27,0 72 | 2,100,66,20,90,32.9,0.867,28,1 73 | 5,139,64,35,140,28.6,0.411,26,0 74 | 13,126,90,0,0,43.4,0.583,42,1 75 | 4,129,86,20,270,35.1,0.231,23,0 76 | 1,79,75,30,0,32.0,0.396,22,0 77 | 1,0,48,20,0,24.7,0.140,22,0 78 | 7,62,78,0,0,32.6,0.391,41,0 79 | 5,95,72,33,0,37.7,0.370,27,0 80 | 0,131,0,0,0,43.2,0.270,26,1 81 | 2,112,66,22,0,25.0,0.307,24,0 82 | 3,113,44,13,0,22.4,0.140,22,0 83 | 2,74,0,0,0,0.0,0.102,22,0 84 | 7,83,78,26,71,29.3,0.767,36,0 85 | 0,101,65,28,0,24.6,0.237,22,0 86 | 5,137,108,0,0,48.8,0.227,37,1 87 | 2,110,74,29,125,32.4,0.698,27,0 88 | 13,106,72,54,0,36.6,0.178,45,0 89 | 2,100,68,25,71,38.5,0.324,26,0 90 | 15,136,70,32,110,37.1,0.153,43,1 91 | 1,107,68,19,0,26.5,0.165,24,0 92 | 1,80,55,0,0,19.1,0.258,21,0 93 | 4,123,80,15,176,32.0,0.443,34,0 94 | 7,81,78,40,48,46.7,0.261,42,0 95 | 4,134,72,0,0,23.8,0.277,60,1 96 | 2,142,82,18,64,24.7,0.761,21,0 97 | 6,144,72,27,228,33.9,0.255,40,0 98 | 2,92,62,28,0,31.6,0.130,24,0 99 | 1,71,48,18,76,20.4,0.323,22,0 100 | 6,93,50,30,64,28.7,0.356,23,0 101 | 
1,122,90,51,220,49.7,0.325,31,1 102 | 1,163,72,0,0,39.0,1.222,33,1 103 | 1,151,60,0,0,26.1,0.179,22,0 104 | 0,125,96,0,0,22.5,0.262,21,0 105 | 1,81,72,18,40,26.6,0.283,24,0 106 | 2,85,65,0,0,39.6,0.930,27,0 107 | 1,126,56,29,152,28.7,0.801,21,0 108 | 1,96,122,0,0,22.4,0.207,27,0 109 | 4,144,58,28,140,29.5,0.287,37,0 110 | 3,83,58,31,18,34.3,0.336,25,0 111 | 0,95,85,25,36,37.4,0.247,24,1 112 | 3,171,72,33,135,33.3,0.199,24,1 113 | 8,155,62,26,495,34.0,0.543,46,1 114 | 1,89,76,34,37,31.2,0.192,23,0 115 | 4,76,62,0,0,34.0,0.391,25,0 116 | 7,160,54,32,175,30.5,0.588,39,1 117 | 4,146,92,0,0,31.2,0.539,61,1 118 | 5,124,74,0,0,34.0,0.220,38,1 119 | 5,78,48,0,0,33.7,0.654,25,0 120 | 4,97,60,23,0,28.2,0.443,22,0 121 | 4,99,76,15,51,23.2,0.223,21,0 122 | 0,162,76,56,100,53.2,0.759,25,1 123 | 6,111,64,39,0,34.2,0.260,24,0 124 | 2,107,74,30,100,33.6,0.404,23,0 125 | 5,132,80,0,0,26.8,0.186,69,0 126 | 0,113,76,0,0,33.3,0.278,23,1 127 | 1,88,30,42,99,55.0,0.496,26,1 128 | 3,120,70,30,135,42.9,0.452,30,0 129 | 1,118,58,36,94,33.3,0.261,23,0 130 | 1,117,88,24,145,34.5,0.403,40,1 131 | 0,105,84,0,0,27.9,0.741,62,1 132 | 4,173,70,14,168,29.7,0.361,33,1 133 | 9,122,56,0,0,33.3,1.114,33,1 134 | 3,170,64,37,225,34.5,0.356,30,1 135 | 8,84,74,31,0,38.3,0.457,39,0 136 | 2,96,68,13,49,21.1,0.647,26,0 137 | 2,125,60,20,140,33.8,0.088,31,0 138 | 0,100,70,26,50,30.8,0.597,21,0 139 | 0,93,60,25,92,28.7,0.532,22,0 140 | 0,129,80,0,0,31.2,0.703,29,0 141 | 5,105,72,29,325,36.9,0.159,28,0 142 | 3,128,78,0,0,21.1,0.268,55,0 143 | 5,106,82,30,0,39.5,0.286,38,0 144 | 2,108,52,26,63,32.5,0.318,22,0 145 | 10,108,66,0,0,32.4,0.272,42,1 146 | 4,154,62,31,284,32.8,0.237,23,0 147 | 0,102,75,23,0,0.0,0.572,21,0 148 | 9,57,80,37,0,32.8,0.096,41,0 149 | 2,106,64,35,119,30.5,1.400,34,0 150 | 5,147,78,0,0,33.7,0.218,65,0 151 | 2,90,70,17,0,27.3,0.085,22,0 152 | 1,136,74,50,204,37.4,0.399,24,0 153 | 4,114,65,0,0,21.9,0.432,37,0 154 | 9,156,86,28,155,34.3,1.189,42,1 155 | 1,153,82,42,485,40.6,0.687,23,0 156 | 8,188,78,0,0,47.9,0.137,43,1 157 | 7,152,88,44,0,50.0,0.337,36,1 158 | 2,99,52,15,94,24.6,0.637,21,0 159 | 1,109,56,21,135,25.2,0.833,23,0 160 | 2,88,74,19,53,29.0,0.229,22,0 161 | 17,163,72,41,114,40.9,0.817,47,1 162 | 4,151,90,38,0,29.7,0.294,36,0 163 | 7,102,74,40,105,37.2,0.204,45,0 164 | 0,114,80,34,285,44.2,0.167,27,0 165 | 2,100,64,23,0,29.7,0.368,21,0 166 | 0,131,88,0,0,31.6,0.743,32,1 167 | 6,104,74,18,156,29.9,0.722,41,1 168 | 3,148,66,25,0,32.5,0.256,22,0 169 | 4,120,68,0,0,29.6,0.709,34,0 170 | 4,110,66,0,0,31.9,0.471,29,0 171 | 3,111,90,12,78,28.4,0.495,29,0 172 | 6,102,82,0,0,30.8,0.180,36,1 173 | 6,134,70,23,130,35.4,0.542,29,1 174 | 2,87,0,23,0,28.9,0.773,25,0 175 | 1,79,60,42,48,43.5,0.678,23,0 176 | 2,75,64,24,55,29.7,0.370,33,0 177 | 8,179,72,42,130,32.7,0.719,36,1 178 | 6,85,78,0,0,31.2,0.382,42,0 179 | 0,129,110,46,130,67.1,0.319,26,1 180 | 5,143,78,0,0,45.0,0.190,47,0 181 | 5,130,82,0,0,39.1,0.956,37,1 182 | 6,87,80,0,0,23.2,0.084,32,0 183 | 0,119,64,18,92,34.9,0.725,23,0 184 | 1,0,74,20,23,27.7,0.299,21,0 185 | 5,73,60,0,0,26.8,0.268,27,0 186 | 4,141,74,0,0,27.6,0.244,40,0 187 | 7,194,68,28,0,35.9,0.745,41,1 188 | 8,181,68,36,495,30.1,0.615,60,1 189 | 1,128,98,41,58,32.0,1.321,33,1 190 | 8,109,76,39,114,27.9,0.640,31,1 191 | 5,139,80,35,160,31.6,0.361,25,1 192 | 3,111,62,0,0,22.6,0.142,21,0 193 | 9,123,70,44,94,33.1,0.374,40,0 194 | 7,159,66,0,0,30.4,0.383,36,1 195 | 11,135,0,0,0,52.3,0.578,40,1 196 | 8,85,55,20,0,24.4,0.136,42,0 197 | 5,158,84,41,210,39.4,0.395,29,1 198 | 1,105,58,0,0,24.3,0.187,21,0 199 | 
3,107,62,13,48,22.9,0.678,23,1 200 | 4,109,64,44,99,34.8,0.905,26,1 201 | 4,148,60,27,318,30.9,0.150,29,1 202 | 0,113,80,16,0,31.0,0.874,21,0 203 | 1,138,82,0,0,40.1,0.236,28,0 204 | 0,108,68,20,0,27.3,0.787,32,0 205 | 2,99,70,16,44,20.4,0.235,27,0 206 | 6,103,72,32,190,37.7,0.324,55,0 207 | 5,111,72,28,0,23.9,0.407,27,0 208 | 8,196,76,29,280,37.5,0.605,57,1 209 | 5,162,104,0,0,37.7,0.151,52,1 210 | 1,96,64,27,87,33.2,0.289,21,0 211 | 7,184,84,33,0,35.5,0.355,41,1 212 | 2,81,60,22,0,27.7,0.290,25,0 213 | 0,147,85,54,0,42.8,0.375,24,0 214 | 7,179,95,31,0,34.2,0.164,60,0 215 | 0,140,65,26,130,42.6,0.431,24,1 216 | 9,112,82,32,175,34.2,0.260,36,1 217 | 12,151,70,40,271,41.8,0.742,38,1 218 | 5,109,62,41,129,35.8,0.514,25,1 219 | 6,125,68,30,120,30.0,0.464,32,0 220 | 5,85,74,22,0,29.0,1.224,32,1 221 | 5,112,66,0,0,37.8,0.261,41,1 222 | 0,177,60,29,478,34.6,1.072,21,1 223 | 2,158,90,0,0,31.6,0.805,66,1 224 | 7,119,0,0,0,25.2,0.209,37,0 225 | 7,142,60,33,190,28.8,0.687,61,0 226 | 1,100,66,15,56,23.6,0.666,26,0 227 | 1,87,78,27,32,34.6,0.101,22,0 228 | 0,101,76,0,0,35.7,0.198,26,0 229 | 3,162,52,38,0,37.2,0.652,24,1 230 | 4,197,70,39,744,36.7,2.329,31,0 231 | 0,117,80,31,53,45.2,0.089,24,0 232 | 4,142,86,0,0,44.0,0.645,22,1 233 | 6,134,80,37,370,46.2,0.238,46,1 234 | 1,79,80,25,37,25.4,0.583,22,0 235 | 4,122,68,0,0,35.0,0.394,29,0 236 | 3,74,68,28,45,29.7,0.293,23,0 237 | 4,171,72,0,0,43.6,0.479,26,1 238 | 7,181,84,21,192,35.9,0.586,51,1 239 | 0,179,90,27,0,44.1,0.686,23,1 240 | 9,164,84,21,0,30.8,0.831,32,1 241 | 0,104,76,0,0,18.4,0.582,27,0 242 | 1,91,64,24,0,29.2,0.192,21,0 243 | 4,91,70,32,88,33.1,0.446,22,0 244 | 3,139,54,0,0,25.6,0.402,22,1 245 | 6,119,50,22,176,27.1,1.318,33,1 246 | 2,146,76,35,194,38.2,0.329,29,0 247 | 9,184,85,15,0,30.0,1.213,49,1 248 | 10,122,68,0,0,31.2,0.258,41,0 249 | 0,165,90,33,680,52.3,0.427,23,0 250 | 9,124,70,33,402,35.4,0.282,34,0 251 | 1,111,86,19,0,30.1,0.143,23,0 252 | 9,106,52,0,0,31.2,0.380,42,0 253 | 2,129,84,0,0,28.0,0.284,27,0 254 | 2,90,80,14,55,24.4,0.249,24,0 255 | 0,86,68,32,0,35.8,0.238,25,0 256 | 12,92,62,7,258,27.6,0.926,44,1 257 | 1,113,64,35,0,33.6,0.543,21,1 258 | 3,111,56,39,0,30.1,0.557,30,0 259 | 2,114,68,22,0,28.7,0.092,25,0 260 | 1,193,50,16,375,25.9,0.655,24,0 261 | 11,155,76,28,150,33.3,1.353,51,1 262 | 3,191,68,15,130,30.9,0.299,34,0 263 | 3,141,0,0,0,30.0,0.761,27,1 264 | 4,95,70,32,0,32.1,0.612,24,0 265 | 3,142,80,15,0,32.4,0.200,63,0 266 | 4,123,62,0,0,32.0,0.226,35,1 267 | 5,96,74,18,67,33.6,0.997,43,0 268 | 0,138,0,0,0,36.3,0.933,25,1 269 | 2,128,64,42,0,40.0,1.101,24,0 270 | 0,102,52,0,0,25.1,0.078,21,0 271 | 2,146,0,0,0,27.5,0.240,28,1 272 | 10,101,86,37,0,45.6,1.136,38,1 273 | 2,108,62,32,56,25.2,0.128,21,0 274 | 3,122,78,0,0,23.0,0.254,40,0 275 | 1,71,78,50,45,33.2,0.422,21,0 276 | 13,106,70,0,0,34.2,0.251,52,0 277 | 2,100,70,52,57,40.5,0.677,25,0 278 | 7,106,60,24,0,26.5,0.296,29,1 279 | 0,104,64,23,116,27.8,0.454,23,0 280 | 5,114,74,0,0,24.9,0.744,57,0 281 | 2,108,62,10,278,25.3,0.881,22,0 282 | 0,146,70,0,0,37.9,0.334,28,1 283 | 10,129,76,28,122,35.9,0.280,39,0 284 | 7,133,88,15,155,32.4,0.262,37,0 285 | 7,161,86,0,0,30.4,0.165,47,1 286 | 2,108,80,0,0,27.0,0.259,52,1 287 | 7,136,74,26,135,26.0,0.647,51,0 288 | 5,155,84,44,545,38.7,0.619,34,0 289 | 1,119,86,39,220,45.6,0.808,29,1 290 | 4,96,56,17,49,20.8,0.340,26,0 291 | 5,108,72,43,75,36.1,0.263,33,0 292 | 0,78,88,29,40,36.9,0.434,21,0 293 | 0,107,62,30,74,36.6,0.757,25,1 294 | 2,128,78,37,182,43.3,1.224,31,1 295 | 1,128,48,45,194,40.5,0.613,24,1 296 | 
0,161,50,0,0,21.9,0.254,65,0 297 | 6,151,62,31,120,35.5,0.692,28,0 298 | 2,146,70,38,360,28.0,0.337,29,1 299 | 0,126,84,29,215,30.7,0.520,24,0 300 | 14,100,78,25,184,36.6,0.412,46,1 301 | 8,112,72,0,0,23.6,0.840,58,0 302 | 0,167,0,0,0,32.3,0.839,30,1 303 | 2,144,58,33,135,31.6,0.422,25,1 304 | 5,77,82,41,42,35.8,0.156,35,0 305 | 5,115,98,0,0,52.9,0.209,28,1 306 | 3,150,76,0,0,21.0,0.207,37,0 307 | 2,120,76,37,105,39.7,0.215,29,0 308 | 10,161,68,23,132,25.5,0.326,47,1 309 | 0,137,68,14,148,24.8,0.143,21,0 310 | 0,128,68,19,180,30.5,1.391,25,1 311 | 2,124,68,28,205,32.9,0.875,30,1 312 | 6,80,66,30,0,26.2,0.313,41,0 313 | 0,106,70,37,148,39.4,0.605,22,0 314 | 2,155,74,17,96,26.6,0.433,27,1 315 | 3,113,50,10,85,29.5,0.626,25,0 316 | 7,109,80,31,0,35.9,1.127,43,1 317 | 2,112,68,22,94,34.1,0.315,26,0 318 | 3,99,80,11,64,19.3,0.284,30,0 319 | 3,182,74,0,0,30.5,0.345,29,1 320 | 3,115,66,39,140,38.1,0.150,28,0 321 | 6,194,78,0,0,23.5,0.129,59,1 322 | 4,129,60,12,231,27.5,0.527,31,0 323 | 3,112,74,30,0,31.6,0.197,25,1 324 | 0,124,70,20,0,27.4,0.254,36,1 325 | 13,152,90,33,29,26.8,0.731,43,1 326 | 2,112,75,32,0,35.7,0.148,21,0 327 | 1,157,72,21,168,25.6,0.123,24,0 328 | 1,122,64,32,156,35.1,0.692,30,1 329 | 10,179,70,0,0,35.1,0.200,37,0 330 | 2,102,86,36,120,45.5,0.127,23,1 331 | 6,105,70,32,68,30.8,0.122,37,0 332 | 8,118,72,19,0,23.1,1.476,46,0 333 | 2,87,58,16,52,32.7,0.166,25,0 334 | 1,180,0,0,0,43.3,0.282,41,1 335 | 12,106,80,0,0,23.6,0.137,44,0 336 | 1,95,60,18,58,23.9,0.260,22,0 337 | 0,165,76,43,255,47.9,0.259,26,0 338 | 0,117,0,0,0,33.8,0.932,44,0 339 | 5,115,76,0,0,31.2,0.343,44,1 340 | 9,152,78,34,171,34.2,0.893,33,1 341 | 7,178,84,0,0,39.9,0.331,41,1 342 | 1,130,70,13,105,25.9,0.472,22,0 343 | 1,95,74,21,73,25.9,0.673,36,0 344 | 1,0,68,35,0,32.0,0.389,22,0 345 | 5,122,86,0,0,34.7,0.290,33,0 346 | 8,95,72,0,0,36.8,0.485,57,0 347 | 8,126,88,36,108,38.5,0.349,49,0 348 | 1,139,46,19,83,28.7,0.654,22,0 349 | 3,116,0,0,0,23.5,0.187,23,0 350 | 3,99,62,19,74,21.8,0.279,26,0 351 | 5,0,80,32,0,41.0,0.346,37,1 352 | 4,92,80,0,0,42.2,0.237,29,0 353 | 4,137,84,0,0,31.2,0.252,30,0 354 | 3,61,82,28,0,34.4,0.243,46,0 355 | 1,90,62,12,43,27.2,0.580,24,0 356 | 3,90,78,0,0,42.7,0.559,21,0 357 | 9,165,88,0,0,30.4,0.302,49,1 358 | 1,125,50,40,167,33.3,0.962,28,1 359 | 13,129,0,30,0,39.9,0.569,44,1 360 | 12,88,74,40,54,35.3,0.378,48,0 361 | 1,196,76,36,249,36.5,0.875,29,1 362 | 5,189,64,33,325,31.2,0.583,29,1 363 | 5,158,70,0,0,29.8,0.207,63,0 364 | 5,103,108,37,0,39.2,0.305,65,0 365 | 4,146,78,0,0,38.5,0.520,67,1 366 | 4,147,74,25,293,34.9,0.385,30,0 367 | 5,99,54,28,83,34.0,0.499,30,0 368 | 6,124,72,0,0,27.6,0.368,29,1 369 | 0,101,64,17,0,21.0,0.252,21,0 370 | 3,81,86,16,66,27.5,0.306,22,0 371 | 1,133,102,28,140,32.8,0.234,45,1 372 | 3,173,82,48,465,38.4,2.137,25,1 373 | 0,118,64,23,89,0.0,1.731,21,0 374 | 0,84,64,22,66,35.8,0.545,21,0 375 | 2,105,58,40,94,34.9,0.225,25,0 376 | 2,122,52,43,158,36.2,0.816,28,0 377 | 12,140,82,43,325,39.2,0.528,58,1 378 | 0,98,82,15,84,25.2,0.299,22,0 379 | 1,87,60,37,75,37.2,0.509,22,0 380 | 4,156,75,0,0,48.3,0.238,32,1 381 | 0,93,100,39,72,43.4,1.021,35,0 382 | 1,107,72,30,82,30.8,0.821,24,0 383 | 0,105,68,22,0,20.0,0.236,22,0 384 | 1,109,60,8,182,25.4,0.947,21,0 385 | 1,90,62,18,59,25.1,1.268,25,0 386 | 1,125,70,24,110,24.3,0.221,25,0 387 | 1,119,54,13,50,22.3,0.205,24,0 388 | 5,116,74,29,0,32.3,0.660,35,1 389 | 8,105,100,36,0,43.3,0.239,45,1 390 | 5,144,82,26,285,32.0,0.452,58,1 391 | 3,100,68,23,81,31.6,0.949,28,0 392 | 1,100,66,29,196,32.0,0.444,42,0 393 | 
5,166,76,0,0,45.7,0.340,27,1 394 | 1,131,64,14,415,23.7,0.389,21,0 395 | 4,116,72,12,87,22.1,0.463,37,0 396 | 4,158,78,0,0,32.9,0.803,31,1 397 | 2,127,58,24,275,27.7,1.600,25,0 398 | 3,96,56,34,115,24.7,0.944,39,0 399 | 0,131,66,40,0,34.3,0.196,22,1 400 | 3,82,70,0,0,21.1,0.389,25,0 401 | 3,193,70,31,0,34.9,0.241,25,1 402 | 4,95,64,0,0,32.0,0.161,31,1 403 | 6,137,61,0,0,24.2,0.151,55,0 404 | 5,136,84,41,88,35.0,0.286,35,1 405 | 9,72,78,25,0,31.6,0.280,38,0 406 | 5,168,64,0,0,32.9,0.135,41,1 407 | 2,123,48,32,165,42.1,0.520,26,0 408 | 4,115,72,0,0,28.9,0.376,46,1 409 | 0,101,62,0,0,21.9,0.336,25,0 410 | 8,197,74,0,0,25.9,1.191,39,1 411 | 1,172,68,49,579,42.4,0.702,28,1 412 | 6,102,90,39,0,35.7,0.674,28,0 413 | 1,112,72,30,176,34.4,0.528,25,0 414 | 1,143,84,23,310,42.4,1.076,22,0 415 | 1,143,74,22,61,26.2,0.256,21,0 416 | 0,138,60,35,167,34.6,0.534,21,1 417 | 3,173,84,33,474,35.7,0.258,22,1 418 | 1,97,68,21,0,27.2,1.095,22,0 419 | 4,144,82,32,0,38.5,0.554,37,1 420 | 1,83,68,0,0,18.2,0.624,27,0 421 | 3,129,64,29,115,26.4,0.219,28,1 422 | 1,119,88,41,170,45.3,0.507,26,0 423 | 2,94,68,18,76,26.0,0.561,21,0 424 | 0,102,64,46,78,40.6,0.496,21,0 425 | 2,115,64,22,0,30.8,0.421,21,0 426 | 8,151,78,32,210,42.9,0.516,36,1 427 | 4,184,78,39,277,37.0,0.264,31,1 428 | 0,94,0,0,0,0.0,0.256,25,0 429 | 1,181,64,30,180,34.1,0.328,38,1 430 | 0,135,94,46,145,40.6,0.284,26,0 431 | 1,95,82,25,180,35.0,0.233,43,1 432 | 2,99,0,0,0,22.2,0.108,23,0 433 | 3,89,74,16,85,30.4,0.551,38,0 434 | 1,80,74,11,60,30.0,0.527,22,0 435 | 2,139,75,0,0,25.6,0.167,29,0 436 | 1,90,68,8,0,24.5,1.138,36,0 437 | 0,141,0,0,0,42.4,0.205,29,1 438 | 12,140,85,33,0,37.4,0.244,41,0 439 | 5,147,75,0,0,29.9,0.434,28,0 440 | 1,97,70,15,0,18.2,0.147,21,0 441 | 6,107,88,0,0,36.8,0.727,31,0 442 | 0,189,104,25,0,34.3,0.435,41,1 443 | 2,83,66,23,50,32.2,0.497,22,0 444 | 4,117,64,27,120,33.2,0.230,24,0 445 | 8,108,70,0,0,30.5,0.955,33,1 446 | 4,117,62,12,0,29.7,0.380,30,1 447 | 0,180,78,63,14,59.4,2.420,25,1 448 | 1,100,72,12,70,25.3,0.658,28,0 449 | 0,95,80,45,92,36.5,0.330,26,0 450 | 0,104,64,37,64,33.6,0.510,22,1 451 | 0,120,74,18,63,30.5,0.285,26,0 452 | 1,82,64,13,95,21.2,0.415,23,0 453 | 2,134,70,0,0,28.9,0.542,23,1 454 | 0,91,68,32,210,39.9,0.381,25,0 455 | 2,119,0,0,0,19.6,0.832,72,0 456 | 2,100,54,28,105,37.8,0.498,24,0 457 | 14,175,62,30,0,33.6,0.212,38,1 458 | 1,135,54,0,0,26.7,0.687,62,0 459 | 5,86,68,28,71,30.2,0.364,24,0 460 | 10,148,84,48,237,37.6,1.001,51,1 461 | 9,134,74,33,60,25.9,0.460,81,0 462 | 9,120,72,22,56,20.8,0.733,48,0 463 | 1,71,62,0,0,21.8,0.416,26,0 464 | 8,74,70,40,49,35.3,0.705,39,0 465 | 5,88,78,30,0,27.6,0.258,37,0 466 | 10,115,98,0,0,24.0,1.022,34,0 467 | 0,124,56,13,105,21.8,0.452,21,0 468 | 0,74,52,10,36,27.8,0.269,22,0 469 | 0,97,64,36,100,36.8,0.600,25,0 470 | 8,120,0,0,0,30.0,0.183,38,1 471 | 6,154,78,41,140,46.1,0.571,27,0 472 | 1,144,82,40,0,41.3,0.607,28,0 473 | 0,137,70,38,0,33.2,0.170,22,0 474 | 0,119,66,27,0,38.8,0.259,22,0 475 | 7,136,90,0,0,29.9,0.210,50,0 476 | 4,114,64,0,0,28.9,0.126,24,0 477 | 0,137,84,27,0,27.3,0.231,59,0 478 | 2,105,80,45,191,33.7,0.711,29,1 479 | 7,114,76,17,110,23.8,0.466,31,0 480 | 8,126,74,38,75,25.9,0.162,39,0 481 | 4,132,86,31,0,28.0,0.419,63,0 482 | 3,158,70,30,328,35.5,0.344,35,1 483 | 0,123,88,37,0,35.2,0.197,29,0 484 | 4,85,58,22,49,27.8,0.306,28,0 485 | 0,84,82,31,125,38.2,0.233,23,0 486 | 0,145,0,0,0,44.2,0.630,31,1 487 | 0,135,68,42,250,42.3,0.365,24,1 488 | 1,139,62,41,480,40.7,0.536,21,0 489 | 0,173,78,32,265,46.5,1.159,58,0 490 | 4,99,72,17,0,25.6,0.294,28,0 491 | 
8,194,80,0,0,26.1,0.551,67,0 492 | 2,83,65,28,66,36.8,0.629,24,0 493 | 2,89,90,30,0,33.5,0.292,42,0 494 | 4,99,68,38,0,32.8,0.145,33,0 495 | 4,125,70,18,122,28.9,1.144,45,1 496 | 3,80,0,0,0,0.0,0.174,22,0 497 | 6,166,74,0,0,26.6,0.304,66,0 498 | 5,110,68,0,0,26.0,0.292,30,0 499 | 2,81,72,15,76,30.1,0.547,25,0 500 | 7,195,70,33,145,25.1,0.163,55,1 501 | 6,154,74,32,193,29.3,0.839,39,0 502 | 2,117,90,19,71,25.2,0.313,21,0 503 | 3,84,72,32,0,37.2,0.267,28,0 504 | 6,0,68,41,0,39.0,0.727,41,1 505 | 7,94,64,25,79,33.3,0.738,41,0 506 | 3,96,78,39,0,37.3,0.238,40,0 507 | 10,75,82,0,0,33.3,0.263,38,0 508 | 0,180,90,26,90,36.5,0.314,35,1 509 | 1,130,60,23,170,28.6,0.692,21,0 510 | 2,84,50,23,76,30.4,0.968,21,0 511 | 8,120,78,0,0,25.0,0.409,64,0 512 | 12,84,72,31,0,29.7,0.297,46,1 513 | 0,139,62,17,210,22.1,0.207,21,0 514 | 9,91,68,0,0,24.2,0.200,58,0 515 | 2,91,62,0,0,27.3,0.525,22,0 516 | 3,99,54,19,86,25.6,0.154,24,0 517 | 3,163,70,18,105,31.6,0.268,28,1 518 | 9,145,88,34,165,30.3,0.771,53,1 519 | 7,125,86,0,0,37.6,0.304,51,0 520 | 13,76,60,0,0,32.8,0.180,41,0 521 | 6,129,90,7,326,19.6,0.582,60,0 522 | 2,68,70,32,66,25.0,0.187,25,0 523 | 3,124,80,33,130,33.2,0.305,26,0 524 | 6,114,0,0,0,0.0,0.189,26,0 525 | 9,130,70,0,0,34.2,0.652,45,1 526 | 3,125,58,0,0,31.6,0.151,24,0 527 | 3,87,60,18,0,21.8,0.444,21,0 528 | 1,97,64,19,82,18.2,0.299,21,0 529 | 3,116,74,15,105,26.3,0.107,24,0 530 | 0,117,66,31,188,30.8,0.493,22,0 531 | 0,111,65,0,0,24.6,0.660,31,0 532 | 2,122,60,18,106,29.8,0.717,22,0 533 | 0,107,76,0,0,45.3,0.686,24,0 534 | 1,86,66,52,65,41.3,0.917,29,0 535 | 6,91,0,0,0,29.8,0.501,31,0 536 | 1,77,56,30,56,33.3,1.251,24,0 537 | 4,132,0,0,0,32.9,0.302,23,1 538 | 0,105,90,0,0,29.6,0.197,46,0 539 | 0,57,60,0,0,21.7,0.735,67,0 540 | 0,127,80,37,210,36.3,0.804,23,0 541 | 3,129,92,49,155,36.4,0.968,32,1 542 | 8,100,74,40,215,39.4,0.661,43,1 543 | 3,128,72,25,190,32.4,0.549,27,1 544 | 10,90,85,32,0,34.9,0.825,56,1 545 | 4,84,90,23,56,39.5,0.159,25,0 546 | 1,88,78,29,76,32.0,0.365,29,0 547 | 8,186,90,35,225,34.5,0.423,37,1 548 | 5,187,76,27,207,43.6,1.034,53,1 549 | 4,131,68,21,166,33.1,0.160,28,0 550 | 1,164,82,43,67,32.8,0.341,50,0 551 | 4,189,110,31,0,28.5,0.680,37,0 552 | 1,116,70,28,0,27.4,0.204,21,0 553 | 3,84,68,30,106,31.9,0.591,25,0 554 | 6,114,88,0,0,27.8,0.247,66,0 555 | 1,88,62,24,44,29.9,0.422,23,0 556 | 1,84,64,23,115,36.9,0.471,28,0 557 | 7,124,70,33,215,25.5,0.161,37,0 558 | 1,97,70,40,0,38.1,0.218,30,0 559 | 8,110,76,0,0,27.8,0.237,58,0 560 | 11,103,68,40,0,46.2,0.126,42,0 561 | 11,85,74,0,0,30.1,0.300,35,0 562 | 6,125,76,0,0,33.8,0.121,54,1 563 | 0,198,66,32,274,41.3,0.502,28,1 564 | 1,87,68,34,77,37.6,0.401,24,0 565 | 6,99,60,19,54,26.9,0.497,32,0 566 | 0,91,80,0,0,32.4,0.601,27,0 567 | 2,95,54,14,88,26.1,0.748,22,0 568 | 1,99,72,30,18,38.6,0.412,21,0 569 | 6,92,62,32,126,32.0,0.085,46,0 570 | 4,154,72,29,126,31.3,0.338,37,0 571 | 0,121,66,30,165,34.3,0.203,33,1 572 | 3,78,70,0,0,32.5,0.270,39,0 573 | 2,130,96,0,0,22.6,0.268,21,0 574 | 3,111,58,31,44,29.5,0.430,22,0 575 | 2,98,60,17,120,34.7,0.198,22,0 576 | 1,143,86,30,330,30.1,0.892,23,0 577 | 1,119,44,47,63,35.5,0.280,25,0 578 | 6,108,44,20,130,24.0,0.813,35,0 579 | 2,118,80,0,0,42.9,0.693,21,1 580 | 10,133,68,0,0,27.0,0.245,36,0 581 | 2,197,70,99,0,34.7,0.575,62,1 582 | 0,151,90,46,0,42.1,0.371,21,1 583 | 6,109,60,27,0,25.0,0.206,27,0 584 | 12,121,78,17,0,26.5,0.259,62,0 585 | 8,100,76,0,0,38.7,0.190,42,0 586 | 8,124,76,24,600,28.7,0.687,52,1 587 | 1,93,56,11,0,22.5,0.417,22,0 588 | 8,143,66,0,0,34.9,0.129,41,1 589 | 
6,103,66,0,0,24.3,0.249,29,0 590 | 3,176,86,27,156,33.3,1.154,52,1 591 | 0,73,0,0,0,21.1,0.342,25,0 592 | 11,111,84,40,0,46.8,0.925,45,1 593 | 2,112,78,50,140,39.4,0.175,24,0 594 | 3,132,80,0,0,34.4,0.402,44,1 595 | 2,82,52,22,115,28.5,1.699,25,0 596 | 6,123,72,45,230,33.6,0.733,34,0 597 | 0,188,82,14,185,32.0,0.682,22,1 598 | 0,67,76,0,0,45.3,0.194,46,0 599 | 1,89,24,19,25,27.8,0.559,21,0 600 | 1,173,74,0,0,36.8,0.088,38,1 601 | 1,109,38,18,120,23.1,0.407,26,0 602 | 1,108,88,19,0,27.1,0.400,24,0 603 | 6,96,0,0,0,23.7,0.190,28,0 604 | 1,124,74,36,0,27.8,0.100,30,0 605 | 7,150,78,29,126,35.2,0.692,54,1 606 | 4,183,0,0,0,28.4,0.212,36,1 607 | 1,124,60,32,0,35.8,0.514,21,0 608 | 1,181,78,42,293,40.0,1.258,22,1 609 | 1,92,62,25,41,19.5,0.482,25,0 610 | 0,152,82,39,272,41.5,0.270,27,0 611 | 1,111,62,13,182,24.0,0.138,23,0 612 | 3,106,54,21,158,30.9,0.292,24,0 613 | 3,174,58,22,194,32.9,0.593,36,1 614 | 7,168,88,42,321,38.2,0.787,40,1 615 | 6,105,80,28,0,32.5,0.878,26,0 616 | 11,138,74,26,144,36.1,0.557,50,1 617 | 3,106,72,0,0,25.8,0.207,27,0 618 | 6,117,96,0,0,28.7,0.157,30,0 619 | 2,68,62,13,15,20.1,0.257,23,0 620 | 9,112,82,24,0,28.2,1.282,50,1 621 | 0,119,0,0,0,32.4,0.141,24,1 622 | 2,112,86,42,160,38.4,0.246,28,0 623 | 2,92,76,20,0,24.2,1.698,28,0 624 | 6,183,94,0,0,40.8,1.461,45,0 625 | 0,94,70,27,115,43.5,0.347,21,0 626 | 2,108,64,0,0,30.8,0.158,21,0 627 | 4,90,88,47,54,37.7,0.362,29,0 628 | 0,125,68,0,0,24.7,0.206,21,0 629 | 0,132,78,0,0,32.4,0.393,21,0 630 | 5,128,80,0,0,34.6,0.144,45,0 631 | 4,94,65,22,0,24.7,0.148,21,0 632 | 7,114,64,0,0,27.4,0.732,34,1 633 | 0,102,78,40,90,34.5,0.238,24,0 634 | 2,111,60,0,0,26.2,0.343,23,0 635 | 1,128,82,17,183,27.5,0.115,22,0 636 | 10,92,62,0,0,25.9,0.167,31,0 637 | 13,104,72,0,0,31.2,0.465,38,1 638 | 5,104,74,0,0,28.8,0.153,48,0 639 | 2,94,76,18,66,31.6,0.649,23,0 640 | 7,97,76,32,91,40.9,0.871,32,1 641 | 1,100,74,12,46,19.5,0.149,28,0 642 | 0,102,86,17,105,29.3,0.695,27,0 643 | 4,128,70,0,0,34.3,0.303,24,0 644 | 6,147,80,0,0,29.5,0.178,50,1 645 | 4,90,0,0,0,28.0,0.610,31,0 646 | 3,103,72,30,152,27.6,0.730,27,0 647 | 2,157,74,35,440,39.4,0.134,30,0 648 | 1,167,74,17,144,23.4,0.447,33,1 649 | 0,179,50,36,159,37.8,0.455,22,1 650 | 11,136,84,35,130,28.3,0.260,42,1 651 | 0,107,60,25,0,26.4,0.133,23,0 652 | 1,91,54,25,100,25.2,0.234,23,0 653 | 1,117,60,23,106,33.8,0.466,27,0 654 | 5,123,74,40,77,34.1,0.269,28,0 655 | 2,120,54,0,0,26.8,0.455,27,0 656 | 1,106,70,28,135,34.2,0.142,22,0 657 | 2,155,52,27,540,38.7,0.240,25,1 658 | 2,101,58,35,90,21.8,0.155,22,0 659 | 1,120,80,48,200,38.9,1.162,41,0 660 | 11,127,106,0,0,39.0,0.190,51,0 661 | 3,80,82,31,70,34.2,1.292,27,1 662 | 10,162,84,0,0,27.7,0.182,54,0 663 | 1,199,76,43,0,42.9,1.394,22,1 664 | 8,167,106,46,231,37.6,0.165,43,1 665 | 9,145,80,46,130,37.9,0.637,40,1 666 | 6,115,60,39,0,33.7,0.245,40,1 667 | 1,112,80,45,132,34.8,0.217,24,0 668 | 4,145,82,18,0,32.5,0.235,70,1 669 | 10,111,70,27,0,27.5,0.141,40,1 670 | 6,98,58,33,190,34.0,0.430,43,0 671 | 9,154,78,30,100,30.9,0.164,45,0 672 | 6,165,68,26,168,33.6,0.631,49,0 673 | 1,99,58,10,0,25.4,0.551,21,0 674 | 10,68,106,23,49,35.5,0.285,47,0 675 | 3,123,100,35,240,57.3,0.880,22,0 676 | 8,91,82,0,0,35.6,0.587,68,0 677 | 6,195,70,0,0,30.9,0.328,31,1 678 | 9,156,86,0,0,24.8,0.230,53,1 679 | 0,93,60,0,0,35.3,0.263,25,0 680 | 3,121,52,0,0,36.0,0.127,25,1 681 | 2,101,58,17,265,24.2,0.614,23,0 682 | 2,56,56,28,45,24.2,0.332,22,0 683 | 0,162,76,36,0,49.6,0.364,26,1 684 | 0,95,64,39,105,44.6,0.366,22,0 685 | 4,125,80,0,0,32.3,0.536,27,1 686 | 
5,136,82,0,0,0.0,0.640,69,0 687 | 2,129,74,26,205,33.2,0.591,25,0 688 | 3,130,64,0,0,23.1,0.314,22,0 689 | 1,107,50,19,0,28.3,0.181,29,0 690 | 1,140,74,26,180,24.1,0.828,23,0 691 | 1,144,82,46,180,46.1,0.335,46,1 692 | 8,107,80,0,0,24.6,0.856,34,0 693 | 13,158,114,0,0,42.3,0.257,44,1 694 | 2,121,70,32,95,39.1,0.886,23,0 695 | 7,129,68,49,125,38.5,0.439,43,1 696 | 2,90,60,0,0,23.5,0.191,25,0 697 | 7,142,90,24,480,30.4,0.128,43,1 698 | 3,169,74,19,125,29.9,0.268,31,1 699 | 0,99,0,0,0,25.0,0.253,22,0 700 | 4,127,88,11,155,34.5,0.598,28,0 701 | 4,118,70,0,0,44.5,0.904,26,0 702 | 2,122,76,27,200,35.9,0.483,26,0 703 | 6,125,78,31,0,27.6,0.565,49,1 704 | 1,168,88,29,0,35.0,0.905,52,1 705 | 2,129,0,0,0,38.5,0.304,41,0 706 | 4,110,76,20,100,28.4,0.118,27,0 707 | 6,80,80,36,0,39.8,0.177,28,0 708 | 10,115,0,0,0,0.0,0.261,30,1 709 | 2,127,46,21,335,34.4,0.176,22,0 710 | 9,164,78,0,0,32.8,0.148,45,1 711 | 2,93,64,32,160,38.0,0.674,23,1 712 | 3,158,64,13,387,31.2,0.295,24,0 713 | 5,126,78,27,22,29.6,0.439,40,0 714 | 10,129,62,36,0,41.2,0.441,38,1 715 | 0,134,58,20,291,26.4,0.352,21,0 716 | 3,102,74,0,0,29.5,0.121,32,0 717 | 7,187,50,33,392,33.9,0.826,34,1 718 | 3,173,78,39,185,33.8,0.970,31,1 719 | 10,94,72,18,0,23.1,0.595,56,0 720 | 1,108,60,46,178,35.5,0.415,24,0 721 | 5,97,76,27,0,35.6,0.378,52,1 722 | 4,83,86,19,0,29.3,0.317,34,0 723 | 1,114,66,36,200,38.1,0.289,21,0 724 | 1,149,68,29,127,29.3,0.349,42,1 725 | 5,117,86,30,105,39.1,0.251,42,0 726 | 1,111,94,0,0,32.8,0.265,45,0 727 | 4,112,78,40,0,39.4,0.236,38,0 728 | 1,116,78,29,180,36.1,0.496,25,0 729 | 0,141,84,26,0,32.4,0.433,22,0 730 | 2,175,88,0,0,22.9,0.326,22,0 731 | 2,92,52,0,0,30.1,0.141,22,0 732 | 3,130,78,23,79,28.4,0.323,34,1 733 | 8,120,86,0,0,28.4,0.259,22,1 734 | 2,174,88,37,120,44.5,0.646,24,1 735 | 2,106,56,27,165,29.0,0.426,22,0 736 | 2,105,75,0,0,23.3,0.560,53,0 737 | 4,95,60,32,0,35.4,0.284,28,0 738 | 0,126,86,27,120,27.4,0.515,21,0 739 | 8,65,72,23,0,32.0,0.600,42,0 740 | 2,99,60,17,160,36.6,0.453,21,0 741 | 1,102,74,0,0,39.5,0.293,42,1 742 | 11,120,80,37,150,42.3,0.785,48,1 743 | 3,102,44,20,94,30.8,0.400,26,0 744 | 1,109,58,18,116,28.5,0.219,22,0 745 | 9,140,94,0,0,32.7,0.734,45,1 746 | 13,153,88,37,140,40.6,1.174,39,0 747 | 12,100,84,33,105,30.0,0.488,46,0 748 | 1,147,94,41,0,49.3,0.358,27,1 749 | 1,81,74,41,57,46.3,1.096,32,0 750 | 3,187,70,22,200,36.4,0.408,36,1 751 | 6,162,62,0,0,24.3,0.178,50,1 752 | 4,136,70,0,0,31.2,1.182,22,1 753 | 1,121,78,39,74,39.0,0.261,28,0 754 | 3,108,62,24,0,26.0,0.223,25,0 755 | 0,181,88,44,510,43.3,0.222,26,1 756 | 8,154,78,32,0,32.4,0.443,45,1 757 | 1,128,88,39,110,36.5,1.057,37,1 758 | 7,137,90,41,0,32.0,0.391,39,0 759 | 0,123,72,0,0,36.3,0.258,52,1 760 | 1,106,76,0,0,37.5,0.197,26,0 761 | 6,190,92,0,0,35.5,0.278,66,1 762 | 2,88,58,26,16,28.4,0.766,22,0 763 | 9,170,74,31,0,44.0,0.403,43,1 764 | 9,89,62,0,0,22.5,0.142,33,0 765 | 10,101,76,48,180,32.9,0.171,63,0 766 | 2,122,70,27,0,36.8,0.340,27,0 767 | 5,121,72,23,112,26.2,0.245,30,0 768 | 1,126,60,0,0,30.1,0.349,47,1 769 | 1,93,70,31,0,30.4,0.315,23,0 770 | -------------------------------------------------------------------------------- /evaluating estimator performance using cross-validation.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "Learning the parameters of a prediction function and testing it on the same data is a methodological mistake: a model that would just repeat the labels of the samples 
that it has just seen would have a perfect score but would fail to predict anything useful on yet-unseen data. This situation is called overfitting. To avoid it, it is common practice when performing a (supervised) machine learning experiment to hold out part of the available data as a test set X_test, y_test. Note that the word “experiment” is not intended to denote academic use only, because even in commercial settings machine learning usually starts out experimentally.\n", 8 | "In scikit-learn a random split into training and test sets can be quickly computed with the train_test_split helper function. Let’s load the iris data set to fit a linear support vector machine on it:" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "metadata": { 15 | "collapsed": false 16 | }, 17 | "outputs": [ 18 | { 19 | "data": { 20 | "text/plain": [ 21 | "((150, 4), (150,))" 22 | ] 23 | }, 24 | "execution_count": 1, 25 | "metadata": {}, 26 | "output_type": "execute_result" 27 | } 28 | ], 29 | "source": [ 30 | "import numpy as np\n", 31 | "from sklearn import cross_validation\n", 32 | "from sklearn import datasets\n", 33 | "from sklearn import svm\n", 34 | "\n", 35 | "iris = datasets.load_iris()\n", 36 | "iris.data.shape, iris.target.shape\n", 37 | "# ((150, 4), (150,))" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "We can now quickly sample a training set while holding out 40% of the data for testing (evaluating) our classifier:" 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": 3, 50 | "metadata": { 51 | "collapsed": false 52 | }, 53 | "outputs": [ 54 | { 55 | "data": { 56 | "text/plain": [ 57 | "0.96666666666666667" 58 | ] 59 | }, 60 | "execution_count": 3, 61 | "metadata": {}, 62 | "output_type": "execute_result" 63 | } 64 | ], 65 | "source": [ 66 | "X_train, X_test, y_train, y_test = cross_validation.train_test_split(\n", 67 | "    iris.data, iris.target, test_size=0.4, random_state=0)\n", 68 | "\n", 69 | "X_train.shape, y_train.shape\n", 70 | "# ((90, 4), (90,))\n", 71 | "X_test.shape, y_test.shape\n", 72 | "# ((60, 4), (60,))\n", 73 | "\n", 74 | "clf = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)\n", 75 | "clf.score(X_test, y_test) " 76 | ] 77 | }, 78 | { 79 | "cell_type": "markdown", 80 | "metadata": {}, 81 | "source": [ 82 | "When evaluating different settings (“hyperparameters”) for estimators, such as the C setting that must be manually set for an SVM, there is still a risk of overfitting on the test set because the parameters can be tweaked until the estimator performs optimally. This way, knowledge about the test set can “leak” into the model and evaluation metrics no longer report on generalization performance. To solve this problem, yet another part of the dataset can be held out as a so-called “validation set”: training proceeds on the training set, after which evaluation is done on the validation set, and when the experiment seems to be successful, final evaluation can be done on the test set.\n", 83 | "However, by partitioning the available data into three sets, we drastically reduce the number of samples which can be used for learning the model, and the results can depend on a particular random choice for the pair of (train, validation) sets.\n", 84 | "A solution to this problem is a procedure called cross-validation (CV for short). A test set should still be held out for final evaluation, but the validation set is no longer needed when doing CV. 

In the basic approach, called k-fold CV, the training set is split into k smaller sets (other approaches are described below, but generally follow the same principles). The following procedure is followed for each of the k “folds”:\n", 85 | "- A model is trained using k-1 of the folds as training data;\n", 86 | "- the resulting model is validated on the remaining part of the data (i.e., it is used as a test set to compute a performance measure such as accuracy).\n", 87 | "The performance measure reported by k-fold cross-validation is then the average of the values computed in the loop. This approach can be computationally expensive, but does not waste too much data (as is the case when fixing an arbitrary test set), which is a major advantage in problems such as inverse inference where the number of samples is very small." 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "# Computing cross-validated metrics " 95 | ] 96 | }, 97 | { 98 | "cell_type": "markdown", 99 | "metadata": {}, 100 | "source": [ 101 | "The simplest way to use cross-validation is to call the cross_val_score helper function on the estimator and the dataset.\n", 102 | "The following example demonstrates how to estimate the accuracy of a linear kernel support vector machine on the iris dataset by splitting the data, fitting a model and computing the score 5 consecutive times (with different splits each time):" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": 10, 108 | "metadata": { 109 | "collapsed": false 110 | }, 111 | "outputs": [ 112 | { 113 | "data": { 114 | "text/plain": [ 115 | "array([ 0.96666667, 1. , 0.96666667, 0.96666667, 1. ])" 116 | ] 117 | }, 118 | "execution_count": 10, 119 | "metadata": {}, 120 | "output_type": "execute_result" 121 | } 122 | ], 123 | "source": [ 124 | "clf = svm.SVC(kernel='linear', C=1)\n", 125 | "scores = cross_validation.cross_val_score(\n", 126 | " clf, iris.data, iris.target, cv=5)\n", 127 | "...\n", 128 | "scores \n" 129 | ] 130 | }, 131 | { 132 | "cell_type": "markdown", 133 | "metadata": {}, 134 | "source": [ 135 | "The mean score and the 95% confidence interval of the score estimate are hence given by:" 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": 11, 141 | "metadata": { 142 | "collapsed": false 143 | }, 144 | "outputs": [ 145 | { 146 | "name": "stdout", 147 | "output_type": "stream", 148 | "text": [ 149 | "Accuracy: 0.98 (+/- 0.03)\n" 150 | ] 151 | } 152 | ], 153 | "source": [ 154 | "print(\"Accuracy: %0.2f (+/- %0.2f)\" % (scores.mean(), scores.std() * 2))" 155 | ] 156 | }, 157 | { 158 | "cell_type": "markdown", 159 | "metadata": {}, 160 | "source": [ 161 | "By default, the score computed at each CV iteration is the score method of the estimator. It is possible to change this by using the scoring parameter:" 162 | ] 163 | }, 164 | { 165 | "cell_type": "code", 166 | "execution_count": 13, 167 | "metadata": { 168 | "collapsed": false 169 | }, 170 | "outputs": [ 171 | { 172 | "data": { 173 | "text/plain": [ 174 | "array([ 0.96658312, 1. , 0.96658312, 0.96658312, 1. 

])" 175 | ] 176 | }, 177 | "execution_count": 13, 178 | "metadata": {}, 179 | "output_type": "execute_result" 180 | } 181 | ], 182 | "source": [ 183 | "from sklearn import metrics\n", 184 | "scores = cross_validation.cross_val_score(clf, iris.data, iris.target, cv=5, scoring='f1_weighted')\n", 185 | "scores" 186 | ] 187 | }, 188 | { 189 | "cell_type": "markdown", 190 | "metadata": {}, 191 | "source": [ 192 | "In the case of the Iris dataset, the samples are balanced across target classes hence the accuracy and the F1-score are almost equal.\n", 193 | "When the cv argument is an integer, cross_val_score uses the KFold or StratifiedKFold strategies by default, the latter being used if the estimator derives from ClassifierMixin.\n", 194 | "It is also possible to use other cross validation strategies by passing a cross validation iterator instead, for instance:" 195 | ] 196 | }, 197 | { 198 | "cell_type": "code", 199 | "execution_count": 14, 200 | "metadata": { 201 | "collapsed": false 202 | }, 203 | "outputs": [ 204 | { 205 | "data": { 206 | "text/plain": [ 207 | "array([ 0.97777778, 0.97777778, 1. ])" 208 | ] 209 | }, 210 | "execution_count": 14, 211 | "metadata": {}, 212 | "output_type": "execute_result" 213 | } 214 | ], 215 | "source": [ 216 | "n_samples = iris.data.shape[0]\n", 217 | "cv = cross_validation.ShuffleSplit(n_samples, n_iter=3, test_size=0.3, random_state=0)\n", 218 | "cross_validation.cross_val_score(clf, iris.data, iris.target, cv=cv)" 219 | ] 220 | }, 221 | { 222 | "cell_type": "markdown", 223 | "metadata": {}, 224 | "source": [ 225 | "# Data transformation with held out data" 226 | ] 227 | }, 228 | { 229 | "cell_type": "markdown", 230 | "metadata": {}, 231 | "source": [ 232 | "Just as it is important to test a predictor on data held-out from training, preprocessing (such as standardization, feature selection, etc.) 
and similar data transformations should similarly be learnt from a training set and applied to held-out data for prediction:" 233 | ] 234 | }, 235 | { 236 | "cell_type": "code", 237 | "execution_count": 15, 238 | "metadata": { 239 | "collapsed": false 240 | }, 241 | "outputs": [ 242 | { 243 | "data": { 244 | "text/plain": [ 245 | "0.93333333333333335" 246 | ] 247 | }, 248 | "execution_count": 15, 249 | "metadata": {}, 250 | "output_type": "execute_result" 251 | } 252 | ], 253 | "source": [ 254 | "from sklearn import preprocessing\n", 255 | "X_train, X_test, y_train, y_test = cross_validation.train_test_split(iris.data, iris.target, test_size=0.4, random_state=0)\n", 256 | "scaler = preprocessing.StandardScaler().fit(X_train)\n", 257 | "X_train_transformed = scaler.transform(X_train)\n", 258 | "clf = svm.SVC(C=1).fit(X_train_transformed, y_train)\n", 259 | "X_test_transformed = scaler.transform(X_test)\n", 260 | "clf.score(X_test_transformed, y_test) " 261 | ] 262 | }, 263 | { 264 | "cell_type": "markdown", 265 | "metadata": {}, 266 | "source": [ 267 | "A Pipeline makes it easier to compose estimators, providing this behavior under cross-validation:" 268 | ] 269 | }, 270 | { 271 | "cell_type": "code", 272 | "execution_count": 16, 273 | "metadata": { 274 | "collapsed": false 275 | }, 276 | "outputs": [ 277 | { 278 | "data": { 279 | "text/plain": [ 280 | "array([ 0.97777778, 0.93333333, 0.95555556])" 281 | ] 282 | }, 283 | "execution_count": 16, 284 | "metadata": {}, 285 | "output_type": "execute_result" 286 | } 287 | ], 288 | "source": [ 289 | "from sklearn.pipeline import make_pipeline\n", 290 | "clf = make_pipeline(preprocessing.StandardScaler(), svm.SVC(C=1))\n", 291 | "cross_validation.cross_val_score(clf, iris.data, iris.target, cv=cv)" 292 | ] 293 | }, 294 | { 295 | "cell_type": "markdown", 296 | "metadata": {}, 297 | "source": [ 298 | "# Obtaining predictions by cross-validation" 299 | ] 300 | }, 301 | { 302 | "cell_type": "markdown", 303 | "metadata": {}, 304 | "source": [ 305 | "The function cross_val_predict has a similar interface to cross_val_score, but returns, for each element in the input, the prediction that was obtained for that element when it was in the test set. Only cross-validation strategies that assign all elements to a test set exactly once can be used (otherwise, an exception is raised).\n", 306 | "These predictions can then be used to evaluate the classifier:" 307 | ] 308 | }, 309 | { 310 | "cell_type": "code", 311 | "execution_count": 17, 312 | "metadata": { 313 | "collapsed": false 314 | }, 315 | "outputs": [ 316 | { 317 | "data": { 318 | "text/plain": [ 319 | "0.96666666666666667" 320 | ] 321 | }, 322 | "execution_count": 17, 323 | "metadata": {}, 324 | "output_type": "execute_result" 325 | } 326 | ], 327 | "source": [ 328 | "predicted = cross_validation.cross_val_predict(clf, iris.data, iris.target, cv=10)\n", 329 | "metrics.accuracy_score(iris.target, predicted) " 330 | ] 331 | }, 332 | { 333 | "cell_type": "markdown", 334 | "metadata": {}, 335 | "source": [ 336 | "Note that the results of this computation may be slightly different from those obtained using cross_val_score, as the elements are grouped in different ways." 

337 | ] 338 | } 339 | ], 340 | "metadata": { 341 | "kernelspec": { 342 | "display_name": "Python 3", 343 | "language": "python", 344 | "name": "python3" 345 | }, 346 | "language_info": { 347 | "codemirror_mode": { 348 | "name": "ipython", 349 | "version": 3 350 | }, 351 | "file_extension": ".py", 352 | "mimetype": "text/x-python", 353 | "name": "python", 354 | "nbconvert_exporter": "python", 355 | "pygments_lexer": "ipython3", 356 | "version": "3.5.1" 357 | } 358 | }, 359 | "nbformat": 4, 360 | "nbformat_minor": 0 361 | } 362 | -------------------------------------------------------------------------------- /examples/Gradient.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Code for Gradient descent " 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Code for calculating a local minimum of $$f(x,y)= x^4 + y^4 - x^2 - y^2$$ with partial derivative wrt x: $$4x^3 - 2x$$ and partial derivative wrt y: $$4y^3 - 2y$$ " 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": null, 20 | "metadata": { 21 | "collapsed": true 22 | }, 23 | "outputs": [], 24 | "source": [ 25 | "max_iter = 1000\n", 26 | "x_o=0\n", 27 | "y_o=0\n", 28 | "alpha = 0.01 ## Step Size\n", 29 | "x_k=2 ## Starting position of x coordinate \n", 30 | "y_k=2 ## Starting position of y coordinate \n", 31 | "\n", 32 | "\n" 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": {}, 38 | "source": [ 39 | "Here we set the max iterations to 1000, the starting coordinate ($x_k,y_k$) = (2,2), and the step size to 0.01, denoted by alpha " 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": null, 45 | "metadata": { 46 | "collapsed": true 47 | }, 48 | "outputs": [], 49 | "source": [ 50 | "def devx(x): ## Defining partial derivative wrt x\n", 51 | " return 4*x**3 - 2*x\n", 52 | "def devy(y): ## Defining partial derivative wrt y\n", 53 | " return 4*y**3 - 2*y\n", 54 | "for i in range(max_iter):\n", 55 | " x_o = x_k\n", 56 | " y_o = y_k\n", 57 | " x_k = x_o - alpha * devx(x_o)\n", 58 | " y_k = y_o - alpha * devy(y_o)\n", 59 | "\n", 60 | "print \"Local Minimum at\",x_k,\",\",y_k" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "Here we define two functions, devx(x) and devy(y), as the partial derivative with respect to x: $$4x^3 - 2x$$ and the partial derivative with respect to y: $$4y^3 - 2y$$ respectively. In the following loop we calculate the local minimum using the gradient descent update equations: ![Alt Text](http://www.codeproject.com/KB/recipes/879043/GradientDescent.jpg \"Gradient Descent\"). 

" 68 | ] 69 | } 70 | ], 71 | "metadata": { 72 | "kernelspec": { 73 | "display_name": "Python 2", 74 | "language": "python", 75 | "name": "python2" 76 | }, 77 | "language_info": { 78 | "codemirror_mode": { 79 | "name": "ipython", 80 | "version": 2 81 | }, 82 | "file_extension": ".py", 83 | "mimetype": "text/x-python", 84 | "name": "python", 85 | "nbconvert_exporter": "python", 86 | "pygments_lexer": "ipython2", 87 | "version": "2.7.10" 88 | } 89 | }, 90 | "nbformat": 4, 91 | "nbformat_minor": 0 92 | } 93 | -------------------------------------------------------------------------------- /images/neuralnet.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/crowd-course/datascience/f5961c20c4052566b1b5a9fc0699c8cadb6147f5/images/neuralnet.png -------------------------------------------------------------------------------- /images/train_img.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/crowd-course/datascience/f5961c20c4052566b1b5a9fc0699c8cadb6147f5/images/train_img.png -------------------------------------------------------------------------------- /images/weights.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/crowd-course/datascience/f5961c20c4052566b1b5a9fc0699c8cadb6147f5/images/weights.png -------------------------------------------------------------------------------- /kaggle-data/README.md: -------------------------------------------------------------------------------- 1 | Source: [https://www.kaggle.com/benhamner/2016-us-election](https://www.kaggle.com/benhamner/2016-us-election) 2 | 3 | -------------------------------------------------------------------------------- /kaggle-data/county_facts_dictionary.csv: -------------------------------------------------------------------------------- 1 | column_name,description 2 | PST045214,"Population, 2014 estimate" 3 | PST040210,"Population, 2010 (April 1) estimates base" 4 | PST120214,"Population, percent change - April 1, 2010 to July 1, 2014" 5 | POP010210,"Population, 2010" 6 | AGE135214,"Persons under 5 years, percent, 2014" 7 | AGE295214,"Persons under 18 years, percent, 2014" 8 | AGE775214,"Persons 65 years and over, percent, 2014" 9 | SEX255214,"Female persons, percent, 2014" 10 | RHI125214,"White alone, percent, 2014" 11 | RHI225214,"Black or African American alone, percent, 2014" 12 | RHI325214,"American Indian and Alaska Native alone, percent, 2014" 13 | RHI425214,"Asian alone, percent, 2014" 14 | RHI525214,"Native Hawaiian and Other Pacific Islander alone, percent, 2014" 15 | RHI625214,"Two or More Races, percent, 2014" 16 | RHI725214,"Hispanic or Latino, percent, 2014" 17 | RHI825214,"White alone, not Hispanic or Latino, percent, 2014" 18 | POP715213,"Living in same house 1 year & over, percent, 2009-2013" 19 | POP645213,"Foreign born persons, percent, 2009-2013" 20 | POP815213,"Language other than English spoken at home, pct age 5+, 2009-2013" 21 | EDU635213,"High school graduate or higher, percent of persons age 25+, 2009-2013" 22 | EDU685213,"Bachelor's degree or higher, percent of persons age 25+, 2009-2013" 23 | VET605213,"Veterans, 2009-2013" 24 | LFE305213,"Mean travel time to work (minutes), workers age 16+, 2009-2013" 25 | HSG010214,"Housing units, 2014" 26 | HSG445213,"Homeownership rate, 2009-2013" 27 | HSG096213,"Housing units in multi-unit structures, percent, 2009-2013" 28 | HSG495213,"Median value of 
owner-occupied housing units, 2009-2013" 29 | HSD410213,"Households, 2009-2013" 30 | HSD310213,"Persons per household, 2009-2013" 31 | INC910213,"Per capita money income in past 12 months (2013 dollars), 2009-2013" 32 | INC110213,"Median household income, 2009-2013" 33 | PVY020213,"Persons below poverty level, percent, 2009-2013" 34 | BZA010213,"Private nonfarm establishments, 2013" 35 | BZA110213,"Private nonfarm employment, 2013" 36 | BZA115213,"Private nonfarm employment, percent change, 2012-2013" 37 | NES010213,"Nonemployer establishments, 2013" 38 | SBO001207,"Total number of firms, 2007" 39 | SBO315207,"Black-owned firms, percent, 2007" 40 | SBO115207,"American Indian- and Alaska Native-owned firms, percent, 2007" 41 | SBO215207,"Asian-owned firms, percent, 2007" 42 | SBO515207,"Native Hawaiian- and Other Pacific Islander-owned firms, percent, 2007" 43 | SBO415207,"Hispanic-owned firms, percent, 2007" 44 | SBO015207,"Women-owned firms, percent, 2007" 45 | MAN450207,"Manufacturers shipments, 2007 ($1,000)" 46 | WTN220207,"Merchant wholesaler sales, 2007 ($1,000)" 47 | RTN130207,"Retail sales, 2007 ($1,000)" 48 | RTN131207,"Retail sales per capita, 2007" 49 | AFN120207,"Accommodation and food services sales, 2007 ($1,000)" 50 | BPS030214,"Building permits, 2014" 51 | LND110210,"Land area in square miles, 2010" 52 | POP060210,"Population per square mile, 2010" 53 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | numpy==1.11.0 2 | pandas==0.18.1 3 | python-dateutil==2.5.3 4 | pytz==2016.4 5 | scikit-learn==0.17.1 6 | scipy==0.17.1 7 | six==1.10.0 8 | scikit-neuralnetwork==0.7 9 | --------------------------------------------------------------------------------
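A note on versions, since requirements.txt pins scikit-learn==0.17.1: the sklearn.cross_validation module used in evaluating estimator performance using cross-validation.ipynb was deprecated in scikit-learn 0.18 and removed in 0.20. The following is a minimal sketch, assuming a newer scikit-learn (0.18+) is installed instead of the pinned version, of how the notebook's train/test split, k-fold scoring, and CV-iterator calls map onto sklearn.model_selection:

from sklearn import datasets, svm
from sklearn.model_selection import KFold, ShuffleSplit, cross_val_score, train_test_split

iris = datasets.load_iris()

# Same 60/40 train/test split as in the notebook, via model_selection
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0)
clf = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)
print(clf.score(X_test, y_test))

# 5-fold cross-validated accuracy (stratified folds are used automatically for classifiers)
scores = cross_val_score(clf, iris.data, iris.target, cv=5)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

# Explicit CV iterators no longer take n_samples; ShuffleSplit uses n_splits instead of n_iter
cv = ShuffleSplit(n_splits=3, test_size=0.3, random_state=0)
print(cross_val_score(clf, iris.data, iris.target, cv=cv))
print(cross_val_score(clf, iris.data, iris.target, cv=KFold(n_splits=5)))

The rest of the notebook (the scoring parameter, make_pipeline, cross_val_predict) carries over unchanged apart from the import path.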