├── ArbreDeDecision_4DS_etudiant.ipynb
├── ML0101EN-Clas-Decision-Trees-drug-py-.ipynb
├── ML0101EN-Clas-Decision-Trees-drug-py-v1 (1).ipynb
├── ML0101EN-Clas-Decision-Trees.ipynb
└── drug200.csv
/ML0101EN-Clas-Decision-Trees-drug-py-v1 (1).ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "button": false,
7 | "new_sheet": false,
8 | "run_control": {
9 | "read_only": false
10 | }
11 | },
12 | "source": [
15 | "# Decision Trees"
16 | ]
17 | },
18 | {
19 | "cell_type": "markdown",
20 | "metadata": {
21 | "button": false,
22 | "new_sheet": false,
23 | "run_control": {
24 | "read_only": false
25 | }
26 | },
27 | "source": [
28 | "In this lab exercise, you will learn a popular machine learning algorithm, the decision tree. You will use this classification algorithm to build a model from historical data on patients and their responses to different medications. You will then use the trained decision tree to predict the class of an unknown patient, that is, to find a suitable drug for a new patient."
29 | ]
30 | },
31 | {
32 | "cell_type": "markdown",
33 | "metadata": {
34 | "button": false,
35 | "new_sheet": false,
36 | "run_control": {
37 | "read_only": false
38 | }
39 | },
40 | "source": [
41 | "Import the Following Libraries:\n",
42 | "\n",
43 | " - numpy (as np)\n",
44 | " - pandas\n",
45 | " - DecisionTreeClassifier from sklearn.tree\n",
46 | ""
47 | ]
48 | },
49 | {
50 | "cell_type": "code",
51 | "execution_count": 1,
52 | "metadata": {
53 | "button": false,
54 | "new_sheet": false,
55 | "run_control": {
56 | "read_only": false
57 | }
58 | },
59 | "outputs": [],
60 | "source": [
61 | "import numpy as np \n",
62 | "import pandas as pd\n",
63 | "from sklearn.tree import DecisionTreeClassifier"
64 | ]
65 | },
66 | {
67 | "cell_type": "markdown",
68 | "metadata": {
69 | "button": false,
70 | "new_sheet": false,
71 | "run_control": {
72 | "read_only": false
73 | }
74 | },
75 | "source": [
76 | "### About dataset\n",
77 | "Imagine that you are a medical researcher compiling data for a study. You have collected data about a set of patients, all of whom suffered from the same illness. During their course of treatment, each patient responded to one of five medications: Drug A, Drug B, Drug C, Drug X, and Drug Y. \n",
78 | "\n",
79 | "Part of your job is to build a model to find out which drug might be appropriate for a future patient with the same illness. The features of this dataset are the Age, Sex, Blood Pressure, Cholesterol, and Na_to_K values of the patients, and the target is the drug that each patient responded to. \n",
80 | "\n",
81 | "This is an example of a multiclass classifier: you can use the training part of the dataset \n",
82 | "to build a decision tree, and then use it to predict the class of an unknown patient, or to prescribe a drug to a new patient.\n"
83 | ]
84 | },
85 | {
86 | "cell_type": "markdown",
87 | "metadata": {
88 | "button": false,
89 | "new_sheet": false,
90 | "run_control": {
91 | "read_only": false
92 | }
93 | },
94 | "source": [
95 | "### Downloading Data\n",
96 | "To download the data, we will use !wget to download it from IBM Object Storage."
97 | ]
98 | },
99 | {
100 | "cell_type": "code",
101 | "execution_count": 2,
102 | "metadata": {},
103 | "outputs": [
104 | {
105 | "name": "stderr",
106 | "output_type": "stream",
107 | "text": [
108 | "'wget' is not recognized as an internal or external command,\n",
109 | "operable program or batch file.\n"
110 | ]
111 | }
112 | ],
113 | "source": [
114 | "!wget -O drug200.csv https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/drug200.csv"
115 | ]
116 | },
117 | {
118 | "cell_type": "markdown",
119 | "metadata": {},
120 | "source": [
121 | "__Did you know?__ When it comes to Machine Learning, you will likely be working with large datasets. As a business, where can you host your data? IBM is offering a unique opportunity for businesses, with 10 Tb of IBM Cloud Object Storage: [Sign up now for free](http://cocl.us/ML0101EN-IBM-Offer-CC)"
122 | ]
123 | },
124 | {
125 | "cell_type": "markdown",
126 | "metadata": {},
127 | "source": [
128 | "Now, read the data into a pandas DataFrame:"
129 | ]
130 | },
131 | {
132 | "cell_type": "code",
133 | "execution_count": 13,
134 | "metadata": {
135 | "button": false,
136 | "new_sheet": false,
137 | "run_control": {
138 | "read_only": false
139 | }
140 | },
141 | "outputs": [
142 | {
143 | "data": {
248 | "text/plain": [
249 | "   Age Sex      BP Cholesterol  Na_to_K   Drug\n",
250 | "0   23   F    HIGH        HIGH   25.355  drugY\n",
251 | "1   47   M     LOW        HIGH   13.093  drugC\n",
252 | "2   47   M     LOW        HIGH   10.114  drugC\n",
253 | "3   28   F  NORMAL        HIGH    7.798  drugX\n",
254 | "4   61   F     LOW        HIGH   18.043  drugY\n",
255 | "5   22   F  NORMAL        HIGH    8.607  drugX\n",
256 | "6   49   F  NORMAL        HIGH   16.275  drugY\n",
257 | "7   41   M     LOW        HIGH   11.037  drugC"
258 | ]
259 | },
260 | "execution_count": 13,
261 | "metadata": {},
262 | "output_type": "execute_result"
263 | }
264 | ],
265 | "source": [
266 | "my_data = pd.read_csv(\"drug200.csv\", delimiter=\",\")\n",
267 | "my_data[0:8]"
268 | ]
269 | },
270 | {
271 | "cell_type": "markdown",
272 | "metadata": {
273 | "button": false,
274 | "new_sheet": false,
275 | "run_control": {
276 | "read_only": false
277 | }
278 | },
279 | "source": [
280 | "## Practice \n",
281 | "What is the size of data? "
282 | ]
283 | },
284 | {
285 | "cell_type": "code",
286 | "execution_count": 4,
287 | "metadata": {
288 | "button": false,
289 | "new_sheet": false,
290 | "run_control": {
291 | "read_only": false
292 | }
293 | },
294 | "outputs": [
295 | {
296 | "data": {
297 | "text/plain": [
298 | "1200"
299 | ]
300 | },
301 | "execution_count": 4,
302 | "metadata": {},
303 | "output_type": "execute_result"
304 | }
305 | ],
306 | "source": [
307 | "# write your code here\n",
308 | "my_data.size\n",
309 | "\n"
310 | ]
311 | },
312 | {
313 | "cell_type": "markdown",
314 | "metadata": {},
315 | "source": [
316 | "## Pre-processing"
317 | ]
318 | },
319 | {
320 | "cell_type": "markdown",
321 | "metadata": {
322 | "button": false,
323 | "new_sheet": false,
324 | "run_control": {
325 | "read_only": false
326 | }
327 | },
328 | "source": [
329 | "Using my_data as the drug200.csv data read by pandas, declare the following variables:\n",
330 | "\n",
331 | " - X as the feature matrix (the feature columns of my_data)\n",
334 | " - y as the response vector (target)"
339 | ]
340 | },
341 | {
342 | "cell_type": "markdown",
343 | "metadata": {
344 | "button": false,
345 | "new_sheet": false,
346 | "run_control": {
347 | "read_only": false
348 | }
349 | },
350 | "source": [
351 | "Remove the column containing the target name since it doesn't contain numeric values."
352 | ]
353 | },
354 | {
355 | "cell_type": "code",
356 | "execution_count": 5,
357 | "metadata": {},
358 | "outputs": [
359 | {
360 | "data": {
361 | "text/plain": [
362 | "array([[23, 'F', 'HIGH', 'HIGH', 25.355],\n",
363 | " [47, 'M', 'LOW', 'HIGH', 13.093],\n",
364 | " [47, 'M', 'LOW', 'HIGH', 10.114],\n",
365 | " [28, 'F', 'NORMAL', 'HIGH', 7.798],\n",
366 | " [61, 'F', 'LOW', 'HIGH', 18.043]], dtype=object)"
367 | ]
368 | },
369 | "execution_count": 5,
370 | "metadata": {},
371 | "output_type": "execute_result"
372 | }
373 | ],
374 | "source": [
375 | "X = my_data[['Age', 'Sex', 'BP', 'Cholesterol', 'Na_to_K']].values\n",
376 | "X[0:5]"
377 | ]
378 | },
379 | {
380 | "cell_type": "markdown",
381 | "metadata": {},
382 | "source": [
383 | "As you may have noticed, some features in this dataset are categorical, such as __Sex__ or __BP__. Unfortunately, scikit-learn decision trees do not handle categorical variables directly, but we can still convert these features to numerical values. One option is __pandas.get_dummies()__, which converts a categorical variable into dummy/indicator variables.\n",
384 | "The cell below instead uses __LabelEncoder__, which maps each category to an integer code."
385 | ]
386 | },
387 | {
388 | "cell_type": "code",
389 | "execution_count": 6,
390 | "metadata": {},
391 | "outputs": [
392 | {
393 | "data": {
394 | "text/plain": [
395 | "array([[23, 0, 0, 0, 25.355],\n",
396 | " [47, 1, 1, 0, 13.093],\n",
397 | " [47, 1, 1, 0, 10.114],\n",
398 | " [28, 0, 2, 0, 7.798],\n",
399 | " [61, 0, 1, 0, 18.043]], dtype=object)"
400 | ]
401 | },
402 | "execution_count": 6,
403 | "metadata": {},
404 | "output_type": "execute_result"
405 | }
406 | ],
407 | "source": [
408 | "from sklearn import preprocessing\n",
409 | "le_sex = preprocessing.LabelEncoder()\n",
410 | "le_sex.fit(['F','M'])\n",
411 | "X[:,1] = le_sex.transform(X[:,1]) \n",
412 | "\n",
413 | "\n",
414 | "le_BP = preprocessing.LabelEncoder()\n",
415 | "le_BP.fit([ 'LOW', 'NORMAL', 'HIGH'])\n",
416 | "X[:,2] = le_BP.transform(X[:,2])\n",
417 | "\n",
418 | "\n",
419 | "le_Chol = preprocessing.LabelEncoder()\n",
420 | "le_Chol.fit([ 'NORMAL', 'HIGH'])\n",
421 | "X[:,3] = le_Chol.transform(X[:,3]) \n",
422 | "\n",
423 | "X[0:5]\n"
424 | ]
425 | },
426 | {
427 | "cell_type": "markdown",
428 | "metadata": {},
429 | "source": [
430 | "Now we can fill the target variable."
431 | ]
432 | },
433 | {
434 | "cell_type": "code",
435 | "execution_count": 7,
436 | "metadata": {
437 | "button": false,
438 | "new_sheet": false,
439 | "run_control": {
440 | "read_only": false
441 | }
442 | },
443 | "outputs": [
444 | {
445 | "data": {
446 | "text/plain": [
447 | "0 drugY\n",
448 | "1 drugC\n",
449 | "2 drugC\n",
450 | "3 drugX\n",
451 | "4 drugY\n",
452 | "Name: Drug, dtype: object"
453 | ]
454 | },
455 | "execution_count": 7,
456 | "metadata": {},
457 | "output_type": "execute_result"
458 | }
459 | ],
460 | "source": [
461 | "y = my_data[\"Drug\"]\n",
462 | "y[0:5]"
463 | ]
464 | },
465 | {
466 | "cell_type": "markdown",
467 | "metadata": {
468 | "button": false,
469 | "new_sheet": false,
470 | "run_control": {
471 | "read_only": false
472 | }
473 | },
474 | "source": [
475 | "---\n",
476 | "## Setting up the Decision Tree\n",
477 | "We will be using train/test split on our decision tree. Let's import train_test_split from sklearn.model_selection."
478 | ]
479 | },
480 | {
481 | "cell_type": "code",
482 | "execution_count": 8,
483 | "metadata": {
484 | "button": false,
485 | "new_sheet": false,
486 | "run_control": {
487 | "read_only": false
488 | }
489 | },
490 | "outputs": [],
491 | "source": [
492 | "from sklearn.model_selection import train_test_split"
493 | ]
494 | },
495 | {
496 | "cell_type": "markdown",
497 | "metadata": {
498 | "button": false,
499 | "new_sheet": false,
500 | "run_control": {
501 | "read_only": false
502 | }
503 | },
504 | "source": [
505 | "Now train_test_split will return four arrays. We will name them:  \n",
506 | "X_trainset, X_testset, y_trainset, y_testset.  \n",
507 | "The call to train_test_split needs the arguments:  \n",
508 | "X, y, test_size=0.3, and random_state=3.  \n",
509 | "X and y are the arrays to split, test_size is the fraction of the data held out for testing, and random_state fixes the random seed so that we obtain the same split on every run."
510 | ]
511 | },
512 | {
513 | "cell_type": "code",
514 | "execution_count": 9,
515 | "metadata": {
516 | "button": false,
517 | "new_sheet": false,
518 | "run_control": {
519 | "read_only": false
520 | }
521 | },
522 | "outputs": [],
523 | "source": [
524 | "X_trainset, X_testset, y_trainset, y_testset = train_test_split(X, y, test_size=0.3, random_state=3)"
525 | ]
526 | },
527 | {
528 | "cell_type": "markdown",
529 | "metadata": {
530 | "button": false,
531 | "new_sheet": false,
532 | "run_control": {
533 | "read_only": false
534 | }
535 | },
536 | "source": [
537 | "## Practice\n",
538 | "Print the shape of X_trainset and y_trainset. Ensure that the dimensions match"
539 | ]
540 | },
541 | {
542 | "cell_type": "code",
543 | "execution_count": 14,
544 | "metadata": {
545 | "button": false,
546 | "new_sheet": false,
547 | "run_control": {
548 | "read_only": false
549 | }
550 | },
551 | "outputs": [
552 | {
553 | "data": {
554 | "text/plain": [
555 | "((140, 5), (140,))"
556 | ]
557 | },
558 | "execution_count": 14,
559 | "metadata": {},
560 | "output_type": "execute_result"
561 | }
562 | ],
563 | "source": [
564 | "# Display both shapes; the first dimension (number of rows) must match\n",
565 | "X_trainset.shape, y_trainset.shape"
570 | ]
571 | },
572 | {
573 | "cell_type": "markdown",
574 | "metadata": {
575 | "button": false,
576 | "new_sheet": false,
577 | "run_control": {
578 | "read_only": false
579 | }
580 | },
581 | "source": [
582 | "Print the shape of X_testset and y_testset. Ensure that the dimensions match"
583 | ]
584 | },
585 | {
586 | "cell_type": "code",
587 | "execution_count": 15,
588 | "metadata": {
589 | "button": false,
590 | "new_sheet": false,
591 | "run_control": {
592 | "read_only": false
593 | }
594 | },
595 | "outputs": [
596 | {
597 | "name": "stdout",
598 | "output_type": "stream",
599 | "text": [
600 | "(60, 5)\n",
601 | "(60,)\n"
801 | ]
802 | }
803 | ],
804 | "source": [
805 | "# Print both shapes; the first dimension (number of rows) must match\n",
806 | "print(X_testset.shape)\n",
807 | "print(y_testset.shape)"
809 | ]
810 | },
811 | {
812 | "cell_type": "markdown",
813 | "metadata": {
814 | "button": false,
815 | "new_sheet": false,
816 | "run_control": {
817 | "read_only": false
818 | }
819 | },
820 | "source": [
821 | "## Modeling\n",
822 | "We will first create an instance of the DecisionTreeClassifier called drugTree.\n",
823 | "Inside of the classifier, specify criterion=\"entropy\" so we can see the information gain of each node."
824 | ]
825 | },
826 | {
827 | "cell_type": "code",
828 | "execution_count": 16,
829 | "metadata": {
830 | "button": false,
831 | "new_sheet": false,
832 | "run_control": {
833 | "read_only": false
834 | }
835 | },
836 | "outputs": [
837 | {
838 | "data": {
839 | "text/plain": [
840 | "DecisionTreeClassifier(criterion='entropy', max_depth=4)"
841 | ]
842 | },
843 | "execution_count": 16,
844 | "metadata": {},
845 | "output_type": "execute_result"
846 | }
847 | ],
848 | "source": [
849 | "drugTree = DecisionTreeClassifier(criterion=\"entropy\", max_depth = 4)\n",
850 | "drugTree # displays the classifier with its non-default parameters"
851 | ]
852 | },
853 | {
854 | "cell_type": "markdown",
855 | "metadata": {
856 | "button": false,
857 | "new_sheet": false,
858 | "run_control": {
859 | "read_only": false
860 | }
861 | },
862 | "source": [
863 | "Next, we will fit the model to the training feature matrix X_trainset and the training response vector y_trainset."
864 | ]
865 | },
866 | {
867 | "cell_type": "code",
868 | "execution_count": 17,
869 | "metadata": {
870 | "button": false,
871 | "new_sheet": false,
872 | "run_control": {
873 | "read_only": false
874 | }
875 | },
876 | "outputs": [
877 | {
878 | "data": {
879 | "text/plain": [
880 | "DecisionTreeClassifier(criterion='entropy', max_depth=4)"
881 | ]
882 | },
883 | "execution_count": 17,
884 | "metadata": {},
885 | "output_type": "execute_result"
886 | }
887 | ],
888 | "source": [
889 | "drugTree.fit(X_trainset,y_trainset)"
890 | ]
891 | },
892 | {
893 | "cell_type": "markdown",
894 | "metadata": {
895 | "button": false,
896 | "new_sheet": false,
897 | "run_control": {
898 | "read_only": false
899 | }
900 | },
901 | "source": [
902 | "## Prediction\n",
903 | "Let's make some predictions on the testing dataset and store them in a variable called predTree."
904 | ]
905 | },
906 | {
907 | "cell_type": "code",
908 | "execution_count": 18,
909 | "metadata": {
910 | "button": false,
911 | "new_sheet": false,
912 | "run_control": {
913 | "read_only": false
914 | }
915 | },
916 | "outputs": [],
917 | "source": [
918 | "predTree = drugTree.predict(X_testset)"
919 | ]
920 | },
921 | {
922 | "cell_type": "markdown",
923 | "metadata": {
924 | "button": false,
925 | "new_sheet": false,
926 | "run_control": {
927 | "read_only": false
928 | }
929 | },
930 | "source": [
931 | "You can print out predTree and y_testset if you want to visually compare the prediction to the actual values."
932 | ]
933 | },
934 | {
935 | "cell_type": "code",
936 | "execution_count": 19,
937 | "metadata": {
938 | "button": false,
939 | "new_sheet": false,
940 | "run_control": {
941 | "read_only": false
942 | },
943 | "scrolled": true
944 | },
945 | "outputs": [
946 | {
947 | "name": "stdout",
948 | "output_type": "stream",
949 | "text": [
950 | "['drugY' 'drugX' 'drugX' 'drugX' 'drugX']\n",
951 | "40 drugY\n",
952 | "51 drugX\n",
953 | "139 drugX\n",
954 | "197 drugX\n",
955 | "170 drugX\n",
956 | "Name: Drug, dtype: object\n"
957 | ]
958 | }
959 | ],
960 | "source": [
961 | "print (predTree [0:5])\n",
962 | "print (y_testset [0:5])\n"
963 | ]
964 | },
965 | {
966 | "cell_type": "markdown",
967 | "metadata": {
968 | "button": false,
969 | "new_sheet": false,
970 | "run_control": {
971 | "read_only": false
972 | }
973 | },
974 | "source": [
975 | "## Evaluation\n",
976 | "Next, let's import __metrics__ from sklearn and check the accuracy of our model."
977 | ]
978 | },
979 | {
980 | "cell_type": "code",
981 | "execution_count": 20,
982 | "metadata": {
983 | "button": false,
984 | "new_sheet": false,
985 | "run_control": {
986 | "read_only": false
987 | }
988 | },
989 | "outputs": [
990 | {
991 | "name": "stdout",
992 | "output_type": "stream",
993 | "text": [
994 | "Decision Tree's Accuracy: 0.9833333333333333\n"
995 | ]
996 | }
997 | ],
998 | "source": [
999 | "from sklearn import metrics\n",
1000 | "import matplotlib.pyplot as plt\n",
1001 | "print(\"Decision Tree's Accuracy: \", metrics.accuracy_score(y_testset, predTree))"
1002 | ]
1003 | },
1004 | {
1005 | "cell_type": "markdown",
1006 | "metadata": {
1007 | "button": false,
1008 | "new_sheet": false,
1009 | "run_control": {
1010 | "read_only": false
1011 | }
1012 | },
1013 | "source": [
1014 | "__Accuracy classification score__ computes subset accuracy: the set of labels predicted for a sample must exactly match the corresponding set of labels in y_true. \n",
1015 | "\n",
1016 | "In multilabel classification, the function returns the subset accuracy. If the entire set of predicted labels for a sample strictly matches the true set of labels, then the subset accuracy is 1.0; otherwise it is 0.0.\n"
1017 | ]
1018 | },
1019 | {
1020 | "cell_type": "markdown",
1021 | "metadata": {
1022 | "button": false,
1023 | "new_sheet": false,
1024 | "run_control": {
1025 | "read_only": false
1026 | }
1027 | },
1028 | "source": [
1029 | "## Practice \n",
1030 | "Can you calculate the accuracy score without sklearn?"
1031 | ]
1032 | },
1033 | {
1034 | "cell_type": "code",
1035 | "execution_count": 26,
1036 | "metadata": {
1037 | "button": false,
1038 | "new_sheet": false,
1039 | "run_control": {
1040 | "read_only": false
1041 | }
1042 | },
1043 | "outputs": [
1044 | {
1045 | "name": "stdout",
1046 | "output_type": "stream",
1047 | "text": [
1048 | "Collecting package metadata (current_repodata.json): ...working... done\n",
1049 | "Solving environment: ...working... done\n",
1050 | "\n",
1051 | "## Package Plan ##\n",
1052 | "\n",
1053 | " environment location: C:\\ProgramData\\Anaconda3\n",
1054 | "\n",
1055 | " added / updated specs:\n",
1056 | " - pydotplus\n",
1057 | "\n",
1058 | "\n",
1059 | "The following NEW packages will be INSTALLED:\n",
1060 | "\n",
1061 | " graphviz pkgs/main/win-64::graphviz-2.38-hfd603c8_2\n",
1062 | " pydotplus conda-forge/noarch::pydotplus-2.0.2-py_2\n",
1063 | "\n",
1064 | "The following packages will be UPDATED:\n",
1065 | "\n",
1066 | " conda 4.10.3-py38haa244fe_0 --> 4.10.3-py38haa244fe_2\n",
1067 | "\n",
1068 | "\n",
1069 | "Preparing transaction: ...working... done\n",
1070 | "Verifying transaction: ...working... failed\n"
1071 | ]
1072 | },
1073 | {
1074 | "name": "stderr",
1075 | "output_type": "stream",
1076 | "text": [
1077 | "\n",
1078 | "EnvironmentNotWritableError: The current user does not have write permissions to the target environment.\n",
1079 | " environment location: C:\\ProgramData\\Anaconda3\n",
1080 | "\n",
1081 | "\n"
1082 | ]
1083 | },
1084 | {
1085 | "name": "stdout",
1086 | "output_type": "stream",
1087 | "text": [
1088 | "Collecting package metadata (current_repodata.json): ...working... done\n",
1089 | "Solving environment: ...working... done\n",
1090 | "\n",
1091 | "## Package Plan ##\n",
1092 | "\n",
1093 | " environment location: C:\\ProgramData\\Anaconda3"
1094 | ]
1095 | },
1096 | {
1097 | "name": "stderr",
1098 | "output_type": "stream",
1099 | "text": [
1100 | "\n",
1101 | "EnvironmentNotWritableError: The current user does not have write permissions to the target environment.\n",
1102 | " environment location: C:\\ProgramData\\Anaconda3\n",
1103 | "\n",
1104 | "\n"
1105 | ]
1106 | },
1107 | {
1108 | "name": "stdout",
1109 | "output_type": "stream",
1110 | "text": [
1111 | "\n",
1112 | "\n",
1113 | " added / updated specs:\n",
1114 | " - python-graphviz\n",
1115 | "\n",
1116 | "\n",
1117 | "The following NEW packages will be INSTALLED:\n",
1118 | "\n",
1119 | " graphviz pkgs/main/win-64::graphviz-2.38-hfd603c8_2\n",
1120 | " python-graphviz pkgs/main/noarch::python-graphviz-0.16-pyhd3eb1b0_1\n",
1121 | "\n",
1122 | "The following packages will be UPDATED:\n",
1123 | "\n",
1124 | " conda 4.10.3-py38haa244fe_0 --> 4.10.3-py38haa244fe_2\n",
1125 | "\n",
1126 | "\n",
1127 | "Preparing transaction: ...working... done\n",
1128 | "Verifying transaction: ...working... failed\n"
1129 | ]
1130 | }
1131 | ],
1132 | "source": [
1133 | "# your code here\n",
1134 | "# Notice: You might need to uncomment and install the pydotplus and graphviz libraries if you have not installed these before\n",
1135 | "!conda install -c conda-forge pydotplus -y\n",
1136 | "!conda install -c conda-forge python-graphviz -y"
1137 | ]
1138 | },
1139 | {
1140 | "cell_type": "markdown",
1141 | "metadata": {},
1142 | "source": [
1143 | "## Visualization\n",
1144 | "Let's visualize the tree."
1145 | ]
1146 | },
1147 | {
1148 | "cell_type": "code",
1149 | "execution_count": 31,
1150 | "metadata": {
1151 | "button": false,
1152 | "new_sheet": false,
1153 | "run_control": {
1154 | "read_only": false
1155 | }
1156 | },
1157 | "outputs": [],
1158 | "source": [
1159 | "from io import StringIO\n",
1160 | "import pydotplus\n",
1161 | "import matplotlib.image as mpimg\n",
1162 | "from sklearn import tree\n",
1163 | "%matplotlib inline "
1164 | ]
1165 | },
1166 | {
1167 | "cell_type": "code",
1168 | "execution_count": 36,
1169 | "metadata": {
1170 | "button": false,
1171 | "new_sheet": false,
1172 | "run_control": {
1173 | "read_only": false
1174 | },
1175 | "scrolled": true
1176 | },
1177 | "outputs": [
1178 | {
1179 | "ename": "SyntaxError",
1180 | "evalue": "not a PNG file ()",
1181 | "output_type": "error",
1182 | "traceback": [
1183 | "Traceback \u001b[1;36m(most recent call last)\u001b[0m:\n",
1184 | " File \u001b[0;32m\"C:\\ProgramData\\Anaconda3\\lib\\site-packages\\IPython\\core\\interactiveshell.py\"\u001b[0m, line \u001b[0;32m3437\u001b[0m, in \u001b[0;35mrun_code\u001b[0m\n exec(code_obj, self.user_global_ns, self.user_ns)\n",
1185 | " File \u001b[0;32m\"\"\u001b[0m, line \u001b[0;32m9\u001b[0m, in \u001b[0;35m\u001b[0m\n img = mpimg.imread(filename)\n",
1186 | " File \u001b[0;32m\"C:\\ProgramData\\Anaconda3\\lib\\site-packages\\matplotlib\\image.py\"\u001b[0m, line \u001b[0;32m1496\u001b[0m, in \u001b[0;35mimread\u001b[0m\n with img_open(fname) as image:\n",
1187 | " File \u001b[0;32m\"C:\\ProgramData\\Anaconda3\\lib\\site-packages\\PIL\\ImageFile.py\"\u001b[0m, line \u001b[0;32m121\u001b[0m, in \u001b[0;35m__init__\u001b[0m\n self._open()\n",
1188 | "\u001b[1;36m File \u001b[1;32m\"C:\\ProgramData\\Anaconda3\\lib\\site-packages\\PIL\\PngImagePlugin.py\"\u001b[1;36m, line \u001b[1;32m676\u001b[1;36m, in \u001b[1;35m_open\u001b[1;36m\u001b[0m\n\u001b[1;33m raise SyntaxError(\"not a PNG file\")\u001b[0m\n",
1189 | "\u001b[1;36m File \u001b[1;32m\"\"\u001b[1;36m, line \u001b[1;32munknown\u001b[0m\n\u001b[1;31mSyntaxError\u001b[0m\u001b[1;31m:\u001b[0m not a PNG file\n"
1190 | ]
1191 | }
1192 | ],
1193 | "source": [
1194 | "dot_data = StringIO()\n",
1195 | "filename = \"drugtree.png\"\n",
1196 | "featureNames = my_data.columns[0:5]\n",
1197 | "targetNames = my_data[\"Drug\"].unique().tolist()\n",
1198 | "out=tree.export_graphviz(drugTree,feature_names=featureNames, out_file=dot_data, class_names= np.unique(y_trainset), filled=True, special_characters=True,rotate=False) \n",
1199 | "graph = pydotplus.graph_from_dot_data(dot_data.getvalue())\n",
1200 | "#graph.write_png(filename)\n",
1201 | "graph.write_png(filename)\n",
1202 | "img = mpimg.imread(filename)\n",
1203 | "plt.figure(figsize=(100, 200))\n",
1204 | "plt.imshow(img,interpolation='nearest')"
1205 | ]
1206 | },
1207 | {
1208 | "cell_type": "markdown",
1209 | "metadata": {
1210 | "button": false,
1211 | "new_sheet": false,
1212 | "run_control": {
1213 | "read_only": false
1214 | }
1215 | },
1216 | "source": [
1217 | "## Want to learn more?\n",
1218 | "\n",
1219 | "IBM SPSS Modeler is a comprehensive analytics platform that has many machine learning algorithms. It has been designed to bring predictive intelligence to decisions made by individuals, by groups, by systems – by your enterprise as a whole. A free trial is available through this course, available here: [SPSS Modeler](http://cocl.us/ML0101EN-SPSSModeler).\n",
1220 | "\n",
1221 | "You can also use Watson Studio to run these notebooks faster and with larger datasets. Watson Studio is IBM's leading cloud solution for data scientists, built by data scientists. With Jupyter notebooks, RStudio, Apache Spark and popular libraries pre-packaged in the cloud, Watson Studio enables data scientists to collaborate on their projects without having to install anything. Join the fast-growing community of Watson Studio users today with a free account at [Watson Studio](https://cocl.us/ML0101EN_DSX).\n",
1222 | "\n",
1223 | "### Thanks for completing this lesson!\n",
1224 | "\n",
1225 | "Notebook created by: Saeed Aghabozorgi\n",
1226 | "\n",
1227 | "\n",
1228 | "Copyright © 2018 [Cognitive Class](https://cocl.us/DX0108EN_CC). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/)."
1229 | ]
1230 | }
1231 | ],
1232 | "metadata": {
1233 | "anaconda-cloud": {},
1234 | "kernelspec": {
1235 | "display_name": "Python 3",
1236 | "language": "python",
1237 | "name": "python3"
1238 | },
1239 | "language_info": {
1240 | "codemirror_mode": {
1241 | "name": "ipython",
1242 | "version": 3
1243 | },
1244 | "file_extension": ".py",
1245 | "mimetype": "text/x-python",
1246 | "name": "python",
1247 | "nbconvert_exporter": "python",
1248 | "pygments_lexer": "ipython3",
1249 | "version": "3.8.8"
1250 | },
1251 | "widgets": {
1252 | "state": {},
1253 | "version": "1.1.2"
1254 | }
1255 | },
1256 | "nbformat": 4,
1257 | "nbformat_minor": 2
1258 | }
1259 |
--------------------------------------------------------------------------------
/drug200.csv:
--------------------------------------------------------------------------------
1 | Age,Sex,BP,Cholesterol,Na_to_K,Drug
2 | 23,F,HIGH,HIGH,25.355,drugY
3 | 47,M,LOW,HIGH,13.093,drugC
4 | 47,M,LOW,HIGH,10.114,drugC
5 | 28,F,NORMAL,HIGH,7.798,drugX
6 | 61,F,LOW,HIGH,18.043,drugY
7 | 22,F,NORMAL,HIGH,8.607,drugX
8 | 49,F,NORMAL,HIGH,16.275,drugY
9 | 41,M,LOW,HIGH,11.037,drugC
10 | 60,M,NORMAL,HIGH,15.171,drugY
11 | 43,M,LOW,NORMAL,19.368,drugY
12 | 47,F,LOW,HIGH,11.767,drugC
13 | 34,F,HIGH,NORMAL,19.199,drugY
14 | 43,M,LOW,HIGH,15.376,drugY
15 | 74,F,LOW,HIGH,20.942,drugY
16 | 50,F,NORMAL,HIGH,12.703,drugX
17 | 16,F,HIGH,NORMAL,15.516,drugY
18 | 69,M,LOW,NORMAL,11.455,drugX
19 | 43,M,HIGH,HIGH,13.972,drugA
20 | 23,M,LOW,HIGH,7.298,drugC
21 | 32,F,HIGH,NORMAL,25.974,drugY
22 | 57,M,LOW,NORMAL,19.128,drugY
23 | 63,M,NORMAL,HIGH,25.917,drugY
24 | 47,M,LOW,NORMAL,30.568,drugY
25 | 48,F,LOW,HIGH,15.036,drugY
26 | 33,F,LOW,HIGH,33.486,drugY
27 | 28,F,HIGH,NORMAL,18.809,drugY
28 | 31,M,HIGH,HIGH,30.366,drugY
29 | 49,F,NORMAL,NORMAL,9.381,drugX
30 | 39,F,LOW,NORMAL,22.697,drugY
31 | 45,M,LOW,HIGH,17.951,drugY
32 | 18,F,NORMAL,NORMAL,8.75,drugX
33 | 74,M,HIGH,HIGH,9.567,drugB
34 | 49,M,LOW,NORMAL,11.014,drugX
35 | 65,F,HIGH,NORMAL,31.876,drugY
36 | 53,M,NORMAL,HIGH,14.133,drugX
37 | 46,M,NORMAL,NORMAL,7.285,drugX
38 | 32,M,HIGH,NORMAL,9.445,drugA
39 | 39,M,LOW,NORMAL,13.938,drugX
40 | 39,F,NORMAL,NORMAL,9.709,drugX
41 | 15,M,NORMAL,HIGH,9.084,drugX
42 | 73,F,NORMAL,HIGH,19.221,drugY
43 | 58,F,HIGH,NORMAL,14.239,drugB
44 | 50,M,NORMAL,NORMAL,15.79,drugY
45 | 23,M,NORMAL,HIGH,12.26,drugX
46 | 50,F,NORMAL,NORMAL,12.295,drugX
47 | 66,F,NORMAL,NORMAL,8.107,drugX
48 | 37,F,HIGH,HIGH,13.091,drugA
49 | 68,M,LOW,HIGH,10.291,drugC
50 | 23,M,NORMAL,HIGH,31.686,drugY
51 | 28,F,LOW,HIGH,19.796,drugY
52 | 58,F,HIGH,HIGH,19.416,drugY
53 | 67,M,NORMAL,NORMAL,10.898,drugX
54 | 62,M,LOW,NORMAL,27.183,drugY
55 | 24,F,HIGH,NORMAL,18.457,drugY
56 | 68,F,HIGH,NORMAL,10.189,drugB
57 | 26,F,LOW,HIGH,14.16,drugC
58 | 65,M,HIGH,NORMAL,11.34,drugB
59 | 40,M,HIGH,HIGH,27.826,drugY
60 | 60,M,NORMAL,NORMAL,10.091,drugX
61 | 34,M,HIGH,HIGH,18.703,drugY
62 | 38,F,LOW,NORMAL,29.875,drugY
63 | 24,M,HIGH,NORMAL,9.475,drugA
64 | 67,M,LOW,NORMAL,20.693,drugY
65 | 45,M,LOW,NORMAL,8.37,drugX
66 | 60,F,HIGH,HIGH,13.303,drugB
67 | 68,F,NORMAL,NORMAL,27.05,drugY
68 | 29,M,HIGH,HIGH,12.856,drugA
69 | 17,M,NORMAL,NORMAL,10.832,drugX
70 | 54,M,NORMAL,HIGH,24.658,drugY
71 | 18,F,HIGH,NORMAL,24.276,drugY
72 | 70,M,HIGH,HIGH,13.967,drugB
73 | 28,F,NORMAL,HIGH,19.675,drugY
74 | 24,F,NORMAL,HIGH,10.605,drugX
75 | 41,F,NORMAL,NORMAL,22.905,drugY
76 | 31,M,HIGH,NORMAL,17.069,drugY
77 | 26,M,LOW,NORMAL,20.909,drugY
78 | 36,F,HIGH,HIGH,11.198,drugA
79 | 26,F,HIGH,NORMAL,19.161,drugY
80 | 19,F,HIGH,HIGH,13.313,drugA
81 | 32,F,LOW,NORMAL,10.84,drugX
82 | 60,M,HIGH,HIGH,13.934,drugB
83 | 64,M,NORMAL,HIGH,7.761,drugX
84 | 32,F,LOW,HIGH,9.712,drugC
85 | 38,F,HIGH,NORMAL,11.326,drugA
86 | 47,F,LOW,HIGH,10.067,drugC
87 | 59,M,HIGH,HIGH,13.935,drugB
88 | 51,F,NORMAL,HIGH,13.597,drugX
89 | 69,M,LOW,HIGH,15.478,drugY
90 | 37,F,HIGH,NORMAL,23.091,drugY
91 | 50,F,NORMAL,NORMAL,17.211,drugY
92 | 62,M,NORMAL,HIGH,16.594,drugY
93 | 41,M,HIGH,NORMAL,15.156,drugY
94 | 29,F,HIGH,HIGH,29.45,drugY
95 | 42,F,LOW,NORMAL,29.271,drugY
96 | 56,M,LOW,HIGH,15.015,drugY
97 | 36,M,LOW,NORMAL,11.424,drugX
98 | 58,F,LOW,HIGH,38.247,drugY
99 | 56,F,HIGH,HIGH,25.395,drugY
100 | 20,M,HIGH,NORMAL,35.639,drugY
101 | 15,F,HIGH,NORMAL,16.725,drugY
102 | 31,M,HIGH,NORMAL,11.871,drugA
103 | 45,F,HIGH,HIGH,12.854,drugA
104 | 28,F,LOW,HIGH,13.127,drugC
105 | 56,M,NORMAL,HIGH,8.966,drugX
106 | 22,M,HIGH,NORMAL,28.294,drugY
107 | 37,M,LOW,NORMAL,8.968,drugX
108 | 22,M,NORMAL,HIGH,11.953,drugX
109 | 42,M,LOW,HIGH,20.013,drugY
110 | 72,M,HIGH,NORMAL,9.677,drugB
111 | 23,M,NORMAL,HIGH,16.85,drugY
112 | 50,M,HIGH,HIGH,7.49,drugA
113 | 47,F,NORMAL,NORMAL,6.683,drugX
114 | 35,M,LOW,NORMAL,9.17,drugX
115 | 65,F,LOW,NORMAL,13.769,drugX
116 | 20,F,NORMAL,NORMAL,9.281,drugX
117 | 51,M,HIGH,HIGH,18.295,drugY
118 | 67,M,NORMAL,NORMAL,9.514,drugX
119 | 40,F,NORMAL,HIGH,10.103,drugX
120 | 32,F,HIGH,NORMAL,10.292,drugA
121 | 61,F,HIGH,HIGH,25.475,drugY
122 | 28,M,NORMAL,HIGH,27.064,drugY
123 | 15,M,HIGH,NORMAL,17.206,drugY
124 | 34,M,NORMAL,HIGH,22.456,drugY
125 | 36,F,NORMAL,HIGH,16.753,drugY
126 | 53,F,HIGH,NORMAL,12.495,drugB
127 | 19,F,HIGH,NORMAL,25.969,drugY
128 | 66,M,HIGH,HIGH,16.347,drugY
129 | 35,M,NORMAL,NORMAL,7.845,drugX
130 | 47,M,LOW,NORMAL,33.542,drugY
131 | 32,F,NORMAL,HIGH,7.477,drugX
132 | 70,F,NORMAL,HIGH,20.489,drugY
133 | 52,M,LOW,NORMAL,32.922,drugY
134 | 49,M,LOW,NORMAL,13.598,drugX
135 | 24,M,NORMAL,HIGH,25.786,drugY
136 | 42,F,HIGH,HIGH,21.036,drugY
137 | 74,M,LOW,NORMAL,11.939,drugX
138 | 55,F,HIGH,HIGH,10.977,drugB
139 | 35,F,HIGH,HIGH,12.894,drugA
140 | 51,M,HIGH,NORMAL,11.343,drugB
141 | 69,F,NORMAL,HIGH,10.065,drugX
142 | 49,M,HIGH,NORMAL,6.269,drugA
143 | 64,F,LOW,NORMAL,25.741,drugY
144 | 60,M,HIGH,NORMAL,8.621,drugB
145 | 74,M,HIGH,NORMAL,15.436,drugY
146 | 39,M,HIGH,HIGH,9.664,drugA
147 | 61,M,NORMAL,HIGH,9.443,drugX
148 | 37,F,LOW,NORMAL,12.006,drugX
149 | 26,F,HIGH,NORMAL,12.307,drugA
150 | 61,F,LOW,NORMAL,7.34,drugX
151 | 22,M,LOW,HIGH,8.151,drugC
152 | 49,M,HIGH,NORMAL,8.7,drugA
153 | 68,M,HIGH,HIGH,11.009,drugB
154 | 55,M,NORMAL,NORMAL,7.261,drugX
155 | 72,F,LOW,NORMAL,14.642,drugX
156 | 37,M,LOW,NORMAL,16.724,drugY
157 | 49,M,LOW,HIGH,10.537,drugC
158 | 31,M,HIGH,NORMAL,11.227,drugA
159 | 53,M,LOW,HIGH,22.963,drugY
160 | 59,F,LOW,HIGH,10.444,drugC
161 | 34,F,LOW,NORMAL,12.923,drugX
162 | 30,F,NORMAL,HIGH,10.443,drugX
163 | 57,F,HIGH,NORMAL,9.945,drugB
164 | 43,M,NORMAL,NORMAL,12.859,drugX
165 | 21,F,HIGH,NORMAL,28.632,drugY
166 | 16,M,HIGH,NORMAL,19.007,drugY
167 | 38,M,LOW,HIGH,18.295,drugY
168 | 58,F,LOW,HIGH,26.645,drugY
169 | 57,F,NORMAL,HIGH,14.216,drugX
170 | 51,F,LOW,NORMAL,23.003,drugY
171 | 20,F,HIGH,HIGH,11.262,drugA
172 | 28,F,NORMAL,HIGH,12.879,drugX
173 | 45,M,LOW,NORMAL,10.017,drugX
174 | 39,F,NORMAL,NORMAL,17.225,drugY
175 | 41,F,LOW,NORMAL,18.739,drugY
176 | 42,M,HIGH,NORMAL,12.766,drugA
177 | 73,F,HIGH,HIGH,18.348,drugY
178 | 48,M,HIGH,NORMAL,10.446,drugA
179 | 25,M,NORMAL,HIGH,19.011,drugY
180 | 39,M,NORMAL,HIGH,15.969,drugY
181 | 67,F,NORMAL,HIGH,15.891,drugY
182 | 22,F,HIGH,NORMAL,22.818,drugY
183 | 59,F,NORMAL,HIGH,13.884,drugX
184 | 20,F,LOW,NORMAL,11.686,drugX
185 | 36,F,HIGH,NORMAL,15.49,drugY
186 | 18,F,HIGH,HIGH,37.188,drugY
187 | 57,F,NORMAL,NORMAL,25.893,drugY
188 | 70,M,HIGH,HIGH,9.849,drugB
189 | 47,M,HIGH,HIGH,10.403,drugA
190 | 65,M,HIGH,NORMAL,34.997,drugY
191 | 64,M,HIGH,NORMAL,20.932,drugY
192 | 58,M,HIGH,HIGH,18.991,drugY
193 | 23,M,HIGH,HIGH,8.011,drugA
194 | 72,M,LOW,HIGH,16.31,drugY
195 | 72,M,LOW,HIGH,6.769,drugC
196 | 46,F,HIGH,HIGH,34.686,drugY
197 | 56,F,LOW,HIGH,11.567,drugC
198 | 16,M,LOW,HIGH,12.006,drugC
199 | 52,M,NORMAL,HIGH,9.894,drugX
200 | 23,M,NORMAL,NORMAL,14.02,drugX
201 | 40,F,LOW,NORMAL,11.349,drugX
--------------------------------------------------------------------------------