├── ArbreDeDecision_4DS_etudiant.ipynb ├── ML0101EN-Clas-Decision-Trees-drug-py-.ipynb ├── ML0101EN-Clas-Decision-Trees-drug-py-v1 (1).ipynb ├── ML0101EN-Clas-Decision-Trees.ipynb └── drug200.csv /ML0101EN-Clas-Decision-Trees-drug-py-v1 (1).ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "button": false, 7 | "new_sheet": false, 8 | "run_control": { 9 | "read_only": false 10 | } 11 | }, 12 | "source": [ 13 | "\n", 14 | "\n", 15 | "#
Decision Trees
" 16 | ] 17 | }, 18 | { 19 | "cell_type": "markdown", 20 | "metadata": { 21 | "button": false, 22 | "new_sheet": false, 23 | "run_control": { 24 | "read_only": false 25 | } 26 | }, 27 | "source": [ 28 | "In this lab exercise, you will learn a popular machine learning algorithm, the Decision Tree. You will use this classification algorithm to build a model from historical data of patients and their responses to different medications. Then you will use the trained decision tree to predict the class of an unknown patient, or to find a proper drug for a new patient." 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": { 34 | "button": false, 35 | "new_sheet": false, 36 | "run_control": { 37 | "read_only": false 38 | } 39 | }, 40 | "source": [ 41 | "Import the Following Libraries:\n", 42 | "" 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": 1, 52 | "metadata": { 53 | "button": false, 54 | "new_sheet": false, 55 | "run_control": { 56 | "read_only": false 57 | } 58 | }, 59 | "outputs": [], 60 | "source": [ 61 | "import numpy as np \n", 62 | "import pandas as pd\n", 63 | "from sklearn.tree import DecisionTreeClassifier" 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": { 69 | "button": false, 70 | "new_sheet": false, 71 | "run_control": { 72 | "read_only": false 73 | } 74 | }, 75 | "source": [ 76 | "### About dataset\n", 77 | "Imagine that you are a medical researcher compiling data for a study. You have collected data about a set of patients, all of whom suffered from the same illness. During their course of treatment, each patient responded to one of 5 medications: Drug A, Drug B, Drug C, Drug X, and Drug Y. \n", 78 | "\n", 79 | "Part of your job is to build a model to find out which drug might be appropriate for a future patient with the same illness. The features of this dataset are the Age, Sex, Blood Pressure, Cholesterol, and Na_to_K of patients, and the target is the drug that each patient responded to. 
\n", 80 | "\n", 81 | "This is a sample of a multiclass classifier (the target has five drug classes), and you can use the training part of the dataset \n", 82 | "to build a decision tree, and then use it to predict the class of an unknown patient, or to prescribe a drug to a new patient.\n" 83 | ] 84 | }, 85 | { 86 | "cell_type": "markdown", 87 | "metadata": { 88 | "button": false, 89 | "new_sheet": false, 90 | "run_control": { 91 | "read_only": false 92 | } 93 | }, 94 | "source": [ 95 | "### Downloading Data\n", 96 | "To download the data, we will use !wget to fetch it from IBM Object Storage." 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": 2, 102 | "metadata": {}, 103 | "outputs": [ 104 | { 105 | "name": "stderr", 106 | "output_type": "stream", 107 | "text": [ 108 | "'wget' is not recognized as an internal or external command,\n", 109 | "operable program or batch file.\n" 110 | ] 111 | } 112 | ], 113 | "source": [ 114 | "!wget -O drug200.csv https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/drug200.csv" 115 | ] 116 | }, 117 | { 118 | "cell_type": "markdown", 119 | "metadata": {}, 120 | "source": [ 121 | "__Did you know?__ When it comes to Machine Learning, you will likely be working with large datasets. As a business, where can you host your data? IBM is offering a unique opportunity for businesses, with 10 Tb of IBM Cloud Object Storage: [Sign up now for free](http://cocl.us/ML0101EN-IBM-Offer-CC)" 122 | ] 123 | }, 124 | { 125 | "cell_type": "markdown", 126 | "metadata": {}, 127 | "source": [ 128 | "Now, read the data into a pandas DataFrame:" 129 | ] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "execution_count": 13, 134 | "metadata": { 135 | "button": false, 136 | "new_sheet": false, 137 | "run_control": { 138 | "read_only": false 139 | } 140 | }, 141 | "outputs": [ 142 | { 143 | "data": { 144 | "text/html": [ 145 | "
\n", 146 | "\n", 159 | "\n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | " \n", 199 | " \n", 200 | " \n", 201 | " \n", 202 | " \n", 203 | " \n", 204 | " \n", 205 | " \n", 206 | " \n", 207 | " \n", 208 | " \n", 209 | " \n", 210 | " \n", 211 | " \n", 212 | " \n", 213 | " \n", 214 | " \n", 215 | " \n", 216 | " \n", 217 | " \n", 218 | " \n", 219 | " \n", 220 | " \n", 221 | " \n", 222 | " \n", 223 | " \n", 224 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | "
   Age  Sex  BP      Cholesterol  Na_to_K  Drug
0  23   F    HIGH    HIGH         25.355   drugY
1  47   M    LOW     HIGH         13.093   drugC
2  47   M    LOW     HIGH         10.114   drugC
3  28   F    NORMAL  HIGH         7.798    drugX
4  61   F    LOW     HIGH         18.043   drugY
5  22   F    NORMAL  HIGH         8.607    drugX
6  49   F    NORMAL  HIGH         16.275   drugY
7  41   M    LOW     HIGH         11.037   drugC
\n", 246 | "
" 247 | ], 248 | "text/plain": [ 249 | " Age Sex BP Cholesterol Na_to_K Drug\n", 250 | "0 23 F HIGH HIGH 25.355 drugY\n", 251 | "1 47 M LOW HIGH 13.093 drugC\n", 252 | "2 47 M LOW HIGH 10.114 drugC\n", 253 | "3 28 F NORMAL HIGH 7.798 drugX\n", 254 | "4 61 F LOW HIGH 18.043 drugY\n", 255 | "5 22 F NORMAL HIGH 8.607 drugX\n", 256 | "6 49 F NORMAL HIGH 16.275 drugY\n", 257 | "7 41 M LOW HIGH 11.037 drugC" 258 | ] 259 | }, 260 | "execution_count": 13, 261 | "metadata": {}, 262 | "output_type": "execute_result" 263 | } 264 | ], 265 | "source": [ 266 | "my_data = pd.read_csv(\"drug200.csv\", delimiter=\",\")\n", 267 | "my_data[0:8]" 268 | ] 269 | }, 270 | { 271 | "cell_type": "markdown", 272 | "metadata": { 273 | "button": false, 274 | "new_sheet": false, 275 | "run_control": { 276 | "read_only": false 277 | } 278 | }, 279 | "source": [ 280 | "## Practice \n", 281 | "What is the size of data? " 282 | ] 283 | }, 284 | { 285 | "cell_type": "code", 286 | "execution_count": 4, 287 | "metadata": { 288 | "button": false, 289 | "new_sheet": false, 290 | "run_control": { 291 | "read_only": false 292 | } 293 | }, 294 | "outputs": [ 295 | { 296 | "data": { 297 | "text/plain": [ 298 | "1200" 299 | ] 300 | }, 301 | "execution_count": 4, 302 | "metadata": {}, 303 | "output_type": "execute_result" 304 | } 305 | ], 306 | "source": [ 307 | "# write your code here\n", 308 | "my_data.size\n", 309 | "\n" 310 | ] 311 | }, 312 | { 313 | "cell_type": "markdown", 314 | "metadata": {}, 315 | "source": [ 316 | "## Pre-processing" 317 | ] 318 | }, 319 | { 320 | "cell_type": "markdown", 321 | "metadata": { 322 | "button": false, 323 | "new_sheet": false, 324 | "run_control": { 325 | "read_only": false 326 | } 327 | }, 328 | "source": [ 329 | "Using my_data as the Drug.csv data read by pandas, declare the following variables:
\n", 330 | "" 339 | ] 340 | }, 341 | { 342 | "cell_type": "markdown", 343 | "metadata": { 344 | "button": false, 345 | "new_sheet": false, 346 | "run_control": { 347 | "read_only": false 348 | } 349 | }, 350 | "source": [ 351 | "Remove the column containing the target name since it doesn't contain numeric values." 352 | ] 353 | }, 354 | { 355 | "cell_type": "code", 356 | "execution_count": 5, 357 | "metadata": {}, 358 | "outputs": [ 359 | { 360 | "data": { 361 | "text/plain": [ 362 | "array([[23, 'F', 'HIGH', 'HIGH', 25.355],\n", 363 | " [47, 'M', 'LOW', 'HIGH', 13.093],\n", 364 | " [47, 'M', 'LOW', 'HIGH', 10.114],\n", 365 | " [28, 'F', 'NORMAL', 'HIGH', 7.798],\n", 366 | " [61, 'F', 'LOW', 'HIGH', 18.043]], dtype=object)" 367 | ] 368 | }, 369 | "execution_count": 5, 370 | "metadata": {}, 371 | "output_type": "execute_result" 372 | } 373 | ], 374 | "source": [ 375 | "X = my_data[['Age', 'Sex', 'BP', 'Cholesterol', 'Na_to_K']].values\n", 376 | "X[0:5]" 377 | ] 378 | }, 379 | { 380 | "cell_type": "markdown", 381 | "metadata": {}, 382 | "source": [ 383 | "As you may have noticed, some features in this dataset are categorical, such as __Sex__ or __BP__. Unfortunately, sklearn decision trees do not handle categorical variables directly, but we can still convert these features to numerical values. One option is __pandas.get_dummies()__,\n", 384 | "which converts categorical variables into dummy/indicator variables. The cell below instead uses sklearn's LabelEncoder, which maps each category to an integer code." 
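As a rough illustration of the get_dummies() option named here, the sketch below encodes a tiny hand-made frame that reuses four rows and the column names of drug200.csv. The names `sample` and `encoded` are illustrative and not part of the notebook. Where LabelEncoder assigns one integer per category in alphabetical order (HIGH=0, LOW=1, NORMAL=2 for BP), get_dummies() creates one indicator column per category, which avoids implying an order between categories.

```python
import pandas as pd

# Small stand-in for my_data: four rows copied from drug200.csv (features only).
sample = pd.DataFrame({
    "Age": [23, 47, 28, 43],
    "Sex": ["F", "M", "F", "M"],
    "BP": ["HIGH", "LOW", "NORMAL", "LOW"],
    "Cholesterol": ["HIGH", "HIGH", "HIGH", "NORMAL"],
    "Na_to_K": [25.355, 13.093, 7.798, 19.368],
})

# One indicator column per category; numeric columns pass through unchanged.
encoded = pd.get_dummies(sample, columns=["Sex", "BP", "Cholesterol"])
print(encoded.columns.tolist())
```

The resulting frame could be passed to the classifier in place of the label-encoded array, at the cost of more columns.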
385 | ] 386 | }, 387 | { 388 | "cell_type": "code", 389 | "execution_count": 6, 390 | "metadata": {}, 391 | "outputs": [ 392 | { 393 | "data": { 394 | "text/plain": [ 395 | "array([[23, 0, 0, 0, 25.355],\n", 396 | " [47, 1, 1, 0, 13.093],\n", 397 | " [47, 1, 1, 0, 10.114],\n", 398 | " [28, 0, 2, 0, 7.798],\n", 399 | " [61, 0, 1, 0, 18.043]], dtype=object)" 400 | ] 401 | }, 402 | "execution_count": 6, 403 | "metadata": {}, 404 | "output_type": "execute_result" 405 | } 406 | ], 407 | "source": [ 408 | "from sklearn import preprocessing\n", 409 | "le_sex = preprocessing.LabelEncoder()\n", 410 | "le_sex.fit(['F','M'])\n", 411 | "X[:,1] = le_sex.transform(X[:,1]) \n", 412 | "\n", 413 | "\n", 414 | "le_BP = preprocessing.LabelEncoder()\n", 415 | "le_BP.fit([ 'LOW', 'NORMAL', 'HIGH'])\n", 416 | "X[:,2] = le_BP.transform(X[:,2])\n", 417 | "\n", 418 | "\n", 419 | "le_Chol = preprocessing.LabelEncoder()\n", 420 | "le_Chol.fit([ 'NORMAL', 'HIGH'])\n", 421 | "X[:,3] = le_Chol.transform(X[:,3]) \n", 422 | "\n", 423 | "X[0:5]\n" 424 | ] 425 | }, 426 | { 427 | "cell_type": "markdown", 428 | "metadata": {}, 429 | "source": [ 430 | "Now we can fill the target variable." 
431 | ] 432 | }, 433 | { 434 | "cell_type": "code", 435 | "execution_count": 7, 436 | "metadata": { 437 | "button": false, 438 | "new_sheet": false, 439 | "run_control": { 440 | "read_only": false 441 | } 442 | }, 443 | "outputs": [ 444 | { 445 | "data": { 446 | "text/plain": [ 447 | "0 drugY\n", 448 | "1 drugC\n", 449 | "2 drugC\n", 450 | "3 drugX\n", 451 | "4 drugY\n", 452 | "Name: Drug, dtype: object" 453 | ] 454 | }, 455 | "execution_count": 7, 456 | "metadata": {}, 457 | "output_type": "execute_result" 458 | } 459 | ], 460 | "source": [ 461 | "y = my_data[\"Drug\"]\n", 462 | "y[0:5]" 463 | ] 464 | }, 465 | { 466 | "cell_type": "markdown", 467 | "metadata": { 468 | "button": false, 469 | "new_sheet": false, 470 | "run_control": { 471 | "read_only": false 472 | } 473 | }, 474 | "source": [ 475 | "---\n", 476 | "## Setting up the Decision Tree\n", 477 | "We will be using a train/test split on our decision tree. Let's import train_test_split from sklearn.model_selection." 478 | ] 479 | }, 480 | { 481 | "cell_type": "code", 482 | "execution_count": 8, 483 | "metadata": { 484 | "button": false, 485 | "new_sheet": false, 486 | "run_control": { 487 | "read_only": false 488 | } 489 | }, 490 | "outputs": [], 491 | "source": [ 492 | "from sklearn.model_selection import train_test_split" 493 | ] 494 | }, 495 | { 496 | "cell_type": "markdown", 497 | "metadata": { 498 | "button": false, 499 | "new_sheet": false, 500 | "run_control": { 501 | "read_only": false 502 | } 503 | }, 504 | "source": [ 505 | "train_test_split returns 4 arrays. We will name them:
\n", 506 | "X_trainset, X_testset, y_trainset, y_testset

\n", 507 | "The train_test_split will need the parameters:
\n", 508 | "X, y, test_size=0.3, and random_state=3.

\n", 509 | "The X and y are the arrays required before the split, the test_size represents the ratio of the testing dataset, and the random_state ensures that we obtain the same splits." 510 | ] 511 | }, 512 | { 513 | "cell_type": "code", 514 | "execution_count": 9, 515 | "metadata": { 516 | "button": false, 517 | "new_sheet": false, 518 | "run_control": { 519 | "read_only": false 520 | } 521 | }, 522 | "outputs": [], 523 | "source": [ 524 | "X_trainset, X_testset, y_trainset, y_testset = train_test_split(X, y, test_size=0.3, random_state=3)" 525 | ] 526 | }, 527 | { 528 | "cell_type": "markdown", 529 | "metadata": { 530 | "button": false, 531 | "new_sheet": false, 532 | "run_control": { 533 | "read_only": false 534 | } 535 | }, 536 | "source": [ 537 | "## Practice\n", 538 | "Print the shape of X_trainset and y_trainset. Ensure that the dimensions match" 539 | ] 540 | }, 541 | { 542 | "cell_type": "code", 543 | "execution_count": 14, 544 | "metadata": { 545 | "button": false, 546 | "new_sheet": false, 547 | "run_control": { 548 | "read_only": false 549 | } 550 | }, 551 | "outputs": [ 552 | { 553 | "data": { 554 | "text/plain": [ 555 | "(140,)" 556 | ] 557 | }, 558 | "execution_count": 14, 559 | "metadata": {}, 560 | "output_type": "execute_result" 561 | } 562 | ], 563 | "source": [ 564 | "# your code\n", 565 | "\n", 566 | "# your code\n", 567 | "\n", 568 | "X_trainset.shape,\n", 569 | "y_trainset.shape" 570 | ] 571 | }, 572 | { 573 | "cell_type": "markdown", 574 | "metadata": { 575 | "button": false, 576 | "new_sheet": false, 577 | "run_control": { 578 | "read_only": false 579 | } 580 | }, 581 | "source": [ 582 | "Print the shape of X_testset and y_testset. 
Ensure that the dimensions match" 583 | ] 584 | }, 585 | { 586 | "cell_type": "code", 587 | "execution_count": 15, 588 | "metadata": { 589 | "button": false, 590 | "new_sheet": false, 591 | "run_control": { 592 | "read_only": false 593 | } 594 | }, 595 | "outputs": [ 596 | { 597 | "name": "stdout", 598 | "output_type": "stream", 599 | "text": [ 600 | "[[26 0 0 1 19.161]\n", 601 | " [41 0 2 1 22.905]\n", 602 | " [28 0 2 0 19.675]\n", 603 | " [19 0 0 0 13.313]\n", 604 | " [50 1 2 1 15.79]\n", 605 | " [24 1 2 0 25.786]\n", 606 | " [72 1 1 0 16.31]\n", 607 | " [74 0 1 0 20.942]\n", 608 | " [37 0 1 1 12.006]\n", 609 | " [31 1 0 1 17.069]\n", 610 | " [22 0 2 0 8.607]\n", 611 | " [20 0 2 1 9.281]\n", 612 | " [28 0 1 0 13.127]\n", 613 | " [59 0 2 0 13.884]\n", 614 | " [15 1 0 1 17.206]\n", 615 | " [51 0 1 1 23.003]\n", 616 | " [45 1 1 1 10.017]\n", 617 | " [33 0 1 0 33.486]\n", 618 | " [39 1 0 0 9.664]\n", 619 | " [29 0 0 0 29.45]\n", 620 | " [60 1 2 0 15.171]\n", 621 | " [24 0 0 1 18.457]\n", 622 | " [49 0 2 1 9.381]\n", 623 | " [37 1 1 1 8.968]\n", 624 | " [32 0 0 1 10.292]\n", 625 | " [21 0 0 1 28.632]\n", 626 | " [23 1 2 0 12.26]\n", 627 | " [40 1 0 0 27.826]\n", 628 | " [38 1 1 0 18.295]\n", 629 | " [47 1 1 1 30.568]\n", 630 | " [22 0 0 1 22.818]\n", 631 | " [47 1 0 0 10.403]\n", 632 | " [30 0 2 0 10.443]\n", 633 | " [69 1 1 0 15.478]\n", 634 | " [42 0 0 0 21.036]\n", 635 | " [45 1 1 1 8.37]\n", 636 | " [49 1 0 1 6.269]\n", 637 | " [72 1 1 0 6.769]\n", 638 | " [74 1 1 1 11.939]\n", 639 | " [66 0 2 1 8.107]\n", 640 | " [46 1 2 1 7.285]\n", 641 | " [68 0 2 1 27.05]\n", 642 | " [58 0 0 0 19.416]\n", 643 | " [19 0 0 1 25.969]\n", 644 | " [20 1 0 1 35.639]\n", 645 | " [69 1 1 1 11.455]\n", 646 | " [32 0 0 1 25.974]\n", 647 | " [72 1 0 1 9.677]\n", 648 | " [50 0 2 1 12.295]\n", 649 | " [54 1 2 0 24.658]\n", 650 | " [36 0 0 0 11.198]\n", 651 | " [64 0 1 1 25.741]\n", 652 | " [35 1 1 1 9.17]\n", 653 | " [47 0 1 0 11.767]\n", 654 | " [47 0 1 0 10.067]\n", 655 | " [34 0 0 1 
19.199]\n", 656 | " [26 0 1 0 14.16]\n", 657 | " [37 0 0 1 23.091]\n", 658 | " [48 1 0 1 10.446]\n", 659 | " [47 0 2 1 6.683]\n", 660 | " [55 0 0 0 10.977]\n", 661 | " [43 1 1 1 19.368]\n", 662 | " [35 0 0 0 12.894]\n", 663 | " [49 1 1 1 11.014]\n", 664 | " [45 1 1 0 17.951]\n", 665 | " [15 1 2 0 9.084]\n", 666 | " [57 0 2 1 25.893]\n", 667 | " [65 1 0 1 11.34]\n", 668 | " [70 1 0 0 9.849]\n", 669 | " [46 0 0 0 34.686]\n", 670 | " [41 1 0 1 15.156]\n", 671 | " [34 1 0 0 18.703]\n", 672 | " [42 1 0 1 12.766]\n", 673 | " [32 1 0 1 9.445]\n", 674 | " [25 1 2 0 19.011]\n", 675 | " [62 1 1 1 27.183]\n", 676 | " [23 1 0 0 8.011]\n", 677 | " [23 1 2 0 31.686]\n", 678 | " [58 0 1 0 38.247]\n", 679 | " [26 1 1 1 20.909]\n", 680 | " [68 1 0 0 11.009]\n", 681 | " [60 1 0 0 13.934]\n", 682 | " [15 0 0 1 16.725]\n", 683 | " [53 0 0 1 12.495]\n", 684 | " [37 1 1 1 16.724]\n", 685 | " [40 0 2 0 10.103]\n", 686 | " [59 1 0 0 13.935]\n", 687 | " [47 1 1 0 13.093]\n", 688 | " [65 0 1 1 13.769]\n", 689 | " [16 1 0 1 19.007]\n", 690 | " [67 1 2 1 9.514]\n", 691 | " [23 1 1 0 7.298]\n", 692 | " [56 0 1 0 11.567]\n", 693 | " [68 0 0 1 10.189]\n", 694 | " [65 1 0 1 34.997]\n", 695 | " [39 0 1 1 22.697]\n", 696 | " [35 1 2 1 7.845]\n", 697 | " [64 1 0 1 20.932]\n", 698 | " [28 0 1 0 19.796]\n", 699 | " [56 1 1 0 15.015]\n", 700 | " [57 1 1 1 19.128]\n", 701 | " [39 1 1 1 13.938]\n", 702 | " [32 0 1 1 10.84]\n", 703 | " [36 0 2 0 16.753]\n", 704 | " [65 0 0 1 31.876]\n", 705 | " [41 1 1 0 11.037]\n", 706 | " [67 1 1 1 20.693]\n", 707 | " [23 1 2 1 14.02]\n", 708 | " [40 0 1 1 11.349]\n", 709 | " [53 1 1 0 22.963]\n", 710 | " [56 0 0 0 25.395]\n", 711 | " [50 1 0 0 7.49]\n", 712 | " [22 1 0 1 28.294]\n", 713 | " [18 0 0 1 24.276]\n", 714 | " [62 1 2 0 16.594]\n", 715 | " [32 0 2 0 7.477]\n", 716 | " [38 0 1 1 29.875]\n", 717 | " [47 1 1 0 10.114]\n", 718 | " [29 1 0 0 12.856]\n", 719 | " [49 1 0 1 8.7]\n", 720 | " [64 1 2 0 7.761]\n", 721 | " [31 1 0 0 30.366]\n", 722 | " [60 1 0 1 
8.621]\n", 723 | " [57 0 2 0 14.216]\n", 724 | " [42 0 1 1 29.271]\n", 725 | " [39 0 2 1 17.225]\n", 726 | " [61 0 1 1 7.34]\n", 727 | " [58 0 1 0 26.645]\n", 728 | " [61 0 0 0 25.475]\n", 729 | " [22 1 1 0 8.151]\n", 730 | " [51 1 0 1 11.343]\n", 731 | " [20 0 0 0 11.262]\n", 732 | " [42 1 1 0 20.013]\n", 733 | " [26 0 0 1 12.307]\n", 734 | " [63 1 2 0 25.917]\n", 735 | " [23 0 0 0 25.355]\n", 736 | " [18 0 0 0 37.188]\n", 737 | " [52 1 1 1 32.922]\n", 738 | " [55 1 2 1 7.261]\n", 739 | " [22 1 2 0 11.953]]\n", 740 | "40 drugY\n", 741 | "51 drugX\n", 742 | "139 drugX\n", 743 | "197 drugX\n", 744 | "170 drugX\n", 745 | "82 drugC\n", 746 | "183 drugY\n", 747 | "46 drugA\n", 748 | "70 drugB\n", 749 | "100 drugA\n", 750 | "179 drugY\n", 751 | "83 drugA\n", 752 | "25 drugY\n", 753 | "190 drugY\n", 754 | "159 drugX\n", 755 | "173 drugY\n", 756 | "95 drugX\n", 757 | "3 drugX\n", 758 | "41 drugB\n", 759 | "58 drugX\n", 760 | "14 drugX\n", 761 | "143 drugY\n", 762 | "12 drugY\n", 763 | "6 drugY\n", 764 | "182 drugX\n", 765 | "161 drugB\n", 766 | "128 drugY\n", 767 | "122 drugY\n", 768 | "101 drugA\n", 769 | "86 drugX\n", 770 | "64 drugB\n", 771 | "47 drugC\n", 772 | "158 drugC\n", 773 | "34 drugX\n", 774 | "38 drugX\n", 775 | "196 drugC\n", 776 | "4 drugY\n", 777 | "72 drugX\n", 778 | "67 drugX\n", 779 | "145 drugX\n", 780 | "156 drugA\n", 781 | "115 drugY\n", 782 | "155 drugC\n", 783 | "15 drugY\n", 784 | "61 drugA\n", 785 | "175 drugY\n", 786 | "120 drugY\n", 787 | "130 drugY\n", 788 | "23 drugY\n", 789 | "153 drugX\n", 790 | "31 drugB\n", 791 | "103 drugX\n", 792 | "89 drugY\n", 793 | "132 drugX\n", 794 | "109 drugY\n", 795 | "126 drugY\n", 796 | "17 drugA\n", 797 | "30 drugX\n", 798 | "178 drugY\n", 799 | "162 drugX\n", 800 | "Name: Drug, dtype: object\n" 801 | ] 802 | } 803 | ], 804 | "source": [ 805 | "# your code\n", 806 | "\n", 807 | "print(X_trainset)\n", 808 | "print(y_testset)" 809 | ] 810 | }, 811 | { 812 | "cell_type": "markdown", 813 | "metadata": { 814 | 
"button": false, 815 | "new_sheet": false, 816 | "run_control": { 817 | "read_only": false 818 | } 819 | }, 820 | "source": [ 821 | "## Modeling\n", 822 | "We will first create an instance of the DecisionTreeClassifier called drugTree.
\n", 823 | "Inside of the classifier, specify criterion=\"entropy\" so we can see the information gain of each node." 824 | ] 825 | }, 826 | { 827 | "cell_type": "code", 828 | "execution_count": 16, 829 | "metadata": { 830 | "button": false, 831 | "new_sheet": false, 832 | "run_control": { 833 | "read_only": false 834 | } 835 | }, 836 | "outputs": [ 837 | { 838 | "data": { 839 | "text/plain": [ 840 | "DecisionTreeClassifier(criterion='entropy', max_depth=4)" 841 | ] 842 | }, 843 | "execution_count": 16, 844 | "metadata": {}, 845 | "output_type": "execute_result" 846 | } 847 | ], 848 | "source": [ 849 | "drugTree = DecisionTreeClassifier(criterion=\"entropy\", max_depth = 4)\n", 850 | "drugTree # it shows the default parameters" 851 | ] 852 | }, 853 | { 854 | "cell_type": "markdown", 855 | "metadata": { 856 | "button": false, 857 | "new_sheet": false, 858 | "run_control": { 859 | "read_only": false 860 | } 861 | }, 862 | "source": [ 863 | "Next, we will fit the data with the training feature matrix X_trainset and training response vector y_trainset " 864 | ] 865 | }, 866 | { 867 | "cell_type": "code", 868 | "execution_count": 17, 869 | "metadata": { 870 | "button": false, 871 | "new_sheet": false, 872 | "run_control": { 873 | "read_only": false 874 | } 875 | }, 876 | "outputs": [ 877 | { 878 | "data": { 879 | "text/plain": [ 880 | "DecisionTreeClassifier(criterion='entropy', max_depth=4)" 881 | ] 882 | }, 883 | "execution_count": 17, 884 | "metadata": {}, 885 | "output_type": "execute_result" 886 | } 887 | ], 888 | "source": [ 889 | "drugTree.fit(X_trainset,y_trainset)" 890 | ] 891 | }, 892 | { 893 | "cell_type": "markdown", 894 | "metadata": { 895 | "button": false, 896 | "new_sheet": false, 897 | "run_control": { 898 | "read_only": false 899 | } 900 | }, 901 | "source": [ 902 | "## Prediction\n", 903 | "Let's make some predictions on the testing dataset and store it into a variable called predTree." 
904 | ] 905 | }, 906 | { 907 | "cell_type": "code", 908 | "execution_count": 18, 909 | "metadata": { 910 | "button": false, 911 | "new_sheet": false, 912 | "run_control": { 913 | "read_only": false 914 | } 915 | }, 916 | "outputs": [], 917 | "source": [ 918 | "predTree = drugTree.predict(X_testset)" 919 | ] 920 | }, 921 | { 922 | "cell_type": "markdown", 923 | "metadata": { 924 | "button": false, 925 | "new_sheet": false, 926 | "run_control": { 927 | "read_only": false 928 | } 929 | }, 930 | "source": [ 931 | "You can print out predTree and y_testset if you want to visually compare the prediction to the actual values." 932 | ] 933 | }, 934 | { 935 | "cell_type": "code", 936 | "execution_count": 19, 937 | "metadata": { 938 | "button": false, 939 | "new_sheet": false, 940 | "run_control": { 941 | "read_only": false 942 | }, 943 | "scrolled": true 944 | }, 945 | "outputs": [ 946 | { 947 | "name": "stdout", 948 | "output_type": "stream", 949 | "text": [ 950 | "['drugY' 'drugX' 'drugX' 'drugX' 'drugX']\n", 951 | "40 drugY\n", 952 | "51 drugX\n", 953 | "139 drugX\n", 954 | "197 drugX\n", 955 | "170 drugX\n", 956 | "Name: Drug, dtype: object\n" 957 | ] 958 | } 959 | ], 960 | "source": [ 961 | "print (predTree [0:5])\n", 962 | "print (y_testset [0:5])\n" 963 | ] 964 | }, 965 | { 966 | "cell_type": "markdown", 967 | "metadata": { 968 | "button": false, 969 | "new_sheet": false, 970 | "run_control": { 971 | "read_only": false 972 | } 973 | }, 974 | "source": [ 975 | "## Evaluation\n", 976 | "Next, let's import __metrics__ from sklearn and check the accuracy of our model." 
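For single-label problems like this one, the accuracy score is simply the fraction of predictions that equal the true label. A minimal sketch of that computation without sklearn follows. The arrays `y_true` and `y_pred` are made-up stand-ins for illustration; in the notebook they would be y_testset and predTree.

```python
import numpy as np

# Hypothetical labels for illustration only, not taken from the notebook's run.
y_true = np.array(["drugY", "drugX", "drugX", "drugC", "drugA"])
y_pred = np.array(["drugY", "drugX", "drugC", "drugC", "drugA"])

# Element-wise comparison gives a boolean array; its mean is the hit rate.
manual_accuracy = float(np.mean(y_true == y_pred))
print(manual_accuracy)
```

With a pandas Series such as y_testset, comparing against the prediction array works the same way, since the comparison is positional.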
977 | ] 978 | }, 979 | { 980 | "cell_type": "code", 981 | "execution_count": 20, 982 | "metadata": { 983 | "button": false, 984 | "new_sheet": false, 985 | "run_control": { 986 | "read_only": false 987 | } 988 | }, 989 | "outputs": [ 990 | { 991 | "name": "stdout", 992 | "output_type": "stream", 993 | "text": [ 994 | "DecisionTrees's Accuracy: 0.9833333333333333\n" 995 | ] 996 | } 997 | ], 998 | "source": [ 999 | "from sklearn import metrics\n", 1000 | "import matplotlib.pyplot as plt\n", 1001 | "print(\"DecisionTrees's Accuracy: \", metrics.accuracy_score(y_testset, predTree))" 1002 | ] 1003 | }, 1004 | { 1005 | "cell_type": "markdown", 1006 | "metadata": { 1007 | "button": false, 1008 | "new_sheet": false, 1009 | "run_control": { 1010 | "read_only": false 1011 | } 1012 | }, 1013 | "source": [ 1014 | "__Accuracy classification score__ computes subset accuracy: the set of labels predicted for a sample must exactly match the corresponding set of labels in y_true. \n", 1015 | "\n", 1016 | "In multilabel classification, the function returns the subset accuracy. If the entire set of predicted labels for a sample strictly match with the true set of labels, then the subset accuracy is 1.0; otherwise it is 0.0.\n" 1017 | ] 1018 | }, 1019 | { 1020 | "cell_type": "markdown", 1021 | "metadata": { 1022 | "button": false, 1023 | "new_sheet": false, 1024 | "run_control": { 1025 | "read_only": false 1026 | } 1027 | }, 1028 | "source": [ 1029 | "## Practice \n", 1030 | "Can you calculate the accuracy score without sklearn ?" 1031 | ] 1032 | }, 1033 | { 1034 | "cell_type": "code", 1035 | "execution_count": 26, 1036 | "metadata": { 1037 | "button": false, 1038 | "new_sheet": false, 1039 | "run_control": { 1040 | "read_only": false 1041 | } 1042 | }, 1043 | "outputs": [ 1044 | { 1045 | "name": "stdout", 1046 | "output_type": "stream", 1047 | "text": [ 1048 | "Collecting package metadata (current_repodata.json): ...working... done\n", 1049 | "Solving environment: ...working... 
done\n", 1050 | "\n", 1051 | "## Package Plan ##\n", 1052 | "\n", 1053 | " environment location: C:\\ProgramData\\Anaconda3\n", 1054 | "\n", 1055 | " added / updated specs:\n", 1056 | " - pydotplus\n", 1057 | "\n", 1058 | "\n", 1059 | "The following NEW packages will be INSTALLED:\n", 1060 | "\n", 1061 | " graphviz pkgs/main/win-64::graphviz-2.38-hfd603c8_2\n", 1062 | " pydotplus conda-forge/noarch::pydotplus-2.0.2-py_2\n", 1063 | "\n", 1064 | "The following packages will be UPDATED:\n", 1065 | "\n", 1066 | " conda 4.10.3-py38haa244fe_0 --> 4.10.3-py38haa244fe_2\n", 1067 | "\n", 1068 | "\n", 1069 | "Preparing transaction: ...working... done\n", 1070 | "Verifying transaction: ...working... failed\n" 1071 | ] 1072 | }, 1073 | { 1074 | "name": "stderr", 1075 | "output_type": "stream", 1076 | "text": [ 1077 | "\n", 1078 | "EnvironmentNotWritableError: The current user does not have write permissions to the target environment.\n", 1079 | " environment location: C:\\ProgramData\\Anaconda3\n", 1080 | "\n", 1081 | "\n" 1082 | ] 1083 | }, 1084 | { 1085 | "name": "stdout", 1086 | "output_type": "stream", 1087 | "text": [ 1088 | "Collecting package metadata (current_repodata.json): ...working... done\n", 1089 | "Solving environment: ...working... 
done\n", 1090 | "\n", 1091 | "## Package Plan ##\n", 1092 | "\n", 1093 | " environment location: C:\\ProgramData\\Anaconda3" 1094 | ] 1095 | }, 1096 | { 1097 | "name": "stderr", 1098 | "output_type": "stream", 1099 | "text": [ 1100 | "\n", 1101 | "EnvironmentNotWritableError: The current user does not have write permissions to the target environment.\n", 1102 | " environment location: C:\\ProgramData\\Anaconda3\n", 1103 | "\n", 1104 | "\n" 1105 | ] 1106 | }, 1107 | { 1108 | "name": "stdout", 1109 | "output_type": "stream", 1110 | "text": [ 1111 | "\n", 1112 | "\n", 1113 | " added / updated specs:\n", 1114 | " - python-graphviz\n", 1115 | "\n", 1116 | "\n", 1117 | "The following NEW packages will be INSTALLED:\n", 1118 | "\n", 1119 | " graphviz pkgs/main/win-64::graphviz-2.38-hfd603c8_2\n", 1120 | " python-graphviz pkgs/main/noarch::python-graphviz-0.16-pyhd3eb1b0_1\n", 1121 | "\n", 1122 | "The following packages will be UPDATED:\n", 1123 | "\n", 1124 | " conda 4.10.3-py38haa244fe_0 --> 4.10.3-py38haa244fe_2\n", 1125 | "\n", 1126 | "\n", 1127 | "Preparing transaction: ...working... done\n", 1128 | "Verifying transaction: ...working... 
failed\n" 1129 | ] 1130 | } 1131 | ], 1132 | "source": [ 1133 | "# your code here\n", 1134 | "# Notice: You might need to uncomment and install the pydotplus and graphviz libraries if you have not installed these before\n", 1135 | "!conda install -c conda-forge pydotplus -y\n", 1136 | "!conda install -c conda-forge python-graphviz -y" 1137 | ] 1138 | }, 1139 | { 1140 | "cell_type": "markdown", 1141 | "metadata": {}, 1142 | "source": [ 1143 | "## Visualization\n", 1144 | "Lets visualize the tree" 1145 | ] 1146 | }, 1147 | { 1148 | "cell_type": "code", 1149 | "execution_count": 31, 1150 | "metadata": { 1151 | "button": false, 1152 | "new_sheet": false, 1153 | "run_control": { 1154 | "read_only": false 1155 | } 1156 | }, 1157 | "outputs": [], 1158 | "source": [ 1159 | "from six import StringIO\n", 1160 | "import pydotplus\n", 1161 | "import matplotlib.image as mpimg\n", 1162 | "from sklearn import tree\n", 1163 | "%matplotlib inline " 1164 | ] 1165 | }, 1166 | { 1167 | "cell_type": "code", 1168 | "execution_count": 36, 1169 | "metadata": { 1170 | "button": false, 1171 | "new_sheet": false, 1172 | "run_control": { 1173 | "read_only": false 1174 | }, 1175 | "scrolled": true 1176 | }, 1177 | "outputs": [ 1178 | { 1179 | "ename": "SyntaxError", 1180 | "evalue": "not a PNG file ()", 1181 | "output_type": "error", 1182 | "traceback": [ 1183 | "Traceback \u001b[1;36m(most recent call last)\u001b[0m:\n", 1184 | " File \u001b[0;32m\"C:\\ProgramData\\Anaconda3\\lib\\site-packages\\IPython\\core\\interactiveshell.py\"\u001b[0m, line \u001b[0;32m3437\u001b[0m, in \u001b[0;35mrun_code\u001b[0m\n exec(code_obj, self.user_global_ns, self.user_ns)\n", 1185 | " File \u001b[0;32m\"\"\u001b[0m, line \u001b[0;32m9\u001b[0m, in \u001b[0;35m\u001b[0m\n img = mpimg.imread(filename)\n", 1186 | " File \u001b[0;32m\"C:\\ProgramData\\Anaconda3\\lib\\site-packages\\matplotlib\\image.py\"\u001b[0m, line \u001b[0;32m1496\u001b[0m, in \u001b[0;35mimread\u001b[0m\n with img_open(fname) as image:\n", 
1187 | " File \u001b[0;32m\"C:\\ProgramData\\Anaconda3\\lib\\site-packages\\PIL\\ImageFile.py\"\u001b[0m, line \u001b[0;32m121\u001b[0m, in \u001b[0;35m__init__\u001b[0m\n self._open()\n", 1188 | "\u001b[1;36m File \u001b[1;32m\"C:\\ProgramData\\Anaconda3\\lib\\site-packages\\PIL\\PngImagePlugin.py\"\u001b[1;36m, line \u001b[1;32m676\u001b[1;36m, in \u001b[1;35m_open\u001b[1;36m\u001b[0m\n\u001b[1;33m raise SyntaxError(\"not a PNG file\")\u001b[0m\n", 1189 | "\u001b[1;36m File \u001b[1;32m\"\"\u001b[1;36m, line \u001b[1;32munknown\u001b[0m\n\u001b[1;31mSyntaxError\u001b[0m\u001b[1;31m:\u001b[0m not a PNG file\n" 1190 | ] 1191 | } 1192 | ], 1193 | "source": [ 1194 | "dot_data = StringIO()\n", 1195 | "filename = \"drugtree.png\"\n", 1196 | "featureNames = my_data.columns[0:5]\n", 1197 | "targetNames = my_data[\"Drug\"].unique().tolist()\n", 1198 | "out=tree.export_graphviz(drugTree,feature_names=featureNames, out_file=dot_data, class_names= np.unique(y_trainset), filled=True, special_characters=True,rotate=False) \n", 1199 | "graph = pydotplus.graph_from_dot_data(dot_data.getvalue())\n", 1200 | "#graph.write_png(filename)\n", 1201 | "graph.write_png(filename)\n", 1202 | "img = mpimg.imread(filename)\n", 1203 | "plt.figure(figsize=(100, 200))\n", 1204 | "plt.imshow(img,interpolation='nearest')" 1205 | ] 1206 | }, 1207 | { 1208 | "cell_type": "markdown", 1209 | "metadata": { 1210 | "button": false, 1211 | "new_sheet": false, 1212 | "run_control": { 1213 | "read_only": false 1214 | } 1215 | }, 1216 | "source": [ 1217 | "## Want to learn more?\n", 1218 | "\n", 1219 | "IBM SPSS Modeler is a comprehensive analytics platform that has many machine learning algorithms. It has been designed to bring predictive intelligence to decisions made by individuals, by groups, by systems – by your enterprise as a whole. 
A free trial is available through this course, available here: [SPSS Modeler](http://cocl.us/ML0101EN-SPSSModeler).\n", 1220 | "\n", 1221 | "Also, you can use Watson Studio to run these notebooks faster with bigger datasets. Watson Studio is IBM's leading cloud solution for data scientists, built by data scientists. With Jupyter notebooks, RStudio, Apache Spark and popular libraries pre-packaged in the cloud, Watson Studio enables data scientists to collaborate on their projects without having to install anything. Join the fast-growing community of Watson Studio users today with a free account at [Watson Studio](https://cocl.us/ML0101EN_DSX)\n", 1222 | "\n", 1223 | "### Thanks for completing this lesson!\n", 1224 | "\n", 1225 | "Notebook created by: Saeed Aghabozorgi\n", 1226 | "\n", 1227 | "
\n", 1228 | "Copyright © 2018 [Cognitive Class](https://cocl.us/DX0108EN_CC). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/)." 1229 | ] 1230 | } 1231 | ], 1232 | "metadata": { 1233 | "anaconda-cloud": {}, 1234 | "kernelspec": { 1235 | "display_name": "Python 3", 1236 | "language": "python", 1237 | "name": "python3" 1238 | }, 1239 | "language_info": { 1240 | "codemirror_mode": { 1241 | "name": "ipython", 1242 | "version": 3 1243 | }, 1244 | "file_extension": ".py", 1245 | "mimetype": "text/x-python", 1246 | "name": "python", 1247 | "nbconvert_exporter": "python", 1248 | "pygments_lexer": "ipython3", 1249 | "version": "3.8.8" 1250 | }, 1251 | "widgets": { 1252 | "state": {}, 1253 | "version": "1.1.2" 1254 | } 1255 | }, 1256 | "nbformat": 4, 1257 | "nbformat_minor": 2 1258 | } 1259 | -------------------------------------------------------------------------------- /drug200.csv: -------------------------------------------------------------------------------- 1 | Age,Sex,BP,Cholesterol,Na_to_K,Drug 2 | 23,F,HIGH,HIGH,25.355,drugY 3 | 47,M,LOW,HIGH,13.093,drugC 4 | 47,M,LOW,HIGH,10.114,drugC 5 | 28,F,NORMAL,HIGH,7.798,drugX 6 | 61,F,LOW,HIGH,18.043,drugY 7 | 22,F,NORMAL,HIGH,8.607,drugX 8 | 49,F,NORMAL,HIGH,16.275,drugY 9 | 41,M,LOW,HIGH,11.037,drugC 10 | 60,M,NORMAL,HIGH,15.171,drugY 11 | 43,M,LOW,NORMAL,19.368,drugY 12 | 47,F,LOW,HIGH,11.767,drugC 13 | 34,F,HIGH,NORMAL,19.199,drugY 14 | 43,M,LOW,HIGH,15.376,drugY 15 | 74,F,LOW,HIGH,20.942,drugY 16 | 50,F,NORMAL,HIGH,12.703,drugX 17 | 16,F,HIGH,NORMAL,15.516,drugY 18 | 69,M,LOW,NORMAL,11.455,drugX 19 | 43,M,HIGH,HIGH,13.972,drugA 20 | 23,M,LOW,HIGH,7.298,drugC 21 | 32,F,HIGH,NORMAL,25.974,drugY 22 | 57,M,LOW,NORMAL,19.128,drugY 23 | 63,M,NORMAL,HIGH,25.917,drugY 24 | 47,M,LOW,NORMAL,30.568,drugY 25 | 48,F,LOW,HIGH,15.036,drugY 26 | 33,F,LOW,HIGH,33.486,drugY 27 | 28,F,HIGH,NORMAL,18.809,drugY 28 | 31,M,HIGH,HIGH,30.366,drugY 29 | 
49,F,NORMAL,NORMAL,9.381,drugX 30 | 39,F,LOW,NORMAL,22.697,drugY 31 | 45,M,LOW,HIGH,17.951,drugY 32 | 18,F,NORMAL,NORMAL,8.75,drugX 33 | 74,M,HIGH,HIGH,9.567,drugB 34 | 49,M,LOW,NORMAL,11.014,drugX 35 | 65,F,HIGH,NORMAL,31.876,drugY 36 | 53,M,NORMAL,HIGH,14.133,drugX 37 | 46,M,NORMAL,NORMAL,7.285,drugX 38 | 32,M,HIGH,NORMAL,9.445,drugA 39 | 39,M,LOW,NORMAL,13.938,drugX 40 | 39,F,NORMAL,NORMAL,9.709,drugX 41 | 15,M,NORMAL,HIGH,9.084,drugX 42 | 73,F,NORMAL,HIGH,19.221,drugY 43 | 58,F,HIGH,NORMAL,14.239,drugB 44 | 50,M,NORMAL,NORMAL,15.79,drugY 45 | 23,M,NORMAL,HIGH,12.26,drugX 46 | 50,F,NORMAL,NORMAL,12.295,drugX 47 | 66,F,NORMAL,NORMAL,8.107,drugX 48 | 37,F,HIGH,HIGH,13.091,drugA 49 | 68,M,LOW,HIGH,10.291,drugC 50 | 23,M,NORMAL,HIGH,31.686,drugY 51 | 28,F,LOW,HIGH,19.796,drugY 52 | 58,F,HIGH,HIGH,19.416,drugY 53 | 67,M,NORMAL,NORMAL,10.898,drugX 54 | 62,M,LOW,NORMAL,27.183,drugY 55 | 24,F,HIGH,NORMAL,18.457,drugY 56 | 68,F,HIGH,NORMAL,10.189,drugB 57 | 26,F,LOW,HIGH,14.16,drugC 58 | 65,M,HIGH,NORMAL,11.34,drugB 59 | 40,M,HIGH,HIGH,27.826,drugY 60 | 60,M,NORMAL,NORMAL,10.091,drugX 61 | 34,M,HIGH,HIGH,18.703,drugY 62 | 38,F,LOW,NORMAL,29.875,drugY 63 | 24,M,HIGH,NORMAL,9.475,drugA 64 | 67,M,LOW,NORMAL,20.693,drugY 65 | 45,M,LOW,NORMAL,8.37,drugX 66 | 60,F,HIGH,HIGH,13.303,drugB 67 | 68,F,NORMAL,NORMAL,27.05,drugY 68 | 29,M,HIGH,HIGH,12.856,drugA 69 | 17,M,NORMAL,NORMAL,10.832,drugX 70 | 54,M,NORMAL,HIGH,24.658,drugY 71 | 18,F,HIGH,NORMAL,24.276,drugY 72 | 70,M,HIGH,HIGH,13.967,drugB 73 | 28,F,NORMAL,HIGH,19.675,drugY 74 | 24,F,NORMAL,HIGH,10.605,drugX 75 | 41,F,NORMAL,NORMAL,22.905,drugY 76 | 31,M,HIGH,NORMAL,17.069,drugY 77 | 26,M,LOW,NORMAL,20.909,drugY 78 | 36,F,HIGH,HIGH,11.198,drugA 79 | 26,F,HIGH,NORMAL,19.161,drugY 80 | 19,F,HIGH,HIGH,13.313,drugA 81 | 32,F,LOW,NORMAL,10.84,drugX 82 | 60,M,HIGH,HIGH,13.934,drugB 83 | 64,M,NORMAL,HIGH,7.761,drugX 84 | 32,F,LOW,HIGH,9.712,drugC 85 | 38,F,HIGH,NORMAL,11.326,drugA 86 | 47,F,LOW,HIGH,10.067,drugC 87 | 
59,M,HIGH,HIGH,13.935,drugB 88 | 51,F,NORMAL,HIGH,13.597,drugX 89 | 69,M,LOW,HIGH,15.478,drugY 90 | 37,F,HIGH,NORMAL,23.091,drugY 91 | 50,F,NORMAL,NORMAL,17.211,drugY 92 | 62,M,NORMAL,HIGH,16.594,drugY 93 | 41,M,HIGH,NORMAL,15.156,drugY 94 | 29,F,HIGH,HIGH,29.45,drugY 95 | 42,F,LOW,NORMAL,29.271,drugY 96 | 56,M,LOW,HIGH,15.015,drugY 97 | 36,M,LOW,NORMAL,11.424,drugX 98 | 58,F,LOW,HIGH,38.247,drugY 99 | 56,F,HIGH,HIGH,25.395,drugY 100 | 20,M,HIGH,NORMAL,35.639,drugY 101 | 15,F,HIGH,NORMAL,16.725,drugY 102 | 31,M,HIGH,NORMAL,11.871,drugA 103 | 45,F,HIGH,HIGH,12.854,drugA 104 | 28,F,LOW,HIGH,13.127,drugC 105 | 56,M,NORMAL,HIGH,8.966,drugX 106 | 22,M,HIGH,NORMAL,28.294,drugY 107 | 37,M,LOW,NORMAL,8.968,drugX 108 | 22,M,NORMAL,HIGH,11.953,drugX 109 | 42,M,LOW,HIGH,20.013,drugY 110 | 72,M,HIGH,NORMAL,9.677,drugB 111 | 23,M,NORMAL,HIGH,16.85,drugY 112 | 50,M,HIGH,HIGH,7.49,drugA 113 | 47,F,NORMAL,NORMAL,6.683,drugX 114 | 35,M,LOW,NORMAL,9.17,drugX 115 | 65,F,LOW,NORMAL,13.769,drugX 116 | 20,F,NORMAL,NORMAL,9.281,drugX 117 | 51,M,HIGH,HIGH,18.295,drugY 118 | 67,M,NORMAL,NORMAL,9.514,drugX 119 | 40,F,NORMAL,HIGH,10.103,drugX 120 | 32,F,HIGH,NORMAL,10.292,drugA 121 | 61,F,HIGH,HIGH,25.475,drugY 122 | 28,M,NORMAL,HIGH,27.064,drugY 123 | 15,M,HIGH,NORMAL,17.206,drugY 124 | 34,M,NORMAL,HIGH,22.456,drugY 125 | 36,F,NORMAL,HIGH,16.753,drugY 126 | 53,F,HIGH,NORMAL,12.495,drugB 127 | 19,F,HIGH,NORMAL,25.969,drugY 128 | 66,M,HIGH,HIGH,16.347,drugY 129 | 35,M,NORMAL,NORMAL,7.845,drugX 130 | 47,M,LOW,NORMAL,33.542,drugY 131 | 32,F,NORMAL,HIGH,7.477,drugX 132 | 70,F,NORMAL,HIGH,20.489,drugY 133 | 52,M,LOW,NORMAL,32.922,drugY 134 | 49,M,LOW,NORMAL,13.598,drugX 135 | 24,M,NORMAL,HIGH,25.786,drugY 136 | 42,F,HIGH,HIGH,21.036,drugY 137 | 74,M,LOW,NORMAL,11.939,drugX 138 | 55,F,HIGH,HIGH,10.977,drugB 139 | 35,F,HIGH,HIGH,12.894,drugA 140 | 51,M,HIGH,NORMAL,11.343,drugB 141 | 69,F,NORMAL,HIGH,10.065,drugX 142 | 49,M,HIGH,NORMAL,6.269,drugA 143 | 64,F,LOW,NORMAL,25.741,drugY 144 | 
60,M,HIGH,NORMAL,8.621,drugB 145 | 74,M,HIGH,NORMAL,15.436,drugY 146 | 39,M,HIGH,HIGH,9.664,drugA 147 | 61,M,NORMAL,HIGH,9.443,drugX 148 | 37,F,LOW,NORMAL,12.006,drugX 149 | 26,F,HIGH,NORMAL,12.307,drugA 150 | 61,F,LOW,NORMAL,7.34,drugX 151 | 22,M,LOW,HIGH,8.151,drugC 152 | 49,M,HIGH,NORMAL,8.7,drugA 153 | 68,M,HIGH,HIGH,11.009,drugB 154 | 55,M,NORMAL,NORMAL,7.261,drugX 155 | 72,F,LOW,NORMAL,14.642,drugX 156 | 37,M,LOW,NORMAL,16.724,drugY 157 | 49,M,LOW,HIGH,10.537,drugC 158 | 31,M,HIGH,NORMAL,11.227,drugA 159 | 53,M,LOW,HIGH,22.963,drugY 160 | 59,F,LOW,HIGH,10.444,drugC 161 | 34,F,LOW,NORMAL,12.923,drugX 162 | 30,F,NORMAL,HIGH,10.443,drugX 163 | 57,F,HIGH,NORMAL,9.945,drugB 164 | 43,M,NORMAL,NORMAL,12.859,drugX 165 | 21,F,HIGH,NORMAL,28.632,drugY 166 | 16,M,HIGH,NORMAL,19.007,drugY 167 | 38,M,LOW,HIGH,18.295,drugY 168 | 58,F,LOW,HIGH,26.645,drugY 169 | 57,F,NORMAL,HIGH,14.216,drugX 170 | 51,F,LOW,NORMAL,23.003,drugY 171 | 20,F,HIGH,HIGH,11.262,drugA 172 | 28,F,NORMAL,HIGH,12.879,drugX 173 | 45,M,LOW,NORMAL,10.017,drugX 174 | 39,F,NORMAL,NORMAL,17.225,drugY 175 | 41,F,LOW,NORMAL,18.739,drugY 176 | 42,M,HIGH,NORMAL,12.766,drugA 177 | 73,F,HIGH,HIGH,18.348,drugY 178 | 48,M,HIGH,NORMAL,10.446,drugA 179 | 25,M,NORMAL,HIGH,19.011,drugY 180 | 39,M,NORMAL,HIGH,15.969,drugY 181 | 67,F,NORMAL,HIGH,15.891,drugY 182 | 22,F,HIGH,NORMAL,22.818,drugY 183 | 59,F,NORMAL,HIGH,13.884,drugX 184 | 20,F,LOW,NORMAL,11.686,drugX 185 | 36,F,HIGH,NORMAL,15.49,drugY 186 | 18,F,HIGH,HIGH,37.188,drugY 187 | 57,F,NORMAL,NORMAL,25.893,drugY 188 | 70,M,HIGH,HIGH,9.849,drugB 189 | 47,M,HIGH,HIGH,10.403,drugA 190 | 65,M,HIGH,NORMAL,34.997,drugY 191 | 64,M,HIGH,NORMAL,20.932,drugY 192 | 58,M,HIGH,HIGH,18.991,drugY 193 | 23,M,HIGH,HIGH,8.011,drugA 194 | 72,M,LOW,HIGH,16.31,drugY 195 | 72,M,LOW,HIGH,6.769,drugC 196 | 46,F,HIGH,HIGH,34.686,drugY 197 | 56,F,LOW,HIGH,11.567,drugC 198 | 16,M,LOW,HIGH,12.006,drugC 199 | 52,M,NORMAL,HIGH,9.894,drugX 200 | 23,M,NORMAL,NORMAL,14.02,drugX 201 | 
40,F,LOW,NORMAL,11.349,drugX --------------------------------------------------------------------------------
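The rows of drug200.csv listed above can be loaded and summarized without any third-party library. The snippet below is a minimal sketch, not part of the original notebook: it parses an inline sample (the header and the first five data rows, copied from the listing) with Python's standard `csv` module and counts how often each drug appears. The helper names `load_rows` and `class_counts` are hypothetical, chosen here only for illustration.

```python
import csv
import io
from collections import Counter

# Inline sample: the header and first five data rows of drug200.csv
# as listed above. The full file would be read the same way.
SAMPLE = """Age,Sex,BP,Cholesterol,Na_to_K,Drug
23,F,HIGH,HIGH,25.355,drugY
47,M,LOW,HIGH,13.093,drugC
47,M,LOW,HIGH,10.114,drugC
28,F,NORMAL,HIGH,7.798,drugX
61,F,LOW,HIGH,18.043,drugY
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, converting the numeric columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["Age"] = int(row["Age"])
        row["Na_to_K"] = float(row["Na_to_K"])
        rows.append(row)
    return rows


def class_counts(rows):
    """Count how many patients responded to each drug (the target column)."""
    return Counter(row["Drug"] for row in rows)


rows = load_rows(SAMPLE)
counts = class_counts(rows)
print(counts)
```

To run the same summary on the whole file, pass the file's contents instead of the sample, for example `load_rows(open("drug200.csv").read())`, assuming the file sits in the working directory.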