├── ArbreDeDecision_4DS_etudiant.ipynb
├── ML0101EN-Clas-Decision-Trees-drug-py-.ipynb
├── ML0101EN-Clas-Decision-Trees-drug-py-v1 (1).ipynb
├── ML0101EN-Clas-Decision-Trees.ipynb
└── drug200.csv


/ML0101EN-Clas-Decision-Trees-drug-py-v1 (1).ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |  "cells": [
   3 |   {
   4 |    "cell_type": "markdown",
   5 |    "metadata": {
   6 |     "button": false,
   7 |     "new_sheet": false,
   8 |     "run_control": {
   9 |      "read_only": false
  10 |     }
  11 |    },
  12 |    "source": [
  13 |     "<a href=\"https://www.bigdatauniversity.com\"><img src = \"https://ibm.box.com/shared/static/cw2c7r3o20w9zn8gkecaeyjhgw3xdgbj.png\" width = 400, align = \"center\"></a>\n",
  14 |     "\n",
  15 |     "# <center>Decision Trees</center>"
  16 |    ]
  17 |   },
  18 |   {
  19 |    "cell_type": "markdown",
  20 |    "metadata": {
  21 |     "button": false,
  22 |     "new_sheet": false,
  23 |     "run_control": {
  24 |      "read_only": false
  25 |     }
  26 |    },
  27 |    "source": [
  28 |     "In this lab exercise, you will learn a popular machine learning algorithm, the Decision Tree. You will use this classification algorithm to build a model from historical data of patients and their responses to different medications. Then you will use the trained decision tree to predict the class of an unknown patient, or to find a proper drug for a new patient."
  29 |    ]
  30 |   },
  31 |   {
  32 |    "cell_type": "markdown",
  33 |    "metadata": {
  34 |     "button": false,
  35 |     "new_sheet": false,
  36 |     "run_control": {
  37 |      "read_only": false
  38 |     }
  39 |    },
  40 |    "source": [
  41 |     "Import the Following Libraries:\n",
  42 |     "<ul>\n",
  43 |     "    <li> <b>numpy (as np)</b> </li>\n",
  44 |     "    <li> <b>pandas</b> </li>\n",
  45 |     "    <li> <b>DecisionTreeClassifier</b> from <b>sklearn.tree</b> </li>\n",
  46 |     "</ul>"
  47 |    ]
  48 |   },
  49 |   {
  50 |    "cell_type": "code",
  51 |    "execution_count": 1,
  52 |    "metadata": {
  53 |     "button": false,
  54 |     "new_sheet": false,
  55 |     "run_control": {
  56 |      "read_only": false
  57 |     }
  58 |    },
  59 |    "outputs": [],
  60 |    "source": [
  61 |     "import numpy as np \n",
  62 |     "import pandas as pd\n",
  63 |     "from sklearn.tree import DecisionTreeClassifier"
  64 |    ]
  65 |   },
  66 |   {
  67 |    "cell_type": "markdown",
  68 |    "metadata": {
  69 |     "button": false,
  70 |     "new_sheet": false,
  71 |     "run_control": {
  72 |      "read_only": false
  73 |     }
  74 |    },
  75 |    "source": [
  76 |     "### About dataset\n",
  77 |     "Imagine that you are a medical researcher compiling data for a study. You have collected data about a set of patients, all of whom suffered from the same illness. During their course of treatment, each patient responded to one of 5 medications: Drug A, Drug B, Drug C, Drug X and Drug Y. \n",
  78 |     "\n",
  79 |     "Part of your job is to build a model to find out which drug might be appropriate for a future patient with the same illness. The features of this dataset are the Age, Sex, Blood Pressure, Cholesterol, and Na_to_K (sodium-to-potassium ratio) of patients, and the target is the drug that each patient responded to. \n",
  80 |     "\n",
  81 |     "It is an example of a multiclass classifier, and you can use the training part of the dataset \n",
  82 |     "to build a decision tree, and then use it to predict the class of an unknown patient, or to prescribe a drug to a new patient.\n"
  83 |    ]
  84 |   },
  85 |   {
  86 |    "cell_type": "markdown",
  87 |    "metadata": {
  88 |     "button": false,
  89 |     "new_sheet": false,
  90 |     "run_control": {
  91 |      "read_only": false
  92 |     }
  93 |    },
  94 |    "source": [
  95 |     "### Downloading Data\n",
  96 |     "We will use !wget to download the data from IBM Object Storage."
  97 |    ]
  98 |   },
  99 |   {
 100 |    "cell_type": "code",
 101 |    "execution_count": 2,
 102 |    "metadata": {},
 103 |    "outputs": [
 104 |     {
 105 |      "name": "stderr",
 106 |      "output_type": "stream",
 107 |      "text": [
 108 |       "'wget' is not recognized as an internal or external command,\n",
 109 |       "operable program or batch file.\n"
 110 |      ]
 111 |     }
 112 |    ],
 113 |    "source": [
 114 |     "!wget -O drug200.csv https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/drug200.csv"
 115 |    ]
 116 |   },
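     |   {
     |    "cell_type": "markdown",
     |    "metadata": {},
     |    "source": [
     |     "The `wget` command above is not available on every system, as the error in its output shows. A minimal sketch of a portable alternative follows, assuming only Python's standard library. `fetch_if_missing` and `DATA_URL` are hypothetical names introduced here, not part of the original lab, and whether the URL still resolves has not been checked.\n",
     |     "\n",
     |     "```python\n",
     |     "from pathlib import Path\n",
     |     "from urllib.request import urlretrieve\n",
     |     "\n",
     |     "# Same address as the !wget line, split only to keep the line short.\n",
     |     "DATA_URL = (\n",
     |     "    'https://s3-api.us-geo.objectstorage.softlayer.net/'\n",
     |     "    'cf-courses-data/CognitiveClass/ML0101ENv3/labs/drug200.csv'\n",
     |     ")\n",
     |     "\n",
     |     "\n",
     |     "def fetch_if_missing(url, destination):\n",
     |     "    # Hypothetical helper: download url to destination unless a local copy exists.\n",
     |     "    path = Path(destination)\n",
     |     "    if not path.exists():\n",
     |     "        urlretrieve(url, str(path))\n",
     |     "    return path\n",
     |     "```\n",
     |     "\n",
     |     "Calling `fetch_if_missing(DATA_URL, 'drug200.csv')` before `pd.read_csv` would then stand in for the `!wget` line."
     |    ]
     |   },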
 124 |   {
 125 |    "cell_type": "markdown",
 126 |    "metadata": {},
 127 |    "source": [
 128 |     "Now, read the data into a pandas DataFrame:"
 129 |    ]
 130 |   },
 131 |   {
 132 |    "cell_type": "code",
 133 |    "execution_count": 13,
 134 |    "metadata": {
 135 |     "button": false,
 136 |     "new_sheet": false,
 137 |     "run_control": {
 138 |      "read_only": false
 139 |     }
 140 |    },
 141 |    "outputs": [
 142 |     {
 143 |      "data": {
 144 |       "text/html": [
 145 |        "<div>\n",
 146 |        "<style scoped>\n",
 147 |        "    .dataframe tbody tr th:only-of-type {\n",
 148 |        "        vertical-align: middle;\n",
 149 |        "    }\n",
 150 |        "\n",
 151 |        "    .dataframe tbody tr th {\n",
 152 |        "        vertical-align: top;\n",
 153 |        "    }\n",
 154 |        "\n",
 155 |        "    .dataframe thead th {\n",
 156 |        "        text-align: right;\n",
 157 |        "    }\n",
 158 |        "</style>\n",
 159 |        "<table border=\"1\" class=\"dataframe\">\n",
 160 |        "  <thead>\n",
 161 |        "    <tr style=\"text-align: right;\">\n",
 162 |        "      <th></th>\n",
 163 |        "      <th>Age</th>\n",
 164 |        "      <th>Sex</th>\n",
 165 |        "      <th>BP</th>\n",
 166 |        "      <th>Cholesterol</th>\n",
 167 |        "      <th>Na_to_K</th>\n",
 168 |        "      <th>Drug</th>\n",
 169 |        "    </tr>\n",
 170 |        "  </thead>\n",
 171 |        "  <tbody>\n",
 172 |        "    <tr>\n",
 173 |        "      <th>0</th>\n",
 174 |        "      <td>23</td>\n",
 175 |        "      <td>F</td>\n",
 176 |        "      <td>HIGH</td>\n",
 177 |        "      <td>HIGH</td>\n",
 178 |        "      <td>25.355</td>\n",
 179 |        "      <td>drugY</td>\n",
 180 |        "    </tr>\n",
 181 |        "    <tr>\n",
 182 |        "      <th>1</th>\n",
 183 |        "      <td>47</td>\n",
 184 |        "      <td>M</td>\n",
 185 |        "      <td>LOW</td>\n",
 186 |        "      <td>HIGH</td>\n",
 187 |        "      <td>13.093</td>\n",
 188 |        "      <td>drugC</td>\n",
 189 |        "    </tr>\n",
 190 |        "    <tr>\n",
 191 |        "      <th>2</th>\n",
 192 |        "      <td>47</td>\n",
 193 |        "      <td>M</td>\n",
 194 |        "      <td>LOW</td>\n",
 195 |        "      <td>HIGH</td>\n",
 196 |        "      <td>10.114</td>\n",
 197 |        "      <td>drugC</td>\n",
 198 |        "    </tr>\n",
 199 |        "    <tr>\n",
 200 |        "      <th>3</th>\n",
 201 |        "      <td>28</td>\n",
 202 |        "      <td>F</td>\n",
 203 |        "      <td>NORMAL</td>\n",
 204 |        "      <td>HIGH</td>\n",
 205 |        "      <td>7.798</td>\n",
 206 |        "      <td>drugX</td>\n",
 207 |        "    </tr>\n",
 208 |        "    <tr>\n",
 209 |        "      <th>4</th>\n",
 210 |        "      <td>61</td>\n",
 211 |        "      <td>F</td>\n",
 212 |        "      <td>LOW</td>\n",
 213 |        "      <td>HIGH</td>\n",
 214 |        "      <td>18.043</td>\n",
 215 |        "      <td>drugY</td>\n",
 216 |        "    </tr>\n",
 217 |        "    <tr>\n",
 218 |        "      <th>5</th>\n",
 219 |        "      <td>22</td>\n",
 220 |        "      <td>F</td>\n",
 221 |        "      <td>NORMAL</td>\n",
 222 |        "      <td>HIGH</td>\n",
 223 |        "      <td>8.607</td>\n",
 224 |        "      <td>drugX</td>\n",
 225 |        "    </tr>\n",
 226 |        "    <tr>\n",
 227 |        "      <th>6</th>\n",
 228 |        "      <td>49</td>\n",
 229 |        "      <td>F</td>\n",
 230 |        "      <td>NORMAL</td>\n",
 231 |        "      <td>HIGH</td>\n",
 232 |        "      <td>16.275</td>\n",
 233 |        "      <td>drugY</td>\n",
 234 |        "    </tr>\n",
 235 |        "    <tr>\n",
 236 |        "      <th>7</th>\n",
 237 |        "      <td>41</td>\n",
 238 |        "      <td>M</td>\n",
 239 |        "      <td>LOW</td>\n",
 240 |        "      <td>HIGH</td>\n",
 241 |        "      <td>11.037</td>\n",
 242 |        "      <td>drugC</td>\n",
 243 |        "    </tr>\n",
 244 |        "  </tbody>\n",
 245 |        "</table>\n",
 246 |        "</div>"
 247 |       ],
 248 |       "text/plain": [
 249 |        "   Age Sex      BP Cholesterol  Na_to_K   Drug\n",
 250 |        "0   23   F    HIGH        HIGH   25.355  drugY\n",
 251 |        "1   47   M     LOW        HIGH   13.093  drugC\n",
 252 |        "2   47   M     LOW        HIGH   10.114  drugC\n",
 253 |        "3   28   F  NORMAL        HIGH    7.798  drugX\n",
 254 |        "4   61   F     LOW        HIGH   18.043  drugY\n",
 255 |        "5   22   F  NORMAL        HIGH    8.607  drugX\n",
 256 |        "6   49   F  NORMAL        HIGH   16.275  drugY\n",
 257 |        "7   41   M     LOW        HIGH   11.037  drugC"
 258 |       ]
 259 |      },
 260 |      "execution_count": 13,
 261 |      "metadata": {},
 262 |      "output_type": "execute_result"
 263 |     }
 264 |    ],
 265 |    "source": [
 266 |     "my_data = pd.read_csv(\"drug200.csv\", delimiter=\",\")\n",
 267 |     "my_data[0:8]"
 268 |    ]
 269 |   },
 270 |   {
 271 |    "cell_type": "markdown",
 272 |    "metadata": {
 273 |     "button": false,
 274 |     "new_sheet": false,
 275 |     "run_control": {
 276 |      "read_only": false
 277 |     }
 278 |    },
 279 |    "source": [
 280 |     "## Practice \n",
 281 |     "What is the size of the data? "
 282 |    ]
 283 |   },
 284 |   {
 285 |    "cell_type": "code",
 286 |    "execution_count": 4,
 287 |    "metadata": {
 288 |     "button": false,
 289 |     "new_sheet": false,
 290 |     "run_control": {
 291 |      "read_only": false
 292 |     }
 293 |    },
 294 |    "outputs": [
 295 |     {
 296 |      "data": {
 297 |       "text/plain": [
 298 |        "1200"
 299 |       ]
 300 |      },
 301 |      "execution_count": 4,
 302 |      "metadata": {},
 303 |      "output_type": "execute_result"
 304 |     }
 305 |    ],
 306 |    "source": [
 307 |     "# write your code here\n",
 308 |     "my_data.size\n",
 309 |     "\n"
 310 |    ]
 311 |   },
 312 |   {
 313 |    "cell_type": "markdown",
 314 |    "metadata": {},
 315 |    "source": [
 316 |     "## Pre-processing"
 317 |    ]
 318 |   },
 319 |   {
 320 |    "cell_type": "markdown",
 321 |    "metadata": {
 322 |     "button": false,
 323 |     "new_sheet": false,
 324 |     "run_control": {
 325 |      "read_only": false
 326 |     }
 327 |    },
 328 |    "source": [
 329 |     "Using <b>my_data</b> as the drug200.csv data read by pandas, declare the following variables: <br>\n",
 330 |     "<ul>\n",
 331 |     "    <li> <b> X </b> as the <b> Feature Matrix </b> (data of my_data) </li>\n",
 332 |     "\n",
 333 |     "    \n",
 334 |     "    <li> <b> y </b> as the <b> response vector (target) </b> </li>\n",
 335 |     "\n",
 336 |     "\n",
 337 |     "   \n",
 338 |     "</ul>"
 339 |    ]
 340 |   },
 341 |   {
 342 |    "cell_type": "markdown",
 343 |    "metadata": {
 344 |     "button": false,
 345 |     "new_sheet": false,
 346 |     "run_control": {
 347 |      "read_only": false
 348 |     }
 349 |    },
 350 |    "source": [
 351 |     "Leave the target column (Drug) out of the feature matrix: it holds the value to predict, not an input feature."
 352 |    ]
 353 |   },
 354 |   {
 355 |    "cell_type": "code",
 356 |    "execution_count": 5,
 357 |    "metadata": {},
 358 |    "outputs": [
 359 |     {
 360 |      "data": {
 361 |       "text/plain": [
 362 |        "array([[23, 'F', 'HIGH', 'HIGH', 25.355],\n",
 363 |        "       [47, 'M', 'LOW', 'HIGH', 13.093],\n",
 364 |        "       [47, 'M', 'LOW', 'HIGH', 10.114],\n",
 365 |        "       [28, 'F', 'NORMAL', 'HIGH', 7.798],\n",
 366 |        "       [61, 'F', 'LOW', 'HIGH', 18.043]], dtype=object)"
 367 |       ]
 368 |      },
 369 |      "execution_count": 5,
 370 |      "metadata": {},
 371 |      "output_type": "execute_result"
 372 |     }
 373 |    ],
 374 |    "source": [
 375 |     "X = my_data[['Age', 'Sex', 'BP', 'Cholesterol', 'Na_to_K']].values\n",
 376 |     "X[0:5]"
 377 |    ]
 378 |   },
 379 |   {
 380 |    "cell_type": "markdown",
 381 |    "metadata": {},
 382 |    "source": [
 383 |     "As you may have noticed, some features in this dataset are categorical, such as __Sex__ or __BP__. Unfortunately, scikit-learn decision trees do not handle categorical variables directly, but we can still convert these features to numerical values. The cell below does this with __LabelEncoder__, which maps each category to an integer code.\n",
 384 |     "An alternative is __pandas.get_dummies()__, which converts a categorical variable into dummy/indicator variables."
 385 |    ]
 386 |   },
 387 |   {
 388 |    "cell_type": "code",
 389 |    "execution_count": 6,
 390 |    "metadata": {},
 391 |    "outputs": [
 392 |     {
 393 |      "data": {
 394 |       "text/plain": [
 395 |        "array([[23, 0, 0, 0, 25.355],\n",
 396 |        "       [47, 1, 1, 0, 13.093],\n",
 397 |        "       [47, 1, 1, 0, 10.114],\n",
 398 |        "       [28, 0, 2, 0, 7.798],\n",
 399 |        "       [61, 0, 1, 0, 18.043]], dtype=object)"
 400 |       ]
 401 |      },
 402 |      "execution_count": 6,
 403 |      "metadata": {},
 404 |      "output_type": "execute_result"
 405 |     }
 406 |    ],
 407 |    "source": [
 408 |     "from sklearn import preprocessing\n",
 409 |     "le_sex = preprocessing.LabelEncoder()\n",
 410 |     "le_sex.fit(['F','M'])\n",
 411 |     "X[:,1] = le_sex.transform(X[:,1]) \n",
 412 |     "\n",
 413 |     "\n",
 414 |     "le_BP = preprocessing.LabelEncoder()\n",
 415 |     "le_BP.fit([ 'LOW', 'NORMAL', 'HIGH'])\n",
 416 |     "X[:,2] = le_BP.transform(X[:,2])\n",
 417 |     "\n",
 418 |     "\n",
 419 |     "le_Chol = preprocessing.LabelEncoder()\n",
 420 |     "le_Chol.fit([ 'NORMAL', 'HIGH'])\n",
 421 |     "X[:,3] = le_Chol.transform(X[:,3]) \n",
 422 |     "\n",
 423 |     "X[0:5]\n"
 424 |    ]
 425 |   },
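     |   {
     |    "cell_type": "markdown",
     |    "metadata": {},
     |    "source": [
     |     "The integer codes in the output above follow from how `LabelEncoder` assigns them: it sorts the distinct labels and numbers them in that order, so `'HIGH'` becomes 0, `'LOW'` 1 and `'NORMAL'` 2, regardless of the order passed to `fit`. A minimal pure-Python sketch of that mapping, for illustration only (`label_codes` is a hypothetical helper, not a scikit-learn function):\n",
     |     "\n",
     |     "```python\n",
     |     "def label_codes(labels):\n",
     |     "    # Sort the distinct labels, then number them in that order.\n",
     |     "    return {label: code for code, label in enumerate(sorted(set(labels)))}\n",
     |     "\n",
     |     "\n",
     |     "bp_codes = label_codes(['LOW', 'NORMAL', 'HIGH'])\n",
     |     "print(bp_codes)\n",
     |     "# Encode the BP values of the first five rows of the dataset.\n",
     |     "print([bp_codes[value] for value in ['HIGH', 'LOW', 'LOW', 'NORMAL', 'LOW']])\n",
     |     "```\n",
     |     "\n",
     |     "The second `print` reproduces the BP column of the first five encoded rows: 0, 1, 1, 2, 1."
     |    ]
     |   },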
 426 |   {
 427 |    "cell_type": "markdown",
 428 |    "metadata": {},
 429 |    "source": [
 430 |     "Now we can fill the target variable."
 431 |    ]
 432 |   },
 433 |   {
 434 |    "cell_type": "code",
 435 |    "execution_count": 7,
 436 |    "metadata": {
 437 |     "button": false,
 438 |     "new_sheet": false,
 439 |     "run_control": {
 440 |      "read_only": false
 441 |     }
 442 |    },
 443 |    "outputs": [
 444 |     {
 445 |      "data": {
 446 |       "text/plain": [
 447 |        "0    drugY\n",
 448 |        "1    drugC\n",
 449 |        "2    drugC\n",
 450 |        "3    drugX\n",
 451 |        "4    drugY\n",
 452 |        "Name: Drug, dtype: object"
 453 |       ]
 454 |      },
 455 |      "execution_count": 7,
 456 |      "metadata": {},
 457 |      "output_type": "execute_result"
 458 |     }
 459 |    ],
 460 |    "source": [
 461 |     "y = my_data[\"Drug\"]\n",
 462 |     "y[0:5]"
 463 |    ]
 464 |   },
 465 |   {
 466 |    "cell_type": "markdown",
 467 |    "metadata": {
 468 |     "button": false,
 469 |     "new_sheet": false,
 470 |     "run_control": {
 471 |      "read_only": false
 472 |     }
 473 |    },
 474 |    "source": [
 475 |     "---\n",
 476 |     "## Setting up the Decision Tree\n",
 477 |     "We will be using <b>train/test split</b> on our <b>decision tree</b>. Let's import <b>train_test_split</b> from <b>sklearn.model_selection</b>."
 478 |    ]
 479 |   },
 480 |   {
 481 |    "cell_type": "code",
 482 |    "execution_count": 8,
 483 |    "metadata": {
 484 |     "button": false,
 485 |     "new_sheet": false,
 486 |     "run_control": {
 487 |      "read_only": false
 488 |     }
 489 |    },
 490 |    "outputs": [],
 491 |    "source": [
 492 |     "from sklearn.model_selection import train_test_split"
 493 |    ]
 494 |   },
 495 |   {
 496 |    "cell_type": "markdown",
 497 |    "metadata": {
 498 |     "button": false,
 499 |     "new_sheet": false,
 500 |     "run_control": {
 501 |      "read_only": false
 502 |     }
 503 |    },
 504 |    "source": [
 505 |     "Now <b> train_test_split </b> will return 4 different arrays. We will name them:<br>\n",
 506 |     "X_trainset, X_testset, y_trainset, y_testset <br> <br>\n",
 507 |     "The <b> train_test_split </b> will need the parameters: <br>\n",
 508 |     "X, y, test_size=0.3, and random_state=3. <br> <br>\n",
 509 |     "The <b>X</b> and <b>y</b> are the arrays required before the split, the <b>test_size</b> represents the fraction of the data held out for testing, and the <b>random_state</b> ensures that we obtain the same split on every run."
 510 |    ]
 511 |   },
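     |   {
     |    "cell_type": "markdown",
     |    "metadata": {},
     |    "source": [
     |     "To make the roles of `test_size` and `random_state` concrete, here is a minimal pure-Python sketch of a seeded shuffle-and-cut split. `simple_split` is a hypothetical helper written for illustration; it does not reproduce the exact rows that scikit-learn's `train_test_split` selects, only the idea behind it.\n",
     |     "\n",
     |     "```python\n",
     |     "import math\n",
     |     "import random\n",
     |     "\n",
     |     "\n",
     |     "def simple_split(rows, test_size, seed):\n",
     |     "    # Shuffle the row indices reproducibly, then cut off the test portion.\n",
     |     "    indices = list(range(len(rows)))\n",
     |     "    random.Random(seed).shuffle(indices)\n",
     |     "    n_test = math.ceil(len(rows) * test_size)\n",
     |     "    test = [rows[i] for i in indices[:n_test]]\n",
     |     "    train = [rows[i] for i in indices[n_test:]]\n",
     |     "    return train, test\n",
     |     "\n",
     |     "\n",
     |     "# 200 stand-in rows, matching the number of patients in drug200.csv.\n",
     |     "train, test = simple_split(list(range(200)), test_size=0.3, seed=3)\n",
     |     "print(len(train), len(test))\n",
     |     "```\n",
     |     "\n",
     |     "With 200 rows and `test_size=0.3`, this yields 140 training rows and 60 test rows, and the same seed always gives the same partition."
     |    ]
     |   },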
 512 |   {
 513 |    "cell_type": "code",
 514 |    "execution_count": 9,
 515 |    "metadata": {
 516 |     "button": false,
 517 |     "new_sheet": false,
 518 |     "run_control": {
 519 |      "read_only": false
 520 |     }
 521 |    },
 522 |    "outputs": [],
 523 |    "source": [
 524 |     "X_trainset, X_testset, y_trainset, y_testset = train_test_split(X, y, test_size=0.3, random_state=3)"
 525 |    ]
 526 |   },
 527 |   {
 528 |    "cell_type": "markdown",
 529 |    "metadata": {
 530 |     "button": false,
 531 |     "new_sheet": false,
 532 |     "run_control": {
 533 |      "read_only": false
 534 |     }
 535 |    },
 536 |    "source": [
 537 |     "## Practice\n",
 538 |     "Print the shape of X_trainset and y_trainset. Ensure that the dimensions match"
 539 |    ]
 540 |   },
 541 |   {
 542 |    "cell_type": "code",
 543 |    "execution_count": 14,
 544 |    "metadata": {
 545 |     "button": false,
 546 |     "new_sheet": false,
 547 |     "run_control": {
 548 |      "read_only": false
 549 |     }
 550 |    },
 551 |    "outputs": [
 552 |     {
 553 |      "data": {
 554 |       "text/plain": [
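 555 |        "((140, 5), (140,))"
 556 |       ]
 557 |      },
 558 |      "execution_count": 14,
 559 |      "metadata": {},
 560 |      "output_type": "execute_result"
 561 |     }
 562 |    ],
 563 |    "source": [
 564 |     "# your code\n",
 565 |     "\n",
 566 |     "# Evaluate both shapes in a single expression. With a line break after\n",
 567 |     "# the comma, these are two statements and only the last one is shown.\n",
 568 |     "\n",
 569 |     "X_trainset.shape, y_trainset.shape"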
 570 |    ]
 571 |   },
 572 |   {
 573 |    "cell_type": "markdown",
 574 |    "metadata": {
 575 |     "button": false,
 576 |     "new_sheet": false,
 577 |     "run_control": {
 578 |      "read_only": false
 579 |     }
 580 |    },
 581 |    "source": [
 582 |     "Print the shape of X_testset and y_testset. Ensure that the dimensions match"
 583 |    ]
 584 |   },
 585 |   {
 586 |    "cell_type": "code",
 587 |    "execution_count": 15,
 588 |    "metadata": {
 589 |     "button": false,
 590 |     "new_sheet": false,
 591 |     "run_control": {
 592 |      "read_only": false
 593 |     }
 594 |    },
 595 |    "outputs": [
 596 |     {
 597 |      "name": "stdout",
 598 |      "output_type": "stream",
 599 |      "text": [
 600 |       "(60, 5)\n",
 601 |       "(60,)\n"
 801 |      ]
 802 |     }
 803 |    ],
 804 |    "source": [
 805 |     "# your code\n",
 806 |     "\n",
 807 |     "print(X_testset.shape)\n",
 808 |     "print(y_testset.shape)"
 809 |    ]
 810 |   },
 811 |   {
 812 |    "cell_type": "markdown",
 813 |    "metadata": {
 814 |     "button": false,
 815 |     "new_sheet": false,
 816 |     "run_control": {
 817 |      "read_only": false
 818 |     }
 819 |    },
 820 |    "source": [
 821 |     "## Modeling\n",
 822 |     "We will first create an instance of the <b>DecisionTreeClassifier</b> called <b>drugTree</b>.<br>\n",
 823 |     "Inside of the classifier, specify <i> criterion=\"entropy\" </i> so we can see the information gain of each node. The cell below also sets <i> max_depth=4 </i>, which limits the tree to four levels of splits."
 824 |    ]
 825 |   },
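     |   {
     |    "cell_type": "markdown",
     |    "metadata": {},
     |    "source": [
     |     "With `criterion='entropy'`, candidate splits are scored by information gain: the entropy of the parent node minus the size-weighted entropy of its children. A minimal sketch of that computation on a handful of made-up labels follows. The helper names and the toy data are assumptions for illustration, not scikit-learn internals.\n",
     |     "\n",
     |     "```python\n",
     |     "import math\n",
     |     "from collections import Counter\n",
     |     "\n",
     |     "\n",
     |     "def entropy(labels):\n",
     |     "    # Shannon entropy in bits: -sum(p * log2(p)) over the class proportions.\n",
     |     "    total = len(labels)\n",
     |     "    proportions = [count / total for count in Counter(labels).values()]\n",
     |     "    return -sum(p * math.log2(p) for p in proportions)\n",
     |     "\n",
     |     "\n",
     |     "def information_gain(parent, children):\n",
     |     "    # Parent entropy minus the size-weighted entropy of the child nodes.\n",
     |     "    weighted = sum(len(child) / len(parent) * entropy(child) for child in children)\n",
     |     "    return entropy(parent) - weighted\n",
     |     "\n",
     |     "\n",
     |     "parent = ['drugY', 'drugY', 'drugX', 'drugX']\n",
     |     "pure_split = information_gain(parent, [['drugY', 'drugY'], ['drugX', 'drugX']])\n",
     |     "mixed_split = information_gain(parent, [['drugY', 'drugX'], ['drugY', 'drugX']])\n",
     |     "print(pure_split, mixed_split)\n",
     |     "```\n",
     |     "\n",
     |     "A split that separates the two classes perfectly gains the full 1 bit of the parent's entropy, while a split that leaves both children as mixed as the parent gains 0."
     |    ]
     |   },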
 826 |   {
 827 |    "cell_type": "code",
 828 |    "execution_count": 16,
 829 |    "metadata": {
 830 |     "button": false,
 831 |     "new_sheet": false,
 832 |     "run_control": {
 833 |      "read_only": false
 834 |     }
 835 |    },
 836 |    "outputs": [
 837 |     {
 838 |      "data": {
 839 |       "text/plain": [
 840 |        "DecisionTreeClassifier(criterion='entropy', max_depth=4)"
 841 |       ]
 842 |      },
 843 |      "execution_count": 16,
 844 |      "metadata": {},
 845 |      "output_type": "execute_result"
 846 |     }
 847 |    ],
 848 |    "source": [
 849 |     "drugTree = DecisionTreeClassifier(criterion=\"entropy\", max_depth = 4)\n",
 850 |     "drugTree # displays the classifier with the parameters that differ from the defaults"
 851 |    ]
 852 |   },
 853 |   {
 854 |    "cell_type": "markdown",
 855 |    "metadata": {
 856 |     "button": false,
 857 |     "new_sheet": false,
 858 |     "run_control": {
 859 |      "read_only": false
 860 |     }
 861 |    },
 862 |    "source": [
 863 |     "Next, we will fit the model to the training feature matrix <b> X_trainset </b> and training response vector <b> y_trainset </b>"
 864 |    ]
 865 |   },
 866 |   {
 867 |    "cell_type": "code",
 868 |    "execution_count": 17,
 869 |    "metadata": {
 870 |     "button": false,
 871 |     "new_sheet": false,
 872 |     "run_control": {
 873 |      "read_only": false
 874 |     }
 875 |    },
 876 |    "outputs": [
 877 |     {
 878 |      "data": {
 879 |       "text/plain": [
 880 |        "DecisionTreeClassifier(criterion='entropy', max_depth=4)"
 881 |       ]
 882 |      },
 883 |      "execution_count": 17,
 884 |      "metadata": {},
 885 |      "output_type": "execute_result"
 886 |     }
 887 |    ],
 888 |    "source": [
 889 |     "drugTree.fit(X_trainset,y_trainset)"
 890 |    ]
 891 |   },
 892 |   {
 893 |    "cell_type": "markdown",
 894 |    "metadata": {
 895 |     "button": false,
 896 |     "new_sheet": false,
 897 |     "run_control": {
 898 |      "read_only": false
 899 |     }
 900 |    },
 901 |    "source": [
 902 |     "## Prediction\n",
 903 |     "Let's make some <b>predictions</b> on the testing dataset and store them in a variable called <b>predTree</b>."
 904 |    ]
 905 |   },
 906 |   {
 907 |    "cell_type": "code",
 908 |    "execution_count": 18,
 909 |    "metadata": {
 910 |     "button": false,
 911 |     "new_sheet": false,
 912 |     "run_control": {
 913 |      "read_only": false
 914 |     }
 915 |    },
 916 |    "outputs": [],
 917 |    "source": [
 918 |     "predTree = drugTree.predict(X_testset)"
 919 |    ]
 920 |   },
 921 |   {
 922 |    "cell_type": "markdown",
 923 |    "metadata": {
 924 |     "button": false,
 925 |     "new_sheet": false,
 926 |     "run_control": {
 927 |      "read_only": false
 928 |     }
 929 |    },
 930 |    "source": [
 931 |     "You can print out <b>predTree</b> and <b>y_testset</b> if you want to visually compare the prediction to the actual values."
 932 |    ]
 933 |   },
 934 |   {
 935 |    "cell_type": "code",
 936 |    "execution_count": 19,
 937 |    "metadata": {
 938 |     "button": false,
 939 |     "new_sheet": false,
 940 |     "run_control": {
 941 |      "read_only": false
 942 |     },
 943 |     "scrolled": true
 944 |    },
 945 |    "outputs": [
 946 |     {
 947 |      "name": "stdout",
 948 |      "output_type": "stream",
 949 |      "text": [
 950 |       "['drugY' 'drugX' 'drugX' 'drugX' 'drugX']\n",
 951 |       "40     drugY\n",
 952 |       "51     drugX\n",
 953 |       "139    drugX\n",
 954 |       "197    drugX\n",
 955 |       "170    drugX\n",
 956 |       "Name: Drug, dtype: object\n"
 957 |      ]
 958 |     }
 959 |    ],
 960 |    "source": [
 961 |     "print(predTree[0:5])\n",
 962 |     "print(y_testset[0:5])\n"
 963 |    ]
 964 |   },
 965 |   {
 966 |    "cell_type": "markdown",
 967 |    "metadata": {
 968 |     "button": false,
 969 |     "new_sheet": false,
 970 |     "run_control": {
 971 |      "read_only": false
 972 |     }
 973 |    },
 974 |    "source": [
 975 |     "## Evaluation\n",
 976 |     "Next, let's import __metrics__ from sklearn and check the accuracy of our model."
 977 |    ]
 978 |   },
 979 |   {
 980 |    "cell_type": "code",
 981 |    "execution_count": 20,
 982 |    "metadata": {
 983 |     "button": false,
 984 |     "new_sheet": false,
 985 |     "run_control": {
 986 |      "read_only": false
 987 |     }
 988 |    },
 989 |    "outputs": [
 990 |     {
 991 |      "name": "stdout",
 992 |      "output_type": "stream",
 993 |      "text": [
 994 |       "Decision tree accuracy: 0.9833333333333333\n"
 995 |      ]
 996 |     }
 997 |    ],
 998 |    "source": [
 999 |     "from sklearn import metrics\n",
1000 |     "import matplotlib.pyplot as plt\n",
1001 |     "print(\"Decision tree accuracy:\", metrics.accuracy_score(y_testset, predTree))"
1002 |    ]
1003 |   },
1004 |   {
1005 |    "cell_type": "markdown",
1006 |    "metadata": {
1007 |     "button": false,
1008 |     "new_sheet": false,
1009 |     "run_control": {
1010 |      "read_only": false
1011 |     }
1012 |    },
1013 |    "source": [
1014 |     "For a single-label problem such as this one, __accuracy_score__ returns the fraction of test samples whose predicted label equals the true label in y_true.\n",
1015 |     "\n",
1016 |     "In multilabel classification, the function returns the subset accuracy: a sample counts as correct only if its entire set of predicted labels exactly matches the true set of labels. Each sample therefore contributes 1.0 or 0.0, and the score is the mean over all samples.\n"
1017 |    ]
1018 |   },
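The subset accuracy described in the cell above can be sketched in plain Python. This is a minimal illustration, not scikit-learn's implementation: the function name `subset_accuracy` and the multilabel sample data are hypothetical (the drug dataset itself has exactly one label per patient).

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label set equals the true label set."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")
    # A sample scores 1 only when both label sets are identical, otherwise 0.
    exact_matches = sum(set(t) == set(p) for t, p in zip(y_true, y_pred))
    return exact_matches / len(y_true)


# Hypothetical multilabel data: each sample carries a set of labels.
y_true = [{"drugA", "drugB"}, {"drugX"}, {"drugY"}]
y_pred = [{"drugA"}, {"drugX"}, {"drugY"}]

# The first sample is only partially right, so it counts as a miss:
# two of the three samples match exactly.
score = subset_accuracy(y_true, y_pred)
print(score)
```

When every sample has a single label, as in this lab, the same computation reduces to the ordinary fraction of correct predictions.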
1019 |   {
1020 |    "cell_type": "markdown",
1021 |    "metadata": {
1022 |     "button": false,
1023 |     "new_sheet": false,
1024 |     "run_control": {
1025 |      "read_only": false
1026 |     }
1027 |    },
1028 |    "source": [
1029 |     "## Practice \n",
1030 |     "Can you calculate the accuracy score without sklearn?"
1031 |    ]
1032 |   },
1033 |   {
1034 |    "cell_type": "code",
1035 |    "execution_count": 26,
1036 |    "metadata": {
1037 |     "button": false,
1038 |     "new_sheet": false,
1039 |     "run_control": {
1040 |      "read_only": false
1041 |     }
1042 |    },
1043 |    "outputs": [
1044 |     {
1045 |      "name": "stdout",
1046 |      "output_type": "stream",
1047 |      "text": [
1048 |       "Collecting package metadata (current_repodata.json): ...working... done\n",
1049 |       "Solving environment: ...working... done\n",
1050 |       "\n",
1051 |       "## Package Plan ##\n",
1052 |       "\n",
1053 |       "  environment location: C:\\ProgramData\\Anaconda3\n",
1054 |       "\n",
1055 |       "  added / updated specs:\n",
1056 |       "    - pydotplus\n",
1057 |       "\n",
1058 |       "\n",
1059 |       "The following NEW packages will be INSTALLED:\n",
1060 |       "\n",
1061 |       "  graphviz           pkgs/main/win-64::graphviz-2.38-hfd603c8_2\n",
1062 |       "  pydotplus          conda-forge/noarch::pydotplus-2.0.2-py_2\n",
1063 |       "\n",
1064 |       "The following packages will be UPDATED:\n",
1065 |       "\n",
1066 |       "  conda                               4.10.3-py38haa244fe_0 --> 4.10.3-py38haa244fe_2\n",
1067 |       "\n",
1068 |       "\n",
1069 |       "Preparing transaction: ...working... done\n",
1070 |       "Verifying transaction: ...working... failed\n"
1071 |      ]
1072 |     },
1073 |     {
1074 |      "name": "stderr",
1075 |      "output_type": "stream",
1076 |      "text": [
1077 |       "\n",
1078 |       "EnvironmentNotWritableError: The current user does not have write permissions to the target environment.\n",
1079 |       "  environment location: C:\\ProgramData\\Anaconda3\n",
1080 |       "\n",
1081 |       "\n"
1082 |      ]
1083 |     },
1084 |     {
1085 |      "name": "stdout",
1086 |      "output_type": "stream",
1087 |      "text": [
1088 |       "Collecting package metadata (current_repodata.json): ...working... done\n",
1089 |       "Solving environment: ...working... done\n",
1090 |       "\n",
1091 |       "## Package Plan ##\n",
1092 |       "\n",
1093 |       "  environment location: C:\\ProgramData\\Anaconda3"
1094 |      ]
1095 |     },
1096 |     {
1097 |      "name": "stderr",
1098 |      "output_type": "stream",
1099 |      "text": [
1100 |       "\n",
1101 |       "EnvironmentNotWritableError: The current user does not have write permissions to the target environment.\n",
1102 |       "  environment location: C:\\ProgramData\\Anaconda3\n",
1103 |       "\n",
1104 |       "\n"
1105 |      ]
1106 |     },
1107 |     {
1108 |      "name": "stdout",
1109 |      "output_type": "stream",
1110 |      "text": [
1111 |       "\n",
1112 |       "\n",
1113 |       "  added / updated specs:\n",
1114 |       "    - python-graphviz\n",
1115 |       "\n",
1116 |       "\n",
1117 |       "The following NEW packages will be INSTALLED:\n",
1118 |       "\n",
1119 |       "  graphviz           pkgs/main/win-64::graphviz-2.38-hfd603c8_2\n",
1120 |       "  python-graphviz    pkgs/main/noarch::python-graphviz-0.16-pyhd3eb1b0_1\n",
1121 |       "\n",
1122 |       "The following packages will be UPDATED:\n",
1123 |       "\n",
1124 |       "  conda                               4.10.3-py38haa244fe_0 --> 4.10.3-py38haa244fe_2\n",
1125 |       "\n",
1126 |       "\n",
1127 |       "Preparing transaction: ...working... done\n",
1128 |       "Verifying transaction: ...working... failed\n"
1129 |      ]
1130 |     }
1131 |    ],
1132 |    "source": [
1133 |     "# your code here\n",
1134 |     "# Note: the visualization below needs pydotplus and graphviz; run these commands only if the libraries are not already installed\n",
1135 |     "!conda install -c conda-forge pydotplus -y\n",
1136 |     "!conda install -c conda-forge python-graphviz -y"
1137 |    ]
1138 |   },
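One possible answer to the practice question above, as a sketch in plain Python. The helper name `accuracy` is an assumption made for illustration; the five sample labels are the first five predictions and actual values printed earlier in the notebook.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    y_true = list(y_true)  # accepts lists, NumPy arrays, or pandas Series
    y_pred = list(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# First five actual and predicted labels, copied from the printed output above.
actual = ["drugY", "drugX", "drugX", "drugX", "drugX"]
predicted = ["drugY", "drugX", "drugX", "drugX", "drugX"]

sample_score = accuracy(actual, predicted)
print(sample_score)  # 1.0, since all five predictions match
```

In the notebook itself, `accuracy(y_testset, predTree)` should give the same value as `metrics.accuracy_score(y_testset, predTree)`.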
1139 |   {
1140 |    "cell_type": "markdown",
1141 |    "metadata": {},
1142 |    "source": [
1143 |     "## Visualization\n",
1144 |     "Let's visualize the tree."
1145 |    ]
1146 |   },
1147 |   {
1148 |    "cell_type": "code",
1149 |    "execution_count": 31,
1150 |    "metadata": {
1151 |     "button": false,
1152 |     "new_sheet": false,
1153 |     "run_control": {
1154 |      "read_only": false
1155 |     }
1156 |    },
1157 |    "outputs": [],
1158 |    "source": [
1159 |     "from io import StringIO  # standard library; the six package is not needed on Python 3\n",
1160 |     "import pydotplus\n",
1161 |     "import matplotlib.image as mpimg\n",
1162 |     "from sklearn import tree\n",
1163 |     "%matplotlib inline"
1164 |    ]
1165 |   },
1166 |   {
1167 |    "cell_type": "code",
1168 |    "execution_count": 36,
1169 |    "metadata": {
1170 |     "button": false,
1171 |     "new_sheet": false,
1172 |     "run_control": {
1173 |      "read_only": false
1174 |     },
1175 |     "scrolled": true
1176 |    },
1177 |    "outputs": [
1178 |     {
1179 |      "ename": "SyntaxError",
1180 |      "evalue": "not a PNG file (<string>)",
1181 |      "output_type": "error",
1182 |      "traceback": [
1183 |       "Traceback \u001b[1;36m(most recent call last)\u001b[0m:\n",
1184 |       "  File \u001b[0;32m\"C:\\ProgramData\\Anaconda3\\lib\\site-packages\\IPython\\core\\interactiveshell.py\"\u001b[0m, line \u001b[0;32m3437\u001b[0m, in \u001b[0;35mrun_code\u001b[0m\n    exec(code_obj, self.user_global_ns, self.user_ns)\n",
1185 |       "  File \u001b[0;32m\"<ipython-input-36-8a570924cb51>\"\u001b[0m, line \u001b[0;32m9\u001b[0m, in \u001b[0;35m<module>\u001b[0m\n    img = mpimg.imread(filename)\n",
1186 |       "  File \u001b[0;32m\"C:\\ProgramData\\Anaconda3\\lib\\site-packages\\matplotlib\\image.py\"\u001b[0m, line \u001b[0;32m1496\u001b[0m, in \u001b[0;35mimread\u001b[0m\n    with img_open(fname) as image:\n",
1187 |       "  File \u001b[0;32m\"C:\\ProgramData\\Anaconda3\\lib\\site-packages\\PIL\\ImageFile.py\"\u001b[0m, line \u001b[0;32m121\u001b[0m, in \u001b[0;35m__init__\u001b[0m\n    self._open()\n",
1188 |       "\u001b[1;36m  File \u001b[1;32m\"C:\\ProgramData\\Anaconda3\\lib\\site-packages\\PIL\\PngImagePlugin.py\"\u001b[1;36m, line \u001b[1;32m676\u001b[1;36m, in \u001b[1;35m_open\u001b[1;36m\u001b[0m\n\u001b[1;33m    raise SyntaxError(\"not a PNG file\")\u001b[0m\n",
1189 |       "\u001b[1;36m  File \u001b[1;32m\"<string>\"\u001b[1;36m, line \u001b[1;32munknown\u001b[0m\n\u001b[1;31mSyntaxError\u001b[0m\u001b[1;31m:\u001b[0m not a PNG file\n"
1190 |      ]
1191 |     }
1192 |    ],
1193 |    "source": [
1194 |     "dot_data = StringIO()  # in-memory buffer that receives the DOT description of the tree\n",
1195 |     "filename = \"drugtree.png\"\n",
1196 |     "featureNames = my_data.columns[0:5].tolist()\n",
1197 |     "classNames = np.unique(y_trainset).tolist()  # sorted, matching the order of drugTree.classes_\n",
1198 |     "tree.export_graphviz(drugTree, feature_names=featureNames, out_file=dot_data, class_names=classNames, filled=True, special_characters=True, rotate=False)\n",
1199 |     "graph = pydotplus.graph_from_dot_data(dot_data.getvalue())\n",
1200 |     "# write_png needs the Graphviz \"dot\" executable on PATH; the Python packages alone are not enough\n",
1201 |     "graph.write_png(filename)\n",
1202 |     "img = mpimg.imread(filename)\n",
1203 |     "plt.figure(figsize=(100, 200))\n",
1204 |     "plt.imshow(img, interpolation='nearest')"
1205 |    ]
1206 |   },
1207 |   {
1208 |    "cell_type": "markdown",
1209 |    "metadata": {
1210 |     "button": false,
1211 |     "new_sheet": false,
1212 |     "run_control": {
1213 |      "read_only": false
1214 |     }
1215 |    },
1216 |    "source": [
1217 |     "## Want to learn more?\n",
1218 |     "\n",
1219 |     "IBM SPSS Modeler is a comprehensive analytics platform that has many machine learning algorithms. It has been designed to bring predictive intelligence to decisions made by individuals, by groups, by systems – by your enterprise as a whole. A free trial is available through this course: [SPSS Modeler](http://cocl.us/ML0101EN-SPSSModeler).\n",
1220 |     "\n",
1221 |     "Also, you can use Watson Studio to run these notebooks faster with bigger datasets. Watson Studio is IBM's leading cloud solution for data scientists, built by data scientists. With Jupyter notebooks, RStudio, Apache Spark and popular libraries pre-packaged in the cloud, Watson Studio enables data scientists to collaborate on their projects without having to install anything. Join the fast-growing community of Watson Studio users today with a free account at [Watson Studio](https://cocl.us/ML0101EN_DSX)\n",
1222 |     "\n",
1223 |     "### Thanks for completing this lesson!\n",
1224 |     "\n",
1225 |     "Notebook created by: <a href = \"https://ca.linkedin.com/in/saeedaghabozorgi\">Saeed Aghabozorgi</a>\n",
1226 |     "\n",
1227 |     "<hr>\n",
1228 |     "Copyright &copy; 2018 [Cognitive Class](https://cocl.us/DX0108EN_CC). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/)."
1229 |    ]
1230 |   }
1231 |  ],
1232 |  "metadata": {
1233 |   "anaconda-cloud": {},
1234 |   "kernelspec": {
1235 |    "display_name": "Python 3",
1236 |    "language": "python",
1237 |    "name": "python3"
1238 |   },
1239 |   "language_info": {
1240 |    "codemirror_mode": {
1241 |     "name": "ipython",
1242 |     "version": 3
1243 |    },
1244 |    "file_extension": ".py",
1245 |    "mimetype": "text/x-python",
1246 |    "name": "python",
1247 |    "nbconvert_exporter": "python",
1248 |    "pygments_lexer": "ipython3",
1249 |    "version": "3.8.8"
1250 |   },
1251 |   "widgets": {
1252 |    "state": {},
1253 |    "version": "1.1.2"
1254 |   }
1255 |  },
1256 |  "nbformat": 4,
1257 |  "nbformat_minor": 2
1258 | }
1259 | 


--------------------------------------------------------------------------------
/drug200.csv:
--------------------------------------------------------------------------------
  1 | Age,Sex,BP,Cholesterol,Na_to_K,Drug
  2 | 23,F,HIGH,HIGH,25.355,drugY
  3 | 47,M,LOW,HIGH,13.093,drugC
  4 | 47,M,LOW,HIGH,10.114,drugC
  5 | 28,F,NORMAL,HIGH,7.798,drugX
  6 | 61,F,LOW,HIGH,18.043,drugY
  7 | 22,F,NORMAL,HIGH,8.607,drugX
  8 | 49,F,NORMAL,HIGH,16.275,drugY
  9 | 41,M,LOW,HIGH,11.037,drugC
 10 | 60,M,NORMAL,HIGH,15.171,drugY
 11 | 43,M,LOW,NORMAL,19.368,drugY
 12 | 47,F,LOW,HIGH,11.767,drugC
 13 | 34,F,HIGH,NORMAL,19.199,drugY
 14 | 43,M,LOW,HIGH,15.376,drugY
 15 | 74,F,LOW,HIGH,20.942,drugY
 16 | 50,F,NORMAL,HIGH,12.703,drugX
 17 | 16,F,HIGH,NORMAL,15.516,drugY
 18 | 69,M,LOW,NORMAL,11.455,drugX
 19 | 43,M,HIGH,HIGH,13.972,drugA
 20 | 23,M,LOW,HIGH,7.298,drugC
 21 | 32,F,HIGH,NORMAL,25.974,drugY
 22 | 57,M,LOW,NORMAL,19.128,drugY
 23 | 63,M,NORMAL,HIGH,25.917,drugY
 24 | 47,M,LOW,NORMAL,30.568,drugY
 25 | 48,F,LOW,HIGH,15.036,drugY
 26 | 33,F,LOW,HIGH,33.486,drugY
 27 | 28,F,HIGH,NORMAL,18.809,drugY
 28 | 31,M,HIGH,HIGH,30.366,drugY
 29 | 49,F,NORMAL,NORMAL,9.381,drugX
 30 | 39,F,LOW,NORMAL,22.697,drugY
 31 | 45,M,LOW,HIGH,17.951,drugY
 32 | 18,F,NORMAL,NORMAL,8.75,drugX
 33 | 74,M,HIGH,HIGH,9.567,drugB
 34 | 49,M,LOW,NORMAL,11.014,drugX
 35 | 65,F,HIGH,NORMAL,31.876,drugY
 36 | 53,M,NORMAL,HIGH,14.133,drugX
 37 | 46,M,NORMAL,NORMAL,7.285,drugX
 38 | 32,M,HIGH,NORMAL,9.445,drugA
 39 | 39,M,LOW,NORMAL,13.938,drugX
 40 | 39,F,NORMAL,NORMAL,9.709,drugX
 41 | 15,M,NORMAL,HIGH,9.084,drugX
 42 | 73,F,NORMAL,HIGH,19.221,drugY
 43 | 58,F,HIGH,NORMAL,14.239,drugB
 44 | 50,M,NORMAL,NORMAL,15.79,drugY
 45 | 23,M,NORMAL,HIGH,12.26,drugX
 46 | 50,F,NORMAL,NORMAL,12.295,drugX
 47 | 66,F,NORMAL,NORMAL,8.107,drugX
 48 | 37,F,HIGH,HIGH,13.091,drugA
 49 | 68,M,LOW,HIGH,10.291,drugC
 50 | 23,M,NORMAL,HIGH,31.686,drugY
 51 | 28,F,LOW,HIGH,19.796,drugY
 52 | 58,F,HIGH,HIGH,19.416,drugY
 53 | 67,M,NORMAL,NORMAL,10.898,drugX
 54 | 62,M,LOW,NORMAL,27.183,drugY
 55 | 24,F,HIGH,NORMAL,18.457,drugY
 56 | 68,F,HIGH,NORMAL,10.189,drugB
 57 | 26,F,LOW,HIGH,14.16,drugC
 58 | 65,M,HIGH,NORMAL,11.34,drugB
 59 | 40,M,HIGH,HIGH,27.826,drugY
 60 | 60,M,NORMAL,NORMAL,10.091,drugX
 61 | 34,M,HIGH,HIGH,18.703,drugY
 62 | 38,F,LOW,NORMAL,29.875,drugY
 63 | 24,M,HIGH,NORMAL,9.475,drugA
 64 | 67,M,LOW,NORMAL,20.693,drugY
 65 | 45,M,LOW,NORMAL,8.37,drugX
 66 | 60,F,HIGH,HIGH,13.303,drugB
 67 | 68,F,NORMAL,NORMAL,27.05,drugY
 68 | 29,M,HIGH,HIGH,12.856,drugA
 69 | 17,M,NORMAL,NORMAL,10.832,drugX
 70 | 54,M,NORMAL,HIGH,24.658,drugY
 71 | 18,F,HIGH,NORMAL,24.276,drugY
 72 | 70,M,HIGH,HIGH,13.967,drugB
 73 | 28,F,NORMAL,HIGH,19.675,drugY
 74 | 24,F,NORMAL,HIGH,10.605,drugX
 75 | 41,F,NORMAL,NORMAL,22.905,drugY
 76 | 31,M,HIGH,NORMAL,17.069,drugY
 77 | 26,M,LOW,NORMAL,20.909,drugY
 78 | 36,F,HIGH,HIGH,11.198,drugA
 79 | 26,F,HIGH,NORMAL,19.161,drugY
 80 | 19,F,HIGH,HIGH,13.313,drugA
 81 | 32,F,LOW,NORMAL,10.84,drugX
 82 | 60,M,HIGH,HIGH,13.934,drugB
 83 | 64,M,NORMAL,HIGH,7.761,drugX
 84 | 32,F,LOW,HIGH,9.712,drugC
 85 | 38,F,HIGH,NORMAL,11.326,drugA
 86 | 47,F,LOW,HIGH,10.067,drugC
 87 | 59,M,HIGH,HIGH,13.935,drugB
 88 | 51,F,NORMAL,HIGH,13.597,drugX
 89 | 69,M,LOW,HIGH,15.478,drugY
 90 | 37,F,HIGH,NORMAL,23.091,drugY
 91 | 50,F,NORMAL,NORMAL,17.211,drugY
 92 | 62,M,NORMAL,HIGH,16.594,drugY
 93 | 41,M,HIGH,NORMAL,15.156,drugY
 94 | 29,F,HIGH,HIGH,29.45,drugY
 95 | 42,F,LOW,NORMAL,29.271,drugY
 96 | 56,M,LOW,HIGH,15.015,drugY
 97 | 36,M,LOW,NORMAL,11.424,drugX
 98 | 58,F,LOW,HIGH,38.247,drugY
 99 | 56,F,HIGH,HIGH,25.395,drugY
100 | 20,M,HIGH,NORMAL,35.639,drugY
101 | 15,F,HIGH,NORMAL,16.725,drugY
102 | 31,M,HIGH,NORMAL,11.871,drugA
103 | 45,F,HIGH,HIGH,12.854,drugA
104 | 28,F,LOW,HIGH,13.127,drugC
105 | 56,M,NORMAL,HIGH,8.966,drugX
106 | 22,M,HIGH,NORMAL,28.294,drugY
107 | 37,M,LOW,NORMAL,8.968,drugX
108 | 22,M,NORMAL,HIGH,11.953,drugX
109 | 42,M,LOW,HIGH,20.013,drugY
110 | 72,M,HIGH,NORMAL,9.677,drugB
111 | 23,M,NORMAL,HIGH,16.85,drugY
112 | 50,M,HIGH,HIGH,7.49,drugA
113 | 47,F,NORMAL,NORMAL,6.683,drugX
114 | 35,M,LOW,NORMAL,9.17,drugX
115 | 65,F,LOW,NORMAL,13.769,drugX
116 | 20,F,NORMAL,NORMAL,9.281,drugX
117 | 51,M,HIGH,HIGH,18.295,drugY
118 | 67,M,NORMAL,NORMAL,9.514,drugX
119 | 40,F,NORMAL,HIGH,10.103,drugX
120 | 32,F,HIGH,NORMAL,10.292,drugA
121 | 61,F,HIGH,HIGH,25.475,drugY
122 | 28,M,NORMAL,HIGH,27.064,drugY
123 | 15,M,HIGH,NORMAL,17.206,drugY
124 | 34,M,NORMAL,HIGH,22.456,drugY
125 | 36,F,NORMAL,HIGH,16.753,drugY
126 | 53,F,HIGH,NORMAL,12.495,drugB
127 | 19,F,HIGH,NORMAL,25.969,drugY
128 | 66,M,HIGH,HIGH,16.347,drugY
129 | 35,M,NORMAL,NORMAL,7.845,drugX
130 | 47,M,LOW,NORMAL,33.542,drugY
131 | 32,F,NORMAL,HIGH,7.477,drugX
132 | 70,F,NORMAL,HIGH,20.489,drugY
133 | 52,M,LOW,NORMAL,32.922,drugY
134 | 49,M,LOW,NORMAL,13.598,drugX
135 | 24,M,NORMAL,HIGH,25.786,drugY
136 | 42,F,HIGH,HIGH,21.036,drugY
137 | 74,M,LOW,NORMAL,11.939,drugX
138 | 55,F,HIGH,HIGH,10.977,drugB
139 | 35,F,HIGH,HIGH,12.894,drugA
140 | 51,M,HIGH,NORMAL,11.343,drugB
141 | 69,F,NORMAL,HIGH,10.065,drugX
142 | 49,M,HIGH,NORMAL,6.269,drugA
143 | 64,F,LOW,NORMAL,25.741,drugY
144 | 60,M,HIGH,NORMAL,8.621,drugB
145 | 74,M,HIGH,NORMAL,15.436,drugY
146 | 39,M,HIGH,HIGH,9.664,drugA
147 | 61,M,NORMAL,HIGH,9.443,drugX
148 | 37,F,LOW,NORMAL,12.006,drugX
149 | 26,F,HIGH,NORMAL,12.307,drugA
150 | 61,F,LOW,NORMAL,7.34,drugX
151 | 22,M,LOW,HIGH,8.151,drugC
152 | 49,M,HIGH,NORMAL,8.7,drugA
153 | 68,M,HIGH,HIGH,11.009,drugB
154 | 55,M,NORMAL,NORMAL,7.261,drugX
155 | 72,F,LOW,NORMAL,14.642,drugX
156 | 37,M,LOW,NORMAL,16.724,drugY
157 | 49,M,LOW,HIGH,10.537,drugC
158 | 31,M,HIGH,NORMAL,11.227,drugA
159 | 53,M,LOW,HIGH,22.963,drugY
160 | 59,F,LOW,HIGH,10.444,drugC
161 | 34,F,LOW,NORMAL,12.923,drugX
162 | 30,F,NORMAL,HIGH,10.443,drugX
163 | 57,F,HIGH,NORMAL,9.945,drugB
164 | 43,M,NORMAL,NORMAL,12.859,drugX
165 | 21,F,HIGH,NORMAL,28.632,drugY
166 | 16,M,HIGH,NORMAL,19.007,drugY
167 | 38,M,LOW,HIGH,18.295,drugY
168 | 58,F,LOW,HIGH,26.645,drugY
169 | 57,F,NORMAL,HIGH,14.216,drugX
170 | 51,F,LOW,NORMAL,23.003,drugY
171 | 20,F,HIGH,HIGH,11.262,drugA
172 | 28,F,NORMAL,HIGH,12.879,drugX
173 | 45,M,LOW,NORMAL,10.017,drugX
174 | 39,F,NORMAL,NORMAL,17.225,drugY
175 | 41,F,LOW,NORMAL,18.739,drugY
176 | 42,M,HIGH,NORMAL,12.766,drugA
177 | 73,F,HIGH,HIGH,18.348,drugY
178 | 48,M,HIGH,NORMAL,10.446,drugA
179 | 25,M,NORMAL,HIGH,19.011,drugY
180 | 39,M,NORMAL,HIGH,15.969,drugY
181 | 67,F,NORMAL,HIGH,15.891,drugY
182 | 22,F,HIGH,NORMAL,22.818,drugY
183 | 59,F,NORMAL,HIGH,13.884,drugX
184 | 20,F,LOW,NORMAL,11.686,drugX
185 | 36,F,HIGH,NORMAL,15.49,drugY
186 | 18,F,HIGH,HIGH,37.188,drugY
187 | 57,F,NORMAL,NORMAL,25.893,drugY
188 | 70,M,HIGH,HIGH,9.849,drugB
189 | 47,M,HIGH,HIGH,10.403,drugA
190 | 65,M,HIGH,NORMAL,34.997,drugY
191 | 64,M,HIGH,NORMAL,20.932,drugY
192 | 58,M,HIGH,HIGH,18.991,drugY
193 | 23,M,HIGH,HIGH,8.011,drugA
194 | 72,M,LOW,HIGH,16.31,drugY
195 | 72,M,LOW,HIGH,6.769,drugC
196 | 46,F,HIGH,HIGH,34.686,drugY
197 | 56,F,LOW,HIGH,11.567,drugC
198 | 16,M,LOW,HIGH,12.006,drugC
199 | 52,M,NORMAL,HIGH,9.894,drugX
200 | 23,M,NORMAL,NORMAL,14.02,drugX
201 | 40,F,LOW,NORMAL,11.349,drugX


--------------------------------------------------------------------------------
	Age	Sex	BP	Cholesterol	Na_to_K	Drug
0	23	F	HIGH	HIGH	25.355	drugY
1	47	M	LOW	HIGH	13.093	drugC
2	47	M	LOW	HIGH	10.114	drugC
3	28	F	NORMAL	HIGH	7.798	drugX
4	61	F	LOW	HIGH	18.043	drugY
5	22	F	NORMAL	HIGH	8.607	drugX
6	49	F	NORMAL	HIGH	16.275	drugY
7	41	M	LOW	HIGH	11.037	drugC