├── Chapter01
│   ├── Breast Cancer Detection with SVM (Jupyter Notebook).html
│   └── Breast Cancer Detection with SVM (Jupyter Notebook).ipynb
├── Chapter02
│   ├── Deep Learning Grid Search (Jupyter Notebook).html
│   └── Deep Learning Grid Search (Jupyter Notebook).ipynb
├── Chapter03
│   └── chapter3.ipynb
├── Chapter04
│   └── Heart Disease Prediction with Neural Networks.ipynb
├── Chapter05
│   └── Autism Screening with Machine Learning.ipynb
├── LICENSE
└── README.md
/Chapter03/chapter3.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Classifying DNA Sequences\n",
8 | "### Presented by Eduonix\n",
9 | "\n",
10 | "During this tutorial, we will explore the world of bioinformatics by using Markov models, K-nearest neighbor (KNN) algorithms, support vector machines, and other common classifiers to classify short E. Coli DNA sequences. This project will use a dataset from the UCI Machine Learning Repository that has 106 DNA sequences, with 57 sequential nucleotides (“base-pairs”) each. \n",
11 | "\n",
12 | "You will learn how to:\n",
13 | "* Import data from the UCI repository\n",
14 | "* Convert text inputs to numerical data\n",
15 | "* Build and train classification algorithms\n",
16 | "* Compare and contrast classification algorithms\n",
17 | "\n",
18 | "## Step 1: Importing the Dataset\n",
19 | "\n",
20 | "The following code cells will import necessary libraries and import the dataset from the UCI repository as a Pandas DataFrame."
21 | ]
22 | },
23 | {
24 | "cell_type": "code",
25 | "execution_count": 1,
26 | "metadata": {},
27 | "outputs": [
28 | {
29 | "name": "stdout",
30 | "output_type": "stream",
31 | "text": [
32 | "Python: 2.7.13 |Continuum Analytics, Inc.| (default, May 11 2017, 13:17:26) [MSC v.1500 64 bit (AMD64)]\n",
33 | "Numpy: 1.14.0\n",
34 | "Sklearn: 0.19.1\n",
35 | "Pandas: 0.21.0\n"
36 | ]
37 | }
38 | ],
39 | "source": [
40 | "# To make sure all of the correct libraries are installed, import each module and print the version number\n",
41 | "\n",
42 | "import sys\n",
43 | "import numpy\n",
44 | "import sklearn\n",
45 | "import pandas\n",
46 | "\n",
47 | "print('Python: {}'.format(sys.version))\n",
48 | "print('Numpy: {}'.format(numpy.__version__))\n",
49 | "print('Sklearn: {}'.format(sklearn.__version__))\n",
50 | "print('Pandas: {}'.format(pandas.__version__))"
51 | ]
52 | },
53 | {
54 | "cell_type": "code",
55 | "execution_count": 2,
56 | "metadata": {
57 | "collapsed": true
58 | },
59 | "outputs": [],
60 | "source": [
61 | "# Import, change module names\n",
62 | "import numpy as np\n",
63 | "import pandas as pd\n",
64 | "\n",
65 | "# import the uci Molecular Biology (Promoter Gene Sequences) Data Set\n",
66 | "url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/molecular-biology/promoter-gene-sequences/promoters.data'\n",
67 | "names = ['Class', 'id', 'Sequence']\n",
68 | "data = pd.read_csv(url, names = names)"
69 | ]
70 | },
71 | {
72 | "cell_type": "code",
73 | "execution_count": 3,
74 | "metadata": {},
75 | "outputs": [
76 | {
77 | "name": "stdout",
78 | "output_type": "stream",
79 | "text": [
80 | "Class +\n",
81 | "id S10\n",
82 | "Sequence \\t\\ttactagcaatacgcttgcgttcggtggttaagtatgtataat...\n",
83 | "Name: 0, dtype: object\n"
84 | ]
85 | }
86 | ],
87 | "source": [
88 | "print(data.iloc[0])"
89 | ]
90 | },
91 | {
92 | "cell_type": "markdown",
93 | "metadata": {},
94 | "source": [
95 | "## Step 2: Preprocessing the Dataset\n",
96 | "\n",
97 | "The data is not in a usable form; as a result, we will need to process it before using it to train our algorithms."
98 | ]
99 | },
100 | {
101 | "cell_type": "code",
102 | "execution_count": 4,
103 | "metadata": {},
104 | "outputs": [
105 | {
106 | "name": "stdout",
107 | "output_type": "stream",
108 | "text": [
109 | "0 +\n",
110 | "1 +\n",
111 | "2 +\n",
112 | "3 +\n",
113 | "4 +\n",
114 | "Name: Class, dtype: object\n"
115 | ]
116 | }
117 | ],
118 | "source": [
119 | "# Building our Dataset by creating a custom Pandas DataFrame\n",
120 | "# Each column in a DataFrame is called a Series. Lets start by making a series for each column.\n",
121 | "\n",
122 | "classes = data.loc[:, 'Class']\n",
123 | "print(classes[:5])"
124 | ]
125 | },
126 | {
127 | "cell_type": "code",
128 | "execution_count": 5,
129 | "metadata": {},
130 | "outputs": [
131 | {
132 | "name": "stdout",
133 | "output_type": "stream",
134 | "text": [
135 | "['t', 'a', 'c', 't', 'a', 'g', 'c', 'a', 'a', 't', 'a', 'c', 'g', 'c', 't', 't', 'g', 'c', 'g', 't', 't', 'c', 'g', 'g', 't', 'g', 'g', 't', 't', 'a', 'a', 'g', 't', 'a', 't', 'g', 't', 'a', 't', 'a', 'a', 't', 'g', 'c', 'g', 'c', 'g', 'g', 'g', 'c', 't', 't', 'g', 't', 'c', 'g', 't', '+']\n"
136 | ]
137 | }
138 | ],
139 | "source": [
140 | "# generate list of DNA sequences\n",
141 | "sequences = list(data.loc[:, 'Sequence'])\n",
142 | "dataset = {}\n",
143 | "\n",
144 | "# loop through sequences and split into individual nucleotides\n",
145 | "for i, seq in enumerate(sequences):\n",
146 | " \n",
147 | " # split into nucleotides, remove tab characters\n",
148 | " nucleotides = list(seq)\n",
149 | " nucleotides = [x for x in nucleotides if x != '\\t']\n",
150 | " \n",
151 | " # append class assignment\n",
152 | " nucleotides.append(classes[i])\n",
153 | " \n",
154 | " # add to dataset\n",
155 | " dataset[i] = nucleotides\n",
156 | " \n",
157 | "print(dataset[0])"
158 | ]
159 | },
160 | {
161 | "cell_type": "code",
162 | "execution_count": 6,
163 | "metadata": {},
164 | "outputs": [
165 | {
166 | "name": "stdout",
167 | "output_type": "stream",
168 | "text": [
169 | " 0 1 2 3 4 5 6 7 8 9 ... 96 97 98 99 100 101 102 \\\n",
170 | "0 t t g a t a c t c t ... c c t a g c g \n",
171 | "1 a g t a c g a t g t ... c g a g a c t \n",
172 | "2 c c a t g g g t a t ... g c t a g t a \n",
173 | "3 t t c t a g g c c t ... a t g g a c t \n",
174 | "4 a a t g t g g t t a ... g a a g g a t \n",
175 | "5 g t a t a c g a t a ... t g c g c a c \n",
176 | "6 c c g g a a g c a a ... a g c t a t t \n",
177 | "7 a c a a t a t a a t ... g a g g t g c \n",
178 | "8 a t g t t g g a t t ... a c a t g g a \n",
179 | "9 t g a g a g g a a t ... c t a a t c a \n",
180 | "10 a a a t a a a a t c ... c t c c c c c \n",
181 | "11 c c c g c g g c a c ... c t g t a t a \n",
182 | "12 g a t t t g g a c t ... t c a c g c a \n",
183 | "13 c g a a a a a c t c ... t t g c c t g \n",
184 | "14 t t g t t t t t g t ... a t t a c a a \n",
185 | "15 t t t c t g t t c t ... g g c a t a t \n",
186 | "16 g g g g g g t g g g ... a t a g c a t \n",
187 | "17 c t c a a a a a a t ... g t a a g c a \n",
188 | "18 g c a a c a a t c c ... a g t a a g a \n",
189 | "19 t a t g g a g a a a ... g a c g c g c \n",
190 | "20 t c t t a g c c g g ... c t a a a g c \n",
191 | "21 c g a g a a c t g g ... a t g g a t g \n",
192 | "22 g c g t a g a g a c ... t t a g c c a \n",
193 | "23 g t c g a g t t c c ... g t c a t t c \n",
194 | "24 t g t t g t c a g g ... t c c a t t a \n",
195 | "25 g a t t c t t t t g ... c c g g g g g \n",
196 | "26 g t a g t g c g c a ... a a c a c a a \n",
197 | "27 t t t c g c c a c a ... g t t t a g t \n",
198 | "28 t g t g a c t g g t ... c g t g t g t \n",
199 | "29 a g t g a g g c t a ... c c t a a g c \n",
200 | "30 a t t a a t a a t a ... t g g g a g a \n",
201 | "31 g g t g a a t t c c ... c g a g a t a \n",
202 | "32 t t t t c t g a t t ... g t c c t t t \n",
203 | "33 a c t a c a a c g c ... a g t t g t c \n",
204 | "34 t g g g a a c a t c ... c t c a c t t \n",
205 | "35 g t t a c a g g g c ... a t t g t t c \n",
206 | "36 t t t t t g c t t t ... a t g a t t g \n",
207 | "37 a a a g a a a a a a ... c t g c t g t \n",
208 | "38 t c t t g a t t a t ... t g t g c c g \n",
209 | "39 a a c t a a a a a a ... t c a t t t g \n",
210 | "40 a a a a a c g a t a ... g g t c t g a \n",
211 | "41 t t t g t t t t c t ... c c t t g a t \n",
212 | "42 g c g a g a c t g g ... a a a c t a g \n",
213 | "43 c t c a c g a g c c ... t a c t a a g \n",
214 | "44 g a t t g a g c a g ... a t t g g g a \n",
215 | "45 c a a a c g c t a c ... a g g c a g c \n",
216 | "46 g c a c c t c t t c ... a t t a c a g \n",
217 | "47 g g c t t c c c g a ... t t g t g g t \n",
218 | "48 g c c a c c a a a c ... g a a g t g t \n",
219 | "49 c a a a c g t a a c ... c a a g g a c \n",
220 | "50 t t c c g t c c a a ... t t c a c a a \n",
221 | "51 t c c a t t a a t c ... t c a g c c a \n",
222 | "52 g g c a g t t g g t ... t g t t c t c \n",
223 | "53 t c g a g a g a g g ... c c t a t a a \n",
224 | "54 c c g c t g a a t a ... t t a t a t t \n",
225 | "55 g a c t a g a c t c ... t t t g c a t \n",
226 | "56 t a g c g t t a t a ... g t t a g t g \n",
227 | "57 + + + + + + + + + + ... - - - - - - - \n",
228 | "\n",
229 | " 103 104 105 \n",
230 | "0 c c t \n",
231 | "1 g t a \n",
232 | "2 c c a \n",
233 | "3 g g c \n",
234 | "4 a t a \n",
235 | "5 c c t \n",
236 | "6 t c t \n",
237 | "7 a t a \n",
238 | "8 c c a \n",
239 | "9 g a t \n",
240 | "10 a a a \n",
241 | "11 t t a \n",
242 | "12 g g a \n",
243 | "13 a g t \n",
244 | "14 g c a \n",
245 | "15 a c a \n",
246 | "16 t t g \n",
247 | "17 g c g \n",
248 | "18 c t a \n",
249 | "19 c a g \n",
250 | "20 t a g \n",
251 | "21 g a c \n",
252 | "22 a c t \n",
253 | "23 g g c \n",
254 | "24 t g t \n",
255 | "25 g g a \n",
256 | "26 c t a \n",
257 | "27 t c t \n",
258 | "28 t t g \n",
259 | "29 c t g \n",
260 | "30 c g c \n",
261 | "31 g a a \n",
262 | "32 t g c \n",
263 | "33 t g t \n",
264 | "34 a g c \n",
265 | "35 c g a \n",
266 | "36 t t t \n",
267 | "37 g t t \n",
268 | "38 g t a \n",
269 | "39 a t g \n",
270 | "40 t t c \n",
271 | "41 t t c \n",
272 | "42 g g a \n",
273 | "43 t c a \n",
274 | "44 c t t \n",
275 | "45 a g c \n",
276 | "46 c a a \n",
277 | "47 c a a \n",
278 | "48 a a t \n",
279 | "49 a g c \n",
280 | "50 g g a \n",
281 | "51 g a a \n",
282 | "52 c g g \n",
283 | "53 t g a \n",
284 | "54 t a a \n",
285 | "55 c a c \n",
286 | "56 c c t \n",
287 | "57 - - - \n",
288 | "\n",
289 | "[58 rows x 106 columns]\n"
290 | ]
291 | }
292 | ],
293 | "source": [
294 | "# turn dataset into pandas DataFrame\n",
295 | "dframe = pd.DataFrame(dataset)\n",
296 | "print(dframe)"
297 | ]
298 | },
299 | {
300 | "cell_type": "code",
301 | "execution_count": 7,
302 | "metadata": {},
303 | "outputs": [
304 | {
305 | "name": "stdout",
306 | "output_type": "stream",
307 | "text": [
308 | " 0 1 2 3 4 5 6 7 8 9 ... 48 49 50 51 52 53 54 55 56 57\n",
309 | "0 t a c t a g c a a t ... g c t t g t c g t +\n",
310 | "1 t g c t a t c c t g ... c a t c g c c a a +\n",
311 | "2 g t a c t a g a g a ... c a c c c g g c g +\n",
312 | "3 a a t t g t g a t g ... a a c a a a c t c +\n",
313 | "4 t c g a t a a t t a ... c c g t g g t a g +\n",
314 | "\n",
315 | "[5 rows x 58 columns]\n"
316 | ]
317 | }
318 | ],
319 | "source": [
320 | "# transpose the DataFrame\n",
321 | "df = dframe.transpose()\n",
322 | "print(df.iloc[:5])"
323 | ]
324 | },
325 | {
326 | "cell_type": "code",
327 | "execution_count": 8,
328 | "metadata": {},
329 | "outputs": [
330 | {
331 | "name": "stdout",
332 | "output_type": "stream",
333 | "text": [
334 | " 0 1 2 3 4 5 6 7 8 9 ... 48 49 50 51 52 53 54 55 56 Class\n",
335 | "0 t a c t a g c a a t ... g c t t g t c g t +\n",
336 | "1 t g c t a t c c t g ... c a t c g c c a a +\n",
337 | "2 g t a c t a g a g a ... c a c c c g g c g +\n",
338 | "3 a a t t g t g a t g ... a a c a a a c t c +\n",
339 | "4 t c g a t a a t t a ... c c g t g g t a g +\n",
340 | "\n",
341 | "[5 rows x 58 columns]\n"
342 | ]
343 | }
344 | ],
345 | "source": [
346 | "# for clarity, lets rename the last dataframe column to class\n",
347 | "df.rename(columns = {57: 'Class'}, inplace = True) \n",
348 | "print(df.iloc[:5])"
349 | ]
350 | },
351 | {
352 | "cell_type": "code",
353 | "execution_count": 9,
354 | "metadata": {},
355 | "outputs": [
356 | {
357 | "data": {
358 | "text/html": [
359 | "
\n",
360 | "\n",
373 | "
\n",
374 | " \n",
375 | " \n",
376 | " | \n",
377 | " 0 | \n",
378 | " 1 | \n",
379 | " 2 | \n",
380 | " 3 | \n",
381 | " 4 | \n",
382 | " 5 | \n",
383 | " 6 | \n",
384 | " 7 | \n",
385 | " 8 | \n",
386 | " 9 | \n",
387 | " ... | \n",
388 | " 48 | \n",
389 | " 49 | \n",
390 | " 50 | \n",
391 | " 51 | \n",
392 | " 52 | \n",
393 | " 53 | \n",
394 | " 54 | \n",
395 | " 55 | \n",
396 | " 56 | \n",
397 | " Class | \n",
398 | "
\n",
399 | " \n",
400 | " \n",
401 | " \n",
402 | " count | \n",
403 | " 106 | \n",
404 | " 106 | \n",
405 | " 106 | \n",
406 | " 106 | \n",
407 | " 106 | \n",
408 | " 106 | \n",
409 | " 106 | \n",
410 | " 106 | \n",
411 | " 106 | \n",
412 | " 106 | \n",
413 | " ... | \n",
414 | " 106 | \n",
415 | " 106 | \n",
416 | " 106 | \n",
417 | " 106 | \n",
418 | " 106 | \n",
419 | " 106 | \n",
420 | " 106 | \n",
421 | " 106 | \n",
422 | " 106 | \n",
423 | " 106 | \n",
424 | "
\n",
425 | " \n",
426 | " unique | \n",
427 | " 4 | \n",
428 | " 4 | \n",
429 | " 4 | \n",
430 | " 4 | \n",
431 | " 4 | \n",
432 | " 4 | \n",
433 | " 4 | \n",
434 | " 4 | \n",
435 | " 4 | \n",
436 | " 4 | \n",
437 | " ... | \n",
438 | " 4 | \n",
439 | " 4 | \n",
440 | " 4 | \n",
441 | " 4 | \n",
442 | " 4 | \n",
443 | " 4 | \n",
444 | " 4 | \n",
445 | " 4 | \n",
446 | " 4 | \n",
447 | " 2 | \n",
448 | "
\n",
449 | " \n",
450 | " top | \n",
451 | " t | \n",
452 | " a | \n",
453 | " a | \n",
454 | " c | \n",
455 | " a | \n",
456 | " a | \n",
457 | " a | \n",
458 | " a | \n",
459 | " a | \n",
460 | " a | \n",
461 | " ... | \n",
462 | " c | \n",
463 | " c | \n",
464 | " c | \n",
465 | " t | \n",
466 | " t | \n",
467 | " c | \n",
468 | " c | \n",
469 | " t | \n",
470 | " t | \n",
471 | " - | \n",
472 | "
\n",
473 | " \n",
474 | " freq | \n",
475 | " 38 | \n",
476 | " 34 | \n",
477 | " 30 | \n",
478 | " 30 | \n",
479 | " 36 | \n",
480 | " 42 | \n",
481 | " 38 | \n",
482 | " 34 | \n",
483 | " 33 | \n",
484 | " 36 | \n",
485 | " ... | \n",
486 | " 36 | \n",
487 | " 42 | \n",
488 | " 31 | \n",
489 | " 33 | \n",
490 | " 35 | \n",
491 | " 32 | \n",
492 | " 29 | \n",
493 | " 29 | \n",
494 | " 34 | \n",
495 | " 53 | \n",
496 | "
\n",
497 | " \n",
498 | "
\n",
499 | "
4 rows × 58 columns
\n",
500 | "
"
501 | ],
502 | "text/plain": [
503 | " 0 1 2 3 4 5 6 7 8 9 ... 48 49 50 \\\n",
504 | "count 106 106 106 106 106 106 106 106 106 106 ... 106 106 106 \n",
505 | "unique 4 4 4 4 4 4 4 4 4 4 ... 4 4 4 \n",
506 | "top t a a c a a a a a a ... c c c \n",
507 | "freq 38 34 30 30 36 42 38 34 33 36 ... 36 42 31 \n",
508 | "\n",
509 | " 51 52 53 54 55 56 Class \n",
510 | "count 106 106 106 106 106 106 106 \n",
511 | "unique 4 4 4 4 4 4 2 \n",
512 | "top t t c c t t - \n",
513 | "freq 33 35 32 29 29 34 53 \n",
514 | "\n",
515 | "[4 rows x 58 columns]"
516 | ]
517 | },
518 | "execution_count": 9,
519 | "metadata": {},
520 | "output_type": "execute_result"
521 | }
522 | ],
523 | "source": [
524 | "# looks good! Let's start to familiarize ourselves with the dataset so we can pick the most suitable \n",
525 | "# algorithms for this data\n",
526 | "\n",
527 | "df.describe()"
528 | ]
529 | },
530 | {
531 | "cell_type": "code",
532 | "execution_count": 10,
533 | "metadata": {},
534 | "outputs": [
535 | {
536 | "name": "stdout",
537 | "output_type": "stream",
538 | "text": [
539 | " 0 1 2 3 4 5 6 7 8 9 ... 48 \\\n",
540 | "+ NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN \n",
541 | "- NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN \n",
542 | "a 26.0 34.0 30.0 22.0 36.0 42.0 38.0 34.0 33.0 36.0 ... 23.0 \n",
543 | "c 27.0 22.0 21.0 30.0 19.0 18.0 21.0 20.0 22.0 22.0 ... 36.0 \n",
544 | "g 15.0 24.0 28.0 28.0 29.0 22.0 17.0 20.0 19.0 20.0 ... 26.0 \n",
545 | "t 38.0 26.0 27.0 26.0 22.0 24.0 30.0 32.0 32.0 28.0 ... 21.0 \n",
546 | "\n",
547 | " 49 50 51 52 53 54 55 56 Class \n",
548 | "+ NaN NaN NaN NaN NaN NaN NaN NaN 53.0 \n",
549 | "- NaN NaN NaN NaN NaN NaN NaN NaN 53.0 \n",
550 | "a 24.0 28.0 27.0 25.0 22.0 26.0 24.0 27.0 NaN \n",
551 | "c 42.0 31.0 32.0 21.0 32.0 29.0 29.0 17.0 NaN \n",
552 | "g 18.0 24.0 14.0 25.0 22.0 28.0 24.0 28.0 NaN \n",
553 | "t 22.0 23.0 33.0 35.0 30.0 23.0 29.0 34.0 NaN \n",
554 | "\n",
555 | "[6 rows x 58 columns]\n"
556 | ]
557 | }
558 | ],
559 | "source": [
560 | "# desribe does not tell us enough information since the attributes are text. Lets record value counts for each sequence\n",
561 | "series = []\n",
562 | "for name in df.columns:\n",
563 | " series.append(df[name].value_counts())\n",
564 | " \n",
565 | "info = pd.DataFrame(series)\n",
566 | "details = info.transpose()\n",
567 | "print(details)"
568 | ]
569 | },
570 | {
571 | "cell_type": "code",
572 | "execution_count": 11,
573 | "metadata": {},
574 | "outputs": [
575 | {
576 | "data": {
577 | "text/html": [
578 | "\n",
579 | "\n",
592 | "
\n",
593 | " \n",
594 | " \n",
595 | " | \n",
596 | " 0_a | \n",
597 | " 0_c | \n",
598 | " 0_g | \n",
599 | " 0_t | \n",
600 | " 1_a | \n",
601 | " 1_c | \n",
602 | " 1_g | \n",
603 | " 1_t | \n",
604 | " 2_a | \n",
605 | " 2_c | \n",
606 | " ... | \n",
607 | " 55_a | \n",
608 | " 55_c | \n",
609 | " 55_g | \n",
610 | " 55_t | \n",
611 | " 56_a | \n",
612 | " 56_c | \n",
613 | " 56_g | \n",
614 | " 56_t | \n",
615 | " Class_+ | \n",
616 | " Class_- | \n",
617 | "
\n",
618 | " \n",
619 | " \n",
620 | " \n",
621 | " 0 | \n",
622 | " 0 | \n",
623 | " 0 | \n",
624 | " 0 | \n",
625 | " 1 | \n",
626 | " 1 | \n",
627 | " 0 | \n",
628 | " 0 | \n",
629 | " 0 | \n",
630 | " 0 | \n",
631 | " 1 | \n",
632 | " ... | \n",
633 | " 0 | \n",
634 | " 0 | \n",
635 | " 1 | \n",
636 | " 0 | \n",
637 | " 0 | \n",
638 | " 0 | \n",
639 | " 0 | \n",
640 | " 1 | \n",
641 | " 1 | \n",
642 | " 0 | \n",
643 | "
\n",
644 | " \n",
645 | " 1 | \n",
646 | " 0 | \n",
647 | " 0 | \n",
648 | " 0 | \n",
649 | " 1 | \n",
650 | " 0 | \n",
651 | " 0 | \n",
652 | " 1 | \n",
653 | " 0 | \n",
654 | " 0 | \n",
655 | " 1 | \n",
656 | " ... | \n",
657 | " 1 | \n",
658 | " 0 | \n",
659 | " 0 | \n",
660 | " 0 | \n",
661 | " 1 | \n",
662 | " 0 | \n",
663 | " 0 | \n",
664 | " 0 | \n",
665 | " 1 | \n",
666 | " 0 | \n",
667 | "
\n",
668 | " \n",
669 | " 2 | \n",
670 | " 0 | \n",
671 | " 0 | \n",
672 | " 1 | \n",
673 | " 0 | \n",
674 | " 0 | \n",
675 | " 0 | \n",
676 | " 0 | \n",
677 | " 1 | \n",
678 | " 1 | \n",
679 | " 0 | \n",
680 | " ... | \n",
681 | " 0 | \n",
682 | " 1 | \n",
683 | " 0 | \n",
684 | " 0 | \n",
685 | " 0 | \n",
686 | " 0 | \n",
687 | " 1 | \n",
688 | " 0 | \n",
689 | " 1 | \n",
690 | " 0 | \n",
691 | "
\n",
692 | " \n",
693 | " 3 | \n",
694 | " 1 | \n",
695 | " 0 | \n",
696 | " 0 | \n",
697 | " 0 | \n",
698 | " 1 | \n",
699 | " 0 | \n",
700 | " 0 | \n",
701 | " 0 | \n",
702 | " 0 | \n",
703 | " 0 | \n",
704 | " ... | \n",
705 | " 0 | \n",
706 | " 0 | \n",
707 | " 0 | \n",
708 | " 1 | \n",
709 | " 0 | \n",
710 | " 1 | \n",
711 | " 0 | \n",
712 | " 0 | \n",
713 | " 1 | \n",
714 | " 0 | \n",
715 | "
\n",
716 | " \n",
717 | " 4 | \n",
718 | " 0 | \n",
719 | " 0 | \n",
720 | " 0 | \n",
721 | " 1 | \n",
722 | " 0 | \n",
723 | " 1 | \n",
724 | " 0 | \n",
725 | " 0 | \n",
726 | " 0 | \n",
727 | " 0 | \n",
728 | " ... | \n",
729 | " 1 | \n",
730 | " 0 | \n",
731 | " 0 | \n",
732 | " 0 | \n",
733 | " 0 | \n",
734 | " 0 | \n",
735 | " 1 | \n",
736 | " 0 | \n",
737 | " 1 | \n",
738 | " 0 | \n",
739 | "
\n",
740 | " \n",
741 | "
\n",
742 | "
5 rows × 230 columns
\n",
743 | "
"
744 | ],
745 | "text/plain": [
746 | " 0_a 0_c 0_g 0_t 1_a 1_c 1_g 1_t 2_a 2_c ... 55_a 55_c \\\n",
747 | "0 0 0 0 1 1 0 0 0 0 1 ... 0 0 \n",
748 | "1 0 0 0 1 0 0 1 0 0 1 ... 1 0 \n",
749 | "2 0 0 1 0 0 0 0 1 1 0 ... 0 1 \n",
750 | "3 1 0 0 0 1 0 0 0 0 0 ... 0 0 \n",
751 | "4 0 0 0 1 0 1 0 0 0 0 ... 1 0 \n",
752 | "\n",
753 | " 55_g 55_t 56_a 56_c 56_g 56_t Class_+ Class_- \n",
754 | "0 1 0 0 0 0 1 1 0 \n",
755 | "1 0 0 1 0 0 0 1 0 \n",
756 | "2 0 0 0 0 1 0 1 0 \n",
757 | "3 0 1 0 1 0 0 1 0 \n",
758 | "4 0 0 0 0 1 0 1 0 \n",
759 | "\n",
760 | "[5 rows x 230 columns]"
761 | ]
762 | },
763 | "execution_count": 11,
764 | "metadata": {},
765 | "output_type": "execute_result"
766 | }
767 | ],
768 | "source": [
769 | "# Unfortunately, we can't run machine learning algorithms on the data in 'String' formats. As a result, we need to switch\n",
770 | "# it to numerical data. This can easily be accomplished using the pd.get_dummies() function\n",
771 | "numerical_df = pd.get_dummies(df)\n",
772 | "numerical_df.iloc[:5]"
773 | ]
774 | },
775 | {
776 | "cell_type": "code",
777 | "execution_count": 12,
778 | "metadata": {},
779 | "outputs": [
780 | {
781 | "name": "stdout",
782 | "output_type": "stream",
783 | "text": [
784 | " 0_a 0_c 0_g 0_t 1_a 1_c 1_g 1_t 2_a 2_c ... 54_t 55_a 55_c \\\n",
785 | "0 0 0 0 1 1 0 0 0 0 1 ... 0 0 0 \n",
786 | "1 0 0 0 1 0 0 1 0 0 1 ... 0 1 0 \n",
787 | "2 0 0 1 0 0 0 0 1 1 0 ... 0 0 1 \n",
788 | "3 1 0 0 0 1 0 0 0 0 0 ... 0 0 0 \n",
789 | "4 0 0 0 1 0 1 0 0 0 0 ... 1 1 0 \n",
790 | "\n",
791 | " 55_g 55_t 56_a 56_c 56_g 56_t Class \n",
792 | "0 1 0 0 0 0 1 1 \n",
793 | "1 0 0 1 0 0 0 1 \n",
794 | "2 0 0 0 0 1 0 1 \n",
795 | "3 0 1 0 1 0 0 1 \n",
796 | "4 0 0 0 0 1 0 1 \n",
797 | "\n",
798 | "[5 rows x 229 columns]\n"
799 | ]
800 | }
801 | ],
802 | "source": [
803 | "# We don't need both class columns. Lets drop one then rename the other to simply 'Class'.\n",
804 | "df = numerical_df.drop(columns=['Class_-'])\n",
805 | "\n",
806 | "df.rename(columns = {'Class_+': 'Class'}, inplace = True)\n",
807 | "print(df.iloc[:5])"
808 | ]
809 | },
810 | {
811 | "cell_type": "code",
812 | "execution_count": 13,
813 | "metadata": {},
814 | "outputs": [],
815 | "source": [
816 | "# Use the model_selection module to separate training and testing datasets\n",
817 | "from sklearn import model_selection\n",
818 | "\n",
819 | "# Create X and Y datasets for training\n",
820 | "X = np.array(df.drop(['Class'], 1))\n",
821 | "y = np.array(df['Class'])\n",
822 | "\n",
823 | "# define seed for reproducibility\n",
824 | "seed = 1\n",
825 | "\n",
826 | "# split data into training and testing datasets\n",
827 | "X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.25, random_state=seed)\n"
828 | ]
829 | },
830 | {
831 | "cell_type": "markdown",
832 | "metadata": {},
833 | "source": [
834 | "## Step 3: Training and Testing the Classification Algorithms\n",
835 | "\n",
836 | "Now that we have preprocessed the data and built our training and testing datasets, we can start to deploy different classification algorithms. It's relatively easy to test multiple models; as a result, we will compare and contrast the performance of ten different algorithms."
837 | ]
838 | },
839 | {
840 | "cell_type": "code",
841 | "execution_count": 14,
842 | "metadata": {},
843 | "outputs": [
844 | {
845 | "name": "stdout",
846 | "output_type": "stream",
847 | "text": [
848 | "Nearest Neighbors: 0.823214 (0.113908)\n",
849 | "Gaussian Process: 0.873214 (0.056158)\n",
850 | "Decision Tree: 0.750000 (0.185405)\n",
851 | "Random Forest: 0.580357 (0.106021)\n"
852 | ]
853 | },
854 | {
855 | "name": "stderr",
856 | "output_type": "stream",
857 | "text": [
858 | "C:\\Programdata\\anaconda2\\lib\\site-packages\\sklearn\\neural_network\\multilayer_perceptron.py:564: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.\n",
859 | " % self.max_iter, ConvergenceWarning)\n"
860 | ]
861 | },
862 | {
863 | "name": "stdout",
864 | "output_type": "stream",
865 | "text": [
866 | "Neural Net: 0.887500 (0.087500)\n",
867 | "AdaBoost: 0.912500 (0.112500)\n",
868 | "Naive Bayes: 0.837500 (0.137500)\n",
869 | "SVM Linear: 0.850000 (0.108972)\n",
870 | "SVM RBF: 0.737500 (0.117925)\n",
871 | "SVM Sigmoid: 0.569643 (0.159209)\n"
872 | ]
873 | }
874 | ],
875 | "source": [
876 | "# Now that we have our dataset, we can start building algorithms! We'll need to import each algorithm we plan on using\n",
877 | "# from sklearn. We also need to import some performance metrics, such as accuracy_score and classification_report.\n",
878 | "\n",
879 | "from sklearn.neighbors import KNeighborsClassifier\n",
880 | "from sklearn.neural_network import MLPClassifier\n",
881 | "from sklearn.gaussian_process import GaussianProcessClassifier\n",
882 | "from sklearn.gaussian_process.kernels import RBF\n",
883 | "from sklearn.tree import DecisionTreeClassifier\n",
884 | "from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier\n",
885 | "from sklearn.naive_bayes import GaussianNB\n",
886 | "from sklearn.svm import SVC\n",
887 | "from sklearn.metrics import classification_report, accuracy_score\n",
888 | "\n",
889 | "# define scoring method\n",
890 | "scoring = 'accuracy'\n",
891 | "\n",
892 | "# Define models to train\n",
893 | "names = [\"Nearest Neighbors\", \"Gaussian Process\",\n",
894 | " \"Decision Tree\", \"Random Forest\", \"Neural Net\", \"AdaBoost\",\n",
895 | " \"Naive Bayes\", \"SVM Linear\", \"SVM RBF\", \"SVM Sigmoid\"]\n",
896 | "\n",
897 | "classifiers = [\n",
898 | " KNeighborsClassifier(n_neighbors = 3),\n",
899 | " GaussianProcessClassifier(1.0 * RBF(1.0)),\n",
900 | " DecisionTreeClassifier(max_depth=5),\n",
901 | " RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1),\n",
902 | " MLPClassifier(alpha=1),\n",
903 | " AdaBoostClassifier(),\n",
904 | " GaussianNB(),\n",
905 | " SVC(kernel = 'linear'), \n",
906 | " SVC(kernel = 'rbf'),\n",
907 | " SVC(kernel = 'sigmoid')\n",
908 | "]\n",
909 | "\n",
910 | "models = zip(names, classifiers)\n",
911 | "\n",
912 | "# evaluate each model in turn\n",
913 | "results = []\n",
914 | "names = []\n",
915 | "\n",
916 | "for name, model in models:\n",
917 | " kfold = model_selection.KFold(n_splits=10, random_state = seed)\n",
918 | " cv_results = model_selection.cross_val_score(model, X_train, y_train, cv=kfold, scoring=scoring)\n",
919 | " results.append(cv_results)\n",
920 | " names.append(name)\n",
921 | " msg = \"%s: %f (%f)\" % (name, cv_results.mean(), cv_results.std())\n",
922 | " print(msg)"
923 | ]
924 | },
925 | {
926 | "cell_type": "code",
927 | "execution_count": 15,
928 | "metadata": {},
929 | "outputs": [
930 | {
931 | "name": "stdout",
932 | "output_type": "stream",
933 | "text": [
934 | "Nearest Neighbors\n",
935 | "0.7777777777777778\n",
936 | " precision recall f1-score support\n",
937 | "\n",
938 | " 0 1.00 0.65 0.79 17\n",
939 | " 1 0.62 1.00 0.77 10\n",
940 | "\n",
941 | "avg / total 0.86 0.78 0.78 27\n",
942 | "\n",
943 | "Gaussian Process\n",
944 | "0.8888888888888888\n",
945 | " precision recall f1-score support\n",
946 | "\n",
947 | " 0 1.00 0.82 0.90 17\n",
948 | " 1 0.77 1.00 0.87 10\n",
949 | "\n",
950 | "avg / total 0.91 0.89 0.89 27\n",
951 | "\n",
952 | "Decision Tree\n",
953 | "0.7777777777777778\n",
954 | " precision recall f1-score support\n",
955 | "\n",
956 | " 0 1.00 0.65 0.79 17\n",
957 | " 1 0.62 1.00 0.77 10\n",
958 | "\n",
959 | "avg / total 0.86 0.78 0.78 27\n",
960 | "\n",
961 | "Random Forest\n",
962 | "0.5925925925925926\n",
963 | " precision recall f1-score support\n",
964 | "\n",
965 | " 0 0.88 0.41 0.56 17\n",
966 | " 1 0.47 0.90 0.62 10\n",
967 | "\n",
968 | "avg / total 0.73 0.59 0.58 27\n",
969 | "\n",
970 | "Neural Net\n",
971 | "0.9259259259259259\n",
972 | " precision recall f1-score support\n",
973 | "\n",
974 | " 0 1.00 0.88 0.94 17\n",
975 | " 1 0.83 1.00 0.91 10\n",
976 | "\n",
977 | "avg / total 0.94 0.93 0.93 27\n",
978 | "\n",
979 | "AdaBoost\n",
980 | "0.8518518518518519\n",
981 | " precision recall f1-score support\n",
982 | "\n",
983 | " 0 1.00 0.76 0.87 17\n",
984 | " 1 0.71 1.00 0.83 10\n",
985 | "\n",
986 | "avg / total 0.89 0.85 0.85 27\n",
987 | "\n",
988 | "Naive Bayes\n",
989 | "0.9259259259259259\n",
990 | " precision recall f1-score support\n",
991 | "\n",
992 | " 0 1.00 0.88 0.94 17\n",
993 | " 1 0.83 1.00 0.91 10\n",
994 | "\n",
995 | "avg / total 0.94 0.93 0.93 27\n",
996 | "\n",
997 | "SVM Linear\n",
998 | "0.9629629629629629\n",
999 | " precision recall f1-score support\n",
1000 | "\n",
1001 | " 0 1.00 0.94 0.97 17\n",
1002 | " 1 0.91 1.00 0.95 10\n",
1003 | "\n",
1004 | "avg / total 0.97 0.96 0.96 27\n",
1005 | "\n",
1006 | "SVM RBF\n",
1007 | "0.7777777777777778\n",
1008 | " precision recall f1-score support\n",
1009 | "\n",
1010 | " 0 1.00 0.65 0.79 17\n",
1011 | " 1 0.62 1.00 0.77 10\n",
1012 | "\n",
1013 | "avg / total 0.86 0.78 0.78 27\n",
1014 | "\n",
1015 | "SVM Sigmoid\n",
1016 | "0.4444444444444444\n",
1017 | " precision recall f1-score support\n",
1018 | "\n",
1019 | " 0 1.00 0.12 0.21 17\n",
1020 | " 1 0.40 1.00 0.57 10\n",
1021 | "\n",
1022 | "avg / total 0.78 0.44 0.34 27\n",
1023 | "\n"
1024 | ]
1025 | }
1026 | ],
1027 | "source": [
1028 | "# Remember, performance on the training data is not that important. We want to know how well our algorithms\n",
1029 | "# can generalize to new data. To test this, let's make predictions on the validation dataset.\n",
1030 | "\n",
1031 | "for name, model in models:\n",
1032 | " model.fit(X_train, y_train)\n",
1033 | " predictions = model.predict(X_test)\n",
1034 | " print(name)\n",
1035 | " print(accuracy_score(y_test, predictions))\n",
1036 | " print(classification_report(y_test, predictions))\n",
1037 | " \n",
1038 | "# Accuracy - ratio of correctly predicted observation to the total observations. \n",
1039 | "# Precision - (false positives) ratio of correctly predicted positive observations to the total predicted positive observations\n",
1040 | "# Recall (Sensitivity) - (false negatives) ratio of correctly predicted positive observations to the all observations in actual class - yes.\n",
1041 | "# F1 score - F1 Score is the weighted average of Precision and Recall. Therefore, this score takes both false positives and false "
1042 | ]
1043 | },
1044 | {
1045 | "cell_type": "code",
1046 | "execution_count": null,
1047 | "metadata": {
1048 | "collapsed": true
1049 | },
1050 | "outputs": [],
1051 | "source": []
1052 | }
1053 | ],
1054 | "metadata": {
1055 | "kernelspec": {
1056 | "display_name": "Python [default]",
1057 | "language": "python",
1058 | "name": "python2"
1059 | },
1060 | "language_info": {
1061 | "codemirror_mode": {
1062 | "name": "ipython",
1063 | "version": 2
1064 | },
1065 | "file_extension": ".py",
1066 | "mimetype": "text/x-python",
1067 | "name": "python",
1068 | "nbconvert_exporter": "python",
1069 | "pygments_lexer": "ipython2",
1070 | "version": "2.7.13"
1071 | }
1072 | },
1073 | "nbformat": 4,
1074 | "nbformat_minor": 2
1075 | }
1076 |
--------------------------------------------------------------------------------
/Chapter05/Autism Screening with Machine Learning.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "### Childhood Autistic Spectrum Disorder Screening using Machine Learning\n",
8 | "\n",
9 | "The early diagnosis of neurodevelopment disorders can improve treatment and significantly decrease the associated \n",
10 | "healthcare costs. In this project, we will use supervised learning to diagnose Autistic Spectrum Disorder \n",
11 | "(ASD) based on behavioural features and individual characteristics. More specifically, we will build and deploy a neural network using the Keras API. \n",
12 | "\n",
13 | "This project will use a dataset provided by the UCI Machine Learning Repository that contains screening data for 292 patients. The dataset can be found at the following URL: \n",
14 | "https://archive.ics.uci.edu/ml/datasets/Autistic+Spectrum+Disorder+Screening+Data+for+Children++\n",
15 | "\n",
16 | "Let's dive right in! First, we will import a few of libraries we will use in this project. "
17 | ]
18 | },
19 | {
20 | "cell_type": "code",
21 | "execution_count": 1,
22 | "metadata": {},
23 | "outputs": [
24 | {
25 | "name": "stderr",
26 | "output_type": "stream",
27 | "text": [
28 | "Using Theano backend.\n",
29 | "WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.\n"
30 | ]
31 | },
32 | {
33 | "name": "stdout",
34 | "output_type": "stream",
35 | "text": [
36 | "Python: 2.7.13 |Continuum Analytics, Inc.| (default, May 11 2017, 13:17:26) [MSC v.1500 64 bit (AMD64)]\n",
37 | "Pandas: 0.21.0\n",
38 | "Sklearn: 0.19.1\n",
39 | "Keras: 2.1.4\n"
40 | ]
41 | }
42 | ],
43 | "source": [
44 | "import sys\n",
45 | "import pandas as pd\n",
46 | "import sklearn\n",
47 | "import keras\n",
48 | "\n",
49 | "print 'Python: {}'.format(sys.version)\n",
50 | "print 'Pandas: {}'.format(pd.__version__)\n",
51 | "print 'Sklearn: {}'.format(sklearn.__version__)\n",
52 | "print 'Keras: {}'.format(keras.__version__)"
53 | ]
54 | },
55 | {
56 | "cell_type": "markdown",
57 | "metadata": {},
58 | "source": [
59 | "### 1. Importing the Dataset\n",
60 | "\n",
61 | "We will obtain the data from the UCI Machine Learning Repository; however, since the data isn't contained in a csv or txt file, we will have to download the compressed zip file and then extract the data manually. Once that is accomplished, we will read the information in from a text file using Pandas. "
62 | ]
63 | },
64 | {
65 | "cell_type": "code",
66 | "execution_count": 2,
67 | "metadata": {},
68 | "outputs": [],
69 | "source": [
70 | "# import the dataset\n",
71 | "file = 'C:/users/brend/tutorial/autism-data.txt'\n",
72 | "\n",
73 | "# read the csv\n",
74 | "data = pd.read_table(file, sep = ',', index_col = None)"
75 | ]
76 | },
77 | {
78 | "cell_type": "code",
79 | "execution_count": 3,
80 | "metadata": {},
81 | "outputs": [
82 | {
83 | "name": "stdout",
84 | "output_type": "stream",
85 | "text": [
86 | "Shape of DataFrame: (292, 21)\n",
87 | "A1_Score 1\n",
88 | "A2_Score 1\n",
89 | "A3_Score 0\n",
90 | "A4_Score 0\n",
91 | "A5_Score 1\n",
92 | "A6_Score 1\n",
93 | "A7_Score 0\n",
94 | "A8_Score 1\n",
95 | "A9_Score 0\n",
96 | "A10_Score 0\n",
97 | "age 6\n",
98 | "gender m\n",
99 | "ethnicity Others\n",
100 | "jundice no\n",
101 | "family_history_of_PDD no\n",
102 | "contry_of_res Jordan\n",
103 | "used_app_before no\n",
104 | "result 5\n",
105 | "age_desc '4-11 years'\n",
106 | "relation Parent\n",
107 | "class NO\n",
108 | "Name: 0, dtype: object\n"
109 | ]
110 | }
111 | ],
112 | "source": [
113 | "# print the shape of the DataFrame, so we can see how many examples we have\n",
114 | "print 'Shape of DataFrame: {}'.format(data.shape)\n",
115 | "print data.loc[0]"
116 | ]
117 | },
118 | {
119 | "cell_type": "code",
120 | "execution_count": 4,
121 | "metadata": {},
122 | "outputs": [
123 | {
124 | "data": {
125 | "text/html": [
126 | "\n",
127 | "\n",
140 | "
\n",
141 | " \n",
142 | " \n",
143 | " | \n",
144 | " A1_Score | \n",
145 | " A2_Score | \n",
146 | " A3_Score | \n",
147 | " A4_Score | \n",
148 | " A5_Score | \n",
149 | " A6_Score | \n",
150 | " A7_Score | \n",
151 | " A8_Score | \n",
152 | " A9_Score | \n",
153 | " A10_Score | \n",
154 | " ... | \n",
155 | " gender | \n",
156 | " ethnicity | \n",
157 | " jundice | \n",
158 | " family_history_of_PDD | \n",
159 | " contry_of_res | \n",
160 | " used_app_before | \n",
161 | " result | \n",
162 | " age_desc | \n",
163 | " relation | \n",
164 | " class | \n",
165 | "
\n",
166 | " \n",
167 | " \n",
168 | " \n",
169 | " 0 | \n",
170 | " 1 | \n",
171 | " 1 | \n",
172 | " 0 | \n",
173 | " 0 | \n",
174 | " 1 | \n",
175 | " 1 | \n",
176 | " 0 | \n",
177 | " 1 | \n",
178 | " 0 | \n",
179 | " 0 | \n",
180 | " ... | \n",
181 | " m | \n",
182 | " Others | \n",
183 | " no | \n",
184 | " no | \n",
185 | " Jordan | \n",
186 | " no | \n",
187 | " 5 | \n",
188 | " '4-11 years' | \n",
189 | " Parent | \n",
190 | " NO | \n",
191 | "
\n",
192 | " \n",
193 | " 1 | \n",
194 | " 1 | \n",
195 | " 1 | \n",
196 | " 0 | \n",
197 | " 0 | \n",
198 | " 1 | \n",
199 | " 1 | \n",
200 | " 0 | \n",
201 | " 1 | \n",
202 | " 0 | \n",
203 | " 0 | \n",
204 | " ... | \n",
205 | " m | \n",
206 | " 'Middle Eastern ' | \n",
207 | " no | \n",
208 | " no | \n",
209 | " Jordan | \n",
210 | " no | \n",
211 | " 5 | \n",
212 | " '4-11 years' | \n",
213 | " Parent | \n",
214 | " NO | \n",
215 | "
\n",
216 | " \n",
217 | " 2 | \n",
218 | " 1 | \n",
219 | " 1 | \n",
220 | " 0 | \n",
221 | " 0 | \n",
222 | " 0 | \n",
223 | " 1 | \n",
224 | " 1 | \n",
225 | " 1 | \n",
226 | " 0 | \n",
227 | " 0 | \n",
228 | " ... | \n",
229 | " m | \n",
230 | " ? | \n",
231 | " no | \n",
232 | " no | \n",
233 | " Jordan | \n",
234 | " yes | \n",
235 | " 5 | \n",
236 | " '4-11 years' | \n",
237 | " ? | \n",
238 | " NO | \n",
239 | "
\n",
240 | " \n",
241 | " 3 | \n",
242 | " 0 | \n",
243 | " 1 | \n",
244 | " 0 | \n",
245 | " 0 | \n",
246 | " 1 | \n",
247 | " 1 | \n",
248 | " 0 | \n",
249 | " 0 | \n",
250 | " 0 | \n",
251 | " 1 | \n",
252 | " ... | \n",
253 | " f | \n",
254 | " ? | \n",
255 | " yes | \n",
256 | " no | \n",
257 | " Jordan | \n",
258 | " no | \n",
259 | " 4 | \n",
260 | " '4-11 years' | \n",
261 | " ? | \n",
262 | " NO | \n",
263 | "
\n",
264 | " \n",
265 | " 4 | \n",
266 | " 1 | \n",
267 | " 1 | \n",
268 | " 1 | \n",
269 | " 1 | \n",
270 | " 1 | \n",
271 | " 1 | \n",
272 | " 1 | \n",
273 | " 1 | \n",
274 | " 1 | \n",
275 | " 1 | \n",
276 | " ... | \n",
277 | " m | \n",
278 | " Others | \n",
279 | " yes | \n",
280 | " no | \n",
281 | " 'United States' | \n",
282 | " no | \n",
283 | " 10 | \n",
284 | " '4-11 years' | \n",
285 | " Parent | \n",
286 | " YES | \n",
287 | "
\n",
288 | " \n",
289 | " 5 | \n",
290 | " 0 | \n",
291 | " 0 | \n",
292 | " 1 | \n",
293 | " 0 | \n",
294 | " 1 | \n",
295 | " 1 | \n",
296 | " 0 | \n",
297 | " 1 | \n",
298 | " 0 | \n",
299 | " 1 | \n",
300 | " ... | \n",
301 | " m | \n",
302 | " ? | \n",
303 | " no | \n",
304 | " yes | \n",
305 | " Egypt | \n",
306 | " no | \n",
307 | " 5 | \n",
308 | " '4-11 years' | \n",
309 | " ? | \n",
310 | " NO | \n",
311 | "
\n",
312 | " \n",
313 | " 6 | \n",
314 | " 1 | \n",
315 | " 0 | \n",
316 | " 1 | \n",
317 | " 1 | \n",
318 | " 1 | \n",
319 | " 1 | \n",
320 | " 0 | \n",
321 | " 1 | \n",
322 | " 0 | \n",
323 | " 1 | \n",
324 | " ... | \n",
325 | " m | \n",
326 | " White-European | \n",
327 | " no | \n",
328 | " no | \n",
329 | " 'United Kingdom' | \n",
330 | " no | \n",
331 | " 7 | \n",
332 | " '4-11 years' | \n",
333 | " Parent | \n",
334 | " YES | \n",
335 | "
\n",
336 | " \n",
337 | " 7 | \n",
338 | " 1 | \n",
339 | " 1 | \n",
340 | " 1 | \n",
341 | " 1 | \n",
342 | " 1 | \n",
343 | " 1 | \n",
344 | " 1 | \n",
345 | " 1 | \n",
346 | " 0 | \n",
347 | " 0 | \n",
348 | " ... | \n",
349 | " f | \n",
350 | " 'Middle Eastern ' | \n",
351 | " no | \n",
352 | " no | \n",
353 | " Bahrain | \n",
354 | " no | \n",
355 | " 8 | \n",
356 | " '4-11 years' | \n",
357 | " Parent | \n",
358 | " YES | \n",
359 | "
\n",
360 | " \n",
361 | " 8 | \n",
362 | " 1 | \n",
363 | " 1 | \n",
364 | " 1 | \n",
365 | " 1 | \n",
366 | " 1 | \n",
367 | " 1 | \n",
368 | " 1 | \n",
369 | " 0 | \n",
370 | " 0 | \n",
371 | " 0 | \n",
372 | " ... | \n",
373 | " f | \n",
374 | " 'Middle Eastern ' | \n",
375 | " no | \n",
376 | " no | \n",
377 | " Bahrain | \n",
378 | " no | \n",
379 | " 7 | \n",
380 | " '4-11 years' | \n",
381 | " Parent | \n",
382 | " YES | \n",
383 | "
\n",
384 | " \n",
385 | " 9 | \n",
386 | " 0 | \n",
387 | " 0 | \n",
388 | " 1 | \n",
389 | " 1 | \n",
390 | " 1 | \n",
391 | " 0 | \n",
392 | " 1 | \n",
393 | " 1 | \n",
394 | " 0 | \n",
395 | " 0 | \n",
396 | " ... | \n",
397 | " f | \n",
398 | " ? | \n",
399 | " no | \n",
400 | " yes | \n",
401 | " Austria | \n",
402 | " no | \n",
403 | " 5 | \n",
404 | " '4-11 years' | \n",
405 | " ? | \n",
406 | " NO | \n",
407 | "
\n",
408 | " \n",
409 | " 10 | \n",
410 | " 1 | \n",
411 | " 0 | \n",
412 | " 0 | \n",
413 | " 0 | \n",
414 | " 1 | \n",
415 | " 1 | \n",
416 | " 1 | \n",
417 | " 1 | \n",
418 | " 1 | \n",
419 | " 1 | \n",
420 | " ... | \n",
421 | " m | \n",
422 | " White-European | \n",
423 | " yes | \n",
424 | " no | \n",
425 | " 'United Kingdom' | \n",
426 | " no | \n",
427 | " 7 | \n",
428 | " '4-11 years' | \n",
429 | " Self | \n",
430 | " YES | \n",
431 | "
\n",
432 | " \n",
433 | "
\n",
434 | "
11 rows × 21 columns
\n",
435 | "
"
436 | ],
437 | "text/plain": [
438 | " A1_Score A2_Score A3_Score A4_Score A5_Score A6_Score A7_Score \\\n",
439 | "0 1 1 0 0 1 1 0 \n",
440 | "1 1 1 0 0 1 1 0 \n",
441 | "2 1 1 0 0 0 1 1 \n",
442 | "3 0 1 0 0 1 1 0 \n",
443 | "4 1 1 1 1 1 1 1 \n",
444 | "5 0 0 1 0 1 1 0 \n",
445 | "6 1 0 1 1 1 1 0 \n",
446 | "7 1 1 1 1 1 1 1 \n",
447 | "8 1 1 1 1 1 1 1 \n",
448 | "9 0 0 1 1 1 0 1 \n",
449 | "10 1 0 0 0 1 1 1 \n",
450 | "\n",
451 | " A8_Score A9_Score A10_Score ... gender ethnicity jundice \\\n",
452 | "0 1 0 0 ... m Others no \n",
453 | "1 1 0 0 ... m 'Middle Eastern ' no \n",
454 | "2 1 0 0 ... m ? no \n",
455 | "3 0 0 1 ... f ? yes \n",
456 | "4 1 1 1 ... m Others yes \n",
457 | "5 1 0 1 ... m ? no \n",
458 | "6 1 0 1 ... m White-European no \n",
459 | "7 1 0 0 ... f 'Middle Eastern ' no \n",
460 | "8 0 0 0 ... f 'Middle Eastern ' no \n",
461 | "9 1 0 0 ... f ? no \n",
462 | "10 1 1 1 ... m White-European yes \n",
463 | "\n",
464 | " family_history_of_PDD contry_of_res used_app_before result \\\n",
465 | "0 no Jordan no 5 \n",
466 | "1 no Jordan no 5 \n",
467 | "2 no Jordan yes 5 \n",
468 | "3 no Jordan no 4 \n",
469 | "4 no 'United States' no 10 \n",
470 | "5 yes Egypt no 5 \n",
471 | "6 no 'United Kingdom' no 7 \n",
472 | "7 no Bahrain no 8 \n",
473 | "8 no Bahrain no 7 \n",
474 | "9 yes Austria no 5 \n",
475 | "10 no 'United Kingdom' no 7 \n",
476 | "\n",
477 | " age_desc relation class \n",
478 | "0 '4-11 years' Parent NO \n",
479 | "1 '4-11 years' Parent NO \n",
480 | "2 '4-11 years' ? NO \n",
481 | "3 '4-11 years' ? NO \n",
482 | "4 '4-11 years' Parent YES \n",
483 | "5 '4-11 years' ? NO \n",
484 | "6 '4-11 years' Parent YES \n",
485 | "7 '4-11 years' Parent YES \n",
486 | "8 '4-11 years' Parent YES \n",
487 | "9 '4-11 years' ? NO \n",
488 | "10 '4-11 years' Self YES \n",
489 | "\n",
490 | "[11 rows x 21 columns]"
491 | ]
492 | },
493 | "execution_count": 4,
494 | "metadata": {},
495 | "output_type": "execute_result"
496 | }
497 | ],
498 | "source": [
499 | "# print out multiple patients at the same time\n",
500 | "data.loc[:10]"
501 | ]
502 | },
503 | {
504 | "cell_type": "code",
505 | "execution_count": 5,
506 | "metadata": {},
507 | "outputs": [
508 | {
509 | "data": {
510 | "text/html": [
511 | "\n",
512 | "\n",
525 | "
\n",
526 | " \n",
527 | " \n",
528 | " | \n",
529 | " A1_Score | \n",
530 | " A2_Score | \n",
531 | " A3_Score | \n",
532 | " A4_Score | \n",
533 | " A5_Score | \n",
534 | " A6_Score | \n",
535 | " A7_Score | \n",
536 | " A8_Score | \n",
537 | " A9_Score | \n",
538 | " A10_Score | \n",
539 | " result | \n",
540 | "
\n",
541 | " \n",
542 | " \n",
543 | " \n",
544 | " count | \n",
545 | " 292.000000 | \n",
546 | " 292.000000 | \n",
547 | " 292.000000 | \n",
548 | " 292.000000 | \n",
549 | " 292.000000 | \n",
550 | " 292.000000 | \n",
551 | " 292.000000 | \n",
552 | " 292.000000 | \n",
553 | " 292.000000 | \n",
554 | " 292.000000 | \n",
555 | " 292.000000 | \n",
556 | "
\n",
557 | " \n",
558 | " mean | \n",
559 | " 0.633562 | \n",
560 | " 0.534247 | \n",
561 | " 0.743151 | \n",
562 | " 0.551370 | \n",
563 | " 0.743151 | \n",
564 | " 0.712329 | \n",
565 | " 0.606164 | \n",
566 | " 0.496575 | \n",
567 | " 0.493151 | \n",
568 | " 0.726027 | \n",
569 | " 6.239726 | \n",
570 | "
\n",
571 | " \n",
572 | " std | \n",
573 | " 0.482658 | \n",
574 | " 0.499682 | \n",
575 | " 0.437646 | \n",
576 | " 0.498208 | \n",
577 | " 0.437646 | \n",
578 | " 0.453454 | \n",
579 | " 0.489438 | \n",
580 | " 0.500847 | \n",
581 | " 0.500811 | \n",
582 | " 0.446761 | \n",
583 | " 2.284882 | \n",
584 | "
\n",
585 | " \n",
586 | " min | \n",
587 | " 0.000000 | \n",
588 | " 0.000000 | \n",
589 | " 0.000000 | \n",
590 | " 0.000000 | \n",
591 | " 0.000000 | \n",
592 | " 0.000000 | \n",
593 | " 0.000000 | \n",
594 | " 0.000000 | \n",
595 | " 0.000000 | \n",
596 | " 0.000000 | \n",
597 | " 0.000000 | \n",
598 | "
\n",
599 | " \n",
600 | " 25% | \n",
601 | " 0.000000 | \n",
602 | " 0.000000 | \n",
603 | " 0.000000 | \n",
604 | " 0.000000 | \n",
605 | " 0.000000 | \n",
606 | " 0.000000 | \n",
607 | " 0.000000 | \n",
608 | " 0.000000 | \n",
609 | " 0.000000 | \n",
610 | " 0.000000 | \n",
611 | " 5.000000 | \n",
612 | "
\n",
613 | " \n",
614 | " 50% | \n",
615 | " 1.000000 | \n",
616 | " 1.000000 | \n",
617 | " 1.000000 | \n",
618 | " 1.000000 | \n",
619 | " 1.000000 | \n",
620 | " 1.000000 | \n",
621 | " 1.000000 | \n",
622 | " 0.000000 | \n",
623 | " 0.000000 | \n",
624 | " 1.000000 | \n",
625 | " 6.000000 | \n",
626 | "
\n",
627 | " \n",
628 | " 75% | \n",
629 | " 1.000000 | \n",
630 | " 1.000000 | \n",
631 | " 1.000000 | \n",
632 | " 1.000000 | \n",
633 | " 1.000000 | \n",
634 | " 1.000000 | \n",
635 | " 1.000000 | \n",
636 | " 1.000000 | \n",
637 | " 1.000000 | \n",
638 | " 1.000000 | \n",
639 | " 8.000000 | \n",
640 | "
\n",
641 | " \n",
642 | " max | \n",
643 | " 1.000000 | \n",
644 | " 1.000000 | \n",
645 | " 1.000000 | \n",
646 | " 1.000000 | \n",
647 | " 1.000000 | \n",
648 | " 1.000000 | \n",
649 | " 1.000000 | \n",
650 | " 1.000000 | \n",
651 | " 1.000000 | \n",
652 | " 1.000000 | \n",
653 | " 10.000000 | \n",
654 | "
\n",
655 | " \n",
656 | "
\n",
657 | "
"
658 | ],
659 | "text/plain": [
660 | " A1_Score A2_Score A3_Score A4_Score A5_Score A6_Score \\\n",
661 | "count 292.000000 292.000000 292.000000 292.000000 292.000000 292.000000 \n",
662 | "mean 0.633562 0.534247 0.743151 0.551370 0.743151 0.712329 \n",
663 | "std 0.482658 0.499682 0.437646 0.498208 0.437646 0.453454 \n",
664 | "min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
665 | "25% 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
666 | "50% 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 \n",
667 | "75% 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 \n",
668 | "max 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 \n",
669 | "\n",
670 | " A7_Score A8_Score A9_Score A10_Score result \n",
671 | "count 292.000000 292.000000 292.000000 292.000000 292.000000 \n",
672 | "mean 0.606164 0.496575 0.493151 0.726027 6.239726 \n",
673 | "std 0.489438 0.500847 0.500811 0.446761 2.284882 \n",
674 | "min 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
675 | "25% 0.000000 0.000000 0.000000 0.000000 5.000000 \n",
676 | "50% 1.000000 0.000000 0.000000 1.000000 6.000000 \n",
677 | "75% 1.000000 1.000000 1.000000 1.000000 8.000000 \n",
678 | "max 1.000000 1.000000 1.000000 1.000000 10.000000 "
679 | ]
680 | },
681 | "execution_count": 5,
682 | "metadata": {},
683 | "output_type": "execute_result"
684 | }
685 | ],
686 | "source": [
687 | "# print out a description of the dataframe\n",
688 | "data.describe()"
689 | ]
690 | },
691 | {
692 | "cell_type": "markdown",
693 | "metadata": {},
694 | "source": [
695 | "### 2. Data Preprocessing\n",
696 | "\n",
697 | "This dataset is going to require multiple preprocessing steps. First, we have columns in our DataFrame (attributes) that we don't want to use when training our neural network. We will drop these columns first. Secondly, much of our data is reported using strings; as a result, we will convert our data to categorical labels. During our preprocessing, we will also split the dataset into X and Y datasets, where X has all of the attributes we want to use for prediction and Y has the class labels. "
698 | ]
699 | },
700 | {
701 | "cell_type": "code",
702 | "execution_count": 6,
703 | "metadata": {},
704 | "outputs": [],
705 | "source": [
706 | "# drop unwanted columns\n",
707 | "data = data.drop(['result', 'age_desc'], axis=1)"
708 | ]
709 | },
710 | {
711 | "cell_type": "code",
712 | "execution_count": 7,
713 | "metadata": {},
714 | "outputs": [
715 | {
716 | "data": {
717 | "text/html": [
718 | "\n",
719 | "\n",
732 | "
\n",
733 | " \n",
734 | " \n",
735 | " | \n",
736 | " A1_Score | \n",
737 | " A2_Score | \n",
738 | " A3_Score | \n",
739 | " A4_Score | \n",
740 | " A5_Score | \n",
741 | " A6_Score | \n",
742 | " A7_Score | \n",
743 | " A8_Score | \n",
744 | " A9_Score | \n",
745 | " A10_Score | \n",
746 | " age | \n",
747 | " gender | \n",
748 | " ethnicity | \n",
749 | " jundice | \n",
750 | " family_history_of_PDD | \n",
751 | " contry_of_res | \n",
752 | " used_app_before | \n",
753 | " relation | \n",
754 | " class | \n",
755 | "
\n",
756 | " \n",
757 | " \n",
758 | " \n",
759 | " 0 | \n",
760 | " 1 | \n",
761 | " 1 | \n",
762 | " 0 | \n",
763 | " 0 | \n",
764 | " 1 | \n",
765 | " 1 | \n",
766 | " 0 | \n",
767 | " 1 | \n",
768 | " 0 | \n",
769 | " 0 | \n",
770 | " 6 | \n",
771 | " m | \n",
772 | " Others | \n",
773 | " no | \n",
774 | " no | \n",
775 | " Jordan | \n",
776 | " no | \n",
777 | " Parent | \n",
778 | " NO | \n",
779 | "
\n",
780 | " \n",
781 | " 1 | \n",
782 | " 1 | \n",
783 | " 1 | \n",
784 | " 0 | \n",
785 | " 0 | \n",
786 | " 1 | \n",
787 | " 1 | \n",
788 | " 0 | \n",
789 | " 1 | \n",
790 | " 0 | \n",
791 | " 0 | \n",
792 | " 6 | \n",
793 | " m | \n",
794 | " 'Middle Eastern ' | \n",
795 | " no | \n",
796 | " no | \n",
797 | " Jordan | \n",
798 | " no | \n",
799 | " Parent | \n",
800 | " NO | \n",
801 | "
\n",
802 | " \n",
803 | " 2 | \n",
804 | " 1 | \n",
805 | " 1 | \n",
806 | " 0 | \n",
807 | " 0 | \n",
808 | " 0 | \n",
809 | " 1 | \n",
810 | " 1 | \n",
811 | " 1 | \n",
812 | " 0 | \n",
813 | " 0 | \n",
814 | " 6 | \n",
815 | " m | \n",
816 | " ? | \n",
817 | " no | \n",
818 | " no | \n",
819 | " Jordan | \n",
820 | " yes | \n",
821 | " ? | \n",
822 | " NO | \n",
823 | "
\n",
824 | " \n",
825 | " 3 | \n",
826 | " 0 | \n",
827 | " 1 | \n",
828 | " 0 | \n",
829 | " 0 | \n",
830 | " 1 | \n",
831 | " 1 | \n",
832 | " 0 | \n",
833 | " 0 | \n",
834 | " 0 | \n",
835 | " 1 | \n",
836 | " 5 | \n",
837 | " f | \n",
838 | " ? | \n",
839 | " yes | \n",
840 | " no | \n",
841 | " Jordan | \n",
842 | " no | \n",
843 | " ? | \n",
844 | " NO | \n",
845 | "
\n",
846 | " \n",
847 | " 4 | \n",
848 | " 1 | \n",
849 | " 1 | \n",
850 | " 1 | \n",
851 | " 1 | \n",
852 | " 1 | \n",
853 | " 1 | \n",
854 | " 1 | \n",
855 | " 1 | \n",
856 | " 1 | \n",
857 | " 1 | \n",
858 | " 5 | \n",
859 | " m | \n",
860 | " Others | \n",
861 | " yes | \n",
862 | " no | \n",
863 | " 'United States' | \n",
864 | " no | \n",
865 | " Parent | \n",
866 | " YES | \n",
867 | "
\n",
868 | " \n",
869 | " 5 | \n",
870 | " 0 | \n",
871 | " 0 | \n",
872 | " 1 | \n",
873 | " 0 | \n",
874 | " 1 | \n",
875 | " 1 | \n",
876 | " 0 | \n",
877 | " 1 | \n",
878 | " 0 | \n",
879 | " 1 | \n",
880 | " 4 | \n",
881 | " m | \n",
882 | " ? | \n",
883 | " no | \n",
884 | " yes | \n",
885 | " Egypt | \n",
886 | " no | \n",
887 | " ? | \n",
888 | " NO | \n",
889 | "
\n",
890 | " \n",
891 | " 6 | \n",
892 | " 1 | \n",
893 | " 0 | \n",
894 | " 1 | \n",
895 | " 1 | \n",
896 | " 1 | \n",
897 | " 1 | \n",
898 | " 0 | \n",
899 | " 1 | \n",
900 | " 0 | \n",
901 | " 1 | \n",
902 | " 5 | \n",
903 | " m | \n",
904 | " White-European | \n",
905 | " no | \n",
906 | " no | \n",
907 | " 'United Kingdom' | \n",
908 | " no | \n",
909 | " Parent | \n",
910 | " YES | \n",
911 | "
\n",
912 | " \n",
913 | " 7 | \n",
914 | " 1 | \n",
915 | " 1 | \n",
916 | " 1 | \n",
917 | " 1 | \n",
918 | " 1 | \n",
919 | " 1 | \n",
920 | " 1 | \n",
921 | " 1 | \n",
922 | " 0 | \n",
923 | " 0 | \n",
924 | " 5 | \n",
925 | " f | \n",
926 | " 'Middle Eastern ' | \n",
927 | " no | \n",
928 | " no | \n",
929 | " Bahrain | \n",
930 | " no | \n",
931 | " Parent | \n",
932 | " YES | \n",
933 | "
\n",
934 | " \n",
935 | " 8 | \n",
936 | " 1 | \n",
937 | " 1 | \n",
938 | " 1 | \n",
939 | " 1 | \n",
940 | " 1 | \n",
941 | " 1 | \n",
942 | " 1 | \n",
943 | " 0 | \n",
944 | " 0 | \n",
945 | " 0 | \n",
946 | " 11 | \n",
947 | " f | \n",
948 | " 'Middle Eastern ' | \n",
949 | " no | \n",
950 | " no | \n",
951 | " Bahrain | \n",
952 | " no | \n",
953 | " Parent | \n",
954 | " YES | \n",
955 | "
\n",
956 | " \n",
957 | " 9 | \n",
958 | " 0 | \n",
959 | " 0 | \n",
960 | " 1 | \n",
961 | " 1 | \n",
962 | " 1 | \n",
963 | " 0 | \n",
964 | " 1 | \n",
965 | " 1 | \n",
966 | " 0 | \n",
967 | " 0 | \n",
968 | " 11 | \n",
969 | " f | \n",
970 | " ? | \n",
971 | " no | \n",
972 | " yes | \n",
973 | " Austria | \n",
974 | " no | \n",
975 | " ? | \n",
976 | " NO | \n",
977 | "
\n",
978 | " \n",
979 | " 10 | \n",
980 | " 1 | \n",
981 | " 0 | \n",
982 | " 0 | \n",
983 | " 0 | \n",
984 | " 1 | \n",
985 | " 1 | \n",
986 | " 1 | \n",
987 | " 1 | \n",
988 | " 1 | \n",
989 | " 1 | \n",
990 | " 10 | \n",
991 | " m | \n",
992 | " White-European | \n",
993 | " yes | \n",
994 | " no | \n",
995 | " 'United Kingdom' | \n",
996 | " no | \n",
997 | " Self | \n",
998 | " YES | \n",
999 | "
\n",
1000 | " \n",
1001 | "
\n",
1002 | "
"
1003 | ],
1004 | "text/plain": [
1005 | " A1_Score A2_Score A3_Score A4_Score A5_Score A6_Score A7_Score \\\n",
1006 | "0 1 1 0 0 1 1 0 \n",
1007 | "1 1 1 0 0 1 1 0 \n",
1008 | "2 1 1 0 0 0 1 1 \n",
1009 | "3 0 1 0 0 1 1 0 \n",
1010 | "4 1 1 1 1 1 1 1 \n",
1011 | "5 0 0 1 0 1 1 0 \n",
1012 | "6 1 0 1 1 1 1 0 \n",
1013 | "7 1 1 1 1 1 1 1 \n",
1014 | "8 1 1 1 1 1 1 1 \n",
1015 | "9 0 0 1 1 1 0 1 \n",
1016 | "10 1 0 0 0 1 1 1 \n",
1017 | "\n",
1018 | " A8_Score A9_Score A10_Score age gender ethnicity jundice \\\n",
1019 | "0 1 0 0 6 m Others no \n",
1020 | "1 1 0 0 6 m 'Middle Eastern ' no \n",
1021 | "2 1 0 0 6 m ? no \n",
1022 | "3 0 0 1 5 f ? yes \n",
1023 | "4 1 1 1 5 m Others yes \n",
1024 | "5 1 0 1 4 m ? no \n",
1025 | "6 1 0 1 5 m White-European no \n",
1026 | "7 1 0 0 5 f 'Middle Eastern ' no \n",
1027 | "8 0 0 0 11 f 'Middle Eastern ' no \n",
1028 | "9 1 0 0 11 f ? no \n",
1029 | "10 1 1 1 10 m White-European yes \n",
1030 | "\n",
1031 | " family_history_of_PDD contry_of_res used_app_before relation class \n",
1032 | "0 no Jordan no Parent NO \n",
1033 | "1 no Jordan no Parent NO \n",
1034 | "2 no Jordan yes ? NO \n",
1035 | "3 no Jordan no ? NO \n",
1036 | "4 no 'United States' no Parent YES \n",
1037 | "5 yes Egypt no ? NO \n",
1038 | "6 no 'United Kingdom' no Parent YES \n",
1039 | "7 no Bahrain no Parent YES \n",
1040 | "8 no Bahrain no Parent YES \n",
1041 | "9 yes Austria no ? NO \n",
1042 | "10 no 'United Kingdom' no Self YES "
1043 | ]
1044 | },
1045 | "execution_count": 7,
1046 | "metadata": {},
1047 | "output_type": "execute_result"
1048 | }
1049 | ],
1050 | "source": [
1051 | "data.loc[:10]"
1052 | ]
1053 | },
1054 | {
1055 | "cell_type": "code",
1056 | "execution_count": 8,
1057 | "metadata": {},
1058 | "outputs": [],
1059 | "source": [
1060 | "# create X and Y datasets for training\n",
1061 | "x = data.drop(['class'], 1)\n",
1062 | "y = data['class']"
1063 | ]
1064 | },
1065 | {
1066 | "cell_type": "code",
1067 | "execution_count": 9,
1068 | "metadata": {},
1069 | "outputs": [
1070 | {
1071 | "data": {
1072 | "text/html": [
1073 | "\n",
1074 | "\n",
1087 | "
\n",
1088 | " \n",
1089 | " \n",
1090 | " | \n",
1091 | " A1_Score | \n",
1092 | " A2_Score | \n",
1093 | " A3_Score | \n",
1094 | " A4_Score | \n",
1095 | " A5_Score | \n",
1096 | " A6_Score | \n",
1097 | " A7_Score | \n",
1098 | " A8_Score | \n",
1099 | " A9_Score | \n",
1100 | " A10_Score | \n",
1101 | " age | \n",
1102 | " gender | \n",
1103 | " ethnicity | \n",
1104 | " jundice | \n",
1105 | " family_history_of_PDD | \n",
1106 | " contry_of_res | \n",
1107 | " used_app_before | \n",
1108 | " relation | \n",
1109 | "
\n",
1110 | " \n",
1111 | " \n",
1112 | " \n",
1113 | " 0 | \n",
1114 | " 1 | \n",
1115 | " 1 | \n",
1116 | " 0 | \n",
1117 | " 0 | \n",
1118 | " 1 | \n",
1119 | " 1 | \n",
1120 | " 0 | \n",
1121 | " 1 | \n",
1122 | " 0 | \n",
1123 | " 0 | \n",
1124 | " 6 | \n",
1125 | " m | \n",
1126 | " Others | \n",
1127 | " no | \n",
1128 | " no | \n",
1129 | " Jordan | \n",
1130 | " no | \n",
1131 | " Parent | \n",
1132 | "
\n",
1133 | " \n",
1134 | " 1 | \n",
1135 | " 1 | \n",
1136 | " 1 | \n",
1137 | " 0 | \n",
1138 | " 0 | \n",
1139 | " 1 | \n",
1140 | " 1 | \n",
1141 | " 0 | \n",
1142 | " 1 | \n",
1143 | " 0 | \n",
1144 | " 0 | \n",
1145 | " 6 | \n",
1146 | " m | \n",
1147 | " 'Middle Eastern ' | \n",
1148 | " no | \n",
1149 | " no | \n",
1150 | " Jordan | \n",
1151 | " no | \n",
1152 | " Parent | \n",
1153 | "
\n",
1154 | " \n",
1155 | " 2 | \n",
1156 | " 1 | \n",
1157 | " 1 | \n",
1158 | " 0 | \n",
1159 | " 0 | \n",
1160 | " 0 | \n",
1161 | " 1 | \n",
1162 | " 1 | \n",
1163 | " 1 | \n",
1164 | " 0 | \n",
1165 | " 0 | \n",
1166 | " 6 | \n",
1167 | " m | \n",
1168 | " ? | \n",
1169 | " no | \n",
1170 | " no | \n",
1171 | " Jordan | \n",
1172 | " yes | \n",
1173 | " ? | \n",
1174 | "
\n",
1175 | " \n",
1176 | " 3 | \n",
1177 | " 0 | \n",
1178 | " 1 | \n",
1179 | " 0 | \n",
1180 | " 0 | \n",
1181 | " 1 | \n",
1182 | " 1 | \n",
1183 | " 0 | \n",
1184 | " 0 | \n",
1185 | " 0 | \n",
1186 | " 1 | \n",
1187 | " 5 | \n",
1188 | " f | \n",
1189 | " ? | \n",
1190 | " yes | \n",
1191 | " no | \n",
1192 | " Jordan | \n",
1193 | " no | \n",
1194 | " ? | \n",
1195 | "
\n",
1196 | " \n",
1197 | " 4 | \n",
1198 | " 1 | \n",
1199 | " 1 | \n",
1200 | " 1 | \n",
1201 | " 1 | \n",
1202 | " 1 | \n",
1203 | " 1 | \n",
1204 | " 1 | \n",
1205 | " 1 | \n",
1206 | " 1 | \n",
1207 | " 1 | \n",
1208 | " 5 | \n",
1209 | " m | \n",
1210 | " Others | \n",
1211 | " yes | \n",
1212 | " no | \n",
1213 | " 'United States' | \n",
1214 | " no | \n",
1215 | " Parent | \n",
1216 | "
\n",
1217 | " \n",
1218 | " 5 | \n",
1219 | " 0 | \n",
1220 | " 0 | \n",
1221 | " 1 | \n",
1222 | " 0 | \n",
1223 | " 1 | \n",
1224 | " 1 | \n",
1225 | " 0 | \n",
1226 | " 1 | \n",
1227 | " 0 | \n",
1228 | " 1 | \n",
1229 | " 4 | \n",
1230 | " m | \n",
1231 | " ? | \n",
1232 | " no | \n",
1233 | " yes | \n",
1234 | " Egypt | \n",
1235 | " no | \n",
1236 | " ? | \n",
1237 | "
\n",
1238 | " \n",
1239 | " 6 | \n",
1240 | " 1 | \n",
1241 | " 0 | \n",
1242 | " 1 | \n",
1243 | " 1 | \n",
1244 | " 1 | \n",
1245 | " 1 | \n",
1246 | " 0 | \n",
1247 | " 1 | \n",
1248 | " 0 | \n",
1249 | " 1 | \n",
1250 | " 5 | \n",
1251 | " m | \n",
1252 | " White-European | \n",
1253 | " no | \n",
1254 | " no | \n",
1255 | " 'United Kingdom' | \n",
1256 | " no | \n",
1257 | " Parent | \n",
1258 | "
\n",
1259 | " \n",
1260 | " 7 | \n",
1261 | " 1 | \n",
1262 | " 1 | \n",
1263 | " 1 | \n",
1264 | " 1 | \n",
1265 | " 1 | \n",
1266 | " 1 | \n",
1267 | " 1 | \n",
1268 | " 1 | \n",
1269 | " 0 | \n",
1270 | " 0 | \n",
1271 | " 5 | \n",
1272 | " f | \n",
1273 | " 'Middle Eastern ' | \n",
1274 | " no | \n",
1275 | " no | \n",
1276 | " Bahrain | \n",
1277 | " no | \n",
1278 | " Parent | \n",
1279 | "
\n",
1280 | " \n",
1281 | " 8 | \n",
1282 | " 1 | \n",
1283 | " 1 | \n",
1284 | " 1 | \n",
1285 | " 1 | \n",
1286 | " 1 | \n",
1287 | " 1 | \n",
1288 | " 1 | \n",
1289 | " 0 | \n",
1290 | " 0 | \n",
1291 | " 0 | \n",
1292 | " 11 | \n",
1293 | " f | \n",
1294 | " 'Middle Eastern ' | \n",
1295 | " no | \n",
1296 | " no | \n",
1297 | " Bahrain | \n",
1298 | " no | \n",
1299 | " Parent | \n",
1300 | "
\n",
1301 | " \n",
1302 | " 9 | \n",
1303 | " 0 | \n",
1304 | " 0 | \n",
1305 | " 1 | \n",
1306 | " 1 | \n",
1307 | " 1 | \n",
1308 | " 0 | \n",
1309 | " 1 | \n",
1310 | " 1 | \n",
1311 | " 0 | \n",
1312 | " 0 | \n",
1313 | " 11 | \n",
1314 | " f | \n",
1315 | " ? | \n",
1316 | " no | \n",
1317 | " yes | \n",
1318 | " Austria | \n",
1319 | " no | \n",
1320 | " ? | \n",
1321 | "
\n",
1322 | " \n",
1323 | " 10 | \n",
1324 | " 1 | \n",
1325 | " 0 | \n",
1326 | " 0 | \n",
1327 | " 0 | \n",
1328 | " 1 | \n",
1329 | " 1 | \n",
1330 | " 1 | \n",
1331 | " 1 | \n",
1332 | " 1 | \n",
1333 | " 1 | \n",
1334 | " 10 | \n",
1335 | " m | \n",
1336 | " White-European | \n",
1337 | " yes | \n",
1338 | " no | \n",
1339 | " 'United Kingdom' | \n",
1340 | " no | \n",
1341 | " Self | \n",
1342 | "
\n",
1343 | " \n",
1344 | "
\n",
1345 | "
"
1346 | ],
1347 | "text/plain": [
1348 | " A1_Score A2_Score A3_Score A4_Score A5_Score A6_Score A7_Score \\\n",
1349 | "0 1 1 0 0 1 1 0 \n",
1350 | "1 1 1 0 0 1 1 0 \n",
1351 | "2 1 1 0 0 0 1 1 \n",
1352 | "3 0 1 0 0 1 1 0 \n",
1353 | "4 1 1 1 1 1 1 1 \n",
1354 | "5 0 0 1 0 1 1 0 \n",
1355 | "6 1 0 1 1 1 1 0 \n",
1356 | "7 1 1 1 1 1 1 1 \n",
1357 | "8 1 1 1 1 1 1 1 \n",
1358 | "9 0 0 1 1 1 0 1 \n",
1359 | "10 1 0 0 0 1 1 1 \n",
1360 | "\n",
1361 | " A8_Score A9_Score A10_Score age gender ethnicity jundice \\\n",
1362 | "0 1 0 0 6 m Others no \n",
1363 | "1 1 0 0 6 m 'Middle Eastern ' no \n",
1364 | "2 1 0 0 6 m ? no \n",
1365 | "3 0 0 1 5 f ? yes \n",
1366 | "4 1 1 1 5 m Others yes \n",
1367 | "5 1 0 1 4 m ? no \n",
1368 | "6 1 0 1 5 m White-European no \n",
1369 | "7 1 0 0 5 f 'Middle Eastern ' no \n",
1370 | "8 0 0 0 11 f 'Middle Eastern ' no \n",
1371 | "9 1 0 0 11 f ? no \n",
1372 | "10 1 1 1 10 m White-European yes \n",
1373 | "\n",
1374 | " family_history_of_PDD contry_of_res used_app_before relation \n",
1375 | "0 no Jordan no Parent \n",
1376 | "1 no Jordan no Parent \n",
1377 | "2 no Jordan yes ? \n",
1378 | "3 no Jordan no ? \n",
1379 | "4 no 'United States' no Parent \n",
1380 | "5 yes Egypt no ? \n",
1381 | "6 no 'United Kingdom' no Parent \n",
1382 | "7 no Bahrain no Parent \n",
1383 | "8 no Bahrain no Parent \n",
1384 | "9 yes Austria no ? \n",
1385 | "10 no 'United Kingdom' no Self "
1386 | ]
1387 | },
1388 | "execution_count": 9,
1389 | "metadata": {},
1390 | "output_type": "execute_result"
1391 | }
1392 | ],
1393 | "source": [
1394 | "x.loc[:10]"
1395 | ]
1396 | },
1397 | {
1398 | "cell_type": "code",
1399 | "execution_count": 10,
1400 | "metadata": {},
1401 | "outputs": [],
1402 | "source": [
1403 | "# convert the data to categorical values - one-hot-encoded vectors\n",
1404 | "X = pd.get_dummies(x)"
1405 | ]
1406 | },
1407 | {
1408 | "cell_type": "code",
1409 | "execution_count": 11,
1410 | "metadata": {},
1411 | "outputs": [
1412 | {
1413 | "data": {
1414 | "text/plain": [
1415 | "array(['A1_Score', 'A2_Score', 'A3_Score', 'A4_Score', 'A5_Score',\n",
1416 | " 'A6_Score', 'A7_Score', 'A8_Score', 'A9_Score', 'A10_Score',\n",
1417 | " 'age_10', 'age_11', 'age_4', 'age_5', 'age_6', 'age_7', 'age_8',\n",
1418 | " 'age_9', 'age_?', 'gender_f', 'gender_m',\n",
1419 | " \"ethnicity_'Middle Eastern '\", \"ethnicity_'South Asian'\",\n",
1420 | " 'ethnicity_?', 'ethnicity_Asian', 'ethnicity_Black',\n",
1421 | " 'ethnicity_Hispanic', 'ethnicity_Latino', 'ethnicity_Others',\n",
1422 | " 'ethnicity_Pasifika', 'ethnicity_Turkish',\n",
1423 | " 'ethnicity_White-European', 'jundice_no', 'jundice_yes',\n",
1424 | " 'family_history_of_PDD_no', 'family_history_of_PDD_yes',\n",
1425 | " \"contry_of_res_'Costa Rica'\", \"contry_of_res_'Isle of Man'\",\n",
1426 | " \"contry_of_res_'New Zealand'\", \"contry_of_res_'Saudi Arabia'\",\n",
1427 | " \"contry_of_res_'South Africa'\", \"contry_of_res_'South Korea'\",\n",
1428 | " \"contry_of_res_'U.S. Outlying Islands'\",\n",
1429 | " \"contry_of_res_'United Arab Emirates'\",\n",
1430 | " \"contry_of_res_'United Kingdom'\", \"contry_of_res_'United States'\",\n",
1431 | " 'contry_of_res_Afghanistan', 'contry_of_res_Argentina',\n",
1432 | " 'contry_of_res_Armenia', 'contry_of_res_Australia',\n",
1433 | " 'contry_of_res_Austria', 'contry_of_res_Bahrain',\n",
1434 | " 'contry_of_res_Bangladesh', 'contry_of_res_Bhutan',\n",
1435 | " 'contry_of_res_Brazil', 'contry_of_res_Bulgaria',\n",
1436 | " 'contry_of_res_Canada', 'contry_of_res_China',\n",
1437 | " 'contry_of_res_Egypt', 'contry_of_res_Europe',\n",
1438 | " 'contry_of_res_Georgia', 'contry_of_res_Germany',\n",
1439 | " 'contry_of_res_Ghana', 'contry_of_res_India', 'contry_of_res_Iraq',\n",
1440 | " 'contry_of_res_Ireland', 'contry_of_res_Italy',\n",
1441 | " 'contry_of_res_Japan', 'contry_of_res_Jordan',\n",
1442 | " 'contry_of_res_Kuwait', 'contry_of_res_Latvia',\n",
1443 | " 'contry_of_res_Lebanon', 'contry_of_res_Libya',\n",
1444 | " 'contry_of_res_Malaysia', 'contry_of_res_Malta',\n",
1445 | " 'contry_of_res_Mexico', 'contry_of_res_Nepal',\n",
1446 | " 'contry_of_res_Netherlands', 'contry_of_res_Nigeria',\n",
1447 | " 'contry_of_res_Oman', 'contry_of_res_Pakistan',\n",
1448 | " 'contry_of_res_Philippines', 'contry_of_res_Qatar',\n",
1449 | " 'contry_of_res_Romania', 'contry_of_res_Russia',\n",
1450 | " 'contry_of_res_Sweden', 'contry_of_res_Syria',\n",
1451 | " 'contry_of_res_Turkey', 'used_app_before_no',\n",
1452 | " 'used_app_before_yes', \"relation_'Health care professional'\",\n",
1453 | " 'relation_?', 'relation_Parent', 'relation_Relative',\n",
1454 | " 'relation_Self', 'relation_self'], dtype=object)"
1455 | ]
1456 | },
1457 | "execution_count": 11,
1458 | "metadata": {},
1459 | "output_type": "execute_result"
1460 | }
1461 | ],
1462 | "source": [
1463 | "# print the new categorical column labels\n",
1464 | "X.columns.values"
1465 | ]
1466 | },
1467 | {
1468 | "cell_type": "code",
1469 | "execution_count": 12,
1470 | "metadata": {},
1471 | "outputs": [
1472 | {
1473 | "data": {
1474 | "text/plain": [
1475 | "A1_Score 1\n",
1476 | "A2_Score 1\n",
1477 | "A3_Score 0\n",
1478 | "A4_Score 0\n",
1479 | "A5_Score 1\n",
1480 | "A6_Score 1\n",
1481 | "A7_Score 0\n",
1482 | "A8_Score 1\n",
1483 | "A9_Score 0\n",
1484 | "A10_Score 0\n",
1485 | "age_10 0\n",
1486 | "age_11 0\n",
1487 | "age_4 0\n",
1488 | "age_5 0\n",
1489 | "age_6 1\n",
1490 | "age_7 0\n",
1491 | "age_8 0\n",
1492 | "age_9 0\n",
1493 | "age_? 0\n",
1494 | "gender_f 0\n",
1495 | "gender_m 1\n",
1496 | "ethnicity_'Middle Eastern ' 1\n",
1497 | "ethnicity_'South Asian' 0\n",
1498 | "ethnicity_? 0\n",
1499 | "ethnicity_Asian 0\n",
1500 | "ethnicity_Black 0\n",
1501 | "ethnicity_Hispanic 0\n",
1502 | "ethnicity_Latino 0\n",
1503 | "ethnicity_Others 0\n",
1504 | "ethnicity_Pasifika 0\n",
1505 | " ..\n",
1506 | "contry_of_res_Italy 0\n",
1507 | "contry_of_res_Japan 0\n",
1508 | "contry_of_res_Jordan 1\n",
1509 | "contry_of_res_Kuwait 0\n",
1510 | "contry_of_res_Latvia 0\n",
1511 | "contry_of_res_Lebanon 0\n",
1512 | "contry_of_res_Libya 0\n",
1513 | "contry_of_res_Malaysia 0\n",
1514 | "contry_of_res_Malta 0\n",
1515 | "contry_of_res_Mexico 0\n",
1516 | "contry_of_res_Nepal 0\n",
1517 | "contry_of_res_Netherlands 0\n",
1518 | "contry_of_res_Nigeria 0\n",
1519 | "contry_of_res_Oman 0\n",
1520 | "contry_of_res_Pakistan 0\n",
1521 | "contry_of_res_Philippines 0\n",
1522 | "contry_of_res_Qatar 0\n",
1523 | "contry_of_res_Romania 0\n",
1524 | "contry_of_res_Russia 0\n",
1525 | "contry_of_res_Sweden 0\n",
1526 | "contry_of_res_Syria 0\n",
1527 | "contry_of_res_Turkey 0\n",
1528 | "used_app_before_no 1\n",
1529 | "used_app_before_yes 0\n",
1530 | "relation_'Health care professional' 0\n",
1531 | "relation_? 0\n",
1532 | "relation_Parent 1\n",
1533 | "relation_Relative 0\n",
1534 | "relation_Self 0\n",
1535 | "relation_self 0\n",
1536 | "Name: 1, Length: 96, dtype: int64"
1537 | ]
1538 | },
1539 | "execution_count": 12,
1540 | "metadata": {},
1541 | "output_type": "execute_result"
1542 | }
1543 | ],
1544 | "source": [
1545 | "# print an example patient from the categorical data\n",
1546 | "X.loc[1]"
1547 | ]
1548 | },
1549 | {
1550 | "cell_type": "code",
1551 | "execution_count": 13,
1552 | "metadata": {},
1553 | "outputs": [],
1554 | "source": [
1555 | "# convert the class data to categorical values - one-hot-encoded vectors\n",
1556 | "Y = pd.get_dummies(y)"
1557 | ]
1558 | },
1559 | {
1560 | "cell_type": "code",
1561 | "execution_count": 14,
1562 | "metadata": {},
1563 | "outputs": [
1564 | {
1565 | "data": {
1566 | "text/html": [
1567 | "\n",
1568 | "\n",
1581 | "
\n",
1582 | " \n",
1583 | " \n",
1584 | " | \n",
1585 | " NO | \n",
1586 | " YES | \n",
1587 | "
\n",
1588 | " \n",
1589 | " \n",
1590 | " \n",
1591 | " 0 | \n",
1592 | " 1 | \n",
1593 | " 0 | \n",
1594 | "
\n",
1595 | " \n",
1596 | " 1 | \n",
1597 | " 1 | \n",
1598 | " 0 | \n",
1599 | "
\n",
1600 | " \n",
1601 | " 2 | \n",
1602 | " 1 | \n",
1603 | " 0 | \n",
1604 | "
\n",
1605 | " \n",
1606 | " 3 | \n",
1607 | " 1 | \n",
1608 | " 0 | \n",
1609 | "
\n",
1610 | " \n",
1611 | " 4 | \n",
1612 | " 0 | \n",
1613 | " 1 | \n",
1614 | "
\n",
1615 | " \n",
1616 | " 5 | \n",
1617 | " 1 | \n",
1618 | " 0 | \n",
1619 | "
\n",
1620 | " \n",
1621 | " 6 | \n",
1622 | " 0 | \n",
1623 | " 1 | \n",
1624 | "
\n",
1625 | " \n",
1626 | " 7 | \n",
1627 | " 0 | \n",
1628 | " 1 | \n",
1629 | "
\n",
1630 | " \n",
1631 | " 8 | \n",
1632 | " 0 | \n",
1633 | " 1 | \n",
1634 | "
\n",
1635 | " \n",
1636 | " 9 | \n",
1637 | " 1 | \n",
1638 | " 0 | \n",
1639 | "
\n",
1640 | " \n",
1641 | "
\n",
1642 | "
"
1643 | ],
1644 | "text/plain": [
1645 | " NO YES\n",
1646 | "0 1 0\n",
1647 | "1 1 0\n",
1648 | "2 1 0\n",
1649 | "3 1 0\n",
1650 | "4 0 1\n",
1651 | "5 1 0\n",
1652 | "6 0 1\n",
1653 | "7 0 1\n",
1654 | "8 0 1\n",
1655 | "9 1 0"
1656 | ]
1657 | },
1658 | "execution_count": 14,
1659 | "metadata": {},
1660 | "output_type": "execute_result"
1661 | }
1662 | ],
1663 | "source": [
1664 | "Y.iloc[:10]"
1665 | ]
1666 | },
1667 | {
1668 | "cell_type": "markdown",
1669 | "metadata": {},
1670 | "source": [
1671 | "### 3. Split the Dataset into Training and Testing Datasets\n",
1672 | "\n",
1673 | "Before we can begin training our neural network, we need to split the dataset into training and testing datasets. This will allow us to test our network after we are done training to determine how well it will generalize to new data. This step is incredibly easy when using the train_test_split() function provided by scikit-learn!"
1674 | ]
1675 | },
1676 | {
1677 | "cell_type": "code",
1678 | "execution_count": 15,
1679 | "metadata": {},
1680 | "outputs": [],
1681 | "source": [
1682 | "from sklearn import model_selection\n",
1683 | "# split the X and Y data into training and testing datasets\n",
1684 | "X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size = 0.2)"
1685 | ]
1686 | },
1687 | {
1688 | "cell_type": "code",
1689 | "execution_count": 16,
1690 | "metadata": {},
1691 | "outputs": [
1692 | {
1693 | "name": "stdout",
1694 | "output_type": "stream",
1695 | "text": [
1696 | "(233, 96)\n",
1697 | "(59, 96)\n",
1698 | "(233, 2)\n",
1699 | "(59, 2)\n"
1700 | ]
1701 | }
1702 | ],
1703 | "source": [
1704 | "print X_train.shape\n",
1705 | "print X_test.shape\n",
1706 | "print Y_train.shape\n",
1707 | "print Y_test.shape"
1708 | ]
1709 | },
1710 | {
1711 | "cell_type": "markdown",
1712 | "metadata": {},
1713 | "source": [
1714 | "### 4. Building the Network - Keras\n",
1715 | "\n",
1716 | "In this project, we are going to use Keras to build and train our network. This model will be relatively simple and will only use dense (also known as fully connected) layers. This is the most common neural network layer. The network will have one hidden layer, use an Adam optimizer, and a categorical crossentropy loss. We won't worry about optimizing parameters such as learning rate, number of neurons in each layer, or activation functions in this project; however, if you have the time, manually adjusting these parameters and observing the results is a great way to learn about their function!"
1717 | ]
1718 | },
1719 | {
1720 | "cell_type": "code",
1721 | "execution_count": 17,
1722 | "metadata": {},
1723 | "outputs": [
1724 | {
1725 | "name": "stdout",
1726 | "output_type": "stream",
1727 | "text": [
1728 | "_________________________________________________________________\n",
1729 | "Layer (type) Output Shape Param # \n",
1730 | "=================================================================\n",
1731 | "dense_1 (Dense) (None, 8) 776 \n",
1732 | "_________________________________________________________________\n",
1733 | "dense_2 (Dense) (None, 4) 36 \n",
1734 | "_________________________________________________________________\n",
1735 | "dense_3 (Dense) (None, 2) 10 \n",
1736 | "=================================================================\n",
1737 | "Total params: 822\n",
1738 | "Trainable params: 822\n",
1739 | "Non-trainable params: 0\n",
1740 | "_________________________________________________________________\n",
1741 | "None\n"
1742 | ]
1743 | }
1744 | ],
1745 | "source": [
1746 | "# build a neural network using Keras\n",
1747 | "from keras.models import Sequential\n",
1748 | "from keras.layers import Dense\n",
1749 | "from keras.optimizers import Adam\n",
1750 | "\n",
1751 | "# define a function to build the keras model\n",
1752 | "def create_model():\n",
1753 | " # create model\n",
1754 | " model = Sequential()\n",
1755 | " model.add(Dense(8, input_dim=96, kernel_initializer='normal', activation='relu'))\n",
1756 | " model.add(Dense(4, kernel_initializer='normal', activation='relu'))\n",
1757 | " model.add(Dense(2, activation='sigmoid'))\n",
1758 | " \n",
1759 | " # compile model\n",
1760 | " adam = Adam(lr=0.001)\n",
1761 | " model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])\n",
1762 | " return model\n",
1763 | "\n",
1764 | "model = create_model()\n",
1765 | "\n",
1766 | "print(model.summary())"
1767 | ]
1768 | },
1769 | {
1770 | "cell_type": "markdown",
1771 | "metadata": {},
1772 | "source": [
1773 | "### 5. Training the Network\n",
1774 | "\n",
1775 | "Now it's time for the fun! Training a Keras model is as simple as calling model.fit()."
1776 | ]
1777 | },
1778 | {
1779 | "cell_type": "code",
1780 | "execution_count": 18,
1781 | "metadata": {},
1782 | "outputs": [
1783 | {
1784 | "name": "stdout",
1785 | "output_type": "stream",
1786 | "text": [
1787 | "Epoch 1/50\n",
1788 | "233/233 [==============================] - 0s 288us/step - loss: 0.6927 - acc: 0.5794\n",
1789 | "Epoch 2/50\n",
1790 | "233/233 [==============================] - 0s 245us/step - loss: 0.6910 - acc: 0.7210\n",
1791 | "Epoch 3/50\n",
1792 | "233/233 [==============================] - 0s 258us/step - loss: 0.6868 - acc: 0.7639\n",
1793 | "Epoch 4/50\n",
1794 | "233/233 [==============================] - 0s 236us/step - loss: 0.6779 - acc: 0.7082\n",
1795 | "Epoch 5/50\n",
1796 | "233/233 [==============================] - 0s 236us/step - loss: 0.6619 - acc: 0.8541\n",
1797 | "Epoch 6/50\n",
1798 | "233/233 [==============================] - 0s 305us/step - loss: 0.6340 - acc: 0.8283\n",
1799 | "Epoch 7/50\n",
1800 | "233/233 [==============================] - 0s 227us/step - loss: 0.5963 - acc: 0.8541\n",
1801 | "Epoch 8/50\n",
1802 | "233/233 [==============================] - 0s 305us/step - loss: 0.5446 - acc: 0.9399\n",
1803 | "Epoch 9/50\n",
1804 | "233/233 [==============================] - 0s 240us/step - loss: 0.4884 - acc: 0.8884\n",
1805 | "Epoch 10/50\n",
1806 | "233/233 [==============================] - 0s 227us/step - loss: 0.4220 - acc: 0.9227\n",
1807 | "Epoch 11/50\n",
1808 | "233/233 [==============================] - 0s 322us/step - loss: 0.3603 - acc: 0.9313\n",
1809 | "Epoch 12/50\n",
1810 | "233/233 [==============================] - 0s 245us/step - loss: 0.2935 - acc: 0.9614\n",
1811 | "Epoch 13/50\n",
1812 | "233/233 [==============================] - 0s 296us/step - loss: 0.2528 - acc: 0.9657\n",
1813 | "Epoch 14/50\n",
1814 | "233/233 [==============================] - 0s 330us/step - loss: 0.2087 - acc: 0.9657\n",
1815 | "Epoch 15/50\n",
1816 | "233/233 [==============================] - 0s 305us/step - loss: 0.1788 - acc: 0.9871\n",
1817 | "Epoch 16/50\n",
1818 | "233/233 [==============================] - 0s 313us/step - loss: 0.1605 - acc: 0.9700\n",
1819 | "Epoch 17/50\n",
1820 | "233/233 [==============================] - 0s 309us/step - loss: 0.1389 - acc: 0.9828\n",
1821 | "Epoch 18/50\n",
1822 | "233/233 [==============================] - 0s 335us/step - loss: 0.1258 - acc: 0.9785\n",
1823 | "Epoch 19/50\n",
1824 | "233/233 [==============================] - 0s 343us/step - loss: 0.1108 - acc: 0.9871\n",
1825 | "Epoch 20/50\n",
1826 | "233/233 [==============================] - 0s 399us/step - loss: 0.1004 - acc: 0.9871\n",
1827 | "Epoch 21/50\n",
1828 | "233/233 [==============================] - 0s 416us/step - loss: 0.0910 - acc: 0.9871\n",
1829 | "Epoch 22/50\n",
1830 | "233/233 [==============================] - 0s 343us/step - loss: 0.0820 - acc: 0.9871\n",
1831 | "Epoch 23/50\n",
1832 | "233/233 [==============================] - 0s 361us/step - loss: 0.0752 - acc: 0.9914\n",
1833 | "Epoch 24/50\n",
1834 | "233/233 [==============================] - 0s 356us/step - loss: 0.0714 - acc: 0.9957\n",
1835 | "Epoch 25/50\n",
1836 | "233/233 [==============================] - 0s 309us/step - loss: 0.0634 - acc: 0.9957\n",
1837 | "Epoch 26/50\n",
1838 | "233/233 [==============================] - 0s 339us/step - loss: 0.0585 - acc: 0.9957\n",
1839 | "Epoch 27/50\n",
1840 | "233/233 [==============================] - 0s 335us/step - loss: 0.0571 - acc: 1.0000\n",
1841 | "Epoch 28/50\n",
1842 | "233/233 [==============================] - 0s 429us/step - loss: 0.0526 - acc: 0.9957\n",
1843 | "Epoch 29/50\n",
1844 | "233/233 [==============================] - 0s 335us/step - loss: 0.0474 - acc: 1.0000\n",
1845 | "Epoch 30/50\n",
1846 | "233/233 [==============================] - 0s 322us/step - loss: 0.0463 - acc: 0.9957\n",
1847 | "Epoch 31/50\n",
1848 | "233/233 [==============================] - 0s 296us/step - loss: 0.0431 - acc: 1.0000\n",
1849 | "Epoch 32/50\n",
1850 | "233/233 [==============================] - 0s 348us/step - loss: 0.0381 - acc: 1.0000\n",
1851 | "Epoch 33/50\n",
1852 | "233/233 [==============================] - 0s 322us/step - loss: 0.0357 - acc: 1.0000\n",
1853 | "Epoch 34/50\n",
1854 | "233/233 [==============================] - 0s 292us/step - loss: 0.0331 - acc: 1.0000\n",
1855 | "Epoch 35/50\n",
1856 | "233/233 [==============================] - 0s 305us/step - loss: 0.0316 - acc: 1.0000\n",
1857 | "Epoch 36/50\n",
1858 | "233/233 [==============================] - 0s 335us/step - loss: 0.0294 - acc: 1.0000\n",
1859 | "Epoch 37/50\n",
1860 | "233/233 [==============================] - 0s 322us/step - loss: 0.0282 - acc: 1.0000\n",
1861 | "Epoch 38/50\n",
1862 | "233/233 [==============================] - 0s 236us/step - loss: 0.0281 - acc: 1.0000\n",
1863 | "Epoch 39/50\n",
1864 | "233/233 [==============================] - 0s 339us/step - loss: 0.0253 - acc: 1.0000\n",
1865 | "Epoch 40/50\n",
1866 | "233/233 [==============================] - 0s 223us/step - loss: 0.0252 - acc: 1.0000\n",
1867 | "Epoch 41/50\n",
1868 | "233/233 [==============================] - 0s 326us/step - loss: 0.0226 - acc: 1.0000\n",
1869 | "Epoch 42/50\n",
1870 | "233/233 [==============================] - 0s 326us/step - loss: 0.0213 - acc: 1.0000\n",
1871 | "Epoch 43/50\n",
1872 | "233/233 [==============================] - 0s 219us/step - loss: 0.0203 - acc: 1.0000\n",
1873 | "Epoch 44/50\n",
1874 | "233/233 [==============================] - 0s 215us/step - loss: 0.0193 - acc: 1.0000\n",
1875 | "Epoch 45/50\n",
1876 | "233/233 [==============================] - 0s 318us/step - loss: 0.0190 - acc: 1.0000\n",
1877 | "Epoch 46/50\n",
1878 | "233/233 [==============================] - 0s 232us/step - loss: 0.0176 - acc: 1.0000\n",
1879 | "Epoch 47/50\n",
1880 | "233/233 [==============================] - 0s 215us/step - loss: 0.0163 - acc: 1.0000\n",
1881 | "Epoch 48/50\n",
1882 | "233/233 [==============================] - 0s 202us/step - loss: 0.0161 - acc: 1.0000\n",
1883 | "Epoch 49/50\n",
1884 | "233/233 [==============================] - 0s 240us/step - loss: 0.0154 - acc: 1.0000\n",
1885 | "Epoch 50/50\n",
1886 | "233/233 [==============================] - 0s 223us/step - loss: 0.0150 - acc: 1.0000\n"
1887 | ]
1888 | },
1889 | {
1890 | "data": {
1891 | "text/plain": [
1892 | ""
1893 | ]
1894 | },
1895 | "execution_count": 18,
1896 | "metadata": {},
1897 | "output_type": "execute_result"
1898 | }
1899 | ],
1900 | "source": [
1901 | "# fit the model to the training data\n",
1902 | "model.fit(X_train, Y_train, epochs=50, batch_size=10, verbose = 1)"
1903 | ]
1904 | },
1905 | {
1906 | "cell_type": "markdown",
1907 | "metadata": {},
1908 | "source": [
1909 | "### 6. Testing and Performance Metrics\n",
1910 | "\n",
1911 | "Now that our model has been trained, we need to test its performance on the testing dataset. The model has never seen this information before; as a result, the testing dataset allows us to determine whether or not the model will be able to generalize to information that wasn't used during its training phase. We will use some of the metrics provided by scikit-learn for this purpose! "
1912 | ]
1913 | },
1914 | {
1915 | "cell_type": "code",
1916 | "execution_count": 19,
1917 | "metadata": {},
1918 | "outputs": [
1919 | {
1920 | "data": {
1921 | "text/plain": [
1922 | "array([1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1,\n",
1923 | " 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0,\n",
1924 | " 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0], dtype=int64)"
1925 | ]
1926 | },
1927 | "execution_count": 19,
1928 | "metadata": {},
1929 | "output_type": "execute_result"
1930 | }
1931 | ],
1932 | "source": [
1933 | "# generate classification report using predictions for categorical model\n",
1934 | "from sklearn.metrics import classification_report, accuracy_score\n",
1935 | "\n",
1936 | "predictions = model.predict_classes(X_test)\n",
1937 | "predictions"
1938 | ]
1939 | },
1940 | {
1941 | "cell_type": "code",
1942 | "execution_count": 20,
1943 | "metadata": {},
1944 | "outputs": [
1945 | {
1946 | "name": "stdout",
1947 | "output_type": "stream",
1948 | "text": [
1949 | "Results for Categorical Model\n",
1950 | "0.9661016949152542\n",
1951 | " precision recall f1-score support\n",
1952 | "\n",
1953 | " 0 0.97 0.97 0.97 36\n",
1954 | " 1 0.96 0.96 0.96 23\n",
1955 | "\n",
1956 | "avg / total 0.97 0.97 0.97 59\n",
1957 | "\n"
1958 | ]
1959 | }
1960 | ],
1961 | "source": [
1962 | "print('Results for Categorical Model')\n",
1963 | "print(accuracy_score(Y_test[['YES']], predictions))\n",
1964 | "print(classification_report(Y_test[['YES']], predictions))"
1965 | ]
1966 | },
1967 | {
1968 | "cell_type": "code",
1969 | "execution_count": null,
1970 | "metadata": {},
1971 | "outputs": [],
1972 | "source": []
1973 | }
1974 | ],
1975 | "metadata": {
1976 | "kernelspec": {
1977 | "display_name": "Python [default]",
1978 | "language": "python",
1979 | "name": "python2"
1980 | },
1981 | "language_info": {
1982 | "codemirror_mode": {
1983 | "name": "ipython",
1984 | "version": 2
1985 | },
1986 | "file_extension": ".py",
1987 | "mimetype": "text/x-python",
1988 | "name": "python",
1989 | "nbconvert_exporter": "python",
1990 | "pygments_lexer": "ipython2",
1991 | "version": "2.7.13"
1992 | }
1993 | },
1994 | "nbformat": 4,
1995 | "nbformat_minor": 2
1996 | }
1997 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2019 Dinesh Jinjala
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Machine-Learning-for-Healthcare-Analytics
2 | This is the code repository for [Machine Learning for Healthcare Analytics Projects](https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-healthcare-analytics-projects?utm_source=github&utm_medium=repository&utm_campaign=9781789536591), published by Packt.
3 |
4 | Machine Learning (ML) has changed the way organizations and individuals use data to improve the efficiency of a system. ML algorithms allow strategists to deal with a variety of structured, unstructured, and semi-structured data. Machine Learning for Healthcare Analytics Projects is packed with new approaches and methodologies for creating powerful solutions for healthcare analytics.
5 |
6 | This book will teach you how to implement key machine learning algorithms and walk you through their use cases by employing a range of libraries from the Python ecosystem. You will build five end-to-end projects to evaluate the efficiency of Artificial Intelligence (AI) applications for carrying out simple-to-complex healthcare analytics tasks. With each project, you will gain new insights, which will then help you handle healthcare data efficiently. As you make your way through the book, you will use ML to detect cancer in a set of patients using support vector machine (SVM) and k-nearest neighbors (KNN) models. In the final chapters, you will create a deep neural network in Keras to predict the onset of diabetes in a huge dataset of patients. You will also learn how to predict heart disease using neural networks.
7 |
8 | By the end of this book, you will have learned how to address long-standing challenges, provide specialized solutions for how to deal with them, and carry out a range of cognitive tasks in the healthcare domain.
9 |
10 | # Table of Contents
11 | 1. Breast Cancer Detection
12 | 2. Diabetes Onset Detection
13 | 3. DNA classification
14 | 4. Diagnosing Coronary Artery Disease Using Machine Learning
15 | 5. Screening Children for Autistic Spectrum Disorder Using Machine Learning
16 |
17 | ## Instructions and Navigations
18 | All of the code is organized into folders. For example, Chapter02.
19 |
20 | The code will look like the following:
21 | 
22 |     import sys
23 |     import pandas as pd
24 |     import sklearn
25 |     import keras
26 | 
27 |     print('Python: {}'.format(sys.version))
28 |     print('Pandas: {}'.format(pd.__version__))
29 |     print('Sklearn: {}'.format(sklearn.__version__))
30 |     print('Keras: {}'.format(keras.__version__))
31 | 
38 |
39 | *Following is what you need for this book:*
40 | Machine Learning for Healthcare Analytics Projects is for data scientists, machine learning engineers, and healthcare professionals who want to implement machine learning algorithms to build smart AI applications. Basic knowledge of Python or any programming language is expected to get the most from this book.
41 |
42 | With the following software and hardware list you can run all code files present in the book (Chapters 1-5).
43 | ### Software and Hardware List
44 | | Chapter | Software required | OS required |
45 | | -------- | ------------------------------------ | ----------------------------------- |
46 | | All | Python 3.6 or later | Windows, Mac OS X, and Linux (Any) |
47 | | | Anaconda 5.2 | Windows, Mac OS X, and Linux (Any) |
48 | | | Jupyter Notebook | Windows, Mac OS X, and Linux (Any) |
49 |
50 |
51 | We also provide a PDF file that has color images of the screenshots/diagrams used in this book. [Click here to download it](https://www.packtpub.com/sites/default/files/downloads/9781789536591_ColorImages.pdf).
52 |
--------------------------------------------------------------------------------