├── LICENSE
├── README.md
└── Syncora_vs_Gretel_vs_MostlyAI_metrics_comparison.ipynb


/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2025 Syncora.ai - Agentic Synthetic Data Generation Tool
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # syncora-benchmarks
 2 | 
 3 | Syncora Benchmarks is a plug-and-play toolkit for evaluating synthetic data quality. Add CSVs from any generator, name them according to a simple convention, and get fidelity metrics and visual comparisons.
 4 | 
 5 | ---
 6 | ## What’s Inside
 7 | 
 8 | - **`Syncora_vs_Gretel_vs_MostlyAI_metrics_comparison.ipynb`**  
 9 |   A Jupyter Notebook that:
10 |   1. Loads your real & synthetic datasets  
11 |   2. Computes a suite of similarity & fidelity metrics  
12 |   3. Visualizes comparative results  
13 | 
14 | - **`README.md`**  
15 |   This overview file.
16 | 
17 | - **Raw / Synthetic Data Files**  
18 |   Place your CSVs in the same folder as the notebook, named `<generator_name>_synthetic.csv`.
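
To illustrate the kind of fidelity metric the notebook computes, here is a minimal pure-Python sketch of two column-wise scores that the sdmetrics library documents: TVComplement (1 minus the total variation distance between category frequencies) for categorical columns, and KSComplement (1 minus the two-sample Kolmogorov-Smirnov statistic) for numerical columns. Both range from 0 to 1, where 1 means the two distributions are identical. This is a sketch under stated assumptions, not the notebook's code or sdmetrics' implementation; `tv_complement` and `ks_complement` are hypothetical names.

```python
from bisect import bisect_right
from collections import Counter


def tv_complement(real, synthetic):
    """Categorical similarity: 1 - total variation distance. Assumes non-empty inputs."""
    real_counts, synth_counts = Counter(real), Counter(synthetic)
    n_real, n_synth = len(real), len(synthetic)
    categories = set(real_counts) | set(synth_counts)
    # TVD = 0.5 * sum over every category c of |P_real(c) - P_synth(c)|
    tvd = 0.5 * sum(
        abs(real_counts[c] / n_real - synth_counts[c] / n_synth) for c in categories
    )
    return 1.0 - tvd


def ks_complement(real, synthetic):
    """Numerical similarity: 1 - two-sample KS statistic. Assumes non-empty inputs."""
    real_sorted, synth_sorted = sorted(real), sorted(synthetic)

    def ecdf(values, x):
        # Fraction of values <= x (empirical cumulative distribution function)
        return bisect_right(values, x) / len(values)

    # Both ECDFs are step functions that only jump at observed values,
    # so the largest gap between them occurs at one of those values.
    ks_stat = max(
        abs(ecdf(real_sorted, x) - ecdf(synth_sorted, x))
        for x in real_sorted + synth_sorted
    )
    return 1.0 - ks_stat


if __name__ == "__main__":
    # Toy inputs only; real use would pass one column from each dataset.
    print(tv_complement([0.0, 1.0, 1.0, 0.0], [0.0, 1.0, 1.0, 1.0]))
    print(ks_complement([30, 62, 76, 28, 43], [31, 60, 70, 29, 45]))
```

In the notebook itself you would more likely call sdmetrics' own `KSComplement.compute` and `TVComplement.compute` (from `sdmetrics.single_column`) than reimplement them. Note that columns such as `Gender` or `Blood Type` appear label-encoded as floats in the notebook's `df_real`, so they are categorical despite their numeric dtype and suit the TV-based score rather than the KS-based one.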
19 | 
20 | ---
21 | 
22 | ## Template Usage
23 | 
24 | 1. **Generate synthetic data**  
25 |    Use any platform or in-house model to produce a CSV.
26 | 
27 | 2. **Name your output**  
28 |    Rename your file to `<generator_name>_synthetic.csv`.
29 | 
30 |    _e.g._ `Syncora_synthetic.csv`, `Gretel_synthetic.csv`, etc.
31 | 
32 | 3. **Drop it into this repo**  
33 |    Place your CSV alongside the notebook, in the same folder.
34 | 
35 | 4. **Edit & run the notebook**  
36 |    - Open `Syncora_vs_Gretel_vs_MostlyAI_metrics_comparison.ipynb`.  
37 |    - The code automatically discovers all `*_synthetic.csv` files.  
38 |    - Execute all cells to regenerate metrics and plots.
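
The automatic discovery described in step 4 can be approximated with a filename glob. The sketch below is an assumption, not the notebook's actual code: it expects the CSVs in the notebook's folder and treats everything before `_synthetic.csv` as the generator name. `discover_synthetic_files` is a hypothetical helper name.

```python
from pathlib import Path

SUFFIX = "_synthetic.csv"


def discover_synthetic_files(folder="."):
    """Map each generator name to its <generator_name>_synthetic.csv path.

    Hypothetical helper based on this README's naming convention.
    """
    found = {}
    # sorted() makes the discovery order deterministic across file systems
    for path in sorted(Path(folder).glob("*" + SUFFIX)):
        name = path.name[: -len(SUFFIX)]
        if name:  # skip a bare "_synthetic.csv" that has no generator name
            found[name] = path
    return found


if __name__ == "__main__":
    for name, path in discover_synthetic_files().items():
        print(f"{name}: {path}")
```

Sorting the matches keeps the generator order stable between runs, which helps keep bar and legend order reproducible if the charts are built from this mapping.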
39 | 
40 | ---
41 | 
42 | ## Adding New Generators or Datasets
43 | 
44 | 1. Generate your synthetic CSV and name it `<generator_name>_synthetic.csv`.  
45 | 2. Optionally, add a short description to the notebook’s metadata.  
46 | 3. Re-run the notebook; your new results will be added to the comparison charts.
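
Before re-running the notebook with a new generator, it can help to check that the new CSV has the same columns as the real data, because a column-wise metric can only compare columns present in both files. A minimal sketch, assuming both files start with a header row; `read_header` and `check_columns` are hypothetical helpers, not part of the notebook.

```python
import csv


def read_header(path):
    """Return the first row of a CSV file (its column names)."""
    # utf-8-sig strips a byte-order mark if one is present (common in
    # spreadsheet exports) and otherwise behaves like plain utf-8.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return next(csv.reader(f), [])


def check_columns(real_csv, synthetic_csv):
    """Return (missing, extra) column names for a synthetic CSV.

    missing: columns in the real data that the synthetic file lacks.
    extra:   columns in the synthetic file that the real data lacks.
    """
    real_cols = set(read_header(real_csv))
    synth_cols = set(read_header(synthetic_csv))
    return sorted(real_cols - synth_cols), sorted(synth_cols - real_cols)
```

For example, `check_columns("real.csv", "mygenerator_synthetic.csv")` returning `([], [])` means the headers match; `real.csv` is a placeholder, since this README does not name the real-data file.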
47 | 
48 | ---
49 | 
50 | ## Contributing
51 | 
52 | 1. Fork this repository.  
53 | 2. Create a feature branch (`git checkout -b feature/xyz`).  
54 | 3. Ensure the notebook runs end-to-end without errors.  
55 | 4. Submit a pull request with your updates.
56 | 
57 | ---
58 | 
59 | ## License
60 | 
61 | This project is released under the [MIT License](LICENSE).
62 | 


--------------------------------------------------------------------------------
/Syncora_vs_Gretel_vs_MostlyAI_metrics_comparison.ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |   "cells": [
   3 |     {
   4 |       "cell_type": "code",
   5 |       "execution_count": 2,
   6 |       "metadata": {
   7 |         "id": "llZS4N1pjedS"
   8 |       },
   9 |       "outputs": [],
  10 |       "source": [
  11 |         "import pandas as pd\n",
  12 |         "import numpy as np"
  13 |       ]
  14 |     },
  15 |     {
  16 |       "cell_type": "code",
  17 |       "execution_count": 40,
  18 |       "metadata": {
  19 |         "colab": {
  20 |           "base_uri": "https://localhost:8080/"
  21 |         },
  22 |         "id": "AtD6Tn_OlLdf",
  23 |         "outputId": "7651a509-4c6f-4e68-e4e6-8d9fad588cc0"
  24 |       },
  25 |       "outputs": [
  26 |         {
  27 |           "output_type": "stream",
  28 |           "name": "stdout",
  29 |           "text": [
  30 |             "Collecting sdmetrics\n",
  31 |             "  Downloading sdmetrics-0.22.0-py3-none-any.whl.metadata (9.4 kB)\n",
  32 |             "Requirement already satisfied: numpy>=1.24.0 in /usr/local/lib/python3.11/dist-packages (from sdmetrics) (2.0.2)\n",
  33 |             "Requirement already satisfied: pandas>=1.5.0 in /usr/local/lib/python3.11/dist-packages (from sdmetrics) (2.2.2)\n",
  34 |             "Requirement already satisfied: scikit-learn>=1.1.3 in /usr/local/lib/python3.11/dist-packages (from sdmetrics) (1.6.1)\n",
  35 |             "Requirement already satisfied: scipy>=1.9.2 in /usr/local/lib/python3.11/dist-packages (from sdmetrics) (1.16.0)\n",
  36 |             "Collecting copulas>=0.12.1 (from sdmetrics)\n",
  37 |             "  Downloading copulas-0.12.3-py3-none-any.whl.metadata (9.5 kB)\n",
  38 |             "Requirement already satisfied: tqdm>=4.29 in /usr/local/lib/python3.11/dist-packages (from sdmetrics) (4.67.1)\n",
  39 |             "Requirement already satisfied: plotly>=5.19.0 in /usr/local/lib/python3.11/dist-packages (from sdmetrics) (5.24.1)\n",
  40 |             "Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.11/dist-packages (from pandas>=1.5.0->sdmetrics) (2.9.0.post0)\n",
  41 |             "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.11/dist-packages (from pandas>=1.5.0->sdmetrics) (2025.2)\n",
  42 |             "Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.11/dist-packages (from pandas>=1.5.0->sdmetrics) (2025.2)\n",
  43 |             "Requirement already satisfied: tenacity>=6.2.0 in /usr/local/lib/python3.11/dist-packages (from plotly>=5.19.0->sdmetrics) (8.5.0)\n",
  44 |             "Requirement already satisfied: packaging in /usr/local/lib/python3.11/dist-packages (from plotly>=5.19.0->sdmetrics) (25.0)\n",
  45 |             "Requirement already satisfied: joblib>=1.2.0 in /usr/local/lib/python3.11/dist-packages (from scikit-learn>=1.1.3->sdmetrics) (1.5.1)\n",
  46 |             "Requirement already satisfied: threadpoolctl>=3.1.0 in /usr/local/lib/python3.11/dist-packages (from scikit-learn>=1.1.3->sdmetrics) (3.6.0)\n",
  47 |             "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.11/dist-packages (from python-dateutil>=2.8.2->pandas>=1.5.0->sdmetrics) (1.17.0)\n",
  48 |             "Downloading sdmetrics-0.22.0-py3-none-any.whl (198 kB)\n",
  50 |             "Downloading copulas-0.12.3-py3-none-any.whl (52 kB)\n",
  52 |             "Installing collected packages: copulas, sdmetrics\n",
  53 |             "Successfully installed copulas-0.12.3 sdmetrics-0.22.0\n"
  54 |           ]
  55 |         }
  56 |       ],
  57 |       "source": [
  58 |         "%pip install sdmetrics"
  59 |       ]
  60 |     },
  61 |     {
  62 |       "cell_type": "code",
  63 |       "execution_count": 17,
  64 |       "metadata": {
  65 |         "id": "2NfHINaxj0_C",
  66 |         "colab": {
  67 |           "base_uri": "https://localhost:8080/",
  68 |           "height": 424
  69 |         },
  70 |         "outputId": "a89b1fe8-3460-47df-eeb8-cbbaa520123d"
  71 |       },
  72 |       "outputs": [
  73 |         {
  74 |           "output_type": "execute_result",
  75 |           "data": {
  76 |             "text/plain": [
  77 |               "       Age  Gender  Blood Type  Medical Condition  Billing Amount  \\\n",
  78 |               "0       30     1.0         5.0                2.0    18856.281306   \n",
  79 |               "1       62     1.0         0.0                5.0    33643.327287   \n",
  80 |               "2       76     0.0         1.0                5.0    27955.096079   \n",
  81 |               "3       28     0.0         6.0                3.0    37909.782410   \n",
  82 |               "4       43     0.0         2.0                2.0    14238.317814   \n",
  83 |               "...    ...     ...         ...                ...             ...   \n",
  84 |               "55495   42     0.0         6.0                1.0     2650.714952   \n",
  85 |               "55496   61     0.0         3.0                5.0    31457.797307   \n",
  86 |               "55497   38     0.0         4.0                4.0    27620.764717   \n",
  87 |               "55498   43     1.0         7.0                0.0    32451.092358   \n",
  88 |               "55499   53     0.0         6.0                0.0     4010.134172   \n",
  89 |               "\n",
  90 |               "       Admission Type  Medication  Test Results  \n",
  91 |               "0                 2.0         3.0           2.0  \n",
  92 |               "1                 1.0         1.0           1.0  \n",
  93 |               "2                 1.0         0.0           2.0  \n",
  94 |               "3                 0.0         1.0           0.0  \n",
  95 |               "4                 2.0         4.0           0.0  \n",
  96 |               "...               ...         ...           ...  \n",
  97 |               "55495             0.0         4.0           0.0  \n",
  98 |               "55496             0.0         0.0           2.0  \n",
  99 |               "55497             2.0         1.0           0.0  \n",
 100 |               "55498             0.0         1.0           0.0  \n",
 101 |               "55499             2.0         1.0           0.0  \n",
 102 |               "\n",
 103 |               "[55500 rows x 8 columns]"
 104 |             ],
 105 |             "text/html": [
 106 |               "\n",
 107 |               "  <div id=\"df-a4c6e381-c1b2-479d-a96f-1bb63cd4f018\" class=\"colab-df-container\">\n",
 108 |               "    <div>\n",
 109 |               "<style scoped>\n",
 110 |               "    .dataframe tbody tr th:only-of-type {\n",
 111 |               "        vertical-align: middle;\n",
 112 |               "    }\n",
 113 |               "\n",
 114 |               "    .dataframe tbody tr th {\n",
 115 |               "        vertical-align: top;\n",
 116 |               "    }\n",
 117 |               "\n",
 118 |               "    .dataframe thead th {\n",
 119 |               "        text-align: right;\n",
 120 |               "    }\n",
 121 |               "</style>\n",
 122 |               "<table border=\"1\" class=\"dataframe\">\n",
 123 |               "  <thead>\n",
 124 |               "    <tr style=\"text-align: right;\">\n",
 125 |               "      <th></th>\n",
 126 |               "      <th>Age</th>\n",
 127 |               "      <th>Gender</th>\n",
 128 |               "      <th>Blood Type</th>\n",
 129 |               "      <th>Medical Condition</th>\n",
 130 |               "      <th>Billing Amount</th>\n",
 131 |               "      <th>Admission Type</th>\n",
 132 |               "      <th>Medication</th>\n",
 133 |               "      <th>Test Results</th>\n",
 134 |               "    </tr>\n",
 135 |               "  </thead>\n",
 136 |               "  <tbody>\n",
 137 |               "    <tr>\n",
 138 |               "      <th>0</th>\n",
 139 |               "      <td>30</td>\n",
 140 |               "      <td>1.0</td>\n",
 141 |               "      <td>5.0</td>\n",
 142 |               "      <td>2.0</td>\n",
 143 |               "      <td>18856.281306</td>\n",
 144 |               "      <td>2.0</td>\n",
 145 |               "      <td>3.0</td>\n",
 146 |               "      <td>2.0</td>\n",
 147 |               "    </tr>\n",
 148 |               "    <tr>\n",
 149 |               "      <th>1</th>\n",
 150 |               "      <td>62</td>\n",
 151 |               "      <td>1.0</td>\n",
 152 |               "      <td>0.0</td>\n",
 153 |               "      <td>5.0</td>\n",
 154 |               "      <td>33643.327287</td>\n",
 155 |               "      <td>1.0</td>\n",
 156 |               "      <td>1.0</td>\n",
 157 |               "      <td>1.0</td>\n",
 158 |               "    </tr>\n",
 159 |               "    <tr>\n",
 160 |               "      <th>2</th>\n",
 161 |               "      <td>76</td>\n",
 162 |               "      <td>0.0</td>\n",
 163 |               "      <td>1.0</td>\n",
 164 |               "      <td>5.0</td>\n",
 165 |               "      <td>27955.096079</td>\n",
 166 |               "      <td>1.0</td>\n",
 167 |               "      <td>0.0</td>\n",
 168 |               "      <td>2.0</td>\n",
 169 |               "    </tr>\n",
 170 |               "    <tr>\n",
 171 |               "      <th>3</th>\n",
 172 |               "      <td>28</td>\n",
 173 |               "      <td>0.0</td>\n",
 174 |               "      <td>6.0</td>\n",
 175 |               "      <td>3.0</td>\n",
 176 |               "      <td>37909.782410</td>\n",
 177 |               "      <td>0.0</td>\n",
 178 |               "      <td>1.0</td>\n",
 179 |               "      <td>0.0</td>\n",
 180 |               "    </tr>\n",
 181 |               "    <tr>\n",
 182 |               "      <th>4</th>\n",
 183 |               "      <td>43</td>\n",
 184 |               "      <td>0.0</td>\n",
 185 |               "      <td>2.0</td>\n",
 186 |               "      <td>2.0</td>\n",
 187 |               "      <td>14238.317814</td>\n",
 188 |               "      <td>2.0</td>\n",
 189 |               "      <td>4.0</td>\n",
 190 |               "      <td>0.0</td>\n",
 191 |               "    </tr>\n",
 192 |               "    <tr>\n",
 193 |               "      <th>...</th>\n",
 194 |               "      <td>...</td>\n",
 195 |               "      <td>...</td>\n",
 196 |               "      <td>...</td>\n",
 197 |               "      <td>...</td>\n",
 198 |               "      <td>...</td>\n",
 199 |               "      <td>...</td>\n",
 200 |               "      <td>...</td>\n",
 201 |               "      <td>...</td>\n",
 202 |               "    </tr>\n",
 203 |               "    <tr>\n",
 204 |               "      <th>55495</th>\n",
 205 |               "      <td>42</td>\n",
 206 |               "      <td>0.0</td>\n",
 207 |               "      <td>6.0</td>\n",
 208 |               "      <td>1.0</td>\n",
 209 |               "      <td>2650.714952</td>\n",
 210 |               "      <td>0.0</td>\n",
 211 |               "      <td>4.0</td>\n",
 212 |               "      <td>0.0</td>\n",
 213 |               "    </tr>\n",
 214 |               "    <tr>\n",
 215 |               "      <th>55496</th>\n",
 216 |               "      <td>61</td>\n",
 217 |               "      <td>0.0</td>\n",
 218 |               "      <td>3.0</td>\n",
 219 |               "      <td>5.0</td>\n",
 220 |               "      <td>31457.797307</td>\n",
 221 |               "      <td>0.0</td>\n",
 222 |               "      <td>0.0</td>\n",
 223 |               "      <td>2.0</td>\n",
 224 |               "    </tr>\n",
 225 |               "    <tr>\n",
 226 |               "      <th>55497</th>\n",
 227 |               "      <td>38</td>\n",
 228 |               "      <td>0.0</td>\n",
 229 |               "      <td>4.0</td>\n",
 230 |               "      <td>4.0</td>\n",
 231 |               "      <td>27620.764717</td>\n",
 232 |               "      <td>2.0</td>\n",
 233 |               "      <td>1.0</td>\n",
 234 |               "      <td>0.0</td>\n",
 235 |               "    </tr>\n",
 236 |               "    <tr>\n",
 237 |               "      <th>55498</th>\n",
 238 |               "      <td>43</td>\n",
 239 |               "      <td>1.0</td>\n",
 240 |               "      <td>7.0</td>\n",
 241 |               "      <td>0.0</td>\n",
 242 |               "      <td>32451.092358</td>\n",
 243 |               "      <td>0.0</td>\n",
 244 |               "      <td>1.0</td>\n",
 245 |               "      <td>0.0</td>\n",
 246 |               "    </tr>\n",
 247 |               "    <tr>\n",
 248 |               "      <th>55499</th>\n",
 249 |               "      <td>53</td>\n",
 250 |               "      <td>0.0</td>\n",
 251 |               "      <td>6.0</td>\n",
 252 |               "      <td>0.0</td>\n",
 253 |               "      <td>4010.134172</td>\n",
 254 |               "      <td>2.0</td>\n",
 255 |               "      <td>1.0</td>\n",
 256 |               "      <td>0.0</td>\n",
 257 |               "    </tr>\n",
 258 |               "  </tbody>\n",
 259 |               "</table>\n",
 260 |               "<p>55500 rows × 8 columns</p>\n",
 261 |               "</div>\n",
 262 |               "    <div class=\"colab-df-buttons\">\n",
 263 |               "\n",
 264 |               "  <div class=\"colab-df-container\">\n",
 265 |               "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-a4c6e381-c1b2-479d-a96f-1bb63cd4f018')\"\n",
 266 |               "            title=\"Convert this dataframe to an interactive table.\"\n",
 267 |               "            style=\"display:none;\">\n",
 268 |               "\n",
 269 |               "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
 270 |               "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
 271 |               "  </svg>\n",
 272 |               "    </button>\n",
 273 |               "\n",
 274 |               "  <style>\n",
 275 |               "    .colab-df-container {\n",
 276 |               "      display:flex;\n",
 277 |               "      gap: 12px;\n",
 278 |               "    }\n",
 279 |               "\n",
 280 |               "    .colab-df-convert {\n",
 281 |               "      background-color: #E8F0FE;\n",
 282 |               "      border: none;\n",
 283 |               "      border-radius: 50%;\n",
 284 |               "      cursor: pointer;\n",
 285 |               "      display: none;\n",
 286 |               "      fill: #1967D2;\n",
 287 |               "      height: 32px;\n",
 288 |               "      padding: 0 0 0 0;\n",
 289 |               "      width: 32px;\n",
 290 |               "    }\n",
 291 |               "\n",
 292 |               "    .colab-df-convert:hover {\n",
 293 |               "      background-color: #E2EBFA;\n",
 294 |               "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
 295 |               "      fill: #174EA6;\n",
 296 |               "    }\n",
 297 |               "\n",
 298 |               "    .colab-df-buttons div {\n",
 299 |               "      margin-bottom: 4px;\n",
 300 |               "    }\n",
 301 |               "\n",
 302 |               "    [theme=dark] .colab-df-convert {\n",
 303 |               "      background-color: #3B4455;\n",
 304 |               "      fill: #D2E3FC;\n",
 305 |               "    }\n",
 306 |               "\n",
 307 |               "    [theme=dark] .colab-df-convert:hover {\n",
 308 |               "      background-color: #434B5C;\n",
 309 |               "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
 310 |               "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
 311 |               "      fill: #FFFFFF;\n",
 312 |               "    }\n",
 313 |               "  </style>\n",
 314 |               "\n",
 315 |               "    <script>\n",
 316 |               "      const buttonEl =\n",
 317 |               "        document.querySelector('#df-a4c6e381-c1b2-479d-a96f-1bb63cd4f018 button.colab-df-convert');\n",
 318 |               "      buttonEl.style.display =\n",
 319 |               "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
 320 |               "\n",
 321 |               "      async function convertToInteractive(key) {\n",
 322 |               "        const element = document.querySelector('#df-a4c6e381-c1b2-479d-a96f-1bb63cd4f018');\n",
 323 |               "        const dataTable =\n",
 324 |               "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
 325 |               "                                                    [key], {});\n",
 326 |               "        if (!dataTable) return;\n",
 327 |               "\n",
 328 |               "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
 329 |               "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
 330 |               "          + ' to learn more about interactive tables.';\n",
 331 |               "        element.innerHTML = '';\n",
 332 |               "        dataTable['output_type'] = 'display_data';\n",
 333 |               "        await google.colab.output.renderOutput(dataTable, element);\n",
 334 |               "        const docLink = document.createElement('div');\n",
 335 |               "        docLink.innerHTML = docLinkHtml;\n",
 336 |               "        element.appendChild(docLink);\n",
 337 |               "      }\n",
 338 |               "    </script>\n",
 339 |               "  </div>\n",
 340 |               "\n",
 341 |               "\n",
 342 |               "    <div id=\"df-e7f5b196-d8c0-469f-9c54-6d7e2acec1be\">\n",
 343 |               "      <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-e7f5b196-d8c0-469f-9c54-6d7e2acec1be')\"\n",
 344 |               "                title=\"Suggest charts\"\n",
 345 |               "                style=\"display:none;\">\n",
 346 |               "\n",
 347 |               "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
 348 |               "     width=\"24px\">\n",
 349 |               "    <g>\n",
 350 |               "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
 351 |               "    </g>\n",
 352 |               "</svg>\n",
 353 |               "      </button>\n",
 354 |               "\n",
 355 |               "<style>\n",
 356 |               "  .colab-df-quickchart {\n",
 357 |               "      --bg-color: #E8F0FE;\n",
 358 |               "      --fill-color: #1967D2;\n",
 359 |               "      --hover-bg-color: #E2EBFA;\n",
 360 |               "      --hover-fill-color: #174EA6;\n",
 361 |               "      --disabled-fill-color: #AAA;\n",
 362 |               "      --disabled-bg-color: #DDD;\n",
 363 |               "  }\n",
 364 |               "\n",
 365 |               "  [theme=dark] .colab-df-quickchart {\n",
 366 |               "      --bg-color: #3B4455;\n",
 367 |               "      --fill-color: #D2E3FC;\n",
 368 |               "      --hover-bg-color: #434B5C;\n",
 369 |               "      --hover-fill-color: #FFFFFF;\n",
 370 |               "      --disabled-bg-color: #3B4455;\n",
 371 |               "      --disabled-fill-color: #666;\n",
 372 |               "  }\n",
 373 |               "\n",
 374 |               "  .colab-df-quickchart {\n",
 375 |               "    background-color: var(--bg-color);\n",
 376 |               "    border: none;\n",
 377 |               "    border-radius: 50%;\n",
 378 |               "    cursor: pointer;\n",
 379 |               "    display: none;\n",
 380 |               "    fill: var(--fill-color);\n",
 381 |               "    height: 32px;\n",
 382 |               "    padding: 0;\n",
 383 |               "    width: 32px;\n",
 384 |               "  }\n",
 385 |               "\n",
 386 |               "  .colab-df-quickchart:hover {\n",
 387 |               "    background-color: var(--hover-bg-color);\n",
 388 |               "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
 389 |               "    fill: var(--button-hover-fill-color);\n",
 390 |               "  }\n",
 391 |               "\n",
 392 |               "  .colab-df-quickchart-complete:disabled,\n",
 393 |               "  .colab-df-quickchart-complete:disabled:hover {\n",
 394 |               "    background-color: var(--disabled-bg-color);\n",
 395 |               "    fill: var(--disabled-fill-color);\n",
 396 |               "    box-shadow: none;\n",
 397 |               "  }\n",
 398 |               "\n",
 399 |               "  .colab-df-spinner {\n",
 400 |               "    border: 2px solid var(--fill-color);\n",
 401 |               "    border-color: transparent;\n",
 402 |               "    border-bottom-color: var(--fill-color);\n",
 403 |               "    animation:\n",
 404 |               "      spin 1s steps(1) infinite;\n",
 405 |               "  }\n",
 406 |               "\n",
 407 |               "  @keyframes spin {\n",
 408 |               "    0% {\n",
 409 |               "      border-color: transparent;\n",
 410 |               "      border-bottom-color: var(--fill-color);\n",
 411 |               "      border-left-color: var(--fill-color);\n",
 412 |               "    }\n",
 413 |               "    20% {\n",
 414 |               "      border-color: transparent;\n",
 415 |               "      border-left-color: var(--fill-color);\n",
 416 |               "      border-top-color: var(--fill-color);\n",
 417 |               "    }\n",
 418 |               "    30% {\n",
 419 |               "      border-color: transparent;\n",
 420 |               "      border-left-color: var(--fill-color);\n",
 421 |               "      border-top-color: var(--fill-color);\n",
 422 |               "      border-right-color: var(--fill-color);\n",
 423 |               "    }\n",
 424 |               "    40% {\n",
 425 |               "      border-color: transparent;\n",
 426 |               "      border-right-color: var(--fill-color);\n",
 427 |               "      border-top-color: var(--fill-color);\n",
 428 |               "    }\n",
 429 |               "    60% {\n",
 430 |               "      border-color: transparent;\n",
 431 |               "      border-right-color: var(--fill-color);\n",
 432 |               "    }\n",
 433 |               "    80% {\n",
 434 |               "      border-color: transparent;\n",
 435 |               "      border-right-color: var(--fill-color);\n",
 436 |               "      border-bottom-color: var(--fill-color);\n",
 437 |               "    }\n",
 438 |               "    90% {\n",
 439 |               "      border-color: transparent;\n",
 440 |               "      border-bottom-color: var(--fill-color);\n",
 441 |               "    }\n",
 442 |               "  }\n",
 443 |               "</style>\n",
 444 |               "\n",
 445 |               "      <script>\n",
 446 |               "        async function quickchart(key) {\n",
 447 |               "          const quickchartButtonEl =\n",
 448 |               "            document.querySelector('#' + key + ' button');\n",
 449 |               "          quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
 450 |               "          quickchartButtonEl.classList.add('colab-df-spinner');\n",
 451 |               "          try {\n",
 452 |               "            const charts = await google.colab.kernel.invokeFunction(\n",
 453 |               "                'suggestCharts', [key], {});\n",
 454 |               "          } catch (error) {\n",
 455 |               "            console.error('Error during call to suggestCharts:', error);\n",
 456 |               "          }\n",
 457 |               "          quickchartButtonEl.classList.remove('colab-df-spinner');\n",
 458 |               "          quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
 459 |               "        }\n",
 460 |               "        (() => {\n",
 461 |               "          let quickchartButtonEl =\n",
 462 |               "            document.querySelector('#df-e7f5b196-d8c0-469f-9c54-6d7e2acec1be button');\n",
 463 |               "          quickchartButtonEl.style.display =\n",
 464 |               "            google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
 465 |               "        })();\n",
 466 |               "      </script>\n",
 467 |               "    </div>\n",
 468 |               "\n",
 469 |               "  <div id=\"id_c655cedc-6894-4937-9b78-997213c1a34f\">\n",
 470 |               "    <style>\n",
 471 |               "      .colab-df-generate {\n",
 472 |               "        background-color: #E8F0FE;\n",
 473 |               "        border: none;\n",
 474 |               "        border-radius: 50%;\n",
 475 |               "        cursor: pointer;\n",
 476 |               "        display: none;\n",
 477 |               "        fill: #1967D2;\n",
 478 |               "        height: 32px;\n",
 479 |               "        padding: 0 0 0 0;\n",
 480 |               "        width: 32px;\n",
 481 |               "      }\n",
 482 |               "\n",
 483 |               "      .colab-df-generate:hover {\n",
 484 |               "        background-color: #E2EBFA;\n",
 485 |               "        box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
 486 |               "        fill: #174EA6;\n",
 487 |               "      }\n",
 488 |               "\n",
 489 |               "      [theme=dark] .colab-df-generate {\n",
 490 |               "        background-color: #3B4455;\n",
 491 |               "        fill: #D2E3FC;\n",
 492 |               "      }\n",
 493 |               "\n",
 494 |               "      [theme=dark] .colab-df-generate:hover {\n",
 495 |               "        background-color: #434B5C;\n",
 496 |               "        box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
 497 |               "        filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
 498 |               "        fill: #FFFFFF;\n",
 499 |               "      }\n",
 500 |               "    </style>\n",
 501 |               "    <button class=\"colab-df-generate\" onclick=\"generateWithVariable('df_real')\"\n",
 502 |               "            title=\"Generate code using this dataframe.\"\n",
 503 |               "            style=\"display:none;\">\n",
 504 |               "\n",
 505 |               "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
 506 |               "       width=\"24px\">\n",
 507 |               "    <path d=\"M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z\"/>\n",
 508 |               "  </svg>\n",
 509 |               "    </button>\n",
 510 |               "    <script>\n",
 511 |               "      (() => {\n",
 512 |               "      const buttonEl =\n",
 513 |               "        document.querySelector('#id_c655cedc-6894-4937-9b78-997213c1a34f button.colab-df-generate');\n",
 514 |               "      buttonEl.style.display =\n",
 515 |               "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
 516 |               "\n",
 517 |               "      buttonEl.onclick = () => {\n",
 518 |               "        google.colab.notebook.generateWithVariable('df_real');\n",
 519 |               "      }\n",
 520 |               "      })();\n",
 521 |               "    </script>\n",
 522 |               "  </div>\n",
 523 |               "\n",
 524 |               "    </div>\n",
 525 |               "  </div>\n"
 526 |             ],
 527 |             "application/vnd.google.colaboratory.intrinsic+json": {
 528 |               "type": "dataframe",
 529 |               "variable_name": "df_real",
 530 |               "summary": "{\n  \"name\": \"df_real\",\n  \"rows\": 55500,\n  \"fields\": [\n    {\n      \"column\": \"Age\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 19,\n        \"min\": 13,\n        \"max\": 89,\n        \"num_unique_values\": 77,\n        \"samples\": [\n          43,\n          22,\n          72\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Gender\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 0.5000043175658112,\n        \"min\": 0.0,\n        \"max\": 1.0,\n        \"num_unique_values\": 2,\n        \"samples\": [\n          0.0,\n          1.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Blood Type\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 2.289699610900092,\n        \"min\": 0.0,\n        \"max\": 7.0,\n        \"num_unique_values\": 8,\n        \"samples\": [\n          0.0,\n          3.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Medical Condition\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 1.7083359197155301,\n        \"min\": 0.0,\n        \"max\": 5.0,\n        \"num_unique_values\": 6,\n        \"samples\": [\n          2.0,\n          5.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Billing Amount\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 14211.45443086446,\n        \"min\": -2008.4921398591305,\n        \"max\": 52764.276736469175,\n        \"num_unique_values\": 50000,\n        \"samples\": [\n          41172.960486003554,\n          7672.233633429568\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n 
     \"column\": \"Admission Type\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 0.8190475504400777,\n        \"min\": 0.0,\n        \"max\": 2.0,\n        \"num_unique_values\": 3,\n        \"samples\": [\n          2.0,\n          1.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Medication\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 1.4132435830881946,\n        \"min\": 0.0,\n        \"max\": 4.0,\n        \"num_unique_values\": 5,\n        \"samples\": [\n          1.0,\n          2.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Test Results\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 0.8180888655374859,\n        \"min\": 0.0,\n        \"max\": 2.0,\n        \"num_unique_values\": 3,\n        \"samples\": [\n          2.0,\n          1.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}"
 531 |             }
 532 |           },
 533 |           "metadata": {},
 534 |           "execution_count": 17
 535 |         }
 536 |       ],
 537 |       "source": [
 538 |         "df_real = pd.read_csv('/content/healthcare_cleaned_data.csv')\n",
 539 |         "df_real"
 540 |       ]
 541 |     },
 542 |     {
 543 |       "cell_type": "code",
 544 |       "execution_count": 15,
 545 |       "metadata": {
 546 |         "id": "Zc__Y5t2lwIa",
 547 |         "colab": {
 548 |           "base_uri": "https://localhost:8080/",
 549 |           "height": 424
 550 |         },
 551 |         "outputId": "82bd5713-5598-4391-f927-a189a135b89e"
 552 |       },
 553 |       "outputs": [
 554 |         {
 555 |           "output_type": "execute_result",
 556 |           "data": {
 557 |             "text/plain": [
 558 |               "       Age  Gender  Blood Type  Medical Condition  Billing Amount  \\\n",
 559 |               "0       52     1.0         7.0                4.0    19205.266739   \n",
 560 |               "1       75     0.0         4.0                1.0     1189.229029   \n",
 561 |               "2       62     1.0         3.0                4.0     8068.886263   \n",
 562 |               "3       61     0.0         4.0                3.0     7179.079255   \n",
 563 |               "4       65     0.0         3.0                3.0    12120.088272   \n",
 564 |               "...    ...     ...         ...                ...             ...   \n",
 565 |               "29993   74     1.0         0.0                4.0    27015.554780   \n",
 566 |               "29994   53     1.0         3.0                2.0    45501.646881   \n",
 567 |               "29995   61     1.0         0.0                3.0    36968.704333   \n",
 568 |               "29996   44     1.0         6.0                5.0    48874.126856   \n",
 569 |               "29997   61     1.0         2.0                0.0    25784.574781   \n",
 570 |               "\n",
 571 |               "       Admission Type  Medication  Test Results  \n",
 572 |               "0                 1.0         3.0           1.0  \n",
 573 |               "1                 0.0         1.0           1.0  \n",
 574 |               "2                 2.0         2.0           0.0  \n",
 575 |               "3                 0.0         3.0           1.0  \n",
 576 |               "4                 1.0         4.0           1.0  \n",
 577 |               "...               ...         ...           ...  \n",
 578 |               "29993             0.0         0.0           0.0  \n",
 579 |               "29994             0.0         1.0           2.0  \n",
 580 |               "29995             2.0         0.0           1.0  \n",
 581 |               "29996             2.0         2.0           0.0  \n",
 582 |               "29997             2.0         3.0           1.0  \n",
 583 |               "\n",
 584 |               "[29998 rows x 8 columns]"
 585 |             ],
 586 |             "text/html": [
 587 |               "\n",
 588 |               "  <div id=\"df-69177456-3349-4d69-ae57-82537c257dc0\" class=\"colab-df-container\">\n",
 589 |               "    <div>\n",
 590 |               "<style scoped>\n",
 591 |               "    .dataframe tbody tr th:only-of-type {\n",
 592 |               "        vertical-align: middle;\n",
 593 |               "    }\n",
 594 |               "\n",
 595 |               "    .dataframe tbody tr th {\n",
 596 |               "        vertical-align: top;\n",
 597 |               "    }\n",
 598 |               "\n",
 599 |               "    .dataframe thead th {\n",
 600 |               "        text-align: right;\n",
 601 |               "    }\n",
 602 |               "</style>\n",
 603 |               "<table border=\"1\" class=\"dataframe\">\n",
 604 |               "  <thead>\n",
 605 |               "    <tr style=\"text-align: right;\">\n",
 606 |               "      <th></th>\n",
 607 |               "      <th>Age</th>\n",
 608 |               "      <th>Gender</th>\n",
 609 |               "      <th>Blood Type</th>\n",
 610 |               "      <th>Medical Condition</th>\n",
 611 |               "      <th>Billing Amount</th>\n",
 612 |               "      <th>Admission Type</th>\n",
 613 |               "      <th>Medication</th>\n",
 614 |               "      <th>Test Results</th>\n",
 615 |               "    </tr>\n",
 616 |               "  </thead>\n",
 617 |               "  <tbody>\n",
 618 |               "    <tr>\n",
 619 |               "      <th>0</th>\n",
 620 |               "      <td>52</td>\n",
 621 |               "      <td>1.0</td>\n",
 622 |               "      <td>7.0</td>\n",
 623 |               "      <td>4.0</td>\n",
 624 |               "      <td>19205.266739</td>\n",
 625 |               "      <td>1.0</td>\n",
 626 |               "      <td>3.0</td>\n",
 627 |               "      <td>1.0</td>\n",
 628 |               "    </tr>\n",
 629 |               "    <tr>\n",
 630 |               "      <th>1</th>\n",
 631 |               "      <td>75</td>\n",
 632 |               "      <td>0.0</td>\n",
 633 |               "      <td>4.0</td>\n",
 634 |               "      <td>1.0</td>\n",
 635 |               "      <td>1189.229029</td>\n",
 636 |               "      <td>0.0</td>\n",
 637 |               "      <td>1.0</td>\n",
 638 |               "      <td>1.0</td>\n",
 639 |               "    </tr>\n",
 640 |               "    <tr>\n",
 641 |               "      <th>2</th>\n",
 642 |               "      <td>62</td>\n",
 643 |               "      <td>1.0</td>\n",
 644 |               "      <td>3.0</td>\n",
 645 |               "      <td>4.0</td>\n",
 646 |               "      <td>8068.886263</td>\n",
 647 |               "      <td>2.0</td>\n",
 648 |               "      <td>2.0</td>\n",
 649 |               "      <td>0.0</td>\n",
 650 |               "    </tr>\n",
 651 |               "    <tr>\n",
 652 |               "      <th>3</th>\n",
 653 |               "      <td>61</td>\n",
 654 |               "      <td>0.0</td>\n",
 655 |               "      <td>4.0</td>\n",
 656 |               "      <td>3.0</td>\n",
 657 |               "      <td>7179.079255</td>\n",
 658 |               "      <td>0.0</td>\n",
 659 |               "      <td>3.0</td>\n",
 660 |               "      <td>1.0</td>\n",
 661 |               "    </tr>\n",
 662 |               "    <tr>\n",
 663 |               "      <th>4</th>\n",
 664 |               "      <td>65</td>\n",
 665 |               "      <td>0.0</td>\n",
 666 |               "      <td>3.0</td>\n",
 667 |               "      <td>3.0</td>\n",
 668 |               "      <td>12120.088272</td>\n",
 669 |               "      <td>1.0</td>\n",
 670 |               "      <td>4.0</td>\n",
 671 |               "      <td>1.0</td>\n",
 672 |               "    </tr>\n",
 673 |               "    <tr>\n",
 674 |               "      <th>...</th>\n",
 675 |               "      <td>...</td>\n",
 676 |               "      <td>...</td>\n",
 677 |               "      <td>...</td>\n",
 678 |               "      <td>...</td>\n",
 679 |               "      <td>...</td>\n",
 680 |               "      <td>...</td>\n",
 681 |               "      <td>...</td>\n",
 682 |               "      <td>...</td>\n",
 683 |               "    </tr>\n",
 684 |               "    <tr>\n",
 685 |               "      <th>29993</th>\n",
 686 |               "      <td>74</td>\n",
 687 |               "      <td>1.0</td>\n",
 688 |               "      <td>0.0</td>\n",
 689 |               "      <td>4.0</td>\n",
 690 |               "      <td>27015.554780</td>\n",
 691 |               "      <td>0.0</td>\n",
 692 |               "      <td>0.0</td>\n",
 693 |               "      <td>0.0</td>\n",
 694 |               "    </tr>\n",
 695 |               "    <tr>\n",
 696 |               "      <th>29994</th>\n",
 697 |               "      <td>53</td>\n",
 698 |               "      <td>1.0</td>\n",
 699 |               "      <td>3.0</td>\n",
 700 |               "      <td>2.0</td>\n",
 701 |               "      <td>45501.646881</td>\n",
 702 |               "      <td>0.0</td>\n",
 703 |               "      <td>1.0</td>\n",
 704 |               "      <td>2.0</td>\n",
 705 |               "    </tr>\n",
 706 |               "    <tr>\n",
 707 |               "      <th>29995</th>\n",
 708 |               "      <td>61</td>\n",
 709 |               "      <td>1.0</td>\n",
 710 |               "      <td>0.0</td>\n",
 711 |               "      <td>3.0</td>\n",
 712 |               "      <td>36968.704333</td>\n",
 713 |               "      <td>2.0</td>\n",
 714 |               "      <td>0.0</td>\n",
 715 |               "      <td>1.0</td>\n",
 716 |               "    </tr>\n",
 717 |               "    <tr>\n",
 718 |               "      <th>29996</th>\n",
 719 |               "      <td>44</td>\n",
 720 |               "      <td>1.0</td>\n",
 721 |               "      <td>6.0</td>\n",
 722 |               "      <td>5.0</td>\n",
 723 |               "      <td>48874.126856</td>\n",
 724 |               "      <td>2.0</td>\n",
 725 |               "      <td>2.0</td>\n",
 726 |               "      <td>0.0</td>\n",
 727 |               "    </tr>\n",
 728 |               "    <tr>\n",
 729 |               "      <th>29997</th>\n",
 730 |               "      <td>61</td>\n",
 731 |               "      <td>1.0</td>\n",
 732 |               "      <td>2.0</td>\n",
 733 |               "      <td>0.0</td>\n",
 734 |               "      <td>25784.574781</td>\n",
 735 |               "      <td>2.0</td>\n",
 736 |               "      <td>3.0</td>\n",
 737 |               "      <td>1.0</td>\n",
 738 |               "    </tr>\n",
 739 |               "  </tbody>\n",
 740 |               "</table>\n",
 741 |               "<p>29998 rows × 8 columns</p>\n",
 742 |               "</div>\n",
1006 |               "  </div>\n"
1007 |             ],
1008 |             "application/vnd.google.colaboratory.intrinsic+json": {
1009 |               "type": "dataframe",
1010 |               "variable_name": "df_syncora",
1011 |               "summary": "{\n  \"name\": \"df_syncora\",\n  \"rows\": 29998,\n  \"fields\": [\n    {\n      \"column\": \"Age\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 19,\n        \"min\": 10,\n        \"max\": 92,\n        \"num_unique_values\": 82,\n        \"samples\": [\n          46,\n          52,\n          71\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Gender\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 0.5000074628547292,\n        \"min\": 0.0,\n        \"max\": 1.0,\n        \"num_unique_values\": 2,\n        \"samples\": [\n          0.0,\n          1.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Blood Type\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 2.297786847763739,\n        \"min\": 0.0,\n        \"max\": 7.0,\n        \"num_unique_values\": 8,\n        \"samples\": [\n          4.0,\n          5.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Medical Condition\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 1.7031843744873485,\n        \"min\": 0.0,\n        \"max\": 5.0,\n        \"num_unique_values\": 6,\n        \"samples\": [\n          4.0,\n          1.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Billing Amount\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 14244.80230719376,\n        \"min\": -2503.2441829154573,\n        \"max\": 54927.96333269359,\n        \"num_unique_values\": 29998,\n        \"samples\": [\n          22686.23873928449,\n          23125.61129632902\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    
{\n      \"column\": \"Admission Type\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 0.8212432357383687,\n        \"min\": 0.0,\n        \"max\": 2.0,\n        \"num_unique_values\": 3,\n        \"samples\": [\n          1.0,\n          0.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Medication\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 1.4136687408242463,\n        \"min\": 0.0,\n        \"max\": 4.0,\n        \"num_unique_values\": 5,\n        \"samples\": [\n          1.0,\n          0.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Test Results\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 0.8145298486109528,\n        \"min\": 0.0,\n        \"max\": 2.0,\n        \"num_unique_values\": 3,\n        \"samples\": [\n          1.0,\n          0.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}"
1012 |             }
1013 |           },
1014 |           "metadata": {},
1015 |           "execution_count": 15
1016 |         }
1017 |       ],
1018 |       "source": [
1019 |         "df_syncora = pd.read_csv('/content/syncora-healthcare.csv')\n",
1020 |         "df_syncora"
1021 |       ]
1022 |     },
1023 |     {
1024 |       "cell_type": "code",
1025 |       "execution_count": 23,
1026 |       "metadata": {
1027 |         "id": "HFe0Dwj71SQV",
1028 |         "colab": {
1029 |           "base_uri": "https://localhost:8080/",
1030 |           "height": 424
1031 |         },
1032 |         "outputId": "365dff1e-dba5-42bf-bb61-db9e932a1980"
1033 |       },
1034 |       "outputs": [
1035 |         {
1036 |           "output_type": "execute_result",
1037 |           "data": {
1038 |             "text/plain": [
1039 |               "       Age  Gender  Blood Type  Medical Condition  Billing Amount  \\\n",
1040 |               "0       57       1           1                4.0     9222.063822   \n",
1041 |               "1       61       1           5                0.0    48199.843441   \n",
1042 |               "2       79       0           2                5.0    34559.720382   \n",
1043 |               "3       38       1           3                3.0     5152.106075   \n",
1044 |               "4       20       0           5                5.0    47127.044982   \n",
1045 |               "...    ...     ...         ...                ...             ...   \n",
1046 |               "29995   67       0           5                3.0    24048.348990   \n",
1047 |               "29996   67       1           0                5.0      306.935522   \n",
1048 |               "29997   62       1           1                0.0     7144.839921   \n",
1049 |               "29998   34       0           1                2.0    39901.103876   \n",
1050 |               "29999   50       1           2                1.0    12641.611835   \n",
1051 |               "\n",
1052 |               "       Admission Type  Medication  Test Results  \n",
1053 |               "0                   2           2             2  \n",
1054 |               "1                   1           0             1  \n",
1055 |               "2                   2           0             1  \n",
1056 |               "3                   1           0             1  \n",
1057 |               "4                   0           0             2  \n",
1058 |               "...               ...         ...           ...  \n",
1059 |               "29995               1           1             1  \n",
1060 |               "29996               2           4             2  \n",
1061 |               "29997               2           3             0  \n",
1062 |               "29998               1           1             0  \n",
1063 |               "29999               2           2             0  \n",
1064 |               "\n",
1065 |               "[30000 rows x 8 columns]"
1066 |             ],
1067 |             "text/html": [
1068 |               "\n",
1069 |               "  <div id=\"df-34c165e1-00ad-42a1-b32e-f4c36c642245\" class=\"colab-df-container\">\n",
1070 |               "    <div>\n",
1071 |               "<style scoped>\n",
1072 |               "    .dataframe tbody tr th:only-of-type {\n",
1073 |               "        vertical-align: middle;\n",
1074 |               "    }\n",
1075 |               "\n",
1076 |               "    .dataframe tbody tr th {\n",
1077 |               "        vertical-align: top;\n",
1078 |               "    }\n",
1079 |               "\n",
1080 |               "    .dataframe thead th {\n",
1081 |               "        text-align: right;\n",
1082 |               "    }\n",
1083 |               "</style>\n",
1084 |               "<table border=\"1\" class=\"dataframe\">\n",
1085 |               "  <thead>\n",
1086 |               "    <tr style=\"text-align: right;\">\n",
1087 |               "      <th></th>\n",
1088 |               "      <th>Age</th>\n",
1089 |               "      <th>Gender</th>\n",
1090 |               "      <th>Blood Type</th>\n",
1091 |               "      <th>Medical Condition</th>\n",
1092 |               "      <th>Billing Amount</th>\n",
1093 |               "      <th>Admission Type</th>\n",
1094 |               "      <th>Medication</th>\n",
1095 |               "      <th>Test Results</th>\n",
1096 |               "    </tr>\n",
1097 |               "  </thead>\n",
1098 |               "  <tbody>\n",
1099 |               "    <tr>\n",
1100 |               "      <th>0</th>\n",
1101 |               "      <td>57</td>\n",
1102 |               "      <td>1</td>\n",
1103 |               "      <td>1</td>\n",
1104 |               "      <td>4.0</td>\n",
1105 |               "      <td>9222.063822</td>\n",
1106 |               "      <td>2</td>\n",
1107 |               "      <td>2</td>\n",
1108 |               "      <td>2</td>\n",
1109 |               "    </tr>\n",
1110 |               "    <tr>\n",
1111 |               "      <th>1</th>\n",
1112 |               "      <td>61</td>\n",
1113 |               "      <td>1</td>\n",
1114 |               "      <td>5</td>\n",
1115 |               "      <td>0.0</td>\n",
1116 |               "      <td>48199.843441</td>\n",
1117 |               "      <td>1</td>\n",
1118 |               "      <td>0</td>\n",
1119 |               "      <td>1</td>\n",
1120 |               "    </tr>\n",
1121 |               "    <tr>\n",
1122 |               "      <th>2</th>\n",
1123 |               "      <td>79</td>\n",
1124 |               "      <td>0</td>\n",
1125 |               "      <td>2</td>\n",
1126 |               "      <td>5.0</td>\n",
1127 |               "      <td>34559.720382</td>\n",
1128 |               "      <td>2</td>\n",
1129 |               "      <td>0</td>\n",
1130 |               "      <td>1</td>\n",
1131 |               "    </tr>\n",
1132 |               "    <tr>\n",
1133 |               "      <th>3</th>\n",
1134 |               "      <td>38</td>\n",
1135 |               "      <td>1</td>\n",
1136 |               "      <td>3</td>\n",
1137 |               "      <td>3.0</td>\n",
1138 |               "      <td>5152.106075</td>\n",
1139 |               "      <td>1</td>\n",
1140 |               "      <td>0</td>\n",
1141 |               "      <td>1</td>\n",
1142 |               "    </tr>\n",
1143 |               "    <tr>\n",
1144 |               "      <th>4</th>\n",
1145 |               "      <td>20</td>\n",
1146 |               "      <td>0</td>\n",
1147 |               "      <td>5</td>\n",
1148 |               "      <td>5.0</td>\n",
1149 |               "      <td>47127.044982</td>\n",
1150 |               "      <td>0</td>\n",
1151 |               "      <td>0</td>\n",
1152 |               "      <td>2</td>\n",
1153 |               "    </tr>\n",
1154 |               "    <tr>\n",
1155 |               "      <th>...</th>\n",
1156 |               "      <td>...</td>\n",
1157 |               "      <td>...</td>\n",
1158 |               "      <td>...</td>\n",
1159 |               "      <td>...</td>\n",
1160 |               "      <td>...</td>\n",
1161 |               "      <td>...</td>\n",
1162 |               "      <td>...</td>\n",
1163 |               "      <td>...</td>\n",
1164 |               "    </tr>\n",
1165 |               "    <tr>\n",
1166 |               "      <th>29995</th>\n",
1167 |               "      <td>67</td>\n",
1168 |               "      <td>0</td>\n",
1169 |               "      <td>5</td>\n",
1170 |               "      <td>3.0</td>\n",
1171 |               "      <td>24048.348990</td>\n",
1172 |               "      <td>1</td>\n",
1173 |               "      <td>1</td>\n",
1174 |               "      <td>1</td>\n",
1175 |               "    </tr>\n",
1176 |               "    <tr>\n",
1177 |               "      <th>29996</th>\n",
1178 |               "      <td>67</td>\n",
1179 |               "      <td>1</td>\n",
1180 |               "      <td>0</td>\n",
1181 |               "      <td>5.0</td>\n",
1182 |               "      <td>306.935522</td>\n",
1183 |               "      <td>2</td>\n",
1184 |               "      <td>4</td>\n",
1185 |               "      <td>2</td>\n",
1186 |               "    </tr>\n",
1187 |               "    <tr>\n",
1188 |               "      <th>29997</th>\n",
1189 |               "      <td>62</td>\n",
1190 |               "      <td>1</td>\n",
1191 |               "      <td>1</td>\n",
1192 |               "      <td>0.0</td>\n",
1193 |               "      <td>7144.839921</td>\n",
1194 |               "      <td>2</td>\n",
1195 |               "      <td>3</td>\n",
1196 |               "      <td>0</td>\n",
1197 |               "    </tr>\n",
1198 |               "    <tr>\n",
1199 |               "      <th>29998</th>\n",
1200 |               "      <td>34</td>\n",
1201 |               "      <td>0</td>\n",
1202 |               "      <td>1</td>\n",
1203 |               "      <td>2.0</td>\n",
1204 |               "      <td>39901.103876</td>\n",
1205 |               "      <td>1</td>\n",
1206 |               "      <td>1</td>\n",
1207 |               "      <td>0</td>\n",
1208 |               "    </tr>\n",
1209 |               "    <tr>\n",
1210 |               "      <th>29999</th>\n",
1211 |               "      <td>50</td>\n",
1212 |               "      <td>1</td>\n",
1213 |               "      <td>2</td>\n",
1214 |               "      <td>1.0</td>\n",
1215 |               "      <td>12641.611835</td>\n",
1216 |               "      <td>2</td>\n",
1217 |               "      <td>2</td>\n",
1218 |               "      <td>0</td>\n",
1219 |               "    </tr>\n",
1220 |               "  </tbody>\n",
1221 |               "</table>\n",
1222 |               "<p>30000 rows × 8 columns</p>\n",
1223 |               "</div>\n",
1224 |               "    <div class=\"colab-df-buttons\">\n",
1225 |               "\n",
1226 |               "  <div class=\"colab-df-container\">\n",
1227 |               "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-34c165e1-00ad-42a1-b32e-f4c36c642245')\"\n",
1228 |               "            title=\"Convert this dataframe to an interactive table.\"\n",
1229 |               "            style=\"display:none;\">\n",
1230 |               "\n",
1231 |               "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
1232 |               "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
1233 |               "  </svg>\n",
1234 |               "    </button>\n",
1235 |               "\n",
1236 |               "  <style>\n",
1237 |               "    .colab-df-container {\n",
1238 |               "      display:flex;\n",
1239 |               "      gap: 12px;\n",
1240 |               "    }\n",
1241 |               "\n",
1242 |               "    .colab-df-convert {\n",
1243 |               "      background-color: #E8F0FE;\n",
1244 |               "      border: none;\n",
1245 |               "      border-radius: 50%;\n",
1246 |               "      cursor: pointer;\n",
1247 |               "      display: none;\n",
1248 |               "      fill: #1967D2;\n",
1249 |               "      height: 32px;\n",
1250 |               "      padding: 0 0 0 0;\n",
1251 |               "      width: 32px;\n",
1252 |               "    }\n",
1253 |               "\n",
1254 |               "    .colab-df-convert:hover {\n",
1255 |               "      background-color: #E2EBFA;\n",
1256 |               "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
1257 |               "      fill: #174EA6;\n",
1258 |               "    }\n",
1259 |               "\n",
1260 |               "    .colab-df-buttons div {\n",
1261 |               "      margin-bottom: 4px;\n",
1262 |               "    }\n",
1263 |               "\n",
1264 |               "    [theme=dark] .colab-df-convert {\n",
1265 |               "      background-color: #3B4455;\n",
1266 |               "      fill: #D2E3FC;\n",
1267 |               "    }\n",
1268 |               "\n",
1269 |               "    [theme=dark] .colab-df-convert:hover {\n",
1270 |               "      background-color: #434B5C;\n",
1271 |               "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
1272 |               "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
1273 |               "      fill: #FFFFFF;\n",
1274 |               "    }\n",
1275 |               "  </style>\n",
1276 |               "\n",
1277 |               "    <script>\n",
1278 |               "      const buttonEl =\n",
1279 |               "        document.querySelector('#df-34c165e1-00ad-42a1-b32e-f4c36c642245 button.colab-df-convert');\n",
1280 |               "      buttonEl.style.display =\n",
1281 |               "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
1282 |               "\n",
1283 |               "      async function convertToInteractive(key) {\n",
1284 |               "        const element = document.querySelector('#df-34c165e1-00ad-42a1-b32e-f4c36c642245');\n",
1285 |               "        const dataTable =\n",
1286 |               "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
1287 |               "                                                    [key], {});\n",
1288 |               "        if (!dataTable) return;\n",
1289 |               "\n",
1290 |               "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
1291 |               "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
1292 |               "          + ' to learn more about interactive tables.';\n",
1293 |               "        element.innerHTML = '';\n",
1294 |               "        dataTable['output_type'] = 'display_data';\n",
1295 |               "        await google.colab.output.renderOutput(dataTable, element);\n",
1296 |               "        const docLink = document.createElement('div');\n",
1297 |               "        docLink.innerHTML = docLinkHtml;\n",
1298 |               "        element.appendChild(docLink);\n",
1299 |               "      }\n",
1300 |               "    </script>\n",
1301 |               "  </div>\n",
1302 |               "\n",
1303 |               "\n",
1304 |               "    <div id=\"df-54babb5f-fbd6-4d03-aae7-0eb0cad636d3\">\n",
1305 |               "      <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-54babb5f-fbd6-4d03-aae7-0eb0cad636d3')\"\n",
1306 |               "                title=\"Suggest charts\"\n",
1307 |               "                style=\"display:none;\">\n",
1308 |               "\n",
1309 |               "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 0 24 24\"\n",
1310 |               "     width=\"24px\">\n",
1311 |               "    <g>\n",
1312 |               "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
1313 |               "    </g>\n",
1314 |               "</svg>\n",
1315 |               "      </button>\n",
1316 |               "\n",
1317 |               "<style>\n",
1318 |               "  .colab-df-quickchart {\n",
1319 |               "      --bg-color: #E8F0FE;\n",
1320 |               "      --fill-color: #1967D2;\n",
1321 |               "      --hover-bg-color: #E2EBFA;\n",
1322 |               "      --hover-fill-color: #174EA6;\n",
1323 |               "      --disabled-fill-color: #AAA;\n",
1324 |               "      --disabled-bg-color: #DDD;\n",
1325 |               "  }\n",
1326 |               "\n",
1327 |               "  [theme=dark] .colab-df-quickchart {\n",
1328 |               "      --bg-color: #3B4455;\n",
1329 |               "      --fill-color: #D2E3FC;\n",
1330 |               "      --hover-bg-color: #434B5C;\n",
1331 |               "      --hover-fill-color: #FFFFFF;\n",
1332 |               "      --disabled-bg-color: #3B4455;\n",
1333 |               "      --disabled-fill-color: #666;\n",
1334 |               "  }\n",
1335 |               "\n",
1336 |               "  .colab-df-quickchart {\n",
1337 |               "    background-color: var(--bg-color);\n",
1338 |               "    border: none;\n",
1339 |               "    border-radius: 50%;\n",
1340 |               "    cursor: pointer;\n",
1341 |               "    display: none;\n",
1342 |               "    fill: var(--fill-color);\n",
1343 |               "    height: 32px;\n",
1344 |               "    padding: 0;\n",
1345 |               "    width: 32px;\n",
1346 |               "  }\n",
1347 |               "\n",
1348 |               "  .colab-df-quickchart:hover {\n",
1349 |               "    background-color: var(--hover-bg-color);\n",
1350 |               "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
1351 |               "    fill: var(--hover-fill-color);\n",
1352 |               "  }\n",
1353 |               "\n",
1354 |               "  .colab-df-quickchart-complete:disabled,\n",
1355 |               "  .colab-df-quickchart-complete:disabled:hover {\n",
1356 |               "    background-color: var(--disabled-bg-color);\n",
1357 |               "    fill: var(--disabled-fill-color);\n",
1358 |               "    box-shadow: none;\n",
1359 |               "  }\n",
1360 |               "\n",
1361 |               "  .colab-df-spinner {\n",
1362 |               "    border: 2px solid var(--fill-color);\n",
1363 |               "    border-color: transparent;\n",
1364 |               "    border-bottom-color: var(--fill-color);\n",
1365 |               "    animation:\n",
1366 |               "      spin 1s steps(1) infinite;\n",
1367 |               "  }\n",
1368 |               "\n",
1369 |               "  @keyframes spin {\n",
1370 |               "    0% {\n",
1371 |               "      border-color: transparent;\n",
1372 |               "      border-bottom-color: var(--fill-color);\n",
1373 |               "      border-left-color: var(--fill-color);\n",
1374 |               "    }\n",
1375 |               "    20% {\n",
1376 |               "      border-color: transparent;\n",
1377 |               "      border-left-color: var(--fill-color);\n",
1378 |               "      border-top-color: var(--fill-color);\n",
1379 |               "    }\n",
1380 |               "    30% {\n",
1381 |               "      border-color: transparent;\n",
1382 |               "      border-left-color: var(--fill-color);\n",
1383 |               "      border-top-color: var(--fill-color);\n",
1384 |               "      border-right-color: var(--fill-color);\n",
1385 |               "    }\n",
1386 |               "    40% {\n",
1387 |               "      border-color: transparent;\n",
1388 |               "      border-right-color: var(--fill-color);\n",
1389 |               "      border-top-color: var(--fill-color);\n",
1390 |               "    }\n",
1391 |               "    60% {\n",
1392 |               "      border-color: transparent;\n",
1393 |               "      border-right-color: var(--fill-color);\n",
1394 |               "    }\n",
1395 |               "    80% {\n",
1396 |               "      border-color: transparent;\n",
1397 |               "      border-right-color: var(--fill-color);\n",
1398 |               "      border-bottom-color: var(--fill-color);\n",
1399 |               "    }\n",
1400 |               "    90% {\n",
1401 |               "      border-color: transparent;\n",
1402 |               "      border-bottom-color: var(--fill-color);\n",
1403 |               "    }\n",
1404 |               "  }\n",
1405 |               "</style>\n",
1406 |               "\n",
1407 |               "      <script>\n",
1408 |               "        async function quickchart(key) {\n",
1409 |               "          const quickchartButtonEl =\n",
1410 |               "            document.querySelector('#' + key + ' button');\n",
1411 |               "          quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
1412 |               "          quickchartButtonEl.classList.add('colab-df-spinner');\n",
1413 |               "          try {\n",
1414 |               "            const charts = await google.colab.kernel.invokeFunction(\n",
1415 |               "                'suggestCharts', [key], {});\n",
1416 |               "          } catch (error) {\n",
1417 |               "            console.error('Error during call to suggestCharts:', error);\n",
1418 |               "          }\n",
1419 |               "          quickchartButtonEl.classList.remove('colab-df-spinner');\n",
1420 |               "          quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
1421 |               "        }\n",
1422 |               "        (() => {\n",
1423 |               "          let quickchartButtonEl =\n",
1424 |               "            document.querySelector('#df-54babb5f-fbd6-4d03-aae7-0eb0cad636d3 button');\n",
1425 |               "          quickchartButtonEl.style.display =\n",
1426 |               "            google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
1427 |               "        })();\n",
1428 |               "      </script>\n",
1429 |               "    </div>\n",
1430 |               "\n",
1431 |               "  <div id=\"id_aff826b7-5a47-41f0-9d6b-dc74c95a8c6a\">\n",
1432 |               "    <style>\n",
1433 |               "      .colab-df-generate {\n",
1434 |               "        background-color: #E8F0FE;\n",
1435 |               "        border: none;\n",
1436 |               "        border-radius: 50%;\n",
1437 |               "        cursor: pointer;\n",
1438 |               "        display: none;\n",
1439 |               "        fill: #1967D2;\n",
1440 |               "        height: 32px;\n",
1441 |               "        padding: 0 0 0 0;\n",
1442 |               "        width: 32px;\n",
1443 |               "      }\n",
1444 |               "\n",
1445 |               "      .colab-df-generate:hover {\n",
1446 |               "        background-color: #E2EBFA;\n",
1447 |               "        box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
1448 |               "        fill: #174EA6;\n",
1449 |               "      }\n",
1450 |               "\n",
1451 |               "      [theme=dark] .colab-df-generate {\n",
1452 |               "        background-color: #3B4455;\n",
1453 |               "        fill: #D2E3FC;\n",
1454 |               "      }\n",
1455 |               "\n",
1456 |               "      [theme=dark] .colab-df-generate:hover {\n",
1457 |               "        background-color: #434B5C;\n",
1458 |               "        box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
1459 |               "        filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
1460 |               "        fill: #FFFFFF;\n",
1461 |               "      }\n",
1462 |               "    </style>\n",
1463 |               "    <button class=\"colab-df-generate\" onclick=\"generateWithVariable('df_gretel')\"\n",
1464 |               "            title=\"Generate code using this dataframe.\"\n",
1465 |               "            style=\"display:none;\">\n",
1466 |               "\n",
1467 |               "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 0 24 24\"\n",
1468 |               "       width=\"24px\">\n",
1469 |               "    <path d=\"M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z\"/>\n",
1470 |               "  </svg>\n",
1471 |               "    </button>\n",
1472 |               "    <script>\n",
1473 |               "      (() => {\n",
1474 |               "      const buttonEl =\n",
1475 |               "        document.querySelector('#id_aff826b7-5a47-41f0-9d6b-dc74c95a8c6a button.colab-df-generate');\n",
1476 |               "      buttonEl.style.display =\n",
1477 |               "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
1478 |               "\n",
1479 |               "      buttonEl.onclick = () => {\n",
1480 |               "        google.colab.notebook.generateWithVariable('df_gretel');\n",
1481 |               "      }\n",
1482 |               "      })();\n",
1483 |               "    </script>\n",
1484 |               "  </div>\n",
1485 |               "\n",
1486 |               "    </div>\n",
1487 |               "  </div>\n"
1488 |             ],
1489 |             "application/vnd.google.colaboratory.intrinsic+json": {
1490 |               "type": "dataframe",
1491 |               "variable_name": "df_gretel",
1492 |               "summary": "{\n  \"name\": \"df_gretel\",\n  \"rows\": 30000,\n  \"fields\": [\n    {\n      \"column\": \"Age\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 19,\n        \"min\": 13,\n        \"max\": 89,\n        \"num_unique_values\": 77,\n        \"samples\": [\n          20,\n          47,\n          28\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Gender\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 0,\n        \"min\": 0,\n        \"max\": 1,\n        \"num_unique_values\": 2,\n        \"samples\": [\n          0,\n          1\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Blood Type\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 2,\n        \"min\": 0,\n        \"max\": 7,\n        \"num_unique_values\": 8,\n        \"samples\": [\n          5,\n          6\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Medical Condition\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 1.6875826439204777,\n        \"min\": 0.0,\n        \"max\": 5.0,\n        \"num_unique_values\": 6,\n        \"samples\": [\n          4.0,\n          0.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Billing Amount\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 15707.36726458584,\n        \"min\": -1956.0183883179,\n        \"max\": 52755.0249810965,\n        \"num_unique_values\": 30000,\n        \"samples\": [\n          13652.8463977239,\n          14627.807964455\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Admission Type\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 0,\n        \"min\": 0,\n        \"max\": 2,\n        \"num_unique_values\": 3,\n        \"samples\": [\n          2,\n          1\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Medication\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 1,\n        \"min\": 0,\n        \"max\": 4,\n        \"num_unique_values\": 5,\n        \"samples\": [\n          0,\n          1\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Test Results\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 0,\n        \"min\": 0,\n        \"max\": 2,\n        \"num_unique_values\": 3,\n        \"samples\": [\n          2,\n          1\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}"
1493 |             }
1494 |           },
1495 |           "metadata": {},
1496 |           "execution_count": 23
1497 |         }
1498 |       ],
1499 |       "source": [
1500 |         "df_gretel = pd.read_csv('/content/gretel-healthcare.csv')\n",
1501 |         "df_gretel"
1502 |       ]
1503 |     },
1504 |     {
1505 |       "cell_type": "code",
1506 |       "source": [
1507 |         "df_mostlyai = pd.read_csv('/content/mostlyai-healthcare.csv')\n",
1508 |         "df_mostlyai"
1509 |       ],
1510 |       "metadata": {
1511 |         "colab": {
1512 |           "base_uri": "https://localhost:8080/",
1513 |           "height": 424
1514 |         },
1515 |         "id": "gZITpNXRLsYa",
1516 |         "outputId": "66c7a807-d82e-4ef1-d797-3ee2f79bd0f5"
1517 |       },
1518 |       "execution_count": 25,
1519 |       "outputs": [
1520 |         {
1521 |           "output_type": "execute_result",
1522 |           "data": {
1523 |             "text/plain": [
1524 |               "       Age  Gender  Blood Type  Medical Condition  Billing Amount  \\\n",
1525 |               "0       19       1           7                  5    34357.470891   \n",
1526 |               "1       45       0           7                  4     5015.284517   \n",
1527 |               "2       29       1           2                  3    45050.120972   \n",
1528 |               "3       42       1           3                  5    27874.820180   \n",
1529 |               "4       83       0           7                  5    14946.856473   \n",
1530 |               "...    ...     ...         ...                ...             ...   \n",
1531 |               "29995   76       1           0                  1    12813.129371   \n",
1532 |               "29996   23       0           3                  5     5900.749086   \n",
1533 |               "29997   34       1           1                  5    44033.367695   \n",
1534 |               "29998   34       0           0                  3    48530.034546   \n",
1535 |               "29999   51       1           0                  4    13618.873465   \n",
1536 |               "\n",
1537 |               "       Admission Type  Medication  Test Results  \n",
1538 |               "0                   1           0             0  \n",
1539 |               "1                   0           4             0  \n",
1540 |               "2                   2           0             2  \n",
1541 |               "3                   1           3             2  \n",
1542 |               "4                   2           3             0  \n",
1543 |               "...               ...         ...           ...  \n",
1544 |               "29995               0           1             2  \n",
1545 |               "29996               0           0             1  \n",
1546 |               "29997               2           3             2  \n",
1547 |               "29998               2           4             2  \n",
1548 |               "29999               0           0             2  \n",
1549 |               "\n",
1550 |               "[30000 rows x 8 columns]"
1551 |             ],
1552 |             "text/html": [
1553 |               "\n",
1554 |               "  <div id=\"df-0a6d3452-bcb9-4cfc-a8be-1b26490ea071\" class=\"colab-df-container\">\n",
1555 |               "    <div>\n",
1556 |               "<style scoped>\n",
1557 |               "    .dataframe tbody tr th:only-of-type {\n",
1558 |               "        vertical-align: middle;\n",
1559 |               "    }\n",
1560 |               "\n",
1561 |               "    .dataframe tbody tr th {\n",
1562 |               "        vertical-align: top;\n",
1563 |               "    }\n",
1564 |               "\n",
1565 |               "    .dataframe thead th {\n",
1566 |               "        text-align: right;\n",
1567 |               "    }\n",
1568 |               "</style>\n",
1569 |               "<table border=\"1\" class=\"dataframe\">\n",
1570 |               "  <thead>\n",
1571 |               "    <tr style=\"text-align: right;\">\n",
1572 |               "      <th></th>\n",
1573 |               "      <th>Age</th>\n",
1574 |               "      <th>Gender</th>\n",
1575 |               "      <th>Blood Type</th>\n",
1576 |               "      <th>Medical Condition</th>\n",
1577 |               "      <th>Billing Amount</th>\n",
1578 |               "      <th>Admission Type</th>\n",
1579 |               "      <th>Medication</th>\n",
1580 |               "      <th>Test Results</th>\n",
1581 |               "    </tr>\n",
1582 |               "  </thead>\n",
1583 |               "  <tbody>\n",
1584 |               "    <tr>\n",
1585 |               "      <th>0</th>\n",
1586 |               "      <td>19</td>\n",
1587 |               "      <td>1</td>\n",
1588 |               "      <td>7</td>\n",
1589 |               "      <td>5</td>\n",
1590 |               "      <td>34357.470891</td>\n",
1591 |               "      <td>1</td>\n",
1592 |               "      <td>0</td>\n",
1593 |               "      <td>0</td>\n",
1594 |               "    </tr>\n",
1595 |               "    <tr>\n",
1596 |               "      <th>1</th>\n",
1597 |               "      <td>45</td>\n",
1598 |               "      <td>0</td>\n",
1599 |               "      <td>7</td>\n",
1600 |               "      <td>4</td>\n",
1601 |               "      <td>5015.284517</td>\n",
1602 |               "      <td>0</td>\n",
1603 |               "      <td>4</td>\n",
1604 |               "      <td>0</td>\n",
1605 |               "    </tr>\n",
1606 |               "    <tr>\n",
1607 |               "      <th>2</th>\n",
1608 |               "      <td>29</td>\n",
1609 |               "      <td>1</td>\n",
1610 |               "      <td>2</td>\n",
1611 |               "      <td>3</td>\n",
1612 |               "      <td>45050.120972</td>\n",
1613 |               "      <td>2</td>\n",
1614 |               "      <td>0</td>\n",
1615 |               "      <td>2</td>\n",
1616 |               "    </tr>\n",
1617 |               "    <tr>\n",
1618 |               "      <th>3</th>\n",
1619 |               "      <td>42</td>\n",
1620 |               "      <td>1</td>\n",
1621 |               "      <td>3</td>\n",
1622 |               "      <td>5</td>\n",
1623 |               "      <td>27874.820180</td>\n",
1624 |               "      <td>1</td>\n",
1625 |               "      <td>3</td>\n",
1626 |               "      <td>2</td>\n",
1627 |               "    </tr>\n",
1628 |               "    <tr>\n",
1629 |               "      <th>4</th>\n",
1630 |               "      <td>83</td>\n",
1631 |               "      <td>0</td>\n",
1632 |               "      <td>7</td>\n",
1633 |               "      <td>5</td>\n",
1634 |               "      <td>14946.856473</td>\n",
1635 |               "      <td>2</td>\n",
1636 |               "      <td>3</td>\n",
1637 |               "      <td>0</td>\n",
1638 |               "    </tr>\n",
1639 |               "    <tr>\n",
1640 |               "      <th>...</th>\n",
1641 |               "      <td>...</td>\n",
1642 |               "      <td>...</td>\n",
1643 |               "      <td>...</td>\n",
1644 |               "      <td>...</td>\n",
1645 |               "      <td>...</td>\n",
1646 |               "      <td>...</td>\n",
1647 |               "      <td>...</td>\n",
1648 |               "      <td>...</td>\n",
1649 |               "    </tr>\n",
1650 |               "    <tr>\n",
1651 |               "      <th>29995</th>\n",
1652 |               "      <td>76</td>\n",
1653 |               "      <td>1</td>\n",
1654 |               "      <td>0</td>\n",
1655 |               "      <td>1</td>\n",
1656 |               "      <td>12813.129371</td>\n",
1657 |               "      <td>0</td>\n",
1658 |               "      <td>1</td>\n",
1659 |               "      <td>2</td>\n",
1660 |               "    </tr>\n",
1661 |               "    <tr>\n",
1662 |               "      <th>29996</th>\n",
1663 |               "      <td>23</td>\n",
1664 |               "      <td>0</td>\n",
1665 |               "      <td>3</td>\n",
1666 |               "      <td>5</td>\n",
1667 |               "      <td>5900.749086</td>\n",
1668 |               "      <td>0</td>\n",
1669 |               "      <td>0</td>\n",
1670 |               "      <td>1</td>\n",
1671 |               "    </tr>\n",
1672 |               "    <tr>\n",
1673 |               "      <th>29997</th>\n",
1674 |               "      <td>34</td>\n",
1675 |               "      <td>1</td>\n",
1676 |               "      <td>1</td>\n",
1677 |               "      <td>5</td>\n",
1678 |               "      <td>44033.367695</td>\n",
1679 |               "      <td>2</td>\n",
1680 |               "      <td>3</td>\n",
1681 |               "      <td>2</td>\n",
1682 |               "    </tr>\n",
1683 |               "    <tr>\n",
1684 |               "      <th>29998</th>\n",
1685 |               "      <td>34</td>\n",
1686 |               "      <td>0</td>\n",
1687 |               "      <td>0</td>\n",
1688 |               "      <td>3</td>\n",
1689 |               "      <td>48530.034546</td>\n",
1690 |               "      <td>2</td>\n",
1691 |               "      <td>4</td>\n",
1692 |               "      <td>2</td>\n",
1693 |               "    </tr>\n",
1694 |               "    <tr>\n",
1695 |               "      <th>29999</th>\n",
1696 |               "      <td>51</td>\n",
1697 |               "      <td>1</td>\n",
1698 |               "      <td>0</td>\n",
1699 |               "      <td>4</td>\n",
1700 |               "      <td>13618.873465</td>\n",
1701 |               "      <td>0</td>\n",
1702 |               "      <td>0</td>\n",
1703 |               "      <td>2</td>\n",
1704 |               "    </tr>\n",
1705 |               "  </tbody>\n",
1706 |               "</table>\n",
1707 |               "<p>30000 rows × 8 columns</p>\n",
1708 |               "</div>\n",
1972 |               "  </div>\n"
1973 |             ],
1974 |             "application/vnd.google.colaboratory.intrinsic+json": {
1975 |               "type": "dataframe",
1976 |               "variable_name": "df_mostlyai",
1977 |               "summary": "{\n  \"name\": \"df_mostlyai\",\n  \"rows\": 30000,\n  \"fields\": [\n    {\n      \"column\": \"Age\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 19,\n        \"min\": 13,\n        \"max\": 89,\n        \"num_unique_values\": 77,\n        \"samples\": [\n          83,\n          62,\n          70\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Gender\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 0,\n        \"min\": 0,\n        \"max\": 1,\n        \"num_unique_values\": 2,\n        \"samples\": [\n          0,\n          1\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Blood Type\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 2,\n        \"min\": 0,\n        \"max\": 7,\n        \"num_unique_values\": 8,\n        \"samples\": [\n          2,\n          1\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Medical Condition\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 1,\n        \"min\": 0,\n        \"max\": 5,\n        \"num_unique_values\": 6,\n        \"samples\": [\n          5,\n          4\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Billing Amount\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 14212.245751873123,\n        \"min\": -1310.2728947084124,\n        \"max\": 52170.03685355641,\n        \"num_unique_values\": 29981,\n        \"samples\": [\n          12914.23721681,\n          16524.88569619\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Admission Type\",\n      \"properties\": {\n        
\"dtype\": \"number\",\n        \"std\": 0,\n        \"min\": 0,\n        \"max\": 2,\n        \"num_unique_values\": 3,\n        \"samples\": [\n          1,\n          0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Medication\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 1,\n        \"min\": 0,\n        \"max\": 4,\n        \"num_unique_values\": 5,\n        \"samples\": [\n          4,\n          2\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Test Results\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 0,\n        \"min\": 0,\n        \"max\": 2,\n        \"num_unique_values\": 3,\n        \"samples\": [\n          0,\n          2\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}"
1978 |             }
1979 |           },
1980 |           "metadata": {},
1981 |           "execution_count": 25
1982 |         }
1983 |       ]
1984 |     },
1985 |     {
1986 |       "cell_type": "markdown",
1987 |       "source": [
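1988 |         "The model-training cells below are written specifically for the Kaggle healthcare dataset (https://www.kaggle.com/datasets/prasad22/healthcare-dataset/code). They train a Random Forest classifier to predict the `Test Results` column and compare a model trained on real data alone against one trained on real plus synthetic data, both evaluated on the same held-out real test set. To try a different dataset, change the target column and any other dataset-specific code."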
1989 |       ],
1990 |       "metadata": {
1991 |         "id": "tbn1MxPUJHHe"
1992 |       }
1993 |     },
1994 |     {
1995 |       "cell_type": "code",
1996 |       "execution_count": 26,
1997 |       "metadata": {
1998 |         "id": "FlkjgRx01okZ",
1999 |         "colab": {
2000 |           "base_uri": "https://localhost:8080/"
2001 |         },
2002 |         "outputId": "9fab4875-4a7b-43a6-e4b9-e928a4af1f8c"
2003 |       },
2004 |       "outputs": [
2005 |         {
2006 |           "output_type": "stream",
2007 |           "name": "stdout",
2008 |           "text": [
2009 |             "Classification Report:\n",
2010 |             "              precision    recall  f1-score   support\n",
2011 |             "\n",
2012 |             "         0.0       0.42      0.42      0.42      3754\n",
2013 |             "         1.0       0.42      0.43      0.42      3617\n",
2014 |             "         2.0       0.43      0.42      0.42      3729\n",
2015 |             "\n",
2016 |             "    accuracy                           0.42     11100\n",
2017 |             "   macro avg       0.42      0.42      0.42     11100\n",
2018 |             "weighted avg       0.42      0.42      0.42     11100\n",
2019 |             "\n",
2020 |             "Accuracy: 0.4218018018018018\n"
2021 |           ]
2022 |         }
2023 |       ],
2024 |       "source": [
2025 |         "# Import necessary libraries\n",
2026 |         "from sklearn.model_selection import train_test_split\n",
2027 |         "from sklearn.ensemble import RandomForestClassifier\n",
2028 |         "from sklearn.metrics import classification_report, accuracy_score\n",
2029 |         "\n",
2030 |         "# Define features (X) and target (y)\n",
2031 |         "\n",
2032 |         "X = df_real.drop(['Test Results'], axis=1)\n",
2033 |         "y = df_real['Test Results']\n",
2034 |         "\n",
2035 |         "# Split data into training and testing sets\n",
2036 |         "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n",
2037 |         "\n",
2038 |         "# Initialize and train the Random Forest Classifier\n",
2039 |         "model = RandomForestClassifier(n_estimators=100, random_state=42)\n",
2040 |         "model.fit(X_train, y_train)\n",
2041 |         "\n",
2042 |         "# Make predictions on the test set\n",
2043 |         "y_pred = model.predict(X_test)\n",
2044 |         "\n",
2045 |         "# Print the classification report\n",
2046 |         "print(\"Classification Report:\")\n",
2047 |         "print(classification_report(y_test, y_pred))\n",
2048 |         "\n",
2049 |         "# Print the accuracy score\n",
2050 |         "print(\"Accuracy:\", accuracy_score(y_test, y_pred))"
2051 |       ]
2052 |     },
2053 |     {
2054 |       "cell_type": "code",
2055 |       "source": [
2056 |         "from sklearn.model_selection import train_test_split\n",
2057 |         "from sklearn.ensemble import RandomForestClassifier\n",
2058 |         "from sklearn.metrics import classification_report, accuracy_score\n",
2059 |         "import pandas as pd\n",
2060 |         "\n",
2061 |         "# Step 1: Split df_real into training and test sets (20% held out for testing)\n",
2062 |         "real_train, real_test = train_test_split(df_real, test_size=0.2, random_state=42)\n",
2063 |         "\n",
2064 |         "# Separate features and targets\n",
2065 |         "X_real_train = real_train.drop(['Test Results'], axis=1)\n",
2066 |         "y_real_train = real_train['Test Results']\n",
2067 |         "\n",
2068 |         "X_real_test = real_test.drop(['Test Results'], axis=1)\n",
2069 |         "y_real_test = real_test['Test Results']\n",
2070 |         "\n",
2071 |         "# Step 2: Train model_real on only the real training data\n",
2072 |         "model_real = RandomForestClassifier(n_estimators=100, random_state=42)\n",
2073 |         "model_real.fit(X_real_train, y_real_train)\n",
2074 |         "\n",
2075 |         "# Step 3: Combine synthetic and real training data, then train model_combined\n",
2076 |         "combined_train_df = pd.concat([df_gretel, real_train], ignore_index=True)\n",
2077 |         "X_combined_train = combined_train_df.drop(['Test Results'], axis=1)\n",
2078 |         "y_combined_train = combined_train_df['Test Results']\n",
2079 |         "\n",
2080 |         "model_combined = RandomForestClassifier(n_estimators=100, random_state=42)\n",
2081 |         "model_combined.fit(X_combined_train, y_combined_train)\n",
2082 |         "\n",
2083 |         "# Step 4: Evaluate both models on the same real test set\n",
2084 |         "y_pred_real = model_real.predict(X_real_test)\n",
2085 |         "y_pred_combined = model_combined.predict(X_real_test)\n",
2086 |         "\n",
2087 |         "# Step 5: Print classification reports\n",
2088 |         "print(\"=== Model Trained Only on Real Data ===\")\n",
2089 |         "print(classification_report(y_real_test, y_pred_real))\n",
2090 |         "print(\"Accuracy:\", accuracy_score(y_real_test, y_pred_real))\n",
2091 |         "\n",
2092 |         "print(\"\\n=== Model Trained on Real + Gretel Synthetic Data ===\")\n",
2093 |         "print(classification_report(y_real_test, y_pred_combined))\n",
2094 |         "print(\"Accuracy:\", accuracy_score(y_real_test, y_pred_combined))"
2095 |       ],
2096 |       "metadata": {
2097 |         "colab": {
2098 |           "base_uri": "https://localhost:8080/"
2099 |         },
2100 |         "id": "frc6qYvt1peS",
2101 |         "outputId": "db359022-05c2-4729-b69f-c68270adbc5d"
2102 |       },
2103 |       "execution_count": 28,
2104 |       "outputs": [
2105 |         {
2106 |           "output_type": "stream",
2107 |           "name": "stdout",
2108 |           "text": [
2109 |             "=== Model Trained Only on Real Data ===\n",
2110 |             "              precision    recall  f1-score   support\n",
2111 |             "\n",
2112 |             "         0.0       0.42      0.42      0.42      3754\n",
2113 |             "         1.0       0.42      0.43      0.42      3617\n",
2114 |             "         2.0       0.43      0.42      0.42      3729\n",
2115 |             "\n",
2116 |             "    accuracy                           0.42     11100\n",
2117 |             "   macro avg       0.42      0.42      0.42     11100\n",
2118 |             "weighted avg       0.42      0.42      0.42     11100\n",
2119 |             "\n",
2120 |             "Accuracy: 0.4218018018018018\n",
2121 |             "\n",
2122 |             "=== Model Trained on Real + Gretel Synthetic Data ===\n",
2123 |             "              precision    recall  f1-score   support\n",
2124 |             "\n",
2125 |             "         0.0       0.42      0.40      0.41      3754\n",
2126 |             "         1.0       0.41      0.42      0.41      3617\n",
2127 |             "         2.0       0.41      0.42      0.42      3729\n",
2128 |             "\n",
2129 |             "    accuracy                           0.41     11100\n",
2130 |             "   macro avg       0.41      0.41      0.41     11100\n",
2131 |             "weighted avg       0.41      0.41      0.41     11100\n",
2132 |             "\n",
2133 |             "Accuracy: 0.41315315315315315\n"
2134 |           ]
2135 |         }
2136 |       ]
2137 |     },
2138 |     {
2139 |       "cell_type": "code",
2140 |       "source": [
2141 |         "from sklearn.model_selection import train_test_split\n",
2142 |         "from sklearn.ensemble import RandomForestClassifier\n",
2143 |         "from sklearn.metrics import classification_report, accuracy_score\n",
2144 |         "import pandas as pd\n",
2145 |         "\n",
2146 |         "# Optional: Drop index columns if they exist\n",
2147 |         "df_real = df_real.drop(columns=['Unnamed: 0'], errors='ignore')\n",
2148 |         "df_mostlyai = df_mostlyai.drop(columns=['Unnamed: 0'], errors='ignore')\n",
2149 |         "\n",
2150 |         "# Step 1: Split df_real into training and test sets (20% held out for testing)\n",
2151 |         "real_train, real_test = train_test_split(df_real, test_size=0.2, random_state=42)\n",
2152 |         "\n",
2153 |         "# Separate features and targets\n",
2154 |         "X_real_train = real_train.drop(['Test Results'], axis=1)\n",
2155 |         "y_real_train = real_train['Test Results']\n",
2156 |         "\n",
2157 |         "X_real_test = real_test.drop(['Test Results'], axis=1)\n",
2158 |         "y_real_test = real_test['Test Results']\n",
2159 |         "\n",
2160 |         "# Step 2: Train model_real on only the real training data\n",
2161 |         "model_real = RandomForestClassifier(n_estimators=100, random_state=42)\n",
2162 |         "model_real.fit(X_real_train, y_real_train)\n",
2163 |         "\n",
2164 |         "# Step 3: Combine synthetic and real training data, then train model_combined\n",
2165 |         "combined_train_df = pd.concat([df_mostlyai, real_train], ignore_index=True)\n",
2166 |         "X_combined_train = combined_train_df.drop(['Test Results'], axis=1)\n",
2167 |         "y_combined_train = combined_train_df['Test Results']\n",
2168 |         "\n",
2169 |         "model_combined = RandomForestClassifier(n_estimators=100, random_state=42)\n",
2170 |         "model_combined.fit(X_combined_train, y_combined_train)\n",
2171 |         "\n",
2172 |         "# Step 4: Evaluate both models on the same real test set\n",
2173 |         "y_pred_real = model_real.predict(X_real_test)\n",
2174 |         "y_pred_combined = model_combined.predict(X_real_test)\n",
2175 |         "\n",
2176 |         "# Step 5: Print classification reports\n",
2177 |         "print(\"=== Model Trained Only on Real Data ===\")\n",
2178 |         "print(classification_report(y_real_test, y_pred_real))\n",
2179 |         "print(\"Accuracy:\", accuracy_score(y_real_test, y_pred_real))\n",
2180 |         "\n",
2181 |         "print(\"\\n=== Model Trained on Real + MostlyAI Synthetic Data ===\")\n",
2182 |         "print(classification_report(y_real_test, y_pred_combined))\n",
2183 |         "print(\"Accuracy:\", accuracy_score(y_real_test, y_pred_combined))"
2184 |       ],
2185 |       "metadata": {
2186 |         "colab": {
2187 |           "base_uri": "https://localhost:8080/"
2188 |         },
2189 |         "id": "mXbUrihBG1in",
2190 |         "outputId": "83b689c4-3a40-479d-ec82-9a8fa0973725"
2191 |       },
2192 |       "execution_count": 29,
2193 |       "outputs": [
2194 |         {
2195 |           "output_type": "stream",
2196 |           "name": "stdout",
2197 |           "text": [
2198 |             "=== Model Trained Only on Real Data ===\n",
2199 |             "              precision    recall  f1-score   support\n",
2200 |             "\n",
2201 |             "         0.0       0.42      0.42      0.42      3754\n",
2202 |             "         1.0       0.42      0.43      0.42      3617\n",
2203 |             "         2.0       0.43      0.42      0.42      3729\n",
2204 |             "\n",
2205 |             "    accuracy                           0.42     11100\n",
2206 |             "   macro avg       0.42      0.42      0.42     11100\n",
2207 |             "weighted avg       0.42      0.42      0.42     11100\n",
2208 |             "\n",
2209 |             "Accuracy: 0.4218018018018018\n",
2210 |             "\n",
2211 |             "=== Model Trained on Real + MostlyAI Synthetic Data ===\n",
2212 |             "              precision    recall  f1-score   support\n",
2213 |             "\n",
2214 |             "         0.0       0.42      0.42      0.42      3754\n",
2215 |             "         1.0       0.42      0.40      0.41      3617\n",
2216 |             "         2.0       0.42      0.44      0.43      3729\n",
2217 |             "\n",
2218 |             "    accuracy                           0.42     11100\n",
2219 |             "   macro avg       0.42      0.42      0.42     11100\n",
2220 |             "weighted avg       0.42      0.42      0.42     11100\n",
2221 |             "\n",
2222 |             "Accuracy: 0.42\n"
2223 |           ]
2224 |         }
2225 |       ]
2226 |     },
2227 |     {
2228 |       "cell_type": "code",
2229 |       "source": [
2230 |         "from sklearn.model_selection import train_test_split\n",
2231 |         "from sklearn.ensemble import RandomForestClassifier\n",
2232 |         "from sklearn.metrics import classification_report, accuracy_score\n",
2233 |         "import pandas as pd\n",
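2234 |         "df_syncora = df_syncora.drop(columns=['Unnamed: 0'], errors='ignore')  # drop a stray CSV index column, if any\n",
2235 |         "\n",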
2236 |         "# Step 1: Split df_real into training and test sets (20% held out for testing)\n",
2237 |         "real_train, real_test = train_test_split(df_real, test_size=0.2, random_state=42)\n",
2238 |         "\n",
2239 |         "# Separate features and targets\n",
2240 |         "X_real_train = real_train.drop(['Test Results'], axis=1)\n",
2241 |         "y_real_train = real_train['Test Results']\n",
2242 |         "\n",
2243 |         "X_real_test = real_test.drop(['Test Results'], axis=1)\n",
2244 |         "y_real_test = real_test['Test Results']\n",
2245 |         "\n",
2246 |         "# Step 2: Train model_real on only the real training data\n",
2247 |         "model_real = RandomForestClassifier(n_estimators=100, random_state=42)\n",
2248 |         "model_real.fit(X_real_train, y_real_train)\n",
2249 |         "\n",
2250 |         "# Step 3: Combine synthetic and real training data, then train model_combined\n",
2251 |         "combined_train_df = pd.concat([df_syncora, real_train], ignore_index=True)\n",
2252 |         "X_combined_train = combined_train_df.drop(['Test Results'], axis=1)\n",
2253 |         "y_combined_train = combined_train_df['Test Results']\n",
2254 |         "\n",
2255 |         "model_combined = RandomForestClassifier(n_estimators=100, random_state=42)\n",
2256 |         "model_combined.fit(X_combined_train, y_combined_train)\n",
2257 |         "\n",
2258 |         "# Step 4: Evaluate both models on the same real test set\n",
2259 |         "y_pred_real = model_real.predict(X_real_test)\n",
2260 |         "y_pred_combined = model_combined.predict(X_real_test)\n",
2261 |         "\n",
2262 |         "# Step 5: Print classification reports\n",
2263 |         "print(\"=== Model Trained Only on Real Data ===\")\n",
2264 |         "print(classification_report(y_real_test, y_pred_real))\n",
2265 |         "print(\"Accuracy:\", accuracy_score(y_real_test, y_pred_real))\n",
2266 |         "\n",
2267 |         "print(\"\\n=== Model Trained on Real + Syncora Synthetic Data ===\")\n",
2268 |         "print(classification_report(y_real_test, y_pred_combined))\n",
2269 |         "print(\"Accuracy:\", accuracy_score(y_real_test, y_pred_combined))"
2270 |       ],
2271 |       "metadata": {
2272 |         "colab": {
2273 |           "base_uri": "https://localhost:8080/"
2274 |         },
2275 |         "id": "w4uB_iKPGkdO",
2276 |         "outputId": "760a4f94-995e-47f2-c9c6-767599d3de6b"
2277 |       },
2278 |       "execution_count": 30,
2279 |       "outputs": [
2280 |         {
2281 |           "output_type": "stream",
2282 |           "name": "stdout",
2283 |           "text": [
2284 |             "=== Model Trained Only on Real Data ===\n",
2285 |             "              precision    recall  f1-score   support\n",
2286 |             "\n",
2287 |             "         0.0       0.42      0.42      0.42      3754\n",
2288 |             "         1.0       0.42      0.43      0.42      3617\n",
2289 |             "         2.0       0.43      0.42      0.42      3729\n",
2290 |             "\n",
2291 |             "    accuracy                           0.42     11100\n",
2292 |             "   macro avg       0.42      0.42      0.42     11100\n",
2293 |             "weighted avg       0.42      0.42      0.42     11100\n",
2294 |             "\n",
2295 |             "Accuracy: 0.4218018018018018\n",
2296 |             "\n",
2297 |             "=== Model Trained on Real + Syncora Synthetic Data ===\n",
2298 |             "              precision    recall  f1-score   support\n",
2299 |             "\n",
2300 |             "         0.0       0.63      0.62      0.62      3754\n",
2301 |             "         1.0       0.61      0.63      0.62      3617\n",
2302 |             "         2.0       0.63      0.62      0.62      3729\n",
2303 |             "\n",
2304 |             "    accuracy                           0.62     11100\n",
2305 |             "   macro avg       0.62      0.62      0.62     11100\n",
2306 |             "weighted avg       0.62      0.62      0.62     11100\n",
2307 |             "\n",
2308 |             "Accuracy: 0.6222522522522522\n"
2309 |           ]
2310 |         }
2311 |       ]
2312 |     },
2313 |     {
2314 |       "cell_type": "code",
2315 |       "execution_count": 31,
2316 |       "metadata": {
2317 |         "id": "I9bmBoho2Ruw",
2318 |         "colab": {
2319 |           "base_uri": "https://localhost:8080/"
2320 |         },
2321 |         "outputId": "7df3c8d1-d52a-4f1a-c2c7-54ca643699a1"
2322 |       },
2323 |       "outputs": [
2324 |         {
2325 |           "output_type": "stream",
2326 |           "name": "stdout",
2327 |           "text": [
2328 |             "Classification Report for MostlyAI:\n",
2329 |             "              precision    recall  f1-score   support\n",
2330 |             "\n",
2331 |             "           0       0.34      0.35      0.34      2000\n",
2332 |             "           1       0.30      0.27      0.29      1828\n",
2333 |             "           2       0.36      0.39      0.38      2172\n",
2334 |             "\n",
2335 |             "    accuracy                           0.34      6000\n",
2336 |             "   macro avg       0.34      0.34      0.34      6000\n",
2337 |             "weighted avg       0.34      0.34      0.34      6000\n",
2338 |             "\n",
2339 |             "Accuracy: 0.33916666666666667\n"
2340 |           ]
2341 |         }
2342 |       ],
2343 |       "source": [
2344 |         "# Import necessary libraries\n",
2345 |         "from sklearn.model_selection import train_test_split\n",
2346 |         "from sklearn.ensemble import RandomForestClassifier\n",
2347 |         "from sklearn.metrics import classification_report, accuracy_score\n",
2348 |         "\n",
2349 |         "# Define features (X) and target (y)\n",
2350 |         "X = df_mostlyai.drop(['Test Results'], axis=1)\n",
2351 |         "y = df_mostlyai['Test Results']\n",
2352 |         "\n",
2353 |         "# Split the synthetic data into training and testing sets (both splits are synthetic)\n",
2354 |         "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n",
2355 |         "\n",
2356 |         "# Initialize and train the Random Forest Classifier\n",
2357 |         "model_mostlyai = RandomForestClassifier(n_estimators=100, random_state=42)\n",
2358 |         "model_mostlyai.fit(X_train, y_train)\n",
2359 |         "\n",
2360 |         "# Make predictions on the test set\n",
2361 |         "y_pred = model_mostlyai.predict(X_test)\n",
2362 |         "\n",
2363 |         "# Print the classification report\n",
2364 |         "print(\"Classification Report for MostlyAI:\")\n",
2365 |         "print(classification_report(y_test, y_pred))\n",
2366 |         "\n",
2367 |         "# Print the accuracy score\n",
2368 |         "print(\"Accuracy:\", accuracy_score(y_test, y_pred))"
2369 |       ]
2370 |     },
2371 |     {
2372 |       "cell_type": "code",
2373 |       "source": [
2374 |         "# Import necessary libraries\n",
2375 |         "from sklearn.model_selection import train_test_split\n",
2376 |         "from sklearn.ensemble import RandomForestClassifier\n",
2377 |         "from sklearn.metrics import classification_report, accuracy_score\n",
2378 |         "\n",
2379 |         "# Define features (X) and target (y)\n",
2380 |         "X = df_gretel.drop(['Test Results'], axis=1)\n",
2381 |         "y = df_gretel['Test Results']\n",
2382 |         "\n",
2383 |         "# Split the synthetic data into training and testing sets (both splits are synthetic)\n",
2384 |         "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n",
2385 |         "\n",
2386 |         "# Initialize and train the Random Forest Classifier\n",
2387 |         "model_gretel = RandomForestClassifier(n_estimators=100, random_state=42)\n",
2388 |         "model_gretel.fit(X_train, y_train)\n",
2389 |         "\n",
2390 |         "# Make predictions on the test set\n",
2391 |         "y_pred = model_gretel.predict(X_test)\n",
2392 |         "\n",
2393 |         "# Print the classification report\n",
2394 |         "print(\"Classification Report for Gretel:\")\n",
2395 |         "print(classification_report(y_test, y_pred))\n",
2396 |         "\n",
2397 |         "# Print the accuracy score\n",
2398 |         "print(\"Accuracy:\", accuracy_score(y_test, y_pred))"
2399 |       ],
2400 |       "metadata": {
2401 |         "colab": {
2402 |           "base_uri": "https://localhost:8080/"
2403 |         },
2404 |         "id": "GD96lqizN_Rj",
2405 |         "outputId": "ec95b12a-6cdc-4c57-e691-9b11e1d5f8e2"
2406 |       },
2407 |       "execution_count": 37,
2408 |       "outputs": [
2409 |         {
2410 |           "output_type": "stream",
2411 |           "name": "stdout",
2412 |           "text": [
2413 |             "Classification Report for Gretel:\n",
2414 |             "              precision    recall  f1-score   support\n",
2415 |             "\n",
2416 |             "           0       0.33      0.33      0.33      1903\n",
2417 |             "           1       0.33      0.33      0.33      2007\n",
2418 |             "           2       0.35      0.35      0.35      2090\n",
2419 |             "\n",
2420 |             "    accuracy                           0.34      6000\n",
2421 |             "   macro avg       0.34      0.34      0.34      6000\n",
2422 |             "weighted avg       0.34      0.34      0.34      6000\n",
2423 |             "\n",
2424 |             "Accuracy: 0.3358333333333333\n"
2425 |           ]
2426 |         }
2427 |       ]
2428 |     },
2429 |     {
2430 |       "cell_type": "code",
2431 |       "execution_count": 32,
2432 |       "metadata": {
2433 |         "id": "z002rkwm2tjx",
2434 |         "colab": {
2435 |           "base_uri": "https://localhost:8080/"
2436 |         },
2437 |         "outputId": "a3c29ad8-185f-48b6-ce34-bdd2ce69c17f"
2438 |       },
2439 |       "outputs": [
2440 |         {
2441 |           "output_type": "stream",
2442 |           "name": "stdout",
2443 |           "text": [
2444 |             "Classification Report Syncora:\n",
2445 |             "              precision    recall  f1-score   support\n",
2446 |             "\n",
2447 |             "         0.0       0.56      0.56      0.56      2008\n",
2448 |             "         1.0       0.57      0.57      0.57      2034\n",
2449 |             "         2.0       0.55      0.55      0.55      1958\n",
2450 |             "\n",
2451 |             "    accuracy                           0.56      6000\n",
2452 |             "   macro avg       0.56      0.56      0.56      6000\n",
2453 |             "weighted avg       0.56      0.56      0.56      6000\n",
2454 |             "\n",
2455 |             "Accuracy: 0.559\n"
2456 |           ]
2457 |         }
2458 |       ],
2459 |       "source": [
2460 |         "# Import necessary libraries\n",
2461 |         "from sklearn.model_selection import train_test_split\n",
2462 |         "from sklearn.ensemble import RandomForestClassifier\n",
2463 |         "from sklearn.metrics import classification_report, accuracy_score\n",
2464 |         "\n",
2465 |         "# Define features (X) and target (y)\n",
2466 |         "X = df_syncora.drop(['Test Results'], axis=1)\n",
2467 |         "y = df_syncora['Test Results']\n",
2468 |         "\n",
2469 |         "# Split the synthetic data into training and testing sets (both splits are synthetic)\n",
2470 |         "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n",
2471 |         "\n",
2472 |         "# Initialize and train the Random Forest Classifier\n",
2473 |         "model_syncora = RandomForestClassifier(n_estimators=100, random_state=42)\n",
2474 |         "model_syncora.fit(X_train, y_train)\n",
2475 |         "\n",
2476 |         "# Make predictions on the test set\n",
2477 |         "y_pred = model_syncora.predict(X_test)\n",
2478 |         "\n",
2479 |         "# Print the classification report\n",
2480 |         "print(\"Classification Report Syncora:\")\n",
2481 |         "print(classification_report(y_test, y_pred))\n",
2482 |         "\n",
2483 |         "# Print the accuracy score\n",
2484 |         "print(\"Accuracy:\", accuracy_score(y_test, y_pred))"
2485 |       ]
2486 |     },
2487 |     {
2488 |       "cell_type": "code",
2489 |       "execution_count": 33,
2490 |       "metadata": {
2491 |         "id": "BmCznQhp1mYz",
2492 |         "colab": {
2493 |           "base_uri": "https://localhost:8080/"
2494 |         },
2495 |         "outputId": "40691e0b-1d54-4dd0-80d6-808b4d96734b"
2496 |       },
2497 |       "outputs": [
2498 |         {
2499 |           "output_type": "stream",
2500 |           "name": "stdout",
2501 |           "text": [
2502 |             "Accuracy on df_real: 0.8843603603603604\n",
2503 |             "\n",
2504 |             "Classification Report on df_real:\n",
2505 |             "              precision    recall  f1-score   support\n",
2506 |             "\n",
2507 |             "         0.0       0.88      0.88      0.88     18627\n",
2508 |             "         1.0       0.88      0.89      0.88     18356\n",
2509 |             "         2.0       0.89      0.88      0.88     18517\n",
2510 |             "\n",
2511 |             "    accuracy                           0.88     55500\n",
2512 |             "   macro avg       0.88      0.88      0.88     55500\n",
2513 |             "weighted avg       0.88      0.88      0.88     55500\n",
2514 |             "\n"
2515 |           ]
2516 |         }
2517 |       ],
2518 |       "source": [
2519 |         "# Evaluate `model` on the entire df_real. Caution: if `model` was trained on a split of df_real,\n",
2520 |         "# these rows include its own training data, so this accuracy overstates performance on unseen data.\n",
2521 |         "# Use the entire df_real as the evaluation set\n",
2522 |         "X_test_real = df_real.drop(['Test Results'], axis=1)\n",
2523 |         "y_test_real = df_real['Test Results']\n",
2524 |         "\n",
2525 |         "# Make predictions on the entire df_real test data\n",
2526 |         "y_pred_real = model.predict(X_test_real)\n",
2527 |         "\n",
2528 |         "# Print the accuracy score on the entire df_real data\n",
2529 |         "print(\"Accuracy on df_real:\", accuracy_score(y_test_real, y_pred_real))\n",
2530 |         "\n",
2531 |         "# Print the classification report on the entire df_real data\n",
2532 |         "print(\"\\nClassification Report on df_real:\")\n",
2533 |         "print(classification_report(y_test_real, y_pred_real))"
2534 |       ]
2535 |     },
2536 |     {
2537 |       "cell_type": "code",
2538 |       "execution_count": 35,
2539 |       "metadata": {
2540 |         "id": "7Dlzyw9Q2f_I",
2541 |         "colab": {
2542 |           "base_uri": "https://localhost:8080/"
2543 |         },
2544 |         "outputId": "bb71cb90-bd0e-4d72-9601-da9e51f7fdd0"
2545 |       },
2546 |       "outputs": [
2547 |         {
2548 |           "output_type": "stream",
2549 |           "name": "stdout",
2550 |           "text": [
2551 |             "Accuracy on df_mostlyai: 0.3367747747747748\n",
2552 |             "\n",
2553 |             "Classification Report on df_mostlyai:\n",
2554 |             "              precision    recall  f1-score   support\n",
2555 |             "\n",
2556 |             "         0.0       0.34      0.35      0.35     18627\n",
2557 |             "         1.0       0.34      0.27      0.30     18356\n",
2558 |             "         2.0       0.33      0.38      0.36     18517\n",
2559 |             "\n",
2560 |             "    accuracy                           0.34     55500\n",
2561 |             "   macro avg       0.34      0.34      0.33     55500\n",
2562 |             "weighted avg       0.34      0.34      0.34     55500\n",
2563 |             "\n"
2564 |           ]
2565 |         }
2566 |       ],
2567 |       "source": [
2568 |         "# Make predictions on the entire df_real test data\n",
2569 |         "y_pred_mostlyai = model_mostlyai.predict(X_test_real)\n",
2570 |         "\n",
2571 |         "# Print the accuracy score on the entire df_real data\n",
2572 |         "print(\"Accuracy on df_mostlyai:\", accuracy_score(y_test_real, y_pred_mostlyai))\n",
2573 |         "\n",
2574 |         "# Print the classification report on the entire df_real data\n",
2575 |         "print(\"\\nClassification Report on df_mostlyai:\")\n",
2576 |         "print(classification_report(y_test_real, y_pred_mostlyai))"
2577 |       ]
2578 |     },
2579 |     {
2580 |       "cell_type": "code",
2581 |       "execution_count": 36,
2582 |       "metadata": {
2583 |         "id": "b-K4ceOn2law",
2584 |         "colab": {
2585 |           "base_uri": "https://localhost:8080/"
2586 |         },
2587 |         "outputId": "056eaf40-0a34-44f9-8d50-b60657d6f4eb"
2588 |       },
2589 |       "outputs": [
2590 |         {
2591 |           "output_type": "stream",
2592 |           "name": "stdout",
2593 |           "text": [
2594 |             "Accuracy on df_syncora: 0.57009009009009\n",
2595 |             "\n",
2596 |             "Classification Report on df_syncora:\n",
2597 |             "              precision    recall  f1-score   support\n",
2598 |             "\n",
2599 |             "         0.0       0.57      0.56      0.57     18627\n",
2600 |             "         1.0       0.56      0.58      0.57     18356\n",
2601 |             "         2.0       0.57      0.57      0.57     18517\n",
2602 |             "\n",
2603 |             "    accuracy                           0.57     55500\n",
2604 |             "   macro avg       0.57      0.57      0.57     55500\n",
2605 |             "weighted avg       0.57      0.57      0.57     55500\n",
2606 |             "\n"
2607 |           ]
2608 |         }
2609 |       ],
2610 |       "source": [
2611 |         "# Make predictions on the entire df_real test data\n",
2612 |         "y_pred_syncora = model_syncora.predict(X_test_real)\n",
2613 |         "\n",
2614 |         "# Print the accuracy score on the entire df_real data\n",
2615 |         "print(\"Accuracy on df_syncora:\", accuracy_score(y_test_real, y_pred_syncora))\n",
2616 |         "\n",
2617 |         "# Print the classification report on the entire df_real data\n",
2618 |         "print(\"\\nClassification Report on df_syncora:\")\n",
2619 |         "print(classification_report(y_test_real, y_pred_syncora))"
2620 |       ]
2621 |     },
2622 |     {
2623 |       "cell_type": "code",
2624 |       "source": [
2625 |         "# Make predictions on the entire df_real test data\n",
2626 |         "y_pred_gretel = model_gretel.predict(X_test_real)\n",
2627 |         "\n",
2628 |         "# Print the accuracy score on the entire df_real data\n",
2629 |         "print(\"Accuracy on df_gretel:\", accuracy_score(y_test_real, y_pred_gretel))\n",
2630 |         "\n",
2631 |         "# Print the classification report on the entire df_real data\n",
2632 |         "print(\"\\nClassification Report on df_gretel:\")\n",
2633 |         "print(classification_report(y_test_real, y_pred_gretel))"
2634 |       ],
2635 |       "metadata": {
2636 |         "colab": {
2637 |           "base_uri": "https://localhost:8080/"
2638 |         },
2639 |         "id": "eAdeWh-EORnj",
2640 |         "outputId": "12781903-b026-4655-c2b3-877b3036170e"
2641 |       },
2642 |       "execution_count": 38,
2643 |       "outputs": [
2644 |         {
2645 |           "output_type": "stream",
2646 |           "name": "stdout",
2647 |           "text": [
2648 |             "Accuracy on df_gretel: 0.33535135135135136\n",
2649 |             "\n",
2650 |             "Classification Report on df_gretel:\n",
2651 |             "              precision    recall  f1-score   support\n",
2652 |             "\n",
2653 |             "         0.0       0.34      0.33      0.33     18627\n",
2654 |             "         1.0       0.33      0.33      0.33     18356\n",
2655 |             "         2.0       0.34      0.35      0.34     18517\n",
2656 |             "\n",
2657 |             "    accuracy                           0.34     55500\n",
2658 |             "   macro avg       0.34      0.34      0.34     55500\n",
2659 |             "weighted avg       0.34      0.34      0.34     55500\n",
2660 |             "\n"
2661 |           ]
2662 |         }
2663 |       ]
2664 |     },
2665 |     {
2666 |       "cell_type": "code",
2667 |       "execution_count": 41,
2668 |       "metadata": {
2669 |         "id": "-dbuURPJk1Gj",
2670 |         "colab": {
2671 |           "base_uri": "https://localhost:8080/"
2672 |         },
2673 |         "outputId": "df4bfd6e-4c8c-4fda-ed7c-56a0cd4d8ff4"
2674 |       },
2675 |       "outputs": [
2676 |         {
2677 |           "output_type": "stream",
2678 |           "name": "stdout",
2679 |           "text": [
2680 |             "KS Complement (Continuous Columns):\n",
2681 |             "Syncora vs Real: {'Age': np.float64(0.9890484218467417), 'Billing Amount': np.float64(0.996284565517581)}\n",
2682 |             "Gretel vs Real: {'Age': np.float64(0.9857360360360361), 'Billing Amount': np.float64(0.829954054054054)}\n",
2683 |             "MostlyAI vs Real: {'Age': np.float64(0.9907990990990991), 'Billing Amount': np.float64(0.9946549549549549)}\n",
2684 |             "\n",
2685 |             "TV Complement (Discrete Columns):\n",
2686 |             "Syncora vs Real: {'Gender': 0.9986341720078636, 'Blood Type': 0.9938083187527817, 'Medical Condition': 0.995611438360155, 'Admission Type': 0.995003384610025, 'Medication': 0.9946757231262865, 'Test Results': 0.9941829683540464}\n",
2687 |             "Gretel vs Real: {'Gender': 0.9641324324324324, 'Blood Type': 0.9745486486486487, 'Medical Condition': 0.9740297297297297, 'Admission Type': 0.994536036036036, 'Medication': 0.9823783783783784, 'Test Results': 0.9860450450450451}\n",
2688 |             "MostlyAI vs Real: {'Gender': 0.9987342342342342, 'Blood Type': 0.9771702702702703, 'Medical Condition': 0.9870252252252252, 'Admission Type': 0.9906954954954955, 'Medication': 0.9888261261261261, 'Test Results': 0.9769945945945946}\n"
2689 |           ]
2690 |         }
2691 |       ],
2692 |       "source": [
2693 |         "# Column-wise fidelity with sdmetrics: KSComplement for continuous columns, TVComplement for discrete columns (each synthetic dataset vs df_real)\n",
2694 |         "\n",
2695 |         "from sdmetrics.single_column import KSComplement, TVComplement\n",
2696 |         "\n",
2697 |         "# Assuming df_syncora, df_gretel, and df_mostlyai are also loaded DataFrames\n",
2698 |         "\n",
2699 |         "# List of continuous columns (excluding the identifier 'Unnamed: 0')\n",
2700 |         "continuous_cols = ['Age', 'Billing Amount']\n",
2701 |         "\n",
2702 |         "# List of discrete columns\n",
2703 |         "discrete_cols = ['Gender', 'Blood Type', 'Medical Condition', 'Admission Type', 'Medication', 'Test Results']\n",
2704 |         "\n",
2705 |         "# Calculate KS Complement for continuous columns\n",
2706 |         "ks_syncora_real = {col: KSComplement.compute(df_real[col], df_syncora[col]) for col in continuous_cols}\n",
2707 |         "ks_gretel_real = {col: KSComplement.compute(df_real[col], df_gretel[col]) for col in continuous_cols}\n",
2708 |         "ks_mostlyai_real = {col: KSComplement.compute(df_real[col], df_mostlyai[col]) for col in continuous_cols}\n",
2709 |         "\n",
2710 |         "# Calculate TV Complement for discrete columns\n",
2711 |         "tv_syncora_real = {col: TVComplement.compute(df_real[col], df_syncora[col]) for col in discrete_cols}\n",
2712 |         "tv_gretel_real = {col: TVComplement.compute(df_real[col], df_gretel[col]) for col in discrete_cols}\n",
2713 |         "tv_mostlyai_real = {col: TVComplement.compute(df_real[col], df_mostlyai[col]) for col in discrete_cols}\n",
2714 |         "\n",
2715 |         "# Print the results\n",
2716 |         "print(\"KS Complement (Continuous Columns):\")\n",
2717 |         "print(\"Syncora vs Real:\", ks_syncora_real)\n",
2718 |         "print(\"Gretel vs Real:\", ks_gretel_real)\n",
2719 |         "print(\"MostlyAI vs Real:\", ks_mostlyai_real)\n",
2720 |         "print(\"\\nTV Complement (Discrete Columns):\")\n",
2721 |         "print(\"Syncora vs Real:\", tv_syncora_real)\n",
2722 |         "print(\"Gretel vs Real:\", tv_gretel_real)\n",
2723 |         "print(\"MostlyAI vs Real:\", tv_mostlyai_real)"
2724 |       ]
2725 |     }
2726 |   ],
2727 |   "metadata": {
2728 |     "colab": {
2729 |       "provenance": [],
2730 |       "gpuType": "T4"
2731 |     },
2732 |     "kernelspec": {
2733 |       "display_name": "Python 3",
2734 |       "name": "python3"
2735 |     },
2736 |     "language_info": {
2737 |       "name": "python"
2738 |     },
2739 |     "accelerator": "GPU"
2740 |   },
2741 |   "nbformat": 4,
2742 |   "nbformat_minor": 0
2743 | }


--------------------------------------------------------------------------------
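The notebook repeats one train-and-evaluate pattern per generator. A minimal sketch of that pattern as a single function is shown below, assuming a pandas DataFrame of real rows, a DataFrame of synthetic rows with the same columns, and a `Test Results` target. `evaluate_utility` is a hypothetical helper, not part of the notebook or of any library. It trains a real-only, a synthetic-only (train on synthetic, test on real) and a real+synthetic RandomForest, and scores all three on the same 20% held-out real split. The toy data from `make_classification` only stands in for the notebook's datasets.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def evaluate_utility(df_real, df_synth, target="Test Results", seed=42):
    """Hypothetical helper: accuracy on held-out real rows for three training sets."""
    # Drop a stray CSV index column if present (mirrors the notebook's cleanup).
    df_real = df_real.drop(columns=["Unnamed: 0"], errors="ignore")
    df_synth = df_synth.drop(columns=["Unnamed: 0"], errors="ignore")

    # Hold out 20% of the real rows; none of the models below trains on them.
    real_train, real_test = train_test_split(df_real, test_size=0.2, random_state=seed)
    X_test = real_test.drop(columns=[target])
    y_test = real_test[target]

    training_sets = {
        "real_only": real_train,
        "synthetic_only": df_synth,  # train on synthetic, test on real (TSTR)
        "real_plus_synthetic": pd.concat([real_train, df_synth], ignore_index=True),
    }

    scores = {}
    for name, train_df in training_sets.items():
        model = RandomForestClassifier(n_estimators=100, random_state=seed)
        # Select features in the test set's column order so fit/predict agree.
        model.fit(train_df[X_test.columns], train_df[target])
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Toy stand-in data (NOT the notebook's datasets): one draw split into
# disjoint "real" and "synthetic" halves with the same distribution.
X, y = make_classification(n_samples=2000, n_features=5, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=0)
toy = pd.DataFrame(X, columns=[f"f{i}" for i in range(5)])
toy["Test Results"] = y
df_real_toy, df_synth_toy = toy.iloc[:1000], toy.iloc[1000:]

scores = evaluate_utility(df_real_toy, df_synth_toy)
print(scores)
```

Because every model is scored only on real rows that none of them trained on, this avoids the optimistic bias of scoring a model on the full df_real when part of it was used for training. It does not catch leakage through the generator itself: if the synthetic data was produced from the full real dataset, the held-out rows may still influence the synthetic training set, and ruling that out requires fitting the generator on the training split only.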
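The last notebook cell relies on sdmetrics' `KSComplement` and `TVComplement`. As described in the sdmetrics documentation, KSComplement is 1 minus the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs), and TVComplement is 1 minus the total variation distance, 0.5 times the sum of |R_w - S_w| over the category frequencies. A minimal re-implementation of those definitions is sketched below with scipy and pandas to make the scores easier to interpret. The function names are hypothetical, and this sketch simply drops missing values, which may differ from sdmetrics' own handling.

```python
import pandas as pd
from scipy.stats import ks_2samp


def ks_complement(real: pd.Series, synthetic: pd.Series) -> float:
    """1 - two-sample KS statistic; 1.0 means identical empirical CDFs."""
    statistic = ks_2samp(real.dropna(), synthetic.dropna()).statistic
    return 1.0 - float(statistic)


def tv_complement(real: pd.Series, synthetic: pd.Series) -> float:
    """1 - total variation distance between category frequencies."""
    p = real.value_counts(normalize=True)  # NaN is dropped by default
    q = synthetic.value_counts(normalize=True)
    # Outer-align on the union of categories; absent categories get frequency 0.
    p, q = p.align(q, fill_value=0.0)
    tvd = 0.5 * (p - q).abs().sum()
    return 1.0 - float(tvd)


real_cat = pd.Series(["A", "A", "B", "C"])
synth_cat = pd.Series(["A", "B", "B", "C"])
print(tv_complement(real_cat, synth_cat))  # 0.75

real_num = pd.Series([1.0, 2.0, 3.0, 4.0])
print(ks_complement(real_num, real_num))       # 1.0
print(ks_complement(real_num, real_num + 10))  # 0.0 (no overlap)
```

Read through these definitions, Gretel's `Billing Amount` KSComplement of about 0.83 in the notebook output means the real and synthetic CDFs differ by up to about 0.17 at some value, while every other KSComplement reported there is above 0.98.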