├── 1. ML workflow.ipynb ├── 2. Linear Regression.ipynb ├── 3. Regression accuracy metrics.ipynb ├── 4. Logistic Regression.ipynb ├── 5. Classification: Accuracy Metrics.ipynb ├── 6. Decision Tree.ipynb ├── 7. Cross-validation and Grid Search.ipynb ├── 8. K-means Clustering.ipynb ├── Churn.csv ├── README.md ├── bmw.csv └── processed.cleveland.data /2. Linear Regression.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Expedition to Data Science and Machine Learning\n", 8 | "## Module 4: Machine Learning with Python\n", 9 | "### Lecture 2: Supervised Learning: Linear Regression\n", 10 | "\n", 11 | "Instructor: Md Shahidullah Kawsar\n", 12 | "
Data Scientist, IDARE, Houston, TX, USA\n", 13 | "\n", 14 | "#### Objectives:\n", 15 | "- Supervised Learning: Linear Regression\n", 16 | "- train data, test data\n", 17 | "- Understanding the equation of a straight line\n", 18 | "- feature coefficient (slope, gradient, m)\n", 19 | "- bias coefficient (y-intercept, c)\n", 20 | "- domain: x-axis, independent variable\n", 21 | "- range: y-axis, dependent variable\n", 22 | "- loss function, cost function, objective function, error function\n", 23 | "- bias-variance tradeoff, overfitting, underfitting\n", 24 | "- ordinary least squares method\n", 25 | "- gradient descent method\n", 26 | "- residual, error, squared error, RMSE - Root Mean Squared Error\n", 27 | "\n", 28 | "#### References:\n", 29 | "[1] A Gentle Introduction to Machine Learning: https://www.youtube.com/watch?v=Gv9_4yMHFhI&list=PLblh5JKOoLUICTaGLRoHQDuF_7q2GfuJF&ab_channel=StatQuestwithJoshStarmer\n", 30 | "
[2] Linear Regression, Clearly Explained!!!: https://www.youtube.com/watch?v=nk2CQITm_eo&list=PLblh5JKOoLUICTaGLRoHQDuF_7q2GfuJF&index=10&ab_channel=StatQuestwithJoshStarmer\n", 31 | "
[3] Linear Regression scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html\n", 32 | "
[4] Data Splitting: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html\n", 33 | "
[5] Mean Squared Error: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html\n", 34 | "
[6] RMSE calculation: https://www.youtube.com/watch?v=zMFdb__sUpw&ab_channel=KhanAcademy\n", 35 | "
[7] Regression coefficients: https://statisticsbyjim.com/glossary/regression-coefficient/\n", 36 | "
[8] Machine Learning Quiz 01: Linear Regression https://kawsar34.medium.com/machine-learning-quiz-01-a2fac2712a55\n", 37 | "
[9] Linear Regression Assumptions: https://www.statology.org/linear-regression-assumptions/\n", 38 | "
[10] Constant Variance: https://stats.stackexchange.com/questions/52089/what-does-having-constant-variance-in-a-linear-regression-model-mean\n", 39 | "
[11] Multiple Regression: https://www.youtube.com/watch?v=zITIFTsivN8&list=PLblh5JKOoLUICTaGLRoHQDuF_7q2GfuJF&index=11&ab_channel=StatQuestwithJoshStarmer\n", 40 | "
[12] Linear Regression Simplified - Ordinary Least Square vs Gradient Descent: https://towardsdatascience.com/linear-regression-simplified-ordinary-least-square-vs-gradient-descent-48145de2cf76" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "#### Terminologies:\n", 48 | "\n", 49 | "- equation of a straight line: y=mx+c\n", 50 | "
Straight lines: https://github.com/SKawsar/Data_Visualization_with_Python/blob/main/Lecture_4.ipynb\n", 51 | "- feature coefficient (slope, gradient, m)\n", 52 | "- bias coefficient (y-intercept, c)\n", 53 | "- domain: x-axis, independent variable\n", 54 | "- range: y-axis, dependent variable\n", 55 | "- loss function, cost function, objective function, error function\n", 56 | "- bias-variance tradeoff, overfitting, underfitting\n", 57 | "- ordinary least squares method\n", 58 | "- gradient descent method\n", 59 | "- residual, error, squared error\n", 60 | "- train data, test data\n" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "#### Import required Libraries" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": 108, 73 | "metadata": {}, 74 | "outputs": [], 75 | "source": [ 76 | "import pandas as pd\n", 77 | "import numpy as np\n", 78 | "\n", 79 | "from sklearn.model_selection import train_test_split\n", 80 | "from sklearn.linear_model import LinearRegression\n", 81 | "from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": {}, 87 | "source": [ 88 | "#### Load data" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": 109, 94 | "metadata": {}, 95 | "outputs": [ 96 | { 97 | "data": { 98 | "text/html": [ 99 | "
" 186 | ], 187 | "text/plain": [ 188 | " model year price transmission mileage fuelType mpg engineSize\n", 189 | "0 T-Roc 2019 25000 Automatic 13904 Diesel 49.6 2.0\n", 190 | "1 T-Roc 2019 26883 Automatic 4562 Diesel 49.6 2.0\n", 191 | "2 T-Roc 2019 20000 Manual 7414 Diesel 50.4 2.0\n", 192 | "3 T-Roc 2019 33492 Automatic 4825 Petrol 32.5 2.0\n", 193 | "4 T-Roc 2019 22900 Semi-Auto 6500 Petrol 39.8 1.5" 194 | ] 195 | }, 196 | "metadata": {}, 197 | "output_type": "display_data" 198 | }, 199 | { 200 | "name": "stdout", 201 | "output_type": "stream", 202 | "text": [ 203 | "(15157, 8)\n" 204 | ] 205 | } 206 | ], 207 | "source": [ 208 | "df = pd.read_csv(\"vw.csv\")\n", 209 | "\n", 210 | "display(df.head())\n", 211 | "print(df.shape)" 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": 110, 217 | "metadata": {}, 218 | "outputs": [ 219 | { 220 | "name": "stdout", 221 | "output_type": "stream", 222 | "text": [ 223 | "\n", 224 | "RangeIndex: 15157 entries, 0 to 15156\n", 225 | "Data columns (total 8 columns):\n", 226 | " # Column Non-Null Count Dtype \n", 227 | "--- ------ -------------- ----- \n", 228 | " 0 model 15157 non-null object \n", 229 | " 1 year 15157 non-null int64 \n", 230 | " 2 price 15157 non-null int64 \n", 231 | " 3 transmission 15157 non-null object \n", 232 | " 4 mileage 15157 non-null int64 \n", 233 | " 5 fuelType 15157 non-null object \n", 234 | " 6 mpg 15157 non-null float64\n", 235 | " 7 engineSize 15157 non-null float64\n", 236 | "dtypes: float64(2), int64(3), object(3)\n", 237 | "memory usage: 947.4+ KB\n" 238 | ] 239 | } 240 | ], 241 | "source": [ 242 | "df.info()" 243 | ] 244 | }, 245 | { 246 | "cell_type": "code", 247 | "execution_count": 111, 248 | "metadata": {}, 249 | "outputs": [ 250 | { 251 | "data": { 252 | "text/html": [ 253 | "
" 346 | ], 347 | "text/plain": [ 348 | " year price mileage mpg engineSize\n", 349 | "count 15157.000000 15157.000000 15157.000000 15157.000000 15157.000000\n", 350 | "mean 2017.255789 16838.952365 22092.785644 53.753355 1.600693\n", 351 | "std 2.053059 7755.015206 21148.941635 13.642182 0.461695\n", 352 | "min 2000.000000 899.000000 1.000000 0.300000 0.000000\n", 353 | "25% 2016.000000 10990.000000 5962.000000 46.300000 1.200000\n", 354 | "50% 2017.000000 15497.000000 16393.000000 53.300000 1.600000\n", 355 | "75% 2019.000000 20998.000000 31824.000000 60.100000 2.000000\n", 356 | "max 2020.000000 69994.000000 212000.000000 188.300000 3.200000" 357 | ] 358 | }, 359 | "execution_count": 111, 360 | "metadata": {}, 361 | "output_type": "execute_result" 362 | } 363 | ], 364 | "source": [ 365 | "df.describe()" 366 | ] 367 | }, 368 | { 369 | "cell_type": "code", 370 | "execution_count": 112, 371 | "metadata": {}, 372 | "outputs": [ 373 | { 374 | "data": { 375 | "text/plain": [ 376 | "-9600.0" 377 | ] 378 | }, 379 | "execution_count": 112, 380 | "metadata": {}, 381 | "output_type": "execute_result" 382 | } 383 | ], 384 | "source": [ 385 | "120000*-0.08" 386 | ] 387 | }, 388 | { 389 | "cell_type": "code", 390 | "execution_count": 113, 391 | "metadata": {}, 392 | "outputs": [ 393 | { 394 | "data": { 395 | "text/plain": [ 396 | " Golf 4863\n", 397 | " Polo 3287\n", 398 | " Tiguan 1765\n", 399 | " Passat 915\n", 400 | " Up 884\n", 401 | " T-Roc 733\n", 402 | " Touareg 363\n", 403 | " Touran 352\n", 404 | " T-Cross 300\n", 405 | " Golf SV 268\n", 406 | " Sharan 260\n", 407 | " Arteon 248\n", 408 | " Scirocco 242\n", 409 | " Amarok 111\n", 410 | " Caravelle 101\n", 411 | " CC 95\n", 412 | " Tiguan Allspace 91\n", 413 | " Beetle 83\n", 414 | " Shuttle 61\n", 415 | " Caddy Maxi Life 59\n", 416 | " Jetta 32\n", 417 | " California 15\n", 418 | " Caddy Life 8\n", 419 | " Eos 7\n", 420 | " Caddy 6\n", 421 | " Caddy Maxi 4\n", 422 | " Fox 4\n", 423 | "Name: model, dtype: int64" 424 | 
] 425 | }, 426 | "execution_count": 113, 427 | "metadata": {}, 428 | "output_type": "execute_result" 429 | } 430 | ], 431 | "source": [ 432 | "df[\"model\"].value_counts()" 433 | ] 434 | }, 435 | { 436 | "cell_type": "code", 437 | "execution_count": 114, 438 | "metadata": {}, 439 | "outputs": [ 440 | { 441 | "data": { 442 | "text/plain": [ 443 | "Index(['model', 'year', 'price', 'transmission', 'mileage', 'fuelType', 'mpg',\n", 444 | " 'engineSize'],\n", 445 | " dtype='object')" 446 | ] 447 | }, 448 | "execution_count": 114, 449 | "metadata": {}, 450 | "output_type": "execute_result" 451 | } 452 | ], 453 | "source": [ 454 | "df.columns" 455 | ] 456 | }, 457 | { 458 | "cell_type": "code", 459 | "execution_count": 115, 460 | "metadata": {}, 461 | "outputs": [ 462 | { 463 | "data": { 464 | "text/plain": [ 465 | "Manual 9417\n", 466 | "Semi-Auto 3780\n", 467 | "Automatic 1960\n", 468 | "Name: transmission, dtype: int64" 469 | ] 470 | }, 471 | "execution_count": 115, 472 | "metadata": {}, 473 | "output_type": "execute_result" 474 | } 475 | ], 476 | "source": [ 477 | "df[\"transmission\"].value_counts()" 478 | ] 479 | }, 480 | { 481 | "cell_type": "code", 482 | "execution_count": 116, 483 | "metadata": {}, 484 | "outputs": [ 485 | { 486 | "data": { 487 | "text/plain": [ 488 | "Petrol 8553\n", 489 | "Diesel 6372\n", 490 | "Hybrid 145\n", 491 | "Other 87\n", 492 | "Name: fuelType, dtype: int64" 493 | ] 494 | }, 495 | "execution_count": 116, 496 | "metadata": {}, 497 | "output_type": "execute_result" 498 | } 499 | ], 500 | "source": [ 501 | "df[\"fuelType\"].value_counts()" 502 | ] 503 | }, 504 | { 505 | "cell_type": "markdown", 506 | "metadata": {}, 507 | "source": [ 508 | "#### Correlation plot: \n", 509 | "https://github.com/SKawsar/Data_Analysis_with_Python/blob/main/Lecture_8.ipynb" 510 | ] 511 | }, 512 | { 513 | "cell_type": "markdown", 514 | "metadata": {}, 515 | "source": [ 516 | "#### Separating the features and target variable" 517 | ] 518 | }, 519 | { 520 | 
"cell_type": "code", 521 | "execution_count": 117, 522 | "metadata": {}, 523 | "outputs": [ 524 | { 525 | "name": "stdout", 526 | "output_type": "stream", 527 | "text": [ 528 | "Shape of X = (15157, 3)\n", 529 | "Shape of y = (15157, 1)\n" 530 | ] 531 | } 532 | ], 533 | "source": [ 534 | "features = ['year', 'mpg', 'engineSize']\n", 535 | "target = ['price']\n", 536 | "\n", 537 | "X = df[features]\n", 538 | "y = df[target]\n", 539 | "\n", 540 | "print(\"Shape of X = \", X.shape)\n", 541 | "print(\"Shape of y = \", y.shape)" 542 | ] 543 | }, 544 | { 545 | "cell_type": "markdown", 546 | "metadata": {}, 547 | "source": [ 548 | "#### Create train and test set" 549 | ] 550 | }, 551 | { 552 | "cell_type": "code", 553 | "execution_count": 118, 554 | "metadata": {}, 555 | "outputs": [ 556 | { 557 | "name": "stdout", 558 | "output_type": "stream", 559 | "text": [ 560 | "(12125, 3) (3032, 3) (12125, 1) (3032, 1)\n" 561 | ] 562 | } 563 | ], 564 | "source": [ 565 | "X_train, X_test, y_train, y_test = train_test_split(X, \n", 566 | " y,\n", 567 | " test_size=0.2, \n", 568 | " random_state=42)\n", 569 | "\n", 570 | "print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)" 571 | ] 572 | }, 573 | { 574 | "cell_type": "markdown", 575 | "metadata": {}, 576 | "source": [ 577 | "#### Linear Regression" 578 | ] 579 | }, 580 | { 581 | "cell_type": "code", 582 | "execution_count": 119, 583 | "metadata": {}, 584 | "outputs": [], 585 | "source": [ 586 | "model = LinearRegression()\n", 587 | "model = model.fit(X_train, y_train)" 588 | ] 589 | }, 590 | { 591 | "cell_type": "markdown", 592 | "metadata": {}, 593 | "source": [ 594 | "y = m1*x1 + m2*x2 + m3*x3 + c" 595 | ] 596 | }, 597 | { 598 | "cell_type": "code", 599 | "execution_count": 120, 600 | "metadata": {}, 601 | "outputs": [ 602 | { 603 | "name": "stdout", 604 | "output_type": "stream", 605 | "text": [ 606 | "[[2114.02170162 -106.30742038 8759.3771598 ]]\n" 607 | ] 608 | } 609 | ], 610 | "source": [ 611 | "coefficients
= model.coef_\n", 612 | "print(coefficients)" 613 | ] 614 | }, 615 | { 616 | "cell_type": "code", 617 | "execution_count": 121, 618 | "metadata": {}, 619 | "outputs": [ 620 | { 621 | "name": "stdout", 622 | "output_type": "stream", 623 | "text": [ 624 | "[-4255990.43023728]\n" 625 | ] 626 | } 627 | ], 628 | "source": [ 629 | "c = model.intercept_\n", 630 | "print(c)" 631 | ] 632 | }, 633 | { 634 | "cell_type": "code", 635 | "execution_count": 122, 636 | "metadata": {}, 637 | "outputs": [ 638 | { 639 | "data": { 640 | "text/plain": [ 641 | "Index(['year', 'mpg', 'engineSize'], dtype='object')" 642 | ] 643 | }, 644 | "execution_count": 122, 645 | "metadata": {}, 646 | "output_type": "execute_result" 647 | } 648 | ], 649 | "source": [ 650 | "X.columns" 651 | ] 652 | }, 653 | { 654 | "cell_type": "code", 655 | "execution_count": 123, 656 | "metadata": {}, 657 | "outputs": [ 658 | { 659 | "data": { 660 | "text/html": [ 661 | "
" 702 | ], 703 | "text/plain": [ 704 | " features coefficients\n", 705 | "0 year 2114.021702\n", 706 | "1 mpg -106.307420\n", 707 | "2 engineSize 8759.377160" 708 | ] 709 | }, 710 | "metadata": {}, 711 | "output_type": "display_data" 712 | } 713 | ], 714 | "source": [ 715 | "coef_df = pd.DataFrame({\"features\": X.columns, \n", 716 | " \"coefficients\": np.squeeze(coefficients)})\n", 717 | "\n", 718 | "display(coef_df)" 719 | ] 720 | }, 721 | { 722 | "cell_type": "markdown", 723 | "metadata": {}, 724 | "source": [ 725 | "- A positive sign indicates that as the feature variable increases, the target variable also increases.\n", 726 | "- A negative sign indicates that as the feature variable increases, the target variable decreases." 727 | ] 728 | }, 729 | { 730 | "cell_type": "markdown", 731 | "metadata": {}, 732 | "source": [ 733 | "#### Prediction" 734 | ] 735 | }, 736 | { 737 | "cell_type": "code", 738 | "execution_count": 124, 739 | "metadata": {}, 740 | "outputs": [ 741 | { 742 | "name": "stdout", 743 | "output_type": "stream", 744 | "text": [ 745 | "[[13304.86271129]\n", 746 | " [24646.01422239]\n", 747 | " [18355.60686375]\n", 748 | " ...\n", 749 | " [ 8375.19033353]\n", 750 | " [ 5771.45345778]\n", 751 | " [ 6261.1686319 ]]\n" 752 | ] 753 | } 754 | ], 755 | "source": [ 756 | "y_pred = model.predict(X_test)\n", 757 | "print(y_pred)" 758 | ] 759 | }, 760 | { 761 | "cell_type": "code", 762 | "execution_count": 125, 763 | "metadata": {}, 764 | "outputs": [ 765 | { 766 | "name": "stdout", 767 | "output_type": "stream", 768 | "text": [ 769 | " price\n", 770 | "7342 14450\n", 771 | "10328 23950\n", 772 | "14992 10495\n", 773 | "8466 9990\n", 774 | "10347 21998\n", 775 | "... 
...\n", 776 | "8211 17250\n", 777 | "8401 10450\n", 778 | "9810 10290\n", 779 | "7872 7499\n", 780 | "9399 7290\n", 781 | "\n", 782 | "[3032 rows x 1 columns]\n" 783 | ] 784 | } 785 | ], 786 | "source": [ 787 | "print(y_test)" 788 | ] 789 | }, 790 | { 791 | "cell_type": "code", 792 | "execution_count": 126, 793 | "metadata": {}, 794 | "outputs": [], 795 | "source": [ 796 | "# actual value - predicted value = +" 797 | ] 798 | }, 799 | { 800 | "cell_type": "markdown", 801 | "metadata": {}, 802 | "source": [ 803 | "#### Prediction Error" 804 | ] 805 | }, 806 | { 807 | "cell_type": "code", 808 | "execution_count": 127, 809 | "metadata": {}, 810 | "outputs": [ 811 | { 812 | "name": "stdout", 813 | "output_type": "stream", 814 | "text": [ 815 | "MAE = 2681.5051965262132\n", 816 | "MSE = 16459163.341451917\n", 817 | "RMSE = 4056.9894431033363\n", 818 | "r_squared = 0.7226377192887656\n" 819 | ] 820 | } 821 | ], 822 | "source": [ 823 | "MAE = mean_absolute_error(y_test, y_pred)\n", 824 | "print(\"MAE = \", MAE)\n", 825 | "\n", 826 | "MSE = mean_squared_error(y_test, y_pred, squared=True)\n", 827 | "print(\"MSE = \", MSE)\n", 828 | "\n", 829 | "RMSE = mean_squared_error(y_test, y_pred, squared=False)\n", 830 | "print(\"RMSE = \", RMSE)\n", 831 | "\n", 832 | "r2 = r2_score(y_test, y_pred)\n", 833 | "print(\"r_squared = \", r2)" 834 | ] 835 | }, 836 | { 837 | "cell_type": "code", 838 | "execution_count": 128, 839 | "metadata": {}, 840 | "outputs": [], 841 | "source": [ 842 | "# mean_absolute_error, r2_score" 843 | ] 844 | }, 845 | { 846 | "cell_type": "code", 847 | "execution_count": null, 848 | "metadata": {}, 849 | "outputs": [], 850 | "source": [] 851 | } 852 | ], 853 | "metadata": { 854 | "kernelspec": { 855 | "display_name": "Python 3 (ipykernel)", 856 | "language": "python", 857 | "name": "python3" 858 | }, 859 | "language_info": { 860 | "codemirror_mode": { 861 | "name": "ipython", 862 | "version": 3 863 | }, 864 | "file_extension": ".py", 865 | "mimetype": 
"text/x-python", 866 | "name": "python", 867 | "nbconvert_exporter": "python", 868 | "pygments_lexer": "ipython3", 869 | "version": "3.8.5" 870 | } 871 | }, 872 | "nbformat": 4, 873 | "nbformat_minor": 4 874 | } 875 | -------------------------------------------------------------------------------- /3. Regression accuracy metrics.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Expedition to Data Science and Machine Learning\n", 8 | "## Module 4: Machine Learning with Python\n", 9 | "### Lecture 3: Supervised Learning: Linear Regression and Regression accuracy metrics\n", 10 | "\n", 11 | "Instructor: Md Shahidullah Kawsar\n", 12 | "
Data Scientist, IDARE, Houston, TX, USA\n", 13 | "\n", 14 | "#### Objectives:\n", 15 | "- Supervised Learning: Linear Regression\n", 16 | "- Accuracy metric in Regression problem\n", 17 | "- Mean Absolute Error (MAE)\n", 18 | "- Mean Absolute Percentage Error (MAPE)\n", 19 | "- Mean Squared Error (MSE)\n", 20 | "- Root Mean Squared Error (RMSE)\n", 21 | "- R-squared or coefficient of determination\n", 22 | "- Prediction result evaluation\n", 23 | "\n", 24 | "#### References:\n", 25 | "[1] Accuracy metrics in sklearn: https://scikit-learn.org/stable/modules/model_evaluation.html\n", 26 | "
[2] Mean Absolute Error (MAE): https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_absolute_error.html#sklearn.metrics.mean_absolute_error\n", 27 | "
[3] Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html#sklearn.metrics.mean_squared_error\n", 28 | "
[4] R-squared or coefficient of determination: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html#sklearn.metrics.r2_score\n", 29 | "
[5] MAE, MSE, RMSE, Coefficient of Determination, Adjusted R Squared — Which Metric is Better? https://medium.com/analytics-vidhya/mae-mse-rmse-coefficient-of-determination-adjusted-r-squared-which-metric-is-better-cd0326a5697e" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "#### Import required Libraries" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": 10, 42 | "metadata": {}, 43 | "outputs": [], 44 | "source": [ 45 | "import pandas as pd\n", 46 | "import numpy as np" 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": {}, 52 | "source": [ 53 | "#### Accuracy metrics in Supervised Learning: Regression" 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": 11, 59 | "metadata": {}, 60 | "outputs": [], 61 | "source": [ 62 | "actual_value = [1,2,3,4,5,6,7,8,9,10]\n", 63 | "predicted_value = [1,3,4,5,6,5,6,5,8,9]" 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": {}, 69 | "source": [ 70 | "**Mean Absolute Error** is the average of the absolute differences between the actual and predicted values in the dataset. It measures the average magnitude of the residuals.\n", 71 | "\n", 72 | "**Mean Squared Error** is the average of the squared differences between the actual and predicted values in the dataset. It penalizes large errors more heavily than MAE and, when the residuals average to zero, equals their variance.\n", 73 | "\n", 74 | "**Root Mean Squared Error** is the square root of the Mean Squared Error. It is expressed in the same units as the target and, when the residuals average to zero, equals their standard deviation.\n", 75 | "\n", 76 | "**Coefficient of determination or R-squared** is the proportion of the variance in the dependent variable that is explained by the model. It is a scale-free score: irrespective of the values being small or large, R-squared is at most one, and it can be negative when the model fits worse than always predicting the mean." 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": 24, 82 | "metadata": {}, 83 | "outputs": [ 84 | { 85 | "data": { 86 | "text/html": [ 87 | "
" 218 | ], 219 | "text/plain": [ 220 | " actual predicted dif abs_error squared_error actual_subtract_mean \\\n", 221 | "0 1 1 0 0 0 -4.5 \n", 222 | "1 2 3 -1 1 1 -3.5 \n", 223 | "2 3 4 -1 1 1 -2.5 \n", 224 | "3 4 5 -1 1 1 -1.5 \n", 225 | "4 5 6 -1 1 1 -0.5 \n", 226 | "5 6 5 1 1 1 0.5 \n", 227 | "6 7 6 1 1 1 1.5 \n", 228 | "7 8 5 3 3 9 2.5 \n", 229 | "8 9 8 1 1 1 3.5 \n", 230 | "9 10 9 1 1 1 4.5 \n", 231 | "\n", 232 | " squared_actual_subtract_mean \n", 233 | "0 20.25 \n", 234 | "1 12.25 \n", 235 | "2 6.25 \n", 236 | "3 2.25 \n", 237 | "4 0.25 \n", 238 | "5 0.25 \n", 239 | "6 2.25 \n", 240 | "7 6.25 \n", 241 | "8 12.25 \n", 242 | "9 20.25 " 243 | ] 244 | }, 245 | "metadata": {}, 246 | "output_type": "display_data" 247 | } 248 | ], 249 | "source": [ 250 | "df = pd.DataFrame({\"actual\":actual_value,\n", 251 | " \"predicted\": predicted_value})\n", 252 | "\n", 253 | "df[\"dif\"] = df[\"actual\"] - df[\"predicted\"]\n", 254 | "df[\"abs_error\"] = np.abs(df[\"dif\"])\n", 255 | "df[\"squared_error\"] = df[\"dif\"]**2\n", 256 | "\n", 257 | "df[\"actual_subtract_mean\"] = df[\"actual\"] - df[\"actual\"].mean()\n", 258 | "df[\"squared_actual_subtract_mean\"] = df[\"actual_subtract_mean\"]**2\n", 259 | "\n", 260 | "display(df)" 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": 23, 266 | "metadata": {}, 267 | "outputs": [ 268 | { 269 | "data": { 270 | "text/plain": [ 271 | "5.5" 272 | ] 273 | }, 274 | "execution_count": 23, 275 | "metadata": {}, 276 | "output_type": "execute_result" 277 | } 278 | ], 279 | "source": [ 280 | "df[\"actual\"].mean()" 281 | ] 282 | }, 283 | { 284 | "cell_type": "code", 285 | "execution_count": 27, 286 | "metadata": {}, 287 | "outputs": [ 288 | { 289 | "name": "stdout", 290 | "output_type": "stream", 291 | "text": [ 292 | "MAE = 1.1\n", 293 | "MAPE = 21.79\n", 294 | "MSE = 1.7\n", 295 | "RMSE = 1.3\n", 296 | "r_squared = 0.79\n" 297 | ] 298 | } 299 | ], 300 | "source": [ 301 | "# mean absolute error: lower is better\n", 
302 | "MAE = df[\"abs_error\"].mean()\n", 303 | "print(\"MAE = \", MAE)\n", 304 | "\n", 305 | "# MAPE: Mean Absolute Percentage Error: lower is better\n", 306 | "MAPE = np.round(np.mean(df[\"abs_error\"]/df[\"actual\"])*100, 2)\n", 307 | "print(\"MAPE = \", MAPE)\n", 308 | "\n", 309 | "# mean squared error: lower is better\n", 310 | "MSE = df[\"squared_error\"].mean()\n", 311 | "print(\"MSE = \", MSE)\n", 312 | "\n", 313 | "# root mean squared error: lower is better\n", 314 | "RMSE = np.round(np.sqrt(MSE), 2)\n", 315 | "print(\"RMSE = \", RMSE)\n", 316 | "\n", 317 | "# coefficient of determination == r_squared: greater is better. Max = 1, no lower bound (can be negative)\n", 318 | "r_squared = np.round(1- df[\"squared_error\"].sum()/df[\"squared_actual_subtract_mean\"].sum(), 2)\n", 319 | "print(\"r_squared = \", r_squared)" 320 | ] 321 | }, 322 | { 323 | "cell_type": "code", 324 | "execution_count": null, 325 | "metadata": {}, 326 | "outputs": [], 327 | "source": [] 328 | } 329 | ], 330 | "metadata": { 331 | "kernelspec": { 332 | "display_name": "Python 3 (ipykernel)", 333 | "language": "python", 334 | "name": "python3" 335 | }, 336 | "language_info": { 337 | "codemirror_mode": { 338 | "name": "ipython", 339 | "version": 3 340 | }, 341 | "file_extension": ".py", 342 | "mimetype": "text/x-python", 343 | "name": "python", 344 | "nbconvert_exporter": "python", 345 | "pygments_lexer": "ipython3", 346 | "version": "3.8.5" 347 | } 348 | }, 349 | "nbformat": 4, 350 | "nbformat_minor": 4 351 | } 352 | -------------------------------------------------------------------------------- /4. 
Logistic Regression.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Expedition to Data Science and Machine Learning\n", 8 | "## Module 4: Machine Learning with Python\n", 9 | "### Lecture 4: Supervised Learning - Classification: Logistic Regression\n", 10 | "\n", 11 | "Instructor: Md Shahidullah Kawsar\n", 12 | "
Data Scientist, IDARE, Houston, TX, USA\n", 13 | "\n", 14 | "#### Objectives:\n", 15 | "- Supervised Learning - Classification: Logistic Regression\n", 16 | "\n", 17 | "#### References:\n", 18 | "[1] Machine Learning Fundamentals: The Confusion Matrix - https://www.youtube.com/watch?v=Kdsp6soqA7o&ab_channel=StatQuestwithJoshStarmer\n", 19 | "
[2] Machine Learning Fundamentals: Sensitivity and Specificity: https://www.youtube.com/watch?v=sunUKFXMHGk&ab_channel=StatQuestwithJoshStarmer\n", 20 | "
[3] ROC and AUC, Clearly Explained! https://www.youtube.com/watch?v=4jRBRDbJemM&ab_channel=StatQuestwithJoshStarmer\n", 21 | "
[4] Machine Learning Fundamentals: Bias and Variance: https://www.youtube.com/watch?v=EuBBz3bI-aA&list=PLblh5JKOoLUICTaGLRoHQDuF_7q2GfuJF&index=5&ab_channel=StatQuestwithJoshStarmer\n", 22 | "
[5] StatQuest: Logistic Regression - https://www.youtube.com/watch?v=yIYKR4sgzI8&ab_channel=StatQuestwithJoshStarmer\n", 23 | "
[6] Logistic Regression Details Pt1: Coefficients: https://www.youtube.com/watch?v=vN5cNN2-HWE&ab_channel=StatQuestwithJoshStarmer\n", 24 | "
[7] Logistic Regression Details Pt 2: Maximum Likelihood: https://www.youtube.com/watch?v=BfKanl1aSG0&t=163s&ab_channel=StatQuestwithJoshStarmer\n", 25 | "
[8] Machine Learning Tutorial Python - 8: Logistic Regression (Binary Classification): https://www.youtube.com/watch?v=zM4VZR0px8E&ab_channel=codebasics\n", 26 | "
[9] Maximum Likelihood, clearly explained!!!: https://www.youtube.com/watch?v=XepXtl9YKwc&ab_channel=StatQuestwithJoshStarmer" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": {}, 32 | "source": [ 33 | "#### Import required libraries" 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": 123, 39 | "metadata": {}, 40 | "outputs": [], 41 | "source": [ 42 | "import pandas as pd\n", 43 | "\n", 44 | "import numpy as np\n", 45 | "\n", 46 | "import matplotlib.pyplot as plt\n", 47 | "import seaborn as sns\n", 48 | "\n", 49 | "from sklearn.model_selection import train_test_split\n", 50 | "\n", 51 | "from sklearn.linear_model import LogisticRegression" 52 | ] 53 | }, 54 | { 55 | "cell_type": "markdown", 56 | "metadata": {}, 57 | "source": [ 58 | "#### Load data\n", 59 | "Dataset Source: https://learn.datacamp.com/courses/marketing-analytics-predicting-customer-churn-in-python" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": 124, 65 | "metadata": {}, 66 | "outputs": [ 67 | { 68 | "data": { 69 | "text/html": [ 70 | "
" 355 | ], 356 | "text/plain": [ 357 | " Account_Length Vmail_Message Day_Mins Eve_Mins Night_Mins \\\n", 358 | "1648 102 0 174.5 236.8 270.4 \n", 359 | "2706 125 0 206.0 198.1 135.9 \n", 360 | "3198 53 32 131.2 227.4 178.9 \n", 361 | "306 113 0 272.1 268.5 213.8 \n", 362 | "2227 41 0 237.8 223.5 217.4 \n", 363 | "2749 95 0 229.9 202.4 171.4 \n", 364 | "2970 90 22 124.5 231.7 222.2 \n", 365 | "2427 83 0 159.3 202.3 229.0 \n", 366 | "615 48 43 172.0 200.2 233.1 \n", 367 | "2414 16 0 110.0 147.3 190.5 \n", 368 | "\n", 369 | " Intl_Mins CustServ_Calls Churn Intl_Plan Vmail_Plan Day_Calls \\\n", 370 | "1648 8.5 0 no no no 79 \n", 371 | "2706 13.2 0 no no no 128 \n", 372 | "3198 12.8 2 no no yes 63 \n", 373 | "306 8.5 1 yes no no 111 \n", 374 | "2227 10.2 2 no no no 92 \n", 375 | "2749 14.2 1 no no no 116 \n", 376 | "2970 6.4 1 no no yes 94 \n", 377 | "2427 9.5 2 no no no 104 \n", 378 | "615 8.0 1 no no yes 111 \n", 379 | "2414 6.4 0 no yes no 91 \n", 380 | "\n", 381 | " Day_Charge Eve_Calls Eve_Charge Night_Calls Night_Charge \\\n", 382 | "1648 29.67 136 20.13 110 12.17 \n", 383 | "2706 35.02 71 16.84 116 6.12 \n", 384 | "3198 22.30 125 19.33 105 8.05 \n", 385 | "306 46.26 118 22.82 105 9.62 \n", 386 | "2227 40.43 155 19.00 90 9.78 \n", 387 | "2749 39.08 110 17.20 105 7.71 \n", 388 | "2970 21.17 90 19.69 108 10.00 \n", 389 | "2427 27.08 98 17.20 73 10.31 \n", 390 | "615 29.24 64 17.02 96 10.49 \n", 391 | "2414 18.70 75 12.52 73 8.57 \n", 392 | "\n", 393 | " Intl_Calls Intl_Charge State Area_Code Phone \n", 394 | "1648 5 2.30 VA 510 398-5788 \n", 395 | "2706 3 3.56 WV 415 381-7597 \n", 396 | "3198 2 3.46 DE 415 416-9723 \n", 397 | "306 10 2.30 VT 415 419-1714 \n", 398 | "2227 6 2.75 SC 408 417-6906 \n", 399 | "2749 6 3.83 AL 415 350-7273 \n", 400 | "2970 12 1.73 ND 415 329-8638 \n", 401 | "2427 3 2.57 ID 415 350-4297 \n", 402 | "615 5 2.16 UT 510 340-3075 \n", 403 | "2414 7 1.73 IL 415 342-2013 " 404 | ] 405 | }, 406 | "metadata": {}, 407 | "output_type": "display_data" 
408 | }, 409 | { 410 | "name": "stdout", 411 | "output_type": "stream", 412 | "text": [ 413 | "(3333, 21)\n" 414 | ] 415 | } 416 | ], 417 | "source": [ 418 | "df = pd.read_csv(\"Churn.csv\")\n", 419 | "\n", 420 | "pd.options.display.max_columns = df.shape[1]\n", 421 | "\n", 422 | "display(df.sample(10))\n", 423 | "print(df.shape)" 424 | ] 425 | }, 426 | { 427 | "cell_type": "code", 428 | "execution_count": 125, 429 | "metadata": {}, 430 | "outputs": [ 431 | { 432 | "name": "stdout", 433 | "output_type": "stream", 434 | "text": [ 435 | "(3333, 18)\n" 436 | ] 437 | } 438 | ], 439 | "source": [ 440 | "df = df.drop([\"Area_Code\", \"Phone\", \"State\"], axis=1)\n", 441 | "\n", 442 | "print(df.shape)" 443 | ] 444 | }, 445 | { 446 | "cell_type": "code", 447 | "execution_count": 126, 448 | "metadata": {}, 449 | "outputs": [], 450 | "source": [ 451 | "# abul = sorted(df[\"State\"].unique())\n", 452 | "# print(abul)" 453 | ] 454 | }, 455 | { 456 | "cell_type": "code", 457 | "execution_count": 127, 458 | "metadata": {}, 459 | "outputs": [ 460 | { 461 | "name": "stdout", 462 | "output_type": "stream", 463 | "text": [ 464 | "\n", 465 | "RangeIndex: 3333 entries, 0 to 3332\n", 466 | "Data columns (total 18 columns):\n", 467 | " # Column Non-Null Count Dtype \n", 468 | "--- ------ -------------- ----- \n", 469 | " 0 Account_Length 3333 non-null int64 \n", 470 | " 1 Vmail_Message 3333 non-null int64 \n", 471 | " 2 Day_Mins 3333 non-null float64\n", 472 | " 3 Eve_Mins 3333 non-null float64\n", 473 | " 4 Night_Mins 3333 non-null float64\n", 474 | " 5 Intl_Mins 3333 non-null float64\n", 475 | " 6 CustServ_Calls 3333 non-null int64 \n", 476 | " 7 Churn 3333 non-null object \n", 477 | " 8 Intl_Plan 3333 non-null object \n", 478 | " 9 Vmail_Plan 3333 non-null object \n", 479 | " 10 Day_Calls 3333 non-null int64 \n", 480 | " 11 Day_Charge 3333 non-null float64\n", 481 | " 12 Eve_Calls 3333 non-null int64 \n", 482 | " 13 Eve_Charge 3333 non-null float64\n", 483 | " 14 Night_Calls 3333 
non-null int64 \n", 484 | " 15 Night_Charge 3333 non-null float64\n", 485 | " 16 Intl_Calls 3333 non-null int64 \n", 486 | " 17 Intl_Charge 3333 non-null float64\n", 487 | "dtypes: float64(8), int64(7), object(3)\n", 488 | "memory usage: 468.8+ KB\n" 489 | ] 490 | } 491 | ], 492 | "source": [ 493 | "df.info()" 494 | ] 495 | }, 496 | { 497 | "cell_type": "markdown", 498 | "metadata": {}, 499 | "source": [ 500 | "#### Data Preprocessing" 501 | ] 502 | }, 503 | { 504 | "cell_type": "code", 505 | "execution_count": 128, 506 | "metadata": {}, 507 | "outputs": [ 508 | { 509 | "data": { 510 | "text/plain": [ 511 | "no 2850\n", 512 | "yes 483\n", 513 | "Name: Churn, dtype: int64" 514 | ] 515 | }, 516 | "execution_count": 128, 517 | "metadata": {}, 518 | "output_type": "execute_result" 519 | } 520 | ], 521 | "source": [ 522 | "df[\"Churn\"].value_counts()" 523 | ] 524 | }, 525 | { 526 | "cell_type": "code", 527 | "execution_count": 129, 528 | "metadata": {}, 529 | "outputs": [ 530 | { 531 | "data": { 532 | "text/plain": [ 533 | "no 3010\n", 534 | "yes 323\n", 535 | "Name: Intl_Plan, dtype: int64" 536 | ] 537 | }, 538 | "execution_count": 129, 539 | "metadata": {}, 540 | "output_type": "execute_result" 541 | } 542 | ], 543 | "source": [ 544 | "df[\"Intl_Plan\"].value_counts()" 545 | ] 546 | }, 547 | { 548 | "cell_type": "code", 549 | "execution_count": 130, 550 | "metadata": {}, 551 | "outputs": [ 552 | { 553 | "data": { 554 | "text/plain": [ 555 | "no 2411\n", 556 | "yes 922\n", 557 | "Name: Vmail_Plan, dtype: int64" 558 | ] 559 | }, 560 | "execution_count": 130, 561 | "metadata": {}, 562 | "output_type": "execute_result" 563 | } 564 | ], 565 | "source": [ 566 | "df[\"Vmail_Plan\"].value_counts()" 567 | ] 568 | }, 569 | { 570 | "cell_type": "code", 571 | "execution_count": 131, 572 | "metadata": {}, 573 | "outputs": [ 574 | { 575 | "data": { 576 | "text/html": [ 577 | "
" 829 | ], 830 | "text/plain": [ 831 | " Account_Length Vmail_Message Day_Mins Eve_Mins Night_Mins Intl_Mins \\\n", 832 | "0 128 25 265.1 197.4 244.7 10.0 \n", 833 | "1 107 26 161.6 195.5 254.4 13.7 \n", 834 | "2 137 0 243.4 121.2 162.6 12.2 \n", 835 | "3 84 0 299.4 61.9 196.9 6.6 \n", 836 | "4 75 0 166.7 148.3 186.9 10.1 \n", 837 | "5 118 0 223.4 220.6 203.9 6.3 \n", 838 | "6 121 24 218.2 348.5 212.6 7.5 \n", 839 | "7 147 0 157.0 103.1 211.8 7.1 \n", 840 | "8 117 0 184.5 351.6 215.8 8.7 \n", 841 | "9 141 37 258.6 222.0 326.4 11.2 \n", 842 | "\n", 843 | " CustServ_Calls Churn Intl_Plan Vmail_Plan Day_Calls Day_Charge \\\n", 844 | "0 1 0 0 1 110 45.07 \n", 845 | "1 1 0 0 1 123 27.47 \n", 846 | "2 0 0 0 0 114 41.38 \n", 847 | "3 2 0 1 0 71 50.90 \n", 848 | "4 3 0 1 0 113 28.34 \n", 849 | "5 0 0 1 0 98 37.98 \n", 850 | "6 3 0 0 1 88 37.09 \n", 851 | "7 0 0 1 0 79 26.69 \n", 852 | "8 1 0 0 0 97 31.37 \n", 853 | "9 0 0 1 1 84 43.96 \n", 854 | "\n", 855 | " Eve_Calls Eve_Charge Night_Calls Night_Charge Intl_Calls Intl_Charge \n", 856 | "0 99 16.78 91 11.01 3 2.70 \n", 857 | "1 103 16.62 103 11.45 3 3.70 \n", 858 | "2 110 10.30 104 7.32 5 3.29 \n", 859 | "3 88 5.26 89 8.86 7 1.78 \n", 860 | "4 122 12.61 121 8.41 3 2.73 \n", 861 | "5 101 18.75 118 9.18 6 1.70 \n", 862 | "6 108 29.62 118 9.57 7 2.03 \n", 863 | "7 94 8.76 96 9.53 6 1.92 \n", 864 | "8 80 29.89 90 9.71 4 2.35 \n", 865 | "9 111 18.87 97 14.69 5 3.02 " 866 | ] 867 | }, 868 | "metadata": {}, 869 | "output_type": "display_data" 870 | }, 871 | { 872 | "name": "stdout", 873 | "output_type": "stream", 874 | "text": [ 875 | "\n", 876 | "RangeIndex: 3333 entries, 0 to 3332\n", 877 | "Data columns (total 18 columns):\n", 878 | " # Column Non-Null Count Dtype \n", 879 | "--- ------ -------------- ----- \n", 880 | " 0 Account_Length 3333 non-null int64 \n", 881 | " 1 Vmail_Message 3333 non-null int64 \n", 882 | " 2 Day_Mins 3333 non-null float64\n", 883 | " 3 Eve_Mins 3333 non-null float64\n", 884 | " 4 Night_Mins 3333 
non-null float64\n", 885 | " 5 Intl_Mins 3333 non-null float64\n", 886 | " 6 CustServ_Calls 3333 non-null int64 \n", 887 | " 7 Churn 3333 non-null int64 \n", 888 | " 8 Intl_Plan 3333 non-null int64 \n", 889 | " 9 Vmail_Plan 3333 non-null int64 \n", 890 | " 10 Day_Calls 3333 non-null int64 \n", 891 | " 11 Day_Charge 3333 non-null float64\n", 892 | " 12 Eve_Calls 3333 non-null int64 \n", 893 | " 13 Eve_Charge 3333 non-null float64\n", 894 | " 14 Night_Calls 3333 non-null int64 \n", 895 | " 15 Night_Charge 3333 non-null float64\n", 896 | " 16 Intl_Calls 3333 non-null int64 \n", 897 | " 17 Intl_Charge 3333 non-null float64\n", 898 | "dtypes: float64(8), int64(10)\n", 899 | "memory usage: 468.8 KB\n" 900 | ] 901 | } 902 | ], 903 | "source": [ 904 | "df[\"Churn\"] = df[\"Churn\"].replace({\"no\": 0, \"yes\": 1})\n", 905 | "df[\"Intl_Plan\"] = df[\"Intl_Plan\"].replace({\"no\": 0, \"yes\": 1})\n", 906 | "df[\"Vmail_Plan\"] = df[\"Vmail_Plan\"].replace({\"no\": 0, \"yes\": 1})\n", 907 | "\n", 908 | "display(df.head(10))\n", 909 | "df.info()" 910 | ] 911 | }, 912 | { 913 | "cell_type": "markdown", 914 | "metadata": {}, 915 | "source": [ 916 | "#### Target variable visualization" 917 | ] 918 | }, 919 | { 920 | "cell_type": "code", 921 | "execution_count": 132, 922 | "metadata": {}, 923 | "outputs": [ 924 | { 925 | "data": { 926 | "image/png": "[base64-encoded PNG omitted: histogram of Churn, x-axis Churn, y-axis Percent]\n", 927 | "text/plain": [ 928 | "
" 929 | ] 930 | }, 931 | "metadata": { 932 | "needs_background": "light" 933 | }, 934 | "output_type": "display_data" 935 | } 936 | ], 937 | "source": [ 938 | "sns.histplot(x=\"Churn\", data=df, binwidth=0.5, stat=\"percent\")\n", 939 | "plt.yticks(np.arange(0,101,10))\n", 940 | "plt.xticks(np.arange(0,1.1,0.5))\n", 941 | "plt.show()" 942 | ] 943 | }, 944 | { 945 | "cell_type": "markdown", 946 | "metadata": {}, 947 | "source": [ 948 | "#### Feature and target variable separation" 949 | ] 950 | }, 951 | { 952 | "cell_type": "code", 953 | "execution_count": 133, 954 | "metadata": {}, 955 | "outputs": [ 956 | { 957 | "name": "stdout", 958 | "output_type": "stream", 959 | "text": [ 960 | "(3333, 17) (3333, 1)\n" 961 | ] 962 | } 963 | ], 964 | "source": [ 965 | "X = df.drop(\"Churn\", axis=1)\n", 966 | "y = df[[\"Churn\"]]\n", 967 | "\n", 968 | "print(X.shape, y.shape)" 969 | ] 970 | }, 971 | { 972 | "cell_type": "markdown", 973 | "metadata": {}, 974 | "source": [ 975 | "#### Train test separation" 976 | ] 977 | }, 978 | { 979 | "cell_type": "code", 980 | "execution_count": 134, 981 | "metadata": {}, 982 | "outputs": [ 983 | { 984 | "name": "stdout", 985 | "output_type": "stream", 986 | "text": [ 987 | "(2666, 17) (667, 17) (2666, 1) (667, 1)\n" 988 | ] 989 | } 990 | ], 991 | "source": [ 992 | "X_train, X_test, y_train, y_test = train_test_split(X, \n", 993 | " y,\n", 994 | " test_size=0.2, \n", 995 | " random_state=42,\n", 996 | " stratify=y)\n", 997 | "\n", 998 | "print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)" 999 | ] 1000 | }, 1001 | { 1002 | "cell_type": "code", 1003 | "execution_count": 135, 1004 | "metadata": {}, 1005 | "outputs": [ 1006 | { 1007 | "data": { 1008 | "text/plain": [ 1009 | "0 85.52138\n", 1010 | "1 14.47862\n", 1011 | "Name: Churn, dtype: float64" 1012 | ] 1013 | }, 1014 | "execution_count": 135, 1015 | "metadata": {}, 1016 | "output_type": "execute_result" 1017 | } 1018 | ], 1019 | "source": [ 1020 | 
"y_train[\"Churn\"].value_counts(normalize=True)*100" 1021 | ] 1022 | }, 1023 | { 1024 | "cell_type": "code", 1025 | "execution_count": 136, 1026 | "metadata": {}, 1027 | "outputs": [ 1028 | { 1029 | "data": { 1030 | "text/plain": [ 1031 | "0 85.457271\n", 1032 | "1 14.542729\n", 1033 | "Name: Churn, dtype: float64" 1034 | ] 1035 | }, 1036 | "execution_count": 136, 1037 | "metadata": {}, 1038 | "output_type": "execute_result" 1039 | } 1040 | ], 1041 | "source": [ 1042 | "y_test[\"Churn\"].value_counts(normalize=True)*100" 1043 | ] 1044 | }, 1045 | { 1046 | "cell_type": "markdown", 1047 | "metadata": {}, 1048 | "source": [ 1049 | "#### Training: Logistic Regression" 1050 | ] 1051 | }, 1052 | { 1053 | "cell_type": "code", 1054 | "execution_count": null, 1055 | "metadata": {}, 1056 | "outputs": [], 1057 | "source": [] 1058 | }, 1059 | { 1060 | "cell_type": "code", 1061 | "execution_count": null, 1062 | "metadata": {}, 1063 | "outputs": [], 1064 | "source": [] 1065 | } 1066 | ], 1067 | "metadata": { 1068 | "kernelspec": { 1069 | "display_name": "Python 3 (ipykernel)", 1070 | "language": "python", 1071 | "name": "python3" 1072 | }, 1073 | "language_info": { 1074 | "codemirror_mode": { 1075 | "name": "ipython", 1076 | "version": 3 1077 | }, 1078 | "file_extension": ".py", 1079 | "mimetype": "text/x-python", 1080 | "name": "python", 1081 | "nbconvert_exporter": "python", 1082 | "pygments_lexer": "ipython3", 1083 | "version": "3.8.5" 1084 | } 1085 | }, 1086 | "nbformat": 4, 1087 | "nbformat_minor": 4 1088 | } 1089 | -------------------------------------------------------------------------------- /5. Classification: Accuracy Metrics.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Lecture 5: Supervised Learning - Classification: Accuracy Metrics\n", 8 | "\n", 9 | "Instructor: Md Shahidullah Kawsar\n", 10 | "
Data Scientist, IDARE, Houston, TX, USA\n", 11 | "\n", 12 | "#### Objectives:\n", 13 | "- Supervised Learning - Classification: Logistic Regression\n", 14 | "- Confusion Matrix\n", 15 | "- Accuracy, Precision, Recall/Sensitivity/True Positive Rate, F1 score, False Positive Rate\n", 16 | "- ROC: Receiver Operating Characteristic and AUC: Area Under the Curve\n", 17 | "- Classification report\n", 18 | "\n", 19 | "#### References:\n", 20 | "[1] Machine Learning Fundamentals: The Confusion Matrix - https://www.youtube.com/watch?v=Kdsp6soqA7o&ab_channel=StatQuestwithJoshStarmer\n", 21 | "
[2] Machine Learning Fundamentals: Sensitivity and Specificity: https://www.youtube.com/watch?v=sunUKFXMHGk&ab_channel=StatQuestwithJoshStarmer\n", 22 | "
[3] ROC and AUC, Clearly Explained! https://www.youtube.com/watch?v=4jRBRDbJemM&ab_channel=StatQuestwithJoshStarmer\n", 23 | "
[4] Machine Learning Fundamentals: Bias and Variance: https://www.youtube.com/watch?v=EuBBz3bI-aA&list=PLblh5JKOoLUICTaGLRoHQDuF_7q2GfuJF&index=5&ab_channel=StatQuestwithJoshStarmer\n", 24 | "
[5] StatQuest: Logistic Regression - https://www.youtube.com/watch?v=yIYKR4sgzI8&ab_channel=StatQuestwithJoshStarmer\n", 25 | "
[6] Logistic Regression Details Pt1: Coefficients: https://www.youtube.com/watch?v=vN5cNN2-HWE&ab_channel=StatQuestwithJoshStarmer\n", 26 | "
[7] Logistic Regression Details Pt 2: Maximum Likelihood: https://www.youtube.com/watch?v=BfKanl1aSG0&t=163s&ab_channel=StatQuestwithJoshStarmer\n", 27 | "
[8] Machine Learning Tutorial Python - 8: Logistic Regression (Binary Classification): https://www.youtube.com/watch?v=zM4VZR0px8E&ab_channel=codebasics\n", 28 | "
[9] Maximum Likelihood, clearly explained!!!: https://www.youtube.com/watch?v=XepXtl9YKwc&ab_channel=StatQuestwithJoshStarmer" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "#### Import required libraries" 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": 43, 41 | "metadata": {}, 42 | "outputs": [], 43 | "source": [ 44 | "import pandas as pd\n", 45 | "import numpy as np\n", 46 | "\n", 47 | "pd.options.display.max_columns = 20\n", 48 | "# pd.options.display.max_rows = 100\n", 49 | "\n", 50 | "from sklearn.model_selection import train_test_split\n", 51 | "from sklearn.linear_model import LogisticRegression\n", 52 | "# from sklearn.tree import DecisionTreeClassifier\n", 53 | "# from sklearn.ensemble import RandomForestClassifier\n", 54 | "\n", 55 | "from sklearn.metrics import accuracy_score, classification_report, confusion_matrix\n", 56 | "from sklearn.metrics import roc_curve, roc_auc_score, precision_score, recall_score, f1_score\n", 57 | "from sklearn.metrics import plot_confusion_matrix\n", 58 | "\n", 59 | "import matplotlib.pyplot as plt\n", 60 | "import seaborn as sns\n", 61 | "\n", 62 | "import warnings\n", 63 | "warnings.filterwarnings('ignore')" 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": {}, 69 | "source": [ 70 | "#### Load data\n", 71 | "Dataset Source: https://learn.datacamp.com/courses/marketing-analytics-predicting-customer-churn-in-python" 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": 44, 77 | "metadata": {}, 78 | "outputs": [ 79 | { 80 | "data": { 81 | "text/html": [ 82 | "
Account_LengthVmail_MessageDay_MinsEve_MinsNight_MinsIntl_MinsCustServ_CallsChurnIntl_PlanVmail_Plan...Day_ChargeEve_CallsEve_ChargeNight_CallsNight_ChargeIntl_CallsIntl_ChargeStateArea_CodePhone
012825265.1197.4244.710.01nonoyes...45.079916.789111.0132.70KS415382-4657
110726161.6195.5254.413.71nonoyes...27.4710316.6210311.4533.70OH415371-7191
21370243.4121.2162.612.20nonono...41.3811010.301047.3253.29NJ415358-1921
3840299.461.9196.96.62noyesno...50.90885.26898.8671.78OH408375-9999
4750166.7148.3186.910.13noyesno...28.3412212.611218.4132.73OK415330-6626
51180223.4220.6203.96.30noyesno...37.9810118.751189.1861.70AL510391-8027
612124218.2348.5212.67.53nonoyes...37.0910829.621189.5772.03MA510355-9993
71470157.0103.1211.87.10noyesno...26.69948.76969.5361.92MO415329-9001
81170184.5351.6215.88.71nonono...31.378029.89909.7142.35LA408335-4719
914137258.6222.0326.411.20noyesyes...43.9611118.879714.6953.02WV415330-8173
\n", 366 | "

10 rows × 21 columns

\n", 367 | "
" 368 | ], 369 | "text/plain": [ 370 | " Account_Length Vmail_Message Day_Mins Eve_Mins Night_Mins Intl_Mins \\\n", 371 | "0 128 25 265.1 197.4 244.7 10.0 \n", 372 | "1 107 26 161.6 195.5 254.4 13.7 \n", 373 | "2 137 0 243.4 121.2 162.6 12.2 \n", 374 | "3 84 0 299.4 61.9 196.9 6.6 \n", 375 | "4 75 0 166.7 148.3 186.9 10.1 \n", 376 | "5 118 0 223.4 220.6 203.9 6.3 \n", 377 | "6 121 24 218.2 348.5 212.6 7.5 \n", 378 | "7 147 0 157.0 103.1 211.8 7.1 \n", 379 | "8 117 0 184.5 351.6 215.8 8.7 \n", 380 | "9 141 37 258.6 222.0 326.4 11.2 \n", 381 | "\n", 382 | " CustServ_Calls Churn Intl_Plan Vmail_Plan ... Day_Charge Eve_Calls \\\n", 383 | "0 1 no no yes ... 45.07 99 \n", 384 | "1 1 no no yes ... 27.47 103 \n", 385 | "2 0 no no no ... 41.38 110 \n", 386 | "3 2 no yes no ... 50.90 88 \n", 387 | "4 3 no yes no ... 28.34 122 \n", 388 | "5 0 no yes no ... 37.98 101 \n", 389 | "6 3 no no yes ... 37.09 108 \n", 390 | "7 0 no yes no ... 26.69 94 \n", 391 | "8 1 no no no ... 31.37 80 \n", 392 | "9 0 no yes yes ... 
43.96 111 \n", 393 | "\n", 394 | " Eve_Charge Night_Calls Night_Charge Intl_Calls Intl_Charge State \\\n", 395 | "0 16.78 91 11.01 3 2.70 KS \n", 396 | "1 16.62 103 11.45 3 3.70 OH \n", 397 | "2 10.30 104 7.32 5 3.29 NJ \n", 398 | "3 5.26 89 8.86 7 1.78 OH \n", 399 | "4 12.61 121 8.41 3 2.73 OK \n", 400 | "5 18.75 118 9.18 6 1.70 AL \n", 401 | "6 29.62 118 9.57 7 2.03 MA \n", 402 | "7 8.76 96 9.53 6 1.92 MO \n", 403 | "8 29.89 90 9.71 4 2.35 LA \n", 404 | "9 18.87 97 14.69 5 3.02 WV \n", 405 | "\n", 406 | " Area_Code Phone \n", 407 | "0 415 382-4657 \n", 408 | "1 415 371-7191 \n", 409 | "2 415 358-1921 \n", 410 | "3 408 375-9999 \n", 411 | "4 415 330-6626 \n", 412 | "5 510 391-8027 \n", 413 | "6 510 355-9993 \n", 414 | "7 415 329-9001 \n", 415 | "8 408 335-4719 \n", 416 | "9 415 330-8173 \n", 417 | "\n", 418 | "[10 rows x 21 columns]" 419 | ] 420 | }, 421 | "metadata": {}, 422 | "output_type": "display_data" 423 | } 424 | ], 425 | "source": [ 426 | "df = pd.read_csv(\"Churn.csv\")\n", 427 | "\n", 428 | "display(df.head(10))" 429 | ] 430 | }, 431 | { 432 | "cell_type": "code", 433 | "execution_count": 45, 434 | "metadata": {}, 435 | "outputs": [ 436 | { 437 | "name": "stdout", 438 | "output_type": "stream", 439 | "text": [ 440 | "(3333, 21)\n" 441 | ] 442 | } 443 | ], 444 | "source": [ 445 | "print(df.shape)" 446 | ] 447 | }, 448 | { 449 | "cell_type": "code", 450 | "execution_count": 46, 451 | "metadata": {}, 452 | "outputs": [ 453 | { 454 | "name": "stdout", 455 | "output_type": "stream", 456 | "text": [ 457 | "\n", 458 | "RangeIndex: 3333 entries, 0 to 3332\n", 459 | "Data columns (total 21 columns):\n", 460 | " # Column Non-Null Count Dtype \n", 461 | "--- ------ -------------- ----- \n", 462 | " 0 Account_Length 3333 non-null int64 \n", 463 | " 1 Vmail_Message 3333 non-null int64 \n", 464 | " 2 Day_Mins 3333 non-null float64\n", 465 | " 3 Eve_Mins 3333 non-null float64\n", 466 | " 4 Night_Mins 3333 non-null float64\n", 467 | " 5 Intl_Mins 3333 non-null 
float64\n", 468 | " 6 CustServ_Calls 3333 non-null int64 \n", 469 | " 7 Churn 3333 non-null object \n", 470 | " 8 Intl_Plan 3333 non-null object \n", 471 | " 9 Vmail_Plan 3333 non-null object \n", 472 | " 10 Day_Calls 3333 non-null int64 \n", 473 | " 11 Day_Charge 3333 non-null float64\n", 474 | " 12 Eve_Calls 3333 non-null int64 \n", 475 | " 13 Eve_Charge 3333 non-null float64\n", 476 | " 14 Night_Calls 3333 non-null int64 \n", 477 | " 15 Night_Charge 3333 non-null float64\n", 478 | " 16 Intl_Calls 3333 non-null int64 \n", 479 | " 17 Intl_Charge 3333 non-null float64\n", 480 | " 18 State 3333 non-null object \n", 481 | " 19 Area_Code 3333 non-null int64 \n", 482 | " 20 Phone 3333 non-null object \n", 483 | "dtypes: float64(8), int64(8), object(5)\n", 484 | "memory usage: 546.9+ KB\n" 485 | ] 486 | } 487 | ], 488 | "source": [ 489 | "df.info()" 490 | ] 491 | }, 492 | { 493 | "cell_type": "markdown", 494 | "metadata": {}, 495 | "source": [ 496 | "#### Data Preprocessing" 497 | ] 498 | }, 499 | { 500 | "cell_type": "code", 501 | "execution_count": 47, 502 | "metadata": {}, 503 | "outputs": [ 504 | { 505 | "name": "stdout", 506 | "output_type": "stream", 507 | "text": [ 508 | "['no' 'yes']\n", 509 | "['no' 'yes']\n", 510 | "['yes' 'no']\n" 511 | ] 512 | } 513 | ], 514 | "source": [ 515 | "print(df['Churn'].unique())\n", 516 | "print(df['Intl_Plan'].unique())\n", 517 | "print(df['Vmail_Plan'].unique())" 518 | ] 519 | }, 520 | { 521 | "cell_type": "code", 522 | "execution_count": 48, 523 | "metadata": {}, 524 | "outputs": [ 525 | { 526 | "data": { 527 | "text/html": [ 528 | "
<table>
<thead>
<tr><th></th><th>Account_Length</th><th>Vmail_Message</th><th>Day_Mins</th><th>Eve_Mins</th><th>Night_Mins</th><th>Intl_Mins</th><th>CustServ_Calls</th><th>Churn</th><th>Intl_Plan</th><th>Vmail_Plan</th><th>Day_Calls</th><th>Day_Charge</th><th>Eve_Calls</th><th>Eve_Charge</th><th>Night_Calls</th><th>Night_Charge</th><th>Intl_Calls</th><th>Intl_Charge</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>128</td><td>25</td><td>265.1</td><td>197.4</td><td>244.7</td><td>10.0</td><td>1</td><td>0</td><td>0</td><td>1</td><td>110</td><td>45.07</td><td>99</td><td>16.78</td><td>91</td><td>11.01</td><td>3</td><td>2.70</td></tr>
<tr><th>1</th><td>107</td><td>26</td><td>161.6</td><td>195.5</td><td>254.4</td><td>13.7</td><td>1</td><td>0</td><td>0</td><td>1</td><td>123</td><td>27.47</td><td>103</td><td>16.62</td><td>103</td><td>11.45</td><td>3</td><td>3.70</td></tr>
<tr><th>2</th><td>137</td><td>0</td><td>243.4</td><td>121.2</td><td>162.6</td><td>12.2</td><td>0</td><td>0</td><td>0</td><td>0</td><td>114</td><td>41.38</td><td>110</td><td>10.30</td><td>104</td><td>7.32</td><td>5</td><td>3.29</td></tr>
<tr><th>3</th><td>84</td><td>0</td><td>299.4</td><td>61.9</td><td>196.9</td><td>6.6</td><td>2</td><td>0</td><td>1</td><td>0</td><td>71</td><td>50.90</td><td>88</td><td>5.26</td><td>89</td><td>8.86</td><td>7</td><td>1.78</td></tr>
<tr><th>4</th><td>75</td><td>0</td><td>166.7</td><td>148.3</td><td>186.9</td><td>10.1</td><td>3</td><td>0</td><td>1</td><td>0</td><td>113</td><td>28.34</td><td>122</td><td>12.61</td><td>121</td><td>8.41</td><td>3</td><td>2.73</td></tr>
<tr><th>5</th><td>118</td><td>0</td><td>223.4</td><td>220.6</td><td>203.9</td><td>6.3</td><td>0</td><td>0</td><td>1</td><td>0</td><td>98</td><td>37.98</td><td>101</td><td>18.75</td><td>118</td><td>9.18</td><td>6</td><td>1.70</td></tr>
<tr><th>6</th><td>121</td><td>24</td><td>218.2</td><td>348.5</td><td>212.6</td><td>7.5</td><td>3</td><td>0</td><td>0</td><td>1</td><td>88</td><td>37.09</td><td>108</td><td>29.62</td><td>118</td><td>9.57</td><td>7</td><td>2.03</td></tr>
<tr><th>7</th><td>147</td><td>0</td><td>157.0</td><td>103.1</td><td>211.8</td><td>7.1</td><td>0</td><td>0</td><td>1</td><td>0</td><td>79</td><td>26.69</td><td>94</td><td>8.76</td><td>96</td><td>9.53</td><td>6</td><td>1.92</td></tr>
<tr><th>8</th><td>117</td><td>0</td><td>184.5</td><td>351.6</td><td>215.8</td><td>8.7</td><td>1</td><td>0</td><td>0</td><td>0</td><td>97</td><td>31.37</td><td>80</td><td>29.89</td><td>90</td><td>9.71</td><td>4</td><td>2.35</td></tr>
<tr><th>9</th><td>141</td><td>37</td><td>258.6</td><td>222.0</td><td>326.4</td><td>11.2</td><td>0</td><td>0</td><td>1</td><td>1</td><td>84</td><td>43.96</td><td>111</td><td>18.87</td><td>97</td><td>14.69</td><td>5</td><td>3.02</td></tr>
</tbody>
</table>
\n", 779 | "
" 780 | ], 781 | "text/plain": [ 782 | " Account_Length Vmail_Message Day_Mins Eve_Mins Night_Mins Intl_Mins \\\n", 783 | "0 128 25 265.1 197.4 244.7 10.0 \n", 784 | "1 107 26 161.6 195.5 254.4 13.7 \n", 785 | "2 137 0 243.4 121.2 162.6 12.2 \n", 786 | "3 84 0 299.4 61.9 196.9 6.6 \n", 787 | "4 75 0 166.7 148.3 186.9 10.1 \n", 788 | "5 118 0 223.4 220.6 203.9 6.3 \n", 789 | "6 121 24 218.2 348.5 212.6 7.5 \n", 790 | "7 147 0 157.0 103.1 211.8 7.1 \n", 791 | "8 117 0 184.5 351.6 215.8 8.7 \n", 792 | "9 141 37 258.6 222.0 326.4 11.2 \n", 793 | "\n", 794 | " CustServ_Calls Churn Intl_Plan Vmail_Plan Day_Calls Day_Charge \\\n", 795 | "0 1 0 0 1 110 45.07 \n", 796 | "1 1 0 0 1 123 27.47 \n", 797 | "2 0 0 0 0 114 41.38 \n", 798 | "3 2 0 1 0 71 50.90 \n", 799 | "4 3 0 1 0 113 28.34 \n", 800 | "5 0 0 1 0 98 37.98 \n", 801 | "6 3 0 0 1 88 37.09 \n", 802 | "7 0 0 1 0 79 26.69 \n", 803 | "8 1 0 0 0 97 31.37 \n", 804 | "9 0 0 1 1 84 43.96 \n", 805 | "\n", 806 | " Eve_Calls Eve_Charge Night_Calls Night_Charge Intl_Calls Intl_Charge \n", 807 | "0 99 16.78 91 11.01 3 2.70 \n", 808 | "1 103 16.62 103 11.45 3 3.70 \n", 809 | "2 110 10.30 104 7.32 5 3.29 \n", 810 | "3 88 5.26 89 8.86 7 1.78 \n", 811 | "4 122 12.61 121 8.41 3 2.73 \n", 812 | "5 101 18.75 118 9.18 6 1.70 \n", 813 | "6 108 29.62 118 9.57 7 2.03 \n", 814 | "7 94 8.76 96 9.53 6 1.92 \n", 815 | "8 80 29.89 90 9.71 4 2.35 \n", 816 | "9 111 18.87 97 14.69 5 3.02 " 817 | ] 818 | }, 819 | "metadata": {}, 820 | "output_type": "display_data" 821 | }, 822 | { 823 | "name": "stdout", 824 | "output_type": "stream", 825 | "text": [ 826 | "(3333, 18)\n" 827 | ] 828 | } 829 | ], 830 | "source": [ 831 | "df = df.drop(['State', 'Area_Code', 'Phone'], axis=1)\n", 832 | "\n", 833 | "df['Churn'] = df['Churn'].replace(({'no':0, 'yes':1}))\n", 834 | "df['Intl_Plan'] = df['Intl_Plan'].replace(({'no':0, 'yes':1}))\n", 835 | "df['Vmail_Plan'] = df['Vmail_Plan'].replace(({'no':0, 'yes':1}))\n", 836 | "\n", 837 | "display(df.head(10))\n", 838 | 
"print(df.shape)" 839 | ] 840 | }, 841 | { 842 | "cell_type": "code", 843 | "execution_count": 49, 844 | "metadata": {}, 845 | "outputs": [ 846 | { 847 | "name": "stdout", 848 | "output_type": "stream", 849 | "text": [ 850 | "\n", 851 | "RangeIndex: 3333 entries, 0 to 3332\n", 852 | "Data columns (total 18 columns):\n", 853 | " # Column Non-Null Count Dtype \n", 854 | "--- ------ -------------- ----- \n", 855 | " 0 Account_Length 3333 non-null int64 \n", 856 | " 1 Vmail_Message 3333 non-null int64 \n", 857 | " 2 Day_Mins 3333 non-null float64\n", 858 | " 3 Eve_Mins 3333 non-null float64\n", 859 | " 4 Night_Mins 3333 non-null float64\n", 860 | " 5 Intl_Mins 3333 non-null float64\n", 861 | " 6 CustServ_Calls 3333 non-null int64 \n", 862 | " 7 Churn 3333 non-null int64 \n", 863 | " 8 Intl_Plan 3333 non-null int64 \n", 864 | " 9 Vmail_Plan 3333 non-null int64 \n", 865 | " 10 Day_Calls 3333 non-null int64 \n", 866 | " 11 Day_Charge 3333 non-null float64\n", 867 | " 12 Eve_Calls 3333 non-null int64 \n", 868 | " 13 Eve_Charge 3333 non-null float64\n", 869 | " 14 Night_Calls 3333 non-null int64 \n", 870 | " 15 Night_Charge 3333 non-null float64\n", 871 | " 16 Intl_Calls 3333 non-null int64 \n", 872 | " 17 Intl_Charge 3333 non-null float64\n", 873 | "dtypes: float64(8), int64(10)\n", 874 | "memory usage: 468.8 KB\n" 875 | ] 876 | } 877 | ], 878 | "source": [ 879 | "df.info()" 880 | ] 881 | }, 882 | { 883 | "cell_type": "markdown", 884 | "metadata": {}, 885 | "source": [ 886 | "#### Target variable " 887 | ] 888 | }, 889 | { 890 | "cell_type": "code", 891 | "execution_count": 50, 892 | "metadata": {}, 893 | "outputs": [ 894 | { 895 | "data": { 896 | "text/plain": [ 897 | "0 2850\n", 898 | "1 483\n", 899 | "Name: Churn, dtype: int64" 900 | ] 901 | }, 902 | "execution_count": 50, 903 | "metadata": {}, 904 | "output_type": "execute_result" 905 | } 906 | ], 907 | "source": [ 908 | "df['Churn'].value_counts()" 909 | ] 910 | }, 911 | { 912 | "cell_type": "markdown", 913 | 
"metadata": {}, 914 | "source": [ 915 | "#### Feature and target variable separation" 916 | ] 917 | }, 918 | { 919 | "cell_type": "code", 920 | "execution_count": 51, 921 | "metadata": {}, 922 | "outputs": [ 923 | { 924 | "name": "stdout", 925 | "output_type": "stream", 926 | "text": [ 927 | "(3333, 17) (3333, 1)\n" 928 | ] 929 | } 930 | ], 931 | "source": [ 932 | "X = df.drop('Churn', axis=1)\n", 933 | "y = df[['Churn']]\n", 934 | "\n", 935 | "print(X.shape, y.shape)" 936 | ] 937 | }, 938 | { 939 | "cell_type": "markdown", 940 | "metadata": {}, 941 | "source": [ 942 | "#### Train test separation" 943 | ] 944 | }, 945 | { 946 | "cell_type": "code", 947 | "execution_count": 52, 948 | "metadata": {}, 949 | "outputs": [ 950 | { 951 | "name": "stdout", 952 | "output_type": "stream", 953 | "text": [ 954 | "(2333, 17) (1000, 17) (2333, 1) (1000, 1)\n" 955 | ] 956 | } 957 | ], 958 | "source": [ 959 | "X_train, X_test, y_train, y_test = train_test_split(X, \n", 960 | " y, \n", 961 | " test_size=0.3, \n", 962 | " random_state=42, \n", 963 | " stratify=y)\n", 964 | "\n", 965 | "print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)" 966 | ] 967 | }, 968 | { 969 | "cell_type": "code", 970 | "execution_count": 66, 971 | "metadata": {}, 972 | "outputs": [ 973 | { 974 | "data": { 975 | "text/plain": [ 976 | "0 855\n", 977 | "1 145\n", 978 | "Name: Churn, dtype: int64" 979 | ] 980 | }, 981 | "execution_count": 66, 982 | "metadata": {}, 983 | "output_type": "execute_result" 984 | } 985 | ], 986 | "source": [ 987 | "y_test[\"Churn\"].value_counts()" 988 | ] 989 | }, 990 | { 991 | "cell_type": "markdown", 992 | "metadata": {}, 993 | "source": [ 994 | "#### Training: Logistic Regression" 995 | ] 996 | }, 997 | { 998 | "cell_type": "code", 999 | "execution_count": 53, 1000 | "metadata": {}, 1001 | "outputs": [], 1002 | "source": [ 1003 | "model = LogisticRegression()\n", 1004 | "model = model.fit(X_train, y_train)" 1005 | ] 1006 | }, 1007 | { 1008 | "cell_type": "markdown", 
1009 | "metadata": {}, 1010 | "source": [ 1011 | "#### Prediction" 1012 | ] 1013 | }, 1014 | { 1015 | "cell_type": "code", 1016 | "execution_count": 54, 1017 | "metadata": {}, 1018 | "outputs": [], 1019 | "source": [ 1020 | "y_pred = model.predict(X_test)\n", 1021 | "# print(y_pred)" 1022 | ] 1023 | }, 1024 | { 1025 | "cell_type": "code", 1026 | "execution_count": 55, 1027 | "metadata": {}, 1028 | "outputs": [], 1029 | "source": [ 1030 | "# model.predict_proba(X_test)" 1031 | ] 1032 | }, 1033 | { 1034 | "cell_type": "markdown", 1035 | "metadata": {}, 1036 | "source": [ 1037 | "#### Prediction Comparison with the test data" 1038 | ] 1039 | }, 1040 | { 1041 | "cell_type": "code", 1042 | "execution_count": 56, 1043 | "metadata": {}, 1044 | "outputs": [ 1045 | { 1046 | "data": { 1047 | "text/html": [ 1048 | "
<table>
<thead>
<tr><th></th><th>Churn</th><th>Churn_probability</th><th>Churn_predicted</th></tr>
</thead>
<tbody>
<tr><th>68</th><td>0</td><td>0.35</td><td>0</td></tr>
<tr><th>1653</th><td>1</td><td>0.24</td><td>0</td></tr>
<tr><th>1716</th><td>0</td><td>0.12</td><td>0</td></tr>
<tr><th>3251</th><td>0</td><td>0.16</td><td>0</td></tr>
<tr><th>2406</th><td>0</td><td>0.10</td><td>0</td></tr>
</tbody>
</table>
\n", 1104 | "
" 1105 | ], 1106 | "text/plain": [ 1107 | " Churn Churn_probability Churn_predicted\n", 1108 | "68 0 0.35 0\n", 1109 | "1653 1 0.24 0\n", 1110 | "1716 0 0.12 0\n", 1111 | "3251 0 0.16 0\n", 1112 | "2406 0 0.10 0" 1113 | ] 1114 | }, 1115 | "metadata": {}, 1116 | "output_type": "display_data" 1117 | } 1118 | ], 1119 | "source": [ 1120 | "y_test[\"Churn_probability\"] = np.round(model.predict_proba(X_test)[:,1], 2)\n", 1121 | "y_test[\"Churn_predicted\"] = y_pred\n", 1122 | "\n", 1123 | "display(y_test.head())" 1124 | ] 1125 | }, 1126 | { 1127 | "cell_type": "code", 1128 | "execution_count": 57, 1129 | "metadata": {}, 1130 | "outputs": [ 1131 | { 1132 | "name": "stdout", 1133 | "output_type": "stream", 1134 | "text": [ 1135 | "Shape of X_test = (1000, 17)\n", 1136 | "Shape of y_test = (1000, 3)\n" 1137 | ] 1138 | } 1139 | ], 1140 | "source": [ 1141 | "print(\"Shape of X_test = \", X_test.shape)\n", 1142 | "print(\"Shape of y_test = \", y_test.shape)" 1143 | ] 1144 | }, 1145 | { 1146 | "cell_type": "code", 1147 | "execution_count": 59, 1148 | "metadata": {}, 1149 | "outputs": [ 1150 | { 1151 | "name": "stdout", 1152 | "output_type": "stream", 1153 | "text": [ 1154 | "(1000, 20)\n" 1155 | ] 1156 | }, 1157 | { 1158 | "data": { 1159 | "text/html": [ 1160 | "
<table>
<thead>
<tr><th></th><th>Account_Length</th><th>Vmail_Message</th><th>Day_Mins</th><th>Eve_Mins</th><th>Night_Mins</th><th>Intl_Mins</th><th>CustServ_Calls</th><th>Intl_Plan</th><th>Vmail_Plan</th><th>Day_Calls</th><th>Day_Charge</th><th>Eve_Calls</th><th>Eve_Charge</th><th>Night_Calls</th><th>Night_Charge</th><th>Intl_Calls</th><th>Intl_Charge</th><th>Churn</th><th>Churn_probability</th><th>Churn_predicted</th></tr>
</thead>
<tbody>
<tr><th>68</th><td>126</td><td>0</td><td>211.6</td><td>216.9</td><td>153.5</td><td>7.8</td><td>1</td><td>0</td><td>0</td><td>70</td><td>35.97</td><td>80</td><td>18.44</td><td>60</td><td>6.91</td><td>1</td><td>2.11</td><td>0</td><td>0.35</td><td>0</td></tr>
<tr><th>1653</th><td>93</td><td>0</td><td>131.4</td><td>219.7</td><td>155.7</td><td>11.1</td><td>1</td><td>1</td><td>0</td><td>78</td><td>22.34</td><td>106</td><td>18.67</td><td>103</td><td>7.01</td><td>2</td><td>3.00</td><td>1</td><td>0.24</td><td>0</td></tr>
<tr><th>1716</th><td>36</td><td>25</td><td>152.8</td><td>242.8</td><td>147.4</td><td>9.1</td><td>1</td><td>0</td><td>1</td><td>110</td><td>25.98</td><td>67</td><td>20.64</td><td>74</td><td>6.63</td><td>2</td><td>2.46</td><td>0</td><td>0.12</td><td>0</td></tr>
<tr><th>3251</th><td>88</td><td>0</td><td>274.6</td><td>161.1</td><td>194.4</td><td>9.2</td><td>2</td><td>0</td><td>0</td><td>105</td><td>46.68</td><td>121</td><td>13.69</td><td>123</td><td>8.75</td><td>4</td><td>2.48</td><td>0</td><td>0.16</td><td>0</td></tr>
<tr><th>2406</th><td>81</td><td>0</td><td>145.6</td><td>287.9</td><td>181.7</td><td>9.2</td><td>2</td><td>0</td><td>0</td><td>59</td><td>24.75</td><td>131</td><td>24.47</td><td>121</td><td>8.18</td><td>4</td><td>2.48</td><td>0</td><td>0.10</td><td>0</td></tr>
<tr><th>793</th><td>45</td><td>0</td><td>112.8</td><td>218.8</td><td>240.2</td><td>9.0</td><td>2</td><td>0</td><td>0</td><td>108</td><td>19.18</td><td>120</td><td>18.60</td><td>106</td><td>10.81</td><td>3</td><td>2.43</td><td>0</td><td>0.06</td><td>0</td></tr>
<tr><th>1029</th><td>116</td><td>0</td><td>201.8</td><td>231.5</td><td>226.1</td><td>16.5</td><td>0</td><td>0</td><td>0</td><td>82</td><td>34.31</td><td>95</td><td>19.68</td><td>130</td><td>10.17</td><td>5</td><td>4.46</td><td>0</td><td>0.08</td><td>0</td></tr>
<tr><th>1943</th><td>125</td><td>0</td><td>168.6</td><td>175.6</td><td>243.3</td><td>10.9</td><td>0</td><td>0</td><td>0</td><td>99</td><td>28.66</td><td>107</td><td>14.93</td><td>92</td><td>10.95</td><td>7</td><td>2.94</td><td>0</td><td>0.04</td><td>0</td></tr>
<tr><th>1161</th><td>40</td><td>0</td><td>170.7</td><td>179.1</td><td>281.9</td><td>8.2</td><td>3</td><td>1</td><td>0</td><td>55</td><td>29.02</td><td>108</td><td>15.22</td><td>89</td><td>12.69</td><td>9</td><td>2.21</td><td>0</td><td>0.41</td><td>0</td></tr>
<tr><th>2837</th><td>75</td><td>0</td><td>203.3</td><td>228.9</td><td>222.2</td><td>14.3</td><td>1</td><td>0</td><td>0</td><td>70</td><td>34.56</td><td>97</td><td>19.46</td><td>118</td><td>10.00</td><td>3</td><td>3.86</td><td>0</td><td>0.19</td><td>0</td></tr>
</tbody>
</table>
\n", 1433 | "
" 1434 | ], 1435 | "text/plain": [ 1436 | " Account_Length Vmail_Message Day_Mins Eve_Mins Night_Mins \\\n", 1437 | "68 126 0 211.6 216.9 153.5 \n", 1438 | "1653 93 0 131.4 219.7 155.7 \n", 1439 | "1716 36 25 152.8 242.8 147.4 \n", 1440 | "3251 88 0 274.6 161.1 194.4 \n", 1441 | "2406 81 0 145.6 287.9 181.7 \n", 1442 | "793 45 0 112.8 218.8 240.2 \n", 1443 | "1029 116 0 201.8 231.5 226.1 \n", 1444 | "1943 125 0 168.6 175.6 243.3 \n", 1445 | "1161 40 0 170.7 179.1 281.9 \n", 1446 | "2837 75 0 203.3 228.9 222.2 \n", 1447 | "\n", 1448 | " Intl_Mins CustServ_Calls Intl_Plan Vmail_Plan Day_Calls Day_Charge \\\n", 1449 | "68 7.8 1 0 0 70 35.97 \n", 1450 | "1653 11.1 1 1 0 78 22.34 \n", 1451 | "1716 9.1 1 0 1 110 25.98 \n", 1452 | "3251 9.2 2 0 0 105 46.68 \n", 1453 | "2406 9.2 2 0 0 59 24.75 \n", 1454 | "793 9.0 2 0 0 108 19.18 \n", 1455 | "1029 16.5 0 0 0 82 34.31 \n", 1456 | "1943 10.9 0 0 0 99 28.66 \n", 1457 | "1161 8.2 3 1 0 55 29.02 \n", 1458 | "2837 14.3 1 0 0 70 34.56 \n", 1459 | "\n", 1460 | " Eve_Calls Eve_Charge Night_Calls Night_Charge Intl_Calls \\\n", 1461 | "68 80 18.44 60 6.91 1 \n", 1462 | "1653 106 18.67 103 7.01 2 \n", 1463 | "1716 67 20.64 74 6.63 2 \n", 1464 | "3251 121 13.69 123 8.75 4 \n", 1465 | "2406 131 24.47 121 8.18 4 \n", 1466 | "793 120 18.60 106 10.81 3 \n", 1467 | "1029 95 19.68 130 10.17 5 \n", 1468 | "1943 107 14.93 92 10.95 7 \n", 1469 | "1161 108 15.22 89 12.69 9 \n", 1470 | "2837 97 19.46 118 10.00 3 \n", 1471 | "\n", 1472 | " Intl_Charge Churn Churn_probability Churn_predicted \n", 1473 | "68 2.11 0 0.35 0 \n", 1474 | "1653 3.00 1 0.24 0 \n", 1475 | "1716 2.46 0 0.12 0 \n", 1476 | "3251 2.48 0 0.16 0 \n", 1477 | "2406 2.48 0 0.10 0 \n", 1478 | "793 2.43 0 0.06 0 \n", 1479 | "1029 4.46 0 0.08 0 \n", 1480 | "1943 2.94 0 0.04 0 \n", 1481 | "1161 2.21 0 0.41 0 \n", 1482 | "2837 3.86 0 0.19 0 " 1483 | ] 1484 | }, 1485 | "metadata": {}, 1486 | "output_type": "display_data" 1487 | } 1488 | ], 1489 | "source": [ 1490 | "test = 
pd.concat([X_test, y_test], axis=1)\n", 1491 | "\n", 1492 | "print(test.shape)\n", 1493 | "display(test.head(10))" 1494 | ] 1495 | }, 1496 | { 1497 | "cell_type": "markdown", 1498 | "metadata": {}, 1499 | "source": [ 1500 | "#### Confusion Matrix" 1501 | ] 1502 | }, 1503 | { 1504 | "cell_type": "code", 1505 | "execution_count": 62, 1506 | "metadata": {}, 1507 | "outputs": [ 1508 | { 1509 | "data": { 1510 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXYAAAEGCAYAAABxfL6kAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8/fFQqAAAACXBIWXMAAAsTAAALEwEAmpwYAAAkE0lEQVR4nO3de7xd853/8df75E7ikhKCVIIEcYuKqBoq+BHaCp0iWm2oqdKU0VZdRluGZqQdrSnGtKGtKBVxDzUuTWm0LrmVkBBScckkcnMNEkl8fn+s72E7PWfvdU72yb7k/exjPfZaa6/1Xd99dn3y3d/1XZ+vIgIzM6sfDZWugJmZlZcDu5lZnXFgNzOrMw7sZmZ1xoHdzKzOdKx0BdZ36tgt1LlHpathrbDnzp+sdBWslWbMmL40IjZv6/kdNto2YvV7uY6N95bcFxHD2nqtcnBgrzB17kGXHY+tdDWsFf76+JWVroK1UrdOemltzo/VK+iy04hcx6742xWbrc21ysGB3cysFAFSpWuRmwO7mVkeqp1bkg7sZmZ5uMVuZlZPBA0dKl2J3Grnt4WZWaWIrCsmz5KnOOk7kmZJelrSjZK6Suop6QFJz6fXTQuOP0/SXElzJB1WqnwHdjOzkpR1xeRZSpUkbQ2cAQyOiF2BDsAI4FxgUkT0ByalbSQNTO/vAgwDrpJU9OeDA7uZWR5lbLGTdYN3k9QR2ABYAAwHxqX3xwFHpfXhwPiIWBkR84C5wJBihTuwm5nlkb/FvpmkaQXLKYXFRMT/AZcCLwMLgTcj4n5gi4hYmI5ZCPRKp2wNvFJQxPy0r0W+eWpmVpJa0xpfGhGDWywp6zsfDvQD3gBulnRC8Yv/g6ITaTiwm5mVIso5KuYQYF5ELAGQdBvwGWCRpN4RsVBSb2BxOn4+0Kfg/G3Ium5a5K4YM7OSVM4+9peBT0vaQJKAg4FngInAyHTMSODOtD4RGCGpi6R+QH9gSrELuMVuZpZHQ3keUIqIxyXdAswAVgN/A8YC3YEJkk4mC/7HpONnSZoAzE7Hj4qINcWu4cBuZlZK4zj2MomIC4ALmuxeSdZ6b+740cDovOU7sJuZ5eGUAmZm9aS2Ugo4sJuZ5eHsjmZmdSRnuoBq4cBuZpaHW+xmZnXGLXYzs3rSqpQCFefAbmZWSnlTCrQ7B3Yzs5LcYjczqz/uYzczqzNusZuZ1Rm32M3M6ojcx25mVnfU4MBuZlY3BMhdMWZmdUQ0P/NolXJgNzMrSW6xm5nVm1oK7LVzN8DMrIIaGhpyLaVI2lHSEwXLW5LOlNRT0gOSnk+vmxacc56kuZLmSDqsZF3X8rOamdU/tWIpISLmRMSgiBgE7AW8C9wOnAtMioj+wKS0jaSBwAhgF2AYcJWkoolrHNjNzEpQ6mPPs7TSwcDfI+IlYDgwLu0fBxyV1ocD4yNiZUTMA+YCQ4oV6j52M7McWhG0N5M0rWB7bESMbeHYEcCNaX2LiFgIEBELJfVK+7cGHis4Z37a1yIHdjOzHFoR2JdGxOAc5XUGjgTOK3VoM/ui2AnuijEzy6EdumIOB2ZExKK0vUhS
73St3sDitH8+0KfgvG2ABcUKdmA3MytFoAblWlrheD7qhgGYCIxM6yOBOwv2j5DURVI/oD8wpVjB7ooxMytBZX5ASdIGwP8DvlmwewwwQdLJwMvAMQARMUvSBGA2sBoYFRFripXvwG5mlkM5A3tEvAt8osm+ZWSjZJo7fjQwOm/5DuxmZnnUzoOnDuxmZiWptlIKOLCbmeXgwG5mVkeEcuWBqRYO7GZmedROg92B3cysJPexm5nVHwd2M7M648BuZlZnWpkuoKIc2K1NTjt+KF896jMQwey5Cxh10fV87+uHccQBu/NBBEtee5tR/349ry59kwOH7MQF3z6Szp068v6q1fzo8jt4eNpzlf4I67X5r77OaRdex+Jlb9EgMfLo/Tj1+KGM/p+7uWfyTBokNu/Zg/++4AR6b75JpatbcW3MtV4xiiia/bH9LizdA3w5It6QtDwiukvqC9wdEbuWofwzyfIgv5vz+BOBwRHx7bW9dms0bNAruux47Lq85FrrvfnG/O/V3+HTx41mxcpV/OY/vs4Dj8zi7gef5O13VgBwynGfZad+vfnumPHsNmAblrz2Nq8ufZOdt+/NLZePYpfP/aDCn6LtXp96ZaWrsNZeXfomi5a+xR479eHtd1Yw9Gs/4fr/PIWtem3CRt27AfCr8Q/x7LyFXHbe8RWu7drr1knT86TSbUmXLfpH7xGX5Tr2pcu/sFbXKoeKDcyMiCMi4o12vMSZwAbtWP7HSFqvfv107NiBrl060aFDAxt07cyrS978MKgDbNitC42Nhqeem8+rS98E4Jm/L6Rr50507rRe/bmqzpabbcweO2WZYHts2JUBfbdk4ZI3PgzqAO+8t7KmWqntrZ1mUGoX7fJfl6SzgRURcbmky4A9IuIgSQcDJ0XECZJeJGshL81R3oHAhcBSYFdgOnBCREQq89L0WaYCp5FlTNsKeFDS0ogY2qS8vYFfABsCK/ko8c5Wku4Ftgduj4iz0/HLI6J7Wv8S8PmIOFHStcBrwJ7ADEmfAN4CBgNbAmdHxC2t++tVv4VL3uSK6yfx1F0Xs2Ll+zz4+LM8+PizAPzgtC8w4nNDeGv5e3zh1Mv/4dwjDxrEzOde4f1Vq9d1ta0FLy9Yxsw589lrl74AXHzVRMb/YQobde/GXb88o7KVqybVEbNzaa8W+2Rg/7Q+GOguqRPwT8DDbSxzT7JW+EBgO2A/SV2Ba4HjImI3suB+WkRcTpaIfmgzQb0zcBPwrxGxB3AI8F56exBwHLAbcJykwuT2LRkAHBIR30vbvck+5+fJ0nD+A0mnSJomaVqsfq+5Q6raxj26ccQBuzFo+AXsfPj5bNC1M8cevjcAP/6fu9j18z/k5nun8Y1jD/jYeTtttyUXnj6c7/zH+EpU25qx/N2VfO2ca7jku//8YWv9h986kll/+DHHDBvM1RMmV7iG1aOWWuztFdinA3tJ6kHWIn6ULMDvT9sD+5SImB8RHwBPAH2BHYF5EdF4J24ccEDzp39oR2BhREwFiIi3IqKx+TgpIt6MiBVkuY+3zVGvm5vkRr4jIj6IiNnAFs2dEBFjI2JwRAxWx27NHVLVDhyyEy8tWMayN5azes0H3PXgkwzZvd/Hjrnl3qkcedCgD7e36rUJv/vpKZx2we948f9K/kizdWDV6jWMPOdqjhk2mC8UfFeNvjRsbyb+6Yl1Xq9qJEFDg3It1aBdAntErAJeBE4CHiEL5kPJujieaWOxKwvW15C1ztvyVxQtzxfY3DVocnzXJue8U6SM6viWy2z+q68xeLd+dOvSCYDP7r0jc+YtYrs+m394zLADdue5F7MZvzbq3o2bLjuVi/57Io/PfKEidbaPiwhOv/gGBvTdklFf+SgF+N9fXvzh+r2TZzKgb7Ntk/VQvtZ6tbTY2/MO1mTgLODrwFPAz4HpUd5hOM8CfSXtEBFzga8Cf07vvQ30IOuXb3rOVpL2joip6VdFqf6QRZJ2BuYAR6ey11vTZ73E
\n", 1511 | "text/plain": [ 1512 | "
" 1513 | ] 1514 | }, 1515 | "metadata": { 1516 | "needs_background": "light" 1517 | }, 1518 | "output_type": "display_data" 1519 | } 1520 | ], 1521 | "source": [ 1522 | "plot_confusion_matrix(model,\n", 1523 | " X_test,\n", 1524 | " y_test[\"Churn\"],\n", 1525 | " display_labels = [\"will not churn\", \"will churn\"],\n", 1526 | " cmap = \"Blues\")\n", 1527 | "\n", 1528 | "plt.show()" 1529 | ] 1530 | }, 1531 | { 1532 | "cell_type": "code", 1533 | "execution_count": 63, 1534 | "metadata": {}, 1535 | "outputs": [ 1536 | { 1537 | "name": "stdout", 1538 | "output_type": "stream", 1539 | "text": [ 1540 | "[[832 23]\n", 1541 | " [121 24]]\n" 1542 | ] 1543 | } 1544 | ], 1545 | "source": [ 1546 | "confusion_matrix_ = confusion_matrix(y_test[\"Churn\"],\n", 1547 | " y_test[\"Churn_predicted\"])\n", 1548 | "\n", 1549 | "print(confusion_matrix_)" 1550 | ] 1551 | }, 1552 | { 1553 | "cell_type": "code", 1554 | "execution_count": 64, 1555 | "metadata": {}, 1556 | "outputs": [ 1557 | { 1558 | "name": "stdout", 1559 | "output_type": "stream", 1560 | "text": [ 1561 | "True Negatives = 832\n", 1562 | "False Negatives = 121\n", 1563 | "False Positives = 23\n", 1564 | "True Positives = 24\n" 1565 | ] 1566 | } 1567 | ], 1568 | "source": [ 1569 | "TN = confusion_matrix_[0,0]\n", 1570 | "print(\"True Negatives = \", TN)\n", 1571 | "\n", 1572 | "FN = confusion_matrix_[1,0]\n", 1573 | "print(\"False Negatives = \", FN)\n", 1574 | "\n", 1575 | "FP = confusion_matrix_[0,1]\n", 1576 | "print(\"False Positives = \", FP)\n", 1577 | "\n", 1578 | "TP = confusion_matrix_[1,1]\n", 1579 | "print(\"True Positives = \", TP)" 1580 | ] 1581 | }, 1582 | { 1583 | "cell_type": "markdown", 1584 | "metadata": {}, 1585 | "source": [ 1586 | "#### Accuracy" 1587 | ] 1588 | }, 1589 | { 1590 | "cell_type": "code", 1591 | "execution_count": 65, 1592 | "metadata": {}, 1593 | "outputs": [ 1594 | { 1595 | "name": "stdout", 1596 | "output_type": "stream", 1597 | "text": [ 1598 | "0.856\n" 1599 | ] 1600 | } 1601 | ], 
1602 | "source": [ 1603 | "accuracy = (TN+TP)/(TN+TP+FP+FN)\n", 1604 | "print(accuracy)" 1605 | ] 1606 | }, 1607 | { 1608 | "cell_type": "markdown", 1609 | "metadata": {}, 1610 | "source": [ 1611 | "#### Precision, Recall/Sensitivity/True Positive Rate, F1 score\n", 1612 | "\n", 1613 | "**precision:**\n", 1614 | "Precision can be seen as a measure of a classifier’s exactness. For each class, it is defined as the ratio of true positives to the sum of true positives and false positives. Said another way, “for all instances classified positive, what percent was correct?”\n", 1615 | "\n", 1616 | "**recall:**\n", 1617 | "Recall is a measure of the classifier’s completeness: the ability of a classifier to find all positive instances. For each class, it is defined as the ratio of true positives to the sum of true positives and false negatives. Said another way, “for all instances that were actually positive, what percent was classified correctly?”\n", 1618 | "\n", 1619 | "**f1 score:**\n", 1620 | "The F1 score is the harmonic mean of precision and recall, F1 = 2 × precision × recall / (precision + recall), such that the best score is 1.0 and the worst is 0.0. On imbalanced data, the F1 score of the minority class is often much lower than accuracy, because accuracy is dominated by the majority class while F1 ignores true negatives. As a rule of thumb, compare classifier models on F1 (for example, the weighted average of F1 across classes) rather than on global accuracy alone.\n", 1621 | "\n", 1622 | "**support:**\n", 1623 | "Support is the number of actual occurrences of the class in the specified dataset. Imbalanced support in the training data may indicate structural weaknesses in the reported scores of the classifier and could indicate the need for stratified sampling or rebalancing. Support doesn’t change between models but instead diagnoses the evaluation process." 
1624 | ] 1625 | }, 1626 | { 1627 | "cell_type": "code", 1628 | "execution_count": 73, 1629 | "metadata": {}, 1630 | "outputs": [ 1631 | { 1632 | "name": "stdout", 1633 | "output_type": "stream", 1634 | "text": [ 1635 | "Calculated __________________________\n", 1636 | "Precision = 0.51\n", 1637 | "Recall = 0.17\n", 1638 | "f1_score = 0.25\n", 1639 | "\n", 1640 | "Scikit-Learn ________________________\n", 1641 | "Precision = 0.51\n", 1642 | "Recall = 0.17\n", 1643 | "f1_score = 0.25\n" 1644 | ] 1645 | } 1646 | ], 1647 | "source": [ 1648 | "print(\"Calculated __________________________\")\n", 1649 | "precision = np.round(TP/(TP+FP), 2) #predicted class 1\n", 1650 | "recall = np.round(TP/(TP+FN), 2) #actual class 1\n", 1651 | "f1_score_ = np.round(2*TP/(2*TP+FP+FN), 2) #same as 2*P*R/(P+R) on the unrounded precision and recall\n", 1652 | "\n", 1653 | "print(\"Precision = \", precision)\n", 1654 | "print(\"Recall = \", recall)\n", 1655 | "print(\"f1_score = \", f1_score_)\n", 1656 | "\n", 1657 | "print(\"\\nScikit-Learn ________________________\")\n", 1658 | "precision_ = np.round(precision_score(y_test['Churn'], \n", 1659 | " y_test['Churn_predicted']), 2)\n", 1660 | "recall_ = np.round(recall_score(y_test['Churn'], \n", 1661 | " y_test['Churn_predicted']), 2)\n", 1662 | "\n", 1663 | "f1_score__ = np.round(f1_score(y_test['Churn'], \n", 1664 | " y_test['Churn_predicted']), 2)\n", 1665 | "\n", 1666 | "print(\"Precision = \", precision_)\n", 1667 | "print(\"Recall = \", recall_)\n", 1668 | "print(\"f1_score = \", f1_score__)" 1669 | ] 1670 | }, 1671 | { 1672 | "cell_type": "markdown", 1673 | "metadata": {}, 1674 | "source": [ 1675 | "#### Classification report" 1676 | ] 1677 | }, 1678 | { 1679 | "cell_type": "code", 1680 | "execution_count": 69, 1681 | "metadata": {}, 1682 | "outputs": [ 1683 | { 1684 | "name": "stdout", 1685 | "output_type": "stream", 1686 | "text": [ 1687 | " precision recall f1-score support\n", 1688 | "\n", 1689 | " 0 0.87 0.97 0.92 855\n", 1690 | " 1 0.51 0.17 0.25 145\n", 1691 | 
"\n", 1692 | " accuracy 0.86 1000\n", 1693 | " macro avg 0.69 0.57 0.59 1000\n", 1694 | "weighted avg 0.82 0.86 0.82 1000\n", 1695 | "\n" 1696 | ] 1697 | } 1698 | ], 1699 | "source": [ 1700 | "classification_report_ = classification_report(y_test[\"Churn\"],\n", 1701 | " y_test[\"Churn_predicted\"])\n", 1702 | "print(classification_report_)" 1703 | ] 1704 | }, 1705 | { 1706 | "cell_type": "code", 1707 | "execution_count": null, 1708 | "metadata": {}, 1709 | "outputs": [], 1710 | "source": [] 1711 | } 1712 | ], 1713 | "metadata": { 1714 | "kernelspec": { 1715 | "display_name": "Python 3 (ipykernel)", 1716 | "language": "python", 1717 | "name": "python3" 1718 | }, 1719 | "language_info": { 1720 | "codemirror_mode": { 1721 | "name": "ipython", 1722 | "version": 3 1723 | }, 1724 | "file_extension": ".py", 1725 | "mimetype": "text/x-python", 1726 | "name": "python", 1727 | "nbconvert_exporter": "python", 1728 | "pygments_lexer": "ipython3", 1729 | "version": "3.8.5" 1730 | } 1731 | }, 1732 | "nbformat": 4, 1733 | "nbformat_minor": 4 1734 | } 1735 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | I taught Machine Learning to 250+ students through an online platform. These are the course materials I built for my students (mostly graduate-level students from non-CS backgrounds). 2 | 3 | Video Lectures: https://www.youtube.com/playlist?list=PLGjf1T0akmhN1B4YHB5PT42XmJwr9BpS6 4 | 5 | Quiz: https://kawsar34.medium.com/list/machine-learning-interview-quiz-057db29497a4 6 | 7 | #### Lecture 1: Difference between Supervised Learning & Unsupervised Learning 8 | - Defining Machine Learning: What can ML do? 9 | - How does ML work? 
10 | - Machine Learning terminologies 11 | - Supervised Learning, Unsupervised Learning, Deep Learning 12 | 13 | #### Lecture 2: Supervised Learning: Linear Regression 14 | - Supervised Learning: Linear Regression 15 | - train data, test data 16 | - Understanding the equation of a straight line 17 | - feature coefficient (slope, gradient, m) 18 | - bias coefficient (y-intercept, c) 19 | - domain: x-axis, independent variable 20 | - range: y-axis, dependent variable 21 | - loss function, cost function, objective function, error function 22 | - bias-variance tradeoff, overfitting, underfitting 23 | - ordinary least squares method 24 | - gradient descent method 25 | - residual, error, squared error, RMSE - Root Mean Squared Error 26 | 27 | #### Lecture 3: Supervised Learning: Linear Regression and Regression accuracy metrics 28 | - Supervised Learning: Linear Regression 29 | - Accuracy metrics for regression problems 30 | - Mean Absolute Error (MAE) 31 | - Mean Absolute Percentage Error (MAPE) 32 | - Mean Squared Error (MSE) 33 | - Root Mean Squared Error (RMSE) 34 | - R-squared or coefficient of determination 35 | - Prediction result evaluation 36 | 37 | #### Lecture 4: Supervised Learning - Classification: Logistic Regression 38 | - Supervised Learning - Classification: Logistic Regression 39 | 40 | #### Lecture 5: Supervised Learning - Classification: Accuracy Metrics 41 | - Supervised Learning - Classification: Logistic Regression 42 | - Confusion Matrix 43 | - Accuracy, Precision, Recall/Sensitivity/True Positive Rate, F1 score, False Positive Rate 44 | - ROC: Receiver Operating Characteristic and AUC: Area Under the Curve 45 | - Classification report 46 | 47 | #### Lecture 6: Supervised Learning - Decision Tree 48 | - Decision Tree Classification and Regression 49 | 50 | #### Lecture 7: Supervised Learning - Decision Tree, Cross-validation and Grid Search 51 | - Decision Tree Classification 52 | - Cross-Validation 53 | - Grid Search 54 | 55 | #### Lecture 8: 
Unsupervised Learning - K-means Clustering 56 | - Unsupervised Learning 57 | - K-means Clustering 58 | - Elbow method 59 | -------------------------------------------------------------------------------- /processed.cleveland.data: -------------------------------------------------------------------------------- 1 | 63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0 2 | 67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5,2.0,3.0,3.0,2 3 | 67.0,1.0,4.0,120.0,229.0,0.0,2.0,129.0,1.0,2.6,2.0,2.0,7.0,1 4 | 37.0,1.0,3.0,130.0,250.0,0.0,0.0,187.0,0.0,3.5,3.0,0.0,3.0,0 5 | 41.0,0.0,2.0,130.0,204.0,0.0,2.0,172.0,0.0,1.4,1.0,0.0,3.0,0 6 | 56.0,1.0,2.0,120.0,236.0,0.0,0.0,178.0,0.0,0.8,1.0,0.0,3.0,0 7 | 62.0,0.0,4.0,140.0,268.0,0.0,2.0,160.0,0.0,3.6,3.0,2.0,3.0,3 8 | 57.0,0.0,4.0,120.0,354.0,0.0,0.0,163.0,1.0,0.6,1.0,0.0,3.0,0 9 | 63.0,1.0,4.0,130.0,254.0,0.0,2.0,147.0,0.0,1.4,2.0,1.0,7.0,2 10 | 53.0,1.0,4.0,140.0,203.0,1.0,2.0,155.0,1.0,3.1,3.0,0.0,7.0,1 11 | 57.0,1.0,4.0,140.0,192.0,0.0,0.0,148.0,0.0,0.4,2.0,0.0,6.0,0 12 | 56.0,0.0,2.0,140.0,294.0,0.0,2.0,153.0,0.0,1.3,2.0,0.0,3.0,0 13 | 56.0,1.0,3.0,130.0,256.0,1.0,2.0,142.0,1.0,0.6,2.0,1.0,6.0,2 14 | 44.0,1.0,2.0,120.0,263.0,0.0,0.0,173.0,0.0,0.0,1.0,0.0,7.0,0 15 | 52.0,1.0,3.0,172.0,199.0,1.0,0.0,162.0,0.0,0.5,1.0,0.0,7.0,0 16 | 57.0,1.0,3.0,150.0,168.0,0.0,0.0,174.0,0.0,1.6,1.0,0.0,3.0,0 17 | 48.0,1.0,2.0,110.0,229.0,0.0,0.0,168.0,0.0,1.0,3.0,0.0,7.0,1 18 | 54.0,1.0,4.0,140.0,239.0,0.0,0.0,160.0,0.0,1.2,1.0,0.0,3.0,0 19 | 48.0,0.0,3.0,130.0,275.0,0.0,0.0,139.0,0.0,0.2,1.0,0.0,3.0,0 20 | 49.0,1.0,2.0,130.0,266.0,0.0,0.0,171.0,0.0,0.6,1.0,0.0,3.0,0 21 | 64.0,1.0,1.0,110.0,211.0,0.0,2.0,144.0,1.0,1.8,2.0,0.0,3.0,0 22 | 58.0,0.0,1.0,150.0,283.0,1.0,2.0,162.0,0.0,1.0,1.0,0.0,3.0,0 23 | 58.0,1.0,2.0,120.0,284.0,0.0,2.0,160.0,0.0,1.8,2.0,0.0,3.0,1 24 | 58.0,1.0,3.0,132.0,224.0,0.0,2.0,173.0,0.0,3.2,1.0,2.0,7.0,3 25 | 60.0,1.0,4.0,130.0,206.0,0.0,2.0,132.0,1.0,2.4,2.0,2.0,7.0,4 26 | 
50.0,0.0,3.0,120.0,219.0,0.0,0.0,158.0,0.0,1.6,2.0,0.0,3.0,0 27 | 58.0,0.0,3.0,120.0,340.0,0.0,0.0,172.0,0.0,0.0,1.0,0.0,3.0,0 28 | 66.0,0.0,1.0,150.0,226.0,0.0,0.0,114.0,0.0,2.6,3.0,0.0,3.0,0 29 | 43.0,1.0,4.0,150.0,247.0,0.0,0.0,171.0,0.0,1.5,1.0,0.0,3.0,0 30 | 40.0,1.0,4.0,110.0,167.0,0.0,2.0,114.0,1.0,2.0,2.0,0.0,7.0,3 31 | 69.0,0.0,1.0,140.0,239.0,0.0,0.0,151.0,0.0,1.8,1.0,2.0,3.0,0 32 | 60.0,1.0,4.0,117.0,230.0,1.0,0.0,160.0,1.0,1.4,1.0,2.0,7.0,2 33 | 64.0,1.0,3.0,140.0,335.0,0.0,0.0,158.0,0.0,0.0,1.0,0.0,3.0,1 34 | 59.0,1.0,4.0,135.0,234.0,0.0,0.0,161.0,0.0,0.5,2.0,0.0,7.0,0 35 | 44.0,1.0,3.0,130.0,233.0,0.0,0.0,179.0,1.0,0.4,1.0,0.0,3.0,0 36 | 42.0,1.0,4.0,140.0,226.0,0.0,0.0,178.0,0.0,0.0,1.0,0.0,3.0,0 37 | 43.0,1.0,4.0,120.0,177.0,0.0,2.0,120.0,1.0,2.5,2.0,0.0,7.0,3 38 | 57.0,1.0,4.0,150.0,276.0,0.0,2.0,112.0,1.0,0.6,2.0,1.0,6.0,1 39 | 55.0,1.0,4.0,132.0,353.0,0.0,0.0,132.0,1.0,1.2,2.0,1.0,7.0,3 40 | 61.0,1.0,3.0,150.0,243.0,1.0,0.0,137.0,1.0,1.0,2.0,0.0,3.0,0 41 | 65.0,0.0,4.0,150.0,225.0,0.0,2.0,114.0,0.0,1.0,2.0,3.0,7.0,4 42 | 40.0,1.0,1.0,140.0,199.0,0.0,0.0,178.0,1.0,1.4,1.0,0.0,7.0,0 43 | 71.0,0.0,2.0,160.0,302.0,0.0,0.0,162.0,0.0,0.4,1.0,2.0,3.0,0 44 | 59.0,1.0,3.0,150.0,212.0,1.0,0.0,157.0,0.0,1.6,1.0,0.0,3.0,0 45 | 61.0,0.0,4.0,130.0,330.0,0.0,2.0,169.0,0.0,0.0,1.0,0.0,3.0,1 46 | 58.0,1.0,3.0,112.0,230.0,0.0,2.0,165.0,0.0,2.5,2.0,1.0,7.0,4 47 | 51.0,1.0,3.0,110.0,175.0,0.0,0.0,123.0,0.0,0.6,1.0,0.0,3.0,0 48 | 50.0,1.0,4.0,150.0,243.0,0.0,2.0,128.0,0.0,2.6,2.0,0.0,7.0,4 49 | 65.0,0.0,3.0,140.0,417.0,1.0,2.0,157.0,0.0,0.8,1.0,1.0,3.0,0 50 | 53.0,1.0,3.0,130.0,197.0,1.0,2.0,152.0,0.0,1.2,3.0,0.0,3.0,0 51 | 41.0,0.0,2.0,105.0,198.0,0.0,0.0,168.0,0.0,0.0,1.0,1.0,3.0,0 52 | 65.0,1.0,4.0,120.0,177.0,0.0,0.0,140.0,0.0,0.4,1.0,0.0,7.0,0 53 | 44.0,1.0,4.0,112.0,290.0,0.0,2.0,153.0,0.0,0.0,1.0,1.0,3.0,2 54 | 44.0,1.0,2.0,130.0,219.0,0.0,2.0,188.0,0.0,0.0,1.0,0.0,3.0,0 55 | 60.0,1.0,4.0,130.0,253.0,0.0,0.0,144.0,1.0,1.4,1.0,1.0,7.0,1 56 | 
54.0,1.0,4.0,124.0,266.0,0.0,2.0,109.0,1.0,2.2,2.0,1.0,7.0,1 57 | 50.0,1.0,3.0,140.0,233.0,0.0,0.0,163.0,0.0,0.6,2.0,1.0,7.0,1 58 | 41.0,1.0,4.0,110.0,172.0,0.0,2.0,158.0,0.0,0.0,1.0,0.0,7.0,1 59 | 54.0,1.0,3.0,125.0,273.0,0.0,2.0,152.0,0.0,0.5,3.0,1.0,3.0,0 60 | 51.0,1.0,1.0,125.0,213.0,0.0,2.0,125.0,1.0,1.4,1.0,1.0,3.0,0 61 | 51.0,0.0,4.0,130.0,305.0,0.0,0.0,142.0,1.0,1.2,2.0,0.0,7.0,2 62 | 46.0,0.0,3.0,142.0,177.0,0.0,2.0,160.0,1.0,1.4,3.0,0.0,3.0,0 63 | 58.0,1.0,4.0,128.0,216.0,0.0,2.0,131.0,1.0,2.2,2.0,3.0,7.0,1 64 | 54.0,0.0,3.0,135.0,304.0,1.0,0.0,170.0,0.0,0.0,1.0,0.0,3.0,0 65 | 54.0,1.0,4.0,120.0,188.0,0.0,0.0,113.0,0.0,1.4,2.0,1.0,7.0,2 66 | 60.0,1.0,4.0,145.0,282.0,0.0,2.0,142.0,1.0,2.8,2.0,2.0,7.0,2 67 | 60.0,1.0,3.0,140.0,185.0,0.0,2.0,155.0,0.0,3.0,2.0,0.0,3.0,1 68 | 54.0,1.0,3.0,150.0,232.0,0.0,2.0,165.0,0.0,1.6,1.0,0.0,7.0,0 69 | 59.0,1.0,4.0,170.0,326.0,0.0,2.0,140.0,1.0,3.4,3.0,0.0,7.0,2 70 | 46.0,1.0,3.0,150.0,231.0,0.0,0.0,147.0,0.0,3.6,2.0,0.0,3.0,1 71 | 65.0,0.0,3.0,155.0,269.0,0.0,0.0,148.0,0.0,0.8,1.0,0.0,3.0,0 72 | 67.0,1.0,4.0,125.0,254.0,1.0,0.0,163.0,0.0,0.2,2.0,2.0,7.0,3 73 | 62.0,1.0,4.0,120.0,267.0,0.0,0.0,99.0,1.0,1.8,2.0,2.0,7.0,1 74 | 65.0,1.0,4.0,110.0,248.0,0.0,2.0,158.0,0.0,0.6,1.0,2.0,6.0,1 75 | 44.0,1.0,4.0,110.0,197.0,0.0,2.0,177.0,0.0,0.0,1.0,1.0,3.0,1 76 | 65.0,0.0,3.0,160.0,360.0,0.0,2.0,151.0,0.0,0.8,1.0,0.0,3.0,0 77 | 60.0,1.0,4.0,125.0,258.0,0.0,2.0,141.0,1.0,2.8,2.0,1.0,7.0,1 78 | 51.0,0.0,3.0,140.0,308.0,0.0,2.0,142.0,0.0,1.5,1.0,1.0,3.0,0 79 | 48.0,1.0,2.0,130.0,245.0,0.0,2.0,180.0,0.0,0.2,2.0,0.0,3.0,0 80 | 58.0,1.0,4.0,150.0,270.0,0.0,2.0,111.0,1.0,0.8,1.0,0.0,7.0,3 81 | 45.0,1.0,4.0,104.0,208.0,0.0,2.0,148.0,1.0,3.0,2.0,0.0,3.0,0 82 | 53.0,0.0,4.0,130.0,264.0,0.0,2.0,143.0,0.0,0.4,2.0,0.0,3.0,0 83 | 39.0,1.0,3.0,140.0,321.0,0.0,2.0,182.0,0.0,0.0,1.0,0.0,3.0,0 84 | 68.0,1.0,3.0,180.0,274.0,1.0,2.0,150.0,1.0,1.6,2.0,0.0,7.0,3 85 | 52.0,1.0,2.0,120.0,325.0,0.0,0.0,172.0,0.0,0.2,1.0,0.0,3.0,0 86 | 
44.0,1.0,3.0,140.0,235.0,0.0,2.0,180.0,0.0,0.0,1.0,0.0,3.0,0 87 | 47.0,1.0,3.0,138.0,257.0,0.0,2.0,156.0,0.0,0.0,1.0,0.0,3.0,0 88 | 53.0,0.0,3.0,128.0,216.0,0.0,2.0,115.0,0.0,0.0,1.0,0.0,?,0 89 | 53.0,0.0,4.0,138.0,234.0,0.0,2.0,160.0,0.0,0.0,1.0,0.0,3.0,0 90 | 51.0,0.0,3.0,130.0,256.0,0.0,2.0,149.0,0.0,0.5,1.0,0.0,3.0,0 91 | 66.0,1.0,4.0,120.0,302.0,0.0,2.0,151.0,0.0,0.4,2.0,0.0,3.0,0 92 | 62.0,0.0,4.0,160.0,164.0,0.0,2.0,145.0,0.0,6.2,3.0,3.0,7.0,3 93 | 62.0,1.0,3.0,130.0,231.0,0.0,0.0,146.0,0.0,1.8,2.0,3.0,7.0,0 94 | 44.0,0.0,3.0,108.0,141.0,0.0,0.0,175.0,0.0,0.6,2.0,0.0,3.0,0 95 | 63.0,0.0,3.0,135.0,252.0,0.0,2.0,172.0,0.0,0.0,1.0,0.0,3.0,0 96 | 52.0,1.0,4.0,128.0,255.0,0.0,0.0,161.0,1.0,0.0,1.0,1.0,7.0,1 97 | 59.0,1.0,4.0,110.0,239.0,0.0,2.0,142.0,1.0,1.2,2.0,1.0,7.0,2 98 | 60.0,0.0,4.0,150.0,258.0,0.0,2.0,157.0,0.0,2.6,2.0,2.0,7.0,3 99 | 52.0,1.0,2.0,134.0,201.0,0.0,0.0,158.0,0.0,0.8,1.0,1.0,3.0,0 100 | 48.0,1.0,4.0,122.0,222.0,0.0,2.0,186.0,0.0,0.0,1.0,0.0,3.0,0 101 | 45.0,1.0,4.0,115.0,260.0,0.0,2.0,185.0,0.0,0.0,1.0,0.0,3.0,0 102 | 34.0,1.0,1.0,118.0,182.0,0.0,2.0,174.0,0.0,0.0,1.0,0.0,3.0,0 103 | 57.0,0.0,4.0,128.0,303.0,0.0,2.0,159.0,0.0,0.0,1.0,1.0,3.0,0 104 | 71.0,0.0,3.0,110.0,265.0,1.0,2.0,130.0,0.0,0.0,1.0,1.0,3.0,0 105 | 49.0,1.0,3.0,120.0,188.0,0.0,0.0,139.0,0.0,2.0,2.0,3.0,7.0,3 106 | 54.0,1.0,2.0,108.0,309.0,0.0,0.0,156.0,0.0,0.0,1.0,0.0,7.0,0 107 | 59.0,1.0,4.0,140.0,177.0,0.0,0.0,162.0,1.0,0.0,1.0,1.0,7.0,2 108 | 57.0,1.0,3.0,128.0,229.0,0.0,2.0,150.0,0.0,0.4,2.0,1.0,7.0,1 109 | 61.0,1.0,4.0,120.0,260.0,0.0,0.0,140.0,1.0,3.6,2.0,1.0,7.0,2 110 | 39.0,1.0,4.0,118.0,219.0,0.0,0.0,140.0,0.0,1.2,2.0,0.0,7.0,3 111 | 61.0,0.0,4.0,145.0,307.0,0.0,2.0,146.0,1.0,1.0,2.0,0.0,7.0,1 112 | 56.0,1.0,4.0,125.0,249.0,1.0,2.0,144.0,1.0,1.2,2.0,1.0,3.0,1 113 | 52.0,1.0,1.0,118.0,186.0,0.0,2.0,190.0,0.0,0.0,2.0,0.0,6.0,0 114 | 43.0,0.0,4.0,132.0,341.0,1.0,2.0,136.0,1.0,3.0,2.0,0.0,7.0,2 115 | 62.0,0.0,3.0,130.0,263.0,0.0,0.0,97.0,0.0,1.2,2.0,1.0,7.0,2 116 | 
41.0,1.0,2.0,135.0,203.0,0.0,0.0,132.0,0.0,0.0,2.0,0.0,6.0,0 117 | 58.0,1.0,3.0,140.0,211.0,1.0,2.0,165.0,0.0,0.0,1.0,0.0,3.0,0 118 | 35.0,0.0,4.0,138.0,183.0,0.0,0.0,182.0,0.0,1.4,1.0,0.0,3.0,0 119 | 63.0,1.0,4.0,130.0,330.0,1.0,2.0,132.0,1.0,1.8,1.0,3.0,7.0,3 120 | 65.0,1.0,4.0,135.0,254.0,0.0,2.0,127.0,0.0,2.8,2.0,1.0,7.0,2 121 | 48.0,1.0,4.0,130.0,256.0,1.0,2.0,150.0,1.0,0.0,1.0,2.0,7.0,3 122 | 63.0,0.0,4.0,150.0,407.0,0.0,2.0,154.0,0.0,4.0,2.0,3.0,7.0,4 123 | 51.0,1.0,3.0,100.0,222.0,0.0,0.0,143.0,1.0,1.2,2.0,0.0,3.0,0 124 | 55.0,1.0,4.0,140.0,217.0,0.0,0.0,111.0,1.0,5.6,3.0,0.0,7.0,3 125 | 65.0,1.0,1.0,138.0,282.0,1.0,2.0,174.0,0.0,1.4,2.0,1.0,3.0,1 126 | 45.0,0.0,2.0,130.0,234.0,0.0,2.0,175.0,0.0,0.6,2.0,0.0,3.0,0 127 | 56.0,0.0,4.0,200.0,288.0,1.0,2.0,133.0,1.0,4.0,3.0,2.0,7.0,3 128 | 54.0,1.0,4.0,110.0,239.0,0.0,0.0,126.0,1.0,2.8,2.0,1.0,7.0,3 129 | 44.0,1.0,2.0,120.0,220.0,0.0,0.0,170.0,0.0,0.0,1.0,0.0,3.0,0 130 | 62.0,0.0,4.0,124.0,209.0,0.0,0.0,163.0,0.0,0.0,1.0,0.0,3.0,0 131 | 54.0,1.0,3.0,120.0,258.0,0.0,2.0,147.0,0.0,0.4,2.0,0.0,7.0,0 132 | 51.0,1.0,3.0,94.0,227.0,0.0,0.0,154.0,1.0,0.0,1.0,1.0,7.0,0 133 | 29.0,1.0,2.0,130.0,204.0,0.0,2.0,202.0,0.0,0.0,1.0,0.0,3.0,0 134 | 51.0,1.0,4.0,140.0,261.0,0.0,2.0,186.0,1.0,0.0,1.0,0.0,3.0,0 135 | 43.0,0.0,3.0,122.0,213.0,0.0,0.0,165.0,0.0,0.2,2.0,0.0,3.0,0 136 | 55.0,0.0,2.0,135.0,250.0,0.0,2.0,161.0,0.0,1.4,2.0,0.0,3.0,0 137 | 70.0,1.0,4.0,145.0,174.0,0.0,0.0,125.0,1.0,2.6,3.0,0.0,7.0,4 138 | 62.0,1.0,2.0,120.0,281.0,0.0,2.0,103.0,0.0,1.4,2.0,1.0,7.0,3 139 | 35.0,1.0,4.0,120.0,198.0,0.0,0.0,130.0,1.0,1.6,2.0,0.0,7.0,1 140 | 51.0,1.0,3.0,125.0,245.0,1.0,2.0,166.0,0.0,2.4,2.0,0.0,3.0,0 141 | 59.0,1.0,2.0,140.0,221.0,0.0,0.0,164.0,1.0,0.0,1.0,0.0,3.0,0 142 | 59.0,1.0,1.0,170.0,288.0,0.0,2.0,159.0,0.0,0.2,2.0,0.0,7.0,1 143 | 52.0,1.0,2.0,128.0,205.0,1.0,0.0,184.0,0.0,0.0,1.0,0.0,3.0,0 144 | 64.0,1.0,3.0,125.0,309.0,0.0,0.0,131.0,1.0,1.8,2.0,0.0,7.0,1 145 | 
58.0,1.0,3.0,105.0,240.0,0.0,2.0,154.0,1.0,0.6,2.0,0.0,7.0,0 146 | 47.0,1.0,3.0,108.0,243.0,0.0,0.0,152.0,0.0,0.0,1.0,0.0,3.0,1 147 | 57.0,1.0,4.0,165.0,289.0,1.0,2.0,124.0,0.0,1.0,2.0,3.0,7.0,4 148 | 41.0,1.0,3.0,112.0,250.0,0.0,0.0,179.0,0.0,0.0,1.0,0.0,3.0,0 149 | 45.0,1.0,2.0,128.0,308.0,0.0,2.0,170.0,0.0,0.0,1.0,0.0,3.0,0 150 | 60.0,0.0,3.0,102.0,318.0,0.0,0.0,160.0,0.0,0.0,1.0,1.0,3.0,0 151 | 52.0,1.0,1.0,152.0,298.0,1.0,0.0,178.0,0.0,1.2,2.0,0.0,7.0,0 152 | 42.0,0.0,4.0,102.0,265.0,0.0,2.0,122.0,0.0,0.6,2.0,0.0,3.0,0 153 | 67.0,0.0,3.0,115.0,564.0,0.0,2.0,160.0,0.0,1.6,2.0,0.0,7.0,0 154 | 55.0,1.0,4.0,160.0,289.0,0.0,2.0,145.0,1.0,0.8,2.0,1.0,7.0,4 155 | 64.0,1.0,4.0,120.0,246.0,0.0,2.0,96.0,1.0,2.2,3.0,1.0,3.0,3 156 | 70.0,1.0,4.0,130.0,322.0,0.0,2.0,109.0,0.0,2.4,2.0,3.0,3.0,1 157 | 51.0,1.0,4.0,140.0,299.0,0.0,0.0,173.0,1.0,1.6,1.0,0.0,7.0,1 158 | 58.0,1.0,4.0,125.0,300.0,0.0,2.0,171.0,0.0,0.0,1.0,2.0,7.0,1 159 | 60.0,1.0,4.0,140.0,293.0,0.0,2.0,170.0,0.0,1.2,2.0,2.0,7.0,2 160 | 68.0,1.0,3.0,118.0,277.0,0.0,0.0,151.0,0.0,1.0,1.0,1.0,7.0,0 161 | 46.0,1.0,2.0,101.0,197.0,1.0,0.0,156.0,0.0,0.0,1.0,0.0,7.0,0 162 | 77.0,1.0,4.0,125.0,304.0,0.0,2.0,162.0,1.0,0.0,1.0,3.0,3.0,4 163 | 54.0,0.0,3.0,110.0,214.0,0.0,0.0,158.0,0.0,1.6,2.0,0.0,3.0,0 164 | 58.0,0.0,4.0,100.0,248.0,0.0,2.0,122.0,0.0,1.0,2.0,0.0,3.0,0 165 | 48.0,1.0,3.0,124.0,255.0,1.0,0.0,175.0,0.0,0.0,1.0,2.0,3.0,0 166 | 57.0,1.0,4.0,132.0,207.0,0.0,0.0,168.0,1.0,0.0,1.0,0.0,7.0,0 167 | 52.0,1.0,3.0,138.0,223.0,0.0,0.0,169.0,0.0,0.0,1.0,?,3.0,0 168 | 54.0,0.0,2.0,132.0,288.0,1.0,2.0,159.0,1.0,0.0,1.0,1.0,3.0,0 169 | 35.0,1.0,4.0,126.0,282.0,0.0,2.0,156.0,1.0,0.0,1.0,0.0,7.0,1 170 | 45.0,0.0,2.0,112.0,160.0,0.0,0.0,138.0,0.0,0.0,2.0,0.0,3.0,0 171 | 70.0,1.0,3.0,160.0,269.0,0.0,0.0,112.0,1.0,2.9,2.0,1.0,7.0,3 172 | 53.0,1.0,4.0,142.0,226.0,0.0,2.0,111.0,1.0,0.0,1.0,0.0,7.0,0 173 | 59.0,0.0,4.0,174.0,249.0,0.0,0.0,143.0,1.0,0.0,2.0,0.0,3.0,1 174 | 
62.0,0.0,4.0,140.0,394.0,0.0,2.0,157.0,0.0,1.2,2.0,0.0,3.0,0 175 | 64.0,1.0,4.0,145.0,212.0,0.0,2.0,132.0,0.0,2.0,2.0,2.0,6.0,4 176 | 57.0,1.0,4.0,152.0,274.0,0.0,0.0,88.0,1.0,1.2,2.0,1.0,7.0,1 177 | 52.0,1.0,4.0,108.0,233.0,1.0,0.0,147.0,0.0,0.1,1.0,3.0,7.0,0 178 | 56.0,1.0,4.0,132.0,184.0,0.0,2.0,105.0,1.0,2.1,2.0,1.0,6.0,1 179 | 43.0,1.0,3.0,130.0,315.0,0.0,0.0,162.0,0.0,1.9,1.0,1.0,3.0,0 180 | 53.0,1.0,3.0,130.0,246.0,1.0,2.0,173.0,0.0,0.0,1.0,3.0,3.0,0 181 | 48.0,1.0,4.0,124.0,274.0,0.0,2.0,166.0,0.0,0.5,2.0,0.0,7.0,3 182 | 56.0,0.0,4.0,134.0,409.0,0.0,2.0,150.0,1.0,1.9,2.0,2.0,7.0,2 183 | 42.0,1.0,1.0,148.0,244.0,0.0,2.0,178.0,0.0,0.8,1.0,2.0,3.0,0 184 | 59.0,1.0,1.0,178.0,270.0,0.0,2.0,145.0,0.0,4.2,3.0,0.0,7.0,0 185 | 60.0,0.0,4.0,158.0,305.0,0.0,2.0,161.0,0.0,0.0,1.0,0.0,3.0,1 186 | 63.0,0.0,2.0,140.0,195.0,0.0,0.0,179.0,0.0,0.0,1.0,2.0,3.0,0 187 | 42.0,1.0,3.0,120.0,240.0,1.0,0.0,194.0,0.0,0.8,3.0,0.0,7.0,0 188 | 66.0,1.0,2.0,160.0,246.0,0.0,0.0,120.0,1.0,0.0,2.0,3.0,6.0,2 189 | 54.0,1.0,2.0,192.0,283.0,0.0,2.0,195.0,0.0,0.0,1.0,1.0,7.0,1 190 | 69.0,1.0,3.0,140.0,254.0,0.0,2.0,146.0,0.0,2.0,2.0,3.0,7.0,2 191 | 50.0,1.0,3.0,129.0,196.0,0.0,0.0,163.0,0.0,0.0,1.0,0.0,3.0,0 192 | 51.0,1.0,4.0,140.0,298.0,0.0,0.0,122.0,1.0,4.2,2.0,3.0,7.0,3 193 | 43.0,1.0,4.0,132.0,247.0,1.0,2.0,143.0,1.0,0.1,2.0,?,7.0,1 194 | 62.0,0.0,4.0,138.0,294.0,1.0,0.0,106.0,0.0,1.9,2.0,3.0,3.0,2 195 | 68.0,0.0,3.0,120.0,211.0,0.0,2.0,115.0,0.0,1.5,2.0,0.0,3.0,0 196 | 67.0,1.0,4.0,100.0,299.0,0.0,2.0,125.0,1.0,0.9,2.0,2.0,3.0,3 197 | 69.0,1.0,1.0,160.0,234.0,1.0,2.0,131.0,0.0,0.1,2.0,1.0,3.0,0 198 | 45.0,0.0,4.0,138.0,236.0,0.0,2.0,152.0,1.0,0.2,2.0,0.0,3.0,0 199 | 50.0,0.0,2.0,120.0,244.0,0.0,0.0,162.0,0.0,1.1,1.0,0.0,3.0,0 200 | 59.0,1.0,1.0,160.0,273.0,0.0,2.0,125.0,0.0,0.0,1.0,0.0,3.0,1 201 | 50.0,0.0,4.0,110.0,254.0,0.0,2.0,159.0,0.0,0.0,1.0,0.0,3.0,0 202 | 64.0,0.0,4.0,180.0,325.0,0.0,0.0,154.0,1.0,0.0,1.0,0.0,3.0,0 203 | 
57.0,1.0,3.0,150.0,126.0,1.0,0.0,173.0,0.0,0.2,1.0,1.0,7.0,0 204 | 64.0,0.0,3.0,140.0,313.0,0.0,0.0,133.0,0.0,0.2,1.0,0.0,7.0,0 205 | 43.0,1.0,4.0,110.0,211.0,0.0,0.0,161.0,0.0,0.0,1.0,0.0,7.0,0 206 | 45.0,1.0,4.0,142.0,309.0,0.0,2.0,147.0,1.0,0.0,2.0,3.0,7.0,3 207 | 58.0,1.0,4.0,128.0,259.0,0.0,2.0,130.0,1.0,3.0,2.0,2.0,7.0,3 208 | 50.0,1.0,4.0,144.0,200.0,0.0,2.0,126.0,1.0,0.9,2.0,0.0,7.0,3 209 | 55.0,1.0,2.0,130.0,262.0,0.0,0.0,155.0,0.0,0.0,1.0,0.0,3.0,0 210 | 62.0,0.0,4.0,150.0,244.0,0.0,0.0,154.0,1.0,1.4,2.0,0.0,3.0,1 211 | 37.0,0.0,3.0,120.0,215.0,0.0,0.0,170.0,0.0,0.0,1.0,0.0,3.0,0 212 | 38.0,1.0,1.0,120.0,231.0,0.0,0.0,182.0,1.0,3.8,2.0,0.0,7.0,4 213 | 41.0,1.0,3.0,130.0,214.0,0.0,2.0,168.0,0.0,2.0,2.0,0.0,3.0,0 214 | 66.0,0.0,4.0,178.0,228.0,1.0,0.0,165.0,1.0,1.0,2.0,2.0,7.0,3 215 | 52.0,1.0,4.0,112.0,230.0,0.0,0.0,160.0,0.0,0.0,1.0,1.0,3.0,1 216 | 56.0,1.0,1.0,120.0,193.0,0.0,2.0,162.0,0.0,1.9,2.0,0.0,7.0,0 217 | 46.0,0.0,2.0,105.0,204.0,0.0,0.0,172.0,0.0,0.0,1.0,0.0,3.0,0 218 | 46.0,0.0,4.0,138.0,243.0,0.0,2.0,152.0,1.0,0.0,2.0,0.0,3.0,0 219 | 64.0,0.0,4.0,130.0,303.0,0.0,0.0,122.0,0.0,2.0,2.0,2.0,3.0,0 220 | 59.0,1.0,4.0,138.0,271.0,0.0,2.0,182.0,0.0,0.0,1.0,0.0,3.0,0 221 | 41.0,0.0,3.0,112.0,268.0,0.0,2.0,172.0,1.0,0.0,1.0,0.0,3.0,0 222 | 54.0,0.0,3.0,108.0,267.0,0.0,2.0,167.0,0.0,0.0,1.0,0.0,3.0,0 223 | 39.0,0.0,3.0,94.0,199.0,0.0,0.0,179.0,0.0,0.0,1.0,0.0,3.0,0 224 | 53.0,1.0,4.0,123.0,282.0,0.0,0.0,95.0,1.0,2.0,2.0,2.0,7.0,3 225 | 63.0,0.0,4.0,108.0,269.0,0.0,0.0,169.0,1.0,1.8,2.0,2.0,3.0,1 226 | 34.0,0.0,2.0,118.0,210.0,0.0,0.0,192.0,0.0,0.7,1.0,0.0,3.0,0 227 | 47.0,1.0,4.0,112.0,204.0,0.0,0.0,143.0,0.0,0.1,1.0,0.0,3.0,0 228 | 67.0,0.0,3.0,152.0,277.0,0.0,0.0,172.0,0.0,0.0,1.0,1.0,3.0,0 229 | 54.0,1.0,4.0,110.0,206.0,0.0,2.0,108.0,1.0,0.0,2.0,1.0,3.0,3 230 | 66.0,1.0,4.0,112.0,212.0,0.0,2.0,132.0,1.0,0.1,1.0,1.0,3.0,2 231 | 52.0,0.0,3.0,136.0,196.0,0.0,2.0,169.0,0.0,0.1,2.0,0.0,3.0,0 232 | 
55.0,0.0,4.0,180.0,327.0,0.0,1.0,117.0,1.0,3.4,2.0,0.0,3.0,2
49.0,1.0,3.0,118.0,149.0,0.0,2.0,126.0,0.0,0.8,1.0,3.0,3.0,1
74.0,0.0,2.0,120.0,269.0,0.0,2.0,121.0,1.0,0.2,1.0,1.0,3.0,0
54.0,0.0,3.0,160.0,201.0,0.0,0.0,163.0,0.0,0.0,1.0,1.0,3.0,0
54.0,1.0,4.0,122.0,286.0,0.0,2.0,116.0,1.0,3.2,2.0,2.0,3.0,3
56.0,1.0,4.0,130.0,283.0,1.0,2.0,103.0,1.0,1.6,3.0,0.0,7.0,2
46.0,1.0,4.0,120.0,249.0,0.0,2.0,144.0,0.0,0.8,1.0,0.0,7.0,1
49.0,0.0,2.0,134.0,271.0,0.0,0.0,162.0,0.0,0.0,2.0,0.0,3.0,0
42.0,1.0,2.0,120.0,295.0,0.0,0.0,162.0,0.0,0.0,1.0,0.0,3.0,0
41.0,1.0,2.0,110.0,235.0,0.0,0.0,153.0,0.0,0.0,1.0,0.0,3.0,0
41.0,0.0,2.0,126.0,306.0,0.0,0.0,163.0,0.0,0.0,1.0,0.0,3.0,0
49.0,0.0,4.0,130.0,269.0,0.0,0.0,163.0,0.0,0.0,1.0,0.0,3.0,0
61.0,1.0,1.0,134.0,234.0,0.0,0.0,145.0,0.0,2.6,2.0,2.0,3.0,2
60.0,0.0,3.0,120.0,178.0,1.0,0.0,96.0,0.0,0.0,1.0,0.0,3.0,0
67.0,1.0,4.0,120.0,237.0,0.0,0.0,71.0,0.0,1.0,2.0,0.0,3.0,2
58.0,1.0,4.0,100.0,234.0,0.0,0.0,156.0,0.0,0.1,1.0,1.0,7.0,2
47.0,1.0,4.0,110.0,275.0,0.0,2.0,118.0,1.0,1.0,2.0,1.0,3.0,1
52.0,1.0,4.0,125.0,212.0,0.0,0.0,168.0,0.0,1.0,1.0,2.0,7.0,3
62.0,1.0,2.0,128.0,208.0,1.0,2.0,140.0,0.0,0.0,1.0,0.0,3.0,0
57.0,1.0,4.0,110.0,201.0,0.0,0.0,126.0,1.0,1.5,2.0,0.0,6.0,0
58.0,1.0,4.0,146.0,218.0,0.0,0.0,105.0,0.0,2.0,2.0,1.0,7.0,1
64.0,1.0,4.0,128.0,263.0,0.0,0.0,105.0,1.0,0.2,2.0,1.0,7.0,0
51.0,0.0,3.0,120.0,295.0,0.0,2.0,157.0,0.0,0.6,1.0,0.0,3.0,0
43.0,1.0,4.0,115.0,303.0,0.0,0.0,181.0,0.0,1.2,2.0,0.0,3.0,0
42.0,0.0,3.0,120.0,209.0,0.0,0.0,173.0,0.0,0.0,2.0,0.0,3.0,0
67.0,0.0,4.0,106.0,223.0,0.0,0.0,142.0,0.0,0.3,1.0,2.0,3.0,0
76.0,0.0,3.0,140.0,197.0,0.0,1.0,116.0,0.0,1.1,2.0,0.0,3.0,0
70.0,1.0,2.0,156.0,245.0,0.0,2.0,143.0,0.0,0.0,1.0,0.0,3.0,0
57.0,1.0,2.0,124.0,261.0,0.0,0.0,141.0,0.0,0.3,1.0,0.0,7.0,1
44.0,0.0,3.0,118.0,242.0,0.0,0.0,149.0,0.0,0.3,2.0,1.0,3.0,0
58.0,0.0,2.0,136.0,319.0,1.0,2.0,152.0,0.0,0.0,1.0,2.0,3.0,3
60.0,0.0,1.0,150.0,240.0,0.0,0.0,171.0,0.0,0.9,1.0,0.0,3.0,0
44.0,1.0,3.0,120.0,226.0,0.0,0.0,169.0,0.0,0.0,1.0,0.0,3.0,0
61.0,1.0,4.0,138.0,166.0,0.0,2.0,125.0,1.0,3.6,2.0,1.0,3.0,4
42.0,1.0,4.0,136.0,315.0,0.0,0.0,125.0,1.0,1.8,2.0,0.0,6.0,2
52.0,1.0,4.0,128.0,204.0,1.0,0.0,156.0,1.0,1.0,2.0,0.0,?,2
59.0,1.0,3.0,126.0,218.0,1.0,0.0,134.0,0.0,2.2,2.0,1.0,6.0,2
40.0,1.0,4.0,152.0,223.0,0.0,0.0,181.0,0.0,0.0,1.0,0.0,7.0,1
42.0,1.0,3.0,130.0,180.0,0.0,0.0,150.0,0.0,0.0,1.0,0.0,3.0,0
61.0,1.0,4.0,140.0,207.0,0.0,2.0,138.0,1.0,1.9,1.0,1.0,7.0,1
66.0,1.0,4.0,160.0,228.0,0.0,2.0,138.0,0.0,2.3,1.0,0.0,6.0,0
46.0,1.0,4.0,140.0,311.0,0.0,0.0,120.0,1.0,1.8,2.0,2.0,7.0,2
71.0,0.0,4.0,112.0,149.0,0.0,0.0,125.0,0.0,1.6,2.0,0.0,3.0,0
59.0,1.0,1.0,134.0,204.0,0.0,0.0,162.0,0.0,0.8,1.0,2.0,3.0,1
64.0,1.0,1.0,170.0,227.0,0.0,2.0,155.0,0.0,0.6,2.0,0.0,7.0,0
66.0,0.0,3.0,146.0,278.0,0.0,2.0,152.0,0.0,0.0,2.0,1.0,3.0,0
39.0,0.0,3.0,138.0,220.0,0.0,0.0,152.0,0.0,0.0,2.0,0.0,3.0,0
57.0,1.0,2.0,154.0,232.0,0.0,2.0,164.0,0.0,0.0,1.0,1.0,3.0,1
58.0,0.0,4.0,130.0,197.0,0.0,0.0,131.0,0.0,0.6,2.0,0.0,3.0,0
57.0,1.0,4.0,110.0,335.0,0.0,0.0,143.0,1.0,3.0,2.0,1.0,7.0,2
47.0,1.0,3.0,130.0,253.0,0.0,0.0,179.0,0.0,0.0,1.0,0.0,3.0,0
55.0,0.0,4.0,128.0,205.0,0.0,1.0,130.0,1.0,2.0,2.0,1.0,7.0,3
35.0,1.0,2.0,122.0,192.0,0.0,0.0,174.0,0.0,0.0,1.0,0.0,3.0,0
61.0,1.0,4.0,148.0,203.0,0.0,0.0,161.0,0.0,0.0,1.0,1.0,7.0,2
58.0,1.0,4.0,114.0,318.0,0.0,1.0,140.0,0.0,4.4,3.0,3.0,6.0,4
58.0,0.0,4.0,170.0,225.0,1.0,2.0,146.0,1.0,2.8,2.0,2.0,6.0,2
58.0,1.0,2.0,125.0,220.0,0.0,0.0,144.0,0.0,0.4,2.0,?,7.0,0
56.0,1.0,2.0,130.0,221.0,0.0,2.0,163.0,0.0,0.0,1.0,0.0,7.0,0
56.0,1.0,2.0,120.0,240.0,0.0,0.0,169.0,0.0,0.0,3.0,0.0,3.0,0
67.0,1.0,3.0,152.0,212.0,0.0,2.0,150.0,0.0,0.8,2.0,0.0,7.0,1
55.0,0.0,2.0,132.0,342.0,0.0,0.0,166.0,0.0,1.2,1.0,0.0,3.0,0
44.0,1.0,4.0,120.0,169.0,0.0,0.0,144.0,1.0,2.8,3.0,0.0,6.0,2
63.0,1.0,4.0,140.0,187.0,0.0,2.0,144.0,1.0,4.0,1.0,2.0,7.0,2
63.0,0.0,4.0,124.0,197.0,0.0,0.0,136.0,1.0,0.0,2.0,0.0,3.0,1
41.0,1.0,2.0,120.0,157.0,0.0,0.0,182.0,0.0,0.0,1.0,0.0,3.0,0
59.0,1.0,4.0,164.0,176.0,1.0,2.0,90.0,0.0,1.0,2.0,2.0,6.0,3
57.0,0.0,4.0,140.0,241.0,0.0,0.0,123.0,1.0,0.2,2.0,0.0,7.0,1
45.0,1.0,1.0,110.0,264.0,0.0,0.0,132.0,0.0,1.2,2.0,0.0,7.0,1
68.0,1.0,4.0,144.0,193.0,1.0,0.0,141.0,0.0,3.4,2.0,2.0,7.0,2
57.0,1.0,4.0,130.0,131.0,0.0,0.0,115.0,1.0,1.2,2.0,1.0,7.0,3
57.0,0.0,2.0,130.0,236.0,0.0,2.0,174.0,0.0,0.0,2.0,1.0,3.0,1
38.0,1.0,3.0,138.0,175.0,0.0,0.0,173.0,0.0,0.0,1.0,?,3.0,0
--------------------------------------------------------------------------------