├── Executing K Means in Python
│   ├── Clustering+Python+Lab.ipynb
│   ├── Hopkins+Statistic.ipynb
│   └── init
├── K Means Clustering
│   └── init
├── Other Forms of Clustering
│   ├── K-Mode+Bank+Marketing.ipynb
│   ├── K-Prototype+clustering (1).ipynb
│   └── init
└── README.md

/Executing K Means in Python/Hopkins+Statistic.ipynb:
--------------------------------------------------------------------------------
1 | {
2 |  "cells": [
3 |   {
4 |    "cell_type": "markdown",
5 |    "metadata": {},
6 |    "source": [
7 |     "## Hopkins Statistic:\n",
8 |     "The Hopkins statistic measures the cluster tendency of a dataset, in other words, how well the data can be clustered.\n",
9 |     "\n",
10 |     "- If the value is between {0.01, ...,0.3}, the data is regularly spaced.\n",
11 |     "\n",
12 |     "- If the value is around 0.5, it is random.\n",
13 |     "\n",
14 |     "- If the value is between {0.7, ..., 0.99}, it has a high tendency to cluster."
15 |    ]
16 |   },
17 |   {
18 |    "cell_type": "markdown",
19 |    "metadata": {},
20 |    "source": [
21 |     "Some useful links for understanding the Hopkins statistic:\n",
22 |     "- [Wikipedia](https://en.wikipedia.org/wiki/Hopkins_statistic)\n",
23 |     "- [Article](http://www.sthda.com/english/articles/29-cluster-validation-essentials/95-assessing-clustering-tendency-essentials/)"
24 |    ]
25 |   },
26 |   {
27 |    "cell_type": "code",
28 |    "execution_count": 2,
29 |    "metadata": {},
30 |    "outputs": [],
31 |    "source": [
32 |     "from sklearn.neighbors import NearestNeighbors\n",
33 |     "from random import sample\n",
34 |     "from numpy.random import uniform\n",
35 |     "import numpy as np\n",
36 |     "import pandas as pd\n",
37 |     "from math import isnan\n",
38 |     " \n",
39 |     "def hopkins(X):\n",
40 |     "    d = X.shape[1] # columns\n",
41 |     "    n = len(X) # rows\n",
42 |     "    m = int(0.1 * n) # sample 10% of the rows\n",
43 |     "    nbrs = NearestNeighbors(n_neighbors=1).fit(X.values)\n",
44 |     " \n",
45 |     "    rand_X = sample(range(0, n, 1), m)\n",
46 |     " \n",
47 |     "    ujd = [] # distances from uniform random points to their nearest data point\n",
48 |     "    wjd = [] # distances from sampled data points to their nearest other data point\n",
49 |     "    for j in range(0, m):\n",
50 |     "        u_dist, _ = nbrs.kneighbors(uniform(np.amin(X,axis=0),np.amax(X,axis=0),d).reshape(1, -1), 2, return_distance=True)\n",
51 |     "        ujd.append(u_dist[0][0]) # index 0: the random point is not in X, so its first neighbour is a real point\n",
52 |     "        w_dist, _ = nbrs.kneighbors(X.iloc[rand_X[j]].values.reshape(1, -1), 2, return_distance=True)\n",
53 |     "        wjd.append(w_dist[0][1]) # index 1: index 0 is the sampled point itself at distance 0\n",
54 |     " \n",
55 |     "    H = sum(ujd) / (sum(ujd) + sum(wjd))\n",
56 |     "    if isnan(H):\n",
57 |     "        print(ujd, wjd)\n",
58 |     "        H = 0\n",
59 |     " \n",
60 |     "    return H"
61 |    ]
62 |   },
63 |   {
64 |    "cell_type": "code",
65 |    "execution_count": null,
66 |    "metadata": {},
67 |    "outputs": [],
68 |    "source": [
69 |     "#First convert the numpy array that you have to a dataframe\n",
70 |     "rfm_df_scaled = pd.DataFrame(rfm_df_scaled)\n",
71 |     "rfm_df_scaled.columns = ['amount', 'frequency', 'recency']"
72 |    ]
73 |   },
74 |   {
75 |    "cell_type": "code",
76 |    "execution_count": null,
77 |    "metadata": {},
78 |    "outputs": [],
79 |    "source": [
80 |     "#Use the Hopkins Statistic function by passing the above dataframe as a parameter\n",
81 |     "hopkins(rfm_df_scaled)"
82 |    ]
83 |   }
84 |  ],
85 |  "metadata": {
86 |   "kernelspec": {
87 |    "display_name": "Python 3",
88 |    "language": "python",
89 |    "name": "python3"
90 |   },
91 |   "language_info": {
92 |    "codemirror_mode": {
93 |     "name": "ipython",
94 |     "version": 3
95 |    },
96 |    "file_extension": ".py",
97 |    "mimetype": "text/x-python",
98 |    "name": "python",
99 |    "nbconvert_exporter": "python",
100 |    "pygments_lexer": "ipython3",
101 |    "version": "3.6.5"
102 |   }
103 |  },
104 |  "nbformat": 4,
105 |  "nbformat_minor": 2
106 | }
107 | 
--------------------------------------------------------------------------------
/Executing K Means in Python/init:
--------------------------------------------------------------------------------
1 | 
2 | 
--------------------------------------------------------------------------------
/K Means Clustering/init:
--------------------------------------------------------------------------------
1 | 
2 | 
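Note on `Hopkins+Statistic.ipynb`: the interpretation rule stated in that notebook (around 0.5 for random data, roughly 0.7 and above for data with a high tendency to cluster) can be checked on synthetic data. The following is a minimal standard-library sketch, not the notebook's own implementation. It assumes the common definition H = sum(u) / (sum(u) + sum(w)), where u is the distance from a uniform random point inside the data's bounding box to its nearest data point, and w is the distance from a sampled data point to its nearest other data point. The names `nearest_distance`, `hopkins_statistic`, and `make_blobs` are illustrative and do not appear in the repository.

```python
import math
import random


def nearest_distance(point, others):
    """Euclidean distance from `point` to its closest neighbour in `others`."""
    return min(math.dist(point, other) for other in others)


def hopkins_statistic(points, sample_fraction=0.1, seed=0):
    """Return H = sum(u) / (sum(u) + sum(w)) for a list of equal-length tuples."""
    rng = random.Random(seed)
    n = len(points)
    dims = len(points[0])
    m = max(1, int(sample_fraction * n))

    # Bounding box of the data, used to draw the uniform reference points.
    lows = [min(p[k] for p in points) for k in range(dims)]
    highs = [max(p[k] for p in points) for k in range(dims)]

    # u: distance from each uniform random point to its nearest real point.
    u = []
    for _ in range(m):
        probe = tuple(rng.uniform(lows[k], highs[k]) for k in range(dims))
        u.append(nearest_distance(probe, points))

    # w: distance from each sampled real point to its nearest *other* real point.
    w = []
    for i in rng.sample(range(n), m):
        others = points[:i] + points[i + 1:]
        w.append(nearest_distance(points[i], others))

    total = sum(u) + sum(w)
    return sum(u) / total if total > 0 else 0.0


def make_blobs(rng, centres, per_blob, spread):
    """Hypothetical helper: Gaussian blobs around the given centres."""
    return [
        tuple(rng.gauss(c, spread) for c in centre)
        for centre in centres
        for _ in range(per_blob)
    ]


rng = random.Random(42)
clustered = make_blobs(rng, [(0, 0), (10, 10), (0, 10)], per_blob=100, spread=0.3)
uniform_points = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(300)]

h_clustered = hopkins_statistic(clustered)
h_uniform = hopkins_statistic(uniform_points)
print(f"clustered data: H = {h_clustered:.3f}")
print(f"uniform data:   H = {h_uniform:.3f}")
```

In the clustered case most uniform probes land in empty space far from any blob, so the u distances dominate and H moves towards 1. For the uniform sample the two sums are of similar size and H stays much lower. The brute-force neighbour search is only meant for small inputs; a tree-based index such as scikit-learn's `NearestNeighbors` is the usual choice for real data.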
-------------------------------------------------------------------------------- /Other Forms of Clustering/K-Mode+Bank+Marketing.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# K-Mode Clustering on Bank Marketing Dataset" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "The data relates to direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (a bank term deposit) would be subscribed ('yes') or not ('no'). " 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "**Attribute Information (Categorical):**\n", 22 | "\n", 23 | "- age (numeric)\n", 24 | "- job : type of job (categorical: 'admin.','blue-collar','entrepreneur','housemaid','management','retired','self-employed','services','student','technician','unemployed','unknown')\n", 25 | "- marital : marital status (categorical: 'divorced','married','single','unknown'; note: 'divorced' means divorced or widowed)\n", 26 | "- education (categorical: 'basic.4y','basic.6y','basic.9y','high.school','illiterate','professional.course','university.degree','unknown')\n", 27 | "- default: has credit in default? (categorical: 'no','yes','unknown')\n", 28 | "- housing: has housing loan? (categorical: 'no','yes','unknown')\n", 29 | "- loan: has personal loan?
(categorical: 'no','yes','unknown')\n", 30 | "- contact: contact communication type (categorical: 'cellular','telephone') \n", 31 | "- month: last contact month of year (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec')\n", 32 | "- day_of_week: last contact day of the week (categorical: 'mon','tue','wed','thu','fri')\n", 33 | "- poutcome: outcome of the previous marketing campaign (categorical: 'failure','nonexistent','success')\n", 34 | "- UCI Repository: " 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": 24, 40 | "metadata": {}, 41 | "outputs": [], 42 | "source": [ 43 | "# Importing Libraries\n", 44 | "import pandas as pd\n", 45 | "import numpy as np\n", 46 | "%matplotlib inline\n", 47 | "import matplotlib.pyplot as plt\n", 48 | "import seaborn as sns\n", 49 | "from kmodes.kmodes import KModes\n", 50 | "import warnings\n", 51 | "warnings.filterwarnings(\"ignore\") " 52 | ] 53 | }, 54 | { 55 | "cell_type": "code", 56 | "execution_count": null, 57 | "metadata": {}, 58 | "outputs": [], 59 | "source": [ 60 | "help(KModes)" 61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": 25, 66 | "metadata": {}, 67 | "outputs": [], 68 | "source": [ 69 | "bank = pd.read_csv('bankmarketing.csv')" 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": 26, 75 | "metadata": {}, 76 | "outputs": [ 77 | { 78 | "data": { 79 | "text/html": [ 80 | "
[HTML rendering of the bank.head() table omitted: 5 rows × 21 columns; the same rows appear in the text/plain output below]
" 246 | ], 247 | "text/plain": [ 248 | " age job marital education default housing loan contact \\\n", 249 | "0 56 housemaid married basic.4y no no no telephone \n", 250 | "1 57 services married high.school unknown no no telephone \n", 251 | "2 37 services married high.school no yes no telephone \n", 252 | "3 40 admin. married basic.6y no no no telephone \n", 253 | "4 56 services married high.school no no yes telephone \n", 254 | "\n", 255 | " month day_of_week ... campaign pdays previous poutcome emp.var.rate \\\n", 256 | "0 may mon ... 1 999 0 nonexistent 1.1 \n", 257 | "1 may mon ... 1 999 0 nonexistent 1.1 \n", 258 | "2 may mon ... 1 999 0 nonexistent 1.1 \n", 259 | "3 may mon ... 1 999 0 nonexistent 1.1 \n", 260 | "4 may mon ... 1 999 0 nonexistent 1.1 \n", 261 | "\n", 262 | " cons.price.idx cons.conf.idx euribor3m nr.employed y \n", 263 | "0 93.994 -36.4 4.857 5191.0 no \n", 264 | "1 93.994 -36.4 4.857 5191.0 no \n", 265 | "2 93.994 -36.4 4.857 5191.0 no \n", 266 | "3 93.994 -36.4 4.857 5191.0 no \n", 267 | "4 93.994 -36.4 4.857 5191.0 no \n", 268 | "\n", 269 | "[5 rows x 21 columns]" 270 | ] 271 | }, 272 | "execution_count": 26, 273 | "metadata": {}, 274 | "output_type": "execute_result" 275 | } 276 | ], 277 | "source": [ 278 | "bank.head()" 279 | ] 280 | }, 281 | { 282 | "cell_type": "code", 283 | "execution_count": 27, 284 | "metadata": {}, 285 | "outputs": [ 286 | { 287 | "data": { 288 | "text/plain": [ 289 | "Index(['age', 'job', 'marital', 'education', 'default', 'housing', 'loan',\n", 290 | " 'contact', 'month', 'day_of_week', 'duration', 'campaign', 'pdays',\n", 291 | " 'previous', 'poutcome', 'emp.var.rate', 'cons.price.idx',\n", 292 | " 'cons.conf.idx', 'euribor3m', 'nr.employed', 'y'],\n", 293 | " dtype='object')" 294 | ] 295 | }, 296 | "execution_count": 27, 297 | "metadata": {}, 298 | "output_type": "execute_result" 299 | } 300 | ], 301 | "source": [ 302 | "bank.columns" 303 | ] 304 | }, 305 | { 306 | "cell_type": "code", 307 | "execution_count": 
28, 308 | "metadata": {}, 309 | "outputs": [], 310 | "source": [ 311 | "bank_cust = bank[['age','job', 'marital', 'education', 'default', 'housing', 'loan','contact','month','day_of_week','poutcome']].copy()" 312 | ] 313 | }, 314 | { 315 | "cell_type": "code", 316 | "execution_count": 29, 317 | "metadata": {}, 318 | "outputs": [ 319 | { 320 | "data": { 321 | "text/html": [ 322 | "
[HTML rendering of the bank_cust.head() table omitted; the same rows appear in the text/plain output below]
" 427 | ], 428 | "text/plain": [ 429 | " age job marital education default housing loan contact \\\n", 430 | "0 56 housemaid married basic.4y no no no telephone \n", 431 | "1 57 services married high.school unknown no no telephone \n", 432 | "2 37 services married high.school no yes no telephone \n", 433 | "3 40 admin. married basic.6y no no no telephone \n", 434 | "4 56 services married high.school no no yes telephone \n", 435 | "\n", 436 | " month day_of_week poutcome \n", 437 | "0 may mon nonexistent \n", 438 | "1 may mon nonexistent \n", 439 | "2 may mon nonexistent \n", 440 | "3 may mon nonexistent \n", 441 | "4 may mon nonexistent " 442 | ] 443 | }, 444 | "execution_count": 29, 445 | "metadata": {}, 446 | "output_type": "execute_result" 447 | } 448 | ], 449 | "source": [ 450 | "bank_cust.head()" 451 | ] 452 | }, 453 | { 454 | "cell_type": "code", 455 | "execution_count": 30, 456 | "metadata": {}, 457 | "outputs": [], 458 | "source": [ 459 | "bank_cust['age_bin'] = pd.cut(bank_cust['age'], [0, 20, 30, 40, 50, 60, 70, 80, 90, 100], \n", 460 | " labels=['0-20', '20-30', '30-40', '40-50','50-60','60-70','70-80', '80-90','90-100'])" 461 | ] 462 | }, 463 | { 464 | "cell_type": "code", 465 | "execution_count": 31, 466 | "metadata": {}, 467 | "outputs": [ 468 | { 469 | "data": { 470 | "text/html": [ 471 | "
[HTML rendering of the bank_cust.head() table with the age_bin column omitted; the same rows appear in the text/plain output below]
" 582 | ], 583 | "text/plain": [ 584 | " age job marital education default housing loan contact \\\n", 585 | "0 56 housemaid married basic.4y no no no telephone \n", 586 | "1 57 services married high.school unknown no no telephone \n", 587 | "2 37 services married high.school no yes no telephone \n", 588 | "3 40 admin. married basic.6y no no no telephone \n", 589 | "4 56 services married high.school no no yes telephone \n", 590 | "\n", 591 | " month day_of_week poutcome age_bin \n", 592 | "0 may mon nonexistent 50-60 \n", 593 | "1 may mon nonexistent 50-60 \n", 594 | "2 may mon nonexistent 30-40 \n", 595 | "3 may mon nonexistent 30-40 \n", 596 | "4 may mon nonexistent 50-60 " 597 | ] 598 | }, 599 | "execution_count": 31, 600 | "metadata": {}, 601 | "output_type": "execute_result" 602 | } 603 | ], 604 | "source": [ 605 | "bank_cust.head()" 606 | ] 607 | }, 608 | { 609 | "cell_type": "code", 610 | "execution_count": 32, 611 | "metadata": {}, 612 | "outputs": [], 613 | "source": [ 614 | "bank_cust = bank_cust.drop('age',axis = 1)" 615 | ] 616 | }, 617 | { 618 | "cell_type": "code", 619 | "execution_count": 33, 620 | "metadata": {}, 621 | "outputs": [ 622 | { 623 | "data": { 624 | "text/html": [ 625 | "
[HTML rendering of the bank_cust.head() table after dropping age omitted; the same rows appear in the text/plain output below]
" 730 | ], 731 | "text/plain": [ 732 | " job marital education default housing loan contact month \\\n", 733 | "0 housemaid married basic.4y no no no telephone may \n", 734 | "1 services married high.school unknown no no telephone may \n", 735 | "2 services married high.school no yes no telephone may \n", 736 | "3 admin. married basic.6y no no no telephone may \n", 737 | "4 services married high.school no no yes telephone may \n", 738 | "\n", 739 | " day_of_week poutcome age_bin \n", 740 | "0 mon nonexistent 50-60 \n", 741 | "1 mon nonexistent 50-60 \n", 742 | "2 mon nonexistent 30-40 \n", 743 | "3 mon nonexistent 30-40 \n", 744 | "4 mon nonexistent 50-60 " 745 | ] 746 | }, 747 | "execution_count": 33, 748 | "metadata": {}, 749 | "output_type": "execute_result" 750 | } 751 | ], 752 | "source": [ 753 | "bank_cust.head()" 754 | ] 755 | }, 756 | { 757 | "cell_type": "code", 758 | "execution_count": 34, 759 | "metadata": {}, 760 | "outputs": [ 761 | { 762 | "name": "stdout", 763 | "output_type": "stream", 764 | "text": [ 765 | "\n", 766 | "RangeIndex: 41188 entries, 0 to 41187\n", 767 | "Data columns (total 11 columns):\n", 768 | "job 41188 non-null object\n", 769 | "marital 41188 non-null object\n", 770 | "education 41188 non-null object\n", 771 | "default 41188 non-null object\n", 772 | "housing 41188 non-null object\n", 773 | "loan 41188 non-null object\n", 774 | "contact 41188 non-null object\n", 775 | "month 41188 non-null object\n", 776 | "day_of_week 41188 non-null object\n", 777 | "poutcome 41188 non-null object\n", 778 | "age_bin 41188 non-null category\n", 779 | "dtypes: category(1), object(10)\n", 780 | "memory usage: 3.2+ MB\n" 781 | ] 782 | } 783 | ], 784 | "source": [ 785 | "bank_cust.info()" 786 | ] 787 | }, 788 | { 789 | "cell_type": "code", 790 | "execution_count": 35, 791 | "metadata": {}, 792 | "outputs": [ 793 | { 794 | "data": { 795 | "text/html": [ 796 | "
[HTML rendering of the label-encoded bank_cust.head() table omitted; the same rows appear in the text/plain output below]
" 901 | ], 902 | "text/plain": [ 903 | " job marital education default housing loan contact month \\\n", 904 | "0 3 1 0 0 0 0 1 6 \n", 905 | "1 7 1 3 1 0 0 1 6 \n", 906 | "2 7 1 3 0 2 0 1 6 \n", 907 | "3 0 1 1 0 0 0 1 6 \n", 908 | "4 7 1 3 0 0 2 1 6 \n", 909 | "\n", 910 | " day_of_week poutcome age_bin \n", 911 | "0 1 1 4 \n", 912 | "1 1 1 4 \n", 913 | "2 1 1 2 \n", 914 | "3 1 1 2 \n", 915 | "4 1 1 4 " 916 | ] 917 | }, 918 | "execution_count": 35, 919 | "metadata": {}, 920 | "output_type": "execute_result" 921 | } 922 | ], 923 | "source": [ 924 | "from sklearn import preprocessing\n", 925 | "le = preprocessing.LabelEncoder()\n", 926 | "bank_cust = bank_cust.apply(le.fit_transform)\n", 927 | "bank_cust.head()" 928 | ] 929 | }, 930 | { 931 | "cell_type": "code", 932 | "execution_count": 38, 933 | "metadata": {}, 934 | "outputs": [], 935 | "source": [ 936 | "# Checking the count per category\n", 937 | "job_df = pd.DataFrame(bank_cust['job'].value_counts())" 938 | ] 939 | }, 940 | { 941 | "cell_type": "code", 942 | "execution_count": 39, 943 | "metadata": {}, 944 | "outputs": [ 945 | { 946 | "data": { 947 | "text/plain": [ 948 | "" 949 | ] 950 | }, 951 | "execution_count": 39, 952 | "metadata": {}, 953 | "output_type": "execute_result" 954 | }, 955 | { 956 | "data": { 957 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAZUAAAD8CAYAAAC/1zkdAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAE2dJREFUeJzt3X+w3XV95/Hnq0QU8EcCBJYmuMFpxpUyu5VNIy0O65gOv3QMdKAbp9Wsk266HbZid6dd3M4sU60zdetUy0xllzFxo3VBNhVhLUrTqO06W8EgiMFIk4qFW5BcDaJbp2r0vX+cz9WbcBMu5PM9J5c8HzNnzvf7OZ/v5/39Jrl53e/n+z3npKqQJKmHn5j0DkiSnj0MFUlSN4aKJKkbQ0WS1I2hIknqxlCRJHVjqEiSujFUJEndGCqSpG4WTXoHxu3UU0+tFStWTHo3JGnBuPvuu79eVUvn0/eYC5UVK1awY8eOSe+GJC0YSf5uvn2d/pIkdWOoSJK6MVQkSd0YKpKkbgwVSVI3hookqRtDRZLUjaEiSerGUJEkdXPMvaN+xvT1fzLY2Et//VcGG1uSjmaeqUiSujFUJEndGCqSpG4GC5Ukm5PsTbJzVtvJSbYl2d2el7T2JLkuyZ4k9yU5d9Y261v/3UnWz2r/l0m+2La5LkmGOhZJ0vwMeabyP4CLD2q7BtheVSuB7W0d4BJgZXtsBK6HUQgB1wKvAFYD184EUeuzcdZ2B9eSJI3ZYKFSVX8F7DuoeS2wpS1vAS6b1f6BGvkssDjJGcBFwLaq2ldVjwPbgIvbay+sqr+uqgI+MGssSdKEjPuayulV9ShAez6ttS8DHp7Vb6q1Ha59ao52SdIEHS0X6ue6HlLPoH3uwZONSXYk2TE9Pf0Md1GS9FTGHSqPtakr2vPe1j4FnDmr33LgkadoXz5H+5yq6oaqWlVVq5YundfXLEuSnoFxh8ptwMwdXOuBW2e1v7HdBXYe8ESbHrsDuDDJknaB/kLgjvbat5Oc1+76euOssSRJEzLYx7QkuRF4FXBqkilGd3H9PnBzkg3AQ8CVrfvtwKXAHuA7wJsAqmpfkrcDn2v93lZVMxf/f53RHWYnAB9vD0nSBA0WKlX1+kO8tGaOvgVcdYhxNgOb52jfAZxzJPsoSerraLlQL0l6FjBUJEndGCqSpG4MFUlSN4aKJKkbQ0WS1I2hIknqxlCRJHVjqEiSujFUJEndDPYxLTrQQ9ddMdjYL37z1sHGlqSnwzMVSVI3hookqRtDRZLUjaEiSerGUJEkdWOoSJK6MVQkSd0YKpKkbgwVSVI3hookqRtDRZLUjaEiSerGUJEkdWOoSJK6MVQkSd0YKpKkbgwVSVI3hookqRtDRZLUjaEiSepmIqGS5DeT3J9kZ5IbkzwvyVlJ7kyyO8mHkxzf+j63re9pr6+YNc5bW/sDSS6axLFIkn5s7KGSZBnwZmBVVZ0DHAesA94JvLuqVgKPAxvaJhuAx6vqp4B3t34kObtt99PAxcB7kxw3zmORJB1oUtNfi4ATkiwCTgQeBV4NbG2vbwEua8tr2zrt9TVJ0tpvqqrvVtWDwB5g9Zj2X5I0h7GHSlX9PfAu4CFGYfIEcDfwzara37pNAcva8jLg4bbt/tb/lNntc2wjSZqASUx/LWF0lnEW8JPAScAlc3StmU0O8dqh2uequTHJjiQ7pqenn/5OS5LmZRLTX78APFhV01X1feAjwM8Di9t0GMBy4JG2PAWcCdBefxGwb3b7HNscoKpuqKpVVbVq6dKlvY9HktRMIlQeAs5LcmK7NrIG+BLwKeCK1mc9cGtbvq2t017/ZFVVa1/X7g47C1gJ3DWmY5AkzWHRU3fpq6ruTLIV+DywH7gHuAH4M+CmJL/X2ja1TTYBH0yyh9EZyro2zv1JbmYUSPuBq6rqB2M9GEnSAcYeKgBVdS1w7UHNX2GOu7eq6h+BKw8xzjuAd3TfQUn
SM+I76iVJ3RgqkqRuDBVJUjeGiiSpG0NFktSNoSJJ6sZQkSR1Y6hIkroxVCRJ3RgqkqRuDBVJUjeGiiSpG0NFktSNoSJJ6sZQkSR1Y6hIkroxVCRJ3RgqkqRuDBVJUjeGiiSpm0WT3gFJGtLHP/z1wca+5F+fOtjYC5VnKpKkbgwVSVI3hookqRtDRZLUjaEiSerGUJEkdWOoSJK6MVQkSd0YKpKkbgwVSVI3EwmVJIuTbE3y5SS7kvxckpOTbEuyuz0vaX2T5Loke5Lcl+TcWeOsb/13J1k/iWORJP3YpM5U/gj4RFX9M+BfALuAa4DtVbUS2N7WAS4BVrbHRuB6gCQnA9cCrwBWA9fOBJEkaTLGHipJXghcAGwCqKrvVdU3gbXAltZtC3BZW14LfKBGPgssTnIGcBGwrar2VdXjwDbg4jEeiiTpIJM4U3kJMA28P8k9Sd6X5CTg9Kp6FKA9n9b6LwMenrX9VGs7VLskaUImESqLgHOB66vq5cA/8OOprrlkjrY6TPuTB0g2JtmRZMf09PTT3V9J0jxNIlSmgKmqurOtb2UUMo+1aS3a895Z/c+ctf1y4JHDtD9JVd1QVauqatXSpUu7HYgk6UBjD5Wq+hrwcJKXtqY1wJeA24CZO7jWA7e25duAN7a7wM4DnmjTY3cAFyZZ0i7QX9jaJEkTMqlvfvwN4ENJjge+AryJUcDdnGQD8BBwZet7O3ApsAf4TutLVe1L8nbgc63f26pq3/gOQZJ0sHmFSpJfBF7J6JrFZ6rqliMpWlX3AqvmeGnNHH0LuOoQ42wGNh/JvkiS+nnK6a8k7wX+HfBFYCfwa0n+eOgdkyQtPPM5U/lXwDntjIEkWxgFjCRJB5jPhfoHgBfPWj8TuG+Y3ZEkLWSHPFNJ8r8ZXUN5EbAryV3tpdXA/x3DvkmSFpjDTX+9a2x7IUl6VjhkqFTVX84sJzkd+Nm2eldV7Z17K0nSsWw+d3/9EnAXo/eN/BJwZ5Irht4xSdLCM5+7v34H+NmZs5MkS4G/YPTxKpIk/ch87v76iYOmu74xz+0kSceY+ZypfCLJHcCNbX0d8PHhdkmStFA9ZahU1W+1j2k5n9HHzf+3qvro4HsmSVpwDvc+lc9U1SuTfJsDv7/k3yb5IbAP+IOqeu8Y9lOStAAc7pbiV7bnF8z1epJTGL0J0lCRJAFHcMG9qr4BvKrfrkiSFrojuotr5jvlJUkCbw2WJHVkqEiSujFUJEndGCqSpG4MFUlSN4aKJKkbQ0WS1I2hIknqxlCRJHVjqEiSujFUJEndGCqSpG4MFUlSN4aKJKkbQ0WS1I2hIknqZmKhkuS4JPck+VhbPyvJnUl2J/lwkuNb+3Pb+p72+opZY7y1tT+Q5KLJHIkkacYkz1SuBnbNWn8n8O6qWgk8Dmxo7RuAx6vqp4B3t34kORtYB/w0cDHw3iTHjWnfJUlzmEioJFkOvAZ4X1sP8Gpga+uyBbisLa9t67TX17T+a4Gbquq7VfUgsAdYPZ4jkCTNZVJnKu8Bfhv4YVs/BfhmVe1v61PAsra8DHgYoL3+ROv/o/Y5tjlAko1JdiTZMT093fM4JEmzLBp3wSSvBfZW1d1JXjXTPEfXeorXDrfNgY1VNwA3AKxatWrOPtKx7LKt2wcZ96NXrBlkXB29xh4qwPnA65JcCjwPeCGjM5fFSRa1s5HlwCOt/xRwJjCVZBHwImDfrPYZs7eRJE3A2Ke/quqtVbW8qlYwutD+yar6ZeBTwBWt23rg1rZ8W1unvf7JqqrWvq7dHXYWsBK4a0yHIUmawyTOVA7lPwE3Jfk94B5gU2vfBHwwyR5GZyjrAKrq/iQ3A18C9gNXVdUPxr/bkqQZEw2Vqvo08Om2/BXmuHurqv4RuPIQ278DeMdweyhJejp8R70kqRtDRZLUjaEiSerGUJEkdWOoSJK6MVQkSd0YKpKkbgwVSVI3hookqRtDRZLUjaEiSerGUJEkdWOoSJK6MVQkSd0
YKpKkbgwVSVI3R9M3P0rz9qZbLh5k3Pdf/olBxpWOFZ6pSJK6MVQkSd0YKpKkbgwVSVI3hookqRtDRZLUjaEiSerGUJEkdWOoSJK6MVQkSd0YKpKkbgwVSVI3hookqZuxh0qSM5N8KsmuJPcnubq1n5xkW5Ld7XlJa0+S65LsSXJfknNnjbW+9d+dZP24j0WSdKBJnKnsB/5jVb0MOA+4KsnZwDXA9qpaCWxv6wCXACvbYyNwPYxCCLgWeAWwGrh2JogkSZMx9lCpqker6vNt+dvALmAZsBbY0rptAS5ry2uBD9TIZ4HFSc4ALgK2VdW+qnoc2AYM8yUbkqR5meg1lSQrgJcDdwKnV9WjMAoe4LTWbRnw8KzNplrbodolSRMysVBJ8nzgT4G3VNW3Dtd1jrY6TPtctTYm2ZFkx/T09NPfWUnSvEwkVJI8h1GgfKiqPtKaH2vTWrTnva19Cjhz1ubLgUcO0/4kVXVDVa2qqlVLly7tdyCSpANM4u6vAJuAXVX1h7Neug2YuYNrPXDrrPY3trvAzgOeaNNjdwAXJlnSLtBf2NokSROyaAI1zwfeAHwxyb2t7T8Dvw/cnGQD8BBwZXvtduBSYA/wHeBNAFW1L8nbgc+1fm+rqn3jOQRJ0lzGHipV9Rnmvh4CsGaO/gVcdYixNgOb++2dJOlI+I56SVI3k5j+khac19zyB4OM+2eX/9Yg40qT4pmKJKkbQ0WS1I2hIknqxlCRJHXjhfpnqTs2XTrIuBdtuH2QcSU9Oxgq0lHotVs/NMi4H7vilwcZV5rh9JckqRtDRZLUjaEiSerGUJEkdWOoSJK6MVQkSd14S7G6+O8fvGiQcX/tDX7vmrSQGCqS1NFX3/O1wcZe8ZZ/MtjYvTj9JUnqxjMVSWP35lseHmTc6y4/c5BxNX+eqUiSujFUJEndGCqSpG4MFUlSN4aKJKkbQ0WS1I2hIknqxlCRJHVjqEiSujFUJEnd+DEtkrSAPfZHfz3Y2Kdf/XNPexvPVCRJ3RgqkqRuFnyoJLk4yQNJ9iS5ZtL7I0nHsgUdKkmOA/4YuAQ4G3h9krMnu1eSdOxa0KECrAb2VNVXqup7wE3A2gnvkyQdsxZ6qCwDZn/bz1RrkyRNQKpq0vvwjCW5Erioqn61rb8BWF1Vv3FQv43Axrb6UuCBZ1DuVODrR7C7R2st61nPesdOvWda659W1dL5dFzo71OZAmZ/f+hy4JGDO1XVDcANR1IoyY6qWnUkYxyNtaxnPesdO/XGUWuhT399DliZ5KwkxwPrgNsmvE+SdMxa0GcqVbU/yb8H7gCOAzZX1f0T3i1JOmYt6FABqKrbgdvHUOqIps+O4lrWs571jp16g9da0BfqJUlHl4V+TUWSdBQxVJ7COD8GJsnmJHuT7Byyzqx6Zyb5VJJdSe5PcvXA9Z6X5K4kX2j1fnfIeq3mcUnuSfKxoWu1el9N8sUk9ybZMXCtxUm2Jvly+zt8+h8pO/9aL23HNPP4VpK3DFWv1fzN9u9kZ5Ibkzxv4HpXt1r3D3Fsc/18Jzk5ybYku9vzkoHrXdmO74dJhrkLrKp8HOLB6OL/3wIvAY4HvgCcPWC9C4BzgZ1jOr4zgHPb8guAvxn4+AI8vy0/B7gTOG/gY/wPwP8EPjamP9OvAqeOqdYW4Ffb8vHA4jHVPQ74GqP3LgxVYxnwIHBCW78Z+DcD1jsH2AmcyOha818AKzvXeNLPN/BfgWva8jXAOweu9zJG79X7NLBqiD9Lz1QOb6wfA1NVfwXsG2r8Oeo9WlWfb8vfBnYx4CcS1Mj/a6vPaY/BLuolWQ68BnjfUDUmJckLGf2nsQmgqr5XVd8cU/k1wN9W1d8NXGcRcEKSRYz+s3/Se9A6ehnw2ar6TlXtB/4SuLxngUP8fK9l9MsB7fmyIetV1a6qeiZv/p43Q+XwjpmPgUmyAng5o7OHIescl+ReYC+wraqGrPce4LeBHw5Y42AF/HmSu9snOQzlJcA08P4
2vfe+JCcNWG+2dcCNQxaoqr8H3gU8BDwKPFFVfz5gyZ3ABUlOSXIicCkHvrF6KKdX1aMw+iUPOG0MNQdlqBxe5mh71t0ul+T5wJ8Cb6mqbw1Zq6p+UFU/w+jTD1YnOWeIOkleC+ytqruHGP8wzq+qcxl9cvZVSS4YqM4iRlMb11fVy4F/YDR9Mqj2JuPXAf9r4DpLGP0Wfxbwk8BJSX5lqHpVtQt4J7AN+ASjqe79Q9V7NjNUDm9eHwOzkCV5DqNA+VBVfWRcddtUzaeBiwcqcT7wuiRfZTRt+eokfzJQrR+pqkfa817gFkZTqEOYAqZmneltZRQyQ7sE+HxVPTZwnV8AHqyq6ar6PvAR4OeHLFhVm6rq3Kq6gNG00e4h6zWPJTkDoD3vHUPNQRkqh/es/hiYJGE0J7+rqv5wDPWWJlnclk9g9B/Hl4eoVVVvrarlVbWC0d/bJ6tqsN90AZKclOQFM8vAhYymVbqrqq8BDyd5aWtaA3xpiFoHeT0DT301DwHnJTmx/Ttdw+ia32CSnNaeXwz8IuM5ztuA9W15PXDrGGoOa4ir/8+mB6O51b9hdBfY7wxc60ZG88ffZ/Sb6IaB672S0XTefcC97XHpgPX+OXBPq7cT+C9j+jt8FWO4+4vRdY4vtMf9Y/j38jPAjvbn+VFgycD1TgS+AbxoTH9vv8vol46dwAeB5w5c7/8wCuYvAGsGGP9JP9/AKcB2RmdF24GTB653eVv+LvAYcEfv4/Qd9ZKkbpz+kiR1Y6hIkroxVCRJ3RgqkqRuDBVJUjeGiiSpG0NFktSNoSJJ6ub/A9+QXurJYcLZAAAAAElFTkSuQmCC\n", 958 | "text/plain": [ 959 | "" 960 | ] 961 | }, 962 | "metadata": {}, 963 | "output_type": "display_data" 964 | } 965 | ], 966 | "source": [ 967 | "sns.barplot(x=job_df.index, y=job_df['job'])" 968 | ] 969 | }, 970 | { 971 | "cell_type": "code", 972 | "execution_count": 40, 973 | "metadata": {}, 974 | "outputs": [], 975 | "source": [ 976 | "# Checking the count per category\n", 977 | "age_df = pd.DataFrame(bank_cust['age_bin'].value_counts())" 978 | ] 979 | }, 980 | { 981 | "cell_type": "code", 982 | "execution_count": 41, 983 | "metadata": { 984 | "scrolled": true 985 | }, 986 | "outputs": [ 987 | { 988 | "data": { 989 | "text/plain": [ 990 | "" 991 | ] 992 | }, 993 | "execution_count": 41, 994 | "metadata": {}, 995 | "output_type": "execute_result" 996 | }, 997 | { 998 | "data": { 999 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAZYAAAD8CAYAAABU4IIeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAGIZJREFUeJzt3X20XXV95/H3p0RUqDY8RAcTaHBMmSJjK0agssZaqBAsJcws6MBSyVhcmXGQYm1V0FmDVZklrSNqq8zKQEqYWpBBLZkuFDOIMu3IQ3hQ5KlEULiCJjaAqBUG+p0/zu/S4+Xc5N5k37tvwvu11l337O/+7bO/mxXyyd77d85OVSFJUld+ru8GJEk7F4NFktQpg0WS1CmDRZLUKYNFktQpg0WS1CmDRZLUKYNFktQpg0WS1Kl5fTfQh7333rsWL17cdxuStMO46aabflBVC6Yy9lkZLIsXL2b9+vV9tyFJO4wk35nqWC+FSZI6ZbBIkjplsEiSOmWwSJI6ZbBIkjplsEiSOmWwSJI6ZbBIkjplsEiSOvWs/OS9ts3hf3r4rO/zb0//21nfp6TtM6NnLElWJ9mY5JsT6qcnuTvJ7Un+eKh+VpINbd3RQ/VlrbYhyZlD9f2TXJ/kniSfSbLrTB6PJGnrZvpS2EXAsuFCkt8AlgOvqKqXAx9p9QOBk4CXt20+lWSXJLsAnwSOAQ4ETm5jAc4FzquqJcDDwKkzfDySpK2Y0WCpqmuBzRPKbwM+XFWPtzEbW305cGlVPV5V9wEbgEPaz4aqureqngAuBZYnCXAEcHnbfg1w/EwejyRp6/q4ef9LwL9ql7C+muTVrb4QeGBo3FirTVbfC3ikqp6cUJck9aiPm/fzgD2Aw4BXA5cleSmQEWOL0eFXWxg/UpKVwEqA/fbbb5otS5Kmqo8zljHgczVwA/CPwN6tvu/QuEXAg1uo/wCYn2TehPpIVbWqqpZW1dIFC6b0rBpJ0jboI1j+isG9EZL8ErArg5BYC5yU5LlJ9geWADcANwJL2gywXRnc4F9bVQVcA5zQ3ncFcMWsHokk6Rlm9FJYkkuA1wF7JxkDzgZWA6vbFOQngBUtJG5PchlwB/AkcFpVPdXe5+3AVcAuwOqqur3t4j3ApUk+BNwCXDiTxyNJ2roZDZaqOnmSVW+aZPw5wDkj6lcCV46o38tg1pgkaY7wK10kSZ0yWCRJnTJYJEmdMlgkSZ0yWCRJnTJYJEmdMlgkSZ0yWCRJnTJYJEmdMlgkSZ0yWCRJnTJYJEmdMlgkSZ0yWCRJnTJYJEmdMlgkSZ2a0WBJsjrJxva0yInr/jBJJdm7LSfJJ5JsSPKNJAcPjV2R5J72s2Ko/qokt7VtPpEkM3k8kqStm+kzlouAZROLSfYFXg/cP1Q+hsFz7pcAK4Hz29g9GTzS+FAGT4s8O8kebZvz29jx7Z6xL0nS7JrRYKmqa4HNI1adB7wbqKHacuDiGrgOmJ9kH+BoYF1Vba6qh4F1wLK27oVV9bWqKuBi4PiZPB5J0tbN+j2WJMcB362qr09YtRB4YGh5rNW2VB8bUZck9WjebO4syW7A+4CjRq0eUattqE+275UMLpux3377bbVXSdK2me0zln8O7A98Pcm3gUXAzUn+GYMzjn2Hxi4CHtxKfdGI+khVtaqqllbV0gULFnRwKJKkUWY1WKrqtqp6UVUtrqrFDMLh4Kr6HrAWOKXNDjsMeLSqHgKuAo5Kske7aX8UcFVb91iSw9pssFOAK2bzeCRJzzTT040vAb4GHJBkLMmpWxh+JXAvsAH478B/BKiqzcAHgRvbzwdaDeBtwAVtm28BX5iJ45AkTd2M3mOpqpO3sn7x0OsCTptk3Gpg9Yj6euCg7etSktQlP3kvSeqUwSJJ6pTBIknqlMEiSeqUwSJJ6pTBIknqlMEiSeqUwSJJ6pTBIknqlMEiSeqUwSJJ6pTBIknqlMEiSeqUwSJJ6pTBIkn
qlMEiSerUTD9BcnWSjUm+OVT7kyR3JflGks8nmT+07qwkG5LcneToofqyVtuQ5Myh+v5Jrk9yT5LPJNl1Jo9HkrR1M33GchGwbEJtHXBQVb0C+DvgLIAkBwInAS9v23wqyS5JdgE+CRwDHAic3MYCnAucV1VLgIeBLT36WJI0C2Y0WKrqWmDzhNqXqurJtngdsKi9Xg5cWlWPV9V9DJ5jf0j72VBV91bVE8ClwPIkAY4ALm/brwGOn8njkSRtXd/3WH4X+EJ7vRB4YGjdWKtNVt8LeGQopMbrkqQe9RYsSd4HPAl8erw0YlhtQ32y/a1Msj7J+k2bNk23XUnSFPUSLElWAMcCb6yq8TAYA/YdGrYIeHAL9R8A85PMm1AfqapWVdXSqlq6YMGCbg5EkvQMsx4sSZYB7wGOq6qfDK1aC5yU5LlJ9geWADcANwJL2gywXRnc4F/bAuka4IS2/Qrgitk6DknSaDM93fgS4GvAAUnGkpwK/BnwAmBdkluT/DeAqroduAy4A/gicFpVPdXuobwduAq4E7isjYVBQL0zyQYG91wunMnjkSRt3bytD9l2VXXyiPKkf/lX1TnAOSPqVwJXjqjfy2DWmCRpjuh7VpgkaSdjsEiSOmWwSJI6ZbBIkjplsEiSOmWwSJI6ZbBIkjplsEiSOmWwSJI6ZbBIkjplsEiSOmWwSJI6ZbBIkjplsEiSOmWwSJI6ZbBIkjplsEiSOjWjT5BMsho4FthYVQe12p7AZ4DFwLeB36mqh5ME+DjwBuAnwL+rqpvbNiuA/9Te9kNVtabVXwVcBDyfwRMmz6iqmslj0tzy1df+ei/7/fVrv9rLfqUdwUyfsVwELJtQOxO4uqqWAFe3ZYBjgCXtZyVwPjwdRGcDhzJ4DPHZSfZo25zfxo5vN3FfkqRZNqPBUlXXApsnlJcDa9rrNcDxQ/WLa+A6YH6SfYCjgXVVtbmqHgbWAcvauhdW1dfaWcrFQ+8lSepJH/dYXlxVDwG03y9q9YXAA0PjxlptS/WxEfWRkqxMsj7J+k2bNm33QUiSRpvWPZYkC4FfHN6unZV0ISNqtQ31kapqFbAKYOnSpd6HkaQZMuVgSXIu8G+BO4CnWrmA6QbL95PsU1UPtctZG1t9DNh3aNwi4MFWf92E+ldafdGI8ZKkHk3nUtjxwAFV9Yaq+u32c9w27HMtsKK9XgFcMVQ/JQOHAY+2S2VXAUcl2aPdtD8KuKqteyzJYW1G2SlD7yVJ6sl0LoXdCzwHeHyqGyS5hMHZxt5JxhjM7vowcFmSU4H7gRPb8CsZTDXewGC68VsAqmpzkg8CN7ZxH6iq8QkBb+Ofpht/of1Ikno0nWD5CXBrkqsZCpeq+r3JNqiqkydZdeSIsQWcNsn7rAZWj6ivBw7actuSpNk0nWBZ234kSZrUlINl/NPukiRtyVaDJcllVfU7SW5jxHTeqnrFjHQmSdohTeWM5Yz2+9iZbESStHPY6nTjoU/Jf4fBTftfAV4BPN5qkiQ9bcqfY0nyVuAG4N8AJwDXJfndmWpMkrRjms6ssHcBr6yqvwdIshfwfxkxDViS9Ow1nU/ejwGPDS0/xs9+OaQkSVOaFfbO9vK7wPVJrmAwO2w5g0tjkiQ9bSqXwl7Qfn+r/Yzze7kkSc+w1WCpqj+ayhsl+dOqOn37W5Ik7ci6fNDX4R2+lyRpB9XHEyQlSTsxg0WS1Kkug2XUo4IlSc8y0w6WJLtPsurj29mLJGknMJ2vdHlNkjuAO9vyryT51Pj6qrpoOjtO8vtJbk/yzSSXJHlekv2TXJ/kniSfSbJrG/vctryhrV889D5ntfrdSY6eTg+SpO5N54zlPOBo4O8BqurrwGu3ZadJFgK/ByytqoOAXYCTgHOB86pqCfAwcGrb5FTg4ap6Wevj3PY+B7btXg4sAz6VZJdt6UmS1I1pXQqrqolf4fLUdux7HvD8JPOA3YCHgCOAy9v6NcDx7fXytkxbf2SStPqlVfV4Vd0
HbAAO2Y6eJEnbaTrB8kCS1wCVZNckf0i7LDZdVfVd4CPA/QwC5VHgJuCRqnqyDRsDFrbXC2nfS9bWPwrsNVwfsY0kqQfTCZb/AJzG4C/uMeBX2/K0JdmDwdnG/sBLgN2BY0YMHX9i5agZZ7WF+qh9rkyyPsn6TZs2Tb9pSdKUTOeZ9z8A3tjRfn8TuK+qNgEk+RzwGmB+knntrGQR8GAbPwbsC4y1S2e/AGweqo8b3mZi/6uAVQBLly4dGT6SpO035WBJ8okR5UeB9VU13S+kvB84LMluwD8ARwLrgWsYPETsUmAF//RFl2vb8tfa+i9XVSVZC/xlko8yOPNZgt+4LEm9ms6lsOcxuPx1T/t5BbAncGqSj01np1V1PYOb8DcDt7U+VgHvAd6ZZAODeygXtk0uBPZq9XcCZ7b3uR24DLgD+CJwWlVtz4QCSdJ2ms4TJF8GHDF+cz3J+cCXgNczCIdpqaqzgbMnlO9lxKyuqvopcOIk73MOcM509y9JmhnTOWNZyOAm+7jdgZe0M4THO+1KkrTDms4Zyx8Dtyb5CoPZWK8F/kv7ipf/PQO9SZJ2QNOZFXZhki8AbwbuYnAZbKyqfgy8a4b6kyTtYKYzK+ytwBkMpvTeChzGYJbWETPTmiRpRzSdS2FnAK8Grquq30jyL4ApPbZY03f/B/7lrO9zv/887TkYkvQM07l5/9M2O4skz62qu4ADZqYtSdKOajpnLGNJ5gN/BaxL8jCTfMpdkvTsNZ2b9/+6vXx/kmsYfK3KF2ekK0nSDms6ZyxPq6qvdt2IJGnn0OUz7yVJMlgkSd0yWCRJnTJYJEmdMlgkSZ0yWCRJnTJYJEmd6i1YksxPcnmSu5LcmeTXkuyZZF2Se9rvPdrYJPlEkg1JvpHk4KH3WdHG35NkRV/HI0ka2KYPSHbk48AXq+qEJLsCuwHvBa6uqg8nOZPBI4jfAxzD4Hn2S4BDgfOBQ5PsyeAplEuBAm5KsraqHp79w5EG/uwP/lcv+337f/3tXvYrTdTLGUuSFzJ4UNiFAFX1RFU9AiwH1rRha4Dj2+vlwMU1cB0wP8k+wNHAuqra3MJkHbBsFg9FkjRBX5fCXgpsAv48yS1JLmhPonxxVT0E0H6/qI1fCDwwtP1Yq01WlyT1pK9gmQccDJxfVa8EfszgstdkMqJWW6g/8w2SlUnWJ1m/adOm6fYrSZqivoJljMFjja9vy5czCJrvt0tctN8bh8bvO7T9IgZf2T9Z/RmqalVVLa2qpQsWLOjsQCRJP6uXYKmq7wEPJBl/UNiRwB3AWmB8ZtcK4Ir2ei1wSpsddhjwaLtUdhVwVJI92gyyo1pNktSTPmeFnQ58us0Iuxd4C4OguyzJqcD9wIlt7JXAG4ANwE/aWKpqc5IPAje2cR+oqs2zdwiSpIl6C5aqupXBNOGJjhwxtoDTJnmf1cDqbruTJG0rP3kvSeqUwSJJ6pTBIknqlMEiSeqUwSJJ6pTBIknqlMEiSeqUwSJJ6pTBIknqlMEiSeqUwSJJ6pTBIknqlMEiSeqUwSJJ6pTBIknqlMEiSeqUwSJJ6lSvwZJklyS3JPnrtrx/kuuT3JPkM+2xxSR5blve0NYvHnqPs1r97iRH93MkkqRxfZ+xnAHcObR8LnBeVS0BHgZObfVTgYer6mXAeW0cSQ4ETgJeDiwDPpVkl1nqXZI0Qm/BkmQR8FvABW05wBHA5W3IGuD49np5W6atP7KNXw5cWlWPV9V9wAbgkNk5AknSKH2esXwMeDfwj215L+CRqnqyLY8BC9vrhcADAG39o2380/UR2/yMJCuTrE+yftOmTV0ehyRpSC/BkuRYYGNV3TRcHjG0trJuS9v8bLFqVVUtraqlCxYsmFa/kqSpm9fTfg8HjkvyBuB5wAsZnMHMTzKvnZUsAh5s48eAfYGxJPOAXwA2D9XHDW8jSepBL2csVXVWVS2qqsUMbr5/uareCFwDnNCGrQCuaK/XtmXa+i9XVbX6SW3
W2P7AEuCGWToMSdIIfZ2xTOY9wKVJPgTcAlzY6hcC/yPJBgZnKicBVNXtSS4D7gCeBE6rqqdmv21J0rjeg6WqvgJ8pb2+lxGzuqrqp8CJk2x/DnDOzHUoSZqOvj/HIknayRgskqROGSySpE4ZLJKkThkskqROGSySpE4ZLJKkThkskqROGSySpE4ZLJKkThkskqROGSySpE4ZLJKkThkskqROGSySpE4ZLJKkTvUSLEn2TXJNkjuT3J7kjFbfM8m6JPe033u0epJ8IsmGJN9IcvDQe61o4+9JsmKyfUqSZkdfZyxPAn9QVb8MHAacluRA4Ezg6qpaAlzdlgGOYfA8+yXASuB8GAQRcDZwKIMnT549HkaSpH70EixV9VBV3dxePwbcCSwElgNr2rA1wPHt9XLg4hq4DpifZB/gaGBdVW2uqoeBdcCyWTwUSdIEvd9jSbIYeCVwPfDiqnoIBuEDvKgNWwg8MLTZWKtNVpck9aTXYEny88BngXdU1Q+3NHRErbZQH7WvlUnWJ1m/adOm6TcrSZqS3oIlyXMYhMqnq+pzrfz9domL9ntjq48B+w5tvgh4cAv1Z6iqVVW1tKqWLliwoLsDkST9jL5mhQW4ELizqj46tGotMD6zawVwxVD9lDY77DDg0Xap7CrgqCR7tJv2R7WaJKkn83ra7+HAm4Hbktzaau8FPgxcluRU4H7gxLbuSuANwAbgJ8BbAKpqc5IPAje2cR+oqs2zcwiSpFF6CZaq+htG3x8BOHLE+AJOm+S9VgOru+tOkrQ9ep8VJknauRgskqROGSySpE4ZLJKkThkskqROGSySpE4ZLJKkThkskqROGSySpE4ZLJKkThkskqROGSySpE4ZLJKkThkskqRO9fU8Fkmz6Jw3ndDLft/3F5f3sl/1yzMWSVKndoozliTLgI8DuwAXVNWHe25J0hTcec6XZ32fv/y+I2Z9n882O/wZS5JdgE8CxwAHAicnObDfriTp2WtnOGM5BNhQVfcCJLkUWA7cMZWNX/Wui2ewtcnd9Cen9LJfSVv2/ve//1mxz5m0w5+xAAuBB4aWx1pNktSDVFXfPWyXJCcCR1fVW9vym4FDqur0CeNWAivb4gHA3R3sfm/gBx28T5fmYk8wN/uyp6mxp6mbi3111dMvVtWCqQzcGS6FjQH7Di0vAh6cOKiqVgGrutxxkvVVtbTL99xec7EnmJt92dPU2NPUzcW++uhpZ7gUdiOwJMn+SXYFTgLW9tyTJD1r7fBnLFX1ZJK3A1cxmG68uqpu77ktSXrW2uGDBaCqrgSu7GHXnV5a68hc7AnmZl/2NDX2NHVzsa9Z72mHv3kvSZpbdoZ7LJKkOcRg2UZJliW5O8mGJGfOgX5WJ9mY5Jt99zIuyb5JrklyZ5Lbk5wxB3p6XpIbkny99fRHffc0LskuSW5J8td99zIuybeT3Jbk1iTr++4HIMn8JJcnuav92fq1nvs5oP33Gf/5YZJ39NlT6+v325/xbya5JMnzZm3fXgqbvvY1Mn8HvJ7BdOcbgZOrakqf9p+hnl4L/Ai4uKoO6quPYUn2AfapqpuTvAC4CTi+5/9OAXavqh8leQ7wN8AZVXVdXz2NS/JOYCnwwqo6tu9+YBAswNKqmjOfzUiyBvg/VXVBmwm6W1U90ndf8PTfDd8FDq2q7/TYx0IGf7YPrKp/SHIZcGVVXTQb+/eMZds8/TUyVfUEMP41Mr2pqmuBzX32MFFVPVRVN7fXjwF30vO3ItTAj9ric9pP7/+6SrII+C3ggr57mcuSvBB4LXAhQFU9MVdCpTkS+FafoTJkHvD8JPOA3Rjx+b6ZYrBsG79GZpqSLAZeCVzfbydPX3K6FdgIrKuq3nsCPga8G/jHvhuZoIAvJbmpfXtF314KbAL+vF02vCDJ7n03NeQk4JK+m6iq7wIfAe4HHgIeraovzdb+DZZtkxG13v/VO1cl+Xngs8A7quqHffdTVU9V1a8y+JaGQ5L0eukwybH
Axqq6qc8+JnF4VR3M4NvDT2uXXPs0DzgYOL+qXgn8GOj9HidAuyx3HPA/50AvezC4irI/8BJg9yRvmq39GyzbZkpfIyNo9zE+C3y6qj7Xdz/D2iWUrwDLem7lcOC4dj/jUuCIJH/Rb0sDVfVg+70R+DyDy8B9GgPGhs4yL2cQNHPBMcDNVfX9vhsBfhO4r6o2VdX/Az4HvGa2dm6wbBu/RmYK2o3yC4E7q+qjffcDkGRBkvnt9fMZ/A94V589VdVZVbWoqhYz+LP05aqatX9dTibJ7m3SBe1y01FAr7MOq+p7wANJDmilI5niIzJmwcnMgctgzf3AYUl2a/8fHsngHues2Ck+eT/b5uLXyCS5BHgdsHeSMeDsqrqwz54Y/Ev8zcBt7Z4GwHvbNyX0ZR9gTZu983PAZVU1Z6b3zjEvBj4/+HuJecBfVtUX+20JgNOBT7d/1N0LvKXnfkiyG4NZov++714Aqur6JJcDNwNPArcwi5/Ad7qxJKlTXgqTJHXKYJEkdcpgkSR1ymCRJHXKYJEkdcpgkSR1ymCRJHXKYJEkder/A8B9dVfl2iQgAAAAAElFTkSuQmCC\n", 1000 | "text/plain": [ 1001 | "" 1002 | ] 1003 | }, 1004 | "metadata": {}, 1005 | "output_type": "display_data" 1006 | } 1007 | ], 1008 | "source": [ 1009 | "sns.barplot(x=age_df.index, y=age_df['age_bin'])" 1010 | ] 1011 | }, 1012 | { 1013 | "cell_type": "markdown", 1014 | "metadata": {}, 1015 | "source": [ 1016 | "## Using K-Mode with \"Cao\" initialization" 1017 | ] 1018 | }, 1019 | { 1020 | "cell_type": "code", 1021 | "execution_count": 42, 1022 | "metadata": { 1023 | "scrolled": true 1024 | }, 1025 | "outputs": [ 1026 | { 1027 | "name": "stdout", 1028 | "output_type": "stream", 1029 | "text": [ 1030 | "Init: initializing centroids\n", 1031 | "Init: initializing clusters\n", 1032 | "Starting iterations...\n", 1033 | "Run 1, iteration: 1/100, moves: 5322, cost: 192203.0\n", 1034 | "Run 1, iteration: 2/100, moves: 1160, cost: 192203.0\n" 1035 | ] 1036 | } 1037 | ], 1038 | "source": [ 1039 | "km_cao = KModes(n_clusters=2, init = \"Cao\", n_init = 1, verbose=1)\n", 1040 | "fitClusters_cao = km_cao.fit_predict(bank_cust)" 1041 | ] 1042 | }, 1043 | { 1044 | "cell_type": "code", 1045 | "execution_count": 43, 1046 | "metadata": {}, 1047 | "outputs": [ 1048 | { 1049 | "data": { 1050 | "text/plain": [ 1051 | "array([1, 1, 0, ..., 0, 1, 0], dtype=uint8)" 1052 | ] 1053 | }, 1054 | "execution_count": 43, 1055 | "metadata": {}, 1056 | "output_type": "execute_result" 1057 | } 1058 | ], 1059 | 
"source": [ 1060 | "# Predicted Clusters\n", 1061 | "fitClusters_cao" 1062 | ] 1063 | }, 1064 | { 1065 | "cell_type": "code", 1066 | "execution_count": 44, 1067 | "metadata": {}, 1068 | "outputs": [], 1069 | "source": [ 1070 | "clusterCentroidsDf = pd.DataFrame(km_cao.cluster_centroids_)\n", 1071 | "clusterCentroidsDf.columns = bank_cust.columns" 1072 | ] 1073 | }, 1074 | { 1075 | "cell_type": "code", 1076 | "execution_count": 45, 1077 | "metadata": {}, 1078 | "outputs": [ 1079 | { 1080 | "data": { 1081 | "text/html": [ 1082 | "
   job  marital  education  default  housing  loan  contact  month  day_of_week  poutcome  age_bin
0    0        1          6        0        2     0        0      6            2         1        2
1    1        1          3        0        0     0        1      6            0         1        3
" 1145 | ], 1146 | "text/plain": [ 1147 | " job marital education default housing loan contact month \\\n", 1148 | "0 0 1 6 0 2 0 0 6 \n", 1149 | "1 1 1 3 0 0 0 1 6 \n", 1150 | "\n", 1151 | " day_of_week poutcome age_bin \n", 1152 | "0 2 1 2 \n", 1153 | "1 0 1 3 " 1154 | ] 1155 | }, 1156 | "execution_count": 45, 1157 | "metadata": {}, 1158 | "output_type": "execute_result" 1159 | } 1160 | ], 1161 | "source": [ 1162 | "# Mode of the clusters\n", 1163 | "clusterCentroidsDf" 1164 | ] 1165 | }, 1166 | { 1167 | "cell_type": "markdown", 1168 | "metadata": {}, 1169 | "source": [ 1170 | "## Using K-Mode with \"Huang\" initialization" 1171 | ] 1172 | }, 1173 | { 1174 | "cell_type": "code", 1175 | "execution_count": 46, 1176 | "metadata": {}, 1177 | "outputs": [ 1178 | { 1179 | "name": "stdout", 1180 | "output_type": "stream", 1181 | "text": [ 1182 | "Init: initializing centroids\n", 1183 | "Init: initializing clusters\n", 1184 | "Starting iterations...\n", 1185 | "Run 1, iteration: 1/100, moves: 8403, cost: 195645.0\n", 1186 | "Run 1, iteration: 2/100, moves: 0, cost: 195645.0\n" 1187 | ] 1188 | } 1189 | ], 1190 | "source": [ 1191 | "km_huang = KModes(n_clusters=2, init = \"Huang\", n_init = 1, verbose=1)\n", 1192 | "fitClusters_huang = km_huang.fit_predict(bank_cust)" 1193 | ] 1194 | }, 1195 | { 1196 | "cell_type": "code", 1197 | "execution_count": 47, 1198 | "metadata": {}, 1199 | "outputs": [ 1200 | { 1201 | "data": { 1202 | "text/plain": [ 1203 | "array([0, 0, 1, ..., 0, 0, 1], dtype=uint8)" 1204 | ] 1205 | }, 1206 | "execution_count": 47, 1207 | "metadata": {}, 1208 | "output_type": "execute_result" 1209 | } 1210 | ], 1211 | "source": [ 1212 | "# Predicted clusters\n", 1213 | "fitClusters_huang" 1214 | ] 1215 | }, 1216 | { 1217 | "cell_type": "markdown", 1218 | "metadata": {}, 1219 | "source": [ 1220 | "## Choosing K by comparing Cost against each K" 1221 | ] 1222 | }, 1223 | { 1224 | "cell_type": "code", 1225 | "execution_count": 68, 1226 | "metadata": {}, 1227 | 
"outputs": [ 1228 | { 1229 | "name": "stdout", 1230 | "output_type": "stream", 1231 | "text": [ 1232 | "Init: initializing centroids\n", 1233 | "Init: initializing clusters\n", 1234 | "Starting iterations...\n", 1235 | "Run 1, iteration: 1/100, moves: 0, cost: 258139.0\n", 1236 | "Init: initializing centroids\n", 1237 | "Init: initializing clusters\n", 1238 | "Starting iterations...\n", 1239 | "Run 1, iteration: 1/100, moves: 5319, cost: 233390.0\n", 1240 | "Run 1, iteration: 2/100, moves: 1165, cost: 233389.0\n", 1241 | "Run 1, iteration: 3/100, moves: 0, cost: 233389.0\n", 1242 | "Init: initializing centroids\n", 1243 | "Init: initializing clusters\n", 1244 | "Starting iterations...\n", 1245 | "Run 1, iteration: 1/100, moves: 4991, cost: 226325.0\n", 1246 | "Run 1, iteration: 2/100, moves: 1369, cost: 226323.0\n", 1247 | "Run 1, iteration: 3/100, moves: 0, cost: 226323.0\n", 1248 | "Init: initializing centroids\n", 1249 | "Init: initializing clusters\n", 1250 | "Starting iterations...\n", 1251 | "Run 1, iteration: 1/100, moves: 0, cost: 223701.0\n" 1252 | ] 1253 | } 1254 | ], 1255 | "source": [ 1256 | "cost = []\n", 1257 | "for num_clusters in list(range(1,5)):\n", 1258 | " kmode = KModes(n_clusters=num_clusters, init = \"Cao\", n_init = 1, verbose=1)\n", 1259 | " kmode.fit_predict(bank_cust)\n", 1260 | " cost.append(kmode.cost_)" 1261 | ] 1262 | }, 1263 | { 1264 | "cell_type": "code", 1265 | "execution_count": 69, 1266 | "metadata": {}, 1267 | "outputs": [ 1268 | { 1269 | "data": { 1270 | "text/plain": [ 1271 | "[]" 1272 | ] 1273 | }, 1274 | "execution_count": 69, 1275 | "metadata": {}, 1276 | "output_type": "execute_result" 1277 | }, 1278 | { 1279 | "data": { 1280 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAY0AAAD8CAYAAACLrvgBAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3XucVeV97/HPd2C4I9dRcJiRi9iKRtBMgFRjPebUoDFiGpNAUiWpKTHVFI/JiUaPJrXNeWn7ikbTNKmpOdXUG1Ub0GoNTUy0rWIG5CKiEVFhAHUQ5BrRgd/5Yz8Dm3GYWcDM7Mt836/XfrH2s5615lksne886/nNRhGBmZlZFhWFHoCZmZUOh4aZmWXm0DAzs8wcGmZmlplDw8zMMnNomJlZZg4NMzPLzKFhZmaZOTTMzCyznoUeQEcbPnx4jB49utDDMDMrKYsWLdoYEVXt9Su70Bg9ejT19fWFHoaZWUmR9FqWfn48ZWZmmTk0zMwsM4eGmZll5tAwM7PMHBpmZpaZQ8PMzDJzaJiZWWYOjWTVm9u58d9fwP/8rZnZgTk0kl+9+CY//NXL3PPM2kIPxcysaDk0kj89dQynHTuc6x9ewao3txd6OGZmRcmhkVRUiO9+ZiJ9K3sw595nebdpT6GHZGZWdBwaeY46og83fuokVqzfynd//mKhh2NmVnQcGi2cdcIIPjelln94YjX/tWpjoYdjZlZUHBqtuPbjExhX1Z8r5i5h8453Cz0cM7Oi4dBoRd9ePbhlxsls2vEuVz6wzGW4ZmaJQ+MATqwexP/+2O/x8+ff4N7fuAzXzAwcGm360mljc2W4Dz3Py40uwzUzc2i0obkMt09lhctwzcxwaLSruQz3uXUuwzUzc2hk4DJcM7Mch0ZG1358AmNdhmtm3ZxDI6O+vXpwayrDvepBl+GaWffk0DgIzWW4j61wGa6ZdU/thoakGkmPS1opaYWkOan925LWSVqSXuek9tGSfpfX/qO8c31Q0nJJqyTdKkmpfaikBZJeSn8OSe1K/VZJWibplM75a8jOZbhm1p1lmWk0AV+LiOOBqcClkiakfTdHxKT0eiTvmJfz2i/Ja/8hMBsYn17TUvtVwC8iYjzwi/Qe4Oy8vrPT8QXlMlwz687aDY2I2BARi9P2NmAlUH2wX0jSSOCIiHgqcgsCdwLnp93TgTvS9h0t2u+MnKeBwek8BXXUEX24wWW4ZtYNHdSahqTRwMnAwtR0WXps9JPmR0rJGEnPSvq1pI+ktmqgIa9PA/vC56iI2AC5kAKOzDtm7QGOyR/XbEn1kuobGxsP5pIO2cdOGMHMyS7DNbPuJXNoSBoAPABcHhFbyT0qGgdMAjYA301dNwC1EXEycAVwt6QjALVy2vZKkDIdExG3RURdRNRVVVVlup6OcO25x7sM18y6lUyhIamSXGDcFREPAkTEGxGxOyL2AD8GJqf2XRHxVtpeBLwMHEduljAq77SjgPVp+43mx07pzzdTewNQc4BjCq5fr54uwzWzbiVL9ZSA24GVEXFTXnv+2sIngedSe5WkHml7LLlF7NXpsdM2SVPTOS8C5qXj5wOz0vasFu0XpSqqqcCW5sdYxcJluGbWnfTM0OdU4EJguaQlqe1qYKakSeQeF70KfDntOx24XlITsBu4JCI2pX1fAf4J6As8ml4ANwBzJV0MrAE+ndofAc4BVgE7gS8e/CV2vi+dNpYnfruR6x96nsljhjKuakChh2Rm1ilUbo9U6urqor6+vsu/7htb3+Fj33uCUUP68uBXTqVXT//epJmVDkmLIqKuvX7+ztZB/Gm4ZtYdODQ6kMtwzazcOTQ6mMtwzaycOTQ6mMtwzaycOTQ6gctwzaxcOTQ6yZdOG8upxw7zp+GaWVlxaHSSigrx3U9Porc/DdfMyohDoxONGJRXhrvAZbhmVvocGp2suQz3tidW898uwzWzEufQ6ALXnns8Y4b353+5DNf
MSpxDowu4DNfMyoVDo4u4DNfMyoFDowu5DNfMSp1Dowvll+Fefu8Sl+GaWclxaHSx5jLc5eu2uAzXzEqOQ6MAXIZrZqXKoVEgLsM1s1Lk0CgQl+GaWSlyaBTQidWD+PpZuTLc+1yGa2YloN3QkFQj6XFJKyWtkDQntX9b0jpJS9LrnLxjvilplaQXJX0sr31aalsl6aq89jGSFkp6SdJ9knql9t7p/aq0f3RHXnwx+LOP5Mpw/9JluGZWArLMNJqAr0XE8cBU4FJJE9K+myNiUno9ApD2zQBOAKYBfy+ph6QewA+As4EJwMy889yYzjUe2AxcnNovBjZHxLHAzalfWXEZrpmVknZDIyI2RMTitL0NWAlUt3HIdODeiNgVEa8Aq4DJ6bUqIlZHxLvAvcB0SQLOBO5Px98BnJ93rjvS9v3AR1P/suIyXDMrFQe1ppEeD50MLExNl0laJuknkoaktmog/wF9Q2o7UPsw4O2IaGrRvt+50v4tqX/ZcRmumZWCzKEhaQDwAHB5RGwFfgiMAyYBG4DvNndt5fA4hPa2ztVybLMl1Uuqb2xsbPM6illzGe4Vc5e6DNfMilKm0JBUSS4w7oqIBwEi4o2I2B0Re4Afk3v8BLmZQk3e4aOA9W20bwQGS+rZon2/c6X9g4BNLccXEbdFRF1E1FVVVWW5pKLUXIb71o5dfPPB5S7DNbOik6V6SsDtwMqIuCmvfWRet08Cz6Xt+cCMVPk0BhgPPAP8BhifKqV6kVssnx+574yPAxek42cB8/LONSttXwD8Msr8O2lzGe6/r3jdZbhmVnR6tt+FU4ELgeWSlqS2q8lVP00i97joVeDLABGxQtJc4HlylVeXRsRuAEmXAY8BPYCfRMSKdL4rgXsl/TXwLLmQIv35U0mryM0wZhzGtZaMP/vIWJ54qZG/fOh5PjRmKOOqBhR6SGZmAKjcfnCvq6uL+vr6Qg/jsL2+5R2m3fIENUP68cBX/oBePf17mGbWeSQtioi69vr5O1GRGjGoDzf8sctwzay4ODSK2LQTRzBzco3LcM2saDg0ity1505wGa6ZFQ2HRpFzGa6ZFROHRglwGa6ZFQuHRonwp+GaWTFwaJQIfxqumRUDh0YJyS/DvWnBbws9HDPrhhwaJaa5DPcfnniZ/37ZZbhm1rUcGiVobxnufS7DNbOu5dAoQS7DNbNCcWiUKJfhmlkhODRK2J99ZCx/MC5XhrvaZbhm1gUcGiWsokLc9JlcGe4cl+GaWRdwaJQ4l+GaWVdyaJQBl+GaWVdxaJQJl+GaWVdwaJQJl+GaWVdwaJSRE6sH8bVUhju33mW4ZtbxHBplZnYqw/32fJfhmlnHazc0JNVIelzSSkkrJM1psf/rkkLS8PT+DElbJC1Jr+vy+k6T9KKkVZKuymsfI2mhpJck3SepV2rvnd6vSvtHd9SFlyuX4ZpZZ8oy02gCvhYRxwNTgUslTYBcoAB/BKxpccyTETEpva5PfXsAPwDOBiYAM5vPA9wI3BwR44HNwMWp/WJgc0QcC9yc+lk7XIZrZp2l3dCIiA0RsThtbwNWAtVp983AN4Asq66TgVURsToi3gXuBaZLEnAmcH/qdwdwftqent6T9n809bd2uAzXzDrDQa1ppMdDJwMLJZ0HrIuIpa10/bCkpZIelXRCaqsG8ldnG1LbMODtiGhq0b7fMWn/ltTfMrj23AmMGeYyXDPrOJlDQ9IA4AHgcnKPrK4Brmul62LgmIiYCHwf+FnzKVrpG220t3VMy7HNllQvqb6xsbHN6+hO+vXqyS2pDPfqf3UZrpkdvkyhIamSXGDcFREPAuOAMcBSSa8Co4DFkkZExNaI2A4QEY8AlWmRvAGoyTvtKGA9sBEYLKlni3byj0n7BwGbWo4vIm6LiLqIqKuqqsp88d3BB0blynAffc5luGZ2+LJUTwm4HVgZETcBRMTyiDgyIkZHxGhy39xPiYjXJY1oXneQNDl9jbeA3wDjU6VUL2A
GMD9yP/4+DlyQvuQsYF7anp/ek/b/Mvzj8kFzGa6ZdZQsM41TgQuBM/PKaM9po/8FwHOSlgK3AjMipwm4DHiM3GL63IhYkY65ErhC0ipyaxa3p/bbgWGp/QrgKuyguQzXzDqKyu0H97q6uqivry/0MIrSvz/3Opf88yIu+cNxXHX27xd6OGZWRCQtioi69vr5N8K7kWknjmDGh1yGa2aHzqHRzVz3iX1luG/vdBmumR0ch0Y3k1+G60/DNbOD5dDohlyGa2aHyqHRTbkM18wOhUOjm2ouw+3V02W4ZpadQ6MbGzGoDzd+6gMsX7eFm//Dn4ZrZu1zaHRz004cyYwP1fCjX7sM18za59Awl+GaWWYODXMZrpll5tAwwGW4ZpaNQ8P2chmumbXHoWF7VVSI735mostwzeyAHBq2n5GD+roM18wOyKFh7+MyXDM7EIeGtcpluGbWGoeGtcpluGbWGoeGHVB+Ge6/1DcUejhmVgQcGtam2R8Zy4fHDuPbD63glY07Cj0cMyswh4a1qaJC3PTZiVT2qGDOvc+6DNesm2s3NCTVSHpc0kpJKyTNabH/65JC0vD0XpJulbRK0jJJp+T1nSXppfSaldf+QUnL0zG3SlJqHyppQeq/QNKQjrt0y6q5DHdZg8twzbq7LDONJuBrEXE8MBW4VNIEyAUK8EfAmrz+ZwPj02s28MPUdyjwLWAKMBn4Vl4I/DD1bT5uWmq/CvhFRIwHfpHeWwG4DNfMIENoRMSGiFictrcBK4HqtPtm4BtAfmnNdODOyHkaGCxpJPAxYEFEbIqIzcACYFrad0REPBW5Ep07gfPzznVH2r4jr90KwGW4ZnZQaxqSRgMnAwslnQesi4ilLbpVA/mfeNeQ2tpqb2ilHeCoiNgAufACjjzAuGZLqpdU39jYeDCXZAehX6+efG/GJDZudxmuWXeVOTQkDQAeAC4n98jqGuC61rq20haH0J5ZRNwWEXURUVdVVXUwh9pBOmnUYJfhmnVjmUJDUiW5wLgrIh4ExgFjgKWSXgVGAYsljSA3U6jJO3wUsL6d9lGttAO8kR5fkf5882AuzjrHl093Ga5Zd5WlekrA7cDKiLgJICKWR8SRETE6IkaT+8Z/SkS8DswHLkpVVFOBLenR0mPAWZKGpAXws4DH0r5tkqamr3URMC99+flAc5XVrLx2KyCX4Zp1X1lmGqcCFwJnSlqSXue00f8RYDWwCvgx8OcAEbEJ+CvgN+l1fWoD+Arwj+mYl4FHU/sNwB9JeolcldYNB3Ft1olchmvWPancFjPr6uqivr6+0MPoNq56YBn31a/lri9N4Q/GDS/0cMzsEElaFBF17fXzb4TbYbn23AmMdhmuWbfh0LDD0r93T25JZbhX/6vLcM3KnUPDDltzGe4jy12Ga1buHBrWIVyGa9Y9ODSsQ7gM16x7cGhYhxk5qC83/HGuDPd7LsM1K0sODetQZ39gJJ+tq+GHv36Zp15+q9DDMbMO5tCwDnfdJ1IZ7twlLsM1KzMODetwzWW4jdtchmtWbhwa1ilchmtWnhwa1mlchmtWfhwa1mlaluG+t9tluGalzqFhnSq/DPfmBS7DNSt1Dg3rdC7DNSsfDg3rEi7DNSsPDg3rEi7DNSsPDg3rMi7DNSt9Dg3rUrNPH8vUsUNdhmtWohwa1qV6VIibPzuJyh4VXO4yXLOS025oSKqR9LiklZJWSJqT2v9K0jJJSyT9XNLRqf0MSVtS+xJJ1+Wda5qkFyWtknRVXvsYSQslvSTpPkm9Unvv9H5V2j+6o/8CrOs1l+EudRmuWcnJMtNoAr4WEccDU4FLJU0A/jYiToqIScDDwHV5xzwZEZPS63oAST2AHwBnAxOAmek8ADcCN0fEeGAzcHFqvxjYHBHHAjenflYGXIZrVpraDY2I2BARi9P2NmAlUB0RW/O69QfaK4eZDKyKiNUR8S5wLzBdkoAzgftTvzuA89P29PSetP+jqb+VAZfhmpWeg1r
TSI+HTgYWpvffkbQW+Dz7zzQ+LGmppEclnZDaqoG1eX0aUtsw4O2IaGrRvt8xaf+W1N/KgMtwzUpP5tCQNAB4ALi8eZYREddERA1wF3BZ6roYOCYiJgLfB37WfIpWThtttLd1TMuxzZZUL6m+sbEx6yVZEThp1GCuOOu4XBnuIpfhmhW7TKEhqZJcYNwVEQ+20uVu4FMAEbE1Iran7UeASknDyc0gavKOGQWsBzYCgyX1bNFO/jFp/yBgU8svHhG3RURdRNRVVVVluSQrIl8+fVyuDHe+y3DNil2W6ikBtwMrI+KmvPbxed3OA15I7SOa1x0kTU5f4y3gN8D4VCnVC5gBzI/cM4nHgQvSuWYB89L2/PSetP+X4WcYZcdluGalI8tM41TgQuDMvDLac4AbJD0naRlwFjAn9b8AeE7SUuBWYEbkNJF7hPUYucX0uRGxIh1zJXCFpFXk1ixuT+23A8NS+xXA3jJdKy8uwzUrDSq3H9zr6uqivr6+0MOwQ3Tl/cuYu2gtd39pKh8e55oHs64iaVFE1LXXz78RbkUlvwx3y873Cj0cM2vBoWFFpX/vnnzvsy7DNStWDg0rOhNrcmW4/7Z8g8twzYqMQ8OKUn4Zbv2r76uyNrMCcWhYUWouw+3XqwcX/Ogpzv3+k9y18DW272pq/2Az6zSunrKitvWd95j37DruWriGF17fRr9ePZg+6WhmTq7lA9WD8EeRmXWMrNVTDg0rCRHBkrVvc88za3ho6QZ+995uTjj6CD43pZbzJh7NwD6VhR6iWUlzaFjZam32cd7Eo/ncFM8+zA6VQ8PK3oFmHzMn1zJ9kmcfZgfDoWHdyoFmHzMn13LSKM8+zNrj0LBuybMPs0Pj0LBub+s77zFvyXruXriGlRu2evZh1gaHhlkSESxt2MLdC1/bO/uYMDJXeeXZh1mOQ8OsFS1nH30r91VeefZh3ZlDw6wNzbOPexauYf7S9XtnHzOn1HK+Zx/WDTk0zDLy7MPMoVHoYVgJamv2MX3S0Rzh2YeVMYeG2WHY9s57/KyV2cfMKbVM9OzDypBDw6wDRATLGrZwt2cfVuYcGmYdbFve2sfznn1YmemwfyNcUo2kxyWtlLRC0pzU/leSlklaIunnko5O7ZJ0q6RVaf8peeeaJeml9JqV1/5BScvTMbcq/d8naaikBan/AklDDuUvw6wjDOxTyZ9MPYZ/+4vTmHfpqUyfdDQPLVvP+T/4L8659T/56dOvsfUd/7vmVt7anWlIGgmMjIjFkgYCi4DzgYaI2Jr6/AUwISIukXQO8FXgHGAKcEtETJE0FKgH6oBI5/lgRGyW9AwwB3gaeAS4NSIelfQ3wKaIuEHSVcCQiLiyrfF6pmFdqbXZxycmjuRzU47x7MNKStaZRs/2OkTEBmBD2t4maSVQHRHP53XrTy4IAKYDd0YujZ6WNDgFzxnAgojYlAa4AJgm6VfAERHxVGq/k1woPZrOdUY67x3Ar4A2Q8OsKzXPPj4/pZZlDVu455nc2sfc+gaOH3kEn5tcw/STq732YWWj3dDIJ2k0cDKwML3/DnARsAX4H6lbNbA277CG1NZWe0Mr7QBHpdAiIjZIOvJgxmvWVSQxsWYwE2sGc83Hj987+7h23gr+7yMv8ImJI5k5uZZJNYM9+7CSljk0JA0AHgAub34sFRHXANdI+iZwGfAtoLX/I+IQ2jOTNBuYDVBbW3swh5p1uPzZx/J1+yqvPPuwctDuQjiApEpygXFXRDzYSpe7gU+l7QagJm/fKGB9O+2jWmkHeCM92mpeW3mztfFFxG0RURcRdVVVVVkuyazTSeKkUYO54VMnsfDqj/KdT55IheDaeSuY8p1f8I37l/Lsms2UWwWjlbcs1VMCbgdWRsRNee3j87qdB7yQtucDF6UqqqnAlvSI6THgLElDUhXUWcBjad82SVPT17oImJd3ruYqq1l57WYlZWCfSj4/5Rge/uppzL8sV3n18LINfPLv/5uzb3m
Snz71qiuvrCRkqZ46DXgSWA7sSc1XAxcDv5faXgMuiYh16Rv/3wHTgJ3AFyOiPp3rT9OxAN+JiP+X2uuAfwL6klsA/2pEhKRhwFygFlgDfLp5If1AXD1lpWLbO+8xf2lu7WPF+n2VV177sELwL/eZlZBlDbl/bXDekvXsfHc3vz9iIJ+fUuu1D+syDg2zErR9VxPzlqzbO/voU1nBJ07KfeKuZx/WmRwaZiVuecMW7n7mtf1mH5+bUsv5nn1YJ3BomJWJA80+Zk6p5WTPPqyDODTMylBu9rGG+UvWsSNv9jF9UjWD+nr2YYfOoWFWxrbvamL+kvXc/cxrPLfOsw87fA4Ns27Csw/rCA4Ns26mefZxzzNrWL5uC30qKzg3VV559mHtcWiYdWOtzT5mTs5VXnn2Ya1xaJgZ23c18VD6rfP82cfMybWcUuvZh+3j0DCz/Xj2YW1xaJhZqzz7sNY4NMysXcsbtnDPb9Yw71nPPro7h4aZZdY8+7jnmTUsa/DsoztyaJjZIXluXW7tw7OP7sWhYWaHZceuJua3mH18/AO53/vw7KP8ODTMrMO0nH2MGtKX444ayJjh/RkzvD9jh/dnTFV/RhzRx2FSohwaZtbhmmcfT/y2kVc27uDVt3bwznt79u7vW9mD0c0h0vyqyr0f3K9XAUdu7XFomFmn27MneH3rO7yycQerN+7glcYdvLJxO69s3MHazb9j955931+G9KtMQTKAsVX7QmX0sP707dWjgFdhkD00enbFYMysPFVUiKMH9+XowX059djh++17t2kPazfvTEGSQmXjdv5zVSMPLG7Yr+/Rg/owZm+QDNg7Uxk1pC89e1R05SVZOxwaZtYpevWsYFzVAMZVDXjfvh27mnhl4473veYvWc/Wd5r29utZIWqH9ct73DUgt4ZS1Z8jB/b2+kkBtBsakmqAO4ERwB7gtoi4RdLfAp8A3gVeBr4YEW9LGg2sBF5Mp3g6Ii5J5/og8E9AX+ARYE5EhKShwH3AaOBV4DMRsVm5/yJuAc4BdgJfiIjFh3/ZZlZI/Xv35MTqQZxYPWi/9ohg8873eGXjdlY37h8oT760kV1N+9ZP+vXq8b6F+DHDBzBmWH8G9XNpcGdpd01D0khgZEQsljQQWAScD4wCfhkRTZJuBIiIK1NoPBwRJ7ZyrmeAOcDT5ELj1oh4VNLfAJsi4gZJVwFD0rnOAb5KLjSmALdExJS2xus1DbPytGdPsGHrO3vXTVbnBcraTTvJWz5hWP9e+9ZM8kJl9LD+9Kn0+klrOmxNIyI2ABvS9jZJK4HqiPh5XrengQvaGdBI4IiIeCq9v5Nc+DwKTAfOSF3vAH4FXJna74xcsj0tabCkkWlMZtaNVFSI6sF9qR7cl9PGv3/9ZM2mnSlEcgvxqxt38OvfNvIvi/atn0hw9KC++yq78qq7qgd7/SSLg1rTSLOIk4GFLXb9KbnHS83GSHoW2Ar8n4h4EqgG8le/GlIbwFHNQRARGyQdmdqrgbWtHLNfaEiaDcwGqK2tPZhLMrMy0KtnBcceOYBjjxwAHLXfvu27mni1lequnz27jm279q2fVPYQtUP7va+6a+zw/lR5/WSvzKEhaQDwAHB5RGzNa78GaALuSk0bgNqIeCutYfxM0glAa3/j7dX7ZjomIm4DboPc46n2rsXMuo8BbayfvLXj3dzspHFfddcrG3fwxEuNvJu3ftK/V499ayb5v4dS1Z8j+nSv9ZNMoSGpklxg3BURD+a1zwLOBT6aHiEREbuAXWl7kaSXgePIzRJG5Z12FLA+bb/R/NgpPcZ6M7U3ADUHOMbM7JBJYviA3gwf0JsPjR66377de4L1b/9uv4X41Rt3sGTtZh5etp78peDhA3rlPe7aV91VO7RfWa6fZKmeEnA7sDIibsprn0Zu3eEPI2JnXnsVuUXt3ZLGAuOB1RGxSdI2SVPJPd66CPh+Omw+MAu4If05L6/9Mkn3klsI3+L1DDPrbD0qRM3
QftQM7cfpx1Xtt29X027Wbtq5X3XX6o07ePzFRubW779+Uj24b4uZSe53UI4e3JceFaX5uCvLTONU4EJguaQlqe1q4FagN7AgPetrLq09HbheUhOwG7gkIjal477CvpLbR9MLcmExV9LFwBrg06n9EXKVU6vIldx+8dAu08ysY/Tu2YNjjxzIsUcOfN++be+8x6sbd7I6PeZqfj2weB3b89ZPevWo4Jhh/fZbiG+epQwf0Kuo10/8MSJmZp0sIti4/d291V37FuV38NpbO3l39771k4G9e+b9dvz+r4GduH7ijxExMysSkqga2Juqgb2ZPKb19ZNckGzf+7hr0Wubmb+05fpJ7/0W4ZsffdUO60fvnl2zfuLQMDMroPz1kz9ssX7yznu7WbPf+kkuVH7xwhtsrH93b78KQfWQvnz9rN9j+qTqll+iQzk0zMyKVJ/KHhx31ECOO+r96ydbfvcer27M/zDIHQwf0LvTx+TQMDMrQYP6VjKxZjATawZ36df178ybmVlmDg0zM8vMoWFmZpk5NMzMLDOHhpmZZebQMDOzzBwaZmaWmUPDzMwyK7sPLJTUCLx2iIcPBzZ24HAKyddSfMrlOsDXUqwO51qOiYiq9jqVXWgcDkn1WT7lsRT4WopPuVwH+FqKVVdcix9PmZlZZg4NMzPLzKGxv9sKPYAO5GspPuVyHeBrKVadfi1e0zAzs8w80zAzs8y6XWhI+omkNyU9d4D9knSrpFWSlkk6pavHmFWGazlD0hZJS9Lruq4eY1aSaiQ9LmmlpBWS5rTSp+jvTcbrKIn7IqmPpGckLU3X8pet9Okt6b50TxZKGt31I21fxmv5gqTGvPvypUKMNQtJPSQ9K+nhVvZ17j2JiG71Ak4HTgGeO8D+c4BHAQFTgYWFHvNhXMsZwMOFHmfGaxkJnJK2BwK/BSaU2r3JeB0lcV/S3/OAtF0JLASmtujz58CP0vYM4L5Cj/swruULwN8VeqwZr+cK4O7W/jvq7HvS7WYaEfEEsKmNLtOBOyPnaWCwpJFdM7qDk+FaSkZEbIiIxWl7G7ASaPmPHRf9vcl4HSUh/T1vT28r06vlIuh04I60fT/wUUnqoiFmlvFaSoKkUcAD0y2/AAACXklEQVTHgX88QJdOvSfdLjQyqAbW5r1voET/p08+nKbkj0o6odCDySJNp08m99NgvpK6N21cB5TIfUmPQZYAbwILIuKA9yQimoAtwLCuHWU2Ga4F4FPp0ef9kmq6eIhZfQ/4BrDnAPs79Z44NN6vtUQuyZ9IgMXkPhpgIvB94GcFHk+7JA0AHgAuj4itLXe3ckhR3pt2rqNk7ktE7I6IScAoYLKkE1t0KZl7kuFaHgJGR8RJwH+w76f1oiHpXODNiFjUVrdW2jrsnjg03q8ByP8JYxSwvkBjOSwRsbV5Sh4RjwCVkoYXeFgHJKmS3DfauyLiwVa6lMS9ae86Su2+AETE28CvgGktdu29J5J6AoMo8kemB7qWiHgrInaltz8GPtjFQ8viVOA8Sa8C9wJnSvrnFn069Z44NN5vPnBRqtSZCmyJiA2FHtShkDSi+VmmpMnk7vdbhR1V69I4bwdWRsRNB+hW9Pcmy3WUyn2RVCVpcNruC/xP4IUW3eYDs9L2BcAvI63AFpMs19Jifew8cutRRSUivhkRoyJiNLlF7l9GxJ+06Nap96RnR52oVEi6h1z1ynBJDcC3yC2KERE/Ah4hV6WzCtgJfLEwI21fhmu5APiKpCbgd8CMYvwfOjkVuBBYnp47A1wN1EJJ3Zss11Eq92UkcIekHuSCbW5EPCzpeqA+IuaTC8ifSlpF7qfZGYUbbpuyXMtfSDoPaCJ3LV8o2GgPUlfeE/9GuJmZZebHU2ZmlplDw8zMMnNomJlZZg4NMzPLzKFhZmaZOTTMzCwzh4aZmWXm0DAzs8z+PyHUvLLc9lRXAAAAAElFTkSuQmCC\n", 1281 | "text/plain": [ 1282 | 
"" 1283 | ] 1284 | }, 1285 | "metadata": {}, 1286 | "output_type": "display_data" 1287 | } 1288 | ], 1289 | "source": [ 1290 | "y = np.array([i for i in range(1,5,1)])\n", 1291 | "plt.plot(y,cost)" 1292 | ] 1293 | }, 1294 | { 1295 | "cell_type": "code", 1296 | "execution_count": 50, 1297 | "metadata": {}, 1298 | "outputs": [], 1299 | "source": [ 1300 | "## Choosing K=2" 1301 | ] 1302 | }, 1303 | { 1304 | "cell_type": "code", 1305 | "execution_count": 51, 1306 | "metadata": {}, 1307 | "outputs": [ 1308 | { 1309 | "name": "stdout", 1310 | "output_type": "stream", 1311 | "text": [ 1312 | "Init: initializing centroids\n", 1313 | "Init: initializing clusters\n", 1314 | "Starting iterations...\n", 1315 | "Run 1, iteration: 1/100, moves: 5322, cost: 192203.0\n", 1316 | "Run 1, iteration: 2/100, moves: 1160, cost: 192203.0\n" 1317 | ] 1318 | } 1319 | ], 1320 | "source": [ 1321 | "km_cao = KModes(n_clusters=2, init = \"Cao\", n_init = 1, verbose=1)\n", 1322 | "fitClusters_cao = km_cao.fit_predict(bank_cust)" 1323 | ] 1324 | }, 1325 | { 1326 | "cell_type": "code", 1327 | "execution_count": 52, 1328 | "metadata": {}, 1329 | "outputs": [ 1330 | { 1331 | "data": { 1332 | "text/plain": [ 1333 | "array([1, 1, 0, ..., 0, 1, 0], dtype=uint8)" 1334 | ] 1335 | }, 1336 | "execution_count": 52, 1337 | "metadata": {}, 1338 | "output_type": "execute_result" 1339 | } 1340 | ], 1341 | "source": [ 1342 | "fitClusters_cao" 1343 | ] 1344 | }, 1345 | { 1346 | "cell_type": "markdown", 1347 | "metadata": {}, 1348 | "source": [ 1349 | "### Combining the predicted clusters with the original DF." 
1350 | ] 1351 | }, 1352 | { 1353 | "cell_type": "code", 1354 | "execution_count": 53, 1355 | "metadata": {}, 1356 | "outputs": [], 1357 | "source": [ 1358 | "bank_cust = bank_cust.reset_index()\n", 1359 | "clustersDf = pd.DataFrame(fitClusters_cao)\n", 1360 | "clustersDf.columns = ['cluster_predicted']\n", 1361 | "combinedDf = pd.concat([bank_cust, clustersDf], axis = 1).reset_index()\n", 1362 | "combinedDf = combinedDf.drop(['index', 'level_0'], axis = 1)" 1363 | ] 1364 | }, 1365 | { 1366 | "cell_type": "code", 1367 | "execution_count": 54, 1368 | "metadata": {}, 1369 | "outputs": [ 1370 | { 1371 | "data": { 1372 | "text/html": [ 1373 | "
   job  marital  education  default  housing  loan  contact  month  day_of_week  poutcome  age_bin  cluster_predicted
0    3        1          0        0        0     0        1      6            1         1        4                  1
1    7        1          3        1        0     0        1      6            1         1        4                  1
2    7        1          3        0        2     0        1      6            1         1        2                  0
3    0        1          1        0        0     0        1      6            1         1        2                  0
4    7        1          3        0        0     2        1      6            1         1        4                  1
" 1484 | ], 1485 | "text/plain": [ 1486 | " job marital education default housing loan contact month \\\n", 1487 | "0 3 1 0 0 0 0 1 6 \n", 1488 | "1 7 1 3 1 0 0 1 6 \n", 1489 | "2 7 1 3 0 2 0 1 6 \n", 1490 | "3 0 1 1 0 0 0 1 6 \n", 1491 | "4 7 1 3 0 0 2 1 6 \n", 1492 | "\n", 1493 | " day_of_week poutcome age_bin cluster_predicted \n", 1494 | "0 1 1 4 1 \n", 1495 | "1 1 1 4 1 \n", 1496 | "2 1 1 2 0 \n", 1497 | "3 1 1 2 0 \n", 1498 | "4 1 1 4 1 " 1499 | ] 1500 | }, 1501 | "execution_count": 54, 1502 | "metadata": {}, 1503 | "output_type": "execute_result" 1504 | } 1505 | ], 1506 | "source": [ 1507 | "combinedDf.head()" 1508 | ] 1509 | }, 1510 | { 1511 | "cell_type": "code", 1512 | "execution_count": 55, 1513 | "metadata": {}, 1514 | "outputs": [], 1515 | "source": [ 1516 | "# Data for Cluster1\n", 1517 | "cluster1 = combinedDf[combinedDf.cluster_predicted==1]" 1518 | ] 1519 | }, 1520 | { 1521 | "cell_type": "code", 1522 | "execution_count": 56, 1523 | "metadata": {}, 1524 | "outputs": [], 1525 | "source": [ 1526 | "# Data for Cluster0\n", 1527 | "cluster0 = combinedDf[combinedDf.cluster_predicted==0]" 1528 | ] 1529 | }, 1530 | { 1531 | "cell_type": "code", 1532 | "execution_count": 57, 1533 | "metadata": {}, 1534 | "outputs": [ 1535 | { 1536 | "name": "stdout", 1537 | "output_type": "stream", 1538 | "text": [ 1539 | "\n", 1540 | "Int64Index: 12895 entries, 0 to 41186\n", 1541 | "Data columns (total 12 columns):\n", 1542 | "job 12895 non-null int64\n", 1543 | "marital 12895 non-null int64\n", 1544 | "education 12895 non-null int64\n", 1545 | "default 12895 non-null int64\n", 1546 | "housing 12895 non-null int64\n", 1547 | "loan 12895 non-null int64\n", 1548 | "contact 12895 non-null int64\n", 1549 | "month 12895 non-null int64\n", 1550 | "day_of_week 12895 non-null int64\n", 1551 | "poutcome 12895 non-null int64\n", 1552 | "age_bin 12895 non-null int64\n", 1553 | "cluster_predicted 12895 non-null uint8\n", 1554 | "dtypes: int64(11), uint8(1)\n", 1555 | "memory usage: 
1.2 MB\n" 1556 | ] 1557 | } 1558 | ], 1559 | "source": [ 1560 | "cluster1.info()" 1561 | ] 1562 | }, 1563 | { 1564 | "cell_type": "code", 1565 | "execution_count": 58, 1566 | "metadata": {}, 1567 | "outputs": [ 1568 | { 1569 | "name": "stdout", 1570 | "output_type": "stream", 1571 | "text": [ 1572 | "\n", 1573 | "Int64Index: 28293 entries, 2 to 41187\n", 1574 | "Data columns (total 12 columns):\n", 1575 | "job 28293 non-null int64\n", 1576 | "marital 28293 non-null int64\n", 1577 | "education 28293 non-null int64\n", 1578 | "default 28293 non-null int64\n", 1579 | "housing 28293 non-null int64\n", 1580 | "loan 28293 non-null int64\n", 1581 | "contact 28293 non-null int64\n", 1582 | "month 28293 non-null int64\n", 1583 | "day_of_week 28293 non-null int64\n", 1584 | "poutcome 28293 non-null int64\n", 1585 | "age_bin 28293 non-null int64\n", 1586 | "cluster_predicted 28293 non-null uint8\n", 1587 | "dtypes: int64(11), uint8(1)\n", 1588 | "memory usage: 2.6 MB\n" 1589 | ] 1590 | } 1591 | ], 1592 | "source": [ 1593 | "cluster0.info()" 1594 | ] 1595 | }, 1596 | { 1597 | "cell_type": "code", 1598 | "execution_count": 79, 1599 | "metadata": {}, 1600 | "outputs": [], 1601 | "source": [ 1602 | "# Checking the count per category for JOB\n", 1603 | "job1_df = pd.DataFrame(cluster1['job'].value_counts())\n", 1604 | "job0_df = pd.DataFrame(cluster0['job'].value_counts())" 1605 | ] 1606 | }, 1607 | { 1608 | "cell_type": "code", 1609 | "execution_count": 80, 1610 | "metadata": {}, 1611 | "outputs": [ 1612 | { 1613 | "data": { 1614 | "image/png": 
"[base64-encoded PNG omitted: side-by-side bar plots of job value counts for cluster 1 (left) and cluster 0 (right)]\n", 1615 | "text/plain": [ 1616 | "" 1617 | ] 1618 | }, 1619 | "metadata": {}, 1620 | "output_type": "display_data" 1621 | } 1622 | ], 1623 | "source": [ 1624 | "fig, ax = plt.subplots(1, 2, figsize=(20, 5))\n", 1625 | "sns.barplot(x=job1_df.index, y=job1_df['job'], ax=ax[0])\n", 1626 | "sns.barplot(x=job0_df.index, y=job0_df['job'], ax=ax[1])\n", 1627 | "fig.show()" 1628 | ] 1629 | }, 1630 | { 1631 | "cell_type": "code", 1632 | "execution_count": 81, 1633 | "metadata": {}, 1634 | "outputs": [], 1635 | "source": [ 1636 | "age1_df = pd.DataFrame(cluster1['age_bin'].value_counts())\n", 1637 | "age0_df = pd.DataFrame(cluster0['age_bin'].value_counts())" 1638 | ] 1639 | }, 1640 | { 1641 | "cell_type": "code", 1642 | "execution_count": 83, 1643 | 
"metadata": {}, 1644 | "outputs": [ 1645 | { 1646 | "data": { 1647 | "image/png": "[base64-encoded PNG omitted: side-by-side bar plots of age_bin value counts for cluster 1 (left) and cluster 0 (right)]\n", 1648 | "text/plain": [ 1649 | "" 1650 | ] 1651 | }, 1652 | "metadata": {}, 1653 | "output_type": "display_data" 1654 | } 1655 | ], 1656 | "source": [ 1657 | "fig, ax = plt.subplots(1, 2, figsize=(20, 5))\n", 1658 | "sns.barplot(x=age1_df.index, y=age1_df['age_bin'], ax=ax[0])\n", 1659 | "sns.barplot(x=age0_df.index, y=age0_df['age_bin'], ax=ax[1])\n", 1660 | "fig.show()" 1661 | ] 1662 | 
}, 1663 | { 1664 | "cell_type": "code", 1665 | "execution_count": 63, 1666 | "metadata": {}, 1667 | "outputs": [ 1668 | { 1669 | "name": "stdout", 1670 | "output_type": "stream", 1671 | "text": [ 1672 | "1 8636\n", 1673 | "2 2732\n", 1674 | "0 1501\n", 1675 | "3 26\n", 1676 | "Name: marital, dtype: int64\n", 1677 | "1 16292\n", 1678 | "2 8836\n", 1679 | "0 3111\n", 1680 | "3 54\n", 1681 | "Name: marital, dtype: int64\n" 1682 | ] 1683 | } 1684 | ], 1685 | "source": [ 1686 | "print(cluster1['marital'].value_counts())\n", 1687 | "print(cluster0['marital'].value_counts())" 1688 | ] 1689 | }, 1690 | { 1691 | "cell_type": "code", 1692 | "execution_count": 65, 1693 | "metadata": {}, 1694 | "outputs": [ 1695 | { 1696 | "name": "stdout", 1697 | "output_type": "stream", 1698 | "text": [ 1699 | "3 4186\n", 1700 | "2 2572\n", 1701 | "0 1981\n", 1702 | "5 1459\n", 1703 | "1 1033\n", 1704 | "6 977\n", 1705 | "7 680\n", 1706 | "4 7\n", 1707 | "Name: education, dtype: int64\n", 1708 | "6 11191\n", 1709 | "3 5329\n", 1710 | "5 3784\n", 1711 | "2 3473\n", 1712 | "0 2195\n", 1713 | "1 1259\n", 1714 | "7 1051\n", 1715 | "4 11\n", 1716 | "Name: education, dtype: int64\n" 1717 | ] 1718 | } 1719 | ], 1720 | "source": [ 1721 | "print(cluster1['education'].value_counts())\n", 1722 | "print(cluster0['education'].value_counts())" 1723 | ] 1724 | } 1725 | ], 1726 | "metadata": { 1727 | "kernelspec": { 1728 | "display_name": "Python 3", 1729 | "language": "python", 1730 | "name": "python3" 1731 | }, 1732 | "language_info": { 1733 | "codemirror_mode": { 1734 | "name": "ipython", 1735 | "version": 3 1736 | }, 1737 | "file_extension": ".py", 1738 | "mimetype": "text/x-python", 1739 | "name": "python", 1740 | "nbconvert_exporter": "python", 1741 | "pygments_lexer": "ipython3", 1742 | "version": "3.6.4" 1743 | } 1744 | }, 1745 | "nbformat": 4, 1746 | "nbformat_minor": 2 1747 | } 1748 | -------------------------------------------------------------------------------- /Other Forms of 
Clustering/K-Prototype+clustering (1).ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "kernelspec": { 4 | "name": "python", 5 | "display_name": "Pyolite", 6 | "language": "python" 7 | }, 8 | "language_info": { 9 | "codemirror_mode": { 10 | "name": "python", 11 | "version": 3 12 | }, 13 | "file_extension": ".py", 14 | "mimetype": "text/x-python", 15 | "name": "python", 16 | "nbconvert_exporter": "python", 17 | "pygments_lexer": "ipython3", 18 | "version": "3.8" 19 | } 20 | }, 21 | "nbformat_minor": 4, 22 | "nbformat": 4, 23 | "cells": [ 24 | { 25 | "cell_type": "markdown", 26 | "source": "# K-Prototype Clustering", 27 | "metadata": {} 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "source": "**About Blood Transfusion dataset**

\nTo demonstrate the RFMTC marketing model (a modified version of RFM), this study adopted the donor database of the Blood Transfusion Service Center in Hsin-Chu City, Taiwan. The center sends its blood transfusion service bus to one university in Hsin-Chu City to collect donated blood about every three months. To build an RFMTC model, we selected 748 donors at random from the donor database. Each of these 748 donor records includes R (Recency - months since last donation), F (Frequency - total number of donations), M (Monetary - total blood donated in c.c.), T (Time - months since first donation), and a binary variable representing whether he/she donated blood in March 2007 (1 stands for donating blood; 0 stands for not donating blood).", 32 | "metadata": {} 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "source": "**Attribute Information:**\n\n- R (Recency - months since last donation)\n- F (Frequency - total number of donations)\n- M (Monetary - total blood donated in c.c.)\n- T (Time - months since first donation)\n- A binary variable representing whether he/she donated blood in March 2007 (1 stands for donating blood; 0 stands for not donating blood)\n\n| Variable | Data Type | Measurement | Description | min | max | mean | std |\n|---|---|---|---|---|---|---|---|\n| Recency | quantitative | Months | Input | 0.03 | 74.4 | 9.74 | 8.07 |\n| Frequency | quantitative | Times | Input | 1 | 50 | 5.51 | 5.84 |\n| Monetary | quantitative | c.c. blood | Input | 250 | 12500 | 1378.68 | 1459.83 |\n| Time | quantitative | Months | Input | 2.27 | 98.3 | 34.42 | 24.32 |\n| Whether he/she donated blood in March 2007 | binary | 1=yes 0=no | Output | 0 | 1 | 1 (24%) 0 (76%) | |", 37 | "metadata": {} 38 | }, 39 | { 40 | "cell_type": "code", 41 | "source": "%pip install kmodes", 42 | "metadata": {}, 43 | "execution_count": null, 44 | "outputs": [] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "source": "# Importing Libraries\nimport numpy as np\nimport pandas as pd\nfrom kmodes.kprototypes import KPrototypes\n%matplotlib inline\nimport matplotlib.pyplot as plt\nimport seaborn as sns", 49 | "metadata": {}, 50 | "execution_count": 1, 51 | "outputs": [] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "source": "help(KPrototypes)", 56 | "metadata": {}, 57 | "execution_count": 2, 58 | "outputs": [ 59 | { 60 | "name": "stdout", 61 | "output_type": "stream", 62 | "text": "Help on class KPrototypes in module kmodes.kprototypes:\n\n\n\nclass KPrototypes(kmodes.kmodes.KModes)\n\n | k-protoypes clustering algorithm for mixed numerical/categorical data.\n\n | \n\n | Parameters\n\n | -----------\n\n | n_clusters : int, optional, default: 8\n\n | The number of clusters to form as well as the number of\n\n | centroids to generate.\n\n | \n\n | max_iter : int, default: 300\n\n | Maximum number of iterations of the k-modes algorithm for a\n\n | single run.\n\n | \n\n | num_dissim : func, default: euclidian_dissim\n\n | Dissimilarity function used by the algorithm for numerical variables.\n\n | Defaults to the Euclidian dissimilarity function.\n\n | \n\n | cat_dissim : func, default: matching_dissim\n\n | Dissimilarity function used by the kmodes algorithm for categorical variables.\n\n | Defaults to the matching dissimilarity function.\n\n | \n\n | n_init : int, default: 10\n\n | Number of time the k-modes algorithm will be run with different\n\n | centroid seeds. 
The final results will be the best output of\n\n | n_init consecutive runs in terms of cost.\n\n | \n\n | init : {'Huang', 'Cao', 'random' or a list of ndarrays}, default: 'Cao'\n\n | Method for initialization:\n\n | 'Huang': Method in Huang [1997, 1998]\n\n | 'Cao': Method in Cao et al. [2009]\n\n | 'random': choose 'n_clusters' observations (rows) at random from\n\n | data for the initial centroids.\n\n | If a list of ndarrays is passed, it should be of length 2, with\n\n | shapes (n_clusters, n_features) for numerical and categorical\n\n | data respectively. These are the initial centroids.\n\n | \n\n | gamma : float, default: None\n\n | Weighing factor that determines relative importance of numerical vs.\n\n | categorical attributes (see discussion in Huang [1997]). By default,\n\n | automatically calculated from data.\n\n | \n\n | verbose : integer, optional\n\n | Verbosity mode.\n\n | \n\n | Attributes\n\n | ----------\n\n | cluster_centroids_ : array, [n_clusters, n_features]\n\n | Categories of cluster centroids\n\n | \n\n | labels_ :\n\n | Labels of each point\n\n | \n\n | cost_ : float\n\n | Clustering cost, defined as the sum distance of all points to\n\n | their respective cluster centroids.\n\n | \n\n | n_iter_ : int\n\n | The number of iterations the algorithm ran for.\n\n | \n\n | gamma : float\n\n | The (potentially calculated) weighing factor.\n\n | \n\n | Notes\n\n | -----\n\n | See:\n\n | Huang, Z.: Extensions to the k-modes algorithm for clustering large\n\n | data sets with categorical values, Data Mining and Knowledge\n\n | Discovery 2(3), 1998.\n\n | \n\n | Method resolution order:\n\n | KPrototypes\n\n | kmodes.kmodes.KModes\n\n | sklearn.base.BaseEstimator\n\n | sklearn.base.ClusterMixin\n\n | builtins.object\n\n | \n\n | Methods defined here:\n\n | \n\n | __init__(self, n_clusters=8, max_iter=100, num_dissim=, cat_dissim=, init='Huang', n_init=10, gamma=None, verbose=0)\n\n | Initialize self. 
See help(type(self)) for accurate signature.\n\n | \n\n | fit(self, X, y=None, categorical=None)\n\n | Compute k-prototypes clustering.\n\n | \n\n | Parameters\n\n | ----------\n\n | X : array-like, shape=[n_samples, n_features]\n\n | categorical : Index of columns that contain categorical data\n\n | \n\n | predict(self, X, categorical=None)\n\n | Predict the closest cluster each sample in X belongs to.\n\n | \n\n | Parameters\n\n | ----------\n\n | X : array-like, shape = [n_samples, n_features]\n\n | New data to predict.\n\n | categorical : Index of columns that contain categorical data\n\n | \n\n | Returns\n\n | -------\n\n | labels : array, shape [n_samples,]\n\n | Index of the cluster each sample belongs to.\n\n | \n\n | ----------------------------------------------------------------------\n\n | Data descriptors defined here:\n\n | \n\n | cluster_centroids_\n\n | \n\n | ----------------------------------------------------------------------\n\n | Methods inherited from kmodes.kmodes.KModes:\n\n | \n\n | fit_predict(self, X, y=None, **kwargs)\n\n | Compute cluster centroids and predict cluster index for each sample.\n\n | \n\n | Convenience method; equivalent to calling fit(X) followed by\n\n | predict(X).\n\n | \n\n | ----------------------------------------------------------------------\n\n | Methods inherited from sklearn.base.BaseEstimator:\n\n | \n\n | __getstate__(self)\n\n | \n\n | __repr__(self)\n\n | Return repr(self).\n\n | \n\n | __setstate__(self, state)\n\n | \n\n | get_params(self, deep=True)\n\n | Get parameters for this estimator.\n\n | \n\n | Parameters\n\n | ----------\n\n | deep : boolean, optional\n\n | If True, will return the parameters for this estimator and\n\n | contained subobjects that are estimators.\n\n | \n\n | Returns\n\n | -------\n\n | params : mapping of string to any\n\n | Parameter names mapped to their values.\n\n | \n\n | set_params(self, **params)\n\n | Set the parameters of this estimator.\n\n | \n\n | The method works on 
simple estimators as well as on nested objects\n\n | (such as pipelines). The latter have parameters of the form\n\n | ``__`` so that it's possible to update each\n\n | component of a nested object.\n\n | \n\n | Returns\n\n | -------\n\n | self\n\n | \n\n | ----------------------------------------------------------------------\n\n | Data descriptors inherited from sklearn.base.BaseEstimator:\n\n | \n\n | __dict__\n\n | dictionary for instance variables (if defined)\n\n | \n\n | __weakref__\n\n | list of weak references to the object (if defined)\n\n\n" 63 | } 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "source": "# Reading Dataset\nblood = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/blood-transfusion/transfusion.data', sep=\",\",engine = 'python')", 69 | "metadata": {}, 70 | "execution_count": 3, 71 | "outputs": [] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "source": "#Sanity Check\nblood.head()", 76 | "metadata": {}, 77 | "execution_count": 4, 78 | "outputs": [ 79 | { 80 | "execution_count": 4, 81 | "output_type": "execute_result", 82 | "data": { 83 | "text/html": [ 84 | "
" 153 | ], 154 | "text/plain": [ 155 | " Recency (months) Frequency (times) Monetary (c.c. blood) Time (months) \\\n", 156 | "0 2 50 12500 98 \n", 157 | "1 0 13 3250 28 \n", 158 | "2 1 16 4000 35 \n", 159 | "3 2 20 5000 45 \n", 160 | "4 1 24 6000 77 \n", 161 | "\n", 162 | " whether he/she donated blood in March 2007 \n", 163 | "0 1 \n", 164 | "1 1 \n", 165 | "2 1 \n", 166 | "3 1 \n", 167 | "4 0 " 168 | ] 169 | }, 170 | "metadata": {} 171 | } 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "source": "# standardizing data\ncolumns_to_normalize = ['Recency (months)','Frequency (times)','Monetary (c.c. blood)','Time (months)']\nblood[columns_to_normalize] = blood[columns_to_normalize].apply(lambda x: (x - x.mean()) / np.std(x))", 177 | "metadata": {}, 178 | "execution_count": 5, 179 | "outputs": [] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "source": "# Re-check after standardizing data\nblood.head()", 184 | "metadata": {}, 185 | "execution_count": 6, 186 | "outputs": [ 187 | { 188 | "execution_count": 6, 189 | "output_type": "execute_result", 190 | "data": { 191 | "text/html": [ 192 | "
" 261 | ], 262 | "text/plain": [ 263 | " Recency (months) Frequency (times) Monetary (c.c. blood) Time (months) \\\n", 264 | "0 -0.927899 7.623346 7.623346 2.615633 \n", 265 | "1 -1.175118 1.282738 1.282738 -0.257881 \n", 266 | "2 -1.051508 1.796842 1.796842 0.029471 \n", 267 | "3 -0.927899 2.482313 2.482313 0.439973 \n", 268 | "4 -1.051508 3.167784 3.167784 1.753579 \n", 269 | "\n", 270 | " whether he/she donated blood in March 2007 \n", 271 | "0 1 \n", 272 | "1 1 \n", 273 | "2 1 \n", 274 | "3 1 \n", 275 | "4 0 " 276 | ] 277 | }, 278 | "metadata": {} 279 | } 280 | ] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "source": "# Converting the dataset into a matrix\nblood_matrix = blood.values", 285 | "metadata": { 286 | "trusted": true 287 | }, 288 | "execution_count": 7, 289 | "outputs": [] 302 | }, 303 | { 304 | "cell_type": "code", 305 | "source": "# Matrix for analysis\nblood_matrix", 306 | "metadata": {}, 307 | "execution_count": 8, 308 | "outputs": [ 309 | { 310 | "execution_count": 8, 311 | "output_type": "execute_result", 312 | "data": { 313 | "text/plain": [ 314 | "array([[-0.92789873, 7.62334626, 7.62334626, 2.61563344, 1. ],\n", 315 | " [-1.17511806, 1.28273826, 1.28273826, -0.2578809 , 1. ],\n", 316 | " [-1.0515084 , 1.79684161, 1.79684161, 0.02947053, 1. 
],\n", 317 | " ...,\n", 318 | " [ 1.66790417, -0.43093957, -0.43093957, 1.13782607, 0. ],\n", 319 | " [ 3.64565877, -0.77367514, -0.77367514, 0.19367135, 0. ],\n", 320 | " [ 7.72477762, -0.77367514, -0.77367514, 1.54832812, 0. ]])" 321 | ] 322 | }, 323 | "metadata": {} 324 | } 325 | ] 326 | }, 327 | { 328 | "cell_type": "code", 329 | "source": "# Running K-Prototype clustering\nkproto = KPrototypes(n_clusters=5, init='Cao')\nclusters = kproto.fit_predict(blood_matrix, categorical=[4])", 330 | "metadata": {}, 331 | "execution_count": 9, 332 | "outputs": [] 333 | }, 334 | { 335 | "cell_type": "code", 336 | "source": "kproto.cluster_centroids_", 337 | "metadata": {}, 338 | "execution_count": 12, 339 | "outputs": [ 340 | { 341 | "execution_count": 12, 342 | "output_type": "execute_result", 343 | "data": { 344 | "text/plain": [ 345 | "[array([[ 1.20155863, -0.0434147 , -0.0434147 , 1.16721428],\n", 346 | " [-0.5217527 , 4.8324995 , 4.8324995 , 2.12009883],\n", 347 | " [ 0.77649881, -0.51171646, -0.51171646, -0.41737993],\n", 348 | " [-0.77483521, -0.39947752, -0.39947752, -0.73541024],\n", 349 | " [-0.46834379, 0.94841338, 0.94841338, 0.92401242]]), array([[0.],\n", 350 | " [0.],\n", 351 | " [0.],\n", 352 | " [0.],\n", 353 | " [0.]])]" 354 | ] 355 | }, 356 | "metadata": {} 357 | } 358 | ] 359 | }, 360 | { 361 | "cell_type": "code", 362 | "source": "# Checking the cost of the clusters created.\nkproto.cost_", 363 | "metadata": {}, 364 | "execution_count": 9, 365 | "outputs": [ 366 | { 367 | "execution_count": 9, 368 | "output_type": "execute_result", 369 | "data": { 370 | "text/plain": [ 371 | "915.5424569187537" 372 | ] 373 | }, 374 | "metadata": {} 375 | } 376 | ] 377 | }, 378 | { 379 | "cell_type": "code", 380 | "source": "# Adding the predicted clusters to the main dataset\nblood['cluster_id'] = clusters", 381 | "metadata": {}, 382 | "execution_count": 10, 383 | "outputs": [] 384 | }, 385 | { 386 | "cell_type": "code", 387 | "source": "# Re-check\nblood.head()", 388 
| "metadata": {}, 389 | "execution_count": 11, 390 | "outputs": [ 391 | { 392 | "execution_count": 11, 393 | "output_type": "execute_result", 394 | "data": { 395 | "text/html": [ 396 | "
" 471 | ], 472 | "text/plain": [ 473 | " Recency (months) Frequency (times) Monetary (c.c. blood) Time (months) \\\n", 474 | "0 -0.927899 7.623346 7.623346 2.615633 \n", 475 | "1 -1.175118 1.282738 1.282738 -0.257881 \n", 476 | "2 -1.051508 1.796842 1.796842 0.029471 \n", 477 | "3 -0.927899 2.482313 2.482313 0.439973 \n", 478 | "4 -1.051508 3.167784 3.167784 1.753579 \n", 479 | "\n", 480 | " whether he/she donated blood in March 2007 cluster_id \n", 481 | "0 1 4 \n", 482 | "1 1 0 \n", 483 | "2 1 3 \n", 484 | "3 1 3 \n", 485 | "4 0 3 " 486 | ] 487 | }, 488 | "metadata": {} 489 | } 490 | ] 491 | }, 492 | { 493 | "cell_type": "code", 494 | "source": "# Checking the clusters created\nblooddf = pd.DataFrame(blood['cluster_id'].value_counts())\nblooddf", 495 | "metadata": {}, 496 | "execution_count": 14, 497 | "outputs": [ 498 | { 499 | "execution_count": 14, 500 | "output_type": "execute_result", 501 | "data": { 502 | "text/html": [ 503 | "
" 548 | ], 549 | "text/plain": [ 550 | " cluster_id\n", 551 | "1 267\n", 552 | "0 213\n", 553 | "2 180\n", 554 | "3 80\n", 555 | "4 8" 556 | ] 557 | }, 558 | "metadata": {} 559 | } 560 | ] 561 | }, 562 | { 563 | "cell_type": "code", 564 | "source": "# Bar chart of cluster sizes\nsns.barplot(x=blooddf.index, y=blooddf['cluster_id'])", 565 | "metadata": {}, 566 | "execution_count": 16, 567 | "outputs": [ 568 | { 569 | "execution_count": 16, 570 | "output_type": "execute_result", 571 | "data": { 572 | "text/plain": [ 573 | "" 574 | ] 575 | }, 576 | "metadata": {} 577 | }, 578 | { 579 | "output_type": "display_data", 580 | "data": { 581 | "image/png": "[base64-encoded PNG omitted: bar chart of cluster sizes, cluster index on the x-axis and count (cluster_id) on the y-axis]", 582 | "text/plain": [ 583 | "" 584 | ] 585 | }, 586 | "metadata": {} 587 | } 588 | ] 589 | }, 590 | { 591 | "cell_type": "code", 592 | "source": "# Choosing optimal K with the elbow method: fit for k = 1..13 and record the cost\nk_values = range(1, 14)\ncost = []\nfor num_clusters in k_values:\n    kproto = KPrototypes(n_clusters=num_clusters, init='Cao')\n    kproto.fit(blood_matrix, categorical=[4])\n    cost.append(kproto.cost_)\n\n# Plot cost against k so the x-axis shows the number of clusters, not the list index\nplt.xlabel('Number of clusters (k)')\nplt.ylabel('Cost')\nplt.plot(k_values, cost)", 593 | "metadata": {}, 594 | "execution_count": 24, 595 | "outputs": [ 596 | { 597 | "execution_count": 24, 598 | "output_type": "execute_result", 599 | "data": { 600 | "text/plain": [ 601 | "[]" 602 | ] 603 | }, 604 | "metadata": {} 605 | }, 606 | { 607 | "output_type": "display_data", 608 | "data": { 609 | "image/png": "[base64-encoded PNG omitted: line plot of clustering cost against the number of clusters]", 610 | "text/plain": [ 611 | "" 612 | ] 613 | }, 614 | "metadata": {} 615 | } 616 | ] 617 | } 618 | ] 619 | } -------------------------------------------------------------------------------- /Other Forms of Clustering/init: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Unsupervised-Learning-Clustering --------------------------------------------------------------------------------
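
The `help(KPrototypes)` output in the K-Prototype notebook mentions a Euclidean dissimilarity for numerical columns, a matching dissimilarity for categorical columns, and a `gamma` weight between them. The sketch below shows one way those three pieces could combine when a point is assigned to a cluster. It assumes the squared-Euclidean-plus-weighted-mismatch form described by Huang. The function names, toy centroids, and `gamma` values are hypothetical illustrations, not the `kmodes` internals.

```python
def mixed_dissimilarity(point, centroid, categorical, gamma):
    """Squared Euclidean distance over the numeric columns plus gamma times
    the number of categorical columns whose values differ.

    `categorical` lists the column indices treated as categorical, in the
    same spirit as the `categorical=[4]` argument used in the notebook.
    """
    cat = set(categorical)
    numeric_part = sum(
        (float(p) - float(c)) ** 2
        for i, (p, c) in enumerate(zip(point, centroid))
        if i not in cat
    )
    mismatches = sum(1 for i in cat if point[i] != centroid[i])
    return numeric_part + gamma * mismatches


def nearest_centroid(point, centroids, categorical, gamma):
    """Index of the centroid with the smallest mixed dissimilarity.

    Ties go to the lowest index, because min() keeps the first minimum.
    """
    costs = [mixed_dissimilarity(point, c, categorical, gamma) for c in centroids]
    return min(range(len(centroids)), key=costs.__getitem__)


# Hypothetical toy data: two numeric columns and one categorical column (index 2).
centroids = [
    [0.0, 0.0, 0],
    [2.0, 2.0, 1],
]
point = [1.0, 1.0, 1]

# The point is numerically equidistant from both centroids, so only the
# categorical term (scaled by gamma) can separate them.
print(nearest_centroid(point, centroids, categorical=[2], gamma=0.5))
print(nearest_centroid(point, centroids, categorical=[2], gamma=0.0))
```

With `gamma = 0` the categorical column has no influence and the tie falls to the first centroid. Any positive `gamma` lets the matching category decide. This is why the weight matters once numeric columns have been standardized, as they are in the notebook.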