├── README.md
├── cover_fifa_22.jpg
├── fifa.ipynb
└── players_22.csv

/README.md:
--------------------------------------------------------------------------------
1 | # K-Means Clustering
2 | 
3 | # ![fifa2022](https://i0.wp.com/gnova.com.ar/wp-content/uploads/2021/10/ea-fifa-22-cover-kylian-mbappe_1qeaco87s803l13iu0tnr84jhq.jpg)
4 | 
5 | 
6 | Building a K-means clustering algorithm from scratch and using it to cluster the FIFA 22 data.
7 | 
8 | Clustering is an unsupervised machine learning technique that can find patterns in your data. K-means is one of the most popular forms of clustering.
9 | 
10 | We'll create our algorithm using Python and pandas. We'll then compare it to the reference implementation from scikit-learn.
11 | 
12 | Project Steps
13 | 
14 | * Write out pseudocode for the algorithm
15 | * Code the k-means algorithm
16 | * Plot the clusters from the algorithm
17 | * Compare performance to the scikit-learn algorithm
18 | 
19 | # K-means overview
20 | 
21 | K-means is an unsupervised machine learning technique that allows us to cluster data points. This enables us to find patterns in the data that can help us analyze it more effectively. K-means is an iterative algorithm: it repeats the assignment and update steps until the centroids stop changing. The result is a locally optimal clustering that depends on the initial centroids, not necessarily the best possible one.
22 | 
23 | To run a k-means clustering:
24 | 
25 | 1. Specify the number of clusters you want (usually referred to as `k`).
26 | 2. Randomly initialize the centroid for each cluster. The centroid is the point at the center of the cluster; it does not have to be an actual data point.
27 | 3. Determine which data points belong to which cluster by finding the closest centroid to each data point.
28 | 4. Update the centroids based on the geometric mean of all the data points in the cluster.
29 | 5. Repeat steps 3 and 4 until the centroids stop changing. Each repetition is referred to as an iteration.
30 | 
31 | # Code
32 | 
33 | You can find the code for this project [here](https://github.com/taureanjoe/kmeans-clustering).
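The steps listed under "K-means overview" can be sketched in a few lines of NumPy. This is a minimal illustration, not the notebook's implementation: the function name `kmeans_sketch`, its `init` and `seed` parameters, and the toy data are assumptions made for this example. It follows the description above in using the geometric mean for the update step, so the input must be strictly positive; textbook k-means uses the arithmetic mean instead.

```python
import numpy as np


def kmeans_sketch(points, k, init=None, max_iterations=100, seed=0):
    """Cluster strictly positive `points` (n_samples x n_features) into k groups.

    `init` is a hypothetical optional argument: k starting centroids.
    """
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)

    # Initialize the centroids. Picking k distinct rows at random is an
    # assumption of this sketch; the notebook samples each feature separately.
    if init is None:
        centroids = points[rng.choice(len(points), size=k, replace=False)]
    else:
        centroids = np.asarray(init, dtype=float).copy()

    for _ in range(max_iterations):
        # Assign each point to its closest centroid (Euclidean distance).
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = distances.argmin(axis=1)

        # Update each centroid to the geometric mean of its members,
        # computed as exp(mean(log(x))). An empty cluster keeps its centroid.
        updated = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                updated[j] = np.exp(np.log(members).mean(axis=0))

        # Stop once the centroids no longer move.
        if np.allclose(updated, centroids):
            break
        centroids = updated

    return labels, centroids


# Toy data (made up for this example): two groups of positive values.
toy = np.array([[1.0, 1.2], [1.1, 0.9], [0.9, 1.0],
                [8.0, 8.1], [8.2, 7.9], [7.9, 8.0]])
toy_labels, toy_centroids = kmeans_sketch(toy, k=2)
print(toy_labels)
print(toy_centroids)
```

Because the starting centroids are random, which group receives label `0` can change with the seed. Passing `init` makes a run reproducible, which is handy when comparing results against another implementation.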
34 | 
35 | ## Data
36 | 
37 | We'll be using FIFA 22 player data, which you can download [here](https://www.kaggle.com/datasets/stefanoleone992/fifa-22-complete-player-dataset?select=players_22.csv). We'll use the file `players_22.csv`.
38 | 
--------------------------------------------------------------------------------
/cover_fifa_22.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/taureanjoe/kmeans-clustering/6317826da46702012622970ee99312d9da507125/cover_fifa_22.jpg
--------------------------------------------------------------------------------
/fifa.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "We are going to cluster the FIFA dataset using our own K-means clustering algorithm.\n",
8 | "\n",
9 | "# ![fifa2022](cover_fifa_22.jpg)\n",
10 | "\n",
11 | "The datasets provided include the player data for the Career Mode from FIFA 15 to FIFA 22 (\"players_22.csv\"). The data allows multiple comparisons for the same players across the last 8 versions of the video game.\n",
12 | "\n",
13 | "### Content\n",
14 | "\n",
15 | "* Every player available in FIFA 15, 16, 17, 18, 19, 20, 21, and also FIFA 22\n",
16 | "* 100+ attributes\n",
17 | "* URL of the scraped players\n",
18 | "* URL of the uploaded player faces, club and nation logos\n",
19 | "* Player positions, with the role in the club and in the national team\n",
20 | "* Player attributes with statistics such as Attacking, Skills, Defense, Mentality, GK Skills, etc.\n",
21 | "* Player personal data like Nationality, Club, DateOfBirth, Wage, Salary, etc.\n",
22 | "\n",
23 | "Data has been scraped from the publicly available website [sofifa.com](https://sofifa.com/)."
24 | ] 25 | }, 26 | { 27 | "cell_type": "code", 28 | "execution_count": 2, 29 | "metadata": {}, 30 | "outputs": [], 31 | "source": [ 32 | "import pandas as pd\n", 33 | "import numpy as np" 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": 18, 39 | "metadata": {}, 40 | "outputs": [ 41 | { 42 | "name": "stderr", 43 | "output_type": "stream", 44 | "text": [ 45 | "/var/folders/tm/5w0c8x0d2rzck8tf0hvstwf40000gn/T/ipykernel_5324/2061566770.py:1: DtypeWarning: Columns (25,108) have mixed types. Specify dtype option on import or set low_memory=False.\n", 46 | " players = pd.read_csv(\"players_22.csv\")\n" 47 | ] 48 | } 49 | ], 50 | "source": [ 51 | "players = pd.read_csv(\"players_22.csv\")" 52 | ] 53 | }, 54 | { 55 | "cell_type": "code", 56 | "execution_count": 19, 57 | "metadata": {}, 58 | "outputs": [ 59 | { 60 | "data": { 61 | "text/html": [ 62 | "
" 372 | ], 373 | "text/plain": [ 374 | " sofifa_id player_url \\\n", 375 | "0 158023 https://sofifa.com/player/158023/lionel-messi/... \n", 376 | "1 188545 https://sofifa.com/player/188545/robert-lewand... \n", 377 | "2 20801 https://sofifa.com/player/20801/c-ronaldo-dos-... \n", 378 | "3 190871 https://sofifa.com/player/190871/neymar-da-sil... \n", 379 | "4 192985 https://sofifa.com/player/192985/kevin-de-bruy... \n", 380 | "... ... ... \n", 381 | "19234 261962 https://sofifa.com/player/261962/defu-song/220002 \n", 382 | "19235 262040 https://sofifa.com/player/262040/caoimhin-port... \n", 383 | "19236 262760 https://sofifa.com/player/262760/nathan-logue/... \n", 384 | "19237 262820 https://sofifa.com/player/262820/luke-rudden/2... \n", 385 | "19238 264540 https://sofifa.com/player/264540/emanuel-lalch... \n", 386 | "\n", 387 | " short_name long_name \\\n", 388 | "0 L. Messi Lionel Andrés Messi Cuccittini \n", 389 | "1 R. Lewandowski Robert Lewandowski \n", 390 | "2 Cristiano Ronaldo Cristiano Ronaldo dos Santos Aveiro \n", 391 | "3 Neymar Jr Neymar da Silva Santos Júnior \n", 392 | "4 K. De Bruyne Kevin De Bruyne \n", 393 | "... ... ... \n", 394 | "19234 Song Defu 宋德福 \n", 395 | "19235 C. Porter Caoimhin Porter \n", 396 | "19236 N. Logue Nathan Logue-Cunningham \n", 397 | "19237 L. Rudden Luke Rudden \n", 398 | "19238 E. Lalchhanchhuaha Emanuel Lalchhanchhuaha \n", 399 | "\n", 400 | " player_positions overall potential value_eur wage_eur age ... \\\n", 401 | "0 RW, ST, CF 93 93 78000000.0 320000.0 34 ... \n", 402 | "1 ST 92 92 119500000.0 270000.0 32 ... \n", 403 | "2 ST, LW 91 91 45000000.0 270000.0 36 ... \n", 404 | "3 LW, CAM 91 91 129000000.0 270000.0 29 ... \n", 405 | "4 CM, CAM 91 91 125500000.0 350000.0 30 ... \n", 406 | "... ... ... ... ... ... ... ... \n", 407 | "19234 CDM 47 52 70000.0 1000.0 22 ... \n", 408 | "19235 CM 47 59 110000.0 500.0 19 ... \n", 409 | "19236 CM 47 55 100000.0 500.0 21 ... \n", 410 | "19237 ST 47 60 110000.0 500.0 19 ... 
\n", 411 | "19238 CAM 47 60 110000.0 500.0 19 ... \n", 412 | "\n", 413 | " lcb cb rcb rb gk \\\n", 414 | "0 50+3 50+3 50+3 61+3 19+3 \n", 415 | "1 60+3 60+3 60+3 61+3 19+3 \n", 416 | "2 53+3 53+3 53+3 60+3 20+3 \n", 417 | "3 50+3 50+3 50+3 62+3 20+3 \n", 418 | "4 69+3 69+3 69+3 75+3 21+3 \n", 419 | "... ... ... ... ... ... \n", 420 | "19234 46+2 46+2 46+2 48+2 15+2 \n", 421 | "19235 44+2 44+2 44+2 48+2 14+2 \n", 422 | "19236 45+2 45+2 45+2 47+2 12+2 \n", 423 | "19237 26+2 26+2 26+2 32+2 15+2 \n", 424 | "19238 41+2 41+2 41+2 45+2 16+2 \n", 425 | "\n", 426 | " player_face_url \\\n", 427 | "0 https://cdn.sofifa.net/players/158/023/22_120.png \n", 428 | "1 https://cdn.sofifa.net/players/188/545/22_120.png \n", 429 | "2 https://cdn.sofifa.net/players/020/801/22_120.png \n", 430 | "3 https://cdn.sofifa.net/players/190/871/22_120.png \n", 431 | "4 https://cdn.sofifa.net/players/192/985/22_120.png \n", 432 | "... ... \n", 433 | "19234 https://cdn.sofifa.net/players/261/962/22_120.png \n", 434 | "19235 https://cdn.sofifa.net/players/262/040/22_120.png \n", 435 | "19236 https://cdn.sofifa.net/players/262/760/22_120.png \n", 436 | "19237 https://cdn.sofifa.net/players/262/820/22_120.png \n", 437 | "19238 https://cdn.sofifa.net/players/264/540/22_120.png \n", 438 | "\n", 439 | " club_logo_url \\\n", 440 | "0 https://cdn.sofifa.net/teams/73/60.png \n", 441 | "1 https://cdn.sofifa.net/teams/21/60.png \n", 442 | "2 https://cdn.sofifa.net/teams/11/60.png \n", 443 | "3 https://cdn.sofifa.net/teams/73/60.png \n", 444 | "4 https://cdn.sofifa.net/teams/10/60.png \n", 445 | "... ... 
\n", 446 | "19234 https://cdn.sofifa.net/teams/112541/60.png \n", 447 | "19235 https://cdn.sofifa.net/teams/445/60.png \n", 448 | "19236 https://cdn.sofifa.net/teams/111131/60.png \n", 449 | "19237 https://cdn.sofifa.net/teams/111131/60.png \n", 450 | "19238 https://cdn.sofifa.net/teams/113040/60.png \n", 451 | "\n", 452 | " club_flag_url \\\n", 453 | "0 https://cdn.sofifa.net/flags/fr.png \n", 454 | "1 https://cdn.sofifa.net/flags/de.png \n", 455 | "2 https://cdn.sofifa.net/flags/gb-eng.png \n", 456 | "3 https://cdn.sofifa.net/flags/fr.png \n", 457 | "4 https://cdn.sofifa.net/flags/gb-eng.png \n", 458 | "... ... \n", 459 | "19234 https://cdn.sofifa.net/flags/cn.png \n", 460 | "19235 https://cdn.sofifa.net/flags/ie.png \n", 461 | "19236 https://cdn.sofifa.net/flags/ie.png \n", 462 | "19237 https://cdn.sofifa.net/flags/ie.png \n", 463 | "19238 https://cdn.sofifa.net/flags/in.png \n", 464 | "\n", 465 | " nation_logo_url \\\n", 466 | "0 https://cdn.sofifa.net/teams/1369/60.png \n", 467 | "1 https://cdn.sofifa.net/teams/1353/60.png \n", 468 | "2 https://cdn.sofifa.net/teams/1354/60.png \n", 469 | "3 NaN \n", 470 | "4 https://cdn.sofifa.net/teams/1325/60.png \n", 471 | "... ... \n", 472 | "19234 NaN \n", 473 | "19235 NaN \n", 474 | "19236 NaN \n", 475 | "19237 NaN \n", 476 | "19238 NaN \n", 477 | "\n", 478 | " nation_flag_url \n", 479 | "0 https://cdn.sofifa.net/flags/ar.png \n", 480 | "1 https://cdn.sofifa.net/flags/pl.png \n", 481 | "2 https://cdn.sofifa.net/flags/pt.png \n", 482 | "3 https://cdn.sofifa.net/flags/br.png \n", 483 | "4 https://cdn.sofifa.net/flags/be.png \n", 484 | "... ... 
\n", 485 | "19234 https://cdn.sofifa.net/flags/cn.png \n", 486 | "19235 https://cdn.sofifa.net/flags/ie.png \n", 487 | "19236 https://cdn.sofifa.net/flags/ie.png \n", 488 | "19237 https://cdn.sofifa.net/flags/ie.png \n", 489 | "19238 https://cdn.sofifa.net/flags/in.png \n", 490 | "\n", 491 | "[19239 rows x 110 columns]" 492 | ] 493 | }, 494 | "execution_count": 19, 495 | "metadata": {}, 496 | "output_type": "execute_result" 497 | } 498 | ], 499 | "source": [ 500 | "players" 501 | ] 502 | }, 503 | { 504 | "cell_type": "markdown", 505 | "metadata": {}, 506 | "source": [ 507 | "Let's list the features based on which we will cluster our players\n", 508 | "\n", 509 | "* `overall` - overall rating of the player\n", 510 | "* `potential` - potential rating of the player\n", 511 | "* `value_eur` - total value of the player to the Club in Euros\n", 512 | "* `wage_eur` - wage of the player in Euros\n", 513 | "* `age` - age of the player" 514 | ] 515 | }, 516 | { 517 | "cell_type": "code", 518 | "execution_count": 20, 519 | "metadata": {}, 520 | "outputs": [], 521 | "source": [ 522 | "features = [\"overall\", \"potential\", \"wage_eur\", \"value_eur\", \"age\"]" 523 | ] 524 | }, 525 | { 526 | "cell_type": "code", 527 | "execution_count": 21, 528 | "metadata": {}, 529 | "outputs": [], 530 | "source": [ 531 | "# dropping null values from columns mentioned in features\n", 532 | "players = players.dropna(subset=features)\n", 533 | "# this will make sure that there will be no missing values in the columns that we are clustering" 534 | ] 535 | }, 536 | { 537 | "cell_type": "code", 538 | "execution_count": 22, 539 | "metadata": {}, 540 | "outputs": [], 541 | "source": [ 542 | "data = players[features].copy()" 543 | ] 544 | }, 545 | { 546 | "cell_type": "code", 547 | "execution_count": 23, 548 | "metadata": {}, 549 | "outputs": [ 550 | { 551 | "data": { 552 | "text/html": [ 553 | "
"
671 | ],
672 | "text/plain": [
673 | "       overall  potential  wage_eur    value_eur  age\n",
674 | "0           93         93  320000.0   78000000.0   34\n",
675 | "1           92         92  270000.0  119500000.0   32\n",
676 | "2           91         91  270000.0   45000000.0   36\n",
677 | "3           91         91  270000.0  129000000.0   29\n",
678 | "4           91         91  350000.0  125500000.0   30\n",
679 | "...        ...        ...       ...          ...  ...\n",
680 | "19234       47         52    1000.0      70000.0   22\n",
681 | "19235       47         59     500.0     110000.0   19\n",
682 | "19236       47         55     500.0     100000.0   21\n",
683 | "19237       47         60     500.0     110000.0   19\n",
684 | "19238       47         60     500.0     110000.0   19\n",
685 | "\n",
686 | "[19165 rows x 5 columns]"
687 | ]
688 | },
689 | "execution_count": 23,
690 | "metadata": {},
691 | "output_type": "execute_result"
692 | }
693 | ],
694 | "source": [
695 | "data"
696 | ]
697 | },
698 | {
699 | "cell_type": "markdown",
700 | "metadata": {},
701 | "source": [
702 | "1. Scale the data\n",
703 | "2. Initialize random centroids\n",
704 | "3. Label each data point\n",
705 | "4. Update centroids\n",
706 | "5. Repeat steps 3 and 4 until centroids stop changing"
707 | ]
708 | },
709 | {
710 | "cell_type": "markdown",
711 | "metadata": {},
712 | "source": [
713 | "## 1. Scaling the Data"
714 | ]
715 | },
716 | {
717 | "cell_type": "code",
718 | "execution_count": 26,
719 | "metadata": {},
720 | "outputs": [],
721 | "source": [
722 | "data = ((data - data.min()) / (data.max() - data.min())) * 9 + 1"
723 | ]
724 | },
725 | {
726 | "cell_type": "code",
727 | "execution_count": 27,
728 | "metadata": {},
729 | "outputs": [
730 | {
731 | "data": {
732 | "text/html": [
733 | "
" 826 | ], 827 | "text/plain": [ 828 | " overall potential wage_eur value_eur age\n", 829 | "count 19165.000000 19165.000000 19165.000000 19165.000000 19165.000000\n", 830 | "mean 4.670472 5.319998 1.219443 1.131826 4.063345\n", 831 | "std 1.346635 1.191076 0.501528 0.353229 1.575838\n", 832 | "min 1.000000 1.000000 1.000000 1.000000 1.000000\n", 833 | "25% 3.739130 4.521739 1.012876 1.021620 2.666667\n", 834 | "50% 4.717391 5.304348 1.064378 1.044817 4.000000\n", 835 | "75% 5.500000 6.086957 1.193133 1.092370 5.333333\n", 836 | "max 10.000000 10.000000 10.000000 10.000000 10.000000" 837 | ] 838 | }, 839 | "execution_count": 27, 840 | "metadata": {}, 841 | "output_type": "execute_result" 842 | } 843 | ], 844 | "source": [ 845 | "data.describe()" 846 | ] 847 | }, 848 | { 849 | "cell_type": "code", 850 | "execution_count": 28, 851 | "metadata": {}, 852 | "outputs": [ 853 | { 854 | "data": { 855 | "text/html": [ 856 | "
"
925 | ],
926 | "text/plain": [
927 | "     overall  potential   wage_eur  value_eur       age\n",
928 | "0  10.000000   9.608696   9.227468   4.618307  7.000000\n",
929 | "1   9.804348   9.413043   7.939914   6.543654  6.333333\n",
930 | "2   9.608696   9.217391   7.939914   3.087308  7.666667\n",
931 | "3   9.608696   9.217391   7.939914   6.984396  5.333333\n",
932 | "4   9.608696   9.217391  10.000000   6.822018  5.666667"
933 | ]
934 | },
935 | "execution_count": 28,
936 | "metadata": {},
937 | "output_type": "execute_result"
938 | }
939 | ],
940 | "source": [
941 | "data.head()"
942 | ]
943 | },
944 | {
945 | "cell_type": "markdown",
946 | "metadata": {},
947 | "source": [
948 | "## 2. Initializing Random Centroids"
949 | ]
950 | },
951 | {
952 | "cell_type": "code",
953 | "execution_count": 32,
954 | "metadata": {},
955 | "outputs": [],
956 | "source": [
957 | "def random_centroid(data, k):\n",
958 | "    centroids = []\n",
959 | "    for i in range(k):\n",
960 | "        centroid = data.apply(lambda x: float(x.sample().iloc[0]))\n",
961 | "        centroids.append(centroid)\n",
962 | "    return pd.concat(centroids, axis=1)"
963 | ]
964 | },
965 | {
966 | "cell_type": "code",
967 | "execution_count": 33,
968 | "metadata": {},
969 | "outputs": [],
970 | "source": [
971 | "centroids = random_centroid(data, 5)"
972 | ]
973 | },
974 | {
975 | "cell_type": "code",
976 | "execution_count": 34,
977 | "metadata": {},
978 | "outputs": [
979 | {
980 | "data": {
981 | "text/html": [
982 | "
" 1051 | ], 1052 | "text/plain": [ 1053 | " 0 1 2 3 4\n", 1054 | "overall 6.282609 5.108696 6.282609 5.500000 4.521739\n", 1055 | "potential 6.478261 4.717391 5.108696 4.130435 5.695652\n", 1056 | "wage_eur 1.115880 1.012876 1.373391 1.270386 1.064378\n", 1057 | "value_eur 1.097010 1.025099 1.175879 1.087731 1.055255\n", 1058 | "age 7.000000 5.666667 2.666667 2.000000 2.666667" 1059 | ] 1060 | }, 1061 | "execution_count": 34, 1062 | "metadata": {}, 1063 | "output_type": "execute_result" 1064 | } 1065 | ], 1066 | "source": [ 1067 | "centroids" 1068 | ] 1069 | }, 1070 | { 1071 | "cell_type": "code", 1072 | "execution_count": 41, 1073 | "metadata": {}, 1074 | "outputs": [], 1075 | "source": [ 1076 | "def get_labels(data, centroids):\n", 1077 | " distances = centroids.apply(lambda x: np.sqrt(((data - x) ** 2).sum(axis=1)))\n", 1078 | " return distances.idxmin(axis=1)" 1079 | ] 1080 | }, 1081 | { 1082 | "cell_type": "code", 1083 | "execution_count": 43, 1084 | "metadata": {}, 1085 | "outputs": [], 1086 | "source": [ 1087 | "labels = get_labels(data, centroids)" 1088 | ] 1089 | }, 1090 | { 1091 | "cell_type": "code", 1092 | "execution_count": 45, 1093 | "metadata": {}, 1094 | "outputs": [ 1095 | { 1096 | "data": { 1097 | "text/plain": [ 1098 | "4 9064\n", 1099 | "1 6605\n", 1100 | "0 1713\n", 1101 | "2 1701\n", 1102 | "3 82\n", 1103 | "dtype: int64" 1104 | ] 1105 | }, 1106 | "execution_count": 45, 1107 | "metadata": {}, 1108 | "output_type": "execute_result" 1109 | } 1110 | ], 1111 | "source": [ 1112 | "labels.value_counts()" 1113 | ] 1114 | }, 1115 | { 1116 | "cell_type": "code", 1117 | "execution_count": 48, 1118 | "metadata": {}, 1119 | "outputs": [], 1120 | "source": [ 1121 | "def new_centroids(data, labels, k):\n", 1122 | " return data.groupby(labels).apply(lambda x: np.exp(np.log(x).mean())).T\n" 1123 | ] 1124 | }, 1125 | { 1126 | "cell_type": "code", 1127 | "execution_count": 49, 1128 | "metadata": {}, 1129 | "outputs": [], 1130 | "source": [ 1131 | "from 
sklearn.decomposition import PCA\n", 1132 | "import matplotlib.pyplot as plt\n", 1133 | "from IPython.display import clear_output" 1134 | ] 1135 | }, 1136 | { 1137 | "cell_type": "code", 1138 | "execution_count": 54, 1139 | "metadata": {}, 1140 | "outputs": [], 1141 | "source": [ 1142 | "def plot_clusters(data, labels, centroids, iteration):\n", 1143 | " pca = PCA(n_components=2)\n", 1144 | " data_2d = pca.fit_transform(data)\n", 1145 | " centroids_2d = pca.transform(centroids.T)\n", 1146 | " clear_output(wait=True)\n", 1147 | " plt.title(f'Iteration {iteration}')\n", 1148 | " plt.scatter(x=data_2d[:, 0], y=data_2d[:,1], c=labels)\n", 1149 | " plt.show()" 1150 | ] 1151 | }, 1152 | { 1153 | "cell_type": "code", 1154 | "execution_count": 55, 1155 | "metadata": {}, 1156 | "outputs": [ 1157 | { 1158 | "data": { 1159 | "image/png": "", 1160 | "text/plain": [ 1161 | "
" 1162 | ] 1163 | }, 1164 | "metadata": { 1165 | "needs_background": "light" 1166 | }, 1167 | "output_type": "display_data" 1168 | } 1169 | ], 1170 | "source": [ 1171 | "max_iterations = 100\n", 1172 | "k = 3 # number of clusters\n", 1173 | "\n", 1174 | "centroids = random_centroid(data, k)\n", 1175 | "old_centroids = pd.DataFrame()\n", 1176 | "iteration = 1\n", 1177 | "\n", 1178 | "while iteration < max_iterations and not centroids.equals(old_centroids):\n", 1179 | " old_centroids = centroids\n", 1180 | "\n", 1181 | " labels = get_labels(data, centroids)\n", 1182 | " centroids = new_centroids(data, labels, k)\n", 1183 | " plot_clusters(data, labels, centroids, iteration)\n", 1184 | " iteration += 1" 1185 | ] 1186 | }, 1187 | { 1188 | "cell_type": "code", 1189 | "execution_count": 56, 1190 | "metadata": {}, 1191 | "outputs": [ 1192 | { 1193 | "data": { 1194 | "text/html": [ 1195 | "
\n", 1196 | "\n", 1209 | "\n", 1210 | " \n", 1211 | " \n", 1212 | " \n", 1213 | " \n", 1214 | " \n", 1215 | " \n", 1216 | " \n", 1217 | " \n", 1218 | " \n", 1219 | " \n", 1220 | " \n", 1221 | " \n", 1222 | " \n", 1223 | " \n", 1224 | " \n", 1225 | " \n", 1226 | " \n", 1227 | " \n", 1228 | " \n", 1229 | " \n", 1230 | " \n", 1231 | " \n", 1232 | " \n", 1233 | " \n", 1234 | " \n", 1235 | " \n", 1236 | " \n", 1237 | " \n", 1238 | " \n", 1239 | " \n", 1240 | " \n", 1241 | " \n", 1242 | " \n", 1243 | " \n", 1244 | " \n", 1245 | " \n", 1246 | " \n", 1247 | " \n", 1248 | " \n", 1249 | " \n", 1250 | "
                  0         1         2
overall    5.807503  4.781960  3.205672
potential  6.497870  4.506813  4.930905
wage_eur   1.420500  1.118498  1.028564
value_eur  1.285685  1.044909  1.026655
age        3.598215  5.467648  2.514741
\n", 1251 | "
" 1252 | ], 1253 | "text/plain": [ 1254 | " 0 1 2\n", 1255 | "overall 5.807503 4.781960 3.205672\n", 1256 | "potential 6.497870 4.506813 4.930905\n", 1257 | "wage_eur 1.420500 1.118498 1.028564\n", 1258 | "value_eur 1.285685 1.044909 1.026655\n", 1259 | "age 3.598215 5.467648 2.514741" 1260 | ] 1261 | }, 1262 | "execution_count": 56, 1263 | "metadata": {}, 1264 | "output_type": "execute_result" 1265 | } 1266 | ], 1267 | "source": [ 1268 | "centroids" 1269 | ] 1270 | }, 1271 | { 1272 | "cell_type": "code", 1273 | "execution_count": 58, 1274 | "metadata": {}, 1275 | "outputs": [ 1276 | { 1277 | "data": { 1278 | "text/html": [ 1279 | "
\n", 1280 | "\n", 1293 | "\n", 1294 | " \n", 1295 | " \n", 1296 | " \n", 1297 | " \n", 1298 | " \n", 1299 | " \n", 1300 | " \n", 1301 | " \n", 1302 | " \n", 1303 | " \n", 1304 | " \n", 1305 | " \n", 1306 | " \n", 1307 | " \n", 1308 | " \n", 1309 | " \n", 1310 | " \n", 1311 | " \n", 1312 | " \n", 1313 | " \n", 1314 | " \n", 1315 | " \n", 1316 | " \n", 1317 | " \n", 1318 | " \n", 1319 | " \n", 1320 | " \n", 1321 | " \n", 1322 | " \n", 1323 | " \n", 1324 | " \n", 1325 | " \n", 1326 | " \n", 1327 | " \n", 1328 | " \n", 1329 | " \n", 1330 | " \n", 1331 | " \n", 1332 | " \n", 1333 | " \n", 1334 | " \n", 1335 | " \n", 1336 | " \n", 1337 | " \n", 1338 | " \n", 1339 | " \n", 1340 | " \n", 1341 | " \n", 1342 | " \n", 1343 | " \n", 1344 | " \n", 1345 | " \n", 1346 | " \n", 1347 | " \n", 1348 | " \n", 1349 | " \n", 1350 | " \n", 1351 | " \n", 1352 | " \n", 1353 | " \n", 1354 | " \n", 1355 | " \n", 1356 | " \n", 1357 | " \n", 1358 | " \n", 1359 | " \n", 1360 | " \n", 1361 | " \n", 1362 | " \n", 1363 | " \n", 1364 | " \n", 1365 | " \n", 1366 | " \n", 1367 | " \n", 1368 | " \n", 1369 | " \n", 1370 | " \n", 1371 | " \n", 1372 | " \n", 1373 | " \n", 1374 | " \n", 1375 | " \n", 1376 | " \n", 1377 | " \n", 1378 | " \n", 1379 | " \n", 1380 | " \n", 1381 | " \n", 1382 | " \n", 1383 | " \n", 1384 | " \n", 1385 | " \n", 1386 | " \n", 1387 | " \n", 1388 | " \n", 1389 | " \n", 1390 | " \n", 1391 | " \n", 1392 | " \n", 1393 | " \n", 1394 | " \n", 1395 | " \n", 1396 | " \n", 1397 | " \n", 1398 | " \n", 1399 | " \n", 1400 | " \n", 1401 | " \n", 1402 | " \n", 1403 | " \n", 1404 | " \n", 1405 | " \n", 1406 | "
            short_name  overall  potential  wage_eur  value_eur  age
199               Pepe       82         82   14000.0  5500000.0   38
284            Joaquín       81         81   23000.0  8500000.0   39
292         José Fonte       81         81   30000.0  4600000.0   37
388          G. Buffon       80         80   18000.0  2300000.0   43
509            Iniesta       79         79   10000.0  5500000.0   37
...                ...      ...        ...       ...        ...  ...
18890        S. Haokip       51         51     500.0    60000.0   28
18971  Lalkhawpuimawia       51         51     500.0    60000.0   29
19032         Song Yue       50         50    2000.0    40000.0   29
19100       J. Russell       49         49     500.0    15000.0   36
19118        Gao Xiang       49         49    2000.0    35000.0   32
\n", 1407 | "

7191 rows × 6 columns

\n", 1408 | "
" 1409 | ], 1410 | "text/plain": [ 1411 | " short_name overall potential wage_eur value_eur age\n", 1412 | "199 Pepe 82 82 14000.0 5500000.0 38\n", 1413 | "284 Joaquín 81 81 23000.0 8500000.0 39\n", 1414 | "292 José Fonte 81 81 30000.0 4600000.0 37\n", 1415 | "388 G. Buffon 80 80 18000.0 2300000.0 43\n", 1416 | "509 Iniesta 79 79 10000.0 5500000.0 37\n", 1417 | "... ... ... ... ... ... ...\n", 1418 | "18890 S. Haokip 51 51 500.0 60000.0 28\n", 1419 | "18971 Lalkhawpuimawia 51 51 500.0 60000.0 29\n", 1420 | "19032 Song Yue 50 50 2000.0 40000.0 29\n", 1421 | "19100 J. Russell 49 49 500.0 15000.0 36\n", 1422 | "19118 Gao Xiang 49 49 2000.0 35000.0 32\n", 1423 | "\n", 1424 | "[7191 rows x 6 columns]" 1425 | ] 1426 | }, 1427 | "execution_count": 58, 1428 | "metadata": {}, 1429 | "output_type": "execute_result" 1430 | } 1431 | ], 1432 | "source": [ 1433 | "players[labels == 1][[\"short_name\"] + features]" 1434 | ] 1435 | }, 1436 | { 1437 | "cell_type": "markdown", 1438 | "metadata": {}, 1439 | "source": [ 1440 | "## Conclusion\n", 1441 | "Cluster 0 represents star players: its centroid has the highest overall, potential, wage, and value.\n", 1442 | "\n", 1443 | "Cluster 1 represents older players who have hit their potential.\n", 1444 | "\n", 1445 | "Cluster 2 represents young players: its centroid has the lowest age, and its potential is well above its overall rating.\n", 1446 | "\n", 1447 | "We were able to confirm this hypothesis by looking at the names of the players in each cluster.\n", 1448 | "\n", 1449 | "K-means allows you to find patterns in your data that you didn't know were there. Here, it let us categorize players automatically into different groups." 1450 | ] 1451 | }, 1452 | { 1453 | "cell_type": "code", 1454 | "execution_count": 60, 1455 | "metadata": {}, 1456 | "outputs": [], 1457 | "source": [ 1458 | "from sklearn.cluster import KMeans" 1459 | ] 1460 | }, 1461 | { 1462 | "cell_type": "code", 1463 | "execution_count": 61, 1464 | "metadata": {}, 1465 | "outputs": [ 1466 | { 1467 | "data": { 1468 | "text/html": [ 1469 | "
" 1470 | ], 1471 | "text/plain": [ 1472 | "KMeans(n_clusters=3)" 1473 | ] 1474 | }, 1475 | "execution_count": 61, 1476 | "metadata": {}, 1477 | "output_type": "execute_result" 1478 | } 1479 | ], 1480 | "source": [ 1481 | "kmeans = KMeans(3)\n", 1482 | "kmeans.fit(data)" 1483 | ] 1484 | }, 1485 | { 1486 | "cell_type": "code", 1487 | "execution_count": 62, 1488 | "metadata": {}, 1489 | "outputs": [], 1490 | "source": [ 1491 | "centroids = kmeans.cluster_centers_" 1492 | ] 1493 | }, 1494 | { 1495 | "cell_type": "code", 1496 | "execution_count": 63, 1497 | "metadata": {}, 1498 | "outputs": [ 1499 | { 1500 | "data": { 1501 | "text/html": [ 1502 | "
\n", 1503 | "\n", 1516 | "\n", 1517 | " \n", 1518 | " \n", 1519 | " \n", 1520 | " \n", 1521 | " \n", 1522 | " \n", 1523 | " \n", 1524 | " \n", 1525 | " \n", 1526 | " \n", 1527 | " \n", 1528 | " \n", 1529 | " \n", 1530 | " \n", 1531 | " \n", 1532 | " \n", 1533 | " \n", 1534 | " \n", 1535 | " \n", 1536 | " \n", 1537 | " \n", 1538 | " \n", 1539 | " \n", 1540 | " \n", 1541 | " \n", 1542 | " \n", 1543 | " \n", 1544 | " \n", 1545 | " \n", 1546 | " \n", 1547 | " \n", 1548 | " \n", 1549 | " \n", 1550 | " \n", 1551 | " \n", 1552 | " \n", 1553 | " \n", 1554 | " \n", 1555 | " \n", 1556 | " \n", 1557 | "
                  0         1         2
overall    6.232940  4.800860  3.601045
potential  6.623509  4.503855  5.207475
wage_eur   1.657900  1.112885  1.040009
value_eur  1.414134  1.040007  1.035859
age        4.142102  5.608923  2.712137
\n", 1558 | "
" 1559 | ], 1560 | "text/plain": [ 1561 | " 0 1 2\n", 1562 | "overall 6.232940 4.800860 3.601045\n", 1563 | "potential 6.623509 4.503855 5.207475\n", 1564 | "wage_eur 1.657900 1.112885 1.040009\n", 1565 | "value_eur 1.414134 1.040007 1.035859\n", 1566 | "age 4.142102 5.608923 2.712137" 1567 | ] 1568 | }, 1569 | "execution_count": 63, 1570 | "metadata": {}, 1571 | "output_type": "execute_result" 1572 | } 1573 | ], 1574 | "source": [ 1575 | "pd.DataFrame(centroids, columns=features).T" 1576 | ] 1577 | }, 1578 | { 1579 | "cell_type": "code", 1580 | "execution_count": null, 1581 | "metadata": {}, 1582 | "outputs": [], 1583 | "source": [] 1584 | } 1585 | ], 1586 | "metadata": { 1587 | "kernelspec": { 1588 | "display_name": "Python 3.9.13 64-bit", 1589 | "language": "python", 1590 | "name": "python3" 1591 | }, 1592 | "language_info": { 1593 | "codemirror_mode": { 1594 | "name": "ipython", 1595 | "version": 3 1596 | }, 1597 | "file_extension": ".py", 1598 | "mimetype": "text/x-python", 1599 | "name": "python", 1600 | "nbconvert_exporter": "python", 1601 | "pygments_lexer": "ipython3", 1602 | "version": "3.9.13" 1603 | }, 1604 | "orig_nbformat": 4, 1605 | "vscode": { 1606 | "interpreter": { 1607 | "hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49" 1608 | } 1609 | } 1610 | }, 1611 | "nbformat": 4, 1612 | "nbformat_minor": 2 1613 | } 1614 | --------------------------------------------------------------------------------
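The notebook ends by fitting scikit-learn's `KMeans` and printing its centroids, which invites a comparison with the from-scratch centroids printed earlier. Cluster ids are arbitrary in both runs, so one simple check is to pair each from-scratch centroid with its nearest scikit-learn centroid and look at the distance between them. The sketch below is one possible way to do that and is not part of the notebook: the centroid values are copied from the two centroid tables in the notebook output, and the helper name `nearest_match` is hypothetical.

```python
import math

# Feature order used for every centroid tuple below.
features = ["overall", "potential", "wage_eur", "value_eur", "age"]

# Centroids copied from the notebook output (scaled feature values),
# keyed by cluster id.
scratch_centroids = {
    0: (5.807503, 6.497870, 1.420500, 1.285685, 3.598215),
    1: (4.781960, 4.506813, 1.118498, 1.044909, 5.467648),
    2: (3.205672, 4.930905, 1.028564, 1.026655, 2.514741),
}
sklearn_centroids = {
    0: (6.232940, 6.623509, 1.657900, 1.414134, 4.142102),
    1: (4.800860, 4.503855, 1.112885, 1.040007, 5.608923),
    2: (3.601045, 5.207475, 1.040009, 1.035859, 2.712137),
}


def nearest_match(a, b):
    """Map each cluster id in `a` to the id of the closest centroid in `b`.

    Hypothetical helper: this is a plain nearest-neighbour lookup using
    Euclidean distance, so it is not guaranteed to be one-to-one.
    """
    return {
        i: min(b, key=lambda j: math.dist(centroid, b[j]))
        for i, centroid in a.items()
    }


mapping = nearest_match(scratch_centroids, sklearn_centroids)
for i, j in mapping.items():
    distance = math.dist(scratch_centroids[i], sklearn_centroids[j])
    print(f"scratch cluster {i} -> sklearn cluster {j} (distance {distance:.3f})")
```

If two from-scratch centroids mapped to the same scikit-learn centroid, that would suggest the two runs found different groupings. Both implementations start from random centroids, so the ids and the exact values can change between runs.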