├── .gitignore
├── 1. Beginning with Python.ipynb
├── 2. First steps with Pandas.ipynb
├── 3.1 Visualization with Matplotlib.ipynb
├── 3.2 Visualization with Seaborn.ipynb
├── 4. More Python basics.ipynb
├── 5. More Pandas.ipynb
├── CHANGELOG.md
├── LICENSE
├── README.md
├── data
│   ├── Iris
│   │   ├── Iris.csv
│   │   └── Iris_data.csv
│   ├── Penguins
│   │   ├── penguins.csv
│   │   └── penguins_clean.csv
│   ├── Pokemon
│   │   └── pokemon.csv
│   └── food_training
│       ├── languages.csv
│       ├── training_2014.csv
│       ├── training_2015.csv
│       └── training_2016.csv
├── media
│   ├── colab
│   │   ├── image1.png
│   │   ├── image10.png
│   │   ├── image2.png
│   │   ├── image3.png
│   │   ├── image4.png
│   │   ├── image5.png
│   │   ├── image6.png
│   │   ├── image7.png
│   │   ├── image8.png
│   │   └── image9.png
│   ├── humble-data-logo-transparent.png
│   ├── humble-data-logo-white-transparent.png
│   └── matplotlib_logo_light.svg
├── requirements.txt
└── solutions
    ├── 01_01.py
    ├── 01_02.py
    ├── 01_03.py
    ├── 01_04.py
    ├── 01_05.py
    ├── 01_06.py
    ├── 01_07.py
    ├── 01_08.py
    ├── 01_09.py
    ├── 01_10.py
    ├── 01_11.py
    ├── 01_12.py
    ├── 01_13.py
    ├── 01_14.py
    ├── 01_15.py
    ├── 01_16.py
    ├── 01_17.py
    ├── 01_18.py
    ├── 01_19.py
    ├── 01_20.py
    ├── 01_21.py
    ├── 01_22.py
    ├── 01_23.py
    ├── 01_24.py
    ├── 01_25.py
    ├── 01_26.py
    ├── 01_27.py
    ├── 01_28.py
    ├── 01_29.py
    ├── 01_30.py
    ├── 01_31.py
    ├── 01_32.py
    ├── 01_33.py
    ├── 01_34.py
    ├── 01_35.py
    ├── 01_36.py
    ├── 01_37.py
    ├── 01_38.py
    ├── 02_01.py
    ├── 02_02.py
    ├── 02_03.py
    ├── 02_04.py
    ├── 02_05.py
    ├── 02_06.py
    ├── 02_07.py
    ├── 02_08.py
    ├── 02_09.py
    ├── 02_10.py
    ├── 02_11.py
    ├── 02_12.py
    ├── 02_13.py
    ├── 02_14.py
    ├── 02_15.py
    ├── 02_16.py
    ├── 02_17.py
    ├── 02_18.py
    ├── 02_19.py
    ├── 02_20.py
    ├── 02_21.py
    ├── 02_22.py
    ├── 02_23.py
    ├── 02_24.py
    ├── 02_25.py
    ├── 02_26.py
    ├── 02_27.py
    ├── 02_28.py
    ├── 02_29.py
    ├── 02_30.py
    ├── 02_31.py
    ├── 02_32.py
    ├── 04_01.py
    ├── 04_02.py
    ├── 04_03.py
    ├── 04_04.py
    ├── 04_05.py
    ├── 04_06.py
    ├── 04_07.py
    ├── 04_08.py
    ├── 04_09.py
    ├── 05_01.py
    ├── 05_02.py
    ├── 05_03.py
    ├── 05_04.py
    ├── 05_05.py
    ├── 05_06.py
    ├── 05_07.py
    ├── 05_08.py
    ├── 05_09.py
    ├── 05_10.py
    ├── 05_11.py
    ├── 05_12.py
    ├── 05_13.py
    ├── 05_14.py
    ├── 05_15.py
    ├── 05_16.py
    ├── 05_17.py
    ├── 05_18.py
    ├── 05_19.py
    ├── 05_20.py
    ├── 05_21.py
    ├── 05_22.py
    ├── 05_23.py
    ├── 05_24.py
    ├── 05_25.py
    ├── 05_26.py
    ├── 05_27.py
    ├── 05_28.py
    ├── 05_29.py
    ├── 05_30.py
    ├── 05_31.py
    ├── 05_32.py
    ├── 05_33.py
    ├── 05_34.py
    ├── 05_35.py
    ├── 05_36.py
    ├── 05_37.py
    ├── 05_38.py
    ├── 05_39.py
    ├── 05_40.py
    ├── 05_41.py
    ├── 05_42.py
    ├── 05_43.py
    ├── 05_44.py
    ├── 05_45.py
    ├── 05_46.py
    └── 05_47.py
/.gitignore:
--------------------------------------------------------------------------------
 1 | ### https://www.gitignore.io/ ###
 2 | 
 3 | # Caches
 4 | **/.ipynb_checkpoints/*
 5 | **/__pycache__/*
 6 | *.pyc
 7 | 
 8 | # Data
 9 | data/Penguins/my_penguins.csv
10 | 
11 | # CoCalc
12 | *.sage-chat
13 | *.sage-jupyter2
14 | 
15 | # Other
16 | .DS_Store
17 | .vscode
18 | 
19 | env_workshop/
20 | 
--------------------------------------------------------------------------------
/2. First steps with Pandas.ipynb:
--------------------------------------------------------------------------------
 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "
# Data Analysis with Pandas\n
" 14 | ] 15 | }, 16 | { 17 | "cell_type": "markdown", 18 | "metadata": {}, 19 | "source": [ 20 | "> ***Note***: This notebook contains solution cells with ***a*** solution. Remember there is not only one solution to a problem! \n", 21 | "> \n", 22 | "> You will recognise these cells as they start with **# !**. \n", 23 | "> \n", 24 | "> If you would like to see the solution, you will have to remove the **#** (which can be done by using **Ctrl** and **/**) and run the cell. If you want to run the solution code, you will have to run the cell again." 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "

## Data analysis packages\n
" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "Data scientists use a wide variety of Python libraries that make working with data significantly easier. The core ones include:\n", 42 | "\n", 43 | "| Package | Description |\n", 44 | "| -- | -- |\n", 45 | "| `NumPy` | Numerical calculations - does all the heavy lifting by passing out to C subroutines. This means you get _both_ the productivity of Python _and_ the computational power of C. Best of both worlds! |\n", 46 | "| `SciPy` | Scientific computing, statistical tests, and much more! |\n", 47 | "| `pandas` | Your data-manipulation Swiss Army knife. You'll likely see pandas used in any PyData demo! pandas is built on top of NumPy, so it's **fast**. |\n", 48 | "| `matplotlib` | An old but powerful data visualisation package, inspired by MATLAB. |\n", 49 | "| `Seaborn` | A newer, easy-to-use but more limited data visualisation package, built on top of matplotlib. |\n", 50 | "| `scikit-learn` | Your one-stop machine learning shop! Classification, regression, clustering, dimensionality reduction and more. |\n", 51 | "| `nltk` and `spacy` | nltk is the Natural Language Toolkit; spacy is a newer, very easy-to-use package for natural language processing. |\n", 52 | "| `statsmodels` | Statistical tests, time series forecasting and more. The \"model formula\" interface will be familiar to R users. |\n", 53 | "| `requests` and `Beautiful Soup` | `requests` + `Beautiful Soup` = a great combination for building web scrapers. |\n", 54 | "| `Jupyter` | Jupyter itself is a package too. See the latest version at https://pypi.org/project/jupyter/, and install a specific version with e.g. `conda install jupyter==1.0.0` |\n", 55 | "\n", 56 | "There are countless other libraries available, too.\n", 57 | "\n", 58 | "For today, we'll focus on the library at the centre of almost all of our work: `pandas`. Pandas is built on top of the speed and power of NumPy."
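To make the table above concrete, here is a minimal sketch of how the two libraries used today fit together (the variable names and values are illustrative only, not part of the workshop data):

```python
# Conventional aliases for the two libraries used in this workshop
import numpy as np
import pandas as pd

# NumPy holds the raw numbers and does the fast numerical work
arr = np.array([3700.0, 3800.0, 4675.0])

# pandas wraps NumPy data with labels, giving us Series and DataFrames
masses = pd.Series(arr, name="body_mass_g")

# pandas methods delegate the number crunching to NumPy under the hood
mean_mass = masses.mean()
```

Nothing here is specific to the penguins data; it only shows that a `Series` is labelled NumPy data.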
59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": {}, 64 | "source": [ 65 | "---\n", 66 | "\n", 67 | "

\n", 68 | "Imports\n", 69 | "


\n", 70 | "
" 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": null, 76 | "metadata": { 77 | "jupyter": { 78 | "outputs_hidden": false 79 | } 80 | }, 81 | "outputs": [], 82 | "source": [ 83 | "import pandas as pd" 84 | ] 85 | }, 86 | { 87 | "cell_type": "markdown", 88 | "metadata": {}, 89 | "source": [ 90 | ">Import numpy using the convention seen at the end of the first notebook." 91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": null, 96 | "metadata": { 97 | "jupyter": { 98 | "outputs_hidden": false 99 | } 100 | }, 101 | "outputs": [], 102 | "source": [] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": null, 107 | "metadata": { 108 | "jupyter": { 109 | "outputs_hidden": false 110 | } 111 | }, 112 | "outputs": [], 113 | "source": "# !cat solutions/02_01.py" 114 | }, 115 | { 116 | "cell_type": "markdown", 117 | "metadata": {}, 118 | "source": [ 119 | "---\n", 120 | "\n", 121 | "

\n", 122 | "Loading the data\n", 123 | "


\n", 124 | "
" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": {}, 130 | "source": [ 131 | "To see a method's documentation, you can use the help function. In Jupyter, you can also just put a question mark before the method." 132 | ] 133 | }, 134 | { 135 | "cell_type": "code", 136 | "execution_count": null, 137 | "metadata": { 138 | "jupyter": { 139 | "outputs_hidden": false 140 | }, 141 | "scrolled": true 142 | }, 143 | "outputs": [], 144 | "source": [ 145 | "?pd.read_csv" 146 | ] 147 | }, 148 | { 149 | "cell_type": "markdown", 150 | "metadata": {}, 151 | "source": [ 152 | "To load the dataframe we are using in this notebook, we will provide the path to the file: ../data/Penguins/penguins.csv" 153 | ] 154 | }, 155 | { 156 | "cell_type": "markdown", 157 | "metadata": {}, 158 | "source": [ 159 | ">Load the dataframe, read it into a pandas DataFrame and assign it to df" 160 | ] 161 | }, 162 | { 163 | "cell_type": "code", 164 | "execution_count": null, 165 | "metadata": { 166 | "jupyter": { 167 | "outputs_hidden": false 168 | } 169 | }, 170 | "outputs": [], 171 | "source": [] 172 | }, 173 | { 174 | "cell_type": "code", 175 | "execution_count": null, 176 | "metadata": { 177 | "jupyter": { 178 | "outputs_hidden": false 179 | } 180 | }, 181 | "outputs": [], 182 | "source": "# !cat solutions/02_02.py" 183 | }, 184 | { 185 | "cell_type": "markdown", 186 | "metadata": {}, 187 | "source": [ 188 | "**To have a look at the first 5 rows of df, we can use the *head* method.**" 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": null, 194 | "metadata": { 195 | "jupyter": { 196 | "outputs_hidden": false 197 | } 198 | }, 199 | "outputs": [], 200 | "source": [ 201 | "df.head()" 202 | ] 203 | }, 204 | { 205 | "cell_type": "markdown", 206 | "metadata": {}, 207 | "source": [ 208 | ">Have a look at the last 3 rows of df using the tail method" 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": null, 214 | "metadata": { 215 | 
"jupyter": { 216 | "outputs_hidden": false 217 | } 218 | }, 219 | "outputs": [], 220 | "source": [] 221 | }, 222 | { 223 | "cell_type": "code", 224 | "execution_count": null, 225 | "metadata": { 226 | "jupyter": { 227 | "outputs_hidden": false 228 | } 229 | }, 230 | "outputs": [], 231 | "source": "# !cat solutions/02_03.py" 232 | }, 233 | { 234 | "cell_type": "markdown", 235 | "metadata": {}, 236 | "source": [ 237 | "---\n", 238 | "\n", 239 | "

## General information about the dataset\n
" 243 | ] 244 | }, 245 | { 246 | "cell_type": "markdown", 247 | "metadata": {}, 248 | "source": [ 249 | "**To get the size of the dataset, we can use the *shape* attribute.** \n", 250 | "The first number is the number of rows, the second the number of columns." 251 | ] 252 | }, 253 | { 254 | "cell_type": "markdown", 255 | "metadata": {}, 256 | "source": [ 257 | ">Show the shape of df (it is an attribute, not a method, so do not put brackets at the end)" 258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "execution_count": null, 263 | "metadata": { 264 | "jupyter": { 265 | "outputs_hidden": false 266 | } 267 | }, 268 | "outputs": [], 269 | "source": [] 270 | }, 271 | { 272 | "cell_type": "code", 273 | "execution_count": null, 274 | "metadata": { 275 | "jupyter": { 276 | "outputs_hidden": false 277 | } 278 | }, 279 | "outputs": [], 280 | "source": "# !cat solutions/02_04.py" 281 | }, 282 | { 283 | "cell_type": "markdown", 284 | "metadata": {}, 285 | "source": [ 286 | ">Get the names of the columns and info about them (number of non-null values and dtype) using the info method." 287 | ] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "execution_count": null, 292 | "metadata": { 293 | "jupyter": { 294 | "outputs_hidden": false 295 | } 296 | }, 297 | "outputs": [], 298 | "source": [] 299 | }, 300 | { 301 | "cell_type": "code", 302 | "execution_count": null, 303 | "metadata": { 304 | "jupyter": { 305 | "outputs_hidden": false 306 | } 307 | }, 308 | "outputs": [], 309 | "source": "# !cat solutions/02_05.py" 310 | }, 311 | { 312 | "cell_type": "markdown", 313 | "metadata": {}, 314 | "source": [ 315 | ">Get the columns of the dataframe using the columns attribute."
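As a sketch of these inspection tools on a small, made-up DataFrame (`toy` is not part of the workshop data):

```python
import pandas as pd

# A tiny, made-up DataFrame just to demonstrate the inspection tools
toy = pd.DataFrame({"species": ["Adelie", "Gentoo"], "body_mass_g": [3700, 5000]})

shape = toy.shape          # a (rows, columns) tuple -- an attribute, no brackets
cols = list(toy.columns)   # the column labels
toy.info()                 # prints column names, non-null counts and dtypes
```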
316 | ] 317 | }, 318 | { 319 | "cell_type": "code", 320 | "execution_count": null, 321 | "metadata": { 322 | "jupyter": { 323 | "outputs_hidden": false 324 | } 325 | }, 326 | "outputs": [], 327 | "source": [] 328 | }, 329 | { 330 | "cell_type": "code", 331 | "execution_count": null, 332 | "metadata": { 333 | "jupyter": { 334 | "outputs_hidden": false 335 | } 336 | }, 337 | "outputs": [], 338 | "source": "# !cat solutions/02_06.py" 339 | }, 340 | { 341 | "cell_type": "markdown", 342 | "metadata": {}, 343 | "source": [ 344 | "---\n", 345 | "\n", 346 | "

## Display settings\n
" 350 | ] 351 | }, 352 | { 353 | "cell_type": "markdown", 354 | "metadata": {}, 355 | "source": [ 356 | "We can check the display option of the notebook." 357 | ] 358 | }, 359 | { 360 | "cell_type": "code", 361 | "execution_count": null, 362 | "metadata": { 363 | "jupyter": { 364 | "outputs_hidden": false 365 | } 366 | }, 367 | "outputs": [], 368 | "source": [ 369 | "pd.options.display.max_rows" 370 | ] 371 | }, 372 | { 373 | "cell_type": "markdown", 374 | "metadata": {}, 375 | "source": [ 376 | ">Force pandas to display 25 rows by changing the value of the above." 377 | ] 378 | }, 379 | { 380 | "cell_type": "code", 381 | "execution_count": null, 382 | "metadata": { 383 | "jupyter": { 384 | "outputs_hidden": false 385 | } 386 | }, 387 | "outputs": [], 388 | "source": [] 389 | }, 390 | { 391 | "cell_type": "code", 392 | "execution_count": null, 393 | "metadata": { 394 | "jupyter": { 395 | "outputs_hidden": false 396 | } 397 | }, 398 | "outputs": [], 399 | "source": "# !cat solutions/02_07.py" 400 | }, 401 | { 402 | "cell_type": "markdown", 403 | "metadata": {}, 404 | "source": [ 405 | "---\n", 406 | "\n", 407 | "

## Subsetting data\n
" 411 | ] 412 | }, 413 | { 414 | "cell_type": "markdown", 415 | "metadata": {}, 416 | "source": [ 417 | "We can subset a dataframe by label, by index or a combination of both. \n", 418 | "There are different ways to do it, using .loc, .iloc and also []. \n", 419 | "See [documentation ](https://pandas.pydata.org/pandas-docs/stable/indexing.html)." 420 | ] 421 | }, 422 | { 423 | "cell_type": "markdown", 424 | "metadata": {}, 425 | "source": [ 426 | ">Display the 'bill_length_mm' column" 427 | ] 428 | }, 429 | { 430 | "cell_type": "code", 431 | "execution_count": null, 432 | "metadata": { 433 | "jupyter": { 434 | "outputs_hidden": false 435 | } 436 | }, 437 | "outputs": [], 438 | "source": [] 439 | }, 440 | { 441 | "cell_type": "code", 442 | "execution_count": null, 443 | "metadata": { 444 | "jupyter": { 445 | "outputs_hidden": false 446 | } 447 | }, 448 | "outputs": [], 449 | "source": "# !cat solutions/02_08.py" 450 | }, 451 | { 452 | "cell_type": "markdown", 453 | "metadata": {}, 454 | "source": [ 455 | "*Note:* We could also use `df.bill_length_mm`, but it's not the greatest idea because it could be mixed with methods and does not work for columns with spaces." 
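A short sketch of why bracket access is the safer habit (the `demo` frame and its column names are made up):

```python
import pandas as pd

demo = pd.DataFrame({"bill length": [39.1, 46.5], "count": [1, 2]})

# Bracket notation works for any column name, including ones with spaces
lengths = demo["bill length"]

# Attribute access fails for "bill length" (not a valid identifier), and
# demo.count is the DataFrame's count() method, not the "count" column!
method_not_column = demo.count
```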
456 | ] 457 | }, 458 | { 459 | "cell_type": "markdown", 460 | "metadata": {}, 461 | "source": [ 462 | ">Have a look at the 12th observation:" 463 | ] 464 | }, 465 | { 466 | "cell_type": "code", 467 | "execution_count": null, 468 | "metadata": { 469 | "jupyter": { 470 | "outputs_hidden": false 471 | } 472 | }, 473 | "outputs": [], 474 | "source": [ 475 | "# using .iloc (uses positions, \"i\" stands for integer)\n" 476 | ] 477 | }, 478 | { 479 | "cell_type": "code", 480 | "execution_count": null, 481 | "metadata": { 482 | "jupyter": { 483 | "outputs_hidden": false 484 | } 485 | }, 486 | "outputs": [], 487 | "source": "# !cat solutions/02_09.py" 488 | }, 489 | { 490 | "cell_type": "code", 491 | "execution_count": null, 492 | "metadata": { 493 | "jupyter": { 494 | "outputs_hidden": false 495 | } 496 | }, 497 | "outputs": [], 498 | "source": [ 499 | "# using .loc (uses indexes and labels)\n" 500 | ] 501 | }, 502 | { 503 | "cell_type": "code", 504 | "execution_count": null, 505 | "metadata": { 506 | "jupyter": { 507 | "outputs_hidden": false 508 | } 509 | }, 510 | "outputs": [], 511 | "source": "# !cat solutions/02_10.py" 512 | }, 513 | { 514 | "cell_type": "markdown", 515 | "metadata": {}, 516 | "source": [ 517 | ">Display the **bill_length_mm** of the last three observations." 
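`.iloc` accepts negative positions just like plain Python sequences; a sketch on a throwaway Series (the exercise itself should be done on `df`):

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

# positions -3, -2 and -1, i.e. the last three values
last_three = s.iloc[-3:]
```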
518 | ] 519 | }, 520 | { 521 | "cell_type": "code", 522 | "execution_count": null, 523 | "metadata": { 524 | "jupyter": { 525 | "outputs_hidden": false 526 | } 527 | }, 528 | "outputs": [], 529 | "source": [ 530 | "# using .iloc\n" 531 | ] 532 | }, 533 | { 534 | "cell_type": "code", 535 | "execution_count": null, 536 | "metadata": { 537 | "jupyter": { 538 | "outputs_hidden": false 539 | } 540 | }, 541 | "outputs": [], 542 | "source": "# !cat solutions/02_11.py" 543 | }, 544 | { 545 | "cell_type": "code", 546 | "execution_count": null, 547 | "metadata": { 548 | "jupyter": { 549 | "outputs_hidden": false 550 | } 551 | }, 552 | "outputs": [], 553 | "source": [ 554 | "# using .loc\n" 555 | ] 556 | }, 557 | { 558 | "cell_type": "code", 559 | "execution_count": null, 560 | "metadata": { 561 | "jupyter": { 562 | "outputs_hidden": false 563 | }, 564 | "scrolled": true 565 | }, 566 | "outputs": [], 567 | "source": "# !cat solutions/02_12.py" 568 | }, 569 | { 570 | "cell_type": "markdown", 571 | "metadata": {}, 572 | "source": [ 573 | "And finally, look at the **flipper_length_mm** and **body_mass_g** of the 146th, the 8th and the 1st observations:" 574 | ] 575 | }, 576 | { 577 | "cell_type": "code", 578 | "execution_count": null, 579 | "metadata": { 580 | "jupyter": { 581 | "outputs_hidden": false 582 | } 583 | }, 584 | "outputs": [], 585 | "source": [ 586 | "# using .iloc\n" 587 | ] 588 | }, 589 | { 590 | "cell_type": "code", 591 | "execution_count": null, 592 | "metadata": { 593 | "jupyter": { 594 | "outputs_hidden": false 595 | } 596 | }, 597 | "outputs": [], 598 | "source": "# !cat solutions/02_13.py" 599 | }, 600 | { 601 | "cell_type": "code", 602 | "execution_count": null, 603 | "metadata": { 604 | "jupyter": { 605 | "outputs_hidden": false 606 | } 607 | }, 608 | "outputs": [], 609 | "source": [ 610 | "# using .loc\n" 611 | ] 612 | }, 613 | { 614 | "cell_type": "code", 615 | "execution_count": null, 616 | "metadata": { 617 | "jupyter": { 618 | "outputs_hidden": false
619 | } 620 | }, 621 | "outputs": [], 622 | "source": "# !cat solutions/02_14.py" 623 | }, 624 | { 625 | "cell_type": "markdown", 626 | "metadata": {}, 627 | "source": [ 628 | "**!!WARNING!!** Unlike Python and ``.iloc``, the end value in a range specified by ``.loc`` **includes** the last index specified. " 629 | ] 630 | }, 631 | { 632 | "cell_type": "code", 633 | "execution_count": null, 634 | "metadata": { 635 | "jupyter": { 636 | "outputs_hidden": false 637 | }, 638 | "scrolled": true 639 | }, 640 | "outputs": [], 641 | "source": [ 642 | "df.iloc[5:10]" 643 | ] 644 | }, 645 | { 646 | "cell_type": "code", 647 | "execution_count": null, 648 | "metadata": { 649 | "jupyter": { 650 | "outputs_hidden": false 651 | } 652 | }, 653 | "outputs": [], 654 | "source": [ 655 | "df.loc[5:10]" 656 | ] 657 | }, 658 | { 659 | "cell_type": "markdown", 660 | "metadata": {}, 661 | "source": [ 662 | "---\n", 663 | "\n", 664 | "

## Filtering data on conditions\n
" 668 | ] 669 | }, 670 | { 671 | "cell_type": "markdown", 672 | "metadata": {}, 673 | "source": [ 674 | "**We can also use condition(s) to filter.** \n", 675 | "We want to display the rows of df where **body_mass_g** is greater than 4000. We will start by creating a mask with this condition." 676 | ] 677 | }, 678 | { 679 | "cell_type": "code", 680 | "execution_count": null, 681 | "metadata": { 682 | "jupyter": { 683 | "outputs_hidden": false 684 | }, 685 | "scrolled": true 686 | }, 687 | "outputs": [], 688 | "source": [ 689 | "mask_PW = df['body_mass_g'] > 4000\n", 690 | "mask_PW" 691 | ] 692 | }, 693 | { 694 | "cell_type": "markdown", 695 | "metadata": {}, 696 | "source": [ 697 | "Note that this return booleans. If we pass this mask to our dataframe, it will display only the rows where the mask is True." 698 | ] 699 | }, 700 | { 701 | "cell_type": "code", 702 | "execution_count": null, 703 | "metadata": { 704 | "jupyter": { 705 | "outputs_hidden": false 706 | } 707 | }, 708 | "outputs": [], 709 | "source": [ 710 | "df[mask_PW]" 711 | ] 712 | }, 713 | { 714 | "cell_type": "markdown", 715 | "metadata": {}, 716 | "source": [ 717 | ">Display the rows of df where **body_mass_g** is greater than 4000 and **flipper_length_mm** is less than 185." 718 | ] 719 | }, 720 | { 721 | "cell_type": "code", 722 | "execution_count": null, 723 | "metadata": { 724 | "jupyter": { 725 | "outputs_hidden": false 726 | } 727 | }, 728 | "outputs": [], 729 | "source": [] 730 | }, 731 | { 732 | "cell_type": "code", 733 | "execution_count": null, 734 | "metadata": { 735 | "jupyter": { 736 | "outputs_hidden": false 737 | } 738 | }, 739 | "outputs": [], 740 | "source": "# !cat solutions/02_15.py" 741 | }, 742 | { 743 | "cell_type": "markdown", 744 | "metadata": {}, 745 | "source": [ 746 | "---\n", 747 | "\n", 748 | "

## Values\n
" 752 | ] 753 | }, 754 | { 755 | "cell_type": "markdown", 756 | "metadata": {}, 757 | "source": [ 758 | "We can get the number of unique values from a certain column by using the `nunique` method.\n", 759 | "\n", 760 | "For example, we can get the number of unique values from the species column:" 761 | ] 762 | }, 763 | { 764 | "cell_type": "code", 765 | "execution_count": null, 766 | "metadata": { 767 | "jupyter": { 768 | "outputs_hidden": false 769 | } 770 | }, 771 | "outputs": [], 772 | "source": [ 773 | "df['species'].nunique()" 774 | ] 775 | }, 776 | { 777 | "cell_type": "markdown", 778 | "metadata": {}, 779 | "source": [ 780 | "We can also get the list of unique values from a certain column by using the `unique` method.\n", 781 | ">Return the list of unique values from the species column" 782 | ] 783 | }, 784 | { 785 | "cell_type": "code", 786 | "execution_count": null, 787 | "metadata": { 788 | "jupyter": { 789 | "outputs_hidden": false 790 | } 791 | }, 792 | "outputs": [], 793 | "source": [] 794 | }, 795 | { 796 | "cell_type": "code", 797 | "execution_count": null, 798 | "metadata": { 799 | "jupyter": { 800 | "outputs_hidden": false 801 | } 802 | }, 803 | "outputs": [], 804 | "source": "# !cat solutions/02_16.py" 805 | }, 806 | { 807 | "cell_type": "markdown", 808 | "metadata": {}, 809 | "source": [ 810 | "---\n", 811 | "\n", 812 | "

## Null Values and NaN\n
" 816 | ] 817 | }, 818 | { 819 | "cell_type": "markdown", 820 | "metadata": {}, 821 | "source": [ 822 | "When you work with data, you will quickly learn that data is never \"clean\": there are almost always missing values. These are usually referred to as null values. In computation, it is best practice to represent them with a \"special number\" that is \"**N**ot **a** **N**umber\", also called NaN.\n", 823 | "\n", 824 | "We can use the `isnull` method to know if a value is null or not. It returns boolean values." 825 | ] 826 | }, 827 | { 828 | "cell_type": "code", 829 | "execution_count": null, 830 | "metadata": { 831 | "jupyter": { 832 | "outputs_hidden": false 833 | } 834 | }, 835 | "outputs": [], 836 | "source": [ 837 | "df['flipper_length_mm'].isnull()" 838 | ] 839 | }, 840 | { 841 | "cell_type": "markdown", 842 | "metadata": {}, 843 | "source": [ 844 | "**We can apply different methods one after the other.** \n", 845 | "For example, we could apply the method `sum` after the method `isnull` to get the number of null observations in the **flipper_length_mm** column.\n", 846 | ">Get the total number of null values for **flipper_length_mm**."
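The chaining pattern described above can be sketched on a small made-up Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([181.0, np.nan, 195.0, np.nan])

# isnull() gives a boolean Series; sum() then counts the True values,
# because True counts as 1 and False as 0
n_missing = s.isnull().sum()
```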
847 | ] 848 | }, 849 | { 850 | "cell_type": "code", 851 | "execution_count": null, 852 | "metadata": {}, 853 | "outputs": [], 854 | "source": [] 855 | }, 856 | { 857 | "cell_type": "code", 858 | "execution_count": null, 859 | "metadata": {}, 860 | "outputs": [], 861 | "source": "# !cat solutions/02_17.py" 862 | }, 863 | { 864 | "cell_type": "markdown", 865 | "metadata": {}, 866 | "source": [ 867 | "To get the count of the different values of a column, we can use the `value_counts` method.\n", 868 | "\n", 869 | "For example, for the species column:" 870 | ] 871 | }, 872 | { 873 | "cell_type": "code", 874 | "execution_count": null, 875 | "metadata": { 876 | "jupyter": { 877 | "outputs_hidden": false 878 | } 879 | }, 880 | "outputs": [], 881 | "source": [ 882 | "df['species'].value_counts()" 883 | ] 884 | }, 885 | { 886 | "cell_type": "markdown", 887 | "metadata": {}, 888 | "source": [ 889 | "If we want to know the count of NaN values, we have to pass the value `False` to the parameter **dropna** (set to `True` by default).\n", 890 | "> Return the count for each sex, including the NaN values." 891 | ] 892 | }, 893 | { 894 | "cell_type": "code", 895 | "execution_count": null, 896 | "metadata": { 897 | "jupyter": { 898 | "outputs_hidden": false 899 | } 900 | }, 901 | "outputs": [], 902 | "source": [] 903 | }, 904 | { 905 | "cell_type": "code", 906 | "execution_count": null, 907 | "metadata": { 908 | "jupyter": { 909 | "outputs_hidden": false 910 | } 911 | }, 912 | "outputs": [], 913 | "source": "# !cat solutions/02_18.py" 914 | }, 915 | { 916 | "cell_type": "markdown", 917 | "metadata": {}, 918 | "source": [ 919 | "To get the proportion instead of the count of these values, we have to pass the value `True` to the parameter **normalize**.\n", 920 | ">Return the proportion for each species."
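A sketch of the two `value_counts` parameters on a made-up Series (not the penguins column):

```python
import numpy as np
import pandas as pd

s = pd.Series(["MALE", "FEMALE", "MALE", np.nan])

counts_with_nan = s.value_counts(dropna=False)  # NaN gets its own row
proportions = s.value_counts(normalize=True)    # fractions of the non-NaN values
```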
921 | ] 922 | }, 923 | { 924 | "cell_type": "code", 925 | "execution_count": null, 926 | "metadata": { 927 | "jupyter": { 928 | "outputs_hidden": false 929 | } 930 | }, 931 | "outputs": [], 932 | "source": [] 933 | }, 934 | { 935 | "cell_type": "code", 936 | "execution_count": null, 937 | "metadata": { 938 | "jupyter": { 939 | "outputs_hidden": false 940 | } 941 | }, 942 | "outputs": [], 943 | "source": "# !cat solutions/02_19.py" 944 | }, 945 | { 946 | "cell_type": "markdown", 947 | "metadata": {}, 948 | "source": [ 949 | ">Using the index attribute, get the indexes of the observation without **flipper_length_mm**" 950 | ] 951 | }, 952 | { 953 | "cell_type": "code", 954 | "execution_count": null, 955 | "metadata": { 956 | "jupyter": { 957 | "outputs_hidden": false 958 | } 959 | }, 960 | "outputs": [], 961 | "source": [] 962 | }, 963 | { 964 | "cell_type": "code", 965 | "execution_count": null, 966 | "metadata": { 967 | "jupyter": { 968 | "outputs_hidden": false 969 | } 970 | }, 971 | "outputs": [], 972 | "source": "# !cat solutions/02_20.py" 973 | }, 974 | { 975 | "cell_type": "markdown", 976 | "metadata": {}, 977 | "source": [ 978 | "Use the **[dropna](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html)** method to remove the row which only has NaN values.\n", 979 | ">Get the help for the dropna method." 
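The key parameter of `dropna` used in the next exercises is `how`; a sketch on a made-up frame:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({"a": [1.0, np.nan, np.nan],
                     "b": [2.0, 3.0, np.nan]})

only_all_nan_dropped = demo.dropna(how="all")  # drops only the row where every value is NaN
any_nan_dropped = demo.dropna(how="any")       # drops every row containing at least one NaN
```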
980 | ] 981 | }, 982 | { 983 | "cell_type": "code", 984 | "execution_count": null, 985 | "metadata": { 986 | "jupyter": { 987 | "outputs_hidden": false 988 | } 989 | }, 990 | "outputs": [], 991 | "source": [] 992 | }, 993 | { 994 | "cell_type": "code", 995 | "execution_count": null, 996 | "metadata": { 997 | "jupyter": { 998 | "outputs_hidden": false 999 | }, 1000 | "scrolled": true 1001 | }, 1002 | "outputs": [], 1003 | "source": "# !cat solutions/02_21.py" 1004 | }, 1005 | { 1006 | "cell_type": "markdown", 1007 | "metadata": {}, 1008 | "source": [ 1009 | ">Use the dropna method to remove the rows of `df` where all of the values are NaN, and assign it to `df_2`." 1010 | ] 1011 | }, 1012 | { 1013 | "cell_type": "code", 1014 | "execution_count": null, 1015 | "metadata": { 1016 | "jupyter": { 1017 | "outputs_hidden": false 1018 | } 1019 | }, 1020 | "outputs": [], 1021 | "source": [] 1022 | }, 1023 | { 1024 | "cell_type": "code", 1025 | "execution_count": null, 1026 | "metadata": { 1027 | "jupyter": { 1028 | "outputs_hidden": false 1029 | } 1030 | }, 1031 | "outputs": [], 1032 | "source": "# !cat solutions/02_22.py" 1033 | }, 1034 | { 1035 | "cell_type": "markdown", 1036 | "metadata": {}, 1037 | "source": [ 1038 | "We can use an f-string to format a string: write an `f` before the opening quotation mark, and put what you want to format between curly brackets." 1039 | ] 1040 | }, 1041 | { 1042 | "cell_type": "code", 1043 | "execution_count": null, 1044 | "metadata": { 1045 | "jupyter": { 1046 | "outputs_hidden": false 1047 | } 1048 | }, 1049 | "outputs": [], 1050 | "source": [ 1051 | "print(f'shape of df: {df.shape}')" 1052 | ] 1053 | }, 1054 | { 1055 | "cell_type": "markdown", 1056 | "metadata": {}, 1057 | "source": [ 1058 | "> Print the number of rows of `df_2` using an f-string. Did we lose any rows between `df` and `df_2`? If not, why not?"
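A sketch of the f-string syntax (the `demo` frame is made up):

```python
import pandas as pd

demo = pd.DataFrame({"x": [1, 2, 3]})

# Any expression inside the curly brackets is evaluated and formatted
message = f"shape of demo: {demo.shape}"
row_message = f"demo has {len(demo)} rows"
```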
1059 | ] 1060 | }, 1061 | { 1062 | "cell_type": "code", 1063 | "execution_count": null, 1064 | "metadata": { 1065 | "jupyter": { 1066 | "outputs_hidden": false 1067 | } 1068 | }, 1069 | "outputs": [], 1070 | "source": [] 1071 | }, 1072 | { 1073 | "cell_type": "code", 1074 | "execution_count": null, 1075 | "metadata": { 1076 | "jupyter": { 1077 | "outputs_hidden": false 1078 | } 1079 | }, 1080 | "outputs": [], 1081 | "source": "# !cat solutions/02_23.py" 1082 | }, 1083 | { 1084 | "cell_type": "markdown", 1085 | "metadata": {}, 1086 | "source": [ 1087 | ">Use the dropna method to remove the rows of `df_2` which contain any NaN values, and assign it to `df_3`" 1088 | ] 1089 | }, 1090 | { 1091 | "cell_type": "code", 1092 | "execution_count": null, 1093 | "metadata": { 1094 | "jupyter": { 1095 | "outputs_hidden": false 1096 | } 1097 | }, 1098 | "outputs": [], 1099 | "source": "# !cat solutions/02_24.py" 1100 | }, 1101 | { 1102 | "cell_type": "markdown", 1103 | "metadata": {}, 1104 | "source": [ 1105 | ">Print the number of rows of `df_3` using an f-string." 1106 | ] 1107 | }, 1108 | { 1109 | "cell_type": "code", 1110 | "execution_count": null, 1111 | "metadata": { 1112 | "jupyter": { 1113 | "outputs_hidden": false 1114 | } 1115 | }, 1116 | "outputs": [], 1117 | "source": [] 1118 | }, 1119 | { 1120 | "cell_type": "code", 1121 | "execution_count": null, 1122 | "metadata": { 1123 | "jupyter": { 1124 | "outputs_hidden": false 1125 | } 1126 | }, 1127 | "outputs": [], 1128 | "source": "# !cat solutions/02_25.py" 1129 | }, 1130 | { 1131 | "cell_type": "markdown", 1132 | "metadata": {}, 1133 | "source": [ 1134 | "---\n", 1135 | "\n", 1136 | "

## Duplicates\n
" 1140 | ] 1141 | }, 1142 | { 1143 | "cell_type": "markdown", 1144 | "metadata": {}, 1145 | "source": [ 1146 | ">Remove the duplicate rows from `df_3`, and assign the new dataframe to `df_4`" 1147 | ] 1148 | }, 1149 | { 1150 | "cell_type": "code", 1151 | "execution_count": null, 1152 | "metadata": { 1153 | "jupyter": { 1154 | "outputs_hidden": false 1155 | } 1156 | }, 1157 | "outputs": [], 1158 | "source": [] 1159 | }, 1160 | { 1161 | "cell_type": "code", 1162 | "execution_count": null, 1163 | "metadata": { 1164 | "jupyter": { 1165 | "outputs_hidden": false 1166 | }, 1167 | "scrolled": true 1168 | }, 1169 | "outputs": [], 1170 | "source": "# !cat solutions/02_26.py" 1171 | }, 1172 | { 1173 | "cell_type": "code", 1174 | "execution_count": null, 1175 | "metadata": { 1176 | "jupyter": { 1177 | "outputs_hidden": false 1178 | } 1179 | }, 1180 | "outputs": [], 1181 | "source": [ 1182 | "# checking the shape of df_4\n", 1183 | "df_4.shape" 1184 | ] 1185 | }, 1186 | { 1187 | "cell_type": "markdown", 1188 | "metadata": {}, 1189 | "source": [ 1190 | "You should see that 4 rows have been dropped. " 1191 | ] 1192 | }, 1193 | { 1194 | "cell_type": "markdown", 1195 | "metadata": {}, 1196 | "source": [ 1197 | "---\n", 1198 | "\n", 1199 | "

## Some stats\n
" 1203 | ] 1204 | }, 1205 | { 1206 | "cell_type": "markdown", 1207 | "metadata": {}, 1208 | "source": [ 1209 | ">Use the describe method to see how the data is distributed (numerical features only!)" 1210 | ] 1211 | }, 1212 | { 1213 | "cell_type": "code", 1214 | "execution_count": null, 1215 | "metadata": { 1216 | "jupyter": { 1217 | "outputs_hidden": false 1218 | } 1219 | }, 1220 | "outputs": [], 1221 | "source": [] 1222 | }, 1223 | { 1224 | "cell_type": "code", 1225 | "execution_count": null, 1226 | "metadata": { 1227 | "jupyter": { 1228 | "outputs_hidden": false 1229 | } 1230 | }, 1231 | "outputs": [], 1232 | "source": "# !cat solutions/02_27.py" 1233 | }, 1234 | { 1235 | "cell_type": "markdown", 1236 | "metadata": {}, 1237 | "source": [ 1238 | "We can also change the **species** column to save memory space. Note: You may receive a **SettingWithCopyWarning** - you can safely ignore this error for this notebook." 1239 | ] 1240 | }, 1241 | { 1242 | "cell_type": "code", 1243 | "execution_count": null, 1244 | "metadata": { 1245 | "jupyter": { 1246 | "outputs_hidden": false 1247 | } 1248 | }, 1249 | "outputs": [], 1250 | "source": [ 1251 | "df_4['species'] = df_4['species'].astype('category')" 1252 | ] 1253 | }, 1254 | { 1255 | "cell_type": "markdown", 1256 | "metadata": {}, 1257 | "source": [ 1258 | ">Using the dtypes attribute, check the types of the columns of `df_4`" 1259 | ] 1260 | }, 1261 | { 1262 | "cell_type": "code", 1263 | "execution_count": null, 1264 | "metadata": { 1265 | "jupyter": { 1266 | "outputs_hidden": false 1267 | } 1268 | }, 1269 | "outputs": [], 1270 | "source": [] 1271 | }, 1272 | { 1273 | "cell_type": "code", 1274 | "execution_count": null, 1275 | "metadata": { 1276 | "jupyter": { 1277 | "outputs_hidden": false 1278 | } 1279 | }, 1280 | "outputs": [], 1281 | "source": "# !cat solutions/02_28.py" 1282 | }, 1283 | { 1284 | "cell_type": "markdown", 1285 | "metadata": {}, 1286 | "source": [ 1287 | "We can also use the functions count(), mean(), 
sum(), median(), std(), min() and max() separately if we are only interested in one of those." 1288 | ] 1289 | }, 1290 | { 1291 | "cell_type": "markdown", 1292 | "metadata": {}, 1293 | "source": [ 1294 | ">Get the minimum for each numerical column of `df_4`" 1295 | ] 1296 | }, 1297 | { 1298 | "cell_type": "code", 1299 | "execution_count": null, 1300 | "metadata": { 1301 | "jupyter": { 1302 | "outputs_hidden": false 1303 | } 1304 | }, 1305 | "outputs": [], 1306 | "source": [] 1307 | }, 1308 | { 1309 | "cell_type": "code", 1310 | "execution_count": null, 1311 | "metadata": { 1312 | "jupyter": { 1313 | "outputs_hidden": false 1314 | } 1315 | }, 1316 | "outputs": [], 1317 | "source": "# !cat solutions/02_29.py" 1318 | }, 1319 | { 1320 | "cell_type": "markdown", 1321 | "metadata": {}, 1322 | "source": [ 1323 | ">Calculate the maximum of the **flipper_length_mm**" 1324 | ] 1325 | }, 1326 | { 1327 | "cell_type": "code", 1328 | "execution_count": null, 1329 | "metadata": { 1330 | "jupyter": { 1331 | "outputs_hidden": false 1332 | } 1333 | }, 1334 | "outputs": [], 1335 | "source": [] 1336 | }, 1337 | { 1338 | "cell_type": "code", 1339 | "execution_count": null, 1340 | "metadata": { 1341 | "jupyter": { 1342 | "outputs_hidden": false 1343 | } 1344 | }, 1345 | "outputs": [], 1346 | "source": "# !cat solutions/02_30.py" 1347 | }, 1348 | { 1349 | "cell_type": "markdown", 1350 | "metadata": {}, 1351 | "source": [ 1352 | "We can also get information for each species using the `groupby` method.\n", 1353 | "\n", 1354 | "\n", 1355 | "> Get the median for each **species**." 1356 | ] 1357 | }, 1358 | { 1359 | "cell_type": "code", 1360 | "execution_count": null, 1361 | "metadata": { 1362 | "jupyter": { 1363 | "outputs_hidden": false 1364 | } 1365 | }, 1366 | "outputs": [], 1367 | "source": "# !cat solutions/02_31.py" 1368 | }, 1369 | { 1370 | "cell_type": "markdown", 1371 | "metadata": {}, 1372 | "source": [ 1373 | "---\n", 1374 | "\n", 1375 | "
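The per-column and per-group aggregations above can be sketched on a tiny hand-made frame (the values below are made up for illustration; in the notebook you would call these on `df_4` instead):

```python
import pandas as pd

# Tiny stand-in for df_4 (made-up values)
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo"],
    "flipper_length_mm": [180.0, 190.0, 210.0],
})

# Minimum of each numerical column
col_min = toy.min(numeric_only=True)

# Median of each numerical column per species, as in the groupby exercise
per_species = toy.groupby("species")["flipper_length_mm"].median()

print(col_min["flipper_length_mm"])  # 180.0
print(per_species["Adelie"])         # 185.0
```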

\n", 1376 | "Saving the dataframe as a csv file\n", 1377 | "


\n", 1378 | "
" 1379 | ] 1380 | }, 1381 | { 1382 | "cell_type": "markdown", 1383 | "metadata": {}, 1384 | "source": ">Save df_4 using this path: `'data/Penguins/my_penguins.csv'`" 1385 | }, 1386 | { 1387 | "cell_type": "code", 1388 | "execution_count": null, 1389 | "metadata": { 1390 | "jupyter": { 1391 | "outputs_hidden": false 1392 | } 1393 | }, 1394 | "outputs": [], 1395 | "source": [] 1396 | }, 1397 | { 1398 | "cell_type": "code", 1399 | "execution_count": null, 1400 | "metadata": { 1401 | "jupyter": { 1402 | "outputs_hidden": false 1403 | } 1404 | }, 1405 | "outputs": [], 1406 | "source": "# !cat solutions/02_32.py\n" 1407 | } 1408 | ], 1409 | "metadata": { 1410 | "kernelspec": { 1411 | "display_name": "Python 3 (system-wide)", 1412 | "language": "python", 1413 | "name": "python3" 1414 | }, 1415 | "language_info": { 1416 | "codemirror_mode": { 1417 | "name": "ipython", 1418 | "version": 3 1419 | }, 1420 | "file_extension": ".py", 1421 | "mimetype": "text/x-python", 1422 | "name": "python", 1423 | "nbconvert_exporter": "python", 1424 | "pygments_lexer": "ipython3", 1425 | "version": "3.8.5" 1426 | } 1427 | }, 1428 | "nbformat": 4, 1429 | "nbformat_minor": 4 1430 | } 1431 | -------------------------------------------------------------------------------- /3.1 Visualization with Matplotlib.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "
\n", 8 | "

\n", 9 | "

\n", 10 | "

\n", 11 | "Data visualization with matplotlib\n", 12 | "

\n", 13 | "
" 14 | ] 15 | }, 16 | { 17 | "cell_type": "markdown", 18 | "metadata": {}, 19 | "source": "![](media/matplotlib_logo_light.svg)" 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "

\n", 26 | "Import pyplot in matplotlib (and pandas)\n", 27 | "


\n", 28 | "
" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "According to the [official documentation](https://matplotlib.org/gallery/index.html):\n", 36 | "\n", 37 | "`matplotlib.pyplot` is a collection of command style functions that make Matplotlib work like MATLAB. Each pyplot function makes some change to a figure: e.g., creates a figure, creates a plotting area in a figure, plots some lines in a plotting area, decorates the plot with labels, etc.\n", 38 | "\n", 39 | "`pyplot` is mainly intended for interactive plots and simple cases of programmatic plot generation." 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "![](https://miro.medium.com/max/2000/1*swPzVFGpYdijWAmbrydCDw.png)" 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": null, 52 | "metadata": { 53 | "collapsed": false, 54 | "jupyter": { 55 | "outputs_hidden": false 56 | } 57 | }, 58 | "outputs": [], 59 | "source": [ 60 | "%matplotlib inline\n", 61 | "# this is for ipython interpreter to show the plot in Jupyter\n", 62 | "\n", 63 | "import pandas as pd\n", 64 | "import matplotlib.pyplot as plt" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "### Import the dataframe again, read it into a pandas DataFrame and assign it to df." 
72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": null, 77 | "metadata": { 78 | "collapsed": false, 79 | "jupyter": { 80 | "outputs_hidden": false 81 | } 82 | }, 83 | "outputs": [], 84 | "source": "df = pd.read_csv('data/Penguins/penguins_clean.csv')" 85 | }, 86 | { 87 | "cell_type": "markdown", 88 | "metadata": {}, 89 | "source": [ 90 | "### Refresh our memory about how the data looks like" 91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": null, 96 | "metadata": { 97 | "collapsed": false, 98 | "jupyter": { 99 | "outputs_hidden": false 100 | } 101 | }, 102 | "outputs": [], 103 | "source": [ 104 | "df.head()" 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": {}, 110 | "source": [ 111 | "### Using DataFrame.plot() in pandas\n", 112 | "\n", 113 | "pandas DataFrame object has a `plot()` method which provide basic plot of different kinds, including: 'line', 'bar', 'hist', 'box' etc. You can also set parameters to control the layout and labels of the plot.\n", 114 | "\n", 115 | "`plot()` uses `matplotlib.pyplot` in the background which makes plotting data in a DataFrame much easier \n", 116 | "\n", 117 | "You will find this page very helpful:\n", 118 | "https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html" 119 | ] 120 | }, 121 | { 122 | "cell_type": "markdown", 123 | "metadata": {}, 124 | "source": [ 125 | "#### Example: Box plot in general" 126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "execution_count": null, 131 | "metadata": { 132 | "collapsed": false, 133 | "jupyter": { 134 | "outputs_hidden": false 135 | } 136 | }, 137 | "outputs": [], 138 | "source": [ 139 | "df.plot(kind='box')" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "The scales of our data don't align particularly well. So for the sake of plotting, we'll ignore the body mass of the penguins." 
147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": null, 152 | "metadata": {}, 153 | "outputs": [], 154 | "source": [ 155 | "df.drop([\"body_mass_g\"], axis=1).plot(kind='box')" 156 | ] 157 | }, 158 | { 159 | "cell_type": "markdown", 160 | "metadata": {}, 161 | "source": [ 162 | "#### Better presentation: figure size, add title and legend" 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": null, 168 | "metadata": { 169 | "collapsed": false, 170 | "jupyter": { 171 | "outputs_hidden": false 172 | } 173 | }, 174 | "outputs": [], 175 | "source": [ 176 | "df.drop([\"body_mass_g\"], axis=1).plot(kind='box', figsize=(10,8), title='Box plot of different measurements of species of penguin', legend=True)" 177 | ] 178 | }, 179 | { 180 | "cell_type": "markdown", 181 | "metadata": {}, 182 | "source": [ 183 | "#### Making subplots" 184 | ] 185 | }, 186 | { 187 | "cell_type": "code", 188 | "execution_count": null, 189 | "metadata": { 190 | "collapsed": false, 191 | "jupyter": { 192 | "outputs_hidden": false 193 | } 194 | }, 195 | "outputs": [], 196 | "source": [ 197 | "df.plot(kind='box',\n", 198 | " subplots=True, layout=(2,2),\n", 199 | " figsize=(10,8), title='Box plot of different measurements of species of penguin', legend=True)" 200 | ] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": {}, 205 | "source": [ 206 | "---\n", 207 | "\n", 208 | "

\n", 209 | "Exercise: Compare bill length of different species of penguin\n", 210 | "


\n", 211 | "
" 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "metadata": {}, 217 | "source": [ 218 | "Let's use box plot to compare the bill length of different species of penguin. We need the DataFrame to be slightly different so we can compare the different type species of penguin. We would like to pivot the data so each column are bill length of different species of penguin." 219 | ] 220 | }, 221 | { 222 | "cell_type": "markdown", 223 | "metadata": {}, 224 | "source": [ 225 | "#### Prepare the data set" 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": null, 231 | "metadata": { 232 | "collapsed": false, 233 | "jupyter": { 234 | "outputs_hidden": false 235 | } 236 | }, 237 | "outputs": [], 238 | "source": [ 239 | "df_pivot = df.pivot(columns='species', values='bill_length_mm')\n", 240 | "# tell the pivot() method to make the 'species' as columns, and using the 'bill_length_mm' as the value" 241 | ] 242 | }, 243 | { 244 | "cell_type": "code", 245 | "execution_count": null, 246 | "metadata": { 247 | "collapsed": false, 248 | "jupyter": { 249 | "outputs_hidden": false 250 | } 251 | }, 252 | "outputs": [], 253 | "source": [ 254 | "df_pivot.sample(10)" 255 | ] 256 | }, 257 | { 258 | "cell_type": "markdown", 259 | "metadata": {}, 260 | "source": [ 261 | "#### Box plot of df_pivot\n", 262 | "\n", 263 | "Now we can use `plot()` on `df_pivot`. To make a box plot, remember to set the parameter `kind` to 'box'. Also make the presentation nice by setting a good `figsize` and with a good `title`. Don't forget the `legend`." 
264 | ] 265 | }, 266 | { 267 | "cell_type": "code", 268 | "execution_count": null, 269 | "metadata": { 270 | "collapsed": false, 271 | "jupyter": { 272 | "outputs_hidden": false 273 | } 274 | }, 275 | "outputs": [], 276 | "source": [] 277 | }, 278 | { 279 | "cell_type": "markdown", 280 | "metadata": {}, 281 | "source": [ 282 | "#### Additional exercise\n", 283 | "\n", 284 | "Challenge yourself by making your own `df_pivot` pivoting on a different measure (e.g. Body Mass). Also try using a histogram (hist) instead of a boxplot. You can also try making a plot with 3 subplots, each is a histogram of a type of penguin." 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": null, 290 | "metadata": { 291 | "collapsed": false, 292 | "jupyter": { 293 | "outputs_hidden": false 294 | } 295 | }, 296 | "outputs": [], 297 | "source": [] 298 | }, 299 | { 300 | "cell_type": "markdown", 301 | "metadata": {}, 302 | "source": [ 303 | "So far we are not using `matplotlib.pyplot` directly. Although it is very convenient to use `df.plot()`, sometimes we would like to have more control with what we are plotting and make more complex graphs. In the following sections, we will use `matplotlib.pyplot` (which is imported as `plt` now) directly." 
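One possible solution for the box-plot exercise above, sketched on a tiny made-up frame so it is self-contained (in the notebook you would use the `df` loaded from the penguins CSV and the `df_pivot` built earlier):

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside Jupyter
import matplotlib.pyplot as plt

# Made-up stand-in for the penguins data
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo"],
    "bill_length_mm": [39.1, 38.6, 47.5, 46.2],
})

# One column per species, bill lengths as values
df_pivot = df.pivot(columns="species", values="bill_length_mm")

# Box plot with a sensible size and title
ax = df_pivot.plot(kind="box", figsize=(10, 8),
                   title="Box plot of bill length of different species of penguin")
```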
304 | ] 305 | }, 306 | { 307 | "cell_type": "markdown", 308 | "metadata": {}, 309 | "source": [ 310 | "### Divide the data into 3 types accordingly" 311 | ] 312 | }, 313 | { 314 | "cell_type": "code", 315 | "execution_count": null, 316 | "metadata": { 317 | "collapsed": false, 318 | "jupyter": { 319 | "outputs_hidden": false 320 | } 321 | }, 322 | "outputs": [], 323 | "source": [ 324 | "df['species'].unique()" 325 | ] 326 | }, 327 | { 328 | "cell_type": "code", 329 | "execution_count": null, 330 | "metadata": { 331 | "collapsed": false, 332 | "jupyter": { 333 | "outputs_hidden": false 334 | } 335 | }, 336 | "outputs": [], 337 | "source": [ 338 | "df_adelie = df[df['species'] == 'Adelie']" 339 | ] 340 | }, 341 | { 342 | "cell_type": "code", 343 | "execution_count": null, 344 | "metadata": { 345 | "collapsed": false, 346 | "jupyter": { 347 | "outputs_hidden": false 348 | } 349 | }, 350 | "outputs": [], 351 | "source": [ 352 | "df_chinstrap = df[df['species'] == 'Chinstrap']" 353 | ] 354 | }, 355 | { 356 | "cell_type": "code", 357 | "execution_count": null, 358 | "metadata": { 359 | "collapsed": false, 360 | "jupyter": { 361 | "outputs_hidden": false 362 | } 363 | }, 364 | "outputs": [], 365 | "source": [ 366 | "df_gentoo = df[df['species'] == 'Gentoo']" 367 | ] 368 | }, 369 | { 370 | "cell_type": "markdown", 371 | "metadata": {}, 372 | "source": [ 373 | "### Scatter plot example: plot on Bill Length and Width" 374 | ] 375 | }, 376 | { 377 | "cell_type": "code", 378 | "execution_count": null, 379 | "metadata": { 380 | "collapsed": false, 381 | "jupyter": { 382 | "outputs_hidden": false 383 | } 384 | }, 385 | "outputs": [], 386 | "source": [ 387 | "plt.scatter(df_adelie['bill_length_mm'], df_adelie['bill_depth_mm'], c='r')\n", 388 | "plt.scatter(df_chinstrap['bill_length_mm'], df_chinstrap['bill_depth_mm'], c='g')\n", 389 | "plt.scatter(df_gentoo['bill_length_mm'], df_gentoo['bill_depth_mm'], c='b')" 390 | ] 391 | }, 392 | { 393 | "cell_type": "markdown", 394 | 
"metadata": {}, 395 | "source": [ 396 | "#### Better presentation: figure size, add labels and legend" 397 | ] 398 | }, 399 | { 400 | "cell_type": "code", 401 | "execution_count": null, 402 | "metadata": { 403 | "collapsed": false, 404 | "jupyter": { 405 | "outputs_hidden": false 406 | } 407 | }, 408 | "outputs": [], 409 | "source": [ 410 | "plt.figure(figsize=(10,8)) # set the size of the plot\n", 411 | "\n", 412 | "plt.scatter(df_adelie['bill_length_mm'], df_adelie['bill_depth_mm'], c='r')\n", 413 | "plt.scatter(df_chinstrap['bill_length_mm'], df_chinstrap['bill_depth_mm'], c='g')\n", 414 | "plt.scatter(df_gentoo['bill_length_mm'], df_gentoo['bill_depth_mm'], c='b')\n", 415 | "\n", 416 | "ax = plt.gca() #gca method tell the rest of the code to reference the plot we made\n", 417 | "\n", 418 | "ax.set_xlabel('Bill Length (mm)')\n", 419 | "ax.set_ylabel('Bill Width (mm)')\n", 420 | "ax.set_title('Bill Length and Width for Different Species of Penguin')\n", 421 | "\n", 422 | "ax.legend(('adelie', 'chinstrap', 'gentoo'))" 423 | ] 424 | }, 425 | { 426 | "cell_type": "markdown", 427 | "metadata": {}, 428 | "source": [ 429 | "### Scatter plot exercise: plot on Flipper Length and Body Mass\n", 430 | "\n", 431 | "Now is your turn to make your own plot. 
Make sure you have also set the labels and legend" 432 | ] 433 | }, 434 | { 435 | "cell_type": "code", 436 | "execution_count": null, 437 | "metadata": { 438 | "collapsed": false, 439 | "jupyter": { 440 | "outputs_hidden": false 441 | } 442 | }, 443 | "outputs": [], 444 | "source": [] 445 | }, 446 | { 447 | "cell_type": "markdown", 448 | "metadata": {}, 449 | "source": [ 450 | "### Histogram example: plot on Bill Length" 451 | ] 452 | }, 453 | { 454 | "cell_type": "code", 455 | "execution_count": null, 456 | "metadata": { 457 | "collapsed": false, 458 | "jupyter": { 459 | "outputs_hidden": false 460 | } 461 | }, 462 | "outputs": [], 463 | "source": [ 464 | "plt.figure(figsize=(10,8))\n", 465 | "\n", 466 | "plt.hist(df_adelie['bill_length_mm'], color='r', alpha=.5) # alpha set the transparency of the plot\n", 467 | "plt.hist(df_chinstrap['bill_length_mm'], color='g', alpha=.5)\n", 468 | "plt.hist(df_gentoo['bill_length_mm'], color='b', alpha=.5)\n", 469 | "\n", 470 | "ax = plt.gca()\n", 471 | "\n", 472 | "ax.set_xlabel('Bill Length (mm)')\n", 473 | "ax.set_title('Histogram of Bill Length for Different Species of Penguin')\n", 474 | "\n", 475 | "ax.legend(('adelie', 'chinstrap', 'gentoo'))" 476 | ] 477 | }, 478 | { 479 | "cell_type": "markdown", 480 | "metadata": {}, 481 | "source": [ 482 | "### Histogram exercise: plot on Body Mass\n", 483 | "\n", 484 | "Now is your turn to make your own plot. Make sure you set the alpha to a proper value and have the right the labels and legend." 485 | ] 486 | }, 487 | { 488 | "cell_type": "code", 489 | "execution_count": null, 490 | "metadata": { 491 | "collapsed": false, 492 | "jupyter": { 493 | "outputs_hidden": false 494 | } 495 | }, 496 | "outputs": [], 497 | "source": [] 498 | }, 499 | { 500 | "cell_type": "markdown", 501 | "metadata": {}, 502 | "source": [ 503 | "### Making subplots example\n", 504 | "\n", 505 | "To make subplots with just `plt` is a bit more complicated. 
It is considered more advance and require some understanding of what the building blocks are in a plot. Don't feel bad if you find it challenging, you can always follow the example and try it yourself to understand more what is going on.\n", 506 | "\n", 507 | "The example below plot the histogram of Bill Length and Bill Width side by side" 508 | ] 509 | }, 510 | { 511 | "cell_type": "code", 512 | "execution_count": null, 513 | "metadata": { 514 | "collapsed": false, 515 | "jupyter": { 516 | "outputs_hidden": false 517 | } 518 | }, 519 | "outputs": [], 520 | "source": [ 521 | "# First, we have to decide how many subplots we want and how they are orientated\n", 522 | "# say we want them side by side (i.e. 1 row 2 columns)\n", 523 | "\n", 524 | "fig, (ax0, ax1) = plt.subplots(nrows=1, ncols=2, figsize=(15,8))\n", 525 | "\n", 526 | "# this will create a figure object (which is the whole plot area)\n", 527 | "# and 2 axes (which are the 2 subplots labeled ax0 and ax1)\n", 528 | "\n", 529 | "# Now we can put plots in them accordingly\n", 530 | "\n", 531 | "### for ax0 ###\n", 532 | "\n", 533 | "ax0.hist(df_adelie['bill_length_mm'], color='r', alpha=.5) \n", 534 | "ax0.hist(df_chinstrap['bill_length_mm'], color='g', alpha=.5)\n", 535 | "ax0.hist(df_gentoo['bill_length_mm'], color='b', alpha=.5)\n", 536 | "\n", 537 | "ax0.set_xlabel('Bill Length (mm)')\n", 538 | "ax0.set_title('Histogram of Bill Length for Different Species of Penguin')\n", 539 | "\n", 540 | "ax0.legend(('adelie', 'chinstrap', 'gentoo'))\n", 541 | "\n", 542 | "### for ax1 ###\n", 543 | "\n", 544 | "ax1.hist(df_adelie['bill_depth_mm'], color='r', alpha=.5) \n", 545 | "ax1.hist(df_chinstrap['bill_depth_mm'], color='g', alpha=.5)\n", 546 | "ax1.hist(df_gentoo['bill_depth_mm'], color='b', alpha=.5)\n", 547 | "\n", 548 | "ax1.set_xlabel('Bill Width (mm)')\n", 549 | "ax1.set_title('Histogram of Bill Width for Different Species of Penguin')\n", 550 | "\n", 551 | "ax1.legend(('adelie', 'chinstrap', 'gentoo'))\n", 
552 | "\n", 553 | "plt.show() # after building what we want for both axes, use show() method to show plots" 554 | ] 555 | }, 556 | { 557 | "cell_type": "markdown", 558 | "metadata": {}, 559 | "source": [ 560 | "---\n", 561 | "\n", 562 | "

\n", 563 | "Making subplots exercise\n", 564 | "


\n", 565 | "
" 566 | ] 567 | }, 568 | { 569 | "cell_type": "markdown", 570 | "metadata": {}, 571 | "source": [ 572 | "Make 2 subplots, one on top of another. They are scatter plots of Flipper Length and Body Mass (with different type of penguin). After you have done it, try also other orientation and plots. See if you can make 4 subplots together. Always make sure the presentation is good." 573 | ] 574 | }, 575 | { 576 | "cell_type": "markdown", 577 | "metadata": {}, 578 | "source": [ 579 | "---\n", 580 | "\n", 581 | "

\n", 582 | "More matplotlib!\n", 583 | "


\n", 584 | "
" 585 | ] 586 | }, 587 | { 588 | "cell_type": "markdown", 589 | "metadata": {}, 590 | "source": [ 591 | "Check out more example of histogram with multiple data sets: https://matplotlib.org/gallery/statistics/histogram_multihist.html#sphx-glr-gallery-statistics-histogram-multihist-py\n", 592 | "\n", 593 | "Example: Creates histogram from scatter plot and adds them to the sides of the plot\n", 594 | "https://matplotlib.org/gallery/lines_bars_and_markers/scatter_hist.html#sphx-glr-gallery-lines-bars-and-markers-scatter-hist-py\n", 595 | "\n", 596 | "There are a lot more to learn about matplotlib. It is a very powerful library. You can always learn more by looking at the examples at: https://matplotlib.org/gallery/index.html\n", 597 | "\n", 598 | "Also, if you are stuck, always check the documentation: https://matplotlib.org/api/_as_gen/matplotlib.pyplot.html#module-matplotlib.pyplot\n", 599 | "\n", 600 | "![](https://media0.giphy.com/media/l3nF8lOW9D0ZElDvG/200.gif)\n" 601 | ] 602 | } 603 | ], 604 | "metadata": { 605 | "kernelspec": { 606 | "display_name": "Python 3 (ipykernel)", 607 | "language": "python", 608 | "name": "python3" 609 | }, 610 | "language_info": { 611 | "codemirror_mode": { 612 | "name": "ipython", 613 | "version": 3 614 | }, 615 | "file_extension": ".py", 616 | "mimetype": "text/x-python", 617 | "name": "python", 618 | "nbconvert_exporter": "python", 619 | "pygments_lexer": "ipython3", 620 | "version": "3.11.1" 621 | } 622 | }, 623 | "nbformat": 4, 624 | "nbformat_minor": 4 625 | } 626 | -------------------------------------------------------------------------------- /3.2 Visualization with Seaborn.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "
\n", 8 | "

\n", 9 | "

\n", 10 | "

\n", 11 | "Data visualization with Seaborn\n", 12 | "

\n", 13 | "
" 14 | ] 15 | }, 16 | { 17 | "cell_type": "markdown", 18 | "metadata": {}, 19 | "source": [ 20 | "## About seaborn\n", 21 | "Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics, which is very powerful for visualizing categorical data.\n", 22 | "\n", 23 | "![](https://d1rwhvwstyk9gu.cloudfront.net/2017/07/seaburn-1.png)" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "We will be using the [Pokemon.csv](https://gist.github.com/armgilles/194bcff35001e7eb53a2a8b441e8b2c6). Let's have a look at the data:" 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": null, 36 | "metadata": { 37 | "jupyter": { 38 | "outputs_hidden": false 39 | } 40 | }, 41 | "outputs": [], 42 | "source": [ 43 | "import pandas as pd\n", 44 | "\n", 45 | "pokemon_df = pd.read_csv('data/Pokemon/pokemon.csv', index_col=0)\n", 46 | "pokemon_df.head(10)" 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": {}, 52 | "source": [ 53 | "---\n", 54 | "\n", 55 | "

\n", 56 | "Categorical scatterplots\n", 57 | "


\n", 58 | "
" 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": {}, 64 | "source": [ 65 | "For example, we want to compare the Attack of different type of Pokemon, to see if any type is generally more powerful than the others:" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": null, 71 | "metadata": { 72 | "jupyter": { 73 | "outputs_hidden": false 74 | } 75 | }, 76 | "outputs": [], 77 | "source": [ 78 | "import seaborn as sns\n", 79 | "import matplotlib.pyplot as plt\n", 80 | "\n", 81 | "sns.catplot(x=\"Type 1\", y=\"Attack\", data=pokemon_df);" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": {}, 87 | "source": [ 88 | "When import, we usually simplify 'seaborn' as 'sns'. (It's a [West Wing / Rob Lowe](https://en.wikipedia.org/wiki/Sam_Seaborn) reference!) Note that we have to also have to import matplotlib.pyplot because Seaborn is a library that sit on top of matplotlib. We got a plot but it looks ugly and not readable, let's add some configuration to make it nicer." 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "**Try: adding `aspect=2.5` as the last arguments in the following `sns.catplot`**" 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": null, 101 | "metadata": { 102 | "jupyter": { 103 | "outputs_hidden": false 104 | } 105 | }, 106 | "outputs": [], 107 | "source": [ 108 | "sns.catplot(x=\"Type 1\", y=\"Attack\", data=pokemon_df);" 109 | ] 110 | }, 111 | { 112 | "cell_type": "markdown", 113 | "metadata": {}, 114 | "source": [ 115 | "So you can see that by adding 'aspect' we make the plot wider. The width of the plot is equal to 'aspect * height' so by adding 'aspect' we increase the width of the plot. It is one of the configuration we can add to the plot. 
For the whole list and their details, we can refer to the [official documentation](https://seaborn.pydata.org/generated/seaborn.catplot.html#seaborn.catplot), but we will give an introduction to a few common ones." 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": {}, 121 | "source": [ 122 | "For example, here we see that there is a random x-axis offset for all the points so that we can see them without the dots overlapping each other. This is done by the 'jitter' setting, which defaults to True. Let's turn it off and see how it looks:" 123 | ] 124 | }, 125 | { 126 | "cell_type": "markdown", 127 | "metadata": {}, 128 | "source": [ 129 | "**Try: adding `jitter=False` as the last argument in the following `sns.catplot`**" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": null, 135 | "metadata": { 136 | "jupyter": { 137 | "outputs_hidden": false 138 | } 139 | }, 140 | "outputs": [], 141 | "source": [ 142 | "sns.catplot(x=\"Type 1\", y=\"Attack\", data=pokemon_df, aspect=2.5);" 143 | ] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "metadata": {}, 148 | "source": [ 149 | "So we now have a plot where the points are aligned according to their categories, without the x-axis offsets. Which one to use depends on whether the distribution of the value (e.g. Attack) is important. 
In our case, we want to know how the Attack is distributed in each Type, so maybe it's good to have 'jitter' on, or even better if we can spread it out even more and show the distribution:" 150 | ] 151 | }, 152 | { 153 | "cell_type": "markdown", 154 | "metadata": {}, 155 | "source": [ 156 | "**Try: adding `kind=\"swarm\"` as the last argument in the following `sns.catplot`**" 157 | ] 158 | }, 159 | { 160 | "cell_type": "code", 161 | "execution_count": null, 162 | "metadata": { 163 | "jupyter": { 164 | "outputs_hidden": false 165 | } 166 | }, 167 | "outputs": [], 168 | "source": [ 169 | "sns.catplot(x=\"Type 1\", y=\"Attack\", data=pokemon_df, aspect=2.5);" 170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": {}, 175 | "source": [ 176 | "Here we can do it by setting 'kind' to 'swarm' so the points do not overlap. The disadvantage is that this plot needs more space horizontally. Imagine we don't want to make the plot super wide due to the limitations of the page. We can turn it 90 degrees by flipping the x and the y; we would also adjust the aspect and the height:" 177 | ] 178 | }, 179 | { 180 | "cell_type": "markdown", 181 | "metadata": {}, 182 | "source": [ 183 | "**Try: swap `x` and `y`, and add `height=12, aspect=0.6, kind=\"swarm\"` in the arguments of the following `sns.catplot`**" 184 | ] 185 | }, 186 | { 187 | "cell_type": "code", 188 | "execution_count": null, 189 | "metadata": { 190 | "jupyter": { 191 | "outputs_hidden": false 192 | } 193 | }, 194 | "outputs": [], 195 | "source": [ 196 | "sns.catplot(x=\"Type 1\", y=\"Attack\", data=pokemon_df);" 197 | ] 198 | }, 199 | { 200 | "cell_type": "markdown", 201 | "metadata": {}, 202 | "source": [ 203 | "There are a few things we can observe so far:\n", 204 | "\n", 205 | "1. Some Types, like Psychic, have a very large range of Attack with a long tail at the end (i.e. some Psychic Types have very high Attack power while most of the Psychic Types do not).\n", 206 | "\n", 207 | "2. 
On the other hand, the Poison Types are mostly in the range of 40-110 Attack.\n", 208 | "\n", 209 | "3. In general, Dragon Types have more Attack power than Fairy Types, but there are 2 Fairy Types with higher Attack power." 210 | ] 211 | }, 212 | { 213 | "cell_type": "markdown", 214 | "metadata": {}, 215 | "source": [ 216 | "However, we would like to look deeper: I have a theory that Legendary Pokemon are more powerful. Let's colour code by 'Legendary' to see whether being Legendary has something to do with a Pokemon's Attack:" 217 | ] 218 | }, 219 | { 220 | "cell_type": "markdown", 221 | "metadata": {}, 222 | "source": [ 223 | "**Try: adding `hue=\"Legendary\"` as the last argument in the following `sns.catplot`**" 224 | ] 225 | }, 226 | { 227 | "cell_type": "code", 228 | "execution_count": null, 229 | "metadata": { 230 | "jupyter": { 231 | "outputs_hidden": false 232 | } 233 | }, 234 | "outputs": [], 235 | "source": [ 236 | "plt.figure(figsize=(15, 6))\n", 237 | "sns.stripplot(x=\"Type 1\", y=\"Attack\", data=pokemon_df, size=7)" 238 | ] 239 | }, 240 | { 241 | "cell_type": "markdown", 242 | "metadata": {}, 243 | "source": [ 244 | "Ah ha! We see that many of the Psychic Types with higher Attack than the others are actually Legendary Pokemon. The same happens with the Ground and Flying Types." 245 | ] 246 | }, 247 | { 248 | "cell_type": "markdown", 249 | "metadata": {}, 250 | "source": [ 251 | "### Exercise\n", 252 | "Now it's your turn to do some analysis. Pick a property of the Pokemon: HP, Defense, Sp. Atk, Sp. Def or Speed and do a similar analysis to the one above to see if you can find any interesting facts about Pokemon." 253 | ] 254 | }, 255 | { 256 | "cell_type": "markdown", 257 | "metadata": {}, 258 | "source": [ 259 | "---\n", 260 | "\n", 261 | "
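Putting the pieces of this section together — swarm layout plus the Legendary colour coding — might look like the sketch below, run on a tiny made-up stand-in for `pokemon_df`:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside Jupyter
import seaborn as sns

# Tiny made-up stand-in for pokemon_df
pokemon = pd.DataFrame({
    "Type 1": ["Psychic", "Psychic", "Poison", "Poison"],
    "Attack": [110, 55, 65, 70],
    "Legendary": [True, False, False, False],
})

# Swarm plot, colour-coded by the Legendary flag
g = sns.catplot(x="Type 1", y="Attack", hue="Legendary",
                kind="swarm", data=pokemon, aspect=2.5)
```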

\n", 262 | "Building structured multi-plot grids\n", 263 | "


\n", 264 | "
" 265 | ] 266 | }, 267 | { 268 | "cell_type": "markdown", 269 | "metadata": {}, 270 | "source": [ 271 | "Sometimes, we would have multiple plots in one graph for comparison. One way to do it in seaborn is to use FacetGrid. The FacetGrid class is useful when you want to visualize the distribution of a variable or the relationship between multiple variables separately within subsets of your dataset. In the following, we will be using FacetGrid to see if there is a difference for our analysis above across different Generations." 272 | ] 273 | }, 274 | { 275 | "cell_type": "markdown", 276 | "metadata": {}, 277 | "source": [ 278 | "To make a FacetGrid, we can do the following:" 279 | ] 280 | }, 281 | { 282 | "cell_type": "code", 283 | "execution_count": null, 284 | "metadata": { 285 | "jupyter": { 286 | "outputs_hidden": false 287 | } 288 | }, 289 | "outputs": [], 290 | "source": [ 291 | "g = sns.FacetGrid(pokemon_df, col=\"Generation\")" 292 | ] 293 | }, 294 | { 295 | "cell_type": "markdown", 296 | "metadata": {}, 297 | "source": [ 298 | "Look we have 6 plot areas which match as the number of different of Generations that we have\n", 299 | "(we can check what are the different Generations like this):" 300 | ] 301 | }, 302 | { 303 | "cell_type": "code", 304 | "execution_count": null, 305 | "metadata": { 306 | "jupyter": { 307 | "outputs_hidden": false 308 | } 309 | }, 310 | "outputs": [], 311 | "source": [ 312 | "pokemon_df[\"Generation\"].unique()" 313 | ] 314 | }, 315 | { 316 | "cell_type": "markdown", 317 | "metadata": {}, 318 | "source": [ 319 | "However, we would like to have the plots align vertically rather than horizontally." 
320 | ] 321 | }, 322 | { 323 | "cell_type": "markdown", 324 | "metadata": {}, 325 | "source": [ 326 | "**Try: replace `col` with `row` in the following `sns.FacetGrid`**" 327 | ] 328 | }, 329 | { 330 | "cell_type": "code", 331 | "execution_count": null, 332 | "metadata": { 333 | "jupyter": { 334 | "outputs_hidden": false 335 | } 336 | }, 337 | "outputs": [], 338 | "source": [ 339 | "g = sns.FacetGrid(pokemon_df, col=\"Generation\")" 340 | ] 341 | }, 342 | { 343 | "cell_type": "markdown", 344 | "metadata": {}, 345 | "source": [ 346 | "OK, now that we have the layout, how do we put the plots in? For some plots, it can be done with the [FacetGrid.map()](https://seaborn.pydata.org/generated/seaborn.FacetGrid.map.html#seaborn.FacetGrid.map) method — for example, using sns.countplot to count how many Pokemon there are of each type:" 347 | ] 348 | }, 349 | { 350 | "cell_type": "code", 351 | "execution_count": null, 352 | "metadata": { 353 | "jupyter": { 354 | "outputs_hidden": false 355 | } 356 | }, 357 | "outputs": [], 358 | "source": [ 359 | "g = sns.FacetGrid(pokemon_df, row=\"Generation\", aspect=3.5)\n", 360 | "g.map(sns.countplot, \"Type 1\");" 361 | ] 362 | }, 363 | { 364 | "cell_type": "markdown", 365 | "metadata": {}, 366 | "source": [ 367 | "But with sns.catplot, which we used before, this is even simpler. As catplot is already a FacetGrid, we can directly add the `row` or `col` setting to it." 
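As a minimal sketch of adding `row` directly to catplot (on a tiny made-up stand-in for `pokemon_df` with two Generations, so the grid gets one facet row per Generation):

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside Jupyter
import seaborn as sns

# Tiny made-up stand-in for pokemon_df
pokemon = pd.DataFrame({
    "Type 1": ["Psychic", "Poison", "Psychic", "Poison"],
    "Attack": [110, 65, 95, 70],
    "Generation": [1, 1, 2, 2],
})

# One facet row per Generation
g = sns.catplot(x="Type 1", y="Attack", row="Generation",
                data=pokemon, aspect=2.5)
```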
368 | ] 369 | }, 370 | { 371 | "cell_type": "markdown", 372 | "metadata": {}, 373 | "source": [ 374 | "**Try: add `row=\"Generation\"` as the last argument in the following `sns.catplot`**" 375 | ] 376 | }, 377 | { 378 | "cell_type": "code", 379 | "execution_count": null, 380 | "metadata": { 381 | "jupyter": { 382 | "outputs_hidden": false 383 | } 384 | }, 385 | "outputs": [], 386 | "source": [ 387 | "sns.catplot(x=\"Type 1\", y=\"Attack\", data=pokemon_df,\n", 388 | "            hue=\"Legendary\", aspect=2.5)" 389 | ] 390 | }, 391 | { 392 | "cell_type": "markdown", 393 | "metadata": {}, 394 | "source": "Now you see that in each generation, the Legendary Pokemon are outliers with super attack powers compared with the others within their own generation. For more details on using FacetGrid, see the official documentation here: https://seaborn.pydata.org/tutorial/axis_grids.html\n" 395 | } 396 | ], 397 | "metadata": { 398 | "kernelspec": { 399 | "display_name": "Python 3 (system-wide)", 400 | "language": "python", 401 | "name": "python3" 402 | }, 403 | "language_info": { 404 | "codemirror_mode": { 405 | "name": "ipython", 406 | "version": 3 407 | }, 408 | "file_extension": ".py", 409 | "mimetype": "text/x-python", 410 | "name": "python", 411 | "nbconvert_exporter": "python", 412 | "pygments_lexer": "ipython3", 413 | "version": "3.8.5" 414 | } 415 | }, 416 | "nbformat": 4, 417 | "nbformat_minor": 4 418 | } 419 | -------------------------------------------------------------------------------- /4. More Python basics.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "
\n", 8 | "

\n", 9 | "

\n", 10 | "

\n", 11 | "More Python\n", 12 | "

\n", 13 | "
" 14 | ] 15 | }, 16 | { 17 | "cell_type": "markdown", 18 | "metadata": {}, 19 | "source": [ 20 | "***Note***: This notebook contains solution cells with (a) solution. Remember there is not only one solution to a problem! \n", 21 | "You will recognise these cells as they start with **# %**. \n", 22 | "If you would like to see the solution, you will have to remove the **#** (which can be done by using **Ctrl** and **?**) and run the cell. If you want to run the solution code, you will have to run the cell again." 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "---\n", 30 | "\n", 31 | "

\n", 32 | "Dictionaries\n", 33 | "


\n", 34 | "
" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "**A dictionary is formed of value-key pairs, separated by commas, enclosed in curly brackets ( {} ). \n", 42 | "The key and the value are separated by a column ( : ), i.e. key:value.**" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": null, 48 | "metadata": {}, 49 | "outputs": [], 50 | "source": [ 51 | "dict_greeting = {'Namibia':'Hallo', 'France':'Bonjour', 'Spain':'Ola', 'UK':'Hello', 'Italy':'Ciao'}" 52 | ] 53 | }, 54 | { 55 | "cell_type": "code", 56 | "execution_count": null, 57 | "metadata": {}, 58 | "outputs": [], 59 | "source": [ 60 | "dict_greeting # there is no order in a dictionary" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "**We can access values using the keys between square brackets.** \n", 68 | ">Get the greeting from Italy (use square brackets)." 69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": null, 74 | "metadata": {}, 75 | "outputs": [], 76 | "source": [] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": null, 81 | "metadata": {}, 82 | "outputs": [], 83 | "source": "# !cat solutions/04_01.py" 84 | }, 85 | { 86 | "cell_type": "markdown", 87 | "metadata": {}, 88 | "source": [ 89 | "**Keys are immutables (i.e. can't be changed) but values can be updated.** \n", 90 | ">Replace the UK greeting with 'Good Morning'. \n", 91 | ">Print the dictionary." 
92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": null, 97 | "metadata": {}, 98 | "outputs": [], 99 | "source": [] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": null, 104 | "metadata": {}, 105 | "outputs": [], 106 | "source": "# !cat solutions/04_02.py" 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "**We can also add new key-value pairs.** \n", 113 | ">Add the greeting of 'Hawaii' as 'Aloha'" 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "execution_count": null, 119 | "metadata": {}, 120 | "outputs": [], 121 | "source": [] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": null, 126 | "metadata": {}, 127 | "outputs": [], 128 | "source": "# !cat solutions/04_03.py" 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "---\n", 135 | "\n", 136 | "
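The three dictionary operations covered above (access, update, add) can be summarised with a throw-away dictionary of capitals, so the greetings exercises stay unspoiled:

```python
# Build a small dictionary of key:value pairs
capitals = {"France": "Paris", "Spain": "Madrid"}

# Access a value by its key
print(capitals["France"])  # Paris

# Update the value stored under an existing key
capitals["Spain"] = "MADRID"

# Add a brand-new key-value pair
capitals["Italy"] = "Rome"

print(capitals)
```

Note that assigning to a key either updates it (if it exists) or creates it (if it does not); the syntax is the same in both cases.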

\n", 137 | "Sets\n", 138 | "


\n", 139 | "
" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "**A set is a collection of unique, unordered and un-indexed elements.**" 147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": null, 152 | "metadata": {}, 153 | "outputs": [], 154 | "source": [ 155 | "important_set = set(['me','myself', 'me', 'I'])\n", 156 | "important_set" 157 | ] 158 | }, 159 | { 160 | "cell_type": "markdown", 161 | "metadata": {}, 162 | "source": [ 163 | "We can add an item to a set using the ***add*** method..." 164 | ] 165 | }, 166 | { 167 | "cell_type": "code", 168 | "execution_count": null, 169 | "metadata": {}, 170 | "outputs": [], 171 | "source": [ 172 | "important_set.add('you')\n", 173 | "important_set" 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": {}, 179 | "source": [ 180 | "...or multiple items using the ***update*** method." 181 | ] 182 | }, 183 | { 184 | "cell_type": "code", 185 | "execution_count": null, 186 | "metadata": {}, 187 | "outputs": [], 188 | "source": [ 189 | "other_set = {'me', 'all of you', 'other people'}\n", 190 | "important_set.update(other_set)\n", 191 | "important_set" 192 | ] 193 | }, 194 | { 195 | "cell_type": "markdown", 196 | "metadata": {}, 197 | "source": [ 198 | "You can find the methods for set [here](https://docs.python.org/2/library/sets.html). \n", 199 | "For example, you can get the intersection of two sets." 200 | ] 201 | }, 202 | { 203 | "cell_type": "code", 204 | "execution_count": null, 205 | "metadata": {}, 206 | "outputs": [], 207 | "source": [ 208 | "set_1 = {3, 6, 9, 12, 15, 18, 21, 24, 27, 30}\n", 209 | "set_2 = {5, 10, 15, 20, 25, 30}\n", 210 | "set_3 = set_1.intersection(set_2)\n", 211 | "set_3" 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "metadata": {}, 217 | "source": [ 218 | "---\n", 219 | "\n", 220 | "

\n", 221 | "If - Elif - Else\n", 222 | "


\n", 223 | "
" 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": {}, 229 | "source": [ 230 | "\n", 231 | "Let's write a short program with an ***if*** statement to help us decide if a name is long or not (6 is completely arbitrary ;-)). \n", 232 | "We have to respect blocks of codes / indentation." 233 | ] 234 | }, 235 | { 236 | "cell_type": "code", 237 | "execution_count": null, 238 | "metadata": {}, 239 | "outputs": [], 240 | "source": [ 241 | "name = input('What is your name? ')\n", 242 | "\n", 243 | "if len(name) > 6:\n", 244 | " print('You have a long name.')\n", 245 | "else:\n", 246 | " print('You have a short name.')" 247 | ] 248 | }, 249 | { 250 | "cell_type": "markdown", 251 | "metadata": {}, 252 | "source": [ 253 | ">Write an ***If - Elif - Else*** statement printing 'Python' if x is positive, else 'sunshine' if y is equal to 2, else 'data' if z is a multiple of 3, else 'Why?'. \n", 254 | "You can test it with different values of x, y and z." 255 | ] 256 | }, 257 | { 258 | "cell_type": "code", 259 | "execution_count": null, 260 | "metadata": {}, 261 | "outputs": [], 262 | "source": [ 263 | "x=\n", 264 | "y=\n", 265 | "z=\n", 266 | "\n", 267 | "if\n", 268 | "\n" 269 | ] 270 | }, 271 | { 272 | "cell_type": "code", 273 | "execution_count": null, 274 | "metadata": {}, 275 | "outputs": [], 276 | "source": "# !cat solutions/04_04.py" 277 | }, 278 | { 279 | "cell_type": "markdown", 280 | "metadata": {}, 281 | "source": [ 282 | "---\n", 283 | "\n", 284 | "

\n", 285 | "Using \"and\" to check for None\n", 286 | "


\n", 287 | "
" 288 | ] 289 | }, 290 | { 291 | "cell_type": "markdown", 292 | "metadata": {}, 293 | "source": [ 294 | "Sometimes we have missing data (which is None or np.nan) and will cause error in the check condition. e.g. age > 18 while age is missing or NaN) However, there's a trick. Since in the `and` operation, the second argument will not be checked if the 1st one is False (the result will be False anyway) so by checking if the 1st argument is valid or not we can avoid the error. For example:" 295 | ] 296 | }, 297 | { 298 | "cell_type": "code", 299 | "execution_count": null, 300 | "metadata": {}, 301 | "outputs": [], 302 | "source": [ 303 | "age = None\n", 304 | "age > 18 # you will get an error" 305 | ] 306 | }, 307 | { 308 | "cell_type": "code", 309 | "execution_count": null, 310 | "metadata": {}, 311 | "outputs": [], 312 | "source": [ 313 | "age = None\n", 314 | "(age is not None) and (age > 18) # no error" 315 | ] 316 | }, 317 | { 318 | "cell_type": "markdown", 319 | "metadata": {}, 320 | "source": [ 321 | "The advantage of doing it this way is that we will have simpler code and a less indented `if` structure" 322 | ] 323 | }, 324 | { 325 | "cell_type": "code", 326 | "execution_count": null, 327 | "metadata": {}, 328 | "outputs": [], 329 | "source": [ 330 | "if age is not None:\n", 331 | " if age > 18:\n", 332 | " print(\"have beer\")\n", 333 | "\n", 334 | "# which is not as good as\n", 335 | "\n", 336 | "if (age is not None) and (age > 18):\n", 337 | " print(\"have beer\")" 338 | ] 339 | }, 340 | { 341 | "cell_type": "markdown", 342 | "metadata": {}, 343 | "source": [ 344 | "---\n", 345 | "\n", 346 | "

\n", 347 | "Functions\n", 348 | "


\n", 349 | "
" 350 | ] 351 | }, 352 | { 353 | "cell_type": "markdown", 354 | "metadata": {}, 355 | "source": [ 356 | "**We can define our own functions with the keyword \"def\" followed by the name of the function and by parentheses with the parameter(s) inside.** \n", 357 | "\n", 358 | "Using the list **list_greeting**, we can define a **is_greeting** function which will decide if a string is a greeting or not." 359 | ] 360 | }, 361 | { 362 | "cell_type": "code", 363 | "execution_count": null, 364 | "metadata": {}, 365 | "outputs": [], 366 | "source": [ 367 | "list_greeting = ['Hallo', 'Bonjour', 'Ola', 'Hello', 'Ciao', 'Ave']\n", 368 | "\n", 369 | "def is_greeting(s):\n", 370 | " \"\"\"Returns True if s is in list_greeting, else False.\"\"\"\n", 371 | " if s in list_greeting:\n", 372 | " return True\n", 373 | " else:\n", 374 | " return False " 375 | ] 376 | }, 377 | { 378 | "cell_type": "markdown", 379 | "metadata": {}, 380 | "source": [ 381 | "We can now check if **Ola** and **Yo** are greetings." 382 | ] 383 | }, 384 | { 385 | "cell_type": "code", 386 | "execution_count": null, 387 | "metadata": {}, 388 | "outputs": [], 389 | "source": [ 390 | "is_greeting('Ola')" 391 | ] 392 | }, 393 | { 394 | "cell_type": "code", 395 | "execution_count": null, 396 | "metadata": {}, 397 | "outputs": [], 398 | "source": [ 399 | "is_greeting('Yo')" 400 | ] 401 | }, 402 | { 403 | "cell_type": "markdown", 404 | "metadata": {}, 405 | "source": [ 406 | ">Get the documentation for the is_greeting function." 
407 | ] 408 | }, 409 | { 410 | "cell_type": "code", 411 | "execution_count": null, 412 | "metadata": {}, 413 | "outputs": [], 414 | "source": [] 415 | }, 416 | { 417 | "cell_type": "code", 418 | "execution_count": null, 419 | "metadata": {}, 420 | "outputs": [], 421 | "source": "# !cat solutions/04_05.py" 422 | }, 423 | { 424 | "cell_type": "markdown", 425 | "metadata": {}, 426 | "source": [ 427 | ">Write a function that returns the input multiplied by 3 and increased by 10.\n", 428 | "\n", 429 | "Note: we call these inputs arguments." 430 | ] 431 | }, 432 | { 433 | "cell_type": "code", 434 | "execution_count": null, 435 | "metadata": {}, 436 | "outputs": [], 437 | "source": [] 438 | }, 439 | { 440 | "cell_type": "code", 441 | "execution_count": null, 442 | "metadata": {}, 443 | "outputs": [], 444 | "source": "# !cat solutions/04_06.py\n" 445 | } 446 | ], 447 | "metadata": { 448 | "kernelspec": { 449 | "display_name": "Python 3 (system-wide)", 450 | "language": "python", 451 | "name": "python3" 452 | }, 453 | "language_info": { 454 | "codemirror_mode": { 455 | "name": "ipython", 456 | "version": 3 457 | }, 458 | "file_extension": ".py", 459 | "mimetype": "text/x-python", 460 | "name": "python", 461 | "nbconvert_exporter": "python", 462 | "pygments_lexer": "ipython3", 463 | "version": "3.8.5" 464 | } 465 | }, 466 | "nbformat": 4, 467 | "nbformat_minor": 4 468 | } 469 | -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # Changelog 2 | 3 | All notable changes to this project will be documented in this file. 4 | 5 | The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) 6 | and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html). 
7 | 8 | 9 | 10 | ## [2.0.0] - 2025-02-01 11 | 12 | - simplify set-up process and instructions 13 | - restructure and put all the notebooks in the root directory for easy access in Colab 14 | - use `!cat` instead of the load magic to load solutions, as load magics are not implemented in some environments 15 | 16 | ## [1.1.0] - 2022-07-10 17 | 18 | - Various upgrades ([#40](https://github.com/HumbleData/beginners-data-workshop/pull/40)) 19 | - **Behind-the-scenes** 20 | - Upgrade Python to 3.9.13 21 | - Upgrade all dependencies to latest with NumPy/pandas/Matplotlib/scikit-learn matching CoCalc versions. 22 | - Switch from `pip-tools` to Poetry (configuration in `pyproject.toml`) 23 | - Update Development Setup guidance in `README.md` 24 | - Integrate & configure flake8, pylint, black & other linters. 25 | - Added pre-commit configuration. 26 | - Updated VS Code `settings.json` 27 | - Added `linestripper.py` to keep EOF newlines out of solutions code (better attendee UX) yet black-compliant. 28 | - Added `CHANGELOG.md` and started versioning releases. 29 | - **Workshop materials** 30 | - Reviewed all workshop materials for deprecations. 31 | - Replaced all single quotes in solutions with double quotes for black compliance. 32 | - Added EOF newlines to datasets. 33 | - Other minor changes: trailing whitespace, trailing commas, isort compliant solution code. 34 | 35 | 36 | ## [1.0.0] - 2021-07-23 37 | Final version used for Humble Data workshops in 2021. 38 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. 
2 | To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/4.0/ 3 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Humble Data Workshop 2 | 3 | [![Humble Data Workshop](./media/humble-data-logo-transparent.png)](https://humbledata.org) 4 | 5 | ## ℹ️ If you would like to know more about this workshop, please [email us](mailto:contact@humbledata.org). 6 | 7 | --- 8 | ## Table of Contents 9 | * [Google Colab setup](#google-colab-setup) 10 | * [Local environment setup](#local-environment-setup) 11 | + [UV Installation](#uv-installation) 12 | + [Installing Miniconda](#installing-miniconda) 13 | - [Windows](#windows) 14 | - [Unix (Linux/macOS)](#unix-linuxmacos) 15 | + [Creating and Activating the Environment](#creating-and-activating-the-environment) 16 | 17 | * [License](#license) 18 | --- 19 | 20 | ## Google Colab setup 21 | 22 | 1. Go to [https://githubtocolab.com/HumbleData/beginners-data-workshop](https://githubtocolab.com/HumbleData/beginners-data-workshop) 23 | 2. Choose the notebook that you want to open 24 | ![open a notebook in colab](media/colab/image10.png) 25 | 3. Click on the file icon on the left 26 | 4. If you haven’t logged in to your Google account, you will be asked to do so 27 | ![sign in to Google](media/colab/image2.png) 28 | 5. At the beginning of the notebook, add a cell by clicking the add code icon button at the top 29 | ![adding a code block](media/colab/image1.png) 30 | 6. After that, copy and paste the following code into the new cell: 31 | ``` 32 | !git clone https://github.com/HumbleData/beginners-data-workshop.git 33 | !cp -r beginners-data-workshop/media/ . 34 | !cp -r beginners-data-workshop/data/ . 35 | !cp -r beginners-data-workshop/solutions/ . 36 | !rm -r beginners-data-workshop/ 37 | ``` 38 | > NOTE: You will need to add this code cell to every notebook you start. 
39 | 40 | ![adding the script shown above](media/colab/image7.png) 41 | 7. Run the cell by clicking the play button on the left of the cell or press shift \+ enter on your keyboard 42 | ![running the script shown above](media/colab/image4.png) 43 | 8. You may get this warning when running the first code block. Click “Run anyway” when asked (because you trust us not to give you malicious code). 44 | ![warning about running code in colab](media/colab/image3.png) 45 | 9. When the code is finished (it may take a moment), you should see that three folders are added to your files. Consider the preparation work done and you may now start using the notebook. 46 | 47 | new files added 48 | 10. Note that when you disconnect from the notebook (or leave it inactive for a long time), the files we just downloaded and your work are not saved. 49 | 50 | Consider downloading or saving your work in Drive before you leave this notebook. You can do so by clicking on the “File” button at the bottom. 51 | 52 | saving or downloading file to keep your work 53 | 54 | --- 55 | 56 | ## Local environment setup 57 | 58 | This document contains instructions on how to run the workshop using either `uv` or `conda` (Miniconda). 59 | 60 | ### UV Installation 61 | To run this workshop locally using `uv`, first you will need to [install uv](https://docs.astral.sh/uv/getting-started/installation/) on your computer. 62 | 63 | Once it is done, follow the instructions below: 64 | 65 | 1. Create a Python virtual environment (3.10+) 66 | * `uv venv humble-data-workshop --python 3.10` 67 | 2. Activate the virtual environment. 68 | * `source humble-data-workshop/bin/activate` 69 | 3. Install dependencies 70 | * `uv pip install -r requirements.txt` 71 | 72 | ### Installing Miniconda 73 | 74 | #### Windows 75 | 1. Download the Miniconda installer for Windows from the [official website](https://docs.conda.io/en/latest/miniconda.html) 76 | 2. Double-click the downloaded `.exe` file 77 | 3. 
Follow the installation prompts: 78 | - Click "Next" 79 | - Accept the license terms 80 | - Select "Just Me" for installation scope 81 | - Choose an installation directory (default is recommended) 82 | - In "Advanced Options", check "Add Miniconda3 to my PATH environment variable" 83 | - Click "Install" 84 | 85 | #### Unix (Linux/macOS) 86 | 1. Download the Miniconda installer for your system from the [official website](https://docs.conda.io/en/latest/miniconda.html) 87 | 2. Open Terminal 88 | 3. Navigate to the directory containing the downloaded file 89 | 4. Make the installer executable: 90 | ```bash 91 | chmod +x Miniconda3-latest-*-x86_64.sh 92 | ``` 93 | 5. Run the installer: 94 | ```bash 95 | ./Miniconda3-latest-*-x86_64.sh 96 | ``` 97 | 6. Follow the prompts: 98 | - Press Enter to review the license agreement 99 | - Type "yes" to accept the license terms 100 | - Confirm the installation location (default is recommended) 101 | - Type "yes" to initialize Miniconda3 102 | 103 | ### Creating and Activating the Environment 104 | 105 | 1. Open a new terminal (Windows: Anaconda Prompt, Unix: Terminal) 106 | 2. Create a new environment named 'humble-data': 107 | ```bash 108 | conda create -n humble-data python=3.8 109 | ``` 110 | 3. Activate the environment: 111 | - Windows: 112 | ```bash 113 | conda activate humble-data 114 | ``` 115 | - Unix: 116 | ```bash 117 | conda activate humble-data 118 | ``` 119 | 4. Install required packages: 120 | ```bash 121 | pip install -r requirements.txt 122 | ``` 123 | 124 | 5. Start Jupyter Notebook: 125 | ```bash 126 | jupyter notebook 127 | ``` 128 | This will open Jupyter Notebook in your default web browser. You can now navigate to and open any of the workshop notebooks. 129 | 130 | ## Contributing 131 | 132 | 1. Fork this repository 133 | 2. Clone your fork locally 134 | 3. Create a branch for your changes: 135 | ```git checkout -b improve-notebook-x``` 136 | 137 | 4. 
Make your changes: 138 | 139 | - Keep explanations simple and beginner-friendly 140 | - Test notebooks in both Google Colab and local environments 141 | - Follow existing code style and formatting 142 | 143 | 144 | 5. Commit with a clear message: 145 | ```git commit -m "Fix typo in data visualization notebook"``` 146 | 147 | 6. Push and create a pull request 148 | 149 | --- 150 | 151 | ## License 152 | 153 | Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 154 | -------------------------------------------------------------------------------- /data/Iris/Iris.csv: -------------------------------------------------------------------------------- 1 | Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species 2 | 1,5.1,3.5,1.4,0.2,Iris-setosa 3 | 2,4.9,3.0,1.4,0.2,Iris-setosa 4 | 3,4.7,3.2,1.3,0.2,Iris-setosa 5 | 4,4.6,3.1,1.5,0.2,Iris-setosa 6 | 5,5.0,3.6,1.4,0.2,Iris-setosa 7 | 6,5.4,3.9,1.7,0.4,Iris-setosa 8 | 7,4.6,3.4,1.4,0.3,Iris-setosa 9 | 8,5.0,3.4,1.5,0.2,Iris-setosa 10 | 9,4.4,2.9,1.4,0.2,Iris-setosa 11 | 10,4.9,3.1,1.5,0.1,Iris-setosa 12 | 11,5.4,3.7,1.5,0.2,Iris-setosa 13 | 12,4.8,3.4,1.6,0.2,Iris-setosa 14 | 13,4.8,3.0,1.4,0.1,Iris-setosa 15 | 14,4.3,3.0,1.1,0.1,Iris-setosa 16 | 15,5.8,4.0,1.2,0.2,Iris-setosa 17 | 16,5.7,4.4,1.5,0.4,Iris-setosa 18 | 17,5.4,3.9,1.3,0.4,Iris-setosa 19 | 18,5.1,3.5,1.4,0.3,Iris-setosa 20 | 19,5.7,3.8,1.7,0.3,Iris-setosa 21 | 20,5.1,3.8,1.5,0.3,Iris-setosa 22 | 21,5.4,3.4,1.7,0.2,Iris-setosa 23 | 22,5.1,3.7,1.5,0.4,Iris-setosa 24 | 23,4.6,3.6,1.0,0.2,Iris-setosa 25 | 24,5.1,3.3,1.7,0.5,Iris-setosa 26 | 25,4.8,3.4,1.9,0.2,Iris-setosa 27 | 26,5.0,3.0,1.6,0.2,Iris-setosa 28 | 27,5.0,3.4,1.6,0.4,Iris-setosa 29 | 28,5.2,3.5,1.5,0.2,Iris-setosa 30 | 29,5.2,3.4,1.4,0.2,Iris-setosa 31 | 30,4.7,3.2,1.6,0.2,Iris-setosa 32 | 31,4.8,3.1,1.6,0.2,Iris-setosa 33 | 32,5.4,3.4,1.5,0.4,Iris-setosa 34 | 33,5.2,4.1,1.5,0.1,Iris-setosa 35 | 34,5.5,4.2,1.4,0.2,Iris-setosa 36 | 35,4.9,3.1,1.5,0.1,Iris-setosa 37 | 36,5.0,3.2,1.2,0.2,Iris-setosa 38 | 37,5.5,3.5,1.3,0.2,Iris-setosa 39 | 38,4.9,3.1,1.5,0.1,Iris-setosa 40 | 39,4.4,3.0,1.3,0.2,Iris-setosa 41 | 40,5.1,3.4,1.5,0.2,Iris-setosa 42 | 41,5.0,3.5,1.3,0.3,Iris-setosa 43 | 42,4.5,2.3,1.3,0.3,Iris-setosa 44 | 43,4.4,3.2,1.3,0.2,Iris-setosa 45 | 44,5.0,3.5,1.6,0.6,Iris-setosa 46 | 45,5.1,3.8,1.9,0.4,Iris-setosa 47 | 
46,4.8,3.0,1.4,0.3,Iris-setosa 48 | 47,5.1,3.8,1.6,0.2,Iris-setosa 49 | 48,4.6,3.2,1.4,0.2,Iris-setosa 50 | 49,5.3,3.7,1.5,0.2,Iris-setosa 51 | 50,5.0,3.3,1.4,0.2,Iris-setosa 52 | 51,7.0,3.2,4.7,1.4,Iris-versicolor 53 | 52,6.4,3.2,4.5,1.5,Iris-versicolor 54 | 53,6.9,3.1,4.9,1.5,Iris-versicolor 55 | 54,5.5,2.3,4.0,1.3,Iris-versicolor 56 | 55,6.5,2.8,4.6,1.5,Iris-versicolor 57 | 56,5.7,2.8,4.5,1.3,Iris-versicolor 58 | 57,6.3,3.3,4.7,1.6,Iris-versicolor 59 | 58,4.9,2.4,3.3,1.0,Iris-versicolor 60 | 59,6.6,2.9,4.6,1.3,Iris-versicolor 61 | 60,5.2,2.7,3.9,1.4,Iris-versicolor 62 | 61,5.0,2.0,3.5,1.0,Iris-versicolor 63 | 62,5.9,3.0,4.2,1.5,Iris-versicolor 64 | 63,6.0,2.2,4.0,1.0,Iris-versicolor 65 | 64,6.1,2.9,4.7,1.4,Iris-versicolor 66 | 65,5.6,2.9,3.6,1.3,Iris-versicolor 67 | 66,6.7,3.1,4.4,1.4,Iris-versicolor 68 | 67,5.6,3.0,4.5,1.5,Iris-versicolor 69 | 68,5.8,2.7,4.1,1.0,Iris-versicolor 70 | 69,6.2,2.2,4.5,1.5,Iris-versicolor 71 | 70,5.6,2.5,3.9,1.1,Iris-versicolor 72 | 71,5.9,3.2,4.8,1.8,Iris-versicolor 73 | 72,6.1,2.8,4.0,1.3,Iris-versicolor 74 | 73,6.3,2.5,4.9,1.5,Iris-versicolor 75 | 74,6.1,2.8,4.7,1.2,Iris-versicolor 76 | 75,6.4,2.9,4.3,1.3,Iris-versicolor 77 | 76,6.6,3.0,4.4,1.4,Iris-versicolor 78 | 77,6.8,2.8,4.8,1.4,Iris-versicolor 79 | 78,6.7,3.0,5.0,1.7,Iris-versicolor 80 | 79,6.0,2.9,4.5,1.5,Iris-versicolor 81 | 80,5.7,2.6,3.5,1.0,Iris-versicolor 82 | 81,5.5,2.4,3.8,1.1,Iris-versicolor 83 | 82,5.5,2.4,3.7,1.0,Iris-versicolor 84 | 83,5.8,2.7,3.9,1.2,Iris-versicolor 85 | 84,6.0,2.7,5.1,1.6,Iris-versicolor 86 | 85,5.4,3.0,4.5,1.5,Iris-versicolor 87 | 86,6.0,3.4,4.5,1.6,Iris-versicolor 88 | 87,6.7,3.1,4.7,1.5,Iris-versicolor 89 | 88,6.3,2.3,4.4,1.3,Iris-versicolor 90 | 89,5.6,3.0,4.1,1.3,Iris-versicolor 91 | 90,5.5,2.5,4.0,1.3,Iris-versicolor 92 | 91,5.5,2.6,4.4,1.2,Iris-versicolor 93 | 92,6.1,3.0,4.6,1.4,Iris-versicolor 94 | 93,5.8,2.6,4.0,1.2,Iris-versicolor 95 | 94,5.0,2.3,3.3,1.0,Iris-versicolor 96 | 95,5.6,2.7,4.2,1.3,Iris-versicolor 97 | 
96,5.7,3.0,4.2,1.2,Iris-versicolor 98 | 97,5.7,2.9,4.2,1.3,Iris-versicolor 99 | 98,6.2,2.9,4.3,1.3,Iris-versicolor 100 | 99,5.1,2.5,3.0,1.1,Iris-versicolor 101 | 100,5.7,2.8,4.1,1.3,Iris-versicolor 102 | 101,6.3,3.3,6.0,2.5,Iris-virginica 103 | 102,5.8,2.7,5.1,1.9,Iris-virginica 104 | 103,7.1,3.0,5.9,2.1,Iris-virginica 105 | 104,6.3,2.9,5.6,1.8,Iris-virginica 106 | 105,6.5,3.0,5.8,2.2,Iris-virginica 107 | 106,7.6,3.0,6.6,2.1,Iris-virginica 108 | 107,4.9,2.5,4.5,1.7,Iris-virginica 109 | 108,7.3,2.9,6.3,1.8,Iris-virginica 110 | 109,6.7,2.5,5.8,1.8,Iris-virginica 111 | 110,7.2,3.6,6.1,2.5,Iris-virginica 112 | 111,6.5,3.2,5.1,2.0,Iris-virginica 113 | 112,6.4,2.7,5.3,1.9,Iris-virginica 114 | 113,6.8,3.0,5.5,2.1,Iris-virginica 115 | 114,5.7,2.5,5.0,2.0,Iris-virginica 116 | 115,5.8,2.8,5.1,2.4,Iris-virginica 117 | 116,6.4,3.2,5.3,2.3,Iris-virginica 118 | 117,6.5,3.0,5.5,1.8,Iris-virginica 119 | 118,7.7,3.8,6.7,2.2,Iris-virginica 120 | 119,7.7,2.6,6.9,2.3,Iris-virginica 121 | 120,6.0,2.2,5.0,1.5,Iris-virginica 122 | 121,6.9,3.2,5.7,2.3,Iris-virginica 123 | 122,5.6,2.8,4.9,2.0,Iris-virginica 124 | 123,7.7,2.8,6.7,2.0,Iris-virginica 125 | 124,6.3,2.7,4.9,1.8,Iris-virginica 126 | 125,6.7,3.3,5.7,2.1,Iris-virginica 127 | 126,7.2,3.2,6.0,1.8,Iris-virginica 128 | 127,6.2,2.8,4.8,1.8,Iris-virginica 129 | 128,6.1,3.0,4.9,1.8,Iris-virginica 130 | 129,6.4,2.8,5.6,2.1,Iris-virginica 131 | 130,7.2,3.0,5.8,1.6,Iris-virginica 132 | 131,7.4,2.8,6.1,1.9,Iris-virginica 133 | 132,7.9,3.8,6.4,2.0,Iris-virginica 134 | 133,6.4,2.8,5.6,2.2,Iris-virginica 135 | 134,6.3,2.8,5.1,1.5,Iris-virginica 136 | 135,6.1,2.6,5.6,1.4,Iris-virginica 137 | 136,7.7,3.0,6.1,2.3,Iris-virginica 138 | 137,6.3,3.4,5.6,2.4,Iris-virginica 139 | 138,6.4,3.1,5.5,1.8,Iris-virginica 140 | 139,6.0,3.0,4.8,1.8,Iris-virginica 141 | 140,6.9,3.1,5.4,2.1,Iris-virginica 142 | 141,6.7,3.1,5.6,2.4,Iris-virginica 143 | 142,6.9,3.1,5.1,2.3,Iris-virginica 144 | 143,5.8,2.7,5.1,1.9,Iris-virginica 145 | 
144,6.8,3.2,5.9,2.3,Iris-virginica 146 | 145,6.7,3.3,5.7,2.5,Iris-virginica 147 | 146,6.7,3.0,5.2,2.3,Iris-virginica 148 | 147,6.3,2.5,5.0,1.9,Iris-virginica 149 | 148,6.5,3.0,5.2,2.0,Iris-virginica 150 | 149,6.2,3.4,5.4,2.3,Iris-virginica 151 | 150,5.9,3.0,5.1,1.8,Iris-virginica 152 | -------------------------------------------------------------------------------- /data/Iris/Iris_data.csv: -------------------------------------------------------------------------------- 1 | Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species 2 | 1,5.1,3.5,1.4,0.2,Iris-setosa 3 | 2,4.9,3.0,1.4,0.2,Iris-setosa 4 | 3,4.7,3.2,1.3,0.2,Iris-setosa 5 | 4,4.6,3.1,1.5,0.2,Iris-setosa 6 | 5,5.0,3.6,1.4,0.2,Iris-setosa 7 | 6,5.4,3.9,1.7,0.4,Iris-setosa 8 | 7,4.6,3.4,1.4,0.3,Iris-setosa 9 | 8,5.0,3.4,1.5,0.2,Iris-setosa 10 | 9,4.4,2.9,1.4,0.2,Iris-setosa 11 | 10,4.9,3.1,1.5,0.1,Iris-setosa 12 | 11,5.4,3.7,1.5,0.2,Iris-setosa 13 | 12,4.8,3.4,1.6,0.2,Iris-setosa 14 | 12,4.8,3.4,1.6,0.2,Iris-setosa 15 | 13,4.8,3.0,1.4,0.1,Iris-setosa 16 | 14,4.3,3.0,1.1,0.1,Iris-setosa 17 | 15,5.8,4.0,1.2,0.2,Iris-setosa 18 | 16,5.7,4.4,1.5,0.4,Iris-setosa 19 | 17,5.4,3.9,1.3,0.4,Iris-setosa 20 | 18,5.1,3.5,1.4,0.3,Iris-setosa 21 | 19,5.7,3.8,1.7,0.3,Iris-setosa 22 | 20,5.1,3.8,1.5,0.3,Iris-setosa 23 | 21,5.4,3.4,1.7,0.2,Iris-setosa 24 | 22,5.1,3.7,1.5,0.4,Iris-setosa 25 | 23,4.6,3.6,1.0,0.2,Iris-setosa 26 | 24,5.1,3.3,1.7,0.5,Iris-setosa 27 | 25,4.8,3.4,1.9,0.2,Iris-setosa 28 | 26,5.0,3.0,1.6,0.2,Iris-setosa 29 | 27,5.0,3.4,1.6,0.4,Iris-setosa 30 | 28,5.2,3.5,1.5,0.2,Iris-setosa 31 | 29,5.2,3.4,1.4,0.2,Iris-setosa 32 | 30,4.7,3.2,1.6,0.2,Iris-setosa 33 | 31,4.8,3.1,1.6,0.2,Iris-setosa 34 | 32,5.4,3.4,1.5,0.4,Iris-setosa 35 | 33,5.2,4.1,1.5,0.1,Iris-setosa 36 | 34,5.5,4.2,1.4,0.2,Iris-setosa 37 | 35,4.9,3.1,1.5,0.1,Iris-setosa 38 | 36,5.0,3.2,1.2,0.2,Iris-setosa 39 | 37,5.5,3.5,1.3,0.2,Iris-setosa 40 | 38,4.9,3.1,1.5,0.1,Iris-setosa 41 | 39,4.4,3.0,1.3,0.2,Iris-setosa 42 | 
40,5.1,3.4,1.5,0.2,Iris-setosa 43 | 41,5.0,3.5,1.3,0.3,Iris-setosa 44 | 42,4.5,2.3,1.3,0.3,Iris-setosa 45 | 43,4.4,3.2,1.3,0.2,Iris-setosa 46 | 44,5.0,3.5,1.6,0.6,Iris-setosa 47 | 45,5.1,3.8,1.9,0.4,Iris-setosa 48 | 46,4.8,3.0,1.4,0.3,Iris-setosa 49 | 47,5.1,3.8,1.6,0.2,Iris-setosa 50 | 48,4.6,3.2,1.4,0.2,Iris-setosa 51 | 49,5.3,3.7,1.5,0.2,Iris-setosa 52 | 50,5.0,3.3,1.4,0.2,Iris-setosa 53 | 51,7.0,3.2,4.7,1.4,Iris-versicolor 54 | 52,6.4,3.2,4.5,1.5,Iris-versicolor 55 | 53,6.9,3.1,4.9,1.5,Iris-versicolor 56 | 54,5.5,2.3,4.0,1.3,Iris-versicolor 57 | 55,6.5,2.8,4.6,1.5,Iris-versicolor 58 | 56,5.7,2.8,4.5,1.3,Iris-versicolor 59 | 57,6.3,3.3,4.7,1.6,Iris-versicolor 60 | 58,4.9,2.4,3.3,1.0,Iris-versicolor 61 | 59,6.6,2.9,4.6,1.3,Iris-versicolor 62 | 60,5.2,2.7,3.9,1.4,Iris-versicolor 63 | 61,5.0,2.0,3.5,1.0,Iris-versicolor 64 | 62,5.9,3.0,4.2,1.5,Iris-versicolor 65 | 63,6.0,2.2,4.0,1.0,Iris-versicolor 66 | 64,6.1,2.9,4.7,1.4,Iris-versicolor 67 | 65,5.6,2.9,3.6,1.3,Iris-versicolor 68 | 66,6.7,3.1,4.4,1.4,Iris-versicolor 69 | 67,5.6,3.0,4.5,1.5,Iris-versicolor 70 | 68,5.8,2.7,4.1,1.0,Iris-versicolor 71 | 69,6.2,2.2,4.5,1.5,Iris-versicolor 72 | 70,5.6,2.5,3.9,1.1,Iris-versicolor 73 | 71,5.9,3.2,4.8,1.8,Iris-versicolor 74 | 72,6.3,,4.7,1.6,Iris-versicolor 75 | 73,6.1,2.8,4.0,1.3,Iris-versicolor 76 | 74,6.3,2.5,4.9,1.5,Iris-versicolor 77 | 75,6.1,2.8,4.7,1.2,Iris-versicolor 78 | 76,6.4,2.9,4.3,1.3,Iris-versicolor 79 | 77,6.6,3.0,4.4,1.4,Iris-versicolor 80 | 78,6.8,2.8,4.8,1.4,Iris-versicolor 81 | 79,6.7,3.0,5.0,1.7,Iris-versicolor 82 | 80,6.0,2.9,4.5,1.5,Iris-versicolor 83 | 81,5.7,2.6,3.5,1.0,Iris-versicolor 84 | 82,5.5,2.4,3.8,1.1,Iris-versicolor 85 | 83,5.5,2.4,3.7,1.0,Iris-versicolor 86 | 84,5.8,2.7,3.9,1.2,Iris-versicolor 87 | 85,6.0,2.7,5.1,1.6,Iris-versicolor 88 | 86,5.4,3.0,4.5,1.5,Iris-versicolor 89 | 87,6.0,3.4,4.5,1.6,Iris-versicolor 90 | 88,6.7,3.1,4.7,1.5,Iris-versicolor 91 | 89,6.3,2.3,4.4,1.3,Iris-versicolor 92 | 90,5.6,3.0,4.1,1.3,Iris-versicolor 93 | 
91,5.5,2.5,4.0,1.3,Iris-versicolor 94 | 92,5.5,2.6,4.4,1.2,Iris-versicolor 95 | 93,6.1,3.0,4.6,1.4,Iris-versicolor 96 | 94,5.8,2.6,4.0,1.2,Iris-versicolor 97 | 95,5.0,2.3,3.3,1.0,Iris-versicolor 98 | 96,5.6,2.7,4.2,1.3,Iris-versicolor 99 | 97,5.7,3.0,4.2,1.2,Iris-versicolor 100 | 98,5.7,2.9,4.2,1.3,Iris-versicolor 101 | 99,6.2,2.9,4.3,1.3,Iris-versicolor 102 | 100,5.1,2.5,3.0,1.1,Iris-versicolor 103 | 101,5.7,2.8,4.1,1.3,Iris-versicolor 104 | 102,6.3,3.3,6.0,2.5,Iris-virginica 105 | 103,5.8,2.7,5.1,1.9,Iris-virginica 106 | 104,7.1,3.0,5.9,2.1,Iris-virginica 107 | 105,6.3,2.9,5.6,1.8,Iris-virginica 108 | 106,6.5,3.0,5.8,2.2,Iris-virginica 109 | 107,7.6,3.0,6.6,2.1,Iris-virginica 110 | 108,4.9,2.5,4.5,1.7,Iris-virginica 111 | 109,7.3,2.9,6.3,1.8,Iris-virginica 112 | 110,6.7,2.5,5.8,1.8,Iris-virginica 113 | 111,7.2,3.6,6.1,2.5,Iris-virginica 114 | 112,5.8,2.6,,,Iris-versicolor 115 | 113,6.5,3.2,5.1,2.0,Iris-virginica 116 | 114,6.4,2.7,5.3,1.9,Iris-virginica 117 | 115,6.8,3.0,5.5,2.1,Iris-virginica 118 | 116,5.7,2.5,5.0,2.0,Iris-virginica 119 | 117,5.8,2.8,5.1,2.4,Iris-virginica 120 | 118,6.4,3.2,5.3,2.3,Iris-virginica 121 | 119,6.5,3.0,5.5,1.8,Iris-virginica 122 | 120,7.7,3.8,6.7,2.2,Iris-virginica 123 | 121,7.7,2.6,6.9,2.3,Iris-virginica 124 | 122,6.0,2.2,5.0,1.5,Iris-virginica 125 | 123,6.9,3.2,5.7,2.3,Iris-virginica 126 | 124,5.6,2.8,4.9,2.0,Iris-virginica 127 | 125,7.7,2.8,6.7,2.0,Iris-virginica 128 | 126,6.3,2.7,4.9,1.8,Iris-virginica 129 | 127,6.7,3.3,5.7,2.1,Iris-virginica 130 | 128,7.2,3.2,6.0,1.8,Iris-virginica 131 | 129,6.2,2.8,4.8,1.8,Iris-virginica 132 | 130,6.1,3.0,4.9,1.8,Iris-virginica 133 | 131,6.4,2.8,5.6,2.1,Iris-virginica 134 | 132,7.2,3.0,5.8,1.6,Iris-virginica 135 | 133,7.4,2.8,6.1,1.9,Iris-virginica 136 | 134,7.9,3.8,6.4,2.0,Iris-virginica 137 | 135,6.4,2.8,5.6,2.2,Iris-virginica 138 | 136,6.3,2.8,5.1,1.5,Iris-virginica 139 | 137,6.1,2.6,5.6,1.4,Iris-virginica 140 | ,,,,, 141 | 139,7.7,3.0,6.1,2.3,Iris-virginica 142 | 
140,6.3,3.4,5.6,2.4,Iris-virginica 143 | 141,6.4,3.1,5.5,1.8,Iris-virginica 144 | 142,6.0,3.0,4.8,1.8,Iris-virginica 145 | 143,6.9,3.1,5.4,2.1,Iris-virginica 146 | 144,6.7,3.1,5.6,2.4,Iris-virginica 147 | 145,6.9,3.1,5.1,2.3,Iris-virginica 148 | 146,5.8,2.7,5.1,1.9,Iris-virginica 149 | 147,6.8,3.2,5.9,2.3,Iris-virginica 150 | 148,6.7,3.3,5.7,2.5,Iris-virginica 151 | 149,6.7,3.0,5.2,2.3,Iris-virginica 152 | 150,6.3,2.5,5.0,1.9,Iris-virginica 153 | 151,6.5,3.0,5.2,2.0,Iris-virginica 154 | 152,6.2,3.4,5.4,2.3,Iris-virginica 155 | 153,5.9,3.0,5.1,1.8,Iris-virginica 156 | -------------------------------------------------------------------------------- /data/Penguins/penguins.csv: -------------------------------------------------------------------------------- 1 | species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex 2 | Adelie,Torgersen,39.1,18.7,181.0,3750.0,Male 3 | Adelie,Torgersen,39.5,17.4,186.0,3800.0,Female 4 | Adelie,Torgersen,40.3,18.0,195.0,3250.0,Female 5 | Adelie,Torgersen,,,,, 6 | ,,,,,, 7 | Adelie,Torgersen,36.7,19.3,193.0,3450.0,Female 8 | Adelie,Torgersen,39.3,20.6,190.0,3650.0,Male 9 | Adelie,Torgersen,38.9,17.8,181.0,3625.0,Female 10 | Adelie,Torgersen,39.2,19.6,195.0,4675.0,Male 11 | Adelie,Torgersen,34.1,18.1,193.0,3475.0, 12 | Adelie,Torgersen,42.0,20.2,190.0,4250.0, 13 | Adelie,Torgersen,37.8,17.1,186.0,3300.0, 14 | Adelie,Torgersen,37.8,17.3,180.0,3700.0, 15 | Adelie,Torgersen,41.1,17.6,182.0,3200.0,Female 16 | Adelie,Torgersen,38.6,21.2,191.0,3800.0,Male 17 | Adelie,Torgersen,34.6,21.1,198.0,4400.0,Male 18 | Adelie,Torgersen,36.6,17.8,185.0,3700.0,Female 19 | Adelie,Torgersen,38.7,19.0,195.0,3450.0,Female 20 | Adelie,Torgersen,42.5,20.7,197.0,4500.0,Male 21 | Adelie,Torgersen,34.4,18.4,184.0,3325.0,Female 22 | Adelie,Torgersen,46.0,21.5,194.0,4200.0,Male 23 | Adelie,Biscoe,37.8,18.3,174.0,3400.0,Female 24 | Adelie,Biscoe,37.7,18.7,180.0,3600.0,Male 25 | Adelie,Biscoe,35.9,19.2,189.0,3800.0,Female 26 | 
Adelie,Biscoe,38.2,18.1,185.0,3950.0,Male 27 | Adelie,Biscoe,38.8,17.2,180.0,3800.0,Male 28 | Adelie,Biscoe,35.3,18.9,187.0,3800.0,Female 29 | Adelie,Biscoe,40.6,18.6,183.0,3550.0,Male 30 | Adelie,Biscoe,40.5,17.9,187.0,3200.0,Female 31 | Adelie,Biscoe,40.5,17.9,187.0,3200.0,Female 32 | Adelie,Biscoe,37.9,18.6,172.0,3150.0,Female 33 | Adelie,Biscoe,40.5,18.9,180.0,3950.0,Male 34 | Adelie,Dream,39.5,16.7,178.0,3250.0,Female 35 | Adelie,Dream,37.2,18.1,178.0,3900.0,Male 36 | Adelie,Dream,39.5,17.8,188.0,3300.0,Female 37 | Adelie,Dream,40.9,18.9,184.0,3900.0,Male 38 | Adelie,Dream,36.4,17.0,195.0,3325.0,Female 39 | Adelie,Dream,39.2,21.1,196.0,4150.0,Male 40 | Adelie,Dream,38.8,20.0,190.0,3950.0,Male 41 | Adelie,Dream,42.2,18.5,180.0,3550.0,Female 42 | Adelie,Dream,37.6,19.3,181.0,3300.0,Female 43 | Adelie,Dream,39.8,19.1,184.0,4650.0,Male 44 | Adelie,Dream,36.5,18.0,182.0,3150.0,Female 45 | Adelie,Dream,40.8,18.4,195.0,3900.0,Male 46 | Adelie,Dream,36.0,18.5,186.0,3100.0,Female 47 | Adelie,Dream,44.1,19.7,196.0,4400.0,Male 48 | Adelie,Dream,37.0,16.9,185.0,3000.0,Female 49 | Adelie,Dream,39.6,18.8,190.0,4600.0,Male 50 | Adelie,Dream,41.1,19.0,182.0,3425.0,Male 51 | Adelie,Dream,37.5,18.9,179.0,2975.0, 52 | Adelie,Dream,36.0,17.9,190.0,3450.0,Female 53 | Adelie,Dream,42.3,21.2,191.0,4150.0,Male 54 | Adelie,Biscoe,39.6,17.7,186.0,3500.0,Female 55 | Adelie,Biscoe,40.1,18.9,188.0,4300.0,Male 56 | Adelie,Biscoe,35.0,17.9,190.0,3450.0,Female 57 | Adelie,Biscoe,42.0,19.5,200.0,4050.0,Male 58 | Adelie,Biscoe,34.5,18.1,187.0,2900.0,Female 59 | Adelie,Biscoe,41.4,18.6,191.0,3700.0,Male 60 | Adelie,Biscoe,39.0,17.5,186.0,3550.0,Female 61 | ,,,,,, 62 | Adelie,Biscoe,40.6,18.8,193.0,3800.0,Male 63 | Adelie,Biscoe,36.5,16.6,181.0,2850.0,Female 64 | Adelie,Biscoe,37.6,19.1,194.0,3750.0,Male 65 | Adelie,Biscoe,35.7,16.9,185.0,3150.0,Female 66 | Adelie,Biscoe,41.3,21.1,195.0,4400.0,Male 67 | Adelie,Biscoe,37.6,17.0,185.0,3600.0,Female 68 | Adelie,Biscoe,41.1,18.2,192.0,4050.0,Male 69 
| Adelie,Biscoe,36.4,17.1,184.0,2850.0,Female 70 | Adelie,Biscoe,41.6,18.0,192.0,3950.0,Male 71 | Adelie,Biscoe,35.5,16.2,195.0,3350.0,Female 72 | Adelie,Biscoe,41.1,19.1,188.0,4100.0,Male 73 | Adelie,Torgersen,35.9,16.6,190.0,3050.0,Female 74 | Adelie,Torgersen,41.8,19.4,198.0,4450.0,Male 75 | Adelie,Torgersen,33.5,19.0,190.0,3600.0,Female 76 | Adelie,Torgersen,39.7,18.4,190.0,3900.0,Male 77 | Adelie,Torgersen,39.6,17.2,196.0,3550.0,Female 78 | Adelie,Torgersen,45.8,18.9,197.0,4150.0,Male 79 | Adelie,Torgersen,35.5,17.5,190.0,3700.0,Female 80 | Adelie,Torgersen,42.8,18.5,195.0,4250.0,Male 81 | Adelie,Torgersen,40.9,16.8,191.0,3700.0,Female 82 | Adelie,Torgersen,37.2,19.4,184.0,3900.0,Male 83 | Adelie,Torgersen,36.2,16.1,187.0,3550.0,Female 84 | Adelie,Torgersen,42.1,19.1,195.0,4000.0,Male 85 | Adelie,Torgersen,34.6,17.2,189.0,3200.0,Female 86 | Adelie,Torgersen,42.9,17.6,196.0,4700.0,Male 87 | Adelie,Torgersen,36.7,18.8,187.0,3800.0,Female 88 | Adelie,Torgersen,35.1,19.4,193.0,4200.0,Male 89 | Adelie,Dream,37.3,17.8,191.0,3350.0,Female 90 | Adelie,Dream,41.3,20.3,194.0,3550.0,Male 91 | Adelie,Dream,36.3,19.5,190.0,3800.0,Male 92 | Adelie,Dream,36.9,18.6,189.0,3500.0,Female 93 | Adelie,Dream,38.3,19.2,189.0,3950.0,Male 94 | Adelie,Dream,38.9,18.8,190.0,3600.0,Female 95 | Adelie,Dream,35.7,18.0,202.0,3550.0,Female 96 | Adelie,Dream,41.1,18.1,205.0,4300.0,Male 97 | Adelie,Dream,34.0,17.1,185.0,3400.0,Female 98 | Adelie,Dream,39.6,18.1,186.0,4450.0,Male 99 | Adelie,Dream,36.2,17.3,187.0,3300.0,Female 100 | Adelie,Dream,40.8,18.9,208.0,4300.0,Male 101 | Adelie,Dream,38.1,18.6,190.0,3700.0,Female 102 | Adelie,Dream,40.3,18.5,196.0,4350.0,Male 103 | ,,,,,, 104 | Adelie,Dream,33.1,16.1,178.0,2900.0,Female 105 | Adelie,Dream,43.2,18.5,192.0,4100.0,Male 106 | Adelie,Biscoe,35.0,17.9,192.0,3725.0,Female 107 | Adelie,Biscoe,41.0,20.0,203.0,4725.0,Male 108 | Adelie,Biscoe,37.7,16.0,183.0,3075.0,Female 109 | Adelie,Biscoe,37.8,20.0,190.0,4250.0,Male 110 | 
Adelie,Biscoe,37.9,18.6,193.0,2925.0,Female 111 | Adelie,Biscoe,39.7,18.9,184.0,3550.0,Male 112 | Adelie,Biscoe,38.6,17.2,199.0,3750.0,Female 113 | Adelie,Biscoe,38.2,20.0,190.0,3900.0,Male 114 | Adelie,Biscoe,38.1,17.0,181.0,3175.0,Female 115 | Adelie,Biscoe,43.2,19.0,197.0,4775.0,Male 116 | Adelie,Biscoe,38.1,16.5,198.0,3825.0,Female 117 | Adelie,Biscoe,45.6,20.3,191.0,4600.0,Male 118 | Adelie,Biscoe,39.7,17.7,193.0,3200.0,Female 119 | Adelie,Biscoe,42.2,19.5,197.0,4275.0,Male 120 | Adelie,Biscoe,39.6,20.7,191.0,3900.0,Female 121 | Adelie,Biscoe,42.7,18.3,196.0,4075.0,Male 122 | Adelie,Torgersen,38.6,17.0,188.0,2900.0,Female 123 | Adelie,Torgersen,37.3,20.5,199.0,3775.0,Male 124 | Adelie,Torgersen,35.7,17.0,189.0,3350.0,Female 125 | Adelie,Torgersen,41.1,18.6,189.0,3325.0,Male 126 | Adelie,Torgersen,36.2,17.2,187.0,3150.0,Female 127 | Adelie,Torgersen,37.7,19.8,198.0,3500.0,Male 128 | Adelie,Torgersen,40.2,17.0,176.0,3450.0,Female 129 | Adelie,Torgersen,41.4,18.5,202.0,3875.0,Male 130 | Adelie,Torgersen,35.2,15.9,186.0,3050.0,Female 131 | Adelie,Torgersen,40.6,19.0,199.0,4000.0,Male 132 | Adelie,Torgersen,38.8,17.6,191.0,3275.0,Female 133 | ,,,,,, 134 | Adelie,Torgersen,41.5,18.3,195.0,4300.0,Male 135 | Adelie,Torgersen,39.0,17.1,191.0,3050.0,Female 136 | Adelie,Torgersen,44.1,18.0,210.0,4000.0,Male 137 | Adelie,Torgersen,38.5,17.9,190.0,3325.0,Female 138 | Adelie,Torgersen,43.1,19.2,197.0,3500.0,Male 139 | Adelie,Dream,36.8,18.5,193.0,3500.0,Female 140 | Adelie,Dream,37.5,18.5,199.0,4475.0,Male 141 | Adelie,Dream,38.1,17.6,187.0,3425.0,Female 142 | Adelie,Dream,41.1,17.5,190.0,3900.0,Male 143 | Adelie,Dream,35.6,17.5,191.0,3175.0,Female 144 | Adelie,Dream,40.2,20.1,200.0,3975.0,Male 145 | Adelie,Dream,37.0,16.5,185.0,3400.0,Female 146 | Adelie,Dream,39.7,17.9,193.0,4250.0,Male 147 | Adelie,Dream,40.2,17.1,193.0,3400.0,Female 148 | Adelie,Dream,40.6,17.2,187.0,3475.0,Male 149 | Adelie,Dream,32.1,15.5,188.0,3050.0,Female 150 | 
Adelie,Dream,40.7,17.0,190.0,3725.0,Male 151 | Adelie,Dream,37.3,16.8,192.0,3000.0,Female 152 | Adelie,Dream,39.0,18.7,185.0,3650.0,Male 153 | Adelie,Dream,39.2,18.6,190.0,4250.0,Male 154 | Adelie,Dream,36.6,18.4,184.0,3475.0,Female 155 | Adelie,Dream,36.0,17.8,195.0,3450.0,Female 156 | Adelie,Dream,37.8,18.1,193.0,3750.0,Male 157 | Adelie,Dream,36.0,17.1,187.0,3700.0,Female 158 | Adelie,Dream,41.5,18.5,201.0,4000.0,Male 159 | Chinstrap,Dream,46.5,17.9,192.0,3500.0,Female 160 | Chinstrap,Dream,50.0,19.5,196.0,3900.0,Male 161 | Chinstrap,Dream,51.3,19.2,193.0,3650.0,Male 162 | Chinstrap,Dream,45.4,18.7,188.0,3525.0,Female 163 | Chinstrap,Dream,52.7,19.8,197.0,3725.0,Male 164 | Chinstrap,Dream,45.2,17.8,198.0,3950.0,Female 165 | Chinstrap,Dream,46.1,18.2,178.0,3250.0,Female 166 | Chinstrap,Dream,51.3,18.2,197.0,3750.0,Male 167 | Chinstrap,Dream,46.0,18.9,195.0,4150.0,Female 168 | Chinstrap,Dream,51.3,19.9,198.0,3700.0,Male 169 | Chinstrap,Dream,46.6,17.8,193.0,3800.0,Female 170 | Chinstrap,Dream,46.6,17.8,193.0,3800.0,Female 171 | Chinstrap,Dream,51.7,20.3,194.0,3775.0,Male 172 | Chinstrap,Dream,47.0,17.3,185.0,3700.0,Female 173 | Chinstrap,Dream,52.0,18.1,201.0,4050.0,Male 174 | Chinstrap,Dream,45.9,17.1,190.0,3575.0,Female 175 | Chinstrap,Dream,50.5,19.6,201.0,4050.0,Male 176 | Chinstrap,Dream,50.3,20.0,197.0,3300.0,Male 177 | Chinstrap,Dream,58.0,17.8,181.0,3700.0,Female 178 | Chinstrap,Dream,46.4,18.6,190.0,3450.0,Female 179 | Chinstrap,Dream,49.2,18.2,195.0,4400.0,Male 180 | Chinstrap,Dream,42.4,17.3,181.0,3600.0,Female 181 | Chinstrap,Dream,48.5,17.5,191.0,3400.0,Male 182 | Chinstrap,Dream,43.2,16.6,187.0,2900.0,Female 183 | Chinstrap,Dream,50.6,19.4,193.0,3800.0,Male 184 | Chinstrap,Dream,46.7,17.9,195.0,3300.0,Female 185 | Chinstrap,Dream,52.0,19.0,197.0,4150.0,Male 186 | Chinstrap,Dream,50.5,18.4,200.0,3400.0,Female 187 | ,,,,,, 188 | Chinstrap,Dream,49.5,19.0,200.0,3800.0,Male 189 | Chinstrap,Dream,46.4,17.8,191.0,3700.0,Female 190 | 
Chinstrap,Dream,52.8,20.0,205.0,4550.0,Male 191 | Chinstrap,Dream,40.9,16.6,187.0,3200.0,Female 192 | Chinstrap,Dream,54.2,20.8,201.0,4300.0,Male 193 | Chinstrap,Dream,42.5,16.7,187.0,3350.0,Female 194 | Chinstrap,Dream,51.0,18.8,203.0,4100.0,Male 195 | Chinstrap,Dream,49.7,18.6,195.0,3600.0,Male 196 | Chinstrap,Dream,47.5,16.8,199.0,3900.0,Female 197 | Chinstrap,Dream,47.6,18.3,195.0,3850.0,Female 198 | Chinstrap,Dream,52.0,20.7,210.0,4800.0,Male 199 | Chinstrap,Dream,46.9,16.6,192.0,2700.0,Female 200 | Chinstrap,Dream,53.5,19.9,205.0,4500.0,Male 201 | Chinstrap,Dream,49.0,19.5,210.0,3950.0,Male 202 | Chinstrap,Dream,46.2,17.5,187.0,3650.0,Female 203 | Chinstrap,Dream,50.9,19.1,196.0,3550.0,Male 204 | Chinstrap,Dream,45.5,17.0,196.0,3500.0,Female 205 | Chinstrap,Dream,50.9,17.9,196.0,3675.0,Female 206 | Chinstrap,Dream,50.8,18.5,201.0,4450.0,Male 207 | Chinstrap,Dream,50.1,17.9,190.0,3400.0,Female 208 | Chinstrap,Dream,49.0,19.6,212.0,4300.0,Male 209 | Chinstrap,Dream,51.5,18.7,187.0,3250.0,Male 210 | Chinstrap,Dream,49.8,17.3,198.0,3675.0,Female 211 | Chinstrap,Dream,48.1,16.4,199.0,3325.0,Female 212 | Chinstrap,Dream,51.4,19.0,201.0,3950.0,Male 213 | Chinstrap,Dream,45.7,17.3,193.0,3600.0,Female 214 | Chinstrap,Dream,50.7,19.7,203.0,4050.0,Male 215 | Chinstrap,Dream,42.5,17.3,187.0,3350.0,Female 216 | Chinstrap,Dream,52.2,18.8,197.0,3450.0,Male 217 | Chinstrap,Dream,45.2,16.6,191.0,3250.0,Female 218 | Chinstrap,Dream,49.3,19.9,203.0,4050.0,Male 219 | Chinstrap,Dream,50.2,18.8,202.0,3800.0,Male 220 | Chinstrap,Dream,45.6,19.4,194.0,3525.0,Female 221 | Chinstrap,Dream,51.9,19.5,206.0,3950.0,Male 222 | Chinstrap,Dream,46.8,16.5,189.0,3650.0,Female 223 | Chinstrap,Dream,45.7,17.0,195.0,3650.0,Female 224 | Chinstrap,Dream,55.8,19.8,207.0,4000.0,Male 225 | Chinstrap,Dream,43.5,18.1,202.0,3400.0,Female 226 | Chinstrap,Dream,49.6,18.2,193.0,3775.0,Male 227 | Chinstrap,Dream,50.8,19.0,210.0,4100.0,Male 228 | Chinstrap,Dream,50.2,18.7,198.0,3775.0,Female 229 | 
Gentoo,Biscoe,46.1,13.2,211.0,4500.0,Female 230 | Gentoo,Biscoe,50.0,16.3,230.0,5700.0,Male 231 | Gentoo,Biscoe,48.7,14.1,210.0,4450.0,Female 232 | Gentoo,Biscoe,50.0,15.2,218.0,5700.0,Male 233 | Gentoo,Biscoe,47.6,14.5,215.0,5400.0,Male 234 | Gentoo,Biscoe,46.5,13.5,210.0,4550.0,Female 235 | Gentoo,Biscoe,45.4,14.6,211.0,4800.0,Female 236 | Gentoo,Biscoe,46.7,15.3,219.0,5200.0,Male 237 | Gentoo,Biscoe,43.3,13.4,209.0,4400.0,Female 238 | Gentoo,Biscoe,46.8,15.4,215.0,5150.0,Male 239 | Gentoo,Biscoe,40.9,13.7,214.0,4650.0,Female 240 | Gentoo,Biscoe,49.0,16.1,216.0,5550.0,Male 241 | ,,,,,, 242 | Gentoo,Biscoe,45.5,13.7,214.0,4650.0,Female 243 | Gentoo,Biscoe,48.4,14.6,213.0,5850.0,Male 244 | Gentoo,Biscoe,45.8,14.6,210.0,4200.0,Female 245 | Gentoo,Biscoe,49.3,15.7,217.0,5850.0,Male 246 | Gentoo,Biscoe,42.0,13.5,210.0,4150.0,Female 247 | Gentoo,Biscoe,49.2,15.2,221.0,6300.0,Male 248 | Gentoo,Biscoe,46.2,14.5,209.0,4800.0,Female 249 | Gentoo,Biscoe,48.7,15.1,222.0,5350.0,Male 250 | Gentoo,Biscoe,50.2,14.3,218.0,5700.0,Male 251 | Gentoo,Biscoe,45.1,14.5,215.0,5000.0,Female 252 | Gentoo,Biscoe,46.5,14.5,213.0,4400.0,Female 253 | Gentoo,Biscoe,46.3,15.8,215.0,5050.0,Male 254 | Gentoo,Biscoe,42.9,13.1,215.0,5000.0,Female 255 | Gentoo,Biscoe,46.1,15.1,215.0,5100.0,Male 256 | Gentoo,Biscoe,44.5,14.3,216.0,4100.0, 257 | Gentoo,Biscoe,47.8,15.0,215.0,5650.0,Male 258 | Gentoo,Biscoe,48.2,14.3,210.0,4600.0,Female 259 | Gentoo,Biscoe,50.0,15.3,220.0,5550.0,Male 260 | Gentoo,Biscoe,47.3,15.3,222.0,5250.0,Male 261 | Gentoo,Biscoe,42.8,14.2,209.0,4700.0,Female 262 | Gentoo,Biscoe,45.1,14.5,207.0,5050.0,Female 263 | Gentoo,Biscoe,59.6,17.0,230.0,6050.0,Male 264 | Gentoo,Biscoe,49.1,14.8,220.0,5150.0,Female 265 | Gentoo,Biscoe,48.4,16.3,220.0,5400.0,Male 266 | Gentoo,Biscoe,48.4,16.3,220.0,5400.0,Male 267 | Gentoo,Biscoe,42.6,13.7,213.0,4950.0,Female 268 | Gentoo,Biscoe,44.4,17.3,219.0,5250.0,Male 269 | Gentoo,Biscoe,44.0,13.6,208.0,4350.0,Female 270 | 
Gentoo,Biscoe,48.7,15.7,208.0,5350.0,Male 271 | Gentoo,Biscoe,42.7,13.7,208.0,3950.0,Female 272 | Gentoo,Biscoe,49.6,16.0,225.0,5700.0,Male 273 | Gentoo,Biscoe,45.3,13.7,210.0,4300.0,Female 274 | Gentoo,Biscoe,49.6,15.0,216.0,4750.0,Male 275 | Gentoo,Biscoe,50.5,15.9,222.0,5550.0,Male 276 | Gentoo,Biscoe,43.6,13.9,217.0,4900.0,Female 277 | Gentoo,Biscoe,45.5,13.9,210.0,4200.0,Female 278 | Gentoo,Biscoe,50.5,15.9,225.0,5400.0,Male 279 | Gentoo,Biscoe,44.9,13.3,213.0,5100.0,Female 280 | Gentoo,Biscoe,45.2,15.8,215.0,5300.0,Male 281 | Gentoo,Biscoe,46.6,14.2,210.0,4850.0,Female 282 | Gentoo,Biscoe,48.5,14.1,220.0,5300.0,Male 283 | Gentoo,Biscoe,45.1,14.4,210.0,4400.0,Female 284 | Gentoo,Biscoe,50.1,15.0,225.0,5000.0,Male 285 | Gentoo,Biscoe,46.5,14.4,217.0,4900.0,Female 286 | Gentoo,Biscoe,45.0,15.4,220.0,5050.0,Male 287 | Gentoo,Biscoe,43.8,13.9,208.0,4300.0,Female 288 | Gentoo,Biscoe,45.5,15.0,220.0,5000.0,Male 289 | Gentoo,Biscoe,43.2,14.5,208.0,4450.0,Female 290 | Gentoo,Biscoe,50.4,15.3,224.0,5550.0,Male 291 | Gentoo,Biscoe,45.3,13.8,208.0,4200.0,Female 292 | Gentoo,Biscoe,46.2,14.9,221.0,5300.0,Male 293 | Gentoo,Biscoe,45.7,13.9,214.0,4400.0,Female 294 | Gentoo,Biscoe,54.3,15.7,231.0,5650.0,Male 295 | Gentoo,Biscoe,45.8,14.2,219.0,4700.0,Female 296 | Gentoo,Biscoe,49.8,16.8,230.0,5700.0,Male 297 | Gentoo,Biscoe,46.2,14.4,214.0,4650.0, 298 | Gentoo,Biscoe,49.5,16.2,229.0,5800.0,Male 299 | Gentoo,Biscoe,43.5,14.2,220.0,4700.0,Female 300 | Gentoo,Biscoe,50.7,15.0,223.0,5550.0,Male 301 | Gentoo,Biscoe,47.7,15.0,216.0,4750.0,Female 302 | Gentoo,Biscoe,46.4,15.6,221.0,5000.0,Male 303 | Gentoo,Biscoe,48.2,15.6,221.0,5100.0,Male 304 | Gentoo,Biscoe,46.5,14.8,217.0,5200.0,Female 305 | Gentoo,Biscoe,46.4,15.0,216.0,4700.0,Female 306 | Gentoo,Biscoe,48.6,16.0,230.0,5800.0,Male 307 | Gentoo,Biscoe,47.5,14.2,209.0,4600.0,Female 308 | Gentoo,Biscoe,51.1,16.3,220.0,6000.0,Male 309 | Gentoo,Biscoe,45.2,13.8,215.0,4750.0,Female 310 | Gentoo,Biscoe,45.2,16.4,223.0,5950.0,Male 311 
| Gentoo,Biscoe,49.1,14.5,212.0,4625.0,Female 312 | Gentoo,Biscoe,52.5,15.6,221.0,5450.0,Male 313 | Gentoo,Biscoe,47.4,14.6,212.0,4725.0,Female 314 | Gentoo,Biscoe,50.0,15.9,224.0,5350.0,Male 315 | Gentoo,Biscoe,44.9,13.8,212.0,4750.0,Female 316 | Gentoo,Biscoe,50.8,17.3,228.0,5600.0,Male 317 | Gentoo,Biscoe,43.4,14.4,218.0,4600.0,Female 318 | Gentoo,Biscoe,51.3,14.2,218.0,5300.0,Male 319 | Gentoo,Biscoe,47.5,14.0,212.0,4875.0,Female 320 | Gentoo,Biscoe,52.1,17.0,230.0,5550.0,Male 321 | Gentoo,Biscoe,47.5,15.0,218.0,4950.0,Female 322 | Gentoo,Biscoe,52.2,17.1,228.0,5400.0,Male 323 | Gentoo,Biscoe,45.5,14.5,212.0,4750.0,Female 324 | Gentoo,Biscoe,49.5,16.1,224.0,5650.0,Male 325 | Gentoo,Biscoe,44.5,14.7,214.0,4850.0,Female 326 | Gentoo,Biscoe,50.8,15.7,226.0,5200.0,Male 327 | Gentoo,Biscoe,49.4,15.8,216.0,4925.0,Male 328 | Gentoo,Biscoe,46.9,14.6,222.0,4875.0,Female 329 | Gentoo,Biscoe,48.4,14.4,203.0,4625.0,Female 330 | Gentoo,Biscoe,51.1,16.5,225.0,5250.0,Male 331 | Gentoo,Biscoe,48.5,15.0,219.0,4850.0,Female 332 | Gentoo,Biscoe,55.9,17.0,228.0,5600.0,Male 333 | Gentoo,Biscoe,47.2,15.5,215.0,4975.0,Female 334 | Gentoo,Biscoe,49.1,15.0,228.0,5500.0,Male 335 | Gentoo,Biscoe,47.3,13.8,216.0,4725.0, 336 | Gentoo,Biscoe,46.8,16.1,215.0,5500.0,Male 337 | Gentoo,Biscoe,41.7,14.7,210.0,4700.0,Female 338 | Gentoo,Biscoe,53.4,15.8,219.0,5500.0,Male 339 | Gentoo,Biscoe,43.3,14.0,208.0,4575.0,Female 340 | Gentoo,Biscoe,48.1,15.1,209.0,5500.0,Male 341 | ,,,,,, 342 | Gentoo,Biscoe,50.5,15.2,216.0,5000.0,Female 343 | Gentoo,Biscoe,49.8,15.9,229.0,5950.0,Male 344 | Gentoo,Biscoe,43.5,15.2,213.0,4650.0,Female 345 | Gentoo,Biscoe,51.5,16.3,230.0,5500.0,Male 346 | Gentoo,Biscoe,46.2,14.1,217.0,4375.0,Female 347 | Gentoo,Biscoe,55.1,16.0,230.0,5850.0,Male 348 | Gentoo,Biscoe,44.5,15.7,217.0,4875.0, 349 | Gentoo,Biscoe,48.8,16.2,222.0,6000.0,Male 350 | Gentoo,Biscoe,47.2,13.7,214.0,4925.0,Female 351 | Gentoo,Biscoe,,,,, 352 | Gentoo,Biscoe,46.8,14.3,215.0,4850.0,Female 353 | 
Gentoo,Biscoe,50.4,15.7,222.0,5750.0,Male 354 | Gentoo,Biscoe,45.2,14.8,212.0,5200.0,Female 355 | Gentoo,Biscoe,45.2,14.8,212.0,5200.0,Female 356 | Gentoo,Biscoe,49.9,16.1,213.0,5400.0,Male 357 | -------------------------------------------------------------------------------- /data/Penguins/penguins_clean.csv: -------------------------------------------------------------------------------- 1 | species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex 2 | Adelie,Torgersen,39.1,18.7,181.0,3750.0,Male 3 | Adelie,Torgersen,39.5,17.4,186.0,3800.0,Female 4 | Adelie,Torgersen,40.3,18.0,195.0,3250.0,Female 5 | Adelie,Torgersen,,,,, 6 | Adelie,Torgersen,36.7,19.3,193.0,3450.0,Female 7 | Adelie,Torgersen,39.3,20.6,190.0,3650.0,Male 8 | Adelie,Torgersen,38.9,17.8,181.0,3625.0,Female 9 | Adelie,Torgersen,39.2,19.6,195.0,4675.0,Male 10 | Adelie,Torgersen,34.1,18.1,193.0,3475.0, 11 | Adelie,Torgersen,42.0,20.2,190.0,4250.0, 12 | Adelie,Torgersen,37.8,17.1,186.0,3300.0, 13 | Adelie,Torgersen,37.8,17.3,180.0,3700.0, 14 | Adelie,Torgersen,41.1,17.6,182.0,3200.0,Female 15 | Adelie,Torgersen,38.6,21.2,191.0,3800.0,Male 16 | Adelie,Torgersen,34.6,21.1,198.0,4400.0,Male 17 | Adelie,Torgersen,36.6,17.8,185.0,3700.0,Female 18 | Adelie,Torgersen,38.7,19.0,195.0,3450.0,Female 19 | Adelie,Torgersen,42.5,20.7,197.0,4500.0,Male 20 | Adelie,Torgersen,34.4,18.4,184.0,3325.0,Female 21 | Adelie,Torgersen,46.0,21.5,194.0,4200.0,Male 22 | Adelie,Biscoe,37.8,18.3,174.0,3400.0,Female 23 | Adelie,Biscoe,37.7,18.7,180.0,3600.0,Male 24 | Adelie,Biscoe,35.9,19.2,189.0,3800.0,Female 25 | Adelie,Biscoe,38.2,18.1,185.0,3950.0,Male 26 | Adelie,Biscoe,38.8,17.2,180.0,3800.0,Male 27 | Adelie,Biscoe,35.3,18.9,187.0,3800.0,Female 28 | Adelie,Biscoe,40.6,18.6,183.0,3550.0,Male 29 | Adelie,Biscoe,40.5,17.9,187.0,3200.0,Female 30 | Adelie,Biscoe,37.9,18.6,172.0,3150.0,Female 31 | Adelie,Biscoe,40.5,18.9,180.0,3950.0,Male 32 | Adelie,Dream,39.5,16.7,178.0,3250.0,Female 33 | 
Adelie,Dream,37.2,18.1,178.0,3900.0,Male 34 | Adelie,Dream,39.5,17.8,188.0,3300.0,Female 35 | Adelie,Dream,40.9,18.9,184.0,3900.0,Male 36 | Adelie,Dream,36.4,17.0,195.0,3325.0,Female 37 | Adelie,Dream,39.2,21.1,196.0,4150.0,Male 38 | Adelie,Dream,38.8,20.0,190.0,3950.0,Male 39 | Adelie,Dream,42.2,18.5,180.0,3550.0,Female 40 | Adelie,Dream,37.6,19.3,181.0,3300.0,Female 41 | Adelie,Dream,39.8,19.1,184.0,4650.0,Male 42 | Adelie,Dream,36.5,18.0,182.0,3150.0,Female 43 | Adelie,Dream,40.8,18.4,195.0,3900.0,Male 44 | Adelie,Dream,36.0,18.5,186.0,3100.0,Female 45 | Adelie,Dream,44.1,19.7,196.0,4400.0,Male 46 | Adelie,Dream,37.0,16.9,185.0,3000.0,Female 47 | Adelie,Dream,39.6,18.8,190.0,4600.0,Male 48 | Adelie,Dream,41.1,19.0,182.0,3425.0,Male 49 | Adelie,Dream,37.5,18.9,179.0,2975.0, 50 | Adelie,Dream,36.0,17.9,190.0,3450.0,Female 51 | Adelie,Dream,42.3,21.2,191.0,4150.0,Male 52 | Adelie,Biscoe,39.6,17.7,186.0,3500.0,Female 53 | Adelie,Biscoe,40.1,18.9,188.0,4300.0,Male 54 | Adelie,Biscoe,35.0,17.9,190.0,3450.0,Female 55 | Adelie,Biscoe,42.0,19.5,200.0,4050.0,Male 56 | Adelie,Biscoe,34.5,18.1,187.0,2900.0,Female 57 | Adelie,Biscoe,41.4,18.6,191.0,3700.0,Male 58 | Adelie,Biscoe,39.0,17.5,186.0,3550.0,Female 59 | Adelie,Biscoe,40.6,18.8,193.0,3800.0,Male 60 | Adelie,Biscoe,36.5,16.6,181.0,2850.0,Female 61 | Adelie,Biscoe,37.6,19.1,194.0,3750.0,Male 62 | Adelie,Biscoe,35.7,16.9,185.0,3150.0,Female 63 | Adelie,Biscoe,41.3,21.1,195.0,4400.0,Male 64 | Adelie,Biscoe,37.6,17.0,185.0,3600.0,Female 65 | Adelie,Biscoe,41.1,18.2,192.0,4050.0,Male 66 | Adelie,Biscoe,36.4,17.1,184.0,2850.0,Female 67 | Adelie,Biscoe,41.6,18.0,192.0,3950.0,Male 68 | Adelie,Biscoe,35.5,16.2,195.0,3350.0,Female 69 | Adelie,Biscoe,41.1,19.1,188.0,4100.0,Male 70 | Adelie,Torgersen,35.9,16.6,190.0,3050.0,Female 71 | Adelie,Torgersen,41.8,19.4,198.0,4450.0,Male 72 | Adelie,Torgersen,33.5,19.0,190.0,3600.0,Female 73 | Adelie,Torgersen,39.7,18.4,190.0,3900.0,Male 74 | 
Adelie,Torgersen,39.6,17.2,196.0,3550.0,Female 75 | Adelie,Torgersen,45.8,18.9,197.0,4150.0,Male 76 | Adelie,Torgersen,35.5,17.5,190.0,3700.0,Female 77 | Adelie,Torgersen,42.8,18.5,195.0,4250.0,Male 78 | Adelie,Torgersen,40.9,16.8,191.0,3700.0,Female 79 | Adelie,Torgersen,37.2,19.4,184.0,3900.0,Male 80 | Adelie,Torgersen,36.2,16.1,187.0,3550.0,Female 81 | Adelie,Torgersen,42.1,19.1,195.0,4000.0,Male 82 | Adelie,Torgersen,34.6,17.2,189.0,3200.0,Female 83 | Adelie,Torgersen,42.9,17.6,196.0,4700.0,Male 84 | Adelie,Torgersen,36.7,18.8,187.0,3800.0,Female 85 | Adelie,Torgersen,35.1,19.4,193.0,4200.0,Male 86 | Adelie,Dream,37.3,17.8,191.0,3350.0,Female 87 | Adelie,Dream,41.3,20.3,194.0,3550.0,Male 88 | Adelie,Dream,36.3,19.5,190.0,3800.0,Male 89 | Adelie,Dream,36.9,18.6,189.0,3500.0,Female 90 | Adelie,Dream,38.3,19.2,189.0,3950.0,Male 91 | Adelie,Dream,38.9,18.8,190.0,3600.0,Female 92 | Adelie,Dream,35.7,18.0,202.0,3550.0,Female 93 | Adelie,Dream,41.1,18.1,205.0,4300.0,Male 94 | Adelie,Dream,34.0,17.1,185.0,3400.0,Female 95 | Adelie,Dream,39.6,18.1,186.0,4450.0,Male 96 | Adelie,Dream,36.2,17.3,187.0,3300.0,Female 97 | Adelie,Dream,40.8,18.9,208.0,4300.0,Male 98 | Adelie,Dream,38.1,18.6,190.0,3700.0,Female 99 | Adelie,Dream,40.3,18.5,196.0,4350.0,Male 100 | Adelie,Dream,33.1,16.1,178.0,2900.0,Female 101 | Adelie,Dream,43.2,18.5,192.0,4100.0,Male 102 | Adelie,Biscoe,35.0,17.9,192.0,3725.0,Female 103 | Adelie,Biscoe,41.0,20.0,203.0,4725.0,Male 104 | Adelie,Biscoe,37.7,16.0,183.0,3075.0,Female 105 | Adelie,Biscoe,37.8,20.0,190.0,4250.0,Male 106 | Adelie,Biscoe,37.9,18.6,193.0,2925.0,Female 107 | Adelie,Biscoe,39.7,18.9,184.0,3550.0,Male 108 | Adelie,Biscoe,38.6,17.2,199.0,3750.0,Female 109 | Adelie,Biscoe,38.2,20.0,190.0,3900.0,Male 110 | Adelie,Biscoe,38.1,17.0,181.0,3175.0,Female 111 | Adelie,Biscoe,43.2,19.0,197.0,4775.0,Male 112 | Adelie,Biscoe,38.1,16.5,198.0,3825.0,Female 113 | Adelie,Biscoe,45.6,20.3,191.0,4600.0,Male 114 | Adelie,Biscoe,39.7,17.7,193.0,3200.0,Female 
115 | Adelie,Biscoe,42.2,19.5,197.0,4275.0,Male 116 | Adelie,Biscoe,39.6,20.7,191.0,3900.0,Female 117 | Adelie,Biscoe,42.7,18.3,196.0,4075.0,Male 118 | Adelie,Torgersen,38.6,17.0,188.0,2900.0,Female 119 | Adelie,Torgersen,37.3,20.5,199.0,3775.0,Male 120 | Adelie,Torgersen,35.7,17.0,189.0,3350.0,Female 121 | Adelie,Torgersen,41.1,18.6,189.0,3325.0,Male 122 | Adelie,Torgersen,36.2,17.2,187.0,3150.0,Female 123 | Adelie,Torgersen,37.7,19.8,198.0,3500.0,Male 124 | Adelie,Torgersen,40.2,17.0,176.0,3450.0,Female 125 | Adelie,Torgersen,41.4,18.5,202.0,3875.0,Male 126 | Adelie,Torgersen,35.2,15.9,186.0,3050.0,Female 127 | Adelie,Torgersen,40.6,19.0,199.0,4000.0,Male 128 | Adelie,Torgersen,38.8,17.6,191.0,3275.0,Female 129 | Adelie,Torgersen,41.5,18.3,195.0,4300.0,Male 130 | Adelie,Torgersen,39.0,17.1,191.0,3050.0,Female 131 | Adelie,Torgersen,44.1,18.0,210.0,4000.0,Male 132 | Adelie,Torgersen,38.5,17.9,190.0,3325.0,Female 133 | Adelie,Torgersen,43.1,19.2,197.0,3500.0,Male 134 | Adelie,Dream,36.8,18.5,193.0,3500.0,Female 135 | Adelie,Dream,37.5,18.5,199.0,4475.0,Male 136 | Adelie,Dream,38.1,17.6,187.0,3425.0,Female 137 | Adelie,Dream,41.1,17.5,190.0,3900.0,Male 138 | Adelie,Dream,35.6,17.5,191.0,3175.0,Female 139 | Adelie,Dream,40.2,20.1,200.0,3975.0,Male 140 | Adelie,Dream,37.0,16.5,185.0,3400.0,Female 141 | Adelie,Dream,39.7,17.9,193.0,4250.0,Male 142 | Adelie,Dream,40.2,17.1,193.0,3400.0,Female 143 | Adelie,Dream,40.6,17.2,187.0,3475.0,Male 144 | Adelie,Dream,32.1,15.5,188.0,3050.0,Female 145 | Adelie,Dream,40.7,17.0,190.0,3725.0,Male 146 | Adelie,Dream,37.3,16.8,192.0,3000.0,Female 147 | Adelie,Dream,39.0,18.7,185.0,3650.0,Male 148 | Adelie,Dream,39.2,18.6,190.0,4250.0,Male 149 | Adelie,Dream,36.6,18.4,184.0,3475.0,Female 150 | Adelie,Dream,36.0,17.8,195.0,3450.0,Female 151 | Adelie,Dream,37.8,18.1,193.0,3750.0,Male 152 | Adelie,Dream,36.0,17.1,187.0,3700.0,Female 153 | Adelie,Dream,41.5,18.5,201.0,4000.0,Male 154 | Chinstrap,Dream,46.5,17.9,192.0,3500.0,Female 155 | 
Chinstrap,Dream,50.0,19.5,196.0,3900.0,Male 156 | Chinstrap,Dream,51.3,19.2,193.0,3650.0,Male 157 | Chinstrap,Dream,45.4,18.7,188.0,3525.0,Female 158 | Chinstrap,Dream,52.7,19.8,197.0,3725.0,Male 159 | Chinstrap,Dream,45.2,17.8,198.0,3950.0,Female 160 | Chinstrap,Dream,46.1,18.2,178.0,3250.0,Female 161 | Chinstrap,Dream,51.3,18.2,197.0,3750.0,Male 162 | Chinstrap,Dream,46.0,18.9,195.0,4150.0,Female 163 | Chinstrap,Dream,51.3,19.9,198.0,3700.0,Male 164 | Chinstrap,Dream,46.6,17.8,193.0,3800.0,Female 165 | Chinstrap,Dream,51.7,20.3,194.0,3775.0,Male 166 | Chinstrap,Dream,47.0,17.3,185.0,3700.0,Female 167 | Chinstrap,Dream,52.0,18.1,201.0,4050.0,Male 168 | Chinstrap,Dream,45.9,17.1,190.0,3575.0,Female 169 | Chinstrap,Dream,50.5,19.6,201.0,4050.0,Male 170 | Chinstrap,Dream,50.3,20.0,197.0,3300.0,Male 171 | Chinstrap,Dream,58.0,17.8,181.0,3700.0,Female 172 | Chinstrap,Dream,46.4,18.6,190.0,3450.0,Female 173 | Chinstrap,Dream,49.2,18.2,195.0,4400.0,Male 174 | Chinstrap,Dream,42.4,17.3,181.0,3600.0,Female 175 | Chinstrap,Dream,48.5,17.5,191.0,3400.0,Male 176 | Chinstrap,Dream,43.2,16.6,187.0,2900.0,Female 177 | Chinstrap,Dream,50.6,19.4,193.0,3800.0,Male 178 | Chinstrap,Dream,46.7,17.9,195.0,3300.0,Female 179 | Chinstrap,Dream,52.0,19.0,197.0,4150.0,Male 180 | Chinstrap,Dream,50.5,18.4,200.0,3400.0,Female 181 | Chinstrap,Dream,49.5,19.0,200.0,3800.0,Male 182 | Chinstrap,Dream,46.4,17.8,191.0,3700.0,Female 183 | Chinstrap,Dream,52.8,20.0,205.0,4550.0,Male 184 | Chinstrap,Dream,40.9,16.6,187.0,3200.0,Female 185 | Chinstrap,Dream,54.2,20.8,201.0,4300.0,Male 186 | Chinstrap,Dream,42.5,16.7,187.0,3350.0,Female 187 | Chinstrap,Dream,51.0,18.8,203.0,4100.0,Male 188 | Chinstrap,Dream,49.7,18.6,195.0,3600.0,Male 189 | Chinstrap,Dream,47.5,16.8,199.0,3900.0,Female 190 | Chinstrap,Dream,47.6,18.3,195.0,3850.0,Female 191 | Chinstrap,Dream,52.0,20.7,210.0,4800.0,Male 192 | Chinstrap,Dream,46.9,16.6,192.0,2700.0,Female 193 | Chinstrap,Dream,53.5,19.9,205.0,4500.0,Male 194 | 
Chinstrap,Dream,49.0,19.5,210.0,3950.0,Male 195 | Chinstrap,Dream,46.2,17.5,187.0,3650.0,Female 196 | Chinstrap,Dream,50.9,19.1,196.0,3550.0,Male 197 | Chinstrap,Dream,45.5,17.0,196.0,3500.0,Female 198 | Chinstrap,Dream,50.9,17.9,196.0,3675.0,Female 199 | Chinstrap,Dream,50.8,18.5,201.0,4450.0,Male 200 | Chinstrap,Dream,50.1,17.9,190.0,3400.0,Female 201 | Chinstrap,Dream,49.0,19.6,212.0,4300.0,Male 202 | Chinstrap,Dream,51.5,18.7,187.0,3250.0,Male 203 | Chinstrap,Dream,49.8,17.3,198.0,3675.0,Female 204 | Chinstrap,Dream,48.1,16.4,199.0,3325.0,Female 205 | Chinstrap,Dream,51.4,19.0,201.0,3950.0,Male 206 | Chinstrap,Dream,45.7,17.3,193.0,3600.0,Female 207 | Chinstrap,Dream,50.7,19.7,203.0,4050.0,Male 208 | Chinstrap,Dream,42.5,17.3,187.0,3350.0,Female 209 | Chinstrap,Dream,52.2,18.8,197.0,3450.0,Male 210 | Chinstrap,Dream,45.2,16.6,191.0,3250.0,Female 211 | Chinstrap,Dream,49.3,19.9,203.0,4050.0,Male 212 | Chinstrap,Dream,50.2,18.8,202.0,3800.0,Male 213 | Chinstrap,Dream,45.6,19.4,194.0,3525.0,Female 214 | Chinstrap,Dream,51.9,19.5,206.0,3950.0,Male 215 | Chinstrap,Dream,46.8,16.5,189.0,3650.0,Female 216 | Chinstrap,Dream,45.7,17.0,195.0,3650.0,Female 217 | Chinstrap,Dream,55.8,19.8,207.0,4000.0,Male 218 | Chinstrap,Dream,43.5,18.1,202.0,3400.0,Female 219 | Chinstrap,Dream,49.6,18.2,193.0,3775.0,Male 220 | Chinstrap,Dream,50.8,19.0,210.0,4100.0,Male 221 | Chinstrap,Dream,50.2,18.7,198.0,3775.0,Female 222 | Gentoo,Biscoe,46.1,13.2,211.0,4500.0,Female 223 | Gentoo,Biscoe,50.0,16.3,230.0,5700.0,Male 224 | Gentoo,Biscoe,48.7,14.1,210.0,4450.0,Female 225 | Gentoo,Biscoe,50.0,15.2,218.0,5700.0,Male 226 | Gentoo,Biscoe,47.6,14.5,215.0,5400.0,Male 227 | Gentoo,Biscoe,46.5,13.5,210.0,4550.0,Female 228 | Gentoo,Biscoe,45.4,14.6,211.0,4800.0,Female 229 | Gentoo,Biscoe,46.7,15.3,219.0,5200.0,Male 230 | Gentoo,Biscoe,43.3,13.4,209.0,4400.0,Female 231 | Gentoo,Biscoe,46.8,15.4,215.0,5150.0,Male 232 | Gentoo,Biscoe,40.9,13.7,214.0,4650.0,Female 233 | 
Gentoo,Biscoe,49.0,16.1,216.0,5550.0,Male 234 | Gentoo,Biscoe,45.5,13.7,214.0,4650.0,Female 235 | Gentoo,Biscoe,48.4,14.6,213.0,5850.0,Male 236 | Gentoo,Biscoe,45.8,14.6,210.0,4200.0,Female 237 | Gentoo,Biscoe,49.3,15.7,217.0,5850.0,Male 238 | Gentoo,Biscoe,42.0,13.5,210.0,4150.0,Female 239 | Gentoo,Biscoe,49.2,15.2,221.0,6300.0,Male 240 | Gentoo,Biscoe,46.2,14.5,209.0,4800.0,Female 241 | Gentoo,Biscoe,48.7,15.1,222.0,5350.0,Male 242 | Gentoo,Biscoe,50.2,14.3,218.0,5700.0,Male 243 | Gentoo,Biscoe,45.1,14.5,215.0,5000.0,Female 244 | Gentoo,Biscoe,46.5,14.5,213.0,4400.0,Female 245 | Gentoo,Biscoe,46.3,15.8,215.0,5050.0,Male 246 | Gentoo,Biscoe,42.9,13.1,215.0,5000.0,Female 247 | Gentoo,Biscoe,46.1,15.1,215.0,5100.0,Male 248 | Gentoo,Biscoe,44.5,14.3,216.0,4100.0, 249 | Gentoo,Biscoe,47.8,15.0,215.0,5650.0,Male 250 | Gentoo,Biscoe,48.2,14.3,210.0,4600.0,Female 251 | Gentoo,Biscoe,50.0,15.3,220.0,5550.0,Male 252 | Gentoo,Biscoe,47.3,15.3,222.0,5250.0,Male 253 | Gentoo,Biscoe,42.8,14.2,209.0,4700.0,Female 254 | Gentoo,Biscoe,45.1,14.5,207.0,5050.0,Female 255 | Gentoo,Biscoe,59.6,17.0,230.0,6050.0,Male 256 | Gentoo,Biscoe,49.1,14.8,220.0,5150.0,Female 257 | Gentoo,Biscoe,48.4,16.3,220.0,5400.0,Male 258 | Gentoo,Biscoe,42.6,13.7,213.0,4950.0,Female 259 | Gentoo,Biscoe,44.4,17.3,219.0,5250.0,Male 260 | Gentoo,Biscoe,44.0,13.6,208.0,4350.0,Female 261 | Gentoo,Biscoe,48.7,15.7,208.0,5350.0,Male 262 | Gentoo,Biscoe,42.7,13.7,208.0,3950.0,Female 263 | Gentoo,Biscoe,49.6,16.0,225.0,5700.0,Male 264 | Gentoo,Biscoe,45.3,13.7,210.0,4300.0,Female 265 | Gentoo,Biscoe,49.6,15.0,216.0,4750.0,Male 266 | Gentoo,Biscoe,50.5,15.9,222.0,5550.0,Male 267 | Gentoo,Biscoe,43.6,13.9,217.0,4900.0,Female 268 | Gentoo,Biscoe,45.5,13.9,210.0,4200.0,Female 269 | Gentoo,Biscoe,50.5,15.9,225.0,5400.0,Male 270 | Gentoo,Biscoe,44.9,13.3,213.0,5100.0,Female 271 | Gentoo,Biscoe,45.2,15.8,215.0,5300.0,Male 272 | Gentoo,Biscoe,46.6,14.2,210.0,4850.0,Female 273 | Gentoo,Biscoe,48.5,14.1,220.0,5300.0,Male 274 
| Gentoo,Biscoe,45.1,14.4,210.0,4400.0,Female 275 | Gentoo,Biscoe,50.1,15.0,225.0,5000.0,Male 276 | Gentoo,Biscoe,46.5,14.4,217.0,4900.0,Female 277 | Gentoo,Biscoe,45.0,15.4,220.0,5050.0,Male 278 | Gentoo,Biscoe,43.8,13.9,208.0,4300.0,Female 279 | Gentoo,Biscoe,45.5,15.0,220.0,5000.0,Male 280 | Gentoo,Biscoe,43.2,14.5,208.0,4450.0,Female 281 | Gentoo,Biscoe,50.4,15.3,224.0,5550.0,Male 282 | Gentoo,Biscoe,45.3,13.8,208.0,4200.0,Female 283 | Gentoo,Biscoe,46.2,14.9,221.0,5300.0,Male 284 | Gentoo,Biscoe,45.7,13.9,214.0,4400.0,Female 285 | Gentoo,Biscoe,54.3,15.7,231.0,5650.0,Male 286 | Gentoo,Biscoe,45.8,14.2,219.0,4700.0,Female 287 | Gentoo,Biscoe,49.8,16.8,230.0,5700.0,Male 288 | Gentoo,Biscoe,46.2,14.4,214.0,4650.0, 289 | Gentoo,Biscoe,49.5,16.2,229.0,5800.0,Male 290 | Gentoo,Biscoe,43.5,14.2,220.0,4700.0,Female 291 | Gentoo,Biscoe,50.7,15.0,223.0,5550.0,Male 292 | Gentoo,Biscoe,47.7,15.0,216.0,4750.0,Female 293 | Gentoo,Biscoe,46.4,15.6,221.0,5000.0,Male 294 | Gentoo,Biscoe,48.2,15.6,221.0,5100.0,Male 295 | Gentoo,Biscoe,46.5,14.8,217.0,5200.0,Female 296 | Gentoo,Biscoe,46.4,15.0,216.0,4700.0,Female 297 | Gentoo,Biscoe,48.6,16.0,230.0,5800.0,Male 298 | Gentoo,Biscoe,47.5,14.2,209.0,4600.0,Female 299 | Gentoo,Biscoe,51.1,16.3,220.0,6000.0,Male 300 | Gentoo,Biscoe,45.2,13.8,215.0,4750.0,Female 301 | Gentoo,Biscoe,45.2,16.4,223.0,5950.0,Male 302 | Gentoo,Biscoe,49.1,14.5,212.0,4625.0,Female 303 | Gentoo,Biscoe,52.5,15.6,221.0,5450.0,Male 304 | Gentoo,Biscoe,47.4,14.6,212.0,4725.0,Female 305 | Gentoo,Biscoe,50.0,15.9,224.0,5350.0,Male 306 | Gentoo,Biscoe,44.9,13.8,212.0,4750.0,Female 307 | Gentoo,Biscoe,50.8,17.3,228.0,5600.0,Male 308 | Gentoo,Biscoe,43.4,14.4,218.0,4600.0,Female 309 | Gentoo,Biscoe,51.3,14.2,218.0,5300.0,Male 310 | Gentoo,Biscoe,47.5,14.0,212.0,4875.0,Female 311 | Gentoo,Biscoe,52.1,17.0,230.0,5550.0,Male 312 | Gentoo,Biscoe,47.5,15.0,218.0,4950.0,Female 313 | Gentoo,Biscoe,52.2,17.1,228.0,5400.0,Male 314 | Gentoo,Biscoe,45.5,14.5,212.0,4750.0,Female 
315 | Gentoo,Biscoe,49.5,16.1,224.0,5650.0,Male 316 | Gentoo,Biscoe,44.5,14.7,214.0,4850.0,Female 317 | Gentoo,Biscoe,50.8,15.7,226.0,5200.0,Male 318 | Gentoo,Biscoe,49.4,15.8,216.0,4925.0,Male 319 | Gentoo,Biscoe,46.9,14.6,222.0,4875.0,Female 320 | Gentoo,Biscoe,48.4,14.4,203.0,4625.0,Female 321 | Gentoo,Biscoe,51.1,16.5,225.0,5250.0,Male 322 | Gentoo,Biscoe,48.5,15.0,219.0,4850.0,Female 323 | Gentoo,Biscoe,55.9,17.0,228.0,5600.0,Male 324 | Gentoo,Biscoe,47.2,15.5,215.0,4975.0,Female 325 | Gentoo,Biscoe,49.1,15.0,228.0,5500.0,Male 326 | Gentoo,Biscoe,47.3,13.8,216.0,4725.0, 327 | Gentoo,Biscoe,46.8,16.1,215.0,5500.0,Male 328 | Gentoo,Biscoe,41.7,14.7,210.0,4700.0,Female 329 | Gentoo,Biscoe,53.4,15.8,219.0,5500.0,Male 330 | Gentoo,Biscoe,43.3,14.0,208.0,4575.0,Female 331 | Gentoo,Biscoe,48.1,15.1,209.0,5500.0,Male 332 | Gentoo,Biscoe,50.5,15.2,216.0,5000.0,Female 333 | Gentoo,Biscoe,49.8,15.9,229.0,5950.0,Male 334 | Gentoo,Biscoe,43.5,15.2,213.0,4650.0,Female 335 | Gentoo,Biscoe,51.5,16.3,230.0,5500.0,Male 336 | Gentoo,Biscoe,46.2,14.1,217.0,4375.0,Female 337 | Gentoo,Biscoe,55.1,16.0,230.0,5850.0,Male 338 | Gentoo,Biscoe,44.5,15.7,217.0,4875.0, 339 | Gentoo,Biscoe,48.8,16.2,222.0,6000.0,Male 340 | Gentoo,Biscoe,47.2,13.7,214.0,4925.0,Female 341 | Gentoo,Biscoe,,,,, 342 | Gentoo,Biscoe,46.8,14.3,215.0,4850.0,Female 343 | Gentoo,Biscoe,50.4,15.7,222.0,5750.0,Male 344 | Gentoo,Biscoe,45.2,14.8,212.0,5200.0,Female 345 | Gentoo,Biscoe,49.9,16.1,213.0,5400.0,Male 346 | -------------------------------------------------------------------------------- /data/food_training/languages.csv: -------------------------------------------------------------------------------- 1 | Country,Official and national Languages 2 | Albania,Albanian 3 | Andorra,Catalan 4 | Austria,German/Slovene/Croatian/Hungarian 5 | Belarus,Belarusian/Russian 6 | Belgium,Dutch/French/German 7 | Bosnia & Herzegovina,Bosnian/Croatian/Serbian 8 | Bulgaria,Bulgarian 9 | Croatia,Croatian 10 | 
Cyprus,Greek/Turkish/English 11 | Czech Republic,Czech 12 | Denmark,Danish 13 | Estonia,Estonian 14 | Faroe Islands,Faroese/Danish 15 | Finland,Finnish/Swedish 16 | France,French 17 | Germany,German 18 | Gibraltar,English 19 | Greece,Greek 20 | Greenland,Greenlandic Inuktitut/Danish 21 | Hungary,Hungarian 22 | Iceland,Icelandic 23 | Ireland,Irish/English 24 | Italy,Italian 25 | Latvia,Latvian 26 | Liechtenstein,German 27 | Lithuania,Lithuanian 28 | Luxembourg,Luxembourgish/French/German 29 | Macedonia (Rep. of),Macedonia/Albanian 30 | Malta,Maltese 31 | Moldova,Moldovan 32 | Monaco,French 33 | Montenegro,Serbo-Croatian 34 | Netherlands,Dutch/Frisian 35 | Norway,Norwegian 36 | Poland,Polish 37 | Portugal,Portuguese 38 | Romania,Romanian 39 | Russian Federation,Russian 40 | San Marino,Italian 41 | Serbia,Serbian/Albanian 42 | Slovakia,Slovak 43 | Slovenia,Slovenian 44 | Spain,Spanish/Catalan/Galician/Basque 45 | Sweden,Swedish 46 | Switzerland,German/French/Italian/Romansch 47 | Turkey,Turkish 48 | Ukraine,Ukrainian 49 | United Kingdom,English 50 | Vatican City State,Latin/Italian 51 | -------------------------------------------------------------------------------- /data/food_training/training_2014.csv: -------------------------------------------------------------------------------- 1 | ,,,,,, 2 | CourseName,Location,DateFrom,DateTo,Attendees,, 3 | Risk Assessment (Pest),lisbon;Portugal,2015-01-12,2015-01-16,1,, 4 | Organic Farming,Bristol,2015-01-19,2015-01-22,2,, 5 | Prevention Control and Eradication of Transmissible Spongiform Encephalopathies,Ljubljana;Slovenia,2015-01-20,2015-01-23,2,, 6 | Contingency Planning,Padua,2015-01-26,2015-01-30,2,, 7 | HACCP,Rome,2015-02-02,2015-02-06,2,, 8 | Food Hygiene and Flexibility,Parma,2015-02-08,2015-02-13,1,, 9 | Risk Assessment (Animal Health),Lisbon,2015-02-08,2015-02-12,1,, 10 | Plant Health Risks,Munich,2015-02-09,2015-02-12,1,, 11 | Plant Protection Products,Lisbon,2015-02-16,2015-02-19,2,, 12 | Food Hygiene at Primary 
Production (Plants),Murcia,2015-02-16,2015-02-20,2,, 13 | Risk Assessment (GMO),Lisbon,2015-02-23,2015-02-27,1,, 14 | Audit,Amsterdam,2015-02-23,2015-02-27,1,, 15 | Semen Embryos and Ova,Venice,2015-02-23,2015-02-27,1,, 16 | Food Additives,Athens,2015-02-23,2015-02-27,2,, 17 | Organic Farming,Warsaw,2015-03-02,2015-03-05,2,, 18 | Risk Assessment (Microbiological),Berlin,2015-03-02,2015-03-06,1,, 19 | HACCP,Lyon,2015-03-02,2015-03-06,4,, 20 | Animal identification registration and traceability,Lyon,2015-03-02,2015-03-06,1,, 21 | Animal Health and Disease Prevention for Bees and Zoo Animals,Antwerp,2015-03-08,2015-03-13,1,, 22 | Food Hygiene and Flexibility,Graz,2015-03-08,2015-03-13,1,, 23 | Audit ,Barcelona,2015-03-09,2015-03-13,2,, 24 | Risk Assessment (Environment),Rome,2015-03-16,2015-03-20,1,, 25 | Food Hygiene at Primary Production (Land Animals), Budapest,2015-03-16,2015-03-20,1,, 26 | RASFF,Madrid,2015-03-23,2015-03-26,1,, 27 | Contingency Planning,Padua,2015-03-23,2015-03-27,1,, 28 | Microbiological Criteria,Lisbon,2015-03-23,2015-03-27,1,, 29 | Food Composition and Information,Athens,2015-03-23,2015-03-27,1,, 30 | Control on Contaminants in Feed and Food,Berlin,2015-03-24,2015-03-27,3,, 31 | -------------------------------------------------------------------------------- /data/food_training/training_2015.csv: -------------------------------------------------------------------------------- 1 | ,,,,,, 2 | CourseName,Location,DateFrom,DateTo,Attendees,, 3 | Food Hygiene and Flexibility,Vilnius,2015-04-12,2015-04-17,2,, 4 | Plant Health Risks,Milan;Italy,2015-04-13,2015-04-16,1,, 5 | New Investigative Techniques,Madrid,2015-04-13,2015-04-16,2,, 6 | Foodborne Outbreaks Investigations,Lisbon,2015-04-13,2015-04-17,2,, 7 | Food Hygiene at Primary Production (Plants),Budapest,2015-04-13,2015-04-17,2,, 8 | Food Composition and Information,Trim; Ireland ,2015-04-13,2015-04-17,2,, 9 | Veterinary Medical Products,Venice,2015-04-14,2015-04-17,3,, 10 | 
Audit,Grange;Ireland,2015-04-20,2015-04-24,1,, 11 | Food Additives,Riga,2015-04-20,2015-04-24,2,, 12 | Movement of Cats and Dogs,Zagreb;Croatia,2015-04-21,2015-04-24,1,, 13 | Import Controls on Food and Feed of Non-Animal Origin,Genoa,2015-04-21,2015-04-24,3,, 14 | Animal by Products (Intermediate Level),Antwerp,2015-04-21,2015-04-24,2,, 15 | Microbiological Criteria,Riga,2015-05-04,2015-05-07,2,, 16 | Risk Assessment (Microbiological),Tallinn,2015-05-04,2015-05-08,1,, 17 | Animal identification registration and traceability,Warsaw,2015-05-04,2015-05-08,2,, 18 | Contingency Planning,Padua;Italy,2015-05-04,2015-05-08,3,, 19 | Food Hygiene and Flexibility,Coimbra;Portugal,2015-05-10,2015-05-15,2,, 20 | Food Composition and Information,Madrid,2015-05-11,2015-05-14,1,, 21 | Microbiological Criteria,Valencia,2015-05-11,2015-05-15,3,, 22 | New Investigative Techniques,Bratislava;Slovakia,2015-05-18,2015-05-21,2,, 23 | Border Inspection Post,Felixstowe;United Kingdom,2015-05-18,2015-05-21,4,, 24 | Semen Embryos and Ova,Venice,2015-05-18,2015-05-22,1,, 25 | RASFF,Valencia,2015-05-18,2015-05-22,1,, 26 | Traces (USE AT IMPORT OF CERTAIN FEED AND FOOD OF NON-ANIMAL ORIGIN),Riga,2015-05-19,2015-05-22,2,, 27 | Animal by Products (Upgraded),Maribor,2015-05-19,2015-05-22,2,, 28 | Animal Welfare (During Transport),Unknown;France,2015-05-26,2015-05-29,2,, 29 | Plant Health Risks,Lisbon,2015-06-01,2015-06-04,1,, 30 | Veterinary Medical Products,Krakow,2015-06-02,2015-06-05,2,, 31 | Animal Health and Disease Prevention for Bees and Zoo Animals,Maribor,2015-06-02,2015-06-05,2,, 32 | HACCP,Rome,2015-06-08,2015-06-11,2,, 33 | New Investigative Techniques,Prague,2015-06-08,2015-06-11,1,, 34 | Feed Law,Bremen,2015-06-08,2015-06-12,2,, 35 | Food Additives,Rotterdam;Netherlands,2015-06-09,2015-06-12,3,, 36 | Food Hygiene and Flexibility,Vilnius,2015-06-14,2015-06-19,1,, 37 | Risk Assessment (Pest),Tallinn,2015-06-15,2015-06-16,1,, 38 | Animal identification registration and 
traceability,Munich;Germany,2015-06-15,2015-06-19,2,, 39 | RASFF,Tallinn,2015-06-16,2015-06-19,1,, 40 | Animal Welfare (Pig Production),Unknown;Denmark,2015-06-16,2015-06-19,1,, 41 | Border Inspection Post,Vienna;Austria,2015-06-16,2015-06-19,3,, 42 | Food Additives,Trim;Ireland,2015-06-22,2015-06-22,2,, 43 | Traces (USE AT IMPORT OF LIVE PLANTS),Marseille;France,2015-06-23,2015-06-26,1,, 44 | Food Hygiene at Primary Production (Aquatic Animals),Venice/Udine; Italy ,2015-06-28,2015-10-02,1,, 45 | Semen Embryos and Ova,Lisbon,2015-06-29,2015-07-03,1,, 46 | Plant Health Risks,Lisbon,2015-07-06,2015-07-09,1,, 47 | Contingency Planning,Riga,2015-07-06,2015-07-10,1,, 48 | Import Controls on Food and Feed of Non-Animal Origin,Athens,2015-07-06,2015-07-10,2,, 49 | Pesticide Application Equipment,Barcelona,2015-07-07,2015-07-10,1,, 50 | Animal by Products (Advanced Level),Dusseldorf;Germany,2015-07-07,2015-07-10,1,, 51 | Veterinary Medical Products,Madrid,2015-07-07,2015-07-10,2,, 52 | Contingency Planning,Maribor;Slovenia,2015-09-01,2015-09-04,2,, 53 | Animal Welfare,Cardiff,2015-09-07,2015-09-11,2,, 54 | Plant Health Controls,Brussels/Antwerp,2015-09-15,2015-09-17,1,, 55 | Prevention Control and Eradication of Transmissible Spongiform Encephalopathies,Utrecht;Netherlands,2015-09-15,2015-09-18,2,, 56 | RASFF,Tallinn,2015-09-15,2015-09-18,1,, 57 | Control on Contaminants in Feed and Food,Rome;IT,2015-09-15,2015-09-18,3,, 58 | New Investigative Techniques,Prague,2015-09-21,2015-09-24,1,, 59 | Food Additives,Trim;Ireland,2015-09-21,2015-09-25,1,, 60 | Movement of Cats and Dogs,Malaga;Spain,2015-09-22,2015-09-25,2,, 61 | Animal by Products (Upgraded),Maribor;Slovenia,2015-09-22,2015-09-25,2,, 62 | Animal Welfare (In Hen Laying),Unknown;UK,2015-09-22,2015-09-25,2,, 63 | Import Controls on Food and Feed of Non-Animal Origin,Valencia;Spain,2015-09-22,2015-09-25,3,, 64 | Detection of counterfeit/illegal pesticides,Grange;Ireland,2015-09-23,2015-09-25,1,, 65 | Plant Health 
Controls,Warsaw;Poland,2015-09-28,2015-10-02,1,, 66 | Foodborne Outbreaks Investigations,Berlin,2015-09-28,2015-10-02,2,, 67 | HACCP,Budapest,2015-09-28,2015-10-02,1,, 68 | Veterinary Medical Products,Trim;Ireland,2015-09-29,2015-10-02,1,, 69 | Traces (USE AT IMPORT OF LIVE ANIMALS AND PRODUCTS OF ANIMAL ORIGIN),Budapest,2015-09-29,2015-10-02,2,, 70 | Risk Assessment (GMO),Tallinn;Estonia,2015-10-05,2015-10-09,1,, 71 | Control on Contaminants in Feed and Food,Riga,2015-10-05,2015-10-09,1,, 72 | Semen Embryos and Ova,Gothenburg,2015-10-05,2015-10-09,2,, 73 | Food Composition and Information,Trim;Ireland,2015-10-05,2015-10-09,1,, 74 | Animal Welfare (Killing For Disease Control),Unknown;Italy,2015-10-06,2015-10-09,1,, 75 | Contingency Planning,Grange;Ireland,2015-10-07,2015-10-09,2,, 76 | Food Hygiene at Primary Production (Land Animals),Trim;Ireland,2015-10-12,2015-10-16,3,, 77 | Animal Welfare (At Slaughter),Grange;Ireland,2015-10-13,2015-10-15,2,, 78 | Border Inspection Post,Felixstowe,2015-10-13,2015-10-16,4,, 79 | Food Composition and Information,D?sseldorf,2015-10-13,2015-10-16,2,, 80 | Food Hygiene and Flexibility,Graz,2015-10-18,2015-10-23,2,, 81 | Microbiological Criteria,Barcelona;Spain,2015-10-19,2015-10-22,2,, 82 | New Investigative Techniques,Madrid,2015-10-19,2015-10-22,3,, 83 | Animal identification registration and traceability,Munich,2015-10-19,2015-10-23,1,, 84 | Animal Welfare (At Slaughter Cattle pigs sheep and goats),Unknown;Italy,2015-10-20,2015-10-23,1,, 85 | Control on Contaminants in Feed and Food,Rome;IT,2015-10-20,2015-10-23,3,, 86 | Food Hygiene at Primary Production (Plants),Budapest,2015-10-26,2015-10-30,1,, 87 | HACCP,Sofia Bulgaria,2015-10-26,2015-10-30,2,, 88 | Movement of Cats and Dogs,London,2015-10-27,2015-10-30,3,, 89 | Import Controls on Food and Feed of Non-Animal Origin,Riga,2015-10-27,2015-10-30,3,, 90 | EU Feed Hygiene Rules and HACCP auditing,Budapest;Hungary,2015-11-03,2015-11-06,1,, 91 | Food Hygiene and 
Flexibility,Barcelona,2015-11-08,2015-11-13,1,, 92 | Plant Health Controls,Lisbon;Portugal,2015-11-16,2015-11-19,1,, 93 | Food Hygiene at Primary Production (Plants),Alicante/Murcia;Spain,2015-11-16,2015-11-20,1,, 94 | Border Inspection Post,Vienna,2015-11-17,2015-11-20,3,, 95 | Plant Health Controls,Unknown;Italy,2015-11-17,2015-11-20,1,, 96 | Microbiological Criteria,Rome,2015-11-23,2015-11-26,2,, 97 | Import Controls on Food and Feed of Non-Animal Origin,Frankfurt;Switzerland,2015-11-24,2015-11-27,3,, 98 | Control on Contaminants in Feed and Food,Brussels,2015-11-24,2015-11-27,3,, 99 | Foodborne Outbreaks Investigations,Rome,2015-11-30,2015-12-04,3,, 100 | Post-slaughter traceability of Meat (FVO),Grange;Ireland,2015-12-08,2015-12-10,3,, 101 | Plant Health Controls,London,2015-12-08,2015-12-10,2,, 102 | Movement of Cats and Dogs,Zagreb,2015-12-15,2015-12-18,1,, 103 | CONTROL OF ZOONOSES AND PREVENTION AND MONITORING OF ANTI-MICROBIAL RESISTANCE IN THE FOOD CHAIN (Zoon), Venice; Italy,2015-12-15,2015-12-18,1,, 104 | Food Hygiene at Primary Production (Aquatic Animals),Tarragona;Spain,2016-02-08,2016-02-12,3,, 105 | CONTROL OF ZOONOSES AND PREVENTION AND MONITORING OF ANTI-MICROBIAL RESISTANCE IN THE FOOD CHAIN (AMR),Krakow;Poland,2016-02-09,2016-02-12,1,, 106 | Food Composition and Information,Athens ; Greece,2016-03-07,2016-03-11,1,, 107 | Animal Welfare (Broiler Production),Unknown;Italy,2016-03-08,2016-03-11,1,, 108 | Food Composition and Information,Madrid;Spain,2016-01-18,2016-01-22,1,, 109 | Movement of Cats and Dogs,Zagreb,2016-01-19,2016-01-22,1,, 110 | Foodborne Outbreaks Investigations,Berlin,2016-02-01,2016-02-05,2,, 111 | Feed Law,Bremen;Germany,2016-02-01,2016-02-05,2,, 112 | Food Composition and Information,Athens;Greece,2016-02-01,2016-02-05,1,, 113 | Animal Welfare (Cattle pigs sheep and goats: Advanced level course),Unknown;Spain,2016-02-02,2016-02-05,2,, 114 | Food Hygiene at Primary Production (Plants),Valencia;Spain,2016-02-15,2016-02-19,2,, 
115 | Animal Health and Disease Prevention for Bees and Zoo Animals,London ,2016-02-16,2016-02-18,2,, 116 | Animal by Products (Intermediate Level),Antwerp; Belgium,2016-02-16,2016-02-19,2,, 117 | Pesticide Application Equipment,Torino;Italy,2016-02-22,2016-02-25,1,, 118 | Semen Embryos and Ova,Lisbon ,2016-02-22,2016-02-26,2,, 119 | Food Hygiene at Primary Production (Land Animals), Budapest,2016-02-29,2016-03-04,1,, 120 | Plant Health Controls,Venice/Treviso,2016-03-07,2016-03-11,2,, 121 | Import Controls on Food and Feed of Non-Animal Origin,Genoa;Italy,2016-03-08,2016-03-11,4,, 122 | Feed Law,Nantes;France,2016-03-14,2016-03-18,3,, 123 | Animal by Products (Upgraded),Maribor;Slovenia,2016-03-15,2016-03-18,2,, 124 | EU Feed Hygiene Rules and HACCP auditing ,Barcelona,2016-03-31,2016-06-03,2,, 125 | -------------------------------------------------------------------------------- /data/food_training/training_2016.csv: -------------------------------------------------------------------------------- 1 | ,,,,,, 2 | CourseName,Location,DateFrom,DateTo,Attendees,, 3 | Food Composition and Information,Valencia;Spain,2016-04-04,2016-04-08,2,, 4 | HACCP,Dublin;Ireland,2016-04-04,2016-04-08,3,, 5 | Food Hygiene and Flexibility,Vilnius;Lithuania,2016-04-04,2016-04-08,2,, 6 | Foodborne Outbreaks Investigations,Berlin,2016-04-04,2016-04-08,2,, 7 | Traces (USE AT IMPORT OF LIVE ANIMALS AND PRODUCTS OF ANIMAL ORIGIN),Alicante;Spain,2016-04-05,2016-04-08,1,, 8 | Animal identification registration and traceability,Lyon,2016-04-11,2016-04-15,1,, 9 | Audit B1 - Standard Level,Grange ; Ireland,2016-04-11,2016-04-15,2,, 10 | Food Additives Type 1,Athens;Greece,2016-04-25,2016-04-29,1,, 11 | New Investigative Techniques - A,Rome,2016-04-25,2016-04-28,2,, 12 | Plant Health Controls,Venice/Treviso,2016-04-25,2016-04-29,2,, 13 | Foodborne Outbreaks Investigations,Tallinn,2016-04-25,2016-04-29,3,, 14 | Import Controls on Food and Feed of Non-Animal 
Origin,Valencia,2016-04-26,2016-04-29,3,, 15 | Animal Health and Disease Prevention for Bees and Zoo Animals,Prague,2016-04-26,2016-04-29,1,, 16 | Foodborne Outbreaks Investigations,Lisbon,2016-04-29,2016-03-04,2,, 17 | Contingency Planning,Cardiff;UK,2016-05-09,2016-05-13,1,, 18 | New Investigative Techniques - Standard B1,Prague,2016-05-09,2016-05-12,2,, 19 | Border Inspection Post,Felixstowe,2016-05-10,2016-05-13,5,, 20 | Auditing Plastic Recycling Processes,Treviso,2016-05-10,2016-05-13,1,, 21 | EU Feed Hygiene Rules and HACCP auditing ,Amsterdam,2016-05-10,2016-05-13,2,, 22 | Traces (USE AT IMPORT OF CERTAIN FEED AND FOOD OF NON-ANIMAL ORIGIN),Tallinn,2016-05-10,2016-05-13,1,, 23 | Control on Contaminants in Feed and Food,Brussels;Belgium,2016-05-10,2016-05-13,1,, 24 | HACCP,Ljubljana,2016-05-16,2016-05-20,1,, 25 | Food Hygiene and Flexibility (Decision Makers),Barcelona,2016-05-16,2016-05-20,1,, 26 | Microbiological Criteria,Barcelona,2016-05-16,2016-05-19,2,, 27 | Plant Health Controls,Brussels/Antwerp,2016-05-17,2016-05-19,2,, 28 | Animal Welfare (During Transport),Poland,2016-05-17,2016-05-20,1,, 29 | Audit A,Budapest,2016-05-23,2016-05-27,2,, 30 | HACCP,Budapest,2016-05-23,2016-05-27,2,, 31 | Foodborne Outbreaks Investigations,Rome,2016-05-23,2016-05-27,4,, 32 | Food Composition and Information,Prague,2016-05-30,2016-06-03,1,, 33 | Prevention Control and Eradication of Transmissible Spongiform Encephalopathies,Ljubljana,2016-05-31,2016-06-03,2,, 34 | Movement of Cats and Dogs,Malaga,2016-05-31,2016-06-03,2,, 35 | Plant Health Controls,Naples;Italy,2016-06-06,2016-06-09,2,, 36 | Control on Contaminants in Feed and Food,Prague;CZ,2016-06-07,2016-06-10,4,, 37 | Animal by Products (Upgraded),Antwerp;Belgium,2016-06-07,2016-06-10,1,, 38 | HACCP,Ljubljana,2016-06-13,2016-06-17,2,, 39 | Feed Law,Riga,2016-06-13,2016-06-17,4,, 40 | Food Composition and Information,Prague,2016-06-13,2016-06-17,3,, 41 | Animal identification registration and 
traceability,lisbon,2016-06-13,2016-06-17,1,, 42 | Food Additives Type 1,Trim;Ireland,2016-06-13,2016-06-17,2,, 43 | Food Hygiene and Flexibility,Turin,2016-06-13,2016-06-17,1,, 44 | Microbiological Criteria,Riga;Latvia,2016-06-13,2016-06-16,2,, 45 | Border Inspection Post,Felixstowe,2016-06-14,2016-06-17,6,, 46 | Traces (USE AT IMPORT OF LIVE PLANTS),Riga,2016-06-14,2016-06-17,2,, 47 | Animal Welfare (Poultry at Slaughter: Advanced level course),Unknown;Spain,2016-06-14,2016-06-17,2,, 48 | Audit B1 - Standard Level,Trim,2016-06-20,2016-06-24,1,, 49 | Plant Health Controls,Vienna,2016-06-20,2016-06-24,2,, 50 | Foodborne Outbreaks Investigations,Lisbon,2016-06-20,2016-06-24,2,, 51 | CONTROL OF ZOONOSES AND PREVENTION AND MONITORING OF ANTI-MICROBIAL RESISTANCE IN THE FOOD CHAIN (Zoon),Uppsala;Sweden,2016-06-21,2016-06-24,1,, 52 | Import Controls on Food and Feed of Non-Animal Origin,Frankfurt,2016-06-21,2016-06-24,3,, 53 | New Investigative Techniques - Standard B1,Madrid,2016-06-27,2016-06-30,2,, 54 | Plant Protection Products,Lisbon,2016-06-27,2016-06-30,2,, 55 | Animal Health and Disease Prevention for Bees and Zoo Animals,Maribor,2016-06-28,2016-07-01,1,, 56 | Animal by Products (Upgraded),Dusseldorf,2016-07-05,2016-07-08,1,, 57 | CONTROL OF ZOONOSES AND PREVENTION AND MONITORING OF ANTI-MICROBIAL RESISTANCE IN THE FOOD CHAIN (AMR),Trim;Ireland,2016-07-05,2016-07-08,1,, 58 | Food Additives Type 1,Valencia;Spain,2016-07-11,2016-07-15,1,, 59 | Plant Protection Products,Berlin,2016-08-29,2016-09-01,1,, 60 | Food Composition and Information,Trim;Ireland ,2016-09-05,2016-09-09,1,, 61 | Microbiological Criteria,Riga,2016-09-05,2016-09-08,2,, 62 | Movement of Cats and Dogs,Milan,2016-09-06,2016-09-09,1,, 63 | Food Hygiene and Flexibility,Coimbra,2016-09-12,2016-09-16,2,, 64 | Plant Health Controls,Naples,2016-09-12,2016-09-16,3,, 65 | Foodborne Outbreaks Investigations,Tallinn,2016-09-12,2016-09-16,3,, 66 | Animal by Products 
(Upgraded),Dusseldorf,2016-09-13,2016-09-16,1,, 67 | Animal Health and Disease Prevention for Bees and Zoo Animals,Maribor,2016-09-13,2016-09-16,4,, 68 | Control on Contaminants in Feed and Food,Sofia;BG,2016-09-13,2016-09-16,5,, 69 | Import Controls on Food and Feed of Non-Animal Origin,Rotterdam/Delft,2016-09-13,2016-09-16,13,, 70 | Animal identification registration and traceability,Warsaw,2016-09-19,2016-09-23,1,, 71 | Food Additives Type 1,Trim;Ireland,2016-09-19,2016-09-23,3,, 72 | HACCP,Lyon,2016-09-19,2016-09-23,2,, 73 | Traces (USE AT INTRA-EU TRADE OF LIVE ANIMALS),Madrid,2016-09-20,2016-09-23,2,, 74 | New Investigative Techniques - Advanced B2,Madrid,2016-09-25,2016-09-28,2,, 75 | Audit B1 - Standard Level,Bratislava,2016-09-26,2016-09-30,4,, 76 | Animal Welfare (In Pig Production),Denmark,2016-09-27,2016-09-30,3,, 77 | EU Feed Hygiene Rules and HACCP auditing ,Budapest,2016-09-27,2016-09-30,4,, 78 | Plant Health Controls,Warsaw,2016-10-03,2016-10-07,4,, 79 | Audit B2 - Advanced Level,Bratislava,2016-10-03,2016-10-07,2,, 80 | Food Hygiene and Flexibility,Vilnius,2016-10-03,2016-10-07,3,, 81 | Microbiological Criteria,Barcelona,2016-10-03,2016-10-06,2,, 82 | Border Inspection Post,Felixstowe,2016-10-04,2016-10-07,4,, 83 | Control on Contaminants in Feed and Food,Rome;IT,2016-10-04,2016-10-07,3,, 84 | Contingency Planning,Thessaloniki;GR,2016-10-10,2016-10-14,2,, 85 | Foodborne Outbreaks Investigations,Rome,2016-10-10,2016-10-14,2,, 86 | Animal by Products (Intermediate Level),Antwerp;Belgium,2016-10-11,2016-10-14,2,, 87 | Audit B1 - Standard Level,Valencia,2016-10-17,2016-10-21,1,, 88 | HACCP (German),Budapest,2016-10-24,2016-10-28,2,, 89 | New Investigative Techniques - Advanced B2,Madrid,2016-10-24,2016-10-27,1,, 90 | Animal Welfare (In Hen Laying),Unknown;UK,2016-10-25,2016-10-28,2,, 91 | Import Controls on Food and Feed of Non-Animal Origin,Riga,2016-10-25,2016-10-28,3,, 92 | Auditing Plastic Recycling 
Processes,Leipzig;Germany,2016-10-25,2016-10-28,2,, 93 | Contingency Planning,Cardiff;UK,2016-11-07,2016-11-11,2,, 94 | Audit A,Amsterdam;Netherlands,2016-11-07,2016-11-11,2,, 95 | Animal identification registration and traceability,Ljubljana,2016-11-07,2016-11-11,1,, 96 | Food Additives Type 1,Athens;Greece,2016-11-07,2016-11-11,1,, 97 | Food Hygiene and Flexibility,Turin;Italy,2016-11-07,2016-11-11,2,, 98 | Foodborne Outbreaks Investigations,Berlin;Germany,2016-11-07,2016-11-11,1,, 99 | Border Inspection Post,Vienna,2016-11-08,2016-11-11,3,, 100 | Animal Welfare (In Pig Production),Unknown;Italy,2016-11-15,2016-11-18,1,, 101 | Traces (USE AT INTRA-EU TRADE OF LIVE ANIMALS),Torino,2016-11-15,2016-11-18,2,, 102 | Audit B2 - Advanced Level,Berlin,2016-11-21,2016-11-25,2,, 103 | Prevention Control and Eradication of Transmissible Spongiform Encephalopathies,Ljubljana;SI,2016-11-22,2016-11-25,2,, 104 | HACCP,Valencia,2016-11-28,2016-12-02,2,, 105 | Workshop on Fishery Products (FVO),Grange;Ireland,2016-12-01,2015-12-03,3,, 106 | Contingency Planning,Venice;IT,2016-12-05,2016-12-09,2,, 107 | Animal identification registration and traceability,Lisbon;Portugal,2016-12-05,2016-12-09,1,, 108 | Audit A,Seville;Spain,2016-12-12,2016-12-16,1,, 109 | CONTROL OF ZOONOSES AND PREVENTION AND MONITORING OF ANTI-MICROBIAL RESISTANCE IN THE FOOD CHAIN (AMR),Athens;Greece,2016-12-13,2016-12-16,1,, 110 | Food Additives Type 2,Trim;Ireland,2017-01-23,2017-01-27,3,, 111 | Food Hygiene and Flexibility,Barcelona,2017-02-06,2017-02-10,1,, 112 | Auditing Plastic Recycling Processes,Treviso;Italy,2017-02-07,2017-02-10,2,, 113 | CONTROL OF ZOONOSES AND PREVENTION AND MONITORING OF ANTI-MICROBIAL RESISTANCE IN THE FOOD CHAIN (Zoon),Venice;Italy,2017-02-13,2017-02-16,1,, 114 | Contingency Planning,Venice;IT,2017-03-06,2017-03-10,3,, 115 | Audit B1 - Standard Level,Valencia,2017-03-06,2017-03-10,2,, 116 | Audit A,Bratislava,2017-03-20,2017-03-24,1,, 117 | Food Hygiene and 
Flexibility,Zagreb/Helsinki,2017-03-20,2017-03-24,2,, 118 | Animal Health and Disease Prevention for Bees and Zoo Animals,Antwerp,2017-03-28,2017-03-31,2,, 119 | -------------------------------------------------------------------------------- /media/colab/image1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HumbleData/beginners-data-workshop/94e1eb90e7694903badff6e535451ead42247c13/media/colab/image1.png -------------------------------------------------------------------------------- /media/colab/image10.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HumbleData/beginners-data-workshop/94e1eb90e7694903badff6e535451ead42247c13/media/colab/image10.png -------------------------------------------------------------------------------- /media/colab/image2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HumbleData/beginners-data-workshop/94e1eb90e7694903badff6e535451ead42247c13/media/colab/image2.png -------------------------------------------------------------------------------- /media/colab/image3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HumbleData/beginners-data-workshop/94e1eb90e7694903badff6e535451ead42247c13/media/colab/image3.png -------------------------------------------------------------------------------- /media/colab/image4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HumbleData/beginners-data-workshop/94e1eb90e7694903badff6e535451ead42247c13/media/colab/image4.png -------------------------------------------------------------------------------- /media/colab/image5.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/HumbleData/beginners-data-workshop/94e1eb90e7694903badff6e535451ead42247c13/media/colab/image5.png -------------------------------------------------------------------------------- /media/colab/image6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HumbleData/beginners-data-workshop/94e1eb90e7694903badff6e535451ead42247c13/media/colab/image6.png -------------------------------------------------------------------------------- /media/colab/image7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HumbleData/beginners-data-workshop/94e1eb90e7694903badff6e535451ead42247c13/media/colab/image7.png -------------------------------------------------------------------------------- /media/colab/image8.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HumbleData/beginners-data-workshop/94e1eb90e7694903badff6e535451ead42247c13/media/colab/image8.png -------------------------------------------------------------------------------- /media/colab/image9.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HumbleData/beginners-data-workshop/94e1eb90e7694903badff6e535451ead42247c13/media/colab/image9.png -------------------------------------------------------------------------------- /media/humble-data-logo-transparent.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HumbleData/beginners-data-workshop/94e1eb90e7694903badff6e535451ead42247c13/media/humble-data-logo-transparent.png -------------------------------------------------------------------------------- /media/humble-data-logo-white-transparent.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/HumbleData/beginners-data-workshop/94e1eb90e7694903badff6e535451ead42247c13/media/humble-data-logo-white-transparent.png -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | jupyterlab==4.0.8 2 | notebook==7.0.6 3 | pandas==2.1.3 4 | numpy==1.26.2 5 | matplotlib==3.8.2 6 | seaborn==0.13.0 7 | -------------------------------------------------------------------------------- /solutions/01_01.py: -------------------------------------------------------------------------------- 1 | name = "Anne" 2 | print(name) -------------------------------------------------------------------------------- /solutions/01_02.py: -------------------------------------------------------------------------------- 1 | name = 8 2 | print(name) -------------------------------------------------------------------------------- /solutions/01_03.py: -------------------------------------------------------------------------------- 1 | "I'm enjoying this workshop!" -------------------------------------------------------------------------------- /solutions/01_04.py: -------------------------------------------------------------------------------- 1 | 'I'm enjoying this workshop!' -------------------------------------------------------------------------------- /solutions/01_05.py: -------------------------------------------------------------------------------- 1 | 'I\'m enjoying this workshop!' 
-------------------------------------------------------------------------------- /solutions/01_06.py: -------------------------------------------------------------------------------- 1 | s[-1] -------------------------------------------------------------------------------- /solutions/01_07.py: -------------------------------------------------------------------------------- 1 | s[-3:] -------------------------------------------------------------------------------- /solutions/01_08.py: -------------------------------------------------------------------------------- 1 | "I" not in s -------------------------------------------------------------------------------- /solutions/01_09.py: -------------------------------------------------------------------------------- 1 | 3 + 4 -------------------------------------------------------------------------------- /solutions/01_10.py: -------------------------------------------------------------------------------- 1 | 10.0 - 6 -------------------------------------------------------------------------------- /solutions/01_11.py: -------------------------------------------------------------------------------- 1 | 15 * 12 -------------------------------------------------------------------------------- /solutions/01_12.py: -------------------------------------------------------------------------------- 1 | 2**6 -------------------------------------------------------------------------------- /solutions/01_13.py: -------------------------------------------------------------------------------- 1 | 3.1**2 -------------------------------------------------------------------------------- /solutions/01_14.py: -------------------------------------------------------------------------------- 1 | 5.0**2 -------------------------------------------------------------------------------- /solutions/01_15.py: -------------------------------------------------------------------------------- 1 | 6 / 2 
-------------------------------------------------------------------------------- /solutions/01_16.py: -------------------------------------------------------------------------------- 1 | 6 // 2 -------------------------------------------------------------------------------- /solutions/01_17.py: -------------------------------------------------------------------------------- 1 | 19 / 5 -------------------------------------------------------------------------------- /solutions/01_18.py: -------------------------------------------------------------------------------- 1 | 19 // 5 -------------------------------------------------------------------------------- /solutions/01_19.py: -------------------------------------------------------------------------------- 1 | 19 % 5 -------------------------------------------------------------------------------- /solutions/01_20.py: -------------------------------------------------------------------------------- 1 | False != 2 -------------------------------------------------------------------------------- /solutions/01_21.py: -------------------------------------------------------------------------------- 1 | len("Sandrine") > 8 -------------------------------------------------------------------------------- /solutions/01_22.py: -------------------------------------------------------------------------------- 1 | (len("Sandrine") > 5) and (len("Cheuk") < 7) -------------------------------------------------------------------------------- /solutions/01_23.py: -------------------------------------------------------------------------------- 1 | list_greeting[0] -------------------------------------------------------------------------------- /solutions/01_24.py: -------------------------------------------------------------------------------- 1 | list_greeting[3:] -------------------------------------------------------------------------------- /solutions/01_25.py: 
-------------------------------------------------------------------------------- 1 | list_greeting[:4] -------------------------------------------------------------------------------- /solutions/01_26.py: -------------------------------------------------------------------------------- 1 | list_greeting[::2] -------------------------------------------------------------------------------- /solutions/01_27.py: -------------------------------------------------------------------------------- 1 | list_greeting[2] = "Ola" 2 | print(list_greeting) -------------------------------------------------------------------------------- /solutions/01_28.py: -------------------------------------------------------------------------------- 1 | 10 in list_greeting -------------------------------------------------------------------------------- /solutions/01_29.py: -------------------------------------------------------------------------------- 1 | "Ole" not in list_greeting -------------------------------------------------------------------------------- /solutions/01_30.py: -------------------------------------------------------------------------------- 1 | print("Here we are!") -------------------------------------------------------------------------------- /solutions/01_31.py: -------------------------------------------------------------------------------- 1 | len(snakes) -------------------------------------------------------------------------------- /solutions/01_32.py: -------------------------------------------------------------------------------- 1 | len(list_greeting) -------------------------------------------------------------------------------- /solutions/01_33.py: -------------------------------------------------------------------------------- 1 | max(1, 2, 3, 4, 5) -------------------------------------------------------------------------------- /solutions/01_34.py: -------------------------------------------------------------------------------- 1 | round(123.45) 
-------------------------------------------------------------------------------- /solutions/01_35.py: -------------------------------------------------------------------------------- 1 | round(123.45, 1) -------------------------------------------------------------------------------- /solutions/01_36.py: -------------------------------------------------------------------------------- 1 | list_greeting.append("Aloha") -------------------------------------------------------------------------------- /solutions/01_37.py: -------------------------------------------------------------------------------- 1 | from math import sqrt 2 | 3 | sqrt(24336) -------------------------------------------------------------------------------- /solutions/01_38.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | np.sin(np.pi / 4) -------------------------------------------------------------------------------- /solutions/02_01.py: -------------------------------------------------------------------------------- 1 | import numpy as np -------------------------------------------------------------------------------- /solutions/02_02.py: -------------------------------------------------------------------------------- 1 | df = pd.read_csv("../data/Penguins/penguins.csv") -------------------------------------------------------------------------------- /solutions/02_03.py: -------------------------------------------------------------------------------- 1 | df.tail(3) -------------------------------------------------------------------------------- /solutions/02_04.py: -------------------------------------------------------------------------------- 1 | df.shape -------------------------------------------------------------------------------- /solutions/02_05.py: -------------------------------------------------------------------------------- 1 | df.info() -------------------------------------------------------------------------------- 
/solutions/02_06.py: -------------------------------------------------------------------------------- 1 | df.columns -------------------------------------------------------------------------------- /solutions/02_07.py: -------------------------------------------------------------------------------- 1 | pd.options.display.max_rows = 25 -------------------------------------------------------------------------------- /solutions/02_08.py: -------------------------------------------------------------------------------- 1 | df["bill_length_mm"] -------------------------------------------------------------------------------- /solutions/02_09.py: -------------------------------------------------------------------------------- 1 | df.iloc[11] -------------------------------------------------------------------------------- /solutions/02_10.py: -------------------------------------------------------------------------------- 1 | df.loc[11] -------------------------------------------------------------------------------- /solutions/02_11.py: -------------------------------------------------------------------------------- 1 | df.iloc[-3:, 2] -------------------------------------------------------------------------------- /solutions/02_12.py: -------------------------------------------------------------------------------- 1 | df.loc[352:, "bill_length_mm"] 2 | -------------------------------------------------------------------------------- /solutions/02_13.py: -------------------------------------------------------------------------------- 1 | df.iloc[[145, 7, 0], [4, -2]] -------------------------------------------------------------------------------- /solutions/02_14.py: -------------------------------------------------------------------------------- 1 | df.loc[[145, 7, 0], ["flipper_length_mm", "body_mass_g"]] -------------------------------------------------------------------------------- /solutions/02_15.py: 
-------------------------------------------------------------------------------- 1 | mask_PW_PL = (df["body_mass_g"] > 4000) & (df["flipper_length_mm"] < 185) 2 | df[mask_PW_PL] -------------------------------------------------------------------------------- /solutions/02_16.py: -------------------------------------------------------------------------------- 1 | df["species"].unique() -------------------------------------------------------------------------------- /solutions/02_17.py: -------------------------------------------------------------------------------- 1 | df["flipper_length_mm"].isnull().sum() -------------------------------------------------------------------------------- /solutions/02_18.py: -------------------------------------------------------------------------------- 1 | df["sex"].value_counts(dropna=False) -------------------------------------------------------------------------------- /solutions/02_19.py: -------------------------------------------------------------------------------- 1 | df["species"].value_counts(normalize=True) -------------------------------------------------------------------------------- /solutions/02_20.py: -------------------------------------------------------------------------------- 1 | df[df["flipper_length_mm"].isnull()].index -------------------------------------------------------------------------------- /solutions/02_21.py: -------------------------------------------------------------------------------- 1 | ?pd.DataFrame.dropna -------------------------------------------------------------------------------- /solutions/02_22.py: -------------------------------------------------------------------------------- 1 | df_2 = df.dropna(how="all") -------------------------------------------------------------------------------- /solutions/02_23.py: -------------------------------------------------------------------------------- 1 | print(f"number of rows of df_2: {df_2.shape[0]}") 
-------------------------------------------------------------------------------- /solutions/02_24.py: -------------------------------------------------------------------------------- 1 | df_3 = df_2.dropna(how="any") -------------------------------------------------------------------------------- /solutions/02_25.py: -------------------------------------------------------------------------------- 1 | print(f"number of rows of df_3: {df_3.shape[0]}") -------------------------------------------------------------------------------- /solutions/02_26.py: -------------------------------------------------------------------------------- 1 | df_4 = df_3.drop_duplicates() -------------------------------------------------------------------------------- /solutions/02_27.py: -------------------------------------------------------------------------------- 1 | df_4.describe() -------------------------------------------------------------------------------- /solutions/02_28.py: -------------------------------------------------------------------------------- 1 | df_4.dtypes -------------------------------------------------------------------------------- /solutions/02_29.py: -------------------------------------------------------------------------------- 1 | df_4.min(numeric_only=True) -------------------------------------------------------------------------------- /solutions/02_30.py: -------------------------------------------------------------------------------- 1 | df_4["flipper_length_mm"].max() -------------------------------------------------------------------------------- /solutions/02_31.py: -------------------------------------------------------------------------------- 1 | df_4.groupby("species").median(numeric_only=True) -------------------------------------------------------------------------------- /solutions/02_32.py: -------------------------------------------------------------------------------- 1 | df_4.to_csv("../data/Penguins/my_penguins.csv") 
-------------------------------------------------------------------------------- /solutions/04_01.py: -------------------------------------------------------------------------------- 1 | dict_greeting["Italy"] -------------------------------------------------------------------------------- /solutions/04_02.py: -------------------------------------------------------------------------------- 1 | dict_greeting["UK"] = "Good Morning" 2 | print(dict_greeting) -------------------------------------------------------------------------------- /solutions/04_03.py: -------------------------------------------------------------------------------- 1 | dict_greeting["Hawaii"] = "Aloha" 2 | print(dict_greeting) -------------------------------------------------------------------------------- /solutions/04_04.py: -------------------------------------------------------------------------------- 1 | x = -1 2 | y = 2 3 | z = 12 4 | 5 | if x > 0: 6 | print("Python") 7 | elif y == 2: 8 | print("sunshine") 9 | elif z % 3 == 0: 10 | print("data") 11 | else: 12 | print("Why?") -------------------------------------------------------------------------------- /solutions/04_05.py: -------------------------------------------------------------------------------- 1 | ?is_greeting -------------------------------------------------------------------------------- /solutions/04_06.py: -------------------------------------------------------------------------------- 1 | # Remember to run your function 2 | # Such as: 3 | # 4 | # print(f(x)) 5 | # 6 | 7 | 8 | def f(x): 9 | """Returns the argument multiplied by 3 and increased by 10.""" 10 | return (x * 3) + 10 -------------------------------------------------------------------------------- /solutions/04_07.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HumbleData/beginners-data-workshop/94e1eb90e7694903badff6e535451ead42247c13/solutions/04_07.py 
-------------------------------------------------------------------------------- /solutions/04_08.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HumbleData/beginners-data-workshop/94e1eb90e7694903badff6e535451ead42247c13/solutions/04_08.py -------------------------------------------------------------------------------- /solutions/04_09.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HumbleData/beginners-data-workshop/94e1eb90e7694903badff6e535451ead42247c13/solutions/04_09.py -------------------------------------------------------------------------------- /solutions/05_01.py: -------------------------------------------------------------------------------- 1 | import datetime 2 | 3 | import matplotlib.pyplot as plt 4 | import pandas as pd 5 | 6 | %matplotlib inline -------------------------------------------------------------------------------- /solutions/05_02.py: -------------------------------------------------------------------------------- 1 | df_2014 = pd.read_csv("../data/food_training/training_2014.csv") -------------------------------------------------------------------------------- /solutions/05_03.py: -------------------------------------------------------------------------------- 1 | df_2014.head() -------------------------------------------------------------------------------- /solutions/05_04.py: -------------------------------------------------------------------------------- 1 | df_2014 = pd.read_csv("../data/food_training/training_2014.csv", header=1) -------------------------------------------------------------------------------- /solutions/05_05.py: -------------------------------------------------------------------------------- 1 | df_2014.head() -------------------------------------------------------------------------------- /solutions/05_06.py: 
-------------------------------------------------------------------------------- 1 | df_2015 = pd.read_csv("../data/food_training/training_2015.csv", header=1) 2 | df_2016 = pd.read_csv("../data/food_training/training_2016.csv", header=1) -------------------------------------------------------------------------------- /solutions/05_07.py: -------------------------------------------------------------------------------- 1 | frames = [df_2014, df_2015, df_2016] 2 | df = pd.concat(frames) -------------------------------------------------------------------------------- /solutions/05_08.py: -------------------------------------------------------------------------------- 1 | df.shape -------------------------------------------------------------------------------- /solutions/05_09.py: -------------------------------------------------------------------------------- 1 | df.index -------------------------------------------------------------------------------- /solutions/05_10.py: -------------------------------------------------------------------------------- 1 | df = df.reset_index() 2 | df.index 3 | 4 | # We could also have done the following when concatenating: 5 | # df = pd.concat(frames, ignore_index=True) -------------------------------------------------------------------------------- /solutions/05_11.py: -------------------------------------------------------------------------------- 1 | df.info() -------------------------------------------------------------------------------- /solutions/05_12.py: -------------------------------------------------------------------------------- 1 | ?pd.DataFrame.drop -------------------------------------------------------------------------------- /solutions/05_13.py: -------------------------------------------------------------------------------- 1 | cols_to_remove = ["Unnamed: 5", "Unnamed: 6"] 2 | df = df.drop(cols_to_remove, axis=1) -------------------------------------------------------------------------------- /solutions/05_14.py: 
-------------------------------------------------------------------------------- 1 | df["Location"].unique() -------------------------------------------------------------------------------- /solutions/05_15.py: -------------------------------------------------------------------------------- 1 | df["Location"].str.split(pat=";") -------------------------------------------------------------------------------- /solutions/05_16.py: -------------------------------------------------------------------------------- 1 | df["Location"].str.split(pat=";", expand=True) -------------------------------------------------------------------------------- /solutions/05_17.py: -------------------------------------------------------------------------------- 1 | df[["city", "country"]] = df["Location"].str.split(pat=";", expand=True) -------------------------------------------------------------------------------- /solutions/05_18.py: -------------------------------------------------------------------------------- 1 | df = df.drop("Location", axis=1) -------------------------------------------------------------------------------- /solutions/05_19.py: -------------------------------------------------------------------------------- 1 | df["country"].nunique() -------------------------------------------------------------------------------- /solutions/05_20.py: -------------------------------------------------------------------------------- 1 | df["country"].value_counts() -------------------------------------------------------------------------------- /solutions/05_21.py: -------------------------------------------------------------------------------- 1 | df["country"] = df["country"].str.strip() 2 | df["city"] = df["city"].str.strip() -------------------------------------------------------------------------------- /solutions/05_22.py: -------------------------------------------------------------------------------- 1 | df["country"].nunique() 
-------------------------------------------------------------------------------- /solutions/05_23.py: -------------------------------------------------------------------------------- 1 | df[df["country"] == "Portugal"] -------------------------------------------------------------------------------- /solutions/05_24.py: -------------------------------------------------------------------------------- 1 | df["city"] = df["city"].str.lower() -------------------------------------------------------------------------------- /solutions/05_25.py: -------------------------------------------------------------------------------- 1 | df["city"][df["city"].str.contains("/")] -------------------------------------------------------------------------------- /solutions/05_26.py: -------------------------------------------------------------------------------- 1 | df["city"] = df["city"].str.replace(r"/\w*", "", regex=True) -------------------------------------------------------------------------------- /solutions/05_27.py: -------------------------------------------------------------------------------- 1 | dict_codes = { 2 | "BG": "Bulgaria", 3 | "CZ": "Czech Republic", 4 | "IT": "Italy", 5 | "GR": "Greece", 6 | "SI": "Slovenia", 7 | "UK": "United Kingdom", 8 | } 9 | 10 | country_in_codes = df["country"].isin(dict_codes.keys()) 11 | df.loc[country_in_codes, "country"] = df.loc[country_in_codes, "country"].map(dict_codes) -------------------------------------------------------------------------------- /solutions/05_28.py: -------------------------------------------------------------------------------- 1 | df.loc[df["city"] == "unknown", "country"] -------------------------------------------------------------------------------- /solutions/05_29.py: -------------------------------------------------------------------------------- 1 | dict_capitals = { 2 | "Denmark": "copenhague", 3 | "France": "paris", 4 | "Italy": "rome", 5 | "Spain": "madrid", 6 | "United Kingdom": "london", 7 | } 8 | 
9 | unknown_city = df["city"] == "unknown" 10 | df.loc[unknown_city, "city"] = df.loc[unknown_city, "country"].map(dict_capitals) -------------------------------------------------------------------------------- /solutions/05_30.py: -------------------------------------------------------------------------------- 1 | set(df["city"]) - dict_cities.keys() -------------------------------------------------------------------------------- /solutions/05_31.py: -------------------------------------------------------------------------------- 1 | dict_cities.update( 2 | { 3 | "bristol": "United Kingdom", 4 | "gothenburg": "Sweden", 5 | "graz": "Austria", 6 | "lyon": "France", 7 | "murcia": "Spain", 8 | "parma": "Italy", 9 | }, 10 | ) -------------------------------------------------------------------------------- /solutions/05_32.py: -------------------------------------------------------------------------------- 1 | null_country = df["country"].isnull() 2 | df.loc[null_country, "country"] = df.loc[null_country, "city"].map(dict_cities) -------------------------------------------------------------------------------- /solutions/05_33.py: -------------------------------------------------------------------------------- 1 | df["country"].value_counts(dropna=False) -------------------------------------------------------------------------------- /solutions/05_34.py: -------------------------------------------------------------------------------- 1 | def f(x): 2 | if x == 1: 3 | return "single" 4 | else: 5 | return "multiple" -------------------------------------------------------------------------------- /solutions/05_35.py: -------------------------------------------------------------------------------- 1 | df["Attendees"].apply(f) -------------------------------------------------------------------------------- /solutions/05_36.py: -------------------------------------------------------------------------------- 1 | languages = pd.read_csv("../data/food_training/languages.csv") 
-------------------------------------------------------------------------------- /solutions/05_37.py: -------------------------------------------------------------------------------- 1 | df = df.merge(languages, how="left", left_on="country", right_on="Country") -------------------------------------------------------------------------------- /solutions/05_38.py: -------------------------------------------------------------------------------- 1 | df = df.drop("Country", axis=1) 2 | 3 | # N.B. You can only run this cell once! If you try to run it again, it will throw an error! 4 | # Why? Because if you drop the Country column, it will be removed... so you can't 5 | # drop it a second time, as the column isn't there to drop! -------------------------------------------------------------------------------- /solutions/05_39.py: -------------------------------------------------------------------------------- 1 | df["DateFrom"].dtype -------------------------------------------------------------------------------- /solutions/05_40.py: -------------------------------------------------------------------------------- 1 | df["DateFrom"] = pd.to_datetime(df["DateFrom"], format="%Y-%m-%d") 2 | df["DateTo"] = pd.to_datetime(df["DateTo"], format="%Y-%m-%d") -------------------------------------------------------------------------------- /solutions/05_41.py: -------------------------------------------------------------------------------- 1 | df[df["DateFrom"] > "2017-02-01"] -------------------------------------------------------------------------------- /solutions/05_42.py: -------------------------------------------------------------------------------- 1 | df["duration"] = df["DateTo"] - df["DateFrom"] + datetime.timedelta(days=1) -------------------------------------------------------------------------------- /solutions/05_43.py: -------------------------------------------------------------------------------- 1 | df["month"] = df["DateFrom"].dt.month 2 | df["month"].hist()
-------------------------------------------------------------------------------- /solutions/05_44.py: -------------------------------------------------------------------------------- 1 | df.sort_values("city") -------------------------------------------------------------------------------- /solutions/05_45.py: -------------------------------------------------------------------------------- 1 | df.sort_values(["duration", "Attendees"], ascending=[True, False]) -------------------------------------------------------------------------------- /solutions/05_46.py: -------------------------------------------------------------------------------- 1 | df_gr = df.groupby("city") -------------------------------------------------------------------------------- /solutions/05_47.py: -------------------------------------------------------------------------------- 1 | df_gr["Attendees"].mean() --------------------------------------------------------------------------------