├── .ipynb_checkpoints └── Pandas Tutorial-checkpoint.ipynb ├── DataStructures.png ├── Pandas Tutorial.ipynb ├── README.md ├── RegularSeasonCompactResults.csv └── result.csv /.ipynb_checkpoints/Pandas Tutorial-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": { 7 | "collapsed": true 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "import pandas as pd" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "# Introduction" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "Since I've been working on a lot of Kaggle competitions, I use Pandas a lot. As you may know, Pandas (in addition to Numpy) is the go-to Python library for all your data science needs. It helps with dealing with input data in CSV formats and with transforming your data into a form where it can be inputted into ML models. However, getting comfortable with the ideas of dataframes, slicing, etc was very tough for me in the beginning. Hopefully, this short tutorial can show you a lot of different commands that will help you gain the most insights into your dataset. " 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "# Loading in Data" 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": {}, 38 | "source": [ 39 | "The first step in any ML problem is identifying what format your data is in, and then loading it into whatever framework you're using. For Kaggle competitions, a lot of data can be found in CSV files, so that's the example we're going to use. " 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "Since I'm a huge sports fan, we're going to be looking at a sports dataset that shows the results from NCAA basketball games from 1985 to 2016. 
This dataset is in a CSV file, and the function we're going to use to read in the file is called **pd.read_csv()**. This function returns a **dataframe** variable. The dataframe is the golden jewel data structure for Pandas. It is defined as \"a two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns)\"." 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": {}, 52 | "source": [ 53 | "Just think of it as a table for now. We'll explain more about what makes it unique later on. " 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": 2, 59 | "metadata": { 60 | "collapsed": false 61 | }, 62 | "outputs": [], 63 | "source": [ 64 | "df = pd.read_csv('RegularSeasonCompactResults.csv')" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "Now that we have our dataframe in our variable df, let's look at what it contains. We can use the function **head()** to see the first couple rows of the dataframe (or the function **tail()** to see the last few rows)." 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": 3, 77 | "metadata": { 78 | "collapsed": false 79 | }, 80 | "outputs": [ 81 | { 82 | "data": { 83 | "text/html": [ 84 | "

\n", 85 | "\n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | "

	Season	Daynum	Wteam	Wscore	Lteam	Lscore	Wloc	Numot
0	1985	20	1228	81	1328	64	N	0
1	1985	25	1106	77	1354	70	H	0
2	1985	25	1112	63	1223	56	H	0
3	1985	25	1165	70	1432	54	H	0
4	1985	25	1192	86	1447	74	H	0
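As a quick aside, here is a minimal sketch of `head()` and `tail()` on a tiny hand-built dataframe, so it runs without the CSV file. The three rows are copied from the output above; the name `sample` is just an illustrative assumption, not something from the original notebook.

```python
import pandas as pd

# Three rows copied from the head() output above; `sample` is a hypothetical name.
sample = pd.DataFrame({
    "Season": [1985, 1985, 1985],
    "Daynum": [20, 25, 25],
    "Wteam": [1228, 1106, 1112],
    "Wscore": [81, 77, 63],
    "Lteam": [1328, 1354, 1223],
    "Lscore": [64, 70, 56],
    "Wloc": ["N", "H", "H"],
    "Numot": [0, 0, 0],
})

first_two = sample.head(2)  # first n rows (n defaults to 5 when omitted)
last_one = sample.tail(1)   # last n rows

print(first_two)
print(last_one)
```

Passing an explicit row count is handy when the default of five rows is more (or less) than you want to eyeball.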

\n", 157 | "

" 158 | ], 159 | "text/plain": [ 160 | " Season Daynum Wteam Wscore Lteam Lscore Wloc Numot\n", 161 | "0 1985 20 1228 81 1328 64 N 0\n", 162 | "1 1985 25 1106 77 1354 70 H 0\n", 163 | "2 1985 25 1112 63 1223 56 H 0\n", 164 | "3 1985 25 1165 70 1432 54 H 0\n", 165 | "4 1985 25 1192 86 1447 74 H 0" 166 | ] 167 | }, 168 | "execution_count": 3, 169 | "metadata": {}, 170 | "output_type": "execute_result" 171 | } 172 | ], 173 | "source": [ 174 | "df.head()" 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": {}, 180 | "source": [ 181 | "We can see the dimensions of the dataframe using the **shape** attribute" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": 4, 187 | "metadata": { 188 | "collapsed": false 189 | }, 190 | "outputs": [ 191 | { 192 | "data": { 193 | "text/plain": [ 194 | "(145289, 8)" 195 | ] 196 | }, 197 | "execution_count": 4, 198 | "metadata": {}, 199 | "output_type": "execute_result" 200 | } 201 | ], 202 | "source": [ 203 | "df.shape" 204 | ] 205 | }, 206 | { 207 | "cell_type": "markdown", 208 | "metadata": { 209 | "collapsed": true 210 | }, 211 | "source": [ 212 | "We can also extract all the columns as a list, by using the **columns** attribute" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": 6, 218 | "metadata": { 219 | "collapsed": false 220 | }, 221 | "outputs": [ 222 | { 223 | "data": { 224 | "text/plain": [ 225 | "['Season', 'Daynum', 'Wteam', 'Wscore', 'Lteam', 'Lscore', 'Wloc', 'Numot']" 226 | ] 227 | }, 228 | "execution_count": 6, 229 | "metadata": {}, 230 | "output_type": "execute_result" 231 | } 232 | ], 233 | "source": [ 234 | "df.columns.tolist()" 235 | ] 236 | }, 237 | { 238 | "cell_type": "markdown", 239 | "metadata": {}, 240 | "source": [ 241 | "In order to get a better idea of the type of data that we are dealing with, we can call the **describe()** function to see statistics like mean, min, etc about each column of the dataset. 
" 242 | ] 243 | }, 244 | { 245 | "cell_type": "code", 246 | "execution_count": 10, 247 | "metadata": { 248 | "collapsed": false 249 | }, 250 | "outputs": [ 251 | { 252 | "data": { 253 | "text/html": [ 254 | "

\n", 255 | "\n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | " \n", 280 | " \n", 281 | " \n", 282 | " \n", 283 | " \n", 284 | " \n", 285 | " \n", 286 | " \n", 287 | " \n", 288 | " \n", 289 | " \n", 290 | " \n", 291 | " \n", 292 | " \n", 293 | " \n", 294 | " \n", 295 | " \n", 296 | " \n", 297 | " \n", 298 | " \n", 299 | " \n", 300 | " \n", 301 | " \n", 302 | " \n", 303 | " \n", 304 | " \n", 305 | " \n", 306 | " \n", 307 | " \n", 308 | " \n", 309 | " \n", 310 | " \n", 311 | " \n", 312 | " \n", 313 | " \n", 314 | " \n", 315 | " \n", 316 | " \n", 317 | " \n", 318 | " \n", 319 | " \n", 320 | " \n", 321 | " \n", 322 | " \n", 323 | " \n", 324 | " \n", 325 | " \n", 326 | " \n", 327 | " \n", 328 | " \n", 329 | " \n", 330 | " \n", 331 | " \n", 332 | " \n", 333 | " \n", 334 | " \n", 335 | " \n", 336 | " \n", 337 | " \n", 338 | " \n", 339 | " \n", 340 | " \n", 341 | " \n", 342 | " \n", 343 | " \n", 344 | " \n", 345 | " \n", 346 | " \n", 347 | " \n", 348 | " \n", 349 | " \n", 350 | "

	Season	Daynum	Wteam	Wscore	Lteam	Lscore	Numot
count	145289.000000	145289.000000	145289.000000	145289.000000	145289.000000	145289.000000	145289.000000
mean	2001.574834	75.223816	1286.720646	76.600321	1282.864064	64.497009	0.044387
std	9.233342	33.287418	104.570275	12.173033	104.829234	11.380625	0.247819
min	1985.000000	0.000000	1101.000000	34.000000	1101.000000	20.000000	0.000000
25%	1994.000000	47.000000	1198.000000	68.000000	1191.000000	57.000000	0.000000
50%	2002.000000	78.000000	1284.000000	76.000000	1280.000000	64.000000	0.000000
75%	2010.000000	103.000000	1379.000000	84.000000	1375.000000	72.000000	0.000000
max	2016.000000	132.000000	1464.000000	186.000000	1464.000000	150.000000	6.000000

\n", 351 | "

" 352 | ], 353 | "text/plain": [ 354 | " Season Daynum Wteam Wscore \\\n", 355 | "count 145289.000000 145289.000000 145289.000000 145289.000000 \n", 356 | "mean 2001.574834 75.223816 1286.720646 76.600321 \n", 357 | "std 9.233342 33.287418 104.570275 12.173033 \n", 358 | "min 1985.000000 0.000000 1101.000000 34.000000 \n", 359 | "25% 1994.000000 47.000000 1198.000000 68.000000 \n", 360 | "50% 2002.000000 78.000000 1284.000000 76.000000 \n", 361 | "75% 2010.000000 103.000000 1379.000000 84.000000 \n", 362 | "max 2016.000000 132.000000 1464.000000 186.000000 \n", 363 | "\n", 364 | " Lteam Lscore Numot \n", 365 | "count 145289.000000 145289.000000 145289.000000 \n", 366 | "mean 1282.864064 64.497009 0.044387 \n", 367 | "std 104.829234 11.380625 0.247819 \n", 368 | "min 1101.000000 20.000000 0.000000 \n", 369 | "25% 1191.000000 57.000000 0.000000 \n", 370 | "50% 1280.000000 64.000000 0.000000 \n", 371 | "75% 1375.000000 72.000000 0.000000 \n", 372 | "max 1464.000000 150.000000 6.000000 " 373 | ] 374 | }, 375 | "execution_count": 10, 376 | "metadata": {}, 377 | "output_type": "execute_result" 378 | } 379 | ], 380 | "source": [ 381 | "df.describe()" 382 | ] 383 | }, 384 | { 385 | "cell_type": "markdown", 386 | "metadata": {}, 387 | "source": [ 388 | "Okay, so now let's look at information that we want to extract from the dataframe. Let's say I wanted to know the max value of a certain column. 
The function **max()** will show you the maximum values of all columns" 389 | ] 390 | }, 391 | { 392 | "cell_type": "code", 393 | "execution_count": 22, 394 | "metadata": { 395 | "collapsed": false 396 | }, 397 | "outputs": [ 398 | { 399 | "data": { 400 | "text/plain": [ 401 | "Season 2016\n", 402 | "Daynum 132\n", 403 | "Wteam 1464\n", 404 | "Wscore 186\n", 405 | "Lteam 1464\n", 406 | "Lscore 150\n", 407 | "Wloc N\n", 408 | "Numot 6\n", 409 | "dtype: object" 410 | ] 411 | }, 412 | "execution_count": 22, 413 | "metadata": {}, 414 | "output_type": "execute_result" 415 | } 416 | ], 417 | "source": [ 418 | "df.max()" 419 | ] 420 | }, 421 | { 422 | "cell_type": "markdown", 423 | "metadata": {}, 424 | "source": [ 425 | "Then, if you'd like to specifically get the max value for a particular column, you pass in the name of the column using the bracket indexing operator" 426 | ] 427 | }, 428 | { 429 | "cell_type": "code", 430 | "execution_count": 24, 431 | "metadata": { 432 | "collapsed": false 433 | }, 434 | "outputs": [ 435 | { 436 | "data": { 437 | "text/plain": [ 438 | "186" 439 | ] 440 | }, 441 | "execution_count": 24, 442 | "metadata": {}, 443 | "output_type": "execute_result" 444 | } 445 | ], 446 | "source": [ 447 | "df['Wscore'].max()" 448 | ] 449 | }, 450 | { 451 | "cell_type": "markdown", 452 | "metadata": {}, 453 | "source": [ 454 | "But what if that's not enough? Let's say we want to actually see the game(row) where this max score happened. 
We can call the **argmax()** function to identify the row index" 455 | ] 456 | }, 457 | { 458 | "cell_type": "code", 459 | "execution_count": 36, 460 | "metadata": { 461 | "collapsed": false 462 | }, 463 | "outputs": [ 464 | { 465 | "data": { 466 | "text/plain": [ 467 | "24970" 468 | ] 469 | }, 470 | "execution_count": 36, 471 | "metadata": {}, 472 | "output_type": "execute_result" 473 | } 474 | ], 475 | "source": [ 476 | "df['Wscore'].argmax()" 477 | ] 478 | }, 479 | { 480 | "cell_type": "markdown", 481 | "metadata": {}, 482 | "source": [ 483 | "Then, in order to get attributes about the game, we need to use the **iloc[]** function. Iloc is definitely one of the more important functions. The main idea is that you want to use it whenever you have the integer index of a certain row that you want to access. As per Pandas documentation, iloc is an \"integer-location based indexing for selection by position.\"" 484 | ] 485 | }, 486 | { 487 | "cell_type": "code", 488 | "execution_count": 35, 489 | "metadata": { 490 | "collapsed": false 491 | }, 492 | "outputs": [ 493 | { 494 | "data": { 495 | "text/html": [ 496 | "

\n", 497 | "\n", 498 | " \n", 499 | " \n", 500 | " \n", 501 | " \n", 502 | " \n", 503 | " \n", 504 | " \n", 505 | " \n", 506 | " \n", 507 | " \n", 508 | " \n", 509 | " \n", 510 | " \n", 511 | " \n", 512 | " \n", 513 | " \n", 514 | " \n", 515 | " \n", 516 | " \n", 517 | " \n", 518 | " \n", 519 | " \n", 520 | " \n", 521 | " \n", 522 | " \n", 523 | " \n", 524 | "

	Season	Daynum	Wteam	Wscore	Lteam	Lscore	Wloc	Numot
24970	1991	68	1258	186	1109	140	H	0
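One caveat worth sketching: this notebook was run on Python 2.7 with an older Pandas. In recent Pandas releases (1.0 and later, as far as I know), `argmax()` on a Series returns the integer *position* of the maximum, while `idxmax()` returns its index *label*. The two only coincide when the index is the default 0, 1, 2, ... range, as it is for `df` here. The small frame below uses three rows taken from the outputs in this notebook; `games` is a hypothetical name.

```python
import pandas as pd

# Three games from this dataset, keyed by their original row labels.
games = pd.DataFrame(
    {"Wscore": [151, 186, 152], "Lscore": [107, 140, 84]},
    index=[5269, 24970, 12046],
)

pos = games["Wscore"].argmax()    # integer position of the max
label = games["Wscore"].idxmax()  # index label of the max

row_by_pos = games.iloc[pos]      # iloc: select by integer position
row_by_label = games.loc[label]   # loc: select by index label

# Pull a single cell directly instead of chaining ['Lscore'].max()
losing_score = games.loc[label, "Lscore"]
print(pos, label, losing_score)
```

Pairing `argmax()` with `iloc` and `idxmax()` with `loc` keeps the lookup correct even after filtering or sorting has changed the index.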

\n", 525 | "

" 526 | ], 527 | "text/plain": [ 528 | " Season Daynum Wteam Wscore Lteam Lscore Wloc Numot\n", 529 | "24970 1991 68 1258 186 1109 140 H 0" 530 | ] 531 | }, 532 | "execution_count": 35, 533 | "metadata": {}, 534 | "output_type": "execute_result" 535 | } 536 | ], 537 | "source": [ 538 | "df.iloc[[df['Wscore'].argmax()]]" 539 | ] 540 | }, 541 | { 542 | "cell_type": "markdown", 543 | "metadata": {}, 544 | "source": [ 545 | "Let's take this a step further. Let's say you want to know the game with the highest scoring winning team (this is what we just calculated), but you then want to know how many points the losing team scored. " 546 | ] 547 | }, 548 | { 549 | "cell_type": "code", 550 | "execution_count": 38, 551 | "metadata": { 552 | "collapsed": false 553 | }, 554 | "outputs": [ 555 | { 556 | "data": { 557 | "text/plain": [ 558 | "140" 559 | ] 560 | }, 561 | "execution_count": 38, 562 | "metadata": {}, 563 | "output_type": "execute_result" 564 | } 565 | ], 566 | "source": [ 567 | "df.iloc[[df['Wscore'].argmax()]]['Lscore'].max()" 568 | ] 569 | }, 570 | { 571 | "cell_type": "markdown", 572 | "metadata": {}, 573 | "source": [ 574 | "The bracket indexing operator is the best way to extract certain columns from a dataframe." 575 | ] 576 | }, 577 | { 578 | "cell_type": "code", 579 | "execution_count": 27, 580 | "metadata": { 581 | "collapsed": false, 582 | "scrolled": true 583 | }, 584 | "outputs": [ 585 | { 586 | "data": { 587 | "text/html": [ 588 | "

\n", 589 | "\n", 590 | " \n", 591 | " \n", 592 | " \n", 593 | " \n", 594 | " \n", 595 | " \n", 596 | " \n", 597 | " \n", 598 | " \n", 599 | " \n", 600 | " \n", 601 | " \n", 602 | " \n", 603 | " \n", 604 | " \n", 605 | " \n", 606 | " \n", 607 | " \n", 608 | " \n", 609 | " \n", 610 | " \n", 611 | " \n", 612 | " \n", 613 | " \n", 614 | " \n", 615 | " \n", 616 | " \n", 617 | " \n", 618 | " \n", 619 | " \n", 620 | " \n", 621 | " \n", 622 | " \n", 623 | " \n", 624 | " \n", 625 | " \n", 626 | " \n", 627 | " \n", 628 | " \n", 629 | " \n", 630 | " \n", 631 | " \n", 632 | " \n", 633 | " \n", 634 | " \n", 635 | " \n", 636 | " \n", 637 | " \n", 638 | " \n", 639 | " \n", 640 | " \n", 641 | " \n", 642 | " \n", 643 | " \n", 644 | " \n", 645 | " \n", 646 | " \n", 647 | " \n", 648 | " \n", 649 | " \n", 650 | " \n", 651 | " \n", 652 | " \n", 653 | " \n", 654 | " \n", 655 | " \n", 656 | " \n", 657 | " \n", 658 | " \n", 659 | " \n", 660 | " \n", 661 | " \n", 662 | " \n", 663 | " \n", 664 | " \n", 665 | " \n", 666 | " \n", 667 | " \n", 668 | " \n", 669 | " \n", 670 | " \n", 671 | " \n", 672 | " \n", 673 | " \n", 674 | " \n", 675 | " \n", 676 | " \n", 677 | " \n", 678 | " \n", 679 | " \n", 680 | " \n", 681 | " \n", 682 | " \n", 683 | " \n", 684 | " \n", 685 | " \n", 686 | " \n", 687 | " \n", 688 | " \n", 689 | " \n", 690 | " \n", 691 | " \n", 692 | " \n", 693 | " \n", 694 | " \n", 695 | " \n", 696 | " \n", 697 | " \n", 698 | " \n", 699 | " \n", 700 | " \n", 701 | " \n", 702 | " \n", 703 | " \n", 704 | " \n", 705 | " \n", 706 | " \n", 707 | " \n", 708 | " \n", 709 | " \n", 710 | " \n", 711 | " \n", 712 | " \n", 713 | " \n", 714 | " \n", 715 | " \n", 716 | " \n", 717 | " \n", 718 | " \n", 719 | " \n", 720 | " \n", 721 | " \n", 722 | " \n", 723 | " \n", 724 | " \n", 725 | " \n", 726 | " \n", 727 | " \n", 728 | " \n", 729 | " \n", 730 | " \n", 731 | " \n", 732 | " \n", 733 | " \n", 734 | " \n", 735 | " \n", 736 | " \n", 737 | " \n", 738 | " \n", 739 | " \n", 740 | " \n", 741 | " \n", 742 | 
" \n", 743 | " \n", 744 | " \n", 745 | " \n", 746 | " \n", 747 | " \n", 748 | " \n", 749 | " \n", 750 | " \n", 751 | " \n", 752 | " \n", 753 | " \n", 754 | " \n", 755 | " \n", 756 | " \n", 757 | " \n", 758 | " \n", 759 | " \n", 760 | " \n", 761 | " \n", 762 | " \n", 763 | " \n", 764 | " \n", 765 | " \n", 766 | " \n", 767 | " \n", 768 | " \n", 769 | " \n", 770 | " \n", 771 | " \n", 772 | " \n", 773 | " \n", 774 | " \n", 775 | " \n", 776 | " \n", 777 | " \n", 778 | " \n", 779 | " \n", 780 | " \n", 781 | " \n", 782 | " \n", 783 | " \n", 784 | " \n", 785 | " \n", 786 | " \n", 787 | " \n", 788 | " \n", 789 | " \n", 790 | " \n", 791 | " \n", 792 | " \n", 793 | " \n", 794 | " \n", 795 | " \n", 796 | " \n", 797 | " \n", 798 | " \n", 799 | " \n", 800 | " \n", 801 | " \n", 802 | " \n", 803 | " \n", 804 | " \n", 805 | " \n", 806 | " \n", 807 | " \n", 808 | " \n", 809 | " \n", 810 | " \n", 811 | " \n", 812 | " \n", 813 | " \n", 814 | " \n", 815 | " \n", 816 | " \n", 817 | " \n", 818 | " \n", 819 | " \n", 820 | " \n", 821 | " \n", 822 | " \n", 823 | " \n", 824 | " \n", 825 | " \n", 826 | " \n", 827 | " \n", 828 | " \n", 829 | " \n", 830 | " \n", 831 | " \n", 832 | " \n", 833 | " \n", 834 | " \n", 835 | " \n", 836 | " \n", 837 | " \n", 838 | " \n", 839 | " \n", 840 | " \n", 841 | " \n", 842 | " \n", 843 | " \n", 844 | " \n", 845 | " \n", 846 | " \n", 847 | " \n", 848 | " \n", 849 | " \n", 850 | " \n", 851 | " \n", 852 | " \n", 853 | " \n", 854 | " \n", 855 | " \n", 856 | " \n", 857 | " \n", 858 | " \n", 859 | " \n", 860 | " \n", 861 | " \n", 862 | " \n", 863 | " \n", 864 | " \n", 865 | " \n", 866 | " \n", 867 | " \n", 868 | " \n", 869 | " \n", 870 | " \n", 871 | " \n", 872 | " \n", 873 | " \n", 874 | " \n", 875 | " \n", 876 | " \n", 877 | " \n", 878 | " \n", 879 | " \n", 880 | " \n", 881 | " \n", 882 | " \n", 883 | " \n", 884 | " \n", 885 | " \n", 886 | " \n", 887 | " \n", 888 | " \n", 889 | " \n", 890 | " \n", 891 | " \n", 892 | " \n", 893 | " \n", 894 | " \n", 895 | " \n", 896 
| " \n", 897 | " \n", 898 | " \n", 899 | " \n", 900 | " \n", 901 | " \n", 902 | " \n", 903 | " \n", 904 | "

	Wscore	Lscore
0	81	64
1	77	70
2	63	56
3	70	54
4	86	74
5	79	78
6	64	44
7	58	56
8	98	80
9	97	89
10	103	71
11	75	71
12	91	72
13	70	65
14	87	58
15	65	62
16	92	50
17	65	60
18	58	53
19	50	48
20	47	40
21	55	52
22	76	56
23	59	58
24	79	76
25	106	55
26	95	77
27	79	66
28	64	59
29	76	47
...	...	...
145259	69	67
145260	72	65
145261	64	61
145262	77	62
145263	57	54
145264	68	63
145265	81	69
145266	64	60
145267	81	71
145268	93	80
145269	74	54
145270	64	61
145271	55	53
145272	61	57
145273	88	57
145274	76	59
145275	69	67
145276	82	60
145277	54	53
145278	82	79
145279	80	74
145280	71	38
145281	82	71
145282	76	54
145283	62	59
145284	70	50
145285	72	58
145286	82	77
145287	66	62
145288	87	74

\n", 905 | "

145289 rows × 2 columns
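A small sketch of the difference between single and double brackets, under the assumption of a tiny stand-in frame (`sample` is a hypothetical name; the values are the first three rows shown above). Single brackets with one column name give back a Series, while a list of names gives back a dataframe.

```python
import pandas as pd

# Stand-in data: first three rows of the Wscore/Lscore output above.
sample = pd.DataFrame({
    "Wscore": [81, 77, 63],
    "Lscore": [64, 70, 56],
    "Wloc": ["N", "H", "H"],
})

one_col = sample["Wscore"]               # single name -> Series
two_cols = sample[["Wscore", "Lscore"]]  # list of names -> DataFrame

# Columns support element-wise arithmetic, e.g. the margin of victory.
margin = two_cols["Wscore"] - two_cols["Lscore"]
print(type(one_col).__name__, type(two_cols).__name__)
print(margin.tolist())
```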

\n", 906 | "

" 907 | ], 908 | "text/plain": [ 909 | " Wscore Lscore\n", 910 | "0 81 64\n", 911 | "1 77 70\n", 912 | "2 63 56\n", 913 | "3 70 54\n", 914 | "4 86 74\n", 915 | "5 79 78\n", 916 | "6 64 44\n", 917 | "7 58 56\n", 918 | "8 98 80\n", 919 | "9 97 89\n", 920 | "10 103 71\n", 921 | "11 75 71\n", 922 | "12 91 72\n", 923 | "13 70 65\n", 924 | "14 87 58\n", 925 | "15 65 62\n", 926 | "16 92 50\n", 927 | "17 65 60\n", 928 | "18 58 53\n", 929 | "19 50 48\n", 930 | "20 47 40\n", 931 | "21 55 52\n", 932 | "22 76 56\n", 933 | "23 59 58\n", 934 | "24 79 76\n", 935 | "25 106 55\n", 936 | "26 95 77\n", 937 | "27 79 66\n", 938 | "28 64 59\n", 939 | "29 76 47\n", 940 | "... ... ...\n", 941 | "145259 69 67\n", 942 | "145260 72 65\n", 943 | "145261 64 61\n", 944 | "145262 77 62\n", 945 | "145263 57 54\n", 946 | "145264 68 63\n", 947 | "145265 81 69\n", 948 | "145266 64 60\n", 949 | "145267 81 71\n", 950 | "145268 93 80\n", 951 | "145269 74 54\n", 952 | "145270 64 61\n", 953 | "145271 55 53\n", 954 | "145272 61 57\n", 955 | "145273 88 57\n", 956 | "145274 76 59\n", 957 | "145275 69 67\n", 958 | "145276 82 60\n", 959 | "145277 54 53\n", 960 | "145278 82 79\n", 961 | "145279 80 74\n", 962 | "145280 71 38\n", 963 | "145281 82 71\n", 964 | "145282 76 54\n", 965 | "145283 62 59\n", 966 | "145284 70 50\n", 967 | "145285 72 58\n", 968 | "145286 82 77\n", 969 | "145287 66 62\n", 970 | "145288 87 74\n", 971 | "\n", 972 | "[145289 rows x 2 columns]" 973 | ] 974 | }, 975 | "execution_count": 27, 976 | "metadata": {}, 977 | "output_type": "execute_result" 978 | } 979 | ], 980 | "source": [ 981 | "df[['Wscore', 'Lscore']]" 982 | ] 983 | }, 984 | { 985 | "cell_type": "markdown", 986 | "metadata": {}, 987 | "source": [ 988 | "Now, let's say we want to find all of the rows that satisy a particular condition. For example, I want to find all of the games where the winning team scored more than 150 points. 
The idea behind this command is you want to access the column 'Wscore' of the dataframe df (df['Wscore']), find which entries are above 150 (df['Wscore'] > 150), and then return the results in a dataframe (df[df['Wscore'] > 150])." 989 | ] 990 | }, 991 | { 992 | "cell_type": "code", 993 | "execution_count": 33, 994 | "metadata": { 995 | "collapsed": false 996 | }, 997 | "outputs": [ 998 | { 999 | "data": { 1000 | "text/html": [ 1001 | "

\n", 1002 | "\n", 1003 | " \n", 1004 | " \n", 1005 | " \n", 1006 | " \n", 1007 | " \n", 1008 | " \n", 1009 | " \n", 1010 | " \n", 1011 | " \n", 1012 | " \n", 1013 | " \n", 1014 | " \n", 1015 | " \n", 1016 | " \n", 1017 | " \n", 1018 | " \n", 1019 | " \n", 1020 | " \n", 1021 | " \n", 1022 | " \n", 1023 | " \n", 1024 | " \n", 1025 | " \n", 1026 | " \n", 1027 | " \n", 1028 | " \n", 1029 | " \n", 1030 | " \n", 1031 | " \n", 1032 | " \n", 1033 | " \n", 1034 | " \n", 1035 | " \n", 1036 | " \n", 1037 | " \n", 1038 | " \n", 1039 | " \n", 1040 | " \n", 1041 | " \n", 1042 | " \n", 1043 | " \n", 1044 | " \n", 1045 | " \n", 1046 | " \n", 1047 | " \n", 1048 | " \n", 1049 | " \n", 1050 | " \n", 1051 | " \n", 1052 | " \n", 1053 | " \n", 1054 | " \n", 1055 | " \n", 1056 | " \n", 1057 | " \n", 1058 | " \n", 1059 | " \n", 1060 | " \n", 1061 | " \n", 1062 | " \n", 1063 | " \n", 1064 | " \n", 1065 | " \n", 1066 | " \n", 1067 | " \n", 1068 | " \n", 1069 | " \n", 1070 | " \n", 1071 | " \n", 1072 | " \n", 1073 | " \n", 1074 | " \n", 1075 | " \n", 1076 | " \n", 1077 | " \n", 1078 | " \n", 1079 | " \n", 1080 | " \n", 1081 | " \n", 1082 | " \n", 1083 | " \n", 1084 | " \n", 1085 | " \n", 1086 | " \n", 1087 | " \n", 1088 | " \n", 1089 | " \n", 1090 | " \n", 1091 | " \n", 1092 | " \n", 1093 | " \n", 1094 | " \n", 1095 | " \n", 1096 | " \n", 1097 | " \n", 1098 | " \n", 1099 | " \n", 1100 | " \n", 1101 | " \n", 1102 | " \n", 1103 | " \n", 1104 | " \n", 1105 | " \n", 1106 | " \n", 1107 | " \n", 1108 | " \n", 1109 | " \n", 1110 | " \n", 1111 | " \n", 1112 | " \n", 1113 | " \n", 1114 | " \n", 1115 | " \n", 1116 | " \n", 1117 | " \n", 1118 | " \n", 1119 | " \n", 1120 | " \n", 1121 | " \n", 1122 | " \n", 1123 | " \n", 1124 | " \n", 1125 | " \n", 1126 | " \n", 1127 | " \n", 1128 | " \n", 1129 | " \n", 1130 | " \n", 1131 | " \n", 1132 | " \n", 1133 | " \n", 1134 | " \n", 1135 | " \n", 1136 | " \n", 1137 | " \n", 1138 | " \n", 1139 | " \n", 1140 | " \n", 1141 | " \n", 1142 | " \n", 1143 | " \n", 1144 | 
" \n", 1145 | " \n", 1146 | " \n", 1147 | " \n", 1148 | " \n", 1149 | " \n", 1150 | " \n", 1151 | " \n", 1152 | " \n", 1153 | " \n", 1154 | " \n", 1155 | " \n", 1156 | " \n", 1157 | " \n", 1158 | " \n", 1159 | " \n", 1160 | " \n", 1161 | " \n", 1162 | " \n", 1163 | " \n", 1164 | " \n", 1165 | " \n", 1166 | " \n", 1167 | " \n", 1168 | " \n", 1169 | " \n", 1170 | " \n", 1171 | " \n", 1172 | " \n", 1173 | " \n", 1174 | " \n", 1175 | " \n", 1176 | " \n", 1177 | " \n", 1178 | " \n", 1179 | " \n", 1180 | " \n", 1181 | " \n", 1182 | " \n", 1183 | " \n", 1184 | " \n", 1185 | " \n", 1186 | " \n", 1187 | " \n", 1188 | " \n", 1189 | " \n", 1190 | " \n", 1191 | " \n", 1192 | " \n", 1193 | " \n", 1194 | " \n", 1195 | " \n", 1196 | " \n", 1197 | " \n", 1198 | " \n", 1199 | " \n", 1200 | " \n", 1201 | " \n", 1202 | " \n", 1203 | " \n", 1204 | " \n", 1205 | " \n", 1206 | " \n", 1207 | " \n", 1208 | " \n", 1209 | " \n", 1210 | " \n", 1211 | " \n", 1212 | " \n", 1213 | " \n", 1214 | " \n", 1215 | " \n", 1216 | "

	Season	Daynum	Wteam	Wscore	Lteam	Lscore	Wloc	Numot
5269	1986	75	1258	151	1109	107	H	0
12046	1988	40	1328	152	1147	84	H	0
12355	1988	52	1328	151	1173	99	N	0
16040	1989	40	1328	152	1331	122	H	0
16853	1989	68	1258	162	1109	144	A	0
17867	1989	92	1258	181	1109	150	H	0
19653	1990	30	1328	173	1109	101	H	0
19971	1990	38	1258	152	1109	137	A	0
20022	1990	40	1116	166	1109	101	H	0
22145	1990	97	1258	157	1362	115	H	0
23582	1991	26	1318	152	1258	123	N	0
24341	1991	47	1328	172	1258	112	H	0
24970	1991	68	1258	186	1109	140	H	0
25656	1991	84	1106	151	1212	97	H	0
28687	1992	54	1261	159	1319	86	H	0
35023	1993	112	1380	155	1341	91	A	0
40060	1995	32	1375	156	1341	114	H	0
52600	1998	33	1395	153	1410	87	H	0
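To make the filtering steps concrete, here is a minimal sketch that builds the boolean mask separately before applying it, then combines two conditions. It assumes a four-row stand-in frame (`games` is a hypothetical name; the rows are taken from the result above) rather than the full CSV.

```python
import pandas as pd

# Four of the high-scoring games listed above.
games = pd.DataFrame({
    "Season": [1986, 1989, 1991, 1998],
    "Wscore": [151, 181, 186, 153],
    "Lscore": [107, 150, 140, 87],
})

mask = games["Wscore"] > 160  # boolean Series, one True/False per row
high = games[mask]            # keeps only the rows where mask is True

# Combine conditions with & (and) or | (or); each condition needs parentheses.
both = games[(games["Wscore"] > 150) & (games["Lscore"] >= 140)]

print(mask.tolist())
print(high)
print(both)
```

The parentheses matter because `&` binds more tightly than the comparison operators in Python.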

\n", 1217 | "

" 1218 | ], 1219 | "text/plain": [ 1220 | " Season Daynum Wteam Wscore Lteam Lscore Wloc Numot\n", 1221 | "5269 1986 75 1258 151 1109 107 H 0\n", 1222 | "12046 1988 40 1328 152 1147 84 H 0\n", 1223 | "12355 1988 52 1328 151 1173 99 N 0\n", 1224 | "16040 1989 40 1328 152 1331 122 H 0\n", 1225 | "16853 1989 68 1258 162 1109 144 A 0\n", 1226 | "17867 1989 92 1258 181 1109 150 H 0\n", 1227 | "19653 1990 30 1328 173 1109 101 H 0\n", 1228 | "19971 1990 38 1258 152 1109 137 A 0\n", 1229 | "20022 1990 40 1116 166 1109 101 H 0\n", 1230 | "22145 1990 97 1258 157 1362 115 H 0\n", 1231 | "23582 1991 26 1318 152 1258 123 N 0\n", 1232 | "24341 1991 47 1328 172 1258 112 H 0\n", 1233 | "24970 1991 68 1258 186 1109 140 H 0\n", 1234 | "25656 1991 84 1106 151 1212 97 H 0\n", 1235 | "28687 1992 54 1261 159 1319 86 H 0\n", 1236 | "35023 1993 112 1380 155 1341 91 A 0\n", 1237 | "40060 1995 32 1375 156 1341 114 H 0\n", 1238 | "52600 1998 33 1395 153 1410 87 H 0" 1239 | ] 1240 | }, 1241 | "execution_count": 33, 1242 | "metadata": {}, 1243 | "output_type": "execute_result" 1244 | } 1245 | ], 1246 | "source": [ 1247 | "df[df['Wscore'] > 150]" 1248 | ] 1249 | }, 1250 | { 1251 | "cell_type": "markdown", 1252 | "metadata": {}, 1253 | "source": [ 1254 | "Each dataframe has a **values** attribute which is useful because it basically displays your dataframe in an array style format" 1255 | ] 1256 | }, 1257 | { 1258 | "cell_type": "code", 1259 | "execution_count": 39, 1260 | "metadata": { 1261 | "collapsed": false 1262 | }, 1263 | "outputs": [ 1264 | { 1265 | "data": { 1266 | "text/plain": [ 1267 | "array([[1985, 20, 1228, ..., 64, 'N', 0],\n", 1268 | " [1985, 25, 1106, ..., 70, 'H', 0],\n", 1269 | " [1985, 25, 1112, ..., 56, 'H', 0],\n", 1270 | " ..., \n", 1271 | " [2016, 132, 1246, ..., 77, 'N', 1],\n", 1272 | " [2016, 132, 1277, ..., 62, 'N', 0],\n", 1273 | " [2016, 132, 1386, ..., 74, 'N', 0]], dtype=object)" 1274 | ] 1275 | }, 1276 | "execution_count": 39, 1277 | "metadata": {}, 1278 | 
"output_type": "execute_result" 1279 | } 1280 | ], 1281 | "source": [ 1282 | "df.values" 1283 | ] 1284 | }, 1285 | { 1286 | "cell_type": "markdown", 1287 | "metadata": {}, 1288 | "source": [ 1289 | "Now, you can simply access elements like you would in an array. " 1290 | ] 1291 | }, 1292 | { 1293 | "cell_type": "code", 1294 | "execution_count": 40, 1295 | "metadata": { 1296 | "collapsed": false 1297 | }, 1298 | "outputs": [ 1299 | { 1300 | "data": { 1301 | "text/plain": [ 1302 | "1985" 1303 | ] 1304 | }, 1305 | "execution_count": 40, 1306 | "metadata": {}, 1307 | "output_type": "execute_result" 1308 | } 1309 | ], 1310 | "source": [ 1311 | "df.values[0][0]" 1312 | ] 1313 | }, 1314 | { 1315 | "cell_type": "markdown", 1316 | "metadata": {}, 1317 | "source": [ 1318 | "# Dataframe Iteration" 1319 | ] 1320 | }, 1321 | { 1322 | "cell_type": "markdown", 1323 | "metadata": {}, 1324 | "source": [ 1325 | "In order to iterate through dataframes, we can use the " 1326 | ] 1327 | }, 1328 | { 1329 | "cell_type": "markdown", 1330 | "metadata": { 1331 | "collapsed": true 1332 | }, 1333 | "source": [ 1334 | "# Lots of Other Resources" 1335 | ] 1336 | }, 1337 | { 1338 | "cell_type": "markdown", 1339 | "metadata": {}, 1340 | "source": [ 1341 | "Pandas has been around for a while and there are a lot of other good resources if you're still interested in getting the most out of this library. 
\n", 1342 | "* http://pandas.pydata.org/pandas-docs/stable/10min.html\n", 1343 | "* https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python\n", 1344 | "* http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures/\n", 1345 | "* https://www.dataquest.io/blog/pandas-python-tutorial/\n", 1346 | "* https://drive.google.com/file/d/0ByIrJAE4KMTtTUtiVExiUGVkRkE/view" 1347 | ] 1348 | }, 1349 | { 1350 | "cell_type": "code", 1351 | "execution_count": null, 1352 | "metadata": { 1353 | "collapsed": true 1354 | }, 1355 | "outputs": [], 1356 | "source": [] 1357 | } 1358 | ], 1359 | "metadata": { 1360 | "anaconda-cloud": {}, 1361 | "kernelspec": { 1362 | "display_name": "Python [conda root]", 1363 | "language": "python", 1364 | "name": "conda-root-py" 1365 | }, 1366 | "language_info": { 1367 | "codemirror_mode": { 1368 | "name": "ipython", 1369 | "version": 2 1370 | }, 1371 | "file_extension": ".py", 1372 | "mimetype": "text/x-python", 1373 | "name": "python", 1374 | "nbconvert_exporter": "python", 1375 | "pygments_lexer": "ipython2", 1376 | "version": "2.7.12" 1377 | } 1378 | }, 1379 | "nbformat": 4, 1380 | "nbformat_minor": 1 1381 | } 1382 | -------------------------------------------------------------------------------- /DataStructures.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adeshpande3/Pandas-Tutorial/7ce62d4166db83e4f29599a1d8b8eb6b22f21e4e/DataStructures.png -------------------------------------------------------------------------------- /Pandas Tutorial.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Introduction" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Since I've been working on a lot of Kaggle competitions, I use Pandas quite a bit. 
As you may know, Pandas (in addition to Numpy) is the go-to Python library for all your data science needs. It helps with dealing with input data in CSV formats and with transforming your data into a form where it can be inputted into ML models. However, getting comfortable with the ideas of dataframes, slicing, etc was very tough for me in the beginning. Hopefully, this short tutorial can show you a lot of different commands that will help you gain the most insights into your dataset. " 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 1, 20 | "metadata": { 21 | "collapsed": true 22 | }, 23 | "outputs": [], 24 | "source": [ 25 | "import pandas as pd" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "# Loading in Data" 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": {}, 38 | "source": [ 39 | "The first step in any ML problem is identifying what format your data is in, and then loading it into whatever framework you're using. For Kaggle competitions, a lot of data can be found in CSV files, so that's the example we're going to use. " 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "Since I'm a huge sports fan, we're going to be looking at a sports dataset that shows the results from NCAA basketball games from 1985 to 2016. This dataset is in a CSV file, and the function we're going to use to read in the file is called **pd.read_csv()**. This function returns a **dataframe** variable. The dataframe is the golden jewel data structure for Pandas. It is defined as \"a two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns)\"." 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": {}, 52 | "source": [ 53 | "Just think of it as a table for now. 
" 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": 2, 59 | "metadata": { 60 | "collapsed": false 61 | }, 62 | "outputs": [], 63 | "source": [ 64 | "df = pd.read_csv('RegularSeasonCompactResults.csv')" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "# The Basics" 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": {}, 77 | "source": [ 78 | "Now that we have our dataframe in our variable df, let's look at what it contains. We can use the function **head()** to see the first couple rows of the dataframe (or the function **tail()** to see the last few rows)." 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": 3, 84 | "metadata": { 85 | "collapsed": false 86 | }, 87 | "outputs": [ 88 | { 89 | "data": { 90 | "text/html": [ 91 | "

\n", 92 | "

	Season	Daynum	Wteam	Wscore	Lteam	Lscore	Wloc
0	1985	20	1228	81	1328	64	N
1	1985	25	1106	77	1354	70	H
2	1985	25	1112	63	1223	56	H
3	1985	25	1165	70	1432	54	H
4	1985	25	1192	86	1447	74	H

\n", 164 | "

" 165 | ], 166 | "text/plain": [ 167 | " Season Daynum Wteam Wscore Lteam Lscore Wloc Numot\n", 168 | "0 1985 20 1228 81 1328 64 N 0\n", 169 | "1 1985 25 1106 77 1354 70 H 0\n", 170 | "2 1985 25 1112 63 1223 56 H 0\n", 171 | "3 1985 25 1165 70 1432 54 H 0\n", 172 | "4 1985 25 1192 86 1447 74 H 0" 173 | ] 174 | }, 175 | "execution_count": 3, 176 | "metadata": {}, 177 | "output_type": "execute_result" 178 | } 179 | ], 180 | "source": [ 181 | "df.head()" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": 4, 187 | "metadata": { 188 | "collapsed": false 189 | }, 190 | "outputs": [ 191 | { 192 | "data": { 193 | "text/html": [ 194 | "

\n", 195 | "

	Season	Daynum	Wteam	Wscore	Lteam	Lscore	Wloc	Numot
145284	2016	132	1114	70	1419	50	N	0
145285	2016	132	1163	72	1272	58	N	0
145286	2016	132	1246	82	1401	77	N	1
145287	2016	132	1277	66	1345	62	N	0
145288	2016	132	1386	87	1433	74	N	0

\n", 267 | "

" 268 | ], 269 | "text/plain": [ 270 | " Season Daynum Wteam Wscore Lteam Lscore Wloc Numot\n", 271 | "145284 2016 132 1114 70 1419 50 N 0\n", 272 | "145285 2016 132 1163 72 1272 58 N 0\n", 273 | "145286 2016 132 1246 82 1401 77 N 1\n", 274 | "145287 2016 132 1277 66 1345 62 N 0\n", 275 | "145288 2016 132 1386 87 1433 74 N 0" 276 | ] 277 | }, 278 | "execution_count": 4, 279 | "metadata": {}, 280 | "output_type": "execute_result" 281 | } 282 | ], 283 | "source": [ 284 | "df.tail()" 285 | ] 286 | }, 287 | { 288 | "cell_type": "markdown", 289 | "metadata": {}, 290 | "source": [ 291 | "We can see the dimensions of the dataframe using the the **shape** attribute" 292 | ] 293 | }, 294 | { 295 | "cell_type": "code", 296 | "execution_count": 5, 297 | "metadata": { 298 | "collapsed": false 299 | }, 300 | "outputs": [ 301 | { 302 | "data": { 303 | "text/plain": [ 304 | "(145289, 8)" 305 | ] 306 | }, 307 | "execution_count": 5, 308 | "metadata": {}, 309 | "output_type": "execute_result" 310 | } 311 | ], 312 | "source": [ 313 | "df.shape" 314 | ] 315 | }, 316 | { 317 | "cell_type": "markdown", 318 | "metadata": { 319 | "collapsed": true 320 | }, 321 | "source": [ 322 | "We can also extract all the column names as a list, by using the **columns** attribute and can extract the rows with the **index** attribute" 323 | ] 324 | }, 325 | { 326 | "cell_type": "code", 327 | "execution_count": 6, 328 | "metadata": { 329 | "collapsed": false 330 | }, 331 | "outputs": [ 332 | { 333 | "data": { 334 | "text/plain": [ 335 | "['Season', 'Daynum', 'Wteam', 'Wscore', 'Lteam', 'Lscore', 'Wloc', 'Numot']" 336 | ] 337 | }, 338 | "execution_count": 6, 339 | "metadata": {}, 340 | "output_type": "execute_result" 341 | } 342 | ], 343 | "source": [ 344 | "df.columns.tolist()" 345 | ] 346 | }, 347 | { 348 | "cell_type": "markdown", 349 | "metadata": {}, 350 | "source": [ 351 | "In order to get a better idea of the type of data that we are dealing with, we can call the **describe()** function to see 
summary statistics such as the mean, min, and max for each numeric column of the dataset. " 352 | ] 353 | }, 354 | { 355 | "cell_type": "code", 356 | "execution_count": 7, 357 | "metadata": { 358 | "collapsed": false 359 | }, 360 | "outputs": [ 361 | { 362 | "data": { 363 | "text/html": [ 364 | "

\n", 365 | "

	Season	Daynum	Wteam	Wscore	Lteam	Lscore	Numot
count	145289.000000	145289.000000	145289.000000	145289.000000	145289.000000	145289.000000	145289.000000
mean	2001.574834	75.223816	1286.720646	76.600321	1282.864064	64.497009	0.044387
std	9.233342	33.287418	104.570275	12.173033	104.829234	11.380625	0.247819
min	1985.000000	0.000000	1101.000000	34.000000	1101.000000	20.000000	0.000000
25%	1994.000000	47.000000	1198.000000	68.000000	1191.000000	57.000000	0.000000
50%	2002.000000	78.000000	1284.000000	76.000000	1280.000000	64.000000	0.000000
75%	2010.000000	103.000000	1379.000000	84.000000	1375.000000	72.000000	0.000000
max	2016.000000	132.000000	1464.000000	186.000000	1464.000000	150.000000	6.000000

\n", 461 | "

" 462 | ], 463 | "text/plain": [ 464 | " Season Daynum Wteam Wscore \\\n", 465 | "count 145289.000000 145289.000000 145289.000000 145289.000000 \n", 466 | "mean 2001.574834 75.223816 1286.720646 76.600321 \n", 467 | "std 9.233342 33.287418 104.570275 12.173033 \n", 468 | "min 1985.000000 0.000000 1101.000000 34.000000 \n", 469 | "25% 1994.000000 47.000000 1198.000000 68.000000 \n", 470 | "50% 2002.000000 78.000000 1284.000000 76.000000 \n", 471 | "75% 2010.000000 103.000000 1379.000000 84.000000 \n", 472 | "max 2016.000000 132.000000 1464.000000 186.000000 \n", 473 | "\n", 474 | " Lteam Lscore Numot \n", 475 | "count 145289.000000 145289.000000 145289.000000 \n", 476 | "mean 1282.864064 64.497009 0.044387 \n", 477 | "std 104.829234 11.380625 0.247819 \n", 478 | "min 1101.000000 20.000000 0.000000 \n", 479 | "25% 1191.000000 57.000000 0.000000 \n", 480 | "50% 1280.000000 64.000000 0.000000 \n", 481 | "75% 1375.000000 72.000000 0.000000 \n", 482 | "max 1464.000000 150.000000 6.000000 " 483 | ] 484 | }, 485 | "execution_count": 7, 486 | "metadata": {}, 487 | "output_type": "execute_result" 488 | } 489 | ], 490 | "source": [ 491 | "df.describe()" 492 | ] 493 | }, 494 | { 495 | "cell_type": "markdown", 496 | "metadata": {}, 497 | "source": [ 498 | "Okay, so now let's looking at information that we want to extract from the dataframe. Let's say I wanted to know the max value of a certain column. 
The function **max()** will show you the maximum value of every column." 499 | ] 500 | }, 501 | { 502 | "cell_type": "code", 503 | "execution_count": 8, 504 | "metadata": { 505 | "collapsed": false 506 | }, 507 | "outputs": [ 508 | { 509 | "data": { 510 | "text/plain": [ 511 | "Season 2016\n", 512 | "Daynum 132\n", 513 | "Wteam 1464\n", 514 | "Wscore 186\n", 515 | "Lteam 1464\n", 516 | "Lscore 150\n", 517 | "Wloc N\n", 518 | "Numot 6\n", 519 | "dtype: object" 520 | ] 521 | }, 522 | "execution_count": 8, 523 | "metadata": {}, 524 | "output_type": "execute_result" 525 | } 526 | ], 527 | "source": [ 528 | "df.max()" 529 | ] 530 | }, 531 | { 532 | "cell_type": "markdown", 533 | "metadata": {}, 534 | "source": [ 535 | "Then, if you'd like the max value of one particular column, select that column by name with the bracket indexing operator and call max() on it." 536 | ] 537 | }, 538 | { 539 | "cell_type": "code", 540 | "execution_count": 9, 541 | "metadata": { 542 | "collapsed": false 543 | }, 544 | "outputs": [ 545 | { 546 | "data": { 547 | "text/plain": [ 548 | "186" 549 | ] 550 | }, 551 | "execution_count": 9, 552 | "metadata": {}, 553 | "output_type": "execute_result" 554 | } 555 | ], 556 | "source": [ 557 | "df['Wscore'].max()" 558 | ] 559 | }, 560 | { 561 | "cell_type": "markdown", 562 | "metadata": {}, 563 | "source": [ 564 | "Similarly, to find the mean of the losing teams' scores, call **mean()** on the Lscore column. " 565 | ] 566 | }, 567 | { 568 | "cell_type": "code", 569 | "execution_count": 10, 570 | "metadata": { 571 | "collapsed": false 572 | }, 573 | "outputs": [ 574 | { 575 | "data": { 576 | "text/plain": [ 577 | "64.49700940883343" 578 | ] 579 | }, 580 | "execution_count": 10, 581 | "metadata": {}, 582 | "output_type": "execute_result" 583 | } 584 | ], 585 | "source": [ 586 | "df['Lscore'].mean()" 587 | ] 588 | }, 589 | { 590 | "cell_type": "markdown", 591 | "metadata": {}, 592 | "source": [ 593 | "But what if that's not enough? 
Let's say we want to actually see the game (row) where this max score happened. We can call the **argmax()** function to find where it occurs. (In recent Pandas versions, argmax() returns the integer position and **idxmax()** returns the row label; the two coincide here because the rows are labeled 0, 1, 2, ...)" 594 | ] 595 | }, 596 | { 597 | "cell_type": "code", 598 | "execution_count": 11, 599 | "metadata": { 600 | "collapsed": false 601 | }, 602 | "outputs": [ 603 | { 604 | "data": { 605 | "text/plain": [ 606 | "24970" 607 | ] 608 | }, 609 | "execution_count": 11, 610 | "metadata": {}, 611 | "output_type": "execute_result" 612 | } 613 | ], 614 | "source": [ 615 | "df['Wscore'].argmax()" 616 | ] 617 | }, 618 | { 619 | "cell_type": "markdown", 620 | "metadata": {}, 621 | "source": [ 622 | "One of the most useful functions that you can call on certain columns in a dataframe is the **value_counts()** function. It shows how many times each item appears in the column. This particular command shows the number of games in each season, sorted from most to fewest." 623 | ] 624 | }, 625 | { 626 | "cell_type": "code", 627 | "execution_count": 12, 628 | "metadata": { 629 | "collapsed": false 630 | }, 631 | "outputs": [ 632 | { 633 | "data": { 634 | "text/plain": [ 635 | "2016 5369\n", 636 | "2014 5362\n", 637 | "2015 5354\n", 638 | "2013 5320\n", 639 | "2010 5263\n", 640 | "2012 5253\n", 641 | "2009 5249\n", 642 | "2011 5246\n", 643 | "2008 5163\n", 644 | "2007 5043\n", 645 | "2006 4757\n", 646 | "2005 4675\n", 647 | "2003 4616\n", 648 | "2004 4571\n", 649 | "2002 4555\n", 650 | "2000 4519\n", 651 | "2001 4467\n", 652 | "1999 4222\n", 653 | "1998 4167\n", 654 | "1997 4155\n", 655 | "1992 4127\n", 656 | "1991 4123\n", 657 | "1996 4122\n", 658 | "1995 4077\n", 659 | "1994 4060\n", 660 | "1990 4045\n", 661 | "1989 4037\n", 662 | "1993 3982\n", 663 | "1988 3955\n", 664 | "1987 3915\n", 665 | "1986 3783\n", 666 | "1985 3737\n", 667 | "Name: Season, dtype: int64" 668 | ] 669 | }, 670 | "execution_count": 12, 671 | "metadata": {}, 672 | "output_type": "execute_result" 673 | } 674 | ], 675 | "source": [ 676 | "df['Season'].value_counts()" 677 | ] 678 | }, 679 | { 680 |
"cell_type": "markdown", 681 | "metadata": {}, 682 | "source": [ 683 | "# Acessing Values" 684 | ] 685 | }, 686 | { 687 | "cell_type": "markdown", 688 | "metadata": {}, 689 | "source": [ 690 | "Then, in order to get attributes about the game, we need to use the **iloc[]** function. Iloc is definitely one of the more important functions. The main idea is that you want to use it whenever you have the integer index of a certain row that you want to access. As per Pandas documentation, iloc is an \"integer-location based indexing for selection by position.\"" 691 | ] 692 | }, 693 | { 694 | "cell_type": "code", 695 | "execution_count": 13, 696 | "metadata": { 697 | "collapsed": false 698 | }, 699 | "outputs": [ 700 | { 701 | "data": { 702 | "text/html": [ 703 | "

\n", 704 | "

	Season	Daynum	Wteam	Wscore	Lteam	Lscore	Wloc	Numot
24970	1991	68	1258	186	1109	140	H	0

\n", 732 | "

" 733 | ], 734 | "text/plain": [ 735 | " Season Daynum Wteam Wscore Lteam Lscore Wloc Numot\n", 736 | "24970 1991 68 1258 186 1109 140 H 0" 737 | ] 738 | }, 739 | "execution_count": 13, 740 | "metadata": {}, 741 | "output_type": "execute_result" 742 | } 743 | ], 744 | "source": [ 745 | "df.iloc[[df['Wscore'].argmax()]]" 746 | ] 747 | }, 748 | { 749 | "cell_type": "markdown", 750 | "metadata": {}, 751 | "source": [ 752 | "Let's take this a step further. Let's say you want to know the game with the highest scoring winning team (this is what we just calculated), but you then want to know how many points the losing team scored. " 753 | ] 754 | }, 755 | { 756 | "cell_type": "code", 757 | "execution_count": 14, 758 | "metadata": { 759 | "collapsed": false 760 | }, 761 | "outputs": [ 762 | { 763 | "data": { 764 | "text/plain": [ 765 | "24970 140\n", 766 | "Name: Lscore, dtype: int64" 767 | ] 768 | }, 769 | "execution_count": 14, 770 | "metadata": {}, 771 | "output_type": "execute_result" 772 | } 773 | ], 774 | "source": [ 775 | "df.iloc[[df['Wscore'].argmax()]]['Lscore']" 776 | ] 777 | }, 778 | { 779 | "cell_type": "markdown", 780 | "metadata": {}, 781 | "source": [ 782 | "When you see data displayed in the above format, you're dealing with a Pandas **Series** object, not a dataframe object." 
783 | ] 784 | }, 785 | { 786 | "cell_type": "code", 787 | "execution_count": 15, 788 | "metadata": { 789 | "collapsed": false 790 | }, 791 | "outputs": [ 792 | { 793 | "data": { 794 | "text/plain": [ 795 | "pandas.core.series.Series" 796 | ] 797 | }, 798 | "execution_count": 15, 799 | "metadata": {}, 800 | "output_type": "execute_result" 801 | } 802 | ], 803 | "source": [ 804 | "type(df.iloc[[df['Wscore'].argmax()]]['Lscore'])" 805 | ] 806 | }, 807 | { 808 | "cell_type": "code", 809 | "execution_count": 16, 810 | "metadata": { 811 | "collapsed": false 812 | }, 813 | "outputs": [ 814 | { 815 | "data": { 816 | "text/plain": [ 817 | "pandas.core.frame.DataFrame" 818 | ] 819 | }, 820 | "execution_count": 16, 821 | "metadata": {}, 822 | "output_type": "execute_result" 823 | } 824 | ], 825 | "source": [ 826 | "type(df.iloc[[df['Wscore'].argmax()]])" 827 | ] 828 | }, 829 | { 830 | "cell_type": "markdown", 831 | "metadata": {}, 832 | "source": [ 833 | "The following is a summary of the 3 data structures in Pandas (I haven't really used Panels, and they were removed from Pandas in later releases)\n", 834 | "\n", 835 | "![](DataStructures.png)" 836 | ] 837 | }, 838 | { 839 | "cell_type": "markdown", 840 | "metadata": {}, 841 | "source": [ 842 | "When you want to access values in a Series, you can treat the Series like a Python dictionary and access each value by its key (which is normally an integer index label)." 843 | ] 844 | }, 845 | { 846 | "cell_type": "code", 847 | "execution_count": 17, 848 | "metadata": { 849 | "collapsed": false 850 | }, 851 | "outputs": [ 852 | { 853 | "data": { 854 | "text/plain": [ 855 | "140" 856 | ] 857 | }, 858 | "execution_count": 17, 859 | "metadata": {}, 860 | "output_type": "execute_result" 861 | } 862 | ], 863 | "source": [ 864 | "df.iloc[[df['Wscore'].argmax()]]['Lscore'][24970]" 865 | ] 866 | }, 867 | { 868 | "cell_type": "markdown", 869 | "metadata": {}, 870 | "source": [ 871 | "The other really important function in Pandas is the **loc** function. 
In contrast to iloc, which indexes by integer position, loc is a \"Purely label-location based indexer for selection by label\". Since the games are labeled 0 to 145288 in order, iloc and loc are going to be pretty interchangeable in this type of dataset." 872 | ] 873 | }, 874 | { 875 | "cell_type": "code", 876 | "execution_count": 18, 877 | "metadata": { 878 | "collapsed": false 879 | }, 880 | "outputs": [ 881 | { 882 | "data": { 883 | "text/html": [ 884 | "

\n", 885 | "

	Season	Daynum	Wteam	Wscore	Lteam	Lscore	Wloc
0	1985	20	1228	81	1328	64	N
1	1985	25	1106	77	1354	70	H
2	1985	25	1112	63	1223	56	H

\n", 935 | "

" 936 | ], 937 | "text/plain": [ 938 | " Season Daynum Wteam Wscore Lteam Lscore Wloc Numot\n", 939 | "0 1985 20 1228 81 1328 64 N 0\n", 940 | "1 1985 25 1106 77 1354 70 H 0\n", 941 | "2 1985 25 1112 63 1223 56 H 0" 942 | ] 943 | }, 944 | "execution_count": 18, 945 | "metadata": {}, 946 | "output_type": "execute_result" 947 | } 948 | ], 949 | "source": [ 950 | "df.iloc[:3]" 951 | ] 952 | }, 953 | { 954 | "cell_type": "code", 955 | "execution_count": 19, 956 | "metadata": { 957 | "collapsed": false 958 | }, 959 | "outputs": [ 960 | { 961 | "data": { 962 | "text/html": [ 963 | "

\n", 964 | "

	Season	Daynum	Wteam	Wscore	Lteam	Lscore	Wloc
0	1985	20	1228	81	1328	64	N
1	1985	25	1106	77	1354	70	H
2	1985	25	1112	63	1223	56	H
3	1985	25	1165	70	1432	54	H

\n", 1025 | "

" 1026 | ], 1027 | "text/plain": [ 1028 | " Season Daynum Wteam Wscore Lteam Lscore Wloc Numot\n", 1029 | "0 1985 20 1228 81 1328 64 N 0\n", 1030 | "1 1985 25 1106 77 1354 70 H 0\n", 1031 | "2 1985 25 1112 63 1223 56 H 0\n", 1032 | "3 1985 25 1165 70 1432 54 H 0" 1033 | ] 1034 | }, 1035 | "execution_count": 19, 1036 | "metadata": {}, 1037 | "output_type": "execute_result" 1038 | } 1039 | ], 1040 | "source": [ 1041 | "df.loc[:3]" 1042 | ] 1043 | }, 1044 | { 1045 | "cell_type": "markdown", 1046 | "metadata": {}, 1047 | "source": [ 1048 | "Notice the slight difference in that iloc is exclusive of the second number, while loc is inclusive. " 1049 | ] 1050 | }, 1051 | { 1052 | "cell_type": "markdown", 1053 | "metadata": {}, 1054 | "source": [ 1055 | "Below is an example of how you can use loc to acheive the same task as we did previously with iloc" 1056 | ] 1057 | }, 1058 | { 1059 | "cell_type": "code", 1060 | "execution_count": 20, 1061 | "metadata": { 1062 | "collapsed": false 1063 | }, 1064 | "outputs": [ 1065 | { 1066 | "data": { 1067 | "text/plain": [ 1068 | "140" 1069 | ] 1070 | }, 1071 | "execution_count": 20, 1072 | "metadata": {}, 1073 | "output_type": "execute_result" 1074 | } 1075 | ], 1076 | "source": [ 1077 | "df.loc[df['Wscore'].argmax(), 'Lscore']" 1078 | ] 1079 | }, 1080 | { 1081 | "cell_type": "markdown", 1082 | "metadata": {}, 1083 | "source": [ 1084 | "A faster version uses the **at()** function. At() is really useful wheneever you know the row label and the column label of the particular value that you want to get. 
" 1085 | ] 1086 | }, 1087 | { 1088 | "cell_type": "code", 1089 | "execution_count": 21, 1090 | "metadata": { 1091 | "collapsed": false 1092 | }, 1093 | "outputs": [ 1094 | { 1095 | "data": { 1096 | "text/plain": [ 1097 | "140" 1098 | ] 1099 | }, 1100 | "execution_count": 21, 1101 | "metadata": {}, 1102 | "output_type": "execute_result" 1103 | } 1104 | ], 1105 | "source": [ 1106 | "df.at[df['Wscore'].argmax(), 'Lscore']" 1107 | ] 1108 | }, 1109 | { 1110 | "cell_type": "markdown", 1111 | "metadata": {}, 1112 | "source": [ 1113 | "If you'd like to see more discussion on how loc and iloc are different, check out this great Stack Overflow post: http://stackoverflow.com/questions/31593201/pandas-iloc-vs-ix-vs-loc-explanation. Just remember that **iloc looks at position** and **loc looks at labels**. Loc becomes very important when your row labels aren't integers. " 1114 | ] 1115 | }, 1116 | { 1117 | "cell_type": "markdown", 1118 | "metadata": {}, 1119 | "source": [ 1120 | "# Sorting" 1121 | ] 1122 | }, 1123 | { 1124 | "cell_type": "markdown", 1125 | "metadata": {}, 1126 | "source": [ 1127 | "Let's say that we want to sort the dataframe in increasing order for the scores of the losing team" 1128 | ] 1129 | }, 1130 | { 1131 | "cell_type": "code", 1132 | "execution_count": 22, 1133 | "metadata": { 1134 | "collapsed": false, 1135 | "scrolled": true 1136 | }, 1137 | "outputs": [ 1138 | { 1139 | "data": { 1140 | "text/html": [ 1141 | "

\n", 1142 | "

	Season	Daynum	Wteam	Wscore	Lteam	Lscore	Wloc
100027	2008	66	1203	49	1387	20	H
49310	1997	66	1157	61	1204	21	H
89021	2006	44	1284	41	1343	21	A
85042	2005	66	1131	73	1216	22	H
103660	2009	26	1326	59	1359	22	H

\n", 1214 | "

" 1215 | ], 1216 | "text/plain": [ 1217 | " Season Daynum Wteam Wscore Lteam Lscore Wloc Numot\n", 1218 | "100027 2008 66 1203 49 1387 20 H 0\n", 1219 | "49310 1997 66 1157 61 1204 21 H 0\n", 1220 | "89021 2006 44 1284 41 1343 21 A 0\n", 1221 | "85042 2005 66 1131 73 1216 22 H 0\n", 1222 | "103660 2009 26 1326 59 1359 22 H 0" 1223 | ] 1224 | }, 1225 | "execution_count": 22, 1226 | "metadata": {}, 1227 | "output_type": "execute_result" 1228 | } 1229 | ], 1230 | "source": [ 1231 | "df.sort_values('Lscore').head()" 1232 | ] 1233 | }, 1234 | { 1235 | "cell_type": "code", 1236 | "execution_count": 23, 1237 | "metadata": { 1238 | "collapsed": false 1239 | }, 1240 | "outputs": [ 1241 | { 1242 | "data": { 1243 | "text/plain": [ 1244 | "" 1245 | ] 1246 | }, 1247 | "execution_count": 23, 1248 | "metadata": {}, 1249 | "output_type": "execute_result" 1250 | } 1251 | ], 1252 | "source": [ 1253 | "df.groupby('Lscore')" 1254 | ] 1255 | }, 1256 | { 1257 | "cell_type": "markdown", 1258 | "metadata": {}, 1259 | "source": [ 1260 | "# Filtering Rows Conditionally" 1261 | ] 1262 | }, 1263 | { 1264 | "cell_type": "markdown", 1265 | "metadata": {}, 1266 | "source": [ 1267 | "Now, let's say we want to find all of the rows that satisy a particular condition. For example, I want to find all of the games where the winning team scored more than 150 points. The idea behind this command is you want to access the column 'Wscore' of the dataframe df (df['Wscore']), find which entries are above 150 (df['Wscore'] > 150), and then returns only those specific rows in a dataframe format (df[df['Wscore'] > 150])." 1268 | ] 1269 | }, 1270 | { 1271 | "cell_type": "code", 1272 | "execution_count": 24, 1273 | "metadata": { 1274 | "collapsed": false 1275 | }, 1276 | "outputs": [ 1277 | { 1278 | "data": { 1279 | "text/html": [ 1280 | "

\n", 1281 | "

	Season	Daynum	Wteam	Wscore	Lteam	Lscore	Wloc
5269	1986	75	1258	151	1109	107	H
12046	1988	40	1328	152	1147	84	H
12355	1988	52	1328	151	1173	99	N
16040	1989	40	1328	152	1331	122	H
16853	1989	68	1258	162	1109	144	A
17867	1989	92	1258	181	1109	150	H
19653	1990	30	1328	173	1109	101	H
19971	1990	38	1258	152	1109	137	A
20022	1990	40	1116	166	1109	101	H
22145	1990	97	1258	157	1362	115	H
23582	1991	26	1318	152	1258	123	N
24341	1991	47	1328	172	1258	112	H
24970	1991	68	1258	186	1109	140	H
25656	1991	84	1106	151	1212	97	H
28687	1992	54	1261	159	1319	86	H
35023	1993	112	1380	155	1341	91	A
40060	1995	32	1375	156	1341	114	H
52600	1998	33	1395	153	1410	87	H

\n", 1496 | "

" 1497 | ], 1498 | "text/plain": [ 1499 | " Season Daynum Wteam Wscore Lteam Lscore Wloc Numot\n", 1500 | "5269 1986 75 1258 151 1109 107 H 0\n", 1501 | "12046 1988 40 1328 152 1147 84 H 0\n", 1502 | "12355 1988 52 1328 151 1173 99 N 0\n", 1503 | "16040 1989 40 1328 152 1331 122 H 0\n", 1504 | "16853 1989 68 1258 162 1109 144 A 0\n", 1505 | "17867 1989 92 1258 181 1109 150 H 0\n", 1506 | "19653 1990 30 1328 173 1109 101 H 0\n", 1507 | "19971 1990 38 1258 152 1109 137 A 0\n", 1508 | "20022 1990 40 1116 166 1109 101 H 0\n", 1509 | "22145 1990 97 1258 157 1362 115 H 0\n", 1510 | "23582 1991 26 1318 152 1258 123 N 0\n", 1511 | "24341 1991 47 1328 172 1258 112 H 0\n", 1512 | "24970 1991 68 1258 186 1109 140 H 0\n", 1513 | "25656 1991 84 1106 151 1212 97 H 0\n", 1514 | "28687 1992 54 1261 159 1319 86 H 0\n", 1515 | "35023 1993 112 1380 155 1341 91 A 0\n", 1516 | "40060 1995 32 1375 156 1341 114 H 0\n", 1517 | "52600 1998 33 1395 153 1410 87 H 0" 1518 | ] 1519 | }, 1520 | "execution_count": 24, 1521 | "metadata": {}, 1522 | "output_type": "execute_result" 1523 | } 1524 | ], 1525 | "source": [ 1526 | "df[df['Wscore'] > 150]" 1527 | ] 1528 | }, 1529 | { 1530 | "cell_type": "markdown", 1531 | "metadata": {}, 1532 | "source": [ 1533 | "This also works if you have multiple conditions. Let's say we want to find out when the winning team scores more than 150 points and when the losing team scores below 100. " 1534 | ] 1535 | }, 1536 | { 1537 | "cell_type": "code", 1538 | "execution_count": 25, 1539 | "metadata": { 1540 | "collapsed": false 1541 | }, 1542 | "outputs": [ 1543 | { 1544 | "data": { 1545 | "text/html": [ 1546 | "

	Season	Daynum	Wteam	Wscore	Lteam	Lscore	Wloc	Numot
12046	1988	40	1328	152	1147	84	H	0
12355	1988	52	1328	151	1173	99	N	0
25656	1991	84	1106	151	1212	97	H	0
28687	1992	54	1261	159	1319	86	H	0
35023	1993	112	1380	155	1341	91	A	0
52600	1998	33	1395	153	1410	87	H	0

\n", 1630 | "

" 1631 | ], 1632 | "text/plain": [ 1633 | " Season Daynum Wteam Wscore Lteam Lscore Wloc Numot\n", 1634 | "12046 1988 40 1328 152 1147 84 H 0\n", 1635 | "12355 1988 52 1328 151 1173 99 N 0\n", 1636 | "25656 1991 84 1106 151 1212 97 H 0\n", 1637 | "28687 1992 54 1261 159 1319 86 H 0\n", 1638 | "35023 1993 112 1380 155 1341 91 A 0\n", 1639 | "52600 1998 33 1395 153 1410 87 H 0" 1640 | ] 1641 | }, 1642 | "execution_count": 25, 1643 | "metadata": {}, 1644 | "output_type": "execute_result" 1645 | } 1646 | ], 1647 | "source": [ 1648 | "df[(df['Wscore'] > 150) & (df['Lscore'] < 100)]" 1649 | ] 1650 | }, 1651 | { 1652 | "cell_type": "markdown", 1653 | "metadata": {}, 1654 | "source": [ 1655 | "# Grouping" 1656 | ] 1657 | }, 1658 | { 1659 | "cell_type": "markdown", 1660 | "metadata": {}, 1661 | "source": [ 1662 | "Another important function in Pandas is **groupby()**. It lets you group entries by a certain attribute (e.g., grouping games by Wteam number) and then perform operations on each group. The following command groups all the entries (games) with the same Wteam number and finds the mean Wscore for each group. 
" 1663 | ] 1664 | }, 1665 | { 1666 | "cell_type": "code", 1667 | "execution_count": 26, 1668 | "metadata": { 1669 | "collapsed": false 1670 | }, 1671 | "outputs": [ 1672 | { 1673 | "data": { 1674 | "text/plain": [ 1675 | "Wteam\n", 1676 | "1101 78.111111\n", 1677 | "1102 69.893204\n", 1678 | "1103 75.839768\n", 1679 | "1104 75.825944\n", 1680 | "1105 74.960894\n", 1681 | "Name: Wscore, dtype: float64" 1682 | ] 1683 | }, 1684 | "execution_count": 26, 1685 | "metadata": {}, 1686 | "output_type": "execute_result" 1687 | } 1688 | ], 1689 | "source": [ 1690 | "df.groupby('Wteam')['Wscore'].mean().head()" 1691 | ] 1692 | }, 1693 | { 1694 | "cell_type": "markdown", 1695 | "metadata": {}, 1696 | "source": [ 1697 | "This next command groups all the games with the same Wteam number and counts how many times each team won at home (H), on the road (A), or at a neutral site (N)." 1698 | ] 1699 | }, 1700 | { 1701 | "cell_type": "code", 1702 | "execution_count": 27, 1703 | "metadata": { 1704 | "collapsed": false, 1705 | "scrolled": false 1706 | }, 1707 | "outputs": [ 1708 | { 1709 | "data": { 1710 | "text/plain": [ 1711 | "Wteam Wloc\n", 1712 | "1101 H 12\n", 1713 | " A 3\n", 1714 | " N 3\n", 1715 | "1102 H 204\n", 1716 | " A 73\n", 1717 | " N 32\n", 1718 | "1103 H 324\n", 1719 | " A 153\n", 1720 | " N 41\n", 1721 | "Name: Wloc, dtype: int64" 1722 | ] 1723 | }, 1724 | "execution_count": 27, 1725 | "metadata": {}, 1726 | "output_type": "execute_result" 1727 | } 1728 | ], 1729 | "source": [ 1730 | "df.groupby('Wteam')['Wloc'].value_counts().head(9)" 1731 | ] 1732 | }, 1733 | { 1734 | "cell_type": "markdown", 1735 | "metadata": {}, 1736 | "source": [ 1737 | "Each dataframe has a **values** attribute, which is useful because it returns the contents of your dataframe as a NumPy array." 1738 | ] 1739 | }, 1740 | { 1741 | "cell_type": "code", 1742 | "execution_count": 28, 1743 | "metadata": { 1744 | "collapsed": false 1745 | }, 1746 | "outputs": [ 1747 | { 1748 | "data": { 1749 | 
"text/plain": [ 1750 | "array([[1985, 20, 1228, ..., 64, 'N', 0],\n", 1751 | " [1985, 25, 1106, ..., 70, 'H', 0],\n", 1752 | " [1985, 25, 1112, ..., 56, 'H', 0],\n", 1753 | " ..., \n", 1754 | " [2016, 132, 1246, ..., 77, 'N', 1],\n", 1755 | " [2016, 132, 1277, ..., 62, 'N', 0],\n", 1756 | " [2016, 132, 1386, ..., 74, 'N', 0]], dtype=object)" 1757 | ] 1758 | }, 1759 | "execution_count": 28, 1760 | "metadata": {}, 1761 | "output_type": "execute_result" 1762 | } 1763 | ], 1764 | "source": [ 1765 | "df.values" 1766 | ] 1767 | }, 1768 | { 1769 | "cell_type": "markdown", 1770 | "metadata": {}, 1771 | "source": [ 1772 | "Now, you can simply just access elements like you would in an array. " 1773 | ] 1774 | }, 1775 | { 1776 | "cell_type": "code", 1777 | "execution_count": 29, 1778 | "metadata": { 1779 | "collapsed": false 1780 | }, 1781 | "outputs": [ 1782 | { 1783 | "data": { 1784 | "text/plain": [ 1785 | "1985" 1786 | ] 1787 | }, 1788 | "execution_count": 29, 1789 | "metadata": {}, 1790 | "output_type": "execute_result" 1791 | } 1792 | ], 1793 | "source": [ 1794 | "df.values[0][0]" 1795 | ] 1796 | }, 1797 | { 1798 | "cell_type": "markdown", 1799 | "metadata": {}, 1800 | "source": [ 1801 | "# Dataframe Iteration" 1802 | ] 1803 | }, 1804 | { 1805 | "cell_type": "markdown", 1806 | "metadata": {}, 1807 | "source": [ 1808 | "In order to iterate through dataframes, we can use the **iterrows()** function. Below is an example of what the first two rows look like. 
Each row yielded by iterrows() is a Series object." 1809 | ] 1810 | }, 1811 | { 1812 | "cell_type": "code", 1813 | "execution_count": 30, 1814 | "metadata": { 1815 | "collapsed": false 1816 | }, 1817 | "outputs": [ 1818 | { 1819 | "name": "stdout", 1820 | "output_type": "stream", 1821 | "text": [ 1822 | "Season 1985\n", 1823 | "Daynum 20\n", 1824 | "Wteam 1228\n", 1825 | "Wscore 81\n", 1826 | "Lteam 1328\n", 1827 | "Lscore 64\n", 1828 | "Wloc N\n", 1829 | "Numot 0\n", 1830 | "Name: 0, dtype: object\n", 1831 | "Season 1985\n", 1832 | "Daynum 25\n", 1833 | "Wteam 1106\n", 1834 | "Wscore 77\n", 1835 | "Lteam 1354\n", 1836 | "Lscore 70\n", 1837 | "Wloc H\n", 1838 | "Numot 0\n", 1839 | "Name: 1, dtype: object\n" 1840 | ] 1841 | } 1842 | ], 1843 | "source": [ 1844 | "for index, row in df.iterrows():\n", 1845 | "    print(row)\n", 1846 | "    if index == 1:\n", 1847 | "        break" 1848 | ] 1849 | }, 1850 | { 1851 | "cell_type": "markdown", 1852 | "metadata": {}, 1853 | "source": [ 1854 | "# Extracting Rows and Columns" 1855 | ] 1856 | }, 1857 | { 1858 | "cell_type": "markdown", 1859 | "metadata": {}, 1860 | "source": [ 1861 | "The bracket indexing operator is one way to extract certain columns from a dataframe." 1862 | ] 1863 | }, 1864 | { 1865 | "cell_type": "code", 1866 | "execution_count": 31, 1867 | "metadata": { 1868 | "collapsed": false, 1869 | "scrolled": true 1870 | }, 1871 | "outputs": [ 1872 | { 1873 | "data": { 1874 | "text/html": [ 1875 | "

	Wscore	Lscore
0	81	64
1	77	70
2	63	56
3	70	54
4	86	74

\n", 1912 | "

" 1913 | ], 1914 | "text/plain": [ 1915 | " Wscore Lscore\n", 1916 | "0 81 64\n", 1917 | "1 77 70\n", 1918 | "2 63 56\n", 1919 | "3 70 54\n", 1920 | "4 86 74" 1921 | ] 1922 | }, 1923 | "execution_count": 31, 1924 | "metadata": {}, 1925 | "output_type": "execute_result" 1926 | } 1927 | ], 1928 | "source": [ 1929 | "df[['Wscore', 'Lscore']].head()" 1930 | ] 1931 | }, 1932 | { 1933 | "cell_type": "markdown", 1934 | "metadata": {}, 1935 | "source": [ 1936 | "Notice that you can achieve the same result by using **loc**. Strictly speaking, loc is an indexer (used with square brackets) rather than a function, and it is very versatile: it can help you with a lot of accessing and extracting tasks. " 1937 | ] 1938 | }, 1939 | { 1940 | "cell_type": "code", 1941 | "execution_count": 32, 1942 | "metadata": { 1943 | "collapsed": false 1944 | }, 1945 | "outputs": [ 1946 | { 1947 | "data": { 1948 | "text/html": [ 1949 | "

	Wscore	Lscore
0	81	64
1	77	70
2	63	56
3	70	54
4	86	74

\n", 1986 | "

" 1987 | ], 1988 | "text/plain": [ 1989 | " Wscore Lscore\n", 1990 | "0 81 64\n", 1991 | "1 77 70\n", 1992 | "2 63 56\n", 1993 | "3 70 54\n", 1994 | "4 86 74" 1995 | ] 1996 | }, 1997 | "execution_count": 32, 1998 | "metadata": {}, 1999 | "output_type": "execute_result" 2000 | } 2001 | ], 2002 | "source": [ 2003 | "df.loc[:, ['Wscore', 'Lscore']].head()" 2004 | ] 2005 | }, 2006 | { 2007 | "cell_type": "markdown", 2008 | "metadata": {}, 2009 | "source": [ 2010 | "Note the difference in the return types when you use single brackets and when you use double brackets: single brackets return a Series, while double brackets return a DataFrame. " 2011 | ] 2012 | }, 2013 | { 2014 | "cell_type": "code", 2015 | "execution_count": 33, 2016 | "metadata": { 2017 | "collapsed": false 2018 | }, 2019 | "outputs": [ 2020 | { 2021 | "data": { 2022 | "text/plain": [ 2023 | "pandas.core.series.Series" 2024 | ] 2025 | }, 2026 | "execution_count": 33, 2027 | "metadata": {}, 2028 | "output_type": "execute_result" 2029 | } 2030 | ], 2031 | "source": [ 2032 | "type(df['Wscore'])" 2033 | ] 2034 | }, 2035 | { 2036 | "cell_type": "code", 2037 | "execution_count": 34, 2038 | "metadata": { 2039 | "collapsed": false 2040 | }, 2041 | "outputs": [ 2042 | { 2043 | "data": { 2044 | "text/plain": [ 2045 | "pandas.core.frame.DataFrame" 2046 | ] 2047 | }, 2048 | "execution_count": 34, 2049 | "metadata": {}, 2050 | "output_type": "execute_result" 2051 | } 2052 | ], 2053 | "source": [ 2054 | "type(df[['Wscore']])" 2055 | ] 2056 | }, 2057 | { 2058 | "cell_type": "markdown", 2059 | "metadata": {}, 2060 | "source": [ 2061 | "You've seen before that you can access columns through df['col name']. You can access rows by using slicing operations. " 2062 | ] 2063 | }, 2064 | { 2065 | "cell_type": "code", 2066 | "execution_count": 35, 2067 | "metadata": { 2068 | "collapsed": false 2069 | }, 2070 | "outputs": [ 2071 | { 2072 | "data": { 2073 | "text/html": [ 2074 | "

	Season	Daynum	Wteam	Wscore	Lteam	Lscore	Wloc	Numot
0	1985	20	1228	81	1328	64	N	0
1	1985	25	1106	77	1354	70	H	0
2	1985	25	1112	63	1223	56	H	0

\n", 2125 | "

" 2126 | ], 2127 | "text/plain": [ 2128 | " Season Daynum Wteam Wscore Lteam Lscore Wloc Numot\n", 2129 | "0 1985 20 1228 81 1328 64 N 0\n", 2130 | "1 1985 25 1106 77 1354 70 H 0\n", 2131 | "2 1985 25 1112 63 1223 56 H 0" 2132 | ] 2133 | }, 2134 | "execution_count": 35, 2135 | "metadata": {}, 2136 | "output_type": "execute_result" 2137 | } 2138 | ], 2139 | "source": [ 2140 | "df[0:3]" 2141 | ] 2142 | }, 2143 | { 2144 | "cell_type": "markdown", 2145 | "metadata": {}, 2146 | "source": [ 2147 | "Here's an equivalent using iloc" 2148 | ] 2149 | }, 2150 | { 2151 | "cell_type": "code", 2152 | "execution_count": 36, 2153 | "metadata": { 2154 | "collapsed": false 2155 | }, 2156 | "outputs": [ 2157 | { 2158 | "data": { 2159 | "text/html": [ 2160 | "

	Season	Daynum	Wteam	Wscore	Lteam	Lscore	Wloc	Numot
0	1985	20	1228	81	1328	64	N	0
1	1985	25	1106	77	1354	70	H	0
2	1985	25	1112	63	1223	56	H	0

\n", 2211 | "

" 2212 | ], 2213 | "text/plain": [ 2214 | " Season Daynum Wteam Wscore Lteam Lscore Wloc Numot\n", 2215 | "0 1985 20 1228 81 1328 64 N 0\n", 2216 | "1 1985 25 1106 77 1354 70 H 0\n", 2217 | "2 1985 25 1112 63 1223 56 H 0" 2218 | ] 2219 | }, 2220 | "execution_count": 36, 2221 | "metadata": {}, 2222 | "output_type": "execute_result" 2223 | } 2224 | ], 2225 | "source": [ 2226 | "df.iloc[0:3,:]" 2227 | ] 2228 | }, 2229 | { 2230 | "cell_type": "markdown", 2231 | "metadata": {}, 2232 | "source": [ 2233 | "# Data Cleaning" 2234 | ] 2235 | }, 2236 | { 2237 | "cell_type": "markdown", 2238 | "metadata": {}, 2239 | "source": [ 2240 | "One of the big jobs in doing well in Kaggle competitions is data cleaning. A lot of the time, the CSV file you're given (the Titanic dataset is a good example) will have a lot of missing values, which you have to identify. The following **isnull** function flags every missing value in the dataframe, and chaining **sum()** then totals the missing values for each column. In this case, we have a pretty clean dataset." 2241 | ] 2242 | }, 2243 | { 2244 | "cell_type": "code", 2245 | "execution_count": 37, 2246 | "metadata": { 2247 | "collapsed": false 2248 | }, 2249 | "outputs": [ 2250 | { 2251 | "data": { 2252 | "text/plain": [ 2253 | "Season 0\n", 2254 | "Daynum 0\n", 2255 | "Wteam 0\n", 2256 | "Wscore 0\n", 2257 | "Lteam 0\n", 2258 | "Lscore 0\n", 2259 | "Wloc 0\n", 2260 | "Numot 0\n", 2261 | "dtype: int64" 2262 | ] 2263 | }, 2264 | "execution_count": 37, 2265 | "metadata": {}, 2266 | "output_type": "execute_result" 2267 | } 2268 | ], 2269 | "source": [ 2270 | "df.isnull().sum()" 2271 | ] 2272 | }, 2273 | { 2274 | "cell_type": "markdown", 2275 | "metadata": {}, 2276 | "source": [ 2277 | "If you do end up having missing values in your datasets, be sure to get familiar with these two functions. \n", 2278 | "* **dropna()** - This function allows you to drop all (or some) of the rows that have missing values. 
\n", 2279 | "* **fillna()** - This function allows you to replace the missing values with the value that you pass in." 2280 | ] 2281 | }, 2282 | { 2283 | "cell_type": "markdown", 2284 | "metadata": {}, 2285 | "source": [ 2286 | "# Visualizing Data" 2287 | ] 2288 | }, 2289 | { 2290 | "cell_type": "markdown", 2291 | "metadata": {}, 2292 | "source": [ 2293 | "An interesting way of displaying Dataframes is through matplotlib. " 2294 | ] 2295 | }, 2296 | { 2297 | "cell_type": "code", 2298 | "execution_count": 38, 2299 | "metadata": { 2300 | "collapsed": true 2301 | }, 2302 | "outputs": [], 2303 | "source": [ 2304 | "import matplotlib.pyplot as plt\n", 2305 | "%matplotlib inline" 2306 | ] 2307 | }, 2308 | { 2309 | "cell_type": "code", 2310 | "execution_count": 39, 2311 | "metadata": { 2312 | "collapsed": false 2313 | }, 2314 | "outputs": [ 2315 | { 2316 | "data": { 2317 | "text/plain": [ 2318 | "" 2319 | ] 2320 | }, 2321 | "execution_count": 39, 2322 | "metadata": {}, 2323 | "output_type": "execute_result" 2324 | }, 2325 | { 2326 | "data": { 2327 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAjkAAAF5CAYAAAB9WzucAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAAPYQAAD2EBqD+naQAAIABJREFUeJzt3XuYnVV99//3h0MSwSYRIwkU0qpoSD1QMhyClQDGB6rg\n6YePMppysNZKAXnSeon6gxLhaYt4SfhBwFKkohym0iCeEglykCJEooRClCFUiR0QEhgJQwxNIMn3\n98daW+/cTuawZ8/svW8+r+vaV7LX+s6619o72fs7617rvhURmJmZmVXNTs3ugJmZmdlocJJjZmZm\nleQkx8zMzCrJSY6ZmZlVkpMcMzMzqyQnOWZmZlZJTnLMzMyskpzkmJmZWSU5yTEzM7NKcpJjZmZm\nldRySY6kT0vaJumiUvl5kp6Q9Lyk70var1Q/XtJlknolbZC0WNKepZhXSLpOUp+k9ZK+LGn3Usy+\nkpZI2ihpraQLJbXc62RmZmYDa6kvb0kHAx8DHiiVnwWcnusOATYCyySNK4RdDBwLHA/MAfYGbiwd\n4npgJjA3x84BrigcZydgKbALMBs4CTgZOK8R4zMzM7Oxo1a5QaeklwP3AacC5wD3R8Tf5rongC9E\nxML8fCKwDjgpIm7Iz58GToiIm3LMDKAbmB0RKyTNBH4GdETE/TnmGGAJsE9ErJX0DuDbwF4R0Ztj\n/hq4AHhVRGwZkxfDzMzMRqyVZnIuA74TEbcXCyW9GpgG3FYri4jngHuBw3LRQaTZl2LMaqCnEDMb\nWF9LcLJbgQAOLcSsqiU42TJgEvCGkQzOzMzMxtYuze4AgKQTgD8lJStl00iJyLpS+bpcBzAVeCEn\nPzuKmQY8VayMiK2SninF9HecWt0DmJmZWVtoepIjaR/Sepq3R8SLze7PcEl6JXAM8EtgU3N7Y2Zm\n1lYmAH8MLIuIXze68aYnOUAH8CpgpSTlsp2BOZJOB/YHRJqtKc6yTAVqp57WAuMkTSzN5kzNdbWY\n8m6rnYE9SjEHl/o3tVDXn2OA6wYaoJmZmQ3ow6TNQQ3VCknOrcCbSmVXkxYNXxARj0paS9oR9SD8\nduHxoaR1PJAWLG/JMcWFx9OB5TlmOTBZ0oGFdTlzSQnUvYWYz0qaUliXczTQBzy0g/7/EuDaa69l\n5syZwxp4q5o/fz4LFy5sdjcaokpjAY+nlVVpLODxtLIqjaW7u5t58+ZB/i5ttKYnORGxkVICIWkj\n8OuI6M5FFwNnS/o56YU4H3gc+FZu4zlJVwEXSVoPbAAuAe6OiBU55mFJy4ArJZ0KjAMuBboiojZL\nc0vuyzV52/pe+ViLBjiVtglg5syZzJo1a2QvRouYNGmSx9KiPJ7WVaWxgMfTyqo0loJRWe7R9CRn\nB7bb1x4RF0rajXRNm8nAXcA7IuKFQth8YCuwGBgP3AycVmr3Q8Ai0uzRthx7ZuE42yQdB3wJuId0\nPZ6rgXMbNTAzMzMbGy2Z5ETE2/opWwAsGOBnNgNn5MeOYp4F5g1y7MeA44bYVTMzM2tRrXSdHDMz\nM7OGacmZHGuuzs7OZnehYZo5lp6eHnp7ewcPHKIpU6ZU6r0B/1trZR5P66rSWEZby9zWoV1JmgXc\nd99991VxIZjVqaenhxkzZrJp0/MNa3PChN1Yvbqb6dOnN6xNM7NmWrlyJR0dHZBuubSy0e17Jsds\nFPT29uYE51rSPWFHqptNm+bR29vrJMfMbIic5JiNqpmAZ/jMzJrBC4/NzMyskpzkmJmZWSU5yTEz\nM7NKcpJjZmZmleQkx8zMzCrJSY6ZmZlVkpMcMzMzqyQnOWZmZlZJTnLMzMyskpzkmJmZWSU5yTEz\nM7NKcpJjZmZmleQkx8zMzCrJSY6ZmZlVkpMcMzMzqyQnOWZmZlZJTnLMzMyskpzkmJmZWSU5yTEz\nM7NKcpJjZmZmleQkx8zMzCrJSY6ZmZlVkpMcMzMzq6SmJzmSP
i7pAUl9+XGPpD8v1H9F0rbSY2mp\njfGSLpPUK2mDpMWS9izFvELSdfkY6yV9WdLupZh9JS2RtFHSWkkXSmr6a2RmZmbD1wpf4I8BZwGz\ngA7gduBbkmYWYr4HTAWm5UdnqY2LgWOB44E5wN7AjaWY64GZwNwcOwe4olaZk5mlwC7AbOAk4GTg\nvBGOz8zMzJpgl2Z3ICKWlIrOlnQqKdHozmWbI+Lp/n5e0kTgI8AJEXFnLjsF6JZ0SESsyAnTMUBH\nRNyfY84Alkj6ZESszfX7A0dFRC+wStI5wAWSFkTEloYO3MzMzEZVK8zk/JaknSSdAOwG3FOoOlLS\nOkkPS7pc0h6Fug5SsnZbrSAiVgM9wGG5aDawvpbgZLcCARxaiFmVE5yaZcAk4A0jH52ZmZmNpabP\n5ABIeiOwHJgAbADelxMVSKeqbgTWAK8F/glYKumwiAjS6asXIuK5UrPrch35z6eKlRGxVdIzpZh1\n/bRRq3ug/hGamZnZWGuJJAd4GDiANGvyfuBrkuZExMMRcUMh7meSVgG/AI4E7hjznu7A/PnzmTRp\n0nZlnZ2ddHaWlw+ZmZm99HR1ddHV1bVdWV9f36gesyWSnLze5dH89H5JhwBnAqf2E7tGUi+wHynJ\nWQuMkzSxNJszNdeR/yzvttoZ2KMUc3DpcFMLdQNauHAhs2bNGizMzMzsJam/X/xXrlxJR0fHqB2z\npdbkFOwEjO+vQtI+wCuBJ3PRfcAW0q6pWswMYDrpFBj5z8mSDiw0NRcQcG8h5k2SphRijgb6gIdG\nMhgzMzMbe02fyZH0j6R1Nz3AHwAfBo4Ajs7XsTmXtCZnLWn25vPAI6RFwUTEc5KuAi6StJ60pucS\n4O6IWJFjHpa0DLgy79waB1wKdOWdVQC3kJKZaySdBewFnA8siogXR/llMDMzswZrepJDOo30VVJS\n0Qc8CBwdEbdLmgC8GTgRmAw8QUpu/r6UeMwHtgKLSTNANwOnlY7zIWARaVfVthx7Zq0yIrZJOg74\nEmln10bgalKSZWZmZm2m6UlORHx0gLpNwJ/vqL4Qtxk4Iz92FPMsMG+Qdh4DjhvseGZmZtb6WnVN\njpmZmdmIOMkxMzOzSnKSY2ZmZpXkJMfMzMwqyUmOmZmZVZKTHDMzM6skJzlmZmZWSU5yzMzMrJKc\n5JiZmVklOckxMzOzSnKSY2ZmZpXkJMfMzMwqyUmOmZmZVZKTHDMzM6skJzlmZmZWSU5yzMzMrJKc\n5JiZmVklOckxMzOzSnKSY2ZmZpXkJMfMzMwqyUmOmZmZVZKTHDMzM6skJzlmZmZWSU5yzMzMrJKc\n5JiZmVklOckxMzOzSnKSY2ZmZpXkJMfMzMwqqelJjqSPS3pAUl9+3CPpz0sx50l6QtLzkr4vab9S\n/XhJl0nqlbRB0mJJe5ZiXiHpunyM9ZK+LGn3Usy+kpZI2ihpraQLJTX9NTIzM7Pha4Uv8MeAs4BZ\nQAdwO/AtSTMBJJ0FnA58DDgE2AgskzSu0MbFwLHA8cAcYG/gxtJxrgdmAnNz7BzgilplTmaWArsA\ns4GTgJOB8xo2UjMzMxszTU9yImJJRNwcEb+IiJ9HxNnAb0iJBsCZwPkR8d2I+ClwIimJeS+ApInA\nR4D5EXFnRNwPnAL8maRDcsxM4BjgLyPiJxFxD3AGcIKkafk4xwD7Ax+OiFURsQw4BzhN0i6j/0qY\nmZlZIzU9ySmStJOkE4DdgHskvRqYBtxWi4mI54B7gcNy0UGk2ZdizGqgpxAzG1ifE6CaW4EADi3E\nrIqI3kLMMmAS8IaGDNDMzMzGTEskOZLeKGkDsBm4HHhfTlSmkRKRdaUfWZfrAKYCL+TkZ0cx04Cn\nipURsRV4phTT33EoxJiZmVmbaJXTMA8DB5BmTd4PfE3SnOZ2aXjmz5/PpEmTtivr7Oyks7OzST2y\nKuru7m5IO1OmTGH69OkNa
cvMbCi6urro6urarqyvr29Uj9kSSU5EbAEezU/vz2tpzgQuBESarSnO\nskwFaqee1gLjJE0szeZMzXW1mPJuq52BPUoxB5e6NrVQN6CFCxcya9aswcLM6vQksBPz5s1rSGsT\nJuzG6tXdTnTMbMz094v/ypUr6ejoGLVjtkSS04+dgPERsUbSWtKOqAfhtwuNDwUuy7H3AVtyzE05\nZgYwHVieY5YDkyUdWFiXM5eUQN1biPmspCmFdTlHA33AQ6MySrMhexbYBlxL2iQ4Et1s2jSP3t5e\nJzlmVmlNT3Ik/SPwPdJC4T8APgwcQUowIG0PP1vSz4FfAucDjwPfgrQQWdJVwEWS1gMbgEuAuyNi\nRY55WNIy4EpJpwLjgEuBroiozdLcQkpmrsnb1vfKx1oUES+O4ktgNgwzSVdbMDOzwTQ9ySGdRvoq\nKanoI83YHB0RtwNExIWSdiNd02YycBfwjoh4odDGfGArsBgYD9wMnFY6zoeARaRdVdty7Jm1yojY\nJuk44EvAPaTr8VwNnNvAsZqZmdkYaXqSExEfHULMAmDBAPWbSde9OWOAmGeBARc0RMRjwHGD9cfM\nzMxaX9OTHLNW0tPTQ29v7+CBg2jULigzM6ufkxyzrKenhxkzZrJp0/PN7oqZmTWAkxyzrLe3Nyc4\njdjBtJR0VxAzM2sWJzlmv6cRO5h8usrMrNla4rYOZmZmZo3mJMfMzMwqyUmOmZmZVZKTHDMzM6sk\nJzlmZmZWSU5yzMzMrJKc5JiZmVklOckxMzOzSnKSY2ZmZpXkJMfMzMwqyUmOmZmZVZKTHDMzM6sk\nJzlmZmZWSU5yzMzMrJKc5JiZmVklOckxMzOzSnKSY2ZmZpXkJMfMzMwqyUmOmZmZVZKTHDMzM6sk\nJzlmZmZWSU5yzMzMrJKc5JiZmVklNT3JkfQZSSskPSdpnaSbJL2+FPMVSdtKj6WlmPGSLpPUK2mD\npMWS9izFvELSdZL6JK2X9GVJu5di9pW0RNJGSWslXSip6a+TmZmZDU8rfHkfDlwKHAq8HdgVuEXS\ny0px3wOmAtPyo7NUfzFwLHA8MAfYG7ixFHM9MBOYm2PnAFfUKnMysxTYBZgNnAScDJw3gvGZmZlZ\nE+zS7A5ExDuLzyWdDDwFdAA/LFRtjoin+2tD0kTgI8AJEXFnLjsF6JZ0SESskDQTOAboiIj7c8wZ\nwBJJn4yItbl+f+CoiOgFVkk6B7hA0oKI2NK4kZuZmdloaoWZnLLJQADPlMqPzKezHpZ0uaQ9CnUd\npITttlpBRKwGeoDDctFsYH0twcluzcc6tBCzKic4NcuAScAbRjYsMzMzG0stleRIEum00w8j4qFC\n1feAE4G3AZ8CjgCW5nhIp69eiIjnSk2uy3W1mKeKlRGxlZRMFWPW9dMGhRgzMzNrA00/XVVyOfAn\nwJ8VCyPihsLTn0laBfwCOBK4Y8x6Z2ZmZm2jZZIcSYuAdwKHR8STA8VGxBpJvcB+pCRnLTBO0sTS\nbM7UXEf+s7zbamdgj1LMwaXDTS3U7dD8+fOZNGnSdmWdnZ10dpbXR5uZmb30dHV10dXVtV1ZX1/f\nqB6zJZKcnOC8BzgiInqGEL8P8EqglgzdB2wh7Zq6KcfMAKYDy3PMcmCypAML63LmAgLuLcR8VtKU\nwrqco4E+oHj67PcsXLiQWbNmDdZ1MzOzl6T+fvFfuXIlHR0do3bMpic5ki4nbQd/N7BRUm3mpC8i\nNuXr2JxL2g6+ljR783ngEdKiYCLiOUlXARdJWg9sAC4B7o6IFTnmYUnLgCslnQqMI21d78o7qwBu\nISUz10g6C9gLOB9YFBEvjuoLYWZmZg3V9CQH+Dhph9MPSuWnAF8DtgJvJi08ngw8QUpu/r6UeMzP\nsYuB8cDNwGmlNj8ELCLtqtqWY8+sVUbENknHAV8C7gE2AleTkiwzMzNrI01PciJiwB1eEbE
J+PMh\ntLMZOCM/dhTzLDBvkHYeA44b7HhmZmbW2uraQi7pLyRNaHRnzMzMzBql3uvkLATWSrpC0iGN7JCZ\nmZlZI9Sb5OwN/BWwD3C3pJ9K+jtJr2pc18zMzMzqV1eSExEvRMS/R8SxpG3a1wB/CTwu6RuSji1c\njdjMzMxszI34tg75wn23ki7KF8BBQBfwX5IOH2n7ZmZmZvWoO8mRNEXS/5H0AHA36WrC7wX+CPhD\n4JukLeBmZmZmY66uLeSSbiLdgmEN8GXgqxHxdCFkg6QLgb8deRfNzMzMhq/e6+Q8B7w9Iu4aIOZp\n4HV1tm9mZmY2InUlORFx0hBignSncDMzM7MxV+/FABdKKt8yAUmnSfriyLtlZmZmNjL1Ljz+36R7\nO5X9CPhg/d0xMzMza4x6k5wppHU5ZX25zszMzKyp6k1yfgEc00/5MaQdV2ZmZmZNVe/uqouBiyW9\nErg9l80FPgV8shEdMzMzMxuJendXXZnvQv5Z4HO5+HHgExHxr43qnJmZmVm96p3JISIuBS6VtBfw\nPxHxbOO6ZWZmZjYydSc5NfneVWZmZmYtpd7r5LxK0lck9UjaJOmF4qPRnTQzMzMbrnpncq4GXgt8\nAXiSdPdxMzMzs5ZRb5IzB5gTEfc3sjNmZmZmjVLvdXIex7M3ZmZm1sLqTXLmA/8kaZ9GdsbMzMys\nUeo9XXUN8AfAf0t6DnixWBkRe460Y2ZmZmYjUW+S8+mG9sLMzMysweq94vFVje6ImZmZWSPVuyYH\nSX8saYGkayTtmcuOljSzcd0zMzMzq0+9FwM8HPgZcATwAeDluaoDOK8xXTMzMzOrX70zOZ8HFkTE\nUUDxCse3AbNH3CszMzOzEao3yXkzsLif8qeAVw2nIUmfkbRC0nOS1km6SdLr+4k7T9ITkp6X9H1J\n+5Xqx0u6TFKvpA2SFtdOoxViXiHpOkl9ktZL+rKk3Usx+0paImmjpLWSLpRU92k9MzMza456v7z7\ngGn9lB8A/GqYbR0OXAocCrwd2BW4RdLLagGSzgJOBz4GHAJsBJZJGldo52LgWOB40hWZ9wZuLB3r\nemAmMDfHzgGuKBxnJ2ApaUH2bOAk4GR8Cs7MzKzt1LuF/OvABZLeT77ysaRDgS8C1w6noYh4Z/G5\npJNJM0IdwA9z8ZnA+RHx3RxzIrAOeC9wg6SJwEeAEyLizhxzCtAt6ZCIWJEXRB8DdNRuRyHpDGCJ\npE9GxNpcvz9wVET0AqsknZPHuiAitgxnbGZmZtY89c7kfAZ4FHiCtOj4IeAe4MfA+SPs02RS4vQM\ngKRXk2aNbqsFRMRzwL3AYbnoIFLCVoxZDfQUYmYD60v327o1H+vQQsyqnODULAMmAW8Y4bjMzMxs\nDNV7nZzNwCmSzgPeREp0VkbEwyPpjCSRTjv9MCIeysXTSInIulL4On53ymwq8EJOfnYUM400Q1Qc\nx1ZJz5Ri+jtOre6BYQ3IzMzMmqbe01UARMQaYE2D+gJwOfAnwJ81sE0zMzN7CaoryZH0LwPVR8TH\n6mhzEfBO4PCIeLJQtRYQabamOMsyFbi/EDNO0sTSbM7UXFeLKe+22hnYoxRzcKlrUwt1OzR//nwm\nTZq0XVlnZyednZ0D/ZiZmdlLQldXF11dXduV9fX1jeox653J2av0fFfSmpU/AP5juI3lBOc9wBER\n0VOsi4g1ktaSdkQ9mOMnktbRXJbD7gO25JibcswMYDqwPMcsByZLOrCwLmcuKYG6txDzWUlTCuty\njibtJqudPuvXwoULmTVr1nCHbmZm9pLQ3y/+K1eupKOjY9SOWe+anHeVyyTtAvwzgyQD/fzc5UAn\n8G5go6TazElfRGzKf78YOFvSz4FfkhY3Pw58K/fnOUlXARdJWg9sAC4B7o6IFTnmYUnLgCslnQqM\nI21d78o7qwBuyf2/Jm9b3ysfa1FEbHendTMzM2ttI1q
TUxQRWyR9AfgBcNEwfvTjpIXFPyiVnwJ8\nLbd9oaTdSNe0mQzcBbwjIopXW54PbCVdpHA8cDNwWqnNDwGLSLuqtuXYMwtj2CbpOOBLpN1iG4Gr\ngXOHMR4zMzNrAQ1LcrJXk05dDVlEDGkbe0QsABYMUL8ZOCM/dhTzLDBvkOM8Bhw3lD6ZmZlZ66p3\n4fGF5SLSqZ13M8yLAZqZmZmNhnpncg4rPd8GPA18GrhyRD0yMzMza4B6Fx4f3uiOmJmZmTWS765t\nZmZmlVTvmpwfk2/MOZiIOKSeY5iZmZmNRL1rcu4A/hp4hN9dbG82MIO0zXvzyLtmZmZmVr96k5zJ\nwGUR8dlioaR/AKZGxEdH3DMzMzOzEah3Tc4HgK/0U3418L/r7o2ZmZlZg9Sb5GwmnZ4qm41PVZmZ\nmVkLqPd01SXAFZIOBFbkskOBvwL+qREdMzMzMxuJeq+T8w+S1pDu+1Rbf9MNfCwirm9U58zMzMzq\nVfe9q3Iy44TGzMzMWlLdFwOUNFHSyZLOk/SKXHaApL0a1z0zMzOz+tR7McA3ArcCzwP7knZVrQc+\nCPwhcFKD+mdmZmZWl3pnchaSTlW9FthUKF8CzBlpp8zMzMxGqt4k52Dg8ogo39rhV4BPV5mZmVnT\n1bvw+EXg5f2U7wf01t8ds+Hp6emht7cx/+S6u7sb0o6ZmbWGepOc7wDnSPpgfh6S/hC4APhGQ3pm\nNoienh5mzJjJpk3PN7srZmbWgupNcv6OlMysBV4G3A7sDfwY+OwAP2fWML29vTnBuRaY2YAWlwLn\nNKAdMzNrBfVeDHA9cJSkI4ADSKeuVgLL+lmnYzbKZgKzGtCOT1eZmVXJsJMcSbsC3wVOj4g7gTsb\n3iszMzOzERr27qqIeBHoADxjY2ZmZi2r3i3k1wGnNLIjZmZmZo1U78LjAE6X9HbgJ8DG7SojPjXS\njpmZmZmNRL1JTgfwYP77m0t1Po1lZmZmTTesJEfSa4A1EXH4KPXHzMzMrCGGuybnv4BX1Z5I+rqk\nqY3tkpmZmdnIDTfJUen5O4HdG9QXMzMzs4apd3dVQ0k6XNK3Jf1K0jZJ7y7VfyWXFx9LSzHjJV0m\nqVfSBkmLJe1ZinmFpOsk9UlaL+nLknYvxewraYmkjZLWSrpQUku8TmZmZjZ0w/3yDn5/YXEjFhrv\nDvwn8DcDtPc9YCowLT86S/UXA8cCxwNzSLeZuLEUcz3p8rhzc+wc4IpaZU5mlpLWKs0GTgJOBs6r\na1RmZmbWNMPdXSXgakmb8/MJwD9LKm8h/3+G02hE3AzcDCCpfEqsZnNEPN1vp6SJwEeAE/JVmJF0\nCtAt6ZCIWCFpJnAM0BER9+eYM4Alkj4ZEWtz/f7AURHRC6ySdA5wgaQFEbFlOOMyMzOz5hnuTM5X\ngaeAvvy4Fnii8Lz2GA1HSlon6WFJl0vao1DXQUrYbqsVRMRqoAc4LBfNBtbXEpzsVtLM0aGFmFU5\nwalZBkwC3tDQ0ZiZmdmoGtZMTkQ06yrH3yOdeloDvBb4J2CppMPyDUGnAS9ExHOln1uX68h/PlWs\njIitkp4pxazrp41a3QMNGIuZmZmNgXovBjimIuKGwtOfSVoF/AI4ErijKZ0qmT9/PpMmTdqurLOz\nk87O8tIhMzOzl56uri66urq2K+vrG62TP0lbJDllEbFGUi+wHynJWQuMkzSxNJszNdeR/yzvttoZ\n2KMUc3DpcFMLdTu0cOFCZs2aNdyhmJmZvST094v/ypUr6ejoGLVjtuXWaEn7AK8EnsxF9wFbSLum\najEzgOnA8ly0HJgs6cBCU3NJi6nvLcS8SdKUQszRpHVGDzV4GGZmZjaKWmImJ1+rZj9+d7HB10g6\nAHgmP84lrclZm+M+DzxCWhRMRDwn6SrgIknrgQ3AJcDdEbEixzwsaRlwpaRTgXHApUBX3lkFcAsp\nmblG0lnAXsD5wKK
IeHE0XwMzMzNrrJZIcoCDSKedatfh+WIu/yrp2jlvBk4EJpN2cy0D/r6UeMwH\ntgKLgfGkLemnlY7zIWARaVfVthx7Zq0yIrZJOg74EnAP6e7qV5OSLLNK6e7ublhbU6ZMYfr06Q1r\nz8ysEVoiycnXthno1NmfD6GNzcAZ+bGjmGeBeYO08xhw3GDHM2tfTwI7MW/egP8VhmXChN1Yvbrb\niY6ZtZSWSHLMbCw9S5rIvJZ0AfCR6mbTpnn09vY6yTGzluIkx+wlaybgHYFmVl1tubvKzMzMbDBO\ncszMzKySnOSYmZlZJTnJMTMzs0pykmNmZmaV5CTHzMzMKslJjpmZmVWSkxwzMzOrJCc5ZmZmVklO\ncszMzKySnOSYmZlZJTnJMTMzs0pykmNmZmaV5CTHzMzMKslJjpmZmVWSkxwzMzOrJCc5ZmZmVklO\ncszMzKySnOSYmZlZJTnJMTMzs0pykmNmZmaV5CTHzMzMKslJjpmZmVWSkxwzMzOrJCc5ZmZmVklO\ncszMzKySWiLJkXS4pG9L+pWkbZLe3U/MeZKekPS8pO9L2q9UP17SZZJ6JW2QtFjSnqWYV0i6TlKf\npPWSvixp91LMvpKWSNooaa2kCyW1xOtkZmZmQ9cqX967A/8J/A0Q5UpJZwGnAx8DDgE2AsskjSuE\nXQwcCxwPzAH2Bm4sNXU9MBOYm2PnAFcUjrMTsBTYBZgNnAScDJw3wvGZmZnZGNul2R0AiIibgZsB\nJKmfkDOB8yPiuznmRGAd8F7gBkkTgY8AJ0TEnTnmFKBb0iERsULSTOAYoCMi7s8xZwBLJH0yItbm\n+v2BoyKiF1gl6RzgAkkLImLLqL0IZmZm1lCtMpOzQ5JeDUwDbquVRcRzwL3AYbnoIFLCVoxZDfQU\nYmYD62sJTnYraebo0ELMqpzg1CwDJgFvaNCQzMzMbAy0fJJDSnCCNHNTtC7XAUwFXsjJz45ipgFP\nFSsjYivwTCmmv+NQiDEzM7M20BKnq6pg/vz5TJo0abuyzs5OOjs7m9QjMzOz1tHV1UVXV9d2ZX19\nfaN6zHZIctYCIs3WFGdZpgL3F2LGSZpYms2ZmutqMeXdVjsDe5RiDi4df2qhbocWLlzIrFmzBh2M\nmZnZS1F/v/ivXLmSjo6OUTtmy5+uiog1pARjbq0sLzQ+FLgnF90HbCnFzACmA8tz0XJgsqQDC83P\nJSVQ9xZi3iRpSiHmaKAPeKhBQzIzM7Mx0BIzOflaNfuREg6A10g6AHgmIh4jbQ8/W9LPgV8C5wOP\nA9+CtBCdV2t+AAAWK0lEQVRZ0lXARZLWAxuAS4C7I2JFjnlY0jLgSkmnAuOAS4GuvLMK4BZSMnNN\n3ra+Vz7Wooh4cVRfBDMzM2uolkhySLuj7iAtMA7gi7n8q8BHIuJCSbuRrmkzGbgLeEdEvFBoYz6w\nFVgMjCdtST+tdJwPAYtIu6q25dgza5URsU3SccCXSLNEG4GrgXMbNVAzMzMbGy2R5ORr2wx46iwi\nFgALBqjfDJyRHzuKeRaYN8hxHgOOGyjGzMzMWl/Lr8kxMzMzq4eTHDMzM6skJzlmZmZWSU5yzMzM\nrJKc5JiZmVklOckxMzOzSnKSY2ZmZpXkJMfMzMwqyUmOmZmZVZKTHDMzM6skJzlmZmZWSU5yzMzM\nrJKc5JiZmVklOckxMzOzSnKSY2ZmZpXkJMfMzMwqaZdmd8DMqqG7u7thbU2ZMoXp06c3rD0ze2ly\nkmNmI/QksBPz5s1rWIsTJuzG6tXdTnTMbESc5JjZCD0LbAOuBWY2oL1uNm2aR29vr5McMxsRJzlm\n1iAzgVnN7oSZ2W954bGZmZlVkpMcMzMzqyQnOWZmZlZJTnLMzMyskpzkmJmZWSU5yTEzM7NKcpJj\nZmZmldQWSY6kcyVtKz0eKsWcJ+kJSc9L+r6k/Ur14yVdJqlX0gZJiyXtWYp5haTrJ
PVJWi/py5J2\nH4sxmpmZWWO1RZKT/RSYCkzLj7fWKiSdBZwOfAw4BNgILJM0rvDzFwPHAscDc4C9gRtLx7iedEWz\nuTl2DnDFKIzFzMzMRlk7XfF4S0Q8vYO6M4HzI+K7AJJOBNYB7wVukDQR+AhwQkTcmWNOAbolHRIR\nKyTNBI4BOiLi/hxzBrBE0icjYu2ojs7MzMwaqp1mcl4n6VeSfiHpWkn7Akh6NWlm57ZaYEQ8B9wL\nHJaLDiIldMWY1UBPIWY2sL6W4GS3AgEcOjpDMjMzs9HSLknOj4CTSTMtHwdeDfxHXi8zjZSIrCv9\nzLpcB+k01ws5+dlRzDTgqWJlRGwFninEmJmZWZtoi9NVEbGs8PSnklYA/w18AHi4Ob0yMzOzVtYW\nSU5ZRPRJegTYD/gBINJsTXE2ZypQO/W0FhgnaWJpNmdqrqvFlHdb7QzsUYjZofnz5zNp0qTtyjo7\nO+ns7BziqMzMzKqrq6uLrq6u7cr6+vpG9ZhtmeRIejkpwflqRKyRtJa0I+rBXD+RtI7msvwj9wFb\ncsxNOWYGMB1YnmOWA5MlHVhYlzOXlEDdO1ifFi5cyKxZsxowOjMzs+rp7xf/lStX0tHRMWrHbIsk\nR9IXgO+QTlH9IfA54EXg33LIxcDZkn4O/BI4H3gc+BakhciSrgIukrQe2ABcAtwdEStyzMOSlgFX\nSjoVGAdcCnR5Z5WZmVn7aYskB9iHdA2bVwJPAz8EZkfErwEi4kJJu5GuaTMZuAt4R0S8UGhjPrAV\nWAyMB24GTisd50PAItKuqm059sxRGpOZmZmNorZIciJi0IUtEbEAWDBA/WbgjPzYUcyzwLzh99CG\nqqenh97e3oa01d3d3ZB2zMysmtoiybFq6OnpYcaMmWza9Hyzu2JmZi8BTnJszPT29uYE51rS3TNG\nailwTgPaMTOzKnKSY00wE2jETjSfrjIzsx1rlysem5mZmQ2LkxwzMzOrJCc5ZmZmVklOcszMzKyS\nnOSYmZlZJTnJMTMzs0pykmNmZmaV5CTHzMzMKslJjpmZmVWSkxwzMzOrJCc5ZmZmVklOcszMzKyS\nfINOM2tJ3d2NuQHrlClTmD59ekPaMrP24iTHzFrMk8BOzJs3ryGtTZiwG6tXdzvRMXsJcpJjZi3m\nWWAbcC0wc4RtdbNp0zx6e3ud5Ji9BDnJMbMWNROY1exOmFkb88JjMzMzqyQnOWZmZlZJTnLMzMys\nkpzkmJmZWSU5yTEzM7NKcpJjZmZmleQt5GZWeY26ejL4Cspm7cRJjplVWGOvngy+grJZO3GSY4Pq\n6emht7d3xO008rdps6Fp5NWTwVdQNmsvTnL6Iek04JPANOAB4IyI+HFzezV2urq66OzsBFKCM2PG\nTDZter7JvapXF9DZ7E400M1U6yrAY/X+jP7Vk4v/b6rA42ldVRrLaPPC4xJJHwS+CJwLHEhKcpZJ\nmtLUjo2hrq6u3/69t7c3JzjXAveN8HH+mI3hd7oGD2kry5rdgQarzvtT/H9TBR5P66rSWEabZ3J+\n33zgioj4GoCkjwPHAh8BLmxmx5qrEb8J+3SVVUN/p177+vpYuXLlsNvyQmaz0eMkp0DSrkAH8I+1\nsogISbcChzWtY2bWIgZeyNzR0THsFr2Q2Wz0OMnZ3hRgZ2BdqXwdMGPsuzN8q1ev5q/+6q/YvHlz\n3W088sgjHHrooQBMnjy5UV0zq4CBFjLPBxYOs720kPmuu+5i5syRL4zevHkz48ePH3E7kGamli9f\n3rD2oLH9G25bg820NbNvw22v3lnDmpfS7KGTnJGbAK2zc2jRokXcddddI25nxYoVpZKljPx0090N\nbGuo7T0OXNfA9oZqtMa6jqGPZ7C2xvJ92JH+3p92eB/W9FO3oY5j3A+ogVvcdyIlYY3xlre8taHt\nNbZ/w29r4Jm25vZtuO3VM2tYM27cBL7xjcXst
ddeI+zXyBW+OyeMRvuKiNFoty3l01XPA8dHxLcL\n5VcDkyLiff38zIcY+beOmZnZS9mHI+L6RjfqmZyCiHhR0n3AXODbAJKUn1+ygx9bBnwY+CWwaQy6\naWZmVhUTgD9mlLaOeianRNIHgKuBjwMrSCfa3w/sHxFPN7FrZmZmNgyeySmJiBvyNXHOA6YC/wkc\n4wTHzMysvXgmx8zMzCrJVzw2MzOzSnKSY2ZmZpXkJGcIJH1G0gpJz0laJ+kmSa/vJ+48SU9Iel7S\n9yXt14z+DoekT0vaJumiUnnbjEXS3pKukdSb+/uApFmlmLYYj6SdJJ0v6dHc159LOrufuJYcj6TD\nJX1b0q/yv6t39xMzYN8ljZd0WX4/N0haLGnPsRvFb/uxw7FI2kXS5yU9KOk3OearkvYqtdESY8l9\nGfS9KcT+c475RKm8rcYjaaakb0l6Nr9P90rap1DfEuMZbCySdpe0SNJj+f/NzyT9dSmmJcaS+9KQ\n78xGjMlJztAcDlwKHAq8HdgVuEXSy2oBks4CTgc+BhwCbCTd2HPc2Hd3aCQdTOrvA6XythmLpMmk\nK7RtBo4hXYb274D1hZi2GQ/waeCvgb8B9gc+BXxK0um1gBYfz+6kxfp/A/zegr8h9v1i0v3ijgfm\nAHsDN45ut/s10Fh2A/4U+BzpRr7vI10V/VuluFYZCwzy3tRIeh/ps+5X/VS3zXgkvRa4C3iI1Nc3\nke4SXLzUR6uMZ7D3ZiFwNPAh0ufCQmCRpOMKMa0yFmjcd+bIxxQRfgzzQbr9wzbgrYWyJ4D5hecT\ngf8BPtDs/u5gDC8HVgNvA+4ALmrHsQAXAHcOEtNO4/kOcGWpbDHwtXYbT/4/8u7hvBf5+WbgfYWY\nGbmtQ1ppLP3EHARsBfZp5bEMNB7gD4Ee0i8La4BPlN6rthkP6Rb3Xx3gZ1pyPDsYyyrg/y2V/QQ4\nr5XHUujLsL8zGzUmz+TUZzIp234GQNKrgWnAbbWAiHgOuJfWvbHnZcB3IuL2YmEbjuVdwE8k3ZCn\nRVdK+mitsg3Hcw8wV9LrACQdAPwZ6b4E7Tie3xpi3w8iXdqiGLOa9MXb0uPjd58Lz+bnHbTRWCQJ\n+BpwYUT0d3+KthlPHsuxwH9Jujl/NvxI0nsKYW0zHtLnwrsl7Q0g6SjgdfzuAnqtPpZ6vjMb8lng\nJGeY8n+ei4EfRsRDuXga6Q3s78ae08awe0Mi6QTSVPtn+qluq7EArwFOJc1KHQ18CbhE0l/k+nYb\nzwXA14GHJb0A3AdcHBH/luvbbTxFQ+n7VOCF/IG3o5iWI2k86b27PiJ+k4un0V5j+TSpv4t2UN9O\n49mTNFt9FukXhP8F3AR8Q9LhOaadxnMG6cZoj+fPhaXAaRFRu5lay45lBN+ZDfks8MUAh+9y4E9I\nv123nbzo7mLg7RHxYrP70wA7ASsi4pz8/AFJbyRdsfqa5nWrbh8knXc/gbSW4E+B/0/SExHRjuOp\nPEm7AP9O+tD+myZ3py6SOoBPkNYXVUHtF/hvRkTtljwPSnoL6bNh5HcxHlufIK1vOY40kzEHuDx/\nLtw+4E82X1O/Mz2TMwySFgHvBI6MiCcLVWsBkTLPoqm5rpV0AK8CVkp6UdKLwBHAmfk3hHW0z1gA\nnuT3b/3cDUzPf2+n9wbgQuCCiPj3iPhZRFxHWmRYm3Vrt/EUDaXva4FxkiYOENMyCgnOvsDRhVkc\naK+xvJX0ufBY4XPhj4CLJD2aY9ppPL3AFgb/bGj58UiaAPwD8LcRsTQifhoRl5NmfD+Zw1pyLCP8\nzmzImJzkDFF+s94DHBURPcW6iFhDetHnFuInkjLve8ayn0NwK2mXwZ8CB+THT4BrgQMi4lHaZyyQ\ndlbNKJXNAP4b2u69gbRrZ2upbBv5/2objue3htj3+0hfTsWYGaQvpuVj1tkhKCQ4rwHmRsT6Ukjb\njIW0FufN/
O4z4QDSwtALSbsWoY3Gk2epf8zvfza8nvzZQPuMZ9f8KH8ubOV33+EtN5YGfGc2ZkzN\nXnXdDg/SdNt60ra4qYXHhELMp4BfkxbCvgn4JvBfwLhm938I4yvvrmqbsZAWp20mzXS8lnSqZwNw\nQpuO5yuk6eh3kn6Tfh/wFPCP7TAe0lbYA0hJ9Dbg/+Tn+w617/n/2xrgSNLM493AXa00FtKp/m+R\nvjDfVPpc2LXVxjKU96af+O12V7XbeID3kraLfzR/NpwOvAAc1mrjGcJY7gAeJM26/zFwMvA88LFW\nG0uhLyP+zmzEmMZ88O34yP/otvbzOLEUt4D028/zpFXv+zW770Mc3+0Ukpx2GwspIXgw9/VnwEf6\niWmL8eQPu4vyf+yN+T/954Bd2mE8+UO4v/8v/zrUvgPjSdfY6CUlrP8O7NlKYyEloOW62vM5rTaW\nob43pfhH+f0kp63GQ0oGHsn/l1YCx7XieAYbC2kh9VXAY3ksDwFntuJYcl8a8p3ZiDH5Bp1mZmZW\nSV6TY2ZmZpXkJMfMzMwqyUmOmZmZVZKTHDMzM6skJzlmZmZWSU5yzMzMrJKc5JiZmVklOckxMzOz\nSnKSY2ZmZpXkJMesgiStkfSJJhz3LZIelPSCpG+M9fH7MxqvhaRzJa1sZJtm1nhOcsxajKSvSNom\naaukzZL+S9I5kobz//Ug4F+Gccwj8jEnDr/H27mIdI+gPyLdN6ghJB2T+7dnqfxJSY+Wyv4oxx6V\ni4b1WgzRFyjcHXk05ORs2w4eWyX962ge36wKdml2B8ysX98jJQkTgHeQ7sa7GbhwKD8cEb8e5vEE\nRP5zJF4LfCkinqy3AUm7RsSLpeIfAi+S7kZ8Q47bn/T6TJA0PSJ6cuzbSHefvhvqei0GFRHPk24q\nOJoOAnbOf/8zYDHwetKNCgH+Z5SPb9b2PJNj1po2R8TTEfFYRPwLcCvwnlqlpOMl/VTSpvwb/98W\nf7h8iib/9v+Xkr4haaOkRyS9K9f9EelO9ADri7MEkt6fTz89L6lX0i2SXlbubG32BNgD+Epu48Rc\nd4Ske3Nfn5D0T8VZKUl3SLpU0kJJTwM3l9uPiI3AT0hJTs2RwF2kZKZYfgTwo4h4YbivRaG/2yS9\nTdKPc8zdkl5fiDlX0v2F51+RdJOkv8tj7JW0SNLOhZhpkpbk1/Lnkj4w0Km0iPh1RDwVEU8Bz+Ti\np2tlEbGh8NovlvRsPu6NkvYpHPcwSbfmuvX5728q1I/P4z1F0vfyeFdJ6pA0Q9Jdkn4j6T8k7dtf\nX81alZMcs/awCRgHIKkD+DpwPfBG4Fzg/FpSMYC/B/4NeBOwFLhO0mTgMeD4HPM6YC/gTEnT8jG+\nDOxPSh6+Qf+zPT3ANNIswydyG1+XtDewBLgXeDPwceAvgbNLP38iaabqLTmmP3cARxWeHwX8APiP\nUvmROXYgO3otiv4vMB/oALYAV5Xqo/T8KOA1+fgnkmbiTi7UX0N6jeYA7wdOBV41SD8HJGkcKQFe\nCxwGHE6a8VoiqfY+vRy4EphNen0fB5ZKGl9q7u+BfwYOIL2f1wGX5fKDgZcBF4+kv2ZjLiL88MOP\nFnoAXwG+UXj+dtKpiQvy82uBm0s/83lgVeH5GuAThefbgAWF57vlsqPz8yOArcDEQsyBuWzfYfR9\nPXBi4fk/AA+VYk4F+grP7wB+MoS25+b+TM3P15ISkNnAmlz2mjyutzbgtTiyEPOOXDYuPz8XWFl6\nzx4FVCj7OnB9/vv++RgHFupfm8s+MYSx/977k8v/stiPXPYyUsL41h20tSvpVNvb8vPxuR+fLh1v\nG/DBQtlJwDPN/v/hhx/DeXgmx6w1vUvSBkmbSDMhXcDnct1M8nqTgruB1xV+e+/PqtpfIq0peQ7Y\nc8fhPADcBvxU0g2SPtrPbMdg9geW99PXlxdPqQD3DaGte8jrciTNJK3HWUk
6jTUln3Y7kvQF/qNB\n2hrKa7Gq8PfaGqOBXq+fRURxdufJQvzrgRcj4renuCLiF6SkcCQOAN6Y/61skLQBeIq0lue1AJL2\nkvSvSgvY+0invsYB00ttFce7jjRT9dNS2SRJXstpbcP/WM1a0+2k0zYvAk9ExLYGtFlezBsMcMo6\nH/NoSYcBRwNnAP9X0qER8d8N6E/RxsECIuJ/JK0gnRZ6JfDDnFRskXQPacHxkcDdEbFlkOaG8lq8\nWKqnn5jhttloLyclf6fw+6cRn8p/dpFmb04jnZrcDNxPPv1Z0N94h/samLUU/2M1a00bI2JNRDze\nT4LTTdptU/RW4JHSTMJwvJD/3LlcERHLI+JzpNNXLwLvG0a73aS1IkVvBTZExON19LO2LudI0nqc\nmrty2REMvh6nGVYDu0g6sFYgaT/gFSNsdyUwA1gbEY+WHr/JMYcBF0XELRHRTfrc/4MRHtesLTjJ\nMWs/XwTmSjpb0usknUT6Lf0LI2jzv0m/qb9L0hRJu0s6RNJn8i6bfUmLk6cADw2j3cuBffPuqRmS\n3gMsyGOoxx2kxdFHA3cWyu8E3gvsQ2OSnP5O+9W9vT4iVpNO/V0p6eCc7FxBOrU21MS0v+N/lTQL\n9k2lCzH+cd4VtkjSlBzzc+AkSa+X9BbgatJC9nqOZ9ZWnOSYtZm8ruMDwAdJ6ygWAGdHxDXFsPKP\n9ddUoc0nSItpLyAt6L0U6CPtBFpCmok4D/jbiLhloO6V+voE8E7S7pz/JCU9V5IWJA/Utx1ZTjrd\nAtuv47mXdEpmA/Djgfq0g+PVEzNcf0F6be8EbiS9Dr9haAlHv8ePtI38cNJ6mW+SEtB/JiUotVOA\nJ5J2u/0naafc54FnB2t7B2VmbUX1z26bmVm98sLrHmBuRLTiKTaztuckx8xsDCjdZuLlpNm3vUlX\nr54GzIiIrc3sm1lVeXeVmdnY2BX4R+DVpNNqdwOdTnDMRo9ncszMzKySvPDYzMzMKslJjpmZmVWS\nkxwzMzOrJCc5ZmZmVklOcszMzKySnOSYmZlZJTnJMTMzs0pykmNmZmaV9P8D26paXy8mx+gAAAAA\nSUVORK5CYII=\n", 2328 | "text/plain": [ 2329 | "" 2330 | ] 2331 | }, 2332 | "metadata": {}, 2333 | "output_type": "display_data" 2334 | } 2335 | ], 2336 | "source": [ 2337 | "ax = df['Wscore'].plot.hist(bins=20)\n", 2338 | "ax.set_xlabel('Points for Winning Team')" 2339 | ] 2340 | }, 2341 | { 2342 | "cell_type": "markdown", 2343 | "metadata": {}, 2344 | "source": [ 2345 | "# Creating Kaggle Submission CSVs" 2346 | ] 2347 | }, 2348 | { 2349 | "cell_type": "markdown", 2350 | "metadata": {}, 2351 | "source": [ 2352 | "This isn't directly Pandas related, but I assume that most people who use Pandas probably do a lot of Kaggle competitions as well. As you probably know, Kaggle competitions require you to create a CSV of your predictions. 
Here's some starter code that can help you create that CSV file." 2353 | ] 2354 | }, 2355 | { 2356 | "cell_type": "code", 2357 | "execution_count": 40, 2358 | "metadata": { 2359 | "collapsed": false 2360 | }, 2361 | "outputs": [ 2362 | { 2363 | "name": "stdout", 2364 | "output_type": "stream", 2365 | "text": [ 2366 | "[[ 0 10]\n", 2367 | " [ 1 15]\n", 2368 | " [ 2 20]]\n" 2369 | ] 2370 | } 2371 | ], 2372 | "source": [ 2373 | "import numpy as np\n", 2374 | "import csv\n", 2375 | "\n", 2376 | "results = [[0,10],[1,15],[2,20]]\n", 2377 | "results = np.array(results)\n", 2378 | "print(results)" 2379 | ] 2380 | }, 2381 | { 2382 | "cell_type": "code", 2383 | "execution_count": 41, 2384 | "metadata": { 2385 | "collapsed": false 2386 | }, 2387 | "outputs": [], 2388 | "source": [ 2389 | "firstRow = [['id', 'pred']]\n", 2390 | "with open(\"result.csv\", \"wb\") as f:\n", 2391 | "    writer = csv.writer(f)\n", 2392 | "    writer.writerows(firstRow)\n", 2393 | "    writer.writerows(results)" 2394 | ] 2395 | }, 2396 | { 2397 | "cell_type": "markdown", 2398 | "metadata": {}, 2399 | "source": [ 2400 | "The approach described above relies on Python lists and NumPy. If you want a purely Pandas-based approach, take a look at this video: https://www.youtube.com/watch?v=ylRlGCtAtiE&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=22" 2401 | ] 2402 | }, 2403 | { 2404 | "cell_type": "markdown", 2405 | "metadata": {}, 2406 | "source": [ 2407 | "# Other Useful Functions" 2408 | ] 2409 | }, 2410 | { 2411 | "cell_type": "markdown", 2412 | "metadata": {}, 2413 | "source": [ 2414 | "* **drop()** - This function removes the column or row that you pass in (you also have to specify the axis). 
\n", 2415 | "* **agg()** - The aggregate function lets you compute summary statistics about each group.\n", 2416 | "* **apply()** - Lets you apply a function along an axis of a DataFrame or to the values of a Series.\n", 2417 | "* **get_dummies()** - Helpful for turning categorical data into one-hot vectors.\n", 2418 | "* **drop_duplicates()** - Lets you remove identical rows." 2419 | ] 2420 | }, 2421 | { 2422 | "cell_type": "markdown", 2423 | "metadata": { 2424 | "collapsed": true 2425 | }, 2426 | "source": [ 2427 | "# Lots of Other Great Resources" 2428 | ] 2429 | }, 2430 | { 2431 | "cell_type": "markdown", 2432 | "metadata": {}, 2433 | "source": [ 2434 | "Pandas has been around for a while, and there are a lot of other good resources if you're still interested in getting the most out of this library. \n", 2435 | "* http://pandas.pydata.org/pandas-docs/stable/10min.html\n", 2436 | "* https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python\n", 2437 | "* http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures/\n", 2438 | "* https://www.dataquest.io/blog/pandas-python-tutorial/\n", 2439 | "* https://drive.google.com/file/d/0ByIrJAE4KMTtTUtiVExiUGVkRkE/view\n", 2440 | "* https://www.youtube.com/playlist?list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y" 2441 | ] 2442 | } 2443 | ], 2444 | "metadata": { 2445 | "anaconda-cloud": {}, 2446 | "kernelspec": { 2447 | "display_name": "Python [conda root]", 2448 | "language": "python", 2449 | "name": "conda-root-py" 2450 | }, 2451 | "language_info": { 2452 | "codemirror_mode": { 2453 | "name": "ipython", 2454 | "version": 2 2455 | }, 2456 | "file_extension": ".py", 2457 | "mimetype": "text/x-python", 2458 | "name": "python", 2459 | "nbconvert_exporter": "python", 2460 | "pygments_lexer": "ipython2", 2461 | "version": "2.7.12" 2462 | } 2463 | }, 2464 | "nbformat": 4, 2465 | "nbformat_minor": 1 2466 | } 2467 | -------------------------------------------------------------------------------- /README.md: 
-------------------------------------------------------------------------------- 1 | # Pandas-Tutorial 2 | 3 | I've been working with Pandas quite a bit lately, and figured I'd make a short summary of the most important and helpful functions in the library. 4 | 5 | Hopefully it's helpful for you! 6 | 7 | # Lots of Other Great Tutorials 8 | * http://pandas.pydata.org/pandas-docs/stable/10min.html 9 | * https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python 10 | * http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures/ 11 | * https://www.dataquest.io/blog/pandas-python-tutorial/ 12 | * https://drive.google.com/file/d/0ByIrJAE4KMTtTUtiVExiUGVkRkE/view 13 | * https://www.youtube.com/playlist?list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y 14 | -------------------------------------------------------------------------------- /result.csv: -------------------------------------------------------------------------------- 1 | id,pred 2 | 0,10 3 | 1,15 4 | 2,20 5 | --------------------------------------------------------------------------------