├── Task_6.pdf ├── README.md └── Task_5-rev1.ipynb /Task_6.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ssilvacris/visualitzaci-_explorat-ria/main/Task_6.pdf -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # visualitzaci-_explorat-ria 2 | Delivery task 6: Graphic visualization of a dataset 3 | -------------------------------------------------------------------------------- /Task_5-rev1.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Data Science with Python\n", 8 | "\n", 9 | "**Student**: Cristiane de Souza da Silva\n", 10 | "\n", 11 | "### Task 5 submission: Data exploration\n", 12 | "\n", 13 | "#### Description\n", 14 | "\n", 15 | "Get familiar with data exploration techniques using the DataFrame data structure from the Pandas library.\n", 16 | "\n", 17 | "This dataset is composed of the following variables:\n", 18 | "\n", 19 | "1) Year 2008\n", 20 | "\n", 21 | "2) Month 1-12\n", 22 | "\n", 23 | "3) DayofMonth 1-31\n", 24 | "\n", 25 | "4) DayOfWeek 1 (Monday) - 7 (Sunday)\n", 26 | "\n", 27 | "5) DepTime actual departure time (local, hhmm)\n", 28 | "\n", 29 | "6) CRSDepTime scheduled departure time (local, hhmm)\n", 30 | "\n", 31 | "7) ArrTime actual arrival time (local, hhmm)\n", 32 | "\n", 33 | "8) CRSArrTime scheduled arrival time (local, hhmm)\n", 34 | "\n", 35 | "9) UniqueCarrier unique carrier code\n", 36 | "\n", 37 | "10) FlightNum flight number\n", 38 | "\n", 39 | "11) TailNum plane tail number: aircraft registration, unique aircraft identifier\n", 40 | "\n", 41 | "12) ActualElapsedTime in minutes\n", 42 | "\n", 43 | "13) CRSElapsedTime in minutes\n", 44 | "\n", 45 | "14) AirTime in 
minutes\n", 46 | "\n", 47 | "15) ArrDelay arrival delay, in minutes: A flight is counted as \"on time\" if it operated less than 15 minutes later than the scheduled time shown in the carriers' Computerized Reservations Systems (CRS).\n", 48 | "\n", 49 | "16) DepDelay departure delay, in minutes\n", 50 | "\n", 51 | "17) Origin origin IATA airport code\n", 52 | "\n", 53 | "18) Dest destination IATA airport code\n", 54 | "\n", 55 | "19) Distance in miles\n", 56 | "\n", 57 | "20) TaxiIn taxi in time, in minutes\n", 58 | "\n", 59 | "21) TaxiOut taxi out time, in minutes\n", 60 | "\n", 61 | "22) Cancelled was the flight cancelled? (1 = yes, 0 = no)\n", 62 | "\n", 63 | "23) CancellationCode reason for cancellation (A = carrier, B = weather, C = NAS, D = security)\n", 64 | "\n", 65 | "24) Diverted 1 = yes, 0 = no\n", 66 | "\n", 67 | "25) CarrierDelay in minutes: Carrier delay is within the control of the air carrier. Examples of occurrences that may determine carrier delay are: aircraft cleaning, aircraft damage, awaiting the arrival of connecting passengers or crew, baggage, bird strike, cargo loading, catering, computer, outage-carrier equipment, crew legality (pilot or attendant rest), damage by hazardous goods, engineering inspection, fueling, handling disabled passengers, late crew, lavatory servicing, maintenance, oversales, potable water servicing, removal of unruly passenger, slow boarding or seating, stowing carry-on baggage, weight and balance delays.\n", 68 | "\n", 69 | "26) WeatherDelay in minutes: Weather delay is caused by extreme or hazardous weather conditions that are forecast or that occur at the point of departure, en route, or at the point of arrival.\n", 70 | "\n", 71 | "27) NASDelay in minutes: Delay that is within the control of the National Airspace System (NAS) may include: non-extreme weather conditions, airport operations, heavy traffic volume, air traffic control, etc.\n", 72 | "\n", 73 | "28) SecurityDelay in minutes: Security delay is caused by evacuation of a 
terminal or concourse, re-boarding of aircraft because of security breach, inoperative screening equipment and/or long lines in excess of 29 minutes at screening areas.\n", 74 | "\n", 75 | "29) LateAircraftDelay in minutes: Arrival delay at an airport due to the late arrival of the same aircraft at a previous airport. The ripple effect of an earlier delay at downstream airports is referred to as delay propagation." 76 | ] 77 | }, 78 | { 79 | "cell_type": "markdown", 80 | "metadata": {}, 81 | "source": [ 82 | "- **Exercise 1**\n", 83 | "\n", 84 | "Download the Airlines Delay data set (Airline on-time statistics and delay causes) and load it into a pandas DataFrame. Explore the data it contains and keep only the columns you consider relevant.\n", 85 | "\n", 86 | "\n" 87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "execution_count": 1, 92 | "metadata": {}, 93 | "outputs": [ 94 | { 95 | "data": { 96 | "text/html": [ 97 | "
| | Unnamed: 0 | Year | Month | DayofMonth | DayOfWeek | DepTime | CRSDepTime | ArrTime | CRSArrTime | UniqueCarrier | ... | TaxiIn | TaxiOut | Cancelled | CancellationCode | Diverted | CarrierDelay | WeatherDelay | NASDelay | SecurityDelay | LateAircraftDelay |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 2008 | 1 | 3 | 4 | 2003.0 | 1955 | 2211.0 | 2225 | WN | ... | 4.0 | 8.0 | 0 | N | 0 | NaN | NaN | NaN | NaN | NaN |
| 1 | 1 | 2008 | 1 | 3 | 4 | 754.0 | 735 | 1002.0 | 1000 | WN | ... | 5.0 | 10.0 | 0 | N | 0 | NaN | NaN | NaN | NaN | NaN |
| 2 | 2 | 2008 | 1 | 3 | 4 | 628.0 | 620 | 804.0 | 750 | WN | ... | 3.0 | 17.0 | 0 | N | 0 | NaN | NaN | NaN | NaN | NaN |
| 3 | 4 | 2008 | 1 | 3 | 4 | 1829.0 | 1755 | 1959.0 | 1925 | WN | ... | 3.0 | 10.0 | 0 | N | 0 | 2.0 | 0.0 | 0.0 | 0.0 | 32.0 |
| 4 | 5 | 2008 | 1 | 3 | 4 | 1940.0 | 1915 | 2121.0 | 2110 | WN | ... | 4.0 | 10.0 | 0 | N | 0 | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1936753 | 7009710 | 2008 | 12 | 13 | 6 | 1250.0 | 1220 | 1617.0 | 1552 | DL | ... | 9.0 | 18.0 | 0 | N | 0 | 3.0 | 0.0 | 0.0 | 0.0 | 22.0 |
| 1936754 | 7009717 | 2008 | 12 | 13 | 6 | 657.0 | 600 | 904.0 | 749 | DL | ... | 15.0 | 34.0 | 0 | N | 0 | 0.0 | 57.0 | 18.0 | 0.0 | 0.0 |
| 1936755 | 7009718 | 2008 | 12 | 13 | 6 | 1007.0 | 847 | 1149.0 | 1010 | DL | ... | 8.0 | 32.0 | 0 | N | 0 | 1.0 | 0.0 | 19.0 | 0.0 | 79.0 |
| 1936756 | 7009726 | 2008 | 12 | 13 | 6 | 1251.0 | 1240 | 1446.0 | 1437 | DL | ... | 13.0 | 13.0 | 0 | N | 0 | NaN | NaN | NaN | NaN | NaN |
| 1936757 | 7009727 | 2008 | 12 | 13 | 6 | 1110.0 | 1103 | 1413.0 | 1418 | DL | ... | 8.0 | 11.0 | 0 | N | 0 | NaN | NaN | NaN | NaN | NaN |

1936758 rows × 30 columns
" 407 | ], 408 | "text/plain": [ 409 | " Unnamed: 0 Year Month DayofMonth DayOfWeek DepTime CRSDepTime \\\n", 410 | "0 0 2008 1 3 4 2003.0 1955 \n", 411 | "1 1 2008 1 3 4 754.0 735 \n", 412 | "2 2 2008 1 3 4 628.0 620 \n", 413 | "3 4 2008 1 3 4 1829.0 1755 \n", 414 | "4 5 2008 1 3 4 1940.0 1915 \n", 415 | "... ... ... ... ... ... ... ... \n", 416 | "1936753 7009710 2008 12 13 6 1250.0 1220 \n", 417 | "1936754 7009717 2008 12 13 6 657.0 600 \n", 418 | "1936755 7009718 2008 12 13 6 1007.0 847 \n", 419 | "1936756 7009726 2008 12 13 6 1251.0 1240 \n", 420 | "1936757 7009727 2008 12 13 6 1110.0 1103 \n", 421 | "\n", 422 | " ArrTime CRSArrTime UniqueCarrier ... TaxiIn TaxiOut Cancelled \\\n", 423 | "0 2211.0 2225 WN ... 4.0 8.0 0 \n", 424 | "1 1002.0 1000 WN ... 5.0 10.0 0 \n", 425 | "2 804.0 750 WN ... 3.0 17.0 0 \n", 426 | "3 1959.0 1925 WN ... 3.0 10.0 0 \n", 427 | "4 2121.0 2110 WN ... 4.0 10.0 0 \n", 428 | "... ... ... ... ... ... ... ... \n", 429 | "1936753 1617.0 1552 DL ... 9.0 18.0 0 \n", 430 | "1936754 904.0 749 DL ... 15.0 34.0 0 \n", 431 | "1936755 1149.0 1010 DL ... 8.0 32.0 0 \n", 432 | "1936756 1446.0 1437 DL ... 13.0 13.0 0 \n", 433 | "1936757 1413.0 1418 DL ... 8.0 11.0 0 \n", 434 | "\n", 435 | " CancellationCode Diverted CarrierDelay WeatherDelay NASDelay \\\n", 436 | "0 N 0 NaN NaN NaN \n", 437 | "1 N 0 NaN NaN NaN \n", 438 | "2 N 0 NaN NaN NaN \n", 439 | "3 N 0 2.0 0.0 0.0 \n", 440 | "4 N 0 NaN NaN NaN \n", 441 | "... ... ... ... ... ... \n", 442 | "1936753 N 0 3.0 0.0 0.0 \n", 443 | "1936754 N 0 0.0 57.0 18.0 \n", 444 | "1936755 N 0 1.0 0.0 19.0 \n", 445 | "1936756 N 0 NaN NaN NaN \n", 446 | "1936757 N 0 NaN NaN NaN \n", 447 | "\n", 448 | " SecurityDelay LateAircraftDelay \n", 449 | "0 NaN NaN \n", 450 | "1 NaN NaN \n", 451 | "2 NaN NaN \n", 452 | "3 0.0 32.0 \n", 453 | "4 NaN NaN \n", 454 | "... ... ... 
\n", 455 | "1936753 0.0 22.0 \n", 456 | "1936754 0.0 0.0 \n", 457 | "1936755 0.0 79.0 \n", 458 | "1936756 NaN NaN \n", 459 | "1936757 NaN NaN \n", 460 | "\n", 461 | "[1936758 rows x 30 columns]" 462 | ] 463 | }, 464 | "execution_count": 1, 465 | "metadata": {}, 466 | "output_type": "execute_result" 467 | } 468 | ], 469 | "source": [ 470 | "# Load the libraries and read the data set\n", 471 | "\n", 472 | "import numpy as np\n", 473 | "import pandas as pd\n", 474 | "\n", 475 | "\n", 476 | "flights = pd.read_csv('DelayedFlights.csv')\n", 477 | "flights" 478 | ] 479 | }, 480 | { 481 | "cell_type": "code", 482 | "execution_count": 2, 483 | "metadata": {}, 484 | "outputs": [ 485 | { 486 | "data": { 487 | "text/plain": [ 488 | "Index(['Unnamed: 0', 'Year', 'Month', 'DayofMonth', 'DayOfWeek', 'DepTime',\n", 489 | " 'CRSDepTime', 'ArrTime', 'CRSArrTime', 'UniqueCarrier', 'FlightNum',\n", 490 | " 'TailNum', 'ActualElapsedTime', 'CRSElapsedTime', 'AirTime', 'ArrDelay',\n", 491 | " 'DepDelay', 'Origin', 'Dest', 'Distance', 'TaxiIn', 'TaxiOut',\n", 492 | " 'Cancelled', 'CancellationCode', 'Diverted', 'CarrierDelay',\n", 493 | " 'WeatherDelay', 'NASDelay', 'SecurityDelay', 'LateAircraftDelay'],\n", 494 | " dtype='object')" 495 | ] 496 | }, 497 | "execution_count": 2, 498 | "metadata": {}, 499 | "output_type": "execute_result" 500 | } 501 | ], 502 | "source": [ 503 | "flights.columns" 504 | ] 505 | }, 506 | { 507 | "cell_type": "code", 508 | "execution_count": 3, 509 | "metadata": {}, 510 | "outputs": [], 511 | "source": [ 512 | "# Remove the 'Unnamed: 0' column (a leftover row index from the CSV)\n", 513 | "\n", 514 | "flights.drop('Unnamed: 0', axis=1, inplace=True)" 515 | ] 516 | }, 517 | { 518 | "cell_type": "code", 519 | "execution_count": 4, 520 | "metadata": {}, 521 | "outputs": [ 522 | { 523 | "name": "stdout", 524 | "output_type": "stream", 525 | "text": [ 526 | "<class 'pandas.core.frame.DataFrame'>\n", 527 | "RangeIndex: 1936758 entries, 0 to 1936757\n", 528 | "Data columns (total 29 columns):\n", 529 | " # Column Dtype \n", 530 | "--- ------ ----- \n", 531 | " 0 
Year int64 \n", 532 | " 1 Month int64 \n", 533 | " 2 DayofMonth int64 \n", 534 | " 3 DayOfWeek int64 \n", 535 | " 4 DepTime float64\n", 536 | " 5 CRSDepTime int64 \n", 537 | " 6 ArrTime float64\n", 538 | " 7 CRSArrTime int64 \n", 539 | " 8 UniqueCarrier object \n", 540 | " 9 FlightNum int64 \n", 541 | " 10 TailNum object \n", 542 | " 11 ActualElapsedTime float64\n", 543 | " 12 CRSElapsedTime float64\n", 544 | " 13 AirTime float64\n", 545 | " 14 ArrDelay float64\n", 546 | " 15 DepDelay float64\n", 547 | " 16 Origin object \n", 548 | " 17 Dest object \n", 549 | " 18 Distance int64 \n", 550 | " 19 TaxiIn float64\n", 551 | " 20 TaxiOut float64\n", 552 | " 21 Cancelled int64 \n", 553 | " 22 CancellationCode object \n", 554 | " 23 Diverted int64 \n", 555 | " 24 CarrierDelay float64\n", 556 | " 25 WeatherDelay float64\n", 557 | " 26 NASDelay float64\n", 558 | " 27 SecurityDelay float64\n", 559 | " 28 LateAircraftDelay float64\n", 560 | "dtypes: float64(14), int64(10), object(5)\n", 561 | "memory usage: 428.5+ MB\n", 562 | "None\n" 563 | ] 564 | } 565 | ], 566 | "source": [ 567 | "\n", 568 | "print(flights.info())" 569 | ] 570 | }, 571 | { 572 | "cell_type": "markdown", 573 | "metadata": {}, 574 | "source": [ 575 | "- **Exercise 2**\n", 576 | "\n", 577 | "Write a complete report on the data set:\n", 578 | "\n", 579 | "- Summarize the columns of interest statistically\n", 580 | "- Find how many missing values there are per column\n", 581 | "- Create new columns (average flight speed, whether the flight arrived late or not...)\n", 582 | "- Table of the airlines with the most accumulated delays\n", 583 | "- Which are the longest flights? And the most delayed?\n", 584 | "Etc." 585 | ] 586 | }, 587 | { 588 | "cell_type": "code", 589 | "execution_count": 5, 590 | "metadata": {}, 591 | "outputs": [ 592 | { 593 | "data": { 594 | "text/html": [ 595 | "
| | Year | Month | DayofMonth | DayOfWeek | DepTime | CRSDepTime | ArrTime | CRSArrTime | FlightNum | ActualElapsedTime | ... | Distance | TaxiIn | TaxiOut | Cancelled | Diverted | CarrierDelay | WeatherDelay | NASDelay | SecurityDelay | LateAircraftDelay |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 1936758.0 | 1.936758e+06 | 1.936758e+06 | 1.936758e+06 | 1.936758e+06 | 1.936758e+06 | 1.929648e+06 | 1.936758e+06 | 1.936758e+06 | 1.928371e+06 | ... | 1.936758e+06 | 1.929648e+06 | 1.936303e+06 | 1.936758e+06 | 1.936758e+06 | 1.247488e+06 | 1.247488e+06 | 1.247488e+06 | 1.247488e+06 | 1.247488e+06 |
| mean | 2008.0 | 6.111106e+00 | 1.575347e+01 | 3.984827e+00 | 1.518534e+03 | 1.467473e+03 | 1.610141e+03 | 1.634225e+03 | 2.184263e+03 | 1.333059e+02 | ... | 7.656862e+02 | 6.812975e+00 | 1.823220e+01 | 3.268348e-04 | 4.003598e-03 | 1.917940e+01 | 3.703571e+00 | 1.502164e+01 | 9.013714e-02 | 2.529647e+01 |
| std | 0.0 | 3.482546e+00 | 8.776272e+00 | 1.995966e+00 | 4.504853e+02 | 4.247668e+02 | 5.481781e+02 | 4.646347e+02 | 1.944702e+03 | 7.206007e+01 | ... | 5.744797e+02 | 5.273595e+00 | 1.433853e+01 | 1.807562e-02 | 6.314722e-02 | 4.354621e+01 | 2.149290e+01 | 3.383305e+01 | 2.022714e+00 | 4.205486e+01 |
| min | 2008.0 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 | 0.000000e+00 | 1.000000e+00 | 0.000000e+00 | 1.000000e+00 | 1.400000e+01 | ... | 1.100000e+01 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 |
| 25% | 2008.0 | 3.000000e+00 | 8.000000e+00 | 2.000000e+00 | 1.203000e+03 | 1.135000e+03 | 1.316000e+03 | 1.325000e+03 | 6.100000e+02 | 8.000000e+01 | ... | 3.380000e+02 | 4.000000e+00 | 1.000000e+01 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 |
| 50% | 2008.0 | 6.000000e+00 | 1.600000e+01 | 4.000000e+00 | 1.545000e+03 | 1.510000e+03 | 1.715000e+03 | 1.705000e+03 | 1.543000e+03 | 1.160000e+02 | ... | 6.060000e+02 | 6.000000e+00 | 1.400000e+01 | 0.000000e+00 | 0.000000e+00 | 2.000000e+00 | 0.000000e+00 | 2.000000e+00 | 0.000000e+00 | 8.000000e+00 |
| 75% | 2008.0 | 9.000000e+00 | 2.300000e+01 | 6.000000e+00 | 1.900000e+03 | 1.815000e+03 | 2.030000e+03 | 2.014000e+03 | 3.422000e+03 | 1.650000e+02 | ... | 9.980000e+02 | 8.000000e+00 | 2.100000e+01 | 0.000000e+00 | 0.000000e+00 | 2.100000e+01 | 0.000000e+00 | 1.500000e+01 | 0.000000e+00 | 3.300000e+01 |
| max | 2008.0 | 1.200000e+01 | 3.100000e+01 | 7.000000e+00 | 2.400000e+03 | 2.359000e+03 | 2.400000e+03 | 2.400000e+03 | 9.742000e+03 | 1.114000e+03 | ... | 4.962000e+03 | 2.400000e+02 | 4.220000e+02 | 1.000000e+00 | 1.000000e+00 | 2.436000e+03 | 1.352000e+03 | 1.357000e+03 | 3.920000e+02 | 1.316000e+03 |

8 rows × 24 columns
" 833 | ], 834 | "text/plain": [ 835 | " Year Month DayofMonth DayOfWeek DepTime \\\n", 836 | "count 1936758.0 1.936758e+06 1.936758e+06 1.936758e+06 1.936758e+06 \n", 837 | "mean 2008.0 6.111106e+00 1.575347e+01 3.984827e+00 1.518534e+03 \n", 838 | "std 0.0 3.482546e+00 8.776272e+00 1.995966e+00 4.504853e+02 \n", 839 | "min 2008.0 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 \n", 840 | "25% 2008.0 3.000000e+00 8.000000e+00 2.000000e+00 1.203000e+03 \n", 841 | "50% 2008.0 6.000000e+00 1.600000e+01 4.000000e+00 1.545000e+03 \n", 842 | "75% 2008.0 9.000000e+00 2.300000e+01 6.000000e+00 1.900000e+03 \n", 843 | "max 2008.0 1.200000e+01 3.100000e+01 7.000000e+00 2.400000e+03 \n", 844 | "\n", 845 | " CRSDepTime ArrTime CRSArrTime FlightNum \\\n", 846 | "count 1.936758e+06 1.929648e+06 1.936758e+06 1.936758e+06 \n", 847 | "mean 1.467473e+03 1.610141e+03 1.634225e+03 2.184263e+03 \n", 848 | "std 4.247668e+02 5.481781e+02 4.646347e+02 1.944702e+03 \n", 849 | "min 0.000000e+00 1.000000e+00 0.000000e+00 1.000000e+00 \n", 850 | "25% 1.135000e+03 1.316000e+03 1.325000e+03 6.100000e+02 \n", 851 | "50% 1.510000e+03 1.715000e+03 1.705000e+03 1.543000e+03 \n", 852 | "75% 1.815000e+03 2.030000e+03 2.014000e+03 3.422000e+03 \n", 853 | "max 2.359000e+03 2.400000e+03 2.400000e+03 9.742000e+03 \n", 854 | "\n", 855 | " ActualElapsedTime ... Distance TaxiIn TaxiOut \\\n", 856 | "count 1.928371e+06 ... 1.936758e+06 1.929648e+06 1.936303e+06 \n", 857 | "mean 1.333059e+02 ... 7.656862e+02 6.812975e+00 1.823220e+01 \n", 858 | "std 7.206007e+01 ... 5.744797e+02 5.273595e+00 1.433853e+01 \n", 859 | "min 1.400000e+01 ... 1.100000e+01 0.000000e+00 0.000000e+00 \n", 860 | "25% 8.000000e+01 ... 3.380000e+02 4.000000e+00 1.000000e+01 \n", 861 | "50% 1.160000e+02 ... 6.060000e+02 6.000000e+00 1.400000e+01 \n", 862 | "75% 1.650000e+02 ... 9.980000e+02 8.000000e+00 2.100000e+01 \n", 863 | "max 1.114000e+03 ... 
4.962000e+03 2.400000e+02 4.220000e+02 \n", 864 | "\n", 865 | " Cancelled Diverted CarrierDelay WeatherDelay NASDelay \\\n", 866 | "count 1.936758e+06 1.936758e+06 1.247488e+06 1.247488e+06 1.247488e+06 \n", 867 | "mean 3.268348e-04 4.003598e-03 1.917940e+01 3.703571e+00 1.502164e+01 \n", 868 | "std 1.807562e-02 6.314722e-02 4.354621e+01 2.149290e+01 3.383305e+01 \n", 869 | "min 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 \n", 870 | "25% 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 \n", 871 | "50% 0.000000e+00 0.000000e+00 2.000000e+00 0.000000e+00 2.000000e+00 \n", 872 | "75% 0.000000e+00 0.000000e+00 2.100000e+01 0.000000e+00 1.500000e+01 \n", 873 | "max 1.000000e+00 1.000000e+00 2.436000e+03 1.352000e+03 1.357000e+03 \n", 874 | "\n", 875 | " SecurityDelay LateAircraftDelay \n", 876 | "count 1.247488e+06 1.247488e+06 \n", 877 | "mean 9.013714e-02 2.529647e+01 \n", 878 | "std 2.022714e+00 4.205486e+01 \n", 879 | "min 0.000000e+00 0.000000e+00 \n", 880 | "25% 0.000000e+00 0.000000e+00 \n", 881 | "50% 0.000000e+00 8.000000e+00 \n", 882 | "75% 0.000000e+00 3.300000e+01 \n", 883 | "max 3.920000e+02 1.316000e+03 \n", 884 | "\n", 885 | "[8 rows x 24 columns]" 886 | ] 887 | }, 888 | "execution_count": 5, 889 | "metadata": {}, 890 | "output_type": "execute_result" 891 | } 892 | ], 893 | "source": [ 894 | "# Summarize the columns of interest statistically\n", 895 | "\n", 896 | "flights.describe()" 897 | ] 898 | }, 899 | { 900 | "cell_type": "code", 901 | "execution_count": 6, 902 | "metadata": {}, 903 | "outputs": [ 904 | { 905 | "data": { 906 | "text/plain": [ 907 | "Year 0\n", 908 | "Month 0\n", 909 | "DayofMonth 0\n", 910 | "DayOfWeek 0\n", 911 | "DepTime 0\n", 912 | "CRSDepTime 0\n", 913 | "ArrTime 7110\n", 914 | "CRSArrTime 0\n", 915 | "UniqueCarrier 0\n", 916 | "FlightNum 0\n", 917 | "TailNum 5\n", 918 | "ActualElapsedTime 8387\n", 919 | "CRSElapsedTime 198\n", 920 | "AirTime 8387\n", 921 | "ArrDelay 8387\n", 922 | "DepDelay 
0\n", 923 | "Origin 0\n", 924 | "Dest 0\n", 925 | "Distance 0\n", 926 | "TaxiIn 7110\n", 927 | "TaxiOut 455\n", 928 | "Cancelled 0\n", 929 | "CancellationCode 0\n", 930 | "Diverted 0\n", 931 | "CarrierDelay 689270\n", 932 | "WeatherDelay 689270\n", 933 | "NASDelay 689270\n", 934 | "SecurityDelay 689270\n", 935 | "LateAircraftDelay 689270\n", 936 | "dtype: int64" 937 | ] 938 | }, 939 | "execution_count": 6, 940 | "metadata": {}, 941 | "output_type": "execute_result" 942 | } 943 | ], 944 | "source": [ 945 | "# Count the missing values in each column\n", 946 | "\n", 947 | "flights.isna().sum()" 948 | ] 949 | }, 950 | { 951 | "cell_type": "code", 952 | "execution_count": null, 953 | "metadata": {}, 954 | "outputs": [], 955 | "source": [] 956 | }, 957 | { 958 | "cell_type": "code", 959 | "execution_count": 7, 960 | "metadata": {}, 961 | "outputs": [], 962 | "source": [ 963 | "# Create new columns\n", 964 | "# Build a departure date column from Year, Month and DayofMonth:\n", 965 | "# encode each date as the integer YYYYMMDD, then parse it as a datetime\n", 966 | "\n", 967 | "flights['DepDate'] = pd.to_datetime(flights.Year * 10000 + flights.Month * 100 + flights.DayofMonth, format='%Y%m%d')\n" 968 | ] 969 | }, 970 | { 971 | "cell_type": "code", 972 | "execution_count": 8, 973 | "metadata": {}, 974 | "outputs": [ 975 | { 976 | "data": { 977 | "text/plain": [ 978 | "0 2008-01-03\n", 979 | "1 2008-01-03\n", 980 | "2 2008-01-03\n", 981 | "3 2008-01-03\n", 982 | "4 2008-01-03\n", 983 | " ... 
\n", 984 | "1936753 2008-12-13\n", 985 | "1936754 2008-12-13\n", 986 | "1936755 2008-12-13\n", 987 | "1936756 2008-12-13\n", 988 | "1936757 2008-12-13\n", 989 | "Name: DepDate, Length: 1936758, dtype: datetime64[ns]" 990 | ] 991 | }, 992 | "execution_count": 8, 993 | "metadata": {}, 994 | "output_type": "execute_result" 995 | } 996 | ], 997 | "source": [ 998 | "flights['DepDate']" 999 | ] 1000 | }, 1001 | { 1002 | "cell_type": "code", 1003 | "execution_count": null, 1004 | "metadata": {}, 1005 | "outputs": [], 1006 | "source": [] 1007 | }, 1008 | { 1009 | "cell_type": "markdown", 1010 | "metadata": {}, 1011 | "source": [ 1012 | "- **Exercise 3**\n", 1013 | "\n", 1014 | "Export the clean data set, including the new columns, to Excel." 1015 | ] 1016 | }, 1017 | { 1018 | "cell_type": "code", 1019 | "execution_count": 9, 1020 | "metadata": {}, 1021 | "outputs": [ 1022 | { 1023 | "data": { 1024 | "text/html": [ 1025 | "
| | Year | Month | DayofMonth | DayOfWeek | DepTime | CRSDepTime | CRSArrTime | UniqueCarrier | FlightNum | DepDelay | Origin | Dest | Distance | Cancelled | CancellationCode | Diverted | DepDate |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2008 | 1 | 3 | 4 | 2003.0 | 1955 | 2225 | WN | 335 | 8.0 | IAD | TPA | 810 | 0 | N | 0 | 2008-01-03 |
| 1 | 2008 | 1 | 3 | 4 | 754.0 | 735 | 1000 | WN | 3231 | 19.0 | IAD | TPA | 810 | 0 | N | 0 | 2008-01-03 |
| 2 | 2008 | 1 | 3 | 4 | 628.0 | 620 | 750 | WN | 448 | 8.0 | IND | BWI | 515 | 0 | N | 0 | 2008-01-03 |
| 3 | 2008 | 1 | 3 | 4 | 1829.0 | 1755 | 1925 | WN | 3920 | 34.0 | IND | BWI | 515 | 0 | N | 0 | 2008-01-03 |
| 4 | 2008 | 1 | 3 | 4 | 1940.0 | 1915 | 2110 | WN | 378 | 25.0 | IND | JAX | 688 | 0 | N | 0 | 2008-01-03 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1936753 | 2008 | 12 | 13 | 6 | 1250.0 | 1220 | 1552 | DL | 1621 | 30.0 | MSP | ATL | 906 | 0 | N | 0 | 2008-12-13 |
| 1936754 | 2008 | 12 | 13 | 6 | 657.0 | 600 | 749 | DL | 1631 | 57.0 | RIC | ATL | 481 | 0 | N | 0 | 2008-12-13 |
| 1936755 | 2008 | 12 | 13 | 6 | 1007.0 | 847 | 1010 | DL | 1631 | 80.0 | ATL | IAH | 689 | 0 | N | 0 | 2008-12-13 |
| 1936756 | 2008 | 12 | 13 | 6 | 1251.0 | 1240 | 1437 | DL | 1639 | 11.0 | IAD | ATL | 533 | 0 | N | 0 | 2008-12-13 |
| 1936757 | 2008 | 12 | 13 | 6 | 1110.0 | 1103 | 1418 | DL | 1641 | 7.0 | SAT | ATL | 874 | 0 | N | 0 | 2008-12-13 |

1936758 rows × 17 columns
" 1287 | ], 1288 | "text/plain": [ 1289 | " Year Month DayofMonth DayOfWeek DepTime CRSDepTime CRSArrTime \\\n", 1290 | "0 2008 1 3 4 2003.0 1955 2225 \n", 1291 | "1 2008 1 3 4 754.0 735 1000 \n", 1292 | "2 2008 1 3 4 628.0 620 750 \n", 1293 | "3 2008 1 3 4 1829.0 1755 1925 \n", 1294 | "4 2008 1 3 4 1940.0 1915 2110 \n", 1295 | "... ... ... ... ... ... ... ... \n", 1296 | "1936753 2008 12 13 6 1250.0 1220 1552 \n", 1297 | "1936754 2008 12 13 6 657.0 600 749 \n", 1298 | "1936755 2008 12 13 6 1007.0 847 1010 \n", 1299 | "1936756 2008 12 13 6 1251.0 1240 1437 \n", 1300 | "1936757 2008 12 13 6 1110.0 1103 1418 \n", 1301 | "\n", 1302 | " UniqueCarrier FlightNum DepDelay Origin Dest Distance Cancelled \\\n", 1303 | "0 WN 335 8.0 IAD TPA 810 0 \n", 1304 | "1 WN 3231 19.0 IAD TPA 810 0 \n", 1305 | "2 WN 448 8.0 IND BWI 515 0 \n", 1306 | "3 WN 3920 34.0 IND BWI 515 0 \n", 1307 | "4 WN 378 25.0 IND JAX 688 0 \n", 1308 | "... ... ... ... ... ... ... ... \n", 1309 | "1936753 DL 1621 30.0 MSP ATL 906 0 \n", 1310 | "1936754 DL 1631 57.0 RIC ATL 481 0 \n", 1311 | "1936755 DL 1631 80.0 ATL IAH 689 0 \n", 1312 | "1936756 DL 1639 11.0 IAD ATL 533 0 \n", 1313 | "1936757 DL 1641 7.0 SAT ATL 874 0 \n", 1314 | "\n", 1315 | " CancellationCode Diverted DepDate \n", 1316 | "0 N 0 2008-01-03 \n", 1317 | "1 N 0 2008-01-03 \n", 1318 | "2 N 0 2008-01-03 \n", 1319 | "3 N 0 2008-01-03 \n", 1320 | "4 N 0 2008-01-03 \n", 1321 | "... ... ... ... 
\n", 1322 | "1936753 N 0 2008-12-13 \n", 1323 | "1936754 N 0 2008-12-13 \n", 1324 | "1936755 N 0 2008-12-13 \n", 1325 | "1936756 N 0 2008-12-13 \n", 1326 | "1936757 N 0 2008-12-13 \n", 1327 | "\n", 1328 | "[1936758 rows x 17 columns]" 1329 | ] 1330 | }, 1331 | "execution_count": 9, 1332 | "metadata": {}, 1333 | "output_type": "execute_result" 1334 | } 1335 | ], 1336 | "source": [ 1337 | "# Remove the columns that contain missing data\n", 1338 | "\n", 1339 | "flights.drop(['ArrTime','ActualElapsedTime','CRSElapsedTime','AirTime','ArrDelay','TaxiIn','TailNum' ,'TaxiOut','CarrierDelay', 'WeatherDelay', 'NASDelay', 'SecurityDelay', 'LateAircraftDelay'], axis=1, inplace=True)\n", 1340 | "flights" 1341 | ] 1342 | }, 1343 | { 1344 | "cell_type": "code", 1345 | "execution_count": 11, 1346 | "metadata": {}, 1347 | "outputs": [], 1348 | "source": [ 1349 | "# Export the clean data set with the new columns (index=False keeps the row index out of the file).\n", 1350 | "\n", 1351 | "flights.to_csv(\"output.csv\", index=False)\n", 1352 | "\n", 1353 | "# With 1,936,758 rows the data set exceeds Excel's limit of 1,048,576 rows per worksheet, so it is exported as CSV instead. 
" 1354 | ] 1355 | }, 1356 | { 1357 | "cell_type": "markdown", 1358 | "metadata": {}, 1359 | "source": [ 1360 | "### New plots from task 6\n", 1361 | "\n", 1362 | "![code_catplot](code_catplot.png)\n", 1363 | "\n", 1364 | "![code_arrdelay](code-arrdelay.png)\n", 1365 | "\n", 1366 | "![code-arr-dep](code-arr-dep.png)\n", 1367 | "\n", 1368 | "![arrdelay-depdelay](arrdelay-depdelay.png)\n", 1369 | "\n", 1370 | "![time-code](time-code.png)" 1371 | ] 1372 | }, 1373 | { 1374 | "cell_type": "code", 1375 | "execution_count": null, 1376 | "metadata": {}, 1377 | "outputs": [], 1378 | "source": [] 1379 | } 1380 | ], 1381 | "metadata": { 1382 | "kernelspec": { 1383 | "display_name": "Python 3", 1384 | "language": "python", 1385 | "name": "python3" 1386 | }, 1387 | "language_info": { 1388 | "codemirror_mode": { 1389 | "name": "ipython", 1390 | "version": 3 1391 | }, 1392 | "file_extension": ".py", 1393 | "mimetype": "text/x-python", 1394 | "name": "python", 1395 | "nbconvert_exporter": "python", 1396 | "pygments_lexer": "ipython3", 1397 | "version": "3.8.5" 1398 | } 1399 | }, 1400 | "nbformat": 4, 1401 | "nbformat_minor": 4 1402 | } 1403 | --------------------------------------------------------------------------------
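Exercise 2 in the notebook asks for derived columns (average flight speed, whether a flight arrived late) that the notebook itself does not compute. The following is a minimal sketch in plain Python of how those values could be derived from the variables described at the top of the notebook. The function names and the constant are hypothetical and are not part of the original notebook; the 15-minute threshold comes from the ArrDelay description, and the AirTime value used in the usage note is an illustrative number, not a value taken from the data set.

```python
import math
from typing import Optional

# Assumption: taken from the ArrDelay description ("on time" means less than
# 15 minutes later than scheduled); the constant name is hypothetical.
ON_TIME_THRESHOLD_MIN = 15


def _is_missing(value: Optional[float]) -> bool:
    """Treat None and NaN as missing, as pandas does for float columns."""
    return value is None or math.isnan(value)


def hhmm_to_minutes(hhmm: float) -> int:
    """Convert a local hhmm clock value (e.g. 2003.0 for 20:03) to minutes after midnight."""
    hours, minutes = divmod(int(hhmm), 100)
    return hours * 60 + minutes


def average_speed_mph(distance_miles: float, air_time_min: Optional[float]) -> Optional[float]:
    """Average speed in miles per hour; None when AirTime is missing or not positive."""
    if _is_missing(air_time_min) or air_time_min <= 0:
        return None
    return distance_miles / (air_time_min / 60.0)


def arrived_late(arr_delay_min: Optional[float]) -> Optional[bool]:
    """True when the arrival delay is 15 minutes or more; None when the delay is missing."""
    if _is_missing(arr_delay_min):
        return None
    return arr_delay_min >= ON_TIME_THRESHOLD_MIN
```

As a usage example, `hhmm_to_minutes(2003.0)` converts the first row's DepTime to minutes after midnight, and `average_speed_mph(810, 120.0)` gives the speed for an 810-mile flight under an assumed 120 minutes of air time. In the notebook the same arithmetic would apply column-wise to Distance, AirTime and ArrDelay, and it would have to run before the cell that drops AirTime and ArrDelay.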