├── LICENSE ├── README.md └── Malawi_Flood_Prediction__starter_code__by_DariusMoruri.ipynb /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 Darius Moruri 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Flood-Prediction-in-Malawi--Zindi-Competition 2 | ## Starter Code for Flood Prediction in Malawi 3 | ### Author: [Darius Moruri](https://www.linkedin.com/in/dariusmoruri/) 4 | 5 | 6 | --- 7 | 8 | - This is a simple starter code to get you going for the [Zindi flood prediction competition](https://zindi.africa/competitions/2030-vision-flood-prediction-in-malawi) 9 | - As this is just a basic machine learning pipeline, the following aspects haven't been covered: 10 | - Exploratory Data Analysis 11 | - Feature Engineering 12 | - Feature Selection 13 | - Hyperparameter Tuning 14 | - Model Evaluation 15 | - Model interpretation 16 | - Sourcing more data 17 | - Documentation and Presentation 18 | 19 | *Despite its basic approach, this starter code yielded a satisfactory RMSE of **0.11866** and a **top 15 ranking** (as at the time of writing) on the [public leaderboard](https://zindi.africa/competitions/2030-vision-flood-prediction-in-malawi/leaderboard)* 20 | 21 | ## Context 22 | On 14 March 2019, tropical Cyclone Idai made landfall at the port of Beira, Mozambique, before moving across the region. Millions of people in Malawi, Mozambique and Zimbabwe have been affected by what is the worst natural disaster to hit southern Africa in at least two decades. 23 | 24 | In recent decades, countries across Africa have experienced an increase in the frequency and severity of floods. Malawi has been hit with major floods in 2015 and again in 2019. In fact, between 1946 and 2013, floods accounted for 48% of major disasters in Malawi. The Lower Shire Valley in southern Malawi, bordering Mozambique, composed of Chikwawa and Nsanje Districts, is the area most prone to flooding.
25 | 26 | The objective of this challenge is to build a machine learning model that helps predict the location and extent of floods in southern Malawi. 27 | 28 | 29 | ## Data 30 | The training data for this competition can be found [here](https://drive.google.com/file/d/13PmGuIpBbgc-BaDeXxR8-i-9E3oGZYY0/view?usp=sharing) 31 | and a sample of the submission file can be found [here](https://drive.google.com/file/d/1HBdLXuiXkhRHDoPSUUpbvw6Eh5OredLy/view?usp=sharing) 32 | 33 | ## Evaluation 34 | The error metric for this competition is the Root Mean Squared Error (RMSE). 35 | 36 | 37 | -------------------------------------------------------------------------------- /Malawi_Flood_Prediction__starter_code__by_DariusMoruri.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "Malawi_Flood_Prediction__starter_code__by_DariusMoruri.ipynb", 7 | "provenance": [], 8 | "collapsed_sections": [], 9 | "include_colab_link": true 10 | }, 11 | "kernelspec": { 12 | "name": "python3", 13 | "display_name": "Python 3" 14 | }, 15 | "accelerator": "GPU" 16 | }, 17 | "cells": [ 18 | { 19 | "cell_type": "markdown", 20 | "metadata": { 21 | "id": "view-in-github", 22 | "colab_type": "text" 23 | }, 24 | "source": [ 25 | "\"Open" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": { 31 | "id": "qaqB2GS7FYRK", 32 | "colab_type": "text" 33 | }, 34 | "source": [ 35 | "# Starter Code for Flood Prediction in Malawi\n", 36 | "### Author: [Darius Moruri](https://www.linkedin.com/in/dariusmoruri/)\n", 37 | "\n", 38 | "\n", 39 | "---\n", 40 | "\n", 41 | " - This is a simple starter code to get you going for the [Zindi flood prediction competition](https://zindi.africa/competitions/2030-vision-flood-prediction-in-malawi)\n", 42 | " - As it is just a basic machine learning pipeline, the following aspects haven't been covered:\n", 43 | " - Exploratory Data Analysis\n", 44
| " - Feature Engineering\n", 45 | " - Feature Selection\n", 46 | " - Hyperparameter Tuning\n", 47 | " - Model Evaluation\n", 48 | " - Model interpretation\n", 49 | " - Sourcing more data\n", 50 | " - Documentation and Presentation\n", 51 | "\n", 52 | "*Despite its basic approach, this starter code yielded a satisfactory RMSE of **0.11866** and a **top 15 ranking** (as at the time of writing) on the [public leaderboard](https://zindi.africa/competitions/2030-vision-flood-prediction-in-malawi/leaderboard)*\n", 53 | "\n", 54 | "## Context\n", 55 | "On 14 March 2019, tropical Cyclone Idai made landfall at the port of Beira, Mozambique, before moving across the region. Millions of people in Malawi, Mozambique and Zimbabwe have been affected by what is the worst natural disaster to hit southern Africa in at least two decades.\n", 56 | "\n", 57 | "In recent decades, countries across Africa have experienced an increase in the frequency and severity of floods. Malawi has been hit with major floods in 2015 and again in 2019. In fact, between 1946 and 2013, floods accounted for 48% of major disasters in Malawi.
The Lower Shire Valley in southern Malawi, bordering Mozambique, composed of Chikwawa and Nsanje Districts, is the area most prone to flooding.\n", 58 | "\n", 59 | "The objective of this challenge is to build a machine learning model that helps predict the location and extent of floods in southern Malawi.\n", 60 | "\n", 61 | "\n", 62 | "## Data\n", 63 | "The training data for this competition can be found [here](https://drive.google.com/file/d/13PmGuIpBbgc-BaDeXxR8-i-9E3oGZYY0/view?usp=sharing)\n", 64 | "and a sample of the submission file can be found [here](https://drive.google.com/file/d/1HBdLXuiXkhRHDoPSUUpbvw6Eh5OredLy/view?usp=sharing)\n", 65 | "\n", 66 | "## Evaluation\n", 67 | "The error metric for this competition is the Root Mean Squared Error (RMSE).\n", 68 | "\n" 69 | ] 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "metadata": { 74 | "id": "KPejNILARlEE", 75 | "colab_type": "text" 76 | }, 77 | "source": [ 78 | "## Importing the Necessary Libraries" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "metadata": { 84 | "id": "QAXGhwzWHLpq", 85 | "colab_type": "code", 86 | "colab": {} 87 | }, 88 | "source": [ 89 | "# Importing libraries\n", 90 | "#\n", 91 | "import pandas as pd\n", 92 | "import numpy as np\n", 93 | "import requests\n", 94 | "from io import StringIO \n", 95 | "import warnings\n", 96 | "warnings.filterwarnings('ignore')" 97 | ], 98 | "execution_count": 0, 99 | "outputs": [] 100 | }, 101 | { 102 | "cell_type": "markdown", 103 | "metadata": { 104 | "id": "AWbBYTZSRoIH", 105 | "colab_type": "text" 106 | }, 107 | "source": [ 108 | "## Reading the Data" 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "metadata": { 114 | "id": "FZTl-W7uHe3Y", 115 | "colab_type": "code", 116 | "colab": {} 117 | }, 118 | "source": [ 119 | "# Google Drive links to the shared submission and training datasets\n", 120 | "#\n", 121 | "submission = 'https://drive.google.com/file/d/1XhTATUUEIKpkFudV9HygYzbJZc4mWcHo/view?usp=sharing'\n", 122 | "train = 
'https://drive.google.com/file/d/1hqS1wAoClLHN0aFJABL4myzFMDn4USgV/view?usp=sharing'\n", 123 | "\n", 124 | "\n", 125 | "# Creating a function to read a CSV file shared via Google Drive\n", 126 | "# (the file ID is the second-to-last path segment of the share link)\n", 127 | "def read_csv(url):\n", 128 | "    url = 'https://drive.google.com/uc?export=download&id=' + url.split('/')[-2]\n", 129 | "    csv_raw = requests.get(url).text\n", 130 | "    csv = StringIO(csv_raw)\n", 131 | "    df = pd.read_csv(csv)\n", 132 | "    return df\n", 133 | "\n", 134 | "# Creating submission and training dataframes\n", 135 | "#\n", 136 | "sub = read_csv(submission)\n", 137 | "df = read_csv(train)" 138 | ], 139 | "execution_count": 0, 140 | "outputs": [] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": { 145 | "id": "5YU4NXvPR483", 146 | "colab_type": "text" 147 | }, 148 | "source": [ 149 | "## Basic Data Analysis" 150 | ] 151 | }, 152 | { 153 | "cell_type": "code", 154 | "metadata": { 155 | "id": "N93Od3lxHe07", 156 | "colab_type": "code", 157 | "outputId": "f09c457b-995b-422f-98d4-51981eb01791", 158 | "colab": { 159 | "base_uri": "https://localhost:8080/", 160 | "height": 551 161 | } 162 | }, 163 | "source": [ 164 | "# Previewing the first five rows of the dataframe\n", 165 | "#\n", 166 | "df.head()" 167 | ], 168 | "execution_count": 0, 169 | "outputs": [ 170 | { 171 | "output_type": "execute_result", 172 | "data": { 173 | "text/html": [ 174 | "
|  | X | Y | target_2015 | elevation | precip 2014-11-16 - 2014-11-23 | precip 2014-11-23 - 2014-11-30 | precip 2014-11-30 - 2014-12-07 | precip 2014-12-07 - 2014-12-14 | precip 2014-12-14 - 2014-12-21 | precip 2014-12-21 - 2014-12-28 | precip 2014-12-28 - 2015-01-04 | precip 2015-01-04 - 2015-01-11 | precip 2015-01-11 - 2015-01-18 | precip 2015-01-18 - 2015-01-25 | precip 2015-01-25 - 2015-02-01 | precip 2015-02-01 - 2015-02-08 | precip 2015-02-08 - 2015-02-15 | precip 2015-02-15 - 2015-02-22 | precip 2015-02-22 - 2015-03-01 | precip 2015-03-01 - 2015-03-08 | precip 2015-03-08 - 2015-03-15 | precip 2019-01-20 - 2019-01-27 | precip 2019-01-27 - 2019-02-03 | precip 2019-02-03 - 2019-02-10 | precip 2019-02-10 - 2019-02-17 | precip 2019-02-17 - 2019-02-24 | precip 2019-02-24 - 2019-03-03 | precip 2019-03-03 - 2019-03-10 | precip 2019-03-10 - 2019-03-17 | precip 2019-03-17 - 2019-03-24 | precip 2019-03-24 - 2019-03-31 | precip 2019-03-31 - 2019-04-07 | precip 2019-04-07 - 2019-04-14 | precip 2019-04-14 - 2019-04-21 | precip 2019-04-21 - 2019-04-28 | precip 2019-04-28 - 2019-05-05 | precip 2019-05-05 - 2019-05-12 | precip 2019-05-12 - 2019-05-19 | LC_Type1_mode | Square_ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 34.26 | -15.91 | 0.0 | 887.764222 | 0.0 | 0.0 | 0.0 | 14.844025 | 14.552823 | 12.237766 | 57.451361 | 30.127047 | 30.449468 | 1.521829 | 29.389995 | 32.878318 | 8.179804 | 0.963981 | 16.659097 | 3.304466 | 0.0 | 12.99262 | 4.582856 | 35.037532 | 4.796012 | 28.083314 | 0.0 | 58.362456 | 18.264692 | 17.537486 | 0.896323 | 1.68 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 9 | 4e3c3896-14ce-11ea-bce5-f49634744a41 |
| 1 | 34.26 | -15.90 | 0.0 | 743.403912 | 0.0 | 0.0 | 0.0 | 14.844025 | 14.552823 | 12.237766 | 57.451361 | 30.127047 | 30.449468 | 1.521829 | 29.389995 | 32.878318 | 8.179804 | 0.963981 | 16.659097 | 3.304466 | 0.0 | 12.99262 | 4.582856 | 35.037532 | 4.796012 | 28.083314 | 0.0 | 58.362456 | 18.264692 | 17.537486 | 0.896323 | 1.68 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 9 | 4e3c3897-14ce-11ea-bce5-f49634744a41 |
| 2 | 34.26 | -15.89 | 0.0 | 565.728343 | 0.0 | 0.0 | 0.0 | 14.844025 | 14.552823 | 12.237766 | 57.451361 | 30.127047 | 30.449468 | 1.521829 | 29.389995 | 32.878318 | 8.179804 | 0.963981 | 16.659097 | 3.304466 | 0.0 | 12.99262 | 4.582856 | 35.037532 | 4.796012 | 28.083314 | 0.0 | 58.362456 | 18.264692 | 17.537486 | 0.896323 | 1.68 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 9 | 4e3c3898-14ce-11ea-bce5-f49634744a41 |
| 3 | 34.26 | -15.88 | 0.0 | 443.392774 | 0.0 | 0.0 | 0.0 | 14.844025 | 14.552823 | 12.237766 | 57.451361 | 30.127047 | 30.449468 | 1.521829 | 29.389995 | 32.878318 | 8.179804 | 0.963981 | 16.659097 | 3.304466 | 0.0 | 12.99262 | 4.582856 | 35.037532 | 4.796012 | 28.083314 | 0.0 | 58.362456 | 18.264692 | 17.537486 | 0.896323 | 1.68 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10 | 4e3c3899-14ce-11ea-bce5-f49634744a41 |
| 4 | 34.26 | -15.87 | 0.0 | 437.443428 | 0.0 | 0.0 | 0.0 | 14.844025 | 14.552823 | 12.237766 | 57.451361 | 30.127047 | 30.449468 | 1.521829 | 29.389995 | 32.878318 | 8.179804 | 0.963981 | 16.659097 | 3.304466 | 0.0 | 12.99262 | 4.582856 | 35.037532 | 4.796012 | 28.083314 | 0.0 | 58.362456 | 18.264692 | 17.537486 | 0.896323 | 1.68 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10 | 4e3c389a-14ce-11ea-bce5-f49634744a41 |
" 453 | ], 454 | "text/plain": [ 455 | " X Y ... LC_Type1_mode Square_ID\n", 456 | "0 34.26 -15.91 ... 9 4e3c3896-14ce-11ea-bce5-f49634744a41\n", 457 | "1 34.26 -15.90 ... 9 4e3c3897-14ce-11ea-bce5-f49634744a41\n", 458 | "2 34.26 -15.89 ... 9 4e3c3898-14ce-11ea-bce5-f49634744a41\n", 459 | "3 34.26 -15.88 ... 10 4e3c3899-14ce-11ea-bce5-f49634744a41\n", 460 | "4 34.26 -15.87 ... 10 4e3c389a-14ce-11ea-bce5-f49634744a41\n", 461 | "\n", 462 | "[5 rows x 40 columns]" 463 | ] 464 | }, 465 | "metadata": { 466 | "tags": [] 467 | }, 468 | "execution_count": 3 469 | } 470 | ] 471 | }, 472 | { 473 | "cell_type": "code", 474 | "metadata": { 475 | "id": "6JawgMLoW5OC", 476 | "colab_type": "code", 477 | "outputId": "4d3bb0e2-ee3e-45cc-ea21-49d6a3a7da31", 478 | "colab": { 479 | "base_uri": "https://localhost:8080/", 480 | "height": 534 481 | } 482 | }, 483 | "source": [ 484 | "# Previewing the last five rows of the dataframe\n", 485 | "#\n", 486 | "df.tail()" 487 | ], 488 | "execution_count": 0, 489 | "outputs": [ 490 | { 491 | "output_type": "execute_result", 492 | "data": { 493 | "text/html": [ 494 | "
|  | X | Y | target_2015 | elevation | precip 2014-11-16 - 2014-11-23 | precip 2014-11-23 - 2014-11-30 | precip 2014-11-30 - 2014-12-07 | precip 2014-12-07 - 2014-12-14 | precip 2014-12-14 - 2014-12-21 | precip 2014-12-21 - 2014-12-28 | precip 2014-12-28 - 2015-01-04 | precip 2015-01-04 - 2015-01-11 | precip 2015-01-11 - 2015-01-18 | precip 2015-01-18 - 2015-01-25 | precip 2015-01-25 - 2015-02-01 | precip 2015-02-01 - 2015-02-08 | precip 2015-02-08 - 2015-02-15 | precip 2015-02-15 - 2015-02-22 | precip 2015-02-22 - 2015-03-01 | precip 2015-03-01 - 2015-03-08 | precip 2015-03-08 - 2015-03-15 | precip 2019-01-20 - 2019-01-27 | precip 2019-01-27 - 2019-02-03 | precip 2019-02-03 - 2019-02-10 | precip 2019-02-10 - 2019-02-17 | precip 2019-02-17 - 2019-02-24 | precip 2019-02-24 - 2019-03-03 | precip 2019-03-03 - 2019-03-10 | precip 2019-03-10 - 2019-03-17 | precip 2019-03-17 - 2019-03-24 | precip 2019-03-24 - 2019-03-31 | precip 2019-03-31 - 2019-04-07 | precip 2019-04-07 - 2019-04-14 | precip 2019-04-14 - 2019-04-21 | precip 2019-04-21 - 2019-04-28 | precip 2019-04-28 - 2019-05-05 | precip 2019-05-05 - 2019-05-12 | precip 2019-05-12 - 2019-05-19 | LC_Type1_mode | Square_ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16461 | 35.86 | -15.44 | 0.0 | 635.675022 | 16.956563 | 31.155531 | 12.882013 | 8.810145 | 6.179829 | 9.863685 | 15.765685 | 21.457507 | 105.275891 | 3.645338 | 18.531483 | 13.816063 | 23.728058 | 8.794998 | 9.369763 | 21.428131 | 2.493683 | 8.760326 | 5.177616 | 12.450319 | 17.289942 | 19.612179 | 10.909635 | 64.494171 | 15.940852 | 24.828982 | 11.335339 | 30.984762 | 0.518269 | 5.770066 | 14.839779 | 4.928294 | 10.526186 | 18.746072 | 10 | 4e6f5dfd-14ce-11ea-bce5-f49634744a41 |
| 16462 | 35.86 | -15.43 | 0.0 | 632.598892 | 16.956563 | 31.155531 | 12.882013 | 8.810145 | 6.179829 | 9.863685 | 15.765685 | 21.457507 | 105.275891 | 3.645338 | 18.531483 | 13.816063 | 23.728058 | 8.794998 | 9.369763 | 21.428131 | 2.493683 | 8.760326 | 5.177616 | 12.450319 | 17.289942 | 19.612179 | 10.909635 | 64.494171 | 15.940852 | 24.828982 | 11.335339 | 30.984762 | 0.518269 | 5.770066 | 14.839779 | 4.928294 | 10.526186 | 18.746072 | 10 | 4e6f5dfe-14ce-11ea-bce5-f49634744a41 |
| 16463 | 35.86 | -15.42 | 0.0 | 632.450136 | 16.956563 | 31.155531 | 12.882013 | 8.810145 | 6.179829 | 9.863685 | 15.765685 | 21.457507 | 105.275891 | 3.645338 | 18.531483 | 13.816063 | 23.728058 | 8.794998 | 9.369763 | 21.428131 | 2.493683 | 8.760326 | 5.177616 | 12.450319 | 17.289942 | 19.612179 | 10.909635 | 64.494171 | 15.940852 | 24.828982 | 11.335339 | 30.984762 | 0.518269 | 5.770066 | 14.839779 | 4.928294 | 10.526186 | 18.746072 | 10 | 4e6f5dff-14ce-11ea-bce5-f49634744a41 |
| 16464 | 35.86 | -15.41 | 0.0 | 629.272733 | 16.956563 | 31.155531 | 12.882013 | 8.810145 | 6.179829 | 9.863685 | 15.765685 | 21.457507 | 105.275891 | 3.645338 | 18.531483 | 13.816063 | 23.728058 | 8.794998 | 9.369763 | 21.428131 | 2.493683 | 8.760326 | 5.177616 | 12.450319 | 17.289942 | 19.612179 | 10.909635 | 64.494171 | 15.940852 | 24.828982 | 11.335339 | 30.984762 | 0.518269 | 5.770066 | 14.839779 | 4.928294 | 10.526186 | 18.746072 | 10 | 4e6f5e00-14ce-11ea-bce5-f49634744a41 |
| 16465 | 35.86 | -15.40 | 0.0 | 626.164641 | 16.956563 | 31.155531 | 12.882013 | 8.810145 | 6.179829 | 9.863685 | 15.765685 | 21.457507 | 105.275891 | 3.645338 | 18.531483 | 13.816063 | 23.728058 | 8.794998 | 9.369763 | 21.428131 | 2.493683 | 8.760326 | 5.177616 | 12.450319 | 17.289942 | 19.612179 | 10.909635 | 64.494171 | 15.940852 | 24.828982 | 11.335339 | 30.984762 | 0.518269 | 5.770066 | 14.839779 | 4.928294 | 10.526186 | 18.746072 | 10 | 4e6f5e01-14ce-11ea-bce5-f49634744a41 |
" 773 | ], 774 | "text/plain": [ 775 | " X Y ... LC_Type1_mode Square_ID\n", 776 | "16461 35.86 -15.44 ... 10 4e6f5dfd-14ce-11ea-bce5-f49634744a41\n", 777 | "16462 35.86 -15.43 ... 10 4e6f5dfe-14ce-11ea-bce5-f49634744a41\n", 778 | "16463 35.86 -15.42 ... 10 4e6f5dff-14ce-11ea-bce5-f49634744a41\n", 779 | "16464 35.86 -15.41 ... 10 4e6f5e00-14ce-11ea-bce5-f49634744a41\n", 780 | "16465 35.86 -15.40 ... 10 4e6f5e01-14ce-11ea-bce5-f49634744a41\n", 781 | "\n", 782 | "[5 rows x 40 columns]" 783 | ] 784 | }, 785 | "metadata": { 786 | "tags": [] 787 | }, 788 | "execution_count": 4 789 | } 790 | ] 791 | }, 792 | { 793 | "cell_type": "code", 794 | "metadata": { 795 | "id": "AnOG5ZNGHeyC", 796 | "colab_type": "code", 797 | "outputId": "fb029135-7207-4ad1-a906-d693d168b3cd", 798 | "colab": { 799 | "base_uri": "https://localhost:8080/", 800 | "height": 1000 801 | } 802 | }, 803 | "source": [ 804 | "# Previewing some statistical summaries of the dataframe\n", 805 | "# Transposing for a better view\n", 806 | "#\n", 807 | "df.describe().T" 808 | ], 809 | "execution_count": 0, 810 | "outputs": [ 811 | { 812 | "output_type": "execute_result", 813 | "data": { 814 | "text/html": [ 815 | "
\n", 816 | "\n", 829 | "\n", 830 | " \n", 831 | " \n", 832 | " \n", 833 | " \n", 834 | " \n", 835 | " \n", 836 | " \n", 837 | " \n", 838 | " \n", 839 | " \n", 840 | " \n", 841 | " \n", 842 | " \n", 843 | " \n", 844 | " \n", 845 | " \n", 846 | " \n", 847 | " \n", 848 | " \n", 849 | " \n", 850 | " \n", 851 | " \n", 852 | " \n", 853 | " \n", 854 | " \n", 855 | " \n", 856 | " \n", 857 | " \n", 858 | " \n", 859 | " \n", 860 | " \n", 861 | " \n", 862 | " \n", 863 | " \n", 864 | " \n", 865 | " \n", 866 | " \n", 867 | " \n", 868 | " \n", 869 | " \n", 870 | " \n", 871 | " \n", 872 | " \n", 873 | " \n", 874 | " \n", 875 | " \n", 876 | " \n", 877 | " \n", 878 | " \n", 879 | " \n", 880 | " \n", 881 | " \n", 882 | " \n", 883 | " \n", 884 | " \n", 885 | " \n", 886 | " \n", 887 | " \n", 888 | " \n", 889 | " \n", 890 | " \n", 891 | " \n", 892 | " \n", 893 | " \n", 894 | " \n", 895 | " \n", 896 | " \n", 897 | " \n", 898 | " \n", 899 | " \n", 900 | " \n", 901 | " \n", 902 | " \n", 903 | " \n", 904 | " \n", 905 | " \n", 906 | " \n", 907 | " \n", 908 | " \n", 909 | " \n", 910 | " \n", 911 | " \n", 912 | " \n", 913 | " \n", 914 | " \n", 915 | " \n", 916 | " \n", 917 | " \n", 918 | " \n", 919 | " \n", 920 | " \n", 921 | " \n", 922 | " \n", 923 | " \n", 924 | " \n", 925 | " \n", 926 | " \n", 927 | " \n", 928 | " \n", 929 | " \n", 930 | " \n", 931 | " \n", 932 | " \n", 933 | " \n", 934 | " \n", 935 | " \n", 936 | " \n", 937 | " \n", 938 | " \n", 939 | " \n", 940 | " \n", 941 | " \n", 942 | " \n", 943 | " \n", 944 | " \n", 945 | " \n", 946 | " \n", 947 | " \n", 948 | " \n", 949 | " \n", 950 | " \n", 951 | " \n", 952 | " \n", 953 | " \n", 954 | " \n", 955 | " \n", 956 | " \n", 957 | " \n", 958 | " \n", 959 | " \n", 960 | " \n", 961 | " \n", 962 | " \n", 963 | " \n", 964 | " \n", 965 | " \n", 966 | " \n", 967 | " \n", 968 | " \n", 969 | " \n", 970 | " \n", 971 | " \n", 972 | " \n", 973 | " \n", 974 | " \n", 975 | " \n", 976 | " \n", 977 | " \n", 978 | " \n", 979 | " \n", 980 | " \n", 981 | " 
\n", 982 | " \n", 983 | " \n", 984 | " \n", 985 | " \n", 986 | " \n", 987 | " \n", 988 | " \n", 989 | " \n", 990 | " \n", 991 | " \n", 992 | " \n", 993 | " \n", 994 | " \n", 995 | " \n", 996 | " \n", 997 | " \n", 998 | " \n", 999 | " \n", 1000 | " \n", 1001 | " \n", 1002 | " \n", 1003 | " \n", 1004 | " \n", 1005 | " \n", 1006 | " \n", 1007 | " \n", 1008 | " \n", 1009 | " \n", 1010 | " \n", 1011 | " \n", 1012 | " \n", 1013 | " \n", 1014 | " \n", 1015 | " \n", 1016 | " \n", 1017 | " \n", 1018 | " \n", 1019 | " \n", 1020 | " \n", 1021 | " \n", 1022 | " \n", 1023 | " \n", 1024 | " \n", 1025 | " \n", 1026 | " \n", 1027 | " \n", 1028 | " \n", 1029 | " \n", 1030 | " \n", 1031 | " \n", 1032 | " \n", 1033 | " \n", 1034 | " \n", 1035 | " \n", 1036 | " \n", 1037 | " \n", 1038 | " \n", 1039 | " \n", 1040 | " \n", 1041 | " \n", 1042 | " \n", 1043 | " \n", 1044 | " \n", 1045 | " \n", 1046 | " \n", 1047 | " \n", 1048 | " \n", 1049 | " \n", 1050 | " \n", 1051 | " \n", 1052 | " \n", 1053 | " \n", 1054 | " \n", 1055 | " \n", 1056 | " \n", 1057 | " \n", 1058 | " \n", 1059 | " \n", 1060 | " \n", 1061 | " \n", 1062 | " \n", 1063 | " \n", 1064 | " \n", 1065 | " \n", 1066 | " \n", 1067 | " \n", 1068 | " \n", 1069 | " \n", 1070 | " \n", 1071 | " \n", 1072 | " \n", 1073 | " \n", 1074 | " \n", 1075 | " \n", 1076 | " \n", 1077 | " \n", 1078 | " \n", 1079 | " \n", 1080 | " \n", 1081 | " \n", 1082 | " \n", 1083 | " \n", 1084 | " \n", 1085 | " \n", 1086 | " \n", 1087 | " \n", 1088 | " \n", 1089 | " \n", 1090 | " \n", 1091 | " \n", 1092 | " \n", 1093 | " \n", 1094 | " \n", 1095 | " \n", 1096 | " \n", 1097 | " \n", 1098 | " \n", 1099 | " \n", 1100 | " \n", 1101 | " \n", 1102 | " \n", 1103 | " \n", 1104 | " \n", 1105 | " \n", 1106 | " \n", 1107 | " \n", 1108 | " \n", 1109 | " \n", 1110 | " \n", 1111 | " \n", 1112 | " \n", 1113 | " \n", 1114 | " \n", 1115 | " \n", 1116 | " \n", 1117 | " \n", 1118 | " \n", 1119 | " \n", 1120 | " \n", 1121 | " \n", 1122 | " \n", 1123 | " \n", 1124 | " \n", 1125 | " 
\n", 1126 | " \n", 1127 | " \n", 1128 | " \n", 1129 | " \n", 1130 | " \n", 1131 | " \n", 1132 | " \n", 1133 | " \n", 1134 | " \n", 1135 | " \n", 1136 | " \n", 1137 | " \n", 1138 | " \n", 1139 | " \n", 1140 | " \n", 1141 | " \n", 1142 | " \n", 1143 | " \n", 1144 | " \n", 1145 | " \n", 1146 | " \n", 1147 | " \n", 1148 | " \n", 1149 | " \n", 1150 | " \n", 1151 | " \n", 1152 | " \n", 1153 | " \n", 1154 | " \n", 1155 | " \n", 1156 | " \n", 1157 | " \n", 1158 | " \n", 1159 | " \n", 1160 | " \n", 1161 | " \n", 1162 | " \n", 1163 | " \n", 1164 | " \n", 1165 | " \n", 1166 | " \n", 1167 | " \n", 1168 | " \n", 1169 | " \n", 1170 | " \n", 1171 | " \n", 1172 | " \n", 1173 | " \n", 1174 | " \n", 1175 | " \n", 1176 | " \n", 1177 | " \n", 1178 | " \n", 1179 | " \n", 1180 | " \n", 1181 | " \n", 1182 | " \n", 1183 | " \n", 1184 | " \n", 1185 | " \n", 1186 | " \n", 1187 | " \n", 1188 | " \n", 1189 | " \n", 1190 | " \n", 1191 | " \n", 1192 | " \n", 1193 | " \n", 1194 | " \n", 1195 | " \n", 1196 | " \n", 1197 | " \n", 1198 | " \n", 1199 | " \n", 1200 | " \n", 1201 | " \n", 1202 | " \n", 1203 | " \n", 1204 | " \n", 1205 | " \n", 1206 | " \n", 1207 | " \n", 1208 | " \n", 1209 | " \n", 1210 | " \n", 1211 | " \n", 1212 | " \n", 1213 | " \n", 1214 | " \n", 1215 | " \n", 1216 | " \n", 1217 | " \n", 1218 | " \n", 1219 | " \n", 1220 | " \n", 1221 | " \n", 1222 | " \n", 1223 | " \n", 1224 | " \n", 1225 | " \n", 1226 | " \n", 1227 | " \n", 1228 | " \n", 1229 | " \n", 1230 | " \n", 1231 | " \n", 1232 | " \n", 1233 | " \n", 1234 | " \n", 1235 | " \n", 1236 | " \n", 1237 | " \n", 1238 | " \n", 1239 | " \n", 1240 | " \n", 1241 | " \n", 1242 | " \n", 1243 | " \n", 1244 | " \n", 1245 | " \n", 1246 | " \n", 1247 | " \n", 1248 | " \n", 1249 | " \n", 1250 | " \n", 1251 | " \n", 1252 | " \n", 1253 | " \n", 1254 | " \n", 1255 | " \n", 1256 | " \n", 1257 | " \n", 1258 | " \n", 1259 | " \n", 1260 | " \n", 1261 | " \n", 1262 | " \n", 1263 | " \n", 1264 | " \n", 1265 | " \n", 1266 | " \n", 1267 | " \n", 1268 | 
" \n", 1269 | " \n", 1270 | " \n", 1271 | " \n", 1272 | " \n", 1273 | " \n", 1274 | "
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| X | 16466.0 | 35.077656 | 0.392395 | 34.260000 | 34.760000 | 35.050000 | 35.390000 | 35.860000 |
| Y | 16466.0 | -15.813802 | 0.359789 | -16.640000 | -16.070000 | -15.800000 | -15.520000 | -15.210000 |
| target_2015 | 16466.0 | 0.076609 | 0.228734 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| elevation | 16466.0 | 592.848206 | 354.790357 | 45.541444 | 329.063852 | 623.000000 | 751.434813 | 2803.303645 |
| precip 2014-11-16 - 2014-11-23 | 16466.0 | 1.610760 | 4.225461 | 0.000000 | 0.000000 | 0.000000 | 1.261848 | 19.354969 |
| precip 2014-11-23 - 2014-11-30 | 16466.0 | 2.502058 | 8.631846 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 41.023858 |
| precip 2014-11-30 - 2014-12-07 | 16466.0 | 1.162076 | 4.396676 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 22.020803 |
| precip 2014-12-07 - 2014-12-14 | 16466.0 | 8.270610 | 4.263375 | 1.411452 | 5.548440 | 7.941822 | 10.887235 | 18.870675 |
| precip 2014-12-14 - 2014-12-21 | 16466.0 | 8.892459 | 3.760052 | 3.580342 | 5.905440 | 8.618390 | 10.960668 | 23.044340 |
| precip 2014-12-21 - 2014-12-28 | 16466.0 | 9.572821 | 4.523767 | 1.254098 | 6.179885 | 8.786780 | 12.670775 | 21.757828 |
| precip 2014-12-28 - 2015-01-04 | 16466.0 | 22.925036 | 13.690451 | 7.462999 | 11.617057 | 18.381539 | 31.304699 | 62.433432 |
| precip 2015-01-04 - 2015-01-11 | 16466.0 | 28.113210 | 7.794291 | 15.648154 | 23.483879 | 26.085586 | 33.587434 | 51.197420 |
| precip 2015-01-11 - 2015-01-18 | 16466.0 | 58.859208 | 16.807838 | 30.449468 | 45.972601 | 55.501115 | 69.311540 | 105.275891 |
| precip 2015-01-18 - 2015-01-25 | 16466.0 | 1.251173 | 1.969923 | 0.000000 | 0.000000 | 0.502164 | 1.195866 | 11.103658 |
| precip 2015-01-25 - 2015-02-01 | 16466.0 | 34.653177 | 7.456422 | 14.964383 | 30.037450 | 34.363729 | 36.715386 | 53.014243 |
| precip 2015-02-01 - 2015-02-08 | 16466.0 | 28.314888 | 8.047223 | 13.261280 | 22.262417 | 26.512675 | 34.880240 | 44.341312 |
| precip 2015-02-08 - 2015-02-15 | 16466.0 | 12.487909 | 7.064435 | 0.459067 | 5.090802 | 14.092012 | 18.681926 | 28.559923 |
| precip 2015-02-15 - 2015-02-22 | 16466.0 | 3.802584 | 2.674434 | 0.279002 | 1.654155 | 3.301029 | 5.120276 | 15.715008 |
| precip 2015-02-22 - 2015-03-01 | 16466.0 | 17.072285 | 6.074926 | 6.728685 | 13.769957 | 15.549508 | 19.836449 | 36.964993 |
| precip 2015-03-01 - 2015-03-08 | 16466.0 | 9.110949 | 4.572201 | 3.283425 | 5.538848 | 8.235819 | 11.308650 | 25.711649 |
| precip 2015-03-08 - 2015-03-15 | 16466.0 | 0.330641 | 1.008490 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 4.953321 |
| precip 2019-01-20 - 2019-01-27 | 16466.0 | 13.329023 | 5.552818 | 3.813864 | 8.946479 | 12.913147 | 17.123831 | 25.101563 |
| precip 2019-01-27 - 2019-02-03 | 16466.0 | 4.437490 | 5.163184 | 0.000000 | 0.888216 | 2.249833 | 6.832768 | 22.774148 |
| precip 2019-02-03 - 2019-02-10 | 16466.0 | 23.149500 | 6.148509 | 12.450319 | 19.590606 | 21.661698 | 24.213460 | 46.225504 |
| precip 2019-02-10 - 2019-02-17 | 16466.0 | 9.749785 | 4.558172 | 2.801546 | 6.761704 | 9.076457 | 10.830000 | 21.948157 |
| precip 2019-02-17 - 2019-02-24 | 16466.0 | 29.575991 | 8.508608 | 12.780855 | 25.360910 | 30.593415 | 33.300457 | 52.682312 |
| precip 2019-02-24 - 2019-03-03 | 16466.0 | 1.864399 | 4.158313 | 0.000000 | 0.000000 | 0.690000 | 1.429370 | 21.275595 |
| precip 2019-03-03 - 2019-03-10 | 16466.0 | 60.424964 | 8.313199 | 32.108226 | 56.380490 | 60.581696 | 65.721446 | 84.675319 |
| precip 2019-03-10 - 2019-03-17 | 16466.0 | 12.321620 | 9.900994 | 0.000000 | 3.120173 | 12.508606 | 20.004375 | 36.740809 |
| precip 2019-03-17 - 2019-03-24 | 16466.0 | 35.637354 | 14.519169 | 15.803429 | 22.021763 | 34.275716 | 44.253897 | 72.123185 |
| precip 2019-03-24 - 2019-03-31 | 16466.0 | 2.126234 | 3.734829 | 0.000000 | 0.000000 | 0.896323 | 2.076590 | 16.403638 |
| precip 2019-03-31 - 2019-04-07 | 16466.0 | 3.453395 | 8.007248 | 0.000000 | 0.000000 | 0.000000 | 2.914996 | 37.059980 |
| precip 2019-04-07 - 2019-04-14 | 16466.0 | 3.559366 | 3.820294 | 0.000000 | 0.000000 | 2.607053 | 6.390000 | 12.979454 |
| precip 2019-04-14 - 2019-04-21 | 16466.0 | 9.127677 | 6.868937 | 0.000000 | 4.352528 | 7.862453 | 13.459070 | 46.367849 |
| precip 2019-04-21 - 2019-04-28 | 16466.0 | 1.660709 | 4.418032 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 19.475846 |
| precip 2019-04-28 - 2019-05-05 | 16466.0 | 0.526144 | 1.494935 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 6.914834 |
| precip 2019-05-05 - 2019-05-12 | 16466.0 | 0.968101 | 3.690698 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 18.170051 |
| precip 2019-05-12 - 2019-05-19 | 16466.0 | 1.585743 | 4.651863 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 20.092777 |
| LC_Type1_mode | 16466.0 | 10.731750 | 2.026100 | 2.000000 | 9.000000 | 10.000000 | 12.000000 | 17.000000 |
\n", 1275 | "
" 1276 | ], 1277 | "text/plain": [ 1278 | " count mean ... 75% max\n", 1279 | "X 16466.0 35.077656 ... 35.390000 35.860000\n", 1280 | "Y 16466.0 -15.813802 ... -15.520000 -15.210000\n", 1281 | "target_2015 16466.0 0.076609 ... 0.000000 1.000000\n", 1282 | "elevation 16466.0 592.848206 ... 751.434813 2803.303645\n", 1283 | "precip 2014-11-16 - 2014-11-23 16466.0 1.610760 ... 1.261848 19.354969\n", 1284 | "precip 2014-11-23 - 2014-11-30 16466.0 2.502058 ... 0.000000 41.023858\n", 1285 | "precip 2014-11-30 - 2014-12-07 16466.0 1.162076 ... 0.000000 22.020803\n", 1286 | "precip 2014-12-07 - 2014-12-14 16466.0 8.270610 ... 10.887235 18.870675\n", 1287 | "precip 2014-12-14 - 2014-12-21 16466.0 8.892459 ... 10.960668 23.044340\n", 1288 | "precip 2014-12-21 - 2014-12-28 16466.0 9.572821 ... 12.670775 21.757828\n", 1289 | "precip 2014-12-28 - 2015-01-04 16466.0 22.925036 ... 31.304699 62.433432\n", 1290 | "precip 2015-01-04 - 2015-01-11 16466.0 28.113210 ... 33.587434 51.197420\n", 1291 | "precip 2015-01-11 - 2015-01-18 16466.0 58.859208 ... 69.311540 105.275891\n", 1292 | "precip 2015-01-18 - 2015-01-25 16466.0 1.251173 ... 1.195866 11.103658\n", 1293 | "precip 2015-01-25 - 2015-02-01 16466.0 34.653177 ... 36.715386 53.014243\n", 1294 | "precip 2015-02-01 - 2015-02-08 16466.0 28.314888 ... 34.880240 44.341312\n", 1295 | "precip 2015-02-08 - 2015-02-15 16466.0 12.487909 ... 18.681926 28.559923\n", 1296 | "precip 2015-02-15 - 2015-02-22 16466.0 3.802584 ... 5.120276 15.715008\n", 1297 | "precip 2015-02-22 - 2015-03-01 16466.0 17.072285 ... 19.836449 36.964993\n", 1298 | "precip 2015-03-01 - 2015-03-08 16466.0 9.110949 ... 11.308650 25.711649\n", 1299 | "precip 2015-03-08 - 2015-03-15 16466.0 0.330641 ... 0.000000 4.953321\n", 1300 | "precip 2019-01-20 - 2019-01-27 16466.0 13.329023 ... 17.123831 25.101563\n", 1301 | "precip 2019-01-27 - 2019-02-03 16466.0 4.437490 ... 6.832768 22.774148\n", 1302 | "precip 2019-02-03 - 2019-02-10 16466.0 23.149500 ... 
24.213460 46.225504\n", 1303 | "precip 2019-02-10 - 2019-02-17 16466.0 9.749785 ... 10.830000 21.948157\n", 1304 | "precip 2019-02-17 - 2019-02-24 16466.0 29.575991 ... 33.300457 52.682312\n", 1305 | "precip 2019-02-24 - 2019-03-03 16466.0 1.864399 ... 1.429370 21.275595\n", 1306 | "precip 2019-03-03 - 2019-03-10 16466.0 60.424964 ... 65.721446 84.675319\n", 1307 | "precip 2019-03-10 - 2019-03-17 16466.0 12.321620 ... 20.004375 36.740809\n", 1308 | "precip 2019-03-17 - 2019-03-24 16466.0 35.637354 ... 44.253897 72.123185\n", 1309 | "precip 2019-03-24 - 2019-03-31 16466.0 2.126234 ... 2.076590 16.403638\n", 1310 | "precip 2019-03-31 - 2019-04-07 16466.0 3.453395 ... 2.914996 37.059980\n", 1311 | "precip 2019-04-07 - 2019-04-14 16466.0 3.559366 ... 6.390000 12.979454\n", 1312 | "precip 2019-04-14 - 2019-04-21 16466.0 9.127677 ... 13.459070 46.367849\n", 1313 | "precip 2019-04-21 - 2019-04-28 16466.0 1.660709 ... 0.000000 19.475846\n", 1314 | "precip 2019-04-28 - 2019-05-05 16466.0 0.526144 ... 0.000000 6.914834\n", 1315 | "precip 2019-05-05 - 2019-05-12 16466.0 0.968101 ... 0.000000 18.170051\n", 1316 | "precip 2019-05-12 - 2019-05-19 16466.0 1.585743 ... 0.000000 20.092777\n", 1317 | "LC_Type1_mode 16466.0 10.731750 ... 
12.000000 17.000000\n", 1318 | "\n", 1319 | "[39 rows x 8 columns]" 1320 | ] 1321 | }, 1322 | "metadata": { 1323 | "tags": [] 1324 | }, 1325 | "execution_count": 5 1326 | } 1327 | ] 1328 | }, 1329 | { 1330 | "cell_type": "markdown", 1331 | "metadata": { 1332 | "id": "VGJx1k4cYBm5", 1333 | "colab_type": "text" 1334 | }, 1335 | "source": [ 1336 | "#### Shape and Size" 1337 | ] 1338 | }, 1339 | { 1340 | "cell_type": "code", 1341 | "metadata": { 1342 | "id": "3ZDs7u909mEt", 1343 | "colab_type": "code", 1344 | "outputId": "94ca72fe-dd3a-475e-dce4-5fc8e5456ecb", 1345 | "colab": { 1346 | "base_uri": "https://localhost:8080/", 1347 | "height": 34 1348 | } 1349 | }, 1350 | "source": [ 1351 | "# Checking for the shape and size of the dataframe\n", 1352 | "#\n", 1353 | "df.shape, df.size" 1354 | ], 1355 | "execution_count": 0, 1356 | "outputs": [ 1357 | { 1358 | "output_type": "execute_result", 1359 | "data": { 1360 | "text/plain": [ 1361 | "((16466, 40), 658640)" 1362 | ] 1363 | }, 1364 | "metadata": { 1365 | "tags": [] 1366 | }, 1367 | "execution_count": 6 1368 | } 1369 | ] 1370 | }, 1371 | { 1372 | "cell_type": "markdown", 1373 | "metadata": { 1374 | "id": "L2wdvD_8X3Ln", 1375 | "colab_type": "text" 1376 | }, 1377 | "source": [ 1378 | "#### Missing Values" 1379 | ] 1380 | }, 1381 | { 1382 | "cell_type": "code", 1383 | "metadata": { 1384 | "id": "di56pTPRWbm9", 1385 | "colab_type": "code", 1386 | "outputId": "eb5cdc57-906c-4250-b33b-8b59254aafa7", 1387 | "colab": { 1388 | "base_uri": "https://localhost:8080/", 1389 | "height": 34 1390 | } 1391 | }, 1392 | "source": [ 1393 | "# Checking for missing values\n", 1394 | "#\n", 1395 | "df.isnull().sum().any()" 1396 | ], 1397 | "execution_count": 0, 1398 | "outputs": [ 1399 | { 1400 | "output_type": "execute_result", 1401 | "data": { 1402 | "text/plain": [ 1403 | "False" 1404 | ] 1405 | }, 1406 | "metadata": { 1407 | "tags": [] 1408 | }, 1409 | "execution_count": 7 1410 | } 1411 | ] 1412 | }, 1413 | { 1414 | "cell_type": 
"markdown", 1415 | "metadata": { 1416 | "id": "nT_RY2oSX6_S", 1417 | "colab_type": "text" 1418 | }, 1419 | "source": [ 1420 | "#### Duplicated Values" 1421 | ] 1422 | }, 1423 | { 1424 | "cell_type": "code", 1425 | "metadata": { 1426 | "id": "-w80Xtd3WbjP", 1427 | "colab_type": "code", 1428 | "outputId": "75d1117f-25e0-4b0c-d22f-6fb20433d0cf", 1429 | "colab": { 1430 | "base_uri": "https://localhost:8080/", 1431 | "height": 34 1432 | } 1433 | }, 1434 | "source": [ 1435 | "# Checking for duplicates\n", 1436 | "#\n", 1437 | "df.duplicated().any()" 1438 | ], 1439 | "execution_count": 0, 1440 | "outputs": [ 1441 | { 1442 | "output_type": "execute_result", 1443 | "data": { 1444 | "text/plain": [ 1445 | "False" 1446 | ] 1447 | }, 1448 | "metadata": { 1449 | "tags": [] 1450 | }, 1451 | "execution_count": 8 1452 | } 1453 | ] 1454 | }, 1455 | { 1456 | "cell_type": "markdown", 1457 | "metadata": { 1458 | "id": "Pq4eeZlwX9qW", 1459 | "colab_type": "text" 1460 | }, 1461 | "source": [ 1462 | "#### Data Types" 1463 | ] 1464 | }, 1465 | { 1466 | "cell_type": "code", 1467 | "metadata": { 1468 | "id": "utNJgAZ-fvIB", 1469 | "colab_type": "code", 1470 | "outputId": "3ae8ae17-be78-44e2-8885-a371759fca0a", 1471 | "colab": { 1472 | "base_uri": "https://localhost:8080/", 1473 | "height": 738 1474 | } 1475 | }, 1476 | "source": [ 1477 | "# Checking if the columns are represented with the appropriate datatypes\n", 1478 | "#\n", 1479 | "df.dtypes" 1480 | ], 1481 | "execution_count": 0, 1482 | "outputs": [ 1483 | { 1484 | "output_type": "execute_result", 1485 | "data": { 1486 | "text/plain": [ 1487 | "X float64\n", 1488 | "Y float64\n", 1489 | "target_2015 float64\n", 1490 | "elevation float64\n", 1491 | "precip 2014-11-16 - 2014-11-23 float64\n", 1492 | "precip 2014-11-23 - 2014-11-30 float64\n", 1493 | "precip 2014-11-30 - 2014-12-07 float64\n", 1494 | "precip 2014-12-07 - 2014-12-14 float64\n", 1495 | "precip 2014-12-14 - 2014-12-21 float64\n", 1496 | "precip 2014-12-21 - 2014-12-28 
float64\n", 1497 | "precip 2014-12-28 - 2015-01-04 float64\n", 1498 | "precip 2015-01-04 - 2015-01-11 float64\n", 1499 | "precip 2015-01-11 - 2015-01-18 float64\n", 1500 | "precip 2015-01-18 - 2015-01-25 float64\n", 1501 | "precip 2015-01-25 - 2015-02-01 float64\n", 1502 | "precip 2015-02-01 - 2015-02-08 float64\n", 1503 | "precip 2015-02-08 - 2015-02-15 float64\n", 1504 | "precip 2015-02-15 - 2015-02-22 float64\n", 1505 | "precip 2015-02-22 - 2015-03-01 float64\n", 1506 | "precip 2015-03-01 - 2015-03-08 float64\n", 1507 | "precip 2015-03-08 - 2015-03-15 float64\n", 1508 | "precip 2019-01-20 - 2019-01-27 float64\n", 1509 | "precip 2019-01-27 - 2019-02-03 float64\n", 1510 | "precip 2019-02-03 - 2019-02-10 float64\n", 1511 | "precip 2019-02-10 - 2019-02-17 float64\n", 1512 | "precip 2019-02-17 - 2019-02-24 float64\n", 1513 | "precip 2019-02-24 - 2019-03-03 float64\n", 1514 | "precip 2019-03-03 - 2019-03-10 float64\n", 1515 | "precip 2019-03-10 - 2019-03-17 float64\n", 1516 | "precip 2019-03-17 - 2019-03-24 float64\n", 1517 | "precip 2019-03-24 - 2019-03-31 float64\n", 1518 | "precip 2019-03-31 - 2019-04-07 float64\n", 1519 | "precip 2019-04-07 - 2019-04-14 float64\n", 1520 | "precip 2019-04-14 - 2019-04-21 float64\n", 1521 | "precip 2019-04-21 - 2019-04-28 float64\n", 1522 | "precip 2019-04-28 - 2019-05-05 float64\n", 1523 | "precip 2019-05-05 - 2019-05-12 float64\n", 1524 | "precip 2019-05-12 - 2019-05-19 float64\n", 1525 | "LC_Type1_mode int64\n", 1526 | "Square_ID object\n", 1527 | "dtype: object" 1528 | ] 1529 | }, 1530 | "metadata": { 1531 | "tags": [] 1532 | }, 1533 | "execution_count": 9 1534 | } 1535 | ] 1536 | }, 1537 | { 1538 | "cell_type": "markdown", 1539 | "metadata": { 1540 | "id": "EDdJqbOWR_Mw", 1541 | "colab_type": "text" 1542 | }, 1543 | "source": [ 1544 | "## Data Cleaning" 1545 | ] 1546 | }, 1547 | { 1548 | "cell_type": "markdown", 1549 | "metadata": { 1550 | "id": "pKS5LhlvSxTp", 1551 | "colab_type": "text" 1552 | }, 1553 | "source": [ 1554 | 
"#### Separating the train and test sets " 1555 | ] 1556 | }, 1557 | { 1558 | "cell_type": "code", 1559 | "metadata": { 1560 | "id": "TOm258dvHetE", 1561 | "colab_type": "code", 1562 | "colab": {} 1563 | }, 1564 | "source": [ 1565 | "# Creating lists of columns to be used in separating the dataframe into training and testing datasets\n", 1566 | "# Using a for loop for efficiency\n", 1567 | "#\n", 1568 | "precip_features_2019 = []\n", 1569 | "precip_features_2015 = []\n", 1570 | "for col in df.columns:\n", 1571 | " if '2019' in col:\n", 1572 | " precip_features_2019.append(col)\n", 1573 | " elif 'precip 2014' in col:\n", 1574 | " precip_features_2015.append(col)\n", 1575 | " elif 'precip 2015' in col:\n", 1576 | " precip_features_2015.append(col)" 1577 | ], 1578 | "execution_count": 0, 1579 | "outputs": [] 1580 | }, 1581 | { 1582 | "cell_type": "code", 1583 | "metadata": { 1584 | "id": "xTMJlAXOJ6Of", 1585 | "colab_type": "code", 1586 | "outputId": "a2be0b25-21e7-4e69-9c37-4dab9abbc4f7", 1587 | "colab": { 1588 | "base_uri": "https://localhost:8080/", 1589 | "height": 311 1590 | } 1591 | }, 1592 | "source": [ 1593 | "# Separating the train dataset from the main dataframe\n", 1594 | "#\n", 1595 | "train = df[df.columns.difference(precip_features_2019)]\n", 1596 | "\n", 1597 | "# Previewing the first two rows of the train dataset\n", 1598 | "#\n", 1599 | "train.head(2)" 1600 | ], 1601 | "execution_count": 0, 1602 | "outputs": [ 1603 | { 1604 | "output_type": "execute_result", 1605 | "data": { 1606 | "text/html": [ 1607 | "
\n", 1704 | "
| | LC_Type1_mode | Square_ID | X | Y | elevation | precip 2014-11-16 - 2014-11-23 | precip 2014-11-23 - 2014-11-30 | precip 2014-11-30 - 2014-12-07 | precip 2014-12-07 - 2014-12-14 | precip 2014-12-14 - 2014-12-21 | precip 2014-12-21 - 2014-12-28 | precip 2014-12-28 - 2015-01-04 | precip 2015-01-04 - 2015-01-11 | precip 2015-01-11 - 2015-01-18 | precip 2015-01-18 - 2015-01-25 | precip 2015-01-25 - 2015-02-01 | precip 2015-02-01 - 2015-02-08 | precip 2015-02-08 - 2015-02-15 | precip 2015-02-15 - 2015-02-22 | precip 2015-02-22 - 2015-03-01 | precip 2015-03-01 - 2015-03-08 | precip 2015-03-08 - 2015-03-15 | target_2015 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9 | 4e3c3896-14ce-11ea-bce5-f49634744a41 | 34.26 | -15.91 | 887.764222 | 0.0 | 0.0 | 0.0 | 14.844025 | 14.552823 | 12.237766 | 57.451361 | 30.127047 | 30.449468 | 1.521829 | 29.389995 | 32.878318 | 8.179804 | 0.963981 | 16.659097 | 3.304466 | 0.0 | 0.0 |
| 1 | 9 | 4e3c3897-14ce-11ea-bce5-f49634744a41 | 34.26 | -15.90 | 743.403912 | 0.0 | 0.0 | 0.0 | 14.844025 | 14.552823 | 12.237766 | 57.451361 | 30.127047 | 30.449468 | 1.521829 | 29.389995 | 32.878318 | 8.179804 | 0.963981 | 16.659097 | 3.304466 | 0.0 | 0.0 |
\n", 1705 | "
" 1706 | ], 1707 | "text/plain": [ 1708 | " LC_Type1_mode ... target_2015\n", 1709 | "0 9 ... 0.0\n", 1710 | "1 9 ... 0.0\n", 1711 | "\n", 1712 | "[2 rows x 23 columns]" 1713 | ] 1714 | }, 1715 | "metadata": { 1716 | "tags": [] 1717 | }, 1718 | "execution_count": 11 1719 | } 1720 | ] 1721 | }, 1722 | { 1723 | "cell_type": "code", 1724 | "metadata": { 1725 | "id": "od6oWJvdTV1K", 1726 | "colab_type": "code", 1727 | "outputId": "4c0b0ada-b1c5-4b4d-8d20-784d91c224d3", 1728 | "colab": { 1729 | "base_uri": "https://localhost:8080/", 1730 | "height": 311 1731 | } 1732 | }, 1733 | "source": [ 1734 | "# Separating the test dataset from the main dataframe\n", 1735 | "#\n", 1736 | "precip_features_2019.extend(['X', 'Y', 'elevation', 'LC_Type1_mode', 'Square_ID'])\n", 1737 | "test = df[precip_features_2019]\n", 1738 | "\n", 1739 | "# Previewing the first two rows of the test dataset\n", 1740 | "#\n", 1741 | "test.head(2)" 1742 | ], 1743 | "execution_count": 0, 1744 | "outputs": [ 1745 | { 1746 | "output_type": "execute_result", 1747 | "data": { 1748 | "text/html": [ 1749 | "
\n", 1843 | "
| | precip 2019-01-20 - 2019-01-27 | precip 2019-01-27 - 2019-02-03 | precip 2019-02-03 - 2019-02-10 | precip 2019-02-10 - 2019-02-17 | precip 2019-02-17 - 2019-02-24 | precip 2019-02-24 - 2019-03-03 | precip 2019-03-03 - 2019-03-10 | precip 2019-03-10 - 2019-03-17 | precip 2019-03-17 - 2019-03-24 | precip 2019-03-24 - 2019-03-31 | precip 2019-03-31 - 2019-04-07 | precip 2019-04-07 - 2019-04-14 | precip 2019-04-14 - 2019-04-21 | precip 2019-04-21 - 2019-04-28 | precip 2019-04-28 - 2019-05-05 | precip 2019-05-05 - 2019-05-12 | precip 2019-05-12 - 2019-05-19 | X | Y | elevation | LC_Type1_mode | Square_ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12.99262 | 4.582856 | 35.037532 | 4.796012 | 28.083314 | 0.0 | 58.362456 | 18.264692 | 17.537486 | 0.896323 | 1.68 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 34.26 | -15.91 | 887.764222 | 9 | 4e3c3896-14ce-11ea-bce5-f49634744a41 |
| 1 | 12.99262 | 4.582856 | 35.037532 | 4.796012 | 28.083314 | 0.0 | 58.362456 | 18.264692 | 17.537486 | 0.896323 | 1.68 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 34.26 | -15.90 | 743.403912 | 9 | 4e3c3897-14ce-11ea-bce5-f49634744a41 |
\n", 1844 | "
" 1845 | ], 1846 | "text/plain": [ 1847 | " precip 2019-01-20 - 2019-01-27 ... Square_ID\n", 1848 | "0 12.99262 ... 4e3c3896-14ce-11ea-bce5-f49634744a41\n", 1849 | "1 12.99262 ... 4e3c3897-14ce-11ea-bce5-f49634744a41\n", 1850 | "\n", 1851 | "[2 rows x 22 columns]" 1852 | ] 1853 | }, 1854 | "metadata": { 1855 | "tags": [] 1856 | }, 1857 | "execution_count": 12 1858 | } 1859 | ] 1860 | }, 1861 | { 1862 | "cell_type": "markdown", 1863 | "metadata": { 1864 | "id": "TCSE0BL1SGdE", 1865 | "colab_type": "text" 1866 | }, 1867 | "source": [ 1868 | "#### Renaming columns" 1869 | ] 1870 | }, 1871 | { 1872 | "cell_type": "code", 1873 | "metadata": { 1874 | "id": "F_IbMz6pOwr7", 1875 | "colab_type": "code", 1876 | "colab": {} 1877 | }, 1878 | "source": [ 1879 | "# Creating a dictionary of column names to be renamed for the training dataset\n", 1880 | "# The column names are renamed for convenience\n", 1881 | "#\n", 1882 | "new_2015_cols = {}\n", 1883 | "for number, col in enumerate(precip_features_2015, start=1):\n", 1884 | " if 'precip' in col:\n", 1885 | " new_2015_cols[col] = 'week_' + str(number) + '_precip'\n", 1886 | "\n", 1887 | " \n", 1888 | "# Creating a dictionary of column names to be renamed for the testing dataset\n", 1889 | "#\n", 1890 | "new_2019_cols = {}\n", 1891 | "for number, col in enumerate(precip_features_2019, start=1):\n", 1892 | " if 'precip' in col:\n", 1893 | " new_2019_cols[col] = 'week_' + str(number) + '_precip'\n", 1894 | " \n", 1895 | "# Renaming the columns\n", 1896 | "# Reassigning instead of using inplace=True avoids pandas' SettingWithCopyWarning on these subsets of df\n", 1897 | "train = train.rename(columns = new_2015_cols)\n", 1898 | "test = test.rename(columns = new_2019_cols)" 1899 | ], 1900 | "execution_count": 0, 1901 | "outputs": [] 1902 | }, 1903 | { 1904 | "cell_type": "code", 1905 | "metadata": { 1906 | "id": "gbxK4ssoOwn9", 1907 | "colab_type": "code", 1908 | "outputId": "7e4ee8c7-69c5-4e47-b83a-ae722dc21721", 1909 | "colab": { 1910 | 
"base_uri": 
"https://localhost:8080/", 1911 | "height": 307 1912 | } 1913 | }, 1914 | "source": [ 1915 | "# Previewing the first three rows of the cleaned train set\n", 1916 | "#\n", 1917 | "train.head(3)" 1918 | ], 1919 | "execution_count": 0, 1920 | "outputs": [ 1921 | { 1922 | "output_type": "execute_result", 1923 | "data": { 1924 | "text/html": [ 1925 | "
\n", 2048 | "
| | LC_Type1_mode | Square_ID | X | Y | elevation | week_1_precip | week_2_precip | week_3_precip | week_4_precip | week_5_precip | week_6_precip | week_7_precip | week_8_precip | week_9_precip | week_10_precip | week_11_precip | week_12_precip | week_13_precip | week_14_precip | week_15_precip | week_16_precip | week_17_precip | target_2015 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9 | 4e3c3896-14ce-11ea-bce5-f49634744a41 | 34.26 | -15.91 | 887.764222 | 0.0 | 0.0 | 0.0 | 14.844025 | 14.552823 | 12.237766 | 57.451361 | 30.127047 | 30.449468 | 1.521829 | 29.389995 | 32.878318 | 8.179804 | 0.963981 | 16.659097 | 3.304466 | 0.0 | 0.0 |
| 1 | 9 | 4e3c3897-14ce-11ea-bce5-f49634744a41 | 34.26 | -15.90 | 743.403912 | 0.0 | 0.0 | 0.0 | 14.844025 | 14.552823 | 12.237766 | 57.451361 | 30.127047 | 30.449468 | 1.521829 | 29.389995 | 32.878318 | 8.179804 | 0.963981 | 16.659097 | 3.304466 | 0.0 | 0.0 |
| 2 | 9 | 4e3c3898-14ce-11ea-bce5-f49634744a41 | 34.26 | -15.89 | 565.728343 | 0.0 | 0.0 | 0.0 | 14.844025 | 14.552823 | 12.237766 | 57.451361 | 30.127047 | 30.449468 | 1.521829 | 29.389995 | 32.878318 | 8.179804 | 0.963981 | 16.659097 | 3.304466 | 0.0 | 0.0 |
\n", 2049 | "
" 2050 | ], 2051 | "text/plain": [ 2052 | " LC_Type1_mode ... target_2015\n", 2053 | "0 9 ... 0.0\n", 2054 | "1 9 ... 0.0\n", 2055 | "2 9 ... 0.0\n", 2056 | "\n", 2057 | "[3 rows x 23 columns]" 2058 | ] 2059 | }, 2060 | "metadata": { 2061 | "tags": [] 2062 | }, 2063 | "execution_count": 14 2064 | } 2065 | ] 2066 | }, 2067 | { 2068 | "cell_type": "markdown", 2069 | "metadata": { 2070 | "id": "f1OOTy-USMWb", 2071 | "colab_type": "text" 2072 | }, 2073 | "source": [ 2074 | "#### Re-aligning the Train and Test Datasets" 2075 | ] 2076 | }, 2077 | { 2078 | "cell_type": "code", 2079 | "metadata": { 2080 | "id": "K_LmbYUFN5zW", 2081 | "colab_type": "code", 2082 | "outputId": "38663225-daff-45ad-b49c-80bd7da2bdb4", 2083 | "colab": { 2084 | "base_uri": "https://localhost:8080/", 2085 | "height": 307 2086 | } 2087 | }, 2088 | "source": [ 2089 | "# Separating the target variable\n", 2090 | "#\n", 2091 | "target = train.target_2015\n", 2092 | "\n", 2093 | "\n", 2094 | "# Aligning the training and testing datasets\n", 2095 | "#\n", 2096 | "train, test = train.align(test, join = 'inner', axis = 1)\n", 2097 | "\n", 2098 | "\n", 2099 | "# Previewing the first three rows of the cleaned and realigned test set\n", 2100 | "#\n", 2101 | "test.head(3)" 2102 | ], 2103 | "execution_count": 0, 2104 | "outputs": [ 2105 | { 2106 | "output_type": "execute_result", 2107 | "data": { 2108 | "text/html": [ 2109 | "
\n", 2228 | "
| | LC_Type1_mode | Square_ID | X | Y | elevation | week_1_precip | week_2_precip | week_3_precip | week_4_precip | week_5_precip | week_6_precip | week_7_precip | week_8_precip | week_9_precip | week_10_precip | week_11_precip | week_12_precip | week_13_precip | week_14_precip | week_15_precip | week_16_precip | week_17_precip |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9 | 4e3c3896-14ce-11ea-bce5-f49634744a41 | 34.26 | -15.91 | 887.764222 | 12.99262 | 4.582856 | 35.037532 | 4.796012 | 28.083314 | 0.0 | 58.362456 | 18.264692 | 17.537486 | 0.896323 | 1.68 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 9 | 4e3c3897-14ce-11ea-bce5-f49634744a41 | 34.26 | -15.90 | 743.403912 | 12.99262 | 4.582856 | 35.037532 | 4.796012 | 28.083314 | 0.0 | 58.362456 | 18.264692 | 17.537486 | 0.896323 | 1.68 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 9 | 4e3c3898-14ce-11ea-bce5-f49634744a41 | 34.26 | -15.89 | 565.728343 | 12.99262 | 4.582856 | 35.037532 | 4.796012 | 28.083314 | 0.0 | 58.362456 | 18.264692 | 17.537486 | 0.896323 | 1.68 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
\n", 2229 | "
" 2230 | ], 2231 | "text/plain": [ 2232 | " LC_Type1_mode ... week_17_precip\n", 2233 | "0 9 ... 0.0\n", 2234 | "1 9 ... 0.0\n", 2235 | "2 9 ... 0.0\n", 2236 | "\n", 2237 | "[3 rows x 22 columns]" 2238 | ] 2239 | }, 2240 | "metadata": { 2241 | "tags": [] 2242 | }, 2243 | "execution_count": 15 2244 | } 2245 | ] 2246 | }, 2247 | { 2248 | "cell_type": "markdown", 2249 | "metadata": { 2250 | "id": "jUolrELTTFGm", 2251 | "colab_type": "text" 2252 | }, 2253 | "source": [ 2254 | "## Model Selection" 2255 | ] 2256 | }, 2257 | { 2258 | "cell_type": "code", 2259 | "metadata": { 2260 | "id": "oo8XVSzBhWph", 2261 | "colab_type": "code", 2262 | "outputId": "b2900251-94b7-4766-9caf-516fe035e450", 2263 | "colab": { 2264 | "base_uri": "https://localhost:8080/", 2265 | "height": 351 2266 | } 2267 | }, 2268 | "source": [ 2269 | "# Installing catboost\n", 2270 | "!pip install catboost==0.20.2" 2271 | ], 2272 | "execution_count": 0, 2273 | "outputs": [ 2274 | { 2275 | "output_type": "stream", 2276 | "text": [ 2277 | "Collecting catboost\n", 2278 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/3d/f6/733fe7cca5d0d882e1a708ad59da2510416cc2e4fa54e17c7a5082f67811/catboost-0.20.1-cp36-none-manylinux1_x86_64.whl (63.6MB)\n", 2279 | "\u001b[K |████████████████████████████████| 63.6MB 60.8MB/s \n", 2280 | "\u001b[?25hRequirement already satisfied: graphviz in /usr/local/lib/python3.6/dist-packages (from catboost) (0.10.1)\n", 2281 | "Requirement already satisfied: numpy>=1.16.0 in /usr/local/lib/python3.6/dist-packages (from catboost) (1.17.4)\n", 2282 | "Requirement already satisfied: pandas>=0.24.0 in /usr/local/lib/python3.6/dist-packages (from catboost) (0.25.3)\n", 2283 | "Requirement already satisfied: scipy in /usr/local/lib/python3.6/dist-packages (from catboost) (1.3.3)\n", 2284 | "Requirement already satisfied: plotly in /usr/local/lib/python3.6/dist-packages (from catboost) (4.1.1)\n", 2285 | "Requirement already satisfied: six in 
/usr/local/lib/python3.6/dist-packages (from catboost) (1.12.0)\n", 2286 | "Requirement already satisfied: matplotlib in /usr/local/lib/python3.6/dist-packages (from catboost) (3.1.2)\n", 2287 | "Requirement already satisfied: python-dateutil>=2.6.1 in /usr/local/lib/python3.6/dist-packages (from pandas>=0.24.0->catboost) (2.6.1)\n", 2288 | "Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.6/dist-packages (from pandas>=0.24.0->catboost) (2018.9)\n", 2289 | "Requirement already satisfied: retrying>=1.3.3 in /usr/local/lib/python3.6/dist-packages (from plotly->catboost) (1.3.3)\n", 2290 | "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.6/dist-packages (from matplotlib->catboost) (0.10.0)\n", 2291 | "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->catboost) (2.4.5)\n", 2292 | "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->catboost) (1.1.0)\n", 2293 | "Requirement already satisfied: setuptools in /usr/local/lib/python3.6/dist-packages (from kiwisolver>=1.0.1->matplotlib->catboost) (42.0.2)\n", 2294 | "Installing collected packages: catboost\n", 2295 | "Successfully installed catboost-0.20.1\n" 2296 | ], 2297 | "name": "stdout" 2298 | } 2299 | ] 2300 | }, 2301 | { 2302 | "cell_type": "markdown", 2303 | "metadata": { 2304 | "id": "uKFPY7QXTN-A", 2305 | "colab_type": "text" 2306 | }, 2307 | "source": [ 2308 | "#### Comparing different models to find the most accurate" 2309 | ] 2310 | }, 2311 | { 2312 | "cell_type": "code", 2313 | "metadata": { 2314 | "id": "4o5TqrYmN5wJ", 2315 | "colab_type": "code", 2316 | "outputId": "d8e09ba5-c30b-4fd1-8db1-302b3c53d5e5", 2317 | "colab": { 2318 | "base_uri": "https://localhost:8080/", 2319 | "height": 432 2320 | } 2321 | }, 2322 | "source": [ 2323 | "# Using different models to find the optimal model\n", 2324 | "#\n", 2325 | "from 
sklearn.model_selection import KFold, cross_val_score\n", 2326 | "from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, AdaBoostRegressor\n", 2327 | "from sklearn.tree import DecisionTreeRegressor\n", 2328 | "from sklearn.svm import SVR\n", 2329 | "from sklearn.neighbors import KNeighborsRegressor\n", 2330 | "from xgboost import XGBRegressor\n", 2331 | "from sklearn.linear_model import LinearRegression\n", 2332 | "from sklearn.metrics import mean_squared_error\n", 2333 | "from catboost import CatBoostRegressor\n", 2334 | "import warnings\n", 2335 | "warnings.filterwarnings('ignore')\n", 2336 | "\n", 2337 | "\n", 2338 | "# Creating a list of regressor algorithms to compare with\n", 2339 | "#\n", 2340 | "models = [RandomForestRegressor(), GradientBoostingRegressor(), AdaBoostRegressor(), DecisionTreeRegressor(), XGBRegressor(objective ='reg:squarederror'),\\\n", 2341 | " SVR(), KNeighborsRegressor(), LinearRegression(), CatBoostRegressor(logging_level='Silent')]\n", 2342 | "\n", 2343 | "\n", 2344 | "# Creating one empty list per algorithm, to store the negative MSE score of each fold\n", 2345 | "# The list for SVR is named SVM so that it does not shadow the imported SVR class\n", 2346 | "RandomForest, GradientBoosting, AdaBoost, DecisionTree, XGB, SVM, KNeighbors, Linear, Cat = ([] for x in range(9))\n", 2347 | "\n", 2348 | "\n", 2349 | "# Creating a list containing the list of each algorithm. 
Created for easy iteration\n", 2350 | "#\n", 2351 | "model_list = [RandomForest, GradientBoosting, AdaBoost, DecisionTree, XGB, SVM, KNeighbors, Linear, Cat]\n", 2352 | "\n", 2353 | "\n", 2354 | "# Splitting the data into features and the target variable\n", 2355 | "#\n", 2356 | "X = train.drop('Square_ID', axis = 1)\n", 2357 | "y = target\n", 2358 | "\n", 2359 | "\n", 2360 | "# Creating a cross validation of 10 folds\n", 2361 | "# random_state is omitted: it has no effect unless shuffle=True, and recent scikit-learn versions raise an error for that combination\n", 2362 | "kfold = KFold(n_splits=10)\n", 2363 | "\n", 2364 | "\n", 2365 | "# Iterating through each model and appending the scores of each fold to the appropriate list\n", 2366 | "#\n", 2367 | "for i, j in zip(models, model_list):\n", 2368 | " j.extend(list(cross_val_score(i, X, y, scoring = 'neg_mean_squared_error', cv = kfold)))\n", 2369 | "\n", 2370 | " \n", 2371 | "# Creating a function to convert neg_mean_squared_error scores to RMSE values\n", 2372 | "#\n", 2373 | "def sq(lis):\n", 2374 | " new_lis = []\n", 2375 | " lis = np.array(lis)\n", 2376 | " for i in lis:\n", 2377 | " i = np.sqrt(i*-1)\n", 2378 | " new_lis.append(i)\n", 2379 | " return new_lis\n", 2380 | "\n", 2381 | "\n", 2382 | "# Creating a dataframe of all the rmses from the iterations for each model\n", 2383 | "#\n", 2384 | "rmses = pd.DataFrame({'Fold': np.arange(1, 11), 'RandomForest': sq(RandomForest), 'GradientBoosting': sq(GradientBoosting), 'Adaboost': sq(AdaBoost), 'DecisionTree': sq(DecisionTree),\\\n", 2385 | " 'XGB': sq(XGB), 'SVR': sq(SVM), 'Kneighbors': sq(KNeighbors), 'Linear': sq(Linear), 'Cat': sq(Cat)})\n", 2386 | "\n", 2387 | "# Setting the index\n", 2388 | "#\n", 2389 | "rmses.set_index('Fold', inplace = True)\n", 2390 | "\n", 2391 | "\n", 2392 | "# Calculating the mean and standard deviation rmse of each algorithm\n", 2393 | "# Both are computed from the fold rows before either summary row is added, so 'std' excludes 
the 'mean' row\n", 2394 | "fold_mean, fold_std = rmses.mean(), rmses.std()\n", 2395 | "rmses.loc['mean'], rmses.loc['std'] = fold_mean, fold_std\n", 2396 | "\n", 2397 | "\n", 2398 | "# Previewing the rmses dataframe\n", 2399 | "#\n", 2400 | "rmses" 2401 | ], 2402 | 
"execution_count": 0, 2403 | "outputs": [ 2404 | { 2405 | "output_type": "execute_result", 2406 | "data": { 2407 | "text/html": [ 2408 | "
Fold  RandomForest  GradientBoosting  Adaboost  DecisionTree       XGB       SVR  Kneighbors    Linear       Cat\n",
"1         0.085937          0.084569  0.091366      0.085926  0.085266  0.130031    0.086023  0.135525  0.084636\n",
"2         0.073427          0.059387  0.088300      0.089046  0.058436  0.109598    0.058385  0.089754  0.062418\n",
"3         0.112610          0.089913  0.091947      0.141038  0.088311  0.127553    0.096783  0.121761  0.094261\n",
"4         0.159949          0.166018  0.213543      0.198635  0.170162  0.198225    0.191840  0.262740  0.160362\n",
"5         0.160206          0.179172  0.218934      0.224328  0.176187  0.206909    0.177272  0.336841  0.162315\n",
"6         0.109505          0.118684  0.149015      0.133493  0.118987  0.148906    0.109368  0.231556  0.109079\n",
"7         0.058981          0.056948  0.081455      0.064764  0.058730  0.112576    0.059242  0.153379  0.052203\n",
"8         0.157463          0.102168  0.124836      0.157540  0.101140  0.195707    0.184638  0.115595  0.145777\n",
"9         0.246438          0.264649  0.269114      0.272945  0.260912  0.276731    0.324014  0.341167  0.260180\n",
"10        0.224625          0.225823  0.366404      0.315316  0.227395  0.236619    0.230469  0.370660  0.216168\n",
"mean      0.138914          0.134733  0.169491      0.168303  0.134553  0.174286    0.151803  0.215898  0.134740\n",
"std       0.059320          0.067602  0.090530      0.079045  0.067124  0.054088    0.081009  0.100952  0.063839
" 2597 | ], 2598 | "text/plain": [ 2599 | " RandomForest GradientBoosting Adaboost ... Kneighbors Linear Cat\n", 2600 | "Fold ... \n", 2601 | "1 0.085937 0.084569 0.091366 ... 0.086023 0.135525 0.084636\n", 2602 | "2 0.073427 0.059387 0.088300 ... 0.058385 0.089754 0.062418\n", 2603 | "3 0.112610 0.089913 0.091947 ... 0.096783 0.121761 0.094261\n", 2604 | "4 0.159949 0.166018 0.213543 ... 0.191840 0.262740 0.160362\n", 2605 | "5 0.160206 0.179172 0.218934 ... 0.177272 0.336841 0.162315\n", 2606 | "6 0.109505 0.118684 0.149015 ... 0.109368 0.231556 0.109079\n", 2607 | "7 0.058981 0.056948 0.081455 ... 0.059242 0.153379 0.052203\n", 2608 | "8 0.157463 0.102168 0.124836 ... 0.184638 0.115595 0.145777\n", 2609 | "9 0.246438 0.264649 0.269114 ... 0.324014 0.341167 0.260180\n", 2610 | "10 0.224625 0.225823 0.366404 ... 0.230469 0.370660 0.216168\n", 2611 | "mean 0.138914 0.134733 0.169491 ... 0.151803 0.215898 0.134740\n", 2612 | "std 0.059320 0.067602 0.090530 ... 0.081009 0.100952 0.063839\n", 2613 | "\n", 2614 | "[12 rows x 9 columns]" 2615 | ] 2616 | }, 2617 | "metadata": { 2618 | "tags": [] 2619 | }, 2620 | "execution_count": 17 2621 | } 2622 | ] 2623 | }, 2624 | { 2625 | "cell_type": "markdown", 2626 | "metadata": { 2627 | "id": "cMvlmS9lTVvY", 2628 | "colab_type": "text" 2629 | }, 2630 | "source": [ 2631 | "#### Selecting the top three models with the least RMSE" 2632 | ] 2633 | }, 2634 | { 2635 | "cell_type": "code", 2636 | "metadata": { 2637 | "id": "xl3ZnMjQN5dh", 2638 | "colab_type": "code", 2639 | "outputId": "1d34f7ae-b19b-4be6-ce9f-4f3b0545dda9", 2640 | "colab": { 2641 | "base_uri": "https://localhost:8080/", 2642 | "height": 34 2643 | } 2644 | }, 2645 | "source": [ 2646 | "# Checking for the regressor with minimum root mean squared error\n", 2647 | "#\n", 2648 | "rmses.loc['mean'].idxmin(), rmses.loc['mean'].min()" 2649 | ], 2650 | "execution_count": 0, 2651 | "outputs": [ 2652 | { 2653 | "output_type": "execute_result", 2654 | "data": { 2655 | 
"text/plain": [ 2656 | "('XGB', 0.13455272135801205)" 2657 | ] 2658 | }, 2659 | "metadata": { 2660 | "tags": [] 2661 | }, 2662 | "execution_count": 18 2663 | } 2664 | ] 2665 | }, 2666 | { 2667 | "cell_type": "code", 2668 | "metadata": { 2669 | "id": "lRBUedWKu2TH", 2670 | "colab_type": "code", 2671 | "outputId": "fbbd51ca-907b-46bb-c550-880ef3b8d5c2", 2672 | "colab": { 2673 | "base_uri": "https://localhost:8080/", 2674 | "height": 193 2675 | } 2676 | }, 2677 | "source": [ 2678 | "# Arranging the models in ascending order\n", 2679 | "#\n", 2680 | "rmses.loc['mean'].sort_values()" 2681 | ], 2682 | "execution_count": 0, 2683 | "outputs": [ 2684 | { 2685 | "output_type": "execute_result", 2686 | "data": { 2687 | "text/plain": [ 2688 | "XGB 0.134553\n", 2689 | "GradientBoosting 0.134733\n", 2690 | "Cat 0.134740\n", 2691 | "RandomForest 0.138914\n", 2692 | "Kneighbors 0.151803\n", 2693 | "DecisionTree 0.168303\n", 2694 | "Adaboost 0.169491\n", 2695 | "SVR 0.174286\n", 2696 | "Linear 0.215898\n", 2697 | "Name: mean, dtype: float64" 2698 | ] 2699 | }, 2700 | "metadata": { 2701 | "tags": [] 2702 | }, 2703 | "execution_count": 19 2704 | } 2705 | ] 2706 | }, 2707 | { 2708 | "cell_type": "markdown", 2709 | "metadata": { 2710 | "id": "JRgVpDTSTouK", 2711 | "colab_type": "text" 2712 | }, 2713 | "source": [ 2714 | "## Training the top three models and making predictions" 2715 | ] 2716 | }, 2717 | { 2718 | "cell_type": "code", 2719 | "metadata": { 2720 | "id": "-xSnVX_ZN5Ys", 2721 | "colab_type": "code", 2722 | "colab": {} 2723 | }, 2724 | "source": [ 2725 | "# Using the top three models; XGBoost, Catboost and Gradientboost to train and make predictions\n", 2726 | "# Creating a list of models to use\n", 2727 | "models = [XGBRegressor(objective ='reg:squarederror'), CatBoostRegressor(logging_level='Silent'), GradientBoostingRegressor()]\n", 2728 | "model_names = ['xgboost', 'catboost', 'gradientboost']\n", 2729 | "\n", 2730 | "\n", 2731 | "# Selecting the training features and the 
target feature\n",
"#\n",
"X = train.drop('Square_ID', axis=1)\n",
"y = target\n",
"\n",
"\n",
"# Submission dataset\n",
"#\n",
"sub = test.drop('Square_ID', axis=1)\n",
"\n",
"\n",
"# Using a for loop to create a submission file for each model\n",
"#\n",
"for model, model_name in zip(models, model_names):\n",
"    regressor = model  # Model instance created in the models list above\n",
"    regressor.fit(X, y)  # Training the model on the full training set\n",
"    predictions = regressor.predict(sub)  # Making predictions on the submission dataset\n",
"    submission_df = pd.DataFrame({'Square_ID': test.Square_ID, 'target_2019': predictions})  # Building the submission table\n",
"    submission_df.to_csv(model_name + '_baseline.csv', index=False)  # Writing the submission file"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "aNTalTIjTwg4",
"colab_type": "text"
},
"source": [
"*On the public leaderboard, the three submissions yielded the following Root Mean Squared Errors:*\n",
" - XGBRegressor: 0.250710809791906\n",
" - **CatBoostRegressor: 0.118661182373564**\n",
" - GradientBoostingRegressor: 0.608857842367698\n",
" \n",
"The CatBoostRegressor was the most accurate, with an RMSE of 0.118661182373564"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "mqV0M63KVIIo",
"colab_type": "text"
},
"source": [
"# Next Steps:\n",
"To further improve the accuracy of the model, the following should be considered:\n",
" - A thorough Exploratory Data Analysis\n",
" - Feature Engineering\n",
" - Feature Selection\n",
" - Hyperparameter Tuning\n",
" - Model Evaluation\n",
" - Model interpretation\n",
" - Sourcing more data\n",
" \n",
"For any suggestions or clarifications, feel free to reach out at [Darius Moruri 
- Linkedin](https://www.linkedin.com/in/dariusmoruri/)\n" 2787 | ] 2788 | } 2789 | ] 2790 | } --------------------------------------------------------------------------------
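One detail of the model-comparison table can be checked by hand. The notebook assigns the `mean` row to `rmses` before it calls `rmses.std()`, so the `std` row is taken over eleven values (the ten folds plus their mean) rather than over the ten folds alone. The sketch below is not part of the notebook: it uses only the Python standard library and the CatBoost fold RMSEs copied from the `Cat` column, and its variable names are illustrative. It reproduces the reported `mean` and `std` and computes the fold-only standard deviation for comparison.

```python
import statistics

# Per-fold RMSEs for CatBoost, copied from the 'Cat' column of the
# comparison table (folds 1 to 10).
cat_fold_rmse = [0.084636, 0.062418, 0.094261, 0.160362, 0.162315,
                 0.109079, 0.052203, 0.145777, 0.260180, 0.216168]

# Mean RMSE across the ten folds.
mean_rmse = statistics.mean(cat_fold_rmse)

# Sample standard deviation (n - 1 denominator, the pandas default)
# over the ten folds only.
std_folds_only = statistics.stdev(cat_fold_rmse)

# Sample standard deviation over the ten folds plus the mean row,
# which is what the notebook's ordering of the two assignments yields.
std_with_mean_row = statistics.stdev(cat_fold_rmse + [mean_rmse])

print(f"mean RMSE:              {mean_rmse:.6f}")
print(f"std over folds only:    {std_folds_only:.6f}")
print(f"std including mean row: {std_with_mean_row:.6f}")
```

Adding the mean as an extra observation leaves the sum of squared deviations unchanged while increasing the denominator, so the reported `std` is slightly smaller than the fold-only value. Computing both statistics from the ten fold rows before appending either summary row avoids the effect.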