├── LICENSE
├── README.md
└── Malawi_Flood_Prediction__starter_code__by_DariusMoruri.ipynb
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2019 Darius Moruri
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Flood-Prediction-in-Malawi--Zindi-Competition
2 | ## Starter Code for Flood Prediction in Malawi
3 | ### Author: [Darius Moruri](https://www.linkedin.com/in/dariusmoruri/)
4 |
5 |
6 | ---
7 |
8 | - This is simple starter code to get you going in the [Zindi flood prediction competition](https://zindi.africa/competitions/2030-vision-flood-prediction-in-malawi).
9 | - As this is just a basic machine learning pipeline, the following aspects haven't been covered:
10 |   - Exploratory Data Analysis
11 |   - Feature Engineering
12 |   - Feature Selection
13 |   - Hyperparameter Tuning
14 |   - Model Evaluation
15 |   - Model Interpretation
16 |   - Sourcing more data
17 |   - Documentation and Presentation
18 |
19 | *Despite its basic approach, this starter code yielded a satisfactory RMSE of **0.11866** and a **top 15 ranking** (as of the time of writing) on the [public leaderboard](https://zindi.africa/competitions/2030-vision-flood-prediction-in-malawi/leaderboard).*
20 |
21 | ## Context
22 | On 14 March 2019, Tropical Cyclone Idai made landfall at the port of Beira, Mozambique, before moving across the region. Millions of people in Malawi, Mozambique and Zimbabwe have been affected by what is the worst natural disaster to hit southern Africa in at least two decades.
23 |
24 | In recent decades, countries across Africa have experienced an increase in the frequency and severity of floods. Malawi was hit by major floods in 2015 and again in 2019. In fact, between 1946 and 2013, floods accounted for 48% of major disasters in Malawi. The Lower Shire Valley in southern Malawi, which borders Mozambique and comprises the Chikwawa and Nsanje Districts, is the area most prone to flooding.
25 |
26 | The objective of this challenge is to build a machine learning model that helps predict the location and extent of floods in southern Malawi.
27 |
28 |
29 | ## Data
30 | The training data for this competition can be found [here](https://drive.google.com/file/d/13PmGuIpBbgc-BaDeXxR8-i-9E3oGZYY0/view?usp=sharing),
31 | and a sample of the submission file can be found [here](https://drive.google.com/file/d/1HBdLXuiXkhRHDoPSUUpbvw6Eh5OredLy/view?usp=sharing).
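
Both links are Google Drive share links, which open a preview page rather than the raw CSV. The sketch below shows one way to turn such a link into a direct-download URL. It assumes the link has the `.../file/d/<id>/view` shape and that Drive's `uc?export=download&id=` endpoint serves the file directly, which is the same assumption the notebook in this repository makes. The helper name `drive_download_url` is hypothetical and only for illustration.

```python
def drive_download_url(share_url: str) -> str:
    """Convert a '.../file/d/<id>/view?...' share link into a direct-download URL.

    Assumption: the file id is the path segment right after 'd'.
    """
    parts = share_url.split("/")
    if "d" not in parts or parts.index("d") + 1 >= len(parts):
        raise ValueError(f"not a recognised Drive share link: {share_url!r}")
    file_id = parts[parts.index("d") + 1]
    return "https://drive.google.com/uc?export=download&id=" + file_id


train_link = "https://drive.google.com/file/d/13PmGuIpBbgc-BaDeXxR8-i-9E3oGZYY0/view?usp=sharing"
print(drive_download_url(train_link))
```

The resulting URL can then be fetched with `requests.get` and parsed with `pandas.read_csv`, as the notebook does. Very large files may return a confirmation page instead of the data, so it is worth checking the response before parsing.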
32 |
33 | ## Evaluation
34 | The error metric for this competition is the Root Mean Squared Error (RMSE).
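
For reference, RMSE is the square root of the mean of the squared differences between the true and predicted values, so lower is better and large errors are penalised more heavily than small ones. Here is a minimal sketch in plain Python. The function name `rmse` and the sample values are illustrative only and are not taken from the competition data.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up flood fractions for three grid squares (true vs. predicted).
print(rmse([0.0, 0.5, 1.0], [0.0, 0.5, 0.0]))
```

In practice the same quantity can be computed on a held-out split of the training data to estimate the leaderboard score before submitting.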
35 |
36 |
37 |
--------------------------------------------------------------------------------
/Malawi_Flood_Prediction__starter_code__by_DariusMoruri.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "name": "Malawi_Flood_Prediction__starter_code__by_DariusMoruri.ipynb",
7 | "provenance": [],
8 | "collapsed_sections": [],
9 | "include_colab_link": true
10 | },
11 | "kernelspec": {
12 | "name": "python3",
13 | "display_name": "Python 3"
14 | },
15 | "accelerator": "GPU"
16 | },
17 | "cells": [
18 | {
19 | "cell_type": "markdown",
20 | "metadata": {
21 | "id": "view-in-github",
22 | "colab_type": "text"
23 | },
24 | "source": [
25 | ""
26 | ]
27 | },
28 | {
29 | "cell_type": "markdown",
30 | "metadata": {
31 | "id": "qaqB2GS7FYRK",
32 | "colab_type": "text"
33 | },
34 | "source": [
35 | "# Starter Code for Flood Prediction in Malawi\n",
36 | "### Author: [Darius Moruri](https://www.linkedin.com/in/dariusmoruri/)\n",
37 | "\n",
38 | "\n",
39 | "---\n",
40 | "\n",
41 | " - This is simple starter code to get you going in the [Zindi flood prediction competition](https://zindi.africa/competitions/2030-vision-flood-prediction-in-malawi).\n",
42 | " - As it is just a basic machine learning pipeline, the following aspects haven't been covered:\n",
43 | "    - Exploratory Data Analysis\n",
44 | "    - Feature Engineering\n",
45 | "    - Feature Selection\n",
46 | "    - Hyperparameter Tuning\n",
47 | "    - Model Evaluation\n",
48 | "    - Model Interpretation\n",
49 | "    - Sourcing more data\n",
50 | "    - Documentation and Presentation\n",
51 | "\n",
52 | "*Despite its basic approach, this starter code yielded a satisfactory RMSE of **0.11866** and a **top 15 ranking** (as of the time of writing) on the [public leaderboard](https://zindi.africa/competitions/2030-vision-flood-prediction-in-malawi/leaderboard).*\n",
53 | "\n",
54 | "## Context\n",
55 | "On 14 March 2019, Tropical Cyclone Idai made landfall at the port of Beira, Mozambique, before moving across the region. Millions of people in Malawi, Mozambique and Zimbabwe have been affected by what is the worst natural disaster to hit southern Africa in at least two decades.\n",
56 | "\n",
57 | "In recent decades, countries across Africa have experienced an increase in the frequency and severity of floods. Malawi was hit by major floods in 2015 and again in 2019. In fact, between 1946 and 2013, floods accounted for 48% of major disasters in Malawi. The Lower Shire Valley in southern Malawi, which borders Mozambique and comprises the Chikwawa and Nsanje Districts, is the area most prone to flooding.\n",
58 | "\n",
59 | "The objective of this challenge is to build a machine learning model that helps predict the location and extent of floods in southern Malawi.\n",
60 | "\n",
61 | "\n",
62 | "## Data\n",
63 | "The training data for this competition can be found [here](https://drive.google.com/file/d/13PmGuIpBbgc-BaDeXxR8-i-9E3oGZYY0/view?usp=sharing),\n",
64 | "and a sample of the submission file can be found [here](https://drive.google.com/file/d/1HBdLXuiXkhRHDoPSUUpbvw6Eh5OredLy/view?usp=sharing).\n",
65 | "\n",
66 | "## Evaluation\n",
67 | "The error metric for this competition is the Root Mean Squared Error (RMSE).\n",
68 | "\n"
69 | ]
70 | },
71 | {
72 | "cell_type": "markdown",
73 | "metadata": {
74 | "id": "KPejNILARlEE",
75 | "colab_type": "text"
76 | },
77 | "source": [
78 | "## Importing the Necessary Libraries"
79 | ]
80 | },
81 | {
82 | "cell_type": "code",
83 | "metadata": {
84 | "id": "QAXGhwzWHLpq",
85 | "colab_type": "code",
86 | "colab": {}
87 | },
88 | "source": [
89 | "# Importing libraries\n",
90 | "#\n",
91 | "import pandas as pd\n",
92 | "import numpy as np\n",
93 | "import requests\n",
94 | "from io import StringIO \n",
95 | "import warnings\n",
96 | "warnings.filterwarnings('ignore')"
97 | ],
98 | "execution_count": 0,
99 | "outputs": []
100 | },
101 | {
102 | "cell_type": "markdown",
103 | "metadata": {
104 | "id": "AWbBYTZSRoIH",
105 | "colab_type": "text"
106 | },
107 | "source": [
108 | "## Reading the Data"
109 | ]
110 | },
111 | {
112 | "cell_type": "code",
113 | "metadata": {
114 | "id": "FZTl-W7uHe3Y",
115 | "colab_type": "code",
116 | "colab": {}
117 | },
118 | "source": [
119 | "# Google Drive share links to the sample submission and training datasets\n",
120 | "#\n",
121 | "submission = 'https://drive.google.com/file/d/1XhTATUUEIKpkFudV9HygYzbJZc4mWcHo/view?usp=sharing'\n",
122 | "train = 'https://drive.google.com/file/d/1hqS1wAoClLHN0aFJABL4myzFMDn4USgV/view?usp=sharing'\n",
123 | "\n",
124 | "\n",
125 | "# Creating a function to read a CSV file shared via Google Drive\n",
126 | "#\n",
127 | "def read_csv(url):\n",
128 | " url = 'https://drive.google.com/uc?export=download&id=' + url.split('/')[-2]\n",
129 | " csv_raw = requests.get(url).text\n",
130 | " csv = StringIO(csv_raw)\n",
131 | " df = pd.read_csv(csv)\n",
132 | " return df\n",
133 | "\n",
134 | "# Creating submission and training dataframes\n",
135 | "#\n",
136 | "sub = read_csv(submission)\n",
137 | "df = read_csv(train)"
138 | ],
139 | "execution_count": 0,
140 | "outputs": []
141 | },
142 | {
143 | "cell_type": "markdown",
144 | "metadata": {
145 | "id": "5YU4NXvPR483",
146 | "colab_type": "text"
147 | },
148 | "source": [
149 | "## Basic Data Analysis"
150 | ]
151 | },
152 | {
153 | "cell_type": "code",
154 | "metadata": {
155 | "id": "N93Od3lxHe07",
156 | "colab_type": "code",
157 | "outputId": "f09c457b-995b-422f-98d4-51981eb01791",
158 | "colab": {
159 | "base_uri": "https://localhost:8080/",
160 | "height": 551
161 | }
162 | },
163 | "source": [
164 | "# Previewing the first five rows of the dataframe\n",
165 | "#\n",
166 | "df.head()"
167 | ],
168 | "execution_count": 0,
169 | "outputs": [
170 | {
171 | "output_type": "execute_result",
172 | "data": {
173 | "text/html": [
174 | "| | X | Y | target_2015 | elevation | precip 2014-11-16 - 2014-11-23 | precip 2014-11-23 - 2014-11-30 | precip 2014-11-30 - 2014-12-07 | precip 2014-12-07 - 2014-12-14 | precip 2014-12-14 - 2014-12-21 | precip 2014-12-21 - 2014-12-28 | precip 2014-12-28 - 2015-01-04 | precip 2015-01-04 - 2015-01-11 | precip 2015-01-11 - 2015-01-18 | precip 2015-01-18 - 2015-01-25 | precip 2015-01-25 - 2015-02-01 | precip 2015-02-01 - 2015-02-08 | precip 2015-02-08 - 2015-02-15 | precip 2015-02-15 - 2015-02-22 | precip 2015-02-22 - 2015-03-01 | precip 2015-03-01 - 2015-03-08 | precip 2015-03-08 - 2015-03-15 | precip 2019-01-20 - 2019-01-27 | precip 2019-01-27 - 2019-02-03 | precip 2019-02-03 - 2019-02-10 | precip 2019-02-10 - 2019-02-17 | precip 2019-02-17 - 2019-02-24 | precip 2019-02-24 - 2019-03-03 | precip 2019-03-03 - 2019-03-10 | precip 2019-03-10 - 2019-03-17 | precip 2019-03-17 - 2019-03-24 | precip 2019-03-24 - 2019-03-31 | precip 2019-03-31 - 2019-04-07 | precip 2019-04-07 - 2019-04-14 | precip 2019-04-14 - 2019-04-21 | precip 2019-04-21 - 2019-04-28 | precip 2019-04-28 - 2019-05-05 | precip 2019-05-05 - 2019-05-12 | precip 2019-05-12 - 2019-05-19 | LC_Type1_mode | Square_ID |\n",
175 | "| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |\n",
176 | "| 0 | 34.26 | -15.91 | 0.0 | 887.764222 | 0.0 | 0.0 | 0.0 | 14.844025 | 14.552823 | 12.237766 | 57.451361 | 30.127047 | 30.449468 | 1.521829 | 29.389995 | 32.878318 | 8.179804 | 0.963981 | 16.659097 | 3.304466 | 0.0 | 12.99262 | 4.582856 | 35.037532 | 4.796012 | 28.083314 | 0.0 | 58.362456 | 18.264692 | 17.537486 | 0.896323 | 1.68 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 9 | 4e3c3896-14ce-11ea-bce5-f49634744a41 |\n",
177 | "| 1 | 34.26 | -15.90 | 0.0 | 743.403912 | 0.0 | 0.0 | 0.0 | 14.844025 | 14.552823 | 12.237766 | 57.451361 | 30.127047 | 30.449468 | 1.521829 | 29.389995 | 32.878318 | 8.179804 | 0.963981 | 16.659097 | 3.304466 | 0.0 | 12.99262 | 4.582856 | 35.037532 | 4.796012 | 28.083314 | 0.0 | 58.362456 | 18.264692 | 17.537486 | 0.896323 | 1.68 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 9 | 4e3c3897-14ce-11ea-bce5-f49634744a41 |\n",
178 | "| 2 | 34.26 | -15.89 | 0.0 | 565.728343 | 0.0 | 0.0 | 0.0 | 14.844025 | 14.552823 | 12.237766 | 57.451361 | 30.127047 | 30.449468 | 1.521829 | 29.389995 | 32.878318 | 8.179804 | 0.963981 | 16.659097 | 3.304466 | 0.0 | 12.99262 | 4.582856 | 35.037532 | 4.796012 | 28.083314 | 0.0 | 58.362456 | 18.264692 | 17.537486 | 0.896323 | 1.68 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 9 | 4e3c3898-14ce-11ea-bce5-f49634744a41 |\n",
179 | "| 3 | 34.26 | -15.88 | 0.0 | 443.392774 | 0.0 | 0.0 | 0.0 | 14.844025 | 14.552823 | 12.237766 | 57.451361 | 30.127047 | 30.449468 | 1.521829 | 29.389995 | 32.878318 | 8.179804 | 0.963981 | 16.659097 | 3.304466 | 0.0 | 12.99262 | 4.582856 | 35.037532 | 4.796012 | 28.083314 | 0.0 | 58.362456 | 18.264692 | 17.537486 | 0.896323 | 1.68 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10 | 4e3c3899-14ce-11ea-bce5-f49634744a41 |\n",
180 | "| 4 | 34.26 | -15.87 | 0.0 | 437.443428 | 0.0 | 0.0 | 0.0 | 14.844025 | 14.552823 | 12.237766 | 57.451361 | 30.127047 | 30.449468 | 1.521829 | 29.389995 | 32.878318 | 8.179804 | 0.963981 | 16.659097 | 3.304466 | 0.0 | 12.99262 | 4.582856 | 35.037532 | 4.796012 | 28.083314 | 0.0 | 58.362456 | 18.264692 | 17.537486 | 0.896323 | 1.68 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10 | 4e3c389a-14ce-11ea-bce5-f49634744a41 |"
453 | ],
454 | "text/plain": [
455 | " X Y ... LC_Type1_mode Square_ID\n",
456 | "0 34.26 -15.91 ... 9 4e3c3896-14ce-11ea-bce5-f49634744a41\n",
457 | "1 34.26 -15.90 ... 9 4e3c3897-14ce-11ea-bce5-f49634744a41\n",
458 | "2 34.26 -15.89 ... 9 4e3c3898-14ce-11ea-bce5-f49634744a41\n",
459 | "3 34.26 -15.88 ... 10 4e3c3899-14ce-11ea-bce5-f49634744a41\n",
460 | "4 34.26 -15.87 ... 10 4e3c389a-14ce-11ea-bce5-f49634744a41\n",
461 | "\n",
462 | "[5 rows x 40 columns]"
463 | ]
464 | },
465 | "metadata": {
466 | "tags": []
467 | },
468 | "execution_count": 3
469 | }
470 | ]
471 | },
472 | {
473 | "cell_type": "code",
474 | "metadata": {
475 | "id": "6JawgMLoW5OC",
476 | "colab_type": "code",
477 | "outputId": "4d3bb0e2-ee3e-45cc-ea21-49d6a3a7da31",
478 | "colab": {
479 | "base_uri": "https://localhost:8080/",
480 | "height": 534
481 | }
482 | },
483 | "source": [
484 | "# Previewing the last five rows of the dataframe\n",
485 | "#\n",
486 | "df.tail()"
487 | ],
488 | "execution_count": 0,
489 | "outputs": [
490 | {
491 | "output_type": "execute_result",
492 | "data": {
493 | "text/html": [
494 | "| | X | Y | target_2015 | elevation | precip 2014-11-16 - 2014-11-23 | precip 2014-11-23 - 2014-11-30 | precip 2014-11-30 - 2014-12-07 | precip 2014-12-07 - 2014-12-14 | precip 2014-12-14 - 2014-12-21 | precip 2014-12-21 - 2014-12-28 | precip 2014-12-28 - 2015-01-04 | precip 2015-01-04 - 2015-01-11 | precip 2015-01-11 - 2015-01-18 | precip 2015-01-18 - 2015-01-25 | precip 2015-01-25 - 2015-02-01 | precip 2015-02-01 - 2015-02-08 | precip 2015-02-08 - 2015-02-15 | precip 2015-02-15 - 2015-02-22 | precip 2015-02-22 - 2015-03-01 | precip 2015-03-01 - 2015-03-08 | precip 2015-03-08 - 2015-03-15 | precip 2019-01-20 - 2019-01-27 | precip 2019-01-27 - 2019-02-03 | precip 2019-02-03 - 2019-02-10 | precip 2019-02-10 - 2019-02-17 | precip 2019-02-17 - 2019-02-24 | precip 2019-02-24 - 2019-03-03 | precip 2019-03-03 - 2019-03-10 | precip 2019-03-10 - 2019-03-17 | precip 2019-03-17 - 2019-03-24 | precip 2019-03-24 - 2019-03-31 | precip 2019-03-31 - 2019-04-07 | precip 2019-04-07 - 2019-04-14 | precip 2019-04-14 - 2019-04-21 | precip 2019-04-21 - 2019-04-28 | precip 2019-04-28 - 2019-05-05 | precip 2019-05-05 - 2019-05-12 | precip 2019-05-12 - 2019-05-19 | LC_Type1_mode | Square_ID |\n",
495 | "| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |\n",
496 | "| 16461 | 35.86 | -15.44 | 0.0 | 635.675022 | 16.956563 | 31.155531 | 12.882013 | 8.810145 | 6.179829 | 9.863685 | 15.765685 | 21.457507 | 105.275891 | 3.645338 | 18.531483 | 13.816063 | 23.728058 | 8.794998 | 9.369763 | 21.428131 | 2.493683 | 8.760326 | 5.177616 | 12.450319 | 17.289942 | 19.612179 | 10.909635 | 64.494171 | 15.940852 | 24.828982 | 11.335339 | 30.984762 | 0.518269 | 5.770066 | 14.839779 | 4.928294 | 10.526186 | 18.746072 | 10 | 4e6f5dfd-14ce-11ea-bce5-f49634744a41 |\n",
497 | "| 16462 | 35.86 | -15.43 | 0.0 | 632.598892 | 16.956563 | 31.155531 | 12.882013 | 8.810145 | 6.179829 | 9.863685 | 15.765685 | 21.457507 | 105.275891 | 3.645338 | 18.531483 | 13.816063 | 23.728058 | 8.794998 | 9.369763 | 21.428131 | 2.493683 | 8.760326 | 5.177616 | 12.450319 | 17.289942 | 19.612179 | 10.909635 | 64.494171 | 15.940852 | 24.828982 | 11.335339 | 30.984762 | 0.518269 | 5.770066 | 14.839779 | 4.928294 | 10.526186 | 18.746072 | 10 | 4e6f5dfe-14ce-11ea-bce5-f49634744a41 |\n",
498 | "| 16463 | 35.86 | -15.42 | 0.0 | 632.450136 | 16.956563 | 31.155531 | 12.882013 | 8.810145 | 6.179829 | 9.863685 | 15.765685 | 21.457507 | 105.275891 | 3.645338 | 18.531483 | 13.816063 | 23.728058 | 8.794998 | 9.369763 | 21.428131 | 2.493683 | 8.760326 | 5.177616 | 12.450319 | 17.289942 | 19.612179 | 10.909635 | 64.494171 | 15.940852 | 24.828982 | 11.335339 | 30.984762 | 0.518269 | 5.770066 | 14.839779 | 4.928294 | 10.526186 | 18.746072 | 10 | 4e6f5dff-14ce-11ea-bce5-f49634744a41 |\n",
499 | "| 16464 | 35.86 | -15.41 | 0.0 | 629.272733 | 16.956563 | 31.155531 | 12.882013 | 8.810145 | 6.179829 | 9.863685 | 15.765685 | 21.457507 | 105.275891 | 3.645338 | 18.531483 | 13.816063 | 23.728058 | 8.794998 | 9.369763 | 21.428131 | 2.493683 | 8.760326 | 5.177616 | 12.450319 | 17.289942 | 19.612179 | 10.909635 | 64.494171 | 15.940852 | 24.828982 | 11.335339 | 30.984762 | 0.518269 | 5.770066 | 14.839779 | 4.928294 | 10.526186 | 18.746072 | 10 | 4e6f5e00-14ce-11ea-bce5-f49634744a41 |\n",
500 | "| 16465 | 35.86 | -15.40 | 0.0 | 626.164641 | 16.956563 | 31.155531 | 12.882013 | 8.810145 | 6.179829 | 9.863685 | 15.765685 | 21.457507 | 105.275891 | 3.645338 | 18.531483 | 13.816063 | 23.728058 | 8.794998 | 9.369763 | 21.428131 | 2.493683 | 8.760326 | 5.177616 | 12.450319 | 17.289942 | 19.612179 | 10.909635 | 64.494171 | 15.940852 | 24.828982 | 11.335339 | 30.984762 | 0.518269 | 5.770066 | 14.839779 | 4.928294 | 10.526186 | 18.746072 | 10 | 4e6f5e01-14ce-11ea-bce5-f49634744a41 |"
773 | ],
774 | "text/plain": [
775 | " X Y ... LC_Type1_mode Square_ID\n",
776 | "16461 35.86 -15.44 ... 10 4e6f5dfd-14ce-11ea-bce5-f49634744a41\n",
777 | "16462 35.86 -15.43 ... 10 4e6f5dfe-14ce-11ea-bce5-f49634744a41\n",
778 | "16463 35.86 -15.42 ... 10 4e6f5dff-14ce-11ea-bce5-f49634744a41\n",
779 | "16464 35.86 -15.41 ... 10 4e6f5e00-14ce-11ea-bce5-f49634744a41\n",
780 | "16465 35.86 -15.40 ... 10 4e6f5e01-14ce-11ea-bce5-f49634744a41\n",
781 | "\n",
782 | "[5 rows x 40 columns]"
783 | ]
784 | },
785 | "metadata": {
786 | "tags": []
787 | },
788 | "execution_count": 4
789 | }
790 | ]
791 | },
792 | {
793 | "cell_type": "code",
794 | "metadata": {
795 | "id": "AnOG5ZNGHeyC",
796 | "colab_type": "code",
797 | "outputId": "fb029135-7207-4ad1-a906-d693d168b3cd",
798 | "colab": {
799 | "base_uri": "https://localhost:8080/",
800 | "height": 1000
801 | }
802 | },
803 | "source": [
804 | "# Previewing some statistical summaries of the dataframe\n",
805 | "# Transposing for a better view\n",
806 | "#\n",
807 | "df.describe().T"
808 | ],
809 | "execution_count": 0,
810 | "outputs": [
811 | {
812 | "output_type": "execute_result",
813 | "data": {
814 | "text/html": [
815 | "| | count | mean | std | min | 25% | 50% | 75% | max |\n",
816 | "| --- | --- | --- | --- | --- | --- | --- | --- | --- |\n",
817 | "| X | 16466.0 | 35.077656 | 0.392395 | 34.260000 | 34.760000 | 35.050000 | 35.390000 | 35.860000 |\n",
818 | "| Y | 16466.0 | -15.813802 | 0.359789 | -16.640000 | -16.070000 | -15.800000 | -15.520000 | -15.210000 |\n",
819 | "| target_2015 | 16466.0 | 0.076609 | 0.228734 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |\n",
820 | "| elevation | 16466.0 | 592.848206 | 354.790357 | 45.541444 | 329.063852 | 623.000000 | 751.434813 | 2803.303645 |\n",
821 | "| precip 2014-11-16 - 2014-11-23 | 16466.0 | 1.610760 | 4.225461 | 0.000000 | 0.000000 | 0.000000 | 1.261848 | 19.354969 |\n",
822 | "| precip 2014-11-23 - 2014-11-30 | 16466.0 | 2.502058 | 8.631846 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 41.023858 |\n",
823 | "| precip 2014-11-30 - 2014-12-07 | 16466.0 | 1.162076 | 4.396676 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 22.020803 |\n",
824 | "| precip 2014-12-07 - 2014-12-14 | 16466.0 | 8.270610 | 4.263375 | 1.411452 | 5.548440 | 7.941822 | 10.887235 | 18.870675 |\n",
825 | "| precip 2014-12-14 - 2014-12-21 | 16466.0 | 8.892459 | 3.760052 | 3.580342 | 5.905440 | 8.618390 | 10.960668 | 23.044340 |\n",
826 | "| precip 2014-12-21 - 2014-12-28 | 16466.0 | 9.572821 | 4.523767 | 1.254098 | 6.179885 | 8.786780 | 12.670775 | 21.757828 |\n",
827 | "| precip 2014-12-28 - 2015-01-04 | 16466.0 | 22.925036 | 13.690451 | 7.462999 | 11.617057 | 18.381539 | 31.304699 | 62.433432 |\n",
828 | "| precip 2015-01-04 - 2015-01-11 | 16466.0 | 28.113210 | 7.794291 | 15.648154 | 23.483879 | 26.085586 | 33.587434 | 51.197420 |\n",
829 | "| precip 2015-01-11 - 2015-01-18 | 16466.0 | 58.859208 | 16.807838 | 30.449468 | 45.972601 | 55.501115 | 69.311540 | 105.275891 |\n",
830 | "| precip 2015-01-18 - 2015-01-25 | 16466.0 | 1.251173 | 1.969923 | 0.000000 | 0.000000 | 0.502164 | 1.195866 | 11.103658 |\n",
831 | "| precip 2015-01-25 - 2015-02-01 | 16466.0 | 34.653177 | 7.456422 | 14.964383 | 30.037450 | 34.363729 | 36.715386 | 53.014243 |\n",
832 | "| precip 2015-02-01 - 2015-02-08 | 16466.0 | 28.314888 | 8.047223 | 13.261280 | 22.262417 | 26.512675 | 34.880240 | 44.341312 |\n",
833 | "| precip 2015-02-08 - 2015-02-15 | 16466.0 | 12.487909 | 7.064435 | 0.459067 | 5.090802 | 14.092012 | 18.681926 | 28.559923 |\n",
834 | "| precip 2015-02-15 - 2015-02-22 | 16466.0 | 3.802584 | 2.674434 | 0.279002 | 1.654155 | 3.301029 | 5.120276 | 15.715008 |\n",
835 | "| precip 2015-02-22 - 2015-03-01 | 16466.0 | 17.072285 | 6.074926 | 6.728685 | 13.769957 | 15.549508 | 19.836449 | 36.964993 |\n",
836 | "| precip 2015-03-01 - 2015-03-08 | 16466.0 | 9.110949 | 4.572201 | 3.283425 | 5.538848 | 8.235819 | 11.308650 | 25.711649 |\n",
837 | "| precip 2015-03-08 - 2015-03-15 | 16466.0 | 0.330641 | 1.008490 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 4.953321 |\n",
838 | "| precip 2019-01-20 - 2019-01-27 | 16466.0 | 13.329023 | 5.552818 | 3.813864 | 8.946479 | 12.913147 | 17.123831 | 25.101563 |\n",
839 | "| precip 2019-01-27 - 2019-02-03 | 16466.0 | 4.437490 | 5.163184 | 0.000000 | 0.888216 | 2.249833 | 6.832768 | 22.774148 |\n",
840 | "| precip 2019-02-03 - 2019-02-10 | 16466.0 | 23.149500 | 6.148509 | 12.450319 | 19.590606 | 21.661698 | 24.213460 | 46.225504 |\n",
841 | "| precip 2019-02-10 - 2019-02-17 | 16466.0 | 9.749785 | 4.558172 | 2.801546 | 6.761704 | 9.076457 | 10.830000 | 21.948157 |\n",
842 | "| precip 2019-02-17 - 2019-02-24 | 16466.0 | 29.575991 | 8.508608 | 12.780855 | 25.360910 | 30.593415 | 33.300457 | 52.682312 |\n",
843 | "| precip 2019-02-24 - 2019-03-03 | 16466.0 | 1.864399 | 4.158313 | 0.000000 | 0.000000 | 0.690000 | 1.429370 | 21.275595 |\n",
844 | "| precip 2019-03-03 - 2019-03-10 | 16466.0 | 60.424964 | 8.313199 | 32.108226 | 56.380490 | 60.581696 | 65.721446 | 84.675319 |\n",
845 | "| precip 2019-03-10 - 2019-03-17 | 16466.0 | 12.321620 | 9.900994 | 0.000000 | 3.120173 | 12.508606 | 20.004375 | 36.740809 |\n",
1163 | " \n",
1164 | " | precip 2019-03-17 - 2019-03-24 | \n",
1165 | " 16466.0 | \n",
1166 | " 35.637354 | \n",
1167 | " 14.519169 | \n",
1168 | " 15.803429 | \n",
1169 | " 22.021763 | \n",
1170 | " 34.275716 | \n",
1171 | " 44.253897 | \n",
1172 | " 72.123185 | \n",
1173 | "
\n",
1174 | " \n",
1175 | " | precip 2019-03-24 - 2019-03-31 | \n",
1176 | " 16466.0 | \n",
1177 | " 2.126234 | \n",
1178 | " 3.734829 | \n",
1179 | " 0.000000 | \n",
1180 | " 0.000000 | \n",
1181 | " 0.896323 | \n",
1182 | " 2.076590 | \n",
1183 | " 16.403638 | \n",
1184 | "
\n",
1185 | " \n",
1186 | " | precip 2019-03-31 - 2019-04-07 | \n",
1187 | " 16466.0 | \n",
1188 | " 3.453395 | \n",
1189 | " 8.007248 | \n",
1190 | " 0.000000 | \n",
1191 | " 0.000000 | \n",
1192 | " 0.000000 | \n",
1193 | " 2.914996 | \n",
1194 | " 37.059980 | \n",
1195 | "
\n",
1196 | " \n",
1197 | " | precip 2019-04-07 - 2019-04-14 | \n",
1198 | " 16466.0 | \n",
1199 | " 3.559366 | \n",
1200 | " 3.820294 | \n",
1201 | " 0.000000 | \n",
1202 | " 0.000000 | \n",
1203 | " 2.607053 | \n",
1204 | " 6.390000 | \n",
1205 | " 12.979454 | \n",
1206 | "
\n",
1207 | " \n",
1208 | " | precip 2019-04-14 - 2019-04-21 | \n",
1209 | " 16466.0 | \n",
1210 | " 9.127677 | \n",
1211 | " 6.868937 | \n",
1212 | " 0.000000 | \n",
1213 | " 4.352528 | \n",
1214 | " 7.862453 | \n",
1215 | " 13.459070 | \n",
1216 | " 46.367849 | \n",
1217 | "
\n",
1218 | " \n",
1219 | " | precip 2019-04-21 - 2019-04-28 | \n",
1220 | " 16466.0 | \n",
1221 | " 1.660709 | \n",
1222 | " 4.418032 | \n",
1223 | " 0.000000 | \n",
1224 | " 0.000000 | \n",
1225 | " 0.000000 | \n",
1226 | " 0.000000 | \n",
1227 | " 19.475846 | \n",
1228 | "
\n",
1229 | " \n",
1230 | " | precip 2019-04-28 - 2019-05-05 | \n",
1231 | " 16466.0 | \n",
1232 | " 0.526144 | \n",
1233 | " 1.494935 | \n",
1234 | " 0.000000 | \n",
1235 | " 0.000000 | \n",
1236 | " 0.000000 | \n",
1237 | " 0.000000 | \n",
1238 | " 6.914834 | \n",
1239 | "
\n",
1240 | " \n",
1241 | " | precip 2019-05-05 - 2019-05-12 | \n",
1242 | " 16466.0 | \n",
1243 | " 0.968101 | \n",
1244 | " 3.690698 | \n",
1245 | " 0.000000 | \n",
1246 | " 0.000000 | \n",
1247 | " 0.000000 | \n",
1248 | " 0.000000 | \n",
1249 | " 18.170051 | \n",
1250 | "
\n",
1251 | " \n",
1252 | " | precip 2019-05-12 - 2019-05-19 | \n",
1253 | " 16466.0 | \n",
1254 | " 1.585743 | \n",
1255 | " 4.651863 | \n",
1256 | " 0.000000 | \n",
1257 | " 0.000000 | \n",
1258 | " 0.000000 | \n",
1259 | " 0.000000 | \n",
1260 | " 20.092777 | \n",
1261 | "
\n",
1262 | " \n",
1263 | " | LC_Type1_mode | \n",
1264 | " 16466.0 | \n",
1265 | " 10.731750 | \n",
1266 | " 2.026100 | \n",
1267 | " 2.000000 | \n",
1268 | " 9.000000 | \n",
1269 | " 10.000000 | \n",
1270 | " 12.000000 | \n",
1271 | " 17.000000 | \n",
1272 | "
\n",
1273 | " \n",
1274 | "
\n",
1275 | "
"
1276 | ],
1277 | "text/plain": [
1278 | " count mean ... 75% max\n",
1279 | "X 16466.0 35.077656 ... 35.390000 35.860000\n",
1280 | "Y 16466.0 -15.813802 ... -15.520000 -15.210000\n",
1281 | "target_2015 16466.0 0.076609 ... 0.000000 1.000000\n",
1282 | "elevation 16466.0 592.848206 ... 751.434813 2803.303645\n",
1283 | "precip 2014-11-16 - 2014-11-23 16466.0 1.610760 ... 1.261848 19.354969\n",
1284 | "precip 2014-11-23 - 2014-11-30 16466.0 2.502058 ... 0.000000 41.023858\n",
1285 | "precip 2014-11-30 - 2014-12-07 16466.0 1.162076 ... 0.000000 22.020803\n",
1286 | "precip 2014-12-07 - 2014-12-14 16466.0 8.270610 ... 10.887235 18.870675\n",
1287 | "precip 2014-12-14 - 2014-12-21 16466.0 8.892459 ... 10.960668 23.044340\n",
1288 | "precip 2014-12-21 - 2014-12-28 16466.0 9.572821 ... 12.670775 21.757828\n",
1289 | "precip 2014-12-28 - 2015-01-04 16466.0 22.925036 ... 31.304699 62.433432\n",
1290 | "precip 2015-01-04 - 2015-01-11 16466.0 28.113210 ... 33.587434 51.197420\n",
1291 | "precip 2015-01-11 - 2015-01-18 16466.0 58.859208 ... 69.311540 105.275891\n",
1292 | "precip 2015-01-18 - 2015-01-25 16466.0 1.251173 ... 1.195866 11.103658\n",
1293 | "precip 2015-01-25 - 2015-02-01 16466.0 34.653177 ... 36.715386 53.014243\n",
1294 | "precip 2015-02-01 - 2015-02-08 16466.0 28.314888 ... 34.880240 44.341312\n",
1295 | "precip 2015-02-08 - 2015-02-15 16466.0 12.487909 ... 18.681926 28.559923\n",
1296 | "precip 2015-02-15 - 2015-02-22 16466.0 3.802584 ... 5.120276 15.715008\n",
1297 | "precip 2015-02-22 - 2015-03-01 16466.0 17.072285 ... 19.836449 36.964993\n",
1298 | "precip 2015-03-01 - 2015-03-08 16466.0 9.110949 ... 11.308650 25.711649\n",
1299 | "precip 2015-03-08 - 2015-03-15 16466.0 0.330641 ... 0.000000 4.953321\n",
1300 | "precip 2019-01-20 - 2019-01-27 16466.0 13.329023 ... 17.123831 25.101563\n",
1301 | "precip 2019-01-27 - 2019-02-03 16466.0 4.437490 ... 6.832768 22.774148\n",
1302 | "precip 2019-02-03 - 2019-02-10 16466.0 23.149500 ... 24.213460 46.225504\n",
1303 | "precip 2019-02-10 - 2019-02-17 16466.0 9.749785 ... 10.830000 21.948157\n",
1304 | "precip 2019-02-17 - 2019-02-24 16466.0 29.575991 ... 33.300457 52.682312\n",
1305 | "precip 2019-02-24 - 2019-03-03 16466.0 1.864399 ... 1.429370 21.275595\n",
1306 | "precip 2019-03-03 - 2019-03-10 16466.0 60.424964 ... 65.721446 84.675319\n",
1307 | "precip 2019-03-10 - 2019-03-17 16466.0 12.321620 ... 20.004375 36.740809\n",
1308 | "precip 2019-03-17 - 2019-03-24 16466.0 35.637354 ... 44.253897 72.123185\n",
1309 | "precip 2019-03-24 - 2019-03-31 16466.0 2.126234 ... 2.076590 16.403638\n",
1310 | "precip 2019-03-31 - 2019-04-07 16466.0 3.453395 ... 2.914996 37.059980\n",
1311 | "precip 2019-04-07 - 2019-04-14 16466.0 3.559366 ... 6.390000 12.979454\n",
1312 | "precip 2019-04-14 - 2019-04-21 16466.0 9.127677 ... 13.459070 46.367849\n",
1313 | "precip 2019-04-21 - 2019-04-28 16466.0 1.660709 ... 0.000000 19.475846\n",
1314 | "precip 2019-04-28 - 2019-05-05 16466.0 0.526144 ... 0.000000 6.914834\n",
1315 | "precip 2019-05-05 - 2019-05-12 16466.0 0.968101 ... 0.000000 18.170051\n",
1316 | "precip 2019-05-12 - 2019-05-19 16466.0 1.585743 ... 0.000000 20.092777\n",
1317 | "LC_Type1_mode 16466.0 10.731750 ... 12.000000 17.000000\n",
1318 | "\n",
1319 | "[39 rows x 8 columns]"
1320 | ]
1321 | },
1322 | "metadata": {
1323 | "tags": []
1324 | },
1325 | "execution_count": 5
1326 | }
1327 | ]
1328 | },
1329 | {
1330 | "cell_type": "markdown",
1331 | "metadata": {
1332 | "id": "VGJx1k4cYBm5",
1333 | "colab_type": "text"
1334 | },
1335 | "source": [
1336 | "#### Shape and Size"
1337 | ]
1338 | },
1339 | {
1340 | "cell_type": "code",
1341 | "metadata": {
1342 | "id": "3ZDs7u909mEt",
1343 | "colab_type": "code",
1344 | "outputId": "94ca72fe-dd3a-475e-dce4-5fc8e5456ecb",
1345 | "colab": {
1346 | "base_uri": "https://localhost:8080/",
1347 | "height": 34
1348 | }
1349 | },
1350 | "source": [
1351 | "# Checking for the shape and size of the dataframe\n",
1352 | "#\n",
1353 | "df.shape, df.size"
1354 | ],
1355 | "execution_count": 0,
1356 | "outputs": [
1357 | {
1358 | "output_type": "execute_result",
1359 | "data": {
1360 | "text/plain": [
1361 | "((16466, 40), 658640)"
1362 | ]
1363 | },
1364 | "metadata": {
1365 | "tags": []
1366 | },
1367 | "execution_count": 6
1368 | }
1369 | ]
1370 | },
1371 | {
1372 | "cell_type": "markdown",
1373 | "metadata": {
1374 | "id": "L2wdvD_8X3Ln",
1375 | "colab_type": "text"
1376 | },
1377 | "source": [
1378 | "#### Missing Values"
1379 | ]
1380 | },
1381 | {
1382 | "cell_type": "code",
1383 | "metadata": {
1384 | "id": "di56pTPRWbm9",
1385 | "colab_type": "code",
1386 | "outputId": "eb5cdc57-906c-4250-b33b-8b59254aafa7",
1387 | "colab": {
1388 | "base_uri": "https://localhost:8080/",
1389 | "height": 34
1390 | }
1391 | },
1392 | "source": [
1393 | "# Checking for missing values\n",
1394 | "#\n",
1395 | "df.isnull().sum().any()"
1396 | ],
1397 | "execution_count": 0,
1398 | "outputs": [
1399 | {
1400 | "output_type": "execute_result",
1401 | "data": {
1402 | "text/plain": [
1403 | "False"
1404 | ]
1405 | },
1406 | "metadata": {
1407 | "tags": []
1408 | },
1409 | "execution_count": 7
1410 | }
1411 | ]
1412 | },
1413 | {
1414 | "cell_type": "markdown",
1415 | "metadata": {
1416 | "id": "nT_RY2oSX6_S",
1417 | "colab_type": "text"
1418 | },
1419 | "source": [
1420 | "#### Duplicated Values"
1421 | ]
1422 | },
1423 | {
1424 | "cell_type": "code",
1425 | "metadata": {
1426 | "id": "-w80Xtd3WbjP",
1427 | "colab_type": "code",
1428 | "outputId": "75d1117f-25e0-4b0c-d22f-6fb20433d0cf",
1429 | "colab": {
1430 | "base_uri": "https://localhost:8080/",
1431 | "height": 34
1432 | }
1433 | },
1434 | "source": [
1435 | "# Checking for duplicates\n",
1436 | "#\n",
1437 | "df.duplicated().any()"
1438 | ],
1439 | "execution_count": 0,
1440 | "outputs": [
1441 | {
1442 | "output_type": "execute_result",
1443 | "data": {
1444 | "text/plain": [
1445 | "False"
1446 | ]
1447 | },
1448 | "metadata": {
1449 | "tags": []
1450 | },
1451 | "execution_count": 8
1452 | }
1453 | ]
1454 | },
1455 | {
1456 | "cell_type": "markdown",
1457 | "metadata": {
1458 | "id": "Pq4eeZlwX9qW",
1459 | "colab_type": "text"
1460 | },
1461 | "source": [
1462 | "#### Data Types"
1463 | ]
1464 | },
1465 | {
1466 | "cell_type": "code",
1467 | "metadata": {
1468 | "id": "utNJgAZ-fvIB",
1469 | "colab_type": "code",
1470 | "outputId": "3ae8ae17-be78-44e2-8885-a371759fca0a",
1471 | "colab": {
1472 | "base_uri": "https://localhost:8080/",
1473 | "height": 738
1474 | }
1475 | },
1476 | "source": [
1477 | "# Checking if the columns are represented with the appropriate data types\n",
1478 | "#\n",
1479 | "df.dtypes"
1480 | ],
1481 | "execution_count": 0,
1482 | "outputs": [
1483 | {
1484 | "output_type": "execute_result",
1485 | "data": {
1486 | "text/plain": [
1487 | "X float64\n",
1488 | "Y float64\n",
1489 | "target_2015 float64\n",
1490 | "elevation float64\n",
1491 | "precip 2014-11-16 - 2014-11-23 float64\n",
1492 | "precip 2014-11-23 - 2014-11-30 float64\n",
1493 | "precip 2014-11-30 - 2014-12-07 float64\n",
1494 | "precip 2014-12-07 - 2014-12-14 float64\n",
1495 | "precip 2014-12-14 - 2014-12-21 float64\n",
1496 | "precip 2014-12-21 - 2014-12-28 float64\n",
1497 | "precip 2014-12-28 - 2015-01-04 float64\n",
1498 | "precip 2015-01-04 - 2015-01-11 float64\n",
1499 | "precip 2015-01-11 - 2015-01-18 float64\n",
1500 | "precip 2015-01-18 - 2015-01-25 float64\n",
1501 | "precip 2015-01-25 - 2015-02-01 float64\n",
1502 | "precip 2015-02-01 - 2015-02-08 float64\n",
1503 | "precip 2015-02-08 - 2015-02-15 float64\n",
1504 | "precip 2015-02-15 - 2015-02-22 float64\n",
1505 | "precip 2015-02-22 - 2015-03-01 float64\n",
1506 | "precip 2015-03-01 - 2015-03-08 float64\n",
1507 | "precip 2015-03-08 - 2015-03-15 float64\n",
1508 | "precip 2019-01-20 - 2019-01-27 float64\n",
1509 | "precip 2019-01-27 - 2019-02-03 float64\n",
1510 | "precip 2019-02-03 - 2019-02-10 float64\n",
1511 | "precip 2019-02-10 - 2019-02-17 float64\n",
1512 | "precip 2019-02-17 - 2019-02-24 float64\n",
1513 | "precip 2019-02-24 - 2019-03-03 float64\n",
1514 | "precip 2019-03-03 - 2019-03-10 float64\n",
1515 | "precip 2019-03-10 - 2019-03-17 float64\n",
1516 | "precip 2019-03-17 - 2019-03-24 float64\n",
1517 | "precip 2019-03-24 - 2019-03-31 float64\n",
1518 | "precip 2019-03-31 - 2019-04-07 float64\n",
1519 | "precip 2019-04-07 - 2019-04-14 float64\n",
1520 | "precip 2019-04-14 - 2019-04-21 float64\n",
1521 | "precip 2019-04-21 - 2019-04-28 float64\n",
1522 | "precip 2019-04-28 - 2019-05-05 float64\n",
1523 | "precip 2019-05-05 - 2019-05-12 float64\n",
1524 | "precip 2019-05-12 - 2019-05-19 float64\n",
1525 | "LC_Type1_mode int64\n",
1526 | "Square_ID object\n",
1527 | "dtype: object"
1528 | ]
1529 | },
1530 | "metadata": {
1531 | "tags": []
1532 | },
1533 | "execution_count": 9
1534 | }
1535 | ]
1536 | },
1537 | {
1538 | "cell_type": "markdown",
1539 | "metadata": {
1540 | "id": "EDdJqbOWR_Mw",
1541 | "colab_type": "text"
1542 | },
1543 | "source": [
1544 | "## Data Cleaning"
1545 | ]
1546 | },
1547 | {
1548 | "cell_type": "markdown",
1549 | "metadata": {
1550 | "id": "pKS5LhlvSxTp",
1551 | "colab_type": "text"
1552 | },
1553 | "source": [
1554 | "#### Separating the train and test sets "
1555 | ]
1556 | },
1557 | {
1558 | "cell_type": "code",
1559 | "metadata": {
1560 | "id": "TOm258dvHetE",
1561 | "colab_type": "code",
1562 | "colab": {}
1563 | },
1564 | "source": [
1565 | "# Creating lists of columns to be used in separating the dataframe into training and testing datasets\n",
1566 | "# A single pass over the columns collects the 2019 and the 2014/2015 precipitation features\n",
1567 | "#\n",
1568 | "precip_features_2019 = []\n",
1569 | "precip_features_2015 = []\n",
1570 | "for col in df.columns:\n",
1571 | "    if '2019' in col:\n",
1572 | "        precip_features_2019.append(col)\n",
1573 | "    elif 'precip 2014' in col:\n",
1574 | "        precip_features_2015.append(col)\n",
1575 | "    elif 'precip 2015' in col:\n",
1576 | "        precip_features_2015.append(col)"
1577 | ],
1578 | "execution_count": 0,
1579 | "outputs": []
1580 | },
1581 | {
1582 | "cell_type": "code",
1583 | "metadata": {
1584 | "id": "xTMJlAXOJ6Of",
1585 | "colab_type": "code",
1586 | "outputId": "a2be0b25-21e7-4e69-9c37-4dab9abbc4f7",
1587 | "colab": {
1588 | "base_uri": "https://localhost:8080/",
1589 | "height": 311
1590 | }
1591 | },
1592 | "source": [
1593 | "# Separating the train dataset from the main dataframe\n",
1594 | "#\n",
1595 | "train = df[df.columns.difference(precip_features_2019)]\n",
1596 | "\n",
1597 | "# Previewing the first two rows of the train dataset\n",
1598 | "#\n",
1599 | "train.head(2)"
1600 | ],
1601 | "execution_count": 0,
1602 | "outputs": [
1603 | {
1604 | "output_type": "execute_result",
1605 | "data": {
1606 | "text/html": [
1607 | "<div>\n",
1621 | "<table border=\"1\" class=\"dataframe\">\n",
1622 | "  <thead>\n",
1623 | "    <tr style=\"text-align: right;\"><th></th><th>LC_Type1_mode</th><th>Square_ID</th><th>X</th><th>Y</th><th>elevation</th><th>precip 2014-11-16 - 2014-11-23</th><th>precip 2014-11-23 - 2014-11-30</th><th>precip 2014-11-30 - 2014-12-07</th><th>precip 2014-12-07 - 2014-12-14</th><th>precip 2014-12-14 - 2014-12-21</th><th>precip 2014-12-21 - 2014-12-28</th><th>precip 2014-12-28 - 2015-01-04</th><th>precip 2015-01-04 - 2015-01-11</th><th>precip 2015-01-11 - 2015-01-18</th><th>precip 2015-01-18 - 2015-01-25</th><th>precip 2015-01-25 - 2015-02-01</th><th>precip 2015-02-01 - 2015-02-08</th><th>precip 2015-02-08 - 2015-02-15</th><th>precip 2015-02-15 - 2015-02-22</th><th>precip 2015-02-22 - 2015-03-01</th><th>precip 2015-03-01 - 2015-03-08</th><th>precip 2015-03-08 - 2015-03-15</th><th>target_2015</th></tr>\n",
1649 | "  </thead>\n",
1650 | "  <tbody>\n",
1651 | "    <tr><th>0</th><td>9</td><td>4e3c3896-14ce-11ea-bce5-f49634744a41</td><td>34.26</td><td>-15.91</td><td>887.764222</td><td>0.0</td><td>0.0</td><td>0.0</td><td>14.844025</td><td>14.552823</td><td>12.237766</td><td>57.451361</td><td>30.127047</td><td>30.449468</td><td>1.521829</td><td>29.389995</td><td>32.878318</td><td>8.179804</td><td>0.963981</td><td>16.659097</td><td>3.304466</td><td>0.0</td><td>0.0</td></tr>\n",
1677 | "    <tr><th>1</th><td>9</td><td>4e3c3897-14ce-11ea-bce5-f49634744a41</td><td>34.26</td><td>-15.90</td><td>743.403912</td><td>0.0</td><td>0.0</td><td>0.0</td><td>14.844025</td><td>14.552823</td><td>12.237766</td><td>57.451361</td><td>30.127047</td><td>30.449468</td><td>1.521829</td><td>29.389995</td><td>32.878318</td><td>8.179804</td><td>0.963981</td><td>16.659097</td><td>3.304466</td><td>0.0</td><td>0.0</td></tr>\n",
1703 | "  </tbody>\n",
1704 | "</table>\n",
1705 | "</div>"
1706 | ],
1707 | "text/plain": [
1708 | " LC_Type1_mode ... target_2015\n",
1709 | "0 9 ... 0.0\n",
1710 | "1 9 ... 0.0\n",
1711 | "\n",
1712 | "[2 rows x 23 columns]"
1713 | ]
1714 | },
1715 | "metadata": {
1716 | "tags": []
1717 | },
1718 | "execution_count": 11
1719 | }
1720 | ]
1721 | },
1722 | {
1723 | "cell_type": "code",
1724 | "metadata": {
1725 | "id": "od6oWJvdTV1K",
1726 | "colab_type": "code",
1727 | "outputId": "4c0b0ada-b1c5-4b4d-8d20-784d91c224d3",
1728 | "colab": {
1729 | "base_uri": "https://localhost:8080/",
1730 | "height": 311
1731 | }
1732 | },
1733 | "source": [
1734 | "# Separating the test dataset from the main dataframe\n",
1735 | "#\n",
1736 | "precip_features_2019.extend(['X', 'Y', 'elevation', 'LC_Type1_mode', 'Square_ID'])\n",
1737 | "test = df[precip_features_2019]\n",
1738 | "\n",
1739 | "# Previewing the first two rows of the test dataset\n",
1740 | "#\n",
1741 | "test.head(2)"
1742 | ],
1743 | "execution_count": 0,
1744 | "outputs": [
1745 | {
1746 | "output_type": "execute_result",
1747 | "data": {
1748 | "text/html": [
1749 | "<div>\n",
1763 | "<table border=\"1\" class=\"dataframe\">\n",
1764 | "  <thead>\n",
1765 | "    <tr style=\"text-align: right;\"><th></th><th>precip 2019-01-20 - 2019-01-27</th><th>precip 2019-01-27 - 2019-02-03</th><th>precip 2019-02-03 - 2019-02-10</th><th>precip 2019-02-10 - 2019-02-17</th><th>precip 2019-02-17 - 2019-02-24</th><th>precip 2019-02-24 - 2019-03-03</th><th>precip 2019-03-03 - 2019-03-10</th><th>precip 2019-03-10 - 2019-03-17</th><th>precip 2019-03-17 - 2019-03-24</th><th>precip 2019-03-24 - 2019-03-31</th><th>precip 2019-03-31 - 2019-04-07</th><th>precip 2019-04-07 - 2019-04-14</th><th>precip 2019-04-14 - 2019-04-21</th><th>precip 2019-04-21 - 2019-04-28</th><th>precip 2019-04-28 - 2019-05-05</th><th>precip 2019-05-05 - 2019-05-12</th><th>precip 2019-05-12 - 2019-05-19</th><th>X</th><th>Y</th><th>elevation</th><th>LC_Type1_mode</th><th>Square_ID</th></tr>\n",
1790 | "  </thead>\n",
1791 | "  <tbody>\n",
1792 | "    <tr><th>0</th><td>12.99262</td><td>4.582856</td><td>35.037532</td><td>4.796012</td><td>28.083314</td><td>0.0</td><td>58.362456</td><td>18.264692</td><td>17.537486</td><td>0.896323</td><td>1.68</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>34.26</td><td>-15.91</td><td>887.764222</td><td>9</td><td>4e3c3896-14ce-11ea-bce5-f49634744a41</td></tr>\n",
1817 | "    <tr><th>1</th><td>12.99262</td><td>4.582856</td><td>35.037532</td><td>4.796012</td><td>28.083314</td><td>0.0</td><td>58.362456</td><td>18.264692</td><td>17.537486</td><td>0.896323</td><td>1.68</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>34.26</td><td>-15.90</td><td>743.403912</td><td>9</td><td>4e3c3897-14ce-11ea-bce5-f49634744a41</td></tr>\n",
1842 | "  </tbody>\n",
1843 | "</table>\n",
1844 | "</div>"
1845 | ],
1846 | "text/plain": [
1847 | " precip 2019-01-20 - 2019-01-27 ... Square_ID\n",
1848 | "0 12.99262 ... 4e3c3896-14ce-11ea-bce5-f49634744a41\n",
1849 | "1 12.99262 ... 4e3c3897-14ce-11ea-bce5-f49634744a41\n",
1850 | "\n",
1851 | "[2 rows x 22 columns]"
1852 | ]
1853 | },
1854 | "metadata": {
1855 | "tags": []
1856 | },
1857 | "execution_count": 12
1858 | }
1859 | ]
1860 | },
1861 | {
1862 | "cell_type": "markdown",
1863 | "metadata": {
1864 | "id": "TCSE0BL1SGdE",
1865 | "colab_type": "text"
1866 | },
1867 | "source": [
1868 | "#### Renaming columns"
1869 | ]
1870 | },
1871 | {
1872 | "cell_type": "code",
1873 | "metadata": {
1874 | "id": "F_IbMz6pOwr7",
1875 | "colab_type": "code",
1876 | "colab": {}
1877 | },
1878 | "source": [
1879 | "# Creating a dictionary of column names to be renamed for the training dataset\n",
1880 | "# The columns are renamed to week numbers for convenience, so that the train and test sets share the same column names\n",
1881 | "#\n",
1882 | "new_2015_cols = {}\n",
1883 | "for col, number in zip(precip_features_2015, range(1, len(precip_features_2015) + 1)):\n",
1884 | "    if 'precip' in col:\n",
1885 | "        new_2015_cols[col] = 'week_' + str(number) + '_precip'\n",
1886 | "\n",
1887 | "\n",
1888 | "# Creating a dictionary of column names to be renamed for the testing dataset\n",
1889 | "#\n",
1890 | "new_2019_cols = {}\n",
1891 | "for col, number in zip(precip_features_2019, range(1, len(precip_features_2019) + 1)):\n",
1892 | "    if 'precip' in col:\n",
1893 | "        new_2019_cols[col] = 'week_' + str(number) + '_precip'\n",
1894 | "\n",
1895 | "# Renaming the columns (reassigning rather than using inplace, since train and test are slices of df)\n",
1896 | "#\n",
1897 | "train = train.rename(columns = new_2015_cols)\n",
1898 | "test = test.rename(columns = new_2019_cols)"
1899 | ],
1900 | "execution_count": 0,
1901 | "outputs": []
1902 | },
1903 | {
1904 | "cell_type": "code",
1905 | "metadata": {
1906 | "id": "gbxK4ssoOwn9",
1907 | "colab_type": "code",
1908 | "outputId": "7e4ee8c7-69c5-4e47-b83a-ae722dc21721",
1909 | "colab": {
1910 | "base_uri": "https://localhost:8080/",
1911 | "height": 307
1912 | }
1913 | },
1914 | "source": [
1915 | "# Previewing the first three rows of the cleaned train set\n",
1916 | "#\n",
1917 | "train.head(3)"
1918 | ],
1919 | "execution_count": 0,
1920 | "outputs": [
1921 | {
1922 | "output_type": "execute_result",
1923 | "data": {
1924 | "text/html": [
1925 | "<div>\n",
1939 | "<table border=\"1\" class=\"dataframe\">\n",
1940 | "  <thead>\n",
1941 | "    <tr style=\"text-align: right;\"><th></th><th>LC_Type1_mode</th><th>Square_ID</th><th>X</th><th>Y</th><th>elevation</th><th>week_1_precip</th><th>week_2_precip</th><th>week_3_precip</th><th>week_4_precip</th><th>week_5_precip</th><th>week_6_precip</th><th>week_7_precip</th><th>week_8_precip</th><th>week_9_precip</th><th>week_10_precip</th><th>week_11_precip</th><th>week_12_precip</th><th>week_13_precip</th><th>week_14_precip</th><th>week_15_precip</th><th>week_16_precip</th><th>week_17_precip</th><th>target_2015</th></tr>\n",
1967 | "  </thead>\n",
1968 | "  <tbody>\n",
1969 | "    <tr><th>0</th><td>9</td><td>4e3c3896-14ce-11ea-bce5-f49634744a41</td><td>34.26</td><td>-15.91</td><td>887.764222</td><td>0.0</td><td>0.0</td><td>0.0</td><td>14.844025</td><td>14.552823</td><td>12.237766</td><td>57.451361</td><td>30.127047</td><td>30.449468</td><td>1.521829</td><td>29.389995</td><td>32.878318</td><td>8.179804</td><td>0.963981</td><td>16.659097</td><td>3.304466</td><td>0.0</td><td>0.0</td></tr>\n",
1995 | "    <tr><th>1</th><td>9</td><td>4e3c3897-14ce-11ea-bce5-f49634744a41</td><td>34.26</td><td>-15.90</td><td>743.403912</td><td>0.0</td><td>0.0</td><td>0.0</td><td>14.844025</td><td>14.552823</td><td>12.237766</td><td>57.451361</td><td>30.127047</td><td>30.449468</td><td>1.521829</td><td>29.389995</td><td>32.878318</td><td>8.179804</td><td>0.963981</td><td>16.659097</td><td>3.304466</td><td>0.0</td><td>0.0</td></tr>\n",
2021 | "    <tr><th>2</th><td>9</td><td>4e3c3898-14ce-11ea-bce5-f49634744a41</td><td>34.26</td><td>-15.89</td><td>565.728343</td><td>0.0</td><td>0.0</td><td>0.0</td><td>14.844025</td><td>14.552823</td><td>12.237766</td><td>57.451361</td><td>30.127047</td><td>30.449468</td><td>1.521829</td><td>29.389995</td><td>32.878318</td><td>8.179804</td><td>0.963981</td><td>16.659097</td><td>3.304466</td><td>0.0</td><td>0.0</td></tr>\n",
2047 | "  </tbody>\n",
2048 | "</table>\n",
2049 | "</div>"
2050 | ],
2051 | "text/plain": [
2052 | " LC_Type1_mode ... target_2015\n",
2053 | "0 9 ... 0.0\n",
2054 | "1 9 ... 0.0\n",
2055 | "2 9 ... 0.0\n",
2056 | "\n",
2057 | "[3 rows x 23 columns]"
2058 | ]
2059 | },
2060 | "metadata": {
2061 | "tags": []
2062 | },
2063 | "execution_count": 14
2064 | }
2065 | ]
2066 | },
2067 | {
2068 | "cell_type": "markdown",
2069 | "metadata": {
2070 | "id": "f1OOTy-USMWb",
2071 | "colab_type": "text"
2072 | },
2073 | "source": [
2074 | "#### Re-aligning the Train and Test Datasets"
2075 | ]
2076 | },
2077 | {
2078 | "cell_type": "code",
2079 | "metadata": {
2080 | "id": "K_LmbYUFN5zW",
2081 | "colab_type": "code",
2082 | "outputId": "38663225-daff-45ad-b49c-80bd7da2bdb4",
2083 | "colab": {
2084 | "base_uri": "https://localhost:8080/",
2085 | "height": 307
2086 | }
2087 | },
2088 | "source": [
2089 | "# Separating the target variable\n",
2090 | "#\n",
2091 | "target = train.target_2015\n",
2092 | "\n",
2093 | "\n",
2094 | "# Aligning the training and testing datasets\n",
2095 | "#\n",
2096 | "train, test = train.align(test, join = 'inner', axis = 1)\n",
2097 | "\n",
2098 | "\n",
2099 | "# Previewing the first three rows of the cleaned and realigned test set\n",
2100 | "#\n",
2101 | "test.head(3)"
2102 | ],
2103 | "execution_count": 0,
2104 | "outputs": [
2105 | {
2106 | "output_type": "execute_result",
2107 | "data": {
2108 | "text/html": [
2109 | "<div>\n",
2123 | "<table border=\"1\" class=\"dataframe\">\n",
2124 | "  <thead>\n",
2125 | "    <tr style=\"text-align: right;\"><th></th><th>LC_Type1_mode</th><th>Square_ID</th><th>X</th><th>Y</th><th>elevation</th><th>week_1_precip</th><th>week_2_precip</th><th>week_3_precip</th><th>week_4_precip</th><th>week_5_precip</th><th>week_6_precip</th><th>week_7_precip</th><th>week_8_precip</th><th>week_9_precip</th><th>week_10_precip</th><th>week_11_precip</th><th>week_12_precip</th><th>week_13_precip</th><th>week_14_precip</th><th>week_15_precip</th><th>week_16_precip</th><th>week_17_precip</th></tr>\n",
2150 | "  </thead>\n",
2151 | "  <tbody>\n",
2152 | "    <tr><th>0</th><td>9</td><td>4e3c3896-14ce-11ea-bce5-f49634744a41</td><td>34.26</td><td>-15.91</td><td>887.764222</td><td>12.99262</td><td>4.582856</td><td>35.037532</td><td>4.796012</td><td>28.083314</td><td>0.0</td><td>58.362456</td><td>18.264692</td><td>17.537486</td><td>0.896323</td><td>1.68</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
2177 | "    <tr><th>1</th><td>9</td><td>4e3c3897-14ce-11ea-bce5-f49634744a41</td><td>34.26</td><td>-15.90</td><td>743.403912</td><td>12.99262</td><td>4.582856</td><td>35.037532</td><td>4.796012</td><td>28.083314</td><td>0.0</td><td>58.362456</td><td>18.264692</td><td>17.537486</td><td>0.896323</td><td>1.68</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
2202 | "    <tr><th>2</th><td>9</td><td>4e3c3898-14ce-11ea-bce5-f49634744a41</td><td>34.26</td><td>-15.89</td><td>565.728343</td><td>12.99262</td><td>4.582856</td><td>35.037532</td><td>4.796012</td><td>28.083314</td><td>0.0</td><td>58.362456</td><td>18.264692</td><td>17.537486</td><td>0.896323</td><td>1.68</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
2227 | "  </tbody>\n",
2228 | "</table>\n",
2229 | "</div>"
2230 | ],
2231 | "text/plain": [
2232 | " LC_Type1_mode ... week_17_precip\n",
2233 | "0 9 ... 0.0\n",
2234 | "1 9 ... 0.0\n",
2235 | "2 9 ... 0.0\n",
2236 | "\n",
2237 | "[3 rows x 22 columns]"
2238 | ]
2239 | },
2240 | "metadata": {
2241 | "tags": []
2242 | },
2243 | "execution_count": 15
2244 | }
2245 | ]
2246 | },
2247 | {
2248 | "cell_type": "markdown",
2249 | "metadata": {
2250 | "id": "jUolrELTTFGm",
2251 | "colab_type": "text"
2252 | },
2253 | "source": [
2254 | "## Model Selection"
2255 | ]
2256 | },
2257 | {
2258 | "cell_type": "code",
2259 | "metadata": {
2260 | "id": "oo8XVSzBhWph",
2261 | "colab_type": "code",
2262 | "outputId": "b2900251-94b7-4766-9caf-516fe035e450",
2263 | "colab": {
2264 | "base_uri": "https://localhost:8080/",
2265 | "height": 351
2266 | }
2267 | },
2268 | "source": [
2269 | "# Installing catboost\n",
2270 | "!pip install catboost==0.20.2"
2271 | ],
2272 | "execution_count": 0,
2273 | "outputs": [
2274 | {
2275 | "output_type": "stream",
2276 | "text": [
2277 | "Collecting catboost\n",
2278 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/3d/f6/733fe7cca5d0d882e1a708ad59da2510416cc2e4fa54e17c7a5082f67811/catboost-0.20.1-cp36-none-manylinux1_x86_64.whl (63.6MB)\n",
2279 | "\u001b[K |████████████████████████████████| 63.6MB 60.8MB/s \n",
2280 | "\u001b[?25hRequirement already satisfied: graphviz in /usr/local/lib/python3.6/dist-packages (from catboost) (0.10.1)\n",
2281 | "Requirement already satisfied: numpy>=1.16.0 in /usr/local/lib/python3.6/dist-packages (from catboost) (1.17.4)\n",
2282 | "Requirement already satisfied: pandas>=0.24.0 in /usr/local/lib/python3.6/dist-packages (from catboost) (0.25.3)\n",
2283 | "Requirement already satisfied: scipy in /usr/local/lib/python3.6/dist-packages (from catboost) (1.3.3)\n",
2284 | "Requirement already satisfied: plotly in /usr/local/lib/python3.6/dist-packages (from catboost) (4.1.1)\n",
2285 | "Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from catboost) (1.12.0)\n",
2286 | "Requirement already satisfied: matplotlib in /usr/local/lib/python3.6/dist-packages (from catboost) (3.1.2)\n",
2287 | "Requirement already satisfied: python-dateutil>=2.6.1 in /usr/local/lib/python3.6/dist-packages (from pandas>=0.24.0->catboost) (2.6.1)\n",
2288 | "Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.6/dist-packages (from pandas>=0.24.0->catboost) (2018.9)\n",
2289 | "Requirement already satisfied: retrying>=1.3.3 in /usr/local/lib/python3.6/dist-packages (from plotly->catboost) (1.3.3)\n",
2290 | "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.6/dist-packages (from matplotlib->catboost) (0.10.0)\n",
2291 | "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->catboost) (2.4.5)\n",
2292 | "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->catboost) (1.1.0)\n",
2293 | "Requirement already satisfied: setuptools in /usr/local/lib/python3.6/dist-packages (from kiwisolver>=1.0.1->matplotlib->catboost) (42.0.2)\n",
2294 | "Installing collected packages: catboost\n",
2295 | "Successfully installed catboost-0.20.1\n"
2296 | ],
2297 | "name": "stdout"
2298 | }
2299 | ]
2300 | },
2301 | {
2302 | "cell_type": "markdown",
2303 | "metadata": {
2304 | "id": "uKFPY7QXTN-A",
2305 | "colab_type": "text"
2306 | },
2307 | "source": [
2308 | "#### Comparing different models to find the most accurate"
2309 | ]
2310 | },
2311 | {
2312 | "cell_type": "code",
2313 | "metadata": {
2314 | "id": "4o5TqrYmN5wJ",
2315 | "colab_type": "code",
2316 | "outputId": "d8e09ba5-c30b-4fd1-8db1-302b3c53d5e5",
2317 | "colab": {
2318 | "base_uri": "https://localhost:8080/",
2319 | "height": 432
2320 | }
2321 | },
2322 | "source": [
2323 | "# Using different models to find the optimal model\n",
2324 | "#\n",
2325 | "from sklearn.model_selection import KFold, cross_val_score\n",
2326 | "from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, AdaBoostRegressor\n",
2327 | "from sklearn.tree import DecisionTreeRegressor\n",
2328 | "from sklearn.svm import SVR\n",
2329 | "from sklearn.neighbors import KNeighborsRegressor\n",
2330 | "from xgboost import XGBRegressor\n",
2331 | "from sklearn.linear_model import LinearRegression\n",
2332 | "from sklearn.metrics import mean_squared_error\n",
2333 | "from catboost import CatBoostRegressor\n",
2334 | "import warnings\n",
2335 | "warnings.filterwarnings('ignore')\n",
2336 | "\n",
2337 | "\n",
2338 | "# Creating a list of regressor algorithms to compare with\n",
2339 | "#\n",
2340 | "models = [RandomForestRegressor(), GradientBoostingRegressor(), AdaBoostRegressor(), DecisionTreeRegressor(), XGBRegressor(objective='reg:squarederror'),\n",
2341 | "          SVR(), KNeighborsRegressor(), LinearRegression(), CatBoostRegressor(logging_level='Silent')]\n",
2342 | "\n",
2343 | "\n",
2344 | "# Creating one list per algorithm to store the negative MSE score of each fold\n",
2345 | "# (the SVR scores list is named SVM so that it does not shadow the imported SVR class)\n",
2346 | "RandomForest, GradientBoosting, AdaBoost, DecisionTree, XGB, SVM, KNeighbors, Linear, Cat = ([] for x in range(9))\n",
2347 | "\n",
2348 | "\n",
2349 | "# Creating a list containing the list of each algorithm, for easy iteration\n",
2350 | "#\n",
2351 | "model_list = [RandomForest, GradientBoosting, AdaBoost, DecisionTree, XGB, SVM, KNeighbors, Linear, Cat]\n",
2352 | "\n",
2353 | "\n",
2354 | "# Splitting the data into features and the target variable\n",
2355 | "#\n",
2356 | "X = train.drop('Square_ID', axis = 1)\n",
2357 | "y = target\n",
2358 | "\n",
2359 | "\n",
2360 | "# Creating a cross validation of 10 folds\n",
2361 | "# (random_state is omitted: it has no effect unless shuffle=True, and recent scikit-learn versions raise an error if it is set)\n",
2362 | "kfold = KFold(n_splits=10)\n",
2363 | "\n",
2364 | "\n",
2365 | "# Iterating through each model and appending the scores of each fold to the appropriate list\n",
2366 | "#\n",
2367 | "for i, j in zip(models, model_list):\n",
2368 | "    j.extend(list(cross_val_score(i, X, y, scoring = 'neg_mean_squared_error', cv = kfold)))\n",
2369 | "\n",
2370 | "\n",
2371 | "# Creating a function to convert neg_mean_squared_error scores to root mean squared errors\n",
2372 | "#\n",
2373 | "def sq(lis):\n",
2374 | "    new_lis = []\n",
2375 | "    lis = np.array(lis)\n",
2376 | "    for i in lis:\n",
2377 | "        i = np.sqrt(i*-1)\n",
2378 | "        new_lis.append(i)\n",
2379 | "    return new_lis\n",
2380 | "\n",
2381 | "\n",
2382 | "# Creating a dataframe of all the rmses from the iterations for each model\n",
2383 | "#\n",
2384 | "rmses = pd.DataFrame({'Fold': np.arange(1, 11), 'RandomForest': sq(RandomForest), 'GradientBoosting': sq(GradientBoosting), 'Adaboost': sq(AdaBoost), 'DecisionTree': sq(DecisionTree),\n",
2385 | "                      'XGB': sq(XGB), 'SVR': sq(SVM), 'Kneighbors': sq(KNeighbors), 'Linear': sq(Linear), 'Cat': sq(Cat)})\n",
2386 | "\n",
2387 | "# Setting the index\n",
2388 | "#\n",
2389 | "rmses.set_index('Fold', inplace = True)\n",
2390 | "\n",
2391 | "\n",
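2392 | "# Calculating the mean and standard deviation of the fold rmses of each algorithm\n",
2393 | "# (the 'mean' row is dropped before computing the standard deviation so it is not counted as an extra fold)\n",
2394 | "rmses.loc['mean'] = rmses.mean()\n",
2395 | "rmses.loc['std'] = rmses.drop('mean').std()\n",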
2396 | "\n",
2397 | "\n",
2398 | "# Previewing the rmses dataframe\n",
2399 | "#\n",
2400 | "rmses"
2401 | ],
2402 | "execution_count": 0,
2403 | "outputs": [
2404 | {
2405 | "output_type": "execute_result",
2406 | "data": {
2407 | "text/html": [
2408 | "<div>\n",
2422 | "<table border=\"1\" class=\"dataframe\">\n",
2423 | "  <thead>\n",
2424 | "    <tr style=\"text-align: right;\">\n",
2425 | "      <th></th>\n",
2426 | "      <th>RandomForest</th>\n",
2427 | "      <th>GradientBoosting</th>\n",
2428 | "      <th>Adaboost</th>\n",
2429 | "      <th>DecisionTree</th>\n",
2430 | "      <th>XGB</th>\n",
2431 | "      <th>SVR</th>\n",
2432 | "      <th>Kneighbors</th>\n",
2433 | "      <th>Linear</th>\n",
2434 | "      <th>Cat</th>\n",
2435 | "    </tr>\n",
2436 | "    <tr>\n",
2437 | "      <th>Fold</th>\n",
2438 | "      <th></th>\n",
2439 | "      <th></th>\n",
2440 | "      <th></th>\n",
2441 | "      <th></th>\n",
2442 | "      <th></th>\n",
2443 | "      <th></th>\n",
2444 | "      <th></th>\n",
2445 | "      <th></th>\n",
2446 | "      <th></th>\n",
2447 | "    </tr>\n",
2448 | "  </thead>\n",
2449 | "  <tbody>\n",
2450 | "    <tr>\n",
2451 | "      <th>1</th>\n",
2452 | "      <td>0.085937</td>\n",
2453 | "      <td>0.084569</td>\n",
2454 | "      <td>0.091366</td>\n",
2455 | "      <td>0.085926</td>\n",
2456 | "      <td>0.085266</td>\n",
2457 | "      <td>0.130031</td>\n",
2458 | "      <td>0.086023</td>\n",
2459 | "      <td>0.135525</td>\n",
2460 | "      <td>0.084636</td>\n",
2461 | "    </tr>\n",
2462 | "    <tr>\n",
2463 | "      <th>2</th>\n",
2464 | "      <td>0.073427</td>\n",
2465 | "      <td>0.059387</td>\n",
2466 | "      <td>0.088300</td>\n",
2467 | "      <td>0.089046</td>\n",
2468 | "      <td>0.058436</td>\n",
2469 | "      <td>0.109598</td>\n",
2470 | "      <td>0.058385</td>\n",
2471 | "      <td>0.089754</td>\n",
2472 | "      <td>0.062418</td>\n",
2473 | "    </tr>\n",
2474 | "    <tr>\n",
2475 | "      <th>3</th>\n",
2476 | "      <td>0.112610</td>\n",
2477 | "      <td>0.089913</td>\n",
2478 | "      <td>0.091947</td>\n",
2479 | "      <td>0.141038</td>\n",
2480 | "      <td>0.088311</td>\n",
2481 | "      <td>0.127553</td>\n",
2482 | "      <td>0.096783</td>\n",
2483 | "      <td>0.121761</td>\n",
2484 | "      <td>0.094261</td>\n",
2485 | "    </tr>\n",
2486 | "    <tr>\n",
2487 | "      <th>4</th>\n",
2488 | "      <td>0.159949</td>\n",
2489 | "      <td>0.166018</td>\n",
2490 | "      <td>0.213543</td>\n",
2491 | "      <td>0.198635</td>\n",
2492 | "      <td>0.170162</td>\n",
2493 | "      <td>0.198225</td>\n",
2494 | "      <td>0.191840</td>\n",
2495 | "      <td>0.262740</td>\n",
2496 | "      <td>0.160362</td>\n",
2497 | "    </tr>\n",
2498 | "    <tr>\n",
2499 | "      <th>5</th>\n",
2500 | "      <td>0.160206</td>\n",
2501 | "      <td>0.179172</td>\n",
2502 | "      <td>0.218934</td>\n",
2503 | "      <td>0.224328</td>\n",
2504 | "      <td>0.176187</td>\n",
2505 | "      <td>0.206909</td>\n",
2506 | "      <td>0.177272</td>\n",
2507 | "      <td>0.336841</td>\n",
2508 | "      <td>0.162315</td>\n",
2509 | "    </tr>\n",
2510 | "    <tr>\n",
2511 | "      <th>6</th>\n",
2512 | "      <td>0.109505</td>\n",
2513 | "      <td>0.118684</td>\n",
2514 | "      <td>0.149015</td>\n",
2515 | "      <td>0.133493</td>\n",
2516 | "      <td>0.118987</td>\n",
2517 | "      <td>0.148906</td>\n",
2518 | "      <td>0.109368</td>\n",
2519 | "      <td>0.231556</td>\n",
2520 | "      <td>0.109079</td>\n",
2521 | "    </tr>\n",
2522 | "    <tr>\n",
2523 | "      <th>7</th>\n",
2524 | "      <td>0.058981</td>\n",
2525 | "      <td>0.056948</td>\n",
2526 | "      <td>0.081455</td>\n",
2527 | "      <td>0.064764</td>\n",
2528 | "      <td>0.058730</td>\n",
2529 | "      <td>0.112576</td>\n",
2530 | "      <td>0.059242</td>\n",
2531 | "      <td>0.153379</td>\n",
2532 | "      <td>0.052203</td>\n",
2533 | "    </tr>\n",
2534 | "    <tr>\n",
2535 | "      <th>8</th>\n",
2536 | "      <td>0.157463</td>\n",
2537 | "      <td>0.102168</td>\n",
2538 | "      <td>0.124836</td>\n",
2539 | "      <td>0.157540</td>\n",
2540 | "      <td>0.101140</td>\n",
2541 | "      <td>0.195707</td>\n",
2542 | "      <td>0.184638</td>\n",
2543 | "      <td>0.115595</td>\n",
2544 | "      <td>0.145777</td>\n",
2545 | "    </tr>\n",
2546 | "    <tr>\n",
2547 | "      <th>9</th>\n",
2548 | "      <td>0.246438</td>\n",
2549 | "      <td>0.264649</td>\n",
2550 | "      <td>0.269114</td>\n",
2551 | "      <td>0.272945</td>\n",
2552 | "      <td>0.260912</td>\n",
2553 | "      <td>0.276731</td>\n",
2554 | "      <td>0.324014</td>\n",
2555 | "      <td>0.341167</td>\n",
2556 | "      <td>0.260180</td>\n",
2557 | "    </tr>\n",
2558 | "    <tr>\n",
2559 | "      <th>10</th>\n",
2560 | "      <td>0.224625</td>\n",
2561 | "      <td>0.225823</td>\n",
2562 | "      <td>0.366404</td>\n",
2563 | "      <td>0.315316</td>\n",
2564 | "      <td>0.227395</td>\n",
2565 | "      <td>0.236619</td>\n",
2566 | "      <td>0.230469</td>\n",
2567 | "      <td>0.370660</td>\n",
2568 | "      <td>0.216168</td>\n",
2569 | "    </tr>\n",
2570 | "    <tr>\n",
2571 | "      <th>mean</th>\n",
2572 | "      <td>0.138914</td>\n",
2573 | "      <td>0.134733</td>\n",
2574 | "      <td>0.169491</td>\n",
2575 | "      <td>0.168303</td>\n",
2576 | "      <td>0.134553</td>\n",
2577 | "      <td>0.174286</td>\n",
2578 | "      <td>0.151803</td>\n",
2579 | "      <td>0.215898</td>\n",
2580 | "      <td>0.134740</td>\n",
2581 | "    </tr>\n",
2582 | "    <tr>\n",
2583 | "      <th>std</th>\n",
2584 | "      <td>0.059320</td>\n",
2585 | "      <td>0.067602</td>\n",
2586 | "      <td>0.090530</td>\n",
2587 | "      <td>0.079045</td>\n",
2588 | "      <td>0.067124</td>\n",
2589 | "      <td>0.054088</td>\n",
2590 | "      <td>0.081009</td>\n",
2591 | "      <td>0.100952</td>\n",
2592 | "      <td>0.063839</td>\n",
2593 | "    </tr>\n",
2594 | "  </tbody>\n",
2595 | "</table>\n",
2596 | "</div>"
2597 | ],
2598 | "text/plain": [
2599 | " RandomForest GradientBoosting Adaboost ... Kneighbors Linear Cat\n",
2600 | "Fold ... \n",
2601 | "1 0.085937 0.084569 0.091366 ... 0.086023 0.135525 0.084636\n",
2602 | "2 0.073427 0.059387 0.088300 ... 0.058385 0.089754 0.062418\n",
2603 | "3 0.112610 0.089913 0.091947 ... 0.096783 0.121761 0.094261\n",
2604 | "4 0.159949 0.166018 0.213543 ... 0.191840 0.262740 0.160362\n",
2605 | "5 0.160206 0.179172 0.218934 ... 0.177272 0.336841 0.162315\n",
2606 | "6 0.109505 0.118684 0.149015 ... 0.109368 0.231556 0.109079\n",
2607 | "7 0.058981 0.056948 0.081455 ... 0.059242 0.153379 0.052203\n",
2608 | "8 0.157463 0.102168 0.124836 ... 0.184638 0.115595 0.145777\n",
2609 | "9 0.246438 0.264649 0.269114 ... 0.324014 0.341167 0.260180\n",
2610 | "10 0.224625 0.225823 0.366404 ... 0.230469 0.370660 0.216168\n",
2611 | "mean 0.138914 0.134733 0.169491 ... 0.151803 0.215898 0.134740\n",
2612 | "std 0.059320 0.067602 0.090530 ... 0.081009 0.100952 0.063839\n",
2613 | "\n",
2614 | "[12 rows x 9 columns]"
2615 | ]
2616 | },
2617 | "metadata": {
2618 | "tags": []
2619 | },
2620 | "execution_count": 17
2621 | }
2622 | ]
2623 | },
2624 | {
2625 | "cell_type": "markdown",
2626 | "metadata": {
2627 | "id": "cMvlmS9lTVvY",
2628 | "colab_type": "text"
2629 | },
2630 | "source": [
2631 | "#### Selecting the top three models with the lowest RMSE"
2632 | ]
2633 | },
2634 | {
2635 | "cell_type": "code",
2636 | "metadata": {
2637 | "id": "xl3ZnMjQN5dh",
2638 | "colab_type": "code",
2639 | "outputId": "1d34f7ae-b19b-4be6-ce9f-4f3b0545dda9",
2640 | "colab": {
2641 | "base_uri": "https://localhost:8080/",
2642 | "height": 34
2643 | }
2644 | },
2645 | "source": [
2646 | "# Checking for the regressor with minimum root mean squared error\n",
2647 | "#\n",
2648 | "rmses.loc['mean'].idxmin(), rmses.loc['mean'].min()"
2649 | ],
2650 | "execution_count": 0,
2651 | "outputs": [
2652 | {
2653 | "output_type": "execute_result",
2654 | "data": {
2655 | "text/plain": [
2656 | "('XGB', 0.13455272135801205)"
2657 | ]
2658 | },
2659 | "metadata": {
2660 | "tags": []
2661 | },
2662 | "execution_count": 18
2663 | }
2664 | ]
2665 | },
2666 | {
2667 | "cell_type": "code",
2668 | "metadata": {
2669 | "id": "lRBUedWKu2TH",
2670 | "colab_type": "code",
2671 | "outputId": "fbbd51ca-907b-46bb-c550-880ef3b8d5c2",
2672 | "colab": {
2673 | "base_uri": "https://localhost:8080/",
2674 | "height": 193
2675 | }
2676 | },
2677 | "source": [
2678 | "# Arranging the models in ascending order\n",
2679 | "#\n",
2680 | "rmses.loc['mean'].sort_values()"
2681 | ],
2682 | "execution_count": 0,
2683 | "outputs": [
2684 | {
2685 | "output_type": "execute_result",
2686 | "data": {
2687 | "text/plain": [
2688 | "XGB 0.134553\n",
2689 | "GradientBoosting 0.134733\n",
2690 | "Cat 0.134740\n",
2691 | "RandomForest 0.138914\n",
2692 | "Kneighbors 0.151803\n",
2693 | "DecisionTree 0.168303\n",
2694 | "Adaboost 0.169491\n",
2695 | "SVR 0.174286\n",
2696 | "Linear 0.215898\n",
2697 | "Name: mean, dtype: float64"
2698 | ]
2699 | },
2700 | "metadata": {
2701 | "tags": []
2702 | },
2703 | "execution_count": 19
2704 | }
2705 | ]
2706 | },
2707 | {
2708 | "cell_type": "markdown",
2709 | "metadata": {
2710 | "id": "JRgVpDTSTouK",
2711 | "colab_type": "text"
2712 | },
2713 | "source": [
2714 | "## Training the top three models and making predictions"
2715 | ]
2716 | },
2717 | {
2718 | "cell_type": "code",
2719 | "metadata": {
2720 | "id": "-xSnVX_ZN5Ys",
2721 | "colab_type": "code",
2722 | "colab": {}
2723 | },
2724 | "source": [
2725 | "# Using the top three models (XGBoost, CatBoost and GradientBoosting) to train and make predictions\n",
2726 | "# Creating a list of models to use, with a matching list of names for the output files\n",
2727 | "models = [XGBRegressor(objective ='reg:squarederror'), CatBoostRegressor(logging_level='Silent'), GradientBoostingRegressor()]\n",
2728 | "model_names = ['xgboost', 'catboost', 'gradientboost']\n",
2729 | "\n",
2730 | "\n",
2731 | "# Selecting the training features and the target feature\n",
2732 | "#\n",
2733 | "X = train.drop('Square_ID', axis = 1)\n",
2734 | "y = target\n",
2735 | "\n",
2736 | "\n",
2737 | "# Submission dataset\n",
2738 | "#\n",
2739 | "sub = test.drop('Square_ID', axis = 1)\n",
2740 | "\n",
2741 | "\n",
2742 | "# Using a for loop to create a submission file for each model\n",
2743 | "#\n",
2744 | "for model, model_name in zip(models, model_names):\n",
2745 | "    regressor = model # the model was already instantiated in the models list\n",
2746 | "    regressor.fit(X, y) # Training the model on the full training set\n",
2747 | "    predictions = regressor.predict(sub) # Making predictions\n",
2748 | "    submission_df = pd.DataFrame({'Square_ID': test.Square_ID, 'target_2019': predictions}) # Creating a submission dataframe\n",
2749 | "    submission_df.to_csv(model_name + '_baseline.csv', index = False) # Saving the submission file"
2750 | ],
2751 | "execution_count": 0,
2752 | "outputs": []
2753 | },
2754 | {
2755 | "cell_type": "markdown",
2756 | "metadata": {
2757 | "id": "aNTalTIjTwg4",
2758 | "colab_type": "text"
2759 | },
2760 | "source": [
2761 | "*The models yielded the following Root Mean Squared Errors:*\n",
2762 | " - XGBRegressor: 0.250710809791906\n",
2763 | " - **CatBoostRegressor: 0.118661182373564**\n",
2764 | " - GradientBoostingRegressor: 0.608857842367698\n",
2765 | " \n",
2766 | "The CatBoostRegressor was the most accurate, with the lowest RMSE of 0.118661182373564."
2767 | ]
2768 | },
2769 | {
2770 | "cell_type": "markdown",
2771 | "metadata": {
2772 | "id": "mqV0M63KVIIo",
2773 | "colab_type": "text"
2774 | },
2775 | "source": [
2776 | "# Next Steps:\n",
2777 | "To further improve the accuracy of the model, the following should be considered:\n",
2778 | " - A thorough Exploratory Data Analysis\n",
2779 | " - Feature Engineering\n",
2780 | " - Feature Selection\n",
2781 | " - Hyperparameter Tuning\n",
2782 | " - Model Evaluation\n",
2783 | " - Model Interpretation\n",
2784 | " - Sourcing more data\n",
2785 | " \n",
2786 | "For any suggestions or clarifications, feel free to reach out to [Darius Moruri on LinkedIn](https://www.linkedin.com/in/dariusmoruri/)\n"
2787 | ]
2788 | }
2789 | ]
2790 | }
--------------------------------------------------------------------------------