├── .gitattributes └── pamap2.ipynb /.gitattributes: -------------------------------------------------------------------------------- 1 | # Auto detect text files and perform LF normalization 2 | * text=auto 3 | -------------------------------------------------------------------------------- /pamap2.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Data Science Research Methods" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [] 14 | }, 15 | { 16 | "cell_type": "markdown", 17 | "metadata": {}, 18 | "source": [ 19 | "### Contents\n", 20 | "\n", 21 | "#### 1. Overview\n", 22 | "#### 2. Preparatory Tasks\n", 23 | "#### 3. Data Cleaning\n", 24 | "#### 4. Data Correlation\n", 25 | "#### 5. Exploratory Data Analysis\n", 26 | "#### 6. Hypothesis Testing\n", 27 | "#### 7. Modelling\n", 28 | "#### 8. Decisions\n", 29 | "#### 9. Summary\n", 30 | "#### 10. References" 31 | ] 32 | }, 33 | { 34 | "cell_type": "markdown", 35 | "metadata": {}, 36 | "source": [ 37 | "### Overview" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "For this assignment, the dataset used is PAMAP2, an activity monitoring dataset that covers 18 different physical activities performed by 9 subjects (8 men and 1 woman) and recorded using 3 inertial measurement units and a heart rate monitor. \n", 45 | "\n", 46 | "The goal of this assignment is to gain insights from our analysis into how active an individual is, based on the physical activities performed. These insights are intended to inform the design of hardware or software.\n", 47 | "\n", 48 | "The three requirements of this assignment are to:\n", 49 | "\n", 50 | "1. Carry out thorough exploratory data analysis and appropriately handle missing or dirty data\n", 51 | "\n", 52 | "2. 
Develop and test at least one hypothesis for a relationship between a single pair of attributes\n", 53 | "\n", 54 | "3. Develop and test at least one model which uses multiple attributes to make predictions" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": {}, 60 | "source": [ 61 | "### Preparatory Tasks" 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": 1, 67 | "metadata": {}, 68 | "outputs": [], 69 | "source": [ 70 | "from matplotlib import pyplot as plt\n", 71 | "%matplotlib inline\n", 72 | "import pandas as pd\n", 73 | "import numpy as np\n", 74 | "import math\n", 75 | "from sklearn.model_selection import train_test_split\n" 76 | ] 77 | }, 78 | { 79 | "cell_type": "markdown", 80 | "metadata": {}, 81 | "source": [ 82 | "A list of the file names is created so that all files can be loaded into a single dataframe. A dictionary mapping each activity ID to its name is also created, so that it is clear which activity is being analysed at each stage.\n", 83 | "\n", 84 | "A list of column names is needed for each IMU. The three IMUs are worn on the hand, the chest and the ankle.\n", 85 | "\n", 86 | "Finally, these lists are concatenated to form the full list of dataframe columns." 
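The lists described above can also be generated instead of typed out by hand. The following is a minimal sketch, assuming the file layout and column naming scheme used in this notebook; `imu_columns` is a hypothetical helper introduced only for illustration and is not part of the original code.

```python
# Hypothetical helper (not in the original notebook): builds the 17 column
# names of one IMU, in the same order as the hand-written lists.
def imu_columns(position):
    cols = [f"{position}Temperature"]
    # (name fragment, number of channels) for each sensor of the IMU
    for sensor, channels in (("Acc16_", 3), ("Acc6_", 3), ("Gyro", 3),
                             ("Magne", 3), ("Orientation", 4)):
        cols += [f"{position}{sensor}{i}" for i in range(1, channels + 1)]
    return cols

# One protocol file per subject, subject101 to subject109.
list_of_files = [f"PAMAP2_Dataset/Protocol/subject10{i}.dat" for i in range(1, 10)]

# Three general columns followed by the hand, chest and ankle IMU columns.
columns = ["timestamp", "activityID", "heartrate"]
for position in ("hand", "chest", "ankle"):
    columns += imu_columns(position)

print(len(columns))
```

Generating the names keeps the three IMU lists consistent with each other, at the cost of being slightly less explicit than the hand-written lists.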
87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "execution_count": 2, 92 | "metadata": {}, 93 | "outputs": [ 94 | { 95 | "data": { 96 | "text/plain": [ 97 | "54" 98 | ] 99 | }, 100 | "execution_count": 2, 101 | "metadata": {}, 102 | "output_type": "execute_result" 103 | } 104 | ], 105 | "source": [ 106 | "# Load data\n", 107 | "list_of_files = ['PAMAP2_Dataset/Protocol/subject101.dat',\n", 108 | " 'PAMAP2_Dataset/Protocol/subject102.dat',\n", 109 | " 'PAMAP2_Dataset/Protocol/subject103.dat',\n", 110 | " 'PAMAP2_Dataset/Protocol/subject104.dat',\n", 111 | " 'PAMAP2_Dataset/Protocol/subject105.dat',\n", 112 | " 'PAMAP2_Dataset/Protocol/subject106.dat',\n", 113 | " 'PAMAP2_Dataset/Protocol/subject107.dat',\n", 114 | " 'PAMAP2_Dataset/Protocol/subject108.dat',\n", 115 | " 'PAMAP2_Dataset/Protocol/subject109.dat' ]\n", 116 | "\n", 117 | "subjectID = [1,2,3,4,5,6,7,8,9]\n", 118 | "\n", 119 | "activityIDdict = {0: 'transient',\n", 120 | " 1: 'lying',\n", 121 | " 2: 'sitting',\n", 122 | " 3: 'standing',\n", 123 | " 4: 'walking',\n", 124 | " 5: 'running',\n", 125 | " 6: 'cycling',\n", 126 | " 7: 'Nordic_walking',\n", 127 | " 9: 'watching_TV',\n", 128 | " 10: 'computer_work',\n", 129 | " 11: 'car driving',\n", 130 | " 12: 'ascending_stairs',\n", 131 | " 13: 'descending_stairs',\n", 132 | " 16: 'vacuum_cleaning',\n", 133 | " 17: 'ironing',\n", 134 | " 18: 'folding_laundry',\n", 135 | " 19: 'house_cleaning',\n", 136 | " 20: 'playing_soccer',\n", 137 | " 24: 'rope_jumping' }\n", 138 | "\n", 139 | "colNames = [\"timestamp\", \"activityID\",\"heartrate\"]\n", 140 | "\n", 141 | "IMUhand = ['handTemperature', \n", 142 | " 'handAcc16_1', 'handAcc16_2', 'handAcc16_3', \n", 143 | " 'handAcc6_1', 'handAcc6_2', 'handAcc6_3', \n", 144 | " 'handGyro1', 'handGyro2', 'handGyro3', \n", 145 | " 'handMagne1', 'handMagne2', 'handMagne3',\n", 146 | " 'handOrientation1', 'handOrientation2', 'handOrientation3', 'handOrientation4']\n", 147 | "\n", 148 | "IMUchest = ['chestTemperature', \n", 
149 | " 'chestAcc16_1', 'chestAcc16_2', 'chestAcc16_3', \n", 150 | " 'chestAcc6_1', 'chestAcc6_2', 'chestAcc6_3', \n", 151 | " 'chestGyro1', 'chestGyro2', 'chestGyro3', \n", 152 | " 'chestMagne1', 'chestMagne2', 'chestMagne3',\n", 153 | " 'chestOrientation1', 'chestOrientation2', 'chestOrientation3', 'chestOrientation4']\n", 154 | "\n", 155 | "IMUankle = ['ankleTemperature', \n", 156 | " 'ankleAcc16_1', 'ankleAcc16_2', 'ankleAcc16_3', \n", 157 | " 'ankleAcc6_1', 'ankleAcc6_2', 'ankleAcc6_3', \n", 158 | " 'ankleGyro1', 'ankleGyro2', 'ankleGyro3', \n", 159 | " 'ankleMagne1', 'ankleMagne2', 'ankleMagne3',\n", 160 | " 'ankleOrientation1', 'ankleOrientation2', 'ankleOrientation3', 'ankleOrientation4']\n", 161 | "\n", 162 | "columns = colNames + IMUhand + IMUchest + IMUankle #all columns in one list\n", 163 | "\n", 164 | "len(columns)\n" 165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": 3, 170 | "metadata": {}, 171 | "outputs": [ 172 | { 173 | "data": { 174 | "text/html": [ 175 | "
" 341 | ], 342 | "text/plain": [ 343 | " timestamp activityID heartrate handTemperature handAcc16_1 \\\n", 344 | "0 8.38 0 104.0 30.0 2.37223 \n", 345 | "1 8.39 0 NaN 30.0 2.18837 \n", 346 | "2 8.40 0 NaN 30.0 2.37357 \n", 347 | "3 8.41 0 NaN 30.0 2.07473 \n", 348 | "4 8.42 0 NaN 30.0 2.22936 \n", 349 | "\n", 350 | " handAcc16_2 handAcc16_3 handAcc6_1 handAcc6_2 handAcc6_3 ... \\\n", 351 | "0 8.60074 3.51048 2.43954 8.76165 3.35465 ... \n", 352 | "1 8.56560 3.66179 2.39494 8.55081 3.64207 ... \n", 353 | "2 8.60107 3.54898 2.30514 8.53644 3.73280 ... \n", 354 | "3 8.52853 3.66021 2.33528 8.53622 3.73277 ... \n", 355 | "4 8.83122 3.70000 2.23055 8.59741 3.76295 ... \n", 356 | "\n", 357 | " ankleGyro2 ankleGyro3 ankleMagne1 ankleMagne2 ankleMagne3 \\\n", 358 | "0 0.009250 -0.017580 -61.1888 -38.9599 -58.1438 \n", 359 | "1 -0.004638 0.000368 -59.8479 -38.8919 -58.5253 \n", 360 | "2 0.000148 0.022495 -60.7361 -39.4138 -58.3999 \n", 361 | "3 -0.020301 0.011275 -60.4091 -38.7635 -58.3956 \n", 362 | "4 -0.014303 -0.002823 -61.5199 -39.3879 -58.2694 \n", 363 | "\n", 364 | " ankleOrientation1 ankleOrientation2 ankleOrientation3 ankleOrientation4 \\\n", 365 | "0 1.0 0.0 0.0 0.0 \n", 366 | "1 1.0 0.0 0.0 0.0 \n", 367 | "2 1.0 0.0 0.0 0.0 \n", 368 | "3 1.0 0.0 0.0 0.0 \n", 369 | "4 1.0 0.0 0.0 0.0 \n", 370 | "\n", 371 | " subject_id \n", 372 | "0 1 \n", 373 | "1 1 \n", 374 | "2 1 \n", 375 | "3 1 \n", 376 | "4 1 \n", 377 | "\n", 378 | "[5 rows x 55 columns]" 379 | ] 380 | }, 381 | "execution_count": 3, 382 | "metadata": {}, 383 | "output_type": "execute_result" 384 | } 385 | ], 386 | "source": [ 387 | "frames = []  # one DataFrame per subject file\n", 388 | "for file in list_of_files:\n", 389 | "    procData = pd.read_table(file, header=None, sep='\\s+')\n", 390 | "    procData.columns = columns\n", 391 | "    procData['subject_id'] = int(file[-5])\n", 392 | "    frames.append(procData)\n", 393 | "dataCollection = pd.concat(frames, ignore_index=True)  # DataFrame.append was removed in pandas 2.0\n", 394 | "dataCollection.reset_index(drop=True, 
inplace=True)\n", 395 | "dataCollection.head()" 396 | ] 397 | }, 398 | { 399 | "cell_type": "markdown", 400 | "metadata": {}, 401 | "source": [ 402 | "As the sample of the dataframe shows, some data cleaning is required. For instance, activityID 0 must be removed completely from our dataset, since it marks transient periods in which the subject was not performing any particular activity, as described in the readme file supplied with the dataset. Data cleaning is discussed thoroughly in the following section." 403 | ] 404 | }, 405 | { 406 | "cell_type": "markdown", 407 | "metadata": {}, 408 | "source": [ 409 | "### Data Cleaning" 410 | ] 411 | }, 412 | { 413 | "cell_type": "markdown", 414 | "metadata": {}, 415 | "source": [ 416 | "The **PerformedActivitiesSummary** file, which is part of the dataset, shows that some data is missing, and the **readme** file notes that there were wireless disconnections during data collection. The missing data therefore has to be accounted for and filled in so that it does not distort our analysis. In more detail, each activity was performed by 8 or fewer subjects, and each activity has NaN values for various subjects, so some data filling has to be applied. For the NaN values in our data, we use **interpolate**, which constructs new data points from a set of known data points.\n", 417 | "\n", 418 | "As a guideline for the code below, interpolation happens after removing activity 0. The transient rows are very noisy, and interpolating across them would heavily distort the result, because the filled-in heart rate values would then be derived from data points that do not belong to the activities that actually matter. 
" 419 | ] 420 | }, 421 | { 422 | "cell_type": "code", 423 | "execution_count": 4, 424 | "metadata": {}, 425 | "outputs": [], 426 | "source": [ 427 | "def dataCleaning(dataCollection):\n", 428 | "    dataCollection = dataCollection.drop(['handOrientation1', 'handOrientation2', 'handOrientation3', 'handOrientation4',\n", 429 | "        'chestOrientation1', 'chestOrientation2', 'chestOrientation3', 'chestOrientation4',\n", 430 | "        'ankleOrientation1', 'ankleOrientation2', 'ankleOrientation3', 'ankleOrientation4'],\n", 431 | "        axis = 1) # remove the orientation columns as they are not needed\n", 432 | "    dataCollection = dataCollection.drop(dataCollection[dataCollection.activityID == 0].index) # remove all rows of activity 0, the transient activity, which is not used\n", 433 | "    dataCollection = dataCollection.apply(pd.to_numeric, errors = 'coerce') # convert non-numeric cells to NaN\n", 434 | "    dataCollection = dataCollection.interpolate() # fill the remaining NaN cells by interpolating from the known data points\n", 435 | "    \n", 436 | "    return dataCollection\n" 437 | ] 438 | }, 439 | { 440 | "cell_type": "code", 441 | "execution_count": 5, 442 | "metadata": {}, 443 | "outputs": [], 444 | "source": [ 445 | "dataCol = dataCleaning(dataCollection)" 446 | ] 447 | }, 448 | { 449 | "cell_type": "code", 450 | "execution_count": 6, 451 | "metadata": {}, 452 | "outputs": [ 453 | { 454 | "data": { 455 | "text/html": [ 456 | "
" 742 | ], 743 | "text/plain": [ 744 | " timestamp activityID heartrate handTemperature handAcc16_1 \\\n", 745 | "0 37.66 1 NaN 30.375 2.21530 \n", 746 | "1 37.67 1 NaN 30.375 2.29196 \n", 747 | "2 37.68 1 NaN 30.375 2.29090 \n", 748 | "3 37.69 1 NaN 30.375 2.21800 \n", 749 | "4 37.70 1 100.0 30.375 2.30106 \n", 750 | "5 37.71 1 100.0 30.375 2.07165 \n", 751 | "6 37.72 1 100.0 30.375 2.41148 \n", 752 | "7 37.73 1 100.0 30.375 2.32815 \n", 753 | "8 37.74 1 100.0 30.375 2.25096 \n", 754 | "9 37.75 1 100.0 30.375 2.14107 \n", 755 | "\n", 756 | " handAcc16_2 handAcc16_3 handAcc6_1 handAcc6_2 handAcc6_3 ... \\\n", 757 | "0 8.27915 5.58753 2.24689 8.55387 5.77143 ... \n", 758 | "1 7.67288 5.74467 2.27373 8.14592 5.78739 ... \n", 759 | "2 7.14240 5.82342 2.26966 7.66268 5.78846 ... \n", 760 | "3 7.14365 5.89930 2.22177 7.25535 5.88000 ... \n", 761 | "4 7.25857 6.09259 2.20720 7.24042 5.95555 ... \n", 762 | "5 7.25965 6.01218 2.19238 7.21038 6.01604 ... \n", 763 | "6 7.59780 5.93915 2.23988 7.46679 6.03053 ... \n", 764 | "7 7.63431 5.70686 2.31663 7.64745 6.01495 ... \n", 765 | "8 7.78598 5.62821 2.28637 7.70801 5.93935 ... \n", 766 | "9 7.52262 5.78141 2.31538 7.72276 5.78828 ... 
\n", 767 | "\n", 768 | " ankleAcc6_1 ankleAcc6_2 ankleAcc6_3 ankleGyro1 ankleGyro2 ankleGyro3 \\\n", 769 | "0 9.63162 -1.76757 0.265761 0.002908 -0.027714 0.001752 \n", 770 | "1 9.58649 -1.75247 0.250816 0.020882 0.000945 0.006007 \n", 771 | "2 9.60196 -1.73721 0.356632 -0.035392 -0.052422 -0.004882 \n", 772 | "3 9.58674 -1.78264 0.311453 -0.032514 -0.018844 0.026950 \n", 773 | "4 9.64677 -1.75240 0.295902 0.001351 -0.048878 -0.006328 \n", 774 | "5 9.60177 -1.75239 0.311276 0.003793 -0.026906 0.004125 \n", 775 | "6 9.67694 -1.76748 0.326060 0.036814 -0.032277 -0.006866 \n", 776 | "7 9.61685 -1.76749 0.326380 -0.010352 -0.016621 0.006548 \n", 777 | "8 9.61686 -1.72212 0.326234 0.039346 0.020393 -0.011880 \n", 778 | "9 9.63189 -1.70699 0.326105 0.029874 -0.010763 0.005133 \n", 779 | "\n", 780 | " ankleMagne1 ankleMagne2 ankleMagne3 subject_id \n", 781 | "0 -61.1081 -36.8636 -58.3696 1 \n", 782 | "1 -60.8916 -36.3197 -58.3656 1 \n", 783 | "2 -60.3407 -35.7842 -58.6119 1 \n", 784 | "3 -60.7646 -37.1028 -57.8799 1 \n", 785 | "4 -60.2040 -37.1225 -57.8847 1 \n", 786 | "5 -61.3257 -36.9744 -57.7501 1 \n", 787 | "6 -61.5520 -36.9632 -57.9957 1 \n", 788 | "7 -61.5738 -36.1724 -59.3487 1 \n", 789 | "8 -61.7741 -37.1744 -58.1199 1 \n", 790 | "9 -60.7680 -37.4206 -58.8735 1 \n", 791 | "\n", 792 | "[10 rows x 43 columns]" 793 | ] 794 | }, 795 | "execution_count": 6, 796 | "metadata": {}, 797 | "output_type": "execute_result" 798 | } 799 | ], 800 | "source": [ 801 | "dataCol.reset_index(drop = True, inplace = True)\n", 802 | "dataCol.head(10)" 803 | ] 804 | }, 805 | { 806 | "cell_type": "markdown", 807 | "metadata": {}, 808 | "source": [ 809 | "The heartrate column still has NaN values because the default interpolation only fills NaN cells that come after a known value. The first cells are NaN and have no known value before them, so they remain NaN after interpolation. To overcome this, we can assume that the first 4 cells have the value 100, since the values from index 4 onwards are 100. 
Doing so eliminates all NaN values from our dataset." 810 | ] 811 | }, 812 | { 813 | "cell_type": "code", 814 | "execution_count": 7, 815 | "metadata": {}, 816 | "outputs": [ 817 | { 818 | "data": { 819 | "text/plain": [ 820 | "timestamp 0\n", 821 | "activityID 0\n", 822 | "heartrate 4\n", 823 | "handTemperature 0\n", 824 | "handAcc16_1 0\n", 825 | "handAcc16_2 0\n", 826 | "handAcc16_3 0\n", 827 | "handAcc6_1 0\n", 828 | "handAcc6_2 0\n", 829 | "handAcc6_3 0\n", 830 | "handGyro1 0\n", 831 | "handGyro2 0\n", 832 | "handGyro3 0\n", 833 | "handMagne1 0\n", 834 | "handMagne2 0\n", 835 | "handMagne3 0\n", 836 | "chestTemperature 0\n", 837 | "chestAcc16_1 0\n", 838 | "chestAcc16_2 0\n", 839 | "chestAcc16_3 0\n", 840 | "chestAcc6_1 0\n", 841 | "chestAcc6_2 0\n", 842 | "chestAcc6_3 0\n", 843 | "chestGyro1 0\n", 844 | "chestGyro2 0\n", 845 | "chestGyro3 0\n", 846 | "chestMagne1 0\n", 847 | "chestMagne2 0\n", 848 | "chestMagne3 0\n", 849 | "ankleTemperature 0\n", 850 | "ankleAcc16_1 0\n", 851 | "ankleAcc16_2 0\n", 852 | "ankleAcc16_3 0\n", 853 | "ankleAcc6_1 0\n", 854 | "ankleAcc6_2 0\n", 855 | "ankleAcc6_3 0\n", 856 | "ankleGyro1 0\n", 857 | "ankleGyro2 0\n", 858 | "ankleGyro3 0\n", 859 | "ankleMagne1 0\n", 860 | "ankleMagne2 0\n", 861 | "ankleMagne3 0\n", 862 | "subject_id 0\n", 863 | "dtype: int64" 864 | ] 865 | }, 866 | "execution_count": 7, 867 | "metadata": {}, 868 | "output_type": "execute_result" 869 | } 870 | ], 871 | "source": [ 872 | "dataCol.isnull().sum()" 873 | ] 874 | }, 875 | { 876 | "cell_type": "markdown", 877 | "metadata": {}, 878 | "source": [ 879 | "The count confirms that the only remaining NaN values are the first 4 heartrate cells, so we set them to 100 as described above." 
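The behaviour described above can be reproduced on a toy series. The following is a minimal sketch, assuming pandas' default linear interpolation; the values are made up for illustration, and back-filling is shown only as one possible alternative to hard-coding 100, not as the method used in this notebook.

```python
import numpy as np
import pandas as pd

# Toy heart-rate series (made-up values): two leading NaNs,
# then one gap between two known values.
hr = pd.Series([np.nan, np.nan, 100.0, np.nan, 104.0])

# Default interpolation is linear and fills forward only, so the gap
# is filled but the leading NaNs are left untouched.
filled = hr.interpolate()
leading_nans = int(filled.isna().sum())

# One possible alternative to hard-coding a value: back-fill the
# leading NaNs from the first known measurement.
cleaned = filled.bfill()

print(filled.tolist())
print(cleaned.tolist())
```

Back-filling gives the same result as the manual assignment whenever the first recorded value is the one being assumed, and it does not depend on knowing how many leading cells are missing.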
880 | ] 881 | }, 882 | { 883 | "cell_type": "code", 884 | "execution_count": 8, 885 | "metadata": {}, 886 | "outputs": [], 899 | "source": [ 900 | "# set the first 4 heartrate cells in a single .loc assignment (avoids the chained-indexing SettingWithCopyWarning)\n", 901 | "dataCol.loc[dataCol.index[:4], \"heartrate\"] = 100" 902 | ] 903 | }, 904 | { 905 | "cell_type": "markdown", 906 | "metadata": {}, 907 | "source": [ 908 | "There should now be no missing values, but we check to be sure. " 909 | ] 910 | }, 911 | { 912 | "cell_type": "code", 913 | "execution_count": 9, 914 | "metadata": {}, 915 | "outputs": [ 916 | { 917 | "data": { 918 | "text/plain": [ 919 | "timestamp 0\n", 920 | "activityID 0\n", 921 | "heartrate 0\n", 922 | "handTemperature 0\n", 923 | "handAcc16_1 0\n", 924 | "handAcc16_2 0\n", 925 | "handAcc16_3 0\n", 926 | "handAcc6_1 0\n", 927 | "handAcc6_2 0\n", 928 | "handAcc6_3 0\n", 929 | "handGyro1 0\n", 930 | "handGyro2 0\n", 931 | "handGyro3 0\n", 932 | "handMagne1 0\n", 933 | "handMagne2 0\n", 934 | "handMagne3 0\n", 935 | "chestTemperature 0\n", 936 | "chestAcc16_1 0\n", 937 | "chestAcc16_2 0\n", 938 | "chestAcc16_3 0\n", 939 | "chestAcc6_1 0\n", 940 | "chestAcc6_2 0\n", 941 | "chestAcc6_3 0\n", 942 | "chestGyro1 0\n", 943 | "chestGyro2 0\n", 944 | "chestGyro3 0\n", 945 | "chestMagne1 0\n", 946 | "chestMagne2 0\n", 947 | "chestMagne3 0\n", 948 | "ankleTemperature 0\n", 949 | "ankleAcc16_1 0\n", 950 | "ankleAcc16_2 0\n", 951 | "ankleAcc16_3 0\n", 952 | "ankleAcc6_1 0\n", 953 | "ankleAcc6_2 0\n", 954 | "ankleAcc6_3 0\n", 955 | 
"ankleGyro1 0\n", 956 | "ankleGyro2 0\n", 957 | "ankleGyro3 0\n", 958 | "ankleMagne1 0\n", 959 | "ankleMagne2 0\n", 960 | "ankleMagne3 0\n", 961 | "subject_id 0\n", 962 | "dtype: int64" 963 | ] 964 | }, 965 | "execution_count": 9, 966 | "metadata": {}, 967 | "output_type": "execute_result" 968 | } 969 | ], 970 | "source": [ 971 | "dataCol.isnull().sum()" 972 | ] 973 | }, 974 | { 975 | "cell_type": "markdown", 976 | "metadata": {}, 977 | "source": [ 978 | "As we can see, there are no more missing values. We can now move on to Exploratory Data Analysis." 979 | ] 980 | }, 981 | { 982 | "cell_type": "markdown", 983 | "metadata": {}, 984 | "source": [ 985 | "## Exploratory Data Analysis" 986 | ] 987 | }, 988 | { 989 | "cell_type": "markdown", 990 | "metadata": {}, 991 | "source": [ 992 | "#### Splitting Data in Train and Test sets" 993 | ] 994 | }, 995 | { 996 | "cell_type": "markdown", 997 | "metadata": {}, 998 | "source": [ 999 | "Before splitting our data, we first check whether the classes are balanced, in which case stratification is not needed for the split. If the classes are imbalanced, we should stratify while splitting the data. Stratified sampling draws samples from every class in proportion to its share of the data, so that the train and test sets preserve the class proportions. It helps when the class distribution is not uniform, that is, when some classes have many more samples than others. " 1000 | ] 1001 | }, 1002 | { 1003 | "cell_type": "markdown", 1004 | "metadata": {}, 1005 | "source": [ 1006 | "## See class distribution/ratio\n", 1007 | "Check if our dataset is balanced" 1008 | ] 1009 | }, 1010 | { 1011 | "cell_type": "code", 1012 | "execution_count": 10, 1013 | "metadata": {}, 1014 | "outputs": [ 1015 | { 1016 | "data": { 1017 | "image/png": "\n", 1018 | "text/plain": [ 1019 | "
" 1020 | ] 1021 | }, 1022 | "metadata": {}, 1023 | "output_type": "display_data" 1024 | } 1025 | ], 1026 | "source": [ 1027 | "dataCol['activityID'].value_counts().plot(kind = \"bar\",figsize = (12,6))\n", 1028 | "plt.show()" 1029 | ] 1030 | }, 1031 | { 1032 | "cell_type": "markdown", 1033 | "metadata": {}, 1034 | "source": [ 1035 | "As the plot above shows, our classes are mostly balanced, so we split the data into train and test sets without stratifying. A common choice is 80% for the train set and 20% for the test set, which is the split used here. " 1036 | ] 1037 | }, 1038 | { 1039 | "cell_type": "code", 1040 | "execution_count": 11, 1041 | "metadata": {}, 1042 | "outputs": [], 1043 | "source": [ 1044 | "train_df = dataCol.sample(frac=0.8, random_state=1)\n", 1045 | "test_df = dataCol.drop(train_df.index)" 1046 | ] 1047 | }, 1048 | { 1049 | "cell_type": "markdown", 1050 | "metadata": {}, 1051 | "source": [ 1052 | "Next, we look at summary statistics from the pandas describe method, which gives further insight into the values our analysis will be working with." 1053 | ] 1054 | }, 1055 | { 1056 | "cell_type": "code", 1057 | "execution_count": 12, 1058 | "metadata": {}, 1059 | "outputs": [ 1060 | { 1061 | "data": { 1062 | "text/html": [ 1063 | "
timestampactivityIDheartratehandTemperaturehandAcc16_1handAcc16_2handAcc16_3handAcc6_1handAcc6_2handAcc6_3...ankleAcc6_1ankleAcc6_2ankleAcc6_3ankleGyro1ankleGyro2ankleGyro3ankleMagne1ankleMagne2ankleMagne3subject_id
count1.554298e+061.554298e+061.554298e+061.554298e+061.554298e+061.554298e+061.554298e+061.554298e+061.554298e+061.554298e+06...1.554298e+061.554298e+061.554298e+061.554298e+061.554298e+061.554298e+061.554298e+061.554298e+061.554298e+061.554298e+06
mean1.705049e+038.080534e+001.074758e+023.275138e+01-4.953526e+003.581113e+003.603514e+00-4.886385e+003.570863e+003.787763e+00...9.374706e+00-4.445267e-02-2.175620e+001.027736e-02-3.649406e-025.607242e-03-3.157962e+011.394841e+001.725137e+014.566578e+00
std1.093592e+036.175064e+002.699031e+011.794207e+006.239143e+006.886169e+003.958145e+006.245060e+006.585066e+003.945398e+00...6.067489e+007.183548e+003.475628e+001.126197e+006.380780e-012.011908e+001.834688e+012.168353e+011.969368e+012.333375e+00
min3.120000e+011.000000e+005.700000e+012.487500e+01-1.453670e+02-1.043010e+02-1.014520e+02-6.121470e+01-6.184170e+01-6.193470e+01...-6.114200e+01-6.190350e+01-6.231480e+01-1.416200e+01-1.304010e+01-1.401960e+01-1.726240e+02-1.379080e+02-1.027160e+021.000000e+00
25%7.442925e+023.000000e+008.600000e+013.168750e+01-8.970020e+001.057830e+001.162090e+00-8.867070e+001.055563e+001.365070e+00...8.396590e+00-2.073120e+00-3.399390e+00-2.081648e-01-1.066463e-01-4.416657e-01-4.170160e+01-1.246927e+013.799633e+002.000000e+00
50%1.480090e+036.000000e+001.040000e+023.312500e+01-5.449130e+003.525300e+003.432840e+00-5.377104e+003.566820e+003.663470e+00...9.550020e+00-2.252810e-01-1.993145e+004.636280e-03-3.977450e-03-2.336400e-03-3.400060e+017.672570e-011.876795e+015.000000e+00
75%2.664000e+031.300000e+011.240000e+023.406250e+01-9.581008e-016.450507e+006.532445e+00-9.061720e-016.458267e+006.778200e+00...1.028160e+011.920960e+00-5.958823e-011.308070e-011.160640e-019.121635e-02-1.789610e+011.782927e+013.120910e+017.000000e+00
max4.245680e+032.400000e+012.020000e+023.550000e+016.285960e+011.556990e+021.577600e+025.282140e+016.225980e+016.192340e+01...6.196930e+016.204900e+016.093570e+011.742040e+011.358820e+011.448270e+019.155160e+019.369920e+011.469000e+029.000000e+00
\n", 1299 | "

8 rows × 43 columns

\n", 1300 | "
" 1301 | ], 1302 | "text/plain": [ 1303 | " timestamp activityID heartrate handTemperature \\\n", 1304 | "count 1.554298e+06 1.554298e+06 1.554298e+06 1.554298e+06 \n", 1305 | "mean 1.705049e+03 8.080534e+00 1.074758e+02 3.275138e+01 \n", 1306 | "std 1.093592e+03 6.175064e+00 2.699031e+01 1.794207e+00 \n", 1307 | "min 3.120000e+01 1.000000e+00 5.700000e+01 2.487500e+01 \n", 1308 | "25% 7.442925e+02 3.000000e+00 8.600000e+01 3.168750e+01 \n", 1309 | "50% 1.480090e+03 6.000000e+00 1.040000e+02 3.312500e+01 \n", 1310 | "75% 2.664000e+03 1.300000e+01 1.240000e+02 3.406250e+01 \n", 1311 | "max 4.245680e+03 2.400000e+01 2.020000e+02 3.550000e+01 \n", 1312 | "\n", 1313 | " handAcc16_1 handAcc16_2 handAcc16_3 handAcc6_1 handAcc6_2 \\\n", 1314 | "count 1.554298e+06 1.554298e+06 1.554298e+06 1.554298e+06 1.554298e+06 \n", 1315 | "mean -4.953526e+00 3.581113e+00 3.603514e+00 -4.886385e+00 3.570863e+00 \n", 1316 | "std 6.239143e+00 6.886169e+00 3.958145e+00 6.245060e+00 6.585066e+00 \n", 1317 | "min -1.453670e+02 -1.043010e+02 -1.014520e+02 -6.121470e+01 -6.184170e+01 \n", 1318 | "25% -8.970020e+00 1.057830e+00 1.162090e+00 -8.867070e+00 1.055563e+00 \n", 1319 | "50% -5.449130e+00 3.525300e+00 3.432840e+00 -5.377104e+00 3.566820e+00 \n", 1320 | "75% -9.581008e-01 6.450507e+00 6.532445e+00 -9.061720e-01 6.458267e+00 \n", 1321 | "max 6.285960e+01 1.556990e+02 1.577600e+02 5.282140e+01 6.225980e+01 \n", 1322 | "\n", 1323 | " handAcc6_3 ... ankleAcc6_1 ankleAcc6_2 ankleAcc6_3 \\\n", 1324 | "count 1.554298e+06 ... 1.554298e+06 1.554298e+06 1.554298e+06 \n", 1325 | "mean 3.787763e+00 ... 9.374706e+00 -4.445267e-02 -2.175620e+00 \n", 1326 | "std 3.945398e+00 ... 6.067489e+00 7.183548e+00 3.475628e+00 \n", 1327 | "min -6.193470e+01 ... -6.114200e+01 -6.190350e+01 -6.231480e+01 \n", 1328 | "25% 1.365070e+00 ... 8.396590e+00 -2.073120e+00 -3.399390e+00 \n", 1329 | "50% 3.663470e+00 ... 9.550020e+00 -2.252810e-01 -1.993145e+00 \n", 1330 | "75% 6.778200e+00 ... 
1.028160e+01 1.920960e+00 -5.958823e-01 \n", 1331 | "max 6.192340e+01 ... 6.196930e+01 6.204900e+01 6.093570e+01 \n", 1332 | "\n", 1333 | " ankleGyro1 ankleGyro2 ankleGyro3 ankleMagne1 ankleMagne2 \\\n", 1334 | "count 1.554298e+06 1.554298e+06 1.554298e+06 1.554298e+06 1.554298e+06 \n", 1335 | "mean 1.027736e-02 -3.649406e-02 5.607242e-03 -3.157962e+01 1.394841e+00 \n", 1336 | "std 1.126197e+00 6.380780e-01 2.011908e+00 1.834688e+01 2.168353e+01 \n", 1337 | "min -1.416200e+01 -1.304010e+01 -1.401960e+01 -1.726240e+02 -1.379080e+02 \n", 1338 | "25% -2.081648e-01 -1.066463e-01 -4.416657e-01 -4.170160e+01 -1.246927e+01 \n", 1339 | "50% 4.636280e-03 -3.977450e-03 -2.336400e-03 -3.400060e+01 7.672570e-01 \n", 1340 | "75% 1.308070e-01 1.160640e-01 9.121635e-02 -1.789610e+01 1.782927e+01 \n", 1341 | "max 1.742040e+01 1.358820e+01 1.448270e+01 9.155160e+01 9.369920e+01 \n", 1342 | "\n", 1343 | " ankleMagne3 subject_id \n", 1344 | "count 1.554298e+06 1.554298e+06 \n", 1345 | "mean 1.725137e+01 4.566578e+00 \n", 1346 | "std 1.969368e+01 2.333375e+00 \n", 1347 | "min -1.027160e+02 1.000000e+00 \n", 1348 | "25% 3.799633e+00 2.000000e+00 \n", 1349 | "50% 1.876795e+01 5.000000e+00 \n", 1350 | "75% 3.120910e+01 7.000000e+00 \n", 1351 | "max 1.469000e+02 9.000000e+00 \n", 1352 | "\n", 1353 | "[8 rows x 43 columns]" 1354 | ] 1355 | }, 1356 | "execution_count": 12, 1357 | "metadata": {}, 1358 | "output_type": "execute_result" 1359 | } 1360 | ], 1361 | "source": [ 1362 | "train_df.describe()" 1363 | ] 1364 | }, 1365 | { 1366 | "cell_type": "markdown", 1367 | "metadata": {}, 1368 | "source": [ 1369 | "We are going to focus on heart rate, as it is our most precise measure for tracking subjects during activities, as implied by the notes in the dataset's readme file. With that in mind, looking at the table, we can observe that the mean heart rate throughout the dataset is about 107.5. Furthermore, the minimum heart rate is 57 and the maximum heart rate is 202. 
The quartiles that are shown can be further analysed by plotting a box plot which will help with understanding our outliers and quartiles groups and also shown the mean of our data's heart rate." 1370 | ] 1371 | }, 1372 | { 1373 | "cell_type": "code", 1374 | "execution_count": 13, 1375 | "metadata": {}, 1376 | "outputs": [ 1377 | { 1378 | "data": { 1379 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAARkAAAD7CAYAAABe6+AqAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAEdxJREFUeJzt3XmQHOV9xvHvgxRA4pAEEhQIsMArTICAIYsQdtmWMTaGYKAqkIDBHKZMDPJ6IabMlQq2y8SEOLaFKFMWp0hRnOZ04ThExVEOCCJxn2GNEEgIEJcEKAgQv/zR74Zhmd3tXfR2a0bPp4qambd7pn9DsQ9vH9M/RQRmZrmsU3cBZtbeHDJmlpVDxsyycsiYWVYOGTPLyiFjZlk5ZMwsK4fMWkjSs5L26TN2jKQ/ZtxmSOoYYPkxklZJekvSckkPSTpgCJ9/maSfrp5qbXVyyFhWkkYOYfV7ImJDYCzwa+AqSWPzVGZVcchYU5K2lPRbSUslLZD0/YZlUyTdI+kNSUsknS9p3YblIWm6pKeBpyXdlRY9lGYqfzvQtiPiA+DfgA2AyQ2fe62kFyUtk3SXpJ3S+PHAEcAP0+ffMth3sOo4ZOxjJK0D3AI8BEwEvgKcJGnftMoq4GRgPLBXWn5in485GNgT2DEivpjGdo2IDSPi6kG2PwI4FngPWNiw6PcUobMZcD9wBUBEzErPz02f/40S38EqMpSprLWXGyW93/B6XYo/XIA9gAkR8ZP0+hlJFwKHAX+IiPkN73tW0m+ALwG/ahj/WUS8NsSapkp6g2IG8z5wZES83LswIi7pfS7pR8DrksZExLImnzXgdxhiXfYJeCaz9jo4Isb2/sNHZyKfArZMu0NvpD/8M4DNASRtL+l3addlOfBPFLOaRs8Po6a5qZZxwM3AF3oXSBoh6RxJf0rbfDYt6rvdUt/BquOZjDXzPLAgIib3s/wC4AHg8Ih4U9JJwCF91hn2z/sj4i1JJwJ/knRJRDwAfBM4CNiHImDGAK8D6md7g30Hq4hnMtbMfcBySadKGpVmETtL2iMt3whYDrwlaQfghBKf+RKwXdkCIuJV4CLgHxu2uRJ4FRhNMXsa6PMH+w5WEYeMfUxErAK+AXwWWAC8QvEHPyatcgrFzOJN4EJgwAO5yY+A2WnX5W9KlvIrYH9JuwCXUxwEXgw8Dszts+7FwI7p828s8R2sIvJNq8wsJ89kzCwrh4yZZeWQMbOsHDJmlpVDxsyyaumL8caPHx+TJk2quwyztc78+fNfiYgJZdZt6ZCZNGkS8+bNq7sMs7WOpIWDr1Xw7pKZZeWQMbOsHDJmlpVDxsyyaukDv9a6pk2b9v/P77jjjtrqsPw8kzGzrBwyVrnGWUyz19ZesoWMpK0l3S7pCUmPSepO45tIuk3S0+lxXBqXpPMk9Uh6WNLuuWozs+rknMm8D/wgIv4cmApMl7QjcBowJ90WcU56DbAfxZ3oJwPHU9zi0cxaXLaQiYglEXF/ev4m8ARFa4qDgNlptdkUrTNI45dHYS4wVtIWueozs2pUckxG0iRgN+BeYPOIWAJFEFH00IEigBrvcL8ojfX9rOMlzZM0b+nSpTnLNrPVIHvI
SNoQ+C1wUkQsH2jVJmMfuzdoRMyKiM6I6JwwodTvs8ysRllDRtKfUQTMFRFxfRp+qXc3KD32Nu9aBGzd8PatgBdy1mdm+eU8uySKO8g/ERG/aFh0M3B0en40cFPD+FHpLNNUYFnvbpWZta6cV/x+HvgW8IikB9PYGcA5wDWSjgOeAw5Ny24F9gd6gBUUvZDNrMVlC5mI+CPNj7NA0fy87/oBTM9Vj5nVw1f8mllWDhkzy8ohY2ZZOWTMLCuHjJll5ZAxs6wcMmaWlUPGzLJyyJhZVg4ZM8vKIWNmWTlkzCwrh4yZZeWQMbOsHDJmlpVDxsyycsiYWVYOGTPLKueNxC+R9LKkRxvGPitprqQHU++kKWncLWrN2lTOG4lfBpwPXN4wdi7w44j4vaT90+tpfLRF7Z4ULWr3zFhbW5k5cyY9PT11l/GJdHd3113CoDo6Oujq6qq7jJaTs03tXcBrfYeBjdPzMXzYV8ktas3aVM6ZTDMnAX+Q9HOKgPtcGu+vRa37LpXQav93nTZt2sfGZsyYUX0hVomqD/yeAJwcEVsDJ1M0f4OSLWrBvbDbwZgxYz7yety4cTVVYlWoOmSOBnrb1V4LTEnPS7eodS/s1nfTTTd95PUNN9xQUyVWhapD5gXgS+n53sDT6blb1K6lPItpf9mOyUi6kuLM0XhJi4CzgO8AMySNBN4Bjk+ru0XtWmbXXXcFfCxmbZCzTe3h/Sz6yybrukWtWZvyFb9mlpVDxsyycsiYWVYOGTPLyiFjZlk5ZMwsK4eMmWXlkDGzrBwyZpaVQ8bMsnLImFlWDhkzy8ohY2ZZOWTMLCuHjJll5ZAxs6wcMmaWlUPGzLKqtE1tGu+S9JSkxySd2zB+empT+5SkfXPVZWbVqrRNraQvU3SL3CUiVkraLI3vCBwG7ARsCfynpO0jYlXG+sysAlW3qT0BOCciVqZ1Xk7jBwFXRcTKiFhA0bVgCmbW8qo+JrM98AVJ90q6U9Ieaby/NrVm1uKq7oU9EhgHTAX2AK6RtB1DbFNL6te0zTbbZCrTzFaXqmcyi4Dro3Af8AEwHrepNWtbVYfMjRTtaZG0PbAu8ApFm9rDJK0naVtgMnBfxbWZWQZVt6m9BLgkndZ+Fzg6dY98TNI1wOPA+8B0n1kyaw91tKk9sp/1zwbOzlWPmdXDV/yaWVYOGTPLyiFjZlk5ZMwsK4eMmWXlkDGzrBwyZpaVQ8bMsnLImFlWDhkzy8ohY2ZZOWTMLCuHjJll5ZAxs6wcMmaWVemQkfQpSfuk56MkbZSvLDNrF6VCRtJ3gOuA36ShrShupWlmNqCyM5npwOeB5QAR8TSwWa6izKx9lA2ZlRHxbu8LSSPpp2WJmVmjsiFzp6QzgFGSvgpcC9wy0Bv664Wdlp0iKSSNT68l6bzUC/thSbsP9YuY2ZqpbMicBiwFHgH+Drg1Is4c5D2XAV/vOyhpa+CrwHMNw/tRtEGZTNG47YKSdZnZGq5syHRFxIURcWhEHBIRF0rqHugN/fTCBvgl8EM+urt1EHB5avo2FxgraYuStZnZGqxsyBzdZOyYoW5M0oHA4oh4qM+i0r2wJR0vaZ6keUuXLh1qCWZWsQH7Lkk6HPgmsK2kmxsWbQS8OpQNSRoNnAl8rdniJmNNDyxHxCxgFkBnZ6cPPput4QZr7nY3sISiX/W/Noy/CTw8xG19GtgWeEgSFNfa3C9pCkPohW1mrWXAkImIhcBCYK9PuqGIeISGa2skPQt0RsQraZb0PUlXAXsCyyJiySfdppnVr+wVv1Ml/bektyS9K2mVpOWDvOdK4B7gM5IWSTpugNVvBZ4BeoALgRNL1m9ma7iyvbDPBw6juD6mEzgK6BjoDQP0wu5dPqnheVBcVWxmbaZsyBARPZJGRMQq4FJJd2esy8zaRNmQWSFpXeBBSedSHAzeIF9ZZtYuyl4n86207veAtynOBP11rqLMrH0MOpORNAI4OyKOBN4Bfpy9KjNrG4POZNIx
mAlpd8nMbEjKHpN5FvivdD3L272DEfGLHEWZWfsoGzIvpH/WofhJAfh+MmZWQtmQeTwirm0ckHRohnrMrM2UPbt0eskxM7OPGOxX2PsB+wMTJZ3XsGhj4P2chZlZexhsd+kFYB5wIDC/YfxN4ORcRZlZ+xjsV9gPpXv0fi0iZldUk5m1kbLXyWzq62TMbDjKnl1aiK+TMbNh+CTXyZiZDapUyETEWvV7pZkzZ9LT01N3GW2t999vd/eATS/sE+ro6KCrq6vWGkqFjKQJFG1MdgLW7x2PiL0z1VWrnp4eHnz0CVaN3qTuUtrWOu8WF4zPf+almitpXyNWNOtIVL2yu0tXAFcDBwDfpWiR0tb9SFaN3oT/3WH/usswG7ZRT95adwlA+St+N42Ii4H3IuLOiPg2MHWgNzRrUyvpXyQ9mVrR3iBpbMOy01Ob2qck7Tusb2Nma5yyIfNeelwi6a8k7UbRtmQgl/HxNrW3ATtHxC7A/5B+miBpR4p7CO+U3vPrdB8bM2txZUPmp5LGAD8ATgEuYpArfpu1qY2I/4iI3p8jzOXDoDoIuCoiVkbEAoquBVNK1mZma7CyZ5d+l54uA768mrb9bYrjPFC0pJ3bsKzfNrVm1lrK9l3aXtKc3uMrknaR9A/D3aikMyl+YHlF71CT1Zrer8a9sM1aS9ndpQspjp+8BxARD1McQxkySUdTnKU6IvVbgiG0qY2IWRHRGRGdEyZMGE4JZlahsiEzOiLu6zM25Fs9SPo6cCpwYESsaFh0M3CYpPUkbQtMBvpuz8xaUNnrZF6R9GnSLoykQyh6L/UrtamdBoyXtAg4i2I2tB5wmySAuRHx3Yh4TNI1wOMU4TU9/TDTzFpc2ZCZDswCdpC0GFgAHDHQG/ppU3vxAOufDZxdsh4zaxFlQ2YxcClwO7AJsJziqt+fZKrLzNpE2ZC5CXgDuJ9+DsiamTVTNmS2ioi+V++amQ2q7NmluyX9RdZKzKwtDdat4BGKM0ojgWMlPQOspLh4LtJvkMzM+jXY7tIBlVRhZm1rsG4FC6sqxMzaU9ljMmZmw+KQMbOsHDJmlpVDxsyycsiYWVYOGTPLyiFjZlk5ZMwsq7I/kFyrLF68mBErlq0xzbHMhmPEildZvHjIN7Bc7TyTMbOsPJNpYuLEiby4cqTb1FpLG/XkrUycuHndZeSbyfTTpnYTSbdJejo9jkvjknRealP7sKTdc9VlZtXKubt0GR9vU3saMCciJgNz0muA/Sg6FEwGjgcuyFiXmVUoW8g0a1NL0Y52dno+Gzi4YfzyKMwFxkraIldtZladqg/8bh4RSwDS42ZpfCLwfMN6blNr1ibWlLNLblNr1qaqDpmXeneD0uPLadxtas3aVNUhczNFvybS400N40els0xTgWW9u1Vm1tqyXSfTT5vac4BrJB0HPAccmla/Fdgf6AFWAMfmqsvMqpUtZPppUwvwlSbrBkUrXDNrM2vKgV8za1MOGTPLyiFjZlk5ZMwsK4eMmWXlkDGzrBwyZpaVQ8bMsnLImFlWDhkzy8ohY2ZZOWTMLCuHjJll5ZYo/Rix4jU3d8tonXeWA/DB+hvXXEn7GrHiNaD+ligOmSY6OjrqLqHt9fS8CUDHdvX/EbSvzdeI/5YdMk10dXXVXULb6+7uBmDGjBk1V2K5+ZiMmWXlkDGzrGoJGUknS3pM0qOSrpS0vqRtJd2bWtheLWndOmozs9Wr8pCRNBH4PtAZETsDI4DDgH8Gfpla2L4OHFd1bWa2+tW1uzQSGCVpJDAaWALsDVyXlje2sDWzFlZ5yETEYuDnFC1RlgDLgPnAGxHxflrNbWrN2kQdu0vjgIOAbYEtgQ2A/Zqs6ja1Zm2gjt2lfYAFEbE0It4Drgc+B4xNu0/gNrVmbaOOkHkOmCpptCRRNHt7HLgdOCSt09jC1sxaWB3HZO6lOMB7P/BIqmEWcCrw95J6gE2Bi6uuzcxWv1p+VhARZ1H0xm70DDClhnLMLCNf
8WtmWTlkzCwrh4yZZeWQMbOsHDJmlpVDxsyycsiYWVYOGTPLyiFjZlk5ZMwsK4eMmWXlkDGzrBwyZpaVQ8bMsnLImFlWDhkzy8ohY2ZZOWTMLCuHjJllVVcv7LGSrpP0pKQnJO0laRNJt6Ve2Lel/kxm1uLqmsnMAP49InYAdgWeAE4D5qRe2HPSazNrcXV0kNwY+CKp5UlEvBsRb1B0lZydVnMvbLM2UcdMZjtgKXCppAckXSRpA2DziFgCkB43a/Zmt6k1ay11hMxIYHfggojYDXibIewauU2tWWupI2QWAYtSJ0kouknuDrwkaQuA9PhyDbWZ2WpWeQfJiHhR0vOSPhMRT/FhL+zHKXpgn4N7YQ/JzJkz6enpqbuMIemtt7u7u+ZKyuvo6KCrq6vuMlpOLW1qgS7gCknrUrSnPZZiVnWNpOOA54BDa6rNKjBq1Ki6S7CKKCLqrmHYOjs7Y968eXWXYbbWkTQ/IjrLrOsrfs0sK4eMmWXlkDGzrBwyZpaVQ8bMsnLImFlWDhkzy8ohY2ZZtfTFeJKWAgvrrsOGbTzwSt1F2LB8KiJK/UK5pUPGWpukeWWvGrXW5d0lM8vKIWNmWTlkrE6z6i7A8vMxGTPLyjMZM8vKIWNmWTlkzCwrh4yZZeWQMbOs/g/Zqxd5l3dcCgAAAABJRU5ErkJggg==\n", 1380 | "text/plain": [ 1381 | "
" 1382 | ] 1383 | }, 1384 | "metadata": {}, 1385 | "output_type": "display_data" 1386 | } 1387 | ], 1388 | "source": [ 1389 | "import seaborn as sns\n", 1390 | "\n", 1391 | "fig, ax = plt.subplots(figsize=(4,4))\n", 1392 | "plt.title(\"Heart Rate \")\n", 1393 | "ax = sns.boxplot(y=train_df[\"heartrate\"])" 1394 | ] 1395 | }, 1396 | { 1397 | "cell_type": "markdown", 1398 | "metadata": {}, 1399 | "source": [ 1400 | "Looking at the box plot, we can see that the outliers have heart rate from 180 up to 202. Our highest quartile group out of the four starts from 124 which is the end of the Inter-quartile range and finishes at 180 which also makes it our biggest group by looking at the size of it on the box plot compared to the other quartiles. Meaning that the biggest amount of subjects on the activities performed had heart rate of 124 up to 180. Our third quartile group starts from the mean value which is 107.4 which is where the horizontal line in our box is, and finishes at the end of the Inter-quartile range which is 124. Our second quartile group, starts from the start of the Inter-quartile range which is 86 and ends at the mean value 124. Our first quartile group starts from the lowest data point, 57 and ends at the start of the Inter-quartile range 86. Our box plot also shows that most subjects performed some activities at a statistically similar way but failed to do the same in all activities which explains the big upper quartile group.\n", 1401 | "\n", 1402 | "To find the most cumbersome activities we have to plot a bar chart which will show the mean values of heart rate for each activity that was performed. This will in return enable us to analyse further specific activity data. The names of the activities will be used for easier analysis of results." 
1403 | ] 1404 | }, 1405 | { 1406 | "cell_type": "code", 1407 | "execution_count": 15, 1408 | "metadata": {}, 1409 | "outputs": [ 1410 | { 1411 | "data": { 1412 | "text/plain": [ 1413 | "" 1414 | ] 1415 | }, 1416 | "execution_count": 15, 1417 | "metadata": {}, 1418 | "output_type": "execute_result" 1419 | }, 1420 | { 1421 | "data": { 1422 | "image/png": "\n", 1423 | "text/plain": [ 1424 | "
" 1425 | ] 1426 | }, 1427 | "metadata": {}, 1428 | "output_type": "display_data" 1429 | } 1430 | ], 1431 | "source": [ 1432 | "df_hr_act = train_df['heartrate'].groupby(train_df['activityID']).mean()\n", 1433 | "df_hr_act.index = df_hr_act.index.map(activityIDdict)\n", 1434 | "df_hr_act.plot(kind='bar')" 1435 | ] 1436 | }, 1437 | { 1438 | "cell_type": "markdown", 1439 | "metadata": {}, 1440 | "source": [ 1441 | "The bar chart shows that Rope Jumping and Running are the most cumbersome activities out of all the activities.\n", 1442 | "\n", 1443 | "To check further on our data to see any anormalities, we have to plot a heat map which will show whether our data has correlations inbetween it. All columns will be used in order to understand the extend of problems, if there are any." 1444 | ] 1445 | }, 1446 | { 1447 | "cell_type": "code", 1448 | "execution_count": 16, 1449 | "metadata": {}, 1450 | "outputs": [ 1451 | { 1452 | "data": { 1453 | "image/png": "\n", 1454 | "text/plain": [ 1455 | "
" 1456 | ] 1457 | }, 1458 | "metadata": {}, 1459 | "output_type": "display_data" 1460 | } 1461 | ], 1462 | "source": [ 1463 | "from pandas.plotting import scatter_matrix\n", 1464 | "df_corr = train_df.corr()\n", 1465 | "df_corr = df_corr.drop(['activityID'], axis = 1)\n", 1466 | "\n", 1467 | "f, ax = plt.subplots(figsize=(15, 10))\n", 1468 | "sns.heatmap(df_corr, mask=np.zeros_like(df_corr, dtype=np.bool), cmap = \"BrBG\",ax=ax)\n", 1469 | "plt.show()" 1470 | ] 1471 | }, 1472 | { 1473 | "cell_type": "markdown", 1474 | "metadata": {}, 1475 | "source": [ 1476 | "Our heatmap shows how much statistical similarity there is between our different columns. We can every easily observe that the gyroscopes do not correlate with any of our other data and seem unneeded in this model. \n", 1477 | "\n", 1478 | "On the other hand we can understand the correlation between accelerometers of the hand and temperature. The two are strongly correlated on all three instances of hand accelerometers. \n", 1479 | "\n", 1480 | "Furthermore the chest Magnetometers seem to be correlated with heart rate and it is very logical as they very close together on the body." 1481 | ] 1482 | }, 1483 | { 1484 | "cell_type": "markdown", 1485 | "metadata": {}, 1486 | "source": [ 1487 | "## Hypothesis Testing" 1488 | ] 1489 | }, 1490 | { 1491 | "cell_type": "markdown", 1492 | "metadata": {}, 1493 | "source": [ 1494 | "The most cumbersome activities seem to be running and rope jumping as seen from bar charts plotted above. Therefore our hypothesis testing will be based on these two activities and how their heart rate data correlates with the rest of the activities' heart rates. This will be done by getting the mean heart rate of the two activities and them comparing it to the mean heart rate of all activities." 
1495 | ] 1496 | }, 1497 | { 1498 | "cell_type": "markdown", 1499 | "metadata": {}, 1500 | "source": [ 1501 | "**Null Hypothesis:**\n", 1502 | "- h0 : The mean heart rate of the cumbersome activities has no significant difference from the mean of all activities\n", 1503 | "\n", 1504 | "**Alternative Hypothesis:**\n", 1505 | "\n", 1506 | "- h1 : The mean heart rate of the cumbersome activities has a significant difference from the mean of all activities" 1507 | ] 1508 | }, 1509 | { 1510 | "cell_type": "code", 1511 | "execution_count": 17, 1512 | "metadata": {}, 1513 | "outputs": [], 1514 | "source": [ 1515 | "running_data = train_df.loc[(train_df[\"activityID\"] == 5)]\n", 1516 | "ropejumping_data = train_df.loc[(train_df[\"activityID\"] == 24)]\n", 1517 | "cumbersome_data = pd.concat([running_data, ropejumping_data])  # stack the rows; + would align on the index and produce NaN" 1518 | ] 1519 | }, 1520 | { 1521 | "cell_type": "code", 1522 | "execution_count": 18, 1523 | "metadata": {}, 1524 | "outputs": [ 1525 | { 1526 | "name": "stdout", 1527 | "output_type": "stream", 1528 | "text": [ 1529 | "The p_value is 0.0 and h0 is rejected. There is mass difference between the means of cumbersome activities and all activities.\n" 1530 | ] 1531 | } 1532 | ], 1533 | "source": [ 1534 | "import scipy.stats\n", 1535 | "\n", 1536 | "z = (cumbersome_data['heartrate'].mean() - train_df['heartrate'].mean()) / (cumbersome_data['heartrate'].std() / math.sqrt(cumbersome_data['heartrate'].count()))  # z statistic: difference of the means over the standard error\n", 1537 | "pValue = 2 * (1 - scipy.stats.norm.cdf(abs(z)))  # two-sided p-value\n", 1538 | "\n", 1539 | "if pValue > 0.1:\n", 1540 | " print(\"The p_value is \", pValue, \" and h0 is not rejected. There is no mass difference between the means of cumbersome activities and all activities.\")\n", 1541 | "else:\n", 1542 | " print(\"The p_value is \", pValue, \" and h0 is rejected. 
There is mass difference between the means of cumbersome activities and all activities.\")\n" 1543 | ] 1544 | }, 1545 | { 1546 | "cell_type": "markdown", 1547 | "metadata": {}, 1548 | "source": [ 1549 | "Having rejected the null hypothesis, which indicates that the mean heart rate of the two most cumbersome activities is in fact very different from the mean heart rate of all the activities, we can move on to Modelling, where we will look at different modelling algorithms and choose one of them after testing." 1550 | ] 1551 | }, 1552 | { 1553 | "cell_type": "markdown", 1554 | "metadata": {}, 1555 | "source": [ 1556 | "## Modelling" 1557 | ] 1558 | }, 1559 | { 1560 | "cell_type": "markdown", 1561 | "metadata": {}, 1562 | "source": [ 1563 | "Some variables that would hurt our modelling precision have to be dropped. The variables to be dropped are timestamp and subject_id: they are numeric, so our modelling method would use them in its calculations, but their values carry no meaning for the activity, so they would only add noise and make the predictions less accurate."
1564 | ] 1565 | }, 1566 | { 1567 | "cell_type": "code", 1568 | "execution_count": 19, 1569 | "metadata": {}, 1570 | "outputs": [], 1571 | "source": [ 1572 | "from sklearn.tree import DecisionTreeClassifier\n", 1573 | "from sklearn.model_selection import train_test_split,cross_val_score,StratifiedShuffleSplit\n", 1574 | "from sklearn.metrics import precision_score,recall_score, f1_score, confusion_matrix,roc_auc_score,roc_curve, accuracy_score\n", 1575 | "from sklearn.preprocessing import StandardScaler, RobustScaler\n", 1576 | "from sklearn.decomposition import PCA, TruncatedSVD\n", 1577 | "from sklearn.linear_model import LogisticRegression\n", 1578 | "from sklearn.neighbors import KNeighborsClassifier\n", 1579 | "from sklearn.svm import SVC" 1580 | ] 1581 | }, 1582 | { 1583 | "cell_type": "code", 1584 | "execution_count": 20, 1585 | "metadata": {}, 1586 | "outputs": [], 1587 | "source": [ 1588 | "train_df = train_df.drop([\"timestamp\", \"subject_id\"], axis=1)" 1589 | ] 1590 | }, 1591 | { 1592 | "cell_type": "code", 1593 | "execution_count": 21, 1594 | "metadata": {}, 1595 | "outputs": [ 1596 | { 1597 | "data": { 1598 | "text/html": [ 1599 | "
\n", 1600 | "[Table: HTML rendering of df_scaled.head(), 5 rows × 41 columns; the same values appear in the text/plain output of this cell]\n", 1764 | "
" 1765 | ], 1766 | "text/plain": [ 1767 | " activityID heartrate handTemperature handAcc16_1 handAcc16_2 \\\n", 1768 | "312921 3 -0.368421 0.447368 -0.384512 0.220069 \n", 1769 | "141735 12 1.681818 0.236842 -0.721708 -0.103407 \n", 1770 | "1191085 24 2.026316 0.289474 0.457975 0.035504 \n", 1771 | "1206914 1 -1.105263 0.131579 1.290689 -0.693932 \n", 1772 | "1710520 2 -0.657895 0.500000 0.434876 -2.413007 \n", 1773 | "\n", 1774 | " handAcc16_3 handAcc6_1 handAcc6_2 handAcc6_3 handGyro1 \\\n", 1775 | "312921 -0.572507 -0.361436 0.256971 -0.530520 -0.012729 \n", 1776 | "141735 -0.206281 -0.625117 -0.108947 -0.253147 -0.868631 \n", 1777 | "1191085 -2.015047 0.603512 0.072534 -1.452154 -5.190897 \n", 1778 | "1206914 0.901352 1.320385 -0.723956 0.911347 0.065061 \n", 1779 | "1710520 -0.473610 0.449008 -2.412753 -0.481267 -0.015890 \n", 1780 | "\n", 1781 | " ... ankleAcc16_3 ankleAcc6_1 ankleAcc6_2 ankleAcc6_3 \\\n", 1782 | "312921 ... 0.321939 0.088339 -0.322152 0.351800 \n", 1783 | "141735 ... 0.020813 0.842160 -0.269519 -0.185601 \n", 1784 | "1191085 ... 1.905139 -7.260794 -0.946788 1.482238 \n", 1785 | "1206914 ... -0.351884 -5.128883 -2.300004 -0.300736 \n", 1786 | "1710520 ... 
-0.107595 -0.192439 0.783229 -0.057162 \n", 1787 | "\n", 1788 | " ankleGyro1 ankleGyro2 ankleGyro3 ankleMagne1 ankleMagne2 \\\n", 1789 | "312921 0.220183 0.001165 -0.139354 0.641700 -0.582432 \n", 1790 | "141735 -0.829881 3.024896 -0.265261 -0.744416 1.091512 \n", 1791 | "1191085 5.524513 -0.773631 -1.222988 -0.447203 -0.428537 \n", 1792 | "1206914 -0.037313 0.027350 0.037709 0.700615 0.847494 \n", 1793 | "1710520 0.016440 0.068780 0.058657 0.611770 0.497042 \n", 1794 | "\n", 1795 | " ankleMagne3 \n", 1796 | "312921 0.658114 \n", 1797 | "141735 -0.924482 \n", 1798 | "1191085 -0.674781 \n", 1799 | "1206914 -0.859194 \n", 1800 | "1710520 0.263863 \n", 1801 | "\n", 1802 | "[5 rows x 41 columns]" 1803 | ] 1804 | }, 1805 | "execution_count": 21, 1806 | "metadata": {}, 1807 | "output_type": "execute_result" 1808 | } 1809 | ], 1810 | "source": [ 1811 | "from sklearn import preprocessing\n", 1812 | "from sklearn.preprocessing import StandardScaler,RobustScaler\n", 1813 | "\n", 1814 | "# apply scaling to all columns except the activity label\n", 1815 | "scaler = RobustScaler()\n", 1816 | "df_scaled = train_df.copy()\n", 1817 | "df_scaled_test = test_df.drop(columns=[\"timestamp\", \"subject_id\"], errors=\"ignore\")  # same columns as train_df\n", 1818 | "\n", 1819 | "df_scaled.iloc[:,1:41] = scaler.fit_transform(df_scaled.iloc[:,1:41])\n", 1820 | "df_scaled_test.iloc[:,1:41] = scaler.transform(df_scaled_test.iloc[:,1:41])  # reuse the scaler fitted on the training data\n", 1821 | "\n", 1822 | "df_scaled.head()" 1823 | ] 1824 | }, 1825 | { 1826 | "cell_type": "code", 1827 | "execution_count": 22, 1828 | "metadata": {}, 1829 | "outputs": [], 1830 | "source": [ 1831 | "X_train = df_scaled.drop('activityID', axis=1).values\n", 1832 | "y_train = df_scaled['activityID'].values\n", 1833 | "\n", 1834 | "# Test Dataset (taken from the scaled test frame, not the training frame)\n", 1835 | "X_test = df_scaled_test.drop('activityID', axis=1).values\n", 1836 | "y_test = df_scaled_test['activityID'].values" 1837 | ] 1838 | }, 1839 | { 1840 | "cell_type": "markdown", 1841 | "metadata": {}, 1842 | "source": [ 1843 | "### Dimensionality reduction using Principal Component 
Analysis(PCA)\n", 1844 | "\n", 1845 | "\n", 1846 | "Usually 90-98% of the variance will explain our data really well. So by plotting the variance ratio aginst the number of componments we could see how many of those we could use. As we see from the graph below 15 componments fall around to 94% of the variance. " 1847 | ] 1848 | }, 1849 | { 1850 | "cell_type": "code", 1851 | "execution_count": 23, 1852 | "metadata": {}, 1853 | "outputs": [ 1854 | { 1855 | "data": { 1856 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAEWCAYAAAB8LwAVAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3Wd4HPXV9/HvUbcsyXKVe8E2rmBjGdPBotcYCISSEENInIQ7lCQktDwBEpJAuAmQO6TQTQKIEooxYIqRcYBg3HuvsiV3yZJsdZ3nxfwFi7ySVitpZyWdz3XNtTv9t6PVnJ0uqooxxhhTV4zfAYwxxkQnKxDGGGOCsgJhjDEmKCsQxhhjgrICYYwxJigrEMYYY4KyAmEAEJE7ReRJv3NEgoj8XUT+n985/CKeZ0SkQES+8DuPiV5WIFqIiGwRkVIRKRGRXe4fMCWg/zkiMldEikVkj4h8LCLfqDONySKiIvLLBubTT0SqRGRokH6vi8j/hpNfVX+vqt8PZ9y2RlV/pKq/bc403N9qe0tlirCTgbOA/qo6KdgAItJHRJ4SkXz3nV0jIveKSOfIRm0bRGSOiLS7/x8rEC3rIlVNASYAxwK/AhCRy4BXgOeA/kAG8GvgojrjTwX2u9egVHUHMBu4JrC7iHQDzgemNzW0iMQ1dRzTpg0CtqjqwWA93Xfpv0An4ARVTcUrKOnAYT9MTDumqta0QANsAc4MaH8QmAkIsA34RSPjJwPFwJVABTCxgWGvBjbW6XYDsCig/VEgFygCFgKnBPS7B3gV+Jfr/33X7V8Bw7wC7AQOAHOBMQH9ngUeA952mecBQwP6jwE+wCt2u4A7XfcY4HZgI7APeBnoVs9n7OqW3x6gwL3vH9B/iMtVDHzo8jQl/33u/WRgO/BzYDeQD1wXMOz5wCo3nx3ArUBnoBSoAUpc0zfIZ6h3OQGDAQXiAoafA3zfvb8W+BR4GCgENgEnuu65LuvUBr4jfYEZ7m+wAfiB6349UAZUu9z3Bhn3PmA5ENPA9E8E5rvlOx84sc7nuA/4zM3jLaA78Dze920+MDhgeAVucp9xL97/TkzAd+ZXwFb3mZ8DutRZhlPx/sf2AnfV+Z6/gvc9L3af6UjgDjetXODsOrl/65Z7MfA+0COg//HuMxUCS4HJrvvv3PIsc5/3L3j/9w+7+RwAlgFj/V5PNXm95neA9tIQUCCAAcBK92Ub6b7EQxoZ/xq8lVOs+4f6cwPDdnJfupMDuv0XuCWg/TvunzIOb+W3E0hy/e4BKoGL3T9gJw4vEN8DUoFE4BFgSUC/Z/FWPJPc9J8Hsl2/VPc5fg4kufbjXL9bgM/xtqISgX8AL9bzGbsD38QrnKnuH/2NOp/3f4EEvF0mRU3MH1ggqoDfAPF4BeEQ0NX1z8cVV7yiNSFgvO2N/E0bWk6DabxAVAHXue/EfXgrwcfcZzobbyWWUs+8Pwb+6v4G4/EK7RkB0/6kgdyfE6RwBP
",
      "text/plain": [
       ""
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from sklearn.decomposition import PCA\n",
    "\n",
    "# Fit PCA on the training data to see how much variance each component explains.\n",
    "pca = PCA()\n",
    "pca.fit(X_train)\n",
    "var = pca.explained_variance_ratio_\n",
    "# Cumulative explained variance, in percent.\n",
    "var1 = np.cumsum(np.round(var, decimals=4) * 100)\n",
    "\n",
    "plt.title(\"PCA cumulative explained variance against number of components\")\n",
    "plt.ylabel(\"Cumulative explained variance (%)\")\n",
    "plt.xlabel(\"Number of components\")\n",
    "plt.axhline(94, color=\"red\")  # reference line at 94% cumulative variance\n",
    "\n",
    "# Start the x-axis at 1 so that it reads as a component count, not an array index.\n",
    "plt.plot(np.arange(1, len(var1) + 1), var1)\n",
    "plt.grid()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pca = PCA(n_components=17)\n",
    "X_train = pca.fit_transform(X_train)\n",
    "# Project the test set onto the components fitted on the training set;\n",
    "# calling fit_transform here would refit PCA on the test data.\n",
    "X_test = pca.transform(X_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Choosing the best model\n",
    "In this section we train two models and compare their performance.\n",
    "\n",
    "We focus on two modelling algorithms: Random Forest and Logistic Regression.\n",
    "\n",
    "The **Random Forest** algorithm can be used for both classification and regression, which makes it a very versatile modelling algorithm. As the name implies, a Random Forest is a forest of decision trees, each built with some randomness (a random sample of the training data and random subsets of the features). The algorithm builds these trees and combines their predictions; more trees generally give more stable and accurate predictions, although the improvement levels off as the forest grows.\n",
    "\n",
    "Random Forests are a good choice because they usually give high accuracy and are flexible, i.e. they work for both classification and regression. In addition, Random Forests tend to perform well on high-dimensional datasets such as ours.\n",
    "\n",
    "The **Logistic Regression** algorithm is a simple algorithm that can be used for binary or multiclass classification tasks. Its output is the probability that a data point belongs to a given class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score\n",
    "\n",
    "\n",
    "def get_metrics(y_true, y_pred):\n",
    "    # Print accuracy, error rate and macro-averaged precision, recall and F1.\n",
    "    acc = accuracy_score(y_true, y_pred)\n",
    "    err = 1 - acc\n",
    "    # average=\"macro\" is the unweighted mean of the per-class scores,\n",
    "    # the same value as average=None followed by .mean().\n",
    "    p = precision_score(y_true, y_pred, average=\"macro\")\n",
    "    r = recall_score(y_true, y_pred, average=\"macro\")\n",
    "    f1 = f1_score(y_true, y_pred, average=\"macro\")\n",
    "\n",
    "    print(\"Accuracy:\", acc)\n",
    "    print(\"Error:\", err)\n",
    "    print(\"Precision:\", p)\n",
    "    print(\"Recall:\", r)\n",
    "    print(\"F1:\", f1)\n",
    "\n",
    "\n",
    "log_reg = LogisticRegression()\n",
    "log_reg.fit(X=X_train, y=y_train)\n",
    "y_pred_lr = log_reg.predict(X_test)\n",
    "get_metrics(y_test, y_pred_lr)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.ensemble import RandomForestClassifier\n",
    "\n",
    "rfc = RandomForestClassifier(n_jobs=4)  # n_jobs=4 builds the trees on 4 cores in parallel\n",
    "rfc.fit(X_train, y_train)\n",
    "y_pred_rf = rfc.predict(X_test)\n",
    "get_metrics(y_test, y_pred_rf)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Cross validation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Even though the above models seem to perform really well, the metrics above do not necessarily represent their real performance, since the models were trained on a specific
part of the dataset. With cross validation we can use k=10 folds: the training data is split into 10 parts, and each part is used once for validation while the model is trained on the other 9. This gives 10 values for each metric, and their mean gives a better representation of our model's performance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import cross_val_predict, cross_val_score\n",
    "\n",
    "classifiers = [LogisticRegression(),\n",
    "               RandomForestClassifier(n_jobs=4)]\n",
    "\n",
    "score_lst = []\n",
    "for clf in classifiers:\n",
    "    # Accuracy of the out-of-fold predictions from 10-fold cross validation.\n",
    "    accs = accuracy_score(y_train, cross_val_predict(clf, X_train, y_train, cv=10))\n",
    "    # RMSE per fold; this treats the class labels as numbers, so it is only a rough error measure.\n",
    "    scores = cross_val_score(clf, X_train, y_train, scoring=\"neg_mean_squared_error\", cv=10)\n",
    "    rmse = np.sqrt(-scores)\n",
    "    # Macro F1 per fold, cross-validated on the training data like the other metrics.\n",
    "    f1 = cross_val_score(clf, X_train, y_train, scoring=\"f1_macro\", cv=10)\n",
    "    score_lst.append([clf.__class__.__name__, accs, rmse.mean(), f1.mean()])\n",
    "\n",
    "df_scores = pd.DataFrame(columns=[\"Classifier\", \"Accuracy\", \"RMSE\", \"F1\"], data=score_lst)\n",
    "display(df_scores)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the cells above take a long time to compute on my computer, which has limited processing power, the notebook is uploaded without their outputs. The cells do work and should be run by the examiner."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References\n",
    "\n",
    "Archive.ics.uci.edu. (2012). UCI Machine Learning Repository: PAMAP2 Physical Activity Monitoring Data Set. [online] Available at: http://archive.ics.uci.edu/ml/datasets/pamap2+physical+activity+monitoring [Accessed 15 Dec. 2018].\n",
    "\n",
    "Chandrayan, P. (2017). Machine Learning Part 3 : Logistic Regression – Towards Data Science. [online] Towards Data Science. Available at: https://towardsdatascience.com/machine-learning-part-3-logistics-regression-9d890928680f [Accessed 2 Jan. 2019].\n",
    "\n",
    "DeZyre. (n.d.). Principal Component Analysis Tutorial. [online] Available at: https://www.dezyre.com/data-science-in-python-tutorial/principal-component-analysis-tutorial [Accessed 30 Dec. 2018].\n",
    "\n",
    "Donges, N. (2018). The Random Forest Algorithm – Towards Data Science. [online] Towards Data Science. Available at: https://towardsdatascience.com/the-random-forest-algorithm-d457d499ffcd [Accessed 3 Jan. 2019].\n",
    "\n",
    "PAMAP2_Dataset: Physical Activity Monitoring. (n.d.). [ebook] Available at: http://archive.ics.uci.edu/ml/machine-learning-databases/00231/readme.pdf [Accessed 15 Dec. 2018].\n",
    "\n",
    "Scikit-learn.org. (n.d.). Robust Scaling on Toy Data — scikit-learn 0.18.2 documentation. [online] Available at: https://scikit-learn.org/0.18/auto_examples/preprocessing/plot_robust_scaling.html [Accessed 2 Jan. 2019]."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
--------------------------------------------------------------------------------
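A minimal sketch of the PCA and cross-validation steps from the modelling section of pamap2.ipynb, assuming scikit-learn is installed. It runs on synthetic data from `make_classification` rather than on PAMAP2, and the names `X_train_pca`, `n_kept`, `models` and `results` are illustrative and do not come from the notebook. It is meant to show two points: the PCA components are learned from the training data and only applied to the test data, and wrapping PCA and the classifier in a pipeline lets cross validation refit both inside every fold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the notebook's feature matrix (assumption: not the real PAMAP2 data).
X, y = make_classification(n_samples=600, n_features=30, n_informative=10,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# A float n_components keeps the smallest number of components whose
# cumulative explained variance exceeds that fraction (here 94%).
pca = PCA(n_components=0.94, svd_solver="full")
X_train_pca = pca.fit_transform(X_train)  # components are learned from the training data only
X_test_pca = pca.transform(X_test)        # the test data is projected onto the same components
n_kept = pca.n_components_
cum_var = np.cumsum(pca.explained_variance_ratio_)

# The pipeline refits PCA inside every cross-validation fold, so the
# validation fold never influences the components.
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForestClassifier": RandomForestClassifier(n_estimators=50, random_state=0),
}
results = {}
for name, model in models.items():
    pipe = make_pipeline(PCA(n_components=n_kept), model)
    cv = cross_validate(pipe, X_train, y_train, cv=5, scoring=["accuracy", "f1_macro"])
    results[name] = {
        "accuracy": cv["test_accuracy"].mean(),
        "f1_macro": cv["test_f1_macro"].mean(),
    }

for name, scores in results.items():
    print(f"{name}: accuracy={scores['accuracy']:.3f}, f1_macro={scores['f1_macro']:.3f}")
```

The notebook fixes the component count at 17 after reading the variance plot; passing a fraction to `n_components` is an alternative that lets scikit-learn choose the count for a given variance threshold.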