├── .gitignore ├── LICENSE ├── New_Years_Resolutions_Workshop.ipynb ├── README.md ├── data ├── geoMap.csv └── multiTimeline.csv ├── environment.yml └── solution_New_Years_Resolutions_Workshop.ipynb /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | env/ 12 | build/ 13 | develop-eggs/ 14 | dist/ 15 | downloads/ 16 | eggs/ 17 | .eggs/ 18 | lib/ 19 | lib64/ 20 | parts/ 21 | sdist/ 22 | var/ 23 | wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | 28 | # PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 31 | *.manifest 32 | *.spec 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .coverage.* 43 | .cache 44 | nosetests.xml 45 | coverage.xml 46 | *.cover 47 | .hypothesis/ 48 | 49 | # Translations 50 | *.mo 51 | *.pot 52 | 53 | # Django stuff: 54 | *.log 55 | local_settings.py 56 | 57 | # Flask stuff: 58 | instance/ 59 | .webassets-cache 60 | 61 | # Scrapy stuff: 62 | .scrapy 63 | 64 | # Sphinx documentation 65 | docs/_build/ 66 | 67 | # PyBuilder 68 | target/ 69 | 70 | # Jupyter Notebook 71 | .ipynb_checkpoints 72 | 73 | # pyenv 74 | .python-version 75 | 76 | # celery beat schedule file 77 | celerybeat-schedule 78 | 79 | # SageMath parsed files 80 | *.sage.py 81 | 82 | # dotenv 83 | .env 84 | 85 | # virtualenv 86 | .venv 87 | venv/ 88 | ENV/ 89 | 90 | # Spyder project settings 91 | .spyderproject 92 | .spyproject 93 | 94 | # Rope project settings 95 | .ropeproject 96 | 97 | # mkdocs documentation 98 | /site 99 | 100 | # mypy 101 | .mypy_cache/ 102 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2017 DataCamp 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /New_Years_Resolutions_Workshop.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# New Year's Resolutions" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "In this Facebook live code along session, you're going to check out Google trends data of keywords 'diet', 'gym' and 'finance' to see how they vary over time. Could there be more searches for these terms in January when we're all trying to turn over a new leaf? You're not going to do much mathematics today but you'll source your data, visualize it and learn about trends and seasonality in time series data. The emphasis will be squarely on a visual exploration of the dataset in question.\n", 15 | "\n", 16 | "So the question remains: could there be more searches for these terms in January when we're all trying to turn over a new leaf?\n", 17 | "Let's find out by going [here](https://trends.google.com/trends/explore?date=all&q=diet,gym,finance) and checking out the data (inspired by [this fivethirtyeight piece](https://fivethirtyeight.com/features/how-fast-youll-abandon-your-new-years-resolutions/)).\n", 18 | "\n", 19 | "You can also download the data as a .csv, save to file and import into your very own Python environment to perform your own analysis. You'll do this now. Let's get it!" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": {}, 25 | "source": [ 26 | "## Import data" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": null, 32 | "metadata": {}, 33 | "outputs": [], 34 | "source": [ 35 | "# Import packages\n", 36 | "import numpy as np\n", 37 | "import pandas as pd\n", 38 | "import matplotlib.pyplot as plt\n", 39 | "import seaborn as sns\n", 40 | "%matplotlib inline\n", 41 | "sns.set()" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "* Import data that you downloaded and check out first several rows:" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": null, 54 | "metadata": {}, 55 | "outputs": [], 56 | "source": [ 57 | "df = ____\n", 58 | "____" 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": {}, 64 | "source": [ 65 | "* Use the `.info()` method to check out your data types, number of rows and more:" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": null, 71 | "metadata": {}, 72 | "outputs": [], 73 | "source": [ 74 | "____" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": {}, 80 | "source": [ 81 | "**Recap:**\n", 82 | "\n", 83 | "* You've imported your data from google trends and had a brief look at it;\n", 84 | "\n", 85 | "**Up next:**\n", 86 | "\n", 87 | "* Wrangle your data and get it into the form you want to prepare it for analysis." 
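If you'd like to check your attempt at the two import cells above, here is one possible sketch. It assumes you launched Jupyter from the repository root (so the file sits at `data/multiTimeline.csv`) and skips the two metadata rows ("Category: All categories" and the blank line) that Google Trends writes above the header:

```python
import pandas as pd

# Read the Google Trends export, skipping the two metadata rows above the header
df = pd.read_csv('data/multiTimeline.csv', skiprows=2)

print(df.head())  # first few rows: 'Month' plus the three search-interest columns
df.info()         # data types, number of rows, memory usage
```

Note that the 'Month' column is read in as plain strings; converting it to a proper datetime index is exactly the wrangling step that comes next.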
88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "## Wrangle your data" 95 | ] 96 | }, 97 | { 98 | "cell_type": "markdown", 99 | "metadata": {}, 100 | "source": [ 101 | "* Rename the columns of `df` so that they have no spaces:" 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": null, 107 | "metadata": {}, 108 | "outputs": [], 109 | "source": [ 110 | "____\n", 111 | "____" 112 | ] 113 | }, 114 | { 115 | "cell_type": "markdown", 116 | "metadata": {}, 117 | "source": [ 118 | "* Turn the 'month' column into a datetime data type and make it the index of the DataFrame;" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": null, 124 | "metadata": {}, 125 | "outputs": [], 126 | "source": [] 127 | }, 128 | { 129 | "cell_type": "markdown", 130 | "metadata": {}, 131 | "source": [ 132 | "Now it's time to explore your DataFrame visually." 133 | ] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "metadata": {}, 138 | "source": [ 139 | "## A bit of exploratory data analysis" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "* Use a built-in `pandas` visualization method to plot your data as 3 line plots on a single figure (one for each column):" 147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": null, 152 | "metadata": {}, 153 | "outputs": [], 154 | "source": [ 155 | "____\n", 156 | "____" 157 | ] 158 | }, 159 | { 160 | "cell_type": "markdown", 161 | "metadata": {}, 162 | "source": [ 163 | "* Plot the 'diet' column by itself as a time series:" 164 | ] 165 | }, 166 | { 167 | "cell_type": "code", 168 | "execution_count": null, 169 | "metadata": {}, 170 | "outputs": [], 171 | "source": [ 172 | "____\n", 173 | "____" 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": {}, 179 | "source": [ 180 | "**Note:** it looks like there are trends _and_ seasonal components to these time series." 181 | ] 182 | }, 183 | { 184 | "cell_type": "markdown", 185 | "metadata": {}, 186 | "source": [ 187 | "**Recap:**\n", 188 | "\n", 189 | "* You've imported your data from google trends and had a brief look at it;\n", 190 | "* You've wrangled your data and gotten it into the form you want to prepare it for analysis.\n", 191 | "* You've checked out your time series visually.\n", 192 | "\n", 193 | "**Up next:**\n", 194 | "\n", 195 | "* Identify trends in your time series." 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": {}, 201 | "source": [ 202 | "For more on pandas, check out our [Data Manipulation with Python track](https://www.datacamp.com/tracks/data-manipulation-with-python). For more on time series with pandas, check out our [Manipulating Time Series Data in Python course](https://www.datacamp.com/courses/manipulating-time-series-data-in-python).\n", 203 | "\n", 204 | "If you're enjoying this session, retweet or share on FB now and follow us on Twitter: [@hugobowne](https://twitter.com/hugobowne) & [@DataCamp](https://twitter.com/datacamp)." 205 | ] 206 | }, 207 | { 208 | "cell_type": "markdown", 209 | "metadata": {}, 210 | "source": [ 211 | "## Is there a trend?" 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "metadata": {}, 217 | "source": [ 218 | "There are several ways to think about identifying trends in time series.
One popular way is by taking a _rolling average_, which means that, for each time point, you take the average of the points on either side of it (the number of points is specified by a _window size_, which you need to choose)." 219 | ] 220 | }, 221 | { 222 | "cell_type": "markdown", 223 | "metadata": {}, 224 | "source": [ 225 | "### Check out the rolling average:" 226 | ] 227 | }, 228 | { 229 | "cell_type": "markdown", 230 | "metadata": {}, 231 | "source": [ 232 | "* Plot the rolling average of 'diet' using built-in `pandas` methods. What window size does it make sense to use?" 233 | ] 234 | }, 235 | { 236 | "cell_type": "code", 237 | "execution_count": null, 238 | "metadata": {}, 239 | "outputs": [], 240 | "source": [ 241 | "diet = ____\n", 242 | "____" 243 | ] 244 | }, 245 | { 246 | "cell_type": "markdown", 247 | "metadata": {}, 248 | "source": [ 249 | "* Plot the rolling average of 'gym' using built-in `pandas` methods. What window size does it make sense to use?" 250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": null, 255 | "metadata": {}, 256 | "outputs": [], 257 | "source": [ 258 | "gym = ____\n", 259 | "____" 260 | ] 261 | }, 262 | { 263 | "cell_type": "markdown", 264 | "metadata": {}, 265 | "source": [ 266 | "* Plot the trends of 'gym' and 'diet' on a single figure:" 267 | ] 268 | }, 269 | { 270 | "cell_type": "code", 271 | "execution_count": null, 272 | "metadata": {}, 273 | "outputs": [], 274 | "source": [ 275 | "df_rm = ____\n", 276 | "____" 277 | ] 278 | }, 279 | { 280 | "cell_type": "markdown", 281 | "metadata": {}, 282 | "source": [ 283 | "## Seasonal patterns" 284 | ] 285 | }, 286 | { 287 | "cell_type": "markdown", 288 | "metadata": {}, 289 | "source": [ 290 | "You can remove the trend from a time series to investigate seasonality. To remove the trend, you can subtract the trend you computed above (rolling mean) from the original signal. This, however, will be dependent on how many data points you averaged over. Another way to remove the trend is called **differencing**, where you look at the difference between successive data points (called first-order differencing)." 291 | ] 292 | }, 293 | { 294 | "cell_type": "markdown", 295 | "metadata": {}, 296 | "source": [ 297 | "### First-order differencing" 298 | ] 299 | }, 300 | { 301 | "cell_type": "markdown", 302 | "metadata": {}, 303 | "source": [ 304 | "* Use `pandas` to compute and plot the first-order difference of the 'diet' series:" 305 | ] 306 | }, 307 | { 308 | "cell_type": "code", 309 | "execution_count": null, 310 | "metadata": {}, 311 | "outputs": [], 312 | "source": [ 313 | "____\n", 314 | "____" 315 | ] 316 | }, 317 | { 318 | "cell_type": "markdown", 319 | "metadata": {}, 320 | "source": [ 321 | "See that you have removed much of the trend and _you can really see the peaks in January every year_. Note: You can also perform second-order differencing if the trend is not yet entirely removed. See [here](https://www.otexts.org/fpp/8/1) for more on differencing.\n", 322 | "\n", 323 | "Differencing is super helpful in turning your time series into a **stationary time series**. We won't get too much into these here but a **stationary time series** is one whose statistical properties (such as mean & variance) don't change over time. **Stationary time series** are useful because many time series forecasting methods are based on the assumption that the time series is approximately stationary."
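The rolling-mean and differencing cells above can be filled in along these lines. This is only a sketch: it assumes the wrangled DataFrame is called `df`, has a datetime index and columns named `diet`, `gym` and `finance` (the exact names depend on how you renamed the columns earlier), relies on the `plt` import from the top of the notebook, and uses a 12-month window so that one full yearly cycle is averaged out:

```python
# 12-month rolling means: one full yearly cycle per window, so the January spikes average out
diet = df['diet']
diet.rolling(12).mean().plot(title="'diet': 12-month rolling mean")
plt.show()

gym = df['gym']
gym.rolling(12).mean().plot(title="'gym': 12-month rolling mean")
plt.show()

# Both trends on a single figure
df_rm = df[['diet', 'gym']].rolling(12).mean()
df_rm.plot(title='12-month rolling means')
plt.show()

# First-order difference of 'diet': each value minus the previous month's value
df['diet'].diff().plot(title="'diet': first-order difference")
plt.show()
```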
324 | ] 325 | }, 326 | { 327 | "cell_type": "markdown", 328 | "metadata": {}, 329 | "source": [ 330 | "**Recap:**\n", 331 | "\n", 332 | "* You've imported your data from google trends and had a brief look at it;\n", 333 | "* You've wrangled your data and gotten it into the form you want to prepare it for analysis.\n", 334 | "* You've checked out your time series visually.\n", 335 | "* You've identified trends in your time series.\n", 336 | "* You've had some experience with first-order differencing of time series.\n", 337 | "\n", 338 | "**Up next:**\n", 339 | "\n", 340 | "* Analyze the periodicity in your time series by looking at its autocorrelation function;\n", 341 | "* But first: a short detour into correlation." 342 | ] 343 | }, 344 | { 345 | "cell_type": "markdown", 346 | "metadata": {}, 347 | "source": [ 348 | "For more on pandas, check out our [Data Manipulation with Python track](https://www.datacamp.com/tracks/data-manipulation-with-python). For more on time series with pandas, check out our [Manipulating Time Series Data in Python course](https://www.datacamp.com/courses/manipulating-time-series-data-in-python).\n", 349 | "\n", 350 | "If you're enjoying this session, retweet or share on FB now and follow us on Twitter: [@hugobowne](https://twitter.com/hugobowne) & [@DataCamp](https://twitter.com/datacamp)." 351 | ] 352 | }, 353 | { 354 | "cell_type": "markdown", 355 | "metadata": {}, 356 | "source": [ 357 | "### Periodicity and Autocorrelation" 358 | ] 359 | }, 360 | { 361 | "cell_type": "markdown", 362 | "metadata": {}, 363 | "source": [ 364 | "A time series is _periodic_ if it repeats itself at equally spaced intervals, say, every 12 months. Another way to think of this is that if the time series has a peak somewhere, then it will have a peak 12 months after that and, if it has a trough somewhere, it will also have a trough 12 months after that. Yet another way of thinking about this is that the time series is _correlated_ with itself shifted by 12 months. \n", 365 | "\n", 366 | "The correlation of a time series with such a shifted version of itself is captured by the concept of _autocorrelation_. We'll get to this in a minute.
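Before the detour, here is a one-line taste of what "correlated with itself shifted by 12 months" means in code (a sketch that again assumes the wrangled `df` from above):

```python
# Correlation of the 'diet' series with itself shifted by 12 months:
# this is a single point of the autocorrelation function plotted later in the notebook
print(df['diet'].corr(df['diet'].shift(12)))
```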
First, let's remind ourselves about correlation:" 367 | ] 368 | }, 369 | { 370 | "cell_type": "markdown", 371 | "metadata": {}, 372 | "source": [ 373 | "### Correlation" 374 | ] 375 | }, 376 | { 377 | "cell_type": "markdown", 378 | "metadata": {}, 379 | "source": [ 380 | "The correlation coefficient of two variables captures how linearly related they are:" 381 | ] 382 | }, 383 | { 384 | "cell_type": "markdown", 385 | "metadata": {}, 386 | "source": [ 387 | "* Import the iris dataset from scikit-learn, turn it into a DataFrame and view the head:" 388 | ] 389 | }, 390 | { 391 | "cell_type": "code", 392 | "execution_count": null, 393 | "metadata": {}, 394 | "outputs": [], 395 | "source": [ 396 | "from sklearn import datasets\n", 397 | "iris = datasets.load_iris()\n", 398 | "df_iris = pd.DataFrame(data= np.c_[iris['data'], iris['target']],\n", 399 | " columns= iris['feature_names'] + ['target'])\n", 400 | "df_iris.head()" 401 | ] 402 | }, 403 | { 404 | "cell_type": "markdown", 405 | "metadata": {}, 406 | "source": [ 407 | "* Use `pandas` or `seaborn` to build a scatter plot of 'sepal length' against 'sepal width', coloured by the target (species):" 408 | ] 409 | }, 410 | { 411 | "cell_type": "code", 412 | "execution_count": null, 413 | "metadata": {}, 414 | "outputs": [], 415 | "source": [ 416 | "____" 417 | ] 418 | }, 419 | { 420 | "cell_type": "markdown", 421 | "metadata": {}, 422 | "source": [ 423 | "**Question:** Are sepal length and width positively or negatively correlated across all flowers? Are they positively or negatively correlated within each species? This is an essential distinction." 424 | ] 425 | }, 426 | { 427 | "cell_type": "markdown", 428 | "metadata": {}, 429 | "source": [ 430 | "* Compute the correlation coefficients of each pair of measurements:" 431 | ] 432 | }, 433 | { 434 | "cell_type": "code", 435 | "execution_count": null, 436 | "metadata": {}, 437 | "outputs": [], 438 | "source": [ 439 | "____" 440 | ] 441 | }, 442 | { 443 | "cell_type": "markdown", 444 | "metadata": {}, 445 | "source": [ 446 | "Note that 'sepal length (cm)' and 'sepal width (cm)' seem to be negatively correlated! And they are, over the entire population of flowers measured. But they are not within each species. For those interested, this is known as _Simpson's paradox_ and is essential when thinking about causal inference. You can read more [here](http://ftp.cs.ucla.edu/pub/stat_ser/r414.pdf).
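For reference, here is one way the two iris exercises above (the scatter plot and the correlation matrix) could be filled in. It is a sketch that assumes `df_iris` as constructed earlier and the `seaborn`/`matplotlib` imports from the top of the notebook:

```python
# Scatter plot of sepal length against sepal width, coloured by species
sns.lmplot(x='sepal length (cm)', y='sepal width (cm)', hue='target',
           data=df_iris, fit_reg=False)
plt.show()

# Correlation coefficients over all flowers, ignoring species:
# note the negative sepal length / sepal width entry discussed above
print(df_iris.corr())
```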
Let's check out correlation as a function of species:" 447 | ] 448 | }, 449 | { 450 | "cell_type": "markdown", 451 | "metadata": {}, 452 | "source": [ 453 | "* Compute the correlation coefficients of each pair of measurements within each species:" 454 | ] 455 | }, 456 | { 457 | "cell_type": "code", 458 | "execution_count": null, 459 | "metadata": {}, 460 | "outputs": [], 461 | "source": [ 462 | "____" 463 | ] 464 | }, 465 | { 466 | "cell_type": "markdown", 467 | "metadata": {}, 468 | "source": [ 469 | "**Recap:**\n", 470 | "\n", 471 | "* You've imported your data from google trends and had a brief look at it;\n", 472 | "* You've wrangled your data and gotten it into the form you want to prepare it for analysis.\n", 473 | "* You've checked out your time series visually.\n", 474 | "* You've identified trends in your time series.\n", 475 | "* You've had some experience with first-order differencing of time series.\n", 476 | "* You've learnt about correlation of two variables, how to compute it and _Simpson's Paradox_.\n", 477 | "\n", 478 | "**Up next:**\n", 479 | "\n", 480 | "* Analyze the periodicity in your time series by looking at its autocorrelation function." 481 | ] 482 | }, 483 | { 484 | "cell_type": "markdown", 485 | "metadata": {}, 486 | "source": [ 487 | "For more on pandas, check out our [Data Manipulation with Python track](https://www.datacamp.com/tracks/data-manipulation-with-python). For more on time series with pandas, check out our [Manipulating Time Series Data in Python course](https://www.datacamp.com/courses/manipulating-time-series-data-in-python).\n", 488 | "\n", 489 | "If you're enjoying this session, retweet or share on FB now and follow us on Twitter: [@hugobowne](https://twitter.com/hugobowne) & [@DataCamp](https://twitter.com/datacamp)." 490 | ] 491 | }, 492 | { 493 | "cell_type": "markdown", 494 | "metadata": {}, 495 | "source": [ 496 | "### Correlation of time series" 497 | ] 498 | }, 499 | { 500 | "cell_type": "markdown", 501 | "metadata": {}, 502 | "source": [ 503 | "* Plot all your time series again to remind yourself of what they look like:" 504 | ] 505 | }, 506 | { 507 | "cell_type": "code", 508 | "execution_count": null, 509 | "metadata": {}, 510 | "outputs": [], 511 | "source": [ 512 | "____\n", 513 | "____" 514 | ] 515 | }, 516 | { 517 | "cell_type": "markdown", 518 | "metadata": {}, 519 | "source": [ 520 | "* Compute the correlation coefficients of all of these time series:" 521 | ] 522 | }, 523 | { 524 | "cell_type": "code", 525 | "execution_count": null, 526 | "metadata": {}, 527 | "outputs": [], 528 | "source": [ 529 | "____" 530 | ] 531 | }, 532 | { 533 | "cell_type": "markdown", 534 | "metadata": {}, 535 | "source": [ 536 | "* Interpret the correlation matrix above."
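For reference, a sketch of the two cells above, again assuming the wrangled `df`:

```python
# All three search-interest series on a single figure
df.plot()
plt.show()

# Correlation coefficients of the raw series; these mix up trend and seasonality,
# which is why the next cells difference the series before correlating them again
print(df.corr())
```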
537 | ] 538 | }, 539 | { 540 | "cell_type": "markdown", 541 | "metadata": {}, 542 | "source": [ 543 | "* Plot the first-order differences of these time series (removing the trend may reveal correlation in seasonality):" 544 | ] 545 | }, 546 | { 547 | "cell_type": "code", 548 | "execution_count": null, 549 | "metadata": {}, 550 | "outputs": [], 551 | "source": [ 552 | "____\n", 553 | "____" 554 | ] 555 | }, 556 | { 557 | "cell_type": "markdown", 558 | "metadata": {}, 559 | "source": [ 560 | "* Compute the correlation coefficients of the first-order differences of these time series (removing the trend may reveal correlation in seasonality):" 561 | ] 562 | }, 563 | { 564 | "cell_type": "code", 565 | "execution_count": null, 566 | "metadata": {}, 567 | "outputs": [], 568 | "source": [ 569 | "____" 570 | ] 571 | }, 572 | { 573 | "cell_type": "markdown", 574 | "metadata": {}, 575 | "source": [ 576 | "## Autocorrelation" 577 | ] 578 | }, 579 | { 580 | "cell_type": "markdown", 581 | "metadata": {}, 582 | "source": [ 583 | "Now that we've taken a dive into correlation of variables and correlation of time series, it's time to plot the autocorrelation of the 'diet' series: on the x-axis you have the lag and on the y-axis you have how correlated the time series is with itself at that lag. For example, if the original time series repeats itself every two days, you would expect to see a spike in the autocorrelation function at 2 days." 584 | ] 585 | }, 586 | { 587 | "cell_type": "markdown", 588 | "metadata": {}, 589 | "source": [ 590 | "* Plot the autocorrelation function of the 'diet' time series:" 591 | ] 592 | }, 593 | { 594 | "cell_type": "code", 595 | "execution_count": null, 596 | "metadata": {}, 597 | "outputs": [], 598 | "source": [ 599 | "____" 600 | ] 601 | }, 602 | { 603 | "cell_type": "markdown", 604 | "metadata": {}, 605 | "source": [ 606 | "* Interpret the above." 607 | ] 608 | }, 609 | { 610 | "cell_type": "markdown", 611 | "metadata": {}, 612 | "source": [ 613 | "**Recap:**\n", 614 | "\n", 615 | "* You've imported your data from google trends and had a brief look at it;\n", 616 | "* You've wrangled your data and gotten it into the form you want to prepare it for analysis.\n", 617 | "* You've checked out your time series visually.\n", 618 | "* You've identified trends in your time series.\n", 619 | "* You've had some experience with first-order differencing of time series.\n", 620 | "* You've learnt about correlation of two variables, how to compute it and _Simpson's Paradox_.\n", 621 | "* You've analyzed the periodicity in your time series by looking at its autocorrelation function." 622 | ] 623 | }, 624 | { 625 | "cell_type": "markdown", 626 | "metadata": {}, 627 | "source": [ 628 | "In this Facebook live code along session, you've checked out Google trends data of keywords 'diet', 'gym' and looked cursorily at 'finance' to see how they vary over time. For those eager data scientists, there are two things you could do right away:\n", 629 | "\n", 630 | "* Look into the 'finance' column and report what you find;\n", 631 | "* Use ARIMA modeling to make some time series forecasts as to what these search trends will look like over the coming years.
Jason Brownlee at Machine Learning Mastery has a cool tutorial on [ARIMA modeling in Python](https://machinelearningmastery.com/arima-for-time-series-forecasting-with-python/), DataCamp has a [great ARIMA Modeling with R](https://www.datacamp.com/courses/arima-modeling-with-r) and we'll also have a Python Time Series forecasting course up and running this year." 632 | ] 633 | } 634 | ], 635 | "metadata": { 636 | "kernelspec": { 637 | "display_name": "Python 3", 638 | "language": "python", 639 | "name": "python3" 640 | }, 641 | "language_info": { 642 | "codemirror_mode": { 643 | "name": "ipython", 644 | "version": 3 645 | }, 646 | "file_extension": ".py", 647 | "mimetype": "text/x-python", 648 | "name": "python", 649 | "nbconvert_exporter": "python", 650 | "pygments_lexer": "ipython3", 651 | "version": "3.6.4" 652 | } 653 | }, 654 | "nbformat": 4, 655 | "nbformat_minor": 2 656 | } 657 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # IMPORTANT 2 | 3 | **If you're planning to code along, make sure to clone, download, or re-pull this repository on the morning of Thursday January 4th, 2018. All edits will be completed by 1159pm ET Wednesday January 3rd.** 4 | 5 | 6 | 7 | ## Identifying New Year's Resolutions with Google Trends 8 | 9 | In this Facebook live code along session, you're going to check out Google trends data of keywords 'diet', 'gym' and 'finance' to see how they vary over time. Could there be more searches for these terms in January when we're all trying to turn over a new leaf? Let's find out! In this session, you'll code along with Hugo to import google trends data into your Jupyter notebook and use `pandas` and a bunch of other packages from the Python data science stack to analyze and interpret these time series. You'll learn a bunch about time series analysis while getting your hands dirty with the world's new year's resolutions. You're not going to do much mathematics today but you'll source your data, visualize it and learn about trends and seasonality in time series data. The emphasis will be squarely on a visual exploration of the dataset in question. 10 | 11 | Join Hugo live on Thursday January 4th, 2018 at 10:30am ET on Facebook! 12 | 13 | ## Prerequisites 14 | 15 | Not a lot. It would help if you knew 16 | 17 | * programming fundamentals and the basics of the Python programming language (e.g., variables, for loops); 18 | * a bit about Jupyter Notebooks; 19 | * your way around the terminal/shell. 20 | 21 | 22 | **However, I have always found that the most important and beneficial prerequisite is a will to learn new things so if you have this quality, you'll definitely get something out of this code-along session.** 23 | 24 | Also, if you'd like to watch and **not** code along, you'll also have a great time and these notebooks will be downloadable afterwards also. 25 | 26 | If you are going to code along and use the [Anaconda distribution](https://www.anaconda.com/download/) of Python 3 (see below), I ask that you install it before the session. 27 | 28 | 29 | ## Getting set up computationally 30 | 31 | ### 1. Clone the repository 32 | 33 | To get set up for this live coding session, clone this repository. 
You can do so by executing the following in your terminal: 34 | 35 | ``` 36 | git clone https://github.com/datacamp/datacamp_facebook_live_ny_resolution 37 | ``` 38 | 39 | Alternatively, you can download the zip file of the repository at the top of the main page of the repository. If you prefer not to use git or don't have experience with it, this is a good option. 40 | 41 | ### 2. Download Anaconda (if you haven't already) 42 | 43 | If you do not already have the [Anaconda distribution](https://www.anaconda.com/download/) of Python 3, go get it (n.b., you can also do this without Anaconda by using `pip` to install the required packages; however, Anaconda is great for Data Science and I encourage you to use it). 44 | 45 | ### 3. Create your conda environment for this session 46 | 47 | Navigate to the relevant directory `datacamp_facebook_live_ny_resolution` and install the required packages in a new conda environment: 48 | 49 | ``` 50 | conda env create -f environment.yml 51 | ``` 52 | 53 | This will create a new environment called `fb_live_ny_res`. To activate the environment on OSX/Linux, execute 54 | 55 | ``` 56 | source activate fb_live_ny_res 57 | ``` 58 | On Windows, execute 59 | 60 | ``` 61 | activate fb_live_ny_res 62 | ``` 63 | 64 | 65 | ### 4. Open your Jupyter notebook 66 | 67 | In the terminal, execute `jupyter notebook`. 68 | 69 | Then open the notebook `New_Years_Resolutions_Workshop.ipynb` and we're ready to get coding. Enjoy. 70 | 71 | 72 | ### Code 73 | The code in this repository is released under the [MIT license](LICENSE). Read more at the [Open Source Initiative](https://opensource.org/licenses/MIT). All text remains the Intellectual Property of DataCamp. If you wish to reuse, adapt or remix, get in touch with me at hugo at datacamp com to request permission.
74 | -------------------------------------------------------------------------------- /data/geoMap.csv: -------------------------------------------------------------------------------- 1 | Category: All categories 2 | 3 | Country,diet: (1/1/04 - 12/15/17),gym: (1/1/04 - 12/15/17),finance: (1/1/04 - 12/15/17) 4 | Slovenia,<1,<1,100 5 | Singapore,30,34,70 6 | South Africa,63,35,57 7 | United Kingdom,46,59,60 8 | Hong Kong,13,14,60 9 | Ireland,48,55,35 10 | Australia,54,49,40 11 | United States,53,28,37 12 | India,27,23,52 13 | Sri Lanka,<1,18,48 14 | New Zealand,48,43,43 15 | Kenya,23,<1,44 16 | Ghana,<1,<1,43 17 | Canada,42,30,39 18 | Vietnam,40,14,2 19 | Lebanon,38,28,<1 20 | Indonesia,36,11,17 21 | United Arab Emirates,35,24,33 22 | Philippines,32,23,16 23 | Nigeria,17,<1,31 24 | Sweden,13,30,8 25 | Malaysia,29,18,25 26 | Pakistan,23,13,29 27 | Switzerland,5,14,26 28 | Bangladesh,<1,<1,23 29 | Tunisia,<1,<1,23 30 | Israel,6,5,22 31 | Mexico,3,20,2 32 | Peru,<1,19,4 33 | Guatemala,<1,19,<1 34 | Belgium,4,11,18 35 | Greece,9,17,8 36 | Morocco,<1,<1,16 37 | Lithuania,<1,16,<1 38 | Dominican Republic,<1,16,<1 39 | France,3,16,13 40 | Costa Rica,<1,15,<1 41 | Finland,6,15,8 42 | Denmark,7,15,8 43 | Bolivia,<1,14,<1 44 | Norway,10,13,7 45 | Czechia,<1,11,13 46 | Austria,<1,13,6 47 | Spain,3,12,6 48 | Slovakia,<1,12,<1 49 | Ecuador,<1,12,<1 50 | Netherlands,5,12,11 51 | Argentina,<1,11,4 52 | Chile,<1,11,<1 53 | Portugal,<1,7,10 54 | Romania,5,10,6 55 | Colombia,<1,10,2 56 | Croatia,<1,10,<1 57 | Venezuela,<1,9,<1 58 | Hungary,<1,9,<1 59 | Serbia,<1,9,<1 60 | Egypt,7,8,<1 61 | Saudi Arabia,8,4,5 62 | Germany,3,7,6 63 | South Korea,5,<1,7 64 | Taiwan,<1,7,7 65 | Italy,3,4,6 66 | Poland,3,6,5 67 | China,<1,<1,5 68 | Ukraine,<1,<1,5 69 | Thailand,3,3,4 70 | Brazil,4,2,1 71 | Turkey,2,2,1 72 | Russia,1,2,2 73 | Japan,1,1,2 74 | -------------------------------------------------------------------------------- /data/multiTimeline.csv: -------------------------------------------------------------------------------- 1 | Category: All categories 2 | 3 | Month,diet: (Worldwide),gym: (Worldwide),finance: (Worldwide) 4 | 2004-01,100,31,48 5 | 2004-02,75,26,49 6 | 2004-03,67,24,47 7 | 2004-04,70,22,48 8 | 2004-05,72,22,43 9 | 2004-06,64,24,45 10 | 2004-07,60,23,44 11 | 2004-08,59,28,44 12 | 2004-09,53,25,44 13 | 2004-10,52,24,45 14 | 2004-11,50,23,43 15 | 2004-12,42,24,41 16 | 2005-01,64,32,44 17 | 2005-02,54,28,48 18 | 2005-03,56,27,46 19 | 2005-04,56,25,44 20 | 2005-05,59,24,42 21 | 2005-06,53,25,44 22 | 2005-07,53,25,44 23 | 2005-08,51,28,44 24 | 2005-09,47,28,44 25 | 2005-10,46,27,43 26 | 2005-11,44,25,42 27 | 2005-12,40,24,38 28 | 2006-01,64,34,44 29 | 2006-02,51,29,44 30 | 2006-03,51,28,46 31 | 2006-04,50,27,47 32 | 2006-05,50,26,45 33 | 2006-06,52,25,44 34 | 2006-07,51,27,42 35 | 2006-08,51,30,44 36 | 2006-09,45,30,46 37 | 2006-10,42,27,45 38 | 2006-11,43,26,45 39 | 2006-12,37,26,41 40 | 2007-01,57,35,46 41 | 2007-02,49,33,47 42 | 2007-03,51,32,48 43 | 2007-04,51,32,48 44 | 2007-05,49,32,47 45 | 2007-06,47,31,46 46 | 2007-07,49,30,50 47 | 2007-08,44,31,54 48 | 2007-09,46,32,52 49 | 2007-10,43,28,52 50 | 2007-11,40,27,50 51 | 2007-12,34,26,43 52 | 2008-01,52,35,53 53 | 2008-02,47,30,50 54 | 2008-03,46,29,53 55 | 2008-04,47,28,52 56 | 2008-05,45,27,48 57 | 2008-06,43,27,49 58 | 2008-07,44,28,52 59 | 2008-08,43,31,48 60 | 2008-09,42,33,61 61 | 2008-10,43,28,73 62 | 2008-11,39,28,58 63 | 2008-12,38,27,50 64 | 2009-01,52,35,54 65 | 2009-02,46,30,58 66 | 2009-03,48,28,58 67 | 2009-04,49,29,57 68 | 
2009-05,48,28,53 69 | 2009-06,47,28,55 70 | 2009-07,47,28,57 71 | 2009-08,48,30,57 72 | 2009-09,44,31,60 73 | 2009-10,44,28,57 74 | 2009-11,41,27,52 75 | 2009-12,39,27,47 76 | 2010-01,57,35,51 77 | 2010-02,50,31,53 78 | 2010-03,51,30,53 79 | 2010-04,51,29,56 80 | 2010-05,49,28,55 81 | 2010-06,47,28,52 82 | 2010-07,48,29,50 83 | 2010-08,48,31,51 84 | 2010-09,48,32,54 85 | 2010-10,45,30,51 86 | 2010-11,43,28,49 87 | 2010-12,39,28,46 88 | 2011-01,61,39,51 89 | 2011-02,53,34,50 90 | 2011-03,54,33,51 91 | 2011-04,59,31,49 92 | 2011-05,57,31,50 93 | 2011-06,52,32,48 94 | 2011-07,52,30,48 95 | 2011-08,52,34,56 96 | 2011-09,50,36,52 97 | 2011-10,48,33,50 98 | 2011-11,49,33,47 99 | 2011-12,44,32,42 100 | 2012-01,64,42,44 101 | 2012-02,57,37,47 102 | 2012-03,57,35,47 103 | 2012-04,56,34,45 104 | 2012-05,55,33,47 105 | 2012-06,52,35,43 106 | 2012-07,55,37,44 107 | 2012-08,55,37,45 108 | 2012-09,51,39,46 109 | 2012-10,46,33,45 110 | 2012-11,44,32,42 111 | 2012-12,42,32,38 112 | 2013-01,65,43,46 113 | 2013-02,58,37,46 114 | 2013-03,59,37,46 115 | 2013-04,58,37,47 116 | 2013-05,55,36,46 117 | 2013-06,55,37,43 118 | 2013-07,55,37,46 119 | 2013-08,51,39,46 120 | 2013-09,52,41,47 121 | 2013-10,46,38,47 122 | 2013-11,46,37,44 123 | 2013-12,42,36,40 124 | 2014-01,61,47,46 125 | 2014-02,53,44,47 126 | 2014-03,54,43,47 127 | 2014-04,53,40,46 128 | 2014-05,50,39,44 129 | 2014-06,49,39,44 130 | 2014-07,48,41,45 131 | 2014-08,47,40,44 132 | 2014-09,46,40,48 133 | 2014-10,43,38,47 134 | 2014-11,42,37,42 135 | 2014-12,38,38,42 136 | 2015-01,54,48,46 137 | 2015-02,48,43,47 138 | 2015-03,51,43,46 139 | 2015-04,49,42,46 140 | 2015-05,48,41,44 141 | 2015-06,48,42,45 142 | 2015-07,47,42,46 143 | 2015-08,46,43,48 144 | 2015-09,43,45,48 145 | 2015-10,42,41,46 146 | 2015-11,39,42,42 147 | 2015-12,38,42,40 148 | 2016-01,51,52,44 149 | 2016-02,48,46,46 150 | 2016-03,48,47,44 151 | 2016-04,48,44,43 152 | 2016-05,47,46,42 153 | 2016-06,44,46,45 154 | 2016-07,43,58,41 155 | 2016-08,45,53,41 156 | 2016-09,43,51,44 157 | 2016-10,40,45,41 158 | 2016-11,39,44,43 159 | 2016-12,36,44,39 160 | 2017-01,55,56,43 161 | 2017-02,56,51,44 162 | 2017-03,50,51,44 163 | 2017-04,49,48,42 164 | 2017-05,48,48,43 165 | 2017-06,48,49,41 166 | 2017-07,52,52,43 167 | 2017-08,46,52,43 168 | 2017-09,44,50,47 169 | 2017-10,44,47,45 170 | 2017-11,41,47,47 171 | 2017-12,39,45,56 172 | -------------------------------------------------------------------------------- /environment.yml: -------------------------------------------------------------------------------- 1 | name: fb_live_ny_res 2 | channels: 3 | - defaults 4 | dependencies: 5 | - jupyter=1.0.0=py36_3 6 | - matplotlib=2.0.2=np113py36_0 7 | - pandas=0.20.3=py36_0 8 | - scikit-learn=0.19.0=np113py36_0 9 | - scipy=0.19.1=np113py36_0 10 | - seaborn=0.8=py36_0 11 | - statsmodels=0.8.0 12 | --------------------------------------------------------------------------------