├── .gitignore ├── README.md ├── Section_1_Introduction_tutorial.ipynb ├── Section_2_ARIMA_Models_tutorial.ipynb ├── Section_3_ARIMA_Modeling_tutorial.ipynb ├── Section_4_SARIMA_tutorial.ipynb ├── Section_5_ClosingRemarks_tutorial.ipynb ├── data ├── HOUST.csv ├── T10yr.csv ├── TOTALSA.csv ├── TTLCON.csv ├── citi.csv ├── dji.csv ├── international-airline-passengers.csv ├── liquor.csv ├── mixedGLB.Ts.ERSSTV4.GHCN.CL.PA.csv ├── sentiment.csv ├── series1.csv └── series2.csv └── img └── svds_logo.png /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | .ipynb_checkpoints 3 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # PyData San Francisco 2016 - ARIMA Tutorial 2 | 3 | ## Shortlink to this page: http://bit.ly/svds-pydata-ts 4 | 5 | To clone this repository, run the command: 6 | ```bash 7 | git clone git@github.com:silicon-valley-data-science/pydata-sf-2016-arima-tutorial.git 8 | ``` 9 | 10 | Requirements (Anaconda/conda install) plus one bleeding edge package: 11 | ```bash 12 | pip install --pre statsmodels --upgrade 13 | ``` 14 | 15 | ## Conference page: http://pydata.org/sfo2016/schedule/presentation/38/ 16 | 17 | ## Tutorial video: https://www.youtube.com/watch?v=tJ-O3hk1vRw 18 | -------------------------------------------------------------------------------- /Section_1_Introduction_tutorial.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "\"SVDS\"" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# PyData San Francisco 2016\n", 15 | "## Applied Time Series Econometrics in Python (and R) Tutorial" 16 | ] 17 | }, 18 | { 19 | "cell_type": "markdown", 20 | "metadata": {}, 21 | "source": [ 22 | "# 
Abstract\n", 23 | "\n", 24 | "Time series data is ubiquitous, both within and outside of the data science field: weekly initial unemployment claims, tick-level stock prices, weekly company sales, daily number of steps taken recorded by a wearable, just to name a few. Some of the most important and commonly used data science techniques to analyze time series data are those developed in the field of statistics. For this reason, time series statistical models should be included in any data scientist's toolkit.\n", 25 | "\n", 26 | "This 120-minute tutorial covers the mathematical formulation, statistical foundation, and practical considerations of one of the most important classes of time series models: Autoregressive Integrated Moving Average with Explanatory Variables (ARIMAX) models, and their seasonal counterpart (SARIMAX)." 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": {}, 32 | "source": [ 33 | "# Topics included in the Tutorial\n", 34 | "\n", 35 | "- Common use cases of SARIMAX\n", 36 | "- The entire class of SARIMAX models, which includes Autoregressive (AR) models, Moving Average (MA) models, Mixed Autoregressive Moving Average (ARMA) models, Autoregressive Integrated Moving Average (ARIMA) models, these models with explanatory variables (ARIMAX), and these models with seasonal components and explanatory variables (SARIMAX)\n", 37 | "- Mathematical formulation\n", 38 | "- Underlying assumptions of this class of model\n", 39 | "- Implementation of these models in Python and R, in which I will compare and contrast the two, using simulated and real-world time-series data, which includes\n", 40 | " - Exploratory time series data analysis using histogram, kernel density plot, time-series plot, scatterplot matrix, plots of autocorrelation (i.e. 
correlogram), and plots of partial autocorrelation \n", 41 | " - Statistical estimation and its options available in Python and R\n", 42 | " - Simulation of these models\n", 43 | " - Order selection (using the celebrated Box-Jenkins approach)\n", 44 | " - Assumption testing and model evaluation \n", 45 | " - Forecasting" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "# Outline of the Tutorial\n", 53 | "\n", 54 | "### 1. Introduction\n", 55 | "\n", 56 | " - 1.1 Common use cases from different disciplines\n", 57 | " - 1.2 Common characteristics of time series\n", 58 | " - 1.3 The class of models to be covered today: A demo\n", 59 | " - Exercise 1\n", 60 | " \n", 61 | "### 2. ETSDA, ARIMA Model Formulation\n", 62 | " - 2.1 The notion of stochastic processes, time series, and stationarity\n", 63 | " - 2.2 Exploratory Time Series Data Analysis\n", 64 | " - 2.3 Mathematical formulation of ARIMA models\n", 65 | " - 2.4 The Box-Jenkins Approach to ARIMA Modeling\n", 66 | " - Exercise 2\n", 67 | " \n", 68 | "### 3. ARIMA Modeling\n", 69 | " - 3.1 Model Identification\n", 70 | " - 3.2 Model Diagnostic Checking\n", 71 | " - 3.3 Model performance evaluation (in-sample fit)\n", 72 | " - 3.4 Forecasting and forecast evaluation \n", 73 | " - 3.5 A few words on adding explanatory variables, their use cases, and practical suggestions\n", 74 | " - Exercise 3\n", 75 | "\n", 76 | "### 4. SARIMA Modeling\n", 77 | " - 4.1 Mathematical formulation of Seasonal ARIMA (SARIMA) models\n", 78 | " - 4.2 Building a seasonal ARIMA model for forecasting\n", 79 | " - Exercise 4\n", 80 | "\n", 81 | "### 5. Closing Remarks: Practical suggestions and other topics\n", 82 | " - 5.1 Model selection heuristics\n", 83 | " - 5.2 Where to go from here" 84 | ] 85 | }, 86 | { 87 | "cell_type": "markdown", 88 | "metadata": {}, 89 | "source": [ 90 | "
Note: You may notice that these notebooks are, at times, fairly dense. That is because there is likely more material here than we can cover today. This was done on purpose, as there is a lot to know. My hope is that you can continue your exploration of the topic with these notebooks, even after the tutorial has ended.
" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "
Requires `statsmodels` 0.8.0rc1 or newer. Install with `pip install --pre statsmodels --upgrade`.
" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": null, 103 | "metadata": { 104 | "ExecuteTime": { 105 | "end_time": "2016-08-24T11:55:46.760055", 106 | "start_time": "2016-08-24T11:55:44.886453" 107 | }, 108 | "collapsed": true 109 | }, 110 | "outputs": [], 111 | "source": [ 112 | "%load_ext autoreload\n", 113 | "%autoreload 2\n", 114 | "%matplotlib inline\n", 115 | "%config InlineBackend.figure_format='retina'\n", 116 | "\n", 117 | "from __future__ import absolute_import, division, print_function\n", 118 | "\n", 119 | "import pandas as pd\n", 120 | "import numpy as np\n", 121 | "\n", 122 | "import statsmodels.api as sm\n", 123 | "import statsmodels.formula.api as smf\n", 124 | "import statsmodels.tsa.api as smt\n", 125 | "\n", 126 | "# Display and Plotting\n", 127 | "import matplotlib.pylab as plt\n", 128 | "import seaborn as sns\n", 129 | "\n", 130 | "from ipywidgets import interactive, widgets, RadioButtons, ToggleButton, Select, FloatSlider, FloatRangeSlider, IntSlider, fixed\n", 131 | "\n", 132 | "pd.set_option('display.float_format', lambda x: '%.5f' % x) # pandas\n", 133 | "np.set_printoptions(precision=5, suppress=True) # numpy\n", 134 | "\n", 135 | "pd.set_option('display.max_columns', 100)\n", 136 | "pd.set_option('display.max_rows', 100)\n", 137 | "\n", 138 | "# seaborn plotting style\n", 139 | "sns.set(style='ticks', context='poster')" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "# 1. 
Introduction\n", 147 | " - 1.1 Common use cases from different disciplines\n", 148 | " - 1.2 Common characteristics of time series\n", 149 | " - 1.3 The class of models to be covered today: A demo" 150 | ] 151 | }, 152 | { 153 | "cell_type": "markdown", 154 | "metadata": { 155 | "collapsed": true 156 | }, 157 | "source": [ 158 | "## 1.1 Common Use Cases" 159 | ] 160 | }, 161 | { 162 | "cell_type": "markdown", 163 | "metadata": { 164 | "collapsed": true 165 | }, 166 | "source": [ 167 | "- Government Budget and Key Economic Indicator Projections\n", 168 | "- Companies forecast sales \n", 169 | "- CMS Projection on National Health Expenditure\n", 170 | "- NCES Projections of Education Statistics\n", 171 | "- Vehicular traffic flow forecasting\n", 172 | "- Dynamic resource allocation (e.g., servers)\n", 173 | "- Physiological models for health monitoring (e.g., glucose levels in diabetics)" 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": {}, 179 | "source": [ 180 | "## 1.2 Common characteristics of time series" 181 | ] 182 | }, 183 | { 184 | "cell_type": "markdown", 185 | "metadata": { 186 | "collapsed": true 187 | }, 188 | "source": [ 189 | "* Trend\n", 190 | "* Seasonality\n", 191 | "* Cycles\n", 192 | "* Combination of the above" 193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "metadata": {}, 198 | "source": [ 199 | "### Pattern 1: Trend and Fluctuation around the Trend\n", 200 | "\n", 201 | "Airline Passenger Bookings\n", 202 | "\n", 203 | "https://datamarket.com/data/set/22u3/international-airline-passengers-monthly-totals-in-thousands-jan-49-dec-60\n", 204 | "- `data/international-airline-passengers.csv`\n" 205 | ] 206 | }, 207 | { 208 | "cell_type": "code", 209 | "execution_count": null, 210 | "metadata": { 211 | "ExecuteTime": { 212 | "end_time": "2016-08-24T11:40:50.642369", 213 | "start_time": "2016-08-24T11:40:50.615605" 214 | }, 215 | "collapsed": true 216 | }, 217 | "outputs": [], 218 | "source": [ 219 | "air = 
pd.read_csv('data/international-airline-passengers.csv', header=0, index_col=0, parse_dates=[0])" 220 | ] 221 | }, 222 | { 223 | "cell_type": "code", 224 | "execution_count": null, 225 | "metadata": { 226 | "ExecuteTime": { 227 | "end_time": "2016-08-24T11:40:51.296736", 228 | "start_time": "2016-08-24T11:40:50.643896" 229 | }, 230 | "collapsed": false 231 | }, 232 | "outputs": [], 233 | "source": [ 234 | "fig, ax = plt.subplots(figsize=(8,6));\n", 235 | "\n", 236 | "air['n_pass_thousands'].plot(ax=ax);\n", 237 | "\n", 238 | "ax.set_title('International airline passengers, 1949-1960');\n", 239 | "ax.set_ylabel('Thousands of passengers');\n", 240 | "ax.set_xlabel('Year');\n", 241 | "ax.xaxis.set_ticks_position('bottom')\n", 242 | "fig.tight_layout();" 243 | ] 244 | }, 245 | { 246 | "cell_type": "code", 247 | "execution_count": null, 248 | "metadata": { 249 | "ExecuteTime": { 250 | "end_time": "2016-08-24T11:40:52.092458", 251 | "start_time": "2016-08-24T11:40:51.300100" 252 | }, 253 | "collapsed": false 254 | }, 255 | "outputs": [], 256 | "source": [ 257 | "# Examine annual trend in the data\n", 258 | "fig, ax = plt.subplots(figsize=(8,6));\n", 259 | "\n", 260 | "air['n_pass_thousands'].resample('AS').sum().plot(ax=ax)\n", 261 | "\n", 262 | "# ax.set_title('Aggregated annual series: International airline passengers, 1949-1960');\n", 263 | "fig.suptitle('Aggregated annual series: International airline passengers, 1949-1960');\n", 264 | "ax.set_ylabel('Thousands of passengers');\n", 265 | "ax.set_xlabel('Year');\n", 266 | "ax.xaxis.set_ticks_position('bottom')\n", 267 | "fig.tight_layout();\n", 268 | "fig.subplots_adjust(top=0.9)" 269 | ] 270 | }, 271 | { 272 | "cell_type": "code", 273 | "execution_count": null, 274 | "metadata": { 275 | "ExecuteTime": { 276 | "end_time": "2016-08-24T11:40:52.858761", 277 | "start_time": "2016-08-24T11:40:52.094540" 278 | }, 279 | "collapsed": false 280 | }, 281 | "outputs": [], 282 | "source": [ 283 | "# Examine seasonal trend in the 
data\n", 284 | "air['Month'] = air.index.strftime('%b')\n", 285 | "air['Year'] = air.index.year\n", 286 | "\n", 287 | "air_piv = air.pivot(index='Year', columns='Month', values='n_pass_thousands')\n", 288 | "\n", 289 | "air = air.drop(['Month', 'Year'], axis=1)\n", 290 | "\n", 291 | "# put the months in order\n", 292 | "month_names = pd.date_range(start='2016-01-01', periods=12, freq='MS').strftime('%b')\n", 293 | "air_piv = air_piv.reindex(columns=month_names)\n", 294 | "\n", 295 | "# plot it\n", 296 | "fig, ax = plt.subplots(figsize=(8, 6))\n", 297 | "air_piv.plot(ax=ax, kind='box');\n", 298 | "\n", 299 | "ax.set_xlabel('Month');\n", 300 | "ax.set_ylabel('Thousands of passengers');\n", 301 | "ax.set_title('Boxplot of seasonal values');\n", 302 | "ax.xaxis.set_ticks_position('bottom')\n", 303 | "fig.tight_layout();" 304 | ] 305 | }, 306 | { 307 | "cell_type": "markdown", 308 | "metadata": {}, 309 | "source": [ 310 | "### Pattern 2: Trend and Change in Structure\n", 311 | "\n", 312 | "Annual Average Global Temperature Change\n", 313 | "\n", 314 | "http://data.giss.nasa.gov/gistemp/graphs/graph_files.html - Land-Ocean: Global Means\n", 315 | "- `data/mixedGLB.Ts.ERSSTV4.GHCN.CL.PA.csv`" 316 | ] 317 | }, 318 | { 319 | "cell_type": "code", 320 | "execution_count": null, 321 | "metadata": { 322 | "ExecuteTime": { 323 | "end_time": "2016-08-24T11:40:53.800427", 324 | "start_time": "2016-08-24T11:40:52.860435" 325 | }, 326 | "collapsed": false 327 | }, 328 | "outputs": [], 329 | "source": [ 330 | "gtemp = pd.read_csv('data/mixedGLB.Ts.ERSSTV4.GHCN.CL.PA.csv', header=1, index_col=0, parse_dates=[0])\n", 331 | "gtemp['avg'] = gtemp.iloc[:,:12].mean(axis=1)\n", 332 | "\n", 333 | "fig, ax = plt.subplots(figsize=(8, 6));\n", 334 | "\n", 335 | "gtemp['avg'].plot(ax=ax);\n", 336 | "\n", 337 | "ax.set_title('Annual Average Global Temperature Change');\n", 338 | "\n", 339 | "ylim = (-1.0, 1.5)\n", 340 | "ax.set_ylim(ylim)\n", 341 | "\n", 342 | "ax.fill_betweenx(ylim, 
gtemp.index[0], pd.Timestamp('1922'), alpha=.1, zorder=-1, color='b');\n", 343 | "ax.fill_betweenx(ylim, pd.Timestamp('1922'), pd.Timestamp('1965'), alpha=.1, zorder=-1, color='g');\n", 344 | "ax.fill_betweenx(ylim, pd.Timestamp('1965'), gtemp.index[-1], alpha=.1, zorder=-1, color='r');\n", 345 | "\n", 346 | "ax.annotate('$\\\\longrightarrow$', (gtemp.index[15], -0.8));\n", 347 | "ax.annotate('$\\\\nearrow$', (gtemp.index[-30], 0));\n", 348 | "ax.xaxis.set_ticks_position('bottom')\n", 349 | "fig.tight_layout();" 350 | ] 351 | }, 352 | { 353 | "cell_type": "markdown", 354 | "metadata": {}, 355 | "source": [ 356 | "### Pattern 3: Variation around a Stable Mean\n", 357 | "\n", 358 | "Dow Jones Industrial Average\n", 359 | "- `data/dji.csv`\n", 360 | "\n", 361 | "```python\n", 362 | "import pandas_datareader.data as web\n", 363 | "\n", 364 | "start = pd.Timestamp('2006-04-20')\n", 365 | "end = pd.Timestamp('2016-04-20')\n", 366 | "\n", 367 | "dji = web.DataReader(\"^DJI\", 'yahoo', start, end)\n", 368 | "\n", 369 | "dji['Return_log'] = dji['Close'].apply(lambda x: np.log(x)).diff()\n", 370 | "\n", 371 | "dji.to_csv('data/dji.csv')\n", 372 | "```" 373 | ] 374 | }, 375 | { 376 | "cell_type": "code", 377 | "execution_count": null, 378 | "metadata": { 379 | "ExecuteTime": { 380 | "end_time": "2016-08-24T11:40:54.532548", 381 | "start_time": "2016-08-24T11:40:53.801747" 382 | }, 383 | "collapsed": false 384 | }, 385 | "outputs": [], 386 | "source": [ 387 | "dji = pd.read_csv('data/dji.csv', header=0, index_col=0)\n", 388 | "\n", 389 | "fig, ax = plt.subplots(figsize=(8, 7))\n", 390 | "\n", 391 | "dji['Return_log'].plot(ax=ax);\n", 392 | "\n", 393 | "ax.set_title('Dow Jones Industrial returns, 2006-2016');\n", 394 | "ax.xaxis.set_ticks_position('bottom')\n", 395 | "fig.tight_layout();" 396 | ] 397 | }, 398 | { 399 | "cell_type": "markdown", 400 | "metadata": {}, 401 | "source": [ 402 | "### Pattern 4: Cycles/Periodicity\n", 403 | "\n", 404 | "Number of annual sunspots" 405 | 
] 406 | }, 407 | { 408 | "cell_type": "code", 409 | "execution_count": null, 410 | "metadata": { 411 | "ExecuteTime": { 412 | "end_time": "2016-08-24T11:40:54.562494", 413 | "start_time": "2016-08-24T11:40:54.533876" 414 | }, 415 | "collapsed": false 416 | }, 417 | "outputs": [], 418 | "source": [ 419 | "print(sm.datasets.sunspots.NOTE)" 420 | ] 421 | }, 422 | { 423 | "cell_type": "code", 424 | "execution_count": null, 425 | "metadata": { 426 | "ExecuteTime": { 427 | "end_time": "2016-08-24T11:40:54.601703", 428 | "start_time": "2016-08-24T11:40:54.563767" 429 | }, 430 | "collapsed": false 431 | }, 432 | "outputs": [], 433 | "source": [ 434 | "sun = sm.datasets.sunspots.load_pandas().data\n", 435 | "sun['YEAR'] = pd.to_datetime(sun['YEAR'].astype(int), format='%Y')\n", 436 | "sun = sun.set_index('YEAR')" 437 | ] 438 | }, 439 | { 440 | "cell_type": "code", 441 | "execution_count": null, 442 | "metadata": { 443 | "ExecuteTime": { 444 | "end_time": "2016-08-24T11:40:54.642270", 445 | "start_time": "2016-08-24T11:40:54.603075" 446 | }, 447 | "collapsed": true 448 | }, 449 | "outputs": [], 450 | "source": [ 451 | "def tsplot(y, lags=None, title='', figsize=(14, 8)):\n", 452 | " '''Examine the patterns of ACF and PACF, along with the time series plot and histogram.\n", 453 | " \n", 454 | " Original source: https://tomaugspurger.github.io/modern-7-timeseries.html\n", 455 | " '''\n", 456 | " fig = plt.figure(figsize=figsize)\n", 457 | " layout = (2, 2)\n", 458 | " ts_ax = plt.subplot2grid(layout, (0, 0))\n", 459 | " hist_ax = plt.subplot2grid(layout, (0, 1))\n", 460 | " acf_ax = plt.subplot2grid(layout, (1, 0))\n", 461 | " pacf_ax = plt.subplot2grid(layout, (1, 1))\n", 462 | " \n", 463 | " y.plot(ax=ts_ax)\n", 464 | " ts_ax.set_title(title)\n", 465 | " y.plot(ax=hist_ax, kind='hist', bins=25)\n", 466 | " hist_ax.set_title('Histogram')\n", 467 | " smt.graphics.plot_acf(y, lags=lags, ax=acf_ax)\n", 468 | " smt.graphics.plot_pacf(y, lags=lags, ax=pacf_ax)\n", 469 | " 
[ax.set_xlim(0) for ax in [acf_ax, pacf_ax]]\n", 470 | " sns.despine()\n", 471 | " plt.tight_layout()\n", 472 | " return ts_ax, acf_ax, pacf_ax" 473 | ] 474 | }, 475 | { 476 | "cell_type": "code", 477 | "execution_count": null, 478 | "metadata": { 479 | "ExecuteTime": { 480 | "end_time": "2016-08-24T11:40:56.478273", 481 | "start_time": "2016-08-24T11:40:54.643768" 482 | }, 483 | "collapsed": false 484 | }, 485 | "outputs": [], 486 | "source": [ 487 | "tsplot(sun, lags=40);" 488 | ] 489 | }, 490 | { 491 | "cell_type": "markdown", 492 | "metadata": {}, 493 | "source": [ 494 | "## 1.3 Demo: Plot and model data generated from ARIMA process" 495 | ] 496 | }, 497 | { 498 | "cell_type": "code", 499 | "execution_count": null, 500 | "metadata": { 501 | "ExecuteTime": { 502 | "end_time": "2016-08-24T11:51:07.785101", 503 | "start_time": "2016-08-24T11:51:07.752205" 504 | }, 505 | "collapsed": true 506 | }, 507 | "outputs": [], 508 | "source": [ 509 | "def generate_arima_data(arparams,\n", 510 | " maparams,\n", 511 | " i_order=0,\n", 512 | " n_samp=120,\n", 513 | " rng_state=None,\n", 514 | " sigma=1,\n", 515 | " burnin=10,\n", 516 | " lin_trend=None,\n", 517 | " verbose=True,\n", 518 | " ):\n", 519 | " \n", 520 | " if rng_state is None:\n", 521 | " rng_state = np.random.RandomState()\n", 522 | " ar = np.r_[1, -arparams] # add zero-lag and negate\n", 523 | " ma = np.r_[1, maparams] # add zero-lag\n", 524 | " \n", 525 | " if verbose:\n", 526 | " arma_process = smt.ArmaProcess(ar, ma, nobs=n_samp)\n", 527 | " print('Is the process stationary? {}'.format(arma_process.isstationary))\n", 528 | " print('Is the process invertible? 
{}'.format(arma_process.isinvertible))\n", 529 | "\n", 530 | " y = smt.arma_generate_sample(ar, ma, n_samp, sigma=sigma, distrvs=rng_state.randn, burnin=burnin)\n", 531 | " # add deterministic linear trend\n", 532 | " if lin_trend is not None:\n", 533 | " y = y + np.cumsum(np.repeat(lin_trend, n_samp))\n", 534 | " for i in range(i_order):\n", 535 | " y = y.cumsum()\n", 536 | " \n", 537 | " return y" 538 | ] 539 | }, 540 | { 541 | "cell_type": "code", 542 | "execution_count": null, 543 | "metadata": { 544 | "ExecuteTime": { 545 | "end_time": "2016-08-24T11:51:08.340037", 546 | "start_time": "2016-08-24T11:51:08.149594" 547 | }, 548 | "collapsed": false 549 | }, 550 | "outputs": [], 551 | "source": [ 552 | "# the function for generating data and plotting\n", 553 | "def arima_data(n_samp=120,\n", 554 | " ar_gen=0,\n", 555 | " ar1_coef=0,\n", 556 | " ar2_coef=0,\n", 557 | " ar3_coef=0,\n", 558 | " ar4_coef=0,\n", 559 | " i_gen=0,\n", 560 | " ma_gen=0,\n", 561 | " ma1_coef=0,\n", 562 | " ma2_coef=0,\n", 563 | " ma3_coef=0,\n", 564 | " ma4_coef=0,\n", 565 | " rand_state=42,\n", 566 | " ylim=5,\n", 567 | " ar_fit_p=0,\n", 568 | " i_fit_d=0,\n", 569 | " ma_fit_q=0,\n", 570 | " n_train=108,\n", 571 | " n_forecast=24,\n", 572 | " dynamic=False,\n", 573 | " lin_trend=None,\n", 574 | " verbose=True,\n", 575 | " ):\n", 576 | " \n", 577 | " rng_state = np.random.RandomState(rand_state)\n", 578 | "\n", 579 | " arparams = np.array([ar1_coef, ar2_coef, ar3_coef, ar4_coef])\n", 580 | " arparams = arparams[:ar_gen]\n", 581 | " maparams = np.array([ma1_coef, ma2_coef, ma3_coef, ma4_coef])\n", 582 | " maparams = maparams[:ma_gen]\n", 583 | " \n", 584 | " print('Generated ARIMA({}, {}, {})'.format(ar_gen, i_gen, ma_gen))\n", 585 | " print('AR coeff = {}, MA coeff = {}'.format(arparams, maparams))\n", 586 | " \n", 587 | " y = generate_arima_data(arparams,\n", 588 | " maparams,\n", 589 | " i_gen,\n", 590 | " n_samp=n_samp,\n", 591 | " rng_state=rng_state,\n", 592 | " 
lin_trend=lin_trend,\n", 593 | " verbose=verbose,\n", 594 | " )\n", 595 | " \n", 596 | " # set a fake DatetimeIndex\n", 597 | " df = pd.DataFrame(data=y, columns=['value'], index=pd.date_range(start='1990-01-01', freq='MS', periods=len(y)))\n", 598 | " \n", 599 | " fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(16, 5))\n", 600 | " \n", 601 | " ax1.plot(df.iloc[:n_train]['value'], label='In-sample data', linestyle='-')\n", 602 | " # subtract 1 only to connect it to previous point in the graph\n", 603 | " ax1.plot(df.iloc[n_train-1:]['value'], label='Held-out data', linestyle='--')\n", 604 | "\n", 605 | " fitting=False\n", 606 | " if (((ar_gen > 0) and (ar1_coef != 0)) or ((ma_gen > 0) and (ma1_coef != 0))) and ((ar_fit_p > 0) or (ma_fit_q > 0)):\n", 607 | " fitting=True\n", 608 | " print('Fit ARIMA({}, {}, {})'.format(ar_fit_p, i_fit_d, ma_fit_q))\n", 609 | " \n", 610 | " training = df.iloc[:n_train]['value']\n", 611 | " \n", 612 | " if (lin_trend is not None) and (lin_trend > 0):\n", 613 | " #trend='t'\n", 614 | " \n", 615 | " # there's a bug in statsmodels 0.8.0rc1 regarding trend that has been fixed\n", 616 | " # https://github.com/statsmodels/statsmodels/issues/3111\n", 617 | " trend='n'\n", 618 | " else:\n", 619 | " trend='n'\n", 620 | " model = smt.SARIMAX(training, order=(ar_fit_p, i_fit_d, ma_fit_q),\n", 621 | " trend=trend,\n", 622 | " enforce_stationarity=False,\n", 623 | " enforce_invertibility=False,\n", 624 | " )\n", 625 | " results = model.fit()\n", 626 | " \n", 627 | " pred_begin = df.index[results.loglikelihood_burn]\n", 628 | " pred_end = df.index[n_train] + pd.DateOffset(months = n_forecast - 1)\n", 629 | " pred = results.get_prediction(start=pred_begin.strftime('%Y-%m-%d'),\n", 630 | " end=pred_end.strftime('%Y-%m-%d'),\n", 631 | " dynamic=dynamic)\n", 632 | " pred_mean = pred.predicted_mean\n", 633 | " pred_ci = pred.conf_int(alpha=0.05)\n", 634 | " \n", 635 | " ax1.plot(pred_mean, 'r', alpha=.6, label='Predicted values')\n", 636 | " 
ax1.fill_between(pred_ci.index,\n", 637 | " pred_ci.iloc[:, 0],\n", 638 | " pred_ci.iloc[:, 1], color='k', alpha=.2)\n", 639 | " # plot the residuals\n", 640 | " (df['value'] - pred_mean).dropna().plot(ax=ax2, marker='o')\n", 641 | " ax2.set_xlim((df.index[0], pred_end))\n", 642 | " ax2.set_title('Residuals ($data - model$)');\n", 643 | " ax2.axhline(y=0, linestyle='--', color='k', alpha=.5);\n", 644 | " \n", 645 | " # scale with i_gen\n", 646 | " ylim = ylim*(10**(i_gen))\n", 647 | " ax1.set_ylim((-ylim, ylim));\n", 648 | " ax1.legend(loc='best');\n", 649 | " \n", 650 | " if fitting:\n", 651 | " ax1.fill_betweenx(ax1.get_ylim(), df.index[n_train], pred_end, alpha=.1, zorder=-1)\n", 652 | " ax2.fill_betweenx(ax2.get_ylim(), df.index[n_train], pred_end, alpha=.1, zorder=-1)\n", 653 | " plt.show();\n", 654 | " print(results.summary())\n", 655 | " pass" 656 | ] 657 | }, 658 | { 659 | "cell_type": "code", 660 | "execution_count": null, 661 | "metadata": { 662 | "ExecuteTime": { 663 | "end_time": "2016-08-24T11:51:08.872897", 664 | "start_time": "2016-08-24T11:51:08.540231" 665 | }, 666 | "collapsed": false 667 | }, 668 | "outputs": [], 669 | "source": [ 670 | "# set up the widgets\n", 671 | "n_samp=120\n", 672 | "\n", 673 | "n_train=108\n", 674 | "n_forecast=24\n", 675 | "\n", 676 | "rand_state_init=42\n", 677 | "ylim_init=5\n", 678 | "\n", 679 | "# orders\n", 680 | "int_min = 0\n", 681 | "int_max = 4\n", 682 | "int_step = 1\n", 683 | "\n", 684 | "# sliders for data generation\n", 685 | "ar_gen_slider = IntSlider(value=0, min=int_min, max=int_max, step=int_step, continuous_update=False)\n", 686 | "i_gen_slider = IntSlider(value=0, min=int_min, max=int_max, step=int_step, continuous_update=False)\n", 687 | "ma_gen_slider = IntSlider(value=0, min=int_min, max=int_max, step=int_step, continuous_update=False)\n", 688 | "\n", 689 | "# coefficients\n", 690 | "lag_min = -1\n", 691 | "lag_max = 1\n", 692 | "lag_step = 0.1\n", 693 | "\n", 694 | "ar1_coef_slider = 
FloatSlider(value=0, min=lag_min, max=lag_max, step=lag_step, continuous_update=False)\n", 695 | "ar2_coef_slider = FloatSlider(value=0, min=lag_min, max=lag_max, step=lag_step, continuous_update=False)\n", 696 | "ar3_coef_slider = FloatSlider(value=0, min=lag_min, max=lag_max, step=lag_step, continuous_update=False)\n", 697 | "ar4_coef_slider = FloatSlider(value=0, min=lag_min, max=lag_max, step=lag_step, continuous_update=False)\n", 698 | "\n", 699 | "ma1_coef_slider = FloatSlider(value=0, min=lag_min, max=lag_max, step=lag_step, continuous_update=False)\n", 700 | "ma2_coef_slider = FloatSlider(value=0, min=lag_min, max=lag_max, step=lag_step, continuous_update=False)\n", 701 | "ma3_coef_slider = FloatSlider(value=0, min=lag_min, max=lag_max, step=lag_step, continuous_update=False)\n", 702 | "ma4_coef_slider = FloatSlider(value=0, min=lag_min, max=lag_max, step=lag_step, continuous_update=False)\n", 703 | "\n", 704 | "rand_slider = IntSlider(value=rand_state_init, min=0, max=10000, step=1, continuous_update=False)\n", 705 | "ylim_slider = IntSlider(value=ylim_init, min=1, max=100, step=1, continuous_update=False)\n", 706 | "\n", 707 | "# initial values and sliders for model parameters\n", 708 | "ar_fit_p_init=0\n", 709 | "i_fit_d_init=0\n", 710 | "ma_fit_q_init=0\n", 711 | "ar_fit_p_slider = IntSlider(value=ar_fit_p_init, min=int_min, max=int_max, step=int_step, continuous_update=False)\n", 712 | "i_fit_d_slider = IntSlider(value=i_fit_d_init, min=int_min, max=int_max, step=int_step, continuous_update=False)\n", 713 | "ma_fit_q_slider = IntSlider(value=ma_fit_q_init, min=int_min, max=int_max, step=int_step, continuous_update=False)\n", 714 | "\n", 715 | "# dynamic_init=n_train\n", 716 | "# dynamic_slider = IntSlider(value=dynamic_init, min=n_train-10, max=n_train+1, step=int_step, continuous_update=False)\n", 717 | "\n", 718 | "# lin_trend_init = 0\n", 719 | "# lin_trend_slider = FloatSlider(value=lin_trend_init, min=0, max=1.0, step=0.1, 
continuous_update=False)\n", 720 | "\n", 721 | "arima_w = interactive(\n", 722 | " arima_data,\n", 723 | " n_samp=fixed(n_samp),\n", 724 | " ar_gen=ar_gen_slider,\n", 725 | " ar1_coef=ar1_coef_slider,\n", 726 | " ar2_coef=ar2_coef_slider,\n", 727 | " ar3_coef=ar3_coef_slider,\n", 728 | " ar4_coef=ar4_coef_slider,\n", 729 | " i_gen=i_gen_slider,\n", 730 | " ma_gen=ma_gen_slider,\n", 731 | " ma1_coef=ma1_coef_slider,\n", 732 | " ma2_coef=ma2_coef_slider,\n", 733 | " ma3_coef=ma3_coef_slider,\n", 734 | " ma4_coef=ma4_coef_slider,\n", 735 | " rand_state=rand_slider,\n", 736 | " ylim=ylim_slider,\n", 737 | " ar_fit_p=ar_fit_p_slider,\n", 738 | " i_fit_d=i_fit_d_slider,\n", 739 | " ma_fit_q=ma_fit_q_slider,\n", 740 | " n_train=fixed(n_train),\n", 741 | " n_forecast=fixed(n_forecast),\n", 742 | " dynamic=fixed(False),\n", 743 | " #dynamic=dynamic_slider,\n", 744 | " lin_trend=fixed(None),\n", 745 | " #lin_trend=lin_trend_slider,\n", 746 | " verbose=fixed(True),\n", 747 | " )\n", 748 | "\n", 749 | "# arrange the widgets\n", 750 | "arima_widget = widgets.HBox([widgets.VBox(arima_w.children[:6]),\n", 751 | " widgets.VBox(arima_w.children[6:11]),\n", 752 | " widgets.VBox(arima_w.children[11:]),\n", 753 | " ])\n", 754 | "# this is the set of widgets in the function with defaults\n", 755 | "arima_widget.on_displayed(lambda x: arima_data(ar_gen=0,\n", 756 | " ar1_coef=0,\n", 757 | " ar2_coef=0,\n", 758 | " ar3_coef=0,\n", 759 | " ar4_coef=0,\n", 760 | " i_gen=0,\n", 761 | " ma_gen=0,\n", 762 | " ma1_coef=0,\n", 763 | " ma2_coef=0,\n", 764 | " ma3_coef=0,\n", 765 | " ma4_coef=0,\n", 766 | " rand_state=rand_state_init,\n", 767 | " ylim=ylim_init,\n", 768 | " ar_fit_p=ar_fit_p_init,\n", 769 | " i_fit_d=i_fit_d_init,\n", 770 | " ma_fit_q=ma_fit_q_init,\n", 771 | " n_train=n_train,\n", 772 | " #dynamic=dynamic_init,\n", 773 | " #lin_trend=lin_trend_init,\n", 774 | " ))" 775 | ] 776 | }, 777 | { 778 | "cell_type": "code", 779 | "execution_count": null, 780 | "metadata": { 781 | 
"ExecuteTime": { 782 | "end_time": "2016-08-24T11:51:10.280655", 783 | "start_time": "2016-08-24T11:51:09.374116" 784 | }, 785 | "collapsed": false 786 | }, 787 | "outputs": [], 788 | "source": [ 789 | "arima_widget" 790 | ] 791 | }, 792 | { 793 | "cell_type": "markdown", 794 | "metadata": { 795 | "collapsed": true 796 | }, 797 | "source": [ 798 | "### Exercise 1\n", 799 | "\n", 800 | "1. Write down 2 to 4 examples of time series that you encounter in real life.\n", 801 | "2. Use the widget above to simulate a number of time series. Do any of them resemble the ones that you encounter in real life?" 802 | ] 803 | }, 804 | { 805 | "cell_type": "code", 806 | "execution_count": null, 807 | "metadata": { 808 | "collapsed": true 809 | }, 810 | "outputs": [], 811 | "source": [] 812 | } 813 | ], 814 | "metadata": { 815 | "kernelspec": { 816 | "display_name": "Python 3", 817 | "language": "python", 818 | "name": "python3" 819 | }, 820 | "language_info": { 821 | "codemirror_mode": { 822 | "name": "ipython", 823 | "version": 3 824 | }, 825 | "file_extension": ".py", 826 | "mimetype": "text/x-python", 827 | "name": "python", 828 | "nbconvert_exporter": "python", 829 | "pygments_lexer": "ipython3", 830 | "version": "3.5.2" 831 | }, 832 | "nav_menu": {}, 833 | "toc": { 834 | "navigate_menu": true, 835 | "number_sections": false, 836 | "sideBar": true, 837 | "threshold": 6, 838 | "toc_cell": false, 839 | "toc_section_display": "block", 840 | "toc_window_display": false 841 | }, 842 | "widgets": { 843 | "state": { 844 | "e1d05cf91d904ccfa8f5327c13a904cf": { 845 | "views": [ 846 | { 847 | "cell_index": 31 848 | } 849 | ] 850 | } 851 | }, 852 | "version": "1.2.0" 853 | } 854 | }, 855 | "nbformat": 4, 856 | "nbformat_minor": 0 857 | } 858 | -------------------------------------------------------------------------------- /Section_2_ARIMA_Models_tutorial.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | 
"cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "\"SVDS\"" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# PyData San Francisco 2016\n", 15 | "## Applied Time Series Econometrics in Python (and R) Tutorial\n", 16 | "### Section 2: Exploratory Time Series Data Analysis and the Class of ARIMA Models" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "### Topics in this section include \n", 24 | "\n", 25 | "- 2.1 The notion of stochastic processes, time series, and stationarity\n", 26 | "- 2.2 Exploratory Time Series Data Analysis\n", 27 | "- 2.3 Mathematical formulation of ARIMA models\n", 28 | "- 2.4 An Introduction to the *Box-Jenkins Approach* to ARIMA Modeling" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "### 2.1 The notion of Stochastic Processes, Time Series, Stationarity, Autocorrelation\n", 36 | "
Note: This is a relatively dense section. However, it sets up the necessary framework for us to study the class of *Autoregressive Integrated Moving Average* models.
 \n", 37 | "\n", 38 | "#### Key Takeaways from this section:\n", 39 | "1. An observed time series is treated as a realization of an underlying probability model.\n", 40 | "2. We will study a certain class of probability models that comes with a very appealing (and simple) probability structure.\n", 41 | "3. (Weak) stationarity is a key requirement of the class of time series models that we will study.\n", 42 | "4. The autocorrelation function and the partial autocorrelation function are our main tools for analyzing a time series.\n" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "* The $\\textbf{autocovariance function}$ is defined as\n", 50 | "\n", 51 | "$$\\gamma_{x}(s,t) = cov(x_s,x_t) = E[(x_s-\\mu_s)(x_t-\\mu_t)] \\quad \\forall s,t$$\n", 52 | "\n", 53 | "* Two natural implications are $(1)$ $\\gamma_{x}(s,t) = \\gamma_{x}(t,s)$ and $(2)$ $\\gamma_{x}(s,s) = cov(x_s,x_s) = E[(x_s-\\mu_s)^2]$\n", 54 | "\n", 55 | "* A correlation of a variable with itself at different times is known as $\\textit{autocorrelation}$. If a time series model is second-order stationary (i.e. its mean and variance are constant, $\\mu_t = \\mu$ and $\\sigma_t^2 = \\sigma^2$ for all $t$, and its autocovariance depends only on the lag), then the $\\textit{autocovariance function}$ can be expressed as a function of the time lag $k$ alone:\n", 56 | "\n", 57 | "$$ \\gamma_k = E[(x_t-\\mu)(x_{t+k} - \\mu)] $$\n", 58 | " \n", 59 | "* Likewise, the autocorrelation function ($\\textit{acf}$) is defined as\n", 60 | "\n", 61 | "$$ \\rho_k = \\frac{\\gamma_k}{\\sigma^2} $$\n", 62 | " \n", 63 | "* When $k=0$, $\\rho_0 = 1$" 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": {}, 69 | "source": [ 70 | "#### Estimation of ($\\mathbf{1^{st}}$ order) Dependency:\n", 71 | "\n", 72 | "* Using the $\\textit{moment principles}$, the $\\textit{acvf}$ and $\\textit{acf}$ can be estimated from a time series by their sample equivalents. 
The sample $\\textit{acvf}$ is computed using the following formula:\n", 73 | "\n", 74 | "$$ \\hat{\\gamma}_k = \\frac{1}{T} \\sum_{t=1}^{T-k} \\left( x_t - \\bar{x} \\right) \\left( x_{t+k} - \\bar{x} \\right) $$\n", 75 | "\n", 76 | "* Note that the sum is divided by $T$ and not $T-k$.\n", 77 | "\n", 78 | "* The sample $\\textit{acf}$ is defined by\n", 79 | "\n", 80 | "$$ \\hat{\\rho}_k = \\frac{\\hat{\\gamma}_k}{\\hat{\\gamma}_0} = \\frac{\\frac{1}{T} \\sum_{t=1}^{T-k} \\left( x_t - \\bar{x} \\right) \\left( x_{t+k} - \\bar{x} \\right)}{ \\frac{1}{T} \\sum_{t=1}^{T} \\left( x_t - \\bar{x} \\right)^2} $$\n" 81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "metadata": {}, 86 | "source": [ 87 | "#### Notion of Stationarity:\n", 88 | "\n", 89 | "\n", 90 | "* A time series $\\{x_t\\}$ is said to be $\\textit{strictly stationary}$ if the joint distributions $F(x_{t_1}, \\dots, x_{t_n})$ and $F( x_{t_1+m}, \\dots, x_{t_n +m})$ are the same, $\\forall$ $t_1, \\dots, t_n$ and $m$. This is a very strong condition, too strong to be applied in practice; it implies that the distribution is unchanged for any time shift!\n", 91 | "\n", 92 | "* A weaker and more practical stationarity condition is that of $\\textit{weak stationarity}$ (or $\\textit{second order stationarity}$). A time series $\\{x_t\\}$ is said to be $\\textit{weakly stationary}$ if it is mean and variance stationary and its autocovariance $Cov(x_t,x_{t+k})$ depends only on the time displacement $k$ and can be written as $\\gamma(k)$. \n", 93 | "\n", 94 | "* Second order stationarity plays an important role in many of the time series models we will discuss in this tutorial; if a time series is second order stationary, then once a distribution assumption, such as normality, is imposed, the series can be completely characterized by its mean and covariance structure." 
95 | ] 96 | }, 97 | { 98 | "cell_type": "markdown", 99 | "metadata": {}, 100 | "source": [ 101 | "### 2.2 Exploratory Time Series Data Analysis\n", 102 | "\n", 103 | "* Now that we have introduced the essential concepts for characterizing the probability structure of a time series, we will proceed to \"$\\textit{explore}$\" these characteristics empirically.\n", 104 | "\n", 105 | "* Specifically, we will use the *time series plot, histogram (and its variants), plot of the sample autocorrelation, and plot of the sample partial autocorrelation* to examine a given time series. \n", 106 | "\n", 107 | "* These visuals play a crucial role in the $\\textit{Box-Jenkins approach}$ to ARIMA modeling." 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": null, 113 | "metadata": { 114 | "collapsed": false 115 | }, 116 | "outputs": [], 117 | "source": [ 118 | "# If importing pandas-datareader fails, run the following\n", 119 | "# !conda install pandas-datareader -y" 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": null, 125 | "metadata": { 126 | "collapsed": true 127 | }, 128 | "outputs": [], 129 | "source": [ 130 | "%load_ext autoreload\n", 131 | "%autoreload 2\n", 132 | "%matplotlib inline\n", 133 | "%config InlineBackend.figure_format='retina'\n", 134 | "\n", 135 | "from __future__ import absolute_import, division, print_function\n", 136 | "\n", 137 | "import sys\n", 138 | "import os\n", 139 | "\n", 140 | "import pandas as pd\n", 141 | "import numpy as np\n", 142 | "\n", 143 | "# # Remote Data Access\n", 144 | "# import pandas_datareader.data as web\n", 145 | "# import datetime\n", 146 | "# # reference: https://pandas-datareader.readthedocs.io/en/latest/remote_data.html\n", 147 | "\n", 148 | "# TSA from Statsmodels\n", 149 | "import statsmodels.api as sm\n", 150 | "import statsmodels.formula.api as smf\n", 151 | "import statsmodels.tsa.api as smt\n", 152 | "\n", 153 | "# Display and Plotting\n", 154 | "import 
matplotlib.pylab as plt\n", 155 | "import seaborn as sns\n", 156 | "\n", 157 | "pd.set_option('display.float_format', lambda x: '%.5f' % x) # pandas\n", 158 | "np.set_printoptions(precision=5, suppress=True) # numpy\n", 159 | "\n", 160 | "pd.set_option('display.max_columns', 100)\n", 161 | "pd.set_option('display.max_rows', 100)\n", 162 | "\n", 163 | "# seaborn plotting style\n", 164 | "sns.set(style='ticks', context='poster')" 165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": null, 170 | "metadata": { 171 | "collapsed": false 172 | }, 173 | "outputs": [], 174 | "source": [ 175 | "# Load data from the internet using the Remote Data Reader\n", 176 | "# This is a very useful function, as it allows one to access a lot of time series and (non-time series) data publicly\n", 177 | "# available on the internet\n", 178 | "\n", 179 | "# start = pd.Timestamp('2000-01-01')\n", 180 | "# end = pd.Timestamp('2016-07-31')\n", 181 | "\n", 182 | "# C = web.DataReader(\"C\", 'yahoo', start, end)\n", 183 | "# Sentiment= web.DataReader(\"UMCSENT\", 'fred', start, end)\n", 184 | "# T10yr = web.DataReader(\"^TNX\", 'yahoo', start, end)\n", 185 | "\n", 186 | "# Save the DataFrame to a csv file\n", 187 | "# Sentiment.to_csv('data/sentiment.csv')\n", 188 | "# C.to_csv('data/citi.csv')\n", 189 | "# T10yr.to_csv('data/T10yr.csv')" 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": null, 195 | "metadata": { 196 | "collapsed": false 197 | }, 198 | "outputs": [], 199 | "source": [ 200 | "#Read the data\n", 201 | "\n", 202 | "Sentiment = 'data/sentiment.csv'\n", 203 | "Sentiment = pd.read_csv(Sentiment, index_col=0, parse_dates=[0])\n", 204 | "\n", 205 | "C = 'data/citi.csv'\n", 206 | "C = pd.read_csv(C, index_col=0, parse_dates=[0])\n", 207 | "\n", 208 | "T10yr = 'data/T10yr.csv'\n", 209 | "T10yr = pd.read_csv(T10yr, index_col=0, parse_dates=[0])" 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": null, 215 | 
"metadata": { 216 | "collapsed": false 217 | }, 218 | "outputs": [], 219 | "source": [ 220 | "print(\"Citigroup's stock price:\", \"\\n\", C.dtypes, \"\\n\")\n", 221 | "print(\"10 Year Treasury Bond Rate:\", \"\\n\", T10yr.dtypes, \"\\n\")\n", 222 | "print(\"University of Michigan: Consumer Sentiment:\", \"\\n\", Sentiment.dtypes)" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": null, 228 | "metadata": { 229 | "collapsed": false 230 | }, 231 | "outputs": [], 232 | "source": [ 233 | "T10yr.head()" 234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "execution_count": null, 239 | "metadata": { 240 | "collapsed": false 241 | }, 242 | "outputs": [], 243 | "source": [ 244 | "C['Close'].head(10)" 245 | ] 246 | }, 247 | { 248 | "cell_type": "code", 249 | "execution_count": null, 250 | "metadata": { 251 | "collapsed": false 252 | }, 253 | "outputs": [], 254 | "source": [ 255 | "Sentiment.head()" 256 | ] 257 | }, 258 | { 259 | "cell_type": "code", 260 | "execution_count": null, 261 | "metadata": { 262 | "collapsed": false 263 | }, 264 | "outputs": [], 265 | "source": [ 266 | "C.close = C['Close']\n", 267 | "T10yr.close = T10yr['Close']" 268 | ] 269 | }, 270 | { 271 | "cell_type": "markdown", 272 | "metadata": {}, 273 | "source": [ 274 | "### Plots Used in Exploratory Time Series Analysis\n", 275 | "\n", 276 | "* Time series plot: to visualize the dynamics and evolution of the series\n", 277 | "* Histogram or nonparametric density estimate: to visualize the distribution \n", 278 | "* Sample ACF and PACF graphs: to examine autocorrelation and partial autocorrelation\n", 279 | "* Scatterplot matrix on lags: an alternative way to visualize autocorrelation of the series" 280 | ] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "execution_count": null, 285 | "metadata": { 286 | "collapsed": false 287 | }, 288 | "outputs": [], 289 | "source": [ 290 | "Sentiment.head()" 291 | ] 292 | }, 293 | { 294 | "cell_type": "code", 295 | "execution_count": null, 296 | 
"metadata": { 297 | "collapsed": false 298 | }, 299 | "outputs": [], 300 | "source": [ 301 | "# Select the series from 2005 - 2016\n", 302 | "sentiment_short = Sentiment.loc['2005':'2016']" 303 | ] 304 | }, 305 | { 306 | "cell_type": "code", 307 | "execution_count": null, 308 | "metadata": { 309 | "collapsed": false 310 | }, 311 | "outputs": [], 312 | "source": [ 313 | "sentiment_short.index[:5]" 314 | ] 315 | }, 316 | { 317 | "cell_type": "code", 318 | "execution_count": null, 319 | "metadata": { 320 | "collapsed": false 321 | }, 322 | "outputs": [], 323 | "source": [ 324 | "print(sentiment_short.dtypes)" 325 | ] 326 | }, 327 | { 328 | "cell_type": "code", 329 | "execution_count": null, 330 | "metadata": { 331 | "collapsed": false 332 | }, 333 | "outputs": [], 334 | "source": [ 335 | "sentiment_short.plot(figsize=(12,8))\n", 336 | "plt.legend(bbox_to_anchor=(1.25, 0.5))\n", 337 | "plt.title(\"Consumer Sentiment\")\n", 338 | "sns.despine()" 339 | ] 340 | }, 341 | { 342 | "cell_type": "code", 343 | "execution_count": null, 344 | "metadata": { 345 | "collapsed": false 346 | }, 347 | "outputs": [], 348 | "source": [ 349 | "fig = plt.figure(figsize=(12,8))\n", 350 | "\n", 351 | "ax1 = fig.add_subplot(211)\n", 352 | "fig = sm.graphics.tsa.plot_acf(sentiment_short, lags=20, ax=ax1)\n", 353 | "ax1.xaxis.set_ticks_position('bottom')\n", 354 | "fig.tight_layout();\n", 355 | "\n", 356 | "ax2 = fig.add_subplot(212)\n", 357 | "fig = sm.graphics.tsa.plot_pacf(sentiment_short, lags=20, ax=ax2)\n", 358 | "ax2.xaxis.set_ticks_position('bottom')\n", 359 | "fig.tight_layout();" 360 | ] 361 | }, 362 | { 363 | "cell_type": "code", 364 | "execution_count": null, 365 | "metadata": { 366 | "collapsed": false 367 | }, 368 | "outputs": [], 369 | "source": [ 370 | "# Scatterplot matrix is another way to visualize the autocorrelation\n", 371 | "# Its advantage is that it is very intuitive, as scatterplot (i.e. 
one of the plots in a scatterplot matrix) \n", 372 | "# is used often in practice\n", 373 | "\n", 374 | "lags=9\n", 375 | "\n", 376 | "ncols=3\n", 377 | "nrows=int(np.ceil(lags/ncols))\n", 378 | "\n", 379 | "fig, axes = plt.subplots(ncols=ncols, nrows=nrows, figsize=(4*ncols, 4*nrows))\n", 380 | "\n", 381 | "for ax, lag in zip(axes.flat, np.arange(1,lags+1, 1)):\n", 382 | "    lag_str = 't-{}'.format(lag)\n", 383 | "    X = (pd.concat([sentiment_short, sentiment_short.shift(lag)], axis=1,\n", 384 | "                   keys=['y'] + [lag_str]).dropna())\n", 385 | "\n", 386 | "    X.plot(ax=ax, kind='scatter', y='y', x=lag_str);\n", 387 | "    corr = X.corr().values[0][1]\n", 388 | "    ax.set_ylabel('Original')\n", 389 | "    ax.set_title('Lag: {} (corr={:.2f})'.format(lag_str, corr));\n", 390 | "    ax.set_aspect('equal');\n", 391 | "    sns.despine();\n", 392 | "\n", 393 | "fig.tight_layout();" 394 | ] 395 | }, 396 | { 397 | "cell_type": "code", 398 | "execution_count": null, 399 | "metadata": { 400 | "collapsed": true 401 | }, 402 | "outputs": [], 403 | "source": [ 404 | "# Or, we can plot the four essential plots all at once:\n", 405 | "\n", 406 | "def tsplot(y, lags=None, title='', figsize=(14, 8)):\n", 407 | "    '''Examine the patterns of ACF and PACF, along with the time series plot and histogram.\n", 408 | "    \n", 409 | "    Original source: https://tomaugspurger.github.io/modern-7-timeseries.html\n", 410 | "    '''\n", 411 | "    fig = plt.figure(figsize=figsize)\n", 412 | "    layout = (2, 2)\n", 413 | "    ts_ax = plt.subplot2grid(layout, (0, 0))\n", 414 | "    hist_ax = plt.subplot2grid(layout, (0, 1))\n", 415 | "    acf_ax = plt.subplot2grid(layout, (1, 0))\n", 416 | "    pacf_ax = plt.subplot2grid(layout, (1, 1))\n", 417 | "    \n", 418 | "    y.plot(ax=ts_ax)\n", 419 | "    ts_ax.set_title(title)\n", 420 | "    y.plot(ax=hist_ax, kind='hist', bins=25)\n", 421 | "    hist_ax.set_title('Histogram')\n", 422 | "    smt.graphics.plot_acf(y, lags=lags, ax=acf_ax)\n", 423 | "    smt.graphics.plot_pacf(y, lags=lags, ax=pacf_ax)\n", 424 | "    
[ax.set_xlim(0) for ax in [acf_ax, pacf_ax]]\n", 425 | "    sns.despine()\n", 426 | "    plt.tight_layout()\n", 427 | "    return ts_ax, acf_ax, pacf_ax" 428 | ] 429 | }, 430 | { 431 | "cell_type": "code", 432 | "execution_count": null, 433 | "metadata": { 434 | "collapsed": false 435 | }, 436 | "outputs": [], 437 | "source": [ 438 | "tsplot(sentiment_short, title='Consumer Sentiment', lags=36);" 439 | ] 440 | }, 441 | { 442 | "cell_type": "markdown", 443 | "metadata": {}, 444 | "source": [ 445 | "### 2.3 Mathematical formulation of ARIMA models\n", 446 | "\n", 447 | "* A time series $\\{z_t\\}$ follows an ARIMA$(p,d,q)$ process if the $d^{th}$ difference of the $\\{z_t\\}$ series is an ARMA($p,q$) process. Using the lag operator, it can be expressed as \n", 448 | "\n", 449 | "$$\\begin{equation}\n", 450 | " \\phi_p(B)(1-B)^d z_t = \\theta_q(B) \\omega_t\n", 451 | "\\end{equation}$$\n", 452 | "\n", 453 | "where $\\phi_p$ and $\\theta_q$ are polynomials of orders $p$ and $q$.\n", 454 | "\n", 455 | "* The general ARIMA$(p,d,q)$ form may seem abstract. Whenever a model is presented this way, you can get a feel for it by working through simple cases, such as a low order ARIMA$(p,d,q)$ model. \n", 456 | "\n", 457 | "\n", 458 | "\n", 459 | "* The two examples below unpack some of this notation:\n", 460 | "\n", 461 | "$\\textbf{Example 1:}$\n", 462 | "Consider the model $ z_t = z_{t-1} + \\omega_t + \\theta \\omega_{t-1}$. Re-write this model using the lag (or backward shift) operator. By now, we should be familiar with this kind of manipulation:\n", 463 | "\n", 464 | "$$\\begin{align}\n", 465 | " z_t &= z_{t-1} + \\omega_t + \\theta \\omega_{t-1} \\\\\n", 466 | " z_t - z_{t-1} &= \\omega_t + \\theta \\omega_{t-1} \\\\\n", 467 | " (1-B)z_t &= (1+\\theta B)\\omega_t\n", 468 | "\\end{align}$$\n", 469 | "\n", 470 | "where $B$ is the lag operator that, when applied to $z_t$, gives $z_{t-1}$. 
That is, $Bz_t = z_{t-1}$.\n", 471 | "\n", 472 | "* This is an ARIMA(0,1,1) model, or $\\textit{integrated moving average}$ model (IMA(1,1)).\n", 473 | "\n", 474 | "$\\textbf{Example 2:}$\n", 475 | "Consider a model of the form\n", 476 | "\n", 477 | "$$\\begin{equation}\n", 478 | " z_t = \\phi z_{t-1} + z_{t-1} - \\phi z_{t-2} + \\omega_t\n", 479 | "\\end{equation}$$\n", 480 | "\n", 481 | "* Rewrite the equation, re-arrange terms, and factorize them:\n", 482 | "\n", 483 | "$$\\begin{align}\n", 484 | " z_t - z_{t-1} &= \\phi (z_{t-1} - z_{t-2}) + \\omega_t \\\\\n", 485 | " (z_t - z_{t-1}) - \\phi (z_{t-1} - z_{t-2}) &= \\omega_t \\\\\n", 486 | " (1 - \\phi B)(z_t - z_{t-1}) &= \\omega_t \\\\\n", 487 | " (1 - \\phi B) \\bigtriangledown z_t &= \\omega_t \\\\\n", 488 | " (1 - \\phi B)(1 - B)z_t &= \\omega_t\n", 489 | "\\end{align}$$\n", 490 | "\n", 491 | "The model can therefore be written as $(1 - \\phi B) \\bigtriangledown z_t = \\omega_t$, which is an ARIMA(1,1,0) model.\n", 492 | "\n", 493 | "**Sidenotes**\n", 494 | "\n", 495 | "A series $\\{z_t\\}$ is $\\textit{integrated}$ of order $d$, denoted as $I(d)$, if the $d^{th}$ difference of $\\{z_t\\}$ is white noise: $\\bigtriangledown^d z_t = \\omega_t$, where $\\bigtriangledown^d \\equiv (1-B)^d$:\n", 496 | "\n", 497 | "$$\\begin{equation}\n", 498 | " (1-B)^d z_t = \\omega_t\n", 499 | "\\end{equation}$$\n", 500 | "\n", 501 | "As such, a random walk is the special case I(1).\n", 502 | "\n", 503 | "* In practice, the I(0) and I(1) cases are the most widely used.\n", 504 | "\n" 505 | ] 506 | }, 507 | { 508 | "cell_type": "markdown", 509 | "metadata": {}, 510 | "source": [ 511 | "### 2.4 An Overview of the Box-Jenkins Approach to Non-Seasonal ARIMA Modeling\n", 512 | "\n", 513 | "1. Assess the stationarity of the process $z_t$\n", 514 | "2. If the process is not stationary, difference it (i.e. 
create an integrated model) as many times as needed to produce a stationary process to be modeled using the $\\textit{mixed autoregressive-moving average process}$ described above.\n", 515 | "3. Identify the resulting ARMA model (i.e. determine the order of the process).\n", 516 | "    * The sample autocorrelation and sample partial autocorrelation functions are tools used in steps $1$ and $2$.\n", 517 | "\n", 518 | "In practice, other steps are necessary in order to produce a usable model. These steps include:\n", 519 | "- Model diagnostic checking\n", 520 | "- Re-specification of the model if one or more of the underlying statistical assumptions is not satisfied\n", 521 | "- Model selection\n", 522 | "- Statistical inference and/or forecasting\n", 523 | "- Forecast evaluation" 524 | ] 525 | }, 526 | { 527 | "cell_type": "markdown", 528 | "metadata": {}, 529 | "source": [ 530 | "
 \n", 531 | "**Exercise 2:**\n", 532 | "\n", 533 | "Let's use *series1.csv* and conduct the exploratory data analysis.\n", 534 | "
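Before reading the plots, it may help to recall what the sample ACF actually measures. The sketch below is not part of the original tutorial: it applies the sample autocovariance formula from Section 2.1 (dividing by $T$, not $T-k$) to two simulated series. The names `sample_acf`, `noise`, and `walk` are hypothetical, and only NumPy is assumed.

```python
import numpy as np


def sample_acf(x, max_lag):
    '''Sample autocorrelations rho_hat_0 .. rho_hat_max_lag (1/T convention).'''
    x = np.asarray(x, dtype=float)
    T = x.size
    dev = x - x.mean()
    # gamma_hat_0: sample variance with divisor T
    gamma0 = np.dot(dev, dev) / T
    acf = np.empty(max_lag + 1)
    for k in range(max_lag + 1):
        # gamma_hat_k: sum over the T-k overlapping pairs, still divided by T
        gamma_k = np.dot(dev[:T - k], dev[k:]) / T
        acf[k] = gamma_k / gamma0
    return acf


rng = np.random.RandomState(0)
noise = rng.standard_normal(500)   # white noise: stationary, uncorrelated
walk = np.cumsum(noise)            # random walk: an I(1), non-stationary series

acf_noise = sample_acf(noise, 5)
acf_walk = sample_acf(walk, 5)
print('white noise ACF:', acf_noise.round(3))
print('random walk ACF:', acf_walk.round(3))
```

For white noise the autocorrelations beyond lag 0 should hover near zero, while for the random walk they stay close to one and decay very slowly, which is the kind of visual cue to look for in this exercise.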
" 535 | ] 536 | }, 537 | { 538 | "cell_type": "code", 539 | "execution_count": null, 540 | "metadata": { 541 | "collapsed": true 542 | }, 543 | "outputs": [], 544 | "source": [ 545 | "# Step 1: Import the csv file containing the series for the analysis\n", 546 | "filename_ts = 'data/series1.csv'\n", 547 | "ts_df = pd.read_csv(filename_ts, index_col=0, parse_dates=[0])" 548 | ] 549 | }, 550 | { 551 | "cell_type": "code", 552 | "execution_count": null, 553 | "metadata": { 554 | "collapsed": false 555 | }, 556 | "outputs": [], 557 | "source": [ 558 | "# Step 2: Explore the patterns of the time series and its autocorrelation and partial autocorrelation structure\n", 559 | "\n", 560 | "# Choose the number of lags to display the sample ACF and PACF\n", 561 | "n_lag=25\n", 562 | "graph_title=\"Series 1\"\n", 563 | "\n", 564 | "# Make sure the tsplot() function is defined before running the following command\n", 565 | "tsplot(ts_df, title=graph_title, lags=n_lag);" 566 | ] 567 | }, 568 | { 569 | "cell_type": "markdown", 570 | "metadata": {}, 571 | "source": [ 572 | "**Step 3**\n", 573 | "Type your observations here and discuss with your neighbors.\n", 574 | "* Are there any trends, seasonality, or cycles?\n", 575 | "* What is the pattern of the ACF? Does it decline exponentially or dampen towards zero? Does it have a sharp cut-off?\n", 576 | "* What about the PACF?" 
577 | ] 578 | }, 579 | { 580 | "cell_type": "code", 581 | "execution_count": null, 582 | "metadata": { 583 | "collapsed": true 584 | }, 585 | "outputs": [], 586 | "source": [] 587 | } 588 | ], 589 | "metadata": { 590 | "kernelspec": { 591 | "display_name": "Python 3", 592 | "language": "python", 593 | "name": "python3" 594 | }, 595 | "language_info": { 596 | "codemirror_mode": { 597 | "name": "ipython", 598 | "version": 3 599 | }, 600 | "file_extension": ".py", 601 | "mimetype": "text/x-python", 602 | "name": "python", 603 | "nbconvert_exporter": "python", 604 | "pygments_lexer": "ipython3", 605 | "version": "3.5.2" 606 | } 607 | }, 608 | "nbformat": 4, 609 | "nbformat_minor": 0 610 | } 611 | -------------------------------------------------------------------------------- /Section_3_ARIMA_Modeling_tutorial.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "\"SVDS\"" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# PyData San Francisco 2016\n", 15 | "## Applied Time Series Econometrics in Python (and R) Tutorial\n", 16 | "### Section 3: ARIMAX Models" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "### Topics in this section include \n", 24 | "\n", 25 | "\n", 26 | " - 3.1 Model Estimation and Identification\n", 27 | " - 3.2 Model Diagnostic Checking\n", 28 | " * Define the stationary and invertible conditions for $ARIMA(p,d,q)$ models\n", 29 | " - 3.3 Model performance evaluation (in-sample fit)\n", 30 | " - 3.4 Forecasting and forecast evaluation \n", 31 | " - 3.5 A few words on adding explanatory variables, its use cases, and its practical suggestions" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": null, 37 | "metadata": { 38 | "collapsed": false 39 | }, 40 | "outputs": [], 41 | "source": [ 42 | "%load_ext 
autoreload\n", 43 | "%autoreload 2\n", 44 | "%matplotlib inline\n", 45 | "%config InlineBackend.figure_format='retina'\n", 46 | "\n", 47 | "from __future__ import absolute_import, division, print_function\n", 48 | "\n", 49 | "import sys\n", 50 | "import os\n", 51 | "\n", 52 | "import pandas as pd\n", 53 | "import numpy as np\n", 54 | "\n", 55 | "# TSA from Statsmodels\n", 56 | "import statsmodels.api as sm\n", 57 | "import statsmodels.formula.api as smf\n", 58 | "import statsmodels.tsa.api as smt\n", 59 | "\n", 60 | "# Display and Plotting\n", 61 | "import matplotlib.pylab as plt\n", 62 | "import seaborn as sns\n", 63 | "\n", 64 | "pd.set_option('display.float_format', lambda x: '%.5f' % x) # pandas\n", 65 | "np.set_printoptions(precision=5, suppress=True) # numpy\n", 66 | "\n", 67 | "pd.set_option('display.max_columns', 100)\n", 68 | "pd.set_option('display.max_rows', 100)\n", 69 | "\n", 70 | "# seaborn plotting style\n", 71 | "sns.set(style='ticks', context='poster')" 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": {}, 77 | "source": [ 78 | "
 \n", 79 | "\n", 80 | "**Read a series stored in a CSV file.** This is the same series we used in *Exercise 2*.\n", 81 | "
" 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": null, 87 | "metadata": { 88 | "collapsed": false 89 | }, 90 | "outputs": [], 91 | "source": [ 92 | "# Import the csv file containing the series for the analysis\n", 93 | "# This is the file we just analyzed in Exercise 2\n", 94 | "\n", 95 | "filename_ts = 'data/series1.csv'\n", 96 | "ts_df = pd.read_csv(filename_ts, index_col=0, parse_dates=[0])\n", 97 | "\n", 98 | "n_sample = ts_df.shape[0]" 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": null, 104 | "metadata": { 105 | "collapsed": false 106 | }, 107 | "outputs": [], 108 | "source": [ 109 | "print(ts_df.shape)\n", 110 | "print(ts_df.head())" 111 | ] 112 | }, 113 | { 114 | "cell_type": "markdown", 115 | "metadata": {}, 116 | "source": [ 117 | "### 3.1 Model Identification (ARIMA Model Determination)\n", 118 | "\n", 119 | "1. Determine the *degree of differencing*, $d$\n", 120 | "\n", 121 | "2. Study the patterns of the ACF and PACF of the appropriately differenced series, $w_t = (1-B)^d z_t$, as these autocorrelation functions indicate suitable orders for the autoregressive and moving average components. While we do not have enough time in this tutorial, it is very beneficial to study the *theoretical* ACF and PACF of the autoregressive, moving average, and mixed autoregressive-moving average processes.\n", 122 | "\n", 123 | "3. The table below summarizes the patterns of the ACF and PACF associated with the $AR(p)$, $MA(q)$, and $ARMA(p,q)$ processes:\n", 124 | "\n", 125 | "| Process | ACF | PACF |\n", 126 | "|---------------|:--------------------:|:--------------------:|\n", 127 | "| **AR(p)** | tails off | cutoff after lag $p$ |\n", 128 | "| **MA(q)** | cutoff after lag $q$ | tails off |\n", 129 | "| **ARMA(p,q)** | tails off | tails off |\n", 130 | "\n", 131 | "4. 
In general, the ACF of an autoregressive process is similar to the PACF of a moving average process, and vice versa.\n", 132 | "5. Keep in mind that these are theoretical properties. In practice, the estimated sample ACF and PACF can come with large variances, deviating from the underlying theoretical behavior. As such, it is prudent to treat these as broad characteristics only; it is quite possible that several candidate models remain and will need to be investigated further in later stages of the modeling process." 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": null, 138 | "metadata": { 139 | "collapsed": false 140 | }, 141 | "outputs": [], 142 | "source": [ 143 | "# Create a training sample and testing sample before analyzing the series\n", 144 | "\n", 145 | "n_train=int(0.95*n_sample)+1\n", 146 | "n_forecast=n_sample-n_train\n", 147 | "#ts_df\n", 148 | "ts_train = ts_df.iloc[:n_train]['value']\n", 149 | "ts_test = ts_df.iloc[n_train:]['value']\n", 150 | "print(ts_train.shape)\n", 151 | "print(ts_test.shape)\n", 152 | "print(\"Training Series:\", \"\\n\", ts_train.tail(), \"\\n\")\n", 153 | "print(\"Testing Series:\", \"\\n\", ts_test.head())" 154 | ] 155 | }, 156 | { 157 | "cell_type": "code", 158 | "execution_count": null, 159 | "metadata": { 160 | "collapsed": true 161 | }, 162 | "outputs": [], 163 | "source": [ 164 | "def tsplot(y, lags=None, title='', figsize=(14, 8)):\n", 165 | "    '''Examine the patterns of ACF and PACF, along with the time series plot and histogram.\n", 166 | "    \n", 167 | "    Source: https://tomaugspurger.github.io/modern-7-timeseries.html\n", 168 | "    '''\n", 169 | "    fig = plt.figure(figsize=figsize)\n", 170 | "    layout = (2, 2)\n", 171 | "    ts_ax = plt.subplot2grid(layout, (0, 0))\n", 172 | "    hist_ax = plt.subplot2grid(layout, (0, 1))\n", 173 | "    acf_ax = plt.subplot2grid(layout, (1, 0))\n", 174 | "    pacf_ax = plt.subplot2grid(layout, (1, 1))\n", 175 | "    \n", 176 | "    y.plot(ax=ts_ax)\n", 
177 | "    ts_ax.set_title(title)\n", 178 | "    y.plot(ax=hist_ax, kind='hist', bins=25)\n", 179 | "    hist_ax.set_title('Histogram')\n", 180 | "    smt.graphics.plot_acf(y, lags=lags, ax=acf_ax)\n", 181 | "    smt.graphics.plot_pacf(y, lags=lags, ax=pacf_ax)\n", 182 | "    [ax.set_xlim(0) for ax in [acf_ax, pacf_ax]]\n", 183 | "    sns.despine()\n", 184 | "    fig.tight_layout()\n", 185 | "    return ts_ax, acf_ax, pacf_ax" 186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": null, 191 | "metadata": { 192 | "collapsed": false 193 | }, 194 | "outputs": [], 195 | "source": [ 196 | "tsplot(ts_train, title='A Given Training Series', lags=20);" 197 | ] 198 | }, 199 | { 200 | "cell_type": "markdown", 201 | "metadata": {}, 202 | "source": [ 203 | "**Observations from the sample ACF and sample PACF (based on first 20 lags)**\n", 204 | "\n", 205 | "- The sample autocorrelation gradually tails off.\n", 206 | "- The sample partial autocorrelation does not exactly cut off at some lag $p$ but does not exactly tail off either.\n", 207 | "- Based on these observations, we could attempt an ARIMA(2,0,0) model as a starting point, although other orders could serve as candidates as well." 
208 | ] 209 | }, 210 | { 211 | "cell_type": "code", 212 | "execution_count": null, 213 | "metadata": { 214 | "collapsed": true 215 | }, 216 | "outputs": [], 217 | "source": [ 218 | "# Up until this point in the tutorial, statsmodels 0.6.1 is fine.\n", 219 | "# From here on, we need an updated version of statsmodels 0.8.0rc1" 220 | ] 221 | }, 222 | { 223 | "cell_type": "code", 224 | "execution_count": null, 225 | "metadata": { 226 | "collapsed": false 227 | }, 228 | "outputs": [], 229 | "source": [ 230 | "# Uncomment to install\n", 231 | "# !pip install --pre statsmodels --upgrade" 232 | ] 233 | }, 234 | { 235 | "cell_type": "code", 236 | "execution_count": null, 237 | "metadata": { 238 | "collapsed": false 239 | }, 240 | "outputs": [], 241 | "source": [ 242 | "#Model Estimation\n", 243 | "\n", 244 | "# Fit the model\n", 245 | "arima200 = sm.tsa.SARIMAX(ts_train, order=(2,0,0))\n", 246 | "model_results = arima200.fit()\n", 247 | "model_results.summary()" 248 | ] 249 | }, 250 | { 251 | "cell_type": "markdown", 252 | "metadata": { 253 | "collapsed": true 254 | }, 255 | "source": [ 256 | "#### Digression:\n", 257 | "\n", 258 | "* In practice, one could *search* over a few models using the visual clues above as a starting point. 
\n", 259 | "* The code below gives one such example" 260 | ] 261 | }, 262 | { 263 | "cell_type": "code", 264 | "execution_count": null, 265 | "metadata": { 266 | "collapsed": false 267 | }, 268 | "outputs": [], 269 | "source": [ 270 | "import itertools\n", 271 | "\n", 272 | "p_min = 0\n", 273 | "d_min = 0\n", 274 | "q_min = 0\n", 275 | "p_max = 4\n", 276 | "d_max = 0\n", 277 | "q_max = 4\n", 278 | "\n", 279 | "# Initialize a DataFrame to store the results\n", 280 | "results_bic = pd.DataFrame(index=['AR{}'.format(i) for i in range(p_min,p_max+1)],\n", 281 | "                           columns=['MA{}'.format(i) for i in range(q_min,q_max+1)])\n", 282 | "\n", 283 | "for p,d,q in itertools.product(range(p_min,p_max+1),\n", 284 | "                               range(d_min,d_max+1),\n", 285 | "                               range(q_min,q_max+1)):\n", 286 | "    if p==0 and d==0 and q==0:\n", 287 | "        results_bic.loc['AR{}'.format(p), 'MA{}'.format(q)] = np.nan\n", 288 | "        continue\n", 289 | "    \n", 290 | "    try:\n", 291 | "        model = sm.tsa.SARIMAX(ts_train, order=(p, d, q),\n", 292 | "                               #enforce_stationarity=False,\n", 293 | "                               #enforce_invertibility=False,\n", 294 | "                              )\n", 295 | "        results = model.fit()\n", 296 | "        results_bic.loc['AR{}'.format(p), 'MA{}'.format(q)] = results.bic\n", 297 | "    except Exception:\n", 298 | "        continue\n", 299 | "results_bic = results_bic[results_bic.columns].astype(float)" 300 | ] 301 | }, 302 | { 303 | "cell_type": "code", 304 | "execution_count": null, 305 | "metadata": { 306 | "collapsed": false 307 | }, 308 | "outputs": [], 309 | "source": [ 310 | "fig, ax = plt.subplots(figsize=(10, 8))\n", 311 | "ax = sns.heatmap(results_bic,\n", 312 | "                 mask=results_bic.isnull(),\n", 313 | "                 ax=ax,\n", 314 | "                 annot=True,\n", 315 | "                 fmt='.2f',\n", 316 | "                 );\n", 317 | "ax.set_title('BIC');" 318 | ] 319 | }, 320 | { 321 | "cell_type": "code", 322 | "execution_count": null, 323 | "metadata": { 324 | "collapsed": false 325 | }, 326 | "outputs": [], 327 | "source": [ 328 | "# Alternative model selection method, limited to only searching AR and MA 
parameters\n", 329 | "\n", 330 | "train_results = sm.tsa.arma_order_select_ic(ts_train, ic=['aic', 'bic'], trend='nc', max_ar=4, max_ma=4)\n", 331 | "\n", 332 | "print('AIC', train_results.aic_min_order)\n", 333 | "print('BIC', train_results.bic_min_order)" 334 | ] 335 | }, 336 | { 337 | "cell_type": "markdown", 338 | "metadata": {}, 339 | "source": [ 340 | "### 3.2 Model Diagnostic Checking\n", 341 | "\n", 342 | "* Conduct visual inspection of the residual plots\n", 343 | "* Residuals of a well-specified ARIMA model should mimic *Gaussian white noise*: they should be uncorrelated and approximately normally distributed with mean zero and constant variance, and their sample autocorrelations should be approximately normal with mean zero and variance $n^{-1}$\n", 344 | "* Apparent patterns in the standardized residuals and the estimated ACF of the residuals indicate that the model needs to be re-specified\n", 345 | "* The *results.plot_diagnostics()* function conveniently produces several plots to facilitate the investigation.\n", 346 | "* The estimation results also come with some statistical tests" 347 | ] 348 | }, 349 | { 350 | "cell_type": "code", 351 | "execution_count": null, 352 | "metadata": { 353 | "collapsed": false 354 | }, 355 | "outputs": [], 356 | "source": [ 357 | "# Residual Diagnostics\n", 358 | "# The plot_diagnostics function associated with the estimated result object produces a few plots that allow us \n", 359 | "# to examine the distribution and correlation of the estimated residuals\n", 360 | "\n", 361 | "model_results.plot_diagnostics(figsize=(16, 12));" 362 | ] 363 | }, 364 | { 365 | "cell_type": "markdown", 366 | "metadata": {}, 367 | "source": [ 368 | "### 3.2.1 Formal testing\n", 369 | "\n", 370 | "**More information about the statistics under the parameters table, tests of standardized residuals**\n", 371 | "\n", 372 | "#### Test of heteroskedasticity\n", 373 | "- 
http://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAXResults.test_heteroskedasticity.html#statsmodels.tsa.statespace.sarimax.SARIMAXResults.test_heteroskedasticity\n", 374 | "\n", 375 | "#### Test of normality (Jarque-Bera)\n", 376 | "- http://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAXResults.test_normality.html#statsmodels.tsa.statespace.sarimax.SARIMAXResults.test_normality\n", 377 | "\n", 378 | "#### Test of serial correlation (Ljung-Box)\n", 379 | "- http://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAXResults.test_serial_correlation.html#statsmodels.tsa.statespace.sarimax.SARIMAXResults.test_serial_correlation" 380 | ] 381 | }, 382 | { 383 | "cell_type": "code", 384 | "execution_count": null, 385 | "metadata": { 386 | "collapsed": false 387 | }, 388 | "outputs": [], 389 | "source": [ 390 | "# Re-run the above statistical tests, and more. To be used when selecting viable models.\n", 391 | "\n", 392 | "het_method='breakvar'\n", 393 | "norm_method='jarquebera'\n", 394 | "sercor_method='ljungbox'\n", 395 | "\n", 396 | "(het_stat, het_p) = model_results.test_heteroskedasticity(het_method)[0]\n", 397 | "norm_stat, norm_p, skew, kurtosis = model_results.test_normality(norm_method)[0]\n", 398 | "sercor_stat, sercor_p = model_results.test_serial_correlation(method=sercor_method)[0]\n", 399 | "sercor_stat = sercor_stat[-1] # last number for the largest lag\n", 400 | "sercor_p = sercor_p[-1] # last number for the largest lag\n", 401 | "\n", 402 | "# Run Durbin-Watson test on the standardized residuals.\n", 403 | "# The statistic is approximately equal to 2*(1-r), where r is the sample autocorrelation of the residuals.\n", 404 | "# Thus, for r == 0, indicating no serial correlation, the test statistic equals 2.\n", 405 | "# This statistic will always be between 0 and 4. The closer to 0 the statistic,\n", 406 | "# the more evidence for positive serial correlation. 
The closer to 4,\n", 407 | "# the more evidence for negative serial correlation.\n", 408 | "# Essentially, below 1 or above 3 is bad.\n", 409 | "dw = sm.stats.stattools.durbin_watson(model_results.filter_results.standardized_forecasts_error[0, model_results.loglikelihood_burn:])\n", 410 | "\n", 411 | "# check whether roots are outside the unit circle (we want them to be);\n", 412 | "# will be True when AR is not used (i.e., AR order = 0)\n", 413 | "arroots_outside_unit_circle = np.all(np.abs(model_results.arroots) > 1)\n", 414 | "# will be True when MA is not used (i.e., MA order = 0)\n", 415 | "maroots_outside_unit_circle = np.all(np.abs(model_results.maroots) > 1)\n", 416 | "\n", 417 | "print('Test heteroskedasticity of residuals ({}): stat={:.3f}, p={:.3f}'.format(het_method, het_stat, het_p));\n", 418 | "print('\\nTest normality of residuals ({}): stat={:.3f}, p={:.3f}'.format(norm_method, norm_stat, norm_p));\n", 419 | "print('\\nTest serial correlation of residuals ({}): stat={:.3f}, p={:.3f}'.format(sercor_method, sercor_stat, sercor_p));\n", 420 | "print('\\nDurbin-Watson test on residuals: d={:.2f}\\n\\t(NB: 2 means no serial correlation, 0=pos, 4=neg)'.format(dw))\n", 421 | "print('\\nTest for all AR roots outside unit circle (>1): {}'.format(arroots_outside_unit_circle))\n", 422 | "print('\\nTest for all MA roots outside unit circle (>1): {}'.format(maroots_outside_unit_circle))\n" 423 | ] 424 | }, 425 | { 426 | "cell_type": "markdown", 427 | "metadata": {}, 428 | "source": [ 429 | "### 3.3 Model performance evaluation (in-sample fit)" 430 | ] 431 | }, 432 | { 433 | "cell_type": "code", 434 | "execution_count": null, 435 | "metadata": { 436 | "collapsed": false 437 | }, 438 | "outputs": [], 439 | "source": [ 440 | "fig, ax1 = plt.subplots(nrows=1, ncols=1, figsize=(12, 8))\n", 441 | " \n", 442 | "ax1.plot(ts_train, label='In-sample data', linestyle='-')\n", 443 | "# subtract 1 only to connect it to previous point in the graph\n", 444 | "ax1.plot(ts_test, 
label='Held-out data', linestyle='--')\n", 445 | "\n", 446 | "# the series has a DatetimeIndex, so the prediction range is given as dates\n", 447 | "pred_begin = ts_train.index[model_results.loglikelihood_burn]\n", 448 | "pred_end = ts_test.index[-1]\n", 449 | "pred = model_results.get_prediction(start=pred_begin.strftime('%Y-%m-%d'),\n", 450 | "                                    end=pred_end.strftime('%Y-%m-%d'))\n", 451 | "pred_mean = pred.predicted_mean\n", 452 | "pred_ci = pred.conf_int(alpha=0.05)\n", 453 | "\n", 454 | "ax1.plot(pred_mean, 'r', alpha=.6, label='Predicted values')\n", 455 | "ax1.fill_between(pred_ci.index,\n", 456 | "                 pred_ci.iloc[:, 0],\n", 457 | "                 pred_ci.iloc[:, 1], color='k', alpha=.2)\n", 458 | "\n", 459 | "ax1.legend(loc='best');" 460 | ] 461 | }, 462 | { 463 | "cell_type": "code", 464 | "execution_count": null, 465 | "metadata": { 466 | "collapsed": true 467 | }, 468 | "outputs": [], 469 | "source": [ 470 | "def get_rmse(y, y_hat):\n", 471 | "    '''Root Mean Square Error\n", 472 | "    https://en.wikipedia.org/wiki/Root-mean-square_deviation\n", 473 | "    '''\n", 474 | "    mse = np.mean((y - y_hat)**2)\n", 475 | "    return np.sqrt(mse)\n", 476 | "\n", 477 | "def get_mape(y, y_hat):\n", 478 | "    '''Mean Absolute Percent Error\n", 479 | "    https://en.wikipedia.org/wiki/Mean_absolute_percentage_error\n", 480 | "    '''\n", 481 | "    perc_err = (100*(y - y_hat))/y\n", 482 | "    return np.mean(abs(perc_err))\n", 483 | "\n", 484 | "def get_mase(y, y_hat):\n", 485 | "    '''Mean Absolute Scaled Error\n", 486 | "    https://en.wikipedia.org/wiki/Mean_absolute_scaled_error\n", 487 | "    '''\n", 488 | "    abs_err = abs(y - y_hat)\n", 489 | "    dsum = np.sum(np.abs(np.diff(np.asarray(y))))  # one-step naive forecast errors |y[t] - y[t-1]|\n", 490 | "    t = len(y)\n", 491 | "    denom = dsum/(t - 1)\n", 492 | "    return np.mean(abs_err/denom)" 493 | ] 494 | }, 495 | { 496 | "cell_type": "code", 497 | "execution_count": null, 498 | "metadata": { 499 | "collapsed": false 500 | }, 501 | "outputs": [], 502 | "source": [ 503 | "rmse = get_rmse(ts_train, pred_mean.loc[ts_train.index])\n", 504 | "print(\"RMSE: \", rmse)\n", 505 | "\n", 506 
| "mape = get_mape(ts_train, pred_mean.loc[ts_train.index])\n", 507 | "print(\"MAPE: \", mape)\n", 508 | "\n", 509 | "mase = get_mase(ts_train, pred_mean.loc[ts_train.index])\n", 510 | "print(\"MASE: \", mase)" 511 | ] 512 | }, 513 | { 514 | "cell_type": "markdown", 515 | "metadata": {}, 516 | "source": [ 517 | "### 3.4 Forecasting and forecast evaluation" 518 | ] 519 | }, 520 | { 521 | "cell_type": "code", 522 | "execution_count": null, 523 | "metadata": { 524 | "collapsed": false 525 | }, 526 | "outputs": [], 527 | "source": [ 528 | "rmse = get_rmse(ts_test, pred_mean.loc[ts_test.index])\n", 529 | "print(rmse)\n", 530 | "\n", 531 | "mape = get_mape(ts_test, pred_mean.loc[ts_test.index])\n", 532 | "print(mape)\n", 533 | "\n", 534 | "mase = get_mase(ts_test, pred_mean.loc[ts_test.index])\n", 535 | "print(mase)" 536 | ] 537 | }, 538 | { 539 | "cell_type": "markdown", 540 | "metadata": { 541 | "collapsed": true 542 | }, 543 | "source": [ 544 | "### Exercise 3:\n", 545 | "\n" 546 | ] 547 | }, 548 | { 549 | "cell_type": "code", 550 | "execution_count": null, 551 | "metadata": { 552 | "collapsed": false 553 | }, 554 | "outputs": [], 555 | "source": [ 556 | "# Import the csv file containing the series for the analysis\n", 557 | "\n", 558 | "# Step 1a: Read the data series\n", 559 | "filename_ts = 'data/series2.csv'\n", 560 | "series2_df = pd.read_csv(filename_ts, index_col=0, parse_dates=[0])\n", 561 | "\n", 562 | "# Step 1b: Create the training and testing series before analyzing the series\n", 563 | "\n", 564 | "n_sample = series2_df.shape[0]\n", 565 | "\n", 566 | "n_train=int(0.95*n_sample)+1\n", 567 | "n_forecast=n_sample-n_train\n", 568 | "\n", 569 | "series2_train = series2_df.iloc[:n_train]['value']\n", 570 | "series2_test = series2_df.iloc[n_train:]['value']\n", 571 | "print(series2_train.shape)\n", 572 | "print(series2_test.shape)\n", 573 | "print(\"Training Series:\", \"\\n\", series2_train.tail(), \"\\n\")\n", 574 | "print(\"Testing Series:\", \"\\n\", 
series2_test.head())" 575 | ] 576 | }, 577 | { 578 | "cell_type": "code", 579 | "execution_count": null, 580 | "metadata": { 581 | "collapsed": false 582 | }, 583 | "outputs": [], 584 | "source": [ 585 | "# Step 2a: Examine the basic structure of the data\n", 586 | "print(\"Data shape:\", series2_train.shape, \"\\n\")\n", 587 | "print(\"First 5 observations of the data series:\", \"\\n\", series2_train.head(), \"\\n\")\n", 588 | "print(\"Last 5 observations of the data series:\", \"\\n\", series2_train.tail())" 589 | ] 590 | }, 591 | { 592 | "cell_type": "code", 593 | "execution_count": null, 594 | "metadata": { 595 | "collapsed": false 596 | }, 597 | "outputs": [], 598 | "source": [ 599 | "# Step 2b: Examine the series and use the visuals as clues for the choice of the orders of the ARIMA model\n", 600 | "# Choose the number of lags you would like to display. Pick a number that is at least 20.\n", 601 | "\n", 602 | "# tsplot(series2_train, title='Series 2', lags=?);\n", 603 | "\n", 604 | "tsplot(series2_train, title='Series 2', lags=YOUR_CODE_HERE);" 605 | ] 606 | }, 607 | { 608 | "cell_type": "code", 609 | "execution_count": null, 610 | "metadata": { 611 | "collapsed": true 612 | }, 613 | "outputs": [], 614 | "source": [ 615 | "# Step 2c: Conduct any necessary transformations (such as natural log, difference, difference in natural log, etc.)\n", 616 | "# and repeat Step 2b\n" 617 | ] 618 | }, 619 | { 620 | "cell_type": "code", 621 | "execution_count": null, 622 | "metadata": { 623 | "collapsed": false 624 | }, 625 | "outputs": [], 626 | "source": [ 627 | "# Step 3: Estimate a non-seasonal ARIMA model\n", 628 | "# Note: you will have to pick the orders (p,d,q)\n", 629 | "\n", 630 | "# ex3_mod = sm.tsa.statespace.SARIMAX(series2_train, order=(?,?,?))\n", 631 | "ex3_mod = sm.tsa.statespace.SARIMAX(series2_train, order=())\n", 632 | "ex3_arima_fit = ex3_mod.fit()\n", 633 | "print(ex3_arima_fit.summary())\n", 634 | "\n", 635 | "# Discuss your results" 636 | ] 637 | 
}, 638 | { 639 | "cell_type": "code", 640 | "execution_count": null, 641 | "metadata": { 642 | "collapsed": false 643 | }, 644 | "outputs": [], 645 | "source": [ 646 | "# Step 4: Conduct model diagnostic check\n", 647 | "\n", 648 | "ex3_arima_fit.plot_diagnostics(figsize=(16, 12));\n", 649 | "\n", 650 | "# Discuss these plots" 651 | ] 652 | }, 653 | { 654 | "cell_type": "code", 655 | "execution_count": null, 656 | "metadata": { 657 | "collapsed": false 658 | }, 659 | "outputs": [], 660 | "source": [ 661 | "# Step 5: Do a 5-step ahead forecast\n", 662 | "\n", 663 | "# ... codes need to be adjusted\n", 664 | "\n", 665 | "fig, ax1 = plt.subplots(nrows=1, ncols=1, figsize=(12, 8))\n", 666 | " \n", 667 | "ax1.plot(series2_train, label='In-sample data', linestyle='-')\n", 668 | "# subtract 1 only to connect it to previous point in the graph\n", 669 | "ax1.plot(series2_test, label='Held-out data', linestyle='--')\n", 670 | "\n", 671 | "# yes DatetimeIndex\n", 672 | "pred_begin = series2_train.index[ex3_arima_fit.loglikelihood_burn]\n", 673 | "pred_end = series2_test.index[-1]\n", 674 | "pred = ex3_arima_fit.get_prediction(start=pred_begin.strftime('%Y-%m-%d'),\n", 675 | " end=pred_end.strftime('%Y-%m-%d'))\n", 676 | "pred_mean = pred.predicted_mean\n", 677 | "pred_ci = pred.conf_int(alpha=0.05)\n", 678 | "\n", 679 | "ax1.plot(pred_mean, 'r', alpha=.6, label='Predicted values')\n", 680 | "ax1.fill_between(pred_ci.index,\n", 681 | " pred_ci.iloc[:, 0],\n", 682 | " pred_ci.iloc[:, 1], color='k', alpha=.2)\n", 683 | "\n", 684 | "ax1.legend(loc='best');\n", 685 | "\n", 686 | "## Discuss the results. How does your forecast look?" 
687 | ] 688 | }, 689 | { 690 | "cell_type": "code", 691 | "execution_count": null, 692 | "metadata": { 693 | "collapsed": false 694 | }, 695 | "outputs": [], 696 | "source": [] 697 | }, 698 | { 699 | "cell_type": "code", 700 | "execution_count": null, 701 | "metadata": { 702 | "collapsed": true 703 | }, 704 | "outputs": [], 705 | "source": [] 706 | } 707 | ], 708 | "metadata": { 709 | "kernelspec": { 710 | "display_name": "Python 3", 711 | "language": "python", 712 | "name": "python3" 713 | }, 714 | "language_info": { 715 | "codemirror_mode": { 716 | "name": "ipython", 717 | "version": 3 718 | }, 719 | "file_extension": ".py", 720 | "mimetype": "text/x-python", 721 | "name": "python", 722 | "nbconvert_exporter": "python", 723 | "pygments_lexer": "ipython3", 724 | "version": "3.5.2" 725 | } 726 | }, 727 | "nbformat": 4, 728 | "nbformat_minor": 0 729 | } 730 | -------------------------------------------------------------------------------- /Section_4_SARIMA_tutorial.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "\"SVDS\"" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# PyData San Francisco 2016\n", 15 | "## Applied Time Series Econometrics in Python (and R) Tutorial\n", 16 | "### Section 4: Seasonal ARIMA Models" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "### Topics in this section include \n", 24 | "\n", 25 | " - 4.1 Mathematical formulation of Seasonal ARIMA (SARIMA) models\n", 26 | " - 4.2 Building a seasonal ARIMA model for forecasting\n", 27 | " - Exercise 4" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": null, 33 | "metadata": { 34 | "collapsed": false 35 | }, 36 | "outputs": [], 37 | "source": [ 38 | "# Set up\n", 39 | "\n", 40 | "%load_ext autoreload\n", 41 | "%autoreload 2\n", 42 | "%matplotlib 
inline\n", 43 | "%config InlineBackend.figure_format='retina'\n", 44 | "\n", 45 | "from __future__ import absolute_import, division, print_function\n", 46 | "\n", 47 | "import sys\n", 48 | "import os\n", 49 | "\n", 50 | "import pandas as pd\n", 51 | "import numpy as np\n", 52 | "\n", 53 | "# Remote Data Access\n", 54 | "import pandas_datareader.data as web\n", 55 | "import datetime\n", 56 | "# reference: https://pandas-datareader.readthedocs.io/en/latest/remote_data.html\n", 57 | "\n", 58 | "# TSA from Statsmodels\n", 59 | "import statsmodels.api as sm\n", 60 | "import statsmodels.formula.api as smf\n", 61 | "import statsmodels.tsa.api as smt\n", 62 | "\n", 63 | "from statsmodels.graphics.api import qqplot\n", 64 | "\n", 65 | "# Display and Plotting\n", 66 | "import matplotlib.pylab as plt\n", 67 | "import seaborn as sns\n", 68 | "\n", 69 | "pd.set_option('display.float_format', lambda x: '%.5f' % x) # pandas\n", 70 | "np.set_printoptions(precision=5, suppress=True) # numpy\n", 71 | "\n", 72 | "pd.set_option('display.max_columns', 100)\n", 73 | "pd.set_option('display.max_rows', 100)\n", 74 | "\n", 75 | "# seaborn plotting style\n", 76 | "sns.set(style='ticks', context='poster')" 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": {}, 82 | "source": [ 83 | "### Motivation of Using Seasonal ARIMA Model" 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": null, 89 | "metadata": { 90 | "collapsed": false 91 | }, 92 | "outputs": [], 93 | "source": [ 94 | "# Import a time series\n", 95 | "# This is a series that we introduced in Section 1 of this tutorial\n", 96 | "\n", 97 | "air = pd.read_csv('data/international-airline-passengers.csv', header=0, index_col=0, parse_dates=[0])" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": null, 103 | "metadata": { 104 | "collapsed": false 105 | }, 106 | "outputs": [], 107 | "source": [ 108 | "# Examine the basic structure of the data\n", 109 | "print(\"Data shape:\", 
air.shape, \"\\n\")\n", 110 | "print(\"First 5 observations of the data series:\", \"\\n\", air.head())\n", 111 | "print(\"Last 5 observations of the data series:\", \"\\n\", air.tail())" 112 | ] 113 | }, 114 | { 115 | "cell_type": "code", 116 | "execution_count": null, 117 | "metadata": { 118 | "collapsed": true 119 | }, 120 | "outputs": [], 121 | "source": [ 122 | "# Examine the patterns of ACF and PACF (along with the time series plot and histogram)\n", 123 | "\n", 124 | "def tsplot(y, lags=None, title='', figsize=(14, 8)):\n", 125 | "    '''Examine the patterns of ACF and PACF, along with the time series plot and histogram.\n", 126 | "    \n", 127 | "    Original source: https://tomaugspurger.github.io/modern-7-timeseries.html\n", 128 | "    '''\n", 129 | "    fig = plt.figure(figsize=figsize)\n", 130 | "    layout = (2, 2)\n", 131 | "    ts_ax = plt.subplot2grid(layout, (0, 0))\n", 132 | "    hist_ax = plt.subplot2grid(layout, (0, 1))\n", 133 | "    acf_ax = plt.subplot2grid(layout, (1, 0))\n", 134 | "    pacf_ax = plt.subplot2grid(layout, (1, 1))\n", 135 | "    \n", 136 | "    y.plot(ax=ts_ax)\n", 137 | "    ts_ax.set_title(title)\n", 138 | "    y.plot(ax=hist_ax, kind='hist', bins=25)\n", 139 | "    hist_ax.set_title('Histogram')\n", 140 | "    smt.graphics.plot_acf(y, lags=lags, ax=acf_ax)\n", 141 | "    smt.graphics.plot_pacf(y, lags=lags, ax=pacf_ax)\n", 142 | "    [ax.set_xlim(0) for ax in [acf_ax, pacf_ax]]\n", 143 | "    sns.despine()\n", 144 | "    fig.tight_layout()\n", 145 | "    return ts_ax, acf_ax, pacf_ax" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": null, 151 | "metadata": { 152 | "collapsed": false 153 | }, 154 | "outputs": [], 155 | "source": [ 156 | "tsplot(air, title='International airline passengers, 1949-1960', lags=20);" 157 | ] 158 | }, 159 | { 160 | "cell_type": "markdown", 161 | "metadata": { 162 | "collapsed": false 163 | }, 164 | "source": [ 165 | "### Observations of these graphs:\n", 166 | "\n", 167 | "* The airline passenger series displays an increasing trend (over 
time)\n", 168 | "* There appears to be *seasonality*\n", 169 | "* The autocorrelations do not simply decline gradually" 170 | ] 171 | }, 172 | { 173 | "cell_type": "code", 174 | "execution_count": null, 175 | "metadata": { 176 | "collapsed": false 177 | }, 178 | "outputs": [], 179 | "source": [ 180 | "# Take log of the series\n", 181 | "air['lnair'] = np.log(air['n_pass_thousands'])\n", 182 | "print(air['lnair'].head(),\"\\n\")\n", 183 | "print(air['lnair'].shape,\"\\n\")\n", 184 | "\n", 185 | "# Take first difference of the series\n", 186 | "#air_ln_diff = air['lnair'].diff() - air['lnair'].shift()\n", 187 | "air_ln_diff = air['lnair'].diff()\n", 188 | "air_ln_diff = air_ln_diff.dropna()\n", 189 | "print(air_ln_diff.head(),\"\\n\")\n", 190 | "print(air_ln_diff.shape,\"\\n\")" 191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": null, 196 | "metadata": { 197 | "collapsed": false 198 | }, 199 | "outputs": [], 200 | "source": [ 201 | "tsplot(air['lnair'], title='Natural Log of International airline passengers, 1949-1960', lags=20);" 202 | ] 203 | }, 204 | { 205 | "cell_type": "code", 206 | "execution_count": null, 207 | "metadata": { 208 | "collapsed": false 209 | }, 210 | "outputs": [], 211 | "source": [ 212 | "tsplot(air_ln_diff[1:], title='Differences of Log of International airline passengers, 1949-1960', lags=40);" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": null, 218 | "metadata": { 219 | "collapsed": false 220 | }, 221 | "outputs": [], 222 | "source": [ 223 | "# An alternative way to detect seasonality\n", 224 | "\n", 225 | "air['Month'] = air.index.strftime('%b')\n", 226 | "air['Year'] = air.index.year\n", 227 | "\n", 228 | "air_piv = air.pivot(index='Year', columns='Month', values='n_pass_thousands')\n", 229 | "\n", 230 | "air = air.drop(['Month', 'Year'], axis=1)\n", 231 | "\n", 232 | "# put the months in order\n", 233 | "month_names = pd.date_range(start='2016-01-01', periods=12, freq='MS').strftime('%b')\n", 234 | 
"air_piv = air_piv.reindex(columns=month_names)\n", 235 | "\n", 236 | "# plot it\n", 237 | "fig, ax = plt.subplots(figsize=(8, 6))\n", 238 | "air_piv.plot(ax=ax, kind='box');\n", 239 | "\n", 240 | "ax.set_xlabel('Month');\n", 241 | "ax.set_ylabel('Thousands of passengers');\n", 242 | "ax.set_title('Boxplot of seasonal values');\n", 243 | "ax.xaxis.set_ticks_position('bottom')\n", 244 | "fig.tight_layout();" 245 | ] 246 | }, 247 | { 248 | "cell_type": "markdown", 249 | "metadata": {}, 250 | "source": [ 251 | "### 4.1 Formulation of the Seasonal ARIMA Model\n", 252 | "\n", 253 | "The *pure* seasonal autoregressive and moving average model, $ARMA(P,Q)_s$, takes the form\n", 254 | "\n", 255 | "$$\\Phi_P(B^s)z_t=\\Theta_Q(B^s)\\epsilon_t$$ \n", 256 | "\n", 257 | "where \n", 258 | "\n", 259 | "$$\\Phi_P(B^s)=1 - \\Phi_1 B^s - \\Phi_2 B^{2s} - \\cdots - \\Phi_P B^{Ps}$$\n", 260 | "\n", 261 | "and \n", 262 | "\n", 263 | "$$\\Theta_Q(B^s)=1 + \\Theta_1 B^s + \\Theta_2 B^{2s} + \\cdots + \\Theta_Q B^{Qs}$$\n", 264 | "\n", 265 | "are the **seasonal autoregressive operator** and the **seasonal moving average operator** of orders $P$ and $Q$ with **seasonal period $s$**.\n", 266 | "\n", 267 | "**Example:**\n", 268 | "\n", 269 | "A first-order seasonal autoregressive moving average series over months (a seasonal $ARMA(1,1)_{12}$ model, i.e. seasonal order $(P,D,Q,s)=(1,0,1,12)$) can be expressed as\n", 270 | "\n", 271 | "$$ z_t = \\Phi z_{t-12} + \\epsilon_t + \\Theta \\epsilon_{t-12} $$\n", 272 | "\n", 273 | "or\n", 274 | "\n", 275 | "$$ (1 - \\Phi B^{12})z_t = (1 + \\Theta B^{12})\\epsilon_t $$\n", 276 | "\n", 277 | "In other words, this model captures the relationship between $z_t$ and its lags at multiples of the yearly seasonal period $s=12$ months. 
\n", 278 | "\n", 279 | "The stationarity condition requires that $|\\Phi|<1$ and the invertibility condition requires that $|\\Theta|<1$.\n", 280 | "\n", 281 | "As with the ARIMA models, the table below summarizes the behavior of the theoretical ACF and PACF of the pure seasonal ARMA models at the seasonal lags $ks$, $k = 1, 2, \\ldots$ (the values at non-seasonal lags are zero):\n", 282 | "\n", 283 | "| Process | ACF | PACF |\n", 284 | "|---------------|:--------------------:|:--------------------:|\n", 285 | "| **AR(P)** | tails off at lags $ks$ | cuts off after lag $Ps$ |\n", 286 | "| **MA(Q)** | cuts off after lag $Qs$ | tails off at lags $ks$ |\n", 287 | "| **ARMA(P,Q)** | tails off at lags $ks$ | tails off at lags $ks$ |\n", 288 | "\n", 289 | "* **Note that we use (p,d,q) to denote the orders of the non-seasonal components of the ARIMA model and (P,D,Q,s) to denote the orders of the seasonal components.**\n", 290 | "\n", 291 | "The general formulation of the **Multiplicative Seasonal Autoregressive Integrated Moving Average (SARIMA)** model takes the following form:\n", 292 | "\n", 293 | "$$ \\phi_p(B) \\Phi_P(B^s) \\nabla^d \\nabla^D_s z_t = \\theta_q(B) \\Theta_Q(B^s) \\epsilon_t $$ \n", 294 | "\n", 295 | "where \n", 296 | "\n", 297 | "$\\epsilon_t$ is a white noise process\n", 298 | "\n", 299 | "$\\phi_p(B)$ and $\\theta_q(B)$ are non-seasonal autoregressive and moving average lag polynomials\n", 300 | "\n", 301 | "$\\Phi_P(B^s)$ and $\\Theta_Q(B^s)$ are seasonal autoregressive and moving average lag polynomials\n", 302 | "\n", 303 | "$\\nabla^d \\equiv (1-B)^d$ and $\\nabla^D_s \\equiv (1-B^s)^D$ are the difference (or integrated) components\n", 304 | "\n", 305 | "Therefore, the general model is denoted as $\\mathbf{ARIMA(p,d,q)\\times(P,D,Q)_s}$\n", 306 | "\n" 307 | ] 308 | }, 309 | { 310 | "cell_type": "markdown", 311 | "metadata": {}, 312 | "source": [ 313 | "**Example:**\n", 314 | "\n", 315 | "Unpacking the notation, the $\\mathbf{ARIMA(0,1,1)\\times(0,1,1)_{12}}$ model becomes\n", 316 | "\n", 317 | 
"$$(1-B)(1-B^{12})z_t = (1+\\theta B)(1+\\Theta B^{12}) \\epsilon_t$$\n", 318 | "\n", 319 | "Multiplying out the lag polynomials on both sides, we get\n", 320 | "\n", 321 | "$$ (1 - B - B^{12} + B^{13}) z_t = (1 + \\theta B + \\Theta B^{12} + \\theta \\Theta B^{13}) \\epsilon_t $$\n", 322 | "\n", 323 | "Applying the lag operators and solving for $z_t$ gives\n", 324 | "\n", 325 | "$$ z_t = z_{t-1} + (z_{t-12} - z_{t-13}) + \\epsilon_t + \\theta \\epsilon_{t-1} + \\Theta \\epsilon_{t-12} + \\theta \\Theta \\epsilon_{t-13}$$\n" 326 | ] 327 | }, 328 | { 329 | "cell_type": "markdown", 330 | "metadata": {}, 331 | "source": [ 332 | "### 4.2 Building a Seasonal ARIMA Model for Forecasting" 333 | ] 334 | }, 335 | { 336 | "cell_type": "code", 337 | "execution_count": null, 338 | "metadata": { 339 | "collapsed": false 340 | }, 341 | "outputs": [], 342 | "source": [ 343 | "# Air Passengers Series\n", 344 | "mod = sm.tsa.statespace.SARIMAX(air['lnair'], order=(2,1,0), seasonal_order=(1,1,0,12), simple_differencing=True)\n", 345 | "sarima_fit1 = mod.fit()\n", 346 | "print(sarima_fit1.summary())" 347 | ] 348 | }, 349 | { 350 | "cell_type": "markdown", 351 | "metadata": {}, 352 | "source": [ 353 | "* Notice the additional argument *simple_differencing=True*. \n", 354 | "\n", 355 | "* This controls how the order of integration is handled in ARIMA models. \n", 356 | "\n", 357 | "* If *simple_differencing=True*, then the time series provided as endog is literally differenced and an ARMA model is fit to the resulting new time series. This implies that a number of initial periods are lost to the differencing process; however, it may be necessary either to compare results to other packages (e.g. 
Stata's arima always uses simple differencing) or if the seasonal periodicity is large" 358 | ] 359 | }, 360 | { 361 | "cell_type": "code", 362 | "execution_count": null, 363 | "metadata": { 364 | "collapsed": false 365 | }, 366 | "outputs": [], 367 | "source": [ 368 | "# Model Diagnostic\n", 369 | "\n", 370 | "sarima_fit1.plot_diagnostics(figsize=(16, 12));" 371 | ] 372 | }, 373 | { 374 | "cell_type": "markdown", 375 | "metadata": {}, 376 | "source": [ 377 | "### Exercise 4: " 378 | ] 379 | }, 380 | { 381 | "cell_type": "code", 382 | "execution_count": null, 383 | "metadata": { 384 | "collapsed": false 385 | }, 386 | "outputs": [], 387 | "source": [ 388 | "# Step 1: Import the data series\n", 389 | "liquor = pd.read_csv('data/liquor.csv', header=0, index_col=0, parse_dates=[0])\n", 390 | "\n", 391 | "# Step 1b: Create the training and testing series before analyzing the series\n", 392 | "n_sample = liquor.shape[0]\n", 393 | "n_train=int(0.95*n_sample)+1\n", 394 | "n_forecast=n_sample-n_train\n", 395 | "\n", 396 | "liquor_train = liquor.iloc[:n_train]['Value']\n", 397 | "liquor_test = liquor.iloc[n_train:]['Value']\n", 398 | "print(liquor_train.shape)\n", 399 | "print(liquor_test.shape)\n", 400 | "print(\"Training Series:\", \"\\n\", liquor_train.tail(), \"\\n\")\n", 401 | "print(\"Testing Series:\", \"\\n\", liquor_test.head())" 402 | ] 403 | }, 404 | { 405 | "cell_type": "code", 406 | "execution_count": null, 407 | "metadata": { 408 | "collapsed": false 409 | }, 410 | "outputs": [], 411 | "source": [ 412 | "# Step 2a: Examine the basic structure of the data\n", 413 | "print(\"Data shape:\", liquor_train.shape, \"\\n\")\n", 414 | "print(\"First 5 observations of the training data series:\", \"\\n\", liquor_train.head(), \"\\n\")\n", 415 | "print(\"Last 5 observations of the training data series:\", \"\\n\", liquor_train.tail())" 416 | ] 417 | }, 418 | { 419 | "cell_type": "code", 420 | "execution_count": null, 421 | "metadata": { 422 | "collapsed": false 423 | }, 
424 | "outputs": [], 425 | "source": [ 426 | "# Step 2b: Examine the series and use the visuals as clues for the choice of the orders of the ARIMA model\n", 427 | "#tsplot(liquor_train, title='Liquor Sales (in millions of dollars), 2007-2016', lags=??);\n", 428 | "tsplot(liquor_train, title='Liquor Sales (in millions of dollars)', lags=40);" 429 | ] 430 | }, 431 | { 432 | "cell_type": "code", 433 | "execution_count": null, 434 | "metadata": { 435 | "collapsed": true 436 | }, 437 | "outputs": [], 438 | "source": [ 439 | "# Step 2c: Conduct any necessary transformations (such as natural log, difference, difference in natural log, etc.)\n", 440 | "# and repeat Step 2b\n" 441 | ] 442 | }, 443 | { 444 | "cell_type": "code", 445 | "execution_count": null, 446 | "metadata": { 447 | "collapsed": false 448 | }, 449 | "outputs": [], 450 | "source": [ 451 | "# Step 3: Estimate a seasonal ARIMA model\n", 452 | "# Note: you will have to pick the orders (p,d,q)(P,D,Q)_s\n", 453 | "\n", 454 | "#mod = sm.tsa.statespace.SARIMAX(liquor_train, order=(?,?,?), seasonal_order=(?,?,?,?))\n", 455 | "\n", 456 | "mod = sm.tsa.statespace.SARIMAX(liquor_train, order=(0,1,1), seasonal_order=(0,1,0,12))\n", 457 | "sarima_fit2 = mod.fit()\n", 458 | "print(sarima_fit2.summary())" 459 | ] 460 | }, 461 | { 462 | "cell_type": "code", 463 | "execution_count": null, 464 | "metadata": { 465 | "collapsed": false 466 | }, 467 | "outputs": [], 468 | "source": [ 469 | "# Step 4: Conduct model diagnostic check\n", 470 | "sarima_fit2.plot_diagnostics();\n", 471 | "\n", 472 | "# Discuss these plots" 473 | ] 474 | }, 475 | { 476 | "cell_type": "code", 477 | "execution_count": null, 478 | "metadata": { 479 | "collapsed": false 480 | }, 481 | "outputs": [], 482 | "source": [ 483 | "# Step 5: Do a 14-step ahead forecast\n", 484 | "\n", 485 | "fig, ax1 = plt.subplots(nrows=1, ncols=1, figsize=(12, 8))\n", 486 | " \n", 487 | "ax1.plot(liquor_train, label='In-sample data', linestyle='-')\n", 488 | "# subtract 1 only to 
connect it to previous point in the graph\n", 489 | "ax1.plot(liquor_test, label='Held-out data', linestyle='--')\n", 490 | "\n", 491 | "# yes DatetimeIndex\n", 492 | "pred_begin = liquor_train.index[sarima_fit2.loglikelihood_burn]\n", 493 | "pred_end = liquor_test.index[-1]\n", 494 | "pred = sarima_fit2.get_prediction(start=pred_begin.strftime('%Y-%m-%d'),\n", 495 | " end=pred_end.strftime('%Y-%m-%d'))\n", 496 | "pred_mean = pred.predicted_mean\n", 497 | "pred_ci = pred.conf_int(alpha=0.05)\n", 498 | "\n", 499 | "ax1.plot(pred_mean, 'r', alpha=.6, label='Predicted values')\n", 500 | "ax1.fill_between(pred_ci.index,\n", 501 | " pred_ci.iloc[:, 0],\n", 502 | " pred_ci.iloc[:, 1], color='k', alpha=.2)\n", 503 | "ax1.set_xlabel(\"Year\")\n", 504 | "ax1.set_ylabel(\"Liquor Sales (in millions of dollars)\")\n", 505 | "ax1.legend(loc='best');\n", 506 | "fig.tight_layout();\n", 507 | "## Discuss the results. How does your forecast look?" 508 | ] 509 | }, 510 | { 511 | "cell_type": "code", 512 | "execution_count": null, 513 | "metadata": { 514 | "collapsed": true 515 | }, 516 | "outputs": [], 517 | "source": [] 518 | }, 519 | { 520 | "cell_type": "code", 521 | "execution_count": null, 522 | "metadata": { 523 | "collapsed": true 524 | }, 525 | "outputs": [], 526 | "source": [] 527 | }, 528 | { 529 | "cell_type": "code", 530 | "execution_count": null, 531 | "metadata": { 532 | "collapsed": true 533 | }, 534 | "outputs": [], 535 | "source": [] 536 | } 537 | ], 538 | "metadata": { 539 | "kernelspec": { 540 | "display_name": "Python 2", 541 | "language": "python", 542 | "name": "python2" 543 | }, 544 | "language_info": { 545 | "codemirror_mode": { 546 | "name": "ipython", 547 | "version": 2 548 | }, 549 | "file_extension": ".py", 550 | "mimetype": "text/x-python", 551 | "name": "python", 552 | "nbconvert_exporter": "python", 553 | "pygments_lexer": "ipython2", 554 | "version": "2.7.12" 555 | } 556 | }, 557 | "nbformat": 4, 558 | "nbformat_minor": 0 559 | } 560 | 
-------------------------------------------------------------------------------- /Section_5_ClosingRemarks_tutorial.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "\"SVDS\"" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# PyData San Francisco 2016\n", 15 | "## Applied Time Series Econometrics in Python (and R) Tutorial\n", 16 | "\n", 17 | "### Section 5. Closing Remarks: Practical suggestions and other topics" 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": {}, 23 | "source": [ 24 | "### Topics in this section include\n", 25 | "\n", 26 | "- 5.1 Model selection heuristics\n", 27 | "- 5.2 Material we did not cover\n", 28 | "- 5.3 Where to go from here" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "### 5.1 Model selection heuristics\n", 36 | "\n", 37 | "ARIMA $(p,d,q)$\n", 38 | "\n", 39 | "SARIMAX $(p,d,q) \\times (P,D,Q)_{s}$\n", 40 | "\n", 41 | "- Examine the time series to understand its characteristics, e.g., trend, seasonality.\n", 42 | "- Choose an appropriate model form (ARIMA, SARIMA, ARIMAX, SARIMAX).\n", 43 | "- Check for (unit root) stationarity of the time series.\n", 44 | " - Determine whether differencing (informs $d$ and $D$) or other transformation is necessary to make stationary.\n", 45 | "- Examine ACF and PACF to determine the initial choice of the AR($p$) and MA($q$) model orders, and seasonal $P$ and $Q$ orders if appropriate.\n", 46 | "- Alternatively, or in addition, fit many models.\n", 47 | "- Choose a model based on:\n", 48 | " - A criterion, e.g., AIC, BIC\n", 49 | " - Examination of statistical tests on residuals.\n", 50 | " - Out-of-sample forecast error." 
51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": null, 56 | "metadata": { 57 | "collapsed": true 58 | }, 59 | "outputs": [], 60 | "source": [ 61 | "%load_ext autoreload\n", 62 | "%autoreload 2\n", 63 | "%matplotlib inline\n", 64 | "%config InlineBackend.figure_format='retina'\n", 65 | "\n", 66 | "from __future__ import absolute_import, division, print_function\n", 67 | "\n", 68 | "import sys\n", 69 | "import os\n", 70 | "\n", 71 | "import pandas as pd\n", 72 | "import numpy as np\n", 73 | "\n", 74 | "import statsmodels.api as sm\n", 75 | "import statsmodels.formula.api as smf\n", 76 | "import statsmodels.tsa.api as smt\n", 77 | "\n", 78 | "import itertools\n", 79 | "import warnings\n", 80 | "\n", 81 | "# Display and Plotting\n", 82 | "import matplotlib.pylab as plt\n", 83 | "import seaborn as sns\n", 84 | "\n", 85 | "pd.set_option('display.float_format', lambda x: '%.5f' % x) # pandas\n", 86 | "np.set_printoptions(precision=5, suppress=True) # numpy\n", 87 | "\n", 88 | "pd.set_option('display.max_columns', 100)\n", 89 | "pd.set_option('display.max_rows', 100)\n", 90 | "\n", 91 | "# seaborn plotting style\n", 92 | "sns.set(style='ticks', context='poster')" 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": null, 98 | "metadata": { 99 | "collapsed": true 100 | }, 101 | "outputs": [], 102 | "source": [ 103 | "def test_stationarity(timeseries,\n", 104 | " maxlag=None, regression=None, autolag=None,\n", 105 | " window=None, plot=False, verbose=False):\n", 106 | " '''\n", 107 | " Check unit root stationarity of time series.\n", 108 | " \n", 109 | " Null hypothesis: the series is non-stationary.\n", 110 | " If p >= alpha, the series is non-stationary.\n", 111 | " If p < alpha, reject the null hypothesis (has unit root stationarity).\n", 112 | " \n", 113 | " Original source: http://www.analyticsvidhya.com/blog/2016/02/time-series-forecasting-codes-python/\n", 114 | " \n", 115 | " Function: 
http://statsmodels.sourceforge.net/devel/generated/statsmodels.tsa.stattools.adfuller.html\n", 116 | " \n", 117 | " window argument is only required for plotting rolling functions. Default=4.\n", 118 | " '''\n", 119 | " \n", 120 | " # set defaults (from function page)\n", 121 | " if regression is None:\n", 122 | " regression = 'c'\n", 123 | " \n", 124 | " if verbose:\n", 125 | " print('Running Augmented Dickey-Fuller test with parameters:')\n", 126 | " print('maxlag: {}'.format(maxlag))\n", 127 | " print('regression: {}'.format(regression))\n", 128 | " print('autolag: {}'.format(autolag))\n", 129 | " \n", 130 | " if plot:\n", 131 | " if window is None:\n", 132 | " window = 4\n", 133 | " #Determine rolling statistics\n", 134 | " rolmean = timeseries.rolling(window=window, center=False).mean()\n", 135 | " rolstd = timeseries.rolling(window=window, center=False).std()\n", 136 | " \n", 137 | " #Plot rolling statistics:\n", 138 | " orig = plt.plot(timeseries, color='blue', label='Original')\n", 139 | " mean = plt.plot(rolmean, color='red', label='Rolling Mean ({})'.format(window))\n", 140 | " std = plt.plot(rolstd, color='black', label='Rolling Std ({})'.format(window))\n", 141 | " plt.legend(loc='best')\n", 142 | " plt.title('Rolling Mean & Standard Deviation')\n", 143 | " plt.show(block=False)\n", 144 | " \n", 145 | " #Perform Augmented Dickey-Fuller test:\n", 146 | " dftest = smt.adfuller(timeseries, maxlag=maxlag, regression=regression, autolag=autolag)\n", 147 | " dfoutput = pd.Series(dftest[0:4], index=['Test Statistic',\n", 148 | " 'p-value',\n", 149 | " '#Lags Used',\n", 150 | " 'Number of Observations Used',\n", 151 | " ])\n", 152 | " for key,value in dftest[4].items():\n", 153 | " dfoutput['Critical Value (%s)'%key] = value\n", 154 | " if verbose:\n", 155 | " print('Results of Augmented Dickey-Fuller Test:')\n", 156 | " print(dfoutput)\n", 157 | " return dfoutput" 158 | ] 159 | }, 160 | { 161 | "cell_type": "code", 162 | "execution_count": null, 163 | 
"metadata": { 164 | "collapsed": true 165 | }, 166 | "outputs": [], 167 | "source": [ 168 | "def tsplot(y, lags=None, title='', figsize=(14, 8)):\n", 169 | " '''Examine the patterns of ACF and PACF, along with the time series plot and histogram.\n", 170 | " \n", 171 | " Original source: https://tomaugspurger.github.io/modern-7-timeseries.html\n", 172 | " '''\n", 173 | " fig = plt.figure(figsize=figsize)\n", 174 | " layout = (2, 2)\n", 175 | " ts_ax = plt.subplot2grid(layout, (0, 0))\n", 176 | " hist_ax = plt.subplot2grid(layout, (0, 1))\n", 177 | " acf_ax = plt.subplot2grid(layout, (1, 0))\n", 178 | " pacf_ax = plt.subplot2grid(layout, (1, 1))\n", 179 | " \n", 180 | " y.plot(ax=ts_ax)\n", 181 | " ts_ax.set_title(title)\n", 182 | " y.plot(ax=hist_ax, kind='hist', bins=25)\n", 183 | " hist_ax.set_title('Histogram')\n", 184 | " smt.graphics.plot_acf(y, lags=lags, ax=acf_ax)\n", 185 | " smt.graphics.plot_pacf(y, lags=lags, ax=pacf_ax)\n", 186 | " [ax.set_xlim(0) for ax in [acf_ax, pacf_ax]]\n", 187 | " sns.despine()\n", 188 | " fig.tight_layout()\n", 189 | " return ts_ax, acf_ax, pacf_ax" 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": null, 195 | "metadata": { 196 | "collapsed": true 197 | }, 198 | "outputs": [], 199 | "source": [ 200 | "def model_resid_stats(model_results,\n", 201 | " het_method='breakvar',\n", 202 | " norm_method='jarquebera',\n", 203 | " sercor_method='ljungbox',\n", 204 | " verbose=True,\n", 205 | " ):\n", 206 | " '''More information about the statistics under the ARIMA parameters table, tests of standardized residuals:\n", 207 | " \n", 208 | " Test of heteroskedasticity\n", 209 | " http://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAXResults.test_heteroskedasticity.html#statsmodels.tsa.statespace.sarimax.SARIMAXResults.test_heteroskedasticity\n", 210 | "\n", 211 | " Test of normality (Default: Jarque-Bera)\n", 212 | " 
http://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAXResults.test_normality.html#statsmodels.tsa.statespace.sarimax.SARIMAXResults.test_normality\n", 213 | "\n", 214 | " Test of serial correlation (Default: Ljung-Box)\n", 215 | " http://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAXResults.test_serial_correlation.html\n", 216 | " '''\n", 217 | " # Re-run the ARIMA model statistical tests, and more. To be used when selecting viable models.\n", 218 | " (het_stat, het_p) = model_results.test_heteroskedasticity(het_method)[0]\n", 219 | " norm_stat, norm_p, skew, kurtosis = model_results.test_normality(norm_method)[0]\n", 220 | " sercor_stat, sercor_p = model_results.test_serial_correlation(method=sercor_method)[0]\n", 221 | " sercor_stat = sercor_stat[-1] # last number for the largest lag\n", 222 | " sercor_p = sercor_p[-1] # last number for the largest lag\n", 223 | "\n", 224 | " # Run Durbin-Watson test on the standardized residuals.\n", 225 | " # The statistic is approximately equal to 2*(1-r), where r is the sample autocorrelation of the residuals.\n", 226 | " # Thus, for r == 0, indicating no serial correlation, the test statistic equals 2.\n", 227 | " # This statistic will always be between 0 and 4. The closer to 0 the statistic,\n", 228 | " # the more evidence for positive serial correlation. 
The closer to 4,\n", 229 | " # the more evidence for negative serial correlation.\n", 230 | " # Essentially, below 1 or above 3 is bad.\n", 231 | " dw_stat = sm.stats.stattools.durbin_watson(model_results.filter_results.standardized_forecasts_error[0, model_results.loglikelihood_burn:])\n", 232 | "\n", 233 | " # check whether roots are outside the unit circle (we want them to be);\n", 234 | " # will be True when AR is not used (i.e., AR order = 0)\n", 235 | " arroots_outside_unit_circle = np.all(np.abs(model_results.arroots) > 1)\n", 236 | " # will be True when MA is not used (i.e., MA order = 0)\n", 237 | " maroots_outside_unit_circle = np.all(np.abs(model_results.maroots) > 1)\n", 238 | " \n", 239 | " if verbose:\n", 240 | " print('Test heteroskedasticity of residuals ({}): stat={:.3f}, p={:.3f}'.format(het_method, het_stat, het_p));\n", 241 | " print('\\nTest normality of residuals ({}): stat={:.3f}, p={:.3f}'.format(norm_method, norm_stat, norm_p));\n", 242 | " print('\\nTest serial correlation of residuals ({}): stat={:.3f}, p={:.3f}'.format(sercor_method, sercor_stat, sercor_p));\n", 243 | " print('\\nDurbin-Watson test on residuals: d={:.2f}\\n\\t(NB: 2 means no serial correlation, 0=pos, 4=neg)'.format(dw_stat))\n", 244 | " print('\\nTest for all AR roots outside unit circle (>1): {}'.format(arroots_outside_unit_circle))\n", 245 | " print('\\nTest for all MA roots outside unit circle (>1): {}'.format(maroots_outside_unit_circle))\n", 246 | " \n", 247 | " stat = {'het_method': het_method,\n", 248 | " 'het_stat': het_stat,\n", 249 | " 'het_p': het_p,\n", 250 | " 'norm_method': norm_method,\n", 251 | " 'norm_stat': norm_stat,\n", 252 | " 'norm_p': norm_p,\n", 253 | " 'skew': skew,\n", 254 | " 'kurtosis': kurtosis,\n", 255 | " 'sercor_method': sercor_method,\n", 256 | " 'sercor_stat': sercor_stat,\n", 257 | " 'sercor_p': sercor_p,\n", 258 | " 'dw_stat': dw_stat,\n", 259 | " 'arroots_outside_unit_circle': arroots_outside_unit_circle,\n", 260 | " 
'maroots_outside_unit_circle': maroots_outside_unit_circle,\n", 261 | " }\n", 262 | " return stat" 263 | ] 264 | }, 265 | { 266 | "cell_type": "code", 267 | "execution_count": null, 268 | "metadata": { 269 | "collapsed": true 270 | }, 271 | "outputs": [], 272 | "source": [ 273 | "def model_gridsearch(ts,\n", 274 | " p_min,\n", 275 | " d_min,\n", 276 | " q_min,\n", 277 | " p_max,\n", 278 | " d_max,\n", 279 | " q_max,\n", 280 | " sP_min,\n", 281 | " sD_min,\n", 282 | " sQ_min,\n", 283 | " sP_max,\n", 284 | " sD_max,\n", 285 | " sQ_max,\n", 286 | " trends,\n", 287 | " s=None,\n", 288 | " enforce_stationarity=True,\n", 289 | " enforce_invertibility=True,\n", 290 | " simple_differencing=False,\n", 291 | " plot_diagnostics=False,\n", 292 | " verbose=False,\n", 293 | " filter_warnings=True,\n", 294 | " ):\n", 295 | " '''Run grid search of SARIMAX models and save results.\n", 296 | " '''\n", 297 | " \n", 298 | " cols = ['p', 'd', 'q', 'sP', 'sD', 'sQ', 's', 'trend',\n", 299 | " 'enforce_stationarity', 'enforce_invertibility', 'simple_differencing',\n", 300 | " 'aic', 'bic',\n", 301 | " 'het_p', 'norm_p', 'sercor_p', 'dw_stat',\n", 302 | " 'arroots_gt_1', 'maroots_gt_1',\n", 303 | " 'datetime_run']\n", 304 | "\n", 305 | " # Initialize a DataFrame to store the results\n", 306 | " df_results = pd.DataFrame(columns=cols)\n", 307 | "\n", 308 | " # # Initialize a DataFrame to store the results\n", 309 | " # results_bic = pd.DataFrame(index=['AR{}'.format(i) for i in range(p_min,p_max+1)],\n", 310 | " # columns=['MA{}'.format(i) for i in range(q_min,q_max+1)])\n", 311 | "\n", 312 | " mod_num=0\n", 313 | " for trend,p,d,q,sP,sD,sQ in itertools.product(trends,\n", 314 | " range(p_min,p_max+1),\n", 315 | " range(d_min,d_max+1),\n", 316 | " range(q_min,q_max+1),\n", 317 | " range(sP_min,sP_max+1),\n", 318 | " range(sD_min,sD_max+1),\n", 319 | " range(sQ_min,sQ_max+1),\n", 320 | " ):\n", 321 | " # initialize to store results for this parameter set\n", 322 | " this_model = 
pd.DataFrame(index=[mod_num], columns=cols)\n", 323 | "\n", 324 | " if p==0 and d==0 and q==0:\n", 325 | " continue\n", 326 | "\n", 327 | " try:\n", 328 | " model = sm.tsa.SARIMAX(ts,\n", 329 | " trend=trend,\n", 330 | " order=(p, d, q),\n", 331 | " seasonal_order=(sP, sD, sQ, s),\n", 332 | " enforce_stationarity=enforce_stationarity,\n", 333 | " enforce_invertibility=enforce_invertibility,\n", 334 | " simple_differencing=simple_differencing,\n", 335 | " )\n", 336 | " \n", 337 | " if filter_warnings is True:\n", 338 | " with warnings.catch_warnings():\n", 339 | " warnings.filterwarnings(\"ignore\")\n", 340 | " model_results = model.fit(disp=0)\n", 341 | " else:\n", 342 | " model_results = model.fit()\n", 343 | "\n", 344 | " if verbose:\n", 345 | " print(model_results.summary())\n", 346 | "\n", 347 | " if plot_diagnostics:\n", 348 | " model_results.plot_diagnostics();\n", 349 | "\n", 350 | " stat = model_resid_stats(model_results,\n", 351 | " verbose=verbose)\n", 352 | "\n", 353 | " this_model.loc[mod_num, 'p'] = p\n", 354 | " this_model.loc[mod_num, 'd'] = d\n", 355 | " this_model.loc[mod_num, 'q'] = q\n", 356 | " this_model.loc[mod_num, 'sP'] = sP\n", 357 | " this_model.loc[mod_num, 'sD'] = sD\n", 358 | " this_model.loc[mod_num, 'sQ'] = sQ\n", 359 | " this_model.loc[mod_num, 's'] = s\n", 360 | " this_model.loc[mod_num, 'trend'] = trend\n", 361 | " this_model.loc[mod_num, 'enforce_stationarity'] = enforce_stationarity\n", 362 | " this_model.loc[mod_num, 'enforce_invertibility'] = enforce_invertibility\n", 363 | " this_model.loc[mod_num, 'simple_differencing'] = simple_differencing\n", 364 | "\n", 365 | " this_model.loc[mod_num, 'aic'] = model_results.aic\n", 366 | " this_model.loc[mod_num, 'bic'] = model_results.bic\n", 367 | "\n", 368 | " # this_model.loc[mod_num, 'het_method'] = stat['het_method']\n", 369 | " # this_model.loc[mod_num, 'het_stat'] = stat['het_stat']\n", 370 | " this_model.loc[mod_num, 'het_p'] = stat['het_p']\n", 371 | " # this_model.loc[mod_num, 
'norm_method'] = stat['norm_method']\n", 372 | " # this_model.loc[mod_num, 'norm_stat'] = stat['norm_stat']\n", 373 | " this_model.loc[mod_num, 'norm_p'] = stat['norm_p']\n", 374 | " # this_model.loc[mod_num, 'skew'] = stat['skew']\n", 375 | " # this_model.loc[mod_num, 'kurtosis'] = stat['kurtosis']\n", 376 | " # this_model.loc[mod_num, 'sercor_method'] = stat['sercor_method']\n", 377 | " # this_model.loc[mod_num, 'sercor_stat'] = stat['sercor_stat']\n", 378 | " this_model.loc[mod_num, 'sercor_p'] = stat['sercor_p']\n", 379 | " this_model.loc[mod_num, 'dw_stat'] = stat['dw_stat']\n", 380 | " this_model.loc[mod_num, 'arroots_gt_1'] = stat['arroots_outside_unit_circle']\n", 381 | " this_model.loc[mod_num, 'maroots_gt_1'] = stat['maroots_outside_unit_circle']\n", 382 | "\n", 383 | " this_model.loc[mod_num, 'datetime_run'] = pd.to_datetime('today').strftime('%Y-%m-%d %H:%M:%S')\n", 384 | "\n", 385 | " df_results = pd.concat([df_results, this_model])\n", 386 | " mod_num+=1\n", 387 | " except Exception:\n", 388 | " continue\n", 389 | " return df_results" 390 | ] 391 | }, 392 | { 393 | "cell_type": "code", 394 | "execution_count": null, 395 | "metadata": { 396 | "collapsed": true 397 | }, 398 | "outputs": [], 399 | "source": [ 400 | "# load time series\n", 401 | "liquor = pd.read_csv('data/liquor.csv', header=0, index_col=0, parse_dates=[0])\n", 402 | "\n", 403 | "# Keep only the data from the last 10 years or so\n", 404 | "liquor = liquor.loc['2007':'2016']" 405 | ] 406 | }, 407 | { 408 | "cell_type": "code", 409 | "execution_count": null, 410 | "metadata": { 411 | "collapsed": false 412 | }, 413 | "outputs": [], 414 | "source": [ 415 | "# plot\n", 416 | "tsplot(liquor['Value'], title='Liquor Sales (in millions of dollars), 2007-2016', lags=40);" 417 | ] 418 | }, 419 | { 420 | "cell_type": "code", 421 | "execution_count": null, 422 | "metadata": { 423 | "collapsed": false 424 | }, 425 | "outputs": [], 426 | "source": [ 427 | "# Test stationarity\n", 428 | "\n", 429 | 
"test_stationarity(liquor['Value'])" 430 | ] 431 | }, 432 | { 433 | "cell_type": "code", 434 | "execution_count": null, 435 | "metadata": { 436 | "collapsed": false 437 | }, 438 | "outputs": [], 439 | "source": [ 440 | "# Take first difference of the series\n", 441 | "test_stationarity(liquor['Value'].diff().dropna())" 442 | ] 443 | }, 444 | { 445 | "cell_type": "code", 446 | "execution_count": null, 447 | "metadata": { 448 | "collapsed": false 449 | }, 450 | "outputs": [], 451 | "source": [ 452 | "# Take log of the series\n", 453 | "liquor['lnliquor'] = np.log(liquor)\n", 454 | "\n", 455 | "test_stationarity(liquor['lnliquor'])" 456 | ] 457 | }, 458 | { 459 | "cell_type": "code", 460 | "execution_count": null, 461 | "metadata": { 462 | "collapsed": false 463 | }, 464 | "outputs": [], 465 | "source": [ 466 | "# Take first difference of the log series\n", 467 | "liquor_ln_diff = liquor['lnliquor'].diff()\n", 468 | "liquor_ln_diff = liquor_ln_diff.dropna()\n", 469 | "\n", 470 | "test_stationarity(liquor_ln_diff)" 471 | ] 472 | }, 473 | { 474 | "cell_type": "code", 475 | "execution_count": null, 476 | "metadata": { 477 | "collapsed": false, 478 | "scrolled": false 479 | }, 480 | "outputs": [], 481 | "source": [ 482 | "# run model grid search\n", 483 | "\n", 484 | "p_min = 0\n", 485 | "d_min = 0\n", 486 | "q_min = 0\n", 487 | "p_max = 2\n", 488 | "d_max = 1\n", 489 | "q_max = 2\n", 490 | "\n", 491 | "sP_min = 0\n", 492 | "sD_min = 0\n", 493 | "sQ_min = 0\n", 494 | "sP_max = 1\n", 495 | "sD_max = 1\n", 496 | "sQ_max = 1\n", 497 | "\n", 498 | "s=12\n", 499 | "\n", 500 | "# trends=['n', 'c']\n", 501 | "trends=['n']\n", 502 | "\n", 503 | "enforce_stationarity=True\n", 504 | "enforce_invertibility=True\n", 505 | "simple_differencing=False\n", 506 | "\n", 507 | "plot_diagnostics=False\n", 508 | "\n", 509 | "verbose=False\n", 510 | "\n", 511 | "df_results = model_gridsearch(liquor['Value'],\n", 512 | " p_min,\n", 513 | " d_min,\n", 514 | " q_min,\n", 515 | " p_max,\n", 516 | 
" d_max,\n", 517 | " q_max,\n", 518 | " sP_min,\n", 519 | " sD_min,\n", 520 | " sQ_min,\n", 521 | " sP_max,\n", 522 | " sD_max,\n", 523 | " sQ_max,\n", 524 | " trends,\n", 525 | " s=s,\n", 526 | " enforce_stationarity=enforce_stationarity,\n", 527 | " enforce_invertibility=enforce_invertibility,\n", 528 | " simple_differencing=simple_differencing,\n", 529 | " plot_diagnostics=plot_diagnostics,\n", 530 | " verbose=verbose,\n", 531 | " )" 532 | ] 533 | }, 534 | { 535 | "cell_type": "code", 536 | "execution_count": null, 537 | "metadata": { 538 | "collapsed": false 539 | }, 540 | "outputs": [], 541 | "source": [ 542 | "# choose a model\n", 543 | "\n", 544 | "df_results.sort_values(by='bic').head(10)" 545 | ] 546 | }, 547 | { 548 | "cell_type": "markdown", 549 | "metadata": {}, 550 | "source": [ 551 | "### 5.2 Where to go from here" 552 | ] 553 | }, 554 | { 555 | "cell_type": "markdown", 556 | "metadata": {}, 557 | "source": [ 558 | "#### Jupyter notebooks, presentations, blog posts\n", 559 | "\n", 560 | "- Example notebooks using SARIMAX models:\n", 561 | " - SARIMAX introduction\n", 562 | " - http://www.statsmodels.org/dev/examples/notebooks/generated/statespace_sarimax_stata.html\n", 563 | " - Model selection, missing data\n", 564 | " - http://www.statsmodels.org/dev/examples/notebooks/generated/statespace_sarimax_internet.html\n", 565 | "- \"Time series analysis and forecasting with statsmodels\" presentation, by a statsmodels lead contributor\n", 566 | " - https://josef-pkt.github.io/pages/slides/slides_forecasting.slides.html\n", 567 | "- Time series analysis using Pandas and statsmodels, by a statsmodels lead contributor\n", 568 | " - https://tomaugspurger.github.io/modern-7-timeseries.html\n", 569 | "- Time series analysis using Pandas and statsmodels\n", 570 | " - https://www.analyticsvidhya.com/blog/2016/02/time-series-forecasting-codes-python/\n", 571 | "- Simple example of SARIMA forecast (based on the analyticsvidhya.com post)\n", 572 | " - 
http://www.seanabu.com/2016/03/22/time-series-seasonal-ARIMA-model-in-python/\n", 573 | "\n", 574 | "\n", 575 | "#### Free textbooks available online\n", 576 | "\n", 577 | "- Forecasting: principles and practice (free textbook, with R code)\n", 578 | " - https://www.otexts.org/fpp/\n", 579 | "- Time Series Analysis and Its Applications: With R Examples (Shumway & Stoffer, EZ time series edition)\n", 580 | " - http://www.stat.pitt.edu/stoffer/tsa4/\n", 581 | "\n" 582 | ] 583 | } 584 | ], 585 | "metadata": { 586 | "kernelspec": { 587 | "display_name": "Python 3", 588 | "language": "python", 589 | "name": "python3" 590 | }, 591 | "language_info": { 592 | "codemirror_mode": { 593 | "name": "ipython", 594 | "version": 3 595 | }, 596 | "file_extension": ".py", 597 | "mimetype": "text/x-python", 598 | "name": "python", 599 | "nbconvert_exporter": "python", 600 | "pygments_lexer": "ipython3", 601 | "version": "3.5.2" 602 | } 603 | }, 604 | "nbformat": 4, 605 | "nbformat_minor": 0 606 | } 607 | -------------------------------------------------------------------------------- /data/HOUST.csv: -------------------------------------------------------------------------------- 1 | observation_date,HOUST 2 | 1/1/93,1210 3 | 2/1/93,1210 4 | 3/1/93,1083 5 | 4/1/93,1258 6 | 5/1/93,1260 7 | 6/1/93,1280 8 | 7/1/93,1254 9 | 8/1/93,1300 10 | 9/1/93,1343 11 | 10/1/93,1392 12 | 11/1/93,1376 13 | 12/1/93,1533 14 | 1/1/94,1272 15 | 2/1/94,1337 16 | 3/1/94,1564 17 | 4/1/94,1465 18 | 5/1/94,1526 19 | 6/1/94,1409 20 | 7/1/94,1439 21 | 8/1/94,1450 22 | 9/1/94,1474 23 | 10/1/94,1450 24 | 11/1/94,1511 25 | 12/1/94,1455 26 | 1/1/95,1407 27 | 2/1/95,1316 28 | 3/1/95,1249 29 | 4/1/95,1267 30 | 5/1/95,1314 31 | 6/1/95,1281 32 | 7/1/95,1461 33 | 8/1/95,1416 34 | 9/1/95,1369 35 | 10/1/95,1369 36 | 11/1/95,1452 37 | 12/1/95,1431 38 | 1/1/96,1467 39 | 2/1/96,1491 40 | 3/1/96,1424 41 | 4/1/96,1516 42 | 5/1/96,1504 43 | 6/1/96,1467 44 | 7/1/96,1472 45 | 8/1/96,1557 46 | 9/1/96,1475 47 | 10/1/96,1392 48 | 
11/1/96,1489 49 | 12/1/96,1370 50 | 1/1/97,1355 51 | 2/1/97,1486 52 | 3/1/97,1457 53 | 4/1/97,1492 54 | 5/1/97,1442 55 | 6/1/97,1494 56 | 7/1/97,1437 57 | 8/1/97,1390 58 | 9/1/97,1546 59 | 10/1/97,1520 60 | 11/1/97,1510 61 | 12/1/97,1566 62 | 1/1/98,1525 63 | 2/1/98,1584 64 | 3/1/98,1567 65 | 4/1/98,1540 66 | 5/1/98,1536 67 | 6/1/98,1641 68 | 7/1/98,1698 69 | 8/1/98,1614 70 | 9/1/98,1582 71 | 10/1/98,1715 72 | 11/1/98,1660 73 | 12/1/98,1792 74 | 1/1/99,1748 75 | 2/1/99,1670 76 | 3/1/99,1710 77 | 4/1/99,1553 78 | 5/1/99,1611 79 | 6/1/99,1559 80 | 7/1/99,1669 81 | 8/1/99,1648 82 | 9/1/99,1635 83 | 10/1/99,1608 84 | 11/1/99,1648 85 | 12/1/99,1708 86 | 1/1/00,1636 87 | 2/1/00,1737 88 | 3/1/00,1604 89 | 4/1/00,1626 90 | 5/1/00,1575 91 | 6/1/00,1559 92 | 7/1/00,1463 93 | 8/1/00,1541 94 | 9/1/00,1507 95 | 10/1/00,1549 96 | 11/1/00,1551 97 | 12/1/00,1532 98 | 1/1/01,1600 99 | 2/1/01,1625 100 | 3/1/01,1590 101 | 4/1/01,1649 102 | 5/1/01,1605 103 | 6/1/01,1636 104 | 7/1/01,1670 105 | 8/1/01,1567 106 | 9/1/01,1562 107 | 10/1/01,1540 108 | 11/1/01,1602 109 | 12/1/01,1568 110 | 1/1/02,1698 111 | 2/1/02,1829 112 | 3/1/02,1642 113 | 4/1/02,1592 114 | 5/1/02,1764 115 | 6/1/02,1717 116 | 7/1/02,1655 117 | 8/1/02,1633 118 | 9/1/02,1804 119 | 10/1/02,1648 120 | 11/1/02,1753 121 | 12/1/02,1788 122 | 1/1/03,1853 123 | 2/1/03,1629 124 | 3/1/03,1726 125 | 4/1/03,1643 126 | 5/1/03,1751 127 | 6/1/03,1867 128 | 7/1/03,1897 129 | 8/1/03,1833 130 | 9/1/03,1939 131 | 10/1/03,1967 132 | 11/1/03,2083 133 | 12/1/03,2057 134 | 1/1/04,1911 135 | 2/1/04,1846 136 | 3/1/04,1998 137 | 4/1/04,2003 138 | 5/1/04,1981 139 | 6/1/04,1828 140 | 7/1/04,2002 141 | 8/1/04,2024 142 | 9/1/04,1905 143 | 10/1/04,2072 144 | 11/1/04,1782 145 | 12/1/04,2042 146 | 1/1/05,2144 147 | 2/1/05,2207 148 | 3/1/05,1864 149 | 4/1/05,2061 150 | 5/1/05,2025 151 | 6/1/05,2068 152 | 7/1/05,2054 153 | 8/1/05,2095 154 | 9/1/05,2151 155 | 10/1/05,2065 156 | 11/1/05,2147 157 | 12/1/05,1994 158 | 1/1/06,2273 159 | 2/1/06,2119 160 | 
3/1/06,1969 161 | 4/1/06,1821 162 | 5/1/06,1942 163 | 6/1/06,1802 164 | 7/1/06,1737 165 | 8/1/06,1650 166 | 9/1/06,1720 167 | 10/1/06,1491 168 | 11/1/06,1570 169 | 12/1/06,1649 170 | 1/1/07,1409 171 | 2/1/07,1480 172 | 3/1/07,1495 173 | 4/1/07,1490 174 | 5/1/07,1415 175 | 6/1/07,1448 176 | 7/1/07,1354 177 | 8/1/07,1330 178 | 9/1/07,1183 179 | 10/1/07,1264 180 | 11/1/07,1197 181 | 12/1/07,1037 182 | 1/1/08,1084 183 | 2/1/08,1103 184 | 3/1/08,1005 185 | 4/1/08,1013 186 | 5/1/08,973 187 | 6/1/08,1046 188 | 7/1/08,923 189 | 8/1/08,844 190 | 9/1/08,820 191 | 10/1/08,777 192 | 11/1/08,652 193 | 12/1/08,560 194 | 1/1/09,490 195 | 2/1/09,582 196 | 3/1/09,505 197 | 4/1/09,478 198 | 5/1/09,540 199 | 6/1/09,585 200 | 7/1/09,594 201 | 8/1/09,586 202 | 9/1/09,585 203 | 10/1/09,534 204 | 11/1/09,588 205 | 12/1/09,581 206 | 1/1/10,614 207 | 2/1/10,604 208 | 3/1/10,636 209 | 4/1/10,687 210 | 5/1/10,583 211 | 6/1/10,536 212 | 7/1/10,546 213 | 8/1/10,599 214 | 9/1/10,594 215 | 10/1/10,543 216 | 11/1/10,545 217 | 12/1/10,539 218 | 1/1/11,630 219 | 2/1/11,517 220 | 3/1/11,600 221 | 4/1/11,554 222 | 5/1/11,561 223 | 6/1/11,608 224 | 7/1/11,623 225 | 8/1/11,585 226 | 9/1/11,650 227 | 10/1/11,610 228 | 11/1/11,711 229 | 12/1/11,694 230 | 1/1/12,723 231 | 2/1/12,704 232 | 3/1/12,695 233 | 4/1/12,753 234 | 5/1/12,708 235 | 6/1/12,757 236 | 7/1/12,740 237 | 8/1/12,754 238 | 9/1/12,847 239 | 10/1/12,915 240 | 11/1/12,833 241 | 12/1/12,976 242 | 1/1/13,888 243 | 2/1/13,970 244 | 3/1/13,999 245 | 4/1/13,826 246 | 5/1/13,920 247 | 6/1/13,852 248 | 7/1/13,891 249 | 8/1/13,898 250 | 9/1/13,860 251 | 10/1/13,921 252 | 11/1/13,1104 253 | 12/1/13,1010 254 | 1/1/14,902 255 | 2/1/14,948 256 | 3/1/14,973 257 | 4/1/14,1038 258 | 5/1/14,987 259 | 6/1/14,928 260 | 7/1/14,1085 261 | 8/1/14,984 262 | 9/1/14,999 263 | 10/1/14,1094 264 | 11/1/14,994 265 | 12/1/14,1081 266 | 1/1/15,1101 267 | 2/1/15,893 268 | 3/1/15,964 269 | 4/1/15,1192 270 | 5/1/15,1063 271 | 6/1/15,1213 272 | 7/1/15,1147 273 | 8/1/15,1132 
274 | 9/1/15,1189 275 | 10/1/15,1073 276 | 11/1/15,1171 277 | 12/1/15,1160 278 | 1/1/16,1128 279 | 2/1/16,1213 280 | 3/1/16,1113 281 | 4/1/16,1155 282 | 5/1/16,1135 283 | 6/1/16,1189 -------------------------------------------------------------------------------- /data/TOTALSA.csv: -------------------------------------------------------------------------------- 1 | DATE,TOTALSA 2 | 1993-01-01,13.5 3 | 1993-02-01,13.0 4 | 1993-03-01,13.3 5 | 1993-04-01,14.5 6 | 1993-05-01,14.5 7 | 1993-06-01,14.5 8 | 1993-07-01,14.5 9 | 1993-08-01,13.6 10 | 1993-09-01,14.0 11 | 1993-10-01,14.9 12 | 1993-11-01,14.9 13 | 1993-12-01,15.0 14 | 1994-01-01,15.4 15 | 1994-02-01,15.5 16 | 1994-03-01,15.3 17 | 1994-04-01,16.0 18 | 1994-05-01,14.6 19 | 1994-06-01,15.1 20 | 1994-07-01,14.9 21 | 1994-08-01,15.4 22 | 1994-09-01,15.3 23 | 1994-10-01,15.9 24 | 1994-11-01,15.9 25 | 1994-12-01,15.6 26 | 1995-01-01,14.8 27 | 1995-02-01,14.9 28 | 1995-03-01,15.3 29 | 1995-04-01,14.4 30 | 1995-05-01,14.8 31 | 1995-06-01,15.4 32 | 1995-07-01,14.7 33 | 1995-08-01,15.4 34 | 1995-09-01,15.3 35 | 1995-10-01,14.9 36 | 1995-11-01,15.4 37 | 1995-12-01,16.3 38 | 1996-01-01,14.8 39 | 1996-02-01,15.6 40 | 1996-03-01,16.0 41 | 1996-04-01,15.5 42 | 1996-05-01,16.0 43 | 1996-06-01,15.3 44 | 1996-07-01,15.1 45 | 1996-08-01,15.5 46 | 1996-09-01,15.5 47 | 1996-10-01,15.3 48 | 1996-11-01,15.7 49 | 1996-12-01,15.2 50 | 1997-01-01,15.7 51 | 1997-02-01,15.3 52 | 1997-03-01,15.8 53 | 1997-04-01,15.1 54 | 1997-05-01,15.1 55 | 1997-06-01,14.5 56 | 1997-07-01,15.6 57 | 1997-08-01,16.1 58 | 1997-09-01,15.1 59 | 1997-10-01,15.4 60 | 1997-11-01,15.8 61 | 1997-12-01,16.5 62 | 1998-01-01,14.8 63 | 1998-02-01,15.2 64 | 1998-03-01,15.4 65 | 1998-04-01,15.9 66 | 1998-05-01,17.1 67 | 1998-06-01,16.8 68 | 1998-07-01,14.7 69 | 1998-08-01,14.8 70 | 1998-09-01,16.3 71 | 1998-10-01,17.1 72 | 1998-11-01,16.1 73 | 1998-12-01,17.4 74 | 1999-01-01,16.6 75 | 1999-02-01,17.1 76 | 1999-03-01,16.8 77 | 1999-04-01,16.9 78 | 1999-05-01,17.6 79 | 
1999-06-01,17.3 80 | 1999-07-01,17.7 81 | 1999-08-01,17.6 82 | 1999-09-01,17.7 83 | 1999-10-01,17.7 84 | 1999-11-01,17.6 85 | 1999-12-01,18.3 86 | 2000-01-01,18.6 87 | 2000-02-01,19.4 88 | 2000-03-01,18.3 89 | 2000-04-01,17.9 90 | 2000-05-01,17.9 91 | 2000-06-01,17.6 92 | 2000-07-01,17.3 93 | 2000-08-01,17.5 94 | 2000-09-01,18.7 95 | 2000-10-01,17.5 96 | 2000-11-01,16.6 97 | 2000-12-01,16.2 98 | 2001-01-01,17.7 99 | 2001-02-01,17.8 100 | 2001-03-01,17.2 101 | 2001-04-01,16.9 102 | 2001-05-01,16.9 103 | 2001-06-01,17.5 104 | 2001-07-01,16.5 105 | 2001-08-01,16.3 106 | 2001-09-01,16.4 107 | 2001-10-01,22.1 108 | 2001-11-01,18.0 109 | 2001-12-01,16.5 110 | 2002-01-01,16.5 111 | 2002-02-01,17.3 112 | 2002-03-01,17.1 113 | 2002-04-01,17.7 114 | 2002-05-01,16.2 115 | 2002-06-01,16.9 116 | 2002-07-01,18.2 117 | 2002-08-01,18.4 118 | 2002-09-01,16.7 119 | 2002-10-01,16.3 120 | 2002-11-01,16.5 121 | 2002-12-01,17.9 122 | 2003-01-01,16.7 123 | 2003-02-01,16.1 124 | 2003-03-01,16.5 125 | 2003-04-01,16.7 126 | 2003-05-01,16.5 127 | 2003-06-01,17.0 128 | 2003-07-01,17.1 129 | 2003-08-01,18.3 130 | 2003-09-01,17.3 131 | 2003-10-01,16.5 132 | 2003-11-01,17.6 133 | 2003-12-01,17.4 134 | 2004-01-01,16.7 135 | 2004-02-01,17.0 136 | 2004-03-01,17.2 137 | 2004-04-01,16.9 138 | 2004-05-01,18.2 139 | 2004-06-01,16.2 140 | 2004-07-01,17.3 141 | 2004-08-01,17.2 142 | 2004-09-01,17.9 143 | 2004-10-01,17.5 144 | 2004-11-01,17.4 145 | 2004-12-01,18.1 146 | 2005-01-01,16.9 147 | 2005-02-01,16.9 148 | 2005-03-01,17.4 149 | 2005-04-01,17.8 150 | 2005-05-01,17.4 151 | 2005-06-01,18.5 152 | 2005-07-01,21.1 153 | 2005-08-01,17.4 154 | 2005-09-01,16.9 155 | 2005-10-01,15.3 156 | 2005-11-01,16.5 157 | 2005-12-01,17.2 158 | 2006-01-01,18.1 159 | 2006-02-01,17.1 160 | 2006-03-01,17.0 161 | 2006-04-01,17.1 162 | 2006-05-01,16.7 163 | 2006-06-01,16.9 164 | 2006-07-01,17.7 165 | 2006-08-01,16.5 166 | 2006-09-01,17.0 167 | 2006-10-01,16.9 168 | 2006-11-01,16.7 169 | 2006-12-01,17.1 170 | 2007-01-01,16.9 
171 | 2007-02-01,17.2 172 | 2007-03-01,16.4 173 | 2007-04-01,16.6 174 | 2007-05-01,16.7 175 | 2007-06-01,16.2 176 | 2007-07-01,15.8 177 | 2007-08-01,16.4 178 | 2007-09-01,16.5 179 | 2007-10-01,16.5 180 | 2007-11-01,16.4 181 | 2007-12-01,16.0 182 | 2008-01-01,15.7 183 | 2008-02-01,15.5 184 | 2008-03-01,15.1 185 | 2008-04-01,14.6 186 | 2008-05-01,14.7 187 | 2008-06-01,14.4 188 | 2008-07-01,13.0 189 | 2008-08-01,14.1 190 | 2008-09-01,13.0 191 | 2008-10-01,10.9 192 | 2008-11-01,10.5 193 | 2008-12-01,10.4 194 | 2009-01-01,9.8 195 | 2009-02-01,9.2 196 | 2009-03-01,9.8 197 | 2009-04-01,9.4 198 | 2009-05-01,10.2 199 | 2009-06-01,10.1 200 | 2009-07-01,11.6 201 | 2009-08-01,14.8 202 | 2009-09-01,9.5 203 | 2009-10-01,10.6 204 | 2009-11-01,11.0 205 | 2009-12-01,11.3 206 | 2010-01-01,10.9 207 | 2010-02-01,10.3 208 | 2010-03-01,11.8 209 | 2010-04-01,11.5 210 | 2010-05-01,12.0 211 | 2010-06-01,11.6 212 | 2010-07-01,11.9 213 | 2010-08-01,12.0 214 | 2010-09-01,11.9 215 | 2010-10-01,12.4 216 | 2010-11-01,12.3 217 | 2010-12-01,12.6 218 | 2011-01-01,12.8 219 | 2011-02-01,13.1 220 | 2011-03-01,13.2 221 | 2011-04-01,13.4 222 | 2011-05-01,12.3 223 | 2011-06-01,11.9 224 | 2011-07-01,12.7 225 | 2011-08-01,12.6 226 | 2011-09-01,13.4 227 | 2011-10-01,13.7 228 | 2011-11-01,13.6 229 | 2011-12-01,13.8 230 | 2012-01-01,14.3 231 | 2012-02-01,14.9 232 | 2012-03-01,14.6 233 | 2012-04-01,14.8 234 | 2012-05-01,14.4 235 | 2012-06-01,14.5 236 | 2012-07-01,14.5 237 | 2012-08-01,14.5 238 | 2012-09-01,15.2 239 | 2012-10-01,14.8 240 | 2012-11-01,15.4 241 | 2012-12-01,15.6 242 | 2013-01-01,15.7 243 | 2013-02-01,15.9 244 | 2013-03-01,15.7 245 | 2013-04-01,15.8 246 | 2013-05-01,15.7 247 | 2013-06-01,16.1 248 | 2013-07-01,16.0 249 | 2013-08-01,16.0 250 | 2013-09-01,15.7 251 | 2013-10-01,15.7 252 | 2013-11-01,16.4 253 | 2013-12-01,15.8 254 | 2014-01-01,15.7 255 | 2014-02-01,15.9 256 | 2014-03-01,17.0 257 | 2014-04-01,16.7 258 | 2014-05-01,16.9 259 | 2014-06-01,17.2 260 | 2014-07-01,16.9 261 | 2014-08-01,17.6 
262 | 2014-09-01,16.7 263 | 2014-10-01,16.8 264 | 2014-11-01,17.4 265 | 2014-12-01,17.4 266 | 2015-01-01,17.2 267 | 2015-02-01,16.9 268 | 2015-03-01,17.8 269 | 2015-04-01,17.3 270 | 2015-05-01,18.0 271 | 2015-06-01,17.5 272 | 2015-07-01,18.0 273 | 2015-08-01,18.2 274 | 2015-09-01,18.4 275 | 2015-10-01,18.5 276 | 2015-11-01,18.4 277 | 2015-12-01,17.9 278 | 2016-01-01,18.0 279 | 2016-02-01,18.0 280 | 2016-03-01,17.2 281 | 2016-04-01,17.9 282 | 2016-05-01,17.6 283 | 2016-06-01,17.1 284 | -------------------------------------------------------------------------------- /data/TTLCON.csv: -------------------------------------------------------------------------------- 1 | DATE,TTLCON 2 | 1993-01-01,31283 3 | 1993-02-01,30264 4 | 1993-03-01,33794 5 | 1993-04-01,37257 6 | 1993-05-01,40124 7 | 1993-06-01,43842 8 | 1993-07-01,44682 9 | 1993-08-01,46866 10 | 1993-09-01,47755 11 | 1993-10-01,45361 12 | 1993-11-01,44786 13 | 1993-12-01,39535 14 | 1994-01-01,34917 15 | 1994-02-01,33222 16 | 1994-03-01,38215 17 | 1994-04-01,41754 18 | 1994-05-01,45711 19 | 1994-06-01,49032 20 | 1994-07-01,49545 21 | 1994-08-01,51516 22 | 1994-09-01,51718 23 | 1994-10-01,49135 24 | 1994-11-01,46719 25 | 1994-12-01,40406 26 | 1995-01-01,37206 27 | 1995-02-01,35250 28 | 1995-03-01,40016 29 | 1995-04-01,42824 30 | 1995-05-01,46615 31 | 1995-06-01,49804 32 | 1995-07-01,50219 33 | 1995-08-01,52628 34 | 1995-09-01,53028 35 | 1995-10-01,51372 36 | 1995-11-01,48163 37 | 1995-12-01,41542 38 | 1996-01-01,39356 39 | 1996-02-01,37391 40 | 1996-03-01,42052 41 | 1996-04-01,46865 42 | 1996-05-01,51598 43 | 1996-06-01,54703 44 | 1996-07-01,55430 45 | 1996-08-01,57488 46 | 1996-09-01,58544 47 | 1996-10-01,57456 48 | 1996-11-01,53498 49 | 1996-12-01,45313 50 | 1997-01-01,41882 51 | 1997-02-01,40788 52 | 1997-03-01,45704 53 | 1997-04-01,48998 54 | 1997-05-01,53617 55 | 1997-06-01,56977 56 | 1997-07-01,58826 57 | 1997-08-01,60682 58 | 1997-09-01,61323 59 | 1997-10-01,60053 60 | 1997-11-01,54968 61 | 1997-12-01,48031 
62 | 1998-01-01,44409 63 | 1998-02-01,43150 64 | 1998-03-01,49602 65 | 1998-04-01,54017 66 | 1998-05-01,57723 67 | 1998-06-01,64393 68 | 1998-07-01,64381 69 | 1998-08-01,66137 70 | 1998-09-01,66544 71 | 1998-10-01,64600 72 | 1998-11-01,60469 73 | 1998-12-01,53095 74 | 1999-01-01,48746 75 | 1999-02-01,48824 76 | 1999-03-01,55607 77 | 1999-04-01,58654 78 | 1999-05-01,62209 79 | 1999-06-01,67545 80 | 1999-07-01,68522 81 | 1999-08-01,69975 82 | 1999-09-01,70034 83 | 1999-10-01,68669 84 | 1999-11-01,66684 85 | 1999-12-01,59082 86 | 2000-01-01,53782 87 | 2000-02-01,53993 88 | 2000-03-01,61295 89 | 2000-04-01,64524 90 | 2000-05-01,69253 91 | 2000-06-01,72715 92 | 2000-07-01,71774 93 | 2000-08-01,76132 94 | 2000-09-01,75153 95 | 2000-10-01,73550 96 | 2000-11-01,69677 97 | 2000-12-01,60910 98 | 2001-01-01,56855 99 | 2001-02-01,55529 100 | 2001-03-01,62808 101 | 2001-04-01,67787 102 | 2001-05-01,72920 103 | 2001-06-01,78004 104 | 2001-07-01,78281 105 | 2001-08-01,79916 106 | 2001-09-01,76605 107 | 2001-10-01,76302 108 | 2001-11-01,71543 109 | 2001-12-01,63697 110 | 2002-01-01,59516 111 | 2002-02-01,58588 112 | 2002-03-01,63782 113 | 2002-04-01,69504 114 | 2002-05-01,73384 115 | 2002-06-01,77182 116 | 2002-07-01,78863 117 | 2002-08-01,79460 118 | 2002-09-01,76542 119 | 2002-10-01,75710 120 | 2002-11-01,71362 121 | 2002-12-01,63984 122 | 2003-01-01,59877 123 | 2003-02-01,58526 124 | 2003-03-01,64506 125 | 2003-04-01,69638 126 | 2003-05-01,74473 127 | 2003-06-01,80377 128 | 2003-07-01,82971 129 | 2003-08-01,85191 130 | 2003-09-01,83841 131 | 2003-10-01,83133 132 | 2003-11-01,77915 133 | 2003-12-01,71050 134 | 2004-01-01,64934 135 | 2004-02-01,64138 136 | 2004-03-01,73238 137 | 2004-04-01,78354 138 | 2004-05-01,83736 139 | 2004-06-01,89932 140 | 2004-07-01,93614 141 | 2004-08-01,96164 142 | 2004-09-01,92538 143 | 2004-10-01,90582 144 | 2004-11-01,86394 145 | 2004-12-01,77733 146 | 2005-01-01,72458 147 | 2005-02-01,73094 148 | 2005-03-01,81791 149 | 2005-04-01,88032 150 | 
2005-05-01,93704 151 | 2005-06-01,100678 152 | 2005-07-01,103875 153 | 2005-08-01,107453 154 | 2005-09-01,104682 155 | 2005-10-01,104039 156 | 2005-11-01,98348 157 | 2005-12-01,88657 158 | 2006-01-01,82400 159 | 2006-02-01,82381 160 | 2006-03-01,92354 161 | 2006-04-01,97056 162 | 2006-05-01,101862 163 | 2006-06-01,106777 164 | 2006-07-01,107150 165 | 2006-08-01,108598 166 | 2006-09-01,103102 167 | 2006-10-01,100721 168 | 2006-11-01,93850 169 | 2006-12-01,85031 170 | 2007-01-01,79009 171 | 2007-02-01,78501 172 | 2007-03-01,87421 173 | 2007-04-01,93644 174 | 2007-05-01,99690 175 | 2007-06-01,105020 176 | 2007-07-01,106779 177 | 2007-08-01,109573 178 | 2007-09-01,104744 179 | 2007-10-01,104313 180 | 2007-11-01,94934 181 | 2007-12-01,84325 182 | 2008-01-01,78039 183 | 2008-02-01,77921 184 | 2008-03-01,83384 185 | 2008-04-01,89092 186 | 2008-05-01,93316 187 | 2008-06-01,97882 188 | 2008-07-01,100234 189 | 2008-08-01,100366 190 | 2008-09-01,97303 191 | 2008-10-01,96609 192 | 2008-11-01,86093 193 | 2008-12-01,77112 194 | 2009-01-01,67301 195 | 2009-02-01,67030 196 | 2009-03-01,71985 197 | 2009-04-01,75043 198 | 2009-05-01,76661 199 | 2009-06-01,82089 200 | 2009-07-01,83885 201 | 2009-08-01,84426 202 | 2009-09-01,82037 203 | 2009-10-01,79952 204 | 2009-11-01,71527 205 | 2009-12-01,64608 206 | 2010-01-01,55362 207 | 2010-02-01,54986 208 | 2010-03-01,60990 209 | 2010-04-01,66565 210 | 2010-05-01,68903 211 | 2010-06-01,74806 212 | 2010-07-01,73918 213 | 2010-08-01,76554 214 | 2010-09-01,75818 215 | 2010-10-01,73386 216 | 2010-11-01,67318 217 | 2010-12-01,60648 218 | 2011-01-01,50973 219 | 2011-02-01,51017 220 | 2011-03-01,57148 221 | 2011-04-01,61590 222 | 2011-05-01,65430 223 | 2011-06-01,72495 224 | 2011-07-01,72253 225 | 2011-08-01,76986 226 | 2011-09-01,75871 227 | 2011-10-01,73783 228 | 2011-11-01,68203 229 | 2011-12-01,62582 230 | 2012-01-01,56608 231 | 2012-02-01,56994 232 | 2012-03-01,61803 233 | 2012-04-01,67002 234 | 2012-05-01,72228 235 | 2012-06-01,77580 236 | 
2012-07-01,78305 237 | 2012-08-01,81152 238 | 2012-09-01,79404 239 | 2012-10-01,80287 240 | 2012-11-01,73071 241 | 2012-12-01,66022 242 | 2013-01-01,58821 243 | 2013-02-01,58898 244 | 2013-03-01,64190 245 | 2013-04-01,70601 246 | 2013-05-01,75775 247 | 2013-06-01,80997 248 | 2013-07-01,84346 249 | 2013-08-01,86776 250 | 2013-09-01,85825 251 | 2013-10-01,86551 252 | 2013-11-01,79695 253 | 2013-12-01,73876 254 | 2014-01-01,67391 255 | 2014-02-01,66916 256 | 2014-03-01,73899 257 | 2014-04-01,80749 258 | 2014-05-01,85599 259 | 2014-06-01,90712 260 | 2014-07-01,92585 261 | 2014-08-01,93261 262 | 2014-09-01,93193 263 | 2014-10-01,94888 264 | 2014-11-01,86262 265 | 2014-12-01,80174 266 | 2015-01-01,71447 267 | 2015-02-01,71159 268 | 2015-03-01,79751 269 | 2015-04-01,88308 270 | 2015-05-01,94756 271 | 2015-06-01,102716 272 | 2015-07-01,104958 273 | 2015-08-01,106856 274 | 2015-09-01,106868 275 | 2015-10-01,103844 276 | 2015-11-01,94876 277 | 2015-12-01,86894 278 | 2016-01-01,78610 279 | 2016-02-01,79903 280 | 2016-03-01,88984 281 | 2016-04-01,92208 282 | 2016-05-01,97406 283 | 2016-06-01,102696 284 | -------------------------------------------------------------------------------- /data/international-airline-passengers.csv: -------------------------------------------------------------------------------- 1 | "Month","n_pass_thousands" 2 | "1949-01",112 3 | "1949-02",118 4 | "1949-03",132 5 | "1949-04",129 6 | "1949-05",121 7 | "1949-06",135 8 | "1949-07",148 9 | "1949-08",148 10 | "1949-09",136 11 | "1949-10",119 12 | "1949-11",104 13 | "1949-12",118 14 | "1950-01",115 15 | "1950-02",126 16 | "1950-03",141 17 | "1950-04",135 18 | "1950-05",125 19 | "1950-06",149 20 | "1950-07",170 21 | "1950-08",170 22 | "1950-09",158 23 | "1950-10",133 24 | "1950-11",114 25 | "1950-12",140 26 | "1951-01",145 27 | "1951-02",150 28 | "1951-03",178 29 | "1951-04",163 30 | "1951-05",172 31 | "1951-06",178 32 | "1951-07",199 33 | "1951-08",199 34 | "1951-09",184 35 | "1951-10",162 36 | 
"1951-11",146 37 | "1951-12",166 38 | "1952-01",171 39 | "1952-02",180 40 | "1952-03",193 41 | "1952-04",181 42 | "1952-05",183 43 | "1952-06",218 44 | "1952-07",230 45 | "1952-08",242 46 | "1952-09",209 47 | "1952-10",191 48 | "1952-11",172 49 | "1952-12",194 50 | "1953-01",196 51 | "1953-02",196 52 | "1953-03",236 53 | "1953-04",235 54 | "1953-05",229 55 | "1953-06",243 56 | "1953-07",264 57 | "1953-08",272 58 | "1953-09",237 59 | "1953-10",211 60 | "1953-11",180 61 | "1953-12",201 62 | "1954-01",204 63 | "1954-02",188 64 | "1954-03",235 65 | "1954-04",227 66 | "1954-05",234 67 | "1954-06",264 68 | "1954-07",302 69 | "1954-08",293 70 | "1954-09",259 71 | "1954-10",229 72 | "1954-11",203 73 | "1954-12",229 74 | "1955-01",242 75 | "1955-02",233 76 | "1955-03",267 77 | "1955-04",269 78 | "1955-05",270 79 | "1955-06",315 80 | "1955-07",364 81 | "1955-08",347 82 | "1955-09",312 83 | "1955-10",274 84 | "1955-11",237 85 | "1955-12",278 86 | "1956-01",284 87 | "1956-02",277 88 | "1956-03",317 89 | "1956-04",313 90 | "1956-05",318 91 | "1956-06",374 92 | "1956-07",413 93 | "1956-08",405 94 | "1956-09",355 95 | "1956-10",306 96 | "1956-11",271 97 | "1956-12",306 98 | "1957-01",315 99 | "1957-02",301 100 | "1957-03",356 101 | "1957-04",348 102 | "1957-05",355 103 | "1957-06",422 104 | "1957-07",465 105 | "1957-08",467 106 | "1957-09",404 107 | "1957-10",347 108 | "1957-11",305 109 | "1957-12",336 110 | "1958-01",340 111 | "1958-02",318 112 | "1958-03",362 113 | "1958-04",348 114 | "1958-05",363 115 | "1958-06",435 116 | "1958-07",491 117 | "1958-08",505 118 | "1958-09",404 119 | "1958-10",359 120 | "1958-11",310 121 | "1958-12",337 122 | "1959-01",360 123 | "1959-02",342 124 | "1959-03",406 125 | "1959-04",396 126 | "1959-05",420 127 | "1959-06",472 128 | "1959-07",548 129 | "1959-08",559 130 | "1959-09",463 131 | "1959-10",407 132 | "1959-11",362 133 | "1959-12",405 134 | "1960-01",417 135 | "1960-02",391 136 | "1960-03",419 137 | "1960-04",461 138 | "1960-05",472 139 | 
"1960-06",535 140 | "1960-07",622 141 | "1960-08",606 142 | "1960-09",508 143 | "1960-10",461 144 | "1960-11",390 145 | "1960-12",432 146 | -------------------------------------------------------------------------------- /data/liquor.csv: -------------------------------------------------------------------------------- 1 | Period,Value 2 | 1/1/92,1509 3 | 2/1/92,1541 4 | 3/1/92,1597 5 | 4/1/92,1675 6 | 5/1/92,1822 7 | 6/1/92,1775 8 | 7/1/92,1912 9 | 8/1/92,1862 10 | 9/1/92,1770 11 | 10/1/92,1882 12 | 11/1/92,1831 13 | 12/1/92,2511 14 | 1/1/93,1614 15 | 2/1/93,1529 16 | 3/1/93,1678 17 | 4/1/93,1713 18 | 5/1/93,1796 19 | 6/1/93,1792 20 | 7/1/93,1950 21 | 8/1/93,1777 22 | 9/1/93,1707 23 | 10/1/93,1757 24 | 11/1/93,1782 25 | 12/1/93,2443 26 | 1/1/94,1548 27 | 2/1/94,1505 28 | 3/1/94,1714 29 | 4/1/94,1757 30 | 5/1/94,1830 31 | 6/1/94,1857 32 | 7/1/94,1981 33 | 8/1/94,1858 34 | 9/1/94,1823 35 | 10/1/94,1806 36 | 11/1/94,1845 37 | 12/1/94,2577 38 | 1/1/95,1555 39 | 2/1/95,1501 40 | 3/1/95,1725 41 | 4/1/95,1699 42 | 5/1/95,1807 43 | 6/1/95,1863 44 | 7/1/95,1886 45 | 8/1/95,1861 46 | 9/1/95,1845 47 | 10/1/95,1788 48 | 11/1/95,1879 49 | 12/1/95,2598 50 | 1/1/96,1679 51 | 2/1/96,1652 52 | 3/1/96,1837 53 | 4/1/96,1798 54 | 5/1/96,1957 55 | 6/1/96,1958 56 | 7/1/96,2034 57 | 8/1/96,2062 58 | 9/1/96,1781 59 | 10/1/96,1860 60 | 11/1/96,1992 61 | 12/1/96,2547 62 | 1/1/97,1706 63 | 2/1/97,1621 64 | 3/1/97,1853 65 | 4/1/97,1817 66 | 5/1/97,2060 67 | 6/1/97,2002 68 | 7/1/97,2098 69 | 8/1/97,2079 70 | 9/1/97,1892 71 | 10/1/97,2050 72 | 11/1/97,2082 73 | 12/1/97,2821 74 | 1/1/98,1846 75 | 2/1/98,1768 76 | 3/1/98,1894 77 | 4/1/98,1963 78 | 5/1/98,2140 79 | 6/1/98,2059 80 | 7/1/98,2209 81 | 8/1/98,2118 82 | 9/1/98,2031 83 | 10/1/98,2163 84 | 11/1/98,2154 85 | 12/1/98,3037 86 | 1/1/99,1866 87 | 2/1/99,1808 88 | 3/1/99,1986 89 | 4/1/99,2099 90 | 5/1/99,2210 91 | 6/1/99,2145 92 | 7/1/99,2339 93 | 8/1/99,2140 94 | 9/1/99,2126 95 | 10/1/99,2219 96 | 11/1/99,2273 97 | 12/1/99,3265 98 | 
1/1/00,1920 99 | 2/1/00,1976 100 | 3/1/00,2190 101 | 4/1/00,2132 102 | 5/1/00,2357 103 | 6/1/00,2413 104 | 7/1/00,2463 105 | 8/1/00,2422 106 | 9/1/00,2358 107 | 10/1/00,2352 108 | 11/1/00,2549 109 | 12/1/00,3375 110 | 1/1/01,2109 111 | 2/1/01,2052 112 | 3/1/01,2327 113 | 4/1/01,2231 114 | 5/1/01,2470 115 | 6/1/01,2526 116 | 7/1/01,2483 117 | 8/1/01,2518 118 | 9/1/01,2316 119 | 10/1/01,2409 120 | 11/1/01,2638 121 | 12/1/01,3542 122 | 1/1/02,2114 123 | 2/1/02,2109 124 | 3/1/02,2366 125 | 4/1/02,2300 126 | 5/1/02,2569 127 | 6/1/02,2486 128 | 7/1/02,2568 129 | 8/1/02,2595 130 | 9/1/02,2297 131 | 10/1/02,2401 132 | 11/1/02,2601 133 | 12/1/02,3488 134 | 1/1/03,2121 135 | 2/1/03,2046 136 | 3/1/03,2273 137 | 4/1/03,2333 138 | 5/1/03,2576 139 | 6/1/03,2433 140 | 7/1/03,2611 141 | 8/1/03,2660 142 | 9/1/03,2461 143 | 10/1/03,2641 144 | 11/1/03,2660 145 | 12/1/03,3654 146 | 1/1/04,2293 147 | 2/1/04,2219 148 | 3/1/04,2398 149 | 4/1/04,2553 150 | 5/1/04,2685 151 | 6/1/04,2643 152 | 7/1/04,2867 153 | 8/1/04,2622 154 | 9/1/04,2618 155 | 10/1/04,2727 156 | 11/1/04,2763 157 | 12/1/04,3801 158 | 1/1/05,2219 159 | 2/1/05,2316 160 | 3/1/05,2530 161 | 4/1/05,2640 162 | 5/1/05,2709 163 | 6/1/05,2783 164 | 7/1/05,2924 165 | 8/1/05,2791 166 | 9/1/05,2784 167 | 10/1/05,2801 168 | 11/1/05,2933 169 | 12/1/05,4137 170 | 1/1/06,2424 171 | 2/1/06,2519 172 | 3/1/06,2753 173 | 4/1/06,2791 174 | 5/1/06,3017 175 | 6/1/06,3055 176 | 7/1/06,3117 177 | 8/1/06,3024 178 | 9/1/06,2997 179 | 10/1/06,2913 180 | 11/1/06,3137 181 | 12/1/06,4269 182 | 1/1/07,2569 183 | 2/1/07,2603 184 | 3/1/07,3005 185 | 4/1/07,2867 186 | 5/1/07,3262 187 | 6/1/07,3364 188 | 7/1/07,3322 189 | 8/1/07,3292 190 | 9/1/07,3057 191 | 10/1/07,3087 192 | 11/1/07,3297 193 | 12/1/07,4403 194 | 1/1/08,2675 195 | 2/1/08,2806 196 | 3/1/08,2989 197 | 4/1/08,2997 198 | 5/1/08,3420 199 | 6/1/08,3280 200 | 7/1/08,3517 201 | 8/1/08,3473 202 | 9/1/08,3150 203 | 10/1/08,3351 204 | 11/1/08,3387 205 | 12/1/08,4459 206 | 1/1/09,2912 207 | 2/1/09,2781 
208 | 3/1/09,3024 209 | 4/1/09,3130 210 | 5/1/09,3467 211 | 6/1/09,3306 212 | 7/1/09,3556 213 | 8/1/09,3399 214 | 9/1/09,3263 215 | 10/1/09,3425 216 | 11/1/09,3356 217 | 12/1/09,4626 218 | 1/1/10,2877 219 | 2/1/10,2916 220 | 3/1/10,3214 221 | 4/1/10,3310 222 | 5/1/10,3466 223 | 6/1/10,3438 224 | 7/1/10,3657 225 | 8/1/10,3455 226 | 9/1/10,3365 227 | 10/1/10,3497 228 | 11/1/10,3524 229 | 12/1/10,4683 230 | 1/1/11,2888 231 | 2/1/11,2985 232 | 3/1/11,3249 233 | 4/1/11,3363 234 | 5/1/11,3471 235 | 6/1/11,3551 236 | 7/1/11,3740 237 | 8/1/11,3576 238 | 9/1/11,3517 239 | 10/1/11,3515 240 | 11/1/11,3646 241 | 12/1/11,4892 242 | 1/1/12,2995 243 | 2/1/12,3202 244 | 3/1/12,3549 245 | 4/1/12,3409 246 | 5/1/12,3786 247 | 6/1/12,3815 248 | 7/1/12,3733 249 | 8/1/12,3752 250 | 9/1/12,3503 251 | 10/1/12,3626 252 | 11/1/12,3869 253 | 12/1/12,5126 254 | 1/1/13,3158 255 | 2/1/13,3231 256 | 3/1/13,3625 257 | 4/1/13,3465 258 | 5/1/13,3973 259 | 6/1/13,3817 260 | 7/1/13,4010 261 | 8/1/13,4078 262 | 9/1/13,3643 263 | 10/1/13,3799 264 | 11/1/13,4043 265 | 12/1/13,5235 266 | 1/1/14,3373 267 | 2/1/14,3353 268 | 3/1/14,3679 269 | 4/1/14,3699 270 | 5/1/14,4187 271 | 6/1/14,4086 272 | 7/1/14,4240 273 | 8/1/14,4216 274 | 9/1/14,3856 275 | 10/1/14,4087 276 | 11/1/14,4133 277 | 12/1/14,5606 278 | 1/1/15,3576 279 | 2/1/15,3517 280 | 3/1/15,3881 281 | 4/1/15,3864 282 | 5/1/15,4369 283 | 6/1/15,4241 284 | 7/1/15,4524 285 | 8/1/15,4248 286 | 9/1/15,4091 287 | 10/1/15,4291 288 | 11/1/15,4241 289 | 12/1/15,5834 290 | 1/1/16,3559 291 | 2/1/16,3718 292 | 3/1/16,3986 293 | 4/1/16,4043 294 | 5/1/16,4311 -------------------------------------------------------------------------------- /data/mixedGLB.Ts.ERSSTV4.GHCN.CL.PA.csv: -------------------------------------------------------------------------------- 1 | Land-Ocean: Global Means 2 | Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec,J-D,D-N,DJF,MAM,JJA,SON 3 | 1880,-.30,-.20,-.18,-.28,-.14,-.29,-.24,-.08,-.17,-.16,-.19,-.21,-.20,***,***,-.20,-.20,-.17 4 
| 1881,-.09,-.14,.01,-.03,-.04,-.28,-.06,-.02,-.09,-.19,-.26,-.15,-.11,-.12,-.15,-.02,-.12,-.18 5 | 1882,.10,.09,.02,-.20,-.17,-.25,-.10,.04,-.01,-.22,-.21,-.25,-.10,-.09,.01,-.12,-.10,-.14 6 | 1883,-.33,-.42,-.17,-.24,-.25,-.11,-.08,-.13,-.18,-.11,-.20,-.18,-.20,-.20,-.33,-.22,-.11,-.16 7 | 1884,-.18,-.11,-.34,-.36,-.31,-.38,-.34,-.25,-.23,-.22,-.30,-.29,-.27,-.27,-.15,-.34,-.32,-.25 8 | 1885,-.64,-.29,-.23,-.44,-.41,-.50,-.28,-.27,-.19,-.18,-.22,-.05,-.31,-.33,-.41,-.36,-.35,-.20 9 | 1886,-.41,-.45,-.41,-.29,-.27,-.39,-.16,-.32,-.19,-.25,-.26,-.25,-.30,-.29,-.30,-.32,-.29,-.23 10 | 1887,-.66,-.48,-.31,-.37,-.33,-.20,-.19,-.27,-.20,-.32,-.26,-.37,-.33,-.32,-.46,-.34,-.22,-.26 11 | 1888,-.43,-.42,-.47,-.28,-.22,-.20,-.09,-.11,-.08,.01,-.01,-.13,-.20,-.22,-.41,-.33,-.14,-.02 12 | 1889,-.20,.14,.04,.04,-.03,-.12,-.05,-.18,-.18,-.22,-.30,-.31,-.11,-.10,-.06,.02,-.11,-.23 13 | 1890,-.48,-.47,-.41,-.38,-.48,-.27,-.29,-.36,-.36,-.23,-.37,-.30,-.37,-.37,-.42,-.42,-.31,-.32 14 | 1891,-.46,-.48,-.15,-.25,-.17,-.22,-.22,-.21,-.13,-.23,-.37,-.01,-.24,-.27,-.41,-.19,-.22,-.25 15 | 1892,-.26,-.15,-.36,-.35,-.25,-.20,-.28,-.20,-.25,-.16,-.50,-.29,-.27,-.25,-.14,-.32,-.22,-.30 16 | 1893,-.68,-.51,-.24,-.33,-.35,-.23,-.13,-.24,-.18,-.16,-.17,-.37,-.30,-.29,-.49,-.30,-.20,-.17 17 | 1894,-.55,-.32,-.21,-.42,-.30,-.43,-.32,-.28,-.22,-.16,-.25,-.22,-.31,-.32,-.41,-.31,-.34,-.21 18 | 1895,-.44,-.43,-.29,-.23,-.23,-.25,-.17,-.16,-.02,-.10,-.16,-.12,-.22,-.22,-.36,-.25,-.19,-.09 19 | 1896,-.23,-.15,-.30,-.32,-.20,-.13,-.06,-.09,-.04,.04,-.16,-.12,-.15,-.15,-.17,-.27,-.10,-.05 20 | 1897,-.23,-.19,-.12,-.01,.00,-.12,-.05,-.03,-.04,-.09,-.17,-.25,-.11,-.10,-.18,-.05,-.07,-.10 21 | 1898,-.06,-.32,-.56,-.34,-.36,-.21,-.22,-.23,-.19,-.31,-.36,-.21,-.28,-.28,-.21,-.42,-.22,-.29 22 | 1899,-.18,-.40,-.34,-.21,-.20,-.25,-.13,-.05,-.01,.00,.13,-.26,-.16,-.15,-.26,-.25,-.14,.04 23 | 1900,-.39,-.07,.02,-.15,-.06,-.15,-.08,-.03,.02,.09,-.13,-.14,-.09,-.10,-.24,-.06,-.09,-.01 24 | 
1901,-.27,-.05,.05,-.05,-.17,-.10,-.08,-.12,-.16,-.29,-.17,-.29,-.14,-.13,-.15,-.06,-.10,-.20 25 | 1902,-.18,-.03,-.29,-.28,-.31,-.33,-.25,-.28,-.21,-.26,-.36,-.45,-.27,-.25,-.17,-.29,-.29,-.27 26 | 1903,-.27,-.05,-.23,-.40,-.41,-.44,-.30,-.43,-.42,-.41,-.38,-.47,-.35,-.35,-.26,-.34,-.39,-.40 27 | 1904,-.63,-.53,-.45,-.50,-.50,-.49,-.47,-.43,-.46,-.34,-.16,-.29,-.44,-.45,-.54,-.48,-.47,-.32 28 | 1905,-.37,-.58,-.24,-.37,-.33,-.31,-.24,-.20,-.14,-.24,-.09,-.21,-.28,-.28,-.41,-.31,-.25,-.16 29 | 1906,-.30,-.32,-.15,-.03,-.20,-.21,-.25,-.18,-.25,-.19,-.38,-.18,-.22,-.22,-.28,-.13,-.21,-.28 30 | 1907,-.43,-.51,-.23,-.39,-.46,-.43,-.35,-.37,-.32,-.24,-.51,-.50,-.40,-.37,-.37,-.36,-.38,-.36 31 | 1908,-.46,-.35,-.58,-.46,-.40,-.39,-.35,-.45,-.33,-.43,-.50,-.49,-.43,-.43,-.44,-.48,-.40,-.42 32 | 1909,-.70,-.47,-.52,-.59,-.53,-.51,-.42,-.30,-.37,-.38,-.31,-.54,-.47,-.47,-.55,-.55,-.41,-.35 33 | 1910,-.43,-.42,-.48,-.38,-.33,-.37,-.31,-.34,-.37,-.38,-.56,-.69,-.42,-.41,-.46,-.40,-.34,-.44 34 | 1911,-.63,-.59,-.62,-.55,-.51,-.46,-.40,-.43,-.38,-.26,-.20,-.24,-.44,-.48,-.64,-.56,-.43,-.28 35 | 1912,-.27,-.15,-.38,-.21,-.19,-.25,-.41,-.52,-.47,-.55,-.37,-.42,-.35,-.33,-.22,-.26,-.39,-.47 36 | 1913,-.41,-.43,-.43,-.36,-.45,-.46,-.35,-.33,-.34,-.35,-.17,-.02,-.34,-.37,-.42,-.41,-.38,-.28 37 | 1914,.02,-.14,-.23,-.27,-.20,-.23,-.24,-.14,-.13,-.05,-.21,-.09,-.16,-.15,-.05,-.23,-.20,-.13 38 | 1915,-.18,-.01,-.11,.08,-.03,-.14,-.02,-.16,-.12,-.22,-.14,-.25,-.11,-.09,-.09,-.02,-.11,-.16 39 | 1916,-.19,-.20,-.32,-.25,-.26,-.42,-.33,-.26,-.28,-.27,-.42,-.78,-.33,-.29,-.21,-.28,-.34,-.32 40 | 1917,-.47,-.55,-.48,-.39,-.48,-.40,-.22,-.26,-.18,-.35,-.27,-.72,-.40,-.40,-.60,-.45,-.29,-.27 41 | 1918,-.44,-.30,-.19,-.40,-.37,-.28,-.19,-.26,-.14,-.04,-.16,-.29,-.25,-.29,-.49,-.32,-.24,-.11 42 | 1919,-.19,-.22,-.26,-.19,-.19,-.27,-.21,-.19,-.18,-.15,-.30,-.35,-.22,-.22,-.23,-.21,-.23,-.21 43 | 1920,-.14,-.23,-.06,-.26,-.25,-.33,-.31,-.29,-.19,-.28,-.33,-.46,-.26,-.25,-.24,-.19,-.31,-.27 44 | 
1921,-.04,-.21,-.28,-.36,-.36,-.31,-.15,-.23,-.17,-.06,-.16,-.18,-.21,-.23,-.24,-.33,-.23,-.13 45 | 1922,-.32,-.42,-.12,-.21,-.34,-.32,-.25,-.30,-.28,-.32,-.16,-.16,-.27,-.27,-.30,-.22,-.29,-.25 46 | 1923,-.25,-.36,-.31,-.37,-.33,-.23,-.28,-.28,-.27,-.11,.04,-.04,-.23,-.24,-.26,-.33,-.26,-.11 47 | 1924,-.22,-.26,-.12,-.34,-.18,-.27,-.25,-.34,-.30,-.35,-.23,-.41,-.27,-.24,-.17,-.21,-.29,-.29 48 | 1925,-.32,-.33,-.22,-.24,-.30,-.33,-.29,-.18,-.13,-.17,.04,.10,-.20,-.24,-.35,-.25,-.26,-.09 49 | 1926,.21,.08,.13,-.14,-.24,-.24,-.20,-.10,-.10,-.11,-.07,-.30,-.09,-.06,.13,-.08,-.18,-.09 50 | 1927,-.28,-.20,-.38,-.31,-.25,-.27,-.15,-.19,-.05,.00,-.04,-.36,-.21,-.20,-.26,-.32,-.20,-.03 51 | 1928,-.02,-.11,-.28,-.29,-.29,-.41,-.21,-.25,-.20,-.19,-.10,-.20,-.21,-.23,-.16,-.29,-.29,-.16 52 | 1929,-.47,-.58,-.35,-.42,-.39,-.43,-.33,-.29,-.24,-.15,-.15,-.53,-.36,-.33,-.42,-.39,-.35,-.18 53 | 1930,-.29,-.24,-.09,-.26,-.25,-.19,-.16,-.10,-.11,-.08,.14,-.08,-.14,-.18,-.35,-.20,-.15,-.02 54 | 1931,-.10,-.20,-.06,-.20,-.22,-.06,.01,-.01,-.07,-.01,-.11,-.10,-.09,-.09,-.13,-.16,-.02,-.06 55 | 1932,.14,-.17,-.20,-.08,-.23,-.30,-.24,-.24,-.13,-.10,-.27,-.23,-.17,-.16,-.04,-.17,-.26,-.16 56 | 1933,-.32,-.31,-.28,-.24,-.25,-.32,-.20,-.23,-.27,-.24,-.32,-.47,-.29,-.27,-.28,-.26,-.25,-.28 57 | 1934,-.26,-.04,-.29,-.28,-.11,-.14,-.11,-.11,-.16,-.11,-.01,-.08,-.14,-.17,-.26,-.23,-.12,-.09 58 | 1935,-.38,.11,-.14,-.35,-.26,-.23,-.19,-.17,-.17,-.08,-.28,-.21,-.20,-.19,-.12,-.25,-.20,-.18 59 | 1936,-.29,-.39,-.25,-.20,-.16,-.19,-.06,-.12,-.05,-.03,-.05,-.03,-.15,-.17,-.30,-.20,-.12,-.04 60 | 1937,-.08,.06,-.17,-.16,-.06,-.07,-.04,.03,.14,.09,.10,-.10,-.02,-.02,-.02,-.13,-.03,.11 61 | 1938,.02,-.03,.06,.05,-.07,-.17,-.08,-.04,.04,.12,.02,-.24,-.03,-.02,-.03,.01,-.10,.06 62 | 1939,-.12,-.11,-.20,-.12,-.07,-.08,-.06,-.05,.00,-.03,.06,.40,-.03,-.09,-.16,-.13,-.06,.01 63 | 1940,-.14,.06,.11,.16,.06,.05,.10,.01,.12,.08,.12,.19,.08,.09,.11,.11,.05,.11 64 | 
1941,.12,.22,.05,.10,.10,.04,.16,.14,.02,.25,.12,.14,.12,.13,.18,.08,.11,.13 65 | 1942,.28,.06,.11,.13,.14,.10,.02,-.03,.00,.07,.13,.13,.09,.10,.16,.13,.03,.06 66 | 1943,.00,.22,.02,.14,.10,.00,.14,.03,.11,.31,.26,.28,.13,.12,.11,.09,.06,.22 67 | 1944,.42,.32,.34,.26,.26,.23,.23,.23,.31,.28,.13,.07,.26,.27,.34,.29,.23,.24 68 | 1945,.14,.04,.11,.24,.11,.03,.08,.25,.22,.22,.09,-.09,.12,.13,.09,.15,.12,.18 69 | 1946,.15,.05,.00,.11,-.03,-.17,-.09,-.09,-.02,-.06,-.02,-.28,-.04,-.02,.04,.03,-.11,-.03 70 | 1947,-.10,-.06,.06,.04,-.06,-.01,-.06,-.08,-.14,.06,-.01,-.16,-.04,-.05,-.14,.01,-.05,-.03 71 | 1948,.05,-.12,-.23,-.09,.07,-.06,-.13,-.10,-.11,-.07,-.09,-.21,-.09,-.09,-.07,-.08,-.10,-.09 72 | 1949,.09,-.15,-.01,-.07,-.08,-.23,-.13,-.08,-.09,-.03,-.08,-.16,-.09,-.09,-.09,-.05,-.15,-.06 73 | 1950,-.28,-.26,-.07,-.20,-.11,-.06,-.09,-.17,-.10,-.19,-.35,-.19,-.17,-.17,-.23,-.13,-.11,-.21 74 | 1951,-.34,-.43,-.19,-.10,-.02,-.05,.01,.06,.07,.07,-.01,.16,-.07,-.09,-.32,-.10,.00,.04 75 | 1952,.16,.13,-.09,.01,-.06,-.04,.05,.07,.08,-.03,-.16,-.02,.01,.02,.15,-.05,.03,-.04 76 | 1953,.08,.16,.11,.19,.09,.07,.02,.08,.06,.05,-.05,.04,.08,.07,.07,.13,.06,.02 77 | 1954,-.27,-.10,-.12,-.17,-.19,-.15,-.16,-.13,-.07,.00,.09,-.17,-.12,-.10,-.11,-.16,-.15,.00 78 | 1955,.12,-.21,-.35,-.22,-.20,-.07,-.07,.06,-.14,-.05,-.28,-.32,-.14,-.13,-.09,-.26,-.03,-.15 79 | 1956,-.16,-.24,-.22,-.28,-.28,-.14,-.12,-.26,-.19,-.24,-.15,-.09,-.20,-.22,-.24,-.26,-.17,-.20 80 | 1957,-.13,-.06,-.07,-.02,.08,.16,.01,.14,.07,.00,.06,.16,.03,.01,-.09,.00,.10,.05 81 | 1958,.38,.23,.08,.02,.08,-.06,.04,-.06,-.04,.03,.02,.00,.06,.07,.25,.06,-.02,.00 82 | 1959,.06,.09,.20,.16,.06,.02,.06,-.01,-.05,-.09,-.09,-.02,.03,.04,.05,.14,.03,-.07 83 | 1960,-.02,.13,-.36,-.16,-.08,.01,-.02,.00,.04,.08,-.12,.19,-.03,-.04,.03,-.20,.00,.00 84 | 1961,.07,.19,.10,.16,.14,.11,-.03,.00,.05,-.01,.03,-.15,.05,.08,.15,.13,.03,.03 85 | 1962,.07,.14,.11,.05,-.06,.04,-.03,-.03,-.02,-.03,.06,-.02,.02,.01,.02,.04,.00,.00 86 | 
1963,-.03,.18,-.14,-.05,-.09,.02,.08,.24,.20,.15,.15,-.01,.06,.06,.05,-.10,.12,.17 87 | 1964,-.07,-.11,-.24,-.30,-.25,-.07,-.08,-.22,-.28,-.30,-.21,-.30,-.20,-.18,-.06,-.26,-.12,-.26 88 | 1965,-.08,-.18,-.12,-.19,-.15,-.10,-.11,-.01,-.14,-.04,-.06,-.06,-.10,-.12,-.19,-.15,-.07,-.08 89 | 1966,-.17,-.01,.05,-.13,-.11,.02,.08,-.11,-.02,-.15,-.02,-.05,-.05,-.05,-.08,-.06,.00,-.06 90 | 1967,-.07,-.19,.04,-.06,.13,-.08,-.01,.02,-.04,.07,-.06,-.02,-.02,-.03,-.10,.04,-.02,-.01 91 | 1968,-.23,-.14,.21,-.05,-.10,-.06,-.10,-.11,-.19,.11,-.04,-.14,-.07,-.06,-.13,.02,-.09,-.04 92 | 1969,-.11,-.15,.01,.18,.20,.05,-.02,.03,.10,.10,.12,.28,.07,.03,-.13,.13,.02,.11 93 | 1970,.10,.23,.08,.10,-.05,-.03,-.04,-.11,.11,.04,.02,-.14,.03,.06,.20,.04,-.06,.06 94 | 1971,-.03,-.21,-.19,-.11,-.07,-.18,-.12,-.02,.00,-.06,-.04,-.08,-.09,-.10,-.12,-.12,-.11,-.03 95 | 1972,-.25,-.17,.03,-.01,-.01,.04,.02,.18,.04,.08,.02,.19,.01,-.01,-.16,.00,.08,.05 96 | 1973,.27,.31,.26,.26,.26,.16,.09,.02,.07,.12,.05,-.06,.15,.17,.26,.26,.09,.08 97 | 1974,-.14,-.28,-.05,-.10,.00,-.05,-.03,.11,-.13,-.07,-.07,-.10,-.08,-.07,-.16,-.05,.01,-.09 98 | 1975,.08,.07,.14,.06,.16,-.01,-.02,-.20,-.03,-.09,-.16,-.17,-.01,-.01,.02,.12,-.08,-.09 99 | 1976,-.01,-.07,-.21,-.09,-.22,-.15,-.11,-.17,-.10,-.26,-.05,.08,-.11,-.13,-.08,-.18,-.14,-.14 100 | 1977,.17,.21,.25,.27,.31,.25,.23,.19,.02,.05,.20,.05,.18,.19,.15,.28,.23,.09 101 | 1978,.07,.12,.21,.14,.06,-.03,.07,-.18,.05,.01,.17,.11,.07,.06,.08,.14,-.05,.08 102 | 1979,.14,-.08,.18,.12,.05,.13,.02,.14,.26,.26,.29,.47,.17,.13,.06,.12,.10,.27 103 | 1980,.30,.42,.29,.32,.34,.16,.29,.26,.22,.19,.29,.20,.27,.30,.40,.32,.24,.23 104 | 1981,.55,.41,.49,.31,.23,.31,.34,.34,.17,.14,.22,.39,.33,.31,.39,.34,.33,.18 105 | 1982,.08,.15,-.02,.09,.15,.05,.13,.09,.16,.13,.13,.43,.13,.13,.21,.07,.09,.14 106 | 1983,.52,.40,.42,.30,.36,.18,.16,.31,.39,.16,.31,.17,.31,.33,.45,.36,.21,.28 107 | 1984,.31,.18,.29,.10,.34,.05,.15,.15,.19,.15,.04,-.06,.16,.18,.22,.24,.12,.13 108 | 
1985,.21,-.06,.17,.11,.18,.16,-.01,.14,.15,.11,.10,.15,.12,.10,.03,.15,.10,.12 109 | 1986,.30,.38,.29,.26,.26,.12,.13,.12,.02,.14,.11,.15,.19,.19,.28,.27,.12,.09 110 | 1987,.35,.46,.17,.25,.26,.36,.46,.28,.39,.32,.25,.47,.33,.31,.32,.22,.36,.32 111 | 1988,.57,.42,.49,.44,.44,.42,.35,.45,.41,.38,.12,.34,.40,.41,.49,.46,.41,.30 112 | 1989,.16,.35,.36,.34,.17,.15,.33,.35,.37,.33,.20,.37,.29,.29,.28,.29,.28,.30 113 | 1990,.40,.41,.76,.55,.46,.37,.43,.30,.30,.43,.45,.41,.44,.44,.39,.59,.37,.39 114 | 1991,.42,.50,.35,.52,.39,.54,.51,.41,.49,.32,.31,.33,.42,.43,.45,.42,.49,.37 115 | 1992,.45,.42,.47,.23,.33,.24,.13,.10,.00,.10,.04,.22,.23,.24,.40,.34,.16,.05 116 | 1993,.38,.39,.36,.27,.27,.24,.27,.14,.11,.23,.07,.19,.24,.25,.33,.30,.22,.14 117 | 1994,.31,.04,.26,.41,.29,.42,.32,.23,.32,.42,.45,.36,.32,.30,.18,.32,.32,.40 118 | 1995,.51,.78,.45,.46,.29,.44,.48,.49,.34,.49,.44,.30,.46,.46,.55,.40,.47,.43 119 | 1996,.27,.49,.34,.37,.29,.26,.35,.49,.26,.20,.41,.40,.34,.34,.35,.33,.37,.29 120 | 1997,.32,.37,.51,.37,.38,.54,.36,.42,.56,.65,.65,.59,.48,.46,.37,.42,.44,.62 121 | 1998,.61,.89,.61,.63,.71,.77,.70,.68,.45,.46,.49,.57,.63,.63,.70,.65,.72,.47 122 | 1999,.48,.67,.33,.34,.33,.37,.41,.34,.44,.44,.41,.47,.42,.43,.57,.33,.38,.43 123 | 2000,.26,.59,.59,.59,.40,.43,.42,.43,.43,.30,.33,.30,.42,.44,.44,.53,.42,.35 124 | 2001,.44,.46,.58,.52,.59,.54,.61,.49,.56,.52,.70,.55,.55,.53,.40,.56,.55,.59 125 | 2002,.74,.76,.91,.59,.65,.54,.61,.55,.64,.57,.59,.42,.63,.64,.68,.71,.57,.60 126 | 2003,.72,.55,.57,.55,.63,.48,.55,.66,.66,.75,.54,.74,.62,.59,.56,.58,.56,.65 127 | 2004,.58,.70,.66,.63,.42,.43,.27,.45,.53,.66,.72,.52,.55,.56,.67,.57,.38,.64 128 | 2005,.72,.58,.70,.69,.65,.66,.65,.62,.77,.80,.76,.68,.69,.68,.61,.68,.65,.77 129 | 2006,.57,.71,.63,.49,.47,.63,.54,.71,.64,.69,.72,.77,.63,.62,.65,.53,.63,.69 130 | 2007,.96,.69,.71,.75,.67,.57,.62,.60,.64,.60,.57,.49,.66,.68,.81,.71,.60,.60 131 | 2008,.26,.35,.73,.53,.50,.47,.60,.44,.64,.67,.66,.55,.53,.53,.36,.59,.50,.66 132 | 
2009,.61,.53,.53,.61,.63,.65,.71,.67,.69,.65,.78,.65,.64,.63,.56,.59,.68,.71 133 | 2010,.73,.79,.92,.87,.74,.64,.61,.65,.62,.71,.80,.49,.71,.73,.72,.84,.63,.71 134 | 2011,.51,.51,.64,.65,.53,.58,.74,.73,.56,.66,.56,.54,.60,.60,.50,.61,.68,.59 135 | 2012,.46,.49,.57,.68,.77,.61,.57,.63,.75,.79,.75,.52,.63,.63,.49,.67,.60,.76 136 | 2013,.66,.56,.65,.53,.61,.65,.59,.65,.77,.70,.80,.66,.65,.64,.58,.60,.63,.76 137 | 2014,.73,.50,.77,.79,.86,.65,.58,.81,.90,.86,.69,.79,.74,.73,.63,.80,.68,.82 138 | 2015,.82,.87,.90,.74,.78,.78,.73,.78,.81,1.07,1.03,1.10,.87,.84,.82,.81,.76,.97 139 | 2016,1.14,1.33,1.29,1.09,.93,.79,***,***,***,***,***,***,***,***,1.19,1.10,***,*** 140 | -------------------------------------------------------------------------------- /data/sentiment.csv: -------------------------------------------------------------------------------- 1 | DATE,UMCSENT 2 | 2000-01-01,112.0 3 | 2000-02-01,111.3 4 | 2000-03-01,107.1 5 | 2000-04-01,109.2 6 | 2000-05-01,110.7 7 | 2000-06-01,106.4 8 | 2000-07-01,108.3 9 | 2000-08-01,107.3 10 | 2000-09-01,106.8 11 | 2000-10-01,105.8 12 | 2000-11-01,107.6 13 | 2000-12-01,98.4 14 | 2001-01-01,94.7 15 | 2001-02-01,90.6 16 | 2001-03-01,91.5 17 | 2001-04-01,88.4 18 | 2001-05-01,92.0 19 | 2001-06-01,92.6 20 | 2001-07-01,92.4 21 | 2001-08-01,91.5 22 | 2001-09-01,81.8 23 | 2001-10-01,82.7 24 | 2001-11-01,83.9 25 | 2001-12-01,88.8 26 | 2002-01-01,93.0 27 | 2002-02-01,90.7 28 | 2002-03-01,95.7 29 | 2002-04-01,93.0 30 | 2002-05-01,96.9 31 | 2002-06-01,92.4 32 | 2002-07-01,88.1 33 | 2002-08-01,87.6 34 | 2002-09-01,86.1 35 | 2002-10-01,80.6 36 | 2002-11-01,84.2 37 | 2002-12-01,86.7 38 | 2003-01-01,82.4 39 | 2003-02-01,79.9 40 | 2003-03-01,77.6 41 | 2003-04-01,86.0 42 | 2003-05-01,92.1 43 | 2003-06-01,89.7 44 | 2003-07-01,90.9 45 | 2003-08-01,89.3 46 | 2003-09-01,87.7 47 | 2003-10-01,89.6 48 | 2003-11-01,93.7 49 | 2003-12-01,92.6 50 | 2004-01-01,103.8 51 | 2004-02-01,94.4 52 | 2004-03-01,95.8 53 | 2004-04-01,94.2 54 | 2004-05-01,90.2 55 | 
2004-06-01,95.6 56 | 2004-07-01,96.7 57 | 2004-08-01,95.9 58 | 2004-09-01,94.2 59 | 2004-10-01,91.7 60 | 2004-11-01,92.8 61 | 2004-12-01,97.1 62 | 2005-01-01,95.5 63 | 2005-02-01,94.1 64 | 2005-03-01,92.6 65 | 2005-04-01,87.7 66 | 2005-05-01,86.9 67 | 2005-06-01,96.0 68 | 2005-07-01,96.5 69 | 2005-08-01,89.1 70 | 2005-09-01,76.9 71 | 2005-10-01,74.2 72 | 2005-11-01,81.6 73 | 2005-12-01,91.5 74 | 2006-01-01,91.2 75 | 2006-02-01,86.7 76 | 2006-03-01,88.9 77 | 2006-04-01,87.4 78 | 2006-05-01,79.1 79 | 2006-06-01,84.9 80 | 2006-07-01,84.7 81 | 2006-08-01,82.0 82 | 2006-09-01,85.4 83 | 2006-10-01,93.6 84 | 2006-11-01,92.1 85 | 2006-12-01,91.7 86 | 2007-01-01,96.9 87 | 2007-02-01,91.3 88 | 2007-03-01,88.4 89 | 2007-04-01,87.1 90 | 2007-05-01,88.3 91 | 2007-06-01,85.3 92 | 2007-07-01,90.4 93 | 2007-08-01,83.4 94 | 2007-09-01,83.4 95 | 2007-10-01,80.9 96 | 2007-11-01,76.1 97 | 2007-12-01,75.5 98 | 2008-01-01,78.4 99 | 2008-02-01,70.8 100 | 2008-03-01,69.5 101 | 2008-04-01,62.6 102 | 2008-05-01,59.8 103 | 2008-06-01,56.4 104 | 2008-07-01,61.2 105 | 2008-08-01,63.0 106 | 2008-09-01,70.3 107 | 2008-10-01,57.6 108 | 2008-11-01,55.3 109 | 2008-12-01,60.1 110 | 2009-01-01,61.2 111 | 2009-02-01,56.3 112 | 2009-03-01,57.3 113 | 2009-04-01,65.1 114 | 2009-05-01,68.7 115 | 2009-06-01,70.8 116 | 2009-07-01,66.0 117 | 2009-08-01,65.7 118 | 2009-09-01,73.5 119 | 2009-10-01,70.6 120 | 2009-11-01,67.4 121 | 2009-12-01,72.5 122 | 2010-01-01,74.4 123 | 2010-02-01,73.6 124 | 2010-03-01,73.6 125 | 2010-04-01,72.2 126 | 2010-05-01,73.6 127 | 2010-06-01,76.0 128 | 2010-07-01,67.8 129 | 2010-08-01,68.9 130 | 2010-09-01,68.2 131 | 2010-10-01,67.7 132 | 2010-11-01,71.6 133 | 2010-12-01,74.5 134 | 2011-01-01,74.2 135 | 2011-02-01,77.5 136 | 2011-03-01,67.5 137 | 2011-04-01,69.8 138 | 2011-05-01,74.3 139 | 2011-06-01,71.5 140 | 2011-07-01,63.7 141 | 2011-08-01,55.8 142 | 2011-09-01,59.5 143 | 2011-10-01,60.8 144 | 2011-11-01,63.7 145 | 2011-12-01,69.9 146 | 2012-01-01,75.0 147 | 2012-02-01,75.3 148 
| 2012-03-01,76.2 149 | 2012-04-01,76.4 150 | 2012-05-01,79.3 151 | 2012-06-01,73.2 152 | 2012-07-01,72.3 153 | 2012-08-01,74.3 154 | 2012-09-01,78.3 155 | 2012-10-01,82.6 156 | 2012-11-01,82.7 157 | 2012-12-01,72.9 158 | 2013-01-01,73.8 159 | 2013-02-01,77.6 160 | 2013-03-01,78.6 161 | 2013-04-01,76.4 162 | 2013-05-01,84.5 163 | 2013-06-01,84.1 164 | 2013-07-01,85.1 165 | 2013-08-01,82.1 166 | 2013-09-01,77.5 167 | 2013-10-01,73.2 168 | 2013-11-01,75.1 169 | 2013-12-01,82.5 170 | 2014-01-01,81.2 171 | 2014-02-01,81.6 172 | 2014-03-01,80.0 173 | 2014-04-01,84.1 174 | 2014-05-01,81.9 175 | 2014-06-01,82.5 176 | 2014-07-01,81.8 177 | 2014-08-01,82.5 178 | 2014-09-01,84.6 179 | 2014-10-01,86.9 180 | 2014-11-01,88.8 181 | 2014-12-01,93.6 182 | 2015-01-01,98.1 183 | 2015-02-01,95.4 184 | 2015-03-01,93.0 185 | 2015-04-01,95.9 186 | 2015-05-01,90.7 187 | 2015-06-01,96.1 188 | 2015-07-01,93.1 189 | 2015-08-01,91.9 190 | 2015-09-01,87.2 191 | 2015-10-01,90.0 192 | 2015-11-01,91.3 193 | 2015-12-01,92.6 194 | 2016-01-01,92.0 195 | 2016-02-01,91.7 196 | 2016-03-01,91.0 197 | 2016-04-01,89.0 198 | 2016-05-01,94.7 199 | 2016-06-01,93.5 200 | 2016-07-01,90.0 201 | -------------------------------------------------------------------------------- /data/series1.csv: -------------------------------------------------------------------------------- 1 | ,value 2 | 2006-06-01,0.21506609377014937 3 | 2006-07-01,1.142246186967091 4 | 2006-08-01,0.08077089285729766 5 | 2006-09-01,-0.7395189837372728 6 | 2006-10-01,0.5355162794384382 7 | 2006-11-01,-0.5647264651320741 8 | 2006-12-01,-1.1913935216543692 9 | 2007-01-01,-1.9961368164247117 10 | 2007-02-01,-1.8824096445368526 11 | 2007-03-01,-1.881361293860238 12 | 2007-04-01,-0.9766776697907428 13 | 2007-05-01,-1.9019923318711625 14 | 2007-06-01,-3.108610707028711 15 | 2007-07-01,-3.5268821422956975 16 | 2007-08-01,-2.7697700367118965 17 | 2007-09-01,-2.0338040672828015 18 | 2007-10-01,-3.180075966358295 19 | 2007-11-01,-3.3080815409072417 20 | 
2007-12-01,-3.418501908245932 21 | 2008-01-01,-4.104801270845488 22 | 2008-02-01,-3.1744056756446346 23 | 2008-03-01,-1.425289135757783 24 | 2008-04-01,0.4401615904944527 25 | 2008-05-01,1.2679262510037599 26 | 2008-06-01,0.5441554324441394 27 | 2008-07-01,-0.48066970719551183 28 | 2008-08-01,-1.5799841374850232 29 | 2008-09-01,-0.1336163743808476 30 | 2008-10-01,1.7643398021524446 31 | 2008-11-01,-1.2648773342494266 32 | 2008-12-01,-3.1521978842334883 33 | 2009-01-01,-3.589928203018723 34 | 2009-02-01,-3.406228122833789 35 | 2009-03-01,-3.8263343229385622 36 | 2009-04-01,-2.7425294344933775 37 | 2009-05-01,-1.7887272597313733 38 | 2009-06-01,-2.4639028761016517 39 | 2009-07-01,-2.075657675328104 40 | 2009-08-01,-2.701547013802915 41 | 2009-09-01,-1.7025518581431656 42 | 2009-10-01,-0.7589319907020389 43 | 2009-11-01,-2.905804182464432 44 | 2009-12-01,-1.7551468831911923 45 | 2010-01-01,-1.9096679238947885 46 | 2010-02-01,-0.1291052757849871 47 | 2010-03-01,2.211945650117264 48 | 2010-04-01,1.569618562080831 49 | 2010-05-01,1.5087042686118521 50 | 2010-06-01,1.6998884025934553 51 | 2010-07-01,-1.7667376106183483 52 | 2010-08-01,-1.366051636217643 53 | 2010-09-01,0.7285490524822341 54 | 2010-10-01,2.2262658158365096 55 | 2010-11-01,1.6367586650124024 56 | 2010-12-01,0.043641470761556 57 | 2011-01-01,-2.3933886026916142 58 | 2011-02-01,-3.2353454025596102 59 | 2011-03-01,-2.101837771824045 60 | 2011-04-01,-0.8990375021239454 61 | 2011-05-01,-1.4936664574666798 62 | 2011-06-01,-3.1073167508272217 63 | 2011-07-01,-1.4205328314401435 64 | 2011-08-01,0.2758607214066028 65 | 2011-09-01,0.43735990925986584 66 | 2011-10-01,-0.2548263800847047 67 | 2011-11-01,-0.3458472155762089 68 | 2011-12-01,-0.6115030256444327 69 | 2012-01-01,-0.38337005302025623 70 | 2012-02-01,1.6954758499607498 71 | 2012-03-01,1.5613010884084153 72 | 2012-04-01,-0.17909432703339856 73 | 2012-05-01,-0.5810841195080401 74 | 2012-06-01,-2.3308344299399346 75 | 2012-07-01,-2.0057500970363367 76 | 
2012-08-01,-0.5478467175272086 77 | 2012-09-01,0.7102722661376356 78 | 2012-10-01,1.5215664448259674 79 | 2012-11-01,1.323207097388035 80 | 2012-12-01,0.8370054084248226 81 | 2013-01-01,-0.10582093340683829 82 | 2013-02-01,-1.8597930129863454 83 | 2013-03-01,-1.9819951932431905 84 | 2013-04-01,-0.3690851457447418 85 | 2013-05-01,1.0213087568123702 86 | 2013-06-01,1.313503316487931 87 | 2013-07-01,1.138473914848284 88 | 2013-08-01,-0.5684114159918983 89 | 2013-09-01,-1.4298085649813963 90 | 2013-10-01,-1.8058074666928423 91 | 2013-11-01,-1.9511514403250418 92 | 2013-12-01,-1.4477675756530977 93 | 2014-01-01,-0.039660778396576335 94 | 2014-02-01,1.4280190595321094 95 | 2014-03-01,1.1451101579872376 96 | 2014-04-01,-1.6690214653425715 97 | 2014-05-01,-1.5040115349757466 98 | 2014-06-01,-2.448986495668514 99 | 2014-07-01,-2.8317230406994662 100 | 2014-08-01,-2.6938098802334176 101 | 2014-09-01,0.23414533840190543 102 | 2014-10-01,1.33963923299082 103 | 2014-11-01,1.4028876775484114 104 | 2014-12-01,1.7780474518983755 105 | 2015-01-01,1.6194943314371981 106 | 2015-02-01,0.4887985096857944 107 | 2015-03-01,2.208630445167098 108 | 2015-04-01,2.4556132237466786 109 | 2015-05-01,2.6470927240715665 110 | 2015-06-01,3.0162463456840567 111 | 2015-07-01,1.7039588266126313 112 | 2015-08-01,0.6037148295707649 113 | 2015-09-01,-1.2737209728501695 114 | 2015-10-01,-0.93284071310711 115 | 2015-11-01,0.08551545990148485 116 | 2015-12-01,1.20534410726747 117 | 2016-01-01,2.164110679279701 118 | 2016-02-01,0.9522611305039776 119 | 2016-03-01,0.3648520796360801 120 | 2016-04-01,-2.264868721883362 121 | 2016-05-01,-2.3816786375743693 122 | -------------------------------------------------------------------------------- /data/series2.csv: -------------------------------------------------------------------------------- 1 | ,value 2 | 1998-06-01,-0.5988321459640398 3 | 1998-07-01,-0.8001818245420875 4 | 1998-08-01,2.29897730433156 5 | 1998-09-01,1.1503876193657336 6 | 
1998-10-01,-1.1925784409492428 7 | 1998-11-01,-2.8537590592863404 8 | 1998-12-01,-1.9850296778582175 9 | 1999-01-01,-0.2804660526798255 10 | 1999-02-01,0.6441921376804158 11 | 1999-03-01,-0.38643573665159 12 | 1999-04-01,-1.7077984242692898 13 | 1999-05-01,-1.0643432678858735 14 | 1999-06-01,-0.2807300935199594 15 | 1999-07-01,1.0352887029352587 16 | 1999-08-01,0.4772659587223367 17 | 1999-09-01,0.14729461428518129 18 | 1999-10-01,0.3154110024282838 19 | 1999-11-01,0.6863548129190071 20 | 1999-12-01,-0.287397199588831 21 | 2000-01-01,-0.7152901377520209 22 | 2000-02-01,0.9393973123420367 23 | 2000-03-01,1.5775143325001566 24 | 2000-04-01,2.882927016963263 25 | 2000-05-01,1.4941798228687881 26 | 2000-06-01,-0.13206446812278239 27 | 2000-07-01,0.08711484098949879 28 | 2000-08-01,-1.206737167276847 29 | 2000-09-01,0.9805199834263024 30 | 2000-10-01,1.4375923397946233 31 | 2000-11-01,-1.4505667037318246 32 | 2000-12-01,-0.5276126147276123 33 | 2001-01-01,-0.6351930657840301 34 | 2001-02-01,-0.8789431454671376 35 | 2001-03-01,0.38016739799920973 36 | 2001-04-01,0.8989237811520877 37 | 2001-05-01,-0.5696251467413351 38 | 2001-06-01,0.5810107288436385 39 | 2001-07-01,-0.877114392182485 40 | 2001-08-01,-1.8419085781204712 41 | 2001-09-01,-3.0691972159928227 42 | 2001-10-01,-1.592178550180547 43 | 2001-11-01,1.32565772017106 44 | 2001-12-01,3.0356773079514054 45 | 2002-01-01,0.539169160847849 46 | 2002-02-01,-1.5749740555067968 47 | 2002-03-01,-0.5628386964025566 48 | 2002-04-01,2.209853716845048 49 | 2002-05-01,-0.3874084576722314 50 | 2002-06-01,0.11802333676416787 51 | 2002-07-01,0.030258282148944465 52 | 2002-08-01,-0.8169009829590038 53 | 2002-09-01,-0.7066727929039491 54 | 2002-10-01,-1.9666381833685282 55 | 2002-11-01,0.1961625983960542 56 | 2002-12-01,0.9501640581007602 57 | 2003-01-01,-0.09661344822577811 58 | 2003-02-01,1.2536526524586213 59 | 2003-03-01,-0.48851392015721484 60 | 2003-04-01,-2.123074480287456 61 | 2003-05-01,-1.0768519769268379 62 | 
2003-06-01,-0.41002004156468885 63 | 2003-07-01,0.9077753163701393 64 | 2003-08-01,1.8289107832882583 65 | 2003-09-01,1.8839236166059405 66 | 2003-10-01,-0.17034810074818763 67 | 2003-11-01,0.3461316722485077 68 | 2003-12-01,-0.5123804109231517 69 | 2004-01-01,-0.0250276791192518 70 | 2004-02-01,-1.6531120005419884 71 | 2004-03-01,-0.04375296259860606 72 | 2004-04-01,-1.0487457638712718 73 | 2004-05-01,-1.3555129316049697 74 | 2004-06-01,-1.0994696847921825 75 | 2004-07-01,0.5323094106346663 76 | 2004-08-01,0.14925877923715314 77 | 2004-09-01,0.01848323695573731 78 | 2004-10-01,-0.7814264894902231 79 | 2004-11-01,1.5167771796013843 80 | 2004-12-01,2.296051087044108 81 | 2005-01-01,0.9434746775169713 82 | 2005-02-01,-1.3443629053521349 83 | 2005-03-01,-0.8976294986936344 84 | 2005-04-01,0.98175810146041 85 | 2005-05-01,1.914742733869594 86 | 2005-06-01,3.6808250803475433 87 | 2005-07-01,2.2499900438842384 88 | 2005-08-01,-0.29639336413031137 89 | 2005-09-01,0.4269621646670275 90 | 2005-10-01,0.19173131063833318 91 | 2005-11-01,-0.2889815853393901 92 | 2005-12-01,-1.0759536060630406 93 | 2006-01-01,-2.3631250663725956 94 | 2006-02-01,-1.9836952101294716 95 | 2006-03-01,-3.500637349426545 96 | 2006-04-01,-2.1141695706277153 97 | 2006-05-01,0.017968722919202373 98 | 2006-06-01,-0.9458506118246945 99 | 2006-07-01,-0.6874705885358369 100 | 2006-08-01,0.8465448195500478 101 | 2006-09-01,0.6691255871439188 102 | 2006-10-01,-0.08713307038502685 103 | 2006-11-01,1.0998058262016177 104 | 2006-12-01,1.4779251634861494 105 | 2007-01-01,1.0297282792527176 106 | 2007-02-01,0.2761081358786794 107 | 2007-03-01,-0.07624838857865499 108 | 2007-04-01,0.9858677151212878 109 | 2007-05-01,1.8400672256944817 110 | 2007-06-01,1.3298450389193845 111 | 2007-07-01,1.3647035685985633 112 | 2007-08-01,0.002165062180341848 113 | 2007-09-01,-1.8832985648714708 114 | 2007-10-01,0.25478369007939394 115 | 2007-11-01,-0.08428480654220216 116 | 2007-12-01,0.250528577334092 117 | 
2008-01-01,-1.388046269408355 118 | 2008-02-01,-0.8141709674977629 119 | 2008-03-01,1.105602704377001 120 | 2008-04-01,0.6783867356618624 121 | 2008-05-01,0.8597988185166201 122 | 2008-06-01,0.6072776843392129 123 | 2008-07-01,-1.3642440097784643 124 | 2008-08-01,-1.7137879496710897 125 | 2008-09-01,-0.23947769210328385 126 | 2008-10-01,-0.33873159235012495 127 | 2008-11-01,-0.8917264305872387 128 | 2008-12-01,-1.6743758264767927 129 | 2009-01-01,0.20039551857306726 130 | 2009-02-01,-2.179181944879357 131 | 2009-03-01,0.5897093990491 132 | 2009-04-01,2.2617760869600314 133 | 2009-05-01,1.7321264302126078 134 | 2009-06-01,-0.6027358708709991 135 | 2009-07-01,-0.964091433892799 136 | 2009-08-01,-0.961072497165241 137 | 2009-09-01,1.121733623900013 138 | 2009-10-01,1.8587796044504794 139 | 2009-11-01,1.0411921039301089 140 | 2009-12-01,-0.8383568366059101 141 | 2010-01-01,-1.0065600590335944 142 | 2010-02-01,-0.9352167936808549 143 | 2010-03-01,-0.32786047740480695 144 | 2010-04-01,0.4058865638022857 145 | 2010-05-01,1.1289604538400548 146 | 2010-06-01,-0.1289863833332744 147 | 2010-07-01,-0.7837566521267346 148 | 2010-08-01,-0.9382022886725967 149 | 2010-09-01,0.019015698560999605 150 | 2010-10-01,1.1744723944491786 151 | 2010-11-01,0.023480177242375166 152 | 2010-12-01,0.11225444641066057 153 | 2011-01-01,0.7503458729665786 154 | 2011-02-01,0.6327836554084286 155 | 2011-03-01,-0.397904482673621 156 | 2011-04-01,-0.9041192801212166 157 | 2011-05-01,0.19684514968289946 158 | 2011-06-01,0.8164581910258784 159 | 2011-07-01,-0.7865406928876951 160 | 2011-08-01,0.5776816701185132 161 | 2011-09-01,0.6830306650409664 162 | 2011-10-01,-1.5764659470575981 163 | 2011-11-01,-0.9753378452977925 164 | 2011-12-01,-0.5018770909690016 165 | 2012-01-01,0.002990601985497658 166 | 2012-02-01,0.6934312692549467 167 | 2012-03-01,0.33035488799827195 168 | 2012-04-01,0.6316395283006564 169 | 2012-05-01,-2.1369448261370234 170 | 2012-06-01,-1.019048380073622 171 | 
2012-07-01,0.4383427157248401 172 | 2012-08-01,0.9407095470889854 173 | 2012-09-01,1.348099649128561 174 | 2012-10-01,0.8116850598534116 175 | 2012-11-01,1.8388052714366765 176 | 2012-12-01,2.5190616395369294 177 | 2013-01-01,1.6689361863085121 178 | 2013-02-01,-2.9339212946760274 179 | 2013-03-01,-2.3596669610834473 180 | 2013-04-01,-0.3476010206228577 181 | 2013-05-01,-0.6513293272135454 182 | 2013-06-01,0.16057182159345101 183 | 2013-07-01,-0.1808846341542013 184 | 2013-08-01,1.1473888063467639 185 | 2013-09-01,0.8478549668224615 186 | 2013-10-01,-1.9438919572647397 187 | 2013-11-01,-4.306569160552039 188 | 2013-12-01,-2.9811560202605527 189 | 2014-01-01,0.9623757604005118 190 | 2014-02-01,3.1388819359677322 191 | 2014-03-01,-0.4911584187684064 192 | 2014-04-01,-0.903162201396771 193 | 2014-05-01,-0.20553474371678188 194 | 2014-06-01,0.0034713819334720175 195 | 2014-07-01,-0.06661193721268444 196 | 2014-08-01,-0.9011184211279554 197 | 2014-09-01,0.37803147639788465 198 | 2014-10-01,1.5791737174543747 199 | 2014-11-01,-0.7855743308964656 200 | 2014-12-01,-1.3663558242294889 201 | 2015-01-01,0.09362845713313073 202 | 2015-02-01,0.5052619604953503 203 | 2015-03-01,0.7427287185469982 204 | 2015-04-01,0.7168394463243395 205 | 2015-05-01,1.4315642877534047 206 | 2015-06-01,-0.006107227310269359 207 | 2015-07-01,-0.8942768166365469 208 | 2015-08-01,-0.8966058572714672 209 | 2015-09-01,0.5685733059272622 210 | 2015-10-01,1.8259429077513398 211 | 2015-11-01,0.12388491524839751 212 | 2015-12-01,-0.0017948479719895327 213 | 2016-01-01,-0.30255127717029906 214 | 2016-02-01,-0.09809951837633618 215 | 2016-03-01,-0.8275530135614978 216 | 2016-04-01,1.6647855828518248 217 | 2016-05-01,1.3353296100931782 218 | 2016-06-01,-1.7525984037269728 219 | 2016-07-01,-0.23068707205883043 220 | 2016-08-01,0.41366416066200007 221 | 2016-09-01,-1.1628965126202637 222 | 2016-10-01,-1.3087945420735818 223 | 2016-11-01,-1.5548898995699532 224 | 2016-12-01,0.985958917261037 225 | 
2017-01-01,1.1866663916911828 226 | 2017-02-01,1.2150681052189176 227 | 2017-03-01,-1.1837011467595464 228 | 2017-04-01,-2.9009130879566314 229 | 2017-05-01,-1.2105886542817972 230 | 2017-06-01,1.654358133482298 231 | 2017-07-01,-0.2778687199736294 232 | 2017-08-01,-0.2838170624419766 233 | 2017-09-01,-0.6061344971374943 234 | 2017-10-01,-0.39391290851508587 235 | 2017-11-01,1.0816482242687575 236 | 2017-12-01,0.4093893605786109 237 | 2018-01-01,2.177077034365991 238 | 2018-02-01,2.2113259696356655 239 | 2018-03-01,-0.3972793928313735 240 | 2018-04-01,0.8206395157890112 241 | 2018-05-01,0.6605343290422202 242 | 2018-06-01,0.7894552492498998 243 | 2018-07-01,-0.024437392942594227 244 | 2018-08-01,-0.3988814626402032 245 | 2018-09-01,-0.4140144429318273 246 | 2018-10-01,1.4580015190498223 247 | 2018-11-01,-0.02596102842054826 248 | 2018-12-01,-0.6586766555201682 249 | 2019-01-01,-0.21643238827171585 250 | 2019-02-01,-0.42230681184674035 251 | 2019-03-01,1.2527223297693217 252 | -------------------------------------------------------------------------------- /img/svds_logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/etcrago/Tutorial-Arima-w-jeffrey-yau/428f906b17616c5aed39b923a6164b80857a698a/img/svds_logo.png --------------------------------------------------------------------------------