├── .gitignore
├── README.md
├── Section_1_Introduction_tutorial.ipynb
├── Section_2_ARIMA_Models_tutorial.ipynb
├── Section_3_ARIMA_Modeling_tutorial.ipynb
├── Section_4_SARIMA_tutorial.ipynb
├── Section_5_ClosingRemarks_tutorial.ipynb
├── data
│   ├── HOUST.csv
│   ├── T10yr.csv
│   ├── TOTALSA.csv
│   ├── TTLCON.csv
│   ├── citi.csv
│   ├── dji.csv
│   ├── international-airline-passengers.csv
│   ├── liquor.csv
│   ├── mixedGLB.Ts.ERSSTV4.GHCN.CL.PA.csv
│   ├── sentiment.csv
│   ├── series1.csv
│   └── series2.csv
└── img
    └── svds_logo.png
/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 | .ipynb_checkpoints
3 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # PyData San Francisco 2016 - ARIMA Tutorial
2 |
3 | ## Shortlink to this page: http://bit.ly/svds-pydata-ts
4 |
5 | To clone this repository, run the command:
6 | ```bash
7 | git clone git@github.com:silicon-valley-data-science/pydata-sf-2016-arima-tutorial.git
8 | ```
9 |
10 | Requirements: a standard Anaconda/conda install, plus one bleeding-edge package:
11 | ```bash
12 | pip install --pre statsmodels --upgrade
13 | ```
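
To verify that the pre-release was picked up (the notebooks expect `statsmodels 0.8.0rc1` or newer), a quick check from Python is:
```python
import statsmodels
print(statsmodels.__version__)  # expect 0.8.0rc1 or newer
```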
14 |
15 | ## Conference page: http://pydata.org/sfo2016/schedule/presentation/38/
16 |
17 | ## Tutorial video: https://www.youtube.com/watch?v=tJ-O3hk1vRw
18 |
--------------------------------------------------------------------------------
/Section_1_Introduction_tutorial.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "
"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# PyData San Francisco 2016\n",
15 | "## Applied Time Series Econometrics in Python (and R) Tutorial"
16 | ]
17 | },
18 | {
19 | "cell_type": "markdown",
20 | "metadata": {},
21 | "source": [
22 | "# Abstract\n",
23 | "\n",
24 | "Time series data is ubiquitous, both within and outside of the data science field: weekly initial unemployment claims, tick level stock prices, weekly company sales, daily number of steps taken recorded by a wearable, just to name a few. Some of the most important and commonly used data science techniques to analyze time series data are those in developed in the field of statistics. For this reason, time series statistical models should be included in any data scientist's toolkit.\n",
25 | "\n",
26 | "This 120-minute tutorial covers the mathematical formulation, statistical foundation, and practical considerations of one of the most important classes of time series models: AutoRegression Integrated Moving Average with Explanatory Variables (ARIMAX) models, and its Seasonal counterpart (SARIMAX)."
27 | ]
28 | },
29 | {
30 | "cell_type": "markdown",
31 | "metadata": {},
32 | "source": [
33 | "# Topics included in the Tutorial\n",
34 | "\n",
35 | "- Common use cases of SARIMAX\n",
36 | "- The entire class of SARIMAX models, which include Autoregressive (AR) models, Moving Average (MA) models, Mixed Autoregressive Moving Average (ARMA) models, Autoregressive Integrated Moving Average (ARIMA) models, these models with explanatory variables (ARIMAX), and these models with seasonal components and explanatory variables (SARIMAX)\n",
37 | "- Mathematical formulation\n",
38 | "- Underlying assumptions of this class of model\n",
39 | "- Implementation of these models in Python and R, in which I will compare and contrast the two, using simulated and real-world time-series data, which includes\n",
40 | " - Exploratory time series data analysis using histogram, kernel density plot, time-series plot, scatterplot matrix, plots of autocorrelation (i.e. correlogram), and plots of partial autocorrelation \n",
41 | " - Statistical estimation and its options available in Python and R\n",
42 | " - Simulation of these models\n",
43 | " - Order selection (using the celebrated Box-Jenkins approach)\n",
44 | " - Assumption testing and model evaluation \n",
45 | " - Forecasting"
46 | ]
47 | },
48 | {
49 | "cell_type": "markdown",
50 | "metadata": {},
51 | "source": [
52 | "# Outline of the Tutorial\n",
53 | "\n",
54 | "### 1. Introduction\n",
55 | "\n",
56 | " - 1.1 Common use cases from different disciplines\n",
57 | " - 1.2 Common characteristics of time series\n",
58 | " - 1.3 The class of models to be covered today: A demo\n",
59 | " - Exercise 1\n",
60 | " \n",
61 | "### 2. ETSDA, ARIMA Model Formulation\n",
62 | " - 2.1 The notion of stochastic processes, time series, and stationarity\n",
63 | " - 2.2 Exploratory Time Series Data Analysis\n",
64 | " - 2.3 Mathematical formulation of ARIMA models\n",
65 | " - 2.4 The Box-Jenkins Approach to ARIMA Modeling\n",
66 | " - Exercise 2\n",
67 | " \n",
68 | "### 3. ARIMA Modeling\n",
69 | " - 3.1 Model Identification\n",
70 | " - 3.2 Model Diagnostic Checking\n",
71 | " - 3.3 Model performane evaluation (in-sample fit)\n",
72 | " - 3.4 Forecasting and forecast evaluation \n",
73 | " - 3.5 A few words on adding explanatory variables, its use cases, and its practical suggestions\n",
74 | " - Exercise 3\n",
75 | "\n",
76 | "### 4. SARIMA Modeling\n",
77 | " - 4.1 Mathematical formulation of Seasonal ARIMA (SARIMA) models\n",
78 | " - 4.2 Building a seasonal ARIMA model for forecasting\n",
79 | " - Exercise 4\n",
80 | "\n",
81 | "### 5. Closing Remarks: Practical suggestions and other topics\n",
82 | " - 5.1 Model selection heuristics\n",
83 | " - 5.2 Where to go from here"
84 | ]
85 | },
86 | {
87 | "cell_type": "markdown",
88 | "metadata": {},
89 | "source": [
90 | "
Note: You may note that these notebooks are, at times, fairly dense. That is because there is likely more material here than we can cover today. This was done on purpose, as there is a lot to know. My hope is that you can continue your exploration of the topic with these notebooks, even after the tutorial has ended.
"
91 | ]
92 | },
93 | {
94 | "cell_type": "markdown",
95 | "metadata": {},
96 | "source": [
97 | "Requires `statsmodels 0.8.0rc1` or newer. Install with `pip install --pre statsmodels --upgrade`"
98 | ]
99 | },
100 | {
101 | "cell_type": "code",
102 | "execution_count": null,
103 | "metadata": {
104 | "ExecuteTime": {
105 | "end_time": "2016-08-24T11:55:46.760055",
106 | "start_time": "2016-08-24T11:55:44.886453"
107 | },
108 | "collapsed": true
109 | },
110 | "outputs": [],
111 | "source": [
112 | "%load_ext autoreload\n",
113 | "%autoreload 2\n",
114 | "%matplotlib inline\n",
115 | "%config InlineBackend.figure_format='retina'\n",
116 | "\n",
117 | "from __future__ import absolute_import, division, print_function\n",
118 | "\n",
119 | "import pandas as pd\n",
120 | "import numpy as np\n",
121 | "\n",
122 | "import statsmodels.api as sm\n",
123 | "import statsmodels.formula.api as smf\n",
124 | "import statsmodels.tsa.api as smt\n",
125 | "\n",
126 | "# Display and Plotting\n",
127 | "import matplotlib.pylab as plt\n",
128 | "import seaborn as sns\n",
129 | "\n",
130 | "from ipywidgets import interactive, widgets, RadioButtons, ToggleButton, Select, FloatSlider, FloatRangeSlider, IntSlider, fixed\n",
131 | "\n",
132 | "pd.set_option('display.float_format', lambda x: '%.5f' % x) # pandas\n",
133 | "np.set_printoptions(precision=5, suppress=True) # numpy\n",
134 | "\n",
135 | "pd.set_option('display.max_columns', 100)\n",
136 | "pd.set_option('display.max_rows', 100)\n",
137 | "\n",
138 | "# seaborn plotting style\n",
139 | "sns.set(style='ticks', context='poster')"
140 | ]
141 | },
142 | {
143 | "cell_type": "markdown",
144 | "metadata": {},
145 | "source": [
146 | "# 1. Introduction\n",
147 | " - 1.1 Common use cases from different disciplines\n",
148 | " - 1.2 Common characteristics of time series\n",
149 | " - 1.3 The class of models to be covered today: A demo"
150 | ]
151 | },
152 | {
153 | "cell_type": "markdown",
154 | "metadata": {
155 | "collapsed": true
156 | },
157 | "source": [
158 | "## 1.1 Common Use Cases"
159 | ]
160 | },
161 | {
162 | "cell_type": "markdown",
163 | "metadata": {
164 | "collapsed": true
165 | },
166 | "source": [
167 | "- Government Budget and Key Economic Indicator Projections\n",
168 | "- Companies forecast sales \n",
169 | "- CMS Projection on National Health Expenditure\n",
170 | "- NCES Projections of Education Statistics\n",
171 | "- Vehicular traffic flow forecasting\n",
172 | "- Dynamic resource allocation (e.g., servers)\n",
173 | "- Physiological models for health monitoring (e.g., glucose levels in diabetics)"
174 | ]
175 | },
176 | {
177 | "cell_type": "markdown",
178 | "metadata": {},
179 | "source": [
180 | "## 1.2 Common characteristics of time series"
181 | ]
182 | },
183 | {
184 | "cell_type": "markdown",
185 | "metadata": {
186 | "collapsed": true
187 | },
188 | "source": [
189 | "* Trend\n",
190 | "* Seasonality\n",
191 | "* Cycles\n",
192 | "* Combination of the above"
193 | ]
194 | },
195 | {
196 | "cell_type": "markdown",
197 | "metadata": {},
198 | "source": [
199 | "### Pattern 1: Trend and Fluctuation around the Trend\n",
200 | "\n",
201 | "Airline Passenger Bookings\n",
202 | "\n",
203 | "https://datamarket.com/data/set/22u3/international-airline-passengers-monthly-totals-in-thousands-jan-49-dec-60\n",
204 | "- `data/international-airline-passengers.csv`\n"
205 | ]
206 | },
207 | {
208 | "cell_type": "code",
209 | "execution_count": null,
210 | "metadata": {
211 | "ExecuteTime": {
212 | "end_time": "2016-08-24T11:40:50.642369",
213 | "start_time": "2016-08-24T11:40:50.615605"
214 | },
215 | "collapsed": true
216 | },
217 | "outputs": [],
218 | "source": [
219 | "air = pd.read_csv('data/international-airline-passengers.csv', header=0, index_col=0, parse_dates=[0])"
220 | ]
221 | },
222 | {
223 | "cell_type": "code",
224 | "execution_count": null,
225 | "metadata": {
226 | "ExecuteTime": {
227 | "end_time": "2016-08-24T11:40:51.296736",
228 | "start_time": "2016-08-24T11:40:50.643896"
229 | },
230 | "collapsed": false
231 | },
232 | "outputs": [],
233 | "source": [
234 | "fig, ax = plt.subplots(figsize=(8,6));\n",
235 | "\n",
236 | "air['n_pass_thousands'].plot(ax=ax);\n",
237 | "\n",
238 | "ax.set_title('International airline passengers, 1949-1960');\n",
239 | "ax.set_ylabel('Thousands of passengers');\n",
240 | "ax.set_xlabel('Year');\n",
241 | "ax.xaxis.set_ticks_position('bottom')\n",
242 | "fig.tight_layout();"
243 | ]
244 | },
245 | {
246 | "cell_type": "code",
247 | "execution_count": null,
248 | "metadata": {
249 | "ExecuteTime": {
250 | "end_time": "2016-08-24T11:40:52.092458",
251 | "start_time": "2016-08-24T11:40:51.300100"
252 | },
253 | "collapsed": false
254 | },
255 | "outputs": [],
256 | "source": [
257 | "# Examine annual trend in the data\n",
258 | "fig, ax = plt.subplots(figsize=(8,6));\n",
259 | "\n",
260 | "air['n_pass_thousands'].resample('AS').sum().plot(ax=ax)\n",
261 | "\n",
262 | "# ax.set_title('Aggregated annual series: International airline passengers, 1949-1960');\n",
263 | "fig.suptitle('Aggregated annual series: International airline passengers, 1949-1960');\n",
264 | "ax.set_ylabel('Thousands of passengers');\n",
265 | "ax.set_xlabel('Year');\n",
266 | "ax.xaxis.set_ticks_position('bottom')\n",
267 | "fig.tight_layout();\n",
268 | "fig.subplots_adjust(top=0.9)"
269 | ]
270 | },
271 | {
272 | "cell_type": "code",
273 | "execution_count": null,
274 | "metadata": {
275 | "ExecuteTime": {
276 | "end_time": "2016-08-24T11:40:52.858761",
277 | "start_time": "2016-08-24T11:40:52.094540"
278 | },
279 | "collapsed": false
280 | },
281 | "outputs": [],
282 | "source": [
283 | "# Examine seasonal trend in the data\n",
284 | "air['Month'] = air.index.strftime('%b')\n",
285 | "air['Year'] = air.index.year\n",
286 | "\n",
287 | "air_piv = air.pivot(index='Year', columns='Month', values='n_pass_thousands')\n",
288 | "\n",
289 | "air = air.drop(['Month', 'Year'], axis=1)\n",
290 | "\n",
291 | "# put the months in order\n",
292 | "month_names = pd.date_range(start='2016-01-01', periods=12, freq='MS').strftime('%b')\n",
293 | "air_piv = air_piv.reindex(columns=month_names)\n",
294 | "\n",
295 | "# plot it\n",
296 | "fig, ax = plt.subplots(figsize=(8, 6))\n",
297 | "air_piv.plot(ax=ax, kind='box');\n",
298 | "\n",
299 | "ax.set_xlabel('Month');\n",
300 | "ax.set_ylabel('Thousands of passengers');\n",
301 | "ax.set_title('Boxplot of seasonal values');\n",
302 | "ax.xaxis.set_ticks_position('bottom')\n",
303 | "fig.tight_layout();"
304 | ]
305 | },
306 | {
307 | "cell_type": "markdown",
308 | "metadata": {},
309 | "source": [
310 | "### Pattern 2: Trend and Change in Structure\n",
311 | "\n",
312 | "Annual Average Global Temperature Change\n",
313 | "\n",
314 | "http://data.giss.nasa.gov/gistemp/graphs/graph_files.html - Land-Ocean: Global Means\n",
315 | "- `data/mixedGLB.Ts.ERSSTV4.GHCN.CL.PA.csv`"
316 | ]
317 | },
318 | {
319 | "cell_type": "code",
320 | "execution_count": null,
321 | "metadata": {
322 | "ExecuteTime": {
323 | "end_time": "2016-08-24T11:40:53.800427",
324 | "start_time": "2016-08-24T11:40:52.860435"
325 | },
326 | "collapsed": false
327 | },
328 | "outputs": [],
329 | "source": [
330 | "gtemp = pd.read_csv('data/mixedGLB.Ts.ERSSTV4.GHCN.CL.PA.csv', header=1, index_col=0, parse_dates=[0])\n",
331 | "gtemp['avg'] = gtemp.iloc[:,:12].mean(axis=1)\n",
332 | "\n",
333 | "fig, ax = plt.subplots(figsize=(8, 6));\n",
334 | "\n",
335 | "gtemp['avg'].plot(ax=ax);\n",
336 | "\n",
337 | "ax.set_title('Annual Average Global Temperature Change');\n",
338 | "\n",
339 | "ylim = (-1.0, 1.5)\n",
340 | "ax.set_ylim(ylim)\n",
341 | "\n",
342 | "ax.fill_betweenx(ylim, gtemp.index[0], pd.Timestamp('1922'), alpha=.1, zorder=-1, color='b');\n",
343 | "ax.fill_betweenx(ylim, pd.Timestamp('1922'), pd.Timestamp('1965'), alpha=.1, zorder=-1, color='g');\n",
344 | "ax.fill_betweenx(ylim, pd.Timestamp('1965'), gtemp.index[-1], alpha=.1, zorder=-1, color='r');\n",
345 | "\n",
346 | "ax.annotate('$\\\\longrightarrow$', (gtemp.index[15], -0.8));\n",
347 | "ax.annotate('$\\\\nearrow$', (gtemp.index[-30], 0));\n",
348 | "ax.xaxis.set_ticks_position('bottom')\n",
349 | "fig.tight_layout();"
350 | ]
351 | },
352 | {
353 | "cell_type": "markdown",
354 | "metadata": {},
355 | "source": [
356 | "### Pattern 3: Variation around a Stable Mean\n",
357 | "\n",
358 | "Dow Jones Industrial Average\n",
359 | "- `data/dji.csv`\n",
360 | "\n",
361 | "```python\n",
362 | "import pandas_datareader.data as web\n",
363 | "\n",
364 | "start = pd.Timestamp('2006-04-20')\n",
365 | "end = pd.Timestamp('2016-04-20')\n",
366 | "\n",
367 | "dji = web.DataReader(\"^DJI\", 'yahoo', start, end)\n",
368 | "\n",
369 | "dji['Return_log'] = dji['Close'].apply(lambda x: np.log(x)).diff()\n",
370 | "\n",
371 | "dji.to_csv('data/dji.csv')\n",
372 | "```"
373 | ]
374 | },
375 | {
376 | "cell_type": "code",
377 | "execution_count": null,
378 | "metadata": {
379 | "ExecuteTime": {
380 | "end_time": "2016-08-24T11:40:54.532548",
381 | "start_time": "2016-08-24T11:40:53.801747"
382 | },
383 | "collapsed": false
384 | },
385 | "outputs": [],
386 | "source": [
387 | "dji = pd.read_csv('data/dji.csv', header=0, index_col=0)\n",
388 | "\n",
389 | "fig, ax = plt.subplots(figsize=(8, 7))\n",
390 | "\n",
391 | "dji['Return_log'].plot(ax=ax);\n",
392 | "\n",
393 | "ax.set_title('Dow Jones Industrial returns, 2006-2016');\n",
394 | "ax.xaxis.set_ticks_position('bottom')\n",
395 | "fig.tight_layout();"
396 | ]
397 | },
398 | {
399 | "cell_type": "markdown",
400 | "metadata": {},
401 | "source": [
402 | "### Pattern 4: Cycles/Periodicity\n",
403 | "\n",
404 | "Number of annual sunspots"
405 | ]
406 | },
407 | {
408 | "cell_type": "code",
409 | "execution_count": null,
410 | "metadata": {
411 | "ExecuteTime": {
412 | "end_time": "2016-08-24T11:40:54.562494",
413 | "start_time": "2016-08-24T11:40:54.533876"
414 | },
415 | "collapsed": false
416 | },
417 | "outputs": [],
418 | "source": [
419 | "print(sm.datasets.sunspots.NOTE)"
420 | ]
421 | },
422 | {
423 | "cell_type": "code",
424 | "execution_count": null,
425 | "metadata": {
426 | "ExecuteTime": {
427 | "end_time": "2016-08-24T11:40:54.601703",
428 | "start_time": "2016-08-24T11:40:54.563767"
429 | },
430 | "collapsed": false
431 | },
432 | "outputs": [],
433 | "source": [
434 | "sun = sm.datasets.sunspots.load_pandas().data\n",
435 | "sun['YEAR'] = pd.to_datetime(sun['YEAR'].astype(int), format='%Y')\n",
436 | "sun = sun.set_index('YEAR')"
437 | ]
438 | },
439 | {
440 | "cell_type": "code",
441 | "execution_count": null,
442 | "metadata": {
443 | "ExecuteTime": {
444 | "end_time": "2016-08-24T11:40:54.642270",
445 | "start_time": "2016-08-24T11:40:54.603075"
446 | },
447 | "collapsed": true
448 | },
449 | "outputs": [],
450 | "source": [
451 | "def tsplot(y, lags=None, title='', figsize=(14, 8)):\n",
452 | " '''Examine the patterns of ACF and PACF, along with the time series plot and histogram.\n",
453 | " \n",
454 | " Original source: https://tomaugspurger.github.io/modern-7-timeseries.html\n",
455 | " '''\n",
456 | " fig = plt.figure(figsize=figsize)\n",
457 | " layout = (2, 2)\n",
458 | " ts_ax = plt.subplot2grid(layout, (0, 0))\n",
459 | " hist_ax = plt.subplot2grid(layout, (0, 1))\n",
460 | " acf_ax = plt.subplot2grid(layout, (1, 0))\n",
461 | " pacf_ax = plt.subplot2grid(layout, (1, 1))\n",
462 | " \n",
463 | " y.plot(ax=ts_ax)\n",
464 | " ts_ax.set_title(title)\n",
465 | " y.plot(ax=hist_ax, kind='hist', bins=25)\n",
466 | " hist_ax.set_title('Histogram')\n",
467 | " smt.graphics.plot_acf(y, lags=lags, ax=acf_ax)\n",
468 | " smt.graphics.plot_pacf(y, lags=lags, ax=pacf_ax)\n",
469 | " [ax.set_xlim(0) for ax in [acf_ax, pacf_ax]]\n",
470 | " sns.despine()\n",
471 | " plt.tight_layout()\n",
472 | " return ts_ax, acf_ax, pacf_ax"
473 | ]
474 | },
475 | {
476 | "cell_type": "code",
477 | "execution_count": null,
478 | "metadata": {
479 | "ExecuteTime": {
480 | "end_time": "2016-08-24T11:40:56.478273",
481 | "start_time": "2016-08-24T11:40:54.643768"
482 | },
483 | "collapsed": false
484 | },
485 | "outputs": [],
486 | "source": [
487 | "tsplot(sun, lags=40);"
488 | ]
489 | },
490 | {
491 | "cell_type": "markdown",
492 | "metadata": {},
493 | "source": [
494 | "## 1.3 Demo: Plot and model data generated from ARIMA process"
495 | ]
496 | },
497 | {
498 | "cell_type": "code",
499 | "execution_count": null,
500 | "metadata": {
501 | "ExecuteTime": {
502 | "end_time": "2016-08-24T11:51:07.785101",
503 | "start_time": "2016-08-24T11:51:07.752205"
504 | },
505 | "collapsed": true
506 | },
507 | "outputs": [],
508 | "source": [
509 | "def generate_arima_data(arparams,\n",
510 | " maparams,\n",
511 | " i_order=0,\n",
512 | " n_samp=120,\n",
513 | " rng_state=None,\n",
514 | " sigma=1,\n",
515 | " burnin=10,\n",
516 | " lin_trend=None,\n",
517 | " verbose=True,\n",
518 | " ):\n",
519 | " \n",
520 | " if rng_state is None:\n",
521 | " rng_state = np.random.RandomState()\n",
522 | " ar = np.r_[1, -arparams] # add zero-lag and negate\n",
523 | " ma = np.r_[1, maparams] # add zero-lag\n",
524 | " \n",
525 | " if verbose:\n",
526 | " arma_process = smt.ArmaProcess(ar, ma, nobs=n_samp)\n",
527 | " print('Is the process stationary? {}'.format(arma_process.isstationary))\n",
528 | " print('Is the process invertible? {}'.format(arma_process.isinvertible))\n",
529 | "\n",
530 | " y = smt.arma_generate_sample(ar, ma, n_samp, sigma=sigma, distrvs=rng_state.randn, burnin=burnin)\n",
531 | " # add deterministic linear trend\n",
532 | " if lin_trend is not None:\n",
533 | " y = y + np.cumsum(np.repeat(lin_trend, n_samp))\n",
534 | " for i in range(i_order):\n",
535 | " y = y.cumsum()\n",
536 | " \n",
537 | " return y"
538 | ]
539 | },
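{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick illustrative check of the generator above (an addition to the original demo; the AR(1) coefficient of 0.7 is an arbitrary choice), one could simulate and plot a single series directly, without the widget:\n",
"\n",
"```python\n",
"# illustrative sketch: simulate an AR(1) with phi = 0.7 using the helper defined above\n",
"y_demo = generate_arima_data(np.array([0.7]), np.array([]), i_order=0, n_samp=200)\n",
"pd.Series(y_demo).plot(title='Simulated AR(1), phi = 0.7');\n",
"```"
]
},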
540 | {
541 | "cell_type": "code",
542 | "execution_count": null,
543 | "metadata": {
544 | "ExecuteTime": {
545 | "end_time": "2016-08-24T11:51:08.340037",
546 | "start_time": "2016-08-24T11:51:08.149594"
547 | },
548 | "collapsed": false
549 | },
550 | "outputs": [],
551 | "source": [
552 | "# the function for generating data and plotting\n",
553 | "def arima_data(n_samp=120,\n",
554 | " ar_gen=0,\n",
555 | " ar1_coef=0,\n",
556 | " ar2_coef=0,\n",
557 | " ar3_coef=0,\n",
558 | " ar4_coef=0,\n",
559 | " i_gen=0,\n",
560 | " ma_gen=0,\n",
561 | " ma1_coef=0,\n",
562 | " ma2_coef=0,\n",
563 | " ma3_coef=0,\n",
564 | " ma4_coef=0,\n",
565 | " rand_state=42,\n",
566 | " ylim=5,\n",
567 | " ar_fit_p=0,\n",
568 | " i_fit_d=0,\n",
569 | " ma_fit_q=0,\n",
570 | " n_train=108,\n",
571 | " n_forecast=24,\n",
572 | " dynamic=False,\n",
573 | " lin_trend=None,\n",
574 | " verbose=True,\n",
575 | " ):\n",
576 | " \n",
577 | " rng_state = np.random.RandomState(rand_state)\n",
578 | "\n",
579 | " arparams = np.array([ar1_coef, ar2_coef, ar3_coef, ar4_coef])\n",
580 | " arparams = arparams[:ar_gen]\n",
581 | " maparams = np.array([ma1_coef, ma2_coef, ma3_coef, ma4_coef])\n",
582 | " maparams = maparams[:ma_gen]\n",
583 | " \n",
584 | " print('Generated ARIMA({}, {}, {})'.format(ar_gen, i_gen, ma_gen))\n",
585 | " print('AR coeff = {}, MA coeff = {}'.format(arparams, maparams))\n",
586 | " \n",
587 | " y = generate_arima_data(arparams,\n",
588 | " maparams,\n",
589 | " i_gen,\n",
590 | " n_samp=n_samp,\n",
591 | " rng_state=rng_state,\n",
592 | " lin_trend=lin_trend,\n",
593 | " verbose=verbose,\n",
594 | " )\n",
595 | " \n",
596 | " # set a fake DatetimeIndex\n",
597 | " df = pd.DataFrame(data=y, columns=['value'], index=pd.date_range(start='1990-01-01', freq='MS', periods=len(y)))\n",
598 | " \n",
599 | " fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(16, 5))\n",
600 | " \n",
601 | " ax1.plot(df.iloc[:n_train]['value'], label='In-sample data', linestyle='-')\n",
602 | " # subtract 1 only to connect it to previous point in the graph\n",
603 | " ax1.plot(df.iloc[n_train-1:]['value'], label='Held-out data', linestyle='--')\n",
604 | "\n",
605 | " fitting=False\n",
606 | " if (((ar_gen > 0) and (ar1_coef != 0)) or ((ma_gen > 0) and (ma1_coef != 0))) and ((ar_fit_p > 0) or (ma_fit_q > 0)):\n",
607 | " fitting=True\n",
608 | " print('Fit ARIMA({}, {}, {})'.format(ar_fit_p, i_fit_d, ma_fit_q))\n",
609 | " \n",
610 | " training = df.iloc[:n_train]['value']\n",
611 | " \n",
612 | " if (lin_trend is not None) and (lin_trend > 0):\n",
613 | " #trend='t'\n",
614 | " \n",
615 | " # there's a bug in statsmodels 0.8.0rc1 regarding trend that has been fixed\n",
616 | " # https://github.com/statsmodels/statsmodels/issues/3111\n",
617 | " trend='n'\n",
618 | " else:\n",
619 | " trend='n'\n",
620 | " model = smt.SARIMAX(training, order=(ar_fit_p, i_fit_d, ma_fit_q),\n",
621 | " trend=trend,\n",
622 | " enforce_stationarity=False,\n",
623 | " enforce_invertibility=False,\n",
624 | " )\n",
625 | " results = model.fit()\n",
626 | " \n",
627 | " pred_begin = df.index[results.loglikelihood_burn]\n",
628 | " pred_end = df.index[n_train] + pd.DateOffset(months = n_forecast - 1)\n",
629 | " pred = results.get_prediction(start=pred_begin.strftime('%Y-%m-%d'),\n",
630 | " end=pred_end.strftime('%Y-%m-%d'),\n",
631 | " dynamic=dynamic)\n",
632 | " pred_mean = pred.predicted_mean\n",
633 | " pred_ci = pred.conf_int(alpha=0.05)\n",
634 | " \n",
635 | " ax1.plot(pred_mean, 'r', alpha=.6, label='Predicted values')\n",
636 | " ax1.fill_between(pred_ci.index,\n",
637 | " pred_ci.iloc[:, 0],\n",
638 | " pred_ci.iloc[:, 1], color='k', alpha=.2)\n",
639 | " # plot the residuals\n",
640 | " (df['value'] - pred_mean).dropna().plot(ax=ax2, marker='o')\n",
641 | " ax2.set_xlim((df.index[0], pred_end))\n",
642 | " ax2.set_title('Residuals ($data - model$)');\n",
643 | " ax2.axhline(y=0, linestyle='--', color='k', alpha=.5);\n",
644 | " \n",
645 | " # scale with i_gen\n",
646 | " ylim = ylim*(10**(i_gen))\n",
647 | " ax1.set_ylim((-ylim, ylim));\n",
648 | " ax1.legend(loc='best');\n",
649 | " \n",
650 | " if fitting:\n",
651 | " ax1.fill_betweenx(ax1.get_ylim(), df.index[n_train], pred_end, alpha=.1, zorder=-1)\n",
652 | " ax2.fill_betweenx(ax2.get_ylim(), df.index[n_train], pred_end, alpha=.1, zorder=-1)\n",
653 | " plt.show();\n",
654 | " print(results.summary())\n",
655 | " pass"
656 | ]
657 | },
658 | {
659 | "cell_type": "code",
660 | "execution_count": null,
661 | "metadata": {
662 | "ExecuteTime": {
663 | "end_time": "2016-08-24T11:51:08.872897",
664 | "start_time": "2016-08-24T11:51:08.540231"
665 | },
666 | "collapsed": false
667 | },
668 | "outputs": [],
669 | "source": [
670 | "# set up the widgets\n",
671 | "n_samp=120\n",
672 | "\n",
673 | "n_train=108\n",
674 | "n_forecast=24\n",
675 | "\n",
676 | "rand_state_init=42\n",
677 | "ylim_init=5\n",
678 | "\n",
679 | "# orders\n",
680 | "int_min = 0\n",
681 | "int_max = 4\n",
682 | "int_step = 1\n",
683 | "\n",
684 | "# sliders for data generation\n",
685 | "ar_gen_slider = IntSlider(value=0, min=int_min, max=int_max, step=int_step, continuous_update=False)\n",
686 | "i_gen_slider = IntSlider(value=0, min=int_min, max=int_max, step=int_step, continuous_update=False)\n",
687 | "ma_gen_slider = IntSlider(value=0, min=int_min, max=int_max, step=int_step, continuous_update=False)\n",
688 | "\n",
689 | "# coefficients\n",
690 | "lag_min = -1\n",
691 | "lag_max = 1\n",
692 | "lag_step = 0.1\n",
693 | "\n",
694 | "ar1_coef_slider = FloatSlider(value=0, min=lag_min, max=lag_max, step=lag_step, continuous_update=False)\n",
695 | "ar2_coef_slider = FloatSlider(value=0, min=lag_min, max=lag_max, step=lag_step, continuous_update=False)\n",
696 | "ar3_coef_slider = FloatSlider(value=0, min=lag_min, max=lag_max, step=lag_step, continuous_update=False)\n",
697 | "ar4_coef_slider = FloatSlider(value=0, min=lag_min, max=lag_max, step=lag_step, continuous_update=False)\n",
698 | "\n",
699 | "ma1_coef_slider = FloatSlider(value=0, min=lag_min, max=lag_max, step=lag_step, continuous_update=False)\n",
700 | "ma2_coef_slider = FloatSlider(value=0, min=lag_min, max=lag_max, step=lag_step, continuous_update=False)\n",
701 | "ma3_coef_slider = FloatSlider(value=0, min=lag_min, max=lag_max, step=lag_step, continuous_update=False)\n",
702 | "ma4_coef_slider = FloatSlider(value=0, min=lag_min, max=lag_max, step=lag_step, continuous_update=False)\n",
703 | "\n",
704 | "rand_slider = IntSlider(value=rand_state_init, min=0, max=10000, step=1, continuous_update=False)\n",
705 | "ylim_slider = IntSlider(value=ylim_init, min=1, max=100, step=1, continuous_update=False)\n",
706 | "\n",
707 | "# initial values and sliders for model parameters\n",
708 | "ar_fit_p_init=0\n",
709 | "i_fit_d_init=0\n",
710 | "ma_fit_q_init=0\n",
711 | "ar_fit_p_slider = IntSlider(value=ar_fit_p_init, min=int_min, max=int_max, step=int_step, continuous_update=False)\n",
712 | "i_fit_d_slider = IntSlider(value=i_fit_d_init, min=int_min, max=int_max, step=int_step, continuous_update=False)\n",
713 | "ma_fit_q_slider = IntSlider(value=ma_fit_q_init, min=int_min, max=int_max, step=int_step, continuous_update=False)\n",
714 | "\n",
715 | "# dynamic_init=n_train\n",
716 | "# dynamic_slider = IntSlider(value=dynamic_init, min=n_train-10, max=n_train+1, step=int_step, continuous_update=False)\n",
717 | "\n",
718 | "# lin_trend_init = 0\n",
719 | "# lin_trend_slider = FloatSlider(value=lin_trend_init, min=0, max=1.0, step=0.1, continuous_update=False)\n",
720 | "\n",
721 | "arima_w = interactive(\n",
722 | " arima_data,\n",
723 | " n_samp=fixed(n_samp),\n",
724 | " ar_gen=ar_gen_slider,\n",
725 | " ar1_coef=ar1_coef_slider,\n",
726 | " ar2_coef=ar2_coef_slider,\n",
727 | " ar3_coef=ar3_coef_slider,\n",
728 | " ar4_coef=ar4_coef_slider,\n",
729 | " i_gen=i_gen_slider,\n",
730 | " ma_gen=ma_gen_slider,\n",
731 | " ma1_coef=ma1_coef_slider,\n",
732 | " ma2_coef=ma2_coef_slider,\n",
733 | " ma3_coef=ma3_coef_slider,\n",
734 | " ma4_coef=ma4_coef_slider,\n",
735 | " rand_state=rand_slider,\n",
736 | " ylim=ylim_slider,\n",
737 | " ar_fit_p=ar_fit_p_slider,\n",
738 | " i_fit_d=i_fit_d_slider,\n",
739 | " ma_fit_q=ma_fit_q_slider,\n",
740 | " n_train=fixed(n_train),\n",
741 | " n_forecast=fixed(n_forecast),\n",
742 | " dynamic=fixed(False),\n",
743 | " #dynamic=dynamic_slider,\n",
744 | " lin_trend=fixed(None),\n",
745 | " #lin_trend=lin_trend_slider,\n",
746 | " verbose=fixed(True),\n",
747 | " )\n",
748 | "\n",
749 | "# arrange the widgets\n",
750 | "arima_widget = widgets.HBox([widgets.VBox(arima_w.children[:6]),\n",
751 | " widgets.VBox(arima_w.children[6:11]),\n",
752 | " widgets.VBox(arima_w.children[11:]),\n",
753 | " ])\n",
754 | "# this is the set of widgets in the function with defaults\n",
755 | "arima_widget.on_displayed(lambda x: arima_data(ar_gen=0,\n",
756 | " ar1_coef=0,\n",
757 | " ar2_coef=0,\n",
758 | " ar3_coef=0,\n",
759 | " ar4_coef=0,\n",
760 | " i_gen=0,\n",
761 | " ma_gen=0,\n",
762 | " ma1_coef=0,\n",
763 | " ma2_coef=0,\n",
764 | " ma3_coef=0,\n",
765 | " ma4_coef=0,\n",
766 | " rand_state=rand_state_init,\n",
767 | " ylim=ylim_init,\n",
768 | " ar_fit_p=ar_fit_p_init,\n",
769 | " i_fit_d=i_fit_d_init,\n",
770 | " ma_fit_q=ma_fit_q_init,\n",
771 | " n_train=n_train,\n",
772 | " #dynamic=dynamic_init,\n",
773 | " #lin_trend=lin_trend_init,\n",
774 | " ))"
775 | ]
776 | },
777 | {
778 | "cell_type": "code",
779 | "execution_count": null,
780 | "metadata": {
781 | "ExecuteTime": {
782 | "end_time": "2016-08-24T11:51:10.280655",
783 | "start_time": "2016-08-24T11:51:09.374116"
784 | },
785 | "collapsed": false
786 | },
787 | "outputs": [],
788 | "source": [
789 | "arima_widget"
790 | ]
791 | },
792 | {
793 | "cell_type": "markdown",
794 | "metadata": {
795 | "collapsed": true
796 | },
797 | "source": [
798 | "### Exericse 1\n",
799 | "\n",
800 | "1. Write down 2 - 4 examples of time series that you encouter in real life.\n",
801 | "2. Use the widget above to simulate a number of time series. Are there any of them that resemble the ones that you encounter in real life?"
802 | ]
803 | },
804 | {
805 | "cell_type": "code",
806 | "execution_count": null,
807 | "metadata": {
808 | "collapsed": true
809 | },
810 | "outputs": [],
811 | "source": []
812 | }
813 | ],
814 | "metadata": {
815 | "kernelspec": {
816 | "display_name": "Python 3",
817 | "language": "python",
818 | "name": "python3"
819 | },
820 | "language_info": {
821 | "codemirror_mode": {
822 | "name": "ipython",
823 | "version": 3
824 | },
825 | "file_extension": ".py",
826 | "mimetype": "text/x-python",
827 | "name": "python",
828 | "nbconvert_exporter": "python",
829 | "pygments_lexer": "ipython3",
830 | "version": "3.5.2"
831 | },
832 | "nav_menu": {},
833 | "toc": {
834 | "navigate_menu": true,
835 | "number_sections": false,
836 | "sideBar": true,
837 | "threshold": 6,
838 | "toc_cell": false,
839 | "toc_section_display": "block",
840 | "toc_window_display": false
841 | },
842 | "widgets": {
843 | "state": {
844 | "e1d05cf91d904ccfa8f5327c13a904cf": {
845 | "views": [
846 | {
847 | "cell_index": 31
848 | }
849 | ]
850 | }
851 | },
852 | "version": "1.2.0"
853 | }
854 | },
855 | "nbformat": 4,
856 | "nbformat_minor": 0
857 | }
858 |
--------------------------------------------------------------------------------
/Section_2_ARIMA_Models_tutorial.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "
"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# PyData San Francisco 2016\n",
15 | "## Applied Time Series Econometrics in Python (and R) Tutorial\n",
16 | "### Section 2: Exploratory Time Series Data Analysis and the Class of ARIMA Models"
17 | ]
18 | },
19 | {
20 | "cell_type": "markdown",
21 | "metadata": {},
22 | "source": [
23 | "### Topics in this section include \n",
24 | "\n",
25 | "- 2.1 The notion of stochastic processes, time series, and stationarity\n",
26 | "- 2.2 Exploratory Time Series Data Analysis\n",
27 | "- 2.3 Mathematical formulation of ARIMA models\n",
28 | "- 2.4 An Introduction to the *Box-Jenkins Approach* to ARIMA Modeling"
29 | ]
30 | },
31 | {
32 | "cell_type": "markdown",
33 | "metadata": {},
34 | "source": [
35 | "### 2.1 The notion of Stochastic Processes, Time Series, Stationarity, Autocorrelation\n",
36 | "Note: This is a relatively dense section. However, it sets up the necessary framework for us to study the class of *Autoregressive Integrated Moving Average* models.\n",
37 | "\n",
38 | "#### Key Takeaway from this section:\n",
39 | "1. An observed time series is treated as a realization of an underlying probability model.\n",
40 | "2. We will study a certain class of probability model that comes with a very appealing (and simple) probability structure.\n",
41 | "3. The concept of (weak) stationarity is a key requirement of the class of time series models that we will study.\n",
42 | "4. The concept of autocorrelation function and (partial) autocorrelation function are a main tool for us to analyze a time series.\n"
43 | ]
44 | },
45 | {
46 | "cell_type": "markdown",
47 | "metadata": {},
48 | "source": [
49 | "* The $\\textbf{autocovariance function}$ is defined as\n",
50 | "\n",
51 | "$$\\gamma_{x}(s,t) = cov(x_s,x_t) = E[(x_s-\\mu_s)(x_t-\\mu_t)] \\forall s,t$$\n",
52 | "\n",
53 | "* Two natural implications are $(1) \\gamma_{x}(s,t) = \\gamma_{x}(t,s)$ and $(2)$ $\\gamma_{x}(s,s) = cov(x_s,x_s) = E[(x_s-\\mu_s)^2]$\n",
54 | "\n",
55 | "* A correlation of a variable with itself at different times is known as $\\textit{autocorrelation}$. If a time series model is second-order stationary (i.e. stationary in both mean and variance: $\\mu_t = \\mu$ and $\\sigma_t^2 = \\sigma^2$ for all $t$), then an $\\textit{autocovariance function}$ can be expressed as a function only of the time lag $k$:\n",
56 | "\n",
57 | "$$ \\gamma_k = E[(x_t-\\mu)(x_{t+k} - \\mu)] $$\n",
58 | " \n",
59 | "* Likewise, the autocorrelation function \\emph{acf} is defined as\n",
60 | "\n",
61 | "$$ \\rho_k = \\frac{\\gamma_k}{\\sigma^2} $$\n",
62 | " \n",
63 | "* When $k=0$, $\\rho_0 = 1$"
64 | ]
65 | },
66 | {
67 | "cell_type": "markdown",
68 | "metadata": {},
69 | "source": [
70 | "#### Estimation of ($\\mathbf{1^{st}}$ order) Dependency:\n",
71 | "\n",
72 | "* Using the $\\textit{moment principles}$, the $\\textit{acvf}$ and $\\textit{acf}$ can be estimated from a time series by their sample equivalents. The sample \\emph{acvf} can be estimated using the following formula:\n",
73 | "\n",
74 | "$$ \\hat{\\gamma}_k = \\frac{1}{T} \\sum_{t=1}^{T-k} \\left( x_t - \\bar{x} \\right) \\left( x_{t+k} - \\bar{x} \\right) $$\n",
75 | "\n",
76 | "* Note that the sum is divided by $T$ and and not $T-k$.\n",
77 | "\n",
78 | "* The sample $\\textit{ACF}$ is defined by\n",
79 | "\n",
80 | "$$ \\frac{\\hat{\\gamma}_k}{\\hat{\\gamma}_0} = \\frac{\\frac{1}{T} \\sum_{t=1}^{T-k} \\left( x_t - \\bar{x} \\right) \\left( x_{t+k} - \\bar{x} \\right)}{ \\frac{1}{T} \\sum_{t=1}^{T} \\left( x_t - \\bar{x} \\right)^2} $$\n"
81 | ]
82 | },
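{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an illustrative aside (an addition, not part of the original tutorial flow), the sample formulas above can be checked numerically against `statsmodels`; the simulated series below is an arbitrary example:\n",
"\n",
"```python\n",
"import numpy as np\n",
"from statsmodels.tsa.stattools import acf\n",
"\n",
"rng = np.random.RandomState(0)\n",
"x = np.convolve(rng.randn(203), np.ones(4) / 4, mode='valid')  # a mildly autocorrelated series\n",
"T, xbar = len(x), x.mean()\n",
"\n",
"def sample_acvf(k):\n",
"    # note the divisor T (not T - k), as in the formula above\n",
"    return ((x[:T - k] - xbar) * (x[k:] - xbar)).sum() / T\n",
"\n",
"manual_acf = np.array([sample_acvf(k) / sample_acvf(0) for k in range(6)])\n",
"print('manual acf     :', manual_acf.round(3))\n",
"print('statsmodels acf:', acf(x, nlags=5).round(3))\n",
"```"
]
},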
83 | {
84 | "cell_type": "markdown",
85 | "metadata": {},
86 | "source": [
87 | "#### Notion of Stationarity:\n",
88 | "\n",
89 | "\n",
90 | "* A time series ${x_t}$ is said to be $\\textit{strictly stationary}$ if the joint distributions $F(x_{t_1}, \\dots, x_{t_n})$ and $F( x_{t_1+m}, \\dots, x_{t_n +m})$ are the same, $\\forall$ $t_1, ... t_n$ and $m$. This is a very strong condition, too strong to be applied in practice; it implies that the distribution is unchanged for any time shift!\n",
91 | "\n",
92 | "* A weaker and more practical stationarity condition is that of $\\textit{weakly stationary}$ (or $\\textit{second order stationarity}$). A time series $x_t$ is said to be $\\textit{weakly stationary}$ if it is mean and variance stationary and its autocovariance $Cov(x_t,x_{t+k})$ depends only the time displacement $k$ and can be written as $\\gamma(k)$. \n",
93 | "\n",
94 | "* Second order stationarity plays an important role in many of the time series models we will discuss in this tutorial; if a time series is second order stationary, then once a distribution assumption, such as normality, is imposed, the series can be completely characterized by its mean and covariance structure."
95 | ]
96 | },
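{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the idea concrete, here is a small simulation sketch (an addition; the parameter values are arbitrary): an AR(1) with $|\\phi| < 1$ is weakly stationary and keeps returning to its mean, whereas a random walk is not stationary.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import matplotlib.pylab as plt\n",
"\n",
"rng = np.random.RandomState(0)\n",
"n, phi = 500, 0.7\n",
"eps = rng.randn(n)\n",
"\n",
"ar1 = np.zeros(n)\n",
"for t in range(1, n):\n",
"    ar1[t] = phi * ar1[t - 1] + eps[t]   # weakly stationary\n",
"rw = eps.cumsum()                        # random walk: not stationary\n",
"\n",
"fig, axes = plt.subplots(nrows=2, figsize=(10, 6), sharex=True)\n",
"axes[0].plot(ar1)\n",
"axes[0].set_title('AR(1) with phi = 0.7 (stationary)')\n",
"axes[1].plot(rw)\n",
"axes[1].set_title('Random walk (non-stationary)')\n",
"fig.tight_layout();\n",
"```"
]
},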
97 | {
98 | "cell_type": "markdown",
99 | "metadata": {},
100 | "source": [
101 | "### 2.2 Exploratory Time Series Data Analysis\n",
102 | "\n",
103 | "* Now that we introduce the essential concepts for characterizing the probability structure of a time series, we will proceed to \"$\\textit{explore}$\" these characteristics empirically.\n",
104 | "\n",
105 | "* Specifically, we will use *time series plot, histogram (and its variants), plot of sample autocorrelation, and plot of sample partial autocorrelation}* to examine a given time series. \n",
106 | "\n",
107 | "* These visuals play a very crucial role in the $\\textit{Box-Jenkins approach}$ to ARIMA modeling."
108 | ]
109 | },
110 | {
111 | "cell_type": "code",
112 | "execution_count": null,
113 | "metadata": {
114 | "collapsed": false
115 | },
116 | "outputs": [],
117 | "source": [
118 | "# If you get an error in reading pandas-datareader run the following\n",
119 | "# !conda install pandas-datareader -y"
120 | ]
121 | },
122 | {
123 | "cell_type": "code",
124 | "execution_count": null,
125 | "metadata": {
126 | "collapsed": true
127 | },
128 | "outputs": [],
129 | "source": [
130 | "%load_ext autoreload\n",
131 | "%autoreload 2\n",
132 | "%matplotlib inline\n",
133 | "%config InlineBackend.figure_format='retina'\n",
134 | "\n",
135 | "from __future__ import absolute_import, division, print_function\n",
136 | "\n",
137 | "import sys\n",
138 | "import os\n",
139 | "\n",
140 | "import pandas as pd\n",
141 | "import numpy as np\n",
142 | "\n",
143 | "# # Remote Data Access\n",
144 | "# import pandas_datareader.data as web\n",
145 | "# import datetime\n",
146 | "# # reference: https://pandas-datareader.readthedocs.io/en/latest/remote_data.html\n",
147 | "\n",
148 | "# TSA from Statsmodels\n",
149 | "import statsmodels.api as sm\n",
150 | "import statsmodels.formula.api as smf\n",
151 | "import statsmodels.tsa.api as smt\n",
152 | "\n",
153 | "# Display and Plotting\n",
154 | "import matplotlib.pylab as plt\n",
155 | "import seaborn as sns\n",
156 | "\n",
157 | "pd.set_option('display.float_format', lambda x: '%.5f' % x) # pandas\n",
158 | "np.set_printoptions(precision=5, suppress=True) # numpy\n",
159 | "\n",
160 | "pd.set_option('display.max_columns', 100)\n",
161 | "pd.set_option('display.max_rows', 100)\n",
162 | "\n",
163 | "# seaborn plotting style\n",
164 | "sns.set(style='ticks', context='poster')"
165 | ]
166 | },
167 | {
168 | "cell_type": "code",
169 | "execution_count": null,
170 | "metadata": {
171 | "collapsed": false
172 | },
173 | "outputs": [],
174 | "source": [
175 | "# Load data from the internet using the Remote Data Reader\n",
176 | "# This is a very useful function, as it allows one to access a lot of time series and (non-time series) data publicly\n",
177 | "# available on the internet\n",
178 | "\n",
179 | "# start = pd.Timestamp('2000-01-01')\n",
180 | "# end = pd.Timestamp('2016-07-31')\n",
181 | "\n",
182 | "# C = web.DataReader(\"C\", 'yahoo', start, end)\n",
183 | "# Sentiment= web.DataReader(\"UMCSENT\", 'fred', start, end)\n",
184 | "# T10yr = web.DataReader(\"^TNX\", 'yahoo', start, end)\n",
185 | "\n",
186 | "# Save the DataFrame to a csv file\n",
187 | "# Sentiment.to_csv('data/sentiment.csv')\n",
188 | "# C.to_csv('data/citi.csv')\n",
189 | "# T10yr.to_csv('data/T10yr.csv')"
190 | ]
191 | },
192 | {
193 | "cell_type": "code",
194 | "execution_count": null,
195 | "metadata": {
196 | "collapsed": false
197 | },
198 | "outputs": [],
199 | "source": [
200 | "#Read the data\n",
201 | "\n",
202 | "Sentiment = 'data/sentiment.csv'\n",
203 | "Sentiment = pd.read_csv(Sentiment, index_col=0, parse_dates=[0])\n",
204 | "\n",
205 | "C = 'data/citi.csv'\n",
206 | "C = pd.read_csv(C, index_col=0, parse_dates=[0])\n",
207 | "\n",
208 | "T10yr = 'data/T10yr.csv'\n",
209 | "T10yr = pd.read_csv(T10yr, index_col=0, parse_dates=[0])"
210 | ]
211 | },
212 | {
213 | "cell_type": "code",
214 | "execution_count": null,
215 | "metadata": {
216 | "collapsed": false
217 | },
218 | "outputs": [],
219 | "source": [
220 | "print(\"Citigroup's stock price:\", \"\\n\", C.dtypes, \"\\n\")\n",
221 | "print(\"10 Year Treasury Bond Rate:\", \"\\n\", T10yr.dtypes, \"\\n\")\n",
222 | "print(\"University of Michigan: Consumer Sentiment:\", \"\\n\", Sentiment.dtypes)"
223 | ]
224 | },
225 | {
226 | "cell_type": "code",
227 | "execution_count": null,
228 | "metadata": {
229 | "collapsed": false
230 | },
231 | "outputs": [],
232 | "source": [
233 | "T10yr.head()"
234 | ]
235 | },
236 | {
237 | "cell_type": "code",
238 | "execution_count": null,
239 | "metadata": {
240 | "collapsed": false
241 | },
242 | "outputs": [],
243 | "source": [
244 | "C['Close'].head(10)"
245 | ]
246 | },
247 | {
248 | "cell_type": "code",
249 | "execution_count": null,
250 | "metadata": {
251 | "collapsed": false
252 | },
253 | "outputs": [],
254 | "source": [
255 | "Sentiment.head()"
256 | ]
257 | },
258 | {
259 | "cell_type": "code",
260 | "execution_count": null,
261 | "metadata": {
262 | "collapsed": false
263 | },
264 | "outputs": [],
265 | "source": [
266 | "C.close = C['Close']\n",
267 | "T10yr.close = T10yr['Close']"
268 | ]
269 | },
270 | {
271 | "cell_type": "markdown",
272 | "metadata": {},
273 | "source": [
274 | "### Plots to used in Exploratory Time Series Analysis\n",
275 | "\n",
276 | "* Time series plot: to visualize the dynamic and evolution of the series\n",
277 | "* Histogram or NP Density: to visualize the distribution \n",
278 | "* Sample ACF and PACF graphs: to examine autocorrelation and partial autocorrelation\n",
279 | "* Scatterplot matrix on lags: an alternative way to visualize autocorrelation of the series"
280 | ]
281 | },
282 | {
283 | "cell_type": "code",
284 | "execution_count": null,
285 | "metadata": {
286 | "collapsed": false
287 | },
288 | "outputs": [],
289 | "source": [
290 | "Sentiment.head()"
291 | ]
292 | },
293 | {
294 | "cell_type": "code",
295 | "execution_count": null,
296 | "metadata": {
297 | "collapsed": false
298 | },
299 | "outputs": [],
300 | "source": [
301 | "# Select the series from 2005 - 2016\n",
302 | "sentiment_short = Sentiment.ix['2005':'2016']"
303 | ]
304 | },
305 | {
306 | "cell_type": "code",
307 | "execution_count": null,
308 | "metadata": {
309 | "collapsed": false
310 | },
311 | "outputs": [],
312 | "source": [
313 | "sentiment_short.index[:5]"
314 | ]
315 | },
316 | {
317 | "cell_type": "code",
318 | "execution_count": null,
319 | "metadata": {
320 | "collapsed": false
321 | },
322 | "outputs": [],
323 | "source": [
324 | "print(sentiment_short.dtypes)"
325 | ]
326 | },
327 | {
328 | "cell_type": "code",
329 | "execution_count": null,
330 | "metadata": {
331 | "collapsed": false
332 | },
333 | "outputs": [],
334 | "source": [
335 | "sentiment_short.plot(figsize=(12,8))\n",
336 | "plt.legend(bbox_to_anchor=(1.25, 0.5))\n",
337 | "plt.title(\"Consumer Sentiment\")\n",
338 | "sns.despine()"
339 | ]
340 | },
341 | {
342 | "cell_type": "code",
343 | "execution_count": null,
344 | "metadata": {
345 | "collapsed": false
346 | },
347 | "outputs": [],
348 | "source": [
349 | "fig = plt.figure(figsize=(12,8))\n",
350 | "\n",
351 | "ax1 = fig.add_subplot(211)\n",
352 | "fig = sm.graphics.tsa.plot_acf(sentiment_short, lags=20, ax=ax1)\n",
353 | "ax1.xaxis.set_ticks_position('bottom')\n",
354 | "fig.tight_layout();\n",
355 | "\n",
356 | "ax2 = fig.add_subplot(212)\n",
357 | "fig = sm.graphics.tsa.plot_pacf(sentiment_short, lags=20, ax=ax2)\n",
358 | "ax2.xaxis.set_ticks_position('bottom')\n",
359 | "fig.tight_layout();"
360 | ]
361 | },
362 | {
363 | "cell_type": "code",
364 | "execution_count": null,
365 | "metadata": {
366 | "collapsed": false
367 | },
368 | "outputs": [],
369 | "source": [
370 | "# Scatterplot matrix is another way to visualize the autocorrelation\n",
371 | "# Its advantage is that it is very intuitive, as scatterplot (i.e. one of the plots in a scatterplot matrix) \n",
372 | "# is used often in practice\n",
373 | "\n",
374 | "lags=9\n",
375 | "\n",
376 | "ncols=3\n",
377 | "nrows=int(np.ceil(lags/ncols))\n",
378 | "\n",
379 | "fig, axes = plt.subplots(ncols=ncols, nrows=nrows, figsize=(4*ncols, 4*nrows))\n",
380 | "\n",
381 | "for ax, lag in zip(axes.flat, np.arange(1,lags+1, 1)):\n",
382 | " lag_str = 't-{}'.format(lag)\n",
383 | " X = (pd.concat([sentiment_short, sentiment_short.shift(-lag)], axis=1,\n",
384 | " keys=['y'] + [lag_str]).dropna())\n",
385 | "\n",
386 | " X.plot(ax=ax, kind='scatter', y='y', x=lag_str);\n",
387 | "    corr = X.corr().values[0][1]\n",
388 | " ax.set_ylabel('Original')\n",
389 | " ax.set_title('Lag: {} (corr={:.2f})'.format(lag_str, corr));\n",
390 | " ax.set_aspect('equal');\n",
391 | " sns.despine();\n",
392 | "\n",
393 | "fig.tight_layout();"
394 | ]
395 | },
396 | {
397 | "cell_type": "code",
398 | "execution_count": null,
399 | "metadata": {
400 | "collapsed": true
401 | },
402 | "outputs": [],
403 | "source": [
404 | "# Or, we can plot the four essential plots all at once:\n",
405 | "\n",
406 | "def tsplot(y, lags=None, title='', figsize=(14, 8)):\n",
407 | " '''Examine the patterns of ACF and PACF, along with the time series plot and histogram.\n",
408 | " \n",
409 | " Original source: https://tomaugspurger.github.io/modern-7-timeseries.html\n",
410 | " '''\n",
411 | " fig = plt.figure(figsize=figsize)\n",
412 | " layout = (2, 2)\n",
413 | " ts_ax = plt.subplot2grid(layout, (0, 0))\n",
414 | " hist_ax = plt.subplot2grid(layout, (0, 1))\n",
415 | " acf_ax = plt.subplot2grid(layout, (1, 0))\n",
416 | " pacf_ax = plt.subplot2grid(layout, (1, 1))\n",
417 | " \n",
418 | " y.plot(ax=ts_ax)\n",
419 | " ts_ax.set_title(title)\n",
420 | " y.plot(ax=hist_ax, kind='hist', bins=25)\n",
421 | " hist_ax.set_title('Histogram')\n",
422 | " smt.graphics.plot_acf(y, lags=lags, ax=acf_ax)\n",
423 | " smt.graphics.plot_pacf(y, lags=lags, ax=pacf_ax)\n",
424 | " [ax.set_xlim(0) for ax in [acf_ax, pacf_ax]]\n",
425 | " sns.despine()\n",
426 | " plt.tight_layout()\n",
427 | " return ts_ax, acf_ax, pacf_ax"
428 | ]
429 | },
430 | {
431 | "cell_type": "code",
432 | "execution_count": null,
433 | "metadata": {
434 | "collapsed": false
435 | },
436 | "outputs": [],
437 | "source": [
438 | "tsplot(sentiment_short, title='Consumer Sentiment', lags=36);"
439 | ]
440 | },
441 | {
442 | "cell_type": "markdown",
443 | "metadata": {},
444 | "source": [
445 | "### 2.3 Mathematical formulation of ARIMA models\n",
446 | "\n",
447 | "* A time series ${z_t}$ follows an ARIMA$(p,d,q)$ process if the $d^{th}$ differences of the ${z_t}$ series is an ARMA($p,q$) process. Using lag operator, it can expressed as \n",
448 | "\n",
449 | "$$\\begin{equation}\n",
450 | " \\phi_p(B)(1-B)^d z_t = \\theta_q(B) \\omega_t\n",
451 | "\\end{equation}$$\n",
452 | "\n",
453 | "where $\\phi_p$ and $\\theta_q$ are polynomials of orders $p$ and $q$.\n",
454 | "\n",
455 | "* Writing an ARIMA$(p,d,q)$ may seem too abstract, and whenever a model is presented this way, you may get a feel of the model by making simple cases, such as a low order ARIMA$(p,d,q)$ model. \n",
456 | "\n",
457 | "\n",
458 | "\n",
459 | "* Below show two such examples to unpack some of these notations:\n",
460 | "\n",
461 | "$\\textbf{Example 1:}$\n",
462 | "Consider the model $ z_t = z_{t-1} + \\omega_t + \\theta \\omega_{t-1}$. Re-write this model using lag (or backward shift) operator. By now, we should be familiar with this kind of manipulation:\n",
463 | "\n",
464 | "$$\\begin{align}\n",
465 | " z_t &= z_{t-1} + \\omega_t + \\theta \\omega_{t-1} \\\\\n",
466 | " z_t - z_{t-1} &= \\omega_t + \\theta \\omega_{t-1} \\\\\n",
467 | " (1-B)z_t &= (1+\\theta B)\\omega_t\n",
468 | "\\end{align}$$\n",
469 | "\n",
470 | "where $B$ is a lag operator that when applying to $z_t$, gives $z_{t-1}$. That is, $Bz_t = z_{t-1}$.\n",
471 | "\n",
472 | "* This becomes an ARIMA(0,1,1) model, or $\\textit{integrated moving average}$ model (IMA(1,1)).\n",
473 | "\n",
474 | "$\\textbf{Example 2:}$\n",
475 | "Consider a model of the form\n",
476 | "\n",
477 | "$$\\begin{equation}\n",
478 | " z_t = \\phi z_{t-1} + z_{t-1} - \\phi z_{t-2} + \\omega_t\n",
479 | "\\end{equation}$$\n",
480 | "\n",
481 | "* Rewrite the equation, re-arrange terms, and factorize them:\n",
482 | "\n",
483 | "$$\\begin{align}\n",
484 | " z_t - z_{t-1} &= \\phi (z_{t-1} - z_{t-2}) + \\omega_t \\\\\n",
485 | " (z_t - z_{t-1}) - \\phi (z_{t-1} - z_{t-2}) &= \\omega_t \\\\\n",
486 | " (1 - \\phi B)(z_t - z_{t-1}) &= \\omega_t \\\\\n",
487 | " (1 - \\phi B) \\bigtriangledown z_t &= \\omega_t \\\\\n",
488 | " (1 - \\phi B)(1 - B)z_t &= \\omega_t\n",
489 | "\\end{align}$$\n",
490 | "\n",
491 | "The model can be re-written as $(1 - \\phi B) \\bigtriangledown y_t = \\omega_t$, which is an ARIMA(1,1,0) model.\n",
492 | "\n",
493 | "**Sidenotes**\n",
494 | "\n",
495 | "A series ${z_t}$ is $\\textit{integrated}$ of order $d$, denoted as $I(d)$, if the $d^{th}$ differences of ${z_t}$ is a white noise: $\\bigtriangledown^d y_t = \\omega_t$, where $\\bigtriangledown^d \\equiv (1-B)^d$:\n",
496 | "\n",
497 | "$$\\begin{equation}\n",
498 | "    (1-B)^d z_t = \\omega_t\n",
499 | "\\end{equation}$$\n",
500 | "\n",
501 | "As such, random walk is the special case I(1).\n",
502 | "\n",
503 | "* In practice, I(0) and I(1) cases find themselves having the most applications.\n",
504 | "\n"
505 | ]
506 | },
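{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make Example 1 concrete, here is a short simulation sketch (an addition; $\\theta = 0.6$ is an arbitrary choice). The level series wanders, while its first difference behaves like a stationary MA(1) process:\n",
"\n",
"```python\n",
"import numpy as np\n",
"import matplotlib.pylab as plt\n",
"\n",
"rng = np.random.RandomState(42)\n",
"n, theta = 200, 0.6\n",
"w = rng.randn(n)\n",
"\n",
"z = np.zeros(n)\n",
"for t in range(1, n):\n",
"    z[t] = z[t - 1] + w[t] + theta * w[t - 1]   # the ARIMA(0,1,1) recursion from Example 1\n",
"\n",
"fig, axes = plt.subplots(nrows=2, figsize=(10, 6))\n",
"axes[0].plot(z)\n",
"axes[0].set_title('Simulated IMA(1,1) level')\n",
"axes[1].plot(np.diff(z))\n",
"axes[1].set_title('First difference: an MA(1) process')\n",
"fig.tight_layout();\n",
"```"
]
},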
507 | {
508 | "cell_type": "markdown",
509 | "metadata": {},
510 | "source": [
511 | "### 2.4 An Overview of the Box-Jenkins Approach to Non-Seasonal ARIMA Modeling\n",
512 | "\n",
513 | "1. Assess the stationarity of the process $z_t$\n",
514 | "2. If the process is not stationary, difference it (i.e. create an integrated model) as many times as needed to produced a stationary process to be modeled using the $\\textit{mixed autoregressive-moving average process}$ described above.\n",
515 | "3. Identify (i.e. determining the order of the process) the resulting the ARMA model.\n",
516 | " * The sample autocorrelation and sample partial autocorrelation functions are tools used in step $1$ and $2$.\n",
517 | "\n",
518 | "In practice, other steps are necessary in order to produce a functionable model. These steps include:\n",
519 | "- Model diagnostic checking\n",
520 | "- Re-specification of the model if one or more of the underlying statistical assumptions is not satisfied\n",
521 | "- Model selection\n",
522 | "- Perform statistical inference and/or forecasting\n",
523 | "- Forecast evaluation"
524 | ]
525 | },
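{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a complement to the visual tools (this is an addition to the original notes), the stationarity assessment in step 1 can also be informed by a unit-root test such as the augmented Dickey-Fuller test. A minimal sketch, applied to the consumer sentiment series loaded above and to its first difference:\n",
"\n",
"```python\n",
"from statsmodels.tsa.stattools import adfuller\n",
"\n",
"def adf_report(series, label):\n",
"    stat, pvalue = adfuller(series)[:2]\n",
"    print('{:>16s}: ADF statistic = {:6.3f}, p-value = {:.3f}'.format(label, stat, pvalue))\n",
"\n",
"adf_report(sentiment_short.iloc[:, 0], 'level')\n",
"adf_report(sentiment_short.iloc[:, 0].diff().dropna(), 'first difference')\n",
"```\n",
"\n",
"A small p-value argues against a unit root (i.e. in favor of stationarity); a large p-value suggests differencing before fitting an ARMA model."
]
},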
526 | {
527 | "cell_type": "markdown",
528 | "metadata": {},
529 | "source": [
530 | "\n",
531 | "**Exercise 2:**\n",
532 | "\n",
533 | "Let use *series1.csv* and conduct the exploratory data analysis \n",
534 | "
"
535 | ]
536 | },
537 | {
538 | "cell_type": "code",
539 | "execution_count": null,
540 | "metadata": {
541 | "collapsed": true
542 | },
543 | "outputs": [],
544 | "source": [
545 | "# Step 1: Import the csv file containing the series for the analysis\n",
546 | "filename_ts = 'data/series1.csv'\n",
547 | "ts_df = pd.read_csv(filename_ts, index_col=0, parse_dates=[0])"
548 | ]
549 | },
550 | {
551 | "cell_type": "code",
552 | "execution_count": null,
553 | "metadata": {
554 | "collapsed": false
555 | },
556 | "outputs": [],
557 | "source": [
558 | "# Step 2: Explore the patterns of the time series and its autocorrelation and partial autocorrelation structure\n",
559 | "\n",
560 | "# Choose the number of lags to display the sample ACF and PACF\n",
561 | "n_lag=25\n",
562 | "graph_title=\"Series 1\"\n",
563 | "\n",
564 | "# Make sure the tsplot() function is defined before running the following command\n",
565 | "tsplot(ts_df, title=graph_title, lags=n_lag);"
566 | ]
567 | },
568 | {
569 | "cell_type": "markdown",
570 | "metadata": {},
571 | "source": [
572 | "** Step 3**\n",
573 | "Type your observations here and discuss with your neighbors.\n",
574 | "* Are there any trend, seasonality, cycles?\n",
575 | "* What are pattern of the ACF? Does it decline exponentially or dampen towards zero? Does it have a sharp cut-off?\n",
576 | "* What about the PACF?"
577 | ]
578 | },
579 | {
580 | "cell_type": "code",
581 | "execution_count": null,
582 | "metadata": {
583 | "collapsed": true
584 | },
585 | "outputs": [],
586 | "source": []
587 | }
588 | ],
589 | "metadata": {
590 | "kernelspec": {
591 | "display_name": "Python 3",
592 | "language": "python",
593 | "name": "python3"
594 | },
595 | "language_info": {
596 | "codemirror_mode": {
597 | "name": "ipython",
598 | "version": 3
599 | },
600 | "file_extension": ".py",
601 | "mimetype": "text/x-python",
602 | "name": "python",
603 | "nbconvert_exporter": "python",
604 | "pygments_lexer": "ipython3",
605 | "version": "3.5.2"
606 | }
607 | },
608 | "nbformat": 4,
609 | "nbformat_minor": 0
610 | }
611 |
--------------------------------------------------------------------------------
/Section_3_ARIMA_Modeling_tutorial.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "
"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# PyData San Francisco 2016\n",
15 | "## Applied Time Series Econometrics in Python (and R) Tutorial\n",
16 | "### Section 3: ARIMAX Models"
17 | ]
18 | },
19 | {
20 | "cell_type": "markdown",
21 | "metadata": {},
22 | "source": [
23 | "### Topics in this section include \n",
24 | "\n",
25 | "\n",
26 | " - 3.1 Model Estimation and Identification\n",
27 | " - 3.2 Model Diagnostic Checking\n",
28 | " * Define the stationary and invertible conditions for $ARIMA(p,d,q)$ models\n",
29 | " - 3.3 Model performance evaluation (in-sample fit)\n",
30 | " - 3.4 Forecasting and forecast evaluation \n",
31 | " - 3.5 A few words on adding explanatory variables, its use cases, and its practical suggestions"
32 | ]
33 | },
34 | {
35 | "cell_type": "code",
36 | "execution_count": null,
37 | "metadata": {
38 | "collapsed": false
39 | },
40 | "outputs": [],
41 | "source": [
42 | "%load_ext autoreload\n",
43 | "%autoreload 2\n",
44 | "%matplotlib inline\n",
45 | "%config InlineBackend.figure_format='retina'\n",
46 | "\n",
47 | "from __future__ import absolute_import, division, print_function\n",
48 | "\n",
49 | "import sys\n",
50 | "import os\n",
51 | "\n",
52 | "import pandas as pd\n",
53 | "import numpy as np\n",
54 | "\n",
55 | "# TSA from Statsmodels\n",
56 | "import statsmodels.api as sm\n",
57 | "import statsmodels.formula.api as smf\n",
58 | "import statsmodels.tsa.api as smt\n",
59 | "\n",
60 | "# Display and Plotting\n",
61 | "import matplotlib.pylab as plt\n",
62 | "import seaborn as sns\n",
63 | "\n",
64 | "pd.set_option('display.float_format', lambda x: '%.5f' % x) # pandas\n",
65 | "np.set_printoptions(precision=5, suppress=True) # numpy\n",
66 | "\n",
67 | "pd.set_option('display.max_columns', 100)\n",
68 | "pd.set_option('display.max_rows', 100)\n",
69 | "\n",
70 | "# seaborn plotting style\n",
71 | "sns.set(style='ticks', context='poster')"
72 | ]
73 | },
74 | {
75 | "cell_type": "markdown",
76 | "metadata": {},
77 | "source": [
78 | "\n",
79 | "\n",
80 | "** Read a series stored in a csv file. ** This is the same series we used in *Exercise 2*.\n",
81 | "
"
82 | ]
83 | },
84 | {
85 | "cell_type": "code",
86 | "execution_count": null,
87 | "metadata": {
88 | "collapsed": false
89 | },
90 | "outputs": [],
91 | "source": [
92 | "# Import the csv file containing the series for the analysis\n",
93 | "# This is the file we just analyzed in Exercise 2\n",
94 | "\n",
95 | "filename_ts = 'data/series1.csv'\n",
96 | "ts_df = pd.read_csv(filename_ts, index_col=0, parse_dates=[0])\n",
97 | "\n",
98 | "n_sample = ts_df.shape[0]"
99 | ]
100 | },
101 | {
102 | "cell_type": "code",
103 | "execution_count": null,
104 | "metadata": {
105 | "collapsed": false
106 | },
107 | "outputs": [],
108 | "source": [
109 | "print(ts_df.shape)\n",
110 | "print(ts_df.head())"
111 | ]
112 | },
113 | {
114 | "cell_type": "markdown",
115 | "metadata": {},
116 | "source": [
117 | "### 3.1 Model Identification (ARIMA Model Determination)\n",
118 | "\n",
119 | "1. Determine the *degree of differencing*, $d$\n",
120 | "\n",
121 | "2. Study the patterns of the ACF and PACF of the appropriately differenced series: $\\omega_t = (1-B)^d z_t$, as these autocorrelation functions will provide indication for the choice of the order of autoregressive and the moving average components. While we did not have enough time in this tutorial, it is very beneficial to study the *theoretical* ACF and PACF of the autoregressive, moving average, and the mixed autoregressive and moving average processes.\n",
122 | "\n",
123 | "3. The table below summarize the patterns of the ACF and PACF associated with the $AR(p)$, $MA(q)$, and $ARMA(p,q)$ processes:\n",
124 | "\n",
125 | "| Process | ACF | PACF |\n",
126 | "|---------------|:--------------------:|:--------------------:|\n",
127 | "| **AR(p)** | tails off | cutoff after lag $p$ |\n",
128 | "| **MA(q)** | cutoff after lag $q$ | tails off |\n",
129 | "| **ARMA(p,q)** | tails off | tails off |\n",
130 | "\n",
131 | "4. In general, the ACF of an autoregressive process is similar to the PACF of a moving average process, and vice versa.\n",
132 | "5. Keep in mind that these are theoretical properties. In practice, the estimated sample ACF and PACF can come with large variances, deviating from the underlying theoretical behavior. As such, it is prudent to recognize that these are but broad characteristics, and it is quite possible that several candidate models are narrowed down and will need to be investigaged further in the later stage of the modeling process."
133 | ]
134 | },
135 | {
136 | "cell_type": "code",
137 | "execution_count": null,
138 | "metadata": {
139 | "collapsed": false
140 | },
141 | "outputs": [],
142 | "source": [
143 | "# Create a training sample and testing sample before analyzing the series\n",
144 | "\n",
145 | "n_train=int(0.95*n_sample)+1\n",
146 | "n_forecast=n_sample-n_train\n",
147 | "#ts_df\n",
148 | "ts_train = ts_df.iloc[:n_train]['value']\n",
149 | "ts_test = ts_df.iloc[n_train:]['value']\n",
150 | "print(ts_train.shape)\n",
151 | "print(ts_test.shape)\n",
152 | "print(\"Training Series:\", \"\\n\", ts_train.tail(), \"\\n\")\n",
153 | "print(\"Testing Series:\", \"\\n\", ts_test.head())"
154 | ]
155 | },
156 | {
157 | "cell_type": "code",
158 | "execution_count": null,
159 | "metadata": {
160 | "collapsed": true
161 | },
162 | "outputs": [],
163 | "source": [
164 | "def tsplot(y, lags=None, title='', figsize=(14, 8)):\n",
165 | " '''Examine the patterns of ACF and PACF, along with the time series plot and histogram.\n",
166 | " \n",
167 | " Source: https://tomaugspurger.github.io/modern-7-timeseries.html\n",
168 | " '''\n",
169 | " fig = plt.figure(figsize=figsize)\n",
170 | " layout = (2, 2)\n",
171 | " ts_ax = plt.subplot2grid(layout, (0, 0))\n",
172 | " hist_ax = plt.subplot2grid(layout, (0, 1))\n",
173 | " acf_ax = plt.subplot2grid(layout, (1, 0))\n",
174 | " pacf_ax = plt.subplot2grid(layout, (1, 1))\n",
175 | " \n",
176 | " y.plot(ax=ts_ax)\n",
177 | " ts_ax.set_title(title)\n",
178 | " y.plot(ax=hist_ax, kind='hist', bins=25)\n",
179 | " hist_ax.set_title('Histogram')\n",
180 | " smt.graphics.plot_acf(y, lags=lags, ax=acf_ax)\n",
181 | " smt.graphics.plot_pacf(y, lags=lags, ax=pacf_ax)\n",
182 | " [ax.set_xlim(0) for ax in [acf_ax, pacf_ax]]\n",
183 | " sns.despine()\n",
184 | " fig.tight_layout()\n",
185 | " return ts_ax, acf_ax, pacf_ax"
186 | ]
187 | },
188 | {
189 | "cell_type": "code",
190 | "execution_count": null,
191 | "metadata": {
192 | "collapsed": false
193 | },
194 | "outputs": [],
195 | "source": [
196 | "tsplot(ts_train, title='A Given Training Series', lags=20);"
197 | ]
198 | },
199 | {
200 | "cell_type": "markdown",
201 | "metadata": {},
202 | "source": [
203 | "** Observations from the sample ACF and sample PACF (based on first 20 lags) **\n",
204 | "\n",
205 | "- The sample autocorrelation gradually tails off.\n",
206 | "- The sample partial autocorrelation does not exactly cut off at some lag $p$ but does not exactly tail off either.\n",
207 | "- Based on these observations, we could attempt an ARIMA(2,0,0) model as a starting point, although other orders could serve as candidates as well."
208 | ]
209 | },
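{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# A complementary numerical check (sketch; assumes the ts_train series defined above).\n",
"# Print the first few sample ACF/PACF values and run an Augmented Dickey-Fuller test\n",
"# to help confirm the degree of differencing d suggested by the plots.\n",
"from statsmodels.tsa.stattools import acf, pacf, adfuller\n",
"\n",
"acf_vals = acf(ts_train, nlags=10)\n",
"pacf_vals = pacf(ts_train, nlags=10)\n",
"print('Sample ACF  (lags 0-10):', np.round(acf_vals, 3))\n",
"print('Sample PACF (lags 0-10):', np.round(pacf_vals, 3))\n",
"\n",
"adf_stat, adf_p = adfuller(ts_train)[:2]\n",
"print('ADF statistic: {:.3f}, p-value: {:.3f}'.format(adf_stat, adf_p))\n",
"# A small p-value is evidence against a unit root, suggesting d=0 may be adequate here."
]
},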
210 | {
211 | "cell_type": "code",
212 | "execution_count": null,
213 | "metadata": {
214 | "collapsed": true
215 | },
216 | "outputs": [],
217 | "source": [
218 | "# Up until this point in the tutorial, statsmodels 0.6.1 is fine.\n",
219 | "# From here on, we need an updated version of statsmodels 0.8.0rc1"
220 | ]
221 | },
222 | {
223 | "cell_type": "code",
224 | "execution_count": null,
225 | "metadata": {
226 | "collapsed": false
227 | },
228 | "outputs": [],
229 | "source": [
230 | "# Uncomment to install\n",
231 | "# !pip install --pre statsmodels --upgrade"
232 | ]
233 | },
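{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Quick version check (sketch): the diagnostics and prediction calls below assume\n",
"# statsmodels 0.8.0rc1 or later.\n",
"import statsmodels\n",
"print(statsmodels.__version__)"
]
},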
234 | {
235 | "cell_type": "code",
236 | "execution_count": null,
237 | "metadata": {
238 | "collapsed": false
239 | },
240 | "outputs": [],
241 | "source": [
242 | "#Model Estimation\n",
243 | "\n",
244 | "# Fit the model\n",
245 | "arima200 = sm.tsa.SARIMAX(ts_train, order=(2,0,0))\n",
246 | "model_results = arima200.fit()\n",
247 | "model_results.summary()"
248 | ]
249 | },
250 | {
251 | "cell_type": "markdown",
252 | "metadata": {
253 | "collapsed": true
254 | },
255 | "source": [
256 | "#### Digression:\n",
257 | "\n",
258 | "* In practice, one could *search* over a few models using the visual clues above as a starting point. \n",
259 | "* The code below gives one such example"
260 | ]
261 | },
262 | {
263 | "cell_type": "code",
264 | "execution_count": null,
265 | "metadata": {
266 | "collapsed": false
267 | },
268 | "outputs": [],
269 | "source": [
270 | "import itertools\n",
271 | "\n",
272 | "p_min = 0\n",
273 | "d_min = 0\n",
274 | "q_min = 0\n",
275 | "p_max = 4\n",
276 | "d_max = 0\n",
277 | "q_max = 4\n",
278 | "\n",
279 | "# Initialize a DataFrame to store the results\n",
280 | "results_bic = pd.DataFrame(index=['AR{}'.format(i) for i in range(p_min,p_max+1)],\n",
281 | " columns=['MA{}'.format(i) for i in range(q_min,q_max+1)])\n",
282 | "\n",
283 | "for p,d,q in itertools.product(range(p_min,p_max+1),\n",
284 | " range(d_min,d_max+1),\n",
285 | " range(q_min,q_max+1)):\n",
286 | " if p==0 and d==0 and q==0:\n",
287 | " results_bic.loc['AR{}'.format(p), 'MA{}'.format(q)] = np.nan\n",
288 | " continue\n",
289 | " \n",
290 | " try:\n",
291 | " model = sm.tsa.SARIMAX(ts_train, order=(p, d, q),\n",
292 | " #enforce_stationarity=False,\n",
293 | " #enforce_invertibility=False,\n",
294 | " )\n",
295 | " results = model.fit()\n",
296 | " results_bic.loc['AR{}'.format(p), 'MA{}'.format(q)] = results.bic\n",
297 | " except:\n",
298 | " continue\n",
299 | "results_bic = results_bic[results_bic.columns].astype(float)"
300 | ]
301 | },
302 | {
303 | "cell_type": "code",
304 | "execution_count": null,
305 | "metadata": {
306 | "collapsed": false
307 | },
308 | "outputs": [],
309 | "source": [
310 | "fig, ax = plt.subplots(figsize=(10, 8))\n",
311 | "ax = sns.heatmap(results_bic,\n",
312 | " mask=results_bic.isnull(),\n",
313 | " ax=ax,\n",
314 | " annot=True,\n",
315 | " fmt='.2f',\n",
316 | " );\n",
317 | "ax.set_title('BIC');"
318 | ]
319 | },
320 | {
321 | "cell_type": "code",
322 | "execution_count": null,
323 | "metadata": {
324 | "collapsed": false
325 | },
326 | "outputs": [],
327 | "source": [
328 | "# Alternative model selection method, limited to only searching AR and MA parameters\n",
329 | "\n",
330 | "train_results = sm.tsa.arma_order_select_ic(ts_train, ic=['aic', 'bic'], trend='nc', max_ar=4, max_ma=4)\n",
331 | "\n",
332 | "print('AIC', train_results.aic_min_order)\n",
333 | "print('BIC', train_results.bic_min_order)"
334 | ]
335 | },
336 | {
337 | "cell_type": "markdown",
338 | "metadata": {},
339 | "source": [
340 | "### 3.2 Model Diagnostic Checking\n",
341 | "\n",
342 | "* Conduct visual inspection of the residual plots\n",
343 | "* Residuals of a well-specified ARIMA model should mimic *Gaussian white noises*: the residuals should be uncorrelated and distributed approximated normally with mean zero and variance $n^{-1}$\n",
344 | "* Apparent patterns in the standardized residuals and the estimated ACF of the residuals give an indication that the model need to be re-specified\n",
345 | "* The *results.plot_diagnostics()* function conveniently produce several plots to facilitate the investigation.\n",
346 | "* The estimation results also come with some statistical tests"
347 | ]
348 | },
349 | {
350 | "cell_type": "code",
351 | "execution_count": null,
352 | "metadata": {
353 | "collapsed": false
354 | },
355 | "outputs": [],
356 | "source": [
357 | "# Residual Diagnostics\n",
358 | "# The plot_diagnostics function associated with the estimated result object produce a few plots that allow us \n",
359 | "# to examine the distribution and correlation of the estimated residuals\n",
360 | "\n",
361 | "model_results.plot_diagnostics(figsize=(16, 12));"
362 | ]
363 | },
364 | {
365 | "cell_type": "markdown",
366 | "metadata": {},
367 | "source": [
368 | "### 3.2.1 Formal testing\n",
369 | "\n",
370 | "** More information about the statistics under the parameters table, tests of standardized residuals **\n",
371 | "\n",
372 | "#### Test of heteroskedasticity\n",
373 | "- http://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAXResults.test_heteroskedasticity.html#statsmodels.tsa.statespace.sarimax.SARIMAXResults.test_heteroskedasticity\n",
374 | "\n",
375 | "#### Test of normality (Jarque-Bera)\n",
376 | "- http://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAXResults.test_normality.html#statsmodels.tsa.statespace.sarimax.SARIMAXResults.test_normality\n",
377 | "\n",
378 | "#### Test of serial correlation (Ljung-Box)\n",
379 | "- http://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAXResults.test_serial_correlation.html#statsmodels.tsa.statespace.sarimax.SARIMAXResults.test_serial_correlation"
380 | ]
381 | },
382 | {
383 | "cell_type": "code",
384 | "execution_count": null,
385 | "metadata": {
386 | "collapsed": false
387 | },
388 | "outputs": [],
389 | "source": [
390 | "# Re-run the above statistical tests, and more. To be used when selecting viable models.\n",
391 | "\n",
392 | "het_method='breakvar'\n",
393 | "norm_method='jarquebera'\n",
394 | "sercor_method='ljungbox'\n",
395 | "\n",
396 | "(het_stat, het_p) = model_results.test_heteroskedasticity(het_method)[0]\n",
397 | "norm_stat, norm_p, skew, kurtosis = model_results.test_normality(norm_method)[0]\n",
398 | "sercor_stat, sercor_p = model_results.test_serial_correlation(method=sercor_method)[0]\n",
399 | "sercor_stat = sercor_stat[-1] # last number for the largest lag\n",
400 | "sercor_p = sercor_p[-1] # last number for the largest lag\n",
401 | "\n",
402 | "# Run Durbin-Watson test on the standardized residuals.\n",
403 | "# The statistic is approximately equal to 2*(1-r), where r is the sample autocorrelation of the residuals.\n",
404 | "# Thus, for r == 0, indicating no serial correlation, the test statistic equals 2.\n",
405 | "# This statistic will always be between 0 and 4. The closer to 0 the statistic,\n",
406 | "# the more evidence for positive serial correlation. The closer to 4,\n",
407 | "# the more evidence for negative serial correlation.\n",
408 | "# Essentially, below 1 or above 3 is bad.\n",
409 | "dw = sm.stats.stattools.durbin_watson(model_results.filter_results.standardized_forecasts_error[0, model_results.loglikelihood_burn:])\n",
410 | "\n",
411 | "# check whether roots are outside the unit circle (we want them to be);\n",
412 | "# will be True when AR is not used (i.e., AR order = 0)\n",
413 | "arroots_outside_unit_circle = np.all(np.abs(model_results.arroots) > 1)\n",
414 | "# will be True when MA is not used (i.e., MA order = 0)\n",
415 | "maroots_outside_unit_circle = np.all(np.abs(model_results.maroots) > 1)\n",
416 | "\n",
417 | "print('Test heteroskedasticity of residuals ({}): stat={:.3f}, p={:.3f}'.format(het_method, het_stat, het_p));\n",
418 | "print('\\nTest normality of residuals ({}): stat={:.3f}, p={:.3f}'.format(norm_method, norm_stat, norm_p));\n",
419 | "print('\\nTest serial correlation of residuals ({}): stat={:.3f}, p={:.3f}'.format(sercor_method, sercor_stat, sercor_p));\n",
420 | "print('\\nDurbin-Watson test on residuals: d={:.2f}\\n\\t(NB: 2 means no serial correlation, 0=pos, 4=neg)'.format(dw))\n",
421 | "print('\\nTest for all AR roots outside unit circle (>1): {}'.format(arroots_outside_unit_circle))\n",
422 | "print('\\nTest for all MA roots outside unit circle (>1): {}'.format(maroots_outside_unit_circle))\n"
423 | ]
424 | },
425 | {
426 | "cell_type": "markdown",
427 | "metadata": {},
428 | "source": [
429 | "### 3.3 Model performance evaluation (in-sample fit)"
430 | ]
431 | },
432 | {
433 | "cell_type": "code",
434 | "execution_count": null,
435 | "metadata": {
436 | "collapsed": false
437 | },
438 | "outputs": [],
439 | "source": [
440 | "fig, ax1 = plt.subplots(nrows=1, ncols=1, figsize=(12, 8))\n",
441 | " \n",
442 | "ax1.plot(ts_train, label='In-sample data', linestyle='-')\n",
443 | "# subtract 1 only to connect it to previous point in the graph\n",
444 | "ax1.plot(ts_test, label='Held-out data', linestyle='--')\n",
445 | "\n",
446 | "# yes DatetimeIndex\n",
447 | "pred_begin = ts_train.index[model_results.loglikelihood_burn]\n",
448 | "pred_end = ts_test.index[-1]\n",
449 | "pred = model_results.get_prediction(start=pred_begin.strftime('%Y-%m-%d'),\n",
450 | " end=pred_end.strftime('%Y-%m-%d'))\n",
451 | "pred_mean = pred.predicted_mean\n",
452 | "pred_ci = pred.conf_int(alpha=0.05)\n",
453 | "\n",
454 | "ax1.plot(pred_mean, 'r', alpha=.6, label='Predicted values')\n",
455 | "ax1.fill_between(pred_ci.index,\n",
456 | " pred_ci.iloc[:, 0],\n",
457 | " pred_ci.iloc[:, 1], color='k', alpha=.2)\n",
458 | "\n",
459 | "ax1.legend(loc='best');"
460 | ]
461 | },
462 | {
463 | "cell_type": "code",
464 | "execution_count": null,
465 | "metadata": {
466 | "collapsed": true
467 | },
468 | "outputs": [],
469 | "source": [
470 | "def get_rmse(y, y_hat):\n",
471 | " '''Root Mean Square Error\n",
472 | " https://en.wikipedia.org/wiki/Root-mean-square_deviation\n",
473 | " '''\n",
474 | " mse = np.mean((y - y_hat)**2)\n",
475 | " return np.sqrt(mse)\n",
476 | "\n",
477 | "def get_mape(y, y_hat):\n",
478 | " '''Mean Absolute Percent Error\n",
479 | " https://en.wikipedia.org/wiki/Mean_absolute_percentage_error\n",
480 | " '''\n",
481 | " perc_err = (100*(y - y_hat))/y\n",
482 | " return np.mean(abs(perc_err))\n",
483 | "\n",
484 | "def get_mase(y, y_hat):\n",
485 | " '''Mean Absolute Scaled Error\n",
486 | " https://en.wikipedia.org/wiki/Mean_absolute_scaled_error\n",
487 | " '''\n",
488 | " abs_err = abs(y - y_hat)\n",
489 | " dsum=sum(abs(y[1:] - y_hat[1:]))\n",
490 | " t = len(y)\n",
491 | " denom = (1/(t - 1))* dsum\n",
492 | " return np.mean(abs_err/denom)"
493 | ]
494 | },
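{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Quick sanity check of the metric helpers on toy arrays (illustrative values only,\n",
"# not part of the series analyzed above).\n",
"y_toy = np.array([1.0, 2.0, 3.0, 4.0])\n",
"y_hat_toy = np.array([1.0, 2.0, 3.0, 5.0])\n",
"print('RMSE:', get_rmse(y_toy, y_hat_toy))  # expect 0.5\n",
"print('MAPE:', get_mape(y_toy, y_hat_toy))  # expect 6.25 (percent)\n",
"print('MASE:', get_mase(y_toy, y_hat_toy))  # expect 0.25 (scaled by the naive lag-1 MAE)"
]
},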
495 | {
496 | "cell_type": "code",
497 | "execution_count": null,
498 | "metadata": {
499 | "collapsed": false
500 | },
501 | "outputs": [],
502 | "source": [
503 | "rmse = get_rmse(ts_train, pred_mean.ix[ts_train.index])\n",
504 | "print(\"RMSE: \", rmse)\n",
505 | "\n",
506 | "mape = get_mape(ts_train, pred_mean.ix[ts_train.index])\n",
507 | "print(\"MAPE: \", mape)\n",
508 | "\n",
509 | "mase = get_mase(ts_train, pred_mean.ix[ts_train.index])\n",
510 | "print(\"MASE: \", mase)"
511 | ]
512 | },
513 | {
514 | "cell_type": "markdown",
515 | "metadata": {},
516 | "source": [
517 | "### 3.4 Forecasting and forecast evaluation"
518 | ]
519 | },
520 | {
521 | "cell_type": "code",
522 | "execution_count": null,
523 | "metadata": {
524 | "collapsed": false
525 | },
526 | "outputs": [],
527 | "source": [
528 | "rmse = get_rmse(ts_test, pred_mean.ix[ts_test.index])\n",
529 | "print(rmse)\n",
530 | "\n",
531 | "mape = get_mape(ts_test, pred_mean.ix[ts_test.index])\n",
532 | "print(mape)\n",
533 | "\n",
534 | "mase = get_mase(ts_test, pred_mean.ix[ts_test.index])\n",
535 | "print(mase)"
536 | ]
537 | },
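{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Alternative (sketch): a pure out-of-sample forecast over the held-out horizon using\n",
"# get_forecast, instead of get_prediction over the full range as above.\n",
"forecast = model_results.get_forecast(steps=n_forecast)\n",
"forecast_mean = forecast.predicted_mean\n",
"print(forecast_mean)\n",
"\n",
"# Compare by position to avoid any index-alignment surprises\n",
"print('Out-of-sample RMSE:', get_rmse(ts_test.values, np.asarray(forecast_mean)))"
]
},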
538 | {
539 | "cell_type": "markdown",
540 | "metadata": {
541 | "collapsed": true
542 | },
543 | "source": [
544 | "### Exericse 3:\n",
545 | "\n"
546 | ]
547 | },
548 | {
549 | "cell_type": "code",
550 | "execution_count": null,
551 | "metadata": {
552 | "collapsed": false
553 | },
554 | "outputs": [],
555 | "source": [
556 | "# Import the csv file containing the series for the analysis\n",
557 | "\n",
558 | "# Step 1a: Read the data series\n",
559 | "filename_ts = 'data/series2.csv'\n",
560 | "series2_df = pd.read_csv(filename_ts, index_col=0, parse_dates=[0])\n",
561 | "\n",
562 | "# Step 1b: Create the training and testing series before analyzing the series\n",
563 | "\n",
564 | "n_sample = series2_df.shape[0]\n",
565 | "\n",
566 | "n_train=int(0.95*n_sample)+1\n",
567 | "n_forecast=n_sample-n_train\n",
568 | "\n",
569 | "series2_train = series2_df.iloc[:n_train]['value']\n",
570 | "series2_test = series2_df.iloc[n_train:]['value']\n",
571 | "print(series2_train.shape)\n",
572 | "print(series2_test.shape)\n",
573 | "print(\"Training Series:\", \"\\n\", series2_train.tail(), \"\\n\")\n",
574 | "print(\"Testing Series:\", \"\\n\", series2_test.head())"
575 | ]
576 | },
577 | {
578 | "cell_type": "code",
579 | "execution_count": null,
580 | "metadata": {
581 | "collapsed": false
582 | },
583 | "outputs": [],
584 | "source": [
585 | "# Step 2a: Examine the basic structure of the data\n",
586 | "print(\"Data shape:\", series2_train.shape, \"\\n\")\n",
587 | "print(\"First 5 observations of the data series:\", \"\\n\", series2_train.head(), \"\\n\")\n",
588 | "print(\"Last 5 observations of the data series:\", \"\\n\", series2_train.tail())"
589 | ]
590 | },
591 | {
592 | "cell_type": "code",
593 | "execution_count": null,
594 | "metadata": {
595 | "collapsed": false
596 | },
597 | "outputs": [],
598 | "source": [
599 | "# Step 2b: Examine the series and use the visuals as clues for the choice of the orders of the ARIMA model\n",
600 | "# Choose the number of lags you would like to display. Pick a number that is at least 20.\n",
601 | "\n",
602 | "# tsplot(series2_train, title='Series 2', lags=?);\n",
603 | "\n",
604 | "tsplot(series2_train, title='Series 2', lags=YOUR_CODE_HERE);"
605 | ]
606 | },
607 | {
608 | "cell_type": "code",
609 | "execution_count": null,
610 | "metadata": {
611 | "collapsed": true
612 | },
613 | "outputs": [],
614 | "source": [
615 | "# Step 2c: Conduct any necessary transformations (such as natural log, difference, difference in natural log, etc )\n",
616 | "# and repeat Step 2b\n"
617 | ]
618 | },
619 | {
620 | "cell_type": "code",
621 | "execution_count": null,
622 | "metadata": {
623 | "collapsed": false
624 | },
625 | "outputs": [],
626 | "source": [
627 | "# Step 3: Estimate an non-Seasonal ARIMA model\n",
628 | "# Note: you will have to pick the orders (p,d,q)\n",
629 | "\n",
630 | "# ex3_mod = sm.tsa.statespace.SARIMAX(series2_train, order=(?,?,?))\n",
631 | "ex3_mod = sm.tsa.statespace.SARIMAX(series2_train, order=())\n",
632 | "ex3_arima_fit = ex3_mod.fit()\n",
633 | "print(ex3_arima_fit.summary())\n",
634 | "\n",
635 | "# Discuss your results"
636 | ]
637 | },
638 | {
639 | "cell_type": "code",
640 | "execution_count": null,
641 | "metadata": {
642 | "collapsed": false
643 | },
644 | "outputs": [],
645 | "source": [
646 | "# Step 4: Conduct model diagnostic check\n",
647 | "\n",
648 | "ex3_arima_fit.plot_diagnostics(figsize=(16, 12));\n",
649 | "\n",
650 | "# Discuss these plots"
651 | ]
652 | },
653 | {
654 | "cell_type": "code",
655 | "execution_count": null,
656 | "metadata": {
657 | "collapsed": false
658 | },
659 | "outputs": [],
660 | "source": [
661 | "# Step 5: Do a 5-step ahead forecast\n",
662 | "\n",
663 | "# ... codes need to be adjusted\n",
664 | "\n",
665 | "fig, ax1 = plt.subplots(nrows=1, ncols=1, figsize=(12, 8))\n",
666 | " \n",
667 | "ax1.plot(series2_train, label='In-sample data', linestyle='-')\n",
668 | "# subtract 1 only to connect it to previous point in the graph\n",
669 | "ax1.plot(series2_test, label='Held-out data', linestyle='--')\n",
670 | "\n",
671 | "# yes DatetimeIndex\n",
672 | "pred_begin = series2_train.index[ex3_arima_fit.loglikelihood_burn]\n",
673 | "pred_end = series2_test.index[-1]\n",
674 | "pred = ex3_arima_fit.get_prediction(start=pred_begin.strftime('%Y-%m-%d'),\n",
675 | " end=pred_end.strftime('%Y-%m-%d'))\n",
676 | "pred_mean = pred.predicted_mean\n",
677 | "pred_ci = pred.conf_int(alpha=0.05)\n",
678 | "\n",
679 | "ax1.plot(pred_mean, 'r', alpha=.6, label='Predicted values')\n",
680 | "ax1.fill_between(pred_ci.index,\n",
681 | " pred_ci.iloc[:, 0],\n",
682 | " pred_ci.iloc[:, 1], color='k', alpha=.2)\n",
683 | "\n",
684 | "ax1.legend(loc='best');\n",
685 | "\n",
686 | "## Discuss the results. How does your forecast look?"
687 | ]
688 | },
689 | {
690 | "cell_type": "code",
691 | "execution_count": null,
692 | "metadata": {
693 | "collapsed": false
694 | },
695 | "outputs": [],
696 | "source": []
697 | },
698 | {
699 | "cell_type": "code",
700 | "execution_count": null,
701 | "metadata": {
702 | "collapsed": true
703 | },
704 | "outputs": [],
705 | "source": []
706 | }
707 | ],
708 | "metadata": {
709 | "kernelspec": {
710 | "display_name": "Python 3",
711 | "language": "python",
712 | "name": "python3"
713 | },
714 | "language_info": {
715 | "codemirror_mode": {
716 | "name": "ipython",
717 | "version": 3
718 | },
719 | "file_extension": ".py",
720 | "mimetype": "text/x-python",
721 | "name": "python",
722 | "nbconvert_exporter": "python",
723 | "pygments_lexer": "ipython3",
724 | "version": "3.5.2"
725 | }
726 | },
727 | "nbformat": 4,
728 | "nbformat_minor": 0
729 | }
730 |
--------------------------------------------------------------------------------
/Section_4_SARIMA_tutorial.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "
"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# PyData San Francisco 2016\n",
15 | "## Applied Time Series Econometrics in Python (and R) Tutorial\n",
16 | "### Section 4: Seasonal ARIMA Models"
17 | ]
18 | },
19 | {
20 | "cell_type": "markdown",
21 | "metadata": {},
22 | "source": [
23 | "### Topics in this section include \n",
24 | "\n",
25 | " - 4.1 Mathematical formulation of Seasonal ARIMA (SARIMA) models\n",
26 | " - 4.2 Building a seasonal ARIMA model for forecasting\n",
27 | " - Exercise 4"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": null,
33 | "metadata": {
34 | "collapsed": false
35 | },
36 | "outputs": [],
37 | "source": [
38 | "# Set up\n",
39 | "\n",
40 | "%load_ext autoreload\n",
41 | "%autoreload 2\n",
42 | "%matplotlib inline\n",
43 | "%config InlineBackend.figure_format='retina'\n",
44 | "\n",
45 | "from __future__ import absolute_import, division, print_function\n",
46 | "\n",
47 | "import sys\n",
48 | "import os\n",
49 | "\n",
50 | "import pandas as pd\n",
51 | "import numpy as np\n",
52 | "\n",
53 | "# Remote Data Access\n",
54 | "import pandas_datareader.data as web\n",
55 | "import datetime\n",
56 | "# reference: https://pandas-datareader.readthedocs.io/en/latest/remote_data.html\n",
57 | "\n",
58 | "# TSA from Statsmodels\n",
59 | "import statsmodels.api as sm\n",
60 | "import statsmodels.formula.api as smf\n",
61 | "import statsmodels.tsa.api as smt\n",
62 | "\n",
63 | "from statsmodels.graphics.api import qqplot\n",
64 | "\n",
65 | "# Display and Plotting\n",
66 | "import matplotlib.pylab as plt\n",
67 | "import seaborn as sns\n",
68 | "\n",
69 | "pd.set_option('display.float_format', lambda x: '%.5f' % x) # pandas\n",
70 | "np.set_printoptions(precision=5, suppress=True) # numpy\n",
71 | "\n",
72 | "pd.set_option('display.max_columns', 100)\n",
73 | "pd.set_option('display.max_rows', 100)\n",
74 | "\n",
75 | "# seaborn plotting style\n",
76 | "sns.set(style='ticks', context='poster')"
77 | ]
78 | },
79 | {
80 | "cell_type": "markdown",
81 | "metadata": {},
82 | "source": [
83 | "### Motivation of Using Seasonal ARIMA Model"
84 | ]
85 | },
86 | {
87 | "cell_type": "code",
88 | "execution_count": null,
89 | "metadata": {
90 | "collapsed": false
91 | },
92 | "outputs": [],
93 | "source": [
94 | "# Import a time series\n",
95 | "# This is a series that we introduced in Section 1 of this tutorial\n",
96 | "\n",
97 | "air = pd.read_csv('data/international-airline-passengers.csv', header=0, index_col=0, parse_dates=[0])"
98 | ]
99 | },
100 | {
101 | "cell_type": "code",
102 | "execution_count": null,
103 | "metadata": {
104 | "collapsed": false
105 | },
106 | "outputs": [],
107 | "source": [
108 | "# Examine the basic structure of the data\n",
109 | "print(\"Data shape:\", air.shape, \"\\n\")\n",
110 | "print(\"First 5 observations of the data series:\", \"\\n\", air.head())\n",
111 | "print(\"Last 5 observations of the data series:\", \"\\n\", air.tail())"
112 | ]
113 | },
114 | {
115 | "cell_type": "code",
116 | "execution_count": null,
117 | "metadata": {
118 | "collapsed": true
119 | },
120 | "outputs": [],
121 | "source": [
122 | "# Examine the patterns of ACF and PACF (along with the time series plot and histogram)\n",
123 | "\n",
124 | "def tsplot(y, lags=None, title='', figsize=(14, 8)):\n",
125 | " '''Examine the patterns of ACF and PACF, along with the time series plot and histogram.\n",
126 | " \n",
127 | " Original source: https://tomaugspurger.github.io/modern-7-timeseries.html\n",
128 | " '''\n",
129 | " fig = plt.figure(figsize=figsize)\n",
130 | " layout = (2, 2)\n",
131 | " ts_ax = plt.subplot2grid(layout, (0, 0))\n",
132 | " hist_ax = plt.subplot2grid(layout, (0, 1))\n",
133 | " acf_ax = plt.subplot2grid(layout, (1, 0))\n",
134 | " pacf_ax = plt.subplot2grid(layout, (1, 1))\n",
135 | " \n",
136 | " y.plot(ax=ts_ax)\n",
137 | " ts_ax.set_title(title)\n",
138 | " y.plot(ax=hist_ax, kind='hist', bins=25)\n",
139 | " hist_ax.set_title('Histogram')\n",
140 | " smt.graphics.plot_acf(y, lags=lags, ax=acf_ax)\n",
141 | " smt.graphics.plot_pacf(y, lags=lags, ax=pacf_ax)\n",
142 | " [ax.set_xlim(0) for ax in [acf_ax, pacf_ax]]\n",
143 | " sns.despine()\n",
144 | " fig.tight_layout()\n",
145 | " return ts_ax, acf_ax, pacf_ax"
146 | ]
147 | },
148 | {
149 | "cell_type": "code",
150 | "execution_count": null,
151 | "metadata": {
152 | "collapsed": false
153 | },
154 | "outputs": [],
155 | "source": [
156 | "tsplot(air, title='International airline passengers, 1949-1960', lags=20);"
157 | ]
158 | },
159 | {
160 | "cell_type": "markdown",
161 | "metadata": {
162 | "collapsed": false
163 | },
164 | "source": [
165 | "### Observations of these graphs:\n",
166 | "\n",
167 | "* The airline passengers displays an increasing trend (over time)\n",
168 | "* There appears to be *seasonality*\n",
169 | "* The autocorrelations do not just gradually decline"
170 | ]
171 | },
172 | {
173 | "cell_type": "code",
174 | "execution_count": null,
175 | "metadata": {
176 | "collapsed": false
177 | },
178 | "outputs": [],
179 | "source": [
180 | "# Take log of the series\n",
181 | "air['lnair'] = np.log(air)\n",
182 | "print(air['lnair'].head(),\"\\n\")\n",
183 | "print(air['lnair'].shape,\"\\n\")\n",
184 | "\n",
185 | "# Take first difference of the series\n",
186 | "#air_ln_diff = air['lnair'].diff() - air['lnair'].shift()\n",
187 | "air_ln_diff = air['lnair'].diff()\n",
188 | "air_ln_diff = air_ln_diff.dropna()\n",
189 | "print(air_ln_diff.head(),\"\\n\")\n",
190 | "print(air_ln_diff.shape,\"\\n\")"
191 | ]
192 | },
193 | {
194 | "cell_type": "code",
195 | "execution_count": null,
196 | "metadata": {
197 | "collapsed": false
198 | },
199 | "outputs": [],
200 | "source": [
201 | "tsplot(air['lnair'], title='Natural Log of nternational airline passengers, 1949-1960', lags=20);"
202 | ]
203 | },
204 | {
205 | "cell_type": "code",
206 | "execution_count": null,
207 | "metadata": {
208 | "collapsed": false
209 | },
210 | "outputs": [],
211 | "source": [
212 | "tsplot(air_ln_diff[1:], title='Differences of Log of International airline passengers, 1949-1960', lags=40);"
213 | ]
214 | },
215 | {
216 | "cell_type": "code",
217 | "execution_count": null,
218 | "metadata": {
219 | "collapsed": false
220 | },
221 | "outputs": [],
222 | "source": [
223 | "# An alternative way to detect seasonality\n",
224 | "\n",
225 | "air['Month'] = air.index.strftime('%b')\n",
226 | "air['Year'] = air.index.year\n",
227 | "\n",
228 | "air_piv = air.pivot(index='Year', columns='Month', values='n_pass_thousands')\n",
229 | "\n",
230 | "air = air.drop(['Month', 'Year'], axis=1)\n",
231 | "\n",
232 | "# put the months in order\n",
233 | "month_names = pd.date_range(start='2016-01-01', periods=12, freq='MS').strftime('%b')\n",
234 | "air_piv = air_piv.reindex(columns=month_names)\n",
235 | "\n",
236 | "# plot it\n",
237 | "fig, ax = plt.subplots(figsize=(8, 6))\n",
238 | "air_piv.plot(ax=ax, kind='box');\n",
239 | "\n",
240 | "ax.set_xlabel('Month');\n",
241 | "ax.set_ylabel('Thousands of passengers');\n",
242 | "ax.set_title('Boxplot of seasonal values');\n",
243 | "ax.xaxis.set_ticks_position('bottom')\n",
244 | "fig.tight_layout();"
245 | ]
246 | },
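{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# One more quick look (sketch): a classical decomposition of the series into trend,\n",
"# seasonal, and residual components. The freq=12 argument assumes monthly data and\n",
"# follows the argument name in the statsmodels version used for this tutorial\n",
"# (later releases renamed it to period).\n",
"decomposition = sm.tsa.seasonal_decompose(air['n_pass_thousands'], freq=12)\n",
"fig = decomposition.plot()\n",
"fig.set_size_inches(12, 8)\n",
"fig.tight_layout();"
]
},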
247 | {
248 | "cell_type": "markdown",
249 | "metadata": {},
250 | "source": [
251 | "### 4.1 Formulation of the Seasonal ARIMA Model\n",
252 | "\n",
253 | "The *pure* seasonal autoregressive and moving average model, $ARMA(P,Q)$, take the from\n",
254 | "\n",
255 | "$$\\Phi_P(B^s)z_t=\\Theta_Q(B^s)\\epsilon_t$$ \n",
256 | "\n",
257 | "where \n",
258 | "\n",
259 | "$$\\Phi_P(B^2)=1 - \\Phi_1 B^s - \\Phi_2 B^{2s} - \\cdots - \\Phi_P B^{Ps}$$\n",
260 | "\n",
261 | "and \n",
262 | "\n",
263 | "$$\\Theta_Q(B^2)=1 - \\Theta_1 B^s - \\Theta_2 B^{2s} - \\cdots - \\Theta_Q B^{Qs}$$\n",
264 | "\n",
265 | "are the **seasonal autoregressive operator** and the **seasonal moving average operator** of orders $P$ and $Q$ with **seasonal period s**.\n",
266 | "\n",
267 | "**Example:**\n",
268 | "\n",
269 | "A first-order seasonal autoregressive moving average series over months (or $SARIMA(1,0,1,12)$) can be expressed as\n",
270 | "\n",
271 | "$$ z_t = \\Phi z_{t-12} + \\epsilon_t + \\Theta \\epsilon_{t-12} $$\n",
272 | "\n",
273 | "or\n",
274 | "\n",
275 | "$$ (1 - \\Phi B^{12})z_t = (1 + \\Theta B^{12})\\epsilon_t $$\n",
276 | "\n",
277 | "In other words, this model capture the relationship between $z_t$ and its lags at the multiple of the yearly seasonal period $s=12$ months. \n",
278 | "\n",
279 | "The stationarity condition requires that $|\\Phi|<1$ and the invertible condition requires that $|\\Theta|<1$.\n",
280 | "\n",
281 | "Similar to that for the ARIMA models, the table below summarize the behavior of the theoretical ACF and PACF of the pure seasonal ARMA models:\n",
282 | "\n",
283 | "| Process | ACF | PACF |\n",
284 | "|---------------|:--------------------:|:--------------------:|\n",
285 | "| **AR(P)** | tails off | cutoff after lag $P$ |\n",
286 | "| **MA(Q)** | cutoff after lag $Q$ | tails off |\n",
287 | "| **ARMA(P,Q)** | tails off | tails off |\n",
288 | "\n",
289 | "* **Note that we use (p,d,q) to denote the orders for the non-seasonal components of the ARIMA models and (P,D,Q,s) to denote the orders for the seasonal components of the ARIMA model.**\n",
290 | "\n",
291 | "The general formulation of the **Multiplicative Seasonal Autoregressive Integrated Moving Average (SARIMA)** model takes the following form:\n",
292 | "\n",
293 | "$$ \\phi_p(B) \\Phi_P(B^s) \\bigtriangledown^d \\bigtriangledown^D_s z_t = \\theta_q(B) \\Theta_Q(B^s) \\epsilon_t $$ \n",
294 | "\n",
295 | "where \n",
296 | "\n",
297 | "$\\epsilon_t$ is a white noise process\n",
298 | "\n",
299 | "$\\phi_p(B)$ and $\\theta_q(B)$ are non-seasonal autoregressive and moving average lag polynomials\n",
300 | "\n",
301 | "$\\Phi_P(B^s)$ and $\\Theta_Q(B^s)$ are seasonal autoregressive and moving average lag polynomials\n",
302 | "\n",
303 | "$\\bigtriangledown^d \\equiv (1-B)^d$ and $\\bigtriangledown^D_s \\equiv (1-B^s)^D$ are the difference (or integrated) components\n",
304 | "\n",
305 | "Therefore, the general model is denoted as $\\mathbf{ARIMA(p,d,q)\\times(P,D,Q)_s}$\n",
306 | "\n"
307 | ]
308 | },
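{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Illustrative sketch (simulated data, not one of the tutorial series): a pure seasonal\n",
"# AR(1) process with s=12, z_t = 0.8 * z_{t-12} + eps_t. Its sample ACF should tail off\n",
"# at the seasonal lags 12, 24, 36, ... as described in the table above.\n",
"np.random.seed(42)\n",
"n_obs = 480\n",
"eps = np.random.normal(size=n_obs)\n",
"z = np.zeros(n_obs)\n",
"z[:12] = eps[:12]\n",
"for t in range(12, n_obs):\n",
"    z[t] = 0.8 * z[t-12] + eps[t]\n",
"\n",
"tsplot(pd.Series(z), title='Simulated pure seasonal AR(1), s=12, Phi=0.8', lags=40);"
]
},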
309 | {
310 | "cell_type": "markdown",
311 | "metadata": {},
312 | "source": [
313 | "**Example:**\n",
314 | "\n",
315 | "Unpacking the notation, the $\\mathbf{ARIMA(0,1,1)\\times(P,1,1)_12}$ model becomes\n",
316 | "\n",
317 | "$$(1-B)(1-B^{12})z_t = (1+\\theta B)(1+\\Theta B^{12}) \\epsilon_t$$\n",
318 | "\n",
319 | "When multiplying the lag polynomials on both side, we get\n",
320 | "\n",
321 | "$$ (1 - B - B^{12} + B^{13}) z_t = (1 + \\theta B + \\Theta B^{12} + \\theta \\Theta B^{13}) \\epsilon_t $$\n",
322 | "\n",
323 | "Simplify gives\n",
324 | "\n",
325 | "$$ z_t = z_{t-1} + (z_{t-12} - z_{t-13}) + \\epsilon_t + \\theta \\epsilon_{t-1} + \\Theta \\epsilon_{t-12} + \\theta \\Theta \\epsilon_{t-13}$$\n"
326 | ]
327 | },
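{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# A small check of the lag-polynomial algebra above (sketch). Coefficients are ordered\n",
"# by increasing power of B; theta and Theta take arbitrary illustrative values.\n",
"from numpy.polynomial import polynomial as P\n",
"\n",
"theta, Theta = 0.4, 0.6\n",
"\n",
"ar_side = P.polymul([1, -1], [1] + [0]*11 + [-1])        # (1 - B)(1 - B^12)\n",
"ma_side = P.polymul([1, theta], [1] + [0]*11 + [Theta])  # (1 + theta B)(1 + Theta B^12)\n",
"\n",
"print('AR-side coefficients (powers of B, 0..13):', ar_side)\n",
"print('MA-side coefficients (powers of B, 0..13):', ma_side)\n",
"# Expect 1, -1, (ten zeros), -1, +1 on the AR side, i.e., 1 - B - B^12 + B^13,\n",
"# and 1, theta, (ten zeros), Theta, theta*Theta on the MA side."
]
},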
328 | {
329 | "cell_type": "markdown",
330 | "metadata": {},
331 | "source": [
332 | "### 4.2 Building a Seasonal ARIMA Model for Forecasting"
333 | ]
334 | },
335 | {
336 | "cell_type": "code",
337 | "execution_count": null,
338 | "metadata": {
339 | "collapsed": false
340 | },
341 | "outputs": [],
342 | "source": [
343 | "# Air Passengers Series\n",
344 | "mod = sm.tsa.statespace.SARIMAX(air['lnair'], order=(2,1,0), seasonal_order=(1,1,0,12), simple_differencing=True)\n",
345 | "sarima_fit1 = mod.fit()\n",
346 | "print(sarima_fit1.summary())"
347 | ]
348 | },
349 | {
350 | "cell_type": "markdown",
351 | "metadata": {},
352 | "source": [
353 | "* Notice an additional argument *simple_differencing=True*. \n",
354 | "\n",
355 | "* This controls how the order of integration is handled in ARIMA models. \n",
356 | "\n",
357 | "* If *simple_differencing=True*, then the time series provided as endog is literally differenced and an ARMA model is fit to the resulting new time series. This implies that a number of initial periods are lost to the differencing process, however it may be necessary either to compare results to other packages (e.g. Stata's arima always uses simple differencing) or if the seasonal periodicity is large"
358 | ]
359 | },
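{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Sketch: the same specification with simple_differencing=False (the default). The\n",
"# state-space approach keeps all observations instead of dropping the initial differenced\n",
"# periods, so the effective sample sizes differ (and likelihoods are not directly comparable).\n",
"mod2 = sm.tsa.statespace.SARIMAX(air['lnair'], order=(2,1,0), seasonal_order=(1,1,0,12),\n",
"                                 simple_differencing=False)\n",
"sarima_fit1b = mod2.fit()\n",
"print('nobs with simple_differencing=True: ', sarima_fit1.nobs)\n",
"print('nobs with simple_differencing=False:', sarima_fit1b.nobs)"
]
},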
360 | {
361 | "cell_type": "code",
362 | "execution_count": null,
363 | "metadata": {
364 | "collapsed": false
365 | },
366 | "outputs": [],
367 | "source": [
368 | "# Model Diagnostic\n",
369 | "\n",
370 | "sarima_fit1.plot_diagnostics(figsize=(16, 12));"
371 | ]
372 | },
373 | {
374 | "cell_type": "markdown",
375 | "metadata": {},
376 | "source": [
377 | "### Exercise 4: "
378 | ]
379 | },
380 | {
381 | "cell_type": "code",
382 | "execution_count": null,
383 | "metadata": {
384 | "collapsed": false
385 | },
386 | "outputs": [],
387 | "source": [
388 | "# Step 1: Import the data series\n",
389 | "liquor = pd.read_csv('data/liquor.csv', header=0, index_col=0, parse_dates=[0])\n",
390 | "\n",
391 | "# Step 1b: Create the training and testing series before analyzing the series\n",
392 | "n_sample = liquor.shape[0]\n",
393 | "n_train=int(0.95*n_sample)+1\n",
394 | "n_forecast=n_sample-n_train\n",
395 | "\n",
396 | "liquor_train = liquor.iloc[:n_train]['Value']\n",
397 | "liquor_test = liquor.iloc[n_train:]['Value']\n",
398 | "print(liquor_train.shape)\n",
399 | "print(liquor_test.shape)\n",
400 | "print(\"Training Series:\", \"\\n\", liquor_train.tail(), \"\\n\")\n",
401 | "print(\"Testing Series:\", \"\\n\", liquor_test.head())"
402 | ]
403 | },
404 | {
405 | "cell_type": "code",
406 | "execution_count": null,
407 | "metadata": {
408 | "collapsed": false
409 | },
410 | "outputs": [],
411 | "source": [
412 | "# Step 2a: Examine the basic structure of the data\n",
413 | "print(\"Data shape:\", liquor_train.shape, \"\\n\")\n",
414 | "print(\"First 5 observations of the training data series:\", \"\\n\", liquor_train.head(), \"\\n\")\n",
415 | "print(\"Last 5 observations of the training data series:\", \"\\n\", liquor_train.tail())"
416 | ]
417 | },
418 | {
419 | "cell_type": "code",
420 | "execution_count": null,
421 | "metadata": {
422 | "collapsed": false
423 | },
424 | "outputs": [],
425 | "source": [
426 | "# Step 2b: Examine the series and use the visuals as clues for the choice of the orders of the ARIMA model\n",
427 | "#tsplot(liquor_train, title='Liquor Sales (in millions of dollars), 2007-2016', lags=??);\n",
428 | "tsplot(liquor_train, title='Liquor Sales (in millions of dollars)', lags=40);"
429 | ]
430 | },
431 | {
432 | "cell_type": "code",
433 | "execution_count": null,
434 | "metadata": {
435 | "collapsed": true
436 | },
437 | "outputs": [],
438 | "source": [
439 | "# Step 2c: Conduct any necessary transformations (such as natural log, difference, difference in natural log, etc )\n",
440 | "# and repeat Step 2b\n"
441 | ]
442 | },
443 | {
444 | "cell_type": "code",
445 | "execution_count": null,
446 | "metadata": {
447 | "collapsed": false
448 | },
449 | "outputs": [],
450 | "source": [
451 | "# Step 3: Estimate an Seasonal ARIMA model\n",
452 | "# Note: you will have to pick the orders (p,d,q)(P,D,Q)_s\n",
453 | "\n",
454 | "#mod = sm.tsa.statespace.SARIMAX(liquor, order=(?,?,?), seasonal_order=(?,?,?,?))\n",
455 | "\n",
456 | "mod = sm.tsa.statespace.SARIMAX(liquor_train, order=(0,1,1), seasonal_order=(0,1,0,12))\n",
457 | "sarima_fit2 = mod.fit()\n",
458 | "print(sarima_fit2.summary())"
459 | ]
460 | },
461 | {
462 | "cell_type": "code",
463 | "execution_count": null,
464 | "metadata": {
465 | "collapsed": false
466 | },
467 | "outputs": [],
468 | "source": [
469 | "# Step 4: Conduct model diagnostic check\n",
470 | "sarima_fit2.plot_diagnostics();\n",
471 | "\n",
472 | "# Discuss these plots"
473 | ]
474 | },
475 | {
476 | "cell_type": "code",
477 | "execution_count": null,
478 | "metadata": {
479 | "collapsed": false
480 | },
481 | "outputs": [],
482 | "source": [
483 | "# Step 5: Do a 14-step ahead forecast\n",
484 | "\n",
485 | "fig, ax1 = plt.subplots(nrows=1, ncols=1, figsize=(12, 8))\n",
486 | " \n",
487 | "ax1.plot(liquor_train, label='In-sample data', linestyle='-')\n",
488 | "# subtract 1 only to connect it to previous point in the graph\n",
489 | "ax1.plot(liquor_test, label='Held-out data', linestyle='--')\n",
490 | "\n",
491 | "# yes DatetimeIndex\n",
492 | "pred_begin = liquor_train.index[sarima_fit2.loglikelihood_burn]\n",
493 | "pred_end = liquor_test.index[-1]\n",
494 | "pred = sarima_fit2.get_prediction(start=pred_begin.strftime('%Y-%m-%d'),\n",
495 | " end=pred_end.strftime('%Y-%m-%d'))\n",
496 | "pred_mean = pred.predicted_mean\n",
497 | "pred_ci = pred.conf_int(alpha=0.05)\n",
498 | "\n",
499 | "ax1.plot(pred_mean, 'r', alpha=.6, label='Predicted values')\n",
500 | "ax1.fill_between(pred_ci.index,\n",
501 | " pred_ci.iloc[:, 0],\n",
502 | " pred_ci.iloc[:, 1], color='k', alpha=.2)\n",
503 | "ax1.set_xlabel(\"Year\")\n",
504 | "ax1.set_ylabel(\"Liquor Sales (in millions of dollars)\")\n",
505 | "ax1.legend(loc='best');\n",
506 | "fig.tight_layout();\n",
507 | "## Discuss the results. How does your forecast look?"
508 | ]
509 | },
510 | {
511 | "cell_type": "code",
512 | "execution_count": null,
513 | "metadata": {
514 | "collapsed": true
515 | },
516 | "outputs": [],
517 | "source": []
518 | },
519 | {
520 | "cell_type": "code",
521 | "execution_count": null,
522 | "metadata": {
523 | "collapsed": true
524 | },
525 | "outputs": [],
526 | "source": []
527 | },
528 | {
529 | "cell_type": "code",
530 | "execution_count": null,
531 | "metadata": {
532 | "collapsed": true
533 | },
534 | "outputs": [],
535 | "source": []
536 | }
537 | ],
538 | "metadata": {
539 | "kernelspec": {
540 | "display_name": "Python 2",
541 | "language": "python",
542 | "name": "python2"
543 | },
544 | "language_info": {
545 | "codemirror_mode": {
546 | "name": "ipython",
547 | "version": 2
548 | },
549 | "file_extension": ".py",
550 | "mimetype": "text/x-python",
551 | "name": "python",
552 | "nbconvert_exporter": "python",
553 | "pygments_lexer": "ipython2",
554 | "version": "2.7.12"
555 | }
556 | },
557 | "nbformat": 4,
558 | "nbformat_minor": 0
559 | }
560 |
--------------------------------------------------------------------------------
/Section_5_ClosingRemarks_tutorial.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "
"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# PyData San Francisco 2016\n",
15 | "## Applied Time Series Econometrics in Python (and R) Tutorial\n",
16 | "\n",
17 | "### Section 5. Closing Remarks: Practical suggestions and other topics"
18 | ]
19 | },
20 | {
21 | "cell_type": "markdown",
22 | "metadata": {},
23 | "source": [
24 | "### Topics in this section include\n",
25 | "\n",
26 | "- 5.1 Model selection heuristics\n",
27 | "- 5.2 Material we did not cover\n",
28 | "- 5.3 Where to go from here"
29 | ]
30 | },
31 | {
32 | "cell_type": "markdown",
33 | "metadata": {},
34 | "source": [
35 | "### 5.1 Model selection heuristics\n",
36 | "\n",
37 | "ARIMA $(p,d,q)$\n",
38 | "\n",
39 | "SARIMAX $(p,d,q) \\times (P,D,Q)_{s}$\n",
40 | "\n",
41 | "- Examine the time series to understand its characteristics, e.g., trend, seasonality.\n",
42 | "- Choose an appropriate model form (ARIMA, SARIMA, ARIMAX, SARIMAX).\n",
43 | "- Check for (unit root) stationarity of the time series.\n",
44 | " - Determine whether differencing (informs $d$ and $D$) or other transformation is necessary to make stationary.\n",
45 | "- Examine ACF and PACF to determine the initial choice of the AR($p$) and MA($q$) model orders, and seasonal $P$ and $Q$ orders if appropriate.\n",
46 | "- Alternatively, or in addition, fit many models.\n",
47 | "- Choose a model based on:\n",
48 | " - A criterion, e.g., AIC, BIC\n",
49 | " - Examination of statistical tests on residuals.\n",
50 | " - Out-of-sample forecast error."
51 | ]
52 | },
53 | {
54 | "cell_type": "code",
55 | "execution_count": null,
56 | "metadata": {
57 | "collapsed": true
58 | },
59 | "outputs": [],
60 | "source": [
61 | "%load_ext autoreload\n",
62 | "%autoreload 2\n",
63 | "%matplotlib inline\n",
64 | "%config InlineBackend.figure_format='retina'\n",
65 | "\n",
66 | "from __future__ import absolute_import, division, print_function\n",
67 | "\n",
68 | "import sys\n",
69 | "import os\n",
70 | "\n",
71 | "import pandas as pd\n",
72 | "import numpy as np\n",
73 | "\n",
74 | "import statsmodels.api as sm\n",
75 | "import statsmodels.formula.api as smf\n",
76 | "import statsmodels.tsa.api as smt\n",
77 | "\n",
78 | "import itertools\n",
79 | "import warnings\n",
80 | "\n",
81 | "# Display and Plotting\n",
82 | "import matplotlib.pylab as plt\n",
83 | "import seaborn as sns\n",
84 | "\n",
85 | "pd.set_option('display.float_format', lambda x: '%.5f' % x) # pandas\n",
86 | "np.set_printoptions(precision=5, suppress=True) # numpy\n",
87 | "\n",
88 | "pd.set_option('display.max_columns', 100)\n",
89 | "pd.set_option('display.max_rows', 100)\n",
90 | "\n",
91 | "# seaborn plotting style\n",
92 | "sns.set(style='ticks', context='poster')"
93 | ]
94 | },
95 | {
96 | "cell_type": "code",
97 | "execution_count": null,
98 | "metadata": {
99 | "collapsed": true
100 | },
101 | "outputs": [],
102 | "source": [
103 | "def test_stationarity(timeseries,\n",
104 | " maxlag=None, regression=None, autolag=None,\n",
105 | " window=None, plot=False, verbose=False):\n",
106 | " '''\n",
107 | " Check unit root stationarity of time series.\n",
108 | " \n",
109 | " Null hypothesis: the series is non-stationary.\n",
110 | " If p >= alpha, the series is non-stationary.\n",
111 | " If p < alpha, reject the null hypothesis (has unit root stationarity).\n",
112 | " \n",
113 | " Original source: http://www.analyticsvidhya.com/blog/2016/02/time-series-forecasting-codes-python/\n",
114 | " \n",
115 | " Function: http://statsmodels.sourceforge.net/devel/generated/statsmodels.tsa.stattools.adfuller.html\n",
116 | " \n",
117 | " window argument is only required for plotting rolling functions. Default=4.\n",
118 | " '''\n",
119 | " \n",
120 | " # set defaults (from function page)\n",
121 | " if regression is None:\n",
122 | " regression = 'c'\n",
123 | " \n",
124 | " if verbose:\n",
125 | " print('Running Augmented Dickey-Fuller test with paramters:')\n",
126 | " print('maxlag: {}'.format(maxlag))\n",
127 | " print('regression: {}'.format(regression))\n",
128 | " print('autolag: {}'.format(autolag))\n",
129 | " \n",
130 | " if plot:\n",
131 | " if window is None:\n",
132 | " window = 4\n",
133 | " #Determing rolling statistics\n",
134 | " rolmean = timeseries.rolling(window=window, center=False).mean()\n",
135 | " rolstd = timeseries.rolling(window=window, center=False).std()\n",
136 | " \n",
137 | " #Plot rolling statistics:\n",
138 | " orig = plt.plot(timeseries, color='blue', label='Original')\n",
139 | " mean = plt.plot(rolmean, color='red', label='Rolling Mean ({})'.format(window))\n",
140 | " std = plt.plot(rolstd, color='black', label='Rolling Std ({})'.format(window))\n",
141 | " plt.legend(loc='best')\n",
142 | " plt.title('Rolling Mean & Standard Deviation')\n",
143 | " plt.show(block=False)\n",
144 | " \n",
145 | " #Perform Augmented Dickey-Fuller test:\n",
146 | " dftest = smt.adfuller(timeseries, maxlag=maxlag, regression=regression, autolag=autolag)\n",
147 | " dfoutput = pd.Series(dftest[0:4], index=['Test Statistic',\n",
148 | " 'p-value',\n",
149 | " '#Lags Used',\n",
150 | " 'Number of Observations Used',\n",
151 | " ])\n",
152 | " for key,value in dftest[4].items():\n",
153 | " dfoutput['Critical Value (%s)'%key] = value\n",
154 | " if verbose:\n",
155 | " print('Results of Augmented Dickey-Fuller Test:')\n",
156 | " print(dfoutput)\n",
157 | " return dfoutput"
158 | ]
159 | },
160 | {
161 | "cell_type": "code",
162 | "execution_count": null,
163 | "metadata": {
164 | "collapsed": true
165 | },
166 | "outputs": [],
167 | "source": [
168 | "def tsplot(y, lags=None, title='', figsize=(14, 8)):\n",
169 | " '''Examine the patterns of ACF and PACF, along with the time series plot and histogram.\n",
170 | " \n",
171 | " Original source: https://tomaugspurger.github.io/modern-7-timeseries.html\n",
172 | " '''\n",
173 | " fig = plt.figure(figsize=figsize)\n",
174 | " layout = (2, 2)\n",
175 | " ts_ax = plt.subplot2grid(layout, (0, 0))\n",
176 | " hist_ax = plt.subplot2grid(layout, (0, 1))\n",
177 | " acf_ax = plt.subplot2grid(layout, (1, 0))\n",
178 | " pacf_ax = plt.subplot2grid(layout, (1, 1))\n",
179 | " \n",
180 | " y.plot(ax=ts_ax)\n",
181 | " ts_ax.set_title(title)\n",
182 | " y.plot(ax=hist_ax, kind='hist', bins=25)\n",
183 | " hist_ax.set_title('Histogram')\n",
184 | " smt.graphics.plot_acf(y, lags=lags, ax=acf_ax)\n",
185 | " smt.graphics.plot_pacf(y, lags=lags, ax=pacf_ax)\n",
186 | " [ax.set_xlim(0) for ax in [acf_ax, pacf_ax]]\n",
187 | " sns.despine()\n",
188 | " fig.tight_layout()\n",
189 | " return ts_ax, acf_ax, pacf_ax"
190 | ]
191 | },
192 | {
193 | "cell_type": "code",
194 | "execution_count": null,
195 | "metadata": {
196 | "collapsed": true
197 | },
198 | "outputs": [],
199 | "source": [
200 | "def model_resid_stats(model_results,\n",
201 | " het_method='breakvar',\n",
202 | " norm_method='jarquebera',\n",
203 | " sercor_method='ljungbox',\n",
204 | " verbose=True,\n",
205 | " ):\n",
206 | " '''More information about the statistics under the ARIMA parameters table, tests of standardized residuals:\n",
207 | " \n",
208 | " Test of heteroskedasticity\n",
209 | " http://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAXResults.test_heteroskedasticity.html#statsmodels.tsa.statespace.sarimax.SARIMAXResults.test_heteroskedasticity\n",
210 | "\n",
211 | " Test of normality (Default: Jarque-Bera)\n",
212 | " http://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAXResults.test_normality.html#statsmodels.tsa.statespace.sarimax.SARIMAXResults.test_normality\n",
213 | "\n",
214 | " Test of serial correlation (Default: Ljung-Box)\n",
215 | " http://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAXResults.test_serial_correlation.html\n",
216 | " '''\n",
217 | " # Re-run the ARIMA model statistical tests, and more. To be used when selecting viable models.\n",
218 | " (het_stat, het_p) = model_results.test_heteroskedasticity(het_method)[0]\n",
219 | " norm_stat, norm_p, skew, kurtosis = model_results.test_normality(norm_method)[0]\n",
220 | " sercor_stat, sercor_p = model_results.test_serial_correlation(method=sercor_method)[0]\n",
221 | " sercor_stat = sercor_stat[-1] # last number for the largest lag\n",
222 | " sercor_p = sercor_p[-1] # last number for the largest lag\n",
223 | "\n",
224 | " # Run Durbin-Watson test on the standardized residuals.\n",
225 | " # The statistic is approximately equal to 2*(1-r), where r is the sample autocorrelation of the residuals.\n",
226 | " # Thus, for r == 0, indicating no serial correlation, the test statistic equals 2.\n",
227 | " # This statistic will always be between 0 and 4. The closer to 0 the statistic,\n",
228 | " # the more evidence for positive serial correlation. The closer to 4,\n",
229 | " # the more evidence for negative serial correlation.\n",
230 | " # Essentially, below 1 or above 3 is bad.\n",
231 | " dw_stat = sm.stats.stattools.durbin_watson(model_results.filter_results.standardized_forecasts_error[0, model_results.loglikelihood_burn:])\n",
232 | "\n",
233 | " # check whether roots are outside the unit circle (we want them to be);\n",
234 | " # will be True when AR is not used (i.e., AR order = 0)\n",
235 | " arroots_outside_unit_circle = np.all(np.abs(model_results.arroots) > 1)\n",
236 | " # will be True when MA is not used (i.e., MA order = 0)\n",
237 | " maroots_outside_unit_circle = np.all(np.abs(model_results.maroots) > 1)\n",
238 | " \n",
239 | " if verbose:\n",
240 | " print('Test heteroskedasticity of residuals ({}): stat={:.3f}, p={:.3f}'.format(het_method, het_stat, het_p));\n",
241 | " print('\\nTest normality of residuals ({}): stat={:.3f}, p={:.3f}'.format(norm_method, norm_stat, norm_p));\n",
242 | " print('\\nTest serial correlation of residuals ({}): stat={:.3f}, p={:.3f}'.format(sercor_method, sercor_stat, sercor_p));\n",
243 | " print('\\nDurbin-Watson test on residuals: d={:.2f}\\n\\t(NB: 2 means no serial correlation, 0=pos, 4=neg)'.format(dw_stat))\n",
244 | " print('\\nTest for all AR roots outside unit circle (>1): {}'.format(arroots_outside_unit_circle))\n",
245 | " print('\\nTest for all MA roots outside unit circle (>1): {}'.format(maroots_outside_unit_circle))\n",
246 | " \n",
247 | " stat = {'het_method': het_method,\n",
248 | " 'het_stat': het_stat,\n",
249 | " 'het_p': het_p,\n",
250 | " 'norm_method': norm_method,\n",
251 | " 'norm_stat': norm_stat,\n",
252 | " 'norm_p': norm_p,\n",
253 | " 'skew': skew,\n",
254 | " 'kurtosis': kurtosis,\n",
255 | " 'sercor_method': sercor_method,\n",
256 | " 'sercor_stat': sercor_stat,\n",
257 | " 'sercor_p': sercor_p,\n",
258 | " 'dw_stat': dw_stat,\n",
259 | " 'arroots_outside_unit_circle': arroots_outside_unit_circle,\n",
260 | " 'maroots_outside_unit_circle': maroots_outside_unit_circle,\n",
261 | " }\n",
262 | " return stat"
263 | ]
264 | },
265 | {
266 | "cell_type": "code",
267 | "execution_count": null,
268 | "metadata": {
269 | "collapsed": true
270 | },
271 | "outputs": [],
272 | "source": [
273 | "def model_gridsearch(ts,\n",
274 | " p_min,\n",
275 | " d_min,\n",
276 | " q_min,\n",
277 | " p_max,\n",
278 | " d_max,\n",
279 | " q_max,\n",
280 | " sP_min,\n",
281 | " sD_min,\n",
282 | " sQ_min,\n",
283 | " sP_max,\n",
284 | " sD_max,\n",
285 | " sQ_max,\n",
286 | " trends,\n",
287 | " s=None,\n",
288 | " enforce_stationarity=True,\n",
289 | " enforce_invertibility=True,\n",
290 | " simple_differencing=False,\n",
291 | " plot_diagnostics=False,\n",
292 | " verbose=False,\n",
293 | " filter_warnings=True,\n",
294 | " ):\n",
295 | " '''Run grid search of SARIMAX models and save results.\n",
296 | " '''\n",
297 | " \n",
298 | " cols = ['p', 'd', 'q', 'sP', 'sD', 'sQ', 's', 'trend',\n",
299 | " 'enforce_stationarity', 'enforce_invertibility', 'simple_differencing',\n",
300 | " 'aic', 'bic',\n",
301 | " 'het_p', 'norm_p', 'sercor_p', 'dw_stat',\n",
302 | " 'arroots_gt_1', 'maroots_gt_1',\n",
303 | " 'datetime_run']\n",
304 | "\n",
305 | " # Initialize a DataFrame to store the results\n",
306 | " df_results = pd.DataFrame(columns=cols)\n",
307 | "\n",
308 | " # # Initialize a DataFrame to store the results\n",
309 | " # results_bic = pd.DataFrame(index=['AR{}'.format(i) for i in range(p_min,p_max+1)],\n",
310 | " # columns=['MA{}'.format(i) for i in range(q_min,q_max+1)])\n",
311 | "\n",
312 | " mod_num=0\n",
313 | " for trend,p,d,q,sP,sD,sQ in itertools.product(trends,\n",
314 | " range(p_min,p_max+1),\n",
315 | " range(d_min,d_max+1),\n",
316 | " range(q_min,q_max+1),\n",
317 | " range(sP_min,sP_max+1),\n",
318 | " range(sD_min,sD_max+1),\n",
319 | " range(sQ_min,sQ_max+1),\n",
320 | " ):\n",
321 | " # initialize to store results for this parameter set\n",
322 | " this_model = pd.DataFrame(index=[mod_num], columns=cols)\n",
323 | "\n",
324 | " if p==0 and d==0 and q==0:\n",
325 | " continue\n",
326 | "\n",
327 | " try:\n",
328 | " model = sm.tsa.SARIMAX(ts,\n",
329 | " trend=trend,\n",
330 | " order=(p, d, q),\n",
331 | " seasonal_order=(sP, sD, sQ, s),\n",
332 | " enforce_stationarity=enforce_stationarity,\n",
333 | " enforce_invertibility=enforce_invertibility,\n",
334 | " simple_differencing=simple_differencing,\n",
335 | " )\n",
336 | " \n",
337 | " if filter_warnings is True:\n",
338 | " with warnings.catch_warnings():\n",
339 | " warnings.filterwarnings(\"ignore\")\n",
340 | " model_results = model.fit(disp=0)\n",
341 | " else:\n",
342 | " model_results = model.fit()\n",
343 | "\n",
344 | " if verbose:\n",
345 | " print(model_results.summary())\n",
346 | "\n",
347 | " if plot_diagnostics:\n",
348 | " model_results.plot_diagnostics();\n",
349 | "\n",
350 | " stat = model_resid_stats(model_results,\n",
351 | " verbose=verbose)\n",
352 | "\n",
353 | " this_model.loc[mod_num, 'p'] = p\n",
354 | " this_model.loc[mod_num, 'd'] = d\n",
355 | " this_model.loc[mod_num, 'q'] = q\n",
356 | " this_model.loc[mod_num, 'sP'] = sP\n",
357 | " this_model.loc[mod_num, 'sD'] = sD\n",
358 | " this_model.loc[mod_num, 'sQ'] = sQ\n",
359 | " this_model.loc[mod_num, 's'] = s\n",
360 | " this_model.loc[mod_num, 'trend'] = trend\n",
361 | " this_model.loc[mod_num, 'enforce_stationarity'] = enforce_stationarity\n",
362 | " this_model.loc[mod_num, 'enforce_invertibility'] = enforce_invertibility\n",
363 | " this_model.loc[mod_num, 'simple_differencing'] = simple_differencing\n",
364 | "\n",
365 | " this_model.loc[mod_num, 'aic'] = model_results.aic\n",
366 | " this_model.loc[mod_num, 'bic'] = model_results.bic\n",
367 | "\n",
368 | " # this_model.loc[mod_num, 'het_method'] = stat['het_method']\n",
369 | " # this_model.loc[mod_num, 'het_stat'] = stat['het_stat']\n",
370 | " this_model.loc[mod_num, 'het_p'] = stat['het_p']\n",
371 | " # this_model.loc[mod_num, 'norm_method'] = stat['norm_method']\n",
372 | " # this_model.loc[mod_num, 'norm_stat'] = stat['norm_stat']\n",
373 | " this_model.loc[mod_num, 'norm_p'] = stat['norm_p']\n",
374 | " # this_model.loc[mod_num, 'skew'] = stat['skew']\n",
375 | " # this_model.loc[mod_num, 'kurtosis'] = stat['kurtosis']\n",
376 | " # this_model.loc[mod_num, 'sercor_method'] = stat['sercor_method']\n",
377 | " # this_model.loc[mod_num, 'sercor_stat'] = stat['sercor_stat']\n",
378 | " this_model.loc[mod_num, 'sercor_p'] = stat['sercor_p']\n",
379 | " this_model.loc[mod_num, 'dw_stat'] = stat['dw_stat']\n",
380 | " this_model.loc[mod_num, 'arroots_gt_1'] = stat['arroots_outside_unit_circle']\n",
381 | " this_model.loc[mod_num, 'maroots_gt_1'] = stat['maroots_outside_unit_circle']\n",
382 | "\n",
383 | " this_model.loc[mod_num, 'datetime_run'] = pd.to_datetime('today').strftime('%Y-%m-%d %H:%M:%S')\n",
384 | "\n",
385 | " df_results = df_results.append(this_model)\n",
386 | " mod_num+=1\n",
387 | " except:\n",
388 | " continue\n",
389 | " return df_results"
390 | ]
391 | },
392 | {
393 | "cell_type": "code",
394 | "execution_count": null,
395 | "metadata": {
396 | "collapsed": true
397 | },
398 | "outputs": [],
399 | "source": [
400 | "# load time series\n",
401 | "liquor = pd.read_csv('data/liquor.csv', header=0, index_col=0, parse_dates=[0])\n",
402 | "\n",
403 | "# Keey only the data from the last 10 years or so\n",
404 | "liquor = liquor.ix['2007':'2016']"
405 | ]
406 | },
407 | {
408 | "cell_type": "code",
409 | "execution_count": null,
410 | "metadata": {
411 | "collapsed": false
412 | },
413 | "outputs": [],
414 | "source": [
415 | "# plot\n",
416 | "tsplot(liquor['Value'], title='Liquor Sales (in millions of dollars), 2007-2016', lags=40);"
417 | ]
418 | },
419 | {
420 | "cell_type": "code",
421 | "execution_count": null,
422 | "metadata": {
423 | "collapsed": false
424 | },
425 | "outputs": [],
426 | "source": [
427 | "# Test stationarity\n",
428 | "\n",
429 | "test_stationarity(liquor['Value'])"
430 | ]
431 | },
432 | {
433 | "cell_type": "code",
434 | "execution_count": null,
435 | "metadata": {
436 | "collapsed": false
437 | },
438 | "outputs": [],
439 | "source": [
440 | "# Take first difference of the series\n",
441 | "test_stationarity(liquor['Value'].diff().dropna())"
442 | ]
443 | },
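{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Sketch: also check the seasonally differenced series (lag 12 for monthly data), which\n",
"# helps decide the seasonal differencing order D used in the grid search below.\n",
"test_stationarity(liquor['Value'].diff(12).dropna())"
]
},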
444 | {
445 | "cell_type": "code",
446 | "execution_count": null,
447 | "metadata": {
448 | "collapsed": false
449 | },
450 | "outputs": [],
451 | "source": [
452 | "# Take log of the series\n",
453 | "liquor['lnliquor'] = np.log(liquor)\n",
454 | "\n",
455 | "test_stationarity(liquor['lnliquor'])"
456 | ]
457 | },
458 | {
459 | "cell_type": "code",
460 | "execution_count": null,
461 | "metadata": {
462 | "collapsed": false
463 | },
464 | "outputs": [],
465 | "source": [
466 | "# Take first difference of the log series\n",
467 | "liquor_ln_diff = liquor['lnliquor'].diff()\n",
468 | "liquor_ln_diff = liquor_ln_diff.dropna()\n",
469 | "\n",
470 | "test_stationarity(liquor_ln_diff)"
471 | ]
472 | },
473 | {
474 | "cell_type": "code",
475 | "execution_count": null,
476 | "metadata": {
477 | "collapsed": false,
478 | "scrolled": false
479 | },
480 | "outputs": [],
481 | "source": [
482 | "# run model grid search\n",
483 | "\n",
484 | "p_min = 0\n",
485 | "d_min = 0\n",
486 | "q_min = 0\n",
487 | "p_max = 2\n",
488 | "d_max = 1\n",
489 | "q_max = 2\n",
490 | "\n",
491 | "sP_min = 0\n",
492 | "sD_min = 0\n",
493 | "sQ_min = 0\n",
494 | "sP_max = 1\n",
495 | "sD_max = 1\n",
496 | "sQ_max = 1\n",
497 | "\n",
498 | "s=12\n",
499 | "\n",
500 | "# trends=['n', 'c']\n",
501 | "trends=['n']\n",
502 | "\n",
503 | "enforce_stationarity=True\n",
504 | "enforce_invertibility=True\n",
505 | "simple_differencing=False\n",
506 | "\n",
507 | "plot_diagnostics=False\n",
508 | "\n",
509 | "verbose=False\n",
510 | "\n",
511 | "df_results = model_gridsearch(liquor['Value'],\n",
512 | " p_min,\n",
513 | " d_min,\n",
514 | " q_min,\n",
515 | " p_max,\n",
516 | " d_max,\n",
517 | " q_max,\n",
518 | " sP_min,\n",
519 | " sD_min,\n",
520 | " sQ_min,\n",
521 | " sP_max,\n",
522 | " sD_max,\n",
523 | " sQ_max,\n",
524 | " trends,\n",
525 | " s=s,\n",
526 | " enforce_stationarity=enforce_stationarity,\n",
527 | " enforce_invertibility=enforce_invertibility,\n",
528 | " simple_differencing=simple_differencing,\n",
529 | " plot_diagnostics=plot_diagnostics,\n",
530 | " verbose=verbose,\n",
531 | " )"
532 | ]
533 | },
534 | {
535 | "cell_type": "code",
536 | "execution_count": null,
537 | "metadata": {
538 | "collapsed": false
539 | },
540 | "outputs": [],
541 | "source": [
542 | "# choose a model\n",
543 | "\n",
544 | "df_results.sort_values(by='bic').head(10)"
545 | ]
546 | },
547 | {
548 | "cell_type": "markdown",
549 | "metadata": {},
550 | "source": [
551 | "### 5.2 Where to go from here"
552 | ]
553 | },
554 | {
555 | "cell_type": "markdown",
556 | "metadata": {},
557 | "source": [
558 | "#### Jupyter notebooks, presentations, blog posts\n",
559 | "\n",
560 | "- Example notebooks using SARIMAX models:\n",
561 | " - SARIMAX introduction\n",
562 | " - http://www.statsmodels.org/dev/examples/notebooks/generated/statespace_sarimax_stata.html\n",
563 | " - Model selection, missing data\n",
564 | " - http://www.statsmodels.org/dev/examples/notebooks/generated/statespace_sarimax_internet.html\n",
565 | "- \"Time series analysis and forecasting with statsmodels\" presentation, by a statsmodels lead contributor\n",
566 | " - https://josef-pkt.github.io/pages/slides/slides_forecasting.slides.html\n",
567 | "- Time series analysis using Pandas and statsmodels, by a statsmodels lead contributor\n",
568 | " - https://tomaugspurger.github.io/modern-7-timeseries.html\n",
569 | "- Time series analysis using Pandas and statsmodels\n",
570 | " - https://www.analyticsvidhya.com/blog/2016/02/time-series-forecasting-codes-python/\n",
571 | "- Simple example of SARIMA forecast (based on the analyticsvidhya.com post)\n",
572 | " - http://www.seanabu.com/2016/03/22/time-series-seasonal-ARIMA-model-in-python/\n",
573 | "\n",
574 | "\n",
575 | "#### Free textbooks available online\n",
576 | "\n",
577 | "- Forecasting: principles and practice (free textbook, with R code)\n",
578 | " - https://www.otexts.org/fpp/\n",
579 | "- Time Series Analysis and Its Applications: With R Examples (Shumway & Stoffer, EZ time series edition)\n",
580 | " - http://www.stat.pitt.edu/stoffer/tsa4/\n",
581 | "\n"
582 | ]
583 | }
584 | ],
585 | "metadata": {
586 | "kernelspec": {
587 | "display_name": "Python 3",
588 | "language": "python",
589 | "name": "python3"
590 | },
591 | "language_info": {
592 | "codemirror_mode": {
593 | "name": "ipython",
594 | "version": 3
595 | },
596 | "file_extension": ".py",
597 | "mimetype": "text/x-python",
598 | "name": "python",
599 | "nbconvert_exporter": "python",
600 | "pygments_lexer": "ipython3",
601 | "version": "3.5.2"
602 | }
603 | },
604 | "nbformat": 4,
605 | "nbformat_minor": 0
606 | }
607 |
--------------------------------------------------------------------------------
/data/HOUST.csv:
--------------------------------------------------------------------------------
1 | observation_date,HOUST
2 | 1/1/93,1210
3 | 2/1/93,1210
4 | 3/1/93,1083
5 | 4/1/93,1258
6 | 5/1/93,1260
7 | 6/1/93,1280
8 | 7/1/93,1254
9 | 8/1/93,1300
10 | 9/1/93,1343
11 | 10/1/93,1392
12 | 11/1/93,1376
13 | 12/1/93,1533
14 | 1/1/94,1272
15 | 2/1/94,1337
16 | 3/1/94,1564
17 | 4/1/94,1465
18 | 5/1/94,1526
19 | 6/1/94,1409
20 | 7/1/94,1439
21 | 8/1/94,1450
22 | 9/1/94,1474
23 | 10/1/94,1450
24 | 11/1/94,1511
25 | 12/1/94,1455
26 | 1/1/95,1407
27 | 2/1/95,1316
28 | 3/1/95,1249
29 | 4/1/95,1267
30 | 5/1/95,1314
31 | 6/1/95,1281
32 | 7/1/95,1461
33 | 8/1/95,1416
34 | 9/1/95,1369
35 | 10/1/95,1369
36 | 11/1/95,1452
37 | 12/1/95,1431
38 | 1/1/96,1467
39 | 2/1/96,1491
40 | 3/1/96,1424
41 | 4/1/96,1516
42 | 5/1/96,1504
43 | 6/1/96,1467
44 | 7/1/96,1472
45 | 8/1/96,1557
46 | 9/1/96,1475
47 | 10/1/96,1392
48 | 11/1/96,1489
49 | 12/1/96,1370
50 | 1/1/97,1355
51 | 2/1/97,1486
52 | 3/1/97,1457
53 | 4/1/97,1492
54 | 5/1/97,1442
55 | 6/1/97,1494
56 | 7/1/97,1437
57 | 8/1/97,1390
58 | 9/1/97,1546
59 | 10/1/97,1520
60 | 11/1/97,1510
61 | 12/1/97,1566
62 | 1/1/98,1525
63 | 2/1/98,1584
64 | 3/1/98,1567
65 | 4/1/98,1540
66 | 5/1/98,1536
67 | 6/1/98,1641
68 | 7/1/98,1698
69 | 8/1/98,1614
70 | 9/1/98,1582
71 | 10/1/98,1715
72 | 11/1/98,1660
73 | 12/1/98,1792
74 | 1/1/99,1748
75 | 2/1/99,1670
76 | 3/1/99,1710
77 | 4/1/99,1553
78 | 5/1/99,1611
79 | 6/1/99,1559
80 | 7/1/99,1669
81 | 8/1/99,1648
82 | 9/1/99,1635
83 | 10/1/99,1608
84 | 11/1/99,1648
85 | 12/1/99,1708
86 | 1/1/00,1636
87 | 2/1/00,1737
88 | 3/1/00,1604
89 | 4/1/00,1626
90 | 5/1/00,1575
91 | 6/1/00,1559
92 | 7/1/00,1463
93 | 8/1/00,1541
94 | 9/1/00,1507
95 | 10/1/00,1549
96 | 11/1/00,1551
97 | 12/1/00,1532
98 | 1/1/01,1600
99 | 2/1/01,1625
100 | 3/1/01,1590
101 | 4/1/01,1649
102 | 5/1/01,1605
103 | 6/1/01,1636
104 | 7/1/01,1670
105 | 8/1/01,1567
106 | 9/1/01,1562
107 | 10/1/01,1540
108 | 11/1/01,1602
109 | 12/1/01,1568
110 | 1/1/02,1698
111 | 2/1/02,1829
112 | 3/1/02,1642
113 | 4/1/02,1592
114 | 5/1/02,1764
115 | 6/1/02,1717
116 | 7/1/02,1655
117 | 8/1/02,1633
118 | 9/1/02,1804
119 | 10/1/02,1648
120 | 11/1/02,1753
121 | 12/1/02,1788
122 | 1/1/03,1853
123 | 2/1/03,1629
124 | 3/1/03,1726
125 | 4/1/03,1643
126 | 5/1/03,1751
127 | 6/1/03,1867
128 | 7/1/03,1897
129 | 8/1/03,1833
130 | 9/1/03,1939
131 | 10/1/03,1967
132 | 11/1/03,2083
133 | 12/1/03,2057
134 | 1/1/04,1911
135 | 2/1/04,1846
136 | 3/1/04,1998
137 | 4/1/04,2003
138 | 5/1/04,1981
139 | 6/1/04,1828
140 | 7/1/04,2002
141 | 8/1/04,2024
142 | 9/1/04,1905
143 | 10/1/04,2072
144 | 11/1/04,1782
145 | 12/1/04,2042
146 | 1/1/05,2144
147 | 2/1/05,2207
148 | 3/1/05,1864
149 | 4/1/05,2061
150 | 5/1/05,2025
151 | 6/1/05,2068
152 | 7/1/05,2054
153 | 8/1/05,2095
154 | 9/1/05,2151
155 | 10/1/05,2065
156 | 11/1/05,2147
157 | 12/1/05,1994
158 | 1/1/06,2273
159 | 2/1/06,2119
160 | 3/1/06,1969
161 | 4/1/06,1821
162 | 5/1/06,1942
163 | 6/1/06,1802
164 | 7/1/06,1737
165 | 8/1/06,1650
166 | 9/1/06,1720
167 | 10/1/06,1491
168 | 11/1/06,1570
169 | 12/1/06,1649
170 | 1/1/07,1409
171 | 2/1/07,1480
172 | 3/1/07,1495
173 | 4/1/07,1490
174 | 5/1/07,1415
175 | 6/1/07,1448
176 | 7/1/07,1354
177 | 8/1/07,1330
178 | 9/1/07,1183
179 | 10/1/07,1264
180 | 11/1/07,1197
181 | 12/1/07,1037
182 | 1/1/08,1084
183 | 2/1/08,1103
184 | 3/1/08,1005
185 | 4/1/08,1013
186 | 5/1/08,973
187 | 6/1/08,1046
188 | 7/1/08,923
189 | 8/1/08,844
190 | 9/1/08,820
191 | 10/1/08,777
192 | 11/1/08,652
193 | 12/1/08,560
194 | 1/1/09,490
195 | 2/1/09,582
196 | 3/1/09,505
197 | 4/1/09,478
198 | 5/1/09,540
199 | 6/1/09,585
200 | 7/1/09,594
201 | 8/1/09,586
202 | 9/1/09,585
203 | 10/1/09,534
204 | 11/1/09,588
205 | 12/1/09,581
206 | 1/1/10,614
207 | 2/1/10,604
208 | 3/1/10,636
209 | 4/1/10,687
210 | 5/1/10,583
211 | 6/1/10,536
212 | 7/1/10,546
213 | 8/1/10,599
214 | 9/1/10,594
215 | 10/1/10,543
216 | 11/1/10,545
217 | 12/1/10,539
218 | 1/1/11,630
219 | 2/1/11,517
220 | 3/1/11,600
221 | 4/1/11,554
222 | 5/1/11,561
223 | 6/1/11,608
224 | 7/1/11,623
225 | 8/1/11,585
226 | 9/1/11,650
227 | 10/1/11,610
228 | 11/1/11,711
229 | 12/1/11,694
230 | 1/1/12,723
231 | 2/1/12,704
232 | 3/1/12,695
233 | 4/1/12,753
234 | 5/1/12,708
235 | 6/1/12,757
236 | 7/1/12,740
237 | 8/1/12,754
238 | 9/1/12,847
239 | 10/1/12,915
240 | 11/1/12,833
241 | 12/1/12,976
242 | 1/1/13,888
243 | 2/1/13,970
244 | 3/1/13,999
245 | 4/1/13,826
246 | 5/1/13,920
247 | 6/1/13,852
248 | 7/1/13,891
249 | 8/1/13,898
250 | 9/1/13,860
251 | 10/1/13,921
252 | 11/1/13,1104
253 | 12/1/13,1010
254 | 1/1/14,902
255 | 2/1/14,948
256 | 3/1/14,973
257 | 4/1/14,1038
258 | 5/1/14,987
259 | 6/1/14,928
260 | 7/1/14,1085
261 | 8/1/14,984
262 | 9/1/14,999
263 | 10/1/14,1094
264 | 11/1/14,994
265 | 12/1/14,1081
266 | 1/1/15,1101
267 | 2/1/15,893
268 | 3/1/15,964
269 | 4/1/15,1192
270 | 5/1/15,1063
271 | 6/1/15,1213
272 | 7/1/15,1147
273 | 8/1/15,1132
274 | 9/1/15,1189
275 | 10/1/15,1073
276 | 11/1/15,1171
277 | 12/1/15,1160
278 | 1/1/16,1128
279 | 2/1/16,1213
280 | 3/1/16,1113
281 | 4/1/16,1155
282 | 5/1/16,1135
283 | 6/1/16,1189
--------------------------------------------------------------------------------
/data/TOTALSA.csv:
--------------------------------------------------------------------------------
1 | DATE,TOTALSA
2 | 1993-01-01,13.5
3 | 1993-02-01,13.0
4 | 1993-03-01,13.3
5 | 1993-04-01,14.5
6 | 1993-05-01,14.5
7 | 1993-06-01,14.5
8 | 1993-07-01,14.5
9 | 1993-08-01,13.6
10 | 1993-09-01,14.0
11 | 1993-10-01,14.9
12 | 1993-11-01,14.9
13 | 1993-12-01,15.0
14 | 1994-01-01,15.4
15 | 1994-02-01,15.5
16 | 1994-03-01,15.3
17 | 1994-04-01,16.0
18 | 1994-05-01,14.6
19 | 1994-06-01,15.1
20 | 1994-07-01,14.9
21 | 1994-08-01,15.4
22 | 1994-09-01,15.3
23 | 1994-10-01,15.9
24 | 1994-11-01,15.9
25 | 1994-12-01,15.6
26 | 1995-01-01,14.8
27 | 1995-02-01,14.9
28 | 1995-03-01,15.3
29 | 1995-04-01,14.4
30 | 1995-05-01,14.8
31 | 1995-06-01,15.4
32 | 1995-07-01,14.7
33 | 1995-08-01,15.4
34 | 1995-09-01,15.3
35 | 1995-10-01,14.9
36 | 1995-11-01,15.4
37 | 1995-12-01,16.3
38 | 1996-01-01,14.8
39 | 1996-02-01,15.6
40 | 1996-03-01,16.0
41 | 1996-04-01,15.5
42 | 1996-05-01,16.0
43 | 1996-06-01,15.3
44 | 1996-07-01,15.1
45 | 1996-08-01,15.5
46 | 1996-09-01,15.5
47 | 1996-10-01,15.3
48 | 1996-11-01,15.7
49 | 1996-12-01,15.2
50 | 1997-01-01,15.7
51 | 1997-02-01,15.3
52 | 1997-03-01,15.8
53 | 1997-04-01,15.1
54 | 1997-05-01,15.1
55 | 1997-06-01,14.5
56 | 1997-07-01,15.6
57 | 1997-08-01,16.1
58 | 1997-09-01,15.1
59 | 1997-10-01,15.4
60 | 1997-11-01,15.8
61 | 1997-12-01,16.5
62 | 1998-01-01,14.8
63 | 1998-02-01,15.2
64 | 1998-03-01,15.4
65 | 1998-04-01,15.9
66 | 1998-05-01,17.1
67 | 1998-06-01,16.8
68 | 1998-07-01,14.7
69 | 1998-08-01,14.8
70 | 1998-09-01,16.3
71 | 1998-10-01,17.1
72 | 1998-11-01,16.1
73 | 1998-12-01,17.4
74 | 1999-01-01,16.6
75 | 1999-02-01,17.1
76 | 1999-03-01,16.8
77 | 1999-04-01,16.9
78 | 1999-05-01,17.6
79 | 1999-06-01,17.3
80 | 1999-07-01,17.7
81 | 1999-08-01,17.6
82 | 1999-09-01,17.7
83 | 1999-10-01,17.7
84 | 1999-11-01,17.6
85 | 1999-12-01,18.3
86 | 2000-01-01,18.6
87 | 2000-02-01,19.4
88 | 2000-03-01,18.3
89 | 2000-04-01,17.9
90 | 2000-05-01,17.9
91 | 2000-06-01,17.6
92 | 2000-07-01,17.3
93 | 2000-08-01,17.5
94 | 2000-09-01,18.7
95 | 2000-10-01,17.5
96 | 2000-11-01,16.6
97 | 2000-12-01,16.2
98 | 2001-01-01,17.7
99 | 2001-02-01,17.8
100 | 2001-03-01,17.2
101 | 2001-04-01,16.9
102 | 2001-05-01,16.9
103 | 2001-06-01,17.5
104 | 2001-07-01,16.5
105 | 2001-08-01,16.3
106 | 2001-09-01,16.4
107 | 2001-10-01,22.1
108 | 2001-11-01,18.0
109 | 2001-12-01,16.5
110 | 2002-01-01,16.5
111 | 2002-02-01,17.3
112 | 2002-03-01,17.1
113 | 2002-04-01,17.7
114 | 2002-05-01,16.2
115 | 2002-06-01,16.9
116 | 2002-07-01,18.2
117 | 2002-08-01,18.4
118 | 2002-09-01,16.7
119 | 2002-10-01,16.3
120 | 2002-11-01,16.5
121 | 2002-12-01,17.9
122 | 2003-01-01,16.7
123 | 2003-02-01,16.1
124 | 2003-03-01,16.5
125 | 2003-04-01,16.7
126 | 2003-05-01,16.5
127 | 2003-06-01,17.0
128 | 2003-07-01,17.1
129 | 2003-08-01,18.3
130 | 2003-09-01,17.3
131 | 2003-10-01,16.5
132 | 2003-11-01,17.6
133 | 2003-12-01,17.4
134 | 2004-01-01,16.7
135 | 2004-02-01,17.0
136 | 2004-03-01,17.2
137 | 2004-04-01,16.9
138 | 2004-05-01,18.2
139 | 2004-06-01,16.2
140 | 2004-07-01,17.3
141 | 2004-08-01,17.2
142 | 2004-09-01,17.9
143 | 2004-10-01,17.5
144 | 2004-11-01,17.4
145 | 2004-12-01,18.1
146 | 2005-01-01,16.9
147 | 2005-02-01,16.9
148 | 2005-03-01,17.4
149 | 2005-04-01,17.8
150 | 2005-05-01,17.4
151 | 2005-06-01,18.5
152 | 2005-07-01,21.1
153 | 2005-08-01,17.4
154 | 2005-09-01,16.9
155 | 2005-10-01,15.3
156 | 2005-11-01,16.5
157 | 2005-12-01,17.2
158 | 2006-01-01,18.1
159 | 2006-02-01,17.1
160 | 2006-03-01,17.0
161 | 2006-04-01,17.1
162 | 2006-05-01,16.7
163 | 2006-06-01,16.9
164 | 2006-07-01,17.7
165 | 2006-08-01,16.5
166 | 2006-09-01,17.0
167 | 2006-10-01,16.9
168 | 2006-11-01,16.7
169 | 2006-12-01,17.1
170 | 2007-01-01,16.9
171 | 2007-02-01,17.2
172 | 2007-03-01,16.4
173 | 2007-04-01,16.6
174 | 2007-05-01,16.7
175 | 2007-06-01,16.2
176 | 2007-07-01,15.8
177 | 2007-08-01,16.4
178 | 2007-09-01,16.5
179 | 2007-10-01,16.5
180 | 2007-11-01,16.4
181 | 2007-12-01,16.0
182 | 2008-01-01,15.7
183 | 2008-02-01,15.5
184 | 2008-03-01,15.1
185 | 2008-04-01,14.6
186 | 2008-05-01,14.7
187 | 2008-06-01,14.4
188 | 2008-07-01,13.0
189 | 2008-08-01,14.1
190 | 2008-09-01,13.0
191 | 2008-10-01,10.9
192 | 2008-11-01,10.5
193 | 2008-12-01,10.4
194 | 2009-01-01,9.8
195 | 2009-02-01,9.2
196 | 2009-03-01,9.8
197 | 2009-04-01,9.4
198 | 2009-05-01,10.2
199 | 2009-06-01,10.1
200 | 2009-07-01,11.6
201 | 2009-08-01,14.8
202 | 2009-09-01,9.5
203 | 2009-10-01,10.6
204 | 2009-11-01,11.0
205 | 2009-12-01,11.3
206 | 2010-01-01,10.9
207 | 2010-02-01,10.3
208 | 2010-03-01,11.8
209 | 2010-04-01,11.5
210 | 2010-05-01,12.0
211 | 2010-06-01,11.6
212 | 2010-07-01,11.9
213 | 2010-08-01,12.0
214 | 2010-09-01,11.9
215 | 2010-10-01,12.4
216 | 2010-11-01,12.3
217 | 2010-12-01,12.6
218 | 2011-01-01,12.8
219 | 2011-02-01,13.1
220 | 2011-03-01,13.2
221 | 2011-04-01,13.4
222 | 2011-05-01,12.3
223 | 2011-06-01,11.9
224 | 2011-07-01,12.7
225 | 2011-08-01,12.6
226 | 2011-09-01,13.4
227 | 2011-10-01,13.7
228 | 2011-11-01,13.6
229 | 2011-12-01,13.8
230 | 2012-01-01,14.3
231 | 2012-02-01,14.9
232 | 2012-03-01,14.6
233 | 2012-04-01,14.8
234 | 2012-05-01,14.4
235 | 2012-06-01,14.5
236 | 2012-07-01,14.5
237 | 2012-08-01,14.5
238 | 2012-09-01,15.2
239 | 2012-10-01,14.8
240 | 2012-11-01,15.4
241 | 2012-12-01,15.6
242 | 2013-01-01,15.7
243 | 2013-02-01,15.9
244 | 2013-03-01,15.7
245 | 2013-04-01,15.8
246 | 2013-05-01,15.7
247 | 2013-06-01,16.1
248 | 2013-07-01,16.0
249 | 2013-08-01,16.0
250 | 2013-09-01,15.7
251 | 2013-10-01,15.7
252 | 2013-11-01,16.4
253 | 2013-12-01,15.8
254 | 2014-01-01,15.7
255 | 2014-02-01,15.9
256 | 2014-03-01,17.0
257 | 2014-04-01,16.7
258 | 2014-05-01,16.9
259 | 2014-06-01,17.2
260 | 2014-07-01,16.9
261 | 2014-08-01,17.6
262 | 2014-09-01,16.7
263 | 2014-10-01,16.8
264 | 2014-11-01,17.4
265 | 2014-12-01,17.4
266 | 2015-01-01,17.2
267 | 2015-02-01,16.9
268 | 2015-03-01,17.8
269 | 2015-04-01,17.3
270 | 2015-05-01,18.0
271 | 2015-06-01,17.5
272 | 2015-07-01,18.0
273 | 2015-08-01,18.2
274 | 2015-09-01,18.4
275 | 2015-10-01,18.5
276 | 2015-11-01,18.4
277 | 2015-12-01,17.9
278 | 2016-01-01,18.0
279 | 2016-02-01,18.0
280 | 2016-03-01,17.2
281 | 2016-04-01,17.9
282 | 2016-05-01,17.6
283 | 2016-06-01,17.1
284 |
--------------------------------------------------------------------------------
/data/TTLCON.csv:
--------------------------------------------------------------------------------
1 | DATE,TTLCON
2 | 1993-01-01,31283
3 | 1993-02-01,30264
4 | 1993-03-01,33794
5 | 1993-04-01,37257
6 | 1993-05-01,40124
7 | 1993-06-01,43842
8 | 1993-07-01,44682
9 | 1993-08-01,46866
10 | 1993-09-01,47755
11 | 1993-10-01,45361
12 | 1993-11-01,44786
13 | 1993-12-01,39535
14 | 1994-01-01,34917
15 | 1994-02-01,33222
16 | 1994-03-01,38215
17 | 1994-04-01,41754
18 | 1994-05-01,45711
19 | 1994-06-01,49032
20 | 1994-07-01,49545
21 | 1994-08-01,51516
22 | 1994-09-01,51718
23 | 1994-10-01,49135
24 | 1994-11-01,46719
25 | 1994-12-01,40406
26 | 1995-01-01,37206
27 | 1995-02-01,35250
28 | 1995-03-01,40016
29 | 1995-04-01,42824
30 | 1995-05-01,46615
31 | 1995-06-01,49804
32 | 1995-07-01,50219
33 | 1995-08-01,52628
34 | 1995-09-01,53028
35 | 1995-10-01,51372
36 | 1995-11-01,48163
37 | 1995-12-01,41542
38 | 1996-01-01,39356
39 | 1996-02-01,37391
40 | 1996-03-01,42052
41 | 1996-04-01,46865
42 | 1996-05-01,51598
43 | 1996-06-01,54703
44 | 1996-07-01,55430
45 | 1996-08-01,57488
46 | 1996-09-01,58544
47 | 1996-10-01,57456
48 | 1996-11-01,53498
49 | 1996-12-01,45313
50 | 1997-01-01,41882
51 | 1997-02-01,40788
52 | 1997-03-01,45704
53 | 1997-04-01,48998
54 | 1997-05-01,53617
55 | 1997-06-01,56977
56 | 1997-07-01,58826
57 | 1997-08-01,60682
58 | 1997-09-01,61323
59 | 1997-10-01,60053
60 | 1997-11-01,54968
61 | 1997-12-01,48031
62 | 1998-01-01,44409
63 | 1998-02-01,43150
64 | 1998-03-01,49602
65 | 1998-04-01,54017
66 | 1998-05-01,57723
67 | 1998-06-01,64393
68 | 1998-07-01,64381
69 | 1998-08-01,66137
70 | 1998-09-01,66544
71 | 1998-10-01,64600
72 | 1998-11-01,60469
73 | 1998-12-01,53095
74 | 1999-01-01,48746
75 | 1999-02-01,48824
76 | 1999-03-01,55607
77 | 1999-04-01,58654
78 | 1999-05-01,62209
79 | 1999-06-01,67545
80 | 1999-07-01,68522
81 | 1999-08-01,69975
82 | 1999-09-01,70034
83 | 1999-10-01,68669
84 | 1999-11-01,66684
85 | 1999-12-01,59082
86 | 2000-01-01,53782
87 | 2000-02-01,53993
88 | 2000-03-01,61295
89 | 2000-04-01,64524
90 | 2000-05-01,69253
91 | 2000-06-01,72715
92 | 2000-07-01,71774
93 | 2000-08-01,76132
94 | 2000-09-01,75153
95 | 2000-10-01,73550
96 | 2000-11-01,69677
97 | 2000-12-01,60910
98 | 2001-01-01,56855
99 | 2001-02-01,55529
100 | 2001-03-01,62808
101 | 2001-04-01,67787
102 | 2001-05-01,72920
103 | 2001-06-01,78004
104 | 2001-07-01,78281
105 | 2001-08-01,79916
106 | 2001-09-01,76605
107 | 2001-10-01,76302
108 | 2001-11-01,71543
109 | 2001-12-01,63697
110 | 2002-01-01,59516
111 | 2002-02-01,58588
112 | 2002-03-01,63782
113 | 2002-04-01,69504
114 | 2002-05-01,73384
115 | 2002-06-01,77182
116 | 2002-07-01,78863
117 | 2002-08-01,79460
118 | 2002-09-01,76542
119 | 2002-10-01,75710
120 | 2002-11-01,71362
121 | 2002-12-01,63984
122 | 2003-01-01,59877
123 | 2003-02-01,58526
124 | 2003-03-01,64506
125 | 2003-04-01,69638
126 | 2003-05-01,74473
127 | 2003-06-01,80377
128 | 2003-07-01,82971
129 | 2003-08-01,85191
130 | 2003-09-01,83841
131 | 2003-10-01,83133
132 | 2003-11-01,77915
133 | 2003-12-01,71050
134 | 2004-01-01,64934
135 | 2004-02-01,64138
136 | 2004-03-01,73238
137 | 2004-04-01,78354
138 | 2004-05-01,83736
139 | 2004-06-01,89932
140 | 2004-07-01,93614
141 | 2004-08-01,96164
142 | 2004-09-01,92538
143 | 2004-10-01,90582
144 | 2004-11-01,86394
145 | 2004-12-01,77733
146 | 2005-01-01,72458
147 | 2005-02-01,73094
148 | 2005-03-01,81791
149 | 2005-04-01,88032
150 | 2005-05-01,93704
151 | 2005-06-01,100678
152 | 2005-07-01,103875
153 | 2005-08-01,107453
154 | 2005-09-01,104682
155 | 2005-10-01,104039
156 | 2005-11-01,98348
157 | 2005-12-01,88657
158 | 2006-01-01,82400
159 | 2006-02-01,82381
160 | 2006-03-01,92354
161 | 2006-04-01,97056
162 | 2006-05-01,101862
163 | 2006-06-01,106777
164 | 2006-07-01,107150
165 | 2006-08-01,108598
166 | 2006-09-01,103102
167 | 2006-10-01,100721
168 | 2006-11-01,93850
169 | 2006-12-01,85031
170 | 2007-01-01,79009
171 | 2007-02-01,78501
172 | 2007-03-01,87421
173 | 2007-04-01,93644
174 | 2007-05-01,99690
175 | 2007-06-01,105020
176 | 2007-07-01,106779
177 | 2007-08-01,109573
178 | 2007-09-01,104744
179 | 2007-10-01,104313
180 | 2007-11-01,94934
181 | 2007-12-01,84325
182 | 2008-01-01,78039
183 | 2008-02-01,77921
184 | 2008-03-01,83384
185 | 2008-04-01,89092
186 | 2008-05-01,93316
187 | 2008-06-01,97882
188 | 2008-07-01,100234
189 | 2008-08-01,100366
190 | 2008-09-01,97303
191 | 2008-10-01,96609
192 | 2008-11-01,86093
193 | 2008-12-01,77112
194 | 2009-01-01,67301
195 | 2009-02-01,67030
196 | 2009-03-01,71985
197 | 2009-04-01,75043
198 | 2009-05-01,76661
199 | 2009-06-01,82089
200 | 2009-07-01,83885
201 | 2009-08-01,84426
202 | 2009-09-01,82037
203 | 2009-10-01,79952
204 | 2009-11-01,71527
205 | 2009-12-01,64608
206 | 2010-01-01,55362
207 | 2010-02-01,54986
208 | 2010-03-01,60990
209 | 2010-04-01,66565
210 | 2010-05-01,68903
211 | 2010-06-01,74806
212 | 2010-07-01,73918
213 | 2010-08-01,76554
214 | 2010-09-01,75818
215 | 2010-10-01,73386
216 | 2010-11-01,67318
217 | 2010-12-01,60648
218 | 2011-01-01,50973
219 | 2011-02-01,51017
220 | 2011-03-01,57148
221 | 2011-04-01,61590
222 | 2011-05-01,65430
223 | 2011-06-01,72495
224 | 2011-07-01,72253
225 | 2011-08-01,76986
226 | 2011-09-01,75871
227 | 2011-10-01,73783
228 | 2011-11-01,68203
229 | 2011-12-01,62582
230 | 2012-01-01,56608
231 | 2012-02-01,56994
232 | 2012-03-01,61803
233 | 2012-04-01,67002
234 | 2012-05-01,72228
235 | 2012-06-01,77580
236 | 2012-07-01,78305
237 | 2012-08-01,81152
238 | 2012-09-01,79404
239 | 2012-10-01,80287
240 | 2012-11-01,73071
241 | 2012-12-01,66022
242 | 2013-01-01,58821
243 | 2013-02-01,58898
244 | 2013-03-01,64190
245 | 2013-04-01,70601
246 | 2013-05-01,75775
247 | 2013-06-01,80997
248 | 2013-07-01,84346
249 | 2013-08-01,86776
250 | 2013-09-01,85825
251 | 2013-10-01,86551
252 | 2013-11-01,79695
253 | 2013-12-01,73876
254 | 2014-01-01,67391
255 | 2014-02-01,66916
256 | 2014-03-01,73899
257 | 2014-04-01,80749
258 | 2014-05-01,85599
259 | 2014-06-01,90712
260 | 2014-07-01,92585
261 | 2014-08-01,93261
262 | 2014-09-01,93193
263 | 2014-10-01,94888
264 | 2014-11-01,86262
265 | 2014-12-01,80174
266 | 2015-01-01,71447
267 | 2015-02-01,71159
268 | 2015-03-01,79751
269 | 2015-04-01,88308
270 | 2015-05-01,94756
271 | 2015-06-01,102716
272 | 2015-07-01,104958
273 | 2015-08-01,106856
274 | 2015-09-01,106868
275 | 2015-10-01,103844
276 | 2015-11-01,94876
277 | 2015-12-01,86894
278 | 2016-01-01,78610
279 | 2016-02-01,79903
280 | 2016-03-01,88984
281 | 2016-04-01,92208
282 | 2016-05-01,97406
283 | 2016-06-01,102696
284 |
--------------------------------------------------------------------------------
/data/international-airline-passengers.csv:
--------------------------------------------------------------------------------
1 | "Month","n_pass_thousands"
2 | "1949-01",112
3 | "1949-02",118
4 | "1949-03",132
5 | "1949-04",129
6 | "1949-05",121
7 | "1949-06",135
8 | "1949-07",148
9 | "1949-08",148
10 | "1949-09",136
11 | "1949-10",119
12 | "1949-11",104
13 | "1949-12",118
14 | "1950-01",115
15 | "1950-02",126
16 | "1950-03",141
17 | "1950-04",135
18 | "1950-05",125
19 | "1950-06",149
20 | "1950-07",170
21 | "1950-08",170
22 | "1950-09",158
23 | "1950-10",133
24 | "1950-11",114
25 | "1950-12",140
26 | "1951-01",145
27 | "1951-02",150
28 | "1951-03",178
29 | "1951-04",163
30 | "1951-05",172
31 | "1951-06",178
32 | "1951-07",199
33 | "1951-08",199
34 | "1951-09",184
35 | "1951-10",162
36 | "1951-11",146
37 | "1951-12",166
38 | "1952-01",171
39 | "1952-02",180
40 | "1952-03",193
41 | "1952-04",181
42 | "1952-05",183
43 | "1952-06",218
44 | "1952-07",230
45 | "1952-08",242
46 | "1952-09",209
47 | "1952-10",191
48 | "1952-11",172
49 | "1952-12",194
50 | "1953-01",196
51 | "1953-02",196
52 | "1953-03",236
53 | "1953-04",235
54 | "1953-05",229
55 | "1953-06",243
56 | "1953-07",264
57 | "1953-08",272
58 | "1953-09",237
59 | "1953-10",211
60 | "1953-11",180
61 | "1953-12",201
62 | "1954-01",204
63 | "1954-02",188
64 | "1954-03",235
65 | "1954-04",227
66 | "1954-05",234
67 | "1954-06",264
68 | "1954-07",302
69 | "1954-08",293
70 | "1954-09",259
71 | "1954-10",229
72 | "1954-11",203
73 | "1954-12",229
74 | "1955-01",242
75 | "1955-02",233
76 | "1955-03",267
77 | "1955-04",269
78 | "1955-05",270
79 | "1955-06",315
80 | "1955-07",364
81 | "1955-08",347
82 | "1955-09",312
83 | "1955-10",274
84 | "1955-11",237
85 | "1955-12",278
86 | "1956-01",284
87 | "1956-02",277
88 | "1956-03",317
89 | "1956-04",313
90 | "1956-05",318
91 | "1956-06",374
92 | "1956-07",413
93 | "1956-08",405
94 | "1956-09",355
95 | "1956-10",306
96 | "1956-11",271
97 | "1956-12",306
98 | "1957-01",315
99 | "1957-02",301
100 | "1957-03",356
101 | "1957-04",348
102 | "1957-05",355
103 | "1957-06",422
104 | "1957-07",465
105 | "1957-08",467
106 | "1957-09",404
107 | "1957-10",347
108 | "1957-11",305
109 | "1957-12",336
110 | "1958-01",340
111 | "1958-02",318
112 | "1958-03",362
113 | "1958-04",348
114 | "1958-05",363
115 | "1958-06",435
116 | "1958-07",491
117 | "1958-08",505
118 | "1958-09",404
119 | "1958-10",359
120 | "1958-11",310
121 | "1958-12",337
122 | "1959-01",360
123 | "1959-02",342
124 | "1959-03",406
125 | "1959-04",396
126 | "1959-05",420
127 | "1959-06",472
128 | "1959-07",548
129 | "1959-08",559
130 | "1959-09",463
131 | "1959-10",407
132 | "1959-11",362
133 | "1959-12",405
134 | "1960-01",417
135 | "1960-02",391
136 | "1960-03",419
137 | "1960-04",461
138 | "1960-05",472
139 | "1960-06",535
140 | "1960-07",622
141 | "1960-08",606
142 | "1960-09",508
143 | "1960-10",461
144 | "1960-11",390
145 | "1960-12",432
146 |
--------------------------------------------------------------------------------
/data/liquor.csv:
--------------------------------------------------------------------------------
1 | Period,Value
2 | 1/1/92,1509
3 | 2/1/92,1541
4 | 3/1/92,1597
5 | 4/1/92,1675
6 | 5/1/92,1822
7 | 6/1/92,1775
8 | 7/1/92,1912
9 | 8/1/92,1862
10 | 9/1/92,1770
11 | 10/1/92,1882
12 | 11/1/92,1831
13 | 12/1/92,2511
14 | 1/1/93,1614
15 | 2/1/93,1529
16 | 3/1/93,1678
17 | 4/1/93,1713
18 | 5/1/93,1796
19 | 6/1/93,1792
20 | 7/1/93,1950
21 | 8/1/93,1777
22 | 9/1/93,1707
23 | 10/1/93,1757
24 | 11/1/93,1782
25 | 12/1/93,2443
26 | 1/1/94,1548
27 | 2/1/94,1505
28 | 3/1/94,1714
29 | 4/1/94,1757
30 | 5/1/94,1830
31 | 6/1/94,1857
32 | 7/1/94,1981
33 | 8/1/94,1858
34 | 9/1/94,1823
35 | 10/1/94,1806
36 | 11/1/94,1845
37 | 12/1/94,2577
38 | 1/1/95,1555
39 | 2/1/95,1501
40 | 3/1/95,1725
41 | 4/1/95,1699
42 | 5/1/95,1807
43 | 6/1/95,1863
44 | 7/1/95,1886
45 | 8/1/95,1861
46 | 9/1/95,1845
47 | 10/1/95,1788
48 | 11/1/95,1879
49 | 12/1/95,2598
50 | 1/1/96,1679
51 | 2/1/96,1652
52 | 3/1/96,1837
53 | 4/1/96,1798
54 | 5/1/96,1957
55 | 6/1/96,1958
56 | 7/1/96,2034
57 | 8/1/96,2062
58 | 9/1/96,1781
59 | 10/1/96,1860
60 | 11/1/96,1992
61 | 12/1/96,2547
62 | 1/1/97,1706
63 | 2/1/97,1621
64 | 3/1/97,1853
65 | 4/1/97,1817
66 | 5/1/97,2060
67 | 6/1/97,2002
68 | 7/1/97,2098
69 | 8/1/97,2079
70 | 9/1/97,1892
71 | 10/1/97,2050
72 | 11/1/97,2082
73 | 12/1/97,2821
74 | 1/1/98,1846
75 | 2/1/98,1768
76 | 3/1/98,1894
77 | 4/1/98,1963
78 | 5/1/98,2140
79 | 6/1/98,2059
80 | 7/1/98,2209
81 | 8/1/98,2118
82 | 9/1/98,2031
83 | 10/1/98,2163
84 | 11/1/98,2154
85 | 12/1/98,3037
86 | 1/1/99,1866
87 | 2/1/99,1808
88 | 3/1/99,1986
89 | 4/1/99,2099
90 | 5/1/99,2210
91 | 6/1/99,2145
92 | 7/1/99,2339
93 | 8/1/99,2140
94 | 9/1/99,2126
95 | 10/1/99,2219
96 | 11/1/99,2273
97 | 12/1/99,3265
98 | 1/1/00,1920
99 | 2/1/00,1976
100 | 3/1/00,2190
101 | 4/1/00,2132
102 | 5/1/00,2357
103 | 6/1/00,2413
104 | 7/1/00,2463
105 | 8/1/00,2422
106 | 9/1/00,2358
107 | 10/1/00,2352
108 | 11/1/00,2549
109 | 12/1/00,3375
110 | 1/1/01,2109
111 | 2/1/01,2052
112 | 3/1/01,2327
113 | 4/1/01,2231
114 | 5/1/01,2470
115 | 6/1/01,2526
116 | 7/1/01,2483
117 | 8/1/01,2518
118 | 9/1/01,2316
119 | 10/1/01,2409
120 | 11/1/01,2638
121 | 12/1/01,3542
122 | 1/1/02,2114
123 | 2/1/02,2109
124 | 3/1/02,2366
125 | 4/1/02,2300
126 | 5/1/02,2569
127 | 6/1/02,2486
128 | 7/1/02,2568
129 | 8/1/02,2595
130 | 9/1/02,2297
131 | 10/1/02,2401
132 | 11/1/02,2601
133 | 12/1/02,3488
134 | 1/1/03,2121
135 | 2/1/03,2046
136 | 3/1/03,2273
137 | 4/1/03,2333
138 | 5/1/03,2576
139 | 6/1/03,2433
140 | 7/1/03,2611
141 | 8/1/03,2660
142 | 9/1/03,2461
143 | 10/1/03,2641
144 | 11/1/03,2660
145 | 12/1/03,3654
146 | 1/1/04,2293
147 | 2/1/04,2219
148 | 3/1/04,2398
149 | 4/1/04,2553
150 | 5/1/04,2685
151 | 6/1/04,2643
152 | 7/1/04,2867
153 | 8/1/04,2622
154 | 9/1/04,2618
155 | 10/1/04,2727
156 | 11/1/04,2763
157 | 12/1/04,3801
158 | 1/1/05,2219
159 | 2/1/05,2316
160 | 3/1/05,2530
161 | 4/1/05,2640
162 | 5/1/05,2709
163 | 6/1/05,2783
164 | 7/1/05,2924
165 | 8/1/05,2791
166 | 9/1/05,2784
167 | 10/1/05,2801
168 | 11/1/05,2933
169 | 12/1/05,4137
170 | 1/1/06,2424
171 | 2/1/06,2519
172 | 3/1/06,2753
173 | 4/1/06,2791
174 | 5/1/06,3017
175 | 6/1/06,3055
176 | 7/1/06,3117
177 | 8/1/06,3024
178 | 9/1/06,2997
179 | 10/1/06,2913
180 | 11/1/06,3137
181 | 12/1/06,4269
182 | 1/1/07,2569
183 | 2/1/07,2603
184 | 3/1/07,3005
185 | 4/1/07,2867
186 | 5/1/07,3262
187 | 6/1/07,3364
188 | 7/1/07,3322
189 | 8/1/07,3292
190 | 9/1/07,3057
191 | 10/1/07,3087
192 | 11/1/07,3297
193 | 12/1/07,4403
194 | 1/1/08,2675
195 | 2/1/08,2806
196 | 3/1/08,2989
197 | 4/1/08,2997
198 | 5/1/08,3420
199 | 6/1/08,3280
200 | 7/1/08,3517
201 | 8/1/08,3473
202 | 9/1/08,3150
203 | 10/1/08,3351
204 | 11/1/08,3387
205 | 12/1/08,4459
206 | 1/1/09,2912
207 | 2/1/09,2781
208 | 3/1/09,3024
209 | 4/1/09,3130
210 | 5/1/09,3467
211 | 6/1/09,3306
212 | 7/1/09,3556
213 | 8/1/09,3399
214 | 9/1/09,3263
215 | 10/1/09,3425
216 | 11/1/09,3356
217 | 12/1/09,4626
218 | 1/1/10,2877
219 | 2/1/10,2916
220 | 3/1/10,3214
221 | 4/1/10,3310
222 | 5/1/10,3466
223 | 6/1/10,3438
224 | 7/1/10,3657
225 | 8/1/10,3455
226 | 9/1/10,3365
227 | 10/1/10,3497
228 | 11/1/10,3524
229 | 12/1/10,4683
230 | 1/1/11,2888
231 | 2/1/11,2985
232 | 3/1/11,3249
233 | 4/1/11,3363
234 | 5/1/11,3471
235 | 6/1/11,3551
236 | 7/1/11,3740
237 | 8/1/11,3576
238 | 9/1/11,3517
239 | 10/1/11,3515
240 | 11/1/11,3646
241 | 12/1/11,4892
242 | 1/1/12,2995
243 | 2/1/12,3202
244 | 3/1/12,3549
245 | 4/1/12,3409
246 | 5/1/12,3786
247 | 6/1/12,3815
248 | 7/1/12,3733
249 | 8/1/12,3752
250 | 9/1/12,3503
251 | 10/1/12,3626
252 | 11/1/12,3869
253 | 12/1/12,5126
254 | 1/1/13,3158
255 | 2/1/13,3231
256 | 3/1/13,3625
257 | 4/1/13,3465
258 | 5/1/13,3973
259 | 6/1/13,3817
260 | 7/1/13,4010
261 | 8/1/13,4078
262 | 9/1/13,3643
263 | 10/1/13,3799
264 | 11/1/13,4043
265 | 12/1/13,5235
266 | 1/1/14,3373
267 | 2/1/14,3353
268 | 3/1/14,3679
269 | 4/1/14,3699
270 | 5/1/14,4187
271 | 6/1/14,4086
272 | 7/1/14,4240
273 | 8/1/14,4216
274 | 9/1/14,3856
275 | 10/1/14,4087
276 | 11/1/14,4133
277 | 12/1/14,5606
278 | 1/1/15,3576
279 | 2/1/15,3517
280 | 3/1/15,3881
281 | 4/1/15,3864
282 | 5/1/15,4369
283 | 6/1/15,4241
284 | 7/1/15,4524
285 | 8/1/15,4248
286 | 9/1/15,4091
287 | 10/1/15,4291
288 | 11/1/15,4241
289 | 12/1/15,5834
290 | 1/1/16,3559
291 | 2/1/16,3718
292 | 3/1/16,3986
293 | 4/1/16,4043
294 | 5/1/16,4311
--------------------------------------------------------------------------------
/data/mixedGLB.Ts.ERSSTV4.GHCN.CL.PA.csv:
--------------------------------------------------------------------------------
1 | Land-Ocean: Global Means
2 | Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec,J-D,D-N,DJF,MAM,JJA,SON
3 | 1880,-.30,-.20,-.18,-.28,-.14,-.29,-.24,-.08,-.17,-.16,-.19,-.21,-.20,***,***,-.20,-.20,-.17
4 | 1881,-.09,-.14,.01,-.03,-.04,-.28,-.06,-.02,-.09,-.19,-.26,-.15,-.11,-.12,-.15,-.02,-.12,-.18
5 | 1882,.10,.09,.02,-.20,-.17,-.25,-.10,.04,-.01,-.22,-.21,-.25,-.10,-.09,.01,-.12,-.10,-.14
6 | 1883,-.33,-.42,-.17,-.24,-.25,-.11,-.08,-.13,-.18,-.11,-.20,-.18,-.20,-.20,-.33,-.22,-.11,-.16
7 | 1884,-.18,-.11,-.34,-.36,-.31,-.38,-.34,-.25,-.23,-.22,-.30,-.29,-.27,-.27,-.15,-.34,-.32,-.25
8 | 1885,-.64,-.29,-.23,-.44,-.41,-.50,-.28,-.27,-.19,-.18,-.22,-.05,-.31,-.33,-.41,-.36,-.35,-.20
9 | 1886,-.41,-.45,-.41,-.29,-.27,-.39,-.16,-.32,-.19,-.25,-.26,-.25,-.30,-.29,-.30,-.32,-.29,-.23
10 | 1887,-.66,-.48,-.31,-.37,-.33,-.20,-.19,-.27,-.20,-.32,-.26,-.37,-.33,-.32,-.46,-.34,-.22,-.26
11 | 1888,-.43,-.42,-.47,-.28,-.22,-.20,-.09,-.11,-.08,.01,-.01,-.13,-.20,-.22,-.41,-.33,-.14,-.02
12 | 1889,-.20,.14,.04,.04,-.03,-.12,-.05,-.18,-.18,-.22,-.30,-.31,-.11,-.10,-.06,.02,-.11,-.23
13 | 1890,-.48,-.47,-.41,-.38,-.48,-.27,-.29,-.36,-.36,-.23,-.37,-.30,-.37,-.37,-.42,-.42,-.31,-.32
14 | 1891,-.46,-.48,-.15,-.25,-.17,-.22,-.22,-.21,-.13,-.23,-.37,-.01,-.24,-.27,-.41,-.19,-.22,-.25
15 | 1892,-.26,-.15,-.36,-.35,-.25,-.20,-.28,-.20,-.25,-.16,-.50,-.29,-.27,-.25,-.14,-.32,-.22,-.30
16 | 1893,-.68,-.51,-.24,-.33,-.35,-.23,-.13,-.24,-.18,-.16,-.17,-.37,-.30,-.29,-.49,-.30,-.20,-.17
17 | 1894,-.55,-.32,-.21,-.42,-.30,-.43,-.32,-.28,-.22,-.16,-.25,-.22,-.31,-.32,-.41,-.31,-.34,-.21
18 | 1895,-.44,-.43,-.29,-.23,-.23,-.25,-.17,-.16,-.02,-.10,-.16,-.12,-.22,-.22,-.36,-.25,-.19,-.09
19 | 1896,-.23,-.15,-.30,-.32,-.20,-.13,-.06,-.09,-.04,.04,-.16,-.12,-.15,-.15,-.17,-.27,-.10,-.05
20 | 1897,-.23,-.19,-.12,-.01,.00,-.12,-.05,-.03,-.04,-.09,-.17,-.25,-.11,-.10,-.18,-.05,-.07,-.10
21 | 1898,-.06,-.32,-.56,-.34,-.36,-.21,-.22,-.23,-.19,-.31,-.36,-.21,-.28,-.28,-.21,-.42,-.22,-.29
22 | 1899,-.18,-.40,-.34,-.21,-.20,-.25,-.13,-.05,-.01,.00,.13,-.26,-.16,-.15,-.26,-.25,-.14,.04
23 | 1900,-.39,-.07,.02,-.15,-.06,-.15,-.08,-.03,.02,.09,-.13,-.14,-.09,-.10,-.24,-.06,-.09,-.01
24 | 1901,-.27,-.05,.05,-.05,-.17,-.10,-.08,-.12,-.16,-.29,-.17,-.29,-.14,-.13,-.15,-.06,-.10,-.20
25 | 1902,-.18,-.03,-.29,-.28,-.31,-.33,-.25,-.28,-.21,-.26,-.36,-.45,-.27,-.25,-.17,-.29,-.29,-.27
26 | 1903,-.27,-.05,-.23,-.40,-.41,-.44,-.30,-.43,-.42,-.41,-.38,-.47,-.35,-.35,-.26,-.34,-.39,-.40
27 | 1904,-.63,-.53,-.45,-.50,-.50,-.49,-.47,-.43,-.46,-.34,-.16,-.29,-.44,-.45,-.54,-.48,-.47,-.32
28 | 1905,-.37,-.58,-.24,-.37,-.33,-.31,-.24,-.20,-.14,-.24,-.09,-.21,-.28,-.28,-.41,-.31,-.25,-.16
29 | 1906,-.30,-.32,-.15,-.03,-.20,-.21,-.25,-.18,-.25,-.19,-.38,-.18,-.22,-.22,-.28,-.13,-.21,-.28
30 | 1907,-.43,-.51,-.23,-.39,-.46,-.43,-.35,-.37,-.32,-.24,-.51,-.50,-.40,-.37,-.37,-.36,-.38,-.36
31 | 1908,-.46,-.35,-.58,-.46,-.40,-.39,-.35,-.45,-.33,-.43,-.50,-.49,-.43,-.43,-.44,-.48,-.40,-.42
32 | 1909,-.70,-.47,-.52,-.59,-.53,-.51,-.42,-.30,-.37,-.38,-.31,-.54,-.47,-.47,-.55,-.55,-.41,-.35
33 | 1910,-.43,-.42,-.48,-.38,-.33,-.37,-.31,-.34,-.37,-.38,-.56,-.69,-.42,-.41,-.46,-.40,-.34,-.44
34 | 1911,-.63,-.59,-.62,-.55,-.51,-.46,-.40,-.43,-.38,-.26,-.20,-.24,-.44,-.48,-.64,-.56,-.43,-.28
35 | 1912,-.27,-.15,-.38,-.21,-.19,-.25,-.41,-.52,-.47,-.55,-.37,-.42,-.35,-.33,-.22,-.26,-.39,-.47
36 | 1913,-.41,-.43,-.43,-.36,-.45,-.46,-.35,-.33,-.34,-.35,-.17,-.02,-.34,-.37,-.42,-.41,-.38,-.28
37 | 1914,.02,-.14,-.23,-.27,-.20,-.23,-.24,-.14,-.13,-.05,-.21,-.09,-.16,-.15,-.05,-.23,-.20,-.13
38 | 1915,-.18,-.01,-.11,.08,-.03,-.14,-.02,-.16,-.12,-.22,-.14,-.25,-.11,-.09,-.09,-.02,-.11,-.16
39 | 1916,-.19,-.20,-.32,-.25,-.26,-.42,-.33,-.26,-.28,-.27,-.42,-.78,-.33,-.29,-.21,-.28,-.34,-.32
40 | 1917,-.47,-.55,-.48,-.39,-.48,-.40,-.22,-.26,-.18,-.35,-.27,-.72,-.40,-.40,-.60,-.45,-.29,-.27
41 | 1918,-.44,-.30,-.19,-.40,-.37,-.28,-.19,-.26,-.14,-.04,-.16,-.29,-.25,-.29,-.49,-.32,-.24,-.11
42 | 1919,-.19,-.22,-.26,-.19,-.19,-.27,-.21,-.19,-.18,-.15,-.30,-.35,-.22,-.22,-.23,-.21,-.23,-.21
43 | 1920,-.14,-.23,-.06,-.26,-.25,-.33,-.31,-.29,-.19,-.28,-.33,-.46,-.26,-.25,-.24,-.19,-.31,-.27
44 | 1921,-.04,-.21,-.28,-.36,-.36,-.31,-.15,-.23,-.17,-.06,-.16,-.18,-.21,-.23,-.24,-.33,-.23,-.13
45 | 1922,-.32,-.42,-.12,-.21,-.34,-.32,-.25,-.30,-.28,-.32,-.16,-.16,-.27,-.27,-.30,-.22,-.29,-.25
46 | 1923,-.25,-.36,-.31,-.37,-.33,-.23,-.28,-.28,-.27,-.11,.04,-.04,-.23,-.24,-.26,-.33,-.26,-.11
47 | 1924,-.22,-.26,-.12,-.34,-.18,-.27,-.25,-.34,-.30,-.35,-.23,-.41,-.27,-.24,-.17,-.21,-.29,-.29
48 | 1925,-.32,-.33,-.22,-.24,-.30,-.33,-.29,-.18,-.13,-.17,.04,.10,-.20,-.24,-.35,-.25,-.26,-.09
49 | 1926,.21,.08,.13,-.14,-.24,-.24,-.20,-.10,-.10,-.11,-.07,-.30,-.09,-.06,.13,-.08,-.18,-.09
50 | 1927,-.28,-.20,-.38,-.31,-.25,-.27,-.15,-.19,-.05,.00,-.04,-.36,-.21,-.20,-.26,-.32,-.20,-.03
51 | 1928,-.02,-.11,-.28,-.29,-.29,-.41,-.21,-.25,-.20,-.19,-.10,-.20,-.21,-.23,-.16,-.29,-.29,-.16
52 | 1929,-.47,-.58,-.35,-.42,-.39,-.43,-.33,-.29,-.24,-.15,-.15,-.53,-.36,-.33,-.42,-.39,-.35,-.18
53 | 1930,-.29,-.24,-.09,-.26,-.25,-.19,-.16,-.10,-.11,-.08,.14,-.08,-.14,-.18,-.35,-.20,-.15,-.02
54 | 1931,-.10,-.20,-.06,-.20,-.22,-.06,.01,-.01,-.07,-.01,-.11,-.10,-.09,-.09,-.13,-.16,-.02,-.06
55 | 1932,.14,-.17,-.20,-.08,-.23,-.30,-.24,-.24,-.13,-.10,-.27,-.23,-.17,-.16,-.04,-.17,-.26,-.16
56 | 1933,-.32,-.31,-.28,-.24,-.25,-.32,-.20,-.23,-.27,-.24,-.32,-.47,-.29,-.27,-.28,-.26,-.25,-.28
57 | 1934,-.26,-.04,-.29,-.28,-.11,-.14,-.11,-.11,-.16,-.11,-.01,-.08,-.14,-.17,-.26,-.23,-.12,-.09
58 | 1935,-.38,.11,-.14,-.35,-.26,-.23,-.19,-.17,-.17,-.08,-.28,-.21,-.20,-.19,-.12,-.25,-.20,-.18
59 | 1936,-.29,-.39,-.25,-.20,-.16,-.19,-.06,-.12,-.05,-.03,-.05,-.03,-.15,-.17,-.30,-.20,-.12,-.04
60 | 1937,-.08,.06,-.17,-.16,-.06,-.07,-.04,.03,.14,.09,.10,-.10,-.02,-.02,-.02,-.13,-.03,.11
61 | 1938,.02,-.03,.06,.05,-.07,-.17,-.08,-.04,.04,.12,.02,-.24,-.03,-.02,-.03,.01,-.10,.06
62 | 1939,-.12,-.11,-.20,-.12,-.07,-.08,-.06,-.05,.00,-.03,.06,.40,-.03,-.09,-.16,-.13,-.06,.01
63 | 1940,-.14,.06,.11,.16,.06,.05,.10,.01,.12,.08,.12,.19,.08,.09,.11,.11,.05,.11
64 | 1941,.12,.22,.05,.10,.10,.04,.16,.14,.02,.25,.12,.14,.12,.13,.18,.08,.11,.13
65 | 1942,.28,.06,.11,.13,.14,.10,.02,-.03,.00,.07,.13,.13,.09,.10,.16,.13,.03,.06
66 | 1943,.00,.22,.02,.14,.10,.00,.14,.03,.11,.31,.26,.28,.13,.12,.11,.09,.06,.22
67 | 1944,.42,.32,.34,.26,.26,.23,.23,.23,.31,.28,.13,.07,.26,.27,.34,.29,.23,.24
68 | 1945,.14,.04,.11,.24,.11,.03,.08,.25,.22,.22,.09,-.09,.12,.13,.09,.15,.12,.18
69 | 1946,.15,.05,.00,.11,-.03,-.17,-.09,-.09,-.02,-.06,-.02,-.28,-.04,-.02,.04,.03,-.11,-.03
70 | 1947,-.10,-.06,.06,.04,-.06,-.01,-.06,-.08,-.14,.06,-.01,-.16,-.04,-.05,-.14,.01,-.05,-.03
71 | 1948,.05,-.12,-.23,-.09,.07,-.06,-.13,-.10,-.11,-.07,-.09,-.21,-.09,-.09,-.07,-.08,-.10,-.09
72 | 1949,.09,-.15,-.01,-.07,-.08,-.23,-.13,-.08,-.09,-.03,-.08,-.16,-.09,-.09,-.09,-.05,-.15,-.06
73 | 1950,-.28,-.26,-.07,-.20,-.11,-.06,-.09,-.17,-.10,-.19,-.35,-.19,-.17,-.17,-.23,-.13,-.11,-.21
74 | 1951,-.34,-.43,-.19,-.10,-.02,-.05,.01,.06,.07,.07,-.01,.16,-.07,-.09,-.32,-.10,.00,.04
75 | 1952,.16,.13,-.09,.01,-.06,-.04,.05,.07,.08,-.03,-.16,-.02,.01,.02,.15,-.05,.03,-.04
76 | 1953,.08,.16,.11,.19,.09,.07,.02,.08,.06,.05,-.05,.04,.08,.07,.07,.13,.06,.02
77 | 1954,-.27,-.10,-.12,-.17,-.19,-.15,-.16,-.13,-.07,.00,.09,-.17,-.12,-.10,-.11,-.16,-.15,.00
78 | 1955,.12,-.21,-.35,-.22,-.20,-.07,-.07,.06,-.14,-.05,-.28,-.32,-.14,-.13,-.09,-.26,-.03,-.15
79 | 1956,-.16,-.24,-.22,-.28,-.28,-.14,-.12,-.26,-.19,-.24,-.15,-.09,-.20,-.22,-.24,-.26,-.17,-.20
80 | 1957,-.13,-.06,-.07,-.02,.08,.16,.01,.14,.07,.00,.06,.16,.03,.01,-.09,.00,.10,.05
81 | 1958,.38,.23,.08,.02,.08,-.06,.04,-.06,-.04,.03,.02,.00,.06,.07,.25,.06,-.02,.00
82 | 1959,.06,.09,.20,.16,.06,.02,.06,-.01,-.05,-.09,-.09,-.02,.03,.04,.05,.14,.03,-.07
83 | 1960,-.02,.13,-.36,-.16,-.08,.01,-.02,.00,.04,.08,-.12,.19,-.03,-.04,.03,-.20,.00,.00
84 | 1961,.07,.19,.10,.16,.14,.11,-.03,.00,.05,-.01,.03,-.15,.05,.08,.15,.13,.03,.03
85 | 1962,.07,.14,.11,.05,-.06,.04,-.03,-.03,-.02,-.03,.06,-.02,.02,.01,.02,.04,.00,.00
86 | 1963,-.03,.18,-.14,-.05,-.09,.02,.08,.24,.20,.15,.15,-.01,.06,.06,.05,-.10,.12,.17
87 | 1964,-.07,-.11,-.24,-.30,-.25,-.07,-.08,-.22,-.28,-.30,-.21,-.30,-.20,-.18,-.06,-.26,-.12,-.26
88 | 1965,-.08,-.18,-.12,-.19,-.15,-.10,-.11,-.01,-.14,-.04,-.06,-.06,-.10,-.12,-.19,-.15,-.07,-.08
89 | 1966,-.17,-.01,.05,-.13,-.11,.02,.08,-.11,-.02,-.15,-.02,-.05,-.05,-.05,-.08,-.06,.00,-.06
90 | 1967,-.07,-.19,.04,-.06,.13,-.08,-.01,.02,-.04,.07,-.06,-.02,-.02,-.03,-.10,.04,-.02,-.01
91 | 1968,-.23,-.14,.21,-.05,-.10,-.06,-.10,-.11,-.19,.11,-.04,-.14,-.07,-.06,-.13,.02,-.09,-.04
92 | 1969,-.11,-.15,.01,.18,.20,.05,-.02,.03,.10,.10,.12,.28,.07,.03,-.13,.13,.02,.11
93 | 1970,.10,.23,.08,.10,-.05,-.03,-.04,-.11,.11,.04,.02,-.14,.03,.06,.20,.04,-.06,.06
94 | 1971,-.03,-.21,-.19,-.11,-.07,-.18,-.12,-.02,.00,-.06,-.04,-.08,-.09,-.10,-.12,-.12,-.11,-.03
95 | 1972,-.25,-.17,.03,-.01,-.01,.04,.02,.18,.04,.08,.02,.19,.01,-.01,-.16,.00,.08,.05
96 | 1973,.27,.31,.26,.26,.26,.16,.09,.02,.07,.12,.05,-.06,.15,.17,.26,.26,.09,.08
97 | 1974,-.14,-.28,-.05,-.10,.00,-.05,-.03,.11,-.13,-.07,-.07,-.10,-.08,-.07,-.16,-.05,.01,-.09
98 | 1975,.08,.07,.14,.06,.16,-.01,-.02,-.20,-.03,-.09,-.16,-.17,-.01,-.01,.02,.12,-.08,-.09
99 | 1976,-.01,-.07,-.21,-.09,-.22,-.15,-.11,-.17,-.10,-.26,-.05,.08,-.11,-.13,-.08,-.18,-.14,-.14
100 | 1977,.17,.21,.25,.27,.31,.25,.23,.19,.02,.05,.20,.05,.18,.19,.15,.28,.23,.09
101 | 1978,.07,.12,.21,.14,.06,-.03,.07,-.18,.05,.01,.17,.11,.07,.06,.08,.14,-.05,.08
102 | 1979,.14,-.08,.18,.12,.05,.13,.02,.14,.26,.26,.29,.47,.17,.13,.06,.12,.10,.27
103 | 1980,.30,.42,.29,.32,.34,.16,.29,.26,.22,.19,.29,.20,.27,.30,.40,.32,.24,.23
104 | 1981,.55,.41,.49,.31,.23,.31,.34,.34,.17,.14,.22,.39,.33,.31,.39,.34,.33,.18
105 | 1982,.08,.15,-.02,.09,.15,.05,.13,.09,.16,.13,.13,.43,.13,.13,.21,.07,.09,.14
106 | 1983,.52,.40,.42,.30,.36,.18,.16,.31,.39,.16,.31,.17,.31,.33,.45,.36,.21,.28
107 | 1984,.31,.18,.29,.10,.34,.05,.15,.15,.19,.15,.04,-.06,.16,.18,.22,.24,.12,.13
108 | 1985,.21,-.06,.17,.11,.18,.16,-.01,.14,.15,.11,.10,.15,.12,.10,.03,.15,.10,.12
109 | 1986,.30,.38,.29,.26,.26,.12,.13,.12,.02,.14,.11,.15,.19,.19,.28,.27,.12,.09
110 | 1987,.35,.46,.17,.25,.26,.36,.46,.28,.39,.32,.25,.47,.33,.31,.32,.22,.36,.32
111 | 1988,.57,.42,.49,.44,.44,.42,.35,.45,.41,.38,.12,.34,.40,.41,.49,.46,.41,.30
112 | 1989,.16,.35,.36,.34,.17,.15,.33,.35,.37,.33,.20,.37,.29,.29,.28,.29,.28,.30
113 | 1990,.40,.41,.76,.55,.46,.37,.43,.30,.30,.43,.45,.41,.44,.44,.39,.59,.37,.39
114 | 1991,.42,.50,.35,.52,.39,.54,.51,.41,.49,.32,.31,.33,.42,.43,.45,.42,.49,.37
115 | 1992,.45,.42,.47,.23,.33,.24,.13,.10,.00,.10,.04,.22,.23,.24,.40,.34,.16,.05
116 | 1993,.38,.39,.36,.27,.27,.24,.27,.14,.11,.23,.07,.19,.24,.25,.33,.30,.22,.14
117 | 1994,.31,.04,.26,.41,.29,.42,.32,.23,.32,.42,.45,.36,.32,.30,.18,.32,.32,.40
118 | 1995,.51,.78,.45,.46,.29,.44,.48,.49,.34,.49,.44,.30,.46,.46,.55,.40,.47,.43
119 | 1996,.27,.49,.34,.37,.29,.26,.35,.49,.26,.20,.41,.40,.34,.34,.35,.33,.37,.29
120 | 1997,.32,.37,.51,.37,.38,.54,.36,.42,.56,.65,.65,.59,.48,.46,.37,.42,.44,.62
121 | 1998,.61,.89,.61,.63,.71,.77,.70,.68,.45,.46,.49,.57,.63,.63,.70,.65,.72,.47
122 | 1999,.48,.67,.33,.34,.33,.37,.41,.34,.44,.44,.41,.47,.42,.43,.57,.33,.38,.43
123 | 2000,.26,.59,.59,.59,.40,.43,.42,.43,.43,.30,.33,.30,.42,.44,.44,.53,.42,.35
124 | 2001,.44,.46,.58,.52,.59,.54,.61,.49,.56,.52,.70,.55,.55,.53,.40,.56,.55,.59
125 | 2002,.74,.76,.91,.59,.65,.54,.61,.55,.64,.57,.59,.42,.63,.64,.68,.71,.57,.60
126 | 2003,.72,.55,.57,.55,.63,.48,.55,.66,.66,.75,.54,.74,.62,.59,.56,.58,.56,.65
127 | 2004,.58,.70,.66,.63,.42,.43,.27,.45,.53,.66,.72,.52,.55,.56,.67,.57,.38,.64
128 | 2005,.72,.58,.70,.69,.65,.66,.65,.62,.77,.80,.76,.68,.69,.68,.61,.68,.65,.77
129 | 2006,.57,.71,.63,.49,.47,.63,.54,.71,.64,.69,.72,.77,.63,.62,.65,.53,.63,.69
130 | 2007,.96,.69,.71,.75,.67,.57,.62,.60,.64,.60,.57,.49,.66,.68,.81,.71,.60,.60
131 | 2008,.26,.35,.73,.53,.50,.47,.60,.44,.64,.67,.66,.55,.53,.53,.36,.59,.50,.66
132 | 2009,.61,.53,.53,.61,.63,.65,.71,.67,.69,.65,.78,.65,.64,.63,.56,.59,.68,.71
133 | 2010,.73,.79,.92,.87,.74,.64,.61,.65,.62,.71,.80,.49,.71,.73,.72,.84,.63,.71
134 | 2011,.51,.51,.64,.65,.53,.58,.74,.73,.56,.66,.56,.54,.60,.60,.50,.61,.68,.59
135 | 2012,.46,.49,.57,.68,.77,.61,.57,.63,.75,.79,.75,.52,.63,.63,.49,.67,.60,.76
136 | 2013,.66,.56,.65,.53,.61,.65,.59,.65,.77,.70,.80,.66,.65,.64,.58,.60,.63,.76
137 | 2014,.73,.50,.77,.79,.86,.65,.58,.81,.90,.86,.69,.79,.74,.73,.63,.80,.68,.82
138 | 2015,.82,.87,.90,.74,.78,.78,.73,.78,.81,1.07,1.03,1.10,.87,.84,.82,.81,.76,.97
139 | 2016,1.14,1.33,1.29,1.09,.93,.79,***,***,***,***,***,***,***,***,1.19,1.10,***,***
140 |
--------------------------------------------------------------------------------
/data/sentiment.csv:
--------------------------------------------------------------------------------
1 | DATE,UMCSENT
2 | 2000-01-01,112.0
3 | 2000-02-01,111.3
4 | 2000-03-01,107.1
5 | 2000-04-01,109.2
6 | 2000-05-01,110.7
7 | 2000-06-01,106.4
8 | 2000-07-01,108.3
9 | 2000-08-01,107.3
10 | 2000-09-01,106.8
11 | 2000-10-01,105.8
12 | 2000-11-01,107.6
13 | 2000-12-01,98.4
14 | 2001-01-01,94.7
15 | 2001-02-01,90.6
16 | 2001-03-01,91.5
17 | 2001-04-01,88.4
18 | 2001-05-01,92.0
19 | 2001-06-01,92.6
20 | 2001-07-01,92.4
21 | 2001-08-01,91.5
22 | 2001-09-01,81.8
23 | 2001-10-01,82.7
24 | 2001-11-01,83.9
25 | 2001-12-01,88.8
26 | 2002-01-01,93.0
27 | 2002-02-01,90.7
28 | 2002-03-01,95.7
29 | 2002-04-01,93.0
30 | 2002-05-01,96.9
31 | 2002-06-01,92.4
32 | 2002-07-01,88.1
33 | 2002-08-01,87.6
34 | 2002-09-01,86.1
35 | 2002-10-01,80.6
36 | 2002-11-01,84.2
37 | 2002-12-01,86.7
38 | 2003-01-01,82.4
39 | 2003-02-01,79.9
40 | 2003-03-01,77.6
41 | 2003-04-01,86.0
42 | 2003-05-01,92.1
43 | 2003-06-01,89.7
44 | 2003-07-01,90.9
45 | 2003-08-01,89.3
46 | 2003-09-01,87.7
47 | 2003-10-01,89.6
48 | 2003-11-01,93.7
49 | 2003-12-01,92.6
50 | 2004-01-01,103.8
51 | 2004-02-01,94.4
52 | 2004-03-01,95.8
53 | 2004-04-01,94.2
54 | 2004-05-01,90.2
55 | 2004-06-01,95.6
56 | 2004-07-01,96.7
57 | 2004-08-01,95.9
58 | 2004-09-01,94.2
59 | 2004-10-01,91.7
60 | 2004-11-01,92.8
61 | 2004-12-01,97.1
62 | 2005-01-01,95.5
63 | 2005-02-01,94.1
64 | 2005-03-01,92.6
65 | 2005-04-01,87.7
66 | 2005-05-01,86.9
67 | 2005-06-01,96.0
68 | 2005-07-01,96.5
69 | 2005-08-01,89.1
70 | 2005-09-01,76.9
71 | 2005-10-01,74.2
72 | 2005-11-01,81.6
73 | 2005-12-01,91.5
74 | 2006-01-01,91.2
75 | 2006-02-01,86.7
76 | 2006-03-01,88.9
77 | 2006-04-01,87.4
78 | 2006-05-01,79.1
79 | 2006-06-01,84.9
80 | 2006-07-01,84.7
81 | 2006-08-01,82.0
82 | 2006-09-01,85.4
83 | 2006-10-01,93.6
84 | 2006-11-01,92.1
85 | 2006-12-01,91.7
86 | 2007-01-01,96.9
87 | 2007-02-01,91.3
88 | 2007-03-01,88.4
89 | 2007-04-01,87.1
90 | 2007-05-01,88.3
91 | 2007-06-01,85.3
92 | 2007-07-01,90.4
93 | 2007-08-01,83.4
94 | 2007-09-01,83.4
95 | 2007-10-01,80.9
96 | 2007-11-01,76.1
97 | 2007-12-01,75.5
98 | 2008-01-01,78.4
99 | 2008-02-01,70.8
100 | 2008-03-01,69.5
101 | 2008-04-01,62.6
102 | 2008-05-01,59.8
103 | 2008-06-01,56.4
104 | 2008-07-01,61.2
105 | 2008-08-01,63.0
106 | 2008-09-01,70.3
107 | 2008-10-01,57.6
108 | 2008-11-01,55.3
109 | 2008-12-01,60.1
110 | 2009-01-01,61.2
111 | 2009-02-01,56.3
112 | 2009-03-01,57.3
113 | 2009-04-01,65.1
114 | 2009-05-01,68.7
115 | 2009-06-01,70.8
116 | 2009-07-01,66.0
117 | 2009-08-01,65.7
118 | 2009-09-01,73.5
119 | 2009-10-01,70.6
120 | 2009-11-01,67.4
121 | 2009-12-01,72.5
122 | 2010-01-01,74.4
123 | 2010-02-01,73.6
124 | 2010-03-01,73.6
125 | 2010-04-01,72.2
126 | 2010-05-01,73.6
127 | 2010-06-01,76.0
128 | 2010-07-01,67.8
129 | 2010-08-01,68.9
130 | 2010-09-01,68.2
131 | 2010-10-01,67.7
132 | 2010-11-01,71.6
133 | 2010-12-01,74.5
134 | 2011-01-01,74.2
135 | 2011-02-01,77.5
136 | 2011-03-01,67.5
137 | 2011-04-01,69.8
138 | 2011-05-01,74.3
139 | 2011-06-01,71.5
140 | 2011-07-01,63.7
141 | 2011-08-01,55.8
142 | 2011-09-01,59.5
143 | 2011-10-01,60.8
144 | 2011-11-01,63.7
145 | 2011-12-01,69.9
146 | 2012-01-01,75.0
147 | 2012-02-01,75.3
148 | 2012-03-01,76.2
149 | 2012-04-01,76.4
150 | 2012-05-01,79.3
151 | 2012-06-01,73.2
152 | 2012-07-01,72.3
153 | 2012-08-01,74.3
154 | 2012-09-01,78.3
155 | 2012-10-01,82.6
156 | 2012-11-01,82.7
157 | 2012-12-01,72.9
158 | 2013-01-01,73.8
159 | 2013-02-01,77.6
160 | 2013-03-01,78.6
161 | 2013-04-01,76.4
162 | 2013-05-01,84.5
163 | 2013-06-01,84.1
164 | 2013-07-01,85.1
165 | 2013-08-01,82.1
166 | 2013-09-01,77.5
167 | 2013-10-01,73.2
168 | 2013-11-01,75.1
169 | 2013-12-01,82.5
170 | 2014-01-01,81.2
171 | 2014-02-01,81.6
172 | 2014-03-01,80.0
173 | 2014-04-01,84.1
174 | 2014-05-01,81.9
175 | 2014-06-01,82.5
176 | 2014-07-01,81.8
177 | 2014-08-01,82.5
178 | 2014-09-01,84.6
179 | 2014-10-01,86.9
180 | 2014-11-01,88.8
181 | 2014-12-01,93.6
182 | 2015-01-01,98.1
183 | 2015-02-01,95.4
184 | 2015-03-01,93.0
185 | 2015-04-01,95.9
186 | 2015-05-01,90.7
187 | 2015-06-01,96.1
188 | 2015-07-01,93.1
189 | 2015-08-01,91.9
190 | 2015-09-01,87.2
191 | 2015-10-01,90.0
192 | 2015-11-01,91.3
193 | 2015-12-01,92.6
194 | 2016-01-01,92.0
195 | 2016-02-01,91.7
196 | 2016-03-01,91.0
197 | 2016-04-01,89.0
198 | 2016-05-01,94.7
199 | 2016-06-01,93.5
200 | 2016-07-01,90.0
201 |
--------------------------------------------------------------------------------
/data/series1.csv:
--------------------------------------------------------------------------------
1 | ,value
2 | 2006-06-01,0.21506609377014937
3 | 2006-07-01,1.142246186967091
4 | 2006-08-01,0.08077089285729766
5 | 2006-09-01,-0.7395189837372728
6 | 2006-10-01,0.5355162794384382
7 | 2006-11-01,-0.5647264651320741
8 | 2006-12-01,-1.1913935216543692
9 | 2007-01-01,-1.9961368164247117
10 | 2007-02-01,-1.8824096445368526
11 | 2007-03-01,-1.881361293860238
12 | 2007-04-01,-0.9766776697907428
13 | 2007-05-01,-1.9019923318711625
14 | 2007-06-01,-3.108610707028711
15 | 2007-07-01,-3.5268821422956975
16 | 2007-08-01,-2.7697700367118965
17 | 2007-09-01,-2.0338040672828015
18 | 2007-10-01,-3.180075966358295
19 | 2007-11-01,-3.3080815409072417
20 | 2007-12-01,-3.418501908245932
21 | 2008-01-01,-4.104801270845488
22 | 2008-02-01,-3.1744056756446346
23 | 2008-03-01,-1.425289135757783
24 | 2008-04-01,0.4401615904944527
25 | 2008-05-01,1.2679262510037599
26 | 2008-06-01,0.5441554324441394
27 | 2008-07-01,-0.48066970719551183
28 | 2008-08-01,-1.5799841374850232
29 | 2008-09-01,-0.1336163743808476
30 | 2008-10-01,1.7643398021524446
31 | 2008-11-01,-1.2648773342494266
32 | 2008-12-01,-3.1521978842334883
33 | 2009-01-01,-3.589928203018723
34 | 2009-02-01,-3.406228122833789
35 | 2009-03-01,-3.8263343229385622
36 | 2009-04-01,-2.7425294344933775
37 | 2009-05-01,-1.7887272597313733
38 | 2009-06-01,-2.4639028761016517
39 | 2009-07-01,-2.075657675328104
40 | 2009-08-01,-2.701547013802915
41 | 2009-09-01,-1.7025518581431656
42 | 2009-10-01,-0.7589319907020389
43 | 2009-11-01,-2.905804182464432
44 | 2009-12-01,-1.7551468831911923
45 | 2010-01-01,-1.9096679238947885
46 | 2010-02-01,-0.1291052757849871
47 | 2010-03-01,2.211945650117264
48 | 2010-04-01,1.569618562080831
49 | 2010-05-01,1.5087042686118521
50 | 2010-06-01,1.6998884025934553
51 | 2010-07-01,-1.7667376106183483
52 | 2010-08-01,-1.366051636217643
53 | 2010-09-01,0.7285490524822341
54 | 2010-10-01,2.2262658158365096
55 | 2010-11-01,1.6367586650124024
56 | 2010-12-01,0.043641470761556
57 | 2011-01-01,-2.3933886026916142
58 | 2011-02-01,-3.2353454025596102
59 | 2011-03-01,-2.101837771824045
60 | 2011-04-01,-0.8990375021239454
61 | 2011-05-01,-1.4936664574666798
62 | 2011-06-01,-3.1073167508272217
63 | 2011-07-01,-1.4205328314401435
64 | 2011-08-01,0.2758607214066028
65 | 2011-09-01,0.43735990925986584
66 | 2011-10-01,-0.2548263800847047
67 | 2011-11-01,-0.3458472155762089
68 | 2011-12-01,-0.6115030256444327
69 | 2012-01-01,-0.38337005302025623
70 | 2012-02-01,1.6954758499607498
71 | 2012-03-01,1.5613010884084153
72 | 2012-04-01,-0.17909432703339856
73 | 2012-05-01,-0.5810841195080401
74 | 2012-06-01,-2.3308344299399346
75 | 2012-07-01,-2.0057500970363367
76 | 2012-08-01,-0.5478467175272086
77 | 2012-09-01,0.7102722661376356
78 | 2012-10-01,1.5215664448259674
79 | 2012-11-01,1.323207097388035
80 | 2012-12-01,0.8370054084248226
81 | 2013-01-01,-0.10582093340683829
82 | 2013-02-01,-1.8597930129863454
83 | 2013-03-01,-1.9819951932431905
84 | 2013-04-01,-0.3690851457447418
85 | 2013-05-01,1.0213087568123702
86 | 2013-06-01,1.313503316487931
87 | 2013-07-01,1.138473914848284
88 | 2013-08-01,-0.5684114159918983
89 | 2013-09-01,-1.4298085649813963
90 | 2013-10-01,-1.8058074666928423
91 | 2013-11-01,-1.9511514403250418
92 | 2013-12-01,-1.4477675756530977
93 | 2014-01-01,-0.039660778396576335
94 | 2014-02-01,1.4280190595321094
95 | 2014-03-01,1.1451101579872376
96 | 2014-04-01,-1.6690214653425715
97 | 2014-05-01,-1.5040115349757466
98 | 2014-06-01,-2.448986495668514
99 | 2014-07-01,-2.8317230406994662
100 | 2014-08-01,-2.6938098802334176
101 | 2014-09-01,0.23414533840190543
102 | 2014-10-01,1.33963923299082
103 | 2014-11-01,1.4028876775484114
104 | 2014-12-01,1.7780474518983755
105 | 2015-01-01,1.6194943314371981
106 | 2015-02-01,0.4887985096857944
107 | 2015-03-01,2.208630445167098
108 | 2015-04-01,2.4556132237466786
109 | 2015-05-01,2.6470927240715665
110 | 2015-06-01,3.0162463456840567
111 | 2015-07-01,1.7039588266126313
112 | 2015-08-01,0.6037148295707649
113 | 2015-09-01,-1.2737209728501695
114 | 2015-10-01,-0.93284071310711
115 | 2015-11-01,0.08551545990148485
116 | 2015-12-01,1.20534410726747
117 | 2016-01-01,2.164110679279701
118 | 2016-02-01,0.9522611305039776
119 | 2016-03-01,0.3648520796360801
120 | 2016-04-01,-2.264868721883362
121 | 2016-05-01,-2.3816786375743693
122 |
--------------------------------------------------------------------------------
/data/series2.csv:
--------------------------------------------------------------------------------
1 | ,value
2 | 1998-06-01,-0.5988321459640398
3 | 1998-07-01,-0.8001818245420875
4 | 1998-08-01,2.29897730433156
5 | 1998-09-01,1.1503876193657336
6 | 1998-10-01,-1.1925784409492428
7 | 1998-11-01,-2.8537590592863404
8 | 1998-12-01,-1.9850296778582175
9 | 1999-01-01,-0.2804660526798255
10 | 1999-02-01,0.6441921376804158
11 | 1999-03-01,-0.38643573665159
12 | 1999-04-01,-1.7077984242692898
13 | 1999-05-01,-1.0643432678858735
14 | 1999-06-01,-0.2807300935199594
15 | 1999-07-01,1.0352887029352587
16 | 1999-08-01,0.4772659587223367
17 | 1999-09-01,0.14729461428518129
18 | 1999-10-01,0.3154110024282838
19 | 1999-11-01,0.6863548129190071
20 | 1999-12-01,-0.287397199588831
21 | 2000-01-01,-0.7152901377520209
22 | 2000-02-01,0.9393973123420367
23 | 2000-03-01,1.5775143325001566
24 | 2000-04-01,2.882927016963263
25 | 2000-05-01,1.4941798228687881
26 | 2000-06-01,-0.13206446812278239
27 | 2000-07-01,0.08711484098949879
28 | 2000-08-01,-1.206737167276847
29 | 2000-09-01,0.9805199834263024
30 | 2000-10-01,1.4375923397946233
31 | 2000-11-01,-1.4505667037318246
32 | 2000-12-01,-0.5276126147276123
33 | 2001-01-01,-0.6351930657840301
34 | 2001-02-01,-0.8789431454671376
35 | 2001-03-01,0.38016739799920973
36 | 2001-04-01,0.8989237811520877
37 | 2001-05-01,-0.5696251467413351
38 | 2001-06-01,0.5810107288436385
39 | 2001-07-01,-0.877114392182485
40 | 2001-08-01,-1.8419085781204712
41 | 2001-09-01,-3.0691972159928227
42 | 2001-10-01,-1.592178550180547
43 | 2001-11-01,1.32565772017106
44 | 2001-12-01,3.0356773079514054
45 | 2002-01-01,0.539169160847849
46 | 2002-02-01,-1.5749740555067968
47 | 2002-03-01,-0.5628386964025566
48 | 2002-04-01,2.209853716845048
49 | 2002-05-01,-0.3874084576722314
50 | 2002-06-01,0.11802333676416787
51 | 2002-07-01,0.030258282148944465
52 | 2002-08-01,-0.8169009829590038
53 | 2002-09-01,-0.7066727929039491
54 | 2002-10-01,-1.9666381833685282
55 | 2002-11-01,0.1961625983960542
56 | 2002-12-01,0.9501640581007602
57 | 2003-01-01,-0.09661344822577811
58 | 2003-02-01,1.2536526524586213
59 | 2003-03-01,-0.48851392015721484
60 | 2003-04-01,-2.123074480287456
61 | 2003-05-01,-1.0768519769268379
62 | 2003-06-01,-0.41002004156468885
63 | 2003-07-01,0.9077753163701393
64 | 2003-08-01,1.8289107832882583
65 | 2003-09-01,1.8839236166059405
66 | 2003-10-01,-0.17034810074818763
67 | 2003-11-01,0.3461316722485077
68 | 2003-12-01,-0.5123804109231517
69 | 2004-01-01,-0.0250276791192518
70 | 2004-02-01,-1.6531120005419884
71 | 2004-03-01,-0.04375296259860606
72 | 2004-04-01,-1.0487457638712718
73 | 2004-05-01,-1.3555129316049697
74 | 2004-06-01,-1.0994696847921825
75 | 2004-07-01,0.5323094106346663
76 | 2004-08-01,0.14925877923715314
77 | 2004-09-01,0.01848323695573731
78 | 2004-10-01,-0.7814264894902231
79 | 2004-11-01,1.5167771796013843
80 | 2004-12-01,2.296051087044108
81 | 2005-01-01,0.9434746775169713
82 | 2005-02-01,-1.3443629053521349
83 | 2005-03-01,-0.8976294986936344
84 | 2005-04-01,0.98175810146041
85 | 2005-05-01,1.914742733869594
86 | 2005-06-01,3.6808250803475433
87 | 2005-07-01,2.2499900438842384
88 | 2005-08-01,-0.29639336413031137
89 | 2005-09-01,0.4269621646670275
90 | 2005-10-01,0.19173131063833318
91 | 2005-11-01,-0.2889815853393901
92 | 2005-12-01,-1.0759536060630406
93 | 2006-01-01,-2.3631250663725956
94 | 2006-02-01,-1.9836952101294716
95 | 2006-03-01,-3.500637349426545
96 | 2006-04-01,-2.1141695706277153
97 | 2006-05-01,0.017968722919202373
98 | 2006-06-01,-0.9458506118246945
99 | 2006-07-01,-0.6874705885358369
100 | 2006-08-01,0.8465448195500478
101 | 2006-09-01,0.6691255871439188
102 | 2006-10-01,-0.08713307038502685
103 | 2006-11-01,1.0998058262016177
104 | 2006-12-01,1.4779251634861494
105 | 2007-01-01,1.0297282792527176
106 | 2007-02-01,0.2761081358786794
107 | 2007-03-01,-0.07624838857865499
108 | 2007-04-01,0.9858677151212878
109 | 2007-05-01,1.8400672256944817
110 | 2007-06-01,1.3298450389193845
111 | 2007-07-01,1.3647035685985633
112 | 2007-08-01,0.002165062180341848
113 | 2007-09-01,-1.8832985648714708
114 | 2007-10-01,0.25478369007939394
115 | 2007-11-01,-0.08428480654220216
116 | 2007-12-01,0.250528577334092
117 | 2008-01-01,-1.388046269408355
118 | 2008-02-01,-0.8141709674977629
119 | 2008-03-01,1.105602704377001
120 | 2008-04-01,0.6783867356618624
121 | 2008-05-01,0.8597988185166201
122 | 2008-06-01,0.6072776843392129
123 | 2008-07-01,-1.3642440097784643
124 | 2008-08-01,-1.7137879496710897
125 | 2008-09-01,-0.23947769210328385
126 | 2008-10-01,-0.33873159235012495
127 | 2008-11-01,-0.8917264305872387
128 | 2008-12-01,-1.6743758264767927
129 | 2009-01-01,0.20039551857306726
130 | 2009-02-01,-2.179181944879357
131 | 2009-03-01,0.5897093990491
132 | 2009-04-01,2.2617760869600314
133 | 2009-05-01,1.7321264302126078
134 | 2009-06-01,-0.6027358708709991
135 | 2009-07-01,-0.964091433892799
136 | 2009-08-01,-0.961072497165241
137 | 2009-09-01,1.121733623900013
138 | 2009-10-01,1.8587796044504794
139 | 2009-11-01,1.0411921039301089
140 | 2009-12-01,-0.8383568366059101
141 | 2010-01-01,-1.0065600590335944
142 | 2010-02-01,-0.9352167936808549
143 | 2010-03-01,-0.32786047740480695
144 | 2010-04-01,0.4058865638022857
145 | 2010-05-01,1.1289604538400548
146 | 2010-06-01,-0.1289863833332744
147 | 2010-07-01,-0.7837566521267346
148 | 2010-08-01,-0.9382022886725967
149 | 2010-09-01,0.019015698560999605
150 | 2010-10-01,1.1744723944491786
151 | 2010-11-01,0.023480177242375166
152 | 2010-12-01,0.11225444641066057
153 | 2011-01-01,0.7503458729665786
154 | 2011-02-01,0.6327836554084286
155 | 2011-03-01,-0.397904482673621
156 | 2011-04-01,-0.9041192801212166
157 | 2011-05-01,0.19684514968289946
158 | 2011-06-01,0.8164581910258784
159 | 2011-07-01,-0.7865406928876951
160 | 2011-08-01,0.5776816701185132
161 | 2011-09-01,0.6830306650409664
162 | 2011-10-01,-1.5764659470575981
163 | 2011-11-01,-0.9753378452977925
164 | 2011-12-01,-0.5018770909690016
165 | 2012-01-01,0.002990601985497658
166 | 2012-02-01,0.6934312692549467
167 | 2012-03-01,0.33035488799827195
168 | 2012-04-01,0.6316395283006564
169 | 2012-05-01,-2.1369448261370234
170 | 2012-06-01,-1.019048380073622
171 | 2012-07-01,0.4383427157248401
172 | 2012-08-01,0.9407095470889854
173 | 2012-09-01,1.348099649128561
174 | 2012-10-01,0.8116850598534116
175 | 2012-11-01,1.8388052714366765
176 | 2012-12-01,2.5190616395369294
177 | 2013-01-01,1.6689361863085121
178 | 2013-02-01,-2.9339212946760274
179 | 2013-03-01,-2.3596669610834473
180 | 2013-04-01,-0.3476010206228577
181 | 2013-05-01,-0.6513293272135454
182 | 2013-06-01,0.16057182159345101
183 | 2013-07-01,-0.1808846341542013
184 | 2013-08-01,1.1473888063467639
185 | 2013-09-01,0.8478549668224615
186 | 2013-10-01,-1.9438919572647397
187 | 2013-11-01,-4.306569160552039
188 | 2013-12-01,-2.9811560202605527
189 | 2014-01-01,0.9623757604005118
190 | 2014-02-01,3.1388819359677322
191 | 2014-03-01,-0.4911584187684064
192 | 2014-04-01,-0.903162201396771
193 | 2014-05-01,-0.20553474371678188
194 | 2014-06-01,0.0034713819334720175
195 | 2014-07-01,-0.06661193721268444
196 | 2014-08-01,-0.9011184211279554
197 | 2014-09-01,0.37803147639788465
198 | 2014-10-01,1.5791737174543747
199 | 2014-11-01,-0.7855743308964656
200 | 2014-12-01,-1.3663558242294889
201 | 2015-01-01,0.09362845713313073
202 | 2015-02-01,0.5052619604953503
203 | 2015-03-01,0.7427287185469982
204 | 2015-04-01,0.7168394463243395
205 | 2015-05-01,1.4315642877534047
206 | 2015-06-01,-0.006107227310269359
207 | 2015-07-01,-0.8942768166365469
208 | 2015-08-01,-0.8966058572714672
209 | 2015-09-01,0.5685733059272622
210 | 2015-10-01,1.8259429077513398
211 | 2015-11-01,0.12388491524839751
212 | 2015-12-01,-0.0017948479719895327
213 | 2016-01-01,-0.30255127717029906
214 | 2016-02-01,-0.09809951837633618
215 | 2016-03-01,-0.8275530135614978
216 | 2016-04-01,1.6647855828518248
217 | 2016-05-01,1.3353296100931782
218 | 2016-06-01,-1.7525984037269728
219 | 2016-07-01,-0.23068707205883043
220 | 2016-08-01,0.41366416066200007
221 | 2016-09-01,-1.1628965126202637
222 | 2016-10-01,-1.3087945420735818
223 | 2016-11-01,-1.5548898995699532
224 | 2016-12-01,0.985958917261037
225 | 2017-01-01,1.1866663916911828
226 | 2017-02-01,1.2150681052189176
227 | 2017-03-01,-1.1837011467595464
228 | 2017-04-01,-2.9009130879566314
229 | 2017-05-01,-1.2105886542817972
230 | 2017-06-01,1.654358133482298
231 | 2017-07-01,-0.2778687199736294
232 | 2017-08-01,-0.2838170624419766
233 | 2017-09-01,-0.6061344971374943
234 | 2017-10-01,-0.39391290851508587
235 | 2017-11-01,1.0816482242687575
236 | 2017-12-01,0.4093893605786109
237 | 2018-01-01,2.177077034365991
238 | 2018-02-01,2.2113259696356655
239 | 2018-03-01,-0.3972793928313735
240 | 2018-04-01,0.8206395157890112
241 | 2018-05-01,0.6605343290422202
242 | 2018-06-01,0.7894552492498998
243 | 2018-07-01,-0.024437392942594227
244 | 2018-08-01,-0.3988814626402032
245 | 2018-09-01,-0.4140144429318273
246 | 2018-10-01,1.4580015190498223
247 | 2018-11-01,-0.02596102842054826
248 | 2018-12-01,-0.6586766555201682
249 | 2019-01-01,-0.21643238827171585
250 | 2019-02-01,-0.42230681184674035
251 | 2019-03-01,1.2527223297693217
252 |
--------------------------------------------------------------------------------
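The rows above are the tail of one of the simulated monthly series bundled with the tutorial data (most likely `data/series2.csv`, given the tree listing). As a minimal sketch of how such a file could be loaded and fit in the notebooks, assuming the file has a header row and a single value column indexed by a monthly date (the column names, path, and the (2, 0, 2) order below are illustrative assumptions, not the tutorial's choices):

```python
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Assumed path and layout: a header row, a date column, and one value column.
series = pd.read_csv("data/series2.csv", index_col=0, parse_dates=True).squeeze("columns")
series.index.freq = "MS"  # monthly observations dated to the first of the month

# Fit a small ARMA-type model as a starting point; the order is illustrative only.
model = SARIMAX(series, order=(2, 0, 2), trend="c")
results = model.fit(disp=False)
print(results.summary())
```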
/img/svds_logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/etcrago/Tutorial-Arima-w-jeffrey-yau/428f906b17616c5aed39b923a6164b80857a698a/img/svds_logo.png
--------------------------------------------------------------------------------