├── .github ├── ISSUE_TEMPLATE │ ├── bug-report-template.md │ └── feature-request-template.md └── workflows │ └── tsfracdiff_tests.yml ├── LICENSE ├── README.md ├── docs ├── .nojekyll └── tsfracdiff │ ├── .nojekyll │ ├── index.html │ ├── tsfracdiff.html │ └── unit_root_tests.html ├── examples ├── Example.html └── Example.ipynb ├── pyproject.toml ├── requirements.txt ├── setup.cfg ├── setup.py ├── tests └── test_module.py └── tsfracdiff ├── __init__.py ├── tsfracdiff.py └── unit_root_tests.py /.github/ISSUE_TEMPLATE/bug-report-template.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Bug Report Template 3 | about: Report a bug 4 | title: "[BUG]" 5 | labels: bug 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Describe the bug** 11 | A clear and concise description of what the bug is. 12 | 13 | **To Reproduce** 14 | Steps to reproduce the behavior: 15 | 1. Go to '...' 16 | 2. Click on '....' 17 | 3. Scroll down to '....' 18 | 4. See error 19 | 20 | **Expected behavior** 21 | A clear and concise description of what you expected to happen. 22 | 23 | **Version Numbers:** 24 | - OS 25 | - tsfracdiff Version 26 | - Python Version 27 | 28 | **Additional context** 29 | Add any other context about the problem here. 30 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/feature-request-template.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Feature Request Template 3 | about: Suggest an idea for this project 4 | title: "[FEAT]" 5 | labels: enhancement 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Is your feature request related to a problem? Please describe.** 11 | A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] 12 | 13 | **Describe the solution you'd like** 14 | A clear and concise description of what you want to happen. 15 | 16 | **Describe alternatives you've considered** 17 | A clear and concise description of any alternative solutions or features you've considered. 18 | 19 | **Additional context** 20 | Add any other context or screenshots about the feature request here. 21 | -------------------------------------------------------------------------------- /.github/workflows/tsfracdiff_tests.yml: -------------------------------------------------------------------------------- 1 | # Build & unit tests 2 | 3 | name: Unit Tests 4 | 5 | on: 6 | push: 7 | branches: [ master, dev ] 8 | pull_request: 9 | branches: [ master, dev ] 10 | 11 | jobs: 12 | build: 13 | 14 | runs-on: ubuntu-latest 15 | strategy: 16 | fail-fast: false 17 | matrix: 18 | python-version: ["3.7", "3.8", "3.9", "3.10"] 19 | 20 | steps: 21 | - uses: actions/checkout@v3 22 | - name: Set up Python ${{ matrix.python-version }} 23 | uses: actions/setup-python@v3 24 | with: 25 | python-version: ${{ matrix.python-version }} 26 | - name: Install dependencies and package 27 | run: | 28 | python -m pip install --upgrade pip 29 | python -m pip install flake8 pytest 30 | if [ -f requirements.txt ]; then pip install -r requirements.txt; fi 31 | python -m pip install -e . 32 | - name: Lint with flake8 33 | run: | 34 | # Stop the build if there are syntax errors 35 | flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics 36 | # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide 37 | flake8 . 
--count --exit-zero --max-complexity=10 --max-line-length=127 --statistics 38 | - name: Run Unit Tests 39 | run: | 40 | python -m pytest 41 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2022 adamvvu 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | [![Build](https://img.shields.io/github/actions/workflow/status/adamvvu/tsfracdiff/tsfracdiff_tests.yml?style=for-the-badge)](https://github.com/adamvvu/tsfracdiff/actions/workflows/tsfracdiff_tests.yml) 2 | [![PyPi](https://img.shields.io/pypi/v/tsfracdiff?style=for-the-badge)](https://pypi.org/project/tsfracdiff/) 3 | [![Downloads](https://img.shields.io/pypi/dm/tsfracdiff?style=for-the-badge)](https://pypi.org/project/tsfracdiff/) 4 | [![License](https://img.shields.io/badge/license-MIT-green?style=for-the-badge)](https://github.com/adamvvu/tsfracdiff/blob/master/LICENSE) 5 | 6 | Efficient and easy-to-use fractional differentiation transformations for 7 | stationarizing time series data in Python. 8 | 9 | ------------------------------------------------------------------------ 10 | 11 | ## **tsfracdiff** 12 | 13 | Data with high persistence, serial correlation, and non-stationarity 14 | pose significant challenges when used directly as predictive signals in 15 | many machine learning and statistical models. A common approach is to 16 | take the first difference as a stationarity transformation, but this 17 | wipes out much of the information available in the data. For datasets 18 | with a low signal-to-noise ratio, such as financial market 19 | data, this effect can be particularly severe. Hosking (1981) introduced 20 | fractional (non-integer) differentiation, valued for its flexibility in modeling 21 | short-term and long-term time series dynamics, and López de Prado (2018) 22 | proposed the use of fractional differentiation as a feature 23 | transformation for financial machine learning applications. This library 24 | is an extension of their ideas, with some modifications for efficiency 25 | and robustness. 
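Conceptually, a first difference replaces each value with `x_t - x_{t-1}`, while a fractional difference of order `d` applies an expanding set of lag weights taken from the binomial expansion of `(1 - B)^d`, where `B` is the backshift (lag) operator. The sketch below shows the weight recursion the library uses internally to build these lags, truncated once the weights become negligible; it is illustrative only, and `fracdiff_weights` is not part of the package API:

``` python
import numpy as np

def fracdiff_weights(d, threshold=1e-4):
    """Lag weights w_k of (1 - B)^d, truncated once |w_k| < threshold."""
    weights = [1.0]
    k = 1
    while True:
        w = -weights[-1] * (d - k + 1) / k  # w_k = -w_{k-1} * (d - k + 1) / k
        if abs(w) < threshold:
            break
        weights.append(w)
        k += 1
    return np.array(weights)

# For d = 1 this reduces to the ordinary first-difference weights [1, -1].
# For d = 0.5 the first weights are [1, -0.5, -0.125, -0.0625, ...]; a
# fractionally differenced value is the dot product of these weights with
# the most recent observations (newest observation first).
print(fracdiff_weights(0.5)[:4])
```

The closer the order is to zero, the closer the transformation is to the identity and the more of the original series' memory is retained, which is why the library searches for the minimum order that achieves stationarity.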
26 | 27 | [Documentation](https://adamvvu.github.io/tsfracdiff/docs/) 28 | 29 | ## Getting Started 30 | 31 | ### Installation 32 | 33 | `pip install tsfracdiff` 34 | 35 | #### Dependencies: 36 | 37 | # Required 38 | python3 # Python 3.7+ 39 | numpy 40 | pandas 41 | arch 42 | 43 | # Suggested 44 | joblib 45 | 46 | ### Usage 47 | 48 | ``` python 49 | # A pandas.DataFrame/np.array with potentially non-stationary time series 50 | df 51 | 52 | # Automatic stationary transformation with minimal information loss 53 | from tsfracdiff import FractionalDifferentiator 54 | fracDiff = FractionalDifferentiator() 55 | df = fracDiff.FitTransform(df) 56 | ``` 57 | 58 | For a more in-depth example, see this 59 | [notebook](https://adamvvu.github.io/tsfracdiff/examples/Example.html). 60 | 61 | ## References 62 | 63 | Hosking, J. R. M. (1981). Fractional Differencing. Biometrika, 68(1), 64 | 165--176. 65 | 66 | López de Prado, Marcos (2018). Advances in Financial Machine Learning. 67 | John Wiley & Sons, Inc. 68 | -------------------------------------------------------------------------------- /docs/.nojekyll: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adamvvu/tsfracdiff/fd1816021ac717eb6c761365931279071b66ead6/docs/.nojekyll -------------------------------------------------------------------------------- /docs/tsfracdiff/.nojekyll: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adamvvu/tsfracdiff/fd1816021ac717eb6c761365931279071b66ead6/docs/tsfracdiff/.nojekyll -------------------------------------------------------------------------------- /docs/tsfracdiff/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | tsfracdiff API documentation 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
20 |
21 |
22 |

Package tsfracdiff

23 |
24 |
25 |
26 | 27 | Expand source code 28 | 29 |
from .unit_root_tests import *
30 | from .tsfracdiff import *
31 | 
32 | __version__ = "1.0.4"
33 |
34 |
35 |
36 |

Sub-modules

37 |
38 |
tsfracdiff.tsfracdiff
39 |
40 |
41 |
42 |
tsfracdiff.unit_root_tests
43 |
44 |
45 |
46 |
47 |
48 |
49 |
50 |
51 |
52 |
53 |
54 |
55 | 69 |
70 | 73 | 74 | -------------------------------------------------------------------------------- /docs/tsfracdiff/tsfracdiff.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | tsfracdiff.tsfracdiff API documentation 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
20 |
21 |
22 |

Module tsfracdiff.tsfracdiff

23 |
24 |
25 |
26 | 27 | Expand source code 28 | 29 |
from .unit_root_tests import *
 30 | 
 31 | import pandas as pd
 32 | import numpy as np
 33 | 
 34 | class FractionalDifferentiator:
 35 |     
 36 |     def __init__(self, maxOrderBound=1, significance=0.01, precision=0.01, memoryThreshold=1e-4,
 37 |                        unitRootTest='PP', unitRootTestConfig={}):
 38 |         """
 39 |         Estimates the real-valued order of integration and provides fractional 
 40 |         differentiation data transformations.
 41 |         
 42 |         The available stationarity/unit root tests are:
 43 |         -----------------------------------------------
 44 |             - 'PP'  : Phillips and Perron (1988) [default]
 45 |             - 'ADF' : Augmented Dickey-Fuller (Said & Dickey, 1984)
 46 | 
 47 |         Parameters:
 48 |         -----------
 49 |             maxOrderBound       (float) Maximum real-valued order to search in (0, maxOrderBound)
 50 |             significance        (float) Statistical significance level
 51 |             precision           (float) Precision of estimated order
 52 |             memoryThreshold     (float) Minimum magnitude of weight significance
 53 |             unitRootTest        (str)   Unit-root/stationarity tests: ['PP','ADF']
 54 |             unitRootTestConfig  (dict)  Optional keyword arguments to pass to unit root tests
 55 | 
 56 |         Attributes:
 57 |         -----------
 58 |             orders              (list)  Estimated minimum orders of differentiation
 59 |             numLags             (list)  Number of lags required for transformations
 60 | 
 61 |         Example:
 62 |         --------
 63 |                 # A pandas.DataFrame/np.array with potentially non-stationary time series
 64 |             df 
 65 |         
 66 |                 # Automatic stationary transformation with minimal information loss
 67 |             from tsfracdiff import FractionalDifferentiator
 68 |             fracDiff = FractionalDifferentiator()
 69 |             df = fracDiff.FitTransform(df)
 70 |         """
 71 |         self.maxOrderBound = maxOrderBound
 72 |         self.significance = significance
 73 |         self.precision = precision
 74 |         self.memoryThreshold = memoryThreshold
 75 |         
 76 |         # Critical value checks
 77 |         checkCV = False
 78 |         cv_sig = None
 79 |         if (self.significance in [0.01, 0.05, 0.1]):
 80 |             checkCV = True
 81 |             cv_sig = str(int(self.significance * 100)) + '%'
 82 |         
 83 |         # Unit-root/Stationarity tests
 84 |         if unitRootTest == 'PP':
 85 |             self.UnitRootTest = PhillipsPerron(significance=significance, checkCV=checkCV, cv_sig=cv_sig)
 86 |         elif unitRootTest == 'ADF':
 87 |             self.UnitRootTest = ADFuller(significance=significance, checkCV=checkCV, cv_sig=cv_sig)
 88 |         else:
 89 |             raise Exception('Please specify a valid unit root test.')
 90 |         self.UnitRootTest.config.update( unitRootTestConfig )
 91 | 
 92 |         # States
 93 |         self.isFitted = False
 94 |         self.orders = []
 95 |         self.numLags = None
 96 |         
 97 |     def Fit(self, df, parallel=True):
 98 |         """
 99 |         Estimates the fractional order of integration.
100 |         
101 |         Parameters:
102 |         -----------
103 |             df       (pandas.DataFrame/np.array) Raw data
104 |             parallel (bool) Use multiprocessing if true (default). Requires `joblib`.
105 |         """
106 |         df = pd.DataFrame(df)
107 |         
108 |         # Estimate minimum order of differencing
109 |         if parallel:
110 |             try:
111 |                 import multiprocessing
112 |                 from joblib import Parallel, delayed
113 |                 from functools import partial
114 |             except ImportError:
115 |                 raise Exception('The module `joblib` is required for parallelization.')
116 | 
117 |             def ApplyParallel(df, func, **kwargs):
118 |                 n_jobs = min(df.shape[1], multiprocessing.cpu_count())
119 |                 res = Parallel(n_jobs=n_jobs)( delayed(partial(func, **kwargs))(x) for x in np.array_split(df, df.shape[1], axis=1) )
120 |                 return res
121 |             orders = ApplyParallel(df, self._MinimumOrderSearch, upperOrder=self.maxOrderBound, first_run=True)
122 |         else:
123 |             orders = []
124 |             for j in range(df.shape[1]):
125 |                 orders.append( self._MinimumOrderSearch(df.iloc[:,j], upperOrder=self.maxOrderBound, first_run=True) )
126 |         self.orders = orders
127 |         self.numLags = [ (len(self._GetMemoryWeights(order, memoryThreshold=self.memoryThreshold)) - 1) for order in self.orders ]
128 |         self.isFitted = True
129 | 
130 |         return
131 |         
132 |     def FitTransform(self, df, parallel=True):
133 |         """
134 |         Estimates the fractional order of integration and returns a stationarized dataframe.
135 | 
136 |         Parameters
137 |         ----------
138 |             df       (pandas.DataFrame/np.array) Raw data
139 |             parallel (bool) Use multiprocessing if true (default). Requires `joblib`.
140 |         """
141 |         if not self.isFitted: 
142 |             self.Fit(df, parallel=parallel)
143 |         fracDiffed = self.Transform(df)
144 | 
145 |         return fracDiffed
146 |     
147 |     def Transform(self, df):
148 |         """
149 |         Applies a fractional differentiation transformation based on estimated orders.
150 | 
151 |         Parameters
152 |         ----------
153 |             df  (pandas.DataFrame/np.array) Raw data
154 |         """
155 |         if not self.isFitted: 
156 |             raise Exception('Fit the model first.')
157 |             
158 |         df = pd.DataFrame(df)
159 |         fracDiffed = []
160 |         for j in range(df.shape[1]):
161 |             x = self._FracDiff(df.iloc[:,j], order=self.orders[j])
162 |             fracDiffed.append( x )
163 |         fracDiffed = pd.concat(fracDiffed, axis=1).sort_index()
164 | 
165 |         return fracDiffed
166 |     
167 |     def InverseTransform(self, fracDiffed, lagData):
168 |         """
169 |         Applies a fractional integration transformation by inverting the fractional differentiation. 
170 | 
171 |         Note: The previous `K` values of the original time series are required to invert the transformation.
172 |         For multi-variate time series, `K` will likely vary across columns and you may find `K` with the
173 |         attribute `.numLags`. 
174 |         
175 |         Parameters
176 |         ----------
177 |             fracDiffed (pandas.DataFrame/np.array) Fractionally differentiated data
178 |             lagData    (pandas.DataFrame/np.array) Previous values of time series. See note.
179 | 
180 |         Example
181 |         -------
182 |             # Multi-variate Time Series/DataFrame
183 |             X                                           # Shape (1000, 2)
184 | 
185 |             # Stationarize
186 |             fracDiff = FractionalDifferentiator()
187 |             X_stationary = fracDiff.FitTransform( X )   # Shape (967, 2)
188 | 
189 |             # Estimated orders
190 |             orders = fracDiff.orders                    # [0.5703, 0.9141]
191 | 
192 |             # Required lagged values
193 |             numLags = fracDiff.numLags                  # [155, 33]
194 |             lagData = X.head(max(numLags))
195 | 
196 |             # Fractionally integrate by passing in the first 155 values
197 |             X_reconstructed = fracDiff.InverseTransform( X_stationary, lagData )    # Recovers the original X
198 |         """
199 |         if not self.isFitted: 
200 |             raise Exception('Fit the model first.')
201 | 
202 |         maxLags, minLags = max(self.numLags), min(self.numLags)
203 |         lagData = pd.DataFrame(lagData)
204 |         if lagData.shape[0] != maxLags:
205 |             raise Exception(f'The previous {maxLags} values are required.')
206 |         
207 |         fracDiffed = pd.DataFrame(fracDiffed)
208 |         X = []
209 |         for j in range(fracDiffed.shape[1]):
210 |             memoryWeights = self._GetMemoryWeights(self.orders[j], memoryThreshold=self.memoryThreshold)
211 |             K = self.numLags[j]
212 |             offset = K - minLags
213 | 
214 |             # Initial values
215 |             tsLagData = lagData.iloc[:K, j]
216 |             
217 |             # Transformed values
218 |             X_tilde = fracDiffed.iloc[offset:, j]
219 | 
220 |             # Already stationary: identity transform
221 |             if K == 0:
222 |                 X.append( X_tilde )
223 |                 continue
224 |             
225 |             # Iteratively invert transformation
226 |             X_vals = np.ravel(tsLagData.values)
227 |             X_tilde = np.ravel(X_tilde.values)
228 |             for t in range(len(X_tilde)):
229 |                 x = X_tilde[t] - np.sum( memoryWeights[:-1] * X_vals[-K:] )
230 |                 X_vals = np.append(X_vals, x)
231 |             X_vals = pd.Series(X_vals)
232 |             X.append( X_vals )
233 |         X = pd.concat(X, axis=1).sort_index()
234 |         X.columns = fracDiffed.columns
235 | 
236 |         # Check for duplicate indices
237 |         idx = lagData.index[:minLags].union( fracDiffed.index )
238 |         if len(idx) != X.shape[0]:
239 |             idx = [ t for t in range(X.shape[0]) ]
240 |         X.index = idx
241 | 
242 |         return X
243 | 
244 |     def _GetMemoryWeights(self, order, memoryThreshold=1e-4):
245 |         """
246 |         Returns an array of memory weights for each time lag.
247 | 
248 |         Parameters:
249 |         -----------
250 |             order           (float) Order of fracdiff
251 |             memoryThreshold (float) Minimum magnitude of weight significance
252 |         """
253 |         memoryWeights = [1,]
254 |         k = 1
255 |         while True:
256 |             weight = -memoryWeights[-1] * ( order - k + 1 ) / k # Iteratively generate next lag weight
257 |             if abs(weight) < memoryThreshold:
258 |                 break
259 |             memoryWeights.append(weight)
260 |             k += 1
261 |         return np.array(list(reversed(memoryWeights)))
262 |     
263 |     def _FracDiff(self, ts, order=1, memoryWeights=None):
264 |         """
265 |         Differentiates a time series based on a real-valued order.
266 | 
267 |         Parameters:
268 |         -----------
269 |             ts            (pandas.Series) Univariate time series
270 |             order         (float) Order of differentiation
271 |             memoryWeights (array) Optional pre-computed weights
272 |         """
273 |         if memoryWeights is None:
274 |             memoryWeights = self._GetMemoryWeights(order, memoryThreshold=self.memoryThreshold)
275 | 
276 |         K = len(memoryWeights)
277 |         fracDiffedSeries = ts.rolling(K).apply(lambda x: np.sum( x * memoryWeights ), raw=True)
278 |         fracDiffedSeries = fracDiffedSeries.iloc[(K-1):]
279 |         
280 |         return fracDiffedSeries
281 |     
282 |     def _MinimumOrderSearch(self, ts, lowerOrder=0, upperOrder=1, first_run=False):
283 |         """
284 |         Binary search algorithm for estimating the minimum order of differentiation required for stationarity.
285 |         
286 |         Parameters
287 |         ----------
288 |             ts                   (pandas.Series) Univariate time series
289 |             lowerOrder           (float) Lower bound on order
290 |             upperOrder           (float) Upper bound on order
291 |             first_run            (bool)  For testing endpoints of order bounds
292 |         """  
293 |         ## Convergence criteria
294 |         if abs( upperOrder - lowerOrder ) <= self.precision:
295 |             return upperOrder
296 |         
297 |         ## Initial run: Test endpoints
298 |         if first_run:
299 |             lowerFracDiff = self._FracDiff(ts, order=lowerOrder).dropna()
300 |             upperFracDiff = self._FracDiff(ts, order=upperOrder).dropna()
301 |             
302 |             # Unit root tests
303 |             lowerStationary = self.UnitRootTest.IsStationary( lowerFracDiff )
304 |             upperStationary = self.UnitRootTest.IsStationary( upperFracDiff )
305 | 
306 |             # Series is I(0)
307 |             if lowerStationary:
308 |                 return lowerOrder
309 |             # Series is I(k>>1)
310 |             if not upperStationary:                                                        
311 |                 print('Warning: Time series is explosive. Increase upper bounds.')
312 |                 return upperOrder
313 |             
314 |         ## Binary Search: Test midpoint
315 |         midOrder = ( lowerOrder + upperOrder ) / 2                                      
316 |         midFracDiff = self._FracDiff(ts, order=midOrder).dropna()
317 |         midStationary = self.UnitRootTest.IsStationary( midFracDiff )
318 |         
319 |         # Series is weakly stationary in [lowerOrder, midOrder]
320 |         if midStationary:
321 |             return self._MinimumOrderSearch(ts, lowerOrder=lowerOrder, upperOrder=midOrder)
322 |         # Series is weakly stationary in [midOrder, upperOrder]
323 |         else:
324 |             return self._MinimumOrderSearch(ts, lowerOrder=midOrder, upperOrder=upperOrder)
325 |         
326 |
327 |
328 |
329 |
330 |
331 |
332 |
333 |
334 |
335 |

Classes

336 |
337 |
338 | class FractionalDifferentiator 339 | (maxOrderBound=1, significance=0.01, precision=0.01, memoryThreshold=0.0001, unitRootTest='PP', unitRootTestConfig={}) 340 |
341 |
342 |

Estimates the real-valued order of integration and provides fractional 343 | differentiation data transformations.

344 |

The available stationarity/unit root tests are:

345 |
- 'PP'  : Phillips and Perron (1988) [default]
346 | - 'ADF' : Augmented Dickey-Fuller (Said & Dickey, 1984)
347 | 
348 |

Parameters:

349 |
maxOrderBound       (float) Maximum real-valued order to search in (0, maxOrderBound)
350 | significance        (float) Statistical significance level
351 | precision           (float) Precision of estimated order
352 | memoryThreshold     (float) Minimum magnitude of weight significance
353 | unitRootTest        (str)   Unit-root/stationarity tests: ['PP','ADF']
354 | unitRootTestConfig  (dict)  Optional keyword arguments to pass to unit root tests
355 | 
356 |

Attributes:

357 |
orders              (list)  Estimated minimum orders of differentiation
358 | numLags             (list)  Number of lags required for transformations
359 | 
360 |

Example:

361 |
    # A pandas.DataFrame/np.array with potentially non-stationary time series
362 | df
363 | 
364 |     # Automatic stationary transformation with minimal information loss
365 | from tsfracdiff import FractionalDifferentiator
366 | fracDiff = FractionalDifferentiator()
367 | df = fracDiff.FitTransform(df)
368 | 
369 |
370 | 371 | Expand source code 372 | 373 |
class FractionalDifferentiator:
374 |     
375 |     def __init__(self, maxOrderBound=1, significance=0.01, precision=0.01, memoryThreshold=1e-4,
376 |                        unitRootTest='PP', unitRootTestConfig={}):
377 |         """
378 |         Estimates the real-valued order of integration and provides fractional 
379 |         differentiation data transformations.
380 |         
381 |         The available stationarity/unit root tests are:
382 |         -----------------------------------------------
383 |             - 'PP'  : Phillips and Perron (1988) [default]
384 |             - 'ADF' : Augmented Dickey-Fuller (Said & Dickey, 1984)
385 | 
386 |         Parameters:
387 |         -----------
388 |             maxOrderBound       (float) Maximum real-valued order to search in (0, maxOrderBound)
389 |             significance        (float) Statistical significance level
390 |             precision           (float) Precision of estimated order
391 |             memoryThreshold     (float) Minimum magnitude of weight significance
392 |             unitRootTest        (str)   Unit-root/stationarity tests: ['PP','ADF']
393 |             unitRootTestConfig  (dict)  Optional keyword arguments to pass to unit root tests
394 | 
395 |         Attributes:
396 |         -----------
397 |             orders              (list)  Estimated minimum orders of differentiation
398 |             numLags             (list)  Number of lags required for transformations
399 | 
400 |         Example:
401 |         --------
402 |                 # A pandas.DataFrame/np.array with potentially non-stationary time series
403 |             df 
404 |         
405 |                 # Automatic stationary transformation with minimal information loss
406 |             from tsfracdiff import FractionalDifferentiator
407 |             fracDiff = FractionalDifferentiator()
408 |             df = fracDiff.FitTransform(df)
409 |         """
410 |         self.maxOrderBound = maxOrderBound
411 |         self.significance = significance
412 |         self.precision = precision
413 |         self.memoryThreshold = memoryThreshold
414 |         
415 |         # Critical value checks
416 |         checkCV = False
417 |         cv_sig = None
418 |         if (self.significance in [0.01, 0.05, 0.1]):
419 |             checkCV = True
420 |             cv_sig = str(int(self.significance * 100)) + '%'
421 |         
422 |         # Unit-root/Stationarity tests
423 |         if unitRootTest == 'PP':
424 |             self.UnitRootTest = PhillipsPerron(significance=significance, checkCV=checkCV, cv_sig=cv_sig)
425 |         elif unitRootTest == 'ADF':
426 |             self.UnitRootTest = ADFuller(significance=significance, checkCV=checkCV, cv_sig=cv_sig)
427 |         else:
428 |             raise Exception('Please specify a valid unit root test.')
429 |         self.UnitRootTest.config.update( unitRootTestConfig )
430 | 
431 |         # States
432 |         self.isFitted = False
433 |         self.orders = []
434 |         self.numLags = None
435 |         
436 |     def Fit(self, df, parallel=True):
437 |         """
438 |         Estimates the fractional order of integration.
439 |         
440 |         Parameters:
441 |         -----------
442 |             df       (pandas.DataFrame/np.array) Raw data
443 |             parallel (bool) Use multiprocessing if true (default). Requires `joblib`.
444 |         """
445 |         df = pd.DataFrame(df)
446 |         
447 |         # Estimate minimum order of differencing
448 |         if parallel:
449 |             try:
450 |                 import multiprocessing
451 |                 from joblib import Parallel, delayed
452 |                 from functools import partial
453 |             except ImportError:
454 |                 raise Exception('The module `joblib` is required for parallelization.')
455 | 
456 |             def ApplyParallel(df, func, **kwargs):
457 |                 n_jobs = min(df.shape[1], multiprocessing.cpu_count())
458 |                 res = Parallel(n_jobs=n_jobs)( delayed(partial(func, **kwargs))(x) for x in np.array_split(df, df.shape[1], axis=1) )
459 |                 return res
460 |             orders = ApplyParallel(df, self._MinimumOrderSearch, upperOrder=self.maxOrderBound, first_run=True)
461 |         else:
462 |             orders = []
463 |             for j in range(df.shape[1]):
464 |                 orders.append( self._MinimumOrderSearch(df.iloc[:,j], upperOrder=self.maxOrderBound, first_run=True) )
465 |         self.orders = orders
466 |         self.numLags = [ (len(self._GetMemoryWeights(order, memoryThreshold=self.memoryThreshold)) - 1) for order in self.orders ]
467 |         self.isFitted = True
468 | 
469 |         return
470 |         
471 |     def FitTransform(self, df, parallel=True):
472 |         """
473 |         Estimates the fractional order of integration and returns a stationarized dataframe.
474 | 
475 |         Parameters
476 |         ----------
477 |             df       (pandas.DataFrame/np.array) Raw data
478 |             parallel (bool) Use multiprocessing if true (default). Requires `joblib`.
479 |         """
480 |         if not self.isFitted: 
481 |             self.Fit(df, parallel=parallel)
482 |         fracDiffed = self.Transform(df)
483 | 
484 |         return fracDiffed
485 |     
486 |     def Transform(self, df):
487 |         """
488 |         Applies a fractional differentiation transformation based on estimated orders.
489 | 
490 |         Parameters
491 |         ----------
492 |             df  (pandas.DataFrame/np.array) Raw data
493 |         """
494 |         if not self.isFitted: 
495 |             raise Exception('Fit the model first.')
496 |             
497 |         df = pd.DataFrame(df)
498 |         fracDiffed = []
499 |         for j in range(df.shape[1]):
500 |             x = self._FracDiff(df.iloc[:,j], order=self.orders[j])
501 |             fracDiffed.append( x )
502 |         fracDiffed = pd.concat(fracDiffed, axis=1).sort_index()
503 | 
504 |         return fracDiffed
505 |     
506 |     def InverseTransform(self, fracDiffed, lagData):
507 |         """
508 |         Applies a fractional integration transformation by inverting the fractional differentiation. 
509 | 
510 |         Note: The previous `K` values of the original time series are required to invert the transformation.
511 |         For multi-variate time series, `K` will likely vary across columns and you may find `K` with the
512 |         attribute `.numLags`. 
513 |         
514 |         Parameters
515 |         ----------
516 |             fracDiffed (pandas.DataFrame/np.array) Fractionally differentiated data
517 |             lagData    (pandas.DataFrame/np.array) Previous values of time series. See note.
518 | 
519 |         Example
520 |         -------
521 |             # Multi-variate Time Series/DataFrame
522 |             X                                           # Shape (1000, 2)
523 | 
524 |             # Stationarize
525 |             fracDiff = FractionalDifferentiator()
526 |             X_stationary = fracDiff.FitTransform( X )   # Shape (967, 2)
527 | 
528 |             # Estimated orders
529 |             orders = fracDiff.orders                    # [0.5703, 0.9141]
530 | 
531 |             # Required lagged values
532 |             numLags = fracDiff.numLags                  # [155, 33]
533 |             lagData = X.head(max(numLags))
534 | 
535 |             # Fractionally integrate by passing in the first 155 values
536 |             X_reconstructed = fracDiff.InverseTransform( X_stationary, lagData )    # Recovers the original X
537 |         """
538 |         if not self.isFitted: 
539 |             raise Exception('Fit the model first.')
540 | 
541 |         maxLags, minLags = max(self.numLags), min(self.numLags)
542 |         lagData = pd.DataFrame(lagData)
543 |         if lagData.shape[0] != maxLags:
544 |             raise Exception(f'The previous {maxLags} values are required.')
545 |         
546 |         fracDiffed = pd.DataFrame(fracDiffed)
547 |         X = []
548 |         for j in range(fracDiffed.shape[1]):
549 |             memoryWeights = self._GetMemoryWeights(self.orders[j], memoryThreshold=self.memoryThreshold)
550 |             K = self.numLags[j]
551 |             offset = K - minLags
552 | 
553 |             # Initial values
554 |             tsLagData = lagData.iloc[:K, j]
555 |             
556 |             # Transformed values
557 |             X_tilde = fracDiffed.iloc[offset:, j]
558 | 
559 |             # Already stationary: identity transform
560 |             if K == 0:
561 |                 X.append( X_tilde )
562 |                 continue
563 |             
564 |             # Iteratively invert transformation
565 |             X_vals = np.ravel(tsLagData.values)
566 |             X_tilde = np.ravel(X_tilde.values)
567 |             for t in range(len(X_tilde)):
568 |                 x = X_tilde[t] - np.sum( memoryWeights[:-1] * X_vals[-K:] )
569 |                 X_vals = np.append(X_vals, x)
570 |             X_vals = pd.Series(X_vals)
571 |             X.append( X_vals )
572 |         X = pd.concat(X, axis=1).sort_index()
573 |         X.columns = fracDiffed.columns
574 | 
575 |         # Check for duplicate indices
576 |         idx = lagData.index[:minLags].union( fracDiffed.index )
577 |         if len(idx) != X.shape[0]:
578 |             idx = [ t for t in range(X.shape[0]) ]
579 |         X.index = idx
580 | 
581 |         return X
582 | 
583 |     def _GetMemoryWeights(self, order, memoryThreshold=1e-4):
584 |         """
585 |         Returns an array of memory weights for each time lag.
586 | 
587 |         Parameters:
588 |         -----------
589 |             order           (float) Order of fracdiff
590 |             memoryThreshold (float) Minimum magnitude of weight significance
591 |         """
592 |         memoryWeights = [1,]
593 |         k = 1
594 |         while True:
595 |             weight = -memoryWeights[-1] * ( order - k + 1 ) / k # Iteratively generate next lag weight
596 |             if abs(weight) < memoryThreshold:
597 |                 break
598 |             memoryWeights.append(weight)
599 |             k += 1
600 |         return np.array(list(reversed(memoryWeights)))
601 |     
602 |     def _FracDiff(self, ts, order=1, memoryWeights=None):
603 |         """
604 |         Differentiates a time series based on a real-valued order.
605 | 
606 |         Parameters:
607 |         -----------
608 |             ts            (pandas.Series) Univariate time series
609 |             order         (float) Order of differentiation
610 |             memoryWeights (array) Optional pre-computed weights
611 |         """
612 |         if memoryWeights is None:
613 |             memoryWeights = self._GetMemoryWeights(order, memoryThreshold=self.memoryThreshold)
614 | 
615 |         K = len(memoryWeights)
616 |         fracDiffedSeries = ts.rolling(K).apply(lambda x: np.sum( x * memoryWeights ), raw=True)
617 |         fracDiffedSeries = fracDiffedSeries.iloc[(K-1):]
618 |         
619 |         return fracDiffedSeries
620 |     
621 |     def _MinimumOrderSearch(self, ts, lowerOrder=0, upperOrder=1, first_run=False):
622 |         """
623 |         Binary search algorithm for estimating the minimum order of differentiation required for stationarity.
624 |         
625 |         Parameters
626 |         ----------
627 |             ts                   (pandas.Series) Univariate time series
628 |             lowerOrder           (float) Lower bound on order
629 |             upperOrder           (float) Upper bound on order
630 |             first_run            (bool)  For testing endpoints of order bounds
631 |         """  
632 |         ## Convergence criteria
633 |         if abs( upperOrder - lowerOrder ) <= self.precision:
634 |             return upperOrder
635 |         
636 |         ## Initial run: Test endpoints
637 |         if first_run:
638 |             lowerFracDiff = self._FracDiff(ts, order=lowerOrder).dropna()
639 |             upperFracDiff = self._FracDiff(ts, order=upperOrder).dropna()
640 |             
641 |             # Unit root tests
642 |             lowerStationary = self.UnitRootTest.IsStationary( lowerFracDiff )
643 |             upperStationary = self.UnitRootTest.IsStationary( upperFracDiff )
644 | 
645 |             # Series is I(0)
646 |             if lowerStationary:
647 |                 return lowerOrder
648 |             # Series is I(k>>1)
649 |             if not upperStationary:                                                        
650 |                 print('Warning: Time series is explosive. Increase upper bounds.')
651 |                 return upperOrder
652 |             
653 |         ## Binary Search: Test midpoint
654 |         midOrder = ( lowerOrder + upperOrder ) / 2                                      
655 |         midFracDiff = self._FracDiff(ts, order=midOrder).dropna()
656 |         midStationary = self.UnitRootTest.IsStationary( midFracDiff )
657 |         
658 |         # Series is weakly stationary in [lowerOrder, midOrder]
659 |         if midStationary:
660 |             return self._MinimumOrderSearch(ts, lowerOrder=lowerOrder, upperOrder=midOrder)
661 |         # Series is weakly stationary in [midOrder, upperOrder]
662 |         else:
663 |             return self._MinimumOrderSearch(ts, lowerOrder=midOrder, upperOrder=upperOrder)
664 |
665 |

Methods

666 |
667 |
668 | def Fit(self, df, parallel=True) 669 |
670 |
671 |

Estimates the fractional order of integration.

672 |

Parameters:

673 |
df       (pandas.DataFrame/np.array) Raw data
674 | parallel (bool) Use multiprocessing if true (default). Requires <code>joblib</code>.
675 | 
676 |
677 | 678 | Expand source code 679 | 680 |
def Fit(self, df, parallel=True):
681 |     """
682 |     Estimates the fractional order of integration.
683 |     
684 |     Parameters:
685 |     -----------
686 |         df       (pandas.DataFrame/np.array) Raw data
687 |         parallel (bool) Use multiprocessing if true (default). Requires `joblib`.
688 |     """
689 |     df = pd.DataFrame(df)
690 |     
691 |     # Estimate minimum order of differencing
692 |     if parallel:
693 |         try:
694 |             import multiprocessing
695 |             from joblib import Parallel, delayed
696 |             from functools import partial
697 |         except ImportError:
698 |             raise Exception('The module `joblib` is required for parallelization.')
699 | 
700 |         def ApplyParallel(df, func, **kwargs):
701 |             n_jobs = min(df.shape[1], multiprocessing.cpu_count())
702 |             res = Parallel(n_jobs=n_jobs)( delayed(partial(func, **kwargs))(x) for x in np.array_split(df, df.shape[1], axis=1) )
703 |             return res
704 |         orders = ApplyParallel(df, self._MinimumOrderSearch, upperOrder=self.maxOrderBound, first_run=True)
705 |     else:
706 |         orders = []
707 |         for j in range(df.shape[1]):
708 |             orders.append( self._MinimumOrderSearch(df.iloc[:,j], upperOrder=self.maxOrderBound, first_run=True) )
709 |     self.orders = orders
710 |     self.numLags = [ (len(self._GetMemoryWeights(order, memoryThreshold=self.memoryThreshold)) - 1) for order in self.orders ]
711 |     self.isFitted = True
712 | 
713 |     return
714 |
715 |
716 |
717 | def FitTransform(self, df, parallel=True) 718 |
719 |
720 |

Estimates the fractional order of integration and returns a stationarized dataframe.

721 |

Parameters

722 |
df       (pandas.DataFrame/np.array) Raw data
723 | parallel (bool) Use multiprocessing if true (default). Requires <code>joblib</code>.
724 | 
725 |
726 | 727 | Expand source code 728 | 729 |
def FitTransform(self, df, parallel=True):
730 |     """
731 |     Estimates the fractional order of integration and returns a stationarized dataframe.
732 | 
733 |     Parameters
734 |     ----------
735 |         df       (pandas.DataFrame/np.array) Raw data
736 |         parallel (bool) Use multiprocessing if true (default). Requires `joblib`.
737 |     """
738 |     if not self.isFitted: 
739 |         self.Fit(df, parallel=parallel)
740 |     fracDiffed = self.Transform(df)
741 | 
742 |     return fracDiffed
743 |
744 |
745 |
746 | def InverseTransform(self, fracDiffed, lagData) 747 |
748 |
749 |

Applies a fractional integration transformation by inverting the fractional differentiation.

750 |

Note: The previous K values of the original time series are required to invert the transformation. 751 | For multi-variate time series, K will likely vary across columns and you may find K with the 752 | attribute .numLags.

753 |

Parameters

754 |
fracDiffed (pandas.DataFrame/np.array) Fractionally differentiated data
755 | lagData    (pandas.DataFrame/np.array) Previous values of time series. See note.
756 | 
757 |

Example

758 |
# Multi-variate Time Series/DataFrame
759 | X                                           # Shape (1000, 2)
760 | 
761 | # Stationarize
762 | fracDiff = FractionalDifferentiator()
763 | X_stationary = fracDiff.FitTransform( X )   # Shape (967, 2)
764 | 
765 | # Estimated orders
766 | orders = fracDiff.orders                    # [0.5703, 0.9141]
767 | 
768 | # Required lagged values
769 | numLags = fracDiff.numLags                  # [155, 33]
770 | lagData = X.head(max(numLags))
771 | 
772 | # Fractionally integrate by passing in the first 155 values
773 | X_reconstructed = fracDiff.InverseTransform( X_stationary, lagData )    # Recovers the original X
774 | 
775 |
776 | 777 | Expand source code 778 | 779 |
def InverseTransform(self, fracDiffed, lagData):
780 |     """
781 |     Applies a fractional integration transformation by inverting the fractional differentiation. 
782 | 
783 |     Note: The previous `K` values of the original time series are required to invert the transformation.
784 |     For multi-variate time series, `K` will likely vary across columns and you may find `K` with the
785 |     attribute `.numLags`. 
786 |     
787 |     Parameters
788 |     ----------
789 |         fracDiffed (pandas.DataFrame/np.array) Fractionally differentiated data
790 |         lagData    (pandas.DataFrame/np.array) Previous values of time series. See note.
791 | 
792 |     Example
793 |     -------
794 |         # Multi-variate Time Series/DataFrame
795 |         X                                           # Shape (1000, 2)
796 | 
797 |         # Stationarize
798 |         fracDiff = FractionalDifferentiator()
799 |         X_stationary = fracDiff.FitTransform( X )   # Shape (967, 2)
800 | 
801 |         # Estimated orders
802 |         orders = fracDiff.orders                    # [0.5703, 0.9141]
803 | 
804 |         # Required lagged values
805 |         numLags = fracDiff.numLags                  # [155, 33]
806 |         lagData = X.head(max(numLags))
807 | 
808 |         # Fractionally integrate by passing in the first 155 values
809 |         X_reconstructed = fracDiff.InverseTransform( X_stationary, lagData )    # Recovers the original X
810 |     """
811 |     if not self.isFitted: 
812 |         raise Exception('Fit the model first.')
813 | 
814 |     maxLags, minLags = max(self.numLags), min(self.numLags)
815 |     lagData = pd.DataFrame(lagData)
816 |     if lagData.shape[0] != maxLags:
817 |         raise Exception(f'The previous {maxLags} values are required.')
818 |     
819 |     fracDiffed = pd.DataFrame(fracDiffed)
820 |     X = []
821 |     for j in range(fracDiffed.shape[1]):
822 |         memoryWeights = self._GetMemoryWeights(self.orders[j], memoryThreshold=self.memoryThreshold)
823 |         K = self.numLags[j]
824 |         offset = K - minLags
825 | 
826 |         # Initial values
827 |         tsLagData = lagData.iloc[:K, j]
828 |         
829 |         # Transformed values
830 |         X_tilde = fracDiffed.iloc[offset:, j]
831 | 
832 |         # Already stationary: identity transform
833 |         if K == 0:
834 |             X.append( X_tilde )
835 |             continue
836 |         
837 |         # Iteratively invert transformation
838 |         X_vals = np.ravel(tsLagData.values)
839 |         X_tilde = np.ravel(X_tilde.values)
840 |         for t in range(len(X_tilde)):
841 |             x = X_tilde[t] - np.sum( memoryWeights[:-1] * X_vals[-K:] )
842 |             X_vals = np.append(X_vals, x)
843 |         X_vals = pd.Series(X_vals)
844 |         X.append( X_vals )
845 |     X = pd.concat(X, axis=1).sort_index()
846 |     X.columns = fracDiffed.columns
847 | 
848 |     # Check for duplicate indices
849 |     idx = lagData.index[:minLags].union( fracDiffed.index )
850 |     if len(idx) != X.shape[0]:
851 |         idx = [ t for t in range(X.shape[0]) ]
852 |     X.index = idx
853 | 
854 |     return X
855 |
856 |
857 |
858 | def Transform(self, df) 859 |
860 |
861 |

Applies a fractional differentiation transformation based on estimated orders.

862 |

Parameters

863 |
df  (pandas.DataFrame/np.array) Raw data
864 | 
865 |
866 | 867 | Expand source code 868 | 869 |
def Transform(self, df):
870 |     """
871 |     Applies a fractional differentiation transformation based on estimated orders.
872 | 
873 |     Parameters
874 |     ----------
875 |         df  (pandas.DataFrame/np.array) Raw data
876 |     """
877 |     if not self.isFitted: 
878 |         raise Exception('Fit the model first.')
879 |         
880 |     df = pd.DataFrame(df)
881 |     fracDiffed = []
882 |     for j in range(df.shape[1]):
883 |         x = self._FracDiff(df.iloc[:,j], order=self.orders[j])
884 |         fracDiffed.append( x )
885 |     fracDiffed = pd.concat(fracDiffed, axis=1).sort_index()
886 | 
887 |     return fracDiffed
888 |
889 |
890 |
891 |
892 |
893 |
894 |
895 | 921 |
922 | 925 | 926 | -------------------------------------------------------------------------------- /docs/tsfracdiff/unit_root_tests.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | tsfracdiff.unit_root_tests API documentation 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
20 |
21 |
22 |

Module tsfracdiff.unit_root_tests

23 |
24 |
25 |
26 | 27 | Expand source code 28 | 29 |
import arch
 30 | from arch.unitroot import PhillipsPerron as PP
 31 | from arch.unitroot import ADF
 32 | 
 33 | ## TODO: Ng and Perron (2001)?
 34 | 
 35 | class PhillipsPerron:
 36 |     """
 37 |     Unit root testing via Phillips and Perron (1988). This test is robust to
 38 |     serial correlation and heteroskedasticity.
 39 | 
 40 |     References:
 41 |     -----------
 42 |     Phillips, P. C. B., & Perron, P. (1988). Testing for a unit root in time series regression. 
 43 |     Biometrika, 75(2), 335–346. https://doi.org/10.1093/biomet/75.2.335
 44 |     """
 45 |     
 46 |     def __init__(self, 
 47 |                 config={ 'trend' : 'n', 'test_type' : 'tau'}, 
 48 |                 significance=0.01,
 49 |                 checkCV=False, 
 50 |                 cv_sig=None):
 51 |         self.config = config
 52 |         self.significance = significance
 53 |         self.checkCV = checkCV
 54 |         self.cv_sig = cv_sig
 55 | 
 56 |     def IsStationary(self, ts):
 57 |         """
 58 |         Performs a unit root test.
 59 |         """
 60 | 
 61 |         testResults = PP(ts, trend=self.config['trend'], test_type=self.config['test_type'])
 62 |         pval, cv, stat = testResults.pvalue, testResults.critical_values, testResults.stat
 63 | 
 64 |         result = self.HypothesisTest(pval, cv, stat)
 65 | 
 66 |         return result
 67 | 
 68 |     def HypothesisTest(self, pval, cv, stat):
 69 |         """
 70 |         Null Hypothesis: Time series is integrated of order I(1)
 71 |         Alt Hypothesis: Time series is integrated of order I(k<1)
 72 |         """
 73 |         
 74 |         # Reject the hypothesis
 75 |         if (pval < self.significance) or ( self.checkCV and (stat < cv.get(self.cv_sig, 0)) ):
 76 |             return True
 77 |         # Fail to reject the hypothesis
 78 |         else:
 79 |             return False
 80 | 
 81 | class ADFuller:
 82 |     """
 83 |     Unit root testing via Said and Dickey (1984). This test assumes a parametric
 84 |     ARMA structure to correct for serial correlation but assumes the errors are homoskedastic.
 85 | 
 86 |     References:
 87 |     -----------
 88 |     Said E. Said, & Dickey, D. A. (1984). Testing for Unit Roots in Autoregressive-Moving Average 
 89 |     Models of Unknown Order. Biometrika, 71(3), 599–607. https://doi.org/10.2307/2336570
 90 |     """
 91 |     def __init__(self, 
 92 |                 config={ 'trend' : 'n', 'method' : 'AIC'}, 
 93 |                 significance=0.01,
 94 |                 checkCV=False, 
 95 |                 cv_sig=None):
 96 |         self.config = config
 97 |         self.significance = significance
 98 |         self.checkCV = checkCV
 99 |         self.cv_sig = cv_sig
100 | 
101 |         ## Compatibility workaround //
102 |         #   arch <= 4.17 uses capital letters but newer versions use lowercase
103 |         if (str(arch.__version__) > '4.17'):
104 |             if self.config.get('method') == 'AIC':
105 |                 self.config['method'] = 'aic'
106 |             elif self.config.get('method') == 'BIC':
107 |                 self.config['method'] = 'bic'
108 | 
109 |     def IsStationary(self, ts):
110 |         """
111 |         Performs a unit root test.
112 |         """
113 | 
114 |         testResults = ADF(ts, trend=self.config['trend'], method=self.config['method'])
115 |         pval, cv, stat = testResults.pvalue, testResults.critical_values, testResults.stat
116 | 
117 |         result = self.HypothesisTest(pval, cv, stat)
118 | 
119 |         return result
120 | 
121 |     def HypothesisTest(self, pval, cv, stat):
122 |         """
123 |         Null Hypothesis: Gamma = 0 (Unit root)
124 |         Alt Hypothesis: Gamma < 0
125 |         """
126 |         
127 |         # Reject the hypothesis
128 |         if (pval < self.significance) or ( self.checkCV and (stat < cv.get(self.cv_sig, 0)) ):
129 |             return True
130 |         # Fail to reject the hypothesis
131 |         else:
132 |             return False
133 | 
134 |     
135 | 
136 |     
137 |
138 |
139 |
140 |
141 |
142 |
143 |
144 |
145 |
146 |

Classes

147 |
148 |
149 | class ADFuller 150 | (config={'trend': 'n', 'method': 'AIC'}, significance=0.01, checkCV=False, cv_sig=None) 151 |
152 |
153 |

Unit root testing via Said and Dickey (1984). This test assumes a parametric 154 | ARMA structure to correct for serial correlation but assumes the errors are homoskedastic.

155 |

References:

156 |

Said E. Said, & Dickey, D. A. (1984). Testing for Unit Roots in Autoregressive-Moving Average 157 | Models of Unknown Order. Biometrika, 71(3), 599–607. https://doi.org/10.2307/2336570

158 |
159 | 160 | Expand source code 161 | 162 |
class ADFuller:
163 |     """
164 |     Unit root testing via Said and Dickey (1984). This test assumes a parametric
165 |     ARMA structure to correct for serial correlation but assumes the errors are homoskedastic.
166 | 
167 |     References:
168 |     -----------
169 |     Said E. Said, & Dickey, D. A. (1984). Testing for Unit Roots in Autoregressive-Moving Average 
170 |     Models of Unknown Order. Biometrika, 71(3), 599–607. https://doi.org/10.2307/2336570
171 |     """
172 |     def __init__(self, 
173 |                 config={ 'trend' : 'n', 'method' : 'AIC'}, 
174 |                 significance=0.01,
175 |                 checkCV=False, 
176 |                 cv_sig=None):
177 |         self.config = config
178 |         self.significance = significance
179 |         self.checkCV = checkCV
180 |         self.cv_sig = cv_sig
181 | 
182 |         ## Compatibility workaround //
183 |         #   arch <= 4.17 uses capital letters but newer versions use lowercase
184 |         if (str(arch.__version__) > '4.17'):
185 |             if self.config.get('method') == 'AIC':
186 |                 self.config['method'] = 'aic'
187 |             elif self.config.get('method') == 'BIC':
188 |                 self.config['method'] = 'bic'
189 | 
190 |     def IsStationary(self, ts):
191 |         """
192 |         Performs a unit root test.
193 |         """
194 | 
195 |         testResults = ADF(ts, trend=self.config['trend'], method=self.config['method'])
196 |         pval, cv, stat = testResults.pvalue, testResults.critical_values, testResults.stat
197 | 
198 |         result = self.HypothesisTest(pval, cv, stat)
199 | 
200 |         return result
201 | 
202 |     def HypothesisTest(self, pval, cv, stat):
203 |         """
204 |         Null Hypothesis: Gamma = 0 (Unit root)
205 |         Alt Hypothesis: Gamma < 0
206 |         """
207 |         
208 |         # Reject the hypothesis
209 |         if (pval < self.significance) or ( self.checkCV and (stat < cv.get(self.cv_sig, 0)) ):
210 |             return True
211 |         # Fail to reject the hypothesis
212 |         else:
213 |             return False
214 |
215 |

Methods

def HypothesisTest(self, pval, cv, stat)

    Null Hypothesis: Gamma = 0 (Unit root)
    Alt Hypothesis: Gamma < 0

def IsStationary(self, ts)

    Performs a unit root test.

class PhillipsPerron (config={'trend': 'n', 'test_type': 'tau'}, significance=0.01, checkCV=False, cv_sig=None)

    Unit root testing via Phillips and Perron (1988). This test is robust to
    serial correlation and heteroskedasticity.

    References:
    Phillips, P. C. B., & Perron, P. (1988). Testing for a unit root in time series regression.
    Biometrika, 75(2), 335–346. https://doi.org/10.1093/biomet/75.2.335

Methods

def HypothesisTest(self, pval, cv, stat)

    Null Hypothesis: Time series is integrated of order I(1)
    Alt Hypothesis: Time series is integrated of order I(k<1)

def IsStationary(self, ts)

    Performs a unit root test.
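Both wrapper classes documented above reduce an arch unit root test to a single boolean: IsStationary returns True when the null of a unit root is rejected at the configured significance level, and False otherwise. A minimal usage sketch (the simulated series, seed, and variable names below are illustrative, not part of the package):

import numpy as np
import pandas as pd
from tsfracdiff import ADFuller, PhillipsPerron

np.random.seed(0)
noise = pd.Series(np.random.normal(size=500))  # stationary white noise
walk = noise.cumsum()                          # random walk, i.e. a unit root process

pp = PhillipsPerron(significance=0.01)
print(pp.IsStationary(noise))   # True: the unit root null should be rejected
print(pp.IsStationary(walk))    # False: fail to reject the unit root null

adf = ADFuller(significance=0.01)
print(adf.IsStationary(noise))  # should likewise be True under the ADF variant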
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
1 | [build-system]
2 | requires = ["setuptools"]
3 | build-backend = "setuptools.build_meta"
4 | 
5 | [project]
6 | name = "tsfracdiff"
7 | description = "Efficient and easy to use fractional differentiation transformations for stationarizing time series data."
8 | authors = [
9 |     {name = "Adam Wu"},
10 |     {email = "adamwu1@outlook.com"}
11 | ]
12 | readme = "README.md"
13 | license = {file = "LICENSE"}
14 | requires-python = ">=3.7"
15 | dependencies = [
16 |     "numpy",
17 |     "pandas",
18 |     "arch",
19 |     "joblib"
20 | ]
21 | classifiers = [
22 |     'Intended Audience :: Science/Research',
23 |     'Topic :: Scientific/Engineering :: Information Analysis',
24 |     'Programming Language :: Python :: 3 ',
25 |     'Operating System :: OS Independent',
26 |     'License :: OSI Approved :: MIT License'
27 | ]
28 | dynamic = [ "version" ]
29 | 
30 | [tool.setuptools.dynamic]
31 | version = {attr = "tsfracdiff.__version__"}
32 | 
33 | [project.urls]
34 | homepage = "https://github.com/adamvvu/tsfracdiff"
35 | documentation = "https://github.com/adamvvu/tsfracdiff"
36 | repository = "https://github.com/adamvvu/tsfracdiff"
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | # Required Dependencies
2 | numpy
3 | pandas
4 | arch <= 4.17; python_version == '3.6.*'
5 | arch; python_version >= '3.7'
6 | 
7 | # Suggested/Optional
8 | joblib
--------------------------------------------------------------------------------
/setup.cfg:
--------------------------------------------------------------------------------
1 | [metadata]
2 | license_files = LICENSE
3 | version = attr: tsfracdiff.__version__
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | from setuptools import setup, Command
2 | from codecs import open
3 | from os import path
4 | 
5 | currPath = path.abspath(path.dirname(__file__))
6 | 
7 | # Parse README
8 | with open(path.join(currPath, 'README.md'), encoding='utf-8') as f:
9 |     long_description = f.read()
10 | 
11 | # Parse version
12 | with open(path.join(currPath, 'tsfracdiff', '__init__.py')) as f:
13 |     for line in f:
14 |         if line.startswith('__version__'):
15 |             version = line.split('"')[1]
16 | 
17 | setup(
18 |     name='tsfracdiff',
19 |     description='Efficient and easy to use fractional differentiation transformations for stationarizing time series data.',
20 |     version=version,
21 |     long_description=long_description,
22 |     long_description_content_type='text/markdown',
23 |     url='https://github.com/adamvvu/tsfracdiff',
24 |     author='Adam Wu',
25 |     author_email='adamwu1@outlook.com',
26 |     packages=['tsfracdiff'],
27 |     classifiers=[
28 |         'Intended Audience :: Science/Research',
29 |         'Topic :: Scientific/Engineering :: Information Analysis',
30 |         'Programming Language :: Python',
31 |         'Operating System :: OS Independent',
32 |         'License :: OSI Approved :: MIT License'
33 |     ],
34 |     python_requires='>=3.7',
35 |     install_requires=[
36 |         'numpy',
37 |         'pandas',
38 |         'arch',
39 |         'joblib'
40 |     ],
41 |     license_files = ('LICENSE',),
42 | )
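All of the packaging files above resolve the distribution version from a single source: the __version__ string in tsfracdiff/__init__.py (setuptools dynamic metadata in pyproject.toml and setup.cfg, and the manual parse in setup.py). A quick check of the installed version, assuming the package has been installed:

import tsfracdiff
print(tsfracdiff.__version__)   # e.g. "1.0.4", the value defined in tsfracdiff/__init__.py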
--------------------------------------------------------------------------------
/tests/test_module.py:
--------------------------------------------------------------------------------
1 | from tsfracdiff import *
2 | import numpy as np
3 | import pandas as pd
4 | np.random.seed(42)
5 | import pytest
6 | 
7 | def _GenerateData():
8 | 
9 |     T = 1000
10 |     K = 5
11 | 
12 |     df = [ np.array([1 for k in range(K)]) ]
13 |     mu = np.random.normal(0, 0.25, size=(K))
14 |     for t in range(T-1):
15 |         d_t = mu + df[-1] + np.random.normal(0, 1, size=(K))
16 |         df.append( d_t )
17 |     df = pd.DataFrame(np.vstack(df))
18 |     return df
19 | 
20 | def _TestStationary( df_frac, fracDiff ):
21 |     if isinstance(df_frac, pd.DataFrame):
22 |         for k in range(df_frac.shape[1]):
23 |             assert fracDiff.UnitRootTest.IsStationary( df_frac.iloc[:,k].dropna() )
24 |     elif isinstance(df_frac, np.ndarray):
25 |         for k in range(df_frac.shape[1]):
26 |             assert fracDiff.UnitRootTest.IsStationary( pd.Series(df_frac[:,k]).dropna() )
27 |     else:
28 |         raise Exception('Invalid datatype returned.')
29 | 
30 |     return
31 | 
32 | def _TestFracDiff( df, unitRootTest, parallel=True ):
33 |     fracDiff = FractionalDifferentiator(unitRootTest=unitRootTest)
34 |     df_frac = fracDiff.FitTransform( df, parallel=parallel )
35 |     _TestStationary( df_frac, fracDiff )
36 |     return df_frac
37 | 
38 | def _TestAutoFracDiff( df, unitRootTest ):
39 |     """
40 |     Test automatic fit-transform
41 |     """
42 |     df_frac_par = _TestFracDiff( df, unitRootTest=unitRootTest, parallel=True )
43 |     df_frac_seq = _TestFracDiff( df, unitRootTest=unitRootTest, parallel=False )
44 |     assert np.allclose(df_frac_par.values, df_frac_seq.values, equal_nan=True)
45 |     print('AutoFracDiff: OK')
46 |     return
47 | 
48 | def _TestInvTransform( df, unitRootTest ):
49 |     """
50 |     Test inverse-transform
51 |     """
52 |     fracDiff = FractionalDifferentiator(unitRootTest=unitRootTest)
53 |     df_frac = fracDiff.FitTransform( df )
54 |     df_inv = fracDiff.InverseTransform( df_frac, lagData=df.head(max(fracDiff.numLags)) )
55 |     assert np.allclose(df.values, df_inv.values, equal_nan=True)
56 |     print('InvTransform: OK')
57 |     return
58 | 
59 | def test_RunAllTests():
60 | 
61 |     df = _GenerateData()
62 | 
63 |     unitRootTests = ['PP', 'ADF']
64 |     for unitRootTest in unitRootTests:
65 |         print(f'Testing {unitRootTest}')
66 |         _TestAutoFracDiff( df, unitRootTest=unitRootTest )
67 |         _TestInvTransform( df, unitRootTest=unitRootTest )
68 | 
69 |     return
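The test suite above exercises both the parallel and sequential code paths and checks that InverseTransform recovers the original data from a simulated I(1) panel. A condensed, standalone version of the same idea on a single simulated random walk (series, seed, and names are illustrative):

import numpy as np
import pandas as pd
from tsfracdiff import FractionalDifferentiator

np.random.seed(0)
x = pd.Series(np.random.normal(size=1000)).cumsum()   # univariate random walk, I(1) by construction

fracDiff = FractionalDifferentiator()
x_stat = fracDiff.FitTransform(x, parallel=False)

print(fracDiff.orders)   # a single estimated order in (0, 1]; the exact value depends on the draw
assert fracDiff.UnitRootTest.IsStationary(x_stat.iloc[:, 0].dropna())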
--------------------------------------------------------------------------------
/tsfracdiff/__init__.py:
--------------------------------------------------------------------------------
1 | from .unit_root_tests import *
2 | from .tsfracdiff import *
3 | 
4 | __version__ = "1.0.4"
--------------------------------------------------------------------------------
/tsfracdiff/tsfracdiff.py:
--------------------------------------------------------------------------------
1 | from .unit_root_tests import *
2 | 
3 | import pandas as pd
4 | import numpy as np
5 | 
6 | class FractionalDifferentiator:
7 | 
8 |     def __init__(self, maxOrderBound=1, significance=0.01, precision=0.01, memoryThreshold=1e-4,
9 |                  unitRootTest='PP', unitRootTestConfig={}):
10 |         """
11 |         Estimates the real-valued order of integration and provides fractional
12 |         differentiation data transformations.
13 | 
14 |         The available stationarity/unit root tests are:
15 |         -----------------------------------------------
16 |         - 'PP' : Phillips and Perron (1988) [default]
17 |         - 'ADF' : Augmented Dickey-Fuller (Said & Dickey, 1984)
18 | 
19 |         Parameters:
20 |         -----------
21 |         maxOrderBound      (float) Maximum real-valued order to search in (0, maxOrderBound)
22 |         significance       (float) Statistical significance level
23 |         precision          (float) Precision of estimated order
24 |         memoryThreshold    (float) Minimum magnitude of weight significance
25 |         unitRootTest       (str)   Unit-root/stationarity tests: ['PP','ADF']
26 |         unitRootTestConfig (dict)  Optional keyword arguments to pass to unit root tests
27 | 
28 |         Attributes:
29 |         -----------
30 |         orders  (list) Estimated minimum orders of differentiation
31 |         numLags (list) Number of lags required for transformations
32 | 
33 |         Example:
34 |         --------
35 |         # A pandas.DataFrame/np.array with potentially non-stationary time series
36 |         df
37 | 
38 |         # Automatic stationary transformation with minimal information loss
39 |         from tsfracdiff import FractionalDifferentiator
40 |         fracDiff = FractionalDifferentiator()
41 |         df = fracDiff.FitTransform(df)
42 |         """
43 |         self.maxOrderBound = maxOrderBound
44 |         self.significance = significance
45 |         self.precision = precision
46 |         self.memoryThreshold = memoryThreshold
47 | 
48 |         # Critical value checks
49 |         checkCV = False
50 |         cv_sig = None
51 |         if (self.significance in [0.01, 0.05, 0.1]):
52 |             checkCV = True
53 |             cv_sig = str(int(self.significance * 100)) + '%'
54 | 
55 |         # Unit-root/Stationarity tests
56 |         if unitRootTest == 'PP':
57 |             self.UnitRootTest = PhillipsPerron(significance=significance, checkCV=checkCV, cv_sig=cv_sig)
58 |         elif unitRootTest == 'ADF':
59 |             self.UnitRootTest = ADFuller(significance=significance, checkCV=checkCV, cv_sig=cv_sig)
60 |         else:
61 |             raise Exception('Please specify a valid unit root test.')
62 |         self.UnitRootTest.config.update( unitRootTestConfig )
63 | 
64 |         # States
65 |         self.isFitted = False
66 |         self.orders = []
67 |         self.numLags = None
68 | 
69 |     def Fit(self, df, parallel=True):
70 |         """
71 |         Estimates the fractional order of integration.
72 | 
73 |         Parameters:
74 |         -----------
75 |         df       (pandas.DataFrame/np.array) Raw data
76 |         parallel (bool) Use multiprocessing if true (default). Requires `joblib`.
77 |         """
78 |         df = pd.DataFrame(df)
79 | 
80 |         # Estimate minimum order of differencing
81 |         if parallel:
82 |             try:
83 |                 import multiprocessing
84 |                 from joblib import Parallel, delayed
85 |                 from functools import partial
86 |             except ImportError:
87 |                 raise Exception('The module `joblib` is required for parallelization.')
88 | 
89 |             def ApplyParallel(df, func, **kwargs):
90 |                 n_jobs = min(df.shape[1], multiprocessing.cpu_count())
91 |                 res = Parallel(n_jobs=n_jobs)( delayed(partial(func, **kwargs))(x) for x in np.array_split(df, df.shape[1], axis=1) )
92 |                 return res
93 |             orders = ApplyParallel(df, self._MinimumOrderSearch, upperOrder=self.maxOrderBound, first_run=True)
94 |         else:
95 |             orders = []
96 |             for j in range(df.shape[1]):
97 |                 orders.append( self._MinimumOrderSearch(df.iloc[:,j], upperOrder=self.maxOrderBound, first_run=True) )
98 |         self.orders = orders
99 |         self.numLags = [ (len(self._GetMemoryWeights(order, memoryThreshold=self.memoryThreshold)) - 1) for order in self.orders ]
100 |         self.isFitted = True
101 | 
102 |         return
103 | 
104 |     def FitTransform(self, df, parallel=True):
105 |         """
106 |         Estimates the fractional order of integration and returns a stationarized dataframe.
107 | 
108 |         Parameters
109 |         ----------
110 |         df       (pandas.DataFrame/np.array) Raw data
111 |         parallel (bool) Use multiprocessing if true (default). Requires `joblib`.
112 |         """
113 |         if not self.isFitted:
114 |             self.Fit(df, parallel=parallel)
115 |         fracDiffed = self.Transform(df)
116 | 
117 |         return fracDiffed
118 | 
119 |     def Transform(self, df):
120 |         """
121 |         Applies a fractional differentiation transformation based on estimated orders.
122 | 
123 |         Parameters
124 |         ----------
125 |         df (pandas.DataFrame/np.array) Raw data
126 |         """
127 |         if not self.isFitted:
128 |             raise Exception('Fit the model first.')
129 | 
130 |         df = pd.DataFrame(df)
131 |         fracDiffed = []
132 |         for j in range(df.shape[1]):
133 |             x = self._FracDiff(df.iloc[:,j], order=self.orders[j])
134 |             fracDiffed.append( x )
135 |         fracDiffed = pd.concat(fracDiffed, axis=1).sort_index()
136 | 
137 |         return fracDiffed
138 | 
139 |     def InverseTransform(self, fracDiffed, lagData):
140 |         """
141 |         Applies a fractional integration transformation by inverting the fractional differentiation.
142 | 
143 |         Note: The previous `K` values of the original time series are required to invert the transformation.
144 |               For multi-variate time series, `K` will likely vary across columns and you may find `K` with the
145 |               attribute `.numLags`.
146 | 
147 |         Parameters
148 |         ----------
149 |         fracDiffed (pandas.DataFrame/np.array) Fractionally differentiated data
150 |         lagData    (pandas.DataFrame/np.array) Previous values of time series. See note.
151 | 
152 |         Example
153 |         -------
154 |         # Multi-variate Time Series/DataFrame
155 |         X # Shape (1000, 2)
156 | 
157 |         # Stationarize
158 |         fracDiff = FractionalDifferentiator()
159 |         X_stationary = fracDiff.FitTransform( X ) # Shape (967, 2)
160 | 
161 |         # Estimated orders
162 |         orders = fracDiff.orders # [0.5703, 0.9141]
163 | 
164 |         # Required lagged values
165 |         numLags = fracDiff.numLags # [155, 33]
166 |         lagData = X.head(max(numLags))
167 | 
168 |         # Fractionally integrate by passing in the first 155 values
169 |         X_reconstructed = fracDiff.InverseTransform( X_stationary, lagData ) # Recovers the original X
170 |         """
171 |         if not self.isFitted:
172 |             raise Exception('Fit the model first.')
173 | 
174 |         maxLags, minLags = max(self.numLags), min(self.numLags)
175 |         lagData = pd.DataFrame(lagData)
176 |         if lagData.shape[0] != maxLags:
177 |             raise Exception(f'The previous {maxLags} values are required.')
178 | 
179 |         fracDiffed = pd.DataFrame(fracDiffed)
180 |         X = []
181 |         for j in range(fracDiffed.shape[1]):
182 |             memoryWeights = self._GetMemoryWeights(self.orders[j], memoryThreshold=self.memoryThreshold)
183 |             K = self.numLags[j]
184 |             offset = K - minLags
185 | 
186 |             # Initial values
187 |             tsLagData = lagData.iloc[:K, j]
188 | 
189 |             # Transformed values
190 |             X_tilde = fracDiffed.iloc[offset:, j]
191 | 
192 |             # Already stationary: identity transform
193 |             if K == 0:
194 |                 X.append( X_tilde )
195 |                 continue
196 | 
197 |             # Iteratively invert transformation
198 |             X_vals = np.ravel(tsLagData.values)
199 |             X_tilde = np.ravel(X_tilde.values)
200 |             for t in range(len(X_tilde)):
201 |                 x = X_tilde[t] - np.sum( memoryWeights[:-1] * X_vals[-K:] )
202 |                 X_vals = np.append(X_vals, x)
203 |             X_vals = pd.Series(X_vals)
204 |             X.append( X_vals )
205 |         X = pd.concat(X, axis=1).sort_index()
206 |         X.columns = fracDiffed.columns
207 | 
208 |         # Check for duplicate indices
209 |         idx = lagData.index[:minLags].union( fracDiffed.index )
210 |         if len(idx) != X.shape[0]:
211 |             idx = [ t for t in range(X.shape[0]) ]
212 |         X.index = idx
213 | 
214 |         return X
215 | 
216 |     def _GetMemoryWeights(self, order, memoryThreshold=1e-4):
217 |         """
218 |         Returns an array of memory weights for each time lag.
219 | 
220 |         Parameters:
221 |         -----------
222 |         order           (float) Order of fracdiff
223 |         memoryThreshold (float) Minimum magnitude of weight significance
224 |         """
225 |         memoryWeights = [1,]
226 |         k = 1
227 |         while True:
228 |             weight = -memoryWeights[-1] * ( order - k + 1 ) / k # Iteratively generate next lag weight
229 |             if abs(weight) < memoryThreshold:
230 |                 break
231 |             memoryWeights.append(weight)
232 |             k += 1
233 |         return np.array(list(reversed(memoryWeights)))
234 | 
235 |     def _FracDiff(self, ts, order=1, memoryWeights=None):
236 |         """
237 |         Differentiates a time series based on a real-valued order.
238 | 
239 |         Parameters:
240 |         -----------
241 |         ts            (pandas.Series) Univariate time series
242 |         order         (float) Order of differentiation
243 |         memoryWeights (array) Optional pre-computed weights
244 |         """
245 |         if memoryWeights is None:
246 |             memoryWeights = self._GetMemoryWeights(order, memoryThreshold=self.memoryThreshold)
247 | 
248 |         K = len(memoryWeights)
249 |         fracDiffedSeries = ts.rolling(K).apply(lambda x: np.sum( x * memoryWeights ), raw=True)
250 |         fracDiffedSeries = fracDiffedSeries.iloc[(K-1):]
251 | 
252 |         return fracDiffedSeries
253 | 
254 |     def _MinimumOrderSearch(self, ts, lowerOrder=0, upperOrder=1, first_run=False):
255 |         """
256 |         Binary search algorithm for estimating the minimum order of differentiation required for stationarity.
257 | 
258 |         Parameters
259 |         ----------
260 |         ts         (pandas.Series) Univariate time series
261 |         lowerOrder (float) Lower bound on order
262 |         upperOrder (float) Upper bound on order
263 |         first_run  (bool)  For testing endpoints of order bounds
264 |         """
265 |         ## Convergence criteria
266 |         if abs( upperOrder - lowerOrder ) <= self.precision:
267 |             return upperOrder
268 | 
269 |         ## Initial run: Test endpoints
270 |         if first_run:
271 |             lowerFracDiff = self._FracDiff(ts, order=lowerOrder).dropna()
272 |             upperFracDiff = self._FracDiff(ts, order=upperOrder).dropna()
273 | 
274 |             # Unit root tests
275 |             lowerStationary = self.UnitRootTest.IsStationary( lowerFracDiff )
276 |             upperStationary = self.UnitRootTest.IsStationary( upperFracDiff )
277 | 
278 |             # Series is I(0)
279 |             if lowerStationary:
280 |                 return lowerOrder
281 |             # Series is I(k>>1)
282 |             if not upperStationary:
283 |                 print('Warning: Time series is explosive. Increase upper bounds.')
284 |                 return upperOrder
285 | 
286 |         ## Binary Search: Test midpoint
287 |         midOrder = ( lowerOrder + upperOrder ) / 2
288 |         midFracDiff = self._FracDiff(ts, order=midOrder).dropna()
289 |         midStationary = self.UnitRootTest.IsStationary( midFracDiff )
290 | 
291 |         # Series is weakly stationary in [lowerOrder, midOrder]
292 |         if midStationary:
293 |             return self._MinimumOrderSearch(ts, lowerOrder=lowerOrder, upperOrder=midOrder)
294 |         # Series is weakly stationary in [midOrder, upperOrder]
295 |         else:
296 |             return self._MinimumOrderSearch(ts, lowerOrder=midOrder, upperOrder=upperOrder)
297 | 
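The core of the class above is the truncated binomial expansion of (1 - L)^d: _GetMemoryWeights builds the lag weights with the recursion w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k, stopping once |w_k| < memoryThreshold, and _FracDiff applies them as a rolling dot product. A small sketch that checks this against the (internal) helpers above; the toy series is illustrative:

import numpy as np
import pandas as pd
from tsfracdiff import FractionalDifferentiator

fracDiff = FractionalDifferentiator()

# For d = 0.5 the recursion gives 1, -0.5, -0.125, -0.0625, ...; the returned array is
# reversed, so the weight on the current observation (1) sits at the end.
w = fracDiff._GetMemoryWeights(0.5)
print(w[::-1][:4])   # [ 1.     -0.5    -0.125  -0.0625]

# _FracDiff is a rolling dot product of each length-K window with these weights
ts = pd.Series(np.arange(500, dtype=float))
K = len(w)
manual = ts.rolling(K).apply(lambda x: np.sum(x * w), raw=True).iloc[K - 1:]
assert manual.equals(fracDiff._FracDiff(ts, order=0.5))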
--------------------------------------------------------------------------------
/tsfracdiff/unit_root_tests.py:
--------------------------------------------------------------------------------
1 | import arch
2 | from arch.unitroot import PhillipsPerron as PP
3 | from arch.unitroot import ADF
4 | 
5 | ## TODO: Ng and Perron (2001)?
6 | 
7 | class PhillipsPerron:
8 |     """
9 |     Unit root testing via Phillips and Perron (1988). This test is robust to
10 |     serial correlation and heteroskedasticity.
11 | 
12 |     References:
13 |     -----------
14 |     Phillips, P. C. B., & Perron, P. (1988). Testing for a unit root in time series regression.
15 |     Biometrika, 75(2), 335–346. https://doi.org/10.1093/biomet/75.2.335
16 |     """
17 | 
18 |     def __init__(self, 
19 |                 config={ 'trend' : 'n', 'test_type' : 'tau'}, 
20 |                 significance=0.01,
21 |                 checkCV=False, 
22 |                 cv_sig=None):
23 |         self.config = config
24 |         self.significance = significance
25 |         self.checkCV = checkCV
26 |         self.cv_sig = cv_sig
27 | 
28 |     def IsStationary(self, ts):
29 |         """
30 |         Performs a unit root test.
31 |         """
32 | 
33 |         testResults = PP(ts, trend=self.config['trend'], test_type=self.config['test_type'])
34 |         pval, cv, stat = testResults.pvalue, testResults.critical_values, testResults.stat
35 | 
36 |         result = self.HypothesisTest(pval, cv, stat)
37 | 
38 |         return result
39 | 
40 |     def HypothesisTest(self, pval, cv, stat):
41 |         """
42 |         Null Hypothesis: Time series is integrated of order I(1)
43 |         Alt Hypothesis: Time series is integrated of order I(k<1)
44 |         """
45 | 
46 |         # Reject the hypothesis
47 |         if (pval < self.significance) or ( self.checkCV and (stat < cv.get(self.cv_sig, 0)) ):
48 |             return True
49 |         # Fail to reject the hypothesis
50 |         else:
51 |             return False
52 | 
53 | class ADFuller:
54 |     """
55 |     Unit root testing via Said and Dickey (1984). This test assumes a parametric
56 |     ARMA structure to correct for serial correlation but assumes the errors are homoskedastic.
57 | 
58 |     References:
59 |     -----------
60 |     Said E. Said, & Dickey, D. A. (1984). Testing for Unit Roots in Autoregressive-Moving Average
61 |     Models of Unknown Order. Biometrika, 71(3), 599–607. https://doi.org/10.2307/2336570
62 |     """
63 |     def __init__(self, 
64 |                 config={ 'trend' : 'n', 'method' : 'AIC'}, 
65 |                 significance=0.01,
66 |                 checkCV=False, 
67 |                 cv_sig=None):
68 |         self.config = config
69 |         self.significance = significance
70 |         self.checkCV = checkCV
71 |         self.cv_sig = cv_sig
72 | 
73 |         ## Compatibility workaround //
74 |         # arch <= 4.17 uses capital letters but newer versions use lowercase
75 |         if (str(arch.__version__) > '4.17'):
76 |             if self.config.get('method') == 'AIC':
77 |                 self.config['method'] = 'aic'
78 |             elif self.config.get('method') == 'BIC':
79 |                 self.config['method'] = 'bic'
80 | 
81 |     def IsStationary(self, ts):
82 |         """
83 |         Performs a unit root test.
84 |         """
85 | 
86 |         testResults = ADF(ts, trend=self.config['trend'], method=self.config['method'])
87 |         pval, cv, stat = testResults.pvalue, testResults.critical_values, testResults.stat
88 | 
89 |         result = self.HypothesisTest(pval, cv, stat)
90 | 
91 |         return result
92 | 
93 |     def HypothesisTest(self, pval, cv, stat):
94 |         """
95 |         Null Hypothesis: Gamma = 0 (Unit root)
96 |         Alt Hypothesis: Gamma < 0
97 |         """
98 | 
99 |         # Reject the hypothesis
100 |         if (pval < self.significance) or ( self.checkCV and (stat < cv.get(self.cv_sig, 0)) ):
101 |             return True
102 |         # Fail to reject the hypothesis
103 |         else:
104 |             return False
105 | 
106 | 
107 | 
108 | 
--------------------------------------------------------------------------------
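Both wrappers pass their config entries straight through to the corresponding arch call, and FractionalDifferentiator merges anything supplied as unitRootTestConfig into that dict via config.update. A brief sketch of overriding the defaults; trend='c' (include a constant) is one of the standard arch trend options and is used here purely as an example:

import numpy as np
import pandas as pd
from tsfracdiff import FractionalDifferentiator

np.random.seed(0)
df = pd.DataFrame(np.random.normal(size=(1000, 2))).cumsum()   # two simulated random walks

# Phillips-Perron with a constant term and a 5% significance level instead of the defaults
fracDiff = FractionalDifferentiator(significance=0.05, unitRootTest='PP',
                                    unitRootTestConfig={'trend': 'c'})
df_stat = fracDiff.FitTransform(df)
print(fracDiff.orders, fracDiff.numLags)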