├── .gitignore ├── LICENSE ├── README.md ├── imgs ├── randomwalk_smoothing.png ├── sinusoidal_bootstrap.png └── sinusoidal_smoothing.png ├── notebooks ├── Basic Smoothing.ipynb ├── Sinusoidal Smoothing.ipynb ├── Sliding Smoothing.ipynb └── sklearn-wrapper.ipynb ├── requirements.txt ├── setup.py └── tsmoothie ├── __init__.py ├── bootstrap.py ├── regression_basis.py ├── smoother.py ├── utils_class.py └── utils_func.py /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | 3 | # Created by https://www.gitignore.io/api/python 4 | 5 | ### Python ### 6 | # Byte-compiled / optimized / DLL files 7 | __pycache__/ 8 | *.py[cod] 9 | *$py.class 10 | 11 | # C extensions 12 | *.so 13 | 14 | # Distribution / packaging 15 | .Python 16 | build/ 17 | develop-eggs/ 18 | dist/ 19 | downloads/ 20 | eggs/ 21 | .eggs/ 22 | lib/ 23 | lib64/ 24 | parts/ 25 | sdist/ 26 | var/ 27 | wheels/ 28 | *.egg-info/ 29 | .installed.cfg 30 | *.egg 31 | 32 | # PyInstaller 33 | # Usually these files are written by a python script from a template 34 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
35 | *.manifest 36 | *.spec 37 | 38 | # Installer logs 39 | pip-log.txt 40 | pip-delete-this-directory.txt 41 | 42 | # Unit test / coverage reports 43 | htmlcov/ 44 | .tox/ 45 | .coverage 46 | .coverage.* 47 | .cache 48 | nosetests.xml 49 | coverage.xml 50 | *.cover 51 | .hypothesis/ 52 | 53 | # Translations 54 | *.mo 55 | *.pot 56 | 57 | # Django stuff: 58 | *.log 59 | local_settings.py 60 | 61 | # Flask stuff: 62 | instance/ 63 | .webassets-cache 64 | 65 | # Scrapy stuff: 66 | .scrapy 67 | 68 | # Sphinx documentation 69 | docs/_build/ 70 | 71 | # PyBuilder 72 | target/ 73 | 74 | # Jupyter Notebook 75 | .ipynb_checkpoints 76 | 77 | # pyenv 78 | .python-version 79 | 80 | # celery beat schedule file 81 | celerybeat-schedule 82 | 83 | # SageMath parsed files 84 | *.sage.py 85 | 86 | # Environments 87 | .env 88 | .venv 89 | env/ 90 | venv/ 91 | ENV/ 92 | env.bak/ 93 | venv.bak/ 94 | 95 | # Spyder project settings 96 | .spyderproject 97 | .spyproject 98 | 99 | # Rope project settings 100 | .ropeproject 101 | 102 | # mkdocs documentation 103 | /site 104 | 105 | # mypy 106 | .mypy_cache/ 107 | 108 | # End of https://www.gitignore.io/api/python 109 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 Marco Cerliani 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# tsmoothie

A Python library for time-series smoothing and outlier detection in a vectorized way.

## Overview

tsmoothie computes, in a fast and efficient way, the smoothing of single or multiple time-series.

The smoothing techniques available are:

- Exponential Smoothing
- Convolutional Smoothing with various window types (constant, hanning, hamming, bartlett, blackman)
- Spectral Smoothing with Fourier Transform
- Polynomial Smoothing
- Spline Smoothing of various kinds (linear, cubic, natural cubic)
- Gaussian Smoothing
- Binner Smoothing
- LOWESS
- Seasonal Decompose Smoothing of various kinds (convolution, lowess, natural cubic spline)
- Kalman Smoothing with customizable components (level, trend, seasonality, long seasonality)

tsmoothie also computes intervals as a result of the smoothing process. These can be useful for identifying outliers and anomalies in time-series.

Depending on the smoothing method used, the interval types available are:

- sigma intervals
- confidence intervals
- prediction intervals
- kalman intervals

tsmoothie can carry out a sliding smoothing approach to simulate an online usage.
This is done by splitting the time-series into equal-sized pieces and smoothing them independently. As always, this functionality is implemented in a vectorized way through the **WindowWrapper** class.

tsmoothie can perform time-series bootstrapping through the **BootstrappingWrapper** class.

The supported bootstrap algorithms are:

- non-overlapping block bootstrap
- moving block bootstrap
- circular block bootstrap
- stationary bootstrap

## Media

Blog Posts:

- [Time Series Smoothing for better Clustering](https://towardsdatascience.com/time-series-smoothing-for-better-clustering-121b98f308e8)
- [Time Series Smoothing for better Forecasting](https://towardsdatascience.com/time-series-smoothing-for-better-forecasting-7fbf10428b2)
- [Real-Time Time Series Anomaly Detection](https://towardsdatascience.com/real-time-time-series-anomaly-detection-981cf1e1ca13)
- [Extreme Event Time Series Preprocessing](https://towardsdatascience.com/extreme-event-time-series-preprocessing-90aa59d5630c)
- [Time Series Bootstrap in the age of Deep Learning](https://towardsdatascience.com/time-series-bootstrap-in-the-age-of-deep-learning-b98aa2aa32c4)

## Installation

```shell
pip install --upgrade tsmoothie
```

The module depends only on NumPy, SciPy and simdkalman. Python 3.6 or above is supported.

## Usage: _smoothing_

Below are a couple of examples of how tsmoothie works. Full examples are available in the [notebooks folder](https://github.com/cerlymarco/tsmoothie/tree/master/notebooks).
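
To make the interval idea from the overview concrete before turning to the library calls, here is a minimal, dependency-free sketch of a sigma interval: smooth the series, take the standard deviation of the residuals, and place bands at the smoothed curve plus or minus `n_sigma` standard deviations. It does not use tsmoothie; `moving_average` and `sigma_bands` are hypothetical helpers written for illustration, the moving average stands in for any smoother, and the library's own computation may differ in detail.

```python
import random
import statistics


def moving_average(series, window_len):
    """Centered moving average; at the edges only the part of the window that fits is used."""
    half = window_len // 2
    smoothed = []
    for t in range(len(series)):
        lo, hi = max(0, t - half), min(len(series), t + half + 1)
        smoothed.append(sum(series[lo:hi]) / (hi - lo))
    return smoothed


def sigma_bands(series, smoothed, n_sigma=2):
    """Bands at smoothed +/- n_sigma * std of the smoothing residuals (assumed definition)."""
    residuals = [x - s for x, s in zip(series, smoothed)]
    std = statistics.pstdev(residuals)
    low = [s - n_sigma * std for s in smoothed]
    up = [s + n_sigma * std for s in smoothed]
    return low, up


random.seed(123)

# hypothetical noisy random walk with 200 timesteps
level, series = 0.0, []
for _ in range(200):
    level += random.gauss(0, 1)
    series.append(level + random.gauss(0, 3))

smoothed = moving_average(series, window_len=9)
low, up = sigma_bands(series, smoothed, n_sigma=2)

# observations outside the bands are candidate outliers
outliers = [t for t, x in enumerate(series) if x < low[t] or x > up[t]]
print(len(outliers), "candidate outliers out of", len(series))
```

The library examples that follow obtain comparable lower and upper bands through `smoother.get_intervals(...)`, computed for all series at once.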

```python
# import libraries
import numpy as np
import matplotlib.pyplot as plt
from tsmoothie.utils_func import sim_randomwalk
from tsmoothie.smoother import LowessSmoother

# generate 3 randomwalks of length 200
np.random.seed(123)
data = sim_randomwalk(n_series=3, timesteps=200,
                      process_noise=10, measure_noise=30)

# operate smoothing
smoother = LowessSmoother(smooth_fraction=0.1, iterations=1)
smoother.smooth(data)

# generate intervals
low, up = smoother.get_intervals('prediction_interval')

# plot the smoothed timeseries with intervals
plt.figure(figsize=(18,5))

for i in range(3):

    plt.subplot(1,3,i+1)
    plt.plot(smoother.smooth_data[i], linewidth=3, color='blue')
    plt.plot(smoother.data[i], '.k')
    plt.title(f"timeseries {i+1}"); plt.xlabel('time')

    plt.fill_between(range(len(smoother.data[i])), low[i], up[i], alpha=0.3)
```

![Randomwalk Smoothing](https://raw.githubusercontent.com/cerlymarco/tsmoothie/master/imgs/randomwalk_smoothing.png)

```python
# import libraries
import numpy as np
import matplotlib.pyplot as plt
from tsmoothie.utils_func import sim_seasonal_data
from tsmoothie.smoother import DecomposeSmoother

# generate 3 periodic timeseries of length 300
np.random.seed(123)
data = sim_seasonal_data(n_series=3, timesteps=300,
                         freq=24, measure_noise=30)

# operate smoothing
smoother = DecomposeSmoother(smooth_type='lowess', periods=24,
                             smooth_fraction=0.3)
smoother.smooth(data)

# generate intervals
low, up = smoother.get_intervals('sigma_interval')

# plot the smoothed timeseries with intervals
plt.figure(figsize=(18,5))

for i in range(3):

    plt.subplot(1,3,i+1)
    plt.plot(smoother.smooth_data[i], linewidth=3, color='blue')
    plt.plot(smoother.data[i], '.k')
    plt.title(f"timeseries {i+1}"); plt.xlabel('time')

    plt.fill_between(range(len(smoother.data[i])), low[i], up[i], alpha=0.3)
```

![Sinusoidal Smoothing](https://raw.githubusercontent.com/cerlymarco/tsmoothie/master/imgs/sinusoidal_smoothing.png)

**All the available smoothers can be fully integrated with sklearn (see [here](https://github.com/cerlymarco/tsmoothie/blob/master/notebooks/sklearn-wrapper.ipynb)).**

## Usage: _bootstrap_

```python
# import libraries
import numpy as np
import matplotlib.pyplot as plt
from tsmoothie.utils_func import sim_seasonal_data
from tsmoothie.smoother import ConvolutionSmoother
from tsmoothie.bootstrap import BootstrappingWrapper

# generate a periodic timeseries of length 300
np.random.seed(123)
data = sim_seasonal_data(n_series=1, timesteps=300,
                         freq=24, measure_noise=15)

# operate bootstrap
bts = BootstrappingWrapper(ConvolutionSmoother(window_len=8, window_type='ones'),
                           bootstrap_type='mbb', block_length=24)
bts_samples = bts.sample(data, n_samples=100)

# plot the bootstrapped timeseries
plt.figure(figsize=(13,5))
plt.plot(bts_samples.T, alpha=0.3, c='orange')
plt.plot(data[0], c='blue', linewidth=2)
```

![Sinusoidal Bootstrap](https://raw.githubusercontent.com/cerlymarco/tsmoothie/master/imgs/sinusoidal_bootstrap.png)

## References

- Polynomial, Spline, Gaussian and Binner smoothing are carried out by building a regression on custom basis expansions. These implementations are based on the amazing intuitions of Matthew Drury available [here](https://github.com/madrury/basis-expansions/blob/master/examples/comparison-of-smoothing-methods.ipynb)
- Time Series Modelling with Unobserved Components, Matteo M.
Pelagatti 167 | - Bootstrap Methods in Time Series Analysis, Fanny Bergström, Stockholms universitet 168 | -------------------------------------------------------------------------------- /imgs/randomwalk_smoothing.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cerlymarco/tsmoothie/6eede029acf7e2448e762b8d44e5c8fe2bcda319/imgs/randomwalk_smoothing.png -------------------------------------------------------------------------------- /imgs/sinusoidal_bootstrap.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cerlymarco/tsmoothie/6eede029acf7e2448e762b8d44e5c8fe2bcda319/imgs/sinusoidal_bootstrap.png -------------------------------------------------------------------------------- /imgs/sinusoidal_smoothing.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cerlymarco/tsmoothie/6eede029acf7e2448e762b8d44e5c8fe2bcda319/imgs/sinusoidal_smoothing.png -------------------------------------------------------------------------------- /notebooks/sklearn-wrapper.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import numpy as np\n", 10 | "import matplotlib.pyplot as plt\n", 11 | "from tsmoothie.utils_func import sim_randomwalk\n", 12 | "from tsmoothie.smoother import LowessSmoother\n", 13 | "\n", 14 | "from sklearn.pipeline import make_pipeline\n", 15 | "from sklearn.preprocessing import StandardScaler\n", 16 | "from sklearn.base import TransformerMixin, BaseEstimator" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 2, 22 | "metadata": {}, 23 | "outputs": [ 24 | { 25 | "data": { 26 | "text/plain": [ 27 | "(3, 200)" 28 | ] 29 | }, 30 | "execution_count": 2, 31 | "metadata": 
{}, 32 | "output_type": "execute_result" 33 | } 34 | ], 35 | "source": [ 36 | "np.random.seed(123)\n", 37 | "data = sim_randomwalk(n_series=3, timesteps=200, \n", 38 | " process_noise=10, measure_noise=30)\n", 39 | "data.shape # (n_series, timesteps)" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": 3, 45 | "metadata": {}, 46 | "outputs": [], 47 | "source": [ 48 | "class LowessSmootherWrap(TransformerMixin, BaseEstimator, LowessSmoother):\n", 49 | "\n", 50 | " def fit(self, X, y=None):\n", 51 | " self._is_fitted = True\n", 52 | " return self\n", 53 | "\n", 54 | " def transform(self, X, y=None):\n", 55 | " self.smooth(X)\n", 56 | " return self.smooth_data\n", 57 | "\n", 58 | " def fit_transform(self, X, y=None):\n", 59 | " return self.fit(X).transform(X)" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": 4, 65 | "metadata": {}, 66 | "outputs": [ 67 | { 68 | "data": { 69 | "text/plain": [ 70 | "LowessSmootherWrap(smooth_fraction=0.1)" 71 | ] 72 | }, 73 | "execution_count": 4, 74 | "metadata": {}, 75 | "output_type": "execute_result" 76 | } 77 | ], 78 | "source": [ 79 | "smoother = LowessSmootherWrap(smooth_fraction=0.1, iterations=1)\n", 80 | "smoother.fit(data)" 81 | ] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "execution_count": 5, 86 | "metadata": {}, 87 | "outputs": [ 88 | { 89 | "data": { 90 | "text/plain": [ 91 | "(3, 200)" 92 | ] 93 | }, 94 | "execution_count": 5, 95 | "metadata": {}, 96 | "output_type": "execute_result" 97 | } 98 | ], 99 | "source": [ 100 | "smoothdata = smoother.fit_transform(data)\n", 101 | "smoothdata.shape # (n_series, timesteps)" 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": 6, 107 | "metadata": {}, 108 | "outputs": [ 109 | { 110 | "data": { 111 | "image/png": "\n", 112 | "text/plain": [ 113 | "
" 114 | ] 115 | }, 116 | "metadata": { 117 | "needs_background": "light" 118 | }, 119 | "output_type": "display_data" 120 | } 121 | ], 122 | "source": [ 123 | "plt.figure(figsize=(18,5))\n", 124 | "\n", 125 | "for i in range(3):\n", 126 | " \n", 127 | " plt.subplot(1,3,i+1)\n", 128 | " plt.plot(smoothdata[i], linewidth=3, color='blue')\n", 129 | " plt.plot(data[i], '.k')\n", 130 | " plt.title(f\"timeseries {i+1}\"); plt.xlabel('time')" 131 | ] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "execution_count": 7, 136 | "metadata": {}, 137 | "outputs": [], 138 | "source": [ 139 | "class LowessSmootherWrap(TransformerMixin, BaseEstimator, LowessSmoother):\n", 140 | "\n", 141 | " def fit(self, X, y=None):\n", 142 | " self._is_fitted = True\n", 143 | " return self\n", 144 | "\n", 145 | " def transform(self, X, y=None):\n", 146 | " self.smooth(X.T) # w/ Transpose\n", 147 | " return self.smooth_data.T # w/ Transpose\n", 148 | "\n", 149 | " def fit_transform(self, X, y=None):\n", 150 | " return self.fit(X).transform(X)" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": 8, 156 | "metadata": {}, 157 | "outputs": [], 158 | "source": [ 159 | "smoother = LowessSmootherWrap(smooth_fraction=0.1, iterations=1)\n", 160 | "pipe = make_pipeline(StandardScaler(), smoother)" 161 | ] 162 | }, 163 | { 164 | "cell_type": "code", 165 | "execution_count": 9, 166 | "metadata": {}, 167 | "outputs": [ 168 | { 169 | "data": { 170 | "text/plain": [ 171 | "(200, 3)" 172 | ] 173 | }, 174 | "execution_count": 9, 175 | "metadata": {}, 176 | "output_type": "execute_result" 177 | } 178 | ], 179 | "source": [ 180 | "smoothdata = pipe.fit_transform(data.T) # w/ Transpose\n", 181 | "smoothdata.shape # (timesteps, n_series)" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": 10, 187 | "metadata": {}, 188 | "outputs": [ 189 | { 190 | "data": { 191 | "image/png": "\n", 192 | "text/plain": [ 193 | "
" 194 | ] 195 | }, 196 | "metadata": { 197 | "needs_background": "light" 198 | }, 199 | "output_type": "display_data" 200 | } 201 | ], 202 | "source": [ 203 | "scaled_data = pipe.named_steps['standardscaler'].transform(data.T)\n", 204 | "\n", 205 | "plt.figure(figsize=(18,5))\n", 206 | "\n", 207 | "for i in range(3):\n", 208 | " \n", 209 | " plt.subplot(1,3,i+1)\n", 210 | " plt.plot(smoothdata[:,i], linewidth=3, color='blue')\n", 211 | " plt.plot(scaled_data[:,i], '.k')\n", 212 | " plt.title(f\"timeseries {i+1}\"); plt.xlabel('time')" 213 | ] 214 | } 215 | ], 216 | "metadata": { 217 | "kernelspec": { 218 | "display_name": "Python 3", 219 | "language": "python", 220 | "name": "python3" 221 | }, 222 | "language_info": { 223 | "codemirror_mode": { 224 | "name": "ipython", 225 | "version": 3 226 | }, 227 | "file_extension": ".py", 228 | "mimetype": "text/x-python", 229 | "name": "python", 230 | "nbconvert_exporter": "python", 231 | "pygments_lexer": "ipython3", 232 | "version": "3.6.6" 233 | } 234 | }, 235 | "nbformat": 4, 236 | "nbformat_minor": 2 237 | } 238 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | numpy 2 | scipy 3 | simdkalman -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | import pathlib 2 | from setuptools import setup, find_packages 3 | 4 | HERE = pathlib.Path(__file__).parent 5 | 6 | VERSION = '1.0.5' 7 | PACKAGE_NAME = 'tsmoothie' 8 | AUTHOR = 'Marco Cerliani' 9 | AUTHOR_EMAIL = 'cerlymarco@gmail.com' 10 | URL = 'https://github.com/cerlymarco/tsmoothie' 11 | 12 | LICENSE = 'MIT' 13 | DESCRIPTION = 'A python library for timeseries smoothing and outlier detection in a vectorized way.' 
LONG_DESCRIPTION = (HERE / "README.md").read_text()
LONG_DESC_TYPE = "text/markdown"

INSTALL_REQUIRES = [
    'numpy',
    'scipy',
    'simdkalman'
]

setup(name=PACKAGE_NAME,
      version=VERSION,
      description=DESCRIPTION,
      long_description=LONG_DESCRIPTION,
      long_description_content_type=LONG_DESC_TYPE,
      author=AUTHOR,
      license=LICENSE,
      author_email=AUTHOR_EMAIL,
      url=URL,
      install_requires=INSTALL_REQUIRES,
      python_requires='>=3',
      packages=find_packages()
      )

--------------------------------------------------------------------------------
/tsmoothie/__init__.py:
--------------------------------------------------------------------------------
from .utils_class import *
from .utils_func import *
from .regression_basis import *
from .smoother import *
from .bootstrap import *

--------------------------------------------------------------------------------
/tsmoothie/bootstrap.py:
--------------------------------------------------------------------------------
'''
Define Bootstrapping class.
'''

import numpy as np

from .utils_func import _id_nb_bootstrap, _id_mb_bootstrap, _id_cb_bootstrap, _id_s_bootstrap


class BootstrappingWrapper:
    """BootstrappingWrapper generates new timeseries samples using
    specific algorithms for sequences bootstrapping.

    The BootstrappingWrapper handles single timeseries. Firstly, the
    smoothing of the received series is computed. Secondly, the residuals
    of the smoothing operation are generated and randomly partitioned
    into blocks according to the chosen bootstrapping technique. Finally,
    the residual blocks are sampled in random order, concatenated and then
    added to the original smoothing curve in order to obtain a bootstrapped
    timeseries.

    The supported bootstrap algorithms are:
     - non-overlapping block bootstrap ('nbb')
     - moving block bootstrap ('mbb')
     - circular block bootstrap ('cbb')
     - stationary bootstrap ('sb')

    Parameters
    ----------
    Smoother : class from tsmoothie.smoother
        Every smoother available in tsmoothie.smoother
        (except for WindowWrapper). It computes the smoothing on the series
        received.

    bootstrap_type : str
        The type of algorithm used to compute the bootstrap.
        Supported types are: non-overlapping block bootstrap ('nbb'),
        moving block bootstrap ('mbb'), circular block bootstrap ('cbb'),
        stationary bootstrap ('sb').

    block_length : int
        The length of the blocks used to sample from the residuals of the
        smoothing operation and used to bootstrap new samples.
        Must be an integer in [3, timesteps).

    Attributes
    ----------
    Smoother : class from tsmoothie.smoother
        Every smoother available in tsmoothie.smoother
        (except for WindowWrapper) that was passed to BootstrappingWrapper.
        It has the same properties and attributes of every Smoother.

    Examples
    --------
    >>> import numpy as np
    >>> from tsmoothie.utils_func import sim_seasonal_data
    >>> from tsmoothie.bootstrap import BootstrappingWrapper
    >>> from tsmoothie.smoother import *
    >>> np.random.seed(33)
    >>> data = sim_seasonal_data(n_series=1, timesteps=200,
    ...                          freq=24, measure_noise=10)
    >>> bts = BootstrappingWrapper(
    ...     ConvolutionSmoother(window_len=8, window_type='ones'),
    ...
        bootstrap_type='mbb', block_length=24)
    >>> bts_samples = bts.sample(data, n_samples=100)
    """

    def __init__(self, Smoother, bootstrap_type, block_length):
        self.Smoother = Smoother
        self.bootstrap_type = bootstrap_type
        self.block_length = block_length

    def __repr__(self):
        return "<tsmoothie.bootstrap.{}>".format(self.__class__.__name__)

    def __str__(self):
        return "<tsmoothie.bootstrap.{}>".format(self.__class__.__name__)

    def sample(self, data, n_samples=1):
        """Bootstrap timeseries.

        Parameters
        ----------
        data : array-like of shape (1, timesteps) or also (timesteps,)
            Single timeseries to bootstrap.
            The data are assumed to be in increasing time order.

        n_samples : int, default=1
            How many bootstrapped series to generate.

        Returns
        -------
        bootstrap_data : array of shape (n_samples, timesteps)
            Bootstrapped samples
        """

        bootstrap_types = ['nbb', 'mbb', 'cbb', 'sb']

        if self.bootstrap_type not in bootstrap_types:
            raise ValueError(
                "'{}' is not a supported bootstrap type.
" 102 | "Supported types are {}".format( 103 | self.bootstrap_type, bootstrap_types)) 104 | 105 | if not 'tsmoothie.smoother' in str(self.Smoother.__repr__): 106 | raise ValueError("Use a Smoother from tsmoothie.smoother") 107 | 108 | if self.Smoother.__class__.__name__ == 'WindowWrapper': 109 | raise ValueError("WindowWrapper doesn't support bootstrapping") 110 | 111 | if self.block_length < 3: 112 | raise ValueError("block_length must be >= 3") 113 | 114 | if n_samples < 1: 115 | raise ValueError("n_samples must be >= 1") 116 | 117 | data = np.asarray(data) 118 | if np.prod(data.shape) == np.max(data.shape): 119 | data = data.ravel() 120 | nobs = data.shape[0] 121 | 122 | if self.block_length >= nobs: 123 | raise ValueError( 124 | "block_length must be < than the timesteps dimension " 125 | "of the data passed") 126 | 127 | if self.Smoother.__class__.__name__ == 'ExponentialSmoother': 128 | nobs = data.shape[0] - self.Smoother.window_len 129 | if self.block_length >= nobs: 130 | raise ValueError( 131 | "block_length must be < than (timesteps - window_len)") 132 | else: 133 | raise ValueError("The format of data received is not appropriate. 
" 134 | "BootstrappingWrapper accepts only univariate " 135 | "timeseries") 136 | 137 | self.Smoother.copy = True 138 | self.Smoother.smooth(data) 139 | residuals = self.Smoother.data - self.Smoother.smooth_data 140 | if (np.nan == residuals).any(): 141 | residuals = np.nan_to_num(residuals, nan=0) 142 | 143 | if self.bootstrap_type == 'nbb': 144 | bootstrap_func = _id_nb_bootstrap 145 | elif self.bootstrap_type == 'mbb': 146 | bootstrap_func = _id_mb_bootstrap 147 | elif self.bootstrap_type == 'cbb': 148 | bootstrap_func = _id_cb_bootstrap 149 | else: 150 | bootstrap_func = _id_s_bootstrap 151 | 152 | bootstrap_data = np.empty((n_samples, nobs)) 153 | for i in np.arange(n_samples): 154 | bootstrap_id = bootstrap_func(nobs, self.block_length) 155 | bootstrap_res = residuals[[0], bootstrap_id] 156 | bootstrap_data[i] = self.Smoother.smooth_data + bootstrap_res 157 | 158 | return bootstrap_data -------------------------------------------------------------------------------- /tsmoothie/regression_basis.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Basis functions for regression. 3 | Inspired by: https://github.com/madrury/basis-expansions/blob/master/examples/comparison-of-smoothing-methods.ipynb 4 | ''' 5 | 6 | import numpy as np 7 | 8 | 9 | def polynomial(degree, basis_len): 10 | """Create basis for polynomial regression. 11 | 12 | Returns 13 | ------- 14 | X_base : array 15 | Basis for polynomial regression. 16 | """ 17 | 18 | X = np.arange(basis_len, dtype=np.float64) 19 | X_base = np.repeat([X], degree, axis=0).T 20 | X_base = np.power(X_base, np.arange(1, degree + 1)) 21 | 22 | return X_base 23 | 24 | 25 | def linear_spline(knots, basis_len): 26 | """Create basis for linear spline regression. 27 | 28 | Returns 29 | ------- 30 | X_base : array 31 | Basis for linear spline regression. 
32 | """ 33 | 34 | n_knots = len(knots) 35 | X = np.arange(basis_len) 36 | 37 | X_base = np.zeros((basis_len, n_knots + 1)) 38 | X_base[:, 0] = X 39 | 40 | X_base[:, 1:] = X[:, None] - knots[None, :] 41 | X_base[X_base < 0] = 0 42 | 43 | return X_base 44 | 45 | 46 | def cubic_spline(knots, basis_len): 47 | """Create basis for cubic spline regression. 48 | 49 | Returns 50 | ------- 51 | X_base : array 52 | Basis for cubic spline regression. 53 | """ 54 | 55 | n_knots = len(knots) 56 | X = np.arange(basis_len) 57 | 58 | X_base = np.zeros((basis_len, n_knots + 3)) 59 | X_base[:, 0] = X 60 | X_base[:, 1] = X_base[:, 0] * X_base[:, 0] 61 | X_base[:, 2] = X_base[:, 1] * X_base[:, 0] 62 | 63 | X_base[:, 3:] = np.power(X[:, None] - knots[None, :], 3) 64 | X_base[X_base < 0] = 0 65 | 66 | return X_base 67 | 68 | 69 | def natural_cubic_spline(knots, basis_len): 70 | """Create basis for natural cubic spline regression. 71 | 72 | Returns 73 | ------- 74 | X_base : array 75 | Basis for natural cubic spline regression. 
76 | """ 77 | 78 | n_knots = len(knots) 79 | X = np.arange(basis_len) 80 | 81 | X_base = np.zeros((basis_len, n_knots - 1)) 82 | X_base[:, 0] = X 83 | 84 | numerator1 = X[:, None] - knots[None, :n_knots - 2] 85 | numerator1[numerator1 < 0] = 0 86 | numerator2 = X[:, None] - knots[None, n_knots - 1] 87 | numerator2[numerator2 < 0] = 0 88 | 89 | numerator = np.power(numerator1, 3) - np.power(numerator2, 3) 90 | denominator = knots[n_knots - 1] - knots[:n_knots - 2] 91 | 92 | numerator1_dd = X[:, None] - knots[None, n_knots - 2] 93 | numerator1_dd[numerator1_dd < 0] = 0 94 | numerator2_dd = X[:, None] - knots[None, n_knots - 1] 95 | numerator2_dd[numerator2_dd < 0] = 0 96 | 97 | numerator_dd = np.power(numerator1_dd, 3) - np.power(numerator2_dd, 3) 98 | denominator_dd = knots[n_knots - 1] - knots[n_knots - 2] 99 | 100 | dd = numerator_dd / denominator_dd 101 | 102 | X_base[:, 1:] = numerator / denominator - dd 103 | 104 | return X_base 105 | 106 | 107 | def gaussian_kernel(knots, sigma, basis_len): 108 | """Create basis for gaussian kernel regression. 109 | 110 | Returns 111 | ------- 112 | X_base : array 113 | Basis for gaussian kernel regression. 114 | """ 115 | 116 | n_knots = len(knots) 117 | X = np.arange(basis_len) / basis_len 118 | 119 | X_base = - np.square(X[:, None] - knots) / (2 * sigma) 120 | X_base = np.exp(X_base) 121 | 122 | return X_base 123 | 124 | 125 | def binner(knots, basis_len): 126 | """Create basis for binner regression. 127 | 128 | Returns 129 | ------- 130 | X_base : array 131 | Basis for binner regression. 
132 | """ 133 | 134 | n_knots = len(knots) 135 | X = np.arange(basis_len) 136 | 137 | X_base = np.zeros((basis_len, n_knots + 1)) 138 | X_base[:, 0] = X <= knots[0] 139 | 140 | X_base[:, 1:-1] = np.logical_and( 141 | X[:, None] <= knots[1:][None, :], 142 | X[:, None] > knots[:(n_knots - 1)][None, :]) 143 | 144 | X_base[:, n_knots] = knots[-1] < X 145 | 146 | return X_base 147 | 148 | 149 | def lowess(smooth_fraction, basis_len): 150 | """Create basis for LOWESS. 151 | 152 | Returns 153 | ------- 154 | X_base : array 155 | Basis for LOWESS. 156 | """ 157 | 158 | X = np.arange(basis_len) 159 | 160 | r = int(np.ceil(smooth_fraction * basis_len)) 161 | r = min(r, basis_len - 1) 162 | 163 | X = X[:, None] - X[None, :] 164 | 165 | h = np.sort(np.abs(X), axis=1)[:, r] 166 | 167 | X_base = np.abs(X / h).clip(0.0, 1.0) 168 | X_base = np.power(1 - np.power(X_base, 3), 3) 169 | 170 | return X_base 171 | -------------------------------------------------------------------------------- /tsmoothie/smoother.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Define Smoother classes. 
3 | ''' 4 | 5 | import numpy as np 6 | from scipy.signal import fftconvolve 7 | import simdkalman 8 | 9 | from .utils_class import LinearRegression 10 | from .utils_func import (create_windows, sigma_interval, kalman_interval, 11 | confidence_interval, prediction_interval) 12 | from .utils_func import (_check_noise_dict, _check_knots, _check_weights, 13 | _check_data, _check_data_nan, _check_output) 14 | from .regression_basis import (polynomial, linear_spline, cubic_spline, natural_cubic_spline, 15 | gaussian_kernel, binner, lowess) 16 | 17 | 18 | _interval_types = { 19 | 'KalmanSmoother': 20 | ['sigma_interval', 'kalman_interval'], 21 | 'PolynomialSmoother': 22 | ['sigma_interval', 'confidence_interval', 'prediction_interval'], 23 | 'SplineSmoother': 24 | ['sigma_interval', 'confidence_interval', 'prediction_interval'], 25 | 'GaussianSmoother': 26 | ['sigma_interval', 'confidence_interval', 'prediction_interval'], 27 | 'BinnerSmoother': 28 | ['sigma_interval', 'confidence_interval', 'prediction_interval'], 29 | 'LowessSmoother': 30 | ['sigma_interval', 'confidence_interval', 'prediction_interval'], 31 | 'ExponentialSmoother': 32 | ['sigma_interval'], 33 | 'ConvolutionSmoother': 34 | ['sigma_interval'], 35 | 'DecomposeSmoother': 36 | ['sigma_interval'], 37 | 'SpectralSmoother': 38 | ['sigma_interval'] 39 | } 40 | 41 | 42 | class _BaseSmoother: 43 | """Base class to build each Smoother. 44 | 45 | Warning: This class should not be used directly. Use derived classes 46 | instead. 
47 | """ 48 | 49 | def __init__(self, copy=True): 50 | self.copy = copy 51 | 52 | def __repr__(self): 53 | return "".format(self.__class__.__name__) 54 | 55 | def __str__(self): 56 | return "".format(self.__class__.__name__) 57 | 58 | def _store_results(self, smooth_data, **objtosave): 59 | """Private method to store results.""" 60 | 61 | self.smooth_data = smooth_data 62 | 63 | if self.copy: 64 | for name, obj in objtosave.items(): 65 | setattr(self, str(name), obj) 66 | 67 | def get_intervals(self, interval_type, confidence=0.05, n_sigma=2): 68 | """Obtain intervals from the smoothed timeseries. 69 | Take care to set copy=True when defining the smoother. 70 | 71 | Supported interval types are: 72 | 1) 'sigma_interval'; 73 | 2) 'confidence_interval'; 74 | 3) 'prediction_interval'; 75 | 4) 'kalman_interval'. 76 | 77 | Each Smooter supports different interval types: 78 | - 'KalmanSmoother' => (1,4); 79 | - 'PolynomialSmoother' => (1,2,3); 80 | - 'SplineSmoother' => (1,2,3); 81 | - 'GaussianSmoother' => (1,2,3); 82 | - 'BinnerSmoother' => (1,2,3); 83 | - 'LowessSmoother' => (1,2,3); 84 | - 'ExponentialSmoother' => (1); 85 | - 'ConvolutionSmoother' => (1); 86 | - 'DecomposeSmoother' => (1); 87 | - 'SpectralSmoother' => (1); 88 | - 'WindowWrapper' => depends on the Smoother received. 89 | 90 | Parameters 91 | ---------- 92 | interval_type : str 93 | Type of interval used to produce the lower and upper bands. 94 | 95 | confidence : float, default=0.05 96 | Effective only for 'confidence_interval', 'prediction_interval' 97 | or 'kalman_interval'. 98 | The significance level for the intervals calculated as 99 | (1-confidence). 100 | 101 | n_sigma : int, default=2 102 | Effective only for 'sigma_interval'. 103 | How many standard deviations, calculated on residuals of the 104 | smoothing operation, are used to obtain the intervals. 105 | 106 | Returns 107 | ------- 108 | low : array of shape (series, timesteps) 109 | Lower bands. 
110 | 111 | up : array of shape (series, timesteps) 112 | Upper bands. 113 | """ 114 | 115 | if self.__class__.__name__ == 'WindowWrapper': 116 | 117 | if not hasattr(self.Smoother, 'data'): 118 | raise ValueError( 119 | "Pass some data to the smoother before computing intervals, " 120 | "setting copy=True") 121 | 122 | if interval_type not in _interval_types[self.Smoother.__class__.__name__]: 123 | raise ValueError( 124 | "'{}' is not a supported interval type for this smoother. " 125 | "Supported types are {}".format( 126 | interval_type, _interval_types[self.Smoother.__class__.__name__])) 127 | 128 | if interval_type == 'sigma_interval': 129 | low, up = sigma_interval( 130 | self.Smoother.data, self.Smoother.smooth_data, n_sigma) 131 | 132 | elif interval_type == 'kalman_interval': 133 | low, up = kalman_interval( 134 | self.Smoother.data, self.Smoother.smooth_data, 135 | self.Smoother.cov, confidence) 136 | 137 | elif (interval_type == 'confidence_interval' or 138 | interval_type == 'prediction_interval'): 139 | interval_f = eval(interval_type) 140 | low, up = interval_f( 141 | self.Smoother.data, self.Smoother.smooth_data, 142 | self.Smoother.X, confidence) 143 | 144 | else: 145 | 146 | if not hasattr(self, 'data'): 147 | raise ValueError( 148 | "Pass some data to the smoother before computing intervals, " 149 | "setting copy=True") 150 | 151 | if interval_type not in _interval_types[self.__class__.__name__]: 152 | raise ValueError( 153 | "'{}' is not a supported interval type for this smoother. 
" 154 | "Supported types are {}".format( 155 | interval_type, _interval_types[self.__class__.__name__])) 156 | 157 | if interval_type == 'sigma_interval': 158 | low, up = sigma_interval(self.data, self.smooth_data, n_sigma) 159 | 160 | elif interval_type == 'kalman_interval': 161 | low, up = kalman_interval( 162 | self.data, self.smooth_data, self.cov, confidence) 163 | 164 | elif (interval_type == 'confidence_interval' or 165 | interval_type == 'prediction_interval'): 166 | interval_f = eval(interval_type) 167 | low, up = interval_f( 168 | self.data, self.smooth_data, self.X, confidence) 169 | 170 | return low, up 171 | 172 | 173 | class ExponentialSmoother(_BaseSmoother): 174 | """ExponentialSmoother operates convolutions of fixed dimensions 175 | on the series using weighted windows. The weights are the same 176 | for all windows and are computed using an exponential decay. 177 | The most recent observations are more important than the past ones. 178 | This is controlled by a parameter (alpha). 179 | No padding is applied, in order not to alter the results at the edges. 180 | For this reason, this technique doesn't smooth the first 181 | window_len observations. 182 | 183 | The ExponentialSmoother automatically vectorizes, in an efficient way, 184 | the desired smoothing operation on all the series received. 185 | 186 | Parameters 187 | ---------- 188 | window_len : int 189 | Greater than or equal to 1. The length of the window used to compute 190 | the exponential smoothing. 191 | 192 | alpha : float 193 | Between 0 and 1. (1-alpha) provides the importance of the past 194 | observations when computing the smoothing. 195 | 196 | copy : bool, default=True 197 | If True, the raw data received by the smoother and the smoothed 198 | results can be accessed using 'data' and 'smooth_data' attributes. 199 | This is useful to calculate the intervals. If set to False the 200 | interval calculation is disabled.
In order to save memory, set it to 201 | False if you are interested only in the smoothed results. 202 | 203 | Attributes 204 | ---------- 205 | smooth_data : array of shape (series, timesteps-window_len) 206 | Smoothed data derived from the smoothing operation. 207 | It has the same shape as the raw data received, minus the first 208 | window_len observations. It is accessible after computing the 209 | smoothing, otherwise None is returned. 210 | 211 | data : array of shape (series, timesteps-window_len) 212 | Raw data received by the smoother. It is accessible with 'copy'=True 213 | and after computing the smoothing, otherwise None is returned. 214 | 215 | Examples 216 | -------- 217 | >>> import numpy as np 218 | >>> from tsmoothie.utils_func import sim_randomwalk 219 | >>> from tsmoothie.smoother import * 220 | >>> np.random.seed(33) 221 | >>> data = sim_randomwalk(n_series=10, timesteps=200, 222 | ... process_noise=10, measure_noise=30) 223 | >>> smoother = ExponentialSmoother(window_len=20, alpha=0.3) 224 | >>> smoother.smooth(data) 225 | >>> low, up = smoother.get_intervals('sigma_interval') 226 | """ 227 | 228 | def __init__(self, window_len, alpha, copy=True): 229 | self.window_len = window_len 230 | self.alpha = alpha 231 | self.copy = copy 232 | 233 | def smooth(self, data): 234 | """Smooth timeseries. 235 | 236 | Parameters 237 | ---------- 238 | data : array-like of shape (series, timesteps) or also (timesteps,) 239 | for single timeseries 240 | Timeseries to smooth. The data are assumed to be in increasing 241 | time order in each timeseries.
242 | 243 | Returns 244 | ------- 245 | self : returns an instance of self 246 | """ 247 | 248 | if self.window_len < 1: 249 | raise ValueError("window_len must be >= 1") 250 | 251 | if self.alpha > 1 or self.alpha < 0: 252 | raise ValueError("alpha must be in the range [0,1]") 253 | 254 | data = _check_data(data) 255 | 256 | if self.window_len >= data.shape[0]: 257 | raise ValueError( 258 | "window_len must be < the timesteps dimension " 259 | "of the data received") 260 | 261 | w = np.power((1 - self.alpha), np.arange(self.window_len)) 262 | 263 | if data.ndim == 2: 264 | w = np.repeat([w / w.sum()], data.shape[1], axis=0).T 265 | else: 266 | w = w / w.sum() 267 | 268 | smooth = fftconvolve(w, data, mode='full', axes=0) 269 | smooth = smooth[self.window_len:data.shape[0]] 270 | data = data[self.window_len:data.shape[0]] 271 | 272 | smooth = _check_output(smooth) 273 | data = _check_output(data) 274 | 275 | self._store_results(smooth_data=smooth, data=data) 276 | 277 | return self 278 | 279 | 280 | class ConvolutionSmoother(_BaseSmoother): 281 | """ConvolutionSmoother operates convolutions of fixed dimensions 282 | on the series using weighted windows. The weights can assume 283 | different formats but they are the same for all the windows and 284 | fixed for the whole procedure. The series are padded by reflection, 285 | with a quantity equal to the window size at both ends to avoid loss of 286 | information. 287 | 288 | The ConvolutionSmoother automatically vectorizes, in an efficient way, 289 | the desired smoothing operation on all the series received. 290 | 291 | Parameters 292 | ---------- 293 | window_len : int 294 | Greater than or equal to 1. The length of the window used to compute 295 | the convolutions. 296 | 297 | window_type : str 298 | The type of the window used to compute the convolutions. 299 | Supported types are: 'ones', 'hanning', 'hamming', 'bartlett', 'blackman'.
300 | 301 | copy : bool, default=True 302 | If True, the raw data received by the smoother and the smoothed 303 | results can be accessed using 'data' and 'smooth_data' attributes. 304 | This is useful to calculate the intervals. If set to False the 305 | interval calculation is disabled. In order to save memory, set it to 306 | False if you are interested only in the smoothed results. 307 | 308 | Attributes 309 | ---------- 310 | smooth_data : array of shape (series, timesteps) 311 | Smoothed data derived from the smoothing operation. It is accessible 312 | after computing the smoothing, otherwise None is returned. 313 | 314 | data : array of shape (series, timesteps) 315 | Raw data received by the smoother. It is accessible with 'copy'=True 316 | and after computing the smoothing, otherwise None is returned. 317 | 318 | Examples 319 | -------- 320 | >>> import numpy as np 321 | >>> from tsmoothie.utils_func import sim_randomwalk 322 | >>> from tsmoothie.smoother import * 323 | >>> np.random.seed(33) 324 | >>> data = sim_randomwalk(n_series=10, timesteps=200, 325 | ... process_noise=10, measure_noise=30) 326 | >>> smoother = ConvolutionSmoother(window_len=10, window_type='ones') 327 | >>> smoother.smooth(data) 328 | >>> low, up = smoother.get_intervals('sigma_interval') 329 | """ 330 | 331 | def __init__(self, window_len, window_type, copy=True): 332 | self.window_len = window_len 333 | self.window_type = window_type 334 | self.copy = copy 335 | 336 | def smooth(self, data): 337 | """Smooth timeseries. 338 | 339 | Parameters 340 | ---------- 341 | data : array-like of shape (series, timesteps) or also (timesteps,) 342 | for single timeseries 343 | Timeseries to smooth. The data are assumed to be in increasing 344 | time order in each timeseries.
345 | 346 | Returns 347 | ------- 348 | self : returns an instance of self 349 | """ 350 | 351 | window_types = ['ones', 'hanning', 'hamming', 'bartlett', 'blackman'] 352 | 353 | if self.window_type not in window_types: 354 | raise ValueError( 355 | "'{}' is not a supported window type. " 356 | "Supported types are {}".format(self.window_type, window_types)) 357 | 358 | if self.window_len < 1: 359 | raise ValueError("window_len must be >= 1") 360 | 361 | data = _check_data(data) 362 | 363 | if self.window_len % 2 == 0: 364 | window_len = int(self.window_len + 1) 365 | else: 366 | window_len = self.window_len 367 | 368 | if self.window_type == 'ones': 369 | w = np.ones(window_len) 370 | else: 371 | w = getattr(np, self.window_type)(window_len) 372 | 373 | if data.ndim == 2: 374 | pad_data = np.pad( 375 | data, ((window_len, window_len), (0, 0)), mode='symmetric') 376 | w = np.repeat([w / w.sum()], pad_data.shape[1], axis=0).T 377 | else: 378 | pad_data = np.pad(data, window_len, mode='symmetric') 379 | w = w / w.sum() 380 | 381 | smooth = fftconvolve(w, pad_data, mode='valid', axes=0) 382 | smooth = smooth[(window_len // 2 + 1):-(window_len // 2 + 1)] 383 | 384 | smooth = _check_output(smooth) 385 | data = _check_output(data) 386 | 387 | self._store_results(smooth_data=smooth, data=data) 388 | 389 | return self 390 | 391 | 392 | class SpectralSmoother(_BaseSmoother): 393 | """SpectralSmoother smooths the timeseries by applying a Fourier 394 | Transform. It keeps the lowest frequencies, suppressing 395 | the others in the Fourier domain. This results in smoother curves 396 | when returning to the real domain. 397 | 398 | The SpectralSmoother automatically vectorizes, in an efficient way, 399 | the desired smoothing operation on all the series passed. 400 | 401 | Parameters 402 | ---------- 403 | smooth_fraction : float 404 | Between 0 and 1. The smoothing strength. A lower value of 405 | smooth_fraction will result in a smoother curve.
It is the proportion 406 | of frequencies kept in the discrete Fourier Transform to smooth 407 | the curve. 408 | 409 | pad_len : int 410 | Greater than or equal to 1. The length of the padding used at each 411 | timeseries edge to center the series and obtain a better smoothing. 412 | 413 | copy : bool, default=True 414 | If True, the raw data received by the smoother and the smoothed 415 | results can be accessed using 'data' and 'smooth_data' attributes. 416 | This is useful to calculate the intervals. If set to False the 417 | interval calculation is disabled. In order to save memory, set it to 418 | False if you are interested only in the smoothed results. 419 | 420 | Attributes 421 | ---------- 422 | smooth_data : array of shape (series, timesteps) 423 | Smoothed data derived from the smoothing operation. It is accessible 424 | after computing the smoothing, otherwise None is returned. 425 | 426 | data : array of shape (series, timesteps) 427 | Raw data received by the smoother. It is accessible with 'copy'=True 428 | and after computing the smoothing, otherwise None is returned. 429 | 430 | Examples 431 | -------- 432 | >>> import numpy as np 433 | >>> from tsmoothie.utils_func import sim_seasonal_data 434 | >>> from tsmoothie.smoother import * 435 | >>> np.random.seed(33) 436 | >>> data = sim_seasonal_data(n_series=3, timesteps=200, 437 | ... freq=24, measure_noise=15) 438 | >>> smoother = SpectralSmoother(smooth_fraction=0.2, pad_len=20) 439 | >>> smoother.smooth(data) 440 | >>> low, up = smoother.get_intervals('sigma_interval') 441 | """ 442 | 443 | def __init__(self, smooth_fraction, pad_len, copy=True): 444 | self.smooth_fraction = smooth_fraction 445 | self.pad_len = pad_len 446 | self.copy = copy 447 | 448 | def smooth(self, data): 449 | """Smooth timeseries. 450 | 451 | Parameters 452 | ---------- 453 | data : array-like of shape (series, timesteps) or also (timesteps,) 454 | for single timeseries 455 | Timeseries to smooth.
The data are assumed to be in increasing 456 | time order in each timeseries. 457 | 458 | Returns 459 | ------- 460 | self : returns an instance of self 461 | """ 462 | 463 | if self.smooth_fraction >= 1 or self.smooth_fraction <= 0: 464 | raise ValueError("smooth_fraction must be in the range (0,1)") 465 | 466 | if self.pad_len < 1: 467 | raise ValueError("pad_len must be >= 1") 468 | 469 | data = _check_data(data) 470 | 471 | if data.ndim == 2: 472 | pad_data = np.pad( 473 | data, ((self.pad_len, self.pad_len), (0, 0)), 474 | mode='symmetric') 475 | else: 476 | pad_data = np.pad(data, self.pad_len, mode='symmetric') 477 | 478 | rfft = np.fft.rfft(pad_data, axis=0) 479 | n_coeff = int(rfft.shape[0] * self.smooth_fraction) 480 | 481 | if rfft.ndim == 2: 482 | rfft[n_coeff:, :] = 0 483 | else: 484 | rfft[n_coeff:] = 0 485 | 486 | if data.shape[0] % 2 > 0: 487 | n = 2 * rfft.shape[0] - 1 488 | else: 489 | n = 2 * (rfft.shape[0] - 1) 490 | 491 | smooth = np.fft.irfft(rfft, n=n, axis=0) 492 | smooth = smooth[self.pad_len:-self.pad_len] 493 | 494 | smooth = _check_output(smooth) 495 | data = _check_output(data) 496 | 497 | self._store_results(smooth_data=smooth, data=data) 498 | 499 | return self 500 | 501 | 502 | class PolynomialSmoother(_BaseSmoother): 503 | """PolynomialSmoother smooths the timeseries by applying a linear 504 | regression on an ad-hoc basis expansion. 505 | The input space, used to build the basis expansion, consists of 506 | a single continuous increasing sequence. 507 | 508 | The PolynomialSmoother automatically vectorizes, in an efficient way, 509 | the desired smoothing operation on all the series received. 510 | 511 | Parameters 512 | ---------- 513 | degree : int 514 | The polynomial order used to build the basis. 515 | 516 | copy : bool, default=True 517 | If True, the raw data received by the smoother and the smoothed 518 | results can be accessed using 'data' and 'smooth_data' attributes. 519 | This is useful to calculate the intervals.
If set to False the 520 | interval calculation is disabled. In order to save memory, set it to 521 | False if you are interested only in the smoothed results. 522 | 523 | Attributes 524 | ---------- 525 | smooth_data : array of shape (series, timesteps) 526 | Smoothed data derived from the smoothing operation. It is accessible 527 | after computing the smoothing, otherwise None is returned. 528 | 529 | data : array of shape (series, timesteps) 530 | Raw data received by the smoother. It is accessible with 'copy'=True 531 | and after computing the smoothing, otherwise None is returned. 532 | 533 | Examples 534 | -------- 535 | >>> import numpy as np 536 | >>> from tsmoothie.utils_func import sim_randomwalk 537 | >>> from tsmoothie.smoother import * 538 | >>> np.random.seed(33) 539 | >>> data = sim_randomwalk(n_series=10, timesteps=200, 540 | ... process_noise=10, measure_noise=30) 541 | >>> smoother = PolynomialSmoother(degree=6) 542 | >>> smoother.smooth(data) 543 | >>> low, up = smoother.get_intervals('prediction_interval') 544 | """ 545 | 546 | def __init__(self, degree, copy=True): 547 | self.degree = degree 548 | self.copy = copy 549 | 550 | def smooth(self, data, weights=None): 551 | """Smooth timeseries. 552 | 553 | Parameters 554 | ---------- 555 | data : array-like of shape (series, timesteps) or also (timesteps,) 556 | for single timeseries 557 | Timeseries to smooth. The data are assumed to be in increasing 558 | time order in each timeseries. 559 | 560 | weights : array-like of shape (timesteps,), default=None 561 | Individual weights for each timestep. In case of multidimensional 562 | timeseries, the same weights are used for all the timeseries.
563 | 564 | Returns 565 | ------- 566 | self : returns an instance of self 567 | """ 568 | 569 | if self.degree < 1: 570 | raise ValueError("degree must be > 0") 571 | 572 | data = _check_data(data) 573 | basis_len = data.shape[0] 574 | weights = _check_weights(weights, basis_len) 575 | 576 | X_base = polynomial(self.degree, basis_len) 577 | 578 | lr = LinearRegression(fit_intercept=True) 579 | lr.fit(X_base, data, sample_weight=weights) 580 | 581 | smooth = lr.predict(X_base) 582 | 583 | smooth = _check_output(smooth) 584 | data = _check_output(data) 585 | 586 | self._store_results(smooth_data=smooth, X=X_base, data=data) 587 | 588 | return self 589 | 590 | 591 | class SplineSmoother(_BaseSmoother): 592 | """SplineSmoother smooths the timeseries by applying a linear regression 593 | on an ad-hoc basis expansion. Three types of spline smoothing are 594 | available: 'linear spline', 'cubic spline', 'natural cubic spline'. 595 | In all of the available methods, the input space consists of a single 596 | continuous increasing sequence. 597 | 598 | Two possibilities are available: 599 | - smooth the timeseries in equal intervals, where the number of 600 | intervals is a user defined parameter (n_knots); 601 | - smooth the timeseries in custom length intervals, where the interval 602 | positions are defined by the user as normalized points (knots). 603 | The two methods are mutually exclusive: if knots is provided it is 604 | used and n_knots is ignored. 605 | 606 | The SplineSmoother automatically vectorizes, in an efficient way, 607 | the desired smoothing operation on all the series received. 608 | 609 | Parameters 610 | ---------- 611 | spline_type : str 612 | Type of spline smoother to operate. Supported types are 'linear_spline', 613 | 'cubic_spline' or 'natural_cubic_spline'. 614 | 615 | n_knots : int 616 | Between 1 and timesteps for 'linear_spline' and 'cubic_spline'. 617 | Between 3 and timesteps for 'natural_cubic_spline'.
618 | Number of equal intervals used to divide the input space and smooth 619 | the timeseries. A lower value of n_knots will result in a smoother curve. 620 | 621 | knots : array-like of shape (n_knots,), default=None 622 | With length of at least 1 for 'linear_spline' and 'cubic_spline'. 623 | With length of at least 3 for 'natural_cubic_spline'. 624 | Normalized points in the range [0,1] that specify how to divide 625 | the input space into sections. A lower number of knots will result in a 626 | smoother curve. 627 | 628 | copy : bool, default=True 629 | If True, the raw data received by the smoother and the smoothed 630 | results can be accessed using 'data' and 'smooth_data' attributes. 631 | This is useful to calculate the intervals. If set to False the 632 | interval calculation is disabled. In order to save memory, set it to 633 | False if you are interested only in the smoothed results. 634 | 635 | Attributes 636 | ---------- 637 | smooth_data : array of shape (series, timesteps) 638 | Smoothed data derived from the smoothing operation. It is accessible 639 | after computing the smoothing, otherwise None is returned. 640 | 641 | data : array of shape (series, timesteps) 642 | Raw data received by the smoother. It is accessible with 'copy'=True 643 | and after computing the smoothing, otherwise None is returned. 644 | 645 | Examples 646 | -------- 647 | >>> import numpy as np 648 | >>> from tsmoothie.utils_func import sim_randomwalk 649 | >>> from tsmoothie.smoother import * 650 | >>> np.random.seed(33) 651 | >>> data = sim_randomwalk(n_series=10, timesteps=200, 652 | ...
process_noise=10, measure_noise=30) 653 | >>> smoother = SplineSmoother(n_knots=6, spline_type='natural_cubic_spline') 654 | >>> smoother.smooth(data) 655 | >>> low, up = smoother.get_intervals('prediction_interval') 656 | """ 657 | 658 | def __init__(self, spline_type, n_knots, knots=None, copy=True): 659 | self.spline_type = spline_type 660 | self.n_knots = n_knots 661 | self.knots = knots 662 | self.copy = copy 663 | 664 | def smooth(self, data, weights=None): 665 | """Smooth timeseries. 666 | 667 | Parameters 668 | ---------- 669 | data : array-like of shape (series, timesteps) or also (timesteps,) 670 | for single timeseries 671 | Timeseries to smooth. The data are assumed to be in increasing 672 | time order in each timeseries. 673 | 674 | weights : array-like of shape (timesteps,), default=None 675 | Individual weights for each timestep. In case of multidimensional 676 | timeseries, the same weights are used for all the timeseries. 677 | 678 | Returns 679 | ------- 680 | self : returns an instance of self 681 | """ 682 | 683 | spline_types = {'linear_spline': 1, 'cubic_spline': 1, 684 | 'natural_cubic_spline': 3} 685 | 686 | if self.spline_type not in spline_types: 687 | raise ValueError("'{}' is not a supported spline type.
" 688 | "Supported types are {}".format( 689 | self.spline_type, list(spline_types.keys()))) 690 | 691 | data = _check_data(data) 692 | basis_len = data.shape[0] 693 | weights = _check_weights(weights, basis_len) 694 | 695 | if self.knots is not None: 696 | knots = _check_knots( 697 | self.knots, spline_types[self.spline_type])[1:-1] * basis_len 698 | 699 | else: 700 | if self.n_knots < spline_types[self.spline_type]: 701 | raise ValueError( 702 | "'{}' requires n_knots >= {}".format( 703 | self.spline_type, spline_types[self.spline_type])) 704 | 705 | if self.n_knots > basis_len: 706 | raise ValueError( 707 | "n_knots must be <= the timesteps dimension " 708 | "of the data received") 709 | 710 | knots = np.linspace(0, basis_len, self.n_knots + 2)[1:-1] 711 | 712 | f = eval(self.spline_type) 713 | X_base = f(knots, basis_len) 714 | 715 | lr = LinearRegression(fit_intercept=True) 716 | lr.fit(X_base, data, sample_weight=weights) 717 | 718 | smooth = lr.predict(X_base) 719 | 720 | smooth = _check_output(smooth) 721 | data = _check_output(data) 722 | 723 | self._store_results(smooth_data=smooth, X=X_base, data=data) 724 | 725 | return self 726 | 727 | 728 | class GaussianSmoother(_BaseSmoother): 729 | """GaussianSmoother smooths the timeseries by applying a linear 730 | regression on an ad-hoc basis expansion. The features created with 731 | this method are obtained applying a gaussian kernel centered on specified 732 | points of the input space. 733 | In the timeseries domain, the input space consists of a single continuous 734 | increasing sequence. 735 | 736 | Two possibilities are available: 737 | - smooth the timeseries in equal intervals, where the number of 738 | intervals is a user defined parameter (n_knots); 739 | - smooth the timeseries in custom length intervals, where the interval 740 | positions are defined by the user as normalized points (knots).
741 | The two methods are mutually exclusive: if knots is provided it is 742 | used and n_knots is ignored. 743 | 744 | The GaussianSmoother automatically vectorizes, in an efficient way, 745 | the desired smoothing operation on all the series received. 746 | 747 | Parameters 748 | ---------- 749 | sigma : float 750 | Greater than 0. The sigma of the gaussian kernel. 751 | 752 | n_knots : int 753 | Between 1 and timesteps. Number of equal intervals used to divide 754 | the input space and smooth the timeseries. A lower value of n_knots 755 | will result in a smoother curve. 756 | 757 | knots : array-like of shape (n_knots,), default=None 758 | With length of at least 1. Normalized points in the range [0,1] that 759 | specify how to divide the input space into sections. A lower number of 760 | knots will result in a smoother curve. 761 | 762 | copy : bool, default=True 763 | If True, the raw data received by the smoother and the smoothed 764 | results can be accessed using 'data' and 'smooth_data' attributes. 765 | This is useful to calculate the intervals. If set to False the 766 | interval calculation is disabled. In order to save memory, set it to 767 | False if you are interested only in the smoothed results. 768 | 769 | Attributes 770 | ---------- 771 | smooth_data : array of shape (series, timesteps) 772 | Smoothed data derived from the smoothing operation. It is accessible 773 | after computing the smoothing, otherwise None is returned. 774 | 775 | data : array of shape (series, timesteps) 776 | Raw data received by the smoother. It is accessible with 'copy'=True 777 | and after computing the smoothing, otherwise None is returned. 778 | 779 | Examples 780 | -------- 781 | >>> import numpy as np 782 | >>> from tsmoothie.utils_func import sim_randomwalk 783 | >>> from tsmoothie.smoother import * 784 | >>> np.random.seed(33) 785 | >>> data = sim_randomwalk(n_series=10, timesteps=200, 786 | ...
process_noise=10, measure_noise=30) 787 | >>> smoother = GaussianSmoother(n_knots=6, sigma=0.1) 788 | >>> smoother.smooth(data) 789 | >>> low, up = smoother.get_intervals('prediction_interval') 790 | """ 791 | 792 | def __init__(self, sigma, n_knots, knots=None, copy=True): 793 | self.sigma = sigma 794 | self.n_knots = n_knots 795 | self.knots = knots 796 | self.copy = copy 797 | 798 | def smooth(self, data, weights=None): 799 | """Smooth timeseries. 800 | 801 | Parameters 802 | ---------- 803 | data : array-like of shape (series, timesteps) or also (timesteps,) 804 | for single timeseries 805 | Timeseries to smooth. The data are assumed to be in increasing 806 | time order in each timeseries. 807 | 808 | weights : array-like of shape (timesteps,), default=None 809 | Individual weights for each timestep. In case of multidimensional 810 | timeseries, the same weights are used for all the timeseries. 811 | 812 | Returns 813 | ------- 814 | self : returns an instance of self 815 | """ 816 | 817 | if self.sigma <= 0: 818 | raise ValueError("sigma must be > 0") 819 | 820 | data = _check_data(data) 821 | basis_len = data.shape[0] 822 | weights = _check_weights(weights, basis_len) 823 | 824 | if self.knots is not None: 825 | knots = _check_knots(self.knots, 1)[1:-1] 826 | 827 | else: 828 | if self.n_knots < 1: 829 | raise ValueError("n_knots must be > 0") 830 | 831 | if self.n_knots > basis_len: 832 | raise ValueError( 833 | "n_knots must be <= the timesteps dimension " 834 | "of the data received") 835 | 836 | knots = np.linspace(0, 1, self.n_knots + 2)[1:-1] 837 | 838 | X_base = gaussian_kernel(knots, self.sigma, basis_len) 839 | 840 | lr = LinearRegression(fit_intercept=True) 841 | lr.fit(X_base, data, sample_weight=weights) 842 | 843 | smooth = lr.predict(X_base) 844 | 845 | smooth = _check_output(smooth) 846 | data = _check_output(data) 847 | 848 | self._store_results(smooth_data=smooth, X=X_base, data=data) 849 | 850 | return self 851 | 852 | 853 | class
BinnerSmoother(_BaseSmoother): 854 | """BinnerSmoother smooths the timeseries by applying a linear regression 855 | on an ad-hoc basis expansion. The features created with this method 856 | are obtained binning the input space into intervals. 857 | An indicator feature is created for each bin, indicating which bin 858 | a given observation falls into. 859 | In the timeseries domain, the input space consists of a single continuous 860 | increasing sequence. 861 | 862 | Two possibilities are available: 863 | - smooth the timeseries in equal intervals, where the number of 864 | intervals is a user defined parameter (n_knots); 865 | - smooth the timeseries in custom length intervals, where the interval 866 | positions are defined by the user as normalized points (knots). 867 | The two methods are mutually exclusive: if knots is provided it is 868 | used and n_knots is ignored. 869 | 870 | The BinnerSmoother automatically vectorizes, in an efficient way, 871 | the desired smoothing operation on all the series received. 872 | 873 | Parameters 874 | ---------- 875 | n_knots : int 876 | Between 1 and timesteps. Number of equal intervals used to divide 877 | the input space and smooth the timeseries. A lower value of n_knots 878 | will result in a smoother curve. 879 | 880 | knots : array-like of shape (n_knots,), default=None 881 | With length of at least 1. Normalized points in the range [0,1] that 882 | specify how to divide the input space into sections. A lower number of 883 | knots will result in a smoother curve. 884 | 885 | copy : bool, default=True 886 | If True, the raw data received by the smoother and the smoothed 887 | results can be accessed using 'data' and 'smooth_data' attributes. 888 | This is useful to calculate the intervals. If set to False the 889 | interval calculation is disabled. In order to save memory, set it to 890 | False if you are interested only in the smoothed results.
891 | 892 | Attributes 893 | ---------- 894 | smooth_data : array of shape (series, timesteps) 895 | Smoothed data derived from the smoothing operation. It is accessible 896 | after computing the smoothing, otherwise None is returned. 897 | 898 | data : array of shape (series, timesteps) 899 | Raw data received by the smoother. It is accessible with 'copy'=True 900 | and after computing the smoothing, otherwise None is returned. 901 | 902 | Examples 903 | -------- 904 | >>> import numpy as np 905 | >>> from tsmoothie.utils_func import sim_randomwalk 906 | >>> from tsmoothie.smoother import * 907 | >>> np.random.seed(33) 908 | >>> data = sim_randomwalk(n_series=10, timesteps=200, 909 | ... process_noise=10, measure_noise=30) 910 | >>> smoother = BinnerSmoother(n_knots=6) 911 | >>> smoother.smooth(data) 912 | >>> low, up = smoother.get_intervals('prediction_interval') 913 | """ 914 | 915 | def __init__(self, n_knots, knots=None, copy=True): 916 | self.n_knots = n_knots 917 | self.knots = knots 918 | self.copy = copy 919 | 920 | def smooth(self, data, weights=None): 921 | """Smooth timeseries. 922 | 923 | Parameters 924 | ---------- 925 | data : array-like of shape (series, timesteps) or also (timesteps,) 926 | for single timeseries 927 | Timeseries to smooth. The data are assumed to be in increasing 928 | time order in each timeseries. 929 | 930 | weights : array-like of shape (timesteps,), default=None 931 | Individual weights for each timestep. In case of multidimensional 932 | timeseries, the same weights are used for all the timeseries.
933 | 934 | Returns 935 | ------- 936 | self : returns an instance of self 937 | """ 938 | 939 | data = _check_data(data) 940 | basis_len = data.shape[0] 941 | weights = _check_weights(weights, basis_len) 942 | 943 | if self.knots is not None: 944 | knots = _check_knots(self.knots, 1)[1:-1] * basis_len 945 | 946 | else: 947 | if self.n_knots < 1: 948 | raise ValueError("n_knots must be > 0") 949 | 950 | if self.n_knots > basis_len: 951 | raise ValueError( 952 | "n_knots must be <= the timesteps dimension " 953 | "of the data received") 954 | 955 | knots = np.linspace(0, basis_len, self.n_knots + 2)[1:-1] 956 | 957 | X_base = binner(knots, basis_len) 958 | 959 | lr = LinearRegression(fit_intercept=True) 960 | lr.fit(X_base, data, sample_weight=weights) 961 | 962 | smooth = lr.predict(X_base) 963 | 964 | smooth = _check_output(smooth) 965 | data = _check_output(data) 966 | 967 | self._store_results(smooth_data=smooth, X=X_base, data=data) 968 | 969 | return self 970 | 971 | 972 | class LowessSmoother(_BaseSmoother): 973 | """LowessSmoother uses LOWESS (locally-weighted scatterplot smoothing) 974 | to smooth the timeseries. This smoothing technique is a non-parametric 975 | regression method that essentially fits a separate linear regression 976 | for every data point by including nearby data points to estimate 977 | the slope and intercept. The presented method is robust because it 978 | performs residual-based reweightings; only the number of 979 | iterations to operate needs to be specified. 980 | 981 | The LowessSmoother automatically vectorizes, in an efficient way, 982 | the desired smoothing operation on all the series passed. 983 | 984 | Parameters 985 | ---------- 986 | smooth_fraction : float 987 | Between 0 and 1. The smoothing span. A larger value of smooth_fraction 988 | will result in a smoother curve. 989 | 990 | iterations : int 991 | Between 1 and 6. The number of residual-based reweightings to perform.
992 | 993 | batch_size : int, default=None 994 | How many timeseries are smoothed simultaneously. This parameter is 995 | important because LowessSmoother is a memory-intensive process. Setting 996 | it low, with big timeseries, helps to avoid MemoryError. By default 997 | None means that all the timeseries are smoothed simultaneously. 998 | 999 | copy : bool, default=True 1000 | If True, the raw data received by the smoother and the smoothed 1001 | results can be accessed using 'data' and 'smooth_data' attributes. 1002 | This is useful to calculate the intervals. If set to False the 1003 | interval calculation is disabled. In order to save memory, set it to 1004 | False if you are interested only in the smoothed results. 1005 | 1006 | Attributes 1007 | ---------- 1008 | smooth_data : array of shape (series, timesteps) 1009 | Smoothed data derived from the smoothing operation. It is accessible 1010 | after computing smoothing, otherwise None is returned. 1011 | 1012 | data : array of shape (series, timesteps) 1013 | Raw data received by the smoother. It is accessible with 'copy'=True 1014 | and after computing smoothing, otherwise None is returned. 1015 | 1016 | Examples 1017 | -------- 1018 | >>> import numpy as np 1019 | >>> from tsmoothie.utils_func import sim_randomwalk 1020 | >>> from tsmoothie.smoother import * 1021 | >>> np.random.seed(33) 1022 | >>> data = sim_randomwalk(n_series=10, timesteps=200, 1023 | ... process_noise=10, measure_noise=30) 1024 | >>> smoother = LowessSmoother(smooth_fraction=0.3, iterations=1) 1025 | >>> smoother.smooth(data) 1026 | >>> low, up = smoother.get_intervals('prediction_interval') 1027 | """ 1028 | 1029 | def __init__(self, smooth_fraction, iterations=1, 1030 | batch_size=None, copy=True): 1031 | self.smooth_fraction = smooth_fraction 1032 | self.iterations = iterations 1033 | self.batch_size = batch_size 1034 | self.copy = copy 1035 | 1036 | def smooth(self, data): 1037 | """Smooth timeseries. 
1038 | 1039 | Parameters 1040 | ---------- 1041 | data : array-like of shape (series, timesteps) or also (timesteps,) 1042 | for single timeseries 1043 | Timeseries to smooth. The data are assumed to be in increasing 1044 | time order in each timeseries. 1045 | 1046 | Returns 1047 | ------- 1048 | self : returns an instance of self 1049 | """ 1050 | 1051 | if self.smooth_fraction >= 1 or self.smooth_fraction <= 0: 1052 | raise ValueError("smooth_fraction must be in the range (0,1)") 1053 | 1054 | if self.iterations <= 0 or self.iterations > 6: 1055 | raise ValueError("iterations must be in the range (0,6]") 1056 | 1057 | data = _check_data(data) 1058 | if data.ndim == 1: 1059 | data = data[:, None] 1060 | timesteps, n_timeseries = data.shape 1061 | 1062 | if self.batch_size is not None: 1063 | if self.batch_size <= 0 or self.batch_size > n_timeseries: 1064 | raise ValueError("batch_size must be in the range (0,series]") 1065 | 1066 | X = np.arange(timesteps) / (timesteps - 1) 1067 | w_init = lowess(self.smooth_fraction, timesteps) 1068 | 1069 | delta = np.ones_like(data) 1070 | 1071 | if self.batch_size is None: 1072 | batches = [np.arange(0, n_timeseries)] 1073 | else: 1074 | batches = np.split( 1075 | np.arange(0, n_timeseries), 1076 | np.arange(self.batch_size, n_timeseries + self.batch_size - 1, 1077 | self.batch_size)) 1078 | smooth = np.empty_like(data) 1079 | 1080 | for iteration in range(self.iterations): 1081 | 1082 | for B in batches: 1083 | 1084 | try: 1085 | w = delta[:, None, B] * w_init[..., None] 1086 | # (timesteps, timesteps, n_series) 1087 | wy = w * data[:, None, B] 1088 | # (timesteps, timesteps, n_series) 1089 | wyx = wy * X[:, None, None] 1090 | # (timesteps, timesteps, n_series) 1091 | wx = w * X[:, None, None] 1092 | # (timesteps, timesteps, n_series) 1093 | wxx = wx * X[:, None, None] 1094 | # (timesteps, timesteps, n_series) 1095 | 1096 | b = np.array([wy.sum(axis=0), wyx.sum(axis=0)]).T 1097 | # (n_series, timesteps, 2) 1098 | A = 
np.array([[w.sum(axis=0), wx.sum(axis=0)], 1099 | [wx.sum(axis=0), wxx.sum(axis=0)]]) 1100 | # (2, 2, timesteps, n_series) 1101 | 1102 | XtX = (A.transpose(1, 0, 2, 3)[None, ...] * A[:, None, ...]).sum(2) 1103 | # (2, 2, timesteps, n_series) 1104 | XtX = np.linalg.pinv(XtX.transpose(3, 2, 0, 1)) 1105 | # (n_series, timesteps, 2, 2) 1106 | XtXXt = (XtX[..., None] * A.transpose(3, 2, 1, 0)[..., None, :]).sum(2) 1107 | # (n_series, timesteps, 2, 2) 1108 | betas = np.squeeze(XtXXt @ b[..., None], -1) 1109 | # (n_series, timesteps, 2) 1110 | 1111 | smooth[:, B] = (betas[..., 0] + betas[..., 1] * X).T 1112 | # (timesteps, n_series) 1113 | 1114 | residuals = data[:, B] - smooth[:, B] 1115 | s = np.median(np.abs(residuals), axis=0).clip(1e-5) 1116 | delta[:, B] = (residuals / (6.0 * s)).clip(-1, 1) 1117 | delta[:, B] = np.square(1 - np.square(delta[:, B])) 1118 | 1119 | except MemoryError: 1120 | raise StopIteration( 1121 | "Reduce the batch_size provided in order to not encounter " 1122 | "memory errors. Provided batch_size is {}. By default batch_size " 1123 | "is set to None. This means that all the timeseries " 1124 | "passed are smoothed simultaneously".format(self.batch_size)) 1125 | 1126 | smooth = _check_output(smooth) 1127 | data = _check_output(data) 1128 | 1129 | self._store_results(smooth_data=smooth, X=X * (timesteps - 1), data=data) 1130 | 1131 | return self 1132 | 1133 | 1134 | class DecomposeSmoother(_BaseSmoother): 1135 | """DecomposeSmoother smoothes the timeseries applying a standard 1136 | seasonal decomposition. The seasonal decomposition can be carried out 1137 | using different smoothing techniques available in tsmoothie. 1138 | 1139 | The DecomposeSmoother automatically vectorizes, in an efficient way, 1140 | the desired smoothing operation on all the series received. 1141 | 1142 | Parameters 1143 | ---------- 1144 | smooth_type : str 1145 | The type of smoothing used to compute the seasonal decomposition. 
1146 | Supported types are: 'convolution', 'lowess', 'natural_cubic_spline'. 1147 | 1148 | periods : int or list 1149 | List of seasonal periods of the timeseries. Multiple periods are 1150 | allowed. Each period must be an integer greater than 0. A single integer is also accepted. 1151 | 1152 | method : str, default='additive' 1153 | Type of seasonal component. 1154 | Supported types are: 'additive', 'multiplicative'. 1155 | 1156 | **smoothargs : Smoothing arguments 1157 | The same arguments accepted by the smoother selected with smooth_type. 1158 | 1159 | copy : bool, default=True 1160 | If True, the raw data received by the smoother and the smoothed 1161 | results can be accessed using 'data' and 'smooth_data' attributes. 1162 | This is useful to calculate the intervals. If set to False the 1163 | interval calculation is disabled. In order to save memory, set it to 1164 | False if you are interested only in the smoothed results. 1165 | 1166 | Attributes 1167 | ---------- 1168 | smooth_data : array of shape (series, timesteps) 1169 | Smoothed data derived from the smoothing operation. It is accessible 1170 | after computing smoothing, otherwise None is returned. 1171 | 1172 | data : array of shape (series, timesteps) 1173 | Raw data received by the smoother. It is accessible with 'copy'=True 1174 | and after computing smoothing, otherwise None is returned. 1175 | 1176 | Examples 1177 | -------- 1178 | >>> import numpy as np 1179 | >>> from tsmoothie.utils_func import sim_seasonal_data 1180 | >>> from tsmoothie.smoother import * 1181 | >>> np.random.seed(33) 1182 | >>> data = sim_seasonal_data(n_series=3, timesteps=300, 1183 | ... freq=24, measure_noise=30) 1184 | >>> smoother = DecomposeSmoother(smooth_type='convolution', periods=24, 1185 | ... 
window_len=30, window_type='ones') 1186 | >>> smoother.smooth(data) 1187 | >>> low, up = smoother.get_intervals('sigma_interval') 1188 | """ 1189 | 1190 | def __init__(self, smooth_type, periods, method='additive', copy=True, 1191 | **smoothargs): 1192 | self.smooth_type = smooth_type 1193 | self.periods = periods 1194 | self.method = method 1195 | self.copy = copy 1196 | self.smoothargs = smoothargs 1197 | 1198 | def smooth(self, data): 1199 | """Smooth timeseries. 1200 | 1201 | Parameters 1202 | ---------- 1203 | data : array-like of shape (series, timesteps) or also (timesteps,) 1204 | for single timeseries 1205 | Timeseries to smooth. The data are assumed to be in increasing 1206 | time order in each timeseries. 1207 | 1208 | Returns 1209 | ------- 1210 | self : returns an instance of self 1211 | """ 1212 | 1213 | smooth_types = ['convolution', 'lowess', 'natural_cubic_spline'] 1214 | methods = ['additive', 'multiplicative'] 1215 | 1216 | if self.smooth_type not in smooth_types: 1217 | raise ValueError( 1218 | "'{}' is not a supported smooth type. " 1219 | "Supported types are {}".format(self.smooth_type, smooth_types)) 1220 | 1221 | if self.method not in methods: 1222 | raise ValueError("'{}' is not a supported method type. 
" 1223 | "Supported types are {}".format( 1224 | self.method, methods)) 1225 | 1226 | if not isinstance(self.periods, list): 1227 | periods = [self.periods] 1228 | else: 1229 | periods = self.periods 1230 | 1231 | for p in periods: 1232 | if p <= 0 or not isinstance(p, int): 1233 | raise ValueError("periods must a list containing int > 0") 1234 | 1235 | if self.smooth_type == 'convolution': 1236 | smoother = ConvolutionSmoother(copy=True, **self.smoothargs) 1237 | smoother.smooth(data) 1238 | elif self.smooth_type == 'lowess': 1239 | smoother = LowessSmoother(copy=True, **self.smoothargs) 1240 | smoother.smooth(data) 1241 | elif self.smooth_type == 'natural_cubic_spline': 1242 | smoother = SplineSmoother(copy=True, spline_type=self.smooth_type, 1243 | **self.smoothargs) 1244 | smoother.smooth(data) 1245 | 1246 | if self.method == 'additive': 1247 | detrended = smoother.data - smoother.smooth_data 1248 | else: 1249 | detrended = smoother.data / smoother.smooth_data 1250 | 1251 | period_averages = [] 1252 | for p in periods: 1253 | period_averages.append( 1254 | np.array([np.mean(detrended[:, i::p], axis=1) 1255 | for i in range(p)]).T) 1256 | 1257 | if self.method == 'additive': 1258 | period_averages = [p_a - np.mean(p_a, axis=1, keepdims=True) 1259 | for p_a in period_averages] 1260 | else: 1261 | period_averages = [p_a / np.mean(p_a, axis=1, keepdims=True) 1262 | for p_a in period_averages] 1263 | 1264 | nobs = smoother.data.shape[1] 1265 | seasonal = [np.tile(p_a, (1, nobs // periods[i] + 1))[:, :nobs] 1266 | for i, p_a in enumerate(period_averages)] 1267 | 1268 | data = smoother.data 1269 | smooth = smoother.smooth_data 1270 | for season in seasonal: 1271 | if self.method == 'additive': 1272 | smooth += season 1273 | else: 1274 | smooth *= season 1275 | 1276 | self._store_results(smooth_data=smooth, data=data) 1277 | 1278 | return self 1279 | 1280 | 1281 | class KalmanSmoother(_BaseSmoother): 1282 | """KalmanSmoother smoothes the timeseries using the Kalman 
smoothing 1283 | technique. The Kalman smoother provided here can be represented 1284 | in the state space form. For this reason, it's necessary to provide 1285 | an adequate matrix representation of all the components. It's possible 1286 | to define a Kalman smoother that takes into account the following 1287 | structure present in our series: 'level', 'trend', 'seasonality' and 1288 | 'long seasonality'. All these features have an additive behaviour. 1289 | 1290 | The KalmanSmoother automatically vectorizes, in an efficient way, 1291 | the desired smoothing operation on all the series received. 1292 | 1293 | Parameters 1294 | ---------- 1295 | component : str 1296 | Specify the patterns and the dynamics present in our series. 1297 | The possibilities are: 'level', 'level_trend', 1298 | 'level_season', 'level_trend_season', 'level_longseason', 1299 | 'level_trend_longseason', 'level_season_longseason', 1300 | 'level_trend_season_longseason'. Each single component is 1301 | delimited by the '_' notation. 1302 | 1303 | component_noise : dict 1304 | Specify in a dictionary the noise (as a float) of each single 1305 | component provided in the 'component' argument. Noise values for 1306 | components not listed in the 'component' argument are 1307 | automatically ignored. 1308 | 1309 | observation_noise : float, default=1.0 1310 | The noise level generated by the data measurement. 1311 | 1312 | n_seasons : int, default=None 1313 | The period of the seasonal component. If a seasonal component 1314 | is not provided in the 'component' argument, it's automatically 1315 | ignored. 1316 | 1317 | n_longseasons : int, default=None 1318 | The period of the long seasonal component. If a long seasonal 1319 | component is not provided in the 'component' argument, it's 1320 | automatically ignored. 
1321 | 1322 | copy : bool, default=True 1323 | If True, the raw data received by the smoother and the smoothed 1324 | results can be accessed using 'data' and 'smooth_data' attributes. 1325 | This is useful to calculate the intervals. If set to False the 1326 | interval calculation is disabled. In order to save memory, set it to 1327 | False if you are interested only in the smoothed results. 1328 | 1329 | Attributes 1330 | ---------- 1331 | smooth_data : array of shape (series, timesteps) 1332 | Smoothed data derived from the smoothing operation. It is accessible 1333 | after computing smoothing, otherwise None is returned. 1334 | 1335 | data : array of shape (series, timesteps) 1336 | Raw data received by the smoother. It is accessible with 'copy'=True 1337 | and after computing smoothing, otherwise None is returned. 1338 | 1339 | Examples 1340 | -------- 1341 | >>> import numpy as np 1342 | >>> from tsmoothie.utils_func import sim_randomwalk 1343 | >>> from tsmoothie.smoother import * 1344 | >>> np.random.seed(33) 1345 | >>> data = sim_randomwalk(n_series=10, timesteps=200, 1346 | ... process_noise=10, measure_noise=30) 1347 | >>> smoother = KalmanSmoother(component='level_trend', 1348 | ... component_noise={'level':0.1, 'trend':0.1}) 1349 | >>> smoother.smooth(data) 1350 | >>> low, up = smoother.get_intervals('kalman_interval') 1351 | """ 1352 | 1353 | def __init__(self, component, component_noise, observation_noise=1., 1354 | n_seasons=None, n_longseasons=None, copy=True): 1355 | self.component = component 1356 | self.component_noise = component_noise 1357 | self.observation_noise = observation_noise 1358 | self.n_seasons = n_seasons 1359 | self.n_longseasons = n_longseasons 1360 | self.copy = copy 1361 | 1362 | def smooth(self, data): 1363 | """Smooth timeseries. 1364 | 1365 | Parameters 1366 | ---------- 1367 | data : array-like of shape (series, timesteps) or also (timesteps,) 1368 | for single timeseries 1369 | Timeseries to smooth. 
The data are assumed to be in increasing 1370 | time order in each timeseries. 1371 | 1372 | Returns 1373 | ------- 1374 | self : returns an instance of self 1375 | """ 1376 | 1377 | components = ['level', 'level_trend', 1378 | 'level_season', 'level_trend_season', 1379 | 'level_longseason', 'level_trend_longseason', 1380 | 'level_season_longseason', 'level_trend_season_longseason'] 1381 | 1382 | if self.component not in components: 1383 | raise ValueError( 1384 | "'{}' is unsupported. Pass one of {}".format( 1385 | self.component, components)) 1386 | 1387 | _noise = _check_noise_dict(self.component_noise, self.component) 1388 | data = _check_data_nan(data) 1389 | 1390 | if self.component == 'level': 1391 | 1392 | A = [[1]] # level 1393 | Q = [[_noise['level']]] 1394 | H = [[1]] 1395 | 1396 | elif self.component == 'level_trend': 1397 | 1398 | A = [[1, 1], # level 1399 | [0, 1]] # trend 1400 | Q = np.diag([_noise['level'], _noise['trend']]) 1401 | H = [[1, 0]] 1402 | 1403 | elif self.component == 'level_season': 1404 | 1405 | if self.n_seasons is None: 1406 | raise ValueError( 1407 | "you should specify n_seasons when using a seasonal component") 1408 | 1409 | A = np.zeros((self.n_seasons, self.n_seasons)) 1410 | A[0, 0] = 1 # level 1411 | A[1, 1:] = [-1.0] * (self.n_seasons - 1) # season 1412 | A[2:, 1:-1] = np.eye(self.n_seasons - 2) # season 1413 | Q = np.diag([_noise['level'], 1414 | _noise['season']] + [0] * (self.n_seasons - 2)) 1415 | H = [[1, 1] + [0] * (self.n_seasons - 2)] 1416 | 1417 | elif self.component == 'level_trend_season': 1418 | 1419 | if self.n_seasons is None: 1420 | raise ValueError( 1421 | "you should specify n_seasons when using a seasonal component") 1422 | 1423 | A = np.zeros((self.n_seasons + 1, self.n_seasons + 1)) 1424 | A[:2, :2] = [[1, 1], # level 1425 | [0, 1]] # trend 1426 | A[2, 2:] = [-1.0] * (self.n_seasons - 1) # season 1427 | A[3:, 2:-1] = np.eye(self.n_seasons - 2) # season 1428 | Q = np.diag([_noise['level'], _noise['trend'], 
1429 | _noise['season']] + [0] * (self.n_seasons - 2)) 1430 | H = [[1, 0, 1] + [0] * (self.n_seasons - 2)] 1431 | 1432 | elif self.component == 'level_longseason': 1433 | 1434 | if self.n_longseasons is None: 1435 | raise ValueError( 1436 | "you should specify n_longseasons when using a " 1437 | "long seasonal component") 1438 | 1439 | period_cycle_sin = np.sin(2 * np.pi / self.n_longseasons) 1440 | period_cycle_cos = np.cos(2 * np.pi / self.n_longseasons) 1441 | 1442 | A = [[1, 0, 0], # level 1443 | [0, period_cycle_cos, period_cycle_sin], # long season 1444 | [0, -period_cycle_sin, period_cycle_cos]] # long season 1445 | Q = np.diag([_noise['level'], 1446 | _noise['longseason'], _noise['longseason']]) 1447 | H = [[1, 1, 0]] 1448 | 1449 | elif self.component == 'level_trend_longseason': 1450 | 1451 | if self.n_longseasons is None: 1452 | raise ValueError( 1453 | "you should specify n_longseasons when using a " 1454 | "long seasonal component") 1455 | 1456 | period_cycle_sin = np.sin(2 * np.pi / self.n_longseasons) 1457 | period_cycle_cos = np.cos(2 * np.pi / self.n_longseasons) 1458 | 1459 | A = [[1, 1, 0, 0], # level 1460 | [0, 1, 0, 0], # trend 1461 | [0, 0, period_cycle_cos, period_cycle_sin], # long season 1462 | [0, 0, -period_cycle_sin, period_cycle_cos]] # long season 1463 | Q = np.diag([_noise['level'], _noise['trend'], 1464 | _noise['longseason'], _noise['longseason']]) 1465 | H = [[1, 0, 1, 0]] 1466 | 1467 | elif self.component == 'level_season_longseason': 1468 | 1469 | if self.n_seasons is None: 1470 | raise ValueError( 1471 | "you should specify n_seasons when using a seasonal component") 1472 | 1473 | if self.n_longseasons is None: 1474 | raise ValueError( 1475 | "you should specify n_longseasons when using a " 1476 | "long seasonal component") 1477 | 1478 | period_cycle_sin = np.sin(2 * np.pi / self.n_longseasons) 1479 | period_cycle_cos = np.cos(2 * np.pi / self.n_longseasons) 1480 | 1481 | A = np.zeros((self.n_seasons + 2, self.n_seasons + 2)) 
1482 | A[0, 0] = 1 # level 1483 | A[1:3, 1:3] = [[period_cycle_cos, period_cycle_sin], # long season 1484 | [-period_cycle_sin, period_cycle_cos]] # long season 1485 | A[3, 3:] = [-1.0] * (self.n_seasons - 1) # season 1486 | A[4:, 3:-1] = np.eye(self.n_seasons - 2) # season 1487 | Q = np.diag([_noise['level'], 1488 | _noise['longseason'], _noise['longseason'], 1489 | _noise['season']] + [0] * (self.n_seasons - 2)) 1490 | H = [[1, 1, 0, 1] + [0] * (self.n_seasons - 2)] 1491 | 1492 | elif self.component == 'level_trend_season_longseason': 1493 | 1494 | if self.n_seasons is None: 1495 | raise ValueError( 1496 | "you should specify n_seasons when using a seasonal component") 1497 | 1498 | if self.n_longseasons is None: 1499 | raise ValueError( 1500 | "you should specify n_longseasons when using a " 1501 | "long seasonal component") 1502 | 1503 | period_cycle_sin = np.sin(2 * np.pi / self.n_longseasons) 1504 | period_cycle_cos = np.cos(2 * np.pi / self.n_longseasons) 1505 | 1506 | A = np.zeros((self.n_seasons + 2 + 1, self.n_seasons + 2 + 1)) 1507 | A[:2, :2] = [[1, 1], # level 1508 | [0, 1]] # trend 1509 | A[2:4, 2:4] = [[period_cycle_cos, period_cycle_sin], # long season 1510 | [-period_cycle_sin, period_cycle_cos]] # long season 1511 | A[4, 4:] = [-1.0] * (self.n_seasons - 1) # season 1512 | A[5:, 4:-1] = np.eye(self.n_seasons - 2) # season 1513 | Q = np.diag([_noise['level'], _noise['trend'], 1514 | _noise['longseason'], _noise['longseason'], 1515 | _noise['season']] + [0] * (self.n_seasons - 2)) 1516 | H = [[1, 0, 1, 0, 1] + [0] * (self.n_seasons - 2)] 1517 | 1518 | kf = simdkalman.KalmanFilter( 1519 | state_transition=A, 1520 | process_noise=Q, 1521 | observation_model=H, 1522 | observation_noise=self.observation_noise) 1523 | 1524 | smoothed = kf.smooth(data) 1525 | smoothed_obs = smoothed.observations.mean 1526 | cov = np.sqrt(smoothed.observations.cov) 1527 | 1528 | smoothed_obs = _check_output(smoothed_obs, transpose=False) 1529 | cov = _check_output(cov, 
transpose=False) 1530 | data = _check_output(data, transpose=False) 1531 | 1532 | self._store_results(smooth_data=smoothed_obs, cov=cov, data=data) 1533 | 1534 | return self 1535 | 1536 | 1537 | class WindowWrapper(_BaseSmoother): 1538 | """WindowWrapper smooths timeseries by partitioning them into equal 1539 | sliding segments and treating them as new standalone timeseries. 1540 | The WindowWrapper handles a single timeseries. After the sliding windows 1541 | are generated, the WindowWrapper smooths them using the smoother it 1542 | receives as input parameter. In this way, the smoothing can be carried 1543 | out like a multiple smoothing task. 1544 | 1545 | The WindowWrapper automatically vectorizes, in an efficient way, 1546 | the sliding window creation and the desired smoothing operation. 1547 | 1548 | Parameters 1549 | ---------- 1550 | Smoother : class from tsmoothie.smoother 1551 | Any smoother available in tsmoothie.smoother. 1552 | It computes the smoothing on the series received. 1553 | 1554 | window_shape : int 1555 | Greater than 1. The shape of the sliding windows used to divide 1556 | the series to smooth. 1557 | 1558 | step : int, default=1 1559 | The step used to generate the sliding windows. 1560 | 1561 | Attributes 1562 | ---------- 1563 | Smoother : class from tsmoothie.smoother 1564 | The smoother from tsmoothie.smoother that was passed to 1565 | WindowWrapper. 1566 | It has the same properties and attributes as every Smoother. 1567 | 1568 | Examples 1569 | -------- 1570 | >>> import numpy as np 1571 | >>> from tsmoothie.utils_func import sim_randomwalk 1572 | >>> from tsmoothie.smoother import * 1573 | >>> np.random.seed(33) 1574 | >>> data = sim_randomwalk(n_series=1, timesteps=200, 1575 | ... process_noise=10, measure_noise=30) 1576 | >>> smoother = WindowWrapper( 1577 | ... LowessSmoother(smooth_fraction=0.3, iterations=1), 1578 | ... 
window_shape=30) 1579 | >>> smoother.smooth(data) 1580 | >>> low, up = smoother.get_intervals('prediction_interval') 1581 | """ 1582 | 1583 | def __init__(self, Smoother, window_shape, step=1): 1584 | self.Smoother = Smoother 1585 | self.window_shape = window_shape 1586 | self.step = step 1587 | 1588 | def smooth(self, data): 1589 | """Smooth timeseries. 1590 | 1591 | Parameters 1592 | ---------- 1593 | data : array-like of shape (1, timesteps) or also (timesteps,) 1594 | Single timeseries to smooth. The data are assumed to be in 1595 | increasing time order. 1596 | 1597 | Returns 1598 | ------- 1599 | self : returns an instance of self 1600 | """ 1601 | 1602 | if 'tsmoothie.smoother' not in str(self.Smoother.__repr__): 1603 | raise ValueError("Use a Smoother from tsmoothie.smoother") 1604 | 1605 | data = np.asarray(data) 1606 | if np.prod(data.shape) == np.max(data.shape): 1607 | data = data.ravel()[:, None] 1608 | else: 1609 | raise ValueError( 1610 | "The format of data received is not appropriate. " 1611 | "WindowWrapper accepts only univariate timeseries") 1612 | 1613 | if data.shape[0] < self.window_shape: 1614 | raise ValueError("window_shape must be <= timesteps") 1615 | 1616 | data = create_windows(data, window_shape=self.window_shape, step=self.step) 1617 | data = np.squeeze(data, -1) 1618 | 1619 | self.Smoother.smooth(data) 1620 | 1621 | return self -------------------------------------------------------------------------------- /tsmoothie/utils_class.py: -------------------------------------------------------------------------------- 1 | ''' 2 | A collection of utility classes. 3 | ''' 4 | 5 | import numpy as np 6 | from scipy import sparse 7 | 8 | 9 | class LinearRegression(object): 10 | """Ordinary least squares Linear Regression. 11 | 12 | Linear model that estimates coefficients to minimize the residual 13 | sum of squares between the observed targets and the predictions. 14 | It automatically handles single and multiple targets. 
15 | 16 | It's a modified version of the Linear Regression implemented 17 | in scikit-learn. 18 | 19 | Parameters 20 | ---------- 21 | fit_intercept : bool, default=True 22 | Whether to calculate the intercept for this model. If set 23 | to False, no intercept will be used in calculations. 24 | 25 | Attributes 26 | ---------- 27 | coef_ : array of shape (n_coef,) for univariate data or 28 | (sample, n_coef) for multivariate data 29 | Array containing the estimated coefficients of the linear model. 30 | Available after fitting. 31 | 32 | residues_ : array of shape (sample,) 33 | Sums of residuals; squared Euclidean 2-norm for each sample. 34 | Available after fitting. 35 | 36 | rank_ : int 37 | Rank of exogenous variable matrix. 38 | Available after fitting. 39 | 40 | singular_ : array of shape (n_coef,) 41 | Singular values of the exogenous variable matrix. 42 | Available after fitting. 43 | 44 | intercept_ : array of shape (sample,) 45 | Array containing the estimated intercepts of the linear model. 46 | If fit_intercept = False it returns 0. 47 | Available after fitting. 48 | """ 49 | 50 | def __init__(self, fit_intercept=True): 51 | self.fit_intercept = fit_intercept 52 | 53 | def __repr__(self): 54 | return "" 55 | 56 | def __str__(self): 57 | return "" 58 | 59 | def _preprocess_data(self, X, y, sample_weight): 60 | """Center and scale data. Centers data to have mean zero along 61 | axis 0. If fit_intercept=False no centering is done. If 62 | sample_weight is not None, then the weighted mean of X and y is 63 | zero, and not the mean itself. This is here because nearly all 64 | linear models will want their data to be centered. This function 65 | also systematically makes y consistent with X.dtype. 
66 | 67 | Returns 68 | ------- 69 | X : array 70 | 71 | y : array 72 | 73 | X_offset : array 74 | 75 | y_offset : array 76 | 77 | X_scale : array 78 | """ 79 | 80 | X = X.copy(order='K') 81 | y = y.copy(order='K') 82 | y = np.asarray(y, dtype=X.dtype) 83 | 84 | if self.fit_intercept: 85 | 86 | X_offset = np.average(X, axis=0, weights=sample_weight) 87 | X -= X_offset 88 | X_scale = np.ones(X.shape[1], dtype=X.dtype) 89 | 90 | y_offset = np.average(y, axis=0, weights=sample_weight) 91 | y -= y_offset 92 | 93 | else: 94 | 95 | X_offset = np.zeros(X.shape[1], dtype=X.dtype) 96 | X_scale = np.ones(X.shape[1], dtype=X.dtype) 97 | 98 | if y.ndim == 1: 99 | y_offset = X.dtype.type(0) 100 | else: 101 | y_offset = np.zeros(y.shape[1], dtype=X.dtype) 102 | 103 | return X, y, X_offset, y_offset, X_scale 104 | 105 | def _rescale_data(self, X, y, sample_weight): 106 | """Rescale data sample-wise by square root of sample_weight. 107 | 108 | Returns 109 | ------- 110 | X_rescaled : array 111 | 112 | y_rescaled : array 113 | """ 114 | 115 | n_samples = X.shape[0] 116 | sample_weight = np.sqrt(sample_weight) 117 | sw_matrix = sparse.dia_matrix((sample_weight, 0), 118 | shape=(n_samples, n_samples)) 119 | X = sw_matrix @ X 120 | y = sw_matrix @ y 121 | 122 | return X, y 123 | 124 | def fit(self, X, y, sample_weight): 125 | """Fit linear model. 126 | 127 | Parameters 128 | ---------- 129 | X : array of shape (n_samples, n_features) 130 | Training data. 131 | 132 | y : array of shape (n_samples,) or (n_samples, n_targets) 133 | Target values. 134 | 135 | sample_weight : array of shape (n_samples,), default=None 136 | Individual weights for each sample. 
137 | 138 | Returns 139 | ------- 140 | self : returns an instance of self 141 | """ 142 | 143 | X, y, X_offset, y_offset, X_scale = self._preprocess_data( 144 | X, y, sample_weight=sample_weight) 145 | 146 | if np.unique(sample_weight).shape[0] > 1: 147 | X, y = self._rescale_data(X, y, sample_weight) 148 | 149 | self.coef_, self.residues_, self.rank_, self.singular_ = \ 150 | np.linalg.lstsq(X, y, rcond=None) 151 | 152 | self.coef_ = self.coef_.T 153 | 154 | if self.fit_intercept: 155 | self.coef_ = self.coef_ / X_scale 156 | self.intercept_ = y_offset - np.dot(X_offset, self.coef_.T) 157 | else: 158 | self.intercept_ = 0. 159 | 160 | return self 161 | 162 | def predict(self, X): 163 | """Compute the predictions with fitted coefficients. 164 | 165 | Parameters 166 | ---------- 167 | X : array of shape (n_samples, n_features) 168 | Exogenous data. 169 | 170 | Returns 171 | ------- 172 | pred : array 173 | """ 174 | 175 | pred = (X @ self.coef_.T) + self.intercept_ 176 | 177 | return pred -------------------------------------------------------------------------------- /tsmoothie/utils_func.py: -------------------------------------------------------------------------------- 1 | ''' 2 | A collection of utility functions. 3 | ''' 4 | 5 | import numpy as np 6 | import scipy.stats as stats 7 | from typing import Iterable 8 | 9 | 10 | def sim_seasonal_data(n_series, timesteps, measure_noise, 11 | freq=None, level=None, amp=None): 12 | """Generate sinusoidal data with periodic patterns. 13 | 14 | Parameters 15 | ---------- 16 | n_series : int 17 | Number of timeseries to generate. 18 | 19 | timesteps : int 20 | How many timesteps every generated series must have. 21 | 22 | measure_noise : int 23 | The noise present in the signals. 24 | 25 | freq : int or 1D array-like, default=None 26 | The frequencies of the sinusoidal timeseries to generate. 27 | If a single integer is passed, all the series generated have 28 | the same frequencies. 
If a 1D array-like is passed, the 29 | frequencies of timeseries are randomly sampled from the iterable 30 | passed. If None, the frequencies are randomly generated. 31 | 32 | level : int or 1D array-like, default=None 33 | The levels of the sinusoidal timeseries to generate. 34 | If a single integer is passed, all the series generated have 35 | the same levels. If a 1D array-like is passed, the levels 36 | of timeseries are randomly sampled from the iterable passed. 37 | If None, the levels are randomly generated. 38 | 39 | amp : int or 1D array-like, default=None 40 | The amplitudes of the sinusoidal timeseries to generate. 41 | If a single integer is passed, all the series generated have 42 | the same amplitudes. If a 1D array-like is passed, the amplitudes 43 | of timeseries are randomly sampled from the iterable passed. 44 | If None, the amplitudes are randomly generated. 45 | 46 | Returns 47 | ------- 48 | data : array of shape (series, timesteps) 49 | The generated sinusoidal timeseries. 50 | """ 51 | 52 | if freq is None: 53 | freq = np.random.randint(3, int(np.sqrt(timesteps)), (n_series, 1)) 54 | elif isinstance(freq, Iterable): 55 | freq = np.random.choice(freq, size=n_series)[:, None] 56 | else: 57 | freq = np.asarray([[freq]] * n_series) 58 | 59 | if level is None: 60 | level = np.random.uniform(-100, 100, (n_series, 1)) 61 | elif isinstance(level, Iterable): 62 | level = np.random.choice(level, size=n_series)[:, None] 63 | else: 64 | level = np.asarray([[level]] * n_series) 65 | 66 | if amp is None: 67 | amp = np.random.uniform(3, 100, (n_series, 1)) 68 | elif isinstance(amp, Iterable): 69 | amp = np.random.choice(amp, size=n_series)[:, None] 70 | else: 71 | amp = np.asarray([[amp]] * n_series) 72 | 73 | t = np.repeat([np.arange(timesteps)], n_series, axis=0) 74 | e = np.random.normal(0, measure_noise, (n_series, timesteps)) 75 | data = level + amp * np.sin(t * (2 * np.pi / freq)) + e 76 | 77 | return data 78 | 79 | 80 | def sim_randomwalk(n_series, timesteps, 
process_noise, measure_noise, 81 | level=None): 82 | """Generate randomwalks. 83 | 84 | Parameters 85 | ---------- 86 | n_series : int 87 | Number of randomwalks to generate. 88 | 89 | timesteps : int 90 | How many timesteps each generated randomwalk must have. 91 | 92 | process_noise : int 93 | The standard deviation of the increments used to build the randomwalks. 94 | 95 | measure_noise : int 96 | The standard deviation of the measurement noise added to the signals. 97 | 98 | level : int or 1D array-like, default=None 99 | The levels of the randomwalks to generate. 100 | If a single integer is passed, all the randomwalks have 101 | the same levels. If a 1D array-like is passed, the levels 102 | of the randomwalks are randomly sampled from the iterable 103 | passed. If None, the levels are set to 0 for all the series. 104 | 105 | Returns 106 | ------- 107 | data : array of shape (series, timesteps) 108 | The generated randomwalks. 109 | """ 110 | 111 | if level is None: 112 | level = 0 113 | if isinstance(level, Iterable): 114 | level = np.random.choice(level, size=n_series)[:, None] 115 | else: 116 | level = np.asarray([[level]] * n_series) 117 | 118 | data = np.random.normal(0, process_noise, size=(n_series, timesteps)) 119 | e = np.random.normal(0, measure_noise, size=(n_series, timesteps)) 120 | data = level + np.cumsum(data, axis=1) + e 121 | 122 | return data 123 | 124 | 125 | def create_windows(data, window_shape, step=1, 126 | start_id=None, end_id=None): 127 | """Create sliding windows of the same length from the series 128 | received as input. 129 | 130 | create_windows vectorizes, in an efficient way, the windows creation 131 | on all the series received. 132 | 133 | Parameters 134 | ---------- 135 | data : 2D array of shape (timesteps, series) 136 | Timeseries to slice into equal-size windows. 137 | 138 | window_shape : int 139 | Greater than 1. The shape of the sliding windows used to divide 140 | the input series. 141 | 142 | step : int, default=1 143 | The step used to generate the sliding windows. 
The overlapping 144 | portion of two adjacent windows can be defined as 145 | (window_shape - step). 146 | 147 | start_id : int, default=None 148 | The starting position from which slicing begins. The same for 149 | all the series. If None, the windows are generated from the index 0. 150 | 151 | end_id : int, default=None 152 | The ending position of the slicing operation. The same for all the 153 | series. If None, the windows end on the last position available. 154 | 155 | Returns 156 | ------- 157 | window_data : 3D array of shape (window_slices, window_shape, series) 158 | The input data sliced into windows of the same lengths. 159 | """ 160 | 161 | data = np.asarray(data) 162 | 163 | if data.ndim != 2: 164 | raise ValueError( 165 | "Pass a 2D array-like in the format (timesteps, series)") 166 | 167 | if window_shape < 1: 168 | raise ValueError("window_shape must be >= 1") 169 | 170 | if start_id is None: 171 | start_id = 0 172 | 173 | if end_id is None: 174 | end_id = data.shape[0] 175 | 176 | data = data[int(start_id):int(end_id), :] 177 | 178 | window_shape = (int(window_shape), data.shape[-1]) 179 | step = (int(step),) * data.ndim 180 | 181 | slices = tuple(slice(None, None, st) for st in step) 182 | indexing_strides = data[slices].strides 183 | 184 | win_indices_shape = ((np.array(data.shape) - window_shape) // step) + 1 185 | 186 | new_shape = tuple(list(win_indices_shape) + list(window_shape)) 187 | strides = tuple(list(indexing_strides) + list(data.strides)) 188 | 189 | window_data = np.lib.stride_tricks.as_strided( 190 | data, shape=new_shape, strides=strides) 191 | 192 | return np.squeeze(window_data, 1) 193 | 194 | 195 | def sigma_interval(true, prediction, n_sigma): 196 | """Compute smoothing intervals as n_sigma times the residuals of the 197 | smoothing process. 198 | 199 | Returns 200 | ------- 201 | low : array 202 | Lower bands. 203 | 204 | up : array 205 | Upper bands. 
206 | """ 207 | 208 | std = np.nanstd(true - prediction, axis=1, keepdims=True) 209 | 210 | low = prediction - n_sigma * std 211 | up = prediction + n_sigma * std 212 | 213 | return low, up 214 | 215 | 216 | def kalman_interval(true, prediction, cov, confidence=0.05): 217 | """Compute smoothing intervals from a Kalman smoothing process. 218 | 219 | Returns 220 | ------- 221 | low : array 222 | Lower bands. 223 | 224 | up : array 225 | Upper bands. 226 | """ 227 | 228 | g = stats.norm.ppf(1 - confidence / 2) 229 | 230 | resid = true - prediction 231 | std_err = np.sqrt(np.nanmean(np.square(resid), axis=1, keepdims=True)) 232 | 233 | low = prediction - g * (std_err * cov) 234 | up = prediction + g * (std_err * cov) 235 | 236 | return low, up 237 | 238 | 239 | def confidence_interval(true, prediction, exog, confidence, 240 | add_intercept=True): 241 | """Compute confidence intervals for regression tasks. 242 | 243 | Returns 244 | ------- 245 | low : array 246 | Lower bands. 247 | 248 | up : array 249 | Upper bands. 250 | """ 251 | 252 | if exog.ndim == 1: 253 | exog = exog[:, None] 254 | 255 | if add_intercept: 256 | exog = np.concatenate([np.ones((len(exog), 1)), exog], axis=1) 257 | 258 | N = exog.shape[0] 259 | d_free = exog.shape[1] 260 | t = stats.t.ppf(1 - confidence / 2, N - d_free) 261 | 262 | resid = true - prediction 263 | mse = (np.square(resid).sum(axis=1, keepdims=True) / (N - d_free)).T 264 | 265 | hat_matrix_diag = (exog * np.linalg.pinv(exog).T).sum(axis=1, keepdims=True) 266 | predict_mean_se = np.sqrt(hat_matrix_diag * mse).T 267 | 268 | low = prediction - t * predict_mean_se 269 | up = prediction + t * predict_mean_se 270 | 271 | return low, up 272 | 273 | 274 | def prediction_interval(true, prediction, exog, confidence, 275 | add_intercept=True): 276 | """Compute prediction intervals for regression tasks. 277 | 278 | Returns 279 | ------- 280 | low : array 281 | Lower bands. 282 | 283 | up : array 284 | Upper bands. 
285 | """ 286 | 287 | if exog.ndim == 1: 288 | exog = exog[:, None] 289 | 290 | if add_intercept: 291 | exog = np.concatenate([np.ones((len(exog), 1)), exog], axis=1) 292 | 293 | N = exog.shape[0] 294 | d_free = exog.shape[1] 295 | t = stats.t.ppf(1 - confidence / 2, N - d_free) 296 | 297 | resid = true - prediction 298 | mse = (np.square(resid).sum(axis=1, keepdims=True) / (N - d_free)).T 299 | 300 | covb = np.linalg.pinv(np.dot(exog.T, exog))[..., None] * mse 301 | predvar = mse + (exog[..., None] * 302 | np.dot(covb.transpose(2, 0, 1), exog.T).T).sum(1) 303 | predstd = np.sqrt(predvar).T 304 | 305 | low = prediction - t * predstd 306 | up = prediction + t * predstd 307 | 308 | return low, up 309 | 310 | 311 | def _check_noise_dict(noise_dict, component): 312 | """Ensure noise compatibility for the noises of the components 313 | provided when building a state space model. 314 | 315 | Returns 316 | ------- 317 | noise_dict : dict 318 | Checked input. 319 | """ 320 | 321 | sub_component = component.split('_') 322 | 323 | if isinstance(noise_dict, dict): 324 | for c in sub_component: 325 | 326 | if c not in noise_dict: 327 | raise ValueError( 328 | "You need to provide noise for '{}' component".format(c)) 329 | 330 | if noise_dict[c] < 0: 331 | raise ValueError( 332 | "noise for '{}' must be >= 0".format(c)) 333 | 334 | return noise_dict 335 | 336 | else: 337 | raise ValueError( 338 | "noise should be a dict. Received {}".format(type(noise_dict))) 339 | 340 | 341 | def _check_knots(knots, min_n_knots): 342 | """Ensure knots compatibility for the knots provided when building 343 | bases for linear regression. 344 | 345 | Returns 346 | ------- 347 | knots : array 348 | Checked input. 
349 | """ 350 | 351 | knots = np.asarray(knots, dtype=np.float64) 352 | 353 | if np.prod(knots.shape) == np.max(knots.shape): 354 | knots = knots.ravel() 355 | 356 | if knots.ndim != 1: 357 | raise ValueError("knots must be a list or 1D array") 358 | 359 | knots = np.unique(knots) 360 | min_k, max_k = knots[0], knots[-1] 361 | 362 | if min_k < 0 or max_k > 1: 363 | raise ValueError("Every knot must be in the range [0,1]") 364 | 365 | if min_k > 0: 366 | knots = np.append(0., knots) 367 | 368 | if max_k < 1: 369 | knots = np.append(knots, 1.) 370 | 371 | if knots.shape[0] < min_n_knots + 2: 372 | raise ValueError( 373 | "Provide at least {} knots in the range (0,1)".format(min_n_knots)) 374 | 375 | return knots 376 | 377 | 378 | def _check_weights(weights, basis_len): 379 | """Ensure weights compatibility for the weights provided in 380 | linear regression applications. 381 | 382 | Returns 383 | ------- 384 | weights : array 385 | Checked input. 386 | """ 387 | 388 | if weights is None: 389 | return np.ones(basis_len, dtype=np.float64) 390 | 391 | weights = np.asarray(weights, dtype=np.float64) 392 | 393 | if np.prod(weights.shape) == np.max(weights.shape): 394 | weights = weights.ravel() 395 | 396 | if weights.ndim != 1: 397 | raise ValueError("Sample weights must be a list or 1D array") 398 | 399 | if weights.shape[0] != basis_len: 400 | raise ValueError( 401 | "Sample weights length must be equal to timesteps " 402 | "dimension of the data received") 403 | 404 | if np.any(weights < 0): 405 | raise ValueError("weights must be >= 0") 406 | 407 | if np.logical_or(np.isnan(weights), np.isinf(weights)).any(): 408 | raise ValueError("weights must not contain NaNs or Inf") 409 | 410 | return weights 411 | 412 | 413 | def _check_data(data): 414 | """Ensure data compatibility for the series received by the smoother. 415 | 416 | Returns 417 | ------- 418 | data : array 419 | Checked input. 
420 | """ 421 | 422 | data = np.asarray(data) 423 | 424 | if np.prod(data.shape) == np.max(data.shape): 425 | data = data.ravel() 426 | 427 | if data.ndim > 2: 428 | raise ValueError( 429 | "The format of data received is not appropriate. " 430 | "Pass an object with data in this format (series, timesteps)") 431 | 432 | if data.ndim == 0: 433 | raise ValueError( 434 | "Pass an object with data in this format (series, timesteps)") 435 | 436 | if data.dtype not in [np.float16, np.float32, np.float64, 437 | np.int8, np.int16, np.int32, np.int64]: 438 | raise ValueError("data contains not numeric types") 439 | 440 | if np.logical_or(np.isnan(data), np.isinf(data)).any(): 441 | raise ValueError("data must not contain NaNs or Inf") 442 | 443 | return data.T 444 | 445 | 446 | def _check_data_nan(data): 447 | """Ensure data compatibility for the series received by the smoother. 448 | (Without checking for inf and nans). 449 | 450 | Returns 451 | ------- 452 | data : array 453 | Checked input. 454 | """ 455 | 456 | data = np.asarray(data) 457 | 458 | if np.prod(data.shape) == np.max(data.shape): 459 | data = data.ravel() 460 | 461 | if data.ndim > 2: 462 | raise ValueError( 463 | "The format of data received is not appropriate. " 464 | "Pass an object with data in this format (series, timesteps)") 465 | 466 | if data.ndim == 0: 467 | raise ValueError( 468 | "Pass an object with data in this format (series, timesteps)") 469 | 470 | if data.dtype not in [np.float16, np.float32, np.float64, 471 | np.int8, np.int16, np.int32, np.int64]: 472 | raise ValueError("data contains not numeric types") 473 | 474 | return data 475 | 476 | 477 | def _check_output(output, transpose=True): 478 | """Ensure output compatibility for the series returned by the smoother. 479 | 480 | Returns 481 | ------- 482 | output : array 483 | Checked input. 
484 | """ 485 | 486 | if transpose: 487 | output = output.T 488 | 489 | if output.ndim == 1: 490 | output = output[None, :] 491 | 492 | return output 493 | 494 | 495 | def _id_nb_bootstrap(n_obs, block_length): 496 | """Create bootstrapped indexes with the non-overlapping block bootstrap 497 | ('nbb') strategy given the number of observations in a timeseries and 498 | the length of the blocks. 499 | 500 | Returns 501 | ------- 502 | _id : array 503 | Bootstrapped indexes. 504 | """ 505 | 506 | n_blocks = int(np.ceil(n_obs / block_length)) 507 | nexts = np.repeat([np.arange(0, block_length)], n_blocks, axis=0) 508 | 509 | blocks = np.random.permutation( 510 | np.arange(0, n_obs, block_length) 511 | ).reshape(-1, 1) 512 | 513 | _id = (blocks + nexts).ravel() 514 | _id = _id[_id < n_obs] 515 | 516 | return _id 517 | 518 | 519 | def _id_mb_bootstrap(n_obs, block_length): 520 | """Create bootstrapped indexes with the moving block bootstrap 521 | ('mbb') strategy given the number of observations in a timeseries 522 | and the length of the blocks. 523 | 524 | Returns 525 | ------- 526 | _id : array 527 | Bootstrapped indexes. 528 | """ 529 | 530 | n_blocks = int(np.ceil(n_obs / block_length)) 531 | nexts = np.repeat([np.arange(0, block_length)], n_blocks, axis=0) 532 | 533 | last_block = n_obs - block_length 534 | blocks = np.random.randint(0, last_block, (n_blocks, 1)) 535 | _id = (blocks + nexts).ravel()[:n_obs] 536 | 537 | return _id 538 | 539 | 540 | def _id_cb_bootstrap(n_obs, block_length): 541 | """Create bootstrapped indexes with the circular block bootstrap 542 | ('cbb') strategy given the number of observations in a timeseries 543 | and the length of the blocks. 544 | 545 | Returns 546 | ------- 547 | _id : array 548 | Bootstrapped indexes. 
549 | """ 550 | 551 | n_blocks = int(np.ceil(n_obs / block_length)) 552 | nexts = np.repeat([np.arange(0, block_length)], n_blocks, axis=0) 553 | 554 | last_block = n_obs 555 | blocks = np.random.randint(0, last_block, (n_blocks, 1)) 556 | _id = np.mod((blocks + nexts).ravel(), n_obs)[:n_obs] 557 | 558 | return _id 559 | 560 | 561 | def _id_s_bootstrap(n_obs, block_length): 562 | """Create bootstrapped indexes with the stationary bootstrap 563 | ('sb') strategy given the number of observations in a timeseries 564 | and the length of the blocks. 565 | 566 | Returns 567 | ------- 568 | _id : array 569 | Bootstrapped indexes. 570 | """ 571 | 572 | random_block_length = np.random.poisson(block_length, n_obs) 573 | random_block_length[random_block_length < 3] = 3 574 | random_block_length[random_block_length >= n_obs] = n_obs 575 | random_block_length = random_block_length[random_block_length.cumsum() <= n_obs] 576 | residual_block = n_obs - random_block_length.sum() 577 | if residual_block > 0: 578 | random_block_length = np.append(random_block_length, residual_block) 579 | 580 | n_blocks = random_block_length.shape[0] 581 | nexts = np.zeros((n_blocks, random_block_length.max() + 1)) 582 | nexts[np.arange(n_blocks), random_block_length] = 1 583 | nexts = np.flip(nexts, 1).cumsum(1).cumsum(1).ravel() 584 | nexts = (nexts[nexts > 1] - 2).astype(int) 585 | 586 | last_block = n_obs - random_block_length.max() 587 | blocks = np.zeros(n_obs, dtype=int) 588 | if last_block > 0: 589 | blocks = np.random.randint(0, last_block, n_blocks) 590 | blocks = np.repeat(blocks, random_block_length) 591 | _id = blocks + nexts 592 | 593 | return _id --------------------------------------------------------------------------------
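The windowing described in the create_windows docstring above can be illustrated with a short, self-contained sketch. It is not the library's implementation: `sliding_windows` is a hypothetical helper name, and it assumes NumPy >= 1.20 so that `np.lib.stride_tricks.sliding_window_view` is available in place of the manual `as_strided` call. It only aims to show the documented output layout (window_slices, window_shape, series).

```python
import numpy as np


def sliding_windows(data, window_shape, step=1):
    """Hypothetical helper (not part of tsmoothie): slice a
    (timesteps, series) array into windows of equal length."""
    data = np.asarray(data)
    if data.ndim != 2:
        raise ValueError(
            "Pass a 2D array-like in the format (timesteps, series)")
    if window_shape < 1 or step < 1:
        raise ValueError("window_shape and step must be >= 1")

    # sliding_window_view appends the window axis last, giving
    # shape (timesteps - window_shape + 1, series, window_shape).
    views = np.lib.stride_tricks.sliding_window_view(
        data, window_shape, axis=0)

    # Keep every `step`-th window, then reorder the axes to
    # (window_slices, window_shape, series).
    return views[::step].transpose(0, 2, 1)


data = np.arange(12).reshape(6, 2)  # 6 timesteps, 2 series
windows = sliding_windows(data, window_shape=3, step=2)
print(windows.shape)  # (2, 3, 2)
```

With 6 timesteps, a window of 3 and a step of 2, the windows start at timesteps 0 and 2, so adjacent windows overlap by (window_shape - step) = 1 timestep. The result is a read-only view on the input, so copy it before modifying it in place.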