├── README.md ├── data ├── M1.csv ├── M3.csv ├── M4_3000.csv.xz ├── example_input └── example_priors ├── requirements.txt └── src ├── forecast.py └── forgp ├── __init__.py └── gp.py /README.md: -------------------------------------------------------------------------------- 1 | # Time series forecasting with Gaussian Processes 2 | ## Related Publication 3 | The theoretical description of the algorithm implemented in this software, together with empirical results, can be found in: 4 | 5 | “Time series forecasting with Gaussian Processes needs priors”\ 6 | Giorgio Corani, Alessio Benavoli, Marco Zaffalon\ 7 | Accepted at ECML-PKDD 2021\ 8 | arXiv preprint: https://arxiv.org/abs/2009.08102 9 | 10 | 11 | ## forgp package 12 | The software includes a small package that builds the Gaussian process and uses it to produce predictions. The package relies heavily on GPy. 13 | A convenience script can be used to run the GP over collections of timeseries. 14 | 15 | ## **forecast.py** 16 | __forecast.py__ is an executable Python script that produces forecasts and evaluation scores with our GP over multiple timeseries. The script takes as input a CSV file containing training and test series and produces a CSV file with predictions and scores. Both file formats are described below. 17 | Each prediction includes the mean and the upper bound of the 95% confidence interval. 18 | 19 | A number of command line arguments can be used to specify custom names for the columns in the input CSV file and to filter the timeseries to be processed.
20 | The most useful command line arguments are: 21 | 22 | * --frequency: include only timeseries with the specified frequency 23 | * --normalize: normalize timeseries using the specified mean and standard deviation 24 | * --log: verbosity level (100 = max verbosity, 0 = min verbosity). Default: 0 25 | * --default-priors: use default values for the priors instead of no priors 26 | * --help: print a description of all command line arguments 27 | 28 | 29 | ## Input File format 30 | Our tool uses a simple tabular data format serialized as a CSV file; the only supported field separator is the comma (","). 31 | The first line contains the header, while each following line represents a timeseries. Required fields/columns include: 32 | 33 | * __st__: unique name of the timeseries 34 | * __period__: frequency of the timeseries. One of MONTHLY, QUARTERLY, YEARLY and WEEKLY 35 | * __mean__: mean for the normalization of the timeseries 36 | * __std__: standard deviation for the normalization of the timeseries 37 | * __x__: training values of the timeseries 38 | * __xx__: test values of the timeseries 39 | 40 | Point values of the timeseries (i.e. __x__ and __xx__) are provided as a semicolon (";") separated list of numeric values. 41 | 42 | ## Output file format 43 | The output file follows a format similar to the input. It stores the predicted point forecasts and 95% upper bounds as semicolon-separated lists within a comma-separated file, where each line corresponds to a timeseries from the input file.
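For illustration, a row of either file can be parsed back into numeric arrays with the standard library alone. This is a minimal sketch using a shortened row from data/example_input (the real x/xx fields are longer):

```python
import csv
import io

# shortened row from data/example_input (the real x/xx fields are longer)
csv_text = ('"st","period","mean","std","x","xx"\n'
            '"Q94","QUARTERLY",147.99,76.90,"26.2;42.6;68","247.1;176.7"\n')

row = next(csv.DictReader(io.StringIO(csv_text)))
# series cells are ";"-separated lists of floats
x = [float(v) for v in row["x"].split(";")]
xx = [float(v) for v in row["xx"].split(";")]
print(x, xx)  # [26.2, 42.6, 68.0] [247.1, 176.7]
```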
44 | The generated columns include: 45 | 46 | * __st__: the unique id of the series 47 | * __mean__: the mean of the training timeseries 48 | * __std__: the standard deviation of the training timeseries 49 | * __center__: the mean value of the prediction 50 | * __upper__: the upper bound of the 95% confidence prediction band 51 | * __time__: time required to fit and predict 52 | * __mae__: mean absolute error of the predicted values (the xx values in the input file) 53 | * __crps__: continuous ranked probability score 54 | * __ll__: loglikelihood 55 | 56 | ## Priors file 57 | Prior values for the different kernel hyperparameters can be provided via a file. This file contains the priors as a newline-separated list of numbers. These are, in order: 58 | 59 | * standard deviation of variances 60 | * standard deviation of lengthscales 61 | * mean of variances 62 | * mean of rbf lengthscale 63 | * mean of periodic kernel's lengthscale 64 | * mean of first spectral kernel's exponential component lengthscale 65 | * mean of first spectral kernel's cosine component lengthscale 66 | * mean of second spectral kernel's exponential component lengthscale 67 | * mean of second spectral kernel's cosine component lengthscale 68 | 69 | The last four entries are only needed when the corresponding spectral components are enabled (see the Q parameter). 70 | 71 | ## Dependencies and setup 72 | A requirements file is provided in the package to ease the installation of all the dependencies. On conda-based systems one may create a suitable environment with: 73 | 74 | ```sh 75 | conda create --name <env> --file requirements.txt 76 | ``` 77 | 78 | ## Example execution 79 | The package includes a number of input files, including standard M1[[1]](#1) and M3[[2]](#2) competition timeseries, a sample of the M4 competition[[3]](#3) and a short example input.
80 | To run the script on the example input one may run the following command from within the src folder: 81 | 82 | ```sh 83 | ./forecast.py --log 100 --default-priors --normalize ../data/example_input example_output 84 | ``` 85 | 86 | As hinted above, custom priors can be provided via a file (note that --priors and --default-priors are mutually exclusive): 87 | 88 | ```sh 89 | ./forecast.py --log 100 --normalize --priors ../data/example_priors ../data/example_input example_output 90 | ``` 91 | 92 | ## References 93 | [1] 94 | Makridakis, S., A. Andersen, R. Carbone, R. Fildes, M. Hibon, R. Lewandowski, J. Newton, E. Parzen, and R. Winkler (1982) The accuracy of extrapolation (time series) methods: results of a forecasting competition. *Journal of Forecasting*, **1**, 111-153. 95 | 96 | [2] 97 | Makridakis, S. and M. Hibon (2000) The M3-competition: results, conclusions and implications. *International Journal of Forecasting*, **16**, 451-476. 98 | 99 | [3] 100 | Makridakis, S., E. Spiliotis and V. Assimakopoulos (2020) The M4 Competition: 100,000 time series and 61 forecasting methods. *International Journal of Forecasting*, **36(1)**, 54-74.
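The mae, crps and ll scores in the output can be recomputed from the center and upper columns. The sketch below mirrors the logic of compute_indicators in src/forecast.py, substituting the closed-form Gaussian CRPS for the properscoring call; the helper name `scores` is our own:

```python
import numpy as np
from scipy import stats

def scores(y_test, center, upper):
    # recover the predictive std dev from the 95% upper bound
    sigma = (upper - center) / stats.norm.ppf(0.975)
    z = (y_test - center) / sigma
    # closed-form CRPS of a Gaussian predictive distribution
    crps = sigma * (z * (2 * stats.norm.cdf(z) - 1)
                    + 2 * stats.norm.pdf(z) - 1 / np.sqrt(np.pi))
    mae = np.mean(np.abs(y_test - center))
    ll = np.mean(stats.norm.logpdf(y_test, loc=center, scale=sigma))
    return mae, float(np.mean(crps)), ll
```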
101 | -------------------------------------------------------------------------------- /data/M4_3000.csv.xz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IDSIA/gpforecasting/5f4c459d8a0e7d0140ea6ca5a25d4c0d8a34948b/data/M4_3000.csv.xz -------------------------------------------------------------------------------- /data/example_input: -------------------------------------------------------------------------------- 1 | "st","period","mean","std","x","xx" 2 | "Q94","QUARTERLY",147.99,76.90,"26.2;42.6;68;93.4;193.5;99.5;187.5;163.8;221.3;192.1;135.8;226.7;273.5","247.1;176.7;204.3;186.3;107.7;77.2;86.9;135" 3 | "Q95","QUARTERLY",34.29,6.19,"23.6;30.4;29;27.8;30.6;37.5;38.2;34.3;38.3;46.5;40.3;37.8;31.5","44.9;45.8;37;40.5;55;54.1;50.2;55.5" 4 | "Q96","QUARTERLY",49.83,5.99,"36.8;50.3;47;50.1;43.9;56.7;51.2;43.1;50.1;59.1;54.3;53.3;51.9","68.9;65.2;58.9;64.5;80;76.7;81.1;81.7" -------------------------------------------------------------------------------- /data/example_priors: -------------------------------------------------------------------------------- 1 | #std 2 | 1.0 3 | 1.0 4 | #mu 5 | -1.5 6 | 1.1 7 | 0.2 8 | -0.7 9 | 0.5 10 | 1.1 11 | 1.6 -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | # This file may be used to create an environment using: 2 | # $ conda create --name --file 3 | # platform: osx-64 4 | blas=1.0=mkl 5 | ca-certificates=2020.12.5=h033912b_0 6 | certifi=2020.12.5=py38h50d1736_1 7 | cycler=0.10.0=py38_0 8 | decorator=4.4.2=pyhd3eb1b0_0 9 | freetype=2.10.4=ha233b18_0 10 | gpy=1.9.9=py38ha1b04c9_7 11 | intel-openmp=2019.4=233 12 | jpeg=9b=he5867d9_2 13 | kiwisolver=1.3.1=py38h23ab428_0 14 | lcms2=2.11=h92f6f08_0 15 | libcxx=10.0.0=1 16 | libffi=3.3=hb1e8313_2 17 | libgfortran=3.0.1=h93005f0_2 18 | libpng=1.6.37=ha441bb4_0 19 | 
libtiff=4.2.0=h87d7836_0 20 | libwebp-base=1.2.0=h9ed2024_0 21 | lz4-c=1.9.3=h23ab428_0 22 | matplotlib-base=3.3.4=py38h8b3ea08_0 23 | mkl=2019.4=233 24 | mkl-service=2.3.0=py38h9ed2024_0 25 | mkl_fft=1.3.0=py38ha059aab_0 26 | mkl_random=1.1.1=py38h959d312_0 27 | ncurses=6.2=h0a44026_1 28 | numpy=1.19.2=py38h456fd55_0 29 | numpy-base=1.19.2=py38hcfb5961_0 30 | olefile=0.46=py_0 31 | openssl=1.1.1k=h0d85af4_0 32 | pandas=1.2.3=py38hb2f4e1b_0 33 | paramz=0.9.5=py_0 34 | pillow=8.1.2=py38h5270095_0 35 | pip=21.0.1=py38hecd8cb5_0 36 | properscoring=0.1=py_0 37 | pyparsing=2.4.7=pyhd3eb1b0_0 38 | python=3.8.8=h88f2d9e_4 39 | python-dateutil=2.8.1=pyhd3eb1b0_0 40 | python_abi=3.8=1_cp38 41 | pytz=2021.1=pyhd3eb1b0_0 42 | readline=8.1=h9ed2024_0 43 | scipy=1.6.2=py38h2515648_0 44 | setuptools=52.0.0=py38hecd8cb5_0 45 | six=1.15.0=py38hecd8cb5_0 46 | sqlite=3.35.2=hce871da_0 47 | tk=8.6.10=hb0a8c7a_0 48 | tornado=6.1=py38h9ed2024_0 49 | wheel=0.36.2=pyhd3eb1b0_0 50 | xz=5.2.5=h1de35cc_0 51 | zlib=1.2.11=h1de35cc_3 52 | zstd=1.4.5=h41d2c2f_0 53 | -------------------------------------------------------------------------------- /src/forecast.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import sys 4 | import argparse 5 | import pandas as pd 6 | import numpy as np 7 | import forgp.gp as gp 8 | import time 9 | import logging 10 | 11 | class dotdict(dict): 12 | __getattr__ = dict.get 13 | __setattr__ = dict.__setitem__ 14 | __delattr__ = dict.__delitem__ 15 | 16 | 17 | def get_col(data, name, index): 18 | return data.iloc[:,index] if index is not None else data.loc[:,name] 19 | 20 | def parse_arguments(): 21 | ''' Parse and validate command line arguments 22 | ''' 23 | parser = argparse.ArgumentParser(description="Time series forecasting using GP") 24 | parser.add_argument("--log", type=int, default=logging.WARNING) 25 | parser.add_argument('-Q', type=int, choices=[0, 1, 2], help="Number of spectral 
components", default=2) 26 | parser.add_argument('-r','--restarts', type=int, help="Number of restarts", default=1) 27 | 28 | group = parser.add_mutually_exclusive_group() 29 | group.add_argument('--priors', help="Custom priors file location") 30 | group.add_argument('--default-priors', help="Use default priors", action="store_true") 31 | parser.add_argument('--priors-count', type=int, help="Number of blocks of priors in the custom priors file", default=1) 32 | 33 | group = parser.add_mutually_exclusive_group() 34 | group.add_argument('-xn', '--train-col', help="Name of the training data column (default x)", default="x") 35 | group.add_argument('-xi', '--train-index', type=int, help="Index (zero based) of the training data column") 36 | 37 | group = parser.add_mutually_exclusive_group() 38 | group.add_argument('-tn', '--test-col', help="Name of the test data column (default xx)", default="xx") 39 | group.add_argument('-ti', '--test-index', type=int, help="Index (zero based) of the test data column") 40 | 41 | parser.add_argument('--frequency-col', help="Specify the frequency column", default = "period") 42 | parser.add_argument('-f', '--frequency', help="Specify the frequency (monthly vs quarterly vs yearly)", default = "ANY") 43 | 44 | group = parser.add_mutually_exclusive_group() 45 | group.add_argument('--mean-col', help="Specify the mean column name", default = "mean") 46 | group.add_argument('--mean-index', type=int, help="Specify the mean column index") 47 | 48 | group = parser.add_mutually_exclusive_group() 49 | group.add_argument('--std-col', help="Specify the std column name", default = "std") 50 | group.add_argument('--std-index', type=int, help="Specify the std column index") 51 | 52 | parser.add_argument('--normalize', help="Normalize the data using the mean/std columns", action='store_true') 53 | 54 | parser.add_argument('--limit', type=int, help="Limit the length of the training set (-1 = no limit)", default=-1) 55 | 56 | parser.add_argument('--sample', help='limit to sample', type=int, default=-1) 57
| parser.add_argument('--sample-col', help='sample column', type=str, default="sample") 58 | 59 | parser.add_argument('data', help="Training/test file (CSV format, with arrays as semicolon-separated values)") 60 | parser.add_argument('target', help="Output file name") 61 | 62 | 63 | return parser.parse_args() 64 | 65 | 66 | def float_arrays(data): 67 | return data.str.split(";").apply(lambda x: np.array(x).astype(float)) 68 | 69 | 70 | def compute_indicators(Ytest, mean, upper): 71 | import properscoring as ps 72 | import scipy.stats as stat 73 | import numpy as np 74 | 75 | sigma = (upper - mean) / stat.norm.ppf(0.975) 76 | fcast = mean 77 | 78 | crps = np.zeros(len(Ytest)) 79 | ll = np.zeros(len(Ytest)) 80 | 81 | for jj in range(len(Ytest)): 82 | crps[jj] = ps.crps_gaussian(Ytest[jj], mu=fcast[jj], sig=sigma[jj]) 83 | ll[jj] = stat.norm.logpdf(x=Ytest[jj], loc=fcast[jj], scale=sigma[jj]) 84 | 85 | mae = np.mean(np.abs(Ytest - fcast)) 86 | crps = np.mean(crps) 87 | ll = np.mean(ll) 88 | 89 | return [mae, crps, ll] 90 | 91 | 92 | if __name__ == "__main__": 93 | args = parse_arguments() 94 | 95 | ## initialize logging 96 | logger = logging.getLogger() 97 | logger.setLevel(100 - args.log) 98 | ch = logging.StreamHandler() 99 | ch.setLevel(100 - args.log) 100 | formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s') 101 | ch.setFormatter(formatter) 102 | logger.addHandler(ch) 103 | 104 | logger.info("Summary: GP forecasting with RBF, no ETS, no Bias") 105 | 106 | 107 | data = pd.read_csv(args.data) 108 | if args.frequency.upper() != "ANY": 109 | data = data[data[args.frequency_col].str.upper() == args.frequency.upper()] 110 | logger.info(f"Using only {args.frequency} data") 111 | else: 112 | logger.info("Not filtering by period") 113 | 114 | # filtering by sample number 115 | if args.sample > 0: 116 | data = data[data[args.sample_col] <= args.sample] 117 | logger.info(f"filtering {args.sample}: {data.shape}") 118 | 119 | 120 | train = 
get_col(data, args.train_col, args.train_index) 121 | test = get_col(data, args.test_col, args.test_index) 122 | 123 | train = float_arrays(train) 124 | test = float_arrays(test) 125 | 126 | if args.normalize: 127 | means = get_col(data, args.mean_col, args.mean_index) 128 | stds = get_col(data, args.std_col, args.std_index) 129 | 130 | train = (train - means) / stds 131 | test = (test - means) / stds 132 | 133 | priors = None 134 | if args.priors is not None: 135 | priors = pd.read_csv(args.priors, header=None, comment='#') 136 | cut = int(len(priors) / args.priors_count) 137 | priors = priors[-cut:].values.flatten() 138 | pstr = str.join(" ", priors.astype(str)) 139 | logger.debug(f"Using these custom priors: {pstr}") 140 | elif args.default_priors: 141 | logger.info("Using default priors") 142 | priors = True 143 | else: 144 | logger.info("Using no priors") 145 | priors = False 146 | 147 | #priors = False 148 | 149 | 150 | 151 | out = pd.DataFrame(columns=["st", "mean", "std", "center", "upper"]) 152 | for i in range(0, len(train)): 153 | start = time.time() 154 | 155 | 156 | row = data.iloc[i,:] 157 | Y = train.iloc[i] 158 | if args.limit > 0: 159 | Y = Y[-args.limit:] 160 | logger.info(f"train length: {len(Y)}") 161 | 162 | Y = Y.reshape(len(Y), 1) 163 | YY = test.iloc[i] 164 | 165 | # Stderr output to be able to identify GPy errors 166 | print(f"----------------------------------------------", file=sys.stderr) 167 | print(f"Processing series #{i}", file=sys.stderr) 168 | print(f"st: {row.st}", file=sys.stderr) 169 | 170 | logger.info(f"Processing series #{i}") 171 | logger.info(f"st: {row.st}") 172 | logger.debug(f"{row[args.frequency_col]}") 173 | logger.debug(f"series mean: {row[args.mean_col]}") 174 | logger.debug(f"series std: {row[args.std_col]}") 175 | logger.debug(f"train length: {len(Y)}/{len(train.iloc[i])}") 176 | logger.debug(f"test length: {len(YY)}") 177 | 178 | g = gp.GP(row[args.frequency_col].lower(), priors=priors, Q=args.Q, normalize=False, 
restarts=args.restarts) 179 | if isinstance(priors, (list, np.ndarray)): 180 | g.priors_array(priors) 181 | 182 | g.build_gp(Y) 183 | 184 | m, u = g.forecast(len(YY)) 185 | m = m.reshape(len(m)) 186 | u = u.reshape(len(u)) 187 | 188 | mae, crps, ll = compute_indicators(YY, m, u) 189 | end = time.time() 190 | logger.debug(f"duration: {end - start}") 191 | 192 | out = out.append([{ 193 | "st":row.st, 194 | "mean":row[args.mean_col], 195 | "std":row[args.std_col], 196 | "time": end - start, 197 | "center":str.join(";", m.astype(str)), 198 | "upper":str.join(";", u.astype(str)), 199 | "mae": mae, 200 | "crps": crps, 201 | "ll": ll 202 | }]) 203 | 204 | out.to_csv(args.target) 205 | -------------------------------------------------------------------------------- /src/forgp/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IDSIA/gpforecasting/5f4c459d8a0e7d0140ea6ca5a25d4c0d8a34948b/src/forgp/__init__.py -------------------------------------------------------------------------------- /src/forgp/gp.py: -------------------------------------------------------------------------------- 1 | import GPy 2 | import pandas as pd 3 | import numpy as np 4 | import scipy.stats as stats 5 | from collections.abc import Iterable 6 | import logging 7 | import collections 8 | 9 | 10 | class GP: 11 | 12 | def __init__(self, frequency, period = 1, Q = 2, priors = True, restarts = 1, normalize = False, loglevel = 0): 13 | self.logger = logging.getLogger("forgp") 14 | self.logger.setLevel(loglevel) 15 | self.set_frequency(frequency) 16 | self.set_period(period) 17 | 18 | self.Q = Q 19 | self.restarts = restarts 20 | self.normalize = normalize 21 | 22 | self.has_priors = priors is not False 23 | if self.has_priors: 24 | if priors is True: 25 | self.logger.info("using default priors") 26 | priors = self.default_priors() 27 | elif isinstance(priors, (list,pd.core.series.Series,np.ndarray)): 28 | self.priors_array(priors) 29 | else: 30 
| self.init_priors(priors) 31 | 32 | 33 | self.logger.info(f"GP priors {priors} {self.has_priors}, Q={self.Q}, restarts={self.restarts}") 34 | 35 | 36 | def standard_prior(self, data): 37 | """ Get the priors dict from the array of priors data 38 | 39 | This method will name the values in the data according to the ordering used in the 40 | hierarchical probabilistic programming prior estimation code. 41 | This is: 42 | - std devs 43 | - means (variance, periodic then exp, cos for the different Qs) 44 | - alpha 45 | - beta 46 | """ 47 | 48 | names = ["p_std_var", "p_std_other", "p_mu_var", "p_mu_periodic"] 49 | if self.Q >= 1: 50 | names += ["p_mu_exp1", "p_mu_cos1"] 51 | if self.Q >= 2: 52 | names += [ "p_mu_exp2", "p_mu_cos2"] 53 | names += [ "p_alpha", "p_beta" ] 54 | 55 | return dict(zip(names, data)) 56 | 57 | def priors_array(self, data): 58 | """ Set priors from array 59 | """ 60 | priors = { 61 | "p_std_var": data[0], "p_std_other": data[1], 62 | "p_mu_var": data[2], "p_mu_rbf": data[3], 63 | "p_mu_periodic": data[4] 64 | } 65 | 66 | for i in range(1, self.Q+1): 67 | priors[f"p_mu_exp{i}"] = data[5 + (i-1)*2] 68 | priors[f"p_mu_cos{i}"] = data[6 + (i-1)*2] 69 | 70 | self.has_priors = True 71 | self.init_priors(priors) 72 | 73 | def default_priors(self): 74 | """ Get default prior values """ 75 | 76 | 77 | priors = { 78 | "p_std_var": 1.0, "p_std_other": 1.0, 79 | "p_mu_var": -1.5, "p_mu_rbf": 1.1, 80 | "p_mu_periodic": 0.2, "p_mu_exp1": -0.7, "p_mu_cos1": 0.5, "p_mu_exp2": 1.1, "p_mu_cos2": 1.6, 81 | 82 | } 83 | 84 | return priors 85 | 86 | def init_priors(self, priors): 87 | """ Initialize the prior parameters creating the GPy priors """ 88 | self.prior_var = GPy.priors.LogGaussian(priors["p_mu_var"], priors["p_std_var"]) 89 | self.prior_lscal_rbf = GPy.priors.LogGaussian(priors["p_mu_rbf"], priors["p_std_other"]) 90 | self.prior_lscal_std_periodic = GPy.priors.LogGaussian(priors["p_mu_periodic"], priors["p_std_other"]) 91 | 92 | if self.Q >= 1: 93 | 
self.prior_lscal_exp_short = GPy.priors.LogGaussian(priors["p_mu_exp1"], priors["p_std_other"]) 94 | self.prior_lscal_cos_short = GPy.priors.LogGaussian(priors["p_mu_cos1"], priors["p_std_other"]) 95 | 96 | if self.Q == 2: 97 | self.prior_lscal_exp_long = GPy.priors.LogGaussian(priors["p_mu_exp2"], priors["p_std_other"]) 98 | self.prior_lscal_cos_long = GPy.priors.LogGaussian(priors["p_mu_cos2"], priors["p_std_other"]) 99 | 100 | 101 | def set_period(self, period = 1): 102 | """ Set the period of the series 103 | 104 | Multiple expected periods can be supported by providing an array 105 | """ 106 | 107 | # check for non iterables and make an array 108 | if not isinstance(period, Iterable): 109 | self.periods = [ period ] 110 | else: 111 | self.periods = period 112 | 113 | def set_frequency(self, frequency): 114 | """ Set the data's frequency 115 | 116 | The frequency can be either a standard value (monthly, quarterly, yearly, weekly) 117 | or a float defining the "resolution" of the timeseries 118 | 119 | Parameters 120 | ---------- 121 | frequency : str|number 122 | This can be either a standard value among: monthly, quarterly, yearly and weekly 123 | or a numeric value 124 | """ 125 | if type(frequency) != str: 126 | self.sampling_freq = frequency 127 | elif frequency == 'monthly': 128 | self.sampling_freq = 12 129 | elif frequency == 'quarterly': 130 | self.sampling_freq = 4 131 | elif frequency == 'yearly': 132 | self.sampling_freq = 1 133 | elif frequency == 'weekly': 134 | self.sampling_freq = 365.25/7.0 135 | else: 136 | raise ValueError(f"wrong frequency: {frequency}") 137 | 138 | def set_q(self, Q): 139 | """ Set the number of spectral kernels (exp+cos) """ 140 | 141 | self.Q = Q 142 | 143 | def do_normalize(self, Y, train = True): 144 | if train: 145 | self.mean = np.mean(Y) 146 | self.std = np.std(Y, ddof=1) 147 | return (Y - self.mean) / self.std 148 | 149 | def do_denormalize(self, Y): 150 | return Y * self.std + self.mean 151 | 152 | def 
build_gp(self, Yin, X = None): 153 | """ Fit a Gaussian process using the specified train values """ 154 | use_bias = True 155 | 156 | if X is None: 157 | X = np.linspace(1/self.sampling_freq, len(Yin)/self.sampling_freq, len(Yin)) 158 | X = X.reshape(len(X), 1) 159 | 160 | Y = self.do_normalize(Yin, train = True) if self.normalize else Yin 161 | self.Xtrain = X 162 | 163 | #the yearly case is managed on its own. 164 | lin = GPy.kern.Linear(input_dim=1) 165 | 166 | if self.has_priors: 167 | self.logger.debug(f"Setting Variance Prior {self.prior_var}") 168 | lin.variances.set_prior(self.prior_var) 169 | K = lin 170 | 171 | if use_bias: 172 | bias = GPy.kern.Bias(input_dim=1) 173 | if self.has_priors: 174 | self.logger.debug(f"Setting Bias Prior {self.prior_var}") 175 | bias.variance.set_prior(self.prior_var) 176 | K = K + bias 177 | 178 | rbf = GPy.kern.RBF(input_dim=1) 179 | if self.has_priors: 180 | self.logger.debug(f"Setting RBF priors var {self.prior_var} and lengthscale {self.prior_lscal_rbf}") 181 | rbf.variance.set_prior(self.prior_var) 182 | rbf.lengthscale.set_prior(self.prior_lscal_rbf) 183 | 184 | K = K + rbf 185 | 186 | for period in self.periods: 187 | #the second component is the stdPeriodic 188 | periodic = GPy.kern.StdPeriodic(input_dim=1) 189 | periodic.period.fix(period) # period is set to 1 year by default 190 | 191 | if self.has_priors: 192 | self.logger.debug(f"Setting periodic {period} lscale {self.prior_lscal_std_periodic}") 193 | periodic.lengthscale.set_prior(self.prior_lscal_std_periodic) 194 | periodic.variance.set_prior(self.prior_var) 195 | K = K + periodic 196 | 197 | 198 | #now initializes the Q spectral-mixture (SM) components. Each component is rbf*cos, where 199 | #the variance of the cos is set to 1.
200 | for ii in range(0, self.Q): 201 | cos = GPy.kern.Cosine(input_dim=1) 202 | cos.variance.fix(1) 203 | rbf = GPy.kern.RBF(input_dim=1) #input dim, variance, lengthscale 204 | 205 | if self.has_priors: 206 | if (ii==0): # short-scale priors first (the *_long priors exist only when Q == 2) 207 | rbf.variance.set_prior(self.prior_var) 208 | rbf.lengthscale.set_prior(self.prior_lscal_exp_short) 209 | cos.lengthscale.set_prior(self.prior_lscal_cos_short) 210 | elif (ii==1): 211 | rbf.variance.set_prior(self.prior_var) 212 | rbf.lengthscale.set_prior(self.prior_lscal_exp_long) 213 | cos.lengthscale.set_prior(self.prior_lscal_cos_long) 214 | K = K + cos * rbf 215 | 216 | 217 | GPmodel = GPy.models.GPRegression(X, Y, K) 218 | 219 | if self.has_priors: 220 | GPmodel.likelihood.variance.set_prior(self.prior_var) 221 | 222 | 223 | try: 224 | GPmodel.optimize_restarts(self.restarts, robust=True) 225 | except Exception: 226 | #in the rare case the single optimization numerically fails 227 | GPmodel.optimize_restarts(5, robust=True) 228 | 229 | self.gp_model = GPmodel 230 | return GPmodel 231 | 232 | 233 | def forecast(self, X_forecast): 234 | if isinstance(X_forecast, int): 235 | lastTrain = self.Xtrain[-1] 236 | endTest = lastTrain + 1/self.sampling_freq * X_forecast 237 | X = np.linspace(lastTrain + 1/self.sampling_freq, endTest, X_forecast) 238 | X = X.reshape(len(X), 1) 239 | else: 240 | X = X_forecast 241 | 242 | 243 | m,v = self.gp_model.predict(X) 244 | s = np.sqrt(v) 245 | 246 | upper = m + s * stats.norm.ppf(0.975) 247 | 248 | if self.normalize: 249 | return self.do_denormalize(m), self.do_denormalize(upper) 250 | else: 251 | return m, upper 252 | 253 | --------------------------------------------------------------------------------
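For reference, the time-grid logic used by build_gp and forecast above can be sketched standalone, without the GPy dependency (the helper names are ours, not part of the package):

```python
import numpy as np
from scipy import stats

def train_grid(n_train, sampling_freq):
    # training time stamps 1/f, 2/f, ..., n/f, as in build_gp
    return np.linspace(1 / sampling_freq, n_train / sampling_freq, n_train)

def forecast_grid(n_train, horizon, sampling_freq):
    # test time stamps continue the training grid, as in forecast
    last = n_train / sampling_freq
    step = 1.0 / sampling_freq
    return np.linspace(last + step, last + step * horizon, horizon)

def upper_bound(mean, var):
    # 95% upper prediction band from the predictive variance
    return mean + np.sqrt(var) * stats.norm.ppf(0.975)

# four quarters following a 2-year quarterly training grid
print(forecast_grid(8, 4, 4))
```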