├── README.md ├── data ├── M1.csv ├── M3.csv ├── M4_3000.csv.xz ├── example_input └── example_priors ├── requirements.txt └── src ├── forecast.py └── forgp ├── __init__.py └── gp.py /README.md: -------------------------------------------------------------------------------- 1 | # Time series forecasting with Gaussian Processes 2 | ## Related Publication 3 | The theoretical description of the algorithm implemented in this software, together with empirical results, can be found in: 4 | 5 | “Time series forecasting with Gaussian Processes needs priors”\ 6 | Giorgio Corani, Alessio Benavoli, Marco Zaffalon\ 7 | Accepted at ECML-PKDD 2021\ 8 | arXiv preprint: https://arxiv.org/abs/2009.08102 9 | 10 | 11 | ## forgp package 12 | The software includes a small package that builds the Gaussian process and uses it to produce predictions. The package relies heavily on GPy. 13 | A convenience script can be used to run the GP over collections of timeseries. 14 | 15 | ## **forecast.py** 16 | __forecast.py__ is an executable Python script that produces forecasts and evaluation scores with our GP over multiple timeseries. The script takes as input a CSV file containing training and test series and produces a CSV file with predictions and scores. Both file formats are described below. 17 | Each prediction includes the mean and the upper bound of the 95% confidence interval. 18 | 19 | A number of command line arguments can be used to specify custom names for the columns in the input CSV file and to filter the timeseries to be processed.
20 | The most useful command line arguments are: 21 | 22 | * --frequency: include only timeseries with the specified frequency 23 | * --normalize: normalize timeseries using the specified mean and standard deviation 24 | * --log: verbosity level (100 = max verbosity, 0 = min verbosity). Default: 0 25 | * --default-priors: use default values for the priors instead of no priors 26 | * --help: print a description of all command line arguments 27 | 28 | 29 | ## Input File format 30 | Our tool uses a simple tabular data format serialized as a CSV file; the only supported field separator is the comma (","). 31 | The first line contains the header, while each following line represents a timeseries. Required fields/columns include: 32 | 33 | * __st__: unique name of the timeseries 34 | * __period__: frequency of the timeseries. One of MONTHLY, QUARTERLY, YEARLY and WEEKLY 35 | * __mean__: mean for the normalization of the timeseries 36 | * __std__: standard deviation for the normalization of the timeseries 37 | * __x__: training values of the timeseries 38 | * __xx__: test values of the timeseries 39 | 40 | Point values of the timeseries (i.e. __x__ and __xx__) are provided as a semicolon (";") separated list of numeric values. 41 | 42 | ## Output file format 43 | The output file follows a format similar to the input. It stores the predicted point forecasts and 95% upper bounds as semicolon-separated lists within a comma-separated file, where each line corresponds to a timeseries from the input file.
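For illustration, a row of either file can be parsed back into numeric arrays with the standard library alone. This is a minimal sketch using a shortened row from data/example_input (the real x/xx fields are longer):

```python
import csv
import io

# shortened row from data/example_input (the real x/xx fields are longer)
csv_text = ('"st","period","mean","std","x","xx"\n'
            '"Q94","QUARTERLY",147.99,76.90,"26.2;42.6;68","247.1;176.7"\n')

row = next(csv.DictReader(io.StringIO(csv_text)))
# series cells are ";"-separated lists of floats
x = [float(v) for v in row["x"].split(";")]
xx = [float(v) for v in row["xx"].split(";")]
print(x, xx)  # [26.2, 42.6, 68.0] [247.1, 176.7]
```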
44 | The generated columns include: 45 | 46 | * __st__: the unique id of the series 47 | * __mean__: the mean of the training timeseries 48 | * __std__: the standard deviation of the training timeseries 49 | * __center__: the mean value of the prediction 50 | * __upper__: the upper bound of the 95% confidence prediction band 51 | * __time__: time required to fit and predict 52 | * __mae__: mean absolute error of the predicted values (the xx values in the input file) 53 | * __crps__: continuous ranked probability score 54 | * __ll__: loglikelihood 55 | 56 | ## Priors file 57 | Prior values for the different kernel hyperparameters can be provided via a file. This file contains the priors as a newline-separated list of numbers. These are, in order: 58 | 59 | * standard deviation of variances 60 | * standard deviation of lengthscales 61 | * mean of variances 62 | * mean of rbf lengthscale 63 | * mean of periodic kernel's lengthscale 64 | * mean of first spectral kernel's exponential component lengthscale 65 | * mean of first spectral kernel's cosine component lengthscale 66 | * mean of second spectral kernel's exponential component lengthscale 67 | * mean of second spectral kernel's cosine component lengthscale 68 | 69 | The last four entries are only needed when the corresponding spectral components are enabled (see the Q parameter). 70 | 71 | ## Dependencies and setup 72 | A requirements file is provided in the package to ease the installation of all the dependencies. On conda-based systems one may create a suitable environment with: 73 | 74 | ```sh 75 | conda create --name <env> --file requirements.txt 76 | ``` 77 | 78 | ## Example execution 79 | The package includes a number of input files, including standard M1[[1]](#1) and M3[[2]](#2) competition timeseries, a sample of the M4 competition[[3]](#3) and a short example input.
80 | To run the script on the example input one may run the following command from within the src folder: 81 | 82 | ```sh 83 | ./forecast.py --log 100 --default-priors --normalize ../data/example_input example_output 84 | ``` 85 | 86 | As hinted above, custom priors can be provided via a file (note that --priors and --default-priors are mutually exclusive): 87 | 88 | ```sh 89 | ./forecast.py --log 100 --normalize --priors ../data/example_priors ../data/example_input example_output 90 | ``` 91 | 92 | ## References 93 | [1] 94 | Makridakis, S., A. Andersen, R. Carbone, R. Fildes, M. Hibon, R. Lewandowski, J. Newton, E. Parzen, and R. Winkler (1982) The accuracy of extrapolation (time series) methods: results of a forecasting competition. *Journal of Forecasting*, **1**, 111-153. 95 | 96 | [2] 97 | Makridakis, S. and M. Hibon (2000) The M3-competition: results, conclusions and implications. *International Journal of Forecasting*, **16**, 451-476. 98 | 99 | [3] 100 | Makridakis, S., E. Spiliotis and V. Assimakopoulos (2020) The M4 Competition: 100,000 time series and 61 forecasting methods. *International Journal of Forecasting*, **36(1)**, 54-74.
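The mae, crps and ll scores in the output can be recomputed from the center and upper columns. The sketch below mirrors the logic of compute_indicators in src/forecast.py, substituting the closed-form Gaussian CRPS for the properscoring call; the helper name `scores` is our own:

```python
import numpy as np
from scipy import stats

def scores(y_test, center, upper):
    # recover the predictive std dev from the 95% upper bound
    sigma = (upper - center) / stats.norm.ppf(0.975)
    z = (y_test - center) / sigma
    # closed-form CRPS of a Gaussian predictive distribution
    crps = sigma * (z * (2 * stats.norm.cdf(z) - 1)
                    + 2 * stats.norm.pdf(z) - 1 / np.sqrt(np.pi))
    mae = np.mean(np.abs(y_test - center))
    ll = np.mean(stats.norm.logpdf(y_test, loc=center, scale=sigma))
    return mae, float(np.mean(crps)), ll
```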
101 | -------------------------------------------------------------------------------- /data/M4_3000.csv.xz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IDSIA/gpforecasting/5f4c459d8a0e7d0140ea6ca5a25d4c0d8a34948b/data/M4_3000.csv.xz -------------------------------------------------------------------------------- /data/example_input: -------------------------------------------------------------------------------- 1 | "st","period","mean","std","x","xx" 2 | "Q94","QUARTERLY",147.99,76.90,"26.2;42.6;68;93.4;193.5;99.5;187.5;163.8;221.3;192.1;135.8;226.7;273.5","247.1;176.7;204.3;186.3;107.7;77.2;86.9;135" 3 | "Q95","QUARTERLY",34.29,6.19,"23.6;30.4;29;27.8;30.6;37.5;38.2;34.3;38.3;46.5;40.3;37.8;31.5","44.9;45.8;37;40.5;55;54.1;50.2;55.5" 4 | "Q96","QUARTERLY",49.83,5.99,"36.8;50.3;47;50.1;43.9;56.7;51.2;43.1;50.1;59.1;54.3;53.3;51.9","68.9;65.2;58.9;64.5;80;76.7;81.1;81.7" -------------------------------------------------------------------------------- /data/example_priors: -------------------------------------------------------------------------------- 1 | #std 2 | 1.0 3 | 1.0 4 | #mu 5 | -1.5 6 | 1.1 7 | 0.2 8 | -0.7 9 | 0.5 10 | 1.1 11 | 1.6 -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | # This file may be used to create an environment using: 2 | # $ conda create --name --file 3 | # platform: osx-64 4 | blas=1.0=mkl 5 | ca-certificates=2020.12.5=h033912b_0 6 | certifi=2020.12.5=py38h50d1736_1 7 | cycler=0.10.0=py38_0 8 | decorator=4.4.2=pyhd3eb1b0_0 9 | freetype=2.10.4=ha233b18_0 10 | gpy=1.9.9=py38ha1b04c9_7 11 | intel-openmp=2019.4=233 12 | jpeg=9b=he5867d9_2 13 | kiwisolver=1.3.1=py38h23ab428_0 14 | lcms2=2.11=h92f6f08_0 15 | libcxx=10.0.0=1 16 | libffi=3.3=hb1e8313_2 17 | libgfortran=3.0.1=h93005f0_2 18 | libpng=1.6.37=ha441bb4_0 19 | 
libtiff=4.2.0=h87d7836_0 20 | libwebp-base=1.2.0=h9ed2024_0 21 | lz4-c=1.9.3=h23ab428_0 22 | matplotlib-base=3.3.4=py38h8b3ea08_0 23 | mkl=2019.4=233 24 | mkl-service=2.3.0=py38h9ed2024_0 25 | mkl_fft=1.3.0=py38ha059aab_0 26 | mkl_random=1.1.1=py38h959d312_0 27 | ncurses=6.2=h0a44026_1 28 | numpy=1.19.2=py38h456fd55_0 29 | numpy-base=1.19.2=py38hcfb5961_0 30 | olefile=0.46=py_0 31 | openssl=1.1.1k=h0d85af4_0 32 | pandas=1.2.3=py38hb2f4e1b_0 33 | paramz=0.9.5=py_0 34 | pillow=8.1.2=py38h5270095_0 35 | pip=21.0.1=py38hecd8cb5_0 36 | properscoring=0.1=py_0 37 | pyparsing=2.4.7=pyhd3eb1b0_0 38 | python=3.8.8=h88f2d9e_4 39 | python-dateutil=2.8.1=pyhd3eb1b0_0 40 | python_abi=3.8=1_cp38 41 | pytz=2021.1=pyhd3eb1b0_0 42 | readline=8.1=h9ed2024_0 43 | scipy=1.6.2=py38h2515648_0 44 | setuptools=52.0.0=py38hecd8cb5_0 45 | six=1.15.0=py38hecd8cb5_0 46 | sqlite=3.35.2=hce871da_0 47 | tk=8.6.10=hb0a8c7a_0 48 | tornado=6.1=py38h9ed2024_0 49 | wheel=0.36.2=pyhd3eb1b0_0 50 | xz=5.2.5=h1de35cc_0 51 | zlib=1.2.11=h1de35cc_3 52 | zstd=1.4.5=h41d2c2f_0 53 | -------------------------------------------------------------------------------- /src/forecast.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import sys 4 | import argparse 5 | import pandas as pd 6 | import numpy as np 7 | import forgp.gp as gp 8 | import time 9 | import logging 10 | 11 | class dotdict(dict): 12 | __getattr__ = dict.get 13 | __setattr__ = dict.__setitem__ 14 | __delattr__ = dict.__delitem__ 15 | 16 | 17 | def get_col(data, name, index): 18 | return data.iloc[:,index] if index is not None else data.loc[:,name] 19 | 20 | def parse_arguments(): 21 | ''' Parse and validate command line arguments 22 | ''' 23 | parser = argparse.ArgumentParser(description="Time series forecasting using GP") 24 | parser.add_argument("--log", type=int, default=logging.WARNING) 25 | parser.add_argument('-Q', type=int, choices=[0, 1, 2], help="Number of spectral 
components", default=2) 26 | parser.add_argument('-r','--restarts', type=int, help="Number of restarts", default=1) 27 | 28 | group = parser.add_mutually_exclusive_group() 29 | group.add_argument('--priors', help="Custom priors file location") 30 | group.add_argument('--default-priors', help="Use default priors", action="store_true") 31 | parser.add_argument('--priors-count', type=int, help="Number of blocks of priors in the custom priors file", default=1) 32 | 33 | group = parser.add_mutually_exclusive_group() 34 | group.add_argument('-xn', '--train-col', help="Name of the training data column (default x)", default="x") 35 | group.add_argument('-xi', '--train-index', type=int, help="Index (zero based) of the training data column") 36 | 37 | group = parser.add_mutually_exclusive_group() 38 | group.add_argument('-tn', '--test-col', help="Name of the test data column (default xx)", default="xx") 39 | group.add_argument('-ti', '--test-index', type=int, help="Index (zero based) of the test data column") 40 | 41 | parser.add_argument('--frequency-col', help="Specify the frequency column", default = "period") 42 | parser.add_argument('-f', '--frequency', help="Specify the frequency (monthly vs quarterly vs yearly)", default = "ANY") 43 | 44 | group = parser.add_mutually_exclusive_group() 45 | group.add_argument('--mean-col', help="Specify the mean column name", default = "mean") 46 | group.add_argument('--mean-index', type=int, help="Specify the mean column index") 47 | 48 | group = parser.add_mutually_exclusive_group() 49 | group.add_argument('--std-col', help="Specify the std column name", default = "std") 50 | group.add_argument('--std-index', type=int, help="Specify the std column index") 51 | 52 | parser.add_argument('--normalize', help="Normalize the data using the mean/std columns", action='store_true') 53 | 54 | parser.add_argument('--limit', type=int, help="Limit the length of the training set (-1 = no limit)", default=-1) 55 | 56 | parser.add_argument('--sample', help='limit to sample', type=int, default=-1) 57
| parser.add_argument('--sample-col', help='sample column', type=str, default="sample") 58 | 59 | parser.add_argument('data', help="Training/test file (CSV format, with arrays as semicolon-separated values)") 60 | parser.add_argument('target', help="Output file name") 61 | 62 | 63 | return parser.parse_args() 64 | 65 | 66 | def float_arrays(data): 67 | return data.str.split(";").apply(lambda x: np.array(x).astype(float)) 68 | 69 | 70 | def compute_indicators(Ytest, mean, upper): 71 | import properscoring as ps 72 | import scipy.stats as stat 73 | import numpy as np 74 | 75 | sigma = (upper - mean) / stat.norm.ppf(0.975) 76 | fcast = mean 77 | 78 | crps = np.zeros(len(Ytest)) 79 | ll = np.zeros(len(Ytest)) 80 | 81 | for jj in range(len(Ytest)): 82 | crps[jj] = ps.crps_gaussian(Ytest[jj], mu=fcast[jj], sig=sigma[jj]) 83 | ll[jj] = stat.norm.logpdf(x=Ytest[jj], loc=fcast[jj], scale=sigma[jj]) 84 | 85 | mae = np.mean(np.abs(Ytest - fcast)) 86 | crps = np.mean(crps) 87 | ll = np.mean(ll) 88 | 89 | return [mae, crps, ll] 90 | 91 | 92 | if __name__ == "__main__": 93 | args = parse_arguments() 94 | 95 | ## initialize logging 96 | logger = logging.getLogger() 97 | logger.setLevel(100 - args.log) 98 | ch = logging.StreamHandler() 99 | ch.setLevel(100 - args.log) 100 | formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s') 101 | ch.setFormatter(formatter) 102 | logger.addHandler(ch) 103 | 104 | logger.info("Summary: GP forecasting with RBF, no ETS, no Bias") 105 | 106 | 107 | data = pd.read_csv(args.data) 108 | if args.frequency.upper() != "ANY": 109 | data = data[data[args.frequency_col].str.upper() == args.frequency.upper()] 110 | logger.info(f"Using only {args.frequency} data") 111 | else: 112 | logger.info("Not filtering by period") 113 | 114 | # filtering by sample number 115 | if args.sample > 0: 116 | data = data[data[args.sample_col] <= args.sample] 117 | logger.info(f"filtering {args.sample}: {data.shape}") 118 | 119 | 120 | train = 
get_col(data, args.train_col, args.train_index) 121 | test = get_col(data, args.test_col, args.test_index) 122 | 123 | train = float_arrays(train) 124 | test = float_arrays(test) 125 | 126 | if args.normalize: 127 | means = get_col(data, args.mean_col, args.mean_index) 128 | stds = get_col(data, args.std_col, args.std_index) 129 | 130 | train = (train - means) / stds 131 | test = (test - means) / stds 132 | 133 | priors = None 134 | if args.priors is not None: 135 | priors = pd.read_csv(args.priors, header=None, comment='#') 136 | cut = int(len(priors) / args.priors_count) 137 | priors = priors[-cut:].values.flatten() 138 | pstr = str.join(" ", priors.astype(str)) 139 | logger.debug(f"Using these custom priors: {pstr}") 140 | elif args.default_priors: 141 | logger.info("Using default priors") 142 | priors = True 143 | else: 144 | logger.info("Using no priors") 145 | priors = False 146 | 147 | #priors = False 148 | 149 | 150 | 151 | out = pd.DataFrame(columns=["st", "mean", "std", "center", "upper"]) 152 | for i in range(0, len(train)): 153 | start = time.time() 154 | 155 | 156 | row = data.iloc[i,:] 157 | Y = train.iloc[i] 158 | if args.limit > 0: 159 | Y = Y[-args.limit:] 160 | logger.info(f"train length: {len(Y)}") 161 | 162 | Y = Y.reshape(len(Y), 1) 163 | YY = test.iloc[i] 164 | 165 | # Stderr output to be able to identify GPy errors 166 | print(f"----------------------------------------------", file=sys.stderr) 167 | print(f"Processing series #{i}", file=sys.stderr) 168 | print(f"st: {row.st}", file=sys.stderr) 169 | 170 | logger.info(f"Processing series #{i}") 171 | logger.info(f"st: {row.st}") 172 | logger.debug(f"{row[args.frequency_col]}") 173 | logger.debug(f"series mean: {row[args.mean_col]}") 174 | logger.debug(f"series std: {row[args.std_col]}") 175 | logger.debug(f"train length: {len(Y)}/{len(train.iloc[i])}") 176 | logger.debug(f"test length: {len(YY)}") 177 | 178 | g = gp.GP(row[args.frequency_col].lower(), priors=priors, Q=args.Q, normalize=False, 
restarts=args.restarts) 179 | if isinstance(priors, (list, np.ndarray)): 180 | g.priors_array(priors) 181 | 182 | g.build_gp(Y) 183 | 184 | m, u = g.forecast(len(YY)) 185 | m = m.reshape(len(m)) 186 | u = u.reshape(len(u)) 187 | 188 | mae, crps, ll = compute_indicators(YY, m, u) 189 | end = time.time() 190 | logger.debug(f"duration: {end - start}") 191 | 192 | out = out.append([{ 193 | "st":row.st, 194 | "mean":row[args.mean_col], 195 | "std":row[args.std_col], 196 | "time": end - start, 197 | "center":str.join(";", m.astype(str)), 198 | "upper":str.join(";", u.astype(str)), 199 | "mae": mae, 200 | "crps": crps, 201 | "ll": ll 202 | }]) 203 | 204 | out.to_csv(args.target) 205 | -------------------------------------------------------------------------------- /src/forgp/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IDSIA/gpforecasting/5f4c459d8a0e7d0140ea6ca5a25d4c0d8a34948b/src/forgp/__init__.py -------------------------------------------------------------------------------- /src/forgp/gp.py: -------------------------------------------------------------------------------- 1 | import GPy 2 | import pandas as pd 3 | import numpy as np 4 | import scipy.stats as stats 5 | from collections.abc import Iterable 6 | import logging 7 | import collections 8 | 9 | 10 | class GP: 11 | 12 | def __init__(self, frequency, period = 1, Q = 2, priors = True, restarts = 1, normalize = False, loglevel = 0): 13 | self.logger = logging.getLogger("forgp") 14 | self.logger.setLevel(loglevel) 15 | self.set_frequency(frequency) 16 | self.set_period(period) 17 | 18 | self.Q = Q 19 | self.restarts = restarts 20 | self.normalize = normalize 21 | 22 | self.has_priors = priors is not False 23 | if self.has_priors: 24 | if priors is True: 25 | self.logger.info("using default priors") 26 | priors = self.default_priors() 27 | elif isinstance(priors, (list,pd.core.series.Series,np.ndarray)): 28 | self.priors_array(priors) 29 | else: 30 
| self.init_priors(priors) 31 | 32 | 33 | self.logger.info(f"GP priors {priors} {self.has_priors}, Q={self.Q}, restarts={self.restarts}") 34 | 35 | 36 | def standard_prior(self, data): 37 | """ Get the priors dict from the array of priors data 38 | 39 | This method will name the values in the data according to the ordering used in the 40 | hierarchical probabilistic programming prior estimation code. 41 | This is: 42 | - std devs 43 | - means (variance, periodic then exp, cos for the different Qs) 44 | - alpha 45 | - beta 46 | """ 47 | 48 | names = ["p_std_var", "p_std_other", "p_mu_var", "p_mu_periodic"] 49 | if self.Q >= 1: 50 | names += ["p_mu_exp1", "p_mu_cos1"] 51 | if self.Q >= 2: 52 | names += [ "p_mu_exp2", "p_mu_cos2"] 53 | names += [ "p_alpha", "p_beta" ] 54 | 55 | return dict(zip(names, data)) 56 | 57 | def priors_array(self, data): 58 | """ Set priors from array 59 | """ 60 | priors = { 61 | "p_std_var": data[0], "p_std_other": data[1], 62 | "p_mu_var": data[2], "p_mu_rbf": data[3], 63 | "p_mu_periodic": data[4] 64 | } 65 | 66 | for i in range(1, self.Q+1): 67 | priors[f"p_mu_exp{i}"] = data[5 + (i-1)*2] 68 | priors[f"p_mu_cos{i}"] = data[6 + (i-1)*2] 69 | 70 | self.has_priors = True 71 | self.init_priors(priors) 72 | 73 | def default_priors(self): 74 | """ Get default prior values """ 75 | 76 | 77 | priors = { 78 | "p_std_var": 1.0, "p_std_other": 1.0, 79 | "p_mu_var": -1.5, "p_mu_rbf": 1.1, 80 | "p_mu_periodic": 0.2, "p_mu_exp1": -0.7, "p_mu_cos1": 0.5, "p_mu_exp2": 1.1, "p_mu_cos2": 1.6, 81 | 82 | } 83 | 84 | return priors 85 | 86 | def init_priors(self, priors): 87 | """ Initialize the prior parameters creating the GPy priors """ 88 | self.prior_var = GPy.priors.LogGaussian(priors["p_mu_var"], priors["p_std_var"]) 89 | self.prior_lscal_rbf = GPy.priors.LogGaussian(priors["p_mu_rbf"], priors["p_std_other"]) 90 | self.prior_lscal_std_periodic = GPy.priors.LogGaussian(priors["p_mu_periodic"], priors["p_std_other"]) 91 | 92 | if self.Q >= 1: 93 | 
self.prior_lscal_exp_short = GPy.priors.LogGaussian(priors["p_mu_exp1"], priors["p_std_other"]) 94 | self.prior_lscal_cos_short = GPy.priors.LogGaussian(priors["p_mu_cos1"], priors["p_std_other"]) 95 | 96 | if self.Q == 2: 97 | self.prior_lscal_exp_long = GPy.priors.LogGaussian(priors["p_mu_exp2"], priors["p_std_other"]) 98 | self.prior_lscal_cos_long = GPy.priors.LogGaussian(priors["p_mu_cos2"], priors["p_std_other"]) 99 | 100 | 101 | def set_period(self, period = 1): 102 | """ Set the period of the series 103 | 104 | Multiple expected periods can be supported by providing an array 105 | """ 106 | 107 | # check for non iterables and make an array 108 | if not isinstance(period, Iterable): 109 | self.periods = [ period ] 110 | else: 111 | self.periods = period 112 | 113 | def set_frequency(self, frequency): 114 | """ Set the data's frequency 115 | 116 | The frequency can be either a standard value (monthly, quarterly, yearly, weekly) 117 | or a float defining the "resolution" of the timeseries 118 | 119 | Parameters 120 | ---------- 121 | frequency : str|number 122 | This can be either a standard value among: monthly, quarterly, yearly and weekly 123 | or a numeric value 124 | """ 125 | if type(frequency) != str: 126 | self.sampling_freq = frequency 127 | elif frequency == 'monthly': 128 | self.sampling_freq = 12 129 | elif frequency == 'quarterly': 130 | self.sampling_freq = 4 131 | elif frequency == 'yearly': 132 | self.sampling_freq = 1 133 | elif frequency == 'weekly': 134 | self.sampling_freq = 365.25/7.0 135 | else: 136 | raise ValueError(f"wrong frequency: {frequency}") 137 | 138 | def set_q(self, Q): 139 | """ Set the number of spectral kernels (exp+cos) """ 140 | 141 | self.Q = Q 142 | 143 | def do_normalize(self, Y, train = True): 144 | if train: 145 | self.mean = np.mean(Y) 146 | self.std = np.std(Y, ddof=1) 147 | return (Y - self.mean) / self.std 148 | 149 | def do_denormalize(self, Y): 150 | return Y * self.std + self.mean 151 | 152 | def 
build_gp(self, Yin, X = None): 153 | """ Fit a Gaussian process using the specified train values """ 154 | use_bias = True 155 | 156 | if X is None: 157 | X = np.linspace(1/self.sampling_freq, len(Yin)/self.sampling_freq, len(Yin)) 158 | X = X.reshape(len(X), 1) 159 | 160 | Y = self.do_normalize(Yin, train = True) if self.normalize else Yin 161 | self.Xtrain = X 162 | 163 | #the yearly case is managed on its own. 164 | lin = GPy.kern.Linear(input_dim=1) 165 | 166 | if self.has_priors: 167 | self.logger.debug(f"Setting Variance Prior {self.prior_var}") 168 | lin.variances.set_prior(self.prior_var) 169 | K = lin 170 | 171 | if use_bias: 172 | bias = GPy.kern.Bias(input_dim=1) 173 | if self.has_priors: 174 | self.logger.debug(f"Setting Bias Prior {self.prior_var}") 175 | bias.variance.set_prior(self.prior_var) 176 | K = K + bias 177 | 178 | rbf = GPy.kern.RBF(input_dim=1) 179 | if self.has_priors: 180 | self.logger.debug(f"Setting RBF priors var {self.prior_var} and lengthscale {self.prior_lscal_rbf}") 181 | rbf.variance.set_prior(self.prior_var) 182 | rbf.lengthscale.set_prior(self.prior_lscal_rbf) 183 | 184 | K = K + rbf 185 | 186 | for period in self.periods: 187 | #the second component is the stdPeriodic 188 | periodic = GPy.kern.StdPeriodic(input_dim=1) 189 | periodic.period.fix(period) # period is set to 1 year by default 190 | 191 | if self.has_priors: 192 | self.logger.debug(f"Setting periodic {period} lscale {self.prior_lscal_std_periodic}") 193 | periodic.lengthscale.set_prior(self.prior_lscal_std_periodic) 194 | periodic.variance.set_prior(self.prior_var) 195 | K = K + periodic 196 | 197 | 198 | #now initializes the Q spectral-mixture (SM) components. Each component is rbf*cos, where 199 | #the variance of the cos is set to 1.
200 | for ii in range(0, self.Q): 201 | cos = GPy.kern.Cosine(input_dim=1) 202 | cos.variance.fix(1) 203 | rbf = GPy.kern.RBF(input_dim=1) #input dim, variance, lengthscale 204 | 205 | if self.has_priors: 206 | if (ii==0): # short-scale priors first (the *_long priors exist only when Q == 2) 207 | rbf.variance.set_prior(self.prior_var) 208 | rbf.lengthscale.set_prior(self.prior_lscal_exp_short) 209 | cos.lengthscale.set_prior(self.prior_lscal_cos_short) 210 | elif (ii==1): 211 | rbf.variance.set_prior(self.prior_var) 212 | rbf.lengthscale.set_prior(self.prior_lscal_exp_long) 213 | cos.lengthscale.set_prior(self.prior_lscal_cos_long) 214 | K = K + cos * rbf 215 | 216 | 217 | GPmodel = GPy.models.GPRegression(X, Y, K) 218 | 219 | if self.has_priors: 220 | GPmodel.likelihood.variance.set_prior(self.prior_var) 221 | 222 | 223 | try: 224 | GPmodel.optimize_restarts(self.restarts, robust=True) 225 | except Exception: 226 | #in the rare case the single optimization numerically fails 227 | GPmodel.optimize_restarts(5, robust=True) 228 | 229 | self.gp_model = GPmodel 230 | return GPmodel 231 | 232 | 233 | def forecast(self, X_forecast): 234 | if isinstance(X_forecast, int): 235 | lastTrain = self.Xtrain[-1] 236 | endTest = lastTrain + 1/self.sampling_freq * X_forecast 237 | X = np.linspace(lastTrain + 1/self.sampling_freq, endTest, X_forecast) 238 | X = X.reshape(len(X), 1) 239 | else: 240 | X = X_forecast 241 | 242 | 243 | m,v = self.gp_model.predict(X) 244 | s = np.sqrt(v) 245 | 246 | upper = m + s * stats.norm.ppf(0.975) 247 | 248 | if self.normalize: 249 | return self.do_denormalize(m), self.do_denormalize(upper) 250 | else: 251 | return m, upper 252 | 253 | --------------------------------------------------------------------------------
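For reference, the time-grid logic used by build_gp and forecast above can be sketched standalone, without the GPy dependency (the helper names are ours, not part of the package):

```python
import numpy as np
from scipy import stats

def train_grid(n_train, sampling_freq):
    # training time stamps 1/f, 2/f, ..., n/f, as in build_gp
    return np.linspace(1 / sampling_freq, n_train / sampling_freq, n_train)

def forecast_grid(n_train, horizon, sampling_freq):
    # test time stamps continue the training grid, as in forecast
    last = n_train / sampling_freq
    step = 1.0 / sampling_freq
    return np.linspace(last + step, last + step * horizon, horizon)

def upper_bound(mean, var):
    # 95% upper prediction band from the predictive variance
    return mean + np.sqrt(var) * stats.norm.ppf(0.975)

# four quarters following a 2-year quarterly training grid
print(forecast_grid(8, 4, 4))
```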