├── main.py ├── README.md ├── simulate.py ├── TIRE.py └── utils.py /main.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow import keras 3 | from tensorflow.keras import backend as K 4 | from tensorflow.keras.layers import Lambda, Input, Dense 5 | from tensorflow.keras.models import Model 6 | 7 | import numpy as np 8 | import random 9 | import matplotlib.pyplot as plt 10 | from scipy.signal import find_peaks, peak_prominences 11 | import warnings 12 | import time, copy 13 | 14 | import utils 15 | import TIRE 16 | import simulate 17 | 18 | #---------------------------# 19 | ##SET PARAMETERS 20 | 21 | window_size = 20 22 | domain = "both" #choose from: TD (time domain), FD (frequency domain) or both 23 | 24 | #parameters TD 25 | intermediate_dim_TD=0 26 | latent_dim_TD=1 #h^TD in paper 27 | nr_shared_TD=1 #s^TD in paper 28 | K_TD = 2 #as in paper 29 | nr_ae_TD= K_TD+1 #number of parallel AEs = K+1 30 | loss_weight_TD=1 #lambda_TD in paper 31 | 32 | #parameters FD 33 | intermediate_dim_FD=10 34 | latent_dim_FD=1 #h^FD in paper 35 | nr_shared_FD=1 #s^FD in paper 36 | K_FD = 2 #as in paper 37 | nr_ae_FD=K_FD+1 #number of parallel AEs = K+1 38 | loss_weight_FD=1 #lambda^FD in paper 39 | nfft = 30 #number of points for DFT 40 | norm_mode = "timeseries" #for calculation of DFT, should the timeseries have mean zero or each window? 41 | 42 | #---------------------------# 43 | ##GENERATE OR LOAD DATA 44 | 45 | timeseries, windows_TD, parameters = simulate.generate_jumpingmean(window_size) 46 | windows_FD = utils.calc_fft(windows_TD, nfft, norm_mode) 47 | #note: loaded data can be preprocessed using utils.ts_to_windows and utils.combine_ts 48 | 49 | 50 | #---------------------------# 51 | ##TRAIN THE AUTOENCODERS 52 | 53 | shared_features_TD = TIRE.train_AE(windows_TD, intermediate_dim_TD, latent_dim_TD, nr_shared_TD, nr_ae_TD, loss_weight_TD) 54 | shared_features_FD = TIRE.train_AE(windows_FD, intermediate_dim_FD, latent_dim_FD, nr_shared_FD, nr_ae_FD, loss_weight_FD) 55 | 56 | #---------------------------# 57 | ##POSTPROCESSING AND PEAK DETECTION 58 | 59 | dissimilarities = TIRE.smoothened_dissimilarity_measures(shared_features_TD, shared_features_FD, domain, window_size) 60 | change_point_scores = TIRE.change_point_score(dissimilarities, window_size) 61 | 62 | np.savetxt("change_point_scores.txt", change_point_scores) 63 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # TIRE 2 | 3 | **TIRE** is an autoencoder-based change point detection algorithm for time series data that uses a TIme-Invariant Representation (TIRE). More information can be found in the paper *Change Point Detection in Time Series Data using Autoencoders with a Time-Invariant Representation*, published in *IEEE Transactions on Signal Processing* in 2021. 4 | 5 | The authors of this paper are: 6 | 7 | - [Tim De Ryck](https://math.ethz.ch/sam/the-institute/people.html?u=deryckt) ([STADIUS](https://www.esat.kuleuven.be/stadius/), Dept. Electrical Engineering, KU Leuven; now [SAM](https://math.ethz.ch/sam), Dept. Mathematics, ETH Zürich) 8 | - [Maarten De Vos](https://www.esat.kuleuven.be/stadius/person.php?id=203) ([STADIUS](https://www.esat.kuleuven.be/stadius/), Dept. Electrical Engineering, KU Leuven and Dept. 
Development and Regeneration, KU Leuven) 9 | - [Alexander Bertrand](https://www.esat.kuleuven.be/stadius/person.php?id=331) ([STADIUS](https://www.esat.kuleuven.be/stadius/), Dept. Electrical Engineering, KU Leuven) 10 | 11 | All authors are affiliated with [LEUVEN.AI - KU Leuven institute for AI](https://ai.kuleuven.be). Note that work based on TIRE should cite our paper: 12 | 13 | @article{deryck2021change, 14 | title={Change Point Detection in Time Series Data using Autoencoders with a Time-Invariant Representation}, 15 | author={De Ryck, Tim and De Vos, Maarten and Bertrand, Alexander}, 16 | journal={IEEE Transactions on Signal Processing}, 17 | year={2021}, 18 | publisher={IEEE} 19 | } 20 | 21 | ## Abstract 22 | 23 | *Change point detection (CPD) aims to locate abrupt property changes in time series data. Recent CPD methods demonstrated the potential of using deep learning techniques, but often lack the ability to identify more subtle changes in the autocorrelation statistics of the signal and suffer from a high false alarm rate. To address these issues, we employ an autoencoder-based methodology with a novel loss function, through which the used autoencoders learn a partially time-invariant representation that is tailored for CPD. The result is a flexible method that allows the user to indicate whether change points should be sought in the time domain, frequency domain or both. Detectable change points include abrupt changes in the slope, mean, variance, autocorrelation function and frequency spectrum. We demonstrate that our proposed method is consistently highly competitive or superior to baseline methods on diverse simulated and real-life benchmark data sets. Finally, we mitigate the issue of false detection alarms through the use of a postprocessing procedure that combines a matched filter and a newly proposed change point score. We show that this combination drastically improves the performance of our method as well as all baseline methods.* 24 | 25 | ## Goal 26 | 27 | More concretely, the goal of TIRE is the following. Given raw time series data, TIRE returns a change point score for each time stamp of the time series. This score reflects the probability that there is a change point at (or near) the corresponding time stamp; note that the absolute value of this change point score has no meaning. It is then common practice to discard the change points for which the change point score is below some user-defined threshold. For more information on how the change point scores are obtained we refer to our paper. 28 | 29 | Detectable change points include abrupt changes in: 30 | - Mean 31 | - Slope 32 | - Variance 33 | - Autocorrelation 34 | - Frequency spectrum 35 | 36 | ## Guidelines 37 | 38 | First install all required packages; they are listed at the beginning of each file. We provide a Jupyter notebook `TIRE_example_notebook.ipynb` that demonstrates how the TIRE change point scores can be obtained from raw time series data. In the notebook, the change points obtained by TIRE are also compared to the ground truth, both visually and through the AUC score. Alternatively, you can run `main.py` to obtain a text file containing the change point scores; a minimal sketch of that pipeline is included at the end of this README. 39 | 40 | ## Contact 41 | 42 | In case of comments or questions, please contact me at .
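## Minimal example

The snippet below is a minimal sketch of the pipeline implemented in `main.py` (jumping-mean toy data, detection in both domains, the default hyperparameters from `main.py`); it is meant as a quick orientation, not as a replacement for `main.py` or the notebook.

    import utils, TIRE, simulate

    window_size = 20
    timeseries, windows_TD, parameters = simulate.generate_jumpingmean(window_size)
    windows_FD = utils.calc_fft(windows_TD, nfft=30, norm_mode="timeseries")

    # one set of parallel autoencoders per domain (nr_ae = K+1 = 3)
    shared_TD = TIRE.train_AE(windows_TD, intermediate_dim=0, latent_dim=1, nr_shared=1, nr_ae=3, loss_weight=1)
    shared_FD = TIRE.train_AE(windows_FD, intermediate_dim=10, latent_dim=1, nr_shared=1, nr_ae=3, loss_weight=1)

    # postprocessing: matched filtering + prominence-based change point score
    dissimilarities = TIRE.smoothened_dissimilarity_measures(shared_TD, shared_FD, "both", window_size)
    change_point_scores = TIRE.change_point_score(dissimilarities, window_size)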
43 | -------------------------------------------------------------------------------- /simulate.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow import keras 3 | from tensorflow.keras import backend as K 4 | from tensorflow.keras.layers import Lambda, Input, Dense 5 | from tensorflow.keras.models import Model 6 | 7 | import numpy as np 8 | import random 9 | import matplotlib.pyplot as plt 10 | from scipy.signal import find_peaks, peak_prominences 11 | import warnings 12 | import time, copy 13 | 14 | import utils 15 | 16 | def ar2(value1,value2,coef1,coef2,mu,sigma): 17 | """ 18 | AR(2) model, cfr. paper 19 | """ 20 | return coef1*value1+coef2*value2 + np.random.randn()*sigma+mu 21 | 22 | def ar1(value1,coef1,mu,sigma): 23 | """ 24 | AR(1) model, cfr. paper 25 | """ 26 | return coef1*value1 + np.random.randn()*sigma+mu 27 | 28 | def generate_jumpingmean(window_size, stride=1, nr_cp=49, delta_t_cp = 100, delta_t_cp_std = 10, scale_min=-1, scale_max=1): 29 | """ 30 | Generates one instance of a jumping mean time series, together with the corresponding windows and parameters 31 | """ 32 | mu = np.zeros((nr_cp,)) 33 | parameters_jumpingmean = [] 34 | for n in range(1,nr_cp): 35 | mu[n] = mu[n-1] + n/16 36 | for n in range(nr_cp): 37 | nr = int(delta_t_cp+np.random.randn()*np.sqrt(delta_t_cp_std)) 38 | parameters_jumpingmean.extend(mu[n]*np.ones((nr,))) 39 | 40 | parameters_jumpingmean = np.array([parameters_jumpingmean]).T 41 | 42 | ts_length = len(parameters_jumpingmean) 43 | timeseries = np.zeros((ts_length)) 44 | for i in range(2,ts_length): 45 | #print(ar2(timeseries[i-1],timeseries[i-2], 0.6,-0.5, parameters_jumpingmean[i], 1.5)) 46 | timeseries[i] = ar2(timeseries[i-1],timeseries[i-2], 0.6,-0.5, parameters_jumpingmean[i], 1.5) 47 | 48 | windows = utils.ts_to_windows(timeseries, 0, window_size, stride) 49 | windows = utils.minmaxscale(windows,scale_min,scale_max) 50 | 51 | return timeseries, windows, parameters_jumpingmean 52 | 53 | def generate_scalingvariance(window_size, stride=1, nr_cp=49, delta_t_cp = 100, delta_t_cp_std = 10, scale_min=-1, scale_max=1): 54 | """ 55 | Generates one instance of a scaling variance time series, together with the corresponding windows and parameters 56 | """ 57 | sigma = np.ones((nr_cp,)) 58 | parameters_scalingvariance = [] 59 | for n in range(1,nr_cp-1,2): 60 | sigma[n] = np.log(np.exp(1)+n/4) 61 | for n in range(nr_cp): 62 | nr = int(delta_t_cp+np.random.randn()*np.sqrt(delta_t_cp_std)) 63 | parameters_scalingvariance.extend(sigma[n]*np.ones((nr,))) 64 | 65 | parameters_scalingvariance = np.array([parameters_scalingvariance]).T 66 | 67 | 68 | ts_length = len(parameters_scalingvariance) 69 | timeseries = np.zeros((ts_length)) 70 | for i in range(2,ts_length): 71 | #print(ar2(timeseries[i-1],timeseries[i-2], 0.6,-0.5, parameters_jumpingmean[i], 1.5)) 72 | timeseries[i] = ar2(timeseries[i-1],timeseries[i-2], 0.6,-0.5, 0, parameters_scalingvariance[i]) 73 | 74 | windows = utils.ts_to_windows(timeseries, 0, window_size, stride) 75 | windows = utils.minmaxscale(windows,scale_min,scale_max) 76 | 77 | return timeseries, windows, parameters_scalingvariance 78 | 79 | def generate_gaussian(window_size, stride=1, nr_cp=49, delta_t_cp = 100, delta_t_cp_std = 10, scale_min=-1, scale_max=1): 80 | """ 81 | Generates one instance of a Gaussian mixtures time series, together with the corresponding windows and parameters 82 | """ 83 | mixturenumber = np.zeros((nr_cp,)) 84 | 
parameters_gaussian = [] 85 | for n in range(1,nr_cp-1,2): 86 | mixturenumber[n] = 1 87 | for n in range(nr_cp): 88 | nr = int(delta_t_cp+np.random.randn()*np.sqrt(delta_t_cp_std)) 89 | parameters_gaussian.extend(mixturenumber[n]*np.ones((nr,))) 90 | 91 | parameters_gaussian = np.array([parameters_gaussian]).T 92 | 93 | ts_length = len(parameters_gaussian) 94 | timeseries = np.zeros((ts_length)) 95 | for i in range(2,ts_length): 96 | #print(ar2(timeseries[i-1],timeseries[i-2], 0.6,-0.5, parameters_jumpingmean[i], 1.5)) 97 | if parameters_gaussian[i] == 0: 98 | timeseries[i] = 0.5*0.5*np.random.randn()+0.5*0.5*np.random.randn() 99 | else: 100 | timeseries[i] = -0.6 - 0.8*1*np.random.randn() + 0.2*0.1*np.random.randn() 101 | 102 | windows = utils.ts_to_windows(timeseries, 0, window_size, stride) 103 | windows = utils.minmaxscale(windows,scale_min,scale_max) 104 | 105 | return timeseries, windows, parameters_gaussian 106 | 107 | def generate_changingcoefficients(window_size, stride=1, nr_cp=49, delta_t_cp = 1000, delta_t_cp_std = 10, scale_min=-1, scale_max=1): 108 | """ 109 | Generates one instance of a changing coefficients time series, together with the corresponding windows and parameters 110 | """ 111 | coeff = np.ones((nr_cp,)) 112 | parameters = [] 113 | for n in range(0,nr_cp,2): 114 | coeff[n] = np.random.rand()*0.5 115 | for n in range(1,nr_cp-1,2): 116 | coeff[n] = np.random.rand()*0.15+0.8 117 | 118 | for n in range(nr_cp): 119 | nr = int(delta_t_cp+np.random.randn()*np.sqrt(delta_t_cp_std)) 120 | parameters.extend(coeff[n]*np.ones((nr,))) 121 | 122 | #parameters = np.array([parameters]).T 123 | parameters = utils.ts_to_windows(parameters,0,1,stride) 124 | 125 | ts_length = len(parameters) 126 | timeseries = np.zeros((ts_length)) 127 | for i in range(1,ts_length): 128 | #print(ar2(timeseries[i-1],timeseries[i-2], 0.6,-0.5, parameters_jumpingmean[i], 1.5)) 129 | timeseries[i] = ar1(timeseries[i-1],parameters[i], 0,1) 130 | 131 | windows = utils.ts_to_windows(timeseries, 0, window_size, stride) 132 | windows = utils.minmaxscale(windows,scale_min,scale_max) 133 | 134 | return timeseries, windows, parameters 135 | -------------------------------------------------------------------------------- /TIRE.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow import keras 3 | from tensorflow.keras import backend as K 4 | from tensorflow.keras.layers import Lambda, Input, Dense 5 | from tensorflow.keras.models import Model 6 | 7 | import numpy as np 8 | import random 9 | import matplotlib.pyplot as plt 10 | from scipy.signal import find_peaks, peak_prominences 11 | import warnings 12 | import time, copy 13 | 14 | import utils 15 | 16 | def create_parallel_aes(window_size_per_ae, 17 | intermediate_dim=0, 18 | latent_dim=1, 19 | nr_ae=3, 20 | nr_shared=1, 21 | loss_weight=1): 22 | """ 23 | Create a TensorFlow model with parallel autoencoders, as visualized in Figure 1 of the TIRE paper.
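Each of the nr_ae parallel autoencoders encodes and reconstructs one of nr_ae consecutive (time-shifted) windows; the first nr_shared latent features are encouraged to be equal across these consecutive windows by an additional loss term (weighted by loss_weight) that penalizes their squared differences, which makes them (approximately) time-invariant.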
24 | 25 | Args: 26 | window_size_per_ae: window size for the AE 27 | intermediate_dim: intermediate dimension for stacked AE, for single-layer AE use 0 28 | latent_dim: latent dimension of AE 29 | nr_ae: number of parallel AEs (K in paper) 30 | nr_shared: number of shared features (should be <= latent_dim) 31 | loss_weight: lambda in paper 32 | 33 | Returns: 34 | A parallel AE model instance, its encoder part and its decoder part 35 | """ 36 | wspa = window_size_per_ae 37 | 38 | x = Input(shape=(nr_ae,wspa,)) 39 | 40 | if intermediate_dim == 0: 41 | y=x 42 | else: 43 | y = Dense(intermediate_dim, activation=tf.nn.relu)(x) 44 | #y = tf.keras.layers.BatchNormalization()(y) 45 | 46 | z_shared = Dense(nr_shared, activation=tf.nn.tanh)(y) 47 | z_unshared = Dense(latent_dim-nr_shared, activation=tf.nn.tanh)(y) 48 | z = tf.concat([z_shared,z_unshared],-1) 49 | 50 | 51 | if intermediate_dim == 0: 52 | y=z 53 | else: 54 | y = Dense(intermediate_dim, activation=tf.nn.relu)(z) 55 | #y = tf.keras.layers.BatchNormalization()(y) 56 | 57 | x_decoded = Dense(wspa,activation=tf.nn.tanh)(y) 58 | 59 | pae = Model(x,x_decoded) 60 | encoder = Model(x,z) 61 | 62 | input_decoder = Input(shape=(nr_ae, latent_dim,)) 63 | if intermediate_dim == 0: 64 | layer1 = pae.layers[-1] 65 | decoder = Model(input_decoder, layer1(input_decoder)) 66 | else: 67 | layer1 = pae.layers[-1] 68 | layer2 = pae.layers[-2] 69 | decoder = Model(input_decoder, layer1(layer2(input_decoder))) 70 | 71 | pae.summary() 72 | 73 | def pae_loss(x,x_decoded): 74 | squared_diff = K.square(x-x_decoded) 75 | mse_loss = tf.reduce_mean(squared_diff) 76 | 77 | square_diff2 = K.square(z_shared[:,1:,:]-z_shared[:,:nr_ae-1,:]) 78 | shared_loss = tf.reduce_mean(square_diff2) 79 | 80 | return mse_loss + loss_weight*shared_loss 81 | 82 | squared_diff = K.square(x-x_decoded) 83 | mse_loss = tf.reduce_mean(squared_diff) 84 | 85 | square_diff2 = K.square(z_shared[:,1:,:]-z_shared[:,:nr_ae-1,:]) 86 | shared_loss = tf.reduce_mean(square_diff2) 87 | total_loss = mse_loss + loss_weight*shared_loss 88 | 89 | pae.add_loss(total_loss) 90 | 91 | return pae, encoder, decoder 92 | 93 | def prepare_input_paes(windows,nr_ae): 94 | """ 95 | Prepares input for create_parallel_ae 96 | 97 | Args: 98 | windows: list of windows 99 | nr_ae: number of parallel AEs (K in paper) 100 | 101 | Returns: 102 | array with shape (nr_ae, (nr. of windows)-K+1, window size) 103 | """ 104 | new_windows = [] 105 | nr_windows = windows.shape[0] 106 | for i in range(nr_ae): 107 | new_windows.append(windows[i:nr_windows-nr_ae+1+i]) 108 | return np.transpose(new_windows,(1,0,2)) 109 | 110 | def train_AE(windows, intermediate_dim=0, latent_dim=1, nr_shared=1, nr_ae=3, loss_weight=1, nr_epochs=200, nr_patience=200): 111 | """ 112 | Creates and trains an autoencoder with a Time-Invariant REpresentation (TIRE) 113 | 114 | Args: 115 | windows: time series windows (i.e. 
{y_t}_t or {z_t}_t in the notation of the paper) 116 | intermediate_dim: intermediate dimension for stacked AE, for single-layer AE use 0 117 | latent_dim: latent dimension of AE 118 | nr_shared: number of shared features (should be <= latent_dim) 119 | nr_ae: number of parallel AEs (K in paper) 120 | loss_weight: lambda in paper 121 | nr_epochs: number of epochs for training 122 | nr_patience: patience for early stopping 123 | 124 | Returns: 125 | returns the TIRE encoded windows for all windows 126 | """ 127 | window_size_per_ae = windows.shape[-1] 128 | 129 | new_windows = prepare_input_paes(windows,nr_ae) 130 | 131 | pae, encoder, decoder = create_parallel_aes(window_size_per_ae,intermediate_dim,latent_dim,nr_ae,nr_shared,loss_weight) 132 | pae.compile(optimizer='adam') 133 | 134 | callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=nr_patience) 135 | 136 | pae.fit(new_windows, 137 | epochs=nr_epochs, 138 | verbose=1, 139 | batch_size=64, 140 | shuffle=True, 141 | validation_split=0.0, 142 | initial_epoch=0, 143 | callbacks=[callback] 144 | ) 145 | 146 | #reconstruct = pae.predict(new_windows) 147 | encoded_windows_pae = encoder.predict(new_windows) 148 | encoded_windows = np.concatenate((encoded_windows_pae[:,0,:nr_shared],encoded_windows_pae[-nr_ae+1:,nr_ae-1,:nr_shared]),axis=0) 149 | 150 | return encoded_windows 151 | 152 | def smoothened_dissimilarity_measures(encoded_windows, encoded_windows_fft, domain, window_size): 153 | """ 154 | Calculation of smoothened dissimilarity measures 155 | 156 | Args: 157 | encoded_windows: TD latent representation of windows 158 | encoded_windows_fft: FD latent representation of windows 159 | domain: TD/FD/both 160 | parameters: array with used parameters 161 | window_size: window size used 162 | par_smooth 163 | 164 | Returns: 165 | smoothened dissimilarity measures 166 | """ 167 | 168 | if domain == "TD": 169 | encoded_windows_both = encoded_windows 170 | elif domain == "FD": 171 | encoded_windows_both = encoded_windows_fft 172 | elif domain == "both": 173 | beta = np.quantile(utils.distance(encoded_windows, window_size), 0.95) 174 | alpha = np.quantile(utils.distance(encoded_windows_fft, window_size), 0.95) 175 | encoded_windows_both = np.concatenate((encoded_windows*alpha, encoded_windows_fft*beta),axis=1) 176 | 177 | encoded_windows_both = utils.matched_filter(encoded_windows_both, window_size) 178 | distances = utils.distance(encoded_windows_both, window_size) 179 | distances = utils.matched_filter(distances, window_size) 180 | 181 | return distances 182 | 183 | def change_point_score(distances, window_size): 184 | """ 185 | Gives the change point score for each time stamp. A change point score > 0 indicates that a new segment starts at that time stamp. 186 | 187 | Args: 188 | distances: postprocessed dissimilarity measure for all time stamps 189 | window_size: window size used in TD for CPD 190 | 191 | Returns: 192 | change point scores for every time stamp (i.e. 
zero-padded such that its length equals the length of the original time series) 193 | """ 194 | 195 | prominences = np.array(utils.new_peak_prominences(distances)[0]) 196 | prominences = prominences/np.amax(prominences) 197 | return np.concatenate((np.zeros((window_size,)), prominences, np.zeros((window_size-1,)))) 198 | -------------------------------------------------------------------------------- /utils.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow import keras 3 | from tensorflow.keras import backend as K 4 | from tensorflow.keras.layers import Lambda, Input, Dense 5 | from tensorflow.keras.models import Model 6 | 7 | import numpy as np 8 | import random 9 | import matplotlib.pyplot as plt 10 | from scipy.signal import find_peaks, peak_prominences 11 | import warnings 12 | import time, copy 13 | 14 | def distance(data, window_size): 15 | """ 16 | Calculates distance (dissimilarity measure) between features 17 | 18 | Args: 19 | data: array of learned features of size (nr. of windows) x (number of shared features) 20 | window_size: window size used for CPD 21 | 22 | Returns: 23 | Array of dissimilarities of size ((nr. of windows) - window_size) 24 | """ 25 | 26 | nr_windows = np.shape(data)[0] 27 | 28 | index_1 = range(window_size,nr_windows,1) 29 | index_2 = range(0,nr_windows-window_size,1) 30 | 31 | return np.sqrt(np.sum(np.square(data[index_1]-data[index_2]),1)) 32 | 33 | def parameters_to_cps(parameters,window_size): 34 | """ 35 | Preparation for plotting ground-truth change points 36 | 37 | Args: 38 | parameters: array of parameters used to generate the time series, size Tx(nr. of parameters) 39 | window_size: window size used for CPD 40 | 41 | Returns: 42 | Array whose entries are non-zero in the presence of a change point. Higher values correspond to larger parameter changes.
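Entries are normalized by the largest observed parameter change, so values lie in [0, 1]; the array has length T - 2*window_size + 1, matching the length of the dissimilarity measure returned by distance() when stride-1 windows are used.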
43 | """ 44 | 45 | length_ts = np.size(parameters,0) 46 | 47 | index1 = range(window_size-1,length_ts-window_size,1) #selects parameter at LAST time stamp of window 48 | index2 = range(window_size,length_ts-window_size+1,1) #selects parameter at FIRST time stamp of next window 49 | 50 | diff_parameters = np.sqrt(np.sum(np.square(parameters[index1]-parameters[index2]),1)) 51 | 52 | max_diff = np.max(diff_parameters) 53 | 54 | return diff_parameters/max_diff 55 | 56 | def cp_to_timestamps(changepoints, tolerance, length_ts): 57 | """ 58 | Extracts time stamps of change points 59 | 60 | Args: 61 | changepoints: 62 | tolerance: 63 | length_ts: length of original time series 64 | 65 | Returns: 66 | list where each entry is a list with the windows affected by a change point 67 | """ 68 | 69 | locations_cp = [idx for idx, val in enumerate(changepoints) if val > 0.0] 70 | 71 | output = [] 72 | while len(locations_cp)>0: 73 | k = 0 74 | for i in range(len(locations_cp)-1): 75 | if locations_cp[i]+1 == locations_cp[i+1]: 76 | k+=1 77 | else: 78 | break 79 | 80 | output.append(list(range(max(locations_cp[0]-tolerance,0),min(locations_cp[k]+1+tolerance,length_ts),1))) 81 | del locations_cp[:k+1] 82 | 83 | return output 84 | 85 | def ts_to_windows(ts, onset, window_size, stride, normalization="timeseries"): 86 | """Transforms time series into list of windows""" 87 | windows = [] 88 | len_ts = len(ts) 89 | onsets = range(onset, len_ts-window_size+1, stride) 90 | 91 | if normalization =="timeseries": 92 | for timestamp in onsets: 93 | windows.append(ts[timestamp:timestamp+window_size]) 94 | elif normalization=="window": 95 | for timestamp in onsets: 96 | windows.append(np.array(ts[timestamp:timestamp+window_size])-np.mean(ts[timestamp:timestamp+window_size])) 97 | 98 | return np.array(windows) 99 | 100 | def combine_ts(list_of_windows): 101 | """ 102 | Combines a list of windows from multiple views to one list of windows 103 | 104 | Args: 105 | list_of_windows: list of windows from multiple views 106 | 107 | Returns: 108 | one array with the concatenated windows 109 | """ 110 | nr_ts, nr_windows, window_size = np.shape(list_of_windows) 111 | tss = np.array(list_of_windows) 112 | new_ts = [] 113 | for i in range(nr_windows): 114 | new_ts.append(tss[:,i,:].flatten()) 115 | return np.array(new_ts) 116 | 117 | def new_peak_prominences(distances): 118 | """ 119 | Adapted calculation of prominence of peaks, based on the original scipy code 120 | 121 | Args: 122 | distances: dissimarity scores 123 | Returns: 124 | prominence scores 125 | """ 126 | with warnings.catch_warnings(): 127 | warnings.filterwarnings("ignore") 128 | all_peak_prom = peak_prominences(distances,range(len(distances))) 129 | return all_peak_prom 130 | 131 | def tpr_fpr(bps,distances, method="prominence",tol_dist=0): 132 | """Calculation of TPR and FPR 133 | 134 | Args: 135 | bps: list of breakpoints (change points) 136 | distances: list of dissimilarity scores 137 | method: prominence- or height-based change point score 138 | tol_dist: toleration distance 139 | 140 | Returns: 141 | list of TPRs and FPRs for different values of the detection threshold 142 | """ 143 | 144 | peaks = find_peaks(distances)[0] 145 | peaks_prom = peak_prominences(distances,peaks)[0] 146 | peaks_prom_all = np.array(new_peak_prominences(distances)[0]) 147 | distances = np.array(distances) 148 | 149 | bps = np.array(bps) 150 | 151 | if method == "prominence": 152 | 153 | nr_bps = len(bps) 154 | 155 | index_closest_peak = [0]*nr_bps 156 | bp_detected = [0]*nr_bps 157 | 
158 | #determine for each bp the allowed range s.t. alarm is closest to bp 159 | 160 | ranges = [0] * nr_bps 161 | 162 | for i in range(nr_bps): 163 | 164 | if i==0: 165 | left = 0 166 | else: 167 | left = right 168 | if i==(nr_bps-1): 169 | right = len(distances) 170 | else: 171 | right = int((bps[i][-1]+bps[i+1][0])/2)+1 172 | 173 | ranges[i] = [left,right] 174 | 175 | quantiles = np.quantile(peaks_prom,np.array(range(51))/50) 176 | quantiles_set = set(quantiles) 177 | quantiles_set.add(0.) 178 | quantiles = list(quantiles_set) 179 | quantiles.sort() 180 | 181 | nr_quant = len(quantiles) 182 | 183 | ncr = [0.]*nr_quant 184 | nal = [0.]*nr_quant 185 | 186 | for i in range(nr_quant): 187 | for j in range(nr_bps): 188 | 189 | bp_nbhd = peaks_prom_all[max(bps[j][0]-tol_dist,ranges[j][0]):min(bps[j][-1]+tol_dist+1,ranges[j][1])] 190 | 191 | if max(bp_nbhd) >= quantiles[i]: 192 | ncr[i] += 1 193 | 194 | indices_all = np.array(range(len(distances))) 195 | heights_alarms = distances[peaks_prom_all>=quantiles[i]] 196 | nal[i] = len(heights_alarms) 197 | 198 | ncr = np.array(ncr) 199 | nal = np.array(nal) 200 | 201 | ngt = nr_bps 202 | 203 | tpr = ncr/ngt 204 | fpr = (nal-ncr)/nal 205 | 206 | tpr = list(tpr[nal!=0]) 207 | fpr = list(fpr[nal!=0]) 208 | 209 | tpr.insert(0,1.0) 210 | fpr.insert(0,1.0) 211 | tpr.insert(len(tpr),0.0) 212 | fpr.insert(len(fpr),0.0) 213 | return tpr, fpr 214 | 215 | def matched_filter(signal, window_size): 216 | """ 217 | Matched filter for dissimilarity measure smoothing (and zero-delay weighted moving average filter for shared feature smoothing) 218 | 219 | Args: 220 | signal: input signal 221 | window_size: window size used for CPD 222 | Returns: 223 | filtered signal 224 | """ 225 | mask = np.ones((2*window_size+1,)) 226 | for i in range(window_size): 227 | mask[i] = i/(window_size**2) 228 | mask[-(i+1)] = i/(window_size**2) 229 | mask[window_size] = window_size/(window_size**2) 230 | 231 | signal_out = np.zeros(np.shape(signal)) 232 | 233 | if len(np.shape(signal)) >1: 234 | for i in range(np.shape(signal)[1]): 235 | signal_extended = np.concatenate((signal[0,i]*np.ones(window_size), signal[:,i], signal[-1,i]*np.ones(window_size))) 236 | signal_out[:,i] = np.convolve(signal_extended, mask, 'valid') 237 | else: 238 | signal = np.concatenate((signal[0]*np.ones(window_size), signal, signal[-1]*np.ones(window_size))) 239 | signal_out = np.convolve(signal, mask, 'valid') 240 | 241 | return signal_out 242 | 243 | def minmaxscale(data, a, b): 244 | """ 245 | Scales data to the interval [a,b] 246 | """ 247 | data_min = np.amin(data) 248 | data_max = np.amax(data) 249 | 250 | return (b-a)*(data-data_min)/(data_max-data_min)+a 251 | 252 | def calc_fft(windows, nfft=30, norm_mode="timeseries"): 253 | """ 254 | Calculates the DFT for each window and transforms its length 255 | 256 | Args: 257 | windows: time series windows 258 | nfft: number of points used for the calculation of the DFT 259 | norm_mode: ensure that the timeseries / each window has zero mean 260 | 261 | Returns: 262 | frequency domain windows, each window having size nfft//2 (+1 for timeseries normalization) 263 | """ 264 | mean_per_segment = np.mean(windows, axis=-1) 265 | mean_all = np.mean(mean_per_segment, axis=0) 266 | 267 | if norm_mode == "window": 268 | windows = windows-mean_per_segment[:,None] 269 | windows_fft = abs(np.fft.fft(windows, nfft))[...,1:nfft//2+1] 270 | elif norm_mode == "timeseries": 271 | windows = windows-mean_all 272 | windows_fft = abs(np.fft.fft(windows, nfft))[...,:nfft//2+1] 273 | 
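    # Only the one-sided magnitude spectrum is kept: in "window" mode each window is made
    # zero-mean, so the DC bin is dropped (nfft//2 values per window); in "timeseries" mode
    # the DC bin is kept (nfft//2 + 1 values per window). The magnitudes are then min-max
    # scaled to [-1, 1] below.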
274 | 275 | fft_max = np.amax(windows_fft) 276 | fft_min = np.amin(windows_fft) 277 | 278 | windows_fft = 2*(windows_fft-fft_min)/(fft_max-fft_min)-1 279 | 280 | return windows_fft 281 | 282 | def plot_cp(distances, parameters, window_size, time_start, time_stop, plot_prominences=False): 283 | """ 284 | Plots dissimilarity measure with ground-truth changepoints 285 | 286 | Args: 287 | distances: dissimilarity measures 288 | parameters: array parameters used to generate time series, size Tx(nr. of parameters) 289 | window_size: window size used for CPD 290 | time_start: first time stamp of plot 291 | time_stop: last time stamp of plot 292 | plot_prominences: True/False 293 | 294 | Returns: 295 | plot of dissimilarity measure with ground-truth changepoints 296 | 297 | """ 298 | 299 | breakpoints = parameters_to_cps(parameters,window_size) 300 | 301 | 302 | length_ts = np.size(breakpoints) 303 | t = range(len(distances)) 304 | 305 | 306 | x = t 307 | z = distances 308 | y = breakpoints#[:,0] 309 | 310 | peaks = find_peaks(distances)[0] 311 | peaks_prom = peak_prominences(distances,peaks)[0] 312 | peaks_prom_all = np.array(new_peak_prominences(distances)[0]) 313 | 314 | fig, ax = plt.subplots(figsize=(15,4)) 315 | ax.plot(x,z,color="black") 316 | 317 | if plot_prominences: 318 | ax.plot(x,peaks_prom_all, color="blue") 319 | 320 | ax.set_xlim(time_start,time_stop) 321 | ax.set_ylim(0,1.5*max(z)) 322 | plt.xlabel("time") 323 | plt.ylabel("dissimilarity") 324 | 325 | ax.plot(peaks,distances[peaks],'ko') 326 | 327 | height_line = 1 328 | 329 | ax.fill_between(x, 0, height_line, where=y > 0.0001, 330 | color='red', alpha=0.2, transform=ax.get_xaxis_transform()) 331 | ax.fill_between(x, 0, height_line, where=y > 0.25, 332 | color='red', alpha=0.2, transform=ax.get_xaxis_transform()) 333 | ax.fill_between(x, 0, height_line, where=y > 0.5, 334 | color='red', alpha=0.2, transform=ax.get_xaxis_transform()) 335 | ax.fill_between(x, 0, height_line, where=y > 0.75, 336 | color='red', alpha=0.2, transform=ax.get_xaxis_transform()) 337 | ax.fill_between(x, 0, height_line, where=y > 0.9, 338 | color='red', alpha=0.2, transform=ax.get_xaxis_transform()) 339 | plt.show() 340 | 341 | def get_auc(distances, tol_distances, parameters, window_size): 342 | """ 343 | Calculation of AUC for toleration distances in range(TD_start, TD_stop, TD_step) + plot of corresponding ROC curves 344 | 345 | Args: 346 | distances: dissimilarity measures 347 | tol_distances: list of different toleration distances 348 | parameters: array parameters used to generate time series, size Tx(nr. of parameters) 349 | window_size: window size used for CPD 350 | 351 | Returns: 352 | list of AUCs for every toleration distance 353 | """ 354 | 355 | breakpoints = parameters_to_cps(parameters,window_size) 356 | 357 | legend = [] 358 | list_of_lists = cp_to_timestamps(breakpoints,0,np.size(breakpoints)) 359 | auc = [] 360 | 361 | for i in tol_distances: 362 | with warnings.catch_warnings(): 363 | warnings.filterwarnings("ignore") 364 | tpr, fpr = tpr_fpr(list_of_lists,distances, "prominence",i) 365 | plt.plot(fpr,tpr) 366 | legend.append("tol. dist. 
= "+str(i)) 367 | auc.append(np.abs(np.trapz(tpr,x=fpr))) 368 | 369 | print(auc) 370 | plt.xlabel("FPR") 371 | plt.ylabel("TPR") 372 | plt.title("ROC curve") 373 | plt.plot([0,1],[0,1],'--') 374 | legend.append("TPR=FPR") 375 | plt.legend(legend) 376 | plt.show() 377 | 378 | plt.plot(tol_distances,auc) 379 | plt.xlabel("toleration distance") 380 | plt.ylabel("AUC") 381 | plt.title("AUC") 382 | plt.show() 383 | 384 | return auc 385 | --------------------------------------------------------------------------------