├── main.py ├── README.md ├── simulate.py ├── TIRE.py └── utils.py /main.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow import keras 3 | from tensorflow.keras import backend as K 4 | from tensorflow.keras.layers import Lambda, Input, Dense 5 | from tensorflow.keras.models import Model 6 | 7 | import numpy as np 8 | import random 9 | import matplotlib.pyplot as plt 10 | from scipy.signal import find_peaks, peak_prominences 11 | import warnings 12 | import time, copy 13 | 14 | import utils 15 | import TIRE 16 | import simulate 17 | 18 | #---------------------------# 19 | ##SET PARAMETERS 20 | 21 | window_size = 20 22 | domain = "both" #choose from: TD (time domain), FD (frequency domain) or both 23 | 24 | #parameters TD 25 | intermediate_dim_TD=0 26 | latent_dim_TD=1 #h^TD in paper 27 | nr_shared_TD=1 #s^TD in paper 28 | K_TD = 2 #as in paper 29 | nr_ae_TD= K_TD+1 #number of parallel AEs = K+1 30 | loss_weight_TD=1 #lambda_TD in paper 31 | 32 | #parameters FD 33 | intermediate_dim_FD=10 34 | latent_dim_FD=1 #h^FD in paper 35 | nr_shared_FD=1 #s^FD in paper 36 | K_FD = 2 #as in paper 37 | nr_ae_FD=K_FD+1 #number of parallel AEs = K+1 38 | loss_weight_FD=1 #lambda^FD in paper 39 | nfft = 30 #number of points for DFT 40 | norm_mode = "timeseries" #for calculation of DFT, should the timeseries have mean zero or each window? 41 | 42 | #---------------------------# 43 | ##GENERATE OR LOAD DATA 44 | 45 | timeseries, windows_TD, parameters = simulate.generate_jumpingmean(window_size) 46 | windows_FD = utils.calc_fft(windows_TD, nfft, norm_mode) 47 | #note: loaded data can be preprocessed using utils.ts_to_windows and utils.combine_ts 48 | 49 | 50 | #---------------------------# 51 | ##TRAIN THE AUTOENCODERS 52 | 53 | shared_features_TD = TIRE.train_AE(windows_TD, intermediate_dim_TD, latent_dim_TD, nr_shared_TD, nr_ae_TD, loss_weight_TD) 54 | shared_features_FD = TIRE.train_AE(windows_FD, intermediate_dim_FD, latent_dim_FD, nr_shared_FD, nr_ae_FD, loss_weight_FD) 55 | 56 | #---------------------------# 57 | ##POSTPROCESSING AND PEAK DETECTION 58 | 59 | dissimilarities = TIRE.smoothened_dissimilarity_measures(shared_features_TD, shared_features_FD, domain, window_size) 60 | change_point_scores = TIRE.change_point_score(dissimilarities, window_size) 61 | 62 | np.savetxt("change_point_scores.txt", change_point_scores) 63 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # TIRE 2 | 3 | **TIRE** is an autoencoder-based change point detection algorithm for time series data that uses a TIme-Invariant Representation (TIRE). More information can be found in the paper *Change Point Detection in Time Series Data using Autoencoders with a Time-Invariant Representation*, published in *IEEE Transactions on Signal Processing* in 2021. 4 | 5 | The authors of this paper are: 6 | 7 | - [Tim De Ryck](https://math.ethz.ch/sam/the-institute/people.html?u=deryckt) ([STADIUS](https://www.esat.kuleuven.be/stadius/), Dept. Electrical Engineering, KU Leuven; now [SAM](https://math.ethz.ch/sam), Dept. Mathematics, ETH Zürich) 8 | - [Maarten De Vos](https://www.esat.kuleuven.be/stadius/person.php?id=203) ([STADIUS](https://www.esat.kuleuven.be/stadius/), Dept. Electrical Engineering, KU Leuven and Dept. 
Development and Regeneration, KU Leuven) 9 | - [Alexander Bertrand](https://www.esat.kuleuven.be/stadius/person.php?id=331) ([STADIUS](https://www.esat.kuleuven.be/stadius/), Dept. Electrical Engineering, KU Leuven) 10 | 11 | All authors are affiliated with [LEUVEN.AI - KU Leuven institute for AI](https://ai.kuleuven.be). Note that work based on TIRE should cite our paper: 12 | 13 | @article{deryck2021change, 14 | title={Change Point Detection in Time Series Data using Autoencoders with a Time-Invariant Representation}, 15 | author={De Ryck, Tim and De Vos, Maarten and Bertrand, Alexander}, 16 | journal={IEEE Transactions on Signal Processing}, 17 | year={2021}, 18 | publisher={IEEE} 19 | } 20 | 21 | ## Abstract 22 | 23 | *Change point detection (CPD) aims to locate abrupt property changes in time series data. Recent CPD methods demonstrated the potential of using deep learning techniques, but often lack the ability to identify more subtle changes in the autocorrelation statistics of the signal and suffer from a high false alarm rate. To address these issues, we employ an autoencoder-based methodology with a novel loss function, through which the used autoencoders learn a partially time-invariant representation that is tailored for CPD. The result is a flexible method that allows the user to indicate whether change points should be sought in the time domain, frequency domain or both. Detectable change points include abrupt changes in the slope, mean, variance, autocorrelation function and frequency spectrum. We demonstrate that our proposed method is consistently highly competitive or superior to baseline methods on diverse simulated and real-life benchmark data sets. Finally, we mitigate the issue of false detection alarms through the use of a postprocessing procedure that combines a matched filter and a newly proposed change point score. We show that this combination drastically improves the performance of our method as well as all baseline methods.* 24 | 25 | ## Goal 26 | 27 | More concretely, the goal of TIRE is the following. Given raw time series data, TIRE returns a change point score for each time stamp of the time series. This score reflects the probability that there is a change point at (or near) the corresponding time stamp; note that the absolute value of this change point score has no meaning. It is then common practice to discard the change points for which the change point score is below some user-defined threshold. For more information on how the change point scores are obtained we refer to our paper. 28 | 29 | Detectable change points include abrupt changes in: 30 | - Mean 31 | - Slope 32 | - Variance 33 | - Autocorrelation 34 | - Frequency spectrum 35 | 36 | ## Guidelines 37 | 38 | First install all required packages; they are listed at the beginning of each file. We provide a Jupyter notebook `TIRE_example_notebook.ipynb` that demonstrates how the TIRE change point scores can be obtained from raw time series data. In the notebook, the change points obtained by TIRE are also compared to the ground truth, both visually and through the AUC score. Alternatively, you can run `main.py` to obtain a text file containing the change point scores; a minimal sketch of that pipeline is included at the end of this README. 39 | 40 | ## Contact 41 | 42 | In case of comments or questions, please contact me at .
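## Minimal example

The snippet below is a minimal sketch of the pipeline implemented in `main.py` (jumping-mean toy data, detection in both domains, the default hyperparameters from `main.py`); it is meant as a quick orientation, not as a replacement for `main.py` or the notebook.

    import utils, TIRE, simulate

    window_size = 20
    timeseries, windows_TD, parameters = simulate.generate_jumpingmean(window_size)
    windows_FD = utils.calc_fft(windows_TD, nfft=30, norm_mode="timeseries")

    # one set of parallel autoencoders per domain (nr_ae = K+1 = 3)
    shared_TD = TIRE.train_AE(windows_TD, intermediate_dim=0, latent_dim=1, nr_shared=1, nr_ae=3, loss_weight=1)
    shared_FD = TIRE.train_AE(windows_FD, intermediate_dim=10, latent_dim=1, nr_shared=1, nr_ae=3, loss_weight=1)

    # postprocessing: matched filtering + prominence-based change point score
    dissimilarities = TIRE.smoothened_dissimilarity_measures(shared_TD, shared_FD, "both", window_size)
    change_point_scores = TIRE.change_point_score(dissimilarities, window_size)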
43 | -------------------------------------------------------------------------------- /simulate.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow import keras 3 | from tensorflow.keras import backend as K 4 | from tensorflow.keras.layers import Lambda, Input, Dense 5 | from tensorflow.keras.models import Model 6 | 7 | import numpy as np 8 | import random 9 | import matplotlib.pyplot as plt 10 | from scipy.signal import find_peaks, peak_prominences 11 | import warnings 12 | import time, copy 13 | 14 | import utils 15 | 16 | def ar2(value1,value2,coef1,coef2,mu,sigma): 17 | """ 18 | AR(2) model, cfr. paper 19 | """ 20 | return coef1*value1+coef2*value2 + np.random.randn()*sigma+mu 21 | 22 | def ar1(value1,coef1,mu,sigma): 23 | """ 24 | AR(1) model, cfr. paper 25 | """ 26 | return coef1*value1 + np.random.randn()*sigma+mu 27 | 28 | def generate_jumpingmean(window_size, stride=1, nr_cp=49, delta_t_cp = 100, delta_t_cp_std = 10, scale_min=-1, scale_max=1): 29 | """ 30 | Generates one instance of a jumping mean time series, together with the corresponding windows and parameters 31 | """ 32 | mu = np.zeros((nr_cp,)) 33 | parameters_jumpingmean = [] 34 | for n in range(1,nr_cp): 35 | mu[n] = mu[n-1] + n/16 36 | for n in range(nr_cp): 37 | nr = int(delta_t_cp+np.random.randn()*np.sqrt(delta_t_cp_std)) 38 | parameters_jumpingmean.extend(mu[n]*np.ones((nr,))) 39 | 40 | parameters_jumpingmean = np.array([parameters_jumpingmean]).T 41 | 42 | ts_length = len(parameters_jumpingmean) 43 | timeseries = np.zeros((ts_length)) 44 | for i in range(2,ts_length): 45 | #print(ar2(timeseries[i-1],timeseries[i-2], 0.6,-0.5, parameters_jumpingmean[i], 1.5)) 46 | timeseries[i] = ar2(timeseries[i-1],timeseries[i-2], 0.6,-0.5, parameters_jumpingmean[i], 1.5) 47 | 48 | windows = utils.ts_to_windows(timeseries, 0, window_size, stride) 49 | windows = utils.minmaxscale(windows,scale_min,scale_max) 50 | 51 | return timeseries, windows, parameters_jumpingmean 52 | 53 | def generate_scalingvariance(window_size, stride=1, nr_cp=49, delta_t_cp = 100, delta_t_cp_std = 10, scale_min=-1, scale_max=1): 54 | """ 55 | Generates one instance of a scaling variance time series, together with the corresponding windows and parameters 56 | """ 57 | sigma = np.ones((nr_cp,)) 58 | parameters_scalingvariance = [] 59 | for n in range(1,nr_cp-1,2): 60 | sigma[n] = np.log(np.exp(1)+n/4) 61 | for n in range(nr_cp): 62 | nr = int(delta_t_cp+np.random.randn()*np.sqrt(delta_t_cp_std)) 63 | parameters_scalingvariance.extend(sigma[n]*np.ones((nr,))) 64 | 65 | parameters_scalingvariance = np.array([parameters_scalingvariance]).T 66 | 67 | 68 | ts_length = len(parameters_scalingvariance) 69 | timeseries = np.zeros((ts_length)) 70 | for i in range(2,ts_length): 71 | #print(ar2(timeseries[i-1],timeseries[i-2], 0.6,-0.5, parameters_jumpingmean[i], 1.5)) 72 | timeseries[i] = ar2(timeseries[i-1],timeseries[i-2], 0.6,-0.5, 0, parameters_scalingvariance[i]) 73 | 74 | windows = utils.ts_to_windows(timeseries, 0, window_size, stride) 75 | windows = utils.minmaxscale(windows,scale_min,scale_max) 76 | 77 | return timeseries, windows, parameters_scalingvariance 78 | 79 | def generate_gaussian(window_size, stride=1, nr_cp=49, delta_t_cp = 100, delta_t_cp_std = 10, scale_min=-1, scale_max=1): 80 | """ 81 | Generates one instance of a Gaussian mixtures time series, together with the corresponding windows and parameters 82 | """ 83 | mixturenumber = np.zeros((nr_cp,)) 84 | 
parameters_gaussian = [] 85 | for n in range(1,nr_cp-1,2): 86 | mixturenumber[n] = 1 87 | for n in range(nr_cp): 88 | nr = int(delta_t_cp+np.random.randn()*np.sqrt(delta_t_cp_std)) 89 | parameters_gaussian.extend(mixturenumber[n]*np.ones((nr,))) 90 | 91 | parameters_gaussian = np.array([parameters_gaussian]).T 92 | 93 | ts_length = len(parameters_gaussian) 94 | timeseries = np.zeros((ts_length)) 95 | for i in range(2,ts_length): 96 | #print(ar2(timeseries[i-1],timeseries[i-2], 0.6,-0.5, parameters_jumpingmean[i], 1.5)) 97 | if parameters_gaussian[i] == 0: 98 | timeseries[i] = 0.5*0.5*np.random.randn()+0.5*0.5*np.random.randn() 99 | else: 100 | timeseries[i] = -0.6 - 0.8*1*np.random.randn() + 0.2*0.1*np.random.randn() 101 | 102 | windows = utils.ts_to_windows(timeseries, 0, window_size, stride) 103 | windows = utils.minmaxscale(windows,scale_min,scale_max) 104 | 105 | return timeseries, windows, parameters_gaussian 106 | 107 | def generate_changingcoefficients(window_size, stride=1, nr_cp=49, delta_t_cp = 1000, delta_t_cp_std = 10, scale_min=-1, scale_max=1): 108 | """ 109 | Generates one instance of a changing coefficients time series, together with the corresponding windows and parameters 110 | """ 111 | coeff = np.ones((nr_cp,)) 112 | parameters = [] 113 | for n in range(0,nr_cp,2): 114 | coeff[n] = np.random.rand()*0.5 115 | for n in range(1,nr_cp-1,2): 116 | coeff[n] = np.random.rand()*0.15+0.8 117 | 118 | for n in range(nr_cp): 119 | nr = int(delta_t_cp+np.random.randn()*np.sqrt(delta_t_cp_std)) 120 | parameters.extend(coeff[n]*np.ones((nr,))) 121 | 122 | #parameters = np.array([parameters]).T 123 | parameters = utils.ts_to_windows(parameters,0,1,stride) 124 | 125 | ts_length = len(parameters) 126 | timeseries = np.zeros((ts_length)) 127 | for i in range(1,ts_length): 128 | #print(ar2(timeseries[i-1],timeseries[i-2], 0.6,-0.5, parameters_jumpingmean[i], 1.5)) 129 | timeseries[i] = ar1(timeseries[i-1],parameters[i], 0,1) 130 | 131 | windows = utils.ts_to_windows(timeseries, 0, window_size, stride) 132 | windows = utils.minmaxscale(windows,scale_min,scale_max) 133 | 134 | return timeseries, windows, parameters 135 | -------------------------------------------------------------------------------- /TIRE.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow import keras 3 | from tensorflow.keras import backend as K 4 | from tensorflow.keras.layers import Lambda, Input, Dense 5 | from tensorflow.keras.models import Model 6 | 7 | import numpy as np 8 | import random 9 | import matplotlib.pyplot as plt 10 | from scipy.signal import find_peaks, peak_prominences 11 | import warnings 12 | import time, copy 13 | 14 | import utils 15 | 16 | def create_parallel_aes(window_size_per_ae, 17 | intermediate_dim=0, 18 | latent_dim=1, 19 | nr_ae=3, 20 | nr_shared=1, 21 | loss_weight=1): 22 | """ 23 | Create a TensorFlow model with parallel autoencoders, as visualized in Figure 1 of the TIRE paper.
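Each of the nr_ae parallel autoencoders encodes and reconstructs one of nr_ae consecutive (time-shifted) windows; the first nr_shared latent features are encouraged to be equal across these consecutive windows by an additional loss term (weighted by loss_weight) that penalizes their squared differences, which makes them (approximately) time-invariant.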
24 | 25 | Args: 26 | window_size_per_ae: window size for the AE 27 | intermediate_dim: intermediate dimension for stacked AE, for single-layer AE use 0 28 | latent_dim: latent dimension of AE 29 | nr_ae: number of parallel AEs (K in paper) 30 | nr_shared: number of shared features (should be <= latent_dim) 31 | loss_weight: lambda in paper 32 | 33 | Returns: 34 | A parallel AE model instance, its encoder part and its decoder part 35 | """ 36 | wspa = window_size_per_ae 37 | 38 | x = Input(shape=(nr_ae,wspa,)) 39 | 40 | if intermediate_dim == 0: 41 | y=x 42 | else: 43 | y = Dense(intermediate_dim, activation=tf.nn.relu)(x) 44 | #y = tf.keras.layers.BatchNormalization()(y) 45 | 46 | z_shared = Dense(nr_shared, activation=tf.nn.tanh)(y) 47 | z_unshared = Dense(latent_dim-nr_shared, activation=tf.nn.tanh)(y) 48 | z = tf.concat([z_shared,z_unshared],-1) 49 | 50 | 51 | if intermediate_dim == 0: 52 | y=z 53 | else: 54 | y = Dense(intermediate_dim, activation=tf.nn.relu)(z) 55 | #y = tf.keras.layers.BatchNormalization()(y) 56 | 57 | x_decoded = Dense(wspa,activation=tf.nn.tanh)(y) 58 | 59 | pae = Model(x,x_decoded) 60 | encoder = Model(x,z) 61 | 62 | input_decoder = Input(shape=(nr_ae, latent_dim,)) 63 | if intermediate_dim == 0: 64 | layer1 = pae.layers[-1] 65 | decoder = Model(input_decoder, layer1(input_decoder)) 66 | else: 67 | layer1 = pae.layers[-1] 68 | layer2 = pae.layers[-2] 69 | decoder = Model(input_decoder, layer1(layer2(input_decoder))) 70 | 71 | pae.summary() 72 | 73 | def pae_loss(x,x_decoded): 74 | squared_diff = K.square(x-x_decoded) 75 | mse_loss = tf.reduce_mean(squared_diff) 76 | 77 | square_diff2 = K.square(z_shared[:,1:,:]-z_shared[:,:nr_ae-1,:]) 78 | shared_loss = tf.reduce_mean(square_diff2) 79 | 80 | return mse_loss + loss_weight*shared_loss 81 | 82 | squared_diff = K.square(x-x_decoded) 83 | mse_loss = tf.reduce_mean(squared_diff) 84 | 85 | square_diff2 = K.square(z_shared[:,1:,:]-z_shared[:,:nr_ae-1,:]) 86 | shared_loss = tf.reduce_mean(square_diff2) 87 | total_loss = mse_loss + loss_weight*shared_loss 88 | 89 | pae.add_loss(total_loss) 90 | 91 | return pae, encoder, decoder 92 | 93 | def prepare_input_paes(windows,nr_ae): 94 | """ 95 | Prepares input for create_parallel_ae 96 | 97 | Args: 98 | windows: list of windows 99 | nr_ae: number of parallel AEs (K in paper) 100 | 101 | Returns: 102 | array with shape (nr_ae, (nr. of windows)-K+1, window size) 103 | """ 104 | new_windows = [] 105 | nr_windows = windows.shape[0] 106 | for i in range(nr_ae): 107 | new_windows.append(windows[i:nr_windows-nr_ae+1+i]) 108 | return np.transpose(new_windows,(1,0,2)) 109 | 110 | def train_AE(windows, intermediate_dim=0, latent_dim=1, nr_shared=1, nr_ae=3, loss_weight=1, nr_epochs=200, nr_patience=200): 111 | """ 112 | Creates and trains an autoencoder with a Time-Invariant REpresentation (TIRE) 113 | 114 | Args: 115 | windows: time series windows (i.e. 
{y_t}_t or {z_t}_t in the notation of the paper) 116 | intermediate_dim: intermediate dimension for stacked AE, for single-layer AE use 0 117 | latent_dim: latent dimension of AE 118 | nr_shared: number of shared features (should be <= latent_dim) 119 | nr_ae: number of parallel AEs (K in paper) 120 | loss_weight: lambda in paper 121 | nr_epochs: number of epochs for training 122 | nr_patience: patience for early stopping 123 | 124 | Returns: 125 | returns the TIRE encoded windows for all windows 126 | """ 127 | window_size_per_ae = windows.shape[-1] 128 | 129 | new_windows = prepare_input_paes(windows,nr_ae) 130 | 131 | pae, encoder, decoder = create_parallel_aes(window_size_per_ae,intermediate_dim,latent_dim,nr_ae,nr_shared,loss_weight) 132 | pae.compile(optimizer='adam') 133 | 134 | callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=nr_patience) 135 | 136 | pae.fit(new_windows, 137 | epochs=nr_epochs, 138 | verbose=1, 139 | batch_size=64, 140 | shuffle=True, 141 | validation_split=0.0, 142 | initial_epoch=0, 143 | callbacks=[callback] 144 | ) 145 | 146 | #reconstruct = pae.predict(new_windows) 147 | encoded_windows_pae = encoder.predict(new_windows) 148 | encoded_windows = np.concatenate((encoded_windows_pae[:,0,:nr_shared],encoded_windows_pae[-nr_ae+1:,nr_ae-1,:nr_shared]),axis=0) 149 | 150 | return encoded_windows 151 | 152 | def smoothened_dissimilarity_measures(encoded_windows, encoded_windows_fft, domain, window_size): 153 | """ 154 | Calculation of smoothened dissimilarity measures 155 | 156 | Args: 157 | encoded_windows: TD latent representation of windows 158 | encoded_windows_fft: FD latent representation of windows 159 | domain: TD/FD/both 160 | parameters: array with used parameters 161 | window_size: window size used 162 | par_smooth 163 | 164 | Returns: 165 | smoothened dissimilarity measures 166 | """ 167 | 168 | if domain == "TD": 169 | encoded_windows_both = encoded_windows 170 | elif domain == "FD": 171 | encoded_windows_both = encoded_windows_fft 172 | elif domain == "both": 173 | beta = np.quantile(utils.distance(encoded_windows, window_size), 0.95) 174 | alpha = np.quantile(utils.distance(encoded_windows_fft, window_size), 0.95) 175 | encoded_windows_both = np.concatenate((encoded_windows*alpha, encoded_windows_fft*beta),axis=1) 176 | 177 | encoded_windows_both = utils.matched_filter(encoded_windows_both, window_size) 178 | distances = utils.distance(encoded_windows_both, window_size) 179 | distances = utils.matched_filter(distances, window_size) 180 | 181 | return distances 182 | 183 | def change_point_score(distances, window_size): 184 | """ 185 | Gives the change point score for each time stamp. A change point score > 0 indicates that a new segment starts at that time stamp. 186 | 187 | Args: 188 | distances: postprocessed dissimilarity measure for all time stamps 189 | window_size: window size used in TD for CPD 190 | 191 | Returns: 192 | change point scores for every time stamp (i.e. 
zero-padded such that its length equals the length of the original time series) 193 | """ 194 | 195 | prominences = np.array(utils.new_peak_prominences(distances)[0]) 196 | prominences = prominences/np.amax(prominences) 197 | return np.concatenate((np.zeros((window_size,)), prominences, np.zeros((window_size-1,)))) 198 | -------------------------------------------------------------------------------- /utils.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow import keras 3 | from tensorflow.keras import backend as K 4 | from tensorflow.keras.layers import Lambda, Input, Dense 5 | from tensorflow.keras.models import Model 6 | 7 | import numpy as np 8 | import random 9 | import matplotlib.pyplot as plt 10 | from scipy.signal import find_peaks, peak_prominences 11 | import warnings 12 | import time, copy 13 | 14 | def distance(data, window_size): 15 | """ 16 | Calculates distance (dissimilarity measure) between features 17 | 18 | Args: 19 | data: array of learned features of size (nr. of windows) x (number of shared features) 20 | window_size: window size used for CPD 21 | 22 | Returns: 23 | Array of dissimilarities of size ((nr. of windows) - window_size) 24 | """ 25 | 26 | nr_windows = np.shape(data)[0] 27 | 28 | index_1 = range(window_size,nr_windows,1) 29 | index_2 = range(0,nr_windows-window_size,1) 30 | 31 | return np.sqrt(np.sum(np.square(data[index_1]-data[index_2]),1)) 32 | 33 | def parameters_to_cps(parameters,window_size): 34 | """ 35 | Preparation for plotting ground-truth change points 36 | 37 | Args: 38 | parameters: array of parameters used to generate the time series, size Tx(nr. of parameters) 39 | window_size: window size used for CPD 40 | 41 | Returns: 42 | Array whose entries are non-zero in the presence of a change point. Higher values correspond to larger parameter changes.
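Entries are normalized by the largest observed parameter change, so values lie in [0, 1]; the array has length T - 2*window_size + 1, matching the length of the dissimilarity measure returned by distance() when stride-1 windows are used.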
43 | """ 44 | 45 | length_ts = np.size(parameters,0) 46 | 47 | index1 = range(window_size-1,length_ts-window_size,1) #selects parameter at LAST time stamp of window 48 | index2 = range(window_size,length_ts-window_size+1,1) #selects parameter at FIRST time stamp of next window 49 | 50 | diff_parameters = np.sqrt(np.sum(np.square(parameters[index1]-parameters[index2]),1)) 51 | 52 | max_diff = np.max(diff_parameters) 53 | 54 | return diff_parameters/max_diff 55 | 56 | def cp_to_timestamps(changepoints, tolerance, length_ts): 57 | """ 58 | Extracts time stamps of change points 59 | 60 | Args: 61 | changepoints: 62 | tolerance: 63 | length_ts: length of original time series 64 | 65 | Returns: 66 | list where each entry is a list with the windows affected by a change point 67 | """ 68 | 69 | locations_cp = [idx for idx, val in enumerate(changepoints) if val > 0.0] 70 | 71 | output = [] 72 | while len(locations_cp)>0: 73 | k = 0 74 | for i in range(len(locations_cp)-1): 75 | if locations_cp[i]+1 == locations_cp[i+1]: 76 | k+=1 77 | else: 78 | break 79 | 80 | output.append(list(range(max(locations_cp[0]-tolerance,0),min(locations_cp[k]+1+tolerance,length_ts),1))) 81 | del locations_cp[:k+1] 82 | 83 | return output 84 | 85 | def ts_to_windows(ts, onset, window_size, stride, normalization="timeseries"): 86 | """Transforms time series into list of windows""" 87 | windows = [] 88 | len_ts = len(ts) 89 | onsets = range(onset, len_ts-window_size+1, stride) 90 | 91 | if normalization =="timeseries": 92 | for timestamp in onsets: 93 | windows.append(ts[timestamp:timestamp+window_size]) 94 | elif normalization=="window": 95 | for timestamp in onsets: 96 | windows.append(np.array(ts[timestamp:timestamp+window_size])-np.mean(ts[timestamp:timestamp+window_size])) 97 | 98 | return np.array(windows) 99 | 100 | def combine_ts(list_of_windows): 101 | """ 102 | Combines a list of windows from multiple views to one list of windows 103 | 104 | Args: 105 | list_of_windows: list of windows from multiple views 106 | 107 | Returns: 108 | one array with the concatenated windows 109 | """ 110 | nr_ts, nr_windows, window_size = np.shape(list_of_windows) 111 | tss = np.array(list_of_windows) 112 | new_ts = [] 113 | for i in range(nr_windows): 114 | new_ts.append(tss[:,i,:].flatten()) 115 | return np.array(new_ts) 116 | 117 | def new_peak_prominences(distances): 118 | """ 119 | Adapted calculation of prominence of peaks, based on the original scipy code 120 | 121 | Args: 122 | distances: dissimarity scores 123 | Returns: 124 | prominence scores 125 | """ 126 | with warnings.catch_warnings(): 127 | warnings.filterwarnings("ignore") 128 | all_peak_prom = peak_prominences(distances,range(len(distances))) 129 | return all_peak_prom 130 | 131 | def tpr_fpr(bps,distances, method="prominence",tol_dist=0): 132 | """Calculation of TPR and FPR 133 | 134 | Args: 135 | bps: list of breakpoints (change points) 136 | distances: list of dissimilarity scores 137 | method: prominence- or height-based change point score 138 | tol_dist: toleration distance 139 | 140 | Returns: 141 | list of TPRs and FPRs for different values of the detection threshold 142 | """ 143 | 144 | peaks = find_peaks(distances)[0] 145 | peaks_prom = peak_prominences(distances,peaks)[0] 146 | peaks_prom_all = np.array(new_peak_prominences(distances)[0]) 147 | distances = np.array(distances) 148 | 149 | bps = np.array(bps) 150 | 151 | if method == "prominence": 152 | 153 | nr_bps = len(bps) 154 | 155 | index_closest_peak = [0]*nr_bps 156 | bp_detected = [0]*nr_bps 157 | 
158 | #determine for each bp the allowed range s.t. alarm is closest to bp 159 | 160 | ranges = [0] * nr_bps 161 | 162 | for i in range(nr_bps): 163 | 164 | if i==0: 165 | left = 0 166 | else: 167 | left = right 168 | if i==(nr_bps-1): 169 | right = len(distances) 170 | else: 171 | right = int((bps[i][-1]+bps[i+1][0])/2)+1 172 | 173 | ranges[i] = [left,right] 174 | 175 | quantiles = np.quantile(peaks_prom,np.array(range(51))/50) 176 | quantiles_set = set(quantiles) 177 | quantiles_set.add(0.) 178 | quantiles = list(quantiles_set) 179 | quantiles.sort() 180 | 181 | nr_quant = len(quantiles) 182 | 183 | ncr = [0.]*nr_quant 184 | nal = [0.]*nr_quant 185 | 186 | for i in range(nr_quant): 187 | for j in range(nr_bps): 188 | 189 | bp_nbhd = peaks_prom_all[max(bps[j][0]-tol_dist,ranges[j][0]):min(bps[j][-1]+tol_dist+1,ranges[j][1])] 190 | 191 | if max(bp_nbhd) >= quantiles[i]: 192 | ncr[i] += 1 193 | 194 | indices_all = np.array(range(len(distances))) 195 | heights_alarms = distances[peaks_prom_all>=quantiles[i]] 196 | nal[i] = len(heights_alarms) 197 | 198 | ncr = np.array(ncr) 199 | nal = np.array(nal) 200 | 201 | ngt = nr_bps 202 | 203 | tpr = ncr/ngt 204 | fpr = (nal-ncr)/nal 205 | 206 | tpr = list(tpr[nal!=0]) 207 | fpr = list(fpr[nal!=0]) 208 | 209 | tpr.insert(0,1.0) 210 | fpr.insert(0,1.0) 211 | tpr.insert(len(tpr),0.0) 212 | fpr.insert(len(fpr),0.0) 213 | return tpr, fpr 214 | 215 | def matched_filter(signal, window_size): 216 | """ 217 | Matched filter for dissimilarity measure smoothing (and zero-delay weighted moving average filter for shared feature smoothing) 218 | 219 | Args: 220 | signal: input signal 221 | window_size: window size used for CPD 222 | Returns: 223 | filtered signal 224 | """ 225 | mask = np.ones((2*window_size+1,)) 226 | for i in range(window_size): 227 | mask[i] = i/(window_size**2) 228 | mask[-(i+1)] = i/(window_size**2) 229 | mask[window_size] = window_size/(window_size**2) 230 | 231 | signal_out = np.zeros(np.shape(signal)) 232 | 233 | if len(np.shape(signal)) >1: 234 | for i in range(np.shape(signal)[1]): 235 | signal_extended = np.concatenate((signal[0,i]*np.ones(window_size), signal[:,i], signal[-1,i]*np.ones(window_size))) 236 | signal_out[:,i] = np.convolve(signal_extended, mask, 'valid') 237 | else: 238 | signal = np.concatenate((signal[0]*np.ones(window_size), signal, signal[-1]*np.ones(window_size))) 239 | signal_out = np.convolve(signal, mask, 'valid') 240 | 241 | return signal_out 242 | 243 | def minmaxscale(data, a, b): 244 | """ 245 | Scales data to the interval [a,b] 246 | """ 247 | data_min = np.amin(data) 248 | data_max = np.amax(data) 249 | 250 | return (b-a)*(data-data_min)/(data_max-data_min)+a 251 | 252 | def calc_fft(windows, nfft=30, norm_mode="timeseries"): 253 | """ 254 | Calculates the DFT for each window and transforms its length 255 | 256 | Args: 257 | windows: time series windows 258 | nfft: number of points used for the calculation of the DFT 259 | norm_mode: ensure that the timeseries / each window has zero mean 260 | 261 | Returns: 262 | frequency domain windows, each window having size nfft//2 (+1 for timeseries normalization) 263 | """ 264 | mean_per_segment = np.mean(windows, axis=-1) 265 | mean_all = np.mean(mean_per_segment, axis=0) 266 | 267 | if norm_mode == "window": 268 | windows = windows-mean_per_segment[:,None] 269 | windows_fft = abs(np.fft.fft(windows, nfft))[...,1:nfft//2+1] 270 | elif norm_mode == "timeseries": 271 | windows = windows-mean_all 272 | windows_fft = abs(np.fft.fft(windows, nfft))[...,:nfft//2+1] 273 | 
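    # Only the one-sided magnitude spectrum is kept: in "window" mode each window is made
    # zero-mean, so the DC bin is dropped (nfft//2 values per window); in "timeseries" mode
    # the DC bin is kept (nfft//2 + 1 values per window). The magnitudes are then min-max
    # scaled to [-1, 1] below.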
274 | 275 | fft_max = np.amax(windows_fft) 276 | fft_min = np.amin(windows_fft) 277 | 278 | windows_fft = 2*(windows_fft-fft_min)/(fft_max-fft_min)-1 279 | 280 | return windows_fft 281 | 282 | def plot_cp(distances, parameters, window_size, time_start, time_stop, plot_prominences=False): 283 | """ 284 | Plots dissimilarity measure with ground-truth changepoints 285 | 286 | Args: 287 | distances: dissimilarity measures 288 | parameters: array parameters used to generate time series, size Tx(nr. of parameters) 289 | window_size: window size used for CPD 290 | time_start: first time stamp of plot 291 | time_stop: last time stamp of plot 292 | plot_prominences: True/False 293 | 294 | Returns: 295 | plot of dissimilarity measure with ground-truth changepoints 296 | 297 | """ 298 | 299 | breakpoints = parameters_to_cps(parameters,window_size) 300 | 301 | 302 | length_ts = np.size(breakpoints) 303 | t = range(len(distances)) 304 | 305 | 306 | x = t 307 | z = distances 308 | y = breakpoints#[:,0] 309 | 310 | peaks = find_peaks(distances)[0] 311 | peaks_prom = peak_prominences(distances,peaks)[0] 312 | peaks_prom_all = np.array(new_peak_prominences(distances)[0]) 313 | 314 | fig, ax = plt.subplots(figsize=(15,4)) 315 | ax.plot(x,z,color="black") 316 | 317 | if plot_prominences: 318 | ax.plot(x,peaks_prom_all, color="blue") 319 | 320 | ax.set_xlim(time_start,time_stop) 321 | ax.set_ylim(0,1.5*max(z)) 322 | plt.xlabel("time") 323 | plt.ylabel("dissimilarity") 324 | 325 | ax.plot(peaks,distances[peaks],'ko') 326 | 327 | height_line = 1 328 | 329 | ax.fill_between(x, 0, height_line, where=y > 0.0001, 330 | color='red', alpha=0.2, transform=ax.get_xaxis_transform()) 331 | ax.fill_between(x, 0, height_line, where=y > 0.25, 332 | color='red', alpha=0.2, transform=ax.get_xaxis_transform()) 333 | ax.fill_between(x, 0, height_line, where=y > 0.5, 334 | color='red', alpha=0.2, transform=ax.get_xaxis_transform()) 335 | ax.fill_between(x, 0, height_line, where=y > 0.75, 336 | color='red', alpha=0.2, transform=ax.get_xaxis_transform()) 337 | ax.fill_between(x, 0, height_line, where=y > 0.9, 338 | color='red', alpha=0.2, transform=ax.get_xaxis_transform()) 339 | plt.show() 340 | 341 | def get_auc(distances, tol_distances, parameters, window_size): 342 | """ 343 | Calculation of AUC for toleration distances in range(TD_start, TD_stop, TD_step) + plot of corresponding ROC curves 344 | 345 | Args: 346 | distances: dissimilarity measures 347 | tol_distances: list of different toleration distances 348 | parameters: array parameters used to generate time series, size Tx(nr. of parameters) 349 | window_size: window size used for CPD 350 | 351 | Returns: 352 | list of AUCs for every toleration distance 353 | """ 354 | 355 | breakpoints = parameters_to_cps(parameters,window_size) 356 | 357 | legend = [] 358 | list_of_lists = cp_to_timestamps(breakpoints,0,np.size(breakpoints)) 359 | auc = [] 360 | 361 | for i in tol_distances: 362 | with warnings.catch_warnings(): 363 | warnings.filterwarnings("ignore") 364 | tpr, fpr = tpr_fpr(list_of_lists,distances, "prominence",i) 365 | plt.plot(fpr,tpr) 366 | legend.append("tol. dist. 
= "+str(i)) 367 | auc.append(np.abs(np.trapz(tpr,x=fpr))) 368 | 369 | print(auc) 370 | plt.xlabel("FPR") 371 | plt.ylabel("TPR") 372 | plt.title("ROC curve") 373 | plt.plot([0,1],[0,1],'--') 374 | legend.append("TPR=FPR") 375 | plt.legend(legend) 376 | plt.show() 377 | 378 | plt.plot(tol_distances,auc) 379 | plt.xlabel("toleration distance") 380 | plt.ylabel("AUC") 381 | plt.title("AUC") 382 | plt.show() 383 | 384 | return auc 385 | --------------------------------------------------------------------------------