├── README.md
├── SimMTM_Classification
│   ├── .gitignore
│   ├── code
│   │   ├── config_files
│   │   │   ├── Epilepsy_Configs.py
│   │   │   └── SleepEEG_Configs.py
│   │   ├── dataloader.py
│   │   ├── layers
│   │   │   ├── AutoCorrelation.py
│   │   │   ├── Autoformer_EncDec.py
│   │   │   ├── Embed.py
│   │   │   ├── SelfAttention_Family.py
│   │   │   ├── Transformer_EncDec.py
│   │   │   └── __init__.py
│   │   ├── loss.py
│   │   ├── main.py
│   │   ├── model.py
│   │   ├── trainer.py
│   │   └── utils
│   │       ├── __init__.py
│   │       ├── augmentations.py
│   │       ├── loss.py
│   │       ├── masking.py
│   │       ├── metrics.py
│   │       ├── timefeatures.py
│   │       ├── tools.py
│   │       └── utils.py
│   ├── download_datasets.sh
│   └── run.sh
├── SimMTM_Forecasting
│   ├── .DS_Store
│   ├── .gitignore
│   ├── data_provider
│   │   ├── __init__.py
│   │   ├── data_factory.py
│   │   ├── data_loader.py
│   │   ├── m4.py
│   │   └── uea.py
│   ├── exp
│   │   ├── .DS_Store
│   │   ├── __init__.py
│   │   ├── exp_basic.py
│   │   └── exp_simmtm.py
│   ├── layers
│   │   ├── AutoCorrelation.py
│   │   ├── Autoformer_EncDec.py
│   │   ├── Conv_Blocks.py
│   │   ├── ETSformer_EncDec.py
│   │   ├── Embed.py
│   │   ├── FourierCorrelation.py
│   │   ├── MultiWaveletCorrelation.py
│   │   ├── Pyraformer_EncDec.py
│   │   ├── SelfAttention_Family.py
│   │   ├── Transformer_EncDec.py
│   │   └── __init__.py
│   ├── models
│   │   ├── .DS_Store
│   │   ├── PatchTST.py
│   │   ├── SimMTM.py
│   │   ├── __init__.py
│   │   └── iTransformer.py
│   ├── run.py
│   ├── scripts
│   │   ├── .DS_Store
│   │   ├── finetune
│   │   │   ├── .DS_Store
│   │   │   ├── ECL_script
│   │   │   │   ├── .DS_Store
│   │   │   │   └── ECL.sh
│   │   │   ├── ETT_script
│   │   │   │   ├── .DS_Store
│   │   │   │   ├── ETTh1.sh
│   │   │   │   ├── ETTh2.sh
│   │   │   │   ├── ETTm1.sh
│   │   │   │   └── ETTm2.sh
│   │   │   ├── Traffic
│   │   │   │   ├── .DS_Store
│   │   │   │   └── Traffic.sh
│   │   │   └── Weather_script
│   │   │       ├── .DS_Store
│   │   │       └── Weather.sh
│   │   └── pretrain
│   │       ├── .DS_Store
│   │       ├── ECL_script
│   │       │   ├── .DS_Store
│   │       │   └── ECL.sh
│   │       ├── ETT_script
│   │       │   ├── .DS_Store
│   │       │   ├── ETTh1.sh
│   │       │   ├── ETTh2.sh
│   │       │   ├── ETTm1.sh
│   │       │   └── ETTm2.sh
│   │       ├── Traffic_script
│   │       │   ├── .DS_Store
│   │       │   └── Traffic.sh
│   │       └── Weather_script
│   │           ├── .DS_Store
│   │           └── Weather.sh
│   └── utils
│       ├── .DS_Store
│       ├── __init__.py
│       ├── augmentations.py
│       ├── losses.py
│       ├── m4_summary.py
│       ├── masking.py
│       ├── metrics.py
│       ├── timefeatures.py
│       └── tools.py
└── figs
    ├── .DS_Store
    ├── mainresult.png
    └── overview.png

/README.md:
--------------------------------------------------------------------------------

# SimMTM (NeurIPS 2023)

This is the codebase for the paper: [SimMTM: A Simple Pre-Training Framework for Masked Time-Series Modeling](https://arxiv.org/abs/2302.00861)


## Architecture

Figure 1. Overview of SimMTM.

The reconstruction process of SimMTM involves the following four modules: masking, representation learning, series-wise similarity learning, and point-wise reconstruction.

### Masking

For each sample, we generate a set of masked series by randomly masking a portion of the time points along the temporal dimension.

### Representation Learning

The encoder and projector layers produce the point-wise representations and the series-wise representations.

### Series-wise Similarity Learning

To reconstruct the original time series precisely, we use the similarities among series-wise representations as weights for aggregation, thereby exploiting the local structure of the time series manifold.

### Point-wise Reconstruction

Based on the learned series-wise similarities, we aggregate the point-wise representations of each sample's own masked series and of the other series to reconstruct the original time series.


## Get Started

1. Prepare Data.

All benchmark datasets can be obtained from [Google Drive](https://drive.google.com/file/d/1CC4ZrUD4EKncndzgy5PSTzOPSqcuyqqj/view?usp=sharing) or [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/a238e34ff81a42878d50/?dl=1). Arrange the folders as follows:

```plain
SimMTM/
|--SimMTM_Forecasting
   |-- dataset/
       |-- ETT-small/
           |-- ETTh1.csv
           |-- ETTh2.csv
           |-- ETTm1.csv
           |-- ETTm2.csv
       |-- weather/
           |-- weather.csv
       |-- ...
   |-- ...
|--SimMTM_Classification
   |-- dataset/
       |-- SleepEEG/
           |-- train.pt
           |-- val.pt
           |-- test.pt
       |-- FD-B/
           |-- ...
       |-- EMG/
           |-- ...
       |-- ...
```

2. Forecasting

The forecasting experiment code is in `./SimMTM_Forecasting`, and the experiment scripts are in its `./scripts` folder. To run the code on ETTh2, run the following commands:

```bash
cd ./SimMTM_Forecasting
# pre-training
sh ./scripts/pretrain/ETT_script/ETTh2.sh
# fine-tuning
sh ./scripts/finetune/ETT_script/ETTh2.sh
```

3. Classification

The classification experiment code is in `./SimMTM_Classification`. To pre-train a model on SleepEEG and fine-tune it on Epilepsy, run:

```bash
cd ./SimMTM_Classification
python ./code/main.py --training_mode pre_train --pretrain_dataset SleepEEG --target_dataset Epilepsy
```

4. We also provide some [checkpoints](https://cloud.tsinghua.edu.cn/f/466995bb5f924f55a6da/?dl=1), which you can fine-tune directly on target datasets.

## Main Results


In the main results figure, SimMTM (marked by red stars) covers both high-level and low-level tasks in in-domain and cross-domain settings and significantly outperforms the other baselines, highlighting its task generality. More results can be found in our paper.

## Citation
If you find this repo useful, please cite our paper.

```plain
@inproceedings{dong2023simmtm,
  title={SimMTM: A Simple Pre-Training Framework for Masked Time-Series Modeling},
  author={Jiaxiang Dong and Haixu Wu and Haoran Zhang and Li Zhang and Jianmin Wang and Mingsheng Long},
  booktitle={Advances in Neural Information Processing Systems},
  year={2023}
}
```

## Contact

If you have any questions, please contact [djx20@mails.tsinghua.edu.cn](mailto:djx20@mails.tsinghua.edu.cn).

## Acknowledgement

We thank the following GitHub repositories for their valuable code bases and datasets:

https://github.com/thuml/Time-Series-Library

https://github.com/mims-harvard/TFC-pretraining/tree/main

Thanks to [vincentsham](https://github.com/vincentsham/simmtm/blob/main/experiments_simmtm-BeijingPM25Quality.ipynb) for reproducing our code.
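
## Appendix: Toy Sketch of the Reconstruction Idea

The four modules described under Architecture (masking, representation learning, series-wise similarity learning, point-wise reconstruction) can be illustrated with a small, dependency-free sketch. This is not the repository's implementation. It assumes an identity "encoder" (each series is its own representation), zero-fill masking, cosine similarity, and a softmax temperature of 0.2. All function names (`mask_series`, `similarity_weights`, `reconstruct`) are hypothetical and chosen only for this example.

```python
import math
import random


def mask_series(series, mask_ratio, rng):
    """Masking: zero out each time point independently with probability mask_ratio."""
    return [0.0 if rng.random() < mask_ratio else x for x in series]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def similarity_weights(target, candidates, temperature):
    """Series-wise similarity learning: softmax over scaled cosine similarities."""
    logits = [cosine(target, c) / temperature for c in candidates]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reconstruct(target, candidates, temperature=0.2):
    """Point-wise reconstruction: similarity-weighted average of the candidates."""
    weights = similarity_weights(target, candidates, temperature)
    return [sum(w * c[t] for w, c in zip(weights, candidates))
            for t in range(len(target))]


rng = random.Random(0)
batch = [
    [math.sin(0.3 * t) for t in range(24)],
    [math.cos(0.3 * t) for t in range(24)],
]
# Three masked views per sample (the number of views is an assumption).
masked = [[mask_series(s, 0.5, rng) for _ in range(3)] for s in batch]

# Candidates for sample 0: its own masked views plus the other series in the
# batch (original and masked), but never the unmasked sample 0 itself.
candidates = masked[0] + [batch[1]] + masked[1]
recon = reconstruct(batch[0], candidates)
mse = sum((r - x) ** 2 for r, x in zip(recon, batch[0])) / len(recon)
print(f"reconstruction MSE for sample 0: {mse:.4f}")
```

In this toy version the similarity weights are fixed by the data. In an actual pre-training setup the encoder and projector would be trained so that the weighted aggregation reconstructs the original series well.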
-------------------------------------------------------------------------------- /SimMTM_Classification/.gitignore: -------------------------------------------------------------------------------- 1 | **/__pycache__/ 2 | **/.DS_Store 3 | old_README.md 4 | **/.idea 5 | -------------------------------------------------------------------------------- /SimMTM_Classification/code/config_files/Epilepsy_Configs.py: -------------------------------------------------------------------------------- 1 | class Config(object): 2 | def __init__(self): 3 | # model configs 4 | self.input_channels = 1 5 | self.kernel_size = 8 6 | self.stride = 1 7 | self.final_out_channels = 32 #128 8 | 9 | self.num_classes = 2 10 | self.num_classes_target = 3 11 | self.dropout = 0.35 12 | self.features_len = 24 13 | self.features_len_f = 24 # 13 #self.features_len # the output results in time domain 14 | 15 | # training configs 16 | self.num_epoch = 40 # 40 17 | 18 | # optimizer parameters 19 | self.beta1 = 0.9 20 | self.beta2 = 0.99 21 | self.lr = 3e-4 # original lr: 3e-4 22 | self.lr_f = 3e-4 23 | 24 | # data parameters 25 | self.drop_last = True 26 | self.batch_size = 32 #64 # 128 27 | self.target_batch_size = 16 # the size of target dataset (the # of samples used to fine-tune). 
28 | 29 | self.Context_Cont = Context_Cont_configs() 30 | self.TC = TC() 31 | self.augmentation = augmentations() 32 | 33 | 34 | class augmentations(object): 35 | def __init__(self): 36 | self.jitter_scale_ratio = 0.001 37 | self.jitter_ratio = 0.001 38 | self.max_seg = 5 39 | 40 | 41 | class Context_Cont_configs(object): 42 | def __init__(self): 43 | self.temperature = 0.2 44 | self.use_cosine_similarity = True 45 | self.use_cosine_similarity_f = True 46 | 47 | 48 | class TC(object): 49 | def __init__(self): 50 | self.hidden_dim = 100 51 | self.timesteps = 10 52 | -------------------------------------------------------------------------------- /SimMTM_Classification/code/config_files/SleepEEG_Configs.py: -------------------------------------------------------------------------------- 1 | 2 | class Config(object): 3 | def __init__(self): 4 | # model configs 5 | self.input_channels = 1 6 | self.increased_dim = 1 7 | self.final_out_channels = 128 8 | self.num_classes = 5 9 | self.num_classes_target = 8 10 | self.dropout = 0.2 11 | self.masking_ratio = 0.5 12 | self.lm = 3 # average length of masking subsequences 13 | 14 | self.kernel_size = 25 15 | self.stride = 3 16 | self.features_len = 127 17 | self.features_len_f = self.features_len 18 | 19 | self.TSlength_aligned = 178 20 | 21 | self.CNNoutput_channel = 10 # 90 # 10 for Epilepsy model 22 | 23 | # training configs 24 | self.num_epoch = 40 25 | 26 | # optimizer parameters 27 | self.optimizer = 'adam' 28 | self.beta1 = 0.9 29 | self.beta2 = 0.99 30 | self.lr = 3e-8 # 3e-4 31 | self.lr_f = self.lr 32 | 33 | # data parameters 34 | self.drop_last = True 35 | self.batch_size = 32 36 | 37 | """For Epilepsy, the target batchsize is 60""" 38 | self.target_batch_size = 32 # the size of target dataset (the # of samples used to fine-tune). 
39 | 40 | self.Context_Cont = Context_Cont_configs() 41 | self.TC = TC() 42 | self.augmentation = augmentations() 43 | 44 | 45 | class augmentations(object): 46 | def __init__(self): 47 | self.jitter_scale_ratio = 1.5 48 | self.jitter_ratio = 2 49 | self.max_seg = 12 50 | 51 | 52 | class Context_Cont_configs(object): 53 | def __init__(self): 54 | self.temperature = 0.2 55 | self.use_cosine_similarity = True 56 | 57 | 58 | class TC(object): 59 | def __init__(self): 60 | self.hidden_dim = 64 61 | self.timesteps = 50 62 | -------------------------------------------------------------------------------- /SimMTM_Classification/code/dataloader.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch.utils.data import DataLoader 3 | from torch.utils.data import Dataset 4 | import os 5 | import numpy as np 6 | 7 | 8 | class Load_Dataset(Dataset): 9 | # Initialize your data, download, etc. 10 | def __init__(self, dataset, config, training_mode, target_dataset_size=64, subset=False): 11 | super(Load_Dataset, self).__init__() 12 | self.training_mode = training_mode 13 | X_train = dataset["samples"] 14 | y_train = dataset["labels"] 15 | # shuffle 16 | data = list(zip(X_train, y_train)) 17 | np.random.shuffle(data) 18 | X_train, y_train = zip(*data) 19 | X_train, y_train = torch.stack(list(X_train), dim=0), torch.stack(list(y_train), dim=0) 20 | 21 | if len(X_train.shape) < 3: 22 | X_train = X_train.unsqueeze(2) 23 | 24 | if X_train.shape.index(min(X_train.shape)) != 1: # make sure the Channels in second dim 25 | X_train = X_train.permute(0, 2, 1) 26 | 27 | """Align the TS length between source and target datasets""" 28 | #X_train = X_train[:, :1, :int(config.TSlength_aligned)] # take the first 178 samples 29 | X_train = X_train[:, :1, :int(config.TSlength_aligned)] 30 | 31 | """Subset for debugging""" 32 | if subset == True: 33 | subset_size = target_dataset_size *10 34 | """if the dimension is larger than 178, take 
the first 178 dimensions. If multiple channels, take the first channel""" 35 | X_train = X_train[:subset_size] 36 | y_train = y_train[:subset_size] 37 | 38 | if isinstance(X_train, np.ndarray): 39 | self.x_data = torch.from_numpy(X_train) 40 | self.y_data = torch.from_numpy(y_train).long() 41 | else: 42 | self.x_data = X_train 43 | self.y_data = y_train 44 | 45 | self.len = X_train.shape[0] 46 | 47 | def __getitem__(self, index): 48 | return self.x_data[index], self.y_data[index] 49 | 50 | def __len__(self): 51 | return self.len 52 | 53 | 54 | def data_generator(sourcedata_path, targetdata_path, configs, training_mode, subset = True): 55 | 56 | train_dataset = torch.load(os.path.join(sourcedata_path, "train.pt")) 57 | finetune_dataset = torch.load(os.path.join(targetdata_path, "train.pt")) 58 | test_dataset = torch.load(os.path.join(targetdata_path, "test.pt")) 59 | """ Dataset notes: 60 | Epilepsy: train_dataset['samples'].shape = torch.Size([7360, 1, 178]); binary labels [7360] 61 | valid: [1840, 1, 178] 62 | test: [2300, 1, 178]. In test set, 1835 are positive sampels, the positive rate is 0.7978""" 63 | """sleepEDF: finetune_dataset['samples']: [7786, 1, 3000]""" 64 | 65 | # subset = True # if true, use a subset for debugging. 
66 | train_dataset = Load_Dataset(train_dataset, configs, training_mode, target_dataset_size=configs.batch_size, subset=subset) # for self-supervised, the data are augmented here 67 | finetune_dataset = Load_Dataset(finetune_dataset, configs, training_mode, target_dataset_size=configs.target_batch_size, subset=subset) 68 | if test_dataset['labels'].shape[0]>10*configs.target_batch_size: 69 | test_dataset = Load_Dataset(test_dataset, configs, training_mode, target_dataset_size=configs.target_batch_size*10, subset=subset) 70 | else: 71 | test_dataset = Load_Dataset(test_dataset, configs, training_mode, target_dataset_size=configs.target_batch_size, subset=subset) 72 | 73 | train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=configs.batch_size, 74 | shuffle=True, drop_last=configs.drop_last, 75 | num_workers=0) 76 | 77 | """the valid and test loader would be finetuning set and test set.""" 78 | valid_loader = torch.utils.data.DataLoader(dataset=finetune_dataset, batch_size=configs.target_batch_size, 79 | shuffle=True, drop_last=configs.drop_last, 80 | num_workers=0) 81 | 82 | test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=configs.target_batch_size, 83 | shuffle=True, drop_last=False, 84 | num_workers=0) 85 | 86 | return train_loader, valid_loader, test_loader -------------------------------------------------------------------------------- /SimMTM_Classification/code/layers/AutoCorrelation.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import math 4 | 5 | 6 | class AutoCorrelation(nn.Module): 7 | """ 8 | AutoCorrelation Mechanism with the following two phases: 9 | (1) period-based dependencies discovery 10 | (2) time delay aggregation 11 | This block can replace the self-attention family mechanism seamlessly. 
12 | """ 13 | def __init__(self, mask_flag=True, factor=1, scale=None, attention_dropout=0.1, output_attention=False): 14 | super(AutoCorrelation, self).__init__() 15 | self.factor = factor 16 | self.scale = scale 17 | self.mask_flag = mask_flag 18 | self.output_attention = output_attention 19 | self.dropout = nn.Dropout(attention_dropout) 20 | 21 | def time_delay_agg_training(self, values, corr): 22 | """ 23 | SpeedUp version of Autocorrelation (a batch-normalization style design) 24 | This is for the training phase. 25 | """ 26 | head = values.shape[1] 27 | channel = values.shape[2] 28 | length = values.shape[3] 29 | # find top k 30 | top_k = int(self.factor * math.log(length)) 31 | mean_value = torch.mean(torch.mean(corr, dim=1), dim=1) 32 | index = torch.topk(torch.mean(mean_value, dim=0), top_k, dim=-1)[1] 33 | weights = torch.stack([mean_value[:, index[i]] for i in range(top_k)], dim=-1) 34 | # update corr 35 | tmp_corr = torch.softmax(weights, dim=-1) 36 | # aggregation 37 | tmp_values = values 38 | delays_agg = torch.zeros_like(values).float() 39 | for i in range(top_k): 40 | pattern = torch.roll(tmp_values, -int(index[i]), -1) 41 | delays_agg = delays_agg + pattern * \ 42 | (tmp_corr[:, i].unsqueeze(1).unsqueeze(1).unsqueeze(1).repeat(1, head, channel, length)) 43 | return delays_agg 44 | 45 | def time_delay_agg_inference(self, values, corr): 46 | """ 47 | SpeedUp version of Autocorrelation (a batch-normalization style design) 48 | This is for the inference phase. 
49 | """ 50 | batch = values.shape[0] 51 | head = values.shape[1] 52 | channel = values.shape[2] 53 | length = values.shape[3] 54 | # index init 55 | init_index = torch.arange(length).unsqueeze(0).unsqueeze(0).unsqueeze(0)\ 56 | .repeat(batch, head, channel, 1).to(values.device) 57 | # find top k 58 | top_k = int(self.factor * math.log(length)) 59 | mean_value = torch.mean(torch.mean(corr, dim=1), dim=1) 60 | weights, delay = torch.topk(mean_value, top_k, dim=-1) 61 | # update corr 62 | tmp_corr = torch.softmax(weights, dim=-1) 63 | # aggregation 64 | tmp_values = values.repeat(1, 1, 1, 2) 65 | delays_agg = torch.zeros_like(values).float() 66 | for i in range(top_k): 67 | tmp_delay = init_index + delay[:, i].unsqueeze(1).unsqueeze(1).unsqueeze(1).repeat(1, head, channel, length) 68 | pattern = torch.gather(tmp_values, dim=-1, index=tmp_delay) 69 | delays_agg = delays_agg + pattern * \ 70 | (tmp_corr[:, i].unsqueeze(1).unsqueeze(1).unsqueeze(1).repeat(1, head, channel, length)) 71 | return delays_agg 72 | 73 | def time_delay_agg_full(self, values, corr): 74 | """ 75 | Standard version of Autocorrelation 76 | """ 77 | batch = values.shape[0] 78 | head = values.shape[1] 79 | channel = values.shape[2] 80 | length = values.shape[3] 81 | # index init 82 | init_index = torch.arange(length).unsqueeze(0).unsqueeze(0).unsqueeze(0)\ 83 | .repeat(batch, head, channel, 1).to(values.device) 84 | # find top k 85 | top_k = int(self.factor * math.log(length)) 86 | weights, delay = torch.topk(corr, top_k, dim=-1) 87 | # update corr 88 | tmp_corr = torch.softmax(weights, dim=-1) 89 | # aggregation 90 | tmp_values = values.repeat(1, 1, 1, 2) 91 | delays_agg = torch.zeros_like(values).float() 92 | for i in range(top_k): 93 | tmp_delay = init_index + delay[..., i].unsqueeze(-1) 94 | pattern = torch.gather(tmp_values, dim=-1, index=tmp_delay) 95 | delays_agg = delays_agg + pattern * (tmp_corr[..., i].unsqueeze(-1)) 96 | return delays_agg 97 | 98 | def forward(self, queries, keys, values, 
attn_mask): 99 | B, L, H, E = queries.shape 100 | _, S, _, D = values.shape 101 | if L > S: 102 | zeros = torch.zeros_like(queries[:, :(L - S), :]).float() 103 | values = torch.cat([values, zeros], dim=1) 104 | keys = torch.cat([keys, zeros], dim=1) 105 | else: 106 | values = values[:, :L, :, :] 107 | keys = keys[:, :L, :, :] 108 | 109 | # period-based dependencies 110 | q_fft = torch.fft.rfft(queries.permute(0, 2, 3, 1).contiguous(), dim=-1) 111 | k_fft = torch.fft.rfft(keys.permute(0, 2, 3, 1).contiguous(), dim=-1) 112 | res = q_fft * torch.conj(k_fft) 113 | corr = torch.fft.irfft(res, dim=-1) 114 | 115 | # time delay agg 116 | if self.training: 117 | V = self.time_delay_agg_training(values.permute(0, 2, 3, 1).contiguous(), corr).permute(0, 3, 1, 2) 118 | else: 119 | V = self.time_delay_agg_inference(values.permute(0, 2, 3, 1).contiguous(), corr).permute(0, 3, 1, 2) 120 | 121 | if self.output_attention: 122 | return (V.contiguous(), corr.permute(0, 3, 1, 2)) 123 | else: 124 | return (V.contiguous(), None) 125 | 126 | 127 | class AutoCorrelationLayer(nn.Module): 128 | def __init__(self, correlation, d_model, n_heads, d_keys=None, 129 | d_values=None): 130 | super(AutoCorrelationLayer, self).__init__() 131 | 132 | d_keys = d_keys or (d_model // n_heads) 133 | d_values = d_values or (d_model // n_heads) 134 | 135 | self.inner_correlation = correlation 136 | self.query_projection = nn.Linear(d_model, d_keys * n_heads) 137 | self.key_projection = nn.Linear(d_model, d_keys * n_heads) 138 | self.value_projection = nn.Linear(d_model, d_values * n_heads) 139 | self.out_projection = nn.Linear(d_values * n_heads, d_model) 140 | self.n_heads = n_heads 141 | 142 | def forward(self, queries, keys, values, attn_mask): 143 | B, L, _ = queries.shape 144 | _, S, _ = keys.shape 145 | H = self.n_heads 146 | 147 | queries = self.query_projection(queries).view(B, L, H, -1) 148 | keys = self.key_projection(keys).view(B, S, H, -1) 149 | values = self.value_projection(values).view(B, S, 
H, -1) 150 | 151 | out, attn = self.inner_correlation( 152 | queries, 153 | keys, 154 | values, 155 | attn_mask 156 | ) 157 | out = out.view(B, L, -1) 158 | 159 | return self.out_projection(out), attn 160 | -------------------------------------------------------------------------------- /SimMTM_Classification/code/layers/Autoformer_EncDec.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | 5 | 6 | class my_Layernorm(nn.Module): 7 | """ 8 | Special designed layernorm for the seasonal part 9 | """ 10 | def __init__(self, channels): 11 | super(my_Layernorm, self).__init__() 12 | self.layernorm = nn.LayerNorm(channels) 13 | 14 | def forward(self, x): 15 | x_hat = self.layernorm(x) 16 | bias = torch.mean(x_hat, dim=1).unsqueeze(1).repeat(1, x.shape[1], 1) 17 | return x_hat - bias 18 | 19 | 20 | class moving_avg(nn.Module): 21 | """ 22 | Moving average block to highlight the trend of time series 23 | """ 24 | def __init__(self, kernel_size, stride): 25 | super(moving_avg, self).__init__() 26 | self.kernel_size = kernel_size 27 | self.avg = nn.AvgPool1d(kernel_size=kernel_size, stride=stride, padding=0) 28 | 29 | def forward(self, x): 30 | # padding on the both ends of time series 31 | front = x[:, 0:1, :].repeat(1, (self.kernel_size - 1) // 2, 1) 32 | end = x[:, -1:, :].repeat(1, (self.kernel_size - 1) // 2, 1) 33 | x = torch.cat([front, x, end], dim=1) 34 | x = self.avg(x.permute(0, 2, 1)) 35 | x = x.permute(0, 2, 1) 36 | return x 37 | 38 | 39 | class series_decomp(nn.Module): 40 | """ 41 | Series decomposition block 42 | """ 43 | def __init__(self, kernel_size): 44 | super(series_decomp, self).__init__() 45 | self.moving_avg = moving_avg(kernel_size, stride=1) 46 | 47 | def forward(self, x): 48 | moving_mean = self.moving_avg(x) 49 | res = x - moving_mean 50 | return res, moving_mean 51 | 52 | 53 | class EncoderLayer(nn.Module): 54 | """ 55 | Autoformer 
encoder layer with the progressive decomposition architecture 56 | """ 57 | def __init__(self, attention, d_model, d_ff=None, moving_avg=25, dropout=0.1, activation="relu"): 58 | super(EncoderLayer, self).__init__() 59 | d_ff = d_ff or 4 * d_model 60 | self.attention = attention 61 | self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1, bias=False) 62 | self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1, bias=False) 63 | self.decomp1 = series_decomp(moving_avg) 64 | self.decomp2 = series_decomp(moving_avg) 65 | self.dropout = nn.Dropout(dropout) 66 | self.activation = F.relu if activation == "relu" else F.gelu 67 | 68 | def forward(self, x, attn_mask=None): 69 | new_x, attn = self.attention( 70 | x, x, x, 71 | attn_mask=attn_mask 72 | ) 73 | x = x + self.dropout(new_x) 74 | x, _ = self.decomp1(x) 75 | y = x 76 | y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1)))) 77 | y = self.dropout(self.conv2(y).transpose(-1, 1)) 78 | res, _ = self.decomp2(x + y) 79 | return res, attn 80 | 81 | 82 | class Encoder(nn.Module): 83 | """ 84 | Autoformer encoder 85 | """ 86 | def __init__(self, attn_layers, conv_layers=None, norm_layer=None): 87 | super(Encoder, self).__init__() 88 | self.attn_layers = nn.ModuleList(attn_layers) 89 | self.conv_layers = nn.ModuleList(conv_layers) if conv_layers is not None else None 90 | self.norm = norm_layer 91 | 92 | def forward(self, x, attn_mask=None): 93 | attns = [] 94 | if self.conv_layers is not None: 95 | for attn_layer, conv_layer in zip(self.attn_layers, self.conv_layers): 96 | x, attn = attn_layer(x, attn_mask=attn_mask) 97 | x = conv_layer(x) 98 | attns.append(attn) 99 | x, attn = self.attn_layers[-1](x) 100 | attns.append(attn) 101 | else: 102 | for attn_layer in self.attn_layers: 103 | x, attn = attn_layer(x, attn_mask=attn_mask) 104 | attns.append(attn) 105 | 106 | if self.norm is not None: 107 | x = self.norm(x) 108 | 109 | return x, attns 110 | 111 | 112 | class 
DecoderLayer(nn.Module): 113 | """ 114 | Autoformer decoder layer with the progressive decomposition architecture 115 | """ 116 | def __init__(self, self_attention, cross_attention, d_model, c_out, d_ff=None, 117 | moving_avg=25, dropout=0.1, activation="relu"): 118 | super(DecoderLayer, self).__init__() 119 | d_ff = d_ff or 4 * d_model 120 | self.self_attention = self_attention 121 | self.cross_attention = cross_attention 122 | self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1, bias=False) 123 | self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1, bias=False) 124 | self.decomp1 = series_decomp(moving_avg) 125 | self.decomp2 = series_decomp(moving_avg) 126 | self.decomp3 = series_decomp(moving_avg) 127 | self.dropout = nn.Dropout(dropout) 128 | self.projection = nn.Conv1d(in_channels=d_model, out_channels=c_out, kernel_size=3, stride=1, padding=1, 129 | padding_mode='circular', bias=False) 130 | self.activation = F.relu if activation == "relu" else F.gelu 131 | 132 | def forward(self, x, cross, x_mask=None, cross_mask=None): 133 | x = x + self.dropout(self.self_attention( 134 | x, x, x, 135 | attn_mask=x_mask 136 | )[0]) 137 | x, trend1 = self.decomp1(x) 138 | x = x + self.dropout(self.cross_attention( 139 | x, cross, cross, 140 | attn_mask=cross_mask 141 | )[0]) 142 | x, trend2 = self.decomp2(x) 143 | y = x 144 | y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1)))) 145 | y = self.dropout(self.conv2(y).transpose(-1, 1)) 146 | x, trend3 = self.decomp3(x + y) 147 | 148 | residual_trend = trend1 + trend2 + trend3 149 | residual_trend = self.projection(residual_trend.permute(0, 2, 1)).transpose(1, 2) 150 | return x, residual_trend 151 | 152 | 153 | class Decoder(nn.Module): 154 | """ 155 | Autoformer encoder 156 | """ 157 | def __init__(self, layers, norm_layer=None, projection=None): 158 | super(Decoder, self).__init__() 159 | self.layers = nn.ModuleList(layers) 160 | self.norm = norm_layer 161 | 
self.projection = projection 162 | 163 | def forward(self, x, cross, x_mask=None, cross_mask=None, trend=None): 164 | for layer in self.layers: 165 | x, residual_trend = layer(x, cross, x_mask=x_mask, cross_mask=cross_mask) 166 | trend = trend + residual_trend 167 | 168 | if self.norm is not None: 169 | x = self.norm(x) 170 | 171 | if self.projection is not None: 172 | x = self.projection(x) 173 | return x, trend 174 | -------------------------------------------------------------------------------- /SimMTM_Classification/code/layers/Embed.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import math 4 | 5 | 6 | class PositionalEmbedding(nn.Module): 7 | def __init__(self, d_model, max_len=5000): 8 | super(PositionalEmbedding, self).__init__() 9 | # Compute the positional encodings once in log space. 10 | pe = torch.zeros(max_len, d_model).float() 11 | pe.require_grad = False 12 | 13 | position = torch.arange(0, max_len).float().unsqueeze(1) 14 | div_term = (torch.arange(0, d_model, 2).float() * -(math.log(10000.0) / d_model)).exp() 15 | 16 | pe[:, 0::2] = torch.sin(position * div_term) 17 | pe[:, 1::2] = torch.cos(position * div_term) 18 | 19 | pe = pe.unsqueeze(0) 20 | self.register_buffer('pe', pe) 21 | 22 | def forward(self, x): 23 | return self.pe[:, :x.size(1)] 24 | 25 | 26 | class TokenEmbedding(nn.Module): 27 | def __init__(self, c_in, d_model): 28 | super(TokenEmbedding, self).__init__() 29 | padding = 1 if torch.__version__ >= '1.5.0' else 2 30 | self.tokenConv = nn.Conv1d(in_channels=c_in, out_channels=d_model, 31 | kernel_size=3, padding=padding, padding_mode='circular', bias=False) 32 | for m in self.modules(): 33 | if isinstance(m, nn.Conv1d): 34 | nn.init.kaiming_normal_(m.weight, mode='fan_in', nonlinearity='leaky_relu') 35 | 36 | def forward(self, x): 37 | x = self.tokenConv(x.permute(0, 2, 1)).transpose(1, 2) 38 | return x 39 | 40 | 41 | class 
FixedEmbedding(nn.Module): 42 | def __init__(self, c_in, d_model): 43 | super(FixedEmbedding, self).__init__() 44 | 45 | w = torch.zeros(c_in, d_model).float() 46 | w.require_grad = False 47 | 48 | position = torch.arange(0, c_in).float().unsqueeze(1) 49 | div_term = (torch.arange(0, d_model, 2).float() * -(math.log(10000.0) / d_model)).exp() 50 | 51 | w[:, 0::2] = torch.sin(position * div_term) 52 | w[:, 1::2] = torch.cos(position * div_term) 53 | 54 | self.emb = nn.Embedding(c_in, d_model) 55 | self.emb.weight = nn.Parameter(w, requires_grad=False) 56 | 57 | def forward(self, x): 58 | return self.emb(x).detach() 59 | 60 | 61 | class TemporalEmbedding(nn.Module): 62 | def __init__(self, d_model, embed_type='fixed', freq='h'): 63 | super(TemporalEmbedding, self).__init__() 64 | 65 | minute_size = 4 66 | hour_size = 24 67 | weekday_size = 7 68 | day_size = 32 69 | month_size = 13 70 | 71 | Embed = FixedEmbedding if embed_type == 'fixed' else nn.Embedding 72 | if freq == 't': 73 | self.minute_embed = Embed(minute_size, d_model) 74 | self.hour_embed = Embed(hour_size, d_model) 75 | self.weekday_embed = Embed(weekday_size, d_model) 76 | self.day_embed = Embed(day_size, d_model) 77 | self.month_embed = Embed(month_size, d_model) 78 | 79 | def forward(self, x): 80 | x = x.long() 81 | 82 | minute_x = self.minute_embed(x[:, :, 4]) if hasattr(self, 'minute_embed') else 0. 
83 | hour_x = self.hour_embed(x[:, :, 3]) 84 | weekday_x = self.weekday_embed(x[:, :, 2]) 85 | day_x = self.day_embed(x[:, :, 1]) 86 | month_x = self.month_embed(x[:, :, 0]) 87 | 88 | return hour_x + weekday_x + day_x + month_x + minute_x 89 | 90 | 91 | class TimeFeatureEmbedding(nn.Module): 92 | def __init__(self, d_model, embed_type='timeF', freq='h'): 93 | super(TimeFeatureEmbedding, self).__init__() 94 | 95 | freq_map = {'h': 4, 't': 5, 's': 6, 'm': 1, 'a': 1, 'w': 2, 'd': 3, 'b': 3} 96 | d_inp = freq_map[freq] 97 | self.embed = nn.Linear(d_inp, d_model, bias=False) 98 | 99 | def forward(self, x): 100 | return self.embed(x) 101 | 102 | 103 | class DataEmbedding(nn.Module): 104 | def __init__(self, c_in, d_model, embed_type='fixed', freq='h', dropout=0.1): 105 | super(DataEmbedding, self).__init__() 106 | 107 | self.value_embedding = TokenEmbedding(c_in=c_in, d_model=d_model) 108 | self.position_embedding = PositionalEmbedding(d_model=d_model) 109 | self.temporal_embedding = TemporalEmbedding(d_model=d_model, embed_type=embed_type, 110 | freq=freq) if embed_type != 'timeF' else TimeFeatureEmbedding( 111 | d_model=d_model, embed_type=embed_type, freq=freq) 112 | self.dropout = nn.Dropout(p=dropout) 113 | 114 | def forward(self, x): 115 | x = self.value_embedding(x) + self.position_embedding(x) 116 | return self.dropout(x) 117 | 118 | 119 | class DataEmbedding_wo_pos(nn.Module): 120 | def __init__(self, c_in, d_model, embed_type='fixed', freq='h', dropout=0.1): 121 | super(DataEmbedding_wo_pos, self).__init__() 122 | 123 | self.value_embedding = TokenEmbedding(c_in=c_in, d_model=d_model) 124 | self.position_embedding = PositionalEmbedding(d_model=d_model) 125 | self.temporal_embedding = TemporalEmbedding(d_model=d_model, embed_type=embed_type, 126 | freq=freq) if embed_type != 'timeF' else TimeFeatureEmbedding( 127 | d_model=d_model, embed_type=embed_type, freq=freq) 128 | self.dropout = nn.Dropout(p=dropout) 129 | 130 | def forward(self, x, x_mark): 131 | x = 
self.value_embedding(x) + self.temporal_embedding(x_mark) 132 | return self.dropout(x) 133 | -------------------------------------------------------------------------------- /SimMTM_Classification/code/layers/SelfAttention_Family.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import numpy as np 4 | from math import sqrt 5 | from utils.masking import TriangularCausalMask, ProbMask 6 | 7 | 8 | class FullAttention(nn.Module): 9 | def __init__(self, mask_flag=True, factor=5, scale=None, attention_dropout=0.1, output_attention=False): 10 | super(FullAttention, self).__init__() 11 | self.scale = scale 12 | self.mask_flag = mask_flag 13 | self.output_attention = output_attention 14 | self.dropout = nn.Dropout(attention_dropout) 15 | 16 | def forward(self, queries, keys, values, attn_mask): 17 | B, L, H, E = queries.shape 18 | _, S, _, D = values.shape 19 | scale = self.scale or 1. / sqrt(E) 20 | 21 | scores = torch.einsum("blhe,bshe->bhls", queries, keys) 22 | 23 | if self.mask_flag: 24 | if attn_mask is None: 25 | attn_mask = TriangularCausalMask(B, L, device=queries.device) 26 | 27 | scores.masked_fill_(attn_mask.mask, -np.inf) 28 | 29 | A = self.dropout(torch.softmax(scale * scores, dim=-1)) 30 | V = torch.einsum("bhls,bshd->blhd", A, values) 31 | 32 | if self.output_attention: 33 | return (V.contiguous(), A) 34 | else: 35 | return (V.contiguous(), None) 36 | 37 | 38 | class ProbAttention(nn.Module): 39 | def __init__(self, mask_flag=True, factor=5, scale=None, attention_dropout=0.1, output_attention=False): 40 | super(ProbAttention, self).__init__() 41 | self.factor = factor 42 | self.scale = scale 43 | self.mask_flag = mask_flag 44 | self.output_attention = output_attention 45 | self.dropout = nn.Dropout(attention_dropout) 46 | 47 | def _prob_QK(self, Q, K, sample_k, n_top): # n_top: c*ln(L_q) 48 | # Q [B, H, L, D] 49 | B, H, L_K, E = K.shape 50 | _, _, L_Q, _ = Q.shape 51 | 52 | # 
calculate the sampled Q_K 53 | K_expand = K.unsqueeze(-3).expand(B, H, L_Q, L_K, E) 54 | index_sample = torch.randint(L_K, (L_Q, sample_k)) # real U = U_part(factor*ln(L_k))*L_q 55 | K_sample = K_expand[:, :, torch.arange(L_Q).unsqueeze(1), index_sample, :] 56 | Q_K_sample = torch.matmul(Q.unsqueeze(-2), K_sample.transpose(-2, -1)).squeeze(-2) 57 | 58 | # find the top-k queries by the sparsity measurement 59 | M = Q_K_sample.max(-1)[0] - torch.div(Q_K_sample.sum(-1), L_K) 60 | M_top = M.topk(n_top, sorted=False)[1] 61 | 62 | # use the reduced Q to calculate Q_K 63 | Q_reduce = Q[torch.arange(B)[:, None, None], 64 | torch.arange(H)[None, :, None], 65 | M_top, :] # factor*ln(L_q) 66 | Q_K = torch.matmul(Q_reduce, K.transpose(-2, -1)) # factor*ln(L_q)*L_k 67 | 68 | return Q_K, M_top 69 | 70 | def _get_initial_context(self, V, L_Q): 71 | B, H, L_V, D = V.shape 72 | if not self.mask_flag: 73 | # V_sum = V.sum(dim=-2) 74 | V_sum = V.mean(dim=-2) 75 | contex = V_sum.unsqueeze(-2).expand(B, H, L_Q, V_sum.shape[-1]).clone() 76 | else: # use mask 77 | assert (L_Q == L_V) # requires that L_Q == L_V, i.e. 
for self-attention only 78 | contex = V.cumsum(dim=-2) 79 | return contex 80 | 81 | def _update_context(self, context_in, V, scores, index, L_Q, attn_mask): 82 | B, H, L_V, D = V.shape 83 | 84 | if self.mask_flag: 85 | attn_mask = ProbMask(B, H, L_Q, index, scores, device=V.device) 86 | scores.masked_fill_(attn_mask.mask, -np.inf) 87 | 88 | attn = torch.softmax(scores, dim=-1) # nn.Softmax(dim=-1)(scores) 89 | 90 | context_in[torch.arange(B)[:, None, None], 91 | torch.arange(H)[None, :, None], 92 | index, :] = torch.matmul(attn, V).type_as(context_in) 93 | if self.output_attention: 94 | attns = (torch.ones([B, H, L_V, L_V]) / L_V).type_as(attn).to(attn.device) 95 | attns[torch.arange(B)[:, None, None], torch.arange(H)[None, :, None], index, :] = attn 96 | return (context_in, attns) 97 | else: 98 | return (context_in, None) 99 | 100 | def forward(self, queries, keys, values, attn_mask): 101 | B, L_Q, H, D = queries.shape 102 | _, L_K, _, _ = keys.shape 103 | 104 | queries = queries.transpose(2, 1) 105 | keys = keys.transpose(2, 1) 106 | values = values.transpose(2, 1) 107 | 108 | U_part = self.factor * np.ceil(np.log(L_K)).astype('int').item() # c*ln(L_k) 109 | u = self.factor * np.ceil(np.log(L_Q)).astype('int').item() # c*ln(L_q) 110 | 111 | U_part = U_part if U_part < L_K else L_K 112 | u = u if u < L_Q else L_Q 113 | 114 | scores_top, index = self._prob_QK(queries, keys, sample_k=U_part, n_top=u) 115 | 116 | # add scale factor 117 | scale = self.scale or 1. 
/ sqrt(D) 118 | if scale is not None: 119 | scores_top = scores_top * scale 120 | # get the context 121 | context = self._get_initial_context(values, L_Q) 122 | # update the context with selected top_k queries 123 | context, attn = self._update_context(context, values, scores_top, index, L_Q, attn_mask) 124 | 125 | return context.contiguous(), attn 126 | 127 | 128 | class AttentionLayer(nn.Module): 129 | def __init__(self, attention, d_model, n_heads, d_keys=None, 130 | d_values=None): 131 | super(AttentionLayer, self).__init__() 132 | 133 | d_keys = d_keys or (d_model // n_heads) 134 | d_values = d_values or (d_model // n_heads) 135 | 136 | self.inner_attention = attention 137 | self.query_projection = nn.Linear(d_model, d_keys * n_heads) 138 | self.key_projection = nn.Linear(d_model, d_keys * n_heads) 139 | self.value_projection = nn.Linear(d_model, d_values * n_heads) 140 | self.out_projection = nn.Linear(d_values * n_heads, d_model) 141 | self.n_heads = n_heads 142 | 143 | def forward(self, queries, keys, values, attn_mask): 144 | B, L, _ = queries.shape 145 | _, S, _ = keys.shape 146 | H = self.n_heads 147 | 148 | queries = self.query_projection(queries).view(B, L, H, -1) 149 | keys = self.key_projection(keys).view(B, S, H, -1) 150 | values = self.value_projection(values).view(B, S, H, -1) 151 | 152 | out, attn = self.inner_attention( 153 | queries, 154 | keys, 155 | values, 156 | attn_mask 157 | ) 158 | out = out.view(B, L, -1) 159 | 160 | return self.out_projection(out), attn 161 | -------------------------------------------------------------------------------- /SimMTM_Classification/code/layers/Transformer_EncDec.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | import torch.nn.functional as F 3 | 4 | 5 | class ConvLayer(nn.Module): 6 | def __init__(self, c_in): 7 | super(ConvLayer, self).__init__() 8 | self.downConv = nn.Conv1d(in_channels=c_in, 9 | out_channels=c_in, 10 | kernel_size=3, 11 | 
padding=2, 12 | padding_mode='circular') 13 | self.norm = nn.BatchNorm1d(c_in) 14 | self.activation = nn.ELU() 15 | self.maxPool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1) 16 | 17 | def forward(self, x): 18 | x = self.downConv(x.permute(0, 2, 1)) 19 | x = self.norm(x) 20 | x = self.activation(x) 21 | x = self.maxPool(x) 22 | x = x.transpose(1, 2) 23 | return x 24 | 25 | 26 | class EncoderLayer(nn.Module): 27 | def __init__(self, attention, d_model, d_ff=None, dropout=0.1, activation="relu"): 28 | super(EncoderLayer, self).__init__() 29 | d_ff = d_ff or 4 * d_model 30 | self.attention = attention 31 | self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1) 32 | self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1) 33 | self.norm1 = nn.LayerNorm(d_model) 34 | self.norm2 = nn.LayerNorm(d_model) 35 | self.dropout = nn.Dropout(dropout) 36 | self.activation = F.relu if activation == "relu" else F.gelu 37 | 38 | def forward(self, x, attn_mask=None): 39 | new_x, attn = self.attention( 40 | x, x, x, 41 | attn_mask=attn_mask 42 | ) 43 | x = x + self.dropout(new_x) 44 | 45 | y = x = self.norm1(x) 46 | y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1)))) 47 | y = self.dropout(self.conv2(y).transpose(-1, 1)) 48 | 49 | return self.norm2(x + y), attn 50 | 51 | 52 | class Encoder(nn.Module): 53 | def __init__(self, attn_layers, conv_layers=None, norm_layer=None, projection=None): 54 | super(Encoder, self).__init__() 55 | self.attn_layers = nn.ModuleList(attn_layers) 56 | self.conv_layers = nn.ModuleList(conv_layers) if conv_layers is not None else None 57 | self.norm = norm_layer 58 | self.projection = projection 59 | 60 | def forward(self, x, attn_mask=None): 61 | # x [B, L, D] 62 | attns = [] 63 | if self.conv_layers is not None: 64 | for attn_layer, conv_layer in zip(self.attn_layers, self.conv_layers): 65 | x, attn = attn_layer(x, attn_mask=attn_mask) 66 | x = conv_layer(x) 67 | attns.append(attn) 68 | x, attn 
= self.attn_layers[-1](x) 69 | attns.append(attn) 70 | else: 71 | for attn_layer in self.attn_layers: 72 | x, attn = attn_layer(x, attn_mask=attn_mask) 73 | attns.append(attn) 74 | 75 | if self.norm is not None: 76 | x = self.norm(x) 77 | 78 | if self.projection is not None: 79 | x = self.projection(x) 80 | 81 | return x, attns 82 | 83 | 84 | class DecoderLayer(nn.Module): 85 | def __init__(self, self_attention, cross_attention, d_model, d_ff=None, 86 | dropout=0.1, activation="relu"): 87 | super(DecoderLayer, self).__init__() 88 | d_ff = d_ff or 4 * d_model 89 | self.self_attention = self_attention 90 | self.cross_attention = cross_attention 91 | self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1) 92 | self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1) 93 | self.norm1 = nn.LayerNorm(d_model) 94 | self.norm2 = nn.LayerNorm(d_model) 95 | self.norm3 = nn.LayerNorm(d_model) 96 | self.dropout = nn.Dropout(dropout) 97 | self.activation = F.relu if activation == "relu" else F.gelu 98 | 99 | def forward(self, x, cross, x_mask=None, cross_mask=None): 100 | x = x + self.dropout(self.self_attention( 101 | x, x, x, 102 | attn_mask=x_mask 103 | )[0]) 104 | x = self.norm1(x) 105 | 106 | x = x + self.dropout(self.cross_attention( 107 | x, cross, cross, 108 | attn_mask=cross_mask 109 | )[0]) 110 | 111 | y = x = self.norm2(x) 112 | y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1)))) 113 | y = self.dropout(self.conv2(y).transpose(-1, 1)) 114 | 115 | return self.norm3(x + y) 116 | 117 | 118 | class Decoder(nn.Module): 119 | def __init__(self, layers, norm_layer=None, projection=None): 120 | super(Decoder, self).__init__() 121 | self.layers = nn.ModuleList(layers) 122 | self.norm = norm_layer 123 | self.projection = projection 124 | 125 | def forward(self, x, cross, x_mask=None, cross_mask=None): 126 | for layer in self.layers: 127 | x = layer(x, cross, x_mask=x_mask, cross_mask=cross_mask) 128 | 129 | if self.norm is 
not None: 130 | x = self.norm(x) 131 | 132 | if self.projection is not None: 133 | x = self.projection(x) 134 | return x 135 | -------------------------------------------------------------------------------- /SimMTM_Classification/code/layers/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Classification/code/layers/__init__.py -------------------------------------------------------------------------------- /SimMTM_Classification/code/loss.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | import torch.nn.functional as F 4 | 5 | class AutomaticWeightedLoss(torch.nn.Module): 6 | """automatically weighted multi-task loss 7 | Params: 8 | num: int,the number of loss 9 | x: multi-task loss 10 | Examples: 11 | loss1=1 12 | loss2=2 13 | awl = AutomaticWeightedLoss(2) 14 | loss_sum = awl(loss1, loss2) 15 | """ 16 | 17 | def __init__(self, num=2): 18 | super(AutomaticWeightedLoss, self).__init__() 19 | params = torch.ones(num, requires_grad=True) 20 | self.params = torch.nn.Parameter(params) 21 | 22 | def forward(self, *x): 23 | loss_sum = 0 24 | for i, loss in enumerate(x): 25 | loss_sum += 0.5 / (self.params[i] ** 2) * loss + torch.log(1 + self.params[i] ** 2) 26 | return loss_sum 27 | 28 | 29 | class ContrastiveWeight(torch.nn.Module): 30 | 31 | def __init__(self, args): 32 | super(ContrastiveWeight, self).__init__() 33 | self.temperature = args.temperature 34 | 35 | self.bce = torch.nn.BCELoss() 36 | self.softmax = torch.nn.Softmax(dim=-1) 37 | self.log_softmax = torch.nn.LogSoftmax(dim=-1) 38 | self.kl = torch.nn.KLDivLoss(reduction='batchmean') 39 | self.positive_nums = args.positive_nums 40 | 41 | def get_positive_and_negative_mask(self, similarity_matrix, cur_batch_size): 42 | diag = np.eye(cur_batch_size) 43 | mask = torch.from_numpy(diag) 44 | mask 
= mask.type(torch.bool) 45 | 46 | oral_batch_size = cur_batch_size // (self.positive_nums + 1) 47 | 48 | positives_mask = np.zeros(similarity_matrix.size()) 49 | for i in range(self.positive_nums + 1): 50 | ll = np.eye(cur_batch_size, cur_batch_size, k=oral_batch_size * i) 51 | lr = np.eye(cur_batch_size, cur_batch_size, k=-oral_batch_size * i) 52 | positives_mask += ll 53 | positives_mask += lr 54 | 55 | positives_mask = torch.from_numpy(positives_mask).to(similarity_matrix.device) 56 | positives_mask[mask] = 0 57 | 58 | negatives_mask = 1 - positives_mask 59 | negatives_mask[mask] = 0 60 | 61 | return positives_mask.type(torch.bool), negatives_mask.type(torch.bool) 62 | 63 | def forward(self, batch_emb_om): 64 | cur_batch_shape = batch_emb_om.shape 65 | 66 | # get similarity matrix among mask samples 67 | norm_emb = F.normalize(batch_emb_om, dim=1) 68 | similarity_matrix = torch.matmul(norm_emb, norm_emb.transpose(0, 1)) 69 | 70 | # get positives and negatives similarity 71 | positives_mask, negatives_mask = self.get_positive_and_negative_mask(similarity_matrix, cur_batch_shape[0]) 72 | 73 | positives = similarity_matrix[positives_mask].view(cur_batch_shape[0], -1) 74 | negatives = similarity_matrix[negatives_mask].view(cur_batch_shape[0], -1) 75 | 76 | # generate predict and target probability distributions matrix 77 | logits = torch.cat((positives, negatives), dim=-1) 78 | y_true = torch.cat( 79 | (torch.ones(cur_batch_shape[0], positives.shape[-1]), torch.zeros(cur_batch_shape[0], negatives.shape[-1])), 80 | dim=-1).to(batch_emb_om.device).float() 81 | 82 | # multiple positives - KL divergence 83 | predict = self.log_softmax(logits / self.temperature) 84 | loss = self.kl(predict, y_true) 85 | 86 | return loss, similarity_matrix, logits, positives_mask 87 | 88 | 89 | class AggregationRebuild(torch.nn.Module): 90 | 91 | def __init__(self, args): 92 | super(AggregationRebuild, self).__init__() 93 | self.args = args 94 | self.temperature = args.temperature 95 | 
self.softmax = torch.nn.Softmax(dim=-1) 96 | self.mse = torch.nn.MSELoss() 97 | 98 | def forward(self, similarity_matrix, batch_emb_om): 99 | cur_batch_shape = batch_emb_om.shape 100 | 101 | # get the weight among (oral, oral's masks, others, others' masks) 102 | similarity_matrix /= self.temperature 103 | 104 | similarity_matrix = similarity_matrix - torch.eye(cur_batch_shape[0]).to( 105 | similarity_matrix.device).float() * 1e12 106 | rebuild_weight_matrix = self.softmax(similarity_matrix) 107 | 108 | batch_emb_om = batch_emb_om.reshape(cur_batch_shape[0], -1) 109 | 110 | # generate the rebuilt batch embedding (oral, others, oral's masks, others' masks) 111 | rebuild_batch_emb = torch.matmul(rebuild_weight_matrix, batch_emb_om) 112 | 113 | # get oral' rebuilt batch embedding 114 | rebuild_oral_batch_emb = rebuild_batch_emb.reshape(cur_batch_shape[0], cur_batch_shape[1], -1) 115 | 116 | return rebuild_weight_matrix, rebuild_oral_batch_emb -------------------------------------------------------------------------------- /SimMTM_Classification/code/main.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from datetime import datetime 3 | import argparse 4 | from utils.utils import _logger 5 | from model import TFC 6 | from dataloader import data_generator 7 | from trainer import Trainer 8 | import os 9 | import torch 10 | 11 | # Args selections 12 | start_time = datetime.now() 13 | parser = argparse.ArgumentParser() 14 | 15 | home_dir = os.getcwd() 16 | parser.add_argument('--run_description', default='run1', type=str, help='Experiment Description') 17 | parser.add_argument('--seed', default=2023, type=int, help='seed value') 18 | 19 | parser.add_argument('--training_mode', default='pre_train', type=str, help='pre_train, fine_tune') 20 | parser.add_argument('--pretrain_dataset', default='SleepEEG', type=str, 21 | help='Dataset of choice: SleepEEG, FD_A, HAR, ECG') 22 | parser.add_argument('--target_dataset', 
default='Epilepsy', type=str, 23 | help='Dataset of choice: Epilepsy, FD_B, Gesture, EMG') 24 | 25 | parser.add_argument('--logs_save_dir', default='experiments_logs', type=str, help='saving directory') 26 | parser.add_argument('--device', default='cuda', type=str, help='cpu or cuda') 27 | parser.add_argument('--home_path', default=home_dir, type=str, help='Project home directory') 28 | parser.add_argument('--subset', action='store_true', default=False, help='use the subset of datasets') 29 | parser.add_argument('--log_epoch', default=5, type=int, help='print loss and metrics') 30 | parser.add_argument('--draw_similar_matrix', default=10, type=int, help='draw similarity matrix') 31 | parser.add_argument('--pretrain_lr', default=0.0001, type=float, help='pretrain learning rate') 32 | parser.add_argument('--lr', default=0.0001, type=float, help='learning rate') 33 | parser.add_argument('--use_pretrain_epoch_dir', default=None, type=str, 34 | help='choose the pretrain checkpoint to finetune') 35 | parser.add_argument('--pretrain_epoch', default=10, type=int, help='pretrain epochs') 36 | parser.add_argument('--finetune_epoch', default=300, type=int, help='finetune epochs') 37 | 38 | parser.add_argument('--masking_ratio', default=0.5, type=float, help='masking ratio') 39 | parser.add_argument('--positive_nums', default=3, type=int, help='positive series numbers') 40 | parser.add_argument('--lm', default=3, type=int, help='average masked length') 41 | 42 | parser.add_argument('--finetune_result_file_name', default="finetune_result.json", type=str, 43 | help='finetune result json name') 44 | parser.add_argument('--temperature', type=float, default=0.2, help='temperature') 45 | 46 | 47 | def set_seed(seed): 48 | SEED = seed 49 | torch.manual_seed(SEED) 50 | torch.backends.cudnn.deterministic = False 51 | torch.backends.cudnn.benchmark = False 52 | np.random.seed(SEED) 53 | 54 | return seed 55 | 56 | 57 | def main(args, configs, seed=None): 58 | method = 'SimMTM' 59 | 
sourcedata = args.pretrain_dataset 60 | targetdata = args.target_dataset 61 | training_mode = args.training_mode 62 | run_description = args.run_description 63 | 64 | logs_save_dir = args.logs_save_dir 65 | masking_ratio = args.masking_ratio 66 | pretrain_lr = args.pretrain_lr 67 | pretrain_epoch = args.pretrain_epoch 68 | lr = args.lr 69 | finetune_epoch = args.finetune_epoch 70 | temperature = args.temperature 71 | experiment_description = f"{sourcedata}_2_{targetdata}" 72 | 73 | os.makedirs(logs_save_dir, exist_ok=True) 74 | 75 | # Load datasets 76 | sourcedata_path = f"./dataset/{sourcedata}" # './data/Epilepsy' 77 | targetdata_path = f"./dataset/{targetdata}" 78 | 79 | subset = args.subset # if subset= true, use a subset for debugging. 80 | train_dl, valid_dl, test_dl = data_generator(sourcedata_path, targetdata_path, configs, training_mode, 81 | subset=subset) 82 | 83 | # set seed 84 | if seed is not None: 85 | seed = set_seed(seed) 86 | else: 87 | seed = set_seed(args.seed) 88 | 89 | # experiments_logs/SleepEEG/run1/pre_train_2023_pt_0.5_0.0001_50_ft_0.0003_100 90 | experiment_log_dir = os.path.join(logs_save_dir, experiment_description, run_description, 91 | training_mode + f"_{seed}_pt_{masking_ratio}_{pretrain_lr}_{pretrain_epoch}_ft_{lr}_{finetune_epoch}") 92 | os.makedirs(experiment_log_dir, exist_ok=True) 93 | 94 | # Logging 95 | log_file_name = os.path.join(experiment_log_dir, f"logs_{datetime.now().strftime('%d_%m_%Y_%H_%M_%S')}.log") 96 | logger = _logger(log_file_name) 97 | logger.debug("=" * 45) 98 | logger.debug(f'Pre-training Dataset: {sourcedata}') 99 | logger.debug(f'Target (fine-tuning) Dataset: {targetdata}') 100 | logger.debug(f'Seed: {seed}') 101 | logger.debug(f'Method: {method}') 102 | logger.debug(f'Mode: {training_mode}') 103 | logger.debug(f'Pretrain Learning rate: {pretrain_lr}') 104 | logger.debug(f'Masking ratio: {masking_ratio}') 105 | logger.debug(f'Pretrain Epochs: {pretrain_epoch}') 106 | logger.debug(f'Finetune Learning rate: 
{lr}') 107 | logger.debug(f'Finetune Epochs: {finetune_epoch}') 108 | logger.debug(f'Temperature: {temperature}') 109 | logger.debug("=" * 45) 110 | 111 | # Load Model 112 | model = TFC(configs, args).to(device) 113 | params_group = [{'params': model.parameters()}] 114 | model_optimizer = torch.optim.Adam(params_group, lr=pretrain_lr, betas=(configs.beta1, configs.beta2), 115 | weight_decay=0) 116 | model_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer=model_optimizer, T_max=pretrain_epoch) 117 | 118 | # Trainer 119 | best_performance = Trainer(model, model_optimizer, model_scheduler, train_dl, valid_dl, test_dl, device, logger, 120 | args, configs, experiment_log_dir, seed) 121 | 122 | return best_performance 123 | 124 | 125 | if __name__ == '__main__': 126 | args, unknown = parser.parse_known_args() 127 | device = torch.device(args.device) 128 | exec (f'from config_files.{args.pretrain_dataset}_Configs import Config as Configs') 129 | configs = Configs() 130 | 131 | main(args, configs) 132 | 133 | -------------------------------------------------------------------------------- /SimMTM_Classification/code/model.py: -------------------------------------------------------------------------------- 1 | from torch import nn 2 | import torch 3 | from loss import ContrastiveWeight, AggregationRebuild, AutomaticWeightedLoss 4 | 5 | 6 | class TFC(nn.Module): 7 | def __init__(self, configs, args): 8 | super(TFC, self).__init__() 9 | self.training_mode = args.training_mode 10 | 11 | self.conv_block1 = nn.Sequential( 12 | nn.Conv1d(configs.input_channels, 32, kernel_size=configs.kernel_size, 13 | stride=configs.stride, bias=False, padding=(configs.kernel_size // 2)), 14 | nn.BatchNorm1d(32), 15 | nn.ReLU(), 16 | nn.MaxPool1d(kernel_size=2, stride=2, padding=1), 17 | nn.Dropout(configs.dropout) 18 | ) 19 | 20 | self.conv_block2 = nn.Sequential( 21 | nn.Conv1d(32, 64, kernel_size=8, stride=1, bias=False, padding=4), 22 | nn.BatchNorm1d(64), 23 | nn.ReLU(), 24 | 
nn.MaxPool1d(kernel_size=2, stride=2, padding=1) 25 | ) 26 | 27 | self.conv_block3 = nn.Sequential( 28 | nn.Conv1d(64, configs.final_out_channels, kernel_size=8, stride=1, bias=False, padding=4), 29 | nn.BatchNorm1d(configs.final_out_channels), 30 | nn.ReLU(), 31 | nn.MaxPool1d(kernel_size=2, stride=2, padding=1), 32 | ) 33 | 34 | self.dense = nn.Sequential( 35 | nn.Linear(configs.CNNoutput_channel * configs.final_out_channels, 256), 36 | nn.BatchNorm1d(256), 37 | nn.ReLU(), 38 | nn.Linear(256, 128) 39 | ) 40 | 41 | if self.training_mode == 'pre_train': 42 | self.awl = AutomaticWeightedLoss(2) 43 | self.contrastive = ContrastiveWeight(args) 44 | self.aggregation = AggregationRebuild(args) 45 | self.head = nn.Linear(1280, 178) 46 | self.mse = torch.nn.MSELoss() 47 | 48 | def forward(self, x_in_t, pretrain=False): 49 | 50 | if pretrain: 51 | x = self.conv_block1(x_in_t) 52 | x = self.conv_block2(x) 53 | x = self.conv_block3(x) 54 | 55 | h = x.reshape(x.shape[0], -1) 56 | z = self.dense(h) 57 | 58 | loss_cl, similarity_matrix, logits, positives_mask = self.contrastive(z) 59 | rebuild_weight_matrix, agg_x = self.aggregation(similarity_matrix, x) 60 | pred_x = self.head(agg_x.reshape(agg_x.size(0), -1)) 61 | 62 | loss_rb = self.mse(pred_x, x_in_t.reshape(x_in_t.size(0), -1).detach()) 63 | loss = self.awl(loss_cl, loss_rb) 64 | 65 | return loss, loss_cl, loss_rb 66 | else: 67 | x = self.conv_block1(x_in_t) 68 | x = self.conv_block2(x) 69 | x = self.conv_block3(x) 70 | 71 | h = x.reshape(x.shape[0], -1) 72 | z = self.dense(h) 73 | 74 | return h, z 75 | 76 | 77 | class target_classifier(nn.Module): # Classification head 78 | def __init__(self, configs): 79 | super(target_classifier, self).__init__() 80 | self.logits = nn.Linear(1280, 64) 81 | self.logits_simple = nn.Linear(64, configs.num_classes_target) 82 | 83 | def forward(self, emb): 84 | """2-layer MLP""" 85 | emb_flat = emb.reshape(emb.shape[0], -1) 86 | emb = torch.sigmoid(self.logits(emb_flat)) 87 | pred = 
self.logits_simple(emb) 88 | return pred 89 | -------------------------------------------------------------------------------- /SimMTM_Classification/code/utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Classification/code/utils/__init__.py -------------------------------------------------------------------------------- /SimMTM_Classification/code/utils/augmentations.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import math 4 | 5 | 6 | def data_transform_masked4cl(sample, masking_ratio, lm, positive_nums=None, distribution='geometric'): 7 | """Masked time series in time dimension""" 8 | 9 | if positive_nums is None: 10 | positive_nums = math.ceil(1.5 / (1 - masking_ratio)) 11 | 12 | sample = sample.permute(0, 2, 1) 13 | 14 | sample_repeat = sample.repeat(positive_nums, 1, 1) 15 | 16 | mask = noise_mask(sample_repeat, masking_ratio, lm, distribution=distribution) 17 | x_masked = mask * sample_repeat 18 | 19 | return x_masked.permute(0, 2, 1), mask.permute(0, 2, 1) 20 | 21 | 22 | def geom_noise_mask_single(L, lm, masking_ratio): 23 | """ 24 | Randomly create a boolean mask of length `L`, consisting of subsequences of average length lm, masking with 0s a `masking_ratio` 25 | proportion of the sequence L. The length of masking subsequences and intervals follow a geometric distribution. 26 | Args: 27 | L: length of mask and sequence to be masked 28 | lm: average length of masking subsequences (streaks of 0s) 29 | masking_ratio: proportion of L to be masked 30 | Returns: 31 | (L,) boolean numpy array intended to mask ('drop') with 0s a sequence of length L 32 | """ 33 | keep_mask = np.ones(L, dtype=bool) 34 | p_m = 1 / lm # probability of each masking sequence stopping. parameter of geometric distribution. 
35 | p_u = p_m * masking_ratio / ( 36 | 1 - masking_ratio) # probability of each unmasked sequence stopping. parameter of geometric distribution. 37 | p = [p_m, p_u] 38 | 39 | # Start in state 0 with masking_ratio probability 40 | state = int(np.random.rand() > masking_ratio) # state 0 means masking, 1 means not masking 41 | for i in range(L): 42 | keep_mask[i] = state # here it happens that state and masking value corresponding to state are identical 43 | if np.random.rand() < p[state]: 44 | state = 1 - state 45 | 46 | return keep_mask 47 | 48 | 49 | def noise_mask(X, masking_ratio=0.25, lm=3, distribution='geometric', exclude_feats=None): 50 | """ 51 | Creates a random boolean mask of the same shape as X, with 0s at places where a feature should be masked. 52 | Args: 53 | X: 3-D array or tensor holding a batch of samples; only its shape is used to size the mask 54 | masking_ratio: proportion of seq_length to be masked. At each time step, will also be the proportion of 55 | feat_dim that will be masked on average 56 | lm: average length of masking subsequences (streaks of 0s). Used only when `distribution` is 'geometric'. 57 | distribution: whether each mask sequence element is sampled independently at random, or whether 58 | sampling follows a Markov chain (and thus is stateful), resulting in geometric distributions of 59 | masked sequences of a desired mean length `lm` 60 | exclude_feats: iterable of indices corresponding to features to be excluded from masking (i.e. 
to remain all 1s) 61 | Returns: 62 | boolean torch tensor with the same shape as X, with 0s at places where a feature should be masked 63 | """ 64 | if exclude_feats is not None: 65 | exclude_feats = set(exclude_feats) 66 | 67 | if distribution == 'geometric': # stateful (Markov chain) 68 | mask = geom_noise_mask_single(X.shape[0] * X.shape[1] * X.shape[2], lm, masking_ratio) 69 | mask = mask.reshape(X.shape[0], X.shape[1], X.shape[2]) 70 | elif distribution == 'masked_tail': 71 | mask = np.ones(X.shape, dtype=bool) 72 | for m in range(X.shape[0]): # feature dimension 73 | 74 | keep_mask = np.zeros_like(mask[m, :], dtype=bool) 75 | n = math.ceil(keep_mask.shape[1] * (1 - masking_ratio)) 76 | keep_mask[:, :n] = True 77 | mask[m, :] = keep_mask # time dimension 78 | elif distribution == 'masked_head': 79 | mask = np.ones(X.shape, dtype=bool) 80 | for m in range(X.shape[0]): # feature dimension 81 | 82 | keep_mask = np.zeros_like(mask[m, :], dtype=bool) 83 | n = math.ceil(keep_mask.shape[1] * masking_ratio) 84 | keep_mask[:, n:] = True 85 | mask[m, :] = keep_mask # time dimension 86 | else: # each position is independent Bernoulli with p = 1 - masking_ratio 87 | mask = np.random.choice(np.array([True, False]), size=X.shape, replace=True, 88 | p=(1 - masking_ratio, masking_ratio)) 89 | return torch.tensor(mask) -------------------------------------------------------------------------------- /SimMTM_Classification/code/utils/loss.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import numpy as np 4 | import random 5 | import math 6 | import torch.nn.functional as F 7 | 8 | class AutomaticWeightedLoss(nn.Module): 9 | """automatically weighted multi-task loss 10 | Params: 11 | num: int, the number of losses 12 | x: multi-task loss 13 | Examples: 14 | loss1=1 15 | loss2=2 16 | awl = AutomaticWeightedLoss(2) 17 | loss_sum = awl(loss1, loss2) 18 | """ 19 | def __init__(self, num=2): 20 | 
super(AutomaticWeightedLoss, self).__init__() 21 | params = torch.ones(num, requires_grad=True) 22 | self.params = nn.Parameter(params) 23 | 24 | def forward(self, *x): 25 | loss_sum = 0 26 | for i, loss in enumerate(x): 27 | loss_sum += 0.5 / (self.params[i] ** 2) * loss + torch.log(1 + self.params[i] ** 2) 28 | return loss_sum 29 | 30 | 31 | class ContrastiveLoss(nn.Module): 32 | 33 | def __init__(self, device, args): 34 | super(ContrastiveLoss, self).__init__() 35 | self.device = device 36 | self.temperature = args.temperature 37 | 38 | self.bce = torch.nn.BCELoss() 39 | self.softmax = torch.nn.Softmax(dim=-1) 40 | self.log_softmax = torch.nn.LogSoftmax(dim=-1) 41 | 42 | self.kl = torch.nn.KLDivLoss(reduction='batchmean') 43 | 44 | def get_positive_and_negative_mask(self, similarity_matrix, cur_batch_size, oral_batch_size): 45 | 46 | diag = np.eye(cur_batch_size) 47 | mask = torch.from_numpy(diag) 48 | mask = mask.type(torch.bool) 49 | 50 | positives_mask = np.zeros(similarity_matrix.size()) 51 | for i in range(cur_batch_size//oral_batch_size): 52 | ll = np.eye(cur_batch_size, cur_batch_size, k=oral_batch_size*i) 53 | lr = np.eye(cur_batch_size, cur_batch_size, k=-oral_batch_size*i) 54 | positives_mask += ll 55 | positives_mask += lr 56 | 57 | positives_mask = torch.from_numpy(positives_mask) 58 | positives_mask[mask] = 0 59 | 60 | negatives_mask = 1 - positives_mask 61 | negatives_mask[mask] = 0 62 | 63 | return positives_mask.type(torch.bool), negatives_mask.type(torch.bool) 64 | 65 | def forward(self, batch_emb_om, batch_x): 66 | 67 | cur_batch_shape = batch_emb_om.shape 68 | oral_batch_shape = batch_x.shape 69 | 70 | # get similarity matrix among mask samples 71 | norm_emb = F.normalize(batch_emb_om, dim=1) 72 | similarity_matrix = torch.matmul(norm_emb, norm_emb.transpose(0, 1)) 73 | 74 | # get positives and negatives similarity 75 | positives_mask, negatives_mask = self.get_positive_and_negative_mask(similarity_matrix, cur_batch_shape[0], 
oral_batch_shape[0]) 76 | 77 | positives = similarity_matrix[positives_mask].view(cur_batch_shape[0], -1) 78 | negatives = similarity_matrix[negatives_mask].view(cur_batch_shape[0], -1) 79 | 80 | # generate predict and target probability distributions matrix 81 | logits = torch.cat((positives, negatives), dim=-1) 82 | y_true = torch.cat((torch.ones(cur_batch_shape[0], positives.shape[-1]) / positives.shape[-1], torch.zeros(cur_batch_shape[0], negatives.shape[-1])), dim=-1).to(self.device).float() 83 | 84 | # multiple positives - KL divergence 85 | predict = self.log_softmax(logits / self.temperature) 86 | loss = self.kl(predict, y_true) 87 | 88 | return loss, similarity_matrix, logits 89 | 90 | 91 | class RebuildLoss(torch.nn.Module): 92 | 93 | def __init__(self, device, args): 94 | super(RebuildLoss, self).__init__() 95 | self.args = args 96 | self.device = device 97 | self.temperature = args.temperature 98 | 99 | self.softmax = torch.nn.Softmax(dim=-1) 100 | self.mse = torch.nn.MSELoss() 101 | 102 | def forward(self, similarity_matrix, batch_emb_om, batch_emb_o, batch_x): 103 | 104 | cur_batch_shape = batch_emb_om.shape 105 | oral_batch_shape = batch_x.shape 106 | 107 | # get the weight among (oral, oral's masks, others, others' masks) 108 | similarity_matrix /= self.temperature 109 | similarity_matrix = similarity_matrix - torch.eye(cur_batch_shape[0]).to(self.device).float() * 1e12 110 | rebuild_weight_matrix = self.softmax(similarity_matrix) 111 | 112 | batch_emb_om = batch_emb_om.view(cur_batch_shape[0], -1) 113 | 114 | # generate the rebuilt batch embedding (oral, others, oral's masks, others' masks) 115 | rebuild_batch_emb = torch.matmul(rebuild_weight_matrix, batch_emb_om) 116 | 117 | # get oral' rebuilt batch embedding 118 | rebuild_oral_batch_emb = rebuild_batch_emb[:oral_batch_shape[0]].reshape(oral_batch_shape[0], cur_batch_shape[1], -1) 119 | 120 | # MSE Loss 121 | if self.args.rbtp == 0: 122 | loss = self.mse(rebuild_oral_batch_emb, 
batch_emb_o.detach()) 123 | elif self.args.rbtp == 1: 124 | loss = self.mse(rebuild_oral_batch_emb, batch_x.detach()) 125 | 126 | return loss, rebuild_weight_matrix 127 | 128 | -------------------------------------------------------------------------------- /SimMTM_Classification/code/utils/masking.py: -------------------------------------------------------------------------------- 1 | import torch 2 | 3 | 4 | class TriangularCausalMask(): 5 | def __init__(self, B, L, device="cpu"): 6 | mask_shape = [B, 1, L, L] 7 | with torch.no_grad(): 8 | self._mask = torch.triu(torch.ones(mask_shape, dtype=torch.bool), diagonal=1).to(device) 9 | 10 | @property 11 | def mask(self): 12 | return self._mask 13 | 14 | 15 | class ProbMask(): 16 | def __init__(self, B, H, L, index, scores, device="cpu"): 17 | _mask = torch.ones(L, scores.shape[-1], dtype=torch.bool).to(device).triu(1) 18 | _mask_ex = _mask[None, None, :].expand(B, H, L, scores.shape[-1]) 19 | indicator = _mask_ex[torch.arange(B)[:, None, None], 20 | torch.arange(H)[None, :, None], 21 | index, :].to(device) 22 | self._mask = indicator.view(scores.shape).to(device) 23 | 24 | @property 25 | def mask(self): 26 | return self._mask 27 | -------------------------------------------------------------------------------- /SimMTM_Classification/code/utils/metrics.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | def RSE(pred, true): 5 | return np.sqrt(np.sum((true - pred) ** 2)) / np.sqrt(np.sum((true - true.mean()) ** 2)) 6 | 7 | 8 | def CORR(pred, true): 9 | u = ((true - true.mean(0)) * (pred - pred.mean(0))).sum(0) 10 | d = np.sqrt(((true - true.mean(0)) ** 2).sum(0) * ((pred - pred.mean(0)) ** 2).sum(0)) 11 | return (u / d).mean(-1) 12 | 13 | 14 | def MAE(pred, true): 15 | return np.mean(np.abs(pred - true)) 16 | 17 | 18 | def MSE(pred, true): 19 | return np.mean((pred - true) ** 2) 20 | 21 | 22 | def RMSE(pred, true): 23 | return np.sqrt(MSE(pred, true)) 24 | 25 | 
26 | def MAPE(pred, true): 27 | return np.mean(np.abs((pred - true) / true)) 28 | 29 | 30 | def MSPE(pred, true): 31 | return np.mean(np.square((pred - true) / true)) 32 | 33 | 34 | def metric(pred, true): 35 | mae = MAE(pred, true) 36 | mse = MSE(pred, true) 37 | rmse = RMSE(pred, true) 38 | mape = MAPE(pred, true) 39 | mspe = MSPE(pred, true) 40 | 41 | return mae, mse, rmse, mape, mspe 42 | -------------------------------------------------------------------------------- /SimMTM_Classification/code/utils/timefeatures.py: -------------------------------------------------------------------------------- 1 | from typing import List 2 | 3 | import numpy as np 4 | import pandas as pd 5 | from pandas.tseries import offsets 6 | from pandas.tseries.frequencies import to_offset 7 | 8 | 9 | class TimeFeature: 10 | def __init__(self): 11 | pass 12 | 13 | def __call__(self, index: pd.DatetimeIndex) -> np.ndarray: 14 | pass 15 | 16 | def __repr__(self): 17 | return self.__class__.__name__ + "()" 18 | 19 | 20 | class SecondOfMinute(TimeFeature): 21 | """Second of minute encoded as value between [-0.5, 0.5]""" 22 | 23 | def __call__(self, index: pd.DatetimeIndex) -> np.ndarray: 24 | return index.second / 59.0 - 0.5 25 | 26 | 27 | class MinuteOfHour(TimeFeature): 28 | """Minute of hour encoded as value between [-0.5, 0.5]""" 29 | 30 | def __call__(self, index: pd.DatetimeIndex) -> np.ndarray: 31 | return index.minute / 59.0 - 0.5 32 | 33 | 34 | class HourOfDay(TimeFeature): 35 | """Hour of day encoded as value between [-0.5, 0.5]""" 36 | 37 | def __call__(self, index: pd.DatetimeIndex) -> np.ndarray: 38 | return index.hour / 23.0 - 0.5 39 | 40 | 41 | class DayOfWeek(TimeFeature): 42 | """Day of week encoded as value between [-0.5, 0.5]""" 43 | 44 | def __call__(self, index: pd.DatetimeIndex) -> np.ndarray: 45 | return index.dayofweek / 6.0 - 0.5 46 | 47 | 48 | class DayOfMonth(TimeFeature): 49 | """Day of month encoded as value between [-0.5, 0.5]""" 50 | 51 | def __call__(self, 
index: pd.DatetimeIndex) -> np.ndarray: 52 | return (index.day - 1) / 30.0 - 0.5 53 | 54 | 55 | class DayOfYear(TimeFeature): 56 | """Day of year encoded as value between [-0.5, 0.5]""" 57 | 58 | def __call__(self, index: pd.DatetimeIndex) -> np.ndarray: 59 | return (index.dayofyear - 1) / 365.0 - 0.5 60 | 61 | 62 | class MonthOfYear(TimeFeature): 63 | """Month of year encoded as value between [-0.5, 0.5]""" 64 | 65 | def __call__(self, index: pd.DatetimeIndex) -> np.ndarray: 66 | return (index.month - 1) / 11.0 - 0.5 67 | 68 | 69 | class WeekOfYear(TimeFeature): 70 | """Week of year encoded as value between [-0.5, 0.5]""" 71 | 72 | def __call__(self, index: pd.DatetimeIndex) -> np.ndarray: 73 | return (index.isocalendar().week - 1) / 52.0 - 0.5 74 | 75 | 76 | def time_features_from_frequency_str(freq_str: str) -> List[TimeFeature]: 77 | """ 78 | Returns a list of time features that will be appropriate for the given frequency string. 79 | Parameters 80 | ---------- 81 | freq_str 82 | Frequency string of the form [multiple][granularity] such as "12H", "5min", "1D" etc. 
83 | """ 84 | 85 | features_by_offsets = { 86 | offsets.YearEnd: [], 87 | offsets.QuarterEnd: [MonthOfYear], 88 | offsets.MonthEnd: [MonthOfYear], 89 | offsets.Week: [DayOfMonth, WeekOfYear], 90 | offsets.Day: [DayOfWeek, DayOfMonth, DayOfYear], 91 | offsets.BusinessDay: [DayOfWeek, DayOfMonth, DayOfYear], 92 | offsets.Hour: [HourOfDay, DayOfWeek, DayOfMonth, DayOfYear], 93 | offsets.Minute: [ 94 | MinuteOfHour, 95 | HourOfDay, 96 | DayOfWeek, 97 | DayOfMonth, 98 | DayOfYear, 99 | ], 100 | offsets.Second: [ 101 | SecondOfMinute, 102 | MinuteOfHour, 103 | HourOfDay, 104 | DayOfWeek, 105 | DayOfMonth, 106 | DayOfYear, 107 | ], 108 | } 109 | 110 | offset = to_offset(freq_str) 111 | 112 | for offset_type, feature_classes in features_by_offsets.items(): 113 | if isinstance(offset, offset_type): 114 | return [cls() for cls in feature_classes] 115 | 116 | supported_freq_msg = f""" 117 | Unsupported frequency {freq_str} 118 | The following frequencies are supported: 119 | Y - yearly 120 | alias: A 121 | M - monthly 122 | W - weekly 123 | D - daily 124 | B - business days 125 | H - hourly 126 | T - minutely 127 | alias: min 128 | S - secondly 129 | """ 130 | raise RuntimeError(supported_freq_msg) 131 | 132 | 133 | def time_features(dates, freq='h'): 134 | return np.vstack([feat(dates) for feat in time_features_from_frequency_str(freq)]) 135 | -------------------------------------------------------------------------------- /SimMTM_Classification/code/utils/tools.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import matplotlib.pyplot as plt 4 | 5 | plt.switch_backend('agg') 6 | 7 | def adjust_learning_rate(optimizer, epoch, args, learning_rate): 8 | # lr = args.learning_rate * (0.2 ** (epoch // 2)) 9 | 10 | if args.lradj == 'type1': 11 | lr_adjust = {epoch: learning_rate * (0.5 ** ((epoch - 1) // 1))} 12 | 13 | if epoch in lr_adjust.keys(): 14 | lr = lr_adjust[epoch] 15 | for param_group in 
optimizer.param_groups: 16 | param_group['lr'] = lr 17 | print('Updating learning rate to {}'.format(lr)) 18 | elif args.lradj == 'type2': 19 | lr_adjust = { 20 | 2: 5e-5, 4: 1e-5, 6: 5e-6, 8: 1e-6, 21 | 10: 5e-7, 15: 1e-7, 20: 5e-8 22 | } 23 | 24 | if epoch in lr_adjust.keys(): 25 | lr = lr_adjust[epoch] 26 | for param_group in optimizer.param_groups: 27 | param_group['lr'] = lr 28 | print('Updating learning rate to {}'.format(lr)) 29 | 30 | 31 | class EarlyStopping: 32 | def __init__(self, patience=7, verbose=False, delta=0): 33 | self.patience = patience 34 | self.verbose = verbose 35 | self.counter = 0 36 | self.best_score = None 37 | self.early_stop = False 38 | self.val_loss_min = np.inf 39 | self.delta = delta 40 | 41 | def __call__(self, val_loss, model, path, pred_len): 42 | score = -val_loss 43 | if self.best_score is None: 44 | self.best_score = score 45 | self.save_checkpoint(val_loss, model, path, pred_len) 46 | elif score < self.best_score + self.delta: 47 | self.counter += 1 48 | print(f'EarlyStopping counter: {self.counter} out of {self.patience}') 49 | if self.counter >= self.patience: 50 | self.early_stop = True 51 | else: 52 | self.best_score = score 53 | self.save_checkpoint(val_loss, model, path, pred_len) 54 | self.counter = 0 55 | 56 | def save_checkpoint(self, val_loss, model, path, pred_len): 57 | if self.verbose: 58 | print(f'Validation loss decreased ({self.val_loss_min:.6f} --> {val_loss:.6f}). 
Saving model ...') 59 | torch.save(model.state_dict(), path + '/' + f'checkpoint_{pred_len}.pth') 60 | self.val_loss_min = val_loss -------------------------------------------------------------------------------- /SimMTM_Classification/code/utils/utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import random 3 | import numpy as np 4 | import pandas as pd 5 | import os 6 | import sys 7 | import logging 8 | from sklearn.metrics import classification_report, cohen_kappa_score, confusion_matrix, accuracy_score 9 | from shutil import copy 10 | 11 | def set_requires_grad(model, dict_, requires_grad=True): 12 | for param in model.named_parameters(): 13 | if param[0] in dict_: 14 | param[1].requires_grad = requires_grad 15 | 16 | 17 | def fix_randomness(SEED): 18 | random.seed(SEED) 19 | np.random.seed(SEED) 20 | torch.manual_seed(SEED) 21 | torch.cuda.manual_seed(SEED) 22 | torch.backends.cudnn.deterministic = True 23 | 24 | 25 | def epoch_time(start_time, end_time): 26 | elapsed_time = end_time - start_time 27 | elapsed_mins = int(elapsed_time / 60) 28 | elapsed_secs = int(elapsed_time - (elapsed_mins * 60)) 29 | return elapsed_mins, elapsed_secs 30 | 31 | 32 | def _calc_metrics(pred_labels, true_labels, log_dir, home_path): 33 | pred_labels = np.array(pred_labels).astype(int) 34 | true_labels = np.array(true_labels).astype(int) 35 | 36 | # save targets 37 | labels_save_path = os.path.join(log_dir, "labels") 38 | os.makedirs(labels_save_path, exist_ok=True) 39 | np.save(os.path.join(labels_save_path, "predicted_labels.npy"), pred_labels) 40 | np.save(os.path.join(labels_save_path, "true_labels.npy"), true_labels) 41 | 42 | r = classification_report(true_labels, pred_labels, digits=6, output_dict=True) 43 | cm = confusion_matrix(true_labels, pred_labels) 44 | df = pd.DataFrame(r) 45 | df["cohen"] = cohen_kappa_score(true_labels, pred_labels) 46 | df["accuracy"] = accuracy_score(true_labels, pred_labels) 47 | 
df = df * 100 48 | 49 | # save classification report 50 | exp_name = os.path.split(os.path.dirname(log_dir))[-1] 51 | training_mode = os.path.basename(log_dir) 52 | file_name = f"{exp_name}_{training_mode}_classification_report.xlsx" 53 | report_Save_path = os.path.join(home_path, log_dir, file_name) 54 | df.to_excel(report_Save_path) 55 | 56 | # save confusion matrix 57 | cm_file_name = f"{exp_name}_{training_mode}_confusion_matrix.torch" 58 | cm_Save_path = os.path.join(home_path, log_dir, cm_file_name) 59 | torch.save(cm, cm_Save_path) 60 | 61 | 62 | def _logger(logger_name, level=logging.DEBUG): 63 | """ 64 | Method to return a custom logger with the given name and level 65 | """ 66 | logger = logging.getLogger(logger_name) 67 | logger.setLevel(level) 68 | # format_string = ("%(asctime)s — %(name)s — %(levelname)s — %(funcName)s:" 69 | # "%(lineno)d — %(message)s") 70 | format_string = "%(message)s" 71 | log_format = logging.Formatter(format_string) 72 | # Creating and adding the console handler 73 | console_handler = logging.StreamHandler(sys.stdout) 74 | console_handler.setFormatter(log_format) 75 | logger.addHandler(console_handler) 76 | # Creating and adding the file handler 77 | file_handler = logging.FileHandler(logger_name, mode='a') 78 | file_handler.setFormatter(log_format) 79 | logger.addHandler(file_handler) 80 | return logger 81 | 82 | 83 | 84 | 85 | 86 | def copy_Files(destination, data_type): 87 | # destination: 'experiments_logs/Exp1/run1' 88 | destination_dir = os.path.join(destination, "model_files") 89 | os.makedirs(destination_dir, exist_ok=True) 90 | copy("code/main.py", os.path.join(destination_dir, "main.py")) 91 | copy("code/trainer.py", os.path.join(destination_dir, "trainer.py")) 92 | copy(f"code/config_files/{data_type}_Configs.py", os.path.join(destination_dir, f"{data_type}_Configs.py")) 93 | copy("code/augmentations.py", os.path.join(destination_dir, "augmentations.py")) 94 | copy("code/dataloader.py", os.path.join(destination_dir, 
"dataloader.py")) 95 | copy(f"code/model.py", os.path.join(destination_dir, f"model.py")) 96 | copy("code/loss.py", os.path.join(destination_dir, "loss.py")) 97 | copy("code/TC.py", os.path.join(destination_dir, "TC.py")) 98 | -------------------------------------------------------------------------------- /SimMTM_Classification/download_datasets.sh: -------------------------------------------------------------------------------- 1 | wget -O SleepEEG.zip https://figshare.com/ndownloader/articles/19930178/versions/1 2 | wget -O Epilepsy.zip https://figshare.com/ndownloader/articles/19930199/versions/1 3 | wget -O FD-A.zip https://figshare.com/ndownloader/articles/19930205/versions/1 4 | wget -O FD-B.zip https://figshare.com/ndownloader/articles/19930226/versions/1 5 | wget -O HAR.zip https://figshare.com/ndownloader/articles/19930244/versions/1 6 | wget -O Gesture.zip https://figshare.com/ndownloader/articles/19930247/versions/1 7 | wget -O ECG.zip https://figshare.com/ndownloader/articles/19930253/versions/1 8 | wget -O EMG.zip https://figshare.com/ndownloader/articles/19930250/versions/1 9 | 10 | unzip SleepEEG.zip -d datasets/SleepEEG/ 11 | unzip Epilepsy.zip -d datasets/Epilepsy/ 12 | unzip FD-A.zip -d datasets/FD-A/ 13 | unzip FD-B.zip -d datasets/FD-B/ 14 | unzip HAR.zip -d datasets/HAR/ 15 | unzip Gesture.zip -d datasets/Gesture/ 16 | unzip ECG.zip -d datasets/ECG/ 17 | unzip EMG.zip -d datasets/EMG/ 18 | 19 | #rm {SleepEEG,Epilepsy,FD-A,FD-B,HAR,Gesture,ECG,EMG}.zip 20 | 21 | 22 | 23 | 24 | -------------------------------------------------------------------------------- /SimMTM_Classification/run.sh: -------------------------------------------------------------------------------- 1 | python ./code/main.py --target_dataset Epilepsy --finetune_epoch 100 2 | python ./code/main.py --target_dataset FD-B --lr 0.0003 3 | python ./code/main.py --target_dataset Gesture 4 | python ./code/main.py --target_dataset EMG --lr 0.0003 5 | 
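The four invocations in run.sh differ only in their flags, so the same runs can also be generated from a small table. The following is a minimal sketch, assuming the `./code/main.py` entry point and the flag names shown in run.sh; `RUNS` and `build_commands` are hypothetical names that do not exist in the repository, and the script only prints the commands rather than executing them.

```python
import shlex

# Hypothetical table mirroring run.sh: one dictionary of flags per fine-tuning run.
RUNS = [
    {"target_dataset": "Epilepsy", "finetune_epoch": 100},
    {"target_dataset": "FD-B", "lr": 0.0003},
    {"target_dataset": "Gesture"},
    {"target_dataset": "EMG", "lr": 0.0003},
]


def build_commands(runs, script="./code/main.py"):
    """Turn each flag dictionary into an argv list for main.py."""
    commands = []
    for flags in runs:
        argv = ["python", script]
        for name, value in flags.items():
            argv += [f"--{name}", str(value)]
        commands.append(argv)
    return commands


if __name__ == "__main__":
    for argv in build_commands(RUNS):
        # Print only; hand argv to subprocess.run(argv, check=True) to execute it.
        print(shlex.join(argv))
```

Keeping the flags in one place makes it easier to add a dataset or change a learning rate without editing several near-identical command lines.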
-------------------------------------------------------------------------------- /SimMTM_Forecasting/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/.DS_Store -------------------------------------------------------------------------------- /SimMTM_Forecasting/.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | pip-wheel-metadata/ 24 | share/python-wheels/ 25 | *.egg-info/ 26 | .installed.cfg 27 | *.egg 28 | MANIFEST 29 | 30 | # PyInstaller 31 | # Usually these files are written by a python script from a template 32 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
33 | *.manifest 34 | *.spec 35 | 36 | # Installer logs 37 | pip-log.txt 38 | pip-delete-this-directory.txt 39 | 40 | # Unit test / coverage reports 41 | htmlcov/ 42 | .tox/ 43 | .nox/ 44 | .coverage 45 | .coverage.* 46 | .cache 47 | nosetests.xml 48 | coverage.xml 49 | *.cover 50 | *.py,cover 51 | .hypothesis/ 52 | .pytest_cache/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | target/ 76 | 77 | # Jupyter Notebook 78 | .ipynb_checkpoints 79 | 80 | # IPython 81 | profile_default/ 82 | ipython_config.py 83 | 84 | # pyenv 85 | .python-version 86 | 87 | # pipenv 88 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 89 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 90 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 91 | # install all needed dependencies. 92 | #Pipfile.lock 93 | 94 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow 95 | __pypackages__/ 96 | 97 | # Celery stuff 98 | celerybeat-schedule 99 | celerybeat.pid 100 | 101 | # SageMath parsed files 102 | *.sage.py 103 | 104 | # Environments 105 | .env 106 | .venv 107 | env/ 108 | venv/ 109 | ENV/ 110 | env.bak/ 111 | venv.bak/ 112 | 113 | # Spyder project settings 114 | .spyderproject 115 | .spyproject 116 | 117 | # Rope project settings 118 | .ropeproject 119 | 120 | # mkdocs documentation 121 | /site 122 | 123 | # mypy 124 | .mypy_cache/ 125 | .dmypy.json 126 | dmypy.json 127 | 128 | # Pyre type checker 129 | .pyre/ 130 | /scripts/long_term_forecast/Traffic_script/PatchTST1.sh 131 | /backups/ 132 | /result.xlsx 133 | /~$result.xlsx 134 | /Time-Series-Library.zip 135 | /temp.sh 136 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/data_provider/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/data_provider/data_factory.py: -------------------------------------------------------------------------------- 1 | from data_provider.data_loader import Dataset_ETT_hour, Dataset_ETT_minute, Dataset_Custom, Dataset_M4, PSMSegLoader, \ 2 | MSLSegLoader, SMAPSegLoader, SMDSegLoader, SWATSegLoader, UEAloader 3 | from data_provider.uea import collate_fn 4 | from torch.utils.data import DataLoader 5 | 6 | data_dict = { 7 | 'ETTh1': Dataset_ETT_hour, 8 | 'ETTh2': Dataset_ETT_hour, 9 | 'ETTm1': Dataset_ETT_minute, 10 | 'ETTm2': Dataset_ETT_minute, 11 | 'Traffic': Dataset_Custom, 12 | 'Exchange': Dataset_Custom, 13 | 'Weather': Dataset_Custom, 14 | 'ECL': Dataset_Custom, 15 | 'ILI': Dataset_Custom, 16 | 'm4': Dataset_M4, 17 | 'PSM': PSMSegLoader, 18 | 'MSL': MSLSegLoader, 19 | 'SMAP': SMAPSegLoader, 20 | 'SMD': SMDSegLoader, 21 | 'SWAT': SWATSegLoader, 22 | 'UEA': UEAloader, 23 | } 24 | 25 | 26 | def 
data_provider(args, flag): 27 | Data = data_dict[args.data] 28 | 29 | timeenc = 0 if args.embed != 'timeF' else 1 30 | 31 | if flag == 'test': 32 | shuffle_flag = False 33 | drop_last = True 34 | if args.task_name == 'anomaly_detection' or args.task_name == 'classification': 35 | batch_size = args.batch_size 36 | else: 37 | batch_size = 1 # bsz=1 for evaluation 38 | freq = args.freq 39 | else: 40 | shuffle_flag = True 41 | drop_last = True 42 | batch_size = args.batch_size # bsz for train and valid 43 | freq = args.freq 44 | 45 | if args.task_name == 'anomaly_detection': 46 | drop_last = False 47 | data_set = Data( 48 | root_path=args.root_path, 49 | win_size=args.seq_len, 50 | flag=flag, 51 | ) 52 | print(flag, len(data_set)) 53 | data_loader = DataLoader( 54 | data_set, 55 | batch_size=batch_size, 56 | shuffle=shuffle_flag, 57 | num_workers=args.num_workers, 58 | drop_last=drop_last) 59 | return data_set, data_loader 60 | elif args.task_name == 'classification': 61 | drop_last = False 62 | data_set = Data( 63 | root_path=args.root_path, 64 | flag=flag, 65 | ) 66 | print(flag, len(data_set)) 67 | data_loader = DataLoader( 68 | data_set, 69 | batch_size=batch_size, 70 | shuffle=shuffle_flag, 71 | num_workers=args.num_workers, 72 | drop_last=drop_last, 73 | collate_fn=lambda x: collate_fn(x, max_len=args.seq_len) 74 | ) 75 | return data_set, data_loader 76 | else: 77 | if args.data == 'm4': 78 | drop_last = False 79 | 80 | data_set = Data( 81 | root_path=args.root_path, 82 | data_path=args.data_path, 83 | flag=flag, 84 | size=[args.seq_len, args.label_len, args.pred_len], 85 | features=args.features, 86 | target=args.target, 87 | timeenc=timeenc, 88 | freq=freq, 89 | seasonal_patterns=args.seasonal_patterns 90 | ) 91 | 92 | data_loader = DataLoader( 93 | data_set, 94 | batch_size=batch_size, 95 | shuffle=shuffle_flag, 96 | num_workers=args.num_workers, 97 | drop_last=drop_last) 98 | 99 | print(flag, len(data_set), len(data_loader)) 100 | return data_set, data_loader 
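The branching in `data_provider` decides three loader settings (shuffle, drop_last, batch size) from the split flag, the task name, and the dataset. The sketch below restates that logic in isolation so it can be checked without the datasets; `loader_settings` is a hypothetical helper that is not part of the repository, and the forecasting task names used in the usage note are assumptions, since the function only tests for `anomaly_detection` and `classification`.

```python
def loader_settings(task_name, flag, batch_size, data="ETTh1"):
    """Return (shuffle, drop_last, batch_size) as data_provider would choose them."""
    if flag == "test":
        shuffle, drop_last = False, True
        if task_name not in ("anomaly_detection", "classification"):
            batch_size = 1  # other tasks are evaluated with a batch size of 1
    else:
        shuffle, drop_last = True, True
    # Anomaly detection, classification and the m4 dataset keep the last partial batch.
    if task_name in ("anomaly_detection", "classification") or data == "m4":
        drop_last = False
    return shuffle, drop_last, batch_size
```

For example, `loader_settings("long_term_forecast", "test", 32)` gives `(False, True, 1)`, while the same call with `"classification"` keeps the batch size at 32 and sets `drop_last` to `False`.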
101 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/data_provider/m4.py: -------------------------------------------------------------------------------- 1 | # This source code is provided for the purposes of scientific reproducibility 2 | # under the following limited license from Element AI Inc. The code is an 3 | # implementation of the N-BEATS model (Oreshkin et al., N-BEATS: Neural basis 4 | # expansion analysis for interpretable time series forecasting, 5 | # https://arxiv.org/abs/1905.10437). The copyright to the source code is 6 | # licensed under the Creative Commons - Attribution-NonCommercial 4.0 7 | # International license (CC BY-NC 4.0): 8 | # https://creativecommons.org/licenses/by-nc/4.0/. Any commercial use (whether 9 | # for the benefit of third parties or internally in production) requires an 10 | # explicit license. The subject-matter of the N-BEATS model and associated 11 | # materials are the property of Element AI Inc. and may be subject to patent 12 | # protection. No license to patents is granted hereunder (whether express or 13 | # implied). Copyright © 2020 Element AI Inc. All rights reserved. 14 | 15 | """ 16 | M4 Dataset 17 | """ 18 | from dataclasses import dataclass 19 | 20 | import numpy as np 21 | import pandas as pd 22 | import logging 23 | import os 24 | import pathlib 25 | import sys 26 | from urllib import request 27 | 28 | 29 | def url_file_name(url: str) -> str: 30 | """ 31 | Extract file name from url. 32 | 33 | :param url: URL to extract file name from. 34 | :return: File name. 35 | """ 36 | return url.split('/')[-1] if len(url) > 0 else '' 37 | 38 | 39 | def download(url: str, file_path: str) -> None: 40 | """ 41 | Download a file to the given path. 42 | 43 | :param url: URL to download 44 | :param file_path: Where to download the content. 
45 | """ 46 | 47 | def progress(count, block_size, total_size): 48 | progress_pct = float(count * block_size) / float(total_size) * 100.0 49 | sys.stdout.write('\rDownloading {} to {} {:.1f}%'.format(url, file_path, progress_pct)) 50 | sys.stdout.flush() 51 | 52 | if not os.path.isfile(file_path): 53 | opener = request.build_opener() 54 | opener.addheaders = [('User-agent', 'Mozilla/5.0')] 55 | request.install_opener(opener) 56 | pathlib.Path(os.path.dirname(file_path)).mkdir(parents=True, exist_ok=True) 57 | f, _ = request.urlretrieve(url, file_path, progress) 58 | sys.stdout.write('\n') 59 | sys.stdout.flush() 60 | file_info = os.stat(f) 61 | logging.info(f'Successfully downloaded {os.path.basename(file_path)} {file_info.st_size} bytes.') 62 | else: 63 | file_info = os.stat(file_path) 64 | logging.info(f'File already exists: {file_path} {file_info.st_size} bytes.') 65 | 66 | 67 | @dataclass() 68 | class M4Dataset: 69 | ids: np.ndarray 70 | groups: np.ndarray 71 | frequencies: np.ndarray 72 | horizons: np.ndarray 73 | values: np.ndarray 74 | 75 | @staticmethod 76 | def load(training: bool = True, dataset_file: str = '../dataset/m4') -> 'M4Dataset': 77 | """ 78 | Load cached dataset. 79 | 80 | :param training: Load training part if training is True, test part otherwise. 
81 | """ 82 | info_file = os.path.join(dataset_file, 'M4-info.csv') 83 | train_cache_file = os.path.join(dataset_file, 'training.npz') 84 | test_cache_file = os.path.join(dataset_file, 'test.npz') 85 | m4_info = pd.read_csv(info_file) 86 | return M4Dataset(ids=m4_info.M4id.values, 87 | groups=m4_info.SP.values, 88 | frequencies=m4_info.Frequency.values, 89 | horizons=m4_info.Horizon.values, 90 | values=np.load(train_cache_file if training else test_cache_file, allow_pickle=True)) 91 | 92 | 93 | @dataclass() 94 | class M4Meta: 95 | seasonal_patterns = ['Yearly', 'Quarterly', 'Monthly', 'Weekly', 'Daily', 'Hourly'] 96 | horizons = [6, 8, 18, 13, 14, 48] 97 | frequencies = [1, 4, 12, 1, 1, 24] 98 | horizons_map = { 99 | 'Yearly': 6, 100 | 'Quarterly': 8, 101 | 'Monthly': 18, 102 | 'Weekly': 13, 103 | 'Daily': 14, 104 | 'Hourly': 48 105 | } # different predict length 106 | frequency_map = { 107 | 'Yearly': 1, 108 | 'Quarterly': 4, 109 | 'Monthly': 12, 110 | 'Weekly': 1, 111 | 'Daily': 1, 112 | 'Hourly': 24 113 | } 114 | history_size = { 115 | 'Yearly': 1.5, 116 | 'Quarterly': 1.5, 117 | 'Monthly': 1.5, 118 | 'Weekly': 10, 119 | 'Daily': 10, 120 | 'Hourly': 10 121 | } # from interpretable.gin 122 | 123 | 124 | def load_m4_info(info_file: str = '../dataset/m4/M4-info.csv') -> pd.DataFrame: 125 | """ 126 | Load M4Info file. 127 | :param info_file: Path to M4-info.csv (default assumed to match M4Dataset.load). 128 | :return: Pandas DataFrame of M4Info. 129 | """ 130 | return pd.read_csv(info_file) 131 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/data_provider/uea.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | import pandas as pd 4 | import torch 5 | 6 | 7 | def collate_fn(data, max_len=None): 8 | """Build mini-batch tensors from a list of (X, y) tuples, padding or clipping X to a common length. 9 | Args: 10 | data: len(batch_size) list of tuples (X, y). 11 | - X: torch tensor of shape (seq_length, feat_dim); variable seq_length. 
12 | - y: torch tensor of shape (num_labels,) : class indices or numerical targets 13 | (for classification or regression, respectively). num_labels > 1 for multi-task models 14 | max_len: global fixed sequence length. Used for architectures requiring fixed length input, 15 | where the batch length cannot vary dynamically. Longer sequences are clipped, shorter are padded with 0s 16 | Returns: 17 | X: (batch_size, padded_length, feat_dim) torch tensor of zero-padded (or clipped) features 18 | targets: (batch_size, num_labels) torch tensor of stacked labels 19 | padding_masks: (batch_size, padded_length) boolean tensor, 20 | 1 means keep vector at this position, 21 | 0 means padding 22 | """ 23 | 24 | batch_size = len(data) 25 | features, labels = zip(*data) 26 | 27 | # Stack and pad features and masks (convert 2D to 3D tensors, i.e. 
add batch dimension) 28 | lengths = [X.shape[0] for X in features] # original sequence length for each time series 29 | if max_len is None: 30 | max_len = max(lengths) 31 | X = torch.zeros(batch_size, max_len, features[0].shape[-1]) # (batch_size, padded_length, feat_dim) 32 | for i in range(batch_size): 33 | end = min(lengths[i], max_len) 34 | X[i, :end, :] = features[i][:end, :] 35 | 36 | targets = torch.stack(labels, dim=0) # (batch_size, num_labels) 37 | 38 | padding_masks = padding_mask(torch.tensor(lengths, dtype=torch.int16), 39 | max_len=max_len) # (batch_size, padded_length) boolean tensor, "1" means keep 40 | 41 | return X, targets, padding_masks 42 | 43 | 44 | def padding_mask(lengths, max_len=None): 45 | """ 46 | Used to mask padded positions: creates a (batch_size, max_len) boolean mask from a tensor of sequence lengths, 47 | where 1 means keep element at this position (time step) 48 | """ 49 | batch_size = lengths.numel() 50 | max_len = max_len or lengths.max() # trick works because of overloading of 'or' operator for non-boolean types 51 | return (torch.arange(0, max_len, device=lengths.device) 52 | .type_as(lengths) 53 | .repeat(batch_size, 1) 54 | .lt(lengths.unsqueeze(1))) 55 | 56 | 57 | class Normalizer(object): 58 | """ 59 | Normalizes dataframe across ALL contained rows (time steps). Different from per-sample normalization. 60 | """ 61 | 62 | def __init__(self, norm_type='standardization', mean=None, std=None, min_val=None, max_val=None): 63 | """ 64 | Args: 65 | norm_type: choose from: 66 | "standardization", "minmax": normalizes dataframe across ALL contained rows (time steps) 67 | "per_sample_std", "per_sample_minmax": normalizes each sample separately (i.e. 
across only its own rows) 68 | mean, std, min_val, max_val: optional (num_feat,) Series of pre-computed values 69 | """ 70 | 71 | self.norm_type = norm_type 72 | self.mean = mean 73 | self.std = std 74 | self.min_val = min_val 75 | self.max_val = max_val 76 | 77 | def normalize(self, df): 78 | """ 79 | Args: 80 | df: input dataframe 81 | Returns: 82 | df: normalized dataframe 83 | """ 84 | if self.norm_type == "standardization": 85 | if self.mean is None: 86 | self.mean = df.mean() 87 | self.std = df.std() 88 | return (df - self.mean) / (self.std + np.finfo(float).eps) 89 | 90 | elif self.norm_type == "minmax": 91 | if self.max_val is None: 92 | self.max_val = df.max() 93 | self.min_val = df.min() 94 | return (df - self.min_val) / (self.max_val - self.min_val + np.finfo(float).eps) 95 | 96 | elif self.norm_type == "per_sample_std": 97 | grouped = df.groupby(by=df.index) 98 | return (df - grouped.transform('mean')) / grouped.transform('std') 99 | 100 | elif self.norm_type == "per_sample_minmax": 101 | grouped = df.groupby(by=df.index) 102 | min_vals = grouped.transform('min') 103 | return (df - min_vals) / (grouped.transform('max') - min_vals + np.finfo(float).eps) 104 | 105 | else: 106 | raise (NameError(f'Normalize method "{self.norm_type}" not implemented')) 107 | 108 | 109 | def interpolate_missing(y): 110 | """ 111 | Replaces NaN values in pd.Series `y` using linear interpolation 112 | """ 113 | if y.isna().any(): 114 | y = y.interpolate(method='linear', limit_direction='both') 115 | return y 116 | 117 | 118 | def subsample(y, limit=256, factor=2): 119 | """ 120 | If a given Series is longer than `limit`, returns subsampled sequence by the specified integer factor 121 | """ 122 | if len(y) > limit: 123 | return y[::factor].reset_index(drop=True) 124 | return y 125 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/exp/.DS_Store: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/exp/.DS_Store -------------------------------------------------------------------------------- /SimMTM_Forecasting/exp/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/exp/__init__.py -------------------------------------------------------------------------------- /SimMTM_Forecasting/exp/exp_basic.py: -------------------------------------------------------------------------------- 1 | import os 2 | import torch 3 | from models import SimMTM 4 | 5 | 6 | class Exp_Basic(object): 7 | def __init__(self, args): 8 | self.args = args 9 | self.model_dict = {'SimMTM': SimMTM} 10 | self.device = self._acquire_device() 11 | self.model = self._build_model().to(self.device) 12 | 13 | def _build_model(self): 14 | raise NotImplementedError 15 | return None 16 | 17 | def _acquire_device(self): 18 | if self.args.use_gpu: 19 | os.environ["CUDA_VISIBLE_DEVICES"] = str(self.args.gpu) if not self.args.use_multi_gpu else self.args.devices 20 | device = torch.device('cuda:{}'.format(self.args.gpu)) 21 | print('Use GPU: cuda:{}'.format(self.args.gpu)) 22 | else: 23 | device = torch.device('cpu') 24 | print('Use CPU') 25 | return device 26 | 27 | def _get_data(self): 28 | pass 29 | 30 | def vali(self): 31 | pass 32 | 33 | def train(self): 34 | pass 35 | 36 | def test(self): 37 | pass 38 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/layers/AutoCorrelation.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | import matplotlib.pyplot as plt 5 | import numpy as np 6 | import math 7 | from math import sqrt 8 | import os 9 | 10 | 11 | class AutoCorrelation(nn.Module): 12 | 
""" 13 | AutoCorrelation Mechanism with the following two phases: 14 | (1) period-based dependencies discovery 15 | (2) time delay aggregation 16 | This block can replace the self-attention family mechanism seamlessly. 17 | """ 18 | 19 | def __init__(self, mask_flag=True, factor=1, scale=None, attention_dropout=0.1, output_attention=False): 20 | super(AutoCorrelation, self).__init__() 21 | self.factor = factor 22 | self.scale = scale 23 | self.mask_flag = mask_flag 24 | self.output_attention = output_attention 25 | self.dropout = nn.Dropout(attention_dropout) 26 | 27 | def time_delay_agg_training(self, values, corr): 28 | """ 29 | SpeedUp version of Autocorrelation (a batch-normalization style design) 30 | This is for the training phase. 31 | """ 32 | head = values.shape[1] 33 | channel = values.shape[2] 34 | length = values.shape[3] 35 | # find top k 36 | top_k = int(self.factor * math.log(length)) 37 | mean_value = torch.mean(torch.mean(corr, dim=1), dim=1) 38 | index = torch.topk(torch.mean(mean_value, dim=0), top_k, dim=-1)[1] 39 | weights = torch.stack([mean_value[:, index[i]] for i in range(top_k)], dim=-1) 40 | # update corr 41 | tmp_corr = torch.softmax(weights, dim=-1) 42 | # aggregation 43 | tmp_values = values 44 | delays_agg = torch.zeros_like(values).float() 45 | for i in range(top_k): 46 | pattern = torch.roll(tmp_values, -int(index[i]), -1) 47 | delays_agg = delays_agg + pattern * \ 48 | (tmp_corr[:, i].unsqueeze(1).unsqueeze(1).unsqueeze(1).repeat(1, head, channel, length)) 49 | return delays_agg 50 | 51 | def time_delay_agg_inference(self, values, corr): 52 | """ 53 | SpeedUp version of Autocorrelation (a batch-normalization style design) 54 | This is for the inference phase. 
55 | """ 56 | batch = values.shape[0] 57 | head = values.shape[1] 58 | channel = values.shape[2] 59 | length = values.shape[3] 60 | # index init 61 | init_index = torch.arange(length).unsqueeze(0).unsqueeze(0).unsqueeze(0).repeat(batch, head, channel, 1).to(values.device) 62 | # find top k 63 | top_k = int(self.factor * math.log(length)) 64 | mean_value = torch.mean(torch.mean(corr, dim=1), dim=1) 65 | weights, delay = torch.topk(mean_value, top_k, dim=-1) 66 | # update corr 67 | tmp_corr = torch.softmax(weights, dim=-1) 68 | # aggregation 69 | tmp_values = values.repeat(1, 1, 1, 2) 70 | delays_agg = torch.zeros_like(values).float() 71 | for i in range(top_k): 72 | tmp_delay = init_index + delay[:, i].unsqueeze(1).unsqueeze(1).unsqueeze(1).repeat(1, head, channel, length) 73 | pattern = torch.gather(tmp_values, dim=-1, index=tmp_delay) 74 | delays_agg = delays_agg + pattern * \ 75 | (tmp_corr[:, i].unsqueeze(1).unsqueeze(1).unsqueeze(1).repeat(1, head, channel, length)) 76 | return delays_agg 77 | 78 | def time_delay_agg_full(self, values, corr): 79 | """ 80 | Standard version of Autocorrelation 81 | """ 82 | batch = values.shape[0] 83 | head = values.shape[1] 84 | channel = values.shape[2] 85 | length = values.shape[3] 86 | # index init 87 | init_index = torch.arange(length).unsqueeze(0).unsqueeze(0).unsqueeze(0).repeat(batch, head, channel, 1).to(values.device) 88 | # find top k 89 | top_k = int(self.factor * math.log(length)) 90 | weights, delay = torch.topk(corr, top_k, dim=-1) 91 | # update corr 92 | tmp_corr = torch.softmax(weights, dim=-1) 93 | # aggregation 94 | tmp_values = values.repeat(1, 1, 1, 2) 95 | delays_agg = torch.zeros_like(values).float() 96 | for i in range(top_k): 97 | tmp_delay = init_index + delay[..., i].unsqueeze(-1) 98 | pattern = torch.gather(tmp_values, dim=-1, index=tmp_delay) 99 | delays_agg = delays_agg + pattern * (tmp_corr[..., i].unsqueeze(-1)) 100 | return delays_agg 101 | 102 | def forward(self, queries, keys, values, attn_mask): 103 | B, L, H, E = 
queries.shape 104 | _, S, _, D = values.shape 105 | if L > S: 106 | zeros = torch.zeros_like(queries[:, :(L - S), :]).float() 107 | values = torch.cat([values, zeros], dim=1) 108 | keys = torch.cat([keys, zeros], dim=1) 109 | else: 110 | values = values[:, :L, :, :] 111 | keys = keys[:, :L, :, :] 112 | 113 | # period-based dependencies 114 | q_fft = torch.fft.rfft(queries.permute(0, 2, 3, 1).contiguous(), dim=-1) 115 | k_fft = torch.fft.rfft(keys.permute(0, 2, 3, 1).contiguous(), dim=-1) 116 | res = q_fft * torch.conj(k_fft) 117 | corr = torch.fft.irfft(res, dim=-1) 118 | 119 | # time delay agg 120 | if self.training: 121 | V = self.time_delay_agg_training(values.permute(0, 2, 3, 1).contiguous(), corr).permute(0, 3, 1, 2) 122 | else: 123 | V = self.time_delay_agg_inference(values.permute(0, 2, 3, 1).contiguous(), corr).permute(0, 3, 1, 2) 124 | 125 | if self.output_attention: 126 | return (V.contiguous(), corr.permute(0, 3, 1, 2)) 127 | else: 128 | return (V.contiguous(), None) 129 | 130 | 131 | class AutoCorrelationLayer(nn.Module): 132 | def __init__(self, correlation, d_model, n_heads, d_keys=None, 133 | d_values=None): 134 | super(AutoCorrelationLayer, self).__init__() 135 | 136 | d_keys = d_keys or (d_model // n_heads) 137 | d_values = d_values or (d_model // n_heads) 138 | 139 | self.inner_correlation = correlation 140 | self.query_projection = nn.Linear(d_model, d_keys * n_heads) 141 | self.key_projection = nn.Linear(d_model, d_keys * n_heads) 142 | self.value_projection = nn.Linear(d_model, d_values * n_heads) 143 | self.out_projection = nn.Linear(d_values * n_heads, d_model) 144 | self.n_heads = n_heads 145 | 146 | def forward(self, queries, keys, values, attn_mask): 147 | B, L, _ = queries.shape 148 | _, S, _ = keys.shape 149 | H = self.n_heads 150 | 151 | queries = self.query_projection(queries).view(B, L, H, -1) 152 | keys = self.key_projection(keys).view(B, S, H, -1) 153 | values = self.value_projection(values).view(B, S, H, -1) 154 | 155 | out, attn = 
self.inner_correlation( 156 | queries, 157 | keys, 158 | values, 159 | attn_mask 160 | ) 161 | out = out.view(B, L, -1) 162 | 163 | return self.out_projection(out), attn 164 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/layers/Autoformer_EncDec.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | 5 | 6 | class my_Layernorm(nn.Module): 7 | """ 8 | Special designed layernorm for the seasonal part 9 | """ 10 | 11 | def __init__(self, channels): 12 | super(my_Layernorm, self).__init__() 13 | self.layernorm = nn.LayerNorm(channels) 14 | 15 | def forward(self, x): 16 | x_hat = self.layernorm(x) 17 | bias = torch.mean(x_hat, dim=1).unsqueeze(1).repeat(1, x.shape[1], 1) 18 | return x_hat - bias 19 | 20 | 21 | class moving_avg(nn.Module): 22 | """ 23 | Moving average block to highlight the trend of time series 24 | """ 25 | 26 | def __init__(self, kernel_size, stride): 27 | super(moving_avg, self).__init__() 28 | self.kernel_size = kernel_size 29 | self.avg = nn.AvgPool1d(kernel_size=kernel_size, stride=stride, padding=0) 30 | 31 | def forward(self, x): 32 | 33 | # padding on the both ends of time series 34 | front = x[:, 0:1, :].repeat(1, (self.kernel_size - 1) // 2, 1) 35 | end = x[:, -1:, :].repeat(1, (self.kernel_size - 1) // 2, 1) 36 | x = torch.cat([front, x, end], dim=1) 37 | x = self.avg(x.permute(0, 2, 1)) 38 | x = x.permute(0, 2, 1) 39 | return x 40 | 41 | 42 | class series_decomp(nn.Module): 43 | """ 44 | Series decomposition block 45 | """ 46 | 47 | def __init__(self, kernel_size): 48 | super(series_decomp, self).__init__() 49 | self.moving_avg = moving_avg(kernel_size, stride=1) 50 | 51 | def forward(self, x): 52 | moving_mean = self.moving_avg(x) 53 | res = x - moving_mean 54 | return res, moving_mean 55 | 56 | 57 | class series_decomp_multi(nn.Module): 58 | """ 59 | Multiple Series 
decomposition block from FEDformer 60 | """ 61 | 62 | def __init__(self, kernel_size): 63 | super(series_decomp_multi, self).__init__() 64 | self.kernel_size = kernel_size 65 | self.series_decomp = [series_decomp(kernel) for kernel in kernel_size] 66 | 67 | def forward(self, x): 68 | moving_mean = [] 69 | res = [] 70 | for func in self.series_decomp: 71 | sea, moving_avg = func(x) 72 | moving_mean.append(moving_avg) 73 | res.append(sea) 74 | 75 | sea = sum(res) / len(res) 76 | moving_mean = sum(moving_mean) / len(moving_mean) 77 | return sea, moving_mean 78 | 79 | 80 | class EncoderLayer(nn.Module): 81 | """ 82 | Autoformer encoder layer with the progressive decomposition architecture 83 | """ 84 | 85 | def __init__(self, attention, d_model, d_ff=None, moving_avg=25, dropout=0.1, activation="relu"): 86 | super(EncoderLayer, self).__init__() 87 | d_ff = d_ff or 4 * d_model 88 | self.attention = attention 89 | self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1, bias=False) 90 | self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1, bias=False) 91 | self.decomp1 = series_decomp(moving_avg) 92 | self.decomp2 = series_decomp(moving_avg) 93 | self.dropout = nn.Dropout(dropout) 94 | self.activation = F.relu if activation == "relu" else F.gelu 95 | 96 | def forward(self, x, attn_mask=None): 97 | new_x, attn = self.attention( 98 | x, x, x, 99 | attn_mask=attn_mask 100 | ) 101 | x = x + self.dropout(new_x) 102 | x, _ = self.decomp1(x) 103 | y = x 104 | y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1)))) 105 | y = self.dropout(self.conv2(y).transpose(-1, 1)) 106 | res, _ = self.decomp2(x + y) 107 | return res, attn 108 | 109 | 110 | class Encoder(nn.Module): 111 | """ 112 | Autoformer encoder 113 | """ 114 | 115 | def __init__(self, attn_layers, conv_layers=None, norm_layer=None): 116 | super(Encoder, self).__init__() 117 | self.attn_layers = nn.ModuleList(attn_layers) 118 | self.conv_layers = 
nn.ModuleList(conv_layers) if conv_layers is not None else None 119 | self.norm = norm_layer 120 | 121 | def forward(self, x, attn_mask=None): 122 | attns = [] 123 | if self.conv_layers is not None: 124 | for attn_layer, conv_layer in zip(self.attn_layers, self.conv_layers): 125 | x, attn = attn_layer(x, attn_mask=attn_mask) 126 | x = conv_layer(x) 127 | attns.append(attn) 128 | x, attn = self.attn_layers[-1](x) 129 | attns.append(attn) 130 | else: 131 | for attn_layer in self.attn_layers: 132 | x, attn = attn_layer(x, attn_mask=attn_mask) 133 | attns.append(attn) 134 | 135 | if self.norm is not None: 136 | x = self.norm(x) 137 | 138 | return x, attns 139 | 140 | 141 | class DecoderLayer(nn.Module): 142 | """ 143 | Autoformer decoder layer with the progressive decomposition architecture 144 | """ 145 | 146 | def __init__(self, self_attention, cross_attention, d_model, c_out, d_ff=None, 147 | moving_avg=25, dropout=0.1, activation="relu"): 148 | super(DecoderLayer, self).__init__() 149 | d_ff = d_ff or 4 * d_model 150 | self.self_attention = self_attention 151 | self.cross_attention = cross_attention 152 | self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1, bias=False) 153 | self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1, bias=False) 154 | self.decomp1 = series_decomp(moving_avg) 155 | self.decomp2 = series_decomp(moving_avg) 156 | self.decomp3 = series_decomp(moving_avg) 157 | self.dropout = nn.Dropout(dropout) 158 | self.projection = nn.Conv1d(in_channels=d_model, out_channels=c_out, kernel_size=3, stride=1, padding=1, 159 | padding_mode='circular', bias=False) 160 | self.activation = F.relu if activation == "relu" else F.gelu 161 | 162 | def forward(self, x, cross, x_mask=None, cross_mask=None): 163 | x = x + self.dropout(self.self_attention( 164 | x, x, x, 165 | attn_mask=x_mask 166 | )[0]) 167 | x, trend1 = self.decomp1(x) 168 | x = x + self.dropout(self.cross_attention( 169 | x, cross, cross, 170 | 
attn_mask=cross_mask 171 | )[0]) 172 | x, trend2 = self.decomp2(x) 173 | y = x 174 | y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1)))) 175 | y = self.dropout(self.conv2(y).transpose(-1, 1)) 176 | x, trend3 = self.decomp3(x + y) 177 | 178 | residual_trend = trend1 + trend2 + trend3 179 | residual_trend = self.projection(residual_trend.permute(0, 2, 1)).transpose(1, 2) 180 | return x, residual_trend 181 | 182 | 183 | class Decoder(nn.Module): 184 | """ 185 | Autoformer decoder 186 | """ 187 | 188 | def __init__(self, layers, norm_layer=None, projection=None): 189 | super(Decoder, self).__init__() 190 | self.layers = nn.ModuleList(layers) 191 | self.norm = norm_layer 192 | self.projection = projection 193 | 194 | def forward(self, x, cross, x_mask=None, cross_mask=None, trend=None): 195 | for layer in self.layers: 196 | x, residual_trend = layer(x, cross, x_mask=x_mask, cross_mask=cross_mask) 197 | trend = trend + residual_trend 198 | 199 | if self.norm is not None: 200 | x = self.norm(x) 201 | 202 | if self.projection is not None: 203 | x = self.projection(x) 204 | return x, trend 205 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/layers/Conv_Blocks.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | 5 | class Inception_Block_V1(nn.Module): 6 | def __init__(self, in_channels, out_channels, num_kernels=6, init_weight=True): 7 | super(Inception_Block_V1, self).__init__() 8 | self.in_channels = in_channels 9 | self.out_channels = out_channels 10 | self.num_kernels = num_kernels 11 | kernels = [] 12 | for i in range(self.num_kernels): 13 | kernels.append(nn.Conv2d(in_channels, out_channels, kernel_size=2 * i + 1, padding=i)) 14 | self.kernels = nn.ModuleList(kernels) 15 | if init_weight: 16 | self._initialize_weights() 17 | 18 | def _initialize_weights(self): 19 | for m in self.modules(): 20 | if isinstance(m, 
nn.Conv2d): 21 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 22 | if m.bias is not None: 23 | nn.init.constant_(m.bias, 0) 24 | 25 | def forward(self, x): 26 | res_list = [] 27 | for i in range(self.num_kernels): 28 | res_list.append(self.kernels[i](x)) 29 | res = torch.stack(res_list, dim=-1).mean(-1) 30 | return res 31 | 32 | 33 | class Inception_Block_V2(nn.Module): 34 | def __init__(self, in_channels, out_channels, num_kernels=6, init_weight=True): 35 | super(Inception_Block_V2, self).__init__() 36 | self.in_channels = in_channels 37 | self.out_channels = out_channels 38 | self.num_kernels = num_kernels 39 | kernels = [] 40 | for i in range(self.num_kernels // 2): 41 | kernels.append(nn.Conv2d(in_channels, out_channels, kernel_size=[1, 2 * i + 3], padding=[0, i + 1])) 42 | kernels.append(nn.Conv2d(in_channels, out_channels, kernel_size=[2 * i + 3, 1], padding=[i + 1, 0])) 43 | kernels.append(nn.Conv2d(in_channels, out_channels, kernel_size=1)) 44 | self.kernels = nn.ModuleList(kernels) 45 | if init_weight: 46 | self._initialize_weights() 47 | 48 | def _initialize_weights(self): 49 | for m in self.modules(): 50 | if isinstance(m, nn.Conv2d): 51 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 52 | if m.bias is not None: 53 | nn.init.constant_(m.bias, 0) 54 | 55 | def forward(self, x): 56 | res_list = [] 57 | for i in range(self.num_kernels + 1): 58 | res_list.append(self.kernels[i](x)) 59 | res = torch.stack(res_list, dim=-1).mean(-1) 60 | return res 61 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/layers/Embed.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import math 4 | 5 | 6 | class PositionalEmbedding(nn.Module): 7 | def __init__(self, d_model, max_len=5000): 8 | super(PositionalEmbedding, self).__init__() 9 | # Compute the positional encodings once in log space. 
10 | pe = torch.zeros(max_len, d_model).float() 11 | pe.requires_grad = False 12 | 13 | position = torch.arange(0, max_len).float().unsqueeze(1) 14 | div_term = (torch.arange(0, d_model, 2).float() 15 | * -(math.log(10000.0) / d_model)).exp() 16 | 17 | pe[:, 0::2] = torch.sin(position * div_term) 18 | pe[:, 1::2] = torch.cos(position * div_term) 19 | 20 | pe = pe.unsqueeze(0) 21 | self.register_buffer('pe', pe) 22 | 23 | def forward(self, x): 24 | return self.pe[:, :x.size(1)] 25 | 26 | 27 | class TokenEmbedding(nn.Module): 28 | def __init__(self, c_in, d_model): 29 | super(TokenEmbedding, self).__init__() 30 | padding = 1 if torch.__version__ >= '1.5.0' else 2 31 | self.tokenConv = nn.Conv1d(in_channels=c_in, out_channels=d_model, 32 | kernel_size=3, padding=padding, padding_mode='circular', bias=False) 33 | for m in self.modules(): 34 | if isinstance(m, nn.Conv1d): 35 | nn.init.kaiming_normal_( 36 | m.weight, mode='fan_in', nonlinearity='leaky_relu') 37 | 38 | def forward(self, x): 39 | x = self.tokenConv(x.permute(0, 2, 1)).transpose(1, 2) 40 | return x 41 | 42 | 43 | class FixedEmbedding(nn.Module): 44 | def __init__(self, c_in, d_model): 45 | super(FixedEmbedding, self).__init__() 46 | 47 | w = torch.zeros(c_in, d_model).float() 48 | w.requires_grad = False 49 | 50 | position = torch.arange(0, c_in).float().unsqueeze(1) 51 | div_term = (torch.arange(0, d_model, 2).float() 52 | * -(math.log(10000.0) / d_model)).exp() 53 | 54 | w[:, 0::2] = torch.sin(position * div_term) 55 | w[:, 1::2] = torch.cos(position * div_term) 56 | 57 | self.emb = nn.Embedding(c_in, d_model) 58 | self.emb.weight = nn.Parameter(w, requires_grad=False) 59 | 60 | def forward(self, x): 61 | return self.emb(x).detach() 62 | 63 | 64 | class TemporalEmbedding(nn.Module): 65 | def __init__(self, d_model, embed_type='fixed', freq='h'): 66 | super(TemporalEmbedding, self).__init__() 67 | 68 | minute_size = 4 69 | hour_size = 24 70 | weekday_size = 7 71 | day_size = 32 72 | month_size = 13 73 | 74 | 
Embed = FixedEmbedding if embed_type == 'fixed' else nn.Embedding 75 | if freq == 't': 76 | self.minute_embed = Embed(minute_size, d_model) 77 | self.hour_embed = Embed(hour_size, d_model) 78 | self.weekday_embed = Embed(weekday_size, d_model) 79 | self.day_embed = Embed(day_size, d_model) 80 | self.month_embed = Embed(month_size, d_model) 81 | 82 | def forward(self, x): 83 | 84 | x = x.long() 85 | minute_x = self.minute_embed(x[:, :, 4]) if hasattr(self, 'minute_embed') else 0. 86 | hour_x = self.hour_embed(x[:, :, 3]) 87 | weekday_x = self.weekday_embed(x[:, :, 2]) 88 | day_x = self.day_embed(x[:, :, 1]) 89 | month_x = self.month_embed(x[:, :, 0]) 90 | 91 | return hour_x + weekday_x + day_x + month_x + minute_x 92 | 93 | 94 | class TimeFeatureEmbedding(nn.Module): 95 | def __init__(self, d_model, embed_type='timeF', freq='h'): 96 | super(TimeFeatureEmbedding, self).__init__() 97 | 98 | freq_map = {'h': 4, 't': 5, 's': 6, 99 | 'm': 1, 'a': 1, 'w': 2, 'd': 3, 'b': 3} 100 | d_inp = freq_map[freq] 101 | self.embed = nn.Linear(d_inp, d_model, bias=False) 102 | 103 | def forward(self, x): 104 | return self.embed(x) 105 | 106 | 107 | class DataEmbedding(nn.Module): 108 | def __init__(self, c_in, d_model, embed_type='fixed', freq='h', dropout=0.1): 109 | super(DataEmbedding, self).__init__() 110 | 111 | self.value_embedding = TokenEmbedding(c_in=c_in, d_model=d_model) 112 | self.position_embedding = PositionalEmbedding(d_model=d_model) 113 | self.temporal_embedding = TemporalEmbedding(d_model=d_model, embed_type=embed_type, 114 | freq=freq) if embed_type != 'timeF' else TimeFeatureEmbedding( 115 | d_model=d_model, embed_type=embed_type, freq=freq) 116 | self.dropout = nn.Dropout(p=dropout) 117 | 118 | def forward(self, x, x_mark=None): 119 | 120 | if x_mark is None: 121 | x = self.value_embedding(x) + self.position_embedding(x) 122 | else: 123 | x = self.value_embedding(x) + self.temporal_embedding(x_mark) + self.position_embedding(x) 124 | return self.dropout(x) 125 | 
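The sinusoidal table that `PositionalEmbedding` and `FixedEmbedding` build can be illustrated without PyTorch. The following is a minimal sketch, not the repository's code: `sinusoidal_table` is a hypothetical helper written in plain Python, and it assumes the same frequency term as `div_term` above, `exp(-i * ln(10000) / d_model)` for each even column index `i`.

```python
import math


def sinusoidal_table(max_len, d_model):
    """Return a max_len x d_model list of sinusoidal position encodings.

    Hypothetical helper for illustration only; it mirrors the arithmetic
    of PositionalEmbedding but is not part of the repository.
    """
    table = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            # Same frequency as div_term: exp(-i * ln(10000) / d_model)
            freq = math.exp(-i * math.log(10000.0) / d_model)
            table[pos][i] = math.sin(pos * freq)          # even columns: sine
            if i + 1 < d_model:
                table[pos][i + 1] = math.cos(pos * freq)  # odd columns: cosine
    return table


pe = sinusoidal_table(max_len=4, d_model=4)
print(pe[0])  # at position 0 every sine is 0.0 and every cosine is 1.0
```

Because the table depends only on position and column index, it can be computed once and reused, which is presumably why the module stores it with `register_buffer` rather than as a trainable parameter.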
126 | 127 | class DataEmbedding_wo_pos(nn.Module): 128 | def __init__(self, c_in, d_model, embed_type='fixed', freq='h', dropout=0.1): 129 | super(DataEmbedding_wo_pos, self).__init__() 130 | 131 | self.value_embedding = TokenEmbedding(c_in=c_in, d_model=d_model) 132 | 133 | self.position_embedding = PositionalEmbedding(d_model=d_model) 134 | self.temporal_embedding = TemporalEmbedding(d_model=d_model, embed_type=embed_type, 135 | freq=freq) if embed_type != 'timeF' else TimeFeatureEmbedding( 136 | d_model=d_model, embed_type=embed_type, freq=freq) 137 | self.dropout = nn.Dropout(p=dropout) 138 | 139 | def forward(self, x, x_mark): 140 | if x_mark is None: 141 | x = self.value_embedding(x) 142 | else: 143 | x = self.value_embedding(x) + self.temporal_embedding(x_mark) 144 | return self.dropout(x) 145 | 146 | 147 | class PatchEmbedding(nn.Module): 148 | def __init__(self, d_model, patch_len, stride, dropout): 149 | super(PatchEmbedding, self).__init__() 150 | # Patching 151 | self.patch_len = patch_len 152 | self.stride = stride 153 | self.padding_patch_layer = nn.ReplicationPad1d((0, stride)) 154 | 155 | # Backbone, Input encoding: projection of feature vectors onto a d-dim vector space 156 | self.value_embedding = TokenEmbedding(patch_len, d_model) 157 | 158 | # Positional embedding 159 | self.position_embedding = PositionalEmbedding(d_model) 160 | 161 | # Residual dropout 162 | self.dropout = nn.Dropout(dropout) 163 | 164 | def forward(self, x): 165 | # do patching 166 | n_vars = x.shape[1] 167 | x = self.padding_patch_layer(x) 168 | x = x.unfold(dimension=-1, size=self.patch_len, step=self.stride) 169 | x = torch.reshape(x, (x.shape[0] * x.shape[1], x.shape[2], x.shape[3])) # channel independent 170 | 171 | # Input encoding 172 | x = self.value_embedding(x) + self.position_embedding(x) 173 | return self.dropout(x), n_vars 174 | 175 | 176 | class PatchEmbedding_wo_channel_independent(nn.Module): 177 | def __init__(self, n_vars, d_model, patch_len, stride, 
dropout): 178 | super(PatchEmbedding_wo_channel_independent, self).__init__() 179 | # Patching 180 | self.n_vars = n_vars 181 | self.patch_len = patch_len 182 | self.stride = stride 183 | self.padding_patch_layer = nn.ReplicationPad1d((0, stride)) 184 | 185 | # Backbone, Input encoding: projection of feature vectors onto a d-dim vector space 186 | self.value_embedding = TokenEmbedding(patch_len*n_vars, d_model) 187 | 188 | # Positional embedding 189 | self.position_embedding = PositionalEmbedding(d_model) 190 | 191 | # Residual dropout 192 | self.dropout = nn.Dropout(dropout) 193 | 194 | def forward(self, x): 195 | # do patching 196 | n_vars = x.shape[1] 197 | x = self.padding_patch_layer(x) 198 | x = x.unfold(dimension=-1, size=self.patch_len, step=self.stride) 199 | 200 | x = torch.reshape(x, (x.shape[0], x.shape[2], x.shape[1]*x.shape[3])) 201 | 202 | # Input encoding 203 | x = self.value_embedding(x) + self.position_embedding(x) 204 | return self.dropout(x), n_vars 205 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/layers/FourierCorrelation.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | # author=maziqing 3 | # email=maziqing.mzq@alibaba-inc.com 4 | 5 | import numpy as np 6 | import torch 7 | import torch.nn as nn 8 | 9 | 10 | def get_frequency_modes(seq_len, modes=64, mode_select_method='random'): 11 | """ 12 | get modes on frequency domain: 13 | 'random' means sampling randomly; 14 | 'else' means sampling the lowest modes; 15 | """ 16 | modes = min(modes, seq_len // 2) 17 | if mode_select_method == 'random': 18 | index = list(range(0, seq_len // 2)) 19 | np.random.shuffle(index) 20 | index = index[:modes] 21 | else: 22 | index = list(range(0, modes)) 23 | index.sort() 24 | return index 25 | 26 | 27 | # ########## fourier layer ############# 28 | class FourierBlock(nn.Module): 29 | def __init__(self, in_channels, out_channels, seq_len, modes=0, 
mode_select_method='random'): 30 | super(FourierBlock, self).__init__() 31 | print('fourier enhanced block used!') 32 | """ 33 | 1D Fourier block. It performs representation learning on frequency domain, 34 | it does FFT, linear transform, and Inverse FFT. 35 | """ 36 | # get modes on frequency domain 37 | self.index = get_frequency_modes(seq_len, modes=modes, mode_select_method=mode_select_method) 38 | print('modes={}, index={}'.format(modes, self.index)) 39 | 40 | self.scale = (1 / (in_channels * out_channels)) 41 | self.weights1 = nn.Parameter( 42 | self.scale * torch.rand(8, in_channels // 8, out_channels // 8, len(self.index), dtype=torch.float)) 43 | self.weights2 = nn.Parameter( 44 | self.scale * torch.rand(8, in_channels // 8, out_channels // 8, len(self.index), dtype=torch.float)) 45 | 46 | # Complex multiplication 47 | def compl_mul1d(self, order, x, weights): 48 | x_flag = True 49 | w_flag = True 50 | if not torch.is_complex(x): 51 | x_flag = False 52 | x = torch.complex(x, torch.zeros_like(x).to(x.device)) 53 | if not torch.is_complex(weights): 54 | w_flag = False 55 | weights = torch.complex(weights, torch.zeros_like(weights).to(weights.device)) 56 | if x_flag or w_flag: 57 | return torch.complex(torch.einsum(order, x.real, weights.real) - torch.einsum(order, x.imag, weights.imag), 58 | torch.einsum(order, x.real, weights.imag) + torch.einsum(order, x.imag, weights.real)) 59 | else: 60 | return torch.einsum(order, x.real, weights.real) 61 | 62 | def forward(self, q, k, v, mask): 63 | # size = [B, L, H, E] 64 | B, L, H, E = q.shape 65 | x = q.permute(0, 2, 3, 1) 66 | # Compute Fourier coefficients 67 | x_ft = torch.fft.rfft(x, dim=-1) 68 | # Perform Fourier neural operations 69 | out_ft = torch.zeros(B, H, E, L // 2 + 1, device=x.device, dtype=torch.cfloat) 70 | for wi, i in enumerate(self.index): 71 | if i >= x_ft.shape[3] or wi >= out_ft.shape[3]: 72 | continue 73 | out_ft[:, :, :, wi] = self.compl_mul1d("bhi,hio->bho", x_ft[:, :, :, i], 74 | 
torch.complex(self.weights1, self.weights2)[:, :, :, wi]) 75 | # Return to time domain 76 | x = torch.fft.irfft(out_ft, n=x.size(-1)) 77 | return (x, None) 78 | 79 | 80 | # ########## Fourier Cross Former #################### 81 | class FourierCrossAttention(nn.Module): 82 | def __init__(self, in_channels, out_channels, seq_len_q, seq_len_kv, modes=64, mode_select_method='random', 83 | activation='tanh', policy=0): 84 | super(FourierCrossAttention, self).__init__() 85 | print(' fourier enhanced cross attention used!') 86 | """ 87 | 1D Fourier Cross Attention layer. It does FFT, linear transform, attention mechanism and Inverse FFT. 88 | """ 89 | self.activation = activation 90 | self.in_channels = in_channels 91 | self.out_channels = out_channels 92 | # get modes for queries and keys (& values) on frequency domain 93 | self.index_q = get_frequency_modes(seq_len_q, modes=modes, mode_select_method=mode_select_method) 94 | self.index_kv = get_frequency_modes(seq_len_kv, modes=modes, mode_select_method=mode_select_method) 95 | 96 | print('modes_q={}, index_q={}'.format(len(self.index_q), self.index_q)) 97 | print('modes_kv={}, index_kv={}'.format(len(self.index_kv), self.index_kv)) 98 | 99 | self.scale = (1 / (in_channels * out_channels)) 100 | self.weights1 = nn.Parameter( 101 | self.scale * torch.rand(8, in_channels // 8, out_channels // 8, len(self.index_q), dtype=torch.float)) 102 | self.weights2 = nn.Parameter( 103 | self.scale * torch.rand(8, in_channels // 8, out_channels // 8, len(self.index_q), dtype=torch.float)) 104 | 105 | # Complex multiplication 106 | def compl_mul1d(self, order, x, weights): 107 | x_flag = True 108 | w_flag = True 109 | if not torch.is_complex(x): 110 | x_flag = False 111 | x = torch.complex(x, torch.zeros_like(x).to(x.device)) 112 | if not torch.is_complex(weights): 113 | w_flag = False 114 | weights = torch.complex(weights, torch.zeros_like(weights).to(weights.device)) 115 | if x_flag or w_flag: 116 | return 
torch.complex(torch.einsum(order, x.real, weights.real) - torch.einsum(order, x.imag, weights.imag), 117 | torch.einsum(order, x.real, weights.imag) + torch.einsum(order, x.imag, weights.real)) 118 | else: 119 | return torch.einsum(order, x.real, weights.real) 120 | 121 | def forward(self, q, k, v, mask): 122 | # size = [B, L, H, E] 123 | B, L, H, E = q.shape 124 | xq = q.permute(0, 2, 3, 1) # size = [B, H, E, L] 125 | xk = k.permute(0, 2, 3, 1) 126 | xv = v.permute(0, 2, 3, 1) 127 | 128 | # Compute Fourier coefficients 129 | xq_ft_ = torch.zeros(B, H, E, len(self.index_q), device=xq.device, dtype=torch.cfloat) 130 | xq_ft = torch.fft.rfft(xq, dim=-1) 131 | for i, j in enumerate(self.index_q): 132 | if j >= xq_ft.shape[3]: 133 | continue 134 | xq_ft_[:, :, :, i] = xq_ft[:, :, :, j] 135 | xk_ft_ = torch.zeros(B, H, E, len(self.index_kv), device=xq.device, dtype=torch.cfloat) 136 | xk_ft = torch.fft.rfft(xk, dim=-1) 137 | for i, j in enumerate(self.index_kv): 138 | if j >= xk_ft.shape[3]: 139 | continue 140 | xk_ft_[:, :, :, i] = xk_ft[:, :, :, j] 141 | 142 | # perform attention mechanism on frequency domain 143 | xqk_ft = (self.compl_mul1d("bhex,bhey->bhxy", xq_ft_, xk_ft_)) 144 | if self.activation == 'tanh': 145 | xqk_ft = torch.complex(xqk_ft.real.tanh(), xqk_ft.imag.tanh()) 146 | elif self.activation == 'softmax': 147 | xqk_ft = torch.softmax(abs(xqk_ft), dim=-1) 148 | xqk_ft = torch.complex(xqk_ft, torch.zeros_like(xqk_ft)) 149 | else: 150 | raise Exception('{} activation function is not implemented'.format(self.activation)) 151 | xqkv_ft = self.compl_mul1d("bhxy,bhey->bhex", xqk_ft, xk_ft_) 152 | xqkvw = self.compl_mul1d("bhex,heox->bhox", xqkv_ft, torch.complex(self.weights1, self.weights2)) 153 | out_ft = torch.zeros(B, H, E, L // 2 + 1, device=xq.device, dtype=torch.cfloat) 154 | for i, j in enumerate(self.index_q): 155 | if i >= xqkvw.shape[3] or j >= out_ft.shape[3]: 156 | continue 157 | out_ft[:, :, :, j] = xqkvw[:, :, :, i] 158 | # Return to time domain 
159 | out = torch.fft.irfft(out_ft / self.in_channels / self.out_channels, n=xq.size(-1)) 160 | return (out, None) 161 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/layers/Pyraformer_EncDec.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | from torch.nn.modules.linear import Linear 5 | from layers.SelfAttention_Family import AttentionLayer, FullAttention 6 | from layers.Embed import DataEmbedding 7 | import math 8 | 9 | 10 | def get_mask(input_size, window_size, inner_size): 11 | """Get the attention mask of PAM-Naive""" 12 | # Get the size of all layers 13 | all_size = [] 14 | all_size.append(input_size) 15 | for i in range(len(window_size)): 16 | layer_size = math.floor(all_size[i] / window_size[i]) 17 | all_size.append(layer_size) 18 | 19 | seq_length = sum(all_size) 20 | mask = torch.zeros(seq_length, seq_length) 21 | 22 | # get intra-scale mask 23 | inner_window = inner_size // 2 24 | for layer_idx in range(len(all_size)): 25 | start = sum(all_size[:layer_idx]) 26 | for i in range(start, start + all_size[layer_idx]): 27 | left_side = max(i - inner_window, start) 28 | right_side = min(i + inner_window + 1, start + all_size[layer_idx]) 29 | mask[i, left_side:right_side] = 1 30 | 31 | # get inter-scale mask 32 | for layer_idx in range(1, len(all_size)): 33 | start = sum(all_size[:layer_idx]) 34 | for i in range(start, start + all_size[layer_idx]): 35 | left_side = (start - all_size[layer_idx - 1]) + \ 36 | (i - start) * window_size[layer_idx - 1] 37 | if i == (start + all_size[layer_idx] - 1): 38 | right_side = start 39 | else: 40 | right_side = ( 41 | start - all_size[layer_idx - 1]) + (i - start + 1) * window_size[layer_idx - 1] 42 | mask[i, left_side:right_side] = 1 43 | mask[left_side:right_side, i] = 1 44 | 45 | mask = (1 - mask).bool() 46 | 47 | return mask, all_size 48 | 49 | 50 | def 
refer_points(all_sizes, window_size): 51 | """Gather features from PAM's pyramid sequences""" 52 | input_size = all_sizes[0] 53 | indexes = torch.zeros(input_size, len(all_sizes)) 54 | 55 | for i in range(input_size): 56 | indexes[i][0] = i 57 | former_index = i 58 | for j in range(1, len(all_sizes)): 59 | start = sum(all_sizes[:j]) 60 | inner_layer_idx = former_index - (start - all_sizes[j - 1]) 61 | former_index = start + \ 62 | min(inner_layer_idx // window_size[j - 1], all_sizes[j] - 1) 63 | indexes[i][j] = former_index 64 | 65 | indexes = indexes.unsqueeze(0).unsqueeze(3) 66 | 67 | return indexes.long() 68 | 69 | 70 | class RegularMask(): 71 | def __init__(self, mask): 72 | self._mask = mask.unsqueeze(1) 73 | 74 | @property 75 | def mask(self): 76 | return self._mask 77 | 78 | 79 | class EncoderLayer(nn.Module): 80 | """ Compose with two layers """ 81 | 82 | def __init__(self, d_model, d_inner, n_head, dropout=0.1, normalize_before=True): 83 | super(EncoderLayer, self).__init__() 84 | 85 | self.slf_attn = AttentionLayer( 86 | FullAttention(mask_flag=True, factor=0, 87 | attention_dropout=dropout, output_attention=False), 88 | d_model, n_head) 89 | self.pos_ffn = PositionwiseFeedForward( 90 | d_model, d_inner, dropout=dropout, normalize_before=normalize_before) 91 | 92 | def forward(self, enc_input, slf_attn_mask=None): 93 | attn_mask = RegularMask(slf_attn_mask) 94 | enc_output, _ = self.slf_attn( 95 | enc_input, enc_input, enc_input, attn_mask=attn_mask) 96 | enc_output = self.pos_ffn(enc_output) 97 | return enc_output 98 | 99 | 100 | class Encoder(nn.Module): 101 | """ A encoder model with self attention mechanism. 
""" 102 | 103 | def __init__(self, configs, window_size, inner_size): 104 | super().__init__() 105 | 106 | d_bottleneck = configs.d_model//4 107 | 108 | self.mask, self.all_size = get_mask( 109 | configs.seq_len, window_size, inner_size) 110 | self.indexes = refer_points(self.all_size, window_size) 111 | self.layers = nn.ModuleList([ 112 | EncoderLayer(configs.d_model, configs.d_ff, configs.n_heads, dropout=configs.dropout, 113 | normalize_before=False) for _ in range(configs.e_layers) 114 | ]) # naive pyramid attention 115 | 116 | self.enc_embedding = DataEmbedding( 117 | configs.enc_in, configs.d_model, configs.dropout) 118 | self.conv_layers = Bottleneck_Construct( 119 | configs.d_model, window_size, d_bottleneck) 120 | 121 | def forward(self, x_enc, x_mark_enc): 122 | seq_enc = self.enc_embedding(x_enc, x_mark_enc) 123 | 124 | mask = self.mask.repeat(len(seq_enc), 1, 1).to(x_enc.device) 125 | seq_enc = self.conv_layers(seq_enc) 126 | 127 | for i in range(len(self.layers)): 128 | seq_enc = self.layers[i](seq_enc, mask) 129 | 130 | indexes = self.indexes.repeat(seq_enc.size( 131 | 0), 1, 1, seq_enc.size(2)).to(seq_enc.device) 132 | indexes = indexes.view(seq_enc.size(0), -1, seq_enc.size(2)) 133 | all_enc = torch.gather(seq_enc, 1, indexes) 134 | seq_enc = all_enc.view(seq_enc.size(0), self.all_size[0], -1) 135 | 136 | return seq_enc 137 | 138 | 139 | class ConvLayer(nn.Module): 140 | def __init__(self, c_in, window_size): 141 | super(ConvLayer, self).__init__() 142 | self.downConv = nn.Conv1d(in_channels=c_in, 143 | out_channels=c_in, 144 | kernel_size=window_size, 145 | stride=window_size) 146 | self.norm = nn.BatchNorm1d(c_in) 147 | self.activation = nn.ELU() 148 | 149 | def forward(self, x): 150 | x = self.downConv(x) 151 | x = self.norm(x) 152 | x = self.activation(x) 153 | return x 154 | 155 | 156 | class Bottleneck_Construct(nn.Module): 157 | """Bottleneck convolution CSCM""" 158 | 159 | def __init__(self, d_model, window_size, d_inner): 160 | 
super(Bottleneck_Construct, self).__init__() 161 | if not isinstance(window_size, list): 162 | self.conv_layers = nn.ModuleList([ 163 | ConvLayer(d_inner, window_size), 164 | ConvLayer(d_inner, window_size), 165 | ConvLayer(d_inner, window_size) 166 | ]) 167 | else: 168 | self.conv_layers = [] 169 | for i in range(len(window_size)): 170 | self.conv_layers.append(ConvLayer(d_inner, window_size[i])) 171 | self.conv_layers = nn.ModuleList(self.conv_layers) 172 | self.up = Linear(d_inner, d_model) 173 | self.down = Linear(d_model, d_inner) 174 | self.norm = nn.LayerNorm(d_model) 175 | 176 | def forward(self, enc_input): 177 | temp_input = self.down(enc_input).permute(0, 2, 1) 178 | all_inputs = [] 179 | for i in range(len(self.conv_layers)): 180 | temp_input = self.conv_layers[i](temp_input) 181 | all_inputs.append(temp_input) 182 | 183 | all_inputs = torch.cat(all_inputs, dim=2).transpose(1, 2) 184 | all_inputs = self.up(all_inputs) 185 | all_inputs = torch.cat([enc_input, all_inputs], dim=1) 186 | 187 | all_inputs = self.norm(all_inputs) 188 | return all_inputs 189 | 190 | 191 | class PositionwiseFeedForward(nn.Module): 192 | """ Two-layer position-wise feed-forward neural network. 
""" 193 | 194 | def __init__(self, d_in, d_hid, dropout=0.1, normalize_before=True): 195 | super().__init__() 196 | 197 | self.normalize_before = normalize_before 198 | 199 | self.w_1 = nn.Linear(d_in, d_hid) 200 | self.w_2 = nn.Linear(d_hid, d_in) 201 | 202 | self.layer_norm = nn.LayerNorm(d_in, eps=1e-6) 203 | self.dropout = nn.Dropout(dropout) 204 | 205 | def forward(self, x): 206 | residual = x 207 | if self.normalize_before: 208 | x = self.layer_norm(x) 209 | 210 | x = F.gelu(self.w_1(x)) 211 | x = self.dropout(x) 212 | x = self.w_2(x) 213 | x = self.dropout(x) 214 | x = x + residual 215 | 216 | if not self.normalize_before: 217 | x = self.layer_norm(x) 218 | return x 219 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/layers/SelfAttention_Family.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import numpy as np 4 | from math import sqrt 5 | from utils.masking import TriangularCausalMask, ProbMask 6 | from reformer_pytorch import LSHSelfAttention 7 | 8 | 9 | class DSAttention(nn.Module): 10 | '''De-stationary Attention''' 11 | 12 | def __init__(self, mask_flag=True, factor=5, scale=None, attention_dropout=0.1, output_attention=False): 13 | super(DSAttention, self).__init__() 14 | self.scale = scale 15 | self.mask_flag = mask_flag 16 | self.output_attention = output_attention 17 | self.dropout = nn.Dropout(attention_dropout) 18 | 19 | def forward(self, queries, keys, values, attn_mask, tau=None, delta=None): 20 | B, L, H, E = queries.shape 21 | _, S, _, D = values.shape 22 | scale = self.scale or 1. 
/ sqrt(E) 23 | 24 | tau = 1.0 if tau is None else tau.unsqueeze( 25 | 1).unsqueeze(1) # B x 1 x 1 x 1 26 | delta = 0.0 if delta is None else delta.unsqueeze( 27 | 1).unsqueeze(1) # B x 1 x 1 x S 28 | 29 | # De-stationary Attention, rescaling pre-softmax score with learned de-stationary factors 30 | scores = torch.einsum("blhe,bshe->bhls", queries, keys) * tau + delta 31 | 32 | if self.mask_flag: 33 | if attn_mask is None: 34 | attn_mask = TriangularCausalMask(B, L, device=queries.device) 35 | 36 | scores.masked_fill_(attn_mask.mask, -np.inf) 37 | 38 | A = self.dropout(torch.softmax(scale * scores, dim=-1)) 39 | V = torch.einsum("bhls,bshd->blhd", A, values) 40 | 41 | if self.output_attention: 42 | return (V.contiguous(), A) 43 | else: 44 | return (V.contiguous(), None) 45 | 46 | 47 | class FullAttention(nn.Module): 48 | def __init__(self, mask_flag=True, factor=5, scale=None, attention_dropout=0.1, output_attention=False): 49 | super(FullAttention, self).__init__() 50 | self.scale = scale 51 | self.mask_flag = mask_flag 52 | self.output_attention = output_attention 53 | self.dropout = nn.Dropout(attention_dropout) 54 | 55 | def forward(self, queries, keys, values, attn_mask, tau=None, delta=None): 56 | B, L, H, E = queries.shape 57 | _, S, _, D = values.shape 58 | scale = self.scale or 1. 
/ sqrt(E) 59 | 60 | scores = torch.einsum("blhe,bshe->bhls", queries, keys) 61 | 62 | if self.mask_flag: 63 | if attn_mask is None: 64 | attn_mask = TriangularCausalMask(B, L, device=queries.device) 65 | 66 | scores.masked_fill_(attn_mask.mask, -np.inf) 67 | 68 | A = self.dropout(torch.softmax(scale * scores, dim=-1)) 69 | V = torch.einsum("bhls,bshd->blhd", A, values) 70 | 71 | if self.output_attention: 72 | return (V.contiguous(), A) 73 | else: 74 | return (V.contiguous(), None) 75 | 76 | 77 | class ProbAttention(nn.Module): 78 | def __init__(self, mask_flag=True, factor=5, scale=None, attention_dropout=0.1, output_attention=False): 79 | super(ProbAttention, self).__init__() 80 | self.factor = factor 81 | self.scale = scale 82 | self.mask_flag = mask_flag 83 | self.output_attention = output_attention 84 | self.dropout = nn.Dropout(attention_dropout) 85 | 86 | def _prob_QK(self, Q, K, sample_k, n_top): # n_top: c*ln(L_q) 87 | # Q [B, H, L, D] 88 | B, H, L_K, E = K.shape 89 | _, _, L_Q, _ = Q.shape 90 | 91 | # calculate the sampled Q_K 92 | K_expand = K.unsqueeze(-3).expand(B, H, L_Q, L_K, E) 93 | # real U = U_part(factor*ln(L_k))*L_q 94 | index_sample = torch.randint(L_K, (L_Q, sample_k)) 95 | K_sample = K_expand[:, :, torch.arange( 96 | L_Q).unsqueeze(1), index_sample, :] 97 | Q_K_sample = torch.matmul( 98 | Q.unsqueeze(-2), K_sample.transpose(-2, -1)).squeeze() 99 | 100 | # find the top-k queries by the sparsity measurement 101 | M = Q_K_sample.max(-1)[0] - torch.div(Q_K_sample.sum(-1), L_K) 102 | M_top = M.topk(n_top, sorted=False)[1] 103 | 104 | # use the reduced Q to calculate Q_K 105 | Q_reduce = Q[torch.arange(B)[:, None, None], 106 | torch.arange(H)[None, :, None], 107 | M_top, :] # factor*ln(L_q) 108 | Q_K = torch.matmul(Q_reduce, K.transpose(-2, -1)) # factor*ln(L_q)*L_k 109 | 110 | return Q_K, M_top 111 | 112 | def _get_initial_context(self, V, L_Q): 113 | B, H, L_V, D = V.shape 114 | if not self.mask_flag: 115 | # V_sum = V.sum(dim=-2) 116 | V_sum = 
V.mean(dim=-2) 117 | contex = V_sum.unsqueeze(-2).expand(B, H, 118 | L_Q, V_sum.shape[-1]).clone() 119 | else: # use mask 120 | # requires that L_Q == L_V, i.e. for self-attention only 121 | assert (L_Q == L_V) 122 | contex = V.cumsum(dim=-2) 123 | return contex 124 | 125 | def _update_context(self, context_in, V, scores, index, L_Q, attn_mask): 126 | B, H, L_V, D = V.shape 127 | 128 | if self.mask_flag: 129 | attn_mask = ProbMask(B, H, L_Q, index, scores, device=V.device) 130 | scores.masked_fill_(attn_mask.mask, -np.inf) 131 | 132 | attn = torch.softmax(scores, dim=-1) # nn.Softmax(dim=-1)(scores) 133 | 134 | context_in[torch.arange(B)[:, None, None], 135 | torch.arange(H)[None, :, None], 136 | index, :] = torch.matmul(attn, V).type_as(context_in) 137 | if self.output_attention: 138 | attns = (torch.ones([B, H, L_V, L_V]) / 139 | L_V).type_as(attn).to(attn.device) 140 | attns[torch.arange(B)[:, None, None], torch.arange(H)[ 141 | None, :, None], index, :] = attn 142 | return (context_in, attns) 143 | else: 144 | return (context_in, None) 145 | 146 | def forward(self, queries, keys, values, attn_mask, tau=None, delta=None): 147 | B, L_Q, H, D = queries.shape 148 | _, L_K, _, _ = keys.shape 149 | 150 | queries = queries.transpose(2, 1) 151 | keys = keys.transpose(2, 1) 152 | values = values.transpose(2, 1) 153 | 154 | U_part = self.factor * \ 155 | np.ceil(np.log(L_K)).astype('int').item() # c*ln(L_k) 156 | u = self.factor * \ 157 | np.ceil(np.log(L_Q)).astype('int').item() # c*ln(L_q) 158 | 159 | U_part = U_part if U_part < L_K else L_K 160 | u = u if u < L_Q else L_Q 161 | 162 | scores_top, index = self._prob_QK( 163 | queries, keys, sample_k=U_part, n_top=u) 164 | 165 | # add scale factor 166 | scale = self.scale or 1. 
/ sqrt(D) 167 | if scale is not None: 168 | scores_top = scores_top * scale 169 | # get the context 170 | context = self._get_initial_context(values, L_Q) 171 | # update the context with selected top_k queries 172 | context, attn = self._update_context( 173 | context, values, scores_top, index, L_Q, attn_mask) 174 | 175 | return context.contiguous(), attn 176 | 177 | 178 | class AttentionLayer(nn.Module): 179 | def __init__(self, attention, d_model, n_heads, d_keys=None, 180 | d_values=None): 181 | super(AttentionLayer, self).__init__() 182 | 183 | d_keys = d_keys or (d_model // n_heads) 184 | d_values = d_values or (d_model // n_heads) 185 | 186 | self.inner_attention = attention 187 | self.query_projection = nn.Linear(d_model, d_keys * n_heads) 188 | self.key_projection = nn.Linear(d_model, d_keys * n_heads) 189 | self.value_projection = nn.Linear(d_model, d_values * n_heads) 190 | self.out_projection = nn.Linear(d_values * n_heads, d_model) 191 | self.n_heads = n_heads 192 | 193 | def forward(self, queries, keys, values, attn_mask, tau=None, delta=None): 194 | B, L, _ = queries.shape 195 | _, S, _ = keys.shape 196 | H = self.n_heads 197 | 198 | queries = self.query_projection(queries).view(B, L, H, -1) 199 | keys = self.key_projection(keys).view(B, S, H, -1) 200 | values = self.value_projection(values).view(B, S, H, -1) 201 | 202 | out, attn = self.inner_attention( 203 | queries, 204 | keys, 205 | values, 206 | attn_mask, 207 | tau=tau, 208 | delta=delta 209 | ) 210 | out = out.view(B, L, -1) 211 | 212 | return self.out_projection(out), attn 213 | 214 | 215 | class ReformerLayer(nn.Module): 216 | def __init__(self, attention, d_model, n_heads, d_keys=None, 217 | d_values=None, causal=False, bucket_size=4, n_hashes=4): 218 | super().__init__() 219 | self.bucket_size = bucket_size 220 | self.attn = LSHSelfAttention( 221 | dim=d_model, 222 | heads=n_heads, 223 | bucket_size=bucket_size, 224 | n_hashes=n_hashes, 225 | causal=causal 226 | ) 227 | 228 | def 
fit_length(self, queries): 229 | # inside reformer: assert N % (bucket_size * 2) == 0 230 | B, N, C = queries.shape 231 | if N % (self.bucket_size * 2) == 0: 232 | return queries 233 | else: 234 | # zero-pad the time series up to a multiple of (bucket_size * 2) 235 | fill_len = (self.bucket_size * 2) - (N % (self.bucket_size * 2)) 236 | return torch.cat([queries, torch.zeros([B, fill_len, C]).to(queries.device)], dim=1) 237 | 238 | def forward(self, queries, keys, values, attn_mask, tau, delta): 239 | # in Reformer: default queries=keys 240 | B, N, C = queries.shape 241 | queries = self.attn(self.fit_length(queries))[:, :N, :] 242 | return queries, None 243 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/layers/Transformer_EncDec.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | 5 | 6 | class ConvLayer(nn.Module): 7 | def __init__(self, c_in): 8 | super(ConvLayer, self).__init__() 9 | self.downConv = nn.Conv1d(in_channels=c_in, 10 | out_channels=c_in, 11 | kernel_size=3, 12 | padding=2, 13 | padding_mode='circular') 14 | self.norm = nn.BatchNorm1d(c_in) 15 | self.activation = nn.ELU() 16 | self.maxPool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1) 17 | 18 | def forward(self, x): 19 | x = self.downConv(x.permute(0, 2, 1)) 20 | x = self.norm(x) 21 | x = self.activation(x) 22 | x = self.maxPool(x) 23 | x = x.transpose(1, 2) 24 | return x 25 | 26 | 27 | class EncoderLayer(nn.Module): 28 | def __init__(self, attention, d_model, d_ff=None, dropout=0.1, activation="relu"): 29 | super(EncoderLayer, self).__init__() 30 | d_ff = d_ff or 4 * d_model 31 | self.attention = attention 32 | self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1) 33 | self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1) 34 | self.norm1 = nn.LayerNorm(d_model) 35 | self.norm2 = nn.LayerNorm(d_model) 36 | self.dropout = 
nn.Dropout(dropout) 37 | self.activation = F.relu if activation == "relu" else F.gelu 38 | 39 | def forward(self, x, attn_mask=None, tau=None, delta=None): 40 | new_x, attn = self.attention( 41 | x, x, x, 42 | attn_mask=attn_mask, 43 | tau=tau, delta=delta 44 | ) 45 | x = x + self.dropout(new_x) 46 | 47 | y = x = self.norm1(x) 48 | y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1)))) 49 | y = self.dropout(self.conv2(y).transpose(-1, 1)) 50 | 51 | return self.norm2(x + y), attn 52 | 53 | 54 | class Encoder(nn.Module): 55 | def __init__(self, attn_layers, conv_layers=None, norm_layer=None): 56 | super(Encoder, self).__init__() 57 | self.attn_layers = nn.ModuleList(attn_layers) 58 | self.conv_layers = nn.ModuleList(conv_layers) if conv_layers is not None else None 59 | self.norm = norm_layer 60 | 61 | def forward(self, x, attn_mask=None, tau=None, delta=None): 62 | # x [B, L, D] 63 | attns = [] 64 | if self.conv_layers is not None: 65 | for i, (attn_layer, conv_layer) in enumerate(zip(self.attn_layers, self.conv_layers)): 66 | delta = delta if i == 0 else None 67 | x, attn = attn_layer(x, attn_mask=attn_mask, tau=tau, delta=delta) 68 | x = conv_layer(x) 69 | attns.append(attn) 70 | x, attn = self.attn_layers[-1](x, tau=tau, delta=None) 71 | attns.append(attn) 72 | else: 73 | for attn_layer in self.attn_layers: 74 | x, attn = attn_layer(x, attn_mask=attn_mask, tau=tau, delta=delta) 75 | attns.append(attn) 76 | 77 | if self.norm is not None: 78 | x = self.norm(x) 79 | 80 | return x, attns 81 | 82 | 83 | class DecoderLayer(nn.Module): 84 | def __init__(self, self_attention, cross_attention, d_model, d_ff=None, 85 | dropout=0.1, activation="relu"): 86 | super(DecoderLayer, self).__init__() 87 | d_ff = d_ff or 4 * d_model 88 | self.self_attention = self_attention 89 | self.cross_attention = cross_attention 90 | self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1) 91 | self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, 
kernel_size=1) 92 | self.norm1 = nn.LayerNorm(d_model) 93 | self.norm2 = nn.LayerNorm(d_model) 94 | self.norm3 = nn.LayerNorm(d_model) 95 | self.dropout = nn.Dropout(dropout) 96 | self.activation = F.relu if activation == "relu" else F.gelu 97 | 98 | def forward(self, x, cross, x_mask=None, cross_mask=None, tau=None, delta=None): 99 | x = x + self.dropout(self.self_attention( 100 | x, x, x, 101 | attn_mask=x_mask, 102 | tau=tau, delta=None 103 | )[0]) 104 | x = self.norm1(x) 105 | 106 | x = x + self.dropout(self.cross_attention( 107 | x, cross, cross, 108 | attn_mask=cross_mask, 109 | tau=tau, delta=delta 110 | )[0]) 111 | 112 | y = x = self.norm2(x) 113 | y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1)))) 114 | y = self.dropout(self.conv2(y).transpose(-1, 1)) 115 | 116 | return self.norm3(x + y) 117 | 118 | 119 | class Decoder(nn.Module): 120 | def __init__(self, layers, norm_layer=None, projection=None): 121 | super(Decoder, self).__init__() 122 | self.layers = nn.ModuleList(layers) 123 | self.norm = norm_layer 124 | self.projection = projection 125 | 126 | def forward(self, x, cross, x_mask=None, cross_mask=None, tau=None, delta=None): 127 | for layer in self.layers: 128 | x = layer(x, cross, x_mask=x_mask, cross_mask=cross_mask, tau=tau, delta=delta) 129 | 130 | if self.norm is not None: 131 | x = self.norm(x) 132 | 133 | if self.projection is not None: 134 | x = self.projection(x) 135 | return x 136 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/layers/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/layers/__init__.py -------------------------------------------------------------------------------- /SimMTM_Forecasting/models/.DS_Store: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/models/.DS_Store -------------------------------------------------------------------------------- /SimMTM_Forecasting/models/PatchTST.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | from layers.Transformer_EncDec import Encoder, EncoderLayer 4 | from layers.SelfAttention_Family import DSAttention, AttentionLayer 5 | from layers.Embed import PatchEmbedding 6 | from utils.losses import AutomaticWeightedLoss 7 | from utils.tools import ContrastiveWeight, AggregationRebuild 8 | 9 | class Flatten_Head(nn.Module): 10 | def __init__(self, nf, pred_len, head_dropout=0): 11 | super().__init__() 12 | self.flatten = nn.Flatten(start_dim=-2) 13 | self.linear = nn.Linear(nf, pred_len) 14 | self.dropout = nn.Dropout(head_dropout) 15 | 16 | def forward(self, x): # [bs x n_vars x patch_num x d_model] 17 | x = self.flatten(x) # [bs x n_vars x (patch_num * d_model)] 18 | x = self.linear(x) # [bs x n_vars x pred_len] 19 | x = self.dropout(x) # [bs x n_vars x pred_len] 20 | return x 21 | 22 | class Pooler_Head(nn.Module): 23 | def __init__(self, nf, dimension=128, head_dropout=0): 24 | super().__init__() 25 | 26 | self.pooler = nn.Sequential( 27 | nn.Flatten(start_dim=-2), 28 | nn.Linear(nf, nf // 2), 29 | nn.BatchNorm1d(nf // 2), 30 | nn.ReLU(), 31 | nn.Linear(nf // 2, dimension), 32 | nn.Dropout(head_dropout), 33 | ) 34 | 35 | def forward(self, x): # [(bs * n_vars) x patch_num x d_model] 36 | x = self.pooler(x) # [(bs * n_vars) x dimension] 37 | return x 38 | 39 | class Model(nn.Module): 40 | """ 41 | PatchTST + SimMTM 42 | """ 43 | 44 | def __init__(self, configs): 45 | super(Model, self).__init__() 46 | self.task_name = configs.task_name 47 | self.pred_len = configs.pred_len 48 | self.seq_len = configs.seq_len 49 | self.label_len = configs.label_len 50 | self.output_attention = 
configs.output_attention 51 | self.configs = configs 52 | 53 | # patching and embedding 54 | self.patch_embedding = PatchEmbedding(configs.d_model, configs.patch_len, configs.stride, configs.stride, configs.dropout) 55 | 56 | # Encoder 57 | self.encoder = Encoder( 58 | [ 59 | EncoderLayer( 60 | AttentionLayer( 61 | DSAttention(False, configs.factor, attention_dropout=configs.dropout, 62 | output_attention=configs.output_attention), configs.d_model, configs.n_heads), 63 | configs.d_model, 64 | configs.d_ff, 65 | dropout=configs.dropout, 66 | activation=configs.activation 67 | ) for l in range(configs.e_layers) 68 | ], 69 | norm_layer=torch.nn.LayerNorm(configs.d_model), 70 | ) 71 | 72 | self.patch_num = int((configs.seq_len - configs.patch_len) / configs.stride + 2) 73 | self.head_nf = self.patch_num * configs.d_model 74 | 75 | # Decoder 76 | if self.task_name == 'pretrain': 77 | 78 | # for series-wise representation 79 | self.pooler = Pooler_Head(self.head_nf, head_dropout=configs.head_dropout) 80 | 81 | # for reconstruction; Flatten_Head takes (nf, pred_len, head_dropout) 82 | self.projection = Flatten_Head(self.head_nf, configs.seq_len, head_dropout=configs.head_dropout) 83 | 84 | self.awl = AutomaticWeightedLoss(2) 85 | self.contrastive = ContrastiveWeight(self.configs) 86 | self.aggregation = AggregationRebuild(self.configs) 87 | self.mse = torch.nn.MSELoss() 88 | 89 | elif self.task_name == 'finetune': 90 | self.head = Flatten_Head(self.head_nf, configs.pred_len, head_dropout=configs.head_dropout) 91 | 92 | def forecast(self, x_enc, x_mark_enc): 93 | 94 | # data shape 95 | bs, seq_len, n_vars = x_enc.shape 96 | 97 | # normalization 98 | means = x_enc.mean(1, keepdim=True).detach() 99 | x_enc = x_enc - means 100 | stdev = torch.sqrt(torch.var(x_enc, dim=1, keepdim=True, unbiased=False) + 1e-5) 101 | x_enc /= stdev 102 | 103 | # do patching and embedding 104 | x_enc = x_enc.permute(0, 2, 1) 105 | enc_out, n_vars = self.patch_embedding(x_enc) # [(bs * n_vars) x patch_num x 
d_model] 106 | 107 | # encoder 108 | enc_out, _ = self.encoder(enc_out) # enc_out: [(bs * n_vars) x patch_num x d_model] 109 | 110 | enc_out = torch.reshape(enc_out, (bs, n_vars, enc_out.shape[-2], enc_out.shape[-1])) # enc_out: [bs x n_vars x patch_num x d_model] 111 | 112 | # decoder 113 | dec_out = self.head(enc_out) # dec_out: [bs x n_vars x pred_len] 114 | dec_out = dec_out.permute(0, 2, 1) # dec_out: [bs x pred_len x n_vars] 115 | 116 | # de-Normalization from Non-stationary Transformer 117 | dec_out = dec_out * (stdev[:, 0, :].unsqueeze(1).repeat(1, self.pred_len, 1)) 118 | dec_out = dec_out + (means[:, 0, :].unsqueeze(1).repeat(1, self.pred_len, 1)) 119 | 120 | return dec_out 121 | 122 | def pretrain(self, x_enc, x_mark_enc, batch_x, mask): 123 | 124 | # data shape 125 | bs, seq_len, n_vars = x_enc.shape 126 | 127 | # normalization 128 | means = torch.sum(x_enc, dim=1) / torch.sum(mask == 1, dim=1) 129 | means = means.unsqueeze(1).detach() 130 | x_enc = x_enc - means 131 | x_enc = x_enc.masked_fill(mask == 0, 0) 132 | stdev = torch.sqrt(torch.sum(x_enc * x_enc, dim=1) / torch.sum(mask == 1, dim=1) + 1e-5) 133 | stdev = stdev.unsqueeze(1).detach() 134 | x_enc /= stdev 135 | 136 | # do patching and embedding 137 | x_enc = x_enc.permute(0, 2, 1) 138 | enc_out, n_vars = self.patch_embedding(x_enc) # [(bs * n_vars) x patch_num x d_model] 139 | 140 | # encoder 141 | p_enc_out, _ = self.encoder(enc_out) # [(bs * n_vars) x patch_num x d_model] 142 | 143 | # series-wise representation 144 | s_enc_out = self.pooler(p_enc_out) # [(bs * n_vars) x dimension] 145 | 146 | # series weight learning 147 | loss_cl, similarity_matrix, logits, positives_mask = self.contrastive(s_enc_out) # similarity_matrix: [(bs * n_vars) x (bs * n_vars)] 148 | rebuild_weight_matrix, agg_enc_out = self.aggregation(similarity_matrix, p_enc_out) # agg_enc_out: [(bs * n_vars) x patch_num x d_model] 149 | 150 | agg_enc_out = agg_enc_out.reshape(bs, n_vars, agg_enc_out.shape[-2], agg_enc_out.shape[-1]) # agg_enc_out: [bs x 
n_vars x patch_num x d_model] 151 | 152 | # decoder 153 | dec_out = self.projection(agg_enc_out) # [bs x n_vars x seq_len] 154 | dec_out = dec_out.permute(0, 2, 1) # [bs x seq_len x n_vars] 155 | 156 | # de-Normalization 157 | dec_out = dec_out * (stdev[:, 0, :].unsqueeze(1).repeat(1, self.seq_len, 1)) 158 | dec_out = dec_out + (means[:, 0, :].unsqueeze(1).repeat(1, self.seq_len, 1)) 159 | 160 | pred_batch_x = dec_out[:batch_x.shape[0]] 161 | 162 | # series reconstruction 163 | loss_rb = self.mse(pred_batch_x, batch_x.detach()) 164 | 165 | # loss 166 | loss = self.awl(loss_cl, loss_rb) 167 | 168 | return loss, loss_cl, loss_rb, positives_mask, logits, rebuild_weight_matrix, pred_batch_x 169 | 170 | def forward(self, x_enc, x_mark_enc, batch_x=None, mask=None): 171 | 172 | if self.task_name == 'pretrain': 173 | return self.pretrain(x_enc, x_mark_enc, batch_x, mask) 174 | if self.task_name == 'finetune': 175 | dec_out = self.forecast(x_enc, x_mark_enc) 176 | return dec_out[:, -self.pred_len:, :] # [B, L, D] 177 | 178 | return None 179 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/models/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/models/__init__.py -------------------------------------------------------------------------------- /SimMTM_Forecasting/models/iTransformer.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | from layers.Transformer_EncDec import Encoder, EncoderLayer 4 | from layers.SelfAttention_Family import DSAttention, AttentionLayer 5 | from layers.Embed import DataEmbedding_inverted 6 | from utils.losses import AutomaticWeightedLoss 7 | from utils.tools import ContrastiveWeight, AggregationRebuild 8 | 9 | class Flatten_Head(nn.Module): 10 | def __init__(self, 
d_model, pred_len, head_dropout=0): 11 | super().__init__() 12 | self.flatten = nn.Flatten(start_dim=-1) 13 | self.linear = nn.Linear(d_model, pred_len, bias=True) 14 | self.dropout = nn.Dropout(head_dropout) 15 | 16 | def forward(self, x): # x: [bs x n_vars x d_model] 17 | x = self.flatten(x) 18 | x = self.linear(x) 19 | x = self.dropout(x) 20 | return x # x: [bs x n_vars x seq_len] 21 | 22 | class Pooler_Head(nn.Module): 23 | def __init__(self, nf, dimension=128, head_dropout=0): 24 | super().__init__() 25 | 26 | self.pooler = nn.Sequential( 27 | nn.Flatten(start_dim=-2), 28 | nn.Linear(nf, nf // 2), 29 | nn.BatchNorm1d(nf // 2), 30 | nn.ReLU(), 31 | nn.Linear(nf // 2, dimension), 32 | nn.Dropout(head_dropout), 33 | ) 34 | 35 | def forward(self, x): # [bs x n_vars x d_model] 36 | x = self.pooler(x) # [bs x dimension] 37 | return x 38 | 39 | class Model(nn.Module): 40 | """ 41 | iTransformer + SimMTM 42 | """ 43 | 44 | def __init__(self, configs): 45 | super(Model, self).__init__() 46 | self.task_name = configs.task_name 47 | self.pred_len = configs.pred_len 48 | self.seq_len = configs.seq_len 49 | self.label_len = configs.label_len 50 | self.output_attention = configs.output_attention 51 | self.configs = configs 52 | 53 | # patching and embedding 54 | self.enc_embedding = DataEmbedding_inverted(configs.seq_len, configs.d_model, configs.embed, configs.freq, configs.dropout) 55 | 56 | # Encoder 57 | self.encoder = Encoder( 58 | [ 59 | EncoderLayer( 60 | AttentionLayer( 61 | DSAttention(False, configs.factor, attention_dropout=configs.dropout, 62 | output_attention=configs.output_attention), configs.d_model, configs.n_heads), 63 | configs.d_model, 64 | configs.d_ff, 65 | dropout=configs.dropout, 66 | activation=configs.activation 67 | ) for l in range(configs.e_layers) 68 | ], 69 | norm_layer=torch.nn.LayerNorm(configs.d_model), 70 | ) 71 | 72 | # Decoder 73 | if self.task_name == 'pretrain': 74 | 75 | # for series-wise representation 76 | self.pooler = 
Pooler_Head(configs.enc_in*configs.d_model, head_dropout=configs.head_dropout) 77 | 78 | # for reconstruction 79 | self.projection = Flatten_Head(configs.d_model, configs.seq_len, head_dropout=configs.head_dropout) 80 | 81 | self.awl = AutomaticWeightedLoss(2) 82 | self.contrastive = ContrastiveWeight(self.configs) 83 | self.aggregation = AggregationRebuild(self.configs) 84 | self.mse = torch.nn.MSELoss() 85 | 86 | elif self.task_name == 'finetune': 87 | self.head = Flatten_Head(configs.d_model, configs.pred_len, head_dropout=configs.head_dropout) 88 | 89 | def forecast(self, x_enc, x_mark_enc): 90 | 91 | # normalization 92 | means = x_enc.mean(1, keepdim=True).detach() 93 | x_enc = x_enc - means 94 | stdev = torch.sqrt(torch.var(x_enc, dim=1, keepdim=True, unbiased=False) + 1e-5) 95 | x_enc /= stdev 96 | 97 | _, _, N = x_enc.shape # x_enc: [Batch Time Variate] 98 | 99 | # embedding 100 | enc_out = self.enc_embedding(x_enc, x_mark_enc) 101 | 102 | # encoder 103 | enc_out, _ = self.encoder(enc_out) 104 | 105 | dec_out = self.head(enc_out).permute(0, 2, 1)[:, :, :N] 106 | 107 | # de-Normalization 108 | dec_out = dec_out * (stdev[:, 0, :].unsqueeze(1).repeat(1, self.pred_len, 1)) 109 | dec_out = dec_out + (means[:, 0, :].unsqueeze(1).repeat(1, self.pred_len, 1)) 110 | 111 | return dec_out 112 | 113 | def pretrain(self, x_enc, x_mark_enc, batch_x, mask): 114 | 115 | # normalization 116 | means = torch.sum(x_enc, dim=1) / torch.sum(mask == 1, dim=1) 117 | means = means.unsqueeze(1).detach() 118 | x_enc = x_enc - means 119 | x_enc = x_enc.masked_fill(mask == 0, 0) 120 | stdev = torch.sqrt(torch.sum(x_enc * x_enc, dim=1) / torch.sum(mask == 1, dim=1) + 1e-5) 121 | stdev = stdev.unsqueeze(1).detach() 122 | x_enc /= stdev 123 | 124 | # encoder 125 | enc_out = self.enc_embedding(x_enc) 126 | p_enc_out, _ = self.encoder(enc_out) # p_enc_out: [bs x n_vars x d_model] 127 | 128 | # series-wise representation 129 | s_enc_out = self.pooler(p_enc_out) # s_enc_out: [bs x dimension] 
130 | 131 | # series weight learning 132 | loss_cl, similarity_matrix, logits, positives_mask = self.contrastive(s_enc_out) # similarity_matrix: [bs x bs] 133 | rebuild_weight_matrix, agg_enc_out = self.aggregation(similarity_matrix, p_enc_out) # agg_enc_out: [bs x n_vars x d_model] 134 | 135 | # decoder 136 | dec_out = self.projection(agg_enc_out) # [bs x n_vars x seq_len] 137 | dec_out = dec_out.permute(0, 2, 1) # [bs x seq_len x n_vars] 138 | 139 | # de-Normalization 140 | dec_out = dec_out * (stdev[:, 0, :].unsqueeze(1).repeat(1, self.seq_len, 1)) 141 | dec_out = dec_out + (means[:, 0, :].unsqueeze(1).repeat(1, self.seq_len, 1)) 142 | 143 | pred_batch_x = dec_out[:batch_x.shape[0]] 144 | 145 | # series reconstruction 146 | loss_rb = self.mse(pred_batch_x, batch_x.detach()) 147 | 148 | # loss 149 | loss = self.awl(loss_cl, loss_rb) 150 | 151 | return loss, loss_cl, loss_rb, positives_mask, logits, rebuild_weight_matrix, pred_batch_x 152 | 153 | def forward(self, x_enc, x_mark_enc, batch_x=None, mask=None): 154 | 155 | if self.task_name == 'pretrain': 156 | return self.pretrain(x_enc, x_mark_enc, batch_x, mask) 157 | if self.task_name == 'finetune': 158 | dec_out = self.forecast(x_enc, x_mark_enc) 159 | return dec_out[:, -self.pred_len:, :] # [B, L, D] 160 | 161 | return None 162 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/run.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import torch 3 | from exp.exp_simmtm import Exp_SimMTM 4 | import random 5 | import numpy as np 6 | import os 7 | os.environ['CUDA_LAUNCH_BLOCKING'] = '0' 8 | 9 | fix_seed = 2023 10 | random.seed(fix_seed) 11 | torch.manual_seed(fix_seed) 12 | np.random.seed(fix_seed) 13 | 14 | parser = argparse.ArgumentParser(description='SimMTM') 15 | 16 | # basic config 17 | parser.add_argument('--task_name', type=str, required=True, default='pretrain', help='task name, options:[pretrain, 
finetune]') 18 | parser.add_argument('--is_training', type=int, default=1, help='status') 19 | parser.add_argument('--model_id', type=str, required=True, default='SimMTM', help='model id') 20 | parser.add_argument('--model', type=str, required=True, default='SimMTM', help='model name') 21 | 22 | # data loader 23 | parser.add_argument('--data', type=str, required=True, default='ETTh1', help='dataset type') 24 | parser.add_argument('--root_path', type=str, default='./datasets', help='root path of the data file') 25 | parser.add_argument('--data_path', type=str, default='ETTh1.csv', help='data file') 26 | parser.add_argument('--features', type=str, default='M', help='forecasting task, options:[M, S, MS]; M:multivariate predict multivariate, S:univariate predict univariate, MS:multivariate predict univariate') 27 | parser.add_argument('--target', type=str, default='OT', help='target feature in S or MS task') 28 | parser.add_argument('--freq', type=str, default='h', help='freq for time features encoding, options:[s:secondly, t:minutely, h:hourly, d:daily, b:business days, w:weekly, m:monthly], you can also use more detailed freq like 15min or 3h') 29 | parser.add_argument('--checkpoints', type=str, default='./outputs/checkpoints/', help='location of model fine-tuning checkpoints') 30 | parser.add_argument('--pretrain_checkpoints', type=str, default='./outputs/pretrain_checkpoints/', help='location of model pre-training checkpoints') 31 | parser.add_argument('--transfer_checkpoints', type=str, default='ckpt_best.pth', help='checkpoints we will use to finetune, options:[ckpt_best.pth, ckpt10.pth, ckpt20.pth...]') 32 | parser.add_argument('--load_checkpoints', type=str, default=None, help='location of model checkpoints') 33 | parser.add_argument('--select_channels', type=float, default=1, help='select the rate of channels to train') 34 | 35 | # forecasting task 36 | parser.add_argument('--seq_len', type=int, default=336, help='input sequence length') 37 | 
parser.add_argument('--label_len', type=int, default=48, help='start token length') 38 | parser.add_argument('--pred_len', type=int, default=96, help='prediction sequence length') 39 | parser.add_argument('--seasonal_patterns', type=str, default='Monthly', help='subset for M4') 40 | 41 | # model define 42 | parser.add_argument('--top_k', type=int, default=5, help='for TimesBlock') 43 | parser.add_argument('--num_kernels', type=int, default=3, help='for Inception') 44 | parser.add_argument('--enc_in', type=int, default=7, help='encoder input size') 45 | parser.add_argument('--dec_in', type=int, default=7, help='decoder input size') 46 | parser.add_argument('--c_out', type=int, default=7, help='output size') 47 | parser.add_argument('--d_model', type=int, default=512, help='dimension of model') 48 | parser.add_argument('--n_heads', type=int, default=8, help='num of heads') 49 | parser.add_argument('--e_layers', type=int, default=2, help='num of encoder layers') 50 | parser.add_argument('--d_layers', type=int, default=1, help='num of decoder layers') 51 | parser.add_argument('--d_ff', type=int, default=2048, help='dimension of fcn') 52 | parser.add_argument('--moving_avg', type=int, default=25, help='window size of moving average') 53 | parser.add_argument('--factor', type=int, default=1, help='attn factor') 54 | parser.add_argument('--distil', action='store_false', help='whether to use distilling in encoder, using this argument means not using distilling', default=True) 55 | parser.add_argument('--dropout', type=float, default=0.1, help='dropout') 56 | parser.add_argument('--fc_dropout', type=float, default=0, help='fully connected dropout') 57 | parser.add_argument('--head_dropout', type=float, default=0.1, help='head dropout') 58 | parser.add_argument('--embed', type=str, default='timeF', help='time features encoding, options:[timeF, fixed, learned]') 59 | parser.add_argument('--activation', type=str, default='gelu', help='activation') 60 | 
parser.add_argument('--output_attention', action='store_true', help='whether to output attention in encoder') 61 | parser.add_argument('--individual', type=int, default=0, help='individual head; True 1 False 0') 62 | parser.add_argument('--pct_start', type=float, default=0.3, help='pct_start') 63 | parser.add_argument('--patch_len', type=int, default=12, help='patch length') 64 | parser.add_argument('--stride', type=int, default=12, help='stride') 65 | 66 | # optimization 67 | parser.add_argument('--num_workers', type=int, default=5, help='data loader num workers') 68 | parser.add_argument('--itr', type=int, default=1, help='number of experiment repetitions') 69 | parser.add_argument('--train_epochs', type=int, default=10, help='train epochs') 70 | parser.add_argument('--batch_size', type=int, default=32, help='batch size of train input data') 71 | parser.add_argument('--patience', type=int, default=3, help='early stopping patience') 72 | parser.add_argument('--learning_rate', type=float, default=0.0001, help='optimizer learning rate') 73 | parser.add_argument('--des', type=str, default='test', help='exp description') 74 | parser.add_argument('--loss', type=str, default='MSE', help='loss function') 75 | parser.add_argument('--lradj', type=str, default='type1', help='adjust learning rate') 76 | parser.add_argument('--use_amp', action='store_true', help='use automatic mixed precision training', default=False) 77 | 78 | # GPU 79 | parser.add_argument('--use_gpu', type=bool, default=True, help='use gpu') 80 | parser.add_argument('--gpu', type=int, default=0, help='gpu') 81 | parser.add_argument('--use_multi_gpu', action='store_true', help='use multiple gpus', default=False) 82 | parser.add_argument('--devices', type=str, default='0', help='device ids of multiple gpus') 83 | 84 | # Pre-train 85 | parser.add_argument('--lm', type=int, default=3, help='average masking length') 86 | parser.add_argument('--positive_nums', type=int, default=3, help='number of masked series generated per sample') 87 |
parser.add_argument('--rbtp', type=int, default=1, help='0: rebuild the embedding of original series; 1: rebuild original series') 88 | parser.add_argument('--temperature', type=float, default=0.2, help='temperature') 89 | parser.add_argument('--masked_rule', type=str, default='geometric', help='masking rule, options:[geometric, random, masked_tail, masked_head]') 90 | parser.add_argument('--mask_rate', type=float, default=0.5, help='mask ratio') 91 | 92 | args = parser.parse_args() 93 | args.use_gpu = True if torch.cuda.is_available() and args.use_gpu else False 94 | 95 | if args.use_gpu and args.use_multi_gpu: 96 | args.devices = args.devices.replace(' ', '') 97 | device_ids = args.devices.split(',') 98 | args.device_ids = [int(id_) for id_ in device_ids] 99 | 100 | print('Args in experiment:') 101 | print(args) 102 | 103 | Exp = Exp_SimMTM 104 | if args.task_name == 'pretrain': 105 | for ii in range(args.itr): 106 | # setting record of experiments 107 | setting = '{}_{}_{}_{}_sl{}_ll{}_pl{}_dm{}_df{}_nh{}_el{}_dl{}_fc{}_dp{}_hdp{}_ep{}_bs{}_lr{}_lm{}_pn{}_mr{}_tp{}'.format( 108 | args.task_name, 109 | args.model, 110 | args.data, 111 | args.features, 112 | args.seq_len, 113 | args.label_len, 114 | args.pred_len, 115 | args.d_model, 116 | args.d_ff, 117 | args.n_heads, 118 | args.e_layers, 119 | args.d_layers, 120 | args.factor, 121 | args.dropout, 122 | args.head_dropout, 123 | args.train_epochs, 124 | args.batch_size, 125 | args.learning_rate, 126 | args.lm, 127 | args.positive_nums, 128 | args.mask_rate, 129 | args.temperature 130 | ) 131 | 132 | exp = Exp(args) # set experiments 133 | print('>>>>>>>start pre_training : {}>>>>>>>>>>>>>>>>>>>>>>>>>>'.format(setting)) 134 | exp.pretrain() 135 | torch.cuda.empty_cache() 136 | 137 | elif args.task_name == 'finetune': 138 | for ii in range(args.itr): 139 | # setting record of experiments 140 | setting = '{}_{}_{}_{}_sl{}_ll{}_pl{}_dm{}_df{}_nh{}_el{}_dl{}_fc{}_dp{}_hdp{}_ep{}_bs{}_lr{}'.format( 141 | args.task_name, 142 | args.model, 143 |
args.data, 144 | args.features, 145 | args.seq_len, 146 | args.label_len, 147 | args.pred_len, 148 | args.d_model, 149 | args.d_ff, 150 | args.n_heads, 151 | args.e_layers, 152 | args.d_layers, 153 | args.factor, 154 | args.dropout, 155 | args.head_dropout, 156 | args.train_epochs, 157 | args.batch_size, 158 | args.learning_rate 159 | ) 160 | 161 | args.load_checkpoints = os.path.join(args.pretrain_checkpoints, args.data, args.transfer_checkpoints) 162 | 163 | exp = Exp(args) # set experiments 164 | 165 | print('>>>>>>>start training : {}>>>>>>>>>>>>>>>>>>>>>>>>>>'.format(setting)) 166 | exp.train(setting) 167 | 168 | print('>>>>>>>testing : {}<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<'.format(setting)) 169 | exp.test() 170 | torch.cuda.empty_cache() 171 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/scripts/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/.DS_Store -------------------------------------------------------------------------------- /SimMTM_Forecasting/scripts/finetune/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/finetune/.DS_Store -------------------------------------------------------------------------------- /SimMTM_Forecasting/scripts/finetune/ECL_script/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/finetune/ECL_script/.DS_Store -------------------------------------------------------------------------------- /SimMTM_Forecasting/scripts/finetune/ECL_script/ECL.sh: 
-------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES=0 2 | 3 | for pred_len in 96 192 336 720; do 4 | python -u run.py \ 5 | --task_name finetune \ 6 | --root_path ./dataset/electricity/ \ 7 | --data_path electricity.csv \ 8 | --model_id ECL \ 9 | --model SimMTM \ 10 | --data ECL \ 11 | --features M \ 12 | --seq_len 336 \ 13 | --label_len 48 \ 14 | --pred_len $pred_len \ 15 | --e_layers 2 \ 16 | --enc_in 321 \ 17 | --dec_in 321 \ 18 | --c_out 321 \ 19 | --d_model 32 \ 20 | --d_ff 64 \ 21 | --n_heads 16 \ 22 | --batch_size 32 23 | done 24 | 25 | 26 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/scripts/finetune/ETT_script/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/finetune/ETT_script/.DS_Store -------------------------------------------------------------------------------- /SimMTM_Forecasting/scripts/finetune/ETT_script/ETTh1.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES=0 2 | 3 | for pred_len in 96 192 336 720; do 4 | python -u run.py \ 5 | --task_name finetune \ 6 | --is_training 1 \ 7 | --root_path ./dataset/ETT-small/ \ 8 | --data_path ETTh1.csv \ 9 | --model_id ETTh1 \ 10 | --model SimMTM \ 11 | --data ETTh1 \ 12 | --features M \ 13 | --seq_len 336 \ 14 | --label_len 48 \ 15 | --pred_len $pred_len \ 16 | --e_layers 3 \ 17 | --enc_in 7 \ 18 | --dec_in 7 \ 19 | --c_out 7 \ 20 | --n_heads 16 \ 21 | --d_model 32 \ 22 | --d_ff 64 \ 23 | --learning_rate 0.0001 \ 24 | --dropout 0.2 \ 25 | --batch_size 16 26 | done 27 | 28 | 29 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/scripts/finetune/ETT_script/ETTh2.sh: 
-------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES=0 2 | 3 | for pred_len in 96 192 336 720; do 4 | python -u run.py \ 5 | --task_name finetune \ 6 | --is_training 1 \ 7 | --root_path ./dataset/ETT-small/ \ 8 | --data_path ETTh2.csv \ 9 | --model_id ETTh2 \ 10 | --model SimMTM \ 11 | --data ETTh2 \ 12 | --features M \ 13 | --seq_len 336 \ 14 | --label_len 48 \ 15 | --pred_len $pred_len \ 16 | --e_layers 2 \ 17 | --enc_in 7 \ 18 | --dec_in 7 \ 19 | --c_out 7 \ 20 | --n_heads 8 \ 21 | --d_model 8 \ 22 | --d_ff 32 \ 23 | --dropout 0.4 \ 24 | --head_dropout 0.2 \ 25 | --batch_size 16 26 | done -------------------------------------------------------------------------------- /SimMTM_Forecasting/scripts/finetune/ETT_script/ETTm1.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES=0 2 | 3 | for pred_len in 96 192 336 720; do 4 | python -u run.py \ 5 | --task_name finetune \ 6 | --is_training 1 \ 7 | --root_path ./dataset/ETT-small/ \ 8 | --data_path ETTm1.csv \ 9 | --model_id ETTm1 \ 10 | --model SimMTM \ 11 | --data ETTm1 \ 12 | --features M \ 13 | --seq_len 336 \ 14 | --label_len 48 \ 15 | --pred_len $pred_len \ 16 | --e_layers 2 \ 17 | --enc_in 7 \ 18 | --dec_in 7 \ 19 | --c_out 7 \ 20 | --n_heads 8 \ 21 | --d_model 32 \ 22 | --d_ff 64 \ 23 | --dropout 0 24 | done 25 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/scripts/finetune/ETT_script/ETTm2.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES=0 2 | 3 | for pred_len in 96 192 336 720; do 4 | python -u run.py \ 5 | --task_name finetune \ 6 | --is_training 1 \ 7 | --root_path ./dataset/ETT-small/ \ 8 | --data_path ETTm2.csv \ 9 | --model_id ETTm2 \ 10 | --model SimMTM \ 11 | --data ETTm2 \ 12 | --features M \ 13 | --seq_len 336 \ 14 | --label_len 48 \ 15 | --pred_len
$pred_len \ 16 | --e_layers 3 \ 17 | --enc_in 7 \ 18 | --dec_in 7 \ 19 | --c_out 7 \ 20 | --n_heads 8 \ 21 | --d_model 8 \ 22 | --d_ff 16 \ 23 | --dropout 0 \ 24 | --batch_size 64 25 | done 26 | 27 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/scripts/finetune/Traffic/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/finetune/Traffic/.DS_Store -------------------------------------------------------------------------------- /SimMTM_Forecasting/scripts/finetune/Traffic/Traffic.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES=0 2 | 3 | for pred_len in 96 192 336 720; do 4 | python -u run.py \ 5 | --task_name finetune \ 6 | --root_path ./dataset/traffic/ \ 7 | --data_path traffic.csv \ 8 | --model_id Traffic \ 9 | --model SimMTM \ 10 | --data Traffic \ 11 | --features M \ 12 | --seq_len 336 \ 13 | --label_len 48 \ 14 | --pred_len $pred_len \ 15 | --e_layers 2 \ 16 | --enc_in 862 \ 17 | --dec_in 862 \ 18 | --c_out 862 \ 19 | --d_model 128 \ 20 | --d_ff 256 \ 21 | --n_heads 16 \ 22 | --batch_size 32 \ 23 | --dropout 0.2 24 | done 25 | 26 | 27 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/scripts/finetune/Weather_script/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/finetune/Weather_script/.DS_Store -------------------------------------------------------------------------------- /SimMTM_Forecasting/scripts/finetune/Weather_script/Weather.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES=0 2 | 3 | for pred_len in 96 
192 336 720; do 4 | python -u run.py \ 5 | --task_name finetune \ 6 | --is_training 1 \ 7 | --root_path ./dataset/weather/ \ 8 | --data_path weather.csv \ 9 | --model_id Weather \ 10 | --model SimMTM \ 11 | --data Weather \ 12 | --features M \ 13 | --seq_len 336 \ 14 | --label_len 48 \ 15 | --pred_len $pred_len \ 16 | --e_layers 2 \ 17 | --enc_in 21 \ 18 | --dec_in 21 \ 19 | --c_out 21 \ 20 | --n_heads 8 \ 21 | --d_model 64 \ 22 | --d_ff 64 \ 23 | --batch_size 16 24 | done 25 | 26 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/scripts/pretrain/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/pretrain/.DS_Store -------------------------------------------------------------------------------- /SimMTM_Forecasting/scripts/pretrain/ECL_script/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/pretrain/ECL_script/.DS_Store -------------------------------------------------------------------------------- /SimMTM_Forecasting/scripts/pretrain/ECL_script/ECL.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES=0 2 | 3 | python -u run.py \ 4 | --task_name pretrain \ 5 | --root_path ./dataset/electricity/ \ 6 | --data_path electricity.csv \ 7 | --model_id ECL \ 8 | --model SimMTM \ 9 | --data ECL \ 10 | --features M \ 11 | --seq_len 336 \ 12 | --label_len 48 \ 13 | --e_layers 2 \ 14 | --positive_nums 2 \ 15 | --mask_rate 0.5 \ 16 | --enc_in 321 \ 17 | --dec_in 321 \ 18 | --c_out 321 \ 19 | --d_model 32 \ 20 | --d_ff 64 \ 21 | --n_heads 16 \ 22 | --batch_size 32 \ 23 | --train_epochs 50 \ 24 | --temperature 0.02 25 | 
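The finetune branch of `run.py` builds the checkpoint path as `os.path.join(args.pretrain_checkpoints, args.data, args.transfer_checkpoints)`, so the `--data` value given to a pretrain script (here `ECL`) has to match the one in the corresponding finetune script. A minimal sketch of that lookup is given here. The helper name `resolve_pretrained_checkpoint` is hypothetical, and the sketch assumes the pre-training stage (in `exp/exp_simmtm.py`, not shown in this listing) writes its checkpoints under the same `<pretrain_checkpoints>/<data>/` directory.

```python
import os


def resolve_pretrained_checkpoint(pretrain_checkpoints, data, transfer_checkpoints):
    """Mirror the os.path.join call in the finetune branch of run.py.

    Hypothetical helper for illustration only; run.py performs the join inline.
    """
    return os.path.join(pretrain_checkpoints, data, transfer_checkpoints)


# Values below are the argparse defaults from run.py plus `--data ECL` from the scripts.
checkpoint_path = resolve_pretrained_checkpoint(
    "./outputs/pretrain_checkpoints/",  # --pretrain_checkpoints default
    "ECL",                              # --data
    "ckpt_best.pth",                    # --transfer_checkpoints default
)
print(checkpoint_path)
```

If pre-training used a different `--data` name or `--pretrain_checkpoints` directory, passing the same values at fine-tuning time should keep the two stages pointing at the same file.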
-------------------------------------------------------------------------------- /SimMTM_Forecasting/scripts/pretrain/ETT_script/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/pretrain/ETT_script/.DS_Store -------------------------------------------------------------------------------- /SimMTM_Forecasting/scripts/pretrain/ETT_script/ETTh1.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES=0 2 | 3 | python -u run.py \ 4 | --task_name pretrain \ 5 | --root_path ./dataset/ETT-small/ \ 6 | --data_path ETTh1.csv \ 7 | --model_id ETTh1 \ 8 | --model SimMTM \ 9 | --data ETTh1 \ 10 | --features M \ 11 | --seq_len 336 \ 12 | --e_layers 3 \ 13 | --enc_in 7 \ 14 | --dec_in 7 \ 15 | --c_out 7 \ 16 | --n_heads 16 \ 17 | --d_model 32 \ 18 | --d_ff 64 \ 19 | --positive_nums 3 \ 20 | --mask_rate 0.5 \ 21 | --learning_rate 0.001 \ 22 | --batch_size 16 \ 23 | --train_epochs 50 24 | 25 | 26 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/scripts/pretrain/ETT_script/ETTh2.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES=0 2 | 3 | python -u run.py \ 4 | --task_name pretrain \ 5 | --root_path ./dataset/ETT-small/ \ 6 | --data_path ETTh2.csv \ 7 | --model_id ETTh2 \ 8 | --model SimMTM \ 9 | --data ETTh2 \ 10 | --features M \ 11 | --seq_len 336 \ 12 | --e_layers 2 \ 13 | --enc_in 7 \ 14 | --dec_in 7 \ 15 | --c_out 7 \ 16 | --n_heads 8 \ 17 | --d_model 8 \ 18 | --d_ff 32 \ 19 | --positive_nums 3 \ 20 | --mask_rate 0.5 \ 21 | --learning_rate 0.001 \ 22 | --batch_size 16 \ 23 | --train_epochs 50 24 | 25 | 26 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/scripts/pretrain/ETT_script/ETTm1.sh: 
-------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES=0 2 | 3 | python -u run.py \ 4 | --task_name pretrain \ 5 | --root_path ./dataset/ETT-small/ \ 6 | --data_path ETTm1.csv \ 7 | --model_id ETTm1 \ 8 | --model SimMTM \ 9 | --data ETTm1 \ 10 | --features M \ 11 | --seq_len 336 \ 12 | --e_layers 2 \ 13 | --enc_in 7 \ 14 | --dec_in 7 \ 15 | --c_out 7 \ 16 | --n_heads 8 \ 17 | --d_model 32 \ 18 | --d_ff 64 \ 19 | --positive_nums 3 \ 20 | --mask_rate 0.5 \ 21 | --learning_rate 0.001 \ 22 | --batch_size 16 \ 23 | --train_epochs 50 24 | 25 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/scripts/pretrain/ETT_script/ETTm2.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES=0,1 2 | 3 | python -u run.py \ 4 | --task_name pretrain \ 5 | --root_path ./dataset/ETT-small/ \ 6 | --data_path ETTm2.csv \ 7 | --model_id ETTm2 \ 8 | --model SimMTM \ 9 | --data ETTm2 \ 10 | --features M \ 11 | --seq_len 336 \ 12 | --e_layers 3 \ 13 | --enc_in 7 \ 14 | --dec_in 7 \ 15 | --c_out 7 \ 16 | --n_heads 8 \ 17 | --d_model 8 \ 18 | --d_ff 16 \ 19 | --positive_nums 2 \ 20 | --mask_rate 0.5 \ 21 | --learning_rate 0.001 \ 22 | --batch_size 16 \ 23 | --train_epochs 50 24 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/scripts/pretrain/Traffic_script/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/pretrain/Traffic_script/.DS_Store -------------------------------------------------------------------------------- /SimMTM_Forecasting/scripts/pretrain/Traffic_script/Traffic.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES=0 2 | 3 | python -u run.py \ 
4 | --task_name pretrain \ 5 | --root_path ./dataset/traffic/ \ 6 | --data_path traffic.csv \ 7 | --model_id Traffic \ 8 | --model SimMTM \ 9 | --data Traffic \ 10 | --features M \ 11 | --seq_len 336 \ 12 | --label_len 48 \ 13 | --e_layers 3 \ 14 | --positive_nums 2 \ 15 | --mask_rate 0.5 \ 16 | --enc_in 862 \ 17 | --dec_in 862 \ 18 | --c_out 862 \ 19 | --d_model 128 \ 20 | --d_ff 256 \ 21 | --n_heads 16 \ 22 | --batch_size 32 \ 23 | --dropout 0.2 \ 24 | --train_epochs 50 \ 25 | --temperature 0.02 26 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/scripts/pretrain/Weather_script/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/pretrain/Weather_script/.DS_Store -------------------------------------------------------------------------------- /SimMTM_Forecasting/scripts/pretrain/Weather_script/Weather.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES=0 2 | 3 | python -u run.py \ 4 | --task_name pretrain \ 5 | --root_path ./dataset/weather/ \ 6 | --data_path weather.csv \ 7 | --model_id Weather \ 8 | --model SimMTM \ 9 | --data Weather \ 10 | --features M \ 11 | --seq_len 336 \ 12 | --e_layers 2 \ 13 | --positive_nums 2 \ 14 | --mask_rate 0.5 \ 15 | --enc_in 21 \ 16 | --dec_in 21 \ 17 | --c_out 21 \ 18 | --n_heads 8 \ 19 | --d_model 64 \ 20 | --d_ff 64 \ 21 | --learning_rate 0.001 \ 22 | --batch_size 16 \ 23 | --train_epochs 50 24 | 25 | 26 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/utils/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/utils/.DS_Store 
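The pretrain scripts pass `--mask_rate 0.5` and leave `--lm` (default 3) and `--masked_rule` (default `geometric`) untouched. As a rough illustration of what those values control, the following is a standard-library re-sketch of the two-state Markov chain used by `geom_noise_mask_single` in `utils/augmentations.py`. The function name `geometric_keep_mask` and the `rng` parameter are assumptions made for this example; the repository version uses NumPy and returns a boolean array.

```python
import random


def geometric_keep_mask(length, lm=3, mask_rate=0.5, rng=None):
    """Return a list of booleans: True = keep the value, False = mask it.

    Requires 0 < mask_rate < 1. Masked runs have mean length `lm`, and the
    long-run share of masked positions approaches `mask_rate`.
    """
    rng = rng or random.Random()
    p_m = 1.0 / lm                             # probability that a masked run stops
    p_u = p_m * mask_rate / (1.0 - mask_rate)  # probability that an unmasked run stops
    stop_prob = [p_m, p_u]
    state = int(rng.random() > mask_rate)      # 0 = masking, 1 = keeping
    keep = []
    for _ in range(length):
        keep.append(bool(state))
        if rng.random() < stop_prob[state]:
            state = 1 - state                  # switch between masking and keeping
    return keep


mask = geometric_keep_mask(200_000, lm=3, mask_rate=0.5, rng=random.Random(0))
masked_fraction = 1 - sum(mask) / len(mask)
print(f"masked fraction: {masked_fraction:.3f}")  # expected to be close to mask_rate
```

With `mask_rate=0.5` both stopping probabilities equal `1 / lm`, so masked and unmasked runs have the same mean length. Raising `lm` gives longer contiguous gaps at the same overall ratio.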
-------------------------------------------------------------------------------- /SimMTM_Forecasting/utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/utils/__init__.py -------------------------------------------------------------------------------- /SimMTM_Forecasting/utils/augmentations.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import math 4 | 5 | 6 | def masked_data(sample, sample_mark, masking_ratio, lm, positive_nums=1, distribution='geometric'): 7 | """Masked time series in time dimension""" 8 | 9 | sample = sample.permute(0, 2, 1) # [bs x nvars x seq_len] 10 | 11 | sample_repeat = sample.repeat(positive_nums, 1, 1) # [(bs * positive_nums) x nvars x seq_len] 12 | 13 | mask = noise_mask(sample_repeat, masking_ratio, lm, distribution=distribution) 14 | x_masked = mask * sample_repeat 15 | 16 | sample_mark_repeat = sample_mark.repeat(positive_nums, 1, 1) 17 | 18 | return x_masked.permute(0, 2, 1), sample_mark_repeat, mask.permute(0, 2, 1) 19 | 20 | 21 | def geom_noise_mask_single(L, lm, masking_ratio): 22 | """ 23 | Randomly create a boolean mask of length `L`, consisting of subsequences of average length lm, masking with 0s a `masking_ratio` 24 | proportion of the sequence L. The length of masking subsequences and intervals follow a geometric distribution. 25 | Args: 26 | L: length of mask and sequence to be masked 27 | lm: average length of masking subsequences (streaks of 0s) 28 | masking_ratio: proportion of L to be masked 29 | Returns: 30 | (L,) boolean numpy array intended to mask ('drop') with 0s a sequence of length L 31 | """ 32 | keep_mask = np.ones(L, dtype=bool) 33 | p_m = 1 / lm # probability of each masking sequence stopping. parameter of geometric distribution. 
34 | p_u = p_m * masking_ratio / ( 35 | 1 - masking_ratio) # probability of each unmasked sequence stopping. parameter of geometric distribution. 36 | p = [p_m, p_u] 37 | 38 | # Start in state 0 with masking_ratio probability 39 | state = int(np.random.rand() > masking_ratio) # state 0 means masking, 1 means not masking 40 | for i in range(L): 41 | keep_mask[i] = state # here it happens that state and masking value corresponding to state are identical 42 | if np.random.rand() < p[state]: 43 | state = 1 - state 44 | 45 | return keep_mask 46 | 47 | 48 | def noise_mask(X, masking_ratio=0.25, lm=3, distribution='geometric', exclude_feats=None): 49 | """ 50 | Creates a random boolean mask of the same shape as X, with 0s at places where a feature should be masked. 51 | Args: 52 | X: (batch_size, feat_dim, seq_length) tensor of features, as passed in by `masked_data` 53 | masking_ratio: proportion of seq_length to be masked. At each time step, will also be the proportion of 54 | feat_dim that will be masked on average 55 | lm: average length of masking subsequences (streaks of 0s). Used only when `distribution` is 'geometric'. 56 | distribution: 'geometric', 'masked_tail' or 'masked_head'; any other value samples each mask element independently at random. With 'geometric', 57 | sampling follows a Markov chain (and thus is stateful), resulting in geometric distributions of 58 | masked subsequences of a desired mean length `lm` 59 | exclude_feats: iterable of indices corresponding to features to be excluded from masking (i.e.
to remain all 1s) 60 | Returns: 61 | boolean torch tensor with the same shape as X, with 0s at places where a feature should be masked 62 | """ 63 | if exclude_feats is not None: 64 | exclude_feats = set(exclude_feats) 65 | 66 | if distribution == 'geometric': # stateful (Markov chain) 67 | mask = geom_noise_mask_single(X.shape[0] * X.shape[1] * X.shape[2], lm, masking_ratio) 68 | mask = mask.reshape(X.shape[0], X.shape[1], X.shape[2]) 69 | elif distribution == 'masked_tail': 70 | mask = np.ones(X.shape, dtype=bool) 71 | for m in range(X.shape[0]): # batch dimension 72 | 73 | keep_mask = np.zeros_like(mask[m, :], dtype=bool) 74 | n = math.ceil(keep_mask.shape[1] * (1 - masking_ratio)) 75 | keep_mask[:, :n] = True 76 | mask[m, :] = keep_mask # time dimension 77 | elif distribution == 'masked_head': 78 | mask = np.ones(X.shape, dtype=bool) 79 | for m in range(X.shape[0]): # batch dimension 80 | 81 | keep_mask = np.zeros_like(mask[m, :], dtype=bool) 82 | n = math.ceil(keep_mask.shape[1] * masking_ratio) 83 | keep_mask[:, n:] = True 84 | mask[m, :] = keep_mask # time dimension 85 | else: # each position is independent Bernoulli with p = 1 - masking_ratio 86 | mask = np.random.choice(np.array([True, False]), size=X.shape, replace=True, 87 | p=(1 - masking_ratio, masking_ratio)) 88 | return torch.tensor(mask) 89 | 90 | 91 | def one_hot_encoding(X): 92 | X = [int(x) for x in X] 93 | n_values = np.max(X) + 1 94 | b = np.eye(n_values)[X] 95 | return b 96 | 97 | 98 | def DataTransform(sample, config): 99 | """Weak and strong augmentations""" 100 | weak_aug = scaling(sample, config.augmentation.jitter_scale_ratio) 101 | # weak_aug = permutation(sample, max_segments=config.augmentation.max_seg) 102 | strong_aug = jitter(permutation(sample, max_segments=config.augmentation.max_seg), config.augmentation.jitter_ratio) 103 | 104 | return weak_aug, strong_aug 105 | 106 | 107 | def remove_frequency(x, pertub_ratio=0.0): 108 | mask = torch.rand(x.shape, device=x.device) >
pertub_ratio # a pertub_ratio fraction of entries is False (masked out); sampled on x's device so CPU tensors work too 109 | mask = mask.to(x.device) 110 | return x*mask 111 | 112 | 113 | def add_frequency(x, pertub_ratio=0.0): 114 | 115 | mask = torch.rand(x.shape, device=x.device) > (1-pertub_ratio) # only pertub_ratio of all values are True 116 | mask = mask.to(x.device) 117 | max_amplitude = x.max() 118 | random_am = torch.rand(mask.shape, device=x.device)*(max_amplitude*0.1) 119 | pertub_matrix = mask*random_am 120 | return x+pertub_matrix 121 | 122 | 123 | def generate_binomial_mask(B, T, D, p=0.5): # p is the ratio of not zero 124 | return torch.from_numpy(np.random.binomial(1, p, size=(B, T, D))).to(torch.bool) 125 | 126 | 127 | def masking(x, keepratio=0.9, mask= 'binomial'): 128 | global mask_id 129 | nan_mask = ~x.isnan().any(axis=-1) 130 | x[~nan_mask] = 0 131 | # x = self.input_fc(x) # B x T x Ch 132 | 133 | if mask == 'binomial': 134 | mask_id = generate_binomial_mask(x.size(0), x.size(1), x.size(2), p=keepratio).to(x.device) 135 | # elif mask == 'continuous': 136 | # mask = generate_continuous_mask(x.size(0), x.size(1)).to(x.device) 137 | # elif mask == 'all_true': 138 | # mask = x.new_full((x.size(0), x.size(1)), True, dtype=torch.bool) 139 | # elif mask == 'all_false': 140 | # mask = x.new_full((x.size(0), x.size(1)), False, dtype=torch.bool) 141 | # elif mask == 'mask_last': 142 | # mask = x.new_full((x.size(0), x.size(1)), True, dtype=torch.bool) 143 | # mask[:, -1] = False 144 | 145 | # mask &= nan_mask 146 | x[~mask_id] = 0 147 | return x 148 | 149 | -------------------------------------------------------------------------------- /SimMTM_Forecasting/utils/losses.py: -------------------------------------------------------------------------------- 1 | import torch as t 2 | import torch 3 | import torch.nn as nn 4 | import numpy as np 5 | 6 | def divide_no_nan(a, b): 7 | """ 8 | a/b where any resulting NaN or Inf is replaced by 0.
9 | """ 10 | result = a / b 11 | result[result != result] = .0 12 | result[result == np.inf] = .0 13 | return result 14 | 15 | 16 | class mape_loss(nn.Module): 17 | def __init__(self): 18 | super(mape_loss, self).__init__() 19 | 20 | def forward(self, insample: t.Tensor, freq: int, 21 | forecast: t.Tensor, target: t.Tensor, mask: t.Tensor) -> t.float: 22 | """ 23 | MAPE loss as defined in: https://en.wikipedia.org/wiki/Mean_absolute_percentage_error 24 | 25 | :param forecast: Forecast values. Shape: batch, time 26 | :param target: Target values. Shape: batch, time 27 | :param mask: 0/1 mask. Shape: batch, time 28 | :return: Loss value 29 | """ 30 | weights = divide_no_nan(mask, target) 31 | return t.mean(t.abs((forecast - target) * weights)) 32 | 33 | 34 | class smape_loss(nn.Module): 35 | def __init__(self): 36 | super(smape_loss, self).__init__() 37 | 38 | def forward(self, insample: t.Tensor, freq: int, 39 | forecast: t.Tensor, target: t.Tensor, mask: t.Tensor) -> t.float: 40 | """ 41 | sMAPE loss as defined in https://robjhyndman.com/hyndsight/smape/ (Makridakis 1993) 42 | 43 | :param forecast: Forecast values. Shape: batch, time 44 | :param target: Target values. Shape: batch, time 45 | :param mask: 0/1 mask. Shape: batch, time 46 | :return: Loss value 47 | """ 48 | return 200 * t.mean(divide_no_nan(t.abs(forecast - target), t.abs(forecast.data) + t.abs(target.data)) * mask) 49 | 50 | 51 | class mase_loss(nn.Module): 52 | def __init__(self): 53 | super(mase_loss, self).__init__() 54 | 55 | def forward(self, insample: t.Tensor, freq: int, 56 | forecast: t.Tensor, target: t.Tensor, mask: t.Tensor) -> t.float: 57 | """ 58 | MASE loss as defined in "Scaled Errors" https://robjhyndman.com/papers/mase.pdf 59 | 60 | :param insample: Insample values. Shape: batch, time_i 61 | :param freq: Frequency value 62 | :param forecast: Forecast values. Shape: batch, time_o 63 | :param target: Target values. Shape: batch, time_o 64 | :param mask: 0/1 mask. 
Shape: batch, time_o 65 | :return: Loss value 66 | """ 67 | masep = t.mean(t.abs(insample[:, freq:] - insample[:, :-freq]), dim=1) 68 | masked_masep_inv = divide_no_nan(mask, masep[:, None]) 69 | return t.mean(t.abs(target - forecast) * masked_masep_inv) 70 | 71 | 72 | class AutomaticWeightedLoss(nn.Module): 73 | """automatically weighted multi-task loss 74 | Params: 75 | num: int,the number of loss 76 | x: multi-task loss 77 | Examples: 78 | loss1=1 79 | loss2=2 80 | awl = AutomaticWeightedLoss(2) 81 | loss_sum = awl(loss1, loss2) 82 | """ 83 | 84 | def __init__(self, num=2): 85 | super(AutomaticWeightedLoss, self).__init__() 86 | params = torch.ones(num, requires_grad=True) 87 | self.params = nn.Parameter(params) 88 | 89 | def forward(self, *x): 90 | loss_sum = 0 91 | for i, loss in enumerate(x): 92 | loss_sum += 0.5 / (self.params[i] ** 2) * loss + torch.log(1 + self.params[i] ** 2) 93 | return loss_sum -------------------------------------------------------------------------------- /SimMTM_Forecasting/utils/m4_summary.py: -------------------------------------------------------------------------------- 1 | # This source code is provided for the purposes of scientific reproducibility 2 | # under the following limited license from Element AI Inc. The code is an 3 | # implementation of the N-BEATS model (Oreshkin et al., N-BEATS: Neural basis 4 | # expansion analysis for interpretable time series forecasting, 5 | # https://arxiv.org/abs/1905.10437). The copyright to the source code is 6 | # licensed under the Creative Commons - Attribution-NonCommercial 4.0 7 | # International license (CC BY-NC 4.0): 8 | # https://creativecommons.org/licenses/by-nc/4.0/. Any commercial use (whether 9 | # for the benefit of third parties or internally in production) requires an 10 | # explicit license. The subject-matter of the N-BEATS model and associated 11 | # materials are the property of Element AI Inc. and may be subject to patent 12 | # protection. 
No license to patents is granted hereunder (whether express or 13 | # implied). Copyright 2020 Element AI Inc. All rights reserved. 14 | 15 | """ 16 | M4 Summary 17 | """ 18 | from collections import OrderedDict 19 | 20 | import numpy as np 21 | import pandas as pd 22 | 23 | from data_provider.m4 import M4Dataset 24 | from data_provider.m4 import M4Meta 25 | import os 26 | 27 | 28 | def group_values(values, groups, group_name): 29 | return np.array([v[~np.isnan(v)] for v in values[groups == group_name]]) 30 | 31 | 32 | def mase(forecast, insample, outsample, frequency): 33 | return np.mean(np.abs(forecast - outsample)) / np.mean(np.abs(insample[:-frequency] - insample[frequency:])) 34 | 35 | 36 | def smape_2(forecast, target): 37 | denom = np.abs(target) + np.abs(forecast) 38 | # divide by 1.0 instead of 0.0, in case when denom is zero the enumerator will be 0.0 anyway. 39 | denom[denom == 0.0] = 1.0 40 | return 200 * np.abs(forecast - target) / denom 41 | 42 | 43 | def mape(forecast, target): 44 | denom = np.abs(target) 45 | # divide by 1.0 instead of 0.0, in case when denom is zero the enumerator will be 0.0 anyway. 46 | denom[denom == 0.0] = 1.0 47 | return 100 * np.abs(forecast - target) / denom 48 | 49 | 50 | class M4Summary: 51 | def __init__(self, file_path, root_path): 52 | self.file_path = file_path 53 | self.training_set = M4Dataset.load(training=True, dataset_file=root_path) 54 | self.test_set = M4Dataset.load(training=False, dataset_file=root_path) 55 | self.naive_path = os.path.join(root_path, 'submission-Naive2.csv') 56 | 57 | def evaluate(self): 58 | """ 59 | Evaluate forecasts using M4 test dataset. 60 | 61 | :param forecast: Forecasts. Shape: timeseries, time. 62 | :return: sMAPE and OWA grouped by seasonal patterns. 
        """
        grouped_owa = OrderedDict()

        naive2_forecasts = pd.read_csv(self.naive_path).values[:, 1:].astype(np.float32)
        naive2_forecasts = np.array([v[~np.isnan(v)] for v in naive2_forecasts])

        model_mases = {}
        naive2_smapes = {}
        naive2_mases = {}
        grouped_smapes = {}
        grouped_mapes = {}
        for group_name in M4Meta.seasonal_patterns:
            file_name = self.file_path + group_name + "_forecast.csv"
            if os.path.exists(file_name):
                model_forecast = pd.read_csv(file_name).values

            naive2_forecast = group_values(naive2_forecasts, self.test_set.groups, group_name)
            target = group_values(self.test_set.values, self.test_set.groups, group_name)
            # all timeseries within group have same frequency
            frequency = self.training_set.frequencies[self.test_set.groups == group_name][0]
            insample = group_values(self.training_set.values, self.test_set.groups, group_name)

            model_mases[group_name] = np.mean([mase(forecast=model_forecast[i],
                                                    insample=insample[i],
                                                    outsample=target[i],
                                                    frequency=frequency) for i in range(len(model_forecast))])
            naive2_mases[group_name] = np.mean([mase(forecast=naive2_forecast[i],
                                                     insample=insample[i],
                                                     outsample=target[i],
                                                     frequency=frequency) for i in range(len(model_forecast))])

            naive2_smapes[group_name] = np.mean(smape_2(naive2_forecast, target))
            grouped_smapes[group_name] = np.mean(smape_2(forecast=model_forecast, target=target))
            grouped_mapes[group_name] = np.mean(mape(forecast=model_forecast, target=target))

        grouped_smapes = self.summarize_groups(grouped_smapes)
        grouped_mapes = self.summarize_groups(grouped_mapes)
        grouped_model_mases = self.summarize_groups(model_mases)
        grouped_naive2_smapes = self.summarize_groups(naive2_smapes)
        grouped_naive2_mases = self.summarize_groups(naive2_mases)
        for k in grouped_model_mases.keys():
            grouped_owa[k] = (grouped_model_mases[k] /
                              grouped_naive2_mases[k] +
                              grouped_smapes[k] / grouped_naive2_smapes[k]) / 2

        def round_all(d):
            return dict(map(lambda kv: (kv[0], np.round(kv[1], 3)), d.items()))

        return round_all(grouped_smapes), round_all(grouped_owa), round_all(grouped_mapes), round_all(
            grouped_model_mases)

    def summarize_groups(self, scores):
        """
        Re-group scores respecting M4 rules.
        :param scores: Scores per group.
        :return: Grouped scores.
        """
        scores_summary = OrderedDict()

        def group_count(group_name):
            return len(np.where(self.test_set.groups == group_name)[0])

        weighted_score = {}
        for g in ['Yearly', 'Quarterly', 'Monthly']:
            weighted_score[g] = scores[g] * group_count(g)
            scores_summary[g] = scores[g]

        others_score = 0
        others_count = 0
        for g in ['Weekly', 'Daily', 'Hourly']:
            others_score += scores[g] * group_count(g)
            others_count += group_count(g)
        weighted_score['Others'] = others_score
        scores_summary['Others'] = others_score / others_count

        average = np.sum(list(weighted_score.values())) / len(self.test_set.groups)
        scores_summary['Average'] = average

        return scores_summary

--------------------------------------------------------------------------------
/SimMTM_Forecasting/utils/masking.py:
--------------------------------------------------------------------------------
import torch


class TriangularCausalMask():
    def __init__(self, B, L, device="cpu"):
        mask_shape = [B, 1, L, L]
        with torch.no_grad():
            self._mask = torch.triu(torch.ones(mask_shape, dtype=torch.bool), diagonal=1).to(device)

    @property
    def mask(self):
        return self._mask


class ProbMask():
    def __init__(self, B, H, L, index, scores, device="cpu"):
        _mask = torch.ones(L, scores.shape[-1], dtype=torch.bool).to(device).triu(1)
        _mask_ex = _mask[None,
                         None, :].expand(B, H, L, scores.shape[-1])
        indicator = _mask_ex[torch.arange(B)[:, None, None],
                    torch.arange(H)[None, :, None],
                    index, :].to(device)
        self._mask = indicator.view(scores.shape).to(device)

    @property
    def mask(self):
        return self._mask

--------------------------------------------------------------------------------
/SimMTM_Forecasting/utils/metrics.py:
--------------------------------------------------------------------------------
import numpy as np


def RSE(pred, true):
    return np.sqrt(np.sum((true - pred) ** 2)) / np.sqrt(np.sum((true - true.mean()) ** 2))


def CORR(pred, true):
    u = ((true - true.mean(0)) * (pred - pred.mean(0))).sum(0)
    # Pearson denominator: product of the two sums of squares, not the sum of the products
    d = np.sqrt(((true - true.mean(0)) ** 2).sum(0) * ((pred - pred.mean(0)) ** 2).sum(0))
    return (u / d).mean(-1)


def MAE(pred, true):
    return np.mean(np.abs(pred - true))


def MSE(pred, true):
    return np.mean((pred - true) ** 2)


def RMSE(pred, true):
    return np.sqrt(MSE(pred, true))


def MAPE(pred, true):
    return np.mean(np.abs((pred - true) / true))


def MSPE(pred, true):
    return np.mean(np.square((pred - true) / true))


def metric(pred, true):
    mae = MAE(pred, true)
    mse = MSE(pred, true)
    rmse = RMSE(pred, true)
    mape = MAPE(pred, true)
    mspe = MSPE(pred, true)

    return mae, mse, rmse, mape, mspe

--------------------------------------------------------------------------------
/SimMTM_Forecasting/utils/timefeatures.py:
--------------------------------------------------------------------------------
from typing import List

import numpy as np
import pandas as pd
from pandas.tseries import offsets
from pandas.tseries.frequencies import to_offset


class TimeFeature:
    def __init__(self):
        pass

    def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
        pass

    def __repr__(self):
        return self.__class__.__name__ + "()"


class SecondOfMinute(TimeFeature):
    """Second of minute encoded as value between [-0.5, 0.5]"""

    def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
        return index.second / 59.0 - 0.5


class MinuteOfHour(TimeFeature):
    """Minute of hour encoded as value between [-0.5, 0.5]"""

    def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
        return index.minute / 59.0 - 0.5


class HourOfDay(TimeFeature):
    """Hour of day encoded as value between [-0.5, 0.5]"""

    def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
        return index.hour / 23.0 - 0.5


class DayOfWeek(TimeFeature):
    """Day of week encoded as value between [-0.5, 0.5]"""

    def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
        return index.dayofweek / 6.0 - 0.5


class DayOfMonth(TimeFeature):
    """Day of month encoded as value between [-0.5, 0.5]"""

    def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
        return (index.day - 1) / 30.0 - 0.5


class DayOfYear(TimeFeature):
    """Day of year encoded as value between [-0.5, 0.5]"""

    def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
        return (index.dayofyear - 1) / 365.0 - 0.5


class MonthOfYear(TimeFeature):
    """Month of year encoded as value between [-0.5, 0.5]"""

    def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
        return (index.month - 1) / 11.0 - 0.5


class WeekOfYear(TimeFeature):
    """Week of year encoded as value between [-0.5, 0.5]"""

    def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
        return (index.isocalendar().week - 1) / 52.0 - 0.5


def time_features_from_frequency_str(freq_str: str) -> List[TimeFeature]:
    """
    Returns a list of time features that will be appropriate for the
    given frequency string.
    Parameters
    ----------
    freq_str
        Frequency string of the form [multiple][granularity] such as "12H", "5min", "1D" etc.
    """

    features_by_offsets = {
        offsets.YearEnd: [],
        offsets.QuarterEnd: [MonthOfYear],
        offsets.MonthEnd: [MonthOfYear],
        offsets.Week: [DayOfMonth, WeekOfYear],
        offsets.Day: [DayOfWeek, DayOfMonth, DayOfYear],
        offsets.BusinessDay: [DayOfWeek, DayOfMonth, DayOfYear],
        offsets.Hour: [HourOfDay, DayOfWeek, DayOfMonth, DayOfYear],
        offsets.Minute: [
            MinuteOfHour,
            HourOfDay,
            DayOfWeek,
            DayOfMonth,
            DayOfYear,
        ],
        offsets.Second: [
            SecondOfMinute,
            MinuteOfHour,
            HourOfDay,
            DayOfWeek,
            DayOfMonth,
            DayOfYear,
        ],
    }

    offset = to_offset(freq_str)

    for offset_type, feature_classes in features_by_offsets.items():
        if isinstance(offset, offset_type):
            return [cls() for cls in feature_classes]

    supported_freq_msg = f"""
    Unsupported frequency {freq_str}
    The following frequencies are supported:
        Y   - yearly
            alias: A
        M   - monthly
        W   - weekly
        D   - daily
        B   - business days
        H   - hourly
        T   - minutely
            alias: min
        S   - secondly
    """
    raise RuntimeError(supported_freq_msg)


def time_features(dates, freq='h'):
    return np.vstack([feat(dates) for feat in time_features_from_frequency_str(freq)])

--------------------------------------------------------------------------------
/figs/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/figs/.DS_Store

--------------------------------------------------------------------------------
/figs/mainresult.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/figs/mainresult.png

--------------------------------------------------------------------------------
/figs/overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/figs/overview.png

--------------------------------------------------------------------------------
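
As a rough illustration of how `M4Summary.evaluate` in `/SimMTM_Forecasting/utils/m4_summary.py` combines its metrics, the sketch below computes sMAPE, MASE and OWA for a single toy series in plain Python. The helper names `smape_mean`, `mase_single` and `owa`, and all numbers, are made up for this example and are not part of the repository. The repository version works on NumPy arrays and averages the scores per seasonal group before taking the ratio against the Naive2 baseline.

```python
# Minimal sketch with hypothetical helpers; not the repository's implementation.

def smape_mean(forecast, target):
    """Mean symmetric MAPE in percent; a 0/0 term counts as 0, as in smape_2."""
    total = 0.0
    for f, y in zip(forecast, target):
        denom = abs(f) + abs(y)
        total += 0.0 if denom == 0 else 200.0 * abs(f - y) / denom
    return total / len(target)


def mase_single(forecast, insample, target, frequency):
    """Forecast MAE scaled by the in-sample seasonal-naive MAE."""
    mae = sum(abs(f - y) for f, y in zip(forecast, target)) / len(target)
    diffs = [abs(a - b) for a, b in zip(insample[frequency:], insample[:-frequency])]
    return mae / (sum(diffs) / len(diffs))


def owa(model_smape, model_mase, naive_smape, naive_mase):
    """Overall weighted average: mean of the two ratios against the baseline."""
    return (model_mase / naive_mase + model_smape / naive_smape) / 2


# Toy data (assumed values, chosen only for illustration).
insample = [1.0, 2.0, 3.0, 4.0]
target = [5.0, 6.0]
model_forecast = [5.0, 7.0]
naive_forecast = [4.0, 4.0]  # stands in for the Naive2 baseline

model_smape = smape_mean(model_forecast, target)
naive_smape = smape_mean(naive_forecast, target)
model_mase = mase_single(model_forecast, insample, target, frequency=1)
naive_mase = mase_single(naive_forecast, insample, target, frequency=1)

score = owa(model_smape, model_mase, naive_smape, naive_mase)
print(round(score, 4))
```

An OWA below 1 means the model beats the baseline on the averaged ratio, and scoring the baseline against itself gives exactly 1.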