├── README.md
├── SimMTM_Classification
    ├── .gitignore
    ├── code
    │   ├── config_files
    │   │   ├── Epilepsy_Configs.py
    │   │   └── SleepEEG_Configs.py
    │   ├── dataloader.py
    │   ├── layers
    │   │   ├── AutoCorrelation.py
    │   │   ├── Autoformer_EncDec.py
    │   │   ├── Embed.py
    │   │   ├── SelfAttention_Family.py
    │   │   ├── Transformer_EncDec.py
    │   │   └── __init__.py
    │   ├── loss.py
    │   ├── main.py
    │   ├── model.py
    │   ├── trainer.py
    │   └── utils
    │   │   ├── __init__.py
    │   │   ├── augmentations.py
    │   │   ├── loss.py
    │   │   ├── masking.py
    │   │   ├── metrics.py
    │   │   ├── timefeatures.py
    │   │   ├── tools.py
    │   │   └── utils.py
    ├── download_datasets.sh
    └── run.sh
├── SimMTM_Forecasting
    ├── .DS_Store
    ├── .gitignore
    ├── data_provider
    │   ├── __init__.py
    │   ├── data_factory.py
    │   ├── data_loader.py
    │   ├── m4.py
    │   └── uea.py
    ├── exp
    │   ├── .DS_Store
    │   ├── __init__.py
    │   ├── exp_basic.py
    │   └── exp_simmtm.py
    ├── layers
    │   ├── AutoCorrelation.py
    │   ├── Autoformer_EncDec.py
    │   ├── Conv_Blocks.py
    │   ├── ETSformer_EncDec.py
    │   ├── Embed.py
    │   ├── FourierCorrelation.py
    │   ├── MultiWaveletCorrelation.py
    │   ├── Pyraformer_EncDec.py
    │   ├── SelfAttention_Family.py
    │   ├── Transformer_EncDec.py
    │   └── __init__.py
    ├── models
    │   ├── .DS_Store
    │   ├── PatchTST.py
    │   ├── SimMTM.py
    │   ├── __init__.py
    │   └── iTransformer.py
    ├── run.py
    ├── scripts
    │   ├── .DS_Store
    │   ├── finetune
    │   │   ├── .DS_Store
    │   │   ├── ECL_script
    │   │   │   ├── .DS_Store
    │   │   │   └── ECL.sh
    │   │   ├── ETT_script
    │   │   │   ├── .DS_Store
    │   │   │   ├── ETTh1.sh
    │   │   │   ├── ETTh2.sh
    │   │   │   ├── ETTm1.sh
    │   │   │   └── ETTm2.sh
    │   │   ├── Traffic
    │   │   │   ├── .DS_Store
    │   │   │   └── Traffic.sh
    │   │   └── Weather_script
    │   │   │   ├── .DS_Store
    │   │   │   └── Weather.sh
    │   └── pretrain
    │   │   ├── .DS_Store
    │   │   ├── ECL_script
    │   │       ├── .DS_Store
    │   │       └── ECL.sh
    │   │   ├── ETT_script
    │   │       ├── .DS_Store
    │   │       ├── ETTh1.sh
    │   │       ├── ETTh2.sh
    │   │       ├── ETTm1.sh
    │   │       └── ETTm2.sh
    │   │   ├── Traffic_script
    │   │       ├── .DS_Store
    │   │       └── Traffic.sh
    │   │   └── Weather_script
    │   │       ├── .DS_Store
    │   │       └── Weather.sh
    └── utils
    │   ├── .DS_Store
    │   ├── __init__.py
    │   ├── augmentations.py
    │   ├── losses.py
    │   ├── m4_summary.py
    │   ├── masking.py
    │   ├── metrics.py
    │   ├── timefeatures.py
    │   └── tools.py
└── figs
    ├── .DS_Store
    ├── mainresult.png
    └── overview.png


/README.md:
--------------------------------------------------------------------------------
  1 | 
  2 | # SimMTM (NeurIPS 2023)
  3 | 
  4 | This is the codebase for the paper: [SimMTM: A Simple Pre-Training Framework for Masked Time-Series Modeling](https://arxiv.org/abs/2302.00861)
  5 | 
  6 | 
  7 | ## Architecture
  8 | 
  9 | <p align="center">
 10 | <img src="./figs/overview.png" alt="" align=center />
 11 | <br><br>
 12 | <b>Figure 1.</b> Overview of SimMTM.
 13 | </p>
 14 | 
 15 | The reconstruction process of SimMTM involves the following four modules: masking, representation learning, series-wise similarity learning and point-wise reconstruction.
 16 | 
 17 | ### Masking
 18 | 
 19 | For each sample, SimMTM generates a set of masked series by randomly masking a portion of the time points along the temporal dimension.
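
The masking step can be sketched as follows. This is a minimal NumPy illustration, not the repository's implementation: the function name `random_masking`, the parameters `num_masked` and `seed`, and the choice to zero out masked points independently per time step are all assumptions made for the example.

```python
import numpy as np

def random_masking(x, masking_ratio=0.5, num_masked=3, seed=0):
    """Return `num_masked` masked copies of x (shape [T, C]) and their keep-masks.

    Assumption: each time step is masked independently with probability
    `masking_ratio`, and masked time steps are set to zero across all channels.
    """
    rng = np.random.default_rng(seed)
    T = x.shape[0]
    # keep[m, t] is True where time step t survives in masked view m
    keep = rng.random((num_masked, T)) >= masking_ratio
    # Broadcast the keep-mask over the channel dimension
    masked = x[None, :, :] * keep[:, :, None]
    return masked, keep

# Toy series with 6 time steps and 2 channels (all values non-zero)
x = np.arange(12, dtype=float).reshape(6, 2) + 1.0
masked, keep = random_masking(x)
print(masked.shape)  # (3, 6, 2)
```

Each masked view shares the shape of the original sample, so the views can be stacked with the original series into one batch for the encoder.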
 20 | 
 21 | ### Representation Learning
 22 | 
 23 | The encoder yields point-wise representations, and the projector layer maps them to series-wise representations.
 24 | 
 25 | ### Series-wise Similarity Learning
 26 | 
 27 | To reconstruct the original time series precisely, SimMTM uses the similarities among series-wise representations as weights for aggregation, thereby exploiting the local structure of the time series manifold.
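
One way to turn series-wise representations into aggregation weights is sketched below. It is a hedged NumPy illustration under stated assumptions: cosine similarity, a softmax with temperature 0.2, and excluding each series from its own aggregation are choices made for this example, and `similarity_weights` is a hypothetical name rather than a function from this repository.

```python
import numpy as np

def similarity_weights(series_repr, temperature=0.2):
    """Softmax-normalized cosine similarities between series-wise representations.

    series_repr: array of shape [N, D], one row per (original or masked) series.
    Returns an [N, N] matrix whose row i holds the aggregation weights for series i.
    """
    normed = series_repr / np.linalg.norm(series_repr, axis=1, keepdims=True)
    sim = normed @ normed.T / temperature        # pairwise cosine similarity, scaled
    np.fill_diagonal(sim, -np.inf)               # assumption: a series never aggregates itself
    sim = sim - sim.max(axis=1, keepdims=True)   # stabilize the softmax
    weights = np.exp(sim)
    return weights / weights.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
series_repr = rng.normal(size=(4, 8))   # 4 series, 8-dimensional representations
weights = similarity_weights(series_repr)
print(weights.sum(axis=1))  # each row sums to 1
```

A lower temperature concentrates the weights on the most similar series, while a higher one spreads them more evenly.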
 28 | 
 29 | ### Point-wise Reconstruction
 30 | 
 31 | Based on the learned series-wise similarities, the point-wise representations of a series' own masked variants and of other series are aggregated to reconstruct the original time series.
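
The aggregation itself amounts to a weighted sum of point-wise representations followed by a decoder. The sketch below assumes a precomputed `[N, N]` weight matrix and uses a single linear map `decoder_w` as a hypothetical stand-in for the decoder; the names and shapes are illustrative, not taken from this repository.

```python
import numpy as np

def reconstruct(weights, point_repr, decoder_w):
    """Aggregate point-wise representations and project them back to series values.

    weights:    [N, N] aggregation weights (row i sums to 1)
    point_repr: [N, T, D] point-wise representations of all series
    decoder_w:  [D, C] linear decoder (hypothetical stand-in)
    """
    # aggregated[i, t, :] = sum_j weights[i, j] * point_repr[j, t, :]
    aggregated = np.einsum("ij,jtd->itd", weights, point_repr)
    return aggregated @ decoder_w

# Hand-written weights for three series; the diagonal is zero
weights = np.array([[0.0, 0.5, 0.5],
                    [1.0, 0.0, 0.0],
                    [0.25, 0.75, 0.0]])
# Constant representations (values 1, 2, 4) make the result easy to check by hand
point_repr = np.stack([np.full((4, 2), v) for v in (1.0, 2.0, 4.0)])
decoder_w = np.ones((2, 1))
recon = reconstruct(weights, point_repr, decoder_w)
print(recon[:, 0, 0])  # [6.  2.  3.5]
```

During pre-training, the reconstruction would be compared against the unmasked original series with a point-wise reconstruction loss.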
 32 | 
 33 | 
 34 | ## Get Started
 35 | 
 36 | 1. Prepare Data.
 37 | 
 38 | All benchmark datasets can be obtained from [Google Drive](https://drive.google.com/file/d/1CC4ZrUD4EKncndzgy5PSTzOPSqcuyqqj/view?usp=sharing) or [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/a238e34ff81a42878d50/?dl=1). Arrange the folders as follows:
 39 | 
 40 | ```plain
 41 | SimMTM/
 42 | |--SimMTM_Forecasting
 43 |     |-- dataset/
 44 |         |-- ETT-small/
 45 |             |-- ETTh1.csv
 46 |             |-- ETTh2.csv
 47 |             |-- ETTm1.csv
 48 |             |-- ETTm2.csv
 49 |         |-- weather/
 50 |             |-- weather.csv
 51 |         |-- ...
 52 |     |-- ...
 53 | |--SimMTM_Classification
 54 |     |-- dataset/
 55 |         |-- SleepEEG/
 56 |             |-- train.pt
 57 |             |-- val.pt
 58 |             |-- test.pt
 59 |         |-- FD-B/
 60 |             |-- ...
 61 |         |-- EMG/
 62 |             |-- ...
 63 |     |-- ...
 64 | ```
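
Before launching a run, it can help to confirm that the files are where the scripts expect them. The helper below is a small sketch, not part of the repository: `missing_files` and `EXPECTED_FILES` are hypothetical names, and the list only covers the forecasting files spelled out in the layout above.

```python
from pathlib import Path

# Files named explicitly in the layout above; extend this list for other datasets.
EXPECTED_FILES = [
    "ETT-small/ETTh1.csv",
    "ETT-small/ETTh2.csv",
    "ETT-small/ETTm1.csv",
    "ETT-small/ETTm2.csv",
    "weather/weather.csv",
]

def missing_files(dataset_dir, expected=EXPECTED_FILES):
    """Return the expected relative paths that do not exist under dataset_dir."""
    root = Path(dataset_dir)
    return [rel for rel in expected if not (root / rel).is_file()]
```

Calling `missing_files` on the forecasting `dataset/` folder returns an empty list when every listed file is present.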
 65 | 
 66 | 2. Forecasting
 67 | 
 68 | The forecasting experiment code is in `./SimMTM_Forecasting`, and the experiment scripts are under its `./scripts` folder. To run the code on ETTh2, use the following commands:
 69 | 
 70 | ```bash
 71 | cd ./SimMTM_Forecasting
 72 | # pre-training
 73 | sh ./scripts/pretrain/ETT_script/ETTh2.sh
 74 | # fine-tuning
 75 | sh ./scripts/finetune/ETT_script/ETTh2.sh
 76 | ```
 77 | 
 78 | 3. Classification
 79 | 
 80 | The classification experiment code is in `./SimMTM_Classification`. To pre-train a model on SleepEEG and fine-tune it on Epilepsy, run:
 81 | 
 82 | ```bash
 83 | cd ./SimMTM_Classification
 84 | python ./code/main.py --training_mode pre_train --pretrain_dataset SleepEEG --target_dataset Epilepsy 
 85 | ```
 86 | 
 87 | 4. We also provide some [checkpoints](https://cloud.tsinghua.edu.cn/f/466995bb5f924f55a6da/?dl=1), which you can fine-tune directly on target datasets.
 88 | 
 89 | ## Main Results
 90 | 
 91 | <p align="center">
 92 | <img src="./figs/mainresult.png" alt="" align=center />
 93 | <br><br>
 94 | </p>
 95 | 
 96 | SimMTM (marked by red stars) can simultaneously cover high-level and low-level tasks for in- and cross-domain settings and outperforms other baselines significantly, highlighting the advantages of SimMTM in task generality. More results can be found in our paper.
 97 | 
 98 | ## Citation
 99 | If you find this repo useful, please cite our paper.
100 | 
101 | ```plain
102 | @inproceedings{dong2023simmtm,
103 |   title={SimMTM: A Simple Pre-Training Framework for Masked Time-Series Modeling},
104 |   author={Jiaxiang Dong and Haixu Wu and Haoran Zhang and Li Zhang and Jianmin Wang and Mingsheng Long},
105 |   booktitle={Advances in Neural Information Processing Systems},
106 |   year={2023}
107 | }
108 | ```
109 | 
110 | ## Contact
111 | 
112 | If you have any questions, please contact [djx20@mails.tsinghua.edu.cn](mailto:djx20@mails.tsinghua.edu.cn).
113 | 
114 | ## Acknowledgement
115 | 
116 | We thank the following GitHub repos for their valuable code and datasets:
117 | 
118 | https://github.com/thuml/Time-Series-Library
119 | 
120 | https://github.com/mims-harvard/TFC-pretraining/tree/main
121 | 
122 | Thanks to [vincentsham](https://github.com/vincentsham/simmtm/blob/main/experiments_simmtm-BeijingPM25Quality.ipynb) for reproducing our code.


--------------------------------------------------------------------------------
/SimMTM_Classification/.gitignore:
--------------------------------------------------------------------------------
1 | **/__pycache__/
2 | **/.DS_Store
3 | old_README.md
4 | **/.idea
5 | 


--------------------------------------------------------------------------------
/SimMTM_Classification/code/config_files/Epilepsy_Configs.py:
--------------------------------------------------------------------------------
 1 | class Config(object):
 2 |     def __init__(self):
 3 |         # model configs
 4 |         self.input_channels = 1
 5 |         self.kernel_size = 8
 6 |         self.stride = 1
 7 |         self.final_out_channels = 32  #128
 8 | 
 9 |         self.num_classes = 2
10 |         self.num_classes_target = 3
11 |         self.dropout = 0.35
12 |         self.features_len = 24
13 |         self.features_len_f = 24 # 13 #self.features_len   # the output results in time domain
14 | 
15 |         # training configs
16 |         self.num_epoch = 40 # 40
17 |         
18 |         # optimizer parameters
19 |         self.beta1 = 0.9
20 |         self.beta2 = 0.99
21 |         self.lr = 3e-4  # original lr: 3e-4
22 |         self.lr_f = 3e-4
23 | 
24 |         # data parameters
25 |         self.drop_last = True
26 |         self.batch_size = 32 #64 #  128
27 |         self.target_batch_size = 16 # the size of target dataset (the # of samples used to fine-tune).
28 | 
29 |         self.Context_Cont = Context_Cont_configs()
30 |         self.TC = TC()
31 |         self.augmentation = augmentations()
32 | 
33 | 
34 | class augmentations(object):
35 |     def __init__(self):
36 |         self.jitter_scale_ratio = 0.001
37 |         self.jitter_ratio = 0.001
38 |         self.max_seg = 5
39 | 
40 | 
41 | class Context_Cont_configs(object):
42 |     def __init__(self):
43 |         self.temperature = 0.2
44 |         self.use_cosine_similarity = True
45 |         self.use_cosine_similarity_f = True
46 | 
47 | 
48 | class TC(object):
49 |     def __init__(self):
50 |         self.hidden_dim = 100
51 |         self.timesteps = 10
52 | 


--------------------------------------------------------------------------------
/SimMTM_Classification/code/config_files/SleepEEG_Configs.py:
--------------------------------------------------------------------------------
 1 | 
 2 | class Config(object):
 3 |     def __init__(self):
 4 |         # model configs
 5 |         self.input_channels = 1
 6 |         self.increased_dim = 1
 7 |         self.final_out_channels = 128
 8 |         self.num_classes = 5
 9 |         self.num_classes_target = 8
10 |         self.dropout = 0.2
11 |         self.masking_ratio = 0.5
12 |         self.lm = 3 # average length of masking subsequences
13 | 
14 |         self.kernel_size = 25
15 |         self.stride = 3
16 |         self.features_len = 127
17 |         self.features_len_f = self.features_len
18 | 
19 |         self.TSlength_aligned = 178
20 | 
21 |         self.CNNoutput_channel = 10 # 90 # 10 for Epilepsy model
22 | 
23 |         # training configs
24 |         self.num_epoch = 40
25 | 
26 |         # optimizer parameters
27 |         self.optimizer = 'adam'
28 |         self.beta1 = 0.9
29 |         self.beta2 = 0.99
30 |         self.lr = 3e-8 # 3e-4
31 |         self.lr_f = self.lr
32 | 
33 |         # data parameters
34 |         self.drop_last = True
35 |         self.batch_size = 32
36 | 
37 |         """For Epilepsy, the target batchsize is 60"""
38 |         self.target_batch_size = 32   # the size of target dataset (the # of samples used to fine-tune).
39 | 
40 |         self.Context_Cont = Context_Cont_configs()
41 |         self.TC = TC()
42 |         self.augmentation = augmentations()
43 | 
44 | 
45 | class augmentations(object):
46 |     def __init__(self):
47 |         self.jitter_scale_ratio = 1.5
48 |         self.jitter_ratio = 2
49 |         self.max_seg = 12
50 | 
51 | 
52 | class Context_Cont_configs(object):
53 |     def __init__(self):
54 |         self.temperature = 0.2
55 |         self.use_cosine_similarity = True
56 | 
57 | 
58 | class TC(object):
59 |     def __init__(self):
60 |         self.hidden_dim = 64
61 |         self.timesteps = 50
62 | 


--------------------------------------------------------------------------------
/SimMTM_Classification/code/dataloader.py:
--------------------------------------------------------------------------------
 1 | import torch
 2 | from torch.utils.data import DataLoader
 3 | from torch.utils.data import Dataset
 4 | import os
 5 | import numpy as np
 6 | 
 7 | 
 8 | class Load_Dataset(Dataset):
 9 |     # Initialize your data, download, etc.
10 |     def __init__(self, dataset, config, training_mode, target_dataset_size=64, subset=False):
11 |         super(Load_Dataset, self).__init__()
12 |         self.training_mode = training_mode
13 |         X_train = dataset["samples"]
14 |         y_train = dataset["labels"]
15 |         # shuffle
16 |         data = list(zip(X_train, y_train))
17 |         np.random.shuffle(data)
18 |         X_train, y_train = zip(*data)
19 |         X_train, y_train = torch.stack(list(X_train), dim=0), torch.stack(list(y_train), dim=0)
20 | 
21 |         if len(X_train.shape) < 3:
22 |             X_train = X_train.unsqueeze(2)
23 | 
24 |         if X_train.shape.index(min(X_train.shape)) != 1:  # make sure the Channels in second dim
25 |             X_train = X_train.permute(0, 2, 1)
26 | 
27 |         """Align the TS length between source and target datasets"""
28 |         #X_train = X_train[:, :1, :int(config.TSlength_aligned)] # take the first 178 samples
29 |         X_train = X_train[:, :1, :int(config.TSlength_aligned)]
30 | 
31 |         """Subset for debugging"""
32 |         if subset:
33 |             subset_size = target_dataset_size *10
34 |             """if the dimension is larger than 178, take the first 178 dimensions. If multiple channels, take the first channel"""
35 |             X_train = X_train[:subset_size] 
36 |             y_train = y_train[:subset_size]
37 | 
38 |         if isinstance(X_train, np.ndarray):
39 |             self.x_data = torch.from_numpy(X_train)
40 |             self.y_data = torch.from_numpy(y_train).long()
41 |         else:
42 |             self.x_data = X_train
43 |             self.y_data = y_train
44 | 
45 |         self.len = X_train.shape[0]
46 | 
47 |     def __getitem__(self, index):
48 |         return self.x_data[index], self.y_data[index]
49 | 
50 |     def __len__(self):
51 |         return self.len
52 | 
53 | 
54 | def data_generator(sourcedata_path, targetdata_path, configs, training_mode, subset = True):
55 | 
56 |     train_dataset = torch.load(os.path.join(sourcedata_path, "train.pt"))
57 |     finetune_dataset = torch.load(os.path.join(targetdata_path, "train.pt"))
58 |     test_dataset = torch.load(os.path.join(targetdata_path, "test.pt"))
59 |     """ Dataset notes:
60 |     Epilepsy: train_dataset['samples'].shape = torch.Size([7360, 1, 178]); binary labels [7360] 
61 |     valid: [1840, 1, 178]
62 |     test: [2300, 1, 178]. In the test set, 1835 are positive samples; the positive rate is 0.7978"""
63 |     """sleepEDF: finetune_dataset['samples']: [7786, 1, 3000]"""
64 | 
65 |     # subset = True # if true, use a subset for debugging.
66 |     train_dataset = Load_Dataset(train_dataset, configs, training_mode, target_dataset_size=configs.batch_size, subset=subset) # for self-supervised, the data are augmented here
67 |     finetune_dataset = Load_Dataset(finetune_dataset, configs, training_mode, target_dataset_size=configs.target_batch_size, subset=subset)
68 |     if test_dataset['labels'].shape[0]>10*configs.target_batch_size:
69 |         test_dataset = Load_Dataset(test_dataset, configs, training_mode, target_dataset_size=configs.target_batch_size*10, subset=subset)
70 |     else:
71 |         test_dataset = Load_Dataset(test_dataset, configs, training_mode, target_dataset_size=configs.target_batch_size, subset=subset)
72 | 
73 |     train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=configs.batch_size,
74 |                                                shuffle=True, drop_last=configs.drop_last,
75 |                                                num_workers=0)
76 | 
77 |     """the valid and test loader would be finetuning set and test set."""
78 |     valid_loader = torch.utils.data.DataLoader(dataset=finetune_dataset, batch_size=configs.target_batch_size,
79 |                                                shuffle=True, drop_last=configs.drop_last,
80 |                                                num_workers=0)
81 | 
82 |     test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=configs.target_batch_size,
83 |                                               shuffle=True, drop_last=False,
84 |                                               num_workers=0)
85 | 
86 |     return train_loader, valid_loader, test_loader


--------------------------------------------------------------------------------
/SimMTM_Classification/code/layers/AutoCorrelation.py:
--------------------------------------------------------------------------------
  1 | import torch
  2 | import torch.nn as nn
  3 | import math
  4 | 
  5 | 
  6 | class AutoCorrelation(nn.Module):
  7 |     """
  8 |     AutoCorrelation Mechanism with the following two phases:
  9 |     (1) period-based dependencies discovery
 10 |     (2) time delay aggregation
 11 |     This block can replace the self-attention family mechanism seamlessly.
 12 |     """
 13 |     def __init__(self, mask_flag=True, factor=1, scale=None, attention_dropout=0.1, output_attention=False):
 14 |         super(AutoCorrelation, self).__init__()
 15 |         self.factor = factor
 16 |         self.scale = scale
 17 |         self.mask_flag = mask_flag
 18 |         self.output_attention = output_attention
 19 |         self.dropout = nn.Dropout(attention_dropout)
 20 | 
 21 |     def time_delay_agg_training(self, values, corr):
 22 |         """
 23 |         SpeedUp version of Autocorrelation (a batch-normalization style design)
 24 |         This is for the training phase.
 25 |         """
 26 |         head = values.shape[1]
 27 |         channel = values.shape[2]
 28 |         length = values.shape[3]
 29 |         # find top k
 30 |         top_k = int(self.factor * math.log(length))
 31 |         mean_value = torch.mean(torch.mean(corr, dim=1), dim=1)
 32 |         index = torch.topk(torch.mean(mean_value, dim=0), top_k, dim=-1)[1]
 33 |         weights = torch.stack([mean_value[:, index[i]] for i in range(top_k)], dim=-1)
 34 |         # update corr
 35 |         tmp_corr = torch.softmax(weights, dim=-1)
 36 |         # aggregation
 37 |         tmp_values = values
 38 |         delays_agg = torch.zeros_like(values).float()
 39 |         for i in range(top_k):
 40 |             pattern = torch.roll(tmp_values, -int(index[i]), -1)
 41 |             delays_agg = delays_agg + pattern * \
 42 |                          (tmp_corr[:, i].unsqueeze(1).unsqueeze(1).unsqueeze(1).repeat(1, head, channel, length))
 43 |         return delays_agg
 44 | 
 45 |     def time_delay_agg_inference(self, values, corr):
 46 |         """
 47 |         SpeedUp version of Autocorrelation (a batch-normalization style design)
 48 |         This is for the inference phase.
 49 |         """
 50 |         batch = values.shape[0]
 51 |         head = values.shape[1]
 52 |         channel = values.shape[2]
 53 |         length = values.shape[3]
 54 |         # index init
 55 |         init_index = torch.arange(length).unsqueeze(0).unsqueeze(0).unsqueeze(0)\
 56 |             .repeat(batch, head, channel, 1).to(values.device)
 57 |         # find top k
 58 |         top_k = int(self.factor * math.log(length))
 59 |         mean_value = torch.mean(torch.mean(corr, dim=1), dim=1)
 60 |         weights, delay = torch.topk(mean_value, top_k, dim=-1)
 61 |         # update corr
 62 |         tmp_corr = torch.softmax(weights, dim=-1)
 63 |         # aggregation
 64 |         tmp_values = values.repeat(1, 1, 1, 2)
 65 |         delays_agg = torch.zeros_like(values).float()
 66 |         for i in range(top_k):
 67 |             tmp_delay = init_index + delay[:, i].unsqueeze(1).unsqueeze(1).unsqueeze(1).repeat(1, head, channel, length)
 68 |             pattern = torch.gather(tmp_values, dim=-1, index=tmp_delay)
 69 |             delays_agg = delays_agg + pattern * \
 70 |                          (tmp_corr[:, i].unsqueeze(1).unsqueeze(1).unsqueeze(1).repeat(1, head, channel, length))
 71 |         return delays_agg
 72 | 
 73 |     def time_delay_agg_full(self, values, corr):
 74 |         """
 75 |         Standard version of Autocorrelation
 76 |         """
 77 |         batch = values.shape[0]
 78 |         head = values.shape[1]
 79 |         channel = values.shape[2]
 80 |         length = values.shape[3]
 81 |         # index init
 82 |         init_index = torch.arange(length).unsqueeze(0).unsqueeze(0).unsqueeze(0)\
 83 |             .repeat(batch, head, channel, 1).to(values.device)
 84 |         # find top k
 85 |         top_k = int(self.factor * math.log(length))
 86 |         weights, delay = torch.topk(corr, top_k, dim=-1)
 87 |         # update corr
 88 |         tmp_corr = torch.softmax(weights, dim=-1)
 89 |         # aggregation
 90 |         tmp_values = values.repeat(1, 1, 1, 2)
 91 |         delays_agg = torch.zeros_like(values).float()
 92 |         for i in range(top_k):
 93 |             tmp_delay = init_index + delay[..., i].unsqueeze(-1)
 94 |             pattern = torch.gather(tmp_values, dim=-1, index=tmp_delay)
 95 |             delays_agg = delays_agg + pattern * (tmp_corr[..., i].unsqueeze(-1))
 96 |         return delays_agg
 97 | 
 98 |     def forward(self, queries, keys, values, attn_mask):
 99 |         B, L, H, E = queries.shape
100 |         _, S, _, D = values.shape
101 |         if L > S:
102 |             zeros = torch.zeros_like(queries[:, :(L - S), :]).float()
103 |             values = torch.cat([values, zeros], dim=1)
104 |             keys = torch.cat([keys, zeros], dim=1)
105 |         else:
106 |             values = values[:, :L, :, :]
107 |             keys = keys[:, :L, :, :]
108 | 
109 |         # period-based dependencies
110 |         q_fft = torch.fft.rfft(queries.permute(0, 2, 3, 1).contiguous(), dim=-1)
111 |         k_fft = torch.fft.rfft(keys.permute(0, 2, 3, 1).contiguous(), dim=-1)
112 |         res = q_fft * torch.conj(k_fft)
113 |         corr = torch.fft.irfft(res, dim=-1)
114 | 
115 |         # time delay agg
116 |         if self.training:
117 |             V = self.time_delay_agg_training(values.permute(0, 2, 3, 1).contiguous(), corr).permute(0, 3, 1, 2)
118 |         else:
119 |             V = self.time_delay_agg_inference(values.permute(0, 2, 3, 1).contiguous(), corr).permute(0, 3, 1, 2)
120 | 
121 |         if self.output_attention:
122 |             return (V.contiguous(), corr.permute(0, 3, 1, 2))
123 |         else:
124 |             return (V.contiguous(), None)
125 | 
126 | 
127 | class AutoCorrelationLayer(nn.Module):
128 |     def __init__(self, correlation, d_model, n_heads, d_keys=None,
129 |                  d_values=None):
130 |         super(AutoCorrelationLayer, self).__init__()
131 | 
132 |         d_keys = d_keys or (d_model // n_heads)
133 |         d_values = d_values or (d_model // n_heads)
134 | 
135 |         self.inner_correlation = correlation
136 |         self.query_projection = nn.Linear(d_model, d_keys * n_heads)
137 |         self.key_projection = nn.Linear(d_model, d_keys * n_heads)
138 |         self.value_projection = nn.Linear(d_model, d_values * n_heads)
139 |         self.out_projection = nn.Linear(d_values * n_heads, d_model)
140 |         self.n_heads = n_heads
141 | 
142 |     def forward(self, queries, keys, values, attn_mask):
143 |         B, L, _ = queries.shape
144 |         _, S, _ = keys.shape
145 |         H = self.n_heads
146 | 
147 |         queries = self.query_projection(queries).view(B, L, H, -1)
148 |         keys = self.key_projection(keys).view(B, S, H, -1)
149 |         values = self.value_projection(values).view(B, S, H, -1)
150 | 
151 |         out, attn = self.inner_correlation(
152 |             queries,
153 |             keys,
154 |             values,
155 |             attn_mask
156 |         )
157 |         out = out.view(B, L, -1)
158 | 
159 |         return self.out_projection(out), attn
160 | 


--------------------------------------------------------------------------------
/SimMTM_Classification/code/layers/Autoformer_EncDec.py:
--------------------------------------------------------------------------------
  1 | import torch
  2 | import torch.nn as nn
  3 | import torch.nn.functional as F
  4 | 
  5 | 
  6 | class my_Layernorm(nn.Module):
  7 |     """
  8 |     Specially designed layernorm for the seasonal part
  9 |     """
 10 |     def __init__(self, channels):
 11 |         super(my_Layernorm, self).__init__()
 12 |         self.layernorm = nn.LayerNorm(channels)
 13 | 
 14 |     def forward(self, x):
 15 |         x_hat = self.layernorm(x)
 16 |         bias = torch.mean(x_hat, dim=1).unsqueeze(1).repeat(1, x.shape[1], 1)
 17 |         return x_hat - bias
 18 | 
 19 | 
 20 | class moving_avg(nn.Module):
 21 |     """
 22 |     Moving average block to highlight the trend of time series
 23 |     """
 24 |     def __init__(self, kernel_size, stride):
 25 |         super(moving_avg, self).__init__()
 26 |         self.kernel_size = kernel_size
 27 |         self.avg = nn.AvgPool1d(kernel_size=kernel_size, stride=stride, padding=0)
 28 | 
 29 |     def forward(self, x):
 30 |         # pad both ends of the time series by repeating the edge values
 31 |         front = x[:, 0:1, :].repeat(1, (self.kernel_size - 1) // 2, 1)
 32 |         end = x[:, -1:, :].repeat(1, (self.kernel_size - 1) // 2, 1)
 33 |         x = torch.cat([front, x, end], dim=1)
 34 |         x = self.avg(x.permute(0, 2, 1))
 35 |         x = x.permute(0, 2, 1)
 36 |         return x
 37 | 
 38 | 
 39 | class series_decomp(nn.Module):
 40 |     """
 41 |     Series decomposition block
 42 |     """
 43 |     def __init__(self, kernel_size):
 44 |         super(series_decomp, self).__init__()
 45 |         self.moving_avg = moving_avg(kernel_size, stride=1)
 46 | 
 47 |     def forward(self, x):
 48 |         moving_mean = self.moving_avg(x)
 49 |         res = x - moving_mean
 50 |         return res, moving_mean
 51 | 
 52 | 
 53 | class EncoderLayer(nn.Module):
 54 |     """
 55 |     Autoformer encoder layer with the progressive decomposition architecture
 56 |     """
 57 |     def __init__(self, attention, d_model, d_ff=None, moving_avg=25, dropout=0.1, activation="relu"):
 58 |         super(EncoderLayer, self).__init__()
 59 |         d_ff = d_ff or 4 * d_model
 60 |         self.attention = attention
 61 |         self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1, bias=False)
 62 |         self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1, bias=False)
 63 |         self.decomp1 = series_decomp(moving_avg)
 64 |         self.decomp2 = series_decomp(moving_avg)
 65 |         self.dropout = nn.Dropout(dropout)
 66 |         self.activation = F.relu if activation == "relu" else F.gelu
 67 | 
 68 |     def forward(self, x, attn_mask=None):
 69 |         new_x, attn = self.attention(
 70 |             x, x, x,
 71 |             attn_mask=attn_mask
 72 |         )
 73 |         x = x + self.dropout(new_x)
 74 |         x, _ = self.decomp1(x)
 75 |         y = x
 76 |         y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1))))
 77 |         y = self.dropout(self.conv2(y).transpose(-1, 1))
 78 |         res, _ = self.decomp2(x + y)
 79 |         return res, attn
 80 | 
 81 | 
 82 | class Encoder(nn.Module):
 83 |     """
 84 |     Autoformer encoder
 85 |     """
 86 |     def __init__(self, attn_layers, conv_layers=None, norm_layer=None):
 87 |         super(Encoder, self).__init__()
 88 |         self.attn_layers = nn.ModuleList(attn_layers)
 89 |         self.conv_layers = nn.ModuleList(conv_layers) if conv_layers is not None else None
 90 |         self.norm = norm_layer
 91 | 
 92 |     def forward(self, x, attn_mask=None):
 93 |         attns = []
 94 |         if self.conv_layers is not None:
 95 |             for attn_layer, conv_layer in zip(self.attn_layers, self.conv_layers):
 96 |                 x, attn = attn_layer(x, attn_mask=attn_mask)
 97 |                 x = conv_layer(x)
 98 |                 attns.append(attn)
 99 |             x, attn = self.attn_layers[-1](x)
100 |             attns.append(attn)
101 |         else:
102 |             for attn_layer in self.attn_layers:
103 |                 x, attn = attn_layer(x, attn_mask=attn_mask)
104 |                 attns.append(attn)
105 | 
106 |         if self.norm is not None:
107 |             x = self.norm(x)
108 | 
109 |         return x, attns
110 | 
111 | 
112 | class DecoderLayer(nn.Module):
113 |     """
114 |     Autoformer decoder layer with the progressive decomposition architecture
115 |     """
116 |     def __init__(self, self_attention, cross_attention, d_model, c_out, d_ff=None,
117 |                  moving_avg=25, dropout=0.1, activation="relu"):
118 |         super(DecoderLayer, self).__init__()
119 |         d_ff = d_ff or 4 * d_model
120 |         self.self_attention = self_attention
121 |         self.cross_attention = cross_attention
122 |         self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1, bias=False)
123 |         self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1, bias=False)
124 |         self.decomp1 = series_decomp(moving_avg)
125 |         self.decomp2 = series_decomp(moving_avg)
126 |         self.decomp3 = series_decomp(moving_avg)
127 |         self.dropout = nn.Dropout(dropout)
128 |         self.projection = nn.Conv1d(in_channels=d_model, out_channels=c_out, kernel_size=3, stride=1, padding=1,
129 |                                     padding_mode='circular', bias=False)
130 |         self.activation = F.relu if activation == "relu" else F.gelu
131 | 
132 |     def forward(self, x, cross, x_mask=None, cross_mask=None):
133 |         x = x + self.dropout(self.self_attention(
134 |             x, x, x,
135 |             attn_mask=x_mask
136 |         )[0])
137 |         x, trend1 = self.decomp1(x)
138 |         x = x + self.dropout(self.cross_attention(
139 |             x, cross, cross,
140 |             attn_mask=cross_mask
141 |         )[0])
142 |         x, trend2 = self.decomp2(x)
143 |         y = x
144 |         y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1))))
145 |         y = self.dropout(self.conv2(y).transpose(-1, 1))
146 |         x, trend3 = self.decomp3(x + y)
147 | 
148 |         residual_trend = trend1 + trend2 + trend3
149 |         residual_trend = self.projection(residual_trend.permute(0, 2, 1)).transpose(1, 2)
150 |         return x, residual_trend
151 | 
152 | 
153 | class Decoder(nn.Module):
154 |     """
155 |     Autoformer decoder
156 |     """
157 |     def __init__(self, layers, norm_layer=None, projection=None):
158 |         super(Decoder, self).__init__()
159 |         self.layers = nn.ModuleList(layers)
160 |         self.norm = norm_layer
161 |         self.projection = projection
162 | 
163 |     def forward(self, x, cross, x_mask=None, cross_mask=None, trend=None):
164 |         for layer in self.layers:
165 |             x, residual_trend = layer(x, cross, x_mask=x_mask, cross_mask=cross_mask)
166 |             trend = trend + residual_trend
167 | 
168 |         if self.norm is not None:
169 |             x = self.norm(x)
170 | 
171 |         if self.projection is not None:
172 |             x = self.projection(x)
173 |         return x, trend
174 | 


--------------------------------------------------------------------------------
/SimMTM_Classification/code/layers/Embed.py:
--------------------------------------------------------------------------------
  1 | import torch
  2 | import torch.nn as nn
  3 | import math
  4 | 
  5 | 
  6 | class PositionalEmbedding(nn.Module):
  7 |     def __init__(self, d_model, max_len=5000):
  8 |         super(PositionalEmbedding, self).__init__()
  9 |         # Compute the positional encodings once in log space.
 10 |         pe = torch.zeros(max_len, d_model).float()
 11 |         pe.requires_grad = False
 12 | 
 13 |         position = torch.arange(0, max_len).float().unsqueeze(1)
 14 |         div_term = (torch.arange(0, d_model, 2).float() * -(math.log(10000.0) / d_model)).exp()
 15 | 
 16 |         pe[:, 0::2] = torch.sin(position * div_term)
 17 |         pe[:, 1::2] = torch.cos(position * div_term)
 18 | 
 19 |         pe = pe.unsqueeze(0)
 20 |         self.register_buffer('pe', pe)
 21 | 
 22 |     def forward(self, x):
 23 |         return self.pe[:, :x.size(1)]
 24 | 
 25 | 
 26 | class TokenEmbedding(nn.Module):
 27 |     def __init__(self, c_in, d_model):
 28 |         super(TokenEmbedding, self).__init__()
 29 |         padding = 1 if tuple(map(int, torch.__version__.split('+')[0].split('.')[:2])) >= (1, 5) else 2  # numeric compare: as strings, '1.10.0' < '1.5.0'
 30 |         self.tokenConv = nn.Conv1d(in_channels=c_in, out_channels=d_model,
 31 |                                    kernel_size=3, padding=padding, padding_mode='circular', bias=False)
 32 |         for m in self.modules():
 33 |             if isinstance(m, nn.Conv1d):
 34 |                 nn.init.kaiming_normal_(m.weight, mode='fan_in', nonlinearity='leaky_relu')
 35 | 
 36 |     def forward(self, x):
 37 |         x = self.tokenConv(x.permute(0, 2, 1)).transpose(1, 2)
 38 |         return x
 39 | 
 40 | 
 41 | class FixedEmbedding(nn.Module):
 42 |     def __init__(self, c_in, d_model):
 43 |         super(FixedEmbedding, self).__init__()
 44 | 
 45 |         w = torch.zeros(c_in, d_model).float()
 46 |         w.requires_grad = False
 47 | 
 48 |         position = torch.arange(0, c_in).float().unsqueeze(1)
 49 |         div_term = (torch.arange(0, d_model, 2).float() * -(math.log(10000.0) / d_model)).exp()
 50 | 
 51 |         w[:, 0::2] = torch.sin(position * div_term)
 52 |         w[:, 1::2] = torch.cos(position * div_term)
 53 | 
 54 |         self.emb = nn.Embedding(c_in, d_model)
 55 |         self.emb.weight = nn.Parameter(w, requires_grad=False)
 56 | 
 57 |     def forward(self, x):
 58 |         return self.emb(x).detach()
 59 | 
 60 | 
 61 | class TemporalEmbedding(nn.Module):
 62 |     def __init__(self, d_model, embed_type='fixed', freq='h'):
 63 |         super(TemporalEmbedding, self).__init__()
 64 | 
 65 |         minute_size = 4
 66 |         hour_size = 24
 67 |         weekday_size = 7
 68 |         day_size = 32
 69 |         month_size = 13
 70 | 
 71 |         Embed = FixedEmbedding if embed_type == 'fixed' else nn.Embedding
 72 |         if freq == 't':
 73 |             self.minute_embed = Embed(minute_size, d_model)
 74 |         self.hour_embed = Embed(hour_size, d_model)
 75 |         self.weekday_embed = Embed(weekday_size, d_model)
 76 |         self.day_embed = Embed(day_size, d_model)
 77 |         self.month_embed = Embed(month_size, d_model)
 78 | 
 79 |     def forward(self, x):
 80 |         x = x.long()
 81 | 
 82 |         minute_x = self.minute_embed(x[:, :, 4]) if hasattr(self, 'minute_embed') else 0.
 83 |         hour_x = self.hour_embed(x[:, :, 3])
 84 |         weekday_x = self.weekday_embed(x[:, :, 2])
 85 |         day_x = self.day_embed(x[:, :, 1])
 86 |         month_x = self.month_embed(x[:, :, 0])
 87 | 
 88 |         return hour_x + weekday_x + day_x + month_x + minute_x
 89 | 
 90 | 
 91 | class TimeFeatureEmbedding(nn.Module):
 92 |     def __init__(self, d_model, embed_type='timeF', freq='h'):
 93 |         super(TimeFeatureEmbedding, self).__init__()
 94 | 
 95 |         freq_map = {'h': 4, 't': 5, 's': 6, 'm': 1, 'a': 1, 'w': 2, 'd': 3, 'b': 3}
 96 |         d_inp = freq_map[freq]
 97 |         self.embed = nn.Linear(d_inp, d_model, bias=False)
 98 | 
 99 |     def forward(self, x):
100 |         return self.embed(x)
101 | 
102 | 
103 | class DataEmbedding(nn.Module):
104 |     def __init__(self, c_in, d_model, embed_type='fixed', freq='h', dropout=0.1):
105 |         super(DataEmbedding, self).__init__()
106 | 
107 |         self.value_embedding = TokenEmbedding(c_in=c_in, d_model=d_model)
108 |         self.position_embedding = PositionalEmbedding(d_model=d_model)
109 |         self.temporal_embedding = TemporalEmbedding(d_model=d_model, embed_type=embed_type,
110 |                                                     freq=freq) if embed_type != 'timeF' else TimeFeatureEmbedding(
111 |             d_model=d_model, embed_type=embed_type, freq=freq)
112 |         self.dropout = nn.Dropout(p=dropout)
113 | 
114 |     def forward(self, x):
115 |         x = self.value_embedding(x) + self.position_embedding(x)
116 |         return self.dropout(x)
117 | 
118 | 
119 | class DataEmbedding_wo_pos(nn.Module):
120 |     def __init__(self, c_in, d_model, embed_type='fixed', freq='h', dropout=0.1):
121 |         super(DataEmbedding_wo_pos, self).__init__()
122 | 
123 |         self.value_embedding = TokenEmbedding(c_in=c_in, d_model=d_model)
124 |         self.position_embedding = PositionalEmbedding(d_model=d_model)
125 |         self.temporal_embedding = TemporalEmbedding(d_model=d_model, embed_type=embed_type,
126 |                                                     freq=freq) if embed_type != 'timeF' else TimeFeatureEmbedding(
127 |             d_model=d_model, embed_type=embed_type, freq=freq)
128 |         self.dropout = nn.Dropout(p=dropout)
129 | 
130 |     def forward(self, x, x_mark):
131 |         x = self.value_embedding(x) + self.temporal_embedding(x_mark)
132 |         return self.dropout(x)
133 | 
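
The sinusoidal table that `PositionalEmbedding` precomputes can be reproduced without PyTorch. The following is a minimal sketch, not part of the repository: `sinusoidal_table` is a hypothetical helper name, and it assumes an even `d_model` (the slice assignments in the listing above also rely on that).

```python
import math


def sinusoidal_table(max_len, d_model):
    """Pure-Python counterpart of the `pe` buffer in PositionalEmbedding.

    Hypothetical helper for illustration; assumes d_model is even.
    """
    # Same frequencies as `div_term`: exp(-k * ln(10000) / d_model) for k = 0, 2, 4, ...
    div_term = [math.exp(-k * math.log(10000.0) / d_model) for k in range(0, d_model, 2)]
    table = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for j, freq in enumerate(div_term):
            table[pos][2 * j] = math.sin(pos * freq)      # even columns: sine
            table[pos][2 * j + 1] = math.cos(pos * freq)  # odd columns: cosine
    return table


table = sinusoidal_table(max_len=4, d_model=4)
print(table[0])  # position 0: sin(0) and cos(0) alternate
```

Because the table depends only on the position index, `forward` can slice it to the input length and ignore the input values.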


--------------------------------------------------------------------------------
/SimMTM_Classification/code/layers/SelfAttention_Family.py:
--------------------------------------------------------------------------------
  1 | import torch
  2 | import torch.nn as nn
  3 | import numpy as np
  4 | from math import sqrt
  5 | from utils.masking import TriangularCausalMask, ProbMask
  6 | 
  7 | 
  8 | class FullAttention(nn.Module):
  9 |     def __init__(self, mask_flag=True, factor=5, scale=None, attention_dropout=0.1, output_attention=False):
 10 |         super(FullAttention, self).__init__()
 11 |         self.scale = scale
 12 |         self.mask_flag = mask_flag
 13 |         self.output_attention = output_attention
 14 |         self.dropout = nn.Dropout(attention_dropout)
 15 | 
 16 |     def forward(self, queries, keys, values, attn_mask):
 17 |         B, L, H, E = queries.shape
 18 |         _, S, _, D = values.shape
 19 |         scale = self.scale or 1. / sqrt(E)
 20 | 
 21 |         scores = torch.einsum("blhe,bshe->bhls", queries, keys)
 22 | 
 23 |         if self.mask_flag:
 24 |             if attn_mask is None:
 25 |                 attn_mask = TriangularCausalMask(B, L, device=queries.device)
 26 | 
 27 |             scores.masked_fill_(attn_mask.mask, -np.inf)
 28 | 
 29 |         A = self.dropout(torch.softmax(scale * scores, dim=-1))
 30 |         V = torch.einsum("bhls,bshd->blhd", A, values)
 31 | 
 32 |         if self.output_attention:
 33 |             return (V.contiguous(), A)
 34 |         else:
 35 |             return (V.contiguous(), None)
 36 | 
 37 | 
 38 | class ProbAttention(nn.Module):
 39 |     def __init__(self, mask_flag=True, factor=5, scale=None, attention_dropout=0.1, output_attention=False):
 40 |         super(ProbAttention, self).__init__()
 41 |         self.factor = factor
 42 |         self.scale = scale
 43 |         self.mask_flag = mask_flag
 44 |         self.output_attention = output_attention
 45 |         self.dropout = nn.Dropout(attention_dropout)
 46 | 
 47 |     def _prob_QK(self, Q, K, sample_k, n_top):  # n_top: c*ln(L_q)
 48 |         # Q [B, H, L, D]
 49 |         B, H, L_K, E = K.shape
 50 |         _, _, L_Q, _ = Q.shape
 51 | 
 52 |         # calculate the sampled Q_K
 53 |         K_expand = K.unsqueeze(-3).expand(B, H, L_Q, L_K, E)
 54 |         index_sample = torch.randint(L_K, (L_Q, sample_k))  # real U = U_part(factor*ln(L_k))*L_q
 55 |         K_sample = K_expand[:, :, torch.arange(L_Q).unsqueeze(1), index_sample, :]
 56 |         Q_K_sample = torch.matmul(Q.unsqueeze(-2), K_sample.transpose(-2, -1)).squeeze(-2)  # squeeze only the singleton dim so B == 1 or H == 1 is not collapsed
 57 | 
 58 |         # find the Top_k query with sparsity measurement
 59 |         M = Q_K_sample.max(-1)[0] - torch.div(Q_K_sample.sum(-1), L_K)
 60 |         M_top = M.topk(n_top, sorted=False)[1]
 61 | 
 62 |         # use the reduced Q to calculate Q_K
 63 |         Q_reduce = Q[torch.arange(B)[:, None, None],
 64 |                    torch.arange(H)[None, :, None],
 65 |                    M_top, :]  # factor*ln(L_q)
 66 |         Q_K = torch.matmul(Q_reduce, K.transpose(-2, -1))  # factor*ln(L_q)*L_k
 67 | 
 68 |         return Q_K, M_top
 69 | 
 70 |     def _get_initial_context(self, V, L_Q):
 71 |         B, H, L_V, D = V.shape
 72 |         if not self.mask_flag:
 73 |             # V_sum = V.sum(dim=-2)
 74 |             V_sum = V.mean(dim=-2)
 75 |             contex = V_sum.unsqueeze(-2).expand(B, H, L_Q, V_sum.shape[-1]).clone()
 76 |         else:  # use mask
 77 |             assert (L_Q == L_V)  # requires that L_Q == L_V, i.e. for self-attention only
 78 |             contex = V.cumsum(dim=-2)
 79 |         return contex
 80 | 
 81 |     def _update_context(self, context_in, V, scores, index, L_Q, attn_mask):
 82 |         B, H, L_V, D = V.shape
 83 | 
 84 |         if self.mask_flag:
 85 |             attn_mask = ProbMask(B, H, L_Q, index, scores, device=V.device)
 86 |             scores.masked_fill_(attn_mask.mask, -np.inf)
 87 | 
 88 |         attn = torch.softmax(scores, dim=-1)  # nn.Softmax(dim=-1)(scores)
 89 | 
 90 |         context_in[torch.arange(B)[:, None, None],
 91 |         torch.arange(H)[None, :, None],
 92 |         index, :] = torch.matmul(attn, V).type_as(context_in)
 93 |         if self.output_attention:
 94 |             attns = (torch.ones([B, H, L_V, L_V]) / L_V).type_as(attn).to(attn.device)
 95 |             attns[torch.arange(B)[:, None, None], torch.arange(H)[None, :, None], index, :] = attn
 96 |             return (context_in, attns)
 97 |         else:
 98 |             return (context_in, None)
 99 | 
100 |     def forward(self, queries, keys, values, attn_mask):
101 |         B, L_Q, H, D = queries.shape
102 |         _, L_K, _, _ = keys.shape
103 | 
104 |         queries = queries.transpose(2, 1)
105 |         keys = keys.transpose(2, 1)
106 |         values = values.transpose(2, 1)
107 | 
108 |         U_part = self.factor * np.ceil(np.log(L_K)).astype('int').item()  # c*ln(L_k)
109 |         u = self.factor * np.ceil(np.log(L_Q)).astype('int').item()  # c*ln(L_q)
110 | 
111 |         U_part = U_part if U_part < L_K else L_K
112 |         u = u if u < L_Q else L_Q
113 | 
114 |         scores_top, index = self._prob_QK(queries, keys, sample_k=U_part, n_top=u)
115 | 
116 |         # add scale factor
117 |         scale = self.scale or 1. / sqrt(D)
118 |         if scale is not None:
119 |             scores_top = scores_top * scale
120 |         # get the context
121 |         context = self._get_initial_context(values, L_Q)
122 |         # update the context with selected top_k queries
123 |         context, attn = self._update_context(context, values, scores_top, index, L_Q, attn_mask)
124 | 
125 |         return context.contiguous(), attn
126 | 
127 | 
128 | class AttentionLayer(nn.Module):
129 |     def __init__(self, attention, d_model, n_heads, d_keys=None,
130 |                  d_values=None):
131 |         super(AttentionLayer, self).__init__()
132 | 
133 |         d_keys = d_keys or (d_model // n_heads)
134 |         d_values = d_values or (d_model // n_heads)
135 | 
136 |         self.inner_attention = attention
137 |         self.query_projection = nn.Linear(d_model, d_keys * n_heads)
138 |         self.key_projection = nn.Linear(d_model, d_keys * n_heads)
139 |         self.value_projection = nn.Linear(d_model, d_values * n_heads)
140 |         self.out_projection = nn.Linear(d_values * n_heads, d_model)
141 |         self.n_heads = n_heads
142 | 
143 |     def forward(self, queries, keys, values, attn_mask):
144 |         B, L, _ = queries.shape
145 |         _, S, _ = keys.shape
146 |         H = self.n_heads
147 | 
148 |         queries = self.query_projection(queries).view(B, L, H, -1)
149 |         keys = self.key_projection(keys).view(B, S, H, -1)
150 |         values = self.value_projection(values).view(B, S, H, -1)
151 | 
152 |         out, attn = self.inner_attention(
153 |             queries,
154 |             keys,
155 |             values,
156 |             attn_mask
157 |         )
158 |         out = out.view(B, L, -1)
159 | 
160 |         return self.out_projection(out), attn
161 | 
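
`ProbAttention.forward` derives two budgets from the sequence lengths: how many keys to sample per query (`U_part`) and how many queries to keep (`u`). Both are `factor * ceil(ln(L))`, clipped to the sequence length. A minimal sketch of that arithmetic follows; `prob_attention_budget` is a hypothetical name used only for illustration.

```python
import math


def prob_attention_budget(len_q, len_k, factor=5):
    """Return (sample_k, n_top) as computed at the top of ProbAttention.forward.

    Hypothetical helper; mirrors U_part and u from the listing above.
    """
    sample_k = min(factor * math.ceil(math.log(len_k)), len_k)  # keys sampled per query
    n_top = min(factor * math.ceil(math.log(len_q)), len_q)     # queries kept as "active"
    return sample_k, n_top


print(prob_attention_budget(96, 96))
print(prob_attention_budget(8, 8))
```

For short sequences the clip takes over, so every query is kept and the layer does the same amount of score computation as full attention.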


--------------------------------------------------------------------------------
/SimMTM_Classification/code/layers/Transformer_EncDec.py:
--------------------------------------------------------------------------------
  1 | import torch.nn as nn
  2 | import torch.nn.functional as F
  3 | 
  4 | 
  5 | class ConvLayer(nn.Module):
  6 |     def __init__(self, c_in):
  7 |         super(ConvLayer, self).__init__()
  8 |         self.downConv = nn.Conv1d(in_channels=c_in,
  9 |                                   out_channels=c_in,
 10 |                                   kernel_size=3,
 11 |                                   padding=2,
 12 |                                   padding_mode='circular')
 13 |         self.norm = nn.BatchNorm1d(c_in)
 14 |         self.activation = nn.ELU()
 15 |         self.maxPool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)
 16 | 
 17 |     def forward(self, x):
 18 |         x = self.downConv(x.permute(0, 2, 1))
 19 |         x = self.norm(x)
 20 |         x = self.activation(x)
 21 |         x = self.maxPool(x)
 22 |         x = x.transpose(1, 2)
 23 |         return x
 24 | 
 25 | 
 26 | class EncoderLayer(nn.Module):
 27 |     def __init__(self, attention, d_model, d_ff=None, dropout=0.1, activation="relu"):
 28 |         super(EncoderLayer, self).__init__()
 29 |         d_ff = d_ff or 4 * d_model
 30 |         self.attention = attention
 31 |         self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1)
 32 |         self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1)
 33 |         self.norm1 = nn.LayerNorm(d_model)
 34 |         self.norm2 = nn.LayerNorm(d_model)
 35 |         self.dropout = nn.Dropout(dropout)
 36 |         self.activation = F.relu if activation == "relu" else F.gelu
 37 | 
 38 |     def forward(self, x, attn_mask=None):
 39 |         new_x, attn = self.attention(
 40 |             x, x, x,
 41 |             attn_mask=attn_mask
 42 |         )
 43 |         x = x + self.dropout(new_x)
 44 | 
 45 |         y = x = self.norm1(x)
 46 |         y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1))))
 47 |         y = self.dropout(self.conv2(y).transpose(-1, 1))
 48 | 
 49 |         return self.norm2(x + y), attn
 50 | 
 51 | 
 52 | class Encoder(nn.Module):
 53 |     def __init__(self, attn_layers, conv_layers=None, norm_layer=None, projection=None):
 54 |         super(Encoder, self).__init__()
 55 |         self.attn_layers = nn.ModuleList(attn_layers)
 56 |         self.conv_layers = nn.ModuleList(conv_layers) if conv_layers is not None else None
 57 |         self.norm = norm_layer
 58 |         self.projection = projection
 59 | 
 60 |     def forward(self, x, attn_mask=None):
 61 |         # x [B, L, D]
 62 |         attns = []
 63 |         if self.conv_layers is not None:
 64 |             for attn_layer, conv_layer in zip(self.attn_layers, self.conv_layers):
 65 |                 x, attn = attn_layer(x, attn_mask=attn_mask)
 66 |                 x = conv_layer(x)
 67 |                 attns.append(attn)
 68 |             x, attn = self.attn_layers[-1](x)
 69 |             attns.append(attn)
 70 |         else:
 71 |             for attn_layer in self.attn_layers:
 72 |                 x, attn = attn_layer(x, attn_mask=attn_mask)
 73 |                 attns.append(attn)
 74 | 
 75 |         if self.norm is not None:
 76 |             x = self.norm(x)
 77 |         
 78 |         if self.projection is not None:
 79 |             x = self.projection(x)
 80 | 
 81 |         return x, attns
 82 | 
 83 | 
 84 | class DecoderLayer(nn.Module):
 85 |     def __init__(self, self_attention, cross_attention, d_model, d_ff=None,
 86 |                  dropout=0.1, activation="relu"):
 87 |         super(DecoderLayer, self).__init__()
 88 |         d_ff = d_ff or 4 * d_model
 89 |         self.self_attention = self_attention
 90 |         self.cross_attention = cross_attention
 91 |         self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1)
 92 |         self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1)
 93 |         self.norm1 = nn.LayerNorm(d_model)
 94 |         self.norm2 = nn.LayerNorm(d_model)
 95 |         self.norm3 = nn.LayerNorm(d_model)
 96 |         self.dropout = nn.Dropout(dropout)
 97 |         self.activation = F.relu if activation == "relu" else F.gelu
 98 | 
 99 |     def forward(self, x, cross, x_mask=None, cross_mask=None):
100 |         x = x + self.dropout(self.self_attention(
101 |             x, x, x,
102 |             attn_mask=x_mask
103 |         )[0])
104 |         x = self.norm1(x)
105 | 
106 |         x = x + self.dropout(self.cross_attention(
107 |             x, cross, cross,
108 |             attn_mask=cross_mask
109 |         )[0])
110 | 
111 |         y = x = self.norm2(x)
112 |         y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1))))
113 |         y = self.dropout(self.conv2(y).transpose(-1, 1))
114 | 
115 |         return self.norm3(x + y)
116 | 
117 | 
118 | class Decoder(nn.Module):
119 |     def __init__(self, layers, norm_layer=None, projection=None):
120 |         super(Decoder, self).__init__()
121 |         self.layers = nn.ModuleList(layers)
122 |         self.norm = norm_layer
123 |         self.projection = projection
124 | 
125 |     def forward(self, x, cross, x_mask=None, cross_mask=None):
126 |         for layer in self.layers:
127 |             x = layer(x, cross, x_mask=x_mask, cross_mask=cross_mask)
128 | 
129 |         if self.norm is not None:
130 |             x = self.norm(x)
131 | 
132 |         if self.projection is not None:
133 |             x = self.projection(x)
134 |         return x
135 | 
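
`ConvLayer` shortens the sequence between encoder layers. The sketch below estimates the output length using the standard output-size formula documented for `nn.Conv1d` and `nn.MaxPool1d`. It assumes current PyTorch semantics, where `padding=2` with `padding_mode='circular'` pads two steps on each side; the function names are hypothetical.

```python
def sliding_window_out_len(length, kernel_size, padding, stride=1, dilation=1):
    """Output length of a 1-D convolution or pooling window (standard formula)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv_layer_out_len(length):
    """Sequence length after ConvLayer: Conv1d(k=3, p=2), then MaxPool1d(k=3, s=2, p=1)."""
    after_conv = sliding_window_out_len(length, kernel_size=3, padding=2)
    return sliding_window_out_len(after_conv, kernel_size=3, padding=1, stride=2)


print(conv_layer_out_len(96))
```

Under these assumptions the convolution adds two steps before pooling, so the result is slightly more than half the input length.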


--------------------------------------------------------------------------------
/SimMTM_Classification/code/layers/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Classification/code/layers/__init__.py


--------------------------------------------------------------------------------
/SimMTM_Classification/code/loss.py:
--------------------------------------------------------------------------------
  1 | import torch
  2 | import numpy as np
  3 | import torch.nn.functional as F
  4 | 
  5 | class AutomaticWeightedLoss(torch.nn.Module):
  6 |     """Automatically weighted multi-task loss.
  7 |     Params:
  8 |         num: int, the number of losses
  9 |         x: the individual task losses
 10 |     Examples:
 11 |         loss1 = 1
 12 |         loss2 = 2
 13 |         awl = AutomaticWeightedLoss(2)
 14 |         loss_sum = awl(loss1, loss2)
 15 |     """
 16 | 
 17 |     def __init__(self, num=2):
 18 |         super(AutomaticWeightedLoss, self).__init__()
 19 |         params = torch.ones(num, requires_grad=True)
 20 |         self.params = torch.nn.Parameter(params)
 21 | 
 22 |     def forward(self, *x):
 23 |         loss_sum = 0
 24 |         for i, loss in enumerate(x):
 25 |             loss_sum += 0.5 / (self.params[i] ** 2) * loss + torch.log(1 + self.params[i] ** 2)
 26 |         return loss_sum
 27 | 
 28 | 
 29 | class ContrastiveWeight(torch.nn.Module):
 30 | 
 31 |     def __init__(self, args):
 32 |         super(ContrastiveWeight, self).__init__()
 33 |         self.temperature = args.temperature
 34 | 
 35 |         self.bce = torch.nn.BCELoss()
 36 |         self.softmax = torch.nn.Softmax(dim=-1)
 37 |         self.log_softmax = torch.nn.LogSoftmax(dim=-1)
 38 |         self.kl = torch.nn.KLDivLoss(reduction='batchmean')
 39 |         self.positive_nums = args.positive_nums
 40 | 
 41 |     def get_positive_and_negative_mask(self, similarity_matrix, cur_batch_size):
 42 |         diag = np.eye(cur_batch_size)
 43 |         mask = torch.from_numpy(diag)
 44 |         mask = mask.type(torch.bool)
 45 | 
 46 |         oral_batch_size = cur_batch_size // (self.positive_nums + 1)
 47 | 
 48 |         positives_mask = np.zeros(similarity_matrix.size())
 49 |         for i in range(self.positive_nums + 1):
 50 |             ll = np.eye(cur_batch_size, cur_batch_size, k=oral_batch_size * i)
 51 |             lr = np.eye(cur_batch_size, cur_batch_size, k=-oral_batch_size * i)
 52 |             positives_mask += ll
 53 |             positives_mask += lr
 54 | 
 55 |         positives_mask = torch.from_numpy(positives_mask).to(similarity_matrix.device)
 56 |         positives_mask[mask] = 0
 57 | 
 58 |         negatives_mask = 1 - positives_mask
 59 |         negatives_mask[mask] = 0
 60 | 
 61 |         return positives_mask.type(torch.bool), negatives_mask.type(torch.bool)
 62 | 
 63 |     def forward(self, batch_emb_om):
 64 |         cur_batch_shape = batch_emb_om.shape
 65 | 
 66 |         # get similarity matrix among mask samples
 67 |         norm_emb = F.normalize(batch_emb_om, dim=1)
 68 |         similarity_matrix = torch.matmul(norm_emb, norm_emb.transpose(0, 1))
 69 | 
 70 |         # get positives and negatives similarity
 71 |         positives_mask, negatives_mask = self.get_positive_and_negative_mask(similarity_matrix, cur_batch_shape[0])
 72 | 
 73 |         positives = similarity_matrix[positives_mask].view(cur_batch_shape[0], -1)
 74 |         negatives = similarity_matrix[negatives_mask].view(cur_batch_shape[0], -1)
 75 | 
 76 |         # generate predict and target probability distributions matrix
 77 |         logits = torch.cat((positives, negatives), dim=-1)
 78 |         y_true = torch.cat(
 79 |             (torch.ones(cur_batch_shape[0], positives.shape[-1]), torch.zeros(cur_batch_shape[0], negatives.shape[-1])),
 80 |             dim=-1).to(batch_emb_om.device).float()
 81 | 
 82 |         # multiple positives - KL divergence
 83 |         predict = self.log_softmax(logits / self.temperature)
 84 |         loss = self.kl(predict, y_true)
 85 | 
 86 |         return loss, similarity_matrix, logits, positives_mask
 87 | 
 88 | 
 89 | class AggregationRebuild(torch.nn.Module):
 90 | 
 91 |     def __init__(self, args):
 92 |         super(AggregationRebuild, self).__init__()
 93 |         self.args = args
 94 |         self.temperature = args.temperature
 95 |         self.softmax = torch.nn.Softmax(dim=-1)
 96 |         self.mse = torch.nn.MSELoss()
 97 | 
 98 |     def forward(self, similarity_matrix, batch_emb_om):
 99 |         cur_batch_shape = batch_emb_om.shape
100 | 
101 |         # get the weights among (originals, their masked views, other samples, other samples' masked views)
102 |         similarity_matrix /= self.temperature
103 | 
104 |         similarity_matrix = similarity_matrix - torch.eye(cur_batch_shape[0]).to(
105 |             similarity_matrix.device).float() * 1e12
106 |         rebuild_weight_matrix = self.softmax(similarity_matrix)
107 | 
108 |         batch_emb_om = batch_emb_om.reshape(cur_batch_shape[0], -1)
109 | 
110 |         # generate the rebuilt batch embedding (originals, other samples, and the masked views of both)
111 |         rebuild_batch_emb = torch.matmul(rebuild_weight_matrix, batch_emb_om)
112 | 
113 |         # get the original series' rebuilt batch embedding
114 |         rebuild_oral_batch_emb = rebuild_batch_emb.reshape(cur_batch_shape[0], cur_batch_shape[1], -1)
115 | 
116 |         return rebuild_weight_matrix, rebuild_oral_batch_emb
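
The mask logic in `get_positive_and_negative_mask` is easier to follow on a tiny batch. The sketch below rebuilds both masks with plain lists. It assumes the batch is laid out as one block of original series followed by `positive_nums` blocks of masked views, so that row `r` and row `r + k * oral_batch_size` come from the same series, and that the batch size is divisible by `positive_nums + 1`. The function name is hypothetical.

```python
def positive_and_negative_masks(cur_batch_size, positive_nums):
    """Plain-list version of the boolean masks built in ContrastiveWeight.

    Hypothetical helper; assumes cur_batch_size % (positive_nums + 1) == 0.
    """
    n = cur_batch_size
    oral_batch_size = n // (positive_nums + 1)
    positives = [[False] * n for _ in range(n)]
    # Offset 0 is the diagonal, which the source zeroes out, so start at 1.
    for i in range(1, positive_nums + 1):
        offset = oral_batch_size * i
        for r in range(n):
            for c in (r + offset, r - offset):
                if 0 <= c < n:
                    positives[r][c] = True
    # Everything that is neither a positive nor the sample itself is a negative.
    negatives = [[r != c and not positives[r][c] for c in range(n)] for r in range(n)]
    return positives, negatives


pos, neg = positive_and_negative_masks(cur_batch_size=8, positive_nums=3)
for row in pos:
    print("".join("1" if flag else "." for flag in row))
```

Every row ends up with exactly `positive_nums` positives and `cur_batch_size - 1 - positive_nums` negatives, which is what lets `forward` reshape the masked similarities with `.view(cur_batch_shape[0], -1)`.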


--------------------------------------------------------------------------------
/SimMTM_Classification/code/main.py:
--------------------------------------------------------------------------------
  1 | import numpy as np
  2 | from datetime import datetime
  3 | import argparse
  4 | from utils.utils import _logger
  5 | from model import TFC
  6 | from dataloader import data_generator
  7 | from trainer import Trainer
  8 | import os
  9 | import torch
 10 | 
 11 | # Args selections
 12 | start_time = datetime.now()
 13 | parser = argparse.ArgumentParser()
 14 | 
 15 | home_dir = os.getcwd()
 16 | parser.add_argument('--run_description', default='run1', type=str, help='Experiment Description')
 17 | parser.add_argument('--seed', default=2023, type=int, help='seed value')
 18 | 
 19 | parser.add_argument('--training_mode', default='pre_train', type=str, help='pre_train, fine_tune')
 20 | parser.add_argument('--pretrain_dataset', default='SleepEEG', type=str,
 21 |                     help='Dataset of choice: SleepEEG, FD_A, HAR, ECG')
 22 | parser.add_argument('--target_dataset', default='Epilepsy', type=str,
 23 |                     help='Dataset of choice: Epilepsy, FD_B, Gesture, EMG')
 24 | 
 25 | parser.add_argument('--logs_save_dir', default='experiments_logs', type=str, help='saving directory')
 26 | parser.add_argument('--device', default='cuda', type=str, help='cpu or cuda')
 27 | parser.add_argument('--home_path', default=home_dir, type=str, help='Project home directory')
 28 | parser.add_argument('--subset', action='store_true', default=False, help='use the subset of datasets')
 29 | parser.add_argument('--log_epoch', default=5, type=int, help='print loss and metrics')
 30 | parser.add_argument('--draw_similar_matrix', default=10, type=int, help='draw similarity matrix')
 31 | parser.add_argument('--pretrain_lr', default=0.0001, type=float, help='pretrain learning rate')
 32 | parser.add_argument('--lr', default=0.0001, type=float, help='learning rate')
 33 | parser.add_argument('--use_pretrain_epoch_dir', default=None, type=str,
 34 |                     help='choose the pretrain checkpoint to finetune')
 35 | parser.add_argument('--pretrain_epoch', default=10, type=int, help='pretrain epochs')
 36 | parser.add_argument('--finetune_epoch', default=300, type=int, help='finetune epochs')
 37 | 
 38 | parser.add_argument('--masking_ratio', default=0.5, type=float, help='masking ratio')
 39 | parser.add_argument('--positive_nums', default=3, type=int, help='positive series numbers')
 40 | parser.add_argument('--lm', default=3, type=int, help='average masked length')
 41 | 
 42 | parser.add_argument('--finetune_result_file_name', default="finetune_result.json", type=str,
 43 |                     help='finetune result json name')
 44 | parser.add_argument('--temperature', type=float, default=0.2, help='temperature')
 45 | 
 46 | 
 47 | def set_seed(seed):
 48 |     SEED = seed
 49 |     torch.manual_seed(SEED)
 50 |     torch.backends.cudnn.deterministic = False
 51 |     torch.backends.cudnn.benchmark = False
 52 |     np.random.seed(SEED)
 53 | 
 54 |     return seed
 55 | 
 56 | 
 57 | def main(args, configs, seed=None):
 58 |     method = 'SimMTM'
 59 |     sourcedata = args.pretrain_dataset
 60 |     targetdata = args.target_dataset
 61 |     training_mode = args.training_mode
 62 |     run_description = args.run_description
 63 | 
 64 |     logs_save_dir = args.logs_save_dir
 65 |     masking_ratio = args.masking_ratio
 66 |     pretrain_lr = args.pretrain_lr
 67 |     pretrain_epoch = args.pretrain_epoch
 68 |     lr = args.lr
 69 |     finetune_epoch = args.finetune_epoch
 70 |     temperature = args.temperature
 71 |     experiment_description = f"{sourcedata}_2_{targetdata}"
 72 | 
 73 |     os.makedirs(logs_save_dir, exist_ok=True)
 74 | 
 75 |     # Load datasets
 76 |     sourcedata_path = f"./dataset/{sourcedata}"  # e.g. './dataset/SleepEEG'
 77 |     targetdata_path = f"./dataset/{targetdata}"
 78 | 
 79 |     subset = args.subset  # if subset= true, use a subset for debugging.
 80 |     train_dl, valid_dl, test_dl = data_generator(sourcedata_path, targetdata_path, configs, training_mode,
 81 |                                                  subset=subset)
 82 | 
 83 |     # set seed
 84 |     if seed is not None:
 85 |         seed = set_seed(seed)
 86 |     else:
 87 |         seed = set_seed(args.seed)
 88 | 
 89 |     # e.g. experiments_logs/SleepEEG_2_Epilepsy/run1/pre_train_2023_pt_0.5_0.0001_10_ft_0.0001_300
 90 |     experiment_log_dir = os.path.join(logs_save_dir, experiment_description, run_description,
 91 |                                       training_mode + f"_{seed}_pt_{masking_ratio}_{pretrain_lr}_{pretrain_epoch}_ft_{lr}_{finetune_epoch}")
 92 |     os.makedirs(experiment_log_dir, exist_ok=True)
 93 | 
 94 |     # Logging
 95 |     log_file_name = os.path.join(experiment_log_dir, f"logs_{datetime.now().strftime('%d_%m_%Y_%H_%M_%S')}.log")
 96 |     logger = _logger(log_file_name)
 97 |     logger.debug("=" * 45)
 98 |     logger.debug(f'Pre-training Dataset: {sourcedata}')
 99 |     logger.debug(f'Target (fine-tuning) Dataset: {targetdata}')
100 |     logger.debug(f'Seed: {seed}')
101 |     logger.debug(f'Method:  {method}')
102 |     logger.debug(f'Mode:    {training_mode}')
103 |     logger.debug(f'Pretrain Learning rate:    {pretrain_lr}')
104 |     logger.debug(f'Masking ratio:    {masking_ratio}')
105 |     logger.debug(f'Pretrain Epochs:    {pretrain_epoch}')
106 |     logger.debug(f'Finetune Learning rate:    {lr}')
107 |     logger.debug(f'Finetune Epochs:    {finetune_epoch}')
108 |     logger.debug(f'Temperature: {temperature}')
109 |     logger.debug("=" * 45)
110 | 
111 |     # Load Model
112 |     model = TFC(configs, args).to(device)
113 |     params_group = [{'params': model.parameters()}]
114 |     model_optimizer = torch.optim.Adam(params_group, lr=pretrain_lr, betas=(configs.beta1, configs.beta2),
115 |                                        weight_decay=0)
116 |     model_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer=model_optimizer, T_max=pretrain_epoch)
117 | 
118 |     # Trainer
119 |     best_performance = Trainer(model, model_optimizer, model_scheduler, train_dl, valid_dl, test_dl, device, logger,
120 |                                args, configs, experiment_log_dir, seed)
121 | 
122 |     return best_performance
123 | 
124 | 
125 | if __name__ == '__main__':
126 |     args, unknown = parser.parse_known_args()
127 |     device = torch.device(args.device)
128 |     Configs = __import__(f'config_files.{args.pretrain_dataset}_Configs', fromlist=['Config']).Config  # import the config class without exec
129 |     configs = Configs()
130 | 
131 |     main(args, configs)
132 | 
133 | 
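
The log directory name produced by `main` encodes most of the run settings. The sketch below rebuilds that path from the argparse defaults listed above; `experiment_log_dir` is a hypothetical helper, not a function in the repository.

```python
import os


def experiment_log_dir(logs_save_dir="experiments_logs", pretrain_dataset="SleepEEG",
                       target_dataset="Epilepsy", run_description="run1",
                       training_mode="pre_train", seed=2023, masking_ratio=0.5,
                       pretrain_lr=0.0001, pretrain_epoch=10, lr=0.0001,
                       finetune_epoch=300):
    """Rebuild the directory path that main() creates (defaults copied from the parser)."""
    experiment_description = f"{pretrain_dataset}_2_{target_dataset}"
    leaf = (f"{training_mode}_{seed}_pt_{masking_ratio}_{pretrain_lr}_{pretrain_epoch}"
            f"_ft_{lr}_{finetune_epoch}")
    return os.path.join(logs_save_dir, experiment_description, run_description, leaf)


print(experiment_log_dir())
```

Changing any of these arguments on the command line therefore writes to a different directory, which keeps runs with different hyperparameters apart.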


--------------------------------------------------------------------------------
/SimMTM_Classification/code/model.py:
--------------------------------------------------------------------------------
 1 | from torch import nn
 2 | import torch
 3 | from loss import ContrastiveWeight, AggregationRebuild, AutomaticWeightedLoss
 4 | 
 5 | 
 6 | class TFC(nn.Module):
 7 |     def __init__(self, configs, args):
 8 |         super(TFC, self).__init__()
 9 |         self.training_mode = args.training_mode
10 | 
11 |         self.conv_block1 = nn.Sequential(
12 |             nn.Conv1d(configs.input_channels, 32, kernel_size=configs.kernel_size,
13 |                       stride=configs.stride, bias=False, padding=(configs.kernel_size // 2)),
14 |             nn.BatchNorm1d(32),
15 |             nn.ReLU(),
16 |             nn.MaxPool1d(kernel_size=2, stride=2, padding=1),
17 |             nn.Dropout(configs.dropout)
18 |         )
19 | 
20 |         self.conv_block2 = nn.Sequential(
21 |             nn.Conv1d(32, 64, kernel_size=8, stride=1, bias=False, padding=4),
22 |             nn.BatchNorm1d(64),
23 |             nn.ReLU(),
24 |             nn.MaxPool1d(kernel_size=2, stride=2, padding=1)
25 |         )
26 | 
27 |         self.conv_block3 = nn.Sequential(
28 |             nn.Conv1d(64, configs.final_out_channels, kernel_size=8, stride=1, bias=False, padding=4),
29 |             nn.BatchNorm1d(configs.final_out_channels),
30 |             nn.ReLU(),
31 |             nn.MaxPool1d(kernel_size=2, stride=2, padding=1),
32 |         )
33 | 
34 |         self.dense = nn.Sequential(
35 |             nn.Linear(configs.CNNoutput_channel * configs.final_out_channels, 256),
36 |             nn.BatchNorm1d(256),
37 |             nn.ReLU(),
38 |             nn.Linear(256, 128)
39 |         )
40 | 
41 |         if self.training_mode == 'pre_train':
42 |             self.awl = AutomaticWeightedLoss(2)
43 |             self.contrastive = ContrastiveWeight(args)
44 |             self.aggregation = AggregationRebuild(args)
45 |             self.head = nn.Linear(1280, 178)
46 |             self.mse = torch.nn.MSELoss()
47 | 
48 |     def forward(self, x_in_t, pretrain=False):
49 | 
50 |         if pretrain:
51 |             x = self.conv_block1(x_in_t)
52 |             x = self.conv_block2(x)
53 |             x = self.conv_block3(x)
54 | 
55 |             h = x.reshape(x.shape[0], -1)
56 |             z = self.dense(h)
57 | 
58 |             loss_cl, similarity_matrix, logits, positives_mask = self.contrastive(z)
59 |             rebuild_weight_matrix, agg_x = self.aggregation(similarity_matrix, x)
60 |             pred_x = self.head(agg_x.reshape(agg_x.size(0), -1))
61 | 
62 |             loss_rb = self.mse(pred_x, x_in_t.reshape(x_in_t.size(0), -1).detach())
63 |             loss = self.awl(loss_cl, loss_rb)
64 | 
65 |             return loss, loss_cl, loss_rb
66 |         else:
67 |             x = self.conv_block1(x_in_t)
68 |             x = self.conv_block2(x)
69 |             x = self.conv_block3(x)
70 | 
71 |             h = x.reshape(x.shape[0], -1)
72 |             z = self.dense(h)
73 | 
74 |             return h, z
75 | 
76 | 
77 | class target_classifier(nn.Module):  # Classification head
78 |     def __init__(self, configs):
79 |         super(target_classifier, self).__init__()
80 |         self.logits = nn.Linear(1280, 64)
81 |         self.logits_simple = nn.Linear(64, configs.num_classes_target)
82 | 
83 |     def forward(self, emb):
84 |         """2-layer MLP"""
85 |         emb_flat = emb.reshape(emb.shape[0], -1)
86 |         emb = torch.sigmoid(self.logits(emb_flat))
87 |         pred = self.logits_simple(emb)
88 |         return pred
89 | 
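
Note on the hard-coded widths in `model.py`: the `dense` layer, the reconstruction `head` and `target_classifier` all depend on the sequence length that leaves `conv_block3`. The sketch below traces that length with the standard Conv1d/MaxPool1d output-size formula (dilation 1, floor mode). The helper names are hypothetical, and the values `seq_len=178`, `kernel_size=25`, `stride=3` and 128 output channels are assumptions chosen for illustration, since the config files are not shown in this section.

```python
def conv1d_out_len(length, kernel_size, stride=1, padding=0):
    """Output length of Conv1d / MaxPool1d with dilation 1 (floor mode)."""
    return (length + 2 * padding - kernel_size) // stride + 1


def encoder_out_len(seq_len, kernel_size, stride):
    """Trace a sequence length through the three conv blocks of the encoder."""
    # conv_block1: configurable convolution followed by a 2/2/1 max-pool
    length = conv1d_out_len(seq_len, kernel_size, stride, kernel_size // 2)
    length = conv1d_out_len(length, 2, 2, 1)
    # conv_block2 and conv_block3 share the same geometry (k=8, s=1, p=4, then pool)
    for _ in range(2):
        length = conv1d_out_len(length, 8, 1, 4)
        length = conv1d_out_len(length, 2, 2, 1)
    return length


if __name__ == "__main__":
    # Assumed values, not taken from the repository's configs.
    steps = encoder_out_len(seq_len=178, kernel_size=25, stride=3)
    print("time steps after conv_block3:", steps)
    print("flattened width with 128 channels:", steps * 128)
```

Under these assumed values the flattened width works out to 1280, which is consistent with the `nn.Linear(1280, ...)` layers above; other input lengths or kernel settings would need different widths.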


--------------------------------------------------------------------------------
/SimMTM_Classification/code/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Classification/code/utils/__init__.py


--------------------------------------------------------------------------------
/SimMTM_Classification/code/utils/augmentations.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | import torch
 3 | import math
 4 | 
 5 | 
 6 | def data_transform_masked4cl(sample, masking_ratio, lm, positive_nums=None, distribution='geometric'):
 7 |     """Masked time series in time dimension"""
 8 | 
 9 |     if positive_nums is None:
10 |         positive_nums = math.ceil(1.5 / (1 - masking_ratio))
11 | 
12 |     sample = sample.permute(0, 2, 1)
13 | 
14 |     sample_repeat = sample.repeat(positive_nums, 1, 1)
15 | 
16 |     mask = noise_mask(sample_repeat, masking_ratio, lm, distribution=distribution)
17 |     x_masked = mask * sample_repeat
18 | 
19 |     return x_masked.permute(0, 2, 1), mask.permute(0, 2, 1)
20 | 
21 | 
22 | def geom_noise_mask_single(L, lm, masking_ratio):
23 |     """
24 |     Randomly create a boolean mask of length `L`, consisting of subsequences of average length lm, masking with 0s a `masking_ratio`
25 |     proportion of the sequence L. The length of masking subsequences and intervals follow a geometric distribution.
26 |     Args:
27 |         L: length of mask and sequence to be masked
28 |         lm: average length of masking subsequences (streaks of 0s)
29 |         masking_ratio: proportion of L to be masked
30 |     Returns:
31 |         (L,) boolean numpy array intended to mask ('drop') with 0s a sequence of length L
32 |     """
33 |     keep_mask = np.ones(L, dtype=bool)
34 |     p_m = 1 / lm  # probability of each masking sequence stopping. parameter of geometric distribution.
35 |     p_u = p_m * masking_ratio / (
36 |             1 - masking_ratio)  # probability of each unmasked sequence stopping. parameter of geometric distribution.
37 |     p = [p_m, p_u]
38 | 
39 |     # Start in state 0 with masking_ratio probability
40 |     state = int(np.random.rand() > masking_ratio)  # state 0 means masking, 1 means not masking
41 |     for i in range(L):
42 |         keep_mask[i] = state  # here it happens that state and masking value corresponding to state are identical
43 |         if np.random.rand() < p[state]:
44 |             state = 1 - state
45 | 
46 |     return keep_mask
47 | 
48 | 
49 | def noise_mask(X, masking_ratio=0.25, lm=3, distribution='geometric', exclude_feats=None):
50 |     """
51 |     Creates a random boolean mask of the same shape as X, with 0s at places where a feature should be masked.
52 |     Args:
53 |         X: 3-D tensor of features for a batch of samples (the mask is built over all three dimensions)
54 |         masking_ratio: proportion of seq_length to be masked. At each time step, will also be the proportion of
55 |             feat_dim that will be masked on average
56 |         lm: average length of masking subsequences (streaks of 0s). Used only when `distribution` is 'geometric'.
57 |         distribution: whether each mask sequence element is sampled independently at random, or whether
58 |             sampling follows a markov chain (and thus is stateful), resulting in geometric distributions of
59 |             masked sequences of a desired mean length `lm`
60 |         exclude_feats: iterable of indices corresponding to features to be excluded from masking (i.e. to remain all 1s)
61 |     Returns:
62 |         boolean torch tensor with the same shape as X, with 0s at places where a feature should be masked
63 |     """
64 |     if exclude_feats is not None:
65 |         exclude_feats = set(exclude_feats)
66 | 
67 |     if distribution == 'geometric':  # stateful (Markov chain)
68 |         mask = geom_noise_mask_single(X.shape[0] * X.shape[1] * X.shape[2], lm, masking_ratio)
69 |         mask = mask.reshape(X.shape[0], X.shape[1], X.shape[2])
70 |     elif distribution == 'masked_tail':
71 |         mask = np.ones(X.shape, dtype=bool)
72 |         for m in range(X.shape[0]):  # feature dimension
73 | 
74 |             keep_mask = np.zeros_like(mask[m, :], dtype=bool)
75 |             n = math.ceil(keep_mask.shape[1] * (1 - masking_ratio))
76 |             keep_mask[:, :n] = True
77 |             mask[m, :] = keep_mask  # time dimension
78 |     elif distribution == 'masked_head':
79 |         mask = np.ones(X.shape, dtype=bool)
80 |         for m in range(X.shape[0]):  # feature dimension
81 | 
82 |             keep_mask = np.zeros_like(mask[m, :], dtype=bool)
83 |             n = math.ceil(keep_mask.shape[1] * masking_ratio)
84 |             keep_mask[:, n:] = True
85 |             mask[m, :] = keep_mask  # time dimension
86 |     else:  # each position is independent Bernoulli with p = 1 - masking_ratio
87 |         mask = np.random.choice(np.array([True, False]), size=X.shape, replace=True,
88 |                                 p=(1 - masking_ratio, masking_ratio))
89 |     return torch.tensor(mask)
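
Note on the geometric masking above: `geom_noise_mask_single` is a two-state Markov chain whose stationary masked fraction equals `masking_ratio` and whose masked streaks have mean length `lm`. The following dependency-free sketch mirrors that logic with the standard library so the two properties can be checked empirically. The names `geom_mask` and `masked_streaks`, the seed and the sample sizes are assumptions made for illustration.

```python
import math
import random


def geom_mask(length, lm, masking_ratio, rng):
    """Two-state Markov chain; True = keep, False = masked."""
    p_m = 1 / lm                                     # chance a masked streak ends
    p_u = p_m * masking_ratio / (1 - masking_ratio)  # chance a kept streak ends
    p = [p_m, p_u]
    state = int(rng.random() > masking_ratio)        # 0 = masking, 1 = keeping
    mask = []
    for _ in range(length):
        mask.append(bool(state))
        if rng.random() < p[state]:
            state = 1 - state
    return mask


def masked_streaks(mask):
    """Lengths of consecutive runs of False (masked) entries."""
    runs, current = [], 0
    for keep in mask:
        if keep:
            if current:
                runs.append(current)
            current = 0
        else:
            current += 1
    if current:
        runs.append(current)
    return runs


if __name__ == "__main__":
    mask = geom_mask(100_000, lm=3, masking_ratio=0.25, rng=random.Random(0))
    runs = masked_streaks(mask)
    print("masked fraction:", 1 - sum(mask) / len(mask))
    print("mean streak length:", sum(runs) / len(runs))
    # Number of masked views per sample used by data_transform_masked4cl by default.
    print("views per sample:", math.ceil(1.5 / (1 - 0.25)))
```

On a long sequence the masked fraction should land close to `masking_ratio` and the mean streak length close to `lm`; short sequences can deviate noticeably because the chain has little time to mix.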


--------------------------------------------------------------------------------
/SimMTM_Classification/code/utils/loss.py:
--------------------------------------------------------------------------------
  1 | import torch
  2 | import torch.nn as nn
  3 | import numpy as np
  4 | import random
  5 | import math
  6 | import torch.nn.functional as F
  7 |   
  8 | class AutomaticWeightedLoss(nn.Module):
  9 |     """automatically weighted multi-task loss
 10 |     Params:
 11 |         num: int, the number of losses
 12 |         x: multi-task loss
 13 |     Examples:
 14 |         loss1=1
 15 |         loss2=2
 16 |         awl = AutomaticWeightedLoss(2)
 17 |         loss_sum = awl(loss1, loss2)
 18 |     """
 19 |     def __init__(self, num=2):
 20 |         super(AutomaticWeightedLoss, self).__init__()
 21 |         params = torch.ones(num, requires_grad=True)
 22 |         self.params = nn.Parameter(params)
 23 |        
 24 |     def forward(self, *x):
 25 |         loss_sum = 0
 26 |         for i, loss in enumerate(x):
 27 |             loss_sum += 0.5 / (self.params[i] ** 2) * loss + torch.log(1 + self.params[i] ** 2)
 28 |         return loss_sum
 29 |     
 30 |     
 31 | class ContrastiveLoss(nn.Module):
 32 | 
 33 |     def __init__(self, device, args):
 34 |         super(ContrastiveLoss, self).__init__()
 35 |         self.device = device
 36 |         self.temperature = args.temperature
 37 |         
 38 |         self.bce = torch.nn.BCELoss()
 39 |         self.softmax = torch.nn.Softmax(dim=-1)
 40 |         self.log_softmax = torch.nn.LogSoftmax(dim=-1)
 41 |         
 42 |         self.kl = torch.nn.KLDivLoss(reduction='batchmean')
 43 |         
 44 |     def get_positive_and_negative_mask(self, similarity_matrix, cur_batch_size, oral_batch_size):
 45 |         
 46 |         diag = np.eye(cur_batch_size)
 47 |         mask = torch.from_numpy(diag)
 48 |         mask = mask.type(torch.bool)
 49 |         
 50 |         positives_mask = np.zeros(similarity_matrix.size())
 51 |         for i in range(cur_batch_size//oral_batch_size):
 52 |             ll = np.eye(cur_batch_size, cur_batch_size, k=oral_batch_size*i)
 53 |             lr = np.eye(cur_batch_size, cur_batch_size, k=-oral_batch_size*i)
 54 |             positives_mask += ll
 55 |             positives_mask += lr
 56 |         
 57 |         positives_mask = torch.from_numpy(positives_mask)
 58 |         positives_mask[mask] = 0
 59 | 
 60 |         negatives_mask = 1 - positives_mask
 61 |         negatives_mask[mask] = 0
 62 |         
 63 |         return positives_mask.type(torch.bool), negatives_mask.type(torch.bool)
 64 | 
 65 |     def forward(self, batch_emb_om, batch_x):
 66 |         
 67 |         cur_batch_shape = batch_emb_om.shape
 68 |         oral_batch_shape = batch_x.shape
 69 |         
 70 |         # get similarity matrix among mask samples
 71 |         norm_emb = F.normalize(batch_emb_om, dim=1)
 72 |         similarity_matrix = torch.matmul(norm_emb, norm_emb.transpose(0, 1))
 73 |         
 74 |         # get positives and negatives similarity
 75 |         positives_mask, negatives_mask = self.get_positive_and_negative_mask(similarity_matrix, cur_batch_shape[0], oral_batch_shape[0])
 76 | 
 77 |         positives = similarity_matrix[positives_mask].view(cur_batch_shape[0], -1)
 78 |         negatives = similarity_matrix[negatives_mask].view(cur_batch_shape[0], -1)
 79 |         
 80 |         # generate predict and target probability distributions matrix
 81 |         logits = torch.cat((positives, negatives), dim=-1) 
 82 |         y_true = torch.cat((torch.ones(cur_batch_shape[0], positives.shape[-1]) / positives.shape[-1],  torch.zeros(cur_batch_shape[0], negatives.shape[-1])), dim=-1).to(self.device).float()
 83 |         
 84 |         # multiple positives - KL divergence
 85 |         predict = self.log_softmax(logits / self.temperature)
 86 |         loss = self.kl(predict, y_true)
 87 |         
 88 |         return loss, similarity_matrix, logits
 89 |     
 90 |    
 91 | class RebuildLoss(torch.nn.Module):
 92 | 
 93 |     def __init__(self, device, args):
 94 |         super(RebuildLoss, self).__init__()
 95 |         self.args = args
 96 |         self.device = device
 97 |         self.temperature = args.temperature
 98 |         
 99 |         self.softmax = torch.nn.Softmax(dim=-1)
100 |         self.mse = torch.nn.MSELoss()
101 | 
102 |     def forward(self, similarity_matrix, batch_emb_om, batch_emb_o, batch_x):
103 |         
104 |         cur_batch_shape = batch_emb_om.shape
105 |         oral_batch_shape = batch_x.shape
106 |         
107 |         # get the weight among (oral, oral's masks, others, others' masks)
108 |         similarity_matrix /= self.temperature
109 |         similarity_matrix = similarity_matrix - torch.eye(cur_batch_shape[0]).to(self.device).float() * 1e12
110 |         rebuild_weight_matrix = self.softmax(similarity_matrix)
111 |         
112 |         batch_emb_om = batch_emb_om.view(cur_batch_shape[0], -1)
113 |         
114 |         # generate the rebuilt batch embedding (oral, others, oral's masks, others' masks)
115 |         rebuild_batch_emb = torch.matmul(rebuild_weight_matrix, batch_emb_om)
116 | 
117 |         # keep only the rebuilt embeddings of the original (oral) samples
118 |         rebuild_oral_batch_emb = rebuild_batch_emb[:oral_batch_shape[0]].reshape(oral_batch_shape[0], cur_batch_shape[1], -1)
119 |         
120 |         # MSE Loss
121 |         if self.args.rbtp == 0:
122 |             loss = self.mse(rebuild_oral_batch_emb, batch_emb_o.detach())
123 |         elif self.args.rbtp == 1:
124 |             loss = self.mse(rebuild_oral_batch_emb, batch_x.detach())
125 |         
126 |         return loss, rebuild_weight_matrix
127 |     
128 | 
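
Note on two pieces of `utils/loss.py`: the arithmetic of `AutomaticWeightedLoss.forward` and the index pattern built by `get_positive_and_negative_mask` can both be reproduced without torch. The sketch below does so in plain Python. The function names are hypothetical, and the mask helper assumes `cur_batch_size` is an exact multiple of `oral_batch_size`, that is, the batch is the original samples stacked with whole repeats of masked views.

```python
import math


def automatic_weighted_sum(losses, params):
    """sum_i 0.5 / p_i^2 * loss_i + log(1 + p_i^2), as in AutomaticWeightedLoss.forward."""
    return sum(0.5 / p ** 2 * loss + math.log(1 + p ** 2)
               for loss, p in zip(losses, params))


def positive_negative_masks(cur_batch_size, oral_batch_size):
    """Entry (r, c) is a positive when r and c are different views of the same series."""
    pos = [[r != c and (r - c) % oral_batch_size == 0 for c in range(cur_batch_size)]
           for r in range(cur_batch_size)]
    # Negatives are everything else except the diagonal (self-similarity).
    neg = [[r != c and not pos[r][c] for c in range(cur_batch_size)]
           for r in range(cur_batch_size)]
    return pos, neg


if __name__ == "__main__":
    # Both weights start at 1, matching torch.ones(num) in the module.
    print(automatic_weighted_sum([1.0, 2.0], [1.0, 1.0]))
    pos, neg = positive_negative_masks(cur_batch_size=6, oral_batch_size=2)
    for row in pos:
        print([int(v) for v in row])
```

With two original series and three views each, every row ends up with two positives and three negatives, which is what lets `forward` reshape the selected similarities into fixed-width `positives` and `negatives` matrices.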


--------------------------------------------------------------------------------
/SimMTM_Classification/code/utils/masking.py:
--------------------------------------------------------------------------------
 1 | import torch
 2 | 
 3 | 
 4 | class TriangularCausalMask():
 5 |     def __init__(self, B, L, device="cpu"):
 6 |         mask_shape = [B, 1, L, L]
 7 |         with torch.no_grad():
 8 |             self._mask = torch.triu(torch.ones(mask_shape, dtype=torch.bool), diagonal=1).to(device)
 9 | 
10 |     @property
11 |     def mask(self):
12 |         return self._mask
13 | 
14 | 
15 | class ProbMask():
16 |     def __init__(self, B, H, L, index, scores, device="cpu"):
17 |         _mask = torch.ones(L, scores.shape[-1], dtype=torch.bool).to(device).triu(1)
18 |         _mask_ex = _mask[None, None, :].expand(B, H, L, scores.shape[-1])
19 |         indicator = _mask_ex[torch.arange(B)[:, None, None],
20 |                     torch.arange(H)[None, :, None],
21 |                     index, :].to(device)
22 |         self._mask = indicator.view(scores.shape).to(device)
23 | 
24 |     @property
25 |     def mask(self):
26 |         return self._mask
27 | 


--------------------------------------------------------------------------------
/SimMTM_Classification/code/utils/metrics.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | 
 3 | 
 4 | def RSE(pred, true):
 5 |     return np.sqrt(np.sum((true - pred) ** 2)) / np.sqrt(np.sum((true - true.mean()) ** 2))
 6 | 
 7 | 
 8 | def CORR(pred, true):
 9 |     u = ((true - true.mean(0)) * (pred - pred.mean(0))).sum(0)
10 |     d = np.sqrt(((true - true.mean(0)) ** 2).sum(0) * ((pred - pred.mean(0)) ** 2).sum(0))
11 |     return (u / d).mean(-1)
12 | 
13 | 
14 | def MAE(pred, true):
15 |     return np.mean(np.abs(pred - true))
16 | 
17 | 
18 | def MSE(pred, true):
19 |     return np.mean((pred - true) ** 2)
20 | 
21 | 
22 | def RMSE(pred, true):
23 |     return np.sqrt(MSE(pred, true))
24 | 
25 | 
26 | def MAPE(pred, true):
27 |     return np.mean(np.abs((pred - true) / true))
28 | 
29 | 
30 | def MSPE(pred, true):
31 |     return np.mean(np.square((pred - true) / true))
32 | 
33 | 
34 | def metric(pred, true):
35 |     mae = MAE(pred, true)
36 |     mse = MSE(pred, true)
37 |     rmse = RMSE(pred, true)
38 |     mape = MAPE(pred, true)
39 |     mspe = MSPE(pred, true)
40 | 
41 |     return mae, mse, rmse, mape, mspe
42 | 
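
Note on `CORR`: a dependency-free Pearson correlation can serve as a cross-check on one-dimensional inputs. The sketch below is the textbook definition (covariance divided by the product of the standard deviations, with the common `1/n` factors cancelled). The name `pearson` and the sample data are assumptions for illustration.

```python
import math


def pearson(pred, true):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(true)
    mean_p = sum(pred) / n
    mean_t = sum(true) / n
    cov = sum((t - mean_t) * (p - mean_p) for p, t in zip(pred, true))
    var_t = sum((t - mean_t) ** 2 for t in true)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / math.sqrt(var_t * var_p)


if __name__ == "__main__":
    print(pearson([2, 4, 6], [1, 2, 3]))   # perfectly correlated
    print(pearson([3, 2, 1], [1, 2, 3]))   # perfectly anti-correlated
```

Like `MAPE` and `MSPE` above, this divides by a quantity derived from the data, so constant inputs (zero variance) need to be handled by the caller.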


--------------------------------------------------------------------------------
/SimMTM_Classification/code/utils/timefeatures.py:
--------------------------------------------------------------------------------
  1 | from typing import List
  2 | 
  3 | import numpy as np
  4 | import pandas as pd
  5 | from pandas.tseries import offsets
  6 | from pandas.tseries.frequencies import to_offset
  7 | 
  8 | 
  9 | class TimeFeature:
 10 |     def __init__(self):
 11 |         pass
 12 | 
 13 |     def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
 14 |         pass
 15 | 
 16 |     def __repr__(self):
 17 |         return self.__class__.__name__ + "()"
 18 | 
 19 | 
 20 | class SecondOfMinute(TimeFeature):
 21 |     """Second of minute encoded as value between [-0.5, 0.5]"""
 22 | 
 23 |     def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
 24 |         return index.second / 59.0 - 0.5
 25 | 
 26 | 
 27 | class MinuteOfHour(TimeFeature):
 28 |     """Minute of hour encoded as value between [-0.5, 0.5]"""
 29 | 
 30 |     def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
 31 |         return index.minute / 59.0 - 0.5
 32 | 
 33 | 
 34 | class HourOfDay(TimeFeature):
 35 |     """Hour of day encoded as value between [-0.5, 0.5]"""
 36 | 
 37 |     def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
 38 |         return index.hour / 23.0 - 0.5
 39 | 
 40 | 
 41 | class DayOfWeek(TimeFeature):
 42 |     """Day of week encoded as value between [-0.5, 0.5]"""
 43 | 
 44 |     def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
 45 |         return index.dayofweek / 6.0 - 0.5
 46 | 
 47 | 
 48 | class DayOfMonth(TimeFeature):
 49 |     """Day of month encoded as value between [-0.5, 0.5]"""
 50 | 
 51 |     def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
 52 |         return (index.day - 1) / 30.0 - 0.5
 53 | 
 54 | 
 55 | class DayOfYear(TimeFeature):
 56 |     """Day of year encoded as value between [-0.5, 0.5]"""
 57 | 
 58 |     def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
 59 |         return (index.dayofyear - 1) / 365.0 - 0.5
 60 | 
 61 | 
 62 | class MonthOfYear(TimeFeature):
 63 |     """Month of year encoded as value between [-0.5, 0.5]"""
 64 | 
 65 |     def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
 66 |         return (index.month - 1) / 11.0 - 0.5
 67 | 
 68 | 
 69 | class WeekOfYear(TimeFeature):
 70 |     """Week of year encoded as value between [-0.5, 0.5]"""
 71 | 
 72 |     def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
 73 |         return (index.isocalendar().week - 1) / 52.0 - 0.5
 74 | 
 75 | 
 76 | def time_features_from_frequency_str(freq_str: str) -> List[TimeFeature]:
 77 |     """
 78 |     Returns a list of time features that will be appropriate for the given frequency string.
 79 |     Parameters
 80 |     ----------
 81 |     freq_str
 82 |         Frequency string of the form [multiple][granularity] such as "12H", "5min", "1D" etc.
 83 |     """
 84 | 
 85 |     features_by_offsets = {
 86 |         offsets.YearEnd: [],
 87 |         offsets.QuarterEnd: [MonthOfYear],
 88 |         offsets.MonthEnd: [MonthOfYear],
 89 |         offsets.Week: [DayOfMonth, WeekOfYear],
 90 |         offsets.Day: [DayOfWeek, DayOfMonth, DayOfYear],
 91 |         offsets.BusinessDay: [DayOfWeek, DayOfMonth, DayOfYear],
 92 |         offsets.Hour: [HourOfDay, DayOfWeek, DayOfMonth, DayOfYear],
 93 |         offsets.Minute: [
 94 |             MinuteOfHour,
 95 |             HourOfDay,
 96 |             DayOfWeek,
 97 |             DayOfMonth,
 98 |             DayOfYear,
 99 |         ],
100 |         offsets.Second: [
101 |             SecondOfMinute,
102 |             MinuteOfHour,
103 |             HourOfDay,
104 |             DayOfWeek,
105 |             DayOfMonth,
106 |             DayOfYear,
107 |         ],
108 |     }
109 | 
110 |     offset = to_offset(freq_str)
111 | 
112 |     for offset_type, feature_classes in features_by_offsets.items():
113 |         if isinstance(offset, offset_type):
114 |             return [cls() for cls in feature_classes]
115 | 
116 |     supported_freq_msg = f"""
117 |     Unsupported frequency {freq_str}
118 |     The following frequencies are supported:
119 |         Y   - yearly
120 |             alias: A
121 |         M   - monthly
122 |         W   - weekly
123 |         D   - daily
124 |         B   - business days
125 |         H   - hourly
126 |         T   - minutely
127 |             alias: min
128 |         S   - secondly
129 |     """
130 |     raise RuntimeError(supported_freq_msg)
131 | 
132 | 
133 | def time_features(dates, freq='h'):
134 |     return np.vstack([feat(dates) for feat in time_features_from_frequency_str(freq)])
135 | 
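
Note on the encoding used by the `TimeFeature` classes: each calendar field is divided by its largest value and shifted by 0.5 so that the result lies in [-0.5, 0.5]. The sketch below reproduces three of the encodings with the standard library's `datetime` instead of a pandas index; the function names are hypothetical and only mirror `HourOfDay`, `DayOfWeek` and `MonthOfYear`.

```python
from datetime import datetime


def hour_of_day(ts):
    """Hour 0 maps to -0.5, hour 23 maps to 0.5."""
    return ts.hour / 23.0 - 0.5


def day_of_week(ts):
    """Monday (0) maps to -0.5, Sunday (6) maps to 0.5, as with pandas' dayofweek."""
    return ts.weekday() / 6.0 - 0.5


def month_of_year(ts):
    """January maps to -0.5, December maps to 0.5."""
    return (ts.month - 1) / 11.0 - 0.5


if __name__ == "__main__":
    ts = datetime(2024, 1, 7, 23, 0)  # a Sunday evening in January
    print(hour_of_day(ts), day_of_week(ts), month_of_year(ts))
```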


--------------------------------------------------------------------------------
/SimMTM_Classification/code/utils/tools.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | import torch
 3 | import matplotlib.pyplot as plt
 4 | 
 5 | plt.switch_backend('agg')
 6 | 
 7 | def adjust_learning_rate(optimizer, epoch, args, learning_rate):
 8 |     # lr = args.learning_rate * (0.2 ** (epoch // 2))
 9 | 
10 |     if args.lradj == 'type1':
11 |         lr_adjust = {epoch: learning_rate * (0.5 ** ((epoch - 1) // 1))}
12 |         
13 |         if epoch in lr_adjust.keys():
14 |             lr = lr_adjust[epoch]
15 |             for param_group in optimizer.param_groups:
16 |                 param_group['lr'] = lr
17 |             print('Updating learning rate to {}'.format(lr))
18 |     elif args.lradj == 'type2':
19 |         lr_adjust = {
20 |             2: 5e-5, 4: 1e-5, 6: 5e-6, 8: 1e-6,
21 |             10: 5e-7, 15: 1e-7, 20: 5e-8
22 |         }
23 |         
24 |         if epoch in lr_adjust.keys():
25 |             lr = lr_adjust[epoch]
26 |             for param_group in optimizer.param_groups:
27 |                 param_group['lr'] = lr
28 |             print('Updating learning rate to {}'.format(lr))
29 | 
30 | 
31 | class EarlyStopping:
32 |     def __init__(self, patience=7, verbose=False, delta=0):
33 |         self.patience = patience
34 |         self.verbose = verbose
35 |         self.counter = 0
36 |         self.best_score = None
37 |         self.early_stop = False
38 |         self.val_loss_min = np.inf
39 |         self.delta = delta
40 | 
41 |     def __call__(self, val_loss, model, path, pred_len):
42 |         score = -val_loss
43 |         if self.best_score is None:
44 |             self.best_score = score
45 |             self.save_checkpoint(val_loss, model, path, pred_len)
46 |         elif score < self.best_score + self.delta:
47 |             self.counter += 1
48 |             print(f'EarlyStopping counter: {self.counter} out of {self.patience}')
49 |             if self.counter >= self.patience:
50 |                 self.early_stop = True
51 |         else:
52 |             self.best_score = score
53 |             self.save_checkpoint(val_loss, model, path, pred_len)
54 |             self.counter = 0
55 | 
56 |     def save_checkpoint(self, val_loss, model, path, pred_len):
57 |         if self.verbose:
58 |             print(f'Validation loss decreased ({self.val_loss_min:.6f} --> {val_loss:.6f}).  Saving model ...')
59 |         torch.save(model.state_dict(), path + '/' + f'checkpoint_{pred_len}.pth')
60 |         self.val_loss_min = val_loss
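
Note on `EarlyStopping`: the counter logic can be exercised without torch or a model. The sketch below keeps the same comparison (`score = -val_loss`, reset on improvement, stop once `counter >= patience`) but replaces checkpoint saving with remembering the best loss. The class name `EarlyStoppingSketch`, the helper `stopping_epoch` and the loss values are assumptions for illustration.

```python
class EarlyStoppingSketch:
    """Counter logic only; checkpoint saving is replaced by remembering the best loss."""

    def __init__(self, patience=7, delta=0.0):
        self.patience = patience
        self.delta = delta
        self.counter = 0
        self.best_score = None
        self.best_loss = float("inf")
        self.early_stop = False

    def __call__(self, val_loss):
        score = -val_loss
        if self.best_score is None or score >= self.best_score + self.delta:
            # Improvement (or first call): the original saves a checkpoint here.
            self.best_score = score
            self.best_loss = val_loss
            self.counter = 0
        else:
            self.counter += 1
            if self.counter >= self.patience:
                self.early_stop = True


def stopping_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None."""
    stopper = EarlyStoppingSketch(patience=patience)
    for epoch, loss in enumerate(val_losses):
        stopper(loss)
        if stopper.early_stop:
            return epoch
    return None


if __name__ == "__main__":
    # Made-up validation losses: best value at epoch 1, then three epochs without improvement.
    print(stopping_epoch([1.0, 0.9, 0.95, 0.97, 0.96], patience=3))
```

In a real training loop the caller would check `early_stop` after each validation pass and break out of the epoch loop, then reload the saved checkpoint.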


--------------------------------------------------------------------------------
/SimMTM_Classification/code/utils/utils.py:
--------------------------------------------------------------------------------
 1 | import torch
 2 | import random
 3 | import numpy as np
 4 | import pandas as pd
 5 | import os
 6 | import sys
 7 | import logging
 8 | from sklearn.metrics import classification_report, cohen_kappa_score, confusion_matrix, accuracy_score
 9 | from shutil import copy
10 | 
11 | def set_requires_grad(model, dict_, requires_grad=True):
12 |     for param in model.named_parameters():
13 |         if param[0] in dict_:
14 |             param[1].requires_grad = requires_grad
15 | 
16 | 
17 | def fix_randomness(SEED):
18 |     random.seed(SEED)
19 |     np.random.seed(SEED)
20 |     torch.manual_seed(SEED)
21 |     torch.cuda.manual_seed(SEED)
22 |     torch.backends.cudnn.deterministic = True
23 | 
24 | 
25 | def epoch_time(start_time, end_time):
26 |     elapsed_time = end_time - start_time
27 |     elapsed_mins = int(elapsed_time / 60)
28 |     elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
29 |     return elapsed_mins, elapsed_secs
30 | 
31 | 
32 | def _calc_metrics(pred_labels, true_labels, log_dir, home_path):
33 |     pred_labels = np.array(pred_labels).astype(int)
34 |     true_labels = np.array(true_labels).astype(int)
35 | 
36 |     # save targets
37 |     labels_save_path = os.path.join(log_dir, "labels")
38 |     os.makedirs(labels_save_path, exist_ok=True)
39 |     np.save(os.path.join(labels_save_path, "predicted_labels.npy"), pred_labels)
40 |     np.save(os.path.join(labels_save_path, "true_labels.npy"), true_labels)
41 | 
42 |     r = classification_report(true_labels, pred_labels, digits=6, output_dict=True)
43 |     cm = confusion_matrix(true_labels, pred_labels)
44 |     df = pd.DataFrame(r)
45 |     df["cohen"] = cohen_kappa_score(true_labels, pred_labels)
46 |     df["accuracy"] = accuracy_score(true_labels, pred_labels)
47 |     df = df * 100
48 | 
49 |     # save classification report
50 |     exp_name = os.path.split(os.path.dirname(log_dir))[-1]
51 |     training_mode = os.path.basename(log_dir)
52 |     file_name = f"{exp_name}_{training_mode}_classification_report.xlsx"
53 |     report_Save_path = os.path.join(home_path, log_dir, file_name)
54 |     df.to_excel(report_Save_path)
55 | 
56 |     # save confusion matrix
57 |     cm_file_name = f"{exp_name}_{training_mode}_confusion_matrix.torch"
58 |     cm_Save_path = os.path.join(home_path, log_dir, cm_file_name)
59 |     torch.save(cm, cm_Save_path)
60 | 
61 | 
62 | def _logger(logger_name, level=logging.DEBUG):
63 |     """
64 |     Method to return a custom logger with the given name and level
65 |     """
66 |     logger = logging.getLogger(logger_name)
67 |     logger.setLevel(level)
68 |     # format_string = ("%(asctime)s — %(name)s — %(levelname)s — %(funcName)s:"
69 |     #                 "%(lineno)d — %(message)s")
70 |     format_string = "%(message)s"
71 |     log_format = logging.Formatter(format_string)
72 |     # Creating and adding the console handler
73 |     console_handler = logging.StreamHandler(sys.stdout)
74 |     console_handler.setFormatter(log_format)
75 |     logger.addHandler(console_handler)
76 |     # Creating and adding the file handler
77 |     file_handler = logging.FileHandler(logger_name, mode='a')
78 |     file_handler.setFormatter(log_format)
79 |     logger.addHandler(file_handler)
80 |     return logger
81 | 
82 | 
83 | 
84 | 
85 | 
86 | def copy_Files(destination, data_type):
87 |     # destination: 'experiments_logs/Exp1/run1'
88 |     destination_dir = os.path.join(destination, "model_files")
89 |     os.makedirs(destination_dir, exist_ok=True)
90 |     copy("code/main.py", os.path.join(destination_dir, "main.py"))
91 |     copy("code/trainer.py", os.path.join(destination_dir, "trainer.py"))
92 |     copy(f"code/config_files/{data_type}_Configs.py", os.path.join(destination_dir, f"{data_type}_Configs.py"))
93 |     copy("code/utils/augmentations.py", os.path.join(destination_dir, "augmentations.py"))
94 |     copy("code/dataloader.py", os.path.join(destination_dir, "dataloader.py"))
95 |     copy("code/model.py", os.path.join(destination_dir, "model.py"))
96 |     copy("code/loss.py", os.path.join(destination_dir, "loss.py"))
97 |     copy("code/TC.py", os.path.join(destination_dir, "TC.py"))
98 | 


--------------------------------------------------------------------------------
/SimMTM_Classification/download_datasets.sh:
--------------------------------------------------------------------------------
 1 | wget -O SleepEEG.zip https://figshare.com/ndownloader/articles/19930178/versions/1
 2 | wget -O Epilepsy.zip https://figshare.com/ndownloader/articles/19930199/versions/1
 3 | wget -O FD-A.zip https://figshare.com/ndownloader/articles/19930205/versions/1
 4 | wget -O FD-B.zip https://figshare.com/ndownloader/articles/19930226/versions/1
 5 | wget -O HAR.zip https://figshare.com/ndownloader/articles/19930244/versions/1
 6 | wget -O Gesture.zip https://figshare.com/ndownloader/articles/19930247/versions/1
 7 | wget -O ECG.zip https://figshare.com/ndownloader/articles/19930253/versions/1
 8 | wget -O EMG.zip https://figshare.com/ndownloader/articles/19930250/versions/1
 9 | 
10 | unzip SleepEEG.zip -d datasets/SleepEEG/
11 | unzip Epilepsy.zip -d datasets/Epilepsy/
12 | unzip FD-A.zip -d datasets/FD-A/
13 | unzip FD-B.zip -d datasets/FD-B/
14 | unzip HAR.zip -d datasets/HAR/
15 | unzip Gesture.zip -d datasets/Gesture/
16 | unzip ECG.zip -d datasets/ECG/
17 | unzip EMG.zip -d datasets/EMG/
18 | 
19 | #rm {SleepEEG,Epilepsy,FD-A,FD-B,HAR,Gesture,ECG,EMG}.zip
20 | 
21 | 
22 | 
23 | 
24 | 


--------------------------------------------------------------------------------
/SimMTM_Classification/run.sh:
--------------------------------------------------------------------------------
1 | python ./code/main.py --target_dataset Epilepsy --finetune_epoch 100
2 | python ./code/main.py --target_dataset FD-B --lr 0.0003
3 | python ./code/main.py --target_dataset Gesture
4 | python ./code/main.py --target_dataset EMG --lr 0.0003
5 | 


--------------------------------------------------------------------------------
/SimMTM_Forecasting/.gitignore:
--------------------------------------------------------------------------------
  1 | # Byte-compiled / optimized / DLL files
  2 | __pycache__/
  3 | *.py[cod]
  4 | *$py.class
  5 | 
  6 | # C extensions
  7 | *.so
  8 | 
  9 | # Distribution / packaging
 10 | .Python
 11 | build/
 12 | develop-eggs/
 13 | dist/
 14 | downloads/
 15 | eggs/
 16 | .eggs/
 17 | lib/
 18 | lib64/
 19 | parts/
 20 | sdist/
 21 | var/
 22 | wheels/
 23 | pip-wheel-metadata/
 24 | share/python-wheels/
 25 | *.egg-info/
 26 | .installed.cfg
 27 | *.egg
 28 | MANIFEST
 29 | 
 30 | # PyInstaller
 31 | #  Usually these files are written by a python script from a template
 32 | #  before PyInstaller builds the exe, so as to inject date/other infos into it.
 33 | *.manifest
 34 | *.spec
 35 | 
 36 | # Installer logs
 37 | pip-log.txt
 38 | pip-delete-this-directory.txt
 39 | 
 40 | # Unit test / coverage reports
 41 | htmlcov/
 42 | .tox/
 43 | .nox/
 44 | .coverage
 45 | .coverage.*
 46 | .cache
 47 | nosetests.xml
 48 | coverage.xml
 49 | *.cover
 50 | *.py,cover
 51 | .hypothesis/
 52 | .pytest_cache/
 53 | 
 54 | # Translations
 55 | *.mo
 56 | *.pot
 57 | 
 58 | # Django stuff:
 59 | *.log
 60 | local_settings.py
 61 | db.sqlite3
 62 | db.sqlite3-journal
 63 | 
 64 | # Flask stuff:
 65 | instance/
 66 | .webassets-cache
 67 | 
 68 | # Scrapy stuff:
 69 | .scrapy
 70 | 
 71 | # Sphinx documentation
 72 | docs/_build/
 73 | 
 74 | # PyBuilder
 75 | target/
 76 | 
 77 | # Jupyter Notebook
 78 | .ipynb_checkpoints
 79 | 
 80 | # IPython
 81 | profile_default/
 82 | ipython_config.py
 83 | 
 84 | # pyenv
 85 | .python-version
 86 | 
 87 | # pipenv
 88 | #   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
 89 | #   However, in case of collaboration, if having platform-specific dependencies or dependencies
 90 | #   having no cross-platform support, pipenv may install dependencies that don't work, or not
 91 | #   install all needed dependencies.
 92 | #Pipfile.lock
 93 | 
 94 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow
 95 | __pypackages__/
 96 | 
 97 | # Celery stuff
 98 | celerybeat-schedule
 99 | celerybeat.pid
100 | 
101 | # SageMath parsed files
102 | *.sage.py
103 | 
104 | # Environments
105 | .env
106 | .venv
107 | env/
108 | venv/
109 | ENV/
110 | env.bak/
111 | venv.bak/
112 | 
113 | # Spyder project settings
114 | .spyderproject
115 | .spyproject
116 | 
117 | # Rope project settings
118 | .ropeproject
119 | 
120 | # mkdocs documentation
121 | /site
122 | 
123 | # mypy
124 | .mypy_cache/
125 | .dmypy.json
126 | dmypy.json
127 | 
128 | # Pyre type checker
129 | .pyre/
130 | /scripts/long_term_forecast/Traffic_script/PatchTST1.sh
131 | /backups/
132 | /result.xlsx
133 | /~$result.xlsx
134 | /Time-Series-Library.zip
135 | /temp.sh
136 | 


--------------------------------------------------------------------------------
/SimMTM_Forecasting/data_provider/__init__.py:
--------------------------------------------------------------------------------
1 | 
2 | 


--------------------------------------------------------------------------------
/SimMTM_Forecasting/data_provider/data_factory.py:
--------------------------------------------------------------------------------
  1 | from data_provider.data_loader import Dataset_ETT_hour, Dataset_ETT_minute, Dataset_Custom, Dataset_M4, PSMSegLoader, \
  2 |     MSLSegLoader, SMAPSegLoader, SMDSegLoader, SWATSegLoader, UEAloader
  3 | from data_provider.uea import collate_fn
  4 | from torch.utils.data import DataLoader
  5 | 
  6 | data_dict = {
  7 |     'ETTh1': Dataset_ETT_hour,
  8 |     'ETTh2': Dataset_ETT_hour,
  9 |     'ETTm1': Dataset_ETT_minute,
 10 |     'ETTm2': Dataset_ETT_minute,
 11 |     'Traffic': Dataset_Custom,
 12 |     'Exchange': Dataset_Custom,
 13 |     'Weather': Dataset_Custom,
 14 |     'ECL': Dataset_Custom,
 15 |     'ILI': Dataset_Custom,
 16 |     'm4': Dataset_M4,
 17 |     'PSM': PSMSegLoader,
 18 |     'MSL': MSLSegLoader,
 19 |     'SMAP': SMAPSegLoader,
 20 |     'SMD': SMDSegLoader,
 21 |     'SWAT': SWATSegLoader,
 22 |     'UEA': UEAloader,
 23 | }
 24 | 
 25 | 
 26 | def data_provider(args, flag):
 27 |     Data = data_dict[args.data]
 28 | 
 29 |     timeenc = 0 if args.embed != 'timeF' else 1
 30 | 
 31 |     if flag == 'test':
 32 |         shuffle_flag = False
 33 |         drop_last = True
 34 |         if args.task_name == 'anomaly_detection' or args.task_name == 'classification':
 35 |             batch_size = args.batch_size
 36 |         else:
 37 |             batch_size = 1  # bsz=1 for evaluation
 38 |         freq = args.freq
 39 |     else:
 40 |         shuffle_flag = True
 41 |         drop_last = True
 42 |         batch_size = args.batch_size  # bsz for train and valid
 43 |         freq = args.freq
 44 | 
 45 |     if args.task_name == 'anomaly_detection':
 46 |         drop_last = False
 47 |         data_set = Data(
 48 |             root_path=args.root_path,
 49 |             win_size=args.seq_len,
 50 |             flag=flag,
 51 |         )
 52 |         print(flag, len(data_set))
 53 |         data_loader = DataLoader(
 54 |             data_set,
 55 |             batch_size=batch_size,
 56 |             shuffle=shuffle_flag,
 57 |             num_workers=args.num_workers,
 58 |             drop_last=drop_last)
 59 |         return data_set, data_loader
 60 |     elif args.task_name == 'classification':
 61 |         drop_last = False
 62 |         data_set = Data(
 63 |             root_path=args.root_path,
 64 |             flag=flag,
 65 |         )
 66 |         print(flag, len(data_set))
 67 |         data_loader = DataLoader(
 68 |             data_set,
 69 |             batch_size=batch_size,
 70 |             shuffle=shuffle_flag,
 71 |             num_workers=args.num_workers,
 72 |             drop_last=drop_last,
 73 |             collate_fn=lambda x: collate_fn(x, max_len=args.seq_len)
 74 |         )
 75 |         return data_set, data_loader
 76 |     else:
 77 |         if args.data == 'm4':
 78 |             drop_last = False
 79 | 
 80 |         data_set = Data(
 81 |             root_path=args.root_path,
 82 |             data_path=args.data_path,
 83 |             flag=flag,
 84 |             size=[args.seq_len, args.label_len, args.pred_len],
 85 |             features=args.features,
 86 |             target=args.target,
 87 |             timeenc=timeenc,
 88 |             freq=freq,
 89 |             seasonal_patterns=args.seasonal_patterns
 90 |         )
 91 | 
 92 |         data_loader = DataLoader(
 93 |             data_set,
 94 |             batch_size=batch_size,
 95 |             shuffle=shuffle_flag,
 96 |             num_workers=args.num_workers,
 97 |             drop_last=drop_last)
 98 | 
 99 |         print(flag, len(data_set), len(data_loader))
100 |         return data_set, data_loader
101 | 
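
Note on data_factory.py: the flag and task branching in `data_provider` can be summarised as a small pure function. This is a minimal sketch, not part of the repository. `loader_settings` is a hypothetical helper that only mirrors the branching shown in the file; it builds no dataset and no `DataLoader`. The task name `'long_term_forecast'` is a placeholder for any task that falls into the final `else` branch.

```python
def loader_settings(task_name, flag, data, batch_size):
    """Hypothetical helper: mirror how data_provider picks DataLoader settings."""
    if flag == 'test':
        shuffle = False
        # Anomaly detection and classification keep the configured batch size;
        # every other task is evaluated with batch size 1.
        if task_name in ('anomaly_detection', 'classification'):
            bsz = batch_size
        else:
            bsz = 1
    else:
        shuffle = True
        bsz = batch_size

    # drop_last starts as True and is switched off for anomaly detection,
    # classification, and the m4 dataset.
    drop_last = True
    if task_name in ('anomaly_detection', 'classification') or data == 'm4':
        drop_last = False
    return {'batch_size': bsz, 'shuffle': shuffle, 'drop_last': drop_last}


print(loader_settings('long_term_forecast', 'test', 'ETTh1', 32))
print(loader_settings('classification', 'train', 'UEA', 16))
```

Keeping the decision logic in one place like this makes it easy to see that test-time forecasting silently ignores `args.batch_size`.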


--------------------------------------------------------------------------------
/SimMTM_Forecasting/data_provider/m4.py:
--------------------------------------------------------------------------------
  1 | # This source code is provided for the purposes of scientific reproducibility
  2 | # under the following limited license from Element AI Inc. The code is an
  3 | # implementation of the N-BEATS model (Oreshkin et al., N-BEATS: Neural basis
  4 | # expansion analysis for interpretable time series forecasting,
  5 | # https://arxiv.org/abs/1905.10437). The copyright to the source code is
  6 | # licensed under the Creative Commons - Attribution-NonCommercial 4.0
  7 | # International license (CC BY-NC 4.0):
  8 | # https://creativecommons.org/licenses/by-nc/4.0/.  Any commercial use (whether
  9 | # for the benefit of third parties or internally in production) requires an
 10 | # explicit license. The subject-matter of the N-BEATS model and associated
 11 | # materials are the property of Element AI Inc. and may be subject to patent
 12 | # protection. No license to patents is granted hereunder (whether express or
 13 | # implied). Copyright © 2020 Element AI Inc. All rights reserved.
 14 | 
 15 | """
 16 | M4 Dataset
 17 | """
 18 | from dataclasses import dataclass
 19 | 
 20 | import numpy as np
 21 | import pandas as pd
 22 | import logging
 23 | import os
 24 | import pathlib
 25 | import sys
 26 | from urllib import request
 27 | 
 28 | 
 29 | def url_file_name(url: str) -> str:
 30 |     """
 31 |     Extract file name from url.
 32 | 
 33 |     :param url: URL to extract file name from.
 34 |     :return: File name.
 35 |     """
 36 |     return url.split('/')[-1] if len(url) > 0 else ''
 37 | 
 38 | 
 39 | def download(url: str, file_path: str) -> None:
 40 |     """
 41 |     Download a file to the given path.
 42 | 
 43 |     :param url: URL to download
 44 |     :param file_path: Where to download the content.
 45 |     """
 46 | 
 47 |     def progress(count, block_size, total_size):
 48 |         progress_pct = float(count * block_size) / float(total_size) * 100.0
 49 |         sys.stdout.write('\rDownloading {} to {} {:.1f}%'.format(url, file_path, progress_pct))
 50 |         sys.stdout.flush()
 51 | 
 52 |     if not os.path.isfile(file_path):
 53 |         opener = request.build_opener()
 54 |         opener.addheaders = [('User-agent', 'Mozilla/5.0')]
 55 |         request.install_opener(opener)
 56 |         pathlib.Path(os.path.dirname(file_path)).mkdir(parents=True, exist_ok=True)
 57 |         f, _ = request.urlretrieve(url, file_path, progress)
 58 |         sys.stdout.write('\n')
 59 |         sys.stdout.flush()
 60 |         file_info = os.stat(f)
 61 |         logging.info(f'Successfully downloaded {os.path.basename(file_path)} {file_info.st_size} bytes.')
 62 |     else:
 63 |         file_info = os.stat(file_path)
 64 |         logging.info(f'File already exists: {file_path} {file_info.st_size} bytes.')
 65 | 
 66 | 
 67 | @dataclass()
 68 | class M4Dataset:
 69 |     ids: np.ndarray
 70 |     groups: np.ndarray
 71 |     frequencies: np.ndarray
 72 |     horizons: np.ndarray
 73 |     values: np.ndarray
 74 | 
 75 |     @staticmethod
 76 |     def load(training: bool = True, dataset_file: str = '../dataset/m4') -> 'M4Dataset':
 77 |         """
 78 |         Load cached dataset.
 79 | 
 80 |         :param training: Load training part if training is True, test part otherwise.
 81 |         """
 82 |         info_file = os.path.join(dataset_file, 'M4-info.csv')
 83 |         train_cache_file = os.path.join(dataset_file, 'training.npz')
 84 |         test_cache_file = os.path.join(dataset_file, 'test.npz')
 85 |         m4_info = pd.read_csv(info_file)
 86 |         return M4Dataset(ids=m4_info.M4id.values,
 87 |                          groups=m4_info.SP.values,
 88 |                          frequencies=m4_info.Frequency.values,
 89 |                          horizons=m4_info.Horizon.values,
 90 |                          values=np.load(train_cache_file if training else test_cache_file, allow_pickle=True))
 91 | 
 92 | 
 93 | @dataclass()
 94 | class M4Meta:
 95 |     seasonal_patterns = ['Yearly', 'Quarterly', 'Monthly', 'Weekly', 'Daily', 'Hourly']
 96 |     horizons = [6, 8, 18, 13, 14, 48]
 97 |     frequencies = [1, 4, 12, 1, 1, 24]
 98 |     horizons_map = {
 99 |         'Yearly': 6,
100 |         'Quarterly': 8,
101 |         'Monthly': 18,
102 |         'Weekly': 13,
103 |         'Daily': 14,
104 |         'Hourly': 48
105 |     }  # different predict length
106 |     frequency_map = {
107 |         'Yearly': 1,
108 |         'Quarterly': 4,
109 |         'Monthly': 12,
110 |         'Weekly': 1,
111 |         'Daily': 1,
112 |         'Hourly': 24
113 |     }
114 |     history_size = {
115 |         'Yearly': 1.5,
116 |         'Quarterly': 1.5,
117 |         'Monthly': 1.5,
118 |         'Weekly': 10,
119 |         'Daily': 10,
120 |         'Hourly': 10
121 |     }  # from interpretable.gin
122 | 
123 | 
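124 | def load_m4_info(info_file_path: str = '../dataset/m4/M4-info.csv') -> pd.DataFrame:
125 |     """
126 |     Load M4Info file.
127 |     :param info_file_path: Path to M4-info.csv (assumed default, mirroring M4Dataset.load).
128 |     :return: Pandas DataFrame of M4Info.
129 |     """
130 |     return pd.read_csv(info_file_path)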
131 | 


--------------------------------------------------------------------------------
/SimMTM_Forecasting/data_provider/uea.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | import numpy as np
  3 | import pandas as pd
  4 | import torch
  5 | 
  6 | 
  7 | def collate_fn(data, max_len=None):
  8 |     """Build mini-batch tensors from a list of (X, y) tuples, zero-padding or clipping each X to a common length.
  9 |     Args:
 10 |         data: len(batch_size) list of tuples (X, y).
 11 |             - X: torch tensor of shape (seq_length, feat_dim); variable seq_length.
 12 |             - y: torch tensor of shape (num_labels,) : class indices or numerical targets
 13 |                 (for classification or regression, respectively). num_labels > 1 for multi-task models
 14 |         max_len: global fixed sequence length. Used for architectures requiring fixed length input,
 15 |             where the batch length cannot vary dynamically. Longer sequences are clipped, shorter are padded with 0s
 16 |     Returns:
 17 |         X: (batch_size, padded_length, feat_dim) torch tensor of zero-padded (or clipped) features
 18 |         targets: (batch_size, num_labels) torch tensor of stacked labels
 19 |         padding_masks: (batch_size, padded_length) boolean tensor,
 20 |             1 means keep vector at this position, 0 means padding
 21 |         Note: no target_masks are returned; this function pads inputs but does not mask them.
 22 |     """
 23 | 
 24 |     batch_size = len(data)
 25 |     features, labels = zip(*data)
 26 | 
 27 |     # Stack and pad features and masks (convert 2D to 3D tensors, i.e. add batch dimension)
 28 |     lengths = [X.shape[0] for X in features]  # original sequence length for each time series
 29 |     if max_len is None:
 30 |         max_len = max(lengths)
 31 |     X = torch.zeros(batch_size, max_len, features[0].shape[-1])  # (batch_size, padded_length, feat_dim)
 32 |     for i in range(batch_size):
 33 |         end = min(lengths[i], max_len)
 34 |         X[i, :end, :] = features[i][:end, :]
 35 | 
 36 |     targets = torch.stack(labels, dim=0)  # (batch_size, num_labels)
 37 | 
 38 |     padding_masks = padding_mask(torch.tensor(lengths, dtype=torch.int16),
 39 |                                  max_len=max_len)  # (batch_size, padded_length) boolean tensor, "1" means keep
 40 | 
 41 |     return X, targets, padding_masks
 42 | 
 43 | 
 44 | def padding_mask(lengths, max_len=None):
 45 |     """
 46 |     Used to mask padded positions: creates a (batch_size, max_len) boolean mask from a tensor of sequence lengths,
 47 |     where 1 means keep element at this position (time step)
 48 |     """
 49 |     batch_size = lengths.numel()
 50 |     max_len = max_len or int(lengths.max())  # 'or' falls back to the longest length when max_len is None
 51 |     return (torch.arange(0, max_len, device=lengths.device)
 52 |             .type_as(lengths)
 53 |             .repeat(batch_size, 1)
 54 |             .lt(lengths.unsqueeze(1)))
 55 | 
 56 | 
 57 | class Normalizer(object):
 58 |     """
 59 |     Normalizes dataframe across ALL contained rows (time steps). Different from per-sample normalization.
 60 |     """
 61 | 
 62 |     def __init__(self, norm_type='standardization', mean=None, std=None, min_val=None, max_val=None):
 63 |         """
 64 |         Args:
 65 |             norm_type: choose from:
 66 |                 "standardization", "minmax": normalizes dataframe across ALL contained rows (time steps)
 67 |                 "per_sample_std", "per_sample_minmax": normalizes each sample separately (i.e. across only its own rows)
 68 |             mean, std, min_val, max_val: optional (num_feat,) Series of pre-computed values
 69 |         """
 70 | 
 71 |         self.norm_type = norm_type
 72 |         self.mean = mean
 73 |         self.std = std
 74 |         self.min_val = min_val
 75 |         self.max_val = max_val
 76 | 
 77 |     def normalize(self, df):
 78 |         """
 79 |         Args:
 80 |             df: input dataframe
 81 |         Returns:
 82 |             df: normalized dataframe
 83 |         """
 84 |         if self.norm_type == "standardization":
 85 |             if self.mean is None:
 86 |                 self.mean = df.mean()
 87 |                 self.std = df.std()
 88 |             return (df - self.mean) / (self.std + np.finfo(float).eps)
 89 | 
 90 |         elif self.norm_type == "minmax":
 91 |             if self.max_val is None:
 92 |                 self.max_val = df.max()
 93 |                 self.min_val = df.min()
 94 |             return (df - self.min_val) / (self.max_val - self.min_val + np.finfo(float).eps)
 95 | 
 96 |         elif self.norm_type == "per_sample_std":
 97 |             grouped = df.groupby(by=df.index)
 98 |             return (df - grouped.transform('mean')) / grouped.transform('std')
 99 | 
100 |         elif self.norm_type == "per_sample_minmax":
101 |             grouped = df.groupby(by=df.index)
102 |             min_vals = grouped.transform('min')
103 |             return (df - min_vals) / (grouped.transform('max') - min_vals + np.finfo(float).eps)
104 | 
105 |         else:
106 |             raise (NameError(f'Normalize method "{self.norm_type}" not implemented'))
107 | 
108 | 
109 | def interpolate_missing(y):
110 |     """
111 |     Replaces NaN values in pd.Series `y` using linear interpolation
112 |     """
113 |     if y.isna().any():
114 |         y = y.interpolate(method='linear', limit_direction='both')
115 |     return y
116 | 
117 | 
118 | def subsample(y, limit=256, factor=2):
119 |     """
120 |     If a given Series is longer than `limit`, returns subsampled sequence by the specified integer factor
121 |     """
122 |     if len(y) > limit:
123 |         return y[::factor].reset_index(drop=True)
124 |     return y
125 | 
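
Note on uea.py: the padding and masking done by `collate_fn` and `padding_mask` can be illustrated without tensors. The sketch below is a plain-list stand-in under simplifying assumptions: sequences are univariate (no `feat_dim` axis), labels are omitted, and `pad_and_mask` is a hypothetical name that does not exist in the repository.

```python
def pad_and_mask(sequences, max_len=None, pad_value=0.0):
    """Pad or clip variable-length sequences and build a keep/padding mask.

    True marks a real time step, False marks padding.
    """
    lengths = [len(seq) for seq in sequences]
    if max_len is None:
        max_len = max(lengths)
    padded, masks = [], []
    for seq, length in zip(sequences, lengths):
        end = min(length, max_len)  # longer sequences are clipped
        padded.append(list(seq[:end]) + [pad_value] * (max_len - end))
        # Position t is kept when t < length, like arange(max_len) < length.
        masks.append([t < length for t in range(max_len)])
    return padded, masks


batch = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0, 7.0, 8.0, 9.0]]
padded, masks = pad_and_mask(batch, max_len=4)
for row, mask in zip(padded, masks):
    print(row, mask)
```

As in the original, the mask is computed from the unclipped lengths, so a sequence longer than `max_len` gets an all-True mask.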


--------------------------------------------------------------------------------
/SimMTM_Forecasting/exp/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/exp/__init__.py


--------------------------------------------------------------------------------
/SimMTM_Forecasting/exp/exp_basic.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | import torch
 3 | from models import SimMTM
 4 | 
 5 | 
 6 | class Exp_Basic(object):
 7 |     def __init__(self, args):
 8 |         self.args = args
 9 |         self.model_dict = {'SimMTM': SimMTM}
10 |         self.device = self._acquire_device()
11 |         self.model = self._build_model().to(self.device)
12 | 
13 |     def _build_model(self):
14 |         # Subclasses must override this method and return the model (an nn.Module).
15 |         raise NotImplementedError
16 | 
17 |     def _acquire_device(self):
18 |         if self.args.use_gpu:
19 |             os.environ["CUDA_VISIBLE_DEVICES"] = str(self.args.gpu) if not self.args.use_multi_gpu else self.args.devices
20 |             device = torch.device('cuda:{}'.format(self.args.gpu))
21 |             print('Use GPU: cuda:{}'.format(self.args.gpu))
22 |         else:
23 |             device = torch.device('cpu')
24 |             print('Use CPU')
25 |         return device
26 | 
27 |     def _get_data(self):
28 |         pass
29 | 
30 |     def vali(self):
31 |         pass
32 | 
33 |     def train(self):
34 |         pass
35 | 
36 |     def test(self):
37 |         pass
38 | 


--------------------------------------------------------------------------------
/SimMTM_Forecasting/layers/AutoCorrelation.py:
--------------------------------------------------------------------------------
  1 | import torch
  2 | import torch.nn as nn
  3 | import torch.nn.functional as F
  4 | import matplotlib.pyplot as plt
  5 | import numpy as np
  6 | import math
  7 | from math import sqrt
  8 | import os
  9 | 
 10 | 
 11 | class AutoCorrelation(nn.Module):
 12 |     """
 13 |     AutoCorrelation Mechanism with the following two phases:
 14 |     (1) period-based dependencies discovery
 15 |     (2) time delay aggregation
 16 |     This block can replace the self-attention family mechanism seamlessly.
 17 |     """
 18 | 
 19 |     def __init__(self, mask_flag=True, factor=1, scale=None, attention_dropout=0.1, output_attention=False):
 20 |         super(AutoCorrelation, self).__init__()
 21 |         self.factor = factor
 22 |         self.scale = scale
 23 |         self.mask_flag = mask_flag
 24 |         self.output_attention = output_attention
 25 |         self.dropout = nn.Dropout(attention_dropout)
 26 | 
 27 |     def time_delay_agg_training(self, values, corr):
 28 |         """
 29 |         SpeedUp version of Autocorrelation (a batch-normalization style design)
 30 |         This is for the training phase.
 31 |         """
 32 |         head = values.shape[1]
 33 |         channel = values.shape[2]
 34 |         length = values.shape[3]
 35 |         # find top k
 36 |         top_k = int(self.factor * math.log(length))
 37 |         mean_value = torch.mean(torch.mean(corr, dim=1), dim=1)
 38 |         index = torch.topk(torch.mean(mean_value, dim=0), top_k, dim=-1)[1]
 39 |         weights = torch.stack([mean_value[:, index[i]] for i in range(top_k)], dim=-1)
 40 |         # update corr
 41 |         tmp_corr = torch.softmax(weights, dim=-1)
 42 |         # aggregation
 43 |         tmp_values = values
 44 |         delays_agg = torch.zeros_like(values).float()
 45 |         for i in range(top_k):
 46 |             pattern = torch.roll(tmp_values, -int(index[i]), -1)
 47 |             delays_agg = delays_agg + pattern * \
 48 |                          (tmp_corr[:, i].unsqueeze(1).unsqueeze(1).unsqueeze(1).repeat(1, head, channel, length))
 49 |         return delays_agg
 50 | 
 51 |     def time_delay_agg_inference(self, values, corr):
 52 |         """
 53 |         SpeedUp version of Autocorrelation (a batch-normalization style design)
 54 |         This is for the inference phase.
 55 |         """
 56 |         batch = values.shape[0]
 57 |         head = values.shape[1]
 58 |         channel = values.shape[2]
 59 |         length = values.shape[3]
 60 |         # index init
 61 |         init_index = torch.arange(length).unsqueeze(0).unsqueeze(0).unsqueeze(0).repeat(batch, head, channel, 1).to(values.device)
 62 |         # find top k
 63 |         top_k = int(self.factor * math.log(length))
 64 |         mean_value = torch.mean(torch.mean(corr, dim=1), dim=1)
 65 |         weights, delay = torch.topk(mean_value, top_k, dim=-1)
 66 |         # update corr
 67 |         tmp_corr = torch.softmax(weights, dim=-1)
 68 |         # aggregation
 69 |         tmp_values = values.repeat(1, 1, 1, 2)
 70 |         delays_agg = torch.zeros_like(values).float()
 71 |         for i in range(top_k):
 72 |             tmp_delay = init_index + delay[:, i].unsqueeze(1).unsqueeze(1).unsqueeze(1).repeat(1, head, channel, length)
 73 |             pattern = torch.gather(tmp_values, dim=-1, index=tmp_delay)
 74 |             delays_agg = delays_agg + pattern * \
 75 |                          (tmp_corr[:, i].unsqueeze(1).unsqueeze(1).unsqueeze(1).repeat(1, head, channel, length))
 76 |         return delays_agg
 77 | 
 78 |     def time_delay_agg_full(self, values, corr):
 79 |         """
 80 |         Standard version of Autocorrelation
 81 |         """
 82 |         batch = values.shape[0]
 83 |         head = values.shape[1]
 84 |         channel = values.shape[2]
 85 |         length = values.shape[3]
 86 |         # index init
 87 |         init_index = torch.arange(length).unsqueeze(0).unsqueeze(0).unsqueeze(0).repeat(batch, head, channel, 1).to(values.device)
 88 |         # find top k
 89 |         top_k = int(self.factor * math.log(length))
 90 |         weights, delay = torch.topk(corr, top_k, dim=-1)
 91 |         # update corr
 92 |         tmp_corr = torch.softmax(weights, dim=-1)
 93 |         # aggregation
 94 |         tmp_values = values.repeat(1, 1, 1, 2)
 95 |         delays_agg = torch.zeros_like(values).float()
 96 |         for i in range(top_k):
 97 |             tmp_delay = init_index + delay[..., i].unsqueeze(-1)
 98 |             pattern = torch.gather(tmp_values, dim=-1, index=tmp_delay)
 99 |             delays_agg = delays_agg + pattern * (tmp_corr[..., i].unsqueeze(-1))
100 |         return delays_agg
101 | 
102 |     def forward(self, queries, keys, values, attn_mask):
103 |         B, L, H, E = queries.shape
104 |         _, S, _, D = values.shape
105 |         if L > S:
106 |             zeros = torch.zeros_like(queries[:, :(L - S), :]).float()
107 |             values = torch.cat([values, zeros], dim=1)
108 |             keys = torch.cat([keys, zeros], dim=1)
109 |         else:
110 |             values = values[:, :L, :, :]
111 |             keys = keys[:, :L, :, :]
112 | 
113 |         # period-based dependencies
114 |         q_fft = torch.fft.rfft(queries.permute(0, 2, 3, 1).contiguous(), dim=-1)
115 |         k_fft = torch.fft.rfft(keys.permute(0, 2, 3, 1).contiguous(), dim=-1)
116 |         res = q_fft * torch.conj(k_fft)
117 |         corr = torch.fft.irfft(res, dim=-1)
118 | 
119 |         # time delay agg
120 |         if self.training:
121 |             V = self.time_delay_agg_training(values.permute(0, 2, 3, 1).contiguous(), corr).permute(0, 3, 1, 2)
122 |         else:
123 |             V = self.time_delay_agg_inference(values.permute(0, 2, 3, 1).contiguous(), corr).permute(0, 3, 1, 2)
124 | 
125 |         if self.output_attention:
126 |             return (V.contiguous(), corr.permute(0, 3, 1, 2))
127 |         else:
128 |             return (V.contiguous(), None)
129 | 
130 | 
131 | class AutoCorrelationLayer(nn.Module):
132 |     def __init__(self, correlation, d_model, n_heads, d_keys=None,
133 |                  d_values=None):
134 |         super(AutoCorrelationLayer, self).__init__()
135 | 
136 |         d_keys = d_keys or (d_model // n_heads)
137 |         d_values = d_values or (d_model // n_heads)
138 | 
139 |         self.inner_correlation = correlation
140 |         self.query_projection = nn.Linear(d_model, d_keys * n_heads)
141 |         self.key_projection = nn.Linear(d_model, d_keys * n_heads)
142 |         self.value_projection = nn.Linear(d_model, d_values * n_heads)
143 |         self.out_projection = nn.Linear(d_values * n_heads, d_model)
144 |         self.n_heads = n_heads
145 | 
146 |     def forward(self, queries, keys, values, attn_mask):
147 |         B, L, _ = queries.shape
148 |         _, S, _ = keys.shape
149 |         H = self.n_heads
150 | 
151 |         queries = self.query_projection(queries).view(B, L, H, -1)
152 |         keys = self.key_projection(keys).view(B, S, H, -1)
153 |         values = self.value_projection(values).view(B, S, H, -1)
154 | 
155 |         out, attn = self.inner_correlation(
156 |             queries,
157 |             keys,
158 |             values,
159 |             attn_mask
160 |         )
161 |         out = out.view(B, L, -1)
162 | 
163 |         return self.out_projection(out), attn
164 | 
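
Note on AutoCorrelation.py: the two phases named in the class docstring (period-based dependency discovery, then time delay aggregation) can be sketched in pure Python for a single univariate series. This is an illustrative sketch, not the repository's implementation. It computes the circular correlation directly in O(L^2) rather than through `torch.fft`, which is the quantity that `irfft(rfft(q) * conj(rfft(k)))` yields for real inputs of even length, and it drops the batch, head, and channel axes. The names `circular_correlation` and `time_delay_aggregate` are hypothetical.

```python
import math


def circular_correlation(q, k):
    """corr[tau] = sum_t q[(t + tau) % L] * k[t], computed directly."""
    length = len(q)
    return [sum(q[(t + tau) % length] * k[t] for t in range(length))
            for tau in range(length)]


def time_delay_aggregate(values, corr, factor=1):
    """Sum rolled copies of `values`, weighted by a softmax over the top-k scores."""
    length = len(values)
    top_k = int(factor * math.log(length))
    # Delays with the largest correlation scores.
    delays = sorted(range(length), key=lambda tau: corr[tau], reverse=True)[:top_k]
    # Softmax over the selected scores (shifted by the max for stability).
    peak = max(corr[tau] for tau in delays)
    exps = [math.exp(corr[tau] - peak) for tau in delays]
    total = sum(exps)
    weights = [e / total for e in exps]
    out = [0.0] * length
    for tau, w in zip(delays, weights):
        # Like torch.roll(values, -tau): position t reads values[(t + tau) % L].
        for t in range(length):
            out[t] += w * values[(t + tau) % length]
    return out


series = [math.sin(2 * math.pi * t / 8) for t in range(32)]  # period-8 signal
corr = circular_correlation(series, series)
best_lags = sorted(range(1, 32), key=lambda tau: corr[tau], reverse=True)[:3]
print(sorted(best_lags))  # nonzero lags with the highest autocorrelation
agg = time_delay_aggregate(series, corr)
```

For a signal that is exactly periodic over the window, the selected delays are multiples of the period, so the aggregated output reproduces the input up to rounding.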


--------------------------------------------------------------------------------
/SimMTM_Forecasting/layers/Autoformer_EncDec.py:
--------------------------------------------------------------------------------
  1 | import torch
  2 | import torch.nn as nn
  3 | import torch.nn.functional as F
  4 | 
  5 | 
  6 | class my_Layernorm(nn.Module):
  7 |     """
  8 |     Specially designed layernorm for the seasonal part
  9 |     """
 10 | 
 11 |     def __init__(self, channels):
 12 |         super(my_Layernorm, self).__init__()
 13 |         self.layernorm = nn.LayerNorm(channels)
 14 | 
 15 |     def forward(self, x):
 16 |         x_hat = self.layernorm(x)
 17 |         bias = torch.mean(x_hat, dim=1).unsqueeze(1).repeat(1, x.shape[1], 1)
 18 |         return x_hat - bias
 19 | 
 20 | 
 21 | class moving_avg(nn.Module):
 22 |     """
 23 |     Moving average block to highlight the trend of time series
 24 |     """
 25 | 
 26 |     def __init__(self, kernel_size, stride):
 27 |         super(moving_avg, self).__init__()
 28 |         self.kernel_size = kernel_size
 29 |         self.avg = nn.AvgPool1d(kernel_size=kernel_size, stride=stride, padding=0)
 30 | 
 31 |     def forward(self, x):
 32 | 
 33 |         # pad both ends of the time series by repeating the boundary values
 34 |         front = x[:, 0:1, :].repeat(1, (self.kernel_size - 1) // 2, 1)
 35 |         end = x[:, -1:, :].repeat(1, (self.kernel_size - 1) // 2, 1)
 36 |         x = torch.cat([front, x, end], dim=1)
 37 |         x = self.avg(x.permute(0, 2, 1))
 38 |         x = x.permute(0, 2, 1)
 39 |         return x
 40 | 
 41 | 
 42 | class series_decomp(nn.Module):
 43 |     """
 44 |     Series decomposition block
 45 |     """
 46 | 
 47 |     def __init__(self, kernel_size):
 48 |         super(series_decomp, self).__init__()
 49 |         self.moving_avg = moving_avg(kernel_size, stride=1)
 50 | 
 51 |     def forward(self, x):
 52 |         moving_mean = self.moving_avg(x)
 53 |         res = x - moving_mean
 54 |         return res, moving_mean
 55 | 
 56 | 
 57 | class series_decomp_multi(nn.Module):
 58 |     """
 59 |     Multiple Series decomposition block from FEDformer
 60 |     """
 61 | 
 62 |     def __init__(self, kernel_size):
 63 |         super(series_decomp_multi, self).__init__()
 64 |         self.kernel_size = kernel_size
 65 |         self.series_decomp = nn.ModuleList([series_decomp(kernel) for kernel in kernel_size])
 66 | 
 67 |     def forward(self, x):
 68 |         moving_mean = []
 69 |         res = []
 70 |         for func in self.series_decomp:
 71 |             sea, moving_avg = func(x)
 72 |             moving_mean.append(moving_avg)
 73 |             res.append(sea)
 74 | 
 75 |         sea = sum(res) / len(res)
 76 |         moving_mean = sum(moving_mean) / len(moving_mean)
 77 |         return sea, moving_mean
 78 | 
 79 | 
 80 | class EncoderLayer(nn.Module):
 81 |     """
 82 |     Autoformer encoder layer with the progressive decomposition architecture
 83 |     """
 84 | 
 85 |     def __init__(self, attention, d_model, d_ff=None, moving_avg=25, dropout=0.1, activation="relu"):
 86 |         super(EncoderLayer, self).__init__()
 87 |         d_ff = d_ff or 4 * d_model
 88 |         self.attention = attention
 89 |         self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1, bias=False)
 90 |         self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1, bias=False)
 91 |         self.decomp1 = series_decomp(moving_avg)
 92 |         self.decomp2 = series_decomp(moving_avg)
 93 |         self.dropout = nn.Dropout(dropout)
 94 |         self.activation = F.relu if activation == "relu" else F.gelu
 95 | 
 96 |     def forward(self, x, attn_mask=None):
 97 |         new_x, attn = self.attention(
 98 |             x, x, x,
 99 |             attn_mask=attn_mask
100 |         )
101 |         x = x + self.dropout(new_x)
102 |         x, _ = self.decomp1(x)
103 |         y = x
104 |         y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1))))
105 |         y = self.dropout(self.conv2(y).transpose(-1, 1))
106 |         res, _ = self.decomp2(x + y)
107 |         return res, attn
108 | 
109 | 
110 | class Encoder(nn.Module):
111 |     """
112 |     Autoformer encoder
113 |     """
114 | 
115 |     def __init__(self, attn_layers, conv_layers=None, norm_layer=None):
116 |         super(Encoder, self).__init__()
117 |         self.attn_layers = nn.ModuleList(attn_layers)
118 |         self.conv_layers = nn.ModuleList(conv_layers) if conv_layers is not None else None
119 |         self.norm = norm_layer
120 | 
121 |     def forward(self, x, attn_mask=None):
122 |         attns = []
123 |         if self.conv_layers is not None:
124 |             for attn_layer, conv_layer in zip(self.attn_layers, self.conv_layers):
125 |                 x, attn = attn_layer(x, attn_mask=attn_mask)
126 |                 x = conv_layer(x)
127 |                 attns.append(attn)
128 |             x, attn = self.attn_layers[-1](x)
129 |             attns.append(attn)
130 |         else:
131 |             for attn_layer in self.attn_layers:
132 |                 x, attn = attn_layer(x, attn_mask=attn_mask)
133 |                 attns.append(attn)
134 | 
135 |         if self.norm is not None:
136 |             x = self.norm(x)
137 | 
138 |         return x, attns
139 | 
140 | 
141 | class DecoderLayer(nn.Module):
142 |     """
143 |     Autoformer decoder layer with the progressive decomposition architecture
144 |     """
145 | 
146 |     def __init__(self, self_attention, cross_attention, d_model, c_out, d_ff=None,
147 |                  moving_avg=25, dropout=0.1, activation="relu"):
148 |         super(DecoderLayer, self).__init__()
149 |         d_ff = d_ff or 4 * d_model
150 |         self.self_attention = self_attention
151 |         self.cross_attention = cross_attention
152 |         self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1, bias=False)
153 |         self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1, bias=False)
154 |         self.decomp1 = series_decomp(moving_avg)
155 |         self.decomp2 = series_decomp(moving_avg)
156 |         self.decomp3 = series_decomp(moving_avg)
157 |         self.dropout = nn.Dropout(dropout)
158 |         self.projection = nn.Conv1d(in_channels=d_model, out_channels=c_out, kernel_size=3, stride=1, padding=1,
159 |                                     padding_mode='circular', bias=False)
160 |         self.activation = F.relu if activation == "relu" else F.gelu
161 | 
162 |     def forward(self, x, cross, x_mask=None, cross_mask=None):
163 |         x = x + self.dropout(self.self_attention(
164 |             x, x, x,
165 |             attn_mask=x_mask
166 |         )[0])
167 |         x, trend1 = self.decomp1(x)
168 |         x = x + self.dropout(self.cross_attention(
169 |             x, cross, cross,
170 |             attn_mask=cross_mask
171 |         )[0])
172 |         x, trend2 = self.decomp2(x)
173 |         y = x
174 |         y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1))))
175 |         y = self.dropout(self.conv2(y).transpose(-1, 1))
176 |         x, trend3 = self.decomp3(x + y)
177 | 
178 |         residual_trend = trend1 + trend2 + trend3
179 |         residual_trend = self.projection(residual_trend.permute(0, 2, 1)).transpose(1, 2)
180 |         return x, residual_trend
181 | 
182 | 
183 | class Decoder(nn.Module):
184 |     """
185 |     Autoformer decoder
186 |     """
187 | 
188 |     def __init__(self, layers, norm_layer=None, projection=None):
189 |         super(Decoder, self).__init__()
190 |         self.layers = nn.ModuleList(layers)
191 |         self.norm = norm_layer
192 |         self.projection = projection
193 | 
194 |     def forward(self, x, cross, x_mask=None, cross_mask=None, trend=None):
195 |         for layer in self.layers:
196 |             x, residual_trend = layer(x, cross, x_mask=x_mask, cross_mask=cross_mask)
197 |             trend = trend + residual_trend
198 | 
199 |         if self.norm is not None:
200 |             x = self.norm(x)
201 | 
202 |         if self.projection is not None:
203 |             x = self.projection(x)
204 |         return x, trend
205 | 


--------------------------------------------------------------------------------
/SimMTM_Forecasting/layers/Conv_Blocks.py:
--------------------------------------------------------------------------------
 1 | import torch
 2 | import torch.nn as nn
 3 | 
 4 | 
 5 | class Inception_Block_V1(nn.Module):
 6 |     def __init__(self, in_channels, out_channels, num_kernels=6, init_weight=True):
 7 |         super(Inception_Block_V1, self).__init__()
 8 |         self.in_channels = in_channels
 9 |         self.out_channels = out_channels
10 |         self.num_kernels = num_kernels
11 |         kernels = []
12 |         for i in range(self.num_kernels):
13 |             kernels.append(nn.Conv2d(in_channels, out_channels, kernel_size=2 * i + 1, padding=i))
14 |         self.kernels = nn.ModuleList(kernels)
15 |         if init_weight:
16 |             self._initialize_weights()
17 | 
18 |     def _initialize_weights(self):
19 |         for m in self.modules():
20 |             if isinstance(m, nn.Conv2d):
21 |                 nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
22 |                 if m.bias is not None:
23 |                     nn.init.constant_(m.bias, 0)
24 | 
25 |     def forward(self, x):
26 |         res_list = []
27 |         for i in range(self.num_kernels):
28 |             res_list.append(self.kernels[i](x))
29 |         res = torch.stack(res_list, dim=-1).mean(-1)
30 |         return res
31 | 
32 | 
33 | class Inception_Block_V2(nn.Module):
34 |     def __init__(self, in_channels, out_channels, num_kernels=6, init_weight=True):
35 |         super(Inception_Block_V2, self).__init__()
36 |         self.in_channels = in_channels
37 |         self.out_channels = out_channels
38 |         self.num_kernels = num_kernels
39 |         kernels = []
40 |         for i in range(self.num_kernels // 2):
41 |             kernels.append(nn.Conv2d(in_channels, out_channels, kernel_size=[1, 2 * i + 3], padding=[0, i + 1]))
42 |             kernels.append(nn.Conv2d(in_channels, out_channels, kernel_size=[2 * i + 3, 1], padding=[i + 1, 0]))
43 |         kernels.append(nn.Conv2d(in_channels, out_channels, kernel_size=1))
44 |         self.kernels = nn.ModuleList(kernels)
45 |         if init_weight:
46 |             self._initialize_weights()
47 | 
48 |     def _initialize_weights(self):
49 |         for m in self.modules():
50 |             if isinstance(m, nn.Conv2d):
51 |                 nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
52 |                 if m.bias is not None:
53 |                     nn.init.constant_(m.bias, 0)
54 | 
55 |     def forward(self, x):
56 |         res_list = []
57 |         for i in range(len(self.kernels)):  # 2 * (num_kernels // 2) + 1 branches; safe for odd num_kernels
58 |             res_list.append(self.kernels[i](x))
59 |         res = torch.stack(res_list, dim=-1).mean(-1)
60 |         return res
61 | 
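Both Inception blocks stack their branch outputs and average them, which only works if every branch preserves the input's height and width. The sketch below checks that with the standard convolution output-size formula (stride 1, dilation 1), using plain Python instead of `torch`. The helper names `conv_out_len`, `v1_branches`, and `v2_branches` are illustrative and do not exist in the repository.

```python
def conv_out_len(n, kernel, padding, stride=1):
    """Output length of a convolution along one axis (dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1


def v1_branches(num_kernels):
    # Square kernels of size 2i + 1 with padding i, as in Inception_Block_V1.
    return [(2 * i + 1, i) for i in range(num_kernels)]


def v2_branches(num_kernels):
    # Pairs of 1 x k and k x 1 kernels plus one 1 x 1 kernel, as in Inception_Block_V2.
    branches = []
    for i in range(num_kernels // 2):
        branches.append(((1, 2 * i + 3), (0, i + 1)))
        branches.append(((2 * i + 3, 1), (i + 1, 0)))
    branches.append(((1, 1), (0, 0)))
    return branches


def preserves_shape(h, w, num_kernels=6):
    """True if every branch of both blocks maps an h x w input to h x w."""
    v1_ok = all(conv_out_len(h, k, p) == h and conv_out_len(w, k, p) == w
                for k, p in v1_branches(num_kernels))
    v2_ok = all(conv_out_len(h, kh, ph) == h and conv_out_len(w, kw, pw) == w
                for (kh, kw), (ph, pw) in v2_branches(num_kernels))
    return v1_ok and v2_ok


print(preserves_shape(24, 7))
print(len(v1_branches(6)), len(v2_branches(6)), len(v2_branches(5)))
```

Note that the V2 block registers `2 * (num_kernels // 2) + 1` branches, which equals `num_kernels + 1` only when `num_kernels` is even.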


--------------------------------------------------------------------------------
/SimMTM_Forecasting/layers/Embed.py:
--------------------------------------------------------------------------------
  1 | import torch
  2 | import torch.nn as nn
  3 | import math
  4 | 
  5 | 
  6 | class PositionalEmbedding(nn.Module):
  7 |     def __init__(self, d_model, max_len=5000):
  8 |         super(PositionalEmbedding, self).__init__()
  9 |         # Compute the positional encodings once in log space.
 10 |         pe = torch.zeros(max_len, d_model).float()
 11 |         pe.requires_grad = False
 12 | 
 13 |         position = torch.arange(0, max_len).float().unsqueeze(1)
 14 |         div_term = (torch.arange(0, d_model, 2).float()
 15 |                     * -(math.log(10000.0) / d_model)).exp()
 16 | 
 17 |         pe[:, 0::2] = torch.sin(position * div_term)
 18 |         pe[:, 1::2] = torch.cos(position * div_term)
 19 | 
 20 |         pe = pe.unsqueeze(0)
 21 |         self.register_buffer('pe', pe)
 22 | 
 23 |     def forward(self, x):
 24 |         return self.pe[:, :x.size(1)]
 25 | 
 26 | 
 27 | class TokenEmbedding(nn.Module):
 28 |     def __init__(self, c_in, d_model):
 29 |         super(TokenEmbedding, self).__init__()
 30 |         padding = 1 if tuple(int(v) for v in torch.__version__.split('.')[:2]) >= (1, 5) else 2  # compare numerically: '1.10.0' < '1.5.0' as plain strings
 31 |         self.tokenConv = nn.Conv1d(in_channels=c_in, out_channels=d_model,
 32 |                                    kernel_size=3, padding=padding, padding_mode='circular', bias=False)
 33 |         for m in self.modules():
 34 |             if isinstance(m, nn.Conv1d):
 35 |                 nn.init.kaiming_normal_(
 36 |                     m.weight, mode='fan_in', nonlinearity='leaky_relu')
 37 | 
 38 |     def forward(self, x):
 39 |         x = self.tokenConv(x.permute(0, 2, 1)).transpose(1, 2)
 40 |         return x
 41 | 
 42 | 
 43 | class FixedEmbedding(nn.Module):
 44 |     def __init__(self, c_in, d_model):
 45 |         super(FixedEmbedding, self).__init__()
 46 | 
 47 |         w = torch.zeros(c_in, d_model).float()
 48 |         w.requires_grad = False
 49 | 
 50 |         position = torch.arange(0, c_in).float().unsqueeze(1)
 51 |         div_term = (torch.arange(0, d_model, 2).float()
 52 |                     * -(math.log(10000.0) / d_model)).exp()
 53 | 
 54 |         w[:, 0::2] = torch.sin(position * div_term)
 55 |         w[:, 1::2] = torch.cos(position * div_term)
 56 | 
 57 |         self.emb = nn.Embedding(c_in, d_model)
 58 |         self.emb.weight = nn.Parameter(w, requires_grad=False)
 59 | 
 60 |     def forward(self, x):
 61 |         return self.emb(x).detach()
 62 | 
 63 | 
 64 | class TemporalEmbedding(nn.Module):
 65 |     def __init__(self, d_model, embed_type='fixed', freq='h'):
 66 |         super(TemporalEmbedding, self).__init__()
 67 | 
 68 |         minute_size = 4
 69 |         hour_size = 24
 70 |         weekday_size = 7
 71 |         day_size = 32
 72 |         month_size = 13
 73 | 
 74 |         Embed = FixedEmbedding if embed_type == 'fixed' else nn.Embedding
 75 |         if freq == 't':
 76 |             self.minute_embed = Embed(minute_size, d_model)
 77 |         self.hour_embed = Embed(hour_size, d_model)
 78 |         self.weekday_embed = Embed(weekday_size, d_model)
 79 |         self.day_embed = Embed(day_size, d_model)
 80 |         self.month_embed = Embed(month_size, d_model)
 81 | 
 82 |     def forward(self, x):
 83 | 
 84 |         x = x.long()
 85 |         minute_x = self.minute_embed(x[:, :, 4]) if hasattr(self, 'minute_embed') else 0.
 86 |         hour_x = self.hour_embed(x[:, :, 3])
 87 |         weekday_x = self.weekday_embed(x[:, :, 2])
 88 |         day_x = self.day_embed(x[:, :, 1])
 89 |         month_x = self.month_embed(x[:, :, 0])
 90 | 
 91 |         return hour_x + weekday_x + day_x + month_x + minute_x
 92 | 
 93 | 
 94 | class TimeFeatureEmbedding(nn.Module):
 95 |     def __init__(self, d_model, embed_type='timeF', freq='h'):
 96 |         super(TimeFeatureEmbedding, self).__init__()
 97 | 
 98 |         freq_map = {'h': 4, 't': 5, 's': 6,
 99 |                     'm': 1, 'a': 1, 'w': 2, 'd': 3, 'b': 3}
100 |         d_inp = freq_map[freq]
101 |         self.embed = nn.Linear(d_inp, d_model, bias=False)
102 | 
103 |     def forward(self, x):
104 |         return self.embed(x)
105 | 
106 | 
107 | class DataEmbedding(nn.Module):
108 |     def __init__(self, c_in, d_model, embed_type='fixed', freq='h', dropout=0.1):
109 |         super(DataEmbedding, self).__init__()
110 | 
111 |         self.value_embedding = TokenEmbedding(c_in=c_in, d_model=d_model)
112 |         self.position_embedding = PositionalEmbedding(d_model=d_model)
113 |         self.temporal_embedding = TemporalEmbedding(d_model=d_model, embed_type=embed_type,
114 |                                                     freq=freq) if embed_type != 'timeF' else TimeFeatureEmbedding(
115 |             d_model=d_model, embed_type=embed_type, freq=freq)
116 |         self.dropout = nn.Dropout(p=dropout)
117 | 
118 |     def forward(self, x, x_mark=None):
119 | 
120 |         if x_mark is None:
121 |             x = self.value_embedding(x) + self.position_embedding(x)
122 |         else:
123 |             x = self.value_embedding(x) + self.temporal_embedding(x_mark) + self.position_embedding(x)
124 |         return self.dropout(x)
125 | 
126 | 
127 | class DataEmbedding_wo_pos(nn.Module):
128 |     def __init__(self, c_in, d_model, embed_type='fixed', freq='h', dropout=0.1):
129 |         super(DataEmbedding_wo_pos, self).__init__()
130 | 
131 |         self.value_embedding = TokenEmbedding(c_in=c_in, d_model=d_model)
132 | 
133 |         self.position_embedding = PositionalEmbedding(d_model=d_model)
134 |         self.temporal_embedding = TemporalEmbedding(d_model=d_model, embed_type=embed_type,
135 |                                                     freq=freq) if embed_type != 'timeF' else TimeFeatureEmbedding(
136 |             d_model=d_model, embed_type=embed_type, freq=freq)
137 |         self.dropout = nn.Dropout(p=dropout)
138 | 
139 |     def forward(self, x, x_mark):
140 |         if x_mark is None:
141 |             x = self.value_embedding(x)
142 |         else:
143 |             x = self.value_embedding(x) + self.temporal_embedding(x_mark)
144 |         return self.dropout(x)
145 | 
146 | 
147 | class PatchEmbedding(nn.Module):
148 |     def __init__(self, d_model, patch_len, stride, dropout):
149 |         super(PatchEmbedding, self).__init__()
150 |         # Patching
151 |         self.patch_len = patch_len
152 |         self.stride = stride
153 |         self.padding_patch_layer = nn.ReplicationPad1d((0, stride))
154 | 
155 |         # Backbone, Input encoding: projection of feature vectors onto a d-dim vector space
156 |         self.value_embedding = TokenEmbedding(patch_len, d_model)
157 | 
158 |         # Positional embedding
159 |         self.position_embedding = PositionalEmbedding(d_model)
160 | 
161 |         # Residual dropout
162 |         self.dropout = nn.Dropout(dropout)
163 | 
164 |     def forward(self, x):
165 |         # do patching
166 |         n_vars = x.shape[1]
167 |         x = self.padding_patch_layer(x)
168 |         x = x.unfold(dimension=-1, size=self.patch_len, step=self.stride)
169 |         x = torch.reshape(x, (x.shape[0] * x.shape[1], x.shape[2], x.shape[3])) # channel independent
170 | 
171 |         # Input encoding
172 |         x = self.value_embedding(x) + self.position_embedding(x)
173 |         return self.dropout(x), n_vars
174 | 
175 | 
176 | class PatchEmbedding_wo_channel_independent(nn.Module):
177 |     def __init__(self, n_vars, d_model, patch_len, stride, dropout):
178 |         super(PatchEmbedding_wo_channel_independent, self).__init__()
179 |         # Patching
180 |         self.n_vars = n_vars
181 |         self.patch_len = patch_len
182 |         self.stride = stride
183 |         self.padding_patch_layer = nn.ReplicationPad1d((0, stride))
184 | 
185 |         # Backbone, Input encoding: projection of feature vectors onto a d-dim vector space
186 |         self.value_embedding = TokenEmbedding(patch_len*n_vars, d_model)
187 | 
188 |         # Positional embedding
189 |         self.position_embedding = PositionalEmbedding(d_model)
190 | 
191 |         # Residual dropout
192 |         self.dropout = nn.Dropout(dropout)
193 | 
194 |     def forward(self, x):
195 |         # do patching
196 |         n_vars = x.shape[1]
197 |         x = self.padding_patch_layer(x)
198 |         x = x.unfold(dimension=-1, size=self.patch_len, step=self.stride)
199 | 
200 |         x = torch.reshape(x, (x.shape[0], x.shape[2], x.shape[1]*x.shape[3]))
201 | 
202 |         # Input encoding
203 |         x = self.value_embedding(x) + self.position_embedding(x)
204 |         return self.dropout(x), n_vars
205 | 
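The patching step in `PatchEmbedding.forward` pads the end of each series by `stride` values (replicating the last value) and then slides a window of `patch_len` with step `stride`. A minimal pure-Python sketch of that arithmetic follows. The helpers `replication_pad_right`, `unfold`, and `make_patches` are hypothetical stand-ins for `nn.ReplicationPad1d((0, stride))` and `Tensor.unfold`, operating on a single univariate list rather than a batched tensor.

```python
def replication_pad_right(series, pad):
    """Repeat the last value `pad` times, like ReplicationPad1d((0, pad))."""
    return series + [series[-1]] * pad


def unfold(series, size, step):
    """Sliding windows of length `size` taken every `step` positions."""
    n_windows = (len(series) - size) // step + 1
    return [series[i * step: i * step + size] for i in range(n_windows)]


def make_patches(series, patch_len, stride):
    """Pad on the right by `stride`, then cut into overlapping patches."""
    return unfold(replication_pad_right(series, stride), patch_len, stride)


series = list(range(96))            # a toy series of length 96
patches = make_patches(series, patch_len=16, stride=8)
print(len(patches))                 # number of patches
print(patches[-1])                  # last patch ends with replicated values
```

With a length-96 series, `patch_len=16`, and `stride=8`, the padded length is 104, so the number of patches is `(104 - 16) // 8 + 1 = 12`. The second half of the last patch consists only of the replicated final value.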


--------------------------------------------------------------------------------
/SimMTM_Forecasting/layers/FourierCorrelation.py:
--------------------------------------------------------------------------------
  1 | # coding=utf-8
  2 | # author=maziqing
  3 | # email=maziqing.mzq@alibaba-inc.com
  4 | 
  5 | import numpy as np
  6 | import torch
  7 | import torch.nn as nn
  8 | 
  9 | 
 10 | def get_frequency_modes(seq_len, modes=64, mode_select_method='random'):
 11 |     """
 12 |     Select frequency-domain mode indices:
 13 |     'random' samples modes at random;
 14 |     any other value keeps the lowest modes.
 15 |     """
 16 |     modes = min(modes, seq_len // 2)
 17 |     if mode_select_method == 'random':
 18 |         index = list(range(0, seq_len // 2))
 19 |         np.random.shuffle(index)
 20 |         index = index[:modes]
 21 |     else:
 22 |         index = list(range(0, modes))
 23 |     index.sort()
 24 |     return index
 25 | 
 26 | 
 27 | # ########## fourier layer #############
 28 | class FourierBlock(nn.Module):
 29 |     def __init__(self, in_channels, out_channels, seq_len, modes=0, mode_select_method='random'):
 30 |         """
 31 |         1D Fourier block. It performs representation learning in the frequency domain:
 32 |         FFT, linear transform, then inverse FFT.
 33 |         """
 34 |         super(FourierBlock, self).__init__()
 35 |         print('fourier enhanced block used!')
 36 |         # get modes on frequency domain
 37 |         self.index = get_frequency_modes(seq_len, modes=modes, mode_select_method=mode_select_method)
 38 |         print('modes={}, index={}'.format(modes, self.index))
 39 | 
 40 |         self.scale = (1 / (in_channels * out_channels))
 41 |         self.weights1 = nn.Parameter(
 42 |             self.scale * torch.rand(8, in_channels // 8, out_channels // 8, len(self.index), dtype=torch.float))
 43 |         self.weights2 = nn.Parameter(
 44 |             self.scale * torch.rand(8, in_channels // 8, out_channels // 8, len(self.index), dtype=torch.float))
 45 | 
 46 |     # Complex multiplication
 47 |     def compl_mul1d(self, order, x, weights):
 48 |         x_flag = True
 49 |         w_flag = True
 50 |         if not torch.is_complex(x):
 51 |             x_flag = False
 52 |             x = torch.complex(x, torch.zeros_like(x).to(x.device))
 53 |         if not torch.is_complex(weights):
 54 |             w_flag = False
 55 |             weights = torch.complex(weights, torch.zeros_like(weights).to(weights.device))
 56 |         if x_flag or w_flag:
 57 |             return torch.complex(torch.einsum(order, x.real, weights.real) - torch.einsum(order, x.imag, weights.imag),
 58 |                                  torch.einsum(order, x.real, weights.imag) + torch.einsum(order, x.imag, weights.real))
 59 |         else:
 60 |             return torch.einsum(order, x.real, weights.real)
 61 | 
 62 |     def forward(self, q, k, v, mask):
 63 |         # size = [B, L, H, E]
 64 |         B, L, H, E = q.shape
 65 |         x = q.permute(0, 2, 3, 1)
 66 |         # Compute Fourier coefficients
 67 |         x_ft = torch.fft.rfft(x, dim=-1)
 68 |         # Perform Fourier neural operations
 69 |         out_ft = torch.zeros(B, H, E, L // 2 + 1, device=x.device, dtype=torch.cfloat)
 70 |         for wi, i in enumerate(self.index):
 71 |             if i >= x_ft.shape[3] or wi >= out_ft.shape[3]:
 72 |                 continue
 73 |             out_ft[:, :, :, wi] = self.compl_mul1d("bhi,hio->bho", x_ft[:, :, :, i],
 74 |                                                    torch.complex(self.weights1, self.weights2)[:, :, :, wi])
 75 |         # Return to time domain
 76 |         x = torch.fft.irfft(out_ft, n=x.size(-1))
 77 |         return (x, None)
 78 | 
 79 | 
 80 | # ########## Fourier Cross Former ####################
 81 | class FourierCrossAttention(nn.Module):
 82 |     def __init__(self, in_channels, out_channels, seq_len_q, seq_len_kv, modes=64, mode_select_method='random',
 83 |                  activation='tanh', policy=0):
 84 |         """
 85 |         1D Fourier cross-attention layer: FFT, attention in the frequency domain, linear transform, then inverse FFT.
 86 |         """
 87 |         super(FourierCrossAttention, self).__init__()
 88 |         print(' fourier enhanced cross attention used!')
 89 |         self.activation = activation
 90 |         self.in_channels = in_channels
 91 |         self.out_channels = out_channels
 92 |         # get modes for queries and keys (& values) on frequency domain
 93 |         self.index_q = get_frequency_modes(seq_len_q, modes=modes, mode_select_method=mode_select_method)
 94 |         self.index_kv = get_frequency_modes(seq_len_kv, modes=modes, mode_select_method=mode_select_method)
 95 | 
 96 |         print('modes_q={}, index_q={}'.format(len(self.index_q), self.index_q))
 97 |         print('modes_kv={}, index_kv={}'.format(len(self.index_kv), self.index_kv))
 98 | 
 99 |         self.scale = (1 / (in_channels * out_channels))
100 |         self.weights1 = nn.Parameter(
101 |             self.scale * torch.rand(8, in_channels // 8, out_channels // 8, len(self.index_q), dtype=torch.float))
102 |         self.weights2 = nn.Parameter(
103 |             self.scale * torch.rand(8, in_channels // 8, out_channels // 8, len(self.index_q), dtype=torch.float))
104 | 
105 |     # Complex multiplication
106 |     def compl_mul1d(self, order, x, weights):
107 |         x_flag = True
108 |         w_flag = True
109 |         if not torch.is_complex(x):
110 |             x_flag = False
111 |             x = torch.complex(x, torch.zeros_like(x).to(x.device))
112 |         if not torch.is_complex(weights):
113 |             w_flag = False
114 |             weights = torch.complex(weights, torch.zeros_like(weights).to(weights.device))
115 |         if x_flag or w_flag:
116 |             return torch.complex(torch.einsum(order, x.real, weights.real) - torch.einsum(order, x.imag, weights.imag),
117 |                                  torch.einsum(order, x.real, weights.imag) + torch.einsum(order, x.imag, weights.real))
118 |         else:
119 |             return torch.einsum(order, x.real, weights.real)
120 | 
121 |     def forward(self, q, k, v, mask):
122 |         # size = [B, L, H, E]
123 |         B, L, H, E = q.shape
124 |         xq = q.permute(0, 2, 3, 1)  # size = [B, H, E, L]
125 |         xk = k.permute(0, 2, 3, 1)
126 |         xv = v.permute(0, 2, 3, 1)
127 | 
128 |         # Compute Fourier coefficients
129 |         xq_ft_ = torch.zeros(B, H, E, len(self.index_q), device=xq.device, dtype=torch.cfloat)
130 |         xq_ft = torch.fft.rfft(xq, dim=-1)
131 |         for i, j in enumerate(self.index_q):
132 |             if j >= xq_ft.shape[3]:
133 |                 continue
134 |             xq_ft_[:, :, :, i] = xq_ft[:, :, :, j]
135 |         xk_ft_ = torch.zeros(B, H, E, len(self.index_kv), device=xq.device, dtype=torch.cfloat)
136 |         xk_ft = torch.fft.rfft(xk, dim=-1)
137 |         for i, j in enumerate(self.index_kv):
138 |             if j >= xk_ft.shape[3]:
139 |                 continue
140 |             xk_ft_[:, :, :, i] = xk_ft[:, :, :, j]
141 | 
142 |         # perform attention mechanism on frequency domain
143 |         xqk_ft = (self.compl_mul1d("bhex,bhey->bhxy", xq_ft_, xk_ft_))
144 |         if self.activation == 'tanh':
145 |             xqk_ft = torch.complex(xqk_ft.real.tanh(), xqk_ft.imag.tanh())
146 |         elif self.activation == 'softmax':
147 |             xqk_ft = torch.softmax(abs(xqk_ft), dim=-1)
148 |             xqk_ft = torch.complex(xqk_ft, torch.zeros_like(xqk_ft))
149 |         else:
150 |             raise Exception('{} activation function is not implemented'.format(self.activation))
151 |         xqkv_ft = self.compl_mul1d("bhxy,bhey->bhex", xqk_ft, xk_ft_)
152 |         xqkvw = self.compl_mul1d("bhex,heox->bhox", xqkv_ft, torch.complex(self.weights1, self.weights2))
153 |         out_ft = torch.zeros(B, H, E, L // 2 + 1, device=xq.device, dtype=torch.cfloat)
154 |         for i, j in enumerate(self.index_q):
155 |             if i >= xqkvw.shape[3] or j >= out_ft.shape[3]:
156 |                 continue
157 |             out_ft[:, :, :, j] = xqkvw[:, :, :, i]
158 |         # Return to time domain
159 |         out = torch.fft.irfft(out_ft / self.in_channels / self.out_channels, n=xq.size(-1))
160 |         return (out, None)
161 | 
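The mode selection in `get_frequency_modes` can be illustrated without NumPy. The sketch below is an assumption-labelled rewrite, not the repository's function: it swaps `np.random.shuffle` for the standard library's `random.Random.sample`, and the name `select_modes` and its `rng` parameter are hypothetical additions so the choice can be seeded.

```python
import random


def select_modes(seq_len, modes=64, method="random", rng=None):
    """Pick sorted frequency indices from the first seq_len // 2 rFFT bins."""
    rng = rng or random.Random()
    modes = min(modes, seq_len // 2)        # cannot keep more modes than bins
    if method == "random":
        index = rng.sample(range(seq_len // 2), modes)
    else:
        index = list(range(modes))          # keep the lowest frequencies
    return sorted(index)


low = select_modes(96, modes=4, method="low")
rand = select_modes(96, modes=64, rng=random.Random(0))
print(low)
print(len(rand))
```

A real FFT of length `seq_len` yields `seq_len // 2 + 1` bins, so for even `seq_len` sampling from `range(seq_len // 2)` never selects the highest (Nyquist) bin. Requesting more modes than are available clamps the count, so `modes=64` on a length-96 sequence keeps all 48 candidate indices.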


--------------------------------------------------------------------------------
/SimMTM_Forecasting/layers/Pyraformer_EncDec.py:
--------------------------------------------------------------------------------
  1 | import torch
  2 | import torch.nn as nn
  3 | import torch.nn.functional as F
  4 | from torch.nn.modules.linear import Linear
  5 | from layers.SelfAttention_Family import AttentionLayer, FullAttention
  6 | from layers.Embed import DataEmbedding
  7 | import math
  8 | 
  9 | 
 10 | def get_mask(input_size, window_size, inner_size):
 11 |     """Get the attention mask of PAM-Naive"""
 12 |     # Get the size of all layers
 13 |     all_size = []
 14 |     all_size.append(input_size)
 15 |     for i in range(len(window_size)):
 16 |         layer_size = math.floor(all_size[i] / window_size[i])
 17 |         all_size.append(layer_size)
 18 | 
 19 |     seq_length = sum(all_size)
 20 |     mask = torch.zeros(seq_length, seq_length)
 21 | 
 22 |     # get intra-scale mask
 23 |     inner_window = inner_size // 2
 24 |     for layer_idx in range(len(all_size)):
 25 |         start = sum(all_size[:layer_idx])
 26 |         for i in range(start, start + all_size[layer_idx]):
 27 |             left_side = max(i - inner_window, start)
 28 |             right_side = min(i + inner_window + 1, start + all_size[layer_idx])
 29 |             mask[i, left_side:right_side] = 1
 30 | 
 31 |     # get inter-scale mask
 32 |     for layer_idx in range(1, len(all_size)):
 33 |         start = sum(all_size[:layer_idx])
 34 |         for i in range(start, start + all_size[layer_idx]):
 35 |             left_side = (start - all_size[layer_idx - 1]) + \
 36 |                 (i - start) * window_size[layer_idx - 1]
 37 |             if i == (start + all_size[layer_idx] - 1):
 38 |                 right_side = start
 39 |             else:
 40 |                 right_side = (
 41 |                     start - all_size[layer_idx - 1]) + (i - start + 1) * window_size[layer_idx - 1]
 42 |             mask[i, left_side:right_side] = 1
 43 |             mask[left_side:right_side, i] = 1
 44 | 
 45 |     mask = (1 - mask).bool()
 46 | 
 47 |     return mask, all_size
 48 | 
 49 | 
 50 | def refer_points(all_sizes, window_size):
 51 |     """Gather features from PAM's pyramid sequences"""
 52 |     input_size = all_sizes[0]
 53 |     indexes = torch.zeros(input_size, len(all_sizes))
 54 | 
 55 |     for i in range(input_size):
 56 |         indexes[i][0] = i
 57 |         former_index = i
 58 |         for j in range(1, len(all_sizes)):
 59 |             start = sum(all_sizes[:j])
 60 |             inner_layer_idx = former_index - (start - all_sizes[j - 1])
 61 |             former_index = start + \
 62 |                 min(inner_layer_idx // window_size[j - 1], all_sizes[j] - 1)
 63 |             indexes[i][j] = former_index
 64 | 
 65 |     indexes = indexes.unsqueeze(0).unsqueeze(3)
 66 | 
 67 |     return indexes.long()
 68 | 
 69 | 
 70 | class RegularMask():
 71 |     def __init__(self, mask):
 72 |         self._mask = mask.unsqueeze(1)
 73 | 
 74 |     @property
 75 |     def mask(self):
 76 |         return self._mask
 77 | 
 78 | 
 79 | class EncoderLayer(nn.Module):
 80 |     """ Compose with two layers """
 81 | 
 82 |     def __init__(self, d_model, d_inner, n_head, dropout=0.1, normalize_before=True):
 83 |         super(EncoderLayer, self).__init__()
 84 | 
 85 |         self.slf_attn = AttentionLayer(
 86 |             FullAttention(mask_flag=True, factor=0,
 87 |                           attention_dropout=dropout, output_attention=False),
 88 |             d_model, n_head)
 89 |         self.pos_ffn = PositionwiseFeedForward(
 90 |             d_model, d_inner, dropout=dropout, normalize_before=normalize_before)
 91 | 
 92 |     def forward(self, enc_input, slf_attn_mask=None):
 93 |         attn_mask = RegularMask(slf_attn_mask)
 94 |         enc_output, _ = self.slf_attn(
 95 |             enc_input, enc_input, enc_input, attn_mask=attn_mask)
 96 |         enc_output = self.pos_ffn(enc_output)
 97 |         return enc_output
 98 | 
 99 | 
100 | class Encoder(nn.Module):
101 |     """ An encoder model with self attention mechanism. """
102 | 
103 |     def __init__(self, configs, window_size, inner_size):
104 |         super().__init__()
105 | 
106 |         d_bottleneck = configs.d_model//4
107 | 
108 |         self.mask, self.all_size = get_mask(
109 |             configs.seq_len, window_size, inner_size)
110 |         self.indexes = refer_points(self.all_size, window_size)
111 |         self.layers = nn.ModuleList([
112 |             EncoderLayer(configs.d_model, configs.d_ff, configs.n_heads, dropout=configs.dropout,
113 |                          normalize_before=False) for _ in range(configs.e_layers)
114 |         ])  # naive pyramid attention
115 | 
116 |         self.enc_embedding = DataEmbedding(
117 |             configs.enc_in, configs.d_model, dropout=configs.dropout)  # keyword: the third positional argument is embed_type
118 |         self.conv_layers = Bottleneck_Construct(
119 |             configs.d_model, window_size, d_bottleneck)
120 | 
121 |     def forward(self, x_enc, x_mark_enc):
122 |         seq_enc = self.enc_embedding(x_enc, x_mark_enc)
123 | 
124 |         mask = self.mask.repeat(len(seq_enc), 1, 1).to(x_enc.device)
125 |         seq_enc = self.conv_layers(seq_enc)
126 | 
127 |         for i in range(len(self.layers)):
128 |             seq_enc = self.layers[i](seq_enc, mask)
129 | 
130 |         indexes = self.indexes.repeat(seq_enc.size(
131 |             0), 1, 1, seq_enc.size(2)).to(seq_enc.device)
132 |         indexes = indexes.view(seq_enc.size(0), -1, seq_enc.size(2))
133 |         all_enc = torch.gather(seq_enc, 1, indexes)
134 |         seq_enc = all_enc.view(seq_enc.size(0), self.all_size[0], -1)
135 | 
136 |         return seq_enc
137 | 
138 | 
139 | class ConvLayer(nn.Module):
140 |     def __init__(self, c_in, window_size):
141 |         super(ConvLayer, self).__init__()
142 |         self.downConv = nn.Conv1d(in_channels=c_in,
143 |                                   out_channels=c_in,
144 |                                   kernel_size=window_size,
145 |                                   stride=window_size)
146 |         self.norm = nn.BatchNorm1d(c_in)
147 |         self.activation = nn.ELU()
148 | 
149 |     def forward(self, x):
150 |         x = self.downConv(x)
151 |         x = self.norm(x)
152 |         x = self.activation(x)
153 |         return x
154 | 
155 | 
156 | class Bottleneck_Construct(nn.Module):
157 |     """Bottleneck convolution CSCM"""
158 | 
159 |     def __init__(self, d_model, window_size, d_inner):
160 |         super(Bottleneck_Construct, self).__init__()
161 |         if not isinstance(window_size, list):
162 |             self.conv_layers = nn.ModuleList([
163 |                 ConvLayer(d_inner, window_size),
164 |                 ConvLayer(d_inner, window_size),
165 |                 ConvLayer(d_inner, window_size)
166 |             ])
167 |         else:
168 |             self.conv_layers = []
169 |             for i in range(len(window_size)):
170 |                 self.conv_layers.append(ConvLayer(d_inner, window_size[i]))
171 |             self.conv_layers = nn.ModuleList(self.conv_layers)
172 |         self.up = Linear(d_inner, d_model)
173 |         self.down = Linear(d_model, d_inner)
174 |         self.norm = nn.LayerNorm(d_model)
175 | 
176 |     def forward(self, enc_input):
177 |         temp_input = self.down(enc_input).permute(0, 2, 1)
178 |         all_inputs = []
179 |         for i in range(len(self.conv_layers)):
180 |             temp_input = self.conv_layers[i](temp_input)
181 |             all_inputs.append(temp_input)
182 | 
183 |         all_inputs = torch.cat(all_inputs, dim=2).transpose(1, 2)
184 |         all_inputs = self.up(all_inputs)
185 |         all_inputs = torch.cat([enc_input, all_inputs], dim=1)
186 | 
187 |         all_inputs = self.norm(all_inputs)
188 |         return all_inputs
189 | 
190 | 
191 | class PositionwiseFeedForward(nn.Module):
192 |     """ Two-layer position-wise feed-forward neural network. """
193 | 
194 |     def __init__(self, d_in, d_hid, dropout=0.1, normalize_before=True):
195 |         super().__init__()
196 | 
197 |         self.normalize_before = normalize_before
198 | 
199 |         self.w_1 = nn.Linear(d_in, d_hid)
200 |         self.w_2 = nn.Linear(d_hid, d_in)
201 | 
202 |         self.layer_norm = nn.LayerNorm(d_in, eps=1e-6)
203 |         self.dropout = nn.Dropout(dropout)
204 | 
205 |     def forward(self, x):
206 |         residual = x
207 |         if self.normalize_before:
208 |             x = self.layer_norm(x)
209 | 
210 |         x = F.gelu(self.w_1(x))
211 |         x = self.dropout(x)
212 |         x = self.w_2(x)
213 |         x = self.dropout(x)
214 |         x = x + residual
215 | 
216 |         if not self.normalize_before:
217 |             x = self.layer_norm(x)
218 |         return x
219 | 
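The index arithmetic in `get_mask` and `refer_points` is easier to follow on a small example. The sketch below reproduces, in plain Python, how the pyramid's per-scale sizes are derived and how each input position is mapped to its ancestor node at every coarser scale inside the concatenated sequence. The function names `pyramid_sizes` and `ancestor_chain` are illustrative and not part of the repository.

```python
import math


def pyramid_sizes(input_size, window_size):
    """Number of nodes at each scale; each scale shrinks by its window."""
    sizes = [input_size]
    for w in window_size:
        sizes.append(math.floor(sizes[-1] / w))
    return sizes


def ancestor_chain(i, sizes, window_size):
    """Indices (in the concatenated sequence) of node i and its ancestors."""
    chain = [i]
    former = i
    for j in range(1, len(sizes)):
        start = sum(sizes[:j])                      # offset of scale j
        inner = former - (start - sizes[j - 1])     # position within scale j - 1
        former = start + min(inner // window_size[j - 1], sizes[j] - 1)
        chain.append(former)
    return chain


windows = [4, 4, 4]
sizes = pyramid_sizes(96, windows)
print(sizes, sum(sizes))
print(ancestor_chain(0, sizes, windows))
print(ancestor_chain(95, sizes, windows))
```

For an input of length 96 and three windows of 4, the scales hold 96, 24, 6, and 1 nodes, so the attention mask covers a concatenated sequence of 127 positions. The `min(..., sizes[j] - 1)` clamp assigns leftover nodes to the last parent when a scale's size is not divisible by its window.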


--------------------------------------------------------------------------------
/SimMTM_Forecasting/layers/SelfAttention_Family.py:
--------------------------------------------------------------------------------
  1 | import torch
  2 | import torch.nn as nn
  3 | import numpy as np
  4 | from math import sqrt
  5 | from utils.masking import TriangularCausalMask, ProbMask
  6 | from reformer_pytorch import LSHSelfAttention
  7 | 
  8 | 
  9 | class DSAttention(nn.Module):
 10 |     '''De-stationary Attention'''
 11 | 
 12 |     def __init__(self, mask_flag=True, factor=5, scale=None, attention_dropout=0.1, output_attention=False):
 13 |         super(DSAttention, self).__init__()
 14 |         self.scale = scale
 15 |         self.mask_flag = mask_flag
 16 |         self.output_attention = output_attention
 17 |         self.dropout = nn.Dropout(attention_dropout)
 18 | 
 19 |     def forward(self, queries, keys, values, attn_mask, tau=None, delta=None):
 20 |         B, L, H, E = queries.shape
 21 |         _, S, _, D = values.shape
 22 |         scale = self.scale or 1. / sqrt(E)
 23 | 
 24 |         tau = 1.0 if tau is None else tau.unsqueeze(
 25 |             1).unsqueeze(1)  # B x 1 x 1 x 1
 26 |         delta = 0.0 if delta is None else delta.unsqueeze(
 27 |             1).unsqueeze(1)  # B x 1 x 1 x S
 28 | 
 29 |         # De-stationary Attention, rescaling pre-softmax score with learned de-stationary factors
 30 |         scores = torch.einsum("blhe,bshe->bhls", queries, keys) * tau + delta
 31 | 
 32 |         if self.mask_flag:
 33 |             if attn_mask is None:
 34 |                 attn_mask = TriangularCausalMask(B, L, device=queries.device)
 35 | 
 36 |             scores.masked_fill_(attn_mask.mask, -np.inf)
 37 | 
 38 |         A = self.dropout(torch.softmax(scale * scores, dim=-1))
 39 |         V = torch.einsum("bhls,bshd->blhd", A, values)
 40 | 
 41 |         if self.output_attention:
 42 |             return (V.contiguous(), A)
 43 |         else:
 44 |             return (V.contiguous(), None)
 45 | 
 46 | 
 47 | class FullAttention(nn.Module):
 48 |     def __init__(self, mask_flag=True, factor=5, scale=None, attention_dropout=0.1, output_attention=False):
 49 |         super(FullAttention, self).__init__()
 50 |         self.scale = scale
 51 |         self.mask_flag = mask_flag
 52 |         self.output_attention = output_attention
 53 |         self.dropout = nn.Dropout(attention_dropout)
 54 | 
 55 |     def forward(self, queries, keys, values, attn_mask, tau=None, delta=None):
 56 |         B, L, H, E = queries.shape
 57 |         _, S, _, D = values.shape
 58 |         scale = self.scale or 1. / sqrt(E)
 59 | 
 60 |         scores = torch.einsum("blhe,bshe->bhls", queries, keys)
 61 | 
 62 |         if self.mask_flag:
 63 |             if attn_mask is None:
 64 |                 attn_mask = TriangularCausalMask(B, L, device=queries.device)
 65 | 
 66 |             scores.masked_fill_(attn_mask.mask, -np.inf)
 67 | 
 68 |         A = self.dropout(torch.softmax(scale * scores, dim=-1))
 69 |         V = torch.einsum("bhls,bshd->blhd", A, values)
 70 | 
 71 |         if self.output_attention:
 72 |             return (V.contiguous(), A)
 73 |         else:
 74 |             return (V.contiguous(), None)
 75 | 
 76 | 
 77 | class ProbAttention(nn.Module):
 78 |     def __init__(self, mask_flag=True, factor=5, scale=None, attention_dropout=0.1, output_attention=False):
 79 |         super(ProbAttention, self).__init__()
 80 |         self.factor = factor
 81 |         self.scale = scale
 82 |         self.mask_flag = mask_flag
 83 |         self.output_attention = output_attention
 84 |         self.dropout = nn.Dropout(attention_dropout)
 85 | 
 86 |     def _prob_QK(self, Q, K, sample_k, n_top):  # n_top: c*ln(L_q)
 87 |         # Q [B, H, L, D]
 88 |         B, H, L_K, E = K.shape
 89 |         _, _, L_Q, _ = Q.shape
 90 | 
 91 |         # calculate the sampled Q_K
 92 |         K_expand = K.unsqueeze(-3).expand(B, H, L_Q, L_K, E)
 93 |         # real U = U_part(factor*ln(L_k))*L_q
 94 |         index_sample = torch.randint(L_K, (L_Q, sample_k))
 95 |         K_sample = K_expand[:, :, torch.arange(
 96 |             L_Q).unsqueeze(1), index_sample, :]
 97 |         Q_K_sample = torch.matmul(
 98 |             Q.unsqueeze(-2), K_sample.transpose(-2, -1)).squeeze(-2)  # squeeze only the singleton dim so B == 1 or H == 1 is preserved
 99 | 
100 |         # find the Top_k query with sparsity measurement
101 |         M = Q_K_sample.max(-1)[0] - torch.div(Q_K_sample.sum(-1), L_K)
102 |         M_top = M.topk(n_top, sorted=False)[1]
103 | 
104 |         # use the reduced Q to calculate Q_K
105 |         Q_reduce = Q[torch.arange(B)[:, None, None],
106 |                    torch.arange(H)[None, :, None],
107 |                    M_top, :]  # factor*ln(L_q)
108 |         Q_K = torch.matmul(Q_reduce, K.transpose(-2, -1))  # factor*ln(L_q)*L_k
109 | 
110 |         return Q_K, M_top
111 | 
112 |     def _get_initial_context(self, V, L_Q):
113 |         B, H, L_V, D = V.shape
114 |         if not self.mask_flag:
115 |             # V_sum = V.sum(dim=-2)
116 |             V_sum = V.mean(dim=-2)
117 |             contex = V_sum.unsqueeze(-2).expand(B, H,
118 |                                                 L_Q, V_sum.shape[-1]).clone()
119 |         else:  # use mask
120 |             # requires that L_Q == L_V, i.e. for self-attention only
121 |             assert (L_Q == L_V)
122 |             contex = V.cumsum(dim=-2)
123 |         return contex
124 | 
125 |     def _update_context(self, context_in, V, scores, index, L_Q, attn_mask):
126 |         B, H, L_V, D = V.shape
127 | 
128 |         if self.mask_flag:
129 |             attn_mask = ProbMask(B, H, L_Q, index, scores, device=V.device)
130 |             scores.masked_fill_(attn_mask.mask, -np.inf)
131 | 
132 |         attn = torch.softmax(scores, dim=-1)  # nn.Softmax(dim=-1)(scores)
133 | 
134 |         context_in[torch.arange(B)[:, None, None],
135 |         torch.arange(H)[None, :, None],
136 |         index, :] = torch.matmul(attn, V).type_as(context_in)
137 |         if self.output_attention:
138 |             attns = (torch.ones([B, H, L_V, L_V]) /
139 |                      L_V).type_as(attn).to(attn.device)
140 |             attns[torch.arange(B)[:, None, None], torch.arange(H)[
141 |                                                   None, :, None], index, :] = attn
142 |             return (context_in, attns)
143 |         else:
144 |             return (context_in, None)
145 | 
146 |     def forward(self, queries, keys, values, attn_mask, tau=None, delta=None):
147 |         B, L_Q, H, D = queries.shape
148 |         _, L_K, _, _ = keys.shape
149 | 
150 |         queries = queries.transpose(2, 1)
151 |         keys = keys.transpose(2, 1)
152 |         values = values.transpose(2, 1)
153 | 
154 |         U_part = self.factor * \
155 |                  np.ceil(np.log(L_K)).astype('int').item()  # c*ln(L_k)
156 |         u = self.factor * \
157 |             np.ceil(np.log(L_Q)).astype('int').item()  # c*ln(L_q)
158 | 
159 |         U_part = U_part if U_part < L_K else L_K
160 |         u = u if u < L_Q else L_Q
161 | 
162 |         scores_top, index = self._prob_QK(
163 |             queries, keys, sample_k=U_part, n_top=u)
164 | 
165 |         # add scale factor
166 |         scale = self.scale or 1. / sqrt(D)
167 |         if scale is not None:
168 |             scores_top = scores_top * scale
169 |         # get the context
170 |         context = self._get_initial_context(values, L_Q)
171 |         # update the context with selected top_k queries
172 |         context, attn = self._update_context(
173 |             context, values, scores_top, index, L_Q, attn_mask)
174 | 
175 |         return context.contiguous(), attn
176 | 
177 | 
178 | class AttentionLayer(nn.Module):
179 |     def __init__(self, attention, d_model, n_heads, d_keys=None,
180 |                  d_values=None):
181 |         super(AttentionLayer, self).__init__()
182 | 
183 |         d_keys = d_keys or (d_model // n_heads)
184 |         d_values = d_values or (d_model // n_heads)
185 | 
186 |         self.inner_attention = attention
187 |         self.query_projection = nn.Linear(d_model, d_keys * n_heads)
188 |         self.key_projection = nn.Linear(d_model, d_keys * n_heads)
189 |         self.value_projection = nn.Linear(d_model, d_values * n_heads)
190 |         self.out_projection = nn.Linear(d_values * n_heads, d_model)
191 |         self.n_heads = n_heads
192 | 
193 |     def forward(self, queries, keys, values, attn_mask, tau=None, delta=None):
194 |         B, L, _ = queries.shape
195 |         _, S, _ = keys.shape
196 |         H = self.n_heads
197 | 
198 |         queries = self.query_projection(queries).view(B, L, H, -1)
199 |         keys = self.key_projection(keys).view(B, S, H, -1)
200 |         values = self.value_projection(values).view(B, S, H, -1)
201 | 
202 |         out, attn = self.inner_attention(
203 |             queries,
204 |             keys,
205 |             values,
206 |             attn_mask,
207 |             tau=tau,
208 |             delta=delta
209 |         )
210 |         out = out.view(B, L, -1)
211 | 
212 |         return self.out_projection(out), attn
213 | 
214 | 
215 | class ReformerLayer(nn.Module):
216 |     def __init__(self, attention, d_model, n_heads, d_keys=None,
217 |                  d_values=None, causal=False, bucket_size=4, n_hashes=4):
218 |         super().__init__()
219 |         self.bucket_size = bucket_size
220 |         self.attn = LSHSelfAttention(
221 |             dim=d_model,
222 |             heads=n_heads,
223 |             bucket_size=bucket_size,
224 |             n_hashes=n_hashes,
225 |             causal=causal
226 |         )
227 | 
228 |     def fit_length(self, queries):
229 |         # inside reformer: assert N % (bucket_size * 2) == 0
230 |         B, N, C = queries.shape
231 |         if N % (self.bucket_size * 2) == 0:
232 |             return queries
233 |         else:
234 |             # fill the time series
235 |             fill_len = (self.bucket_size * 2) - (N % (self.bucket_size * 2))
236 |             return torch.cat([queries, torch.zeros([B, fill_len, C]).to(queries.device)], dim=1)
237 | 
238 |     def forward(self, queries, keys, values, attn_mask, tau, delta):
239 |         # in Reformer: default queries=keys
240 |         B, N, C = queries.shape
241 |         queries = self.attn(self.fit_length(queries))[:, :N, :]
242 |         return queries, None
243 | 
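
Note on ProbAttention.forward above: the two sampling budgets (U_part, the number of keys sampled per query, and u, the number of top queries kept) depend only on the sequence lengths and on factor. The following is a minimal standalone sketch of that arithmetic, using the standard-library math module in place of NumPy. The helper name prob_sparse_budget is hypothetical and is not part of the repository.

```python
import math


def prob_sparse_budget(L_Q, L_K, factor=5):
    """Mirror the U_part / u computation in ProbAttention.forward.

    Returns (sample_k, n_top):
      sample_k: keys sampled per query, factor * ceil(ln(L_K)), capped at L_K
      n_top:    queries treated as active, factor * ceil(ln(L_Q)), capped at L_Q
    """
    sample_k = min(factor * math.ceil(math.log(L_K)), L_K)
    n_top = min(factor * math.ceil(math.log(L_Q)), L_Q)
    return sample_k, n_top


if __name__ == "__main__":
    # Long sequences: the budgets grow logarithmically with length.
    print(prob_sparse_budget(96, 720))
    # Short sequences: the caps apply, so every key and query is used.
    print(prob_sparse_budget(8, 8))
```

With the default factor of 5, short sequences hit the caps, so the sparse path only saves work once the sequence length clearly exceeds factor * ceil(ln(L)).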


--------------------------------------------------------------------------------
/SimMTM_Forecasting/layers/Transformer_EncDec.py:
--------------------------------------------------------------------------------
  1 | import torch
  2 | import torch.nn as nn
  3 | import torch.nn.functional as F
  4 | 
  5 | 
  6 | class ConvLayer(nn.Module):
  7 |     def __init__(self, c_in):
  8 |         super(ConvLayer, self).__init__()
  9 |         self.downConv = nn.Conv1d(in_channels=c_in,
 10 |                                   out_channels=c_in,
 11 |                                   kernel_size=3,
 12 |                                   padding=2,
 13 |                                   padding_mode='circular')
 14 |         self.norm = nn.BatchNorm1d(c_in)
 15 |         self.activation = nn.ELU()
 16 |         self.maxPool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)
 17 | 
 18 |     def forward(self, x):
 19 |         x = self.downConv(x.permute(0, 2, 1))
 20 |         x = self.norm(x)
 21 |         x = self.activation(x)
 22 |         x = self.maxPool(x)
 23 |         x = x.transpose(1, 2)
 24 |         return x
 25 | 
 26 | 
 27 | class EncoderLayer(nn.Module):
 28 |     def __init__(self, attention, d_model, d_ff=None, dropout=0.1, activation="relu"):
 29 |         super(EncoderLayer, self).__init__()
 30 |         d_ff = d_ff or 4 * d_model
 31 |         self.attention = attention
 32 |         self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1)
 33 |         self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1)
 34 |         self.norm1 = nn.LayerNorm(d_model)
 35 |         self.norm2 = nn.LayerNorm(d_model)
 36 |         self.dropout = nn.Dropout(dropout)
 37 |         self.activation = F.relu if activation == "relu" else F.gelu
 38 | 
 39 |     def forward(self, x, attn_mask=None, tau=None, delta=None):
 40 |         new_x, attn = self.attention(
 41 |             x, x, x,
 42 |             attn_mask=attn_mask,
 43 |             tau=tau, delta=delta
 44 |         )
 45 |         x = x + self.dropout(new_x)
 46 | 
 47 |         y = x = self.norm1(x)
 48 |         y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1))))
 49 |         y = self.dropout(self.conv2(y).transpose(-1, 1))
 50 | 
 51 |         return self.norm2(x + y), attn
 52 | 
 53 | 
 54 | class Encoder(nn.Module):
 55 |     def __init__(self, attn_layers, conv_layers=None, norm_layer=None):
 56 |         super(Encoder, self).__init__()
 57 |         self.attn_layers = nn.ModuleList(attn_layers)
 58 |         self.conv_layers = nn.ModuleList(conv_layers) if conv_layers is not None else None
 59 |         self.norm = norm_layer
 60 | 
 61 |     def forward(self, x, attn_mask=None, tau=None, delta=None):
 62 |         # x [B, L, D]
 63 |         attns = []
 64 |         if self.conv_layers is not None:
 65 |             for i, (attn_layer, conv_layer) in enumerate(zip(self.attn_layers, self.conv_layers)):
 66 |                 delta = delta if i == 0 else None
 67 |                 x, attn = attn_layer(x, attn_mask=attn_mask, tau=tau, delta=delta)
 68 |                 x = conv_layer(x)
 69 |                 attns.append(attn)
 70 |             x, attn = self.attn_layers[-1](x, tau=tau, delta=None)
 71 |             attns.append(attn)
 72 |         else:
 73 |             for attn_layer in self.attn_layers:
 74 |                 x, attn = attn_layer(x, attn_mask=attn_mask, tau=tau, delta=delta)
 75 |                 attns.append(attn)
 76 | 
 77 |         if self.norm is not None:
 78 |             x = self.norm(x)
 79 | 
 80 |         return x, attns
 81 | 
 82 | 
 83 | class DecoderLayer(nn.Module):
 84 |     def __init__(self, self_attention, cross_attention, d_model, d_ff=None,
 85 |                  dropout=0.1, activation="relu"):
 86 |         super(DecoderLayer, self).__init__()
 87 |         d_ff = d_ff or 4 * d_model
 88 |         self.self_attention = self_attention
 89 |         self.cross_attention = cross_attention
 90 |         self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1)
 91 |         self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1)
 92 |         self.norm1 = nn.LayerNorm(d_model)
 93 |         self.norm2 = nn.LayerNorm(d_model)
 94 |         self.norm3 = nn.LayerNorm(d_model)
 95 |         self.dropout = nn.Dropout(dropout)
 96 |         self.activation = F.relu if activation == "relu" else F.gelu
 97 | 
 98 |     def forward(self, x, cross, x_mask=None, cross_mask=None, tau=None, delta=None):
 99 |         x = x + self.dropout(self.self_attention(
100 |             x, x, x,
101 |             attn_mask=x_mask,
102 |             tau=tau, delta=None
103 |         )[0])
104 |         x = self.norm1(x)
105 | 
106 |         x = x + self.dropout(self.cross_attention(
107 |             x, cross, cross,
108 |             attn_mask=cross_mask,
109 |             tau=tau, delta=delta
110 |         )[0])
111 | 
112 |         y = x = self.norm2(x)
113 |         y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1))))
114 |         y = self.dropout(self.conv2(y).transpose(-1, 1))
115 | 
116 |         return self.norm3(x + y)
117 | 
118 | 
119 | class Decoder(nn.Module):
120 |     def __init__(self, layers, norm_layer=None, projection=None):
121 |         super(Decoder, self).__init__()
122 |         self.layers = nn.ModuleList(layers)
123 |         self.norm = norm_layer
124 |         self.projection = projection
125 | 
126 |     def forward(self, x, cross, x_mask=None, cross_mask=None, tau=None, delta=None):
127 |         for layer in self.layers:
128 |             x = layer(x, cross, x_mask=x_mask, cross_mask=cross_mask, tau=tau, delta=delta)
129 | 
130 |         if self.norm is not None:
131 |             x = self.norm(x)
132 | 
133 |         if self.projection is not None:
134 |             x = self.projection(x)
135 |         return x
136 | 


--------------------------------------------------------------------------------
/SimMTM_Forecasting/layers/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/layers/__init__.py


--------------------------------------------------------------------------------
/SimMTM_Forecasting/models/PatchTST.py:
--------------------------------------------------------------------------------
  1 | import torch
  2 | import torch.nn as nn
  3 | from layers.Transformer_EncDec import Encoder, EncoderLayer
  4 | from layers.SelfAttention_Family import DSAttention, AttentionLayer
  5 | from layers.Embed import PatchEmbedding
  6 | from utils.losses import AutomaticWeightedLoss
  7 | from utils.tools import ContrastiveWeight, AggregationRebuild
  8 | 
  9 | class Flatten_Head(nn.Module):
 10 |     def __init__(self, nf, pred_len, head_dropout=0):
 11 |         super().__init__()
 12 |         self.flatten = nn.Flatten(start_dim=-2)
 13 |         self.linear = nn.Linear(nf, pred_len)
 14 |         self.dropout = nn.Dropout(head_dropout)
 15 | 
 16 |     def forward(self, x):  # [bs x n_vars x patch_num x d_model]
 17 |         x = self.flatten(x) # [bs x n_vars x (patch_num * d_model)]
 18 |         x = self.linear(x) # [bs x n_vars x pred_len]
 19 |         x = self.dropout(x) # [bs x n_vars x pred_len]
 20 |         return x
 21 | 
 22 | class Pooler_Head(nn.Module):
 23 |     def __init__(self, nf, dimension=128, head_dropout=0):
 24 |         super().__init__()
 25 | 
 26 |         self.pooler = nn.Sequential(
 27 |             nn.Flatten(start_dim=-2),
 28 |             nn.Linear(nf, nf // 2),
 29 |             nn.BatchNorm1d(nf // 2),
 30 |             nn.ReLU(),
 31 |             nn.Linear(nf // 2, dimension),
 32 |             nn.Dropout(head_dropout),
 33 |         )
 34 | 
 35 |     def forward(self, x):  # [(bs * n_vars) x patch_num x d_model]
 36 |         x = self.pooler(x) # [(bs * n_vars) x dimension]
 37 |         return x
 38 | 
 39 | class Model(nn.Module):
 40 |     """
 41 |     PatchTST + SimMTM
 42 |     """
 43 | 
 44 |     def __init__(self, configs):
 45 |         super(Model, self).__init__()
 46 |         self.task_name = configs.task_name
 47 |         self.pred_len = configs.pred_len
 48 |         self.seq_len = configs.seq_len
 49 |         self.label_len = configs.label_len
 50 |         self.output_attention = configs.output_attention
 51 |         self.configs = configs
 52 | 
 53 |         # patching and embedding
 54 |         self.patch_embedding = PatchEmbedding(configs.d_model, configs.patch_len, configs.stride, configs.stride, configs.dropout)
 55 | 
 56 |         # Encoder
 57 |         self.encoder = Encoder(
 58 |             [
 59 |                 EncoderLayer(
 60 |                     AttentionLayer(
 61 |                         DSAttention(False, configs.factor, attention_dropout=configs.dropout,
 62 |                                     output_attention=configs.output_attention), configs.d_model, configs.n_heads),
 63 |                     configs.d_model,
 64 |                     configs.d_ff,
 65 |                     dropout=configs.dropout,
 66 |                     activation=configs.activation
 67 |                 ) for l in range(configs.e_layers)
 68 |             ],
 69 |             norm_layer=torch.nn.LayerNorm(configs.d_model),
 70 |         )
 71 | 
 72 |         self.patch_num = int((configs.seq_len - configs.patch_len) / configs.stride + 2)
 73 |         self.head_nf = self.patch_num * configs.d_model
 74 | 
 75 |         # Decoder
 76 |         if self.task_name == 'pretrain':
 77 | 
 78 |             # for series-wise representation
 79 |             self.pooler = Pooler_Head(self.head_nf, head_dropout=configs.head_dropout)
 80 | 
 81 |             # for reconstruction
 82 |             self.projection = Flatten_Head(self.head_nf, configs.seq_len, head_dropout=configs.head_dropout)
 83 | 
 84 |             self.awl = AutomaticWeightedLoss(2)
 85 |             self.contrastive = ContrastiveWeight(self.configs)
 86 |             self.aggregation = AggregationRebuild(self.configs)
 87 |             self.mse = torch.nn.MSELoss()
 88 | 
 89 |         elif self.task_name == 'finetune':
 90 |             self.head = Flatten_Head(self.head_nf, configs.pred_len, head_dropout=configs.head_dropout)
 91 | 
 92 |     def forecast(self, x_enc, x_mark_enc):
 93 | 
 94 |         # data shape
 95 |         bs, seq_len, n_vars = x_enc.shape
 96 | 
 97 |         # normalization
 98 |         means = x_enc.mean(1, keepdim=True).detach()
 99 |         x_enc = x_enc - means
100 |         stdev = torch.sqrt(torch.var(x_enc, dim=1, keepdim=True, unbiased=False) + 1e-5)
101 |         x_enc /= stdev
102 | 
103 |         # do patching and embedding
104 |         x_enc = x_enc.permute(0, 2, 1)
105 |         enc_out, n_vars = self.patch_embedding(x_enc) # [(bs * n_vars) x patch_num x d_model]
106 | 
107 |         # encoder
108 |         enc_out, _ = self.encoder(enc_out) # enc_out: [(bs * n_vars) x patch_num x d_model]
109 | 
110 |         enc_out = torch.reshape(enc_out, (bs, n_vars, enc_out.shape[-2], enc_out.shape[-1])) # enc_out: [bs x n_vars x patch_num x d_model]
111 | 
112 |         # decoder
113 |         dec_out = self.head(enc_out)  # dec_out: [bs x n_vars x pred_len]
114 |         dec_out = dec_out.permute(0, 2, 1) # dec_out: [bs x pred_len x n_vars]
115 | 
116 |         # de-Normalization from Non-stationary Transformer
117 |         dec_out = dec_out * (stdev[:, 0, :].unsqueeze(1).repeat(1, self.pred_len, 1))
118 |         dec_out = dec_out + (means[:, 0, :].unsqueeze(1).repeat(1, self.pred_len, 1))
119 | 
120 |         return dec_out
121 |     
122 |     def pretrain(self, x_enc, x_mark_enc, batch_x, mask):
123 |         
124 |         # data shape
125 |         bs, seq_len, n_vars = x_enc.shape
126 | 
127 |         # normalization
128 |         means = torch.sum(x_enc, dim=1) / torch.sum(mask == 1, dim=1)
129 |         means = means.unsqueeze(1).detach()
130 |         x_enc = x_enc - means
131 |         x_enc = x_enc.masked_fill(mask == 0, 0)
132 |         stdev = torch.sqrt(torch.sum(x_enc * x_enc, dim=1) / torch.sum(mask == 1, dim=1) + 1e-5)
133 |         stdev = stdev.unsqueeze(1).detach()
134 |         x_enc /= stdev
135 | 
136 |         # do patching and embedding
137 |         x_enc = x_enc.permute(0, 2, 1)
138 |         enc_out, n_vars = self.patch_embedding(x_enc) # [(bs * n_vars) x patch_num x d_model]
139 | 
140 |         # encoder
141 |         p_enc_out, _ = self.encoder(enc_out) # [(bs * n_vars) x patch_num x d_model]
142 | 
143 |         # series-wise representation
144 |         s_enc_out = self.pooler(p_enc_out) # [(bs * n_vars) x dimension]
145 | 
146 |         # series weight learning
147 |         loss_cl, similarity_matrix, logits, positives_mask = self.contrastive(s_enc_out) # similarity_matrix: [(bs * n_vars) x (bs * n_vars)]
148 |         rebuild_weight_matrix, agg_enc_out = self.aggregation(similarity_matrix, p_enc_out) # agg_enc_out: [(bs * n_vars) x patch_num x d_model]
149 | 
150 |         agg_enc_out = agg_enc_out.reshape(bs, n_vars, agg_enc_out.shape[-2], agg_enc_out.shape[-1]) # agg_enc_out: [bs x n_vars x patch_num x d_model]
151 | 
152 |         # decoder
153 |         dec_out = self.projection(agg_enc_out)  # [bs x n_vars x seq_len]
154 |         dec_out = dec_out.permute(0, 2, 1)  # [bs x seq_len x n_vars]
155 | 
156 |         # de-Normalization
157 |         dec_out = dec_out * (stdev[:, 0, :].unsqueeze(1).repeat(1, self.seq_len, 1))
158 |         dec_out = dec_out + (means[:, 0, :].unsqueeze(1).repeat(1, self.seq_len, 1))
159 | 
160 |         pred_batch_x = dec_out[:batch_x.shape[0]]
161 | 
162 |         # series reconstruction
163 |         loss_rb = self.mse(pred_batch_x, batch_x.detach())
164 | 
165 |         # loss
166 |         loss = self.awl(loss_cl, loss_rb)
167 | 
168 |         return loss, loss_cl, loss_rb, positives_mask, logits, rebuild_weight_matrix, pred_batch_x
169 | 
170 |     def forward(self, x_enc, x_mark_enc, batch_x=None, mask=None):
171 | 
172 |         if self.task_name == 'pretrain':
173 |             return self.pretrain(x_enc, x_mark_enc, batch_x, mask)
174 |         if self.task_name == 'finetune':
175 |             dec_out = self.forecast(x_enc, x_mark_enc)
176 |             return dec_out[:, -self.pred_len:, :]  # [B, L, D]
177 | 
178 |         return None
179 | 
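
Note on self.patch_num and self.head_nf in Model.__init__ above: the expression int((seq_len - patch_len) / stride + 2) gives the number of patches per series. The sketch below checks that formula against a direct enumeration of window start positions. It assumes PatchEmbedding pads the end of each series by stride values before slicing windows of length patch_len every stride steps; that behaviour lives in layers/Embed.py and is not shown here, so treat it as an assumption. Both helper names are hypothetical.

```python
def patch_count(seq_len, patch_len, stride):
    """Closed-form patch count, as written in Model.__init__."""
    return int((seq_len - patch_len) / stride + 2)


def patch_count_by_enumeration(seq_len, patch_len, stride):
    """Count window starts on a series padded at the end by `stride` values.

    Assumption: padding length equals stride, matching the two
    configs.stride arguments passed to PatchEmbedding.
    """
    padded_len = seq_len + stride
    return len(range(0, padded_len - patch_len + 1, stride))


def head_input_features(seq_len, patch_len, stride, d_model):
    """Size of the flattened encoder output fed to Flatten_Head (head_nf)."""
    return patch_count(seq_len, patch_len, stride) * d_model


if __name__ == "__main__":
    for seq_len, patch_len, stride in [(336, 12, 12), (96, 16, 8), (100, 16, 8)]:
        closed = patch_count(seq_len, patch_len, stride)
        counted = patch_count_by_enumeration(seq_len, patch_len, stride)
        print(seq_len, patch_len, stride, closed, counted)
```

The two counts agree whenever seq_len >= patch_len, which is why head_nf can be fixed at construction time from the configuration alone.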


--------------------------------------------------------------------------------
/SimMTM_Forecasting/models/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/models/__init__.py


--------------------------------------------------------------------------------
/SimMTM_Forecasting/models/iTransformer.py:
--------------------------------------------------------------------------------
  1 | import torch
  2 | import torch.nn as nn
  3 | from layers.Transformer_EncDec import Encoder, EncoderLayer
  4 | from layers.SelfAttention_Family import DSAttention, AttentionLayer
  5 | from layers.Embed import DataEmbedding_inverted
  6 | from utils.losses import AutomaticWeightedLoss
  7 | from utils.tools import ContrastiveWeight, AggregationRebuild
  8 | 
  9 | class Flatten_Head(nn.Module):
 10 |     def __init__(self, d_model, pred_len, head_dropout=0):
 11 |         super().__init__()
 12 |         self.flatten = nn.Flatten(start_dim=-1)
 13 |         self.linear = nn.Linear(d_model, pred_len, bias=True)
 14 |         self.dropout = nn.Dropout(head_dropout)
 15 | 
 16 |     def forward(self, x):  # x: [bs x n_vars x d_model]
 17 |         x = self.flatten(x)
 18 |         x = self.linear(x)
 19 |         x = self.dropout(x)
 20 |         return x   # x: [bs x n_vars x pred_len]
 21 | 
 22 | class Pooler_Head(nn.Module):
 23 |     def __init__(self, nf, dimension=128, head_dropout=0):
 24 |         super().__init__()
 25 | 
 26 |         self.pooler = nn.Sequential(
 27 |             nn.Flatten(start_dim=-2),
 28 |             nn.Linear(nf, nf // 2),
 29 |             nn.BatchNorm1d(nf // 2),
 30 |             nn.ReLU(),
 31 |             nn.Linear(nf // 2, dimension),
 32 |             nn.Dropout(head_dropout),
 33 |         )
 34 | 
 35 |     def forward(self, x):  # [bs x n_vars x d_model]
 36 |         x = self.pooler(x) # [bs x dimension]
 37 |         return x
 38 | 
 39 | class Model(nn.Module):
 40 |     """
 41 |     iTransformer + SimMTM
 42 |     """
 43 | 
 44 |     def __init__(self, configs):
 45 |         super(Model, self).__init__()
 46 |         self.task_name = configs.task_name
 47 |         self.pred_len = configs.pred_len
 48 |         self.seq_len = configs.seq_len
 49 |         self.label_len = configs.label_len
 50 |         self.output_attention = configs.output_attention
 51 |         self.configs = configs
 52 | 
 53 |         # patching and embedding
 54 |         self.enc_embedding = DataEmbedding_inverted(configs.seq_len, configs.d_model, configs.embed, configs.freq, configs.dropout)
 55 | 
 56 |         # Encoder
 57 |         self.encoder = Encoder(
 58 |             [
 59 |                 EncoderLayer(
 60 |                     AttentionLayer(
 61 |                         DSAttention(False, configs.factor, attention_dropout=configs.dropout,
 62 |                                     output_attention=configs.output_attention), configs.d_model, configs.n_heads),
 63 |                     configs.d_model,
 64 |                     configs.d_ff,
 65 |                     dropout=configs.dropout,
 66 |                     activation=configs.activation
 67 |                 ) for l in range(configs.e_layers)
 68 |             ],
 69 |             norm_layer=torch.nn.LayerNorm(configs.d_model),
 70 |         )
 71 | 
 72 |         # Decoder
 73 |         if self.task_name == 'pretrain':
 74 | 
 75 |             # for series-wise representation
 76 |             self.pooler = Pooler_Head(configs.enc_in*configs.d_model, head_dropout=configs.head_dropout)
 77 | 
 78 |             # for reconstruction
 79 |             self.projection = Flatten_Head(configs.d_model, configs.seq_len, head_dropout=configs.head_dropout)
 80 | 
 81 |             self.awl = AutomaticWeightedLoss(2)
 82 |             self.contrastive = ContrastiveWeight(self.configs)
 83 |             self.aggregation = AggregationRebuild(self.configs)
 84 |             self.mse = torch.nn.MSELoss()
 85 | 
 86 |         elif self.task_name == 'finetune':
 87 |             self.head = Flatten_Head(configs.d_model, configs.pred_len, head_dropout=configs.head_dropout)
 88 |     
 89 |     def forecast(self, x_enc, x_mark_enc):
 90 | 
 91 |         # normalization
 92 |         means = x_enc.mean(1, keepdim=True).detach()
 93 |         x_enc = x_enc - means
 94 |         stdev = torch.sqrt(torch.var(x_enc, dim=1, keepdim=True, unbiased=False) + 1e-5)
 95 |         x_enc /= stdev
 96 | 
 97 |         _, _, N = x_enc.shape   # x_enc: [Batch Time Variate]
 98 | 
 99 |         # embedding
100 |         enc_out = self.enc_embedding(x_enc, x_mark_enc)
101 | 
102 |         # encoder
103 |         enc_out, _ = self.encoder(enc_out)
104 | 
105 |         dec_out = self.head(enc_out).permute(0, 2, 1)[:, :, :N]
106 | 
107 |         # de-Normalization
108 |         dec_out = dec_out * (stdev[:, 0, :].unsqueeze(1).repeat(1, self.pred_len, 1))
109 |         dec_out = dec_out + (means[:, 0, :].unsqueeze(1).repeat(1, self.pred_len, 1))
110 | 
111 |         return dec_out
112 | 
113 |     def pretrain(self, x_enc, x_mark_enc, batch_x, mask):
114 | 
115 |         # normalization
116 |         means = torch.sum(x_enc, dim=1) / torch.sum(mask == 1, dim=1)
117 |         means = means.unsqueeze(1).detach()
118 |         x_enc = x_enc - means
119 |         x_enc = x_enc.masked_fill(mask == 0, 0)
120 |         stdev = torch.sqrt(torch.sum(x_enc * x_enc, dim=1) / torch.sum(mask == 1, dim=1) + 1e-5)
121 |         stdev = stdev.unsqueeze(1).detach()
122 |         x_enc /= stdev
123 | 
124 |         # encoder
125 |         enc_out = self.enc_embedding(x_enc)
126 |         p_enc_out, _ = self.encoder(enc_out)  # p_enc_out: [bs x n_vars x d_model]
127 | 
128 |         # series-wise representation
129 |         s_enc_out = self.pooler(p_enc_out) # s_enc_out: [bs x dimension]
130 | 
131 |         # series weight learning
132 |         loss_cl, similarity_matrix, logits, positives_mask = self.contrastive(s_enc_out) # similarity_matrix: [bs x bs]
133 |         rebuild_weight_matrix, agg_enc_out = self.aggregation(similarity_matrix, p_enc_out) # agg_enc_out: [bs x n_vars x d_model]
134 | 
135 |         # decoder
136 |         dec_out = self.projection(agg_enc_out)  # [bs x n_vars x seq_len]
137 |         dec_out = dec_out.permute(0, 2, 1)  # [bs x seq_len x n_vars]
138 | 
139 |         # de-Normalization
140 |         dec_out = dec_out * (stdev[:, 0, :].unsqueeze(1).repeat(1, self.seq_len, 1))
141 |         dec_out = dec_out + (means[:, 0, :].unsqueeze(1).repeat(1, self.seq_len, 1))
142 | 
143 |         pred_batch_x = dec_out[:batch_x.shape[0]]
144 | 
145 |         # series reconstruction
146 |         loss_rb = self.mse(pred_batch_x, batch_x.detach())
147 | 
148 |         # loss
149 |         loss = self.awl(loss_cl, loss_rb)
150 | 
151 |         return loss, loss_cl, loss_rb, positives_mask, logits, rebuild_weight_matrix, pred_batch_x
152 | 
153 |     def forward(self, x_enc, x_mark_enc, batch_x=None, mask=None):
154 | 
155 |         if self.task_name == 'pretrain':
156 |             return self.pretrain(x_enc, x_mark_enc, batch_x, mask)
157 |         if self.task_name == 'finetune':
158 |             dec_out = self.forecast(x_enc, x_mark_enc)
159 |             return dec_out[:, -self.pred_len:, :]  # [B, L, D]
160 | 
161 |         return None
162 | 
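
The normalization at the top of `pretrain` computes the mean and standard deviation from the unmasked positions only, then re-applies them to the decoder output. Below is a minimal pure-Python sketch of that idea for a single series. It assumes, as in `masked_data`, that masked entries are already zero; the names `masked_normalize` and `denormalize` are illustrative and not part of the repository.

```python
import math


def masked_normalize(values, mask, eps=1e-5):
    """Normalize one series using statistics of the observed (mask == 1) points only."""
    n_observed = sum(mask)
    # Mean over observed points; masked points are ignored.
    mean = sum(v for v, m in zip(values, mask) if m) / n_observed
    # Center, then force masked positions back to zero (mirrors masked_fill(mask == 0, 0)).
    centered = [(v - mean) if m else 0.0 for v, m in zip(values, mask)]
    # Standard deviation over observed points, with eps for numerical stability.
    std = math.sqrt(sum(c * c for c in centered) / n_observed + eps)
    normalized = [c / std for c in centered]
    return normalized, mean, std


def denormalize(normalized, mean, std):
    """Undo the normalization, as done on the decoder output."""
    return [z * std + mean for z in normalized]


series = [1.0, 2.0, 3.0, 0.0]  # last point is masked (already zeroed)
mask = [1, 1, 1, 0]
normalized, mean, std = masked_normalize(series, mask)
restored = denormalize(normalized, mean, std)
```

In this sketch a masked position de-normalizes to the observed mean, since its normalized value is zero.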


--------------------------------------------------------------------------------
/SimMTM_Forecasting/run.py:
--------------------------------------------------------------------------------
  1 | import argparse
  2 | import torch
  3 | from exp.exp_simmtm import Exp_SimMTM
  4 | import random
  5 | import numpy as np
  6 | import os
  7 | os.environ['CUDA_LAUNCH_BLOCKING'] = '0'
  8 | 
  9 | fix_seed = 2023
 10 | random.seed(fix_seed)
 11 | torch.manual_seed(fix_seed)
 12 | np.random.seed(fix_seed)
 13 | 
 14 | parser = argparse.ArgumentParser(description='SimMTM')
 15 | 
 16 | # basic config
 17 | parser.add_argument('--task_name', type=str, required=True, default='pretrain', help='task name, options:[pretrain, finetune]')
 18 | parser.add_argument('--is_training', type=int, default=1, help='status')
 19 | parser.add_argument('--model_id', type=str, required=True, default='SimMTM', help='model id')
 20 | parser.add_argument('--model', type=str, required=True, default='SimMTM', help='model name')
 21 | 
 22 | # data loader
 23 | parser.add_argument('--data', type=str, required=True, default='ETTh1', help='dataset type')
 24 | parser.add_argument('--root_path', type=str, default='./datasets', help='root path of the data file')
 25 | parser.add_argument('--data_path', type=str, default='ETTh1.csv', help='data file')
 26 | parser.add_argument('--features', type=str, default='M', help='forecasting task, options:[M, S, MS]; M:multivariate predict multivariate, S:univariate predict univariate, MS:multivariate predict univariate')
 27 | parser.add_argument('--target', type=str, default='OT', help='target feature in S or MS task')
 28 | parser.add_argument('--freq', type=str, default='h', help='freq for time features encoding, options:[s:secondly, t:minutely, h:hourly, d:daily, b:business days, w:weekly, m:monthly], you can also use more detailed freq like 15min or 3h')
 29 | parser.add_argument('--checkpoints', type=str, default='./outputs/checkpoints/', help='location of model fine-tuning checkpoints')
 30 | parser.add_argument('--pretrain_checkpoints', type=str, default='./outputs/pretrain_checkpoints/', help='location of model pre-training checkpoints')
 31 | parser.add_argument('--transfer_checkpoints', type=str, default='ckpt_best.pth', help='checkpoints we will use to finetune, options:[ckpt_best.pth, ckpt10.pth, ckpt20.pth...]')
 32 | parser.add_argument('--load_checkpoints', type=str, default=None, help='location of model checkpoints')
 33 | parser.add_argument('--select_channels', type=float, default=1, help='select the rate of channels to train')
 34 | 
 35 | # forecasting task
 36 | parser.add_argument('--seq_len', type=int, default=336, help='input sequence length')
 37 | parser.add_argument('--label_len', type=int, default=48, help='start token length')
 38 | parser.add_argument('--pred_len', type=int, default=96, help='prediction sequence length')
 39 | parser.add_argument('--seasonal_patterns', type=str, default='Monthly', help='subset for M4')
 40 | 
 41 | # model define
 42 | parser.add_argument('--top_k', type=int, default=5, help='for TimesBlock')
 43 | parser.add_argument('--num_kernels', type=int, default=3, help='for Inception')
 44 | parser.add_argument('--enc_in', type=int, default=7, help='encoder input size')
 45 | parser.add_argument('--dec_in', type=int, default=7, help='decoder input size')
 46 | parser.add_argument('--c_out', type=int, default=7, help='output size')
 47 | parser.add_argument('--d_model', type=int, default=512, help='dimension of model')
 48 | parser.add_argument('--n_heads', type=int, default=8, help='num of heads')
 49 | parser.add_argument('--e_layers', type=int, default=2, help='num of encoder layers')
 50 | parser.add_argument('--d_layers', type=int, default=1, help='num of decoder layers')
 51 | parser.add_argument('--d_ff', type=int, default=2048, help='dimension of fcn')
 52 | parser.add_argument('--moving_avg', type=int, default=25, help='window size of moving average')
 53 | parser.add_argument('--factor', type=int, default=1, help='attn factor')
 54 | parser.add_argument('--distil', action='store_false', help='whether to use distilling in encoder, using this argument means not using distilling', default=True)
 55 | parser.add_argument('--dropout', type=float, default=0.1, help='dropout')
 56 | parser.add_argument('--fc_dropout', type=float, default=0, help='fully connected dropout')
 57 | parser.add_argument('--head_dropout', type=float, default=0.1, help='head dropout')
 58 | parser.add_argument('--embed', type=str, default='timeF', help='time features encoding, options:[timeF, fixed, learned]')
 59 | parser.add_argument('--activation', type=str, default='gelu', help='activation')
 60 | parser.add_argument('--output_attention', action='store_true', help='whether to output attention in encoder')
 61 | parser.add_argument('--individual', type=int, default=0, help='individual head; True 1 False 0')
 62 | parser.add_argument('--pct_start', type=float, default=0.3, help='pct_start')
 63 | parser.add_argument('--patch_len', type=int, default=12, help='patch length')
 64 | parser.add_argument('--stride', type=int, default=12, help='stride')
 65 | 
 66 | # optimization
 67 | parser.add_argument('--num_workers', type=int, default=5, help='data loader num workers')
 68 | parser.add_argument('--itr', type=int, default=1, help='experiments times')
 69 | parser.add_argument('--train_epochs', type=int, default=10, help='train epochs')
 70 | parser.add_argument('--batch_size', type=int, default=32, help='batch size of train input data')
 71 | parser.add_argument('--patience', type=int, default=3, help='early stopping patience')
 72 | parser.add_argument('--learning_rate', type=float, default=0.0001, help='optimizer learning rate')
 73 | parser.add_argument('--des', type=str, default='test', help='exp description')
 74 | parser.add_argument('--loss', type=str, default='MSE', help='loss function')
 75 | parser.add_argument('--lradj', type=str, default='type1', help='adjust learning rate')
 76 | parser.add_argument('--use_amp', action='store_true', help='use automatic mixed precision training', default=False)
 77 | 
 78 | # GPU
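 79 | parser.add_argument('--use_gpu', type=lambda s: str(s).lower() in ('1', 'true', 'yes'), default=True, help='use gpu, options:[true, false]')  # type=bool would parse "False" as True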
 80 | parser.add_argument('--gpu', type=int, default=0, help='gpu')
 81 | parser.add_argument('--use_multi_gpu', action='store_true', help='use multiple gpus', default=False)
 82 | parser.add_argument('--devices', type=str, default='0', help='device ids of multiple gpus')
 83 | 
 84 | # Pre-train
 85 | parser.add_argument('--lm', type=int, default=3, help='average masking length')
 86 | parser.add_argument('--positive_nums', type=int, default=3, help='masking series numbers')
 87 | parser.add_argument('--rbtp', type=int, default=1, help='0: rebuild the embedding of original series; 1: rebuild original series')
 88 | parser.add_argument('--temperature', type=float, default=0.2, help='temperature')
 89 | parser.add_argument('--masked_rule', type=str, default='geometric', help='masking rule, options:[geometric, random, masked_tail, masked_head]')
 90 | parser.add_argument('--mask_rate', type=float, default=0.5, help='mask ratio')
 91 | 
 92 | args = parser.parse_args()
 93 | args.use_gpu = True if torch.cuda.is_available() and args.use_gpu else False
 94 | 
 95 | if args.use_gpu and args.use_multi_gpu:
 96 |     args.devices = args.devices.replace(' ', '')
 97 |     device_ids = args.devices.split(',')
 98 |     args.device_ids = [int(id_) for id_ in device_ids]
 99 | 
100 | print('Args in experiment:')
101 | print(args)
102 | 
103 | Exp = Exp_SimMTM
104 | if args.task_name == 'pretrain':
105 |     for ii in range(args.itr):
106 |         # setting record of experiments
107 |         setting = '{}_{}_{}_{}_sl{}_ll{}_pl{}_dm{}_df{}_nh{}_el{}_dl{}_fc{}_dp{}_hdp{}_ep{}_bs{}_lr{}_lm{}_pn{}_mr{}_tp{}'.format(
108 |             args.task_name,
109 |             args.model,
110 |             args.data,
111 |             args.features,
112 |             args.seq_len,
113 |             args.label_len,
114 |             args.pred_len,
115 |             args.d_model,
116 |             args.d_ff,
117 |             args.n_heads,
118 |             args.e_layers,
119 |             args.d_layers,
120 |             args.factor,
121 |             args.dropout,
122 |             args.head_dropout,
123 |             args.train_epochs,
124 |             args.batch_size,
125 |             args.learning_rate,
126 |             args.lm,
127 |             args.positive_nums,
128 |             args.mask_rate,
129 |             args.temperature
130 |         )
131 | 
132 |         exp = Exp(args)  # set experiments
133 |         print('>>>>>>>start pre_training : {}>>>>>>>>>>>>>>>>>>>>>>>>>>'.format(setting))
134 |         exp.pretrain()
135 |         torch.cuda.empty_cache()
136 | 
137 | elif args.task_name == 'finetune':
138 |     for ii in range(args.itr):
139 |         # setting record of experiments
140 |         setting = '{}_{}_{}_{}_sl{}_ll{}_pl{}_dm{}_df{}_nh{}_el{}_dl{}_fc{}_dp{}_hdp{}_ep{}_bs{}_lr{}'.format(
141 |             args.task_name,
142 |             args.model,
143 |             args.data,
144 |             args.features,
145 |             args.seq_len,
146 |             args.label_len,
147 |             args.pred_len,
148 |             args.d_model,
149 |             args.d_ff,
150 |             args.n_heads,
151 |             args.e_layers,
152 |             args.d_layers,
153 |             args.factor,
154 |             args.dropout,
155 |             args.head_dropout,
156 |             args.train_epochs,
157 |             args.batch_size,
158 |             args.learning_rate
159 |         )
160 | 
161 |         args.load_checkpoints = os.path.join(args.pretrain_checkpoints, args.data, args.transfer_checkpoints)
162 | 
163 |         exp = Exp(args)  # set experiments
164 | 
165 |         print('>>>>>>>start training : {}>>>>>>>>>>>>>>>>>>>>>>>>>>'.format(setting))
166 |         exp.train(setting)
167 | 
168 |         print('>>>>>>>testing : {}<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<'.format(setting))
169 |         exp.test()
170 |         torch.cuda.empty_cache()
171 | 
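
In the `finetune` branch, `args.load_checkpoints` is rebuilt from three arguments, so the pre-trained weights are looked up under a per-dataset subdirectory. The sketch below reproduces only that path construction with the defaults from the parser. The `SimpleNamespace` is a hypothetical stand-in for the parsed arguments, and whether the pre-training run actually writes to this location depends on `Exp_SimMTM`, which is not shown here.

```python
import os
from types import SimpleNamespace

# Hypothetical stand-in for the parsed arguments, using the defaults from run.py.
args = SimpleNamespace(
    pretrain_checkpoints='./outputs/pretrain_checkpoints/',
    data='ETTh1',
    transfer_checkpoints='ckpt_best.pth',
)

# Same expression as in the finetune branch of run.py.
load_path = os.path.join(args.pretrain_checkpoints, args.data, args.transfer_checkpoints)
print(load_path)
```

Passing a different `--transfer_checkpoints` value (for example `ckpt10.pth`, one of the options named in the help text) changes only the final path component.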


--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/.DS_Store


--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/finetune/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/finetune/.DS_Store


--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/finetune/ECL_script/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/finetune/ECL_script/.DS_Store


--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/finetune/ECL_script/ECL.sh:
--------------------------------------------------------------------------------
 1 | export CUDA_VISIBLE_DEVICES=0
 2 | 
 3 | for pred_len in 96 192 336 720; do
 4 |     python -u run.py \
 5 |         --task_name finetune \
 6 |         --root_path ./dataset/electricity/ \
 7 |         --data_path electricity.csv \
 8 |         --model_id ECL \
 9 |         --model SimMTM \
10 |         --data ECL \
11 |         --features M \
12 |         --seq_len 336 \
13 |         --label_len 48 \
14 |         --pred_len $pred_len \
15 |         --e_layers 2 \
16 |         --enc_in 321 \
17 |         --dec_in 321 \
18 |         --c_out 321 \
19 |         --d_model 32 \
20 |         --d_ff 64 \
21 |         --n_heads 16 \
22 |         --batch_size 32
23 | done
24 | 
25 | 
26 | 


--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/finetune/ETT_script/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/finetune/ETT_script/.DS_Store


--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/finetune/ETT_script/ETTh1.sh:
--------------------------------------------------------------------------------
 1 | export CUDA_VISIBLE_DEVICES=0
 2 | 
 3 | for pred_len in 96 192 336 720; do
 4 |     python -u run.py \
 5 |         --task_name finetune \
 6 |         --is_training 1 \
 7 |         --root_path ./dataset/ETT-small/ \
 8 |         --data_path ETTh1.csv \
 9 |         --model_id ETTh1 \
10 |         --model SimMTM \
11 |         --data ETTh1 \
12 |         --features M \
13 |         --seq_len 336 \
14 |         --label_len 48 \
15 |         --pred_len $pred_len \
16 |         --e_layers 3 \
17 |         --enc_in 7 \
18 |         --dec_in 7 \
19 |         --c_out 7 \
20 |         --n_heads 16 \
21 |         --d_model 32 \
22 |         --d_ff 64 \
23 |         --learning_rate 0.0001 \
24 |         --dropout 0.2 \
25 |         --batch_size 16
26 | done
27 | 
28 | 
29 | 


--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/finetune/ETT_script/ETTh2.sh:
--------------------------------------------------------------------------------
 1 | export CUDA_VISIBLE_DEVICES=0
 2 | 
 3 | for pred_len in 96 192 336 720; do
 4 |     python -u run.py \
 5 |         --task_name finetune \
 6 |         --is_training 1 \
 7 |         --root_path ./dataset/ETT-small/ \
 8 |         --data_path ETTh2.csv \
 9 |         --model_id ETTh2 \
10 |         --model SimMTM \
11 |         --data ETTh2 \
12 |         --features M \
13 |         --seq_len 336 \
14 |         --label_len 48 \
15 |         --pred_len $pred_len \
16 |         --e_layers 2 \
17 |         --enc_in 7 \
18 |         --dec_in 7 \
19 |         --c_out 7 \
20 |         --n_heads 8 \
21 |         --d_model 8 \
22 |         --d_ff 32 \
23 |         --dropout 0.4 \
24 |         --head_dropout 0.2 \
25 |         --batch_size 16
26 | done


--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/finetune/ETT_script/ETTm1.sh:
--------------------------------------------------------------------------------
 1 | export CUDA_VISIBLE_DEVICES=0
 2 | 
 3 | for pred_len in 96 192 336 720; do
 4 |     python -u run.py \
 5 |         --task_name finetune \
 6 |         --is_training 1 \
 7 |         --root_path ./dataset/ETT-small/ \
 8 |         --data_path ETTm1.csv \
 9 |         --model_id ETTm1 \
10 |         --model SimMTM \
11 |         --data ETTm1 \
12 |         --features M \
13 |         --seq_len 336 \
14 |         --label_len 48 \
15 |         --pred_len $pred_len \
16 |         --e_layers 2 \
17 |         --enc_in 7 \
18 |         --dec_in 7 \
19 |         --c_out 7 \
20 |         --n_heads 8 \
21 |         --d_model 32 \
22 |         --d_ff 64 \
23 |         --dropout 0
24 | done
25 | 


--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/finetune/ETT_script/ETTm2.sh:
--------------------------------------------------------------------------------
 1 | export CUDA_VISIBLE_DEVICES=0
 2 | 
 3 | for pred_len in 96 192 336 720; do
 4 |     python -u run.py \
 5 |         --task_name finetune \
 6 |         --is_training 1 \
 7 |         --root_path ./dataset/ETT-small/ \
 8 |         --data_path ETTm2.csv \
 9 |         --model_id ETTm2 \
10 |         --model SimMTM \
11 |         --data ETTm2 \
12 |         --features M \
13 |         --seq_len 336 \
14 |         --label_len 48 \
15 |         --pred_len $pred_len \
16 |         --e_layers 3 \
17 |         --enc_in 7 \
18 |         --dec_in 7 \
19 |         --c_out 7 \
20 |         --n_heads 8 \
21 |         --d_model 8 \
22 |         --d_ff 16 \
23 |         --dropout 0 \
24 |         --batch_size 64
25 | done
26 | 
27 | 


--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/finetune/Traffic/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/finetune/Traffic/.DS_Store


--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/finetune/Traffic/Traffic.sh:
--------------------------------------------------------------------------------
 1 | export CUDA_VISIBLE_DEVICES=0
 2 | 
 3 | for pred_len in 96 192 336 720; do
 4 |     python -u run.py \
 5 |         --task_name finetune \
 6 |         --root_path ./dataset/traffic/ \
 7 |         --data_path traffic.csv \
 8 |         --model_id Traffic \
 9 |         --model SimMTM \
10 |         --data Traffic \
11 |         --features M \
12 |         --seq_len 336 \
13 |         --label_len 48 \
14 |         --pred_len $pred_len \
15 |         --e_layers 2 \
16 |         --enc_in 862 \
17 |         --dec_in 862 \
18 |         --c_out 862 \
19 |         --d_model 128 \
20 |         --d_ff 256 \
21 |         --n_heads 16 \
22 |         --batch_size 32 \
23 |         --dropout 0.2
24 | done
25 | 
26 | 
27 | 


--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/finetune/Weather_script/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/finetune/Weather_script/.DS_Store


--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/finetune/Weather_script/Weather.sh:
--------------------------------------------------------------------------------
 1 | export CUDA_VISIBLE_DEVICES=0
 2 | 
 3 | for pred_len in 96 192 336 720; do
 4 |     python -u run.py \
 5 |         --task_name finetune \
 6 |         --is_training 1 \
 7 |         --root_path ./dataset/weather/ \
 8 |         --data_path weather.csv \
 9 |         --model_id Weather \
10 |         --model SimMTM \
11 |         --data Weather \
12 |         --features M \
13 |         --seq_len 336 \
14 |         --label_len 48 \
15 |         --pred_len $pred_len \
16 |         --e_layers 2 \
17 |         --enc_in 21 \
18 |         --dec_in 21 \
19 |         --c_out 21 \
20 |         --n_heads 8 \
21 |         --d_model 64 \
22 |         --d_ff 64 \
23 |         --batch_size 16
24 | done
25 | 
26 | 


--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/pretrain/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/pretrain/.DS_Store


--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/pretrain/ECL_script/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/pretrain/ECL_script/.DS_Store


--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/pretrain/ECL_script/ECL.sh:
--------------------------------------------------------------------------------
 1 | export CUDA_VISIBLE_DEVICES=0
 2 | 
 3 | python -u run.py \
 4 |     --task_name pretrain \
 5 |     --root_path ./dataset/electricity/ \
 6 |     --data_path electricity.csv \
 7 |     --model_id ECL \
 8 |     --model SimMTM \
 9 |     --data ECL \
10 |     --features M \
11 |     --seq_len 336 \
12 |     --label_len 48 \
13 |     --e_layers 2 \
14 |     --positive_nums 2 \
15 |     --mask_rate 0.5 \
16 |     --enc_in 321 \
17 |     --dec_in 321 \
18 |     --c_out 321 \
19 |     --d_model 32 \
20 |     --d_ff 64 \
21 |     --n_heads 16 \
22 |     --batch_size 32 \
23 |     --train_epochs 50 \
24 |     --temperature 0.02
25 | 


--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/pretrain/ETT_script/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/pretrain/ETT_script/.DS_Store


--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/pretrain/ETT_script/ETTh1.sh:
--------------------------------------------------------------------------------
 1 | export CUDA_VISIBLE_DEVICES=0
 2 | 
 3 | python -u run.py \
 4 |     --task_name pretrain \
 5 |     --root_path ./dataset/ETT-small/ \
 6 |     --data_path ETTh1.csv \
 7 |     --model_id ETTh1 \
 8 |     --model SimMTM \
 9 |     --data ETTh1 \
10 |     --features M \
11 |     --seq_len 336 \
12 |     --e_layers 3 \
13 |     --enc_in 7 \
14 |     --dec_in 7 \
15 |     --c_out 7 \
16 |     --n_heads 16 \
17 |     --d_model 32 \
18 |     --d_ff 64 \
19 |     --positive_nums 3 \
20 |     --mask_rate 0.5 \
21 |     --learning_rate 0.001 \
22 |     --batch_size 16 \
23 |     --train_epochs 50
24 | 
25 | 
26 | 


--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/pretrain/ETT_script/ETTh2.sh:
--------------------------------------------------------------------------------
 1 | export CUDA_VISIBLE_DEVICES=0
 2 | 
 3 | python -u run.py \
 4 |     --task_name pretrain \
 5 |     --root_path ./dataset/ETT-small/ \
 6 |     --data_path ETTh2.csv \
 7 |     --model_id ETTh2 \
 8 |     --model SimMTM \
 9 |     --data ETTh2 \
10 |     --features M \
11 |     --seq_len 336 \
12 |     --e_layers 2 \
13 |     --enc_in 7 \
14 |     --dec_in 7 \
15 |     --c_out 7 \
16 |     --n_heads 8 \
17 |     --d_model 8 \
18 |     --d_ff 32 \
19 |     --positive_nums 3 \
20 |     --mask_rate 0.5 \
21 |     --learning_rate 0.001 \
22 |     --batch_size 16 \
23 |     --train_epochs 50
24 | 
25 | 
26 | 


--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/pretrain/ETT_script/ETTm1.sh:
--------------------------------------------------------------------------------
 1 | export CUDA_VISIBLE_DEVICES=0
 2 | 
 3 | python -u run.py \
 4 |     --task_name pretrain \
 5 |     --root_path ./dataset/ETT-small/ \
 6 |     --data_path ETTm1.csv \
 7 |     --model_id ETTm1 \
 8 |     --model SimMTM \
 9 |     --data ETTm1 \
10 |     --features M \
11 |     --seq_len 336 \
12 |     --e_layers 2 \
13 |     --enc_in 7 \
14 |     --dec_in 7 \
15 |     --c_out 7 \
16 |     --n_heads 8 \
17 |     --d_model 32 \
18 |     --d_ff 64 \
19 |     --positive_nums 3 \
20 |     --mask_rate 0.5 \
21 |     --learning_rate 0.001 \
22 |     --batch_size 16 \
23 |     --train_epochs 50
24 | 
25 | 


--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/pretrain/ETT_script/ETTm2.sh:
--------------------------------------------------------------------------------
 1 | export CUDA_VISIBLE_DEVICES=0,1
 2 | 
 3 | python -u run.py \
 4 |     --task_name pretrain \
 5 |     --root_path ./dataset/ETT-small/ \
 6 |     --data_path ETTm2.csv \
 7 |     --model_id ETTm2 \
 8 |     --model SimMTM \
 9 |     --data ETTm2 \
10 |     --features M \
11 |     --seq_len 336 \
12 |     --e_layers 3 \
13 |     --enc_in 7 \
14 |     --dec_in 7 \
15 |     --c_out 7 \
16 |     --n_heads 8 \
17 |     --d_model 8 \
18 |     --d_ff 16 \
19 |     --positive_nums 2 \
20 |     --mask_rate 0.5 \
21 |     --learning_rate 0.001 \
22 |     --batch_size 16 \
23 |     --train_epochs 50
24 | 


--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/pretrain/Traffic_script/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/pretrain/Traffic_script/.DS_Store


--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/pretrain/Traffic_script/Traffic.sh:
--------------------------------------------------------------------------------
 1 | export CUDA_VISIBLE_DEVICES=0
 2 | 
 3 | python -u run.py \
 4 |     --task_name pretrain \
 5 |     --root_path ./dataset/traffic/ \
 6 |     --data_path traffic.csv \
 7 |     --model_id Traffic \
 8 |     --model SimMTM \
 9 |     --data Traffic \
10 |     --features M \
11 |     --seq_len 336 \
12 |     --label_len 48 \
13 |     --e_layers 3 \
14 |     --positive_nums 2 \
15 |     --mask_rate 0.5 \
16 |     --enc_in 862 \
17 |     --dec_in 862 \
18 |     --c_out 862 \
19 |     --d_model 128 \
20 |     --d_ff 256 \
21 |     --n_heads 16 \
22 |     --batch_size 32 \
23 |     --dropout 0.2 \
24 |     --train_epochs 50 \
25 |     --temperature 0.02
26 | 


--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/pretrain/Weather_script/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/pretrain/Weather_script/.DS_Store


--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/pretrain/Weather_script/Weather.sh:
--------------------------------------------------------------------------------
 1 | export CUDA_VISIBLE_DEVICES=0
 2 | 
 3 | python -u run.py \
 4 |     --task_name pretrain \
 5 |     --root_path ./dataset/weather/ \
 6 |     --data_path weather.csv \
 7 |     --model_id Weather \
 8 |     --model SimMTM \
 9 |     --data Weather \
10 |     --features M \
11 |     --seq_len 336 \
12 |     --e_layers 2 \
13 |     --positive_nums 2 \
14 |     --mask_rate 0.5 \
15 |     --enc_in 21 \
16 |     --dec_in 21 \
17 |     --c_out 21 \
18 |     --n_heads 8 \
19 |     --d_model 64 \
20 |     --d_ff 64 \
21 |     --learning_rate 0.001 \
22 |     --batch_size 16 \
23 |     --train_epochs 50
24 | 
25 | 
26 | 


--------------------------------------------------------------------------------
/SimMTM_Forecasting/utils/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/utils/.DS_Store


--------------------------------------------------------------------------------
/SimMTM_Forecasting/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/utils/__init__.py


--------------------------------------------------------------------------------
/SimMTM_Forecasting/utils/augmentations.py:
--------------------------------------------------------------------------------
  1 | import numpy as np
  2 | import torch
  3 | import math
  4 | 
  5 | 
  6 | def masked_data(sample, sample_mark, masking_ratio, lm, positive_nums=1, distribution='geometric'):
  7 |     """Mask time series along the time dimension"""
  8 | 
  9 |     sample = sample.permute(0, 2, 1)  # [bs x nvars x seq_len]
 10 | 
 11 |     sample_repeat = sample.repeat(positive_nums, 1, 1)  # [(bs * positive_nums) x nvars x seq_len]
 12 | 
 13 |     mask = noise_mask(sample_repeat, masking_ratio, lm, distribution=distribution)
 14 |     x_masked = mask * sample_repeat
 15 | 
 16 |     sample_mark_repeat = sample_mark.repeat(positive_nums, 1, 1)
 17 | 
 18 |     return x_masked.permute(0, 2, 1), sample_mark_repeat, mask.permute(0, 2, 1)
 19 | 
 20 | 
 21 | def geom_noise_mask_single(L, lm, masking_ratio):
 22 |     """
 23 |     Randomly create a boolean mask of length `L`, consisting of subsequences of average length lm, masking with 0s a `masking_ratio`
 24 |     proportion of the sequence L. The length of masking subsequences and intervals follow a geometric distribution.
 25 |     Args:
 26 |         L: length of mask and sequence to be masked
 27 |         lm: average length of masking subsequences (streaks of 0s)
 28 |         masking_ratio: proportion of L to be masked
 29 |     Returns:
 30 |         (L,) boolean numpy array intended to mask ('drop') with 0s a sequence of length L
 31 |     """
 32 |     keep_mask = np.ones(L, dtype=bool)
 33 |     p_m = 1 / lm  # probability of each masking sequence stopping. parameter of geometric distribution.
 34 |     p_u = p_m * masking_ratio / (
 35 |             1 - masking_ratio)  # probability of each unmasked sequence stopping. parameter of geometric distribution.
 36 |     p = [p_m, p_u]
 37 | 
 38 |     # Start in state 0 with masking_ratio probability
 39 |     state = int(np.random.rand() > masking_ratio)  # state 0 means masking, 1 means not masking
 40 |     for i in range(L):
 41 |         keep_mask[i] = state  # here it happens that state and masking value corresponding to state are identical
 42 |         if np.random.rand() < p[state]:
 43 |             state = 1 - state
 44 | 
 45 |     return keep_mask
 46 | 
 47 | 
 48 | def noise_mask(X, masking_ratio=0.25, lm=3, distribution='geometric', exclude_feats=None):
 49 |     """
 50 |     Creates a random boolean mask of the same shape as X, with 0s at places where a feature should be masked.
 51 |     Args:
 52 |         X: (batch, n_vars, seq_len) tensor or array; only its shape is used to size the mask
 53 |         masking_ratio: proportion of seq_length to be masked. At each time step, will also be the proportion of
 54 |             feat_dim that will be masked on average
 55 |         lm: average length of masking subsequences (streaks of 0s). Used only when `distribution` is 'geometric'.
 56 |         distribution: whether each mask sequence element is sampled independently at random, or whether
 57 |             sampling follows a markov chain (and thus is stateful), resulting in geometric distributions of
 58 |             masked sequences of a desired mean length `lm`
 59 |         exclude_feats: iterable of indices corresponding to features to be excluded from masking (i.e. to remain all 1s); currently not applied to the mask
 60 |     Returns:
 61 |         boolean torch tensor with the same shape as X, with 0s at places where a feature should be masked
 62 |     """
 63 |     if exclude_feats is not None:
 64 |         exclude_feats = set(exclude_feats)
 65 | 
 66 |     if distribution == 'geometric':  # stateful (Markov chain)
 67 |         mask = geom_noise_mask_single(X.shape[0] * X.shape[1] * X.shape[2], lm, masking_ratio)
 68 |         mask = mask.reshape(X.shape[0], X.shape[1], X.shape[2])
 69 |     elif distribution == 'masked_tail':
 70 |         mask = np.ones(X.shape, dtype=bool)
 71 |         for m in range(X.shape[0]):  # batch dimension
 72 | 
 73 |             keep_mask = np.zeros_like(mask[m, :], dtype=bool)
 74 |             n = math.ceil(keep_mask.shape[1] * (1 - masking_ratio))
 75 |             keep_mask[:, :n] = True
 76 |             mask[m, :] = keep_mask  # time dimension
 77 |     elif distribution == 'masked_head':
 78 |         mask = np.ones(X.shape, dtype=bool)
 79 |         for m in range(X.shape[0]):  # feature dimension
 80 | 
 81 |             keep_mask = np.zeros_like(mask[m, :], dtype=bool)
 82 |             n = math.ceil(keep_mask.shape[1] * masking_ratio)
 83 |             keep_mask[:, n:] = True
 84 |             mask[m, :] = keep_mask  # time dimension
 85 |     else:  # each position is independent Bernoulli with p = 1 - masking_ratio
 86 |         mask = np.random.choice(np.array([True, False]), size=X.shape, replace=True,
 87 |                                 p=(1 - masking_ratio, masking_ratio))
 88 |     return torch.tensor(mask)
 89 | 
 90 | 
 91 | def one_hot_encoding(X):
 92 |     X = [int(x) for x in X]
 93 |     n_values = np.max(X) + 1
 94 |     b = np.eye(n_values)[X]
 95 |     return b
 96 | 
 97 | 
 98 | def DataTransform(sample, config):
 99 |     """Weak and strong augmentations"""
100 |     weak_aug = scaling(sample, config.augmentation.jitter_scale_ratio)
101 |     # weak_aug = permutation(sample, max_segments=config.augmentation.max_seg)
102 |     strong_aug = jitter(permutation(sample, max_segments=config.augmentation.max_seg), config.augmentation.jitter_ratio)
103 | 
104 |     return weak_aug, strong_aug
105 | 
106 | 
107 | def remove_frequency(x, pertub_ratio=0.0):
108 |     mask = torch.rand(x.shape, device=x.device) > pertub_ratio  # roughly pertub_ratio of the entries are False (zeroed out)
109 |     mask = mask.to(x.device)
110 |     return x*mask
111 | 
112 | 
113 | def add_frequency(x, pertub_ratio=0.0):
114 | 
115 |     mask = torch.rand(x.shape, device=x.device) > (1 - pertub_ratio)  # only about pertub_ratio of all values are True
116 |     mask = mask.to(x.device)
117 |     max_amplitude = x.max()
118 |     random_am = torch.rand(mask.shape, device=x.device) * (max_amplitude * 0.1)  # same device as x and mask
119 |     pertub_matrix = mask*random_am
120 |     return x+pertub_matrix
121 | 
122 | 
123 | def generate_binomial_mask(B, T, D, p=0.5):  # p is the probability that an entry is kept (True)
124 |     return torch.from_numpy(np.random.binomial(1, p, size=(B, T, D))).to(torch.bool)
125 | 
126 | 
127 | def masking(x, keepratio=0.9, mask= 'binomial'):
128 |     global mask_id
129 |     nan_mask = ~x.isnan().any(axis=-1)
130 |     x[~nan_mask] = 0
131 |     # x = self.input_fc(x)  # B x T x Ch
132 | 
133 |     if mask == 'binomial':
134 |         mask_id = generate_binomial_mask(x.size(0), x.size(1), x.size(2), p=keepratio).to(x.device)
135 |     # elif mask == 'continuous':
136 |     #     mask = generate_continuous_mask(x.size(0), x.size(1)).to(x.device)
137 |     # elif mask == 'all_true':
138 |     #     mask = x.new_full((x.size(0), x.size(1)), True, dtype=torch.bool)
139 |     # elif mask == 'all_false':
140 |     #     mask = x.new_full((x.size(0), x.size(1)), False, dtype=torch.bool)
141 |     # elif mask == 'mask_last':
142 |     #     mask = x.new_full((x.size(0), x.size(1)), True, dtype=torch.bool)
143 |     #     mask[:, -1] = False
144 | 
145 |     # mask &= nan_mask
146 |     x[~mask_id] = 0
147 |     return x
148 | 
149 | 
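
The `masked_tail` and `masked_head` branches of `noise_mask` above keep one end of the last axis and blank the other. The following is a minimal NumPy sketch of that behaviour without the per-sample loop; the helper name `edge_mask` is hypothetical and not part of the repository, and it returns a NumPy array rather than a torch tensor.

```python
import math

import numpy as np


def edge_mask(shape, masking_ratio=0.25, mode="masked_tail"):
    """Boolean keep-mask (True = keep) that blanks one end of the last axis.

    Hypothetical helper mirroring the 'masked_tail' / 'masked_head' branches.
    """
    mask = np.zeros(shape, dtype=bool)
    length = shape[-1]
    if mode == "masked_tail":
        # keep the first ceil(length * (1 - ratio)) steps, mask the rest
        n = math.ceil(length * (1 - masking_ratio))
        mask[..., :n] = True
    elif mode == "masked_head":
        # mask the first ceil(length * ratio) steps, keep the rest
        n = math.ceil(length * masking_ratio)
        mask[..., n:] = True
    else:
        raise ValueError(f"unknown mode: {mode}")
    return mask


tail = edge_mask((2, 3, 8), 0.25, "masked_tail")
head = edge_mask((2, 3, 8), 0.25, "masked_head")
print(tail[0, 0].astype(int))  # [1 1 1 1 1 1 0 0]
print(head[0, 0].astype(int))  # [0 0 1 1 1 1 1 1]
```

With a length of 8 and a ratio of 0.25, the tail variant keeps the first six steps and the head variant masks the first two.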


--------------------------------------------------------------------------------
/SimMTM_Forecasting/utils/losses.py:
--------------------------------------------------------------------------------
 1 | import torch as t
 2 | import torch
 3 | import torch.nn as nn
 4 | import numpy as np
 5 | 
 6 | def divide_no_nan(a, b):
 7 |     """
 8 |     a/b where any resulting NaN or Inf values are replaced by 0.
 9 |     """
10 |     result = a / b
11 |     result[result != result] = .0
12 |     result[result == np.inf] = .0
13 |     return result
14 | 
15 | 
16 | class mape_loss(nn.Module):
17 |     def __init__(self):
18 |         super(mape_loss, self).__init__()
19 | 
20 |     def forward(self, insample: t.Tensor, freq: int,
21 |                 forecast: t.Tensor, target: t.Tensor, mask: t.Tensor) -> t.float:
22 |         """
23 |         MAPE loss as defined in: https://en.wikipedia.org/wiki/Mean_absolute_percentage_error
24 | 
25 |         :param forecast: Forecast values. Shape: batch, time
26 |         :param target: Target values. Shape: batch, time
27 |         :param mask: 0/1 mask. Shape: batch, time
28 |         :return: Loss value
29 |         """
30 |         weights = divide_no_nan(mask, target)
31 |         return t.mean(t.abs((forecast - target) * weights))
32 | 
33 | 
34 | class smape_loss(nn.Module):
35 |     def __init__(self):
36 |         super(smape_loss, self).__init__()
37 | 
38 |     def forward(self, insample: t.Tensor, freq: int,
39 |                 forecast: t.Tensor, target: t.Tensor, mask: t.Tensor) -> t.float:
40 |         """
41 |         sMAPE loss as defined in https://robjhyndman.com/hyndsight/smape/ (Makridakis 1993)
42 | 
43 |         :param forecast: Forecast values. Shape: batch, time
44 |         :param target: Target values. Shape: batch, time
45 |         :param mask: 0/1 mask. Shape: batch, time
46 |         :return: Loss value
47 |         """
48 |         return 200 * t.mean(divide_no_nan(t.abs(forecast - target), t.abs(forecast.data) + t.abs(target.data)) * mask)
49 | 
50 | 
51 | class mase_loss(nn.Module):
52 |     def __init__(self):
53 |         super(mase_loss, self).__init__()
54 | 
55 |     def forward(self, insample: t.Tensor, freq: int,
56 |                 forecast: t.Tensor, target: t.Tensor, mask: t.Tensor) -> t.float:
57 |         """
58 |         MASE loss as defined in "Scaled Errors" https://robjhyndman.com/papers/mase.pdf
59 | 
60 |         :param insample: Insample values. Shape: batch, time_i
61 |         :param freq: Frequency value
62 |         :param forecast: Forecast values. Shape: batch, time_o
63 |         :param target: Target values. Shape: batch, time_o
64 |         :param mask: 0/1 mask. Shape: batch, time_o
65 |         :return: Loss value
66 |         """
67 |         masep = t.mean(t.abs(insample[:, freq:] - insample[:, :-freq]), dim=1)
68 |         masked_masep_inv = divide_no_nan(mask, masep[:, None])
69 |         return t.mean(t.abs(target - forecast) * masked_masep_inv)
70 | 
71 | 
72 | class AutomaticWeightedLoss(nn.Module):
73 |     """Automatically weighted multi-task loss.
74 |     Params:
75 |         num: int, the number of losses
76 |         x: the individual task losses
77 |     Examples:
78 |         loss1=1
79 |         loss2=2
80 |         awl = AutomaticWeightedLoss(2)
81 |         loss_sum = awl(loss1, loss2)
82 |     """
83 | 
84 |     def __init__(self, num=2):
85 |         super(AutomaticWeightedLoss, self).__init__()
86 |         params = torch.ones(num, requires_grad=True)
87 |         self.params = nn.Parameter(params)
88 | 
89 |     def forward(self, *x):
90 |         loss_sum = 0
91 |         for i, loss in enumerate(x):
92 |             loss_sum += 0.5 / (self.params[i] ** 2) * loss + torch.log(1 + self.params[i] ** 2)
93 |         return loss_sum
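
As a worked example of the weighting used in `AutomaticWeightedLoss.forward`, the sketch below evaluates the same expression, `0.5 / p**2 * loss + log(1 + p**2)`, in plain Python. The function name `automatic_weighted_sum` is hypothetical; it takes the weights as an explicit list instead of learnable parameters, so it only illustrates the arithmetic, not the training behaviour.

```python
import math


def automatic_weighted_sum(losses, params):
    """Sum of 0.5 / p^2 * loss + log(1 + p^2) over (loss, p) pairs."""
    total = 0.0
    for loss, p in zip(losses, params):
        total += 0.5 / (p ** 2) * loss + math.log(1 + p ** 2)
    return total


# With the initial weights (all ones) and the docstring example losses 1 and 2:
# (0.5 * 1 + ln 2) + (0.5 * 2 + ln 2) = 1.5 + 2 ln 2
initial = automatic_weighted_sum([1.0, 2.0], [1.0, 1.0])
print(round(initial, 3))  # 2.886
```

The `log(1 + p**2)` term grows with the weight, which discourages the trivial solution of making every weight very large to shrink the weighted losses.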


--------------------------------------------------------------------------------
/SimMTM_Forecasting/utils/m4_summary.py:
--------------------------------------------------------------------------------
  1 | # This source code is provided for the purposes of scientific reproducibility
  2 | # under the following limited license from Element AI Inc. The code is an
  3 | # implementation of the N-BEATS model (Oreshkin et al., N-BEATS: Neural basis
  4 | # expansion analysis for interpretable time series forecasting,
  5 | # https://arxiv.org/abs/1905.10437). The copyright to the source code is
  6 | # licensed under the Creative Commons - Attribution-NonCommercial 4.0
  7 | # International license (CC BY-NC 4.0):
  8 | # https://creativecommons.org/licenses/by-nc/4.0/.  Any commercial use (whether
  9 | # for the benefit of third parties or internally in production) requires an
 10 | # explicit license. The subject-matter of the N-BEATS model and associated
 11 | # materials are the property of Element AI Inc. and may be subject to patent
 12 | # protection. No license to patents is granted hereunder (whether express or
 13 | # implied). Copyright 2020 Element AI Inc. All rights reserved.
 14 | 
 15 | """
 16 | M4 Summary
 17 | """
 18 | from collections import OrderedDict
 19 | 
 20 | import numpy as np
 21 | import pandas as pd
 22 | 
 23 | from data_provider.m4 import M4Dataset
 24 | from data_provider.m4 import M4Meta
 25 | import os
 26 | 
 27 | 
 28 | def group_values(values, groups, group_name):
 29 |     return np.array([v[~np.isnan(v)] for v in values[groups == group_name]])
 30 | 
 31 | 
 32 | def mase(forecast, insample, outsample, frequency):
 33 |     return np.mean(np.abs(forecast - outsample)) / np.mean(np.abs(insample[:-frequency] - insample[frequency:]))
 34 | 
 35 | 
 36 | def smape_2(forecast, target):
 37 |     denom = np.abs(target) + np.abs(forecast)
 38 |     # divide by 1.0 instead of 0.0; when denom is zero the numerator is 0.0 anyway.
 39 |     denom[denom == 0.0] = 1.0
 40 |     return 200 * np.abs(forecast - target) / denom
 41 | 
 42 | 
 43 | def mape(forecast, target):
 44 |     denom = np.abs(target)
 45 |     # divide by 1.0 instead of 0.0 to avoid division by zero when the target is zero.
 46 |     denom[denom == 0.0] = 1.0
 47 |     return 100 * np.abs(forecast - target) / denom
 48 | 
 49 | 
 50 | class M4Summary:
 51 |     def __init__(self, file_path, root_path):
 52 |         self.file_path = file_path
 53 |         self.training_set = M4Dataset.load(training=True, dataset_file=root_path)
 54 |         self.test_set = M4Dataset.load(training=False, dataset_file=root_path)
 55 |         self.naive_path = os.path.join(root_path, 'submission-Naive2.csv')
 56 | 
 57 |     def evaluate(self):
 58 |         """
 59 |         Evaluate forecasts using M4 test dataset.
 60 | 
 61 |         Forecasts are read from `file_path + group_name + "_forecast.csv"`. Shape: timeseries, time.
 62 |         :return: sMAPE, OWA, MAPE and MASE grouped by seasonal patterns.
 63 |         """
 64 |         grouped_owa = OrderedDict()
 65 | 
 66 |         naive2_forecasts = pd.read_csv(self.naive_path).values[:, 1:].astype(np.float32)
 67 |         naive2_forecasts = np.array([v[~np.isnan(v)] for v in naive2_forecasts])
 68 | 
 69 |         model_mases = {}
 70 |         naive2_smapes = {}
 71 |         naive2_mases = {}
 72 |         grouped_smapes = {}
 73 |         grouped_mapes = {}
 74 |         for group_name in M4Meta.seasonal_patterns:
 75 |             file_name = self.file_path + group_name + "_forecast.csv"
 76 |             if os.path.exists(file_name):
 77 |                 model_forecast = pd.read_csv(file_name).values
 78 | 
 79 |             naive2_forecast = group_values(naive2_forecasts, self.test_set.groups, group_name)
 80 |             target = group_values(self.test_set.values, self.test_set.groups, group_name)
 81 |             # all timeseries within group have same frequency
 82 |             frequency = self.training_set.frequencies[self.test_set.groups == group_name][0]
 83 |             insample = group_values(self.training_set.values, self.test_set.groups, group_name)
 84 | 
 85 |             model_mases[group_name] = np.mean([mase(forecast=model_forecast[i],
 86 |                                                     insample=insample[i],
 87 |                                                     outsample=target[i],
 88 |                                                     frequency=frequency) for i in range(len(model_forecast))])
 89 |             naive2_mases[group_name] = np.mean([mase(forecast=naive2_forecast[i],
 90 |                                                      insample=insample[i],
 91 |                                                      outsample=target[i],
 92 |                                                      frequency=frequency) for i in range(len(model_forecast))])
 93 | 
 94 |             naive2_smapes[group_name] = np.mean(smape_2(naive2_forecast, target))
 95 |             grouped_smapes[group_name] = np.mean(smape_2(forecast=model_forecast, target=target))
 96 |             grouped_mapes[group_name] = np.mean(mape(forecast=model_forecast, target=target))
 97 | 
 98 |         grouped_smapes = self.summarize_groups(grouped_smapes)
 99 |         grouped_mapes = self.summarize_groups(grouped_mapes)
100 |         grouped_model_mases = self.summarize_groups(model_mases)
101 |         grouped_naive2_smapes = self.summarize_groups(naive2_smapes)
102 |         grouped_naive2_mases = self.summarize_groups(naive2_mases)
103 |         for k in grouped_model_mases.keys():
104 |             grouped_owa[k] = (grouped_model_mases[k] / grouped_naive2_mases[k] +
105 |                               grouped_smapes[k] / grouped_naive2_smapes[k]) / 2
106 | 
107 |         def round_all(d):
108 |             return dict(map(lambda kv: (kv[0], np.round(kv[1], 3)), d.items()))
109 | 
110 |         return round_all(grouped_smapes), round_all(grouped_owa), round_all(grouped_mapes), round_all(
111 |             grouped_model_mases)
112 | 
113 |     def summarize_groups(self, scores):
114 |         """
115 |         Re-group scores respecting M4 rules.
116 |         :param scores: Scores per group.
117 |         :return: Grouped scores.
118 |         """
119 |         scores_summary = OrderedDict()
120 | 
121 |         def group_count(group_name):
122 |             return len(np.where(self.test_set.groups == group_name)[0])
123 | 
124 |         weighted_score = {}
125 |         for g in ['Yearly', 'Quarterly', 'Monthly']:
126 |             weighted_score[g] = scores[g] * group_count(g)
127 |             scores_summary[g] = scores[g]
128 | 
129 |         others_score = 0
130 |         others_count = 0
131 |         for g in ['Weekly', 'Daily', 'Hourly']:
132 |             others_score += scores[g] * group_count(g)
133 |             others_count += group_count(g)
134 |         weighted_score['Others'] = others_score
135 |         scores_summary['Others'] = others_score / others_count
136 | 
137 |         average = np.sum(list(weighted_score.values())) / len(self.test_set.groups)
138 |         scores_summary['Average'] = average
139 | 
140 |         return scores_summary
141 | 


--------------------------------------------------------------------------------
/SimMTM_Forecasting/utils/masking.py:
--------------------------------------------------------------------------------
 1 | import torch
 2 | 
 3 | 
 4 | class TriangularCausalMask():
 5 |     def __init__(self, B, L, device="cpu"):
 6 |         mask_shape = [B, 1, L, L]
 7 |         with torch.no_grad():
 8 |             self._mask = torch.triu(torch.ones(mask_shape, dtype=torch.bool), diagonal=1).to(device)
 9 | 
10 |     @property
11 |     def mask(self):
12 |         return self._mask
13 | 
14 | 
15 | class ProbMask():
16 |     def __init__(self, B, H, L, index, scores, device="cpu"):
17 |         _mask = torch.ones(L, scores.shape[-1], dtype=torch.bool).to(device).triu(1)
18 |         _mask_ex = _mask[None, None, :].expand(B, H, L, scores.shape[-1])
19 |         indicator = _mask_ex[torch.arange(B)[:, None, None],
20 |                     torch.arange(H)[None, :, None],
21 |                     index, :].to(device)
22 |         self._mask = indicator.view(scores.shape).to(device)
23 | 
24 |     @property
25 |     def mask(self):
26 |         return self._mask
27 | 


--------------------------------------------------------------------------------
/SimMTM_Forecasting/utils/metrics.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | 
 3 | 
 4 | def RSE(pred, true):
 5 |     return np.sqrt(np.sum((true - pred) ** 2)) / np.sqrt(np.sum((true - true.mean()) ** 2))
 6 | 
 7 | 
 8 | def CORR(pred, true):
 9 |     u = ((true - true.mean(0)) * (pred - pred.mean(0))).sum(0)
10 |     d = np.sqrt(((true - true.mean(0)) ** 2).sum(0) * ((pred - pred.mean(0)) ** 2).sum(0))
11 |     return (u / d).mean(-1)
12 | 
13 | 
14 | def MAE(pred, true):
15 |     return np.mean(np.abs(pred - true))
16 | 
17 | 
18 | def MSE(pred, true):
19 |     return np.mean((pred - true) ** 2)
20 | 
21 | 
22 | def RMSE(pred, true):
23 |     return np.sqrt(MSE(pred, true))
24 | 
25 | 
26 | def MAPE(pred, true):
27 |     return np.mean(np.abs((pred - true) / true))
28 | 
29 | 
30 | def MSPE(pred, true):
31 |     return np.mean(np.square((pred - true) / true))
32 | 
33 | 
34 | def metric(pred, true):
35 |     mae = MAE(pred, true)
36 |     mse = MSE(pred, true)
37 |     rmse = RMSE(pred, true)
38 |     mape = MAPE(pred, true)
39 |     mspe = MSPE(pred, true)
40 | 
41 |     return mae, mse, rmse, mape, mspe
42 | 
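
For reference, this is a plain-Python sketch of the Pearson correlation coefficient that a correlation metric such as `CORR` is conventionally expected to report. The function name `pearson` is hypothetical, and it handles a single pair of 1-D sequences rather than the batched arrays used above.

```python
import math


def pearson(a, b):
    """Pearson correlation: covariance term over the product of the norms."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    dev_a = [x - mean_a for x in a]
    dev_b = [y - mean_b for y in b]
    numerator = sum(x * y for x, y in zip(dev_a, dev_b))
    # the denominator multiplies the two sums of squares, then takes the root
    denominator = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    return numerator / denominator


print(pearson([1, 2, 3], [2, 4, 6]))  # 1.0
print(pearson([1, 2, 3], [6, 4, 2]))  # -1.0
```

A perfectly linear relationship gives exactly 1 or -1, which is a quick sanity check for any implementation of this metric.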


--------------------------------------------------------------------------------
/SimMTM_Forecasting/utils/timefeatures.py:
--------------------------------------------------------------------------------
  1 | from typing import List
  2 | 
  3 | import numpy as np
  4 | import pandas as pd
  5 | from pandas.tseries import offsets
  6 | from pandas.tseries.frequencies import to_offset
  7 | 
  8 | 
  9 | class TimeFeature:
 10 |     def __init__(self):
 11 |         pass
 12 | 
 13 |     def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
 14 |         pass
 15 | 
 16 |     def __repr__(self):
 17 |         return self.__class__.__name__ + "()"
 18 | 
 19 | 
 20 | class SecondOfMinute(TimeFeature):
 21 |     """Second of minute encoded as value between [-0.5, 0.5]"""
 22 | 
 23 |     def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
 24 |         return index.second / 59.0 - 0.5
 25 | 
 26 | 
 27 | class MinuteOfHour(TimeFeature):
 28 |     """Minute of hour encoded as value between [-0.5, 0.5]"""
 29 | 
 30 |     def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
 31 |         return index.minute / 59.0 - 0.5
 32 | 
 33 | 
 34 | class HourOfDay(TimeFeature):
 35 |     """Hour of day encoded as value between [-0.5, 0.5]"""
 36 | 
 37 |     def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
 38 |         return index.hour / 23.0 - 0.5
 39 | 
 40 | 
 41 | class DayOfWeek(TimeFeature):
 42 |     """Day of week encoded as value between [-0.5, 0.5]"""
 43 | 
 44 |     def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
 45 |         return index.dayofweek / 6.0 - 0.5
 46 | 
 47 | 
 48 | class DayOfMonth(TimeFeature):
 49 |     """Day of month encoded as value between [-0.5, 0.5]"""
 50 | 
 51 |     def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
 52 |         return (index.day - 1) / 30.0 - 0.5
 53 | 
 54 | 
 55 | class DayOfYear(TimeFeature):
 56 |     """Day of year encoded as value between [-0.5, 0.5]"""
 57 | 
 58 |     def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
 59 |         return (index.dayofyear - 1) / 365.0 - 0.5
 60 | 
 61 | 
 62 | class MonthOfYear(TimeFeature):
 63 |     """Month of year encoded as value between [-0.5, 0.5]"""
 64 | 
 65 |     def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
 66 |         return (index.month - 1) / 11.0 - 0.5
 67 | 
 68 | 
 69 | class WeekOfYear(TimeFeature):
 70 |     """Week of year encoded as value between [-0.5, 0.5]"""
 71 | 
 72 |     def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
 73 |         return (index.isocalendar().week - 1) / 52.0 - 0.5
 74 | 
 75 | 
 76 | def time_features_from_frequency_str(freq_str: str) -> List[TimeFeature]:
 77 |     """
 78 |     Returns a list of time features that will be appropriate for the given frequency string.
 79 |     Parameters
 80 |     ----------
 81 |     freq_str
 82 |         Frequency string of the form [multiple][granularity] such as "12H", "5min", "1D" etc.
 83 |     """
 84 | 
 85 |     features_by_offsets = {
 86 |         offsets.YearEnd: [],
 87 |         offsets.QuarterEnd: [MonthOfYear],
 88 |         offsets.MonthEnd: [MonthOfYear],
 89 |         offsets.Week: [DayOfMonth, WeekOfYear],
 90 |         offsets.Day: [DayOfWeek, DayOfMonth, DayOfYear],
 91 |         offsets.BusinessDay: [DayOfWeek, DayOfMonth, DayOfYear],
 92 |         offsets.Hour: [HourOfDay, DayOfWeek, DayOfMonth, DayOfYear],
 93 |         offsets.Minute: [
 94 |             MinuteOfHour,
 95 |             HourOfDay,
 96 |             DayOfWeek,
 97 |             DayOfMonth,
 98 |             DayOfYear,
 99 |         ],
100 |         offsets.Second: [
101 |             SecondOfMinute,
102 |             MinuteOfHour,
103 |             HourOfDay,
104 |             DayOfWeek,
105 |             DayOfMonth,
106 |             DayOfYear,
107 |         ],
108 |     }
109 | 
110 |     offset = to_offset(freq_str)
111 | 
112 |     for offset_type, feature_classes in features_by_offsets.items():
113 |         if isinstance(offset, offset_type):
114 |             return [cls() for cls in feature_classes]
115 | 
116 |     supported_freq_msg = f"""
117 |     Unsupported frequency {freq_str}
118 |     The following frequencies are supported:
119 |         Y   - yearly
120 |             alias: A
121 |         M   - monthly
122 |         W   - weekly
123 |         D   - daily
124 |         B   - business days
125 |         H   - hourly
126 |         T   - minutely
127 |             alias: min
128 |         S   - secondly
129 |     """
130 |     raise RuntimeError(supported_freq_msg)
131 | 
132 | 
133 | def time_features(dates, freq='h'):
134 |     return np.vstack([feat(dates) for feat in time_features_from_frequency_str(freq)])
135 | 
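
Each `TimeFeature` above rescales a calendar field to the range [-0.5, 0.5]. The sketch below reproduces the four features listed for hourly data (`HourOfDay`, `DayOfWeek`, `DayOfMonth`, `DayOfYear`) for a single timestamp using only the standard library; the function name `hourly_features` is hypothetical, and it relies on `datetime.weekday()` using the same Monday=0 convention as pandas `dayofweek`.

```python
from datetime import datetime


def hourly_features(ts):
    """Return [hour, weekday, day-of-month, day-of-year], each scaled to [-0.5, 0.5]."""
    day_of_year = ts.timetuple().tm_yday
    return [
        ts.hour / 23.0 - 0.5,            # 0..23  -> -0.5..0.5
        ts.weekday() / 6.0 - 0.5,        # Mon=0..Sun=6
        (ts.day - 1) / 30.0 - 0.5,       # 1..31  -> -0.5..0.5
        (day_of_year - 1) / 365.0 - 0.5, # 1..366 -> -0.5..0.5
    ]


# Midnight on 1 January 2020, a Wednesday: every field except the weekday
# sits at the lower bound of the range.
features = hourly_features(datetime(2020, 1, 1, 0, 0))
print(features[0], features[2], features[3])  # -0.5 -0.5 -0.5
```

The weekday entry for this timestamp is `2 / 6 - 0.5`, about -0.167, since Wednesday is index 2.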


--------------------------------------------------------------------------------
/figs/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/figs/.DS_Store


--------------------------------------------------------------------------------
/figs/mainresult.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/figs/mainresult.png


--------------------------------------------------------------------------------
/figs/overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/figs/overview.png


--------------------------------------------------------------------------------