├── README.md
├── SimMTM_Classification
│   ├── .gitignore
│   ├── code
│   │   ├── config_files
│   │   │   ├── Epilepsy_Configs.py
│   │   │   └── SleepEEG_Configs.py
│   │   ├── dataloader.py
│   │   ├── layers
│   │   │   ├── AutoCorrelation.py
│   │   │   ├── Autoformer_EncDec.py
│   │   │   ├── Embed.py
│   │   │   ├── SelfAttention_Family.py
│   │   │   ├── Transformer_EncDec.py
│   │   │   └── __init__.py
│   │   ├── loss.py
│   │   ├── main.py
│   │   ├── model.py
│   │   ├── trainer.py
│   │   └── utils
│   │       ├── __init__.py
│   │       ├── augmentations.py
│   │       ├── loss.py
│   │       ├── masking.py
│   │       ├── metrics.py
│   │       ├── timefeatures.py
│   │       ├── tools.py
│   │       └── utils.py
│   ├── download_datasets.sh
│   └── run.sh
├── SimMTM_Forecasting
│   ├── .DS_Store
│   ├── .gitignore
│   ├── data_provider
│   │   ├── __init__.py
│   │   ├── data_factory.py
│   │   ├── data_loader.py
│   │   ├── m4.py
│   │   └── uea.py
│   ├── exp
│   │   ├── .DS_Store
│   │   ├── __init__.py
│   │   ├── exp_basic.py
│   │   └── exp_simmtm.py
│   ├── layers
│   │   ├── AutoCorrelation.py
│   │   ├── Autoformer_EncDec.py
│   │   ├── Conv_Blocks.py
│   │   ├── ETSformer_EncDec.py
│   │   ├── Embed.py
│   │   ├── FourierCorrelation.py
│   │   ├── MultiWaveletCorrelation.py
│   │   ├── Pyraformer_EncDec.py
│   │   ├── SelfAttention_Family.py
│   │   ├── Transformer_EncDec.py
│   │   └── __init__.py
│   ├── models
│   │   ├── .DS_Store
│   │   ├── PatchTST.py
│   │   ├── SimMTM.py
│   │   ├── __init__.py
│   │   └── iTransformer.py
│   ├── run.py
│   ├── scripts
│   │   ├── .DS_Store
│   │   ├── finetune
│   │   │   ├── .DS_Store
│   │   │   ├── ECL_script
│   │   │   │   ├── .DS_Store
│   │   │   │   └── ECL.sh
│   │   │   ├── ETT_script
│   │   │   │   ├── .DS_Store
│   │   │   │   ├── ETTh1.sh
│   │   │   │   ├── ETTh2.sh
│   │   │   │   ├── ETTm1.sh
│   │   │   │   └── ETTm2.sh
│   │   │   ├── Traffic
│   │   │   │   ├── .DS_Store
│   │   │   │   └── Traffic.sh
│   │   │   └── Weather_script
│   │   │       ├── .DS_Store
│   │   │       └── Weather.sh
│   │   └── pretrain
│   │       ├── .DS_Store
│   │       ├── ECL_script
│   │       │   ├── .DS_Store
│   │       │   └── ECL.sh
│   │       ├── ETT_script
│   │       │   ├── .DS_Store
│   │       │   ├── ETTh1.sh
│   │       │   ├── ETTh2.sh
│   │       │   ├── ETTm1.sh
│   │       │   └── ETTm2.sh
│   │       ├── Traffic_script
│   │       │   ├── .DS_Store
│   │       │   └── Traffic.sh
│   │       └── Weather_script
│   │           ├── .DS_Store
│   │           └── Weather.sh
│   └── utils
│       ├── .DS_Store
│       ├── __init__.py
│       ├── augmentations.py
│       ├── losses.py
│       ├── m4_summary.py
│       ├── masking.py
│       ├── metrics.py
│       ├── timefeatures.py
│       └── tools.py
└── figs
    ├── .DS_Store
    ├── mainresult.png
    └── overview.png
/README.md:
--------------------------------------------------------------------------------
1 |
2 | # SimMTM (NeurIPS 2023)
3 |
4 | This is the codebase for the paper: [SimMTM: A Simple Pre-Training Framework for Masked Time-Series Modeling](https://arxiv.org/abs/2302.00861)
5 |
6 |
7 | ## Architecture
8 |
9 | ![Overview of SimMTM](figs/overview.png)
10 |
11 |
12 | Figure 1. Overview of SimMTM.
13 |
14 |
15 | The reconstruction process of SimMTM involves the following four modules: masking, representation learning, series-wise similarity learning, and point-wise reconstruction.
16 |
17 | ### Masking
18 |
19 | We can easily generate a set of masked series for each sample by randomly masking a portion of time points along the temporal dimension.
20 |
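A minimal sketch of this step (independent Bernoulli masking for brevity; the actual implementation in `code/utils/augmentations.py` uses geometric masking of contiguous subsequences, and the function and variable names below are illustrative):

```python
import torch

def random_mask(x: torch.Tensor, masking_ratio: float = 0.5, positive_nums: int = 3):
    # x: (batch, channels, length); stack `positive_nums` copies of each sample,
    # then zero out a random `masking_ratio` portion of time points in each copy
    x_rep = x.repeat(positive_nums, 1, 1)
    keep = torch.rand_like(x_rep) > masking_ratio   # True = keep this time point
    return x_rep * keep

x = torch.randn(8, 1, 178)        # e.g. a small batch of Epilepsy-sized series
x_masked = random_mask(x)         # (24, 1, 178): three masked views per sample
```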
21 | ### Representation Learning
22 |
23 | After the encoder and the projector, we obtain the point-wise representations and the series-wise representations.
24 |
25 | ### Series-wise Similarity Learning
26 |
27 | To precisely reconstruct the original time series, we utilize the similarities among series-wise representations as weights for aggregation, thereby exploiting the local structure of the time-series manifold.
28 |
29 | ### Point-wise Reconstruction
30 |
31 | Based on the learned series-wise similarities, we aggregate the point-wise representations of each sample's own masked series and of the other series to reconstruct the original time series.
32 |
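A simplified sketch of these last two steps, condensed from `ContrastiveWeight` and `AggregationRebuild` in `SimMTM_Classification/code/loss.py` (sizes and names here are illustrative):

```python
import torch
import torch.nn.functional as F

# z: series-wise representations of all series in the batch (originals + masked views)
# h: the corresponding point-wise representations, flattened per series
z = torch.randn(32, 128)     # (num_series, series_dim)
h = torch.randn(32, 1280)    # (num_series, flattened point-wise dim)

z_norm = F.normalize(z, dim=1)
sim = z_norm @ z_norm.T                        # cosine similarities between series
sim = sim / 0.2                                # temperature scaling
sim = sim - 1e12 * torch.eye(sim.size(0))      # exclude self-similarity
weights = torch.softmax(sim, dim=-1)           # series-wise aggregation weights
rebuilt = weights @ h                          # similarity-weighted aggregation
# a linear head then maps `rebuilt` back to the input space for the MSE reconstruction loss
```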
33 |
34 | ## Get Started
35 |
36 | 1. Prepare Data.
37 |
38 | All benchmark datasets can be obtained from [Google Drive](https://drive.google.com/file/d/1CC4ZrUD4EKncndzgy5PSTzOPSqcuyqqj/view?usp=sharing) or [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/a238e34ff81a42878d50/?dl=1). Arrange the folders as follows:
39 |
40 | ```plain
41 | SimMTM/
42 | |--SimMTM_Forecasting
43 | |-- dataset/
44 | |-- ETT-small/
45 | |-- ETTh1.csv
46 | |-- ETTh2.csv
47 | |-- ETTm1.csv
48 | |-- ETTm2.csv
49 | |-- weather/
50 | |-- weather.csv
51 | |-- ...
52 | |-- ...
53 | |--SimMTM_Classification
54 | |-- dataset/
55 | |-- SleepEEG/
56 | |-- train.pt
57 | |-- val.pt
58 | |-- test.pt
59 | |-- FD-B/
60 | |-- ...
61 | |-- EMG/
62 | |-- ...
63 | |-- ...
64 | ```
65 |
66 | 2. Forecasting
67 |
68 | We provide the forecasting code in `./SimMTM_Forecasting`, and the experiment scripts can be found under the folder `./scripts`. To run the code on ETTh2, just run the following commands:
69 |
70 | ```bash
71 | cd ./SimMTM_Forecasting
72 | # pre-training
73 | sh ./scripts/pretrain/ETT_script/ETTh2.sh
74 | # fine-tuning
75 | sh ./scripts/finetune/ETT_script/ETTh2.sh
76 | ```
77 |
78 | 3. Classification
79 |
80 | We also provide the classification code in `./SimMTM_Classification`. To pre-train a model on SleepEEG and fine-tune it on Epilepsy, run:
81 |
82 | ```bash
83 | cd ./SimMTM_Classification
84 | python ./code/main.py --training_mode pre_train --pretrain_dataset SleepEEG --target_dataset Epilepsy
85 | ```
86 |
87 | 4. We also provide some [checkpoints](https://cloud.tsinghua.edu.cn/f/466995bb5f924f55a6da/?dl=1) that you can fine-tune directly on the target datasets.
88 |
89 | ## Main Results
90 |
91 | ![Main results](figs/mainresult.png)
92 |
93 |
94 |
95 |
96 | SimMTM (marked by red stars) simultaneously covers high-level and low-level tasks in both in-domain and cross-domain settings and significantly outperforms the other baselines, highlighting its task generality. More results can be found in our paper.
97 |
98 | ## Citation
99 | If you find this repo useful, please cite our paper.
100 |
101 | ```plain
102 | @inproceedings{dong2023simmtm,
103 | title={SimMTM: A Simple Pre-Training Framework for Masked Time-Series Modeling},
104 |   author={Jiaxiang Dong and Haixu Wu and Haoran Zhang and Li Zhang and Jianmin Wang and Mingsheng Long},
105 | booktitle={Advances in Neural Information Processing Systems},
106 | year={2023}
107 | }
108 | ```
109 |
110 | ## Contact
111 |
112 | If you have any questions, please contact [djx20@mails.tsinghua.edu.cn](mailto:djx20@mails.tsinghua.edu.cn).
113 |
114 | ## Acknowledgement
115 |
116 | We appreciate the following GitHub repositories for their valuable code bases and datasets:
117 |
118 | https://github.com/thuml/Time-Series-Library
119 |
120 | https://github.com/mims-harvard/TFC-pretraining/tree/main
121 |
122 | Thanks to [vincentsham](https://github.com/vincentsham/simmtm/blob/main/experiments_simmtm-BeijingPM25Quality.ipynb) for reproducing our code.
--------------------------------------------------------------------------------
/SimMTM_Classification/.gitignore:
--------------------------------------------------------------------------------
1 | **/__pycache__/
2 | **/.DS_Store
3 | old_README.md
4 | **/.idea
5 |
--------------------------------------------------------------------------------
/SimMTM_Classification/code/config_files/Epilepsy_Configs.py:
--------------------------------------------------------------------------------
1 | class Config(object):
2 | def __init__(self):
3 | # model configs
4 | self.input_channels = 1
5 | self.kernel_size = 8
6 | self.stride = 1
7 | self.final_out_channels = 32 #128
8 |
9 | self.num_classes = 2
10 | self.num_classes_target = 3
11 | self.dropout = 0.35
12 | self.features_len = 24
13 |         self.features_len_f = 24  # output feature length in the time domain
14 |
15 |         # training configs
16 |         self.num_epoch = 40
17 |
18 | # optimizer parameters
19 | self.beta1 = 0.9
20 | self.beta2 = 0.99
21 | self.lr = 3e-4 # original lr: 3e-4
22 | self.lr_f = 3e-4
23 |
24 | # data parameters
25 | self.drop_last = True
26 | self.batch_size = 32 #64 # 128
27 |         self.target_batch_size = 16  # batch size for the target dataset (the number of samples used for fine-tuning)
28 |
29 | self.Context_Cont = Context_Cont_configs()
30 | self.TC = TC()
31 | self.augmentation = augmentations()
32 |
33 |
34 | class augmentations(object):
35 | def __init__(self):
36 | self.jitter_scale_ratio = 0.001
37 | self.jitter_ratio = 0.001
38 | self.max_seg = 5
39 |
40 |
41 | class Context_Cont_configs(object):
42 | def __init__(self):
43 | self.temperature = 0.2
44 | self.use_cosine_similarity = True
45 | self.use_cosine_similarity_f = True
46 |
47 |
48 | class TC(object):
49 | def __init__(self):
50 | self.hidden_dim = 100
51 | self.timesteps = 10
52 |
--------------------------------------------------------------------------------
/SimMTM_Classification/code/config_files/SleepEEG_Configs.py:
--------------------------------------------------------------------------------
1 |
2 | class Config(object):
3 | def __init__(self):
4 | # model configs
5 | self.input_channels = 1
6 | self.increased_dim = 1
7 | self.final_out_channels = 128
8 | self.num_classes = 5
9 | self.num_classes_target = 8
10 | self.dropout = 0.2
11 | self.masking_ratio = 0.5
12 | self.lm = 3 # average length of masking subsequences
13 |
14 | self.kernel_size = 25
15 | self.stride = 3
16 | self.features_len = 127
17 | self.features_len_f = self.features_len
18 |
19 | self.TSlength_aligned = 178
20 |
21 | self.CNNoutput_channel = 10 # 90 # 10 for Epilepsy model
22 |
23 | # training configs
24 | self.num_epoch = 40
25 |
26 | # optimizer parameters
27 | self.optimizer = 'adam'
28 | self.beta1 = 0.9
29 | self.beta2 = 0.99
30 | self.lr = 3e-8 # 3e-4
31 | self.lr_f = self.lr
32 |
33 | # data parameters
34 | self.drop_last = True
35 | self.batch_size = 32
36 |
37 |         """For Epilepsy, the target batch size is 60"""
38 |         self.target_batch_size = 32  # batch size for the target dataset (the number of samples used for fine-tuning)
39 |
40 | self.Context_Cont = Context_Cont_configs()
41 | self.TC = TC()
42 | self.augmentation = augmentations()
43 |
44 |
45 | class augmentations(object):
46 | def __init__(self):
47 | self.jitter_scale_ratio = 1.5
48 | self.jitter_ratio = 2
49 | self.max_seg = 12
50 |
51 |
52 | class Context_Cont_configs(object):
53 | def __init__(self):
54 | self.temperature = 0.2
55 | self.use_cosine_similarity = True
56 |
57 |
58 | class TC(object):
59 | def __init__(self):
60 | self.hidden_dim = 64
61 | self.timesteps = 50
62 |
--------------------------------------------------------------------------------
/SimMTM_Classification/code/dataloader.py:
--------------------------------------------------------------------------------
1 | import torch
2 | from torch.utils.data import DataLoader
3 | from torch.utils.data import Dataset
4 | import os
5 | import numpy as np
6 |
7 |
8 | class Load_Dataset(Dataset):
9 | # Initialize your data, download, etc.
10 | def __init__(self, dataset, config, training_mode, target_dataset_size=64, subset=False):
11 | super(Load_Dataset, self).__init__()
12 | self.training_mode = training_mode
13 | X_train = dataset["samples"]
14 | y_train = dataset["labels"]
15 | # shuffle
16 | data = list(zip(X_train, y_train))
17 | np.random.shuffle(data)
18 | X_train, y_train = zip(*data)
19 | X_train, y_train = torch.stack(list(X_train), dim=0), torch.stack(list(y_train), dim=0)
20 |
21 | if len(X_train.shape) < 3:
22 | X_train = X_train.unsqueeze(2)
23 |
24 |         if X_train.shape.index(min(X_train.shape)) != 1:  # ensure channels are in the second dimension
25 | X_train = X_train.permute(0, 2, 1)
26 |
27 |         """Align the TS length between source and target datasets"""
28 |         # take the first channel and the first TSlength_aligned time points
29 |         X_train = X_train[:, :1, :int(config.TSlength_aligned)]
30 |
31 |         """Subset for debugging"""
32 |         if subset:
33 |             subset_size = target_dataset_size * 10
34 |             # keep only the first subset_size samples
35 |             X_train = X_train[:subset_size]
36 |             y_train = y_train[:subset_size]
37 |
38 | if isinstance(X_train, np.ndarray):
39 | self.x_data = torch.from_numpy(X_train)
40 | self.y_data = torch.from_numpy(y_train).long()
41 | else:
42 | self.x_data = X_train
43 | self.y_data = y_train
44 |
45 | self.len = X_train.shape[0]
46 |
47 | def __getitem__(self, index):
48 | return self.x_data[index], self.y_data[index]
49 |
50 | def __len__(self):
51 | return self.len
52 |
53 |
54 | def data_generator(sourcedata_path, targetdata_path, configs, training_mode, subset=True):
55 |
56 | train_dataset = torch.load(os.path.join(sourcedata_path, "train.pt"))
57 | finetune_dataset = torch.load(os.path.join(targetdata_path, "train.pt"))
58 | test_dataset = torch.load(os.path.join(targetdata_path, "test.pt"))
59 | """ Dataset notes:
60 | Epilepsy: train_dataset['samples'].shape = torch.Size([7360, 1, 178]); binary labels [7360]
61 | valid: [1840, 1, 178]
62 |     test: [2300, 1, 178]. In the test set, 1835 samples are positive; the positive rate is 0.7978"""
63 | """sleepEDF: finetune_dataset['samples']: [7786, 1, 3000]"""
64 |
65 |     # subset = True  # if True, use a small subset for debugging
66 | train_dataset = Load_Dataset(train_dataset, configs, training_mode, target_dataset_size=configs.batch_size, subset=subset) # for self-supervised, the data are augmented here
67 | finetune_dataset = Load_Dataset(finetune_dataset, configs, training_mode, target_dataset_size=configs.target_batch_size, subset=subset)
68 |     if test_dataset['labels'].shape[0] > 10 * configs.target_batch_size:
69 | test_dataset = Load_Dataset(test_dataset, configs, training_mode, target_dataset_size=configs.target_batch_size*10, subset=subset)
70 | else:
71 | test_dataset = Load_Dataset(test_dataset, configs, training_mode, target_dataset_size=configs.target_batch_size, subset=subset)
72 |
73 | train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=configs.batch_size,
74 | shuffle=True, drop_last=configs.drop_last,
75 | num_workers=0)
76 |
77 | """the valid and test loader would be finetuning set and test set."""
78 | valid_loader = torch.utils.data.DataLoader(dataset=finetune_dataset, batch_size=configs.target_batch_size,
79 | shuffle=True, drop_last=configs.drop_last,
80 | num_workers=0)
81 |
82 | test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=configs.target_batch_size,
83 | shuffle=True, drop_last=False,
84 | num_workers=0)
85 |
86 | return train_loader, valid_loader, test_loader
--------------------------------------------------------------------------------
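The loaders above expect each `train.pt` / `test.pt` file to hold a dictionary of tensors. A minimal sketch of producing such a file (shapes follow the Epilepsy notes in the docstring; the data here is random and purely illustrative):

```python
import torch

# Load_Dataset expects a dict with "samples" of shape (N, channels, length)
# and integer "labels" of shape (N,)
samples = torch.randn(7360, 1, 178)       # e.g. an Epilepsy-sized training split
labels = torch.randint(0, 2, (7360,))     # binary labels
torch.save({"samples": samples, "labels": labels}, "train.pt")

loaded = torch.load("train.pt")
print(loaded["samples"].shape, loaded["labels"].shape)
```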
/SimMTM_Classification/code/layers/AutoCorrelation.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import math
4 |
5 |
6 | class AutoCorrelation(nn.Module):
7 | """
8 | AutoCorrelation Mechanism with the following two phases:
9 | (1) period-based dependencies discovery
10 | (2) time delay aggregation
11 | This block can replace the self-attention family mechanism seamlessly.
12 | """
13 | def __init__(self, mask_flag=True, factor=1, scale=None, attention_dropout=0.1, output_attention=False):
14 | super(AutoCorrelation, self).__init__()
15 | self.factor = factor
16 | self.scale = scale
17 | self.mask_flag = mask_flag
18 | self.output_attention = output_attention
19 | self.dropout = nn.Dropout(attention_dropout)
20 |
21 | def time_delay_agg_training(self, values, corr):
22 | """
23 | SpeedUp version of Autocorrelation (a batch-normalization style design)
24 | This is for the training phase.
25 | """
26 | head = values.shape[1]
27 | channel = values.shape[2]
28 | length = values.shape[3]
29 | # find top k
30 | top_k = int(self.factor * math.log(length))
31 | mean_value = torch.mean(torch.mean(corr, dim=1), dim=1)
32 | index = torch.topk(torch.mean(mean_value, dim=0), top_k, dim=-1)[1]
33 | weights = torch.stack([mean_value[:, index[i]] for i in range(top_k)], dim=-1)
34 | # update corr
35 | tmp_corr = torch.softmax(weights, dim=-1)
36 | # aggregation
37 | tmp_values = values
38 | delays_agg = torch.zeros_like(values).float()
39 | for i in range(top_k):
40 | pattern = torch.roll(tmp_values, -int(index[i]), -1)
41 | delays_agg = delays_agg + pattern * \
42 | (tmp_corr[:, i].unsqueeze(1).unsqueeze(1).unsqueeze(1).repeat(1, head, channel, length))
43 | return delays_agg
44 |
45 | def time_delay_agg_inference(self, values, corr):
46 | """
47 | SpeedUp version of Autocorrelation (a batch-normalization style design)
48 | This is for the inference phase.
49 | """
50 | batch = values.shape[0]
51 | head = values.shape[1]
52 | channel = values.shape[2]
53 | length = values.shape[3]
54 | # index init
55 | init_index = torch.arange(length).unsqueeze(0).unsqueeze(0).unsqueeze(0)\
56 | .repeat(batch, head, channel, 1).to(values.device)
57 | # find top k
58 | top_k = int(self.factor * math.log(length))
59 | mean_value = torch.mean(torch.mean(corr, dim=1), dim=1)
60 | weights, delay = torch.topk(mean_value, top_k, dim=-1)
61 | # update corr
62 | tmp_corr = torch.softmax(weights, dim=-1)
63 | # aggregation
64 | tmp_values = values.repeat(1, 1, 1, 2)
65 | delays_agg = torch.zeros_like(values).float()
66 | for i in range(top_k):
67 | tmp_delay = init_index + delay[:, i].unsqueeze(1).unsqueeze(1).unsqueeze(1).repeat(1, head, channel, length)
68 | pattern = torch.gather(tmp_values, dim=-1, index=tmp_delay)
69 | delays_agg = delays_agg + pattern * \
70 | (tmp_corr[:, i].unsqueeze(1).unsqueeze(1).unsqueeze(1).repeat(1, head, channel, length))
71 | return delays_agg
72 |
73 | def time_delay_agg_full(self, values, corr):
74 | """
75 | Standard version of Autocorrelation
76 | """
77 | batch = values.shape[0]
78 | head = values.shape[1]
79 | channel = values.shape[2]
80 | length = values.shape[3]
81 | # index init
82 | init_index = torch.arange(length).unsqueeze(0).unsqueeze(0).unsqueeze(0)\
83 | .repeat(batch, head, channel, 1).to(values.device)
84 | # find top k
85 | top_k = int(self.factor * math.log(length))
86 | weights, delay = torch.topk(corr, top_k, dim=-1)
87 | # update corr
88 | tmp_corr = torch.softmax(weights, dim=-1)
89 | # aggregation
90 | tmp_values = values.repeat(1, 1, 1, 2)
91 | delays_agg = torch.zeros_like(values).float()
92 | for i in range(top_k):
93 | tmp_delay = init_index + delay[..., i].unsqueeze(-1)
94 | pattern = torch.gather(tmp_values, dim=-1, index=tmp_delay)
95 | delays_agg = delays_agg + pattern * (tmp_corr[..., i].unsqueeze(-1))
96 | return delays_agg
97 |
98 | def forward(self, queries, keys, values, attn_mask):
99 | B, L, H, E = queries.shape
100 | _, S, _, D = values.shape
101 | if L > S:
102 | zeros = torch.zeros_like(queries[:, :(L - S), :]).float()
103 | values = torch.cat([values, zeros], dim=1)
104 | keys = torch.cat([keys, zeros], dim=1)
105 | else:
106 | values = values[:, :L, :, :]
107 | keys = keys[:, :L, :, :]
108 |
109 | # period-based dependencies
110 | q_fft = torch.fft.rfft(queries.permute(0, 2, 3, 1).contiguous(), dim=-1)
111 | k_fft = torch.fft.rfft(keys.permute(0, 2, 3, 1).contiguous(), dim=-1)
112 | res = q_fft * torch.conj(k_fft)
113 | corr = torch.fft.irfft(res, dim=-1)
114 |
115 | # time delay agg
116 | if self.training:
117 | V = self.time_delay_agg_training(values.permute(0, 2, 3, 1).contiguous(), corr).permute(0, 3, 1, 2)
118 | else:
119 | V = self.time_delay_agg_inference(values.permute(0, 2, 3, 1).contiguous(), corr).permute(0, 3, 1, 2)
120 |
121 | if self.output_attention:
122 | return (V.contiguous(), corr.permute(0, 3, 1, 2))
123 | else:
124 | return (V.contiguous(), None)
125 |
126 |
127 | class AutoCorrelationLayer(nn.Module):
128 | def __init__(self, correlation, d_model, n_heads, d_keys=None,
129 | d_values=None):
130 | super(AutoCorrelationLayer, self).__init__()
131 |
132 | d_keys = d_keys or (d_model // n_heads)
133 | d_values = d_values or (d_model // n_heads)
134 |
135 | self.inner_correlation = correlation
136 | self.query_projection = nn.Linear(d_model, d_keys * n_heads)
137 | self.key_projection = nn.Linear(d_model, d_keys * n_heads)
138 | self.value_projection = nn.Linear(d_model, d_values * n_heads)
139 | self.out_projection = nn.Linear(d_values * n_heads, d_model)
140 | self.n_heads = n_heads
141 |
142 | def forward(self, queries, keys, values, attn_mask):
143 | B, L, _ = queries.shape
144 | _, S, _ = keys.shape
145 | H = self.n_heads
146 |
147 | queries = self.query_projection(queries).view(B, L, H, -1)
148 | keys = self.key_projection(keys).view(B, S, H, -1)
149 | values = self.value_projection(values).view(B, S, H, -1)
150 |
151 | out, attn = self.inner_correlation(
152 | queries,
153 | keys,
154 | values,
155 | attn_mask
156 | )
157 | out = out.view(B, L, -1)
158 |
159 | return self.out_projection(out), attn
160 |
--------------------------------------------------------------------------------
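A quick sanity check on the FFT trick used in `forward` (the Wiener–Khinchin identity): the circular autocorrelation computed via `rfft`/`irfft` matches the direct rolled dot products. An illustrative verification:

```python
import torch

x = torch.randn(64)
X = torch.fft.rfft(x)
ac_fft = torch.fft.irfft(X * torch.conj(X), n=x.numel())  # autocorrelation via FFT

# direct circular autocorrelation: R(tau) = sum_t x[t] * x[(t + tau) mod N]
ac_direct = torch.stack([(x * torch.roll(x, -tau)).sum() for tau in range(x.numel())])
print(torch.allclose(ac_fft, ac_direct, atol=1e-4))       # True
```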
/SimMTM_Classification/code/layers/Autoformer_EncDec.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import torch.nn.functional as F
4 |
5 |
6 | class my_Layernorm(nn.Module):
7 | """
8 | Special designed layernorm for the seasonal part
9 | """
10 | def __init__(self, channels):
11 | super(my_Layernorm, self).__init__()
12 | self.layernorm = nn.LayerNorm(channels)
13 |
14 | def forward(self, x):
15 | x_hat = self.layernorm(x)
16 | bias = torch.mean(x_hat, dim=1).unsqueeze(1).repeat(1, x.shape[1], 1)
17 | return x_hat - bias
18 |
19 |
20 | class moving_avg(nn.Module):
21 | """
22 | Moving average block to highlight the trend of time series
23 | """
24 | def __init__(self, kernel_size, stride):
25 | super(moving_avg, self).__init__()
26 | self.kernel_size = kernel_size
27 | self.avg = nn.AvgPool1d(kernel_size=kernel_size, stride=stride, padding=0)
28 |
29 | def forward(self, x):
30 | # padding on the both ends of time series
31 | front = x[:, 0:1, :].repeat(1, (self.kernel_size - 1) // 2, 1)
32 | end = x[:, -1:, :].repeat(1, (self.kernel_size - 1) // 2, 1)
33 | x = torch.cat([front, x, end], dim=1)
34 | x = self.avg(x.permute(0, 2, 1))
35 | x = x.permute(0, 2, 1)
36 | return x
37 |
38 |
39 | class series_decomp(nn.Module):
40 | """
41 | Series decomposition block
42 | """
43 | def __init__(self, kernel_size):
44 | super(series_decomp, self).__init__()
45 | self.moving_avg = moving_avg(kernel_size, stride=1)
46 |
47 | def forward(self, x):
48 | moving_mean = self.moving_avg(x)
49 | res = x - moving_mean
50 | return res, moving_mean
51 |
52 |
53 | class EncoderLayer(nn.Module):
54 | """
55 | Autoformer encoder layer with the progressive decomposition architecture
56 | """
57 | def __init__(self, attention, d_model, d_ff=None, moving_avg=25, dropout=0.1, activation="relu"):
58 | super(EncoderLayer, self).__init__()
59 | d_ff = d_ff or 4 * d_model
60 | self.attention = attention
61 | self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1, bias=False)
62 | self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1, bias=False)
63 | self.decomp1 = series_decomp(moving_avg)
64 | self.decomp2 = series_decomp(moving_avg)
65 | self.dropout = nn.Dropout(dropout)
66 | self.activation = F.relu if activation == "relu" else F.gelu
67 |
68 | def forward(self, x, attn_mask=None):
69 | new_x, attn = self.attention(
70 | x, x, x,
71 | attn_mask=attn_mask
72 | )
73 | x = x + self.dropout(new_x)
74 | x, _ = self.decomp1(x)
75 | y = x
76 | y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1))))
77 | y = self.dropout(self.conv2(y).transpose(-1, 1))
78 | res, _ = self.decomp2(x + y)
79 | return res, attn
80 |
81 |
82 | class Encoder(nn.Module):
83 | """
84 | Autoformer encoder
85 | """
86 | def __init__(self, attn_layers, conv_layers=None, norm_layer=None):
87 | super(Encoder, self).__init__()
88 | self.attn_layers = nn.ModuleList(attn_layers)
89 | self.conv_layers = nn.ModuleList(conv_layers) if conv_layers is not None else None
90 | self.norm = norm_layer
91 |
92 | def forward(self, x, attn_mask=None):
93 | attns = []
94 | if self.conv_layers is not None:
95 | for attn_layer, conv_layer in zip(self.attn_layers, self.conv_layers):
96 | x, attn = attn_layer(x, attn_mask=attn_mask)
97 | x = conv_layer(x)
98 | attns.append(attn)
99 | x, attn = self.attn_layers[-1](x)
100 | attns.append(attn)
101 | else:
102 | for attn_layer in self.attn_layers:
103 | x, attn = attn_layer(x, attn_mask=attn_mask)
104 | attns.append(attn)
105 |
106 | if self.norm is not None:
107 | x = self.norm(x)
108 |
109 | return x, attns
110 |
111 |
112 | class DecoderLayer(nn.Module):
113 | """
114 | Autoformer decoder layer with the progressive decomposition architecture
115 | """
116 | def __init__(self, self_attention, cross_attention, d_model, c_out, d_ff=None,
117 | moving_avg=25, dropout=0.1, activation="relu"):
118 | super(DecoderLayer, self).__init__()
119 | d_ff = d_ff or 4 * d_model
120 | self.self_attention = self_attention
121 | self.cross_attention = cross_attention
122 | self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1, bias=False)
123 | self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1, bias=False)
124 | self.decomp1 = series_decomp(moving_avg)
125 | self.decomp2 = series_decomp(moving_avg)
126 | self.decomp3 = series_decomp(moving_avg)
127 | self.dropout = nn.Dropout(dropout)
128 | self.projection = nn.Conv1d(in_channels=d_model, out_channels=c_out, kernel_size=3, stride=1, padding=1,
129 | padding_mode='circular', bias=False)
130 | self.activation = F.relu if activation == "relu" else F.gelu
131 |
132 | def forward(self, x, cross, x_mask=None, cross_mask=None):
133 | x = x + self.dropout(self.self_attention(
134 | x, x, x,
135 | attn_mask=x_mask
136 | )[0])
137 | x, trend1 = self.decomp1(x)
138 | x = x + self.dropout(self.cross_attention(
139 | x, cross, cross,
140 | attn_mask=cross_mask
141 | )[0])
142 | x, trend2 = self.decomp2(x)
143 | y = x
144 | y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1))))
145 | y = self.dropout(self.conv2(y).transpose(-1, 1))
146 | x, trend3 = self.decomp3(x + y)
147 |
148 | residual_trend = trend1 + trend2 + trend3
149 | residual_trend = self.projection(residual_trend.permute(0, 2, 1)).transpose(1, 2)
150 | return x, residual_trend
151 |
152 |
153 | class Decoder(nn.Module):
154 | """
155 |     Autoformer decoder
156 | """
157 | def __init__(self, layers, norm_layer=None, projection=None):
158 | super(Decoder, self).__init__()
159 | self.layers = nn.ModuleList(layers)
160 | self.norm = norm_layer
161 | self.projection = projection
162 |
163 | def forward(self, x, cross, x_mask=None, cross_mask=None, trend=None):
164 | for layer in self.layers:
165 | x, residual_trend = layer(x, cross, x_mask=x_mask, cross_mask=cross_mask)
166 | trend = trend + residual_trend
167 |
168 | if self.norm is not None:
169 | x = self.norm(x)
170 |
171 | if self.projection is not None:
172 | x = self.projection(x)
173 | return x, trend
174 |
--------------------------------------------------------------------------------
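`series_decomp` splits a series into a seasonal residual and a moving-average trend that sum back exactly to the input; a quick illustrative check using the classes defined above:

```python
import torch

x = torch.randn(2, 96, 7)                 # (batch, length, channels)
decomp = series_decomp(kernel_size=25)
res, trend = decomp(x)
print(res.shape, trend.shape)             # both (2, 96, 7): edge padding keeps the length
print(torch.allclose(res + trend, x))     # True, since res = x - moving_mean
```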
/SimMTM_Classification/code/layers/Embed.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import math
4 |
5 |
6 | class PositionalEmbedding(nn.Module):
7 | def __init__(self, d_model, max_len=5000):
8 | super(PositionalEmbedding, self).__init__()
9 | # Compute the positional encodings once in log space.
10 | pe = torch.zeros(max_len, d_model).float()
11 |         pe.requires_grad = False
12 |
13 | position = torch.arange(0, max_len).float().unsqueeze(1)
14 | div_term = (torch.arange(0, d_model, 2).float() * -(math.log(10000.0) / d_model)).exp()
15 |
16 | pe[:, 0::2] = torch.sin(position * div_term)
17 | pe[:, 1::2] = torch.cos(position * div_term)
18 |
19 | pe = pe.unsqueeze(0)
20 | self.register_buffer('pe', pe)
21 |
22 | def forward(self, x):
23 | return self.pe[:, :x.size(1)]
24 |
25 |
26 | class TokenEmbedding(nn.Module):
27 | def __init__(self, c_in, d_model):
28 | super(TokenEmbedding, self).__init__()
29 | padding = 1 if torch.__version__ >= '1.5.0' else 2
30 | self.tokenConv = nn.Conv1d(in_channels=c_in, out_channels=d_model,
31 | kernel_size=3, padding=padding, padding_mode='circular', bias=False)
32 | for m in self.modules():
33 | if isinstance(m, nn.Conv1d):
34 | nn.init.kaiming_normal_(m.weight, mode='fan_in', nonlinearity='leaky_relu')
35 |
36 | def forward(self, x):
37 | x = self.tokenConv(x.permute(0, 2, 1)).transpose(1, 2)
38 | return x
39 |
40 |
41 | class FixedEmbedding(nn.Module):
42 | def __init__(self, c_in, d_model):
43 | super(FixedEmbedding, self).__init__()
44 |
45 | w = torch.zeros(c_in, d_model).float()
46 |         w.requires_grad = False
47 |
48 | position = torch.arange(0, c_in).float().unsqueeze(1)
49 | div_term = (torch.arange(0, d_model, 2).float() * -(math.log(10000.0) / d_model)).exp()
50 |
51 | w[:, 0::2] = torch.sin(position * div_term)
52 | w[:, 1::2] = torch.cos(position * div_term)
53 |
54 | self.emb = nn.Embedding(c_in, d_model)
55 | self.emb.weight = nn.Parameter(w, requires_grad=False)
56 |
57 | def forward(self, x):
58 | return self.emb(x).detach()
59 |
60 |
61 | class TemporalEmbedding(nn.Module):
62 | def __init__(self, d_model, embed_type='fixed', freq='h'):
63 | super(TemporalEmbedding, self).__init__()
64 |
65 | minute_size = 4
66 | hour_size = 24
67 | weekday_size = 7
68 | day_size = 32
69 | month_size = 13
70 |
71 | Embed = FixedEmbedding if embed_type == 'fixed' else nn.Embedding
72 | if freq == 't':
73 | self.minute_embed = Embed(minute_size, d_model)
74 | self.hour_embed = Embed(hour_size, d_model)
75 | self.weekday_embed = Embed(weekday_size, d_model)
76 | self.day_embed = Embed(day_size, d_model)
77 | self.month_embed = Embed(month_size, d_model)
78 |
79 | def forward(self, x):
80 | x = x.long()
81 |
82 | minute_x = self.minute_embed(x[:, :, 4]) if hasattr(self, 'minute_embed') else 0.
83 | hour_x = self.hour_embed(x[:, :, 3])
84 | weekday_x = self.weekday_embed(x[:, :, 2])
85 | day_x = self.day_embed(x[:, :, 1])
86 | month_x = self.month_embed(x[:, :, 0])
87 |
88 | return hour_x + weekday_x + day_x + month_x + minute_x
89 |
90 |
91 | class TimeFeatureEmbedding(nn.Module):
92 | def __init__(self, d_model, embed_type='timeF', freq='h'):
93 | super(TimeFeatureEmbedding, self).__init__()
94 |
95 | freq_map = {'h': 4, 't': 5, 's': 6, 'm': 1, 'a': 1, 'w': 2, 'd': 3, 'b': 3}
96 | d_inp = freq_map[freq]
97 | self.embed = nn.Linear(d_inp, d_model, bias=False)
98 |
99 | def forward(self, x):
100 | return self.embed(x)
101 |
102 |
103 | class DataEmbedding(nn.Module):
104 | def __init__(self, c_in, d_model, embed_type='fixed', freq='h', dropout=0.1):
105 | super(DataEmbedding, self).__init__()
106 |
107 | self.value_embedding = TokenEmbedding(c_in=c_in, d_model=d_model)
108 | self.position_embedding = PositionalEmbedding(d_model=d_model)
109 | self.temporal_embedding = TemporalEmbedding(d_model=d_model, embed_type=embed_type,
110 | freq=freq) if embed_type != 'timeF' else TimeFeatureEmbedding(
111 | d_model=d_model, embed_type=embed_type, freq=freq)
112 | self.dropout = nn.Dropout(p=dropout)
113 |
114 | def forward(self, x):
115 | x = self.value_embedding(x) + self.position_embedding(x)
116 | return self.dropout(x)
117 |
118 |
119 | class DataEmbedding_wo_pos(nn.Module):
120 | def __init__(self, c_in, d_model, embed_type='fixed', freq='h', dropout=0.1):
121 | super(DataEmbedding_wo_pos, self).__init__()
122 |
123 | self.value_embedding = TokenEmbedding(c_in=c_in, d_model=d_model)
124 | self.position_embedding = PositionalEmbedding(d_model=d_model)
125 | self.temporal_embedding = TemporalEmbedding(d_model=d_model, embed_type=embed_type,
126 | freq=freq) if embed_type != 'timeF' else TimeFeatureEmbedding(
127 | d_model=d_model, embed_type=embed_type, freq=freq)
128 | self.dropout = nn.Dropout(p=dropout)
129 |
130 | def forward(self, x, x_mark):
131 | x = self.value_embedding(x) + self.temporal_embedding(x_mark)
132 | return self.dropout(x)
133 |
--------------------------------------------------------------------------------
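A small shape check for the positional embedding defined above (illustrative sizes):

```python
import torch

pe = PositionalEmbedding(d_model=64)
x = torch.zeros(2, 100, 64)    # (batch, length, d_model)
print(pe(x).shape)             # (1, 100, 64): broadcasts over the batch when added
```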
/SimMTM_Classification/code/layers/SelfAttention_Family.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import numpy as np
4 | from math import sqrt
5 | from utils.masking import TriangularCausalMask, ProbMask
6 |
7 |
8 | class FullAttention(nn.Module):
9 | def __init__(self, mask_flag=True, factor=5, scale=None, attention_dropout=0.1, output_attention=False):
10 | super(FullAttention, self).__init__()
11 | self.scale = scale
12 | self.mask_flag = mask_flag
13 | self.output_attention = output_attention
14 | self.dropout = nn.Dropout(attention_dropout)
15 |
16 | def forward(self, queries, keys, values, attn_mask):
17 | B, L, H, E = queries.shape
18 | _, S, _, D = values.shape
19 | scale = self.scale or 1. / sqrt(E)
20 |
21 | scores = torch.einsum("blhe,bshe->bhls", queries, keys)
22 |
23 | if self.mask_flag:
24 | if attn_mask is None:
25 | attn_mask = TriangularCausalMask(B, L, device=queries.device)
26 |
27 | scores.masked_fill_(attn_mask.mask, -np.inf)
28 |
29 | A = self.dropout(torch.softmax(scale * scores, dim=-1))
30 | V = torch.einsum("bhls,bshd->blhd", A, values)
31 |
32 | if self.output_attention:
33 | return (V.contiguous(), A)
34 | else:
35 | return (V.contiguous(), None)
36 |
37 |
38 | class ProbAttention(nn.Module):
39 | def __init__(self, mask_flag=True, factor=5, scale=None, attention_dropout=0.1, output_attention=False):
40 | super(ProbAttention, self).__init__()
41 | self.factor = factor
42 | self.scale = scale
43 | self.mask_flag = mask_flag
44 | self.output_attention = output_attention
45 | self.dropout = nn.Dropout(attention_dropout)
46 |
47 | def _prob_QK(self, Q, K, sample_k, n_top): # n_top: c*ln(L_q)
48 | # Q [B, H, L, D]
49 | B, H, L_K, E = K.shape
50 | _, _, L_Q, _ = Q.shape
51 |
52 | # calculate the sampled Q_K
53 | K_expand = K.unsqueeze(-3).expand(B, H, L_Q, L_K, E)
54 | index_sample = torch.randint(L_K, (L_Q, sample_k)) # real U = U_part(factor*ln(L_k))*L_q
55 | K_sample = K_expand[:, :, torch.arange(L_Q).unsqueeze(1), index_sample, :]
56 | Q_K_sample = torch.matmul(Q.unsqueeze(-2), K_sample.transpose(-2, -1)).squeeze()
57 |
58 | # find the Top_k query with sparisty measurement
59 | M = Q_K_sample.max(-1)[0] - torch.div(Q_K_sample.sum(-1), L_K)
60 | M_top = M.topk(n_top, sorted=False)[1]
61 |
62 | # use the reduced Q to calculate Q_K
63 | Q_reduce = Q[torch.arange(B)[:, None, None],
64 | torch.arange(H)[None, :, None],
65 | M_top, :] # factor*ln(L_q)
66 | Q_K = torch.matmul(Q_reduce, K.transpose(-2, -1)) # factor*ln(L_q)*L_k
67 |
68 | return Q_K, M_top
69 |
70 | def _get_initial_context(self, V, L_Q):
71 | B, H, L_V, D = V.shape
72 | if not self.mask_flag:
73 | # V_sum = V.sum(dim=-2)
74 | V_sum = V.mean(dim=-2)
75 | contex = V_sum.unsqueeze(-2).expand(B, H, L_Q, V_sum.shape[-1]).clone()
76 | else: # use mask
77 | assert (L_Q == L_V) # requires that L_Q == L_V, i.e. for self-attention only
78 | contex = V.cumsum(dim=-2)
79 | return contex
80 |
81 | def _update_context(self, context_in, V, scores, index, L_Q, attn_mask):
82 | B, H, L_V, D = V.shape
83 |
84 | if self.mask_flag:
85 | attn_mask = ProbMask(B, H, L_Q, index, scores, device=V.device)
86 | scores.masked_fill_(attn_mask.mask, -np.inf)
87 |
88 | attn = torch.softmax(scores, dim=-1) # nn.Softmax(dim=-1)(scores)
89 |
90 | context_in[torch.arange(B)[:, None, None],
91 | torch.arange(H)[None, :, None],
92 | index, :] = torch.matmul(attn, V).type_as(context_in)
93 | if self.output_attention:
94 | attns = (torch.ones([B, H, L_V, L_V]) / L_V).type_as(attn).to(attn.device)
95 | attns[torch.arange(B)[:, None, None], torch.arange(H)[None, :, None], index, :] = attn
96 | return (context_in, attns)
97 | else:
98 | return (context_in, None)
99 |
100 | def forward(self, queries, keys, values, attn_mask):
101 | B, L_Q, H, D = queries.shape
102 | _, L_K, _, _ = keys.shape
103 |
104 | queries = queries.transpose(2, 1)
105 | keys = keys.transpose(2, 1)
106 | values = values.transpose(2, 1)
107 |
108 | U_part = self.factor * np.ceil(np.log(L_K)).astype('int').item() # c*ln(L_k)
109 | u = self.factor * np.ceil(np.log(L_Q)).astype('int').item() # c*ln(L_q)
110 |
111 | U_part = U_part if U_part < L_K else L_K
112 | u = u if u < L_Q else L_Q
113 |
114 | scores_top, index = self._prob_QK(queries, keys, sample_k=U_part, n_top=u)
115 |
116 | # add scale factor
117 | scale = self.scale or 1. / sqrt(D)
118 | if scale is not None:
119 | scores_top = scores_top * scale
120 | # get the context
121 | context = self._get_initial_context(values, L_Q)
122 | # update the context with selected top_k queries
123 | context, attn = self._update_context(context, values, scores_top, index, L_Q, attn_mask)
124 |
125 | return context.contiguous(), attn
126 |
127 |
128 | class AttentionLayer(nn.Module):
129 | def __init__(self, attention, d_model, n_heads, d_keys=None,
130 | d_values=None):
131 | super(AttentionLayer, self).__init__()
132 |
133 | d_keys = d_keys or (d_model // n_heads)
134 | d_values = d_values or (d_model // n_heads)
135 |
136 | self.inner_attention = attention
137 | self.query_projection = nn.Linear(d_model, d_keys * n_heads)
138 | self.key_projection = nn.Linear(d_model, d_keys * n_heads)
139 | self.value_projection = nn.Linear(d_model, d_values * n_heads)
140 | self.out_projection = nn.Linear(d_values * n_heads, d_model)
141 | self.n_heads = n_heads
142 |
143 | def forward(self, queries, keys, values, attn_mask):
144 | B, L, _ = queries.shape
145 | _, S, _ = keys.shape
146 | H = self.n_heads
147 |
148 | queries = self.query_projection(queries).view(B, L, H, -1)
149 | keys = self.key_projection(keys).view(B, S, H, -1)
150 | values = self.value_projection(values).view(B, S, H, -1)
151 |
152 | out, attn = self.inner_attention(
153 | queries,
154 | keys,
155 | values,
156 | attn_mask
157 | )
158 | out = out.view(B, L, -1)
159 |
160 | return self.out_projection(out), attn
161 |
--------------------------------------------------------------------------------
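A shape check for the attention layer defined above (illustrative sizes; `mask_flag=False` disables causal masking):

```python
import torch

layer = AttentionLayer(FullAttention(mask_flag=False), d_model=64, n_heads=4)
q = k = v = torch.randn(2, 96, 64)   # (batch, length, d_model)
out, attn = layer(q, k, v, attn_mask=None)
print(out.shape)                     # (2, 96, 64)
```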
/SimMTM_Classification/code/layers/Transformer_EncDec.py:
--------------------------------------------------------------------------------
1 | import torch.nn as nn
2 | import torch.nn.functional as F
3 |
4 |
5 | class ConvLayer(nn.Module):
6 | def __init__(self, c_in):
7 | super(ConvLayer, self).__init__()
8 | self.downConv = nn.Conv1d(in_channels=c_in,
9 | out_channels=c_in,
10 | kernel_size=3,
11 | padding=2,
12 | padding_mode='circular')
13 | self.norm = nn.BatchNorm1d(c_in)
14 | self.activation = nn.ELU()
15 | self.maxPool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)
16 |
17 | def forward(self, x):
18 | x = self.downConv(x.permute(0, 2, 1))
19 | x = self.norm(x)
20 | x = self.activation(x)
21 | x = self.maxPool(x)
22 | x = x.transpose(1, 2)
23 | return x
24 |
25 |
26 | class EncoderLayer(nn.Module):
27 | def __init__(self, attention, d_model, d_ff=None, dropout=0.1, activation="relu"):
28 | super(EncoderLayer, self).__init__()
29 | d_ff = d_ff or 4 * d_model
30 | self.attention = attention
31 | self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1)
32 | self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1)
33 | self.norm1 = nn.LayerNorm(d_model)
34 | self.norm2 = nn.LayerNorm(d_model)
35 | self.dropout = nn.Dropout(dropout)
36 | self.activation = F.relu if activation == "relu" else F.gelu
37 |
38 | def forward(self, x, attn_mask=None):
39 | new_x, attn = self.attention(
40 | x, x, x,
41 | attn_mask=attn_mask
42 | )
43 | x = x + self.dropout(new_x)
44 |
45 | y = x = self.norm1(x)
46 | y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1))))
47 | y = self.dropout(self.conv2(y).transpose(-1, 1))
48 |
49 | return self.norm2(x + y), attn
50 |
51 |
52 | class Encoder(nn.Module):
53 | def __init__(self, attn_layers, conv_layers=None, norm_layer=None, projection=None):
54 | super(Encoder, self).__init__()
55 | self.attn_layers = nn.ModuleList(attn_layers)
56 | self.conv_layers = nn.ModuleList(conv_layers) if conv_layers is not None else None
57 | self.norm = norm_layer
58 | self.projection = projection
59 |
60 | def forward(self, x, attn_mask=None):
61 | # x [B, L, D]
62 | attns = []
63 | if self.conv_layers is not None:
64 | for attn_layer, conv_layer in zip(self.attn_layers, self.conv_layers):
65 | x, attn = attn_layer(x, attn_mask=attn_mask)
66 | x = conv_layer(x)
67 | attns.append(attn)
68 | x, attn = self.attn_layers[-1](x)
69 | attns.append(attn)
70 | else:
71 | for attn_layer in self.attn_layers:
72 | x, attn = attn_layer(x, attn_mask=attn_mask)
73 | attns.append(attn)
74 |
75 | if self.norm is not None:
76 | x = self.norm(x)
77 |
78 | if self.projection is not None:
79 | x = self.projection(x)
80 |
81 | return x, attns
82 |
83 |
84 | class DecoderLayer(nn.Module):
85 | def __init__(self, self_attention, cross_attention, d_model, d_ff=None,
86 | dropout=0.1, activation="relu"):
87 | super(DecoderLayer, self).__init__()
88 | d_ff = d_ff or 4 * d_model
89 | self.self_attention = self_attention
90 | self.cross_attention = cross_attention
91 | self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1)
92 | self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1)
93 | self.norm1 = nn.LayerNorm(d_model)
94 | self.norm2 = nn.LayerNorm(d_model)
95 | self.norm3 = nn.LayerNorm(d_model)
96 | self.dropout = nn.Dropout(dropout)
97 | self.activation = F.relu if activation == "relu" else F.gelu
98 |
99 | def forward(self, x, cross, x_mask=None, cross_mask=None):
100 | x = x + self.dropout(self.self_attention(
101 | x, x, x,
102 | attn_mask=x_mask
103 | )[0])
104 | x = self.norm1(x)
105 |
106 | x = x + self.dropout(self.cross_attention(
107 | x, cross, cross,
108 | attn_mask=cross_mask
109 | )[0])
110 |
111 | y = x = self.norm2(x)
112 | y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1))))
113 | y = self.dropout(self.conv2(y).transpose(-1, 1))
114 |
115 | return self.norm3(x + y)
116 |
117 |
118 | class Decoder(nn.Module):
119 | def __init__(self, layers, norm_layer=None, projection=None):
120 | super(Decoder, self).__init__()
121 | self.layers = nn.ModuleList(layers)
122 | self.norm = norm_layer
123 | self.projection = projection
124 |
125 | def forward(self, x, cross, x_mask=None, cross_mask=None):
126 | for layer in self.layers:
127 | x = layer(x, cross, x_mask=x_mask, cross_mask=cross_mask)
128 |
129 | if self.norm is not None:
130 | x = self.norm(x)
131 |
132 | if self.projection is not None:
133 | x = self.projection(x)
134 | return x
135 |
--------------------------------------------------------------------------------
/SimMTM_Classification/code/layers/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Classification/code/layers/__init__.py
--------------------------------------------------------------------------------
/SimMTM_Classification/code/loss.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import numpy as np
3 | import torch.nn.functional as F
4 |
5 | class AutomaticWeightedLoss(torch.nn.Module):
6 |     """Automatically weighted multi-task loss.
7 |     Params:
8 |         num: int, the number of losses
9 |         x: the multi-task losses
10 |     Example:
11 |         loss1 = 1
12 |         loss2 = 2
13 |         awl = AutomaticWeightedLoss(2)
14 |         loss_sum = awl(loss1, loss2)
15 |     """
16 |
17 | def __init__(self, num=2):
18 | super(AutomaticWeightedLoss, self).__init__()
19 | params = torch.ones(num, requires_grad=True)
20 | self.params = torch.nn.Parameter(params)
21 |
22 | def forward(self, *x):
23 | loss_sum = 0
24 | for i, loss in enumerate(x):
25 | loss_sum += 0.5 / (self.params[i] ** 2) * loss + torch.log(1 + self.params[i] ** 2)
26 | return loss_sum
27 |
28 |
29 | class ContrastiveWeight(torch.nn.Module):
30 |
31 | def __init__(self, args):
32 | super(ContrastiveWeight, self).__init__()
33 | self.temperature = args.temperature
34 |
35 | self.bce = torch.nn.BCELoss()
36 | self.softmax = torch.nn.Softmax(dim=-1)
37 | self.log_softmax = torch.nn.LogSoftmax(dim=-1)
38 | self.kl = torch.nn.KLDivLoss(reduction='batchmean')
39 | self.positive_nums = args.positive_nums
40 |
41 | def get_positive_and_negative_mask(self, similarity_matrix, cur_batch_size):
42 | diag = np.eye(cur_batch_size)
43 | mask = torch.from_numpy(diag)
44 | mask = mask.type(torch.bool)
45 |
46 |         orig_batch_size = cur_batch_size // (self.positive_nums + 1)
47 |
48 |         positives_mask = np.zeros(similarity_matrix.size())
49 |         for i in range(self.positive_nums + 1):
50 |             ll = np.eye(cur_batch_size, cur_batch_size, k=orig_batch_size * i)
51 |             lr = np.eye(cur_batch_size, cur_batch_size, k=-orig_batch_size * i)
52 |             positives_mask += ll
53 |             positives_mask += lr
54 |
55 | positives_mask = torch.from_numpy(positives_mask).to(similarity_matrix.device)
56 | positives_mask[mask] = 0
57 |
58 | negatives_mask = 1 - positives_mask
59 | negatives_mask[mask] = 0
60 |
61 | return positives_mask.type(torch.bool), negatives_mask.type(torch.bool)
62 |
63 | def forward(self, batch_emb_om):
64 | cur_batch_shape = batch_emb_om.shape
65 |
66 | # get similarity matrix among mask samples
67 | norm_emb = F.normalize(batch_emb_om, dim=1)
68 | similarity_matrix = torch.matmul(norm_emb, norm_emb.transpose(0, 1))
69 |
70 | # get positives and negatives similarity
71 | positives_mask, negatives_mask = self.get_positive_and_negative_mask(similarity_matrix, cur_batch_shape[0])
72 |
73 | positives = similarity_matrix[positives_mask].view(cur_batch_shape[0], -1)
74 | negatives = similarity_matrix[negatives_mask].view(cur_batch_shape[0], -1)
75 |
76 | # generate predict and target probability distributions matrix
77 | logits = torch.cat((positives, negatives), dim=-1)
78 | y_true = torch.cat(
79 | (torch.ones(cur_batch_shape[0], positives.shape[-1]), torch.zeros(cur_batch_shape[0], negatives.shape[-1])),
80 | dim=-1).to(batch_emb_om.device).float()
81 |
82 | # multiple positives - KL divergence
83 | predict = self.log_softmax(logits / self.temperature)
84 | loss = self.kl(predict, y_true)
85 |
86 | return loss, similarity_matrix, logits, positives_mask
87 |
88 |
89 | class AggregationRebuild(torch.nn.Module):
90 |
91 | def __init__(self, args):
92 | super(AggregationRebuild, self).__init__()
93 | self.args = args
94 | self.temperature = args.temperature
95 | self.softmax = torch.nn.Softmax(dim=-1)
96 | self.mse = torch.nn.MSELoss()
97 |
98 | def forward(self, similarity_matrix, batch_emb_om):
99 | cur_batch_shape = batch_emb_om.shape
100 |
101 |         # get the weights among (original, original's masks, others, others' masks)
102 | similarity_matrix /= self.temperature
103 |
104 | similarity_matrix = similarity_matrix - torch.eye(cur_batch_shape[0]).to(
105 | similarity_matrix.device).float() * 1e12
106 | rebuild_weight_matrix = self.softmax(similarity_matrix)
107 |
108 | batch_emb_om = batch_emb_om.reshape(cur_batch_shape[0], -1)
109 |
110 |         # generate the rebuilt batch embeddings (original, others, original's masks, others' masks)
111 | rebuild_batch_emb = torch.matmul(rebuild_weight_matrix, batch_emb_om)
112 |
113 |         # get the original samples' rebuilt batch embeddings
114 | rebuild_oral_batch_emb = rebuild_batch_emb.reshape(cur_batch_shape[0], cur_batch_shape[1], -1)
115 |
116 | return rebuild_weight_matrix, rebuild_oral_batch_emb
--------------------------------------------------------------------------------
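In `get_positive_and_negative_mask`, the batch is laid out as `positive_nums + 1` stacked copies of the original batch, so rows `i` and `i + k * orig_batch_size` are views of the same sample; the shifted-diagonal construction marks exactly those pairs as positives. A small illustrative check (hypothetical sizes):

```python
import numpy as np

orig_batch_size, positive_nums = 4, 2
cur_batch_size = orig_batch_size * (positive_nums + 1)   # originals + masked views

positives_mask = np.zeros((cur_batch_size, cur_batch_size))
for i in range(positive_nums + 1):
    positives_mask += np.eye(cur_batch_size, k=orig_batch_size * i)
    positives_mask += np.eye(cur_batch_size, k=-orig_batch_size * i)
np.fill_diagonal(positives_mask, 0)       # a series is not its own positive

# row 0 (the first original) is positive with rows 4 and 8: its two masked views
print(np.nonzero(positives_mask[0])[0])   # [4 8]
```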
/SimMTM_Classification/code/main.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from datetime import datetime
3 | import argparse
4 | from utils.utils import _logger
5 | from model import TFC
6 | from dataloader import data_generator
7 | from trainer import Trainer
8 | import os
9 | import torch
10 |
11 | # Args selections
12 | start_time = datetime.now()
13 | parser = argparse.ArgumentParser()
14 |
15 | home_dir = os.getcwd()
16 | parser.add_argument('--run_description', default='run1', type=str, help='Experiment Description')
17 | parser.add_argument('--seed', default=2023, type=int, help='seed value')
18 |
19 | parser.add_argument('--training_mode', default='pre_train', type=str, help='pre_train, fine_tune')
20 | parser.add_argument('--pretrain_dataset', default='SleepEEG', type=str,
21 | help='Dataset of choice: SleepEEG, FD_A, HAR, ECG')
22 | parser.add_argument('--target_dataset', default='Epilepsy', type=str,
23 | help='Dataset of choice: Epilepsy, FD_B, Gesture, EMG')
24 |
25 | parser.add_argument('--logs_save_dir', default='experiments_logs', type=str, help='saving directory')
26 | parser.add_argument('--device', default='cuda', type=str, help='cpu or cuda')
27 | parser.add_argument('--home_path', default=home_dir, type=str, help='Project home directory')
28 | parser.add_argument('--subset', action='store_true', default=False, help='use the subset of datasets')
29 | parser.add_argument('--log_epoch', default=5, type=int, help='interval (in epochs) for printing loss and metrics')
30 | parser.add_argument('--draw_similar_matrix', default=10, type=int, help='interval for drawing the similarity matrix')
31 | parser.add_argument('--pretrain_lr', default=0.0001, type=float, help='pretrain learning rate')
32 | parser.add_argument('--lr', default=0.0001, type=float, help='learning rate')
33 | parser.add_argument('--use_pretrain_epoch_dir', default=None, type=str,
34 | help='choose the pretrain checkpoint to finetune')
35 | parser.add_argument('--pretrain_epoch', default=10, type=int, help='pretrain epochs')
36 | parser.add_argument('--finetune_epoch', default=300, type=int, help='finetune epochs')
37 |
38 | parser.add_argument('--masking_ratio', default=0.5, type=float, help='masking ratio')
39 | parser.add_argument('--positive_nums', default=3, type=int, help='positive series numbers')
40 | parser.add_argument('--lm', default=3, type=int, help='average masked length')
41 |
42 | parser.add_argument('--finetune_result_file_name', default="finetune_result.json", type=str,
43 | help='finetune result json name')
44 | parser.add_argument('--temperature', type=float, default=0.2, help='temperature')
45 |
46 |
47 | def set_seed(seed):
48 | SEED = seed
49 | torch.manual_seed(SEED)
50 | torch.backends.cudnn.deterministic = False
51 | torch.backends.cudnn.benchmark = False
52 | np.random.seed(SEED)
53 |
54 | return seed
55 |
56 |
57 | def main(args, configs, seed=None):
58 | method = 'SimMTM'
59 | sourcedata = args.pretrain_dataset
60 | targetdata = args.target_dataset
61 | training_mode = args.training_mode
62 | run_description = args.run_description
63 |
64 | logs_save_dir = args.logs_save_dir
65 | masking_ratio = args.masking_ratio
66 | pretrain_lr = args.pretrain_lr
67 | pretrain_epoch = args.pretrain_epoch
68 | lr = args.lr
69 | finetune_epoch = args.finetune_epoch
70 | temperature = args.temperature
71 | experiment_description = f"{sourcedata}_2_{targetdata}"
72 |
73 | os.makedirs(logs_save_dir, exist_ok=True)
74 |
75 | # Load datasets
76 |     sourcedata_path = f"./dataset/{sourcedata}"  # e.g. './dataset/SleepEEG'
77 | targetdata_path = f"./dataset/{targetdata}"
78 |
79 | subset = args.subset # if subset= true, use a subset for debugging.
80 | train_dl, valid_dl, test_dl = data_generator(sourcedata_path, targetdata_path, configs, training_mode,
81 | subset=subset)
82 |
83 | # set seed
84 | if seed is not None:
85 | seed = set_seed(seed)
86 | else:
87 | seed = set_seed(args.seed)
88 |
89 | # experiments_logs/SleepEEG/run1/pre_train_2023_pt_0.5_0.0001_50_ft_0.0003_100
90 | experiment_log_dir = os.path.join(logs_save_dir, experiment_description, run_description,
91 | training_mode + f"_{seed}_pt_{masking_ratio}_{pretrain_lr}_{pretrain_epoch}_ft_{lr}_{finetune_epoch}")
92 | os.makedirs(experiment_log_dir, exist_ok=True)
93 |
94 | # Logging
95 | log_file_name = os.path.join(experiment_log_dir, f"logs_{datetime.now().strftime('%d_%m_%Y_%H_%M_%S')}.log")
96 | logger = _logger(log_file_name)
97 | logger.debug("=" * 45)
98 | logger.debug(f'Pre-training Dataset: {sourcedata}')
99 | logger.debug(f'Target (fine-tuning) Dataset: {targetdata}')
100 | logger.debug(f'Seed: {seed}')
101 | logger.debug(f'Method: {method}')
102 | logger.debug(f'Mode: {training_mode}')
103 | logger.debug(f'Pretrain Learning rate: {pretrain_lr}')
104 | logger.debug(f'Masking ratio: {masking_ratio}')
105 | logger.debug(f'Pretrain Epochs: {pretrain_epoch}')
106 | logger.debug(f'Finetune Learning rate: {lr}')
107 | logger.debug(f'Finetune Epochs: {finetune_epoch}')
108 | logger.debug(f'Temperature: {temperature}')
109 | logger.debug("=" * 45)
110 |
111 | # Load Model
112 | model = TFC(configs, args).to(device)
113 | params_group = [{'params': model.parameters()}]
114 | model_optimizer = torch.optim.Adam(params_group, lr=pretrain_lr, betas=(configs.beta1, configs.beta2),
115 | weight_decay=0)
116 | model_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer=model_optimizer, T_max=pretrain_epoch)
117 |
118 | # Trainer
119 | best_performance = Trainer(model, model_optimizer, model_scheduler, train_dl, valid_dl, test_dl, device, logger,
120 | args, configs, experiment_log_dir, seed)
121 |
122 | return best_performance
123 |
124 |
125 | if __name__ == '__main__':
126 | args, unknown = parser.parse_known_args()
127 | device = torch.device(args.device)
128 |     exec(f'from config_files.{args.pretrain_dataset}_Configs import Config as Configs')
129 | configs = Configs()
130 |
131 | main(args, configs)
132 |
133 |
--------------------------------------------------------------------------------
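The `exec`-based config import at the bottom of `main.py` works but hides the dependency from linters; an equivalent, behavior-preserving sketch using `importlib` (assuming the same `config_files` package layout; the helper name is illustrative):

```python
import importlib

def load_configs(pretrain_dataset: str):
    # equivalent to: exec(f'from config_files.{pretrain_dataset}_Configs import Config as Configs')
    module = importlib.import_module(f'config_files.{pretrain_dataset}_Configs')
    return module.Config()

configs = load_configs('SleepEEG')   # e.g. args.pretrain_dataset
```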
/SimMTM_Classification/code/model.py:
--------------------------------------------------------------------------------
1 | from torch import nn
2 | import torch
3 | from loss import ContrastiveWeight, AggregationRebuild, AutomaticWeightedLoss
4 |
5 |
6 | class TFC(nn.Module):
7 | def __init__(self, configs, args):
8 | super(TFC, self).__init__()
9 | self.training_mode = args.training_mode
10 |
11 | self.conv_block1 = nn.Sequential(
12 | nn.Conv1d(configs.input_channels, 32, kernel_size=configs.kernel_size,
13 | stride=configs.stride, bias=False, padding=(configs.kernel_size // 2)),
14 | nn.BatchNorm1d(32),
15 | nn.ReLU(),
16 | nn.MaxPool1d(kernel_size=2, stride=2, padding=1),
17 | nn.Dropout(configs.dropout)
18 | )
19 |
20 | self.conv_block2 = nn.Sequential(
21 | nn.Conv1d(32, 64, kernel_size=8, stride=1, bias=False, padding=4),
22 | nn.BatchNorm1d(64),
23 | nn.ReLU(),
24 | nn.MaxPool1d(kernel_size=2, stride=2, padding=1)
25 | )
26 |
27 | self.conv_block3 = nn.Sequential(
28 | nn.Conv1d(64, configs.final_out_channels, kernel_size=8, stride=1, bias=False, padding=4),
29 | nn.BatchNorm1d(configs.final_out_channels),
30 | nn.ReLU(),
31 | nn.MaxPool1d(kernel_size=2, stride=2, padding=1),
32 | )
33 |
34 | self.dense = nn.Sequential(
35 | nn.Linear(configs.CNNoutput_channel * configs.final_out_channels, 256),
36 | nn.BatchNorm1d(256),
37 | nn.ReLU(),
38 | nn.Linear(256, 128)
39 | )
40 |
41 | if self.training_mode == 'pre_train':
42 | self.awl = AutomaticWeightedLoss(2)
43 | self.contrastive = ContrastiveWeight(args)
44 | self.aggregation = AggregationRebuild(args)
45 | self.head = nn.Linear(1280, 178)
46 | self.mse = torch.nn.MSELoss()
47 |
48 | def forward(self, x_in_t, pretrain=False):
49 |
50 | if pretrain:
51 | x = self.conv_block1(x_in_t)
52 | x = self.conv_block2(x)
53 | x = self.conv_block3(x)
54 |
55 | h = x.reshape(x.shape[0], -1)
56 | z = self.dense(h)
57 |
58 | loss_cl, similarity_matrix, logits, positives_mask = self.contrastive(z)
59 | rebuild_weight_matrix, agg_x = self.aggregation(similarity_matrix, x)
60 | pred_x = self.head(agg_x.reshape(agg_x.size(0), -1))
61 |
62 | loss_rb = self.mse(pred_x, x_in_t.reshape(x_in_t.size(0), -1).detach())
63 | loss = self.awl(loss_cl, loss_rb)
64 |
65 | return loss, loss_cl, loss_rb
66 | else:
67 | x = self.conv_block1(x_in_t)
68 | x = self.conv_block2(x)
69 | x = self.conv_block3(x)
70 |
71 | h = x.reshape(x.shape[0], -1)
72 | z = self.dense(h)
73 |
74 | return h, z
75 |
76 |
77 | class target_classifier(nn.Module): # Classification head
78 | def __init__(self, configs):
79 | super(target_classifier, self).__init__()
80 | self.logits = nn.Linear(1280, 64)
81 | self.logits_simple = nn.Linear(64, configs.num_classes_target)
82 |
83 | def forward(self, emb):
84 | """2-layer MLP"""
85 | emb_flat = emb.reshape(emb.shape[0], -1)
86 | emb = torch.sigmoid(self.logits(emb_flat))
87 | pred = self.logits_simple(emb)
88 | return pred
89 |
--------------------------------------------------------------------------------
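For fine-tuning, the encoder is called without `pretrain=True` and its flattened features feed the classification head. A minimal illustrative sketch of that path, with `SimpleNamespace` stand-ins for the real `Config`/`args` objects (values mirror `SleepEEG_Configs`, the config loaded when pre-training on SleepEEG):

```python
import torch
from types import SimpleNamespace

configs = SimpleNamespace(input_channels=1, kernel_size=25, stride=3,
                          final_out_channels=128, dropout=0.2,
                          CNNoutput_channel=10, num_classes_target=8)
args = SimpleNamespace(training_mode='fine_tune')

encoder = TFC(configs, args)         # classes defined above
classifier = target_classifier(configs)

x = torch.randn(16, 1, 178)          # a batch of 178-point series
h, z = encoder(x)                    # pretrain=False: returns features, not losses
pred = classifier(h)                 # logits over the target classes
print(h.shape, z.shape, pred.shape)  # (16, 1280) (16, 128) (16, 8)
```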
/SimMTM_Classification/code/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Classification/code/utils/__init__.py
--------------------------------------------------------------------------------
/SimMTM_Classification/code/utils/augmentations.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import torch
3 | import math
4 |
5 |
6 | def data_transform_masked4cl(sample, masking_ratio, lm, positive_nums=None, distribution='geometric'):
7 | """Masked time series in time dimension"""
8 |
9 | if positive_nums is None:
10 | positive_nums = math.ceil(1.5 / (1 - masking_ratio))
11 |
12 | sample = sample.permute(0, 2, 1)
13 |
14 | sample_repeat = sample.repeat(positive_nums, 1, 1)
15 |
16 | mask = noise_mask(sample_repeat, masking_ratio, lm, distribution=distribution)
17 | x_masked = mask * sample_repeat
18 |
19 | return x_masked.permute(0, 2, 1), mask.permute(0, 2, 1)
20 |
21 |
22 | def geom_noise_mask_single(L, lm, masking_ratio):
23 | """
24 |     Randomly create a boolean mask of length `L`, consisting of subsequences of average length `lm`, masking a
25 |     `masking_ratio` proportion of the sequence with 0s. The lengths of the masked and unmasked subsequences follow geometric distributions.
26 | Args:
27 | L: length of mask and sequence to be masked
28 | lm: average length of masking subsequences (streaks of 0s)
29 | masking_ratio: proportion of L to be masked
30 | Returns:
31 | (L,) boolean numpy array intended to mask ('drop') with 0s a sequence of length L
32 | """
33 | keep_mask = np.ones(L, dtype=bool)
34 | p_m = 1 / lm # probability of each masking sequence stopping. parameter of geometric distribution.
35 | p_u = p_m * masking_ratio / (
36 | 1 - masking_ratio) # probability of each unmasked sequence stopping. parameter of geometric distribution.
37 | p = [p_m, p_u]
38 |
39 | # Start in state 0 with masking_ratio probability
40 | state = int(np.random.rand() > masking_ratio) # state 0 means masking, 1 means not masking
41 | for i in range(L):
42 | keep_mask[i] = state # here it happens that state and masking value corresponding to state are identical
43 | if np.random.rand() < p[state]:
44 | state = 1 - state
45 |
46 | return keep_mask
47 |
48 |
49 | def noise_mask(X, masking_ratio=0.25, lm=3, distribution='geometric', exclude_feats=None):
50 | """
51 |     Creates a random boolean mask of the same shape as X, with 0s at places where a value should be masked.
52 |     Args:
53 |         X: 3D torch tensor of features for a batch of (repeated) samples
54 |         masking_ratio: proportion of positions to be masked. At each time step, this will also be the proportion
55 |             of features that are masked on average
56 |         lm: average length of masking subsequences (streaks of 0s). Used only when `distribution` is 'geometric'.
57 |         distribution: whether each mask element is sampled independently at random, or whether
58 |             sampling follows a Markov chain (and thus is stateful), resulting in geometric distributions of
59 |             masked subsequences of a desired mean length `lm`
60 |         exclude_feats: iterable of indices corresponding to features to be excluded from masking (i.e. to remain all 1s)
61 |     Returns:
62 |         boolean torch tensor with the same shape as X, with 0s at places where a value should be masked
63 |     """
64 | if exclude_feats is not None:
65 | exclude_feats = set(exclude_feats)
66 |
67 | if distribution == 'geometric': # stateful (Markov chain)
68 | mask = geom_noise_mask_single(X.shape[0] * X.shape[1] * X.shape[2], lm, masking_ratio)
69 | mask = mask.reshape(X.shape[0], X.shape[1], X.shape[2])
70 |     elif distribution == 'masked_tail':
71 |         mask = np.ones(X.shape, dtype=bool)
72 |         for m in range(X.shape[0]):  # iterate over samples in the batch
73 |             # keep the leading (1 - masking_ratio) share of each sequence, mask the tail
74 |             keep_mask = np.zeros_like(mask[m, :], dtype=bool)
75 |             n = math.ceil(keep_mask.shape[1] * (1 - masking_ratio))
76 |             keep_mask[:, :n] = True
77 |             mask[m, :] = keep_mask
78 |     elif distribution == 'masked_head':
79 |         mask = np.ones(X.shape, dtype=bool)
80 |         for m in range(X.shape[0]):  # iterate over samples in the batch
81 |             # keep the trailing share of each sequence, mask the leading masking_ratio part
82 |             keep_mask = np.zeros_like(mask[m, :], dtype=bool)
83 |             n = math.ceil(keep_mask.shape[1] * masking_ratio)
84 |             keep_mask[:, n:] = True
85 |             mask[m, :] = keep_mask
86 | else: # each position is independent Bernoulli with p = 1 - masking_ratio
87 | mask = np.random.choice(np.array([True, False]), size=X.shape, replace=True,
88 | p=(1 - masking_ratio, masking_ratio))
89 | return torch.tensor(mask)
--------------------------------------------------------------------------------
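
A small sketch of how `data_transform_masked4cl` behaves, assuming the module is importable (e.g., run from `SimMTM_Classification/code`); with `masking_ratio=0.5` the default number of masked views is `ceil(1.5 / (1 - 0.5)) = 3`, stacked along the batch dimension:

```python
import torch
from utils.augmentations import data_transform_masked4cl  # assumes cwd is SimMTM_Classification/code

x = torch.randn(8, 178, 1)  # illustrative (batch, seq_len, channels) batch
x_masked, mask = data_transform_masked4cl(x, masking_ratio=0.5, lm=3)

print(x_masked.shape)        # torch.Size([24, 178, 1]): 3 masked views per sample
print(mask.float().mean())   # ~0.5; mask is the *keep* mask, so ~(1 - masking_ratio) ones
```
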
/SimMTM_Classification/code/utils/loss.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import numpy as np
4 | import random
5 | import math
6 | import torch.nn.functional as F
7 |
8 | class AutomaticWeightedLoss(nn.Module):
9 |     """Automatically weighted multi-task loss (uncertainty weighting).
10 |     Params:
11 |         num: int, the number of losses to combine
12 |         x: the individual task losses
13 |     Examples:
14 |         loss1 = 1
15 |         loss2 = 2
16 |         awl = AutomaticWeightedLoss(2)
17 |         loss_sum = awl(loss1, loss2)
18 |     """
19 | def __init__(self, num=2):
20 | super(AutomaticWeightedLoss, self).__init__()
21 | params = torch.ones(num, requires_grad=True)
22 | self.params = nn.Parameter(params)
23 |
24 | def forward(self, *x):
25 | loss_sum = 0
26 | for i, loss in enumerate(x):
27 | loss_sum += 0.5 / (self.params[i] ** 2) * loss + torch.log(1 + self.params[i] ** 2)
28 | return loss_sum
29 |
30 |
31 | class ContrastiveLoss(nn.Module):
32 |
33 | def __init__(self, device, args):
34 | super(ContrastiveLoss, self).__init__()
35 | self.device = device
36 | self.temperature = args.temperature
37 |
38 | self.bce = torch.nn.BCELoss()
39 | self.softmax = torch.nn.Softmax(dim=-1)
40 | self.log_softmax = torch.nn.LogSoftmax(dim=-1)
41 |
42 | self.kl = torch.nn.KLDivLoss(reduction='batchmean')
43 |
44 | def get_positive_and_negative_mask(self, similarity_matrix, cur_batch_size, oral_batch_size):
45 |
46 | diag = np.eye(cur_batch_size)
47 | mask = torch.from_numpy(diag)
48 | mask = mask.type(torch.bool)
49 |
50 | positives_mask = np.zeros(similarity_matrix.size())
51 | for i in range(cur_batch_size//oral_batch_size):
52 | ll = np.eye(cur_batch_size, cur_batch_size, k=oral_batch_size*i)
53 | lr = np.eye(cur_batch_size, cur_batch_size, k=-oral_batch_size*i)
54 | positives_mask += ll
55 | positives_mask += lr
56 |
57 | positives_mask = torch.from_numpy(positives_mask)
58 | positives_mask[mask] = 0
59 |
60 | negatives_mask = 1 - positives_mask
61 | negatives_mask[mask] = 0
62 |
63 | return positives_mask.type(torch.bool), negatives_mask.type(torch.bool)
64 |
65 | def forward(self, batch_emb_om, batch_x):
66 |
67 | cur_batch_shape = batch_emb_om.shape
68 | oral_batch_shape = batch_x.shape
69 |
70 | # get similarity matrix among mask samples
71 | norm_emb = F.normalize(batch_emb_om, dim=1)
72 | similarity_matrix = torch.matmul(norm_emb, norm_emb.transpose(0, 1))
73 |
74 | # get positives and negatives similarity
75 | positives_mask, negatives_mask = self.get_positive_and_negative_mask(similarity_matrix, cur_batch_shape[0], oral_batch_shape[0])
76 |
77 | positives = similarity_matrix[positives_mask].view(cur_batch_shape[0], -1)
78 | negatives = similarity_matrix[negatives_mask].view(cur_batch_shape[0], -1)
79 |
80 | # generate predict and target probability distributions matrix
81 | logits = torch.cat((positives, negatives), dim=-1)
82 | y_true = torch.cat((torch.ones(cur_batch_shape[0], positives.shape[-1]) / positives.shape[-1], torch.zeros(cur_batch_shape[0], negatives.shape[-1])), dim=-1).to(self.device).float()
83 |
84 | # multiple positives - KL divergence
85 | predict = self.log_softmax(logits / self.temperature)
86 | loss = self.kl(predict, y_true)
87 |
88 | return loss, similarity_matrix, logits
89 |
90 |
91 | class RebuildLoss(torch.nn.Module):
92 |
93 | def __init__(self, device, args):
94 | super(RebuildLoss, self).__init__()
95 | self.args = args
96 | self.device = device
97 | self.temperature = args.temperature
98 |
99 | self.softmax = torch.nn.Softmax(dim=-1)
100 | self.mse = torch.nn.MSELoss()
101 |
102 | def forward(self, similarity_matrix, batch_emb_om, batch_emb_o, batch_x):
103 |
104 | cur_batch_shape = batch_emb_om.shape
105 | oral_batch_shape = batch_x.shape
106 |
107 | # get the weight among (oral, oral's masks, others, others' masks)
108 | similarity_matrix /= self.temperature
109 | similarity_matrix = similarity_matrix - torch.eye(cur_batch_shape[0]).to(self.device).float() * 1e12
110 | rebuild_weight_matrix = self.softmax(similarity_matrix)
111 |
112 | batch_emb_om = batch_emb_om.view(cur_batch_shape[0], -1)
113 |
114 | # generate the rebuilt batch embedding (oral, others, oral's masks, others' masks)
115 | rebuild_batch_emb = torch.matmul(rebuild_weight_matrix, batch_emb_om)
116 |
117 | # get oral' rebuilt batch embedding
118 | rebuild_oral_batch_emb = rebuild_batch_emb[:oral_batch_shape[0]].reshape(oral_batch_shape[0], cur_batch_shape[1], -1)
119 |
120 | # MSE Loss
121 | if self.args.rbtp == 0:
122 | loss = self.mse(rebuild_oral_batch_emb, batch_emb_o.detach())
123 | elif self.args.rbtp == 1:
124 | loss = self.mse(rebuild_oral_batch_emb, batch_x.detach())
125 |
126 | return loss, rebuild_weight_matrix
127 |
128 |
--------------------------------------------------------------------------------
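
`AutomaticWeightedLoss` above implements uncertainty-based loss weighting, roughly `L_total = sum_i L_i / (2 * sigma_i^2) + log(1 + sigma_i^2)` with learnable `sigma_i`. The practical point is that its parameters must be handed to the optimizer alongside the model's. A minimal sketch with stand-in losses:

```python
import torch
from torch import nn

# Minimal sketch, assuming the AutomaticWeightedLoss class above is in scope.
awl = AutomaticWeightedLoss(2)
encoder = nn.Linear(16, 16)  # stand-in for the real encoder

# The sigma parameters are learnable, so hand them to the optimizer as well.
optimizer = torch.optim.Adam(
    [{'params': encoder.parameters()},
     {'params': awl.parameters(), 'weight_decay': 0}], lr=1e-3)

x = torch.randn(4, 16)
out = encoder(x)
loss_cl = out.pow(2).mean()        # stand-in contrastive loss
loss_rb = (out - x).pow(2).mean()  # stand-in reconstruction loss
loss = awl(loss_cl, loss_rb)       # automatically balanced total loss
loss.backward()
optimizer.step()
```
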
/SimMTM_Classification/code/utils/masking.py:
--------------------------------------------------------------------------------
1 | import torch
2 |
3 |
4 | class TriangularCausalMask():
5 | def __init__(self, B, L, device="cpu"):
6 | mask_shape = [B, 1, L, L]
7 | with torch.no_grad():
8 | self._mask = torch.triu(torch.ones(mask_shape, dtype=torch.bool), diagonal=1).to(device)
9 |
10 | @property
11 | def mask(self):
12 | return self._mask
13 |
14 |
15 | class ProbMask():
16 | def __init__(self, B, H, L, index, scores, device="cpu"):
17 | _mask = torch.ones(L, scores.shape[-1], dtype=torch.bool).to(device).triu(1)
18 | _mask_ex = _mask[None, None, :].expand(B, H, L, scores.shape[-1])
19 | indicator = _mask_ex[torch.arange(B)[:, None, None],
20 | torch.arange(H)[None, :, None],
21 | index, :].to(device)
22 | self._mask = indicator.view(scores.shape).to(device)
23 |
24 | @property
25 | def mask(self):
26 | return self._mask
27 |
--------------------------------------------------------------------------------
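
`TriangularCausalMask` marks strictly-future positions as `True` so they can be filled with `-inf` before the softmax. A short sketch, assuming the class above is in scope:

```python
import torch

B, H, L = 2, 4, 6
scores = torch.randn(B, H, L, L)                # raw attention scores

attn_mask = TriangularCausalMask(B, L)          # (B, 1, L, L), True above the diagonal
scores = scores.masked_fill(attn_mask.mask, float('-inf'))
attn = torch.softmax(scores, dim=-1)            # each step attends only to itself and the past
print(attn[0, 0])                               # strictly upper-triangular entries are 0
```
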
/SimMTM_Classification/code/utils/metrics.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 |
3 |
4 | def RSE(pred, true):
5 | return np.sqrt(np.sum((true - pred) ** 2)) / np.sqrt(np.sum((true - true.mean()) ** 2))
6 |
7 |
8 | def CORR(pred, true):
9 | u = ((true - true.mean(0)) * (pred - pred.mean(0))).sum(0)
10 | d = np.sqrt(((true - true.mean(0)) ** 2 * (pred - pred.mean(0)) ** 2).sum(0))
11 | return (u / d).mean(-1)
12 |
13 |
14 | def MAE(pred, true):
15 | return np.mean(np.abs(pred - true))
16 |
17 |
18 | def MSE(pred, true):
19 | return np.mean((pred - true) ** 2)
20 |
21 |
22 | def RMSE(pred, true):
23 | return np.sqrt(MSE(pred, true))
24 |
25 |
26 | def MAPE(pred, true):
27 | return np.mean(np.abs((pred - true) / true))
28 |
29 |
30 | def MSPE(pred, true):
31 | return np.mean(np.square((pred - true) / true))
32 |
33 |
34 | def metric(pred, true):
35 | mae = MAE(pred, true)
36 | mse = MSE(pred, true)
37 | rmse = RMSE(pred, true)
38 | mape = MAPE(pred, true)
39 | mspe = MSPE(pred, true)
40 |
41 | return mae, mse, rmse, mape, mspe
42 |
--------------------------------------------------------------------------------
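
A tiny worked example for `metric`, with hand-checkable values (assumes the functions above are in scope):

```python
import numpy as np

true = np.array([1.0, 2.0, 4.0])
pred = np.array([1.0, 3.0, 3.0])

mae, mse, rmse, mape, mspe = metric(pred, true)
print(mae)   # (0 + 1 + 1) / 3 = 0.666...
print(mse)   # (0 + 1 + 1) / 3 = 0.666...
print(mape)  # (0/1 + 1/2 + 1/4) / 3 = 0.25
```
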
/SimMTM_Classification/code/utils/timefeatures.py:
--------------------------------------------------------------------------------
1 | from typing import List
2 |
3 | import numpy as np
4 | import pandas as pd
5 | from pandas.tseries import offsets
6 | from pandas.tseries.frequencies import to_offset
7 |
8 |
9 | class TimeFeature:
10 | def __init__(self):
11 | pass
12 |
13 | def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
14 | pass
15 |
16 | def __repr__(self):
17 | return self.__class__.__name__ + "()"
18 |
19 |
20 | class SecondOfMinute(TimeFeature):
21 |     """Second of minute encoded as value between [-0.5, 0.5]"""
22 |
23 | def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
24 | return index.second / 59.0 - 0.5
25 |
26 |
27 | class MinuteOfHour(TimeFeature):
28 | """Minute of hour encoded as value between [-0.5, 0.5]"""
29 |
30 | def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
31 | return index.minute / 59.0 - 0.5
32 |
33 |
34 | class HourOfDay(TimeFeature):
35 | """Hour of day encoded as value between [-0.5, 0.5]"""
36 |
37 | def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
38 | return index.hour / 23.0 - 0.5
39 |
40 |
41 | class DayOfWeek(TimeFeature):
42 |     """Day of week encoded as value between [-0.5, 0.5]"""
43 |
44 | def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
45 | return index.dayofweek / 6.0 - 0.5
46 |
47 |
48 | class DayOfMonth(TimeFeature):
49 | """Day of month encoded as value between [-0.5, 0.5]"""
50 |
51 | def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
52 | return (index.day - 1) / 30.0 - 0.5
53 |
54 |
55 | class DayOfYear(TimeFeature):
56 | """Day of year encoded as value between [-0.5, 0.5]"""
57 |
58 | def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
59 | return (index.dayofyear - 1) / 365.0 - 0.5
60 |
61 |
62 | class MonthOfYear(TimeFeature):
63 | """Month of year encoded as value between [-0.5, 0.5]"""
64 |
65 | def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
66 | return (index.month - 1) / 11.0 - 0.5
67 |
68 |
69 | class WeekOfYear(TimeFeature):
70 | """Week of year encoded as value between [-0.5, 0.5]"""
71 |
72 | def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
73 | return (index.isocalendar().week - 1) / 52.0 - 0.5
74 |
75 |
76 | def time_features_from_frequency_str(freq_str: str) -> List[TimeFeature]:
77 | """
78 | Returns a list of time features that will be appropriate for the given frequency string.
79 | Parameters
80 | ----------
81 | freq_str
82 | Frequency string of the form [multiple][granularity] such as "12H", "5min", "1D" etc.
83 | """
84 |
85 | features_by_offsets = {
86 | offsets.YearEnd: [],
87 | offsets.QuarterEnd: [MonthOfYear],
88 | offsets.MonthEnd: [MonthOfYear],
89 | offsets.Week: [DayOfMonth, WeekOfYear],
90 | offsets.Day: [DayOfWeek, DayOfMonth, DayOfYear],
91 | offsets.BusinessDay: [DayOfWeek, DayOfMonth, DayOfYear],
92 | offsets.Hour: [HourOfDay, DayOfWeek, DayOfMonth, DayOfYear],
93 | offsets.Minute: [
94 | MinuteOfHour,
95 | HourOfDay,
96 | DayOfWeek,
97 | DayOfMonth,
98 | DayOfYear,
99 | ],
100 | offsets.Second: [
101 | SecondOfMinute,
102 | MinuteOfHour,
103 | HourOfDay,
104 | DayOfWeek,
105 | DayOfMonth,
106 | DayOfYear,
107 | ],
108 | }
109 |
110 | offset = to_offset(freq_str)
111 |
112 | for offset_type, feature_classes in features_by_offsets.items():
113 | if isinstance(offset, offset_type):
114 | return [cls() for cls in feature_classes]
115 |
116 | supported_freq_msg = f"""
117 | Unsupported frequency {freq_str}
118 | The following frequencies are supported:
119 | Y - yearly
120 | alias: A
121 | M - monthly
122 | W - weekly
123 | D - daily
124 | B - business days
125 | H - hourly
126 | T - minutely
127 | alias: min
128 | S - secondly
129 | """
130 | raise RuntimeError(supported_freq_msg)
131 |
132 |
133 | def time_features(dates, freq='h'):
134 | return np.vstack([feat(dates) for feat in time_features_from_frequency_str(freq)])
135 |
--------------------------------------------------------------------------------
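
Hourly data maps to four features (`HourOfDay`, `DayOfWeek`, `DayOfMonth`, `DayOfYear`), each scaled to [-0.5, 0.5]. A quick sketch, assuming the functions above are importable:

```python
import pandas as pd

dates = pd.date_range('2021-01-01', periods=24, freq='h')
feats = time_features(dates, freq='h')

print(feats.shape)   # (4, 24): one row per feature, one column per timestamp
print(feats[0, :3])  # hour-of-day encoding: [-0.5, -0.4565..., -0.4130...]
```
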
/SimMTM_Classification/code/utils/tools.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import torch
3 | import matplotlib.pyplot as plt
4 |
5 | plt.switch_backend('agg')
6 |
7 | def adjust_learning_rate(optimizer, epoch, args, learning_rate):
8 | # lr = args.learning_rate * (0.2 ** (epoch // 2))
9 |
10 | if args.lradj == 'type1':
11 | lr_adjust = {epoch: learning_rate * (0.5 ** ((epoch - 1) // 1))}
12 |
13 | if epoch in lr_adjust.keys():
14 | lr = lr_adjust[epoch]
15 | for param_group in optimizer.param_groups:
16 | param_group['lr'] = lr
17 | print('Updating learning rate to {}'.format(lr))
18 | elif args.lradj == 'type2':
19 | lr_adjust = {
20 | 2: 5e-5, 4: 1e-5, 6: 5e-6, 8: 1e-6,
21 | 10: 5e-7, 15: 1e-7, 20: 5e-8
22 | }
23 |
24 | if epoch in lr_adjust.keys():
25 | lr = lr_adjust[epoch]
26 | for param_group in optimizer.param_groups:
27 | param_group['lr'] = lr
28 | print('Updating learning rate to {}'.format(lr))
29 |
30 |
31 | class EarlyStopping:
32 | def __init__(self, patience=7, verbose=False, delta=0):
33 | self.patience = patience
34 | self.verbose = verbose
35 | self.counter = 0
36 | self.best_score = None
37 | self.early_stop = False
38 |         self.val_loss_min = np.inf
39 | self.delta = delta
40 |
41 | def __call__(self, val_loss, model, path, pred_len):
42 | score = -val_loss
43 | if self.best_score is None:
44 | self.best_score = score
45 | self.save_checkpoint(val_loss, model, path, pred_len)
46 | elif score < self.best_score + self.delta:
47 | self.counter += 1
48 | print(f'EarlyStopping counter: {self.counter} out of {self.patience}')
49 | if self.counter >= self.patience:
50 | self.early_stop = True
51 | else:
52 | self.best_score = score
53 | self.save_checkpoint(val_loss, model, path, pred_len)
54 | self.counter = 0
55 |
56 | def save_checkpoint(self, val_loss, model, path, pred_len):
57 | if self.verbose:
58 | print(f'Validation loss decreased ({self.val_loss_min:.6f} --> {val_loss:.6f}). Saving model ...')
59 | torch.save(model.state_dict(), path + '/' + f'checkpoint_{pred_len}.pth')
60 | self.val_loss_min = val_loss
--------------------------------------------------------------------------------
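
A minimal sketch of the `EarlyStopping` protocol above; the validation losses and checkpoint directory are made up for illustration:

```python
import os
import torch
from torch import nn

os.makedirs('./checkpoints', exist_ok=True)
model = nn.Linear(8, 1)  # stand-in model
early_stopping = EarlyStopping(patience=3, verbose=True)

for epoch, val_loss in enumerate([0.9, 0.7, 0.71, 0.72, 0.73]):
    early_stopping(val_loss, model, path='./checkpoints', pred_len=96)
    if early_stopping.early_stop:
        print(f'Early stopping at epoch {epoch}')  # fires after 3 epochs without improvement
        break
```
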
/SimMTM_Classification/code/utils/utils.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import random
3 | import numpy as np
4 | import pandas as pd
5 | import os
6 | import sys
7 | import logging
8 | from sklearn.metrics import classification_report, cohen_kappa_score, confusion_matrix, accuracy_score
9 | from shutil import copy
10 |
11 | def set_requires_grad(model, dict_, requires_grad=True):
12 | for param in model.named_parameters():
13 | if param[0] in dict_:
14 | param[1].requires_grad = requires_grad
15 |
16 |
17 | def fix_randomness(SEED):
18 | random.seed(SEED)
19 | np.random.seed(SEED)
20 | torch.manual_seed(SEED)
21 | torch.cuda.manual_seed(SEED)
22 | torch.backends.cudnn.deterministic = True
23 |
24 |
25 | def epoch_time(start_time, end_time):
26 | elapsed_time = end_time - start_time
27 | elapsed_mins = int(elapsed_time / 60)
28 | elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
29 | return elapsed_mins, elapsed_secs
30 |
31 |
32 | def _calc_metrics(pred_labels, true_labels, log_dir, home_path):
33 | pred_labels = np.array(pred_labels).astype(int)
34 | true_labels = np.array(true_labels).astype(int)
35 |
36 | # save targets
37 | labels_save_path = os.path.join(log_dir, "labels")
38 | os.makedirs(labels_save_path, exist_ok=True)
39 | np.save(os.path.join(labels_save_path, "predicted_labels.npy"), pred_labels)
40 | np.save(os.path.join(labels_save_path, "true_labels.npy"), true_labels)
41 |
42 | r = classification_report(true_labels, pred_labels, digits=6, output_dict=True)
43 | cm = confusion_matrix(true_labels, pred_labels)
44 | df = pd.DataFrame(r)
45 | df["cohen"] = cohen_kappa_score(true_labels, pred_labels)
46 | df["accuracy"] = accuracy_score(true_labels, pred_labels)
47 | df = df * 100
48 |
49 | # save classification report
50 | exp_name = os.path.split(os.path.dirname(log_dir))[-1]
51 | training_mode = os.path.basename(log_dir)
52 | file_name = f"{exp_name}_{training_mode}_classification_report.xlsx"
53 | report_Save_path = os.path.join(home_path, log_dir, file_name)
54 | df.to_excel(report_Save_path)
55 |
56 | # save confusion matrix
57 | cm_file_name = f"{exp_name}_{training_mode}_confusion_matrix.torch"
58 | cm_Save_path = os.path.join(home_path, log_dir, cm_file_name)
59 | torch.save(cm, cm_Save_path)
60 |
61 |
62 | def _logger(logger_name, level=logging.DEBUG):
63 | """
64 | Method to return a custom logger with the given name and level
65 | """
66 | logger = logging.getLogger(logger_name)
67 | logger.setLevel(level)
68 | # format_string = ("%(asctime)s — %(name)s — %(levelname)s — %(funcName)s:"
69 | # "%(lineno)d — %(message)s")
70 | format_string = "%(message)s"
71 | log_format = logging.Formatter(format_string)
72 | # Creating and adding the console handler
73 | console_handler = logging.StreamHandler(sys.stdout)
74 | console_handler.setFormatter(log_format)
75 | logger.addHandler(console_handler)
76 | # Creating and adding the file handler
77 | file_handler = logging.FileHandler(logger_name, mode='a')
78 | file_handler.setFormatter(log_format)
79 | logger.addHandler(file_handler)
80 | return logger
81 |
82 |
83 |
84 |
85 |
86 | def copy_Files(destination, data_type):
87 | # destination: 'experiments_logs/Exp1/run1'
88 | destination_dir = os.path.join(destination, "model_files")
89 | os.makedirs(destination_dir, exist_ok=True)
90 | copy("code/main.py", os.path.join(destination_dir, "main.py"))
91 | copy("code/trainer.py", os.path.join(destination_dir, "trainer.py"))
92 | copy(f"code/config_files/{data_type}_Configs.py", os.path.join(destination_dir, f"{data_type}_Configs.py"))
93 | copy("code/augmentations.py", os.path.join(destination_dir, "augmentations.py"))
94 | copy("code/dataloader.py", os.path.join(destination_dir, "dataloader.py"))
95 | copy(f"code/model.py", os.path.join(destination_dir, f"model.py"))
96 | copy("code/loss.py", os.path.join(destination_dir, "loss.py"))
97 | copy("code/TC.py", os.path.join(destination_dir, "TC.py"))
98 |
--------------------------------------------------------------------------------
/SimMTM_Classification/download_datasets.sh:
--------------------------------------------------------------------------------
1 | wget -O SleepEEG.zip https://figshare.com/ndownloader/articles/19930178/versions/1
2 | wget -O Epilepsy.zip https://figshare.com/ndownloader/articles/19930199/versions/1
3 | wget -O FD-A.zip https://figshare.com/ndownloader/articles/19930205/versions/1
4 | wget -O FD-B.zip https://figshare.com/ndownloader/articles/19930226/versions/1
5 | wget -O HAR.zip https://figshare.com/ndownloader/articles/19930244/versions/1
6 | wget -O Gesture.zip https://figshare.com/ndownloader/articles/19930247/versions/1
7 | wget -O ECG.zip https://figshare.com/ndownloader/articles/19930253/versions/1
8 | wget -O EMG.zip https://figshare.com/ndownloader/articles/19930250/versions/1
9 |
10 | unzip SleepEEG.zip -d datasets/SleepEEG/
11 | unzip Epilepsy.zip -d datasets/Epilepsy/
12 | unzip FD-A.zip -d datasets/FD-A/
13 | unzip FD-B.zip -d datasets/FD-B/
14 | unzip HAR.zip -d datasets/HAR/
15 | unzip Gesture.zip -d datasets/Gesture/
16 | unzip ECG.zip -d datasets/ECG/
17 | unzip EMG.zip -d datasets/EMG/
18 |
19 | #rm {SleepEEG,Epilepsy,FD-A,FD-B,HAR,Gesture,ECG,EMG}.zip
20 |
21 |
22 |
23 |
24 |
--------------------------------------------------------------------------------
/SimMTM_Classification/run.sh:
--------------------------------------------------------------------------------
1 | python ./code/main.py --target_dataset Epilepsy --finetune_epoch 100
2 | python ./code/main.py --target_dataset FD-B --lr 0.0003
3 | python ./code/main.py --target_dataset Gesture
4 | python ./code/main.py --target_dataset EMG --lr 0.0003
5 |
--------------------------------------------------------------------------------
/SimMTM_Forecasting/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/.DS_Store
--------------------------------------------------------------------------------
/SimMTM_Forecasting/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | pip-wheel-metadata/
24 | share/python-wheels/
25 | *.egg-info/
26 | .installed.cfg
27 | *.egg
28 | MANIFEST
29 |
30 | # PyInstaller
31 | # Usually these files are written by a python script from a template
32 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
33 | *.manifest
34 | *.spec
35 |
36 | # Installer logs
37 | pip-log.txt
38 | pip-delete-this-directory.txt
39 |
40 | # Unit test / coverage reports
41 | htmlcov/
42 | .tox/
43 | .nox/
44 | .coverage
45 | .coverage.*
46 | .cache
47 | nosetests.xml
48 | coverage.xml
49 | *.cover
50 | *.py,cover
51 | .hypothesis/
52 | .pytest_cache/
53 |
54 | # Translations
55 | *.mo
56 | *.pot
57 |
58 | # Django stuff:
59 | *.log
60 | local_settings.py
61 | db.sqlite3
62 | db.sqlite3-journal
63 |
64 | # Flask stuff:
65 | instance/
66 | .webassets-cache
67 |
68 | # Scrapy stuff:
69 | .scrapy
70 |
71 | # Sphinx documentation
72 | docs/_build/
73 |
74 | # PyBuilder
75 | target/
76 |
77 | # Jupyter Notebook
78 | .ipynb_checkpoints
79 |
80 | # IPython
81 | profile_default/
82 | ipython_config.py
83 |
84 | # pyenv
85 | .python-version
86 |
87 | # pipenv
88 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
89 | # However, in case of collaboration, if having platform-specific dependencies or dependencies
90 | # having no cross-platform support, pipenv may install dependencies that don't work, or not
91 | # install all needed dependencies.
92 | #Pipfile.lock
93 |
94 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow
95 | __pypackages__/
96 |
97 | # Celery stuff
98 | celerybeat-schedule
99 | celerybeat.pid
100 |
101 | # SageMath parsed files
102 | *.sage.py
103 |
104 | # Environments
105 | .env
106 | .venv
107 | env/
108 | venv/
109 | ENV/
110 | env.bak/
111 | venv.bak/
112 |
113 | # Spyder project settings
114 | .spyderproject
115 | .spyproject
116 |
117 | # Rope project settings
118 | .ropeproject
119 |
120 | # mkdocs documentation
121 | /site
122 |
123 | # mypy
124 | .mypy_cache/
125 | .dmypy.json
126 | dmypy.json
127 |
128 | # Pyre type checker
129 | .pyre/
130 | /scripts/long_term_forecast/Traffic_script/PatchTST1.sh
131 | /backups/
132 | /result.xlsx
133 | /~$result.xlsx
134 | /Time-Series-Library.zip
135 | /temp.sh
136 |
--------------------------------------------------------------------------------
/SimMTM_Forecasting/data_provider/__init__.py:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/SimMTM_Forecasting/data_provider/data_factory.py:
--------------------------------------------------------------------------------
1 | from data_provider.data_loader import Dataset_ETT_hour, Dataset_ETT_minute, Dataset_Custom, Dataset_M4, PSMSegLoader, \
2 | MSLSegLoader, SMAPSegLoader, SMDSegLoader, SWATSegLoader, UEAloader
3 | from data_provider.uea import collate_fn
4 | from torch.utils.data import DataLoader
5 |
6 | data_dict = {
7 | 'ETTh1': Dataset_ETT_hour,
8 | 'ETTh2': Dataset_ETT_hour,
9 | 'ETTm1': Dataset_ETT_minute,
10 | 'ETTm2': Dataset_ETT_minute,
11 | 'Traffic': Dataset_Custom,
12 | 'Exchange': Dataset_Custom,
13 | 'Weather': Dataset_Custom,
14 | 'ECL': Dataset_Custom,
15 | 'ILI': Dataset_Custom,
16 | 'm4': Dataset_M4,
17 | 'PSM': PSMSegLoader,
18 | 'MSL': MSLSegLoader,
19 | 'SMAP': SMAPSegLoader,
20 | 'SMD': SMDSegLoader,
21 | 'SWAT': SWATSegLoader,
22 | 'UEA': UEAloader,
23 | }
24 |
25 |
26 | def data_provider(args, flag):
27 | Data = data_dict[args.data]
28 |
29 | timeenc = 0 if args.embed != 'timeF' else 1
30 |
31 | if flag == 'test':
32 | shuffle_flag = False
33 | drop_last = True
34 | if args.task_name == 'anomaly_detection' or args.task_name == 'classification':
35 | batch_size = args.batch_size
36 | else:
37 | batch_size = 1 # bsz=1 for evaluation
38 | freq = args.freq
39 | else:
40 | shuffle_flag = True
41 | drop_last = True
42 | batch_size = args.batch_size # bsz for train and valid
43 | freq = args.freq
44 |
45 | if args.task_name == 'anomaly_detection':
46 | drop_last = False
47 | data_set = Data(
48 | root_path=args.root_path,
49 | win_size=args.seq_len,
50 | flag=flag,
51 | )
52 | print(flag, len(data_set))
53 | data_loader = DataLoader(
54 | data_set,
55 | batch_size=batch_size,
56 | shuffle=shuffle_flag,
57 | num_workers=args.num_workers,
58 | drop_last=drop_last)
59 | return data_set, data_loader
60 | elif args.task_name == 'classification':
61 | drop_last = False
62 | data_set = Data(
63 | root_path=args.root_path,
64 | flag=flag,
65 | )
66 | print(flag, len(data_set))
67 | data_loader = DataLoader(
68 | data_set,
69 | batch_size=batch_size,
70 | shuffle=shuffle_flag,
71 | num_workers=args.num_workers,
72 | drop_last=drop_last,
73 | collate_fn=lambda x: collate_fn(x, max_len=args.seq_len)
74 | )
75 | return data_set, data_loader
76 | else:
77 | if args.data == 'm4':
78 | drop_last = False
79 |
80 | data_set = Data(
81 | root_path=args.root_path,
82 | data_path=args.data_path,
83 | flag=flag,
84 | size=[args.seq_len, args.label_len, args.pred_len],
85 | features=args.features,
86 | target=args.target,
87 | timeenc=timeenc,
88 | freq=freq,
89 | seasonal_patterns=args.seasonal_patterns
90 | )
91 |
92 | data_loader = DataLoader(
93 | data_set,
94 | batch_size=batch_size,
95 | shuffle=shuffle_flag,
96 | num_workers=args.num_workers,
97 | drop_last=drop_last)
98 |
99 | print(flag, len(data_set), len(data_loader))
100 | return data_set, data_loader
101 |
--------------------------------------------------------------------------------
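
A sketch of the arguments `data_provider` expects for an ETT forecasting run; the field names mirror the dataset classes above, while the paths and sizes are illustrative and assume the ETTh1 csv is on disk:

```python
from argparse import Namespace

# Illustrative arguments only; assumes data_provider above is importable.
args = Namespace(
    task_name='long_term_forecast', data='ETTh1',
    root_path='./dataset/ETT-small/', data_path='ETTh1.csv',
    features='M', target='OT', embed='timeF', freq='h',
    seq_len=336, label_len=48, pred_len=96,
    batch_size=32, num_workers=0, seasonal_patterns=None,
)

train_set, train_loader = data_provider(args, flag='train')
test_set, test_loader = data_provider(args, flag='test')  # batch_size is forced to 1 here
```
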
/SimMTM_Forecasting/data_provider/m4.py:
--------------------------------------------------------------------------------
1 | # This source code is provided for the purposes of scientific reproducibility
2 | # under the following limited license from Element AI Inc. The code is an
3 | # implementation of the N-BEATS model (Oreshkin et al., N-BEATS: Neural basis
4 | # expansion analysis for interpretable time series forecasting,
5 | # https://arxiv.org/abs/1905.10437). The copyright to the source code is
6 | # licensed under the Creative Commons - Attribution-NonCommercial 4.0
7 | # International license (CC BY-NC 4.0):
8 | # https://creativecommons.org/licenses/by-nc/4.0/. Any commercial use (whether
9 | # for the benefit of third parties or internally in production) requires an
10 | # explicit license. The subject-matter of the N-BEATS model and associated
11 | # materials are the property of Element AI Inc. and may be subject to patent
12 | # protection. No license to patents is granted hereunder (whether express or
13 | # implied). Copyright © 2020 Element AI Inc. All rights reserved.
14 |
15 | """
16 | M4 Dataset
17 | """
18 | from dataclasses import dataclass
19 |
20 | import numpy as np
21 | import pandas as pd
22 | import logging
23 | import os
24 | import pathlib
25 | import sys
26 | from urllib import request
27 |
28 |
29 | def url_file_name(url: str) -> str:
30 | """
31 | Extract file name from url.
32 |
33 | :param url: URL to extract file name from.
34 | :return: File name.
35 | """
36 | return url.split('/')[-1] if len(url) > 0 else ''
37 |
38 |
39 | def download(url: str, file_path: str) -> None:
40 | """
41 | Download a file to the given path.
42 |
43 | :param url: URL to download
44 | :param file_path: Where to download the content.
45 | """
46 |
47 | def progress(count, block_size, total_size):
48 | progress_pct = float(count * block_size) / float(total_size) * 100.0
49 | sys.stdout.write('\rDownloading {} to {} {:.1f}%'.format(url, file_path, progress_pct))
50 | sys.stdout.flush()
51 |
52 | if not os.path.isfile(file_path):
53 | opener = request.build_opener()
54 | opener.addheaders = [('User-agent', 'Mozilla/5.0')]
55 | request.install_opener(opener)
56 | pathlib.Path(os.path.dirname(file_path)).mkdir(parents=True, exist_ok=True)
57 | f, _ = request.urlretrieve(url, file_path, progress)
58 | sys.stdout.write('\n')
59 | sys.stdout.flush()
60 | file_info = os.stat(f)
61 | logging.info(f'Successfully downloaded {os.path.basename(file_path)} {file_info.st_size} bytes.')
62 | else:
63 | file_info = os.stat(file_path)
64 | logging.info(f'File already exists: {file_path} {file_info.st_size} bytes.')
65 |
66 |
67 | @dataclass()
68 | class M4Dataset:
69 | ids: np.ndarray
70 | groups: np.ndarray
71 | frequencies: np.ndarray
72 | horizons: np.ndarray
73 | values: np.ndarray
74 |
75 | @staticmethod
76 | def load(training: bool = True, dataset_file: str = '../dataset/m4') -> 'M4Dataset':
77 | """
78 | Load cached dataset.
79 |
80 | :param training: Load training part if training is True, test part otherwise.
81 | """
82 | info_file = os.path.join(dataset_file, 'M4-info.csv')
83 | train_cache_file = os.path.join(dataset_file, 'training.npz')
84 | test_cache_file = os.path.join(dataset_file, 'test.npz')
85 | m4_info = pd.read_csv(info_file)
86 | return M4Dataset(ids=m4_info.M4id.values,
87 | groups=m4_info.SP.values,
88 | frequencies=m4_info.Frequency.values,
89 | horizons=m4_info.Horizon.values,
90 | values=np.load(train_cache_file if training else test_cache_file, allow_pickle=True))
91 |
92 |
93 | @dataclass()
94 | class M4Meta:
95 | seasonal_patterns = ['Yearly', 'Quarterly', 'Monthly', 'Weekly', 'Daily', 'Hourly']
96 | horizons = [6, 8, 18, 13, 14, 48]
97 | frequencies = [1, 4, 12, 1, 1, 24]
98 | horizons_map = {
99 | 'Yearly': 6,
100 | 'Quarterly': 8,
101 | 'Monthly': 18,
102 | 'Weekly': 13,
103 | 'Daily': 14,
104 | 'Hourly': 48
105 | } # different predict length
106 | frequency_map = {
107 | 'Yearly': 1,
108 | 'Quarterly': 4,
109 | 'Monthly': 12,
110 | 'Weekly': 1,
111 | 'Daily': 1,
112 | 'Hourly': 24
113 | }
114 | history_size = {
115 | 'Yearly': 1.5,
116 | 'Quarterly': 1.5,
117 | 'Monthly': 1.5,
118 | 'Weekly': 10,
119 | 'Daily': 10,
120 | 'Hourly': 10
121 | } # from interpretable.gin
122 |
123 |
124 | def load_m4_info(dataset_file: str = '../dataset/m4') -> pd.DataFrame:
125 |     """
126 |     Load the M4-info file.
127 | 
128 |     :return: Pandas DataFrame of M4Info.
129 |     """
130 |     return pd.read_csv(os.path.join(dataset_file, 'M4-info.csv'))
131 |
--------------------------------------------------------------------------------
/SimMTM_Forecasting/data_provider/uea.py:
--------------------------------------------------------------------------------
1 | import os
2 | import numpy as np
3 | import pandas as pd
4 | import torch
5 |
6 |
7 | def collate_fn(data, max_len=None):
8 |     """Build mini-batch tensors from a list of (X, y) tuples, padding variable-length sequences to a common length.
9 |     Args:
10 |         data: len(batch_size) list of tuples (X, y).
11 |             - X: torch tensor of shape (seq_length, feat_dim); variable seq_length.
12 |             - y: torch tensor of shape (num_labels,) : class indices or numerical targets
13 |                 (for classification or regression, respectively). num_labels > 1 for multi-task models
14 |         max_len: global fixed sequence length. Used for architectures requiring fixed length input,
15 |             where the batch length cannot vary dynamically. Longer sequences are clipped, shorter are padded with 0s
16 |     Returns:
17 |         X: (batch_size, padded_length, feat_dim) torch tensor of padded features (input)
18 |         targets: (batch_size, num_labels) torch tensor of class indices or numerical targets
19 |             (stacked across the batch)
20 |         padding_masks: (batch_size, padded_length) boolean tensor, 1 means keep vector at this position,
21 |             0 means padding
22 |     """
23 |
24 | batch_size = len(data)
25 | features, labels = zip(*data)
26 |
27 | # Stack and pad features and masks (convert 2D to 3D tensors, i.e. add batch dimension)
28 | lengths = [X.shape[0] for X in features] # original sequence length for each time series
29 | if max_len is None:
30 | max_len = max(lengths)
31 | X = torch.zeros(batch_size, max_len, features[0].shape[-1]) # (batch_size, padded_length, feat_dim)
32 | for i in range(batch_size):
33 | end = min(lengths[i], max_len)
34 | X[i, :end, :] = features[i][:end, :]
35 |
36 | targets = torch.stack(labels, dim=0) # (batch_size, num_labels)
37 |
38 | padding_masks = padding_mask(torch.tensor(lengths, dtype=torch.int16),
39 | max_len=max_len) # (batch_size, padded_length) boolean tensor, "1" means keep
40 |
41 | return X, targets, padding_masks
42 |
43 |
44 | def padding_mask(lengths, max_len=None):
45 | """
46 | Used to mask padded positions: creates a (batch_size, max_len) boolean mask from a tensor of sequence lengths,
47 | where 1 means keep element at this position (time step)
48 | """
49 | batch_size = lengths.numel()
50 |     max_len = max_len or lengths.max()  # trick works because of overloading of 'or' operator for non-boolean types
51 | return (torch.arange(0, max_len, device=lengths.device)
52 | .type_as(lengths)
53 | .repeat(batch_size, 1)
54 | .lt(lengths.unsqueeze(1)))
55 |
56 |
57 | class Normalizer(object):
58 | """
59 | Normalizes dataframe across ALL contained rows (time steps). Different from per-sample normalization.
60 | """
61 |
62 | def __init__(self, norm_type='standardization', mean=None, std=None, min_val=None, max_val=None):
63 | """
64 | Args:
65 | norm_type: choose from:
66 | "standardization", "minmax": normalizes dataframe across ALL contained rows (time steps)
67 | "per_sample_std", "per_sample_minmax": normalizes each sample separately (i.e. across only its own rows)
68 | mean, std, min_val, max_val: optional (num_feat,) Series of pre-computed values
69 | """
70 |
71 | self.norm_type = norm_type
72 | self.mean = mean
73 | self.std = std
74 | self.min_val = min_val
75 | self.max_val = max_val
76 |
77 | def normalize(self, df):
78 | """
79 | Args:
80 | df: input dataframe
81 | Returns:
82 | df: normalized dataframe
83 | """
84 | if self.norm_type == "standardization":
85 | if self.mean is None:
86 | self.mean = df.mean()
87 | self.std = df.std()
88 | return (df - self.mean) / (self.std + np.finfo(float).eps)
89 |
90 | elif self.norm_type == "minmax":
91 | if self.max_val is None:
92 | self.max_val = df.max()
93 | self.min_val = df.min()
94 | return (df - self.min_val) / (self.max_val - self.min_val + np.finfo(float).eps)
95 |
96 | elif self.norm_type == "per_sample_std":
97 | grouped = df.groupby(by=df.index)
98 | return (df - grouped.transform('mean')) / grouped.transform('std')
99 |
100 | elif self.norm_type == "per_sample_minmax":
101 | grouped = df.groupby(by=df.index)
102 | min_vals = grouped.transform('min')
103 | return (df - min_vals) / (grouped.transform('max') - min_vals + np.finfo(float).eps)
104 |
105 | else:
106 | raise (NameError(f'Normalize method "{self.norm_type}" not implemented'))
107 |
108 |
109 | def interpolate_missing(y):
110 | """
111 | Replaces NaN values in pd.Series `y` using linear interpolation
112 | """
113 | if y.isna().any():
114 | y = y.interpolate(method='linear', limit_direction='both')
115 | return y
116 |
117 |
118 | def subsample(y, limit=256, factor=2):
119 | """
120 | If a given Series is longer than `limit`, returns subsampled sequence by the specified integer factor
121 | """
122 | if len(y) > limit:
123 | return y[::factor].reset_index(drop=True)
124 | return y
125 |
--------------------------------------------------------------------------------
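
A short sketch of batching two variable-length series with `collate_fn` and `padding_mask`, assuming both functions above are in scope:

```python
import torch

batch = [
    (torch.randn(5, 3), torch.tensor([0])),  # 5 time steps, 3 features
    (torch.randn(8, 3), torch.tensor([1])),  # 8 time steps, 3 features
]
X, targets, padding_masks = collate_fn(batch, max_len=10)

print(X.shape)              # torch.Size([2, 10, 3]); short series are zero-padded
print(targets.squeeze(-1))  # tensor([0, 1])
print(padding_masks[0])     # True for the first 5 positions, False afterwards
```
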
/SimMTM_Forecasting/exp/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/exp/.DS_Store
--------------------------------------------------------------------------------
/SimMTM_Forecasting/exp/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/exp/__init__.py
--------------------------------------------------------------------------------
/SimMTM_Forecasting/exp/exp_basic.py:
--------------------------------------------------------------------------------
1 | import os
2 | import torch
3 | from models import SimMTM
4 |
5 |
6 | class Exp_Basic(object):
7 | def __init__(self, args):
8 | self.args = args
9 | self.model_dict = {'SimMTM': SimMTM}
10 | self.device = self._acquire_device()
11 | self.model = self._build_model().to(self.device)
12 |
13 | def _build_model(self):
14 | raise NotImplementedError
15 |
16 |
17 | def _acquire_device(self):
18 | if self.args.use_gpu:
19 | os.environ["CUDA_VISIBLE_DEVICES"] = str(self.args.gpu) if not self.args.use_multi_gpu else self.args.devices
20 | device = torch.device('cuda:{}'.format(self.args.gpu))
21 | print('Use GPU: cuda:{}'.format(self.args.gpu))
22 | else:
23 | device = torch.device('cpu')
24 | print('Use CPU')
25 | return device
26 |
27 | def _get_data(self):
28 | pass
29 |
30 | def vali(self):
31 | pass
32 |
33 | def train(self):
34 | pass
35 |
36 | def test(self):
37 | pass
38 |
--------------------------------------------------------------------------------
/SimMTM_Forecasting/layers/AutoCorrelation.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import torch.nn.functional as F
4 | import matplotlib.pyplot as plt
5 | import numpy as np
6 | import math
7 | from math import sqrt
8 | import os
9 |
10 |
11 | class AutoCorrelation(nn.Module):
12 | """
13 | AutoCorrelation Mechanism with the following two phases:
14 | (1) period-based dependencies discovery
15 | (2) time delay aggregation
16 | This block can replace the self-attention family mechanism seamlessly.
17 | """
18 |
19 | def __init__(self, mask_flag=True, factor=1, scale=None, attention_dropout=0.1, output_attention=False):
20 | super(AutoCorrelation, self).__init__()
21 | self.factor = factor
22 | self.scale = scale
23 | self.mask_flag = mask_flag
24 | self.output_attention = output_attention
25 | self.dropout = nn.Dropout(attention_dropout)
26 |
27 | def time_delay_agg_training(self, values, corr):
28 | """
29 | SpeedUp version of Autocorrelation (a batch-normalization style design)
30 | This is for the training phase.
31 | """
32 | head = values.shape[1]
33 | channel = values.shape[2]
34 | length = values.shape[3]
35 | # find top k
36 | top_k = int(self.factor * math.log(length))
37 | mean_value = torch.mean(torch.mean(corr, dim=1), dim=1)
38 | index = torch.topk(torch.mean(mean_value, dim=0), top_k, dim=-1)[1]
39 | weights = torch.stack([mean_value[:, index[i]] for i in range(top_k)], dim=-1)
40 | # update corr
41 | tmp_corr = torch.softmax(weights, dim=-1)
42 | # aggregation
43 | tmp_values = values
44 | delays_agg = torch.zeros_like(values).float()
45 | for i in range(top_k):
46 | pattern = torch.roll(tmp_values, -int(index[i]), -1)
47 | delays_agg = delays_agg + pattern * \
48 | (tmp_corr[:, i].unsqueeze(1).unsqueeze(1).unsqueeze(1).repeat(1, head, channel, length))
49 | return delays_agg
50 |
51 | def time_delay_agg_inference(self, values, corr):
52 | """
53 | SpeedUp version of Autocorrelation (a batch-normalization style design)
54 | This is for the inference phase.
55 | """
56 | batch = values.shape[0]
57 | head = values.shape[1]
58 | channel = values.shape[2]
59 | length = values.shape[3]
60 | # index init
61 |         init_index = torch.arange(length).unsqueeze(0).unsqueeze(0).unsqueeze(0).repeat(batch, head, channel, 1).to(values.device)
62 | # find top k
63 | top_k = int(self.factor * math.log(length))
64 | mean_value = torch.mean(torch.mean(corr, dim=1), dim=1)
65 | weights, delay = torch.topk(mean_value, top_k, dim=-1)
66 | # update corr
67 | tmp_corr = torch.softmax(weights, dim=-1)
68 | # aggregation
69 | tmp_values = values.repeat(1, 1, 1, 2)
70 | delays_agg = torch.zeros_like(values).float()
71 | for i in range(top_k):
72 | tmp_delay = init_index + delay[:, i].unsqueeze(1).unsqueeze(1).unsqueeze(1).repeat(1, head, channel, length)
73 | pattern = torch.gather(tmp_values, dim=-1, index=tmp_delay)
74 | delays_agg = delays_agg + pattern * \
75 | (tmp_corr[:, i].unsqueeze(1).unsqueeze(1).unsqueeze(1).repeat(1, head, channel, length))
76 | return delays_agg
77 |
78 | def time_delay_agg_full(self, values, corr):
79 | """
80 | Standard version of Autocorrelation
81 | """
82 | batch = values.shape[0]
83 | head = values.shape[1]
84 | channel = values.shape[2]
85 | length = values.shape[3]
86 | # index init
87 |         init_index = torch.arange(length).unsqueeze(0).unsqueeze(0).unsqueeze(0).repeat(batch, head, channel, 1).to(values.device)
88 | # find top k
89 | top_k = int(self.factor * math.log(length))
90 | weights, delay = torch.topk(corr, top_k, dim=-1)
91 | # update corr
92 | tmp_corr = torch.softmax(weights, dim=-1)
93 | # aggregation
94 | tmp_values = values.repeat(1, 1, 1, 2)
95 | delays_agg = torch.zeros_like(values).float()
96 | for i in range(top_k):
97 | tmp_delay = init_index + delay[..., i].unsqueeze(-1)
98 | pattern = torch.gather(tmp_values, dim=-1, index=tmp_delay)
99 | delays_agg = delays_agg + pattern * (tmp_corr[..., i].unsqueeze(-1))
100 | return delays_agg
101 |
102 | def forward(self, queries, keys, values, attn_mask):
103 | B, L, H, E = queries.shape
104 | _, S, _, D = values.shape
105 | if L > S:
106 | zeros = torch.zeros_like(queries[:, :(L - S), :]).float()
107 | values = torch.cat([values, zeros], dim=1)
108 | keys = torch.cat([keys, zeros], dim=1)
109 | else:
110 | values = values[:, :L, :, :]
111 | keys = keys[:, :L, :, :]
112 |
113 | # period-based dependencies
114 | q_fft = torch.fft.rfft(queries.permute(0, 2, 3, 1).contiguous(), dim=-1)
115 | k_fft = torch.fft.rfft(keys.permute(0, 2, 3, 1).contiguous(), dim=-1)
116 | res = q_fft * torch.conj(k_fft)
117 | corr = torch.fft.irfft(res, dim=-1)
118 |
119 | # time delay agg
120 | if self.training:
121 | V = self.time_delay_agg_training(values.permute(0, 2, 3, 1).contiguous(), corr).permute(0, 3, 1, 2)
122 | else:
123 | V = self.time_delay_agg_inference(values.permute(0, 2, 3, 1).contiguous(), corr).permute(0, 3, 1, 2)
124 |
125 | if self.output_attention:
126 | return (V.contiguous(), corr.permute(0, 3, 1, 2))
127 | else:
128 | return (V.contiguous(), None)
129 |
130 |
131 | class AutoCorrelationLayer(nn.Module):
132 | def __init__(self, correlation, d_model, n_heads, d_keys=None,
133 | d_values=None):
134 | super(AutoCorrelationLayer, self).__init__()
135 |
136 | d_keys = d_keys or (d_model // n_heads)
137 | d_values = d_values or (d_model // n_heads)
138 |
139 | self.inner_correlation = correlation
140 | self.query_projection = nn.Linear(d_model, d_keys * n_heads)
141 | self.key_projection = nn.Linear(d_model, d_keys * n_heads)
142 | self.value_projection = nn.Linear(d_model, d_values * n_heads)
143 | self.out_projection = nn.Linear(d_values * n_heads, d_model)
144 | self.n_heads = n_heads
145 |
146 | def forward(self, queries, keys, values, attn_mask):
147 | B, L, _ = queries.shape
148 | _, S, _ = keys.shape
149 | H = self.n_heads
150 |
151 | queries = self.query_projection(queries).view(B, L, H, -1)
152 | keys = self.key_projection(keys).view(B, S, H, -1)
153 | values = self.value_projection(values).view(B, S, H, -1)
154 |
155 | out, attn = self.inner_correlation(
156 | queries,
157 | keys,
158 | values,
159 | attn_mask
160 | )
161 | out = out.view(B, L, -1)
162 |
163 | return self.out_projection(out), attn
164 |
--------------------------------------------------------------------------------
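
The period-based dependency discovery in `AutoCorrelation.forward` computes `irfft(rfft(q) * conj(rfft(k)))`, which is the circular cross-correlation of `q` and `k` over all lags (Wiener-Khinchin). A standalone numeric check, not part of the layer:

```python
import torch

L = 8
q, k = torch.randn(L), torch.randn(L)

# FFT-based correlation, as used in AutoCorrelation.forward (per channel)
corr_fft = torch.fft.irfft(torch.fft.rfft(q) * torch.conj(torch.fft.rfft(k)), n=L)

# Naive circular cross-correlation: corr[tau] = sum_t q[t] * k[(t - tau) mod L]
corr_naive = torch.stack([(q * torch.roll(k, tau)).sum() for tau in range(L)])

print(torch.allclose(corr_fft, corr_naive, atol=1e-5))  # True
```
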
/SimMTM_Forecasting/layers/Autoformer_EncDec.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import torch.nn.functional as F
4 |
5 |
6 | class my_Layernorm(nn.Module):
7 | """
8 |     Specially designed layernorm for the seasonal part
9 | """
10 |
11 | def __init__(self, channels):
12 | super(my_Layernorm, self).__init__()
13 | self.layernorm = nn.LayerNorm(channels)
14 |
15 | def forward(self, x):
16 | x_hat = self.layernorm(x)
17 | bias = torch.mean(x_hat, dim=1).unsqueeze(1).repeat(1, x.shape[1], 1)
18 | return x_hat - bias
19 |
20 |
21 | class moving_avg(nn.Module):
22 | """
23 | Moving average block to highlight the trend of time series
24 | """
25 |
26 | def __init__(self, kernel_size, stride):
27 | super(moving_avg, self).__init__()
28 | self.kernel_size = kernel_size
29 | self.avg = nn.AvgPool1d(kernel_size=kernel_size, stride=stride, padding=0)
30 |
31 | def forward(self, x):
32 |
33 | # padding on the both ends of time series
34 | front = x[:, 0:1, :].repeat(1, (self.kernel_size - 1) // 2, 1)
35 | end = x[:, -1:, :].repeat(1, (self.kernel_size - 1) // 2, 1)
36 | x = torch.cat([front, x, end], dim=1)
37 | x = self.avg(x.permute(0, 2, 1))
38 | x = x.permute(0, 2, 1)
39 | return x
40 |
41 |
42 | class series_decomp(nn.Module):
43 | """
44 | Series decomposition block
45 | """
46 |
47 | def __init__(self, kernel_size):
48 | super(series_decomp, self).__init__()
49 | self.moving_avg = moving_avg(kernel_size, stride=1)
50 |
51 | def forward(self, x):
52 | moving_mean = self.moving_avg(x)
53 | res = x - moving_mean
54 | return res, moving_mean
55 |
56 |
57 | class series_decomp_multi(nn.Module):
58 | """
59 | Multiple Series decomposition block from FEDformer
60 | """
61 |
62 | def __init__(self, kernel_size):
63 | super(series_decomp_multi, self).__init__()
64 | self.kernel_size = kernel_size
65 | self.series_decomp = [series_decomp(kernel) for kernel in kernel_size]
66 |
67 | def forward(self, x):
68 | moving_mean = []
69 | res = []
70 | for func in self.series_decomp:
71 | sea, moving_avg = func(x)
72 | moving_mean.append(moving_avg)
73 | res.append(sea)
74 |
75 | sea = sum(res) / len(res)
76 | moving_mean = sum(moving_mean) / len(moving_mean)
77 | return sea, moving_mean
78 |
79 |
80 | class EncoderLayer(nn.Module):
81 | """
82 | Autoformer encoder layer with the progressive decomposition architecture
83 | """
84 |
85 | def __init__(self, attention, d_model, d_ff=None, moving_avg=25, dropout=0.1, activation="relu"):
86 | super(EncoderLayer, self).__init__()
87 | d_ff = d_ff or 4 * d_model
88 | self.attention = attention
89 | self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1, bias=False)
90 | self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1, bias=False)
91 | self.decomp1 = series_decomp(moving_avg)
92 | self.decomp2 = series_decomp(moving_avg)
93 | self.dropout = nn.Dropout(dropout)
94 | self.activation = F.relu if activation == "relu" else F.gelu
95 |
96 | def forward(self, x, attn_mask=None):
97 | new_x, attn = self.attention(
98 | x, x, x,
99 | attn_mask=attn_mask
100 | )
101 | x = x + self.dropout(new_x)
102 | x, _ = self.decomp1(x)
103 | y = x
104 | y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1))))
105 | y = self.dropout(self.conv2(y).transpose(-1, 1))
106 | res, _ = self.decomp2(x + y)
107 | return res, attn
108 |
109 |
110 | class Encoder(nn.Module):
111 | """
112 | Autoformer encoder
113 | """
114 |
115 | def __init__(self, attn_layers, conv_layers=None, norm_layer=None):
116 | super(Encoder, self).__init__()
117 | self.attn_layers = nn.ModuleList(attn_layers)
118 | self.conv_layers = nn.ModuleList(conv_layers) if conv_layers is not None else None
119 | self.norm = norm_layer
120 |
121 | def forward(self, x, attn_mask=None):
122 | attns = []
123 | if self.conv_layers is not None:
124 | for attn_layer, conv_layer in zip(self.attn_layers, self.conv_layers):
125 | x, attn = attn_layer(x, attn_mask=attn_mask)
126 | x = conv_layer(x)
127 | attns.append(attn)
128 | x, attn = self.attn_layers[-1](x)
129 | attns.append(attn)
130 | else:
131 | for attn_layer in self.attn_layers:
132 | x, attn = attn_layer(x, attn_mask=attn_mask)
133 | attns.append(attn)
134 |
135 | if self.norm is not None:
136 | x = self.norm(x)
137 |
138 | return x, attns
139 |
140 |
141 | class DecoderLayer(nn.Module):
142 | """
143 | Autoformer decoder layer with the progressive decomposition architecture
144 | """
145 |
146 | def __init__(self, self_attention, cross_attention, d_model, c_out, d_ff=None,
147 | moving_avg=25, dropout=0.1, activation="relu"):
148 | super(DecoderLayer, self).__init__()
149 | d_ff = d_ff or 4 * d_model
150 | self.self_attention = self_attention
151 | self.cross_attention = cross_attention
152 | self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1, bias=False)
153 | self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1, bias=False)
154 | self.decomp1 = series_decomp(moving_avg)
155 | self.decomp2 = series_decomp(moving_avg)
156 | self.decomp3 = series_decomp(moving_avg)
157 | self.dropout = nn.Dropout(dropout)
158 | self.projection = nn.Conv1d(in_channels=d_model, out_channels=c_out, kernel_size=3, stride=1, padding=1,
159 | padding_mode='circular', bias=False)
160 | self.activation = F.relu if activation == "relu" else F.gelu
161 |
162 | def forward(self, x, cross, x_mask=None, cross_mask=None):
163 | x = x + self.dropout(self.self_attention(
164 | x, x, x,
165 | attn_mask=x_mask
166 | )[0])
167 | x, trend1 = self.decomp1(x)
168 | x = x + self.dropout(self.cross_attention(
169 | x, cross, cross,
170 | attn_mask=cross_mask
171 | )[0])
172 | x, trend2 = self.decomp2(x)
173 | y = x
174 | y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1))))
175 | y = self.dropout(self.conv2(y).transpose(-1, 1))
176 | x, trend3 = self.decomp3(x + y)
177 |
178 | residual_trend = trend1 + trend2 + trend3
179 | residual_trend = self.projection(residual_trend.permute(0, 2, 1)).transpose(1, 2)
180 | return x, residual_trend
181 |
182 |
183 | class Decoder(nn.Module):
184 | """
185 |     Autoformer decoder
186 | """
187 |
188 | def __init__(self, layers, norm_layer=None, projection=None):
189 | super(Decoder, self).__init__()
190 | self.layers = nn.ModuleList(layers)
191 | self.norm = norm_layer
192 | self.projection = projection
193 |
194 | def forward(self, x, cross, x_mask=None, cross_mask=None, trend=None):
195 | for layer in self.layers:
196 | x, residual_trend = layer(x, cross, x_mask=x_mask, cross_mask=cross_mask)
197 | trend = trend + residual_trend
198 |
199 | if self.norm is not None:
200 | x = self.norm(x)
201 |
202 | if self.projection is not None:
203 | x = self.projection(x)
204 | return x, trend
205 |
--------------------------------------------------------------------------------
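
`series_decomp` splits a series into a seasonal residual and a moving-average trend, and the split is exact by construction (`res = x - moving_mean`). A quick sketch, assuming the class above is in scope:

```python
import torch

x = torch.randn(2, 96, 7)  # (batch, length, channels)
decomp = series_decomp(kernel_size=25)
seasonal, trend = decomp(x)

print(seasonal.shape, trend.shape)          # both torch.Size([2, 96, 7])
print(torch.allclose(x, seasonal + trend))  # True: res = x - moving_mean exactly
```
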
/SimMTM_Forecasting/layers/Conv_Blocks.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 |
4 |
5 | class Inception_Block_V1(nn.Module):
6 | def __init__(self, in_channels, out_channels, num_kernels=6, init_weight=True):
7 | super(Inception_Block_V1, self).__init__()
8 | self.in_channels = in_channels
9 | self.out_channels = out_channels
10 | self.num_kernels = num_kernels
11 | kernels = []
12 | for i in range(self.num_kernels):
13 | kernels.append(nn.Conv2d(in_channels, out_channels, kernel_size=2 * i + 1, padding=i))
14 | self.kernels = nn.ModuleList(kernels)
15 | if init_weight:
16 | self._initialize_weights()
17 |
18 | def _initialize_weights(self):
19 | for m in self.modules():
20 | if isinstance(m, nn.Conv2d):
21 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
22 | if m.bias is not None:
23 | nn.init.constant_(m.bias, 0)
24 |
25 | def forward(self, x):
26 | res_list = []
27 | for i in range(self.num_kernels):
28 | res_list.append(self.kernels[i](x))
29 | res = torch.stack(res_list, dim=-1).mean(-1)
30 | return res
31 |
32 |
33 | class Inception_Block_V2(nn.Module):
34 | def __init__(self, in_channels, out_channels, num_kernels=6, init_weight=True):
35 | super(Inception_Block_V2, self).__init__()
36 | self.in_channels = in_channels
37 | self.out_channels = out_channels
38 | self.num_kernels = num_kernels
39 | kernels = []
40 | for i in range(self.num_kernels // 2):
41 | kernels.append(nn.Conv2d(in_channels, out_channels, kernel_size=[1, 2 * i + 3], padding=[0, i + 1]))
42 | kernels.append(nn.Conv2d(in_channels, out_channels, kernel_size=[2 * i + 3, 1], padding=[i + 1, 0]))
43 | kernels.append(nn.Conv2d(in_channels, out_channels, kernel_size=1))
44 | self.kernels = nn.ModuleList(kernels)
45 | if init_weight:
46 | self._initialize_weights()
47 |
48 | def _initialize_weights(self):
49 | for m in self.modules():
50 | if isinstance(m, nn.Conv2d):
51 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
52 | if m.bias is not None:
53 | nn.init.constant_(m.bias, 0)
54 |
55 | def forward(self, x):
56 | res_list = []
57 |         for i in range(len(self.kernels)):  # 2 * (num_kernels // 2) + 1 branches
58 | res_list.append(self.kernels[i](x))
59 | res = torch.stack(res_list, dim=-1).mean(-1)
60 | return res
61 |
--------------------------------------------------------------------------------
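
`Inception_Block_V1` runs `num_kernels` parallel Conv2d branches with kernel sizes 1, 3, 5, ... (padding chosen to preserve the spatial size) and averages their outputs. A small shape check, assuming the class above is in scope; the 4D input layout follows its usual TimesNet-style usage:

```python
import torch

block = Inception_Block_V1(in_channels=16, out_channels=32, num_kernels=6)
x = torch.randn(4, 16, 24, 12)  # illustrative (batch, d_model, num_periods, period_len)
y = block(x)
print(y.shape)                  # torch.Size([4, 32, 24, 12]): spatial size preserved
```
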
/SimMTM_Forecasting/layers/Embed.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import math
4 |
5 |
6 | class PositionalEmbedding(nn.Module):
7 | def __init__(self, d_model, max_len=5000):
8 | super(PositionalEmbedding, self).__init__()
9 | # Compute the positional encodings once in log space.
10 | pe = torch.zeros(max_len, d_model).float()
11 |         pe.requires_grad = False
12 |
13 | position = torch.arange(0, max_len).float().unsqueeze(1)
14 | div_term = (torch.arange(0, d_model, 2).float()
15 | * -(math.log(10000.0) / d_model)).exp()
16 |
17 | pe[:, 0::2] = torch.sin(position * div_term)
18 | pe[:, 1::2] = torch.cos(position * div_term)
19 |
20 | pe = pe.unsqueeze(0)
21 | self.register_buffer('pe', pe)
22 |
23 | def forward(self, x):
24 | return self.pe[:, :x.size(1)]
25 |
26 |
27 | class TokenEmbedding(nn.Module):
28 | def __init__(self, c_in, d_model):
29 | super(TokenEmbedding, self).__init__()
30 | padding = 1 if torch.__version__ >= '1.5.0' else 2
31 | self.tokenConv = nn.Conv1d(in_channels=c_in, out_channels=d_model,
32 | kernel_size=3, padding=padding, padding_mode='circular', bias=False)
33 | for m in self.modules():
34 | if isinstance(m, nn.Conv1d):
35 | nn.init.kaiming_normal_(
36 | m.weight, mode='fan_in', nonlinearity='leaky_relu')
37 |
38 | def forward(self, x):
39 | x = self.tokenConv(x.permute(0, 2, 1)).transpose(1, 2)
40 | return x
41 |
42 |
43 | class FixedEmbedding(nn.Module):
44 | def __init__(self, c_in, d_model):
45 | super(FixedEmbedding, self).__init__()
46 |
47 | w = torch.zeros(c_in, d_model).float()
48 |         w.requires_grad = False
49 |
50 | position = torch.arange(0, c_in).float().unsqueeze(1)
51 | div_term = (torch.arange(0, d_model, 2).float()
52 | * -(math.log(10000.0) / d_model)).exp()
53 |
54 | w[:, 0::2] = torch.sin(position * div_term)
55 | w[:, 1::2] = torch.cos(position * div_term)
56 |
57 | self.emb = nn.Embedding(c_in, d_model)
58 | self.emb.weight = nn.Parameter(w, requires_grad=False)
59 |
60 | def forward(self, x):
61 | return self.emb(x).detach()
62 |
63 |
64 | class TemporalEmbedding(nn.Module):
65 | def __init__(self, d_model, embed_type='fixed', freq='h'):
66 | super(TemporalEmbedding, self).__init__()
67 |
68 | minute_size = 4
69 | hour_size = 24
70 | weekday_size = 7
71 | day_size = 32
72 | month_size = 13
73 |
74 | Embed = FixedEmbedding if embed_type == 'fixed' else nn.Embedding
75 | if freq == 't':
76 | self.minute_embed = Embed(minute_size, d_model)
77 | self.hour_embed = Embed(hour_size, d_model)
78 | self.weekday_embed = Embed(weekday_size, d_model)
79 | self.day_embed = Embed(day_size, d_model)
80 | self.month_embed = Embed(month_size, d_model)
81 |
82 | def forward(self, x):
83 |
84 | x = x.long()
85 | minute_x = self.minute_embed(x[:, :, 4]) if hasattr(self, 'minute_embed') else 0.
86 | hour_x = self.hour_embed(x[:, :, 3])
87 | weekday_x = self.weekday_embed(x[:, :, 2])
88 | day_x = self.day_embed(x[:, :, 1])
89 | month_x = self.month_embed(x[:, :, 0])
90 |
91 | return hour_x + weekday_x + day_x + month_x + minute_x
92 |
93 |
94 | class TimeFeatureEmbedding(nn.Module):
95 | def __init__(self, d_model, embed_type='timeF', freq='h'):
96 | super(TimeFeatureEmbedding, self).__init__()
97 |
98 | freq_map = {'h': 4, 't': 5, 's': 6,
99 | 'm': 1, 'a': 1, 'w': 2, 'd': 3, 'b': 3}
100 | d_inp = freq_map[freq]
101 | self.embed = nn.Linear(d_inp, d_model, bias=False)
102 |
103 | def forward(self, x):
104 | return self.embed(x)
105 |
106 |
107 | class DataEmbedding(nn.Module):
108 | def __init__(self, c_in, d_model, embed_type='fixed', freq='h', dropout=0.1):
109 | super(DataEmbedding, self).__init__()
110 |
111 | self.value_embedding = TokenEmbedding(c_in=c_in, d_model=d_model)
112 | self.position_embedding = PositionalEmbedding(d_model=d_model)
113 | self.temporal_embedding = TemporalEmbedding(d_model=d_model, embed_type=embed_type,
114 | freq=freq) if embed_type != 'timeF' else TimeFeatureEmbedding(
115 | d_model=d_model, embed_type=embed_type, freq=freq)
116 | self.dropout = nn.Dropout(p=dropout)
117 |
118 | def forward(self, x, x_mark=None):
119 |
120 | if x_mark is None:
121 | x = self.value_embedding(x) + self.position_embedding(x)
122 | else:
123 | x = self.value_embedding(x) + self.temporal_embedding(x_mark) + self.position_embedding(x)
124 | return self.dropout(x)
125 |
126 |
127 | class DataEmbedding_wo_pos(nn.Module):
128 | def __init__(self, c_in, d_model, embed_type='fixed', freq='h', dropout=0.1):
129 | super(DataEmbedding_wo_pos, self).__init__()
130 |
131 | self.value_embedding = TokenEmbedding(c_in=c_in, d_model=d_model)
132 |
133 | self.position_embedding = PositionalEmbedding(d_model=d_model)
134 | self.temporal_embedding = TemporalEmbedding(d_model=d_model, embed_type=embed_type,
135 | freq=freq) if embed_type != 'timeF' else TimeFeatureEmbedding(
136 | d_model=d_model, embed_type=embed_type, freq=freq)
137 | self.dropout = nn.Dropout(p=dropout)
138 |
139 | def forward(self, x, x_mark):
140 | if x_mark is None:
141 | x = self.value_embedding(x)
142 | else:
143 | x = self.value_embedding(x) + self.temporal_embedding(x_mark)
144 | return self.dropout(x)
145 |
146 |
147 | class PatchEmbedding(nn.Module):
148 | def __init__(self, d_model, patch_len, stride, dropout):
149 | super(PatchEmbedding, self).__init__()
150 | # Patching
151 | self.patch_len = patch_len
152 | self.stride = stride
153 | self.padding_patch_layer = nn.ReplicationPad1d((0, stride))
154 |
155 | # Backbone, Input encoding: projection of feature vectors onto a d-dim vector space
156 | self.value_embedding = TokenEmbedding(patch_len, d_model)
157 |
158 | # Positional embedding
159 | self.position_embedding = PositionalEmbedding(d_model)
160 |
161 | # Residual dropout
162 | self.dropout = nn.Dropout(dropout)
163 |
164 | def forward(self, x):
165 | # do patching
166 | n_vars = x.shape[1]
167 | x = self.padding_patch_layer(x)
168 | x = x.unfold(dimension=-1, size=self.patch_len, step=self.stride)
169 | x = torch.reshape(x, (x.shape[0] * x.shape[1], x.shape[2], x.shape[3])) # channel independent
170 |
171 | # Input encoding
172 | x = self.value_embedding(x) + self.position_embedding(x)
173 | return self.dropout(x), n_vars
174 |
175 |
176 | class PatchEmbedding_wo_channel_independent(nn.Module):
177 | def __init__(self, n_vars, d_model, patch_len, stride, dropout):
178 | super(PatchEmbedding_wo_channel_independent, self).__init__()
179 | # Patching
180 | self.n_vars = n_vars
181 | self.patch_len = patch_len
182 | self.stride = stride
183 | self.padding_patch_layer = nn.ReplicationPad1d((0, stride))
184 |
185 | # Backbone, Input encoding: projection of feature vectors onto a d-dim vector space
186 | self.value_embedding = TokenEmbedding(patch_len*n_vars, d_model)
187 |
188 | # Positional embedding
189 | self.position_embedding = PositionalEmbedding(d_model)
190 |
191 | # Residual dropout
192 | self.dropout = nn.Dropout(dropout)
193 |
194 | def forward(self, x):
195 | # do patching
196 | n_vars = x.shape[1]
197 | x = self.padding_patch_layer(x)
198 | x = x.unfold(dimension=-1, size=self.patch_len, step=self.stride)
199 |
200 | x = torch.reshape(x, (x.shape[0], x.shape[2], x.shape[1]*x.shape[3]))
201 |
202 | # Input encoding
203 | x = self.value_embedding(x) + self.position_embedding(x)
204 | return self.dropout(x), n_vars
205 |
--------------------------------------------------------------------------------
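`DataEmbedding` sums three branches: a circular-padded Conv1d over the values, a fixed sinusoidal positional table, and a temporal branch that is a plain linear map over time features when `embed_type='timeF'` (with `freq='h'` it expects 4 features per step, per `freq_map`). A minimal shape check, assuming the `SimMTM_Forecasting` root is on `PYTHONPATH`:

```python
import torch
from layers.Embed import DataEmbedding

emb = DataEmbedding(c_in=7, d_model=64, embed_type='timeF', freq='h', dropout=0.1)
x = torch.randn(2, 96, 7)         # [batch, seq_len, n_vars]
x_mark = torch.randn(2, 96, 4)    # [batch, seq_len, 4 time features for freq='h']
out = emb(x, x_mark)
assert out.shape == (2, 96, 64)   # value + temporal + positional, then dropout
```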
/SimMTM_Forecasting/layers/FourierCorrelation.py:
--------------------------------------------------------------------------------
1 | # coding=utf-8
2 | # author=maziqing
3 | # email=maziqing.mzq@alibaba-inc.com
4 |
5 | import numpy as np
6 | import torch
7 | import torch.nn as nn
8 |
9 |
10 | def get_frequency_modes(seq_len, modes=64, mode_select_method='random'):
11 | """
12 | get modes on frequency domain:
13 | 'random' means sampling randomly;
14 | 'else' means sampling the lowest modes;
15 | """
16 | modes = min(modes, seq_len // 2)
17 | if mode_select_method == 'random':
18 | index = list(range(0, seq_len // 2))
19 | np.random.shuffle(index)
20 | index = index[:modes]
21 | else:
22 | index = list(range(0, modes))
23 | index.sort()
24 | return index
25 |
26 |
27 | # ########## fourier layer #############
28 | class FourierBlock(nn.Module):
29 | def __init__(self, in_channels, out_channels, seq_len, modes=0, mode_select_method='random'):
30 | super(FourierBlock, self).__init__()
31 | print('fourier enhanced block used!')
32 | """
33 | 1D Fourier block. It performs representation learning on frequency domain,
34 | it does FFT, linear transform, and Inverse FFT.
35 | """
36 | # get modes on frequency domain
37 | self.index = get_frequency_modes(seq_len, modes=modes, mode_select_method=mode_select_method)
38 | print('modes={}, index={}'.format(modes, self.index))
39 |
40 | self.scale = (1 / (in_channels * out_channels))
41 | self.weights1 = nn.Parameter(
42 | self.scale * torch.rand(8, in_channels // 8, out_channels // 8, len(self.index), dtype=torch.float))
43 | self.weights2 = nn.Parameter(
44 | self.scale * torch.rand(8, in_channels // 8, out_channels // 8, len(self.index), dtype=torch.float))
45 |
46 | # Complex multiplication
47 | def compl_mul1d(self, order, x, weights):
48 | x_flag = True
49 | w_flag = True
50 | if not torch.is_complex(x):
51 | x_flag = False
52 | x = torch.complex(x, torch.zeros_like(x).to(x.device))
53 | if not torch.is_complex(weights):
54 | w_flag = False
55 | weights = torch.complex(weights, torch.zeros_like(weights).to(weights.device))
56 | if x_flag or w_flag:
57 | return torch.complex(torch.einsum(order, x.real, weights.real) - torch.einsum(order, x.imag, weights.imag),
58 | torch.einsum(order, x.real, weights.imag) + torch.einsum(order, x.imag, weights.real))
59 | else:
60 | return torch.einsum(order, x.real, weights.real)
61 |
62 | def forward(self, q, k, v, mask):
63 | # size = [B, L, H, E]
64 | B, L, H, E = q.shape
65 | x = q.permute(0, 2, 3, 1)
66 | # Compute Fourier coefficients
67 | x_ft = torch.fft.rfft(x, dim=-1)
68 | # Perform Fourier neural operations
69 | out_ft = torch.zeros(B, H, E, L // 2 + 1, device=x.device, dtype=torch.cfloat)
70 | for wi, i in enumerate(self.index):
71 | if i >= x_ft.shape[3] or wi >= out_ft.shape[3]:
72 | continue
73 | out_ft[:, :, :, wi] = self.compl_mul1d("bhi,hio->bho", x_ft[:, :, :, i],
74 | torch.complex(self.weights1, self.weights2)[:, :, :, wi])
75 | # Return to time domain
76 | x = torch.fft.irfft(out_ft, n=x.size(-1))
77 | return (x, None)
78 |
79 |
80 | # ########## Fourier Cross Former ####################
81 | class FourierCrossAttention(nn.Module):
82 | def __init__(self, in_channels, out_channels, seq_len_q, seq_len_kv, modes=64, mode_select_method='random',
83 | activation='tanh', policy=0):
84 | super(FourierCrossAttention, self).__init__()
85 | print(' fourier enhanced cross attention used!')
86 | """
87 | 1D Fourier Cross Attention layer. It does FFT, linear transform, attention mechanism and Inverse FFT.
88 | """
89 | self.activation = activation
90 | self.in_channels = in_channels
91 | self.out_channels = out_channels
92 | # get modes for queries and keys (& values) on frequency domain
93 | self.index_q = get_frequency_modes(seq_len_q, modes=modes, mode_select_method=mode_select_method)
94 | self.index_kv = get_frequency_modes(seq_len_kv, modes=modes, mode_select_method=mode_select_method)
95 |
96 | print('modes_q={}, index_q={}'.format(len(self.index_q), self.index_q))
97 | print('modes_kv={}, index_kv={}'.format(len(self.index_kv), self.index_kv))
98 |
99 | self.scale = (1 / (in_channels * out_channels))
100 | self.weights1 = nn.Parameter(
101 | self.scale * torch.rand(8, in_channels // 8, out_channels // 8, len(self.index_q), dtype=torch.float))
102 | self.weights2 = nn.Parameter(
103 | self.scale * torch.rand(8, in_channels // 8, out_channels // 8, len(self.index_q), dtype=torch.float))
104 |
105 | # Complex multiplication
106 | def compl_mul1d(self, order, x, weights):
107 | x_flag = True
108 | w_flag = True
109 | if not torch.is_complex(x):
110 | x_flag = False
111 | x = torch.complex(x, torch.zeros_like(x).to(x.device))
112 | if not torch.is_complex(weights):
113 | w_flag = False
114 | weights = torch.complex(weights, torch.zeros_like(weights).to(weights.device))
115 | if x_flag or w_flag:
116 | return torch.complex(torch.einsum(order, x.real, weights.real) - torch.einsum(order, x.imag, weights.imag),
117 | torch.einsum(order, x.real, weights.imag) + torch.einsum(order, x.imag, weights.real))
118 | else:
119 | return torch.einsum(order, x.real, weights.real)
120 |
121 | def forward(self, q, k, v, mask):
122 | # size = [B, L, H, E]
123 | B, L, H, E = q.shape
124 | xq = q.permute(0, 2, 3, 1) # size = [B, H, E, L]
125 | xk = k.permute(0, 2, 3, 1)
126 | xv = v.permute(0, 2, 3, 1)
127 |
128 | # Compute Fourier coefficients
129 | xq_ft_ = torch.zeros(B, H, E, len(self.index_q), device=xq.device, dtype=torch.cfloat)
130 | xq_ft = torch.fft.rfft(xq, dim=-1)
131 | for i, j in enumerate(self.index_q):
132 | if j >= xq_ft.shape[3]:
133 | continue
134 | xq_ft_[:, :, :, i] = xq_ft[:, :, :, j]
135 | xk_ft_ = torch.zeros(B, H, E, len(self.index_kv), device=xq.device, dtype=torch.cfloat)
136 | xk_ft = torch.fft.rfft(xk, dim=-1)
137 | for i, j in enumerate(self.index_kv):
138 | if j >= xk_ft.shape[3]:
139 | continue
140 | xk_ft_[:, :, :, i] = xk_ft[:, :, :, j]
141 |
142 | # perform attention mechanism on frequency domain
143 | xqk_ft = (self.compl_mul1d("bhex,bhey->bhxy", xq_ft_, xk_ft_))
144 | if self.activation == 'tanh':
145 | xqk_ft = torch.complex(xqk_ft.real.tanh(), xqk_ft.imag.tanh())
146 | elif self.activation == 'softmax':
147 | xqk_ft = torch.softmax(abs(xqk_ft), dim=-1)
148 | xqk_ft = torch.complex(xqk_ft, torch.zeros_like(xqk_ft))
149 | else:
150 |             raise Exception('{} activation function is not implemented'.format(self.activation))
151 | xqkv_ft = self.compl_mul1d("bhxy,bhey->bhex", xqk_ft, xk_ft_)
152 | xqkvw = self.compl_mul1d("bhex,heox->bhox", xqkv_ft, torch.complex(self.weights1, self.weights2))
153 | out_ft = torch.zeros(B, H, E, L // 2 + 1, device=xq.device, dtype=torch.cfloat)
154 | for i, j in enumerate(self.index_q):
155 | if i >= xqkvw.shape[3] or j >= out_ft.shape[3]:
156 | continue
157 | out_ft[:, :, :, j] = xqkvw[:, :, :, i]
158 | # Return to time domain
159 | out = torch.fft.irfft(out_ft / self.in_channels / self.out_channels, n=xq.size(-1))
160 | return (out, None)
161 |
--------------------------------------------------------------------------------
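`FourierBlock` keeps only a subset of frequency modes, applies a learned complex linear map per kept mode, and inverts the FFT. Its weight tensors are laid out for 8 heads, so `in_channels` must equal `H * E`. A shape sketch under those assumptions (repo root on `PYTHONPATH`):

```python
import torch
from layers.FourierCorrelation import get_frequency_modes, FourierBlock

# Any mode_select_method other than 'random' keeps the lowest modes.
assert get_frequency_modes(96, modes=16, mode_select_method='low') == list(range(16))

blk = FourierBlock(in_channels=512, out_channels=512, seq_len=96,
                   modes=16, mode_select_method='low')
q = torch.randn(2, 96, 8, 64)        # [B, L, H, E] with H * E == in_channels
out, _ = blk(q, q, q, None)
assert out.shape == (2, 8, 64, 96)   # back in the time domain as [B, H, E, L]
```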
/SimMTM_Forecasting/layers/Pyraformer_EncDec.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import torch.nn.functional as F
4 | from torch.nn.modules.linear import Linear
5 | from layers.SelfAttention_Family import AttentionLayer, FullAttention
6 | from layers.Embed import DataEmbedding
7 | import math
8 |
9 |
10 | def get_mask(input_size, window_size, inner_size):
11 | """Get the attention mask of PAM-Naive"""
12 | # Get the size of all layers
13 | all_size = []
14 | all_size.append(input_size)
15 | for i in range(len(window_size)):
16 | layer_size = math.floor(all_size[i] / window_size[i])
17 | all_size.append(layer_size)
18 |
19 | seq_length = sum(all_size)
20 | mask = torch.zeros(seq_length, seq_length)
21 |
22 | # get intra-scale mask
23 | inner_window = inner_size // 2
24 | for layer_idx in range(len(all_size)):
25 | start = sum(all_size[:layer_idx])
26 | for i in range(start, start + all_size[layer_idx]):
27 | left_side = max(i - inner_window, start)
28 | right_side = min(i + inner_window + 1, start + all_size[layer_idx])
29 | mask[i, left_side:right_side] = 1
30 |
31 | # get inter-scale mask
32 | for layer_idx in range(1, len(all_size)):
33 | start = sum(all_size[:layer_idx])
34 | for i in range(start, start + all_size[layer_idx]):
35 | left_side = (start - all_size[layer_idx - 1]) + \
36 | (i - start) * window_size[layer_idx - 1]
37 | if i == (start + all_size[layer_idx] - 1):
38 | right_side = start
39 | else:
40 | right_side = (
41 | start - all_size[layer_idx - 1]) + (i - start + 1) * window_size[layer_idx - 1]
42 | mask[i, left_side:right_side] = 1
43 | mask[left_side:right_side, i] = 1
44 |
45 | mask = (1 - mask).bool()
46 |
47 | return mask, all_size
48 |
49 |
50 | def refer_points(all_sizes, window_size):
51 | """Gather features from PAM's pyramid sequences"""
52 | input_size = all_sizes[0]
53 | indexes = torch.zeros(input_size, len(all_sizes))
54 |
55 | for i in range(input_size):
56 | indexes[i][0] = i
57 | former_index = i
58 | for j in range(1, len(all_sizes)):
59 | start = sum(all_sizes[:j])
60 | inner_layer_idx = former_index - (start - all_sizes[j - 1])
61 | former_index = start + \
62 | min(inner_layer_idx // window_size[j - 1], all_sizes[j] - 1)
63 | indexes[i][j] = former_index
64 |
65 | indexes = indexes.unsqueeze(0).unsqueeze(3)
66 |
67 | return indexes.long()
68 |
69 |
70 | class RegularMask():
71 | def __init__(self, mask):
72 | self._mask = mask.unsqueeze(1)
73 |
74 | @property
75 | def mask(self):
76 | return self._mask
77 |
78 |
79 | class EncoderLayer(nn.Module):
80 | """ Compose with two layers """
81 |
82 | def __init__(self, d_model, d_inner, n_head, dropout=0.1, normalize_before=True):
83 | super(EncoderLayer, self).__init__()
84 |
85 | self.slf_attn = AttentionLayer(
86 | FullAttention(mask_flag=True, factor=0,
87 | attention_dropout=dropout, output_attention=False),
88 | d_model, n_head)
89 | self.pos_ffn = PositionwiseFeedForward(
90 | d_model, d_inner, dropout=dropout, normalize_before=normalize_before)
91 |
92 | def forward(self, enc_input, slf_attn_mask=None):
93 | attn_mask = RegularMask(slf_attn_mask)
94 | enc_output, _ = self.slf_attn(
95 | enc_input, enc_input, enc_input, attn_mask=attn_mask)
96 | enc_output = self.pos_ffn(enc_output)
97 | return enc_output
98 |
99 |
100 | class Encoder(nn.Module):
101 | """ A encoder model with self attention mechanism. """
102 |
103 | def __init__(self, configs, window_size, inner_size):
104 | super().__init__()
105 |
106 | d_bottleneck = configs.d_model//4
107 |
108 | self.mask, self.all_size = get_mask(
109 | configs.seq_len, window_size, inner_size)
110 | self.indexes = refer_points(self.all_size, window_size)
111 | self.layers = nn.ModuleList([
112 | EncoderLayer(configs.d_model, configs.d_ff, configs.n_heads, dropout=configs.dropout,
113 | normalize_before=False) for _ in range(configs.e_layers)
114 | ]) # naive pyramid attention
115 |
116 | self.enc_embedding = DataEmbedding(
117 |             configs.enc_in, configs.d_model, dropout=configs.dropout)
118 | self.conv_layers = Bottleneck_Construct(
119 | configs.d_model, window_size, d_bottleneck)
120 |
121 | def forward(self, x_enc, x_mark_enc):
122 | seq_enc = self.enc_embedding(x_enc, x_mark_enc)
123 |
124 | mask = self.mask.repeat(len(seq_enc), 1, 1).to(x_enc.device)
125 | seq_enc = self.conv_layers(seq_enc)
126 |
127 | for i in range(len(self.layers)):
128 | seq_enc = self.layers[i](seq_enc, mask)
129 |
130 | indexes = self.indexes.repeat(seq_enc.size(
131 | 0), 1, 1, seq_enc.size(2)).to(seq_enc.device)
132 | indexes = indexes.view(seq_enc.size(0), -1, seq_enc.size(2))
133 | all_enc = torch.gather(seq_enc, 1, indexes)
134 | seq_enc = all_enc.view(seq_enc.size(0), self.all_size[0], -1)
135 |
136 | return seq_enc
137 |
138 |
139 | class ConvLayer(nn.Module):
140 | def __init__(self, c_in, window_size):
141 | super(ConvLayer, self).__init__()
142 | self.downConv = nn.Conv1d(in_channels=c_in,
143 | out_channels=c_in,
144 | kernel_size=window_size,
145 | stride=window_size)
146 | self.norm = nn.BatchNorm1d(c_in)
147 | self.activation = nn.ELU()
148 |
149 | def forward(self, x):
150 | x = self.downConv(x)
151 | x = self.norm(x)
152 | x = self.activation(x)
153 | return x
154 |
155 |
156 | class Bottleneck_Construct(nn.Module):
157 | """Bottleneck convolution CSCM"""
158 |
159 | def __init__(self, d_model, window_size, d_inner):
160 | super(Bottleneck_Construct, self).__init__()
161 | if not isinstance(window_size, list):
162 | self.conv_layers = nn.ModuleList([
163 | ConvLayer(d_inner, window_size),
164 | ConvLayer(d_inner, window_size),
165 | ConvLayer(d_inner, window_size)
166 | ])
167 | else:
168 | self.conv_layers = []
169 | for i in range(len(window_size)):
170 | self.conv_layers.append(ConvLayer(d_inner, window_size[i]))
171 | self.conv_layers = nn.ModuleList(self.conv_layers)
172 | self.up = Linear(d_inner, d_model)
173 | self.down = Linear(d_model, d_inner)
174 | self.norm = nn.LayerNorm(d_model)
175 |
176 | def forward(self, enc_input):
177 | temp_input = self.down(enc_input).permute(0, 2, 1)
178 | all_inputs = []
179 | for i in range(len(self.conv_layers)):
180 | temp_input = self.conv_layers[i](temp_input)
181 | all_inputs.append(temp_input)
182 |
183 | all_inputs = torch.cat(all_inputs, dim=2).transpose(1, 2)
184 | all_inputs = self.up(all_inputs)
185 | all_inputs = torch.cat([enc_input, all_inputs], dim=1)
186 |
187 | all_inputs = self.norm(all_inputs)
188 | return all_inputs
189 |
190 |
191 | class PositionwiseFeedForward(nn.Module):
192 | """ Two-layer position-wise feed-forward neural network. """
193 |
194 | def __init__(self, d_in, d_hid, dropout=0.1, normalize_before=True):
195 | super().__init__()
196 |
197 | self.normalize_before = normalize_before
198 |
199 | self.w_1 = nn.Linear(d_in, d_hid)
200 | self.w_2 = nn.Linear(d_hid, d_in)
201 |
202 | self.layer_norm = nn.LayerNorm(d_in, eps=1e-6)
203 | self.dropout = nn.Dropout(dropout)
204 |
205 | def forward(self, x):
206 | residual = x
207 | if self.normalize_before:
208 | x = self.layer_norm(x)
209 |
210 | x = F.gelu(self.w_1(x))
211 | x = self.dropout(x)
212 | x = self.w_2(x)
213 | x = self.dropout(x)
214 | x = x + residual
215 |
216 | if not self.normalize_before:
217 | x = self.layer_norm(x)
218 | return x
219 |
--------------------------------------------------------------------------------
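`get_mask` flattens the whole pyramid into one sequence (the input layer plus each coarsened layer) and marks intra-scale neighbours and inter-scale parent/child pairs as attendable; `refer_points` then records, for every input position, its ancestor node at each scale. A small worked example, assuming the repo root is on `PYTHONPATH` and the module's dependencies (e.g. `reformer_pytorch`) are installed:

```python
import torch
from layers.Pyraformer_EncDec import get_mask, refer_points

# window_size [4, 4] on length 96 gives layer sizes 96, 24, 6 -> 126 nodes total.
mask, all_size = get_mask(input_size=96, window_size=[4, 4], inner_size=3)
assert all_size == [96, 24, 6]
assert mask.shape == (126, 126) and mask.dtype == torch.bool  # True = blocked

indexes = refer_points(all_size, [4, 4])
assert indexes.shape == (1, 96, 3, 1)   # one ancestor index per scale
```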
/SimMTM_Forecasting/layers/SelfAttention_Family.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import numpy as np
4 | from math import sqrt
5 | from utils.masking import TriangularCausalMask, ProbMask
6 | from reformer_pytorch import LSHSelfAttention
7 |
8 |
9 | class DSAttention(nn.Module):
10 | '''De-stationary Attention'''
11 |
12 | def __init__(self, mask_flag=True, factor=5, scale=None, attention_dropout=0.1, output_attention=False):
13 | super(DSAttention, self).__init__()
14 | self.scale = scale
15 | self.mask_flag = mask_flag
16 | self.output_attention = output_attention
17 | self.dropout = nn.Dropout(attention_dropout)
18 |
19 | def forward(self, queries, keys, values, attn_mask, tau=None, delta=None):
20 | B, L, H, E = queries.shape
21 | _, S, _, D = values.shape
22 | scale = self.scale or 1. / sqrt(E)
23 |
24 | tau = 1.0 if tau is None else tau.unsqueeze(
25 | 1).unsqueeze(1) # B x 1 x 1 x 1
26 | delta = 0.0 if delta is None else delta.unsqueeze(
27 | 1).unsqueeze(1) # B x 1 x 1 x S
28 |
29 | # De-stationary Attention, rescaling pre-softmax score with learned de-stationary factors
30 | scores = torch.einsum("blhe,bshe->bhls", queries, keys) * tau + delta
31 |
32 | if self.mask_flag:
33 | if attn_mask is None:
34 | attn_mask = TriangularCausalMask(B, L, device=queries.device)
35 |
36 | scores.masked_fill_(attn_mask.mask, -np.inf)
37 |
38 | A = self.dropout(torch.softmax(scale * scores, dim=-1))
39 | V = torch.einsum("bhls,bshd->blhd", A, values)
40 |
41 | if self.output_attention:
42 | return (V.contiguous(), A)
43 | else:
44 | return (V.contiguous(), None)
45 |
46 |
47 | class FullAttention(nn.Module):
48 | def __init__(self, mask_flag=True, factor=5, scale=None, attention_dropout=0.1, output_attention=False):
49 | super(FullAttention, self).__init__()
50 | self.scale = scale
51 | self.mask_flag = mask_flag
52 | self.output_attention = output_attention
53 | self.dropout = nn.Dropout(attention_dropout)
54 |
55 | def forward(self, queries, keys, values, attn_mask, tau=None, delta=None):
56 | B, L, H, E = queries.shape
57 | _, S, _, D = values.shape
58 | scale = self.scale or 1. / sqrt(E)
59 |
60 | scores = torch.einsum("blhe,bshe->bhls", queries, keys)
61 |
62 | if self.mask_flag:
63 | if attn_mask is None:
64 | attn_mask = TriangularCausalMask(B, L, device=queries.device)
65 |
66 | scores.masked_fill_(attn_mask.mask, -np.inf)
67 |
68 | A = self.dropout(torch.softmax(scale * scores, dim=-1))
69 | V = torch.einsum("bhls,bshd->blhd", A, values)
70 |
71 | if self.output_attention:
72 | return (V.contiguous(), A)
73 | else:
74 | return (V.contiguous(), None)
75 |
76 |
77 | class ProbAttention(nn.Module):
78 | def __init__(self, mask_flag=True, factor=5, scale=None, attention_dropout=0.1, output_attention=False):
79 | super(ProbAttention, self).__init__()
80 | self.factor = factor
81 | self.scale = scale
82 | self.mask_flag = mask_flag
83 | self.output_attention = output_attention
84 | self.dropout = nn.Dropout(attention_dropout)
85 |
86 | def _prob_QK(self, Q, K, sample_k, n_top): # n_top: c*ln(L_q)
87 | # Q [B, H, L, D]
88 | B, H, L_K, E = K.shape
89 | _, _, L_Q, _ = Q.shape
90 |
91 | # calculate the sampled Q_K
92 | K_expand = K.unsqueeze(-3).expand(B, H, L_Q, L_K, E)
93 | # real U = U_part(factor*ln(L_k))*L_q
94 | index_sample = torch.randint(L_K, (L_Q, sample_k))
95 | K_sample = K_expand[:, :, torch.arange(
96 | L_Q).unsqueeze(1), index_sample, :]
97 | Q_K_sample = torch.matmul(
98 | Q.unsqueeze(-2), K_sample.transpose(-2, -1)).squeeze()
99 |
100 |         # find the Top_k query with sparsity measurement
101 | M = Q_K_sample.max(-1)[0] - torch.div(Q_K_sample.sum(-1), L_K)
102 | M_top = M.topk(n_top, sorted=False)[1]
103 |
104 | # use the reduced Q to calculate Q_K
105 | Q_reduce = Q[torch.arange(B)[:, None, None],
106 | torch.arange(H)[None, :, None],
107 | M_top, :] # factor*ln(L_q)
108 | Q_K = torch.matmul(Q_reduce, K.transpose(-2, -1)) # factor*ln(L_q)*L_k
109 |
110 | return Q_K, M_top
111 |
112 | def _get_initial_context(self, V, L_Q):
113 | B, H, L_V, D = V.shape
114 | if not self.mask_flag:
115 | # V_sum = V.sum(dim=-2)
116 | V_sum = V.mean(dim=-2)
117 | contex = V_sum.unsqueeze(-2).expand(B, H,
118 | L_Q, V_sum.shape[-1]).clone()
119 | else: # use mask
120 | # requires that L_Q == L_V, i.e. for self-attention only
121 | assert (L_Q == L_V)
122 | contex = V.cumsum(dim=-2)
123 | return contex
124 |
125 | def _update_context(self, context_in, V, scores, index, L_Q, attn_mask):
126 | B, H, L_V, D = V.shape
127 |
128 | if self.mask_flag:
129 | attn_mask = ProbMask(B, H, L_Q, index, scores, device=V.device)
130 | scores.masked_fill_(attn_mask.mask, -np.inf)
131 |
132 | attn = torch.softmax(scores, dim=-1) # nn.Softmax(dim=-1)(scores)
133 |
134 | context_in[torch.arange(B)[:, None, None],
135 | torch.arange(H)[None, :, None],
136 | index, :] = torch.matmul(attn, V).type_as(context_in)
137 | if self.output_attention:
138 | attns = (torch.ones([B, H, L_V, L_V]) /
139 | L_V).type_as(attn).to(attn.device)
140 | attns[torch.arange(B)[:, None, None], torch.arange(H)[
141 | None, :, None], index, :] = attn
142 | return (context_in, attns)
143 | else:
144 | return (context_in, None)
145 |
146 | def forward(self, queries, keys, values, attn_mask, tau=None, delta=None):
147 | B, L_Q, H, D = queries.shape
148 | _, L_K, _, _ = keys.shape
149 |
150 | queries = queries.transpose(2, 1)
151 | keys = keys.transpose(2, 1)
152 | values = values.transpose(2, 1)
153 |
154 | U_part = self.factor * \
155 | np.ceil(np.log(L_K)).astype('int').item() # c*ln(L_k)
156 | u = self.factor * \
157 | np.ceil(np.log(L_Q)).astype('int').item() # c*ln(L_q)
158 |
159 | U_part = U_part if U_part < L_K else L_K
160 | u = u if u < L_Q else L_Q
161 |
162 | scores_top, index = self._prob_QK(
163 | queries, keys, sample_k=U_part, n_top=u)
164 |
165 | # add scale factor
166 | scale = self.scale or 1. / sqrt(D)
167 | if scale is not None:
168 | scores_top = scores_top * scale
169 | # get the context
170 | context = self._get_initial_context(values, L_Q)
171 | # update the context with selected top_k queries
172 | context, attn = self._update_context(
173 | context, values, scores_top, index, L_Q, attn_mask)
174 |
175 | return context.contiguous(), attn
176 |
177 |
178 | class AttentionLayer(nn.Module):
179 | def __init__(self, attention, d_model, n_heads, d_keys=None,
180 | d_values=None):
181 | super(AttentionLayer, self).__init__()
182 |
183 | d_keys = d_keys or (d_model // n_heads)
184 | d_values = d_values or (d_model // n_heads)
185 |
186 | self.inner_attention = attention
187 | self.query_projection = nn.Linear(d_model, d_keys * n_heads)
188 | self.key_projection = nn.Linear(d_model, d_keys * n_heads)
189 | self.value_projection = nn.Linear(d_model, d_values * n_heads)
190 | self.out_projection = nn.Linear(d_values * n_heads, d_model)
191 | self.n_heads = n_heads
192 |
193 | def forward(self, queries, keys, values, attn_mask, tau=None, delta=None):
194 | B, L, _ = queries.shape
195 | _, S, _ = keys.shape
196 | H = self.n_heads
197 |
198 | queries = self.query_projection(queries).view(B, L, H, -1)
199 | keys = self.key_projection(keys).view(B, S, H, -1)
200 | values = self.value_projection(values).view(B, S, H, -1)
201 |
202 | out, attn = self.inner_attention(
203 | queries,
204 | keys,
205 | values,
206 | attn_mask,
207 | tau=tau,
208 | delta=delta
209 | )
210 | out = out.view(B, L, -1)
211 |
212 | return self.out_projection(out), attn
213 |
214 |
215 | class ReformerLayer(nn.Module):
216 | def __init__(self, attention, d_model, n_heads, d_keys=None,
217 | d_values=None, causal=False, bucket_size=4, n_hashes=4):
218 | super().__init__()
219 | self.bucket_size = bucket_size
220 | self.attn = LSHSelfAttention(
221 | dim=d_model,
222 | heads=n_heads,
223 | bucket_size=bucket_size,
224 | n_hashes=n_hashes,
225 | causal=causal
226 | )
227 |
228 | def fit_length(self, queries):
229 | # inside reformer: assert N % (bucket_size * 2) == 0
230 | B, N, C = queries.shape
231 | if N % (self.bucket_size * 2) == 0:
232 | return queries
233 | else:
234 | # fill the time series
235 | fill_len = (self.bucket_size * 2) - (N % (self.bucket_size * 2))
236 | return torch.cat([queries, torch.zeros([B, fill_len, C]).to(queries.device)], dim=1)
237 |
238 | def forward(self, queries, keys, values, attn_mask, tau, delta):
239 |         # in Reformer: default queries=keys
240 | B, N, C = queries.shape
241 | queries = self.attn(self.fit_length(queries))[:, :N, :]
242 | return queries, None
243 |
--------------------------------------------------------------------------------
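`AttentionLayer` wraps any of the attention kernels above in the usual multi-head query/key/value projections. A minimal self-attention check with `FullAttention` (repo root on `PYTHONPATH`; importing the module also requires `reformer_pytorch`):

```python
import torch
from layers.SelfAttention_Family import FullAttention, AttentionLayer

attn = AttentionLayer(
    FullAttention(mask_flag=False, attention_dropout=0.1, output_attention=True),
    d_model=64, n_heads=4)
x = torch.randn(2, 16, 64)           # [batch, length, d_model]
out, A = attn(x, x, x, attn_mask=None)
assert out.shape == (2, 16, 64)
assert A.shape == (2, 4, 16, 16)     # [batch, heads, L_query, L_key]
```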
/SimMTM_Forecasting/layers/Transformer_EncDec.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import torch.nn.functional as F
4 |
5 |
6 | class ConvLayer(nn.Module):
7 | def __init__(self, c_in):
8 | super(ConvLayer, self).__init__()
9 | self.downConv = nn.Conv1d(in_channels=c_in,
10 | out_channels=c_in,
11 | kernel_size=3,
12 | padding=2,
13 | padding_mode='circular')
14 | self.norm = nn.BatchNorm1d(c_in)
15 | self.activation = nn.ELU()
16 | self.maxPool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)
17 |
18 | def forward(self, x):
19 | x = self.downConv(x.permute(0, 2, 1))
20 | x = self.norm(x)
21 | x = self.activation(x)
22 | x = self.maxPool(x)
23 | x = x.transpose(1, 2)
24 | return x
25 |
26 |
27 | class EncoderLayer(nn.Module):
28 | def __init__(self, attention, d_model, d_ff=None, dropout=0.1, activation="relu"):
29 | super(EncoderLayer, self).__init__()
30 | d_ff = d_ff or 4 * d_model
31 | self.attention = attention
32 | self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1)
33 | self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1)
34 | self.norm1 = nn.LayerNorm(d_model)
35 | self.norm2 = nn.LayerNorm(d_model)
36 | self.dropout = nn.Dropout(dropout)
37 | self.activation = F.relu if activation == "relu" else F.gelu
38 |
39 | def forward(self, x, attn_mask=None, tau=None, delta=None):
40 | new_x, attn = self.attention(
41 | x, x, x,
42 | attn_mask=attn_mask,
43 | tau=tau, delta=delta
44 | )
45 | x = x + self.dropout(new_x)
46 |
47 | y = x = self.norm1(x)
48 | y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1))))
49 | y = self.dropout(self.conv2(y).transpose(-1, 1))
50 |
51 | return self.norm2(x + y), attn
52 |
53 |
54 | class Encoder(nn.Module):
55 | def __init__(self, attn_layers, conv_layers=None, norm_layer=None):
56 | super(Encoder, self).__init__()
57 | self.attn_layers = nn.ModuleList(attn_layers)
58 | self.conv_layers = nn.ModuleList(conv_layers) if conv_layers is not None else None
59 | self.norm = norm_layer
60 |
61 | def forward(self, x, attn_mask=None, tau=None, delta=None):
62 | # x [B, L, D]
63 | attns = []
64 | if self.conv_layers is not None:
65 | for i, (attn_layer, conv_layer) in enumerate(zip(self.attn_layers, self.conv_layers)):
66 | delta = delta if i == 0 else None
67 | x, attn = attn_layer(x, attn_mask=attn_mask, tau=tau, delta=delta)
68 | x = conv_layer(x)
69 | attns.append(attn)
70 | x, attn = self.attn_layers[-1](x, tau=tau, delta=None)
71 | attns.append(attn)
72 | else:
73 | for attn_layer in self.attn_layers:
74 | x, attn = attn_layer(x, attn_mask=attn_mask, tau=tau, delta=delta)
75 | attns.append(attn)
76 |
77 | if self.norm is not None:
78 | x = self.norm(x)
79 |
80 | return x, attns
81 |
82 |
83 | class DecoderLayer(nn.Module):
84 | def __init__(self, self_attention, cross_attention, d_model, d_ff=None,
85 | dropout=0.1, activation="relu"):
86 | super(DecoderLayer, self).__init__()
87 | d_ff = d_ff or 4 * d_model
88 | self.self_attention = self_attention
89 | self.cross_attention = cross_attention
90 | self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1)
91 | self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1)
92 | self.norm1 = nn.LayerNorm(d_model)
93 | self.norm2 = nn.LayerNorm(d_model)
94 | self.norm3 = nn.LayerNorm(d_model)
95 | self.dropout = nn.Dropout(dropout)
96 | self.activation = F.relu if activation == "relu" else F.gelu
97 |
98 | def forward(self, x, cross, x_mask=None, cross_mask=None, tau=None, delta=None):
99 | x = x + self.dropout(self.self_attention(
100 | x, x, x,
101 | attn_mask=x_mask,
102 | tau=tau, delta=None
103 | )[0])
104 | x = self.norm1(x)
105 |
106 | x = x + self.dropout(self.cross_attention(
107 | x, cross, cross,
108 | attn_mask=cross_mask,
109 | tau=tau, delta=delta
110 | )[0])
111 |
112 | y = x = self.norm2(x)
113 | y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1))))
114 | y = self.dropout(self.conv2(y).transpose(-1, 1))
115 |
116 | return self.norm3(x + y)
117 |
118 |
119 | class Decoder(nn.Module):
120 | def __init__(self, layers, norm_layer=None, projection=None):
121 | super(Decoder, self).__init__()
122 | self.layers = nn.ModuleList(layers)
123 | self.norm = norm_layer
124 | self.projection = projection
125 |
126 | def forward(self, x, cross, x_mask=None, cross_mask=None, tau=None, delta=None):
127 | for layer in self.layers:
128 | x = layer(x, cross, x_mask=x_mask, cross_mask=cross_mask, tau=tau, delta=delta)
129 |
130 | if self.norm is not None:
131 | x = self.norm(x)
132 |
133 | if self.projection is not None:
134 | x = self.projection(x)
135 | return x
136 |
--------------------------------------------------------------------------------
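This is the encoder stack the SimMTM forecasting models instantiate: with `conv_layers=None` it simply chains `EncoderLayer`s and applies the final norm. A minimal two-layer sketch (repo root on `PYTHONPATH`):

```python
import torch
import torch.nn as nn
from layers.Transformer_EncDec import Encoder, EncoderLayer
from layers.SelfAttention_Family import FullAttention, AttentionLayer

enc = Encoder(
    [EncoderLayer(
        AttentionLayer(FullAttention(mask_flag=False), d_model=64, n_heads=4),
        d_model=64, d_ff=128, dropout=0.1, activation='gelu')
     for _ in range(2)],
    norm_layer=nn.LayerNorm(64))
x = torch.randn(2, 16, 64)
y, attns = enc(x)                  # no distilling convs -> plain layer chain
assert y.shape == (2, 16, 64) and len(attns) == 2
```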
/SimMTM_Forecasting/layers/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/layers/__init__.py
--------------------------------------------------------------------------------
/SimMTM_Forecasting/models/PatchTST.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | from layers.Transformer_EncDec import Encoder, EncoderLayer
4 | from layers.SelfAttention_Family import DSAttention, AttentionLayer
5 | from layers.Embed import PatchEmbedding
6 | from utils.losses import AutomaticWeightedLoss
7 | from utils.tools import ContrastiveWeight, AggregationRebuild
8 |
9 | class Flatten_Head(nn.Module):
10 | def __init__(self, nf, pred_len, head_dropout=0):
11 | super().__init__()
12 | self.flatten = nn.Flatten(start_dim=-2)
13 | self.linear = nn.Linear(nf, pred_len)
14 | self.dropout = nn.Dropout(head_dropout)
15 |
16 | def forward(self, x): # [bs x n_vars x patch_num x d_model]
17 | x = self.flatten(x) # [bs x n_vars x (patch_num * d_model)]
18 | x = self.linear(x) # [bs x n_vars x pred_len]
19 | x = self.dropout(x) # [bs x n_vars x pred_len]
20 | return x
21 |
22 | class Pooler_Head(nn.Module):
23 | def __init__(self, nf, dimension=128, head_dropout=0):
24 | super().__init__()
25 |
26 | self.pooler = nn.Sequential(
27 | nn.Flatten(start_dim=-2),
28 | nn.Linear(nf, nf // 2),
29 | nn.BatchNorm1d(nf // 2),
30 | nn.ReLU(),
31 | nn.Linear(nf // 2, dimension),
32 | nn.Dropout(head_dropout),
33 | )
34 |
35 | def forward(self, x): # [(bs * n_vars) x patch_num x d_model]
36 | x = self.pooler(x) # [(bs * n_vars) x dimension]
37 | return x
38 |
39 | class Model(nn.Module):
40 | """
41 | PatchTST + SimMTM
42 | """
43 |
44 | def __init__(self, configs):
45 | super(Model, self).__init__()
46 | self.task_name = configs.task_name
47 | self.pred_len = configs.pred_len
48 | self.seq_len = configs.seq_len
49 | self.label_len = configs.label_len
50 | self.output_attention = configs.output_attention
51 | self.configs = configs
52 |
53 | # patching and embedding
54 |         self.patch_embedding = PatchEmbedding(configs.d_model, configs.patch_len, configs.stride, configs.dropout)
55 |
56 | # Encoder
57 | self.encoder = Encoder(
58 | [
59 | EncoderLayer(
60 | AttentionLayer(
61 | DSAttention(False, configs.factor, attention_dropout=configs.dropout,
62 | output_attention=configs.output_attention), configs.d_model, configs.n_heads),
63 | configs.d_model,
64 | configs.d_ff,
65 | dropout=configs.dropout,
66 | activation=configs.activation
67 | ) for l in range(configs.e_layers)
68 | ],
69 | norm_layer=torch.nn.LayerNorm(configs.d_model),
70 | )
71 |
72 | self.patch_num = int((configs.seq_len - configs.patch_len) / configs.stride + 2)
73 | self.head_nf = self.patch_num * configs.d_model
74 |
75 | # Decoder
76 | if self.task_name == 'pretrain':
77 |
78 | # for series-wise representation
79 | self.pooler = Pooler_Head(self.head_nf, head_dropout=configs.head_dropout)
80 |
81 | # for reconstruction
82 |             self.projection = Flatten_Head(self.head_nf, configs.seq_len, head_dropout=configs.head_dropout)
83 |
84 | self.awl = AutomaticWeightedLoss(2)
85 | self.contrastive = ContrastiveWeight(self.configs)
86 | self.aggregation = AggregationRebuild(self.configs)
87 | self.mse = torch.nn.MSELoss()
88 |
89 | elif self.task_name == 'finetune':
90 |             self.head = Flatten_Head(self.head_nf, configs.pred_len, head_dropout=configs.head_dropout)
91 |
92 | def forecast(self, x_enc, x_mark_enc):
93 |
94 | # data shape
95 | bs, seq_len, n_vars = x_enc.shape
96 |
97 | # normalization
98 | means = x_enc.mean(1, keepdim=True).detach()
99 | x_enc = x_enc - means
100 | stdev = torch.sqrt(torch.var(x_enc, dim=1, keepdim=True, unbiased=False) + 1e-5)
101 | x_enc /= stdev
102 |
103 | # do patching and embedding
104 | x_enc = x_enc.permute(0, 2, 1)
105 | enc_out, n_vars = self.patch_embedding(x_enc) # [(bs * n_vars) x patch_num x d_model]
106 |
107 | # encoder
108 | enc_out, _ = self.encoder(enc_out) # enc_out: [(bs * n_vars) x patch_num x d_model]
109 |
110 |         enc_out = torch.reshape(enc_out, (bs, n_vars, enc_out.shape[-2], enc_out.shape[-1]))  # enc_out: [bs x n_vars x patch_num x d_model]
111 |
112 | # decoder
113 | dec_out = self.head(enc_out) # dec_out: [bs x n_vars x pred_len]
114 | dec_out = dec_out.permute(0, 2, 1) # dec_out: [bs x pred_len x n_vars]
115 |
116 | # de-Normalization from Non-stationary Transformer
117 | dec_out = dec_out * (stdev[:, 0, :].unsqueeze(1).repeat(1, self.pred_len, 1))
118 | dec_out = dec_out + (means[:, 0, :].unsqueeze(1).repeat(1, self.pred_len, 1))
119 |
120 | return dec_out
121 |
122 | def pretrain(self, x_enc, x_mark_enc, batch_x, mask):
123 |
124 | # data shape
125 | bs, seq_len, n_vars = x_enc.shape
126 |
127 | # normalization
128 | means = torch.sum(x_enc, dim=1) / torch.sum(mask == 1, dim=1)
129 | means = means.unsqueeze(1).detach()
130 | x_enc = x_enc - means
131 | x_enc = x_enc.masked_fill(mask == 0, 0)
132 | stdev = torch.sqrt(torch.sum(x_enc * x_enc, dim=1) / torch.sum(mask == 1, dim=1) + 1e-5)
133 | stdev = stdev.unsqueeze(1).detach()
134 | x_enc /= stdev
135 |
136 | # do patching and embedding
137 | x_enc = x_enc.permute(0, 2, 1)
138 | enc_out, n_vars = self.patch_embedding(x_enc) # [(bs * n_vars) x patch_num x d_model]
139 |
140 | # encoder
141 | p_enc_out, _ = self.encoder(enc_out) # [(bs * n_vars) x patch_num x d_model]
142 |
143 | # series-wise representation
144 | s_enc_out = self.pooler(p_enc_out) # [(bs * n_vars) x dimension]
145 |
146 | # series weight learning
147 | loss_cl, similarity_matrix, logits, positives_mask = self.contrastive(s_enc_out) # similarity_matrix: [(bs * n_vars) x (bs * n_vars)]
148 | rebuild_weight_matrix, agg_enc_out = self.aggregation(similarity_matrix, p_enc_out) # agg_enc_out: [(bs * n_vars) x patch_num x d_model]
149 |
150 | agg_enc_out = agg_enc_out.reshape(bs, n_vars, agg_enc_out.shape[-2], agg_enc_out.shape[-1]) # agg_enc_out: [bs x n_vars x patch_num x d_model]
151 |
152 | # decoder
153 | dec_out = self.projection(agg_enc_out) # [bs x n_vars x seq_len]
154 | dec_out = dec_out.permute(0, 2, 1) # [bs x seq_len x n_vars]
155 |
156 | # de-Normalization
157 | dec_out = dec_out * (stdev[:, 0, :].unsqueeze(1).repeat(1, self.seq_len, 1))
158 | dec_out = dec_out + (means[:, 0, :].unsqueeze(1).repeat(1, self.seq_len, 1))
159 |
160 | pred_batch_x = dec_out[:batch_x.shape[0]]
161 |
162 | # series reconstruction
163 | loss_rb = self.mse(pred_batch_x, batch_x.detach())
164 |
165 | # loss
166 | loss = self.awl(loss_cl, loss_rb)
167 |
168 | return loss, loss_cl, loss_rb, positives_mask, logits, rebuild_weight_matrix, pred_batch_x
169 |
170 | def forward(self, x_enc, x_mark_enc, batch_x=None, mask=None):
171 |
172 | if self.task_name == 'pretrain':
173 | return self.pretrain(x_enc, x_mark_enc, batch_x, mask)
174 | if self.task_name == 'finetune':
175 | dec_out = self.forecast(x_enc, x_mark_enc)
176 | return dec_out[:, -self.pred_len:, :] # [B, L, D]
177 |
178 | return None
179 |
--------------------------------------------------------------------------------
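For the defaults used throughout the forecasting scripts (`seq_len=336`, `patch_len=12`, `stride=12`), the `patch_num` formula in the constructor resolves as below: `(seq_len - patch_len) / stride + 1` full patches, plus one more produced by the `ReplicationPad1d((0, stride))` padding inside `PatchEmbedding`:

```python
# Worked example of the patch_num / head_nf arithmetic (defaults from run.py).
seq_len, patch_len, stride, d_model = 336, 12, 12, 512
patch_num = int((seq_len - patch_len) / stride + 2)   # 27 + 2
assert patch_num == 29
head_nf = patch_num * d_model                          # flatten-head input width
assert head_nf == 14848
```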
/SimMTM_Forecasting/models/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/models/__init__.py
--------------------------------------------------------------------------------
/SimMTM_Forecasting/models/iTransformer.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | from layers.Transformer_EncDec import Encoder, EncoderLayer
4 | from layers.SelfAttention_Family import DSAttention, AttentionLayer
5 | from layers.Embed import DataEmbedding_inverted
6 | from utils.losses import AutomaticWeightedLoss
7 | from utils.tools import ContrastiveWeight, AggregationRebuild
8 |
9 | class Flatten_Head(nn.Module):
10 | def __init__(self, d_model, pred_len, head_dropout=0):
11 | super().__init__()
12 | self.flatten = nn.Flatten(start_dim=-1)
13 | self.linear = nn.Linear(d_model, pred_len, bias=True)
14 | self.dropout = nn.Dropout(head_dropout)
15 |
16 | def forward(self, x): # x: [bs x n_vars x d_model]
17 | x = self.flatten(x)
18 | x = self.linear(x)
19 | x = self.dropout(x)
20 | return x # x: [bs x n_vars x seq_len]
21 |
22 | class Pooler_Head(nn.Module):
23 | def __init__(self, nf, dimension=128, head_dropout=0):
24 | super().__init__()
25 |
26 | self.pooler = nn.Sequential(
27 | nn.Flatten(start_dim=-2),
28 | nn.Linear(nf, nf // 2),
29 | nn.BatchNorm1d(nf // 2),
30 | nn.ReLU(),
31 | nn.Linear(nf // 2, dimension),
32 | nn.Dropout(head_dropout),
33 | )
34 |
35 | def forward(self, x): # [bs x n_vars x d_model]
36 | x = self.pooler(x) # [bs x dimension]
37 | return x
38 |
39 | class Model(nn.Module):
40 | """
41 | iTransformer + SimMTM
42 | """
43 |
44 | def __init__(self, configs):
45 | super(Model, self).__init__()
46 | self.task_name = configs.task_name
47 | self.pred_len = configs.pred_len
48 | self.seq_len = configs.seq_len
49 | self.label_len = configs.label_len
50 | self.output_attention = configs.output_attention
51 | self.configs = configs
52 |
53 | # patching and embedding
54 | self.enc_embedding = DataEmbedding_inverted(configs.seq_len, configs.d_model, configs.embed, configs.freq, configs.dropout)
55 |
56 | # Encoder
57 | self.encoder = Encoder(
58 | [
59 | EncoderLayer(
60 | AttentionLayer(
61 | DSAttention(False, configs.factor, attention_dropout=configs.dropout,
62 | output_attention=configs.output_attention), configs.d_model, configs.n_heads),
63 | configs.d_model,
64 | configs.d_ff,
65 | dropout=configs.dropout,
66 | activation=configs.activation
67 | ) for l in range(configs.e_layers)
68 | ],
69 | norm_layer=torch.nn.LayerNorm(configs.d_model),
70 | )
71 |
72 | # Decoder
73 | if self.task_name == 'pretrain':
74 |
75 | # for series-wise representation
76 | self.pooler = Pooler_Head(configs.enc_in*configs.d_model, head_dropout=configs.head_dropout)
77 |
78 | # for reconstruction
79 | self.projection = Flatten_Head(configs.d_model, configs.seq_len, head_dropout=configs.head_dropout)
80 |
81 | self.awl = AutomaticWeightedLoss(2)
82 | self.contrastive = ContrastiveWeight(self.configs)
83 | self.aggregation = AggregationRebuild(self.configs)
84 | self.mse = torch.nn.MSELoss()
85 |
86 | elif self.task_name == 'finetune':
87 | self.head = Flatten_Head(configs.d_model, configs.pred_len, head_dropout=configs.head_dropout)
88 |
89 | def forecast(self, x_enc, x_mark_enc):
90 |
91 | # normalization
92 | means = x_enc.mean(1, keepdim=True).detach()
93 | x_enc = x_enc - means
94 | stdev = torch.sqrt(torch.var(x_enc, dim=1, keepdim=True, unbiased=False) + 1e-5)
95 | x_enc /= stdev
96 |
97 | _, _, N = x_enc.shape # x_enc: [Batch Time Variate]
98 |
99 | # embedding
100 | enc_out = self.enc_embedding(x_enc, x_mark_enc)
101 |
102 | # encoder
103 | enc_out, _ = self.encoder(enc_out)
104 |
105 | dec_out = self.head(enc_out).permute(0, 2, 1)[:, :, :N]
106 |
107 | # de-Normalization
108 | dec_out = dec_out * (stdev[:, 0, :].unsqueeze(1).repeat(1, self.pred_len, 1))
109 | dec_out = dec_out + (means[:, 0, :].unsqueeze(1).repeat(1, self.pred_len, 1))
110 |
111 | return dec_out
112 |
113 | def pretrain(self, x_enc, x_mark_enc, batch_x, mask):
114 |
115 | # normalization
116 | means = torch.sum(x_enc, dim=1) / torch.sum(mask == 1, dim=1)
117 | means = means.unsqueeze(1).detach()
118 | x_enc = x_enc - means
119 | x_enc = x_enc.masked_fill(mask == 0, 0)
120 | stdev = torch.sqrt(torch.sum(x_enc * x_enc, dim=1) / torch.sum(mask == 1, dim=1) + 1e-5)
121 | stdev = stdev.unsqueeze(1).detach()
122 | x_enc /= stdev
123 |
124 | # encoder
125 | enc_out = self.enc_embedding(x_enc)
126 | p_enc_out, _ = self.encoder(enc_out) # p_enc_out: [bs x n_vars x d_model]
127 |
128 | # series-wise representation
129 | s_enc_out = self.pooler(p_enc_out) # s_enc_out: [bs x dimension]
130 |
131 | # series weight learning
132 | loss_cl, similarity_matrix, logits, positives_mask = self.contrastive(s_enc_out) # similarity_matrix: [bs x bs]
133 | rebuild_weight_matrix, agg_enc_out = self.aggregation(similarity_matrix, p_enc_out) # agg_enc_out: [bs x n_vars x d_model]
134 |
135 | # decoder
136 | dec_out = self.projection(agg_enc_out) # [bs x n_vars x seq_len]
137 | dec_out = dec_out.permute(0, 2, 1) # [bs x seq_len x n_vars]
138 |
139 | # de-Normalization
140 | dec_out = dec_out * (stdev[:, 0, :].unsqueeze(1).repeat(1, self.seq_len, 1))
141 | dec_out = dec_out + (means[:, 0, :].unsqueeze(1).repeat(1, self.seq_len, 1))
142 |
143 | pred_batch_x = dec_out[:batch_x.shape[0]]
144 |
145 | # series reconstruction
146 | loss_rb = self.mse(pred_batch_x, batch_x.detach())
147 |
148 | # loss
149 | loss = self.awl(loss_cl, loss_rb)
150 |
151 | return loss, loss_cl, loss_rb, positives_mask, logits, rebuild_weight_matrix, pred_batch_x
152 |
153 | def forward(self, x_enc, x_mark_enc, batch_x=None, mask=None):
154 |
155 | if self.task_name == 'pretrain':
156 | return self.pretrain(x_enc, x_mark_enc, batch_x, mask)
157 | if self.task_name == 'finetune':
158 | dec_out = self.forecast(x_enc, x_mark_enc)
159 | return dec_out[:, -self.pred_len:, :] # [B, L, D]
160 |
161 | return None
162 |
--------------------------------------------------------------------------------
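Unlike PatchTST, the iTransformer variant treats each variate as one token, so the pooler collapses `[bs x n_vars x d_model]` straight into a series-wise vector. A minimal shape check for `Pooler_Head` alone (hypothetical sizes; the `BatchNorm1d` inside requires batch size > 1, and importing the module assumes `DataEmbedding_inverted` resolves):

```python
import torch
from models.iTransformer import Pooler_Head

pooler = Pooler_Head(nf=7 * 64, dimension=128)   # nf = enc_in * d_model
x = torch.randn(4, 7, 64)                        # [bs, n_vars, d_model]
z = pooler(x)
assert z.shape == (4, 128)                       # one vector per series
```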
/SimMTM_Forecasting/run.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import torch
3 | from exp.exp_simmtm import Exp_SimMTM
4 | import random
5 | import numpy as np
6 | import os
7 | os.environ['CUDA_LAUNCH_BLOCKING'] = '0'
8 |
9 | fix_seed = 2023
10 | random.seed(fix_seed)
11 | torch.manual_seed(fix_seed)
12 | np.random.seed(fix_seed)
13 |
14 | parser = argparse.ArgumentParser(description='SimMTM')
15 |
16 | # basic config
17 | parser.add_argument('--task_name', type=str, required=True, default='pretrain', help='task name, options:[pretrain, finetune]')
18 | parser.add_argument('--is_training', type=int, default=1, help='status')
19 | parser.add_argument('--model_id', type=str, required=True, default='SimMTM', help='model id')
20 | parser.add_argument('--model', type=str, required=True, default='SimMTM', help='model name')
21 |
22 | # data loader
23 | parser.add_argument('--data', type=str, required=True, default='ETTh1', help='dataset type')
24 | parser.add_argument('--root_path', type=str, default='./datasets', help='root path of the data file')
25 | parser.add_argument('--data_path', type=str, default='ETTh1.csv', help='data file')
26 | parser.add_argument('--features', type=str, default='M', help='forecasting task, options:[M, S, MS]; M:multivariate predict multivariate, S:univariate predict univariate, MS:multivariate predict univariate')
27 | parser.add_argument('--target', type=str, default='OT', help='target feature in S or MS task')
28 | parser.add_argument('--freq', type=str, default='h', help='freq for time features encoding, options:[s:secondly, t:minutely, h:hourly, d:daily, b:business days, w:weekly, m:monthly], you can also use more detailed freq like 15min or 3h')
29 | parser.add_argument('--checkpoints', type=str, default='./outputs/checkpoints/', help='location of model fine-tuning checkpoints')
30 | parser.add_argument('--pretrain_checkpoints', type=str, default='./outputs/pretrain_checkpoints/', help='location of model pre-training checkpoints')
31 | parser.add_argument('--transfer_checkpoints', type=str, default='ckpt_best.pth', help='checkpoints we will use to finetune, options:[ckpt_best.pth, ckpt10.pth, ckpt20.pth...]')
32 | parser.add_argument('--load_checkpoints', type=str, default=None, help='location of model checkpoints')
33 | parser.add_argument('--select_channels', type=float, default=1, help='select the rate of channels to train')
34 |
35 | # forecasting task
36 | parser.add_argument('--seq_len', type=int, default=336, help='input sequence length')
37 | parser.add_argument('--label_len', type=int, default=48, help='start token length')
38 | parser.add_argument('--pred_len', type=int, default=96, help='prediction sequence length')
39 | parser.add_argument('--seasonal_patterns', type=str, default='Monthly', help='subset for M4')
40 |
41 | # model define
42 | parser.add_argument('--top_k', type=int, default=5, help='for TimesBlock')
43 | parser.add_argument('--num_kernels', type=int, default=3, help='for Inception')
44 | parser.add_argument('--enc_in', type=int, default=7, help='encoder input size')
45 | parser.add_argument('--dec_in', type=int, default=7, help='decoder input size')
46 | parser.add_argument('--c_out', type=int, default=7, help='output size')
47 | parser.add_argument('--d_model', type=int, default=512, help='dimension of model')
48 | parser.add_argument('--n_heads', type=int, default=8, help='num of heads')
49 | parser.add_argument('--e_layers', type=int, default=2, help='num of encoder layers')
50 | parser.add_argument('--d_layers', type=int, default=1, help='num of decoder layers')
51 | parser.add_argument('--d_ff', type=int, default=2048, help='dimension of fcn')
52 | parser.add_argument('--moving_avg', type=int, default=25, help='window size of moving average')
53 | parser.add_argument('--factor', type=int, default=1, help='attn factor')
54 | parser.add_argument('--distil', action='store_false', help='whether to use distilling in encoder, using this argument means not using distilling', default=True)
55 | parser.add_argument('--dropout', type=float, default=0.1, help='dropout')
56 | parser.add_argument('--fc_dropout', type=float, default=0, help='fully connected dropout')
57 | parser.add_argument('--head_dropout', type=float, default=0.1, help='head dropout')
58 | parser.add_argument('--embed', type=str, default='timeF', help='time features encoding, options:[timeF, fixed, learned]')
59 | parser.add_argument('--activation', type=str, default='gelu', help='activation')
60 | parser.add_argument('--output_attention', action='store_true', help='whether to output attention in encoder')
61 | parser.add_argument('--individual', type=int, default=0, help='individual head; True 1 False 0')
62 | parser.add_argument('--pct_start', type=float, default=0.3, help='pct_start')
63 | parser.add_argument('--patch_len', type=int, default=12, help='patch length')
64 | parser.add_argument('--stride', type=int, default=12, help='stride')
65 |
66 | # optimization
67 | parser.add_argument('--num_workers', type=int, default=5, help='data loader num workers')
68 | parser.add_argument('--itr', type=int, default=1, help='number of experiment repetitions')
69 | parser.add_argument('--train_epochs', type=int, default=10, help='train epochs')
70 | parser.add_argument('--batch_size', type=int, default=32, help='batch size of train input data')
71 | parser.add_argument('--patience', type=int, default=3, help='early stopping patience')
72 | parser.add_argument('--learning_rate', type=float, default=0.0001, help='optimizer learning rate')
73 | parser.add_argument('--des', type=str, default='test', help='exp description')
74 | parser.add_argument('--loss', type=str, default='MSE', help='loss function')
75 | parser.add_argument('--lradj', type=str, default='type1', help='adjust learning rate')
76 | parser.add_argument('--use_amp', action='store_true', help='use automatic mixed precision training', default=False)
77 |
78 | # GPU
79 | parser.add_argument('--use_gpu', type=bool, default=True, help='use gpu')
80 | parser.add_argument('--gpu', type=int, default=0, help='gpu')
81 | parser.add_argument('--use_multi_gpu', action='store_true', help='use multiple gpus', default=False)
82 | parser.add_argument('--devices', type=str, default='0', help='device ids of multiple gpus')
83 |
84 | # Pre-train
85 | parser.add_argument('--lm', type=int, default=3, help='average masking length')
86 | parser.add_argument('--positive_nums', type=int, default=3, help='number of masked series per sample')
87 | parser.add_argument('--rbtp', type=int, default=1, help='0: rebuild the embedding of the original series; 1: rebuild the original series')
88 | parser.add_argument('--temperature', type=float, default=0.2, help='temperature')
89 | parser.add_argument('--masked_rule', type=str, default='geometric', help='geometric, random, masked tail, masked head')
90 | parser.add_argument('--mask_rate', type=float, default=0.5, help='mask ratio')
91 |
92 | args = parser.parse_args()
93 | args.use_gpu = True if torch.cuda.is_available() and args.use_gpu else False
94 |
95 | if args.use_gpu and args.use_multi_gpu:
96 | args.devices = args.devices.replace(' ', '')
97 | device_ids = args.devices.split(',')
98 | args.device_ids = [int(id_) for id_ in device_ids]
99 |
100 | print('Args in experiment:')
101 | print(args)
102 |
103 | Exp = Exp_SimMTM
104 | if args.task_name == 'pretrain':
105 | for ii in range(args.itr):
106 | # setting record of experiments
107 | setting = '{}_{}_{}_{}_sl{}_ll{}_pl{}_dm{}_df{}_nh{}_el{}_dl{}_fc{}_dp{}_hdp{}_ep{}_bs{}_lr{}_lm{}_pn{}_mr{}_tp{}'.format(
108 | args.task_name,
109 | args.model,
110 | args.data,
111 | args.features,
112 | args.seq_len,
113 | args.label_len,
114 | args.pred_len,
115 | args.d_model,
116 | args.d_ff,
117 | args.n_heads,
118 | args.e_layers,
119 | args.d_layers,
120 | args.factor,
121 | args.dropout,
122 | args.head_dropout,
123 | args.train_epochs,
124 | args.batch_size,
125 | args.learning_rate,
126 | args.lm,
127 | args.positive_nums,
128 | args.mask_rate,
129 | args.temperature
130 | )
131 |
132 | exp = Exp(args) # set experiments
133 | print('>>>>>>>start pre_training : {}>>>>>>>>>>>>>>>>>>>>>>>>>>'.format(setting))
134 | exp.pretrain()
135 | torch.cuda.empty_cache()
136 |
137 | elif args.task_name == 'finetune':
138 | for ii in range(args.itr):
139 | # setting record of experiments
140 | setting = '{}_{}_{}_{}_sl{}_ll{}_pl{}_dm{}_df{}_nh{}_el{}_dl{}_fc{}_dp{}_hdp{}_ep{}_bs{}_lr{}'.format(
141 | args.task_name,
142 | args.model,
143 | args.data,
144 | args.features,
145 | args.seq_len,
146 | args.label_len,
147 | args.pred_len,
148 | args.d_model,
149 | args.d_ff,
150 | args.n_heads,
151 | args.e_layers,
152 | args.d_layers,
153 | args.factor,
154 | args.dropout,
155 | args.head_dropout,
156 | args.train_epochs,
157 | args.batch_size,
158 | args.learning_rate
159 | )
160 |
161 | args.load_checkpoints = os.path.join(args.pretrain_checkpoints, args.data, args.transfer_checkpoints)
162 |
163 | exp = Exp(args) # set experiments
164 |
165 | print('>>>>>>>start training : {}>>>>>>>>>>>>>>>>>>>>>>>>>>'.format(setting))
166 | exp.train(setting)
167 |
168 | print('>>>>>>>testing : {}<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<'.format(setting))
169 | exp.test()
170 | torch.cuda.empty_cache()
171 |
--------------------------------------------------------------------------------
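The finetune branch above stitches the checkpoint path together from three arguments (line 161): `pretrain_checkpoints`, the dataset name, and `transfer_checkpoints`. A minimal sketch of that resolution; the default strings below are placeholder assumptions, since the real defaults are set earlier in `run.py` and are not shown here:

```python
# Illustrative only: how --pretrain_checkpoints / --data / --transfer_checkpoints
# combine into args.load_checkpoints. The default strings below are assumptions.
import os

pretrain_checkpoints = './pretrain_checkpoints/'  # placeholder default
data = 'ETTh1'
transfer_checkpoints = 'ckpt_best.pth'            # placeholder default

load_checkpoints = os.path.join(pretrain_checkpoints, data, transfer_checkpoints)
print(load_checkpoints)  # ./pretrain_checkpoints/ETTh1/ckpt_best.pth
```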
/SimMTM_Forecasting/scripts/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/.DS_Store
--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/finetune/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/finetune/.DS_Store
--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/finetune/ECL_script/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/finetune/ECL_script/.DS_Store
--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/finetune/ECL_script/ECL.sh:
--------------------------------------------------------------------------------
1 | export CUDA_VISIBLE_DEVICES=0
2 |
3 | for pred_len in 96 192 336 720; do
4 | python -u run.py \
5 | --task_name finetune \
6 | --root_path ./dataset/electricity/ \
7 | --data_path electricity.csv \
8 | --model_id ECL \
9 | --model SimMTM \
10 | --data ECL \
11 | --features M \
12 | --seq_len 336 \
13 | --label_len 48 \
14 | --pred_len $pred_len \
15 | --e_layers 2 \
16 | --enc_in 321 \
17 | --dec_in 321 \
18 | --c_out 321 \
19 | --d_model 32 \
20 | --d_ff 64 \
21 | --n_heads 16 \
22 | --batch_size 32
23 | done
24 |
25 |
26 |
--------------------------------------------------------------------------------
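Each finetune script follows the same pattern as `ECL.sh` above: sweep the four standard horizons while keeping the backbone configuration fixed. For reference, a hedged Python equivalent of that loop, with the same flags as the script and assumed to be launched from inside `SimMTM_Forecasting/`:

```python
# Hypothetical Python equivalent of the bash loop in ECL.sh; flags mirror the script above.
import subprocess

for pred_len in (96, 192, 336, 720):
    subprocess.run([
        'python', '-u', 'run.py',
        '--task_name', 'finetune',
        '--root_path', './dataset/electricity/',
        '--data_path', 'electricity.csv',
        '--model_id', 'ECL', '--model', 'SimMTM', '--data', 'ECL',
        '--features', 'M',
        '--seq_len', '336', '--label_len', '48', '--pred_len', str(pred_len),
        '--e_layers', '2',
        '--enc_in', '321', '--dec_in', '321', '--c_out', '321',
        '--d_model', '32', '--d_ff', '64', '--n_heads', '16',
        '--batch_size', '32',
    ], check=True)
```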
/SimMTM_Forecasting/scripts/finetune/ETT_script/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/finetune/ETT_script/.DS_Store
--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/finetune/ETT_script/ETTh1.sh:
--------------------------------------------------------------------------------
1 | export CUDA_VISIBLE_DEVICES=0
2 |
3 | for pred_len in 96 192 336 720; do
4 | python -u run.py \
5 | --task_name finetune \
6 | --is_training 1 \
7 | --root_path ./dataset/ETT-small/ \
8 | --data_path ETTh1.csv \
9 | --model_id ETTh1 \
10 | --model SimMTM \
11 | --data ETTh1 \
12 | --features M \
13 | --seq_len 336 \
14 | --label_len 48 \
15 | --pred_len $pred_len \
16 | --e_layers 3 \
17 | --enc_in 7 \
18 | --dec_in 7 \
19 | --c_out 7 \
20 | --n_heads 16 \
21 | --d_model 32 \
22 | --d_ff 64 \
23 | --learning_rate 0.0001 \
24 | --dropout 0.2 \
25 | --batch_size 16
26 | done
27 |
28 |
29 |
--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/finetune/ETT_script/ETTh2.sh:
--------------------------------------------------------------------------------
1 | export CUDA_VISIBLE_DEVICES=0
2 |
3 | for pred_len in 96 192 336 720; do
4 | python -u run.py \
5 | --task_name finetune \
6 | --is_training 1 \
7 | --root_path ./dataset/ETT-small/ \
8 | --data_path ETTh2.csv \
9 | --model_id ETTh2 \
10 | --model SimMTM \
11 | --data ETTh2 \
12 | --features M \
13 | --seq_len 336 \
14 | --label_len 48 \
15 | --pred_len $pred_len \
16 | --e_layers 2 \
17 | --enc_in 7 \
18 | --dec_in 7 \
19 | --c_out 7 \
20 | --n_heads 8 \
21 | --d_model 8 \
22 | --d_ff 32 \
23 | --dropout 0.4 \
24 | --head_dropout 0.2 \
25 | --batch_size 16
26 | done
--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/finetune/ETT_script/ETTm1.sh:
--------------------------------------------------------------------------------
1 | export CUDA_VISIBLE_DEVICES=0
2 |
3 | for pred_len in 96 192 336 720; do
4 | python -u run.py \
5 | --task_name finetune \
6 | --is_training 1 \
7 | --root_path ./dataset/ETT-small/ \
8 | --data_path ETTm1.csv \
9 | --model_id ETTm1 \
10 | --model SimMTM \
11 | --data ETTm1 \
12 | --features M \
13 | --seq_len 336 \
14 | --label_len 48 \
15 | --pred_len $pred_len \
16 | --e_layers 2 \
17 | --enc_in 7 \
18 | --dec_in 7 \
19 | --c_out 7 \
20 | --n_heads 8 \
21 | --d_model 32 \
22 | --d_ff 64 \
23 | --dropout 0
24 | done
25 |
--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/finetune/ETT_script/ETTm2.sh:
--------------------------------------------------------------------------------
1 | export CUDA_VISIBLE_DEVICES=0
2 |
3 | for pred_len in 96 192 336 720; do
4 | python -u run.py \
5 | --task_name finetune \
6 | --is_training 1 \
7 | --root_path ./dataset/ETT-small/ \
8 | --data_path ETTm2.csv \
9 | --model_id ETTm2 \
10 | --model SimMTM \
11 | --data ETTm2 \
12 | --features M \
13 | --seq_len 336 \
14 | --label_len 48 \
15 | --pred_len $pred_len \
16 | --e_layers 3 \
17 | --enc_in 7 \
18 | --dec_in 7 \
19 | --c_out 7 \
20 | --n_heads 8 \
21 | --d_model 8 \
22 | --d_ff 16 \
23 | --dropout 0 \
24 | --batch_size 64
25 | done
26 |
27 |
--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/finetune/Traffic/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/finetune/Traffic/.DS_Store
--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/finetune/Traffic/Traffic.sh:
--------------------------------------------------------------------------------
1 | export CUDA_VISIBLE_DEVICES=0
2 |
3 | for pred_len in 96 192 336 720; do
4 | python -u run.py \
5 | --task_name finetune \
6 | --root_path ./dataset/traffic/ \
7 | --data_path traffic.csv \
8 | --model_id Traffic \
9 | --model SimMTM \
10 | --data Traffic \
11 | --features M \
12 | --seq_len 336 \
13 | --label_len 48 \
14 | --pred_len $pred_len \
15 | --e_layers 2 \
16 | --enc_in 862 \
17 | --dec_in 862 \
18 | --c_out 862 \
19 | --d_model 128 \
20 | --d_ff 256 \
21 | --n_heads 16 \
22 | --batch_size 32 \
23 | --dropout 0.2
24 | done
25 |
26 |
27 |
--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/finetune/Weather_script/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/finetune/Weather_script/.DS_Store
--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/finetune/Weather_script/Weather.sh:
--------------------------------------------------------------------------------
1 | export CUDA_VISIBLE_DEVICES=0
2 |
3 | for pred_len in 96 192 336 720; do
4 | python -u run.py \
5 | --task_name finetune \
6 | --is_training 1 \
7 | --root_path ./dataset/weather/ \
8 | --data_path weather.csv \
9 | --model_id Weather \
10 | --model SimMTM \
11 | --data Weather \
12 | --features M \
13 | --seq_len 336 \
14 | --label_len 48 \
15 | --pred_len $pred_len \
16 | --e_layers 2 \
17 | --enc_in 21 \
18 | --dec_in 21 \
19 | --c_out 21 \
20 | --n_heads 8 \
21 | --d_model 64 \
22 | --d_ff 64 \
23 | --batch_size 16
24 | done
25 |
26 |
--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/pretrain/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/pretrain/.DS_Store
--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/pretrain/ECL_script/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/pretrain/ECL_script/.DS_Store
--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/pretrain/ECL_script/ECL.sh:
--------------------------------------------------------------------------------
1 | export CUDA_VISIBLE_DEVICES=0
2 |
3 | python -u run.py \
4 | --task_name pretrain \
5 | --root_path ./dataset/electricity/ \
6 | --data_path electricity.csv \
7 | --model_id ECL \
8 | --model SimMTM \
9 | --data ECL \
10 | --features M \
11 | --seq_len 336 \
12 | --label_len 48 \
13 | --e_layers 2 \
14 | --positive_nums 2 \
15 | --mask_rate 0.5 \
16 | --enc_in 321 \
17 | --dec_in 321 \
18 | --c_out 321 \
19 | --d_model 32 \
20 | --d_ff 64 \
21 | --n_heads 16 \
22 | --batch_size 32 \
23 | --train_epochs 50 \
24 | --temperature 0.02
25 |
--------------------------------------------------------------------------------
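Note that the pretrain scripts drop `--pred_len` (pretraining reconstructs the input window) and pass a much sharper `--temperature` (0.02 here vs. the 0.2 default). As a hedged illustration of the mechanism rather than the exact SimMTM loss code: the temperature divides the series-wise similarities before a softmax, so a smaller value concentrates the aggregation weights on the closest series:

```python
# Illustrative only: a generic temperature-scaled softmax over cosine similarities,
# showing why a smaller --temperature concentrates the series-wise weights.
# This is a sketch of the standard mechanism, not the exact SimMTM loss code.
import torch
import torch.nn.functional as F

s = torch.randn(8, 16)  # 8 series-wise representations
sim = F.cosine_similarity(s.unsqueeze(1), s.unsqueeze(0), dim=-1)  # [8 x 8]
for tau in (0.2, 0.02):
    w = torch.softmax(sim / tau, dim=-1)  # aggregation weights per series
    print(tau, w.max(dim=-1).values.mean().item())  # smaller tau -> peakier weights
```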
/SimMTM_Forecasting/scripts/pretrain/ETT_script/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/pretrain/ETT_script/.DS_Store
--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/pretrain/ETT_script/ETTh1.sh:
--------------------------------------------------------------------------------
1 | export CUDA_VISIBLE_DEVICES=0
2 |
3 | python -u run.py \
4 | --task_name pretrain \
5 | --root_path ./dataset/ETT-small/ \
6 | --data_path ETTh1.csv \
7 | --model_id ETTh1 \
8 | --model SimMTM \
9 | --data ETTh1 \
10 | --features M \
11 | --seq_len 336 \
12 | --e_layers 3 \
13 | --enc_in 7 \
14 | --dec_in 7 \
15 | --c_out 7 \
16 | --n_heads 16 \
17 | --d_model 32 \
18 | --d_ff 64 \
19 | --positive_nums 3 \
20 | --mask_rate 0.5 \
21 | --learning_rate 0.001 \
22 | --batch_size 16 \
23 | --train_epochs 50
24 |
25 |
26 |
--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/pretrain/ETT_script/ETTh2.sh:
--------------------------------------------------------------------------------
1 | export CUDA_VISIBLE_DEVICES=0
2 |
3 | python -u run.py \
4 | --task_name pretrain \
5 | --root_path ./dataset/ETT-small/ \
6 | --data_path ETTh2.csv \
7 | --model_id ETTh2 \
8 | --model SimMTM \
9 | --data ETTh2 \
10 | --features M \
11 | --seq_len 336 \
12 | --e_layers 2 \
13 | --enc_in 7 \
14 | --dec_in 7 \
15 | --c_out 7 \
16 | --n_heads 8 \
17 | --d_model 8 \
18 | --d_ff 32 \
19 | --positive_nums 3 \
20 | --mask_rate 0.5 \
21 | --learning_rate 0.001 \
22 | --batch_size 16 \
23 | --train_epochs 50
24 |
25 |
26 |
--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/pretrain/ETT_script/ETTm1.sh:
--------------------------------------------------------------------------------
1 | export CUDA_VISIBLE_DEVICES=0
2 |
3 | python -u run.py \
4 | --task_name pretrain \
5 | --root_path ./dataset/ETT-small/ \
6 | --data_path ETTm1.csv \
7 | --model_id ETTm1 \
8 | --model SimMTM \
9 | --data ETTm1 \
10 | --features M \
11 | --seq_len 336 \
12 | --e_layers 2 \
13 | --enc_in 7 \
14 | --dec_in 7 \
15 | --c_out 7 \
16 | --n_heads 8 \
17 | --d_model 32 \
18 | --d_ff 64 \
19 | --positive_nums 3 \
20 | --mask_rate 0.5 \
21 | --learning_rate 0.001 \
22 | --batch_size 16 \
23 | --train_epochs 50
24 |
25 |
--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/pretrain/ETT_script/ETTm2.sh:
--------------------------------------------------------------------------------
1 | export CUDA_VISIBLE_DEVICES=0,1
2 |
3 | python -u run.py \
4 | --task_name pretrain \
5 | --root_path ./dataset/ETT-small/ \
6 | --data_path ETTm2.csv \
7 | --model_id ETTm2 \
8 | --model SimMTM \
9 | --data ETTm2 \
10 | --features M \
11 | --seq_len 336 \
12 | --e_layers 3 \
13 | --enc_in 7 \
14 | --dec_in 7 \
15 | --c_out 7 \
16 | --n_heads 8 \
17 | --d_model 8 \
18 | --d_ff 16 \
19 | --positive_nums 2 \
20 | --mask_rate 0.5 \
21 | --learning_rate 0.001 \
22 | --batch_size 16 \
23 | --train_epochs 50
24 |
--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/pretrain/Traffic_script/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/pretrain/Traffic_script/.DS_Store
--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/pretrain/Traffic_script/Traffic.sh:
--------------------------------------------------------------------------------
1 | export CUDA_VISIBLE_DEVICES=0
2 |
3 | python -u run.py \
4 | --task_name pretrain \
5 | --root_path ./dataset/traffic/ \
6 | --data_path traffic.csv \
7 | --model_id Traffic \
8 | --model SimMTM \
9 | --data Traffic \
10 | --features M \
11 | --seq_len 336 \
12 | --label_len 48 \
13 | --e_layers 3 \
14 | --positive_nums 2 \
15 | --mask_rate 0.5 \
16 | --enc_in 862 \
17 | --dec_in 862 \
18 | --c_out 862 \
19 | --d_model 128 \
20 | --d_ff 256 \
21 | --n_heads 16 \
22 | --batch_size 32 \
23 | --dropout 0.2 \
24 | --train_epochs 50 \
25 | --temperature 0.02
26 |
--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/pretrain/Weather_script/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/scripts/pretrain/Weather_script/.DS_Store
--------------------------------------------------------------------------------
/SimMTM_Forecasting/scripts/pretrain/Weather_script/Weather.sh:
--------------------------------------------------------------------------------
1 | export CUDA_VISIBLE_DEVICES=0
2 |
3 | python -u run.py \
4 | --task_name pretrain \
5 | --root_path ./dataset/weather/ \
6 | --data_path weather.csv \
7 | --model_id Weather \
8 | --model SimMTM \
9 | --data Weather \
10 | --features M \
11 | --seq_len 336 \
12 | --e_layers 2 \
13 | --positive_nums 2 \
14 | --mask_rate 0.5 \
15 | --enc_in 21 \
16 | --dec_in 21 \
17 | --c_out 21 \
18 | --n_heads 8 \
19 | --d_model 64 \
20 | --d_ff 64 \
21 | --learning_rate 0.001 \
22 | --batch_size 16 \
23 | --train_epochs 50
24 |
25 |
26 |
--------------------------------------------------------------------------------
/SimMTM_Forecasting/utils/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/utils/.DS_Store
--------------------------------------------------------------------------------
/SimMTM_Forecasting/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/SimMTM_Forecasting/utils/__init__.py
--------------------------------------------------------------------------------
/SimMTM_Forecasting/utils/augmentations.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import torch
3 | import math
4 |
5 |
6 | def masked_data(sample, sample_mark, masking_ratio, lm, positive_nums=1, distribution='geometric'):
7 | """Masked time series in time dimension"""
8 |
9 | sample = sample.permute(0, 2, 1) # [bs x nvars x seq_len]
10 |
11 | sample_repeat = sample.repeat(positive_nums, 1, 1) # [(bs * positive_nums) x nvars x seq_len]
12 |
13 | mask = noise_mask(sample_repeat, masking_ratio, lm, distribution=distribution)
14 | x_masked = mask * sample_repeat
15 |
16 | sample_mark_repeat = sample_mark.repeat(positive_nums, 1, 1)
17 |
18 | return x_masked.permute(0, 2, 1), sample_mark_repeat, mask.permute(0, 2, 1)
19 |
20 |
21 | def geom_noise_mask_single(L, lm, masking_ratio):
22 | """
23 | Randomly create a boolean mask of length `L`, consisting of subsequences of average length lm, masking with 0s a `masking_ratio`
24 | proportion of the sequence L. The length of masking subsequences and intervals follow a geometric distribution.
25 | Args:
26 | L: length of mask and sequence to be masked
27 | lm: average length of masking subsequences (streaks of 0s)
28 | masking_ratio: proportion of L to be masked
29 | Returns:
30 | (L,) boolean numpy array intended to mask ('drop') with 0s a sequence of length L
31 | """
32 | keep_mask = np.ones(L, dtype=bool)
33 | p_m = 1 / lm # probability of each masking sequence stopping. parameter of geometric distribution.
34 |     # probability of each unmasked sequence stopping; parameter of the geometric distribution
35 |     p_u = p_m * masking_ratio / (1 - masking_ratio)
36 | p = [p_m, p_u]
37 |
38 | # Start in state 0 with masking_ratio probability
39 | state = int(np.random.rand() > masking_ratio) # state 0 means masking, 1 means not masking
40 | for i in range(L):
41 |         keep_mask[i] = state  # the state value coincides with the mask value (0 = masked, 1 = kept)
42 | if np.random.rand() < p[state]:
43 | state = 1 - state
44 |
45 | return keep_mask
46 |
47 |
48 | def noise_mask(X, masking_ratio=0.25, lm=3, distribution='geometric', exclude_feats=None):
49 | """
50 | Creates a random boolean mask of the same shape as X, with 0s at places where a feature should be masked.
51 | Args:
52 |         X: (num_series, nvars, seq_len) tensor of features (the repeated batch produced by `masked_data`)
53 | masking_ratio: proportion of seq_length to be masked. At each time step, will also be the proportion of
54 | feat_dim that will be masked on average
55 | lm: average length of masking subsequences (streaks of 0s). Used only when `distribution` is 'geometric'.
56 |         distribution: whether each mask sequence element is sampled independently at random, or whether
57 |             sampling follows a Markov chain (and thus is stateful), resulting in geometric distributions of
58 |             masked sequences of a desired mean length `lm`
59 | exclude_feats: iterable of indices corresponding to features to be excluded from masking (i.e. to remain all 1s)
60 | Returns:
61 | boolean numpy array with the same shape as X, with 0s at places where a feature should be masked
62 | """
63 | if exclude_feats is not None:
64 | exclude_feats = set(exclude_feats)
65 |
66 | if distribution == 'geometric': # stateful (Markov chain)
67 | mask = geom_noise_mask_single(X.shape[0] * X.shape[1] * X.shape[2], lm, masking_ratio)
68 | mask = mask.reshape(X.shape[0], X.shape[1], X.shape[2])
69 |     elif distribution == 'masked_tail':
70 |         mask = np.ones(X.shape, dtype=bool)
71 |         for m in range(X.shape[0]):  # sample dimension
72 | 
73 |             keep_mask = np.zeros_like(mask[m, :], dtype=bool)
74 |             n = math.ceil(keep_mask.shape[1] * (1 - masking_ratio))
75 |             keep_mask[:, :n] = True  # keep the first n time steps, mask the tail
76 |             mask[m, :] = keep_mask
77 |     elif distribution == 'masked_head':
78 |         mask = np.ones(X.shape, dtype=bool)
79 |         for m in range(X.shape[0]):  # sample dimension
80 | 
81 |             keep_mask = np.zeros_like(mask[m, :], dtype=bool)
82 |             n = math.ceil(keep_mask.shape[1] * masking_ratio)
83 |             keep_mask[:, n:] = True  # mask the first n time steps, keep the tail
84 |             mask[m, :] = keep_mask
85 | else: # each position is independent Bernoulli with p = 1 - masking_ratio
86 | mask = np.random.choice(np.array([True, False]), size=X.shape, replace=True,
87 | p=(1 - masking_ratio, masking_ratio))
88 | return torch.tensor(mask)
89 |
90 |
91 | def one_hot_encoding(X):
92 | X = [int(x) for x in X]
93 | n_values = np.max(X) + 1
94 | b = np.eye(n_values)[X]
95 | return b
96 |
97 |
98 | def DataTransform(sample, config):
99 | """Weak and strong augmentations"""
100 | weak_aug = scaling(sample, config.augmentation.jitter_scale_ratio)
101 | # weak_aug = permutation(sample, max_segments=config.augmentation.max_seg)
102 | strong_aug = jitter(permutation(sample, max_segments=config.augmentation.max_seg), config.augmentation.jitter_ratio)
103 |
104 | return weak_aug, strong_aug
105 |
106 |
107 | def remove_frequency(x, pertub_ratio=0.0):
108 |     mask = torch.empty(x.shape).uniform_() > pertub_ratio  # a pertub_ratio fraction of entries become False (those frequencies are removed)
109 |     mask = mask.to(x.device)
110 |     return x * mask
111 |
112 |
113 | def add_frequency(x, pertub_ratio=0.0):
114 |
115 |     mask = torch.empty(x.shape).uniform_() > (1 - pertub_ratio)  # only a pertub_ratio fraction of entries are True
116 |     mask = mask.to(x.device)
117 |     max_amplitude = x.max()
118 |     random_am = torch.rand(mask.shape, device=x.device) * (max_amplitude * 0.1)  # random amplitudes up to 10% of the max
119 |     pertub_matrix = mask * random_am
120 |     return x + pertub_matrix
121 |
122 |
123 | def generate_binomial_mask(B, T, D, p=0.5):  # p is the keep probability (fraction of entries left unmasked)
124 | return torch.from_numpy(np.random.binomial(1, p, size=(B, T, D))).to(torch.bool)
125 |
126 |
127 | def masking(x, keepratio=0.9, mask='binomial'):
128 |     nan_mask = ~x.isnan().any(axis=-1)
129 |     x[~nan_mask] = 0
130 |     # x = self.input_fc(x)  # B x T x Ch
131 | 
132 |     if mask == 'binomial':
133 |         mask_id = generate_binomial_mask(x.size(0), x.size(1), x.size(2), p=keepratio).to(x.device)
134 |     else:  # the alternatives below were sketched but never implemented
135 |         raise ValueError(f'unsupported mask type: {mask}')
136 |     # elif mask == 'continuous':
137 |     #     mask_id = generate_continuous_mask(x.size(0), x.size(1)).to(x.device)
138 |     # elif mask == 'all_true':
139 |     #     mask_id = x.new_full((x.size(0), x.size(1)), True, dtype=torch.bool)
140 |     # elif mask == 'all_false':
141 |     #     mask_id = x.new_full((x.size(0), x.size(1)), False, dtype=torch.bool)
142 |     # elif mask == 'mask_last':
143 |     #     mask_id = x.new_full((x.size(0), x.size(1)), True, dtype=torch.bool)
144 |     #     mask_id[:, -1] = False
145 |     # mask_id &= nan_mask
146 |     x[~mask_id] = 0
147 |     return x
148 |
149 |
--------------------------------------------------------------------------------
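A quick usage sketch for `masked_data` above (assuming it is run from inside `SimMTM_Forecasting/`): it returns `positive_nums` masked copies per sample, and the empirical fraction of masked points should sit near `masking_ratio`:

```python
# Sanity check of the geometric masking above (a usage sketch, not repo code).
import torch
from utils.augmentations import masked_data  # path assumes running inside SimMTM_Forecasting/

bs, seq_len, nvars = 4, 336, 7
x = torch.randn(bs, seq_len, nvars)   # [bs x seq_len x nvars], as the dataloader yields
x_mark = torch.randn(bs, seq_len, 4)  # time-feature marks

x_masked, mark_rep, mask = masked_data(x, x_mark, masking_ratio=0.5, lm=3, positive_nums=3)
print(x_masked.shape)                 # torch.Size([12, 336, 7]): bs * positive_nums copies
print(1 - mask.float().mean().item()) # empirically close to the 0.5 mask ratio
```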
/SimMTM_Forecasting/utils/losses.py:
--------------------------------------------------------------------------------
1 | import torch as t
2 | import torch
3 | import torch.nn as nn
4 | import numpy as np
5 |
6 | def divide_no_nan(a, b):
7 | """
8 |     a/b where any resulting NaN or Inf is replaced by 0.
9 | """
10 | result = a / b
11 | result[result != result] = .0
12 | result[result == np.inf] = .0
13 | return result
14 |
15 |
16 | class mape_loss(nn.Module):
17 | def __init__(self):
18 | super(mape_loss, self).__init__()
19 |
20 | def forward(self, insample: t.Tensor, freq: int,
21 | forecast: t.Tensor, target: t.Tensor, mask: t.Tensor) -> t.float:
22 | """
23 | MAPE loss as defined in: https://en.wikipedia.org/wiki/Mean_absolute_percentage_error
24 |
25 | :param forecast: Forecast values. Shape: batch, time
26 | :param target: Target values. Shape: batch, time
27 | :param mask: 0/1 mask. Shape: batch, time
28 | :return: Loss value
29 | """
30 | weights = divide_no_nan(mask, target)
31 | return t.mean(t.abs((forecast - target) * weights))
32 |
33 |
34 | class smape_loss(nn.Module):
35 | def __init__(self):
36 | super(smape_loss, self).__init__()
37 |
38 | def forward(self, insample: t.Tensor, freq: int,
39 | forecast: t.Tensor, target: t.Tensor, mask: t.Tensor) -> t.float:
40 | """
41 | sMAPE loss as defined in https://robjhyndman.com/hyndsight/smape/ (Makridakis 1993)
42 |
43 | :param forecast: Forecast values. Shape: batch, time
44 | :param target: Target values. Shape: batch, time
45 | :param mask: 0/1 mask. Shape: batch, time
46 | :return: Loss value
47 | """
48 | return 200 * t.mean(divide_no_nan(t.abs(forecast - target), t.abs(forecast.data) + t.abs(target.data)) * mask)
49 |
50 |
51 | class mase_loss(nn.Module):
52 | def __init__(self):
53 | super(mase_loss, self).__init__()
54 |
55 | def forward(self, insample: t.Tensor, freq: int,
56 | forecast: t.Tensor, target: t.Tensor, mask: t.Tensor) -> t.float:
57 | """
58 | MASE loss as defined in "Scaled Errors" https://robjhyndman.com/papers/mase.pdf
59 |
60 | :param insample: Insample values. Shape: batch, time_i
61 | :param freq: Frequency value
62 | :param forecast: Forecast values. Shape: batch, time_o
63 | :param target: Target values. Shape: batch, time_o
64 | :param mask: 0/1 mask. Shape: batch, time_o
65 | :return: Loss value
66 | """
67 | masep = t.mean(t.abs(insample[:, freq:] - insample[:, :-freq]), dim=1)
68 | masked_masep_inv = divide_no_nan(mask, masep[:, None])
69 | return t.mean(t.abs(target - forecast) * masked_masep_inv)
70 |
71 |
72 | class AutomaticWeightedLoss(nn.Module):
73 | """automatically weighted multi-task loss
74 | Params:
75 |         num: int, the number of losses to combine
76 |         x: the individual multi-task losses
77 | Examples:
78 | loss1=1
79 | loss2=2
80 | awl = AutomaticWeightedLoss(2)
81 | loss_sum = awl(loss1, loss2)
82 | """
83 |
84 | def __init__(self, num=2):
85 | super(AutomaticWeightedLoss, self).__init__()
86 | params = torch.ones(num, requires_grad=True)
87 | self.params = nn.Parameter(params)
88 |
89 | def forward(self, *x):
90 | loss_sum = 0
91 | for i, loss in enumerate(x):
92 | loss_sum += 0.5 / (self.params[i] ** 2) * loss + torch.log(1 + self.params[i] ** 2)
93 | return loss_sum
--------------------------------------------------------------------------------
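`AutomaticWeightedLoss` above learns one trainable scale per task (uncertainty-style weighting): each loss is multiplied by 0.5/σ² plus a log(1 + σ²) regularizer that keeps the scales from collapsing to zero. A minimal usage sketch with placeholder model and loss terms; note the weights themselves must be registered with the optimizer:

```python
# Minimal usage sketch for AutomaticWeightedLoss; model and loss terms are placeholders.
import torch
import torch.nn as nn
from utils.losses import AutomaticWeightedLoss  # path assumes SimMTM_Forecasting/

model = nn.Linear(8, 1)
awl = AutomaticWeightedLoss(num=2)
# The per-task weights are nn.Parameters, so hand them to the optimizer as well.
optimizer = torch.optim.Adam(list(model.parameters()) + list(awl.parameters()), lr=1e-3)

x, y = torch.randn(32, 8), torch.randn(32, 1)
loss_rec = nn.functional.mse_loss(model(x), y)  # stand-in for the reconstruction term
loss_con = model(x).pow(2).mean()               # stand-in for the contrastive term
total = awl(loss_rec, loss_con)
total.backward()
optimizer.step()
```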
/SimMTM_Forecasting/utils/m4_summary.py:
--------------------------------------------------------------------------------
1 | # This source code is provided for the purposes of scientific reproducibility
2 | # under the following limited license from Element AI Inc. The code is an
3 | # implementation of the N-BEATS model (Oreshkin et al., N-BEATS: Neural basis
4 | # expansion analysis for interpretable time series forecasting,
5 | # https://arxiv.org/abs/1905.10437). The copyright to the source code is
6 | # licensed under the Creative Commons - Attribution-NonCommercial 4.0
7 | # International license (CC BY-NC 4.0):
8 | # https://creativecommons.org/licenses/by-nc/4.0/. Any commercial use (whether
9 | # for the benefit of third parties or internally in production) requires an
10 | # explicit license. The subject-matter of the N-BEATS model and associated
11 | # materials are the property of Element AI Inc. and may be subject to patent
12 | # protection. No license to patents is granted hereunder (whether express or
13 | # implied). Copyright 2020 Element AI Inc. All rights reserved.
14 |
15 | """
16 | M4 Summary
17 | """
18 | from collections import OrderedDict
19 |
20 | import numpy as np
21 | import pandas as pd
22 |
23 | from data_provider.m4 import M4Dataset
24 | from data_provider.m4 import M4Meta
25 | import os
26 |
27 |
28 | def group_values(values, groups, group_name):
29 |     return np.array([v[~np.isnan(v)] for v in values[groups == group_name]], dtype=object)  # ragged: series differ in length
30 |
31 |
32 | def mase(forecast, insample, outsample, frequency):
33 | return np.mean(np.abs(forecast - outsample)) / np.mean(np.abs(insample[:-frequency] - insample[frequency:]))
34 |
35 |
36 | def smape_2(forecast, target):
37 | denom = np.abs(target) + np.abs(forecast)
38 |     # divide by 1.0 instead of 0.0; when denom is zero the numerator is 0.0 anyway
39 | denom[denom == 0.0] = 1.0
40 | return 200 * np.abs(forecast - target) / denom
41 |
42 |
43 | def mape(forecast, target):
44 | denom = np.abs(target)
45 |     # divide by 1.0 instead of 0.0; when denom is zero the numerator is 0.0 anyway
46 | denom[denom == 0.0] = 1.0
47 | return 100 * np.abs(forecast - target) / denom
48 |
49 |
50 | class M4Summary:
51 | def __init__(self, file_path, root_path):
52 | self.file_path = file_path
53 | self.training_set = M4Dataset.load(training=True, dataset_file=root_path)
54 | self.test_set = M4Dataset.load(training=False, dataset_file=root_path)
55 | self.naive_path = os.path.join(root_path, 'submission-Naive2.csv')
56 |
57 | def evaluate(self):
58 | """
59 | Evaluate forecasts using M4 test dataset.
60 |
61 | :param forecast: Forecasts. Shape: timeseries, time.
62 | :return: sMAPE and OWA grouped by seasonal patterns.
63 | """
64 | grouped_owa = OrderedDict()
65 |
66 | naive2_forecasts = pd.read_csv(self.naive_path).values[:, 1:].astype(np.float32)
67 |         naive2_forecasts = np.array([v[~np.isnan(v)] for v in naive2_forecasts], dtype=object)  # ragged array
68 |
69 | model_mases = {}
70 | naive2_smapes = {}
71 | naive2_mases = {}
72 | grouped_smapes = {}
73 | grouped_mapes = {}
74 | for group_name in M4Meta.seasonal_patterns:
75 | file_name = self.file_path + group_name + "_forecast.csv"
76 | if os.path.exists(file_name):
77 | model_forecast = pd.read_csv(file_name).values
78 |
79 | naive2_forecast = group_values(naive2_forecasts, self.test_set.groups, group_name)
80 | target = group_values(self.test_set.values, self.test_set.groups, group_name)
81 | # all timeseries within group have same frequency
82 | frequency = self.training_set.frequencies[self.test_set.groups == group_name][0]
83 | insample = group_values(self.training_set.values, self.test_set.groups, group_name)
84 |
85 | model_mases[group_name] = np.mean([mase(forecast=model_forecast[i],
86 | insample=insample[i],
87 | outsample=target[i],
88 | frequency=frequency) for i in range(len(model_forecast))])
89 | naive2_mases[group_name] = np.mean([mase(forecast=naive2_forecast[i],
90 | insample=insample[i],
91 | outsample=target[i],
92 | frequency=frequency) for i in range(len(model_forecast))])
93 |
94 | naive2_smapes[group_name] = np.mean(smape_2(naive2_forecast, target))
95 | grouped_smapes[group_name] = np.mean(smape_2(forecast=model_forecast, target=target))
96 | grouped_mapes[group_name] = np.mean(mape(forecast=model_forecast, target=target))
97 |
98 | grouped_smapes = self.summarize_groups(grouped_smapes)
99 | grouped_mapes = self.summarize_groups(grouped_mapes)
100 | grouped_model_mases = self.summarize_groups(model_mases)
101 | grouped_naive2_smapes = self.summarize_groups(naive2_smapes)
102 | grouped_naive2_mases = self.summarize_groups(naive2_mases)
103 | for k in grouped_model_mases.keys():
104 | grouped_owa[k] = (grouped_model_mases[k] / grouped_naive2_mases[k] +
105 | grouped_smapes[k] / grouped_naive2_smapes[k]) / 2
106 |
107 | def round_all(d):
108 | return dict(map(lambda kv: (kv[0], np.round(kv[1], 3)), d.items()))
109 |
110 | return round_all(grouped_smapes), round_all(grouped_owa), round_all(grouped_mapes), round_all(
111 | grouped_model_mases)
112 |
113 | def summarize_groups(self, scores):
114 | """
115 | Re-group scores respecting M4 rules.
116 | :param scores: Scores per group.
117 | :return: Grouped scores.
118 | """
119 | scores_summary = OrderedDict()
120 |
121 | def group_count(group_name):
122 | return len(np.where(self.test_set.groups == group_name)[0])
123 |
124 | weighted_score = {}
125 | for g in ['Yearly', 'Quarterly', 'Monthly']:
126 | weighted_score[g] = scores[g] * group_count(g)
127 | scores_summary[g] = scores[g]
128 |
129 | others_score = 0
130 | others_count = 0
131 | for g in ['Weekly', 'Daily', 'Hourly']:
132 | others_score += scores[g] * group_count(g)
133 | others_count += group_count(g)
134 | weighted_score['Others'] = others_score
135 | scores_summary['Others'] = others_score / others_count
136 |
137 | average = np.sum(list(weighted_score.values())) / len(self.test_set.groups)
138 | scores_summary['Average'] = average
139 |
140 | return scores_summary
141 |
--------------------------------------------------------------------------------
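OWA (lines 103-105 above) averages the model's sMAPE and MASE after normalizing each by the Naive2 baseline's score. A worked example with made-up numbers:

```python
# Worked example of the OWA formula above; the numbers are made up for illustration.
model_smape, naive2_smape = 12.0, 15.0
model_mase, naive2_mase = 1.8, 2.0

owa = (model_mase / naive2_mase + model_smape / naive2_smape) / 2
print(owa)  # 0.85 -> below 1.0 means the model beats the Naive2 baseline
```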
/SimMTM_Forecasting/utils/masking.py:
--------------------------------------------------------------------------------
1 | import torch
2 |
3 |
4 | class TriangularCausalMask():
5 | def __init__(self, B, L, device="cpu"):
6 | mask_shape = [B, 1, L, L]
7 | with torch.no_grad():
8 | self._mask = torch.triu(torch.ones(mask_shape, dtype=torch.bool), diagonal=1).to(device)
9 |
10 | @property
11 | def mask(self):
12 | return self._mask
13 |
14 |
15 | class ProbMask():
16 | def __init__(self, B, H, L, index, scores, device="cpu"):
17 | _mask = torch.ones(L, scores.shape[-1], dtype=torch.bool).to(device).triu(1)
18 | _mask_ex = _mask[None, None, :].expand(B, H, L, scores.shape[-1])
19 | indicator = _mask_ex[torch.arange(B)[:, None, None],
20 | torch.arange(H)[None, :, None],
21 | index, :].to(device)
22 | self._mask = indicator.view(scores.shape).to(device)
23 |
24 | @property
25 | def mask(self):
26 | return self._mask
27 |
--------------------------------------------------------------------------------
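Usage sketch for `TriangularCausalMask` above: `mask` is True strictly above the diagonal, i.e. at future positions, so filling those scores with -inf before the softmax enforces causal attention; the singleton head dimension broadcasts over H:

```python
# Usage sketch: the causal mask marks future positions (True) to be filled with -inf
# before the attention softmax. Shapes follow the class above.
import torch
from utils.masking import TriangularCausalMask  # path assumes SimMTM_Forecasting/

B, H, L = 2, 4, 5
scores = torch.randn(B, H, L, L)                   # raw attention scores
m = TriangularCausalMask(B, L)
print(m.mask[0, 0].int())                          # upper triangle is 1 (masked)
scores = scores.masked_fill(m.mask, float('-inf')) # block attention to the future
attn = torch.softmax(scores, dim=-1)
```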
/SimMTM_Forecasting/utils/metrics.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 |
3 |
4 | def RSE(pred, true):
5 | return np.sqrt(np.sum((true - pred) ** 2)) / np.sqrt(np.sum((true - true.mean()) ** 2))
6 |
7 |
8 | def CORR(pred, true):
9 | u = ((true - true.mean(0)) * (pred - pred.mean(0))).sum(0)
10 | d = np.sqrt(((true - true.mean(0)) ** 2 * (pred - pred.mean(0)) ** 2).sum(0))
11 | return (u / d).mean(-1)
12 |
13 |
14 | def MAE(pred, true):
15 | return np.mean(np.abs(pred - true))
16 |
17 |
18 | def MSE(pred, true):
19 | return np.mean((pred - true) ** 2)
20 |
21 |
22 | def RMSE(pred, true):
23 | return np.sqrt(MSE(pred, true))
24 |
25 |
26 | def MAPE(pred, true):
27 | return np.mean(np.abs((pred - true) / true))
28 |
29 |
30 | def MSPE(pred, true):
31 | return np.mean(np.square((pred - true) / true))
32 |
33 |
34 | def metric(pred, true):
35 | mae = MAE(pred, true)
36 | mse = MSE(pred, true)
37 | rmse = RMSE(pred, true)
38 | mape = MAPE(pred, true)
39 | mspe = MSPE(pred, true)
40 |
41 | return mae, mse, rmse, mape, mspe
42 |
--------------------------------------------------------------------------------
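A tiny usage example for the `metric` helper above, with hand-checkable numbers:

```python
# Usage sketch for metric(); errors are 0.1, -0.1, 0.2, -0.2 by construction.
import numpy as np
from utils.metrics import metric  # path assumes SimMTM_Forecasting/

true = np.array([1.0, 2.0, 3.0, 4.0])
pred = np.array([1.1, 1.9, 3.2, 3.8])
mae, mse, rmse, mape, mspe = metric(pred, true)
print(f'MAE={mae:.3f} MSE={mse:.3f} RMSE={rmse:.3f}')  # MAE=0.150 MSE=0.025 RMSE=0.158
```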
/SimMTM_Forecasting/utils/timefeatures.py:
--------------------------------------------------------------------------------
1 | from typing import List
2 |
3 | import numpy as np
4 | import pandas as pd
5 | from pandas.tseries import offsets
6 | from pandas.tseries.frequencies import to_offset
7 |
8 |
9 | class TimeFeature:
10 | def __init__(self):
11 | pass
12 |
13 | def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
14 | pass
15 |
16 | def __repr__(self):
17 | return self.__class__.__name__ + "()"
18 |
19 |
20 | class SecondOfMinute(TimeFeature):
21 | """Minute of hour encoded as value between [-0.5, 0.5]"""
22 |
23 | def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
24 | return index.second / 59.0 - 0.5
25 |
26 |
27 | class MinuteOfHour(TimeFeature):
28 | """Minute of hour encoded as value between [-0.5, 0.5]"""
29 |
30 | def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
31 | return index.minute / 59.0 - 0.5
32 |
33 |
34 | class HourOfDay(TimeFeature):
35 | """Hour of day encoded as value between [-0.5, 0.5]"""
36 |
37 | def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
38 | return index.hour / 23.0 - 0.5
39 |
40 |
41 | class DayOfWeek(TimeFeature):
42 | """Hour of day encoded as value between [-0.5, 0.5]"""
43 |
44 | def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
45 | return index.dayofweek / 6.0 - 0.5
46 |
47 |
48 | class DayOfMonth(TimeFeature):
49 | """Day of month encoded as value between [-0.5, 0.5]"""
50 |
51 | def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
52 | return (index.day - 1) / 30.0 - 0.5
53 |
54 |
55 | class DayOfYear(TimeFeature):
56 | """Day of year encoded as value between [-0.5, 0.5]"""
57 |
58 | def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
59 | return (index.dayofyear - 1) / 365.0 - 0.5
60 |
61 |
62 | class MonthOfYear(TimeFeature):
63 | """Month of year encoded as value between [-0.5, 0.5]"""
64 |
65 | def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
66 | return (index.month - 1) / 11.0 - 0.5
67 |
68 |
69 | class WeekOfYear(TimeFeature):
70 | """Week of year encoded as value between [-0.5, 0.5]"""
71 |
72 | def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
73 | return (index.isocalendar().week - 1) / 52.0 - 0.5
74 |
75 |
76 | def time_features_from_frequency_str(freq_str: str) -> List[TimeFeature]:
77 | """
78 | Returns a list of time features that will be appropriate for the given frequency string.
79 | Parameters
80 | ----------
81 | freq_str
82 | Frequency string of the form [multiple][granularity] such as "12H", "5min", "1D" etc.
83 | """
84 |
85 | features_by_offsets = {
86 | offsets.YearEnd: [],
87 | offsets.QuarterEnd: [MonthOfYear],
88 | offsets.MonthEnd: [MonthOfYear],
89 | offsets.Week: [DayOfMonth, WeekOfYear],
90 | offsets.Day: [DayOfWeek, DayOfMonth, DayOfYear],
91 | offsets.BusinessDay: [DayOfWeek, DayOfMonth, DayOfYear],
92 | offsets.Hour: [HourOfDay, DayOfWeek, DayOfMonth, DayOfYear],
93 | offsets.Minute: [
94 | MinuteOfHour,
95 | HourOfDay,
96 | DayOfWeek,
97 | DayOfMonth,
98 | DayOfYear,
99 | ],
100 | offsets.Second: [
101 | SecondOfMinute,
102 | MinuteOfHour,
103 | HourOfDay,
104 | DayOfWeek,
105 | DayOfMonth,
106 | DayOfYear,
107 | ],
108 | }
109 |
110 | offset = to_offset(freq_str)
111 |
112 | for offset_type, feature_classes in features_by_offsets.items():
113 | if isinstance(offset, offset_type):
114 | return [cls() for cls in feature_classes]
115 |
116 | supported_freq_msg = f"""
117 | Unsupported frequency {freq_str}
118 | The following frequencies are supported:
119 | Y - yearly
120 | alias: A
121 | M - monthly
122 | W - weekly
123 | D - daily
124 | B - business days
125 | H - hourly
126 | T - minutely
127 | alias: min
128 | S - secondly
129 | """
130 | raise RuntimeError(supported_freq_msg)
131 |
132 |
133 | def time_features(dates, freq='h'):
134 | return np.vstack([feat(dates) for feat in time_features_from_frequency_str(freq)])
135 |
--------------------------------------------------------------------------------
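Usage sketch: for an hourly frequency the lookup above selects `[HourOfDay, DayOfWeek, DayOfMonth, DayOfYear]`, and `time_features` stacks them into a `(num_features, num_timestamps)` array scaled to [-0.5, 0.5]:

```python
# Usage sketch for time_features() above, run from inside SimMTM_Forecasting/.
import pandas as pd
from utils.timefeatures import time_features

dates = pd.date_range('2021-01-01', periods=24, freq='h')
feats = time_features(dates, freq='h')
print(feats.shape)  # (4, 24): HourOfDay, DayOfWeek, DayOfMonth, DayOfYear
```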
/figs/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/figs/.DS_Store
--------------------------------------------------------------------------------
/figs/mainresult.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/figs/mainresult.png
--------------------------------------------------------------------------------
/figs/overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thuml/SimMTM/169513bef74fb676e48d98a0e30f8823793f691c/figs/overview.png
--------------------------------------------------------------------------------