├── Eval_Syn_Data.npy ├── Eval_Syn_Representations.npy ├── Micrseismic_Timeseries_Centers ├── Micrseismic_Timeseries_Finetuning_phase ├── Micrseismic_Timeseries_Pretraining_phase ├── README.md ├── colormap.mat ├── datasets └── Micrseismic_Timeseries │ └── test ├── datautils.py ├── evaluation.py ├── models ├── Metrics.py ├── __init__.py └── encoder.py ├── requirements.txt ├── results ├── ClusterAsMicroseismic.jpg ├── ClusterAsNoise.jpg ├── Framework.jpg ├── comparison_results.jpg ├── reprs.png └── syn_reprs.png ├── run.py ├── tools.py └── tscc.py /Eval_Syn_Data.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yangzhen-cdut/Unsupervised-Clustering/6eba15ce5993c1713fd75bb96fee06be0536d226/Eval_Syn_Data.npy -------------------------------------------------------------------------------- /Eval_Syn_Representations.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yangzhen-cdut/Unsupervised-Clustering/6eba15ce5993c1713fd75bb96fee06be0536d226/Eval_Syn_Representations.npy -------------------------------------------------------------------------------- /Micrseismic_Timeseries_Centers: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yangzhen-cdut/Unsupervised-Clustering/6eba15ce5993c1713fd75bb96fee06be0536d226/Micrseismic_Timeseries_Centers -------------------------------------------------------------------------------- /Micrseismic_Timeseries_Finetuning_phase: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yangzhen-cdut/Unsupervised-Clustering/6eba15ce5993c1713fd75bb96fee06be0536d226/Micrseismic_Timeseries_Finetuning_phase -------------------------------------------------------------------------------- /Micrseismic_Timeseries_Pretraining_phase: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/yangzhen-cdut/Unsupervised-Clustering/6eba15ce5993c1713fd75bb96fee06be0536d226/Micrseismic_Timeseries_Pretraining_phase -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Time Series Contrastive Clustering (TSCC) 2 | 3 | ## Cite This Paper: 4 | Z. Yang, H. Li, X. Tuo, L. Li and J. Wen, "Unsupervised Clustering of Microseismic Signals Using a Contrastive Learning Model," in IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1-12, 2023, Art no. 5903212, doi: 10.1109/TGRS.2023.3240728. 5 | 6 | Link: https://doi.org/10.1109/TGRS.2023.3240728 7 | 8 | ## Getting Started 9 | 10 | Clone the project to your local machine: 11 | 12 | ``` 13 | git clone https://github.com/yangzhen-cdut/Unsupervised-Clustering.git 14 | cd Unsupervised-Clustering 15 | ``` 16 | 17 | ## Requirements 18 | The recommended requirements for TSCC are as follows: 19 | * Python 3.7 20 | * torch==1.8.1 21 | * scipy==1.7.3 22 | * numpy==1.21.6 23 | * pandas==1.3.5 24 | * scikit_learn==0.24.2 25 | * matplotlib==3.5.2 26 | * bottleneck==1.3.4 27 | * seaborn==0.11.2 28 | 29 | Install the dependencies with: 30 | ```bash 31 | pip install -r requirements.txt 32 | ``` 33 | 34 | Note that CUDA must be installed to run the code. 35 | 36 | ## Usage 37 | 38 | To train TSCC on a microseismic dataset, run the following command: 39 | 40 | ```run 41 | python run.py --pretraining_epoch <pretraining_epoch> --batch_size <batch_size> --MaxIter <MaxIter> --repr_dims <repr_dims> 42 | ``` 43 | 44 | After training, the encoder from the pre-training phase, the encoder from the fine-tuning phase, and the cluster centers can be found in `./_Pretraining_phase`, `./_Finetuning_phase`, and `./_Centers`, respectively. 
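The three artifact names above appear without the dataset name. In this repository the saved files carry it as a prefix (for example `Micrseismic_Timeseries_Finetuning_phase`, which `evaluation.py` loads). A minimal sketch of how those paths could be assembled follows; `checkpoint_paths` and its dictionary keys are hypothetical names used only for illustration and are not part of the repository.

```python
from pathlib import Path


def checkpoint_paths(dataset_name, root="."):
    """Build the three artifact paths for one dataset (hypothetical helper)."""
    root = Path(root)
    return {
        # Encoder saved at the end of the pre-training phase.
        "pretraining_encoder": root / f"{dataset_name}_Pretraining_phase",
        # Encoder saved at the end of the fine-tuning phase.
        "finetuning_encoder": root / f"{dataset_name}_Finetuning_phase",
        # Cluster centers saved alongside the encoders.
        "cluster_centers": root / f"{dataset_name}_Centers",
    }


paths = checkpoint_paths("Micrseismic_Timeseries")
for role, path in paths.items():
    # Report whether each artifact is present in the working directory.
    print(f"{role}: {path} (exists: {path.exists()})")
```

Running this from the repository root is a quick way to see which of the three files a training run has already produced.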
45 | 46 | To evaluate TSCC on a microseismic dataset, run the following command: 47 | 48 | ```evaluate 49 | python evaluation.py --pretraining_epoch <pretraining_epoch> --batch_size <batch_size> --MaxIter1 <MaxIter1> --repr_dims <repr_dims> 50 | ``` 51 | 52 | Two examples are given in `evaluation.py`: `eval_with_real_data` and `eval_with_synthetic_data`. You can call these two functions directly; when called with `save=True`, the output representations are written to `./Eval_Representations.npy` and `./Eval_Syn_Representations.npy`, respectively. 53 | 54 | 55 | ## Results 56 | 57 | ### The architecture of TSCC used in our study. 58 | 59 |
60 | 61 |
62 | 63 | ### Clustering performance comparison. 64 | 65 |
66 | 67 |
68 | 69 | ### Visualization of learned latent representations. 70 | 71 | Latent representations of synthetic waveforms 72 | 73 |
74 | 75 |
76 | 77 | Latent representations of real microseismic waveforms 78 | 79 |
80 | 81 |
82 | 83 | ### Representative cluster distribution 84 | 85 | Typical microseismic waveforms 86 | 87 |
88 | 89 |
90 | 91 | Typical noise waveforms 92 | 93 |
94 | 95 |
96 | 97 | ### Classification performance comparison of supervised classifier using raw time series and the features (R) generated by the TSCC. 98 |

| Methods | Time series ACC (%) | Time series NMI (%) | Time series AUPRC (%) | Feature R ACC (%) | Feature R NMI (%) | Feature R AUPRC (%) |
|---|---|---|---|---|---|---|
| Linear | 71.59 | 15.38 | 66.47 | 99.11 | 92.63 | 98.71 |
| KNN | 90.58 | 55.55 | 87.65 | 98.13 | 86.62 | 97.36 |
| SVM | 97.65 | 83.91 | 99.81 | 98.94 | 91.56 | 99.91 |
| TSCC | 98.07 | 86.26 | 97.15 | -- | -- | -- |
150 | -------------------------------------------------------------------------------- /colormap.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yangzhen-cdut/Unsupervised-Clustering/6eba15ce5993c1713fd75bb96fee06be0536d226/colormap.mat -------------------------------------------------------------------------------- /datasets/Micrseismic_Timeseries/test: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /datautils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | from torch.utils.data import DataLoader 4 | 5 | 6 | class Mydataset(torch.utils.data.Dataset): 7 | def __init__(self, x, y): 8 | self.x = x 9 | self.y = y 10 | self.idx = list() 11 | for item in x: 12 | self.idx.append(item) 13 | pass 14 | 15 | def __getitem__(self, index): 16 | input_data = self.idx[index] 17 | target = self.y[index] 18 | return input_data, target 19 | 20 | def __len__(self): 21 | return len(self.idx) 22 | 23 | 24 | def load_data(dataset_name, data_size, batch_size): 25 | x = np.load('./datasets/' + f'{dataset_name}/' + f'cluster_x_{data_size}.npy') 26 | y = np.load('./datasets/' + f'{dataset_name}/' + f'cluster_y_{data_size}.npy') 27 | dataset = Mydataset(x, y) 28 | print('n_cluster:', np.unique(y)) 29 | 30 | torch.manual_seed(123) 31 | data_loader = torch.utils.data.DataLoader( 32 | dataset=dataset, 33 | batch_size=batch_size, 34 | shuffle=True, 35 | num_workers=0) 36 | return data_loader 37 | -------------------------------------------------------------------------------- /evaluation.py: -------------------------------------------------------------------------------- 1 | import os 2 | import math 3 | import torch 4 | import datautils 5 | import argparse 6 | import numpy as np 7 | import matplotlib.pyplot as plt 8 | from 
torch.autograd import Variable 9 | 10 | from tscc import TSCCModel 11 | 12 | parser = argparse.ArgumentParser() 13 | parser.add_argument('--dataset_name', default='Micrseismic_Timeseries', type=str, help='The dataset name') 14 | parser.add_argument('--dataset_size', default=4928, type=int, help='The size of dataset') 15 | parser.add_argument('--dim', default=1, type=int, help='The dimension of input') 16 | parser.add_argument('--num_cluster', type=int, default=2, help='The number of cluster') 17 | parser.add_argument('--batch_size', type=int, default=64, help='The batch size') 18 | parser.add_argument('--repr_dims', type=int, default=32, help='The representation dimension') 19 | parser.add_argument('--lr', type=float, default=0.001, help='The learning rate of pre-training phase') 20 | parser.add_argument('--pretraining_epoch', type=int, default=25, help='The epoch of pre-training phase') 21 | parser.add_argument('--MaxIter1', type=int, default=25, help='The epoch of fine-tuning phase') 22 | args = parser.parse_args() 23 | 24 | print("Arguments:", str(args)) 25 | print('Loading data... 
\n', end='') 26 | 27 | # Load data 28 | data_loader = datautils.load_data(args.dataset_name, args.dataset_size, args.batch_size) 29 | 30 | config = dict(dataset_size=args.dataset_size, 31 | dataset_name=args.dataset_name, 32 | pretraining_epoch=args.pretraining_epoch, 33 | batch_size=args.batch_size, 34 | MaxIter=args.MaxIter1, 35 | lr=args.lr, 36 | output_dims=args.repr_dims) 37 | 38 | model = TSCCModel(data_loader, n_cluster=args.num_cluster, input_dims=args.dim, **config) 39 | 40 | model.encoder = torch.load('Micrseismic_Timeseries_Finetuning_phase') 41 | print('finish initial') 42 | model.encoder.eval() 43 | 44 | 45 | def eval_with_real_data(save=False): 46 | data = np.zeros([args.dataset_size, 500, 1]) 47 | reps = np.zeros([args.dataset_size, 500, args.repr_dims]) 48 | 49 | ii = 0 50 | for x, target in data_loader: 51 | x = Variable(x).cuda() 52 | u = model.encoder(x) 53 | u = u.cpu() 54 | reps[ii * args.batch_size:(ii + 1) * args.batch_size, :, :] = u.data.numpy() 55 | data[ii * args.batch_size:(ii + 1) * args.batch_size, :, :] = x.cpu().numpy() 56 | ii = ii + 1 57 | if save: 58 | np.save(os.path.join(os.getcwd(), 'Eval_Representations.npy'), reps) 59 | np.save(os.path.join(os.getcwd(), 'Eval_Data.npy'), data) 60 | print('finish') 61 | 62 | 63 | def eval_with_synthetic_data(save=False): 64 | # Ricker with White Gaussian Noise, SNR = -5dB 65 | n = 500 66 | wt = Ricker(n) 67 | np.random.seed(123) 68 | noise = np.random.normal(loc=0, scale=1.0, size=(len(wt),)) 69 | nwt = add_noise(wt, noise, -5) 70 | nwt = nwt / np.max(abs(nwt)) # normalize to unit peak amplitude 71 | 72 | plt.figure(figsize=(8, 6)) 73 | plt.subplot(211) 74 | plt.plot(nwt, c='#9B3A4D', linewidth=1.5) 75 | plt.tick_params(labelsize=14) 76 | plt.margins(x=0) 77 | plt.title('Noisy Ricker with SNR=-5dB', fontsize=20, family='Calibri') 78 | plt.xlabel('Timestamp', fontsize=20, family='Calibri') 79 | plt.ylabel('Amplitude', fontsize=20, family='Calibri') 80 | 81 | # Synthetic Noise 82 | plt.subplot(212) 83 | t = np.linspace(0, n-1, n) 84 | lowfre_noise = 
np.sin((t) * np.pi / 100) 85 | np.random.seed(123) 86 | random_noise = np.random.normal(loc=0, scale=1.0, size=(n,)) 87 | noise = random_noise + lowfre_noise 88 | noise = noise / np.max(abs(noise)) 89 | plt.plot(noise, c='#70A0AC', linewidth=1.5) 90 | plt.tick_params(labelsize=14) 91 | plt.margins(x=0) 92 | plt.title('Synthetic Noise', fontsize=20, family='Calibri') 93 | plt.xlabel('Timestamp', fontsize=20, family='Calibri') 94 | plt.ylabel('Amplitude', fontsize=20, family='Calibri') 95 | plt.tight_layout() 96 | plt.show() 97 | 98 | syndata = np.zeros((2, n, 1)) 99 | syndata[0, :, 0] = nwt 100 | syndata[1, :, 0] = noise 101 | 102 | syndata_in = Variable(torch.tensor(syndata, dtype=torch.float32)).cuda() 103 | 104 | syn_reps = model.encoder(syndata_in) 105 | syn_reps = syn_reps.detach().cpu().numpy() 106 | if save: 107 | np.save(os.path.join(os.getcwd(), 'Eval_Syn_Representations.npy'), syn_reps) 108 | np.save(os.path.join(os.getcwd(), 'Eval_Syn_Data.npy'), np.squeeze(syndata, axis=2)) 109 | print('finish') 110 | 111 | 112 | def Ricker(n, f0=20, dt=0.001): 113 | wt = np.zeros(n) 114 | i = 0 115 | for k in range(int(-n / 2), int(n / 2)): 116 | wt[i] = (1 - 2.0 * (math.pi * f0 * k * dt) ** 2) * math.exp(-1 * (math.pi * f0 * k * dt) ** 2) 117 | i += 1 118 | 119 | return wt 120 | 121 | 122 | def add_noise(x, noise, SNR): 123 | """ 124 | :param x: pure signal (np.array) 125 | :param noise: noise = random.normal(0,1) 126 | :param SNR: signal-to-noise ratio in dB 127 | :return: noisy_signal 128 | """ 129 | try: 130 | x = np.array(x) 131 | except Exception: 132 | pass 133 | 134 | N = len(x.tolist()) 135 | noise = noise - np.mean(noise) 136 | signal_power = 1.0 / N * sum(x ** 2) 137 | noise_variance = signal_power / (math.pow(10, SNR / 10)) 138 | NOISE = math.sqrt(noise_variance) / np.std(noise) * noise 139 | noisy_signal = x + NOISE 140 | return noisy_signal 141 | 142 | 143 | if __name__ == '__main__': 144 | eval_with_synthetic_data() 
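The scaling performed by `add_noise` above can be checked without NumPy. The sketch below restates the Ricker formula and the SNR rescaling in plain Python and then measures the resulting SNR. The names `ricker`, `scale_noise_to_snr` and `measured_snr_db` are illustrative and do not exist in this repository, and the noise comes from Python's `random` module, so the samples differ from the NumPy ones even with the same seed.

```python
import math
import random


def ricker(n, f0=20.0, dt=0.001):
    """Ricker wavelet with n samples centred on t = 0.

    For even n this covers the same k range as Ricker() above.
    """
    half = n // 2
    samples = []
    for k in range(-half, n - half):
        a = (math.pi * f0 * k * dt) ** 2
        samples.append((1.0 - 2.0 * a) * math.exp(-a))
    return samples


def scale_noise_to_snr(signal, noise, snr_db):
    """Remove the noise mean, then rescale it to hit the requested SNR in dB."""
    mean = sum(noise) / len(noise)
    centred = [v - mean for v in noise]
    signal_power = sum(v * v for v in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    current_noise_power = sum(v * v for v in centred) / len(centred)
    gain = math.sqrt(target_noise_power / current_noise_power)
    return [gain * v for v in centred]


def measured_snr_db(signal, noise):
    """Recompute the SNR in dB from the mean powers of signal and noise."""
    p_signal = sum(v * v for v in signal) / len(signal)
    p_noise = sum(v * v for v in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)


rng = random.Random(123)
wavelet = ricker(500)
raw_noise = [rng.gauss(0.0, 1.0) for _ in wavelet]
scaled = scale_noise_to_snr(wavelet, raw_noise, snr_db=-5)
noisy = [s + v for s, v in zip(wavelet, scaled)]

print("wavelet peak:", wavelet[250])
print("measured SNR (dB):", round(measured_snr_db(wavelet, scaled), 6))
```

Because the noise is zero-mean after centring, its variance equals its mean power, so the measured value should match the requested -5 dB up to floating-point rounding.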
-------------------------------------------------------------------------------- /models/Metrics.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from sklearn.metrics import normalized_mutual_info_score 3 | 4 | nmi = normalized_mutual_info_score 5 | def acc(y_true, y_pred, num_cluster): 6 | """ 7 | Calculate clustering accuracy. Require scikit-learn installed 8 | 9 | # Arguments 10 | y_true: true labels, numpy.array with shape `(n_samples,)` 11 | y_pred: predicted labels, numpy.array with shape `(n_samples,)` 12 | num_cluster: number of cluster 13 | 14 | # Return 15 | accuracy, in [0,1] 16 | """ 17 | y_true = y_true.astype(np.int64) 18 | y_pred = y_pred.astype(np.int64) 19 | assert y_pred.size == y_true.size 20 | 21 | w = np.zeros((num_cluster, num_cluster)) 22 | for i in range(y_pred.size): 23 | w[y_pred[i], y_true[i]] += 1 24 | from scipy.optimize import linear_sum_assignment 25 | 26 | ind = linear_sum_assignment(w.max() - w) 27 | accuracy = 0.0 28 | for i in ind[0]: 29 | accuracy = accuracy + w[i, ind[1][i]] 30 | return accuracy / y_pred.size 31 | -------------------------------------------------------------------------------- /models/__init__.py: -------------------------------------------------------------------------------- 1 | from .encoder import TSCCEncoder 2 | -------------------------------------------------------------------------------- /models/encoder.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | from torch import nn 4 | import torch.nn.functional as F 5 | 6 | 7 | class TSCCEncoder(nn.Module): 8 | def __init__(self, input_dims, output_dims, hidden_dims=64, depth=10, mask_mode=0): 9 | super().__init__() 10 | self.input_dims = input_dims 11 | self.output_dims = output_dims 12 | self.hidden_dims = hidden_dims 13 | self.mask_mode = mask_mode 14 | self.linear_projector = nn.Linear(input_dims, hidden_dims) 15 | 
self.feature_extractor = DilatedConvModule(hidden_dims, [hidden_dims] * depth + [output_dims], kernel_size=3) 16 | self.repr_dropout = nn.Dropout(p=0.2) 17 | 18 | def forward(self, x): 19 | nan_mask = ~x.isnan().any(axis=-1) 20 | x[~nan_mask] = 0 21 | x = self.linear_projector(x) 22 | 23 | if self.training: 24 | mask = self.mask_mode 25 | else: 26 | mask = 1 27 | 28 | if mask == 0: 29 | mask = torch.from_numpy(np.random.binomial(1, 0.5, size=(x.size(0), x.size(1)))).to(torch.bool).to(x.device) 30 | elif mask == 1: 31 | mask = x.new_full((x.size(0), x.size(1)), True, dtype=torch.bool) 32 | 33 | mask &= nan_mask 34 | x[~mask] = 0 35 | 36 | x = x.transpose(1, 2) 37 | x = self.repr_dropout(self.feature_extractor(x)) 38 | x = x.transpose(1, 2) 39 | 40 | return x 41 | 42 | 43 | class DilatedConv(nn.Module): 44 | def __init__(self, in_channels, out_channels, kernel_size, dilation=1, groups=1): 45 | super().__init__() 46 | self.receptive_field = (kernel_size - 1) * dilation + 1 47 | padding = self.receptive_field // 2 48 | self.conv = nn.Conv1d( 49 | in_channels, out_channels, kernel_size, 50 | padding=padding, 51 | dilation=dilation, 52 | groups=groups 53 | ) 54 | self.remove = 1 if self.receptive_field % 2 == 0 else 0 55 | 56 | def forward(self, x): 57 | out = self.conv(x) 58 | if self.remove > 0: 59 | out = out[:, :, : -self.remove] 60 | return out 61 | 62 | 63 | class DilatedConvBlock(nn.Module): 64 | def __init__(self, in_channels, out_channels, kernel_size, dilation, final=False): 65 | super().__init__() 66 | self.DilatedConv1 = DilatedConv(in_channels, out_channels, kernel_size, dilation=dilation) 67 | self.DilatedConv2 = DilatedConv(out_channels, out_channels, kernel_size, dilation=dilation) 68 | self.ResProjector = nn.Conv1d(in_channels, out_channels, 1) if in_channels != out_channels or final else None 69 | 70 | def forward(self, x): 71 | residual = x if self.ResProjector is None else self.ResProjector(x) 72 | x = F.gelu(x) 73 | x = self.DilatedConv1(x) 74 | x = 
F.gelu(x) 75 | x = self.DilatedConv2(x) 76 | return x + residual 77 | 78 | 79 | class DilatedConvModule(nn.Module): 80 | def __init__(self, in_channels, channels, kernel_size): 81 | super().__init__() 82 | self.net = nn.Sequential(*[ 83 | DilatedConvBlock( 84 | channels[i - 1] if i > 0 else in_channels, 85 | channels[i], 86 | kernel_size=kernel_size, 87 | dilation=2 ** i, 88 | final=(i == len(channels) - 1) 89 | ) 90 | for i in range(len(channels)) 91 | ]) 92 | 93 | def forward(self, x): 94 | return self.net(x) 95 | 96 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | seaborn==0.11.2 2 | matplotlib==3.5.2 3 | Bottleneck==1.3.4 4 | torch==1.8.1 5 | scipy==1.7.3 6 | numpy==1.21.6 7 | pandas==1.3.5 8 | scikit_learn==0.24.2 9 | -------------------------------------------------------------------------------- /results/ClusterAsMicroseismic.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yangzhen-cdut/Unsupervised-Clustering/6eba15ce5993c1713fd75bb96fee06be0536d226/results/ClusterAsMicroseismic.jpg -------------------------------------------------------------------------------- /results/ClusterAsNoise.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yangzhen-cdut/Unsupervised-Clustering/6eba15ce5993c1713fd75bb96fee06be0536d226/results/ClusterAsNoise.jpg -------------------------------------------------------------------------------- /results/Framework.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yangzhen-cdut/Unsupervised-Clustering/6eba15ce5993c1713fd75bb96fee06be0536d226/results/Framework.jpg -------------------------------------------------------------------------------- /results/comparison_results.jpg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/yangzhen-cdut/Unsupervised-Clustering/6eba15ce5993c1713fd75bb96fee06be0536d226/results/comparison_results.jpg -------------------------------------------------------------------------------- /results/reprs.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yangzhen-cdut/Unsupervised-Clustering/6eba15ce5993c1713fd75bb96fee06be0536d226/results/reprs.png -------------------------------------------------------------------------------- /results/syn_reprs.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yangzhen-cdut/Unsupervised-Clustering/6eba15ce5993c1713fd75bb96fee06be0536d226/results/syn_reprs.png -------------------------------------------------------------------------------- /run.py: -------------------------------------------------------------------------------- 1 | import time 2 | import datetime 3 | import argparse 4 | import datautils 5 | from tscc import TSCCModel 6 | 7 | parser = argparse.ArgumentParser() 8 | parser.add_argument('--dataset_name', default='Micrseismic_Timeseries', type=str, help='The dataset name') 9 | parser.add_argument('--dataset_size', default=4928, type=int, help='The size of dataset') 10 | parser.add_argument('--dim', default=1, type=int, help='The dimension of input') 11 | parser.add_argument('--num_cluster', type=int, default=2, help='The number of cluster') 12 | parser.add_argument('--batch_size', type=int, default=64, help='The batch size') 13 | parser.add_argument('--repr_dims', type=int, default=32, help='The representation dimension') 14 | parser.add_argument('--lr', type=float, default=0.001, help='The learning rate of pre-training phase') 15 | parser.add_argument('--pretraining_epoch', type=int, default=25, help='The epoch of pre-training phase') 16 | 
parser.add_argument('--MaxIter', type=int, default=25, help='The epoch of fine-tuning phase') 17 | args = parser.parse_args() 18 | 19 | print("Arguments:", str(args)) 20 | print('Loading data... \n', end='') 21 | 22 | # Load data 23 | data_loader = datautils.load_data(args.dataset_name, args.dataset_size, args.batch_size) 24 | 25 | config = dict(dataset_size=args.dataset_size, 26 | dataset_name=args.dataset_name, 27 | pretraining_epoch=args.pretraining_epoch, 28 | batch_size=args.batch_size, 29 | MaxIter=args.MaxIter, 30 | lr=args.lr, 31 | output_dims=args.repr_dims) 32 | 33 | model = TSCCModel(data_loader, n_cluster=args.num_cluster, input_dims=args.dim, **config) 34 | 35 | t = time.time() 36 | 37 | if args.pretraining_epoch != 0: 38 | model.Pretraining() 39 | if args.MaxIter != 0: 40 | model.Finetuning() 41 | 42 | t = time.time() - t 43 | 44 | print(f"\nTraining time: {datetime.timedelta(seconds=t)}\n") 45 | -------------------------------------------------------------------------------- /tools.py: -------------------------------------------------------------------------------- 1 | import os 2 | import torch 3 | import numpy as np 4 | import matplotlib.pyplot as plt 5 | from sklearn.manifold import TSNE 6 | from sklearn.cluster import KMeans 7 | from models.Metrics import nmi, acc 8 | 9 | import seaborn as sns 10 | sns.set() 11 | 12 | os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE" 13 | 14 | 15 | def plotter(S, y, name, loc, save_fig=False): 16 | ''' function to visualize the outputs of t-SNE ''' 17 | 18 | legend_properties = {'family': 'Calibri', 'size': '16'} 19 | target_names = ['Microseismic', 'Noise'] 20 | colors = ['#9B3A4D', '#70A0AC'] 21 | lw = 0.6 22 | for color, i, target_name in zip(colors, [0, 1], target_names): 23 | plt.scatter(S[y == i, 0], S[y == i, 1], c=color, alpha=0.5, lw=lw, s=40, label=target_name) 24 | plt.legend(loc=loc, shadow=True, scatterpoints=1, prop=legend_properties, facecolor='white', frameon=False) 25 | plt.tick_params(labelsize=12) 
26 | plt.title(name, fontsize=20, family='Calibri') 27 | 28 | 29 | def comparison_clustering(save_fig=False): 30 | filename = './features/' 31 | f = plt.figure(figsize=(18, 12)) 32 | 33 | # K-means on Time Series 34 | ax = plt.subplot(231) 35 | x = np.load('./dataset/Time Series/cluster_x_4928.npy') 36 | x = np.squeeze(x, axis=2) 37 | y = np.load('./dataset/Time Series/cluster_y_4928.npy') 38 | kmeans = KMeans(n_clusters=2, random_state=0).fit(x) 39 | label_pred = kmeans.labels_ 40 | ACC = acc(y, label_pred, 2) 41 | NMI = nmi(y, label_pred) 42 | print('ACC', ACC) 43 | print('NMI', NMI) 44 | redu = TSNE(n_components=2, random_state=50).fit_transform(x) 45 | plotter(redu, y, name='K-means', loc='upper left') 46 | ax.axis('off') 47 | ax.axis('tight') 48 | 49 | # DEC 50 | ax = plt.subplot(232) 51 | enc = np.load(filename + 'DEC.npy') 52 | y = np.load(filename + 'DEC_y_true.npy') 53 | redu = TSNE(n_components=2, random_state=123).fit_transform(enc) 54 | plotter(redu, y, name='DEC', loc='upper right') 55 | ax.axis('off') 56 | ax.axis('tight') 57 | 58 | # DCA 59 | ax = plt.subplot(233) 60 | enc = np.load(filename + 'DCA.npy') 61 | y = np.load(filename + 'DCA_y_true.npy') 62 | redu = TSNE(n_components=2, random_state=50).fit_transform(enc) 63 | plotter(redu, y, name='DCA', loc='upper right') 64 | ax.axis('off') 65 | ax.axis('tight') 66 | 67 | # DCSS 68 | ax = plt.subplot(234) 69 | enc = np.load(filename + 'DCSS.npy') 70 | y = np.load(filename + 'DCSS_y_true.npy') 71 | redu = TSNE(n_components=2, random_state=64).fit_transform(enc) 72 | plotter(redu, y, name='DCSS', loc='upper left') 73 | ax.axis('off') 74 | ax.axis('tight') 75 | 76 | # TSCC Pre-training 77 | ax = plt.subplot(235) 78 | enc = np.load(filename + 'End_Pretraining_u.npy') 79 | y = np.load(filename + 'End_Pretraining_y_true.npy') 80 | redu = TSNE(n_components=2, random_state=88).fit_transform(enc) 81 | plotter(redu, y, name='TSCC Pre-training', loc='upper left') 82 | ax.axis('off') 83 | ax.axis('tight') 84 | 85 
| # TSCC Fine-tuning 86 | ax = plt.subplot(236) 87 | enc = np.load(filename + 'End_Finetuning_u.npy') 88 | y = np.load(filename + 'End_Finetuning_y_true.npy') 89 | redu = TSNE(n_components=2, random_state=0).fit_transform(enc) 90 | plotter(redu, y, name='TSCC Fine-tuning', loc='upper right') 91 | ax.axis('off') 92 | ax.axis('tight') 93 | 94 | plt.tight_layout() 95 | plt.show() 96 | if save_fig: 97 | f.savefig('./results/comparison_results.png', dpi=600) 98 | 99 | 100 | def representation_visualization(save_fig=False): 101 | data = np.load('./Eval_Data.npy') 102 | data = np.squeeze(data, axis=2) 103 | reps = np.load('./Eval_Representations.npy') 104 | p1 = 2290 105 | p2 = 1171 106 | p3 = 3279 107 | p4 = 387 108 | f = plt.figure(figsize=(15, 10)) 109 | plt.subplot(421) 110 | plt.plot(data[p1, :]/max(abs(data[p1, :])), c='#9B3A4D', linewidth=2) 111 | plt.tick_params(labelsize=14) 112 | plt.margins(x=0) 113 | plt.xlabel('Timestamp', fontsize=20, family='Calibri') 114 | plt.ylabel('Amplitude', fontsize=20, family='Calibri') 115 | plt.text(390, 0.65, 'Microseismic', fontsize=18, family='Calibri') 116 | 117 | plt.subplot(422) 118 | plt.plot(data[p2, :]/max(abs(data[p2, :])), c='#9B3A4D', linewidth=2) 119 | plt.tick_params(labelsize=14) 120 | plt.margins(x=0) 121 | plt.xlabel('Timestamp', fontsize=20, family='Calibri') 122 | plt.ylabel('Amplitude', fontsize=20, family='Calibri') 123 | plt.text(390, 0.65, 'Microseismic', fontsize=18, family='Calibri') 124 | 125 | plt.subplot(425) 126 | plt.plot(data[p3, :]/max(abs(data[p3, :])), c='#70A0AC', linewidth=2) 127 | plt.tick_params(labelsize=14) 128 | plt.margins(x=0) 129 | plt.xlabel('Timestamp', fontsize=20, family='Calibri') 130 | plt.ylabel('Amplitude', fontsize=20, family='Calibri') 131 | plt.text(450, 0.65, 'Noise', fontsize=18, family='Calibri') 132 | 133 | plt.subplot(426) 134 | plt.plot(data[p4, :]/max(abs(data[p4, :])), c='#70A0AC', linewidth=2) 135 | plt.tick_params(labelsize=14) 136 | plt.margins(x=0) 137 | 
plt.xlabel('Timestamp', fontsize=20, family='Calibri') 138 | plt.ylabel('Amplitude', fontsize=20, family='Calibri') 139 | plt.text(450, 0.65, 'Noise', fontsize=18, family='Calibri') 140 | 141 | # Customize Colormap 142 | from matplotlib.colors import LinearSegmentedColormap 143 | import scipy.io as scio 144 | colormap = scio.loadmat('./colormap.mat')['mycamp'] 145 | moreland_map = LinearSegmentedColormap.from_list('cos', colormap) 146 | 147 | plt.subplot(423) 148 | ax = sns.heatmap(reps[p1, :, :].T, cmap=moreland_map, vmin=-2, vmax=2, xticklabels=False, yticklabels=False, cbar=False) 149 | ax.axis('on') 150 | plt.xticks(ticks=[0, 100, 200, 300, 400, 500], labels=[0, 100, 200, 300, 400, 500], rotation=0, fontsize=14) 151 | plt.yticks(ticks=[0, 16], labels=[32, 16], rotation=0, fontsize=14) 152 | # cbar = ax.collections[0].colorbar 153 | # cbar.ax.tick_params(labelsize=14) 154 | plt.xlabel('Timestamp', fontsize=20, family='Calibri') 155 | plt.ylabel('Repr dims', fontsize=20, family='Calibri') 156 | plt.tight_layout() 157 | 158 | plt.subplot(424) 159 | ax = sns.heatmap(reps[p2, :, :].T, cmap=moreland_map, vmin=-2, vmax=2, xticklabels=False, yticklabels=False, cbar=False) 160 | ax.axis('on') 161 | plt.xticks(ticks=[0, 100, 200, 300, 400, 500], labels=[0, 100, 200, 300, 400, 500], rotation=0, fontsize=14) 162 | plt.yticks(ticks=[0, 16], labels=[32, 16], rotation=0, fontsize=14) 163 | # cbar = ax.collections[0].colorbar 164 | # cbar.ax.tick_params(labelsize=14) 165 | plt.xlabel('Timestamp', fontsize=20, family='Calibri') 166 | plt.ylabel('Repr dims', fontsize=20, family='Calibri') 167 | plt.tight_layout() 168 | 169 | plt.subplot(427) 170 | ax = sns.heatmap(reps[p3, :, :].T, cmap=moreland_map, vmin=-2, vmax=2, xticklabels=False, yticklabels=False, cbar=False) 171 | ax.axis('on') 172 | plt.xticks(ticks=[0, 100, 200, 300, 400, 500], labels=[0, 100, 200, 300, 400, 500], rotation=0, fontsize=14) 173 | plt.yticks(ticks=[0, 16], labels=[32, 16], rotation=0, fontsize=14) 174 | # 
cbar = ax.collections[0].colorbar 175 | # cbar.ax.tick_params(labelsize=14) 176 | plt.xlabel('Timestamp', fontsize=20, family='Calibri') 177 | plt.ylabel('Repr dims', fontsize=20, family='Calibri') 178 | plt.tight_layout() 179 | 180 | plt.subplot(428) 181 | ax = sns.heatmap(reps[p4, :, :].T, cmap=moreland_map, vmin=-2, vmax=2, xticklabels=False, yticklabels=False, cbar=False) 182 | ax.axis('on') 183 | plt.xticks(ticks=[0, 100, 200, 300, 400, 500], labels=[0, 100, 200, 300, 400, 500], rotation=0, fontsize=14) 184 | plt.yticks(ticks=[0, 16], labels=[32, 16], rotation=0, fontsize=14) 185 | # cbar = ax.collections[0].colorbar 186 | # cbar.ax.tick_params(labelsize=14) 187 | plt.xlabel('Timestamp', fontsize=20, family='Calibri') 188 | plt.ylabel('Repr dims', fontsize=20, family='Calibri') 189 | plt.tight_layout() 190 | if save_fig: 191 | plt.savefig('./results/reprs.png', dpi=600) 192 | plt.show() 193 | 194 | 195 | def syn_representation_visualization(save_fig=False): 196 | data = np.load('./Eval_Syn_Data.npy') 197 | reps = np.load('./Eval_Syn_Representations.npy') 198 | 199 | f = plt.figure(figsize=(7.5, 10)) 200 | plt.subplot(411) 201 | plt.plot(data[0, :], c='#9B3A4D', linewidth=2) 202 | plt.tick_params(labelsize=14) 203 | plt.margins(x=0) 204 | plt.xlabel('Timestamp', fontsize=20, family='Calibri') 205 | plt.ylabel('Amplitude', fontsize=20, family='Calibri') 206 | plt.text(365, 0.76, 'Synthetic ricker', fontsize=18, family='Calibri') 207 | 208 | plt.subplot(413) 209 | plt.plot(data[1, :], c='#70A0AC', linewidth=2) 210 | plt.tick_params(labelsize=14) 211 | plt.margins(x=0) 212 | plt.xlabel('Timestamp', fontsize=20, family='Calibri') 213 | plt.ylabel('Amplitude', fontsize=20, family='Calibri') 214 | plt.text(365, 0.73, 'Synthetic noise', fontsize=18, family='Calibri') 215 | 216 | # Customize Colormap 217 | from matplotlib.colors import LinearSegmentedColormap 218 | import scipy.io as scio 219 | colormap = scio.loadmat('./colormap.mat')['mycamp'] 220 | moreland_map = 
LinearSegmentedColormap.from_list('cos', colormap) 221 | 222 | plt.subplot(412) 223 | ax = sns.heatmap(reps[0, :, :].T, cmap=moreland_map, vmin=-2, vmax=2, xticklabels=False, yticklabels=False, cbar=False) 224 | ax.axis('on') 225 | plt.xticks(ticks=[0, 100, 200, 300, 400, 500], labels=[0, 100, 200, 300, 400, 500], rotation=0, fontsize=14) 226 | plt.yticks(ticks=[0, 16], labels=[32, 16], rotation=0, fontsize=14) 227 | # cbar = ax.collections[0].colorbar 228 | # cbar.ax.tick_params(labelsize=14) 229 | plt.xlabel('Timestamp', fontsize=20, family='Calibri') 230 | plt.ylabel('Repr dims', fontsize=20, family='Calibri') 231 | plt.tight_layout() 232 | 233 | plt.subplot(414) 234 | ax = sns.heatmap(reps[1, :, :].T, cmap=moreland_map, vmin=-2, vmax=2, xticklabels=False, yticklabels=False, cbar=False) 235 | ax.axis('on') 236 | plt.xticks(ticks=[0, 100, 200, 300, 400, 500], labels=[0, 100, 200, 300, 400, 500], rotation=0, fontsize=14) 237 | plt.yticks(ticks=[0, 16], labels=[32, 16], rotation=0, fontsize=14) 238 | # cbar = ax.collections[0].colorbar 239 | # cbar.ax.tick_params(labelsize=14) 240 | plt.xlabel('Timestamp', fontsize=20, family='Calibri') 241 | plt.ylabel('Repr dims', fontsize=20, family='Calibri') 242 | plt.tight_layout() 243 | 244 | if save_fig: 245 | plt.savefig('./results/syn_reprs.png', dpi=600) 246 | plt.show() 247 | 248 | 249 | if __name__ == '__main__': 250 | #comparison_clustering() 251 | syn_representation_visualization(save_fig=False) -------------------------------------------------------------------------------- /tscc.py: -------------------------------------------------------------------------------- 1 | import os 2 | import pandas as pd 3 | import torch 4 | import torch.optim as optim 5 | from torch.autograd import Variable 6 | import torch.nn.functional as F 7 | from torch.utils.data import TensorDataset, DataLoader 8 | import numpy as np 9 | from models import TSCCEncoder 10 | from sklearn.manifold import TSNE 11 | from sklearn.cluster import KMeans 
from matplotlib import pyplot as plt
from models.Metrics import nmi, acc


class TSCCModel:
    ''' The TSCC model '''

    def __init__(
            self,
            data_loader,
            dataset_size,
            batch_size,
            pretraining_epoch,
            n_cluster,
            dataset_name,
            input_dims,
            MaxIter=100,
            m=1.0,
            T1=2,
            output_dims=32,
            hidden_dims=64,
            depth=10,
            device='cuda',
            lr=0.001,
            max_train_length=3000,
            temporal_unit=0):

        ''' Initialize a TSCC model '''

        super().__init__()
        self.device = device
        self.lr = lr
        self.num_cluster = n_cluster
        self.batch_size = batch_size
        self.T1 = T1
        self.m = m
        self.pretraining_epoch = pretraining_epoch
        self.MaxIter1 = MaxIter
        self.data_loader = data_loader
        self.dataset_size = dataset_size
        self.dataset_name = dataset_name
        self.latent_size = output_dims
        self.max_train_length = max_train_length
        self.temporal_unit = temporal_unit

        self.u_mean = torch.zeros([n_cluster, output_dims])

        # The evaluation methods write .npy files here; np.save does not create directories.
        os.makedirs('./features', exist_ok=True)

        self.encoder = TSCCEncoder(input_dims=input_dims, output_dims=output_dims, hidden_dims=hidden_dims, depth=depth).to(self.device)
        self.net = torch.optim.swa_utils.AveragedModel(self.encoder)
        self.net.update_parameters(self.encoder)

    def Pretraining(self):
        print('Pretraining...')
        self.encoder.train()
        self.encoder.cuda()
        for param in self.encoder.parameters():
            param.requires_grad = True
        optimizer = optim.AdamW(self.encoder.parameters(), lr=self.lr)
        prev_ACC = 0
        loss_log = []
        acc_log = []
        nmi_log = []
        for T in range(0, self.pretraining_epoch):
            print('Pretraining Epoch: ', T + 1)
            for x, target in self.data_loader:
                optimizer.zero_grad()
                x = Variable(x).cuda()
                out1, out2 = self.cropping(x, tp_unit=self.temporal_unit, model=self.encoder)
                loss = self.contrastive_loss(out1, out2, temporal_unit=self.temporal_unit)
                loss.backward()
                optimizer.step()
                self.net.update_parameters(self.encoder)
                # Log a plain float so the CSV holds numbers rather than tensor reprs.
                loss_log.append(loss.item())

            ACC, NMI = self.Kmeans_model_evaluation(T)
            acc_log.append(ACC)
            nmi_log.append(NMI)

            # Keep the encoder with the best clustering accuracy so far.
            if ACC > prev_ACC:
                prev_ACC = ACC
                with open(self.dataset_name + '_Pretraining_phase', 'wb') as f:
                    torch.save(self.encoder, f)
            print(f"Epoch #{T + 1}: loss={loss.item()}")

        # os.path.join keeps the path portable (the original hard-coded a Windows separator).
        file = os.path.join(os.getcwd(), 'pretraining.csv')
        data = pd.DataFrame.from_dict({'pretraining': loss_log, 'ACC': acc_log, 'NMI': nmi_log}, orient='index')
        data.to_csv(file, index=False)

        self.encoder = torch.load(self.dataset_name + '_Pretraining_phase')
        self.plotter(name=self.dataset_name + '_Pretraining_phase', save_fig=False)
        return self.encoder

    def Finetuning(self):
        self.encoder, self.u_mean = self.initialization()
        self.encoder.cuda()
        self.encoder.train()
        for param in self.encoder.parameters():
            param.requires_grad = True
        optimizer = optim.AdamW(self.encoder.parameters(), lr=0.0001)
        ACC_prev = 0.0
        loss_log = []
        acc_log = []
        nmi_log = []
        for T in range(0, self.MaxIter1):
            print('Finetuning Epoch: ', T + 1)
            # Refresh the cluster centers every T1 epochs.
            if T % self.T1 == 1:
                self.u_mean = self.update_cluster_centers()
            for x, target in self.data_loader:
                u = torch.zeros([self.num_cluster, self.batch_size, self.latent_size]).cuda()
                x = Variable(x).cuda()
                for kk in range(0, self.num_cluster):
                    y = self.encode_with_pooling(x)
                    u[kk, :, :] = y.cuda()
                u = u.detach()
                # Soft cluster memberships, treated as constants during the update.
                p = self.cmp(u, self.u_mean.cuda())
                p = p.detach()
                self.u_mean = self.u_mean.cuda()
                p = p.T
                p = torch.pow(p, self.m)
                for i in range(0, self.num_cluster):
                    out1, out2 = self.cropping(x, tp_unit=self.temporal_unit, model=self.encoder)
                    u1 = self.encode_with_pooling(x)
                    self.u_mean = self.u_mean.float()
                    # Membership-weighted squared distance to center i, plus the contrastive term.
                    loss_c = torch.matmul(p[i, :].unsqueeze(0), torch.sum(torch.pow(u1 - self.u_mean[i, :].unsqueeze(0).repeat(self.batch_size, 1), 2), dim=1))
                    loss_r = self.contrastive_loss(out1, out2, temporal_unit=self.temporal_unit)
                    loss = loss_r + 0.1 * loss_c
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()
                    self.net.update_parameters(self.encoder)
                    loss_log.append(loss.item())

            ACC, NMI = self.model_evaluation(T)
            acc_log.append(ACC)
            nmi_log.append(NMI)

            if ACC > ACC_prev:
                ACC_prev = ACC
                with open(self.dataset_name + '_Finetuning_phase', 'wb') as f:
                    torch.save(self.encoder, f)
                with open(self.dataset_name + '_Centers', 'wb') as f:
                    torch.save(self.u_mean, f)
            print(f"Epoch #{T + 1}: loss={loss.item()}")

        file = os.path.join(os.getcwd(), 'finetuning.csv')
        data = pd.DataFrame.from_dict({'finetuning': loss_log, 'ACC': acc_log, 'NMI': nmi_log}, orient='index')  # orient='columns'
        data.to_csv(file, index=False)

        self.plotter(name=self.dataset_name + '_Finetuning_phase', save_fig=False)

    def initialization(self):
        print("-----initialization mode--------")
        # Load the checkpoint written by Pretraining(); the original read
        # 'AE_' + dataset_name + '_pretrain', which nothing in this file writes.
        self.encoder = torch.load(self.dataset_name + '_Pretraining_phase')
        self.encoder.cuda()
        datas = np.zeros([self.dataset_size, self.latent_size])
        ii = 0
        for x, target in self.data_loader:
            x = Variable(x).cuda()
            u = self.encode_with_pooling(x)
            u = u.cpu()
            datas[(ii) * self.batch_size:(ii + 1) * self.batch_size] = u.data.numpy()
            ii = ii + 1
        # datas = datas.cpu()
        # Initialize the cluster centers with k-means on the pooled representations.
        kmeans = KMeans(n_clusters=self.num_cluster, random_state=0).fit(datas)
        self.u_mean = kmeans.cluster_centers_
        self.u_mean = torch.from_numpy(self.u_mean)
        self.u_mean = Variable(self.u_mean).cuda()
        return self.encoder, self.u_mean

    def Kmeans_model_evaluation(self, T):
        self.encoder.eval()
        datas = np.zeros([self.dataset_size, self.latent_size])
        label_true = np.zeros(self.dataset_size)
        ii = 0
        for x, target in self.data_loader:
            x = Variable(x).cuda()
            u = self.encode_with_pooling(x)
            u = u.cpu()
            datas[ii * self.batch_size:(ii + 1) * self.batch_size, :] = u.data.numpy()
            label_true[ii * self.batch_size:(ii + 1) * self.batch_size] = target.numpy()
            ii = ii + 1

        kmeans = KMeans(n_clusters=self.num_cluster, random_state=0).fit(datas)

        label_pred = kmeans.labels_
        ACC = acc(label_true, label_pred, self.num_cluster)
        NMI = nmi(label_true, label_pred)
        print('ACC', ACC)
        print('NMI', NMI)
        # Save the representations from the first and last pre-training epochs.
        if T == 0:
            np.save('./features/Start_Pretraining_R.npy', datas)
            np.save('./features/Start_Pretraining_y_true.npy', label_true)
        if T == self.pretraining_epoch - 1:
            np.save('./features/End_Pretraining_R.npy', datas)
            np.save('./features/End_Pretraining_y_true.npy', label_true)
        return ACC, NMI

    def update_cluster_centers(self):
        self.encoder.eval()
        for param in self.encoder.parameters():
            param.requires_grad = False
        den = torch.zeros([self.num_cluster]).cuda()
        num = torch.zeros([self.num_cluster, self.latent_size]).cuda()
        for x, target in self.data_loader:
            x = Variable(x).cuda()
            u = self.encode_with_pooling(x)
            p = self.cmp(u.unsqueeze(0).repeat(self.num_cluster, 1, 1), self.u_mean)
            p = torch.pow(p, self.m)
            for kk in range(0, self.num_cluster):
                den[kk] = den[kk] + torch.sum(p[:, kk])
                num[kk, :] = num[kk, :] + torch.matmul(p[:, kk].T, u)
        # Each center becomes the membership-weighted mean of the representations.
        for kk in range(0, self.num_cluster):
            self.u_mean[kk, :] = torch.div(num[kk, :], den[kk])
        self.encoder.cuda()
        self.encoder.train()
        for param in self.encoder.parameters():
            param.requires_grad = True
        return self.u_mean

    def cmp(self, u, u_mean):
        p = torch.zeros([self.batch_size, self.num_cluster]).cuda()
        for j in range(0, self.num_cluster):
            p[:, j] = torch.sum(torch.pow(u[j, :, :] - u_mean[j, :].unsqueeze(0).repeat(self.batch_size, 1), 2), dim=1)
        # Fuzzy membership exponent: requires m != 1 (the default m=1.0 divides by zero here).
        p = torch.pow(p, -1 / (self.m - 1))
        sum1 = torch.sum(p, dim=1)
        p = torch.div(p, sum1.unsqueeze(1).repeat(1, self.num_cluster))
        # print(p[1,:])
        return p

    def model_evaluation(self, T):
        datas = np.zeros([self.dataset_size, self.latent_size])
        pred_labels = np.zeros(self.dataset_size)
        true_labels = np.zeros(self.dataset_size)
        ii = 0
        for x, target in self.data_loader:
            x = Variable(x).cuda()
            u = self.encode_with_pooling(x)
            datas[ii * self.batch_size:(ii + 1) * self.batch_size, :] = u.data.cpu().numpy()
            u = u.unsqueeze(0).repeat(self.num_cluster, 1, 1)
            p = self.cmp(u, self.u_mean)
            # Assign each sample to the cluster with the highest membership.
            y = torch.argmax(p, dim=1)
            y = y.cpu()
            y = y.numpy()
            pred_labels[(ii) * self.batch_size:(ii + 1) * self.batch_size] = y
            true_labels[(ii) * self.batch_size:(ii + 1) * self.batch_size] = target.numpy()
            ii = ii + 1

        ACC = acc(true_labels, pred_labels, self.num_cluster)
        NMI = nmi(true_labels, pred_labels)
        print('ACC', ACC)
        print('NMI', NMI)
        if T == 0:
            np.save('./features/Start_Finetuning_R.npy', datas)
            np.save('./features/Start_Finetuning_y_pred.npy', pred_labels)
            np.save('./features/Start_Finetuning_y_true.npy', true_labels)
        if T == self.MaxIter1 - 1:
            # File name aligned with the Start_* files (the original repeated 'End_Finetuning_').
            np.save('./features/End_Finetuning_R.npy', datas)
            np.save('./features/End_Finetuning_y_pred.npy', pred_labels)
            np.save('./features/End_Finetuning_y_true.npy', true_labels)
        self.encoder.cuda()
        self.encoder.train()
        for param in self.encoder.parameters():
            param.requires_grad = True

        return ACC, NMI

    def encode_with_pooling(self, data):
        assert data.ndim == 3
        n_samples, ts_l, _ = data.shape

        org_training = self.net.training
        self.net.eval()

        if torch.is_tensor(data):
            dataset = TensorDataset(data)
        else:
            dataset = TensorDataset(torch.from_numpy(data).to(torch.float))
        loader = DataLoader(dataset, batch_size=self.batch_size)

        with torch.no_grad():
            output = []
            for batch in loader:
                x = batch[0]
                out = self.net(x.to(self.device, non_blocking=True))
                # Max-pool over the full time axis to get one vector per series.
                out = F.max_pool1d(out.transpose(1, 2), kernel_size=out.size(1)).transpose(1, 2)
                out = out.cpu()
                out = out.squeeze(1)

                output.append(out)

            output = torch.cat(output, dim=0)

        self.net.train(org_training)
        if torch.is_tensor(data):
            return output.to(self.device)
        else:
            return output.numpy()

    def target_distribution(self, q_):
        weight = (q_ ** 2) / torch.sum(q_, 0)
        return (weight.t() / torch.sum(weight, 1)).t()

    def plotter(self, name, save_fig=False):
        print('Evaluation')
        legend_properties = {'family': 'Calibri', 'size': '16'}
        target_names = ['Microseismic', 'Noise']
        colors = ['#9B3A4D', '#70A0AC']
        self.encoder.eval()
        label_true = np.zeros(self.dataset_size)
        datas = np.zeros([self.dataset_size, self.latent_size])
        ii = 0
        for x, target in self.data_loader:
            x = Variable(x).cuda()
            u = self.encode_with_pooling(x)
            u = u.cpu()
            datas[ii * self.batch_size:(ii + 1) * self.batch_size, :] = u.data.numpy()
            label_true[ii * self.batch_size:(ii + 1) * self.batch_size] = target.numpy()
            ii = ii + 1
        # Project the representations to 2-D with t-SNE for plotting.
        redu = TSNE(n_components=2, random_state=123).fit_transform(datas)
        lw = 0.6
        f = plt.figure(figsize=(6, 6))
        ax = f.add_subplot(111)
        for color, i, target_name in zip(colors, [0, 1], target_names):
            plt.scatter(redu[label_true == i, 0], redu[label_true == i, 1], color=color, alpha=0.5, lw=lw, s=40,
                        label=target_name)
        plt.legend(loc='lower left', shadow=True, scatterpoints=1, prop=legend_properties, facecolor='w', frameon=False)
        ax.axis('off')
        ax.axis('tight')
        if save_fig:
            f.savefig(f'./{name}.png', dpi=600)
        plt.close(f)

    def instance_loss(self, z1, z2):
        B, T = z1.size(0), z1.size(1)
        if B == 1:
            return z1.new_tensor(0.)
        z = torch.cat([z1, z2], dim=0)
        z = z.transpose(0, 1)
        sim = torch.matmul(z, z.transpose(1, 2))
        # Drop the diagonal (self-similarity) from the logits.
        logits = torch.tril(sim, diagonal=-1)[:, :, :-1]
        logits += torch.triu(sim, diagonal=1)[:, :, 1:]
        logits = -F.log_softmax(logits, dim=-1)

        i = torch.arange(B, device=z1.device)
        loss = (logits[:, i, B + i - 1].mean() + logits[:, B + i, i].mean()) / 2
        return loss

    def temporal_loss(self, z1, z2):
        B, T = z1.size(0), z1.size(1)
        if T == 1:
            return z1.new_tensor(0.)
        z = torch.cat([z1, z2], dim=1)
        sim = torch.matmul(z, z.transpose(1, 2))
        # Drop the diagonal (self-similarity) from the logits.
        logits = torch.tril(sim, diagonal=-1)[:, :, :-1]
        logits += torch.triu(sim, diagonal=1)[:, :, 1:]
        logits = -F.log_softmax(logits, dim=-1)

        t = torch.arange(T, device=z1.device)
        loss = (logits[:, t, T + t - 1].mean() + logits[:, T + t, t].mean()) / 2
        return loss

    def contrastive_loss(self, z1, z2, alpha=0.5, temporal_unit=0):
        loss = torch.tensor(0., device=z1.device)
        d = 0
        # Accumulate the losses at each scale, halving the time axis every step.
        while z1.size(1) > 1:
            if alpha != 0:
                loss += alpha * self.instance_loss(z1, z2)
            if d >= temporal_unit:
                if 1 - alpha != 0:
                    loss += (1 - alpha) * self.temporal_loss(z1, z2)
            d += 1
            z1 = F.max_pool1d(z1.transpose(1, 2), kernel_size=2).transpose(1, 2)
            z2 = F.max_pool1d(z2.transpose(1, 2), kernel_size=2).transpose(1, 2)
        if z1.size(1) == 1:
            if alpha != 0:
                loss += alpha * self.instance_loss(z1, z2)
            d += 1
        return loss / d

    def cropping(self, x, tp_unit, model):
        ts_l = x.size(1)
        # Sample two overlapping crops; the shared segment has length crop_l.
        crop_l = np.random.randint(low=2 ** (tp_unit + 1), high=ts_l + 1)
        crop_left = np.random.randint(ts_l - crop_l + 1)
        crop_right = crop_left + crop_l
        crop_eleft = np.random.randint(crop_left + 1)
        crop_eright = np.random.randint(low=crop_right, high=ts_l + 1)
        crop_offset = np.random.randint(low=-crop_eleft, high=ts_l - crop_eright + 1, size=x.size(0))

        indx1 = crop_offset + crop_eleft
        num_elem1 = crop_right - crop_eleft
        all_indx1 = indx1[:, None] + np.arange(num_elem1)

        indx2 = crop_offset + crop_left
        num_elem2 = crop_eright - crop_left
        all_indx2 = indx2[:, None] + np.arange(num_elem2)

        # Encode both crops and keep only the overlapping part of each output.
        out1 = model(x[torch.arange(all_indx1.shape[0])[:, None], all_indx1])
        out1 = out1[:, -crop_l:]

        out2 = model(x[torch.arange(all_indx2.shape[0])[:, None], all_indx2])
        out2 = out2[:, :crop_l]

        return out1, out2
--------------------------------------------------------------------------------