├── Eval_Syn_Data.npy
├── Eval_Syn_Representations.npy
├── Micrseismic_Timeseries_Centers
├── Micrseismic_Timeseries_Finetuning_phase
├── Micrseismic_Timeseries_Pretraining_phase
├── README.md
├── colormap.mat
├── datasets
│   └── Micrseismic_Timeseries
│       └── test
├── datautils.py
├── evaluation.py
├── models
│   ├── Metrics.py
│   ├── __init__.py
│   └── encoder.py
├── requirements.txt
├── results
│   ├── ClusterAsMicroseismic.jpg
│   ├── ClusterAsNoise.jpg
│   ├── Framework.jpg
│   ├── comparison_results.jpg
│   ├── reprs.png
│   └── syn_reprs.png
├── run.py
├── tools.py
└── tscc.py


/Eval_Syn_Data.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yangzhen-cdut/Unsupervised-Clustering/6eba15ce5993c1713fd75bb96fee06be0536d226/Eval_Syn_Data.npy


--------------------------------------------------------------------------------
/Eval_Syn_Representations.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yangzhen-cdut/Unsupervised-Clustering/6eba15ce5993c1713fd75bb96fee06be0536d226/Eval_Syn_Representations.npy


--------------------------------------------------------------------------------
/Micrseismic_Timeseries_Centers:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yangzhen-cdut/Unsupervised-Clustering/6eba15ce5993c1713fd75bb96fee06be0536d226/Micrseismic_Timeseries_Centers


--------------------------------------------------------------------------------
/Micrseismic_Timeseries_Finetuning_phase:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yangzhen-cdut/Unsupervised-Clustering/6eba15ce5993c1713fd75bb96fee06be0536d226/Micrseismic_Timeseries_Finetuning_phase


--------------------------------------------------------------------------------
/Micrseismic_Timeseries_Pretraining_phase:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yangzhen-cdut/Unsupervised-Clustering/6eba15ce5993c1713fd75bb96fee06be0536d226/Micrseismic_Timeseries_Pretraining_phase


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # Time Series Contrastive Clustering (TSCC)
  2 | 
  3 | ## Cite This Paper:
  4 | Z. Yang, H. Li, X. Tuo, L. Li and J. Wen, "Unsupervised Clustering of Microseismic Signals Using a Contrastive Learning Model," in IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1-12, 2023, Art no. 5903212, doi: 10.1109/TGRS.2023.3240728.
  5 | 
  6 | Link: https://doi.org/10.1109/TGRS.2023.3240728
  7 | 
  8 | ## Getting Started
  9 | 
 10 | Clone the project to your local machine:
 11 | 
 12 | ```
 13 | git clone https://github.com/yangzhen-cdut/Unsupervised-Clustering.git
 14 | cd Unsupervised-Clustering
 15 | ```
 16 | 
 17 | ## Requirements
 18 | The recommended requirements for TSCC are specified as follows:
 19 | * Python 3.7
 20 | * torch==1.8.1
 21 | * scipy==1.7.3
 22 | * numpy==1.21.6
 23 | * pandas==1.3.5
 24 | * scikit_learn==0.24.2
 25 | * matplotlib==3.5.2
 26 | * bottleneck==1.3.4
 27 | * seaborn==0.11.2
 28 | 
 29 | The dependencies can be installed by:
 30 | ```bash
 31 | pip install -r requirements.txt
 32 | ```
 33 | 
 34 | Note that CUDA must be installed to run the code.
 35 | 
 36 | ## Usage
 37 | 
 38 | To train TSCC on a microseismic dataset, run the following command:
 39 | 
 40 | ```run
 41 | python run.py --dataset_name <dataset_name> --dataset_size <dataset_size> --pretraining_epoch <pretraining_epoch> --batch_size <batch_size> --MaxIter <MaxIter> --repr_dims <repr_dims>
 42 | ```
 43 | 
 44 | After training, the pre-training encoder, the fine-tuning encoder, and the cluster centers are saved to `./<dataset_name>_Pretraining_phase`, `./<dataset_name>_Finetuning_phase`, and `./<dataset_name>_Centers`, respectively.
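The flags accepted by `run.py` are spelled with underscores. The sketch below mirrors the parser declared in `run.py` so that it runs standalone; the helper name `build_parser` is hypothetical and the example values are arbitrary, chosen only to show how a training command line is parsed.

```python
import argparse


def build_parser():
    """Mirror of the arguments declared in run.py (hypothetical helper name)."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--dataset_name', default='Micrseismic_Timeseries', type=str)
    parser.add_argument('--dataset_size', default=4928, type=int)
    parser.add_argument('--dim', default=1, type=int)
    parser.add_argument('--num_cluster', default=2, type=int)
    parser.add_argument('--batch_size', default=64, type=int)
    parser.add_argument('--repr_dims', default=32, type=int)
    parser.add_argument('--lr', default=0.001, type=float)
    parser.add_argument('--pretraining_epoch', default=25, type=int)
    parser.add_argument('--MaxIter', default=25, type=int)
    return parser


# Example values are arbitrary; unspecified flags fall back to their defaults.
example_args = build_parser().parse_args(
    ['--dataset_name', 'Micrseismic_Timeseries', '--batch_size', '32', '--MaxIter', '10'])
print(example_args.batch_size, example_args.MaxIter, example_args.repr_dims)
```

Any flag left out of the command line keeps the default shown in the parser, so a bare `python run.py` trains on `Micrseismic_Timeseries` with the default settings.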
 45 | 
 46 | To evaluate TSCC on a microseismic dataset, run the following command:
 47 | 
 48 | ```evaluate
 49 | python evaluation.py --dataset_name <dataset_name> --dataset_size <dataset_size> --pretraining_epoch <pretraining_epoch> --batch_size <batch_size> --MaxIter1 <MaxIter> --repr_dims <repr_dims>
 50 | ```
 51 | 
 52 | Two examples are provided in `evaluation.py`: `eval_with_real_data` and `eval_with_synthetic_data`. Call either function with `save=True` to write the representations to `./Eval_Representations.npy` or `./Eval_Syn_Representations.npy`, respectively.
 53 | 
 54 | 
 55 | ## Results
 56 | 
 57 | ### The architecture of TSCC used in our study. 
 58 | 
 59 | <center>
 60 |     <img src=./results/Framework.jpg width="600"/>
 61 | </center>
 62 | 
 63 | ### Clustering performance comparison. 
 64 | 
 65 | <center>
 66 |     <img src=./results/comparison_results.jpg width="600"/>
 67 | </center>
 68 | 
 69 | ### Visualization of learned latent representations. 
 70 | 
 71 | Latent representations of synthetic waveforms
 72 | 
 73 | <center>
 74 |     <img src=./results/syn_reprs.png width="400"/>
 75 | </center>
 76 | 
 77 | Latent representations of real microseismic waveforms
 78 | 
 79 | <center>
 80 |     <img src=./results/reprs.png width="500"/>
 81 | </center>
 82 | 
 83 | ### Representative cluster distribution
 84 | 
 85 | Typical microseismic waveforms
 86 | 
 87 | <center>
 88 |     <img src=./results/ClusterAsMicroseismic.jpg width="400"/>
 89 | </center>
 90 | 
 91 | Typical noise waveforms
 92 | 
 93 | <center>
 94 |     <img src=./results/ClusterAsNoise.jpg width="400"/>
 95 | </center>
 96 | 
 97 | ### Classification performance of supervised classifiers using raw time series versus the features (R) generated by TSCC
 98 | 
 99 | <table>
100 | 	<tr>
101 |       <td rowspan="2">Methods</td>
102 | 	    <th colspan="3">Time series</th>
103 |       <th colspan="3">Feature R</th>
104 | 	</tr >
105 | 	<tr >
106 | 	    <td>ACC (%)</td>
107 |       <td>NMI (%)</td>
108 |       <td>AUPRC (%)</td>
109 |     	<td>ACC (%)</td>
110 |       <td>NMI (%)</td>
111 |       <td>AUPRC (%)</td>
112 | 	</tr>
113 | 	<tr>
114 | 	    <td>Linear</td>
115 |       <td>71.59</td>
116 |       <td>15.38</td>
117 |       <td>66.47</td>
118 |     	<td>99.11</td>
119 |       <td>92.63</td>
120 |       <td>98.71</td>
121 | 	</tr>
122 |   <tr>
123 | 	    <td>KNN</td>
124 |       <td>90.58</td>
125 |       <td>55.55</td>
126 |       <td>87.65</td>
127 |     	<td>98.13</td>
128 |       <td>86.62</td>
129 |       <td>97.36</td>
130 | 	</tr>
131 |   <tr>
132 | 	    <td>SVM</td>
133 |       <td>97.65</td>
134 |       <td>83.91</td>
135 |       <td>99.81</td>
136 |     	<td>98.94</td>
137 |       <td>91.56</td>
138 |       <td>99.91</td>
139 | 	</tr>
140 |   <tr>
141 | 	    <td>TSCC</td>
142 |       <td>98.07</td>
143 |       <td>86.26</td>
144 |       <td>97.15</td>
145 |     	<td>--</td>
146 |       <td>--</td>
147 |       <td>--</td>
148 | 	</tr>
149 | </table>
150 | 


--------------------------------------------------------------------------------
/colormap.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yangzhen-cdut/Unsupervised-Clustering/6eba15ce5993c1713fd75bb96fee06be0536d226/colormap.mat


--------------------------------------------------------------------------------
/datasets/Micrseismic_Timeseries/test:
--------------------------------------------------------------------------------
1 | 
2 | 


--------------------------------------------------------------------------------
/datautils.py:
--------------------------------------------------------------------------------
 1 | import torch
 2 | import numpy as np
 3 | from torch.utils.data import DataLoader
 4 | 
 5 | 
 6 | class Mydataset(torch.utils.data.Dataset):
 7 |     def __init__(self, x, y):
 8 |         self.x = x
 9 |         self.y = y
10 |         self.idx = list()
11 |         for item in x:
12 |             self.idx.append(item)
13 |         pass
14 | 
15 |     def __getitem__(self, index):
16 |         input_data = self.idx[index]
17 |         target = self.y[index]
18 |         return input_data, target
19 | 
20 |     def __len__(self):
21 |         return len(self.idx)
22 | 
23 | 
24 | def load_data(dataset_name, data_size, batch_size):
25 |     x = np.load('./datasets/' + f'{dataset_name}/' + f'cluster_x_{data_size}.npy')
26 |     y = np.load('./datasets/' + f'{dataset_name}/' + f'cluster_y_{data_size}.npy')
27 |     dataset = Mydataset(x, y)
28 |     print('n_cluster:', np.unique(y))
29 | 
30 |     torch.manual_seed(123)
31 |     data_loader = torch.utils.data.DataLoader(
32 |         dataset=dataset,
33 |         batch_size=batch_size,
34 |         shuffle=True,
35 |         num_workers=0)
36 |     return data_loader
37 | 


--------------------------------------------------------------------------------
/evaluation.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | import math
  3 | import torch
  4 | import datautils
  5 | import argparse
  6 | import numpy as np
  7 | import matplotlib.pyplot as plt
  8 | from torch.autograd import Variable
  9 | 
 10 | from tscc import TSCCModel
 11 | 
 12 | parser = argparse.ArgumentParser()
 13 | parser.add_argument('--dataset_name', default='Micrseismic_Timeseries', type=str, help='The dataset name')
 14 | parser.add_argument('--dataset_size', default=4928, type=int, help='The size of dataset')
 15 | parser.add_argument('--dim', default=1, type=int, help='The dimension of input')
 16 | parser.add_argument('--num_cluster', type=int, default=2, help='The number of cluster')
 17 | parser.add_argument('--batch_size', type=int, default=64, help='The batch size')
 18 | parser.add_argument('--repr_dims', type=int, default=32, help='The representation dimension')
 19 | parser.add_argument('--lr', type=float, default=0.001, help='The learning rate of pre-training phase')
 20 | parser.add_argument('--pretraining_epoch', type=int, default=25, help='The epoch of pre-training phase')
 21 | parser.add_argument('--MaxIter1', type=int, default=25, help='The epoch of fine-tuning phase')
 22 | args = parser.parse_args()
 23 | 
 24 | print("Arguments:", str(args))
 25 | print('Loading data... \n', end='')
 26 | 
 27 | # Load data
 28 | data_loader = datautils.load_data(args.dataset_name, args.dataset_size, args.batch_size)
 29 | 
 30 | config = dict(dataset_size=args.dataset_size,
 31 |               dataset_name=args.dataset_name,
 32 |               pretraining_epoch=args.pretraining_epoch,
 33 |               batch_size=args.batch_size,
 34 |               MaxIter1=args.MaxIter1,
 35 |               lr=args.lr,
 36 |               output_dims=args.repr_dims)
 37 | 
 38 | model = TSCCModel(data_loader, n_cluster=args.num_cluster, input_dims=args.dim, **config)
 39 | 
 40 | model.encoder = torch.load(f'{args.dataset_name}_Finetuning_phase')  # encoder saved by the fine-tuning phase
 41 | print('finish initial')
 42 | model.encoder.eval()
 43 | 
 44 | 
 45 | def eval_with_real_data(save=False):
 46 |     data = np.zeros([args.dataset_size, 500, args.dim])  # raw waveforms, 500 samples each
 47 |     reps = np.zeros([args.dataset_size, 500, args.repr_dims])  # per-timestamp representations
 48 | 
 49 |     ii = 0
 50 |     for x, target in data_loader:
 51 |         x = Variable(x).cuda()
 52 |         u = model.encoder(x)
 53 |         u = u.cpu()
 54 |         reps[ii * args.batch_size:(ii + 1) * args.batch_size, :, :] = u.data.numpy()
 55 |         data[ii * args.batch_size:(ii + 1) * args.batch_size, :, :] = x.cpu().numpy()
 56 |         ii = ii + 1
 57 |     if save:
 58 |         np.save(os.path.join(os.getcwd(), 'Eval_Representations.npy'), reps)
 59 |         np.save(os.path.join(os.getcwd(), 'Eval_Data.npy'), data)
 60 |     print('finish')
 61 | 
 62 | 
 63 | def eval_with_synthetic_data(save=False):
 64 |     # Ricker with White Gaussian Noise, SNR = -5dB
 65 |     n = 500
 66 |     wt = Ricker(n)
 67 |     np.random.seed(123)
 68 |     noise = np.random.normal(loc=0, scale=1.0, size=(len(wt),))
 69 |     nwt = add_noise(wt, noise, -5)
 70 |     nwt = nwt / np.max(abs(nwt))  # normalize to unit peak amplitude
 71 | 
 72 |     plt.figure(figsize=(8, 6))
 73 |     plt.subplot(211)
 74 |     plt.plot(nwt, c='#9B3A4D', linewidth=1.5)
 75 |     plt.tick_params(labelsize=14)
 76 |     plt.margins(x=0)
 77 |     plt.title('Noisy Ricker with SNR=-5dB', fontsize=20, family='Calibri')
 78 |     plt.xlabel('Timestamp', fontsize=20, family='Calibri')
 79 |     plt.ylabel('Amplitude', fontsize=20, family='Calibri')
 80 | 
 81 |     # Synthetic Noise
 82 |     plt.subplot(212)
 83 |     t = np.linspace(0, n-1, n)
 84 |     lowfre_noise = np.sin((t) * np.pi / 100)
 85 |     np.random.seed(123)
 86 |     random_noise = np.random.normal(loc=0, scale=1.0, size=(500,))
 87 |     noise = random_noise + lowfre_noise
 88 |     noise = noise / np.max(abs(noise))
 89 |     plt.plot(noise, c='#70A0AC', linewidth=1.5)
 90 |     plt.tick_params(labelsize=14)
 91 |     plt.margins(x=0)
 92 |     plt.title('Synthetic Noise', fontsize=20, family='Calibri')
 93 |     plt.xlabel('Timestamp', fontsize=20, family='Calibri')
 94 |     plt.ylabel('Amplitude', fontsize=20, family='Calibri')
 95 |     plt.tight_layout()
 96 |     plt.show()
 97 | 
 98 |     syndata = np.zeros((2, 500, 1))
 99 |     syndata[0, :, 0] = nwt
100 |     syndata[1, :, 0] = noise
101 | 
102 |     syndata_in = Variable(torch.tensor(syndata, dtype=torch.float32)).cuda()
103 | 
104 |     syn_reps = model.encoder(syndata_in)
105 |     syn_reps = syn_reps.detach().cpu().numpy()
106 |     if save:
106 |         np.save(os.path.join(os.getcwd(), 'Eval_Syn_Representations.npy'), syn_reps)
107 |         np.save(os.path.join(os.getcwd(), 'Eval_Syn_Data.npy'), np.squeeze(syndata, axis=2))
109 |     print('finish')
110 | 
111 | 
112 | def Ricker(n, f0=20, dt=0.001):
113 |     wt = np.zeros(n)
114 |     i = 0
115 |     for k in range(int(-n / 2), int(n / 2)):
116 |         wt[i] = (1 - 2.0 * (math.pi * f0 * k * dt) ** 2) * math.exp(-1 * (math.pi * f0 * k * dt) ** 2)
117 |         i += 1
118 | 
119 |     return wt
120 | 
121 | 
122 | def add_noise(x, noise, SNR):
123 |     """
124 |     :param x: pure signal (np.array)
125 |     :param noise: noise = random.normal(0,1)
126 |     :param SNR: signal-to-noise ratio
127 |     :return: noisy_signal
128 |     """
129 |     try:
130 |         x = np.array(x)
131 |     except:
132 |         pass
133 | 
134 |     N = len(x)
135 |     noise = noise - np.mean(noise)
136 |     signal_power = 1.0 / N * np.sum(x ** 2)
137 |     noise_variance = signal_power / (math.pow(10, SNR / 10))
138 |     NOISE = math.sqrt(noise_variance) / np.std(noise) * noise
139 |     noisy_signal = x + NOISE
140 |     return noisy_signal
141 | 
142 | 
143 | if __name__ == '__main__':
144 |     eval_with_synthetic_data()
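
As a sanity check on `add_noise`, the scaling it applies can be verified numerically: after rescaling, 10·log10(P_signal / P_noise) should equal the requested SNR. The sketch below re-implements the same arithmetic with the standard library only, so it runs without NumPy or a trained model. The names `ricker`, `add_noise_py`, and `measured_snr_db` are hypothetical and not part of the repository, and the seed is arbitrary.

```python
import math
import random


def ricker(n, f0=20, dt=0.001):
    """Ricker wavelet sampled at k*dt for k in [-n/2, n/2), as in evaluation.py."""
    out = []
    for k in range(int(-n / 2), int(n / 2)):
        a = (math.pi * f0 * k * dt) ** 2
        out.append((1 - 2.0 * a) * math.exp(-a))
    return out


def add_noise_py(x, noise, snr_db):
    """Scale zero-mean noise so that signal power / noise power matches snr_db."""
    n = len(x)
    mean = sum(noise) / n
    noise = [v - mean for v in noise]
    signal_power = sum(v * v for v in x) / n
    noise_variance = signal_power / (10 ** (snr_db / 10))
    # Population standard deviation, matching np.std's default (ddof=0).
    std = math.sqrt(sum(v * v for v in noise) / n)
    scaled = [math.sqrt(noise_variance) / std * v for v in noise]
    return [s + v for s, v in zip(x, scaled)], scaled


def measured_snr_db(x, scaled_noise):
    """SNR in dB computed from the mean power of the signal and of the noise."""
    ps = sum(v * v for v in x) / len(x)
    pn = sum(v * v for v in scaled_noise) / len(scaled_noise)
    return 10 * math.log10(ps / pn)


rng = random.Random(123)  # arbitrary seed, for repeatability only
wt = ricker(500)
noise = [rng.gauss(0.0, 1.0) for _ in wt]
noisy, scaled = add_noise_py(wt, noise, -5)
print(round(measured_snr_db(wt, scaled), 6))
```

Because the noise is re-centered before scaling, its mean power equals `noise_variance` exactly, so the measured value matches the requested -5 dB up to floating-point error.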


--------------------------------------------------------------------------------
/models/Metrics.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | from sklearn.metrics import normalized_mutual_info_score
 3 | 
 4 | nmi = normalized_mutual_info_score
 5 | def acc(y_true, y_pred, num_cluster):
 6 |     """
 7 |     Calculate clustering accuracy. Require scikit-learn installed
 8 | 
 9 |     # Arguments
10 |         y_true: true labels, numpy.array with shape `(n_samples,)`
11 |         y_pred: predicted labels, numpy.array with shape `(n_samples,)`
12 |         num_cluster: number of cluster
13 | 
14 |     # Return
15 |         accuracy, in [0,1]
16 |     """
17 |     y_true = y_true.astype(np.int64)
18 |     y_pred = y_pred.astype(np.int64)
19 |     assert y_pred.size == y_true.size
20 | 
21 |     w = np.zeros((num_cluster, num_cluster))
22 |     for i in range(y_pred.size):
23 |         w[y_pred[i], y_true[i]] += 1
24 |     from scipy.optimize import linear_sum_assignment
25 | 
26 |     ind = linear_sum_assignment(w.max() - w)
27 |     accuracy = 0.0
28 |     for i in ind[0]:
29 |         accuracy = accuracy + w[i, ind[1][i]]
30 |     return accuracy / y_pred.size
31 | 
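
To illustrate what `acc` measures: cluster IDs are arbitrary, so predicted labels are first matched to true labels by the one-to-one mapping that maximizes agreement. `acc` finds that mapping with `scipy.optimize.linear_sum_assignment`. The sketch below swaps in a brute-force search over permutations instead, which reaches the same optimum for a small number of clusters and needs only the standard library. The function name `acc_bruteforce` and the toy labels are illustrative assumptions.

```python
from itertools import permutations


def acc_bruteforce(y_true, y_pred, num_cluster):
    """Best-case accuracy over all one-to-one cluster-to-label mappings."""
    assert len(y_true) == len(y_pred)
    # w[p][t] counts samples assigned to cluster p whose true label is t.
    w = [[0] * num_cluster for _ in range(num_cluster)]
    for t, p in zip(y_true, y_pred):
        w[p][t] += 1
    best = max(
        sum(w[p][mapping[p]] for p in range(num_cluster))
        for mapping in permutations(range(num_cluster))
    )
    return best / len(y_pred)


# Toy labels: the clustering is perfect up to swapping the two cluster IDs.
print(acc_bruteforce([0, 0, 1, 1], [1, 1, 0, 0], 2))
# One sample lands in the wrong cluster.
print(acc_bruteforce([0, 0, 1, 1], [1, 1, 0, 1], 2))
```

The permutation search costs `num_cluster!` evaluations, which is why the repository's assignment-based solver is the better choice beyond a handful of clusters.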


--------------------------------------------------------------------------------
/models/__init__.py:
--------------------------------------------------------------------------------
1 | from .encoder import TSCCEncoder
2 | 


--------------------------------------------------------------------------------
/models/encoder.py:
--------------------------------------------------------------------------------
 1 | import torch
 2 | import numpy as np
 3 | from torch import nn
 4 | import torch.nn.functional as F
 5 | 
 6 | 
 7 | class TSCCEncoder(nn.Module):
 8 |     def __init__(self, input_dims, output_dims, hidden_dims=64, depth=10, mask_mode=0):
 9 |         super().__init__()
10 |         self.input_dims = input_dims
11 |         self.output_dims = output_dims
12 |         self.hidden_dims = hidden_dims
13 |         self.mask_mode = mask_mode
14 |         self.linear_projector = nn.Linear(input_dims, hidden_dims)
15 |         self.feature_extractor = DilatedConvModule(hidden_dims, [hidden_dims] * depth + [output_dims], kernel_size=3)
16 |         self.repr_dropout = nn.Dropout(p=0.2)
17 | 
18 |     def forward(self, x):
19 |         nan_mask = ~x.isnan().any(axis=-1)
20 |         x[~nan_mask] = 0
21 |         x = self.linear_projector(x)
22 | 
23 |         if self.training:
24 |             mask = self.mask_mode
25 |         else:
26 |             mask = 1
27 | 
28 |         if mask == 0:
29 |             mask = torch.from_numpy(np.random.binomial(1, 0.5, size=(x.size(0), x.size(1)))).to(torch.bool).to(x.device)
30 |         elif mask == 1:
31 |             mask = x.new_full((x.size(0), x.size(1)), True, dtype=torch.bool)
32 | 
33 |         mask &= nan_mask
34 |         x[~mask] = 0
35 | 
36 |         x = x.transpose(1, 2)
37 |         x = self.repr_dropout(self.feature_extractor(x))
38 |         x = x.transpose(1, 2)
39 | 
40 |         return x
41 | 
42 | 
43 | class DilatedConv(nn.Module):
44 |     def __init__(self, in_channels, out_channels, kernel_size, dilation=1, groups=1):
45 |         super().__init__()
46 |         self.receptive_field = (kernel_size - 1) * dilation + 1
47 |         padding = self.receptive_field // 2
48 |         self.conv = nn.Conv1d(
49 |             in_channels, out_channels, kernel_size,
50 |             padding=padding,
51 |             dilation=dilation,
52 |             groups=groups
53 |         )
54 |         self.remove = 1 if self.receptive_field % 2 == 0 else 0
55 | 
56 |     def forward(self, x):
57 |         out = self.conv(x)
58 |         if self.remove > 0:
59 |             out = out[:, :, : -self.remove]
60 |         return out
61 | 
62 | 
63 | class DilatedConvBlock(nn.Module):
64 |     def __init__(self, in_channels, out_channels, kernel_size, dilation, final=False):
65 |         super().__init__()
66 |         self.DilatedConv1 = DilatedConv(in_channels, out_channels, kernel_size, dilation=dilation)
67 |         self.DilatedConv2 = DilatedConv(out_channels, out_channels, kernel_size, dilation=dilation)
68 |         self.ResProjector = nn.Conv1d(in_channels, out_channels, 1) if in_channels != out_channels or final else None
69 | 
70 |     def forward(self, x):
71 |         residual = x if self.ResProjector is None else self.ResProjector(x)
72 |         x = F.gelu(x)
73 |         x = self.DilatedConv1(x)
74 |         x = F.gelu(x)
75 |         x = self.DilatedConv2(x)
76 |         return x + residual
77 | 
78 | 
79 | class DilatedConvModule(nn.Module):
80 |     def __init__(self, in_channels, channels, kernel_size):
81 |         super().__init__()
82 |         self.net = nn.Sequential(*[
83 |             DilatedConvBlock(
84 |                 channels[i - 1] if i > 0 else in_channels,
85 |                 channels[i],
86 |                 kernel_size=kernel_size,
87 |                 dilation=2 ** i,
88 |                 final=(i == len(channels) - 1)
89 |             )
90 |             for i in range(len(channels))
91 |         ])
92 | 
93 |     def forward(self, x):
94 |         return self.net(x)
95 | 
96 | 
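
The padding rule in `DilatedConv` keeps the sequence length unchanged, and stacking blocks with dilation `2 ** i` grows the receptive field exponentially. The arithmetic can be checked with a short standard-library sketch. It assumes PyTorch's documented `Conv1d` output-length formula with stride 1, and the helper names `conv_out_len` and `stack_receptive_field` are hypothetical.

```python
def conv_out_len(length, kernel_size, dilation):
    """Output length of DilatedConv.forward for an input of `length` samples."""
    receptive_field = (kernel_size - 1) * dilation + 1
    padding = receptive_field // 2
    # Conv1d with stride 1: L_out = L_in + 2 * padding - dilation * (kernel_size - 1)
    out = length + 2 * padding - dilation * (kernel_size - 1)
    # An even receptive field yields one extra sample, which forward() trims.
    remove = 1 if receptive_field % 2 == 0 else 0
    return out - remove


def stack_receptive_field(kernel_size, num_blocks):
    """Receptive field of num_blocks DilatedConvBlocks (two convs each, dilation 2**i)."""
    rf = 1
    for i in range(num_blocks):
        rf += 2 * (kernel_size - 1) * (2 ** i)
    return rf


# Defaults in TSCCEncoder: kernel_size=3, depth=10 hidden blocks plus one output block.
print([conv_out_len(500, 3, 2 ** i) for i in range(11)])
print(stack_receptive_field(3, 11))
```

With the default settings the stacked receptive field is much wider than the 500-sample windows used in `evaluation.py`, so each output timestamp can in principle depend on the entire input window.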


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | seaborn==0.11.2
2 | matplotlib==3.5.2
3 | Bottleneck==1.3.4
4 | torch==1.8.1
5 | scipy==1.7.3
6 | numpy==1.21.6
7 | pandas==1.3.5
8 | scikit_learn==0.24.2
9 | 


--------------------------------------------------------------------------------
/results/ClusterAsMicroseismic.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yangzhen-cdut/Unsupervised-Clustering/6eba15ce5993c1713fd75bb96fee06be0536d226/results/ClusterAsMicroseismic.jpg


--------------------------------------------------------------------------------
/results/ClusterAsNoise.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yangzhen-cdut/Unsupervised-Clustering/6eba15ce5993c1713fd75bb96fee06be0536d226/results/ClusterAsNoise.jpg


--------------------------------------------------------------------------------
/results/Framework.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yangzhen-cdut/Unsupervised-Clustering/6eba15ce5993c1713fd75bb96fee06be0536d226/results/Framework.jpg


--------------------------------------------------------------------------------
/results/comparison_results.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yangzhen-cdut/Unsupervised-Clustering/6eba15ce5993c1713fd75bb96fee06be0536d226/results/comparison_results.jpg


--------------------------------------------------------------------------------
/results/reprs.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yangzhen-cdut/Unsupervised-Clustering/6eba15ce5993c1713fd75bb96fee06be0536d226/results/reprs.png


--------------------------------------------------------------------------------
/results/syn_reprs.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yangzhen-cdut/Unsupervised-Clustering/6eba15ce5993c1713fd75bb96fee06be0536d226/results/syn_reprs.png


--------------------------------------------------------------------------------
/run.py:
--------------------------------------------------------------------------------
 1 | import time
 2 | import datetime
 3 | import argparse
 4 | import datautils
 5 | from tscc import TSCCModel
 6 | 
 7 | parser = argparse.ArgumentParser()
 8 | parser.add_argument('--dataset_name', default='Micrseismic_Timeseries', type=str, help='The dataset name')
 9 | parser.add_argument('--dataset_size', default=4928, type=int, help='The size of dataset')
10 | parser.add_argument('--dim', default=1, type=int, help='The dimension of input')
11 | parser.add_argument('--num_cluster', type=int, default=2, help='The number of cluster')
12 | parser.add_argument('--batch_size', type=int, default=64, help='The batch size')
13 | parser.add_argument('--repr_dims', type=int, default=32, help='The representation dimension')
14 | parser.add_argument('--lr', type=float, default=0.001, help='The learning rate of pre-training phase')
15 | parser.add_argument('--pretraining_epoch', type=int, default=25, help='The epoch of pre-training phase')
16 | parser.add_argument('--MaxIter', type=int, default=25, help='The epoch of fine-tuning phase')
17 | args = parser.parse_args()
18 | 
19 | print("Arguments:", str(args))
20 | print('Loading data... \n', end='')
21 | 
22 | # Load data
23 | data_loader = datautils.load_data(args.dataset_name, args.dataset_size, args.batch_size)
24 | 
25 | config = dict(dataset_size=args.dataset_size,
26 |               dataset_name=args.dataset_name,
27 |               pretraining_epoch=args.pretraining_epoch,
28 |               batch_size=args.batch_size,
29 |               MaxIter1=args.MaxIter,
30 |               lr=args.lr,
31 |               output_dims=args.repr_dims)
32 | 
33 | model = TSCCModel(data_loader, n_cluster=args.num_cluster, input_dims=args.dim, **config)
34 | 
35 | t = time.time()
36 | 
37 | if args.pretraining_epoch != 0:
38 |     model.Pretraining()
39 | if args.MaxIter != 0:
40 |     model.Finetuning()
41 | 
42 | t = time.time() - t
43 | 
44 | print(f"\nTraining time: {datetime.timedelta(seconds=t)}\n")
45 | 


--------------------------------------------------------------------------------
/tools.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | import torch
  3 | import numpy as np
  4 | import matplotlib.pyplot as plt
  5 | from sklearn.manifold import TSNE
  6 | from sklearn.cluster import KMeans
  7 | from models.Metrics import nmi, acc
  8 | 
  9 | import seaborn as sns
 10 | sns.set()
 11 | 
 12 | os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
 13 | 
 14 | 
 15 | def plotter(S, y, name, loc, save_fig=False):
 16 |     ''' function to visualize the outputs of t-SNE '''
 17 | 
 18 |     legend_properties = {'family': 'Calibri', 'size': '16'}
 19 |     target_names = ['Microseismic', 'Noise']
 20 |     colors = ['#9B3A4D', '#70A0AC']
 21 |     lw = 0.6
 22 |     for color, i, target_name in zip(colors, [0, 1], target_names):
 23 |         plt.scatter(S[y == i, 0], S[y == i, 1], c=color, alpha=0.5, lw=lw, s=40, label=target_name)
 24 |     plt.legend(loc=loc, shadow=True, scatterpoints=1, prop=legend_properties, facecolor='white', frameon=False)
 25 |     plt.tick_params(labelsize=12)
 26 |     plt.title(name, fontsize=20, family='Calibri')
 27 | 
 28 | 
 29 | def comparison_clustering(save_fig=False):
 30 |     filename = './features/'
 31 |     f = plt.figure(figsize=(18, 12))
 32 | 
 33 |     # K-means on Time Series
 34 |     ax = plt.subplot(231)
 35 |     x = np.load('./dataset/Time Series/cluster_x_4928.npy')
 36 |     x = np.squeeze(x, axis=2)
 37 |     y = np.load('./dataset/Time Series/cluster_y_4928.npy')
 38 |     kmeans = KMeans(n_clusters=2, random_state=0).fit(x)
 39 |     label_pred = kmeans.labels_
 40 |     ACC = acc(y, label_pred, 2)
 41 |     NMI = nmi(y, label_pred)
 42 |     print('ACC', ACC)
 43 |     print('NMI', NMI)
 44 |     redu = TSNE(n_components=2, random_state=50).fit_transform(x)
 45 |     plotter(redu, y, name='K-means', loc='upper left')
 46 |     ax.axis('off')
 47 |     ax.axis('tight')
 48 | 
 49 |     # DEC
 50 |     ax = plt.subplot(232)
 51 |     enc = np.load(filename + 'DEC.npy')
 52 |     y = np.load(filename + 'DEC_y_true.npy')
 53 |     redu = TSNE(n_components=2, random_state=123).fit_transform(enc)
 54 |     plotter(redu, y, name='DEC', loc='upper right')
 55 |     ax.axis('off')
 56 |     ax.axis('tight')
 57 | 
 58 |     # DCA
 59 |     ax = plt.subplot(233)
 60 |     enc = np.load(filename + 'DCA.npy')
 61 |     y = np.load(filename + 'DCA_y_true.npy')
 62 |     redu = TSNE(n_components=2, random_state=50).fit_transform(enc)
 63 |     plotter(redu, y, name='DCA', loc='upper right')
 64 |     ax.axis('off')
 65 |     ax.axis('tight')
 66 | 
 67 |     # DCSS
 68 |     ax = plt.subplot(234)
 69 |     enc = np.load(filename + 'DCSS.npy')
 70 |     y = np.load(filename + 'DCSS_y_true.npy')
 71 |     redu = TSNE(n_components=2, random_state=64).fit_transform(enc)
 72 |     plotter(redu, y, name='DCSS', loc='upper left')
 73 |     ax.axis('off')
 74 |     ax.axis('tight')
 75 | 
 76 |     # TSCC Pre-training
 77 |     ax = plt.subplot(235)
 78 |     enc = np.load(filename + 'End_Pretraining_u.npy')
 79 |     y = np.load(filename + 'End_Pretraining_y_true.npy')
 80 |     redu = TSNE(n_components=2, random_state=88).fit_transform(enc)
 81 |     plotter(redu, y, name='TSCC Pre-training', loc='upper left')
 82 |     ax.axis('off')
 83 |     ax.axis('tight')
 84 | 
 85 |     # TSCC Fine-tuning
 86 |     ax = plt.subplot(236)
 87 |     enc = np.load(filename + 'End_Finetuning_u.npy')
 88 |     y = np.load(filename + 'End_Finetuning_y_true.npy')
 89 |     redu = TSNE(n_components=2, random_state=0).fit_transform(enc)
 90 |     plotter(redu, y, name='TSCC Fine-tuning', loc='upper right')
 91 |     ax.axis('off')
 92 |     ax.axis('tight')
 93 | 
 94 |     plt.tight_layout()
 95 |     plt.show()
 96 |     if save_fig:
 97 |         f.savefig('./results/comparison_results.png', dpi=600)
 98 | 
 99 | 
100 | def representation_visualization(save_fig=False):
101 |     data = np.load('./Eval_Data.npy')
102 |     data = np.squeeze(data, axis=2)
103 |     reps = np.load('./Eval_Representations.npy')
104 |     p1 = 2290
105 |     p2 = 1171
106 |     p3 = 3279
107 |     p4 = 387
108 |     f = plt.figure(figsize=(15, 10))
109 |     plt.subplot(421)
110 |     plt.plot(data[p1, :]/max(abs(data[p1, :])), c='#9B3A4D', linewidth=2)
111 |     plt.tick_params(labelsize=14)
112 |     plt.margins(x=0)
113 |     plt.xlabel('Timestamp', fontsize=20, family='Calibri')
114 |     plt.ylabel('Amplitude', fontsize=20, family='Calibri')
115 |     plt.text(390, 0.65, 'Microseismic', fontsize=18, family='Calibri')
116 | 
117 |     plt.subplot(422)
118 |     plt.plot(data[p2, :]/max(abs(data[p2, :])), c='#9B3A4D', linewidth=2)
119 |     plt.tick_params(labelsize=14)
120 |     plt.margins(x=0)
121 |     plt.xlabel('Timestamp', fontsize=20, family='Calibri')
122 |     plt.ylabel('Amplitude', fontsize=20, family='Calibri')
123 |     plt.text(390, 0.65, 'Microseismic', fontsize=18, family='Calibri')
124 | 
125 |     plt.subplot(425)
126 |     plt.plot(data[p3, :]/max(abs(data[p3, :])), c='#70A0AC', linewidth=2)
127 |     plt.tick_params(labelsize=14)
128 |     plt.margins(x=0)
129 |     plt.xlabel('Timestamp', fontsize=20, family='Calibri')
130 |     plt.ylabel('Amplitude', fontsize=20, family='Calibri')
131 |     plt.text(450, 0.65, 'Noise', fontsize=18, family='Calibri')
132 | 
133 |     plt.subplot(426)
134 |     plt.plot(data[p4, :]/max(abs(data[p4, :])), c='#70A0AC', linewidth=2)
135 |     plt.tick_params(labelsize=14)
136 |     plt.margins(x=0)
137 |     plt.xlabel('Timestamp', fontsize=20, family='Calibri')
138 |     plt.ylabel('Amplitude', fontsize=20, family='Calibri')
139 |     plt.text(450, 0.65, 'Noise', fontsize=18, family='Calibri')
140 | 
141 |     # Customize Colormap
142 |     from matplotlib.colors import LinearSegmentedColormap
143 |     import scipy.io as scio
144 |     colormap = scio.loadmat('./colormap.mat')['mycamp']
145 |     moreland_map = LinearSegmentedColormap.from_list('cos', colormap)
146 | 
147 |     plt.subplot(423)
148 |     ax = sns.heatmap(reps[p1, :, :].T, cmap=moreland_map, vmin=-2, vmax=2, xticklabels=False, yticklabels=False, cbar=False)
149 |     ax.axis('on')
150 |     plt.xticks(ticks=[0, 100, 200, 300, 400, 500], labels=[0, 100, 200, 300, 400, 500], rotation=0, fontsize=14)
151 |     plt.yticks(ticks=[0, 16], labels=[32, 16], rotation=0, fontsize=14)
152 |     # cbar = ax.collections[0].colorbar
153 |     # cbar.ax.tick_params(labelsize=14)
154 |     plt.xlabel('Timestamp', fontsize=20, family='Calibri')
155 |     plt.ylabel('Repr dims', fontsize=20, family='Calibri')
156 |     plt.tight_layout()
157 | 
158 |     plt.subplot(424)
159 |     ax = sns.heatmap(reps[p2, :, :].T, cmap=moreland_map, vmin=-2, vmax=2, xticklabels=False, yticklabels=False, cbar=False)
160 |     ax.axis('on')
161 |     plt.xticks(ticks=[0, 100, 200, 300, 400, 500], labels=[0, 100, 200, 300, 400, 500], rotation=0, fontsize=14)
162 |     plt.yticks(ticks=[0, 16], labels=[32, 16], rotation=0, fontsize=14)
163 |     # cbar = ax.collections[0].colorbar
164 |     # cbar.ax.tick_params(labelsize=14)
165 |     plt.xlabel('Timestamp', fontsize=20, family='Calibri')
166 |     plt.ylabel('Repr dims', fontsize=20, family='Calibri')
167 |     plt.tight_layout()
168 | 
169 |     plt.subplot(427)
170 |     ax = sns.heatmap(reps[p3, :, :].T, cmap=moreland_map, vmin=-2, vmax=2, xticklabels=False, yticklabels=False, cbar=False)
171 |     ax.axis('on')
172 |     plt.xticks(ticks=[0, 100, 200, 300, 400, 500], labels=[0, 100, 200, 300, 400, 500], rotation=0, fontsize=14)
173 |     plt.yticks(ticks=[0, 16], labels=[32, 16], rotation=0, fontsize=14)
174 |     # cbar = ax.collections[0].colorbar
175 |     # cbar.ax.tick_params(labelsize=14)
176 |     plt.xlabel('Timestamp', fontsize=20, family='Calibri')
177 |     plt.ylabel('Repr dims', fontsize=20, family='Calibri')
178 |     plt.tight_layout()
179 | 
180 |     plt.subplot(428)
181 |     ax = sns.heatmap(reps[p4, :, :].T, cmap=moreland_map, vmin=-2, vmax=2, xticklabels=False, yticklabels=False, cbar=False)
182 |     ax.axis('on')
183 |     plt.xticks(ticks=[0, 100, 200, 300, 400, 500], labels=[0, 100, 200, 300, 400, 500], rotation=0, fontsize=14)
184 |     plt.yticks(ticks=[0, 16], labels=[32, 16], rotation=0, fontsize=14)
185 |     # cbar = ax.collections[0].colorbar
186 |     # cbar.ax.tick_params(labelsize=14)
187 |     plt.xlabel('Timestamp', fontsize=20, family='Calibri')
188 |     plt.ylabel('Repr dims', fontsize=20, family='Calibri')
189 |     plt.tight_layout()
190 |     if save_fig:
191 |         plt.savefig('./results/reprs.png', dpi=600)
192 |     plt.show()
193 | 
194 | 
195 | def syn_representation_visualization(save_fig=False):
196 |     data = np.load('./Eval_Syn_Data.npy')
197 |     reps = np.load('./Eval_Syn_Representations.npy')
198 | 
199 |     f = plt.figure(figsize=(7.5, 10))
200 |     plt.subplot(411)
201 |     plt.plot(data[0, :], c='#9B3A4D', linewidth=2)
202 |     plt.tick_params(labelsize=14)
203 |     plt.margins(x=0)
204 |     plt.xlabel('Timestamp', fontsize=20, family='Calibri')
205 |     plt.ylabel('Amplitude', fontsize=20, family='Calibri')
206 |     plt.text(365, 0.76, 'Synthetic ricker', fontsize=18, family='Calibri')
207 | 
208 |     plt.subplot(413)
209 |     plt.plot(data[1, :], c='#70A0AC', linewidth=2)
210 |     plt.tick_params(labelsize=14)
211 |     plt.margins(x=0)
212 |     plt.xlabel('Timestamp', fontsize=20, family='Calibri')
213 |     plt.ylabel('Amplitude', fontsize=20, family='Calibri')
214 |     plt.text(365, 0.73, 'Synthetic noise', fontsize=18, family='Calibri')
215 | 
216 |     # Customize Colormap
217 |     from matplotlib.colors import LinearSegmentedColormap
218 |     import scipy.io as scio
219 |     colormap = scio.loadmat('./colormap.mat')['mycamp']
220 |     moreland_map = LinearSegmentedColormap.from_list('cos', colormap)
221 | 
222 |     plt.subplot(412)
223 |     ax = sns.heatmap(reps[0, :, :].T, cmap=moreland_map, vmin=-2, vmax=2, xticklabels=False, yticklabels=False, cbar=False)
224 |     ax.axis('on')
225 |     plt.xticks(ticks=[0, 100, 200, 300, 400, 500], labels=[0, 100, 200, 300, 400, 500], rotation=0, fontsize=14)
226 |     plt.yticks(ticks=[0, 16], labels=[32, 16], rotation=0, fontsize=14)
227 |     # cbar = ax.collections[0].colorbar
228 |     # cbar.ax.tick_params(labelsize=14)
229 |     plt.xlabel('Timestamp', fontsize=20, family='Calibri')
230 |     plt.ylabel('Repr dims', fontsize=20, family='Calibri')
231 |     plt.tight_layout()
232 | 
233 |     plt.subplot(414)
234 |     ax = sns.heatmap(reps[1, :, :].T, cmap=moreland_map, vmin=-2, vmax=2, xticklabels=False, yticklabels=False, cbar=False)
235 |     ax.axis('on')
236 |     plt.xticks(ticks=[0, 100, 200, 300, 400, 500], labels=[0, 100, 200, 300, 400, 500], rotation=0, fontsize=14)
237 |     plt.yticks(ticks=[0, 16], labels=[32, 16], rotation=0, fontsize=14)
238 |     # cbar = ax.collections[0].colorbar
239 |     # cbar.ax.tick_params(labelsize=14)
240 |     plt.xlabel('Timestamp', fontsize=20, family='Calibri')
241 |     plt.ylabel('Repr dims', fontsize=20, family='Calibri')
242 |     plt.tight_layout()
243 | 
244 |     if save_fig:
245 |         plt.savefig('./results/syn_reprs.png', dpi=600)
246 |     plt.show()
247 | 
248 | 
249 | if __name__ == '__main__':
250 |     #comparison_clustering()
251 |     syn_representation_visualization(save_fig=False)


--------------------------------------------------------------------------------
/tscc.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | import pandas as pd
  3 | import torch
  4 | import torch.optim as optim
  5 | from torch.autograd import Variable
  6 | import torch.nn.functional as F
  7 | from torch.utils.data import TensorDataset, DataLoader
  8 | import numpy as np
  9 | from models import TSCCEncoder
 10 | from sklearn.manifold import TSNE
 11 | from sklearn.cluster import KMeans
 12 | from matplotlib import pyplot as plt
 13 | from models.Metrics import nmi, acc
 14 | 
 15 | 
 16 | class TSCCModel:
 17 |     ''' The TSCC model '''
 18 | 
 19 |     def __init__(
 20 |             self,
 21 |             data_loader,
 22 |             dataset_size,
 23 |             batch_size,
 24 |             pretraining_epoch,
 25 |             n_cluster,
 26 |             dataset_name,
 27 |             input_dims,
 28 |             MaxIter=100,
 29 |             m=1.0,
 30 |             T1=2,
 31 |             output_dims=32,
 32 |             hidden_dims=64,
 33 |             depth=10,
 34 |             device='cuda',
 35 |             lr=0.001,
 36 |             max_train_length=3000,
 37 |             temporal_unit=0):
 38 | 
 39 |         ''' Initialize a TSCC model '''
 40 | 
 41 |         super().__init__()
 42 |         self.device = device
 43 |         self.lr = lr
 44 |         self.num_cluster = n_cluster
 45 |         self.batch_size = batch_size
 46 |         self.T1 = T1
 47 |         self.m = m
 48 |         self.pretraining_epoch = pretraining_epoch
 49 |         self.MaxIter1 = MaxIter
 50 |         self.data_loader = data_loader
 51 |         self.dataset_size = dataset_size
 52 |         self.dataset_name = dataset_name
 53 |         self.latent_size = output_dims
 54 |         self.max_train_length = max_train_length
 55 |         self.temporal_unit = temporal_unit
 56 | 
 57 |         self.u_mean = torch.zeros([n_cluster, output_dims])
 58 | 
 59 |         self.encoder = TSCCEncoder(input_dims=input_dims, output_dims=output_dims, hidden_dims=hidden_dims, depth=depth).to(self.device)
 60 |         self.net = torch.optim.swa_utils.AveragedModel(self.encoder)
 61 |         self.net.update_parameters(self.encoder)
 62 | 
 63 |     def Pretraining(self):
 64 |         print('Pretraining...')
 65 |         self.encoder.train()
 66 |         self.encoder.cuda()
 67 |         for param in self.encoder.parameters():
 68 |             param.requires_grad = True
 69 |         optimizer = optim.AdamW(self.encoder.parameters(), lr=self.lr)
 70 |         prev_ACC = 0
 71 |         loss_log = []
 72 |         acc_log = []
 73 |         nmi_log = []
 74 |         for T in range(0, self.pretraining_epoch):
 75 |             print('Pretraining Epoch: ', T + 1)
 76 |             for x, target in self.data_loader:
 77 |                 optimizer.zero_grad()
 78 |                 x = Variable(x).cuda()
 79 |                 out1, out2 = self.cropping(x, tp_unit=self.temporal_unit, model=self.encoder)
 80 |                 loss = self.contrastive_loss(out1, out2, temporal_unit=self.temporal_unit)
 81 |                 loss.backward()
 82 |                 optimizer.step()
 83 |                 self.net.update_parameters(self.encoder)
 84 |             loss_log.append(loss.item())  # store a plain float, not a tensor that keeps the graph alive
 85 | 
 86 |             ACC, NMI = self.Kmeans_model_evaluation(T)
 87 |             acc_log.append(ACC)
 88 |             nmi_log.append(NMI)
 89 | 
 90 |             if ACC > prev_ACC:
 91 |                 prev_ACC = ACC
 92 |                 with open(self.dataset_name+'_Pretraining_phase', 'wb') as f:
 93 |                     torch.save(self.encoder, f)
 94 |             print(f"Epoch #{T + 1}: loss={loss}")
 95 | 
 96 |         file = os.path.join(os.getcwd(), 'pretraining.csv')  # portable path instead of a hard-coded '\\'
 97 |         data = pd.DataFrame.from_dict({'pretraining': loss_log, 'ACC': acc_log, 'NMI': nmi_log}, orient='index')
 98 |         data.to_csv(file, index=False)
 99 | 
100 |         self.encoder = torch.load(self.dataset_name + '_Pretraining_phase')
101 |         self.plotter(name=self.dataset_name + '_Pretraining_phase', save_fig=False)
102 |         return self.encoder
103 | 
104 |     def Finetuning(self):
105 |         self.encoder, self.u_mean = self.initialization()
106 |         self.encoder.cuda()
107 |         self.encoder.train()
108 |         for param in self.encoder.parameters():
109 |             param.requires_grad = True
110 |         optimizer = optim.AdamW(self.encoder.parameters(), lr=0.0001)
111 |         ACC_prev = 0.0
112 |         loss_log = []
113 |         acc_log = []
114 |         nmi_log = []
115 |         for T in range(0, self.MaxIter1):
116 |             print('Finetuning Epoch: ', T + 1)
117 |             if T % self.T1 == 1:
118 |                 self.u_mean = self.update_cluster_centers()
119 |             for x, target in self.data_loader:
120 |                 u = torch.zeros([self.num_cluster, self.batch_size, self.latent_size]).cuda()
121 |                 x = Variable(x).cuda()
122 |                 for kk in range(0, self.num_cluster):
123 |                     y = self.encode_with_pooling(x)
124 |                     u[kk, :, :] = y.cuda()
125 |                 u = u.detach()
126 |                 p = self.cmp(u, self.u_mean.cuda())
127 |                 p = p.detach()
128 |                 self.u_mean = self.u_mean.cuda()
129 |                 p = p.T
130 |                 p = torch.pow(p, self.m)
131 |                 for i in range(0, self.num_cluster):
132 |                     out1, out2 = self.cropping(x, tp_unit=self.temporal_unit, model=self.encoder)
133 |                     u1 = self.encode_with_pooling(x)
134 |                     self.u_mean = self.u_mean.float()
135 |                     loss_c = torch.matmul(p[i, :].unsqueeze(0), torch.sum(torch.pow(u1 - self.u_mean[i, :].unsqueeze(0).repeat(self.batch_size, 1), 2), dim=1))
136 |                     loss_r = self.contrastive_loss(out1, out2, temporal_unit=self.temporal_unit)
137 |                     loss = loss_r + 0.1 * loss_c
138 |                     optimizer.zero_grad()
139 |                     loss.backward()
140 |                     optimizer.step()
141 |                     self.net.update_parameters(self.encoder)
142 |                     loss_log.append(loss.item())  # store a plain float, not a tensor that keeps the graph alive
143 | 
144 |             ACC, NMI = self.model_evaluation(T)
145 |             acc_log.append(ACC)
146 |             nmi_log.append(NMI)
147 | 
148 |             if ACC > ACC_prev:
149 |                 ACC_prev = ACC
150 |                 with open(self.dataset_name + '_Finetuning_phase', 'wb') as f:
151 |                     torch.save(self.encoder, f)
152 |                 with open(self.dataset_name + '_Centers', 'wb') as f:
153 |                     torch.save(self.u_mean, f)
154 |             print(f"Epoch #{T + 1}: loss={loss}")
155 | 
156 |         file = os.path.join(os.getcwd(), 'finetuning.csv')  # portable path instead of a hard-coded '\\'
157 |         data = pd.DataFrame.from_dict({'finetuning': loss_log, 'ACC': acc_log, 'NMI': nmi_log}, orient='index')  # orient='columns'
158 |         data.to_csv(file, index=False)
159 | 
160 |         self.plotter(name=self.dataset_name + '_Finetuning_phase', save_fig=False)
161 | 
162 |     def initialization(self):
163 |         print("-----initialization mode--------")
164 |         self.encoder = torch.load(self.dataset_name + '_Pretraining_phase')  # checkpoint written by Pretraining()
165 |         self.encoder.cuda()
166 |         datas = np.zeros([self.dataset_size, self.latent_size])
167 |         ii = 0
168 |         for x, target in self.data_loader:
169 |             x = Variable(x).cuda()
170 |             u = self.encode_with_pooling(x)
171 |             u = u.cpu()
172 |             datas[(ii) * self.batch_size:(ii + 1) * self.batch_size] = u.data.numpy()
173 |             ii = ii + 1
174 |         # datas = datas.cpu()
175 |         kmeans = KMeans(n_clusters=self.num_cluster, random_state=0).fit(datas)
176 |         self.u_mean = kmeans.cluster_centers_
177 |         self.u_mean = torch.from_numpy(self.u_mean)
178 |         self.u_mean = Variable(self.u_mean).cuda()
179 |         return self.encoder, self.u_mean
180 | 
181 |     def Kmeans_model_evaluation(self, T):
182 |         self.encoder.eval()
183 |         datas = np.zeros([self.dataset_size, self.latent_size])
184 |         label_true = np.zeros(self.dataset_size)
185 |         ii = 0
186 |         for x, target in self.data_loader:
187 |             x = Variable(x).cuda()
188 |             u = self.encode_with_pooling(x)
189 |             u = u.cpu()
190 |             datas[ii * self.batch_size:(ii + 1) * self.batch_size, :] = u.data.numpy()
191 |             label_true[ii * self.batch_size:(ii + 1) * self.batch_size] = target.numpy()
192 |             ii = ii + 1
193 | 
194 |         kmeans = KMeans(n_clusters=self.num_cluster, random_state=0).fit(datas)
195 | 
196 |         label_pred = kmeans.labels_
197 |         ACC = acc(label_true, label_pred, self.num_cluster)
198 |         NMI = nmi(label_true, label_pred)
199 |         print('ACC', ACC)
200 |         print('NMI', NMI)
201 |         if T == 0:
202 |             np.save('./features/Start_Pretraining_R.npy', datas)
203 |             np.save('./features/Start_Pretraining_y_true.npy', label_true)
204 |         if T == self.pretraining_epoch-1:
205 |             np.save('./features/End_Pretraining_R.npy', datas)
206 |             np.save('./features/End_Pretraining_y_true.npy', label_true)
207 |         return ACC, NMI
208 | 
209 |     def update_cluster_centers(self):
210 |         self.encoder.eval()
211 |         for param in self.encoder.parameters():
212 |             param.requires_grad = False
213 |         den = torch.zeros([self.num_cluster]).cuda()
214 |         num = torch.zeros([self.num_cluster, self.latent_size]).cuda()
215 |         for x, target in self.data_loader:
216 |             x = Variable(x).cuda()
217 |             u = self.encode_with_pooling(x)
218 |             p = self.cmp(u.unsqueeze(0).repeat(self.num_cluster, 1, 1), self.u_mean)
219 |             p = torch.pow(p, self.m)
220 |             for kk in range(0, self.num_cluster):
221 |                 den[kk] = den[kk] + torch.sum(p[:, kk])
222 |                 num[kk, :] = num[kk, :] + torch.matmul(p[:, kk].T, u)
223 |         for kk in range(0, self.num_cluster):
224 |             self.u_mean[kk, :] = torch.div(num[kk, :], den[kk])
225 |         self.encoder.cuda()
226 |         self.encoder.train()
227 |         for param in self.encoder.parameters():
228 |             param.requires_grad = True
229 |         return self.u_mean
230 | 
231 |     def cmp(self, u, u_mean):
232 |         p = torch.zeros([self.batch_size, self.num_cluster]).cuda()
233 |         for j in range(0, self.num_cluster):
234 |             p[:, j] = torch.sum(torch.pow(u[j, :, :] - u_mean[j, :].unsqueeze(0).repeat(self.batch_size, 1), 2), dim=1)
235 |         p = torch.pow(p, -1 / (self.m - 1))  # fuzzy membership exponent; requires m > 1 (the default m=1.0 divides by zero)
236 |         sum1 = torch.sum(p, dim=1)
237 |         p = torch.div(p, sum1.unsqueeze(1).repeat(1, self.num_cluster))
238 |         # print(p[1,:])
239 |         return p
240 | 
241 |     def model_evaluation(self, T):
242 |         datas = np.zeros([self.dataset_size, self.latent_size])
243 |         pred_labels = np.zeros(self.dataset_size)
244 |         true_labels = np.zeros(self.dataset_size)
245 |         ii = 0
246 |         for x, target in self.data_loader:
247 |             x = Variable(x).cuda()
248 |             u = self.encode_with_pooling(x)
249 |             datas[ii * self.batch_size:(ii + 1) * self.batch_size, :] = u.data.cpu().numpy()
250 |             u = u.unsqueeze(0).repeat(self.num_cluster, 1, 1)
251 |             p = self.cmp(u, self.u_mean)
252 |             y = torch.argmax(p, dim=1)
253 |             y = y.cpu()
254 |             y = y.numpy()
255 |             pred_labels[(ii) * self.batch_size:(ii + 1) * self.batch_size] = y
256 |             true_labels[(ii) * self.batch_size:(ii + 1) * self.batch_size] = target.numpy()
257 |             ii = ii + 1
258 | 
259 |         ACC = acc(true_labels, pred_labels, self.num_cluster)
260 |         NMI = nmi(true_labels, pred_labels)
261 |         print('ACC', ACC)
262 |         print('NMI', NMI)
263 |         if T == 0:
264 |             np.save('./features/Start_Finetuning_R.npy', datas)
265 |             np.save('./features/Start_Finetuning_y_pred.npy', pred_labels)
266 |             np.save('./features/Start_Finetuning_y_true.npy', true_labels)
267 |         if T == self.MaxIter1-1:
268 |             np.save('./features/End_Finetuning_R.npy', datas)
269 |             np.save('./features/End_Finetuning_y_pred.npy', pred_labels)
270 |             np.save('./features/End_Finetuning_y_true.npy', true_labels)
271 |         self.encoder.cuda()
272 |         self.encoder.train()
273 |         for param in self.encoder.parameters():
274 |             param.requires_grad = True
275 | 
276 |         return ACC, NMI
277 | 
278 |     def encode_with_pooling(self, data):
279 |         assert data.ndim == 3
280 |         n_samples, ts_l, _ = data.shape
281 | 
282 |         org_training = self.net.training
283 |         self.net.eval()
284 | 
285 |         if torch.is_tensor(data):
286 |             dataset = TensorDataset(data)
287 |         else:
288 |             dataset = TensorDataset(torch.from_numpy(data).to(torch.float))
289 |         loader = DataLoader(dataset, batch_size=self.batch_size)
290 |         
291 |         with torch.no_grad():
292 |             output = []
293 |             for batch in loader:
294 |                 x = batch[0]
295 |                 out = self.net(x.to(self.device, non_blocking=True))
296 |                 out = F.max_pool1d(out.transpose(1, 2), kernel_size=out.size(1), ).transpose(1, 2)
297 |                 out = out.cpu()
298 |                 out = out.squeeze(1)
299 |                         
300 |                 output.append(out)
301 |                 
302 |             output = torch.cat(output, dim=0)
303 |             
304 |         self.net.train(org_training)
305 |         if torch.is_tensor(data):
306 |             return output.to(self.device)
307 |         else:
308 |             return output.numpy()
309 | 
310 |     def target_distribution(self, q_):
311 |         weight = (q_ ** 2) / torch.sum(q_, 0)
312 |         return (weight.t() / torch.sum(weight, 1)).t()
313 | 
314 |     def plotter(self, name, save_fig=False):
315 |         print('Evaluation')
316 |         legend_properties = {'family': 'Calibri', 'size': '16'}
317 |         target_names = ['Microseismic', 'Noise']
318 |         colors = ['#9B3A4D', '#70A0AC']
319 |         self.encoder.eval()
320 |         label_true = np.zeros(self.dataset_size)
321 |         datas = np.zeros([self.dataset_size, self.latent_size])
322 |         ii = 0
323 |         for x, target in self.data_loader:
324 |             x = Variable(x).cuda()
325 |             u = self.encode_with_pooling(x)
326 |             u = u.cpu()
327 |             datas[ii * self.batch_size:(ii + 1) * self.batch_size, :] = u.data.numpy()
328 |             label_true[ii * self.batch_size:(ii + 1) * self.batch_size] = target.numpy()
329 |             ii = ii + 1
330 |         redu = TSNE(n_components=2, random_state=123).fit_transform(datas)
331 |         lw = 0.6
332 |         f = plt.figure(figsize=(6, 6))
333 |         ax = f.add_subplot(111)
334 |         for color, i, target_name in zip(colors, [0, 1], target_names):
335 |             plt.scatter(redu[label_true == i, 0], redu[label_true == i, 1], color=color, alpha=0.5, lw=lw, s=40,
336 |                         label=target_name)
337 |         plt.legend(loc='lower left', shadow=True, scatterpoints=1, prop=legend_properties, facecolor='w', frameon=False)
338 |         ax.axis('off')
339 |         ax.axis('tight')
340 |         if save_fig:
341 |             f.savefig(f'./{name}.png', dpi=600)
342 |         plt.close(f)
343 | 
344 |     def instance_loss(self, z1, z2):
345 |         B, T = z1.size(0), z1.size(1)
346 |         if B == 1:
347 |             return z1.new_tensor(0.)
348 |         z = torch.cat([z1, z2], dim=0)
349 |         z = z.transpose(0, 1)
350 |         sim = torch.matmul(z, z.transpose(1, 2))
351 |         logits = torch.tril(sim, diagonal=-1)[:, :, :-1]
352 |         logits += torch.triu(sim, diagonal=1)[:, :, 1:]
353 |         logits = -F.log_softmax(logits, dim=-1)
354 | 
355 |         i = torch.arange(B, device=z1.device)
356 |         loss = (logits[:, i, B + i - 1].mean() + logits[:, B + i, i].mean()) / 2
357 |         return loss
358 | 
359 |     def temporal_loss(self, z1, z2):
360 |         B, T = z1.size(0), z1.size(1)
361 |         if T == 1:
362 |             return z1.new_tensor(0.)
363 |         z = torch.cat([z1, z2], dim=1)
364 |         sim = torch.matmul(z, z.transpose(1, 2))
365 |         logits = torch.tril(sim, diagonal=-1)[:, :, :-1]
366 |         logits += torch.triu(sim, diagonal=1)[:, :, 1:]
367 |         logits = -F.log_softmax(logits, dim=-1)
368 | 
369 |         t = torch.arange(T, device=z1.device)
370 |         loss = (logits[:, t, T + t - 1].mean() + logits[:, T + t, t].mean()) / 2
371 |         return loss
372 | 
373 |     def contrastive_loss(self, z1, z2, alpha=0.5, temporal_unit=0):
374 |         loss = torch.tensor(0., device=z1.device)
375 |         d = 0
376 |         while z1.size(1) > 1:
377 |             if alpha != 0:
378 |                 loss += alpha * self.instance_loss(z1, z2)
379 |             if d >= temporal_unit:
380 |                 if 1 - alpha != 0:
381 |                     loss += (1 - alpha) * self.temporal_loss(z1, z2)
382 |             d += 1
383 |             z1 = F.max_pool1d(z1.transpose(1, 2), kernel_size=2).transpose(1, 2)
384 |             z2 = F.max_pool1d(z2.transpose(1, 2), kernel_size=2).transpose(1, 2)
385 |         if z1.size(1) == 1:
386 |             if alpha != 0:
387 |                 loss += alpha * self.instance_loss(z1, z2)
388 |             d += 1
389 |         return loss / d
390 | 
391 |     def cropping(self, x, tp_unit, model):
392 |         ts_l = x.size(1)
393 |         crop_l = np.random.randint(low=2 ** (tp_unit + 1), high=ts_l + 1)
394 |         crop_left = np.random.randint(ts_l - crop_l + 1)
395 |         crop_right = crop_left + crop_l
396 |         crop_eleft = np.random.randint(crop_left + 1)
397 |         crop_eright = np.random.randint(low=crop_right, high=ts_l + 1)
398 |         crop_offset = np.random.randint(low=-crop_eleft, high=ts_l - crop_eright + 1, size=x.size(0))
399 | 
400 |         indx1 = crop_offset + crop_eleft
401 |         num_elem1 = crop_right - crop_eleft
402 |         all_indx1 = indx1[:, None] + np.arange(num_elem1)
403 | 
404 |         indx2 = crop_offset + crop_left
405 |         num_elem2 = crop_eright - crop_left
406 |         all_indx2 = indx2[:, None] + np.arange(num_elem2)
407 | 
408 |         out1 = model(x[torch.arange(all_indx1.shape[0])[:, None], all_indx1])
409 |         out1 = out1[:, -crop_l:]
410 | 
411 |         out2 = model(x[torch.arange(all_indx2.shape[0])[:, None], all_indx2])
412 |         out2 = out2[:, :crop_l]
413 | 
414 |         return out1, out2
415 | 


--------------------------------------------------------------------------------
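| Methods | Time series ACC (%) | Time series NMI (%) | Time series AUPRC (%) | Feature R ACC (%) | Feature R NMI (%) | Feature R AUPRC (%) |
|---|---|---|---|---|---|---|
| Linear | 71.59 | 15.38 | 66.47 | 99.11 | 92.63 | 98.71 |
| KNN | 90.58 | 55.55 | 87.65 | 98.13 | 86.62 | 97.36 |
| SVM | 97.65 | 83.91 | 99.81 | 98.94 | 91.56 | 99.91 |
| TSCC | 98.07 | 86.26 | 97.15 | -- | -- | -- |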
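
The ACC and NMI columns above are clustering scores, which `tscc.py` obtains from `models/Metrics.py` (not shown here). As a rough illustration of how such scores are commonly defined, the sketch below computes accuracy under the best one-to-one mapping between cluster ids and class labels, and NMI normalized by the arithmetic mean of the two entropies. The function names `clustering_accuracy` and `normalized_mutual_info` are hypothetical, and the brute-force mapping search is an assumption that only suits a small number of clusters (two in the microseismic/noise setting); it is not necessarily how the repository computes these values.

```python
from collections import Counter
from itertools import permutations
from math import log


def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one mapping of cluster ids to class labels."""
    classes = sorted(set(y_true) | set(y_pred))
    # pairs[(cluster, label)] = number of samples in that cluster with that label
    pairs = Counter(zip(y_pred, y_true))
    # Brute force over all mappings; acceptable for a handful of clusters
    best = max(
        sum(pairs[(cluster, label)] for cluster, label in zip(classes, perm))
        for perm in permutations(classes)
    )
    return best / len(y_true)


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(y_true, y_pred):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(y_true)
    count_t, count_p = Counter(y_true), Counter(y_pred)
    mi = 0.0
    for (t, p), c in Counter(zip(y_true, y_pred)).items():
        mi += (c / n) * log(c * n / (count_t[t] * count_p[p]))
    denom = (entropy(y_true) + entropy(y_pred)) / 2
    # Both labelings constant: treat as perfect agreement
    return mi / denom if denom > 0 else 1.0


if __name__ == '__main__':
    y_true = [0, 0, 0, 1, 1, 1]
    y_pred = [1, 1, 0, 0, 0, 0]  # cluster ids are arbitrary, so 1 may mean class 0
    print('ACC', clustering_accuracy(y_true, y_pred))
    print('NMI', normalized_mutual_info(y_true, y_pred))
```

Because cluster ids carry no inherent meaning, a prediction that is the exact label swap of the ground truth scores 1.0 on both measures.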