├── .DS_Store
├── README.md
├── detect_3c.py
├── generate_detect_data_3cRand.py
├── generate_event_array3c4d.py
├── models
│   ├── SeisConvNetDetect_sortedAbs50s.pth
│   └── SeisConvNetLoc_NotAbs2017Mcut50s.pth
├── predict_location3c4d.py
└── validate_consecution.py

/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/seismolab/ArrayConvNet/0e6efea258a07bff99b4fb9625e086376ddfe00a/.DS_Store
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# ArrayConvNet
Using convolutional neural networks (CNNs), we propose a new technique for automatic earthquake detection and 4D localization. This repository contains the code and trained models that accompany our paper; please refer to the paper for the full methodology and results.

## Data
This study is based on earthquake information and waveform data from the Hawaiian Volcano Observatory (HVO), operated by the USGS. The earthquake catalog was obtained from the [USGS](https://earthquake.usgs.gov/earthquakes/search/), last accessed March 23, 2020. The waveform data are available from the IRIS DMC.

Special thanks to the researchers and staff at HVO for collecting the seismic data and providing the earthquake catalog used in this study, and to Guoqing Lin for providing the relocated earthquakes used to assess the effect of location accuracy in the training data set.

## Preprocessing the data
To process the publicly available raw trace data, we use two scripts:
1. `generate_detect_data_3cRand.py`
2. `generate_event_array3c4d.py`

The first script processes the detection data, consisting of both earthquake and noise events, and creates the training and test files used to train the detection model.

Similarly, the second script processes the earthquake event data and their location labels, and creates the training and test files used to train the localization model.

## Training the models
Once all data are processed, we train and test the detection and localization models with the scripts `detect_3c.py` and `predict_location3c4d.py`, respectively. Each script outputs a trained model together with its accuracy on the test dataset.

### Trained models
For ease of use, we've included the trained detection and localization models. The directory `models` contains:

- `SeisConvNetDetect_sortedAbs50s.pth`: trained model for detection
- `SeisConvNetLoc_NotAbs2017Mcut50s.pth`: trained model for 4D (latitude, longitude, depth, and time) localization

Together, they form ArrayConvNet.

## Validation on continuous data
Earthquake catalogs usually represent only a subset of the earthquakes that occurred, because detection and localization are limited by the signal-to-noise ratio of the seismic records, the number of stations that recorded an event, and other factors. Our training data from the USGS catalog for Hawaii are no exception. So while ArrayConvNet performs well on the test data set, we evaluate its true efficacy on continuous data. In `validate_consecution.py`, we pass in continuous seismic recordings and evaluate the results.
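At its core, that validation loop restores the two trained networks and slides them over 50 s windows of continuous data. The sketch below illustrates only the loading-and-inference step: the network classes are imported from `validate_consecution.py` (assuming its dependencies, e.g. ObsPy and geopy, are installed), the windows are synthetic stand-ins for the 3 × 55 × 2500 arrays (3 channels, 55 stations, 50 s at 50 samples per second) produced by the preprocessing scripts, and the 0.95 threshold mirrors the `prob_cutoff` used in the validation script.

```python
import torch
import torch.nn.functional as F

# The network definitions live in validate_consecution.py (DetectNet, PredictNet)
from validate_consecution import DetectNet, PredictNet

detect_net = DetectNet()
detect_net.load_state_dict(torch.load('models/SeisConvNetDetect_sortedAbs50s.pth'))
detect_net.eval()

loc_net = PredictNet()
loc_net.load_state_dict(torch.load('models/SeisConvNetLoc_NotAbs2017Mcut50s.pth'))
loc_net.eval()

# Synthetic stand-ins: in practice these come from the preprocessing scripts.
# Detection expects the amplitude-sorted, absolute-valued window; localization
# uses the unsorted, trace-normalized window (see validate_consecution.py).
det_window = torch.randn(3, 55, 2500).abs()
loc_window = torch.randn(3, 55, 2500)

with torch.no_grad():
    p_quake = F.softmax(detect_net(det_window.unsqueeze(0)), dim=1)[0, 1]
    if p_quake > 0.95:
        # undo the label normalization applied in generate_event_array3c4d.py
        event_norm = torch.tensor([1.0, 1.0, 50.0, 10.0])
        event_coordref = torch.tensor([19.5, -155.5, 0.0, 0.0])
        # 4D output: latitude, longitude, depth (km), origin-time term (s)
        lat, lon, depth_km, t_origin = loc_net(loc_window.unsqueeze(0)).squeeze() * event_norm + event_coordref
```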
31 | -------------------------------------------------------------------------------- /detect_3c.py: -------------------------------------------------------------------------------- 1 | import pdb 2 | import torch 3 | import torch.nn as nn 4 | import torch.nn.functional as F 5 | import torch.optim as optim 6 | from torch.utils.data import Dataset, DataLoader 7 | 8 | #========Preparing datasets for PyTorch DataLoader===================================== 9 | # Custom data pre-processor to transform X and y from numpy arrays to torch tensors 10 | class PrepareData(Dataset): 11 | def __init__(self, path): 12 | self.X, self.y = torch.load(path) 13 | # CE loss only accepts ints as classes 14 | self.y = self.y.type(torch.LongTensor) 15 | 16 | def __len__(self): 17 | return self.X.shape[0] 18 | 19 | def __getitem__(self, idx): 20 | return self.X[idx], self.y[idx] 21 | 22 | #========Network architecture===================================== 23 | class Net(nn.Module): 24 | def __init__(self): 25 | super(Net, self).__init__() 26 | # conv2d takes C_in, C_out, kernel size, stride, padding 27 | # input array (3,55,2500) 28 | self.conv1 = nn.Conv2d(3, 4, (1,9), stride = 1, padding=(0,4)) 29 | self.pool1 = nn.MaxPool2d((1,5), stride=(1,5)) 30 | self.conv2 = nn.Conv2d(4, 4, (5,3), stride = 1, padding=(2,1)) 31 | self.pool2 = nn.MaxPool2d((1,2), stride=(1,2)) 32 | self.conv3 = nn.Conv2d(4, 8, (5,3), stride = 1, padding=(2,1)) 33 | 34 | self.fc1 = nn.Linear(8*55*125, 128) 35 | self.fc2 = nn.Linear(128, 2) 36 | 37 | def forward(self, x): 38 | x = self.pool1(F.relu(self.conv1(torch.squeeze(x,1)))) 39 | x = self.pool2(F.relu(self.conv2(x))) 40 | x = self.pool2(F.relu(self.conv3(x))) 41 | x = x.view(-1, 8*55*125) 42 | x = F.relu(self.fc1(x)) 43 | x = self.fc2(x) 44 | return x 45 | 46 | 47 | #========Training the model ===================================== 48 | # ds_train is the training dataset loader 49 | # ds_test is the testing dataset loader 50 | def train_model(ds_train, ds_test): 51 | net = Net() 52 | # Cross Entropy Loss is used for classification 53 | criterion = nn.CrossEntropyLoss() 54 | optimizer = optim.AdamW(net.parameters(), lr = 2e-5) 55 | num_epoch = 80 56 | 57 | losses = [] 58 | accs = [] 59 | for epoch in range(num_epoch): # loop over the dataset multiple times 60 | running_loss = 0.0 61 | epoch_loss = 0.0 62 | for i, (_x, _y) in enumerate(ds_train): 63 | 64 | optimizer.zero_grad() # zero the gradients on each pass before the update 65 | 66 | #========forward pass===================================== 67 | outputs = net(_x.unsqueeze(1)) 68 | 69 | loss = criterion(outputs, _y) 70 | 71 | #=======backward pass===================================== 72 | loss.backward() # backpropagate the loss through the model 73 | optimizer.step() # update the gradients w.r.t the loss 74 | 75 | running_loss += loss.item() 76 | epoch_loss += loss.item() 77 | if i % 10 == 9: # print every 10 mini-batches 78 | print('[%d, %5d] loss: %.3f' % 79 | (epoch + 1, i + 1, running_loss / 10)) 80 | running_loss = 0.0 81 | 82 | # For each epoch, monitor test loss to ensure we are not overfitting 83 | test_loss = 0.0 84 | correct = 0 85 | total = 0 86 | 87 | with torch.no_grad(): 88 | for i, (_x, _y) in enumerate(ds_test): 89 | outputs = net(_x.unsqueeze(1)) 90 | loss = criterion(outputs,_y) 91 | test_loss += loss.item() 92 | 93 | _, predicted = torch.max(outputs.data,1) 94 | total += _y.size(0) 95 | correct += (predicted == _y).sum().item() 96 | 97 | print('[epoch %d] test loss: %.3f training loss: %.3f' % 98 | (epoch + 1, 
test_loss / len(ds_test), epoch_loss / len(ds_train))) 99 | 100 | print('Finished Training') 101 | return net 102 | 103 | #========Testing the model ===================================== 104 | # ds is the testing dataset 105 | # ds_loader is the testing dataset loader 106 | # net is the trained network 107 | def test_model(ds,ds_loader, net): 108 | criterion = nn.CrossEntropyLoss() 109 | test_no = len(ds) 110 | batch_size=32 111 | 112 | # precision and recall values for classification threshold, thre 113 | thre = torch.arange(0,1.01,0.05) 114 | thre_no = len(thre) 115 | true_p = torch.zeros(thre_no) 116 | false_p =torch.zeros(thre_no) 117 | false_n = torch.zeros(thre_no) 118 | true_n = torch.zeros(thre_no) 119 | 120 | y_hat = torch.zeros(test_no,2) 121 | y_ori = torch.zeros(test_no) 122 | y_pre = torch.zeros(test_no) 123 | with torch.no_grad(): 124 | for i, (_x, _y) in enumerate(ds_loader): 125 | outputs = net(_x.unsqueeze(1)) 126 | 127 | # view output as probability and set classification threshold 128 | prob = F.softmax(outputs,1) 129 | 130 | for j in range(thre_no): 131 | pred_threshold = (prob>thre[j]).float() 132 | predicted = pred_threshold[:,1] 133 | 134 | for m in range(len(_y)): 135 | if _y[m] == 1. and pred_threshold[m,1] == 1.: 136 | true_p[j] += 1. 137 | if _y[m] == 0. and pred_threshold[m,1] == 1.: 138 | false_p[j] += 1. 139 | if _y[m] == 1. and pred_threshold[m,1] == 0.: 140 | false_n[j] += 1. 141 | if _y[m] == 0. and pred_threshold[m,1] == 0.: 142 | true_n[j] += 1. 143 | 144 | y_hat[batch_size*i:batch_size*(i+1),:] = outputs 145 | y_ori[batch_size*i:batch_size*(i+1)] = _y 146 | y_pre[batch_size*i:batch_size*(i+1)] = predicted 147 | 148 | print("Threshold, Accuracy, Precision, Recall, TPR, FPR, FScore") 149 | for j in range(thre_no): 150 | acc = 100*(true_p[j]+true_n[j])/(true_p[j]+true_n[j]+false_p[j]+false_n[j]) 151 | 152 | if (true_p[j]+false_p[j]) > 0.: 153 | pre = 100*true_p[j]/(true_p[j]+false_p[j]) 154 | else: 155 | pre = 100*torch.ones(1) 156 | 157 | if (true_p[j]+false_n[j]) > 0.: 158 | rec = 100*true_p[j]/(true_p[j]+false_n[j]) 159 | else: 160 | rec = 100*torch.ones(1) 161 | 162 | tpr = 100*true_p[j]/(true_p[j]+false_n[j]) 163 | fpr = 100*false_p[j]/(false_p[j]+true_n[j]) 164 | fscore = 2*pre*rec/(pre+rec) 165 | print(" %.2f, %.2f, %.2f, %.2f, %.2f, %.2f, %.2f" %(thre[j].item(),acc.item(),pre.item(),rec.item(),tpr.item(),fpr.item(),fscore)) 166 | 167 | 168 | 169 | if __name__ == "__main__": 170 | 171 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 172 | 173 | # Prepare the training dataset and loader 174 | # path is where the preprocessed training event data is housed 175 | ds_train = PrepareData(path = '/Volumes/jd/data.hawaii/pts/detect_train_data_sortedAbs50s.pt') 176 | ds_train_loader = DataLoader(ds_train, batch_size=32, shuffle=True) 177 | 178 | # Prepare the testing dataset and loader 179 | # path is where the preprocessed test event data is housed 180 | ds_test = PrepareData(path = '/Volumes/jd/data.hawaii/pts/detect_test_data_sortedAbs50s.pt') 181 | ds_test_loader = DataLoader(ds_test, batch_size=32, shuffle=True) 182 | net = train_model(ds_train_loader, ds_test_loader) 183 | 184 | # detect_net_path is where we will store our trained model 185 | detect_net_path = './SeisConvNetDetect_sortedAbs50s.pth' 186 | torch.save(net.state_dict(), detect_net_path) 187 | 188 | # Analyze our final model on the testing dataset 189 | accuracy = test_model(ds_test, ds_test_loader, net) 190 | 191 | 192 | 193 | 
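#======== Optional shape check (illustrative sketch) =====================================
# Synthetic random data only, no real waveforms: confirms that a (3, 55, 2500)
# input window is reduced by the conv/pool stack above to 8*55*125 features and
# mapped to one [noise, earthquake] logit pair per window.
def shape_check():
    net = Net()
    dummy = torch.randn(2, 3, 55, 2500)     # two random windows
    logits = net(dummy.unsqueeze(1))        # forward() squeezes the extra dim back out
    assert logits.shape == (2, 2)
    return F.softmax(logits, dim=1)         # per-window class probabilities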
-------------------------------------------------------------------------------- /generate_detect_data_3cRand.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | from obspy import read 4 | import os 5 | import matplotlib.pyplot as plt 6 | import pdb 7 | import torch 8 | 9 | from sklearn.model_selection import train_test_split 10 | 11 | #========Load raw trace data===================================== 12 | # Load trace data given a SAC file 13 | def load_data(filename, see_stats=False, bandpass=False): 14 | st = read(filename) 15 | tr = st[0] 16 | if bandpass: 17 | tr.filter(type='bandpass', freqmin=5.0, freqmax=40.0) 18 | tr.taper(0.02) 19 | if see_stats: 20 | print(tr.stats) 21 | tr.plot() 22 | return tr 23 | 24 | #========Find all events===================================== 25 | # Inputs a path and returns all events (directories) is list 26 | def find_all_events(path): 27 | dirs = [] 28 | for r, d, f in os.walk(path): 29 | for name in d: 30 | dirs.append(os.path.join(r, name)) 31 | return dirs 32 | 33 | def find_all_SAC(path): 34 | files = [] 35 | for r, d, f in os.walk(path): 36 | for file in f: 37 | if '.SAC' in file: 38 | files.append(os.path.join(r, file)) 39 | return files 40 | 41 | #========Get all station traces for a given event===================================== 42 | # Given a path, find all station traces. If a station did not record an event, zero-fill 43 | def get_event(path, station_no, showPlots=False): 44 | sample_size = 2500 45 | channel_size = 3 46 | event_array = torch.zeros(channel_size, station_no,sample_size) 47 | sorted_event_array = torch.zeros(channel_size, station_no,sample_size) 48 | max_amp_idx = np.ones(station_no) * sample_size 49 | snr = [] 50 | ii=0 51 | for r, d, f in os.walk(path): 52 | if f == []: 53 | return [] 54 | for filename in sorted(f): 55 | i = ii // 3 56 | j = ii % 3 57 | tr = load_data(os.path.join(r,filename), False) 58 | if len(tr.data) < sample_size: 59 | print('ERROR '+filename+' '+str(len(tr.data))) 60 | else: 61 | event_array[j,i,:] = torch.from_numpy(tr.data[:sample_size]) 62 | peak_amp = max(abs(event_array[j,i,:])) 63 | if tr.stats['network'] != 'FG': 64 | event_array[j,i,:] = event_array[j,i,:] / peak_amp 65 | if tr.stats['channel'] == 'HHZ' or tr.stats['channel'] == 'EHZ': 66 | max_amp_idx[i] = np.argmax(abs(event_array[j,i,:])).numpy() 67 | else: 68 | event_array[j,i,:] = event_array[j,i,:] * 0 69 | ii+=1 70 | if i == station_no: 71 | break 72 | 73 | # sort traces in order of when their maximum amplitude arrives 74 | idx = np.argsort(max_amp_idx) 75 | sorted_event_array = event_array[:,idx,:] 76 | 77 | # sorted and absolute 78 | event_array = abs(sorted_event_array) 79 | 80 | # Include option to visualize traces for each event 81 | if (showPlots): 82 | fig, axs = plt.subplots(station_no, sharey="col") 83 | fig.suptitle(path) 84 | for i in range(station_no): 85 | axs[i].plot(sorted_event_array[2,i,:]) 86 | axs[i].axis('off') 87 | plt.show() 88 | return event_array 89 | 90 | 91 | if __name__ == "__main__": 92 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 93 | pos_path = "/Volumes/jd/data.hawaii/data_prepared_FCreviewedRand50s" 94 | neg_path = "/Volumes/jd/data.hawaii/data_prepared_noise50s" 95 | sample_size = 2500 96 | pos_dirs = find_all_events(pos_path) 97 | neg_dirs = find_all_events(neg_path) 98 | print(len(pos_dirs)) # number of earthquake events 99 | print(len(neg_dirs)) # number of noise events 100 | station_no = 
55 101 | channel_size=3 102 | X_all = torch.zeros(len(pos_dirs)+len(neg_dirs), channel_size, station_no, sample_size) 103 | y_all = torch.zeros(len(pos_dirs)+len(neg_dirs)) 104 | 105 | for i,dirname in enumerate(pos_dirs): 106 | print(dirname) 107 | event_array = get_event(dirname, station_no) 108 | X_all[i,:,:] = event_array 109 | y_all[i] = torch.tensor(1) 110 | 111 | for i,dirname in enumerate(neg_dirs): 112 | print(dirname) 113 | event_array = get_event(dirname, station_no) 114 | X_all[i+len(pos_dirs),:,:] = event_array 115 | y_all[i+len(pos_dirs)] = torch.tensor(0) 116 | 117 | # Split all data randomly into a 75-25 training/test set 118 | X_train, X_test, y_train, y_test = train_test_split(X_all, y_all, test_size = 0.25, random_state=42) 119 | 120 | # Save all processed data into training and test files 121 | torch.save((X_train, y_train), '/Volumes/jd/data.hawaii/pts/detect_train_data_sortedAbs50s.pt') 122 | torch.save((X_test, y_test), '/Volumes/jd/data.hawaii/pts/detect_test_data_sortedAbs50s.pt') -------------------------------------------------------------------------------- /generate_event_array3c4d.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | from obspy import read 4 | import os 5 | import matplotlib.pyplot as plt 6 | import matplotlib.colors as colors 7 | import pdb 8 | import torch 9 | import torchvision 10 | import torchvision.transforms as transforms 11 | import torch.nn as nn 12 | import torch.nn.functional as F 13 | import torch.optim as optim 14 | from torch.utils.data import Dataset, DataLoader 15 | 16 | from sklearn.model_selection import train_test_split 17 | 18 | #========Load raw trace data===================================== 19 | # Load trace data given a SAC file 20 | def load_data(filename, see_stats=False, bandpass=False): 21 | st = read(filename) 22 | tr = st[0] 23 | if bandpass: 24 | tr.filter(type='bandpass', freqmin=5.0, freqmax=40.0) 25 | tr.taper(0.2) 26 | if see_stats: 27 | print(tr.stats) 28 | tr.plot() 29 | return tr 30 | 31 | #========Find all events===================================== 32 | # Inputs a path and returns all events (directories) is list 33 | def find_all_events(path): 34 | dirs = [] 35 | for r, d, f in os.walk(path): 36 | for name in d: 37 | dirs.append(os.path.join(r, name)) 38 | return dirs 39 | 40 | 41 | #========Get all station traces for a given event===================================== 42 | # Given a path, find all station traces. 
If a station did not record an event, zero-fill 43 | def get_event(path, station_no, showPlots=False): 44 | sample_size = 2500 45 | channel_size = 3 46 | event_array = torch.zeros(channel_size,station_no,sample_size) 47 | max_amp_idx = [] 48 | max_amp = [] 49 | snr = [] 50 | ii=0 51 | for r, d, f in os.walk(path): 52 | if f == []: 53 | return [] 54 | for filename in sorted(f): 55 | i = ii // 3 56 | j = ii % 3 57 | tr = load_data(os.path.join(r,filename), False) 58 | if len(tr.data) < sample_size: 59 | print('ERROR '+filename+' '+str(len(tr.data))) 60 | else: 61 | event_array[j,i,:] = torch.from_numpy(tr.data[:sample_size]) 62 | peak_amp = max(abs(event_array[j,i,:])) 63 | event_array[j,i,:] = event_array[j,i,:] / peak_amp 64 | 65 | # FG stands for funcgen, a random time series generated by SAC 66 | if tr.stats['network'] == 'FG': 67 | event_array[j,i,:] = event_array[j,i,:] * 0 68 | 69 | ii+=1 70 | if i == station_no: 71 | break 72 | 73 | # Include option to visualize traces for each event 74 | if (showPlots): 75 | fig, axs = plt.subplots(station_no, sharey="col") 76 | fig.suptitle(path) 77 | for i in range(station_no): 78 | axs[i].plot(event_array[2,i,:]) 79 | axs[i].axis('off') 80 | plt.show() 81 | return event_array 82 | 83 | #========Get the true coordinates for each event===================================== 84 | # Given a directory name with event information, return event coordinates 85 | def get_coords(dirname): 86 | # Includes path to CSV that includes all earthquake information 87 | earthquake_df = pd.read_csv('/Users/yshen/Proj.ML/EQinfo/Lin_2020_reloc_withTBO.csv') 88 | 89 | uniqueID = dirname[-17:] 90 | earthquake_df = earthquake_df.set_index(earthquake_df['evtCutID'].str[:17]) 91 | match = uniqueID 92 | return torch.tensor([earthquake_df['latitude'][match], earthquake_df['longitude'][match], earthquake_df['depth'][match], earthquake_df['timeBeforeOri'][match]]) 93 | 94 | if __name__ == "__main__": 95 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 96 | 97 | pos_path = "/Volumes/jd/data.hawaii/data_prepared_LinReloc" 98 | 99 | sample_size = 2500 100 | pos_dirs = find_all_events(pos_path) 101 | print(len(pos_dirs)) 102 | station_no = 55 103 | channel_size = 3 104 | 105 | X_all = torch.zeros(len(pos_dirs),channel_size,station_no, sample_size) 106 | y_all = torch.zeros(len(pos_dirs),4) 107 | 108 | for i,dirname in enumerate(pos_dirs): 109 | print(dirname) 110 | event_coordref = (19.5,-155.5,0.0,0.0) 111 | event_norm = (1.0,1.0,50.0,10.0) 112 | 113 | event_array = get_event(dirname, station_no) 114 | event_coordinates = get_coords(dirname) 115 | 116 | # normalize location and time 117 | event_coordinates = np.subtract(event_coordinates, event_coordref) 118 | event_coordinates = np.divide(event_coordinates, event_norm) 119 | X_all[i,:,:,:] = event_array 120 | y_all[i,:] = event_coordinates 121 | X_train, X_test, y_train, y_test = train_test_split(X_all, y_all, test_size = 0.25, random_state=42) 122 | 123 | torch.save((X_train, y_train), '/Volumes/jd/data.hawaii/pts/train_data3c4d_NotAbs2017Mcut50sLin.pt') 124 | torch.save((X_test, y_test), '/Volumes/jd/data.hawaii/pts/test_data3c4d_NotAbs2017Mcut50sLin.pt') 125 | 126 | 127 | 128 | -------------------------------------------------------------------------------- /models/SeisConvNetDetect_sortedAbs50s.pth: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/seismolab/ArrayConvNet/0e6efea258a07bff99b4fb9625e086376ddfe00a/models/SeisConvNetDetect_sortedAbs50s.pth -------------------------------------------------------------------------------- /models/SeisConvNetLoc_NotAbs2017Mcut50s.pth: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seismolab/ArrayConvNet/0e6efea258a07bff99b4fb9625e086376ddfe00a/models/SeisConvNetLoc_NotAbs2017Mcut50s.pth -------------------------------------------------------------------------------- /predict_location3c4d.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import os 3 | import matplotlib.pyplot as plt 4 | import pdb 5 | import torch 6 | import torch.nn as nn 7 | import torch.nn.functional as F 8 | import torch.optim as optim 9 | from torch.utils.data import Dataset, DataLoader 10 | import geopy.distance 11 | from geopy import distance 12 | 13 | #========Preparing datasets for PyTorch DataLoader===================================== 14 | # Custom data pre-processor to transform X and y from numpy arrays to torch tensors 15 | class PrepareData(Dataset): 16 | def __init__(self, path): 17 | self.X, self.y = torch.load(path) 18 | 19 | def __len__(self): 20 | return self.X.shape[0] 21 | 22 | def __getitem__(self, idx): 23 | return self.X[idx], self.y[idx] 24 | 25 | #========Network architecture===================================== 26 | class Net(nn.Module): 27 | def __init__(self): 28 | super(Net, self).__init__() 29 | # conv2d takes C_in, C_out, kernel size, stride, padding 30 | # input array (3,55,2500) 31 | self.conv1 = nn.Conv2d(3, 4, (1,9), stride = 1, padding=(0,4)) 32 | self.pool1 = nn.MaxPool2d((1,5), stride=(1,5)) 33 | self.conv2 = nn.Conv2d(4, 4, (5,3), stride = 1, padding=(2,1)) 34 | self.pool2 = nn.MaxPool2d((1,2), stride=(1,2)) 35 | self.conv3 = nn.Conv2d(4, 8, (5,3), stride = 1, padding=(2,1)) 36 | 37 | self.fc1 = nn.Linear(8*55*125, 128) 38 | self.fc2 = nn.Linear(128, 4) 39 | 40 | def forward(self, x): 41 | x = self.pool1(F.relu(self.conv1(torch.squeeze(x,1)))) 42 | x = self.pool2(F.relu(self.conv2(x))) 43 | x = self.pool2(F.relu(self.conv3(x))) 44 | x = x.view(-1, 8*55*125) 45 | x = F.relu(self.fc1(x)) 46 | x = self.fc2(x) 47 | return x 48 | 49 | 50 | #========Training the model ===================================== 51 | # ds_train is the training dataset loader 52 | # ds_test is the testing dataset loader 53 | def train_model(ds_train, ds_test): 54 | net = Net() 55 | criterion = nn.MSELoss() 56 | optimizer = optim.AdamW(net.parameters(), lr = 5e-5) 57 | num_epoch = 80 58 | 59 | losses = [] 60 | accs = [] 61 | for epoch in range(num_epoch): # loop over the dataset multiple times 62 | running_loss = 0.0 63 | epoch_loss = 0.0 64 | for i, (_x, _y) in enumerate(ds_train): 65 | 66 | optimizer.zero_grad() # zero the gradients on each pass before the update 67 | 68 | #========forward pass===================================== 69 | outputs = net(_x.unsqueeze(1)) 70 | loss = criterion(outputs, _y) 71 | # acc = tr.eq(outputs.round(), _y).float().mean() # accuracy 72 | # print(loss.item()) 73 | 74 | #=======backward pass===================================== 75 | loss.backward() # backpropagate the loss through the model 76 | optimizer.step() # update the gradients w.r.t the loss 77 | 78 | running_loss += loss.item() 79 | epoch_loss += loss.item() 80 | if i % 10 == 9: # print running_loss for every 10 mini-batches 81 | print('[%d, %5d] loss: %.4f' % 82 | 
(epoch + 1, i + 1, running_loss / 10)) 83 | running_loss = 0.0 84 | 85 | test_loss = 0.0 86 | with torch.no_grad(): 87 | for i, (_x, _y) in enumerate(ds_test): 88 | outputs = net(_x.unsqueeze(1)) 89 | loss = criterion(outputs,_y) 90 | test_loss += loss.item() 91 | print('[epoch %d] test loss: %.4f training loss: %.4f' % 92 | (epoch + 1, test_loss / len(ds_test), epoch_loss / len(ds_train))) 93 | 94 | print('Finished Training') 95 | return net 96 | 97 | #========Get the distance between two points===================================== 98 | # true is a list of true locations 99 | # predicted is a list of predicted locations 100 | # Returns in a list of distances between each true/predicted point, in km 101 | def dist_list(true, predicted): 102 | dist_list = np.zeros((predicted.shape[0])) 103 | for i in range(predicted.shape[0]): 104 | origin = (true[i,0], true[i,1]) 105 | dest = (predicted[i,0], predicted[i,1]) 106 | dist_list[i] = distance.distance(origin, dest).km 107 | return dist_list 108 | 109 | #========Testing the model ===================================== 110 | # ds is the testing dataset 111 | # ds_loader is the testing dataset loader 112 | # net is the trained network 113 | def test_model(ds,ds_loader, net): 114 | criterion = nn.MSELoss() 115 | test_no = len(ds) 116 | batch_size=32 117 | print(test_no) 118 | y_hat = np.zeros((test_no,4)) 119 | y_ori = np.zeros((test_no,4)) 120 | accurate = 0 121 | with torch.no_grad(): 122 | for i, (_x, _y) in enumerate(ds_loader): 123 | outputs = net(_x.unsqueeze(1)) 124 | loss = criterion(outputs,_y) 125 | y_hat[batch_size*i:batch_size*(i+1),:] = outputs 126 | y_ori[batch_size*i:batch_size*(i+1),:] = _y 127 | 128 | fig = plt.figure() 129 | ax = fig.add_subplot(111, projection='3d') 130 | 131 | event_coordref = (19.5,-155.5,0,0) 132 | event_norm = (1.0,1.0,50.0,10.) 
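    # (mirrors generate_event_array3c4d.py, where each label was stored as
    #  (value - ref) / norm; the inverse, value = output * norm + ref, is applied below)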
133 | 134 | # first rescale and then shift the earthquake source values, reverse the steps in generate*.py 135 | ds.y = np.multiply(ds.y,event_norm) 136 | y_hat = np.multiply(y_hat,event_norm) 137 | y_ori = np.multiply(y_ori,event_norm) 138 | ds.y = np.add(ds.y,event_coordref) 139 | y_hat = np.add(y_hat,event_coordref) 140 | y_ori = np.add(y_ori,event_coordref) 141 | 142 | for k in range(len(y_hat)): 143 | if y_hat[k,2] < 0: 144 | y_hat[k,2] = 0 # no earthquake in the air (note: HVO catalog ignore topo) 145 | 146 | dist = dist_list(y_ori, y_hat) 147 | for k in range(len(dist)): 148 | print(y_ori[k,0],y_ori[k,1],y_ori[k,2],y_hat[k,0],y_hat[k,1],y_hat[k,2],y_ori[k,3],y_hat[k,3]) 149 | dep_diff = y_ori[:,2] - y_hat[:,2] 150 | print(np.mean(dist), np.std(dist), np.mean(abs(dep_diff)), np.std(dep_diff)) 151 | 152 | # Visualize the results of the predicted location 153 | ax.scatter(ds.y[:,0], ds.y[:,1], -ds.y[:,2], marker='o',label="HVO") 154 | ax.scatter(y_hat[:,0], y_hat[:,1], -y_hat[:,2], marker='^', label="predicted") 155 | ax.set_xlim(18, 20.5) 156 | ax.set_ylim(-154, -157) 157 | ax.set_xlabel('Latitude') 158 | ax.set_ylabel('Longitude') 159 | ax.set_zlabel('Depth') 160 | plt.legend(loc='upper left') 161 | plt.show() 162 | 163 | 164 | if __name__ == "__main__": 165 | 166 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 167 | 168 | # Prepare the training dataset and loader 169 | # path is where the preprocessed training event data is housed, can be replaced with your updated location 170 | ds_train = PrepareData(path = '/Volumes/jd/data.hawaii/pts/train_data3c4d_NotAbs2017Mcut50sLin.pt') 171 | ds_train_loader = DataLoader(ds_train, batch_size=32, shuffle=True) 172 | 173 | # Prepare the testing dataset and loader 174 | # path is where the preprocessed testing event data is housed, can be replaced with your updated location 175 | ds_test = PrepareData(path = '/Volumes/jd/data.hawaii/pts/test_data3c4d_NotAbs2017Mcut50sLin.pt') 176 | ds_test_loader = DataLoader(ds_test, batch_size=32, shuffle=True) 177 | 178 | net = train_model(ds_train_loader, ds_test_loader) 179 | 180 | # predict_path is where we will store our trained model 181 | predict_path = './SeisConvNetLoc_NotAbs2017Mcut50sLin.pth' 182 | torch.save(net.state_dict(), predict_path) 183 | 184 | # Analyze our final model on the testing dataset 185 | accuracy = test_model(ds_test, ds_test_loader, net) 186 | 187 | 188 | 189 | -------------------------------------------------------------------------------- /validate_consecution.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from obspy import read 3 | import os 4 | import matplotlib.pyplot as plt 5 | import pdb 6 | import torch 7 | import torchvision 8 | import torch.nn as nn 9 | import torch.nn.functional as F 10 | import torch.optim as optim 11 | from geopy import distance 12 | import math 13 | 14 | #========Network architecture for detection===================================== 15 | class DetectNet(nn.Module): 16 | def __init__(self): 17 | super(DetectNet, self).__init__() 18 | # conv2d takes C_in, C_out, kernel size, stride, padding 19 | self.conv1 = nn.Conv2d(3, 4, (1,9), stride = 1, padding=(0,4)) 20 | self.pool1 = nn.MaxPool2d((1,5), stride=(1,5)) 21 | self.conv2 = nn.Conv2d(4, 4, (5,3), stride = 1, padding=(2,1)) 22 | self.pool2 = nn.MaxPool2d((1,2), stride=(1,2)) 23 | self.conv3 = nn.Conv2d(4, 8, (5,3), stride = 1, padding=(2,1)) 24 | 25 | self.fc1 = nn.Linear(8*55*125, 128) 26 | self.fc2 = 
nn.Linear(128, 2) 27 | 28 | def forward(self, x): 29 | x = self.pool1(F.relu(self.conv1(x))) 30 | x = self.pool2(F.relu(self.conv2(x))) 31 | x = self.pool2(F.relu(self.conv3(x))) 32 | x = x.view(-1, 8*55*125) 33 | x = F.relu(self.fc1(x)) 34 | x = self.fc2(x) 35 | return x 36 | 37 | #========Network architecture for prediction===================================== 38 | class PredictNet(nn.Module): 39 | def __init__(self): 40 | super(PredictNet, self).__init__() 41 | self.conv1 = nn.Conv2d(3, 4, (1,9), stride = 1, padding=(0,4)) 42 | self.pool1 = nn.MaxPool2d((1,5), stride=(1,5)) 43 | self.conv2 = nn.Conv2d(4, 4, (5,3), stride = 1, padding=(2,1)) 44 | self.pool2 = nn.MaxPool2d((1,2), stride=(1,2)) 45 | self.conv3 = nn.Conv2d(4, 8, (5,3), stride = 1, padding=(2,1)) 46 | 47 | self.fc1 = nn.Linear(8*55*125, 128) 48 | self.fc2 = nn.Linear(128, 4) 49 | 50 | def forward(self, x): 51 | x = self.pool1(F.relu(self.conv1(x))) 52 | x = self.pool2(F.relu(self.conv2(x))) 53 | x = self.pool2(F.relu(self.conv3(x))) 54 | x = x.view(-1, 8*55*125) 55 | x = F.relu(self.fc1(x)) 56 | x = self.fc2(x) 57 | return x 58 | 59 | #========Predict===================================== 60 | # Run prediction model over a window 61 | def predict(window, predictNet): 62 | with torch.no_grad(): 63 | outputs = predictNet(window.unsqueeze(0)) 64 | return outputs 65 | 66 | #========Detect===================================== 67 | # Run detection model over a window 68 | def detect(window, detectNet): 69 | criterion = nn.CrossEntropyLoss() 70 | with torch.no_grad(): 71 | #print(window.unsqueeze(0).shape) 72 | outputs = detectNet(window.unsqueeze(0)) 73 | # outputs is a vector of two elements from CrossEntropyLoss 74 | #print(outputs) 75 | _, detection = torch.max(outputs.data,1) 76 | #return detection, 77 | return detection, outputs 78 | 79 | #========Load raw trace data===================================== 80 | # Load trace data given a SAC file 81 | def load_data(filename, see_stats=False, bandpass=False): 82 | st = read(filename) 83 | tr = st[0] 84 | if bandpass: 85 | tr.filter(type='bandpass', freqmin=5.0, freqmax=40.0) 86 | tr.taper(0.2) 87 | if see_stats: 88 | print(tr.stats) 89 | tr.plot() 90 | return tr 91 | 92 | #========Read a SAC file===================================== 93 | # Load trace data given a SAC file 94 | def read_sac(path): 95 | length = 4320000 96 | channel_size = 3 97 | station_no = 55 98 | event_array = torch.zeros(channel_size, station_no,length) 99 | ii = 0 100 | for r, d, f in os.walk(path): 101 | print(len(f)) 102 | if f == []: 103 | print ("no files") 104 | return [] 105 | 106 | # Load sac files in alphabetically by station 107 | for filename in sorted(f): 108 | #print(filename) 109 | i = ii // 3 # station number 110 | j = ii % 3 # channel 111 | tr = load_data(os.path.join(r,filename), False) 112 | if tr.stats['network'] == 'FG': 113 | print('FG') 114 | else: 115 | event_array[j,i,:] = torch.from_numpy(tr.data) 116 | ii +=1 117 | return event_array 118 | 119 | #========Return all days with seismic traces===================================== 120 | # Each directory corresponds to a day 121 | def find_all_days(path): 122 | dirs = [] 123 | for r, d, f in os.walk(path): 124 | for name in d: 125 | dirs.append(os.path.join(r, name)) 126 | return dirs 127 | 128 | #========Parse the year and day===================================== 129 | # Given a path, return the year and day 130 | def parse(path): 131 | dir_name = path.split('/')[-1] 132 | year = dir_name.split('.')[0] 133 | day = dir_name.split('.')[1] 134 
| return year, day 135 | 136 | #========Run detection model over an entire day===================================== 137 | # Detect earthquakes within a day 138 | def daily_detection(day_path, detectNet, predictNet): 139 | sampling_rate = 50 # we collect 50 samples per second 140 | interval = 3 141 | window_n = 2500 # number of samples in the window 142 | n = 4320000 143 | station_no = 55 144 | channels = 3 145 | idx = range(2*window_n, n, sampling_rate*interval) # skip the begining due to tapering effect 146 | 147 | saved_vars = [] 148 | num_detected = 0 149 | 150 | continuous_sac = read_sac(day_path) 151 | skip = False 152 | skipTracker = 0 153 | prob_max = 0 154 | window_max = torch.zeros(channels,station_no,window_n) 155 | peak_med = 40 156 | peak_medmax = 40 157 | peak_ave = 40 158 | ttdiff = 40 159 | det_flag = 0 160 | prob_cutoff = 0.95 161 | prob_det = 0 162 | 163 | for i in idx: 164 | # ensure we get full length windows 165 | if i+window_n > n: 166 | break 167 | if skip: 168 | skipTracker += 1 169 | # at 5 s interval 170 | if skipTracker == 2: 171 | skip = False 172 | continue 173 | window = continuous_sac[:,:,i:i+window_n] # get all channels, all stations within the specific time frame 174 | 175 | window_normalized = torch.zeros(channels, station_no,window_n) 176 | trace_max = np.amax(abs(np.array(window[:,:,:])),axis=2) 177 | max_amp_idx = np.ones(station_no) * window_n 178 | idx = np.ones(3) * window_n 179 | 180 | for k in reversed(range(channels)): 181 | for j in range(station_no): 182 | if trace_max[k,j] != 0: 183 | window_normalized[k,j,:] = window[k,j,:]/trace_max[k,j] 184 | for j in range(station_no): 185 | if trace_max[0,j] != 0: 186 | idx[0] = np.argmax(abs(window_normalized[0, j,:])) 187 | idx[1] = np.argmax(abs(window_normalized[1, j,:])) 188 | idx[2] = np.argmax(abs(window_normalized[2, j,:])) 189 | max_amp_idx[j] = np.median(idx) 190 | else: 191 | max_amp_idx[j] = np.argmax(abs(window_normalized[2, j,:])) 192 | 193 | peak_med = np.median(max_amp_idx)/50 194 | 195 | if peak_med > 7 and peak_med < 43: 196 | sort_idx = np.argsort(max_amp_idx) # sort by index of maximum amplitude 197 | sorted_window = abs(window_normalized[:,sort_idx,:]) 198 | 199 | detection, outputs = detect(sorted_window, detectNet) 200 | 201 | prob_outputs = F.softmax(outputs,1) 202 | prob = prob_outputs[:,1].item() 203 | 204 | # If the probability of an earthquake occuring is greater than the designated threshold, 205 | # run the prediction model 206 | if prob > prob_cutoff: 207 | prediction = predict(window_normalized, predictNet) 208 | event_coordref = (19.5,-155.5,0,0) 209 | event_norm = (1.0,1.0,50.0,10.0) 210 | y_hat = torch.squeeze(prediction) 211 | y_hat = np.multiply(y_hat,event_norm) 212 | y_hat = np.add(y_hat,event_coordref) 213 | 214 | if prob >= prob_max and y_hat[3].item() > 4 and y_hat[3].item() < 9 and y_hat[2].item() > -1.5: 215 | prob_max = prob 216 | i_max = i 217 | window_max = window_normalized 218 | peak_medmax = peak_med 219 | if prob < prob_cutoff and prob_max > prob_cutoff: 220 | det_flag = 1 221 | prob_det = prob_max 222 | prob_max = 0 223 | 224 | if det_flag == 1: 225 | num_detected+=1 226 | prediction = predict(window_max, predictNet) 227 | event_coordref = (19.5,-155.5,0,0) 228 | event_norm = (1.0,1.0,50.0,10.0) 229 | y_hat = torch.squeeze(prediction) 230 | y_hat = np.multiply(y_hat,event_norm) 231 | y_hat = np.add(y_hat,event_coordref) 232 | 233 | year, day = parse(day_path) 234 | 235 | # 50 sps, 3600 s per hour 236 | ori_time = y_hat[3].item()+i_max/50 237 | hour = 
int(ori_time) // 3600 238 | minute = int(ori_time) % 3600 // 60 239 | second = int(ori_time) % 3600 % 60 240 | timestamp = str(hour) + '.' + str(minute) + '.' + str(second) 241 | 242 | print(year +" " + day + " " +' %s %s %s %.4f %.4f %.2f %.2f %.2f %.5f %.2f' % (hour,minute,second,y_hat[0].item(),y_hat[1].item(),y_hat[2].item(),y_hat[3].item()+i_max/50,i_max/50,prob_det,peak_medmax)) # 50 sps 243 | 244 | saved_vars.append([year, day, hour, minute, second, y_hat[0].item(), y_hat[1].item(),y_hat[2].item(),y_hat[3].item()+i_max/50,i_max/50]) 245 | 246 | skip = True 247 | skipTracker = 0 248 | det_flag = 0 249 | 250 | return num_detected, saved_vars 251 | 252 | if __name__ == "__main__": 253 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 254 | 255 | # Load trained detection model 256 | DETECT_PATH = './SeisConvNetDetect_sortedAbs50s.pth' 257 | detectNet = DetectNet() 258 | detectNet.load_state_dict(torch.load(DETECT_PATH)) 259 | 260 | # Load trained prediction model 261 | PREDICT_PATH = './SeisConvNetLoc_NotAbs2017Mcut50s.pth' 262 | predictNet = PredictNet() 263 | predictNet.load_state_dict(torch.load(PREDICT_PATH)) 264 | 265 | val_path = "/Volumes/jd/data.hawaii/sac_allstns_cont/screened" 266 | f = open("val_consecution_output_test.dat", "w") 267 | 268 | val_dirs = find_all_days(val_path) 269 | for day in val_dirs: 270 | daily_num_detected, daily_outputs = daily_detection(day, detectNet, predictNet) 271 | for output in daily_outputs: 272 | f.write('%s %s %s %s %s %.4f %.4f %.2f %.2f %.2f \n' % (output[0],output[1],output[2],output[3],output[4],output[5],output[6],output[7],output[8],output[9])) # 50 sps 273 | print('Number of detected earthquakes on this day: %.0f' %(daily_num_detected)) 274 | 275 | f.close() --------------------------------------------------------------------------------