├── LICENSE
├── README.md
├── data
│   ├── learning_data.hdf5
│   ├── prediction_data.hdf5
│   └── validation_data.hdf5
├── images
│   ├── prediction.png
│   ├── tmp11.txt
│   └── training.png
├── main
│   ├── prediction_functions.py
│   └── training_functions.py
├── models
│   ├── best_features_postfb.npy
│   ├── best_features_prefb.npy
│   ├── m_1s.h5
│   ├── m_50s_postfb.h5
│   └── m_50s_prefb.h5
├── prediction.ipynb
└── training.ipynb
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2018 mmezyk
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # fbpicker
2 | ## Multi-pattern algorithm for first-break picking employing open-source machine learning libraries
3 | 
4 | Accurate first-break (FB) onsets are crucial in processing seismic data, e.g. for deriving statics solutions or building velocity models via tomography. Ever-increasing volumes of seismic data require automatic algorithms for FB picking. Most existing techniques are based either on simple triggering algorithms that rely on basic properties of seismic traces, or on neural network architectures that are too complex to be easily trained and re-used on another survey. Here we propose a solution that addresses the issue of multi-level analysis using a time-sequence pattern-recognition process implemented with current open-source machine learning libraries such as Keras and Scikit-Learn. We use well-established methods such as STA/LTA, entropy, fractal dimension and higher-order statistics to provide the patterns required for generative model training with artificial neural networks (ANN), Support Vector Regression (SVR) and an implementation of gradient-boosted decision trees, XGBoost (Extreme Gradient Boosting). FB picking is cast as a binary classification task that requires a model to differentiate the FB sample from all other samples in a seismic trace. Our workflow (provided freely as a Jupyter Notebook) is robust, easily adaptable and flexible in terms of adding new pattern generators that might contribute to even better performance, while already-trained models can be saved and re-used on another dataset collected with similar acquisition parameters (e.g., in multi-line surveys). Application to real seismic data showed that models trained on 1000 and 340,100 manually picked FB onsets are able to predict FB on the remaining 470,000 traces with success rates of nearly 90% and 95%, respectively.
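The multi-pattern idea can be sketched in a few lines of Python: every trace is transformed into per-sample attributes (for example an STA/LTA characteristic function), and the manual pick supplies a binary target that equals 1 at the FB sample and 0 everywhere else. The snippet below is a minimal illustration of this framing, not code taken from the notebooks; the synthetic trace, the 2 ms sample interval, the window lengths and the pick at sample 300 are placeholder assumptions, while `classic_sta_lta` comes from ObsPy, which the repository already uses. The full set of pattern generators lives in `main/training_functions.py`.

```python
import numpy as np
from obspy.signal.trigger import classic_sta_lta

def sta_lta_feature(trace, dt=0.002, sta_win=0.01, lta_win=0.05):
    """STA/LTA characteristic function: one attribute value per trace sample."""
    nsta = max(1, int(sta_win / dt))  # short-term window length in samples
    nlta = max(1, int(lta_win / dt))  # long-term window length in samples
    return classic_sta_lta(trace.astype(np.float64), nsta, nlta)

def binary_labels(n_samples, fb_sample):
    """Target vector for binary classification: 1 at the picked FB sample, 0 elsewhere."""
    y = np.zeros(n_samples, dtype=np.int8)
    y[fb_sample] = 1
    return y

# Toy trace: background noise followed by a decaying 30 Hz arrival at sample 300
rng = np.random.default_rng(0)
trace = rng.normal(0.0, 0.05, 1000)
trace[300:] += np.sin(2 * np.pi * 30 * 0.002 * np.arange(700)) * np.exp(-0.004 * np.arange(700))

feature = sta_lta_feature(trace)         # one pattern; the workflow stacks several (entropy, fractal dimension, ...)
labels = binary_labels(len(trace), 300)  # manual pick used as the supervised target
```

In the full workflow several such per-sample attributes are stacked column-wise into a feature matrix and fed to the ANN, SVR and XGBoost classifiers, as shown in **"training.ipynb"**.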
5 | 
6 | This repository contains Jupyter notebooks with sample code and example seismic traces, complementing the research article about automatic first-break picking:
7 | 
8 | https://www.sciencedirect.com/science/article/pii/S0926985119302435
9 | 
10 | You can view **"training.ipynb"** and **"prediction.ipynb"** directly on GitHub, or clone the repository, install the dependencies listed in the notebooks and play with the code locally.
11 | 
12 | # Training
13 | ![Training](./images/training.png "Training")
14 | 
15 | # Prediction
16 | ![Prediction](./images/prediction.png "Prediction")
17 | 
--------------------------------------------------------------------------------
/data/learning_data.hdf5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mmezyk/fbpicker/5bcd2f039c1948d25094a9fdbb182b4fbf95ae1b/data/learning_data.hdf5
--------------------------------------------------------------------------------
/data/prediction_data.hdf5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mmezyk/fbpicker/5bcd2f039c1948d25094a9fdbb182b4fbf95ae1b/data/prediction_data.hdf5
--------------------------------------------------------------------------------
/data/validation_data.hdf5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mmezyk/fbpicker/5bcd2f039c1948d25094a9fdbb182b4fbf95ae1b/data/validation_data.hdf5
--------------------------------------------------------------------------------
/images/prediction.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mmezyk/fbpicker/5bcd2f039c1948d25094a9fdbb182b4fbf95ae1b/images/prediction.png
--------------------------------------------------------------------------------
/images/tmp11.txt:
--------------------------------------------------------------------------------
1 | 
2 | 
--------------------------------------------------------------------------------
/images/training.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mmezyk/fbpicker/5bcd2f039c1948d25094a9fdbb182b4fbf95ae1b/images/training.png
--------------------------------------------------------------------------------
/main/prediction_functions.py:
--------------------------------------------------------------------------------
1 | import numpy as np 2 | import h5py 3 | import os 4 | import sys 5 | import time 6 | from scipy.signal import hann 7 | from obspy.signal import trigger 8 | import theano.ifelse 9 | from keras.models import load_model 10 | from keras.layers import Dense, Dropout, Activation 11 | import matplotlib.pyplot as plt 12 | from scipy.stats import kurtosis,skew,linregress 13 | 14 | def read_hdf5(path,fb=0): 15 | f=h5py.File(path,'a') 16 | dataset=f['/TRACE_DATA/DEFAULT'] 17 | channels=f['/TRACE_DATA/DEFAULT/CHANNEL/'] 18 | offset=f['/TRACE_DATA/DEFAULT/OFFSET/'][:] 19 | cdp=f['/TRACE_DATA/DEFAULT/CDP/'][:] 20 | ftrace=f['/TRACE_DATA/DEFAULT/FTRACE/'] 21 | data=f['/TRACE_DATA/DEFAULT/data_array/'] 22 | gapsize=f['/TRACE_DATA/DEFAULT/GAP_SIZE/'] 23 | shotids=dataset['SHOTID'][:] 24 | user=dataset['USER'] 25 | sourcenum=dataset['SOURCENUM'] 26 | 
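# Remaining trace headers are read below (record number, source/receiver/CDP coordinates); with fb=1 the prediction arrays pred_fb1-3 and pred_avg are returned in the dataset dictionary as well.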
recordnum=dataset['RECORDNUM'] 27 | src_x=dataset['SOURCE_X'] 28 | src_y=dataset['SOURCE_Y'] 29 | cdp_x=dataset['CDP_X'] 30 | cdp_y=dataset['CDP_Y'] 31 | rec_x=dataset['REC_X'] 32 | rec_y=dataset['REC_Y'] 33 | if fb==1: 34 | pred_prefb=f['/pred_fb1'] 35 | pred_postfb=f['/pred_fb2'] 36 | pred_1samp=f['/pred_fb3'] 37 | pred_avg=f['/pred_avg'] 38 | dataset={'data':data,'shotids':shotids,'offset':offset,'cdp':cdp,'gapsize':gapsize,'user':user,'src_x':src_x,'src_y':src_y,'cdp_x':cdp_x,'cdp_y':cdp_y,'snum':sourcenum,'rnum':recordnum,'ftrc':ftrace,'rec_x':rec_x,'rec_y':rec_y,'chan':channels,'pred_prefb':pred_prefb,'pred_postfb':pred_postfb,'pred_1samp':pred_1samp,'pred_avg':pred_avg} 39 | else: 40 | dataset={'data':data,'shotids':shotids,'offset':offset,'cdp':cdp,'gapsize':gapsize,'user':user,'src_x':src_x,'src_y':src_y,'cdp_x':cdp_x,'cdp_y':cdp_y,'snum':sourcenum,'rnum':recordnum} 41 | print('Data loaded') 42 | return dataset 43 | 44 | def add_hdrs(path): 45 | file=h5py.File(path,'a') 46 | data=file['/TRACE_DATA/DEFAULT/data_array/'] 47 | try: 48 | file.create_dataset('/pred_fb1',(len(data),len(data[0]))) 49 | file.create_dataset('/pred_fb2',(len(data),len(data[0]))) 50 | file.create_dataset('/pred_fb3',(len(data),len(data[0]))) 51 | file.create_dataset('/pred_avg',(len(data),len(data[0]))) 52 | file['/pred_fb1'][:]=0 53 | file['/pred_fb2'][:]=0 54 | file['/pred_fb3'][:]=0 55 | file['/pred_avg'][:]=0 56 | file.close() 57 | print('Data arrays pred_fb1, pred_fb2, pred_fb3, pred_avg added to the data structure') 58 | except: 59 | file['/pred_fb1'][:]=0 60 | file['/pred_fb2'][:]=0 61 | file['/pred_fb3'][:]=0 62 | file['/pred_avg'][:]=0 63 | file.close() 64 | print('Data arrays pred_fb1, pred_fb2, pred_fb2, pred_avg already exist') 65 | 66 | def plot(dataset,trcids,fsamp,lsamp): 67 | ntrcs=len(trcids) 68 | fig,axes=plt.subplots(ncols=4,nrows=ntrcs,sharey=True,sharex=True,figsize=(25,4*ntrcs)) 69 | for i,j in enumerate(trcids): 70 | fb=dataset['gapsize'][j]/2 71 | axes[i,0].set_title('Single\nsample model',fontsize=15) 72 | axes[i,0].plot(dataset['pred_1samp'][j][fsamp:lsamp],c='red',linewidth=1,label='Probability\ndistribution') 73 | axes[i,0].plot(dataset['data'][j][fsamp:500]/np.amax(np.abs(dataset['data'][j][fsamp:lsamp])),c='black',linewidth=1,label='Seismic trace') 74 | axes[i,0].plot([fb,fb],[-1,0],'--',c='blue',label='Reference first-break sample') 75 | axes[i,0].set_ylim((-1,1)) 76 | axes[i,1].set_ylim((-1,1)) 77 | axes[i,2].set_ylim((-1,1)) 78 | axes[i,3].set_ylim((-1,1)) 79 | axes[i,0].set_xlim((fsamp,lsamp)) 80 | axes[i,1].set_xlim((fsamp,lsamp)) 81 | axes[i,2].set_xlim((fsamp,lsamp)) 82 | axes[i,3].set_xlim((fsamp,lsamp)) 83 | axes[i,0].set_ylabel('Probability',fontsize=15) 84 | axes[i,0].set_xlabel('Samples',fontsize=15) 85 | axes[i,0].legend() 86 | axes[i,2].set_title('50 sample\npost-FB model',fontsize=15) 87 | axes[i,2].plot(dataset['pred_postfb'][j][fsamp:lsamp],c='red',linewidth=1) 88 | axes[i,2].plot(dataset['data'][j][fsamp:lsamp]/np.amax(np.abs(dataset['data'][j][fsamp:lsamp])),c='black',linewidth=1) 89 | axes[i,2].plot([fb,fb],[-1,0],'--',c='blue',label='Reference first-break sample') 90 | axes[i,1].set_title('50 sample\npre-FB model',fontsize=15) 91 | axes[i,1].plot(dataset['pred_prefb'][j][fsamp:lsamp],c='red',linewidth=1) 92 | axes[i,1].plot(dataset['data'][j][fsamp:lsamp]/np.amax(np.abs(dataset['data'][j][fsamp:lsamp])),c='black',linewidth=1) 93 | axes[i,1].plot([fb,fb],[-1,0],'--',c='blue',label='Reference first-break sample') 94 | axes[i,3].set_title('Models 
average',fontsize=15) 95 | axes[i,3].plot(dataset['pred_avg'][j][fsamp:lsamp],c='red',linewidth=1) 96 | axes[i,3].plot(dataset['data'][j][fsamp:lsamp]/np.amax(np.abs(dataset['data'][j][fsamp:lsamp])),c='black',linewidth=1) 97 | axes[i,3].plot([fb,fb],[-1,0],'--',c='blue',label='Reference first-break sample') 98 | plt.tight_layout() 99 | 100 | class fbpicker(): 101 | """ 102 | A class used to perfom automatic first-break picking on a given dataset. 103 | 104 | ... 105 | 106 | Attributes 107 | ---------- 108 | dataset : dict 109 | the seismics stored and organized in a Hierarchical Data Format (HDF) 110 | models : ndarray 111 | the first-break models stacked together in a single container 112 | (1 sample model, pre-FB model, post-FB model) 113 | min_offset : int 114 | the smallest offset used for autopicking 115 | max_offset : int 116 | the largest offset used for autopicking 117 | scalar_offset : float 118 | the scalar values used for scaling the offsets 119 | first_sample : int 120 | the time sample where autopicker starts 121 | trc_slen : int 122 | the number of samples being analyzed 123 | dt : int 124 | the sampling interval 125 | features_prefb : list 126 | the features list obtained for the pre-FB model through a selection process 127 | features_postfb : list 128 | the features list obtained for the post-FB model through a selection process 129 | 130 | Methods 131 | ------- 132 | entropy(trc,e_step) 133 | Calculates trace entropy 134 | fdm(trc,fd_step,lags,noise_scalar) 135 | Transforms trace into fractal dimension 136 | amp_spectrum(data,dt,single=1) 137 | Calculates amplitude spectrum 138 | norm(data) 139 | Performs simple normalization 140 | fq_win_sum(data,hwin,dt) 141 | Calculates summation of amplitude spectra of a data slice 142 | kurtosis_skewness(data,hwin) 143 | Calculates higher order statistics: kurtosis & skewness 144 | trc_fgen_prefb(trc,dt,nspad=200,hwin=150,vlen=51) 145 | Constructs feature matrices for the pre-FB model 146 | trc_fgen_postfb(trc,dt,hwin=150,vlen=51) 147 | Constructs feature matrices for the post-FB model 148 | trc_prep(m,ds,s,mode=2) 149 | Resizes a given matrix by down-sampling (Mode 1 - pre-FB, Mode 2 - post-FB) 150 | predict(self) 151 | Performs first-break prediction 152 | 153 | """ 154 | 155 | def __init__(self,dataset,models,min_offset,max_offset,scalar_offset,first_sample,trc_slen,dt,features_prefb,features_postfb): 156 | """ 157 | Parameters 158 | ---------- 159 | dataset : dict 160 | the seismics stored and organized in a Hierarchical Data Format (HDF) 161 | models : ndarray 162 | the first-break models stacked together in a single container 163 | (1 sample model, pre-FB model, post-FB model) 164 | min_offset : int 165 | the smallest offset used for autopicking 166 | max_offset : int 167 | the largest offset used for autopicking 168 | scalar_offset : float 169 | the scalar values used for scaling the offsets 170 | first_sample : int 171 | the time sample where autopicking starts for a seismic trace 172 | trc_slen : int 173 | the number of samples being analyzed 174 | dt : int 175 | the sampling interval 176 | features_prefb : list 177 | the features list obtained for the pre-FB model through a selection process 178 | features_postfb : list 179 | the features list obtained for the post-FB model through a selection process 180 | """ 181 | 182 | self.dataset = dataset 183 | self.models = models 184 | self.min_offset = min_offset 185 | self.max_offset = max_offset 186 | self.scalar_offset = scalar_offset 187 | self.first_sample = first_sample 188 
| self.trc_slen = trc_slen 189 | self.dt = dt 190 | self.features_prefb = features_prefb 191 | self.features_postfb = features_postfb 192 | 193 | def entropy(self,trc,e_step): 194 | """ Calculates trace entropy """ 195 | t=len(trc)-1 196 | trc_out=trc.copy() 197 | trc_out[0:e_step]=0 198 | while t>e_step-1: 199 | trc_win=trc[t-e_step:t+1] 200 | t_win=e_step-1 201 | res=0 202 | while t_win>0: 203 | res+=np.abs(trc_win[t_win-1]-trc_win[t_win]) 204 | t_win-=1 205 | res=np.log10(1/e_step*res) 206 | trc_out[t]=res 207 | t-=1 208 | return trc_out 209 | 210 | def fdm(self,trc,fd_step,lags,noise_scalar): 211 | """ Transforms trace into fractal dimension """ 212 | ress=[] 213 | trc_out=trc/np.amax(np.abs(trc)) 214 | noise=np.random.normal(0,1,len(trc_out))*(np.std(trc_out)/noise_scalar) 215 | trc_out=trc_out+noise 216 | for i,lag in enumerate(lags): 217 | trc_cp=trc_out.copy() 218 | t=len(trc)-1 219 | trc_cp[0:fd_step]=0 220 | while t>fd_step-1: 221 | trc_win=trc_out[t-fd_step:t+1] 222 | t_win=fd_step-1 223 | res=0 224 | while t_win>lag-1: 225 | res+=np.square(trc_win[t_win-lag]-trc_win[t_win]) 226 | t_win-=1 227 | res=np.log10(1/(fd_step-lag)*res) 228 | trc_cp[t]=res 229 | t-=1 230 | if len(ress)==0: 231 | ress=np.reshape(trc_cp,(len(trc_cp),1)) 232 | else: 233 | ress=np.concatenate((ress,np.reshape(trc_cp,(len(trc_cp),1))),axis=1) 234 | for i,j in enumerate(ress): 235 | slope = linregress(lags,ress[i,:])[0] 236 | trc_out[i]=slope 237 | 238 | return trc_out 239 | 240 | def amp_spectrum(self,data,dt,single=1): 241 | """ Calculates amplitude spectrum """ 242 | if single==0: 243 | sp = np.average(np.fft.fftshift(np.fft.fft(data)),axis=1) 244 | else: 245 | sp=np.fft.fftshift(np.fft.fft(data)) 246 | win=np.ones((1,len(data))) 247 | s_mag=np.abs(sp)*2/np.sum(win) 248 | s_dbfs=20*np.log10(s_mag/np.amax(s_mag)) 249 | f = np.fft.fftshift(np.fft.fftfreq(len(data), dt)) 250 | freq=f[np.int(len(data)/2)+1:] 251 | amps=s_dbfs[np.int(len(data)/2)+1:] 252 | return freq,amps 253 | 254 | def norm(self,data): 255 | """ Performs simple normalization """ 256 | return data/np.amax(np.abs(data)) 257 | 258 | def fq_win_sum(self,data,hwin,dt): 259 | """ Calculates summation of amplitude spectra of a data slice """ 260 | data_cp=data.copy() 261 | for k,l in enumerate(data): 262 | if np.logical_and(k>hwin,khwin,k=self.min_offset,np.abs(self.dataset['offset'])/self.scalar_offset=perc)[0] 389 | if len(potential_fbs)!=0: 390 | fbs[itrc]=np.int(potential_fbs[0]) 391 | else: 392 | print('FB was not found for trace id:\t{}'.format(itrc)) 393 | print('Completed') 394 | return fbs 395 | 396 | def find_approx_fb(self,min_offset,max_offset,min_cdp,max_cdp,offset_spacing,n_split=100): 397 | """ Finds the maximum value of a probability distribution that is averaged within selected offset and cdp range """ 398 | fbs=np.zeros((self.dataset['pred_avg'].shape[0],1)) 399 | min_offset=(min_offset+offset_spacing)*self.scalar_offset 400 | max_offset=(max_offset-offset_spacing)*self.scalar_offset 401 | offsets=np.arange(min_offset,max_offset,offset_spacing*self.scalar_offset) 402 | for i,coffset in enumerate(offsets): 403 | print('Working on central offset:\t{}'.format(coffset/self.scalar_offset)) 404 | obin_trcs=np.where(np.logical_and(self.dataset['cdp'][:]<=max_cdp,np.logical_and(self.dataset['cdp'][:]>=min_cdp,np.logical_and(self.dataset['offset'][:]>=coffset-offset_spacing,self.dataset['offset'][:]10: 407 | for k,l in enumerate(tmp1): 408 | tmp0=self.dataset['pred_avg'][list(tmp1[k]),:] 409 | tmp2=np.sum(tmp0,axis=0) 410 | 
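# Locate the sample at which the stacked (summed) probability curve peaks and assign it as the approximate first break to every trace in the current offset/CDP bin.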
tmp2=np.where(tmp2[:]==np.amax(tmp2))[0] 411 | for m,n in enumerate(tmp1[k]): 412 | fbs[n]=np.int(tmp2) 413 | else: 414 | print('Not enough traces in a splitted offset bin') 415 | 416 | def stats(self,reference_fbs,predicted_fbs,allowed_diff): 417 | """ Provides basic statistics on a first-break outcome """ 418 | diff=np.abs(reference_fbs-predicted_fbs) 419 | diff_nonzero=np.where(diff!=0)[0] 420 | mispicked=np.where(diff>allowed_diff)[0] 421 | accuracy=1-len(mispicked)/len(predicted_fbs) 422 | print('Traces analyzed:\t{}\n\ 423 | Allowed sample mismatch:\t{}\n\ 424 | Traces mispicked:\t{}\n\ 425 | Accuracy:\t{} %\n\ 426 | Median sample mismatch:\t{}'.format(len(predicted_fbs),allowed_diff,len(mispicked),round(accuracy*100,1),np.int(np.median(diff[list(diff_nonzero)])))) -------------------------------------------------------------------------------- /main/training_functions.py: -------------------------------------------------------------------------------- 1 | import scipy 2 | import numpy as np 3 | import h5py 4 | import os 5 | import sys 6 | import copy,time 7 | from scipy.fftpack import fft, ifft 8 | import signal 9 | from obspy.signal import filter 10 | from obspy.signal import trigger 11 | from obspy.signal import cpxtrace 12 | from scipy import stats 13 | from obspy.core.trace import Trace 14 | import theano.ifelse 15 | import matplotlib.pyplot as plt 16 | from keras.models import load_model 17 | from keras.utils import plot_model 18 | from sklearn.cross_validation import train_test_split 19 | from scipy.stats import randint as sp_randint 20 | from keras.layers import Dense, Dropout, Activation 21 | from keras.models import Sequential 22 | from keras.wrappers.scikit_learn import KerasClassifier 23 | from sklearn.metrics import confusion_matrix 24 | 25 | def read_hdf5(path,fb=0): 26 | f=h5py.File(path,'a') 27 | dataset=f['/TRACE_DATA/DEFAULT'] 28 | channels=f['/TRACE_DATA/DEFAULT/CHANNEL/'] 29 | offset=f['/TRACE_DATA/DEFAULT/OFFSET/'][:] 30 | cdp=f['/TRACE_DATA/DEFAULT/CDP/'][:] 31 | ftrace=f['/TRACE_DATA/DEFAULT/FTRACE/'] 32 | data=f['/TRACE_DATA/DEFAULT/data_array/'] 33 | gapsize=f['/TRACE_DATA/DEFAULT/GAP_SIZE/'] 34 | shotids=dataset['SHOTID'][:] 35 | user=dataset['USER'] 36 | sourcenum=dataset['SOURCENUM'] 37 | recordnum=dataset['RECORDNUM'] 38 | src_x=dataset['SOURCE_X'] 39 | src_y=dataset['SOURCE_Y'] 40 | cdp_x=dataset['CDP_X'] 41 | cdp_y=dataset['CDP_Y'] 42 | rec_x=dataset['REC_X'] 43 | rec_y=dataset['REC_Y'] 44 | if fb==1: 45 | predictions1=f['/pred_fb1'] 46 | pred_fb_avg=f['/pred_fb_avg'] 47 | predictions2=f['/pred_fb2'] 48 | predictions3=f['/pred_fb3'] 49 | dataset={'data':data,'shotids':shotids,'offset':offset,'cdp':cdp,'predictions1':predictions1,'pred_fb_avg':pred_fb_avg,'gapsize':gapsize,'user':user,'src_x':src_x,'src_y':src_y,'cdp_x':cdp_x,'cdp_y':cdp_y,'snum':sourcenum,'rnum':recordnum,'ftrc':ftrace,'rec_x':rec_x,'rec_y':rec_y,'chan':channels,'predictions2':predictions2,'predictions3':predictions3} 50 | else: 51 | dataset={'data':data,'shotids':shotids,'offset':offset,'cdp':cdp,'gapsize':gapsize,'user':user,'src_x':src_x,'src_y':src_y,'cdp_x':cdp_x,'cdp_y':cdp_y,'snum':sourcenum,'rnum':recordnum} 52 | #pred_fb_avg_mp=f['/pred_fb_avg_mp'] 53 | 54 | print('Data loaded') 55 | return dataset 56 | 57 | def entropy(trc,e_step): 58 | t=len(trc)-1 59 | trc_out=copy.deepcopy(trc) 60 | trc_out[0:e_step]=0 61 | while t>e_step-1: 62 | trc_win=trc[t-e_step:t+1] 63 | t_win=e_step-1 64 | res=0 65 | while t_win>0: 66 | res+=np.abs(trc_win[t_win-1]-trc_win[t_win]) 67 | t_win-=1 68 | 
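# res now holds the sum of absolute sample-to-sample differences within the window; the next line maps it to the entropy-style attribute log10(res/e_step) stored at the window end.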
res=np.log10(1/e_step*res) 69 | trc_out[t]=res 70 | t-=1 71 | return trc_out 72 | 73 | def eps_filter(trc,eps_len): 74 | t=len(trc)-((eps_len-1)/2+1) 75 | trc_out=copy.deepcopy(trc) 76 | while t>0: 77 | tmp1=[] 78 | tmp2=[] 79 | tmp3=[] 80 | t_win=0 81 | while t_win<5: 82 | tmp1=np.append(tmp1,np.arange(t-eps_len-1,t+1)) 83 | t_win+=1 84 | for i,j in enumerate(tmp1): 85 | tmp2=np.append(tmp2,trc[j]) 86 | tmp3=np.append(tmp3,np.std(tmp2[-1])) 87 | tmp4=np.where(tmp3==np.amin(tmp3))[0] 88 | trc_out[t]=np.mean(tmp2[tmp4]) 89 | t-=1 90 | return trc_out 91 | 92 | def fdm(trc,fd_step,lags,noise_scalar): 93 | ress=[] 94 | trc_out=trc/np.amax(np.abs(trc)) 95 | noise=np.random.normal(0,1,len(trc_out))*(np.std(trc_out)/noise_scalar) 96 | trc_out=trc_out+noise 97 | for i,lag in enumerate(lags): 98 | trc_cp=copy.deepcopy(trc_out) 99 | t=len(trc)-1 100 | trc_cp[0:fd_step]=0 101 | while t>fd_step-1: 102 | trc_win=trc_out[t-fd_step:t+1] 103 | t_win=fd_step-1 104 | res=0 105 | while t_win>lag-1: 106 | res+=np.square(trc_win[t_win-lag]-trc_win[t_win]) 107 | t_win-=1 108 | res=np.log10(1/(fd_step-lag)*res) 109 | trc_cp[t]=res 110 | t-=1 111 | if len(ress)==0: 112 | ress=np.reshape(trc_cp,(len(trc_cp),1)) 113 | else: 114 | ress=np.concatenate((ress,np.reshape(trc_cp,(len(trc_cp),1))),axis=1) 115 | for i,j in enumerate(ress): 116 | slope = stats.linregress(lags,ress[i,:])[0] 117 | trc_out[i]=slope 118 | 119 | return trc_out 120 | 121 | def amp_spectrum(data,dt=0.004,single=1): 122 | if single==0: 123 | sp = np.average(np.fft.fftshift(np.fft.fft(data)),axis=1) 124 | else: 125 | sp=np.fft.fftshift(np.fft.fft(data)) 126 | win=np.ones((1,len(data))) 127 | s_mag=np.abs(sp)*2/np.sum(win) 128 | s_dbfs=20*np.log10(s_mag/np.amax(s_mag)) 129 | f = np.fft.fftshift(np.fft.fftfreq(len(data), dt)) 130 | freq=f[np.int(len(data)/2)+1:] 131 | amps=s_dbfs[np.int(len(data)/2)+1:] 132 | return freq,amps 133 | 134 | def despike(data,lperc,hperc): 135 | lamp=np.percentile(data,lperc) 136 | hamp=np.percentile(data,hperc) 137 | sample_to_kill=np.where(np.logical_or(data[:]hamp))[0] 138 | data[list(sample_to_kill)]=0 139 | return data 140 | 141 | def norm(data): 142 | return data/np.amax(np.abs(data)) 143 | 144 | def fq_win_sum(data,hwin,dt): 145 | data_cp=data.copy() 146 | for k,l in enumerate(data): 147 | if np.logical_and(k>hwin,k=4: 327 | model.add(Dense(output_dim = neurons, init = 'he_normal', activation = 'relu', input_dim = input_dim)) 328 | model.add(Dense(output_dim = neurons, init = 'he_normal', activation = 'relu')) 329 | model.add(Dense(output_dim = neurons, init = 'he_normal', activation = 'relu')) 330 | model.add(Dense(output_dim = neurons, init = 'he_normal', activation = 'relu')) 331 | model.add(Dense(output_dim = 1, init = 'he_normal', activation = 'sigmoid')) 332 | model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy']) 333 | model_log=model.fit(x_train, y_train, validation_data=(x_test,y_test), batch_size = batch_size, nb_epoch = epochs) 334 | return model,model_log 335 | 336 | def plot_training_log(log1,label1,log2,label2,log3,label3,lossmin,lossmax,valmin,valmax): 337 | fig,axes=plt.subplots(nrows=2,sharex=True,figsize=(10,5)) 338 | axes[0].set_title('Validation test\nrun on unseen dataset',fontsize=20) 339 | axes[0].plot(log1.history['val_loss'],label=label1,c='red',linewidth=2) 340 | axes[0].plot(log2.history['val_loss'],label=label2,c='blue',linewidth=2) 341 | axes[0].plot(log3.history['val_loss'],label=label3,c='green',linewidth=2) 342 | axes[0].grid() 343 | 
axes[0].set_xlim((0,99)) 344 | axes[0].set_ylim((lossmin,lossmax)) 345 | axes[0].set_ylabel('Loss',fontsize=15) 346 | axes[1].plot(log1.history['val_acc'],label=label1,c='red',linewidth=2) 347 | axes[1].plot(log2.history['val_acc'],label=label2,c='blue',linewidth=2) 348 | axes[1].plot(log3.history['val_acc'],label=label3,c='green',linewidth=2) 349 | axes[1].legend(loc=4) 350 | axes[1].grid() 351 | axes[1].set_xlim((0,99)) 352 | axes[1].set_ylim((valmin,valmax)) 353 | axes[1].set_ylabel('Accuracy [%]',fontsize=15) 354 | axes[1].set_xlabel('Epoch',fontsize=15) 355 | plt.tight_layout() 356 | 357 | def model_test(model,test_set,threshold): 358 | x_test = test_set[:,0:-1] 359 | y_test = test_set[:, -1] 360 | y_pred = model.predict(x_test) 361 | y_pred[y_pred>=threshold]=1 362 | y_pred[y_pred" 953 | ] 954 | }, 955 | "metadata": {}, 956 | "output_type": "display_data" 957 | } 958 | ], 959 | "source": [ 960 | "plot_training_log(model_1s_log,'Single sample model',model_10s_prefb_log,'Pre-first-break & 10-sample model',model_10s_postfb_log,'Post-first-break & 10-sample model',00.1,0.15,0.935,0.96)" 961 | ] 962 | }, 963 | { 964 | "cell_type": "markdown", 965 | "metadata": {}, 966 | "source": [ 967 | "Confusion matrices for each derived model" 968 | ] 969 | }, 970 | { 971 | "cell_type": "code", 972 | "execution_count": 51, 973 | "metadata": {}, 974 | "outputs": [ 975 | { 976 | "name": "stdout", 977 | "output_type": "stream", 978 | "text": [ 979 | "Single sample model\n", 980 | "Total predictions: 329726\n", 981 | "True positive: 155595\n", 982 | "True negative: 157714\n", 983 | "False positive: 9268\n", 984 | "False negative: 7149\n", 985 | "Accuracy: 0.9502101745085313\n" 986 | ] 987 | } 988 | ], 989 | "source": [ 990 | "print('Single sample model')\n", 991 | "model_test(model_1s,testset_1s,0.35)" 992 | ] 993 | }, 994 | { 995 | "cell_type": "code", 996 | "execution_count": 50, 997 | "metadata": {}, 998 | "outputs": [ 999 | { 1000 | "name": "stdout", 1001 | "output_type": "stream", 1002 | "text": [ 1003 | "Pre-first-break & 10-sample model\n", 1004 | "Total predictions: 329726\n", 1005 | "True positive: 156182\n", 1006 | "True negative: 158906\n", 1007 | "False positive: 8681\n", 1008 | "False negative: 5957\n", 1009 | "Accuracy: 0.9556055634071926\n" 1010 | ] 1011 | } 1012 | ], 1013 | "source": [ 1014 | "print('Pre-first-break & 10-sample model')\n", 1015 | "model_test(model_10s_prefb,testset_10s_prefb,0.35)" 1016 | ] 1017 | }, 1018 | { 1019 | "cell_type": "code", 1020 | "execution_count": 52, 1021 | "metadata": {}, 1022 | "outputs": [ 1023 | { 1024 | "name": "stdout", 1025 | "output_type": "stream", 1026 | "text": [ 1027 | "Post-first-break & 10-sample model\n", 1028 | "Total predictions: 329726\n", 1029 | "True positive: 157467\n", 1030 | "True negative: 159335\n", 1031 | "False positive: 7396\n", 1032 | "False negative: 5528\n", 1033 | "Accuracy: 0.9608038189284436\n" 1034 | ] 1035 | } 1036 | ], 1037 | "source": [ 1038 | "print('Post-first-break & 10-sample model')\n", 1039 | "model_test(model_10s_postfb,testset_10s_postfb,0.35)" 1040 | ] 1041 | } 1042 | ], 1043 | "metadata": { 1044 | "hide_input": false, 1045 | "kernelspec": { 1046 | "display_name": "Python 3", 1047 | "language": "python", 1048 | "name": "python3" 1049 | }, 1050 | "language_info": { 1051 | "codemirror_mode": { 1052 | "name": "ipython", 1053 | "version": 3 1054 | }, 1055 | "file_extension": ".py", 1056 | "mimetype": "text/x-python", 1057 | "name": "python", 1058 | "nbconvert_exporter": "python", 1059 | "pygments_lexer": 
"ipython3", 1060 | "version": "3.5.4" 1061 | } 1062 | }, 1063 | "nbformat": 4, 1064 | "nbformat_minor": 2 1065 | } 1066 | --------------------------------------------------------------------------------