├── Paper
│   ├── paper.md
│   └── readme.md
├── README.md
└── functions
    ├── Energy_extraction.py
    ├── Entropy_extraction.py
    ├── SpectralEdgeFrequency_Extraction.py
    └── readme.md
--------------------------------------------------------------------------------
/Paper/paper.md:
--------------------------------------------------------------------------------
### Emotion Recognition
1. Zheng W L, Lu B L. Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks[J]. IEEE Transactions on Autonomous Mental Development, 2015, 7(3): 162-175.
2. Zong C, Chetouani M. Hilbert-Huang transform based physiological signals analysis for emotion recognition[C]//2009 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT). IEEE, 2009: 334-339.

### Sleep Stage Classification
1. Ronzhina M, Janoušek O, Kolářová J, et al. Sleep scoring using artificial neural networks[J]. Sleep Medicine Reviews, 2012, 16(3): 251-263.
2. Chapotot F, Becq G. Automated sleep–wake staging combining robust feature extraction, artificial neural network classification, and flexible decision rules[J]. International Journal of Adaptive Control and Signal Processing, 2010, 24(5): 409-423.
3. Ebrahimi F, Mikaeili M, Estrada E, et al. Automatic sleep stage classification based on EEG signals by using neural networks and wavelet packet coefficients[C]//2008 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS). IEEE, 2008: 1151-1154.
4. Losonczi L, Bako L, Brassai S T, et al. Hilbert-Huang transform used for EEG signal analysis[C]//The International Conference Interdisciplinarity in Engineering INTER-ENG. Editura Universitatii "Petru Maior" din Tirgu Mures, 2012: 361.
5. Subasi A, Gursoy M I. EEG signal classification using PCA, ICA, LDA and support vector machines[J]. Expert Systems with Applications, 2010, 37(12): 8659-8666.
--------------------------------------------------------------------------------
/Paper/readme.md:
--------------------------------------------------------------------------------
**This is a collection of useful papers on EEG signal analysis.**

These papers may not be the most cited ones or those published in the top journals/conferences, but I did extract some useful ideas from them.

I have written comments on each paper I have read on my personal website [here](https://billbeatthepeat.github.io/2017/08/24/My-Currently-Reading-Paper/).

I am still a new student in the machine learning field, so there may be mistakes or misunderstandings. Please feel free to contact me with any questions.

Regards.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Python-for-EEG-analysis
Currently working on the analysis of EEG (electroencephalography) signals for the emotion recognition and sleep stage classification problems. This repository gathers the most frequently used packages, functions and papers for EEG analysis in Python.
--------------------------------------------------------------------------------
/functions/Energy_extraction.py:
--------------------------------------------------------------------------------
#############################################################################
# This file is demo code used to extract energy features for EEG analysis.
# The main approach is to rely on existing packages.
# However, these packages are sometimes obscure to use,
# so I wrote this file as a reference for further work on EEG analysis.
#############################################################################




###########################

### Imported packages
####

import os
import pyedflib
import pandas as pd
import numpy as np
import pywt
import plotly as py
import plotly.graph_objs as go
py.offline.init_notebook_mode()
from numpy.fft import fft, ifft, fftfreq
from scipy.signal import welch, iirfilter, filtfilt
from scipy.stats import rv_continuous
from scipy.signal import savgol_filter
import xgboost as xgb
from xgboost.sklearn import XGBClassifier
from sklearn.metrics import accuracy_score
from scipy.signal import find_peaks_cwt
from scipy.interpolate import UnivariateSpline
from scipy.stats import entropy
from scipy.stats import kurtosis
from scipy.stats import skew
import pyeeg
# from spectrum import *
# from pyentrp import entropy as ent



############################

### Read the data from .edf files and the labels from .txt files,
### then concatenate the data and the labels together and
### reshape them into the form (n_samples, length_of_signal).
####

def read_data(filename):

    '''
    Read data from the .edf and .txt files and return them as pd.DataFrames.
    '''

    f = pyedflib.EdfReader('eeg/edfs/' + filename + '.edf')
    headers = pd.DataFrame(f.getSignalHeaders())
    # Keep only the channels sampled at 512 Hz.
    headers_512 = headers[headers.sample_rate == 512][['sample_rate', 'label']]
    # Load the signal of each selected channel.
    data1 = []
    for row in headers_512.itertuples():
        idx = row.Index
        name = row.label
        data1.append(pd.DataFrame(f.readSignal(idx), columns=[name]))
    data = pd.concat(data1, axis=1)
    # Load the sleep-stage labels (one per 30-second epoch).
    label = pd.read_csv('eeg/edfs/' + filename + '.txt', header=None)
    label.columns = ['label']
    return data, label



def data_reshape(data, col):

    '''
    data is the (data, label) tuple returned by read_data.
    col is the name of the input signal channel.
    Returns a pd.DataFrame of shape (n_epochs, 15360),
    i.e. one 30-second epoch (30 s * 512 Hz samples) per row.
    '''

    cols = [col + '_' + x for x in map(str, range(15360))]
    frame = pd.DataFrame(data[0][:len(data[1]) * 30 * 512][col].values.reshape((len(data[1]), 15360)), columns=cols)
    return frame



def map_label(label):

    '''
    Map a label from str to int. W, N1, N2, N3 and R are the sleep stages.
    Returns the integer code of the given stage.
    '''

    label_dic = {'W': 1, 'N1': 2, 'N2': 3, 'N3': 4, 'R': 5}
    return label_dic[label]
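


############################

### Example usage (illustrative sketch). The file name 'subject01' and the
### channel name 'EEG Fpz-Cz' are placeholders, not data shipped with this
### repository; the channel must be one of the 512 Hz labels in the .edf header.
####

def example_load_one_recording(filename='subject01', col='EEG Fpz-Cz'):

    '''
    Read one recording, reshape a single channel into 30-second epochs
    and map the text labels to integers. Returns (epochs, labels).
    '''

    data, label = read_data(filename)
    frame = data_reshape((data, label), col)
    y = label['label'].map(map_label)
    return frame, y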



###########################

### Filter the input signal with a bandpass Butterworth filter.
### Following the literature, we separate the signal into 5 to 7 frequency bands:
### delta, theta, alpha, beta, gamma, low_alpha and high_alpha.
####

def Filter(frame1, fs=512):

    '''
    Filter the signal into the delta, theta, alpha, beta, gamma,
    low_alpha and high_alpha bands.
    Input is the original signal data of shape (n_epochs, n_points).
    Outputs are the filtered time-domain signals for each band.
    '''

    # Band edges in Hz. iirfilter expects digital cutoffs normalized by the
    # Nyquist frequency (fs / 2).
    bands = {'delta': (1.0, 4.0),
             'theta': (4.0, 8.0),
             'alpha': (8.0, 14.0),
             'low_alpha': (8.5, 11.5),
             'high_alpha': (11.5, 15.5),
             'beta': (14.0, 31.0),
             'gamma': (31.0, 50.0)}

    nyq = fs / 2.0
    filtered = {}
    for name, (low, high) in bands.items():
        b, a = iirfilter(1, [low / nyq, high / nyq], btype='bandpass', ftype='butter')
        filtered[name] = filtfilt(b, a, frame1, axis=1)

    return (filtered['delta'], filtered['theta'], filtered['alpha'],
            filtered['beta'], filtered['gamma'],
            filtered['low_alpha'], filtered['high_alpha'])




#######################################

### Calculate the power spectral density (PSD) for each frequency band
### (over the whole epoch) together with their ratio and relative values.
### These may be the most useful and important features for sleep stage classification.
####

def energy_psd(phase, col, delta, theta, alpha, beta, gamma, low_alpha, high_alpha, dfFeature):

    '''
    Input:
        phase: the original signal can be divided into several phases over time;
               this string labels the current phase and is appended to every feature name.
        col: the name of the biological signal channel.
        delta ... high_alpha: the band-filtered signals returned by Filter().
        dfFeature: the pd.DataFrame the features are written into and returned.
    '''

    print("------------------------ phase", phase, "begin:")

    bands = [('delta', delta), ('theta', theta), ('alpha', alpha),
             ('beta', beta), ('gamma', gamma),
             ('low_alpha', low_alpha), ('high_alpha', high_alpha)]

    for name, band in bands:
        band_f, band_psd = welch(band, fs=512, scaling='density', axis=1)
        band_energy = (band_psd * band_psd).sum(axis=1)
        # The first feature of the first phase creates the DataFrame.
        if '_phase_0' == phase and name == 'delta':
            dfFeature = pd.DataFrame(band_energy)
            dfFeature.columns = [col + '_delta_Energy' + phase]
        else:
            dfFeature[col + '_' + name + '_Energy' + phase] = band_energy

    # Total energy over the five classical bands.
    dfFeature[col+'_Energy'+phase] = dfFeature[col+'_delta_Energy'+phase] + dfFeature[col+'_theta_Energy'+phase] + \
        dfFeature[col+'_alpha_Energy'+phase] + dfFeature[col+'_beta_Energy'+phase] + dfFeature[col+'_gamma_Energy'+phase]

    # Ratios between bands.
    dfFeature[col+'Energyratio1'+phase] = dfFeature[col+'_alpha_Energy'+phase] / (dfFeature[col+'_delta_Energy'+phase] + dfFeature[col+'_theta_Energy'+phase])
    dfFeature[col+'Energyratio2'+phase] = dfFeature[col+'_delta_Energy'+phase] / (dfFeature[col+'_theta_Energy'+phase] + dfFeature[col+'_alpha_Energy'+phase])
    dfFeature[col+'Energyratio3'+phase] = dfFeature[col+'_theta_Energy'+phase] / (dfFeature[col+'_delta_Energy'+phase] + dfFeature[col+'_alpha_Energy'+phase])

    # Band energies relative to the total energy.
    dfFeature[col+'Energyrelative1'+phase] = dfFeature[col+'_alpha_Energy'+phase] / dfFeature[col+'_Energy'+phase]
    dfFeature[col+'Energyrelative2'+phase] = dfFeature[col+'_delta_Energy'+phase] / dfFeature[col+'_Energy'+phase]
    dfFeature[col+'Energyrelative3'+phase] = dfFeature[col+'_theta_Energy'+phase] / dfFeature[col+'_Energy'+phase]
    dfFeature[col+'Energyrelative4'+phase] = dfFeature[col+'_beta_Energy'+phase] / dfFeature[col+'_Energy'+phase]
    dfFeature[col+'Energyrelative5'+phase] = dfFeature[col+'_gamma_Energy'+phase] / dfFeature[col+'_Energy'+phase]

    return dfFeature
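


############################

### Example usage (illustrative sketch): band-filter one channel's epochs and
### extract the energy features for a single phase. `frame` is assumed to be
### the (n_epochs, 15360) DataFrame produced by data_reshape(); the channel
### name is a placeholder and '_phase_0' follows the naming convention above.
####

def example_energy_features(frame, col='EEG Fpz-Cz'):

    '''
    Return a DataFrame of band-energy features for one channel.
    '''

    delta, theta, alpha, beta, gamma, low_alpha, high_alpha = Filter(frame.values)
    dfFeature = energy_psd('_phase_0', col, delta, theta, alpha, beta, gamma,
                           low_alpha, high_alpha, None)
    return dfFeature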



##########################################

### Calculate the PSD with small moving windows (6 seconds long, 50% overlap)
### and extract statistics from the resulting PSD series.
####

def small_window_psd(phase, col, delta, theta, alpha, beta, gamma, dfFeature):

    band = [delta, theta, alpha, beta, gamma]
    band_name = ['delta', 'theta', 'alpha', 'beta', 'gamma']

    # A 30 s epoch scanned with 6 s windows shifted by 3 s gives
    # 1 + (30 - 6) // 3 = 9 windows per epoch.
    n_windows = 1 + (30 - 6) // 3

    for signal, signal_name in zip(band, band_name):
        signal = np.asarray(signal)
        print("moving small windows on", signal_name)

        psd_arr = np.zeros((signal.shape[0], n_windows))

        for j in range(n_windows):
            # 6-second segment starting at j * 3 s (samples at 512 Hz).
            segment = signal[:, j*3*512:(j+2)*3*512]
            f, psd = welch(segment, fs=512, scaling='density', nperseg=256, noverlap=128, axis=1)
            psd_arr[:, j] = psd.sum(axis=1)

        psd_arr = pd.DataFrame(psd_arr)
        dfFeature[col+signal_name+'_psd_min'+phase] = psd_arr.min(axis=1)
        dfFeature[col+signal_name+'_psd_max'+phase] = psd_arr.max(axis=1)
        dfFeature[col+signal_name+'_psd_mean'+phase] = psd_arr.mean(axis=1)
        dfFeature[col+signal_name+'_psd_median'+phase] = psd_arr.median(axis=1)
        dfFeature[col+signal_name+'_psd_std'+phase] = psd_arr.std(axis=1)
        dfFeature[col+signal_name+'_psd_var'+phase] = psd_arr.var(axis=1)
        dfFeature[col+signal_name+'_psd_diff_max'+phase] = psd_arr.diff(axis=1).max(axis=1)
        dfFeature[col+signal_name+'_psd_diff_min'+phase] = psd_arr.diff(axis=1).min(axis=1)
        dfFeature[col+signal_name+'_psd_diff_mean'+phase] = psd_arr.diff(axis=1).mean(axis=1)
        dfFeature[col+signal_name+'_psd_diff_std'+phase] = psd_arr.diff(axis=1).std(axis=1)

    return dfFeature
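


############################

### Example usage (illustrative sketch): train a classifier on the extracted
### features. `dfFeature` and `y` are assumed to be the feature frame built by
### energy_psd() / small_window_psd() and the integer labels from map_label();
### the train/test split and the default XGBoost settings are placeholders.
####

def example_classify(dfFeature, y, test_size=0.3):

    '''
    Fit an XGBoost classifier on the epoch features and report its accuracy.
    '''

    from sklearn.model_selection import train_test_split

    # Recent xgboost versions expect class labels starting at 0,
    # so shift the 1-5 sleep-stage codes.
    X_train, X_test, y_train, y_test = train_test_split(
        dfFeature, np.asarray(y) - 1, test_size=test_size, random_state=0)
    clf = XGBClassifier()
    clf.fit(X_train, y_train)
    print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
    return clf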
--------------------------------------------------------------------------------
/functions/Entropy_extraction.py:
--------------------------------------------------------------------------------
#############################################################################
# This file is demo code used to extract entropy features for EEG analysis.
# The main approach is to rely on existing packages.
# However, these packages are sometimes obscure to use,
# so I wrote this file as a reference for further work on EEG analysis.
#############################################################################



###########################

### Imported packages
####

import os
import pyedflib
import pandas as pd
import numpy as np
import pywt
import plotly as py
import plotly.graph_objs as go
py.offline.init_notebook_mode()
from numpy.fft import fft, ifft, fftfreq
from scipy.signal import welch, iirfilter, filtfilt
from scipy.stats import rv_continuous
from scipy.signal import savgol_filter
import xgboost as xgb
from xgboost.sklearn import XGBClassifier
from sklearn.metrics import accuracy_score
from scipy.signal import find_peaks_cwt
from scipy.interpolate import UnivariateSpline
from scipy.stats import entropy
from scipy.stats import kurtosis
from scipy.stats import skew
import pyeeg
# from spectrum import *
# from pyentrp import entropy as ent



def DE(phase, col, delta, theta, alpha, beta, gamma, dfFeature):

    '''
    Differential-entropy-style feature: Shannon entropy of each band-filtered
    signal, computed per epoch with scipy.stats.entropy. The samples are
    squared first so that the values passed to entropy() form a valid
    non-negative distribution (entropy() then normalizes it internally).
    '''

    bands = [('alpha', alpha), ('delta', delta), ('theta', theta),
             ('beta', beta), ('gamma', gamma)]

    for name, band in bands:
        band = np.asarray(band)
        en = np.zeros(band.shape[0])
        for i in range(band.shape[0]):
            en[i] = entropy(band[i] ** 2)
        dfFeature[col + name + '_Entropy' + phase] = en

    # Sample entropy (via pyentrp) could be added per band in the same way, e.g.:
    # std_alpha = np.std(alpha, axis=1)
    # sample_entropy[i] = ent.sample_entropy(alpha[i], 4, 0.2 * std_alpha[i])

    # A simpler alternative is the log of the band energy computed in
    # Energy_extraction.py, e.g.:
    # dfFeature[col+'delta_Entropy'+phase] = np.log(dfFeature[col+'_delta_Energy'+phase])
    # dfFeature[col+'theta_Entropy'+phase] = np.log(dfFeature[col+'_theta_Energy'+phase])
    # dfFeature[col+'alpha_Entropy'+phase] = np.log(dfFeature[col+'_alpha_Energy'+phase])
    # dfFeature[col+'beta_Entropy'+phase] = np.log(dfFeature[col+'_beta_Energy'+phase])
    # dfFeature[col+'gamma_Entropy'+phase] = np.log(dfFeature[col+'_gamma_Energy'+phase])

    return dfFeature
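


############################

### Alternative sketch (not part of the original code): in much of the EEG
### emotion-recognition literature, "differential entropy" denotes the
### closed-form entropy of a Gaussian, h = 0.5 * ln(2*pi*e*sigma^2),
### computed from the per-epoch variance of each band-filtered signal.
### The function below is an illustrative implementation of that definition.
####

def gaussian_DE(phase, col, delta, theta, alpha, beta, gamma, dfFeature):

    '''
    Closed-form differential entropy, assuming each band signal is Gaussian.
    '''

    bands = [('delta', delta), ('theta', theta), ('alpha', alpha),
             ('beta', beta), ('gamma', gamma)]

    for name, band in bands:
        var = np.var(np.asarray(band), axis=1)
        dfFeature[col + name + '_GaussianDE' + phase] = 0.5 * np.log(2.0 * np.pi * np.e * var)

    return dfFeature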
Power_Ratio, a list of normalized signal power in a set of frequency 114 | bins defined in Band (if Power_Ratio is provided, recommended to speed up) 115 | In case 1, Power_Ratio is computed by bin_power() function. 116 | """ 117 | 118 | 119 | def svd_entropy(X, Tau, DE, W=None): 120 | """Compute SVD Entropy from either two cases below: 121 | 1. a time series X, with lag tau and embedding dimension dE (default) 122 | 2. a list, W, of normalized singular values of a matrix (if W is provided, 123 | recommend to speed up.) 124 | """ 125 | 126 | def ap_entropy(X, M, R): 127 | """Computer approximate entropy (ApEN) of series X, specified by M and R. 128 | Suppose given time series is X = [x(1), x(2), ... , x(N)]. We first build 129 | embedding matrix Em, of dimension (N-M+1)-by-M, such that the i-th row of 130 | Em is x(i),x(i+1), ... , x(i+M-1). Hence, the embedding lag and dimension 131 | are 1 and M-1 respectively. Such a matrix can be built by calling pyeeg 132 | function as Em = embed_seq(X, 1, M). Then we build matrix Emp, whose only 133 | difference with Em is that the length of each embedding sequence is M + 1 134 | Denote the i-th and j-th row of Em as Em[i] and Em[j]. Their k-th elements 135 | are Em[i][k] and Em[j][k] respectively. The distance between Em[i] and 136 | Em[j] is defined as 1) the maximum difference of their corresponding scalar 137 | components, thus, max(Em[i]-Em[j]), or 2) Euclidean distance. We say two 138 | 1-D vectors Em[i] and Em[j] *match* in *tolerance* R, if the distance 139 | between them is no greater than R, thus, max(Em[i]-Em[j]) <= R. Mostly, the 140 | value of R is defined as 20% - 30% of standard deviation of X. 141 | Pick Em[i] as a template, for all j such that 0 < j < N - M + 1, we can 142 | check whether Em[j] matches with Em[i]. Denote the number of Em[j], 143 | which is in the range of Em[i], as k[i], which is the i-th element of the 144 | vector k. The probability that a random row in Em matches Em[i] is 145 | \simga_1^{N-M+1} k[i] / (N - M + 1), thus sum(k)/ (N - M + 1), 146 | denoted as Cm[i]. 147 | We repeat the same process on Emp and obtained Cmp[i], but here 0