├── Demo.mp4
├── conflictnet.png
├── README.md
├── requirements.txt
├── gui.py
├── dataLoad.py
└── conflict_net.py


/Demo.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/smartcameras/ConflictNET/HEAD/Demo.mp4


--------------------------------------------------------------------------------
/conflictnet.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/smartcameras/ConflictNET/HEAD/conflictnet.png


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Conflict-Intensity-Estimation
 2 | Estimate the level of verbal conflict from raw speech signals
 3 | 
 4 | An end-to-end CNN-LSTM architecture with an attention mechanism, implemented in Keras with a TensorFlow backend.
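
The convolution and pooling sizes in 'conflict_net.py' determine how many time steps of the 30 s, 8 kHz input reach the LSTM. A minimal sketch of that arithmetic, assuming 'valid' padding and pooling strides equal to the pool size, as in the script. The helper names are illustrative and not part of the repository.

```python
def conv1d_valid(length, kernel_size, stride=1):
    """Output length of a 1-D convolution with 'valid' padding."""
    return (length - kernel_size) // stride + 1


def pool1d_valid(length, pool_size, stride=None):
    """Output length of 1-D pooling with 'valid' padding (stride defaults to pool_size)."""
    stride = pool_size if stride is None else stride
    return (length - pool_size) // stride + 1


def conflictnet_time_steps(n_samples=240000):
    """Trace the sequence length through the three conv blocks and the average pool."""
    lengths = [n_samples]
    # (kernel_size, pool_size) of the three Conv1D + MaxPooling1D blocks
    for kernel, pool in [(6, 8), (4, 6), (4, 6)]:
        lengths.append(conv1d_valid(lengths[-1], kernel))
        lengths.append(pool1d_valid(lengths[-1], pool))
    lengths.append(pool1d_valid(lengths[-1], 4))  # AveragePooling1D(pool_size=4)
    return lengths


print(conflictnet_time_steps())
# [240000, 239995, 29999, 29996, 4999, 4996, 832, 208]
```

The final value, 208, matches the sequence length fed to the first LSTM in the model summary at the end of 'conflict_net.py'.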
 5 | 
 6 | Dataset used - SSPNet Conflict Corpus (http://www.dcs.gla.ac.uk/~vincia/dataconflict/)
 7 | 
 8 | Paper - 
 9 | 
10 | Rajan, Vandana, Alessio Brutti, and Andrea Cavallaro. "ConflictNET: End-to-End Learning for Speech-based Conflict Intensity Estimation." IEEE Signal Processing Letters 26.11 (2019): 1668-1672.
11 | (https://ieeexplore.ieee.org/document/8850055)
12 | 
13 | ![True versus Predicted Conflict Values](https://github.com/smartcameras/ConflictNET/blob/master/conflictnet.png)
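
'conflict_net.py' summarizes the agreement between true and predicted values with the Pearson correlation coefficient (PCC), and with unweighted and weighted average recall (UAR, WAR) after thresholding both at 0. A NumPy sketch of these metrics follows. The function names and the toy values are illustrative, and the recall helper assumes both classes occur in the ground truth.

```python
import numpy as np


def pearson_cc(y_true, y_pred):
    """Pearson correlation coefficient between two 1-D arrays."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    cov = np.mean((y_true - y_true.mean()) * (y_pred - y_pred.mean()))
    return float(cov / (y_true.std() * y_pred.std()))


def recall_scores(y_true, y_pred):
    """UAR and WAR for the two classes 'value >= 0' and 'value < 0'."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    pos = y_true >= 0
    neg = ~pos
    recall_pos = np.mean(y_pred[pos] >= 0)   # recall of the high-conflict class
    recall_neg = np.mean(y_pred[neg] < 0)    # recall of the low-conflict class
    uar = (recall_pos + recall_neg) / 2      # unweighted: mean of per-class recalls
    war = np.mean((y_pred >= 0) == pos)      # weighted: overall accuracy
    return float(uar), float(war)


# Toy values, made up for illustration only
y_true = np.array([-0.8, -0.2, 0.1, 0.6])
y_pred = np.array([-0.5, 0.1, 0.3, 0.4])
print("PCC:", pearson_cc(y_true, y_pred))
print("UAR, WAR:", recall_scores(y_true, y_pred))
```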
14 | 
15 | # Procedure
16 | 
17 | 1. Download the dataset from (http://www.dcs.gla.ac.uk/~vincia/dataconflict/)
18 | 
19 | 2. Create the train, val and test splits according to the following paper:
20 | 
21 | Schuller, Björn, et al. "The INTERSPEECH 2013 computational paralinguistics challenge: Social signals, conflict, emotion, autism." Proceedings INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association, Lyon, France. 2013.
22 | 
23 | 3. Change lines 15, 16 and 17 in 'dataLoad.py' to the train, val and test paths on your computer, and line 35 to the path of the label file 'conflictlevel.csv'.
24 | 
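25 | 4. Call 'save_tr()', 'save_val()' and 'save_test()' from 'dataLoad.py' once to create the '.npy' files, then run 'conflict_net.py'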
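
The steps above rely on 'dataLoad.py' turning every clip into a fixed-length, normalized signal, and on 'conflict_net.py' rescaling the labels from [-10, 10] to [-1, 1]. The following is a hedged NumPy sketch of that preparation. It assumes zero-padding or truncation to 30 s at 8 kHz, followed by mean removal and then RMS normalization. The function names are illustrative, and audio loading (done with librosa in the repository) is left out.

```python
import numpy as np

FS = 8000                   # sampling rate used in dataLoad.py
DURATION_S = 30             # clip duration in seconds
SIG_LEN = FS * DURATION_S   # 240000 samples


def fix_length(sig, target_len=SIG_LEN):
    """Truncate or zero-pad a 1-D signal to exactly target_len samples."""
    sig = np.asarray(sig, dtype=np.float64)
    if len(sig) >= target_len:
        return sig[:target_len]
    return np.pad(sig, (0, target_len - len(sig)), mode="constant")


def remove_mean(sig):
    """Subtract the mean so the signal is centred on zero."""
    return sig - np.mean(sig)


def rms_normalize(sig):
    """Scale the signal to unit RMS (assumes the signal is not all zeros)."""
    return sig / np.sqrt(np.mean(np.square(sig)))


def prepare(sig):
    """Return a (SIG_LEN, 1) float32 array ready to stack into x_data."""
    s = rms_normalize(remove_mean(fix_length(sig)))
    return s.reshape(-1, 1).astype(np.float32)


def rescale_label(y, old_min=-10.0, old_max=10.0, new_min=-1.0, new_max=1.0):
    """Linearly map a conflict score from [old_min, old_max] to [new_min, new_max]."""
    return (y - old_min) * (new_max - new_min) / (old_max - old_min) + new_min


x = prepare(np.sin(np.arange(1000) * 0.1))  # short synthetic signal, zero-padded
print(x.shape, rescale_label(5.0))
```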
26 | 
27 | # Demo using a Greek political debate (from CONFER dataset)
28 | 
29 | [![ConflictNet Demo](https://img.youtube.com/vi/6AH-ITHsQbw/0.jpg)](https://www.youtube.com/watch?v=6AH-ITHsQbw)
30 | 
31 | CONFER dataset: https://ibug.doc.ic.ac.uk/resources/confer/
32 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
 1 | absl-py==0.8.0
 2 | astor==0.8.0
 3 | atomicwrites==1.3.0
 4 | attrs==19.3.0
 5 | audioread==2.1.8
 6 | backports-abc==0.5
 7 | backports.functools-lru-cache==1.6.1
 8 | backports.weakref==1.0.post1
 9 | certifi==2019.9.11
10 | cffi==1.13.2
11 | configparser==4.0.2
12 | contextlib2==0.6.0.post1
13 | cycler==0.10.0
14 | decorator==4.4.1
15 | enum34==1.1.6
16 | funcsigs==1.0.2
17 | functools32==3.2.3.post2
18 | future==0.17.1
19 | futures==3.3.0
20 | gast==0.2.2
21 | google-pasta==0.1.7
22 | grpcio==1.23.0
23 | h5py==2.9.0
24 | importlib-metadata==0.23
25 | innvestigate==1.0.8
26 | joblib==0.14.0
27 | Keras==2.2.4
28 | Keras-Applications==1.0.8
29 | Keras-Preprocessing==1.1.0
30 | keras-self-attention==0.42.0
31 | kiwisolver==1.1.0
32 | librosa==0.7.1
33 | linecache2==1.0.0
34 | llvmlite==0.30.0
35 | Markdown==3.1.1
36 | matplotlib==2.2.3
37 | mkl-fft==1.0.14
38 | mkl-random==1.1.0
39 | mkl-service==2.3.0
40 | mock==2.0.0
41 | more-itertools==5.0.0
42 | numba==0.46.0
43 | numpy==1.16.5
44 | opt-einsum==2.3.2
45 | packaging==19.2
46 | pandas==0.24.2
47 | pathlib2==2.3.5
48 | pbr==5.4.3
49 | Pillow==6.2.1
50 | pluggy==0.13.0
51 | protobuf==3.9.2
52 | py==1.8.0
53 | pycparser==2.19
54 | pyparsing==2.4.4
55 | pytest==4.6.6
56 | python-dateutil==2.8.1
57 | pytz==2019.3
58 | PyYAML==5.1.2
59 | resampy==0.2.2
60 | scandir==1.10.0
61 | scikit-learn==0.20.3
62 | scipy==1.2.1
63 | singledispatch==3.4.0.3
64 | six==1.12.0
65 | SoundFile==0.10.2
66 | subprocess32==3.5.4
67 | tensorboard==1.12.2
68 | tensorflow==1.12.0
69 | tensorflow-estimator==1.13.0
70 | termcolor==1.1.0
71 | tornado==5.1.1
72 | traceback2==1.4.0
73 | unittest2==1.1.0
74 | wcwidth==0.1.7
75 | webencodings==0.5.1
76 | Werkzeug==0.16.0
77 | wrapt==1.11.2
78 | zipp==0.6.0
79 | 


--------------------------------------------------------------------------------
/gui.py:
--------------------------------------------------------------------------------
 1 | import math
 2 | import PySimpleGUI as sg
 3 | import numpy as np
 4 | import sounddevice as sd
 5 | import librosa
 6 | 
 7 | def find_closest(p,vals):
 8 | 	x = p[0]
 9 | 	y = p[1]
10 | 	X = np.array(vals[0])
11 | 	Y = np.array(vals[1])
12 | 	l = np.sqrt(np.square(X - x) + np.square(Y - y))
13 | 	if(l.min()<=4):
14 | 		# keep a single index even when several points tie for the minimum distance
15 | 		s_id = np.array([np.argmin(l)])
16 | 		x = X[s_id]
17 | 		y = Y[s_id]
18 | 	else:
19 | 		s_id = []
20 | 		x = []
21 | 		y = []
22 | 	return s_id,x,y
23 | 
24 | layout = [[sg.Graph(canvas_size=(800, 800), graph_bottom_left=(-105,-105), graph_top_right=(105,105), background_color='white', key='graph', tooltip=None, enable_events=True)],]
25 | window = sg.Window('ConflictNET Prediction Evaluation', layout, grab_anywhere=True).Finalize()
26 | graph = window['graph']
27 | # Draw axis
28 | graph.DrawLine((-100,0), (100,0))
29 | graph.DrawLine((0,-100), (0,100))
30 | for x in range(-100, 101, 20):
31 | 	graph.DrawLine((x,-3), (x,3))
32 | 	if x != 0:
33 | 		graph.DrawText(x, (x,-10), color='green')
34 | for y in range(-100, 101, 20):
35 | 	graph.DrawLine((-3,y), (3,y))
36 | 	if y != 0:
37 | 		graph.DrawText(y, (-10,y), color='blue')
38 | # Draw Graph
39 | f= open("./data/train_analysis.txt","r")
40 | x_tr = np.load('./data/x_train.npy')
41 | #s_id = []
42 | true_val = []
43 | pred_val = []
44 | f.readline() # to remove column names
45 | for x in f:
46 | 	s = x.split()
47 | 	#s_id.append(float(s[0]))
48 | 	true_val.append(int(round(float(s[1])*100)))
49 | 	pred_val.append(int(round(float(s[2])*100)))
50 | #i=100
51 | vals = [true_val,pred_val]
52 | for i in range(len(true_val)):
53 | 	#graph.DrawLine((true_val[i],0),(true_val[i],pred_val[i]))
54 | 	graph.DrawCircle((true_val[i],pred_val[i]),4,line_color='red', fill_color='blue')
55 | while True:
56 | 	event, values = window.read()
57 | 	if event is None:
58 | 		break
59 | 	#print(event, values)
60 | 	val = values[event]
61 | 	print(val)
62 | 	# find closest data point 
63 | 	s_id, x, y = find_closest(val,vals)
64 | 	if (len(s_id) != 0):
65 | 		print('Sample ID:',s_id)
66 | 		sig = x_tr[s_id]
67 | 		sig = np.reshape(sig,(240000))
68 | 		#Getting an error using 8k with sd [Error: PaAlsaStreamComponent_BeginPolling: Assertion `ret == self->nfds' failed. Aborted (core dumped)]. So using 16k for playback
69 | 		aud = librosa.resample(sig,8000,16000)
70 | 		sd.play(aud,16000,blocking=True)
71 | 


--------------------------------------------------------------------------------
/dataLoad.py:
--------------------------------------------------------------------------------
 1 | # Prepare SSPNet Conflict Corpus data
 2 | # Author: Vandana Rajan
 3 | # Email: v.rajan@qmul.ac.uk
 4 | 
 5 | import librosa
 6 | import os
 7 | import numpy as np
 8 | import pandas as pd
 9 | 
10 | # Interspeech 2013 challenge data partition: 
11 | # All broadcasts with the female moderator (speaker # 50) were assigned to the training set.
12 | # The development set consists of all broadcasts moderated by the (male) speaker # 153,
13 | # and the test set comprises the rest (male moderators).
14 | 
15 | train_path = '/data/scratch/eex608/conflict/audiodata/train'
16 | val_path = '/data/scratch/eex608/conflict/audiodata/val'
17 | test_path = '/data/scratch/eex608/conflict/audiodata/test'
18 | 
19 | Fs = 8000
20 | t = 30 # duration of signal
21 | 
22 | def rms_normalize(s):
23 | # RMS normalization
24 | 
25 |         new_s = s/np.sqrt(np.sum(np.square((np.abs(s))))/len(s))
26 |         return new_s
27 | 
28 | def normalize(x):
29 |                
30 |         new_x = (x-np.mean(x))
31 |         return new_x
32 |         
33 | def saveData(path):
34 | 
35 |         label_data = pd.read_csv('/data/scratch/eex608/conflict/conflictlevel.csv',index_col="Name")
36 |         fnames = os.listdir(path)
37 |         sig_len = int(Fs*t)
38 |         x_data = np.zeros((len(fnames),int(Fs*t),1))
39 |         y_data = np.zeros((len(fnames)))        
40 |         for i in range(len(fnames)):
41 |                 full_path = path + '/' + fnames[i]
42 |                 sig,fs = librosa.load(full_path,Fs)
43 |                 if(len(sig)>sig_len):
44 |                         sig = sig[0:sig_len]
45 |                 elif(len(sig)<sig_len):
46 |                         z = np.zeros((sig_len-len(sig),1))
47 |                         sig = np.append(sig,z)                
48 |                 s = normalize(sig)
49 |                 s = rms_normalize(s) # RMS-normalize the mean-removed signal
50 |                 y = label_data.loc[fnames[i][:-4]].Value                
51 |                 x_data[i] = np.reshape(s,(len(s),1))
52 |                 y_data[i] = y                
53 |         x_data = x_data.astype('float32')
54 |         y_data = y_data.astype('float32')        
55 |         print(x_data.shape,y_data.shape)
56 |         return x_data,y_data
57 | 
58 | def save_tr():
59 |         x,y = saveData(train_path)
60 |         np.save('x_train.npy',x)
61 |         np.save('y_train.npy',y)
62 |         
63 | def save_val():
64 |         x,y = saveData(val_path)
65 |         np.save('x_val.npy',x)
66 |         np.save('y_val.npy',y)        
67 | 
68 | def save_test():
69 |         x,y = saveData(test_path)
70 |         np.save('x_test.npy',x)
71 |         np.save('y_test.npy',y)        
72 | 
73 | def load_test():
74 |         x = np.load('x_test.npy')
75 |         y = np.load('y_test.npy')        
76 |         return x,y        
77 | 
78 | def load_tr():
79 |         x = np.load('x_train.npy')
80 |         y = np.load('y_train.npy')        
81 |         return x,y        
82 | 
83 | def load_val():
84 |         x = np.load('x_val.npy')
85 |         y = np.load('y_val.npy')
86 |         return x,y  
87 | 
88 | 
89 | 
90 | 


--------------------------------------------------------------------------------
/conflict_net.py:
--------------------------------------------------------------------------------
  1 | # CNN-LSTM with attention for conflict intensity estimation
  2 | # Author: Vandana Rajan
  3 | # Email: v.rajan@qmul.ac.uk
  4 | 
  5 | # seed initialization for reproducible results
  6 | from numpy.random import seed
  7 | seed(1)
  8 | from tensorflow import set_random_seed
  9 | set_random_seed(2)
 10 | 
 11 | # imports
 12 | from keras.models import Sequential,Model
 13 | from keras.layers import Input,Conv1D,BatchNormalization,MaxPooling1D,LSTM,Dense,GlobalAveragePooling1D,AveragePooling1D  
 14 | from keras import optimizers
 15 | from keras.callbacks import ModelCheckpoint,EarlyStopping,Callback
 16 | import argparse
 17 | import numpy as np
 18 | import keras.backend as K
 19 | from dataLoad import load_tr,load_val,load_test
 20 | from keras.models import load_model,Model
 21 | from keras_self_attention import SeqSelfAttention                     
 22 | from keras import losses
 23 | 
 24 | #custom metric
 25 | def pearson_cc(x,y): # x-ground truth and y-prediction
 26 | 
 27 |         x_mean = K.mean(x,axis=0)
 28 |         y_mean = K.mean(y,axis=0)
 29 |         x_std = K.std(x,axis=0)
 30 |         y_std = K.std(y,axis=0)
 31 |         n = K.mean((x - x_mean) * (y - y_mean))
 32 |         d = x_std*y_std
 33 |         return (n/d)
 34 | 
 35 | #custom loss
 36 | def pearson_loss(x,y):
 37 | 
 38 |         x_mean = K.mean(x,axis=0)
 39 |         y_mean = K.mean(y,axis=0)
 40 |         x_std = K.std(x,axis=0)
 41 |         y_std = K.std(y,axis=0)
 42 |         n = K.mean((x - x_mean) * (y - y_mean))
 43 |         d = x_std*y_std
 44 |         p_loss = (1-(n/d)) # pearson cc loss
 45 |         return p_loss
 46 | 
 47 | # range modification
 48 | def normalize(x,old_max,old_min,new_max,new_min):
 49 | 
 50 |         new_x = (((x-old_min)*(new_max-new_min))/(old_max-old_min))+new_min
 51 |         return new_x
 52 |         
 53 | def conflictNet(input_shape):
 54 | 
 55 |         inp = Input(shape=input_shape)
 56 | 
 57 |         x = Conv1D(filters=64, kernel_size=6, strides=1, padding='valid', data_format='channels_last', activation = 'relu',input_shape=input_shape)(inp)
 58 |         x = BatchNormalization()(x)
 59 |         x = MaxPooling1D(pool_size=8, strides=8)(x)
 60 | 
 61 |         x = Conv1D(filters=128, kernel_size=4, strides=1, padding='valid', activation = 'relu', data_format='channels_last')(x)
 62 |         x = BatchNormalization()(x)
 63 |         x = MaxPooling1D(pool_size=6,strides=6)(x)
 64 | 
 65 |         x = Conv1D(filters=256, kernel_size=4, strides=1, padding='valid', activation = 'relu', data_format='channels_last')(x)
 66 |         x = BatchNormalization()(x)
 67 |         x = MaxPooling1D(pool_size=6,strides=6)(x)
 68 | 
 69 |         #Average Pooling 1D block
 70 |         x = AveragePooling1D(pool_size=4)(x)
 71 | 
 72 |         #LSTM1
 73 |         x = LSTM(units=128,activation='tanh',return_sequences=True)(x)
 74 | 
 75 |         # Self Attention
 76 |         x = SeqSelfAttention(attention_activation='tanh')(x)
 77 | 
 78 |         #LSTM2
 79 |         x = LSTM(units=64,activation='tanh',return_sequences=False)(x)
 80 | 
 81 |         #Dense
 82 |         y = Dense(units=1,use_bias=True)(x) # regression
 83 |         
 84 |         model = Model(inputs=inp,outputs=y)
 85 | 
 86 |         opt = optimizers.Adam(lr=0.01,decay=0.6)
 87 |         model.compile(optimizer=opt,loss=pearson_loss,metrics=[pearson_cc]) # regression
 88 |         
 89 |         return model
 90 |         
 91 | def train(model, x_tr, y_tr, x_val, y_val, args):
 92 | 
 93 |         es = EarlyStopping(monitor='val_pearson_cc',patience=10,mode='max',restore_best_weights=True) # regression
 94 |         mc = ModelCheckpoint('best_model.h5', monitor='val_pearson_cc', mode='max', verbose=1, save_best_only=True) # regression
 95 |         history = model.fit(x_tr,y_tr,batch_size=args.batch_size,epochs=args.num_epochs,validation_data=(x_val,y_val),callbacks=[mc,es])
 96 |         return model
 97 |         
 98 | def test(model,x_t,y_t):
 99 | 
100 |         score = model.evaluate(x_t,y_t,batch_size=32)
101 |         print(model.metrics_names)
102 |         print(score)
103 |         return score
104 | 
105 | if __name__ == "__main__":
106 | 
107 |         parser = argparse.ArgumentParser()
108 |         args = parser.parse_args()
109 | 
110 |         args.batch_size = 32
111 |         args.num_epochs = 1500 #best model will be saved before number of epochs reach this value
112 | 
113 |         x_tr, y_tr = load_tr()
114 |         x_val, y_val = load_val()
115 |         x_t, y_t = load_test()
116 | 
117 |         y_tr = normalize(y_tr,10,-10,1,-1) # change output label range from [-10,10] to [-1,1]
118 |         y_tr = np.reshape(y_tr,(y_tr.shape[0],1))
119 |         y_val = normalize(y_val,10,-10,1,-1)
120 |         y_val = np.reshape(y_val,(y_val.shape[0],1))
121 |         y_t = normalize(y_t,10,-10,1,-1)
122 |         y_t = np.reshape(y_t,(y_t.shape[0],1))
123 |         
124 |         #define model
125 |         model = conflictNet(input_shape=(240000,1))
126 |         model.summary()
127 | 
128 |         #train model
129 |         model = train(model,x_tr,y_tr,x_val,y_val,args=args)
130 |         
131 |         #test model
132 |         model = load_model('best_model.h5',custom_objects={'pearson_loss':pearson_loss,'pearson_cc':pearson_cc,'SeqSelfAttention':SeqSelfAttention})
133 | 
134 |         score = test(model,x_t,y_t)
135 |         pred_values = model.predict(x_t,batch_size=32)
136 |         
137 |         #Modify predicted values range
138 |         max_tr = np.max(y_tr)
139 |         min_tr = np.min(y_tr)
140 | 
141 |         p = normalize(pred_values,np.max(pred_values),np.min(pred_values),max_tr,min_tr)        
142 | 
143 |         print('########################PCC-Value#####################')
144 |         y_mean = np.mean(y_t,axis=0)
145 |         p_mean = np.mean(p,axis=0)
146 |         y_std = np.std(y_t,axis=0)
147 |         p_std = np.std(p,axis=0)
148 |         n = np.mean((y_t-y_mean)*(p-p_mean))
149 |         d = y_std*p_std
150 |         pcc = n/d
151 |         print('PCC:',pcc)
152 |         
153 |         print('######################WAR and UAR#########################')
154 |         tp = 0
155 |         tn = 0
156 |         tot_p = 0
157 |         tot_n = 0
158 |         for i in range(len(p)):
159 |                 if(y_t[i]>=0):
160 |                         tot_p = tot_p+1
161 |                         if(p[i]>=0):
162 |                                 tp = tp+1
163 |                 else:
164 |                         tot_n = tot_n + 1
165 |                         if(p[i]<0):
166 |                                 tn = tn+1
167 | 
168 |         r1 = float(tp)/tot_p
169 |         r2 = float(tn)/tot_n
170 | 
171 |         uar = (r1+r2)/2
172 |         war = float(tp+tn)/(tot_p+tot_n)
173 | 
174 |         print('R1:',r1,'R2:',r2,'UAR:',uar,'WAR:',war)
175 | 
176 | 	
177 | 	
178 | """
179 | Model Summary
180 | _________________________________________________________________
181 | Layer (type)                 Output Shape              Param #   
182 | =================================================================
183 | input_2 (InputLayer)         (None, 240000, 1)         0         
184 | _________________________________________________________________
185 | conv1d_4 (Conv1D)            (None, 239995, 64)        448       
186 | _________________________________________________________________
187 | batch_normalization_4 (Batch (None, 239995, 64)        256       
188 | _________________________________________________________________
189 | max_pooling1d_4 (MaxPooling1 (None, 29999, 64)         0         
190 | _________________________________________________________________
191 | conv1d_5 (Conv1D)            (None, 29996, 128)        32896     
192 | _________________________________________________________________
193 | batch_normalization_5 (Batch (None, 29996, 128)        512       
194 | _________________________________________________________________
195 | max_pooling1d_5 (MaxPooling1 (None, 4999, 128)         0         
196 | _________________________________________________________________
197 | conv1d_6 (Conv1D)            (None, 4996, 256)         131328    
198 | _________________________________________________________________
199 | batch_normalization_6 (Batch (None, 4996, 256)         1024      
200 | _________________________________________________________________
201 | max_pooling1d_6 (MaxPooling1 (None, 832, 256)          0         
202 | _________________________________________________________________
203 | average_pooling1d_2 (Average (None, 208, 256)          0         
204 | _________________________________________________________________
205 | lstm_3 (LSTM)                (None, 208, 128)          197120    
206 | _________________________________________________________________
207 | seq_self_attention_2 (SeqSel (None, 208, 128)          8257      
208 | _________________________________________________________________
209 | lstm_4 (LSTM)                (None, 64)                49408     
210 | _________________________________________________________________
211 | dense_2 (Dense)              (None, 1)                 65        
212 | =================================================================
213 | Total params: 421,314
214 | Trainable params: 420,418
215 | Non-trainable params: 896
216 | _________________________________________________________________
217 | """
218 | 


--------------------------------------------------------------------------------