├── Demo.mp4
├── conflictnet.png
├── README.md
├── requirements.txt
├── gui.py
├── dataLoad.py
└── conflict_net.py


/Demo.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/smartcameras/ConflictNET/HEAD/Demo.mp4

--------------------------------------------------------------------------------
/conflictnet.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/smartcameras/ConflictNET/HEAD/conflictnet.png

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Conflict-Intensity-Estimation
2 | Estimate the level of verbal conflict from raw speech signals
3 |
4 | An end-to-end CNN-LSTM architecture with an attention mechanism, implemented in Keras with the TensorFlow backend.
5 |
6 | Dataset used - SSPNet Conflict Corpus (http://www.dcs.gla.ac.uk/~vincia/dataconflict/)
7 |
8 | Paper -
9 |
10 | Rajan, Vandana, Alessio Brutti, and Andrea Cavallaro. "ConflictNET: End-to-End Learning for Speech-based Conflict Intensity Estimation." IEEE Signal Processing Letters 26.11 (2019): 1668-1672.
11 | (https://ieeexplore.ieee.org/document/8850055)
12 |
13 | ![True versus Predicted Conflict Values](https://github.com/smartcameras/ConflictNET/blob/master/conflictnet.png)
14 |
15 | # Procedure
16 |
17 | 1. Download the dataset from (http://www.dcs.gla.ac.uk/~vincia/dataconflict/)
18 |
19 | 2. Create the train, val and test splits according to the following paper:
20 |
21 | Schuller, Björn, et al. "The INTERSPEECH 2013 computational paralinguistics challenge: Social signals, conflict, emotion, autism." Proceedings INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association, Lyon, France. 2013.
22 |
23 | 3. Change lines 15, 16 and 17 in 'dataLoad.py' to the train, val and test paths on your computer. Also update the hard-coded label CSV path on line 35.
24 |
25 | 4. Run 'conflict_net.py'
26 |
27 | # Demo using a Greek political debate (from CONFER dataset)
28 |
29 | [![ConflictNet Demo](https://img.youtube.com/vi/6AH-ITHsQbw/0.jpg)](https://www.youtube.com/watch?v=6AH-ITHsQbw)
30 |
31 | CONFER dataset: https://ibug.doc.ic.ac.uk/resources/confer/
32 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | absl-py==0.8.0
2 | astor==0.8.0
3 | atomicwrites==1.3.0
4 | attrs==19.3.0
5 | audioread==2.1.8
6 | backports-abc==0.5
7 | backports.functools-lru-cache==1.6.1
8 | backports.weakref==1.0.post1
9 | certifi==2019.9.11
10 | cffi==1.13.2
11 | configparser==4.0.2
12 | contextlib2==0.6.0.post1
13 | cycler==0.10.0
14 | decorator==4.4.1
15 | enum34==1.1.6
16 | funcsigs==1.0.2
17 | functools32==3.2.3.post2
18 | future==0.17.1
19 | futures==3.3.0
20 | gast==0.2.2
21 | google-pasta==0.1.7
22 | grpcio==1.23.0
23 | h5py==2.9.0
24 | importlib-metadata==0.23
25 | innvestigate==1.0.8
26 | joblib==0.14.0
27 | Keras==2.2.4
28 | Keras-Applications==1.0.8
29 | Keras-Preprocessing==1.1.0
30 | keras-self-attention==0.42.0
31 | kiwisolver==1.1.0
32 | librosa==0.7.1
33 | linecache2==1.0.0
34 | llvmlite==0.30.0
35 | Markdown==3.1.1
36 | matplotlib==2.2.3
37 | mkl-fft==1.0.14
38 | mkl-random==1.1.0
39 | mkl-service==2.3.0
40 | mock==2.0.0
41 | more-itertools==5.0.0
42 | numba==0.46.0
43 | numpy==1.16.5
44 | opt-einsum==2.3.2
45 | packaging==19.2
46 | pandas==0.24.2
47 | pathlib2==2.3.5
48 | pbr==5.4.3
49 | Pillow==6.2.1
50 | pluggy==0.13.0
51 | protobuf==3.9.2
52 | py==1.8.0
53 | pycparser==2.19
54 | pyparsing==2.4.4
55 | pytest==4.6.6
56 | python-dateutil==2.8.1
57 | pytz==2019.3
58 | PyYAML==5.1.2
59 | resampy==0.2.2
60 | scandir==1.10.0
61 | scikit-learn==0.20.3
62 | scipy==1.2.1
63 | singledispatch==3.4.0.3
64 | six==1.12.0
65 | SoundFile==0.10.2
66 | subprocess32==3.5.4
67 | tensorboard==1.12.2
68 | tensorflow==1.12.0
69 | tensorflow-estimator==1.13.0
70 | termcolor==1.1.0
71 | tornado==5.1.1
72 | traceback2==1.4.0
73 | unittest2==1.1.0
74 | wcwidth==0.1.7
75 | webencodings==0.5.1
76 | Werkzeug==0.16.0
77 | wrapt==1.11.2
78 | zipp==0.6.0
79 |
--------------------------------------------------------------------------------
/gui.py:
--------------------------------------------------------------------------------
1 | import math
2 | import PySimpleGUI as sg
3 | import numpy as np
4 | import sounddevice as sd
5 | import librosa
6 |
7 | def find_closest(p,vals):
8 |     x = p[0]
9 |     y = p[1]
10 |     X = np.array(vals[0])
11 |     Y = np.array(vals[1])
12 |     l = np.sqrt(np.square(X - x) + np.square(Y - y))
13 |     if(l.min()<=4):
14 |         idx = np.where(l == l.min())
15 |         s_id = idx[0]
16 |         x = X[s_id]
17 |         y = Y[s_id]
18 |     else:
19 |         s_id = []
20 |         x = []
21 |         y = []
22 |     return s_id,x,y
23 |
24 | layout = [[sg.Graph(canvas_size=(800, 800), graph_bottom_left=(-105,-105), graph_top_right=(105,105), background_color='white', key='graph', tooltip=None, enable_events=True)],]
25 | window = sg.Window('ConflictNET Prediction Evaluation', layout, grab_anywhere=True).Finalize()
26 | graph = window['graph']
27 | # Draw axis
28 | graph.DrawLine((-100,0), (100,0))
29 | graph.DrawLine((0,-100), (0,100))
30 | for x in range(-100, 101, 20):
31 |     graph.DrawLine((x,-3), (x,3))
32 |     if x != 0:
33 |         graph.DrawText(x, (x,-10), color='green')
34 | for y in range(-100, 101, 20):
35 |     graph.DrawLine((-3,y), (3,y))
36 |     if y != 0:
37 |         graph.DrawText(y, (-10,y), color='blue')
38 | # Draw Graph
39 | f= open("./data/train_analysis.txt","r")
40 | x_tr = np.load('./data/x_train.npy')
41 | #s_id = []
42 | true_val = []
43 | pred_val = []
44 | f.readline() # to remove column names
45 | for x in f:
46 |     s = x.split()
47 |     #s_id.append(float(s[0]))
48 |     true_val.append(int(round(float(s[1])*100)))
49 |     pred_val.append(int(round(float(s[2])*100)))
50 | #i=100
51 | vals = [true_val,pred_val]
52 | for i in range(len(true_val)):
53 |     #graph.DrawLine((true_val[i],0),(true_val[i],pred_val[i]))
54 |     graph.DrawCircle((true_val[i],pred_val[i]),4,line_color='red', fill_color='blue')
55 | while True:
56 |     event, values = window.read()
57 |     if event is None:
58 |         break
59 |     #print(event, values)
60 |     val = values[event]
61 |     print(val)
62 |     # find closest data point
63 |     s_id, x, y = find_closest(val,vals)
64 |     if (len(s_id) != 0):
65 |         print('Sample ID:',s_id)
66 |         sig = x_tr[s_id]
67 |         sig = np.reshape(sig,(240000))
68 |         #Getting an error using 8k with sd [Error: PaAlsaStreamComponent_BeginPolling: Assertion `ret == self->nfds' failed. Aborted (core dumped)]. So using 16k for playback
69 |         aud = librosa.resample(sig,8000,16000)
70 |         sd.play(aud,16000,blocking=True)
71 |
--------------------------------------------------------------------------------
/dataLoad.py:
--------------------------------------------------------------------------------
1 | # Prepare SSPNet Conflict Corpus data
2 | # Author: Vandana Rajan
3 | # Email: v.rajan@qmul.ac.uk
4 |
5 | import librosa
6 | import os
7 | import numpy as np
8 | import pandas as pd
9 |
10 | # Interspeech 2013 challenge data partition:
11 | # All broad-casts with the female moderator (speaker # 50) were assigned to the training set.
12 | # The development set consists of all broad-casts moderated by the (male) speaker # 153,
13 | # and the test set comprises the rest (male moderators).
14 |
15 | train_path = '/data/scratch/eex608/conflict/audiodata/train'
16 | val_path = '/data/scratch/eex608/conflict/audiodata/val'
17 | test_path = '/data/scratch/eex608/conflict/audiodata/test'
18 |
19 | Fs = 8000
20 | t = 30 # duration of signal
21 |
22 | def rms_normalize(s):
23 |     # RMS normalization
24 |
25 |     new_s = s/np.sqrt(np.sum(np.square((np.abs(s))))/len(s))
26 |     return new_s
27 |
28 | def normalize(x):
29 |
30 |     new_x = (x-np.mean(x))
31 |     return new_x
32 |
33 | def saveData(path):
34 |
35 |     label_data = pd.read_csv('/data/scratch/eex608/conflict/conflictlevel.csv',index_col="Name")
36 |     fnames = os.listdir(path)
37 |     sig_len = int(Fs*t)
38 |     x_data = np.zeros((len(fnames),int(Fs*t),1))
39 |     y_data = np.zeros((len(fnames)))
40 |     for i in range(len(fnames)):
41 |         full_path = path + '/' + fnames[i]
42 |         sig,fs = librosa.load(full_path,Fs)
43 |         if(len(sig)>sig_len):
44 |             sig = sig[0:sig_len]
45 |         elif(len(sig)=0):
160 |         tot_p = tot_p+1
161 |         if(p[i]>=0):
162 |             tp = tp+1
163 |     else:
164 |         tot_n = tot_n + 1
165 |         if(p[i]<0):
166 |             tn = tn+1
167 |
168 | r1 = float(tp)/tot_p
169 | r2 = float(tn)/tot_n
170 |
171 | uar = (r1+r2)/2
172 | war = float(tp+tn)/(tot_p+tot_n)
173 |
174 | print('R1:',r1,'R2:',r2,'UAR:',uar,'WAR:',war)
175 |
176 |
177 |
178 | """
179 | Model Summary
180 | _________________________________________________________________
181 | Layer (type)                 Output Shape              Param #
182 | =================================================================
183 | input_2 (InputLayer)         (None, 240000, 1)         0
184 | _________________________________________________________________
185 | conv1d_4 (Conv1D)            (None, 239995, 64)        448
186 | _________________________________________________________________
187 | batch_normalization_4 (Batch (None, 239995, 64)        256
188 | _________________________________________________________________
189 | max_pooling1d_4 (MaxPooling1 (None, 29999, 64)         0
190 | _________________________________________________________________
191 | conv1d_5 (Conv1D)            (None, 29996, 128)        32896
192 | _________________________________________________________________
193 | batch_normalization_5 (Batch (None, 29996, 128)        512
194 | _________________________________________________________________
195 | max_pooling1d_5 (MaxPooling1 (None, 4999, 128)         0
196 | _________________________________________________________________
197 | conv1d_6 (Conv1D)            (None, 4996, 256)         131328
198 | _________________________________________________________________
199 | batch_normalization_6 (Batch (None, 4996, 256)         1024
200 | _________________________________________________________________
201 | max_pooling1d_6 (MaxPooling1 (None, 832, 256)          0
202 | _________________________________________________________________
203 | average_pooling1d_2 (Average (None, 208, 256)          0
204 | _________________________________________________________________
205 | lstm_3 (LSTM)                (None, 208, 128)          197120
206 | _________________________________________________________________
207 | seq_self_attention_2 (SeqSel (None, 208, 128)          8257
208 | _________________________________________________________________
209 | lstm_4 (LSTM)                (None, 64)                49408
210 | _________________________________________________________________
211 | dense_2 (Dense)              (None, 1)                 65
212 | =================================================================
213 | Total params: 421,314
214 | Trainable params: 420,418
215 | Non-trainable params: 896
216 | _________________________________________________________________
217 | """
218 |
--------------------------------------------------------------------------------
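
The output lengths and parameter counts in the model summary above can be cross-checked with plain arithmetic. The sketch below is not taken from 'conflict_net.py'. The kernel sizes (6, 4, 4) and pool sizes (8, 6, 6, then 4 for the average pooling) are assumptions inferred from the listed parameter counts and output lengths. It further assumes 'valid' padding, a convolution stride of 1, and a pooling stride equal to the pool size. All helper names are hypothetical.

```python
# Cross-check of the Keras model summary using plain integer arithmetic.
# ASSUMPTION: kernel and pool sizes below are inferred from the summary's
# shapes and parameter counts; they are not read from conflict_net.py.

def conv1d_valid_len(n, kernel):
    """Output length of a stride-1 Conv1D with 'valid' padding."""
    return n - kernel + 1

def pool_len(n, pool):
    """Output length of 1D pooling with stride == pool size, 'valid' padding."""
    return n // pool

def conv1d_params(kernel, in_ch, out_ch):
    """Weights plus one bias per output channel."""
    return (kernel * in_ch + 1) * out_ch

def lstm_params(in_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * ((in_dim + units) * units + units)

# (kernel size, pool size, output channels) per conv block -- inferred values
stages = [(6, 8, 64), (4, 6, 128), (4, 6, 256)]

n, ch = 240000, 1          # 30 s of audio at 8 kHz, one channel
shapes, conv_counts = [], []
for kernel, pool, out_ch in stages:
    conv_counts.append(conv1d_params(kernel, ch, out_ch))
    n = conv1d_valid_len(n, kernel)
    shapes.append((n, out_ch))      # after Conv1D (BatchNorm keeps the shape)
    n = pool_len(n, pool)
    shapes.append((n, out_ch))      # after MaxPooling1D
    ch = out_ch

n = pool_len(n, 4)                  # AveragePooling1D, inferred pool size 4
lstm_counts = [lstm_params(256, 128), lstm_params(128, 64)]

print(shapes, n)
print(conv_counts, lstm_counts)
```

Under these assumptions the computed values match the summary rows for the three Conv1D layers, the pooling layers, and both LSTM layers, which suggests the inferred sizes are at least consistent with the trained model.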