├── dataset
│   └── README.md
├── app.py
├── README.md
└── speech_emotion_recognition.py

/dataset/README.md:
--------------------------------------------------------------------------------

Datasets used in this project (all downloadable from Kaggle, as shown below) are:

  • Crowd-sourced Emotional Multimodal Actors Dataset (CREMA-D)
  • RAVDESS Emotional Speech Audio
  • Surrey Audio-Visual Expressed Emotion (SAVEE)
  • Toronto Emotional Speech Set (TESS)
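All four datasets can be fetched from Kaggle, as done in speech_emotion_recognition.py (this assumes the Kaggle CLI is installed and configured with an API token):

```
kaggle datasets download -d ejlok1/cremad
kaggle datasets download -d uwrfkaggler/ravdess-emotional-speech-audio
kaggle datasets download -d ejlok1/surrey-audiovisual-expressed-emotion-savee
kaggle datasets download -d ejlok1/toronto-emotional-speech-set-tess
```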
--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
import streamlit as st
from keras.models import load_model
import numpy as np
import librosa

def extract_features(data, sample_rate):
    # Mirror the feature set used during training in speech_emotion_recognition.py:
    # ZCR, chroma STFT, MFCC, RMS and mel spectrogram, each averaged over time.
    zcr = np.mean(librosa.feature.zero_crossing_rate(y=data).T, axis=0)
    stft = np.abs(librosa.stft(data))
    chroma_stft = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0)
    mfcc = np.mean(librosa.feature.mfcc(y=data, sr=sample_rate).T, axis=0)
    rms = np.mean(librosa.feature.rms(y=data).T, axis=0)
    mel = np.mean(librosa.feature.melspectrogram(y=data, sr=sample_rate).T, axis=0)
    return np.hstack((zcr, chroma_stft, mfcc, rms, mel))

def preprocess_speech(wav_file):
    # Read the uploaded wav file with the same duration/offset as the training pipeline
    data, sample_rate = librosa.load(wav_file, duration=2.5, offset=0.6)

    # Extract features from the audio (ZCR, chroma, MFCCs, RMS, mel spectrogram)
    # NOTE: training also applies a StandardScaler; persist and reuse that scaler here for best results.
    features = extract_features(data, sample_rate)

    return features

# Load the pre-trained model
model = load_model('./model.h5')

# Emotion labels in the order produced by the OneHotEncoder used during training
# (alphabetical); adjust this list if your encoder's category order differs.
EMOTIONS = ['angry', 'calm', 'disgust', 'fear', 'happy', 'neutral', 'sad', 'surprise']

st.title('Emotion Recognition')
st.header('This app predicts the emotion from speech')

# Upload the speech file
uploaded_file = st.file_uploader("Choose a speech file", type="wav")

if uploaded_file is not None:
    # Preprocess the speech file
    features = preprocess_speech(uploaded_file)
    features = features.reshape(1, -1, 1)  # (batch, n_features, channels) for the Conv1D model

    # Make a prediction
    prediction = model.predict(features)

    # Get the emotion with the highest confidence
    emotion = EMOTIONS[np.argmax(prediction)]

    # Display the result
    st.write(f'The predicted emotion is: {emotion}')
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Speech Emotion Recognition

## Overview
This GitHub repository is dedicated to the development of a Speech Emotion Recognition (SER) model built with a Convolutional Neural Network (CNN) using Keras, Librosa, Pandas, and NumPy. The goal of this project is to create an efficient and accurate model that can recognize emotions in spoken language, which has a wide range of applications in fields such as human-computer interaction, customer service, and mental health.

## Features
  • Speech Data Preprocessing: Uses Librosa and NumPy to load, clean, and augment the speech data.
  • Deep Learning Model: Implements a Convolutional Neural Network (CNN) that learns meaningful patterns from the extracted audio features.
  • Emotion Classification: Trains the model to classify speech into different emotion categories, such as happiness, sadness, anger, fear, etc.
  • Evaluation Metrics: Computes and displays evaluation metrics such as accuracy, F1-score, and a confusion matrix to assess the model's performance (see the sketch below).
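For example, these metrics are produced with scikit-learn, mirroring the calls used in speech_emotion_recognition.py; `y_test` and `y_pred` below are placeholder label arrays standing in for the decoded test labels and model predictions:

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Placeholder decoded labels; in the notebook these come from encoder.inverse_transform(...)
y_test = ['happy', 'sad', 'angry', 'happy']   # actual labels
y_pred = ['happy', 'sad', 'happy', 'happy']   # model predictions

print("Accuracy:", accuracy_score(y_test, y_pred))   # overall accuracy
print(classification_report(y_test, y_pred))         # per-class precision, recall, F1-score
print(confusion_matrix(y_test, y_pred))              # rows = actual, columns = predicted
```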

## Usage
  1. Fork this repository and clone it to your local machine using git clone https://github.com/"YOUR_GITHUB_USERNAME"/Speech-Emotion-Recognition.git.
  2. Set up a Python environment and install the required dependencies using pip install -r requirements.txt.
  3. Prepare your speech emotion dataset and organize it appropriately (see the expected layout below).
  4. Preprocess the data, train the SER model, and evaluate its performance using the provided scripts.
  5. Fine-tune the model and experiment with different hyperparameters for improved accuracy.
  6. Once a trained model has been saved as model.h5, launch the demo app with streamlit run app.py.
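For reference, the training notebook expects the unzipped Kaggle datasets at the following paths (Colab defaults); if you work locally, adjust these variables at the top of speech_emotion_recognition.py to match your layout:

```python
# Dataset locations used in speech_emotion_recognition.py (Colab defaults)
Ravdess = r"/content/audio_speech_actors_01-24"
Crema = r"/content/AudioWAV"
Tess = r"/content/TESS Toronto emotional speech set data"
Savee = r"/content/ALL"
```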
    39 | 40 | ## Contributions 41 | Contributions to this project are welcome! Whether you want to improve the model's architecture, add new features, or fix bugs, please feel free to submit pull requests. 42 | 43 | ## Acknowledgments 44 | We would like to acknowledge the open-source community and the developers of the Python libraries and frameworks used in this project. Additionally, special thanks to anyone who contributes to this project to make speech emotion recognition more accessible and accurate. 45 | 46 | Get ready to explore the fascinating world of speech emotion recognition with Python. Happy coding! 47 | -------------------------------------------------------------------------------- /speech_emotion_recognition.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """Speech_Emotion_Recognition.ipynb 3 | 4 | Automatically generated by Colaboratory. 5 | 6 | Original file is located at 7 | https://colab.research.google.com/github/MrCuber/Speech-Emotion-Recognition/blob/main/Speech_Emotion_Recognition.ipynb 8 | 9 | # Speech Emotion Recognition Project 10 | 11 | ## Contributors to this project: 12 | 13 | 14 |
  • Umesh Chandra Sakinala - 21BCE1782 15 |
  • Sujan Kumar Sollety - 21BCE5667 16 |
  • Harshith Simha Gurram - 21BCE5653 17 |
  • Pulipaka Phani Meghana - 21BCE1345 18 |
  • Pandithradhyula Soumya - 21BCE1424 19 | 20 | """ 21 | 22 | !mkdir ~/.kaggle 23 | 24 | !touch ~/.kaggle/kaggle.json 25 | 26 | api_token = {"username":"umesh109","key":"aef50af22b6fe262935d024c7c135ac8"} 27 | import json 28 | with open('/root/.kaggle/kaggle.json', 'w') as file: 29 | json.dump(api_token, file) 30 | 31 | !chmod 600 ~/.kaggle/kaggle.json 32 | 33 | pip install np_utils 34 | 35 | import pandas as pd 36 | import numpy as np 37 | 38 | import kaggle 39 | 40 | import os 41 | import sys 42 | 43 | import librosa 44 | import librosa.display 45 | import seaborn as sns 46 | import matplotlib.pyplot as plt 47 | 48 | from sklearn.preprocessing import StandardScaler, OneHotEncoder 49 | from sklearn.metrics import confusion_matrix, classification_report 50 | from sklearn.model_selection import train_test_split 51 | 52 | from IPython.display import Audio 53 | 54 | import keras 55 | import np_utils 56 | from keras.callbacks import ReduceLROnPlateau 57 | from keras.models import Sequential 58 | from keras.layers import Dense, Conv1D, MaxPooling1D, Flatten, Dropout, BatchNormalization 59 | from keras.utils import to_categorical 60 | from keras.callbacks import ModelCheckpoint 61 | 62 | import warnings 63 | if not sys.warnoptions: 64 | warnings.simplefilter("ignore") 65 | warnings.filterwarnings("ignore", category=DeprecationWarning) 66 | 67 | """## Downloading Datasets""" 68 | 69 | !kaggle datasets download -d ejlok1/cremad 70 | 71 | !kaggle datasets download -d uwrfkaggler/ravdess-emotional-speech-audio 72 | !kaggle datasets download -d ejlok1/surrey-audiovisual-expressed-emotion-savee 73 | !kaggle datasets download -d ejlok1/toronto-emotional-speech-set-tess 74 | 75 | """### Unzip the datasets""" 76 | 77 | !unzip /content/cremad.zip 78 | !unzip /content/ravdess-emotional-speech-audio.zip 79 | !unzip /content/surrey-audiovisual-expressed-emotion-savee.zip 80 | 81 | !unzip /content/toronto-emotional-speech-set-tess.zip 82 | 83 | """### Paths for Datasets""" 84 | 85 | Ravdess = r"/content/audio_speech_actors_01-24" 86 | Crema = r"/content/AudioWAV" 87 | Tess = r"/content/TESS Toronto emotional speech set data" 88 | Savee = r"/content/ALL" 89 | 90 | """## 1. Ravdess DataFrame""" 91 | 92 | ravdess_directory_list = os.listdir(Ravdess) 93 | file_emotion = [] 94 | file_path = [] 95 | for dir in ravdess_directory_list: 96 | # as their are 20 different actors in our previous directory we need to extract files for each actor. 97 | actor = os.listdir(Ravdess + '/' + dir) 98 | for file in actor: 99 | part = file.split('.')[0] 100 | part = part.split('-') 101 | # third part in each file represents the emotion associated to that file. 102 | file_emotion.append(int(part[2])) 103 | file_path.append(Ravdess +'/'+ dir + '/' + file) 104 | 105 | # dataframe for emotion of files 106 | emotion_df = pd.DataFrame(file_emotion, columns=['Emotions']) 107 | 108 | # dataframe for path of files. 109 | path_df = pd.DataFrame(file_path, columns=['Path']) 110 | Ravdess_df = pd.concat([emotion_df, path_df], axis=1) 111 | 112 | # changing integers to actual emotions. 113 | Ravdess_df.Emotions.replace({1:'neutral', 2:'calm', 3:'happy', 4:'sad', 5:'angry', 6:'fear', 7:'disgust', 8:'surprise'}, inplace=True) 114 | Ravdess_df.head() 115 | 116 | """## 2. 
Crema DataFrame"""

crema_directory_list = os.listdir(Crema)

file_emotion = []
file_path = []

for file in crema_directory_list:
    # storing file paths
    file_path.append(Crema + '/' + file)
    # storing file emotions
    part = file.split('_')
    if part[2] == 'SAD':
        file_emotion.append('sad')
    elif part[2] == 'ANG':
        file_emotion.append('angry')
    elif part[2] == 'DIS':
        file_emotion.append('disgust')
    elif part[2] == 'FEA':
        file_emotion.append('fear')
    elif part[2] == 'HAP':
        file_emotion.append('happy')
    elif part[2] == 'NEU':
        file_emotion.append('neutral')
    else:
        file_emotion.append('Unknown')

# dataframe for emotion of files
emotion_df = pd.DataFrame(file_emotion, columns=['Emotions'])

# dataframe for path of files.
path_df = pd.DataFrame(file_path, columns=['Path'])
Crema_df = pd.concat([emotion_df, path_df], axis=1)
Crema_df.head()

"""## 3. TESS DataFrame"""

tess_directory_list = os.listdir(Tess)

file_emotion = []
file_path = []

for dir in tess_directory_list:
    directories = os.listdir(Tess + '/' + dir)
    for file in directories:
        part = file.split('.')[0]
        part = part.split('_')[2]
        if part == 'ps':
            file_emotion.append('surprise')
        else:
            file_emotion.append(part)
        file_path.append(Tess + '/' + dir + '/' + file)

# dataframe for emotion of files
emotion_df = pd.DataFrame(file_emotion, columns=['Emotions'])

# dataframe for path of files.
path_df = pd.DataFrame(file_path, columns=['Path'])
Tess_df = pd.concat([emotion_df, path_df], axis=1)
Tess_df.head()

"""## 4. SAVEE DataFrame

The audio files in this dataset are named such that the letter prefix (after the speaker initials and underscore) encodes the emotion class:

  • 'a'  = anger
  • 'd'  = disgust
  • 'f'  = fear
  • 'h'  = happiness
  • 'n'  = neutral
  • 'sa' = sadness
  • 'su' = surprise
"""

savee_directory_list = os.listdir(Savee)

file_emotion = []
file_path = []

for file in savee_directory_list:
    file_path.append(Savee + '/' + file)
    part = file.split('_')[1]
    ele = part[:-6]
    if ele == 'a':
        file_emotion.append('angry')
    elif ele == 'd':
        file_emotion.append('disgust')
    elif ele == 'f':
        file_emotion.append('fear')
    elif ele == 'h':
        file_emotion.append('happy')
    elif ele == 'n':
        file_emotion.append('neutral')
    elif ele == 'sa':
        file_emotion.append('sad')
    else:
        file_emotion.append('surprise')

# dataframe for emotion of files
emotion_df = pd.DataFrame(file_emotion, columns=['Emotions'])

# dataframe for path of files.
path_df = pd.DataFrame(file_path, columns=['Path'])
Savee_df = pd.concat([emotion_df, path_df], axis=1)
Savee_df.head()

# creating the training dataframe; only the RAVDESS data is used here, but
# Crema_df, Tess_df and Savee_df can be concatenated in as well.
data_path = pd.concat([Ravdess_df], axis=0)
data_path.to_csv("data_path.csv", index=False)
data_path

"""## Data Visualisation and Exploration

First, let's plot the count of each emotion in our dataset.
"""

import seaborn as sns
import matplotlib.pyplot as plt

data_path['Emotions'] = data_path['Emotions'].astype('category')

plt.title('Count of Emotions', size=16)
sns.countplot(data=data_path, x='Emotions')
plt.ylabel('Count', size=12)
plt.xlabel('Emotions', size=12)
sns.despine(top=True, right=True, left=False, bottom=False)
plt.show()

"""We can also plot waveplots and spectrograms for the audio signals:
  • Waveplots - a waveplot shows the loudness of the audio at a given time.
  • Spectrograms - a spectrogram is a visual representation of the spectrum of frequencies in a signal as it varies with time.
"""

def create_waveplot(data, sr, e):
    plt.figure(figsize=(10, 3))
    plt.title('Waveplot for audio with {} emotion'.format(e), size=15)
    librosa.display.waveshow(data, sr=sr)
    plt.show()

def create_spectrogram(data, sr, e):
    # librosa.stft computes the short-time Fourier transform of the signal
    X = librosa.stft(data)
    Xdb = librosa.amplitude_to_db(abs(X))
    plt.figure(figsize=(12, 3))
    plt.title('Spectrogram for audio with {} emotion'.format(e), size=15)
    librosa.display.specshow(Xdb, sr=sr, x_axis='time', y_axis='hz')
    #librosa.display.specshow(Xdb, sr=sr, x_axis='time', y_axis='log')
    plt.colorbar()

"""# Wave Plots for the Emotions

## Fear
"""

emotion = 'fear'
path = np.array(data_path.Path[data_path.Emotions==emotion])[1]
data, sampling_rate = librosa.load(path)
create_waveplot(data, sampling_rate, emotion)
create_spectrogram(data, sampling_rate, emotion)
Audio(path)

"""## Angry"""

emotion = 'angry'
path = np.array(data_path.Path[data_path.Emotions==emotion])[1]
data, sampling_rate = librosa.load(path)
create_waveplot(data, sampling_rate, emotion)
create_spectrogram(data, sampling_rate, emotion)
Audio(path)

"""## Sad"""

emotion = 'sad'
path = np.array(data_path.Path[data_path.Emotions==emotion])[1]
data, sampling_rate = librosa.load(path)
create_waveplot(data, sampling_rate, emotion)
create_spectrogram(data, sampling_rate, emotion)
Audio(path)

"""## Happy"""

emotion = 'happy'
path = np.array(data_path.Path[data_path.Emotions==emotion])[1]
data, sampling_rate = librosa.load(path)
create_waveplot(data, sampling_rate, emotion)
create_spectrogram(data, sampling_rate, emotion)
Audio(path)

"""## Data Augmentation
Data augmentation is the process of creating new synthetic data samples by adding small perturbations to our initial training set.

  • To generate synthetic data for audio, we can apply noise injection, time shifting, and changes in pitch and speed.
  • The objective is to make our model invariant to those perturbations and enhance its ability to generalize.
  • For this to work, the perturbations must preserve the same label as the original training sample.
  • For images, data augmentation can be performed by shifting, zooming, rotating, cropping, etc.

But we need to check which augmentation techniques work best for our dataset.
"""

def noise(data):
    noise_amp = 0.035*np.random.uniform()*np.amax(data)
    data = data + noise_amp*np.random.normal(size=data.shape[0])
    return data

def stretch(data, rate=0.8):
    return librosa.effects.time_stretch(data, rate=rate)

def shift(data):
    shift_range = int(np.random.uniform(low=-5, high=5)*1000)
    return np.roll(data, shift_range)

def pitch(data, sampling_rate, pitch_factor=0.7):
    return librosa.effects.pitch_shift(data, sr=sampling_rate, n_steps=pitch_factor)

# take one example file and try each technique.
path = np.array(data_path.Path)[1]
data, sample_rate = librosa.load(path)

"""#### 1. Simple Audio"""

plt.figure(figsize=(14,4))
librosa.display.waveshow(y=data, sr=sample_rate)
Audio(path)

"""#### 2. Noise Injection"""

x = noise(data)
plt.figure(figsize=(14,4))
librosa.display.waveshow(y=x, sr=sample_rate)
Audio(x, rate=sample_rate)

"""Here we can see that noise injection is a very useful augmentation technique, since it helps keep our training model from overfitting to clean recordings.

#### 3. Stretching
"""

x = stretch(data, rate=0.8)
plt.figure(figsize=(14,4))
librosa.display.waveshow(y=x, sr=sample_rate)
Audio(x, rate=sample_rate)

"""#### 4. Shifting"""

x = shift(data)
plt.figure(figsize=(14,4))
librosa.display.waveshow(y=x, sr=sample_rate)
Audio(x, rate=sample_rate)

"""#### 5. Pitch"""

x = pitch(data, sample_rate)
plt.figure(figsize=(14,4))
librosa.display.waveshow(y=x, sr=sample_rate)
Audio(x, rate=sample_rate)

"""From the augmentation techniques above, I am employing noise injection, stretching (i.e., changing speed), and pitch modulation.

## Feature Extraction
"""

def extract_features(data):
    # note: uses the global sample_rate loaded above
    result = np.array([])

    # Zero Crossing Rate (ZCR)
    zcr = np.mean(librosa.feature.zero_crossing_rate(y=data).T, axis=0)
    result = np.hstack((result, zcr))  # stacking horizontally

    # Chroma STFT
    stft = np.abs(librosa.stft(data))
    chroma_stft = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0)
    result = np.hstack((result, chroma_stft))  # stacking horizontally

    # MFCC
    mfcc = np.mean(librosa.feature.mfcc(y=data, sr=sample_rate).T, axis=0)
    result = np.hstack((result, mfcc))  # stacking horizontally

    # Root Mean Square value
    rms = np.mean(librosa.feature.rms(y=data).T, axis=0)
    result = np.hstack((result, rms))  # stacking horizontally

    # Mel spectrogram
    mel = np.mean(librosa.feature.melspectrogram(y=data, sr=sample_rate).T, axis=0)
    result = np.hstack((result, mel))  # stacking horizontally

    return result

def get_features(path):
    # duration and offset skip the silence at the start and end of each audio file, as seen above.
405 | data, sample_rate = librosa.load(path, duration=2.5, offset=0.6) 406 | 407 | # without augmentation 408 | res1 = extract_features(data) 409 | result = np.array(res1) 410 | 411 | # data with noise 412 | noise_data = noise(data) 413 | res2 = extract_features(noise_data) 414 | result = np.vstack((result, res2)) # stacking vertically 415 | 416 | # data with stretching and pitching 417 | new_data = stretch(data) 418 | data_stretch_pitch = pitch(new_data, sample_rate) 419 | res3 = extract_features(data_stretch_pitch) 420 | result = np.vstack((result, res3)) # stacking vertically 421 | 422 | return result 423 | 424 | X, Y = [], [] 425 | for path, emotion in zip(data_path.Path, data_path.Emotions): 426 | feature = get_features(path) 427 | for ele in feature: 428 | X.append(ele) 429 | # appending emotion 3 times as we have made 3 augmentation techniques on each audio file. 430 | Y.append(emotion) 431 | 432 | len(X), len(Y), data_path.Path.shape 433 | 434 | Features = pd.DataFrame(X) 435 | Features['labels'] = Y 436 | Features.to_csv('features.csv', index=False) 437 | Features.head() 438 | 439 | """We have applied data augmentation and extracted the features for each audio files and saved them. 440 | 441 | ## Data Preparation 442 | 443 | As of now we have extracted the data, now we need to normalize and split our data for training and testing. 444 | """ 445 | 446 | X = Features.iloc[: ,:-1].values 447 | Y = Features['labels'].values 448 | 449 | # As this is a multiclass classification problem onehotencoding our Y. 450 | encoder = OneHotEncoder() 451 | Y = encoder.fit_transform(np.array(Y).reshape(-1,1)).toarray() 452 | 453 | # splitting data 454 | x_train, x_test, y_train, y_test = train_test_split(X, Y, random_state=0, shuffle=True) 455 | x_train.shape, y_train.shape, x_test.shape, y_test.shape 456 | 457 | # scaling our data with sklearn's Standard scaler 458 | scaler = StandardScaler() 459 | x_train = scaler.fit_transform(x_train) 460 | x_test = scaler.transform(x_test) 461 | x_train.shape, y_train.shape, x_test.shape, y_test.shape 462 | 463 | # making our data compatible to model. 
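# Conv1D layers expect input of shape (samples, timesteps, channels), so the 1-D feature
# vectors get a trailing channel axis: (n_samples, n_features) -> (n_samples, n_features, 1).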
464 | x_train = np.expand_dims(x_train, axis=2) 465 | x_test = np.expand_dims(x_test, axis=2) 466 | x_train.shape, y_train.shape, x_test.shape, y_test.shape 467 | 468 | """## Modelling""" 469 | 470 | model=Sequential() 471 | model.add(Conv1D(256, kernel_size=5, strides=1, padding='same', activation='relu', input_shape=(x_train.shape[1], 1))) 472 | model.add(MaxPooling1D(pool_size=5, strides = 2, padding = 'same')) 473 | 474 | model.add(Conv1D(256, kernel_size=5, strides=1, padding='same', activation='relu')) 475 | model.add(MaxPooling1D(pool_size=5, strides = 2, padding = 'same')) 476 | 477 | model.add(Conv1D(128, kernel_size=5, strides=1, padding='same', activation='relu')) 478 | model.add(MaxPooling1D(pool_size=5, strides = 2, padding = 'same')) 479 | model.add(Dropout(0.2)) 480 | 481 | model.add(Conv1D(64, kernel_size=5, strides=1, padding='same', activation='relu')) 482 | model.add(MaxPooling1D(pool_size=5, strides = 2, padding = 'same')) 483 | 484 | model.add(Flatten()) 485 | model.add(Dense(units=32, activation='relu')) 486 | model.add(Dropout(0.3)) 487 | 488 | model.add(Dense(units=8, activation='softmax')) 489 | model.compile(optimizer = 'adam' , loss = 'categorical_crossentropy' , metrics = ['accuracy']) 490 | 491 | model.summary() 492 | 493 | rlrp = ReduceLROnPlateau(monitor='loss', factor=0.4, verbose=0, patience=2, min_lr=0.0000001) 494 | history=model.fit(x_train, y_train, batch_size=64, epochs=50, validation_data=(x_test, y_test), callbacks=[rlrp]) 495 | 496 | print("Accuracy of our model on test data : " , model.evaluate(x_test,y_test)[1]*100 , "%") 497 | 498 | epochs = [i for i in range(50)] 499 | fig , ax = plt.subplots(1,2) 500 | train_acc = history.history['accuracy'] 501 | train_loss = history.history['loss'] 502 | test_acc = history.history['val_accuracy'] 503 | test_loss = history.history['val_loss'] 504 | 505 | fig.set_size_inches(20,6) 506 | ax[0].plot(epochs , train_loss , label = 'Training Loss') 507 | ax[0].plot(epochs , test_loss , label = 'Testing Loss') 508 | ax[0].set_title('Training & Testing Loss') 509 | ax[0].legend() 510 | ax[0].set_xlabel("Epochs") 511 | 512 | ax[1].plot(epochs , train_acc , label = 'Training Accuracy') 513 | ax[1].plot(epochs , test_acc , label = 'Testing Accuracy') 514 | ax[1].set_title('Training & Testing Accuracy') 515 | ax[1].legend() 516 | ax[1].set_xlabel("Epochs") 517 | plt.show() 518 | 519 | # predicting on test data. 
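# model.predict returns one probability per emotion class for each test sample;
# encoder.inverse_transform maps each row back to the emotion label with the highest score.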
pred_test = model.predict(x_test)
y_pred = encoder.inverse_transform(pred_test)

y_test = encoder.inverse_transform(y_test)

df = pd.DataFrame(columns=['Predicted Labels', 'Actual Labels'])
df['Predicted Labels'] = y_pred.flatten()
df['Actual Labels'] = y_test.flatten()

df.head(10)

cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(12, 10))
cm = pd.DataFrame(cm, index=[i for i in encoder.categories_], columns=[i for i in encoder.categories_])
sns.heatmap(cm, linecolor='white', cmap='Blues', linewidth=1, annot=True, fmt='')
plt.title('Confusion Matrix', size=20)
plt.xlabel('Predicted Labels', size=14)
plt.ylabel('Actual Labels', size=14)
plt.show()

print(classification_report(y_test, y_pred))

total_instances = np.sum(cm.to_numpy())
correct_predictions = np.trace(cm)

accuracy = correct_predictions / total_instances
print(f'Overall Accuracy: {accuracy:.2%}')

"""We can see that our model is more accurate at predicting the surprise and angry emotions, which makes sense because audio clips for these emotions differ from the others in characteristics such as pitch, speed, and noise.

Overall we achieved 63.8% accuracy on our test data, which is decent, but we can improve it further by applying more augmentation techniques and using other feature extraction methods.
"""
--------------------------------------------------------------------------------