├── dataset
│   └── README.md
├── app.py
├── README.md
└── speech_emotion_recognition.py
/dataset/README.md:
--------------------------------------------------------------------------------
1 | Datasets used in this project:
2 |
3 | - Crowd-sourced Emotional Multimodal Actors Dataset (CREMA-D)
4 | - RAVDESS Emotional Speech Audio
5 | - Surrey Audio-Visual Expressed Emotion (SAVEE)
6 | - Toronto Emotional Speech Set (TESS)
7 |
--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
1 | import streamlit as st
2 | from keras.models import load_model
3 | import numpy as np
4 |
5 | from scipy.io import wavfile
6 | from scipy import signal
7 |
8 | def preprocess_speech(wav_file):
9 | # Read the wav file
10 | sample_rate, audio = wavfile.read(wav_file)
11 |
12 | # Normalize the audio data
13 | audio = audio / np.max(np.abs(audio))
14 |
15 |     # Resample to 22050 Hz (librosa's default rate, used when the training features were extracted)
16 |     resampled_audio = signal.resample_poly(audio, 22050, sample_rate)
17 |
18 |     # Extract features from the audio (must mirror the training-time feature extraction)
19 |     features = extract_features(resampled_audio, 22050)
20 |
21 | return features
22 |
23 |
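# app.py calls extract_features but never defines it. Below is a minimal sketch, assuming
# model.h5 was trained with the librosa feature pipeline from speech_emotion_recognition.py
# (mean ZCR, chroma, MFCC, RMS and mel spectrogram stacked into one vector); adjust it to
# match the exact training setup. Note that training also standard-scaled the features, so
# for best results the fitted StandardScaler should be saved and applied here as well.
import librosa  # assumed to be installed alongside the other requirements

def extract_features(data, sample_rate):
    # Mean zero-crossing rate across frames
    zcr = np.mean(librosa.feature.zero_crossing_rate(y=data).T, axis=0)
    # Chroma features computed from the magnitude STFT
    stft = np.abs(librosa.stft(data))
    chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0)
    # MFCCs, RMS energy and mel spectrogram, each averaged over time
    mfcc = np.mean(librosa.feature.mfcc(y=data, sr=sample_rate).T, axis=0)
    rms = np.mean(librosa.feature.rms(y=data).T, axis=0)
    mel = np.mean(librosa.feature.melspectrogram(y=data, sr=sample_rate).T, axis=0)
    return np.hstack((zcr, chroma, mfcc, rms, mel))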
24 | # Load the pre-trained model
25 | model = load_model('./model.h5')
26 |
27 | st.title('Emotion Recognition')
28 | st.header('This app predicts the emotion from speech')
29 |
30 | # Upload the speech file
31 | uploaded_file = st.file_uploader("Choose a speech file", type="wav")
32 |
33 | if uploaded_file is not None:
34 |     # Preprocess the uploaded speech file (see preprocess_speech and extract_features above)
35 | features = preprocess_speech(uploaded_file)
36 |     features = np.reshape(features, (1, -1, 1))  # shape (batch, features, channels) expected by the Conv1D model
37 |
38 | # Make a prediction
39 | prediction = model.predict(features)
40 |
41 |     # Get the index of the emotion class with the highest confidence
42 |     emotion = np.argmax(prediction)
43 |
44 |     # Display the result (this is the class index; see the label mapping sketch below)
45 |     st.write(f'The predicted emotion is: {emotion}')
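    # A minimal sketch of mapping the class index back to a human-readable label, assuming
    # the alphabetical label order produced by sklearn's OneHotEncoder during training
    # (verify against encoder.categories_ from the training run before relying on it).
    emotion_labels = ['angry', 'calm', 'disgust', 'fear', 'happy', 'neutral', 'sad', 'surprise']
    st.write(f'Predicted label (assuming the training label order): {emotion_labels[emotion]}')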
46 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Speech Emotion Recognition
2 |
3 | ## Overview
4 | This GitHub repository is dedicated to the development of a Speech Emotion Recognition (SER) model using a Convolutional Neural Network (CNN) built with Keras, together with Pandas and NumPy. The goal of this project is to create an efficient and accurate model that can recognize emotions in spoken language, which has a wide range of applications in fields such as human-computer interaction, customer service, and mental health.
5 |
6 | ## Features
7 |
8 | - Speech data preprocessing: Utilizes various Python libraries to preprocess and clean the speech data.
9 |
10 | - Deep Learning Model: Implements a Convolutional Neural Network (CNN) to extract meaningful features from the audio data.
11 |
12 | - Emotion Classification: Trains the model to classify speech into different emotion categories, such as happiness, sadness, anger, and fear.
13 |
14 | - Evaluation Metrics: Computes and displays evaluation metrics such as accuracy, F1-score, and a confusion matrix to assess the model's performance (see the sketch below).
18 |
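A minimal sketch of how such metrics can be computed with scikit-learn (the labels below are toy values for illustration only; in this project the labels come from the trained CNN's predictions on the test set):

```python
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix, classification_report

# Toy labels for illustration only
y_test = ['happy', 'sad', 'angry', 'sad']
y_pred = ['happy', 'sad', 'angry', 'happy']

print('Accuracy:', accuracy_score(y_test, y_pred))
print('Weighted F1:', f1_score(y_test, y_pred, average='weighted'))
print('Confusion matrix:\n', confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```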
19 |
20 | ## Usage
21 |
22 | - Fork this repository and clone it to your local machine using
23 |   `git clone https://github.com/"YOUR_GITHUB_USERNAME"/Speech-Emotion-Recognition.git`
24 |
25 | - Set up a Python environment and install the required dependencies using `pip install -r requirements.txt`.
26 |
27 | - Prepare your speech emotion dataset and organize it appropriately.
28 |
29 | - Preprocess the data, train the SER model, and evaluate its performance using the provided scripts.
30 |
31 | - Fine-tune the model and experiment with different hyperparameters for improved accuracy.
37 |
38 |
39 |
40 | ## Contributions
41 | Contributions to this project are welcome! Whether you want to improve the model's architecture, add new features, or fix bugs, please feel free to submit pull requests.
42 |
43 | ## Acknowledgments
44 | We would like to acknowledge the open-source community and the developers of the Python libraries and frameworks used in this project. Additionally, special thanks to anyone who contributes to this project to make speech emotion recognition more accessible and accurate.
45 |
46 | Get ready to explore the fascinating world of speech emotion recognition with Python. Happy coding!
47 |
--------------------------------------------------------------------------------
/speech_emotion_recognition.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """Speech_Emotion_Recognition.ipynb
3 |
4 | Automatically generated by Colaboratory.
5 |
6 | Original file is located at
7 | https://colab.research.google.com/github/MrCuber/Speech-Emotion-Recognition/blob/main/Speech_Emotion_Recognition.ipynb
8 |
9 | # Speech Emotion Recognition Project
10 |
11 | ## Contributors to this project:
12 |
13 |
14 | Umesh Chandra Sakinala - 21BCE1782
15 | Sujan Kumar Sollety - 21BCE5667
16 | Harshith Simha Gurram - 21BCE5653
17 | Pulipaka Phani Meghana - 21BCE1345
18 | Pandithradhyula Soumya - 21BCE1424
19 |
20 | """
21 |
22 | !mkdir ~/.kaggle
23 |
24 | !touch ~/.kaggle/kaggle.json
25 |
26 | api_token = {"username": "YOUR_KAGGLE_USERNAME", "key": "YOUR_KAGGLE_API_KEY"}  # use your own Kaggle API credentials; never commit a real key
27 | import json
28 | with open('/root/.kaggle/kaggle.json', 'w') as file:
29 | json.dump(api_token, file)
30 |
31 | !chmod 600 ~/.kaggle/kaggle.json
32 |
33 | !pip install np_utils
34 |
35 | import pandas as pd
36 | import numpy as np
37 |
38 | import kaggle
39 |
40 | import os
41 | import sys
42 |
43 | import librosa
44 | import librosa.display
45 | import seaborn as sns
46 | import matplotlib.pyplot as plt
47 |
48 | from sklearn.preprocessing import StandardScaler, OneHotEncoder
49 | from sklearn.metrics import confusion_matrix, classification_report
50 | from sklearn.model_selection import train_test_split
51 |
52 | from IPython.display import Audio
53 |
54 | import keras
55 | import np_utils
56 | from keras.callbacks import ReduceLROnPlateau
57 | from keras.models import Sequential
58 | from keras.layers import Dense, Conv1D, MaxPooling1D, Flatten, Dropout, BatchNormalization
59 | from keras.utils import to_categorical
60 | from keras.callbacks import ModelCheckpoint
61 |
62 | import warnings
63 | if not sys.warnoptions:
64 | warnings.simplefilter("ignore")
65 | warnings.filterwarnings("ignore", category=DeprecationWarning)
66 |
67 | """## Downloading Datasets"""
68 |
69 | !kaggle datasets download -d ejlok1/cremad
70 |
71 | !kaggle datasets download -d uwrfkaggler/ravdess-emotional-speech-audio
72 | !kaggle datasets download -d ejlok1/surrey-audiovisual-expressed-emotion-savee
73 | !kaggle datasets download -d ejlok1/toronto-emotional-speech-set-tess
74 |
75 | """### Unzip the datasets"""
76 |
77 | !unzip /content/cremad.zip
78 | !unzip /content/ravdess-emotional-speech-audio.zip
79 | !unzip /content/surrey-audiovisual-expressed-emotion-savee.zip
80 |
81 | !unzip /content/toronto-emotional-speech-set-tess.zip
82 |
83 | """### Paths for Datasets"""
84 |
85 | Ravdess = r"/content/audio_speech_actors_01-24"
86 | Crema = r"/content/AudioWAV"
87 | Tess = r"/content/TESS Toronto emotional speech set data"
88 | Savee = r"/content/ALL"
89 |
90 | """## 1. Ravdess DataFrame"""
91 |
92 | ravdess_directory_list = os.listdir(Ravdess)
93 | file_emotion = []
94 | file_path = []
95 | for dir in ravdess_directory_list:
96 |     # as there are 24 different actors in the directory above, we need to extract the files for each actor.
97 | actor = os.listdir(Ravdess + '/' + dir)
98 | for file in actor:
99 | part = file.split('.')[0]
100 | part = part.split('-')
101 |         # the third part of each file name encodes the emotion associated with that file.
102 | file_emotion.append(int(part[2]))
103 | file_path.append(Ravdess +'/'+ dir + '/' + file)
104 |
105 | # dataframe for emotion of files
106 | emotion_df = pd.DataFrame(file_emotion, columns=['Emotions'])
107 |
108 | # dataframe for path of files.
109 | path_df = pd.DataFrame(file_path, columns=['Path'])
110 | Ravdess_df = pd.concat([emotion_df, path_df], axis=1)
111 |
112 | # changing integers to actual emotions.
113 | Ravdess_df.Emotions.replace({1:'neutral', 2:'calm', 3:'happy', 4:'sad', 5:'angry', 6:'fear', 7:'disgust', 8:'surprise'}, inplace=True)
114 | Ravdess_df.head()
115 |
116 | """## 2. Crema DataFrame"""
117 |
118 | crema_directory_list = os.listdir(Crema)
119 |
120 | file_emotion = []
121 | file_path = []
122 |
123 | for file in crema_directory_list:
124 | # storing file paths
125 |     file_path.append(Crema + '/' + file)
126 | # storing file emotions
127 | part=file.split('_')
128 | if part[2] == 'SAD':
129 | file_emotion.append('sad')
130 | elif part[2] == 'ANG':
131 | file_emotion.append('angry')
132 | elif part[2] == 'DIS':
133 | file_emotion.append('disgust')
134 | elif part[2] == 'FEA':
135 | file_emotion.append('fear')
136 | elif part[2] == 'HAP':
137 | file_emotion.append('happy')
138 | elif part[2] == 'NEU':
139 | file_emotion.append('neutral')
140 | else:
141 | file_emotion.append('Unknown')
142 |
143 | # dataframe for emotion of files
144 | emotion_df = pd.DataFrame(file_emotion, columns=['Emotions'])
145 |
146 | # dataframe for path of files.
147 | path_df = pd.DataFrame(file_path, columns=['Path'])
148 | Crema_df = pd.concat([emotion_df, path_df], axis=1)
149 | Crema_df.head()
150 |
151 | """##3. TESS dataset"""
152 |
153 | tess_directory_list = os.listdir(Tess)
154 |
155 | file_emotion = []
156 | file_path = []
157 |
158 | for dir in tess_directory_list:
159 | directories = os.listdir(Tess +'/'+ dir)
160 | for file in directories:
161 | part = file.split('.')[0]
162 | part = part.split('_')[2]
163 | if part=='ps':
164 | file_emotion.append('surprise')
165 | else:
166 | file_emotion.append(part)
167 |         file_path.append(Tess + '/' + dir + '/' + file)
168 |
169 | # dataframe for emotion of files
170 | emotion_df = pd.DataFrame(file_emotion, columns=['Emotions'])
171 |
172 | # dataframe for path of files.
173 | path_df = pd.DataFrame(file_path, columns=['Path'])
174 | Tess_df = pd.concat([emotion_df, path_df], axis=1)
175 | Tess_df.head()
176 |
177 | """##4. CREMA-D dataset
178 |
179 | The audio files in this dataset are named in a way such that the prefix letters describes the emotion classes as the below:
180 |
181 |
182 | - 'a' = Anger
183 |
- 'd' = Disgust
184 |
- 'f' = Fear
185 |
- 'h' = Happiness
186 |
- 'n' = Neutral
187 |
- 'sa' = Sadness
188 |
- 'su' = Surprise
189 |
190 | """
191 |
192 | savee_directory_list = os.listdir(Savee)
193 |
194 | file_emotion = []
195 | file_path = []
196 |
197 | for file in savee_directory_list:
198 |     file_path.append(Savee + '/' + file)
199 | part = file.split('_')[1]
200 | ele = part[:-6]
201 | if ele=='a':
202 | file_emotion.append('angry')
203 | elif ele=='d':
204 | file_emotion.append('disgust')
205 | elif ele=='f':
206 | file_emotion.append('fear')
207 | elif ele=='h':
208 | file_emotion.append('happy')
209 | elif ele=='n':
210 | file_emotion.append('neutral')
211 | elif ele=='sa':
212 | file_emotion.append('sad')
213 | else:
214 | file_emotion.append('surprise')
215 |
216 | # dataframe for emotion of files
217 | emotion_df = pd.DataFrame(file_emotion, columns=['Emotions'])
218 |
219 | # dataframe for path of files.
220 | path_df = pd.DataFrame(file_path, columns=['Path'])
221 | Savee_df = pd.concat([emotion_df, path_df], axis=1)
222 | Savee_df.head()
223 |
224 | # creating the combined dataframe from all four datasets
225 | data_path = pd.concat([Ravdess_df, Crema_df, Tess_df, Savee_df], axis=0)
226 | data_path.to_csv("data_path.csv",index=False)
227 | data_path
228 |
229 | """## Data Visualisation and Exploration
230 |
231 | First, let's plot the count of each emotion in our dataset.
232 | """
233 |
234 | import seaborn as sns
235 | import matplotlib.pyplot as plt
236 |
237 | data_path['Emotions'] = data_path['Emotions'].astype('category')
238 |
239 | plt.title('Count of Emotions', size=16)
240 | sns.countplot(data=data_path, x='Emotions') # Specify x parameter
241 | plt.ylabel('Count', size=12)
242 | plt.xlabel('Emotions', size=12)
243 | sns.despine(top=True, right=True, left=False, bottom=False)
244 | plt.show()
245 |
246 | """We can also plot waveplots and spectograms for audio signals
247 | Waveplots - Waveplots let us know the loudness of the audio at a given time
248 | Spectograms - A spectrogram is a visual representation of the spectrum of frequencies of sound or other signals as they vary with time. It's a representation of frequencies changing with respect to time for given audio/music signals.
249 | """
250 |
251 | def create_waveplot(data, sr, e):
252 | plt.figure(figsize=(10, 3))
253 | plt.title('Waveplot for audio with {} emotion'.format(e), size=15)
254 | librosa.display.waveshow(data, sr=sr)
255 | plt.show()
256 |
257 | def create_spectrogram(data, sr, e):
258 |     # stft computes the short-time Fourier transform of the data
259 | X = librosa.stft(data)
260 | Xdb = librosa.amplitude_to_db(abs(X))
261 | plt.figure(figsize=(12, 3))
262 | plt.title('Spectrogram for audio with {} emotion'.format(e), size=15)
263 | librosa.display.specshow(Xdb, sr=sr, x_axis='time', y_axis='hz')
264 | #librosa.display.specshow(Xdb, sr=sr, x_axis='time', y_axis='log')
265 | plt.colorbar()
266 |
267 | """# Wave Plots for the Emotions
268 |
269 | ## Fear
270 | """
271 |
272 | emotion='fear'
273 | path = np.array(data_path.Path[data_path.Emotions==emotion])[1]
274 | data, sampling_rate = librosa.load(path)
275 | create_waveplot(data, sampling_rate, emotion)
276 | create_spectrogram(data, sampling_rate, emotion)
277 | Audio(path)
278 |
279 | """## Angry"""
280 |
281 | emotion='angry'
282 | path = np.array(data_path.Path[data_path.Emotions==emotion])[1]
283 | data, sampling_rate = librosa.load(path)
284 | create_waveplot(data, sampling_rate, emotion)
285 | create_spectrogram(data, sampling_rate, emotion)
286 | Audio(path)
287 |
288 | """## Sad"""
289 |
290 | emotion = 'sad'
291 | path = np.array(data_path.Path[data_path.Emotions==emotion])[1]
292 | data, sampling_rate = librosa.load(path)
293 | create_waveplot(data, sampling_rate, emotion)
294 | create_spectrogram(data, sampling_rate, emotion)
295 | Audio(path)
296 |
297 | """## Happy"""
298 |
299 | emotion='happy'
300 | path = np.array(data_path.Path[data_path.Emotions==emotion])[1]
301 | data, sampling_rate = librosa.load(path)
302 | create_waveplot(data, sampling_rate, emotion)
303 | create_spectrogram(data, sampling_rate, emotion)
304 | Audio(path)
305 |
306 | """## Data Augmentation
307 | Data augmentation is the process by which we create new synthetic data samples by adding small perturbations on our initial training set.
308 |
309 | To generate syntactic data for audio, we can apply noise injection, shifting time, changing pitch and speed
310 | The objective is to make our model invariant to those perturbations and enhace its ability to generalize.
311 | In order to this to work adding the perturbations must conserve the same label as the original training sample.
312 | In images data augmention can be performed by shifting the image, zooming, rotating, cropping ...etc
313 |
314 | But We need to check which augmentation techinques works better for our dataset
315 | """
316 |
317 | def noise(data):
318 | noise_amp = 0.035*np.random.uniform()*np.amax(data)
319 | data = data + noise_amp*np.random.normal(size=data.shape[0])
320 | return data
321 |
322 | def stretch(data, rate=0.8):
323 | return librosa.effects.time_stretch(data, rate=rate)
324 |
325 | def shift(data):
326 | shift_range = int(np.random.uniform(low=-5, high = 5)*1000)
327 | return np.roll(data, shift_range)
328 |
329 | def pitch(data, sampling_rate, pitch_factor=0.7):
330 | return librosa.effects.pitch_shift(data, sr=sampling_rate, n_steps=pitch_factor)
331 |
332 | # taking any example and checking for techniques.
333 | path = np.array(data_path.Path)[1]
334 | data, sample_rate = librosa.load(path)
335 |
336 | """#### 1. Simple Audio"""
337 |
338 | plt.figure(figsize=(14,4))
339 | librosa.display.waveshow(y=data, sr=sample_rate)
340 | Audio(path)
341 |
342 | """#### 2. Noise Injection"""
343 |
344 | x = noise(data)
345 | plt.figure(figsize=(14,4))
346 | librosa.display.waveshow(y=x, sr=sample_rate)
347 | Audio(x, rate=sample_rate)
348 |
349 | """Here, we can see noise injection is a very good augmentation technique because of which we can assure our training model is not overfitted
350 |
351 | #### 3.Stretching
352 | """
353 |
354 | x = stretch(data, rate=0.8)
355 | plt.figure(figsize=(14,4))
356 | librosa.display.waveshow(y=x, sr=sample_rate)
357 | Audio(x, rate=sample_rate)
358 |
359 | """#### 4.Shifting"""
360 |
361 | x = shift(data)
362 | plt.figure(figsize=(14,4))
363 | librosa.display.waveshow(y=x, sr=sample_rate)
364 | Audio(x, rate=sample_rate)
365 |
366 | """#### 5.Pitch"""
367 |
368 | x = pitch(data, sample_rate)
369 | plt.figure(figsize=(14,4))
370 | librosa.display.waveshow(y=x, sr=sample_rate)
371 | Audio(x, rate=sample_rate)
372 |
373 | """I am employing noise injection, stretching (i.e., changing speed), and pitch modulation as part of the aforementioned augmentation techniques.
374 |
375 | ## Feature Extraction
376 | """
377 |
378 | def extract_features(data, sample_rate):
379 | # ZCR
380 | result = np.array([])
381 | zcr = np.mean(librosa.feature.zero_crossing_rate(y=data).T, axis=0)
382 | result=np.hstack((result, zcr)) # stacking horizontally
383 |
384 | # Chroma_stft
385 | stft = np.abs(librosa.stft(data))
386 | chroma_stft = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0)
387 | result = np.hstack((result, chroma_stft)) # stacking horizontally
388 |
389 | # MFCC
390 | mfcc = np.mean(librosa.feature.mfcc(y=data, sr=sample_rate).T, axis=0)
391 | result = np.hstack((result, mfcc)) # stacking horizontally
392 |
393 | # Root Mean Square Value
394 | rms = np.mean(librosa.feature.rms(y=data).T, axis=0)
395 | result = np.hstack((result, rms)) # stacking horizontally
396 |
397 | # MelSpectogram
398 | mel = np.mean(librosa.feature.melspectrogram(y=data, sr=sample_rate).T, axis=0)
399 | result = np.hstack((result, mel)) # stacking horizontally
400 |
401 | return result
402 |
403 | def get_features(path):
404 |     # duration and offset are used to skip the silence at the start and end of each audio file, as seen above.
405 | data, sample_rate = librosa.load(path, duration=2.5, offset=0.6)
406 |
407 | # without augmentation
408 |     res1 = extract_features(data, sample_rate)
409 | result = np.array(res1)
410 |
411 | # data with noise
412 | noise_data = noise(data)
413 |     res2 = extract_features(noise_data, sample_rate)
414 | result = np.vstack((result, res2)) # stacking vertically
415 |
416 | # data with stretching and pitching
417 | new_data = stretch(data)
418 | data_stretch_pitch = pitch(new_data, sample_rate)
419 |     res3 = extract_features(data_stretch_pitch, sample_rate)
420 | result = np.vstack((result, res3)) # stacking vertically
421 |
422 | return result
423 |
424 | X, Y = [], []
425 | for path, emotion in zip(data_path.Path, data_path.Emotions):
426 | feature = get_features(path)
427 | for ele in feature:
428 | X.append(ele)
429 |         # appending the emotion label once per feature row; get_features returns 3 rows (original + 2 augmented versions).
430 | Y.append(emotion)
431 |
432 | len(X), len(Y), data_path.Path.shape
433 |
434 | Features = pd.DataFrame(X)
435 | Features['labels'] = Y
436 | Features.to_csv('features.csv', index=False)
437 | Features.head()
438 |
439 | """We have applied data augmentation and extracted the features for each audio files and saved them.
440 |
441 | ## Data Preparation
442 |
443 | Now that we have extracted the features, we need to normalize and split our data for training and testing.
444 | """
445 |
446 | X = Features.iloc[: ,:-1].values
447 | Y = Features['labels'].values
448 |
449 | # As this is a multiclass classification problem, we one-hot encode our Y.
450 | encoder = OneHotEncoder()
451 | Y = encoder.fit_transform(np.array(Y).reshape(-1,1)).toarray()
452 |
453 | # splitting data
454 | x_train, x_test, y_train, y_test = train_test_split(X, Y, random_state=0, shuffle=True)
455 | x_train.shape, y_train.shape, x_test.shape, y_test.shape
456 |
457 | # scaling our data with sklearn's Standard scaler
458 | scaler = StandardScaler()
459 | x_train = scaler.fit_transform(x_train)
460 | x_test = scaler.transform(x_test)
461 | x_train.shape, y_train.shape, x_test.shape, y_test.shape
462 |
463 | # making our data compatible with the Conv1D model (adds a channel dimension).
464 | x_train = np.expand_dims(x_train, axis=2)
465 | x_test = np.expand_dims(x_test, axis=2)
466 | x_train.shape, y_train.shape, x_test.shape, y_test.shape
467 |
468 | """## Modelling"""
469 |
470 | model=Sequential()
471 | model.add(Conv1D(256, kernel_size=5, strides=1, padding='same', activation='relu', input_shape=(x_train.shape[1], 1)))
472 | model.add(MaxPooling1D(pool_size=5, strides = 2, padding = 'same'))
473 |
474 | model.add(Conv1D(256, kernel_size=5, strides=1, padding='same', activation='relu'))
475 | model.add(MaxPooling1D(pool_size=5, strides = 2, padding = 'same'))
476 |
477 | model.add(Conv1D(128, kernel_size=5, strides=1, padding='same', activation='relu'))
478 | model.add(MaxPooling1D(pool_size=5, strides = 2, padding = 'same'))
479 | model.add(Dropout(0.2))
480 |
481 | model.add(Conv1D(64, kernel_size=5, strides=1, padding='same', activation='relu'))
482 | model.add(MaxPooling1D(pool_size=5, strides = 2, padding = 'same'))
483 |
484 | model.add(Flatten())
485 | model.add(Dense(units=32, activation='relu'))
486 | model.add(Dropout(0.3))
487 |
488 | model.add(Dense(units=8, activation='softmax'))
489 | model.compile(optimizer = 'adam' , loss = 'categorical_crossentropy' , metrics = ['accuracy'])
490 |
491 | model.summary()
492 |
493 | rlrp = ReduceLROnPlateau(monitor='loss', factor=0.4, verbose=0, patience=2, min_lr=0.0000001)
494 | history=model.fit(x_train, y_train, batch_size=64, epochs=50, validation_data=(x_test, y_test), callbacks=[rlrp])
495 |
496 | print("Accuracy of our model on test data : " , model.evaluate(x_test,y_test)[1]*100 , "%")
497 |
498 | epochs = [i for i in range(50)]
499 | fig , ax = plt.subplots(1,2)
500 | train_acc = history.history['accuracy']
501 | train_loss = history.history['loss']
502 | test_acc = history.history['val_accuracy']
503 | test_loss = history.history['val_loss']
504 |
505 | fig.set_size_inches(20,6)
506 | ax[0].plot(epochs , train_loss , label = 'Training Loss')
507 | ax[0].plot(epochs , test_loss , label = 'Testing Loss')
508 | ax[0].set_title('Training & Testing Loss')
509 | ax[0].legend()
510 | ax[0].set_xlabel("Epochs")
511 |
512 | ax[1].plot(epochs , train_acc , label = 'Training Accuracy')
513 | ax[1].plot(epochs , test_acc , label = 'Testing Accuracy')
514 | ax[1].set_title('Training & Testing Accuracy')
515 | ax[1].legend()
516 | ax[1].set_xlabel("Epochs")
517 | plt.show()
518 |
519 | # predicting on test data.
520 | pred_test = model.predict(x_test)
521 | y_pred = encoder.inverse_transform(pred_test)
522 |
523 | y_test = encoder.inverse_transform(y_test)
524 |
525 | df = pd.DataFrame(columns=['Predicted Labels', 'Actual Labels'])
526 | df['Predicted Labels'] = y_pred.flatten()
527 | df['Actual Labels'] = y_test.flatten()
528 |
529 | df.head(10)
530 |
531 | cm = confusion_matrix(y_test, y_pred)
532 | plt.figure(figsize = (12, 10))
533 | cm = pd.DataFrame(cm, index=encoder.categories_[0], columns=encoder.categories_[0])
534 | sns.heatmap(cm, linecolor='white', cmap='Blues', linewidth=1, annot=True, fmt='')
535 | plt.title('Confusion Matrix', size=20)
536 | plt.xlabel('Predicted Labels', size=14)
537 | plt.ylabel('Actual Labels', size=14)
538 | plt.show()
539 |
540 | print(classification_report(y_test, y_pred))
541 |
542 | total_instances = np.sum(cm.to_numpy())
543 | correct_predictions = np.trace(cm)
544 |
545 | accuracy = correct_predictions / total_instances
546 | print(f'Overall Accuracy: {accuracy:.2%}')
547 |
548 | """We can see our model is more accurate in predicting surprise, angry emotions and it makes sense also because audio files of these emotions differ to other audio files in a lot of ways like pitch, speed, noise.
549 |
550 | We overall achieved 63.8% accuracy on our test data and its decent but we can improve it more by applying more augmentation techniques and using other feature extraction methods.
551 | """
--------------------------------------------------------------------------------