├── dataset
│   └── README.md
├── app.py
├── README.md
└── speech_emotion_recognition.py

/dataset/README.md:
--------------------------------------------------------------------------------

Datasets used in this project (all downloadable from Kaggle, as shown below) are:

  • Crowd-sourced Emotional Multimodal Actors Dataset (CREMA-D)
  • RAVDESS Emotional Speech Audio
  • Surrey Audio-Visual Expressed Emotion (SAVEE)
  • Toronto Emotional Speech Set (TESS)
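All four datasets can be fetched from Kaggle, as done in speech_emotion_recognition.py (this assumes the Kaggle CLI is installed and configured with an API token):

```
kaggle datasets download -d ejlok1/cremad
kaggle datasets download -d uwrfkaggler/ravdess-emotional-speech-audio
kaggle datasets download -d ejlok1/surrey-audiovisual-expressed-emotion-savee
kaggle datasets download -d ejlok1/toronto-emotional-speech-set-tess
```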
--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
import streamlit as st
from keras.models import load_model
import numpy as np
import librosa

def extract_features(data, sample_rate):
    # Mirror the feature set used during training in speech_emotion_recognition.py:
    # ZCR, chroma STFT, MFCC, RMS and mel spectrogram, each averaged over time.
    zcr = np.mean(librosa.feature.zero_crossing_rate(y=data).T, axis=0)
    stft = np.abs(librosa.stft(data))
    chroma_stft = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0)
    mfcc = np.mean(librosa.feature.mfcc(y=data, sr=sample_rate).T, axis=0)
    rms = np.mean(librosa.feature.rms(y=data).T, axis=0)
    mel = np.mean(librosa.feature.melspectrogram(y=data, sr=sample_rate).T, axis=0)
    return np.hstack((zcr, chroma_stft, mfcc, rms, mel))

def preprocess_speech(wav_file):
    # Read the uploaded wav file with the same duration/offset as the training pipeline
    data, sample_rate = librosa.load(wav_file, duration=2.5, offset=0.6)

    # Extract features from the audio (ZCR, chroma, MFCCs, RMS, mel spectrogram)
    # NOTE: training also applies a StandardScaler; persist and reuse that scaler here for best results.
    features = extract_features(data, sample_rate)

    return features

# Load the pre-trained model
model = load_model('./model.h5')

# Emotion labels in the order produced by the OneHotEncoder used during training
# (alphabetical); adjust this list if your encoder's category order differs.
EMOTIONS = ['angry', 'calm', 'disgust', 'fear', 'happy', 'neutral', 'sad', 'surprise']

st.title('Emotion Recognition')
st.header('This app predicts the emotion from speech')

# Upload the speech file
uploaded_file = st.file_uploader("Choose a speech file", type="wav")

if uploaded_file is not None:
    # Preprocess the speech file
    features = preprocess_speech(uploaded_file)
    features = features.reshape(1, -1, 1)  # (batch, n_features, channels) for the Conv1D model

    # Make a prediction
    prediction = model.predict(features)

    # Get the emotion with the highest confidence
    emotion = EMOTIONS[np.argmax(prediction)]

    # Display the result
    st.write(f'The predicted emotion is: {emotion}')
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Speech Emotion Recognition

## Overview
This GitHub repository is dedicated to the development of a Speech Emotion Recognition (SER) model built with a Convolutional Neural Network (CNN) using Keras, Librosa, Pandas, and NumPy. The goal of this project is to create an efficient and accurate model that can recognize emotions in spoken language, which has a wide range of applications in fields such as human-computer interaction, customer service, and mental health.

## Features
  • Speech Data Preprocessing: Uses Librosa and NumPy to load, clean, and augment the speech data.
  • Deep Learning Model: Implements a Convolutional Neural Network (CNN) that learns meaningful patterns from the extracted audio features.
  • Emotion Classification: Trains the model to classify speech into different emotion categories, such as happiness, sadness, anger, fear, etc.
  • Evaluation Metrics: Computes and displays evaluation metrics such as accuracy, F1-score, and a confusion matrix to assess the model's performance (see the sketch below).
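For example, these metrics are produced with scikit-learn, mirroring the calls used in speech_emotion_recognition.py; `y_test` and `y_pred` below are placeholder label arrays standing in for the decoded test labels and model predictions:

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Placeholder decoded labels; in the notebook these come from encoder.inverse_transform(...)
y_test = ['happy', 'sad', 'angry', 'happy']   # actual labels
y_pred = ['happy', 'sad', 'happy', 'happy']   # model predictions

print("Accuracy:", accuracy_score(y_test, y_pred))   # overall accuracy
print(classification_report(y_test, y_pred))         # per-class precision, recall, F1-score
print(confusion_matrix(y_test, y_pred))              # rows = actual, columns = predicted
```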

## Usage
  1. Fork this repository and clone it to your local machine using git clone https://github.com/"YOUR_GITHUB_USERNAME"/Speech-Emotion-Recognition.git.
  2. Set up a Python environment and install the required dependencies using pip install -r requirements.txt.
  3. Prepare your speech emotion dataset and organize it appropriately (see the expected layout below).
  4. Preprocess the data, train the SER model, and evaluate its performance using the provided scripts.
  5. Fine-tune the model and experiment with different hyperparameters for improved accuracy.
  6. Once a trained model has been saved as model.h5, launch the demo app with streamlit run app.py.
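For reference, the training notebook expects the unzipped Kaggle datasets at the following paths (Colab defaults); if you work locally, adjust these variables at the top of speech_emotion_recognition.py to match your layout:

```python
# Dataset locations used in speech_emotion_recognition.py (Colab defaults)
Ravdess = r"/content/audio_speech_actors_01-24"
Crema = r"/content/AudioWAV"
Tess = r"/content/TESS Toronto emotional speech set data"
Savee = r"/content/ALL"
```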
    39 | 40 | ## Contributions 41 | Contributions to this project are welcome! Whether you want to improve the model's architecture, add new features, or fix bugs, please feel free to submit pull requests. 42 | 43 | ## Acknowledgments 44 | We would like to acknowledge the open-source community and the developers of the Python libraries and frameworks used in this project. Additionally, special thanks to anyone who contributes to this project to make speech emotion recognition more accessible and accurate. 45 | 46 | Get ready to explore the fascinating world of speech emotion recognition with Python. Happy coding! 47 | -------------------------------------------------------------------------------- /speech_emotion_recognition.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """Speech_Emotion_Recognition.ipynb 3 | 4 | Automatically generated by Colaboratory. 5 | 6 | Original file is located at 7 | https://colab.research.google.com/github/MrCuber/Speech-Emotion-Recognition/blob/main/Speech_Emotion_Recognition.ipynb 8 | 9 | # Speech Emotion Recognition Project 10 | 11 | ## Contributors to this project: 12 | 13 | 14 |
  • Umesh Chandra Sakinala - 21BCE1782 15 |
  • Sujan Kumar Sollety - 21BCE5667 16 |
  • Harshith Simha Gurram - 21BCE5653 17 |
  • Pulipaka Phani Meghana - 21BCE1345 18 |
  • Pandithradhyula Soumya - 21BCE1424 19 | 20 | """ 21 | 22 | !mkdir ~/.kaggle 23 | 24 | !touch ~/.kaggle/kaggle.json 25 | 26 | api_token = {"username":"umesh109","key":"aef50af22b6fe262935d024c7c135ac8"} 27 | import json 28 | with open('/root/.kaggle/kaggle.json', 'w') as file: 29 | json.dump(api_token, file) 30 | 31 | !chmod 600 ~/.kaggle/kaggle.json 32 | 33 | pip install np_utils 34 | 35 | import pandas as pd 36 | import numpy as np 37 | 38 | import kaggle 39 | 40 | import os 41 | import sys 42 | 43 | import librosa 44 | import librosa.display 45 | import seaborn as sns 46 | import matplotlib.pyplot as plt 47 | 48 | from sklearn.preprocessing import StandardScaler, OneHotEncoder 49 | from sklearn.metrics import confusion_matrix, classification_report 50 | from sklearn.model_selection import train_test_split 51 | 52 | from IPython.display import Audio 53 | 54 | import keras 55 | import np_utils 56 | from keras.callbacks import ReduceLROnPlateau 57 | from keras.models import Sequential 58 | from keras.layers import Dense, Conv1D, MaxPooling1D, Flatten, Dropout, BatchNormalization 59 | from keras.utils import to_categorical 60 | from keras.callbacks import ModelCheckpoint 61 | 62 | import warnings 63 | if not sys.warnoptions: 64 | warnings.simplefilter("ignore") 65 | warnings.filterwarnings("ignore", category=DeprecationWarning) 66 | 67 | """## Downloading Datasets""" 68 | 69 | !kaggle datasets download -d ejlok1/cremad 70 | 71 | !kaggle datasets download -d uwrfkaggler/ravdess-emotional-speech-audio 72 | !kaggle datasets download -d ejlok1/surrey-audiovisual-expressed-emotion-savee 73 | !kaggle datasets download -d ejlok1/toronto-emotional-speech-set-tess 74 | 75 | """### Unzip the datasets""" 76 | 77 | !unzip /content/cremad.zip 78 | !unzip /content/ravdess-emotional-speech-audio.zip 79 | !unzip /content/surrey-audiovisual-expressed-emotion-savee.zip 80 | 81 | !unzip /content/toronto-emotional-speech-set-tess.zip 82 | 83 | """### Paths for Datasets""" 84 | 85 | Ravdess = r"/content/audio_speech_actors_01-24" 86 | Crema = r"/content/AudioWAV" 87 | Tess = r"/content/TESS Toronto emotional speech set data" 88 | Savee = r"/content/ALL" 89 | 90 | """## 1. Ravdess DataFrame""" 91 | 92 | ravdess_directory_list = os.listdir(Ravdess) 93 | file_emotion = [] 94 | file_path = [] 95 | for dir in ravdess_directory_list: 96 | # as their are 20 different actors in our previous directory we need to extract files for each actor. 97 | actor = os.listdir(Ravdess + '/' + dir) 98 | for file in actor: 99 | part = file.split('.')[0] 100 | part = part.split('-') 101 | # third part in each file represents the emotion associated to that file. 102 | file_emotion.append(int(part[2])) 103 | file_path.append(Ravdess +'/'+ dir + '/' + file) 104 | 105 | # dataframe for emotion of files 106 | emotion_df = pd.DataFrame(file_emotion, columns=['Emotions']) 107 | 108 | # dataframe for path of files. 109 | path_df = pd.DataFrame(file_path, columns=['Path']) 110 | Ravdess_df = pd.concat([emotion_df, path_df], axis=1) 111 | 112 | # changing integers to actual emotions. 113 | Ravdess_df.Emotions.replace({1:'neutral', 2:'calm', 3:'happy', 4:'sad', 5:'angry', 6:'fear', 7:'disgust', 8:'surprise'}, inplace=True) 114 | Ravdess_df.head() 115 | 116 | """## 2. 
Crema DataFrame"""

crema_directory_list = os.listdir(Crema)

file_emotion = []
file_path = []

for file in crema_directory_list:
    # storing file paths
    file_path.append(Crema + '/' + file)
    # storing file emotions
    part = file.split('_')
    if part[2] == 'SAD':
        file_emotion.append('sad')
    elif part[2] == 'ANG':
        file_emotion.append('angry')
    elif part[2] == 'DIS':
        file_emotion.append('disgust')
    elif part[2] == 'FEA':
        file_emotion.append('fear')
    elif part[2] == 'HAP':
        file_emotion.append('happy')
    elif part[2] == 'NEU':
        file_emotion.append('neutral')
    else:
        file_emotion.append('Unknown')

# dataframe for emotion of files
emotion_df = pd.DataFrame(file_emotion, columns=['Emotions'])

# dataframe for path of files.
path_df = pd.DataFrame(file_path, columns=['Path'])
Crema_df = pd.concat([emotion_df, path_df], axis=1)
Crema_df.head()

"""## 3. TESS DataFrame"""

tess_directory_list = os.listdir(Tess)

file_emotion = []
file_path = []

for dir in tess_directory_list:
    directories = os.listdir(Tess + '/' + dir)
    for file in directories:
        part = file.split('.')[0]
        part = part.split('_')[2]
        if part == 'ps':
            file_emotion.append('surprise')
        else:
            file_emotion.append(part)
        file_path.append(Tess + '/' + dir + '/' + file)

# dataframe for emotion of files
emotion_df = pd.DataFrame(file_emotion, columns=['Emotions'])

# dataframe for path of files.
path_df = pd.DataFrame(file_path, columns=['Path'])
Tess_df = pd.concat([emotion_df, path_df], axis=1)
Tess_df.head()

"""## 4. SAVEE DataFrame

The audio files in this dataset are named such that the letter prefix (after the speaker initials and underscore) encodes the emotion class:

  • 'a'  = anger
  • 'd'  = disgust
  • 'f'  = fear
  • 'h'  = happiness
  • 'n'  = neutral
  • 'sa' = sadness
  • 'su' = surprise
"""

savee_directory_list = os.listdir(Savee)

file_emotion = []
file_path = []

for file in savee_directory_list:
    file_path.append(Savee + '/' + file)
    part = file.split('_')[1]
    ele = part[:-6]
    if ele == 'a':
        file_emotion.append('angry')
    elif ele == 'd':
        file_emotion.append('disgust')
    elif ele == 'f':
        file_emotion.append('fear')
    elif ele == 'h':
        file_emotion.append('happy')
    elif ele == 'n':
        file_emotion.append('neutral')
    elif ele == 'sa':
        file_emotion.append('sad')
    else:
        file_emotion.append('surprise')

# dataframe for emotion of files
emotion_df = pd.DataFrame(file_emotion, columns=['Emotions'])

# dataframe for path of files.
path_df = pd.DataFrame(file_path, columns=['Path'])
Savee_df = pd.concat([emotion_df, path_df], axis=1)
Savee_df.head()

# creating the training dataframe; only the RAVDESS data is used here, but
# Crema_df, Tess_df and Savee_df can be concatenated in as well.
data_path = pd.concat([Ravdess_df], axis=0)
data_path.to_csv("data_path.csv", index=False)
data_path

"""## Data Visualisation and Exploration

First, let's plot the count of each emotion in our dataset.
"""

import seaborn as sns
import matplotlib.pyplot as plt

data_path['Emotions'] = data_path['Emotions'].astype('category')

plt.title('Count of Emotions', size=16)
sns.countplot(data=data_path, x='Emotions')
plt.ylabel('Count', size=12)
plt.xlabel('Emotions', size=12)
sns.despine(top=True, right=True, left=False, bottom=False)
plt.show()

"""We can also plot waveplots and spectrograms for the audio signals:
  • Waveplots - a waveplot shows the loudness of the audio at a given time.
  • Spectrograms - a spectrogram is a visual representation of the spectrum of frequencies in a signal as it varies with time.
"""

def create_waveplot(data, sr, e):
    plt.figure(figsize=(10, 3))
    plt.title('Waveplot for audio with {} emotion'.format(e), size=15)
    librosa.display.waveshow(data, sr=sr)
    plt.show()

def create_spectrogram(data, sr, e):
    # librosa.stft computes the short-time Fourier transform of the signal
    X = librosa.stft(data)
    Xdb = librosa.amplitude_to_db(abs(X))
    plt.figure(figsize=(12, 3))
    plt.title('Spectrogram for audio with {} emotion'.format(e), size=15)
    librosa.display.specshow(Xdb, sr=sr, x_axis='time', y_axis='hz')
    #librosa.display.specshow(Xdb, sr=sr, x_axis='time', y_axis='log')
    plt.colorbar()

"""# Wave Plots for the Emotions

## Fear
"""

emotion = 'fear'
path = np.array(data_path.Path[data_path.Emotions==emotion])[1]
data, sampling_rate = librosa.load(path)
create_waveplot(data, sampling_rate, emotion)
create_spectrogram(data, sampling_rate, emotion)
Audio(path)

"""## Angry"""

emotion = 'angry'
path = np.array(data_path.Path[data_path.Emotions==emotion])[1]
data, sampling_rate = librosa.load(path)
create_waveplot(data, sampling_rate, emotion)
create_spectrogram(data, sampling_rate, emotion)
Audio(path)

"""## Sad"""

emotion = 'sad'
path = np.array(data_path.Path[data_path.Emotions==emotion])[1]
data, sampling_rate = librosa.load(path)
create_waveplot(data, sampling_rate, emotion)
create_spectrogram(data, sampling_rate, emotion)
Audio(path)

"""## Happy"""

emotion = 'happy'
path = np.array(data_path.Path[data_path.Emotions==emotion])[1]
data, sampling_rate = librosa.load(path)
create_waveplot(data, sampling_rate, emotion)
create_spectrogram(data, sampling_rate, emotion)
Audio(path)

"""## Data Augmentation
Data augmentation is the process of creating new synthetic data samples by adding small perturbations to our initial training set.

  • To generate synthetic data for audio, we can apply noise injection, time shifting, and changes in pitch and speed.
  • The objective is to make our model invariant to those perturbations and enhance its ability to generalize.
  • For this to work, the perturbations must preserve the same label as the original training sample.
  • For images, data augmentation can be performed by shifting, zooming, rotating, cropping, etc.

But we need to check which augmentation techniques work best for our dataset.
"""

def noise(data):
    noise_amp = 0.035*np.random.uniform()*np.amax(data)
    data = data + noise_amp*np.random.normal(size=data.shape[0])
    return data

def stretch(data, rate=0.8):
    return librosa.effects.time_stretch(data, rate=rate)

def shift(data):
    shift_range = int(np.random.uniform(low=-5, high=5)*1000)
    return np.roll(data, shift_range)

def pitch(data, sampling_rate, pitch_factor=0.7):
    return librosa.effects.pitch_shift(data, sr=sampling_rate, n_steps=pitch_factor)

# take one example file and try each technique.
path = np.array(data_path.Path)[1]
data, sample_rate = librosa.load(path)

"""#### 1. Simple Audio"""

plt.figure(figsize=(14,4))
librosa.display.waveshow(y=data, sr=sample_rate)
Audio(path)

"""#### 2. Noise Injection"""

x = noise(data)
plt.figure(figsize=(14,4))
librosa.display.waveshow(y=x, sr=sample_rate)
Audio(x, rate=sample_rate)

"""Here we can see that noise injection is a very useful augmentation technique, since it helps keep our training model from overfitting to clean recordings.

#### 3. Stretching
"""

x = stretch(data, rate=0.8)
plt.figure(figsize=(14,4))
librosa.display.waveshow(y=x, sr=sample_rate)
Audio(x, rate=sample_rate)

"""#### 4. Shifting"""

x = shift(data)
plt.figure(figsize=(14,4))
librosa.display.waveshow(y=x, sr=sample_rate)
Audio(x, rate=sample_rate)

"""#### 5. Pitch"""

x = pitch(data, sample_rate)
plt.figure(figsize=(14,4))
librosa.display.waveshow(y=x, sr=sample_rate)
Audio(x, rate=sample_rate)

"""From the augmentation techniques above, I am employing noise injection, stretching (i.e., changing speed), and pitch modulation.

## Feature Extraction
"""

def extract_features(data):
    # note: uses the global sample_rate loaded above
    result = np.array([])

    # Zero Crossing Rate (ZCR)
    zcr = np.mean(librosa.feature.zero_crossing_rate(y=data).T, axis=0)
    result = np.hstack((result, zcr))  # stacking horizontally

    # Chroma STFT
    stft = np.abs(librosa.stft(data))
    chroma_stft = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0)
    result = np.hstack((result, chroma_stft))  # stacking horizontally

    # MFCC
    mfcc = np.mean(librosa.feature.mfcc(y=data, sr=sample_rate).T, axis=0)
    result = np.hstack((result, mfcc))  # stacking horizontally

    # Root Mean Square value
    rms = np.mean(librosa.feature.rms(y=data).T, axis=0)
    result = np.hstack((result, rms))  # stacking horizontally

    # Mel spectrogram
    mel = np.mean(librosa.feature.melspectrogram(y=data, sr=sample_rate).T, axis=0)
    result = np.hstack((result, mel))  # stacking horizontally

    return result

def get_features(path):
    # duration and offset skip the silence at the start and end of each audio file, as seen above.
405 | data, sample_rate = librosa.load(path, duration=2.5, offset=0.6) 406 | 407 | # without augmentation 408 | res1 = extract_features(data) 409 | result = np.array(res1) 410 | 411 | # data with noise 412 | noise_data = noise(data) 413 | res2 = extract_features(noise_data) 414 | result = np.vstack((result, res2)) # stacking vertically 415 | 416 | # data with stretching and pitching 417 | new_data = stretch(data) 418 | data_stretch_pitch = pitch(new_data, sample_rate) 419 | res3 = extract_features(data_stretch_pitch) 420 | result = np.vstack((result, res3)) # stacking vertically 421 | 422 | return result 423 | 424 | X, Y = [], [] 425 | for path, emotion in zip(data_path.Path, data_path.Emotions): 426 | feature = get_features(path) 427 | for ele in feature: 428 | X.append(ele) 429 | # appending emotion 3 times as we have made 3 augmentation techniques on each audio file. 430 | Y.append(emotion) 431 | 432 | len(X), len(Y), data_path.Path.shape 433 | 434 | Features = pd.DataFrame(X) 435 | Features['labels'] = Y 436 | Features.to_csv('features.csv', index=False) 437 | Features.head() 438 | 439 | """We have applied data augmentation and extracted the features for each audio files and saved them. 440 | 441 | ## Data Preparation 442 | 443 | As of now we have extracted the data, now we need to normalize and split our data for training and testing. 444 | """ 445 | 446 | X = Features.iloc[: ,:-1].values 447 | Y = Features['labels'].values 448 | 449 | # As this is a multiclass classification problem onehotencoding our Y. 450 | encoder = OneHotEncoder() 451 | Y = encoder.fit_transform(np.array(Y).reshape(-1,1)).toarray() 452 | 453 | # splitting data 454 | x_train, x_test, y_train, y_test = train_test_split(X, Y, random_state=0, shuffle=True) 455 | x_train.shape, y_train.shape, x_test.shape, y_test.shape 456 | 457 | # scaling our data with sklearn's Standard scaler 458 | scaler = StandardScaler() 459 | x_train = scaler.fit_transform(x_train) 460 | x_test = scaler.transform(x_test) 461 | x_train.shape, y_train.shape, x_test.shape, y_test.shape 462 | 463 | # making our data compatible to model. 
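# Conv1D layers expect input of shape (samples, timesteps, channels), so the 1-D feature
# vectors get a trailing channel axis: (n_samples, n_features) -> (n_samples, n_features, 1).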
464 | x_train = np.expand_dims(x_train, axis=2) 465 | x_test = np.expand_dims(x_test, axis=2) 466 | x_train.shape, y_train.shape, x_test.shape, y_test.shape 467 | 468 | """## Modelling""" 469 | 470 | model=Sequential() 471 | model.add(Conv1D(256, kernel_size=5, strides=1, padding='same', activation='relu', input_shape=(x_train.shape[1], 1))) 472 | model.add(MaxPooling1D(pool_size=5, strides = 2, padding = 'same')) 473 | 474 | model.add(Conv1D(256, kernel_size=5, strides=1, padding='same', activation='relu')) 475 | model.add(MaxPooling1D(pool_size=5, strides = 2, padding = 'same')) 476 | 477 | model.add(Conv1D(128, kernel_size=5, strides=1, padding='same', activation='relu')) 478 | model.add(MaxPooling1D(pool_size=5, strides = 2, padding = 'same')) 479 | model.add(Dropout(0.2)) 480 | 481 | model.add(Conv1D(64, kernel_size=5, strides=1, padding='same', activation='relu')) 482 | model.add(MaxPooling1D(pool_size=5, strides = 2, padding = 'same')) 483 | 484 | model.add(Flatten()) 485 | model.add(Dense(units=32, activation='relu')) 486 | model.add(Dropout(0.3)) 487 | 488 | model.add(Dense(units=8, activation='softmax')) 489 | model.compile(optimizer = 'adam' , loss = 'categorical_crossentropy' , metrics = ['accuracy']) 490 | 491 | model.summary() 492 | 493 | rlrp = ReduceLROnPlateau(monitor='loss', factor=0.4, verbose=0, patience=2, min_lr=0.0000001) 494 | history=model.fit(x_train, y_train, batch_size=64, epochs=50, validation_data=(x_test, y_test), callbacks=[rlrp]) 495 | 496 | print("Accuracy of our model on test data : " , model.evaluate(x_test,y_test)[1]*100 , "%") 497 | 498 | epochs = [i for i in range(50)] 499 | fig , ax = plt.subplots(1,2) 500 | train_acc = history.history['accuracy'] 501 | train_loss = history.history['loss'] 502 | test_acc = history.history['val_accuracy'] 503 | test_loss = history.history['val_loss'] 504 | 505 | fig.set_size_inches(20,6) 506 | ax[0].plot(epochs , train_loss , label = 'Training Loss') 507 | ax[0].plot(epochs , test_loss , label = 'Testing Loss') 508 | ax[0].set_title('Training & Testing Loss') 509 | ax[0].legend() 510 | ax[0].set_xlabel("Epochs") 511 | 512 | ax[1].plot(epochs , train_acc , label = 'Training Accuracy') 513 | ax[1].plot(epochs , test_acc , label = 'Testing Accuracy') 514 | ax[1].set_title('Training & Testing Accuracy') 515 | ax[1].legend() 516 | ax[1].set_xlabel("Epochs") 517 | plt.show() 518 | 519 | # predicting on test data. 
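# model.predict returns one probability per emotion class for each test sample;
# encoder.inverse_transform maps each row back to the emotion label with the highest score.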
pred_test = model.predict(x_test)
y_pred = encoder.inverse_transform(pred_test)

y_test = encoder.inverse_transform(y_test)

df = pd.DataFrame(columns=['Predicted Labels', 'Actual Labels'])
df['Predicted Labels'] = y_pred.flatten()
df['Actual Labels'] = y_test.flatten()

df.head(10)

cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(12, 10))
cm = pd.DataFrame(cm, index=[i for i in encoder.categories_], columns=[i for i in encoder.categories_])
sns.heatmap(cm, linecolor='white', cmap='Blues', linewidth=1, annot=True, fmt='')
plt.title('Confusion Matrix', size=20)
plt.xlabel('Predicted Labels', size=14)
plt.ylabel('Actual Labels', size=14)
plt.show()

print(classification_report(y_test, y_pred))

total_instances = np.sum(cm.to_numpy())
correct_predictions = np.trace(cm)

accuracy = correct_predictions / total_instances
print(f'Overall Accuracy: {accuracy:.2%}')

"""We can see that our model is more accurate at predicting the surprise and angry emotions, which makes sense because audio clips for these emotions differ from the others in characteristics such as pitch, speed, and noise.

Overall we achieved 63.8% accuracy on our test data, which is decent, but we can improve it further by applying more augmentation techniques and using other feature extraction methods.
"""
--------------------------------------------------------------------------------