634 |
635 | This program is free software: you can redistribute it and/or modify
636 | it under the terms of the GNU Affero General Public License as published
637 | by the Free Software Foundation, either version 3 of the License, or
638 | (at your option) any later version.
639 |
640 | This program is distributed in the hope that it will be useful,
641 | but WITHOUT ANY WARRANTY; without even the implied warranty of
642 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
643 | GNU Affero General Public License for more details.
644 |
645 | You should have received a copy of the GNU Affero General Public License
646 | along with this program. If not, see <https://www.gnu.org/licenses/>.
647 |
648 | Also add information on how to contact you by electronic and paper mail.
649 |
650 | If your software can interact with users remotely through a computer
651 | network, you should also make sure that it provides a way for users to
652 | get its source. For example, if your program is a web application, its
653 | interface could display a "Source" link that leads users to an archive
654 | of the code. There are many ways you could offer source, and different
655 | solutions will be better for different programs; see section 13 for the
656 | specific requirements.
657 |
658 | You should also get your employer (if you work as a programmer) or school,
659 | if any, to sign a "copyright disclaimer" for the program, if necessary.
660 | For more information on this, and how to apply and follow the GNU AGPL, see
661 | <https://www.gnu.org/licenses/>.
662 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | # Speech2dCNN_LSTM
5 |
6 |
7 |
8 |
11 |
12 |
13 | ## Description
14 | A PyTorch implementation of [Speech emotion recognition using deep 1D & 2D CNN LSTM networks](https://www.sciencedirect.com/science/article/abs/pii/S1746809418302337), built with PyTorch Lightning and using wandb sweeps for hyperparameter search. I'm not affiliated with the authors of the paper.
15 |
16 | 
17 | ## How to run
18 | First, install the dependencies:
19 | ```bash
20 | # clone project
21 | git clone https://github.com/RicardoP0/Speech2dCNN_LSTM.git
22 |
23 | # install project
24 | cd Speech2dCNN_LSTM
25 | pip install -e .
26 | pip install -r requirements.txt
27 | ```
28 | Next, navigate to [CNN+LSTM](https://github.com/RicardoP0/Speech2dCNN_LSTM/tree/master/research_seed/audio_classification) and run it.
29 | ```bash
30 | # module folder
31 | cd research_seed/audio_classification/
32 |
33 | # run module
34 | python cnn_trainer.py
35 | ```
36 |
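Note that `cnn_trainer.py` currently hard-codes its argument list (data root, number of classes, epochs) and logs to a wandb project, so you may need to edit it before running. Stripped of the wandb logging and checkpoint handling, the core of what it does is roughly the sketch below; the flag names come from `add_model_specific_args` in `cnn_rnn.py`, and the data path is only the repository default, so adjust it to your local setup.

```python
from argparse import ArgumentParser
from pytorch_lightning import Trainer
from research_seed.audio_classification.cnn_rnn import CNN_RNN

parser = ArgumentParser(add_help=False)
parser.add_argument('--gpus', type=str, default=1)
parser.add_argument('--nodes', type=int, default=1)
parser = CNN_RNN.add_model_specific_args(parser)
hparams = parser.parse_args()  # e.g. --data_root ../datasets/RAVDESS/SOUND_SPECT/ --num_classes 8

model = CNN_RNN(hparams)
trainer = Trainer(max_nb_epochs=hparams.max_nb_epochs, gpus=hparams.gpus,
                  nb_gpu_nodes=hparams.nodes)
trainer.fit(model)
```
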
37 | ## Main Contribution
38 |
39 | - [CNN+LSTM](https://github.com/RicardoP0/Speech2dCNN_LSTM/tree/master/research_seed/audio_classification)
40 |
41 | ## Results
42 |
43 | Validation accuracy reaches 0.4 and the macro F1 score reaches 0.3 when classifying 8 emotion classes.
44 | 
45 | 
46 |
--------------------------------------------------------------------------------
/img/spectogram.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RicardoP0/Speech2dCNN_LSTM/78b9cb6a483c0acbce08cda6bf769d9015716ba8/img/spectogram.png
--------------------------------------------------------------------------------
/img/val_acc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RicardoP0/Speech2dCNN_LSTM/78b9cb6a483c0acbce08cda6bf769d9015716ba8/img/val_acc.png
--------------------------------------------------------------------------------
/img/val_f1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RicardoP0/Speech2dCNN_LSTM/78b9cb6a483c0acbce08cda6bf769d9015716ba8/img/val_f1.png
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | adabound==0.0.5
2 | torchaudio==0.4.0a0+719bcc7
3 | librosa==0.7.1
4 | pytorch_lightning==0.7.1
5 | scikit_image==0.16.2
6 | matplotlib==3.1.3
7 | torchvision==0.5.0
8 | numpy==1.22.0
9 | wandb==0.8.32
10 | pandas==1.0.3
11 | torch==1.4.0
12 | Pillow==9.3.0
13 | scikit_learn==0.22.2.post1
16 |
17 |
--------------------------------------------------------------------------------
/research_seed/README.md:
--------------------------------------------------------------------------------
1 | ## Research Seed Folder
2 |
3 |
4 | ##### cnn_trainer.py
5 | Runs your LightningModule. Abstracts training loop, distributed training, etc...
6 |
7 | ##### cnn_sweep.py
8 | Code for hyperparameter search using wandb sweeps; a sketch of a possible sweep configuration is shown below.
9 |
10 | ##### cnn_lflb.py and cnn_rnn.py
11 | Code for the CNN and CNN + LSTM models.
12 |
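A wandb sweep drives `cnn_sweep.py` by launching it with the swept values as command-line flags, which are picked up by the argparse options defined in the model. The sweep configuration itself is not shipped with this repository; the following is a minimal sketch of what one could look like. The metric name and parameter ranges are assumptions, but the parameter names match `CNN_RNN.add_model_specific_args` and the project name matches the one used in the training scripts.

```python
import wandb

# hypothetical sweep definition; not part of the repository
sweep_config = {
    'program': 'cnn_sweep.py',
    'method': 'bayes',
    'metric': {'name': 'val_loss', 'goal': 'minimize'},
    'parameters': {
        'learning_rate_init': {'min': 1e-5, 'max': 1e-3},
        'dropout_1': {'min': 0.1, 'max': 0.7},
        'hidden_size_rnn': {'values': [128, 256, 512]},
    },
}

sweep_id = wandb.sweep(sweep_config, project='audio_emotion_team')
# runs are then launched from the shell with: wandb agent <entity>/<project>/<sweep_id>
```
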
--------------------------------------------------------------------------------
/research_seed/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RicardoP0/Speech2dCNN_LSTM/78b9cb6a483c0acbce08cda6bf769d9015716ba8/research_seed/__init__.py
--------------------------------------------------------------------------------
/research_seed/audio_classification/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RicardoP0/Speech2dCNN_LSTM/78b9cb6a483c0acbce08cda6bf769d9015716ba8/research_seed/audio_classification/__init__.py
--------------------------------------------------------------------------------
/research_seed/audio_classification/anaconda3/bin/wandb:
--------------------------------------------------------------------------------
1 | #!/home/ricardo/anaconda3/bin/python
2 | # -*- coding: utf-8 -*-
3 | import re
4 | import sys
5 | from wandb.cli import cli
6 | if __name__ == '__main__':
7 | sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
8 | sys.exit(cli())
9 |
--------------------------------------------------------------------------------
/research_seed/audio_classification/cnn_lflb.py:
--------------------------------------------------------------------------------
1 | """
2 | This file defines the CNN model built from local feature learning blocks (LFLBs).
3 | """
4 | # %%
5 | import os
6 | import torch
7 | import torch.nn.functional as F
8 | import torch.nn as nn
9 | from torch.utils.data import DataLoader
10 | from torchvision.datasets import MNIST
11 | import torchvision.transforms as transforms
12 | from argparse import ArgumentParser
13 | from collections import OrderedDict
14 | from research_seed.audio_classification.datasets.iemocap_spect import IEMOCAPSpectDataset
15 | import pytorch_lightning as pl
16 | import numpy as np
17 | import sklearn.metrics as metrics
18 | from adabound import AdaBound
19 |
20 | import torchvision.models as models
21 |
22 |
23 | class LFLBlock(nn.Module):
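    """Local feature learning block (LFLB): Conv2d -> ELU -> MaxPool2d -> Dropout2d -> BatchNorm2d."""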
24 | def __init__(self, inp_ch, out_ch, conv_k, conv_s, pool_k, pool_s, p_dropout):
25 | super(LFLBlock, self).__init__()
26 |
27 | self.conv = nn.Conv2d(inp_ch, out_ch, conv_k, conv_s, padding=(1, 2))
28 | self.batch_nm = nn.BatchNorm2d(out_ch)
29 | self.pool = nn.MaxPool2d(pool_k, pool_s)
30 |
31 | self.dropout = nn.Dropout2d(p=p_dropout) # AlphaDropout
32 | self.actv = nn.ELU()
33 |
34 | def forward(self, x):
35 |
36 | x = self.conv(x)
37 |
38 | x = self.actv(x)
39 | x = self.pool(x)
40 | x = self.dropout(x)
41 | x = self.batch_nm(x)
42 |
43 | return x
44 |
45 |
46 | class CNN_LFLB(pl.LightningModule):
47 |
48 | def __init__(self, hparams):
49 | super(CNN_LFLB, self).__init__()
50 | # not the best model...
51 |
52 | self.hparams = hparams
53 | self.num_classes = hparams.num_classes
54 |
55 | self.lflb1 = LFLBlock(inp_ch=1, out_ch=64, conv_k=3,
56 | conv_s=1, pool_k=2, pool_s=2, p_dropout=self.hparams.dropout_1)
57 | self.lflb2 = LFLBlock(inp_ch=64, out_ch=64, conv_k=3,
58 | conv_s=1, pool_k=4, pool_s=4, p_dropout=self.hparams.dropout_2)
59 | self.lflb3 = LFLBlock(inp_ch=64, out_ch=128, conv_k=3,
60 | conv_s=1, pool_k=4, pool_s=4, p_dropout=self.hparams.dropout_3)
61 | self.lflb4 = LFLBlock(inp_ch=128, out_ch=128, conv_k=3,
62 | conv_s=1, pool_k=4, pool_s=4, p_dropout=self.hparams.dropout_3)
63 |
64 | self.fc1 = nn.Linear(256, 64)
65 | self.fc2 = nn.Linear(64, self.num_classes)
66 |
67 | def forward(self, x):
68 | x = self.lflb1(x)
69 | x = self.lflb2(x)
70 | x = self.lflb3(x)
71 | x = self.lflb4(x)
72 |
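        # flatten the final feature maps into a single vector per sample for the FC head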
73 | x = x.view(x.shape[0], -1)
74 | x = F.relu(self.fc1(x))
75 | x = self.fc2(x)
76 |
77 | return x
78 |
79 | def training_step(self, batch, batch_idx):
80 | # REQUIRED
81 | x, y = batch
82 | y_hat = self.forward(x)
83 | loss_val = F.cross_entropy(y_hat, y)
84 | with torch.no_grad():
85 | y_pred = torch.max(F.softmax(y_hat, dim=1), 1)[1]
86 | acc = metrics.accuracy_score(y.cpu(), y_pred.cpu())
87 | tqdm_dict = {'train_loss': loss_val, 'train_acc': acc}
88 | output = OrderedDict({
89 | 'loss': loss_val,
90 | 'progress_bar': tqdm_dict,
91 | 'log': tqdm_dict
92 | })
93 | return output
94 |
95 | def validation_step(self, batch, batch_idx):
96 | # OPTIONAL
97 | x, y = batch
98 |
99 | with torch.no_grad():
100 | y_hat = self.forward(x)
101 | y_pred = torch.max(F.softmax(y_hat, dim=1), 1)[1]
102 |
103 | acc = metrics.accuracy_score(y.cpu(), y_pred.cpu())
104 | f1 = metrics.f1_score(y.cpu(), y_pred.cpu(), average='macro')
105 | loss_val = F.cross_entropy(y_hat, y)
106 |
107 | output = OrderedDict({'val_loss': loss_val, 'val_f1': f1, 'val_acc': acc})
108 |
109 | return output
110 |
111 | def validation_end(self, outputs):
112 | # OPTIONAL
113 | tqdm_dict = {}
114 |
115 | for metric_name in ["val_loss", "val_f1", "val_acc"]:
116 | metric_total = 0
117 |
118 | for output in outputs:
119 | metric_value = output[metric_name]
120 |
121 | # reduce manually when using dp
122 | if self.trainer.use_dp or self.trainer.use_ddp2:
123 | metric_value = torch.mean(metric_value)
124 |
125 | metric_total += metric_value
126 |
127 | tqdm_dict[metric_name] = metric_total / len(outputs)
128 |
129 | result = {'progress_bar': tqdm_dict, 'log': tqdm_dict,'val_loss': tqdm_dict["val_loss"]}
130 |
131 | return result
132 |
133 | def configure_optimizers(self):
134 | # REQUIRED
135 | # can return multiple optimizers and learning_rate schedulers
136 |
137 | # alternative: torch.optim.Adam(self.parameters(), lr=self.hparams.learning_rate)
138 | return AdaBound(self.parameters(), lr=self.hparams.learning_rate_init,
139 | final_lr=self.hparams.learning_rate_final,weight_decay=self.hparams.weight_decay)
140 |
141 | def train_dataloader(self):
142 | # REQUIRED
143 | transform = transforms.Compose([transforms.ToTensor()])
144 | return DataLoader(IEMOCAPSpectDataset(self.hparams.data_root, set_type='train', transform=transform, num_classes=self.num_classes),
145 | batch_size=32, num_workers=2, pin_memory=True,
146 | shuffle=True)
147 |
148 | def val_dataloader(self):
149 | # OPTIONAL
150 |
151 | transform = transforms.Compose([transforms.ToTensor()])
152 | return DataLoader(IEMOCAPSpectDataset(self.hparams.data_root, set_type='val', transform=transform, num_classes=self.num_classes),
153 | batch_size=32, num_workers=2, pin_memory=True,
154 | shuffle=True)
155 |
156 | def test_dataloader(self):
157 | # OPTIONAL
158 | transform = transforms.Compose([transforms.ToTensor()])
159 | return DataLoader(IEMOCAPSpectDataset(self.hparams.data_root, set_type='test', transform=transform, num_classes=self.num_classes),
160 | batch_size=32, num_workers=2, pin_memory=True,
161 | shuffle=True)
162 |
163 | @staticmethod
164 | def add_model_specific_args(parent_parser):
165 | """
166 | Specify the hyperparams for this LightningModule
167 | """
168 | # MODEL specific
169 | parser = ArgumentParser(parents=[parent_parser])
170 | parser.add_argument('--learning_rate_init', default=7e-4, type=float)
171 | parser.add_argument('--learning_rate_final', default=0.02, type=float)
172 | parser.add_argument('--batch_size', default=32, type=int)
173 | parser.add_argument('--dropout_1', default=0.6, type=float)
174 | parser.add_argument('--dropout_2', default=0.3, type=float)
175 | parser.add_argument('--dropout_3', default=0.2, type=float)
176 | parser.add_argument('--weight_decay', default=0.0, type=float)
177 |
178 |
179 | # training specific (for this model)
180 | parser.add_argument('--max_nb_epochs', default=10000, type=int)
181 |
182 | # data
183 | parser.add_argument(
184 | '--data_root', default='../datasets/RAVDESS/SOUND_SPECT/', type=str)
185 | parser.add_argument(
186 | '--num_classes', dest='num_classes', default=8, type=int)
187 | return parser
188 |
--------------------------------------------------------------------------------
/research_seed/audio_classification/cnn_rnn.py:
--------------------------------------------------------------------------------
1 | """
2 | This file defines the CNN + LSTM model.
3 | """
4 | # %%
5 | import os
6 | import torch
7 | import torch.nn.functional as F
8 | import torch.nn as nn
9 | from torch.utils.data import DataLoader
10 | from torchvision.datasets import MNIST
11 | import torchvision.transforms as transforms
12 | from argparse import ArgumentParser
13 | from collections import OrderedDict
14 | from research_seed.audio_classification.datasets.iemocap_spect import IEMOCAPSpectDataset
15 | import pytorch_lightning as pl
16 | import numpy as np
17 | import sklearn.metrics as metrics
18 | from adabound import AdaBound
19 |
20 | import torchvision.models as models
21 |
22 |
23 | class LSTMBlock(nn.Module):
24 | def __init__(self, input_size=300, hidden_size=256, num_layers=2, bidirectional=True, dropout=0.0, num_classes=8):
25 | super(LSTMBlock, self).__init__()
26 |
27 | self.input_size = input_size
28 | self.num_layers = num_layers # RNN hidden layers
29 | self.hidden_size = hidden_size # RNN hidden nodes
30 | self.num_classes = num_classes
31 | if bidirectional:
32 | self.num_directions = 2
33 | else:
34 | self.num_directions = 1
35 |
36 | self.LSTM = nn.LSTM(
37 | input_size=self.input_size,
38 | hidden_size=self.hidden_size,
39 | num_layers=self.num_layers,
40 | bidirectional=bidirectional,
41 | dropout=dropout,
42 | # input & output have batch size as the first dimension, e.g. (batch, time_step, input_size)
43 | batch_first=True,
44 | )
45 |
46 | self.fc1 = nn.Linear(self.hidden_size * self.num_directions, self.num_classes)
47 |
48 | def forward(self, x):
49 |
50 | self.LSTM.flatten_parameters()
51 | # print(x.shape)
52 |
53 | RNN_out, (h_n, h_c) = self.LSTM(x, None)
54 | # out" will give you access to all hidden states in the sequence
55 | """ h_n shape ((num_layers * num_directions, batch, hidden_size)), h_c shape (n_layers, batch, hidden_size) """
56 | """ None represents zero initial hidden state. RNN_out has shape=(batch, time_step, output_size) """
57 |
58 | x = self.fc1(RNN_out[:,-1,:]) # choose RNN_out at the last time step and activations in both directions
59 |
60 | return x
61 |
62 |
63 | class LFLBlock(nn.Module):
64 | def __init__(self, inp_ch, out_ch, conv_k, conv_s, pool_k, pool_s, p_dropout):
65 | super(LFLBlock, self).__init__()
66 |
67 | self.conv = nn.Conv2d(inp_ch, out_ch, conv_k, conv_s, padding=(1, 2))
68 | self.batch_nm = nn.BatchNorm2d(out_ch)
69 | self.pool = nn.MaxPool2d(pool_k, pool_s)
70 | self.dropout = nn.Dropout2d(p=p_dropout) # AlphaDropout
71 | self.actv = nn.ELU()
72 |
73 | def forward(self, x):
74 |
75 | x = self.conv(x)
76 |
77 | x = self.actv(x)
78 | x = self.pool(x)
79 | x = self.dropout(x)
80 | x = self.batch_nm(x)
81 |
82 | return x
83 |
84 |
85 | class CNN_RNN(pl.LightningModule):
86 |
87 | def __init__(self, hparams):
88 | super(CNN_RNN, self).__init__()
89 |
90 | self.hparams = hparams
91 | self.num_classes = hparams.num_classes
92 | self.bidirectional = bool(hparams.bidirectional)
93 | self.num_layers_rnn = hparams.num_layers_rnn  # RNN hidden layers
94 | self.dropout_rnn = hparams.dropout_rnn
95 | self.hidden_size_rnn = hparams.hidden_size_rnn  # RNN hidden nodes
97 |
98 | self.lflb1 = LFLBlock(inp_ch=1, out_ch=64, conv_k=3,
99 | conv_s=1, pool_k=2, pool_s=2, p_dropout=self.hparams.dropout_1)
100 | self.lflb2 = LFLBlock(inp_ch=64, out_ch=64, conv_k=3,
101 | conv_s=1, pool_k=4, pool_s=4, p_dropout=self.hparams.dropout_2)
102 | self.lflb3 = LFLBlock(inp_ch=64, out_ch=128, conv_k=3,
103 | conv_s=1, pool_k=4, pool_s=4, p_dropout=self.hparams.dropout_3)
104 | self.lflb4 = LFLBlock(inp_ch=128, out_ch=128, conv_k=3,
105 | conv_s=1, pool_k=4, pool_s=4, p_dropout=self.hparams.dropout_3)
106 |
107 | self.rnn = LSTMBlock(input_size=128, hidden_size=self.hidden_size_rnn,dropout=self.dropout_rnn, num_classes=self.num_classes,
108 | bidirectional=self.bidirectional, num_layers=self.num_layers_rnn,
109 | )
110 |
111 |
112 | def forward(self, x):
113 | x = self.lflb1(x)
114 | x = self.lflb2(x)
115 | x = self.lflb3(x)
116 | x = self.lflb4(x)
117 |
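        # reorder to (batch, time, channels, freq) and flatten the trailing axes
        # so the LSTM receives one feature vector per time step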
118 | x = x.permute(0, 3, 1, 2)
119 | x = x.view(x.shape[0], x.shape[1], -1)
120 |
121 | x = self.rnn(x)
122 |
123 | return x
124 |
125 | def training_step(self, batch, batch_idx):
126 | # REQUIRED
127 | x, y = batch
128 | y_hat = self.forward(x)
129 | loss_val = F.cross_entropy(y_hat, y)
130 | with torch.no_grad():
131 | y_pred = torch.max(F.softmax(y_hat, dim=1), 1)[1]
132 | acc = metrics.accuracy_score(y.cpu(), y_pred.cpu())
133 | tqdm_dict = {'train_loss': loss_val, 'train_acc': acc}
134 |
135 | output = OrderedDict({
136 | 'loss': loss_val,
137 | 'progress_bar': tqdm_dict,
138 | 'log': tqdm_dict
139 | })
140 | return output
141 |
142 | def accuracy(self, y_true, y_pred):
143 | with torch.no_grad():
144 | acc = (y_true == y_pred).sum().to(torch.float32)
145 | acc /= y_pred.shape[0]
146 |
147 | return acc
148 |
149 | def validation_step(self, batch, batch_idx):
150 | # OPTIONAL
151 | x, y = batch
152 |
153 | y_hat = self.forward(x)
154 |
155 | with torch.no_grad():
156 | y_pred = torch.max(F.softmax(y_hat, dim=1), 1)[1]
157 | acc = metrics.accuracy_score(y.cpu(), y_pred.cpu())
158 | f1 = metrics.f1_score(y.cpu(), y_pred.cpu(), average='macro')
159 | loss_val = F.cross_entropy(y_hat, y)
160 |
161 | output = OrderedDict(
162 | {'val_loss': loss_val, 'val_f1': f1, 'val_acc': acc})
163 |
164 | return output
165 |
166 | def validation_end(self, outputs):
167 | # OPTIONAL
168 | tqdm_dict = {}
169 |
170 | for metric_name in ["val_loss", "val_f1", "val_acc"]:
171 | metric_total = 0
172 |
173 | for output in outputs:
174 | metric_value = output[metric_name]
175 |
176 | # reduce manually when using dp
177 | if self.trainer.use_dp or self.trainer.use_ddp2:
178 | metric_value = torch.mean(metric_value)
179 |
180 | metric_total += metric_value
181 |
182 | tqdm_dict[metric_name] = metric_total / len(outputs)
183 |
184 | result = {'progress_bar': tqdm_dict, 'log': tqdm_dict,
185 | 'val_loss': tqdm_dict["val_loss"]}
186 |
187 | return result
188 |
189 | def configure_optimizers(self):
190 | # REQUIRED
191 | # can return multiple optimizers and learning_rate schedulers
192 |
193 | return AdaBound(self.parameters(), lr=self.hparams.learning_rate_init,
194 | final_lr=self.hparams.learning_rate_final, weight_decay=self.hparams.weight_decay)
195 |
196 | def train_dataloader(self):
197 | # REQUIRED
198 | transform = transforms.Compose([transforms.ToTensor()])
199 | return DataLoader(IEMOCAPSpectDataset(self.hparams.data_root, set_type='train', transform=transform, num_classes=self.num_classes),
200 | batch_size=32, num_workers=2, pin_memory=True,
201 | shuffle=True)
202 |
203 | def val_dataloader(self):
204 | # OPTIONAL
205 |
206 | transform = transforms.Compose([transforms.ToTensor()])
207 | return DataLoader(IEMOCAPSpectDataset(self.hparams.data_root, set_type='val', transform=transform, num_classes=self.num_classes),
208 | batch_size=32, num_workers=2, pin_memory=True,
209 | shuffle=True)
210 |
211 | def test_dataloader(self):
212 | # OPTIONAL
213 | transform = transforms.Compose([transforms.ToTensor()])
214 | return DataLoader(IEMOCAPSpectDataset(self.hparams.data_root, set_type='test', transform=transform, num_classes=self.num_classes),
215 | batch_size=32, num_workers=2, pin_memory=True,
216 | shuffle=True)
217 |
218 | @staticmethod
219 | def add_model_specific_args(parent_parser):
220 | """
221 | Specify the hyperparams for this LightningModule
222 | """
223 | # MODEL specific
224 | parser = ArgumentParser(parents=[parent_parser])
225 |
226 | parser.add_argument('--learning_rate_init',
227 | default=0.0002898, type=float)
228 | parser.add_argument('--learning_rate_final',
229 | default=0.01435, type=float)
230 | parser.add_argument('--batch_size', default=32, type=int)
231 | parser.add_argument('--weight_decay', default=0.004566, type=float)
232 | #cnn
233 | parser.add_argument('--dropout_1', default=0.5424, type=float)
234 | parser.add_argument('--dropout_2', default=0.257, type=float)
235 | parser.add_argument('--dropout_3', default=0.558, type=float)
236 | #rnn
237 | parser.add_argument('--bidirectional', default=1, type=int)
238 | parser.add_argument('--num_layers_rnn', default=2, type=int)
239 | parser.add_argument('--dropout_rnn', default=0.0, type=float)
240 | parser.add_argument('--hidden_size_rnn', default=256, type=int)
241 |
242 |
243 |
244 | # training specific (for this model)
245 | parser.add_argument('--max_nb_epochs', default=10000, type=int)
246 |
247 | # data
248 | parser.add_argument(
249 | '--data_root', default='../datasets/RAVDESS/SOUND_SPECT/', type=str)
250 | parser.add_argument(
251 | '--num_classes', dest='num_classes', default=8, type=int)
252 | return parser
253 |
--------------------------------------------------------------------------------
/research_seed/audio_classification/cnn_sweep.py:
--------------------------------------------------------------------------------
1 | """
2 | This file runs the main training/val loop, etc... using Lightning Trainer
3 | """
4 | # %%
5 | import os
6 | import sys
7 | os.chdir('../../')
8 | current_path = os.path.abspath('.')
9 | sys.path.append(current_path)
10 |
11 | import logging
12 | from shutil import copyfile
13 | import numpy as np
14 | import pandas as pd
15 | from sklearn.metrics import confusion_matrix, classification_report
16 | import torch.nn.functional as F
17 | import torch
18 | from research_seed.audio_classification.datasets.iemocap_spect import IEMOCAPSpectDataset
19 | from pytorch_lightning.callbacks import EarlyStopping
20 | import wandb
21 | from pytorch_lightning.loggers import WandbLogger
22 | from research_seed.audio_classification.cnn_rnn import CNN_RNN
23 | from research_seed.audio_classification.cnn_lflb import CNN_LFLB
24 | from research_seed.audio_classification.cnn_spect import CNN_SPECT
25 | from argparse import ArgumentParser
26 | from pytorch_lightning import Trainer
27 |
28 |
29 | logging.basicConfig(
30 | level=logging.INFO,
31 | format="%(asctime)s [%(levelname)s] %(message)s",
32 | handlers=[
33 | logging.FileHandler("debug.log"),
34 | logging.StreamHandler()
35 | ]
36 | )
37 |
38 |
39 | def report(model, wandb_logger):
40 | # https://donatstudios.com/CsvToMarkdownTable
41 |
42 | model.eval()
43 | model = model.cpu()
44 | y_pred = []
45 | y_true = []
46 |
47 | for x, y in model.val_dataloader():
48 |
49 | res = torch.max(F.softmax(model(x), dim=1), 1)[1].numpy()
50 | y_pred.extend(res)
51 | y_true.extend(y.numpy())
52 |
53 | unique_label = np.unique([y_true, y_pred])
54 | cmtx = pd.DataFrame(
55 | confusion_matrix(y_true, y_pred, labels=unique_label),
56 | index=['true:{:}'.format(x) for x in unique_label],
57 | columns=['pred:{:}'.format(x) for x in unique_label]
58 | )
59 |
60 | report = pd.DataFrame(classification_report(
61 | y_true, y_pred, output_dict=True))
62 | print(cmtx, report)
63 | wreport = []
64 | tmp = [str(item) for item in report.values[0]]
65 | tmp.insert(0, 'precision')
66 | wreport.append(tmp)
67 | tmp = [str(item) for item in report.values[1]]
68 | tmp.insert(0, 'recall')
69 | wreport.append(tmp)
70 | tmp = [str(item) for item in report.values[2]]
71 | tmp.insert(0, 'f1-score')
72 | wreport.append(tmp)
73 | tmp = [str(item) for item in report.values[3]]
74 | tmp.insert(0, 'support')
75 | wreport.append(tmp)
76 |
77 | hreport = report.columns
78 | hreport = hreport.insert(0, '')
79 |
80 | wandb_logger.log_metrics({'confusion_matrix': wandb.plots.HeatMap(unique_label, unique_label, cmtx.values, show_text=True),
81 | 'classification_report': wandb.Table(data=wreport, columns=hreport.values)})
82 |
83 |
84 | def main(hparams, network):
85 | # init module
86 |
87 | model = network(hparams)
88 | print(model.hparams)
89 | project_folder = 'audio_emotion_team'
90 | wandb_logger = WandbLogger(
91 | name='lflb_dropout_rnn', project=project_folder, entity='thesis', offline=False)
92 |
93 | early_stop_callback = EarlyStopping(
94 | monitor='val_loss',
95 | min_delta=0.00,
96 | patience=20,
97 | verbose=False,
98 | mode='min'
99 | )
100 |
101 | # most basic trainer, uses good defaults
102 | trainer = Trainer(
103 | max_nb_epochs=hparams.max_nb_epochs,
104 | gpus=hparams.gpus,
105 | nb_gpu_nodes=hparams.nodes,
106 | logger=wandb_logger,
107 | #weights_summary='full',
108 | early_stop_callback=early_stop_callback,
109 | #profiler=True,
110 | benchmark=True,
111 | #log_gpu_memory='all'
112 |
113 | )
114 | wandb_logger.experiment
115 | wandb_logger.watch(model)
116 |
117 | trainer.fit(model)
118 |
119 |
120 | if __name__ == '__main__':
121 | parser = ArgumentParser(add_help=False)
122 | parser.add_argument('--gpus', type=str, default=1)
123 | parser.add_argument('--nodes', type=int, default=1)
124 |
125 | network = CNN_RNN  # alternatives: CNN_LFLB, CNN_SPECT
126 | # give the module a chance to add own params
127 | # good practice to define LightningModule specific params in the module
128 | parser = network.add_model_specific_args(parser)
129 |
130 | # parse params
131 | #print(os.getcwd())
132 | hparams = parser.parse_args()
133 |
134 | main(hparams, network)
135 |
136 |
137 | # %%
138 |
--------------------------------------------------------------------------------
/research_seed/audio_classification/cnn_trainer.py:
--------------------------------------------------------------------------------
1 | """
2 | This file runs the main training/val loop, etc... using Lightning Trainer
3 | """
4 | # %%
5 | import os
6 | import sys
7 | current_path = os.path.abspath('.')
8 | sys.path.append(current_path)
9 |
10 | import logging
11 | from shutil import copyfile
12 | import numpy as np
13 | import pandas as pd
14 | from sklearn.metrics import confusion_matrix, classification_report
15 | import torch.nn.functional as F
16 | import torch
17 | from research_seed.audio_classification.datasets.iemocap_spect import IEMOCAPSpectDataset
18 | from pytorch_lightning.callbacks import EarlyStopping
19 | import wandb
20 | from pytorch_lightning.loggers import WandbLogger
21 | from research_seed.audio_classification.cnn_rnn import CNN_RNN
22 | from research_seed.audio_classification.cnn_lflb import CNN_LFLB
23 |
24 | from argparse import ArgumentParser
25 | from pytorch_lightning import Trainer
26 |
27 |
28 | logging.basicConfig(
29 | level=logging.INFO,
30 | format="%(asctime)s [%(levelname)s] %(message)s",
31 | handlers=[
32 | logging.FileHandler("debug.log"),
33 | logging.StreamHandler()
34 | ]
35 | )
36 |
37 |
38 | def report(model, wandb_logger):
39 |
40 | model.eval()
41 | model = model.cpu()
42 | y_pred = []
43 | y_true = []
44 |
45 | for x, y in model.val_dataloader():
46 |
47 | res = torch.max(F.softmax(model(x), dim=1), 1)[1].numpy()
48 | y_pred.extend(res)
49 | y_true.extend(y.numpy())
50 |
51 | unique_label = np.unique([y_true, y_pred])
52 | cmtx = pd.DataFrame(
53 | confusion_matrix(y_true, y_pred, labels=unique_label),
54 | index=['true:{:}'.format(x) for x in unique_label],
55 | columns=['pred:{:}'.format(x) for x in unique_label]
56 | )
57 |
58 | report = pd.DataFrame(classification_report(
59 | y_true, y_pred, output_dict=True))
60 | print(cmtx, report)
61 | wreport = []
62 | tmp = [str(item) for item in report.values[0]]
63 | tmp.insert(0, 'precision')
64 | wreport.append(tmp)
65 | tmp = [str(item) for item in report.values[1]]
66 | tmp.insert(0, 'recall')
67 | wreport.append(tmp)
68 | tmp = [str(item) for item in report.values[2]]
69 | tmp.insert(0, 'f1-score')
70 | wreport.append(tmp)
71 | tmp = [str(item) for item in report.values[3]]
72 | tmp.insert(0, 'support')
73 | wreport.append(tmp)
74 |
75 | hreport = report.columns
76 | hreport = hreport.insert(0, '')
77 |
78 | wandb_logger.log_metrics({'confusion_matrix': wandb.plots.HeatMap(unique_label, unique_label, cmtx.values, show_text=True),
79 | 'classification_report': wandb.Table(data=wreport, columns=hreport.values)})
80 |
81 |
82 | def main(hparams, network):
83 | # init module
84 |
85 | model = network(hparams)
86 | project_folder = 'audio_emotion_team'
87 | wandb_logger = WandbLogger(
88 | name='lflb_dropout_rnn', project=project_folder, entity='thesis', offline=False)
89 |
90 | early_stop_callback = EarlyStopping(
91 | monitor='val_loss',
92 | min_delta=0.00,
93 | patience=20,
94 | verbose=False,
95 | mode='min'
96 | )
97 |
98 | # most basic trainer, uses good defaults
99 | trainer = Trainer(
100 | max_nb_epochs=hparams.max_nb_epochs,
101 | gpus=hparams.gpus,
102 | nb_gpu_nodes=hparams.nodes,
103 | logger=wandb_logger,
104 | weights_summary='full',
105 | early_stop_callback=early_stop_callback,
106 | profiler=True,
107 | benchmark=True,
108 | log_gpu_memory='all'
109 |
110 | )
111 |
112 | wandb_logger.experiment.config.update(
113 | {'dataset': 'IEMOCAP_SPECT_GS_8s_512h_2048n'})
114 | wandb_logger.watch(model)
115 |
116 | trainer.fit(model)
117 | # load best model
118 | exp_folder = project_folder + '/version_'+wandb_logger.experiment.id
119 | model_file = os.listdir(exp_folder + '/checkpoints')[0]
120 | # eval and upload best model
121 | model = network.load_from_checkpoint(
122 | exp_folder+'/checkpoints/' + model_file)
123 | report(model, wandb_logger)
124 | copyfile(exp_folder+'/checkpoints/' + model_file,
125 | wandb_logger.experiment.dir+'/model.ckpt')
126 | wandb_logger.experiment.save('model.ckpt')
127 |
128 |
129 | if __name__ == '__main__':
130 | parser = ArgumentParser(add_help=False)
131 | parser.add_argument('--gpus', type=str, default=1)
132 | parser.add_argument('--nodes', type=int, default=1)
133 |
134 | network = CNN_RNN  # alternative: CNN_LFLB
135 | parser = network.add_model_specific_args(parser)
136 |
137 | # parse params
138 | print(os.getcwd())
139 | hparams = parser.parse_args(["--data_root", "../datasets/IEMOCAP/SOUND_SPECT_GS_8s_512h_2048n/", '--max_nb_epochs', '10000',
140 | '--num_classes', '8'])
141 |
142 | main(hparams, network)
143 |
144 |
145 | # %%
146 |
--------------------------------------------------------------------------------
/research_seed/audio_classification/model_testing.py:
--------------------------------------------------------------------------------
1 | """
2 | This file runs a quick debugging pass (fast_dev_run) of the models using the Lightning Trainer
3 | """
4 | #%%
5 | import os
6 | import sys
7 | current_path = os.path.abspath('.')
8 | sys.path.append(current_path)
9 | from pytorch_lightning import Trainer
10 | from pytorch_lightning.profiler import Profiler
11 | from argparse import ArgumentParser
12 |
13 | from research_seed.audio_classification.cnn_spect import CNN_SPECT
14 | from research_seed.audio_classification.cnn_lflb import CNN_LFLB
15 | from research_seed.audio_classification.cnn_rnn import CNN_RNN
16 | from pytorch_lightning.loggers import WandbLogger
17 | import wandb
18 | from pytorch_lightning.callbacks import EarlyStopping
19 | from research_seed.audio_classification.datasets.iemocap_spect import IEMOCAPSpectDataset
20 |
21 | import torch
22 | import torch.nn.functional as F
23 | from sklearn.metrics import confusion_matrix, classification_report
24 | import pandas as pd
25 | import numpy as np
26 |
27 | from shutil import copyfile
28 | import sys
29 | import logging
30 | # ...
31 | logging.basicConfig(
32 | level=logging.INFO,
33 | format="%(asctime)s [%(levelname)s] %(message)s",
34 | handlers=[
35 | logging.FileHandler("debug.log"),
36 | logging.StreamHandler()
37 | ]
38 | )
39 | def report(model, wandb_logger):
40 | #https://donatstudios.com/CsvToMarkdownTable
41 |
42 | model.eval()
43 | model = model.cpu()
44 | y_pred = []
45 | y_true = []
46 |
47 | for x,y in model.val_dataloader():
48 |
49 | res = torch.max(F.softmax(model(x), dim=1),1)[1].numpy()
50 | y_pred.extend(res)
51 | y_true.extend(y.numpy())
52 | break
53 |
54 |
55 |
56 | unique_label = np.unique([y_true, y_pred])
57 | cmtx = pd.DataFrame(
58 | confusion_matrix(y_true, y_pred, labels=unique_label),
59 | index=['true:{:}'.format(x) for x in unique_label],
60 | columns=['pred:{:}'.format(x) for x in unique_label]
61 | )
62 |
63 | report = pd.DataFrame(classification_report(y_true,y_pred, output_dict=True))
64 |
65 | wreport = []
66 | tmp = [str(item) for item in report.values[0]]
67 | tmp.insert(0,'precision')
68 | wreport.append(tmp)
69 | tmp = [str(item) for item in report.values[1]]
70 | tmp.insert(0,'recall')
71 | wreport.append(tmp)
72 | tmp = [str(item) for item in report.values[2]]
73 | tmp.insert(0,'f1-score')
74 | wreport.append(tmp)
75 | tmp = [str(item) for item in report.values[3]]
76 | tmp.insert(0,'support')
77 | wreport.append(tmp)
78 |
79 | print(report,cmtx)
80 |
81 |
82 |
83 | hreport = report.columns
84 | hreport = hreport.insert(0,'')
85 |
86 | if wandb_logger:
87 | wandb_logger.log_metrics({'confusion_matrix': wandb.plots.HeatMap(unique_label, unique_label, cmtx.values, show_text=True),
88 | 'classification_report':wandb.Table(data=wreport, columns=hreport.values)})
89 | def main(hparams, network):
90 | # init module
91 | debugging = True
92 |
93 | project_folder = 'test'
94 | model = network(hparams)
95 | #wandb_logger = WandbLogger(name='test',offline=True,project=project_folder,entity='ricardop0')
96 |
97 |
98 | early_stop_callback = EarlyStopping(
99 | monitor='val_loss',
100 | min_delta=0.00,
101 | patience=5,
102 | verbose=False,
103 | mode='min'
104 | )
105 |
106 | # most basic trainer, uses good defaults
107 | trainer = Trainer(
108 | max_nb_epochs=hparams.max_nb_epochs,
109 | gpus=hparams.gpus,
110 | nb_gpu_nodes=hparams.nodes,
111 | fast_dev_run=debugging,
112 | weights_summary='full',
113 | early_stop_callback=early_stop_callback,
114 | profiler=True,
115 | benchmark=True,
116 | log_gpu_memory='all',
117 | overfit_pct = 0.1,
118 | #logger=wandb_logger,
119 |
120 | )
121 |
122 | trainer.fit(model)
123 | # id = wandb_logger.experiment.id
124 | # print(id)
125 | # os.environ["WANDB_RUN_ID"] = id
126 | # wandb_logger = WandbLogger(name='test',offline=True,project=project_folder,entity='ricardop0')
127 | # wandb_logger.experiment
128 | # print(wandb_logger.experiment.id)
129 | # # load best model
130 | # exp_folder = project_folder + '/version_'+wandb_logger.experiment.id
131 | # model_file = os.listdir(exp_folder + '/checkpoints')[0]
132 | # # eval and upload best model
133 | # model = network.load_from_checkpoint(
134 | # exp_folder+'/checkpoints/' + model_file)
135 | # report(model, wandb_logger)
136 | # copyfile(exp_folder+'/checkpoints/' + model_file,
137 | # wandb_logger.experiment.dir+'/model.ckpt')
138 | # wandb_logger.experiment.save('model.ckpt')
139 | # print('here')
140 | # wandb_logger.finalize()
141 |
142 | #print(wandb_logger.experiment.config)
143 | #print(wandb_logger.experiment.id)
144 | #exp_folder = 'audio_class/version_'+wandb_logger.experiment.id
145 | #model_file = os.listdir(exp_folder + '/checkpoints')[0]
146 | #model = CNN_LFLB.load_from_checkpoint(exp_folder+'/checkpoints/'+ model_file)
147 | #report(model, None)
148 | #wandb_logger.experiment.save(exp_folder+'/checkpoints/'+ model_file)
149 |
150 |
151 |
152 | if __name__ == '__main__':
153 |
154 | parser = ArgumentParser(add_help=False)
155 | parser.add_argument('--gpus', type=str, default=1)
156 | parser.add_argument('--nodes', type=int, default=1)
157 |
158 | # give the module a chance to add own params
159 | # good practice to define LightningModule specific params in the module
160 | network = CNN_RNN  # alternatives: CNN_LFLB, CNN_SPECT
161 | parser = network.add_model_specific_args(parser)
162 |
163 | # parse params
164 | print(os.getcwd())
165 | hparams = parser.parse_args(["--data_root", "../datasets/IEMOCAP/SOUND_SPECT_GS_8s_512h_2048n/", '--max_nb_epochs', '50',
166 | '--num_classes', '6'])
167 |
168 | main(hparams, network)
169 |
170 |
171 |
172 | #%%
173 |
174 |
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 |
3 | from setuptools import setup, find_packages
4 |
5 | setup(name='Speech2dCNN_LSTM',
6 | version='0.0.1',
7 | description='A pytorch implementation of Speech emotion recognition using deep 1D & 2D CNN LSTM networks',
8 | author='',
9 | author_email='',
10 | url='https://github.com/RicardoP0/Speech2dCNN_LSTM.git',
11 | install_requires=[
12 | 'pytorch-lightning'
13 | ],
14 | packages=find_packages()
15 | )
16 |
17 |
--------------------------------------------------------------------------------