├── .gitignore
├── README.md
├── adversary_dataset.py
├── audio_dataset.py
├── fine_tune_pretrained_mlp.py
├── gtzan
│   ├── test_filtered.txt
│   ├── test_stratified.txt
│   ├── train_filtered.txt
│   ├── train_stratified.txt
│   ├── valid_filtered.txt
│   └── valid_stratified.txt
├── hpc_scripts
│   ├── README.md
│   ├── generate_gbar_jobs.py
│   ├── generate_gbar_jobs_dnn.py
│   ├── generate_gbar_jobs_rf.py
│   ├── generate_gbar_jobs_rf2.py
│   ├── generate_jobs.sh
│   ├── gpu_to_cpu_pkl.py
│   └── requirements.txt
├── lmd
│   ├── lmd_prep.sh
│   ├── lmd_train.sh
│   └── lmd_train_conv.sh
├── lmd_af
│   ├── LMD_AF_split_conv.pkl
│   ├── LMD_AF_split_dnn.pkl
│   ├── lmd_af_conv_model.pkl
│   ├── lmd_af_dnn_model.pkl
│   └── mlp_rlu_dropout_adversary.yaml
├── prepare_dataset.py
├── pretrain_layers.py
├── svm_train_test.py
├── test_mlp_script.py
├── train_classifier_on_dnn_feats.py
├── train_mlp_conv_script.py
├── train_mlp_script.py
├── utils
│   ├── __init__.py
│   ├── calc_grad.py
│   ├── class_histogram.py
│   ├── comp_ave_snr.py
│   ├── create_adversarial_dataset.py
│   ├── create_split_files.py
│   ├── filtered_classify.py
│   ├── filtered_classify_batch.py
│   ├── plot_adversary_spectra.py
│   ├── plot_conf.py
│   ├── plot_individual_confs.py
│   ├── plot_mean_std_recall.py
│   ├── read_mp3.py
│   ├── tensongs_exp.py
│   ├── tensongs_exp_filtered.py
│   ├── test_adversary.py
│   └── test_adversary_linearized.py
└── yaml_scripts
    ├── mlp_rlu.yaml
    ├── mlp_rlu_conv2.yaml
    └── mlp_rlu_dropout.yaml

/.gitignore:
--------------------------------------------------------------------------------
*~
*.pkl
*.h5
*.pyc
.DS_Store

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
## About
This repository contains research code for experimenting with deep neural networks (DNNs) for music genre recognition (MGR). For the moment the repository contains many test scripts and should be considered unstable (i.e., the code is subject to change given our experimental needs). Nonetheless, the instructions below should help you reproduce our experiments and identify which files are most important for the basic functionality.

## Requirements
- Python (tested with version 2.7; note that Python 3 contains many changes that might introduce bugs in this code)
- NumPy
- SciPy
- PyTables (requires numexpr and libhdf5)
- Theano
- Pylearn2
- scikits.audiolab (and libsndfile)
- scikit-learn

You can find some setup tips in the hpc_scripts folder of this repository.

## V2.0 Instructions
This version is more flexible than the previous one and has been designed to work with generic datasets (not only the Tzanetakis dataset), arbitrary categorical labels, and arbitrary excerpt lengths.

### Dataset organization:
Audio files must be uncompressed, in either WAV or AU format; many different directory structures are permissible. However, there must be a way of specifying the categorical label for each file in the dataset. This can be done either by embedding the label in the filename or in the name of the parent folder (the folder name always takes precedence in the case of a conflict).

In order to handle large datasets that may not fit into RAM, this code requires that the dataset first be saved as an HDF5 file, which can be partially loaded into RAM on demand during training and testing. The script prepare_dataset.py will search for the dataset files and prepare the HDF5 file.
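For reference, you can peek inside the generated file with PyTables. A minimal sketch (the node names `Data`/`Param` and their fields are the ones read by audio_dataset.py in this repository; the path is a placeholder):

```
import tables

# open the dataset written by prepare_dataset.py (read-only)
hdf5 = tables.open_file('/path/to/save/dataset.hdf5', mode='r')
data = hdf5.get_node('/', 'Data')    # data.X: spectrogram frames, data.y: one-hot targets
param = hdf5.get_node('/', 'Param')  # metadata: file_index, file_dict, label_list, targets, fft
print data.X.shape, data.y.shape
print param.fft[0]['nfft']           # FFT size used during preparation
hdf5.close()
```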
Furthermore, the prepare_dataset.py script can generate train/validation/test configuration files that specify the partition to be used in an experiment (e.g., 10-fold cross-validation). The partition configuration contains important metadata, such as the train/valid/test files (and their indices in the HDF5 file), as well as the mean and standard deviation of the training set (which can be used to standardize the data for training, validation, and testing). See the appendix at the end of this README for a quick way to inspect a partition configuration.

The following instructions demonstrate how to use the code:
#### 1. Prepare the dataset and partition configuration file(s):

```
python prepare_dataset.py \
/path/to/dataset \
/path/to/label_list.txt \
--hdf5 /path/to/save/dataset.hdf5 \
--train_prop 0.5 \
--valid_prop 0.25 \
--test_prop 0.25 \
--partition_name /path/to/save/partition_configuration.pkl \
--compute_std \
--tframes 100
```

This will create the HDF5 dataset file and generate (1/test_prop) stratified partitions. The file label_list.txt is a comma- or newline-separated list of the categorical labels in the dataset (which are matched against file and/or folder names).

Alternatively, the user can supply lists of files when creating the partition:

```
python prepare_dataset.py \
/path/to/dataset \
/path/to/label_list.txt \
--hdf5 /path/to/save/dataset.hdf5 \
--train /path/to/train_list.txt \
--valid /path/to/valid_list.txt \
--test /path/to/test_list.txt \
--partition_name /path/to/save/partition_configuration.pkl \
--compute_std \
--tframes 100
```

The lists should be newline-separated and contain the relative path to each file (from the root folder of the dataset). For example, if the directory structure is as follows:

    /root/blues/file.wav
    /root/jazz/file.wav
    .
    .
    .

then the training list text file should look like this:

    blues/file.wav
    jazz/file.wav

Run `python prepare_dataset.py --help` to see a full list of options.

#### 2. Train a DNN:

```
python train_mlp_script.py \
/path/to/partition_configuration.pkl \
/path/to/yaml_config_file.yaml \
--nunits 50 \
--output /path/to/save/model_file.pkl
```

Some YAML configuration files are provided in the folder yaml_scripts (but you can write your own for different experiments).

#### 3. Test a previously trained and saved DNN:

```
python test_mlp_script.py \
/path/to/saved/model_file.pkl \
--majority_vote
```

The model knows which dataset it was trained on and will use the associated test set. An alternative test set can also be specified:

```
python test_mlp_script.py \
/path/to/saved/model_file.pkl \
--testset /path/to/alternate/partition_configuration.pkl \
--save_file /path/to/savefile.txt
```

`--save_file` lets the user save the test results to a file.

## V1.0 Instructions
This version has now been removed, but it can be checked out as a branch using the v1.0 tag.
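## Appendix: Inspecting a Partition Configuration
The partition configuration is an ordinary pickled dictionary. A minimal sketch of how to inspect one (Python 2, matching the rest of this codebase; the keys shown are the ones read by audio_dataset.py, and the path is a placeholder):

```
import cPickle
import numpy as np

# load a partition configuration produced by prepare_dataset.py
with open('/path/to/save/partition_configuration.pkl') as f:
    config = cPickle.load(f)

print config['hdf5']              # path to the associated HDF5 dataset file
print len(config['train_files'])  # files in the training split
print len(config['train'])        # frame indices (support) of the training split
print config['tframes']           # time frames per training example
print np.sqrt(config['var'])      # training-set std deviation (mean is in config['mean'])
```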
108 | -------------------------------------------------------------------------------- /adversary_dataset.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import functools 3 | import tables 4 | 5 | from pylearn2.datasets.dataset import Dataset 6 | from pylearn2.datasets.dense_design_matrix import DenseDesignMatrixPyTables, DefaultViewConverter 7 | from pylearn2.blocks import Block 8 | from pylearn2.space import CompositeSpace, Conv2DSpace, VectorSpace, IndexSpace 9 | from pylearn2.utils.iteration import SubsetIterator, FiniteDatasetIterator, resolve_iterator_class 10 | from pylearn2.utils import safe_zip, safe_izip 11 | 12 | from pylearn2.datasets import control 13 | from pylearn2.utils.exc import reraise_as 14 | from pylearn2.utils.rng import make_np_rng 15 | from pylearn2.utils import contains_nan 16 | from pylearn2.models.mlp import MLP, Linear, PretrainedLayer 17 | from pylearn2.models.autoencoder import Autoencoder 18 | 19 | #from utils.test_adversary_linearized import find_adversary 20 | import theano 21 | from theano import config 22 | from theano import tensor as T 23 | 24 | import pdb 25 | 26 | class AdversaryDataset(DenseDesignMatrixPyTables): 27 | def __init__(self, config, adv_model, which_set='train'): 28 | 29 | keys = ['train', 'test', 'valid'] 30 | assert which_set in keys 31 | 32 | # load hdf5 metadata 33 | self.hdf5 = tables.open_file( config['hdf5'], mode='r') 34 | data = self.hdf5.get_node('/', 'Data') 35 | param = self.hdf5.get_node('/', 'Param') 36 | self.file_index = param.file_index[0] 37 | self.file_dict = param.file_dict[0] 38 | self.label_list = param.label_list[0] 39 | self.targets = param.targets[0] 40 | self.nfft = param.fft[0]['nfft'] 41 | 42 | # load parition information 43 | self.support = config[which_set] 44 | self.file_list = config[which_set+'_files'] 45 | self.mean = config['mean'] 46 | self.mean = self.mean.reshape((np.prod(self.mean.shape),)) 47 | self.var = config['var'] 48 | self.var = self.var.reshape((np.prod(self.var.shape),)) 49 | self.istd = np.reciprocal(np.sqrt(self.var)) 50 | self.mask = (self.istd < 20) 51 | self.tframes = config['tframes'] 52 | 53 | # setup adversary 54 | self.adv_model = adv_model 55 | in_batch = adv_model.get_input_space().make_theano_batch() 56 | out_batch = adv_model.get_output_space().make_theano_batch() 57 | cost = adv_model.cost(out_batch, adv_model.fprop(in_batch)) 58 | dCost = T.grad(cost, in_batch) 59 | 60 | grad_theano = theano.function([in_batch, out_batch], dCost) 61 | fprop_theano = theano.function([in_batch], adv_model.fprop(in_batch)) 62 | fcost_theano = theano.function([in_batch, out_batch], cost) 63 | 64 | self.input_space = adv_model.get_input_space() 65 | if isinstance(self.input_space, Conv2DSpace): 66 | tframes, dim = self.input_space.shape 67 | view_converter = DefaultViewConverter((tframes, dim, 1)) 68 | 69 | def grad(batch, labels): 70 | topo_view = grad_theano(view_converter.get_formatted_batch(batch, self.input_space), labels) 71 | return view_converter.topo_view_to_design_mat(topo_view) 72 | 73 | def fprop(batch): 74 | return fprop_theano(view_converter.get_formatted_batch(batch, self.input_space)) 75 | 76 | def fcost(batch, labels): 77 | return fcost_theano(view_converter.get_formatted_batch(batch, self.input_space), labels) 78 | 79 | self.grad = grad 80 | self.fprop = fprop 81 | self.fcost = fcost 82 | 83 | 84 | super(AdversaryDataset, self).__init__(X=data.X, y=data.y, 85 | view_converter=view_converter) 86 | 87 | else: 88 | dim = 
self.input_space.dim 89 | tframes = 1 90 | view_converter = None 91 | 92 | self.grad = grad_theano 93 | self.fprop = fprop_theano 94 | self.fcost = fcost_theano 95 | 96 | super(AdversaryDataset, self).__init__(X=data.X, y=data.y) 97 | 98 | def __del__(self): 99 | self.hdf5.close() 100 | 101 | def create_adversary_from_batch(self, batch, label, mu=0.25, snr=15): 102 | #n_examples = batch.shape[0] 103 | #epsilon = np.linalg.norm(batch)/n_examples/10**(snr/20.) 104 | 105 | g = self.grad(batch, label) #* n_examples 106 | Z = batch - mu * np.sign(g) 107 | if self.tframes==1: Z = Z * (Z>0) 108 | 109 | #nu = np.linalg.norm((Z-batch))/n_examples/epsilon - 1 # lagrange multiplier 110 | #nu = nu * (nu>=0) 111 | #Y = (Z + nu*batch) / (1+nu) 112 | return Z#Y 113 | 114 | def standardize(self, batch): 115 | return (batch - self.mean) * self.istd * self.mask 116 | 117 | @functools.wraps(Dataset.iterator) 118 | def iterator(self, mode=None, batch_size=1, num_batches=None, 119 | topo=None, targets=None, rng=None, data_specs=None, 120 | return_tuple=False): 121 | ''' 122 | Copied from pylearn2 superclass in order to return custom iterator. 123 | Two different iterators are available, depending on the data_specs. 124 | 1. If the data_specs source is 'features' a framelevel iterator is returned 125 | (each call to next() returns a single frame) 126 | 2. If the data_specs source is 'songlevel-features' a songlevel iterator is returned 127 | (each call to next() returns all the frames associated with a given song in the dataset) 128 | ''' 129 | if data_specs is None: 130 | data_specs = self._iter_data_specs 131 | else: 132 | self.data_specs = data_specs 133 | 134 | # If there is a view_converter, we have to use it to convert 135 | # the stored data for "features" into one that the iterator 136 | # can return. 
137 | space, source = data_specs 138 | if isinstance(space, CompositeSpace): 139 | sub_spaces = space.components 140 | sub_sources = source 141 | else: 142 | sub_spaces = (space,) 143 | sub_sources = (source,) 144 | 145 | convert = [] 146 | for sp, src in safe_zip(sub_spaces, sub_sources): 147 | if (src == 'features' or src == 'songlevel-features') and \ 148 | getattr(self, 'view_converter', None) is not None: 149 | conv_fn = (lambda batch, self=self, space=sp: 150 | self.view_converter.get_formatted_batch(batch, 151 | space)) 152 | else: 153 | conv_fn = None 154 | 155 | convert.append(conv_fn) 156 | 157 | # TODO: Refactor 158 | if mode is None: 159 | if hasattr(self, '_iter_subset_class'): 160 | mode = self._iter_subset_class 161 | else: 162 | raise ValueError('iteration mode not provided and no default ' 163 | 'mode set for %s' % str(self)) 164 | else: 165 | mode = resolve_iterator_class(mode) 166 | 167 | if num_batches is None: 168 | num_batches = getattr(self, '_iter_num_batches', None) 169 | if rng is None and mode.stochastic: 170 | rng = self.rng 171 | 172 | if 'songlevel-features' in sub_sources: 173 | if batch_size is not 1: 174 | raise ValueError("'batch_size' must be set to 1 for songlevel iterator") 175 | return SonglevelIterator(self, 176 | mode(len(self.file_list), batch_size, num_batches, rng), 177 | data_specs=data_specs, 178 | return_tuple=return_tuple, 179 | convert=convert) 180 | else: 181 | return FramelevelIterator(self, 182 | mode(len(self.support), batch_size, num_batches, rng), 183 | data_specs=data_specs, 184 | return_tuple=return_tuple, 185 | convert=convert) 186 | 187 | class FramelevelIterator(FiniteDatasetIterator): 188 | ''' 189 | Returns individual (spectrogram) frames/slices from the dataset 190 | ''' 191 | @functools.wraps(SubsetIterator.next) 192 | def next(self): 193 | """ 194 | Retrieves the next batch of examples. 195 | 196 | Returns 197 | ------- 198 | next_batch : object 199 | An object representing a mini-batch of data, conforming 200 | to the space specified in the `data_specs` constructor 201 | argument to this iterator. Will be a tuple if more 202 | than one data source was specified or if the constructor 203 | parameter `return_tuple` was `True`. 204 | 205 | Raises 206 | ------ 207 | StopIteration 208 | When there are no more batches to return. 209 | """ 210 | next_index = self._subset_iterator.next() 211 | next_index = self._dataset.support[ next_index ] # !!! added line to iterate over different index set !!! 
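        # NOTE: for 'features' sources this iterator does not return clean
        # frames; each batch is perturbed below via create_adversary_from_batch
        # (a fast-gradient-sign step, batch - mu * sign(dCost/dX), toward a
        # randomly drawn one-hot target) before being returned.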
212 | 213 | spaces, sources = self._data_specs 214 | output = [] 215 | 216 | for data, fn, source in safe_izip(self._raw_data, self._convert, sources): 217 | if source=='targets': 218 | if fn: 219 | output.append( fn(data[next_index, :]) ) 220 | else: 221 | output.append( data[next_index, :] ) 222 | else: 223 | design_mat = [] 224 | for index in next_index: 225 | X = np.abs(data[index:index+self._dataset.tframes, :]) 226 | design_mat.append( X.reshape((np.prod(X.shape),)) ) 227 | 228 | design_mat = np.vstack(design_mat) 229 | if self._dataset.tframes > 1: 230 | # ideally we'd standardize in a preprocessing layer 231 | # (so that standardization is built-in to the model rather 232 | # than the dataset) but i haven't quite figured out how to do 233 | # this yet for images, due to a memory error associated with 234 | # a really big diagonal scaling matrix 235 | # (however, it works fine for vectors) 236 | design_mat = self._dataset.standardize(design_mat) 237 | 238 | n_classes = len(self._dataset.targets) 239 | one_hot = np.zeros((design_mat.shape[0], n_classes), dtype=np.float32) 240 | for r in one_hot: r[np.random.randint(n_classes)] = 1 241 | 242 | design_mat = self._dataset.create_adversary_from_batch(design_mat, one_hot) 243 | 244 | if fn: 245 | output.append( fn(design_mat) ) 246 | else: 247 | output.append( design_mat ) 248 | 249 | rval = tuple(output) 250 | if not self._return_tuple and len(rval) == 1: 251 | rval, = rval 252 | return rval 253 | 254 | class SonglevelIterator(FiniteDatasetIterator): 255 | ''' 256 | Returns all data associated with a particular song from the dataset 257 | (only iterates 1 song at a time!) 258 | ''' 259 | @functools.wraps(SubsetIterator.next) 260 | def next(self): 261 | 262 | # next numerical index 263 | next_file_index = self._subset_iterator.next() 264 | 265 | # associate numerical index with file from the dataset 266 | next_file = self._dataset.file_list[ next_file_index ][0] # !!! added line to iterate over different index set !!! 267 | 268 | # lookup file's position in the hdf5 array 269 | offset, nframes, key, target = self._dataset.file_index[next_file] 270 | 271 | thop = 1. # hardcoded and must match prepare_dataset.py!!! 
272 | sup = np.arange(0,nframes-self._dataset.tframes,np.int(self._dataset.tframes/thop)) 273 | next_index = offset + sup 274 | 275 | spaces, sources = self._data_specs 276 | output = [] 277 | 278 | for data, fn, source, space in safe_izip(self._raw_data, self._convert, sources, spaces.components): 279 | if source=='targets': 280 | # if fn: 281 | # output.append( fn( np.reshape(data[next_index[0], :], (1,-1)) ) ) 282 | # else: 283 | # output.append( np.reshape(data[next_index[0], :], (1,-1)) ) 284 | output.append( target ) 285 | else: 286 | design_mat = [] 287 | for index in next_index: 288 | if 0:#space.dtype=='complex64': 289 | X = data[index:index+self._dataset.tframes, :] # return phase too 290 | else: 291 | X = np.abs(data[index:index+self._dataset.tframes, :]) 292 | design_mat.append( X.reshape((np.prod(X.shape),)) ) 293 | 294 | design_mat = np.vstack(design_mat) 295 | 296 | if self._dataset.tframes > 1: 297 | # ideally we'd standardize in a preprocessing layer 298 | # (so that standardization is built-in to the model rather 299 | # than the dataset) but i haven't quite figured out how to do 300 | # this yet for images, due to a memory error associated with 301 | # a really big diagonal scaling matrix 302 | # (however, it works fine for vectors) 303 | design_mat = self._dataset.standardize(design_mat) 304 | 305 | if fn: 306 | output.append( fn(design_mat) ) 307 | else: 308 | output.append( design_mat ) 309 | 310 | output.append(next_file) 311 | rval = tuple(output) 312 | if not self._return_tuple and len(rval) == 1: 313 | rval, = rval 314 | return rval 315 | 316 | class PreprocLayer(PretrainedLayer): 317 | # should this use a linear layer instead of an autoencoder 318 | # (problem is layers don't implement upward_pass as required by pretrained layer... 319 | # but perhaps could write upward_pass function to call layer's fprop.) 320 | def __init__(self, config, proc_type='standardize', **kwargs): 321 | ''' 322 | config: dictionary with partition configuration information 323 | 324 | proc_type: type of preprocessing (either standardize or pca_whiten) 325 | 326 | if proc_type='standardize' no extra arguments required 327 | 328 | if proc_type='pca_whiten' the following keyword arguments are required: 329 | ncomponents = x where x is an integer 330 | epsilon = y where y is a float (regularization parameter) 331 | ''' 332 | 333 | recognized_types = ['standardize', 'pca_whiten'] 334 | assert proc_type in recognized_types 335 | 336 | # load parition information 337 | self.mean = config['mean'] 338 | self.istd = np.reciprocal(np.sqrt(config['var'])) 339 | self.tframes = config['tframes'] 340 | nvis = len(self.mean) 341 | 342 | if proc_type == 'standardize': 343 | dim = nvis 344 | mask = (self.istd < 20) # in order to ignore near-zero variance inputs 345 | self.biases = np.array(-self.mean * self.istd * mask, dtype=np.float32) 346 | self.weights = np.array(np.diag(self.istd * mask), dtype=np.float32) 347 | 348 | if proc_type == 'pca_whiten': 349 | raise NotImplementedError( 350 | '''PCA whitening not yet implemented as a layer. 
351 | Use audio_dataset2d.AudioDataset2d to perform whitening from the dataset iterator''') 352 | 353 | # dim = kwargs['ncomponents'] 354 | # S = config['S'][:dim] # eigenvalues 355 | # U = config['U'][:,:dim] # eigenvectors 356 | # self.pca = np.diag(1./(np.sqrt(S) + epsilon)).dot(U.T) 357 | 358 | # self.biases = np.array(-self.mean.dot(self.pca.transpose()), dtype=np.float32) 359 | # self.weights = np.array(self.pca.transpose(), dtype=np.float32) 360 | 361 | # Autoencoder with linear units 362 | pre_layer = Autoencoder(nvis=nvis, nhid=dim, act_enc=None, act_dec=None, irange=0) 363 | 364 | # Set weights for pre-processing 365 | params = pre_layer.get_param_values() 366 | params[1] = self.biases 367 | params[2] = self.weights 368 | pre_layer.set_param_values(params) 369 | 370 | super(PreprocLayer, self).__init__(layer_name='pre', layer_content=pre_layer, freeze_params=True) 371 | 372 | def get_biases(self): 373 | return self.biases 374 | 375 | def get_weights(self): 376 | return self.weights 377 | 378 | def get_param_values(self): 379 | return list((self.get_weights(), self.get_biases())) 380 | 381 | 382 | -------------------------------------------------------------------------------- /audio_dataset.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import functools 3 | import tables 4 | 5 | from pylearn2.datasets.dataset import Dataset 6 | from pylearn2.datasets.dense_design_matrix import DenseDesignMatrixPyTables, DefaultViewConverter 7 | from pylearn2.blocks import Block 8 | from pylearn2.space import CompositeSpace, Conv2DSpace, VectorSpace, IndexSpace 9 | from pylearn2.utils.iteration import SubsetIterator, FiniteDatasetIterator, resolve_iterator_class 10 | from pylearn2.utils import safe_zip, safe_izip 11 | 12 | from pylearn2.datasets import control 13 | from pylearn2.utils.exc import reraise_as 14 | from pylearn2.utils.rng import make_np_rng 15 | from pylearn2.utils import contains_nan 16 | from pylearn2.models.mlp import MLP, Linear, PretrainedLayer 17 | from pylearn2.models.autoencoder import Autoencoder 18 | 19 | from theano import config 20 | 21 | import pdb 22 | 23 | class AudioDataset(DenseDesignMatrixPyTables): 24 | def __init__(self, config, which_set='train'): #, standardize=True, pca_whitening=False, ncomponents=None, epsilon=3): 25 | 26 | keys = ['train', 'test', 'valid'] 27 | assert which_set in keys 28 | 29 | # load hdf5 metadata 30 | self.hdf5 = tables.open_file( config['hdf5'], mode='r') 31 | data = self.hdf5.get_node('/', 'Data') 32 | param = self.hdf5.get_node('/', 'Param') 33 | self.file_index = param.file_index[0] 34 | self.file_dict = param.file_dict[0] 35 | self.label_list = param.label_list[0] 36 | self.targets = param.targets[0] 37 | self.nfft = param.fft[0]['nfft'] 38 | 39 | # load parition information 40 | self.support = config[which_set] 41 | self.file_list = config[which_set+'_files'] 42 | self.mean = config['mean'] 43 | self.mean = self.mean.reshape((np.prod(self.mean.shape),)) 44 | self.var = config['var'] 45 | self.var = self.var.reshape((np.prod(self.var.shape),)) 46 | self.istd = np.reciprocal(np.sqrt(self.var)) 47 | self.mask = (self.istd < 20) 48 | self.tframes = config['tframes'] 49 | 50 | if self.tframes > 1: 51 | view_converter = DefaultViewConverter((self.tframes, len(self.mean)/self.tframes, 1)) 52 | super(AudioDataset, self).__init__(X=data.X, y=data.y, 53 | view_converter=view_converter) 54 | else: 55 | super(AudioDataset, self).__init__(X=data.X, y=data.y) 56 | 57 | def 
__del__(self): 58 | self.hdf5.close() 59 | 60 | @functools.wraps(Dataset.iterator) 61 | def iterator(self, mode=None, batch_size=1, num_batches=None, 62 | topo=None, targets=None, rng=None, data_specs=None, 63 | return_tuple=False): 64 | ''' 65 | Copied from pylearn2 superclass in order to return custom iterator. 66 | Two different iterators are available, depending on the data_specs. 67 | 1. If the data_specs source is 'features' a framelevel iterator is returned 68 | (each call to next() returns a single frame) 69 | 2. If the data_specs source is 'songlevel-features' a songlevel iterator is returned 70 | (each call to next() returns all the frames associated with a given song in the dataset) 71 | ''' 72 | if data_specs is None: 73 | data_specs = self._iter_data_specs 74 | else: 75 | self.data_specs = data_specs 76 | 77 | # If there is a view_converter, we have to use it to convert 78 | # the stored data for "features" into one that the iterator 79 | # can return. 80 | space, source = data_specs 81 | if isinstance(space, CompositeSpace): 82 | sub_spaces = space.components 83 | sub_sources = source 84 | else: 85 | sub_spaces = (space,) 86 | sub_sources = (source,) 87 | 88 | convert = [] 89 | for sp, src in safe_zip(sub_spaces, sub_sources): 90 | if (src == 'features' or src == 'songlevel-features') and \ 91 | getattr(self, 'view_converter', None) is not None: 92 | conv_fn = (lambda batch, self=self, space=sp: 93 | self.view_converter.get_formatted_batch(batch, 94 | space)) 95 | else: 96 | conv_fn = None 97 | 98 | convert.append(conv_fn) 99 | 100 | # TODO: Refactor 101 | if mode is None: 102 | if hasattr(self, '_iter_subset_class'): 103 | mode = self._iter_subset_class 104 | else: 105 | raise ValueError('iteration mode not provided and no default ' 106 | 'mode set for %s' % str(self)) 107 | else: 108 | mode = resolve_iterator_class(mode) 109 | 110 | if num_batches is None: 111 | num_batches = getattr(self, '_iter_num_batches', None) 112 | if rng is None and mode.stochastic: 113 | rng = self.rng 114 | 115 | if 'songlevel-features' in sub_sources: 116 | if batch_size is not 1: 117 | raise ValueError("'batch_size' must be set to 1 for songlevel iterator") 118 | return SonglevelIterator(self, 119 | mode(len(self.file_list), batch_size, num_batches, rng), 120 | data_specs=data_specs, 121 | return_tuple=return_tuple, 122 | convert=convert) 123 | else: 124 | return FramelevelIterator(self, 125 | mode(len(self.support), batch_size, num_batches, rng), 126 | data_specs=data_specs, 127 | return_tuple=return_tuple, 128 | convert=convert) 129 | 130 | def standardize(self, batch): 131 | return (batch - self.mean) * self.istd * self.mask 132 | 133 | class FramelevelIterator(FiniteDatasetIterator): 134 | ''' 135 | Returns individual (spectrogram) frames/slices from the dataset 136 | ''' 137 | @functools.wraps(SubsetIterator.next) 138 | def next(self): 139 | """ 140 | Retrieves the next batch of examples. 141 | 142 | Returns 143 | ------- 144 | next_batch : object 145 | An object representing a mini-batch of data, conforming 146 | to the space specified in the `data_specs` constructor 147 | argument to this iterator. Will be a tuple if more 148 | than one data source was specified or if the constructor 149 | parameter `return_tuple` was `True`. 150 | 151 | Raises 152 | ------ 153 | StopIteration 154 | When there are no more batches to return. 155 | """ 156 | next_index = self._subset_iterator.next() 157 | next_index = self._dataset.support[ next_index ] # !!! 
added line to iterate over different index set !!! 158 | 159 | spaces, sources = self._data_specs 160 | output = [] 161 | 162 | for data, fn, source in safe_izip(self._raw_data, self._convert, sources): 163 | if source=='targets': 164 | if fn: 165 | output.append( fn(data[next_index, :]) ) 166 | else: 167 | output.append( data[next_index, :] ) 168 | else: 169 | design_mat = [] 170 | for index in next_index: 171 | X = np.abs(data[index:index+self._dataset.tframes, :]) 172 | design_mat.append( X.reshape((np.prod(X.shape),)) ) 173 | 174 | design_mat = np.vstack(design_mat) 175 | 176 | if self._dataset.tframes > 1: 177 | # ideally we'd standardize in a preprocessing layer 178 | # (so that standardization is built-in to the model rather 179 | # than the dataset) but i haven't quite figured out how to do 180 | # this yet for images, due to a memory error associated with 181 | # a really big diagonal scaling matrix 182 | # (however, it works fine for vectors) 183 | design_mat = self._dataset.standardize(design_mat) 184 | 185 | if fn: 186 | output.append( fn(design_mat) ) 187 | else: 188 | output.append( design_mat ) 189 | 190 | rval = tuple(output) 191 | if not self._return_tuple and len(rval) == 1: 192 | rval, = rval 193 | return rval 194 | 195 | class SonglevelIterator(FiniteDatasetIterator): 196 | ''' 197 | Returns all data associated with a particular song from the dataset 198 | (only iterates 1 song at a time!) 199 | ''' 200 | @functools.wraps(SubsetIterator.next) 201 | def next(self): 202 | 203 | # next numerical index 204 | next_file_index = self._subset_iterator.next() 205 | 206 | # associate numerical index with file from the dataset 207 | next_file = self._dataset.file_list[ next_file_index ][0] # !!! added line to iterate over different index set !!! 208 | 209 | # lookup file's position in the hdf5 array 210 | offset, nframes, key, target = self._dataset.file_index[next_file] 211 | 212 | thop = 1. # hardcoded and must match prepare_dataset.py!!! 
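        # sup enumerates the starting frame of each tframes-long excerpt in
        # the song: offsets 0, tframes/thop, 2*tframes/thop, ..., up to
        # nframes - tframes (with thop = 1., consecutive excerpts do not overlap)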
213 | sup = np.arange(0,nframes-self._dataset.tframes,np.int(self._dataset.tframes/thop)) 214 | next_index = offset + sup 215 | 216 | 217 | spaces, sources = self._data_specs 218 | output = [] 219 | 220 | for data, fn, source, space in safe_izip(self._raw_data, self._convert, sources, spaces.components): 221 | if source=='targets': 222 | # if fn: 223 | # output.append( fn( np.reshape(data[next_index[0], :], (1,-1)) ) ) 224 | # else: 225 | # output.append( np.reshape(data[next_index[0], :], (1,-1)) ) 226 | output.append( target ) 227 | else: 228 | design_mat = [] 229 | for index in next_index: 230 | if 0:#space.dtype=='complex64': 231 | X = data[index:index+self._dataset.tframes, :] # return phase too 232 | else: 233 | X = np.abs(data[index:index+self._dataset.tframes, :]) 234 | design_mat.append( X.reshape((np.prod(X.shape),)) ) 235 | 236 | design_mat = np.vstack(design_mat) 237 | if self._dataset.tframes > 1: 238 | # ideally we'd standardize in a preprocessing layer 239 | # (so that standardization is built-in to the model rather 240 | # than the dataset) but i haven't quite figured out how to do 241 | # this yet for images, due to a memory error associated with 242 | # a really big diagonal scaling matrix 243 | # (however, it works fine for vectors) 244 | design_mat = self._dataset.standardize(design_mat) 245 | 246 | if fn: 247 | output.append( fn(design_mat) ) 248 | else: 249 | output.append( design_mat ) 250 | 251 | output.append(next_file) 252 | rval = tuple(output) 253 | if not self._return_tuple and len(rval) == 1: 254 | rval, = rval 255 | return rval 256 | 257 | class PreprocLayer(PretrainedLayer): 258 | # should this use a linear layer instead of an autoencoder 259 | # (problem is layers don't implement upward_pass as required by pretrained layer... 260 | # but perhaps could write upward_pass function to call layer's fprop.) 261 | def __init__(self, config, proc_type='standardize', **kwargs): 262 | ''' 263 | config: dictionary with partition configuration information 264 | 265 | proc_type: type of preprocessing (either standardize or pca_whiten) 266 | 267 | if proc_type='standardize' no extra arguments required 268 | 269 | if proc_type='pca_whiten' the following keyword arguments are required: 270 | ncomponents = x where x is an integer 271 | epsilon = y where y is a float (regularization parameter) 272 | ''' 273 | 274 | recognized_types = ['standardize', 'pca_whiten'] 275 | assert proc_type in recognized_types 276 | 277 | # load parition information 278 | self.mean = config['mean'] 279 | self.mean = self.mean.reshape((np.prod(self.mean.shape),)) 280 | self.istd = np.reciprocal(np.sqrt(config['var'])) 281 | self.istd = self.istd.reshape((np.prod(self.istd.shape),)) 282 | self.tframes = config['tframes'] 283 | nvis = len(self.mean) 284 | 285 | if proc_type == 'standardize': 286 | dim = nvis 287 | mask = (self.istd < 20) # in order to ignore near-zero variance inputs 288 | self.biases = np.array(-self.mean * self.istd * mask, dtype=np.float32) 289 | self.weights = np.array(np.diag(self.istd * mask), dtype=np.float32) #!!!gives memory error for convnet (because diag not treated as sparse mat) 290 | 291 | if proc_type == 'pca_whiten': 292 | raise NotImplementedError( 293 | '''PCA whitening not yet implemented as a layer. 
294 | Use audio_dataset2d.AudioDataset2d to perform whitening from the dataset iterator''') 295 | 296 | # dim = kwargs['ncomponents'] 297 | # S = config['S'][:dim] # eigenvalues 298 | # U = config['U'][:,:dim] # eigenvectors 299 | # self.pca = np.diag(1./(np.sqrt(S) + epsilon)).dot(U.T) 300 | 301 | # self.biases = np.array(-self.mean.dot(self.pca.transpose()), dtype=np.float32) 302 | # self.weights = np.array(self.pca.transpose(), dtype=np.float32) 303 | 304 | # Autoencoder with linear units 305 | pre_layer = Autoencoder(nvis=nvis, nhid=dim, act_enc=None, act_dec=None, irange=0) 306 | 307 | # Set weights for pre-processing 308 | params = pre_layer.get_param_values() 309 | params[1] = self.biases 310 | params[2] = self.weights 311 | pre_layer.set_param_values(params) 312 | 313 | super(PreprocLayer, self).__init__(layer_name='pre', layer_content=pre_layer, freeze_params=True) 314 | 315 | def get_biases(self): 316 | return self.biases 317 | 318 | def get_weights(self): 319 | return self.weights 320 | 321 | def get_param_values(self): 322 | return list((self.get_weights(), self.get_biases())) 323 | 324 | if __name__=='__main__': 325 | 326 | # tests 327 | import theano 328 | import cPickle 329 | from audio_dataset import AudioDataset 330 | 331 | with open('GTZAN_stratified.pkl') as f: 332 | config = cPickle.load(f) 333 | 334 | D = AudioDataset(config) 335 | 336 | feat_space = VectorSpace(dim=D.X.shape[1]) 337 | feat_space_complex = VectorSpace(dim=D.X.shape[1], dtype='complex64') 338 | target_space = VectorSpace(dim=len(D.label_list)) 339 | 340 | data_specs_frame = (CompositeSpace((feat_space,target_space)), ("features", "targets")) 341 | data_specs_song = (CompositeSpace((feat_space_complex, target_space)), ("songlevel-features", "targets")) 342 | 343 | framelevel_it = D.iterator(mode='sequential', batch_size=10, data_specs=data_specs_frame) 344 | frame_batch = framelevel_it.next() 345 | 346 | songlevel_it = D.iterator(mode='sequential', batch_size=1, data_specs=data_specs_song) 347 | song_batch = songlevel_it.next() 348 | 349 | -------------------------------------------------------------------------------- /fine_tune_pretrained_mlp.py: -------------------------------------------------------------------------------- 1 | import sys, re, cPickle, argparse 2 | from glob import glob 3 | from pylearn2.train import Train 4 | from pylearn2.utils import serial 5 | from pylearn2.models.mlp import MLP, PretrainedLayer, Sigmoid, Softmax 6 | from pylearn2.training_algorithms.sgd import SGD, LinearDecayOverEpoch 7 | from pylearn2.training_algorithms.learning_rule import Momentum, MomentumAdjustor 8 | from pylearn2.training_algorithms.learning_rule import RMSProp 9 | from pylearn2.termination_criteria import MonitorBased 10 | from pylearn2.train_extensions.best_params import MonitorBasedSaveBest 11 | from pylearn2.datasets.transformer_dataset import TransformerDataset 12 | from audio_dataset import AudioDataset 13 | 14 | import pylearn2.config.yaml_parse as yaml_parse 15 | 16 | import pdb 17 | 18 | def get_mlp(nvis, nclasses, pretrained_layers): 19 | 20 | layer_yaml = [] 21 | for i, m in enumerate(pretrained_layers): 22 | layer_yaml.append('''!obj:pylearn2.models.mlp.PretrainedLayer { 23 | layer_name : "%(layer_name)s", 24 | layer_content : !pkl: "%(layer_content)s" 25 | }''' % {'layer_name' : 'h'+str(i), 'layer_content' : m }) 26 | 27 | layer_yaml.append('''!obj:pylearn2.models.mlp.Softmax { 28 | n_classes : %(nclasses)i, 29 | layer_name : "y", 30 | irange : .01 31 | }''' % {'nclasses' : nclasses}) 32 | 
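    # layer_yaml now holds one PretrainedLayer stanza per pickled layer plus
    # the final Softmax stanza; they are joined below into a single MLP YAML
    # description that pylearn2's yaml_parse instantiates.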
33 | layer_yaml = ','.join(layer_yaml) 34 | 35 | model_yaml = '''!obj:pylearn2.models.mlp.MLP { 36 | nvis : %(nvis)i, 37 | layers : [%(layers)s] 38 | }''' % {'nvis' : nvis, 'layers' : layer_yaml} 39 | 40 | model = yaml_parse.load(model_yaml) 41 | return model 42 | 43 | def get_trainer(model, trainset, validset, save_path): 44 | 45 | monitoring = dict(valid=validset, train=trainset) 46 | termination = MonitorBased(channel_name='valid_y_misclass', prop_decrease=.001, N=100) 47 | extensions = [MonitorBasedSaveBest(channel_name='valid_y_misclass', save_path=save_path), 48 | #MomentumAdjustor(start=1, saturate=100, final_momentum=.9), 49 | LinearDecayOverEpoch(start=1, saturate=200, decay_factor=0.01)] 50 | 51 | config = { 52 | 'learning_rate': .01, 53 | #'learning_rule': Momentum(0.5), 54 | 'learning_rule': RMSProp(), 55 | 'train_iteration_mode': 'shuffled_sequential', 56 | 'batch_size': 1200,#250, 57 | #'batches_per_iter' : 100, 58 | 'monitoring_dataset': monitoring, 59 | 'monitor_iteration_mode' : 'shuffled_sequential', 60 | 'termination_criterion' : termination, 61 | } 62 | 63 | return Train(model=model, 64 | algorithm=SGD(**config), 65 | dataset=trainset, 66 | extensions=extensions) 67 | 68 | if __name__=="__main__": 69 | 70 | parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter, 71 | description='Script to pretrain the layers of a DNN.') 72 | 73 | parser.add_argument('fold_config', help='Path to dataset configuration file (generated with prepare_dataset.py)') 74 | parser.add_argument('--pretrained_layers', nargs='*', help='List of pretrained layers (sorted from input to output)') 75 | parser.add_argument('--save_file', help='Full path and for saving output model') 76 | args = parser.parse_args() 77 | 78 | trainset = yaml_parse.load( 79 | '''!obj:audio_dataset.AudioDataset { 80 | which_set : 'train', 81 | config : !pkl: "%(fold_config)s" 82 | }''' % {'fold_config' : args.fold_config} ) 83 | 84 | validset = yaml_parse.load( 85 | '''!obj:audio_dataset.AudioDataset { 86 | which_set : 'valid', 87 | config : !pkl: "%(fold_config)s" 88 | }''' % {'fold_config' : args.fold_config} ) 89 | 90 | 91 | testset = yaml_parse.load( 92 | '''!obj:audio_dataset.AudioDataset { 93 | which_set : 'test', 94 | config : !pkl: "%(fold_config)s" 95 | }''' % {'fold_config' : args.fold_config} ) 96 | 97 | model = get_mlp(nvis=trainset.X.shape[1], nclasses=trainset.y.shape[1], pretrained_layers=args.pretrained_layers) 98 | trainer = get_trainer(model=model, trainset=trainset, validset=validset, save_path=args.save_file) 99 | 100 | trainer.main_loop() 101 | 102 | 103 | 104 | -------------------------------------------------------------------------------- /gtzan/test_filtered.txt: -------------------------------------------------------------------------------- 1 | blues/blues.00012.wav 2 | blues/blues.00013.wav 3 | blues/blues.00014.wav 4 | blues/blues.00015.wav 5 | blues/blues.00016.wav 6 | blues/blues.00017.wav 7 | blues/blues.00018.wav 8 | blues/blues.00019.wav 9 | blues/blues.00020.wav 10 | blues/blues.00021.wav 11 | blues/blues.00022.wav 12 | blues/blues.00023.wav 13 | blues/blues.00024.wav 14 | blues/blues.00025.wav 15 | blues/blues.00026.wav 16 | blues/blues.00027.wav 17 | blues/blues.00028.wav 18 | blues/blues.00061.wav 19 | blues/blues.00062.wav 20 | blues/blues.00063.wav 21 | blues/blues.00064.wav 22 | blues/blues.00065.wav 23 | blues/blues.00066.wav 24 | blues/blues.00067.wav 25 | blues/blues.00068.wav 26 | blues/blues.00069.wav 27 | blues/blues.00070.wav 28 | blues/blues.00071.wav 29 
| blues/blues.00072.wav 30 | blues/blues.00098.wav 31 | blues/blues.00099.wav 32 | classical/classical.00011.wav 33 | classical/classical.00012.wav 34 | classical/classical.00013.wav 35 | classical/classical.00014.wav 36 | classical/classical.00015.wav 37 | classical/classical.00016.wav 38 | classical/classical.00017.wav 39 | classical/classical.00018.wav 40 | classical/classical.00019.wav 41 | classical/classical.00020.wav 42 | classical/classical.00021.wav 43 | classical/classical.00022.wav 44 | classical/classical.00023.wav 45 | classical/classical.00024.wav 46 | classical/classical.00025.wav 47 | classical/classical.00026.wav 48 | classical/classical.00027.wav 49 | classical/classical.00028.wav 50 | classical/classical.00029.wav 51 | classical/classical.00034.wav 52 | classical/classical.00035.wav 53 | classical/classical.00036.wav 54 | classical/classical.00037.wav 55 | classical/classical.00038.wav 56 | classical/classical.00039.wav 57 | classical/classical.00040.wav 58 | classical/classical.00041.wav 59 | classical/classical.00049.wav 60 | classical/classical.00077.wav 61 | classical/classical.00078.wav 62 | classical/classical.00079.wav 63 | country/country.00030.wav 64 | country/country.00031.wav 65 | country/country.00032.wav 66 | country/country.00033.wav 67 | country/country.00034.wav 68 | country/country.00035.wav 69 | country/country.00036.wav 70 | country/country.00037.wav 71 | country/country.00038.wav 72 | country/country.00039.wav 73 | country/country.00040.wav 74 | country/country.00043.wav 75 | country/country.00044.wav 76 | country/country.00046.wav 77 | country/country.00047.wav 78 | country/country.00048.wav 79 | country/country.00050.wav 80 | country/country.00051.wav 81 | country/country.00053.wav 82 | country/country.00054.wav 83 | country/country.00055.wav 84 | country/country.00056.wav 85 | country/country.00057.wav 86 | country/country.00058.wav 87 | country/country.00059.wav 88 | country/country.00060.wav 89 | country/country.00061.wav 90 | country/country.00062.wav 91 | country/country.00063.wav 92 | country/country.00064.wav 93 | disco/disco.00001.wav 94 | disco/disco.00021.wav 95 | disco/disco.00058.wav 96 | disco/disco.00062.wav 97 | disco/disco.00063.wav 98 | disco/disco.00064.wav 99 | disco/disco.00065.wav 100 | disco/disco.00066.wav 101 | disco/disco.00069.wav 102 | disco/disco.00076.wav 103 | disco/disco.00077.wav 104 | disco/disco.00078.wav 105 | disco/disco.00079.wav 106 | disco/disco.00080.wav 107 | disco/disco.00081.wav 108 | disco/disco.00082.wav 109 | disco/disco.00083.wav 110 | disco/disco.00084.wav 111 | disco/disco.00085.wav 112 | disco/disco.00086.wav 113 | disco/disco.00087.wav 114 | disco/disco.00088.wav 115 | disco/disco.00091.wav 116 | disco/disco.00092.wav 117 | disco/disco.00093.wav 118 | disco/disco.00094.wav 119 | disco/disco.00096.wav 120 | disco/disco.00097.wav 121 | disco/disco.00099.wav 122 | hiphop/hiphop.00000.wav 123 | hiphop/hiphop.00026.wav 124 | hiphop/hiphop.00027.wav 125 | hiphop/hiphop.00030.wav 126 | hiphop/hiphop.00040.wav 127 | hiphop/hiphop.00043.wav 128 | hiphop/hiphop.00044.wav 129 | hiphop/hiphop.00045.wav 130 | hiphop/hiphop.00051.wav 131 | hiphop/hiphop.00052.wav 132 | hiphop/hiphop.00053.wav 133 | hiphop/hiphop.00054.wav 134 | hiphop/hiphop.00062.wav 135 | hiphop/hiphop.00063.wav 136 | hiphop/hiphop.00064.wav 137 | hiphop/hiphop.00065.wav 138 | hiphop/hiphop.00066.wav 139 | hiphop/hiphop.00067.wav 140 | hiphop/hiphop.00068.wav 141 | hiphop/hiphop.00069.wav 142 | hiphop/hiphop.00070.wav 143 | 
hiphop/hiphop.00071.wav 144 | hiphop/hiphop.00072.wav 145 | hiphop/hiphop.00073.wav 146 | hiphop/hiphop.00074.wav 147 | hiphop/hiphop.00075.wav 148 | hiphop/hiphop.00099.wav 149 | jazz/jazz.00073.wav 150 | jazz/jazz.00074.wav 151 | jazz/jazz.00075.wav 152 | jazz/jazz.00076.wav 153 | jazz/jazz.00077.wav 154 | jazz/jazz.00078.wav 155 | jazz/jazz.00079.wav 156 | jazz/jazz.00080.wav 157 | jazz/jazz.00081.wav 158 | jazz/jazz.00082.wav 159 | jazz/jazz.00083.wav 160 | jazz/jazz.00084.wav 161 | jazz/jazz.00085.wav 162 | jazz/jazz.00086.wav 163 | jazz/jazz.00087.wav 164 | jazz/jazz.00088.wav 165 | jazz/jazz.00089.wav 166 | jazz/jazz.00090.wav 167 | jazz/jazz.00091.wav 168 | jazz/jazz.00092.wav 169 | jazz/jazz.00093.wav 170 | jazz/jazz.00094.wav 171 | jazz/jazz.00095.wav 172 | jazz/jazz.00096.wav 173 | jazz/jazz.00097.wav 174 | jazz/jazz.00098.wav 175 | jazz/jazz.00099.wav 176 | metal/metal.00012.wav 177 | metal/metal.00013.wav 178 | metal/metal.00014.wav 179 | metal/metal.00015.wav 180 | metal/metal.00022.wav 181 | metal/metal.00023.wav 182 | metal/metal.00025.wav 183 | metal/metal.00026.wav 184 | metal/metal.00027.wav 185 | metal/metal.00028.wav 186 | metal/metal.00029.wav 187 | metal/metal.00030.wav 188 | metal/metal.00031.wav 189 | metal/metal.00032.wav 190 | metal/metal.00033.wav 191 | metal/metal.00038.wav 192 | metal/metal.00039.wav 193 | metal/metal.00067.wav 194 | metal/metal.00070.wav 195 | metal/metal.00073.wav 196 | metal/metal.00074.wav 197 | metal/metal.00075.wav 198 | metal/metal.00078.wav 199 | metal/metal.00083.wav 200 | metal/metal.00085.wav 201 | metal/metal.00087.wav 202 | metal/metal.00088.wav 203 | pop/pop.00000.wav 204 | pop/pop.00001.wav 205 | pop/pop.00013.wav 206 | pop/pop.00014.wav 207 | pop/pop.00043.wav 208 | pop/pop.00063.wav 209 | pop/pop.00064.wav 210 | pop/pop.00065.wav 211 | pop/pop.00066.wav 212 | pop/pop.00069.wav 213 | pop/pop.00070.wav 214 | pop/pop.00071.wav 215 | pop/pop.00072.wav 216 | pop/pop.00073.wav 217 | pop/pop.00074.wav 218 | pop/pop.00075.wav 219 | pop/pop.00076.wav 220 | pop/pop.00077.wav 221 | pop/pop.00078.wav 222 | pop/pop.00079.wav 223 | pop/pop.00082.wav 224 | pop/pop.00088.wav 225 | pop/pop.00089.wav 226 | pop/pop.00090.wav 227 | pop/pop.00091.wav 228 | pop/pop.00092.wav 229 | pop/pop.00093.wav 230 | pop/pop.00094.wav 231 | pop/pop.00095.wav 232 | pop/pop.00096.wav 233 | reggae/reggae.00034.wav 234 | reggae/reggae.00035.wav 235 | reggae/reggae.00036.wav 236 | reggae/reggae.00037.wav 237 | reggae/reggae.00038.wav 238 | reggae/reggae.00039.wav 239 | reggae/reggae.00040.wav 240 | reggae/reggae.00046.wav 241 | reggae/reggae.00047.wav 242 | reggae/reggae.00048.wav 243 | reggae/reggae.00052.wav 244 | reggae/reggae.00053.wav 245 | reggae/reggae.00064.wav 246 | reggae/reggae.00065.wav 247 | reggae/reggae.00066.wav 248 | reggae/reggae.00067.wav 249 | reggae/reggae.00068.wav 250 | reggae/reggae.00071.wav 251 | reggae/reggae.00079.wav 252 | reggae/reggae.00082.wav 253 | reggae/reggae.00083.wav 254 | reggae/reggae.00084.wav 255 | reggae/reggae.00087.wav 256 | reggae/reggae.00088.wav 257 | reggae/reggae.00089.wav 258 | reggae/reggae.00090.wav 259 | rock/rock.00010.wav 260 | rock/rock.00011.wav 261 | rock/rock.00012.wav 262 | rock/rock.00013.wav 263 | rock/rock.00014.wav 264 | rock/rock.00015.wav 265 | rock/rock.00027.wav 266 | rock/rock.00028.wav 267 | rock/rock.00029.wav 268 | rock/rock.00030.wav 269 | rock/rock.00031.wav 270 | rock/rock.00032.wav 271 | rock/rock.00033.wav 272 | rock/rock.00034.wav 273 | rock/rock.00035.wav 274 | rock/rock.00036.wav 275 | 
rock/rock.00037.wav 276 | rock/rock.00039.wav 277 | rock/rock.00040.wav 278 | rock/rock.00041.wav 279 | rock/rock.00042.wav 280 | rock/rock.00043.wav 281 | rock/rock.00044.wav 282 | rock/rock.00045.wav 283 | rock/rock.00046.wav 284 | rock/rock.00047.wav 285 | rock/rock.00048.wav 286 | rock/rock.00086.wav 287 | rock/rock.00087.wav 288 | rock/rock.00088.wav 289 | rock/rock.00089.wav 290 | rock/rock.00090.wav 291 | -------------------------------------------------------------------------------- /gtzan/test_stratified.txt: -------------------------------------------------------------------------------- 1 | blues/blues.00005.wav 2 | blues/blues.00010.wav 3 | blues/blues.00012.wav 4 | blues/blues.00013.wav 5 | blues/blues.00015.wav 6 | blues/blues.00020.wav 7 | blues/blues.00023.wav 8 | blues/blues.00024.wav 9 | blues/blues.00025.wav 10 | blues/blues.00028.wav 11 | blues/blues.00031.wav 12 | blues/blues.00040.wav 13 | blues/blues.00043.wav 14 | blues/blues.00049.wav 15 | blues/blues.00052.wav 16 | blues/blues.00053.wav 17 | blues/blues.00059.wav 18 | blues/blues.00062.wav 19 | blues/blues.00064.wav 20 | blues/blues.00077.wav 21 | blues/blues.00083.wav 22 | blues/blues.00084.wav 23 | blues/blues.00092.wav 24 | blues/blues.00095.wav 25 | blues/blues.00098.wav 26 | classical/classical.00009.wav 27 | classical/classical.00016.wav 28 | classical/classical.00019.wav 29 | classical/classical.00021.wav 30 | classical/classical.00024.wav 31 | classical/classical.00029.wav 32 | classical/classical.00034.wav 33 | classical/classical.00035.wav 34 | classical/classical.00036.wav 35 | classical/classical.00037.wav 36 | classical/classical.00041.wav 37 | classical/classical.00042.wav 38 | classical/classical.00043.wav 39 | classical/classical.00045.wav 40 | classical/classical.00052.wav 41 | classical/classical.00053.wav 42 | classical/classical.00055.wav 43 | classical/classical.00056.wav 44 | classical/classical.00061.wav 45 | classical/classical.00066.wav 46 | classical/classical.00069.wav 47 | classical/classical.00075.wav 48 | classical/classical.00087.wav 49 | classical/classical.00088.wav 50 | classical/classical.00092.wav 51 | country/country.00002.wav 52 | country/country.00011.wav 53 | country/country.00021.wav 54 | country/country.00022.wav 55 | country/country.00023.wav 56 | country/country.00024.wav 57 | country/country.00031.wav 58 | country/country.00033.wav 59 | country/country.00041.wav 60 | country/country.00047.wav 61 | country/country.00050.wav 62 | country/country.00051.wav 63 | country/country.00052.wav 64 | country/country.00053.wav 65 | country/country.00057.wav 66 | country/country.00064.wav 67 | country/country.00065.wav 68 | country/country.00072.wav 69 | country/country.00075.wav 70 | country/country.00076.wav 71 | country/country.00077.wav 72 | country/country.00078.wav 73 | country/country.00082.wav 74 | country/country.00091.wav 75 | country/country.00099.wav 76 | disco/disco.00000.wav 77 | disco/disco.00001.wav 78 | disco/disco.00002.wav 79 | disco/disco.00004.wav 80 | disco/disco.00005.wav 81 | disco/disco.00008.wav 82 | disco/disco.00012.wav 83 | disco/disco.00014.wav 84 | disco/disco.00021.wav 85 | disco/disco.00033.wav 86 | disco/disco.00038.wav 87 | disco/disco.00049.wav 88 | disco/disco.00055.wav 89 | disco/disco.00058.wav 90 | disco/disco.00061.wav 91 | disco/disco.00068.wav 92 | disco/disco.00071.wav 93 | disco/disco.00076.wav 94 | disco/disco.00077.wav 95 | disco/disco.00079.wav 96 | disco/disco.00080.wav 97 | disco/disco.00082.wav 98 | disco/disco.00086.wav 99 | 
disco/disco.00094.wav 100 | disco/disco.00098.wav 101 | hiphop/hiphop.00001.wav 102 | hiphop/hiphop.00004.wav 103 | hiphop/hiphop.00008.wav 104 | hiphop/hiphop.00009.wav 105 | hiphop/hiphop.00014.wav 106 | hiphop/hiphop.00015.wav 107 | hiphop/hiphop.00017.wav 108 | hiphop/hiphop.00023.wav 109 | hiphop/hiphop.00024.wav 110 | hiphop/hiphop.00025.wav 111 | hiphop/hiphop.00028.wav 112 | hiphop/hiphop.00029.wav 113 | hiphop/hiphop.00034.wav 114 | hiphop/hiphop.00042.wav 115 | hiphop/hiphop.00061.wav 116 | hiphop/hiphop.00062.wav 117 | hiphop/hiphop.00065.wav 118 | hiphop/hiphop.00070.wav 119 | hiphop/hiphop.00075.wav 120 | hiphop/hiphop.00085.wav 121 | hiphop/hiphop.00087.wav 122 | hiphop/hiphop.00091.wav 123 | hiphop/hiphop.00094.wav 124 | hiphop/hiphop.00095.wav 125 | hiphop/hiphop.00096.wav 126 | jazz/jazz.00003.wav 127 | jazz/jazz.00009.wav 128 | jazz/jazz.00016.wav 129 | jazz/jazz.00018.wav 130 | jazz/jazz.00020.wav 131 | jazz/jazz.00031.wav 132 | jazz/jazz.00033.wav 133 | jazz/jazz.00035.wav 134 | jazz/jazz.00037.wav 135 | jazz/jazz.00039.wav 136 | jazz/jazz.00045.wav 137 | jazz/jazz.00048.wav 138 | jazz/jazz.00053.wav 139 | jazz/jazz.00066.wav 140 | jazz/jazz.00067.wav 141 | jazz/jazz.00069.wav 142 | jazz/jazz.00071.wav 143 | jazz/jazz.00073.wav 144 | jazz/jazz.00076.wav 145 | jazz/jazz.00078.wav 146 | jazz/jazz.00084.wav 147 | jazz/jazz.00085.wav 148 | jazz/jazz.00087.wav 149 | jazz/jazz.00088.wav 150 | jazz/jazz.00099.wav 151 | metal/metal.00001.wav 152 | metal/metal.00002.wav 153 | metal/metal.00005.wav 154 | metal/metal.00018.wav 155 | metal/metal.00020.wav 156 | metal/metal.00021.wav 157 | metal/metal.00030.wav 158 | metal/metal.00035.wav 159 | metal/metal.00040.wav 160 | metal/metal.00042.wav 161 | metal/metal.00046.wav 162 | metal/metal.00048.wav 163 | metal/metal.00050.wav 164 | metal/metal.00051.wav 165 | metal/metal.00054.wav 166 | metal/metal.00057.wav 167 | metal/metal.00058.wav 168 | metal/metal.00062.wav 169 | metal/metal.00066.wav 170 | metal/metal.00069.wav 171 | metal/metal.00078.wav 172 | metal/metal.00080.wav 173 | metal/metal.00084.wav 174 | metal/metal.00092.wav 175 | metal/metal.00098.wav 176 | pop/pop.00000.wav 177 | pop/pop.00005.wav 178 | pop/pop.00006.wav 179 | pop/pop.00008.wav 180 | pop/pop.00021.wav 181 | pop/pop.00030.wav 182 | pop/pop.00031.wav 183 | pop/pop.00034.wav 184 | pop/pop.00036.wav 185 | pop/pop.00038.wav 186 | pop/pop.00039.wav 187 | pop/pop.00040.wav 188 | pop/pop.00044.wav 189 | pop/pop.00046.wav 190 | pop/pop.00049.wav 191 | pop/pop.00052.wav 192 | pop/pop.00066.wav 193 | pop/pop.00068.wav 194 | pop/pop.00069.wav 195 | pop/pop.00070.wav 196 | pop/pop.00084.wav 197 | pop/pop.00088.wav 198 | pop/pop.00091.wav 199 | pop/pop.00096.wav 200 | pop/pop.00097.wav 201 | reggae/reggae.00002.wav 202 | reggae/reggae.00003.wav 203 | reggae/reggae.00006.wav 204 | reggae/reggae.00015.wav 205 | reggae/reggae.00020.wav 206 | reggae/reggae.00021.wav 207 | reggae/reggae.00022.wav 208 | reggae/reggae.00033.wav 209 | reggae/reggae.00035.wav 210 | reggae/reggae.00046.wav 211 | reggae/reggae.00048.wav 212 | reggae/reggae.00050.wav 213 | reggae/reggae.00057.wav 214 | reggae/reggae.00058.wav 215 | reggae/reggae.00068.wav 216 | reggae/reggae.00069.wav 217 | reggae/reggae.00074.wav 218 | reggae/reggae.00078.wav 219 | reggae/reggae.00079.wav 220 | reggae/reggae.00081.wav 221 | reggae/reggae.00083.wav 222 | reggae/reggae.00094.wav 223 | reggae/reggae.00096.wav 224 | reggae/reggae.00097.wav 225 | reggae/reggae.00099.wav 226 | rock/rock.00000.wav 227 | rock/rock.00001.wav 
228 | rock/rock.00002.wav 229 | rock/rock.00009.wav 230 | rock/rock.00010.wav 231 | rock/rock.00011.wav 232 | rock/rock.00020.wav 233 | rock/rock.00031.wav 234 | rock/rock.00032.wav 235 | rock/rock.00033.wav 236 | rock/rock.00039.wav 237 | rock/rock.00042.wav 238 | rock/rock.00047.wav 239 | rock/rock.00050.wav 240 | rock/rock.00051.wav 241 | rock/rock.00057.wav 242 | rock/rock.00060.wav 243 | rock/rock.00064.wav 244 | rock/rock.00070.wav 245 | rock/rock.00072.wav 246 | rock/rock.00073.wav 247 | rock/rock.00074.wav 248 | rock/rock.00079.wav 249 | rock/rock.00082.wav 250 | rock/rock.00098.wav 251 | -------------------------------------------------------------------------------- /gtzan/train_filtered.txt: -------------------------------------------------------------------------------- 1 | blues/blues.00029.wav 2 | blues/blues.00030.wav 3 | blues/blues.00031.wav 4 | blues/blues.00032.wav 5 | blues/blues.00033.wav 6 | blues/blues.00034.wav 7 | blues/blues.00035.wav 8 | blues/blues.00036.wav 9 | blues/blues.00037.wav 10 | blues/blues.00038.wav 11 | blues/blues.00039.wav 12 | blues/blues.00040.wav 13 | blues/blues.00041.wav 14 | blues/blues.00042.wav 15 | blues/blues.00043.wav 16 | blues/blues.00044.wav 17 | blues/blues.00045.wav 18 | blues/blues.00046.wav 19 | blues/blues.00047.wav 20 | blues/blues.00048.wav 21 | blues/blues.00049.wav 22 | blues/blues.00073.wav 23 | blues/blues.00074.wav 24 | blues/blues.00075.wav 25 | blues/blues.00076.wav 26 | blues/blues.00077.wav 27 | blues/blues.00078.wav 28 | blues/blues.00079.wav 29 | blues/blues.00080.wav 30 | blues/blues.00081.wav 31 | blues/blues.00082.wav 32 | blues/blues.00083.wav 33 | blues/blues.00084.wav 34 | blues/blues.00085.wav 35 | blues/blues.00086.wav 36 | blues/blues.00087.wav 37 | blues/blues.00088.wav 38 | blues/blues.00089.wav 39 | blues/blues.00090.wav 40 | blues/blues.00091.wav 41 | blues/blues.00092.wav 42 | blues/blues.00093.wav 43 | blues/blues.00094.wav 44 | blues/blues.00095.wav 45 | blues/blues.00096.wav 46 | blues/blues.00097.wav 47 | classical/classical.00030.wav 48 | classical/classical.00031.wav 49 | classical/classical.00032.wav 50 | classical/classical.00033.wav 51 | classical/classical.00043.wav 52 | classical/classical.00044.wav 53 | classical/classical.00045.wav 54 | classical/classical.00046.wav 55 | classical/classical.00047.wav 56 | classical/classical.00048.wav 57 | classical/classical.00050.wav 58 | classical/classical.00051.wav 59 | classical/classical.00052.wav 60 | classical/classical.00053.wav 61 | classical/classical.00054.wav 62 | classical/classical.00055.wav 63 | classical/classical.00056.wav 64 | classical/classical.00057.wav 65 | classical/classical.00058.wav 66 | classical/classical.00059.wav 67 | classical/classical.00060.wav 68 | classical/classical.00061.wav 69 | classical/classical.00062.wav 70 | classical/classical.00063.wav 71 | classical/classical.00064.wav 72 | classical/classical.00065.wav 73 | classical/classical.00066.wav 74 | classical/classical.00067.wav 75 | classical/classical.00080.wav 76 | classical/classical.00081.wav 77 | classical/classical.00082.wav 78 | classical/classical.00083.wav 79 | classical/classical.00084.wav 80 | classical/classical.00085.wav 81 | classical/classical.00086.wav 82 | classical/classical.00087.wav 83 | classical/classical.00088.wav 84 | classical/classical.00089.wav 85 | classical/classical.00090.wav 86 | classical/classical.00091.wav 87 | classical/classical.00092.wav 88 | classical/classical.00093.wav 89 | classical/classical.00094.wav 90 | 
classical/classical.00095.wav 91 | classical/classical.00096.wav 92 | classical/classical.00097.wav 93 | classical/classical.00098.wav 94 | classical/classical.00099.wav 95 | country/country.00019.wav 96 | country/country.00020.wav 97 | country/country.00021.wav 98 | country/country.00022.wav 99 | country/country.00023.wav 100 | country/country.00024.wav 101 | country/country.00025.wav 102 | country/country.00026.wav 103 | country/country.00028.wav 104 | country/country.00029.wav 105 | country/country.00065.wav 106 | country/country.00066.wav 107 | country/country.00067.wav 108 | country/country.00068.wav 109 | country/country.00069.wav 110 | country/country.00070.wav 111 | country/country.00071.wav 112 | country/country.00072.wav 113 | country/country.00073.wav 114 | country/country.00074.wav 115 | country/country.00075.wav 116 | country/country.00076.wav 117 | country/country.00077.wav 118 | country/country.00078.wav 119 | country/country.00079.wav 120 | country/country.00080.wav 121 | country/country.00081.wav 122 | country/country.00082.wav 123 | country/country.00083.wav 124 | country/country.00084.wav 125 | country/country.00085.wav 126 | country/country.00086.wav 127 | country/country.00087.wav 128 | country/country.00088.wav 129 | country/country.00089.wav 130 | country/country.00090.wav 131 | country/country.00091.wav 132 | country/country.00092.wav 133 | country/country.00093.wav 134 | country/country.00094.wav 135 | country/country.00095.wav 136 | country/country.00096.wav 137 | country/country.00097.wav 138 | country/country.00098.wav 139 | country/country.00099.wav 140 | disco/disco.00005.wav 141 | disco/disco.00015.wav 142 | disco/disco.00016.wav 143 | disco/disco.00017.wav 144 | disco/disco.00018.wav 145 | disco/disco.00019.wav 146 | disco/disco.00020.wav 147 | disco/disco.00022.wav 148 | disco/disco.00023.wav 149 | disco/disco.00024.wav 150 | disco/disco.00025.wav 151 | disco/disco.00026.wav 152 | disco/disco.00027.wav 153 | disco/disco.00028.wav 154 | disco/disco.00029.wav 155 | disco/disco.00030.wav 156 | disco/disco.00031.wav 157 | disco/disco.00032.wav 158 | disco/disco.00033.wav 159 | disco/disco.00034.wav 160 | disco/disco.00035.wav 161 | disco/disco.00036.wav 162 | disco/disco.00037.wav 163 | disco/disco.00039.wav 164 | disco/disco.00040.wav 165 | disco/disco.00041.wav 166 | disco/disco.00042.wav 167 | disco/disco.00043.wav 168 | disco/disco.00044.wav 169 | disco/disco.00045.wav 170 | disco/disco.00047.wav 171 | disco/disco.00049.wav 172 | disco/disco.00053.wav 173 | disco/disco.00054.wav 174 | disco/disco.00056.wav 175 | disco/disco.00057.wav 176 | disco/disco.00059.wav 177 | disco/disco.00061.wav 178 | disco/disco.00070.wav 179 | disco/disco.00073.wav 180 | disco/disco.00074.wav 181 | disco/disco.00089.wav 182 | hiphop/hiphop.00002.wav 183 | hiphop/hiphop.00003.wav 184 | hiphop/hiphop.00004.wav 185 | hiphop/hiphop.00005.wav 186 | hiphop/hiphop.00006.wav 187 | hiphop/hiphop.00007.wav 188 | hiphop/hiphop.00008.wav 189 | hiphop/hiphop.00009.wav 190 | hiphop/hiphop.00010.wav 191 | hiphop/hiphop.00011.wav 192 | hiphop/hiphop.00012.wav 193 | hiphop/hiphop.00013.wav 194 | hiphop/hiphop.00014.wav 195 | hiphop/hiphop.00015.wav 196 | hiphop/hiphop.00016.wav 197 | hiphop/hiphop.00017.wav 198 | hiphop/hiphop.00018.wav 199 | hiphop/hiphop.00019.wav 200 | hiphop/hiphop.00020.wav 201 | hiphop/hiphop.00021.wav 202 | hiphop/hiphop.00022.wav 203 | hiphop/hiphop.00023.wav 204 | hiphop/hiphop.00024.wav 205 | hiphop/hiphop.00025.wav 206 | hiphop/hiphop.00028.wav 207 | 
hiphop/hiphop.00029.wav 208 | hiphop/hiphop.00031.wav 209 | hiphop/hiphop.00032.wav 210 | hiphop/hiphop.00033.wav 211 | hiphop/hiphop.00034.wav 212 | hiphop/hiphop.00035.wav 213 | hiphop/hiphop.00036.wav 214 | hiphop/hiphop.00037.wav 215 | hiphop/hiphop.00038.wav 216 | hiphop/hiphop.00041.wav 217 | hiphop/hiphop.00042.wav 218 | hiphop/hiphop.00055.wav 219 | hiphop/hiphop.00056.wav 220 | hiphop/hiphop.00057.wav 221 | hiphop/hiphop.00058.wav 222 | hiphop/hiphop.00059.wav 223 | hiphop/hiphop.00060.wav 224 | hiphop/hiphop.00061.wav 225 | hiphop/hiphop.00077.wav 226 | hiphop/hiphop.00078.wav 227 | hiphop/hiphop.00079.wav 228 | hiphop/hiphop.00080.wav 229 | jazz/jazz.00000.wav 230 | jazz/jazz.00001.wav 231 | jazz/jazz.00011.wav 232 | jazz/jazz.00012.wav 233 | jazz/jazz.00013.wav 234 | jazz/jazz.00014.wav 235 | jazz/jazz.00015.wav 236 | jazz/jazz.00016.wav 237 | jazz/jazz.00017.wav 238 | jazz/jazz.00018.wav 239 | jazz/jazz.00019.wav 240 | jazz/jazz.00020.wav 241 | jazz/jazz.00021.wav 242 | jazz/jazz.00022.wav 243 | jazz/jazz.00023.wav 244 | jazz/jazz.00024.wav 245 | jazz/jazz.00041.wav 246 | jazz/jazz.00047.wav 247 | jazz/jazz.00048.wav 248 | jazz/jazz.00049.wav 249 | jazz/jazz.00050.wav 250 | jazz/jazz.00051.wav 251 | jazz/jazz.00052.wav 252 | jazz/jazz.00053.wav 253 | jazz/jazz.00054.wav 254 | jazz/jazz.00055.wav 255 | jazz/jazz.00056.wav 256 | jazz/jazz.00057.wav 257 | jazz/jazz.00058.wav 258 | jazz/jazz.00059.wav 259 | jazz/jazz.00060.wav 260 | jazz/jazz.00061.wav 261 | jazz/jazz.00062.wav 262 | jazz/jazz.00063.wav 263 | jazz/jazz.00064.wav 264 | jazz/jazz.00065.wav 265 | jazz/jazz.00066.wav 266 | jazz/jazz.00067.wav 267 | jazz/jazz.00068.wav 268 | jazz/jazz.00069.wav 269 | jazz/jazz.00070.wav 270 | jazz/jazz.00071.wav 271 | jazz/jazz.00072.wav 272 | metal/metal.00002.wav 273 | metal/metal.00003.wav 274 | metal/metal.00005.wav 275 | metal/metal.00021.wav 276 | metal/metal.00024.wav 277 | metal/metal.00035.wav 278 | metal/metal.00046.wav 279 | metal/metal.00047.wav 280 | metal/metal.00048.wav 281 | metal/metal.00049.wav 282 | metal/metal.00050.wav 283 | metal/metal.00051.wav 284 | metal/metal.00052.wav 285 | metal/metal.00053.wav 286 | metal/metal.00054.wav 287 | metal/metal.00055.wav 288 | metal/metal.00056.wav 289 | metal/metal.00057.wav 290 | metal/metal.00059.wav 291 | metal/metal.00060.wav 292 | metal/metal.00061.wav 293 | metal/metal.00062.wav 294 | metal/metal.00063.wav 295 | metal/metal.00064.wav 296 | metal/metal.00065.wav 297 | metal/metal.00066.wav 298 | metal/metal.00069.wav 299 | metal/metal.00071.wav 300 | metal/metal.00072.wav 301 | metal/metal.00079.wav 302 | metal/metal.00080.wav 303 | metal/metal.00084.wav 304 | metal/metal.00086.wav 305 | metal/metal.00089.wav 306 | metal/metal.00090.wav 307 | metal/metal.00091.wav 308 | metal/metal.00092.wav 309 | metal/metal.00093.wav 310 | metal/metal.00094.wav 311 | metal/metal.00095.wav 312 | metal/metal.00096.wav 313 | metal/metal.00097.wav 314 | metal/metal.00098.wav 315 | metal/metal.00099.wav 316 | pop/pop.00002.wav 317 | pop/pop.00003.wav 318 | pop/pop.00004.wav 319 | pop/pop.00005.wav 320 | pop/pop.00006.wav 321 | pop/pop.00007.wav 322 | pop/pop.00008.wav 323 | pop/pop.00009.wav 324 | pop/pop.00011.wav 325 | pop/pop.00012.wav 326 | pop/pop.00016.wav 327 | pop/pop.00017.wav 328 | pop/pop.00018.wav 329 | pop/pop.00019.wav 330 | pop/pop.00020.wav 331 | pop/pop.00023.wav 332 | pop/pop.00024.wav 333 | pop/pop.00025.wav 334 | pop/pop.00026.wav 335 | pop/pop.00027.wav 336 | pop/pop.00028.wav 337 | pop/pop.00029.wav 338 | 
pop/pop.00031.wav 339 | pop/pop.00032.wav 340 | pop/pop.00033.wav 341 | pop/pop.00034.wav 342 | pop/pop.00035.wav 343 | pop/pop.00036.wav 344 | pop/pop.00038.wav 345 | pop/pop.00039.wav 346 | pop/pop.00040.wav 347 | pop/pop.00041.wav 348 | pop/pop.00042.wav 349 | pop/pop.00044.wav 350 | pop/pop.00046.wav 351 | pop/pop.00049.wav 352 | pop/pop.00050.wav 353 | pop/pop.00080.wav 354 | pop/pop.00097.wav 355 | pop/pop.00098.wav 356 | pop/pop.00099.wav 357 | reggae/reggae.00000.wav 358 | reggae/reggae.00001.wav 359 | reggae/reggae.00002.wav 360 | reggae/reggae.00004.wav 361 | reggae/reggae.00006.wav 362 | reggae/reggae.00009.wav 363 | reggae/reggae.00011.wav 364 | reggae/reggae.00012.wav 365 | reggae/reggae.00014.wav 366 | reggae/reggae.00015.wav 367 | reggae/reggae.00016.wav 368 | reggae/reggae.00017.wav 369 | reggae/reggae.00018.wav 370 | reggae/reggae.00019.wav 371 | reggae/reggae.00020.wav 372 | reggae/reggae.00021.wav 373 | reggae/reggae.00022.wav 374 | reggae/reggae.00023.wav 375 | reggae/reggae.00024.wav 376 | reggae/reggae.00025.wav 377 | reggae/reggae.00026.wav 378 | reggae/reggae.00027.wav 379 | reggae/reggae.00028.wav 380 | reggae/reggae.00029.wav 381 | reggae/reggae.00030.wav 382 | reggae/reggae.00031.wav 383 | reggae/reggae.00032.wav 384 | reggae/reggae.00042.wav 385 | reggae/reggae.00043.wav 386 | reggae/reggae.00044.wav 387 | reggae/reggae.00045.wav 388 | reggae/reggae.00049.wav 389 | reggae/reggae.00050.wav 390 | reggae/reggae.00051.wav 391 | reggae/reggae.00054.wav 392 | reggae/reggae.00055.wav 393 | reggae/reggae.00056.wav 394 | reggae/reggae.00057.wav 395 | reggae/reggae.00058.wav 396 | reggae/reggae.00059.wav 397 | reggae/reggae.00060.wav 398 | reggae/reggae.00063.wav 399 | reggae/reggae.00069.wav 400 | rock/rock.00000.wav 401 | rock/rock.00001.wav 402 | rock/rock.00002.wav 403 | rock/rock.00003.wav 404 | rock/rock.00004.wav 405 | rock/rock.00005.wav 406 | rock/rock.00006.wav 407 | rock/rock.00007.wav 408 | rock/rock.00008.wav 409 | rock/rock.00009.wav 410 | rock/rock.00016.wav 411 | rock/rock.00017.wav 412 | rock/rock.00018.wav 413 | rock/rock.00019.wav 414 | rock/rock.00020.wav 415 | rock/rock.00021.wav 416 | rock/rock.00022.wav 417 | rock/rock.00023.wav 418 | rock/rock.00024.wav 419 | rock/rock.00025.wav 420 | rock/rock.00026.wav 421 | rock/rock.00057.wav 422 | rock/rock.00058.wav 423 | rock/rock.00059.wav 424 | rock/rock.00060.wav 425 | rock/rock.00061.wav 426 | rock/rock.00062.wav 427 | rock/rock.00063.wav 428 | rock/rock.00064.wav 429 | rock/rock.00065.wav 430 | rock/rock.00066.wav 431 | rock/rock.00067.wav 432 | rock/rock.00068.wav 433 | rock/rock.00069.wav 434 | rock/rock.00070.wav 435 | rock/rock.00091.wav 436 | rock/rock.00092.wav 437 | rock/rock.00093.wav 438 | rock/rock.00094.wav 439 | rock/rock.00095.wav 440 | rock/rock.00096.wav 441 | rock/rock.00097.wav 442 | rock/rock.00098.wav 443 | rock/rock.00099.wav 444 | -------------------------------------------------------------------------------- /gtzan/train_stratified.txt: -------------------------------------------------------------------------------- 1 | blues/blues.00000.wav 2 | blues/blues.00001.wav 3 | blues/blues.00002.wav 4 | blues/blues.00003.wav 5 | blues/blues.00006.wav 6 | blues/blues.00011.wav 7 | blues/blues.00014.wav 8 | blues/blues.00017.wav 9 | blues/blues.00018.wav 10 | blues/blues.00019.wav 11 | blues/blues.00021.wav 12 | blues/blues.00026.wav 13 | blues/blues.00027.wav 14 | blues/blues.00029.wav 15 | blues/blues.00030.wav 16 | blues/blues.00033.wav 17 | blues/blues.00034.wav 18 | 
blues/blues.00036.wav 19 | blues/blues.00037.wav 20 | blues/blues.00038.wav 21 | blues/blues.00039.wav 22 | blues/blues.00041.wav 23 | blues/blues.00044.wav 24 | blues/blues.00046.wav 25 | blues/blues.00047.wav 26 | blues/blues.00048.wav 27 | blues/blues.00050.wav 28 | blues/blues.00055.wav 29 | blues/blues.00056.wav 30 | blues/blues.00057.wav 31 | blues/blues.00060.wav 32 | blues/blues.00063.wav 33 | blues/blues.00065.wav 34 | blues/blues.00067.wav 35 | blues/blues.00069.wav 36 | blues/blues.00070.wav 37 | blues/blues.00072.wav 38 | blues/blues.00074.wav 39 | blues/blues.00075.wav 40 | blues/blues.00076.wav 41 | blues/blues.00079.wav 42 | blues/blues.00081.wav 43 | blues/blues.00085.wav 44 | blues/blues.00087.wav 45 | blues/blues.00088.wav 46 | blues/blues.00089.wav 47 | blues/blues.00091.wav 48 | blues/blues.00094.wav 49 | blues/blues.00097.wav 50 | blues/blues.00099.wav 51 | classical/classical.00000.wav 52 | classical/classical.00001.wav 53 | classical/classical.00002.wav 54 | classical/classical.00003.wav 55 | classical/classical.00004.wav 56 | classical/classical.00011.wav 57 | classical/classical.00012.wav 58 | classical/classical.00014.wav 59 | classical/classical.00017.wav 60 | classical/classical.00020.wav 61 | classical/classical.00023.wav 62 | classical/classical.00025.wav 63 | classical/classical.00026.wav 64 | classical/classical.00028.wav 65 | classical/classical.00031.wav 66 | classical/classical.00032.wav 67 | classical/classical.00038.wav 68 | classical/classical.00040.wav 69 | classical/classical.00044.wav 70 | classical/classical.00047.wav 71 | classical/classical.00048.wav 72 | classical/classical.00049.wav 73 | classical/classical.00050.wav 74 | classical/classical.00051.wav 75 | classical/classical.00057.wav 76 | classical/classical.00058.wav 77 | classical/classical.00059.wav 78 | classical/classical.00060.wav 79 | classical/classical.00062.wav 80 | classical/classical.00064.wav 81 | classical/classical.00065.wav 82 | classical/classical.00067.wav 83 | classical/classical.00070.wav 84 | classical/classical.00072.wav 85 | classical/classical.00073.wav 86 | classical/classical.00076.wav 87 | classical/classical.00078.wav 88 | classical/classical.00079.wav 89 | classical/classical.00080.wav 90 | classical/classical.00081.wav 91 | classical/classical.00082.wav 92 | classical/classical.00083.wav 93 | classical/classical.00084.wav 94 | classical/classical.00085.wav 95 | classical/classical.00091.wav 96 | classical/classical.00093.wav 97 | classical/classical.00095.wav 98 | classical/classical.00096.wav 99 | classical/classical.00097.wav 100 | classical/classical.00099.wav 101 | country/country.00003.wav 102 | country/country.00006.wav 103 | country/country.00007.wav 104 | country/country.00008.wav 105 | country/country.00010.wav 106 | country/country.00012.wav 107 | country/country.00013.wav 108 | country/country.00014.wav 109 | country/country.00016.wav 110 | country/country.00017.wav 111 | country/country.00018.wav 112 | country/country.00019.wav 113 | country/country.00028.wav 114 | country/country.00029.wav 115 | country/country.00032.wav 116 | country/country.00034.wav 117 | country/country.00035.wav 118 | country/country.00036.wav 119 | country/country.00037.wav 120 | country/country.00038.wav 121 | country/country.00039.wav 122 | country/country.00040.wav 123 | country/country.00044.wav 124 | country/country.00045.wav 125 | country/country.00048.wav 126 | country/country.00049.wav 127 | country/country.00054.wav 128 | country/country.00056.wav 129 | 
country/country.00059.wav 130 | country/country.00061.wav 131 | country/country.00062.wav 132 | country/country.00063.wav 133 | country/country.00066.wav 134 | country/country.00067.wav 135 | country/country.00068.wav 136 | country/country.00069.wav 137 | country/country.00070.wav 138 | country/country.00073.wav 139 | country/country.00074.wav 140 | country/country.00079.wav 141 | country/country.00080.wav 142 | country/country.00081.wav 143 | country/country.00085.wav 144 | country/country.00086.wav 145 | country/country.00089.wav 146 | country/country.00093.wav 147 | country/country.00094.wav 148 | country/country.00095.wav 149 | country/country.00096.wav 150 | country/country.00097.wav 151 | disco/disco.00006.wav 152 | disco/disco.00007.wav 153 | disco/disco.00009.wav 154 | disco/disco.00010.wav 155 | disco/disco.00011.wav 156 | disco/disco.00013.wav 157 | disco/disco.00016.wav 158 | disco/disco.00017.wav 159 | disco/disco.00018.wav 160 | disco/disco.00019.wav 161 | disco/disco.00020.wav 162 | disco/disco.00023.wav 163 | disco/disco.00024.wav 164 | disco/disco.00026.wav 165 | disco/disco.00027.wav 166 | disco/disco.00030.wav 167 | disco/disco.00031.wav 168 | disco/disco.00034.wav 169 | disco/disco.00035.wav 170 | disco/disco.00036.wav 171 | disco/disco.00037.wav 172 | disco/disco.00041.wav 173 | disco/disco.00042.wav 174 | disco/disco.00044.wav 175 | disco/disco.00045.wav 176 | disco/disco.00047.wav 177 | disco/disco.00048.wav 178 | disco/disco.00052.wav 179 | disco/disco.00054.wav 180 | disco/disco.00056.wav 181 | disco/disco.00057.wav 182 | disco/disco.00060.wav 183 | disco/disco.00063.wav 184 | disco/disco.00064.wav 185 | disco/disco.00065.wav 186 | disco/disco.00066.wav 187 | disco/disco.00067.wav 188 | disco/disco.00069.wav 189 | disco/disco.00075.wav 190 | disco/disco.00078.wav 191 | disco/disco.00081.wav 192 | disco/disco.00083.wav 193 | disco/disco.00084.wav 194 | disco/disco.00088.wav 195 | disco/disco.00089.wav 196 | disco/disco.00090.wav 197 | disco/disco.00091.wav 198 | disco/disco.00096.wav 199 | disco/disco.00097.wav 200 | disco/disco.00099.wav 201 | hiphop/hiphop.00000.wav 202 | hiphop/hiphop.00002.wav 203 | hiphop/hiphop.00006.wav 204 | hiphop/hiphop.00011.wav 205 | hiphop/hiphop.00012.wav 206 | hiphop/hiphop.00013.wav 207 | hiphop/hiphop.00018.wav 208 | hiphop/hiphop.00020.wav 209 | hiphop/hiphop.00021.wav 210 | hiphop/hiphop.00022.wav 211 | hiphop/hiphop.00026.wav 212 | hiphop/hiphop.00027.wav 213 | hiphop/hiphop.00030.wav 214 | hiphop/hiphop.00031.wav 215 | hiphop/hiphop.00032.wav 216 | hiphop/hiphop.00033.wav 217 | hiphop/hiphop.00035.wav 218 | hiphop/hiphop.00038.wav 219 | hiphop/hiphop.00040.wav 220 | hiphop/hiphop.00041.wav 221 | hiphop/hiphop.00044.wav 222 | hiphop/hiphop.00045.wav 223 | hiphop/hiphop.00046.wav 224 | hiphop/hiphop.00049.wav 225 | hiphop/hiphop.00050.wav 226 | hiphop/hiphop.00052.wav 227 | hiphop/hiphop.00053.wav 228 | hiphop/hiphop.00054.wav 229 | hiphop/hiphop.00056.wav 230 | hiphop/hiphop.00057.wav 231 | hiphop/hiphop.00058.wav 232 | hiphop/hiphop.00059.wav 233 | hiphop/hiphop.00060.wav 234 | hiphop/hiphop.00063.wav 235 | hiphop/hiphop.00064.wav 236 | hiphop/hiphop.00068.wav 237 | hiphop/hiphop.00072.wav 238 | hiphop/hiphop.00073.wav 239 | hiphop/hiphop.00074.wav 240 | hiphop/hiphop.00077.wav 241 | hiphop/hiphop.00078.wav 242 | hiphop/hiphop.00079.wav 243 | hiphop/hiphop.00084.wav 244 | hiphop/hiphop.00086.wav 245 | hiphop/hiphop.00088.wav 246 | hiphop/hiphop.00089.wav 247 | hiphop/hiphop.00090.wav 248 | hiphop/hiphop.00092.wav 249 | 
hiphop/hiphop.00098.wav 250 | hiphop/hiphop.00099.wav 251 | jazz/jazz.00002.wav 252 | jazz/jazz.00004.wav 253 | jazz/jazz.00005.wav 254 | jazz/jazz.00006.wav 255 | jazz/jazz.00007.wav 256 | jazz/jazz.00010.wav 257 | jazz/jazz.00012.wav 258 | jazz/jazz.00013.wav 259 | jazz/jazz.00014.wav 260 | jazz/jazz.00015.wav 261 | jazz/jazz.00023.wav 262 | jazz/jazz.00024.wav 263 | jazz/jazz.00026.wav 264 | jazz/jazz.00027.wav 265 | jazz/jazz.00028.wav 266 | jazz/jazz.00030.wav 267 | jazz/jazz.00032.wav 268 | jazz/jazz.00034.wav 269 | jazz/jazz.00038.wav 270 | jazz/jazz.00040.wav 271 | jazz/jazz.00041.wav 272 | jazz/jazz.00043.wav 273 | jazz/jazz.00050.wav 274 | jazz/jazz.00052.wav 275 | jazz/jazz.00054.wav 276 | jazz/jazz.00055.wav 277 | jazz/jazz.00057.wav 278 | jazz/jazz.00058.wav 279 | jazz/jazz.00059.wav 280 | jazz/jazz.00060.wav 281 | jazz/jazz.00061.wav 282 | jazz/jazz.00062.wav 283 | jazz/jazz.00064.wav 284 | jazz/jazz.00068.wav 285 | jazz/jazz.00070.wav 286 | jazz/jazz.00072.wav 287 | jazz/jazz.00074.wav 288 | jazz/jazz.00075.wav 289 | jazz/jazz.00077.wav 290 | jazz/jazz.00079.wav 291 | jazz/jazz.00080.wav 292 | jazz/jazz.00086.wav 293 | jazz/jazz.00089.wav 294 | jazz/jazz.00090.wav 295 | jazz/jazz.00091.wav 296 | jazz/jazz.00092.wav 297 | jazz/jazz.00093.wav 298 | jazz/jazz.00094.wav 299 | jazz/jazz.00095.wav 300 | jazz/jazz.00096.wav 301 | metal/metal.00000.wav 302 | metal/metal.00003.wav 303 | metal/metal.00004.wav 304 | metal/metal.00006.wav 305 | metal/metal.00010.wav 306 | metal/metal.00011.wav 307 | metal/metal.00013.wav 308 | metal/metal.00014.wav 309 | metal/metal.00016.wav 310 | metal/metal.00017.wav 311 | metal/metal.00019.wav 312 | metal/metal.00022.wav 313 | metal/metal.00023.wav 314 | metal/metal.00024.wav 315 | metal/metal.00025.wav 316 | metal/metal.00026.wav 317 | metal/metal.00027.wav 318 | metal/metal.00028.wav 319 | metal/metal.00036.wav 320 | metal/metal.00037.wav 321 | metal/metal.00038.wav 322 | metal/metal.00041.wav 323 | metal/metal.00045.wav 324 | metal/metal.00047.wav 325 | metal/metal.00049.wav 326 | metal/metal.00052.wav 327 | metal/metal.00056.wav 328 | metal/metal.00059.wav 329 | metal/metal.00060.wav 330 | metal/metal.00061.wav 331 | metal/metal.00065.wav 332 | metal/metal.00070.wav 333 | metal/metal.00071.wav 334 | metal/metal.00072.wav 335 | metal/metal.00074.wav 336 | metal/metal.00075.wav 337 | metal/metal.00076.wav 338 | metal/metal.00077.wav 339 | metal/metal.00079.wav 340 | metal/metal.00081.wav 341 | metal/metal.00082.wav 342 | metal/metal.00085.wav 343 | metal/metal.00086.wav 344 | metal/metal.00088.wav 345 | metal/metal.00089.wav 346 | metal/metal.00090.wav 347 | metal/metal.00091.wav 348 | metal/metal.00093.wav 349 | metal/metal.00097.wav 350 | metal/metal.00099.wav 351 | pop/pop.00002.wav 352 | pop/pop.00003.wav 353 | pop/pop.00004.wav 354 | pop/pop.00009.wav 355 | pop/pop.00011.wav 356 | pop/pop.00012.wav 357 | pop/pop.00013.wav 358 | pop/pop.00015.wav 359 | pop/pop.00017.wav 360 | pop/pop.00020.wav 361 | pop/pop.00022.wav 362 | pop/pop.00024.wav 363 | pop/pop.00026.wav 364 | pop/pop.00027.wav 365 | pop/pop.00032.wav 366 | pop/pop.00033.wav 367 | pop/pop.00035.wav 368 | pop/pop.00041.wav 369 | pop/pop.00042.wav 370 | pop/pop.00043.wav 371 | pop/pop.00045.wav 372 | pop/pop.00048.wav 373 | pop/pop.00050.wav 374 | pop/pop.00051.wav 375 | pop/pop.00053.wav 376 | pop/pop.00054.wav 377 | pop/pop.00055.wav 378 | pop/pop.00056.wav 379 | pop/pop.00061.wav 380 | pop/pop.00062.wav 381 | pop/pop.00063.wav 382 | pop/pop.00064.wav 383 | pop/pop.00065.wav 384 | 
pop/pop.00067.wav 385 | pop/pop.00071.wav 386 | pop/pop.00072.wav 387 | pop/pop.00074.wav 388 | pop/pop.00075.wav 389 | pop/pop.00076.wav 390 | pop/pop.00077.wav 391 | pop/pop.00079.wav 392 | pop/pop.00081.wav 393 | pop/pop.00082.wav 394 | pop/pop.00083.wav 395 | pop/pop.00086.wav 396 | pop/pop.00087.wav 397 | pop/pop.00092.wav 398 | pop/pop.00093.wav 399 | pop/pop.00095.wav 400 | pop/pop.00098.wav 401 | reggae/reggae.00004.wav 402 | reggae/reggae.00005.wav 403 | reggae/reggae.00009.wav 404 | reggae/reggae.00010.wav 405 | reggae/reggae.00011.wav 406 | reggae/reggae.00012.wav 407 | reggae/reggae.00013.wav 408 | reggae/reggae.00014.wav 409 | reggae/reggae.00016.wav 410 | reggae/reggae.00017.wav 411 | reggae/reggae.00018.wav 412 | reggae/reggae.00023.wav 413 | reggae/reggae.00027.wav 414 | reggae/reggae.00028.wav 415 | reggae/reggae.00030.wav 416 | reggae/reggae.00031.wav 417 | reggae/reggae.00036.wav 418 | reggae/reggae.00037.wav 419 | reggae/reggae.00040.wav 420 | reggae/reggae.00041.wav 421 | reggae/reggae.00042.wav 422 | reggae/reggae.00043.wav 423 | reggae/reggae.00044.wav 424 | reggae/reggae.00049.wav 425 | reggae/reggae.00051.wav 426 | reggae/reggae.00052.wav 427 | reggae/reggae.00053.wav 428 | reggae/reggae.00054.wav 429 | reggae/reggae.00055.wav 430 | reggae/reggae.00056.wav 431 | reggae/reggae.00059.wav 432 | reggae/reggae.00060.wav 433 | reggae/reggae.00062.wav 434 | reggae/reggae.00064.wav 435 | reggae/reggae.00065.wav 436 | reggae/reggae.00066.wav 437 | reggae/reggae.00071.wav 438 | reggae/reggae.00073.wav 439 | reggae/reggae.00075.wav 440 | reggae/reggae.00076.wav 441 | reggae/reggae.00077.wav 442 | reggae/reggae.00082.wav 443 | reggae/reggae.00084.wav 444 | reggae/reggae.00087.wav 445 | reggae/reggae.00088.wav 446 | reggae/reggae.00089.wav 447 | reggae/reggae.00091.wav 448 | reggae/reggae.00092.wav 449 | reggae/reggae.00095.wav 450 | reggae/reggae.00098.wav 451 | rock/rock.00003.wav 452 | rock/rock.00004.wav 453 | rock/rock.00005.wav 454 | rock/rock.00006.wav 455 | rock/rock.00008.wav 456 | rock/rock.00014.wav 457 | rock/rock.00015.wav 458 | rock/rock.00017.wav 459 | rock/rock.00023.wav 460 | rock/rock.00024.wav 461 | rock/rock.00025.wav 462 | rock/rock.00026.wav 463 | rock/rock.00027.wav 464 | rock/rock.00034.wav 465 | rock/rock.00035.wav 466 | rock/rock.00037.wav 467 | rock/rock.00040.wav 468 | rock/rock.00041.wav 469 | rock/rock.00044.wav 470 | rock/rock.00045.wav 471 | rock/rock.00048.wav 472 | rock/rock.00052.wav 473 | rock/rock.00054.wav 474 | rock/rock.00055.wav 475 | rock/rock.00058.wav 476 | rock/rock.00059.wav 477 | rock/rock.00062.wav 478 | rock/rock.00063.wav 479 | rock/rock.00065.wav 480 | rock/rock.00066.wav 481 | rock/rock.00067.wav 482 | rock/rock.00069.wav 483 | rock/rock.00071.wav 484 | rock/rock.00075.wav 485 | rock/rock.00077.wav 486 | rock/rock.00078.wav 487 | rock/rock.00080.wav 488 | rock/rock.00081.wav 489 | rock/rock.00083.wav 490 | rock/rock.00084.wav 491 | rock/rock.00085.wav 492 | rock/rock.00086.wav 493 | rock/rock.00088.wav 494 | rock/rock.00090.wav 495 | rock/rock.00091.wav 496 | rock/rock.00093.wav 497 | rock/rock.00094.wav 498 | rock/rock.00095.wav 499 | rock/rock.00097.wav 500 | rock/rock.00099.wav 501 | -------------------------------------------------------------------------------- /gtzan/valid_filtered.txt: -------------------------------------------------------------------------------- 1 | blues/blues.00000.wav 2 | blues/blues.00001.wav 3 | blues/blues.00002.wav 4 | blues/blues.00003.wav 5 | blues/blues.00004.wav 6 | blues/blues.00005.wav 7 
| blues/blues.00006.wav 8 | blues/blues.00007.wav 9 | blues/blues.00008.wav 10 | blues/blues.00009.wav 11 | blues/blues.00010.wav 12 | blues/blues.00011.wav 13 | blues/blues.00050.wav 14 | blues/blues.00051.wav 15 | blues/blues.00052.wav 16 | blues/blues.00053.wav 17 | blues/blues.00054.wav 18 | blues/blues.00055.wav 19 | blues/blues.00056.wav 20 | blues/blues.00057.wav 21 | blues/blues.00058.wav 22 | blues/blues.00059.wav 23 | blues/blues.00060.wav 24 | classical/classical.00000.wav 25 | classical/classical.00001.wav 26 | classical/classical.00002.wav 27 | classical/classical.00003.wav 28 | classical/classical.00004.wav 29 | classical/classical.00005.wav 30 | classical/classical.00006.wav 31 | classical/classical.00007.wav 32 | classical/classical.00008.wav 33 | classical/classical.00009.wav 34 | classical/classical.00010.wav 35 | classical/classical.00068.wav 36 | classical/classical.00069.wav 37 | classical/classical.00070.wav 38 | classical/classical.00071.wav 39 | classical/classical.00072.wav 40 | classical/classical.00073.wav 41 | classical/classical.00074.wav 42 | classical/classical.00075.wav 43 | classical/classical.00076.wav 44 | country/country.00000.wav 45 | country/country.00001.wav 46 | country/country.00002.wav 47 | country/country.00003.wav 48 | country/country.00004.wav 49 | country/country.00005.wav 50 | country/country.00006.wav 51 | country/country.00007.wav 52 | country/country.00009.wav 53 | country/country.00010.wav 54 | country/country.00011.wav 55 | country/country.00012.wav 56 | country/country.00013.wav 57 | country/country.00014.wav 58 | country/country.00015.wav 59 | country/country.00016.wav 60 | country/country.00017.wav 61 | country/country.00018.wav 62 | country/country.00027.wav 63 | country/country.00041.wav 64 | country/country.00042.wav 65 | country/country.00045.wav 66 | country/country.00049.wav 67 | disco/disco.00000.wav 68 | disco/disco.00002.wav 69 | disco/disco.00003.wav 70 | disco/disco.00004.wav 71 | disco/disco.00006.wav 72 | disco/disco.00007.wav 73 | disco/disco.00008.wav 74 | disco/disco.00009.wav 75 | disco/disco.00010.wav 76 | disco/disco.00011.wav 77 | disco/disco.00012.wav 78 | disco/disco.00013.wav 79 | disco/disco.00014.wav 80 | disco/disco.00046.wav 81 | disco/disco.00048.wav 82 | disco/disco.00052.wav 83 | disco/disco.00067.wav 84 | disco/disco.00068.wav 85 | disco/disco.00072.wav 86 | disco/disco.00075.wav 87 | disco/disco.00090.wav 88 | disco/disco.00095.wav 89 | hiphop/hiphop.00081.wav 90 | hiphop/hiphop.00082.wav 91 | hiphop/hiphop.00083.wav 92 | hiphop/hiphop.00084.wav 93 | hiphop/hiphop.00085.wav 94 | hiphop/hiphop.00086.wav 95 | hiphop/hiphop.00087.wav 96 | hiphop/hiphop.00088.wav 97 | hiphop/hiphop.00089.wav 98 | hiphop/hiphop.00090.wav 99 | hiphop/hiphop.00091.wav 100 | hiphop/hiphop.00092.wav 101 | hiphop/hiphop.00093.wav 102 | hiphop/hiphop.00094.wav 103 | hiphop/hiphop.00095.wav 104 | hiphop/hiphop.00096.wav 105 | hiphop/hiphop.00097.wav 106 | hiphop/hiphop.00098.wav 107 | jazz/jazz.00002.wav 108 | jazz/jazz.00003.wav 109 | jazz/jazz.00004.wav 110 | jazz/jazz.00005.wav 111 | jazz/jazz.00006.wav 112 | jazz/jazz.00007.wav 113 | jazz/jazz.00008.wav 114 | jazz/jazz.00009.wav 115 | jazz/jazz.00010.wav 116 | jazz/jazz.00025.wav 117 | jazz/jazz.00026.wav 118 | jazz/jazz.00027.wav 119 | jazz/jazz.00028.wav 120 | jazz/jazz.00029.wav 121 | jazz/jazz.00030.wav 122 | jazz/jazz.00031.wav 123 | jazz/jazz.00032.wav 124 | metal/metal.00000.wav 125 | metal/metal.00001.wav 126 | metal/metal.00006.wav 127 | metal/metal.00007.wav 128 | 
metal/metal.00008.wav 129 | metal/metal.00009.wav 130 | metal/metal.00010.wav 131 | metal/metal.00011.wav 132 | metal/metal.00016.wav 133 | metal/metal.00017.wav 134 | metal/metal.00018.wav 135 | metal/metal.00019.wav 136 | metal/metal.00020.wav 137 | metal/metal.00036.wav 138 | metal/metal.00037.wav 139 | metal/metal.00068.wav 140 | metal/metal.00076.wav 141 | metal/metal.00077.wav 142 | metal/metal.00081.wav 143 | metal/metal.00082.wav 144 | pop/pop.00010.wav 145 | pop/pop.00053.wav 146 | pop/pop.00055.wav 147 | pop/pop.00058.wav 148 | pop/pop.00059.wav 149 | pop/pop.00060.wav 150 | pop/pop.00061.wav 151 | pop/pop.00062.wav 152 | pop/pop.00081.wav 153 | pop/pop.00083.wav 154 | pop/pop.00084.wav 155 | pop/pop.00085.wav 156 | pop/pop.00086.wav 157 | reggae/reggae.00061.wav 158 | reggae/reggae.00062.wav 159 | reggae/reggae.00070.wav 160 | reggae/reggae.00072.wav 161 | reggae/reggae.00074.wav 162 | reggae/reggae.00076.wav 163 | reggae/reggae.00077.wav 164 | reggae/reggae.00078.wav 165 | reggae/reggae.00085.wav 166 | reggae/reggae.00092.wav 167 | reggae/reggae.00093.wav 168 | reggae/reggae.00094.wav 169 | reggae/reggae.00095.wav 170 | reggae/reggae.00096.wav 171 | reggae/reggae.00097.wav 172 | reggae/reggae.00098.wav 173 | reggae/reggae.00099.wav 174 | rock/rock.00038.wav 175 | rock/rock.00049.wav 176 | rock/rock.00050.wav 177 | rock/rock.00051.wav 178 | rock/rock.00052.wav 179 | rock/rock.00053.wav 180 | rock/rock.00054.wav 181 | rock/rock.00055.wav 182 | rock/rock.00056.wav 183 | rock/rock.00071.wav 184 | rock/rock.00072.wav 185 | rock/rock.00073.wav 186 | rock/rock.00074.wav 187 | rock/rock.00075.wav 188 | rock/rock.00076.wav 189 | rock/rock.00077.wav 190 | rock/rock.00078.wav 191 | rock/rock.00079.wav 192 | rock/rock.00080.wav 193 | rock/rock.00081.wav 194 | rock/rock.00082.wav 195 | rock/rock.00083.wav 196 | rock/rock.00084.wav 197 | rock/rock.00085.wav 198 | -------------------------------------------------------------------------------- /gtzan/valid_stratified.txt: -------------------------------------------------------------------------------- 1 | blues/blues.00004.wav 2 | blues/blues.00007.wav 3 | blues/blues.00008.wav 4 | blues/blues.00009.wav 5 | blues/blues.00016.wav 6 | blues/blues.00022.wav 7 | blues/blues.00032.wav 8 | blues/blues.00035.wav 9 | blues/blues.00042.wav 10 | blues/blues.00045.wav 11 | blues/blues.00051.wav 12 | blues/blues.00054.wav 13 | blues/blues.00058.wav 14 | blues/blues.00061.wav 15 | blues/blues.00066.wav 16 | blues/blues.00068.wav 17 | blues/blues.00071.wav 18 | blues/blues.00073.wav 19 | blues/blues.00078.wav 20 | blues/blues.00080.wav 21 | blues/blues.00082.wav 22 | blues/blues.00086.wav 23 | blues/blues.00090.wav 24 | blues/blues.00093.wav 25 | blues/blues.00096.wav 26 | classical/classical.00005.wav 27 | classical/classical.00006.wav 28 | classical/classical.00007.wav 29 | classical/classical.00008.wav 30 | classical/classical.00010.wav 31 | classical/classical.00013.wav 32 | classical/classical.00015.wav 33 | classical/classical.00018.wav 34 | classical/classical.00022.wav 35 | classical/classical.00027.wav 36 | classical/classical.00030.wav 37 | classical/classical.00033.wav 38 | classical/classical.00039.wav 39 | classical/classical.00046.wav 40 | classical/classical.00054.wav 41 | classical/classical.00063.wav 42 | classical/classical.00068.wav 43 | classical/classical.00071.wav 44 | classical/classical.00074.wav 45 | classical/classical.00077.wav 46 | classical/classical.00086.wav 47 | classical/classical.00089.wav 48 | 
classical/classical.00090.wav 49 | classical/classical.00094.wav 50 | classical/classical.00098.wav 51 | country/country.00000.wav 52 | country/country.00001.wav 53 | country/country.00004.wav 54 | country/country.00005.wav 55 | country/country.00009.wav 56 | country/country.00015.wav 57 | country/country.00020.wav 58 | country/country.00025.wav 59 | country/country.00026.wav 60 | country/country.00027.wav 61 | country/country.00030.wav 62 | country/country.00042.wav 63 | country/country.00043.wav 64 | country/country.00046.wav 65 | country/country.00055.wav 66 | country/country.00058.wav 67 | country/country.00060.wav 68 | country/country.00071.wav 69 | country/country.00083.wav 70 | country/country.00084.wav 71 | country/country.00087.wav 72 | country/country.00088.wav 73 | country/country.00090.wav 74 | country/country.00092.wav 75 | country/country.00098.wav 76 | disco/disco.00003.wav 77 | disco/disco.00015.wav 78 | disco/disco.00022.wav 79 | disco/disco.00025.wav 80 | disco/disco.00028.wav 81 | disco/disco.00029.wav 82 | disco/disco.00032.wav 83 | disco/disco.00039.wav 84 | disco/disco.00040.wav 85 | disco/disco.00043.wav 86 | disco/disco.00046.wav 87 | disco/disco.00050.wav 88 | disco/disco.00051.wav 89 | disco/disco.00053.wav 90 | disco/disco.00059.wav 91 | disco/disco.00062.wav 92 | disco/disco.00070.wav 93 | disco/disco.00072.wav 94 | disco/disco.00073.wav 95 | disco/disco.00074.wav 96 | disco/disco.00085.wav 97 | disco/disco.00087.wav 98 | disco/disco.00092.wav 99 | disco/disco.00093.wav 100 | disco/disco.00095.wav 101 | hiphop/hiphop.00003.wav 102 | hiphop/hiphop.00005.wav 103 | hiphop/hiphop.00007.wav 104 | hiphop/hiphop.00010.wav 105 | hiphop/hiphop.00016.wav 106 | hiphop/hiphop.00019.wav 107 | hiphop/hiphop.00036.wav 108 | hiphop/hiphop.00037.wav 109 | hiphop/hiphop.00039.wav 110 | hiphop/hiphop.00043.wav 111 | hiphop/hiphop.00047.wav 112 | hiphop/hiphop.00048.wav 113 | hiphop/hiphop.00051.wav 114 | hiphop/hiphop.00055.wav 115 | hiphop/hiphop.00066.wav 116 | hiphop/hiphop.00067.wav 117 | hiphop/hiphop.00069.wav 118 | hiphop/hiphop.00071.wav 119 | hiphop/hiphop.00076.wav 120 | hiphop/hiphop.00080.wav 121 | hiphop/hiphop.00081.wav 122 | hiphop/hiphop.00082.wav 123 | hiphop/hiphop.00083.wav 124 | hiphop/hiphop.00093.wav 125 | hiphop/hiphop.00097.wav 126 | jazz/jazz.00000.wav 127 | jazz/jazz.00001.wav 128 | jazz/jazz.00008.wav 129 | jazz/jazz.00011.wav 130 | jazz/jazz.00017.wav 131 | jazz/jazz.00019.wav 132 | jazz/jazz.00021.wav 133 | jazz/jazz.00022.wav 134 | jazz/jazz.00025.wav 135 | jazz/jazz.00029.wav 136 | jazz/jazz.00036.wav 137 | jazz/jazz.00042.wav 138 | jazz/jazz.00044.wav 139 | jazz/jazz.00046.wav 140 | jazz/jazz.00047.wav 141 | jazz/jazz.00049.wav 142 | jazz/jazz.00051.wav 143 | jazz/jazz.00056.wav 144 | jazz/jazz.00063.wav 145 | jazz/jazz.00065.wav 146 | jazz/jazz.00081.wav 147 | jazz/jazz.00082.wav 148 | jazz/jazz.00083.wav 149 | jazz/jazz.00097.wav 150 | jazz/jazz.00098.wav 151 | metal/metal.00007.wav 152 | metal/metal.00008.wav 153 | metal/metal.00009.wav 154 | metal/metal.00012.wav 155 | metal/metal.00015.wav 156 | metal/metal.00029.wav 157 | metal/metal.00031.wav 158 | metal/metal.00032.wav 159 | metal/metal.00033.wav 160 | metal/metal.00034.wav 161 | metal/metal.00039.wav 162 | metal/metal.00043.wav 163 | metal/metal.00044.wav 164 | metal/metal.00053.wav 165 | metal/metal.00055.wav 166 | metal/metal.00063.wav 167 | metal/metal.00064.wav 168 | metal/metal.00067.wav 169 | metal/metal.00068.wav 170 | metal/metal.00073.wav 171 | metal/metal.00083.wav 172 | 
metal/metal.00087.wav 173 | metal/metal.00094.wav 174 | metal/metal.00095.wav 175 | metal/metal.00096.wav 176 | pop/pop.00001.wav 177 | pop/pop.00007.wav 178 | pop/pop.00010.wav 179 | pop/pop.00014.wav 180 | pop/pop.00016.wav 181 | pop/pop.00018.wav 182 | pop/pop.00019.wav 183 | pop/pop.00023.wav 184 | pop/pop.00025.wav 185 | pop/pop.00028.wav 186 | pop/pop.00029.wav 187 | pop/pop.00037.wav 188 | pop/pop.00047.wav 189 | pop/pop.00057.wav 190 | pop/pop.00058.wav 191 | pop/pop.00059.wav 192 | pop/pop.00060.wav 193 | pop/pop.00073.wav 194 | pop/pop.00078.wav 195 | pop/pop.00080.wav 196 | pop/pop.00085.wav 197 | pop/pop.00089.wav 198 | pop/pop.00090.wav 199 | pop/pop.00094.wav 200 | pop/pop.00099.wav 201 | reggae/reggae.00000.wav 202 | reggae/reggae.00001.wav 203 | reggae/reggae.00007.wav 204 | reggae/reggae.00008.wav 205 | reggae/reggae.00019.wav 206 | reggae/reggae.00024.wav 207 | reggae/reggae.00025.wav 208 | reggae/reggae.00026.wav 209 | reggae/reggae.00029.wav 210 | reggae/reggae.00032.wav 211 | reggae/reggae.00034.wav 212 | reggae/reggae.00038.wav 213 | reggae/reggae.00039.wav 214 | reggae/reggae.00045.wav 215 | reggae/reggae.00047.wav 216 | reggae/reggae.00061.wav 217 | reggae/reggae.00063.wav 218 | reggae/reggae.00067.wav 219 | reggae/reggae.00070.wav 220 | reggae/reggae.00072.wav 221 | reggae/reggae.00080.wav 222 | reggae/reggae.00085.wav 223 | reggae/reggae.00086.wav 224 | reggae/reggae.00090.wav 225 | reggae/reggae.00093.wav 226 | rock/rock.00007.wav 227 | rock/rock.00012.wav 228 | rock/rock.00013.wav 229 | rock/rock.00016.wav 230 | rock/rock.00018.wav 231 | rock/rock.00019.wav 232 | rock/rock.00021.wav 233 | rock/rock.00022.wav 234 | rock/rock.00028.wav 235 | rock/rock.00029.wav 236 | rock/rock.00030.wav 237 | rock/rock.00036.wav 238 | rock/rock.00038.wav 239 | rock/rock.00043.wav 240 | rock/rock.00046.wav 241 | rock/rock.00049.wav 242 | rock/rock.00053.wav 243 | rock/rock.00056.wav 244 | rock/rock.00061.wav 245 | rock/rock.00068.wav 246 | rock/rock.00076.wav 247 | rock/rock.00087.wav 248 | rock/rock.00089.wav 249 | rock/rock.00092.wav 250 | rock/rock.00096.wav 251 | -------------------------------------------------------------------------------- /hpc_scripts/README.md: -------------------------------------------------------------------------------- 1 | ##Setup Tips! 2 | If you want to run this code on a high performance cluster (HPC), or on a computer where you don't have root access, the following setup instructions may be helpful. They may also be helpful in other situations. 3 | 4 | ## Instructions 5 | 6 | ### HPC Module loading 7 | On the HPC I use there are various modules that must be loaded on demand (e.g., python and cuda). This is done by placing a file called .gbarrc in your home directory with the following lines: 8 | ``` 9 | MODULES=python/2.7.3,cuda/6.5 10 | ``` 11 | This will probably differ from case to case depending on how your HPC is set up. 12 | 13 | ### Python virtual environment 14 | I first set up a Python virtual environment (which provides a clean copy of Python without any packages) as follows: 15 | ``` 16 | mkdir venv 17 | virtualenv venv 18 | ``` 19 | 20 | After this you must call: 21 | ``` 22 | source venv/bin/activate 23 | ``` 24 | every time you log in to activate the virtual environment. I added the above line to the end of my .bashrc file so it happens automagically. 25 | 26 | ### Installation of libraries 27 | Libraries typically get installed to /usr/local/lib, but often we don't have write access to that location. 
However, we can download and install libraries locally. First make a directory to hold the libraries (starting from your home directory): 28 | ``` 29 | mkdir .local 30 | mkdir .local/lib 31 | mkdir .local/include 32 | ``` 33 | 34 | Add this location to your LD_LIBRARY_PATH too: 35 | ``` 36 | export LD_LIBRARY_PATH=~/.local/lib:$LD_LIBRARY_PATH 37 | ``` 38 | (that line can be added to .bashrc so that it is executed in every new shell). 39 | 40 | Now download and install the libraries. The following "pseudocode" demonstrates the basic process (you will need to track down the correct URLs for your platform): 41 | ``` 42 | url_list = [ 43 | "http://www.hdfgroup.org/HDF5/release/obtain5.html", # find your libhdf5 here 44 | "http://sourceforge.net/projects/mad/files/libmad", # find libmad here (only needed if you plan to read mp3 files) 45 | "http://www.mega-nerd.com/SRC/download.html", # libsamplerate 46 | "http://www.mega-nerd.com/libsndfile/#Download", #libsndfile 47 | "http://pyyaml.org/download/libyaml" #libyaml 48 | ] 49 | 50 | for url in url_list: 51 | wget url 52 | tar xvfz package_name.tar.gz 53 | cd package_name 54 | ./configure --prefix=$HOME/.local 55 | make 56 | make install 57 | ``` 58 | The --prefix flag tells the Makefile to install the library locally (and therefore it is crucial that you include it). 59 | 60 | Note on the HDF5 library: This library may already exist on your system; however, it may be a good idea to install a local copy anyway, because some older versions do not support multiple read access (which is required), leading to an error (if you see 'FILE_OPEN_POLICY=strict' printed out, then you have probably just encountered this error). Also, to make sure PyTables links with the correct HDF5 library, add the following to your .bashrc file: 61 | ``` 62 | export HDF5_DIR=$HOME/.local 63 | ``` 64 | 65 | Note on libmad: In the Makefile for libmad you may need to remove the "-fforce-mem" flag, which gcc no longer supports. 66 | 67 | ### Installation of Python packages 68 | After the libraries have been installed, one can start installing the necessary Python packages. First you should create a file called .numpy-site.cfg in your home directory with the following lines: 69 | ``` 70 | [sndfile] 71 | library_dirs = $HOME/.local/lib 72 | include_dirs = $HOME/.local/include 73 | [hdf5] 74 | library_dirs = $HOME/.local/lib 75 | include_dirs = $HOME/.local/include 76 | [samplerate] 77 | library_dirs = $HOME/.local/lib 78 | include_dirs = $HOME/.local/include 79 | ``` 80 | This tells NumPy's distutils where to find your locally installed libraries. You will probably have to replace $HOME with the actual path to your home directory (e.g., "/home/a/user") since it does not seem to get properly exported as an environment variable by NumPy. 81 | 82 | Now we can install the Python packages (using pip, from GitHub sources, etc.). 
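Before doing so, it is worth a quick sanity check that python and pip resolve to the virtual environment created earlier, so that packages are built against the locally installed libraries and installed without root access (a minimal check, assuming the venv from above):
```
which python pip   # both should point into ~/venv/bin
```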
The packages I have installed are: 83 | ``` 84 | numpy, 85 | scipy, 86 | theano, 87 | pylearn2, 88 | pytables, 89 | numexpr, 90 | cython, 91 | pyyaml, 92 | ipython, 93 | sklearn, 94 | matplotlib, 95 | scikits.audiolab, 96 | scikits.samplerate, 97 | pymad (if you need to read mp3s) 98 | ``` 99 | 100 | You might first try installing these using the "requirements.txt" file in this folder: 101 | ``` 102 | pip install -r path/to/requirements.txt 103 | ``` 104 | If that doesn't work, you can try installing these with "pip install package_name", and finally, if that fails, download the source code for each module and run "python setup.py build" followed by "python setup.py install". 105 | 106 | Note: Theano and Pylearn2 should be installed from the GitHub repositories in order to get up-to-date versions (the pip packages seem to be old). 107 | 108 | ### Theano setup 109 | If you want Theano to use your GPU (and you probably do if you have one), create a file called .theanorc in your home directory with the following lines: 110 | ``` 111 | [global] 112 | floatX = float32 113 | device = gpu0 114 | 115 | [nvcc] 116 | fastmath = True 117 | ``` -------------------------------------------------------------------------------- /hpc_scripts/generate_gbar_jobs.py: -------------------------------------------------------------------------------- 1 | jobscript = ''' 2 | #!/bin/sh 3 | # embedded options to qsub - start with #PBS 4 | # -- Name of the job --- 5 | #PBS -N {jobname} 6 | # -- specify queue -- 7 | #PBS -q hpc 8 | # -- estimated wall clock time (execution time): hh:mm:ss -- 9 | #PBS -l walltime=24:00:00 10 | # --- number of processors/cores/nodes -- 11 | #PBS -l nodes=1:ppn=1:gpus=1 12 | # -- user email address -- 13 | #PBS -M cmke@dtu.dk 14 | # -- mail notification -- 15 | #PBS -m abe 16 | # -- run in the current working (submission) directory -- 17 | if test X$PBS_ENVIRONMENT = XPBS_BATCH; then cd $PBS_O_WORKDIR; fi 18 | # here follow the commands you want to execute 19 | # Load modules needed by myapplication.x 20 | module load python/2.7.3 cuda/6.5 21 | 22 | # Run my program 23 | export LD_LIBRARY_PATH=~/.local/lib:$LD_LIBRARY_PATH 24 | source ~/venv/bin/activate 25 | cd /SCRATCH/cmke/dnn-mgr 26 | python train_mlp_script.py {fold_config} {dropout} --nunits {nunits} --output {savename} 27 | '''.format 28 | 29 | fold_config = ['GTZAN_1024-fold-1_of_4.pkl']*4 + ['GTZAN_1024-filtered-fold.pkl']*4 30 | dropout = ['', '', '--dropout', '--dropout']*4 31 | nunits = [50, 500]*8 32 | for f, d, n in zip(fold_config, dropout, nunits): 33 | 34 | savename='' 35 | if f==fold_config[0]: 36 | savename += 'S_' 37 | else: 38 | savename += 'F_' 39 | 40 | if n==50: 41 | savename += '50_' 42 | else: 43 | savename += '500_' 44 | 45 | if d=='': 46 | savename += 'RS' 47 | else: 48 | savename += 'RSD' 49 | 50 | with open(savename+'.sh', 'w') as fname: 51 | fname.write(jobscript(jobname=savename, fold_config=f, dropout=d, nunits=n, savename='./saved/'+savename+'.pkl')) 52 | -------------------------------------------------------------------------------- /hpc_scripts/generate_gbar_jobs_dnn.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | jobscript = ''' 4 | #!/bin/sh 5 | # embedded options to qsub - start with #PBS 6 | # -- Name of the job --- 7 | #PBS -N {jobname} 8 | # -- specify queue -- 9 | #PBS -q hpc 10 | # -- estimated wall clock time (execution time): hh:mm:ss -- 11 | #PBS -l walltime=24:00:00 12 | # --- number of processors/cores/nodes -- 13 | #PBS -l 
nodes=1:ppn=1:gpus=1 14 | # -- user email address -- 15 | #PBS -M cmke@dtu.dk 16 | # -- mail notification -- 17 | #PBS -m abe 18 | # -- run in the current working (submission) directory -- 19 | if test X$PBS_ENVIRONMENT = XPBS_BATCH; then cd $PBS_O_WORKDIR; fi 20 | # here follow the commands you want to execute 21 | # Load modules needed by myapplication.x 22 | module load python/2.7.3 cuda/6.5 23 | 24 | # Run my program 25 | export LD_LIBRARY_PATH=~/.local/lib:$LD_LIBRARY_PATH 26 | source ~/venv/bin/activate 27 | cd /dtu-compute/cosound/data/_tzanetakis_genre/ 28 | python /SCRATCH/cmke/dnn-mgr/train_mlp_script.py {fold_config} {yaml_file} --nunits {nunits} --output {savename} 29 | '''.format 30 | 31 | fold_config = ['GTZAN_stratified.pkl']*4 + ['GTZAN_filtered.pkl']*4 32 | yaml_file = ['mlp_rlu2.yaml', 'mlp_rlu2.yaml', 'mlp_rlu_dropout2.yaml', 'mlp_rlu_dropout2.yaml']*4 33 | nunits = [50, 500]*8 34 | for f, d, n in zip(fold_config, yaml_file, nunits): 35 | 36 | savename='' 37 | if f==fold_config[0]: 38 | savename += 'S_' 39 | else: 40 | savename += 'F_' 41 | 42 | if n==50: 43 | savename += '50_' 44 | else: 45 | savename += '500_' 46 | 47 | if d=='mlp_rlu2.yaml': 48 | savename += 'RS' 49 | else: 50 | savename += 'RSD' 51 | 52 | with open(savename+'.sh', 'w') as fname: 53 | fname.write(jobscript(jobname=savename, fold_config=f, yaml_file=os.path.join('/SCRATCH/cmke/dnn-mgr/',d), nunits=n, savename='/SCRATCH/cmke/saved_models/dnn/'+savename+'.pkl')) 54 | -------------------------------------------------------------------------------- /hpc_scripts/generate_gbar_jobs_rf.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | jobscript = ''' 4 | #!/bin/sh 5 | # embedded options to qsub - start with #PBS 6 | # -- Name of the job --- 7 | #PBS -N {jobname} 8 | # -- specify queue -- 9 | #PBS -q hpc 10 | # -- estimated wall clock time (execution time): hh:mm:ss -- 11 | #PBS -l walltime=12:00:00 12 | # --- number of processors/cores/nodes -- 13 | #PBS -l nodes=1:ppn=4:gpus=1 14 | # -- user email address -- 15 | #PBS -M cmke@dtu.dk 16 | # -- mail notification -- 17 | #PBS -m abe 18 | # -- run in the current working (submission) directory -- 19 | if test X$PBS_ENVIRONMENT = XPBS_BATCH; then cd $PBS_O_WORKDIR; fi 20 | # here follow the commands you want to execute 21 | # Load modules needed by myapplication.x 22 | module load python/2.7.3 cuda/6.5 23 | 24 | # Run my program 25 | export LD_LIBRARY_PATH=~/.local/lib:$LD_LIBRARY_PATH 26 | source ~/venv/bin/activate 27 | cd /SCRATCH/cmke/dnn-mgr 28 | python train_classifier_on_dnn_feats.py {model_file} {dataset_dir} --which_layers {which_layers} --save_folder {save_folder} {aggregate_features} 29 | '''.format 30 | 31 | model_files = ['S_50_RS.pkl', 'S_50_RSD.pkl', 'S_500_RS.pkl', 'S_500_RSD.pkl', 'F_50_RS.pkl', 'F_50_RSD.pkl', 'F_500_RS.pkl', 'F_500_RSD.pkl'] 32 | dataset_dir = '/dtu-compute/cosound/data/_tzanetakis_genre/audio' 33 | which_layers = ['0', '1', '2', '0 1 2'] 34 | aggregate_features = ['--aggregate_features', ''] 35 | 36 | job_list = [] 37 | for agg in aggregate_features: 38 | for model in model_files: 39 | for l in which_layers: 40 | 41 | savename = model.split('.pkl')[0] 42 | if agg=='': 43 | savename += '_FF_' # frame-level features 44 | else: 45 | savename += '_AF_' # aggregate features 46 | 47 | if l=='0 1 2': 48 | savename += 'LAll' 49 | else: 50 | savename += 'L' + l 51 | 52 | jobname = savename+'.sh' 53 | job_list.append( jobname ) 54 | with open(jobname, 'w') as fname: 55 | 
fname.write( jobscript( jobname=savename, 56 | model_file=os.path.join('./saved', model), 57 | dataset_dir=dataset_dir, 58 | which_layers=l, 59 | save_folder=os.path.join('./saved', savename), 60 | aggregate_features=agg) ) 61 | 62 | with open('_master_RF_trainer.sh', 'w') as f: 63 | for j in job_list: 64 | f.write('qsub %s\n' % j) 65 | 66 | 67 | -------------------------------------------------------------------------------- /hpc_scripts/generate_gbar_jobs_rf2.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | jobscript = ''' 4 | #!/bin/sh 5 | # embedded options to qsub - start with #PBS 6 | # -- Name of the job --- 7 | #PBS -N {jobname} 8 | # -- specify queue -- 9 | #PBS -q hpc 10 | # -- estimated wall clock time (execution time): hh:mm:ss -- 11 | #PBS -l walltime=12:00:00 12 | # --- number of processors/cores/nodes -- 13 | #PBS -l nodes=1:ppn=4:gpus=1 14 | # -- user email address -- 15 | #PBS -M cmke@dtu.dk 16 | # -- mail notification -- 17 | #PBS -m abe 18 | # -- run in the current working (submission) directory -- 19 | if test X$PBS_ENVIRONMENT = XPBS_BATCH; then cd $PBS_O_WORKDIR; fi 20 | # here follow the commands you want to execute 21 | # Load modules needed by myapplication.x 22 | module load python/2.7.3 cuda/6.5 23 | 24 | # Run my program 25 | export LD_LIBRARY_PATH=~/.local/lib:$LD_LIBRARY_PATH 26 | source ~/venv/bin/activate 27 | 28 | cd /dtu-compute/cosound/data/_tzanetakis_genre 29 | python /SCRATCH/cmke/dnn-mgr/train_classifier_on_dnn_feats2.py {model_file} --which_layers {which_layers} --save_file {save_file} {aggregate_features} 30 | '''.format 31 | 32 | model_files = ['S_50_RS.pkl', 'S_50_RSD.pkl', 'S_500_RS.pkl', 'S_500_RSD.pkl', 'F_50_RS.pkl', 'F_50_RSD.pkl', 'F_500_RS.pkl', 'F_500_RSD.pkl'] 33 | dataset_dir = '/dtu-compute/cosound/data/_tzanetakis_genre/audio' 34 | which_layers = ['1', '2', '3', '1 2 3'] 35 | aggregate_features = ['--aggregate_features', ''] 36 | 37 | job_list = [] 38 | for agg in aggregate_features: 39 | for model in model_files: 40 | for l in which_layers: 41 | 42 | savename = model.split('.pkl')[0] 43 | if agg=='': 44 | savename += '_FF_' # frame-level features 45 | else: 46 | savename += '_AF_' # aggregate features 47 | 48 | if l=='1 2 3': 49 | savename += 'LAll' 50 | else: 51 | savename += 'L' + l 52 | 53 | jobname = savename+'.sh' 54 | job_list.append( jobname ) 55 | with open(jobname, 'w') as fname: 56 | fname.write( jobscript( jobname=savename, 57 | model_file=os.path.join('/SCRATCH/cmke/saved_models/dnn', model), 58 | which_layers=l, 59 | save_file=os.path.join('/SCRATCH/cmke/saved_models/rf', savename), 60 | aggregate_features=agg) ) 61 | 62 | with open('_master_RF_trainer.sh', 'w') as f: 63 | for j in job_list: 64 | f.write('qsub %s\n' % j) 65 | 66 | 67 | -------------------------------------------------------------------------------- /hpc_scripts/generate_jobs.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | #python prepare_dataset.py \ 4 | # ../GTZAN \ 5 | # ./label_list_GTZAN.txt \ 6 | # --hdf5 ./GTZAN.hdf5 \ 7 | # --nfft 4096 --nhop 2048 \ 8 | # --train ./gtzan/train_stratified2.txt \ 9 | # --valid ./gtzan/valid_stratified2.txt \ 10 | # --test ./gtzan/test_stratified2.txt \ 11 | # --partition_name ./GTZANstrat_partition_configuration.pkl 12 | 13 | python train_mlp_script.py \ 14 | ./GTZANstrat_partition_configuration.pkl \ 15 | ./yaml_scripts/mlp_rlu2.yaml \ 16 | --nunits 50 \ 17 | --output GTZAN_strat2049_model.pkl 18 | 
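# NB: the commented-out prepare_dataset.py block above has to be run once (uncomment it
# on the first run) before the training step, since train_mlp_script.py expects the
# GTZANstrat_partition_configuration.pkl file that it produces.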
-------------------------------------------------------------------------------- /hpc_scripts/gpu_to_cpu_pkl.py: -------------------------------------------------------------------------------- 1 | import os, sys, glob, re 2 | import numpy as np 3 | from pylearn2.utils import serial 4 | import pylearn2.config.yaml_parse as yaml_parse 5 | from pylearn2.models.rbm import RBM 6 | import copy 7 | 8 | ''' 9 | When training is done on a GPU the model files won't unpickle properly. 10 | This script attempts to fix that. It must be run on a GPU with the environment 11 | variable THEANO_FLAGS exported beforehand. If pretraining is done then 12 | we may have to delete the .cpu.pkl files for the composite MLP and re-run 13 | (because the layers models need to be properly converted first) 14 | ''' 15 | 16 | if __name__=="__main__": 17 | 18 | os.environ['THEANO_FLAGS']="device=cpu" 19 | _, directory = sys.argv 20 | 21 | files_list = glob.glob(os.path.join(directory, '*.pkl')) 22 | 23 | p1 = re.compile(r"(cpu)") 24 | p2 = re.compile(r"(\.pkl)") 25 | 26 | for in_file in files_list: 27 | 28 | if p1.search(in_file) != None: 29 | continue 30 | 31 | out_file = os.path.splitext(in_file)[0] + '.cpu.pkl' 32 | 33 | if os.path.exists(out_file): 34 | continue 35 | 36 | model = serial.load(in_file) 37 | model_yaml_src = p2.sub('.cpu.pkl', model.yaml_src) 38 | model2 = yaml_parse.load(model_yaml_src) 39 | 40 | params = model.get_param_values() 41 | model2.set_param_values(params) 42 | 43 | model2.yaml_src = model_yaml_src 44 | model2.dataset_yaml_src = model.dataset_yaml_src 45 | 46 | serial.save(out_file, model2) 47 | -------------------------------------------------------------------------------- /hpc_scripts/requirements.txt: -------------------------------------------------------------------------------- 1 | Cython==0.21.1 2 | PyYAML==3.11 3 | Theano==0.6.0 4 | argparse==1.2.1 5 | ipython==2.3.0 6 | matplotlib==1.4.2 7 | mock==1.0.1 8 | nose==1.3.4 9 | numexpr==2.4 10 | numpy==1.9.0 11 | -e git://github.com/lisa-lab/pylearn2.git@b645ac32a6ca0ce92724e75236db193671ce2e19#egg=pylearn2-master 12 | pymad==0.6 13 | pyparsing==2.0.3 14 | python-dateutil==2.2 15 | pytz==2014.7 16 | scikit-learn==0.15.2 17 | scikits.audiolab==0.11.0 18 | scikits.samplerate==0.4.0.dev 19 | scipy==0.14.0 20 | six==1.8.0 21 | tables==3.1.2dev 22 | wsgiref==0.1.2 23 | -------------------------------------------------------------------------------- /lmd/lmd_prep.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | # embedded options to qsub - start with #PBS 3 | # -- Name of the job --- 4 | #PBS -N lmd_prep 5 | # -- specify queue -- 6 | #PBS -q hpc 7 | # -- estimated wall clock time (execution time): hh:mm:ss -- 8 | #PBS -l walltime=12:00:00 9 | # --- number of processors/cores/nodes -- 10 | #PBS -l nodes=1:ppn=1:gpus=1 11 | # -- user email address -- 12 | #PBS -M coreyker@gmail.com 13 | # -- mail notification -- 14 | #PBS -m abe 15 | # -- run in the current working (submission) directory -- 16 | if test X$PBS_ENVIRONMENT = XPBS_BATCH; then cd $PBS_O_WORKDIR; fi 17 | # here follow the commands you want to execute 18 | # Load modules needed by myapplication.x 19 | module load python/2.7.3 cuda/6.5 20 | 21 | # Run my program 22 | export LD_LIBRARY_PATH=~/.local/lib:$LD_LIBRARY_PATH 23 | source ~/venv/bin/activate 24 | cd ~/dnn-mgr 25 | python prepare_dataset.py \ 26 | /dtu-compute/cosound/data/_latinmusicdataset/ \ 27 | /dtu-compute/cosound/data/_latinmusicdataset/label_list.txt \ 28 | --hdf5 
/dtu-compute/cosound/data/_latinmusicdataset/LMD.h5 \ 29 | --train /dtu-compute/cosound/data/_latinmusicdataset/train-part.txt \ 30 | --valid /dtu-compute/cosound/data/_latinmusicdataset/valid-part.txt \ 31 | --test /dtu-compute/cosound/data/_latinmusicdataset/test-part.txt \ 32 | --partition_name /dtu-compute/cosound/data/_latinmusicdataset/LMD_split_config.pkl \ 33 | --compute_std -------------------------------------------------------------------------------- /lmd/lmd_train.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | # embedded options to qsub - start with #PBS 3 | # -- Name of the job --- 4 | #PBS -N lmd_train 5 | # -- specify queue -- 6 | #PBS -q hpc 7 | # -- estimated wall clock time (execution time): hh:mm:ss -- 8 | #PBS -l walltime=30:00:00 9 | # --- number of processors/cores/nodes -- 10 | #PBS -l nodes=1:ppn=1:gpus=1 11 | # -- user email address -- 12 | #PBS -M coreyker@gmail.com 13 | # -- mail notification -- 14 | #PBS -m abe 15 | # -- run in the current working (submission) directory -- 16 | if test X$PBS_ENVIRONMENT = XPBS_BATCH; then cd $PBS_O_WORKDIR; fi 17 | # here follow the commands you want to execute 18 | # Load modules needed by myapplication.x 19 | module load python/2.7.3 cuda/6.5 20 | 21 | # Run my program 22 | export LD_LIBRARY_PATH=~/.local/lib:$LD_LIBRARY_PATH 23 | source ~/venv/bin/activate 24 | cd /dtu-compute/cosound/data/_latinmusicdataset/ 25 | python ~/dnn-mgr/train_mlp_script.py \ 26 | /dtu-compute/cosound/data/_latinmusicdataset/LMD_split_config.pkl \ 27 | ~/dnn-mgr/yaml_scripts/mlp_rlu_dropout.yaml \ 28 | --nunits 500 \ 29 | --output ~/dnn-mgr/lmd/lmd_513_500x3.pkl 30 | -------------------------------------------------------------------------------- /lmd/lmd_train_conv.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | # embedded options to qsub - start with #PBS 3 | # -- Name of the job --- 4 | #PBS -N lmd_train_conv 5 | # -- specify queue -- 6 | #PBS -q hpc 7 | # -- estimated wall clock time (execution time): hh:mm:ss -- 8 | #PBS -l walltime=30:00:00 9 | # --- number of processors/cores/nodes -- 10 | #PBS -l nodes=1:ppn=1:gpus=1 11 | # -- user email address -- 12 | #PBS -M coreyker@gmail.com 13 | # -- mail notification -- 14 | #PBS -m abe 15 | # -- run in the current working (submission) directory -- 16 | if test X$PBS_ENVIRONMENT = XPBS_BATCH; then cd $PBS_O_WORKDIR; fi 17 | # here follow the commands you want to execute 18 | # Load modules needed by myapplication.x 19 | module load python/2.7.3 cuda/6.5 20 | 21 | # Run my program 22 | export LD_LIBRARY_PATH=~/.local/lib:$LD_LIBRARY_PATH 23 | source ~/venv/bin/activate 24 | cd /dtu-compute/cosound/data/_latinmusicdataset/ 25 | python ~/dnn-mgr/train_mlp_conv_script.py \ 26 | /dtu-compute/cosound/data/_latinmusicdataset/LMD_split_conv_config.pkl \ 27 | ~/dnn-mgr/yaml_scripts/mlp_rlu_conv2.yaml \ 28 | --output ~/dnn-mgr/lmd/lmd_513_conv.pkl 29 | -------------------------------------------------------------------------------- /lmd_af/LMD_AF_split_dnn.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coreyker/dnn-mgr/bdad579ea6cb37b665ea6019fe1026a6ce20cbc7/lmd_af/LMD_AF_split_dnn.pkl -------------------------------------------------------------------------------- /lmd_af/mlp_rlu_dropout_adversary.yaml: -------------------------------------------------------------------------------- 1 | !obj:pylearn2.train.Train { 2 | dataset : 
&trainset !obj:adversary_dataset.AdversaryDataset { 3 | adv_model: &adv_model !pkl: "lmd_af/lmd_af_513_500x3.pkl", 4 | which_set : 'train', 5 | config : &fold !pkl: "%(fold_config)s" 6 | }, 7 | model : !obj:pylearn2.models.mlp.MLP { 8 | nvis : 513, 9 | layers : [ 10 | !obj:audio_dataset.PreprocLayer { 11 | config : *fold, 12 | proc_type : 'standardize' 13 | }, 14 | !obj:pylearn2.models.mlp.RectifiedLinear { 15 | layer_name : 'h0', 16 | dim : %(dim_h0)i, 17 | irange : &irange .1 18 | }, 19 | !obj:pylearn2.models.mlp.RectifiedLinear { 20 | layer_name : 'h1', 21 | dim : %(dim_h1)i, 22 | irange : *irange 23 | }, 24 | !obj:pylearn2.models.mlp.RectifiedLinear { 25 | layer_name : 'h2', 26 | dim : %(dim_h2)i, 27 | irange : *irange 28 | }, 29 | !obj:pylearn2.models.mlp.Softmax { 30 | n_classes : 10, 31 | layer_name : 'y', 32 | irange : *irange 33 | } 34 | ] 35 | }, 36 | algorithm : !obj:pylearn2.training_algorithms.sgd.SGD { 37 | learning_rate : .1, 38 | learning_rule : !obj:pylearn2.training_algorithms.learning_rule.Momentum { 39 | init_momentum : 0.5 40 | }, 41 | train_iteration_mode : 'shuffled_sequential', #'batchwise_shuffled_sequential', 42 | monitor_iteration_mode : 'shuffled_sequential', #'batchwise_shuffled_sequential', 43 | #batches_per_iter : 500, 44 | batch_size : 1200, 45 | monitoring_dataset : { 46 | 'train' : *trainset, 47 | 'valid' : !obj:adversary_dataset.AdversaryDataset { 48 | adv_model: *adv_model, 49 | which_set : 'valid', 50 | config : *fold 51 | } 52 | }, 53 | termination_criterion: !obj:pylearn2.termination_criteria.MonitorBased { 54 | channel_name : 'valid_y_misclass', 55 | prop_decrease : .001, 56 | N: 50 57 | }, 58 | cost: !obj:pylearn2.costs.mlp.dropout.Dropout { 59 | default_input_include_prob : .75, 60 | input_include_probs: { 'pre': 1., 'h0': 1. }, 61 | input_scales: { 'pre': 1., 'h0' : 1. 
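# an include probability of 1.0 means no dropout is applied to the inputs of the
# standardization layer ('pre') or of the first hidden layer ('h0'); all other
# layer inputs are kept with the default probability of .75 given above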
} 62 | } 63 | }, 64 | extensions: [ 65 | !obj:pylearn2.train_extensions.best_params.MonitorBasedSaveBest { 66 | channel_name: 'valid_y_misclass', 67 | save_path: "%(best_model_save_path)s" 68 | }, 69 | !obj:pylearn2.training_algorithms.learning_rule.MomentumAdjustor { 70 | start: 1, 71 | saturate: 200, 72 | final_momentum: .9 73 | }, 74 | !obj:pylearn2.training_algorithms.sgd.LinearDecayOverEpoch { 75 | start: 1, 76 | saturate: 200, 77 | decay_factor: .01 78 | } 79 | ], 80 | save_path : "%(save_path)s", 81 | save_freq : 1 82 | } -------------------------------------------------------------------------------- /pretrain_layers.py: -------------------------------------------------------------------------------- 1 | import os, sys, cPickle, argparse 2 | from pylearn2.train import Train 3 | from pylearn2.training_algorithms.sgd import SGD 4 | from pylearn2.costs.ebm_estimation import SML 5 | from pylearn2.datasets.transformer_dataset import TransformerDataset 6 | from pylearn2.termination_criteria import MonitorBased, ChannelTarget, EpochCounter 7 | from pylearn2.training_algorithms.learning_rule import RMSProp 8 | from pylearn2.costs.autoencoder import MeanSquaredReconstructionError 9 | 10 | import pylearn2.config.yaml_parse as yaml_parse 11 | from audio_dataset import AudioDataset, PreprocLayer 12 | import pdb 13 | 14 | ''' 15 | (Although it may be more complicated) We build our models and dataset using yaml in order to keep a record of how things were built 16 | ''' 17 | 18 | def get_grbm(nvis, nhid): 19 | 20 | model_yaml = '''!obj:pylearn2.models.rbm.GaussianBinaryRBM { 21 | nvis : %(nvis)i, 22 | nhid : %(nhid)i, 23 | irange : .1, 24 | energy_function_class : !obj:pylearn2.energy_functions.rbm_energy.grbm_type_1 {}, 25 | init_sigma : 1., 26 | init_bias_hid : 0, 27 | mean_vis : True 28 | }''' % {'nvis' : nvis, 'nhid': nhid} 29 | 30 | model = yaml_parse.load(model_yaml) 31 | return model 32 | 33 | def get_rbm(nvis, nhid): 34 | 35 | model_yaml = '''!obj:pylearn2.models.rbm.RBM { 36 | nvis : %(nvis)i, 37 | nhid : %(nhid)i, 38 | irange : .1 39 | }''' % {'nvis' : nvis, 'nhid': nhid} 40 | 41 | model = yaml_parse.load(model_yaml) 42 | return model 43 | 44 | def get_ae(nvis, nhid): 45 | 46 | model_yaml = '''!obj:pylearn2.models.autoencoder.DenoisingAutoencoder { 47 | nvis : %(nvis)i, 48 | nhid : %(nhid)i, 49 | irange : .1, 50 | corruptor : !obj:pylearn2.corruption.BinomialCorruptor { corruption_level : .1 }, 51 | act_enc : 'sigmoid', 52 | act_dec : null 53 | }''' % {'nvis' : nvis, 'nhid': nhid} 54 | 55 | model = yaml_parse.load(model_yaml) 56 | return model 57 | 58 | def get_rbm_trainer(model, dataset, save_path, epochs=5): 59 | """ 60 | A Restricted Boltzmann Machine (RBM) trainer 61 | """ 62 | 63 | config = { 64 | 'learning_rate': 1e-2, 65 | 'train_iteration_mode': 'shuffled_sequential', 66 | 'batch_size': 250, 67 | #'batches_per_iter' : 100, 68 | 'learning_rule': RMSProp(), 69 | 'monitoring_dataset': dataset, 70 | 'cost' : SML(250, 1), 71 | 'termination_criterion' : EpochCounter(max_epochs=epochs), 72 | } 73 | 74 | return Train(model=model, 75 | algorithm=SGD(**config), 76 | dataset=dataset, 77 | save_path=save_path, 78 | save_freq=1 79 | )#, extensions=extensions) 80 | 81 | def get_ae_trainer(model, dataset, save_path, epochs=5): 82 | """ 83 | An Autoencoder (AE) trainer 84 | """ 85 | 86 | config = { 87 | 'learning_rate': 1e-2, 88 | 'train_iteration_mode': 'shuffled_sequential', 89 | 'batch_size': 250, 90 | #'batches_per_iter' : 2000, 91 | 'learning_rule': RMSProp(), 92 | 'monitoring_dataset': 
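        # reconstruction error is monitored on the same transformed training set
        # passed in as `dataset`. A rough usage sketch of this helper (the layer
        # sizes and save path here are hypothetical, not from the experiments):
        #   model = get_ae(nvis=513, nhid=50)
        #   trainer = get_ae_trainer(model, dataset, save_path='/tmp/AE_L1.pkl', epochs=5)
        #   trainer.main_loop()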
dataset, 93 | 'cost' : MeanSquaredReconstructionError(), 94 | 'termination_criterion' : EpochCounter(max_epochs=epochs), 95 | } 96 | 97 | return Train(model=model, 98 | algorithm=SGD(**config), 99 | dataset=dataset, 100 | save_path=save_path, 101 | save_freq=1 102 | )#, extensions=extensions) 103 | 104 | if __name__=="__main__": 105 | 106 | parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter, 107 | description='Script to pretrain the layers of a DNN.') 108 | 109 | parser.add_argument('fold_config', help='Path to dataset configuration file (generated with prepare_dataset.py)') 110 | parser.add_argument('--arch', nargs='*', type=int, help='Architecture: nvis nhid1 nhid2 ...') 111 | parser.add_argument('--epochs', type=int, help='Number of training epochs per layer') 112 | parser.add_argument('--save_prefix', help='Full path and prefix for saving output models') 113 | parser.add_argument('--use_autoencoder', action='store_true') 114 | args = parser.parse_args() 115 | 116 | if args.epochs is None: 117 | args.epochs = 5 118 | 119 | arch = [(i,j) for i,j in zip(args.arch[:-1], args.arch[1:])] 120 | 121 | with open(args.fold_config) as f: 122 | config = cPickle.load(f) 123 | 124 | preproc_layer = PreprocLayer(config=config, proc_type='standardize') 125 | 126 | dataset = TransformerDataset( 127 | raw=AudioDataset(which_set='train', config=config), 128 | transformer=preproc_layer.layer_content 129 | ) 130 | 131 | # transformer_yaml = '''!obj:pylearn2.datasets.transformer_dataset.TransformerDataset { 132 | # raw : %(raw)s, 133 | # transformer : %(transformer)s 134 | # }''' 135 | # 136 | # dataset_yaml = transformer_yaml % { 137 | # 'raw' : '''!obj:audio_dataset.AudioDataset { 138 | # which_set : 'train', 139 | # config : !pkl: "%(fold_config)s" 140 | # }''' % {'fold_config' : args.fold_config}, 141 | # 'transformer' : '''!obj:pylearn2.models.mlp.MLP { 142 | # nvis : %(nvis)i, 143 | # layers : 144 | # [ 145 | # !obj:audio_dataset.PreprocLayer { 146 | # config : !pkl: "%(fold_config)s", 147 | # proc_type : 'standardize' 148 | # } 149 | # ] 150 | # }''' % {'nvis' : args.arch[0], 'fold_config' : args.fold_config } 151 | # } 152 | 153 | for i,(v,h) in enumerate(arch): 154 | 155 | if not args.use_autoencoder: 156 | print 'Pretraining layer %d with RBM' % i 157 | 158 | if i==0: 159 | model = get_grbm(v,h) 160 | else: 161 | model = get_rbm(v,h) 162 | 163 | save_path = args.save_prefix+ 'RBM_L{}.pkl'.format(i+1) 164 | trainer = get_rbm_trainer(model=model, dataset=dataset, save_path=save_path, epochs=args.epochs) 165 | else: 166 | print 'Pretraining layer %d with AE' % i 167 | 168 | model = get_ae(v,h) 169 | save_path = args.save_prefix + 'AE_L{}.pkl'.format(i+1) 170 | trainer = get_ae_trainer(model=model, dataset=dataset, save_path=save_path, epochs=args.epochs) 171 | 172 | trainer.main_loop() 173 | 174 | dataset = TransformerDataset(raw=dataset, transformer=model) 175 | 176 | # dataset_yaml = transformer_yaml % {'raw' : dataset_yaml, 'transformer' : '!pkl: %s' % save_path} 177 | # dataset = yaml_parse.load( dataset_yaml ) 178 | 179 | 180 | -------------------------------------------------------------------------------- /svm_train_test.py: -------------------------------------------------------------------------------- 1 | # svm train / test 2 | import os, sys, copy, cPickle 3 | import numpy as np 4 | import theano 5 | import theano.tensor as T 6 | from pylearn2.utils import serial 7 | from sklearn.svm import LinearSVC, SVC 8 | import pdb 9 | 10 | def train_svm(X,y,C): 11 | if 0: 
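        # dead branch kept for reference: flipping `if 0:` to `if 1:` swaps in the
        # liblinear-based LinearSVC; the active branch below uses libsvm's SVC with
        # a linear kernel, which fits the same family of models via another solver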
12 |         svm = LinearSVC(C=C, loss='l1', random_state=1234)
13 |     else:
14 |         svm = SVC(C=C, kernel='linear', random_state=1234)
15 |     return svm.fit(X,y)
16 | 
17 | def test_svm(X, y, svm):
18 |     n_classes = 10
19 | 
20 |     confusion = np.zeros((n_classes, n_classes))
21 |     for feats, label in zip(X,y):
22 |         true_label = label if np.isscalar(label) else label[0]
23 |         pred = np.array( svm.predict(feats), dtype='int' )
24 |         vote_label = np.argmax( np.bincount(pred, minlength=10) )
25 | 
26 |         confusion[int(true_label), int(vote_label)] += 1
27 | 
28 |     total_error = 100*(1 - np.sum(np.diag(confusion)) / np.sum(confusion))
29 | 
30 |     return total_error, confusion
31 | 
32 | def grid_search(X_train, y_train, X_valid, y_valid, C_values):
33 |     n_classes = 10
34 |     best_svm = None
35 |     best_C = None
36 |     best_error = 100.
37 |     best_conf = None
38 | 
39 |     for m,C in enumerate(C_values):
40 | 
41 |         svm = train_svm( np.vstack(X_train), np.hstack(y_train), C)
42 |         err, conf = test_svm( np.vstack(X_valid), np.hstack(y_valid), svm )
43 |         #err, conf = test_svm(X_valid, y_valid, svm)
44 | 
45 |         if err < best_error:
46 |             best_error = err
47 |             best_svm = copy.deepcopy(svm)
48 |             best_C = C
49 |             best_conf = conf
50 | 
51 |         print 'Model selection progress %2d%%, best_error=%2.2f, curr_error=%2.2f' % ((100*m)/len(C_values), best_error, err)
52 | 
53 |     print '' # newline
54 |     return best_error, best_svm, best_C, best_conf
55 | 
56 | if __name__ == "__main__":
57 | 
58 |     # load in BOF features
59 |     model = './saved-rlu-505050/mlp_rlu-fold-4_of_4'#'mlp_rlu_fold1_best'
60 |     train_BOF = model + '-train-BOF.pkl'
61 |     valid_BOF = model + '-valid-BOF.pkl'
62 |     test_BOF = model + '-test-BOF.pkl'
63 | 
64 |     with open(train_BOF) as f:
65 |         X_train, y_train = cPickle.load(f)
66 | 
67 |     with open(valid_BOF) as f:
68 |         X_valid, y_valid = cPickle.load(f)
69 | 
70 |     with open(test_BOF) as f:
71 |         X_test, y_test = cPickle.load(f)
72 | 
73 |     C_values = 10.0 ** np.arange(-3, 3, 0.25)
74 |     #C_values = np.arange(0.5, 1, 0.01)
75 |     best_error, best_svm, best_C, best_conf = grid_search(X_train, y_train, X_valid, y_valid, C_values)
76 | 
77 |     # re-train on train+valid sets before testing
78 |     best_svm = train_svm(np.vstack(sum([X_train, X_valid],[])), np.hstack(sum([y_train, y_valid],[])), best_C)
79 |     total_error, confusion = test_svm(X_test, y_test, best_svm)
80 | 
81 |     print 'test accuracy: %2.2f' % (100-total_error)
82 |     print 'confusion matrix:'
83 |     print confusion/np.sum(confusion, axis=1)[:,np.newaxis] # normalize each row (true class) to sum to 1
84 | 
85 | 
86 | 
87 | 
-------------------------------------------------------------------------------- /test_mlp_script.py: --------------------------------------------------------------------------------
1 | import sys, re, csv, cPickle
2 | import numpy as np
3 | import theano
4 | 
5 | from pylearn2.utils import serial
6 | from audio_dataset import AudioDataset
7 | from pylearn2.space import CompositeSpace, Conv2DSpace, VectorSpace, IndexSpace
8 | import pylearn2.config.yaml_parse as yaml_parse
9 | 
10 | import pdb
11 | def frame_misclass_error(model, dataset):
12 |     """
13 |     Function to compute the frame-level classification error by classifying
14 |     each frame independently (no voting across the frames of a recording)
15 |     """
16 | 
17 |     n_classes = len(dataset.targets)
18 |     feat_space = model.get_input_space()
19 | 
20 |     X = feat_space.make_theano_batch()
21 |     Y = model.fprop( X )
22 |     fprop = theano.function([X],Y)
23 | 
24 |     confusion = np.zeros((n_classes, n_classes))
25 | 
26 |     batch_size = 30
27 |     n_examples = len(dataset.support) // batch_size
28 | 
target_space = VectorSpace(dim=n_classes) 29 | data_specs = (CompositeSpace((feat_space, target_space)), ("features", "targets")) 30 | iterator = dataset.iterator(mode='sequential', batch_size=batch_size, data_specs=data_specs) 31 | 32 | for i, el in enumerate(iterator): 33 | 34 | # display progress indicator 35 | sys.stdout.write('Classify progress: %2.0f%%\r' % (100*i/float(n_examples))) 36 | sys.stdout.flush() 37 | 38 | fft_data = np.array(el[0], dtype=np.float32) 39 | vote_labels = np.argmax(fprop(fft_data), axis=1) 40 | true_labels = np.argmax(el[1], axis=1) 41 | 42 | for l,v in zip(true_labels, vote_labels): 43 | confusion[l, v] += 1 44 | 45 | total_error = 100*(1 - np.sum(np.diag(confusion)) / np.sum(confusion)) 46 | print '' 47 | return total_error, confusion 48 | 49 | def file_misclass_error(model, dataset): 50 | """ 51 | Function to compute the file-level classification error by classifying 52 | individual frames and then voting for the class with highest cumulative probability 53 | """ 54 | n_classes = len(dataset.targets) 55 | feat_space = model.get_input_space() 56 | 57 | X = feat_space.make_theano_batch() 58 | Y = model.fprop( X ) 59 | fprop = theano.function([X],Y) 60 | 61 | confusion = np.zeros((n_classes, n_classes)) 62 | n_examples = len(dataset.file_list) 63 | 64 | target_space = VectorSpace(dim=n_classes) 65 | data_specs = (CompositeSpace((feat_space, target_space)), ("songlevel-features", "targets")) 66 | iterator = dataset.iterator(mode='sequential', batch_size=1, data_specs=data_specs) 67 | 68 | for i,el in enumerate(iterator): 69 | 70 | # display progress indicator 71 | sys.stdout.write('Classify progress: %2.0f%%\r' % (100*i/float(n_examples))) 72 | sys.stdout.flush() 73 | 74 | fft_data = np.array(el[0], dtype=np.float32) 75 | #frame_labels = np.argmax(fprop(fft_data), axis=1) 76 | #hist = np.bincount(frame_labels, minlength=n_classes) 77 | #vote_label = np.argmax(hist) # most used label 78 | vote_label = np.argmax(np.sum(fprop(fft_data), axis=0)) 79 | true_label = el[1] #np.argmax(el[1]) 80 | confusion[true_label, vote_label] += 1 81 | #print 'true: {}, vote: {}'.format(true_label, vote_label) 82 | #pdb.set_trace() 83 | 84 | total_error = 100*(1 - np.sum(np.diag(confusion)) / np.sum(confusion)) 85 | print '' 86 | return total_error, confusion 87 | 88 | def file_misclass_error_printf(model, dataset, save_file, label_list=None): 89 | """ 90 | Function to compute the file-level classification error by classifying 91 | individual frames and then voting for the class with highest cumulative probability 92 | """ 93 | n_classes = len(dataset.targets) 94 | feat_space = model.get_input_space() 95 | 96 | X = feat_space.make_theano_batch() 97 | Y = model.fprop(X) 98 | fprop = theano.function([X],Y) 99 | 100 | n_examples = len(dataset.file_list) 101 | target_space = VectorSpace(dim=n_classes) 102 | data_specs = (CompositeSpace((feat_space, target_space)), ("songlevel-features", "targets")) 103 | iterator = dataset.iterator(mode='sequential', batch_size=1, data_specs=data_specs) 104 | 105 | with open(save_file, 'w') as fname: 106 | csvwriter = csv.writer(fname, delimiter='\t') 107 | for i,el in enumerate(iterator): 108 | 109 | # display progress indicator 110 | sys.stdout.write('Classify progress: %2.0f%%\r' % (100*i/float(n_examples))) 111 | sys.stdout.flush() 112 | 113 | fft_data = np.array(el[0], dtype=np.float32) 114 | #frame_labels = np.argmax(fprop(fft_data), axis=1) 115 | #hist = np.bincount(frame_labels, minlength=n_classes) 116 | choice = 
np.argmax(np.sum(fprop(fft_data), axis=0)) 117 | 118 | if label_list: # use-string labels 119 | vote_label = label_list[choice] # most used label 120 | true_label = dataset.label_list[el[1]]#np.argmax(el[1]) 121 | else: # use numeric labels 122 | vote_label = choice # most used label 123 | true_label = el[1] #np.argmax(el[1]) 124 | 125 | #csvwriter.writerow([dataset.file_list[i], true_label, vote_label]) 126 | csvwriter.writerow([dataset.file_list[i], true_label, vote_label]) 127 | 128 | # fname.write('{file_name}\t{true_label}\t{vote_label}\n'.format( 129 | # file_name =dataset.file_list[i], 130 | # true_label=true_label, 131 | # vote_label=vote_label)) 132 | print '' 133 | 134 | def file_misclass_error_topx(model, dataset, topx=3): 135 | """ 136 | Function to compute the file-level classification error by classifying 137 | individual frames and then voting for the class with highest cumulative probability 138 | 139 | Check topx most probable results 140 | """ 141 | X = model.get_input_space().make_theano_batch() 142 | Y = model.fprop( X ) 143 | fprop = theano.function([X],Y) 144 | 145 | n_classes = dataset.raw.y.shape[1] 146 | confusion = np.zeros((n_classes, n_classes)) 147 | n_examples = len(dataset.raw.support) 148 | n_frames_per_file = dataset.raw.n_frames_per_file 149 | 150 | batch_size = n_frames_per_file 151 | data_specs = dataset.raw.get_data_specs() 152 | iterator = dataset.iterator(mode='sequential', 153 | batch_size=batch_size, 154 | data_specs=data_specs 155 | ) 156 | 157 | hits = 0 158 | n = 0 159 | i=0 160 | for el in iterator: 161 | 162 | # display progress indicator 163 | sys.stdout.write('Classify progress: %2.0f%%\r' % (100*i/float(n_examples))) 164 | sys.stdout.flush() 165 | 166 | fft_data = np.array(el[0], dtype=np.float32) 167 | frame_labels = np.argmax(fprop(fft_data), axis=1) 168 | hist = np.bincount(frame_labels, minlength=n_classes) 169 | vote_label = np.argsort(hist)[-1:-1-topx:-1] # most used label 170 | 171 | labels = np.argmax(el[1], axis=1) 172 | true_label = labels[0] 173 | for entry in labels: 174 | assert entry == true_label # check for indexing prob 175 | 176 | if true_label in vote_label: 177 | hits+=1 178 | 179 | n+=1 180 | i+=batch_size 181 | 182 | print '' 183 | return hits/float(n)*100 184 | 185 | 186 | def pp_array(array): # pretty printing 187 | for row in array: 188 | print ['%04.1f' % el for el in row] 189 | 190 | 191 | if __name__ == '__main__': 192 | 193 | import argparse 194 | parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter, 195 | description='''Script to test DNN. Measure framelevel accuracy. 196 | Option to use a majority vote for over the frames in each test recording. 197 | ''') 198 | 199 | parser.add_argument('model_file', help='Path to trained model file') 200 | parser.add_argument('--testset', help='Optional. 
If not specified, the testset from the model yaml src will be used')
201 |     parser.add_argument('--majority_vote', action='store_true', help='Classify using a majority vote over the frames of each test recording (file-level accuracy)')
202 |     parser.add_argument('--which_set', help='train, test, or valid')
203 |     parser.add_argument('--save_file', help='Save results to tab separated file')
204 |     args = parser.parse_args()
205 | 
206 |     # get model
207 |     model = serial.load(args.model_file)
208 | 
209 |     if args.which_set is None:
210 |         args.which_set = 'test'
211 | 
212 |     if args.testset: # dataset config passed in from command line
213 |         print 'Using dataset passed in from command line'
214 |         with open(args.testset) as f: config = cPickle.load(f)
215 |         dataset = AudioDataset(config=config, which_set=args.which_set)
216 | 
217 |         # get model dataset for its labels...
218 |         model_dataset = yaml_parse.load(model.dataset_yaml_src)
219 |         label_list = model_dataset.label_list
220 | 
221 |     else: # get dataset from model's yaml_src
222 |         print "Using dataset from model's yaml src"
223 |         p = re.compile(r"which_set.*'(train)'")
224 |         dataset_yaml = p.sub("which_set: '{}'".format(args.which_set), model.dataset_yaml_src)
225 |         dataset = yaml_parse.load(dataset_yaml)
226 | 
227 |         label_list = dataset.label_list
228 | 
229 |     # measure test error
230 |     if args.majority_vote:
231 |         print 'Using majority vote'
232 |         if args.save_file:
233 |             file_misclass_error_printf(model, dataset, args.save_file)#, label_list)
234 |         else:
235 |             err, conf = file_misclass_error(model, dataset)
236 |     else:
237 |         print 'Not using majority vote'
238 |         # if args.save_file:
239 |         #     raise ValueError('--save_file option only supported for majority vote currently')
240 |         # else:
241 |         #     err, conf = frame_misclass_error(model, dataset)
242 |         err, conf = frame_misclass_error(model, dataset)
243 |         if args.save_file: # only write the confusion matrix when a save file was requested
244 |             with open(args.save_file, 'wb') as fname:
245 |                 csvwriter = csv.writer(fname, delimiter='\t')
246 |                 for r in conf:
247 |                     csvwriter.writerow(r)
248 | 
249 | 
250 |     if not args.save_file:
251 |         conf = conf.transpose()
252 |         print 'test accuracy: %2.2f' % (100-err)
253 |         print 'confusion matrix (cols true):'
254 |         pp_array(100*conf/np.sum(conf, axis=0))
255 | 
256 |     # acc = file_misclass_error_topx(model, dataset, 2)
257 |     # print 'test accuracy: %2.2f' % acc
258 | 
259 | 
260 | 
-------------------------------------------------------------------------------- /train_classifier_on_dnn_feats.py: --------------------------------------------------------------------------------
1 | import os, sys, re, cPickle
2 | import numpy as np
3 | import theano
4 | 
5 | from sklearn.externals import joblib
6 | from sklearn.ensemble import RandomForestClassifier
7 | from sklearn.svm import SVC
8 | from sklearn.grid_search import GridSearchCV
9 | from pylearn2.utils import serial
10 | from pylearn2.space import CompositeSpace, Conv2DSpace, VectorSpace, IndexSpace
11 | import pylearn2.config.yaml_parse as yaml_parse
12 | 
13 | import pdb
14 | 
15 | def aggregate_features(model, dataset, which_layers=[2], win_size=200, step=100):
16 |     assert np.max(which_layers) < len(model.layers)
17 | 
18 |     X = model.get_input_space().make_theano_batch()
19 |     Y = model.fprop(X, return_all=True)
20 |     fprop = theano.function([X],Y)
21 | 
22 |     n_classes = dataset.y.shape[1]
23 |     n_examples = len(dataset.file_list)
24 | 
25 |     feat_space = model.get_input_space()
26 |     target_space = VectorSpace(dim=n_classes)
27 | 
28 |     data_specs = (CompositeSpace((feat_space, target_space)), ("songlevel-features", "targets"))
29 |     iterator = dataset.iterator(mode='sequential', 
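                                # 'songlevel-features' yields one whole song per
                                # iteration: el[0] holds all STFT frames of a file,
                                # so the DNN activations can be pooled over windows
                                # of win_size frames (the mean/std pooling below)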
data_specs=data_specs) 30 | 31 | # compute feature representation, aggregrate frames 32 | X=[]; y=[]; Z=[]; file_list=[]; 33 | for n,el in enumerate(iterator): 34 | # display progress indicator 35 | sys.stdout.write('Aggregation progress: %2.0f%%\r' % (100*n/float(n_examples))) 36 | sys.stdout.flush() 37 | 38 | input_data = np.array(el[0], dtype=np.float32) 39 | output_data = fprop(input_data) 40 | feats = np.hstack([output_data[i] for i in which_layers]) 41 | true_label = el[1] 42 | 43 | # aggregate features 44 | agg_feat = [] 45 | for i in xrange(0, feats.shape[0]-win_size, step): 46 | chunk = feats[i:i+win_size,:] 47 | agg_feat.append(np.hstack((np.mean(chunk, axis=0), np.std(chunk, axis=0)))) 48 | 49 | X.append(np.vstack(agg_feat)) 50 | y.append(np.hstack([true_label] * len(agg_feat))) 51 | Z.append(np.sum(output_data[-1], axis=0)) 52 | file_list.append(el[2]) 53 | 54 | print '' # newline 55 | return X, y, Z, file_list 56 | 57 | def get_features(model, dataset, which_layers=[2], n_features=100): 58 | assert np.max(which_layers) < len(model.layers) 59 | 60 | rng = np.random.RandomState(111) 61 | X = model.get_input_space().make_theano_batch() 62 | Y = model.fprop(X, return_all=True) 63 | fprop = theano.function([X],Y) 64 | 65 | n_classes = dataset.y.shape[1] 66 | n_examples = len(dataset.file_list) 67 | 68 | feat_space = model.get_input_space() 69 | target_space = VectorSpace(dim=n_classes) 70 | 71 | data_specs = (CompositeSpace((feat_space, target_space)), ("songlevel-features", "targets")) 72 | iterator = dataset.iterator(mode='sequential', data_specs=data_specs) 73 | 74 | X=[]; y=[]; Z=[]; file_list=[]; 75 | for n,el in enumerate(iterator): 76 | # display progress indicator 77 | sys.stdout.write('Getting features: %2.0f%%\r' % (100*n/float(n_examples))) 78 | sys.stdout.flush() 79 | 80 | input_data = np.array(el[0], dtype=np.float32) 81 | output_data = fprop(input_data) 82 | feats = np.hstack([output_data[i] for i in which_layers]) 83 | true_label = el[1] 84 | 85 | if n_features: 86 | ind = rng.permutation(feats.shape[0]) 87 | feats = feats[ind[:n_features],:] 88 | 89 | X.append(feats) 90 | y.append([true_label]*n_features) 91 | Z.append(np.sum(output_data[-1], axis=0)) 92 | file_list.append(el[2]) 93 | 94 | print '' 95 | return X, y, Z, file_list 96 | 97 | def train_classifier(X_train, y_train, method='random_forest', verbose=2): 98 | assert method in ['random_forest', 'linear_svm'] 99 | 100 | # train classifier 101 | if method=='random_forest': 102 | classifier = RandomForestClassifier(n_estimators=500, random_state=1234, verbose=verbose, n_jobs=2) 103 | else: 104 | parameters = {'C' : 10**np.arange(-2,4.)} 105 | grid = GridSearchCV(SVC(), parameters, verbose=3) 106 | grid.fit(X_train, y_train) 107 | classifier = grid.best_estimator_ 108 | #classifier = SVC(C=0.5, kernel='linear', random_state=1234, verbose=verbose) 109 | 110 | return classifier.fit(X_train, y_train) 111 | 112 | def test_classifier(X_test, y_test, classifier, n_labels=10): 113 | n_examples = len(y_test) 114 | confusion = np.zeros((n_labels,n_labels)) 115 | 116 | for n, (X, true_label) in enumerate(zip(X_test,y_test)): 117 | sys.stdout.write('Classify progress: %2.0f%%\r' % (100*n/float(n_examples))) 118 | sys.stdout.flush() 119 | 120 | y_pred = np.array(classifier.predict(X), dtype='int') 121 | pred_label = np.argmax(np.bincount(y_pred, minlength=n_labels)) 122 | confusion[pred_label, true_label[0]] += 1 123 | print '' 124 | 125 | ave_acc = 100*(np.sum(np.diag(confusion)) / np.sum(confusion)) 126 | print 
"classification accuracy:", ave_acc 127 | return confusion 128 | 129 | def test_classifier_printf(X_test, y_test, Z_test, file_list, classifier, save_file, n_labels=10): 130 | n_examples = len(file_list) 131 | with open(save_file, 'w') as f: 132 | for n, (X, true_label, Z, fname) in enumerate(zip(X_test, y_test, Z_test, file_list)): 133 | sys.stdout.write('Classify progress: %2.0f%%\r' % (100*n/float(n_examples))) 134 | sys.stdout.flush() 135 | 136 | y_pred = np.array(classifier.predict(X), dtype='int') 137 | pred_label = np.argmax(np.bincount(y_pred, minlength=n_labels)) 138 | s='' 139 | for i in Z: s+='%2.2f\t'%i 140 | f.write('{0}\t{1}\t{2}\t{3}\n'.format(fname, true_label[0], pred_label, s)) 141 | print '' 142 | 143 | if __name__ == "__main__": 144 | # example: python train_classifier_on_dnn_feats.py ./saved/S_500_RS.cpu.pkl /Users/cmke/Datasets/tzanetakis_genre --which_layers 0 145 | 146 | import argparse, glob 147 | 148 | parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter, 149 | description='''Script to train/test random forest on DNN features. 150 | ''') 151 | 152 | parser.add_argument('model_file', help='Path to trained DNN model file') 153 | parser.add_argument('--which_layers', nargs='*', type=int, help='List of which DNN layers to use as features') 154 | parser.add_argument('--aggregate_features', action='store_true', help='option to aggregate frames (mean/std of frames used to train classifier)') 155 | parser.add_argument('--classifier', help="either 'random_forest' or 'linear_svm'") 156 | parser.add_argument('--save_file', help='Output classification results to a text file') 157 | 158 | args = parser.parse_args() 159 | 160 | if not args.which_layers: 161 | parser.error('Please specify --which_layers x, with x either 1, 2, 3 or 1 2 3 (layer 0 is a pre-processing layer)') 162 | 163 | if args.aggregate_features: 164 | print 'Using aggregate features' 165 | else: 166 | print 'Not using aggregate features' 167 | 168 | if args.classifier is None: 169 | print 'No classifer selected, using random forest' 170 | args.classifier = 'random_forest' 171 | 172 | # load model 173 | model = serial.load(args.model_file) 174 | 175 | # parse dataset from model 176 | p = re.compile(r"which_set.*'(train)'") 177 | trainset_yaml = model.dataset_yaml_src 178 | validset_yaml = p.sub("which_set: 'valid'", model.dataset_yaml_src) 179 | testset_yaml = p.sub("which_set: 'test'", model.dataset_yaml_src) 180 | 181 | trainset = yaml_parse.load(trainset_yaml) 182 | validset = yaml_parse.load(validset_yaml) 183 | testset = yaml_parse.load(testset_yaml) 184 | 185 | if args.aggregate_features: 186 | X_train, y_train, Z_train, train_files = aggregate_features(model, trainset, which_layers=args.which_layers) 187 | X_valid, y_valid, Z_valid, valid_files = aggregate_features(model, validset, which_layers=args.which_layers) 188 | X_test, y_test, Z_test, test_files = aggregate_features(model, testset, which_layers=args.which_layers) 189 | else: 190 | X_train, y_train, Z_train, train_files = get_features(model, trainset, which_layers=args.which_layers) 191 | X_valid, y_valid, Z_valid, valid_files = get_features(model, validset, which_layers=args.which_layers) 192 | X_test, y_test, Z_test, test_files = get_features(model, testset, which_layers=args.which_layers) 193 | 194 | print 'Training classifier' 195 | X_all = np.vstack((np.vstack(X_train), np.vstack(X_valid))) 196 | y_all = np.hstack((np.hstack(y_train), np.hstack(y_valid))) 197 | classifier = train_classifier(X_all, y_all, 
method=args.classifier) 198 | 199 | print 'Testing classifier' 200 | if args.save_file: 201 | 202 | test_classifier_printf( 203 | X_test=X_test, 204 | y_test=y_test, 205 | Z_test=Z_test, 206 | file_list=test_files, 207 | classifier=classifier, 208 | save_file=args.save_file+'.txt') 209 | 210 | print 'Saving trained classifier' 211 | joblib.dump(classifier, args.save_file+'.pkl', 9) 212 | 213 | else: 214 | confusion = test_classifier(X_test, y_test, classifier) 215 | 216 | -------------------------------------------------------------------------------- /train_mlp_conv_script.py: -------------------------------------------------------------------------------- 1 | # training script 2 | import sys, os, argparse, cPickle 3 | import pylearn2.config.yaml_parse as yaml_parse 4 | import pdb 5 | 6 | if __name__=="__main__": 7 | 8 | parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter, 9 | description='''Script to train a DNN with a variable number of units and the possibility of using dropout. 10 | ''') 11 | 12 | parser.add_argument('fold_config', help='Path to dataset partition configuration file (generated with prepare_dataset.py)') 13 | parser.add_argument('yaml_file') 14 | parser.add_argument('--output', help='Name of output model') 15 | args = parser.parse_args() 16 | 17 | if args.output is None: 18 | parser.error('Please specify the name that the trained model file should be saved as (.pkl file)') 19 | 20 | hyper_params = { 21 | 'fold_config' : args.fold_config, 22 | 'best_model_save_path' : args.output, 23 | 'save_path' : '/tmp/save.pkl' 24 | } 25 | 26 | with open(args.yaml_file) as f: 27 | train_yaml = f.read() 28 | 29 | train_yaml = train_yaml % (hyper_params) 30 | train = yaml_parse.load(train_yaml) 31 | train.main_loop() -------------------------------------------------------------------------------- /train_mlp_script.py: -------------------------------------------------------------------------------- 1 | # training script 2 | import sys, os, argparse, cPickle 3 | import pylearn2.config.yaml_parse as yaml_parse 4 | import pdb 5 | 6 | if __name__=="__main__": 7 | 8 | parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter, 9 | description='''Script to train a DNN with a variable number of units and the possibility of using dropout. 
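Example invocation (paths are placeholders):
    python train_mlp_script.py fold_config.pkl yaml_scripts/mlp_rlu_dropout.yaml --nunits 500 --output model.pkl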
10 | ''') 11 | 12 | parser.add_argument('fold_config', help='Path to dataset partition configuration file (generated with prepare_dataset.py)') 13 | parser.add_argument('yaml_file') 14 | parser.add_argument('--nunits', type=int, help='Number of units in each hidden layer') 15 | # parser.add_argument('--dropout', action='store_true', help='Set this flag if you want to use dropout regularization') 16 | parser.add_argument('--output', help='Name of output model') 17 | args = parser.parse_args() 18 | 19 | if args.nunits is None: 20 | parser.error('Please specify number of hidden units per layer with --nunits flag') 21 | if args.output is None: 22 | parser.error('Please specify the name that the trained model file should be saved as (.pkl file)') 23 | 24 | # if args.dropout: 25 | # print 'Using dropout' 26 | # yaml_file = 'mlp_rlu_dropout.yaml' 27 | # else: 28 | # print 'Not using dropout' 29 | # yaml_file = 'mlp_rlu.yaml' 30 | 31 | hyper_params = { 'dim_h0' : args.nunits, 32 | 'dim_h1' : args.nunits, 33 | 'dim_h2' : args.nunits, 34 | 'fold_config' : args.fold_config, 35 | 'best_model_save_path' : args.output, 36 | 'save_path' : '/tmp/save.pkl' 37 | } 38 | 39 | with open(args.yaml_file) as f: 40 | train_yaml = f.read() 41 | 42 | train_yaml = train_yaml % (hyper_params) 43 | train = yaml_parse.load(train_yaml) 44 | train.main_loop() 45 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coreyker/dnn-mgr/bdad579ea6cb37b665ea6019fe1026a6ce20cbc7/utils/__init__.py -------------------------------------------------------------------------------- /utils/calc_grad.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from theano import function 3 | from theano import tensor as T 4 | import pylearn2 5 | import audio_dataset 6 | import pdb 7 | 8 | ''' 9 | # !!! There is something wrong in the following gradient calculation !!! 10 | # !!! (But we don't need it anyway, since theano can do the backprop calculation for us) !!! 11 | 12 | # Calculate gradient of MLP w.r.t. 
input data 13 | # (assumes rectified linear units and softmax output layer) 14 | 15 | def calc_grad(X0, model, label): 16 | 17 | X = model.get_input_space().make_theano_batch() 18 | Y = model.fprop( X, return_all=True ) 19 | fprop = theano.function([X],Y) 20 | 21 | activations = fprop(X0) 22 | 23 | Wn = model.layers[-1].get_weights() 24 | bn = model.layers[-1].get_biases() 25 | Xn = activations[-1] 26 | 27 | # derivative of cost with respect to layer preceeding the softmax 28 | gradn = Wn[:,label] - Xn.dot(Wn.T) 29 | 30 | pdb.set_trace() 31 | for n in xrange(len(model.layers)-2, 0, -1): 32 | Wn = model.layers[n].get_weights() 33 | bn = model.layers[n].get_biases() 34 | Xn_1 = activations[n-1] 35 | 36 | if type(model.layers[n]) is pylearn2.models.mlp.RectifiedLinear: 37 | dact = lambda x: x>0 38 | elif type(model.layers[n]) is pylearn2.models.mlp.Linear: 39 | dact = lambda x: x 40 | elif type(model.layers[n]) is audio_dataset.PreprocLayer: 41 | dact = lambda x: x 42 | 43 | gradn = (dact(Xn_1.dot(Wn)) * gradn).dot(Wn.T) 44 | 45 | return gradn 46 | ''' 47 | 48 | # Create a simple model for testing 49 | rng = np.random.RandomState(111) 50 | epsilon = 1e-2 51 | nvis = 10 52 | nhid = 5 53 | n_classes = 3 54 | 55 | X0 = np.array(rng.randn(1,nvis), dtype=np.float32) 56 | label = rng.randint(0,n_classes) 57 | 58 | model = pylearn2.models.mlp.MLP( 59 | nvis=nvis, 60 | layers=[ 61 | pylearn2.models.mlp.Linear( 62 | layer_name='pre', 63 | dim=nvis, 64 | irange=1. 65 | ), 66 | pylearn2.models.mlp.RectifiedLinear( 67 | layer_name='h0', 68 | dim=nhid, 69 | irange=1.), 70 | pylearn2.models.mlp.RectifiedLinear( 71 | layer_name='h1', 72 | dim=nhid, 73 | irange=1.), 74 | pylearn2.models.mlp.RectifiedLinear( 75 | layer_name='h2', 76 | dim=nhid, 77 | irange=1.), 78 | pylearn2.models.mlp.Softmax( 79 | n_classes=n_classes, 80 | layer_name='y', 81 | irange=1.) 
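        # a small random MLP used only for this gradient check: the linear 'pre'
        # layer stands in for the PreprocLayer the real models use, followed by
        # three rectified-linear layers and a softmax, i.e. the repo's usual
        # three-hidden-layer shape at toy dimensions (nvis=10, nhid=5, n_classes=3)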
82 | ]) 83 | 84 | # Numerical computation of gradients 85 | X = model.get_input_space().make_theano_batch() 86 | Y = model.fprop( X ) 87 | fprop = function([X],Y) 88 | 89 | dX_num = np.zeros(X0.shape) 90 | for i in range(nvis): 91 | tmp = np.copy(X0[:,i]) 92 | X0[:,i] = tmp + epsilon 93 | Y_plus = -np.log(fprop(X0)[:,label]) 94 | 95 | X0[:,i] = tmp - epsilon 96 | Y_minus = -np.log(fprop(X0)[:,label]) 97 | 98 | X0[:,i] = tmp 99 | dX_num[:,i] = (Y_plus - Y_minus) / (2*epsilon) 100 | 101 | # Computation of gradients using Theano 102 | n_examples = X0.shape[0] 103 | label_vec = T.vector('label_vec') 104 | cost = model.cost(label_vec, model.fprop(X)) 105 | dCost = T.grad(cost * n_examples, X) 106 | f = function([X, label_vec], dCost) 107 | 108 | one_hot = np.zeros(n_classes, dtype=np.float32) 109 | one_hot[label] = 1 110 | 111 | dX_est = f(X0, one_hot) #dX_est = calc_grad(X0, model, label) 112 | 113 | delta = dX_num - dX_est 114 | # Print results 115 | print 'Numerical gradient:', dX_num 116 | print 'Theano gradient:', dX_est 117 | print 'Absolute difference:', np.abs(delta) 118 | print '2-norm of difference', np.linalg.norm(delta) 119 | 120 | -------------------------------------------------------------------------------- /utils/class_histogram.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import numpy as np 3 | import theano 4 | from pylearn2.utils import serial 5 | from pylearn2.datasets.transformer_dataset import TransformerDataset 6 | import cPickle 7 | import GTZAN_dataset 8 | 9 | import pdb 10 | 11 | def class_histogram(model, dataset): 12 | """ 13 | Function to compute the file-level classification error by classifying 14 | individual frames and then voting for the class with highest cumulative probability 15 | """ 16 | X = model.get_input_space().make_theano_batch() 17 | Y = model.fprop( X ) 18 | fprop = theano.function([X],Y) 19 | 20 | n_classes = dataset.raw.y.shape[1] 21 | confusion = np.zeros((n_classes, n_classes)) 22 | n_examples = len(dataset.raw.support) 23 | n_frames_per_file = dataset.raw.n_frames_per_file 24 | 25 | batch_size = n_frames_per_file 26 | data_specs = dataset.raw.get_data_specs() 27 | iterator = dataset.iterator(mode='sequential', 28 | batch_size=batch_size, 29 | data_specs=data_specs 30 | ) 31 | 32 | i=0 33 | histogram = [] 34 | for el in iterator: 35 | 36 | # display progress indicator 37 | sys.stdout.write('Classify progress: %2.0f%%\r' % (100*i/float(n_examples))) 38 | sys.stdout.flush() 39 | 40 | fft_data = np.array(el[0], dtype=np.float32) 41 | frame_labels = np.argmax(fprop(fft_data), axis=1) 42 | hist = np.bincount(frame_labels, minlength=n_classes) 43 | histogram.append(hist) 44 | 45 | i += batch_size 46 | 47 | return histogram 48 | 49 | #if __name__ == '__main__': 50 | 51 | # _, fold_file, model_file = sys.argv 52 | fold_file = 'GTZAN_1024-fold-1_of_4.pkl' 53 | model_file = './saved-rlu-505050/mlp_rlu_fold1_best.pkl' 54 | 55 | # get model 56 | model = serial.load(model_file) 57 | 58 | # get stanardized dictionary 59 | which_set = 'test' 60 | with open(fold_file) as f: 61 | config = cPickle.load(f) 62 | 63 | dataset = TransformerDataset( 64 | raw = GTZAN_dataset.GTZAN_dataset(config, which_set), 65 | transformer = GTZAN_dataset.GTZAN_standardizer(config) 66 | ) 67 | 68 | # test error 69 | #err, conf = frame_misclass_error(model, dataset) 70 | 71 | hist = class_histogram(model, dataset) 72 | hist = np.vstack(hist) 73 | 74 | test_files = np.array(config['test_files']) 75 | test_labels = 
test_files//100 76 | 77 | most_votes = np.argmax(hist,axis=0) 78 | most_rep_files = test_files[most_votes] 79 | most_rep_hist = hist[most_votes, :] 80 | 81 | prediction = np.argmax(hist, axis=1) 82 | top_pred = np.argsort(hist, axis=1) 83 | top_pred = top_pred[:,-1::-1] 84 | 85 | err_list = [] 86 | 87 | # for i, (l,p) in enumerate(zip(test_labels, prediction)): 88 | # if l != p: 89 | # err_list.append(i) 90 | 91 | for i, (l,p) in enumerate(zip(test_labels, top_pred)): 92 | if l not in p[:2]: 93 | err_list.append(i) 94 | 95 | 96 | ax_labels = ['blues', 'classical', 'country', 'disco', 'hiphop', 'jazz', 'metal', 'pop', 'reggae', 'rock'] 97 | 98 | n = err_list[18] 99 | err_file = test_files[n] 100 | err_hist = hist[n] 101 | pred_label = ax_labels[np.argmax(err_hist)] 102 | true_label = ax_labels[err_file//100] 103 | 104 | print ff[err_file] 105 | print pred_label 106 | 107 | -------------------------------------------------------------------------------- /utils/comp_ave_snr.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | import scikits.audiolab as audiolab 4 | 5 | # calc ave snr 6 | file_list = '/home/cmke/Datasets/_tzanetakis_genre/test_filtered.txt' 7 | 8 | with open(file_list) as f: 9 | files = [l.strip() for l in f.readlines()] 10 | 11 | base = '/home/cmke/Datasets/_tzanetakis_genre' 12 | 13 | dir_list = ['/home/cmke/Datasets/_tzanetakis_F_500_RSD_allies', 14 | '/home/cmke/Datasets/_tzanetakis_F_500_RSD_random', 15 | '/home/cmke/Datasets/_tzanetakis_F_500_RSD_jazz'] 16 | 17 | ign = 2048 18 | snr = [[],[],[]] 19 | for j,d in enumerate(dir_list): 20 | print d 21 | for i,f in enumerate(files): 22 | print 'iteration ', i 23 | x,_,_ = audiolab.wavread(os.path.join(base,f)) 24 | xhat,_,_ = audiolab.wavread(os.path.join(d,f)) 25 | 26 | L = min(len(x), len(xhat)) 27 | 28 | snr[j].append(20*np.log10(np.linalg.norm(x[ign:L-ign-1])/np.linalg.norm(np.abs(x[ign:L-ign-1]-xhat[ign:L-ign-1])+1e-12))) 29 | 30 | for d,s in zip(dir_list, snr): 31 | print 'Directory', d 32 | print 'Average SNR', np.mean(s) 33 | print 'Std SNR', np.std(s) 34 | 35 | -------------------------------------------------------------------------------- /utils/create_adversarial_dataset.py: -------------------------------------------------------------------------------- 1 | import os, sys, re, csv, cPickle, argparse 2 | from scikits import audiolab, samplerate 3 | from utils.read_mp3 import read_mp3 4 | from sklearn.externals import joblib 5 | import numpy as np 6 | import theano 7 | from theano import tensor as T 8 | from pylearn2.utils import serial 9 | from audio_dataset import AudioDataset, PreprocLayer 10 | from pylearn2.space import CompositeSpace, Conv2DSpace, VectorSpace, IndexSpace 11 | from pylearn2.datasets.dense_design_matrix import DefaultViewConverter 12 | import pylearn2.config.yaml_parse as yaml_parse 13 | from test_adversary import winfunc, compute_fft, overlap_add, griffin_lim_proj, find_adversary, aggregate_features 14 | import pdb 15 | 16 | 17 | def file_misclass_error_printf(dnn_model, root_dir, dataset, save_file, mode='all_same', label=0, snr=30, aux_model=None, aux_save_file=None, which_layers=None, save_adversary_audio=None, fwd_xform=None, back_xform=None): 18 | """ 19 | Function to compute the file-level classification error by classifying 20 | individual frames and then voting for the class with highest cumulative probability 21 | """ 22 | if fwd_xform is None: 23 | print 'fwd_xform=None, using identity' 24 | fwd_xform = lambda X: 
X
25 |     if back_xform is None:
26 |         print 'back_xform=None, using identity'
27 |         back_xform = lambda X: X
28 | 
29 |     n_classes = len(dataset.targets)
30 | 
31 |     X = dnn_model.get_input_space().make_theano_batch()
32 |     Y = dnn_model.fprop(X)
33 |     fprop_theano = theano.function([X],Y)
34 | 
35 |     input_space = dnn_model.get_input_space()
36 |     if isinstance(input_space, Conv2DSpace):
37 |         tframes, dim = input_space.shape
38 |         view_converter = DefaultViewConverter((tframes, dim, 1))
39 |     else:
40 |         dim = input_space.dim
41 |         tframes = 1
42 |         view_converter = None
43 | 
44 |     if view_converter is not None:
45 |         def fprop(batch):
46 |             nframes = batch.shape[0]
47 |             thop = 1.
48 |             sup = np.arange(0,nframes-tframes+1, np.int(tframes/thop))
49 | 
50 |             data = np.vstack([np.reshape(batch[i:i+tframes, :],(tframes*dim,)) for i in sup])
51 |             data = fwd_xform(data)
52 | 
53 |             return fprop_theano(view_converter.get_formatted_batch(data, input_space))
54 | 
55 |     else:
56 |         fprop = fprop_theano
57 | 
58 |     n_examples = len(dataset.file_list)
59 |     target_space = dnn_model.get_output_space() #VectorSpace(dim=n_classes)
60 |     feat_space = dnn_model.get_input_space() #VectorSpace(dim=dataset.nfft//2+1, dtype='complex64')
61 |     data_specs = (CompositeSpace((feat_space, target_space)), ("songlevel-features", "targets"))
62 |     iterator = dataset.iterator(mode='sequential', batch_size=1, data_specs=data_specs)
63 | 
64 |     if aux_model:
65 |         aux_fname = open(aux_save_file, 'w')
66 |         aux_writer = csv.writer(aux_fname, delimiter='\t')
67 | 
68 |     with open(save_file, 'w') as fname:
69 |         dnn_writer = csv.writer(fname, delimiter='\t')
70 |         for i,el in enumerate(iterator):
71 | 
72 |             # display progress indicator
73 |             sys.stdout.write('Progress: %2.0f%%\r' % (100*i/float(n_examples))); sys.stdout.flush()
74 | 
75 |             Mag, Phs = np.abs(el[0], dtype=np.float32), np.angle(el[0])
76 |             epsilon = np.linalg.norm(Mag)/Mag.shape[0]/10**(snr/20.)
77 | 
78 |             if mode == 'all_same':
79 |                 target = label
80 |             elif mode == 'perfect':
81 |                 target = el[1]
82 |             elif mode == 'random':
83 |                 target = np.random.randint(n_classes)
84 |             elif mode == 'all_wrong':
85 |                 cand = np.setdiff1d(np.arange(n_classes),np.array(el[1])) # remove ground truth label from set of options
86 |                 target = cand[np.random.randint(len(cand))]
87 | 
88 |             if 1: # re-read audio (seems to be bug when reading from h5)
89 |                 f = el[2]
90 |                 if f.endswith('.wav'):
91 |                     read_fun = audiolab.wavread
92 |                 elif f.endswith('.au'):
93 |                     read_fun = audiolab.auread
94 |                 elif f.endswith('.mp3'):
95 |                     read_fun = read_mp3
96 | 
97 |                 x, fstmp, _ = read_fun(os.path.join(root_dir, f))
98 | 
99 |                 # make mono
100 |                 if len(x.shape) != 1:
101 |                     x = np.sum(x, axis=1)/2.
102 | 
103 |                 seglen=30
104 |                 x = x[:fstmp*seglen]
105 | 
106 |                 fs = 22050
107 |                 if fstmp != fs:
108 |                     x = samplerate.resample(x, fs/float(fstmp), 'sinc_best')
109 | 
110 |                 Mag, Phs = compute_fft(x)
111 |                 Mag = Mag[:1200,:513]
112 |                 Phs = Phs[:1200,:513]
113 |                 epsilon = np.linalg.norm(Mag)/Mag.shape[0]/10**(snr/20.) 
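                # epsilon is the per-frame perturbation budget implied by the target
                # SNR: rearranging epsilon = ||Mag|| / nframes / 10**(snr/20.) gives
                # snr = 20*log10(||Mag|| / (nframes*epsilon)), so e.g. snr=30 permits
                # about 31.6x less distortion than snr=0 (and the snr=-300 passed in
                # from __main__ below effectively removes the constraint)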
114 | else: 115 | raise ValueError("Check that song-level iterator is indeed returning 'raw data'") 116 | 117 | X_adv, P_adv = find_adversary( 118 | model=dnn_model, 119 | X0=Mag, 120 | label=target, 121 | fwd_xform=fwd_xform, 122 | back_xform=back_xform, 123 | P0=np.hstack((Phs, -Phs[:,-2:-dataset.nfft/2-1:-1])), 124 | mu=.01, 125 | epsilon=epsilon, 126 | maxits=50, 127 | stop_thresh=0.5, 128 | griffin_lim=False#True 129 | ) 130 | 131 | if save_adversary_audio: 132 | 133 | nfft = 2*(X_adv.shape[1]-1) 134 | nhop = nfft//2 135 | x_adv = overlap_add(np.hstack((X_adv, X_adv[:,-2:-nfft//2-1:-1])) * np.exp(1j*P_adv), nfft, nhop) 136 | audiolab.wavwrite(x_adv, os.path.join(save_adversary_audio, el[2]), 22050, 'pcm16') 137 | 138 | #frame_labels = np.argmax(fprop(X_adv), axis=1) 139 | #hist = np.bincount(frame_labels, minlength=n_classes) 140 | 141 | fpass = fprop(X_adv) 142 | conf = np.sum(fpass, axis=0) / float(fpass.shape[0]) 143 | dnn_label = np.argmax(conf) #np.argmax(hist) # most used label 144 | true_label = el[1] 145 | 146 | # truncate to correct length 147 | ext = min(Mag.shape[0], X_adv.shape[0]) 148 | Mag = Mag[:ext,:] 149 | X_adv = X_adv[:ext,:] 150 | 151 | X_diff = Mag-X_adv 152 | out_snr = 20*np.log10(np.linalg.norm(Mag)/np.linalg.norm(X_diff)) 153 | 154 | dnn_writer.writerow([dataset.file_list[i], true_label, dnn_label, out_snr, conf[dnn_label]]) 155 | 156 | print 'Mode:{}, True label:{}, Adv label:{}, Sel label:{}, Conf:{}, Out snr: {}'.format(mode, true_label, target, dnn_label, conf[dnn_label], out_snr) 157 | if aux_model: 158 | fft_agg = aggregate_features(dnn_model, X_adv, which_layers) 159 | aux_vote = np.argmax(np.bincount(np.array(aux_model.predict(fft_agg), dtype='int'))) 160 | aux_writer.writerow([dataset.file_list[i], true_label, aux_vote]) 161 | print 'AUX adversarial label: {}'.format(aux_vote) 162 | if aux_model: 163 | aux_fname.close() 164 | print '' 165 | 166 | if __name__ == '__main__': 167 | ''' 168 | Variants: 169 | 1) Label all excerpts the same (e.g., all blues) 170 | 2) Perfect classification 171 | 3) Random classification 172 | ''' 173 | parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter, 174 | description='''Script to find/test adversarial examples with a dnn''') 175 | parser.add_argument('--dnn_model', help='dnn model to use for features') 176 | parser.add_argument('--aux_model', help='(optional) auxiliary model trained on dnn features (e.g. random forest)') 177 | parser.add_argument('--which_layers', nargs='*', type=int, help='(optional) layer(s) from dnn to be passed to auxiliary model') 178 | 179 | # three variants 180 | parser.add_argument('--mode', help='either all_same, perfect, random, all_wrong') 181 | parser.add_argument('--label', type=int, help='label to minimize loss on (only used in all_same mode)') 182 | parser.add_argument('--root_dir', help='dataset directory') 183 | 184 | parser.add_argument('--dnn_save_file', help='txt file to save results in') 185 | parser.add_argument('--aux_save_file', help='txt file to save results in') 186 | parser.add_argument('--save_adversary_audio', help='path to save adversaries') 187 | 188 | args = parser.parse_args() 189 | 190 | assert args.mode in ['all_same', 'perfect', 'random', 'all_wrong'] 191 | if args.mode == 'all_same' and not args.label: 192 | parser.error('--label x must be specified together with all_same mode') 193 | if args.aux_model and not args.which_layers: 194 | parser.error('--which_layers x1 x2 ... 
must be specified together with aux_model') 195 | if args.aux_model and not args.aux_save_file: 196 | parser.error('--aux_save_file x must be specified together with --aux_model') 197 | 198 | dnn_model = serial.load(args.dnn_model) 199 | if isinstance(dnn_model.layers[0], PreprocLayer): 200 | print 'Preprocessing layer detected' 201 | fwd_xform = None 202 | back_xform = None 203 | else: 204 | print 'No preprocessing layer detected' 205 | trainset = yaml_parse.load(dnn_model.dataset_yaml_src) 206 | fwd_xform = lambda batch: (batch - trainset.mean) * trainset.istd * trainset.mask 207 | back_xform = lambda batch: (batch / trainset.istd + trainset.mean) * trainset.mask 208 | 209 | p = re.compile(r"which_set.*'(train)'") 210 | dataset_yaml = p.sub("which_set: 'test'", dnn_model.dataset_yaml_src) 211 | testset = yaml_parse.load(dataset_yaml) 212 | 213 | if args.aux_model: 214 | aux_model = joblib.load(args.aux_model) 215 | else: 216 | aux_model = None 217 | 218 | file_misclass_error_printf( 219 | dnn_model=dnn_model, 220 | root_dir=args.root_dir, 221 | dataset=testset, 222 | save_file=args.dnn_save_file, 223 | mode=args.mode, 224 | label=args.label, 225 | snr=-300.,#15., 226 | aux_model=aux_model, 227 | aux_save_file=args.aux_save_file, 228 | which_layers=args.which_layers, 229 | save_adversary_audio=args.save_adversary_audio, 230 | fwd_xform=fwd_xform, 231 | back_xform=back_xform) 232 | 233 | -------------------------------------------------------------------------------- /utils/create_split_files.py: -------------------------------------------------------------------------------- 1 | import os, sys, tables 2 | import numpy as np 3 | 4 | def create_split_files(hdf5, ntrain, nvalid, ntest, path): 5 | # ntrain, nvalid, ntest are per class 6 | 7 | # extract metadata from dataset 8 | hdf5_file = tables.open_file(hdf5, mode='r') 9 | param = hdf5_file.get_node('/', 'Param') 10 | file_dict = param.file_dict[0] 11 | 12 | train_list = [] 13 | valid_list = [] 14 | test_list = [] 15 | 16 | rng = np.random.RandomState(111) 17 | for key, files in file_dict.iteritems(): # for all files that share a given label 18 | nfiles = len(files) 19 | perm = rng.permutation(nfiles) 20 | 21 | sup = np.arange(ntest) 22 | train_index = perm[:ntrain] 23 | valid_index = perm[ntrain:ntrain+nvalid] 24 | test_index = perm[ntrain+nvalid:ntrain+nvalid+ntest] 25 | 26 | train_list.append([files[i] for i in train_index]) 27 | valid_list.append([files[i] for i in valid_index]) 28 | test_list.append([files[i] for i in test_index]) 29 | 30 | # flatten lists 31 | train_list = sum(train_list,[]) 32 | valid_list = sum(valid_list,[]) 33 | test_list = sum(test_list,[]) 34 | 35 | with open(os.path.join(path, 'train-part.txt'), 'w') as f: 36 | for i in train_list: 37 | f.write('{}\n'.format(i)) 38 | 39 | with open(os.path.join(path, 'valid-part.txt'), 'w') as f: 40 | for i in valid_list: 41 | f.write('{}\n'.format(i)) 42 | 43 | with open(os.path.join(path, 'test-part.txt'), 'w') as f: 44 | for i in test_list: 45 | f.write('{}\n'.format(i)) 46 | 47 | hdf5_file.close() 48 | 49 | if __name__=='__main__': 50 | create_split_files(sys.argv[1], int(sys.argv[2]), int(sys.argv[3]), int(sys.argv[4]), sys.argv[5]) -------------------------------------------------------------------------------- /utils/filtered_classify.py: -------------------------------------------------------------------------------- 1 | import os, sys, re, csv, cPickle 2 | import numpy as np 3 | import scipy as sp 4 | import scikits.audiolab as audiolab 5 | import scikits.samplerate 
as samplerate 6 | from sklearn.externals import joblib 7 | import theano 8 | 9 | from pylearn2.utils import serial 10 | from audio_dataset import AudioDataset 11 | 12 | from test_adversary import aggregate_features, compute_fft 13 | import pdb 14 | 15 | def file_misclass_error_printf(dnn_model, aux_model, which_layers, data_dir, file_list, filter_cutoff, dnn_save_file, aux_save_file): 16 | 17 | # closures 18 | def dnn_classify(X): 19 | batch = dnn_model.get_input_space().make_theano_batch() 20 | fprop = theano.function([batch], dnn_model.fprop(batch)) 21 | prediction = np.argmax(np.sum(fprop(X), axis=0)) 22 | return prediction 23 | 24 | def aux_classify(X): 25 | Xagg = aggregate_features(dnn_model, X, which_layers) 26 | prediction = np.argmax(np.bincount(np.array(aux_model.predict(Xagg), dtype='int'))) 27 | return prediction 28 | 29 | # filter coeffs 30 | b,a = sp.signal.butter(4, filter_cutoff/(22050./2.)) 31 | 32 | dnn_file = open(dnn_save_file, 'w') 33 | aux_file = open(aux_save_file, 'w') 34 | label_list = {'blues':0, 'classical':1, 'country':2, 'disco':3, 'hiphop':4, 'jazz':5, 'metal':6, 'pop':7, 'reggae':8, 'rock':9} 35 | 36 | for i, fname in enumerate(file_list): 37 | print 'Processing file {} of {}'.format(i+1, len(file_list)) 38 | true_label = label_list[fname.split('/')[0]] 39 | 40 | x,_,_ = audiolab.wavread(os.path.join(data_dir, fname)) 41 | x = sp.signal.lfilter(b,a,x) 42 | X,_ = compute_fft(x) 43 | X = np.array(X[:,:513], dtype=np.float32) 44 | 45 | dnn_pred = dnn_classify(X) 46 | dnn_file.write('{fname}\t{true_label}\t{pred_label}\n'.format( 47 | fname=fname, 48 | true_label=true_label, 49 | pred_label=dnn_pred)) 50 | 51 | aux_pred = aux_classify(X) 52 | aux_file.write('{fname}\t{true_label}\t{pred_label}\n'.format( 53 | fname=fname, 54 | true_label=true_label, 55 | pred_label=aux_pred)) 56 | 57 | dnn_file.close() 58 | aux_file.close() 59 | 60 | def pp_array(array): # pretty printing 61 | for row in array: 62 | print ['%04.1f' % el for el in row] 63 | 64 | if __name__ == '__main__': 65 | 66 | import argparse 67 | parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter, 68 | description='') 69 | 70 | parser.add_argument('--dnn_model', help='Path to trained dnn model file') 71 | parser.add_argument('--aux_model', help='Path to trained aux model file') 72 | parser.add_argument('--data_dir', help='Adversarial dataset dir') 73 | parser.add_argument('--test_list', help='List of test files in dataset dir') 74 | parser.add_argument('--filter_cutoff', type=float, help='filter cutoff') 75 | parser.add_argument('--dnn_save_file', help='') 76 | parser.add_argument('--aux_save_file', help='') 77 | args = parser.parse_args() 78 | 79 | # get model 80 | dnn_model = serial.load(args.dnn_model) 81 | aux_model = joblib.load(args.aux_model) 82 | L = os.path.splitext(os.path.split(args.aux_model)[-1])[0].split('_L')[-1] 83 | if L=='All': 84 | which_layers = [1,2,3] 85 | else: 86 | which_layers = [int(L)] 87 | 88 | with open(args.test_list) as f: 89 | file_list = [l.strip() for l in f.readlines()] 90 | 91 | file_misclass_error_printf( 92 | dnn_model = dnn_model, 93 | aux_model = aux_model, 94 | which_layers = which_layers, 95 | data_dir = args.data_dir, 96 | file_list = file_list, 97 | filter_cutoff = args.filter_cutoff, 98 | dnn_save_file = args.dnn_save_file, 99 | aux_save_file = args.aux_save_file) 100 | -------------------------------------------------------------------------------- /utils/filtered_classify_batch.py: 
-------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | 4 | freqs = np.hstack(([20], np.arange(1000,12000,1000))) 5 | for f in freqs: 6 | print 'On cutoff: {}'.format(f) 7 | #os.system('''python utils/filtered_classify.py --dnn_model saved_models/dnn/S_500_RSD.pkl --aux_model saved_models/rf/S_500_RSD_AF_LAll.pkl --data_dir /home/cmke/Datasets/_tzanetakis_S_500_RSD_random/ --test_list gtzan/test_stratified.txt --filter_cutoff {hz:d} --dnn_save_file /home/cmke/Datasets/_tzanetakis_S_500_RSD_random/__dnn__/S_500_RSD-{hz:d}.txt --aux_save_file /home/cmke/Datasets/_tzanetakis_S_500_RSD_random/__rf__/S_500_RSD_AF_LAll-{hz:d}.txt'''.format(hz=f)) 8 | os.system('''python utils/filtered_classify.py --dnn_model saved_models/dnn/S_500_RSD.pkl --aux_model saved_models/rf/S_500_RSD_AF_LAll.pkl --data_dir /home/cmke/Datasets/_tzanetakis_genre/ --test_list gtzan/test_stratified.txt --filter_cutoff {hz:d} --dnn_save_file /home/cmke/Datasets/_tzanetakis_genre/__dnn__/S_500_RSD-{hz:d}.txt --aux_save_file /home/cmke/Datasets/_tzanetakis_genre/__rf__/S_500_RSD_AF_LAll-{hz:d}.txt'''.format(hz=f)) 9 | 10 | 11 | -------------------------------------------------------------------------------- /utils/plot_adversary_spectra.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import scikits.audiolab as audiolab 3 | from test_adversary import winfunc, compute_fft 4 | from matplotlib import pyplot as plt 5 | 6 | 7 | if __name__=='__main__': 8 | import argparse 9 | parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter, description='') 10 | parser.add_argument('--true_file') 11 | parser.add_argument('--adversary') 12 | args = parser.parse_args() 13 | 14 | # load sndfile 15 | x,_,_ = audiolab.wavread(args.true_file) 16 | x_adv,_,_ = audiolab.wavread(args.adversary) 17 | 18 | L = min(len(x), len(x_adv)) 19 | ign = 2048 20 | snr = 20*np.log10(np.linalg.norm(x[ign:L-ign-1])/np.linalg.norm(x[ign:L-ign-1]-x_adv[ign:L-ign-1])) 21 | print 'SNR: ', snr 22 | 23 | # STFT 24 | X = compute_fft(x)[0][:,:513] 25 | X_adv = compute_fft(x_adv)[0][:,:513] 26 | 27 | rng = 1+np.arange(400) 28 | Xt = X[rng,:] 29 | X = 20*np.log10(Xt) 30 | 31 | Xt_adv = X_adv[rng,:] 32 | X_adv = 20*np.log10(Xt_adv) 33 | # nrm = np.max(X)/1. 34 | # X /= nrm 35 | # X_adv /= nrm 36 | 37 | vmin = np.min(X) 38 | vmax = np.max(X) 39 | 40 | # Plotting... 
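    # figure layout: the top row shows three spectrograms on a shared dB scale,
    # namely the original excerpt, the adversary, and their difference; the bottom
    # panel overlays the magnitude spectrum of a single frame (N=10) of the
    # original, the adversary, and the residual between them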
41 |     plt.ion()
42 |     plt.figure()
43 | 
44 |     plt.subplot(2,3,1)
45 |     plt.imshow(X, extent=[0,11.025,len(rng),0], vmin=vmin, vmax=vmax)
46 |     plt.axis('tight')
47 |     plt.xlabel('Frequency (kHz)')
48 |     plt.ylabel('Time frame')
49 | 
50 |     plt.subplot(2,3,2)
51 |     plt.imshow(X_adv, extent=[0,11.025,len(rng),0], vmin=vmin, vmax=vmax)
52 |     plt.axis('tight')
53 |     plt.xlabel('Frequency (kHz)')
54 | 
55 |     plt.subplot(2,3,3)
56 |     plt.imshow(20*np.log10(np.abs(Xt_adv-Xt)), extent=[0,11.025,len(rng),0], vmin=vmin, vmax=vmax)
57 |     plt.axis('tight')
58 |     plt.xlabel('Frequency (kHz)')
59 | 
60 |     plt.subplot(2,1,2)
61 |     N = 10
62 |     x_range = np.arange(513)/513.*(22.050/2)
63 |     plt.plot(x_range, 20*np.log10(Xt[N,:]), color=(.4,.6,1,0.8), linewidth=2)
64 | 
65 |     plt.plot(x_range, 20*np.log10(Xt_adv[N,:]), color=(0,0,0,1), linewidth=1)
66 | 
67 |     plt.plot(x_range, 20*np.log10(np.abs(Xt[N,:]-Xt_adv[N,:])), '-', color=(1,0.6,0.1,0.6), linewidth=2)
68 |     plt.axis('tight')
69 |     plt.xlabel('Frequency (kHz)')
70 |     plt.ylabel('Magnitude (dB)')
71 | 
72 |     #plt.savefig('adversary_spectra.pdf', format='pdf')
-------------------------------------------------------------------------------- /utils/plot_conf.py: --------------------------------------------------------------------------------
1 | import matplotlib
2 | #matplotlib.use('Agg')
3 | from matplotlib import pyplot as plt
4 | 
5 | import sys, re, os
6 | import numpy as np
7 | from pylearn2.utils import serial
8 | import pylearn2.config.yaml_parse as yaml_parse
9 | 
10 | #from test_mlp_script import frame_misclass_error, file_misclass_error
11 | 
12 | def plot_conf_mat(confusion, title, labels):
13 |     augmented_confusion = augment_confusion_matrix(confusion)
14 | 
15 |     fig = plt.figure()
16 |     ax = fig.add_subplot(111)
17 |     ax.set_aspect(1)
18 |     ax.imshow(np.array(augmented_confusion), cmap=plt.cm.gray_r, interpolation='nearest')
19 | 
20 |     width,height = augmented_confusion.shape
21 |     for x in xrange(width):
22 |         for y in xrange(height):
23 |             if augmented_confusion[x][y]<50:
24 |                 color='k'
25 |             else:
26 |                 color='w'
27 |             ax.annotate('%2.1f'%augmented_confusion[x][y], xy=(y, x), horizontalalignment='center', verticalalignment='center',color=color, fontsize=9)
28 | 
29 |     ax.xaxis.tick_top()
30 |     plt.xticks(range(width), labels+['Pr'])
31 |     plt.yticks(range(height), labels+['F'])
32 | 
33 |     xlabels = ax.get_xticklabels()
34 |     for label in xlabels:
35 |         label.set_rotation(30)
36 | 
37 |     plt.xlabel(title)
38 |     plt.show()
39 | 
40 | def save_conf_mat(confusion, title, labels):
41 |     augmented_confusion = augment_confusion_matrix(confusion)
42 | 
43 |     fig = plt.figure()
44 |     ax = fig.add_subplot(111)
45 |     ax.set_aspect(1)
46 |     ax.imshow(np.array(augmented_confusion), cmap=plt.cm.gray_r, interpolation='nearest')
47 | 
48 |     thresh = np.max(augmented_confusion)
49 |     width,height = augmented_confusion.shape
50 |     for x in xrange(width):
51 |         for y in xrange(height):
52 |             if augmented_confusion[x][y]
[... truncated in this dump: the remainder of plot_conf.py, the files plot_individual_confs.py and plot_mean_std_recall.py, and all but the tail of read_mp3.py ...]
-------------------------------------------------------------------------------- /utils/read_mp3.py: --------------------------------------------------------------------------------
40 |     ... samples * 1.5 :
41 |         x=x[0::2] #grab every other sample
42 |     if channels>1 :
43 |         x=x.reshape(-1,channels)
44 |     return (x,fs,'int16')
-------------------------------------------------------------------------------- /utils/tensongs_exp.py: --------------------------------------------------------------------------------
1 | import os, argparse
2 | from scikits import audiolab, samplerate
3 | from matplotlib import pyplot as plt
4 | from sklearn.externals import joblib
5 | import numpy as np
6 | import scipy as sp
7 | import glob
8 | import theano
9 | from theano import tensor as T
10 | from pylearn2.utils import serial
11 | from audio_dataset import AudioDataset
12 | from pylearn2.datasets.dense_design_matrix import DefaultViewConverter
13 | from pylearn2.space import CompositeSpace, Conv2DSpace, VectorSpace, IndexSpace
14 | import pylearn2.config.yaml_parse as yaml_parse
15 | from utils.read_mp3 import read_mp3
16 | 
17 | from test_adversary import winfunc, compute_fft, overlap_add, griffin_lim_proj, find_adversary, aggregate_features
18 | 
19 | import pdb
20 | 
21 | def stripf(f):
22 |     fname = os.path.split(f)[-1]
23 |     return os.path.splitext(fname)[0]
24 | 
25 | if __name__ == '__main__':
26 | 
27 |     parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter,
28 |         description='Search for adversarial examples against a trained DNN (and an optional auxiliary classifier trained on its hidden-layer features), and save them as audio')
29 |     parser.add_argument('--dnn_model', help='dnn model to use for features')
30 |     parser.add_argument('--aux_model', help='(auxiliary) model trained on dnn features')
31 |     parser.add_argument('--labels', help='text file listing the categorical labels (comma- or newline-separated)')
32 |     parser.add_argument('--in_path', help='directory with files to test model on')
33 |     parser.add_argument('--out_path', help='location for saving adversary (name automatically generated)')
34 | 
35 |     args = parser.parse_args()
36 | 
37 |     # tunable alg. parameters
38 |     snr = 15.
39 |     mu = .1
40 |     stop_thresh = .9
41 |     maxits = 100
42 | 
43 |     with open(args.labels) as f:
44 |         lines = f.readlines()
45 |     if len(lines)==1: # assume comma separated, single line
46 |         label_list = lines[0].replace(' ','').split(',')
47 |     else:
48 |         label_list = [l.split()[0] for l in lines]
49 | 
50 |     targets = range(len(label_list))
51 | 
52 |     # load dnn model, fprop function
53 |     dnn_model = serial.load(args.dnn_model)
54 |     input_space = dnn_model.get_input_space()
55 |     batch = input_space.make_theano_batch()
56 |     fprop_theano = theano.function([batch], dnn_model.fprop(batch))
57 | 
58 |     if isinstance(input_space, Conv2DSpace):
59 |         tframes, dim = input_space.shape
60 |         view_converter = DefaultViewConverter((tframes, dim, 1))
61 |     else:
62 |         dim = input_space.dim
63 |         tframes = 1
64 |         view_converter = None
65 | 
66 |     if view_converter: # conv model: classify by sliding a tframes-long window over the spectrogram
67 |         def fprop(batch):
68 |             nframes = batch.shape[0]
69 |             thop = 1.
70 |             sup = np.arange(0,nframes-tframes+1, np.int(tframes/thop))
71 |             data = np.vstack([np.reshape(batch[i:i+tframes, :],(tframes*dim,)) for i in sup])
72 |             return fprop_theano(view_converter.get_formatted_batch(data, input_space))
73 |     else:
74 |         fprop = fprop_theano
75 | 
76 |     # load aux model
77 |     if args.aux_model:
78 |         aux_model = joblib.load(args.aux_model)
79 |         L = os.path.splitext(os.path.split(args.aux_model)[-1])[0].split('_L')[-1]
80 |         if L=='All':
81 |             which_layers = [1,2,3]
82 |         else:
83 |             which_layers = [int(L)]
84 |         aux_file = open(os.path.join(args.out_path, stripf(args.aux_model) + '.adversaries.txt'), 'w')
85 | 
86 |     dnn_file = open(os.path.join(args.out_path, stripf(args.dnn_model) + '.adversaries.txt'), 'w')
87 | 
88 |     # fft params
89 |     nfft = 2*(dim-1)
90 |     nhop = nfft//2
91 |     win = winfunc(2048)
92 | 
93 |     flist = glob.glob(os.path.join(args.in_path, '*'))
94 | 
95 |     for f in flist:
96 |         fname = stripf(f)
97 | 
98 |         if f.endswith('.wav'):
99 |             read_fun = audiolab.wavread
100 |         elif f.endswith('.au'):
101 |             read_fun = audiolab.auread
102 |         elif f.endswith('.mp3'):
103 |             read_fun = read_mp3
104 |         else:
105 |             continue
106 | 
107 |         x, fstmp, _ = read_fun(f)
108 | 
109 |         # make mono
110 |         if len(x.shape) != 1:
111 |             x = np.sum(x, axis=1)/2.
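        # (next: trim to a 30 s excerpt and resample to 22050 Hz, the rate
        # assumed by the feature extraction throughout this codebase)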
112 | 113 | seglen=30 114 | x = x[:fstmp*seglen] 115 | 116 | fs = 22050 117 | if fstmp != fs: 118 | x = samplerate.resample(x, fs/float(fstmp), 'sinc_best') 119 | 120 | # compute mag. spectra 121 | Mag, Phs = compute_fft(x, nfft, nhop) 122 | X0 = Mag[:,:dim] 123 | 124 | epsilon = np.linalg.norm(X0)/X0.shape[0]/10**(snr/20.) 125 | 126 | # write file name 127 | dnn_file.write('{}\t'.format(fname)) 128 | if args.aux_model: 129 | aux_file.write('{}\t'.format(fname)) 130 | 131 | for t in targets: 132 | 133 | # search for adversary 134 | X_adv, P_adv = find_adversary( 135 | model=dnn_model, 136 | X0=X0, 137 | label=t, 138 | P0=Phs, 139 | mu=mu, 140 | epsilon=epsilon, 141 | maxits=maxits, 142 | stop_thresh=stop_thresh, 143 | griffin_lim=True) 144 | 145 | # get time-domain representation 146 | x_adv = overlap_add( np.hstack((X_adv, X_adv[:,-2:-nfft/2-1:-1])) * np.exp(1j*P_adv)) 147 | 148 | minlen = min(len(x_adv), len(x)) 149 | x_adv = x_adv[:minlen] 150 | x = x[:minlen] 151 | out_snr = 20*np.log10(np.linalg.norm(x[nfft:-nfft]) / np.linalg.norm(x[nfft:-nfft]-x_adv[nfft:-nfft])) 152 | 153 | # dnn prediction 154 | pred = np.argmax(np.sum(fprop(X_adv), axis=0)) 155 | if pred == t: 156 | dnn_file.write('{}\t'.format(int(out_snr+.5))) 157 | else: 158 | dnn_file.write('{}\t'.format('na')) 159 | 160 | # aux prediction 161 | if args.aux_model: 162 | X_adv_agg = aggregate_features(dnn_model, X_adv, which_layers) 163 | pred = np.argmax(np.bincount(np.array(aux_model.predict(X_adv_agg), dtype='int'))) 164 | if pred == t: 165 | aux_file.write('{}\t'.format(int(out_snr+.5))) 166 | else: 167 | aux_file.write('{}\t'.format('na')) 168 | 169 | # SAVE ADVERSARY FILES 170 | out_file = os.path.join(args.out_path, 171 | '{fname}.{label}.adversary.{snr}dB.wav'.format( 172 | fname=fname, 173 | label=label_list[t], 174 | snr=int(out_snr+.5))) 175 | audiolab.wavwrite(x_adv, out_file, fs) 176 | 177 | dnn_file.write('\n'.format(fname)) 178 | if args.aux_model: 179 | aux_file.write('\n'.format(fname)) 180 | 181 | dnn_file.close() 182 | if args.aux_model: 183 | aux_file.close() 184 | -------------------------------------------------------------------------------- /utils/tensongs_exp_filtered.py: -------------------------------------------------------------------------------- 1 | import os, argparse 2 | import scikits.audiolab as audiolab 3 | import scikits.samplerate as samplerate 4 | from matplotlib import pyplot as plt 5 | from sklearn.externals import joblib 6 | import numpy as np 7 | import scipy as sp 8 | import glob 9 | import theano 10 | from theano import tensor as T 11 | from pylearn2.utils import serial 12 | from audio_dataset import AudioDataset 13 | from pylearn2.space import CompositeSpace, Conv2DSpace, VectorSpace, IndexSpace 14 | import pylearn2.config.yaml_parse as yaml_parse 15 | 16 | from test_adversary import winfunc, compute_fft, overlap_add, griffin_lim_proj, find_adversary, aggregate_features 17 | 18 | import pdb 19 | 20 | # def find_adversary(model, X0, label, P0=None, mu=.1, epsilon=.25, maxits=10, stop_thresh=0.5, griffin_lim=False): 21 | # ''' 22 | # Solves: 23 | 24 | # y* = argmin_y f(y; label) 25 | # s.t. y >= 0 and ||y-X0|| < e 26 | 27 | # where f(y) is the cost associated the network associates with the pair (y,label) 28 | 29 | # This can be solved using the projected gradient method: 30 | 31 | # min_y f(y) 32 | # s.t. y >= 0 and ||y-X0|| < e 33 | 34 | # z = max(0, y^k - mu.f'(y^k)) 35 | # y^k+1 = P(z) 36 | 37 | # P(z) = min_u ||u-z|| s.t. 
{u | ||u-X0|| < e } 38 | # Lagrangian(u,l) = L(u,l) = ||u-z|| + nu*(||u-X0|| - e) 39 | # dL/du = u-z + nu*(u-X0) = 0 40 | # u = (1+nu)^-1 (z + nu*X0) 41 | 42 | # KKT: 43 | # ||u-x|| = e 44 | # ||(1/(1+nu))(z + nu*x) - x|| = e 45 | # ||(1/(1+nu))z + ((nu/(1+nu))-1)x|| = e 46 | # ||(1/(1+nu))z - (1/(1+nu))x|| = e 47 | # (1/(1+nu))||z-x|| = e 48 | # nu = max(0,||z-x||/e - 1) 49 | 50 | # function inputs: 51 | 52 | # model - pylearn2 dnn model (implements fprop, cost) 53 | # X0 - an example that the model classifies correctly 54 | # label - an incorrect label 55 | # ''' 56 | # # convert integer label into one-hot vector 57 | # n_classes, n_examples = model.get_output_space().dim, X0.shape[0] 58 | # one_hot = np.zeros((n_examples, n_classes), dtype=np.float32) 59 | # one_hot[:,label] = 1 60 | 61 | # # Set-up gradient computation w/ Theano 62 | # in_batch = model.get_input_space().make_theano_batch() 63 | # out_batch = model.get_output_space().make_theano_batch() 64 | # cost = model.cost(out_batch, model.fprop(in_batch)) 65 | # dCost = T.grad(cost, in_batch) 66 | # grad = theano.function([in_batch, out_batch], dCost) 67 | # fprop = theano.function([in_batch], model.fprop(in_batch)) 68 | 69 | # # projected gradient: 70 | # last_pred = 0 71 | # #Y = np.array(np.random.rand(*X0.shape), dtype=np.float32) 72 | # Y = np.copy(X0) 73 | # Y_old = np.copy(Y) 74 | # t_old = 1 75 | # for i in xrange(maxits): 76 | 77 | # # gradient step 78 | # Z = Y - mu * n_examples * grad(Y, one_hot) 79 | 80 | # # non-negative projection 81 | # Z = Z * (Z>0) 82 | 83 | # if griffin_lim: 84 | # Z, P0 = griffin_lim_proj(np.hstack((Z, Z[:,-2:-nfft/2-1:-1])), P0, its=0) 85 | 86 | # # maximum allowable signal-to-noise projection 87 | # nu = np.linalg.norm((Z-X0))/n_examples/epsilon - 1 # lagrange multiplier 88 | # nu = nu * (nu>=0) 89 | # Y = (Z + nu*X0) / (1+nu) 90 | 91 | # # FISTA momentum 92 | # t = .5 + np.sqrt(1+4*t_old**2)/2. 
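#         # (FISTA-style momentum: the t-sequence above sets the extrapolation weight alpha used below)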
93 | #         alpha = (t_old - 1)/t
94 | #         Y += alpha * (Y - Y_old)
95 | #         Y_old = np.copy(Y)
96 | #         t_old = t
97 | 
98 | #         # stopping condition
99 | #         pred = np.sum(fprop(Y), axis=0)
100 | #         pred /= np.sum(pred)
101 | 
102 | #         #print 'iteration: {}, pred[label]: {}, nu: {}'.format(i, pred[label], nu)
103 | #         print 'iteration: {}, pred[label]: {}, nu: {}, snr: {}'.format(i, pred[label], nu, 20*np.log10(np.linalg.norm(X0)/np.linalg.norm(Y-X0)))
104 | 
105 | #         if pred[label] > stop_thresh:
106 | #             break
107 | #         elif pred[label] < last_pred + 1e-4:
108 | #             break
109 | #         last_pred = pred[label]
110 | 
111 | #     return Y, P0
112 | 
113 | # winfunc = lambda x: np.hanning(x)
114 | # def compute_fft(x, nfft=1024, nhop=512):
115 | 
116 | #     window = winfunc(nfft)
117 | #     nframes = int((len(x)-nfft)//nhop + 1)
118 | #     fft_data = np.zeros((nframes, nfft))
119 | 
120 | #     for i in xrange(nframes):
121 | #         sup = i*nhop + np.arange(nfft)
122 | #         fft_data[i,:] = x[sup] * window
123 | 
124 | #     fft_data = np.fft.fft(fft_data)
125 | #     return tuple((np.array(np.abs(fft_data), dtype=np.float32), np.array(np.angle(fft_data), dtype=np.float32)))
126 | 
127 | # def overlap_add(X, nfft=1024, nhop=512):
128 | 
129 | #     window = winfunc(nfft) # must use same window as compute_fft
130 | #     L = X.shape[0]*nhop + (nfft-nhop)
131 | #     x = np.zeros(L)
132 | #     win_sum = np.zeros(L)
133 | 
134 | #     for i, frame in enumerate(X):
135 | #         sup = i*nhop + np.arange(nfft)
136 | #         x[sup] += np.real(np.fft.ifft(frame)) * window
137 | #         win_sum[sup] += window **2 # ensure perfect reconstruction
138 | 
139 | #     return x/(win_sum + 1e-12)
140 | 
141 | # def griffin_lim_proj(Mag, Phs=None, its=4, nfft=1024, nhop=512):
142 | #     if Phs is None:
143 | #         Phs = np.pi * np.random.randn(*Mag.shape)
144 | 
145 | #     x = overlap_add(Mag * np.exp(1j*Phs), nfft, nhop)
146 | #     for i in xrange(its):
147 | #         _, Phs = compute_fft(x, nfft, nhop)
148 | #         x = overlap_add(Mag * np.exp(1j*Phs), nfft, nhop)
149 | 
150 | #     Mag, Phs = compute_fft(x, nfft, nhop)
151 | #     return np.array(Mag[:,:nfft//2+1], dtype=np.float32), Phs
152 | 
153 | 
154 | # def aggregate_features(model, X, which_layers=[3], win_size=200, step=100):
155 | #     assert np.max(which_layers) < len(model.layers)
156 | 
157 | #     n_classes, n_examples = model.get_output_space().dim, X.shape[0]
158 | #     in_batch = model.get_input_space().make_theano_batch()
159 | #     fprop = theano.function([in_batch], model.fprop(in_batch, return_all=True))
160 | #     output_data = fprop(X)
161 | #     feats = np.hstack([output_data[i] for i in which_layers])
162 | 
163 | #     agg_feat = []
164 | #     for i in xrange(0, feats.shape[0]-win_size, step):
165 | #         chunk = feats[i:i+win_size,:]
166 | #         agg_feat.append(np.hstack((np.mean(chunk, axis=0), np.std(chunk, axis=0))))
167 | 
168 | #     return np.vstack(agg_feat)
169 | 
170 | def stripf(f):
171 |     fname = os.path.split(f)[-1]
172 |     return os.path.splitext(fname)[0]
173 | 
174 | if __name__ == '__main__':
175 | 
176 |     parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter,
177 |         description='Search for adversarial examples, lowpass-filter them, and re-classify with the DNN and an auxiliary classifier')
178 |     parser.add_argument('--dnn_model', help='dnn model to use for features')
179 |     parser.add_argument('--aux_model', help='(auxiliary) model trained on dnn features')
180 |     parser.add_argument('--in_path', help='path prefix of .wav files to test the model on (globbed as <in_path>*.wav)')
181 |     parser.add_argument('--out_path', help='location for saving adversary (name automatically generated)')
182 | 
183 |     args = parser.parse_args()
184 | 
185 |     # tunable alg. parameters
186 |     snr = 15.
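    # (snr sets the minimum allowable SNR, in dB, of the perturbation; mu is the
    # gradient step size; stop_thresh the target confidence for the adversarial
    # label; maxits the iteration cap; cut_freq the lowpass cutoff in Hz)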
187 | mu = .05 188 | stop_thresh = .9 189 | maxits = 100 190 | cut_freq = 9000. 191 | 192 | label_list = ['blues', 'classical', 'country', 'disco', 'hiphop', 'jazz', 'metal', 'pop', 'reggae', 'rock'] 193 | targets = range(len(label_list)) 194 | 195 | # load dnn model, fprop function 196 | dnn_model = serial.load(args.dnn_model) 197 | input_space = dnn_model.get_input_space() 198 | batch = input_space.make_theano_batch() 199 | fprop = theano.function([batch], dnn_model.fprop(batch)) 200 | 201 | # load aux model 202 | aux_model = joblib.load(args.aux_model) 203 | L = os.path.splitext(os.path.split(args.aux_model)[-1])[0].split('_L')[-1] 204 | if L=='All': 205 | which_layers = [1,2,3] 206 | else: 207 | which_layers = [int(L)] 208 | 209 | # fft params 210 | nfft = 2*(input_space.dim-1) 211 | nhop = nfft//2 212 | win = winfunc(1024) 213 | 214 | # design lowpass filter. 215 | b,a = sp.signal.butter(4, cut_freq/(22050./2.)) 216 | 217 | flist = glob.glob(args.in_path +'*.wav') 218 | 219 | dnn_file = open(os.path.join(args.out_path, stripf(args.dnn_model) + '.adversaries.txt'), 'w') 220 | dnn_file_filt = open(os.path.join(args.out_path, stripf(args.dnn_model) + '.adversaries.filtered.txt'), 'w') 221 | aux_file = open(os.path.join(args.out_path, stripf(args.aux_model) + '.adversaries.txt'), 'w') 222 | aux_file_filt = open(os.path.join(args.out_path, stripf(args.aux_model) + '.adversaries.filtered.txt'), 'w') 223 | 224 | for f in flist: 225 | fname = stripf(f) 226 | 227 | # load audio file 228 | x, fs, fmt = audiolab.wavread(f) 229 | 230 | # make sure format agrees with training data 231 | if len(x.shape)!=1: 232 | print 'making mono:' 233 | x = np.sum(x, axis=1)/2. # mono 234 | if fs != 22050: 235 | print 'resampling to 22050 hz:' 236 | x = samplerate.resample(x, 22050./fs, 'sinc_best') 237 | fs = 22050 238 | 239 | # truncate input to multiple of hopsize 240 | nframes = (len(x)-nfft)/nhop 241 | x = x[:(nframes-1)*nhop + nfft] 242 | 243 | # smooth boundaries to prevent a click 244 | x[:512] *= win[:512] 245 | x[-512:] *= win[512:] 246 | 247 | # compute mag. spectra 248 | Mag, Phs = compute_fft(x, nfft, nhop) 249 | X0 = Mag[:,:input_space.dim] 250 | 251 | epsilon = np.linalg.norm(X0)/X0.shape[0]/10**(snr/20.) 
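        # per-frame perturbation budget implied by the target SNR:
        # 20*log10(||X0|| / ||X-X0||) >= snr  <=>  ||X-X0|| <= ||X0|| * 10**(-snr/20)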
252 | 253 | # write file name 254 | dnn_file.write('{}\t'.format(fname)) 255 | dnn_file_filt.write('{}\t'.format(fname)) 256 | aux_file.write('{}\t'.format(fname)) 257 | aux_file_filt.write('{}\t'.format(fname)) 258 | 259 | for t in targets: 260 | 261 | # search for adversary 262 | X_adv, P_adv = find_adversary( 263 | model=dnn_model, 264 | X0=X0, 265 | label=t, 266 | P0=Phs, 267 | mu=mu, 268 | epsilon=epsilon, 269 | maxits=maxits, 270 | stop_thresh=stop_thresh, 271 | griffin_lim=True) 272 | 273 | # get time-domain representation 274 | x_adv = overlap_add( np.hstack((X_adv, X_adv[:,-2:-nfft/2-1:-1])) * np.exp(1j*P_adv)) 275 | out_snr = 20*np.log10(np.linalg.norm(x[nfft:-nfft]) / np.linalg.norm(x[nfft:-nfft]-x_adv[nfft:-nfft])) 276 | 277 | # BEFORE FILTERING 278 | # =========================================== 279 | # dnn prediction 280 | pred = np.argmax(np.sum(fprop(X_adv), axis=0)) 281 | if pred == t: 282 | dnn_file.write('{}\t'.format(int(out_snr+.5))) 283 | else: 284 | dnn_file.write('{}\t'.format('na')) 285 | 286 | # aux prediction 287 | X_adv_agg = aggregate_features(dnn_model, X_adv, which_layers) 288 | pred = np.argmax(np.bincount(np.array(aux_model.predict(X_adv_agg), dtype='int'))) 289 | if pred == t: 290 | aux_file.write('{}\t'.format(int(out_snr+.5))) 291 | else: 292 | aux_file.write('{}\t'.format('na')) 293 | 294 | # filtered representation 295 | x_filt = sp.signal.lfilter(b,a,x_adv) 296 | Mag2, Phs2 = compute_fft(x_filt, nfft, nhop) 297 | X_adv_filt = Mag2[:,:input_space.dim] 298 | 299 | # AFTER FILTERING 300 | # ================================================== 301 | # dnn prediction 302 | pred = np.argmax(np.sum(fprop(X_adv_filt), axis=0)) 303 | if pred == t: 304 | dnn_file_filt.write('{}\t'.format('x')) 305 | else: 306 | dnn_file_filt.write('{}\t'.format('o')) 307 | 308 | # aux prediction 309 | X_adv_agg_filt = aggregate_features(dnn_model, X_adv_filt, which_layers) 310 | pred = np.argmax(np.bincount(np.array(aux_model.predict(X_adv_agg_filt), dtype='int'))) 311 | if pred == t: 312 | aux_file_filt.write('{}\t'.format('x')) 313 | else: 314 | aux_file_filt.write('{}\t'.format('o')) 315 | 316 | # SAVE ADVERSARY FILES 317 | out_file = os.path.join(args.out_path, 318 | '{fname}.{label}.adversary.{snr}dB.wav'.format( 319 | fname=fname, 320 | label=label_list[t], 321 | snr=int(out_snr+.5))) 322 | audiolab.wavwrite(x_adv, out_file, fs, fmt) 323 | 324 | out_file2 = os.path.join(args.out_path, 325 | '{fname}.{label}.adversary.filtered.wav'.format( 326 | fname=fname, 327 | label=label_list[t])) 328 | audiolab.wavwrite(x_filt, out_file2, fs, fmt) 329 | 330 | dnn_file.write('\n'.format(fname)) 331 | dnn_file_filt.write('\n'.format(fname)) 332 | aux_file.write('\n'.format(fname)) 333 | aux_file_filt.write('\n'.format(fname)) 334 | 335 | dnn_file.close() 336 | dnn_file_filt.close() 337 | aux_file.close() 338 | aux_file_filt.close() 339 | -------------------------------------------------------------------------------- /yaml_scripts/mlp_rlu.yaml: -------------------------------------------------------------------------------- 1 | !obj:pylearn2.train.Train { 2 | dataset : &trainset !obj:audio_dataset.AudioDataset { 3 | which_set : 'train', 4 | config : &fold !pkl: "%(fold_config)s" 5 | }, 6 | model : !obj:pylearn2.models.mlp.MLP { 7 | nvis : 513, 8 | layers : [ 9 | !obj:audio_dataset.PreprocLayer { 10 | config : *fold, 11 | proc_type : 'standardize' 12 | }, 13 | !obj:pylearn2.models.mlp.RectifiedLinear { 14 | layer_name : 'h0', 15 | dim : %(dim_h0)i, 16 | irange : &irange .1 17 | }, 18 | 
!obj:pylearn2.models.mlp.RectifiedLinear { 19 | layer_name : 'h1', 20 | dim : %(dim_h1)i, 21 | irange : *irange 22 | }, 23 | !obj:pylearn2.models.mlp.RectifiedLinear { 24 | layer_name : 'h2', 25 | dim : %(dim_h2)i, 26 | irange : *irange 27 | }, 28 | !obj:pylearn2.models.mlp.Softmax { 29 | n_classes : 10, 30 | layer_name : 'y', 31 | irange : *irange 32 | } 33 | ] 34 | }, 35 | algorithm : !obj:pylearn2.training_algorithms.sgd.SGD { 36 | learning_rate : .01, 37 | learning_rule : !obj:pylearn2.training_algorithms.learning_rule.Momentum { 38 | init_momentum : 0.5 39 | }, 40 | train_iteration_mode : 'shuffled_sequential', #'batchwise_shuffled_sequential', 41 | monitor_iteration_mode : 'shuffled_sequential', #'batchwise_shuffled_sequential', 42 | #batches_per_iter : 500, 43 | batch_size : 1200, 44 | monitoring_dataset : { 45 | 'train' : *trainset, 46 | 'valid' : !obj:audio_dataset.AudioDataset { 47 | which_set : 'valid', 48 | config : *fold 49 | } 50 | }, 51 | termination_criterion: !obj:pylearn2.termination_criteria.MonitorBased { 52 | channel_name : 'valid_y_misclass', 53 | prop_decrease : .001, 54 | N: 10 55 | }#,cost : !obj:pylearn2.costs.mlp.Default {} 56 | }, 57 | extensions: [ 58 | !obj:pylearn2.train_extensions.best_params.MonitorBasedSaveBest { 59 | channel_name: 'valid_y_misclass', 60 | save_path: "%(best_model_save_path)s" 61 | }, 62 | !obj:pylearn2.training_algorithms.learning_rule.MomentumAdjustor { 63 | start: 1, 64 | saturate: 50, 65 | final_momentum: .9 66 | }, 67 | !obj:pylearn2.training_algorithms.sgd.LinearDecayOverEpoch { 68 | start: 1, 69 | saturate: 50, 70 | decay_factor: .01 71 | }, 72 | ], 73 | save_path : "%(save_path)s", 74 | save_freq : 1 75 | } -------------------------------------------------------------------------------- /yaml_scripts/mlp_rlu_conv2.yaml: -------------------------------------------------------------------------------- 1 | !obj:pylearn2.train.Train { 2 | dataset : &trainset !obj:audio_dataset.AudioDataset { 3 | which_set : 'train', 4 | config : &fold !pkl: "%(fold_config)s" 5 | }, 6 | model : !obj:pylearn2.models.mlp.MLP { 7 | batch_size : 5, 8 | input_space: !obj:pylearn2.space.Conv2DSpace { 9 | shape: [100, 513], 10 | num_channels: 1 11 | }, 12 | layers : [ 13 | !obj:pylearn2.models.mlp.ConvRectifiedLinear { 14 | layer_name : 'h0', 15 | output_channels : 32, 16 | kernel_shape : [4, 400], 17 | pool_shape : [4, 4], 18 | pool_stride : [2, 2], 19 | irange : &irange .01 20 | }, 21 | !obj:pylearn2.models.mlp.ConvRectifiedLinear { 22 | layer_name : 'h1', 23 | output_channels : 32, 24 | kernel_shape : [8, 8], 25 | pool_shape : [4, 4], 26 | pool_stride : [2, 2], 27 | irange : *irange 28 | }, 29 | !obj:pylearn2.models.mlp.RectifiedLinear { 30 | layer_name : 'h2', 31 | dim : 50, 32 | irange : *irange 33 | }, 34 | !obj:pylearn2.models.mlp.Softmax { 35 | n_classes : 10, 36 | layer_name : 'y', 37 | irange : *irange 38 | } 39 | ] 40 | }, 41 | algorithm : !obj:pylearn2.training_algorithms.sgd.SGD { 42 | learning_rate : .001, 43 | learning_rule : !obj:pylearn2.training_algorithms.learning_rule.Momentum { 44 | init_momentum : 0.5 45 | }, 46 | train_iteration_mode : 'even_shuffled_sequential', #'batchwise_shuffled_sequential', 47 | monitor_iteration_mode : 'even_shuffled_sequential', #'batchwise_shuffled_sequential', 48 | #batches_per_iter : 500, 49 | #batch_size : 1200, 50 | monitoring_dataset : { 51 | 'train' : *trainset, 52 | 'valid' : !obj:audio_dataset.AudioDataset { 53 | which_set : 'valid', 54 | config : *fold 55 | } 56 | }, 57 | termination_criterion: 
!obj:pylearn2.termination_criteria.And { 58 | criteria: [ 59 | !obj:pylearn2.termination_criteria.MonitorBased { 60 | channel_name: "valid_y_misclass", 61 | prop_decrease: 0.001, 62 | N: 10 63 | }, 64 | !obj:pylearn2.termination_criteria.EpochCounter { 65 | max_epochs: 30 66 | } 67 | ] 68 | }, 69 | 70 | }, 71 | extensions: [ 72 | !obj:pylearn2.train_extensions.best_params.MonitorBasedSaveBest { 73 | channel_name: 'valid_y_misclass', 74 | save_path: "%(best_model_save_path)s" 75 | }, 76 | !obj:pylearn2.training_algorithms.learning_rule.MomentumAdjustor { 77 | start: 1, 78 | saturate: 100, 79 | final_momentum: .9 80 | }, 81 | !obj:pylearn2.training_algorithms.sgd.LinearDecayOverEpoch { 82 | start: 1, 83 | saturate: 100, 84 | decay_factor: .01 85 | }, 86 | ], 87 | save_path : "%(save_path)s", 88 | save_freq : 1 89 | } -------------------------------------------------------------------------------- /yaml_scripts/mlp_rlu_dropout.yaml: -------------------------------------------------------------------------------- 1 | !obj:pylearn2.train.Train { 2 | dataset : &trainset !obj:audio_dataset.AudioDataset { 3 | which_set : 'train', 4 | config : &fold !pkl: "%(fold_config)s" 5 | }, 6 | model : !obj:pylearn2.models.mlp.MLP { 7 | nvis : 513, 8 | layers : [ 9 | !obj:audio_dataset.PreprocLayer { 10 | config : *fold, 11 | proc_type : 'standardize' 12 | }, 13 | !obj:pylearn2.models.mlp.RectifiedLinear { 14 | layer_name : 'h0', 15 | dim : %(dim_h0)i, 16 | irange : &irange .1 17 | }, 18 | !obj:pylearn2.models.mlp.RectifiedLinear { 19 | layer_name : 'h1', 20 | dim : %(dim_h1)i, 21 | irange : *irange 22 | }, 23 | !obj:pylearn2.models.mlp.RectifiedLinear { 24 | layer_name : 'h2', 25 | dim : %(dim_h2)i, 26 | irange : *irange 27 | }, 28 | !obj:pylearn2.models.mlp.Softmax { 29 | n_classes : 10, 30 | layer_name : 'y', 31 | irange : *irange 32 | } 33 | ] 34 | }, 35 | algorithm : !obj:pylearn2.training_algorithms.sgd.SGD { 36 | learning_rate : .1, 37 | learning_rule : !obj:pylearn2.training_algorithms.learning_rule.Momentum { 38 | init_momentum : 0.5 39 | }, 40 | train_iteration_mode : 'shuffled_sequential', #'batchwise_shuffled_sequential', 41 | monitor_iteration_mode : 'shuffled_sequential', #'batchwise_shuffled_sequential', 42 | #batches_per_iter : 500, 43 | batch_size : 1200, 44 | monitoring_dataset : { 45 | 'train' : *trainset, 46 | 'valid' : !obj:audio_dataset.AudioDataset { 47 | which_set : 'valid', 48 | config : *fold 49 | } 50 | }, 51 | termination_criterion: !obj:pylearn2.termination_criteria.MonitorBased { 52 | channel_name : 'valid_y_misclass', 53 | prop_decrease : .001, 54 | N: 50 55 | }, 56 | cost: !obj:pylearn2.costs.mlp.dropout.Dropout { 57 | default_input_include_prob : .75, 58 | input_include_probs: { 'pre': 1., 'h0': 1. }, 59 | input_scales: { 'pre': 1., 'h0' : 1. } 60 | } 61 | }, 62 | extensions: [ 63 | !obj:pylearn2.train_extensions.best_params.MonitorBasedSaveBest { 64 | channel_name: 'valid_y_misclass', 65 | save_path: "%(best_model_save_path)s" 66 | }, 67 | !obj:pylearn2.training_algorithms.learning_rule.MomentumAdjustor { 68 | start: 1, 69 | saturate: 200, 70 | final_momentum: .9 71 | }, 72 | !obj:pylearn2.training_algorithms.sgd.LinearDecayOverEpoch { 73 | start: 1, 74 | saturate: 200, 75 | decay_factor: .01 76 | } 77 | ], 78 | save_path : "%(save_path)s", 79 | save_freq : 1 80 | } --------------------------------------------------------------------------------
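
The yaml_scripts files above are templates: their %(...)s / %(...)i placeholders are filled in before parsing. A minimal sketch of that workflow, assuming hypothetical paths and layer widths (train_mlp_script.py presumably performs the equivalent steps in this repository):

```python
# Hypothetical sketch: fill a yaml_scripts template and run pylearn2 training.
import pylearn2.config.yaml_parse as yaml_parse

params = {
    'fold_config': '/path/to/partition_configuration.pkl',  # hypothetical path
    'dim_h0': 50, 'dim_h1': 50, 'dim_h2': 50,               # hidden-layer widths
    'save_path': '/path/to/model_file.pkl',                 # hypothetical path
    'best_model_save_path': '/path/to/model_file_best.pkl', # hypothetical path
}

with open('yaml_scripts/mlp_rlu.yaml') as f:
    train = yaml_parse.load(f.read() % params)  # %-substitution, then YAML parse
train.main_loop()
```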