├── .gitignore
├── README.md
├── adversary_dataset.py
├── audio_dataset.py
├── fine_tune_pretrained_mlp.py
├── gtzan
│   ├── test_filtered.txt
│   ├── test_stratified.txt
│   ├── train_filtered.txt
│   ├── train_stratified.txt
│   ├── valid_filtered.txt
│   └── valid_stratified.txt
├── hpc_scripts
│   ├── README.md
│   ├── generate_gbar_jobs.py
│   ├── generate_gbar_jobs_dnn.py
│   ├── generate_gbar_jobs_rf.py
│   ├── generate_gbar_jobs_rf2.py
│   ├── generate_jobs.sh
│   ├── gpu_to_cpu_pkl.py
│   └── requirements.txt
├── lmd
│   ├── lmd_prep.sh
│   ├── lmd_train.sh
│   └── lmd_train_conv.sh
├── lmd_af
│   ├── LMD_AF_split_conv.pkl
│   ├── LMD_AF_split_dnn.pkl
│   ├── lmd_af_conv_model.pkl
│   ├── lmd_af_dnn_model.pkl
│   └── mlp_rlu_dropout_adversary.yaml
├── prepare_dataset.py
├── pretrain_layers.py
├── svm_train_test.py
├── test_mlp_script.py
├── train_classifier_on_dnn_feats.py
├── train_mlp_conv_script.py
├── train_mlp_script.py
├── utils
│   ├── __init__.py
│   ├── calc_grad.py
│   ├── class_histogram.py
│   ├── comp_ave_snr.py
│   ├── create_adversarial_dataset.py
│   ├── create_split_files.py
│   ├── filtered_classify.py
│   ├── filtered_classify_batch.py
│   ├── plot_adversary_spectra.py
│   ├── plot_conf.py
│   ├── plot_individual_confs.py
│   ├── plot_mean_std_recall.py
│   ├── read_mp3.py
│   ├── tensongs_exp.py
│   ├── tensongs_exp_filtered.py
│   ├── test_adversary.py
│   └── test_adversary_linearized.py
└── yaml_scripts
    ├── mlp_rlu.yaml
    ├── mlp_rlu_conv2.yaml
    └── mlp_rlu_dropout.yaml

/.gitignore:
--------------------------------------------------------------------------------
*~
*.pkl
*.h5
*.pyc
.DS_Store

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
## About
This repository contains research code for experimenting with deep neural networks (DNNs) for music genre recognition (MGR). For the moment the repository contains many test scripts and should be considered unstable (i.e., the code is subject to change given our experimental needs). Nonetheless, the instructions below should help you reproduce our experiments and identify which files are most important for the basic functionality.

## Requirements
- Python (tested with version 2.7; note that Python 3 contains many changes that might introduce bugs in this code)
- NumPy
- SciPy
- PyTables (requires numexpr and libhdf5)
- Theano
- Pylearn2
- scikits.audiolab (and libsndfile)
- scikit-learn

You can find some setup tips in the hpc_scripts folder of this repository.

## V2.0 Instructions
This version is more flexible than the previous one and has been designed to work with generic datasets (not only the Tzanetakis dataset), arbitrary categorical labels, and arbitrary excerpt lengths.

### Dataset organization:
Audio files must be uncompressed, in either WAV or AU format; many different directory structures are permissible. However, there must be a way of specifying the categorical label for each file in the dataset. This can be done either by embedding the label in the filename or in the name of the parent folder (the folder name always takes precedence in the case of a conflict).

In order to handle large datasets that may not fit into RAM, this code requires that the dataset first be saved as an HDF5 file, which can be partially loaded into RAM on demand during training and testing. The script prepare_dataset.py will search for the dataset files and prepare the HDF5 file.
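For reference, you can peek inside the generated file with PyTables. A minimal sketch (the node names `Data`/`Param` and their fields are the ones read by audio_dataset.py in this repository; the path is a placeholder):

```
import tables

# open the dataset written by prepare_dataset.py (read-only)
hdf5 = tables.open_file('/path/to/save/dataset.hdf5', mode='r')
data = hdf5.get_node('/', 'Data')    # data.X: spectrogram frames, data.y: one-hot targets
param = hdf5.get_node('/', 'Param')  # metadata: file_index, file_dict, label_list, targets, fft
print data.X.shape, data.y.shape
print param.fft[0]['nfft']           # FFT size used during preparation
hdf5.close()
```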
Furthermore, the prepare_dataset.py script can generate train/validation/test configuration files that specify the partition to be used in an experiment (e.g., 10-fold cross-validation). The partition configuration contains important metadata, such as the train/valid/test files (and their indices in the HDF5 file), as well as the mean and standard deviation of the training set (which can be used to standardize the data for training, validation, and testing). See the appendix at the end of this README for a quick way to inspect a partition configuration.

The following instructions demonstrate how to use the code:
#### 1. Prepare the dataset and partition configuration file(s):

```
python prepare_dataset.py \
/path/to/dataset \
/path/to/label_list.txt \
--hdf5 /path/to/save/dataset.hdf5 \
--train_prop 0.5 \
--valid_prop 0.25 \
--test_prop 0.25 \
--partition_name /path/to/save/partition_configuration.pkl \
--compute_std \
--tframes 100
```

This will create the HDF5 dataset file and generate (1/test_prop) stratified partitions. The file label_list.txt is a comma- or newline-separated list of the categorical labels in the dataset (which are matched against file and/or folder names).

Alternatively, the user can supply lists of files when creating the partition:

```
python prepare_dataset.py \
/path/to/dataset \
/path/to/label_list.txt \
--hdf5 /path/to/save/dataset.hdf5 \
--train /path/to/train_list.txt \
--valid /path/to/valid_list.txt \
--test /path/to/test_list.txt \
--partition_name /path/to/save/partition_configuration.pkl \
--compute_std \
--tframes 100
```

The lists should be newline-separated and contain the relative path to each file (from the root folder of the dataset). For example, if the directory structure is as follows:

    /root/blues/file.wav
    /root/jazz/file.wav
    .
    .
    .

then the training list text file should look like this:

    blues/file.wav
    jazz/file.wav

Run `python prepare_dataset.py --help` to see a full list of options.

#### 2. Train a DNN:

```
python train_mlp_script.py \
/path/to/partition_configuration.pkl \
/path/to/yaml_config_file.yaml \
--nunits 50 \
--output /path/to/save/model_file.pkl
```

Some YAML configuration files are provided in the folder yaml_scripts (but you can write your own for different experiments).

#### 3. Test a previously trained and saved DNN:

```
python test_mlp_script.py \
/path/to/saved/model_file.pkl \
--majority_vote
```

The model knows which dataset it was trained on and will use the associated test set. An alternative test set can also be specified:

```
python test_mlp_script.py \
/path/to/saved/model_file.pkl \
--testset /path/to/alternate/partition_configuration.pkl \
--save_file /path/to/savefile.txt
```

`--save_file` lets the user save the test results to a file.

## V1.0 Instructions
This version has now been removed, but it can be checked out as a branch using the v1.0 tag.
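## Appendix: Inspecting a Partition Configuration
The partition configuration is an ordinary pickled dictionary. A minimal sketch of how to inspect one (Python 2, matching the rest of this codebase; the keys shown are the ones read by audio_dataset.py, and the path is a placeholder):

```
import cPickle
import numpy as np

# load a partition configuration produced by prepare_dataset.py
with open('/path/to/save/partition_configuration.pkl') as f:
    config = cPickle.load(f)

print config['hdf5']              # path to the associated HDF5 dataset file
print len(config['train_files'])  # files in the training split
print len(config['train'])        # frame indices (support) of the training split
print config['tframes']           # time frames per training example
print np.sqrt(config['var'])      # training-set std deviation (mean is in config['mean'])
```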
108 | -------------------------------------------------------------------------------- /adversary_dataset.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import functools 3 | import tables 4 | 5 | from pylearn2.datasets.dataset import Dataset 6 | from pylearn2.datasets.dense_design_matrix import DenseDesignMatrixPyTables, DefaultViewConverter 7 | from pylearn2.blocks import Block 8 | from pylearn2.space import CompositeSpace, Conv2DSpace, VectorSpace, IndexSpace 9 | from pylearn2.utils.iteration import SubsetIterator, FiniteDatasetIterator, resolve_iterator_class 10 | from pylearn2.utils import safe_zip, safe_izip 11 | 12 | from pylearn2.datasets import control 13 | from pylearn2.utils.exc import reraise_as 14 | from pylearn2.utils.rng import make_np_rng 15 | from pylearn2.utils import contains_nan 16 | from pylearn2.models.mlp import MLP, Linear, PretrainedLayer 17 | from pylearn2.models.autoencoder import Autoencoder 18 | 19 | #from utils.test_adversary_linearized import find_adversary 20 | import theano 21 | from theano import config 22 | from theano import tensor as T 23 | 24 | import pdb 25 | 26 | class AdversaryDataset(DenseDesignMatrixPyTables): 27 | def __init__(self, config, adv_model, which_set='train'): 28 | 29 | keys = ['train', 'test', 'valid'] 30 | assert which_set in keys 31 | 32 | # load hdf5 metadata 33 | self.hdf5 = tables.open_file( config['hdf5'], mode='r') 34 | data = self.hdf5.get_node('/', 'Data') 35 | param = self.hdf5.get_node('/', 'Param') 36 | self.file_index = param.file_index[0] 37 | self.file_dict = param.file_dict[0] 38 | self.label_list = param.label_list[0] 39 | self.targets = param.targets[0] 40 | self.nfft = param.fft[0]['nfft'] 41 | 42 | # load parition information 43 | self.support = config[which_set] 44 | self.file_list = config[which_set+'_files'] 45 | self.mean = config['mean'] 46 | self.mean = self.mean.reshape((np.prod(self.mean.shape),)) 47 | self.var = config['var'] 48 | self.var = self.var.reshape((np.prod(self.var.shape),)) 49 | self.istd = np.reciprocal(np.sqrt(self.var)) 50 | self.mask = (self.istd < 20) 51 | self.tframes = config['tframes'] 52 | 53 | # setup adversary 54 | self.adv_model = adv_model 55 | in_batch = adv_model.get_input_space().make_theano_batch() 56 | out_batch = adv_model.get_output_space().make_theano_batch() 57 | cost = adv_model.cost(out_batch, adv_model.fprop(in_batch)) 58 | dCost = T.grad(cost, in_batch) 59 | 60 | grad_theano = theano.function([in_batch, out_batch], dCost) 61 | fprop_theano = theano.function([in_batch], adv_model.fprop(in_batch)) 62 | fcost_theano = theano.function([in_batch, out_batch], cost) 63 | 64 | self.input_space = adv_model.get_input_space() 65 | if isinstance(self.input_space, Conv2DSpace): 66 | tframes, dim = self.input_space.shape 67 | view_converter = DefaultViewConverter((tframes, dim, 1)) 68 | 69 | def grad(batch, labels): 70 | topo_view = grad_theano(view_converter.get_formatted_batch(batch, self.input_space), labels) 71 | return view_converter.topo_view_to_design_mat(topo_view) 72 | 73 | def fprop(batch): 74 | return fprop_theano(view_converter.get_formatted_batch(batch, self.input_space)) 75 | 76 | def fcost(batch, labels): 77 | return fcost_theano(view_converter.get_formatted_batch(batch, self.input_space), labels) 78 | 79 | self.grad = grad 80 | self.fprop = fprop 81 | self.fcost = fcost 82 | 83 | 84 | super(AdversaryDataset, self).__init__(X=data.X, y=data.y, 85 | view_converter=view_converter) 86 | 87 | else: 88 | dim = 
self.input_space.dim 89 | tframes = 1 90 | view_converter = None 91 | 92 | self.grad = grad_theano 93 | self.fprop = fprop_theano 94 | self.fcost = fcost_theano 95 | 96 | super(AdversaryDataset, self).__init__(X=data.X, y=data.y) 97 | 98 | def __del__(self): 99 | self.hdf5.close() 100 | 101 | def create_adversary_from_batch(self, batch, label, mu=0.25, snr=15): 102 | #n_examples = batch.shape[0] 103 | #epsilon = np.linalg.norm(batch)/n_examples/10**(snr/20.) 104 | 105 | g = self.grad(batch, label) #* n_examples 106 | Z = batch - mu * np.sign(g) 107 | if self.tframes==1: Z = Z * (Z>0) 108 | 109 | #nu = np.linalg.norm((Z-batch))/n_examples/epsilon - 1 # lagrange multiplier 110 | #nu = nu * (nu>=0) 111 | #Y = (Z + nu*batch) / (1+nu) 112 | return Z#Y 113 | 114 | def standardize(self, batch): 115 | return (batch - self.mean) * self.istd * self.mask 116 | 117 | @functools.wraps(Dataset.iterator) 118 | def iterator(self, mode=None, batch_size=1, num_batches=None, 119 | topo=None, targets=None, rng=None, data_specs=None, 120 | return_tuple=False): 121 | ''' 122 | Copied from pylearn2 superclass in order to return custom iterator. 123 | Two different iterators are available, depending on the data_specs. 124 | 1. If the data_specs source is 'features' a framelevel iterator is returned 125 | (each call to next() returns a single frame) 126 | 2. If the data_specs source is 'songlevel-features' a songlevel iterator is returned 127 | (each call to next() returns all the frames associated with a given song in the dataset) 128 | ''' 129 | if data_specs is None: 130 | data_specs = self._iter_data_specs 131 | else: 132 | self.data_specs = data_specs 133 | 134 | # If there is a view_converter, we have to use it to convert 135 | # the stored data for "features" into one that the iterator 136 | # can return. 
137 | space, source = data_specs 138 | if isinstance(space, CompositeSpace): 139 | sub_spaces = space.components 140 | sub_sources = source 141 | else: 142 | sub_spaces = (space,) 143 | sub_sources = (source,) 144 | 145 | convert = [] 146 | for sp, src in safe_zip(sub_spaces, sub_sources): 147 | if (src == 'features' or src == 'songlevel-features') and \ 148 | getattr(self, 'view_converter', None) is not None: 149 | conv_fn = (lambda batch, self=self, space=sp: 150 | self.view_converter.get_formatted_batch(batch, 151 | space)) 152 | else: 153 | conv_fn = None 154 | 155 | convert.append(conv_fn) 156 | 157 | # TODO: Refactor 158 | if mode is None: 159 | if hasattr(self, '_iter_subset_class'): 160 | mode = self._iter_subset_class 161 | else: 162 | raise ValueError('iteration mode not provided and no default ' 163 | 'mode set for %s' % str(self)) 164 | else: 165 | mode = resolve_iterator_class(mode) 166 | 167 | if num_batches is None: 168 | num_batches = getattr(self, '_iter_num_batches', None) 169 | if rng is None and mode.stochastic: 170 | rng = self.rng 171 | 172 | if 'songlevel-features' in sub_sources: 173 | if batch_size is not 1: 174 | raise ValueError("'batch_size' must be set to 1 for songlevel iterator") 175 | return SonglevelIterator(self, 176 | mode(len(self.file_list), batch_size, num_batches, rng), 177 | data_specs=data_specs, 178 | return_tuple=return_tuple, 179 | convert=convert) 180 | else: 181 | return FramelevelIterator(self, 182 | mode(len(self.support), batch_size, num_batches, rng), 183 | data_specs=data_specs, 184 | return_tuple=return_tuple, 185 | convert=convert) 186 | 187 | class FramelevelIterator(FiniteDatasetIterator): 188 | ''' 189 | Returns individual (spectrogram) frames/slices from the dataset 190 | ''' 191 | @functools.wraps(SubsetIterator.next) 192 | def next(self): 193 | """ 194 | Retrieves the next batch of examples. 195 | 196 | Returns 197 | ------- 198 | next_batch : object 199 | An object representing a mini-batch of data, conforming 200 | to the space specified in the `data_specs` constructor 201 | argument to this iterator. Will be a tuple if more 202 | than one data source was specified or if the constructor 203 | parameter `return_tuple` was `True`. 204 | 205 | Raises 206 | ------ 207 | StopIteration 208 | When there are no more batches to return. 209 | """ 210 | next_index = self._subset_iterator.next() 211 | next_index = self._dataset.support[ next_index ] # !!! added line to iterate over different index set !!! 
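        # NOTE: for 'features' sources this iterator does not return clean
        # frames; each batch is perturbed below via create_adversary_from_batch
        # (a fast-gradient-sign step, batch - mu * sign(dCost/dX), toward a
        # randomly drawn one-hot target) before being returned.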
212 | 213 | spaces, sources = self._data_specs 214 | output = [] 215 | 216 | for data, fn, source in safe_izip(self._raw_data, self._convert, sources): 217 | if source=='targets': 218 | if fn: 219 | output.append( fn(data[next_index, :]) ) 220 | else: 221 | output.append( data[next_index, :] ) 222 | else: 223 | design_mat = [] 224 | for index in next_index: 225 | X = np.abs(data[index:index+self._dataset.tframes, :]) 226 | design_mat.append( X.reshape((np.prod(X.shape),)) ) 227 | 228 | design_mat = np.vstack(design_mat) 229 | if self._dataset.tframes > 1: 230 | # ideally we'd standardize in a preprocessing layer 231 | # (so that standardization is built-in to the model rather 232 | # than the dataset) but i haven't quite figured out how to do 233 | # this yet for images, due to a memory error associated with 234 | # a really big diagonal scaling matrix 235 | # (however, it works fine for vectors) 236 | design_mat = self._dataset.standardize(design_mat) 237 | 238 | n_classes = len(self._dataset.targets) 239 | one_hot = np.zeros((design_mat.shape[0], n_classes), dtype=np.float32) 240 | for r in one_hot: r[np.random.randint(n_classes)] = 1 241 | 242 | design_mat = self._dataset.create_adversary_from_batch(design_mat, one_hot) 243 | 244 | if fn: 245 | output.append( fn(design_mat) ) 246 | else: 247 | output.append( design_mat ) 248 | 249 | rval = tuple(output) 250 | if not self._return_tuple and len(rval) == 1: 251 | rval, = rval 252 | return rval 253 | 254 | class SonglevelIterator(FiniteDatasetIterator): 255 | ''' 256 | Returns all data associated with a particular song from the dataset 257 | (only iterates 1 song at a time!) 258 | ''' 259 | @functools.wraps(SubsetIterator.next) 260 | def next(self): 261 | 262 | # next numerical index 263 | next_file_index = self._subset_iterator.next() 264 | 265 | # associate numerical index with file from the dataset 266 | next_file = self._dataset.file_list[ next_file_index ][0] # !!! added line to iterate over different index set !!! 267 | 268 | # lookup file's position in the hdf5 array 269 | offset, nframes, key, target = self._dataset.file_index[next_file] 270 | 271 | thop = 1. # hardcoded and must match prepare_dataset.py!!! 
272 | sup = np.arange(0,nframes-self._dataset.tframes,np.int(self._dataset.tframes/thop)) 273 | next_index = offset + sup 274 | 275 | spaces, sources = self._data_specs 276 | output = [] 277 | 278 | for data, fn, source, space in safe_izip(self._raw_data, self._convert, sources, spaces.components): 279 | if source=='targets': 280 | # if fn: 281 | # output.append( fn( np.reshape(data[next_index[0], :], (1,-1)) ) ) 282 | # else: 283 | # output.append( np.reshape(data[next_index[0], :], (1,-1)) ) 284 | output.append( target ) 285 | else: 286 | design_mat = [] 287 | for index in next_index: 288 | if 0:#space.dtype=='complex64': 289 | X = data[index:index+self._dataset.tframes, :] # return phase too 290 | else: 291 | X = np.abs(data[index:index+self._dataset.tframes, :]) 292 | design_mat.append( X.reshape((np.prod(X.shape),)) ) 293 | 294 | design_mat = np.vstack(design_mat) 295 | 296 | if self._dataset.tframes > 1: 297 | # ideally we'd standardize in a preprocessing layer 298 | # (so that standardization is built-in to the model rather 299 | # than the dataset) but i haven't quite figured out how to do 300 | # this yet for images, due to a memory error associated with 301 | # a really big diagonal scaling matrix 302 | # (however, it works fine for vectors) 303 | design_mat = self._dataset.standardize(design_mat) 304 | 305 | if fn: 306 | output.append( fn(design_mat) ) 307 | else: 308 | output.append( design_mat ) 309 | 310 | output.append(next_file) 311 | rval = tuple(output) 312 | if not self._return_tuple and len(rval) == 1: 313 | rval, = rval 314 | return rval 315 | 316 | class PreprocLayer(PretrainedLayer): 317 | # should this use a linear layer instead of an autoencoder 318 | # (problem is layers don't implement upward_pass as required by pretrained layer... 319 | # but perhaps could write upward_pass function to call layer's fprop.) 320 | def __init__(self, config, proc_type='standardize', **kwargs): 321 | ''' 322 | config: dictionary with partition configuration information 323 | 324 | proc_type: type of preprocessing (either standardize or pca_whiten) 325 | 326 | if proc_type='standardize' no extra arguments required 327 | 328 | if proc_type='pca_whiten' the following keyword arguments are required: 329 | ncomponents = x where x is an integer 330 | epsilon = y where y is a float (regularization parameter) 331 | ''' 332 | 333 | recognized_types = ['standardize', 'pca_whiten'] 334 | assert proc_type in recognized_types 335 | 336 | # load parition information 337 | self.mean = config['mean'] 338 | self.istd = np.reciprocal(np.sqrt(config['var'])) 339 | self.tframes = config['tframes'] 340 | nvis = len(self.mean) 341 | 342 | if proc_type == 'standardize': 343 | dim = nvis 344 | mask = (self.istd < 20) # in order to ignore near-zero variance inputs 345 | self.biases = np.array(-self.mean * self.istd * mask, dtype=np.float32) 346 | self.weights = np.array(np.diag(self.istd * mask), dtype=np.float32) 347 | 348 | if proc_type == 'pca_whiten': 349 | raise NotImplementedError( 350 | '''PCA whitening not yet implemented as a layer. 
351 | Use audio_dataset2d.AudioDataset2d to perform whitening from the dataset iterator''') 352 | 353 | # dim = kwargs['ncomponents'] 354 | # S = config['S'][:dim] # eigenvalues 355 | # U = config['U'][:,:dim] # eigenvectors 356 | # self.pca = np.diag(1./(np.sqrt(S) + epsilon)).dot(U.T) 357 | 358 | # self.biases = np.array(-self.mean.dot(self.pca.transpose()), dtype=np.float32) 359 | # self.weights = np.array(self.pca.transpose(), dtype=np.float32) 360 | 361 | # Autoencoder with linear units 362 | pre_layer = Autoencoder(nvis=nvis, nhid=dim, act_enc=None, act_dec=None, irange=0) 363 | 364 | # Set weights for pre-processing 365 | params = pre_layer.get_param_values() 366 | params[1] = self.biases 367 | params[2] = self.weights 368 | pre_layer.set_param_values(params) 369 | 370 | super(PreprocLayer, self).__init__(layer_name='pre', layer_content=pre_layer, freeze_params=True) 371 | 372 | def get_biases(self): 373 | return self.biases 374 | 375 | def get_weights(self): 376 | return self.weights 377 | 378 | def get_param_values(self): 379 | return list((self.get_weights(), self.get_biases())) 380 | 381 | 382 | -------------------------------------------------------------------------------- /audio_dataset.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import functools 3 | import tables 4 | 5 | from pylearn2.datasets.dataset import Dataset 6 | from pylearn2.datasets.dense_design_matrix import DenseDesignMatrixPyTables, DefaultViewConverter 7 | from pylearn2.blocks import Block 8 | from pylearn2.space import CompositeSpace, Conv2DSpace, VectorSpace, IndexSpace 9 | from pylearn2.utils.iteration import SubsetIterator, FiniteDatasetIterator, resolve_iterator_class 10 | from pylearn2.utils import safe_zip, safe_izip 11 | 12 | from pylearn2.datasets import control 13 | from pylearn2.utils.exc import reraise_as 14 | from pylearn2.utils.rng import make_np_rng 15 | from pylearn2.utils import contains_nan 16 | from pylearn2.models.mlp import MLP, Linear, PretrainedLayer 17 | from pylearn2.models.autoencoder import Autoencoder 18 | 19 | from theano import config 20 | 21 | import pdb 22 | 23 | class AudioDataset(DenseDesignMatrixPyTables): 24 | def __init__(self, config, which_set='train'): #, standardize=True, pca_whitening=False, ncomponents=None, epsilon=3): 25 | 26 | keys = ['train', 'test', 'valid'] 27 | assert which_set in keys 28 | 29 | # load hdf5 metadata 30 | self.hdf5 = tables.open_file( config['hdf5'], mode='r') 31 | data = self.hdf5.get_node('/', 'Data') 32 | param = self.hdf5.get_node('/', 'Param') 33 | self.file_index = param.file_index[0] 34 | self.file_dict = param.file_dict[0] 35 | self.label_list = param.label_list[0] 36 | self.targets = param.targets[0] 37 | self.nfft = param.fft[0]['nfft'] 38 | 39 | # load parition information 40 | self.support = config[which_set] 41 | self.file_list = config[which_set+'_files'] 42 | self.mean = config['mean'] 43 | self.mean = self.mean.reshape((np.prod(self.mean.shape),)) 44 | self.var = config['var'] 45 | self.var = self.var.reshape((np.prod(self.var.shape),)) 46 | self.istd = np.reciprocal(np.sqrt(self.var)) 47 | self.mask = (self.istd < 20) 48 | self.tframes = config['tframes'] 49 | 50 | if self.tframes > 1: 51 | view_converter = DefaultViewConverter((self.tframes, len(self.mean)/self.tframes, 1)) 52 | super(AudioDataset, self).__init__(X=data.X, y=data.y, 53 | view_converter=view_converter) 54 | else: 55 | super(AudioDataset, self).__init__(X=data.X, y=data.y) 56 | 57 | def 
__del__(self): 58 | self.hdf5.close() 59 | 60 | @functools.wraps(Dataset.iterator) 61 | def iterator(self, mode=None, batch_size=1, num_batches=None, 62 | topo=None, targets=None, rng=None, data_specs=None, 63 | return_tuple=False): 64 | ''' 65 | Copied from pylearn2 superclass in order to return custom iterator. 66 | Two different iterators are available, depending on the data_specs. 67 | 1. If the data_specs source is 'features' a framelevel iterator is returned 68 | (each call to next() returns a single frame) 69 | 2. If the data_specs source is 'songlevel-features' a songlevel iterator is returned 70 | (each call to next() returns all the frames associated with a given song in the dataset) 71 | ''' 72 | if data_specs is None: 73 | data_specs = self._iter_data_specs 74 | else: 75 | self.data_specs = data_specs 76 | 77 | # If there is a view_converter, we have to use it to convert 78 | # the stored data for "features" into one that the iterator 79 | # can return. 80 | space, source = data_specs 81 | if isinstance(space, CompositeSpace): 82 | sub_spaces = space.components 83 | sub_sources = source 84 | else: 85 | sub_spaces = (space,) 86 | sub_sources = (source,) 87 | 88 | convert = [] 89 | for sp, src in safe_zip(sub_spaces, sub_sources): 90 | if (src == 'features' or src == 'songlevel-features') and \ 91 | getattr(self, 'view_converter', None) is not None: 92 | conv_fn = (lambda batch, self=self, space=sp: 93 | self.view_converter.get_formatted_batch(batch, 94 | space)) 95 | else: 96 | conv_fn = None 97 | 98 | convert.append(conv_fn) 99 | 100 | # TODO: Refactor 101 | if mode is None: 102 | if hasattr(self, '_iter_subset_class'): 103 | mode = self._iter_subset_class 104 | else: 105 | raise ValueError('iteration mode not provided and no default ' 106 | 'mode set for %s' % str(self)) 107 | else: 108 | mode = resolve_iterator_class(mode) 109 | 110 | if num_batches is None: 111 | num_batches = getattr(self, '_iter_num_batches', None) 112 | if rng is None and mode.stochastic: 113 | rng = self.rng 114 | 115 | if 'songlevel-features' in sub_sources: 116 | if batch_size is not 1: 117 | raise ValueError("'batch_size' must be set to 1 for songlevel iterator") 118 | return SonglevelIterator(self, 119 | mode(len(self.file_list), batch_size, num_batches, rng), 120 | data_specs=data_specs, 121 | return_tuple=return_tuple, 122 | convert=convert) 123 | else: 124 | return FramelevelIterator(self, 125 | mode(len(self.support), batch_size, num_batches, rng), 126 | data_specs=data_specs, 127 | return_tuple=return_tuple, 128 | convert=convert) 129 | 130 | def standardize(self, batch): 131 | return (batch - self.mean) * self.istd * self.mask 132 | 133 | class FramelevelIterator(FiniteDatasetIterator): 134 | ''' 135 | Returns individual (spectrogram) frames/slices from the dataset 136 | ''' 137 | @functools.wraps(SubsetIterator.next) 138 | def next(self): 139 | """ 140 | Retrieves the next batch of examples. 141 | 142 | Returns 143 | ------- 144 | next_batch : object 145 | An object representing a mini-batch of data, conforming 146 | to the space specified in the `data_specs` constructor 147 | argument to this iterator. Will be a tuple if more 148 | than one data source was specified or if the constructor 149 | parameter `return_tuple` was `True`. 150 | 151 | Raises 152 | ------ 153 | StopIteration 154 | When there are no more batches to return. 155 | """ 156 | next_index = self._subset_iterator.next() 157 | next_index = self._dataset.support[ next_index ] # !!! 
added line to iterate over different index set !!! 158 | 159 | spaces, sources = self._data_specs 160 | output = [] 161 | 162 | for data, fn, source in safe_izip(self._raw_data, self._convert, sources): 163 | if source=='targets': 164 | if fn: 165 | output.append( fn(data[next_index, :]) ) 166 | else: 167 | output.append( data[next_index, :] ) 168 | else: 169 | design_mat = [] 170 | for index in next_index: 171 | X = np.abs(data[index:index+self._dataset.tframes, :]) 172 | design_mat.append( X.reshape((np.prod(X.shape),)) ) 173 | 174 | design_mat = np.vstack(design_mat) 175 | 176 | if self._dataset.tframes > 1: 177 | # ideally we'd standardize in a preprocessing layer 178 | # (so that standardization is built-in to the model rather 179 | # than the dataset) but i haven't quite figured out how to do 180 | # this yet for images, due to a memory error associated with 181 | # a really big diagonal scaling matrix 182 | # (however, it works fine for vectors) 183 | design_mat = self._dataset.standardize(design_mat) 184 | 185 | if fn: 186 | output.append( fn(design_mat) ) 187 | else: 188 | output.append( design_mat ) 189 | 190 | rval = tuple(output) 191 | if not self._return_tuple and len(rval) == 1: 192 | rval, = rval 193 | return rval 194 | 195 | class SonglevelIterator(FiniteDatasetIterator): 196 | ''' 197 | Returns all data associated with a particular song from the dataset 198 | (only iterates 1 song at a time!) 199 | ''' 200 | @functools.wraps(SubsetIterator.next) 201 | def next(self): 202 | 203 | # next numerical index 204 | next_file_index = self._subset_iterator.next() 205 | 206 | # associate numerical index with file from the dataset 207 | next_file = self._dataset.file_list[ next_file_index ][0] # !!! added line to iterate over different index set !!! 208 | 209 | # lookup file's position in the hdf5 array 210 | offset, nframes, key, target = self._dataset.file_index[next_file] 211 | 212 | thop = 1. # hardcoded and must match prepare_dataset.py!!! 
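        # sup enumerates the starting frame of each tframes-long excerpt in
        # the song: offsets 0, tframes/thop, 2*tframes/thop, ..., up to
        # nframes - tframes (with thop = 1., consecutive excerpts do not overlap)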
213 | sup = np.arange(0,nframes-self._dataset.tframes,np.int(self._dataset.tframes/thop)) 214 | next_index = offset + sup 215 | 216 | 217 | spaces, sources = self._data_specs 218 | output = [] 219 | 220 | for data, fn, source, space in safe_izip(self._raw_data, self._convert, sources, spaces.components): 221 | if source=='targets': 222 | # if fn: 223 | # output.append( fn( np.reshape(data[next_index[0], :], (1,-1)) ) ) 224 | # else: 225 | # output.append( np.reshape(data[next_index[0], :], (1,-1)) ) 226 | output.append( target ) 227 | else: 228 | design_mat = [] 229 | for index in next_index: 230 | if 0:#space.dtype=='complex64': 231 | X = data[index:index+self._dataset.tframes, :] # return phase too 232 | else: 233 | X = np.abs(data[index:index+self._dataset.tframes, :]) 234 | design_mat.append( X.reshape((np.prod(X.shape),)) ) 235 | 236 | design_mat = np.vstack(design_mat) 237 | if self._dataset.tframes > 1: 238 | # ideally we'd standardize in a preprocessing layer 239 | # (so that standardization is built-in to the model rather 240 | # than the dataset) but i haven't quite figured out how to do 241 | # this yet for images, due to a memory error associated with 242 | # a really big diagonal scaling matrix 243 | # (however, it works fine for vectors) 244 | design_mat = self._dataset.standardize(design_mat) 245 | 246 | if fn: 247 | output.append( fn(design_mat) ) 248 | else: 249 | output.append( design_mat ) 250 | 251 | output.append(next_file) 252 | rval = tuple(output) 253 | if not self._return_tuple and len(rval) == 1: 254 | rval, = rval 255 | return rval 256 | 257 | class PreprocLayer(PretrainedLayer): 258 | # should this use a linear layer instead of an autoencoder 259 | # (problem is layers don't implement upward_pass as required by pretrained layer... 260 | # but perhaps could write upward_pass function to call layer's fprop.) 261 | def __init__(self, config, proc_type='standardize', **kwargs): 262 | ''' 263 | config: dictionary with partition configuration information 264 | 265 | proc_type: type of preprocessing (either standardize or pca_whiten) 266 | 267 | if proc_type='standardize' no extra arguments required 268 | 269 | if proc_type='pca_whiten' the following keyword arguments are required: 270 | ncomponents = x where x is an integer 271 | epsilon = y where y is a float (regularization parameter) 272 | ''' 273 | 274 | recognized_types = ['standardize', 'pca_whiten'] 275 | assert proc_type in recognized_types 276 | 277 | # load parition information 278 | self.mean = config['mean'] 279 | self.mean = self.mean.reshape((np.prod(self.mean.shape),)) 280 | self.istd = np.reciprocal(np.sqrt(config['var'])) 281 | self.istd = self.istd.reshape((np.prod(self.istd.shape),)) 282 | self.tframes = config['tframes'] 283 | nvis = len(self.mean) 284 | 285 | if proc_type == 'standardize': 286 | dim = nvis 287 | mask = (self.istd < 20) # in order to ignore near-zero variance inputs 288 | self.biases = np.array(-self.mean * self.istd * mask, dtype=np.float32) 289 | self.weights = np.array(np.diag(self.istd * mask), dtype=np.float32) #!!!gives memory error for convnet (because diag not treated as sparse mat) 290 | 291 | if proc_type == 'pca_whiten': 292 | raise NotImplementedError( 293 | '''PCA whitening not yet implemented as a layer. 
294 | Use audio_dataset2d.AudioDataset2d to perform whitening from the dataset iterator''') 295 | 296 | # dim = kwargs['ncomponents'] 297 | # S = config['S'][:dim] # eigenvalues 298 | # U = config['U'][:,:dim] # eigenvectors 299 | # self.pca = np.diag(1./(np.sqrt(S) + epsilon)).dot(U.T) 300 | 301 | # self.biases = np.array(-self.mean.dot(self.pca.transpose()), dtype=np.float32) 302 | # self.weights = np.array(self.pca.transpose(), dtype=np.float32) 303 | 304 | # Autoencoder with linear units 305 | pre_layer = Autoencoder(nvis=nvis, nhid=dim, act_enc=None, act_dec=None, irange=0) 306 | 307 | # Set weights for pre-processing 308 | params = pre_layer.get_param_values() 309 | params[1] = self.biases 310 | params[2] = self.weights 311 | pre_layer.set_param_values(params) 312 | 313 | super(PreprocLayer, self).__init__(layer_name='pre', layer_content=pre_layer, freeze_params=True) 314 | 315 | def get_biases(self): 316 | return self.biases 317 | 318 | def get_weights(self): 319 | return self.weights 320 | 321 | def get_param_values(self): 322 | return list((self.get_weights(), self.get_biases())) 323 | 324 | if __name__=='__main__': 325 | 326 | # tests 327 | import theano 328 | import cPickle 329 | from audio_dataset import AudioDataset 330 | 331 | with open('GTZAN_stratified.pkl') as f: 332 | config = cPickle.load(f) 333 | 334 | D = AudioDataset(config) 335 | 336 | feat_space = VectorSpace(dim=D.X.shape[1]) 337 | feat_space_complex = VectorSpace(dim=D.X.shape[1], dtype='complex64') 338 | target_space = VectorSpace(dim=len(D.label_list)) 339 | 340 | data_specs_frame = (CompositeSpace((feat_space,target_space)), ("features", "targets")) 341 | data_specs_song = (CompositeSpace((feat_space_complex, target_space)), ("songlevel-features", "targets")) 342 | 343 | framelevel_it = D.iterator(mode='sequential', batch_size=10, data_specs=data_specs_frame) 344 | frame_batch = framelevel_it.next() 345 | 346 | songlevel_it = D.iterator(mode='sequential', batch_size=1, data_specs=data_specs_song) 347 | song_batch = songlevel_it.next() 348 | 349 | -------------------------------------------------------------------------------- /fine_tune_pretrained_mlp.py: -------------------------------------------------------------------------------- 1 | import sys, re, cPickle, argparse 2 | from glob import glob 3 | from pylearn2.train import Train 4 | from pylearn2.utils import serial 5 | from pylearn2.models.mlp import MLP, PretrainedLayer, Sigmoid, Softmax 6 | from pylearn2.training_algorithms.sgd import SGD, LinearDecayOverEpoch 7 | from pylearn2.training_algorithms.learning_rule import Momentum, MomentumAdjustor 8 | from pylearn2.training_algorithms.learning_rule import RMSProp 9 | from pylearn2.termination_criteria import MonitorBased 10 | from pylearn2.train_extensions.best_params import MonitorBasedSaveBest 11 | from pylearn2.datasets.transformer_dataset import TransformerDataset 12 | from audio_dataset import AudioDataset 13 | 14 | import pylearn2.config.yaml_parse as yaml_parse 15 | 16 | import pdb 17 | 18 | def get_mlp(nvis, nclasses, pretrained_layers): 19 | 20 | layer_yaml = [] 21 | for i, m in enumerate(pretrained_layers): 22 | layer_yaml.append('''!obj:pylearn2.models.mlp.PretrainedLayer { 23 | layer_name : "%(layer_name)s", 24 | layer_content : !pkl: "%(layer_content)s" 25 | }''' % {'layer_name' : 'h'+str(i), 'layer_content' : m }) 26 | 27 | layer_yaml.append('''!obj:pylearn2.models.mlp.Softmax { 28 | n_classes : %(nclasses)i, 29 | layer_name : "y", 30 | irange : .01 31 | }''' % {'nclasses' : nclasses}) 32 | 
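    # layer_yaml now holds one PretrainedLayer stanza per pickled layer plus
    # the final Softmax stanza; they are joined below into a single MLP YAML
    # description that pylearn2's yaml_parse instantiates.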
33 | layer_yaml = ','.join(layer_yaml) 34 | 35 | model_yaml = '''!obj:pylearn2.models.mlp.MLP { 36 | nvis : %(nvis)i, 37 | layers : [%(layers)s] 38 | }''' % {'nvis' : nvis, 'layers' : layer_yaml} 39 | 40 | model = yaml_parse.load(model_yaml) 41 | return model 42 | 43 | def get_trainer(model, trainset, validset, save_path): 44 | 45 | monitoring = dict(valid=validset, train=trainset) 46 | termination = MonitorBased(channel_name='valid_y_misclass', prop_decrease=.001, N=100) 47 | extensions = [MonitorBasedSaveBest(channel_name='valid_y_misclass', save_path=save_path), 48 | #MomentumAdjustor(start=1, saturate=100, final_momentum=.9), 49 | LinearDecayOverEpoch(start=1, saturate=200, decay_factor=0.01)] 50 | 51 | config = { 52 | 'learning_rate': .01, 53 | #'learning_rule': Momentum(0.5), 54 | 'learning_rule': RMSProp(), 55 | 'train_iteration_mode': 'shuffled_sequential', 56 | 'batch_size': 1200,#250, 57 | #'batches_per_iter' : 100, 58 | 'monitoring_dataset': monitoring, 59 | 'monitor_iteration_mode' : 'shuffled_sequential', 60 | 'termination_criterion' : termination, 61 | } 62 | 63 | return Train(model=model, 64 | algorithm=SGD(**config), 65 | dataset=trainset, 66 | extensions=extensions) 67 | 68 | if __name__=="__main__": 69 | 70 | parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter, 71 | description='Script to pretrain the layers of a DNN.') 72 | 73 | parser.add_argument('fold_config', help='Path to dataset configuration file (generated with prepare_dataset.py)') 74 | parser.add_argument('--pretrained_layers', nargs='*', help='List of pretrained layers (sorted from input to output)') 75 | parser.add_argument('--save_file', help='Full path and for saving output model') 76 | args = parser.parse_args() 77 | 78 | trainset = yaml_parse.load( 79 | '''!obj:audio_dataset.AudioDataset { 80 | which_set : 'train', 81 | config : !pkl: "%(fold_config)s" 82 | }''' % {'fold_config' : args.fold_config} ) 83 | 84 | validset = yaml_parse.load( 85 | '''!obj:audio_dataset.AudioDataset { 86 | which_set : 'valid', 87 | config : !pkl: "%(fold_config)s" 88 | }''' % {'fold_config' : args.fold_config} ) 89 | 90 | 91 | testset = yaml_parse.load( 92 | '''!obj:audio_dataset.AudioDataset { 93 | which_set : 'test', 94 | config : !pkl: "%(fold_config)s" 95 | }''' % {'fold_config' : args.fold_config} ) 96 | 97 | model = get_mlp(nvis=trainset.X.shape[1], nclasses=trainset.y.shape[1], pretrained_layers=args.pretrained_layers) 98 | trainer = get_trainer(model=model, trainset=trainset, validset=validset, save_path=args.save_file) 99 | 100 | trainer.main_loop() 101 | 102 | 103 | 104 | -------------------------------------------------------------------------------- /gtzan/test_filtered.txt: -------------------------------------------------------------------------------- 1 | blues/blues.00012.wav 2 | blues/blues.00013.wav 3 | blues/blues.00014.wav 4 | blues/blues.00015.wav 5 | blues/blues.00016.wav 6 | blues/blues.00017.wav 7 | blues/blues.00018.wav 8 | blues/blues.00019.wav 9 | blues/blues.00020.wav 10 | blues/blues.00021.wav 11 | blues/blues.00022.wav 12 | blues/blues.00023.wav 13 | blues/blues.00024.wav 14 | blues/blues.00025.wav 15 | blues/blues.00026.wav 16 | blues/blues.00027.wav 17 | blues/blues.00028.wav 18 | blues/blues.00061.wav 19 | blues/blues.00062.wav 20 | blues/blues.00063.wav 21 | blues/blues.00064.wav 22 | blues/blues.00065.wav 23 | blues/blues.00066.wav 24 | blues/blues.00067.wav 25 | blues/blues.00068.wav 26 | blues/blues.00069.wav 27 | blues/blues.00070.wav 28 | blues/blues.00071.wav 29 
| blues/blues.00072.wav 30 | blues/blues.00098.wav 31 | blues/blues.00099.wav 32 | classical/classical.00011.wav 33 | classical/classical.00012.wav 34 | classical/classical.00013.wav 35 | classical/classical.00014.wav 36 | classical/classical.00015.wav 37 | classical/classical.00016.wav 38 | classical/classical.00017.wav 39 | classical/classical.00018.wav 40 | classical/classical.00019.wav 41 | classical/classical.00020.wav 42 | classical/classical.00021.wav 43 | classical/classical.00022.wav 44 | classical/classical.00023.wav 45 | classical/classical.00024.wav 46 | classical/classical.00025.wav 47 | classical/classical.00026.wav 48 | classical/classical.00027.wav 49 | classical/classical.00028.wav 50 | classical/classical.00029.wav 51 | classical/classical.00034.wav 52 | classical/classical.00035.wav 53 | classical/classical.00036.wav 54 | classical/classical.00037.wav 55 | classical/classical.00038.wav 56 | classical/classical.00039.wav 57 | classical/classical.00040.wav 58 | classical/classical.00041.wav 59 | classical/classical.00049.wav 60 | classical/classical.00077.wav 61 | classical/classical.00078.wav 62 | classical/classical.00079.wav 63 | country/country.00030.wav 64 | country/country.00031.wav 65 | country/country.00032.wav 66 | country/country.00033.wav 67 | country/country.00034.wav 68 | country/country.00035.wav 69 | country/country.00036.wav 70 | country/country.00037.wav 71 | country/country.00038.wav 72 | country/country.00039.wav 73 | country/country.00040.wav 74 | country/country.00043.wav 75 | country/country.00044.wav 76 | country/country.00046.wav 77 | country/country.00047.wav 78 | country/country.00048.wav 79 | country/country.00050.wav 80 | country/country.00051.wav 81 | country/country.00053.wav 82 | country/country.00054.wav 83 | country/country.00055.wav 84 | country/country.00056.wav 85 | country/country.00057.wav 86 | country/country.00058.wav 87 | country/country.00059.wav 88 | country/country.00060.wav 89 | country/country.00061.wav 90 | country/country.00062.wav 91 | country/country.00063.wav 92 | country/country.00064.wav 93 | disco/disco.00001.wav 94 | disco/disco.00021.wav 95 | disco/disco.00058.wav 96 | disco/disco.00062.wav 97 | disco/disco.00063.wav 98 | disco/disco.00064.wav 99 | disco/disco.00065.wav 100 | disco/disco.00066.wav 101 | disco/disco.00069.wav 102 | disco/disco.00076.wav 103 | disco/disco.00077.wav 104 | disco/disco.00078.wav 105 | disco/disco.00079.wav 106 | disco/disco.00080.wav 107 | disco/disco.00081.wav 108 | disco/disco.00082.wav 109 | disco/disco.00083.wav 110 | disco/disco.00084.wav 111 | disco/disco.00085.wav 112 | disco/disco.00086.wav 113 | disco/disco.00087.wav 114 | disco/disco.00088.wav 115 | disco/disco.00091.wav 116 | disco/disco.00092.wav 117 | disco/disco.00093.wav 118 | disco/disco.00094.wav 119 | disco/disco.00096.wav 120 | disco/disco.00097.wav 121 | disco/disco.00099.wav 122 | hiphop/hiphop.00000.wav 123 | hiphop/hiphop.00026.wav 124 | hiphop/hiphop.00027.wav 125 | hiphop/hiphop.00030.wav 126 | hiphop/hiphop.00040.wav 127 | hiphop/hiphop.00043.wav 128 | hiphop/hiphop.00044.wav 129 | hiphop/hiphop.00045.wav 130 | hiphop/hiphop.00051.wav 131 | hiphop/hiphop.00052.wav 132 | hiphop/hiphop.00053.wav 133 | hiphop/hiphop.00054.wav 134 | hiphop/hiphop.00062.wav 135 | hiphop/hiphop.00063.wav 136 | hiphop/hiphop.00064.wav 137 | hiphop/hiphop.00065.wav 138 | hiphop/hiphop.00066.wav 139 | hiphop/hiphop.00067.wav 140 | hiphop/hiphop.00068.wav 141 | hiphop/hiphop.00069.wav 142 | hiphop/hiphop.00070.wav 143 | 
hiphop/hiphop.00071.wav 144 | hiphop/hiphop.00072.wav 145 | hiphop/hiphop.00073.wav 146 | hiphop/hiphop.00074.wav 147 | hiphop/hiphop.00075.wav 148 | hiphop/hiphop.00099.wav 149 | jazz/jazz.00073.wav 150 | jazz/jazz.00074.wav 151 | jazz/jazz.00075.wav 152 | jazz/jazz.00076.wav 153 | jazz/jazz.00077.wav 154 | jazz/jazz.00078.wav 155 | jazz/jazz.00079.wav 156 | jazz/jazz.00080.wav 157 | jazz/jazz.00081.wav 158 | jazz/jazz.00082.wav 159 | jazz/jazz.00083.wav 160 | jazz/jazz.00084.wav 161 | jazz/jazz.00085.wav 162 | jazz/jazz.00086.wav 163 | jazz/jazz.00087.wav 164 | jazz/jazz.00088.wav 165 | jazz/jazz.00089.wav 166 | jazz/jazz.00090.wav 167 | jazz/jazz.00091.wav 168 | jazz/jazz.00092.wav 169 | jazz/jazz.00093.wav 170 | jazz/jazz.00094.wav 171 | jazz/jazz.00095.wav 172 | jazz/jazz.00096.wav 173 | jazz/jazz.00097.wav 174 | jazz/jazz.00098.wav 175 | jazz/jazz.00099.wav 176 | metal/metal.00012.wav 177 | metal/metal.00013.wav 178 | metal/metal.00014.wav 179 | metal/metal.00015.wav 180 | metal/metal.00022.wav 181 | metal/metal.00023.wav 182 | metal/metal.00025.wav 183 | metal/metal.00026.wav 184 | metal/metal.00027.wav 185 | metal/metal.00028.wav 186 | metal/metal.00029.wav 187 | metal/metal.00030.wav 188 | metal/metal.00031.wav 189 | metal/metal.00032.wav 190 | metal/metal.00033.wav 191 | metal/metal.00038.wav 192 | metal/metal.00039.wav 193 | metal/metal.00067.wav 194 | metal/metal.00070.wav 195 | metal/metal.00073.wav 196 | metal/metal.00074.wav 197 | metal/metal.00075.wav 198 | metal/metal.00078.wav 199 | metal/metal.00083.wav 200 | metal/metal.00085.wav 201 | metal/metal.00087.wav 202 | metal/metal.00088.wav 203 | pop/pop.00000.wav 204 | pop/pop.00001.wav 205 | pop/pop.00013.wav 206 | pop/pop.00014.wav 207 | pop/pop.00043.wav 208 | pop/pop.00063.wav 209 | pop/pop.00064.wav 210 | pop/pop.00065.wav 211 | pop/pop.00066.wav 212 | pop/pop.00069.wav 213 | pop/pop.00070.wav 214 | pop/pop.00071.wav 215 | pop/pop.00072.wav 216 | pop/pop.00073.wav 217 | pop/pop.00074.wav 218 | pop/pop.00075.wav 219 | pop/pop.00076.wav 220 | pop/pop.00077.wav 221 | pop/pop.00078.wav 222 | pop/pop.00079.wav 223 | pop/pop.00082.wav 224 | pop/pop.00088.wav 225 | pop/pop.00089.wav 226 | pop/pop.00090.wav 227 | pop/pop.00091.wav 228 | pop/pop.00092.wav 229 | pop/pop.00093.wav 230 | pop/pop.00094.wav 231 | pop/pop.00095.wav 232 | pop/pop.00096.wav 233 | reggae/reggae.00034.wav 234 | reggae/reggae.00035.wav 235 | reggae/reggae.00036.wav 236 | reggae/reggae.00037.wav 237 | reggae/reggae.00038.wav 238 | reggae/reggae.00039.wav 239 | reggae/reggae.00040.wav 240 | reggae/reggae.00046.wav 241 | reggae/reggae.00047.wav 242 | reggae/reggae.00048.wav 243 | reggae/reggae.00052.wav 244 | reggae/reggae.00053.wav 245 | reggae/reggae.00064.wav 246 | reggae/reggae.00065.wav 247 | reggae/reggae.00066.wav 248 | reggae/reggae.00067.wav 249 | reggae/reggae.00068.wav 250 | reggae/reggae.00071.wav 251 | reggae/reggae.00079.wav 252 | reggae/reggae.00082.wav 253 | reggae/reggae.00083.wav 254 | reggae/reggae.00084.wav 255 | reggae/reggae.00087.wav 256 | reggae/reggae.00088.wav 257 | reggae/reggae.00089.wav 258 | reggae/reggae.00090.wav 259 | rock/rock.00010.wav 260 | rock/rock.00011.wav 261 | rock/rock.00012.wav 262 | rock/rock.00013.wav 263 | rock/rock.00014.wav 264 | rock/rock.00015.wav 265 | rock/rock.00027.wav 266 | rock/rock.00028.wav 267 | rock/rock.00029.wav 268 | rock/rock.00030.wav 269 | rock/rock.00031.wav 270 | rock/rock.00032.wav 271 | rock/rock.00033.wav 272 | rock/rock.00034.wav 273 | rock/rock.00035.wav 274 | rock/rock.00036.wav 275 | 
rock/rock.00037.wav 276 | rock/rock.00039.wav 277 | rock/rock.00040.wav 278 | rock/rock.00041.wav 279 | rock/rock.00042.wav 280 | rock/rock.00043.wav 281 | rock/rock.00044.wav 282 | rock/rock.00045.wav 283 | rock/rock.00046.wav 284 | rock/rock.00047.wav 285 | rock/rock.00048.wav 286 | rock/rock.00086.wav 287 | rock/rock.00087.wav 288 | rock/rock.00088.wav 289 | rock/rock.00089.wav 290 | rock/rock.00090.wav 291 | -------------------------------------------------------------------------------- /gtzan/test_stratified.txt: -------------------------------------------------------------------------------- 1 | blues/blues.00005.wav 2 | blues/blues.00010.wav 3 | blues/blues.00012.wav 4 | blues/blues.00013.wav 5 | blues/blues.00015.wav 6 | blues/blues.00020.wav 7 | blues/blues.00023.wav 8 | blues/blues.00024.wav 9 | blues/blues.00025.wav 10 | blues/blues.00028.wav 11 | blues/blues.00031.wav 12 | blues/blues.00040.wav 13 | blues/blues.00043.wav 14 | blues/blues.00049.wav 15 | blues/blues.00052.wav 16 | blues/blues.00053.wav 17 | blues/blues.00059.wav 18 | blues/blues.00062.wav 19 | blues/blues.00064.wav 20 | blues/blues.00077.wav 21 | blues/blues.00083.wav 22 | blues/blues.00084.wav 23 | blues/blues.00092.wav 24 | blues/blues.00095.wav 25 | blues/blues.00098.wav 26 | classical/classical.00009.wav 27 | classical/classical.00016.wav 28 | classical/classical.00019.wav 29 | classical/classical.00021.wav 30 | classical/classical.00024.wav 31 | classical/classical.00029.wav 32 | classical/classical.00034.wav 33 | classical/classical.00035.wav 34 | classical/classical.00036.wav 35 | classical/classical.00037.wav 36 | classical/classical.00041.wav 37 | classical/classical.00042.wav 38 | classical/classical.00043.wav 39 | classical/classical.00045.wav 40 | classical/classical.00052.wav 41 | classical/classical.00053.wav 42 | classical/classical.00055.wav 43 | classical/classical.00056.wav 44 | classical/classical.00061.wav 45 | classical/classical.00066.wav 46 | classical/classical.00069.wav 47 | classical/classical.00075.wav 48 | classical/classical.00087.wav 49 | classical/classical.00088.wav 50 | classical/classical.00092.wav 51 | country/country.00002.wav 52 | country/country.00011.wav 53 | country/country.00021.wav 54 | country/country.00022.wav 55 | country/country.00023.wav 56 | country/country.00024.wav 57 | country/country.00031.wav 58 | country/country.00033.wav 59 | country/country.00041.wav 60 | country/country.00047.wav 61 | country/country.00050.wav 62 | country/country.00051.wav 63 | country/country.00052.wav 64 | country/country.00053.wav 65 | country/country.00057.wav 66 | country/country.00064.wav 67 | country/country.00065.wav 68 | country/country.00072.wav 69 | country/country.00075.wav 70 | country/country.00076.wav 71 | country/country.00077.wav 72 | country/country.00078.wav 73 | country/country.00082.wav 74 | country/country.00091.wav 75 | country/country.00099.wav 76 | disco/disco.00000.wav 77 | disco/disco.00001.wav 78 | disco/disco.00002.wav 79 | disco/disco.00004.wav 80 | disco/disco.00005.wav 81 | disco/disco.00008.wav 82 | disco/disco.00012.wav 83 | disco/disco.00014.wav 84 | disco/disco.00021.wav 85 | disco/disco.00033.wav 86 | disco/disco.00038.wav 87 | disco/disco.00049.wav 88 | disco/disco.00055.wav 89 | disco/disco.00058.wav 90 | disco/disco.00061.wav 91 | disco/disco.00068.wav 92 | disco/disco.00071.wav 93 | disco/disco.00076.wav 94 | disco/disco.00077.wav 95 | disco/disco.00079.wav 96 | disco/disco.00080.wav 97 | disco/disco.00082.wav 98 | disco/disco.00086.wav 99 | 
disco/disco.00094.wav 100 | disco/disco.00098.wav 101 | hiphop/hiphop.00001.wav 102 | hiphop/hiphop.00004.wav 103 | hiphop/hiphop.00008.wav 104 | hiphop/hiphop.00009.wav 105 | hiphop/hiphop.00014.wav 106 | hiphop/hiphop.00015.wav 107 | hiphop/hiphop.00017.wav 108 | hiphop/hiphop.00023.wav 109 | hiphop/hiphop.00024.wav 110 | hiphop/hiphop.00025.wav 111 | hiphop/hiphop.00028.wav 112 | hiphop/hiphop.00029.wav 113 | hiphop/hiphop.00034.wav 114 | hiphop/hiphop.00042.wav 115 | hiphop/hiphop.00061.wav 116 | hiphop/hiphop.00062.wav 117 | hiphop/hiphop.00065.wav 118 | hiphop/hiphop.00070.wav 119 | hiphop/hiphop.00075.wav 120 | hiphop/hiphop.00085.wav 121 | hiphop/hiphop.00087.wav 122 | hiphop/hiphop.00091.wav 123 | hiphop/hiphop.00094.wav 124 | hiphop/hiphop.00095.wav 125 | hiphop/hiphop.00096.wav 126 | jazz/jazz.00003.wav 127 | jazz/jazz.00009.wav 128 | jazz/jazz.00016.wav 129 | jazz/jazz.00018.wav 130 | jazz/jazz.00020.wav 131 | jazz/jazz.00031.wav 132 | jazz/jazz.00033.wav 133 | jazz/jazz.00035.wav 134 | jazz/jazz.00037.wav 135 | jazz/jazz.00039.wav 136 | jazz/jazz.00045.wav 137 | jazz/jazz.00048.wav 138 | jazz/jazz.00053.wav 139 | jazz/jazz.00066.wav 140 | jazz/jazz.00067.wav 141 | jazz/jazz.00069.wav 142 | jazz/jazz.00071.wav 143 | jazz/jazz.00073.wav 144 | jazz/jazz.00076.wav 145 | jazz/jazz.00078.wav 146 | jazz/jazz.00084.wav 147 | jazz/jazz.00085.wav 148 | jazz/jazz.00087.wav 149 | jazz/jazz.00088.wav 150 | jazz/jazz.00099.wav 151 | metal/metal.00001.wav 152 | metal/metal.00002.wav 153 | metal/metal.00005.wav 154 | metal/metal.00018.wav 155 | metal/metal.00020.wav 156 | metal/metal.00021.wav 157 | metal/metal.00030.wav 158 | metal/metal.00035.wav 159 | metal/metal.00040.wav 160 | metal/metal.00042.wav 161 | metal/metal.00046.wav 162 | metal/metal.00048.wav 163 | metal/metal.00050.wav 164 | metal/metal.00051.wav 165 | metal/metal.00054.wav 166 | metal/metal.00057.wav 167 | metal/metal.00058.wav 168 | metal/metal.00062.wav 169 | metal/metal.00066.wav 170 | metal/metal.00069.wav 171 | metal/metal.00078.wav 172 | metal/metal.00080.wav 173 | metal/metal.00084.wav 174 | metal/metal.00092.wav 175 | metal/metal.00098.wav 176 | pop/pop.00000.wav 177 | pop/pop.00005.wav 178 | pop/pop.00006.wav 179 | pop/pop.00008.wav 180 | pop/pop.00021.wav 181 | pop/pop.00030.wav 182 | pop/pop.00031.wav 183 | pop/pop.00034.wav 184 | pop/pop.00036.wav 185 | pop/pop.00038.wav 186 | pop/pop.00039.wav 187 | pop/pop.00040.wav 188 | pop/pop.00044.wav 189 | pop/pop.00046.wav 190 | pop/pop.00049.wav 191 | pop/pop.00052.wav 192 | pop/pop.00066.wav 193 | pop/pop.00068.wav 194 | pop/pop.00069.wav 195 | pop/pop.00070.wav 196 | pop/pop.00084.wav 197 | pop/pop.00088.wav 198 | pop/pop.00091.wav 199 | pop/pop.00096.wav 200 | pop/pop.00097.wav 201 | reggae/reggae.00002.wav 202 | reggae/reggae.00003.wav 203 | reggae/reggae.00006.wav 204 | reggae/reggae.00015.wav 205 | reggae/reggae.00020.wav 206 | reggae/reggae.00021.wav 207 | reggae/reggae.00022.wav 208 | reggae/reggae.00033.wav 209 | reggae/reggae.00035.wav 210 | reggae/reggae.00046.wav 211 | reggae/reggae.00048.wav 212 | reggae/reggae.00050.wav 213 | reggae/reggae.00057.wav 214 | reggae/reggae.00058.wav 215 | reggae/reggae.00068.wav 216 | reggae/reggae.00069.wav 217 | reggae/reggae.00074.wav 218 | reggae/reggae.00078.wav 219 | reggae/reggae.00079.wav 220 | reggae/reggae.00081.wav 221 | reggae/reggae.00083.wav 222 | reggae/reggae.00094.wav 223 | reggae/reggae.00096.wav 224 | reggae/reggae.00097.wav 225 | reggae/reggae.00099.wav 226 | rock/rock.00000.wav 227 | rock/rock.00001.wav 
228 | rock/rock.00002.wav 229 | rock/rock.00009.wav 230 | rock/rock.00010.wav 231 | rock/rock.00011.wav 232 | rock/rock.00020.wav 233 | rock/rock.00031.wav 234 | rock/rock.00032.wav 235 | rock/rock.00033.wav 236 | rock/rock.00039.wav 237 | rock/rock.00042.wav 238 | rock/rock.00047.wav 239 | rock/rock.00050.wav 240 | rock/rock.00051.wav 241 | rock/rock.00057.wav 242 | rock/rock.00060.wav 243 | rock/rock.00064.wav 244 | rock/rock.00070.wav 245 | rock/rock.00072.wav 246 | rock/rock.00073.wav 247 | rock/rock.00074.wav 248 | rock/rock.00079.wav 249 | rock/rock.00082.wav 250 | rock/rock.00098.wav 251 | -------------------------------------------------------------------------------- /gtzan/train_filtered.txt: -------------------------------------------------------------------------------- 1 | blues/blues.00029.wav 2 | blues/blues.00030.wav 3 | blues/blues.00031.wav 4 | blues/blues.00032.wav 5 | blues/blues.00033.wav 6 | blues/blues.00034.wav 7 | blues/blues.00035.wav 8 | blues/blues.00036.wav 9 | blues/blues.00037.wav 10 | blues/blues.00038.wav 11 | blues/blues.00039.wav 12 | blues/blues.00040.wav 13 | blues/blues.00041.wav 14 | blues/blues.00042.wav 15 | blues/blues.00043.wav 16 | blues/blues.00044.wav 17 | blues/blues.00045.wav 18 | blues/blues.00046.wav 19 | blues/blues.00047.wav 20 | blues/blues.00048.wav 21 | blues/blues.00049.wav 22 | blues/blues.00073.wav 23 | blues/blues.00074.wav 24 | blues/blues.00075.wav 25 | blues/blues.00076.wav 26 | blues/blues.00077.wav 27 | blues/blues.00078.wav 28 | blues/blues.00079.wav 29 | blues/blues.00080.wav 30 | blues/blues.00081.wav 31 | blues/blues.00082.wav 32 | blues/blues.00083.wav 33 | blues/blues.00084.wav 34 | blues/blues.00085.wav 35 | blues/blues.00086.wav 36 | blues/blues.00087.wav 37 | blues/blues.00088.wav 38 | blues/blues.00089.wav 39 | blues/blues.00090.wav 40 | blues/blues.00091.wav 41 | blues/blues.00092.wav 42 | blues/blues.00093.wav 43 | blues/blues.00094.wav 44 | blues/blues.00095.wav 45 | blues/blues.00096.wav 46 | blues/blues.00097.wav 47 | classical/classical.00030.wav 48 | classical/classical.00031.wav 49 | classical/classical.00032.wav 50 | classical/classical.00033.wav 51 | classical/classical.00043.wav 52 | classical/classical.00044.wav 53 | classical/classical.00045.wav 54 | classical/classical.00046.wav 55 | classical/classical.00047.wav 56 | classical/classical.00048.wav 57 | classical/classical.00050.wav 58 | classical/classical.00051.wav 59 | classical/classical.00052.wav 60 | classical/classical.00053.wav 61 | classical/classical.00054.wav 62 | classical/classical.00055.wav 63 | classical/classical.00056.wav 64 | classical/classical.00057.wav 65 | classical/classical.00058.wav 66 | classical/classical.00059.wav 67 | classical/classical.00060.wav 68 | classical/classical.00061.wav 69 | classical/classical.00062.wav 70 | classical/classical.00063.wav 71 | classical/classical.00064.wav 72 | classical/classical.00065.wav 73 | classical/classical.00066.wav 74 | classical/classical.00067.wav 75 | classical/classical.00080.wav 76 | classical/classical.00081.wav 77 | classical/classical.00082.wav 78 | classical/classical.00083.wav 79 | classical/classical.00084.wav 80 | classical/classical.00085.wav 81 | classical/classical.00086.wav 82 | classical/classical.00087.wav 83 | classical/classical.00088.wav 84 | classical/classical.00089.wav 85 | classical/classical.00090.wav 86 | classical/classical.00091.wav 87 | classical/classical.00092.wav 88 | classical/classical.00093.wav 89 | classical/classical.00094.wav 90 | 
classical/classical.00095.wav 91 | classical/classical.00096.wav 92 | classical/classical.00097.wav 93 | classical/classical.00098.wav 94 | classical/classical.00099.wav 95 | country/country.00019.wav 96 | country/country.00020.wav 97 | country/country.00021.wav 98 | country/country.00022.wav 99 | country/country.00023.wav 100 | country/country.00024.wav 101 | country/country.00025.wav 102 | country/country.00026.wav 103 | country/country.00028.wav 104 | country/country.00029.wav 105 | country/country.00065.wav 106 | country/country.00066.wav 107 | country/country.00067.wav 108 | country/country.00068.wav 109 | country/country.00069.wav 110 | country/country.00070.wav 111 | country/country.00071.wav 112 | country/country.00072.wav 113 | country/country.00073.wav 114 | country/country.00074.wav 115 | country/country.00075.wav 116 | country/country.00076.wav 117 | country/country.00077.wav 118 | country/country.00078.wav 119 | country/country.00079.wav 120 | country/country.00080.wav 121 | country/country.00081.wav 122 | country/country.00082.wav 123 | country/country.00083.wav 124 | country/country.00084.wav 125 | country/country.00085.wav 126 | country/country.00086.wav 127 | country/country.00087.wav 128 | country/country.00088.wav 129 | country/country.00089.wav 130 | country/country.00090.wav 131 | country/country.00091.wav 132 | country/country.00092.wav 133 | country/country.00093.wav 134 | country/country.00094.wav 135 | country/country.00095.wav 136 | country/country.00096.wav 137 | country/country.00097.wav 138 | country/country.00098.wav 139 | country/country.00099.wav 140 | disco/disco.00005.wav 141 | disco/disco.00015.wav 142 | disco/disco.00016.wav 143 | disco/disco.00017.wav 144 | disco/disco.00018.wav 145 | disco/disco.00019.wav 146 | disco/disco.00020.wav 147 | disco/disco.00022.wav 148 | disco/disco.00023.wav 149 | disco/disco.00024.wav 150 | disco/disco.00025.wav 151 | disco/disco.00026.wav 152 | disco/disco.00027.wav 153 | disco/disco.00028.wav 154 | disco/disco.00029.wav 155 | disco/disco.00030.wav 156 | disco/disco.00031.wav 157 | disco/disco.00032.wav 158 | disco/disco.00033.wav 159 | disco/disco.00034.wav 160 | disco/disco.00035.wav 161 | disco/disco.00036.wav 162 | disco/disco.00037.wav 163 | disco/disco.00039.wav 164 | disco/disco.00040.wav 165 | disco/disco.00041.wav 166 | disco/disco.00042.wav 167 | disco/disco.00043.wav 168 | disco/disco.00044.wav 169 | disco/disco.00045.wav 170 | disco/disco.00047.wav 171 | disco/disco.00049.wav 172 | disco/disco.00053.wav 173 | disco/disco.00054.wav 174 | disco/disco.00056.wav 175 | disco/disco.00057.wav 176 | disco/disco.00059.wav 177 | disco/disco.00061.wav 178 | disco/disco.00070.wav 179 | disco/disco.00073.wav 180 | disco/disco.00074.wav 181 | disco/disco.00089.wav 182 | hiphop/hiphop.00002.wav 183 | hiphop/hiphop.00003.wav 184 | hiphop/hiphop.00004.wav 185 | hiphop/hiphop.00005.wav 186 | hiphop/hiphop.00006.wav 187 | hiphop/hiphop.00007.wav 188 | hiphop/hiphop.00008.wav 189 | hiphop/hiphop.00009.wav 190 | hiphop/hiphop.00010.wav 191 | hiphop/hiphop.00011.wav 192 | hiphop/hiphop.00012.wav 193 | hiphop/hiphop.00013.wav 194 | hiphop/hiphop.00014.wav 195 | hiphop/hiphop.00015.wav 196 | hiphop/hiphop.00016.wav 197 | hiphop/hiphop.00017.wav 198 | hiphop/hiphop.00018.wav 199 | hiphop/hiphop.00019.wav 200 | hiphop/hiphop.00020.wav 201 | hiphop/hiphop.00021.wav 202 | hiphop/hiphop.00022.wav 203 | hiphop/hiphop.00023.wav 204 | hiphop/hiphop.00024.wav 205 | hiphop/hiphop.00025.wav 206 | hiphop/hiphop.00028.wav 207 | 
hiphop/hiphop.00029.wav 208 | hiphop/hiphop.00031.wav 209 | hiphop/hiphop.00032.wav 210 | hiphop/hiphop.00033.wav 211 | hiphop/hiphop.00034.wav 212 | hiphop/hiphop.00035.wav 213 | hiphop/hiphop.00036.wav 214 | hiphop/hiphop.00037.wav 215 | hiphop/hiphop.00038.wav 216 | hiphop/hiphop.00041.wav 217 | hiphop/hiphop.00042.wav 218 | hiphop/hiphop.00055.wav 219 | hiphop/hiphop.00056.wav 220 | hiphop/hiphop.00057.wav 221 | hiphop/hiphop.00058.wav 222 | hiphop/hiphop.00059.wav 223 | hiphop/hiphop.00060.wav 224 | hiphop/hiphop.00061.wav 225 | hiphop/hiphop.00077.wav 226 | hiphop/hiphop.00078.wav 227 | hiphop/hiphop.00079.wav 228 | hiphop/hiphop.00080.wav 229 | jazz/jazz.00000.wav 230 | jazz/jazz.00001.wav 231 | jazz/jazz.00011.wav 232 | jazz/jazz.00012.wav 233 | jazz/jazz.00013.wav 234 | jazz/jazz.00014.wav 235 | jazz/jazz.00015.wav 236 | jazz/jazz.00016.wav 237 | jazz/jazz.00017.wav 238 | jazz/jazz.00018.wav 239 | jazz/jazz.00019.wav 240 | jazz/jazz.00020.wav 241 | jazz/jazz.00021.wav 242 | jazz/jazz.00022.wav 243 | jazz/jazz.00023.wav 244 | jazz/jazz.00024.wav 245 | jazz/jazz.00041.wav 246 | jazz/jazz.00047.wav 247 | jazz/jazz.00048.wav 248 | jazz/jazz.00049.wav 249 | jazz/jazz.00050.wav 250 | jazz/jazz.00051.wav 251 | jazz/jazz.00052.wav 252 | jazz/jazz.00053.wav 253 | jazz/jazz.00054.wav 254 | jazz/jazz.00055.wav 255 | jazz/jazz.00056.wav 256 | jazz/jazz.00057.wav 257 | jazz/jazz.00058.wav 258 | jazz/jazz.00059.wav 259 | jazz/jazz.00060.wav 260 | jazz/jazz.00061.wav 261 | jazz/jazz.00062.wav 262 | jazz/jazz.00063.wav 263 | jazz/jazz.00064.wav 264 | jazz/jazz.00065.wav 265 | jazz/jazz.00066.wav 266 | jazz/jazz.00067.wav 267 | jazz/jazz.00068.wav 268 | jazz/jazz.00069.wav 269 | jazz/jazz.00070.wav 270 | jazz/jazz.00071.wav 271 | jazz/jazz.00072.wav 272 | metal/metal.00002.wav 273 | metal/metal.00003.wav 274 | metal/metal.00005.wav 275 | metal/metal.00021.wav 276 | metal/metal.00024.wav 277 | metal/metal.00035.wav 278 | metal/metal.00046.wav 279 | metal/metal.00047.wav 280 | metal/metal.00048.wav 281 | metal/metal.00049.wav 282 | metal/metal.00050.wav 283 | metal/metal.00051.wav 284 | metal/metal.00052.wav 285 | metal/metal.00053.wav 286 | metal/metal.00054.wav 287 | metal/metal.00055.wav 288 | metal/metal.00056.wav 289 | metal/metal.00057.wav 290 | metal/metal.00059.wav 291 | metal/metal.00060.wav 292 | metal/metal.00061.wav 293 | metal/metal.00062.wav 294 | metal/metal.00063.wav 295 | metal/metal.00064.wav 296 | metal/metal.00065.wav 297 | metal/metal.00066.wav 298 | metal/metal.00069.wav 299 | metal/metal.00071.wav 300 | metal/metal.00072.wav 301 | metal/metal.00079.wav 302 | metal/metal.00080.wav 303 | metal/metal.00084.wav 304 | metal/metal.00086.wav 305 | metal/metal.00089.wav 306 | metal/metal.00090.wav 307 | metal/metal.00091.wav 308 | metal/metal.00092.wav 309 | metal/metal.00093.wav 310 | metal/metal.00094.wav 311 | metal/metal.00095.wav 312 | metal/metal.00096.wav 313 | metal/metal.00097.wav 314 | metal/metal.00098.wav 315 | metal/metal.00099.wav 316 | pop/pop.00002.wav 317 | pop/pop.00003.wav 318 | pop/pop.00004.wav 319 | pop/pop.00005.wav 320 | pop/pop.00006.wav 321 | pop/pop.00007.wav 322 | pop/pop.00008.wav 323 | pop/pop.00009.wav 324 | pop/pop.00011.wav 325 | pop/pop.00012.wav 326 | pop/pop.00016.wav 327 | pop/pop.00017.wav 328 | pop/pop.00018.wav 329 | pop/pop.00019.wav 330 | pop/pop.00020.wav 331 | pop/pop.00023.wav 332 | pop/pop.00024.wav 333 | pop/pop.00025.wav 334 | pop/pop.00026.wav 335 | pop/pop.00027.wav 336 | pop/pop.00028.wav 337 | pop/pop.00029.wav 338 | 
pop/pop.00031.wav 339 | pop/pop.00032.wav 340 | pop/pop.00033.wav 341 | pop/pop.00034.wav 342 | pop/pop.00035.wav 343 | pop/pop.00036.wav 344 | pop/pop.00038.wav 345 | pop/pop.00039.wav 346 | pop/pop.00040.wav 347 | pop/pop.00041.wav 348 | pop/pop.00042.wav 349 | pop/pop.00044.wav 350 | pop/pop.00046.wav 351 | pop/pop.00049.wav 352 | pop/pop.00050.wav 353 | pop/pop.00080.wav 354 | pop/pop.00097.wav 355 | pop/pop.00098.wav 356 | pop/pop.00099.wav 357 | reggae/reggae.00000.wav 358 | reggae/reggae.00001.wav 359 | reggae/reggae.00002.wav 360 | reggae/reggae.00004.wav 361 | reggae/reggae.00006.wav 362 | reggae/reggae.00009.wav 363 | reggae/reggae.00011.wav 364 | reggae/reggae.00012.wav 365 | reggae/reggae.00014.wav 366 | reggae/reggae.00015.wav 367 | reggae/reggae.00016.wav 368 | reggae/reggae.00017.wav 369 | reggae/reggae.00018.wav 370 | reggae/reggae.00019.wav 371 | reggae/reggae.00020.wav 372 | reggae/reggae.00021.wav 373 | reggae/reggae.00022.wav 374 | reggae/reggae.00023.wav 375 | reggae/reggae.00024.wav 376 | reggae/reggae.00025.wav 377 | reggae/reggae.00026.wav 378 | reggae/reggae.00027.wav 379 | reggae/reggae.00028.wav 380 | reggae/reggae.00029.wav 381 | reggae/reggae.00030.wav 382 | reggae/reggae.00031.wav 383 | reggae/reggae.00032.wav 384 | reggae/reggae.00042.wav 385 | reggae/reggae.00043.wav 386 | reggae/reggae.00044.wav 387 | reggae/reggae.00045.wav 388 | reggae/reggae.00049.wav 389 | reggae/reggae.00050.wav 390 | reggae/reggae.00051.wav 391 | reggae/reggae.00054.wav 392 | reggae/reggae.00055.wav 393 | reggae/reggae.00056.wav 394 | reggae/reggae.00057.wav 395 | reggae/reggae.00058.wav 396 | reggae/reggae.00059.wav 397 | reggae/reggae.00060.wav 398 | reggae/reggae.00063.wav 399 | reggae/reggae.00069.wav 400 | rock/rock.00000.wav 401 | rock/rock.00001.wav 402 | rock/rock.00002.wav 403 | rock/rock.00003.wav 404 | rock/rock.00004.wav 405 | rock/rock.00005.wav 406 | rock/rock.00006.wav 407 | rock/rock.00007.wav 408 | rock/rock.00008.wav 409 | rock/rock.00009.wav 410 | rock/rock.00016.wav 411 | rock/rock.00017.wav 412 | rock/rock.00018.wav 413 | rock/rock.00019.wav 414 | rock/rock.00020.wav 415 | rock/rock.00021.wav 416 | rock/rock.00022.wav 417 | rock/rock.00023.wav 418 | rock/rock.00024.wav 419 | rock/rock.00025.wav 420 | rock/rock.00026.wav 421 | rock/rock.00057.wav 422 | rock/rock.00058.wav 423 | rock/rock.00059.wav 424 | rock/rock.00060.wav 425 | rock/rock.00061.wav 426 | rock/rock.00062.wav 427 | rock/rock.00063.wav 428 | rock/rock.00064.wav 429 | rock/rock.00065.wav 430 | rock/rock.00066.wav 431 | rock/rock.00067.wav 432 | rock/rock.00068.wav 433 | rock/rock.00069.wav 434 | rock/rock.00070.wav 435 | rock/rock.00091.wav 436 | rock/rock.00092.wav 437 | rock/rock.00093.wav 438 | rock/rock.00094.wav 439 | rock/rock.00095.wav 440 | rock/rock.00096.wav 441 | rock/rock.00097.wav 442 | rock/rock.00098.wav 443 | rock/rock.00099.wav 444 | -------------------------------------------------------------------------------- /gtzan/train_stratified.txt: -------------------------------------------------------------------------------- 1 | blues/blues.00000.wav 2 | blues/blues.00001.wav 3 | blues/blues.00002.wav 4 | blues/blues.00003.wav 5 | blues/blues.00006.wav 6 | blues/blues.00011.wav 7 | blues/blues.00014.wav 8 | blues/blues.00017.wav 9 | blues/blues.00018.wav 10 | blues/blues.00019.wav 11 | blues/blues.00021.wav 12 | blues/blues.00026.wav 13 | blues/blues.00027.wav 14 | blues/blues.00029.wav 15 | blues/blues.00030.wav 16 | blues/blues.00033.wav 17 | blues/blues.00034.wav 18 | 
blues/blues.00036.wav 19 | blues/blues.00037.wav 20 | blues/blues.00038.wav 21 | blues/blues.00039.wav 22 | blues/blues.00041.wav 23 | blues/blues.00044.wav 24 | blues/blues.00046.wav 25 | blues/blues.00047.wav 26 | blues/blues.00048.wav 27 | blues/blues.00050.wav 28 | blues/blues.00055.wav 29 | blues/blues.00056.wav 30 | blues/blues.00057.wav 31 | blues/blues.00060.wav 32 | blues/blues.00063.wav 33 | blues/blues.00065.wav 34 | blues/blues.00067.wav 35 | blues/blues.00069.wav 36 | blues/blues.00070.wav 37 | blues/blues.00072.wav 38 | blues/blues.00074.wav 39 | blues/blues.00075.wav 40 | blues/blues.00076.wav 41 | blues/blues.00079.wav 42 | blues/blues.00081.wav 43 | blues/blues.00085.wav 44 | blues/blues.00087.wav 45 | blues/blues.00088.wav 46 | blues/blues.00089.wav 47 | blues/blues.00091.wav 48 | blues/blues.00094.wav 49 | blues/blues.00097.wav 50 | blues/blues.00099.wav 51 | classical/classical.00000.wav 52 | classical/classical.00001.wav 53 | classical/classical.00002.wav 54 | classical/classical.00003.wav 55 | classical/classical.00004.wav 56 | classical/classical.00011.wav 57 | classical/classical.00012.wav 58 | classical/classical.00014.wav 59 | classical/classical.00017.wav 60 | classical/classical.00020.wav 61 | classical/classical.00023.wav 62 | classical/classical.00025.wav 63 | classical/classical.00026.wav 64 | classical/classical.00028.wav 65 | classical/classical.00031.wav 66 | classical/classical.00032.wav 67 | classical/classical.00038.wav 68 | classical/classical.00040.wav 69 | classical/classical.00044.wav 70 | classical/classical.00047.wav 71 | classical/classical.00048.wav 72 | classical/classical.00049.wav 73 | classical/classical.00050.wav 74 | classical/classical.00051.wav 75 | classical/classical.00057.wav 76 | classical/classical.00058.wav 77 | classical/classical.00059.wav 78 | classical/classical.00060.wav 79 | classical/classical.00062.wav 80 | classical/classical.00064.wav 81 | classical/classical.00065.wav 82 | classical/classical.00067.wav 83 | classical/classical.00070.wav 84 | classical/classical.00072.wav 85 | classical/classical.00073.wav 86 | classical/classical.00076.wav 87 | classical/classical.00078.wav 88 | classical/classical.00079.wav 89 | classical/classical.00080.wav 90 | classical/classical.00081.wav 91 | classical/classical.00082.wav 92 | classical/classical.00083.wav 93 | classical/classical.00084.wav 94 | classical/classical.00085.wav 95 | classical/classical.00091.wav 96 | classical/classical.00093.wav 97 | classical/classical.00095.wav 98 | classical/classical.00096.wav 99 | classical/classical.00097.wav 100 | classical/classical.00099.wav 101 | country/country.00003.wav 102 | country/country.00006.wav 103 | country/country.00007.wav 104 | country/country.00008.wav 105 | country/country.00010.wav 106 | country/country.00012.wav 107 | country/country.00013.wav 108 | country/country.00014.wav 109 | country/country.00016.wav 110 | country/country.00017.wav 111 | country/country.00018.wav 112 | country/country.00019.wav 113 | country/country.00028.wav 114 | country/country.00029.wav 115 | country/country.00032.wav 116 | country/country.00034.wav 117 | country/country.00035.wav 118 | country/country.00036.wav 119 | country/country.00037.wav 120 | country/country.00038.wav 121 | country/country.00039.wav 122 | country/country.00040.wav 123 | country/country.00044.wav 124 | country/country.00045.wav 125 | country/country.00048.wav 126 | country/country.00049.wav 127 | country/country.00054.wav 128 | country/country.00056.wav 129 | 
country/country.00059.wav 130 | country/country.00061.wav 131 | country/country.00062.wav 132 | country/country.00063.wav 133 | country/country.00066.wav 134 | country/country.00067.wav 135 | country/country.00068.wav 136 | country/country.00069.wav 137 | country/country.00070.wav 138 | country/country.00073.wav 139 | country/country.00074.wav 140 | country/country.00079.wav 141 | country/country.00080.wav 142 | country/country.00081.wav 143 | country/country.00085.wav 144 | country/country.00086.wav 145 | country/country.00089.wav 146 | country/country.00093.wav 147 | country/country.00094.wav 148 | country/country.00095.wav 149 | country/country.00096.wav 150 | country/country.00097.wav 151 | disco/disco.00006.wav 152 | disco/disco.00007.wav 153 | disco/disco.00009.wav 154 | disco/disco.00010.wav 155 | disco/disco.00011.wav 156 | disco/disco.00013.wav 157 | disco/disco.00016.wav 158 | disco/disco.00017.wav 159 | disco/disco.00018.wav 160 | disco/disco.00019.wav 161 | disco/disco.00020.wav 162 | disco/disco.00023.wav 163 | disco/disco.00024.wav 164 | disco/disco.00026.wav 165 | disco/disco.00027.wav 166 | disco/disco.00030.wav 167 | disco/disco.00031.wav 168 | disco/disco.00034.wav 169 | disco/disco.00035.wav 170 | disco/disco.00036.wav 171 | disco/disco.00037.wav 172 | disco/disco.00041.wav 173 | disco/disco.00042.wav 174 | disco/disco.00044.wav 175 | disco/disco.00045.wav 176 | disco/disco.00047.wav 177 | disco/disco.00048.wav 178 | disco/disco.00052.wav 179 | disco/disco.00054.wav 180 | disco/disco.00056.wav 181 | disco/disco.00057.wav 182 | disco/disco.00060.wav 183 | disco/disco.00063.wav 184 | disco/disco.00064.wav 185 | disco/disco.00065.wav 186 | disco/disco.00066.wav 187 | disco/disco.00067.wav 188 | disco/disco.00069.wav 189 | disco/disco.00075.wav 190 | disco/disco.00078.wav 191 | disco/disco.00081.wav 192 | disco/disco.00083.wav 193 | disco/disco.00084.wav 194 | disco/disco.00088.wav 195 | disco/disco.00089.wav 196 | disco/disco.00090.wav 197 | disco/disco.00091.wav 198 | disco/disco.00096.wav 199 | disco/disco.00097.wav 200 | disco/disco.00099.wav 201 | hiphop/hiphop.00000.wav 202 | hiphop/hiphop.00002.wav 203 | hiphop/hiphop.00006.wav 204 | hiphop/hiphop.00011.wav 205 | hiphop/hiphop.00012.wav 206 | hiphop/hiphop.00013.wav 207 | hiphop/hiphop.00018.wav 208 | hiphop/hiphop.00020.wav 209 | hiphop/hiphop.00021.wav 210 | hiphop/hiphop.00022.wav 211 | hiphop/hiphop.00026.wav 212 | hiphop/hiphop.00027.wav 213 | hiphop/hiphop.00030.wav 214 | hiphop/hiphop.00031.wav 215 | hiphop/hiphop.00032.wav 216 | hiphop/hiphop.00033.wav 217 | hiphop/hiphop.00035.wav 218 | hiphop/hiphop.00038.wav 219 | hiphop/hiphop.00040.wav 220 | hiphop/hiphop.00041.wav 221 | hiphop/hiphop.00044.wav 222 | hiphop/hiphop.00045.wav 223 | hiphop/hiphop.00046.wav 224 | hiphop/hiphop.00049.wav 225 | hiphop/hiphop.00050.wav 226 | hiphop/hiphop.00052.wav 227 | hiphop/hiphop.00053.wav 228 | hiphop/hiphop.00054.wav 229 | hiphop/hiphop.00056.wav 230 | hiphop/hiphop.00057.wav 231 | hiphop/hiphop.00058.wav 232 | hiphop/hiphop.00059.wav 233 | hiphop/hiphop.00060.wav 234 | hiphop/hiphop.00063.wav 235 | hiphop/hiphop.00064.wav 236 | hiphop/hiphop.00068.wav 237 | hiphop/hiphop.00072.wav 238 | hiphop/hiphop.00073.wav 239 | hiphop/hiphop.00074.wav 240 | hiphop/hiphop.00077.wav 241 | hiphop/hiphop.00078.wav 242 | hiphop/hiphop.00079.wav 243 | hiphop/hiphop.00084.wav 244 | hiphop/hiphop.00086.wav 245 | hiphop/hiphop.00088.wav 246 | hiphop/hiphop.00089.wav 247 | hiphop/hiphop.00090.wav 248 | hiphop/hiphop.00092.wav 249 | 
hiphop/hiphop.00098.wav 250 | hiphop/hiphop.00099.wav 251 | jazz/jazz.00002.wav 252 | jazz/jazz.00004.wav 253 | jazz/jazz.00005.wav 254 | jazz/jazz.00006.wav 255 | jazz/jazz.00007.wav 256 | jazz/jazz.00010.wav 257 | jazz/jazz.00012.wav 258 | jazz/jazz.00013.wav 259 | jazz/jazz.00014.wav 260 | jazz/jazz.00015.wav 261 | jazz/jazz.00023.wav 262 | jazz/jazz.00024.wav 263 | jazz/jazz.00026.wav 264 | jazz/jazz.00027.wav 265 | jazz/jazz.00028.wav 266 | jazz/jazz.00030.wav 267 | jazz/jazz.00032.wav 268 | jazz/jazz.00034.wav 269 | jazz/jazz.00038.wav 270 | jazz/jazz.00040.wav 271 | jazz/jazz.00041.wav 272 | jazz/jazz.00043.wav 273 | jazz/jazz.00050.wav 274 | jazz/jazz.00052.wav 275 | jazz/jazz.00054.wav 276 | jazz/jazz.00055.wav 277 | jazz/jazz.00057.wav 278 | jazz/jazz.00058.wav 279 | jazz/jazz.00059.wav 280 | jazz/jazz.00060.wav 281 | jazz/jazz.00061.wav 282 | jazz/jazz.00062.wav 283 | jazz/jazz.00064.wav 284 | jazz/jazz.00068.wav 285 | jazz/jazz.00070.wav 286 | jazz/jazz.00072.wav 287 | jazz/jazz.00074.wav 288 | jazz/jazz.00075.wav 289 | jazz/jazz.00077.wav 290 | jazz/jazz.00079.wav 291 | jazz/jazz.00080.wav 292 | jazz/jazz.00086.wav 293 | jazz/jazz.00089.wav 294 | jazz/jazz.00090.wav 295 | jazz/jazz.00091.wav 296 | jazz/jazz.00092.wav 297 | jazz/jazz.00093.wav 298 | jazz/jazz.00094.wav 299 | jazz/jazz.00095.wav 300 | jazz/jazz.00096.wav 301 | metal/metal.00000.wav 302 | metal/metal.00003.wav 303 | metal/metal.00004.wav 304 | metal/metal.00006.wav 305 | metal/metal.00010.wav 306 | metal/metal.00011.wav 307 | metal/metal.00013.wav 308 | metal/metal.00014.wav 309 | metal/metal.00016.wav 310 | metal/metal.00017.wav 311 | metal/metal.00019.wav 312 | metal/metal.00022.wav 313 | metal/metal.00023.wav 314 | metal/metal.00024.wav 315 | metal/metal.00025.wav 316 | metal/metal.00026.wav 317 | metal/metal.00027.wav 318 | metal/metal.00028.wav 319 | metal/metal.00036.wav 320 | metal/metal.00037.wav 321 | metal/metal.00038.wav 322 | metal/metal.00041.wav 323 | metal/metal.00045.wav 324 | metal/metal.00047.wav 325 | metal/metal.00049.wav 326 | metal/metal.00052.wav 327 | metal/metal.00056.wav 328 | metal/metal.00059.wav 329 | metal/metal.00060.wav 330 | metal/metal.00061.wav 331 | metal/metal.00065.wav 332 | metal/metal.00070.wav 333 | metal/metal.00071.wav 334 | metal/metal.00072.wav 335 | metal/metal.00074.wav 336 | metal/metal.00075.wav 337 | metal/metal.00076.wav 338 | metal/metal.00077.wav 339 | metal/metal.00079.wav 340 | metal/metal.00081.wav 341 | metal/metal.00082.wav 342 | metal/metal.00085.wav 343 | metal/metal.00086.wav 344 | metal/metal.00088.wav 345 | metal/metal.00089.wav 346 | metal/metal.00090.wav 347 | metal/metal.00091.wav 348 | metal/metal.00093.wav 349 | metal/metal.00097.wav 350 | metal/metal.00099.wav 351 | pop/pop.00002.wav 352 | pop/pop.00003.wav 353 | pop/pop.00004.wav 354 | pop/pop.00009.wav 355 | pop/pop.00011.wav 356 | pop/pop.00012.wav 357 | pop/pop.00013.wav 358 | pop/pop.00015.wav 359 | pop/pop.00017.wav 360 | pop/pop.00020.wav 361 | pop/pop.00022.wav 362 | pop/pop.00024.wav 363 | pop/pop.00026.wav 364 | pop/pop.00027.wav 365 | pop/pop.00032.wav 366 | pop/pop.00033.wav 367 | pop/pop.00035.wav 368 | pop/pop.00041.wav 369 | pop/pop.00042.wav 370 | pop/pop.00043.wav 371 | pop/pop.00045.wav 372 | pop/pop.00048.wav 373 | pop/pop.00050.wav 374 | pop/pop.00051.wav 375 | pop/pop.00053.wav 376 | pop/pop.00054.wav 377 | pop/pop.00055.wav 378 | pop/pop.00056.wav 379 | pop/pop.00061.wav 380 | pop/pop.00062.wav 381 | pop/pop.00063.wav 382 | pop/pop.00064.wav 383 | pop/pop.00065.wav 384 | 
pop/pop.00067.wav 385 | pop/pop.00071.wav 386 | pop/pop.00072.wav 387 | pop/pop.00074.wav 388 | pop/pop.00075.wav 389 | pop/pop.00076.wav 390 | pop/pop.00077.wav 391 | pop/pop.00079.wav 392 | pop/pop.00081.wav 393 | pop/pop.00082.wav 394 | pop/pop.00083.wav 395 | pop/pop.00086.wav 396 | pop/pop.00087.wav 397 | pop/pop.00092.wav 398 | pop/pop.00093.wav 399 | pop/pop.00095.wav 400 | pop/pop.00098.wav 401 | reggae/reggae.00004.wav 402 | reggae/reggae.00005.wav 403 | reggae/reggae.00009.wav 404 | reggae/reggae.00010.wav 405 | reggae/reggae.00011.wav 406 | reggae/reggae.00012.wav 407 | reggae/reggae.00013.wav 408 | reggae/reggae.00014.wav 409 | reggae/reggae.00016.wav 410 | reggae/reggae.00017.wav 411 | reggae/reggae.00018.wav 412 | reggae/reggae.00023.wav 413 | reggae/reggae.00027.wav 414 | reggae/reggae.00028.wav 415 | reggae/reggae.00030.wav 416 | reggae/reggae.00031.wav 417 | reggae/reggae.00036.wav 418 | reggae/reggae.00037.wav 419 | reggae/reggae.00040.wav 420 | reggae/reggae.00041.wav 421 | reggae/reggae.00042.wav 422 | reggae/reggae.00043.wav 423 | reggae/reggae.00044.wav 424 | reggae/reggae.00049.wav 425 | reggae/reggae.00051.wav 426 | reggae/reggae.00052.wav 427 | reggae/reggae.00053.wav 428 | reggae/reggae.00054.wav 429 | reggae/reggae.00055.wav 430 | reggae/reggae.00056.wav 431 | reggae/reggae.00059.wav 432 | reggae/reggae.00060.wav 433 | reggae/reggae.00062.wav 434 | reggae/reggae.00064.wav 435 | reggae/reggae.00065.wav 436 | reggae/reggae.00066.wav 437 | reggae/reggae.00071.wav 438 | reggae/reggae.00073.wav 439 | reggae/reggae.00075.wav 440 | reggae/reggae.00076.wav 441 | reggae/reggae.00077.wav 442 | reggae/reggae.00082.wav 443 | reggae/reggae.00084.wav 444 | reggae/reggae.00087.wav 445 | reggae/reggae.00088.wav 446 | reggae/reggae.00089.wav 447 | reggae/reggae.00091.wav 448 | reggae/reggae.00092.wav 449 | reggae/reggae.00095.wav 450 | reggae/reggae.00098.wav 451 | rock/rock.00003.wav 452 | rock/rock.00004.wav 453 | rock/rock.00005.wav 454 | rock/rock.00006.wav 455 | rock/rock.00008.wav 456 | rock/rock.00014.wav 457 | rock/rock.00015.wav 458 | rock/rock.00017.wav 459 | rock/rock.00023.wav 460 | rock/rock.00024.wav 461 | rock/rock.00025.wav 462 | rock/rock.00026.wav 463 | rock/rock.00027.wav 464 | rock/rock.00034.wav 465 | rock/rock.00035.wav 466 | rock/rock.00037.wav 467 | rock/rock.00040.wav 468 | rock/rock.00041.wav 469 | rock/rock.00044.wav 470 | rock/rock.00045.wav 471 | rock/rock.00048.wav 472 | rock/rock.00052.wav 473 | rock/rock.00054.wav 474 | rock/rock.00055.wav 475 | rock/rock.00058.wav 476 | rock/rock.00059.wav 477 | rock/rock.00062.wav 478 | rock/rock.00063.wav 479 | rock/rock.00065.wav 480 | rock/rock.00066.wav 481 | rock/rock.00067.wav 482 | rock/rock.00069.wav 483 | rock/rock.00071.wav 484 | rock/rock.00075.wav 485 | rock/rock.00077.wav 486 | rock/rock.00078.wav 487 | rock/rock.00080.wav 488 | rock/rock.00081.wav 489 | rock/rock.00083.wav 490 | rock/rock.00084.wav 491 | rock/rock.00085.wav 492 | rock/rock.00086.wav 493 | rock/rock.00088.wav 494 | rock/rock.00090.wav 495 | rock/rock.00091.wav 496 | rock/rock.00093.wav 497 | rock/rock.00094.wav 498 | rock/rock.00095.wav 499 | rock/rock.00097.wav 500 | rock/rock.00099.wav 501 | -------------------------------------------------------------------------------- /gtzan/valid_filtered.txt: -------------------------------------------------------------------------------- 1 | blues/blues.00000.wav 2 | blues/blues.00001.wav 3 | blues/blues.00002.wav 4 | blues/blues.00003.wav 5 | blues/blues.00004.wav 6 | blues/blues.00005.wav 7 
| blues/blues.00006.wav 8 | blues/blues.00007.wav 9 | blues/blues.00008.wav 10 | blues/blues.00009.wav 11 | blues/blues.00010.wav 12 | blues/blues.00011.wav 13 | blues/blues.00050.wav 14 | blues/blues.00051.wav 15 | blues/blues.00052.wav 16 | blues/blues.00053.wav 17 | blues/blues.00054.wav 18 | blues/blues.00055.wav 19 | blues/blues.00056.wav 20 | blues/blues.00057.wav 21 | blues/blues.00058.wav 22 | blues/blues.00059.wav 23 | blues/blues.00060.wav 24 | classical/classical.00000.wav 25 | classical/classical.00001.wav 26 | classical/classical.00002.wav 27 | classical/classical.00003.wav 28 | classical/classical.00004.wav 29 | classical/classical.00005.wav 30 | classical/classical.00006.wav 31 | classical/classical.00007.wav 32 | classical/classical.00008.wav 33 | classical/classical.00009.wav 34 | classical/classical.00010.wav 35 | classical/classical.00068.wav 36 | classical/classical.00069.wav 37 | classical/classical.00070.wav 38 | classical/classical.00071.wav 39 | classical/classical.00072.wav 40 | classical/classical.00073.wav 41 | classical/classical.00074.wav 42 | classical/classical.00075.wav 43 | classical/classical.00076.wav 44 | country/country.00000.wav 45 | country/country.00001.wav 46 | country/country.00002.wav 47 | country/country.00003.wav 48 | country/country.00004.wav 49 | country/country.00005.wav 50 | country/country.00006.wav 51 | country/country.00007.wav 52 | country/country.00009.wav 53 | country/country.00010.wav 54 | country/country.00011.wav 55 | country/country.00012.wav 56 | country/country.00013.wav 57 | country/country.00014.wav 58 | country/country.00015.wav 59 | country/country.00016.wav 60 | country/country.00017.wav 61 | country/country.00018.wav 62 | country/country.00027.wav 63 | country/country.00041.wav 64 | country/country.00042.wav 65 | country/country.00045.wav 66 | country/country.00049.wav 67 | disco/disco.00000.wav 68 | disco/disco.00002.wav 69 | disco/disco.00003.wav 70 | disco/disco.00004.wav 71 | disco/disco.00006.wav 72 | disco/disco.00007.wav 73 | disco/disco.00008.wav 74 | disco/disco.00009.wav 75 | disco/disco.00010.wav 76 | disco/disco.00011.wav 77 | disco/disco.00012.wav 78 | disco/disco.00013.wav 79 | disco/disco.00014.wav 80 | disco/disco.00046.wav 81 | disco/disco.00048.wav 82 | disco/disco.00052.wav 83 | disco/disco.00067.wav 84 | disco/disco.00068.wav 85 | disco/disco.00072.wav 86 | disco/disco.00075.wav 87 | disco/disco.00090.wav 88 | disco/disco.00095.wav 89 | hiphop/hiphop.00081.wav 90 | hiphop/hiphop.00082.wav 91 | hiphop/hiphop.00083.wav 92 | hiphop/hiphop.00084.wav 93 | hiphop/hiphop.00085.wav 94 | hiphop/hiphop.00086.wav 95 | hiphop/hiphop.00087.wav 96 | hiphop/hiphop.00088.wav 97 | hiphop/hiphop.00089.wav 98 | hiphop/hiphop.00090.wav 99 | hiphop/hiphop.00091.wav 100 | hiphop/hiphop.00092.wav 101 | hiphop/hiphop.00093.wav 102 | hiphop/hiphop.00094.wav 103 | hiphop/hiphop.00095.wav 104 | hiphop/hiphop.00096.wav 105 | hiphop/hiphop.00097.wav 106 | hiphop/hiphop.00098.wav 107 | jazz/jazz.00002.wav 108 | jazz/jazz.00003.wav 109 | jazz/jazz.00004.wav 110 | jazz/jazz.00005.wav 111 | jazz/jazz.00006.wav 112 | jazz/jazz.00007.wav 113 | jazz/jazz.00008.wav 114 | jazz/jazz.00009.wav 115 | jazz/jazz.00010.wav 116 | jazz/jazz.00025.wav 117 | jazz/jazz.00026.wav 118 | jazz/jazz.00027.wav 119 | jazz/jazz.00028.wav 120 | jazz/jazz.00029.wav 121 | jazz/jazz.00030.wav 122 | jazz/jazz.00031.wav 123 | jazz/jazz.00032.wav 124 | metal/metal.00000.wav 125 | metal/metal.00001.wav 126 | metal/metal.00006.wav 127 | metal/metal.00007.wav 128 | 
metal/metal.00008.wav 129 | metal/metal.00009.wav 130 | metal/metal.00010.wav 131 | metal/metal.00011.wav 132 | metal/metal.00016.wav 133 | metal/metal.00017.wav 134 | metal/metal.00018.wav 135 | metal/metal.00019.wav 136 | metal/metal.00020.wav 137 | metal/metal.00036.wav 138 | metal/metal.00037.wav 139 | metal/metal.00068.wav 140 | metal/metal.00076.wav 141 | metal/metal.00077.wav 142 | metal/metal.00081.wav 143 | metal/metal.00082.wav 144 | pop/pop.00010.wav 145 | pop/pop.00053.wav 146 | pop/pop.00055.wav 147 | pop/pop.00058.wav 148 | pop/pop.00059.wav 149 | pop/pop.00060.wav 150 | pop/pop.00061.wav 151 | pop/pop.00062.wav 152 | pop/pop.00081.wav 153 | pop/pop.00083.wav 154 | pop/pop.00084.wav 155 | pop/pop.00085.wav 156 | pop/pop.00086.wav 157 | reggae/reggae.00061.wav 158 | reggae/reggae.00062.wav 159 | reggae/reggae.00070.wav 160 | reggae/reggae.00072.wav 161 | reggae/reggae.00074.wav 162 | reggae/reggae.00076.wav 163 | reggae/reggae.00077.wav 164 | reggae/reggae.00078.wav 165 | reggae/reggae.00085.wav 166 | reggae/reggae.00092.wav 167 | reggae/reggae.00093.wav 168 | reggae/reggae.00094.wav 169 | reggae/reggae.00095.wav 170 | reggae/reggae.00096.wav 171 | reggae/reggae.00097.wav 172 | reggae/reggae.00098.wav 173 | reggae/reggae.00099.wav 174 | rock/rock.00038.wav 175 | rock/rock.00049.wav 176 | rock/rock.00050.wav 177 | rock/rock.00051.wav 178 | rock/rock.00052.wav 179 | rock/rock.00053.wav 180 | rock/rock.00054.wav 181 | rock/rock.00055.wav 182 | rock/rock.00056.wav 183 | rock/rock.00071.wav 184 | rock/rock.00072.wav 185 | rock/rock.00073.wav 186 | rock/rock.00074.wav 187 | rock/rock.00075.wav 188 | rock/rock.00076.wav 189 | rock/rock.00077.wav 190 | rock/rock.00078.wav 191 | rock/rock.00079.wav 192 | rock/rock.00080.wav 193 | rock/rock.00081.wav 194 | rock/rock.00082.wav 195 | rock/rock.00083.wav 196 | rock/rock.00084.wav 197 | rock/rock.00085.wav 198 | -------------------------------------------------------------------------------- /gtzan/valid_stratified.txt: -------------------------------------------------------------------------------- 1 | blues/blues.00004.wav 2 | blues/blues.00007.wav 3 | blues/blues.00008.wav 4 | blues/blues.00009.wav 5 | blues/blues.00016.wav 6 | blues/blues.00022.wav 7 | blues/blues.00032.wav 8 | blues/blues.00035.wav 9 | blues/blues.00042.wav 10 | blues/blues.00045.wav 11 | blues/blues.00051.wav 12 | blues/blues.00054.wav 13 | blues/blues.00058.wav 14 | blues/blues.00061.wav 15 | blues/blues.00066.wav 16 | blues/blues.00068.wav 17 | blues/blues.00071.wav 18 | blues/blues.00073.wav 19 | blues/blues.00078.wav 20 | blues/blues.00080.wav 21 | blues/blues.00082.wav 22 | blues/blues.00086.wav 23 | blues/blues.00090.wav 24 | blues/blues.00093.wav 25 | blues/blues.00096.wav 26 | classical/classical.00005.wav 27 | classical/classical.00006.wav 28 | classical/classical.00007.wav 29 | classical/classical.00008.wav 30 | classical/classical.00010.wav 31 | classical/classical.00013.wav 32 | classical/classical.00015.wav 33 | classical/classical.00018.wav 34 | classical/classical.00022.wav 35 | classical/classical.00027.wav 36 | classical/classical.00030.wav 37 | classical/classical.00033.wav 38 | classical/classical.00039.wav 39 | classical/classical.00046.wav 40 | classical/classical.00054.wav 41 | classical/classical.00063.wav 42 | classical/classical.00068.wav 43 | classical/classical.00071.wav 44 | classical/classical.00074.wav 45 | classical/classical.00077.wav 46 | classical/classical.00086.wav 47 | classical/classical.00089.wav 48 | 
classical/classical.00090.wav 49 | classical/classical.00094.wav 50 | classical/classical.00098.wav 51 | country/country.00000.wav 52 | country/country.00001.wav 53 | country/country.00004.wav 54 | country/country.00005.wav 55 | country/country.00009.wav 56 | country/country.00015.wav 57 | country/country.00020.wav 58 | country/country.00025.wav 59 | country/country.00026.wav 60 | country/country.00027.wav 61 | country/country.00030.wav 62 | country/country.00042.wav 63 | country/country.00043.wav 64 | country/country.00046.wav 65 | country/country.00055.wav 66 | country/country.00058.wav 67 | country/country.00060.wav 68 | country/country.00071.wav 69 | country/country.00083.wav 70 | country/country.00084.wav 71 | country/country.00087.wav 72 | country/country.00088.wav 73 | country/country.00090.wav 74 | country/country.00092.wav 75 | country/country.00098.wav 76 | disco/disco.00003.wav 77 | disco/disco.00015.wav 78 | disco/disco.00022.wav 79 | disco/disco.00025.wav 80 | disco/disco.00028.wav 81 | disco/disco.00029.wav 82 | disco/disco.00032.wav 83 | disco/disco.00039.wav 84 | disco/disco.00040.wav 85 | disco/disco.00043.wav 86 | disco/disco.00046.wav 87 | disco/disco.00050.wav 88 | disco/disco.00051.wav 89 | disco/disco.00053.wav 90 | disco/disco.00059.wav 91 | disco/disco.00062.wav 92 | disco/disco.00070.wav 93 | disco/disco.00072.wav 94 | disco/disco.00073.wav 95 | disco/disco.00074.wav 96 | disco/disco.00085.wav 97 | disco/disco.00087.wav 98 | disco/disco.00092.wav 99 | disco/disco.00093.wav 100 | disco/disco.00095.wav 101 | hiphop/hiphop.00003.wav 102 | hiphop/hiphop.00005.wav 103 | hiphop/hiphop.00007.wav 104 | hiphop/hiphop.00010.wav 105 | hiphop/hiphop.00016.wav 106 | hiphop/hiphop.00019.wav 107 | hiphop/hiphop.00036.wav 108 | hiphop/hiphop.00037.wav 109 | hiphop/hiphop.00039.wav 110 | hiphop/hiphop.00043.wav 111 | hiphop/hiphop.00047.wav 112 | hiphop/hiphop.00048.wav 113 | hiphop/hiphop.00051.wav 114 | hiphop/hiphop.00055.wav 115 | hiphop/hiphop.00066.wav 116 | hiphop/hiphop.00067.wav 117 | hiphop/hiphop.00069.wav 118 | hiphop/hiphop.00071.wav 119 | hiphop/hiphop.00076.wav 120 | hiphop/hiphop.00080.wav 121 | hiphop/hiphop.00081.wav 122 | hiphop/hiphop.00082.wav 123 | hiphop/hiphop.00083.wav 124 | hiphop/hiphop.00093.wav 125 | hiphop/hiphop.00097.wav 126 | jazz/jazz.00000.wav 127 | jazz/jazz.00001.wav 128 | jazz/jazz.00008.wav 129 | jazz/jazz.00011.wav 130 | jazz/jazz.00017.wav 131 | jazz/jazz.00019.wav 132 | jazz/jazz.00021.wav 133 | jazz/jazz.00022.wav 134 | jazz/jazz.00025.wav 135 | jazz/jazz.00029.wav 136 | jazz/jazz.00036.wav 137 | jazz/jazz.00042.wav 138 | jazz/jazz.00044.wav 139 | jazz/jazz.00046.wav 140 | jazz/jazz.00047.wav 141 | jazz/jazz.00049.wav 142 | jazz/jazz.00051.wav 143 | jazz/jazz.00056.wav 144 | jazz/jazz.00063.wav 145 | jazz/jazz.00065.wav 146 | jazz/jazz.00081.wav 147 | jazz/jazz.00082.wav 148 | jazz/jazz.00083.wav 149 | jazz/jazz.00097.wav 150 | jazz/jazz.00098.wav 151 | metal/metal.00007.wav 152 | metal/metal.00008.wav 153 | metal/metal.00009.wav 154 | metal/metal.00012.wav 155 | metal/metal.00015.wav 156 | metal/metal.00029.wav 157 | metal/metal.00031.wav 158 | metal/metal.00032.wav 159 | metal/metal.00033.wav 160 | metal/metal.00034.wav 161 | metal/metal.00039.wav 162 | metal/metal.00043.wav 163 | metal/metal.00044.wav 164 | metal/metal.00053.wav 165 | metal/metal.00055.wav 166 | metal/metal.00063.wav 167 | metal/metal.00064.wav 168 | metal/metal.00067.wav 169 | metal/metal.00068.wav 170 | metal/metal.00073.wav 171 | metal/metal.00083.wav 172 | 
metal/metal.00087.wav 173 | metal/metal.00094.wav 174 | metal/metal.00095.wav 175 | metal/metal.00096.wav 176 | pop/pop.00001.wav 177 | pop/pop.00007.wav 178 | pop/pop.00010.wav 179 | pop/pop.00014.wav 180 | pop/pop.00016.wav 181 | pop/pop.00018.wav 182 | pop/pop.00019.wav 183 | pop/pop.00023.wav 184 | pop/pop.00025.wav 185 | pop/pop.00028.wav 186 | pop/pop.00029.wav 187 | pop/pop.00037.wav 188 | pop/pop.00047.wav 189 | pop/pop.00057.wav 190 | pop/pop.00058.wav 191 | pop/pop.00059.wav 192 | pop/pop.00060.wav 193 | pop/pop.00073.wav 194 | pop/pop.00078.wav 195 | pop/pop.00080.wav 196 | pop/pop.00085.wav 197 | pop/pop.00089.wav 198 | pop/pop.00090.wav 199 | pop/pop.00094.wav 200 | pop/pop.00099.wav 201 | reggae/reggae.00000.wav 202 | reggae/reggae.00001.wav 203 | reggae/reggae.00007.wav 204 | reggae/reggae.00008.wav 205 | reggae/reggae.00019.wav 206 | reggae/reggae.00024.wav 207 | reggae/reggae.00025.wav 208 | reggae/reggae.00026.wav 209 | reggae/reggae.00029.wav 210 | reggae/reggae.00032.wav 211 | reggae/reggae.00034.wav 212 | reggae/reggae.00038.wav 213 | reggae/reggae.00039.wav 214 | reggae/reggae.00045.wav 215 | reggae/reggae.00047.wav 216 | reggae/reggae.00061.wav 217 | reggae/reggae.00063.wav 218 | reggae/reggae.00067.wav 219 | reggae/reggae.00070.wav 220 | reggae/reggae.00072.wav 221 | reggae/reggae.00080.wav 222 | reggae/reggae.00085.wav 223 | reggae/reggae.00086.wav 224 | reggae/reggae.00090.wav 225 | reggae/reggae.00093.wav 226 | rock/rock.00007.wav 227 | rock/rock.00012.wav 228 | rock/rock.00013.wav 229 | rock/rock.00016.wav 230 | rock/rock.00018.wav 231 | rock/rock.00019.wav 232 | rock/rock.00021.wav 233 | rock/rock.00022.wav 234 | rock/rock.00028.wav 235 | rock/rock.00029.wav 236 | rock/rock.00030.wav 237 | rock/rock.00036.wav 238 | rock/rock.00038.wav 239 | rock/rock.00043.wav 240 | rock/rock.00046.wav 241 | rock/rock.00049.wav 242 | rock/rock.00053.wav 243 | rock/rock.00056.wav 244 | rock/rock.00061.wav 245 | rock/rock.00068.wav 246 | rock/rock.00076.wav 247 | rock/rock.00087.wav 248 | rock/rock.00089.wav 249 | rock/rock.00092.wav 250 | rock/rock.00096.wav 251 | -------------------------------------------------------------------------------- /hpc_scripts/README.md: -------------------------------------------------------------------------------- 1 | ##Setup Tips! 2 | If you want to run this code on a high performance cluster (HPC), or on a computer where you don't have root access, the following setup instructions may be helpful. They may also be helpful in other situations. 3 | 4 | ## Instructions 5 | 6 | ### HPC Module loading 7 | On the HPC I use there are various modules that must be loaded on demand (e.g., python and cuda). This is done by placing a file called .gbarrc in your home directory with the following lines: 8 | ``` 9 | MODULES=python/2.7.3,cuda/6.5 10 | ``` 11 | This will probably differ from case to case depending on how your HPC is set up. 12 | 13 | ### Python virtual environment 14 | I first set up a Python virtual environment (which provides a clean copy of Python without any packages) as follows: 15 | ``` 16 | mkdir venv 17 | virtualenv venv 18 | ``` 19 | 20 | After this you must call: 21 | ``` 22 | source venv/bin/activate 23 | ``` 24 | every time you log in to activate the virtual environment. I added the above line to the end of my .bashrc file so it happens automagically. 25 | 26 | ### Installation of libraries 27 | Libraries typically get installed to /usr/local/lib, but often we don't have write access to that location. 
However, we can download and install libraries locally. First make a directory to hold the libraries (starting from your home directory): 28 | ``` 29 | mkdir .local 30 | mkdir .local/lib 31 | mkdir .local/include 32 | ``` 33 | 34 | Add this location to your LD_LIBRARY_PATH too: 35 | ``` 36 | export LD_LIBRARY_PATH=~/.local/lib:$LD_LIBRARY_PATH 37 | ``` 38 | (that line can be added to .bashrc so that it is executed in every new shell). 39 | 40 | Now download and install the libraries. The following "pseudocode" demonstrates the basic process (you will need to track down the correct URLs for your platform): 41 | ``` 42 | url_list = [ 43 | "http://www.hdfgroup.org/HDF5/release/obtain5.html", # find your libhdf5 here 44 | "http://sourceforge.net/projects/mad/files/libmad", # find libmad here (only needed if you plan to read mp3 files) 45 | "http://www.mega-nerd.com/SRC/download.html", # libsamplerate 46 | "http://www.mega-nerd.com/libsndfile/#Download", #libsndfile 47 | "http://pyyaml.org/download/libyaml" #libyaml 48 | ] 49 | 50 | for url in url_list: 51 | wget url 52 | tar xvfz package_name.tar.gz 53 | cd package_name 54 | ./configure --prefix=$HOME/.local 55 | make 56 | make install 57 | ``` 58 | The --prefix flag tells the Makefile to install the library locally (and therefore it is crucial that you include it). 59 | 60 | Note on the HDF5 library: This library may already exist on your system; however, it may be a good idea to install a local copy anyway, because some older versions do not support multiple read access (which is required), leading to an error (if you see 'FILE_OPEN_POLICY=strict' printed out, then you have probably just encountered this error). Also, to make sure PyTables links with the correct HDF5 library, add the following to your .bashrc file: 61 | ``` 62 | export HDF5_DIR=$HOME/.local 63 | ``` 64 | 65 | Note on libmad: In the Makefile for libmad you may need to remove the "-fforce-mem" flag, which gcc no longer supports. 66 | 67 | ### Installation of Python packages 68 | After the libraries have been installed, one can start installing the necessary Python packages. First you should create a file called .numpy-site.cfg in your home directory with the following lines: 69 | ``` 70 | [sndfile] 71 | library_dirs = $HOME/.local/lib 72 | include_dirs = $HOME/.local/include 73 | [hdf5] 74 | library_dirs = $HOME/.local/lib 75 | include_dirs = $HOME/.local/include 76 | [samplerate] 77 | library_dirs = $HOME/.local/lib 78 | include_dirs = $HOME/.local/include 79 | ``` 80 | This tells NumPy's distutils where to find your locally installed libraries. You will probably have to replace $HOME with the actual path to your home directory (e.g., "/home/a/user") since it does not seem to get properly exported as an environment variable by NumPy. 81 | 82 | Now we can install the Python packages (using pip, from GitHub sources, etc.). 
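Before doing so, it is worth a quick sanity check that python and pip resolve to the virtual environment created earlier, so that packages are built against the locally installed libraries and installed without root access (a minimal check, assuming the venv from above):
```
which python pip   # both should point into ~/venv/bin
```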
The packages I have installed are: 83 | ``` 84 | numpy, 85 | scipy, 86 | theano, 87 | pylearn2, 88 | pytables, 89 | numexpr, 90 | cython, 91 | pyyaml, 92 | ipython, 93 | sklearn, 94 | matplotlib, 95 | scikits.audiolab, 96 | scikits.samplerate, 97 | pymad (if you need to read mp3s) 98 | ``` 99 | 100 | You might first try installing these using the "requirements.txt" file in this folder: 101 | ``` 102 | pip install -r path/to/requirements.txt 103 | ``` 104 | If that doesn't work, you can try installing these with "pip install package_name", and finally, if that fails, download the source code for each module and run "python setup.py build" followed by "python setup.py install". 105 | 106 | Note: Theano and Pylearn2 should be installed from the GitHub repositories in order to get up-to-date versions (the pip packages seem to be old). 107 | 108 | ### Theano setup 109 | If you want Theano to use your GPU (and you probably do if you have one), create a file called .theanorc in your home directory with the following lines: 110 | ``` 111 | [global] 112 | floatX = float32 113 | device = gpu0 114 | 115 | [nvcc] 116 | fastmath = True 117 | ``` -------------------------------------------------------------------------------- /hpc_scripts/generate_gbar_jobs.py: -------------------------------------------------------------------------------- 1 | jobscript = ''' 2 | #!/bin/sh 3 | # embedded options to qsub - start with #PBS 4 | # -- Name of the job --- 5 | #PBS -N {jobname} 6 | # -- specify queue -- 7 | #PBS -q hpc 8 | # -- estimated wall clock time (execution time): hh:mm:ss -- 9 | #PBS -l walltime=24:00:00 10 | # --- number of processors/cores/nodes -- 11 | #PBS -l nodes=1:ppn=1:gpus=1 12 | # -- user email address -- 13 | #PBS -M cmke@dtu.dk 14 | # -- mail notification -- 15 | #PBS -m abe 16 | # -- run in the current working (submission) directory -- 17 | if test X$PBS_ENVIRONMENT = XPBS_BATCH; then cd $PBS_O_WORKDIR; fi 18 | # here follow the commands you want to execute 19 | # Load modules needed by myapplication.x 20 | module load python/2.7.3 cuda/6.5 21 | 22 | # Run my program 23 | export LD_LIBRARY_PATH=~/.local/lib:$LD_LIBRARY_PATH 24 | source ~/venv/bin/activate 25 | cd /SCRATCH/cmke/dnn-mgr 26 | python train_mlp_script.py {fold_config} {dropout} --nunits {nunits} --output {savename} 27 | '''.format 28 | 29 | fold_config = ['GTZAN_1024-fold-1_of_4.pkl']*4 + ['GTZAN_1024-filtered-fold.pkl']*4 30 | dropout = ['', '', '--dropout', '--dropout']*4 31 | nunits = [50, 500]*8 32 | for f, d, n in zip(fold_config, dropout, nunits): 33 | 34 | savename='' 35 | if f==fold_config[0]: 36 | savename += 'S_' 37 | else: 38 | savename += 'F_' 39 | 40 | if n==50: 41 | savename += '50_' 42 | else: 43 | savename += '500_' 44 | 45 | if d=='': 46 | savename += 'RS' 47 | else: 48 | savename += 'RSD' 49 | 50 | with open(savename+'.sh', 'w') as fname: 51 | fname.write(jobscript(jobname=savename, fold_config=f, dropout=d, nunits=n, savename='./saved/'+savename+'.pkl')) 52 | -------------------------------------------------------------------------------- /hpc_scripts/generate_gbar_jobs_dnn.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | jobscript = ''' 4 | #!/bin/sh 5 | # embedded options to qsub - start with #PBS 6 | # -- Name of the job --- 7 | #PBS -N {jobname} 8 | # -- specify queue -- 9 | #PBS -q hpc 10 | # -- estimated wall clock time (execution time): hh:mm:ss -- 11 | #PBS -l walltime=24:00:00 12 | # --- number of processors/cores/nodes -- 13 | #PBS -l 
nodes=1:ppn=1:gpus=1 14 | # -- user email address -- 15 | #PBS -M cmke@dtu.dk 16 | # -- mail notification -- 17 | #PBS -m abe 18 | # -- run in the current working (submission) directory -- 19 | if test X$PBS_ENVIRONMENT = XPBS_BATCH; then cd $PBS_O_WORKDIR; fi 20 | # here follow the commands you want to execute 21 | # Load modules needed by myapplication.x 22 | module load python/2.7.3 cuda/6.5 23 | 24 | # Run my program 25 | export LD_LIBRARY_PATH=~/.local/lib:$LD_LIBRARY_PATH 26 | source ~/venv/bin/activate 27 | cd /dtu-compute/cosound/data/_tzanetakis_genre/ 28 | python /SCRATCH/cmke/dnn-mgr/train_mlp_script.py {fold_config} {yaml_file} --nunits {nunits} --output {savename} 29 | '''.format 30 | 31 | fold_config = ['GTZAN_stratified.pkl']*4 + ['GTZAN_filtered.pkl']*4 32 | yaml_file = ['mlp_rlu2.yaml', 'mlp_rlu2.yaml', 'mlp_rlu_dropout2.yaml', 'mlp_rlu_dropout2.yaml']*4 33 | nunits = [50, 500]*8 34 | for f, d, n in zip(fold_config, yaml_file, nunits): 35 | 36 | savename='' 37 | if f==fold_config[0]: 38 | savename += 'S_' 39 | else: 40 | savename += 'F_' 41 | 42 | if n==50: 43 | savename += '50_' 44 | else: 45 | savename += '500_' 46 | 47 | if d=='mlp_rlu2.yaml': 48 | savename += 'RS' 49 | else: 50 | savename += 'RSD' 51 | 52 | with open(savename+'.sh', 'w') as fname: 53 | fname.write(jobscript(jobname=savename, fold_config=f, yaml_file=os.path.join('/SCRATCH/cmke/dnn-mgr/',d), nunits=n, savename='/SCRATCH/cmke/saved_models/dnn/'+savename+'.pkl')) 54 | -------------------------------------------------------------------------------- /hpc_scripts/generate_gbar_jobs_rf.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | jobscript = ''' 4 | #!/bin/sh 5 | # embedded options to qsub - start with #PBS 6 | # -- Name of the job --- 7 | #PBS -N {jobname} 8 | # -- specify queue -- 9 | #PBS -q hpc 10 | # -- estimated wall clock time (execution time): hh:mm:ss -- 11 | #PBS -l walltime=12:00:00 12 | # --- number of processors/cores/nodes -- 13 | #PBS -l nodes=1:ppn=4:gpus=1 14 | # -- user email address -- 15 | #PBS -M cmke@dtu.dk 16 | # -- mail notification -- 17 | #PBS -m abe 18 | # -- run in the current working (submission) directory -- 19 | if test X$PBS_ENVIRONMENT = XPBS_BATCH; then cd $PBS_O_WORKDIR; fi 20 | # here follow the commands you want to execute 21 | # Load modules needed by myapplication.x 22 | module load python/2.7.3 cuda/6.5 23 | 24 | # Run my program 25 | export LD_LIBRARY_PATH=~/.local/lib:$LD_LIBRARY_PATH 26 | source ~/venv/bin/activate 27 | cd /SCRATCH/cmke/dnn-mgr 28 | python train_classifier_on_dnn_feats.py {model_file} {dataset_dir} --which_layers {which_layers} --save_folder {save_folder} {aggregate_features} 29 | '''.format 30 | 31 | model_files = ['S_50_RS.pkl', 'S_50_RSD.pkl', 'S_500_RS.pkl', 'S_500_RSD.pkl', 'F_50_RS.pkl', 'F_50_RSD.pkl', 'F_500_RS.pkl', 'F_500_RSD.pkl'] 32 | dataset_dir = '/dtu-compute/cosound/data/_tzanetakis_genre/audio' 33 | which_layers = ['0', '1', '2', '0 1 2'] 34 | aggregate_features = ['--aggregate_features', ''] 35 | 36 | job_list = [] 37 | for agg in aggregate_features: 38 | for model in model_files: 39 | for l in which_layers: 40 | 41 | savename = model.split('.pkl')[0] 42 | if agg=='': 43 | savename += '_FF_' # frame-level features 44 | else: 45 | savename += '_AF_' # aggregate features 46 | 47 | if l=='0 1 2': 48 | savename += 'LAll' 49 | else: 50 | savename += 'L' + l 51 | 52 | jobname = savename+'.sh' 53 | job_list.append( jobname ) 54 | with open(jobname, 'w') as fname: 55 | 
fname.write( jobscript( jobname=savename, 56 | model_file=os.path.join('./saved', model), 57 | dataset_dir=dataset_dir, 58 | which_layers=l, 59 | save_folder=os.path.join('./saved', savename), 60 | aggregate_features=agg) ) 61 | 62 | with open('_master_RF_trainer.sh', 'w') as f: 63 | for j in job_list: 64 | f.write('qsub %s\n' % j) 65 | 66 | 67 | -------------------------------------------------------------------------------- /hpc_scripts/generate_gbar_jobs_rf2.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | jobscript = ''' 4 | #!/bin/sh 5 | # embedded options to qsub - start with #PBS 6 | # -- Name of the job --- 7 | #PBS -N {jobname} 8 | # -- specify queue -- 9 | #PBS -q hpc 10 | # -- estimated wall clock time (execution time): hh:mm:ss -- 11 | #PBS -l walltime=12:00:00 12 | # --- number of processors/cores/nodes -- 13 | #PBS -l nodes=1:ppn=4:gpus=1 14 | # -- user email address -- 15 | #PBS -M cmke@dtu.dk 16 | # -- mail notification -- 17 | #PBS -m abe 18 | # -- run in the current working (submission) directory -- 19 | if test X$PBS_ENVIRONMENT = XPBS_BATCH; then cd $PBS_O_WORKDIR; fi 20 | # here follow the commands you want to execute 21 | # Load modules needed by myapplication.x 22 | module load python/2.7.3 cuda/6.5 23 | 24 | # Run my program 25 | export LD_LIBRARY_PATH=~/.local/lib:$LD_LIBRARY_PATH 26 | source ~/venv/bin/activate 27 | 28 | cd /dtu-compute/cosound/data/_tzanetakis_genre 29 | python /SCRATCH/cmke/dnn-mgr/train_classifier_on_dnn_feats2.py {model_file} --which_layers {which_layers} --save_file {save_file} {aggregate_features} 30 | '''.format 31 | 32 | model_files = ['S_50_RS.pkl', 'S_50_RSD.pkl', 'S_500_RS.pkl', 'S_500_RSD.pkl', 'F_50_RS.pkl', 'F_50_RSD.pkl', 'F_500_RS.pkl', 'F_500_RSD.pkl'] 33 | dataset_dir = '/dtu-compute/cosound/data/_tzanetakis_genre/audio' 34 | which_layers = ['1', '2', '3', '1 2 3'] 35 | aggregate_features = ['--aggregate_features', ''] 36 | 37 | job_list = [] 38 | for agg in aggregate_features: 39 | for model in model_files: 40 | for l in which_layers: 41 | 42 | savename = model.split('.pkl')[0] 43 | if agg=='': 44 | savename += '_FF_' # frame-level features 45 | else: 46 | savename += '_AF_' # aggregate features 47 | 48 | if l=='1 2 3': 49 | savename += 'LAll' 50 | else: 51 | savename += 'L' + l 52 | 53 | jobname = savename+'.sh' 54 | job_list.append( jobname ) 55 | with open(jobname, 'w') as fname: 56 | fname.write( jobscript( jobname=savename, 57 | model_file=os.path.join('/SCRATCH/cmke/saved_models/dnn', model), 58 | which_layers=l, 59 | save_file=os.path.join('/SCRATCH/cmke/saved_models/rf', savename), 60 | aggregate_features=agg) ) 61 | 62 | with open('_master_RF_trainer.sh', 'w') as f: 63 | for j in job_list: 64 | f.write('qsub %s\n' % j) 65 | 66 | 67 | -------------------------------------------------------------------------------- /hpc_scripts/generate_jobs.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | #python prepare_dataset.py \ 4 | # ../GTZAN \ 5 | # ./label_list_GTZAN.txt \ 6 | # --hdf5 ./GTZAN.hdf5 \ 7 | # --nfft 4096 --nhop 2048 \ 8 | # --train ./gtzan/train_stratified2.txt \ 9 | # --valid ./gtzan/valid_stratified2.txt \ 10 | # --test ./gtzan/test_stratified2.txt \ 11 | # --partition_name ./GTZANstrat_partition_configuration.pkl 12 | 13 | python train_mlp_script.py \ 14 | ./GTZANstrat_partition_configuration.pkl \ 15 | ./yaml_scripts/mlp_rlu2.yaml \ 16 | --nunits 50 \ 17 | --output GTZAN_strat2049_model.pkl 18 | 
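# NB: the commented-out prepare_dataset.py block above has to be run once (uncomment it
# on the first run) before the training step, since train_mlp_script.py expects the
# GTZANstrat_partition_configuration.pkl file that it produces.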
-------------------------------------------------------------------------------- /hpc_scripts/gpu_to_cpu_pkl.py: -------------------------------------------------------------------------------- 1 | import os, sys, glob, re 2 | import numpy as np 3 | from pylearn2.utils import serial 4 | import pylearn2.config.yaml_parse as yaml_parse 5 | from pylearn2.models.rbm import RBM 6 | import copy 7 | 8 | ''' 9 | When training is done on a GPU the model files won't unpickle properly. 10 | This script attempts to fix that. It must be run on a GPU with the environment 11 | variable THEANO_FLAGS exported beforehand. If pretraining is done then 12 | we may have to delete the .cpu.pkl files for the composite MLP and re-run 13 | (because the layers models need to be properly converted first) 14 | ''' 15 | 16 | if __name__=="__main__": 17 | 18 | os.environ['THEANO_FLAGS']="device=cpu" 19 | _, directory = sys.argv 20 | 21 | files_list = glob.glob(os.path.join(directory, '*.pkl')) 22 | 23 | p1 = re.compile(r"(cpu)") 24 | p2 = re.compile(r"(\.pkl)") 25 | 26 | for in_file in files_list: 27 | 28 | if p1.search(in_file) != None: 29 | continue 30 | 31 | out_file = os.path.splitext(in_file)[0] + '.cpu.pkl' 32 | 33 | if os.path.exists(out_file): 34 | continue 35 | 36 | model = serial.load(in_file) 37 | model_yaml_src = p2.sub('.cpu.pkl', model.yaml_src) 38 | model2 = yaml_parse.load(model_yaml_src) 39 | 40 | params = model.get_param_values() 41 | model2.set_param_values(params) 42 | 43 | model2.yaml_src = model_yaml_src 44 | model2.dataset_yaml_src = model.dataset_yaml_src 45 | 46 | serial.save(out_file, model2) 47 | -------------------------------------------------------------------------------- /hpc_scripts/requirements.txt: -------------------------------------------------------------------------------- 1 | Cython==0.21.1 2 | PyYAML==3.11 3 | Theano==0.6.0 4 | argparse==1.2.1 5 | ipython==2.3.0 6 | matplotlib==1.4.2 7 | mock==1.0.1 8 | nose==1.3.4 9 | numexpr==2.4 10 | numpy==1.9.0 11 | -e git://github.com/lisa-lab/pylearn2.git@b645ac32a6ca0ce92724e75236db193671ce2e19#egg=pylearn2-master 12 | pymad==0.6 13 | pyparsing==2.0.3 14 | python-dateutil==2.2 15 | pytz==2014.7 16 | scikit-learn==0.15.2 17 | scikits.audiolab==0.11.0 18 | scikits.samplerate==0.4.0.dev 19 | scipy==0.14.0 20 | six==1.8.0 21 | tables==3.1.2dev 22 | wsgiref==0.1.2 23 | -------------------------------------------------------------------------------- /lmd/lmd_prep.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | # embedded options to qsub - start with #PBS 3 | # -- Name of the job --- 4 | #PBS -N lmd_prep 5 | # -- specify queue -- 6 | #PBS -q hpc 7 | # -- estimated wall clock time (execution time): hh:mm:ss -- 8 | #PBS -l walltime=12:00:00 9 | # --- number of processors/cores/nodes -- 10 | #PBS -l nodes=1:ppn=1:gpus=1 11 | # -- user email address -- 12 | #PBS -M coreyker@gmail.com 13 | # -- mail notification -- 14 | #PBS -m abe 15 | # -- run in the current working (submission) directory -- 16 | if test X$PBS_ENVIRONMENT = XPBS_BATCH; then cd $PBS_O_WORKDIR; fi 17 | # here follow the commands you want to execute 18 | # Load modules needed by myapplication.x 19 | module load python/2.7.3 cuda/6.5 20 | 21 | # Run my program 22 | export LD_LIBRARY_PATH=~/.local/lib:$LD_LIBRARY_PATH 23 | source ~/venv/bin/activate 24 | cd ~/dnn-mgr 25 | python prepare_dataset.py \ 26 | /dtu-compute/cosound/data/_latinmusicdataset/ \ 27 | /dtu-compute/cosound/data/_latinmusicdataset/label_list.txt \ 28 | --hdf5 
/dtu-compute/cosound/data/_latinmusicdataset/LMD.h5 \ 29 | --train /dtu-compute/cosound/data/_latinmusicdataset/train-part.txt \ 30 | --valid /dtu-compute/cosound/data/_latinmusicdataset/valid-part.txt \ 31 | --test /dtu-compute/cosound/data/_latinmusicdataset/test-part.txt \ 32 | --partition_name /dtu-compute/cosound/data/_latinmusicdataset/LMD_split_config.pkl \ 33 | --compute_std -------------------------------------------------------------------------------- /lmd/lmd_train.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | # embedded options to qsub - start with #PBS 3 | # -- Name of the job --- 4 | #PBS -N lmd_train 5 | # -- specify queue -- 6 | #PBS -q hpc 7 | # -- estimated wall clock time (execution time): hh:mm:ss -- 8 | #PBS -l walltime=30:00:00 9 | # --- number of processors/cores/nodes -- 10 | #PBS -l nodes=1:ppn=1:gpus=1 11 | # -- user email address -- 12 | #PBS -M coreyker@gmail.com 13 | # -- mail notification -- 14 | #PBS -m abe 15 | # -- run in the current working (submission) directory -- 16 | if test X$PBS_ENVIRONMENT = XPBS_BATCH; then cd $PBS_O_WORKDIR; fi 17 | # here follow the commands you want to execute 18 | # Load modules needed by myapplication.x 19 | module load python/2.7.3 cuda/6.5 20 | 21 | # Run my program 22 | export LD_LIBRARY_PATH=~/.local/lib:$LD_LIBRARY_PATH 23 | source ~/venv/bin/activate 24 | cd /dtu-compute/cosound/data/_latinmusicdataset/ 25 | python ~/dnn-mgr/train_mlp_script.py \ 26 | /dtu-compute/cosound/data/_latinmusicdataset/LMD_split_config.pkl \ 27 | ~/dnn-mgr/yaml_scripts/mlp_rlu_dropout.yaml \ 28 | --nunits 500 \ 29 | --output ~/dnn-mgr/lmd/lmd_513_500x3.pkl 30 | -------------------------------------------------------------------------------- /lmd/lmd_train_conv.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | # embedded options to qsub - start with #PBS 3 | # -- Name of the job --- 4 | #PBS -N lmd_train_conv 5 | # -- specify queue -- 6 | #PBS -q hpc 7 | # -- estimated wall clock time (execution time): hh:mm:ss -- 8 | #PBS -l walltime=30:00:00 9 | # --- number of processors/cores/nodes -- 10 | #PBS -l nodes=1:ppn=1:gpus=1 11 | # -- user email address -- 12 | #PBS -M coreyker@gmail.com 13 | # -- mail notification -- 14 | #PBS -m abe 15 | # -- run in the current working (submission) directory -- 16 | if test X$PBS_ENVIRONMENT = XPBS_BATCH; then cd $PBS_O_WORKDIR; fi 17 | # here follow the commands you want to execute 18 | # Load modules needed by myapplication.x 19 | module load python/2.7.3 cuda/6.5 20 | 21 | # Run my program 22 | export LD_LIBRARY_PATH=~/.local/lib:$LD_LIBRARY_PATH 23 | source ~/venv/bin/activate 24 | cd /dtu-compute/cosound/data/_latinmusicdataset/ 25 | python ~/dnn-mgr/train_mlp_conv_script.py \ 26 | /dtu-compute/cosound/data/_latinmusicdataset/LMD_split_conv_config.pkl \ 27 | ~/dnn-mgr/yaml_scripts/mlp_rlu_conv2.yaml \ 28 | --output ~/dnn-mgr/lmd/lmd_513_conv.pkl 29 | -------------------------------------------------------------------------------- /lmd_af/LMD_AF_split_dnn.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coreyker/dnn-mgr/bdad579ea6cb37b665ea6019fe1026a6ce20cbc7/lmd_af/LMD_AF_split_dnn.pkl -------------------------------------------------------------------------------- /lmd_af/mlp_rlu_dropout_adversary.yaml: -------------------------------------------------------------------------------- 1 | !obj:pylearn2.train.Train { 2 | dataset : 
&trainset !obj:adversary_dataset.AdversaryDataset { 3 | adv_model: &adv_model !pkl: "lmd_af/lmd_af_513_500x3.pkl", 4 | which_set : 'train', 5 | config : &fold !pkl: "%(fold_config)s" 6 | }, 7 | model : !obj:pylearn2.models.mlp.MLP { 8 | nvis : 513, 9 | layers : [ 10 | !obj:audio_dataset.PreprocLayer { 11 | config : *fold, 12 | proc_type : 'standardize' 13 | }, 14 | !obj:pylearn2.models.mlp.RectifiedLinear { 15 | layer_name : 'h0', 16 | dim : %(dim_h0)i, 17 | irange : &irange .1 18 | }, 19 | !obj:pylearn2.models.mlp.RectifiedLinear { 20 | layer_name : 'h1', 21 | dim : %(dim_h1)i, 22 | irange : *irange 23 | }, 24 | !obj:pylearn2.models.mlp.RectifiedLinear { 25 | layer_name : 'h2', 26 | dim : %(dim_h2)i, 27 | irange : *irange 28 | }, 29 | !obj:pylearn2.models.mlp.Softmax { 30 | n_classes : 10, 31 | layer_name : 'y', 32 | irange : *irange 33 | } 34 | ] 35 | }, 36 | algorithm : !obj:pylearn2.training_algorithms.sgd.SGD { 37 | learning_rate : .1, 38 | learning_rule : !obj:pylearn2.training_algorithms.learning_rule.Momentum { 39 | init_momentum : 0.5 40 | }, 41 | train_iteration_mode : 'shuffled_sequential', #'batchwise_shuffled_sequential', 42 | monitor_iteration_mode : 'shuffled_sequential', #'batchwise_shuffled_sequential', 43 | #batches_per_iter : 500, 44 | batch_size : 1200, 45 | monitoring_dataset : { 46 | 'train' : *trainset, 47 | 'valid' : !obj:adversary_dataset.AdversaryDataset { 48 | adv_model: *adv_model, 49 | which_set : 'valid', 50 | config : *fold 51 | } 52 | }, 53 | termination_criterion: !obj:pylearn2.termination_criteria.MonitorBased { 54 | channel_name : 'valid_y_misclass', 55 | prop_decrease : .001, 56 | N: 50 57 | }, 58 | cost: !obj:pylearn2.costs.mlp.dropout.Dropout { 59 | default_input_include_prob : .75, 60 | input_include_probs: { 'pre': 1., 'h0': 1. }, 61 | input_scales: { 'pre': 1., 'h0' : 1. 
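# an include probability of 1.0 means no dropout is applied to the inputs of the
# standardization layer ('pre') or of the first hidden layer ('h0'); all other
# layer inputs are kept with the default probability of .75 given above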
} 62 | } 63 | }, 64 | extensions: [ 65 | !obj:pylearn2.train_extensions.best_params.MonitorBasedSaveBest { 66 | channel_name: 'valid_y_misclass', 67 | save_path: "%(best_model_save_path)s" 68 | }, 69 | !obj:pylearn2.training_algorithms.learning_rule.MomentumAdjustor { 70 | start: 1, 71 | saturate: 200, 72 | final_momentum: .9 73 | }, 74 | !obj:pylearn2.training_algorithms.sgd.LinearDecayOverEpoch { 75 | start: 1, 76 | saturate: 200, 77 | decay_factor: .01 78 | } 79 | ], 80 | save_path : "%(save_path)s", 81 | save_freq : 1 82 | } -------------------------------------------------------------------------------- /pretrain_layers.py: -------------------------------------------------------------------------------- 1 | import os, sys, cPickle, argparse 2 | from pylearn2.train import Train 3 | from pylearn2.training_algorithms.sgd import SGD 4 | from pylearn2.costs.ebm_estimation import SML 5 | from pylearn2.datasets.transformer_dataset import TransformerDataset 6 | from pylearn2.termination_criteria import MonitorBased, ChannelTarget, EpochCounter 7 | from pylearn2.training_algorithms.learning_rule import RMSProp 8 | from pylearn2.costs.autoencoder import MeanSquaredReconstructionError 9 | 10 | import pylearn2.config.yaml_parse as yaml_parse 11 | from audio_dataset import AudioDataset, PreprocLayer 12 | import pdb 13 | 14 | ''' 15 | (Although it may be more complicated) We build our models and dataset using yaml in order to keep a record of how things were built 16 | ''' 17 | 18 | def get_grbm(nvis, nhid): 19 | 20 | model_yaml = '''!obj:pylearn2.models.rbm.GaussianBinaryRBM { 21 | nvis : %(nvis)i, 22 | nhid : %(nhid)i, 23 | irange : .1, 24 | energy_function_class : !obj:pylearn2.energy_functions.rbm_energy.grbm_type_1 {}, 25 | init_sigma : 1., 26 | init_bias_hid : 0, 27 | mean_vis : True 28 | }''' % {'nvis' : nvis, 'nhid': nhid} 29 | 30 | model = yaml_parse.load(model_yaml) 31 | return model 32 | 33 | def get_rbm(nvis, nhid): 34 | 35 | model_yaml = '''!obj:pylearn2.models.rbm.RBM { 36 | nvis : %(nvis)i, 37 | nhid : %(nhid)i, 38 | irange : .1 39 | }''' % {'nvis' : nvis, 'nhid': nhid} 40 | 41 | model = yaml_parse.load(model_yaml) 42 | return model 43 | 44 | def get_ae(nvis, nhid): 45 | 46 | model_yaml = '''!obj:pylearn2.models.autoencoder.DenoisingAutoencoder { 47 | nvis : %(nvis)i, 48 | nhid : %(nhid)i, 49 | irange : .1, 50 | corruptor : !obj:pylearn2.corruption.BinomialCorruptor { corruption_level : .1 }, 51 | act_enc : 'sigmoid', 52 | act_dec : null 53 | }''' % {'nvis' : nvis, 'nhid': nhid} 54 | 55 | model = yaml_parse.load(model_yaml) 56 | return model 57 | 58 | def get_rbm_trainer(model, dataset, save_path, epochs=5): 59 | """ 60 | A Restricted Boltzmann Machine (RBM) trainer 61 | """ 62 | 63 | config = { 64 | 'learning_rate': 1e-2, 65 | 'train_iteration_mode': 'shuffled_sequential', 66 | 'batch_size': 250, 67 | #'batches_per_iter' : 100, 68 | 'learning_rule': RMSProp(), 69 | 'monitoring_dataset': dataset, 70 | 'cost' : SML(250, 1), 71 | 'termination_criterion' : EpochCounter(max_epochs=epochs), 72 | } 73 | 74 | return Train(model=model, 75 | algorithm=SGD(**config), 76 | dataset=dataset, 77 | save_path=save_path, 78 | save_freq=1 79 | )#, extensions=extensions) 80 | 81 | def get_ae_trainer(model, dataset, save_path, epochs=5): 82 | """ 83 | An Autoencoder (AE) trainer 84 | """ 85 | 86 | config = { 87 | 'learning_rate': 1e-2, 88 | 'train_iteration_mode': 'shuffled_sequential', 89 | 'batch_size': 250, 90 | #'batches_per_iter' : 2000, 91 | 'learning_rule': RMSProp(), 92 | 'monitoring_dataset': 
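        # reconstruction error is monitored on the same transformed training set
        # passed in as `dataset`. A rough usage sketch of this helper (the layer
        # sizes and save path here are hypothetical, not from the experiments):
        #   model = get_ae(nvis=513, nhid=50)
        #   trainer = get_ae_trainer(model, dataset, save_path='/tmp/AE_L1.pkl', epochs=5)
        #   trainer.main_loop()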
dataset, 93 | 'cost' : MeanSquaredReconstructionError(), 94 | 'termination_criterion' : EpochCounter(max_epochs=epochs), 95 | } 96 | 97 | return Train(model=model, 98 | algorithm=SGD(**config), 99 | dataset=dataset, 100 | save_path=save_path, 101 | save_freq=1 102 | )#, extensions=extensions) 103 | 104 | if __name__=="__main__": 105 | 106 | parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter, 107 | description='Script to pretrain the layers of a DNN.') 108 | 109 | parser.add_argument('fold_config', help='Path to dataset configuration file (generated with prepare_dataset.py)') 110 | parser.add_argument('--arch', nargs='*', type=int, help='Architecture: nvis nhid1 nhid2 ...') 111 | parser.add_argument('--epochs', type=int, help='Number of training epochs per layer') 112 | parser.add_argument('--save_prefix', help='Full path and prefix for saving output models') 113 | parser.add_argument('--use_autoencoder', action='store_true') 114 | args = parser.parse_args() 115 | 116 | if args.epochs is None: 117 | args.epochs = 5 118 | 119 | arch = [(i,j) for i,j in zip(args.arch[:-1], args.arch[1:])] 120 | 121 | with open(args.fold_config) as f: 122 | config = cPickle.load(f) 123 | 124 | preproc_layer = PreprocLayer(config=config, proc_type='standardize') 125 | 126 | dataset = TransformerDataset( 127 | raw=AudioDataset(which_set='train', config=config), 128 | transformer=preproc_layer.layer_content 129 | ) 130 | 131 | # transformer_yaml = '''!obj:pylearn2.datasets.transformer_dataset.TransformerDataset { 132 | # raw : %(raw)s, 133 | # transformer : %(transformer)s 134 | # }''' 135 | # 136 | # dataset_yaml = transformer_yaml % { 137 | # 'raw' : '''!obj:audio_dataset.AudioDataset { 138 | # which_set : 'train', 139 | # config : !pkl: "%(fold_config)s" 140 | # }''' % {'fold_config' : args.fold_config}, 141 | # 'transformer' : '''!obj:pylearn2.models.mlp.MLP { 142 | # nvis : %(nvis)i, 143 | # layers : 144 | # [ 145 | # !obj:audio_dataset.PreprocLayer { 146 | # config : !pkl: "%(fold_config)s", 147 | # proc_type : 'standardize' 148 | # } 149 | # ] 150 | # }''' % {'nvis' : args.arch[0], 'fold_config' : args.fold_config } 151 | # } 152 | 153 | for i,(v,h) in enumerate(arch): 154 | 155 | if not args.use_autoencoder: 156 | print 'Pretraining layer %d with RBM' % i 157 | 158 | if i==0: 159 | model = get_grbm(v,h) 160 | else: 161 | model = get_rbm(v,h) 162 | 163 | save_path = args.save_prefix+ 'RBM_L{}.pkl'.format(i+1) 164 | trainer = get_rbm_trainer(model=model, dataset=dataset, save_path=save_path, epochs=args.epochs) 165 | else: 166 | print 'Pretraining layer %d with AE' % i 167 | 168 | model = get_ae(v,h) 169 | save_path = args.save_prefix + 'AE_L{}.pkl'.format(i+1) 170 | trainer = get_ae_trainer(model=model, dataset=dataset, save_path=save_path, epochs=args.epochs) 171 | 172 | trainer.main_loop() 173 | 174 | dataset = TransformerDataset(raw=dataset, transformer=model) 175 | 176 | # dataset_yaml = transformer_yaml % {'raw' : dataset_yaml, 'transformer' : '!pkl: %s' % save_path} 177 | # dataset = yaml_parse.load( dataset_yaml ) 178 | 179 | 180 | -------------------------------------------------------------------------------- /svm_train_test.py: -------------------------------------------------------------------------------- 1 | # svm train / test 2 | import os, sys, copy, cPickle 3 | import numpy as np 4 | import theano 5 | import theano.tensor as T 6 | from pylearn2.utils import serial 7 | from sklearn.svm import LinearSVC, SVC 8 | import pdb 9 | 10 | def train_svm(X,y,C): 11 | if 0: 
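        # dead branch kept for reference: flipping `if 0:` to `if 1:` swaps in the
        # liblinear-based LinearSVC; the active branch below uses libsvm's SVC with
        # a linear kernel, which fits the same family of models via another solver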
12 |         svm = LinearSVC(C=C, loss='l1', random_state=1234)
13 |     else:
14 |         svm = SVC(C=C, kernel='linear', random_state=1234)
15 |     return svm.fit(X,y)
16 | 
17 | def test_svm(X, y, svm):
18 |     n_classes = 10
19 | 
20 |     confusion = np.zeros((n_classes, n_classes))
21 |     for feats, label in zip(X,y):
22 |         true_label = label if np.isscalar(label) else label[0]
23 |         pred = np.array( svm.predict(feats), dtype='int' )
24 |         vote_label = np.argmax( np.bincount(pred, minlength=10) )
25 | 
26 |         confusion[int(true_label), int(vote_label)] += 1
27 | 
28 |     total_error = 100*(1 - np.sum(np.diag(confusion)) / np.sum(confusion))
29 | 
30 |     return total_error, confusion
31 | 
32 | def grid_search(X_train, y_train, X_valid, y_valid, C_values):
33 |     n_classes = 10
34 |     best_svm = None
35 |     best_C = None
36 |     best_error = 100.
37 |     best_conf = None
38 | 
39 |     for m,C in enumerate(C_values):
40 | 
41 |         svm = train_svm( np.vstack(X_train), np.hstack(y_train), C)
42 |         err, conf = test_svm( np.vstack(X_valid), np.hstack(y_valid), svm )
43 |         #err, conf = test_svm(X_valid, y_valid, svm)
44 | 
45 |         if err < best_error:
46 |             best_error = err
47 |             best_svm = copy.deepcopy(svm)
48 |             best_C = C
49 |             best_conf = conf
50 | 
51 |         print 'Model selection progress %2d%%, best_error=%2.2f, curr_error=%2.2f' % ((100*m)/len(C_values), best_error, err)
52 | 
53 |     print '' # newline
54 |     return best_error, best_svm, best_C, best_conf
55 | 
56 | if __name__ == "__main__":
57 | 
58 |     # load in BOF features
59 |     model = './saved-rlu-505050/mlp_rlu-fold-4_of_4'#'mlp_rlu_fold1_best'
60 |     train_BOF = model + '-train-BOF.pkl'
61 |     valid_BOF = model + '-valid-BOF.pkl'
62 |     test_BOF = model + '-test-BOF.pkl'
63 | 
64 |     with open(train_BOF) as f:
65 |         X_train, y_train = cPickle.load(f)
66 | 
67 |     with open(valid_BOF) as f:
68 |         X_valid, y_valid = cPickle.load(f)
69 | 
70 |     with open(test_BOF) as f:
71 |         X_test, y_test = cPickle.load(f)
72 | 
73 |     C_values = 10.0 ** np.arange(-3, 3, 0.25)
74 |     #C_values = np.arange(0.5, 1, 0.01)
75 |     best_error, best_svm, best_C, best_conf = grid_search(X_train, y_train, X_valid, y_valid, C_values)
76 | 
77 |     # re-train on train+valid sets before testing
78 |     best_svm = train_svm(np.vstack(sum([X_train, X_valid],[])), np.hstack(sum([y_train, y_valid],[])), best_C)
79 |     total_error, confusion = test_svm(X_test, y_test, best_svm)
80 | 
81 |     print 'test accuracy: %2.2f' % (100-total_error)
82 |     print 'confusion matrix:'
83 |     print confusion/np.sum(confusion, axis=1)[:,np.newaxis] # normalize each row (true class) to sum to 1
84 | 
85 | 
86 | 
87 | 
-------------------------------------------------------------------------------- /test_mlp_script.py: --------------------------------------------------------------------------------
1 | import sys, re, csv, cPickle
2 | import numpy as np
3 | import theano
4 | 
5 | from pylearn2.utils import serial
6 | from audio_dataset import AudioDataset
7 | from pylearn2.space import CompositeSpace, Conv2DSpace, VectorSpace, IndexSpace
8 | import pylearn2.config.yaml_parse as yaml_parse
9 | 
10 | import pdb
11 | def frame_misclass_error(model, dataset):
12 |     """
13 |     Function to compute the frame-level classification error by classifying
14 |     each frame independently (no voting across the frames of a recording)
15 |     """
16 | 
17 |     n_classes = len(dataset.targets)
18 |     feat_space = model.get_input_space()
19 | 
20 |     X = feat_space.make_theano_batch()
21 |     Y = model.fprop( X )
22 |     fprop = theano.function([X],Y)
23 | 
24 |     confusion = np.zeros((n_classes, n_classes))
25 | 
26 |     batch_size = 30
27 |     n_examples = len(dataset.support) // batch_size
28 | 
target_space = VectorSpace(dim=n_classes) 29 | data_specs = (CompositeSpace((feat_space, target_space)), ("features", "targets")) 30 | iterator = dataset.iterator(mode='sequential', batch_size=batch_size, data_specs=data_specs) 31 | 32 | for i, el in enumerate(iterator): 33 | 34 | # display progress indicator 35 | sys.stdout.write('Classify progress: %2.0f%%\r' % (100*i/float(n_examples))) 36 | sys.stdout.flush() 37 | 38 | fft_data = np.array(el[0], dtype=np.float32) 39 | vote_labels = np.argmax(fprop(fft_data), axis=1) 40 | true_labels = np.argmax(el[1], axis=1) 41 | 42 | for l,v in zip(true_labels, vote_labels): 43 | confusion[l, v] += 1 44 | 45 | total_error = 100*(1 - np.sum(np.diag(confusion)) / np.sum(confusion)) 46 | print '' 47 | return total_error, confusion 48 | 49 | def file_misclass_error(model, dataset): 50 | """ 51 | Function to compute the file-level classification error by classifying 52 | individual frames and then voting for the class with highest cumulative probability 53 | """ 54 | n_classes = len(dataset.targets) 55 | feat_space = model.get_input_space() 56 | 57 | X = feat_space.make_theano_batch() 58 | Y = model.fprop( X ) 59 | fprop = theano.function([X],Y) 60 | 61 | confusion = np.zeros((n_classes, n_classes)) 62 | n_examples = len(dataset.file_list) 63 | 64 | target_space = VectorSpace(dim=n_classes) 65 | data_specs = (CompositeSpace((feat_space, target_space)), ("songlevel-features", "targets")) 66 | iterator = dataset.iterator(mode='sequential', batch_size=1, data_specs=data_specs) 67 | 68 | for i,el in enumerate(iterator): 69 | 70 | # display progress indicator 71 | sys.stdout.write('Classify progress: %2.0f%%\r' % (100*i/float(n_examples))) 72 | sys.stdout.flush() 73 | 74 | fft_data = np.array(el[0], dtype=np.float32) 75 | #frame_labels = np.argmax(fprop(fft_data), axis=1) 76 | #hist = np.bincount(frame_labels, minlength=n_classes) 77 | #vote_label = np.argmax(hist) # most used label 78 | vote_label = np.argmax(np.sum(fprop(fft_data), axis=0)) 79 | true_label = el[1] #np.argmax(el[1]) 80 | confusion[true_label, vote_label] += 1 81 | #print 'true: {}, vote: {}'.format(true_label, vote_label) 82 | #pdb.set_trace() 83 | 84 | total_error = 100*(1 - np.sum(np.diag(confusion)) / np.sum(confusion)) 85 | print '' 86 | return total_error, confusion 87 | 88 | def file_misclass_error_printf(model, dataset, save_file, label_list=None): 89 | """ 90 | Function to compute the file-level classification error by classifying 91 | individual frames and then voting for the class with highest cumulative probability 92 | """ 93 | n_classes = len(dataset.targets) 94 | feat_space = model.get_input_space() 95 | 96 | X = feat_space.make_theano_batch() 97 | Y = model.fprop(X) 98 | fprop = theano.function([X],Y) 99 | 100 | n_examples = len(dataset.file_list) 101 | target_space = VectorSpace(dim=n_classes) 102 | data_specs = (CompositeSpace((feat_space, target_space)), ("songlevel-features", "targets")) 103 | iterator = dataset.iterator(mode='sequential', batch_size=1, data_specs=data_specs) 104 | 105 | with open(save_file, 'w') as fname: 106 | csvwriter = csv.writer(fname, delimiter='\t') 107 | for i,el in enumerate(iterator): 108 | 109 | # display progress indicator 110 | sys.stdout.write('Classify progress: %2.0f%%\r' % (100*i/float(n_examples))) 111 | sys.stdout.flush() 112 | 113 | fft_data = np.array(el[0], dtype=np.float32) 114 | #frame_labels = np.argmax(fprop(fft_data), axis=1) 115 | #hist = np.bincount(frame_labels, minlength=n_classes) 116 | choice = 
np.argmax(np.sum(fprop(fft_data), axis=0)) 117 | 118 | if label_list: # use-string labels 119 | vote_label = label_list[choice] # most used label 120 | true_label = dataset.label_list[el[1]]#np.argmax(el[1]) 121 | else: # use numeric labels 122 | vote_label = choice # most used label 123 | true_label = el[1] #np.argmax(el[1]) 124 | 125 | #csvwriter.writerow([dataset.file_list[i], true_label, vote_label]) 126 | csvwriter.writerow([dataset.file_list[i], true_label, vote_label]) 127 | 128 | # fname.write('{file_name}\t{true_label}\t{vote_label}\n'.format( 129 | # file_name =dataset.file_list[i], 130 | # true_label=true_label, 131 | # vote_label=vote_label)) 132 | print '' 133 | 134 | def file_misclass_error_topx(model, dataset, topx=3): 135 | """ 136 | Function to compute the file-level classification error by classifying 137 | individual frames and then voting for the class with highest cumulative probability 138 | 139 | Check topx most probable results 140 | """ 141 | X = model.get_input_space().make_theano_batch() 142 | Y = model.fprop( X ) 143 | fprop = theano.function([X],Y) 144 | 145 | n_classes = dataset.raw.y.shape[1] 146 | confusion = np.zeros((n_classes, n_classes)) 147 | n_examples = len(dataset.raw.support) 148 | n_frames_per_file = dataset.raw.n_frames_per_file 149 | 150 | batch_size = n_frames_per_file 151 | data_specs = dataset.raw.get_data_specs() 152 | iterator = dataset.iterator(mode='sequential', 153 | batch_size=batch_size, 154 | data_specs=data_specs 155 | ) 156 | 157 | hits = 0 158 | n = 0 159 | i=0 160 | for el in iterator: 161 | 162 | # display progress indicator 163 | sys.stdout.write('Classify progress: %2.0f%%\r' % (100*i/float(n_examples))) 164 | sys.stdout.flush() 165 | 166 | fft_data = np.array(el[0], dtype=np.float32) 167 | frame_labels = np.argmax(fprop(fft_data), axis=1) 168 | hist = np.bincount(frame_labels, minlength=n_classes) 169 | vote_label = np.argsort(hist)[-1:-1-topx:-1] # most used label 170 | 171 | labels = np.argmax(el[1], axis=1) 172 | true_label = labels[0] 173 | for entry in labels: 174 | assert entry == true_label # check for indexing prob 175 | 176 | if true_label in vote_label: 177 | hits+=1 178 | 179 | n+=1 180 | i+=batch_size 181 | 182 | print '' 183 | return hits/float(n)*100 184 | 185 | 186 | def pp_array(array): # pretty printing 187 | for row in array: 188 | print ['%04.1f' % el for el in row] 189 | 190 | 191 | if __name__ == '__main__': 192 | 193 | import argparse 194 | parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter, 195 | description='''Script to test DNN. Measure framelevel accuracy. 196 | Option to use a majority vote for over the frames in each test recording. 197 | ''') 198 | 199 | parser.add_argument('model_file', help='Path to trained model file') 200 | parser.add_argument('--testset', help='Optional. 
If not specified, the testset from the model yaml src will be used')
201 |     parser.add_argument('--majority_vote', action='store_true', help='Classify using a majority vote over the frames of each test recording (file-level accuracy)')
202 |     parser.add_argument('--which_set', help='train, test, or valid')
203 |     parser.add_argument('--save_file', help='Save results to tab separated file')
204 |     args = parser.parse_args()
205 | 
206 |     # get model
207 |     model = serial.load(args.model_file)
208 | 
209 |     if args.which_set is None:
210 |         args.which_set = 'test'
211 | 
212 |     if args.testset: # dataset config passed in from command line
213 |         print 'Using dataset passed in from command line'
214 |         with open(args.testset) as f: config = cPickle.load(f)
215 |         dataset = AudioDataset(config=config, which_set=args.which_set)
216 | 
217 |         # get model dataset for its labels...
218 |         model_dataset = yaml_parse.load(model.dataset_yaml_src)
219 |         label_list = model_dataset.label_list
220 | 
221 |     else: # get dataset from model's yaml_src
222 |         print "Using dataset from model's yaml src"
223 |         p = re.compile(r"which_set.*'(train)'")
224 |         dataset_yaml = p.sub("which_set: '{}'".format(args.which_set), model.dataset_yaml_src)
225 |         dataset = yaml_parse.load(dataset_yaml)
226 | 
227 |         label_list = dataset.label_list
228 | 
229 |     # measure test error
230 |     if args.majority_vote:
231 |         print 'Using majority vote'
232 |         if args.save_file:
233 |             file_misclass_error_printf(model, dataset, args.save_file)#, label_list)
234 |         else:
235 |             err, conf = file_misclass_error(model, dataset)
236 |     else:
237 |         print 'Not using majority vote'
238 |         # if args.save_file:
239 |         #     raise ValueError('--save_file option only supported for majority vote currently')
240 |         # else:
241 |         #     err, conf = frame_misclass_error(model, dataset)
242 |         err, conf = frame_misclass_error(model, dataset)
243 |         if args.save_file: # only write the confusion matrix when a save file was requested
244 |             with open(args.save_file, 'wb') as fname:
245 |                 csvwriter = csv.writer(fname, delimiter='\t')
246 |                 for r in conf:
247 |                     csvwriter.writerow(r)
248 | 
249 | 
250 |     if not args.save_file:
251 |         conf = conf.transpose()
252 |         print 'test accuracy: %2.2f' % (100-err)
253 |         print 'confusion matrix (cols true):'
254 |         pp_array(100*conf/np.sum(conf, axis=0))
255 | 
256 |     # acc = file_misclass_error_topx(model, dataset, 2)
257 |     # print 'test accuracy: %2.2f' % acc
258 | 
259 | 
260 | 
-------------------------------------------------------------------------------- /train_classifier_on_dnn_feats.py: --------------------------------------------------------------------------------
1 | import os, sys, re, cPickle
2 | import numpy as np
3 | import theano
4 | 
5 | from sklearn.externals import joblib
6 | from sklearn.ensemble import RandomForestClassifier
7 | from sklearn.svm import SVC
8 | from sklearn.grid_search import GridSearchCV
9 | from pylearn2.utils import serial
10 | from pylearn2.space import CompositeSpace, Conv2DSpace, VectorSpace, IndexSpace
11 | import pylearn2.config.yaml_parse as yaml_parse
12 | 
13 | import pdb
14 | 
15 | def aggregate_features(model, dataset, which_layers=[2], win_size=200, step=100):
16 |     assert np.max(which_layers) < len(model.layers)
17 | 
18 |     X = model.get_input_space().make_theano_batch()
19 |     Y = model.fprop(X, return_all=True)
20 |     fprop = theano.function([X],Y)
21 | 
22 |     n_classes = dataset.y.shape[1]
23 |     n_examples = len(dataset.file_list)
24 | 
25 |     feat_space = model.get_input_space()
26 |     target_space = VectorSpace(dim=n_classes)
27 | 
28 |     data_specs = (CompositeSpace((feat_space, target_space)), ("songlevel-features", "targets"))
29 |     iterator = dataset.iterator(mode='sequential', 
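                                # 'songlevel-features' yields one whole song per
                                # iteration: el[0] holds all STFT frames of a file,
                                # so the DNN activations can be pooled over windows
                                # of win_size frames (the mean/std pooling below)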
data_specs=data_specs) 30 | 31 | # compute feature representation, aggregrate frames 32 | X=[]; y=[]; Z=[]; file_list=[]; 33 | for n,el in enumerate(iterator): 34 | # display progress indicator 35 | sys.stdout.write('Aggregation progress: %2.0f%%\r' % (100*n/float(n_examples))) 36 | sys.stdout.flush() 37 | 38 | input_data = np.array(el[0], dtype=np.float32) 39 | output_data = fprop(input_data) 40 | feats = np.hstack([output_data[i] for i in which_layers]) 41 | true_label = el[1] 42 | 43 | # aggregate features 44 | agg_feat = [] 45 | for i in xrange(0, feats.shape[0]-win_size, step): 46 | chunk = feats[i:i+win_size,:] 47 | agg_feat.append(np.hstack((np.mean(chunk, axis=0), np.std(chunk, axis=0)))) 48 | 49 | X.append(np.vstack(agg_feat)) 50 | y.append(np.hstack([true_label] * len(agg_feat))) 51 | Z.append(np.sum(output_data[-1], axis=0)) 52 | file_list.append(el[2]) 53 | 54 | print '' # newline 55 | return X, y, Z, file_list 56 | 57 | def get_features(model, dataset, which_layers=[2], n_features=100): 58 | assert np.max(which_layers) < len(model.layers) 59 | 60 | rng = np.random.RandomState(111) 61 | X = model.get_input_space().make_theano_batch() 62 | Y = model.fprop(X, return_all=True) 63 | fprop = theano.function([X],Y) 64 | 65 | n_classes = dataset.y.shape[1] 66 | n_examples = len(dataset.file_list) 67 | 68 | feat_space = model.get_input_space() 69 | target_space = VectorSpace(dim=n_classes) 70 | 71 | data_specs = (CompositeSpace((feat_space, target_space)), ("songlevel-features", "targets")) 72 | iterator = dataset.iterator(mode='sequential', data_specs=data_specs) 73 | 74 | X=[]; y=[]; Z=[]; file_list=[]; 75 | for n,el in enumerate(iterator): 76 | # display progress indicator 77 | sys.stdout.write('Getting features: %2.0f%%\r' % (100*n/float(n_examples))) 78 | sys.stdout.flush() 79 | 80 | input_data = np.array(el[0], dtype=np.float32) 81 | output_data = fprop(input_data) 82 | feats = np.hstack([output_data[i] for i in which_layers]) 83 | true_label = el[1] 84 | 85 | if n_features: 86 | ind = rng.permutation(feats.shape[0]) 87 | feats = feats[ind[:n_features],:] 88 | 89 | X.append(feats) 90 | y.append([true_label]*n_features) 91 | Z.append(np.sum(output_data[-1], axis=0)) 92 | file_list.append(el[2]) 93 | 94 | print '' 95 | return X, y, Z, file_list 96 | 97 | def train_classifier(X_train, y_train, method='random_forest', verbose=2): 98 | assert method in ['random_forest', 'linear_svm'] 99 | 100 | # train classifier 101 | if method=='random_forest': 102 | classifier = RandomForestClassifier(n_estimators=500, random_state=1234, verbose=verbose, n_jobs=2) 103 | else: 104 | parameters = {'C' : 10**np.arange(-2,4.)} 105 | grid = GridSearchCV(SVC(), parameters, verbose=3) 106 | grid.fit(X_train, y_train) 107 | classifier = grid.best_estimator_ 108 | #classifier = SVC(C=0.5, kernel='linear', random_state=1234, verbose=verbose) 109 | 110 | return classifier.fit(X_train, y_train) 111 | 112 | def test_classifier(X_test, y_test, classifier, n_labels=10): 113 | n_examples = len(y_test) 114 | confusion = np.zeros((n_labels,n_labels)) 115 | 116 | for n, (X, true_label) in enumerate(zip(X_test,y_test)): 117 | sys.stdout.write('Classify progress: %2.0f%%\r' % (100*n/float(n_examples))) 118 | sys.stdout.flush() 119 | 120 | y_pred = np.array(classifier.predict(X), dtype='int') 121 | pred_label = np.argmax(np.bincount(y_pred, minlength=n_labels)) 122 | confusion[pred_label, true_label[0]] += 1 123 | print '' 124 | 125 | ave_acc = 100*(np.sum(np.diag(confusion)) / np.sum(confusion)) 126 | print 
"classification accuracy:", ave_acc 127 | return confusion 128 | 129 | def test_classifier_printf(X_test, y_test, Z_test, file_list, classifier, save_file, n_labels=10): 130 | n_examples = len(file_list) 131 | with open(save_file, 'w') as f: 132 | for n, (X, true_label, Z, fname) in enumerate(zip(X_test, y_test, Z_test, file_list)): 133 | sys.stdout.write('Classify progress: %2.0f%%\r' % (100*n/float(n_examples))) 134 | sys.stdout.flush() 135 | 136 | y_pred = np.array(classifier.predict(X), dtype='int') 137 | pred_label = np.argmax(np.bincount(y_pred, minlength=n_labels)) 138 | s='' 139 | for i in Z: s+='%2.2f\t'%i 140 | f.write('{0}\t{1}\t{2}\t{3}\n'.format(fname, true_label[0], pred_label, s)) 141 | print '' 142 | 143 | if __name__ == "__main__": 144 | # example: python train_classifier_on_dnn_feats.py ./saved/S_500_RS.cpu.pkl /Users/cmke/Datasets/tzanetakis_genre --which_layers 0 145 | 146 | import argparse, glob 147 | 148 | parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter, 149 | description='''Script to train/test random forest on DNN features. 150 | ''') 151 | 152 | parser.add_argument('model_file', help='Path to trained DNN model file') 153 | parser.add_argument('--which_layers', nargs='*', type=int, help='List of which DNN layers to use as features') 154 | parser.add_argument('--aggregate_features', action='store_true', help='option to aggregate frames (mean/std of frames used to train classifier)') 155 | parser.add_argument('--classifier', help="either 'random_forest' or 'linear_svm'") 156 | parser.add_argument('--save_file', help='Output classification results to a text file') 157 | 158 | args = parser.parse_args() 159 | 160 | if not args.which_layers: 161 | parser.error('Please specify --which_layers x, with x either 1, 2, 3 or 1 2 3 (layer 0 is a pre-processing layer)') 162 | 163 | if args.aggregate_features: 164 | print 'Using aggregate features' 165 | else: 166 | print 'Not using aggregate features' 167 | 168 | if args.classifier is None: 169 | print 'No classifer selected, using random forest' 170 | args.classifier = 'random_forest' 171 | 172 | # load model 173 | model = serial.load(args.model_file) 174 | 175 | # parse dataset from model 176 | p = re.compile(r"which_set.*'(train)'") 177 | trainset_yaml = model.dataset_yaml_src 178 | validset_yaml = p.sub("which_set: 'valid'", model.dataset_yaml_src) 179 | testset_yaml = p.sub("which_set: 'test'", model.dataset_yaml_src) 180 | 181 | trainset = yaml_parse.load(trainset_yaml) 182 | validset = yaml_parse.load(validset_yaml) 183 | testset = yaml_parse.load(testset_yaml) 184 | 185 | if args.aggregate_features: 186 | X_train, y_train, Z_train, train_files = aggregate_features(model, trainset, which_layers=args.which_layers) 187 | X_valid, y_valid, Z_valid, valid_files = aggregate_features(model, validset, which_layers=args.which_layers) 188 | X_test, y_test, Z_test, test_files = aggregate_features(model, testset, which_layers=args.which_layers) 189 | else: 190 | X_train, y_train, Z_train, train_files = get_features(model, trainset, which_layers=args.which_layers) 191 | X_valid, y_valid, Z_valid, valid_files = get_features(model, validset, which_layers=args.which_layers) 192 | X_test, y_test, Z_test, test_files = get_features(model, testset, which_layers=args.which_layers) 193 | 194 | print 'Training classifier' 195 | X_all = np.vstack((np.vstack(X_train), np.vstack(X_valid))) 196 | y_all = np.hstack((np.hstack(y_train), np.hstack(y_valid))) 197 | classifier = train_classifier(X_all, y_all, 
method=args.classifier) 198 | 199 | print 'Testing classifier' 200 | if args.save_file: 201 | 202 | test_classifier_printf( 203 | X_test=X_test, 204 | y_test=y_test, 205 | Z_test=Z_test, 206 | file_list=test_files, 207 | classifier=classifier, 208 | save_file=args.save_file+'.txt') 209 | 210 | print 'Saving trained classifier' 211 | joblib.dump(classifier, args.save_file+'.pkl', 9) 212 | 213 | else: 214 | confusion = test_classifier(X_test, y_test, classifier) 215 | 216 | -------------------------------------------------------------------------------- /train_mlp_conv_script.py: -------------------------------------------------------------------------------- 1 | # training script 2 | import sys, os, argparse, cPickle 3 | import pylearn2.config.yaml_parse as yaml_parse 4 | import pdb 5 | 6 | if __name__=="__main__": 7 | 8 | parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter, 9 | description='''Script to train a DNN with a variable number of units and the possibility of using dropout. 10 | ''') 11 | 12 | parser.add_argument('fold_config', help='Path to dataset partition configuration file (generated with prepare_dataset.py)') 13 | parser.add_argument('yaml_file') 14 | parser.add_argument('--output', help='Name of output model') 15 | args = parser.parse_args() 16 | 17 | if args.output is None: 18 | parser.error('Please specify the name that the trained model file should be saved as (.pkl file)') 19 | 20 | hyper_params = { 21 | 'fold_config' : args.fold_config, 22 | 'best_model_save_path' : args.output, 23 | 'save_path' : '/tmp/save.pkl' 24 | } 25 | 26 | with open(args.yaml_file) as f: 27 | train_yaml = f.read() 28 | 29 | train_yaml = train_yaml % (hyper_params) 30 | train = yaml_parse.load(train_yaml) 31 | train.main_loop() -------------------------------------------------------------------------------- /train_mlp_script.py: -------------------------------------------------------------------------------- 1 | # training script 2 | import sys, os, argparse, cPickle 3 | import pylearn2.config.yaml_parse as yaml_parse 4 | import pdb 5 | 6 | if __name__=="__main__": 7 | 8 | parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter, 9 | description='''Script to train a DNN with a variable number of units and the possibility of using dropout. 
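Example invocation (paths are placeholders):
    python train_mlp_script.py fold_config.pkl yaml_scripts/mlp_rlu_dropout.yaml --nunits 500 --output model.pkl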
10 | ''') 11 | 12 | parser.add_argument('fold_config', help='Path to dataset partition configuration file (generated with prepare_dataset.py)') 13 | parser.add_argument('yaml_file') 14 | parser.add_argument('--nunits', type=int, help='Number of units in each hidden layer') 15 | # parser.add_argument('--dropout', action='store_true', help='Set this flag if you want to use dropout regularization') 16 | parser.add_argument('--output', help='Name of output model') 17 | args = parser.parse_args() 18 | 19 | if args.nunits is None: 20 | parser.error('Please specify number of hidden units per layer with --nunits flag') 21 | if args.output is None: 22 | parser.error('Please specify the name that the trained model file should be saved as (.pkl file)') 23 | 24 | # if args.dropout: 25 | # print 'Using dropout' 26 | # yaml_file = 'mlp_rlu_dropout.yaml' 27 | # else: 28 | # print 'Not using dropout' 29 | # yaml_file = 'mlp_rlu.yaml' 30 | 31 | hyper_params = { 'dim_h0' : args.nunits, 32 | 'dim_h1' : args.nunits, 33 | 'dim_h2' : args.nunits, 34 | 'fold_config' : args.fold_config, 35 | 'best_model_save_path' : args.output, 36 | 'save_path' : '/tmp/save.pkl' 37 | } 38 | 39 | with open(args.yaml_file) as f: 40 | train_yaml = f.read() 41 | 42 | train_yaml = train_yaml % (hyper_params) 43 | train = yaml_parse.load(train_yaml) 44 | train.main_loop() 45 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coreyker/dnn-mgr/bdad579ea6cb37b665ea6019fe1026a6ce20cbc7/utils/__init__.py -------------------------------------------------------------------------------- /utils/calc_grad.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from theano import function 3 | from theano import tensor as T 4 | import pylearn2 5 | import audio_dataset 6 | import pdb 7 | 8 | ''' 9 | # !!! There is something wrong in the following gradient calculation !!! 10 | # !!! (But we don't need it anyway, since theano can do the backprop calculation for us) !!! 11 | 12 | # Calculate gradient of MLP w.r.t. 
input data 13 | # (assumes rectified linear units and softmax output layer) 14 | 15 | def calc_grad(X0, model, label): 16 | 17 | X = model.get_input_space().make_theano_batch() 18 | Y = model.fprop( X, return_all=True ) 19 | fprop = theano.function([X],Y) 20 | 21 | activations = fprop(X0) 22 | 23 | Wn = model.layers[-1].get_weights() 24 | bn = model.layers[-1].get_biases() 25 | Xn = activations[-1] 26 | 27 | # derivative of cost with respect to layer preceeding the softmax 28 | gradn = Wn[:,label] - Xn.dot(Wn.T) 29 | 30 | pdb.set_trace() 31 | for n in xrange(len(model.layers)-2, 0, -1): 32 | Wn = model.layers[n].get_weights() 33 | bn = model.layers[n].get_biases() 34 | Xn_1 = activations[n-1] 35 | 36 | if type(model.layers[n]) is pylearn2.models.mlp.RectifiedLinear: 37 | dact = lambda x: x>0 38 | elif type(model.layers[n]) is pylearn2.models.mlp.Linear: 39 | dact = lambda x: x 40 | elif type(model.layers[n]) is audio_dataset.PreprocLayer: 41 | dact = lambda x: x 42 | 43 | gradn = (dact(Xn_1.dot(Wn)) * gradn).dot(Wn.T) 44 | 45 | return gradn 46 | ''' 47 | 48 | # Create a simple model for testing 49 | rng = np.random.RandomState(111) 50 | epsilon = 1e-2 51 | nvis = 10 52 | nhid = 5 53 | n_classes = 3 54 | 55 | X0 = np.array(rng.randn(1,nvis), dtype=np.float32) 56 | label = rng.randint(0,n_classes) 57 | 58 | model = pylearn2.models.mlp.MLP( 59 | nvis=nvis, 60 | layers=[ 61 | pylearn2.models.mlp.Linear( 62 | layer_name='pre', 63 | dim=nvis, 64 | irange=1. 65 | ), 66 | pylearn2.models.mlp.RectifiedLinear( 67 | layer_name='h0', 68 | dim=nhid, 69 | irange=1.), 70 | pylearn2.models.mlp.RectifiedLinear( 71 | layer_name='h1', 72 | dim=nhid, 73 | irange=1.), 74 | pylearn2.models.mlp.RectifiedLinear( 75 | layer_name='h2', 76 | dim=nhid, 77 | irange=1.), 78 | pylearn2.models.mlp.Softmax( 79 | n_classes=n_classes, 80 | layer_name='y', 81 | irange=1.) 
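        # a small random MLP used only for this gradient check: the linear 'pre'
        # layer stands in for the PreprocLayer the real models use, followed by
        # three rectified-linear layers and a softmax, i.e. the repo's usual
        # three-hidden-layer shape at toy dimensions (nvis=10, nhid=5, n_classes=3)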
82 | ]) 83 | 84 | # Numerical computation of gradients 85 | X = model.get_input_space().make_theano_batch() 86 | Y = model.fprop( X ) 87 | fprop = function([X],Y) 88 | 89 | dX_num = np.zeros(X0.shape) 90 | for i in range(nvis): 91 | tmp = np.copy(X0[:,i]) 92 | X0[:,i] = tmp + epsilon 93 | Y_plus = -np.log(fprop(X0)[:,label]) 94 | 95 | X0[:,i] = tmp - epsilon 96 | Y_minus = -np.log(fprop(X0)[:,label]) 97 | 98 | X0[:,i] = tmp 99 | dX_num[:,i] = (Y_plus - Y_minus) / (2*epsilon) 100 | 101 | # Computation of gradients using Theano 102 | n_examples = X0.shape[0] 103 | label_vec = T.vector('label_vec') 104 | cost = model.cost(label_vec, model.fprop(X)) 105 | dCost = T.grad(cost * n_examples, X) 106 | f = function([X, label_vec], dCost) 107 | 108 | one_hot = np.zeros(n_classes, dtype=np.float32) 109 | one_hot[label] = 1 110 | 111 | dX_est = f(X0, one_hot) #dX_est = calc_grad(X0, model, label) 112 | 113 | delta = dX_num - dX_est 114 | # Print results 115 | print 'Numerical gradient:', dX_num 116 | print 'Theano gradient:', dX_est 117 | print 'Absolute difference:', np.abs(delta) 118 | print '2-norm of difference', np.linalg.norm(delta) 119 | 120 | -------------------------------------------------------------------------------- /utils/class_histogram.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import numpy as np 3 | import theano 4 | from pylearn2.utils import serial 5 | from pylearn2.datasets.transformer_dataset import TransformerDataset 6 | import cPickle 7 | import GTZAN_dataset 8 | 9 | import pdb 10 | 11 | def class_histogram(model, dataset): 12 | """ 13 | Function to compute the file-level classification error by classifying 14 | individual frames and then voting for the class with highest cumulative probability 15 | """ 16 | X = model.get_input_space().make_theano_batch() 17 | Y = model.fprop( X ) 18 | fprop = theano.function([X],Y) 19 | 20 | n_classes = dataset.raw.y.shape[1] 21 | confusion = np.zeros((n_classes, n_classes)) 22 | n_examples = len(dataset.raw.support) 23 | n_frames_per_file = dataset.raw.n_frames_per_file 24 | 25 | batch_size = n_frames_per_file 26 | data_specs = dataset.raw.get_data_specs() 27 | iterator = dataset.iterator(mode='sequential', 28 | batch_size=batch_size, 29 | data_specs=data_specs 30 | ) 31 | 32 | i=0 33 | histogram = [] 34 | for el in iterator: 35 | 36 | # display progress indicator 37 | sys.stdout.write('Classify progress: %2.0f%%\r' % (100*i/float(n_examples))) 38 | sys.stdout.flush() 39 | 40 | fft_data = np.array(el[0], dtype=np.float32) 41 | frame_labels = np.argmax(fprop(fft_data), axis=1) 42 | hist = np.bincount(frame_labels, minlength=n_classes) 43 | histogram.append(hist) 44 | 45 | i += batch_size 46 | 47 | return histogram 48 | 49 | #if __name__ == '__main__': 50 | 51 | # _, fold_file, model_file = sys.argv 52 | fold_file = 'GTZAN_1024-fold-1_of_4.pkl' 53 | model_file = './saved-rlu-505050/mlp_rlu_fold1_best.pkl' 54 | 55 | # get model 56 | model = serial.load(model_file) 57 | 58 | # get stanardized dictionary 59 | which_set = 'test' 60 | with open(fold_file) as f: 61 | config = cPickle.load(f) 62 | 63 | dataset = TransformerDataset( 64 | raw = GTZAN_dataset.GTZAN_dataset(config, which_set), 65 | transformer = GTZAN_dataset.GTZAN_standardizer(config) 66 | ) 67 | 68 | # test error 69 | #err, conf = frame_misclass_error(model, dataset) 70 | 71 | hist = class_histogram(model, dataset) 72 | hist = np.vstack(hist) 73 | 74 | test_files = np.array(config['test_files']) 75 | test_labels = 
test_files//100 76 | 77 | most_votes = np.argmax(hist,axis=0) 78 | most_rep_files = test_files[most_votes] 79 | most_rep_hist = hist[most_votes, :] 80 | 81 | prediction = np.argmax(hist, axis=1) 82 | top_pred = np.argsort(hist, axis=1) 83 | top_pred = top_pred[:,-1::-1] 84 | 85 | err_list = [] 86 | 87 | # for i, (l,p) in enumerate(zip(test_labels, prediction)): 88 | # if l != p: 89 | # err_list.append(i) 90 | 91 | for i, (l,p) in enumerate(zip(test_labels, top_pred)): 92 | if l not in p[:2]: 93 | err_list.append(i) 94 | 95 | 96 | ax_labels = ['blues', 'classical', 'country', 'disco', 'hiphop', 'jazz', 'metal', 'pop', 'reggae', 'rock'] 97 | 98 | n = err_list[18] 99 | err_file = test_files[n] 100 | err_hist = hist[n] 101 | pred_label = ax_labels[np.argmax(err_hist)] 102 | true_label = ax_labels[err_file//100] 103 | 104 | print ff[err_file] 105 | print pred_label 106 | 107 | -------------------------------------------------------------------------------- /utils/comp_ave_snr.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | import scikits.audiolab as audiolab 4 | 5 | # calc ave snr 6 | file_list = '/home/cmke/Datasets/_tzanetakis_genre/test_filtered.txt' 7 | 8 | with open(file_list) as f: 9 | files = [l.strip() for l in f.readlines()] 10 | 11 | base = '/home/cmke/Datasets/_tzanetakis_genre' 12 | 13 | dir_list = ['/home/cmke/Datasets/_tzanetakis_F_500_RSD_allies', 14 | '/home/cmke/Datasets/_tzanetakis_F_500_RSD_random', 15 | '/home/cmke/Datasets/_tzanetakis_F_500_RSD_jazz'] 16 | 17 | ign = 2048 18 | snr = [[],[],[]] 19 | for j,d in enumerate(dir_list): 20 | print d 21 | for i,f in enumerate(files): 22 | print 'iteration ', i 23 | x,_,_ = audiolab.wavread(os.path.join(base,f)) 24 | xhat,_,_ = audiolab.wavread(os.path.join(d,f)) 25 | 26 | L = min(len(x), len(xhat)) 27 | 28 | snr[j].append(20*np.log10(np.linalg.norm(x[ign:L-ign-1])/np.linalg.norm(np.abs(x[ign:L-ign-1]-xhat[ign:L-ign-1])+1e-12))) 29 | 30 | for d,s in zip(dir_list, snr): 31 | print 'Directory', d 32 | print 'Average SNR', np.mean(s) 33 | print 'Std SNR', np.std(s) 34 | 35 | -------------------------------------------------------------------------------- /utils/create_adversarial_dataset.py: -------------------------------------------------------------------------------- 1 | import os, sys, re, csv, cPickle, argparse 2 | from scikits import audiolab, samplerate 3 | from utils.read_mp3 import read_mp3 4 | from sklearn.externals import joblib 5 | import numpy as np 6 | import theano 7 | from theano import tensor as T 8 | from pylearn2.utils import serial 9 | from audio_dataset import AudioDataset, PreprocLayer 10 | from pylearn2.space import CompositeSpace, Conv2DSpace, VectorSpace, IndexSpace 11 | from pylearn2.datasets.dense_design_matrix import DefaultViewConverter 12 | import pylearn2.config.yaml_parse as yaml_parse 13 | from test_adversary import winfunc, compute_fft, overlap_add, griffin_lim_proj, find_adversary, aggregate_features 14 | import pdb 15 | 16 | 17 | def file_misclass_error_printf(dnn_model, root_dir, dataset, save_file, mode='all_same', label=0, snr=30, aux_model=None, aux_save_file=None, which_layers=None, save_adversary_audio=None, fwd_xform=None, back_xform=None): 18 | """ 19 | Function to compute the file-level classification error by classifying 20 | individual frames and then voting for the class with highest cumulative probability 21 | """ 22 | if fwd_xform is None: 23 | print 'fwd_xform=None, using identity' 24 | fwd_xform = lambda X: 
X
25 |     if back_xform is None:
26 |         print 'back_xform=None, using identity'
27 |         back_xform = lambda X: X
28 | 
29 |     n_classes = len(dataset.targets)
30 | 
31 |     X = dnn_model.get_input_space().make_theano_batch()
32 |     Y = dnn_model.fprop(X)
33 |     fprop_theano = theano.function([X],Y)
34 | 
35 |     input_space = dnn_model.get_input_space()
36 |     if isinstance(input_space, Conv2DSpace):
37 |         tframes, dim = input_space.shape
38 |         view_converter = DefaultViewConverter((tframes, dim, 1))
39 |     else:
40 |         dim = input_space.dim
41 |         tframes = 1
42 |         view_converter = None
43 | 
44 |     if view_converter is not None:
45 |         def fprop(batch):
46 |             nframes = batch.shape[0]
47 |             thop = 1.
48 |             sup = np.arange(0,nframes-tframes+1, np.int(tframes/thop))
49 | 
50 |             data = np.vstack([np.reshape(batch[i:i+tframes, :],(tframes*dim,)) for i in sup])
51 |             data = fwd_xform(data)
52 | 
53 |             return fprop_theano(view_converter.get_formatted_batch(data, input_space))
54 | 
55 |     else:
56 |         fprop = fprop_theano
57 | 
58 |     n_examples = len(dataset.file_list)
59 |     target_space = dnn_model.get_output_space() #VectorSpace(dim=n_classes)
60 |     feat_space = dnn_model.get_input_space() #VectorSpace(dim=dataset.nfft//2+1, dtype='complex64')
61 |     data_specs = (CompositeSpace((feat_space, target_space)), ("songlevel-features", "targets"))
62 |     iterator = dataset.iterator(mode='sequential', batch_size=1, data_specs=data_specs)
63 | 
64 |     if aux_model:
65 |         aux_fname = open(aux_save_file, 'w')
66 |         aux_writer = csv.writer(aux_fname, delimiter='\t')
67 | 
68 |     with open(save_file, 'w') as fname:
69 |         dnn_writer = csv.writer(fname, delimiter='\t')
70 |         for i,el in enumerate(iterator):
71 | 
72 |             # display progress indicator
73 |             sys.stdout.write('Progress: %2.0f%%\r' % (100*i/float(n_examples))); sys.stdout.flush()
74 | 
75 |             Mag, Phs = np.abs(el[0], dtype=np.float32), np.angle(el[0])
76 |             epsilon = np.linalg.norm(Mag)/Mag.shape[0]/10**(snr/20.)
77 | 
78 |             if mode == 'all_same':
79 |                 target = label
80 |             elif mode == 'perfect':
81 |                 target = el[1]
82 |             elif mode == 'random':
83 |                 target = np.random.randint(n_classes)
84 |             elif mode == 'all_wrong':
85 |                 cand = np.setdiff1d(np.arange(n_classes),np.array(el[1])) # remove ground truth label from set of options
86 |                 target = cand[np.random.randint(len(cand))]
87 | 
88 |             if 1: # re-read audio (seems to be bug when reading from h5)
89 |                 f = el[2]
90 |                 if f.endswith('.wav'):
91 |                     read_fun = audiolab.wavread
92 |                 elif f.endswith('.au'):
93 |                     read_fun = audiolab.auread
94 |                 elif f.endswith('.mp3'):
95 |                     read_fun = read_mp3
96 | 
97 |                 x, fstmp, _ = read_fun(os.path.join(root_dir, f))
98 | 
99 |                 # make mono
100 |                 if len(x.shape) != 1:
101 |                     x = np.sum(x, axis=1)/2.
102 | 
103 |                 seglen=30
104 |                 x = x[:fstmp*seglen]
105 | 
106 |                 fs = 22050
107 |                 if fstmp != fs:
108 |                     x = samplerate.resample(x, fs/float(fstmp), 'sinc_best')
109 | 
110 |                 Mag, Phs = compute_fft(x)
111 |                 Mag = Mag[:1200,:513]
112 |                 Phs = Phs[:1200,:513]
113 |                 epsilon = np.linalg.norm(Mag)/Mag.shape[0]/10**(snr/20.) 
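                # epsilon is the per-frame perturbation budget implied by the target
                # SNR: rearranging epsilon = ||Mag|| / nframes / 10**(snr/20.) gives
                # snr = 20*log10(||Mag|| / (nframes*epsilon)), so e.g. snr=30 permits
                # about 31.6x less distortion than snr=0 (and the snr=-300 passed in
                # from __main__ below effectively removes the constraint)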
114 | else: 115 | raise ValueError("Check that song-level iterator is indeed returning 'raw data'") 116 | 117 | X_adv, P_adv = find_adversary( 118 | model=dnn_model, 119 | X0=Mag, 120 | label=target, 121 | fwd_xform=fwd_xform, 122 | back_xform=back_xform, 123 | P0=np.hstack((Phs, -Phs[:,-2:-dataset.nfft/2-1:-1])), 124 | mu=.01, 125 | epsilon=epsilon, 126 | maxits=50, 127 | stop_thresh=0.5, 128 | griffin_lim=False#True 129 | ) 130 | 131 | if save_adversary_audio: 132 | 133 | nfft = 2*(X_adv.shape[1]-1) 134 | nhop = nfft//2 135 | x_adv = overlap_add(np.hstack((X_adv, X_adv[:,-2:-nfft//2-1:-1])) * np.exp(1j*P_adv), nfft, nhop) 136 | audiolab.wavwrite(x_adv, os.path.join(save_adversary_audio, el[2]), 22050, 'pcm16') 137 | 138 | #frame_labels = np.argmax(fprop(X_adv), axis=1) 139 | #hist = np.bincount(frame_labels, minlength=n_classes) 140 | 141 | fpass = fprop(X_adv) 142 | conf = np.sum(fpass, axis=0) / float(fpass.shape[0]) 143 | dnn_label = np.argmax(conf) #np.argmax(hist) # most used label 144 | true_label = el[1] 145 | 146 | # truncate to correct length 147 | ext = min(Mag.shape[0], X_adv.shape[0]) 148 | Mag = Mag[:ext,:] 149 | X_adv = X_adv[:ext,:] 150 | 151 | X_diff = Mag-X_adv 152 | out_snr = 20*np.log10(np.linalg.norm(Mag)/np.linalg.norm(X_diff)) 153 | 154 | dnn_writer.writerow([dataset.file_list[i], true_label, dnn_label, out_snr, conf[dnn_label]]) 155 | 156 | print 'Mode:{}, True label:{}, Adv label:{}, Sel label:{}, Conf:{}, Out snr: {}'.format(mode, true_label, target, dnn_label, conf[dnn_label], out_snr) 157 | if aux_model: 158 | fft_agg = aggregate_features(dnn_model, X_adv, which_layers) 159 | aux_vote = np.argmax(np.bincount(np.array(aux_model.predict(fft_agg), dtype='int'))) 160 | aux_writer.writerow([dataset.file_list[i], true_label, aux_vote]) 161 | print 'AUX adversarial label: {}'.format(aux_vote) 162 | if aux_model: 163 | aux_fname.close() 164 | print '' 165 | 166 | if __name__ == '__main__': 167 | ''' 168 | Variants: 169 | 1) Label all excerpts the same (e.g., all blues) 170 | 2) Perfect classification 171 | 3) Random classification 172 | ''' 173 | parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter, 174 | description='''Script to find/test adversarial examples with a dnn''') 175 | parser.add_argument('--dnn_model', help='dnn model to use for features') 176 | parser.add_argument('--aux_model', help='(optional) auxiliary model trained on dnn features (e.g. random forest)') 177 | parser.add_argument('--which_layers', nargs='*', type=int, help='(optional) layer(s) from dnn to be passed to auxiliary model') 178 | 179 | # three variants 180 | parser.add_argument('--mode', help='either all_same, perfect, random, all_wrong') 181 | parser.add_argument('--label', type=int, help='label to minimize loss on (only used in all_same mode)') 182 | parser.add_argument('--root_dir', help='dataset directory') 183 | 184 | parser.add_argument('--dnn_save_file', help='txt file to save results in') 185 | parser.add_argument('--aux_save_file', help='txt file to save results in') 186 | parser.add_argument('--save_adversary_audio', help='path to save adversaries') 187 | 188 | args = parser.parse_args() 189 | 190 | assert args.mode in ['all_same', 'perfect', 'random', 'all_wrong'] 191 | if args.mode == 'all_same' and not args.label: 192 | parser.error('--label x must be specified together with all_same mode') 193 | if args.aux_model and not args.which_layers: 194 | parser.error('--which_layers x1 x2 ... 
must be specified together with aux_model') 195 | if args.aux_model and not args.aux_save_file: 196 | parser.error('--aux_save_file x must be specified together with --aux_model') 197 | 198 | dnn_model = serial.load(args.dnn_model) 199 | if isinstance(dnn_model.layers[0], PreprocLayer): 200 | print 'Preprocessing layer detected' 201 | fwd_xform = None 202 | back_xform = None 203 | else: 204 | print 'No preprocessing layer detected' 205 | trainset = yaml_parse.load(dnn_model.dataset_yaml_src) 206 | fwd_xform = lambda batch: (batch - trainset.mean) * trainset.istd * trainset.mask 207 | back_xform = lambda batch: (batch / trainset.istd + trainset.mean) * trainset.mask 208 | 209 | p = re.compile(r"which_set.*'(train)'") 210 | dataset_yaml = p.sub("which_set: 'test'", dnn_model.dataset_yaml_src) 211 | testset = yaml_parse.load(dataset_yaml) 212 | 213 | if args.aux_model: 214 | aux_model = joblib.load(args.aux_model) 215 | else: 216 | aux_model = None 217 | 218 | file_misclass_error_printf( 219 | dnn_model=dnn_model, 220 | root_dir=args.root_dir, 221 | dataset=testset, 222 | save_file=args.dnn_save_file, 223 | mode=args.mode, 224 | label=args.label, 225 | snr=-300.,#15., 226 | aux_model=aux_model, 227 | aux_save_file=args.aux_save_file, 228 | which_layers=args.which_layers, 229 | save_adversary_audio=args.save_adversary_audio, 230 | fwd_xform=fwd_xform, 231 | back_xform=back_xform) 232 | 233 | -------------------------------------------------------------------------------- /utils/create_split_files.py: -------------------------------------------------------------------------------- 1 | import os, sys, tables 2 | import numpy as np 3 | 4 | def create_split_files(hdf5, ntrain, nvalid, ntest, path): 5 | # ntrain, nvalid, ntest are per class 6 | 7 | # extract metadata from dataset 8 | hdf5_file = tables.open_file(hdf5, mode='r') 9 | param = hdf5_file.get_node('/', 'Param') 10 | file_dict = param.file_dict[0] 11 | 12 | train_list = [] 13 | valid_list = [] 14 | test_list = [] 15 | 16 | rng = np.random.RandomState(111) 17 | for key, files in file_dict.iteritems(): # for all files that share a given label 18 | nfiles = len(files) 19 | perm = rng.permutation(nfiles) 20 | 21 | sup = np.arange(ntest) 22 | train_index = perm[:ntrain] 23 | valid_index = perm[ntrain:ntrain+nvalid] 24 | test_index = perm[ntrain+nvalid:ntrain+nvalid+ntest] 25 | 26 | train_list.append([files[i] for i in train_index]) 27 | valid_list.append([files[i] for i in valid_index]) 28 | test_list.append([files[i] for i in test_index]) 29 | 30 | # flatten lists 31 | train_list = sum(train_list,[]) 32 | valid_list = sum(valid_list,[]) 33 | test_list = sum(test_list,[]) 34 | 35 | with open(os.path.join(path, 'train-part.txt'), 'w') as f: 36 | for i in train_list: 37 | f.write('{}\n'.format(i)) 38 | 39 | with open(os.path.join(path, 'valid-part.txt'), 'w') as f: 40 | for i in valid_list: 41 | f.write('{}\n'.format(i)) 42 | 43 | with open(os.path.join(path, 'test-part.txt'), 'w') as f: 44 | for i in test_list: 45 | f.write('{}\n'.format(i)) 46 | 47 | hdf5_file.close() 48 | 49 | if __name__=='__main__': 50 | create_split_files(sys.argv[1], int(sys.argv[2]), int(sys.argv[3]), int(sys.argv[4]), sys.argv[5]) -------------------------------------------------------------------------------- /utils/filtered_classify.py: -------------------------------------------------------------------------------- 1 | import os, sys, re, csv, cPickle 2 | import numpy as np 3 | import scipy as sp 4 | import scikits.audiolab as audiolab 5 | import scikits.samplerate 
as samplerate 6 | from sklearn.externals import joblib 7 | import theano 8 | 9 | from pylearn2.utils import serial 10 | from audio_dataset import AudioDataset 11 | 12 | from test_adversary import aggregate_features, compute_fft 13 | import pdb 14 | 15 | def file_misclass_error_printf(dnn_model, aux_model, which_layers, data_dir, file_list, filter_cutoff, dnn_save_file, aux_save_file): 16 | 17 | # closures 18 | def dnn_classify(X): 19 | batch = dnn_model.get_input_space().make_theano_batch() 20 | fprop = theano.function([batch], dnn_model.fprop(batch)) 21 | prediction = np.argmax(np.sum(fprop(X), axis=0)) 22 | return prediction 23 | 24 | def aux_classify(X): 25 | Xagg = aggregate_features(dnn_model, X, which_layers) 26 | prediction = np.argmax(np.bincount(np.array(aux_model.predict(Xagg), dtype='int'))) 27 | return prediction 28 | 29 | # filter coeffs 30 | b,a = sp.signal.butter(4, filter_cutoff/(22050./2.)) 31 | 32 | dnn_file = open(dnn_save_file, 'w') 33 | aux_file = open(aux_save_file, 'w') 34 | label_list = {'blues':0, 'classical':1, 'country':2, 'disco':3, 'hiphop':4, 'jazz':5, 'metal':6, 'pop':7, 'reggae':8, 'rock':9} 35 | 36 | for i, fname in enumerate(file_list): 37 | print 'Processing file {} of {}'.format(i+1, len(file_list)) 38 | true_label = label_list[fname.split('/')[0]] 39 | 40 | x,_,_ = audiolab.wavread(os.path.join(data_dir, fname)) 41 | x = sp.signal.lfilter(b,a,x) 42 | X,_ = compute_fft(x) 43 | X = np.array(X[:,:513], dtype=np.float32) 44 | 45 | dnn_pred = dnn_classify(X) 46 | dnn_file.write('{fname}\t{true_label}\t{pred_label}\n'.format( 47 | fname=fname, 48 | true_label=true_label, 49 | pred_label=dnn_pred)) 50 | 51 | aux_pred = aux_classify(X) 52 | aux_file.write('{fname}\t{true_label}\t{pred_label}\n'.format( 53 | fname=fname, 54 | true_label=true_label, 55 | pred_label=aux_pred)) 56 | 57 | dnn_file.close() 58 | aux_file.close() 59 | 60 | def pp_array(array): # pretty printing 61 | for row in array: 62 | print ['%04.1f' % el for el in row] 63 | 64 | if __name__ == '__main__': 65 | 66 | import argparse 67 | parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter, 68 | description='') 69 | 70 | parser.add_argument('--dnn_model', help='Path to trained dnn model file') 71 | parser.add_argument('--aux_model', help='Path to trained aux model file') 72 | parser.add_argument('--data_dir', help='Adversarial dataset dir') 73 | parser.add_argument('--test_list', help='List of test files in dataset dir') 74 | parser.add_argument('--filter_cutoff', type=float, help='filter cutoff') 75 | parser.add_argument('--dnn_save_file', help='') 76 | parser.add_argument('--aux_save_file', help='') 77 | args = parser.parse_args() 78 | 79 | # get model 80 | dnn_model = serial.load(args.dnn_model) 81 | aux_model = joblib.load(args.aux_model) 82 | L = os.path.splitext(os.path.split(args.aux_model)[-1])[0].split('_L')[-1] 83 | if L=='All': 84 | which_layers = [1,2,3] 85 | else: 86 | which_layers = [int(L)] 87 | 88 | with open(args.test_list) as f: 89 | file_list = [l.strip() for l in f.readlines()] 90 | 91 | file_misclass_error_printf( 92 | dnn_model = dnn_model, 93 | aux_model = aux_model, 94 | which_layers = which_layers, 95 | data_dir = args.data_dir, 96 | file_list = file_list, 97 | filter_cutoff = args.filter_cutoff, 98 | dnn_save_file = args.dnn_save_file, 99 | aux_save_file = args.aux_save_file) 100 | -------------------------------------------------------------------------------- /utils/filtered_classify_batch.py: 
-------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | 4 | freqs = np.hstack(([20], np.arange(1000,12000,1000))) 5 | for f in freqs: 6 | print 'On cutoff: {}'.format(f) 7 | #os.system('''python utils/filtered_classify.py --dnn_model saved_models/dnn/S_500_RSD.pkl --aux_model saved_models/rf/S_500_RSD_AF_LAll.pkl --data_dir /home/cmke/Datasets/_tzanetakis_S_500_RSD_random/ --test_list gtzan/test_stratified.txt --filter_cutoff {hz:d} --dnn_save_file /home/cmke/Datasets/_tzanetakis_S_500_RSD_random/__dnn__/S_500_RSD-{hz:d}.txt --aux_save_file /home/cmke/Datasets/_tzanetakis_S_500_RSD_random/__rf__/S_500_RSD_AF_LAll-{hz:d}.txt'''.format(hz=f)) 8 | os.system('''python utils/filtered_classify.py --dnn_model saved_models/dnn/S_500_RSD.pkl --aux_model saved_models/rf/S_500_RSD_AF_LAll.pkl --data_dir /home/cmke/Datasets/_tzanetakis_genre/ --test_list gtzan/test_stratified.txt --filter_cutoff {hz:d} --dnn_save_file /home/cmke/Datasets/_tzanetakis_genre/__dnn__/S_500_RSD-{hz:d}.txt --aux_save_file /home/cmke/Datasets/_tzanetakis_genre/__rf__/S_500_RSD_AF_LAll-{hz:d}.txt'''.format(hz=f)) 9 | 10 | 11 | -------------------------------------------------------------------------------- /utils/plot_adversary_spectra.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import scikits.audiolab as audiolab 3 | from test_adversary import winfunc, compute_fft 4 | from matplotlib import pyplot as plt 5 | 6 | 7 | if __name__=='__main__': 8 | import argparse 9 | parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter, description='') 10 | parser.add_argument('--true_file') 11 | parser.add_argument('--adversary') 12 | args = parser.parse_args() 13 | 14 | # load sndfile 15 | x,_,_ = audiolab.wavread(args.true_file) 16 | x_adv,_,_ = audiolab.wavread(args.adversary) 17 | 18 | L = min(len(x), len(x_adv)) 19 | ign = 2048 20 | snr = 20*np.log10(np.linalg.norm(x[ign:L-ign-1])/np.linalg.norm(x[ign:L-ign-1]-x_adv[ign:L-ign-1])) 21 | print 'SNR: ', snr 22 | 23 | # STFT 24 | X = compute_fft(x)[0][:,:513] 25 | X_adv = compute_fft(x_adv)[0][:,:513] 26 | 27 | rng = 1+np.arange(400) 28 | Xt = X[rng,:] 29 | X = 20*np.log10(Xt) 30 | 31 | Xt_adv = X_adv[rng,:] 32 | X_adv = 20*np.log10(Xt_adv) 33 | # nrm = np.max(X)/1. 34 | # X /= nrm 35 | # X_adv /= nrm 36 | 37 | vmin = np.min(X) 38 | vmax = np.max(X) 39 | 40 | # Plotting... 
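    # figure layout: the top row shows three spectrograms on a shared dB scale,
    # namely the original excerpt, the adversary, and their difference; the bottom
    # panel overlays the magnitude spectrum of a single frame (N=10) of the
    # original, the adversary, and the residual between them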
41 |     plt.ion()
42 |     plt.figure()
43 | 
44 |     plt.subplot(2,3,1)
45 |     plt.imshow(X, extent=[0,11.025,len(rng),0], vmin=vmin, vmax=vmax)
46 |     plt.axis('tight')
47 |     plt.xlabel('Frequency (kHz)')
48 |     plt.ylabel('Time frame')
49 | 
50 |     plt.subplot(2,3,2)
51 |     plt.imshow(X_adv, extent=[0,11.025,len(rng),0], vmin=vmin, vmax=vmax)
52 |     plt.axis('tight')
53 |     plt.xlabel('Frequency (kHz)')
54 | 
55 |     plt.subplot(2,3,3)
56 |     plt.imshow(20*np.log10(np.abs(Xt_adv-Xt)), extent=[0,11.025,len(rng),0], vmin=vmin, vmax=vmax)
57 |     plt.axis('tight')
58 |     plt.xlabel('Frequency (kHz)')
59 | 
60 |     plt.subplot(2,1,2)
61 |     N = 10
62 |     x_range = np.arange(513)/513.*(22.050/2)
63 |     plt.plot(x_range, 20*np.log10(Xt[N,:]), color=(.4,.6,1,0.8), linewidth=2)
64 | 
65 |     plt.plot(x_range, 20*np.log10(Xt_adv[N,:]), color=(0,0,0,1), linewidth=1)
66 | 
67 |     plt.plot(x_range, 20*np.log10(np.abs(Xt[N,:]-Xt_adv[N,:])), '-', color=(1,0.6,0.1,0.6), linewidth=2)
68 |     plt.axis('tight')
69 |     plt.xlabel('Frequency (kHz)')
70 |     plt.ylabel('Magnitude (dB)')
71 | 
72 |     #plt.savefig('adversary_spectra.pdf', format='pdf')
-------------------------------------------------------------------------------- /utils/plot_conf.py: --------------------------------------------------------------------------------
1 | import matplotlib
2 | #matplotlib.use('Agg')
3 | from matplotlib import pyplot as plt
4 | 
5 | import sys, re, os
6 | import numpy as np
7 | from pylearn2.utils import serial
8 | import pylearn2.config.yaml_parse as yaml_parse
9 | 
10 | #from test_mlp_script import frame_misclass_error, file_misclass_error
11 | 
12 | def plot_conf_mat(confusion, title, labels):
13 |     augmented_confusion = augment_confusion_matrix(confusion)
14 | 
15 |     fig = plt.figure()
16 |     ax = fig.add_subplot(111)
17 |     ax.set_aspect(1)
18 |     ax.imshow(np.array(augmented_confusion), cmap=plt.cm.gray_r, interpolation='nearest')
19 | 
20 |     width,height = augmented_confusion.shape
21 |     for x in xrange(width):
22 |         for y in xrange(height):
23 |             if augmented_confusion[x][y]<50:
24 |                 color='k'
25 |             else:
26 |                 color='w'
27 |             ax.annotate('%2.1f'%augmented_confusion[x][y], xy=(y, x), horizontalalignment='center', verticalalignment='center',color=color, fontsize=9)
28 | 
29 |     ax.xaxis.tick_top()
30 |     plt.xticks(range(width), labels+['Pr'])
31 |     plt.yticks(range(height), labels+['F'])
32 | 
33 |     xlabels = ax.get_xticklabels()
34 |     for label in xlabels:
35 |         label.set_rotation(30)
36 | 
37 |     plt.xlabel(title)
38 |     plt.show()
39 | 
40 | def save_conf_mat(confusion, title, labels):
41 |     augmented_confusion = augment_confusion_matrix(confusion)
42 | 
43 |     fig = plt.figure()
44 |     ax = fig.add_subplot(111)
45 |     ax.set_aspect(1)
46 |     ax.imshow(np.array(augmented_confusion), cmap=plt.cm.gray_r, interpolation='nearest')
47 | 
48 |     thresh = np.max(augmented_confusion)
49 |     width,height = augmented_confusion.shape
50 |     for x in xrange(width):
51 |         for y in xrange(height):
52 |             if augmented_confusion[x][y]
[... truncated in this dump: the remainder of plot_conf.py, the files plot_individual_confs.py and plot_mean_std_recall.py, and all but the tail of read_mp3.py ...]
-------------------------------------------------------------------------------- /utils/read_mp3.py: --------------------------------------------------------------------------------
40 |     ... samples * 1.5 :
41 |         x=x[0::2] #grab every other sample
42 |     if channels>1 :
43 |         x=x.reshape(-1,channels)
44 |     return (x,fs,'int16')
-------------------------------------------------------------------------------- /utils/tensongs_exp.py: --------------------------------------------------------------------------------
1 | import os, argparse
2 | from scikits import audiolab, samplerate
3 | from matplotlib import pyplot as plt
4 | from sklearn.externals import joblib
5 | import numpy as np
6 | import scipy as sp
7 | import glob
8 | import theano
9 | from theano import tensor as T
10 | from pylearn2.utils import serial
11 | from audio_dataset import AudioDataset
12 | from pylearn2.datasets.dense_design_matrix import DefaultViewConverter
13 | from pylearn2.space import CompositeSpace, Conv2DSpace, VectorSpace, IndexSpace
14 | import pylearn2.config.yaml_parse as yaml_parse
15 | from utils.read_mp3 import read_mp3
16 | 
17 | from test_adversary import winfunc, compute_fft, overlap_add, griffin_lim_proj, find_adversary, aggregate_features
18 | 
19 | import pdb
20 | 
21 | def stripf(f):
22 |     fname = os.path.split(f)[-1]
23 |     return os.path.splitext(fname)[0]
24 | 
25 | if __name__ == '__main__':
26 | 
27 |     parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter,
28 |         description='Search for adversarial examples against a trained DNN (and an optional auxiliary classifier trained on its hidden-layer features), and save them as audio')
29 |     parser.add_argument('--dnn_model', help='dnn model to use for features')
30 |     parser.add_argument('--aux_model', help='(auxiliary) model trained on dnn features')
31 |     parser.add_argument('--labels', help='text file listing the categorical labels (comma- or newline-separated)')
32 |     parser.add_argument('--in_path', help='directory with files to test model on')
33 |     parser.add_argument('--out_path', help='location for saving adversary (name automatically generated)')
34 | 
35 |     args = parser.parse_args()
36 | 
37 |     # tunable alg. parameters
38 |     snr = 15.
39 |     mu = .1
40 |     stop_thresh = .9
41 |     maxits = 100
42 | 
43 |     with open(args.labels) as f:
44 |         lines = f.readlines()
45 |     if len(lines)==1: # assume comma separated, single line
46 |         label_list = lines[0].replace(' ','').split(',')
47 |     else:
48 |         label_list = [l.split()[0] for l in lines]
49 | 
50 |     targets = range(len(label_list))
51 | 
52 |     # load dnn model, fprop function
53 |     dnn_model = serial.load(args.dnn_model)
54 |     input_space = dnn_model.get_input_space()
55 |     batch = input_space.make_theano_batch()
56 |     fprop_theano = theano.function([batch], dnn_model.fprop(batch))
57 | 
58 |     if isinstance(input_space, Conv2DSpace):
59 |         tframes, dim = input_space.shape
60 |         view_converter = DefaultViewConverter((tframes, dim, 1))
61 |     else:
62 |         dim = input_space.dim
63 |         tframes = 1
64 |         view_converter = None
65 | 
66 |     if view_converter: # conv model: classify by sliding a tframes-long window over the spectrogram
67 |         def fprop(batch):
68 |             nframes = batch.shape[0]
69 |             thop = 1.
70 |             sup = np.arange(0,nframes-tframes+1, np.int(tframes/thop))
71 |             data = np.vstack([np.reshape(batch[i:i+tframes, :],(tframes*dim,)) for i in sup])
72 |             return fprop_theano(view_converter.get_formatted_batch(data, input_space))
73 |     else:
74 |         fprop = fprop_theano
75 | 
76 |     # load aux model
77 |     if args.aux_model:
78 |         aux_model = joblib.load(args.aux_model)
79 |         L = os.path.splitext(os.path.split(args.aux_model)[-1])[0].split('_L')[-1]
80 |         if L=='All':
81 |             which_layers = [1,2,3]
82 |         else:
83 |             which_layers = [int(L)]
84 |         aux_file = open(os.path.join(args.out_path, stripf(args.aux_model) + '.adversaries.txt'), 'w')
85 | 
86 |     dnn_file = open(os.path.join(args.out_path, stripf(args.dnn_model) + '.adversaries.txt'), 'w')
87 | 
88 |     # fft params
89 |     nfft = 2*(dim-1)
90 |     nhop = nfft//2
91 |     win = winfunc(2048)
92 | 
93 |     flist = glob.glob(os.path.join(args.in_path, '*'))
94 | 
95 |     for f in flist:
96 |         fname = stripf(f)
97 | 
98 |         if f.endswith('.wav'):
99 |             read_fun = audiolab.wavread
100 |         elif f.endswith('.au'):
101 |             read_fun = audiolab.auread
102 |         elif f.endswith('.mp3'):
103 |             read_fun = read_mp3
104 |         else:
105 |             continue
106 | 
107 |         x, fstmp, _ = read_fun(f)
108 | 
109 |         # make mono
110 |         if len(x.shape) != 1:
111 |             x = np.sum(x, axis=1)/2.
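        # (next: trim to a 30 s excerpt and resample to 22050 Hz, the rate
        # assumed by the feature extraction throughout this codebase)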
112 | 113 | seglen=30 114 | x = x[:fstmp*seglen] 115 | 116 | fs = 22050 117 | if fstmp != fs: 118 | x = samplerate.resample(x, fs/float(fstmp), 'sinc_best') 119 | 120 | # compute mag. spectra 121 | Mag, Phs = compute_fft(x, nfft, nhop) 122 | X0 = Mag[:,:dim] 123 | 124 | epsilon = np.linalg.norm(X0)/X0.shape[0]/10**(snr/20.) 125 | 126 | # write file name 127 | dnn_file.write('{}\t'.format(fname)) 128 | if args.aux_model: 129 | aux_file.write('{}\t'.format(fname)) 130 | 131 | for t in targets: 132 | 133 | # search for adversary 134 | X_adv, P_adv = find_adversary( 135 | model=dnn_model, 136 | X0=X0, 137 | label=t, 138 | P0=Phs, 139 | mu=mu, 140 | epsilon=epsilon, 141 | maxits=maxits, 142 | stop_thresh=stop_thresh, 143 | griffin_lim=True) 144 | 145 | # get time-domain representation 146 | x_adv = overlap_add( np.hstack((X_adv, X_adv[:,-2:-nfft/2-1:-1])) * np.exp(1j*P_adv)) 147 | 148 | minlen = min(len(x_adv), len(x)) 149 | x_adv = x_adv[:minlen] 150 | x = x[:minlen] 151 | out_snr = 20*np.log10(np.linalg.norm(x[nfft:-nfft]) / np.linalg.norm(x[nfft:-nfft]-x_adv[nfft:-nfft])) 152 | 153 | # dnn prediction 154 | pred = np.argmax(np.sum(fprop(X_adv), axis=0)) 155 | if pred == t: 156 | dnn_file.write('{}\t'.format(int(out_snr+.5))) 157 | else: 158 | dnn_file.write('{}\t'.format('na')) 159 | 160 | # aux prediction 161 | if args.aux_model: 162 | X_adv_agg = aggregate_features(dnn_model, X_adv, which_layers) 163 | pred = np.argmax(np.bincount(np.array(aux_model.predict(X_adv_agg), dtype='int'))) 164 | if pred == t: 165 | aux_file.write('{}\t'.format(int(out_snr+.5))) 166 | else: 167 | aux_file.write('{}\t'.format('na')) 168 | 169 | # SAVE ADVERSARY FILES 170 | out_file = os.path.join(args.out_path, 171 | '{fname}.{label}.adversary.{snr}dB.wav'.format( 172 | fname=fname, 173 | label=label_list[t], 174 | snr=int(out_snr+.5))) 175 | audiolab.wavwrite(x_adv, out_file, fs) 176 | 177 | dnn_file.write('\n'.format(fname)) 178 | if args.aux_model: 179 | aux_file.write('\n'.format(fname)) 180 | 181 | dnn_file.close() 182 | if args.aux_model: 183 | aux_file.close() 184 | -------------------------------------------------------------------------------- /utils/tensongs_exp_filtered.py: -------------------------------------------------------------------------------- 1 | import os, argparse 2 | import scikits.audiolab as audiolab 3 | import scikits.samplerate as samplerate 4 | from matplotlib import pyplot as plt 5 | from sklearn.externals import joblib 6 | import numpy as np 7 | import scipy as sp 8 | import glob 9 | import theano 10 | from theano import tensor as T 11 | from pylearn2.utils import serial 12 | from audio_dataset import AudioDataset 13 | from pylearn2.space import CompositeSpace, Conv2DSpace, VectorSpace, IndexSpace 14 | import pylearn2.config.yaml_parse as yaml_parse 15 | 16 | from test_adversary import winfunc, compute_fft, overlap_add, griffin_lim_proj, find_adversary, aggregate_features 17 | 18 | import pdb 19 | 20 | # def find_adversary(model, X0, label, P0=None, mu=.1, epsilon=.25, maxits=10, stop_thresh=0.5, griffin_lim=False): 21 | # ''' 22 | # Solves: 23 | 24 | # y* = argmin_y f(y; label) 25 | # s.t. y >= 0 and ||y-X0|| < e 26 | 27 | # where f(y) is the cost associated the network associates with the pair (y,label) 28 | 29 | # This can be solved using the projected gradient method: 30 | 31 | # min_y f(y) 32 | # s.t. y >= 0 and ||y-X0|| < e 33 | 34 | # z = max(0, y^k - mu.f'(y^k)) 35 | # y^k+1 = P(z) 36 | 37 | # P(z) = min_u ||u-z|| s.t. 
{u | ||u-X0|| < e } 38 | # Lagrangian(u,l) = L(u,l) = ||u-z|| + nu*(||u-X0|| - e) 39 | # dL/du = u-z + nu*(u-X0) = 0 40 | # u = (1+nu)^-1 (z + nu*X0) 41 | 42 | # KKT: 43 | # ||u-x|| = e 44 | # ||(1/(1+nu))(z + nu*x) - x|| = e 45 | # ||(1/(1+nu))z + ((nu/(1+nu))-1)x|| = e 46 | # ||(1/(1+nu))z - (1/(1+nu))x|| = e 47 | # (1/(1+nu))||z-x|| = e 48 | # nu = max(0,||z-x||/e - 1) 49 | 50 | # function inputs: 51 | 52 | # model - pylearn2 dnn model (implements fprop, cost) 53 | # X0 - an example that the model classifies correctly 54 | # label - an incorrect label 55 | # ''' 56 | # # convert integer label into one-hot vector 57 | # n_classes, n_examples = model.get_output_space().dim, X0.shape[0] 58 | # one_hot = np.zeros((n_examples, n_classes), dtype=np.float32) 59 | # one_hot[:,label] = 1 60 | 61 | # # Set-up gradient computation w/ Theano 62 | # in_batch = model.get_input_space().make_theano_batch() 63 | # out_batch = model.get_output_space().make_theano_batch() 64 | # cost = model.cost(out_batch, model.fprop(in_batch)) 65 | # dCost = T.grad(cost, in_batch) 66 | # grad = theano.function([in_batch, out_batch], dCost) 67 | # fprop = theano.function([in_batch], model.fprop(in_batch)) 68 | 69 | # # projected gradient: 70 | # last_pred = 0 71 | # #Y = np.array(np.random.rand(*X0.shape), dtype=np.float32) 72 | # Y = np.copy(X0) 73 | # Y_old = np.copy(Y) 74 | # t_old = 1 75 | # for i in xrange(maxits): 76 | 77 | # # gradient step 78 | # Z = Y - mu * n_examples * grad(Y, one_hot) 79 | 80 | # # non-negative projection 81 | # Z = Z * (Z>0) 82 | 83 | # if griffin_lim: 84 | # Z, P0 = griffin_lim_proj(np.hstack((Z, Z[:,-2:-nfft/2-1:-1])), P0, its=0) 85 | 86 | # # maximum allowable signal-to-noise projection 87 | # nu = np.linalg.norm((Z-X0))/n_examples/epsilon - 1 # lagrange multiplier 88 | # nu = nu * (nu>=0) 89 | # Y = (Z + nu*X0) / (1+nu) 90 | 91 | # # FISTA momentum 92 | # t = .5 + np.sqrt(1+4*t_old**2)/2. 
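#         # (FISTA-style momentum: the t-sequence above sets the extrapolation weight alpha used below)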
93 | #         alpha = (t_old - 1)/t
94 | #         Y += alpha * (Y - Y_old)
95 | #         Y_old = np.copy(Y)
96 | #         t_old = t
97 | 
98 | #         # stopping condition
99 | #         pred = np.sum(fprop(Y), axis=0)
100 | #         pred /= np.sum(pred)
101 | 
102 | #         #print 'iteration: {}, pred[label]: {}, nu: {}'.format(i, pred[label], nu)
103 | #         print 'iteration: {}, pred[label]: {}, nu: {}, snr: {}'.format(i, pred[label], nu, 20*np.log10(np.linalg.norm(X0)/np.linalg.norm(Y-X0)))
104 | 
105 | #         if pred[label] > stop_thresh:
106 | #             break
107 | #         elif pred[label] < last_pred + 1e-4:
108 | #             break
109 | #         last_pred = pred[label]
110 | 
111 | #     return Y, P0
112 | 
113 | # winfunc = lambda x: np.hanning(x)
114 | # def compute_fft(x, nfft=1024, nhop=512):
115 | 
116 | #     window = winfunc(nfft)
117 | #     nframes = int((len(x)-nfft)//nhop + 1)
118 | #     fft_data = np.zeros((nframes, nfft))
119 | 
120 | #     for i in xrange(nframes):
121 | #         sup = i*nhop + np.arange(nfft)
122 | #         fft_data[i,:] = x[sup] * window
123 | 
124 | #     fft_data = np.fft.fft(fft_data)
125 | #     return tuple((np.array(np.abs(fft_data), dtype=np.float32), np.array(np.angle(fft_data), dtype=np.float32)))
126 | 
127 | # def overlap_add(X, nfft=1024, nhop=512):
128 | 
129 | #     window = winfunc(nfft) # must use same window as compute_fft
130 | #     L = X.shape[0]*nhop + (nfft-nhop)
131 | #     x = np.zeros(L)
132 | #     win_sum = np.zeros(L)
133 | 
134 | #     for i, frame in enumerate(X):
135 | #         sup = i*nhop + np.arange(nfft)
136 | #         x[sup] += np.real(np.fft.ifft(frame)) * window
137 | #         win_sum[sup] += window **2 # ensure perfect reconstruction
138 | 
139 | #     return x/(win_sum + 1e-12)
140 | 
141 | # def griffin_lim_proj(Mag, Phs=None, its=4, nfft=1024, nhop=512):
142 | #     if Phs is None:
143 | #         Phs = np.pi * np.random.randn(*Mag.shape)
144 | 
145 | #     x = overlap_add(Mag * np.exp(1j*Phs), nfft, nhop)
146 | #     for i in xrange(its):
147 | #         _, Phs = compute_fft(x, nfft, nhop)
148 | #         x = overlap_add(Mag * np.exp(1j*Phs), nfft, nhop)
149 | 
150 | #     Mag, Phs = compute_fft(x, nfft, nhop)
151 | #     return np.array(Mag[:,:nfft//2+1], dtype=np.float32), Phs
152 | 
153 | 
154 | # def aggregate_features(model, X, which_layers=[3], win_size=200, step=100):
155 | #     assert np.max(which_layers) < len(model.layers)
156 | 
157 | #     n_classes, n_examples = model.get_output_space().dim, X.shape[0]
158 | #     in_batch = model.get_input_space().make_theano_batch()
159 | #     fprop = theano.function([in_batch], model.fprop(in_batch, return_all=True))
160 | #     output_data = fprop(X)
161 | #     feats = np.hstack([output_data[i] for i in which_layers])
162 | 
163 | #     agg_feat = []
164 | #     for i in xrange(0, feats.shape[0]-win_size, step):
165 | #         chunk = feats[i:i+win_size,:]
166 | #         agg_feat.append(np.hstack((np.mean(chunk, axis=0), np.std(chunk, axis=0))))
167 | 
168 | #     return np.vstack(agg_feat)
169 | 
170 | def stripf(f):
171 |     fname = os.path.split(f)[-1]
172 |     return os.path.splitext(fname)[0]
173 | 
174 | if __name__ == '__main__':
175 | 
176 |     parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter,
177 |         description='Search for adversarial examples, lowpass-filter them, and re-classify with the DNN and an auxiliary classifier')
178 |     parser.add_argument('--dnn_model', help='dnn model to use for features')
179 |     parser.add_argument('--aux_model', help='(auxiliary) model trained on dnn features')
180 |     parser.add_argument('--in_path', help='path prefix of .wav files to test the model on (globbed as <in_path>*.wav)')
181 |     parser.add_argument('--out_path', help='location for saving adversary (name automatically generated)')
182 | 
183 |     args = parser.parse_args()
184 | 
185 |     # tunable alg. parameters
186 |     snr = 15.
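    # (snr sets the minimum allowable SNR, in dB, of the perturbation; mu is the
    # gradient step size; stop_thresh the target confidence for the adversarial
    # label; maxits the iteration cap; cut_freq the lowpass cutoff in Hz)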
187 | mu = .05 188 | stop_thresh = .9 189 | maxits = 100 190 | cut_freq = 9000. 191 | 192 | label_list = ['blues', 'classical', 'country', 'disco', 'hiphop', 'jazz', 'metal', 'pop', 'reggae', 'rock'] 193 | targets = range(len(label_list)) 194 | 195 | # load dnn model, fprop function 196 | dnn_model = serial.load(args.dnn_model) 197 | input_space = dnn_model.get_input_space() 198 | batch = input_space.make_theano_batch() 199 | fprop = theano.function([batch], dnn_model.fprop(batch)) 200 | 201 | # load aux model 202 | aux_model = joblib.load(args.aux_model) 203 | L = os.path.splitext(os.path.split(args.aux_model)[-1])[0].split('_L')[-1] 204 | if L=='All': 205 | which_layers = [1,2,3] 206 | else: 207 | which_layers = [int(L)] 208 | 209 | # fft params 210 | nfft = 2*(input_space.dim-1) 211 | nhop = nfft//2 212 | win = winfunc(1024) 213 | 214 | # design lowpass filter. 215 | b,a = sp.signal.butter(4, cut_freq/(22050./2.)) 216 | 217 | flist = glob.glob(args.in_path +'*.wav') 218 | 219 | dnn_file = open(os.path.join(args.out_path, stripf(args.dnn_model) + '.adversaries.txt'), 'w') 220 | dnn_file_filt = open(os.path.join(args.out_path, stripf(args.dnn_model) + '.adversaries.filtered.txt'), 'w') 221 | aux_file = open(os.path.join(args.out_path, stripf(args.aux_model) + '.adversaries.txt'), 'w') 222 | aux_file_filt = open(os.path.join(args.out_path, stripf(args.aux_model) + '.adversaries.filtered.txt'), 'w') 223 | 224 | for f in flist: 225 | fname = stripf(f) 226 | 227 | # load audio file 228 | x, fs, fmt = audiolab.wavread(f) 229 | 230 | # make sure format agrees with training data 231 | if len(x.shape)!=1: 232 | print 'making mono:' 233 | x = np.sum(x, axis=1)/2. # mono 234 | if fs != 22050: 235 | print 'resampling to 22050 hz:' 236 | x = samplerate.resample(x, 22050./fs, 'sinc_best') 237 | fs = 22050 238 | 239 | # truncate input to multiple of hopsize 240 | nframes = (len(x)-nfft)/nhop 241 | x = x[:(nframes-1)*nhop + nfft] 242 | 243 | # smooth boundaries to prevent a click 244 | x[:512] *= win[:512] 245 | x[-512:] *= win[512:] 246 | 247 | # compute mag. spectra 248 | Mag, Phs = compute_fft(x, nfft, nhop) 249 | X0 = Mag[:,:input_space.dim] 250 | 251 | epsilon = np.linalg.norm(X0)/X0.shape[0]/10**(snr/20.) 
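        # per-frame perturbation budget implied by the target SNR:
        # 20*log10(||X0|| / ||X-X0||) >= snr  <=>  ||X-X0|| <= ||X0|| * 10**(-snr/20)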
252 | 253 | # write file name 254 | dnn_file.write('{}\t'.format(fname)) 255 | dnn_file_filt.write('{}\t'.format(fname)) 256 | aux_file.write('{}\t'.format(fname)) 257 | aux_file_filt.write('{}\t'.format(fname)) 258 | 259 | for t in targets: 260 | 261 | # search for adversary 262 | X_adv, P_adv = find_adversary( 263 | model=dnn_model, 264 | X0=X0, 265 | label=t, 266 | P0=Phs, 267 | mu=mu, 268 | epsilon=epsilon, 269 | maxits=maxits, 270 | stop_thresh=stop_thresh, 271 | griffin_lim=True) 272 | 273 | # get time-domain representation 274 | x_adv = overlap_add( np.hstack((X_adv, X_adv[:,-2:-nfft/2-1:-1])) * np.exp(1j*P_adv)) 275 | out_snr = 20*np.log10(np.linalg.norm(x[nfft:-nfft]) / np.linalg.norm(x[nfft:-nfft]-x_adv[nfft:-nfft])) 276 | 277 | # BEFORE FILTERING 278 | # =========================================== 279 | # dnn prediction 280 | pred = np.argmax(np.sum(fprop(X_adv), axis=0)) 281 | if pred == t: 282 | dnn_file.write('{}\t'.format(int(out_snr+.5))) 283 | else: 284 | dnn_file.write('{}\t'.format('na')) 285 | 286 | # aux prediction 287 | X_adv_agg = aggregate_features(dnn_model, X_adv, which_layers) 288 | pred = np.argmax(np.bincount(np.array(aux_model.predict(X_adv_agg), dtype='int'))) 289 | if pred == t: 290 | aux_file.write('{}\t'.format(int(out_snr+.5))) 291 | else: 292 | aux_file.write('{}\t'.format('na')) 293 | 294 | # filtered representation 295 | x_filt = sp.signal.lfilter(b,a,x_adv) 296 | Mag2, Phs2 = compute_fft(x_filt, nfft, nhop) 297 | X_adv_filt = Mag2[:,:input_space.dim] 298 | 299 | # AFTER FILTERING 300 | # ================================================== 301 | # dnn prediction 302 | pred = np.argmax(np.sum(fprop(X_adv_filt), axis=0)) 303 | if pred == t: 304 | dnn_file_filt.write('{}\t'.format('x')) 305 | else: 306 | dnn_file_filt.write('{}\t'.format('o')) 307 | 308 | # aux prediction 309 | X_adv_agg_filt = aggregate_features(dnn_model, X_adv_filt, which_layers) 310 | pred = np.argmax(np.bincount(np.array(aux_model.predict(X_adv_agg_filt), dtype='int'))) 311 | if pred == t: 312 | aux_file_filt.write('{}\t'.format('x')) 313 | else: 314 | aux_file_filt.write('{}\t'.format('o')) 315 | 316 | # SAVE ADVERSARY FILES 317 | out_file = os.path.join(args.out_path, 318 | '{fname}.{label}.adversary.{snr}dB.wav'.format( 319 | fname=fname, 320 | label=label_list[t], 321 | snr=int(out_snr+.5))) 322 | audiolab.wavwrite(x_adv, out_file, fs, fmt) 323 | 324 | out_file2 = os.path.join(args.out_path, 325 | '{fname}.{label}.adversary.filtered.wav'.format( 326 | fname=fname, 327 | label=label_list[t])) 328 | audiolab.wavwrite(x_filt, out_file2, fs, fmt) 329 | 330 | dnn_file.write('\n'.format(fname)) 331 | dnn_file_filt.write('\n'.format(fname)) 332 | aux_file.write('\n'.format(fname)) 333 | aux_file_filt.write('\n'.format(fname)) 334 | 335 | dnn_file.close() 336 | dnn_file_filt.close() 337 | aux_file.close() 338 | aux_file_filt.close() 339 | -------------------------------------------------------------------------------- /yaml_scripts/mlp_rlu.yaml: -------------------------------------------------------------------------------- 1 | !obj:pylearn2.train.Train { 2 | dataset : &trainset !obj:audio_dataset.AudioDataset { 3 | which_set : 'train', 4 | config : &fold !pkl: "%(fold_config)s" 5 | }, 6 | model : !obj:pylearn2.models.mlp.MLP { 7 | nvis : 513, 8 | layers : [ 9 | !obj:audio_dataset.PreprocLayer { 10 | config : *fold, 11 | proc_type : 'standardize' 12 | }, 13 | !obj:pylearn2.models.mlp.RectifiedLinear { 14 | layer_name : 'h0', 15 | dim : %(dim_h0)i, 16 | irange : &irange .1 17 | }, 18 | 
!obj:pylearn2.models.mlp.RectifiedLinear { 19 | layer_name : 'h1', 20 | dim : %(dim_h1)i, 21 | irange : *irange 22 | }, 23 | !obj:pylearn2.models.mlp.RectifiedLinear { 24 | layer_name : 'h2', 25 | dim : %(dim_h2)i, 26 | irange : *irange 27 | }, 28 | !obj:pylearn2.models.mlp.Softmax { 29 | n_classes : 10, 30 | layer_name : 'y', 31 | irange : *irange 32 | } 33 | ] 34 | }, 35 | algorithm : !obj:pylearn2.training_algorithms.sgd.SGD { 36 | learning_rate : .01, 37 | learning_rule : !obj:pylearn2.training_algorithms.learning_rule.Momentum { 38 | init_momentum : 0.5 39 | }, 40 | train_iteration_mode : 'shuffled_sequential', #'batchwise_shuffled_sequential', 41 | monitor_iteration_mode : 'shuffled_sequential', #'batchwise_shuffled_sequential', 42 | #batches_per_iter : 500, 43 | batch_size : 1200, 44 | monitoring_dataset : { 45 | 'train' : *trainset, 46 | 'valid' : !obj:audio_dataset.AudioDataset { 47 | which_set : 'valid', 48 | config : *fold 49 | } 50 | }, 51 | termination_criterion: !obj:pylearn2.termination_criteria.MonitorBased { 52 | channel_name : 'valid_y_misclass', 53 | prop_decrease : .001, 54 | N: 10 55 | }#,cost : !obj:pylearn2.costs.mlp.Default {} 56 | }, 57 | extensions: [ 58 | !obj:pylearn2.train_extensions.best_params.MonitorBasedSaveBest { 59 | channel_name: 'valid_y_misclass', 60 | save_path: "%(best_model_save_path)s" 61 | }, 62 | !obj:pylearn2.training_algorithms.learning_rule.MomentumAdjustor { 63 | start: 1, 64 | saturate: 50, 65 | final_momentum: .9 66 | }, 67 | !obj:pylearn2.training_algorithms.sgd.LinearDecayOverEpoch { 68 | start: 1, 69 | saturate: 50, 70 | decay_factor: .01 71 | }, 72 | ], 73 | save_path : "%(save_path)s", 74 | save_freq : 1 75 | } -------------------------------------------------------------------------------- /yaml_scripts/mlp_rlu_conv2.yaml: -------------------------------------------------------------------------------- 1 | !obj:pylearn2.train.Train { 2 | dataset : &trainset !obj:audio_dataset.AudioDataset { 3 | which_set : 'train', 4 | config : &fold !pkl: "%(fold_config)s" 5 | }, 6 | model : !obj:pylearn2.models.mlp.MLP { 7 | batch_size : 5, 8 | input_space: !obj:pylearn2.space.Conv2DSpace { 9 | shape: [100, 513], 10 | num_channels: 1 11 | }, 12 | layers : [ 13 | !obj:pylearn2.models.mlp.ConvRectifiedLinear { 14 | layer_name : 'h0', 15 | output_channels : 32, 16 | kernel_shape : [4, 400], 17 | pool_shape : [4, 4], 18 | pool_stride : [2, 2], 19 | irange : &irange .01 20 | }, 21 | !obj:pylearn2.models.mlp.ConvRectifiedLinear { 22 | layer_name : 'h1', 23 | output_channels : 32, 24 | kernel_shape : [8, 8], 25 | pool_shape : [4, 4], 26 | pool_stride : [2, 2], 27 | irange : *irange 28 | }, 29 | !obj:pylearn2.models.mlp.RectifiedLinear { 30 | layer_name : 'h2', 31 | dim : 50, 32 | irange : *irange 33 | }, 34 | !obj:pylearn2.models.mlp.Softmax { 35 | n_classes : 10, 36 | layer_name : 'y', 37 | irange : *irange 38 | } 39 | ] 40 | }, 41 | algorithm : !obj:pylearn2.training_algorithms.sgd.SGD { 42 | learning_rate : .001, 43 | learning_rule : !obj:pylearn2.training_algorithms.learning_rule.Momentum { 44 | init_momentum : 0.5 45 | }, 46 | train_iteration_mode : 'even_shuffled_sequential', #'batchwise_shuffled_sequential', 47 | monitor_iteration_mode : 'even_shuffled_sequential', #'batchwise_shuffled_sequential', 48 | #batches_per_iter : 500, 49 | #batch_size : 1200, 50 | monitoring_dataset : { 51 | 'train' : *trainset, 52 | 'valid' : !obj:audio_dataset.AudioDataset { 53 | which_set : 'valid', 54 | config : *fold 55 | } 56 | }, 57 | termination_criterion: 
!obj:pylearn2.termination_criteria.And { 58 | criteria: [ 59 | !obj:pylearn2.termination_criteria.MonitorBased { 60 | channel_name: "valid_y_misclass", 61 | prop_decrease: 0.001, 62 | N: 10 63 | }, 64 | !obj:pylearn2.termination_criteria.EpochCounter { 65 | max_epochs: 30 66 | } 67 | ] 68 | }, 69 | 70 | }, 71 | extensions: [ 72 | !obj:pylearn2.train_extensions.best_params.MonitorBasedSaveBest { 73 | channel_name: 'valid_y_misclass', 74 | save_path: "%(best_model_save_path)s" 75 | }, 76 | !obj:pylearn2.training_algorithms.learning_rule.MomentumAdjustor { 77 | start: 1, 78 | saturate: 100, 79 | final_momentum: .9 80 | }, 81 | !obj:pylearn2.training_algorithms.sgd.LinearDecayOverEpoch { 82 | start: 1, 83 | saturate: 100, 84 | decay_factor: .01 85 | }, 86 | ], 87 | save_path : "%(save_path)s", 88 | save_freq : 1 89 | } -------------------------------------------------------------------------------- /yaml_scripts/mlp_rlu_dropout.yaml: -------------------------------------------------------------------------------- 1 | !obj:pylearn2.train.Train { 2 | dataset : &trainset !obj:audio_dataset.AudioDataset { 3 | which_set : 'train', 4 | config : &fold !pkl: "%(fold_config)s" 5 | }, 6 | model : !obj:pylearn2.models.mlp.MLP { 7 | nvis : 513, 8 | layers : [ 9 | !obj:audio_dataset.PreprocLayer { 10 | config : *fold, 11 | proc_type : 'standardize' 12 | }, 13 | !obj:pylearn2.models.mlp.RectifiedLinear { 14 | layer_name : 'h0', 15 | dim : %(dim_h0)i, 16 | irange : &irange .1 17 | }, 18 | !obj:pylearn2.models.mlp.RectifiedLinear { 19 | layer_name : 'h1', 20 | dim : %(dim_h1)i, 21 | irange : *irange 22 | }, 23 | !obj:pylearn2.models.mlp.RectifiedLinear { 24 | layer_name : 'h2', 25 | dim : %(dim_h2)i, 26 | irange : *irange 27 | }, 28 | !obj:pylearn2.models.mlp.Softmax { 29 | n_classes : 10, 30 | layer_name : 'y', 31 | irange : *irange 32 | } 33 | ] 34 | }, 35 | algorithm : !obj:pylearn2.training_algorithms.sgd.SGD { 36 | learning_rate : .1, 37 | learning_rule : !obj:pylearn2.training_algorithms.learning_rule.Momentum { 38 | init_momentum : 0.5 39 | }, 40 | train_iteration_mode : 'shuffled_sequential', #'batchwise_shuffled_sequential', 41 | monitor_iteration_mode : 'shuffled_sequential', #'batchwise_shuffled_sequential', 42 | #batches_per_iter : 500, 43 | batch_size : 1200, 44 | monitoring_dataset : { 45 | 'train' : *trainset, 46 | 'valid' : !obj:audio_dataset.AudioDataset { 47 | which_set : 'valid', 48 | config : *fold 49 | } 50 | }, 51 | termination_criterion: !obj:pylearn2.termination_criteria.MonitorBased { 52 | channel_name : 'valid_y_misclass', 53 | prop_decrease : .001, 54 | N: 50 55 | }, 56 | cost: !obj:pylearn2.costs.mlp.dropout.Dropout { 57 | default_input_include_prob : .75, 58 | input_include_probs: { 'pre': 1., 'h0': 1. }, 59 | input_scales: { 'pre': 1., 'h0' : 1. } 60 | } 61 | }, 62 | extensions: [ 63 | !obj:pylearn2.train_extensions.best_params.MonitorBasedSaveBest { 64 | channel_name: 'valid_y_misclass', 65 | save_path: "%(best_model_save_path)s" 66 | }, 67 | !obj:pylearn2.training_algorithms.learning_rule.MomentumAdjustor { 68 | start: 1, 69 | saturate: 200, 70 | final_momentum: .9 71 | }, 72 | !obj:pylearn2.training_algorithms.sgd.LinearDecayOverEpoch { 73 | start: 1, 74 | saturate: 200, 75 | decay_factor: .01 76 | } 77 | ], 78 | save_path : "%(save_path)s", 79 | save_freq : 1 80 | } --------------------------------------------------------------------------------
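
The yaml_scripts files above are templates: their %(...)s / %(...)i placeholders are filled in before parsing. A minimal sketch of that workflow, assuming hypothetical paths and layer widths (train_mlp_script.py presumably performs the equivalent steps in this repository):

```python
# Hypothetical sketch: fill a yaml_scripts template and run pylearn2 training.
import pylearn2.config.yaml_parse as yaml_parse

params = {
    'fold_config': '/path/to/partition_configuration.pkl',  # hypothetical path
    'dim_h0': 50, 'dim_h1': 50, 'dim_h2': 50,               # hidden-layer widths
    'save_path': '/path/to/model_file.pkl',                 # hypothetical path
    'best_model_save_path': '/path/to/model_file_best.pkl', # hypothetical path
}

with open('yaml_scripts/mlp_rlu.yaml') as f:
    train = yaml_parse.load(f.read() % params)  # %-substitution, then YAML parse
train.main_loop()
```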