├── .gitignore
├── EULA.pdf
├── README.html
├── README.md
├── requirements.txt
├── src
│   ├── __init__.py
│   ├── dataset.py
│   ├── evaluation.py
│   ├── features.py
│   ├── files.py
│   ├── general.py
│   ├── sound_event_detection.py
│   └── ui.py
├── task1_scene_classification.py
├── task1_scene_classification.yaml
├── task3_sound_event_detection_in_real_life_audio.py
└── task3_sound_event_detection_in_real_life_audio.yaml

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 | 
6 | # C extensions
7 | *.so
8 | 
9 | # Distribution / packaging
10 | .Python
11 | env/
12 | build/
13 | develop-eggs/
14 | dist/
15 | downloads/
16 | eggs/
17 | .eggs/
18 | lib/
19 | lib64/
20 | parts/
21 | sdist/
22 | var/
23 | *.egg-info/
24 | .installed.cfg
25 | *.egg
26 | 
27 | # PyInstaller
28 | # Usually these files are written by a python script from a template
29 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
30 | *.manifest
31 | *.spec
32 | 
33 | # Installer logs
34 | pip-log.txt
35 | pip-delete-this-directory.txt
36 | 
37 | # Unit test / coverage reports
38 | htmlcov/
39 | .tox/
40 | .coverage
41 | .coverage.*
42 | .cache
43 | nosetests.xml
44 | coverage.xml
45 | *,cover
46 | .hypothesis/
47 | 
48 | # Translations
49 | *.mo
50 | *.pot
51 | 
52 | # Django stuff:
53 | *.log
54 | 
55 | # Sphinx documentation
56 | docs/_build/
57 | 
58 | # PyBuilder
59 | target/
60 | 
61 | # IPython Notebook
62 | .ipynb_checkpoints
63 | 
64 | data/
65 | system/
66 | .idea/
--------------------------------------------------------------------------------
/EULA.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TUT-ARG/DCASE2016-baseline-system-python/8e311066e3b670c52f4fcfe2a7060c18c9969cf8/EULA.pdf
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | DCASE2016 Baseline system
2 | =========================
3 | [Audio Research Group / Tampere University of Technology](http://arg.cs.tut.fi/)
4 | 
5 | *Python implementation*
6 | 
7 | Systems:
8 | - Task 1 - Acoustic scene classification
9 | - Task 3 - Sound event detection in real life audio
10 | 
11 | Authors:
12 | - Toni Heittola (, )
13 | - Annamaria Mesaros (, )
14 | - Tuomas Virtanen (, )
15 | 
16 | Table of Contents
17 | =================
18 | 1. [Introduction](#1-introduction)
19 | 2. [Installation](#2-installation)
20 | 3. [Usage](#3-usage)
21 | 4. [System blocks](#4-system-blocks)
22 | 5. [System evaluation](#5-system-evaluation)
23 | 6. [System parameters](#6-system-parameters)
24 | 7. [Changelog](#7-changelog)
25 | 8. [License](#8-license)
26 | 
27 | 1. Introduction
28 | ===============
29 | This document describes the Python implementation of the baseline systems for the [Detection and Classification of Acoustic Scenes and Events 2016 (DCASE2016) challenge](http://www.cs.tut.fi/sgn/arg/dcase2016/) **[task 1](#11-acoustic-scene-classification)** and **[task 3](#12-sound-event-detection)**. The challenge consists of four tasks:
30 | 
31 | 1. [Acoustic scene classification](http://www.cs.tut.fi/sgn/arg/dcase2016/task-acoustic-scene-classification)
32 | 2. [Sound event detection in synthetic audio](http://www.cs.tut.fi/sgn/arg/dcase2016/task-sound-event-detection-in-synthetic-audio)
33 | 3. 
[Sound event detection in real life audio](http://www.cs.tut.fi/sgn/arg/dcase2016/task-sound-event-detection-in-real-life-audio)
34 | 4. [Domestic audio tagging](http://www.cs.tut.fi/sgn/arg/dcase2016/task-audio-tagging)
35 | 
36 | The baseline systems for tasks 1 and 3 share the same basic approach: [MFCC](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum) based acoustic features and a [GMM](https://en.wikipedia.org/wiki/Mixture_model) based classifier. The main motivation for having a similar approach in both tasks was to provide a low entry barrier and to allow easy switching between the tasks.
37 | 
38 | The dataset handling is hidden behind a dataset access class, which should help DCASE challenge participants implement their own systems.
39 | 
40 | The [Matlab implementation](https://github.com/TUT-ARG/DCASE2016-baseline-system-matlab) is also available.
41 | 
42 | #### 1.1. Acoustic scene classification
43 | 
44 | The acoustic features include MFCC static coefficients (with the 0th coefficient), delta coefficients and acceleration coefficients. The system learns one acoustic model per acoustic scene class, and performs the classification with a maximum likelihood classification scheme.
45 | 
46 | #### 1.2. Sound event detection
47 | 
48 | The acoustic features include MFCC static coefficients (0th coefficient omitted), delta coefficients and acceleration coefficients. The system has a binary classifier for each included sound event class. For each classifier, two acoustic models are trained from the mixture signals: one with positive examples (target sound event active) and one with negative examples (target sound event non-active). The classification is done between these two models as a likelihood ratio. Post-processing is applied to get the sound event detection output.
49 | 
50 | 2. Installation
51 | ===============
52 | 
53 | The systems are developed for [Python 2.7.0](https://www.python.org/). Currently, the baseline system is tested only on the Linux operating system.
54 | 
55 | Run the following to ensure that all external modules are installed:
56 | 
57 |     pip install -r requirements.txt
58 | 
59 | **External modules required**
60 | 
61 | [*numpy*](http://www.numpy.org/), [*scipy*](http://www.scipy.org/), [*scikit-learn*](http://scikit-learn.org/)
62 | `pip install numpy scipy scikit-learn`
63 | 
64 | Scikit-learn (version >= 0.16) is required for the machine learning implementations.
65 | 
66 | [*PyYAML*](http://pyyaml.org/)
67 | `pip install pyyaml`
68 | 
69 | PyYAML is required for handling the configuration files.
70 | 
71 | [*librosa*](https://github.com/bmcfee/librosa)
72 | `pip install librosa`
73 | 
74 | Librosa is required for the feature extraction.
75 | 
76 | 3. Usage
77 | ========
78 | 
79 | For each task there is a separate executable (.py file):
80 | 
81 | 1. *task1_scene_classification.py*, Acoustic scene classification
82 | 3. *task3_sound_event_detection_in_real_life_audio.py*, Real life audio sound event detection
83 | 
84 | Each system has two operating modes: **Development mode** and **Challenge mode**.
85 | 
86 | All the usage parameters are shown by `python task1_scene_classification.py -h` and `python task3_sound_event_detection_in_real_life_audio.py -h`.
87 | 
88 | The system parameters are defined in `task1_scene_classification.yaml` and `task3_sound_event_detection_in_real_life_audio.yaml`.
89 | 
90 | With default parameter settings, the system will download the needed dataset from the Internet and extract it under the directory `data` (the storage path is controlled with the parameter `path->data`). 
91 | 
92 | #### Development mode
93 | 
94 | In this mode, the system is trained and evaluated with the development dataset. This is the default operating mode.
95 | 
96 | To run the system in this mode:
97 | `python task1_scene_classification.py`
98 | or `python task1_scene_classification.py -development`.
99 | 
100 | #### Challenge mode
101 | 
102 | In this mode, the system is trained with the provided development dataset and the evaluation dataset is run through the developed system. Output files are generated in the correct format for the challenge submission. The system output is saved in the path specified with the parameter `path->challenge_results`.
103 | 
104 | To run the system in this mode:
105 | `python task1_scene_classification.py -challenge`.
106 | 
107 | 
108 | 4. System blocks
109 | ================
110 | 
111 | The system implements the following blocks:
112 | 
113 | 1. Dataset initialization
114 |   - Downloads the dataset from the Internet if needed
115 |   - Extracts the dataset package if needed
116 |   - Makes sure that the meta files are appropriately formatted
117 | 
118 | 2. Feature extraction (`do_feature_extraction`)
119 |   - Goes through all the training material and extracts the acoustic features
120 |   - Features are stored file-by-file on the local disk (pickle files)
121 | 
122 | 3. Feature normalization (`do_feature_normalization`)
123 |   - Goes through the training material in the evaluation folds, and calculates the global mean and standard deviation of the data
124 |   - Stores the normalization factors (pickle files)
125 | 
126 | 4. System training (`do_system_training`)
127 |   - Trains the system
128 |   - Stores the trained models and feature normalization factors together on the local disk (pickle files)
129 | 
130 | 5. System testing (`do_system_testing`)
131 |   - Goes through the testing material and does the classification / detection
132 |   - Stores the results (text files)
133 | 
134 | 6. System evaluation (`do_system_evaluation`)
135 |   - Reads the ground truth and the output of the system and calculates evaluation metrics
136 | 
137 | 5. System evaluation
138 | ====================
139 | 
140 | ## Task 1 - Acoustic scene classification
141 | 
142 | ### Metrics
143 | 
144 | The scoring of acoustic scene classification will be based on classification accuracy: the number of correctly classified segments among the total number of segments. Each segment is considered an independent test sample. A minimal sketch of this computation is shown after the evaluation setup below.
145 | 
146 | ### Results
147 | 
148 | ##### TUT Acoustic scenes 2016, development set
149 | 
150 | [Dataset](https://zenodo.org/record/45739)
151 | 
152 | *Evaluation setup*
153 | 
154 | - 4 cross-validation folds, average classification accuracy over folds
155 | - 15 acoustic scene classes
156 | - Classification unit: one file (30 seconds of audio). 
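As a minimal sketch of the fold-averaged accuracy described under Metrics above (equivalent in spirit to the confusion-matrix computation in `DCASE2016_SceneClassification_Metrics` in `src/evaluation.py`; the fold results below are illustrative, not actual system output):

    import numpy

    def class_wise_accuracy(y_true, y_pred, labels):
        """Per-class accuracy: correctly classified / reference count per scene label."""
        correct = {label: 0 for label in labels}
        total = {label: 0 for label in labels}
        for truth, prediction in zip(y_true, y_pred):
            total[truth] += 1
            if truth == prediction:
                correct[truth] += 1
        return [correct[l] / (float(total[l]) + numpy.spacing(1)) for l in labels]

    # Illustrative only: two folds with three test segments each
    labels = ['home', 'office']
    fold_results = [
        (['home', 'home', 'office'], ['home', 'office', 'office']),  # fold 1
        (['office', 'home', 'office'], ['office', 'home', 'home']),  # fold 2
    ]
    per_fold = numpy.array([class_wise_accuracy(t, p, labels) for t, p in fold_results])
    print('Overall accuracy: %.1f %%' % (100 * per_fold.mean()))  # mean over classes and folds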
157 | 
158 | *System parameters*
159 | 
160 | - Frame size: 40 ms (with 50% hop size)
161 | - Number of Gaussians per acoustic scene class model: 16
162 | - Feature vector: 20 MFCC static coefficients (including 0th) + 20 delta MFCC coefficients + 20 acceleration MFCC coefficients = 60 values
163 | - Trained and tested on full audio
164 | 
165 | | Scene                | Accuracy   |
166 | |----------------------|------------|
167 | | Beach                | 63.3 %     |
168 | | Bus                  | 79.6 %     |
169 | | Cafe/restaurant      | 83.2 %     |
170 | | Car                  | 87.2 %     |
171 | | City center          | 85.5 %     |
172 | | Forest path          | 81.0 %     |
173 | | Grocery store        | 65.0 %     |
174 | | Home                 | 82.1 %     |
175 | | Library              | 50.4 %     |
176 | | Metro station        | 94.7 %     |
177 | | Office               | 98.6 %     |
178 | | Park                 | 13.9 %     |
179 | | Residential area     | 77.7 %     |
180 | | Train                | 33.6 %     |
181 | | Tram                 | 85.4 %     |
182 | | **Overall accuracy** | **72.5 %** |
183 | 
184 | ##### DCASE 2013 Scene classification, development set
185 | 
186 | [Dataset](http://c4dm.eecs.qmul.ac.uk/rdr/handle/123456789/29)
187 | 
188 | *Evaluation setup*
189 | 
190 | - 5-fold average
191 | - 10 acoustic scene classes
192 | - Classification unit: one file (30 seconds of audio).
193 | 
194 | *System parameters*
195 | 
196 | - Frame size: 40 ms (with 50% hop size)
197 | - Number of Gaussians per acoustic scene class model: 16
198 | - Feature vector: 20 MFCC static coefficients (including 0th) + 20 delta MFCC coefficients + 20 acceleration MFCC coefficients = 60 values
199 | 
200 | | Scene                | Accuracy   |
201 | |----------------------|------------|
202 | | Bus                  | 93.3 %     |
203 | | Busy street          | 80.0 %     |
204 | | Office               | 86.7 %     |
205 | | Open air market      | 73.3 %     |
206 | | Park                 | 26.7 %     |
207 | | Quiet street         | 53.3 %     |
208 | | Restaurant           | 40.0 %     |
209 | | Supermarket          | 26.7 %     |
210 | | Tube                 | 66.7 %     |
211 | | Tube station         | 53.3 %     |
212 | | **Overall accuracy** | **60.0 %** |
213 | 
214 | 
215 | ## Task 3 - Real life audio sound event detection
216 | 
217 | ### Metrics
218 | 
219 | **Segment-based metrics**
220 | 
221 | Segment-based evaluation is done on a fixed time grid, using segments of one second length to compare the ground truth and the system output.
222 | 
223 | - **Total error rate (ER)** is the main metric for this task. The error rate, as defined in [Poliner2007](https://www.ee.columbia.edu/~dpwe/pubs/PoliE06-piano.pdf), is evaluated in one-second segments over the entire test set.
224 | 
225 | - **F-score** is calculated over all test data based on the total number of false positives, false negatives and true positives.
226 | 
227 | **Event-based metrics**
228 | 
229 | Event-based evaluation considers true positives, false positives and false negatives with respect to event instances.
230 | 
231 | **Definition**: An event in the system output is considered correctly detected if its temporal position overlaps with the temporal position of an event with the same label in the ground truth. A tolerance is allowed for the onset and offset (200 ms for the onset, and 200 ms or half of the event length for the offset).
232 | 
233 | - **Error rate** is calculated as described in [Poliner2007](https://www.ee.columbia.edu/~dpwe/pubs/PoliE06-piano.pdf) over all test data, based on the total number of insertions, deletions and substitutions.
234 | 
235 | - **F-score** is calculated over all test data based on the total number of false positives, false negatives and true positives.
236 | 
237 | A detailed description of the metrics can be found on the [DCASE2016 website](http://www.cs.tut.fi/sgn/arg/dcase2016/sound-event-detection-metrics). The segment-based error rate computation is sketched below. 
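As a minimal sketch of the segment-based error rate (following the per-segment computation in `DCASE2016_EventDetection_SegmentBasedMetrics` in `src/evaluation.py`; the activity rolls below are illustrative):

    import numpy

    def segment_based_error_rate(reference_roll, system_roll):
        """Total ER from binary activity rolls of shape (n_segments, n_classes)."""
        total_error = 0.0
        total_ref = 0.0
        for ref_seg, sys_seg in zip(reference_roll, system_roll):
            Ntp = sum(ref_seg + sys_seg > 1)         # classes active in both
            Nref = sum(ref_seg)
            Nsys = sum(sys_seg)
            total_error += min(Nref, Nsys) - Ntp     # substitutions
            total_error += max(0, Nref - Nsys)       # deletions
            total_error += max(0, Nsys - Nref)       # insertions
            total_ref += Nref
        return total_error / (total_ref + numpy.spacing(1))

    # Illustrative only: three one-second segments, two event classes
    reference = numpy.array([[1, 0], [1, 1], [0, 1]])
    system    = numpy.array([[1, 1], [0, 1], [0, 0]])
    print('ER: %.2f' % segment_based_error_rate(reference, system))  # ER: 0.75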
238 | 239 | ### Results 240 | 241 | ##### TUT Sound events 2016, development set 242 | 243 | [Dataset](https://zenodo.org/record/45759) 244 | 245 | *Evaluation setup* 246 | 247 | - 4 cross-validation folds 248 | 249 | *System parameters* 250 | 251 | - Frame size: 40 ms (with 50% hop size) 252 | - Number of Gaussians per sound event model (positive and negative): 16 253 | - Feature vector: 20 MFCC static coefficients (excluding 0th) + 20 delta MFCC coefficients + 20 acceleration MFCC coefficients = 60 values 254 | - Decision_threshold: 140 255 | 256 | *Segment based metrics - overall* 257 | 258 | | Scene | ER | ER / S | ER / D | ER / I | F1 | 259 | |-----------------------|-------------|-------------|-------------|-------------|-------------| 260 | | Home | 0.96 | 0.08 | 0.82 | 0.06 | 15.9 % | 261 | | Residential area | 0.86 | 0.05 | 0.74 | 0.07 | 31.5 % | 262 | | **Average** | **0.91** | | | | **23.7 %** | 263 | 264 | *Segment based metrics - class-wise* 265 | 266 | | Scene | ER | F1 | 267 | |-----------------------|-------------|-------------| 268 | | Home | 1.06 | 9.2 % | 269 | | Residential area | 1.03 | 17.6 % | 270 | | **Average** | **1.04** | **13.4 %** | 271 | 272 | *Event based metrics (onset-only) - overall* 273 | 274 | | Scene | ER | F1 | 275 | |-----------------------|-------------|-------------| 276 | | Home | 1.28 | 4.7 % | 277 | | Residential area | 1.92 | 2.9 % | 278 | | **Average** | **1.60** | **3.8 %** | 279 | 280 | *Event based metrics (onset-only) - class-wise* 281 | 282 | | Scene | ER | F1 | 283 | |-----------------------|-------------|-------------| 284 | | Home | 1.27 | 4.3 % | 285 | | Residential area | 1.97 | 1.5 % | 286 | | **Average** | **1.62** | **2.9 %** | 287 | 288 | 289 | 6. System parameters 290 | ==================== 291 | All the parameters are set in `task1_scene_classification.yaml`, and `task3_sound_event_detection_in_real_life_audio.yaml`. 292 | 293 | **Controlling the system flow** 294 | 295 | The blocks of the system can be controlled through the configuration file. Usually all of them can be kept on. 296 | 297 | flow: 298 | initialize: true 299 | extract_features: true 300 | feature_normalizer: true 301 | train_system: true 302 | test_system: true 303 | evaluate_system: true 304 | 305 | **General parameters** 306 | 307 | The selection of used dataset. 308 | 309 | general: 310 | development_dataset: TUTSoundEvents_2016_DevelopmentSet 311 | challenge_dataset: TUTSoundEvents_2016_EvaluationSet 312 | 313 | overwrite: false # Overwrite previously stored data 314 | 315 | `development_dataset: TUTSoundEvents_2016_DevelopmentSet` 316 | : The dataset handler class used while running the system in development mode. If one wants to handle a new dataset, inherit a new class from the Dataset class (`src/dataset.py`). 317 | 318 | `challenge_dataset: TUTSoundEvents_2016_EvaluationSet` 319 | : The dataset handler class used while running the system in challenge mode. If one wants to handle a new dataset, inherit a new class from the Dataset class (`src/dataset.py`). 
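For illustration, a skeleton of such a handler could look like the sketch below. Note that this is hypothetical: the attribute names are placeholders and are not taken from `src/dataset.py`; consult the `Dataset` base class for the actual interface to override.

    # Hypothetical sketch only -- attribute names below are placeholders,
    # not the real src/dataset.py API; check the Dataset base class.
    from src.dataset import Dataset

    class MyCustomSoundEvents_DevelopmentSet(Dataset):
        def __init__(self, data_path='data'):
            Dataset.__init__(self, data_path=data_path)
            self.name = 'MyCustomSoundEvents'   # placeholder dataset name
            self.evaluation_folds = 4           # placeholder fold setup

The new class name is then referenced from the configuration file, e.g. `development_dataset: MyCustomSoundEvents_DevelopmentSet`.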
320 | 
321 | Available dataset handler classes:
322 | 
323 | **DCASE 2016**
324 | 
325 | - TUTAcousticScenes_2016_DevelopmentSet
326 | - TUTAcousticScenes_2016_EvaluationSet
327 | - TUTSoundEvents_2016_DevelopmentSet
328 | - TUTSoundEvents_2016_EvaluationSet
329 | 
330 | **DCASE 2013**
331 | 
332 | - DCASE2013_Scene_DevelopmentSet
333 | - DCASE2013_Scene_EvaluationSet
334 | - DCASE2013_Event_DevelopmentSet
335 | - DCASE2013_Event_EvaluationSet
336 | 
337 | 
338 | `overwrite: false`
339 | : Switch to allow the system to always overwrite existing data on disk.
340 | 
341 | `challenge_submission_mode: false`
342 | : Switch to control where the system output is saved. If true, `path->challenge_results` is used, and all results are overwritten by default.
343 | 
344 | 
345 | **System paths**
346 | 
347 | This section contains the storage paths.
348 | 
349 |     path:
350 |       data: data/
351 | 
352 |       base: system/baseline_dcase2016_task1/
353 |       features: features/
354 |       feature_normalizers: feature_normalizers/
355 |       models: acoustic_models/
356 |       results: evaluation_results/
357 | 
358 |       challenge_results: challenge_submission/task_1_acoustic_scene_classification/
359 | 
360 | These parameters define the folder structure used to store acoustic features, feature normalization data, acoustic models and evaluation results.
361 | 
362 | `data: data/`
363 | : Defines the path where the dataset data is downloaded and stored. The path is relative to the main script.
364 | 
365 | `base: system/baseline_dcase2016_task1/`
366 | : Defines the base path where the system stores its data. The other paths are stored under this path. If the specified directory does not exist, it is created. The path is relative to the main script.
367 | 
368 | `challenge_results: challenge_submission/task_1_acoustic_scene_classification/`
369 | : Defines where the system output is stored while running the system in challenge mode.
370 | 
371 | **Feature extraction**
372 | 
373 | This section contains the feature extraction related parameters.
374 | 
375 |     features:
376 |       fs: 44100
377 |       win_length_seconds: 0.04
378 |       hop_length_seconds: 0.02
379 | 
380 |       include_mfcc0: true
381 |       include_delta: true
382 |       include_acceleration: true
383 | 
384 |       mfcc:
385 |         window: hamming_asymmetric # [hann_asymmetric, hamming_asymmetric]
386 |         n_mfcc: 20                 # Number of MFCC coefficients
387 |         n_mels: 40                 # Number of MEL bands used
388 |         n_fft: 2048                # FFT length
389 |         fmin: 0                    # Minimum frequency when constructing MEL bands
390 |         fmax: 22050                # Maximum frequency when constructing MEL bands
391 |         htk: false                 # Switch for HTK-styled MEL-frequency equation
392 | 
393 |       mfcc_delta:
394 |         width: 9
395 | 
396 |       mfcc_acceleration:
397 |         width: 9
398 | 
399 | `fs: 44100`
400 | : Default sampling frequency. If a given dataset does not fulfill this criterion, the audio data is resampled.
401 | 
402 | 
403 | `win_length_seconds: 0.04`
404 | : Feature extraction frame length in seconds.
405 | 
406 | 
407 | `hop_length_seconds: 0.02`
408 | : Feature extraction frame hop length in seconds.
409 | 
410 | 
411 | `include_mfcc0: true`
412 | : Switch to include the zeroth static MFCC coefficient in the feature vector.
413 | 
414 | 
415 | `include_delta: true`
416 | : Switch to include delta coefficients in the feature vector. The zeroth MFCC coefficient is always included in the delta coefficients. The width of the delta window is set in `mfcc_delta->width: 9`.
417 | 
418 | 
419 | `include_acceleration: true`
420 | : Switch to include acceleration (delta-delta) coefficients in the feature vector. The zeroth MFCC coefficient is always included in the acceleration coefficients. 
The width of the acceleration window is set in `mfcc_acceleration->width: 9`.
421 | 
422 | `mfcc->n_mfcc: 20`
423 | : Number of MFCC coefficients.
424 | 
425 | `mfcc->fmax: 22050`
426 | : Maximum frequency for the MEL bands. Usually, this is set to half of the sampling frequency.
427 | 
428 | **Classifier**
429 | 
430 | This section contains the frame classifier related parameters. These parameters are used when the chosen classifier is trained.
431 | 
432 |     classifier:
433 |       method: gmm                # The system supports only gmm
434 | 
435 |       audio_error_handling:      # Handling audio errors (temporary microphone failure and radio signal interferences from mobile phones)
436 |         clean_data: false        # Exclude audio errors from training audio
437 | 
438 |       parameters: !!null         # Parameters are copied from classifier_parameters based on the defined method
439 | 
440 |     classifier_parameters:
441 |       gmm:
442 |         n_components: 16         # Number of Gaussian components
443 |         covariance_type: diag    # Diagonal or full covariance matrix
444 |         random_state: 0
445 |         thresh: !!null
446 |         tol: 0.001
447 |         min_covar: 0.001
448 |         n_iter: 40
449 |         n_init: 1
450 |         params: wmc
451 |         init_params: wmc
452 | 
453 | `audio_error_handling->clean_data: false`
454 | : Some datasets provide audio error annotations. With this switch, these annotations can be used to exclude the segments containing audio errors from the feature matrix fed to the classifier during training. Audio errors can be temporary microphone failures or radio signal interference from mobile phones.
455 | 
456 | `classifier_parameters->gmm->n_components: 16`
457 | : Number of Gaussians used in the modeling.
458 | 
459 | In order to add new classifiers to the system, add their parameters under `classifier_parameters` with a new tag. Set `classifier->method` accordingly and add the appropriate code where the `classifier_method` variable is used in the system block API (look into the `do_system_training` and `do_system_testing` methods). In addition to this, one might want to modify the filename methods (`get_model_filename` and `get_result_filename`) to allow multiple classifier methods to co-exist in the system.
460 | 
461 | **Recognizer**
462 | 
463 | This section contains the sound recognition related parameters (used in `task1_scene_classification.py`).
464 | 
465 |     recognizer:
466 |       audio_error_handling:      # Handling audio errors (temporary microphone failure and radio signal interferences from mobile phones)
467 |         clean_data: false        # Exclude audio errors from test audio
468 | 
469 | `audio_error_handling->clean_data: false`
470 | : Some datasets provide audio error annotations. With this switch, these annotations can be used to exclude the segments containing audio errors from the feature matrix fed to the recognizer. Audio errors can be temporary microphone failures or radio signal interference from mobile phones.
471 | 
472 | **Detector**
473 | 
474 | This section contains the sound event detection related parameters (used in `task3_sound_event_detection_in_real_life_audio.py`).
475 | 
476 |     detector:
477 |       decision_threshold: 140.0
478 |       smoothing_window_length: 1.0 # seconds
479 |       minimum_event_length: 0.1    # seconds
480 |       minimum_event_gap: 0.1       # seconds
481 | 
482 | `decision_threshold: 140.0`
483 | : Decision threshold used to make the final classification decision. This can be used to control the sensitivity of the system. 
With log-likelihoods: `event_activity = (positive - negative) > decision_threshold`.
484 | 
485 | 
486 | `smoothing_window_length: 1.0`
487 | : Size of the sliding accumulation window (in seconds) used before the frame-wise classification decision.
488 | 
489 | 
490 | `minimum_event_length: 0.1`
491 | : Minimum length (in seconds) of output events. Events shorter than this are filtered out of the system output.
492 | 
493 | 
494 | `minimum_event_gap: 0.1`
495 | : Minimum gap (in seconds) between events of the same event class in the output. Consecutive events (with the same event label) separated by a shorter gap than this are merged together.
496 | 
497 | 7. Changelog
498 | ============
499 | #### 1.2 / 2016-11-10
500 | * Added evaluation in challenge mode for task 1
501 | 
502 | #### 1.1 / 2016-05-19
503 | * Added audio error handling
504 | 
505 | #### 1.0 / 2016-02-08
506 | * Initial commit
507 | 
508 | 8. License
509 | ==========
510 | 
511 | See file [EULA.pdf](EULA.pdf)
512 | 
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | scipy>=0.15.1
2 | numpy>=1.9.2
3 | scikit-learn==0.16.1
4 | pyyaml>=3.11
5 | librosa==0.4.0
6 | soundfile>=0.9.0
--------------------------------------------------------------------------------
/src/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TUT-ARG/DCASE2016-baseline-system-python/8e311066e3b670c52f4fcfe2a7060c18c9969cf8/src/__init__.py
--------------------------------------------------------------------------------
/src/evaluation.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 | 
4 | import sys
5 | import numpy
6 | import math
7 | from sklearn import metrics
8 | 
9 | class DCASE2016_SceneClassification_Metrics():
10 |     """DCASE 2016 scene classification metrics
11 | 
12 |     Examples
13 |     --------
14 | 
15 |     >>> dcase2016_scene_metric = DCASE2016_SceneClassification_Metrics(class_list=dataset.scene_labels)
16 |     >>> for fold in dataset.folds(mode=dataset_evaluation_mode):
17 |     >>>     results = []
18 |     >>>     result_filename = get_result_filename(fold=fold, path=result_path)
19 |     >>>
20 |     >>>     if os.path.isfile(result_filename):
21 |     >>>         with open(result_filename, 'rt') as f:
22 |     >>>             for row in csv.reader(f, delimiter='\t'):
23 |     >>>                 results.append(row)
24 |     >>>
25 |     >>>     y_true = []
26 |     >>>     y_pred = []
27 |     >>>     for result in results:
28 |     >>>         y_true.append(dataset.file_meta(result[0])[0]['scene_label'])
29 |     >>>         y_pred.append(result[1])
30 |     >>>
31 |     >>>     dcase2016_scene_metric.evaluate(system_output=y_pred, annotated_ground_truth=y_true)
32 |     >>>
33 |     >>> results = dcase2016_scene_metric.results()
34 | 
35 |     """
36 | 
37 |     def __init__(self, class_list):
38 |         """__init__ method. 
39 | 40 | Parameters 41 | ---------- 42 | class_list : list 43 | Evaluated scene labels in the list 44 | 45 | """ 46 | self.accuracies_per_class = None 47 | self.correct_per_class = None 48 | self.Nsys = None 49 | self.Nref = None 50 | self.class_list = class_list 51 | self.eps = numpy.spacing(1) 52 | 53 | def __enter__(self): 54 | return self 55 | 56 | def __exit__(self, type, value, traceback): 57 | return self.results() 58 | 59 | def accuracies(self, y_true, y_pred, labels): 60 | """Calculate accuracy 61 | 62 | Parameters 63 | ---------- 64 | y_true : numpy.array 65 | Ground truth array, list of scene labels 66 | 67 | y_pred : numpy.array 68 | System output array, list of scene labels 69 | 70 | labels : list 71 | list of scene labels 72 | 73 | Returns 74 | ------- 75 | array : numpy.array [shape=(number of scene labels,)] 76 | Accuracy per scene label class 77 | 78 | """ 79 | 80 | confusion_matrix = metrics.confusion_matrix(y_true=y_true, y_pred=y_pred, labels=labels).astype(float) 81 | return (numpy.diag(confusion_matrix), numpy.divide(numpy.diag(confusion_matrix), numpy.sum(confusion_matrix, 1)+self.eps)) 82 | 83 | def evaluate(self, annotated_ground_truth, system_output): 84 | """Evaluate system output and annotated ground truth pair. 85 | 86 | Use results method to get results. 87 | 88 | Parameters 89 | ---------- 90 | annotated_ground_truth : numpy.array 91 | Ground truth array, list of scene labels 92 | 93 | system_output : numpy.array 94 | System output array, list of scene labels 95 | 96 | Returns 97 | ------- 98 | nothing 99 | 100 | """ 101 | 102 | correct_per_class, accuracies_per_class = self.accuracies(y_pred=system_output, y_true=annotated_ground_truth, labels=self.class_list) 103 | 104 | if self.accuracies_per_class is None: 105 | self.accuracies_per_class = accuracies_per_class 106 | else: 107 | self.accuracies_per_class = numpy.vstack((self.accuracies_per_class, accuracies_per_class)) 108 | 109 | if self.correct_per_class is None: 110 | self.correct_per_class = correct_per_class 111 | else: 112 | self.correct_per_class = numpy.vstack((self.correct_per_class, correct_per_class)) 113 | 114 | Nref = numpy.zeros(len(self.class_list)) 115 | Nsys = numpy.zeros(len(self.class_list)) 116 | 117 | for class_id, class_label in enumerate(self.class_list): 118 | for item in system_output: 119 | if item == class_label: 120 | Nsys[class_id] += 1 121 | 122 | for item in annotated_ground_truth: 123 | if item == class_label: 124 | Nref[class_id] += 1 125 | 126 | if self.Nref is None: 127 | self.Nref = Nref 128 | else: 129 | self.Nref = numpy.vstack((self.Nref, Nref)) 130 | 131 | if self.Nsys is None: 132 | self.Nsys = Nsys 133 | else: 134 | self.Nsys = numpy.vstack((self.Nsys, Nsys)) 135 | 136 | def results(self): 137 | """Get results 138 | 139 | Outputs results in dict, format: 140 | 141 | { 142 | 'class_wise_data': 143 | { 144 | 'office': { 145 | 'Nsys': 10, 146 | 'Nref': 7, 147 | }, 148 | } 149 | 'class_wise_accuracy': 150 | { 151 | 'office': 0.6, 152 | 'home': 0.4, 153 | } 154 | 'overall_accuracy': numpy.mean(self.accuracies_per_class) 155 | 'Nsys': 100, 156 | 'Nref': 100, 157 | } 158 | 159 | Parameters 160 | ---------- 161 | nothing 162 | 163 | Returns 164 | ------- 165 | results : dict 166 | Results dict 167 | 168 | """ 169 | 170 | results = { 171 | 'class_wise_data': {}, 172 | 'class_wise_accuracy': {}, 173 | 'overall_accuracy': float(numpy.mean(self.accuracies_per_class)), 174 | 'class_wise_correct_count': self.correct_per_class.tolist(), 175 | 176 | } 177 | if 
len(self.Nsys.shape) == 2:
178 |             results['Nsys'] = int(sum(sum(self.Nsys)))
179 |             results['Nref'] = int(sum(sum(self.Nref)))
180 |         else:
181 |             results['Nsys'] = int(sum(self.Nsys))
182 |             results['Nref'] = int(sum(self.Nref))
183 | 
184 |         for class_id, class_label in enumerate(self.class_list):
185 |             if len(self.accuracies_per_class.shape) == 2:
186 |                 results['class_wise_accuracy'][class_label] = numpy.mean(self.accuracies_per_class[:, class_id])
187 |                 results['class_wise_data'][class_label] = {
188 |                     'Nsys': int(sum(self.Nsys[:, class_id])),
189 |                     'Nref': int(sum(self.Nref[:, class_id])),
190 |                 }
191 |             else:
192 |                 results['class_wise_accuracy'][class_label] = numpy.mean(self.accuracies_per_class[class_id])
193 |                 results['class_wise_data'][class_label] = {
194 |                     'Nsys': int(self.Nsys[class_id]),
195 |                     'Nref': int(self.Nref[class_id]),
196 |                 }
197 | 
198 |         return results
199 | 
200 | 
201 | class EventDetectionMetrics(object):
202 |     """Base class for sound event metric classes.
203 |     """
204 | 
205 |     def __init__(self, class_list):
206 |         """__init__ method.
207 | 
208 |         Parameters
209 |         ----------
210 |         class_list : list
211 |             List of class labels to be evaluated.
212 | 
213 |         """
214 | 
215 |         self.class_list = class_list
216 |         self.eps = numpy.spacing(1)
217 | 
218 |     def max_event_offset(self, data):
219 |         """Get maximum event offset from event list
220 | 
221 |         Parameters
222 |         ----------
223 |         data : list
224 |             Event list, list of event dicts
225 | 
226 |         Returns
227 |         -------
228 |         max : float > 0
229 |             Maximum event offset
230 |         """
231 | 
232 |         max = 0
233 |         for event in data:
234 |             if event['event_offset'] > max:
235 |                 max = event['event_offset']
236 |         return max
237 | 
238 |     def list_to_roll(self, data, time_resolution=0.01):
239 |         """Convert event list into event roll.
240 |         Event roll is a binary matrix indicating event activity within time segments defined by time_resolution.
241 | 
242 |         Parameters
243 |         ----------
244 |         data : list
245 |             Event list, list of event dicts
246 | 
247 |         time_resolution : float > 0
248 |             Time resolution used when converting event into event roll. 
249 | 250 | Returns 251 | ------- 252 | event_roll : numpy.ndarray [shape=(math.ceil(data_length * 1 / time_resolution), amount of classes)] 253 | Event roll 254 | """ 255 | 256 | # Initialize 257 | data_length = self.max_event_offset(data) 258 | event_roll = numpy.zeros(( int(math.ceil(data_length * 1 / time_resolution)), len(self.class_list))) 259 | 260 | # Fill-in event_roll 261 | for event in data: 262 | pos = self.class_list.index(event['event_label'].rstrip()) 263 | 264 | onset = int(math.floor(event['event_onset'] * 1 / time_resolution)) 265 | offset = int(math.ceil(event['event_offset'] * 1 / time_resolution)) 266 | 267 | event_roll[onset:offset, pos] = 1 268 | 269 | return event_roll 270 | 271 | 272 | class DCASE2016_EventDetection_SegmentBasedMetrics(EventDetectionMetrics): 273 | """DCASE2016 Segment based metrics for sound event detection 274 | 275 | Supported metrics: 276 | - Overall 277 | - Error rate (ER), Substitutions (S), Insertions (I), Deletions (D) 278 | - F-score (F1) 279 | - Class-wise 280 | - Error rate (ER), Insertions (I), Deletions (D) 281 | - F-score (F1) 282 | 283 | Examples 284 | -------- 285 | 286 | >>> overall_metrics_per_scene = {} 287 | >>> for scene_id, scene_label in enumerate(dataset.scene_labels): 288 | >>> dcase2016_segment_based_metric = DCASE2016_EventDetection_SegmentBasedMetrics(class_list=dataset.event_labels(scene_label=scene_label)) 289 | >>> for fold in dataset.folds(mode=dataset_evaluation_mode): 290 | >>> results = [] 291 | >>> result_filename = get_result_filename(fold=fold, scene_label=scene_label, path=result_path) 292 | >>> 293 | >>> if os.path.isfile(result_filename): 294 | >>> with open(result_filename, 'rt') as f: 295 | >>> for row in csv.reader(f, delimiter='\t'): 296 | >>> results.append(row) 297 | >>> 298 | >>> for file_id, item in enumerate(dataset.test(fold,scene_label=scene_label)): 299 | >>> current_file_results = [] 300 | >>> for result_line in results: 301 | >>> if result_line[0] == dataset.absolute_to_relative(item['file']): 302 | >>> current_file_results.append( 303 | >>> {'file': result_line[0], 304 | >>> 'event_onset': float(result_line[1]), 305 | >>> 'event_offset': float(result_line[2]), 306 | >>> 'event_label': result_line[3] 307 | >>> } 308 | >>> ) 309 | >>> meta = dataset.file_meta(dataset.absolute_to_relative(item['file'])) 310 | >>> dcase2016_segment_based_metric.evaluate(system_output=current_file_results, annotated_ground_truth=meta) 311 | >>> overall_metrics_per_scene[scene_label]['segment_based_metrics'] = dcase2016_segment_based_metric.results() 312 | 313 | """ 314 | 315 | def __init__(self, class_list, time_resolution=1.0): 316 | """__init__ method. 317 | 318 | Parameters 319 | ---------- 320 | class_list : list 321 | List of class labels to be evaluated. 322 | 323 | time_resolution : float > 0 324 | Time resolution used when converting event into event roll. 
325 | (Default value = 1.0) 326 | 327 | """ 328 | 329 | self.time_resolution = time_resolution 330 | 331 | self.overall = { 332 | 'Ntp': 0.0, 333 | 'Ntn': 0.0, 334 | 'Nfp': 0.0, 335 | 'Nfn': 0.0, 336 | 'Nref': 0.0, 337 | 'Nsys': 0.0, 338 | 'ER': 0.0, 339 | 'S': 0.0, 340 | 'D': 0.0, 341 | 'I': 0.0, 342 | } 343 | self.class_wise = {} 344 | 345 | for class_label in class_list: 346 | self.class_wise[class_label] = { 347 | 'Ntp': 0.0, 348 | 'Ntn': 0.0, 349 | 'Nfp': 0.0, 350 | 'Nfn': 0.0, 351 | 'Nref': 0.0, 352 | 'Nsys': 0.0, 353 | } 354 | 355 | EventDetectionMetrics.__init__(self, class_list=class_list) 356 | 357 | def __enter__(self): 358 | # Initialize class and return it 359 | return self 360 | 361 | def __exit__(self, type, value, traceback): 362 | # Finalize evaluation and return results 363 | return self.results() 364 | 365 | def evaluate(self, annotated_ground_truth, system_output): 366 | """Evaluate system output and annotated ground truth pair. 367 | 368 | Use results method to get results. 369 | 370 | Parameters 371 | ---------- 372 | annotated_ground_truth : numpy.array 373 | Ground truth array, list of scene labels 374 | 375 | system_output : numpy.array 376 | System output array, list of scene labels 377 | 378 | Returns 379 | ------- 380 | nothing 381 | 382 | """ 383 | 384 | # Convert event list into frame-based representation 385 | system_event_roll = self.list_to_roll(data=system_output, time_resolution=self.time_resolution) 386 | annotated_event_roll = self.list_to_roll(data=annotated_ground_truth, time_resolution=self.time_resolution) 387 | 388 | # Fix durations of both event_rolls to be equal 389 | if annotated_event_roll.shape[0] > system_event_roll.shape[0]: 390 | padding = numpy.zeros((annotated_event_roll.shape[0] - system_event_roll.shape[0], len(self.class_list))) 391 | system_event_roll = numpy.vstack((system_event_roll, padding)) 392 | 393 | if system_event_roll.shape[0] > annotated_event_roll.shape[0]: 394 | padding = numpy.zeros((system_event_roll.shape[0] - annotated_event_roll.shape[0], len(self.class_list))) 395 | annotated_event_roll = numpy.vstack((annotated_event_roll, padding)) 396 | 397 | # Compute segment-based overall metrics 398 | for segment_id in range(0, annotated_event_roll.shape[0]): 399 | annotated_segment = annotated_event_roll[segment_id, :] 400 | system_segment = system_event_roll[segment_id, :] 401 | 402 | Ntp = sum(system_segment + annotated_segment > 1) 403 | Ntn = sum(system_segment + annotated_segment == 0) 404 | Nfp = sum(system_segment - annotated_segment > 0) 405 | Nfn = sum(annotated_segment - system_segment > 0) 406 | 407 | Nref = sum(annotated_segment) 408 | Nsys = sum(system_segment) 409 | 410 | S = min(Nref, Nsys) - Ntp 411 | D = max(0, Nref - Nsys) 412 | I = max(0, Nsys - Nref) 413 | ER = max(Nref, Nsys) - Ntp 414 | 415 | self.overall['Ntp'] += Ntp 416 | self.overall['Ntn'] += Ntn 417 | self.overall['Nfp'] += Nfp 418 | self.overall['Nfn'] += Nfn 419 | self.overall['Nref'] += Nref 420 | self.overall['Nsys'] += Nsys 421 | self.overall['S'] += S 422 | self.overall['D'] += D 423 | self.overall['I'] += I 424 | self.overall['ER'] += ER 425 | 426 | for class_id, class_label in enumerate(self.class_list): 427 | annotated_segment = annotated_event_roll[:, class_id] 428 | system_segment = system_event_roll[:, class_id] 429 | 430 | Ntp = sum(system_segment + annotated_segment > 1) 431 | Ntn = sum(system_segment + annotated_segment == 0) 432 | Nfp = sum(system_segment - annotated_segment > 0) 433 | Nfn = sum(annotated_segment - system_segment > 
0) 434 | 435 | Nref = sum(annotated_segment) 436 | Nsys = sum(system_segment) 437 | 438 | self.class_wise[class_label]['Ntp'] += Ntp 439 | self.class_wise[class_label]['Ntn'] += Ntn 440 | self.class_wise[class_label]['Nfp'] += Nfp 441 | self.class_wise[class_label]['Nfn'] += Nfn 442 | self.class_wise[class_label]['Nref'] += Nref 443 | self.class_wise[class_label]['Nsys'] += Nsys 444 | 445 | return self 446 | 447 | def results(self): 448 | """Get results 449 | 450 | Outputs results in dict, format: 451 | 452 | { 453 | 'overall': 454 | { 455 | 'Pre': 456 | 'Rec': 457 | 'F': 458 | 'ER': 459 | 'S': 460 | 'D': 461 | 'I': 462 | } 463 | 'class_wise': 464 | { 465 | 'office': { 466 | 'Pre': 467 | 'Rec': 468 | 'F': 469 | 'ER': 470 | 'D': 471 | 'I': 472 | 'Nref': 473 | 'Nsys': 474 | 'Ntp': 475 | 'Nfn': 476 | 'Nfp': 477 | }, 478 | } 479 | 'class_wise_average': 480 | { 481 | 'F': 482 | 'ER': 483 | } 484 | } 485 | 486 | Parameters 487 | ---------- 488 | nothing 489 | 490 | Returns 491 | ------- 492 | results : dict 493 | Results dict 494 | 495 | """ 496 | 497 | results = {'overall': {}, 498 | 'class_wise': {}, 499 | 'class_wise_average': {}, 500 | } 501 | 502 | # Overall metrics 503 | results['overall']['Pre'] = self.overall['Ntp'] / (self.overall['Nsys'] + self.eps) 504 | results['overall']['Rec'] = self.overall['Ntp'] / self.overall['Nref'] 505 | results['overall']['F'] = 2 * ((results['overall']['Pre'] * results['overall']['Rec']) / (results['overall']['Pre'] + results['overall']['Rec'] + self.eps)) 506 | 507 | results['overall']['ER'] = self.overall['ER'] / self.overall['Nref'] 508 | results['overall']['S'] = self.overall['S'] / self.overall['Nref'] 509 | results['overall']['D'] = self.overall['D'] / self.overall['Nref'] 510 | results['overall']['I'] = self.overall['I'] / self.overall['Nref'] 511 | 512 | # Class-wise metrics 513 | class_wise_F = [] 514 | class_wise_ER = [] 515 | for class_id, class_label in enumerate(self.class_list): 516 | if class_label not in results['class_wise']: 517 | results['class_wise'][class_label] = {} 518 | results['class_wise'][class_label]['Pre'] = self.class_wise[class_label]['Ntp'] / (self.class_wise[class_label]['Nsys'] + self.eps) 519 | results['class_wise'][class_label]['Rec'] = self.class_wise[class_label]['Ntp'] / (self.class_wise[class_label]['Nref'] + self.eps) 520 | results['class_wise'][class_label]['F'] = 2 * ((results['class_wise'][class_label]['Pre'] * results['class_wise'][class_label]['Rec']) / (results['class_wise'][class_label]['Pre'] + results['class_wise'][class_label]['Rec'] + self.eps)) 521 | 522 | results['class_wise'][class_label]['ER'] = (self.class_wise[class_label]['Nfn'] + self.class_wise[class_label]['Nfp']) / (self.class_wise[class_label]['Nref'] + self.eps) 523 | results['class_wise'][class_label]['D'] = self.class_wise[class_label]['Nfn'] / (self.class_wise[class_label]['Nref'] + self.eps) 524 | results['class_wise'][class_label]['I'] = self.class_wise[class_label]['Nfp'] / (self.class_wise[class_label]['Nref'] + self.eps) 525 | 526 | results['class_wise'][class_label]['Nref'] = self.class_wise[class_label]['Nref'] 527 | results['class_wise'][class_label]['Nsys'] = self.class_wise[class_label]['Nsys'] 528 | results['class_wise'][class_label]['Ntp'] = self.class_wise[class_label]['Ntp'] 529 | results['class_wise'][class_label]['Nfn'] = self.class_wise[class_label]['Nfn'] 530 | results['class_wise'][class_label]['Nfp'] = self.class_wise[class_label]['Nfp'] 531 | 532 | class_wise_F.append(results['class_wise'][class_label]['F']) 533 | 
class_wise_ER.append(results['class_wise'][class_label]['ER']) 534 | 535 | results['class_wise_average']['F'] = numpy.mean(class_wise_F) 536 | results['class_wise_average']['ER'] = numpy.mean(class_wise_ER) 537 | 538 | return results 539 | 540 | 541 | class DCASE2016_EventDetection_EventBasedMetrics(EventDetectionMetrics): 542 | """DCASE2016 Event based metrics for sound event detection 543 | 544 | Supported metrics: 545 | - Overall 546 | - Error rate (ER), Substitutions (S), Insertions (I), Deletions (D) 547 | - F-score (F1) 548 | - Class-wise 549 | - Error rate (ER), Insertions (I), Deletions (D) 550 | - F-score (F1) 551 | 552 | Examples 553 | -------- 554 | 555 | >>> overall_metrics_per_scene = {} 556 | >>> for scene_id, scene_label in enumerate(dataset.scene_labels): 557 | >>> dcase2016_event_based_metric = DCASE2016_EventDetection_EventBasedMetrics(class_list=dataset.event_labels(scene_label=scene_label)) 558 | >>> for fold in dataset.folds(mode=dataset_evaluation_mode): 559 | >>> results = [] 560 | >>> result_filename = get_result_filename(fold=fold, scene_label=scene_label, path=result_path) 561 | >>> 562 | >>> if os.path.isfile(result_filename): 563 | >>> with open(result_filename, 'rt') as f: 564 | >>> for row in csv.reader(f, delimiter='\t'): 565 | >>> results.append(row) 566 | >>> 567 | >>> for file_id, item in enumerate(dataset.test(fold,scene_label=scene_label)): 568 | >>> current_file_results = [] 569 | >>> for result_line in results: 570 | >>> if result_line[0] == dataset.absolute_to_relative(item['file']): 571 | >>> current_file_results.append( 572 | >>> {'file': result_line[0], 573 | >>> 'event_onset': float(result_line[1]), 574 | >>> 'event_offset': float(result_line[2]), 575 | >>> 'event_label': result_line[3] 576 | >>> } 577 | >>> ) 578 | >>> meta = dataset.file_meta(dataset.absolute_to_relative(item['file'])) 579 | >>> dcase2016_event_based_metric.evaluate(system_output=current_file_results, annotated_ground_truth=meta) 580 | >>> overall_metrics_per_scene[scene_label]['event_based_metrics'] = dcase2016_event_based_metric.results() 581 | 582 | """ 583 | 584 | def __init__(self, class_list, t_collar=0.2, use_onset_condition=True, use_offset_condition=True): 585 | """__init__ method. 586 | 587 | Parameters 588 | ---------- 589 | class_list : list 590 | List of class labels to be evaluated. 
591 | 592 | t_collar : float > 0 593 | Time collar for event onset and offset condition 594 | (Default value = 0.2) 595 | 596 | use_onset_condition : bool 597 | Use onset condition when finding correctly detected events 598 | (Default value = True) 599 | 600 | use_offset_condition : bool 601 | Use offset condition when finding correctly detected events 602 | (Default value = True) 603 | 604 | """ 605 | 606 | self.t_collar = t_collar 607 | self.use_onset_condition = use_onset_condition 608 | self.use_offset_condition = use_offset_condition 609 | 610 | self.overall = { 611 | 'Nref': 0.0, 612 | 'Nsys': 0.0, 613 | 'Nsubs': 0.0, 614 | 'Ntp': 0.0, 615 | 'Nfp': 0.0, 616 | 'Nfn': 0.0, 617 | } 618 | self.class_wise = {} 619 | 620 | for class_label in class_list: 621 | self.class_wise[class_label] = { 622 | 'Nref': 0.0, 623 | 'Nsys': 0.0, 624 | 'Ntp': 0.0, 625 | 'Ntn': 0.0, 626 | 'Nfp': 0.0, 627 | 'Nfn': 0.0, 628 | } 629 | 630 | EventDetectionMetrics.__init__(self, class_list=class_list) 631 | 632 | def __enter__(self): 633 | # Initialize class and return it 634 | return self 635 | 636 | def __exit__(self, type, value, traceback): 637 | # Finalize evaluation and return results 638 | return self.results() 639 | 640 | def evaluate(self, annotated_ground_truth, system_output): 641 | """Evaluate system output and annotated ground truth pair. 642 | 643 | Use results method to get results. 644 | 645 | Parameters 646 | ---------- 647 | annotated_ground_truth : numpy.array 648 | Ground truth array, list of scene labels 649 | 650 | system_output : numpy.array 651 | System output array, list of scene labels 652 | 653 | Returns 654 | ------- 655 | nothing 656 | 657 | """ 658 | 659 | # Overall metrics 660 | 661 | # Total number of detected and reference events 662 | Nsys = len(system_output) 663 | Nref = len(annotated_ground_truth) 664 | 665 | sys_correct = numpy.zeros(Nsys, dtype=bool) 666 | ref_correct = numpy.zeros(Nref, dtype=bool) 667 | 668 | # Number of correctly transcribed events, onset/offset within a t_collar range 669 | for j in range(0, len(annotated_ground_truth)): 670 | for i in range(0, len(system_output)): 671 | if not sys_correct[i]: # skip already matched events 672 | label_condition = annotated_ground_truth[j]['event_label'] == system_output[i]['event_label'] 673 | if self.use_onset_condition: 674 | onset_condition = self.onset_condition(annotated_event=annotated_ground_truth[j], 675 | system_event=system_output[i], 676 | t_collar=self.t_collar) 677 | else: 678 | onset_condition = True 679 | 680 | if self.use_offset_condition: 681 | offset_condition = self.offset_condition(annotated_event=annotated_ground_truth[j], 682 | system_event=system_output[i], 683 | t_collar=self.t_collar) 684 | else: 685 | offset_condition = True 686 | 687 | if label_condition and onset_condition and offset_condition: 688 | ref_correct[j] = True 689 | sys_correct[i] = True 690 | break 691 | 692 | Ntp = numpy.sum(sys_correct) 693 | 694 | sys_leftover = numpy.nonzero(numpy.negative(sys_correct))[0] 695 | ref_leftover = numpy.nonzero(numpy.negative(ref_correct))[0] 696 | 697 | # Substitutions 698 | Nsubs = 0 699 | sys_counted = numpy.zeros(Nsys, dtype=bool) 700 | for j in ref_leftover: 701 | for i in sys_leftover: 702 | if not sys_counted[i]: 703 | if self.use_onset_condition: 704 | onset_condition = self.onset_condition(annotated_event=annotated_ground_truth[j], 705 | system_event=system_output[i], 706 | t_collar=self.t_collar) 707 | else: 708 | onset_condition = True 709 | 710 | if self.use_offset_condition: 711 | 
offset_condition = self.offset_condition(annotated_event=annotated_ground_truth[j], 712 | system_event=system_output[i], 713 | t_collar=self.t_collar) 714 | else: 715 | offset_condition = True 716 | 717 | if onset_condition and offset_condition: 718 | sys_counted[i] = True 719 | Nsubs += 1 720 | break 721 | 722 | Nfp = Nsys - Ntp - Nsubs 723 | Nfn = Nref - Ntp - Nsubs 724 | 725 | self.overall['Nref'] += Nref 726 | self.overall['Nsys'] += Nsys 727 | self.overall['Ntp'] += Ntp 728 | self.overall['Nsubs'] += Nsubs 729 | self.overall['Nfp'] += Nfp 730 | self.overall['Nfn'] += Nfn 731 | 732 | # Class-wise metrics 733 | for class_id, class_label in enumerate(self.class_list): 734 | Nref = 0.0 735 | Nsys = 0.0 736 | Ntp = 0.0 737 | 738 | # Count event frequencies in the ground truth 739 | for i in range(0, len(annotated_ground_truth)): 740 | if annotated_ground_truth[i]['event_label'] == class_label: 741 | Nref += 1 742 | 743 | # Count event frequencies in the system output 744 | for i in range(0, len(system_output)): 745 | if system_output[i]['event_label'] == class_label: 746 | Nsys += 1 747 | 748 | sys_counted = numpy.zeros(len(system_output), dtype=bool) 749 | for j in range(0, len(annotated_ground_truth)): 750 | if annotated_ground_truth[j]['event_label'] == class_label: 751 | for i in range(0, len(system_output)): 752 | if system_output[i]['event_label'] == class_label and not sys_counted[i]: 753 | if self.use_onset_condition: 754 | onset_condition = self.onset_condition(annotated_event=annotated_ground_truth[j], 755 | system_event=system_output[i], 756 | t_collar=self.t_collar) 757 | else: 758 | onset_condition = True 759 | 760 | if self.use_offset_condition: 761 | offset_condition = self.offset_condition(annotated_event=annotated_ground_truth[j], 762 | system_event=system_output[i], 763 | t_collar=self.t_collar) 764 | else: 765 | offset_condition = True 766 | 767 | if onset_condition and offset_condition: 768 | sys_counted[i] = True 769 | Ntp += 1 770 | break 771 | 772 | Nfp = Nsys - Ntp 773 | Nfn = Nref - Ntp 774 | 775 | self.class_wise[class_label]['Nref'] += Nref 776 | self.class_wise[class_label]['Nsys'] += Nsys 777 | 778 | self.class_wise[class_label]['Ntp'] += Ntp 779 | self.class_wise[class_label]['Nfp'] += Nfp 780 | self.class_wise[class_label]['Nfn'] += Nfn 781 | 782 | 783 | def onset_condition(self, annotated_event, system_event, t_collar=0.200): 784 | """Onset condition, checked does the event pair fulfill condition 785 | 786 | Condition: 787 | 788 | - event onsets are within t_collar each other 789 | 790 | Parameters 791 | ---------- 792 | annotated_event : dict 793 | Event dict 794 | 795 | system_event : dict 796 | Event dict 797 | 798 | t_collar : float > 0 799 | Defines how close event onsets have to be in order to be considered match. In seconds. 
800 | (Default value = 0.2) 801 | 802 | Returns 803 | ------- 804 | result : bool 805 | Condition result 806 | 807 | """ 808 | 809 | return math.fabs(annotated_event['event_onset'] - system_event['event_onset']) <= t_collar 810 | 811 | def offset_condition(self, annotated_event, system_event, t_collar=0.200, percentage_of_length=0.5): 812 | """Offset condition, checking does the event pair fulfill condition 813 | 814 | Condition: 815 | 816 | - event offsets are within t_collar each other 817 | or 818 | - system event offset is within the percentage_of_length*annotated event_length 819 | 820 | Parameters 821 | ---------- 822 | annotated_event : dict 823 | Event dict 824 | 825 | system_event : dict 826 | Event dict 827 | 828 | t_collar : float > 0 829 | Defines how close event onsets have to be in order to be considered match. In seconds. 830 | (Default value = 0.2) 831 | 832 | percentage_of_length : float [0-1] 833 | 834 | 835 | Returns 836 | ------- 837 | result : bool 838 | Condition result 839 | 840 | """ 841 | annotated_length = annotated_event['event_offset'] - annotated_event['event_onset'] 842 | return math.fabs(annotated_event['event_offset'] - system_event['event_offset']) <= max(t_collar, percentage_of_length * annotated_length) 843 | 844 | def results(self): 845 | """Get results 846 | 847 | Outputs results in dict, format: 848 | 849 | { 850 | 'overall': 851 | { 852 | 'Pre': 853 | 'Rec': 854 | 'F': 855 | 'ER': 856 | 'S': 857 | 'D': 858 | 'I': 859 | } 860 | 'class_wise': 861 | { 862 | 'office': { 863 | 'Pre': 864 | 'Rec': 865 | 'F': 866 | 'ER': 867 | 'D': 868 | 'I': 869 | 'Nref': 870 | 'Nsys': 871 | 'Ntp': 872 | 'Nfn': 873 | 'Nfp': 874 | }, 875 | } 876 | 'class_wise_average': 877 | { 878 | 'F': 879 | 'ER': 880 | } 881 | } 882 | 883 | Parameters 884 | ---------- 885 | nothing 886 | 887 | Returns 888 | ------- 889 | results : dict 890 | Results dict 891 | 892 | """ 893 | 894 | results = { 895 | 'overall': {}, 896 | 'class_wise': {}, 897 | 'class_wise_average': {}, 898 | } 899 | 900 | # Overall metrics 901 | results['overall']['Pre'] = self.overall['Ntp'] / (self.overall['Nsys'] + self.eps) 902 | results['overall']['Rec'] = self.overall['Ntp'] / self.overall['Nref'] 903 | results['overall']['F'] = 2 * ((results['overall']['Pre'] * results['overall']['Rec']) / (results['overall']['Pre'] + results['overall']['Rec'] + self.eps)) 904 | 905 | results['overall']['ER'] = (self.overall['Nfn'] + self.overall['Nfp'] + self.overall['Nsubs']) / self.overall['Nref'] 906 | results['overall']['S'] = self.overall['Nsubs'] / self.overall['Nref'] 907 | results['overall']['D'] = self.overall['Nfn'] / self.overall['Nref'] 908 | results['overall']['I'] = self.overall['Nfp'] / self.overall['Nref'] 909 | 910 | # Class-wise metrics 911 | class_wise_F = [] 912 | class_wise_ER = [] 913 | 914 | for class_label in self.class_list: 915 | if class_label not in results['class_wise']: 916 | results['class_wise'][class_label] = {} 917 | 918 | results['class_wise'][class_label]['Pre'] = self.class_wise[class_label]['Ntp'] / (self.class_wise[class_label]['Nsys'] + self.eps) 919 | results['class_wise'][class_label]['Rec'] = self.class_wise[class_label]['Ntp'] / (self.class_wise[class_label]['Nref'] + self.eps) 920 | results['class_wise'][class_label]['F'] = 2 * ((results['class_wise'][class_label]['Pre'] * results['class_wise'][class_label]['Rec']) / (results['class_wise'][class_label]['Pre'] + results['class_wise'][class_label]['Rec'] + self.eps)) 921 | 922 | results['class_wise'][class_label]['ER'] = 
(self.class_wise[class_label]['Nfn']+self.class_wise[class_label]['Nfp']) / (self.class_wise[class_label]['Nref'] + self.eps)
923 |             results['class_wise'][class_label]['D'] = self.class_wise[class_label]['Nfn'] / (self.class_wise[class_label]['Nref'] + self.eps)
924 |             results['class_wise'][class_label]['I'] = self.class_wise[class_label]['Nfp'] / (self.class_wise[class_label]['Nref'] + self.eps)
925 | 
926 |             results['class_wise'][class_label]['Nref'] = self.class_wise[class_label]['Nref']
927 |             results['class_wise'][class_label]['Nsys'] = self.class_wise[class_label]['Nsys']
928 |             results['class_wise'][class_label]['Ntp'] = self.class_wise[class_label]['Ntp']
929 |             results['class_wise'][class_label]['Nfn'] = self.class_wise[class_label]['Nfn']
930 |             results['class_wise'][class_label]['Nfp'] = self.class_wise[class_label]['Nfp']
931 | 
932 |             class_wise_F.append(results['class_wise'][class_label]['F'])
933 |             class_wise_ER.append(results['class_wise'][class_label]['ER'])
934 | 
935 |         # Class-wise average
936 |         results['class_wise_average']['F'] = numpy.mean(class_wise_F)
937 |         results['class_wise_average']['ER'] = numpy.mean(class_wise_ER)
938 | 
939 |         return results
940 | 
941 | 
942 | class DCASE2013_EventDetection_Metrics(EventDetectionMetrics):
943 |     """Legacy DCASE2013 metrics, converted from the provided Matlab implementation
944 | 
945 |     Supported metrics:
946 |     - Frame based
947 |       - F-score (F)
948 |       - AEER
949 |     - Event based
950 |       - Onset
951 |         - F-Score (F)
952 |         - AEER
953 |       - Onset-offset
954 |         - F-Score (F)
955 |         - AEER
956 |     - Class based
957 |       - Onset
958 |         - F-Score (F)
959 |         - AEER
960 |       - Onset-offset
961 |         - F-Score (F)
962 |         - AEER
963 |     """
964 | 
965 |     #
966 | 
967 |     def frame_based(self, annotated_ground_truth, system_output, resolution=0.01):
968 |         # Convert event list into frame-based representation
969 |         system_event_roll = self.list_to_roll(data=system_output, time_resolution=resolution)
970 |         annotated_event_roll = self.list_to_roll(data=annotated_ground_truth, time_resolution=resolution)
971 | 
972 |         # Fix durations of both event rolls to be equal
973 |         if annotated_event_roll.shape[0] > system_event_roll.shape[0]:
974 |             padding = numpy.zeros((annotated_event_roll.shape[0] - system_event_roll.shape[0], len(self.class_list)))
975 |             system_event_roll = numpy.vstack((system_event_roll, padding))
976 | 
977 |         if system_event_roll.shape[0] > annotated_event_roll.shape[0]:
978 |             padding = numpy.zeros((system_event_roll.shape[0] - annotated_event_roll.shape[0], len(self.class_list)))
979 |             annotated_event_roll = numpy.vstack((annotated_event_roll, padding))
980 | 
981 |         # Compute frame-based metrics
982 |         Nref = sum(sum(annotated_event_roll))
983 |         Ntot = sum(sum(system_event_roll))
984 |         Ntp = sum(sum(system_event_roll + annotated_event_roll > 1))
985 |         Nfp = sum(sum(system_event_roll - annotated_event_roll > 0))
986 |         Nfn = sum(sum(annotated_event_roll - system_event_roll > 0))
987 |         Nsubs = min(Nfp, Nfn)
988 | 
989 |         eps = numpy.spacing(1)
990 | 
991 |         results = dict()
992 |         results['Rec'] = Ntp / (Nref + eps)
993 |         results['Pre'] = Ntp / (Ntot + eps)
994 |         results['F'] = 2 * ((results['Pre'] * results['Rec']) / (results['Pre'] + results['Rec'] + eps))
995 |         results['AEER'] = (Nfn + Nfp + Nsubs) / (Nref + eps)
996 | 
997 |         return results
998 | 
999 |     def event_based(self, annotated_ground_truth, system_output):
1000 |         # Event-based evaluation for event detection task
1001 |         # outputFile: the output of the event detection system
1002 |         # GTFile: the ground truth list of events
1003 | 
1004 | # Total number of detected and reference events 1005 | Ntot = len(system_output) 1006 | Nref = len(annotated_ground_truth) 1007 | 1008 | # Number of correctly transcribed events, onset within a +/-100 ms range 1009 | Ncorr = 0 1010 | NcorrOff = 0 1011 | for j in range(0, len(annotated_ground_truth)): 1012 | for i in range(0, len(system_output)): 1013 | if annotated_ground_truth[j]['event_label'] == system_output[i]['event_label'] and (math.fabs(annotated_ground_truth[j]['event_onset'] - system_output[i]['event_onset']) <= 0.1): 1014 | Ncorr += 1 1015 | 1016 | # If offset within a +/-100 ms range or within 50% of ground-truth event's duration 1017 | if math.fabs(annotated_ground_truth[j]['event_offset'] - system_output[i]['event_offset']) <= max(0.1, 0.5 * (annotated_ground_truth[j]['event_offset'] - annotated_ground_truth[j]['event_onset'])): 1018 | NcorrOff += 1 1019 | 1020 | break # In order to not evaluate duplicates 1021 | 1022 | # Compute onset-only event-based metrics 1023 | eps = numpy.spacing(1) 1024 | results = { 1025 | 'onset': {}, 1026 | 'onset-offset': {}, 1027 | } 1028 | 1029 | Nfp = Ntot - Ncorr 1030 | Nfn = Nref - Ncorr 1031 | Nsubs = min(Nfp, Nfn) 1032 | results['onset']['Rec'] = Ncorr / (Nref + eps) 1033 | results['onset']['Pre'] = Ncorr / (Ntot + eps) 1034 | results['onset']['F'] = 2 * ( 1035 | (results['onset']['Pre'] * results['onset']['Rec']) / ( 1036 | results['onset']['Pre'] + results['onset']['Rec'] + eps)) 1037 | results['onset']['AEER'] = (Nfn + Nfp + Nsubs) / (Nref + eps) 1038 | 1039 | # Compute onset-offset event-based metrics 1040 | NfpOff = Ntot - NcorrOff 1041 | NfnOff = Nref - NcorrOff 1042 | NsubsOff = min(NfpOff, NfnOff) 1043 | results['onset-offset']['Rec'] = NcorrOff / (Nref + eps) 1044 | results['onset-offset']['Pre'] = NcorrOff / (Ntot + eps) 1045 | results['onset-offset']['F'] = 2 * ((results['onset-offset']['Pre'] * results['onset-offset']['Rec']) / ( 1046 | results['onset-offset']['Pre'] + results['onset-offset']['Rec'] + eps)) 1047 | results['onset-offset']['AEER'] = (NfnOff + NfpOff + NsubsOff) / (Nref + eps) 1048 | 1049 | return results 1050 | 1051 | def class_based(self, annotated_ground_truth, system_output): 1052 | # Class-wise event-based evaluation for event detection task 1053 | # outputFile: the output of the event detection system 1054 | # GTFile: the ground truth list of events 1055 | 1056 | # Total number of detected and reference events per class 1057 | Ntot = numpy.zeros((len(self.class_list), 1)) 1058 | for event in system_output: 1059 | pos = self.class_list.index(event['event_label']) 1060 | Ntot[pos] += 1 1061 | 1062 | Nref = numpy.zeros((len(self.class_list), 1)) 1063 | for event in annotated_ground_truth: 1064 | pos = self.class_list.index(event['event_label']) 1065 | Nref[pos] += 1 1066 | 1067 | I = (Nref > 0).nonzero()[0] # index for classes present in ground-truth 1068 | 1069 | # Number of correctly transcribed events per class, onset within a +/-100 ms range 1070 | Ncorr = numpy.zeros((len(self.class_list), 1)) 1071 | NcorrOff = numpy.zeros((len(self.class_list), 1)) 1072 | 1073 | for j in range(0, len(annotated_ground_truth)): 1074 | for i in range(0, len(system_output)): 1075 | if annotated_ground_truth[j]['event_label'] == system_output[i]['event_label'] and ( 1076 | math.fabs( 1077 | annotated_ground_truth[j]['event_onset'] - system_output[i]['event_onset']) <= 0.1): 1078 | pos = self.class_list.index(system_output[i]['event_label']) 1079 | Ncorr[pos] += 1 1080 | 1081 | # If offset within a +/-100 ms range or 
within 50% of ground-truth event's duration 1082 | if math.fabs(annotated_ground_truth[j]['event_offset'] - system_output[i]['event_offset']) <= max( 1083 | 0.1, 0.5 * ( 1084 | annotated_ground_truth[j]['event_offset'] - annotated_ground_truth[j][ 1085 | 'event_onset'])): 1086 | pos = self.class_list.index(system_output[i]['event_label']) 1087 | NcorrOff[pos] += 1 1088 | 1089 | break # In order to not evaluate duplicates 1090 | 1091 | # Compute onset-only class-wise event-based metrics 1092 | eps = numpy.spacing(1) 1093 | results = { 1094 | 'onset': {}, 1095 | 'onset-offset': {}, 1096 | } 1097 | 1098 | Nfp = Ntot - Ncorr 1099 | Nfn = Nref - Ncorr 1100 | Nsubs = numpy.minimum(Nfp, Nfn) 1101 | tempRec = Ncorr[I] / (Nref[I] + eps) 1102 | tempPre = Ncorr[I] / (Ntot[I] + eps) 1103 | results['onset']['Rec'] = numpy.mean(tempRec) 1104 | results['onset']['Pre'] = numpy.mean(tempPre) 1105 | tempF = 2 * ((tempPre * tempRec) / (tempPre + tempRec + eps)) 1106 | results['onset']['F'] = numpy.mean(tempF) 1107 | tempAEER = (Nfn[I] + Nfp[I] + Nsubs[I]) / (Nref[I] + eps) 1108 | results['onset']['AEER'] = numpy.mean(tempAEER) 1109 | 1110 | # Compute onset-offset class-wise event-based metrics 1111 | NfpOff = Ntot - NcorrOff 1112 | NfnOff = Nref - NcorrOff 1113 | NsubsOff = numpy.minimum(NfpOff, NfnOff) 1114 | tempRecOff = NcorrOff[I] / (Nref[I] + eps) 1115 | tempPreOff = NcorrOff[I] / (Ntot[I] + eps) 1116 | results['onset-offset']['Rec'] = numpy.mean(tempRecOff) 1117 | results['onset-offset']['Pre'] = numpy.mean(tempPreOff) 1118 | tempFOff = 2 * ((tempPreOff * tempRecOff) / (tempPreOff + tempRecOff + eps)) 1119 | results['onset-offset']['F'] = numpy.mean(tempFOff) 1120 | tempAEEROff = (NfnOff[I] + NfpOff[I] + NsubsOff[I]) / (Nref[I] + eps) 1121 | results['onset-offset']['AEER'] = numpy.mean(tempAEEROff) 1122 | 1123 | return results 1124 | 1125 | 1126 | def main(argv): 1127 | # Examples to show usage and required data structures 1128 | class_list = ['class1', 'class2', 'class3'] 1129 | system_output = [ 1130 | { 1131 | 'event_label': 'class1', 1132 | 'event_onset': 0.1, 1133 | 'event_offset': 1.0 1134 | }, 1135 | { 1136 | 'event_label': 'class2', 1137 | 'event_onset': 4.1, 1138 | 'event_offset': 4.7 1139 | }, 1140 | { 1141 | 'event_label': 'class3', 1142 | 'event_onset': 5.5, 1143 | 'event_offset': 6.7 1144 | } 1145 | ] 1146 | annotated_groundtruth = [ 1147 | { 1148 | 'event_label': 'class1', 1149 | 'event_onset': 0.1, 1150 | 'event_offset': 1.0 1151 | }, 1152 | { 1153 | 'event_label': 'class2', 1154 | 'event_onset': 4.2, 1155 | 'event_offset': 5.4 1156 | }, 1157 | { 1158 | 'event_label': 'class3', 1159 | 'event_onset': 5.5, 1160 | 'event_offset': 6.7 1161 | } 1162 | ] 1163 | dcase2013metric = DCASE2013_EventDetection_Metrics(class_list=class_list) 1164 | 1165 | print 'DCASE2013' 1166 | print 'Frame-based:', dcase2013metric.frame_based(system_output=system_output, 1167 | annotated_ground_truth=annotated_groundtruth) 1168 | print 'Event-based:', dcase2013metric.event_based(system_output=system_output, 1169 | annotated_ground_truth=annotated_groundtruth) 1170 | print 'Class-based:', dcase2013metric.class_based(system_output=system_output, 1171 | annotated_ground_truth=annotated_groundtruth) 1172 | 1173 | dcase2016_metric = DCASE2016_EventDetection_SegmentBasedMetrics(class_list=class_list) 1174 | print 'DCASE2016' 1175 | print dcase2016_metric.evaluate(system_output=system_output, annotated_ground_truth=annotated_groundtruth).results() 1176 | 1177 | 1178 | if __name__ == "__main__": 1179 | 
sys.exit(main(sys.argv)) 1180 | -------------------------------------------------------------------------------- /src/features.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | import numpy 5 | import librosa 6 | import scipy 7 | 8 | 9 | def feature_extraction(y, fs=44100, statistics=True, include_mfcc0=True, include_delta=True, 10 | include_acceleration=True, mfcc_params=None, delta_params=None, acceleration_params=None): 11 | """Feature extraction, MFCC based features 12 | 13 | Outputs features in dict, format: 14 | 15 | { 16 | 'feat': feature_matrix [shape=(frame count, feature vector size)], 17 | 'stat': { 18 | 'mean': numpy.mean(feature_matrix, axis=0), 19 | 'std': numpy.std(feature_matrix, axis=0), 20 | 'N': feature_matrix.shape[0], 21 | 'S1': numpy.sum(feature_matrix, axis=0), 22 | 'S2': numpy.sum(feature_matrix ** 2, axis=0), 23 | } 24 | } 25 | 26 | Parameters 27 | ---------- 28 | y: numpy.array [shape=(signal_length, )] 29 | Audio 30 | 31 | fs: int > 0 [scalar] 32 | Sample rate 33 | (Default value=44100) 34 | 35 | statistics: bool 36 | Calculate feature statistics for extracted matrix 37 | (Default value=True) 38 | 39 | include_mfcc0: bool 40 | Include 0th MFCC coefficient into static coefficients. 41 | (Default value=True) 42 | 43 | include_delta: bool 44 | Include delta MFCC coefficients. 45 | (Default value=True) 46 | 47 | include_acceleration: bool 48 | Include acceleration MFCC coefficients. 49 | (Default value=True) 50 | 51 | mfcc_params: dict or None 52 | Parameters for extraction of static MFCC coefficients. 53 | 54 | delta_params: dict or None 55 | Parameters for extraction of delta MFCC coefficients. 56 | 57 | acceleration_params: dict or None 58 | Parameters for extraction of acceleration MFCC coefficients. 
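    A typical mfcc_params dict looks like the following (illustrative values;
    the actual defaults are read from the task YAML files):

        {'window': 'hamming_asymmetric', 'n_fft': 2048, 'win_length': 1764,
         'hop_length': 882, 'n_mfcc': 20, 'n_mels': 40, 'fmin': 0,
         'fmax': 22050, 'htk': False}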
59 | 60 | Returns 61 | ------- 62 | result: dict 63 | Feature dict 64 | 65 | """ 66 | 67 | eps = numpy.spacing(1) 68 | 69 | # Windowing function 70 | if mfcc_params['window'] == 'hamming_asymmetric': 71 | window = scipy.signal.hamming(mfcc_params['n_fft'], sym=False) 72 | elif mfcc_params['window'] == 'hamming_symmetric': 73 | window = scipy.signal.hamming(mfcc_params['n_fft'], sym=True) 74 | elif mfcc_params['window'] == 'hann_asymmetric': 75 | window = scipy.signal.hann(mfcc_params['n_fft'], sym=False) 76 | elif mfcc_params['window'] == 'hann_symmetric': 77 | window = scipy.signal.hann(mfcc_params['n_fft'], sym=True) 78 | else: 79 | window = None 80 | 81 | # Calculate Static Coefficients 82 | power_spectrogram = numpy.abs(librosa.stft(y + eps, 83 | n_fft=mfcc_params['n_fft'], 84 | win_length=mfcc_params['win_length'], 85 | hop_length=mfcc_params['hop_length'], 86 | center=True, 87 | window=window))**2 88 | mel_basis = librosa.filters.mel(sr=fs, 89 | n_fft=mfcc_params['n_fft'], 90 | n_mels=mfcc_params['n_mels'], 91 | fmin=mfcc_params['fmin'], 92 | fmax=mfcc_params['fmax'], 93 | htk=mfcc_params['htk']) 94 | mel_spectrum = numpy.dot(mel_basis, power_spectrogram) 95 | mfcc = librosa.feature.mfcc(S=librosa.logamplitude(mel_spectrum), 96 | n_mfcc=mfcc_params['n_mfcc']) 97 | 98 | # Collect the feature matrix 99 | feature_matrix = mfcc 100 | if include_delta: 101 | # Delta coefficients 102 | mfcc_delta = librosa.feature.delta(mfcc, **delta_params) 103 | 104 | # Add Delta Coefficients to feature matrix 105 | feature_matrix = numpy.vstack((feature_matrix, mfcc_delta)) 106 | 107 | if include_acceleration: 108 | # Acceleration coefficients (aka delta delta) 109 | mfcc_delta2 = librosa.feature.delta(mfcc, order=2, **acceleration_params) 110 | 111 | # Add Acceleration Coefficients to feature matrix 112 | feature_matrix = numpy.vstack((feature_matrix, mfcc_delta2)) 113 | 114 | if not include_mfcc0: 115 | # Omit mfcc0 116 | feature_matrix = feature_matrix[1:, :] 117 | 118 | feature_matrix = feature_matrix.T 119 | 120 | # Collect into data structure 121 | if statistics: 122 | return { 123 | 'feat': feature_matrix, 124 | 'stat': { 125 | 'mean': numpy.mean(feature_matrix, axis=0), 126 | 'std': numpy.std(feature_matrix, axis=0), 127 | 'N': feature_matrix.shape[0], 128 | 'S1': numpy.sum(feature_matrix, axis=0), 129 | 'S2': numpy.sum(feature_matrix ** 2, axis=0), 130 | } 131 | } 132 | else: 133 | return { 134 | 'feat': feature_matrix} 135 | 136 | 137 | class FeatureNormalizer(object): 138 | """Feature normalizer class 139 | 140 | Accumulates feature statistics 141 | 142 | Examples 143 | -------- 144 | 145 | >>> normalizer = FeatureNormalizer() 146 | >>> for feature_matrix in training_items: 147 | >>> normalizer.accumulate(feature_matrix) 148 | >>> 149 | >>> normalizer.finalize() 150 | 151 | >>> for feature_matrix in test_items: 152 | >>> feature_matrix_normalized = normalizer.normalize(feature_matrix) 153 | >>> # used the features 154 | 155 | """ 156 | def __init__(self, feature_matrix=None): 157 | """__init__ method. 
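        If a feature_matrix is given, statistics are computed and finalized
        immediately; with no argument, the accumulators start from zero and
        accumulate() / finalize() are used as in the class example above.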
158 | 
159 |         Parameters
160 |         ----------
161 |         feature_matrix : numpy.ndarray [shape=(frames, number of feature values)] or None
162 |             Feature matrix to be used in the initialization
163 | 
164 |         """
165 |         if feature_matrix is None:
166 |             self.N = 0
167 |             self.mean = 0
168 |             self.S1 = 0
169 |             self.S2 = 0
170 |             self.std = 0
171 |         else:
172 |             self.mean = numpy.mean(feature_matrix, axis=0)
173 |             self.std = numpy.std(feature_matrix, axis=0)
174 |             self.N = feature_matrix.shape[0]
175 |             self.S1 = numpy.sum(feature_matrix, axis=0)
176 |             self.S2 = numpy.sum(feature_matrix ** 2, axis=0)
177 |             self.finalize()
178 | 
179 |     def __enter__(self):
180 |         # Initialize Normalization class and return it
181 |         self.N = 0
182 |         self.mean = 0
183 |         self.S1 = 0
184 |         self.S2 = 0
185 |         self.std = 0
186 |         return self
187 | 
188 |     def __exit__(self, type, value, traceback):
189 |         # Finalize accumulated calculation
190 |         self.finalize()
191 | 
192 |     def accumulate(self, stat):
193 |         """Accumulate statistics
194 | 
195 |         Input is statistics dict, format:
196 | 
197 |             {
198 |                 'mean': numpy.mean(feature_matrix, axis=0),
199 |                 'std': numpy.std(feature_matrix, axis=0),
200 |                 'N': feature_matrix.shape[0],
201 |                 'S1': numpy.sum(feature_matrix, axis=0),
202 |                 'S2': numpy.sum(feature_matrix ** 2, axis=0),
203 |             }
204 | 
205 |         Parameters
206 |         ----------
207 |         stat : dict
208 |             Statistics dict
209 | 
210 |         Returns
211 |         -------
212 |         nothing
213 | 
214 |         """
215 |         self.N += stat['N']
216 |         self.mean += stat['mean']
217 |         self.S1 += stat['S1']
218 |         self.S2 += stat['S2']
219 | 
220 |     def finalize(self):
221 |         """Finalize statistics calculation
222 | 
223 |         Accumulated values are used to get mean and std for the seen feature data.
224 | 
225 |         Parameters
226 |         ----------
227 |         nothing
228 | 
229 |         Returns
230 |         -------
231 |         nothing
232 | 
233 |         """
234 | 
235 |         # Finalize statistics
236 |         self.mean = self.S1 / self.N
237 |         self.std = numpy.sqrt((self.N * self.S2 - (self.S1 * self.S1)) / (self.N * (self.N - 1)))
238 | 
239 |         # In case we have very brain-dead material we get std = NaN => 0.0
240 |         self.std = numpy.nan_to_num(self.std)
241 | 
242 |         self.mean = numpy.reshape(self.mean, [1, -1])
243 |         self.std = numpy.reshape(self.std, [1, -1])
244 | 
245 |     def normalize(self, feature_matrix):
246 |         """Normalize feature matrix with internal statistics of the class
247 | 
248 |         Parameters
249 |         ----------
250 |         feature_matrix : numpy.ndarray [shape=(frames, number of feature values)]
251 |             Feature matrix to be normalized
252 | 
253 |         Returns
254 |         -------
255 |         feature_matrix : numpy.ndarray [shape=(frames, number of feature values)]
256 |             Normalized feature matrix
257 | 
258 |         """
259 | 
260 |         return (feature_matrix - self.mean) / self.std
261 | 
--------------------------------------------------------------------------------
/src/files.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 | 
4 | import os
5 | import wave
6 | import numpy
7 | import csv
8 | import cPickle as pickle
9 | import librosa
10 | import yaml
11 | import soundfile
12 | 
13 | def load_audio(filename, mono=True, fs=44100):
14 |     """Load audio file into numpy array
15 | 
16 |     Supports 24-bit wav format, and flac audio through librosa.
17 | 
18 |     Parameters
19 |     ----------
20 |     filename: str
21 |         Path to audio file
22 | 
23 |     mono : bool
24 |         In case of multi-channel audio, channels are averaged into a single channel.
25 |         (Default value=True)
26 | 
27 |     fs : int > 0 [scalar]
28 |         Target sample rate; if the input audio does not match it, the audio is resampled.
29 |         (Default value=44100)
30 | 
31 |     Returns
32 |     -------
33 |     audio_data : numpy.ndarray [shape=(signal_length, channel)]
34 |         Audio
35 | 
36 |     sample_rate : integer
37 |         Sample rate
38 | 
39 |     """
40 | 
41 |     file_base, file_extension = os.path.splitext(filename)
42 |     if file_extension == '.wav':
43 |         # Load audio
44 |         audio_data, sample_rate = soundfile.read(filename)
45 |         audio_data = audio_data.T
46 | 
47 |         if mono and len(audio_data.shape) > 1:
48 |             # Down-mix audio
49 |             audio_data = numpy.mean(audio_data, axis=0)
50 | 
51 |         # Resample
52 |         if fs != sample_rate:
53 |             audio_data = librosa.core.resample(audio_data, sample_rate, fs)
54 |             sample_rate = fs
55 | 
56 |         return audio_data, sample_rate
57 | 
58 |     elif file_extension == '.flac':
59 |         audio_data, sample_rate = librosa.load(filename, sr=fs, mono=mono)
60 | 
61 |         return audio_data, sample_rate
62 | 
63 |     return None, None
64 | 
65 | 
66 | def load_event_list(file):
67 |     """Load event list from tab delimited text file (csv-formatted)
68 | 
69 |     Supported input formats:
70 | 
71 |     - [event_onset (float)][tab][event_offset (float)]
72 |     - [event_onset (float)][tab][event_offset (float)][tab][event_label (string)]
73 |     - [file (string)][tab][scene_label (string)][tab][event_onset (float)][tab][event_offset (float)][tab][event_label (string)]
74 | 
75 |     Event dict format:
76 | 
77 |     {
78 |         'file': 'filename',
79 |         'scene_label': 'office',
80 |         'event_onset': 0.0,
81 |         'event_offset': 1.0,
82 |         'event_label': 'people_walking',
83 |     }
84 | 
85 |     Parameters
86 |     ----------
87 |     file : str
88 |         Path to the event list in text format (csv)
89 | 
90 |     Returns
91 |     -------
92 |     data : list of event dicts
93 |         List containing event dicts
94 | 
95 |     """
96 |     data = []
97 |     with open(file, 'rt') as f:
98 |         for row in csv.reader(f, delimiter='\t'):
99 |             if len(row) == 2:
100 |                 data.append(
101 |                     {
102 |                         'event_onset': float(row[0]),
103 |                         'event_offset': float(row[1])
104 |                     }
105 |                 )
106 |             elif len(row) == 3:
107 |                 data.append(
108 |                     {
109 |                         'event_onset': float(row[0]),
110 |                         'event_offset': float(row[1]),
111 |                         'event_label': row[2]
112 |                     }
113 |                 )
114 |             elif len(row) == 4:
115 |                 data.append(
116 |                     {
117 |                         'file': row[0],
118 |                         'event_onset': float(row[1]),
119 |                         'event_offset': float(row[2]),
120 |                         'event_label': row[3]
121 |                     }
122 |                 )
123 |             elif len(row) == 5:
124 |                 data.append(
125 |                     {
126 |                         'file': row[0],
127 |                         'scene_label': row[1],
128 |                         'event_onset': float(row[2]),
129 |                         'event_offset': float(row[3]),
130 |                         'event_label': row[4]
131 |                     }
132 |                 )
133 |     return data
134 | 
135 | 
136 | def save_data(filename, data):
137 |     """Save variable into a pickle file
138 | 
139 |     Parameters
140 |     ----------
141 |     filename: str
142 |         Path to file
143 | 
144 |     data: list or dict
145 |         Data to be saved.
146 | 
147 |     Returns
148 |     -------
149 |     nothing
150 | 
151 |     """
152 | 
153 |     pickle.dump(data, open(filename, 'wb'), protocol=pickle.HIGHEST_PROTOCOL)
154 | 
155 | 
156 | def load_data(filename):
157 |     """Load data from pickle file
158 | 
159 |     Parameters
160 |     ----------
161 |     filename: str
162 |         Path to file
163 | 
164 |     Returns
165 |     -------
166 |     data: list or dict
167 |         Loaded file.
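        Example (illustrative round trip with save_data above; the file name
        is arbitrary):

        >>> save_data('mfcc_fold1.cpickle', {'feat': feature_matrix})
        >>> feature_matrix = load_data('mfcc_fold1.cpickle')['feat']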
168 | 169 | """ 170 | 171 | return pickle.load(open(filename, "rb")) 172 | 173 | 174 | def save_parameters(filename, parameters): 175 | """Save parameters to YAML-file 176 | 177 | Parameters 178 | ---------- 179 | filename: str 180 | Path to file 181 | parameters: dict 182 | Dict containing parameters to be saved 183 | 184 | Returns 185 | ------- 186 | Nothing 187 | 188 | """ 189 | 190 | with open(filename, 'w') as outfile: 191 | outfile.write(yaml.dump(parameters, default_flow_style=False)) 192 | 193 | 194 | def load_parameters(filename): 195 | """Load parameters from YAML-file 196 | 197 | Parameters 198 | ---------- 199 | filename: str 200 | Path to file 201 | 202 | Returns 203 | ------- 204 | parameters: dict 205 | Dict containing loaded parameters 206 | 207 | Raises 208 | ------- 209 | IOError 210 | file is not found. 211 | 212 | """ 213 | 214 | if os.path.isfile(filename): 215 | with open(filename, 'r') as f: 216 | return yaml.load(f) 217 | else: 218 | raise IOError("Parameter file not found [%s]" % filename) 219 | 220 | 221 | def save_text(filename, text): 222 | """Save text into text file. 223 | 224 | Parameters 225 | ---------- 226 | filename: str 227 | Path to file 228 | 229 | text: str 230 | String to be saved. 231 | 232 | Returns 233 | ------- 234 | nothing 235 | 236 | """ 237 | 238 | with open(filename, "w") as text_file: 239 | text_file.write(text) 240 | 241 | 242 | def load_text(filename): 243 | """Load text file 244 | 245 | Parameters 246 | ---------- 247 | filename: str 248 | Path to file 249 | 250 | Returns 251 | ------- 252 | text: string 253 | Loaded text. 254 | 255 | """ 256 | 257 | with open(filename, 'r') as f: 258 | return f.readlines() 259 | -------------------------------------------------------------------------------- /src/general.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | import os 5 | import hashlib 6 | import json 7 | 8 | 9 | def check_path(path): 10 | """Check if path exists, if not creates one 11 | 12 | Parameters 13 | ---------- 14 | path : str 15 | Path to be checked. 
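        Intermediate directories are created as well (os.makedirs), so nested
        targets such as the hash-named feature directories used by the systems
        are created in one call.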
16 | 
17 |     Returns
18 |     -------
19 |     Nothing
20 | 
21 |     """
22 | 
23 |     if not os.path.isdir(path):
24 |         os.makedirs(path)
25 | 
26 | 
27 | def get_parameter_hash(params):
28 |     """Get unique hash string (md5) for given parameter dict
29 | 
30 |     Parameters
31 |     ----------
32 |     params : dict
33 |         Input parameters
34 | 
35 |     Returns
36 |     -------
37 |     md5_hash : str
38 |         Unique hash for parameter dict
39 | 
40 |     """
41 | 
42 |     md5 = hashlib.md5()
43 |     md5.update(str(json.dumps(params, sort_keys=True)))
44 |     return md5.hexdigest()
45 | 
46 | 
--------------------------------------------------------------------------------
/src/sound_event_detection.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 | 
4 | import numpy
5 | 
6 | 
7 | def event_detection(feature_data, model_container, hop_length_seconds=0.01, smoothing_window_length_seconds=1.0, decision_threshold=0.0, minimum_event_length=0.1, minimum_event_gap=0.1):
8 |     """Sound event detection
9 | 
10 |     Parameters
11 |     ----------
12 |     feature_data : numpy.ndarray [shape=(t, n_features)]
13 |         Feature matrix
14 | 
15 |     model_container : dict
16 |         Sound event model pairs [positive and negative] in dict
17 | 
18 |     hop_length_seconds : float > 0.0
19 |         Feature hop length in seconds, used to convert feature index into time-stamp
20 |         (Default value=0.01)
21 | 
22 |     smoothing_window_length_seconds : float > 0.0
23 |         Accumulation window (look-back) length; within the window likelihoods are accumulated.
24 |         (Default value=1.0)
25 | 
26 |     decision_threshold : float > 0.0
27 |         Likelihood ratio threshold for making the decision.
28 |         (Default value=0.0)
29 | 
30 |     minimum_event_length : float > 0.0
31 |         Minimum event length in seconds; events shorter than this are filtered out from the output.
32 |         (Default value=0.1)
33 | 
34 |     minimum_event_gap : float > 0.0
35 |         Minimum allowed gap in seconds between events of the same event label class.
36 |         (Default value=0.1)
37 | 
38 |     Returns
39 |     -------
40 |     results : list of (event onset, event offset, event label) tuples
41 |         Detection result, event list
42 | 
43 |     """
44 | 
45 |     smoothing_window = int(smoothing_window_length_seconds / hop_length_seconds)
46 | 
47 |     results = []
48 |     for event_label in model_container['models']:
49 |         positive = model_container['models'][event_label]['positive'].score_samples(feature_data)[0]
50 |         negative = model_container['models'][event_label]['negative'].score_samples(feature_data)[0]
51 | 
52 |         # Let's keep the system causal and use look-back while smoothing (accumulating) likelihoods
53 |         for stop_id in range(0, feature_data.shape[0]):
54 |             start_id = stop_id - smoothing_window
55 |             if start_id < 0:
56 |                 start_id = 0
57 |             positive[start_id] = sum(positive[start_id:stop_id])
58 |             negative[start_id] = sum(negative[start_id:stop_id])
59 | 
60 |         likelihood_ratio = positive - negative
61 |         event_activity = likelihood_ratio > decision_threshold
62 | 
63 |         # Find contiguous segments and convert frame-ids into times
64 |         event_segments = contiguous_regions(event_activity) * hop_length_seconds
65 | 
66 |         # Post-process the event segments
67 |         event_segments = postprocess_event_segments(event_segments=event_segments,
68 |                                                     minimum_event_length=minimum_event_length,
69 |                                                     minimum_event_gap=minimum_event_gap)
70 | 
71 |         for event in event_segments:
72 |             results.append((event[0], event[1], event_label))
73 | 
74 |     return results
75 | 
76 | 
77 | def contiguous_regions(activity_array):
78 |     """Find contiguous regions from bool valued numpy.array.
79 |     Transforms boolean values for each frame into pairs of onsets and offsets.
80 | 
81 |     Parameters
82 |     ----------
83 |     activity_array : numpy.array [shape=(t)]
84 |         Event activity array, bool values
85 | 
86 |     Returns
87 |     -------
88 |     change_indices : numpy.ndarray [shape=(number of regions, 2)]
89 |         Onset and offset index pairs as rows of the matrix
90 | 
91 |     """
92 | 
93 |     # Find the changes in the activity_array
94 |     change_indices = numpy.diff(activity_array).nonzero()[0]
95 | 
96 |     # Shift change_index with one, focus on frame after the change.
97 |     change_indices += 1
98 | 
99 |     if activity_array[0]:
100 |         # If the first element of activity_array is True add 0 at the beginning
101 |         change_indices = numpy.r_[0, change_indices]
102 | 
103 |     if activity_array[-1]:
104 |         # If the last element of activity_array is True, add the length of the array
105 |         change_indices = numpy.r_[change_indices, activity_array.size]
106 | 
107 |     # Reshape the result into two columns
108 |     return change_indices.reshape((-1, 2))
109 | 
110 | 
111 | def postprocess_event_segments(event_segments, minimum_event_length=0.1, minimum_event_gap=0.1):
112 |     """Post-process event segment list. Makes sure that minimum event length and minimum event gap conditions are met.
113 | 
114 |     Parameters
115 |     ----------
116 |     event_segments : numpy.ndarray [shape=(number of events, 2)]
117 |         Event segments, first column has the onset, second has the offset.
118 | 
119 |     minimum_event_length : float > 0.0
120 |         Minimum event length in seconds; events shorter than this are filtered out from the output.
121 |         (Default value=0.1)
122 | 
123 |     minimum_event_gap : float > 0.0
124 |         Minimum allowed gap in seconds between events of the same event label class.
125 |         (Default value=0.1)
126 | 
127 |     Returns
128 |     -------
129 |     event_results : list of (onset, offset) tuples
130 |         Post-processed event segments
131 | 
132 |     """
133 | 
134 |     # 1. remove short events
135 |     event_results_1 = []
136 |     for event in event_segments:
137 |         if event[1]-event[0] >= minimum_event_length:
138 |             event_results_1.append((event[0], event[1]))
139 | 
140 |     if len(event_results_1):
141 |         # 2.
remove small gaps between events
142 |         event_results_2 = []
143 | 
144 |         # Load first event into event buffer
145 |         buffered_event_onset = event_results_1[0][0]
146 |         buffered_event_offset = event_results_1[0][1]
147 |         for i in range(1, len(event_results_1)):
148 |             if event_results_1[i][0] - buffered_event_offset > minimum_event_gap:
149 |                 # The gap between the current event and the buffered one is bigger than the minimum event gap,
150 |                 # store event, and replace buffered event
151 |                 event_results_2.append((buffered_event_onset, buffered_event_offset))
152 |                 buffered_event_onset = event_results_1[i][0]
153 |                 buffered_event_offset = event_results_1[i][1]
154 |             else:
155 |                 # The gap between the current event and the buffered one is smaller than the minimum event gap,
156 |                 # extend the buffered event until the current offset
157 |                 buffered_event_offset = event_results_1[i][1]
158 | 
159 |         # Store last event from buffer
160 |         event_results_2.append((buffered_event_onset, buffered_event_offset))
161 | 
162 |         return event_results_2
163 |     else:
164 |         return event_results_1
165 | 
--------------------------------------------------------------------------------
/src/ui.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 | 
4 | import sys
5 | import itertools
6 | 
7 | spinner = itertools.cycle(['-', '\\', '|', '/'])
8 | 
9 | 
10 | def title(text):
11 |     """Prints title
12 | 
13 |     Parameters
14 |     ----------
15 |     text : str
16 |         Title
17 | 
18 |     Returns
19 |     -------
20 |     Nothing
21 | 
22 |     """
23 | 
24 |     print "--------------------------------"
25 |     print text
26 |     print "--------------------------------"
27 | 
28 | 
29 | def section_header(text):
30 |     """Prints section header
31 | 
32 |     Parameters
33 |     ----------
34 |     text : str
35 |         Section header
36 | 
37 |     Returns
38 |     -------
39 |     Nothing
40 | 
41 |     """
42 | 
43 |     print " "
44 |     print text
45 |     print "================================"
46 | 
47 | 
48 | def foot():
49 |     """Prints footer
50 | 
51 |     Parameters
52 |     ----------
53 |     Nothing
54 | 
55 |     Returns
56 |     -------
57 |     Nothing
58 | 
59 |     """
60 | 
61 |     print "  [Done]           "
62 | 
63 | 
64 | def progress(title_text=None, fold=None, percentage=None, note=None, label=None):
65 |     """Prints progress line
66 | 
67 |     Parameters
68 |     ----------
69 |     title_text : str or None
70 |         Title
71 | 
72 |     fold : int > 0 [scalar] or None
73 |         Fold number
74 | 
75 |     percentage : float [0-1] or None
76 |         Progress percentage.
77 | 78 | note : str or None 79 | Note 80 | 81 | label : str or None 82 | Label 83 | 84 | Returns 85 | ------- 86 | Nothing 87 | 88 | """ 89 | 90 | if title_text is not None and fold is not None and percentage is not None and note is not None and label is None: 91 | print " {:2s} {:20s} fold[{:1d}] [{:3.0f}%] [{:20s}] \r".format(spinner.next(), title_text, fold,percentage * 100, note), 92 | 93 | elif title_text is not None and fold is not None and percentage is None and note is not None and label is None: 94 | print " {:2s} {:20s} fold[{:1d}] [{:20s}] \r".format(spinner.next(), title_text, fold, note), 95 | 96 | elif title_text is not None and fold is None and percentage is not None and note is not None and label is None: 97 | print " {:2s} {:20s} [{:3.0f}%] [{:20s}] \r".format(spinner.next(), title_text, percentage * 100, note), 98 | 99 | elif title_text is not None and fold is None and percentage is not None and note is None and label is None: 100 | print " {:2s} {:20s} [{:3.0f}%] \r".format(spinner.next(), title_text, percentage * 100), 101 | 102 | elif title_text is not None and fold is None and percentage is None and note is not None and label is None: 103 | print " {:2s} {:20s} [{:20s}] \r".format(spinner.next(), title_text, note), 104 | 105 | elif title_text is not None and fold is None and percentage is None and note is not None and label is not None: 106 | print " {:2s} {:20s} [{:20s}] [{:20s}] \r".format(spinner.next(), title_text, label, note), 107 | 108 | elif title_text is not None and fold is None and percentage is not None and note is not None and label is not None: 109 | print " {:2s} {:20s} [{:20s}] [{:3.0f}%] [{:20s}] \r".format(spinner.next(), title_text, label, percentage * 100, note), 110 | 111 | elif title_text is not None and fold is not None and percentage is not None and note is not None and label is not None: 112 | print " {:2s} {:20s} fold[{:1d}] [{:10s}] [{:3.0f}%] [{:20s}] \r".format(spinner.next(), title_text, fold, label, percentage * 100, note), 113 | 114 | elif title_text is not None and fold is not None and percentage is None and note is None and label is not None: 115 | print " {:2s} {:20s} fold[{:1d}] [{:10s}] \r".format(spinner.next(), title_text, fold, label), 116 | 117 | sys.stdout.flush() 118 | -------------------------------------------------------------------------------- /task1_scene_classification.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | # 4 | # DCASE 2016::Acoustic Scene Classification / Baseline System 5 | 6 | from src.ui import * 7 | from src.general import * 8 | from src.files import * 9 | 10 | from src.features import * 11 | from src.dataset import * 12 | from src.evaluation import * 13 | 14 | import numpy 15 | import csv 16 | import argparse 17 | import textwrap 18 | import copy 19 | 20 | from sklearn import mixture 21 | 22 | __version_info__ = ('1', '0', '0') 23 | __version__ = '.'.join(__version_info__) 24 | 25 | 26 | def main(argv): 27 | numpy.random.seed(123456) # let's make randomization predictable 28 | 29 | parser = argparse.ArgumentParser( 30 | prefix_chars='-+', 31 | formatter_class=argparse.RawDescriptionHelpFormatter, 32 | description=textwrap.dedent('''\ 33 | DCASE 2016 34 | Task 1: Acoustic Scene Classification 35 | Baseline system 36 | --------------------------------------------- 37 | Tampere University of Technology / Audio Research Group 38 | Author: Toni Heittola ( toni.heittola@tut.fi ) 39 | 40 | System description 41 | 
This is a baseline implementation for the DCASE 2016 challenge acoustic scene classification task.
42 |             Features: MFCC (static+delta+acceleration)
43 |             Classifier: GMM
44 | 
45 |         '''))
46 | 
47 |     # Setup argument handling
48 |     parser.add_argument("-development", help="Use the system in the development mode", action='store_true',
49 |                         default=False, dest='development')
50 |     parser.add_argument("-challenge", help="Use the system in the challenge mode", action='store_true',
51 |                         default=False, dest='challenge')
52 | 
53 |     parser.add_argument('-v', '--version', action='version', version='%(prog)s ' + __version__)
54 |     args = parser.parse_args()
55 | 
56 |     # Load parameters from config file
57 |     parameter_file = os.path.join(os.path.dirname(os.path.realpath(__file__)),
58 |                                   os.path.splitext(os.path.basename(__file__))[0]+'.yaml')
59 |     params = load_parameters(parameter_file)
60 |     params = process_parameters(params)
61 |     make_folders(params)
62 | 
63 |     title("DCASE 2016::Acoustic Scene Classification / Baseline System")
64 | 
65 |     # Check if mode is defined
66 |     if not (args.development or args.challenge):
67 |         args.development = True
68 |         args.challenge = False
69 | 
70 |     dataset_evaluation_mode = 'folds'
71 |     if args.development and not args.challenge:
72 |         print "Running system in development mode"
73 |         dataset_evaluation_mode = 'folds'
74 |     elif not args.development and args.challenge:
75 |         print "Running system in challenge mode"
76 |         dataset_evaluation_mode = 'full'
77 | 
78 |     # Get dataset container class
79 |     dataset = eval(params['general']['development_dataset'])(data_path=params['path']['data'])
80 | 
81 |     # Fetch data over internet and setup the data
82 |     # ==================================================
83 |     if params['flow']['initialize']:
84 |         dataset.fetch()
85 | 
86 |     # Extract features for all audio files in the dataset
87 |     # ==================================================
88 |     if params['flow']['extract_features']:
89 |         section_header('Feature extraction')
90 | 
91 |         # Collect files in train sets and test sets
92 |         files = []
93 |         for fold in dataset.folds(mode=dataset_evaluation_mode):
94 |             for item_id, item in enumerate(dataset.train(fold)):
95 |                 if item['file'] not in files:
96 |                     files.append(item['file'])
97 |             for item_id, item in enumerate(dataset.test(fold)):
98 |                 if item['file'] not in files:
99 |                     files.append(item['file'])
100 |         files = sorted(files)
101 | 
102 |         # Go through files and make sure all features are extracted
103 |         do_feature_extraction(files=files,
104 |                               dataset=dataset,
105 |                               feature_path=params['path']['features'],
106 |                               params=params['features'],
107 |                               overwrite=params['general']['overwrite'])
108 | 
109 |         foot()
110 | 
111 |     # Prepare feature normalizers
112 |     # ==================================================
113 |     if params['flow']['feature_normalizer']:
114 |         section_header('Feature normalizer')
115 | 
116 |         do_feature_normalization(dataset=dataset,
117 |                                  feature_normalizer_path=params['path']['feature_normalizers'],
118 |                                  feature_path=params['path']['features'],
119 |                                  dataset_evaluation_mode=dataset_evaluation_mode,
120 |                                  overwrite=params['general']['overwrite'])
121 | 
122 |         foot()
123 | 
124 |     # System training
125 |     # ==================================================
126 |     if params['flow']['train_system']:
127 |         section_header('System training')
128 | 
129 |         do_system_training(dataset=dataset,
130 |                            model_path=params['path']['models'],
131 |                            feature_normalizer_path=params['path']['feature_normalizers'],
132 |                            feature_path=params['path']['features'],
133
| feature_params=params['features'], 134 | classifier_params=params['classifier']['parameters'], 135 | classifier_method=params['classifier']['method'], 136 | dataset_evaluation_mode=dataset_evaluation_mode, 137 | clean_audio_errors=params['classifier']['audio_error_handling']['clean_data'], 138 | overwrite=params['general']['overwrite'] 139 | ) 140 | 141 | foot() 142 | 143 | # System evaluation in development mode 144 | if args.development and not args.challenge: 145 | 146 | # System testing 147 | # ================================================== 148 | if params['flow']['test_system']: 149 | section_header('System testing') 150 | 151 | do_system_testing(dataset=dataset, 152 | feature_path=params['path']['features'], 153 | result_path=params['path']['results'], 154 | model_path=params['path']['models'], 155 | feature_params=params['features'], 156 | dataset_evaluation_mode=dataset_evaluation_mode, 157 | classifier_method=params['classifier']['method'], 158 | clean_audio_errors=params['recognizer']['audio_error_handling']['clean_data'], 159 | overwrite=params['general']['overwrite'] 160 | ) 161 | 162 | foot() 163 | 164 | # System evaluation 165 | # ================================================== 166 | if params['flow']['evaluate_system']: 167 | section_header('System evaluation') 168 | 169 | do_system_evaluation(dataset=dataset, 170 | dataset_evaluation_mode=dataset_evaluation_mode, 171 | result_path=params['path']['results']) 172 | 173 | foot() 174 | 175 | # System evaluation with challenge data 176 | elif not args.development and args.challenge: 177 | # Fetch data over internet and setup the data 178 | challenge_dataset = eval(params['general']['challenge_dataset'])(data_path=params['path']['data']) 179 | if params['general']['challenge_submission_mode']: 180 | result_path = params['path']['challenge_results'] 181 | else: 182 | result_path = params['path']['results'] 183 | 184 | if params['flow']['initialize']: 185 | challenge_dataset.fetch() 186 | 187 | if not params['general']['challenge_submission_mode']: 188 | section_header('Feature extraction for challenge data') 189 | 190 | # Extract feature if not running in challenge submission mode. 
191 |             # Collect test files
192 |             files = []
193 |             for fold in challenge_dataset.folds(mode=dataset_evaluation_mode):
194 |                 for item_id, item in enumerate(challenge_dataset.test(fold)):
195 |                     if item['file'] not in files:
196 |                         files.append(item['file'])
197 |             files = sorted(files)
198 | 
199 |             # Go through files and make sure all features are extracted
200 |             do_feature_extraction(files=files,
201 |                                   dataset=challenge_dataset,
202 |                                   feature_path=params['path']['features'],
203 |                                   params=params['features'],
204 |                                   overwrite=params['general']['overwrite'])
205 |             foot()
206 | 
207 |         # System testing
208 |         if params['flow']['test_system']:
209 |             section_header('System testing with challenge data')
210 | 
211 |             do_system_testing(dataset=challenge_dataset,
212 |                               feature_path=params['path']['features'],
213 |                               result_path=result_path,
214 |                               model_path=params['path']['models'],
215 |                               feature_params=params['features'],
216 |                               dataset_evaluation_mode=dataset_evaluation_mode,
217 |                               classifier_method=params['classifier']['method'],
218 |                               clean_audio_errors=params['recognizer']['audio_error_handling']['clean_data'],
219 |                               overwrite=params['general']['overwrite'] or params['general']['challenge_submission_mode']
220 |                               )
221 |             foot()
222 | 
223 |             if params['general']['challenge_submission_mode']:
224 |                 print " "
225 |                 print "Your results for the challenge data are stored at ["+params['path']['challenge_results']+"]"
226 |                 print " "
227 | 
228 |         # System evaluation if not in challenge submission mode
229 |         if params['flow']['evaluate_system'] and not params['general']['challenge_submission_mode']:
230 |             section_header('System evaluation with challenge data')
231 |             do_system_evaluation(dataset=challenge_dataset,
232 |                                  dataset_evaluation_mode=dataset_evaluation_mode,
233 |                                  result_path=result_path)
234 | 
235 |             foot()
236 | 
237 |     return 0
238 | 
239 | 
240 | def process_parameters(params):
241 |     """Parameter post-processing.
242 | 
243 |     Parameters
244 |     ----------
245 |     params : dict
246 |         parameters in dict
247 | 
248 |     Returns
249 |     -------
250 |     params : dict
251 |         processed parameters
252 | 
253 |     """
254 | 
255 |     # Convert feature extraction window and hop sizes from seconds to samples
256 |     params['features']['mfcc']['win_length'] = int(params['features']['win_length_seconds'] * params['features']['fs'])
257 |     params['features']['mfcc']['hop_length'] = int(params['features']['hop_length_seconds'] * params['features']['fs'])
258 | 
259 |     # Copy parameters for current classifier method
260 |     params['classifier']['parameters'] = params['classifier_parameters'][params['classifier']['method']]
261 | 
262 |     # Hash
263 |     params['features']['hash'] = get_parameter_hash(params['features'])
264 | 
265 |     # Let's keep hashes backwards compatible after new parameters were added.
266 |     # Audio error handling parameters are included in the hash only when they are enabled.
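    # Illustrative note: get_parameter_hash() returns a 32-character md5 hex
    # digest, so with the default YAML paths the results below end up under
    # something like <base>/results/<features_hash>/<classifier_hash>/<recognizer_hash>/.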
267 | classifier_params = copy.copy(params['classifier']) 268 | if not classifier_params['audio_error_handling']['clean_data']: 269 | del classifier_params['audio_error_handling'] 270 | params['classifier']['hash'] = get_parameter_hash(classifier_params) 271 | 272 | params['recognizer']['hash'] = get_parameter_hash(params['recognizer']) 273 | 274 | # Paths 275 | params['path']['data'] = os.path.join(os.path.dirname(os.path.realpath(__file__)), params['path']['data']) 276 | params['path']['base'] = os.path.join(os.path.dirname(os.path.realpath(__file__)), params['path']['base']) 277 | 278 | # Features 279 | params['path']['features_'] = params['path']['features'] 280 | params['path']['features'] = os.path.join(params['path']['base'], 281 | params['path']['features'], 282 | params['features']['hash']) 283 | 284 | # Feature normalizers 285 | params['path']['feature_normalizers_'] = params['path']['feature_normalizers'] 286 | params['path']['feature_normalizers'] = os.path.join(params['path']['base'], 287 | params['path']['feature_normalizers'], 288 | params['features']['hash']) 289 | 290 | # Models 291 | params['path']['models_'] = params['path']['models'] 292 | params['path']['models'] = os.path.join(params['path']['base'], 293 | params['path']['models'], 294 | params['features']['hash'], 295 | params['classifier']['hash']) 296 | # Results 297 | params['path']['results_'] = params['path']['results'] 298 | params['path']['results'] = os.path.join(params['path']['base'], 299 | params['path']['results'], 300 | params['features']['hash'], 301 | params['classifier']['hash'], 302 | params['recognizer']['hash']) 303 | 304 | return params 305 | 306 | 307 | def make_folders(params, parameter_filename='parameters.yaml'): 308 | """Create all needed folders, and saves parameters in yaml-file for easier manual browsing of data. 309 | 310 | Parameters 311 | ---------- 312 | params : dict 313 | parameters in dict 314 | 315 | parameter_filename : str 316 | filename to save parameters used to generate the folder name 317 | 318 | Returns 319 | ------- 320 | nothing 321 | 322 | """ 323 | 324 | # Check that target path exists, create if not 325 | check_path(params['path']['features']) 326 | check_path(params['path']['feature_normalizers']) 327 | check_path(params['path']['models']) 328 | check_path(params['path']['results']) 329 | 330 | # Save parameters into folders to help manual browsing of files. 
331 | 332 | # Features 333 | feature_parameter_filename = os.path.join(params['path']['features'], parameter_filename) 334 | if not os.path.isfile(feature_parameter_filename): 335 | save_parameters(feature_parameter_filename, params['features']) 336 | 337 | # Feature normalizers 338 | feature_normalizer_parameter_filename = os.path.join(params['path']['feature_normalizers'], parameter_filename) 339 | if not os.path.isfile(feature_normalizer_parameter_filename): 340 | save_parameters(feature_normalizer_parameter_filename, params['features']) 341 | 342 | # Models 343 | model_features_parameter_filename = os.path.join(params['path']['base'], 344 | params['path']['models_'], 345 | params['features']['hash'], 346 | parameter_filename) 347 | if not os.path.isfile(model_features_parameter_filename): 348 | save_parameters(model_features_parameter_filename, params['features']) 349 | 350 | model_models_parameter_filename = os.path.join(params['path']['base'], 351 | params['path']['models_'], 352 | params['features']['hash'], 353 | params['classifier']['hash'], 354 | parameter_filename) 355 | if not os.path.isfile(model_models_parameter_filename): 356 | save_parameters(model_models_parameter_filename, params['classifier']) 357 | 358 | # Results 359 | # Save parameters into folders to help manual browsing of files. 360 | result_features_parameter_filename = os.path.join(params['path']['base'], 361 | params['path']['results_'], 362 | params['features']['hash'], 363 | parameter_filename) 364 | if not os.path.isfile(result_features_parameter_filename): 365 | save_parameters(result_features_parameter_filename, params['features']) 366 | 367 | result_models_parameter_filename = os.path.join(params['path']['base'], 368 | params['path']['results_'], 369 | params['features']['hash'], 370 | params['classifier']['hash'], 371 | parameter_filename) 372 | if not os.path.isfile(result_models_parameter_filename): 373 | save_parameters(result_models_parameter_filename, params['classifier']) 374 | 375 | result_models_parameter_filename = os.path.join(params['path']['base'], 376 | params['path']['results_'], 377 | params['features']['hash'], 378 | params['classifier']['hash'], 379 | params['recognizer']['hash'], 380 | parameter_filename) 381 | if not os.path.isfile(result_models_parameter_filename): 382 | save_parameters(result_models_parameter_filename, params['recognizer']) 383 | 384 | def get_feature_filename(audio_file, path, extension='cpickle'): 385 | """Get feature filename 386 | 387 | Parameters 388 | ---------- 389 | audio_file : str 390 | audio file name from which the features are extracted 391 | 392 | path : str 393 | feature path 394 | 395 | extension : str 396 | file extension 397 | (Default value='cpickle') 398 | 399 | Returns 400 | ------- 401 | feature_filename : str 402 | full feature filename 403 | 404 | """ 405 | 406 | audio_filename = os.path.split(audio_file)[1] 407 | return os.path.join(path, os.path.splitext(audio_filename)[0] + '.' + extension) 408 | 409 | 410 | def get_feature_normalizer_filename(fold, path, extension='cpickle'): 411 | """Get normalizer filename 412 | 413 | Parameters 414 | ---------- 415 | fold : int >= 0 416 | evaluation fold number 417 | 418 | path : str 419 | normalizer path 420 | 421 | extension : str 422 | file extension 423 | (Default value='cpickle') 424 | 425 | Returns 426 | ------- 427 | normalizer_filename : str 428 | full normalizer filename 429 | 430 | """ 431 | 432 | return os.path.join(path, 'scale_fold' + str(fold) + '.' 
+ extension)
433 | 
434 | 
435 | def get_model_filename(fold, path, extension='cpickle'):
436 |     """Get model filename
437 | 
438 |     Parameters
439 |     ----------
440 |     fold : int >= 0
441 |         evaluation fold number
442 | 
443 |     path : str
444 |         model path
445 | 
446 |     extension : str
447 |         file extension
448 |         (Default value='cpickle')
449 | 
450 |     Returns
451 |     -------
452 |     model_filename : str
453 |         full model filename
454 | 
455 |     """
456 | 
457 |     return os.path.join(path, 'model_fold' + str(fold) + '.' + extension)
458 | 
459 | 
460 | def get_result_filename(fold, path, extension='txt'):
461 |     """Get result filename
462 | 
463 |     Parameters
464 |     ----------
465 |     fold : int >= 0
466 |         evaluation fold number
467 | 
468 |     path : str
469 |         result path
470 | 
471 |     extension : str
472 |         file extension
473 |         (Default value='txt')
474 | 
475 |     Returns
476 |     -------
477 |     result_filename : str
478 |         full result filename
479 | 
480 |     """
481 | 
482 |     if fold == 0:
483 |         return os.path.join(path, 'results.' + extension)
484 |     else:
485 |         return os.path.join(path, 'results_fold' + str(fold) + '.' + extension)
486 | 
487 | 
488 | def do_feature_extraction(files, dataset, feature_path, params, overwrite=False):
489 |     """Feature extraction
490 | 
491 |     Parameters
492 |     ----------
493 |     files : list
494 |         file list
495 | 
496 |     dataset : class
497 |         dataset class
498 | 
499 |     feature_path : str
500 |         path where the features are saved
501 | 
502 |     params : dict
503 |         parameter dict
504 | 
505 |     overwrite : bool
506 |         overwrite existing feature files
507 |         (Default value=False)
508 | 
509 |     Returns
510 |     -------
511 |     nothing
512 | 
513 |     Raises
514 |     -------
515 |     IOError
516 |         Audio file not found.
517 | 
518 |     """
519 | 
520 |     # Check that target path exists, create if not
521 |     check_path(feature_path)
522 | 
523 |     for file_id, audio_filename in enumerate(files):
524 |         # Get feature filename
525 |         current_feature_file = get_feature_filename(audio_file=os.path.split(audio_filename)[1], path=feature_path)
526 | 
527 |         progress(title_text='Extracting',
528 |                  percentage=(float(file_id) / len(files)),
529 |                  note=os.path.split(audio_filename)[1])
530 | 
531 |         if not os.path.isfile(current_feature_file) or overwrite:
532 |             # Load audio data
533 |             if os.path.isfile(dataset.relative_to_absolute_path(audio_filename)):
534 |                 y, fs = load_audio(filename=dataset.relative_to_absolute_path(audio_filename), mono=True, fs=params['fs'])
535 |             else:
536 |                 raise IOError("Audio file not found [%s]" % audio_filename)
537 | 
538 |             # Extract features
539 |             feature_data = feature_extraction(y=y,
540 |                                               fs=fs,
541 |                                               include_mfcc0=params['include_mfcc0'],
542 |                                               include_delta=params['include_delta'],
543 |                                               include_acceleration=params['include_acceleration'],
544 |                                               mfcc_params=params['mfcc'],
545 |                                               delta_params=params['mfcc_delta'],
546 |                                               acceleration_params=params['mfcc_acceleration'])
547 |             # Save
548 |             save_data(current_feature_file, feature_data)
549 | 
550 | 
551 | def do_feature_normalization(dataset, feature_normalizer_path, feature_path, dataset_evaluation_mode='folds', overwrite=False):
552 |     """Feature normalization
553 | 
554 |     Calculates normalization factors for each evaluation fold based on the training material available.
555 | 
556 |     Parameters
557 |     ----------
558 |     dataset : class
559 |         dataset class
560 | 
561 |     feature_normalizer_path : str
562 |         path where the feature normalizers are saved.
563 | 
564 |     feature_path : str
565 |         path where the features are saved.
566 | 567 | dataset_evaluation_mode : str ['folds', 'full'] 568 | evaluation mode, 'full' all material available is considered to belong to one fold. 569 | (Default value='folds') 570 | 571 | overwrite : bool 572 | overwrite existing normalizers 573 | (Default value=False) 574 | 575 | Returns 576 | ------- 577 | nothing 578 | 579 | Raises 580 | ------- 581 | IOError 582 | Feature file not found. 583 | 584 | """ 585 | 586 | # Check that target path exists, create if not 587 | check_path(feature_normalizer_path) 588 | 589 | for fold in dataset.folds(mode=dataset_evaluation_mode): 590 | current_normalizer_file = get_feature_normalizer_filename(fold=fold, path=feature_normalizer_path) 591 | 592 | if not os.path.isfile(current_normalizer_file) or overwrite: 593 | # Initialize statistics 594 | file_count = len(dataset.train(fold)) 595 | normalizer = FeatureNormalizer() 596 | 597 | for item_id, item in enumerate(dataset.train(fold)): 598 | progress(title_text='Collecting data', 599 | fold=fold, 600 | percentage=(float(item_id) / file_count), 601 | note=os.path.split(item['file'])[1]) 602 | # Load features 603 | if os.path.isfile(get_feature_filename(audio_file=item['file'], path=feature_path)): 604 | feature_data = load_data(get_feature_filename(audio_file=item['file'], path=feature_path))['stat'] 605 | else: 606 | raise IOError("Feature file not found [%s]" % (item['file'])) 607 | 608 | # Accumulate statistics 609 | normalizer.accumulate(feature_data) 610 | 611 | # Calculate normalization factors 612 | normalizer.finalize() 613 | 614 | # Save 615 | save_data(current_normalizer_file, normalizer) 616 | 617 | 618 | def do_system_training(dataset, model_path, feature_normalizer_path, feature_path, feature_params, classifier_params, 619 | dataset_evaluation_mode='folds', classifier_method='gmm', clean_audio_errors=False, overwrite=False): 620 | """System training 621 | 622 | model container format: 623 | 624 | { 625 | 'normalizer': normalizer class 626 | 'models' : 627 | { 628 | 'office' : mixture.GMM class 629 | 'home' : mixture.GMM class 630 | ... 631 | } 632 | } 633 | 634 | Parameters 635 | ---------- 636 | dataset : class 637 | dataset class 638 | 639 | model_path : str 640 | path where the models are saved. 641 | 642 | feature_normalizer_path : str 643 | path where the feature normalizers are saved. 644 | 645 | feature_path : str 646 | path where the features are saved. 647 | 648 | feature_params : dict 649 | parameter dict 650 | 651 | classifier_params : dict 652 | parameter dict 653 | 654 | dataset_evaluation_mode : str ['folds', 'full'] 655 | evaluation mode, 'full' all material available is considered to belong to one fold. 656 | (Default value='folds') 657 | 658 | classifier_method : str ['gmm'] 659 | classifier method, currently only GMM supported 660 | (Default value='gmm') 661 | 662 | clean_audio_errors : bool 663 | Remove audio errors from the training data 664 | (Default value=False) 665 | 666 | overwrite : bool 667 | overwrite existing models 668 | (Default value=False) 669 | 670 | Returns 671 | ------- 672 | nothing 673 | 674 | Raises 675 | ------- 676 | ValueError 677 | classifier_method is unknown. 678 | 679 | IOError 680 | Feature normalizer not found. 681 | Feature file not found. 
682 | 683 | """ 684 | 685 | if classifier_method != 'gmm': 686 | raise ValueError("Unknown classifier method ["+classifier_method+"]") 687 | 688 | # Check that target path exists, create if not 689 | check_path(model_path) 690 | 691 | for fold in dataset.folds(mode=dataset_evaluation_mode): 692 | current_model_file = get_model_filename(fold=fold, path=model_path) 693 | if not os.path.isfile(current_model_file) or overwrite: 694 | # Load normalizer 695 | feature_normalizer_filename = get_feature_normalizer_filename(fold=fold, path=feature_normalizer_path) 696 | if os.path.isfile(feature_normalizer_filename): 697 | normalizer = load_data(feature_normalizer_filename) 698 | else: 699 | raise IOError("Feature normalizer not found [%s]" % feature_normalizer_filename) 700 | 701 | # Initialize model container 702 | model_container = {'normalizer': normalizer, 'models': {}} 703 | 704 | # Collect training examples 705 | file_count = len(dataset.train(fold)) 706 | data = {} 707 | for item_id, item in enumerate(dataset.train(fold)): 708 | progress(title_text='Collecting data', 709 | fold=fold, 710 | percentage=(float(item_id) / file_count), 711 | note=os.path.split(item['file'])[1]) 712 | 713 | # Load features 714 | feature_filename = get_feature_filename(audio_file=item['file'], path=feature_path) 715 | if os.path.isfile(feature_filename): 716 | feature_data = load_data(feature_filename)['feat'] 717 | else: 718 | raise IOError("Features not found [%s]" % (item['file'])) 719 | 720 | # Scale features 721 | feature_data = model_container['normalizer'].normalize(feature_data) 722 | 723 | # Audio error removal 724 | if clean_audio_errors: 725 | current_errors = dataset.file_error_meta(item['file']) 726 | if current_errors: 727 | removal_mask = numpy.ones((feature_data.shape[0]), dtype=bool) 728 | for error_event in current_errors: 729 | onset_frame = int(numpy.floor(error_event['event_onset'] / feature_params['hop_length_seconds'])) 730 | offset_frame = int(numpy.ceil(error_event['event_offset'] / feature_params['hop_length_seconds'])) 731 | if offset_frame > feature_data.shape[0]: 732 | offset_frame = feature_data.shape[0] 733 | removal_mask[onset_frame:offset_frame] = False 734 | feature_data = feature_data[removal_mask, :] 735 | 736 | # Store features per class label 737 | if item['scene_label'] not in data: 738 | data[item['scene_label']] = feature_data 739 | else: 740 | data[item['scene_label']] = numpy.vstack((data[item['scene_label']], feature_data)) 741 | 742 | # Train models for each class 743 | for label in data: 744 | progress(title_text='Train models', 745 | fold=fold, 746 | note=label) 747 | if classifier_method == 'gmm': 748 | model_container['models'][label] = mixture.GMM(**classifier_params).fit(data[label]) 749 | else: 750 | raise ValueError("Unknown classifier method ["+classifier_method+"]") 751 | 752 | # Save models 753 | save_data(current_model_file, model_container) 754 | 755 | 756 | def do_system_testing(dataset, result_path, feature_path, model_path, feature_params, 757 | dataset_evaluation_mode='folds', classifier_method='gmm', clean_audio_errors=False, overwrite=False): 758 | """System testing. 759 | 760 | If extracted features are not found from disk, they are extracted but not saved. 761 | 762 | Parameters 763 | ---------- 764 | dataset : class 765 | dataset class 766 | 767 | result_path : str 768 | path where the results are saved. 769 | 770 | feature_path : str 771 | path where the features are saved. 772 | 773 | model_path : str 774 | path where the models are saved. 
775 | 776 | feature_params : dict 777 | parameter dict 778 | 779 | dataset_evaluation_mode : str ['folds', 'full'] 780 | evaluation mode, 'full' all material available is considered to belong to one fold. 781 | (Default value='folds') 782 | 783 | classifier_method : str ['gmm'] 784 | classifier method, currently only GMM supported 785 | (Default value='gmm') 786 | 787 | clean_audio_errors : bool 788 | Remove audio errors from the training data 789 | (Default value=False) 790 | 791 | overwrite : bool 792 | overwrite existing models 793 | (Default value=False) 794 | 795 | Returns 796 | ------- 797 | nothing 798 | 799 | Raises 800 | ------- 801 | ValueError 802 | classifier_method is unknown. 803 | 804 | IOError 805 | Model file not found. 806 | Audio file not found. 807 | 808 | """ 809 | 810 | if classifier_method != 'gmm': 811 | raise ValueError("Unknown classifier method ["+classifier_method+"]") 812 | 813 | # Check that target path exists, create if not 814 | check_path(result_path) 815 | 816 | for fold in dataset.folds(mode=dataset_evaluation_mode): 817 | current_result_file = get_result_filename(fold=fold, path=result_path) 818 | if not os.path.isfile(current_result_file) or overwrite: 819 | results = [] 820 | 821 | # Load class model container 822 | model_filename = get_model_filename(fold=fold, path=model_path) 823 | if os.path.isfile(model_filename): 824 | model_container = load_data(model_filename) 825 | else: 826 | raise IOError("Model file not found [%s]" % model_filename) 827 | 828 | file_count = len(dataset.test(fold)) 829 | for file_id, item in enumerate(dataset.test(fold)): 830 | progress(title_text='Testing', 831 | fold=fold, 832 | percentage=(float(file_id) / file_count), 833 | note=os.path.split(item['file'])[1]) 834 | 835 | # Load features 836 | feature_filename = get_feature_filename(audio_file=item['file'], path=feature_path) 837 | 838 | if os.path.isfile(feature_filename): 839 | feature_data = load_data(feature_filename)['feat'] 840 | else: 841 | # Load audio 842 | if os.path.isfile(dataset.relative_to_absolute_path(item['file'])): 843 | y, fs = load_audio(filename=dataset.relative_to_absolute_path(item['file']), mono=True, fs=feature_params['fs']) 844 | else: 845 | raise IOError("Audio file not found [%s]" % (item['file'])) 846 | 847 | feature_data = feature_extraction(y=y, 848 | fs=fs, 849 | include_mfcc0=feature_params['include_mfcc0'], 850 | include_delta=feature_params['include_delta'], 851 | include_acceleration=feature_params['include_acceleration'], 852 | mfcc_params=feature_params['mfcc'], 853 | delta_params=feature_params['mfcc_delta'], 854 | acceleration_params=feature_params['mfcc_acceleration'], 855 | statistics=False)['feat'] 856 | 857 | # Scale features 858 | feature_data = model_container['normalizer'].normalize(feature_data) 859 | 860 | if clean_audio_errors: 861 | current_errors = dataset.file_error_meta(item['file']) 862 | if current_errors: 863 | removal_mask = numpy.ones((feature_data.shape[0]), dtype=bool) 864 | for error_event in current_errors: 865 | onset_frame = int(numpy.floor(error_event['event_onset'] / feature_params['hop_length_seconds'])) 866 | offset_frame = int(numpy.ceil(error_event['event_offset'] / feature_params['hop_length_seconds'])) 867 | if offset_frame > feature_data.shape[0]: 868 | offset_frame = feature_data.shape[0] 869 | removal_mask[onset_frame:offset_frame] = False 870 | feature_data = feature_data[removal_mask, :] 871 | 872 | # Do classification for the block 873 | if classifier_method == 'gmm': 874 | current_result = 
do_classification_gmm(feature_data, model_container) 875 | else: 876 | raise ValueError("Unknown classifier method ["+classifier_method+"]") 877 | 878 | # Store the result 879 | results.append((dataset.absolute_to_relative(item['file']), current_result)) 880 | 881 | # Save testing results 882 | with open(current_result_file, 'wt') as f: 883 | writer = csv.writer(f, delimiter='\t') 884 | for result_item in results: 885 | writer.writerow(result_item) 886 | 887 | 888 | def do_classification_gmm(feature_data, model_container): 889 | """GMM classification for a given feature matrix 890 | 891 | model container format: 892 | 893 | { 894 | 'normalizer': normalizer class 895 | 'models' : 896 | { 897 | 'office' : mixture.GMM class 898 | 'home' : mixture.GMM class 899 | ... 900 | } 901 | } 902 | 903 | Parameters 904 | ---------- 905 | feature_data : numpy.ndarray [shape=(t, feature vector length)] 906 | feature matrix 907 | 908 | model_container : dict 909 | model container 910 | 911 | Returns 912 | ------- 913 | result : str 914 | classification result as scene label 915 | 916 | """ 917 | 918 | # Initialize log-likelihood vector to -inf 919 | logls = numpy.empty(len(model_container['models'])) 920 | logls.fill(-numpy.inf) 921 | 922 | for label_id, label in enumerate(model_container['models']): 923 | logls[label_id] = numpy.sum(model_container['models'][label].score(feature_data)) 924 | 925 | classification_result_id = numpy.argmax(logls) 926 | return model_container['models'].keys()[classification_result_id] 927 | 928 | 929 | def do_system_evaluation(dataset, result_path, dataset_evaluation_mode='folds'): 930 | """System evaluation. Testing outputs are collected and evaluated. Evaluation results are printed. 931 | 932 | Parameters 933 | ---------- 934 | dataset : class 935 | dataset class 936 | 937 | result_path : str 938 | path where the results are saved. 939 | 940 | dataset_evaluation_mode : str ['folds', 'full'] 941 | evaluation mode, 'full' all material available is considered to belong to one fold. 
942 | (Default value='folds') 943 | 944 | Returns 945 | ------- 946 | nothing 947 | 948 | Raises 949 | ------- 950 | IOError 951 | Result file not found 952 | 953 | """ 954 | 955 | dcase2016_scene_metric = DCASE2016_SceneClassification_Metrics(class_list=dataset.scene_labels) 956 | results_fold = [] 957 | for fold in dataset.folds(mode=dataset_evaluation_mode): 958 | dcase2016_scene_metric_fold = DCASE2016_SceneClassification_Metrics(class_list=dataset.scene_labels) 959 | results = [] 960 | result_filename = get_result_filename(fold=fold, path=result_path) 961 | 962 | if os.path.isfile(result_filename): 963 | with open(result_filename, 'rt') as f: 964 | for row in csv.reader(f, delimiter='\t'): 965 | results.append(row) 966 | else: 967 | raise IOError("Result file not found [%s]" % result_filename) 968 | 969 | y_true = [] 970 | y_pred = [] 971 | for result in results: 972 | y_true.append(dataset.file_meta(result[0])[0]['scene_label']) 973 | y_pred.append(result[1]) 974 | dcase2016_scene_metric.evaluate(system_output=y_pred, annotated_ground_truth=y_true) 975 | dcase2016_scene_metric_fold.evaluate(system_output=y_pred, annotated_ground_truth=y_true) 976 | results_fold.append(dcase2016_scene_metric_fold.results()) 977 | results = dcase2016_scene_metric.results() 978 | 979 | print " File-wise evaluation, over %d folds" % dataset.fold_count 980 | fold_labels = '' 981 | separator = ' =====================+======+======+==========+ +' 982 | if dataset.fold_count > 1: 983 | for fold in dataset.folds(mode=dataset_evaluation_mode): 984 | fold_labels += " {:8s} |".format('Fold'+str(fold)) 985 | separator += "==========+" 986 | print " {:20s} | {:4s} : {:4s} | {:8s} | |".format('Scene label', 'Nref', 'Nsys', 'Accuracy')+fold_labels 987 | print separator 988 | for label_id, label in enumerate(sorted(results['class_wise_accuracy'])): 989 | fold_values = '' 990 | if dataset.fold_count > 1: 991 | for fold in dataset.folds(mode=dataset_evaluation_mode): 992 | fold_values += " {:5.1f} % |".format(results_fold[fold-1]['class_wise_accuracy'][label] * 100) 993 | print " {:20s} | {:4d} : {:4d} | {:5.1f} % | |".format(label, 994 | results['class_wise_data'][label]['Nref'], 995 | results['class_wise_data'][label]['Nsys'], 996 | results['class_wise_accuracy'][label] * 100)+fold_values 997 | print separator 998 | fold_values = '' 999 | if dataset.fold_count > 1: 1000 | for fold in dataset.folds(mode=dataset_evaluation_mode): 1001 | fold_values += " {:5.1f} % |".format(results_fold[fold-1]['overall_accuracy'] * 100) 1002 | 1003 | print " {:20s} | {:4d} : {:4d} | {:5.1f} % | |".format('Overall accuracy', 1004 | results['Nref'], 1005 | results['Nsys'], 1006 | results['overall_accuracy'] * 100)+fold_values 1007 | 1008 | if __name__ == "__main__": 1009 | try: 1010 | sys.exit(main(sys.argv)) 1011 | except (ValueError, IOError) as e: 1012 | sys.exit(e) 1013 | -------------------------------------------------------------------------------- /task1_scene_classification.yaml: -------------------------------------------------------------------------------- 1 | # ========================================================== 2 | # Flow 3 | # ========================================================== 4 | flow: 5 | initialize: true 6 | extract_features: true 7 | feature_normalizer: true 8 | train_system: true 9 | test_system: true 10 | evaluate_system: true 11 | 12 | # ========================================================== 13 | # General 14 | # ========================================================== 15 | general: 16 | 
development_dataset: TUTAcousticScenes_2016_DevelopmentSet 17 | challenge_dataset: TUTAcousticScenes_2016_EvaluationSet 18 | 19 | overwrite: false # Overwrite previously stored data 20 | 21 | challenge_submission_mode: false # save results into path->challenge_results for challenge submission 22 | 23 | # ========================================================== 24 | # Paths 25 | # ========================================================== 26 | path: 27 | data: data/ 28 | 29 | base: system/baseline_dcase2016_task1/ 30 | features: features/ 31 | feature_normalizers: feature_normalizers/ 32 | models: acoustic_models/ 33 | results: evaluation_results/ 34 | 35 | challenge_results: challenge_submission/task_1_acoustic_scene_classification/ 36 | 37 | # ========================================================== 38 | # Feature extraction 39 | # ========================================================== 40 | features: 41 | fs: 44100 42 | win_length_seconds: 0.04 43 | hop_length_seconds: 0.02 44 | 45 | include_mfcc0: true # 46 | include_delta: true # 47 | include_acceleration: true # 48 | 49 | mfcc: 50 | window: hamming_asymmetric # [hann_asymmetric, hamming_asymmetric] 51 | n_mfcc: 20 # Number of MFCC coefficients 52 | n_mels: 40 # Number of MEL bands used 53 | n_fft: 2048 # FFT length 54 | fmin: 0 # Minimum frequency when constructing MEL bands 55 | fmax: 22050 # Maximum frequency when constructing MEL band 56 | htk: false # Switch for HTK-styled MEL-frequency equation 57 | 58 | mfcc_delta: 59 | width: 9 60 | 61 | mfcc_acceleration: 62 | width: 9 63 | 64 | # ========================================================== 65 | # Classifier 66 | # ========================================================== 67 | classifier: 68 | method: gmm # The system supports only gmm 69 | 70 | audio_error_handling: # Handling audio errors (temporary microphone failure and radio signal interferences from mobile phones) 71 | clean_data: false # Exclude audio errors from training audio 72 | 73 | parameters: !!null # Parameters are copied from classifier_parameters based on defined method 74 | 75 | classifier_parameters: 76 | gmm: 77 | n_components: 16 # Number of Gaussian components 78 | covariance_type: diag # [diag|full] Diagonal or full covariance matrix 79 | random_state: 0 80 | thresh: !!null 81 | tol: 0.001 82 | min_covar: 0.001 83 | n_iter: 40 84 | n_init: 1 85 | params: wmc 86 | init_params: wmc 87 | 88 | # ========================================================== 89 | # Recognizer 90 | # ========================================================== 91 | recognizer: 92 | audio_error_handling: # Handling audio errors (temporary microphone failure and radio signal interferences from mobile phones) 93 | clean_data: false # Exclude audio errors from test audio -------------------------------------------------------------------------------- /task3_sound_event_detection_in_real_life_audio.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | # 4 | # DCASE 2016::Sound Event Detection in Real-life Audio / Baseline System 5 | 6 | from src.ui import * 7 | from src.general import * 8 | from src.files import * 9 | 10 | from src.features import * 11 | from src.sound_event_detection import * 12 | from src.dataset import * 13 | from src.evaluation import * 14 | 15 | import numpy 16 | import csv 17 | import warnings 18 | import argparse 19 | import textwrap 20 | import math 21 | 22 | from sklearn import mixture 23 | 24 | __version_info__ = 
('1', '0', '1') 25 | __version__ = '.'.join(__version_info__) 26 | 27 | 28 | def main(argv): 29 | numpy.random.seed(123456) # let's make randomization predictable 30 | 31 | parser = argparse.ArgumentParser( 32 | prefix_chars='-+', 33 | formatter_class=argparse.RawDescriptionHelpFormatter, 34 | description=textwrap.dedent('''\ 35 | DCASE 2016 36 | Task 3: Sound Event Detection in Real-life Audio 37 | Baseline System 38 | --------------------------------------------- 39 | Tampere University of Technology / Audio Research Group 40 | Author: Toni Heittola ( toni.heittola@tut.fi ) 41 | 42 | System description 43 | This is a baseline implementation for the DCASE 2016 task 3 - Sound event detection in real life audio. 44 | The system has a binary classifier for each included sound event class. The GMM classifier is trained with 45 | the positive and negative examples from the mixture signals, and classification is done between these 46 | two models as a likelihood ratio. Acoustic features are MFCC+Delta+Acceleration (MFCC0 omitted). 47 | 48 | ''')) 49 | 50 | parser.add_argument("-development", help="Use the system in the development mode", action='store_true', 51 | default=False, dest='development') 52 | parser.add_argument("-challenge", help="Use the system in the challenge mode", action='store_true', 53 | default=False, dest='challenge') 54 | 55 | parser.add_argument('-v', '--version', action='version', version='%(prog)s ' + __version__) 56 | args = parser.parse_args() 57 | 58 | # Load parameters from config file 59 | parameter_file = os.path.join(os.path.dirname(os.path.realpath(__file__)), 60 | os.path.splitext(os.path.basename(__file__))[0]+'.yaml') 61 | params = load_parameters(parameter_file) 62 | params = process_parameters(params) 63 | make_folders(params) 64 | 65 | title("DCASE 2016::Sound Event Detection in Real-life Audio / Baseline System") 66 | 67 | # Check if mode is defined 68 | if not (args.development or args.challenge): 69 | args.development = True 70 | args.challenge = False 71 | 72 | dataset_evaluation_mode = 'folds' 73 | if args.development and not args.challenge: 74 | print "Running system in development mode" 75 | dataset_evaluation_mode = 'folds' 76 | elif not args.development and args.challenge: 77 | print "Running system in challenge mode" 78 | dataset_evaluation_mode = 'full' 79 | 80 | # Get dataset container class 81 | dataset = eval(params['general']['development_dataset'])(data_path=params['path']['data']) 82 | 83 | # Fetch data over internet and setup the data 84 | # ================================================== 85 | if params['flow']['initialize']: 86 | dataset.fetch() 87 | 88 | # Extract features for all audio files in the dataset 89 | # ================================================== 90 | if params['flow']['extract_features']: 91 | section_header('Feature extraction [Development data]') 92 | 93 | # Collect files from evaluation sets 94 | files = [] 95 | for fold in dataset.folds(mode=dataset_evaluation_mode): 96 | for item_id, item in enumerate(dataset.train(fold)): 97 | if item['file'] not in files: 98 | files.append(item['file']) 99 | for item_id, item in enumerate(dataset.test(fold)): 100 | if item['file'] not in files: 101 | files.append(item['file']) 102 | 103 | # Go through files and make sure all features are extracted 104 | do_feature_extraction(files=files, 105 | dataset=dataset, 106 | feature_path=params['path']['features'], 107 | params=params['features'], 108 | overwrite=params['general']['overwrite']) 109 | 110 | foot() 111 | 112 | # Prepare 
feature normalizers 113 | # ================================================== 114 | if params['flow']['feature_normalizer']: 115 | section_header('Feature normalizer [Development data]') 116 | 117 | do_feature_normalization(dataset=dataset, 118 | feature_normalizer_path=params['path']['feature_normalizers'], 119 | feature_path=params['path']['features'], 120 | dataset_evaluation_mode=dataset_evaluation_mode, 121 | overwrite=params['general']['overwrite']) 122 | 123 | foot() 124 | 125 | # System training 126 | # ================================================== 127 | if params['flow']['train_system']: 128 | section_header('System training [Development data]') 129 | 130 | do_system_training(dataset=dataset, 131 | model_path=params['path']['models'], 132 | feature_normalizer_path=params['path']['feature_normalizers'], 133 | feature_path=params['path']['features'], 134 | hop_length_seconds=params['features']['hop_length_seconds'], 135 | classifier_params=params['classifier']['parameters'], 136 | dataset_evaluation_mode=dataset_evaluation_mode, 137 | classifier_method=params['classifier']['method'], 138 | overwrite=params['general']['overwrite'] 139 | ) 140 | 141 | foot() 142 | 143 | # System evaluation in development mode 144 | if args.development and not args.challenge: 145 | 146 | # System testing 147 | # ================================================== 148 | if params['flow']['test_system']: 149 | section_header('System testing [Development data]') 150 | 151 | do_system_testing(dataset=dataset, 152 | result_path=params['path']['results'], 153 | feature_path=params['path']['features'], 154 | model_path=params['path']['models'], 155 | feature_params=params['features'], 156 | detector_params=params['detector'], 157 | dataset_evaluation_mode=dataset_evaluation_mode, 158 | classifier_method=params['classifier']['method'], 159 | overwrite=params['general']['overwrite'] 160 | ) 161 | foot() 162 | 163 | # System evaluation 164 | # ================================================== 165 | if params['flow']['evaluate_system']: 166 | section_header('System evaluation [Development data]') 167 | 168 | do_system_evaluation(dataset=dataset, 169 | dataset_evaluation_mode=dataset_evaluation_mode, 170 | result_path=params['path']['results']) 171 | 172 | foot() 173 | 174 | # System evaluation with challenge data 175 | elif not args.development and args.challenge: 176 | # Fetch data over internet and setup the data 177 | challenge_dataset = eval(params['general']['challenge_dataset'])(data_path=params['path']['data']) 178 | 179 | if params['flow']['initialize']: 180 | challenge_dataset.fetch() 181 | 182 | # System testing 183 | if params['flow']['test_system']: 184 | section_header('System testing [Challenge data]') 185 | 186 | do_system_testing(dataset=challenge_dataset, 187 | result_path=params['path']['challenge_results'], 188 | feature_path=params['path']['features'], 189 | model_path=params['path']['models'], 190 | feature_params=params['features'], 191 | detector_params=params['detector'], 192 | dataset_evaluation_mode=dataset_evaluation_mode, 193 | classifier_method=params['classifier']['method'], 194 | overwrite=True 195 | ) 196 | foot() 197 | 198 | print " " 199 | print "Your results for the challenge data are stored at ["+params['path']['challenge_results']+"]" 200 | print " " 201 | 202 | 203 | def process_parameters(params): 204 | """Parameter post-processing. 
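    Computes frame parameters in samples from the second-based settings, copies the parameters
    of the selected classifier method into place, calculates parameter hashes, and expands the
    storage paths with those hashes so that runs with different parametrizations do not
    overwrite each other's data.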
205 | 206 | Parameters 207 | ---------- 208 | params : dict 209 | parameters in dict 210 | 211 | Returns 212 | ------- 213 | params : dict 214 | processed parameters 215 | 216 | """ 217 | 218 | params['features']['mfcc']['win_length'] = int(params['features']['win_length_seconds'] * params['features']['fs']) 219 | params['features']['mfcc']['hop_length'] = int(params['features']['hop_length_seconds'] * params['features']['fs']) 220 | 221 | # Copy parameters for current classifier method 222 | params['classifier']['parameters'] = params['classifier_parameters'][params['classifier']['method']] 223 | 224 | # Hash 225 | params['features']['hash'] = get_parameter_hash(params['features']) 226 | params['classifier']['hash'] = get_parameter_hash(params['classifier']) 227 | params['detector']['hash'] = get_parameter_hash(params['detector']) 228 | 229 | # Paths 230 | params['path']['data'] = os.path.join(os.path.dirname(os.path.realpath(__file__)), params['path']['data']) 231 | params['path']['base'] = os.path.join(os.path.dirname(os.path.realpath(__file__)), params['path']['base']) 232 | 233 | # Features 234 | params['path']['features_'] = params['path']['features'] 235 | params['path']['features'] = os.path.join(params['path']['base'], 236 | params['path']['features'], 237 | params['features']['hash']) 238 | 239 | # Feature normalizers 240 | params['path']['feature_normalizers_'] = params['path']['feature_normalizers'] 241 | params['path']['feature_normalizers'] = os.path.join(params['path']['base'], 242 | params['path']['feature_normalizers'], 243 | params['features']['hash']) 244 | 245 | # Models 246 | # Save parameters into folders to help manual browsing of files. 247 | params['path']['models_'] = params['path']['models'] 248 | params['path']['models'] = os.path.join(params['path']['base'], 249 | params['path']['models'], 250 | params['features']['hash'], 251 | params['classifier']['hash']) 252 | 253 | # Results 254 | params['path']['results_'] = params['path']['results'] 255 | params['path']['results'] = os.path.join(params['path']['base'], 256 | params['path']['results'], 257 | params['features']['hash'], 258 | params['classifier']['hash'], 259 | params['detector']['hash']) 260 | return params 261 | 262 | 263 | def make_folders(params, parameter_filename='parameters.yaml'): 264 | """Create all needed folders, and saves parameters in yaml-file for easier manual browsing of data. 265 | 266 | Parameters 267 | ---------- 268 | params : dict 269 | parameters in dict 270 | 271 | parameter_filename : str 272 | filename to save parameters used to generate the folder name 273 | 274 | Returns 275 | ------- 276 | nothing 277 | 278 | """ 279 | 280 | # Check that target path exists, create if not 281 | check_path(params['path']['features']) 282 | check_path(params['path']['feature_normalizers']) 283 | check_path(params['path']['models']) 284 | check_path(params['path']['results']) 285 | 286 | # Save parameters into folders to help manual browsing of files. 
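    # The layout produced under path->base is keyed by parameter hashes, for example
    # (hash values illustrative):
    #
    #   system/baseline_dcase2016_task3/
    #       features/<features_hash>/
    #       feature_normalizers/<features_hash>/
    #       acoustic_models/<features_hash>/<classifier_hash>/
    #       evaluation_results/<features_hash>/<classifier_hash>/<detector_hash>/
    #
    # A parameters.yaml snapshot is written at each hash level below.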
287 | 288 | # Features 289 | feature_parameter_filename = os.path.join(params['path']['features'], parameter_filename) 290 | if not os.path.isfile(feature_parameter_filename): 291 | save_parameters(feature_parameter_filename, params['features']) 292 | 293 | # Feature normalizers 294 | feature_normalizer_parameter_filename = os.path.join(params['path']['feature_normalizers'], parameter_filename) 295 | if not os.path.isfile(feature_normalizer_parameter_filename): 296 | save_parameters(feature_normalizer_parameter_filename, params['features']) 297 | 298 | # Models 299 | model_features_parameter_filename = os.path.join(params['path']['base'], 300 | params['path']['models_'], 301 | params['features']['hash'], 302 | parameter_filename) 303 | if not os.path.isfile(model_features_parameter_filename): 304 | save_parameters(model_features_parameter_filename, params['features']) 305 | 306 | model_models_parameter_filename = os.path.join(params['path']['base'], 307 | params['path']['models_'], 308 | params['features']['hash'], 309 | params['classifier']['hash'], 310 | parameter_filename) 311 | if not os.path.isfile(model_models_parameter_filename): 312 | save_parameters(model_models_parameter_filename, params['classifier']) 313 | 314 | # Results 315 | # Save parameters into folders to help manual browsing of files. 316 | result_features_parameter_filename = os.path.join(params['path']['base'], 317 | params['path']['results_'], 318 | params['features']['hash'], 319 | parameter_filename) 320 | if not os.path.isfile(result_features_parameter_filename): 321 | save_parameters(result_features_parameter_filename, params['features']) 322 | 323 | result_models_parameter_filename = os.path.join(params['path']['base'], 324 | params['path']['results_'], 325 | params['features']['hash'], 326 | params['classifier']['hash'], 327 | parameter_filename) 328 | if not os.path.isfile(result_models_parameter_filename): 329 | save_parameters(result_models_parameter_filename, params['classifier']) 330 | 331 | result_detector_parameter_filename = os.path.join(params['path']['base'], 332 | params['path']['results_'], 333 | params['features']['hash'], 334 | params['classifier']['hash'], 335 | params['detector']['hash'], 336 | parameter_filename) 337 | if not os.path.isfile(result_detector_parameter_filename): 338 | save_parameters(result_detector_parameter_filename, params['detector']) 339 | 340 | 341 | def get_feature_filename(audio_file, path, extension='cpickle'): 342 | """Get feature filename 343 | 344 | Parameters 345 | ---------- 346 | audio_file : str 347 | audio file name from which the features are extracted 348 | 349 | path : str 350 | feature path 351 | 352 | extension : str 353 | file extension 354 | (Default value='cpickle') 355 | 356 | Returns 357 | ------- 358 | feature_filename : str 359 | full feature filename 360 | 361 | """ 362 | 363 | return os.path.join(path, 'sequence_' + os.path.splitext(audio_file)[0] + '.' 
+ extension) 364 | 365 | 366 | def get_feature_normalizer_filename(fold, scene_label, path, extension='cpickle'): 367 | """Get normalizer filename 368 | 369 | Parameters 370 | ---------- 371 | fold : int >= 0 372 | evaluation fold number 373 | 374 | scene_label : str 375 | scene label 376 | 377 | path : str 378 | normalizer path 379 | 380 | extension : str 381 | file extension 382 | (Default value='cpickle') 383 | 384 | Returns 385 | ------- 386 | normalizer_filename : str 387 | full normalizer filename 388 | 389 | """ 390 | 391 | return os.path.join(path, 'scale_fold' + str(fold) + '_' + str(scene_label) + '.' + extension) 392 | 393 | 394 | def get_model_filename(fold, scene_label, path, extension='cpickle'): 395 | """Get model filename 396 | 397 | Parameters 398 | ---------- 399 | fold : int >= 0 400 | evaluation fold number 401 | 402 | scene_label : str 403 | scene label 404 | 405 | path : str 406 | model path 407 | 408 | extension : str 409 | file extension 410 | (Default value='cpickle') 411 | 412 | Returns 413 | ------- 414 | model_filename : str 415 | full model filename 416 | 417 | """ 418 | 419 | return os.path.join(path, 'model_fold' + str(fold) + '_' + str(scene_label) + '.' + extension) 420 | 421 | 422 | def get_result_filename(fold, scene_label, path, extension='txt'): 423 | """Get result filename 424 | 425 | Parameters 426 | ---------- 427 | fold : int >= 0 428 | evaluation fold number 429 | 430 | scene_label : str 431 | scene label 432 | 433 | path : str 434 | result path 435 | 436 | extension : str 437 | file extension 438 | (Default value='txt') 439 | 440 | Returns 441 | ------- 442 | result_filename : str 443 | full result filename 444 | 445 | """ 446 | 447 | if fold == 0: 448 | return os.path.join(path, 'results_' + str(scene_label) + '.' + extension) 449 | else: 450 | return os.path.join(path, 'results_fold' + str(fold) + '_' + str(scene_label) + '.' + extension) 451 | 452 | 453 | def do_feature_extraction(files, dataset, feature_path, params, overwrite=False): 454 | """Feature extraction 455 | 456 | Parameters 457 | ---------- 458 | files : list 459 | file list 460 | 461 | dataset : class 462 | dataset class 463 | 464 | feature_path : str 465 | path where the features are saved 466 | 467 | params : dict 468 | parameter dict 469 | 470 | overwrite : bool 471 | overwrite existing feature files 472 | (Default value=False) 473 | 474 | Returns 475 | ------- 476 | nothing 477 | 478 | Raises 479 | ------- 480 | IOError 481 | Audio file not found. 
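    Examples
    --------
    A minimal sketch; the dataset class comes from the configuration file, the paths are
    illustrative, and params is assumed to hold the loaded feature settings:

    >>> dataset = TUTSoundEvents_2016_DevelopmentSet(data_path='data/')
    >>> files = [item['file'] for item in dataset.train(fold=1)]
    >>> do_feature_extraction(files=files,
    ...                       dataset=dataset,
    ...                       feature_path='features/',
    ...                       params=params['features'])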
482 | 483 | """ 484 | 485 | for file_id, audio_filename in enumerate(files): 486 | # Get feature filename 487 | current_feature_file = get_feature_filename(audio_file=os.path.split(audio_filename)[1], path=feature_path) 488 | 489 | progress(title_text='Extracting [sequences]', 490 | percentage=(float(file_id) / len(files)), 491 | note=os.path.split(audio_filename)[1]) 492 | 493 | if not os.path.isfile(current_feature_file) or overwrite: 494 | # Load audio 495 | if os.path.isfile(dataset.relative_to_absolute_path(audio_filename)): 496 | y, fs = load_audio(filename=dataset.relative_to_absolute_path(audio_filename), mono=True, fs=params['fs']) 497 | else: 498 | raise IOError("Audio file not found [%s]" % audio_filename) 499 | 500 | # Extract features 501 | feature_data = feature_extraction(y=y, 502 | fs=fs, 503 | include_mfcc0=params['include_mfcc0'], 504 | include_delta=params['include_delta'], 505 | include_acceleration=params['include_acceleration'], 506 | mfcc_params=params['mfcc'], 507 | delta_params=params['mfcc_delta'], 508 | acceleration_params=params['mfcc_acceleration']) 509 | # Save 510 | save_data(current_feature_file, feature_data) 511 | 512 | 513 | def do_feature_normalization(dataset, feature_normalizer_path, feature_path, dataset_evaluation_mode='folds', overwrite=False): 514 | """Feature normalization 515 | 516 | Calculates normalization factors for each evaluation fold based on the available training material. 517 | 518 | Parameters 519 | ---------- 520 | dataset : class 521 | dataset class 522 | 523 | feature_normalizer_path : str 524 | path where the feature normalizers are saved. 525 | 526 | feature_path : str 527 | path where the features are saved. 528 | 529 | dataset_evaluation_mode : str ['folds', 'full'] 530 | evaluation mode, 'full' all material available is considered to belong to one fold. 531 | (Default value='folds') 532 | 533 | overwrite : bool 534 | overwrite existing normalizers 535 | (Default value=False) 536 | 537 | Returns 538 | ------- 539 | nothing 540 | 541 | Raises 542 | ------- 543 | IOError 544 | Feature file not found. 
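    Examples
    --------
    The normalizer follows a two-pass accumulate/finalize pattern; a sketch for a single
    feature file (filenames illustrative):

    >>> normalizer = FeatureNormalizer()
    >>> normalizer.accumulate(load_data('features/sequence_a001.cpickle')['stat'])
    >>> normalizer.finalize()
    >>> save_data('scale_fold1_home.cpickle', normalizer)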
545 | 546 | """ 547 | 548 | for fold in dataset.folds(mode=dataset_evaluation_mode): 549 | for scene_id, scene_label in enumerate(dataset.scene_labels): 550 | current_normalizer_file = get_feature_normalizer_filename(fold=fold, scene_label=scene_label, path=feature_normalizer_path) 551 | 552 | if not os.path.isfile(current_normalizer_file) or overwrite: 553 | # Collect sequence files from scene class 554 | files = [] 555 | for item_id, item in enumerate(dataset.train(fold, scene_label=scene_label)): 556 | if item['file'] not in files: 557 | files.append(item['file']) 558 | 559 | file_count = len(files) 560 | 561 | # Initialize statistics 562 | normalizer = FeatureNormalizer() 563 | 564 | for file_id, audio_filename in enumerate(files): 565 | progress(title_text='Collecting data', 566 | fold=fold, 567 | percentage=(float(file_id) / file_count), 568 | note=os.path.split(audio_filename)[1]) 569 | 570 | # Load features 571 | feature_filename = get_feature_filename(audio_file=os.path.split(audio_filename)[1], path=feature_path) 572 | if os.path.isfile(feature_filename): 573 | feature_data = load_data(feature_filename)['stat'] 574 | else: 575 | raise IOError("Feature file not found [%s]" % audio_filename) 576 | 577 | # Accumulate statistics 578 | normalizer.accumulate(feature_data) 579 | 580 | # Calculate normalization factors 581 | normalizer.finalize() 582 | 583 | # Save 584 | save_data(current_normalizer_file, normalizer) 585 | 586 | 587 | def do_system_training(dataset, model_path, feature_normalizer_path, feature_path, hop_length_seconds, classifier_params, 588 | dataset_evaluation_mode='folds', classifier_method='gmm', overwrite=False): 589 | """System training 590 | 591 | Train a model pair for each sound event class, one for activity and one for inactivity. 592 | 593 | model container format: 594 | 595 | { 596 | 'normalizer': normalizer class 597 | 'models' : 598 | { 599 | 'mouse click' : 600 | { 601 | 'positive': mixture.GMM class, 602 | 'negative': mixture.GMM class 603 | } 604 | 'keyboard typing' : 605 | { 606 | 'positive': mixture.GMM class, 607 | 'negative': mixture.GMM class 608 | } 609 | ... 610 | } 611 | } 612 | 613 | Parameters 614 | ---------- 615 | dataset : class 616 | dataset class 617 | 618 | model_path : str 619 | path where the models are saved. 620 | 621 | feature_normalizer_path : str 622 | path where the feature normalizers are saved. 623 | 624 | feature_path : str 625 | path where the features are saved. 626 | 627 | hop_length_seconds : float > 0 628 | feature frame hop length in seconds 629 | 630 | classifier_params : dict 631 | parameter dict 632 | 633 | dataset_evaluation_mode : str ['folds', 'full'] 634 | evaluation mode, 'full' all material available is considered to belong to one fold. 635 | (Default value='folds') 636 | 637 | classifier_method : str ['gmm'] 638 | classifier method, currently only GMM supported 639 | (Default value='gmm') 640 | 641 | overwrite : bool 642 | overwrite existing models 643 | (Default value=False) 644 | 645 | Returns 646 | ------- 647 | nothing 648 | 649 | Raises 650 | ------- 651 | ValueError 652 | classifier_method is unknown. 653 | 654 | IOError 655 | Feature normalizer not found. 656 | Feature file not found. 
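    Examples
    --------
    A minimal invocation sketch (paths illustrative, GMM parameters as in the configuration
    file):

    >>> dataset = TUTSoundEvents_2016_DevelopmentSet(data_path='data/')
    >>> do_system_training(dataset=dataset,
    ...                    model_path='acoustic_models/',
    ...                    feature_normalizer_path='feature_normalizers/',
    ...                    feature_path='features/',
    ...                    hop_length_seconds=0.02,
    ...                    classifier_params={'n_components': 16, 'covariance_type': 'diag'},
    ...                    classifier_method='gmm')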
657 | 658 | """ 659 | 660 | if classifier_method != 'gmm': 661 | raise ValueError("Unknown classifier method ["+classifier_method+"]") 662 | 663 | for fold in dataset.folds(mode=dataset_evaluation_mode): 664 | for scene_id, scene_label in enumerate(dataset.scene_labels): 665 | current_model_file = get_model_filename(fold=fold, scene_label=scene_label, path=model_path) 666 | if not os.path.isfile(current_model_file) or overwrite: 667 | 668 | # Load normalizer 669 | feature_normalizer_filename = get_feature_normalizer_filename(fold=fold, scene_label=scene_label, path=feature_normalizer_path) 670 | if os.path.isfile(feature_normalizer_filename): 671 | normalizer = load_data(feature_normalizer_filename) 672 | else: 673 | raise IOError("Feature normalizer not found [%s]" % feature_normalizer_filename) 674 | 675 | # Initialize model container 676 | model_container = {'normalizer': normalizer, 'models': {}} 677 | 678 | # Restructure training data in to structure[files][events] 679 | ann = {} 680 | for item_id, item in enumerate(dataset.train(fold=fold, scene_label=scene_label)): 681 | filename = os.path.split(item['file'])[1] 682 | if filename not in ann: 683 | ann[filename] = {} 684 | if item['event_label'] not in ann[filename]: 685 | ann[filename][item['event_label']] = [] 686 | ann[filename][item['event_label']].append((item['event_onset'], item['event_offset'])) 687 | 688 | # Collect training examples 689 | data_positive = {} 690 | data_negative = {} 691 | file_count = len(ann) 692 | for item_id, audio_filename in enumerate(ann): 693 | progress(title_text='Collecting data', 694 | fold=fold, 695 | percentage=(float(item_id) / file_count), 696 | note=scene_label+" / "+os.path.split(audio_filename)[1]) 697 | 698 | # Load features 699 | feature_filename = get_feature_filename(audio_file=audio_filename, path=feature_path) 700 | if os.path.isfile(feature_filename): 701 | feature_data = load_data(feature_filename)['feat'] 702 | else: 703 | raise IOError("Feature file not found [%s]" % feature_filename) 704 | 705 | # Normalize features 706 | feature_data = model_container['normalizer'].normalize(feature_data) 707 | 708 | for event_label in ann[audio_filename]: 709 | positive_mask = numpy.zeros((feature_data.shape[0]), dtype=bool) 710 | 711 | for event in ann[audio_filename][event_label]: 712 | start_frame = int(math.floor(event[0] / hop_length_seconds)) 713 | stop_frame = int(math.ceil(event[1] / hop_length_seconds)) 714 | 715 | if stop_frame > feature_data.shape[0]: 716 | stop_frame = feature_data.shape[0] 717 | 718 | positive_mask[start_frame:stop_frame] = True 719 | 720 | # Store positive examples 721 | if event_label not in data_positive: 722 | data_positive[event_label] = feature_data[positive_mask, :] 723 | else: 724 | data_positive[event_label] = numpy.vstack((data_positive[event_label], feature_data[positive_mask, :])) 725 | 726 | # Store negative examples 727 | if event_label not in data_negative: 728 | data_negative[event_label] = feature_data[~positive_mask, :] 729 | else: 730 | data_negative[event_label] = numpy.vstack((data_negative[event_label], feature_data[~positive_mask, :])) 731 | 732 | # Train models for each class 733 | for event_label in data_positive: 734 | progress(title_text='Train models', 735 | fold=fold, 736 | note=scene_label+" / "+event_label) 737 | if classifier_method == 'gmm': 738 | model_container['models'][event_label] = {} 739 | model_container['models'][event_label]['positive'] = mixture.GMM(**classifier_params).fit(data_positive[event_label]) 740 | 
model_container['models'][event_label]['negative'] = mixture.GMM(**classifier_params).fit(data_negative[event_label]) 741 | else: 742 | raise ValueError("Unknown classifier method ["+classifier_method+"]") 743 | 744 | # Save models 745 | save_data(current_model_file, model_container) 746 | 747 | 748 | def do_system_testing(dataset, result_path, feature_path, model_path, feature_params, detector_params, 749 | dataset_evaluation_mode='folds', classifier_method='gmm', overwrite=False): 750 | """System testing. 751 | 752 | If extracted features are not found from disk, they are extracted but not saved. 753 | 754 | Parameters 755 | ---------- 756 | dataset : class 757 | dataset class 758 | 759 | result_path : str 760 | path where the results are saved. 761 | 762 | feature_path : str 763 | path where the features are saved. 764 | 765 | model_path : str 766 | path where the models are saved. 767 | 768 | feature_params : dict 769 | parameter dict 770 | detector_params : dict 771 | parameter dict for the detection stage 772 | dataset_evaluation_mode : str ['folds', 'full'] 773 | evaluation mode, 'full' all material available is considered to belong to one fold. (Default value='folds') 774 | 775 | classifier_method : str ['gmm'] 776 | classifier method, currently only GMM supported 777 | (Default value='gmm') 778 | 779 | overwrite : bool 780 | overwrite existing result files 781 | (Default value=False) 782 | 783 | Returns 784 | ------- 785 | nothing 786 | 787 | Raises 788 | ------- 789 | ValueError 790 | classifier_method is unknown. 791 | 792 | IOError 793 | Model file not found. 794 | Audio file not found. 795 | 796 | """ 797 | 798 | if classifier_method != 'gmm': 799 | raise ValueError("Unknown classifier method ["+classifier_method+"]") 800 | 801 | # Check that target path exists, create if not 802 | check_path(result_path) 803 | 804 | for fold in dataset.folds(mode=dataset_evaluation_mode): 805 | for scene_id, scene_label in enumerate(dataset.scene_labels): 806 | current_result_file = get_result_filename(fold=fold, scene_label=scene_label, path=result_path) 807 | 808 | if not os.path.isfile(current_result_file) or overwrite: 809 | results = [] 810 | 811 | # Load class model container 812 | model_filename = get_model_filename(fold=fold, scene_label=scene_label, path=model_path) 813 | if os.path.isfile(model_filename): 814 | model_container = load_data(model_filename) 815 | else: 816 | raise IOError("Model file not found [%s]" % model_filename) 817 | 818 | file_count = len(dataset.test(fold, scene_label=scene_label)) 819 | for file_id, item in enumerate(dataset.test(fold=fold, scene_label=scene_label)): 820 | progress(title_text='Testing', 821 | fold=fold, 822 | percentage=(float(file_id) / file_count), 823 | note=scene_label+" / "+os.path.split(item['file'])[1]) 824 | 825 | # Load features 826 | feature_filename = get_feature_filename(audio_file=os.path.split(item['file'])[1], path=feature_path) 827 | 828 | if os.path.isfile(feature_filename): 829 | feature_data = load_data(feature_filename)['feat'] 830 | else: 831 | # Load audio 832 | if os.path.isfile(dataset.relative_to_absolute_path(item['file'])): 833 | y, fs = load_audio(filename=dataset.relative_to_absolute_path(item['file']), mono=True, fs=feature_params['fs']) 834 | else: 835 | raise IOError("Audio file not found [%s]" % item['file']) 836 | 837 | # Extract features 838 | feature_data = feature_extraction(y=y, 839 | fs=fs, 840 | include_mfcc0=feature_params['include_mfcc0'], 841 | include_delta=feature_params['include_delta'], 842 | include_acceleration=feature_params['include_acceleration'], 843 | mfcc_params=feature_params['mfcc'], 844 | 
delta_params=feature_params['mfcc_delta'], 845 | acceleration_params=feature_params['mfcc_acceleration'], 846 | statistics=False)['feat'] 847 | 848 | # Normalize features 849 | feature_data = model_container['normalizer'].normalize(feature_data) 850 | 851 | current_results = event_detection(feature_data=feature_data, 852 | model_container=model_container, 853 | hop_length_seconds=feature_params['hop_length_seconds'], 854 | smoothing_window_length_seconds=detector_params['smoothing_window_length'], 855 | decision_threshold=detector_params['decision_threshold'], 856 | minimum_event_length=detector_params['minimum_event_length'], 857 | minimum_event_gap=detector_params['minimum_event_gap']) 858 | 859 | # Store the result 860 | for event in current_results: 861 | results.append((dataset.absolute_to_relative(item['file']), event[0], event[1], event[2] )) 862 | 863 | # Save testing results 864 | with open(current_result_file, 'wt') as f: 865 | writer = csv.writer(f, delimiter='\t') 866 | for result_item in results: 867 | writer.writerow(result_item) 868 | 869 | 870 | def do_system_evaluation(dataset, result_path, dataset_evaluation_mode='folds'): 871 | """System evaluation. Testing outputs are collected and evaluated. Evaluation results are printed. 872 | 873 | Parameters 874 | ---------- 875 | dataset : class 876 | dataset class 877 | 878 | result_path : str 879 | path where the results are saved. 880 | 881 | dataset_evaluation_mode : str ['folds', 'full'] 882 | evaluation mode, 'full' all material available is considered to belong to one fold. 883 | (Default value='folds') 884 | 885 | Returns 886 | ------- 887 | nothing 888 | 889 | Raises 890 | ------- 891 | IOError 892 | Result file not found 893 | 894 | """ 895 | 896 | # Set warnings off, sklearn metrics will trigger warning for classes without 897 | # predicted samples in F1-scoring. This is just to keep printing clean. 
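    # Note on the printed metrics: the segment-based error rate decomposes as
    # ER = (S + D + I) / N, where S, D and I are substitutions, deletions and
    # insertions counted per segment and N is the number of reference events;
    # the ER/S, ER/D and ER/I columns printed below are these three terms.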
898 | warnings.simplefilter("ignore") 899 | 900 | overall_metrics_per_scene = {} 901 | 902 | for scene_id, scene_label in enumerate(dataset.scene_labels): 903 | if scene_label not in overall_metrics_per_scene: 904 | overall_metrics_per_scene[scene_label] = {} 905 | 906 | dcase2016_segment_based_metric = DCASE2016_EventDetection_SegmentBasedMetrics(class_list=dataset.event_labels(scene_label=scene_label)) 907 | dcase2016_event_based_metric = DCASE2016_EventDetection_EventBasedMetrics(class_list=dataset.event_labels(scene_label=scene_label), use_onset_condition=True, use_offset_condition=False) 908 | 909 | for fold in dataset.folds(mode=dataset_evaluation_mode): 910 | results = [] 911 | result_filename = get_result_filename(fold=fold, scene_label=scene_label, path=result_path) 912 | 913 | if os.path.isfile(result_filename): 914 | with open(result_filename, 'rt') as f: 915 | for row in csv.reader(f, delimiter='\t'): 916 | results.append(row) 917 | else: 918 | raise IOError("Result file not found [%s]" % result_filename) 919 | 920 | for file_id, item in enumerate(dataset.test(fold, scene_label=scene_label)): 921 | current_file_results = [] 922 | for result_line in results: 923 | if len(result_line) != 0 and result_line[0] == dataset.absolute_to_relative(item['file']): 924 | current_file_results.append( 925 | {'file': result_line[0], 926 | 'event_onset': float(result_line[1]), 927 | 'event_offset': float(result_line[2]), 928 | 'event_label': result_line[3].rstrip() 929 | } 930 | ) 931 | meta = dataset.file_meta(dataset.absolute_to_relative(item['file'])) 932 | 933 | dcase2016_segment_based_metric.evaluate(system_output=current_file_results, annotated_ground_truth=meta) 934 | dcase2016_event_based_metric.evaluate(system_output=current_file_results, annotated_ground_truth=meta) 935 | 936 | overall_metrics_per_scene[scene_label]['segment_based_metrics'] = dcase2016_segment_based_metric.results() 937 | overall_metrics_per_scene[scene_label]['event_based_metrics'] = dcase2016_event_based_metric.results() 938 | 939 | print " Evaluation over %d folds" % dataset.fold_count 940 | print " " 941 | print " Results per scene " 942 | print " {:18s} | {:5s} | | {:39s} ".format('', 'Main', 'Secondary metrics') 943 | print " {:18s} | {:5s} | | {:38s} | {:14s} | {:14s} | {:14s} ".format('', '', 'Seg/Overall','Seg/Class', 'Event/Overall','Event/Class') 944 | print " {:18s} | {:5s} | | {:6s} : {:5s} : {:5s} : {:5s} : {:5s} | {:6s} : {:5s} | {:6s} : {:5s} | {:6s} : {:5s} |".format('Scene', 'ER', 'F1', 'ER', 'ER/S', 'ER/D', 'ER/I', 'F1', 'ER', 'F1', 'ER', 'F1', 'ER') 945 | print " -------------------+-------+ +--------+-------+-------+-------+-------+--------+-------+--------+-------+--------+-------+" 946 | averages = { 947 | 'segment_based_metrics': { 948 | 'overall': { 949 | 'ER': [], 950 | 'F': [], 951 | }, 952 | 'class_wise_average': { 953 | 'ER': [], 954 | 'F': [], 955 | } 956 | }, 957 | 'event_based_metrics': { 958 | 'overall': { 959 | 'ER': [], 960 | 'F': [], 961 | }, 962 | 'class_wise_average': { 963 | 'ER': [], 964 | 'F': [], 965 | } 966 | }, 967 | } 968 | for scene_id, scene_label in enumerate(dataset.scene_labels): 969 | print " {:18s} | {:5.2f} | | {:4.1f} % : {:5.2f} : {:5.2f} : {:5.2f} : {:5.2f} | {:4.1f} % : {:5.2f} | {:4.1f} % : {:5.2f} | {:4.1f} % : {:5.2f} |".format(scene_label, 970 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['overall']['ER'], 971 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['overall']['F'] * 100, 972 | 
overall_metrics_per_scene[scene_label]['segment_based_metrics']['overall']['ER'], 973 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['overall']['S'], 974 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['overall']['D'], 975 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['overall']['I'], 976 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise_average']['F']*100, 977 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise_average']['ER'], 978 | overall_metrics_per_scene[scene_label]['event_based_metrics']['overall']['F']*100, 979 | overall_metrics_per_scene[scene_label]['event_based_metrics']['overall']['ER'], 980 | overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise_average']['F']*100, 981 | overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise_average']['ER'], 982 | ) 983 | averages['segment_based_metrics']['overall']['ER'].append(overall_metrics_per_scene[scene_label]['segment_based_metrics']['overall']['ER']) 984 | averages['segment_based_metrics']['overall']['F'].append(overall_metrics_per_scene[scene_label]['segment_based_metrics']['overall']['F']) 985 | averages['segment_based_metrics']['class_wise_average']['ER'].append(overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise_average']['ER']) 986 | averages['segment_based_metrics']['class_wise_average']['F'].append(overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise_average']['F']) 987 | averages['event_based_metrics']['overall']['ER'].append(overall_metrics_per_scene[scene_label]['event_based_metrics']['overall']['ER']) 988 | averages['event_based_metrics']['overall']['F'].append(overall_metrics_per_scene[scene_label]['event_based_metrics']['overall']['F']) 989 | averages['event_based_metrics']['class_wise_average']['ER'].append(overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise_average']['ER']) 990 | averages['event_based_metrics']['class_wise_average']['F'].append(overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise_average']['F']) 991 | 992 | print " -------------------+-------+ +--------+-------+-------+-------+-------+--------+-------+--------+-------+--------+-------+" 993 | print " {:18s} | {:5.2f} | | {:4.1f} % : {:5.2f} : {:21s} | {:4.1f} % : {:5.2f} | {:4.1f} % : {:5.2f} | {:4.1f} % : {:5.2f} |".format('Average', 994 | numpy.mean(averages['segment_based_metrics']['overall']['ER']), 995 | numpy.mean(averages['segment_based_metrics']['overall']['F'])*100, 996 | numpy.mean(averages['segment_based_metrics']['overall']['ER']), 997 | ' ', 998 | numpy.mean(averages['segment_based_metrics']['class_wise_average']['F'])*100, 999 | numpy.mean(averages['segment_based_metrics']['class_wise_average']['ER']), 1000 | numpy.mean(averages['event_based_metrics']['overall']['F'])*100, 1001 | numpy.mean(averages['event_based_metrics']['overall']['ER']), 1002 | numpy.mean(averages['event_based_metrics']['class_wise_average']['F'])*100, 1003 | numpy.mean(averages['event_based_metrics']['class_wise_average']['ER']), 1004 | ) 1005 | 1006 | print " " 1007 | # Restore warnings to default settings 1008 | warnings.simplefilter("default") 1009 | print " Results per events " 1010 | 1011 | for scene_id, scene_label in enumerate(dataset.scene_labels): 1012 | print " " 1013 | print " "+scene_label.upper() 1014 | print " {:20s} | {:30s} | | {:15s} ".format('', 'Segment-based', 'Event-based') 1015 | print " {:20s} | {:5s} : {:5s} 
: {:6s} : {:5s} | | {:5s} : {:5s} : {:6s} : {:5s} |".format('Event', 'Nref', 'Nsys', 'F1', 'ER', 'Nref', 'Nsys', 'F1', 'ER') 1016 | print " ---------------------+-------+-------+--------+-------+ +-------+-------+--------+-------+" 1017 | seg_Nref = 0 1018 | seg_Nsys = 0 1019 | 1020 | event_Nref = 0 1021 | event_Nsys = 0 1022 | for event_label in sorted(overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise']): 1023 | print " {:20s} | {:5d} : {:5d} : {:4.1f} % : {:5.2f} | | {:5d} : {:5d} : {:4.1f} % : {:5.2f} |".format(event_label, 1024 | int(overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise'][event_label]['Nref']), 1025 | int(overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise'][event_label]['Nsys']), 1026 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise'][event_label]['F']*100, 1027 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise'][event_label]['ER'], 1028 | int(overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise'][event_label]['Nref']), 1029 | int(overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise'][event_label]['Nsys']), 1030 | overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise'][event_label]['F']*100, 1031 | overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise'][event_label]['ER']) 1032 | seg_Nref += int(overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise'][event_label]['Nref']) 1033 | seg_Nsys += int(overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise'][event_label]['Nsys']) 1034 | 1035 | event_Nref += int(overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise'][event_label]['Nref']) 1036 | event_Nsys += int(overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise'][event_label]['Nsys']) 1037 | print " ---------------------+-------+-------+--------+-------+ +-------+-------+--------+-------+" 1038 | print " {:20s} | {:5d} : {:5d} : {:14s} | | {:5d} : {:5d} : {:14s} |".format('Sum', 1039 | seg_Nref, 1040 | seg_Nsys, 1041 | '', 1042 | event_Nref, 1043 | event_Nsys, 1044 | '') 1045 | print " {:20s} | {:5s} {:5s} : {:4.1f} % : {:5.2f} | | {:5s} {:5s} : {:4.1f} % : {:5.2f} |".format('Average', 1046 | '', '', 1047 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise_average']['F']*100, 1048 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise_average']['ER'], 1049 | '', '', 1050 | overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise_average']['F']*100, 1051 | overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise_average']['ER']) 1052 | print " " 1053 | 1054 | if __name__ == "__main__": 1055 | try: 1056 | sys.exit(main(sys.argv)) 1057 | except (ValueError, IOError) as e: 1058 | sys.exit(e) -------------------------------------------------------------------------------- /task3_sound_event_detection_in_real_life_audio.yaml: -------------------------------------------------------------------------------- 1 | # ========================================================== 2 | # Flow 3 | # ========================================================== 4 | flow: 5 | initialize: true 6 | extract_features: true 7 | feature_normalizer: true 8 | train_system: true 9 | test_system: true 10 | evaluate_system: true 11 | 12 | # ========================================================== 13 | # General 14 | # 
========================================================== 15 | general: 16 | development_dataset: TUTSoundEvents_2016_DevelopmentSet 17 | challenge_dataset: TUTSoundEvents_2016_EvaluationSet 18 | 19 | overwrite: false # Overwrite previously stored data 20 | 21 | # ========================================================== 22 | # Paths 23 | # ========================================================== 24 | path: 25 | data: data/ 26 | 27 | base: system/baseline_dcase2016_task3/ 28 | features: features/ 29 | feature_normalizers: feature_normalizers/ 30 | models: acoustic_models/ 31 | results: evaluation_results/ 32 | 33 | challenge_results: challenge_submission/task_3_sound_event_detection_in_real_life_audio/ 34 | 35 | # ========================================================== 36 | # Feature extraction 37 | # ========================================================== 38 | features: 39 | fs: 44100 40 | win_length_seconds: 0.04 41 | hop_length_seconds: 0.02 42 | 43 | include_mfcc0: false 44 | include_delta: true 45 | include_acceleration: true 46 | 47 | mfcc: 48 | window: hamming_asymmetric # [hann_asymmetric, hamming_asymmetric] 49 | n_mfcc: 20 # Number of MFCC coefficients 50 | n_mels: 40 # Number of MEL bands used 51 | n_fft: 2048 # FFT length, make sure this is larger than win_length_seconds*fs 52 | fmin: 0 # Minimum frequency when constructing MEL bands 53 | fmax: 22050 # Maximum frequency when constructing MEL band 54 | htk: false # Switch for HTK-styled MEL-frequency equation 55 | 56 | mfcc_delta: 57 | width: 9 58 | 59 | mfcc_acceleration: 60 | width: 9 61 | 62 | # ========================================================== 63 | # Classifier 64 | # ========================================================== 65 | classifier: 66 | method: gmm # The system supports only gmm 67 | parameters: !!null # Parameters are copied from classifier_parameters based on defined method 68 | 69 | classifier_parameters: 70 | gmm: 71 | n_components: 16 # Number of Gaussian components 72 | covariance_type: diag # [diag|full] Diagonal or full covariance matrix 73 | random_state: 0 74 | thresh: !!null 75 | tol: 0.001 76 | min_covar: 0.001 77 | n_iter: 40 78 | n_init: 1 79 | params: wmc 80 | init_params: wmc 81 | 82 | # ========================================================== 83 | # Detector 84 | # ========================================================== 85 | detector: 86 | decision_threshold: 160.0 87 | smoothing_window_length: 1.0 # seconds 88 | minimum_event_length: 0.1 # seconds 89 | minimum_event_gap: 0.1 # seconds 90 | --------------------------------------------------------------------------------
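The detector block above parametrizes the post-processing applied to the frame-wise positive/negative log-likelihood ratio in src/sound_event_detection.py (event_detection()). As a rough, self-contained sketch of how such parameters interact (not the baseline's actual implementation; detect_events is a hypothetical helper, and the sliding-window accumulation is an assumption):

    from __future__ import print_function
    import numpy

    def detect_events(llr, hop_length_seconds=0.02, decision_threshold=160.0,
                      smoothing_window_length=1.0, minimum_event_length=0.1,
                      minimum_event_gap=0.1):
        # Accumulate the frame-wise log-likelihood ratio over a sliding window.
        win = max(1, int(smoothing_window_length / hop_length_seconds))
        smoothed = numpy.convolve(llr, numpy.ones(win), mode='same')

        # Frames where the accumulated ratio exceeds the threshold are active.
        active = smoothed > decision_threshold

        # Collect contiguous active regions as [onset, offset] pairs in seconds.
        events = []
        onset = None
        for frame, is_active in enumerate(active):
            if is_active and onset is None:
                onset = frame
            elif not is_active and onset is not None:
                events.append([onset * hop_length_seconds, frame * hop_length_seconds])
                onset = None
        if onset is not None:
            events.append([onset * hop_length_seconds, len(active) * hop_length_seconds])

        # Merge events separated by less than minimum_event_gap.
        merged = []
        for event in events:
            if merged and event[0] - merged[-1][1] < minimum_event_gap:
                merged[-1][1] = event[1]
            else:
                merged.append(event)

        # Discard events shorter than minimum_event_length.
        return [event for event in merged if event[1] - event[0] >= minimum_event_length]

    # Synthetic trace: one clearly active region around frames 100-200.
    llr = numpy.zeros(500)
    llr[100:200] = 5.0
    print(detect_events(llr))

With the defaults above this prints a single event roughly spanning the active region; raising decision_threshold or minimum_event_length prunes weaker or shorter detections.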