├── .gitignore
├── EULA.pdf
├── README.html
├── README.md
├── requirements.txt
├── src
│   ├── __init__.py
│   ├── dataset.py
│   ├── evaluation.py
│   ├── features.py
│   ├── files.py
│   ├── general.py
│   ├── sound_event_detection.py
│   └── ui.py
├── task1_scene_classification.py
├── task1_scene_classification.yaml
├── task3_sound_event_detection_in_real_life_audio.py
└── task3_sound_event_detection_in_real_life_audio.yaml
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | env/
12 | build/
13 | develop-eggs/
14 | dist/
15 | downloads/
16 | eggs/
17 | .eggs/
18 | lib/
19 | lib64/
20 | parts/
21 | sdist/
22 | var/
23 | *.egg-info/
24 | .installed.cfg
25 | *.egg
26 |
27 | # PyInstaller
28 | # Usually these files are written by a python script from a template
29 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
30 | *.manifest
31 | *.spec
32 |
33 | # Installer logs
34 | pip-log.txt
35 | pip-delete-this-directory.txt
36 |
37 | # Unit test / coverage reports
38 | htmlcov/
39 | .tox/
40 | .coverage
41 | .coverage.*
42 | .cache
43 | nosetests.xml
44 | coverage.xml
45 | *.cover
46 | .hypothesis/
47 |
48 | # Translations
49 | *.mo
50 | *.pot
51 |
52 | # Django stuff:
53 | *.log
54 |
55 | # Sphinx documentation
56 | docs/_build/
57 |
58 | # PyBuilder
59 | target/
60 |
61 | # IPython Notebook
62 | .ipynb_checkpoints
63 |
64 | data/
65 | system/
66 | .idea/
--------------------------------------------------------------------------------
/EULA.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TUT-ARG/DCASE2016-baseline-system-python/8e311066e3b670c52f4fcfe2a7060c18c9969cf8/EULA.pdf
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | DCASE2016 Baseline system
2 | =========================
3 | [Audio Research Group / Tampere University of Technology](http://arg.cs.tut.fi/)
4 |
5 | *Python implementation*
6 |
7 | Systems:
8 | - Task 1 - Acoustic scene classification
9 | - Task 3 - Sound event detection in real life audio
10 |
11 | Authors
12 | - Toni Heittola
13 | - Annamaria Mesaros
14 | - Tuomas Virtanen
15 |
16 | Table of Contents
17 | =================
18 | 1. [Introduction](#1-introduction)
19 | 2. [Installation](#2-installation)
20 | 3. [Usage](#3-usage)
21 | 4. [System blocks](#4-system-blocks)
22 | 5. [System evaluation](#5-system-evaluation)
23 | 6. [System parameters](#6-system-parameters)
24 | 7. [Changelog](#7-changelog)
25 | 8. [License](#8-license)
26 |
27 | 1. Introduction
28 | ===============
29 | This document describes the Python implementation of the baseline systems for **[task 1](#11-acoustic-scene-classification)** and **[task 3](#12-sound-event-detection)** of the [Detection and Classification of Acoustic Scenes and Events 2016 (DCASE2016) challenge](http://www.cs.tut.fi/sgn/arg/dcase2016/). The challenge consists of four tasks:
30 |
31 | 1. [Acoustic scene classification](http://www.cs.tut.fi/sgn/arg/dcase2016/task-acoustic-scene-classification)
32 | 2. [Sound event detection in synthetic audio](http://www.cs.tut.fi/sgn/arg/dcase2016/task-sound-event-detection-in-synthetic-audio)
33 | 3. [Sound event detection in real life audio](http://www.cs.tut.fi/sgn/arg/dcase2016/task-sound-event-detection-in-real-life-audio)
34 | 4. [Domestic audio tagging](http://www.cs.tut.fi/sgn/arg/dcase2016/task-audio-tagging)
35 |
36 | The baseline systems for tasks 1 and 3 share the same basic approach: [MFCC](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum)-based acoustic features and a [GMM](https://en.wikipedia.org/wiki/Mixture_model)-based classifier. The main motivation for using a similar approach in both tasks was to provide a low barrier to entry and to allow easy switching between the tasks.
37 |
38 | The dataset handling is hidden behind a dataset access class, which should help DCASE challenge participants implement their own systems.
39 |
40 | The [Matlab implementation](https://github.com/TUT-ARG/DCASE2016-baseline-system-matlab) is also available.
41 |
42 | #### 1.1. Acoustic scene classification
43 |
44 | The acoustic features include MFCC static coefficients (with the 0th coefficient), delta coefficients and acceleration coefficients. The system learns one acoustic model per acoustic scene class, and performs classification with a maximum-likelihood scheme.
45 |
46 | #### 1.2. Sound event detection
47 |
48 | The acoustic features include MFCC static coefficients (0th coefficient omitted), delta coefficients and acceleration coefficients. The system has a binary classifier for each included sound event class. For each classifier, two acoustic models are trained from the mixture signals: one with positive examples (target sound event active) and one with negative examples (target sound event non-active). Classification is done between these two models as a likelihood ratio. Post-processing is applied to get the sound event detection output.
49 |
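As a minimal sketch of this decision rule (using the legacy `sklearn.mixture.GMM` API from the pinned scikit-learn 0.16; newer scikit-learn versions replace it with `GaussianMixture` and `score_samples`, and the random features below merely stand in for real MFCCs):

    import numpy
    from sklearn import mixture

    # Stand-in feature matrices (frames x 60 dimensions), replacing real MFCCs
    positive_features = numpy.random.rand(1000, 60)   # target event active
    negative_features = numpy.random.rand(1000, 60)   # target event non-active
    test_features = numpy.random.rand(100, 60)

    # Two acoustic models per event class, as described above
    model_positive = mixture.GMM(n_components=16, covariance_type='diag').fit(positive_features)
    model_negative = mixture.GMM(n_components=16, covariance_type='diag').fit(negative_features)

    # Frame-wise log-likelihood ratio; post-processing turns this into events
    likelihood_ratio = model_positive.score(test_features) - model_negative.score(test_features)
    event_activity = likelihood_ratio > 0.0   # the threshold is a tunable parameter
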
50 | 2. Installation
51 | ===============
52 |
53 | The systems are developed for [Python 2.7](https://www.python.org/). Currently, the baseline system has been tested only on Linux.
54 |
55 | Run the following command to ensure that all external modules are installed:
56 |
57 | pip install -r requirements.txt
58 |
59 | **External modules required**
60 |
61 | [*numpy*](http://www.numpy.org/), [*scipy*](http://www.scipy.org/), [*scikit-learn*](http://scikit-learn.org/)
62 | `pip install numpy scipy scikit-learn`
63 |
64 | Scikit-learn (version >= 0.16) is required for the machine learning implementations.
65 |
66 | [*PyYAML*](http://pyyaml.org/)
67 | `pip install pyyaml`
68 |
69 | PyYAML is required for handling the configuration files.
70 |
71 | [*librosa*](https://github.com/bmcfee/librosa)
72 | `pip install librosa`
73 |
74 | Librosa is required for the feature extraction.
75 |
76 | 3. Usage
77 | ========
78 |
79 | For each task there is a separate executable (.py file):
80 |
81 | 1. *task1_scene_classification.py*, Acoustic scene classification
82 | 3. *task3_sound_event_detection_in_real_life_audio.py*, Real life audio sound event detection
83 |
84 | Each system has two operating modes: **Development mode** and **Challenge mode**.
85 |
86 | All the usage parameters are shown by `python task1_scene_classification.py -h` and `python task3_sound_event_detection_in_real_life_audio.py -h`
87 |
88 | The system parameters are defined in `task1_scene_classification.yaml` and `task3_sound_event_detection_in_real_life_audio.yaml`.
89 |
90 | With the default parameter settings, the system will download the needed dataset from the Internet and extract it under the directory `data` (the storage path is controlled with the parameter `path->data`).
91 |
92 | #### Development mode
93 |
94 | In this mode, the system is trained and evaluated with the development dataset. This is the default operating mode.
95 |
96 | To run the system in this mode:
97 | `python task1_scene_classification.py`
98 | or `python task1_scene_classification.py -development`.
99 |
100 | #### Challenge mode
101 |
102 | In this mode, the system is trained with the provided development dataset, and the evaluation dataset is run through the developed system. Output files are generated in the correct format for the challenge submission. The system output is saved in the path specified with the parameter `path->challenge_results`.
103 |
104 | To run the system in this mode:
105 | `python task1_scene_classification.py -challenge`.
106 |
107 |
108 | 4. System blocks
109 | ================
110 |
111 | The system implements the following blocks; a simplified sketch of the overall flow follows the list:
112 |
113 | 1. Dataset initialization
114 | - Downloads the dataset from the Internet if needed
115 | - Extracts the dataset package if needed
116 |    - Makes sure that the meta files are appropriately formatted
117 |
118 | 2. Feature extraction (`do_feature_extraction`)
119 | - Goes through all the training material and extracts the acoustic features
120 | - Features are stored file-by-file on the local disk (pickle files)
121 |
122 | 3. Feature normalization (`do_feature_normalization`)
123 |    - Goes through the training material in the evaluation folds, and calculates the global mean and standard deviation of the data.
124 | - Stores the normalization factors (pickle files)
125 |
126 | 4. System training (`do_system_training`)
127 | - Trains the system
128 | - Stores the trained models and feature normalization factors together on the local disk (pickle files)
129 |
130 | 5. System testing (`do_system_testing`)
131 | - Goes through the testing material and does the classification / detection
132 | - Stores the results (text files)
133 |
134 | 6. System evaluation (`do_system_evaluation`)
135 | - Reads the ground truth and the output of the system and calculates evaluation metrics
136 |
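The block names below match the list above, but the surrounding glue code is hypothetical; the real scripts pass dataset objects and parameter dictionaries parsed from the YAML configuration files:

    # Hypothetical top-level flow of the per-task scripts
    def main(params, dataset):
        if params['flow']['initialize']:
            dataset.initialize()                       # download + extract if needed
        if params['flow']['extract_features']:
            do_feature_extraction(dataset, params)     # features stored as pickles
        if params['flow']['feature_normalizer']:
            do_feature_normalization(dataset, params)  # per-fold mean/std as pickles
        if params['flow']['train_system']:
            do_system_training(dataset, params)        # models stored as pickles
        if params['flow']['test_system']:
            do_system_testing(dataset, params)         # results stored as text files
        if params['flow']['evaluate_system']:
            do_system_evaluation(dataset, params)      # metrics from results + ground truth
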
137 | 5. System evaluation
138 | ====================
139 |
140 | ## Task 1 - Acoustic scene classification
141 |
142 | ### Metrics
143 |
144 | The scoring of acoustic scene classification will be based on classification accuracy: the number of correctly classified segments among the total number of segments. Each segment is considered an independent test sample.
145 |
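As a minimal worked example of this metric (hypothetical label lists):

    # Classification accuracy: correctly classified segments / total segments
    y_true = ['beach', 'bus', 'car', 'car']
    y_pred = ['beach', 'car', 'car', 'car']

    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / float(len(y_true))
    print(accuracy)   # 0.75
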
146 | ### Results
147 |
148 | ##### TUT Acoustic scenes 2016, development set
149 |
150 | [Dataset](https://zenodo.org/record/45739)
151 |
152 | *Evaluation setup*
153 |
154 | - 4 cross-validation folds, average classification accuracy over folds
155 | - 15 acoustic scene classes
156 | - Classification unit: one file (30 seconds of audio).
157 |
158 | *System parameters*
159 |
160 | - Frame size: 40 ms (with 50% hop size)
161 | - Number of Gaussians per acoustic scene class model: 16
162 | - Feature vector: 20 MFCC static coefficients (including 0th) + 20 delta MFCC coefficients + 20 acceleration MFCC coefficients = 60 values
163 | - Trained and tested on full audio
164 |
165 | | Scene | Accuracy |
166 | |----------------------|--------------|
167 | | Beach | 63.3 % |
168 | | Bus | 79.6 % |
169 | | Cafe/restaurant | 83.2 % |
170 | | Car | 87.2 % |
171 | | City center | 85.5 % |
172 | | Forest path | 81.0 % |
173 | | Grocery store | 65.0 % |
174 | | Home | 82.1 % |
175 | | Library | 50.4 % |
176 | | Metro station | 94.7 % |
177 | | Office | 98.6 % |
178 | | Park | 13.9 % |
179 | | Residential area | 77.7 % |
180 | | Train | 33.6 % |
181 | | Tram | 85.4 % |
182 | | **Overall accuracy** | **72.5 %** |
183 |
184 | ##### DCASE 2013 Scene classification, development set
185 |
186 | [Dataset](http://c4dm.eecs.qmul.ac.uk/rdr/handle/123456789/29)
187 |
188 | *Evaluation setup*
189 |
190 | - 5 cross-validation folds, average classification accuracy over folds
191 | - 10 acoustic scene classes
192 | - Classification unit: one file (30 seconds of audio).
193 |
194 | *System parameters*
195 |
196 | - Frame size: 40 ms (with 50% hop size)
197 | - Number of Gaussians per acoustic scene class model: 16
198 | - Feature vector: 20 MFCC static coefficients (including 0th) + 20 delta MFCC coefficients + 20 acceleration MFCC coefficients = 60 values
199 |
200 | | Scene | Accuracy |
201 | |----------------------|--------------|
202 | | Bus | 93.3 % |
203 | | Busy street | 80.0 % |
204 | | Office | 86.7 % |
205 | | Open air market | 73.3 % |
206 | | Park | 26.7 % |
207 | | Quiet street | 53.3 % |
208 | | Restaurant | 40.0 % |
209 | | Supermarket | 26.7 % |
210 | | Tube | 66.7 % |
211 | | Tube station | 53.3 % |
212 | | **Overall accuracy** | **60.0 %** |
213 |
214 |
215 | ## Task 3 - Real life audio sound event detection
216 |
217 | ### Metrics
218 |
219 | **Segment-based metrics**
220 |
221 | Segment-based evaluation is done on a fixed time grid, using one-second segments to compare the ground truth and the system output.
222 |
223 | - **Total error rate (ER)** is the main metric for this task. The error rate, as defined in [Poliner2007](https://www.ee.columbia.edu/~dpwe/pubs/PoliE06-piano.pdf), is evaluated in one-second segments over the entire test set.
224 |
225 | - **F-score** is calculated over all test data based on the total number of false positives, false negatives and true positives.
226 |
227 | **Event-based metrics**
228 |
229 | Event-based evaluation considers true positives, false positives and false negatives with respect to event instances.
230 |
231 | **Definition**: An event in the system output is considered correctly detected if its temporal position overlaps with the temporal position of an event with the same label in the ground truth. A tolerance is allowed for the onset and offset (200 ms for the onset, and 200 ms or half of the event length for the offset).
232 |
233 | - **Error rate** is calculated as described in [Poliner2007](https://www.ee.columbia.edu/~dpwe/pubs/PoliE06-piano.pdf) over all test data, based on the total number of insertions, deletions and substitutions.
234 |
235 | - **F-score** is calculated over all test data based on the total number of false positives, false negatives and true positives.
236 |
237 | A detailed description of the metrics can be found on the [DCASE2016 website](http://www.cs.tut.fi/sgn/arg/dcase2016/sound-event-detection-metrics). A small sketch of the segment-based computation is shown below.
238 |
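Following the definitions above, the segment-based ER and F-score can be computed from per-segment counts like this (a sketch assuming non-zero reference counts; the actual implementation lives in `src/evaluation.py`):

    def segment_based_scores(segment_counts):
        """Compute ER and F from per-segment (Ntp, Nref, Nsys) count triplets."""
        S = D = I = Ntp = Nref = Nsys = 0.0
        for ntp, nref, nsys in segment_counts:
            S += min(nref, nsys) - ntp    # substitutions
            D += max(0, nref - nsys)      # deletions
            I += max(0, nsys - nref)      # insertions
            Ntp += ntp
            Nref += nref
            Nsys += nsys
        ER = (S + D + I) / Nref
        precision = Ntp / Nsys
        recall = Ntp / Nref
        F = 2 * precision * recall / (precision + recall)
        return ER, F

    # Example: two one-second segments
    print(segment_based_scores([(1, 2, 2), (0, 1, 1)]))   # (0.666..., 0.333...)
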
239 | ### Results
240 |
241 | ##### TUT Sound events 2016, development set
242 |
243 | [Dataset](https://zenodo.org/record/45759)
244 |
245 | *Evaluation setup*
246 |
247 | - 4 cross-validation folds
248 |
249 | *System parameters*
250 |
251 | - Frame size: 40 ms (with 50% hop size)
252 | - Number of Gaussians per sound event model (positive and negative): 16
253 | - Feature vector: 20 MFCC static coefficients (excluding 0th) + 20 delta MFCC coefficients + 20 acceleration MFCC coefficients = 60 values
254 | - Decision threshold: 140
255 |
256 | *Segment based metrics - overall*
257 |
258 | | Scene | ER | ER / S | ER / D | ER / I | F1 |
259 | |-----------------------|-------------|-------------|-------------|-------------|-------------|
260 | | Home | 0.96 | 0.08 | 0.82 | 0.06 | 15.9 % |
261 | | Residential area | 0.86 | 0.05 | 0.74 | 0.07 | 31.5 % |
262 | | **Average** | **0.91** | | | | **23.7 %** |
263 |
264 | *Segment based metrics - class-wise*
265 |
266 | | Scene | ER | F1 |
267 | |-----------------------|-------------|-------------|
268 | | Home | 1.06 | 9.2 % |
269 | | Residential area | 1.03 | 17.6 % |
270 | | **Average** | **1.04** | **13.4 %** |
271 |
272 | *Event based metrics (onset-only) - overall*
273 |
274 | | Scene | ER | F1 |
275 | |-----------------------|-------------|-------------|
276 | | Home | 1.28 | 4.7 % |
277 | | Residential area | 1.92 | 2.9 % |
278 | | **Average** | **1.60** | **3.8 %** |
279 |
280 | *Event based metrics (onset-only) - class-wise*
281 |
282 | | Scene | ER | F1 |
283 | |-----------------------|-------------|-------------|
284 | | Home | 1.27 | 4.3 % |
285 | | Residential area | 1.97 | 1.5 % |
286 | | **Average** | **1.62** | **2.9 %** |
287 |
288 |
289 | 6. System parameters
290 | ====================
291 | All the parameters are set in `task1_scene_classification.yaml`, and `task3_sound_event_detection_in_real_life_audio.yaml`.
292 |
293 | **Controlling the system flow**
294 |
295 | The blocks of the system can be controlled through the configuration file. Usually all of them can be kept enabled.
296 |
297 | flow:
298 | initialize: true
299 | extract_features: true
300 | feature_normalizer: true
301 | train_system: true
302 | test_system: true
303 | evaluate_system: true
304 |
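With the PyYAML version pinned in `requirements.txt`, reading the configuration and checking a flow switch is straightforward (a minimal sketch):

    import yaml

    with open('task1_scene_classification.yaml', 'r') as f:
        params = yaml.load(f)   # newer PyYAML versions prefer yaml.safe_load

    if params['flow']['extract_features']:
        print('Feature extraction is enabled')
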
305 | **General parameters**
306 |
307 | Selection of the datasets to use.
308 |
309 | general:
310 | development_dataset: TUTSoundEvents_2016_DevelopmentSet
311 | challenge_dataset: TUTSoundEvents_2016_EvaluationSet
312 |
313 | overwrite: false # Overwrite previously stored data
314 |
315 | `development_dataset: TUTSoundEvents_2016_DevelopmentSet`
316 | : The dataset handler class used while running the system in development mode. To handle a new dataset, inherit a new class from the Dataset base class (`src/dataset.py`).
317 |
318 | `challenge_dataset: TUTSoundEvents_2016_EvaluationSet`
319 | : The dataset handler class used while running the system in challenge mode. To handle a new dataset, inherit a new class from the Dataset base class (`src/dataset.py`); an illustrative sketch of such a subclass follows the class lists below.
320 |
321 | Available dataset handler classes:
322 |
323 | **DCASE 2016**
324 |
325 | - TUTAcousticScenes_2016_DevelopmentSet
326 | - TUTAcousticScenes_2016_EvaluationSet
327 | - TUTSoundEvents_2016_DevelopmentSet
328 | - TUTSoundEvents_2016_EvaluationSet
329 |
330 | **DCASE 2013**
331 |
332 | - DCASE2013_Scene_DevelopmentSet
333 | - DCASE2013_Scene_EvaluationSet
334 | - DCASE2013_Event_DevelopmentSet
335 | - DCASE2013_Event_EvaluationSet
336 |
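Purely as an illustration of the inheritance step (the actual `Dataset` base class interface lives in `src/dataset.py` and is not reproduced here, so the constructor signature and any overridden attributes below are hypothetical):

    from src.dataset import Dataset

    class MyCustomDataset(Dataset):
        """Hypothetical handler for a new dataset; the attributes and methods
        to override must follow the actual Dataset base class in src/dataset.py."""
        def __init__(self, data_path='data/'):
            Dataset.__init__(self, data_path=data_path)
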
337 |
338 | `overwrite: false`
339 | : Switch to make the system always overwrite existing data on disk.
340 |
341 | `challenge_submission_mode: false`
342 | : Switch to control where the system output is saved. If true, `path->challenge_results` is used, and all results are overwritten by default.
343 |
344 |
345 | **System paths**
346 |
347 | This section contains the storage paths.
348 |
349 | path:
350 | data: data/
351 |
352 | base: system/baseline_dcase2016_task1/
353 | features: features/
354 | feature_normalizers: feature_normalizers/
355 | models: acoustic_models/
356 | results: evaluation_results/
357 |
358 | challenge_results: challenge_submission/task_1_acoustic_scene_classification/
359 |
360 | These parameters define the folder structure used to store acoustic features, feature normalization data, acoustic models and evaluation results.
361 |
362 | `data: data/`
363 | : Defines the path where the dataset data is downloaded and stored. Path is relative to the main script.
364 |
365 | `base: system/baseline_dcase2016_task1/`
366 | : Defines the base path where the system stores its data. Other paths are stored under this path. If the specified directory does not exist, it is created. Path is relative to the main script.
367 |
368 | `challenge_results: challenge_submission/task_1_acoustic_scene_classification/`
369 | : Defines where the system output is stored while running the system in challenge mode.
370 |
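Illustratively, the storage locations are obtained by combining `base` with the sub-folder parameters (a sketch, not the repository's actual helper code):

    import os

    base = 'system/baseline_dcase2016_task1/'
    feature_path = os.path.join(base, 'features/')            # path->features
    model_path = os.path.join(base, 'acoustic_models/')       # path->models
    result_path = os.path.join(base, 'evaluation_results/')   # path->results
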
371 | **Feature extraction**
372 |
373 | This section contains the feature extraction related parameters.
374 |
375 | features:
376 | fs: 44100
377 | win_length_seconds: 0.04
378 | hop_length_seconds: 0.02
379 |
380 | include_mfcc0: true #
381 | include_delta: true #
382 | include_acceleration: true #
383 |
384 | mfcc:
385 | window: hamming_asymmetric # [hann_asymmetric, hamming_asymmetric]
386 | n_mfcc: 20 # Number of MFCC coefficients
387 | n_mels: 40 # Number of MEL bands used
388 | n_fft: 2048 # FFT length
389 | fmin: 0 # Minimum frequency when constructing MEL bands
390 | fmax: 22050 # Maximum frequency when constructing MEL band
391 | htk: false # Switch for HTK-styled MEL-frequency equation
392 |
393 | mfcc_delta:
394 | width: 9
395 |
396 | mfcc_acceleration:
397 | width: 9
398 |
399 | `fs: 44100`
400 | : Default sampling frequency. If the given dataset does not fulfill this criterion, the audio data is resampled.
401 |
402 |
403 | `win_length_seconds: 0.04`
404 | : Feature extraction frame length in seconds.
405 |
406 |
407 | `hop_length_seconds: 0.02`
408 | : Feature extraction frame hop-length in seconds.
409 |
410 |
411 | `include_mfcc0: true`
412 | : Switch to include the zeroth static MFCC coefficient in the feature vector.
413 |
414 |
415 | `include_delta: true`
416 | : Switch to include delta coefficients in the feature vector. The zeroth MFCC coefficient is always included in the delta coefficients. The width of the delta window is set in `mfcc_delta->width: 9`.
417 |
418 |
419 | `include_acceleration: true`
420 | : Switch to include acceleration (delta-delta) coefficients in the feature vector. The zeroth MFCC coefficient is always included in the acceleration coefficients. The width of the acceleration window is set in `mfcc_acceleration->width: 9`.
421 |
422 | `mfcc->n_mfcc: 20`
423 | : Number of MFCC coefficients
424 |
425 | `mfcc->fmax: 22050`
426 | : Maximum frequency for MEL bands. Usually this is set to half of the sampling frequency.
427 |
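As a minimal sketch of how these parameters map onto librosa calls (the repository's actual extraction code is in `src/features.py`; `audio.wav` is a placeholder):

    import numpy
    import librosa

    y, fs = librosa.load('audio.wav', sr=44100)        # resampled to features->fs

    mfcc = librosa.feature.mfcc(y=y, sr=fs,
                                n_mfcc=20, n_fft=2048,
                                hop_length=int(0.02 * fs),   # hop_length_seconds
                                n_mels=40, fmin=0, fmax=22050)
    mfcc_delta = librosa.feature.delta(mfcc, width=9)
    mfcc_acceleration = librosa.feature.delta(mfcc, width=9, order=2)

    # include_mfcc0 / include_delta / include_acceleration control which of
    # these blocks are stacked into the final 60-dimensional feature vectors
    feature_matrix = numpy.vstack((mfcc, mfcc_delta, mfcc_acceleration))
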
428 | **Classifier**
429 |
430 | This section contains the frame classifier related parameters. These parameters are used when the chosen classifier is trained.
431 |
432 | classifier:
433 | method: gmm # The system supports only gmm
434 |
435 | audio_error_handling: # Handling audio errors (temporary microphone failure and radio signal interferences from mobile phones)
436 | clean_data: false # Exclude audio errors from training audio
437 |
438 | parameters: !!null # Parameters are copied from classifier_parameters based on defined method
439 |
440 | classifier_parameters:
441 | gmm:
442 | n_components: 16 # Number of Gaussian components
443 | covariance_type: diag # Diagonal or full covariance matrix
444 | random_state: 0
445 | thresh: !!null
446 | tol: 0.001
447 | min_covar: 0.001
448 | n_iter: 40
449 | n_init: 1
450 | params: wmc
451 | init_params: wmc
452 |
453 | `audio_error_handling->clean_data: false`
454 | : Some datasets provide audio error annotations. With this switch, these annotations can be used to exclude segments containing audio errors from the feature matrix fed to the classifier during training. Audio errors can be temporary microphone failures or radio signal interference from mobile phones.
455 |
456 | `classifier_parameters->gmm->n_components: 16`
457 | : Number of Gaussians used in the modeling.
458 |
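These settings map directly onto the legacy `sklearn.mixture.GMM` constructor of the pinned scikit-learn 0.16 (a sketch with random stand-in features; `thresh: !!null` maps to the default `None` and is omitted here, and newer scikit-learn versions replace this class with `GaussianMixture`):

    import numpy
    from sklearn import mixture

    gmm_params = {
        'n_components': 16,
        'covariance_type': 'diag',
        'random_state': 0,
        'tol': 0.001,
        'min_covar': 0.001,
        'n_iter': 40,
        'n_init': 1,
        'params': 'wmc',
        'init_params': 'wmc',
    }

    feature_matrix = numpy.random.rand(500, 60)   # stand-in for real MFCC features
    model = mixture.GMM(**gmm_params)
    model.fit(feature_matrix)
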
459 | In order to add new classifiers to the system, add parameters under `classifier_parameters` with a new tag, set `classifier->method`, and add appropriate code where the `classifier_method` variable is used in the system block API (see the `do_system_training` and `do_system_testing` methods). In addition, one might want to modify the filename methods (`get_model_filename` and `get_result_filename`) to allow multiple classifier methods to co-exist in the system.
460 |
461 | **Recognizer**
462 |
463 | This section contains the sound recognition related parameters (used in `task1_scene_classification.py`).
464 |
465 | recognizer:
466 | audio_error_handling: # Handling audio errors (temporary microphone failure and radio signal interferences from mobile phones)
467 | clean_data: false # Exclude audio errors from test audio
468 |
469 | `audio_error_handling->clean_data: false`
470 | : Some datasets provide audio error annotations. With this switch, these annotations can be used to exclude segments containing audio errors from the feature matrix fed to the recognizer. Audio errors can be temporary microphone failures or radio signal interference from mobile phones.
471 |
472 | **Detector**
473 |
474 | This section contains the sound event detection related parameters (used in `task3_sound_event_detection_in_real_life_audio.py`).
475 |
476 | detector:
477 | decision_threshold: 140.0
478 | smoothing_window_length: 1.0 # seconds
479 | minimum_event_length: 0.1 # seconds
480 | minimum_event_gap: 0.1 # seconds
481 |
482 | `decision_threshold: 140.0`
483 | : Decision threshold used in the final classification. This can be used to control the sensitivity of the system. With log-likelihoods: `event_activity = (positive - negative) > decision_threshold`
484 |
485 |
486 | `smoothing_window_length: 1.0`
487 | : Size of the sliding accumulation window (in seconds) used before the frame-wise classification decision.
488 |
489 |
490 | `minimum_event_length: 0.1`
491 | : Minimum length (in seconds) of output events. Events shorter than this are filtered out of the system output.
492 |
493 |
494 | `minimum_event_gap: 0.1`
495 | : Minimum gap (in seconds) between events of the same event class in the output. Consecutive events (with the same event label) separated by a gap shorter than this are merged together.
496 |
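A simplified sketch of how these four parameters act on the frame-wise log-likelihood differences (the actual implementation is in `task3_sound_event_detection_in_real_life_audio.py`; function and variable names here are illustrative):

    import numpy

    def detect_events(llr, hop_length_seconds=0.02, decision_threshold=140.0,
                      smoothing_window_length=1.0, minimum_event_length=0.1,
                      minimum_event_gap=0.1):
        """llr: frame-wise (positive - negative) log-likelihoods for one class."""
        # Accumulate log-likelihoods over a sliding window before thresholding
        window = int(smoothing_window_length / hop_length_seconds)
        smoothed = numpy.convolve(llr, numpy.ones(window), mode='same')
        activity = smoothed > decision_threshold

        # Collect contiguous active regions as [onset, offset] in seconds
        events, onset = [], None
        for i, active in enumerate(activity):
            if active and onset is None:
                onset = i * hop_length_seconds
            elif not active and onset is not None:
                events.append([onset, i * hop_length_seconds])
                onset = None
        if onset is not None:
            events.append([onset, len(activity) * hop_length_seconds])

        # Merge events separated by less than minimum_event_gap
        merged = []
        for event in events:
            if merged and event[0] - merged[-1][1] < minimum_event_gap:
                merged[-1][1] = event[1]
            else:
                merged.append(event)

        # Filter out events shorter than minimum_event_length
        return [e for e in merged if e[1] - e[0] >= minimum_event_length]

With the default 20 ms hop, the 1.0 s window accumulates 50 frame-wise differences, so a threshold of 140 corresponds to an average per-frame log-likelihood difference of 2.8.
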
497 | 7. Changelog
498 | ============
499 | #### 1.2 / 2016-11-10
500 | * Added evaluation in challenge mode for task 1
501 |
502 | #### 1.1 / 2016-05-19
503 | * Added audio error handling
504 |
505 | #### 1.0 / 2016-02-08
506 | * Initial commit
507 |
508 | 8. License
509 | ==========
510 |
511 | See file [EULA.pdf](EULA.pdf)
512 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | scipy>=0.15.1
2 | numpy>=1.9.2
3 | scikit-learn==0.16.1
4 | pyyaml>=3.11
5 | librosa==0.4.0
6 | soundfile>=0.9.0
7 |
--------------------------------------------------------------------------------
/src/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TUT-ARG/DCASE2016-baseline-system-python/8e311066e3b670c52f4fcfe2a7060c18c9969cf8/src/__init__.py
--------------------------------------------------------------------------------
/src/evaluation.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 |
4 | import sys
5 | import numpy
6 | import math
7 | from sklearn import metrics
8 |
9 | class DCASE2016_SceneClassification_Metrics():
10 | """DCASE 2016 scene classification metrics
11 |
12 | Examples
13 | --------
14 |
15 | >>> dcase2016_scene_metric = DCASE2016_SceneClassification_Metrics(class_list=dataset.scene_labels)
16 | >>> for fold in dataset.folds(mode=dataset_evaluation_mode):
17 | >>> results = []
18 | >>> result_filename = get_result_filename(fold=fold, path=result_path)
19 | >>>
20 | >>> if os.path.isfile(result_filename):
21 | >>> with open(result_filename, 'rt') as f:
22 | >>> for row in csv.reader(f, delimiter='\t'):
23 | >>> results.append(row)
24 | >>>
25 | >>> y_true = []
26 | >>> y_pred = []
27 | >>> for result in results:
28 | >>> y_true.append(dataset.file_meta(result[0])[0]['scene_label'])
29 | >>> y_pred.append(result[1])
30 | >>>
31 | >>> dcase2016_scene_metric.evaluate(system_output=y_pred, annotated_ground_truth=y_true)
32 | >>>
33 | >>> results = dcase2016_scene_metric.results()
34 |
35 | """
36 |
37 | def __init__(self, class_list):
38 | """__init__ method.
39 |
40 | Parameters
41 | ----------
42 | class_list : list
43 | Evaluated scene labels in the list
44 |
45 | """
46 | self.accuracies_per_class = None
47 | self.correct_per_class = None
48 | self.Nsys = None
49 | self.Nref = None
50 | self.class_list = class_list
51 | self.eps = numpy.spacing(1)
52 |
53 | def __enter__(self):
54 | return self
55 |
56 | def __exit__(self, type, value, traceback):
57 | return self.results()
58 |
59 | def accuracies(self, y_true, y_pred, labels):
60 | """Calculate accuracy
61 |
62 | Parameters
63 | ----------
64 | y_true : numpy.array
65 | Ground truth array, list of scene labels
66 |
67 | y_pred : numpy.array
68 | System output array, list of scene labels
69 |
70 | labels : list
71 | list of scene labels
72 |
73 | Returns
74 | -------
75 | array : numpy.array [shape=(number of scene labels,)]
76 | Accuracy per scene label class
77 |
78 | """
79 |
80 | confusion_matrix = metrics.confusion_matrix(y_true=y_true, y_pred=y_pred, labels=labels).astype(float)
81 | return (numpy.diag(confusion_matrix), numpy.divide(numpy.diag(confusion_matrix), numpy.sum(confusion_matrix, 1)+self.eps))
82 |
83 | def evaluate(self, annotated_ground_truth, system_output):
84 | """Evaluate system output and annotated ground truth pair.
85 |
86 | Use results method to get results.
87 |
88 | Parameters
89 | ----------
90 | annotated_ground_truth : numpy.array
91 | Ground truth array, list of scene labels
92 |
93 | system_output : numpy.array
94 | System output array, list of scene labels
95 |
96 | Returns
97 | -------
98 | nothing
99 |
100 | """
101 |
102 | correct_per_class, accuracies_per_class = self.accuracies(y_pred=system_output, y_true=annotated_ground_truth, labels=self.class_list)
103 |
104 | if self.accuracies_per_class is None:
105 | self.accuracies_per_class = accuracies_per_class
106 | else:
107 | self.accuracies_per_class = numpy.vstack((self.accuracies_per_class, accuracies_per_class))
108 |
109 | if self.correct_per_class is None:
110 | self.correct_per_class = correct_per_class
111 | else:
112 | self.correct_per_class = numpy.vstack((self.correct_per_class, correct_per_class))
113 |
114 | Nref = numpy.zeros(len(self.class_list))
115 | Nsys = numpy.zeros(len(self.class_list))
116 |
117 | for class_id, class_label in enumerate(self.class_list):
118 | for item in system_output:
119 | if item == class_label:
120 | Nsys[class_id] += 1
121 |
122 | for item in annotated_ground_truth:
123 | if item == class_label:
124 | Nref[class_id] += 1
125 |
126 | if self.Nref is None:
127 | self.Nref = Nref
128 | else:
129 | self.Nref = numpy.vstack((self.Nref, Nref))
130 |
131 | if self.Nsys is None:
132 | self.Nsys = Nsys
133 | else:
134 | self.Nsys = numpy.vstack((self.Nsys, Nsys))
135 |
136 | def results(self):
137 | """Get results
138 |
139 | Outputs results in dict, format:
140 |
141 | {
142 | 'class_wise_data':
143 | {
144 | 'office': {
145 | 'Nsys': 10,
146 | 'Nref': 7,
147 | },
148 | }
149 | 'class_wise_accuracy':
150 | {
151 | 'office': 0.6,
152 | 'home': 0.4,
153 | }
154 | 'overall_accuracy': numpy.mean(self.accuracies_per_class)
155 | 'Nsys': 100,
156 | 'Nref': 100,
157 | }
158 |
159 | Parameters
160 | ----------
161 | nothing
162 |
163 | Returns
164 | -------
165 | results : dict
166 | Results dict
167 |
168 | """
169 |
170 | results = {
171 | 'class_wise_data': {},
172 | 'class_wise_accuracy': {},
173 | 'overall_accuracy': float(numpy.mean(self.accuracies_per_class)),
174 | 'class_wise_correct_count': self.correct_per_class.tolist(),
175 |
176 | }
177 | if len(self.Nsys.shape) == 2:
178 | results['Nsys'] = int(sum(sum(self.Nsys)))
179 | results['Nref'] = int(sum(sum(self.Nref)))
180 | else:
181 | results['Nsys'] = int(sum(self.Nsys))
182 | results['Nref'] = int(sum(self.Nref))
183 |
184 | for class_id, class_label in enumerate(self.class_list):
185 | if len(self.accuracies_per_class.shape) == 2:
186 | results['class_wise_accuracy'][class_label] = numpy.mean(self.accuracies_per_class[:, class_id])
187 | results['class_wise_data'][class_label] = {
188 | 'Nsys': int(sum(self.Nsys[:, class_id])),
189 | 'Nref': int(sum(self.Nref[:, class_id])),
190 | }
191 | else:
192 | results['class_wise_accuracy'][class_label] = numpy.mean(self.accuracies_per_class[class_id])
193 | results['class_wise_data'][class_label] = {
194 | 'Nsys': int(self.Nsys[class_id]),
195 | 'Nref': int(self.Nref[class_id]),
196 | }
197 |
198 | return results
199 |
200 |
201 | class EventDetectionMetrics(object):
202 | """Baseclass for sound event metric classes.
203 | """
204 |
205 | def __init__(self, class_list):
206 | """__init__ method.
207 |
208 | Parameters
209 | ----------
210 | class_list : list
211 | List of class labels to be evaluated.
212 |
213 | """
214 |
215 | self.class_list = class_list
216 | self.eps = numpy.spacing(1)
217 |
218 | def max_event_offset(self, data):
219 | """Get maximum event offset from event list
220 |
221 | Parameters
222 | ----------
223 | data : list
224 | Event list, list of event dicts
225 |
226 | Returns
227 | -------
228 |         max_offset : float > 0
229 | Maximum event offset
230 | """
231 |
232 |         max_offset = 0
233 |         for event in data:
234 |             if event['event_offset'] > max_offset:
235 |                 max_offset = event['event_offset']
236 |         return max_offset
237 |
238 | def list_to_roll(self, data, time_resolution=0.01):
239 | """Convert event list into event roll.
240 |         Event roll is a binary matrix indicating event activity within the time segments defined by time_resolution.
241 |
242 | Parameters
243 | ----------
244 | data : list
245 | Event list, list of event dicts
246 |
247 | time_resolution : float > 0
248 | Time resolution used when converting event into event roll.
249 |
250 | Returns
251 | -------
252 |         event_roll : numpy.ndarray [shape=(math.ceil(data_length * 1 / time_resolution), number of classes)]
253 | Event roll
254 | """
255 |
256 | # Initialize
257 | data_length = self.max_event_offset(data)
258 | event_roll = numpy.zeros(( int(math.ceil(data_length * 1 / time_resolution)), len(self.class_list)))
259 |
260 | # Fill-in event_roll
261 | for event in data:
262 | pos = self.class_list.index(event['event_label'].rstrip())
263 |
264 | onset = int(math.floor(event['event_onset'] * 1 / time_resolution))
265 | offset = int(math.ceil(event['event_offset'] * 1 / time_resolution))
266 |
267 | event_roll[onset:offset, pos] = 1
268 |
269 | return event_roll
270 |
271 |
272 | class DCASE2016_EventDetection_SegmentBasedMetrics(EventDetectionMetrics):
273 | """DCASE2016 Segment based metrics for sound event detection
274 |
275 | Supported metrics:
276 | - Overall
277 | - Error rate (ER), Substitutions (S), Insertions (I), Deletions (D)
278 | - F-score (F1)
279 | - Class-wise
280 | - Error rate (ER), Insertions (I), Deletions (D)
281 | - F-score (F1)
282 |
283 | Examples
284 | --------
285 |
286 | >>> overall_metrics_per_scene = {}
287 | >>> for scene_id, scene_label in enumerate(dataset.scene_labels):
288 | >>> dcase2016_segment_based_metric = DCASE2016_EventDetection_SegmentBasedMetrics(class_list=dataset.event_labels(scene_label=scene_label))
289 | >>> for fold in dataset.folds(mode=dataset_evaluation_mode):
290 | >>> results = []
291 | >>> result_filename = get_result_filename(fold=fold, scene_label=scene_label, path=result_path)
292 | >>>
293 | >>> if os.path.isfile(result_filename):
294 | >>> with open(result_filename, 'rt') as f:
295 | >>> for row in csv.reader(f, delimiter='\t'):
296 | >>> results.append(row)
297 | >>>
298 | >>> for file_id, item in enumerate(dataset.test(fold,scene_label=scene_label)):
299 | >>> current_file_results = []
300 | >>> for result_line in results:
301 | >>> if result_line[0] == dataset.absolute_to_relative(item['file']):
302 | >>> current_file_results.append(
303 | >>> {'file': result_line[0],
304 | >>> 'event_onset': float(result_line[1]),
305 | >>> 'event_offset': float(result_line[2]),
306 | >>> 'event_label': result_line[3]
307 | >>> }
308 | >>> )
309 | >>> meta = dataset.file_meta(dataset.absolute_to_relative(item['file']))
310 | >>> dcase2016_segment_based_metric.evaluate(system_output=current_file_results, annotated_ground_truth=meta)
311 | >>> overall_metrics_per_scene[scene_label]['segment_based_metrics'] = dcase2016_segment_based_metric.results()
312 |
313 | """
314 |
315 | def __init__(self, class_list, time_resolution=1.0):
316 | """__init__ method.
317 |
318 | Parameters
319 | ----------
320 | class_list : list
321 | List of class labels to be evaluated.
322 |
323 | time_resolution : float > 0
324 | Time resolution used when converting event into event roll.
325 | (Default value = 1.0)
326 |
327 | """
328 |
329 | self.time_resolution = time_resolution
330 |
331 | self.overall = {
332 | 'Ntp': 0.0,
333 | 'Ntn': 0.0,
334 | 'Nfp': 0.0,
335 | 'Nfn': 0.0,
336 | 'Nref': 0.0,
337 | 'Nsys': 0.0,
338 | 'ER': 0.0,
339 | 'S': 0.0,
340 | 'D': 0.0,
341 | 'I': 0.0,
342 | }
343 | self.class_wise = {}
344 |
345 | for class_label in class_list:
346 | self.class_wise[class_label] = {
347 | 'Ntp': 0.0,
348 | 'Ntn': 0.0,
349 | 'Nfp': 0.0,
350 | 'Nfn': 0.0,
351 | 'Nref': 0.0,
352 | 'Nsys': 0.0,
353 | }
354 |
355 | EventDetectionMetrics.__init__(self, class_list=class_list)
356 |
357 | def __enter__(self):
358 | # Initialize class and return it
359 | return self
360 |
361 | def __exit__(self, type, value, traceback):
362 | # Finalize evaluation and return results
363 | return self.results()
364 |
365 | def evaluate(self, annotated_ground_truth, system_output):
366 | """Evaluate system output and annotated ground truth pair.
367 |
368 | Use results method to get results.
369 |
370 | Parameters
371 | ----------
372 | annotated_ground_truth : numpy.array
373 | Ground truth array, list of scene labels
374 |
375 | system_output : numpy.array
376 | System output array, list of scene labels
377 |
378 | Returns
379 | -------
380 | nothing
381 |
382 | """
383 |
384 | # Convert event list into frame-based representation
385 | system_event_roll = self.list_to_roll(data=system_output, time_resolution=self.time_resolution)
386 | annotated_event_roll = self.list_to_roll(data=annotated_ground_truth, time_resolution=self.time_resolution)
387 |
388 | # Fix durations of both event_rolls to be equal
389 | if annotated_event_roll.shape[0] > system_event_roll.shape[0]:
390 | padding = numpy.zeros((annotated_event_roll.shape[0] - system_event_roll.shape[0], len(self.class_list)))
391 | system_event_roll = numpy.vstack((system_event_roll, padding))
392 |
393 | if system_event_roll.shape[0] > annotated_event_roll.shape[0]:
394 | padding = numpy.zeros((system_event_roll.shape[0] - annotated_event_roll.shape[0], len(self.class_list)))
395 | annotated_event_roll = numpy.vstack((annotated_event_roll, padding))
396 |
397 | # Compute segment-based overall metrics
398 | for segment_id in range(0, annotated_event_roll.shape[0]):
399 | annotated_segment = annotated_event_roll[segment_id, :]
400 | system_segment = system_event_roll[segment_id, :]
401 |
402 | Ntp = sum(system_segment + annotated_segment > 1)
403 | Ntn = sum(system_segment + annotated_segment == 0)
404 | Nfp = sum(system_segment - annotated_segment > 0)
405 | Nfn = sum(annotated_segment - system_segment > 0)
406 |
407 | Nref = sum(annotated_segment)
408 | Nsys = sum(system_segment)
409 |
410 | S = min(Nref, Nsys) - Ntp
411 | D = max(0, Nref - Nsys)
412 | I = max(0, Nsys - Nref)
413 | ER = max(Nref, Nsys) - Ntp
414 |
415 | self.overall['Ntp'] += Ntp
416 | self.overall['Ntn'] += Ntn
417 | self.overall['Nfp'] += Nfp
418 | self.overall['Nfn'] += Nfn
419 | self.overall['Nref'] += Nref
420 | self.overall['Nsys'] += Nsys
421 | self.overall['S'] += S
422 | self.overall['D'] += D
423 | self.overall['I'] += I
424 | self.overall['ER'] += ER
425 |
426 | for class_id, class_label in enumerate(self.class_list):
427 | annotated_segment = annotated_event_roll[:, class_id]
428 | system_segment = system_event_roll[:, class_id]
429 |
430 | Ntp = sum(system_segment + annotated_segment > 1)
431 | Ntn = sum(system_segment + annotated_segment == 0)
432 | Nfp = sum(system_segment - annotated_segment > 0)
433 | Nfn = sum(annotated_segment - system_segment > 0)
434 |
435 | Nref = sum(annotated_segment)
436 | Nsys = sum(system_segment)
437 |
438 | self.class_wise[class_label]['Ntp'] += Ntp
439 | self.class_wise[class_label]['Ntn'] += Ntn
440 | self.class_wise[class_label]['Nfp'] += Nfp
441 | self.class_wise[class_label]['Nfn'] += Nfn
442 | self.class_wise[class_label]['Nref'] += Nref
443 | self.class_wise[class_label]['Nsys'] += Nsys
444 |
445 | return self
446 |
447 | def results(self):
448 | """Get results
449 |
450 | Outputs results in dict, format:
451 |
452 | {
453 | 'overall':
454 | {
455 | 'Pre':
456 | 'Rec':
457 | 'F':
458 | 'ER':
459 | 'S':
460 | 'D':
461 | 'I':
462 | }
463 | 'class_wise':
464 | {
465 | 'office': {
466 | 'Pre':
467 | 'Rec':
468 | 'F':
469 | 'ER':
470 | 'D':
471 | 'I':
472 | 'Nref':
473 | 'Nsys':
474 | 'Ntp':
475 | 'Nfn':
476 | 'Nfp':
477 | },
478 | }
479 | 'class_wise_average':
480 | {
481 | 'F':
482 | 'ER':
483 | }
484 | }
485 |
486 | Parameters
487 | ----------
488 | nothing
489 |
490 | Returns
491 | -------
492 | results : dict
493 | Results dict
494 |
495 | """
496 |
497 | results = {'overall': {},
498 | 'class_wise': {},
499 | 'class_wise_average': {},
500 | }
501 |
502 | # Overall metrics
503 | results['overall']['Pre'] = self.overall['Ntp'] / (self.overall['Nsys'] + self.eps)
504 |         results['overall']['Rec'] = self.overall['Ntp'] / (self.overall['Nref'] + self.eps)
505 | results['overall']['F'] = 2 * ((results['overall']['Pre'] * results['overall']['Rec']) / (results['overall']['Pre'] + results['overall']['Rec'] + self.eps))
506 |
507 | results['overall']['ER'] = self.overall['ER'] / self.overall['Nref']
508 | results['overall']['S'] = self.overall['S'] / self.overall['Nref']
509 | results['overall']['D'] = self.overall['D'] / self.overall['Nref']
510 | results['overall']['I'] = self.overall['I'] / self.overall['Nref']
511 |
512 | # Class-wise metrics
513 | class_wise_F = []
514 | class_wise_ER = []
515 | for class_id, class_label in enumerate(self.class_list):
516 | if class_label not in results['class_wise']:
517 | results['class_wise'][class_label] = {}
518 | results['class_wise'][class_label]['Pre'] = self.class_wise[class_label]['Ntp'] / (self.class_wise[class_label]['Nsys'] + self.eps)
519 | results['class_wise'][class_label]['Rec'] = self.class_wise[class_label]['Ntp'] / (self.class_wise[class_label]['Nref'] + self.eps)
520 | results['class_wise'][class_label]['F'] = 2 * ((results['class_wise'][class_label]['Pre'] * results['class_wise'][class_label]['Rec']) / (results['class_wise'][class_label]['Pre'] + results['class_wise'][class_label]['Rec'] + self.eps))
521 |
522 | results['class_wise'][class_label]['ER'] = (self.class_wise[class_label]['Nfn'] + self.class_wise[class_label]['Nfp']) / (self.class_wise[class_label]['Nref'] + self.eps)
523 | results['class_wise'][class_label]['D'] = self.class_wise[class_label]['Nfn'] / (self.class_wise[class_label]['Nref'] + self.eps)
524 | results['class_wise'][class_label]['I'] = self.class_wise[class_label]['Nfp'] / (self.class_wise[class_label]['Nref'] + self.eps)
525 |
526 | results['class_wise'][class_label]['Nref'] = self.class_wise[class_label]['Nref']
527 | results['class_wise'][class_label]['Nsys'] = self.class_wise[class_label]['Nsys']
528 | results['class_wise'][class_label]['Ntp'] = self.class_wise[class_label]['Ntp']
529 | results['class_wise'][class_label]['Nfn'] = self.class_wise[class_label]['Nfn']
530 | results['class_wise'][class_label]['Nfp'] = self.class_wise[class_label]['Nfp']
531 |
532 | class_wise_F.append(results['class_wise'][class_label]['F'])
533 | class_wise_ER.append(results['class_wise'][class_label]['ER'])
534 |
535 | results['class_wise_average']['F'] = numpy.mean(class_wise_F)
536 | results['class_wise_average']['ER'] = numpy.mean(class_wise_ER)
537 |
538 | return results
539 |
540 |
541 | class DCASE2016_EventDetection_EventBasedMetrics(EventDetectionMetrics):
542 | """DCASE2016 Event based metrics for sound event detection
543 |
544 | Supported metrics:
545 | - Overall
546 | - Error rate (ER), Substitutions (S), Insertions (I), Deletions (D)
547 | - F-score (F1)
548 | - Class-wise
549 | - Error rate (ER), Insertions (I), Deletions (D)
550 | - F-score (F1)
551 |
552 | Examples
553 | --------
554 |
555 | >>> overall_metrics_per_scene = {}
556 | >>> for scene_id, scene_label in enumerate(dataset.scene_labels):
557 | >>> dcase2016_event_based_metric = DCASE2016_EventDetection_EventBasedMetrics(class_list=dataset.event_labels(scene_label=scene_label))
558 | >>> for fold in dataset.folds(mode=dataset_evaluation_mode):
559 | >>> results = []
560 | >>> result_filename = get_result_filename(fold=fold, scene_label=scene_label, path=result_path)
561 | >>>
562 | >>> if os.path.isfile(result_filename):
563 | >>> with open(result_filename, 'rt') as f:
564 | >>> for row in csv.reader(f, delimiter='\t'):
565 | >>> results.append(row)
566 | >>>
567 | >>> for file_id, item in enumerate(dataset.test(fold,scene_label=scene_label)):
568 | >>> current_file_results = []
569 | >>> for result_line in results:
570 | >>> if result_line[0] == dataset.absolute_to_relative(item['file']):
571 | >>> current_file_results.append(
572 | >>> {'file': result_line[0],
573 | >>> 'event_onset': float(result_line[1]),
574 | >>> 'event_offset': float(result_line[2]),
575 | >>> 'event_label': result_line[3]
576 | >>> }
577 | >>> )
578 | >>> meta = dataset.file_meta(dataset.absolute_to_relative(item['file']))
579 | >>> dcase2016_event_based_metric.evaluate(system_output=current_file_results, annotated_ground_truth=meta)
580 | >>> overall_metrics_per_scene[scene_label]['event_based_metrics'] = dcase2016_event_based_metric.results()
581 |
582 | """
583 |
584 | def __init__(self, class_list, t_collar=0.2, use_onset_condition=True, use_offset_condition=True):
585 | """__init__ method.
586 |
587 | Parameters
588 | ----------
589 | class_list : list
590 | List of class labels to be evaluated.
591 |
592 | t_collar : float > 0
593 | Time collar for event onset and offset condition
594 | (Default value = 0.2)
595 |
596 | use_onset_condition : bool
597 | Use onset condition when finding correctly detected events
598 | (Default value = True)
599 |
600 | use_offset_condition : bool
601 | Use offset condition when finding correctly detected events
602 | (Default value = True)
603 |
604 | """
605 |
606 | self.t_collar = t_collar
607 | self.use_onset_condition = use_onset_condition
608 | self.use_offset_condition = use_offset_condition
609 |
610 | self.overall = {
611 | 'Nref': 0.0,
612 | 'Nsys': 0.0,
613 | 'Nsubs': 0.0,
614 | 'Ntp': 0.0,
615 | 'Nfp': 0.0,
616 | 'Nfn': 0.0,
617 | }
618 | self.class_wise = {}
619 |
620 | for class_label in class_list:
621 | self.class_wise[class_label] = {
622 | 'Nref': 0.0,
623 | 'Nsys': 0.0,
624 | 'Ntp': 0.0,
625 | 'Ntn': 0.0,
626 | 'Nfp': 0.0,
627 | 'Nfn': 0.0,
628 | }
629 |
630 | EventDetectionMetrics.__init__(self, class_list=class_list)
631 |
632 | def __enter__(self):
633 | # Initialize class and return it
634 | return self
635 |
636 | def __exit__(self, type, value, traceback):
637 | # Finalize evaluation and return results
638 | return self.results()
639 |
640 | def evaluate(self, annotated_ground_truth, system_output):
641 | """Evaluate system output and annotated ground truth pair.
642 |
643 | Use results method to get results.
644 |
645 | Parameters
646 | ----------
647 | annotated_ground_truth : numpy.array
648 | Ground truth array, list of scene labels
649 |
650 | system_output : numpy.array
651 | System output array, list of scene labels
652 |
653 | Returns
654 | -------
655 | nothing
656 |
657 | """
658 |
659 | # Overall metrics
660 |
661 | # Total number of detected and reference events
662 | Nsys = len(system_output)
663 | Nref = len(annotated_ground_truth)
664 |
665 | sys_correct = numpy.zeros(Nsys, dtype=bool)
666 | ref_correct = numpy.zeros(Nref, dtype=bool)
667 |
668 | # Number of correctly transcribed events, onset/offset within a t_collar range
669 | for j in range(0, len(annotated_ground_truth)):
670 | for i in range(0, len(system_output)):
671 | if not sys_correct[i]: # skip already matched events
672 | label_condition = annotated_ground_truth[j]['event_label'] == system_output[i]['event_label']
673 | if self.use_onset_condition:
674 | onset_condition = self.onset_condition(annotated_event=annotated_ground_truth[j],
675 | system_event=system_output[i],
676 | t_collar=self.t_collar)
677 | else:
678 | onset_condition = True
679 |
680 | if self.use_offset_condition:
681 | offset_condition = self.offset_condition(annotated_event=annotated_ground_truth[j],
682 | system_event=system_output[i],
683 | t_collar=self.t_collar)
684 | else:
685 | offset_condition = True
686 |
687 | if label_condition and onset_condition and offset_condition:
688 | ref_correct[j] = True
689 | sys_correct[i] = True
690 | break
691 |
692 | Ntp = numpy.sum(sys_correct)
693 |
694 |         sys_leftover = numpy.nonzero(numpy.logical_not(sys_correct))[0]
695 |         ref_leftover = numpy.nonzero(numpy.logical_not(ref_correct))[0]
696 |
697 | # Substitutions
698 | Nsubs = 0
699 | sys_counted = numpy.zeros(Nsys, dtype=bool)
700 | for j in ref_leftover:
701 | for i in sys_leftover:
702 | if not sys_counted[i]:
703 | if self.use_onset_condition:
704 | onset_condition = self.onset_condition(annotated_event=annotated_ground_truth[j],
705 | system_event=system_output[i],
706 | t_collar=self.t_collar)
707 | else:
708 | onset_condition = True
709 |
710 | if self.use_offset_condition:
711 | offset_condition = self.offset_condition(annotated_event=annotated_ground_truth[j],
712 | system_event=system_output[i],
713 | t_collar=self.t_collar)
714 | else:
715 | offset_condition = True
716 |
717 | if onset_condition and offset_condition:
718 | sys_counted[i] = True
719 | Nsubs += 1
720 | break
721 |
722 | Nfp = Nsys - Ntp - Nsubs
723 | Nfn = Nref - Ntp - Nsubs
724 |
725 | self.overall['Nref'] += Nref
726 | self.overall['Nsys'] += Nsys
727 | self.overall['Ntp'] += Ntp
728 | self.overall['Nsubs'] += Nsubs
729 | self.overall['Nfp'] += Nfp
730 | self.overall['Nfn'] += Nfn
731 |
732 | # Class-wise metrics
733 | for class_id, class_label in enumerate(self.class_list):
734 | Nref = 0.0
735 | Nsys = 0.0
736 | Ntp = 0.0
737 |
738 | # Count event frequencies in the ground truth
739 | for i in range(0, len(annotated_ground_truth)):
740 | if annotated_ground_truth[i]['event_label'] == class_label:
741 | Nref += 1
742 |
743 | # Count event frequencies in the system output
744 | for i in range(0, len(system_output)):
745 | if system_output[i]['event_label'] == class_label:
746 | Nsys += 1
747 |
748 | sys_counted = numpy.zeros(len(system_output), dtype=bool)
749 | for j in range(0, len(annotated_ground_truth)):
750 | if annotated_ground_truth[j]['event_label'] == class_label:
751 | for i in range(0, len(system_output)):
752 | if system_output[i]['event_label'] == class_label and not sys_counted[i]:
753 | if self.use_onset_condition:
754 | onset_condition = self.onset_condition(annotated_event=annotated_ground_truth[j],
755 | system_event=system_output[i],
756 | t_collar=self.t_collar)
757 | else:
758 | onset_condition = True
759 |
760 | if self.use_offset_condition:
761 | offset_condition = self.offset_condition(annotated_event=annotated_ground_truth[j],
762 | system_event=system_output[i],
763 | t_collar=self.t_collar)
764 | else:
765 | offset_condition = True
766 |
767 | if onset_condition and offset_condition:
768 | sys_counted[i] = True
769 | Ntp += 1
770 | break
771 |
772 | Nfp = Nsys - Ntp
773 | Nfn = Nref - Ntp
774 |
775 | self.class_wise[class_label]['Nref'] += Nref
776 | self.class_wise[class_label]['Nsys'] += Nsys
777 |
778 | self.class_wise[class_label]['Ntp'] += Ntp
779 | self.class_wise[class_label]['Nfp'] += Nfp
780 | self.class_wise[class_label]['Nfn'] += Nfn
781 |
782 |
783 | def onset_condition(self, annotated_event, system_event, t_collar=0.200):
784 | """Onset condition, checked does the event pair fulfill condition
785 |
786 | Condition:
787 |
788 | - event onsets are within t_collar each other
789 |
790 | Parameters
791 | ----------
792 | annotated_event : dict
793 | Event dict
794 |
795 | system_event : dict
796 | Event dict
797 |
798 | t_collar : float > 0
799 |             Defines how close event onsets have to be in order to be considered a match. In seconds.
800 | (Default value = 0.2)
801 |
802 | Returns
803 | -------
804 | result : bool
805 | Condition result
806 |
807 | """
808 |
809 | return math.fabs(annotated_event['event_onset'] - system_event['event_onset']) <= t_collar
810 |
811 | def offset_condition(self, annotated_event, system_event, t_collar=0.200, percentage_of_length=0.5):
812 | """Offset condition, checking does the event pair fulfill condition
813 |
814 | Condition:
815 |
816 | - event offsets are within t_collar each other
817 | or
818 | - system event offset is within the percentage_of_length*annotated event_length
819 |
820 | Parameters
821 | ----------
822 | annotated_event : dict
823 | Event dict
824 |
825 | system_event : dict
826 | Event dict
827 |
828 | t_collar : float > 0
829 |             Defines how close event offsets have to be in order to be considered a match. In seconds.
830 | (Default value = 0.2)
831 |
832 | percentage_of_length : float [0-1]
833 |             Percentage of the annotated event length within which the system event offset must fall (default value = 0.5).
834 |
835 | Returns
836 | -------
837 | result : bool
838 | Condition result
839 |
840 | """
841 | annotated_length = annotated_event['event_offset'] - annotated_event['event_onset']
842 | return math.fabs(annotated_event['event_offset'] - system_event['event_offset']) <= max(t_collar, percentage_of_length * annotated_length)
843 |
844 | def results(self):
845 | """Get results
846 |
847 | Outputs results in dict, format:
848 |
849 | {
850 | 'overall':
851 | {
852 | 'Pre':
853 | 'Rec':
854 | 'F':
855 | 'ER':
856 | 'S':
857 | 'D':
858 | 'I':
859 | }
860 | 'class_wise':
861 | {
862 | 'office': {
863 | 'Pre':
864 | 'Rec':
865 | 'F':
866 | 'ER':
867 | 'D':
868 | 'I':
869 | 'Nref':
870 | 'Nsys':
871 | 'Ntp':
872 | 'Nfn':
873 | 'Nfp':
874 | },
875 | }
876 | 'class_wise_average':
877 | {
878 | 'F':
879 | 'ER':
880 | }
881 | }
882 |
883 | Parameters
884 | ----------
885 | nothing
886 |
887 | Returns
888 | -------
889 | results : dict
890 | Results dict
891 |
892 | """
893 |
894 | results = {
895 | 'overall': {},
896 | 'class_wise': {},
897 | 'class_wise_average': {},
898 | }
899 |
900 | # Overall metrics
901 | results['overall']['Pre'] = self.overall['Ntp'] / (self.overall['Nsys'] + self.eps)
902 |         results['overall']['Rec'] = self.overall['Ntp'] / (self.overall['Nref'] + self.eps)
903 | results['overall']['F'] = 2 * ((results['overall']['Pre'] * results['overall']['Rec']) / (results['overall']['Pre'] + results['overall']['Rec'] + self.eps))
904 |
905 | results['overall']['ER'] = (self.overall['Nfn'] + self.overall['Nfp'] + self.overall['Nsubs']) / self.overall['Nref']
906 | results['overall']['S'] = self.overall['Nsubs'] / self.overall['Nref']
907 | results['overall']['D'] = self.overall['Nfn'] / self.overall['Nref']
908 | results['overall']['I'] = self.overall['Nfp'] / self.overall['Nref']
909 |
910 | # Class-wise metrics
911 | class_wise_F = []
912 | class_wise_ER = []
913 |
914 | for class_label in self.class_list:
915 | if class_label not in results['class_wise']:
916 | results['class_wise'][class_label] = {}
917 |
918 | results['class_wise'][class_label]['Pre'] = self.class_wise[class_label]['Ntp'] / (self.class_wise[class_label]['Nsys'] + self.eps)
919 | results['class_wise'][class_label]['Rec'] = self.class_wise[class_label]['Ntp'] / (self.class_wise[class_label]['Nref'] + self.eps)
920 | results['class_wise'][class_label]['F'] = 2 * ((results['class_wise'][class_label]['Pre'] * results['class_wise'][class_label]['Rec']) / (results['class_wise'][class_label]['Pre'] + results['class_wise'][class_label]['Rec'] + self.eps))
921 |
922 | results['class_wise'][class_label]['ER'] = (self.class_wise[class_label]['Nfn']+self.class_wise[class_label]['Nfp']) / (self.class_wise[class_label]['Nref'] + self.eps)
923 | results['class_wise'][class_label]['D'] = self.class_wise[class_label]['Nfn'] / (self.class_wise[class_label]['Nref'] + self.eps)
924 | results['class_wise'][class_label]['I'] = self.class_wise[class_label]['Nfp'] / (self.class_wise[class_label]['Nref'] + self.eps)
925 |
926 | results['class_wise'][class_label]['Nref'] = self.class_wise[class_label]['Nref']
927 | results['class_wise'][class_label]['Nsys'] = self.class_wise[class_label]['Nsys']
928 | results['class_wise'][class_label]['Ntp'] = self.class_wise[class_label]['Ntp']
929 | results['class_wise'][class_label]['Nfn'] = self.class_wise[class_label]['Nfn']
930 | results['class_wise'][class_label]['Nfp'] = self.class_wise[class_label]['Nfp']
931 |
932 | class_wise_F.append(results['class_wise'][class_label]['F'])
933 | class_wise_ER.append(results['class_wise'][class_label]['ER'])
934 |
935 | # Class-wise average
936 | results['class_wise_average']['F'] = numpy.mean(class_wise_F)
937 | results['class_wise_average']['ER'] = numpy.mean(class_wise_ER)
938 |
939 | return results
940 |
941 |
942 | class DCASE2013_EventDetection_Metrics(EventDetectionMetrics):
943 | """Lecagy DCASE2013 metrics, converted from the provided Matlab implementation
944 |
945 | Supported metrics:
946 | - Frame based
947 | - F-score (F)
948 | - AEER
949 | - Event based
950 | - Onset
951 | - F-Score (F)
952 | - AEER
953 | - Onset-offset
954 | - F-Score (F)
955 | - AEER
956 | - Class based
957 | - Onset
958 | - F-Score (F)
959 | - AEER
960 | - Onset-offset
961 | - F-Score (F)
962 | - AEER
963 | """
964 |
965 | #
966 |
967 | def frame_based(self, annotated_ground_truth, system_output, resolution=0.01):
968 | # Convert event list into frame-based representation
969 | system_event_roll = self.list_to_roll(data=system_output, time_resolution=resolution)
970 | annotated_event_roll = self.list_to_roll(data=annotated_ground_truth, time_resolution=resolution)
971 |
972 | # Fix durations of both event_rolls to be equal
973 | if annotated_event_roll.shape[0] > system_event_roll.shape[0]:
974 | padding = numpy.zeros((annotated_event_roll.shape[0] - system_event_roll.shape[0], len(self.class_list)))
975 | system_event_roll = numpy.vstack((system_event_roll, padding))
976 |
977 | if system_event_roll.shape[0] > annotated_event_roll.shape[0]:
978 | padding = numpy.zeros((system_event_roll.shape[0] - annotated_event_roll.shape[0], len(self.class_list)))
979 | annotated_event_roll = numpy.vstack((annotated_event_roll, padding))
980 |
981 | # Compute frame-based metrics
982 | Nref = sum(sum(annotated_event_roll))
983 | Ntot = sum(sum(system_event_roll))
984 | Ntp = sum(sum(system_event_roll + annotated_event_roll > 1))
985 | Nfp = sum(sum(system_event_roll - annotated_event_roll > 0))
986 | Nfn = sum(sum(annotated_event_roll - system_event_roll > 0))
987 | Nsubs = min(Nfp, Nfn)
988 |
989 | eps = numpy.spacing(1)
990 |
991 | results = dict()
992 | results['Rec'] = Ntp / (Nref + eps)
993 | results['Pre'] = Ntp / (Ntot + eps)
994 | results['F'] = 2 * ((results['Pre'] * results['Rec']) / (results['Pre'] + results['Rec'] + eps))
995 | results['AEER'] = (Nfn + Nfp + Nsubs) / (Nref + eps)
996 |
997 | return results
998 |
999 | def event_based(self, annotated_ground_truth, system_output):
1000 | # Event-based evaluation for event detection task
1001 | # outputFile: the output of the event detection system
1002 | # GTFile: the ground truth list of events
1003 |
1004 | # Total number of detected and reference events
1005 | Ntot = len(system_output)
1006 | Nref = len(annotated_ground_truth)
1007 |
1008 | # Number of correctly transcribed events, onset within a +/-100 ms range
1009 | Ncorr = 0
1010 | NcorrOff = 0
1011 | for j in range(0, len(annotated_ground_truth)):
1012 | for i in range(0, len(system_output)):
1013 | if annotated_ground_truth[j]['event_label'] == system_output[i]['event_label'] and (math.fabs(annotated_ground_truth[j]['event_onset'] - system_output[i]['event_onset']) <= 0.1):
1014 | Ncorr += 1
1015 |
1016 | # If offset within a +/-100 ms range or within 50% of ground-truth event's duration
1017 | if math.fabs(annotated_ground_truth[j]['event_offset'] - system_output[i]['event_offset']) <= max(0.1, 0.5 * (annotated_ground_truth[j]['event_offset'] - annotated_ground_truth[j]['event_onset'])):
1018 | NcorrOff += 1
1019 |
1020 | break # In order to not evaluate duplicates
1021 |
1022 | # Compute onset-only event-based metrics
1023 | eps = numpy.spacing(1)
1024 | results = {
1025 | 'onset': {},
1026 | 'onset-offset': {},
1027 | }
1028 |
1029 | Nfp = Ntot - Ncorr
1030 | Nfn = Nref - Ncorr
1031 | Nsubs = min(Nfp, Nfn)
1032 | results['onset']['Rec'] = Ncorr / (Nref + eps)
1033 | results['onset']['Pre'] = Ncorr / (Ntot + eps)
1034 | results['onset']['F'] = 2 * (
1035 | (results['onset']['Pre'] * results['onset']['Rec']) / (
1036 | results['onset']['Pre'] + results['onset']['Rec'] + eps))
1037 | results['onset']['AEER'] = (Nfn + Nfp + Nsubs) / (Nref + eps)
1038 |
1039 | # Compute onset-offset event-based metrics
1040 | NfpOff = Ntot - NcorrOff
1041 | NfnOff = Nref - NcorrOff
1042 | NsubsOff = min(NfpOff, NfnOff)
1043 | results['onset-offset']['Rec'] = NcorrOff / (Nref + eps)
1044 | results['onset-offset']['Pre'] = NcorrOff / (Ntot + eps)
1045 | results['onset-offset']['F'] = 2 * ((results['onset-offset']['Pre'] * results['onset-offset']['Rec']) / (
1046 | results['onset-offset']['Pre'] + results['onset-offset']['Rec'] + eps))
1047 | results['onset-offset']['AEER'] = (NfnOff + NfpOff + NsubsOff) / (Nref + eps)
1048 |
1049 | return results
1050 |
1051 | def class_based(self, annotated_ground_truth, system_output):
1052 | # Class-wise event-based evaluation for event detection task
1053 | # outputFile: the output of the event detection system
1054 | # GTFile: the ground truth list of events
1055 |
1056 | # Total number of detected and reference events per class
1057 | Ntot = numpy.zeros((len(self.class_list), 1))
1058 | for event in system_output:
1059 | pos = self.class_list.index(event['event_label'])
1060 | Ntot[pos] += 1
1061 |
1062 | Nref = numpy.zeros((len(self.class_list), 1))
1063 | for event in annotated_ground_truth:
1064 | pos = self.class_list.index(event['event_label'])
1065 | Nref[pos] += 1
1066 |
1067 | I = (Nref > 0).nonzero()[0] # index for classes present in ground-truth
1068 |
1069 | # Number of correctly transcribed events per class, onset within a +/-100 ms range
1070 | Ncorr = numpy.zeros((len(self.class_list), 1))
1071 | NcorrOff = numpy.zeros((len(self.class_list), 1))
1072 |
1073 | for j in range(0, len(annotated_ground_truth)):
1074 | for i in range(0, len(system_output)):
1075 | if annotated_ground_truth[j]['event_label'] == system_output[i]['event_label'] and (
1076 | math.fabs(
1077 | annotated_ground_truth[j]['event_onset'] - system_output[i]['event_onset']) <= 0.1):
1078 | pos = self.class_list.index(system_output[i]['event_label'])
1079 | Ncorr[pos] += 1
1080 |
1081 | # If offset within a +/-100 ms range or within 50% of ground-truth event's duration
1082 | if math.fabs(annotated_ground_truth[j]['event_offset'] - system_output[i]['event_offset']) <= max(
1083 | 0.1, 0.5 * (
1084 | annotated_ground_truth[j]['event_offset'] - annotated_ground_truth[j][
1085 | 'event_onset'])):
1086 | pos = self.class_list.index(system_output[i]['event_label'])
1087 | NcorrOff[pos] += 1
1088 |
1089 | break # In order to not evaluate duplicates
1090 |
1091 | # Compute onset-only class-wise event-based metrics
1092 | eps = numpy.spacing(1)
1093 | results = {
1094 | 'onset': {},
1095 | 'onset-offset': {},
1096 | }
1097 |
1098 | Nfp = Ntot - Ncorr
1099 | Nfn = Nref - Ncorr
1100 | Nsubs = numpy.minimum(Nfp, Nfn)
1101 | tempRec = Ncorr[I] / (Nref[I] + eps)
1102 | tempPre = Ncorr[I] / (Ntot[I] + eps)
1103 | results['onset']['Rec'] = numpy.mean(tempRec)
1104 | results['onset']['Pre'] = numpy.mean(tempPre)
1105 | tempF = 2 * ((tempPre * tempRec) / (tempPre + tempRec + eps))
1106 | results['onset']['F'] = numpy.mean(tempF)
1107 | tempAEER = (Nfn[I] + Nfp[I] + Nsubs[I]) / (Nref[I] + eps)
1108 | results['onset']['AEER'] = numpy.mean(tempAEER)
1109 |
1110 | # Compute onset-offset class-wise event-based metrics
1111 | NfpOff = Ntot - NcorrOff
1112 | NfnOff = Nref - NcorrOff
1113 | NsubsOff = numpy.minimum(NfpOff, NfnOff)
1114 | tempRecOff = NcorrOff[I] / (Nref[I] + eps)
1115 | tempPreOff = NcorrOff[I] / (Ntot[I] + eps)
1116 | results['onset-offset']['Rec'] = numpy.mean(tempRecOff)
1117 | results['onset-offset']['Pre'] = numpy.mean(tempPreOff)
1118 | tempFOff = 2 * ((tempPreOff * tempRecOff) / (tempPreOff + tempRecOff + eps))
1119 | results['onset-offset']['F'] = numpy.mean(tempFOff)
1120 | tempAEEROff = (NfnOff[I] + NfpOff[I] + NsubsOff[I]) / (Nref[I] + eps)
1121 | results['onset-offset']['AEER'] = numpy.mean(tempAEEROff)
1122 |
1123 | return results
1124 |
1125 |
1126 | def main(argv):
1127 | # Examples to show usage and required data structures
1128 | class_list = ['class1', 'class2', 'class3']
1129 | system_output = [
1130 | {
1131 | 'event_label': 'class1',
1132 | 'event_onset': 0.1,
1133 | 'event_offset': 1.0
1134 | },
1135 | {
1136 | 'event_label': 'class2',
1137 | 'event_onset': 4.1,
1138 | 'event_offset': 4.7
1139 | },
1140 | {
1141 | 'event_label': 'class3',
1142 | 'event_onset': 5.5,
1143 | 'event_offset': 6.7
1144 | }
1145 | ]
1146 | annotated_groundtruth = [
1147 | {
1148 | 'event_label': 'class1',
1149 | 'event_onset': 0.1,
1150 | 'event_offset': 1.0
1151 | },
1152 | {
1153 | 'event_label': 'class2',
1154 | 'event_onset': 4.2,
1155 | 'event_offset': 5.4
1156 | },
1157 | {
1158 | 'event_label': 'class3',
1159 | 'event_onset': 5.5,
1160 | 'event_offset': 6.7
1161 | }
1162 | ]
1163 | dcase2013metric = DCASE2013_EventDetection_Metrics(class_list=class_list)
1164 |
1165 | print 'DCASE2013'
1166 | print 'Frame-based:', dcase2013metric.frame_based(system_output=system_output,
1167 | annotated_ground_truth=annotated_groundtruth)
1168 | print 'Event-based:', dcase2013metric.event_based(system_output=system_output,
1169 | annotated_ground_truth=annotated_groundtruth)
1170 | print 'Class-based:', dcase2013metric.class_based(system_output=system_output,
1171 | annotated_ground_truth=annotated_groundtruth)
1172 |
1173 | dcase2016_metric = DCASE2016_EventDetection_SegmentBasedMetrics(class_list=class_list)
1174 | print 'DCASE2016'
1175 | print dcase2016_metric.evaluate(system_output=system_output, annotated_ground_truth=annotated_groundtruth).results()
1176 |
1177 |
1178 | if __name__ == "__main__":
1179 | sys.exit(main(sys.argv))
1180 |
--------------------------------------------------------------------------------
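A small worked example of the metric formulas implemented above (the counts are made up for illustration; eps is omitted for readability):

    Nref, Nsys, Ntp, Nfp, Nfn = 10, 9, 7, 2, 3
    Pre = Ntp / float(Nsys)             # 0.78
    Rec = Ntp / float(Nref)             # 0.70
    F = 2 * Pre * Rec / (Pre + Rec)     # 0.74
    ER = (Nfn + Nfp) / float(Nref)      # 0.50, with D = Nfn/Nref = 0.30 and I = Nfp/Nref = 0.20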
/src/features.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 |
4 | import numpy
5 | import librosa
6 | import scipy
7 |
8 |
9 | def feature_extraction(y, fs=44100, statistics=True, include_mfcc0=True, include_delta=True,
10 | include_acceleration=True, mfcc_params=None, delta_params=None, acceleration_params=None):
11 | """Feature extraction, MFCC based features
12 |
13 | Outputs features in dict, format:
14 |
15 | {
16 | 'feat': feature_matrix [shape=(frame count, feature vector size)],
17 | 'stat': {
18 | 'mean': numpy.mean(feature_matrix, axis=0),
19 | 'std': numpy.std(feature_matrix, axis=0),
20 | 'N': feature_matrix.shape[0],
21 | 'S1': numpy.sum(feature_matrix, axis=0),
22 | 'S2': numpy.sum(feature_matrix ** 2, axis=0),
23 | }
24 | }
25 |
26 | Parameters
27 | ----------
28 | y: numpy.array [shape=(signal_length, )]
29 | Audio
30 |
31 | fs: int > 0 [scalar]
32 | Sample rate
33 | (Default value=44100)
34 |
35 | statistics: bool
36 | Calculate feature statistics for extracted matrix
37 | (Default value=True)
38 |
39 | include_mfcc0: bool
40 | Include 0th MFCC coefficient into static coefficients.
41 | (Default value=True)
42 |
43 | include_delta: bool
44 | Include delta MFCC coefficients.
45 | (Default value=True)
46 |
47 | include_acceleration: bool
48 | Include acceleration MFCC coefficients.
49 | (Default value=True)
50 |
51 | mfcc_params: dict or None
52 | Parameters for extraction of static MFCC coefficients.
53 |
54 | delta_params: dict or None
55 | Parameters for extraction of delta MFCC coefficients.
56 |
57 | acceleration_params: dict or None
58 | Parameters for extraction of acceleration MFCC coefficients.
59 |
60 | Returns
61 | -------
62 | result: dict
63 | Feature dict
64 |
65 | """
66 |
67 | eps = numpy.spacing(1)
68 |
69 | # Windowing function
70 | if mfcc_params['window'] == 'hamming_asymmetric':
71 | window = scipy.signal.hamming(mfcc_params['n_fft'], sym=False)
72 | elif mfcc_params['window'] == 'hamming_symmetric':
73 | window = scipy.signal.hamming(mfcc_params['n_fft'], sym=True)
74 | elif mfcc_params['window'] == 'hann_asymmetric':
75 | window = scipy.signal.hann(mfcc_params['n_fft'], sym=False)
76 | elif mfcc_params['window'] == 'hann_symmetric':
77 | window = scipy.signal.hann(mfcc_params['n_fft'], sym=True)
78 | else:
79 | window = None
80 |
81 | # Calculate Static Coefficients
82 | power_spectrogram = numpy.abs(librosa.stft(y + eps,
83 | n_fft=mfcc_params['n_fft'],
84 | win_length=mfcc_params['win_length'],
85 | hop_length=mfcc_params['hop_length'],
86 | center=True,
87 | window=window))**2
88 | mel_basis = librosa.filters.mel(sr=fs,
89 | n_fft=mfcc_params['n_fft'],
90 | n_mels=mfcc_params['n_mels'],
91 | fmin=mfcc_params['fmin'],
92 | fmax=mfcc_params['fmax'],
93 | htk=mfcc_params['htk'])
94 | mel_spectrum = numpy.dot(mel_basis, power_spectrogram)
95 | mfcc = librosa.feature.mfcc(S=librosa.logamplitude(mel_spectrum),
96 | n_mfcc=mfcc_params['n_mfcc'])
97 |
98 | # Collect the feature matrix
99 | feature_matrix = mfcc
100 | if include_delta:
101 | # Delta coefficients
102 | mfcc_delta = librosa.feature.delta(mfcc, **delta_params)
103 |
104 | # Add Delta Coefficients to feature matrix
105 | feature_matrix = numpy.vstack((feature_matrix, mfcc_delta))
106 |
107 | if include_acceleration:
108 | # Acceleration coefficients (aka delta delta)
109 | mfcc_delta2 = librosa.feature.delta(mfcc, order=2, **acceleration_params)
110 |
111 | # Add Acceleration Coefficients to feature matrix
112 | feature_matrix = numpy.vstack((feature_matrix, mfcc_delta2))
113 |
114 | if not include_mfcc0:
115 | # Omit mfcc0
116 | feature_matrix = feature_matrix[1:, :]
117 |
118 | feature_matrix = feature_matrix.T
119 |
120 | # Collect into data structure
121 | if statistics:
122 | return {
123 | 'feat': feature_matrix,
124 | 'stat': {
125 | 'mean': numpy.mean(feature_matrix, axis=0),
126 | 'std': numpy.std(feature_matrix, axis=0),
127 | 'N': feature_matrix.shape[0],
128 | 'S1': numpy.sum(feature_matrix, axis=0),
129 | 'S2': numpy.sum(feature_matrix ** 2, axis=0),
130 | }
131 | }
132 | else:
133 | return {
134 | 'feat': feature_matrix}
135 |
136 |
137 | class FeatureNormalizer(object):
138 | """Feature normalizer class
139 |
140 | Accumulates feature statistics
141 |
142 | Examples
143 | --------
144 |
145 | >>> normalizer = FeatureNormalizer()
146 | >>> for feature_matrix in training_items:
147 | >>> normalizer.accumulate(feature_matrix)
148 | >>>
149 | >>> normalizer.finalize()
150 |
151 | >>> for feature_matrix in test_items:
152 | >>> feature_matrix_normalized = normalizer.normalize(feature_matrix)
153 | >>> # used the features
154 |
155 | """
156 | def __init__(self, feature_matrix=None):
157 | """__init__ method.
158 |
159 | Parameters
160 | ----------
161 | feature_matrix : numpy.ndarray [shape=(frames, number of feature values)] or None
162 | Feature matrix to be used in the initialization
163 |
164 | """
165 | if feature_matrix is None:
166 | self.N = 0
167 | self.mean = 0
168 | self.S1 = 0
169 | self.S2 = 0
170 | self.std = 0
171 | else:
172 | self.mean = numpy.mean(feature_matrix, axis=0)
173 | self.std = numpy.std(feature_matrix, axis=0)
174 | self.N = feature_matrix.shape[0]
175 | self.S1 = numpy.sum(feature_matrix, axis=0)
176 | self.S2 = numpy.sum(feature_matrix ** 2, axis=0)
177 | self.finalize()
178 |
179 | def __enter__(self):
180 | # Initialize Normalization class and return it
181 | self.N = 0
182 | self.mean = 0
183 | self.S1 = 0
184 | self.S2 = 0
185 | self.std = 0
186 | return self
187 |
188 | def __exit__(self, type, value, traceback):
189 | # Finalize accumulated calculation
190 | self.finalize()
191 |
192 | def accumulate(self, stat):
193 | """Accumalate statistics
194 |
195 | Input is statistics dict, format:
196 |
197 | {
198 | 'mean': numpy.mean(feature_matrix, axis=0),
199 | 'std': numpy.std(feature_matrix, axis=0),
200 | 'N': feature_matrix.shape[0],
201 | 'S1': numpy.sum(feature_matrix, axis=0),
202 | 'S2': numpy.sum(feature_matrix ** 2, axis=0),
203 | }
204 |
205 | Parameters
206 | ----------
207 | stat : dict
208 | Statistics dict
209 |
210 | Returns
211 | -------
212 | nothing
213 |
214 | """
215 | self.N += stat['N']
216 | self.mean += stat['mean']
217 | self.S1 += stat['S1']
218 | self.S2 += stat['S2']
219 |
220 | def finalize(self):
221 | """Finalize statistics calculation
222 |
223 | Accumulated values are used to get mean and std for the seen feature data.
224 |
225 | Parameters
226 | ----------
227 | nothing
228 |
229 | Returns
230 | -------
231 | nothing
232 |
233 | """
234 |
235 | # Finalize statistics
236 | self.mean = self.S1 / self.N
237 | self.std = numpy.sqrt((self.N * self.S2 - (self.S1 * self.S1)) / (self.N * (self.N - 1)))
238 |
239 | # In case of degenerate (e.g. completely silent) material we get std = NaN => set it to 0.0
240 | self.std = numpy.nan_to_num(self.std)
241 |
242 | self.mean = numpy.reshape(self.mean, [1, -1])
243 | self.std = numpy.reshape(self.std, [1, -1])
244 |
245 | def normalize(self, feature_matrix):
246 | """Normalize feature matrix with internal statistics of the class
247 |
248 | Parameters
249 | ----------
250 | feature_matrix : numpy.ndarray [shape=(frames, number of feature values)]
251 | Feature matrix to be normalized
252 |
253 | Returns
254 | -------
255 | feature_matrix : numpy.ndarray [shape=(frames, number of feature values)]
256 | Normalized feature matrix
257 |
258 | """
259 |
260 | return (feature_matrix - self.mean) / self.std
261 |
--------------------------------------------------------------------------------
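Usage sketch for feature_extraction (not part of the repository; the parameter values below are illustrative assumptions, the actual values are defined in the task YAML files):

    from src.files import load_audio
    from src.features import feature_extraction

    mfcc_params = {
        'window': 'hamming_asymmetric',
        'n_fft': 2048,
        'win_length': 1764,   # 0.04 s at 44.1 kHz
        'hop_length': 882,    # 0.02 s at 44.1 kHz
        'n_mels': 40,
        'fmin': 0,
        'fmax': 22050,
        'htk': False,
        'n_mfcc': 20,
    }

    y, fs = load_audio('audio/example.wav', mono=True, fs=44100)  # hypothetical file
    features = feature_extraction(y=y, fs=fs,
                                  mfcc_params=mfcc_params,
                                  delta_params={'width': 9},
                                  acceleration_params={'width': 9})
    print features['feat'].shape   # (frame count, feature vector size)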
/src/files.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 |
4 | import os
5 | import wave
6 | import numpy
7 | import csv
8 | import cPickle as pickle
9 | import librosa
10 | import yaml
11 | import soundfile
12 |
13 | def load_audio(filename, mono=True, fs=44100):
14 | """Load audio file into numpy array
15 |
16 |     Supports 24-bit wav-format through the soundfile library, and flac audio through librosa.
17 |
18 | Parameters
19 | ----------
20 | filename: str
21 | Path to audio file
22 |
23 | mono : bool
24 |         In case of multi-channel audio, channels are averaged into a single channel.
25 | (Default value=True)
26 |
27 | fs : int > 0 [scalar]
28 |         Target sample rate; if the input audio does not match it, the audio is resampled.
29 | (Default value=44100)
30 |
31 | Returns
32 | -------
33 | audio_data : numpy.ndarray [shape=(signal_length, channel)]
34 | Audio
35 |
36 | sample_rate : integer
37 | Sample rate
38 |
39 | """
40 |
41 | file_base, file_extension = os.path.splitext(filename)
42 | if file_extension == '.wav':
43 | # Load audio
44 | audio_data, sample_rate = soundfile.read(filename)
45 | audio_data = audio_data.T
46 |
47 |         if mono and audio_data.ndim > 1:
48 |             # Down-mix multi-channel audio; the ndim check keeps already-mono files intact
49 |             audio_data = numpy.mean(audio_data, axis=0)
50 |
51 | # Resample
52 | if fs != sample_rate:
53 | audio_data = librosa.core.resample(audio_data, sample_rate, fs)
54 | sample_rate = fs
55 |
56 | return audio_data, sample_rate
57 |
58 | elif file_extension == '.flac':
59 | audio_data, sample_rate = librosa.load(filename, sr=fs, mono=mono)
60 |
61 | return audio_data, sample_rate
62 |
63 | return None, None
64 |
65 |
66 | def load_event_list(file):
67 | """Load event list from tab delimited text file (csv-formated)
68 |
69 | Supported input formats:
70 |
71 | - [event_onset (float)][tab][event_offset (float)]
72 | - [event_onset (float)][tab][event_offset (float)][tab][event_label (string)]
73 |     - [file (string)][tab][scene_label (string), optional][tab][event_onset (float)][tab][event_offset (float)][tab][event_label (string)]
74 |
75 | Event dict format:
76 |
77 | {
78 | 'file': 'filename',
79 | 'scene_label': 'office',
80 | 'event_onset': 0.0,
81 | 'event_offset': 1.0,
82 | 'event_label': 'people_walking',
83 | }
84 |
85 | Parameters
86 | ----------
87 | file : str
88 | Path to the event list in text format (csv)
89 |
90 | Returns
91 | -------
92 | data : list of event dicts
93 | List containing event dicts
94 |
95 | """
96 | data = []
97 | with open(file, 'rt') as f:
98 | for row in csv.reader(f, delimiter='\t'):
99 | if len(row) == 2:
100 | data.append(
101 | {
102 | 'event_onset': float(row[0]),
103 | 'event_offset': float(row[1])
104 | }
105 | )
106 | elif len(row) == 3:
107 | data.append(
108 | {
109 | 'event_onset': float(row[0]),
110 | 'event_offset': float(row[1]),
111 | 'event_label': row[2]
112 | }
113 | )
114 | elif len(row) == 4:
115 | data.append(
116 | {
117 | 'file': row[0],
118 | 'event_onset': float(row[1]),
119 | 'event_offset': float(row[2]),
120 | 'event_label': row[3]
121 | }
122 | )
123 | elif len(row) == 5:
124 | data.append(
125 | {
126 | 'file': row[0],
127 | 'scene_label': row[1],
128 | 'event_onset': float(row[2]),
129 | 'event_offset': float(row[3]),
130 | 'event_label': row[4]
131 | }
132 | )
133 | return data
134 |
135 |
136 | def save_data(filename, data):
137 | """Save variable into a pickle file
138 |
139 | Parameters
140 | ----------
141 | filename: str
142 | Path to file
143 |
144 | data: list or dict
145 | Data to be saved.
146 |
147 | Returns
148 | -------
149 | nothing
150 |
151 | """
152 |
153 | pickle.dump(data, open(filename, 'wb'), protocol=pickle.HIGHEST_PROTOCOL)
154 |
155 |
156 | def load_data(filename):
157 | """Load data from pickle file
158 |
159 | Parameters
160 | ----------
161 | filename: str
162 | Path to file
163 |
164 | Returns
165 | -------
166 | data: list or dict
167 | Loaded file.
168 |
169 | """
170 |
171 | return pickle.load(open(filename, "rb"))
172 |
173 |
174 | def save_parameters(filename, parameters):
175 | """Save parameters to YAML-file
176 |
177 | Parameters
178 | ----------
179 | filename: str
180 | Path to file
181 | parameters: dict
182 | Dict containing parameters to be saved
183 |
184 | Returns
185 | -------
186 | Nothing
187 |
188 | """
189 |
190 | with open(filename, 'w') as outfile:
191 | outfile.write(yaml.dump(parameters, default_flow_style=False))
192 |
193 |
194 | def load_parameters(filename):
195 | """Load parameters from YAML-file
196 |
197 | Parameters
198 | ----------
199 | filename: str
200 | Path to file
201 |
202 | Returns
203 | -------
204 | parameters: dict
205 | Dict containing loaded parameters
206 |
207 | Raises
208 | -------
209 | IOError
210 | file is not found.
211 |
212 | """
213 |
214 | if os.path.isfile(filename):
215 | with open(filename, 'r') as f:
216 | return yaml.load(f)
217 | else:
218 | raise IOError("Parameter file not found [%s]" % filename)
219 |
220 |
221 | def save_text(filename, text):
222 | """Save text into text file.
223 |
224 | Parameters
225 | ----------
226 | filename: str
227 | Path to file
228 |
229 | text: str
230 | String to be saved.
231 |
232 | Returns
233 | -------
234 | nothing
235 |
236 | """
237 |
238 | with open(filename, "w") as text_file:
239 | text_file.write(text)
240 |
241 |
242 | def load_text(filename):
243 | """Load text file
244 |
245 | Parameters
246 | ----------
247 | filename: str
248 | Path to file
249 |
250 | Returns
251 | -------
252 | text: string
253 | Loaded text.
254 |
255 | """
256 |
257 | with open(filename, 'r') as f:
258 | return f.readlines()
259 |
--------------------------------------------------------------------------------
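Usage sketch for load_event_list (the file name and row contents are hypothetical):

    from src.files import load_event_list

    # annotation.ann is tab-delimited, one event per row, e.g.:
    # 1.23<tab>4.56<tab>dog_barking
    for event in load_event_list('annotation.ann'):
        print event['event_onset'], event['event_offset'], event['event_label']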
/src/general.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 |
4 | import os
5 | import hashlib
6 | import json
7 |
8 |
9 | def check_path(path):
10 | """Check if path exists, if not creates one
11 |
12 | Parameters
13 | ----------
14 | path : str
15 | Path to be checked.
16 |
17 | Returns
18 | -------
19 | Nothing
20 |
21 | """
22 |
23 | if not os.path.isdir(path):
24 | os.makedirs(path)
25 |
26 |
27 | def get_parameter_hash(params):
28 | """Get unique hash string (md5) for given parameter dict
29 |
30 | Parameters
31 | ----------
32 | params : dict
33 | Input parameters
34 |
35 | Returns
36 | -------
37 | md5_hash : str
38 | Unique hash for parameter dict
39 |
40 | """
41 |
42 | md5 = hashlib.md5()
43 | md5.update(str(json.dumps(params, sort_keys=True)))
44 | return md5.hexdigest()
45 |
46 |
--------------------------------------------------------------------------------
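Usage sketch for the helpers above; the parameter hash keys the cache folders, so identical parameter dicts always map to the same path (the path prefix is illustrative):

    from src.general import check_path, get_parameter_hash

    params = {'n_mfcc': 20, 'n_mels': 40, 'fmax': 22050}
    param_hash = get_parameter_hash(params)       # keys are sorted => same dict contents, same md5
    check_path('system/features/' + param_hash)   # created only if it does not exist yet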
/src/sound_event_detection.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 |
4 | import numpy
5 |
6 |
7 | def event_detection(feature_data, model_container, hop_length_seconds=0.01, smoothing_window_length_seconds=1.0, decision_threshold=0.0, minimum_event_length=0.1, minimum_event_gap=0.1):
8 | """Sound event detection
9 |
10 | Parameters
11 | ----------
12 | feature_data : numpy.ndarray [shape=(n_features, t)]
13 | Feature matrix
14 |
15 | model_container : dict
16 | Sound event model pairs [positive and negative] in dict
17 |
18 | hop_length_seconds : float > 0.0
19 | Feature hop length in seconds, used to convert feature index into time-stamp
20 | (Default value=0.01)
21 |
22 | smoothing_window_length_seconds : float > 0.0
23 |         Accumulation window (look-back) length; likelihoods are accumulated within this window.
24 | (Default value=1.0)
25 |
26 |     decision_threshold : float
27 |         Likelihood ratio threshold for making the decision.
28 |         (Default value=0.0)
29 |
30 |     minimum_event_length : float > 0.0
31 |         Minimum event length in seconds; events shorter than this are filtered out of the output.
32 |         (Default value=0.1)
33 |
34 |     minimum_event_gap : float > 0.0
35 |         Minimum allowed gap in seconds between events of the same event label class.
36 |         (Default value=0.1)
37 |
38 | Returns
39 | -------
40 |     results : list of tuples (event_onset, event_offset, event_label)
41 |         Detection result, event list
42 |
43 | """
44 |
45 | smoothing_window = int(smoothing_window_length_seconds / hop_length_seconds)
46 |
47 | results = []
48 | for event_label in model_container['models']:
49 | positive = model_container['models'][event_label]['positive'].score_samples(feature_data)[0]
50 | negative = model_container['models'][event_label]['negative'].score_samples(feature_data)[0]
51 |
52 |         # Let's keep the system causal and use look-back while smoothing (accumulating) likelihoods
53 | for stop_id in range(0, feature_data.shape[0]):
54 | start_id = stop_id - smoothing_window
55 | if start_id < 0:
56 | start_id = 0
57 | positive[start_id] = sum(positive[start_id:stop_id])
58 | negative[start_id] = sum(negative[start_id:stop_id])
59 |
60 | likelihood_ratio = positive - negative
61 | event_activity = likelihood_ratio > decision_threshold
62 |
63 | # Find contiguous segments and convert frame-ids into times
64 | event_segments = contiguous_regions(event_activity) * hop_length_seconds
65 |
66 |         # Post-process the event segments
67 | event_segments = postprocess_event_segments(event_segments=event_segments,
68 | minimum_event_length=minimum_event_length,
69 | minimum_event_gap=minimum_event_gap)
70 |
71 | for event in event_segments:
72 | results.append((event[0], event[1], event_label))
73 |
74 | return results
75 |
76 |
77 | def contiguous_regions(activity_array):
78 | """Find contiguous regions from bool valued numpy.array.
79 | Transforms boolean values for each frame into pairs of onsets and offsets.
80 |
81 | Parameters
82 | ----------
83 | activity_array : numpy.array [shape=(t)]
84 | Event activity array, bool values
85 |
86 | Returns
87 | -------
88 |     change_indices : numpy.ndarray [shape=(number of regions, 2)]
89 |         Onset and offset index pairs, one region per row
90 |
91 | """
92 |
93 | # Find the changes in the activity_array
94 | change_indices = numpy.diff(activity_array).nonzero()[0]
95 |
96 |     # Shift change indices by one, to focus on the frame after the change.
97 | change_indices += 1
98 |
99 | if activity_array[0]:
100 | # If the first element of activity_array is True add 0 at the beginning
101 | change_indices = numpy.r_[0, change_indices]
102 |
103 | if activity_array[-1]:
104 | # If the last element of activity_array is True, add the length of the array
105 | change_indices = numpy.r_[change_indices, activity_array.size]
106 |
107 | # Reshape the result into two columns
108 | return change_indices.reshape((-1, 2))
109 |
110 |
111 | def postprocess_event_segments(event_segments, minimum_event_length=0.1, minimum_event_gap=0.1):
112 | """Post process event segment list. Makes sure that minimum event length and minimum event gap conditions are met.
113 |
114 | Parameters
115 | ----------
116 |     event_segments : numpy.ndarray [shape=(number of events, 2)]
117 |         Event segments, first column has the onset, second has the offset.
118 |
119 |     minimum_event_length : float > 0.0
120 |         Minimum event length in seconds; events shorter than this are filtered out of the output.
121 |         (Default value=0.1)
122 |
123 |     minimum_event_gap : float > 0.0
124 |         Minimum allowed gap in seconds between events of the same event label class.
125 |         (Default value=0.1)
126 |
127 | Returns
128 | -------
129 |     event_results : list of tuples (event_onset, event_offset)
130 |         Post-processed event segments
131 |
132 | """
133 |
134 | # 1. remove short events
135 | event_results_1 = []
136 | for event in event_segments:
137 | if event[1]-event[0] >= minimum_event_length:
138 | event_results_1.append((event[0], event[1]))
139 |
140 | if len(event_results_1):
141 | # 2. remove small gaps between events
142 | event_results_2 = []
143 |
144 | # Load first event into event buffer
145 | buffered_event_onset = event_results_1[0][0]
146 | buffered_event_offset = event_results_1[0][1]
147 | for i in range(1, len(event_results_1)):
148 | if event_results_1[i][0] - buffered_event_offset > minimum_event_gap:
149 | # The gap between current event and the buffered is bigger than minimum event gap,
150 | # store event, and replace buffered event
151 | event_results_2.append((buffered_event_onset, buffered_event_offset))
152 | buffered_event_onset = event_results_1[i][0]
153 | buffered_event_offset = event_results_1[i][1]
154 | else:
155 |                 # The gap between current event and the buffered is smaller than minimum event gap,
156 | # extend the buffered event until the current offset
157 | buffered_event_offset = event_results_1[i][1]
158 |
159 | # Store last event from buffer
160 | event_results_2.append((buffered_event_onset, buffered_event_offset))
161 |
162 | return event_results_2
163 | else:
164 | return event_results_1
165 |
--------------------------------------------------------------------------------
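Toy example of contiguous_regions, following the logic above (one region per row; multiplying the result by hop_length_seconds converts frame indices into times):

    import numpy
    from src.sound_event_detection import contiguous_regions

    activity = numpy.array([False, True, True, False, True])
    print contiguous_regions(activity)
    # [[1 3]
    #  [4 5]]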
/src/ui.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 |
4 | import sys
5 | import itertools
6 |
7 | spinner = itertools.cycle(['-', '\\', '|', '/'])
8 |
9 |
10 | def title(text):
11 | """Prints title
12 |
13 | Parameters
14 | ----------
15 | text : str
16 | Title
17 |
18 | Returns
19 | -------
20 | Nothing
21 |
22 | """
23 |
24 | print "--------------------------------"
25 | print text
26 | print "--------------------------------"
27 |
28 |
29 | def section_header(text):
30 | """Prints section header
31 |
32 | Parameters
33 | ----------
34 | text : str
35 | Section header
36 |
37 | Returns
38 | -------
39 | Nothing
40 |
41 | """
42 |
43 | print " "
44 | print text
45 | print "================================"
46 |
47 |
48 | def foot():
49 | """Prints foot
50 |
51 | Parameters
52 | ----------
53 | Nothing
54 |
55 | Returns
56 | -------
57 | Nothing
58 |
59 | """
60 |
61 | print " [Done] "
62 |
63 |
64 | def progress(title_text=None, fold=None, percentage=None, note=None, label=None):
65 | """Prints progress line
66 |
67 | Parameters
68 | ----------
69 | title_text : str or None
70 | Title
71 |
72 | fold : int > 0 [scalar] or None
73 | Fold number
74 |
75 | percentage : float [0-1] or None
76 | Progress percentage.
77 |
78 | note : str or None
79 | Note
80 |
81 | label : str or None
82 | Label
83 |
84 | Returns
85 | -------
86 | Nothing
87 |
88 | """
89 |
90 | if title_text is not None and fold is not None and percentage is not None and note is not None and label is None:
91 | print " {:2s} {:20s} fold[{:1d}] [{:3.0f}%] [{:20s}] \r".format(spinner.next(), title_text, fold,percentage * 100, note),
92 |
93 | elif title_text is not None and fold is not None and percentage is None and note is not None and label is None:
94 | print " {:2s} {:20s} fold[{:1d}] [{:20s}] \r".format(spinner.next(), title_text, fold, note),
95 |
96 | elif title_text is not None and fold is None and percentage is not None and note is not None and label is None:
97 | print " {:2s} {:20s} [{:3.0f}%] [{:20s}] \r".format(spinner.next(), title_text, percentage * 100, note),
98 |
99 | elif title_text is not None and fold is None and percentage is not None and note is None and label is None:
100 | print " {:2s} {:20s} [{:3.0f}%] \r".format(spinner.next(), title_text, percentage * 100),
101 |
102 | elif title_text is not None and fold is None and percentage is None and note is not None and label is None:
103 | print " {:2s} {:20s} [{:20s}] \r".format(spinner.next(), title_text, note),
104 |
105 | elif title_text is not None and fold is None and percentage is None and note is not None and label is not None:
106 | print " {:2s} {:20s} [{:20s}] [{:20s}] \r".format(spinner.next(), title_text, label, note),
107 |
108 | elif title_text is not None and fold is None and percentage is not None and note is not None and label is not None:
109 | print " {:2s} {:20s} [{:20s}] [{:3.0f}%] [{:20s}] \r".format(spinner.next(), title_text, label, percentage * 100, note),
110 |
111 | elif title_text is not None and fold is not None and percentage is not None and note is not None and label is not None:
112 | print " {:2s} {:20s} fold[{:1d}] [{:10s}] [{:3.0f}%] [{:20s}] \r".format(spinner.next(), title_text, fold, label, percentage * 100, note),
113 |
114 | elif title_text is not None and fold is not None and percentage is None and note is None and label is not None:
115 | print " {:2s} {:20s} fold[{:1d}] [{:10s}] \r".format(spinner.next(), title_text, fold, label),
116 |
117 | sys.stdout.flush()
118 |
--------------------------------------------------------------------------------
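Minimal usage sketch of the console helpers (the file names are hypothetical):

    from src.ui import section_header, progress, foot

    section_header('Feature extraction')
    for file_id, filename in enumerate(['a.wav', 'b.wav', 'c.wav']):
        progress(title_text='Extracting',
                 percentage=float(file_id) / 3,
                 note=filename)
    foot()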
/task1_scene_classification.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 | #
4 | # DCASE 2016::Acoustic Scene Classification / Baseline System
5 |
6 | from src.ui import *
7 | from src.general import *
8 | from src.files import *
9 |
10 | from src.features import *
11 | from src.dataset import *
12 | from src.evaluation import *
13 |
14 | import numpy
15 | import csv
16 | import argparse
17 | import textwrap
18 | import copy
19 |
20 | from sklearn import mixture
21 |
22 | __version_info__ = ('1', '0', '0')
23 | __version__ = '.'.join(__version_info__)
24 |
25 |
26 | def main(argv):
27 | numpy.random.seed(123456) # let's make randomization predictable
28 |
29 | parser = argparse.ArgumentParser(
30 | prefix_chars='-+',
31 | formatter_class=argparse.RawDescriptionHelpFormatter,
32 | description=textwrap.dedent('''\
33 | DCASE 2016
34 | Task 1: Acoustic Scene Classification
35 | Baseline system
36 | ---------------------------------------------
37 | Tampere University of Technology / Audio Research Group
38 | Author: Toni Heittola ( toni.heittola@tut.fi )
39 |
40 | System description
41 |             This is a baseline implementation for the DCASE 2016 challenge acoustic scene classification task.
42 | Features: MFCC (static+delta+acceleration)
43 | Classifier: GMM
44 |
45 | '''))
46 |
47 | # Setup argument handling
48 | parser.add_argument("-development", help="Use the system in the development mode", action='store_true',
49 | default=False, dest='development')
50 | parser.add_argument("-challenge", help="Use the system in the challenge mode", action='store_true',
51 | default=False, dest='challenge')
52 |
53 | parser.add_argument('-v', '--version', action='version', version='%(prog)s ' + __version__)
54 | args = parser.parse_args()
55 |
56 | # Load parameters from config file
57 | parameter_file = os.path.join(os.path.dirname(os.path.realpath(__file__)),
58 | os.path.splitext(os.path.basename(__file__))[0]+'.yaml')
59 | params = load_parameters(parameter_file)
60 | params = process_parameters(params)
61 | make_folders(params)
62 |
63 | title("DCASE 2016::Acoustic Scene Classification / Baseline System")
64 |
65 | # Check if mode is defined
66 | if not (args.development or args.challenge):
67 | args.development = True
68 | args.challenge = False
69 |
70 | dataset_evaluation_mode = 'folds'
71 | if args.development and not args.challenge:
72 | print "Running system in development mode"
73 | dataset_evaluation_mode = 'folds'
74 | elif not args.development and args.challenge:
75 | print "Running system in challenge mode"
76 | dataset_evaluation_mode = 'full'
77 |
78 | # Get dataset container class
79 | dataset = eval(params['general']['development_dataset'])(data_path=params['path']['data'])
80 |
81 | # Fetch data over internet and setup the data
82 | # ==================================================
83 | if params['flow']['initialize']:
84 | dataset.fetch()
85 |
86 | # Extract features for all audio files in the dataset
87 | # ==================================================
88 | if params['flow']['extract_features']:
89 | section_header('Feature extraction')
90 |
91 | # Collect files in train sets and test sets
92 | files = []
93 | for fold in dataset.folds(mode=dataset_evaluation_mode):
94 | for item_id, item in enumerate(dataset.train(fold)):
95 | if item['file'] not in files:
96 | files.append(item['file'])
97 | for item_id, item in enumerate(dataset.test(fold)):
98 | if item['file'] not in files:
99 | files.append(item['file'])
100 | files = sorted(files)
101 |
102 | # Go through files and make sure all features are extracted
103 | do_feature_extraction(files=files,
104 | dataset=dataset,
105 | feature_path=params['path']['features'],
106 | params=params['features'],
107 | overwrite=params['general']['overwrite'])
108 |
109 | foot()
110 |
111 | # Prepare feature normalizers
112 | # ==================================================
113 | if params['flow']['feature_normalizer']:
114 | section_header('Feature normalizer')
115 |
116 | do_feature_normalization(dataset=dataset,
117 | feature_normalizer_path=params['path']['feature_normalizers'],
118 | feature_path=params['path']['features'],
119 | dataset_evaluation_mode=dataset_evaluation_mode,
120 | overwrite=params['general']['overwrite'])
121 |
122 | foot()
123 |
124 | # System training
125 | # ==================================================
126 | if params['flow']['train_system']:
127 | section_header('System training')
128 |
129 | do_system_training(dataset=dataset,
130 | model_path=params['path']['models'],
131 | feature_normalizer_path=params['path']['feature_normalizers'],
132 | feature_path=params['path']['features'],
133 | feature_params=params['features'],
134 | classifier_params=params['classifier']['parameters'],
135 | classifier_method=params['classifier']['method'],
136 | dataset_evaluation_mode=dataset_evaluation_mode,
137 | clean_audio_errors=params['classifier']['audio_error_handling']['clean_data'],
138 | overwrite=params['general']['overwrite']
139 | )
140 |
141 | foot()
142 |
143 | # System evaluation in development mode
144 | if args.development and not args.challenge:
145 |
146 | # System testing
147 | # ==================================================
148 | if params['flow']['test_system']:
149 | section_header('System testing')
150 |
151 | do_system_testing(dataset=dataset,
152 | feature_path=params['path']['features'],
153 | result_path=params['path']['results'],
154 | model_path=params['path']['models'],
155 | feature_params=params['features'],
156 | dataset_evaluation_mode=dataset_evaluation_mode,
157 | classifier_method=params['classifier']['method'],
158 | clean_audio_errors=params['recognizer']['audio_error_handling']['clean_data'],
159 | overwrite=params['general']['overwrite']
160 | )
161 |
162 | foot()
163 |
164 | # System evaluation
165 | # ==================================================
166 | if params['flow']['evaluate_system']:
167 | section_header('System evaluation')
168 |
169 | do_system_evaluation(dataset=dataset,
170 | dataset_evaluation_mode=dataset_evaluation_mode,
171 | result_path=params['path']['results'])
172 |
173 | foot()
174 |
175 | # System evaluation with challenge data
176 | elif not args.development and args.challenge:
177 | # Fetch data over internet and setup the data
178 | challenge_dataset = eval(params['general']['challenge_dataset'])(data_path=params['path']['data'])
179 | if params['general']['challenge_submission_mode']:
180 | result_path = params['path']['challenge_results']
181 | else:
182 | result_path = params['path']['results']
183 |
184 | if params['flow']['initialize']:
185 | challenge_dataset.fetch()
186 |
187 | if not params['general']['challenge_submission_mode']:
188 | section_header('Feature extraction for challenge data')
189 |
190 |             # Extract features if not running in challenge submission mode.
191 |             # Collect test files
192 |             files = []
193 |             for fold in challenge_dataset.folds(mode=dataset_evaluation_mode):
194 |                 for item_id, item in enumerate(challenge_dataset.test(fold)):
195 | if item['file'] not in files:
196 | files.append(item['file'])
197 | files = sorted(files)
198 |
199 | # Go through files and make sure all features are extracted
200 | do_feature_extraction(files=files,
201 | dataset=challenge_dataset,
202 | feature_path=params['path']['features'],
203 | params=params['features'],
204 | overwrite=params['general']['overwrite'])
205 | foot()
206 |
207 | # System testing
208 | if params['flow']['test_system']:
209 | section_header('System testing with challenge data')
210 |
211 | do_system_testing(dataset=challenge_dataset,
212 | feature_path=params['path']['features'],
213 | result_path=result_path,
214 | model_path=params['path']['models'],
215 | feature_params=params['features'],
216 | dataset_evaluation_mode=dataset_evaluation_mode,
217 | classifier_method=params['classifier']['method'],
218 | clean_audio_errors=params['recognizer']['audio_error_handling']['clean_data'],
219 | overwrite=params['general']['overwrite'] or params['general']['challenge_submission_mode']
220 | )
221 | foot()
222 |
223 | if params['general']['challenge_submission_mode']:
224 | print " "
225 | print "Your results for the challenge data are stored at ["+params['path']['challenge_results']+"]"
226 | print " "
227 |
228 | # System evaluation if not in challenge submission mode
229 | if params['flow']['evaluate_system'] and not params['general']['challenge_submission_mode']:
230 | section_header('System evaluation with challenge data')
231 | do_system_evaluation(dataset=challenge_dataset,
232 | dataset_evaluation_mode=dataset_evaluation_mode,
233 | result_path=result_path)
234 |
235 | foot()
236 |
237 | return 0
238 |
239 |
240 | def process_parameters(params):
241 | """Parameter post-processing.
242 |
243 | Parameters
244 | ----------
245 | params : dict
246 | parameters in dict
247 |
248 | Returns
249 | -------
250 | params : dict
251 | processed parameters
252 |
253 | """
254 |
255 | # Convert feature extraction window and hop sizes seconds to samples
256 | params['features']['mfcc']['win_length'] = int(params['features']['win_length_seconds'] * params['features']['fs'])
257 | params['features']['mfcc']['hop_length'] = int(params['features']['hop_length_seconds'] * params['features']['fs'])
258 |
259 | # Copy parameters for current classifier method
260 | params['classifier']['parameters'] = params['classifier_parameters'][params['classifier']['method']]
261 |
262 | # Hash
263 | params['features']['hash'] = get_parameter_hash(params['features'])
264 |
265 |     # Let's keep hashes backwards compatible after adding new parameters:
266 |     # error handling settings are included in the hash only when error handling is enabled.
267 | classifier_params = copy.copy(params['classifier'])
268 | if not classifier_params['audio_error_handling']['clean_data']:
269 | del classifier_params['audio_error_handling']
270 | params['classifier']['hash'] = get_parameter_hash(classifier_params)
271 |
272 | params['recognizer']['hash'] = get_parameter_hash(params['recognizer'])
273 |
274 | # Paths
275 | params['path']['data'] = os.path.join(os.path.dirname(os.path.realpath(__file__)), params['path']['data'])
276 | params['path']['base'] = os.path.join(os.path.dirname(os.path.realpath(__file__)), params['path']['base'])
277 |
278 | # Features
279 | params['path']['features_'] = params['path']['features']
280 | params['path']['features'] = os.path.join(params['path']['base'],
281 | params['path']['features'],
282 | params['features']['hash'])
283 |
284 | # Feature normalizers
285 | params['path']['feature_normalizers_'] = params['path']['feature_normalizers']
286 | params['path']['feature_normalizers'] = os.path.join(params['path']['base'],
287 | params['path']['feature_normalizers'],
288 | params['features']['hash'])
289 |
290 | # Models
291 | params['path']['models_'] = params['path']['models']
292 | params['path']['models'] = os.path.join(params['path']['base'],
293 | params['path']['models'],
294 | params['features']['hash'],
295 | params['classifier']['hash'])
296 | # Results
297 | params['path']['results_'] = params['path']['results']
298 | params['path']['results'] = os.path.join(params['path']['base'],
299 | params['path']['results'],
300 | params['features']['hash'],
301 | params['classifier']['hash'],
302 | params['recognizer']['hash'])
303 |
304 | return params
305 |
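# The hashed paths built above produce a cache layout along these lines
# (folder names are illustrative, hashes shortened):
#
#   system/features/<features_hash>/
#   system/feature_normalizers/<features_hash>/
#   system/models/<features_hash>/<classifier_hash>/
#   system/results/<features_hash>/<classifier_hash>/<recognizer_hash>/
#
# so changing a parameter section re-computes only the stages that depend on it.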
306 |
307 | def make_folders(params, parameter_filename='parameters.yaml'):
308 | """Create all needed folders, and saves parameters in yaml-file for easier manual browsing of data.
309 |
310 | Parameters
311 | ----------
312 | params : dict
313 | parameters in dict
314 |
315 | parameter_filename : str
316 | filename to save parameters used to generate the folder name
317 |
318 | Returns
319 | -------
320 | nothing
321 |
322 | """
323 |
324 | # Check that target path exists, create if not
325 | check_path(params['path']['features'])
326 | check_path(params['path']['feature_normalizers'])
327 | check_path(params['path']['models'])
328 | check_path(params['path']['results'])
329 |
330 | # Save parameters into folders to help manual browsing of files.
331 |
332 | # Features
333 | feature_parameter_filename = os.path.join(params['path']['features'], parameter_filename)
334 | if not os.path.isfile(feature_parameter_filename):
335 | save_parameters(feature_parameter_filename, params['features'])
336 |
337 | # Feature normalizers
338 | feature_normalizer_parameter_filename = os.path.join(params['path']['feature_normalizers'], parameter_filename)
339 | if not os.path.isfile(feature_normalizer_parameter_filename):
340 | save_parameters(feature_normalizer_parameter_filename, params['features'])
341 |
342 | # Models
343 | model_features_parameter_filename = os.path.join(params['path']['base'],
344 | params['path']['models_'],
345 | params['features']['hash'],
346 | parameter_filename)
347 | if not os.path.isfile(model_features_parameter_filename):
348 | save_parameters(model_features_parameter_filename, params['features'])
349 |
350 | model_models_parameter_filename = os.path.join(params['path']['base'],
351 | params['path']['models_'],
352 | params['features']['hash'],
353 | params['classifier']['hash'],
354 | parameter_filename)
355 | if not os.path.isfile(model_models_parameter_filename):
356 | save_parameters(model_models_parameter_filename, params['classifier'])
357 |
358 | # Results
359 | # Save parameters into folders to help manual browsing of files.
360 | result_features_parameter_filename = os.path.join(params['path']['base'],
361 | params['path']['results_'],
362 | params['features']['hash'],
363 | parameter_filename)
364 | if not os.path.isfile(result_features_parameter_filename):
365 | save_parameters(result_features_parameter_filename, params['features'])
366 |
367 | result_models_parameter_filename = os.path.join(params['path']['base'],
368 | params['path']['results_'],
369 | params['features']['hash'],
370 | params['classifier']['hash'],
371 | parameter_filename)
372 | if not os.path.isfile(result_models_parameter_filename):
373 | save_parameters(result_models_parameter_filename, params['classifier'])
374 |
375 | result_models_parameter_filename = os.path.join(params['path']['base'],
376 | params['path']['results_'],
377 | params['features']['hash'],
378 | params['classifier']['hash'],
379 | params['recognizer']['hash'],
380 | parameter_filename)
381 | if not os.path.isfile(result_models_parameter_filename):
382 | save_parameters(result_models_parameter_filename, params['recognizer'])
383 |
384 | def get_feature_filename(audio_file, path, extension='cpickle'):
385 | """Get feature filename
386 |
387 | Parameters
388 | ----------
389 | audio_file : str
390 | audio file name from which the features are extracted
391 |
392 | path : str
393 | feature path
394 |
395 | extension : str
396 | file extension
397 | (Default value='cpickle')
398 |
399 | Returns
400 | -------
401 | feature_filename : str
402 | full feature filename
403 |
404 | """
405 |
406 | audio_filename = os.path.split(audio_file)[1]
407 | return os.path.join(path, os.path.splitext(audio_filename)[0] + '.' + extension)
408 |
409 |
410 | def get_feature_normalizer_filename(fold, path, extension='cpickle'):
411 | """Get normalizer filename
412 |
413 | Parameters
414 | ----------
415 | fold : int >= 0
416 | evaluation fold number
417 |
418 | path : str
419 | normalizer path
420 |
421 | extension : str
422 | file extension
423 | (Default value='cpickle')
424 |
425 | Returns
426 | -------
427 | normalizer_filename : str
428 | full normalizer filename
429 |
430 | """
431 |
432 | return os.path.join(path, 'scale_fold' + str(fold) + '.' + extension)
433 |
434 |
435 | def get_model_filename(fold, path, extension='cpickle'):
436 | """Get model filename
437 |
438 | Parameters
439 | ----------
440 | fold : int >= 0
441 | evaluation fold number
442 |
443 | path : str
444 | model path
445 |
446 | extension : str
447 | file extension
448 | (Default value='cpickle')
449 |
450 | Returns
451 | -------
452 | model_filename : str
453 | full model filename
454 |
455 | """
456 |
457 | return os.path.join(path, 'model_fold' + str(fold) + '.' + extension)
458 |
459 |
460 | def get_result_filename(fold, path, extension='txt'):
461 | """Get result filename
462 |
463 | Parameters
464 | ----------
465 | fold : int >= 0
466 | evaluation fold number
467 |
468 | path : str
469 | result path
470 |
471 | extension : str
472 | file extension
473 |         (Default value='txt')
474 |
475 | Returns
476 | -------
477 | result_filename : str
478 | full result filename
479 |
480 | """
481 |
482 | if fold == 0:
483 | return os.path.join(path, 'results.' + extension)
484 | else:
485 | return os.path.join(path, 'results_fold' + str(fold) + '.' + extension)
486 |
487 |
488 | def do_feature_extraction(files, dataset, feature_path, params, overwrite=False):
489 | """Feature extraction
490 |
491 | Parameters
492 | ----------
493 | files : list
494 | file list
495 |
496 | dataset : class
497 | dataset class
498 |
499 | feature_path : str
500 | path where the features are saved
501 |
502 | params : dict
503 | parameter dict
504 |
505 | overwrite : bool
506 | overwrite existing feature files
507 | (Default value=False)
508 |
509 | Returns
510 | -------
511 | nothing
512 |
513 | Raises
514 | -------
515 | IOError
516 | Audio file not found.
517 |
518 | """
519 |
520 | # Check that target path exists, create if not
521 | check_path(feature_path)
522 |
523 | for file_id, audio_filename in enumerate(files):
524 | # Get feature filename
525 | current_feature_file = get_feature_filename(audio_file=os.path.split(audio_filename)[1], path=feature_path)
526 |
527 | progress(title_text='Extracting',
528 | percentage=(float(file_id) / len(files)),
529 | note=os.path.split(audio_filename)[1])
530 |
531 | if not os.path.isfile(current_feature_file) or overwrite:
532 | # Load audio data
533 | if os.path.isfile(dataset.relative_to_absolute_path(audio_filename)):
534 | y, fs = load_audio(filename=dataset.relative_to_absolute_path(audio_filename), mono=True, fs=params['fs'])
535 | else:
536 | raise IOError("Audio file not found [%s]" % audio_filename)
537 |
538 | # Extract features
539 | feature_data = feature_extraction(y=y,
540 | fs=fs,
541 | include_mfcc0=params['include_mfcc0'],
542 | include_delta=params['include_delta'],
543 | include_acceleration=params['include_acceleration'],
544 | mfcc_params=params['mfcc'],
545 | delta_params=params['mfcc_delta'],
546 | acceleration_params=params['mfcc_acceleration'])
547 | # Save
548 | save_data(current_feature_file, feature_data)
549 |
550 |
551 | def do_feature_normalization(dataset, feature_normalizer_path, feature_path, dataset_evaluation_mode='folds', overwrite=False):
552 | """Feature normalization
553 |
554 |     Calculates normalization factors for each evaluation fold based on the available training material.
555 |
556 | Parameters
557 | ----------
558 | dataset : class
559 | dataset class
560 |
561 | feature_normalizer_path : str
562 | path where the feature normalizers are saved.
563 |
564 | feature_path : str
565 | path where the features are saved.
566 |
567 | dataset_evaluation_mode : str ['folds', 'full']
568 |         evaluation mode; with 'full', all available material is considered to belong to one fold.
569 | (Default value='folds')
570 |
571 | overwrite : bool
572 | overwrite existing normalizers
573 | (Default value=False)
574 |
575 | Returns
576 | -------
577 | nothing
578 |
579 | Raises
580 | -------
581 | IOError
582 | Feature file not found.
583 |
584 | """
585 |
586 | # Check that target path exists, create if not
587 | check_path(feature_normalizer_path)
588 |
589 | for fold in dataset.folds(mode=dataset_evaluation_mode):
590 | current_normalizer_file = get_feature_normalizer_filename(fold=fold, path=feature_normalizer_path)
591 |
592 | if not os.path.isfile(current_normalizer_file) or overwrite:
593 | # Initialize statistics
594 | file_count = len(dataset.train(fold))
595 | normalizer = FeatureNormalizer()
596 |
597 | for item_id, item in enumerate(dataset.train(fold)):
598 | progress(title_text='Collecting data',
599 | fold=fold,
600 | percentage=(float(item_id) / file_count),
601 | note=os.path.split(item['file'])[1])
602 | # Load features
603 | if os.path.isfile(get_feature_filename(audio_file=item['file'], path=feature_path)):
604 | feature_data = load_data(get_feature_filename(audio_file=item['file'], path=feature_path))['stat']
605 | else:
606 | raise IOError("Feature file not found [%s]" % (item['file']))
607 |
608 | # Accumulate statistics
609 | normalizer.accumulate(feature_data)
610 |
611 | # Calculate normalization factors
612 | normalizer.finalize()
613 |
614 | # Save
615 | save_data(current_normalizer_file, normalizer)
616 |
617 |
618 | def do_system_training(dataset, model_path, feature_normalizer_path, feature_path, feature_params, classifier_params,
619 | dataset_evaluation_mode='folds', classifier_method='gmm', clean_audio_errors=False, overwrite=False):
620 | """System training
621 |
622 | model container format:
623 |
624 | {
625 | 'normalizer': normalizer class
626 | 'models' :
627 | {
628 | 'office' : mixture.GMM class
629 | 'home' : mixture.GMM class
630 | ...
631 | }
632 | }
633 |
634 | Parameters
635 | ----------
636 | dataset : class
637 | dataset class
638 |
639 | model_path : str
640 | path where the models are saved.
641 |
642 | feature_normalizer_path : str
643 | path where the feature normalizers are saved.
644 |
645 | feature_path : str
646 | path where the features are saved.
647 |
648 | feature_params : dict
649 | parameter dict
650 |
651 | classifier_params : dict
652 | parameter dict
653 |
654 | dataset_evaluation_mode : str ['folds', 'full']
655 |         evaluation mode; with 'full', all available material is considered to belong to one fold.
656 | (Default value='folds')
657 |
658 | classifier_method : str ['gmm']
659 | classifier method, currently only GMM supported
660 | (Default value='gmm')
661 |
662 | clean_audio_errors : bool
663 | Remove audio errors from the training data
664 | (Default value=False)
665 |
666 | overwrite : bool
667 | overwrite existing models
668 | (Default value=False)
669 |
670 | Returns
671 | -------
672 | nothing
673 |
674 | Raises
675 | -------
676 | ValueError
677 | classifier_method is unknown.
678 |
679 | IOError
680 | Feature normalizer not found.
681 | Feature file not found.
682 |
683 | """
684 |
685 | if classifier_method != 'gmm':
686 | raise ValueError("Unknown classifier method ["+classifier_method+"]")
687 |
688 | # Check that target path exists, create if not
689 | check_path(model_path)
690 |
691 | for fold in dataset.folds(mode=dataset_evaluation_mode):
692 | current_model_file = get_model_filename(fold=fold, path=model_path)
693 | if not os.path.isfile(current_model_file) or overwrite:
694 | # Load normalizer
695 | feature_normalizer_filename = get_feature_normalizer_filename(fold=fold, path=feature_normalizer_path)
696 | if os.path.isfile(feature_normalizer_filename):
697 | normalizer = load_data(feature_normalizer_filename)
698 | else:
699 | raise IOError("Feature normalizer not found [%s]" % feature_normalizer_filename)
700 |
701 | # Initialize model container
702 | model_container = {'normalizer': normalizer, 'models': {}}
703 |
704 | # Collect training examples
705 | file_count = len(dataset.train(fold))
706 | data = {}
707 | for item_id, item in enumerate(dataset.train(fold)):
708 | progress(title_text='Collecting data',
709 | fold=fold,
710 | percentage=(float(item_id) / file_count),
711 | note=os.path.split(item['file'])[1])
712 |
713 | # Load features
714 | feature_filename = get_feature_filename(audio_file=item['file'], path=feature_path)
715 | if os.path.isfile(feature_filename):
716 | feature_data = load_data(feature_filename)['feat']
717 | else:
718 | raise IOError("Features not found [%s]" % (item['file']))
719 |
720 | # Scale features
721 | feature_data = model_container['normalizer'].normalize(feature_data)
722 |
723 | # Audio error removal
724 | if clean_audio_errors:
725 | current_errors = dataset.file_error_meta(item['file'])
726 | if current_errors:
727 | removal_mask = numpy.ones((feature_data.shape[0]), dtype=bool)
728 | for error_event in current_errors:
729 | onset_frame = int(numpy.floor(error_event['event_onset'] / feature_params['hop_length_seconds']))
730 | offset_frame = int(numpy.ceil(error_event['event_offset'] / feature_params['hop_length_seconds']))
731 | if offset_frame > feature_data.shape[0]:
732 | offset_frame = feature_data.shape[0]
733 | removal_mask[onset_frame:offset_frame] = False
734 | feature_data = feature_data[removal_mask, :]
735 |
736 | # Store features per class label
737 | if item['scene_label'] not in data:
738 | data[item['scene_label']] = feature_data
739 | else:
740 | data[item['scene_label']] = numpy.vstack((data[item['scene_label']], feature_data))
741 |
742 | # Train models for each class
743 | for label in data:
744 | progress(title_text='Train models',
745 | fold=fold,
746 | note=label)
747 | if classifier_method == 'gmm':
748 | model_container['models'][label] = mixture.GMM(**classifier_params).fit(data[label])
749 | else:
750 | raise ValueError("Unknown classifier method ["+classifier_method+"]")
751 |
752 | # Save models
753 | save_data(current_model_file, model_container)
754 |
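755 | # Note: each saved model file thus holds one container per fold, of the form
756 | # {'normalizer': <feature normalizer object>, 'models': {scene_label: mixture.GMM, ...}},
757 | # i.e. the structure documented and consumed by do_classification_gmm() below.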
755 |
756 | def do_system_testing(dataset, result_path, feature_path, model_path, feature_params,
757 | dataset_evaluation_mode='folds', classifier_method='gmm', clean_audio_errors=False, overwrite=False):
758 | """System testing.
759 |
760 |     If extracted features are not found on disk, they are extracted on the fly but not saved.
761 |
762 | Parameters
763 | ----------
764 | dataset : class
765 | dataset class
766 |
767 | result_path : str
768 | path where the results are saved.
769 |
770 | feature_path : str
771 | path where the features are saved.
772 |
773 | model_path : str
774 | path where the models are saved.
775 |
776 | feature_params : dict
777 |         feature extraction parameter dict
778 |
779 | dataset_evaluation_mode : str ['folds', 'full']
780 |         evaluation mode; with 'full', all available material is treated as a single fold.
781 | (Default value='folds')
782 |
783 | classifier_method : str ['gmm']
784 | classifier method, currently only GMM supported
785 | (Default value='gmm')
786 |
787 | clean_audio_errors : bool
788 |         Remove audio errors from the test data
789 | (Default value=False)
790 |
791 | overwrite : bool
792 |         overwrite existing result files
793 | (Default value=False)
794 |
795 | Returns
796 | -------
797 | nothing
798 |
799 | Raises
800 | -------
801 | ValueError
802 | classifier_method is unknown.
803 |
804 | IOError
805 | Model file not found.
806 | Audio file not found.
807 |
808 | """
809 |
810 | if classifier_method != 'gmm':
811 | raise ValueError("Unknown classifier method ["+classifier_method+"]")
812 |
813 | # Check that target path exists, create if not
814 | check_path(result_path)
815 |
816 | for fold in dataset.folds(mode=dataset_evaluation_mode):
817 | current_result_file = get_result_filename(fold=fold, path=result_path)
818 | if not os.path.isfile(current_result_file) or overwrite:
819 | results = []
820 |
821 | # Load class model container
822 | model_filename = get_model_filename(fold=fold, path=model_path)
823 | if os.path.isfile(model_filename):
824 | model_container = load_data(model_filename)
825 | else:
826 | raise IOError("Model file not found [%s]" % model_filename)
827 |
828 | file_count = len(dataset.test(fold))
829 | for file_id, item in enumerate(dataset.test(fold)):
830 | progress(title_text='Testing',
831 | fold=fold,
832 | percentage=(float(file_id) / file_count),
833 | note=os.path.split(item['file'])[1])
834 |
835 | # Load features
836 | feature_filename = get_feature_filename(audio_file=item['file'], path=feature_path)
837 |
838 | if os.path.isfile(feature_filename):
839 | feature_data = load_data(feature_filename)['feat']
840 | else:
841 | # Load audio
842 | if os.path.isfile(dataset.relative_to_absolute_path(item['file'])):
843 | y, fs = load_audio(filename=dataset.relative_to_absolute_path(item['file']), mono=True, fs=feature_params['fs'])
844 | else:
845 | raise IOError("Audio file not found [%s]" % (item['file']))
846 |
847 | feature_data = feature_extraction(y=y,
848 | fs=fs,
849 | include_mfcc0=feature_params['include_mfcc0'],
850 | include_delta=feature_params['include_delta'],
851 | include_acceleration=feature_params['include_acceleration'],
852 | mfcc_params=feature_params['mfcc'],
853 | delta_params=feature_params['mfcc_delta'],
854 | acceleration_params=feature_params['mfcc_acceleration'],
855 | statistics=False)['feat']
856 |
857 | # Scale features
858 | feature_data = model_container['normalizer'].normalize(feature_data)
859 |
860 | if clean_audio_errors:
861 | current_errors = dataset.file_error_meta(item['file'])
862 | if current_errors:
863 | removal_mask = numpy.ones((feature_data.shape[0]), dtype=bool)
864 | for error_event in current_errors:
865 | onset_frame = int(numpy.floor(error_event['event_onset'] / feature_params['hop_length_seconds']))
866 | offset_frame = int(numpy.ceil(error_event['event_offset'] / feature_params['hop_length_seconds']))
867 | if offset_frame > feature_data.shape[0]:
868 | offset_frame = feature_data.shape[0]
869 | removal_mask[onset_frame:offset_frame] = False
870 | feature_data = feature_data[removal_mask, :]
871 |
872 | # Do classification for the block
873 | if classifier_method == 'gmm':
874 | current_result = do_classification_gmm(feature_data, model_container)
875 | else:
876 | raise ValueError("Unknown classifier method ["+classifier_method+"]")
877 |
878 | # Store the result
879 | results.append((dataset.absolute_to_relative(item['file']), current_result))
880 |
881 | # Save testing results
882 | with open(current_result_file, 'wt') as f:
883 | writer = csv.writer(f, delimiter='\t')
884 | for result_item in results:
885 | writer.writerow(result_item)
886 |
887 |
888 | def do_classification_gmm(feature_data, model_container):
889 |     """GMM classification for given feature matrix
890 |
891 | model container format:
892 |
893 | {
894 | 'normalizer': normalizer class
895 | 'models' :
896 | {
897 | 'office' : mixture.GMM class
898 | 'home' : mixture.GMM class
899 | ...
900 | }
901 | }
902 |
903 | Parameters
904 | ----------
905 | feature_data : numpy.ndarray [shape=(t, feature vector length)]
906 | feature matrix
907 |
908 | model_container : dict
909 | model container
910 |
911 | Returns
912 | -------
913 | result : str
914 | classification result as scene label
915 |
916 | """
917 |
918 |     # Initialize log-likelihood vector to -inf
919 | logls = numpy.empty(len(model_container['models']))
920 | logls.fill(-numpy.inf)
921 |
922 | for label_id, label in enumerate(model_container['models']):
923 | logls[label_id] = numpy.sum(model_container['models'][label].score(feature_data))
924 |
925 | classification_result_id = numpy.argmax(logls)
926 | return model_container['models'].keys()[classification_result_id]
927 |
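928 | # A minimal usage sketch for do_classification_gmm(), assuming the legacy
929 | # sklearn.mixture.GMM API targeted by this codebase; the labels and data below
930 | # are synthetic illustrations, not part of the system:
931 | #
932 | #   toy_models = {}
933 | #   for label in ['office', 'home']:
934 | #       toy_models[label] = mixture.GMM(n_components=2).fit(numpy.random.randn(200, 3))
935 | #   frames = numpy.random.randn(50, 3)  # shape=(t, feature vector length)
936 | #   print do_classification_gmm(frames, {'models': toy_models})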
928 |
929 | def do_system_evaluation(dataset, result_path, dataset_evaluation_mode='folds'):
930 | """System evaluation. Testing outputs are collected and evaluated. Evaluation results are printed.
931 |
932 | Parameters
933 | ----------
934 | dataset : class
935 | dataset class
936 |
937 | result_path : str
938 | path where the results are saved.
939 |
940 | dataset_evaluation_mode : str ['folds', 'full']
941 |         evaluation mode; with 'full', all available material is treated as a single fold.
942 | (Default value='folds')
943 |
944 | Returns
945 | -------
946 | nothing
947 |
948 | Raises
949 | -------
950 | IOError
951 | Result file not found
952 |
953 | """
954 |
955 | dcase2016_scene_metric = DCASE2016_SceneClassification_Metrics(class_list=dataset.scene_labels)
956 | results_fold = []
957 | for fold in dataset.folds(mode=dataset_evaluation_mode):
958 | dcase2016_scene_metric_fold = DCASE2016_SceneClassification_Metrics(class_list=dataset.scene_labels)
959 | results = []
960 | result_filename = get_result_filename(fold=fold, path=result_path)
961 |
962 | if os.path.isfile(result_filename):
963 | with open(result_filename, 'rt') as f:
964 | for row in csv.reader(f, delimiter='\t'):
965 | results.append(row)
966 | else:
967 | raise IOError("Result file not found [%s]" % result_filename)
968 |
969 | y_true = []
970 | y_pred = []
971 | for result in results:
972 | y_true.append(dataset.file_meta(result[0])[0]['scene_label'])
973 | y_pred.append(result[1])
974 | dcase2016_scene_metric.evaluate(system_output=y_pred, annotated_ground_truth=y_true)
975 | dcase2016_scene_metric_fold.evaluate(system_output=y_pred, annotated_ground_truth=y_true)
976 | results_fold.append(dcase2016_scene_metric_fold.results())
977 | results = dcase2016_scene_metric.results()
978 |
979 | print " File-wise evaluation, over %d folds" % dataset.fold_count
980 | fold_labels = ''
981 | separator = ' =====================+======+======+==========+ +'
982 | if dataset.fold_count > 1:
983 | for fold in dataset.folds(mode=dataset_evaluation_mode):
984 | fold_labels += " {:8s} |".format('Fold'+str(fold))
985 | separator += "==========+"
986 | print " {:20s} | {:4s} : {:4s} | {:8s} | |".format('Scene label', 'Nref', 'Nsys', 'Accuracy')+fold_labels
987 | print separator
988 | for label_id, label in enumerate(sorted(results['class_wise_accuracy'])):
989 | fold_values = ''
990 | if dataset.fold_count > 1:
991 | for fold in dataset.folds(mode=dataset_evaluation_mode):
992 | fold_values += " {:5.1f} % |".format(results_fold[fold-1]['class_wise_accuracy'][label] * 100)
993 | print " {:20s} | {:4d} : {:4d} | {:5.1f} % | |".format(label,
994 | results['class_wise_data'][label]['Nref'],
995 | results['class_wise_data'][label]['Nsys'],
996 | results['class_wise_accuracy'][label] * 100)+fold_values
997 | print separator
998 | fold_values = ''
999 | if dataset.fold_count > 1:
1000 | for fold in dataset.folds(mode=dataset_evaluation_mode):
1001 | fold_values += " {:5.1f} % |".format(results_fold[fold-1]['overall_accuracy'] * 100)
1002 |
1003 | print " {:20s} | {:4d} : {:4d} | {:5.1f} % | |".format('Overall accuracy',
1004 | results['Nref'],
1005 | results['Nsys'],
1006 | results['overall_accuracy'] * 100)+fold_values
1007 |
1008 | if __name__ == "__main__":
1009 | try:
1010 | sys.exit(main(sys.argv))
1011 | except (ValueError, IOError) as e:
1012 | sys.exit(e)
1013 |
--------------------------------------------------------------------------------
/task1_scene_classification.yaml:
--------------------------------------------------------------------------------
1 | # ==========================================================
2 | # Flow
3 | # ==========================================================
4 | flow:
5 | initialize: true
6 | extract_features: true
7 | feature_normalizer: true
8 | train_system: true
9 | test_system: true
10 | evaluate_system: true
11 |
12 | # ==========================================================
13 | # General
14 | # ==========================================================
15 | general:
16 | development_dataset: TUTAcousticScenes_2016_DevelopmentSet
17 | challenge_dataset: TUTAcousticScenes_2016_EvaluationSet
18 |
19 | overwrite: false # Overwrite previously stored data
20 |
21 | challenge_submission_mode: false # save results into path->challenge_results for challenge submission
22 |
23 | # ==========================================================
24 | # Paths
25 | # ==========================================================
26 | path:
27 | data: data/
28 |
29 | base: system/baseline_dcase2016_task1/
30 | features: features/
31 | feature_normalizers: feature_normalizers/
32 | models: acoustic_models/
33 | results: evaluation_results/
34 |
35 | challenge_results: challenge_submission/task_1_acoustic_scene_classification/
36 |
37 | # ==========================================================
38 | # Feature extraction
39 | # ==========================================================
40 | features:
41 | fs: 44100
42 | win_length_seconds: 0.04
43 | hop_length_seconds: 0.02
44 |
45 |   include_mfcc0: true                   # Include the 0th MFCC coefficient
46 |   include_delta: true                   # Include delta coefficients
47 |   include_acceleration: true            # Include acceleration (delta-delta) coefficients
48 |
49 | mfcc:
50 | window: hamming_asymmetric # [hann_asymmetric, hamming_asymmetric]
51 | n_mfcc: 20 # Number of MFCC coefficients
52 | n_mels: 40 # Number of MEL bands used
53 | n_fft: 2048 # FFT length
54 | fmin: 0 # Minimum frequency when constructing MEL bands
55 |     fmax: 22050                   # Maximum frequency when constructing MEL bands
56 | htk: false # Switch for HTK-styled MEL-frequency equation
57 |
58 | mfcc_delta:
59 | width: 9
60 |
61 | mfcc_acceleration:
62 | width: 9
63 |
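64 | # For reference: frame lengths in samples are derived at runtime as
65 | # int(win_length_seconds * fs) = 1764 and int(hop_length_seconds * fs) = 882
66 | # (see process_parameters() in the task scripts).
67 |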
64 | # ==========================================================
65 | # Classifier
66 | # ==========================================================
67 | classifier:
68 | method: gmm # The system supports only gmm
69 |
70 | audio_error_handling: # Handling audio errors (temporary microphone failure and radio signal interferences from mobile phones)
71 | clean_data: false # Exclude audio errors from training audio
72 |
73 | parameters: !!null # Parameters are copied from classifier_parameters based on defined method
74 |
75 | classifier_parameters:
76 | gmm:
77 | n_components: 16 # Number of Gaussian components
78 | covariance_type: diag # [diag|full] Diagonal or full covariance matrix
79 | random_state: 0
80 | thresh: !!null
81 | tol: 0.001
82 | min_covar: 0.001
83 | n_iter: 40
84 | n_init: 1
85 | params: wmc
86 | init_params: wmc
87 |
88 | # ==========================================================
89 | # Recognizer
90 | # ==========================================================
91 | recognizer:
92 | audio_error_handling: # Handling audio errors (temporary microphone failure and radio signal interferences from mobile phones)
93 | clean_data: false # Exclude audio errors from test audio
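94 |
95 | # Note: classifier_parameters.gmm above is copied into classifier.parameters at
96 | # runtime (see the comment on the parameters field) and expanded into the
97 | # classifier constructor, i.e. roughly mixture.GMM(n_components=16,
98 | # covariance_type='diag', random_state=0, thresh=None, tol=0.001,
99 | # min_covar=0.001, n_iter=40, n_init=1, params='wmc', init_params='wmc')
100 | # under the legacy sklearn API.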
--------------------------------------------------------------------------------
/task3_sound_event_detection_in_real_life_audio.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 | #
4 | # DCASE 2016::Sound Event Detection in Real-life Audio / Baseline System
5 |
6 | from src.ui import *
7 | from src.general import *
8 | from src.files import *
9 |
10 | from src.features import *
11 | from src.sound_event_detection import *
12 | from src.dataset import *
13 | from src.evaluation import *
14 |
15 | import numpy
16 | import csv
17 | import warnings
18 | import argparse
19 | import textwrap
20 | import math
21 |
22 | from sklearn import mixture
23 |
24 | __version_info__ = ('1', '0', '1')
25 | __version__ = '.'.join(__version_info__)
26 |
27 |
28 | def main(argv):
29 | numpy.random.seed(123456) # let's make randomization predictable
30 |
31 | parser = argparse.ArgumentParser(
32 | prefix_chars='-+',
33 | formatter_class=argparse.RawDescriptionHelpFormatter,
34 | description=textwrap.dedent('''\
35 | DCASE 2016
36 | Task 3: Sound Event Detection in Real-life Audio
37 | Baseline System
38 | ---------------------------------------------
39 | Tampere University of Technology / Audio Research Group
40 | Author: Toni Heittola ( toni.heittola@tut.fi )
41 |
42 | System description
43 |                 This is a baseline implementation for the DCASE 2016 task 3 - Sound event detection in real life audio.
44 |                 The system has a binary classifier for each included sound event class. The GMM classifier is trained with
45 |                 positive and negative examples taken from the mixture signals, and classification between the two
46 |                 models is done as a likelihood ratio. Acoustic features are MFCC+Delta+Acceleration (MFCC0 omitted).
47 |
48 | '''))
49 |
50 | parser.add_argument("-development", help="Use the system in the development mode", action='store_true',
51 | default=False, dest='development')
52 | parser.add_argument("-challenge", help="Use the system in the challenge mode", action='store_true',
53 | default=False, dest='challenge')
54 |
55 | parser.add_argument('-v', '--version', action='version', version='%(prog)s ' + __version__)
56 | args = parser.parse_args()
57 |
58 | # Load parameters from config file
59 | parameter_file = os.path.join(os.path.dirname(os.path.realpath(__file__)),
60 | os.path.splitext(os.path.basename(__file__))[0]+'.yaml')
61 | params = load_parameters(parameter_file)
62 | params = process_parameters(params)
63 | make_folders(params)
64 |
65 | title("DCASE 2016::Sound Event Detection in Real-life Audio / Baseline System")
66 |
67 | # Check if mode is defined
68 | if not (args.development or args.challenge):
69 | args.development = True
70 | args.challenge = False
71 |
72 | dataset_evaluation_mode = 'folds'
73 | if args.development and not args.challenge:
74 | print "Running system in development mode"
75 | dataset_evaluation_mode = 'folds'
76 | elif not args.development and args.challenge:
77 | print "Running system in challenge mode"
78 | dataset_evaluation_mode = 'full'
79 |
80 | # Get dataset container class
81 | dataset = eval(params['general']['development_dataset'])(data_path=params['path']['data'])
82 |
83 |     # Fetch data over the internet and set up the dataset
84 | # ==================================================
85 | if params['flow']['initialize']:
86 | dataset.fetch()
87 |
88 | # Extract features for all audio files in the dataset
89 | # ==================================================
90 | if params['flow']['extract_features']:
91 | section_header('Feature extraction [Development data]')
92 |
93 |         # Collect training and testing files from all evaluation folds
94 | files = []
95 | for fold in dataset.folds(mode=dataset_evaluation_mode):
96 | for item_id, item in enumerate(dataset.train(fold)):
97 | if item['file'] not in files:
98 | files.append(item['file'])
99 | for item_id, item in enumerate(dataset.test(fold)):
100 | if item['file'] not in files:
101 | files.append(item['file'])
102 |
103 | # Go through files and make sure all features are extracted
104 | do_feature_extraction(files=files,
105 | dataset=dataset,
106 | feature_path=params['path']['features'],
107 | params=params['features'],
108 | overwrite=params['general']['overwrite'])
109 |
110 | foot()
111 |
112 | # Prepare feature normalizers
113 | # ==================================================
114 | if params['flow']['feature_normalizer']:
115 | section_header('Feature normalizer [Development data]')
116 |
117 | do_feature_normalization(dataset=dataset,
118 | feature_normalizer_path=params['path']['feature_normalizers'],
119 | feature_path=params['path']['features'],
120 | dataset_evaluation_mode=dataset_evaluation_mode,
121 | overwrite=params['general']['overwrite'])
122 |
123 | foot()
124 |
125 | # System training
126 | # ==================================================
127 | if params['flow']['train_system']:
128 | section_header('System training [Development data]')
129 |
130 | do_system_training(dataset=dataset,
131 | model_path=params['path']['models'],
132 | feature_normalizer_path=params['path']['feature_normalizers'],
133 | feature_path=params['path']['features'],
134 | hop_length_seconds=params['features']['hop_length_seconds'],
135 | classifier_params=params['classifier']['parameters'],
136 | dataset_evaluation_mode=dataset_evaluation_mode,
137 | classifier_method=params['classifier']['method'],
138 | overwrite=params['general']['overwrite']
139 | )
140 |
141 | foot()
142 |
143 | # System evaluation in development mode
144 | if args.development and not args.challenge:
145 |
146 | # System testing
147 | # ==================================================
148 | if params['flow']['test_system']:
149 | section_header('System testing [Development data]')
150 |
151 | do_system_testing(dataset=dataset,
152 | result_path=params['path']['results'],
153 | feature_path=params['path']['features'],
154 | model_path=params['path']['models'],
155 | feature_params=params['features'],
156 | detector_params=params['detector'],
157 | dataset_evaluation_mode=dataset_evaluation_mode,
158 | classifier_method=params['classifier']['method'],
159 | overwrite=params['general']['overwrite']
160 | )
161 | foot()
162 |
163 | # System evaluation
164 | # ==================================================
165 | if params['flow']['evaluate_system']:
166 | section_header('System evaluation [Development data]')
167 |
168 | do_system_evaluation(dataset=dataset,
169 | dataset_evaluation_mode=dataset_evaluation_mode,
170 | result_path=params['path']['results'])
171 |
172 | foot()
173 |
174 | # System evaluation with challenge data
175 | elif not args.development and args.challenge:
176 |         # Fetch data over the internet and set up the dataset
177 | challenge_dataset = eval(params['general']['challenge_dataset'])(data_path=params['path']['data'])
178 |
179 | if params['flow']['initialize']:
180 | challenge_dataset.fetch()
181 |
182 | # System testing
183 | if params['flow']['test_system']:
184 | section_header('System testing [Challenge data]')
185 |
186 | do_system_testing(dataset=challenge_dataset,
187 | result_path=params['path']['challenge_results'],
188 | feature_path=params['path']['features'],
189 | model_path=params['path']['models'],
190 | feature_params=params['features'],
191 | detector_params=params['detector'],
192 | dataset_evaluation_mode=dataset_evaluation_mode,
193 | classifier_method=params['classifier']['method'],
194 | overwrite=True
195 | )
196 | foot()
197 |
198 | print " "
199 | print "Your results for the challenge data are stored at ["+params['path']['challenge_results']+"]"
200 | print " "
201 |
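202 | # Typical invocations (a development run is assumed when neither flag is given):
203 | #
204 | #   python task3_sound_event_detection_in_real_life_audio.py -development
205 | #   python task3_sound_event_detection_in_real_life_audio.py -challenge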
202 |
203 | def process_parameters(params):
204 | """Parameter post-processing.
205 |
206 | Parameters
207 | ----------
208 | params : dict
209 | parameters in dict
210 |
211 | Returns
212 | -------
213 | params : dict
214 | processed parameters
215 |
216 | """
217 |
218 | params['features']['mfcc']['win_length'] = int(params['features']['win_length_seconds'] * params['features']['fs'])
219 | params['features']['mfcc']['hop_length'] = int(params['features']['hop_length_seconds'] * params['features']['fs'])
220 |
221 | # Copy parameters for current classifier method
222 | params['classifier']['parameters'] = params['classifier_parameters'][params['classifier']['method']]
223 |
224 | # Hash
225 | params['features']['hash'] = get_parameter_hash(params['features'])
226 | params['classifier']['hash'] = get_parameter_hash(params['classifier'])
227 | params['detector']['hash'] = get_parameter_hash(params['detector'])
228 |
229 | # Paths
230 | params['path']['data'] = os.path.join(os.path.dirname(os.path.realpath(__file__)), params['path']['data'])
231 | params['path']['base'] = os.path.join(os.path.dirname(os.path.realpath(__file__)), params['path']['base'])
232 |
233 | # Features
234 | params['path']['features_'] = params['path']['features']
235 | params['path']['features'] = os.path.join(params['path']['base'],
236 | params['path']['features'],
237 | params['features']['hash'])
238 |
239 | # Feature normalizers
240 | params['path']['feature_normalizers_'] = params['path']['feature_normalizers']
241 | params['path']['feature_normalizers'] = os.path.join(params['path']['base'],
242 | params['path']['feature_normalizers'],
243 | params['features']['hash'])
244 |
245 |     # Models
247 | params['path']['models_'] = params['path']['models']
248 | params['path']['models'] = os.path.join(params['path']['base'],
249 | params['path']['models'],
250 | params['features']['hash'],
251 | params['classifier']['hash'])
252 |
253 | # Results
254 | params['path']['results_'] = params['path']['results']
255 | params['path']['results'] = os.path.join(params['path']['base'],
256 | params['path']['results'],
257 | params['features']['hash'],
258 | params['classifier']['hash'],
259 | params['detector']['hash'])
260 | return params
261 |
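262 | # Illustration (hash values are hypothetical): with the default YAML, the paths
263 | # above expand, relative to the script directory, to
264 | #
265 | #   system/baseline_dcase2016_task3/features/<features_hash>/
266 | #   system/baseline_dcase2016_task3/acoustic_models/<features_hash>/<classifier_hash>/
267 | #   system/baseline_dcase2016_task3/evaluation_results/<features_hash>/<classifier_hash>/<detector_hash>/
268 | #
269 | # so changing any parameter block sends its outputs into a fresh folder instead
270 | # of overwriting previously computed data.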
262 |
263 | def make_folders(params, parameter_filename='parameters.yaml'):
264 |     """Create all needed folders and save parameters into yaml files for easier manual browsing of the data.
265 |
266 | Parameters
267 | ----------
268 | params : dict
269 | parameters in dict
270 |
271 | parameter_filename : str
272 | filename to save parameters used to generate the folder name
273 |
274 | Returns
275 | -------
276 | nothing
277 |
278 | """
279 |
280 | # Check that target path exists, create if not
281 | check_path(params['path']['features'])
282 | check_path(params['path']['feature_normalizers'])
283 | check_path(params['path']['models'])
284 | check_path(params['path']['results'])
285 |
286 | # Save parameters into folders to help manual browsing of files.
287 |
288 | # Features
289 | feature_parameter_filename = os.path.join(params['path']['features'], parameter_filename)
290 | if not os.path.isfile(feature_parameter_filename):
291 | save_parameters(feature_parameter_filename, params['features'])
292 |
293 | # Feature normalizers
294 | feature_normalizer_parameter_filename = os.path.join(params['path']['feature_normalizers'], parameter_filename)
295 | if not os.path.isfile(feature_normalizer_parameter_filename):
296 | save_parameters(feature_normalizer_parameter_filename, params['features'])
297 |
298 | # Models
299 | model_features_parameter_filename = os.path.join(params['path']['base'],
300 | params['path']['models_'],
301 | params['features']['hash'],
302 | parameter_filename)
303 | if not os.path.isfile(model_features_parameter_filename):
304 | save_parameters(model_features_parameter_filename, params['features'])
305 |
306 | model_models_parameter_filename = os.path.join(params['path']['base'],
307 | params['path']['models_'],
308 | params['features']['hash'],
309 | params['classifier']['hash'],
310 | parameter_filename)
311 | if not os.path.isfile(model_models_parameter_filename):
312 | save_parameters(model_models_parameter_filename, params['classifier'])
313 |
314 | # Results
315 | # Save parameters into folders to help manual browsing of files.
316 | result_features_parameter_filename = os.path.join(params['path']['base'],
317 | params['path']['results_'],
318 | params['features']['hash'],
319 | parameter_filename)
320 | if not os.path.isfile(result_features_parameter_filename):
321 | save_parameters(result_features_parameter_filename, params['features'])
322 |
323 | result_models_parameter_filename = os.path.join(params['path']['base'],
324 | params['path']['results_'],
325 | params['features']['hash'],
326 | params['classifier']['hash'],
327 | parameter_filename)
328 | if not os.path.isfile(result_models_parameter_filename):
329 | save_parameters(result_models_parameter_filename, params['classifier'])
330 |
331 | result_detector_parameter_filename = os.path.join(params['path']['base'],
332 | params['path']['results_'],
333 | params['features']['hash'],
334 | params['classifier']['hash'],
335 | params['detector']['hash'],
336 | parameter_filename)
337 | if not os.path.isfile(result_detector_parameter_filename):
338 | save_parameters(result_detector_parameter_filename, params['detector'])
339 |
340 |
341 | def get_feature_filename(audio_file, path, extension='cpickle'):
342 | """Get feature filename
343 |
344 | Parameters
345 | ----------
346 | audio_file : str
347 | audio file name from which the features are extracted
348 |
349 | path : str
350 | feature path
351 |
352 | extension : str
353 | file extension
354 | (Default value='cpickle')
355 |
356 | Returns
357 | -------
358 | feature_filename : str
359 | full feature filename
360 |
361 | """
362 |
363 | return os.path.join(path, 'sequence_' + os.path.splitext(audio_file)[0] + '.' + extension)
364 |
365 |
366 | def get_feature_normalizer_filename(fold, scene_label, path, extension='cpickle'):
367 | """Get normalizer filename
368 |
369 | Parameters
370 | ----------
371 | fold : int >= 0
372 | evaluation fold number
373 |
374 | scene_label : str
375 | scene label
376 |
377 | path : str
378 | normalizer path
379 |
380 | extension : str
381 | file extension
382 | (Default value='cpickle')
383 |
384 | Returns
385 | -------
386 | normalizer_filename : str
387 | full normalizer filename
388 |
389 | """
390 |
391 | return os.path.join(path, 'scale_fold' + str(fold) + '_' + str(scene_label) + '.' + extension)
392 |
393 |
394 | def get_model_filename(fold, scene_label, path, extension='cpickle'):
395 | """Get model filename
396 |
397 | Parameters
398 | ----------
399 | fold : int >= 0
400 | evaluation fold number
401 |
402 | scene_label : str
403 | scene label
404 |
405 | path : str
406 | model path
407 |
408 | extension : str
409 | file extension
410 | (Default value='cpickle')
411 |
412 | Returns
413 | -------
414 | model_filename : str
415 | full model filename
416 |
417 | """
418 |
419 | return os.path.join(path, 'model_fold' + str(fold) + '_' + str(scene_label) + '.' + extension)
420 |
421 |
422 | def get_result_filename(fold, scene_label, path, extension='txt'):
423 | """Get result filename
424 |
425 | Parameters
426 | ----------
427 | fold : int >= 0
428 | evaluation fold number
429 |
430 | scene_label : str
431 | scene label
432 |
433 | path : str
434 | result path
435 |
436 | extension : str
437 | file extension
438 |         (Default value='txt')
439 |
440 | Returns
441 | -------
442 | result_filename : str
443 | full result filename
444 |
445 | """
446 |
447 | if fold == 0:
448 | return os.path.join(path, 'results_' + str(scene_label) + '.' + extension)
449 | else:
450 | return os.path.join(path, 'results_fold' + str(fold) + '_' + str(scene_label) + '.' + extension)
451 |
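452 | # Examples of the naming scheme above (the path is illustrative):
453 | #   get_result_filename(fold=1, scene_label='home', path='res/')
454 | #       -> 'res/results_fold1_home.txt'
455 | #   get_result_filename(fold=0, scene_label='home', path='res/')
456 | #       -> 'res/results_home.txt'  # fold 0 simply drops the fold tag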
452 |
453 | def do_feature_extraction(files, dataset, feature_path, params, overwrite=False):
454 | """Feature extraction
455 |
456 | Parameters
457 | ----------
458 | files : list
459 | file list
460 |
461 | dataset : class
462 | dataset class
463 |
464 | feature_path : str
465 | path where the features are saved
466 |
467 | params : dict
468 |         feature extraction parameter dict
469 |
470 | overwrite : bool
471 | overwrite existing feature files
472 | (Default value=False)
473 |
474 | Returns
475 | -------
476 | nothing
477 |
478 | Raises
479 | -------
480 | IOError
481 | Audio file not found.
482 |
483 | """
484 |
485 | for file_id, audio_filename in enumerate(files):
486 | # Get feature filename
487 | current_feature_file = get_feature_filename(audio_file=os.path.split(audio_filename)[1], path=feature_path)
488 |
489 | progress(title_text='Extracting [sequences]',
490 | percentage=(float(file_id) / len(files)),
491 | note=os.path.split(audio_filename)[1])
492 |
493 | if not os.path.isfile(current_feature_file) or overwrite:
494 | # Load audio
495 | if os.path.isfile(dataset.relative_to_absolute_path(audio_filename)):
496 | y, fs = load_audio(filename=dataset.relative_to_absolute_path(audio_filename), mono=True, fs=params['fs'])
497 | else:
498 | raise IOError("Audio file not found [%s]" % audio_filename)
499 |
500 | # Extract features
501 | feature_data = feature_extraction(y=y,
502 | fs=fs,
503 | include_mfcc0=params['include_mfcc0'],
504 | include_delta=params['include_delta'],
505 | include_acceleration=params['include_acceleration'],
506 | mfcc_params=params['mfcc'],
507 | delta_params=params['mfcc_delta'],
508 | acceleration_params=params['mfcc_acceleration'])
509 | # Save
510 | save_data(current_feature_file, feature_data)
511 |
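512 | # feature_extraction() is defined in src/features.py (not shown here). From its
513 | # use in this script, the returned dict contains at least 'feat' (the frame-wise
514 | # feature matrix, shape=(t, feature vector length)) and, when statistics are
515 | # enabled as in the call above, 'stat' (per-file statistics consumed by
516 | # FeatureNormalizer during feature normalization).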
512 |
513 | def do_feature_normalization(dataset, feature_normalizer_path, feature_path, dataset_evaluation_mode='folds', overwrite=False):
514 | """Feature normalization
515 |
516 |     Calculates normalization factors for each evaluation fold based on the available training material.
517 |
518 | Parameters
519 | ----------
520 | dataset : class
521 | dataset class
522 |
523 | feature_normalizer_path : str
524 | path where the feature normalizers are saved.
525 |
526 | feature_path : str
527 | path where the features are saved.
528 |
529 | dataset_evaluation_mode : str ['folds', 'full']
530 |         evaluation mode; with 'full', all available material is treated as a single fold.
531 | (Default value='folds')
532 |
533 | overwrite : bool
534 | overwrite existing normalizers
535 | (Default value=False)
536 |
537 | Returns
538 | -------
539 | nothing
540 |
541 | Raises
542 | -------
543 | IOError
544 | Feature file not found.
545 |
546 | """
547 |
548 | for fold in dataset.folds(mode=dataset_evaluation_mode):
549 | for scene_id, scene_label in enumerate(dataset.scene_labels):
550 | current_normalizer_file = get_feature_normalizer_filename(fold=fold, scene_label=scene_label, path=feature_normalizer_path)
551 |
552 | if not os.path.isfile(current_normalizer_file) or overwrite:
553 | # Collect sequence files from scene class
554 | files = []
555 | for item_id, item in enumerate(dataset.train(fold, scene_label=scene_label)):
556 | if item['file'] not in files:
557 | files.append(item['file'])
558 |
559 | file_count = len(files)
560 |
561 | # Initialize statistics
562 | normalizer = FeatureNormalizer()
563 |
564 | for file_id, audio_filename in enumerate(files):
565 | progress(title_text='Collecting data',
566 | fold=fold,
567 | percentage=(float(file_id) / file_count),
568 | note=os.path.split(audio_filename)[1])
569 |
570 | # Load features
571 | feature_filename = get_feature_filename(audio_file=os.path.split(audio_filename)[1], path=feature_path)
572 | if os.path.isfile(feature_filename):
573 | feature_data = load_data(feature_filename)['stat']
574 | else:
575 |                         raise IOError("Feature file not found [%s]" % feature_filename)
576 |
577 | # Accumulate statistics
578 | normalizer.accumulate(feature_data)
579 |
580 | # Calculate normalization factors
581 | normalizer.finalize()
582 |
583 | # Save
584 | save_data(current_normalizer_file, normalizer)
585 |
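586 | # FeatureNormalizer is defined in src/features.py (not shown here). A hedged
587 | # sketch of an equivalent accumulate/finalize mean-std normalizer; the exact
588 | # 'stat' layout ('N', 'S1', 'S2') is an assumption made for illustration:
589 | #
590 | #   class MeanStdNormalizer(object):
591 | #       def __init__(self):
592 | #           self.n, self.s1, self.s2 = 0, 0.0, 0.0
593 | #       def accumulate(self, stat):
594 | #           self.n += stat['N']; self.s1 += stat['S1']; self.s2 += stat['S2']
595 | #       def finalize(self):
596 | #           self.mean = self.s1 / self.n
597 | #           self.std = numpy.sqrt(self.s2 / self.n - self.mean ** 2)
598 | #       def normalize(self, feature_data):
599 | #           return (feature_data - self.mean) / self.std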
586 |
587 | def do_system_training(dataset, model_path, feature_normalizer_path, feature_path, hop_length_seconds, classifier_params,
588 | dataset_evaluation_mode='folds', classifier_method='gmm', overwrite=False):
589 | """System training
590 |
591 | Train a model pair for each sound event class, one for activity and one for inactivity.
592 |
593 | model container format:
594 |
595 | {
596 | 'normalizer': normalizer class
597 | 'models' :
598 | {
599 | 'mouse click' :
600 | {
601 | 'positive': mixture.GMM class,
602 | 'negative': mixture.GMM class
603 | }
604 | 'keyboard typing' :
605 | {
606 | 'positive': mixture.GMM class,
607 | 'negative': mixture.GMM class
608 | }
609 | ...
610 | }
611 | }
612 |
613 | Parameters
614 | ----------
615 | dataset : class
616 | dataset class
617 |
618 | model_path : str
619 | path where the models are saved.
620 |
621 | feature_normalizer_path : str
622 | path where the feature normalizers are saved.
623 |
624 | feature_path : str
625 | path where the features are saved.
626 |
627 | hop_length_seconds : float > 0
628 | feature frame hop length in seconds
629 |
630 | classifier_params : dict
631 |         classifier parameter dict, passed to the classifier constructor
632 |
633 | dataset_evaluation_mode : str ['folds', 'full']
634 |         evaluation mode; with 'full', all available material is treated as a single fold.
635 | (Default value='folds')
636 |
637 | classifier_method : str ['gmm']
638 | classifier method, currently only GMM supported
639 | (Default value='gmm')
640 |
641 | overwrite : bool
642 | overwrite existing models
643 | (Default value=False)
644 |
645 | Returns
646 | -------
647 | nothing
648 |
649 | Raises
650 | -------
651 | ValueError
652 | classifier_method is unknown.
653 |
654 | IOError
655 | Feature normalizer not found.
656 | Feature file not found.
657 |
658 | """
659 |
660 | if classifier_method != 'gmm':
661 | raise ValueError("Unknown classifier method ["+classifier_method+"]")
662 |
663 | for fold in dataset.folds(mode=dataset_evaluation_mode):
664 | for scene_id, scene_label in enumerate(dataset.scene_labels):
665 | current_model_file = get_model_filename(fold=fold, scene_label=scene_label, path=model_path)
666 | if not os.path.isfile(current_model_file) or overwrite:
667 |
668 | # Load normalizer
669 | feature_normalizer_filename = get_feature_normalizer_filename(fold=fold, scene_label=scene_label, path=feature_normalizer_path)
670 | if os.path.isfile(feature_normalizer_filename):
671 | normalizer = load_data(feature_normalizer_filename)
672 | else:
673 | raise IOError("Feature normalizer not found [%s]" % feature_normalizer_filename)
674 |
675 | # Initialize model container
676 | model_container = {'normalizer': normalizer, 'models': {}}
677 |
678 |                 # Restructure training data into a [file][event] structure
679 | ann = {}
680 | for item_id, item in enumerate(dataset.train(fold=fold, scene_label=scene_label)):
681 | filename = os.path.split(item['file'])[1]
682 | if filename not in ann:
683 | ann[filename] = {}
684 | if item['event_label'] not in ann[filename]:
685 | ann[filename][item['event_label']] = []
686 | ann[filename][item['event_label']].append((item['event_onset'], item['event_offset']))
687 |
688 | # Collect training examples
689 | data_positive = {}
690 | data_negative = {}
691 | file_count = len(ann)
692 | for item_id, audio_filename in enumerate(ann):
693 | progress(title_text='Collecting data',
694 | fold=fold,
695 | percentage=(float(item_id) / file_count),
696 | note=scene_label+" / "+os.path.split(audio_filename)[1])
697 |
698 | # Load features
699 | feature_filename = get_feature_filename(audio_file=audio_filename, path=feature_path)
700 | if os.path.isfile(feature_filename):
701 | feature_data = load_data(feature_filename)['feat']
702 | else:
703 | raise IOError("Feature file not found [%s]" % feature_filename)
704 |
705 | # Normalize features
706 | feature_data = model_container['normalizer'].normalize(feature_data)
707 |
708 | for event_label in ann[audio_filename]:
709 | positive_mask = numpy.zeros((feature_data.shape[0]), dtype=bool)
710 |
711 | for event in ann[audio_filename][event_label]:
712 | start_frame = int(math.floor(event[0] / hop_length_seconds))
713 | stop_frame = int(math.ceil(event[1] / hop_length_seconds))
714 |
715 | if stop_frame > feature_data.shape[0]:
716 | stop_frame = feature_data.shape[0]
717 |
718 | positive_mask[start_frame:stop_frame] = True
719 |
720 | # Store positive examples
721 | if event_label not in data_positive:
722 | data_positive[event_label] = feature_data[positive_mask, :]
723 | else:
724 | data_positive[event_label] = numpy.vstack((data_positive[event_label], feature_data[positive_mask, :]))
725 |
726 | # Store negative examples
727 | if event_label not in data_negative:
728 | data_negative[event_label] = feature_data[~positive_mask, :]
729 | else:
730 | data_negative[event_label] = numpy.vstack((data_negative[event_label], feature_data[~positive_mask, :]))
731 |
732 | # Train models for each class
733 | for event_label in data_positive:
734 | progress(title_text='Train models',
735 | fold=fold,
736 | note=scene_label+" / "+event_label)
737 | if classifier_method == 'gmm':
738 | model_container['models'][event_label] = {}
739 | model_container['models'][event_label]['positive'] = mixture.GMM(**classifier_params).fit(data_positive[event_label])
740 | model_container['models'][event_label]['negative'] = mixture.GMM(**classifier_params).fit(data_negative[event_label])
741 | else:
742 | raise ValueError("Unknown classifier method ["+classifier_method+"]")
743 |
744 | # Save models
745 | save_data(current_model_file, model_container)
746 |
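747 | # A self-contained sketch of the frame-mask construction used above, with the
748 | # YAML default hop of 0.02 s; events are (onset, offset) pairs in seconds:
749 | #
750 | #   def event_mask(events, n_frames, hop_length_seconds=0.02):
751 | #       mask = numpy.zeros(n_frames, dtype=bool)
752 | #       for onset, offset in events:
753 | #           start_frame = int(math.floor(onset / hop_length_seconds))
754 | #           stop_frame = min(int(math.ceil(offset / hop_length_seconds)), n_frames)
755 | #           mask[start_frame:stop_frame] = True
756 | #       return mask
757 | #
758 | #   mask = event_mask([(0.5, 1.25)], n_frames=100)  # frames 25..62 become True
759 | #   # positive examples: feature_data[mask, :]; negative: feature_data[~mask, :]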
747 |
748 | def do_system_testing(dataset, result_path, feature_path, model_path, feature_params, detector_params,
749 | dataset_evaluation_mode='folds', classifier_method='gmm', overwrite=False):
750 | """System testing.
751 |
752 |     If extracted features are not found on disk, they are extracted on the fly but not saved.
753 |
754 | Parameters
755 | ----------
756 | dataset : class
757 | dataset class
758 |
759 | result_path : str
760 | path where the results are saved.
761 |
762 | feature_path : str
763 | path where the features are saved.
764 |
765 | model_path : str
766 | path where the models are saved.
767 |
768 |     feature_params : dict
769 |         feature extraction parameter dict
770 |
771 |     detector_params : dict
772 |         sound event detector parameter dict
773 |
774 |     dataset_evaluation_mode : str ['folds', 'full']
775 |         evaluation mode; with 'full', all available material is treated as a single fold.
776 |         (Default value='folds')
774 |
775 | classifier_method : str ['gmm']
776 | classifier method, currently only GMM supported
777 | (Default value='gmm')
778 |
779 | overwrite : bool
780 |         overwrite existing result files
781 | (Default value=False)
782 |
783 | Returns
784 | -------
785 | nothing
786 |
787 | Raises
788 | -------
789 | ValueError
790 | classifier_method is unknown.
791 |
792 | IOError
793 | Model file not found.
794 | Audio file not found.
795 |
796 | """
797 |
798 | if classifier_method != 'gmm':
799 | raise ValueError("Unknown classifier method ["+classifier_method+"]")
800 |
801 | # Check that target path exists, create if not
802 | check_path(result_path)
803 |
804 | for fold in dataset.folds(mode=dataset_evaluation_mode):
805 | for scene_id, scene_label in enumerate(dataset.scene_labels):
806 | current_result_file = get_result_filename(fold=fold, scene_label=scene_label, path=result_path)
807 |
808 | if not os.path.isfile(current_result_file) or overwrite:
809 | results = []
810 |
811 | # Load class model container
812 | model_filename = get_model_filename(fold=fold, scene_label=scene_label, path=model_path)
813 | if os.path.isfile(model_filename):
814 | model_container = load_data(model_filename)
815 | else:
816 | raise IOError("Model file not found [%s]" % model_filename)
817 |
818 | file_count = len(dataset.test(fold, scene_label=scene_label))
819 | for file_id, item in enumerate(dataset.test(fold=fold, scene_label=scene_label)):
820 | progress(title_text='Testing',
821 | fold=fold,
822 | percentage=(float(file_id) / file_count),
823 | note=scene_label+" / "+os.path.split(item['file'])[1])
824 |
825 | # Load features
826 |                     feature_filename = get_feature_filename(audio_file=os.path.split(item['file'])[1], path=feature_path)
827 |
828 | if os.path.isfile(feature_filename):
829 | feature_data = load_data(feature_filename)['feat']
830 | else:
831 | # Load audio
832 | if os.path.isfile(dataset.relative_to_absolute_path(item['file'])):
833 |                             y, fs = load_audio(filename=dataset.relative_to_absolute_path(item['file']), mono=True, fs=feature_params['fs'])
834 | else:
835 | raise IOError("Audio file not found [%s]" % item['file'])
836 |
837 | # Extract features
838 | feature_data = feature_extraction(y=y,
839 | fs=fs,
840 | include_mfcc0=feature_params['include_mfcc0'],
841 | include_delta=feature_params['include_delta'],
842 | include_acceleration=feature_params['include_acceleration'],
843 | mfcc_params=feature_params['mfcc'],
844 | delta_params=feature_params['mfcc_delta'],
845 | acceleration_params=feature_params['mfcc_acceleration'],
846 | statistics=False)['feat']
847 |
848 | # Normalize features
849 | feature_data = model_container['normalizer'].normalize(feature_data)
850 |
851 | current_results = event_detection(feature_data=feature_data,
852 | model_container=model_container,
853 | hop_length_seconds=feature_params['hop_length_seconds'],
854 | smoothing_window_length_seconds=detector_params['smoothing_window_length'],
855 | decision_threshold=detector_params['decision_threshold'],
856 | minimum_event_length=detector_params['minimum_event_length'],
857 | minimum_event_gap=detector_params['minimum_event_gap'])
858 |
859 | # Store the result
860 | for event in current_results:
861 |                         results.append((dataset.absolute_to_relative(item['file']), event[0], event[1], event[2]))
862 |
863 | # Save testing results
864 | with open(current_result_file, 'wt') as f:
865 | writer = csv.writer(f, delimiter='\t')
866 | for result_item in results:
867 | writer.writerow(result_item)
868 |
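869 | # event_detection() is implemented in src/sound_event_detection.py (not shown
870 | # here). Based on the system description in this script's header, a rough,
871 | # non-authoritative sketch of the idea for one event class (sliding_mean and
872 | # the variable names are illustrative):
873 | #
874 | #   llr = models['positive'].score(feature_data) - models['negative'].score(feature_data)
875 | #   active = sliding_mean(llr, smoothing_window_length_seconds / hop_length_seconds) > decision_threshold
876 | #   # contiguous active frames are merged into events; short events and short
877 | #   # gaps are then cleaned up via minimum_event_length / minimum_event_gap.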
869 |
870 | def do_system_evaluation(dataset, result_path, dataset_evaluation_mode='folds'):
871 | """System evaluation. Testing outputs are collected and evaluated. Evaluation results are printed.
872 |
873 | Parameters
874 | ----------
875 | dataset : class
876 | dataset class
877 |
878 | result_path : str
879 | path where the results are saved.
880 |
881 | dataset_evaluation_mode : str ['folds', 'full']
882 |         evaluation mode; with 'full', all available material is treated as a single fold.
883 | (Default value='folds')
884 |
885 | Returns
886 | -------
887 | nothing
888 |
889 | Raises
890 | -------
891 | IOError
892 | Result file not found
893 |
894 | """
895 |
896 |     # Silence warnings; sklearn metrics trigger a warning in F1 scoring for
897 |     # classes without predicted samples. This just keeps the printout clean.
898 | warnings.simplefilter("ignore")
899 |
900 | overall_metrics_per_scene = {}
901 |
902 | for scene_id, scene_label in enumerate(dataset.scene_labels):
903 | if scene_label not in overall_metrics_per_scene:
904 | overall_metrics_per_scene[scene_label] = {}
905 |
906 | dcase2016_segment_based_metric = DCASE2016_EventDetection_SegmentBasedMetrics(class_list=dataset.event_labels(scene_label=scene_label))
907 | dcase2016_event_based_metric = DCASE2016_EventDetection_EventBasedMetrics(class_list=dataset.event_labels(scene_label=scene_label), use_onset_condition=True, use_offset_condition=False)
908 |
909 | for fold in dataset.folds(mode=dataset_evaluation_mode):
910 | results = []
911 | result_filename = get_result_filename(fold=fold, scene_label=scene_label, path=result_path)
912 |
913 | if os.path.isfile(result_filename):
914 | with open(result_filename, 'rt') as f:
915 | for row in csv.reader(f, delimiter='\t'):
916 | results.append(row)
917 | else:
918 | raise IOError("Result file not found [%s]" % result_filename)
919 |
920 | for file_id, item in enumerate(dataset.test(fold, scene_label=scene_label)):
921 | current_file_results = []
922 | for result_line in results:
923 | if len(result_line) != 0 and result_line[0] == dataset.absolute_to_relative(item['file']):
924 | current_file_results.append(
925 | {'file': result_line[0],
926 | 'event_onset': float(result_line[1]),
927 | 'event_offset': float(result_line[2]),
928 | 'event_label': result_line[3].rstrip()
929 | }
930 | )
931 | meta = dataset.file_meta(dataset.absolute_to_relative(item['file']))
932 |
933 | dcase2016_segment_based_metric.evaluate(system_output=current_file_results, annotated_ground_truth=meta)
934 | dcase2016_event_based_metric.evaluate(system_output=current_file_results, annotated_ground_truth=meta)
935 |
936 | overall_metrics_per_scene[scene_label]['segment_based_metrics'] = dcase2016_segment_based_metric.results()
937 | overall_metrics_per_scene[scene_label]['event_based_metrics'] = dcase2016_event_based_metric.results()
938 |
939 | print " Evaluation over %d folds" % dataset.fold_count
940 | print " "
941 | print " Results per scene "
942 | print " {:18s} | {:5s} | | {:39s} ".format('', 'Main', 'Secondary metrics')
943 | print " {:18s} | {:5s} | | {:38s} | {:14s} | {:14s} | {:14s} ".format('', '', 'Seg/Overall','Seg/Class', 'Event/Overall','Event/Class')
944 | print " {:18s} | {:5s} | | {:6s} : {:5s} : {:5s} : {:5s} : {:5s} | {:6s} : {:5s} | {:6s} : {:5s} | {:6s} : {:5s} |".format('Scene', 'ER', 'F1', 'ER', 'ER/S', 'ER/D', 'ER/I', 'F1', 'ER', 'F1', 'ER', 'F1', 'ER')
945 | print " -------------------+-------+ +--------+-------+-------+-------+-------+--------+-------+--------+-------+--------+-------+"
946 | averages = {
947 | 'segment_based_metrics': {
948 | 'overall': {
949 | 'ER': [],
950 | 'F': [],
951 | },
952 | 'class_wise_average': {
953 | 'ER': [],
954 | 'F': [],
955 | }
956 | },
957 | 'event_based_metrics': {
958 | 'overall': {
959 | 'ER': [],
960 | 'F': [],
961 | },
962 | 'class_wise_average': {
963 | 'ER': [],
964 | 'F': [],
965 | }
966 | },
967 | }
968 | for scene_id, scene_label in enumerate(dataset.scene_labels):
969 | print " {:18s} | {:5.2f} | | {:4.1f} % : {:5.2f} : {:5.2f} : {:5.2f} : {:5.2f} | {:4.1f} % : {:5.2f} | {:4.1f} % : {:5.2f} | {:4.1f} % : {:5.2f} |".format(scene_label,
970 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['overall']['ER'],
971 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['overall']['F'] * 100,
972 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['overall']['ER'],
973 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['overall']['S'],
974 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['overall']['D'],
975 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['overall']['I'],
976 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise_average']['F']*100,
977 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise_average']['ER'],
978 | overall_metrics_per_scene[scene_label]['event_based_metrics']['overall']['F']*100,
979 | overall_metrics_per_scene[scene_label]['event_based_metrics']['overall']['ER'],
980 | overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise_average']['F']*100,
981 | overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise_average']['ER'],
982 | )
983 | averages['segment_based_metrics']['overall']['ER'].append(overall_metrics_per_scene[scene_label]['segment_based_metrics']['overall']['ER'])
984 | averages['segment_based_metrics']['overall']['F'].append(overall_metrics_per_scene[scene_label]['segment_based_metrics']['overall']['F'])
985 | averages['segment_based_metrics']['class_wise_average']['ER'].append(overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise_average']['ER'])
986 | averages['segment_based_metrics']['class_wise_average']['F'].append(overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise_average']['F'])
987 | averages['event_based_metrics']['overall']['ER'].append(overall_metrics_per_scene[scene_label]['event_based_metrics']['overall']['ER'])
988 | averages['event_based_metrics']['overall']['F'].append(overall_metrics_per_scene[scene_label]['event_based_metrics']['overall']['F'])
989 | averages['event_based_metrics']['class_wise_average']['ER'].append(overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise_average']['ER'])
990 | averages['event_based_metrics']['class_wise_average']['F'].append(overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise_average']['F'])
991 |
992 | print " -------------------+-------+ +--------+-------+-------+-------+-------+--------+-------+--------+-------+--------+-------+"
993 | print " {:18s} | {:5.2f} | | {:4.1f} % : {:5.2f} : {:21s} | {:4.1f} % : {:5.2f} | {:4.1f} % : {:5.2f} | {:4.1f} % : {:5.2f} |".format('Average',
994 | numpy.mean(averages['segment_based_metrics']['overall']['ER']),
995 | numpy.mean(averages['segment_based_metrics']['overall']['F'])*100,
996 | numpy.mean(averages['segment_based_metrics']['overall']['ER']),
997 | ' ',
998 | numpy.mean(averages['segment_based_metrics']['class_wise_average']['F'])*100,
999 | numpy.mean(averages['segment_based_metrics']['class_wise_average']['ER']),
1000 | numpy.mean(averages['event_based_metrics']['overall']['F'])*100,
1001 | numpy.mean(averages['event_based_metrics']['overall']['ER']),
1002 | numpy.mean(averages['event_based_metrics']['class_wise_average']['F'])*100,
1003 | numpy.mean(averages['event_based_metrics']['class_wise_average']['ER']),
1004 | )
1005 |
1006 | print " "
1007 | # Restore warnings to default settings
1008 | warnings.simplefilter("default")
1009 |     print " Results per event "
1010 |
1011 | for scene_id, scene_label in enumerate(dataset.scene_labels):
1012 | print " "
1013 | print " "+scene_label.upper()
1014 | print " {:20s} | {:30s} | | {:15s} ".format('', 'Segment-based', 'Event-based')
1015 | print " {:20s} | {:5s} : {:5s} : {:6s} : {:5s} | | {:5s} : {:5s} : {:6s} : {:5s} |".format('Event', 'Nref', 'Nsys', 'F1', 'ER', 'Nref', 'Nsys', 'F1', 'ER')
1016 | print " ---------------------+-------+-------+--------+-------+ +-------+-------+--------+-------+"
1017 | seg_Nref = 0
1018 | seg_Nsys = 0
1019 |
1020 | event_Nref = 0
1021 | event_Nsys = 0
1022 | for event_label in sorted(overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise']):
1023 | print " {:20s} | {:5d} : {:5d} : {:4.1f} % : {:5.2f} | | {:5d} : {:5d} : {:4.1f} % : {:5.2f} |".format(event_label,
1024 | int(overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise'][event_label]['Nref']),
1025 | int(overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise'][event_label]['Nsys']),
1026 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise'][event_label]['F']*100,
1027 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise'][event_label]['ER'],
1028 | int(overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise'][event_label]['Nref']),
1029 | int(overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise'][event_label]['Nsys']),
1030 | overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise'][event_label]['F']*100,
1031 | overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise'][event_label]['ER'])
1032 | seg_Nref += int(overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise'][event_label]['Nref'])
1033 | seg_Nsys += int(overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise'][event_label]['Nsys'])
1034 |
1035 | event_Nref += int(overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise'][event_label]['Nref'])
1036 | event_Nsys += int(overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise'][event_label]['Nsys'])
1037 | print " ---------------------+-------+-------+--------+-------+ +-------+-------+--------+-------+"
1038 | print " {:20s} | {:5d} : {:5d} : {:14s} | | {:5d} : {:5d} : {:14s} |".format('Sum',
1039 | seg_Nref,
1040 | seg_Nsys,
1041 | '',
1042 | event_Nref,
1043 | event_Nsys,
1044 | '')
1045 | print " {:20s} | {:5s} {:5s} : {:4.1f} % : {:5.2f} | | {:5s} {:5s} : {:4.1f} % : {:5.2f} |".format('Average',
1046 | '', '',
1047 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise_average']['F']*100,
1048 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise_average']['ER'],
1049 | '', '',
1050 | overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise_average']['F']*100,
1051 | overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise_average']['ER'])
1052 | print " "
1053 |
1054 | if __name__ == "__main__":
1055 | try:
1056 | sys.exit(main(sys.argv))
1057 | except (ValueError, IOError) as e:
1058 | sys.exit(e)
--------------------------------------------------------------------------------
/task3_sound_event_detection_in_real_life_audio.yaml:
--------------------------------------------------------------------------------
1 | # ==========================================================
2 | # Flow
3 | # ==========================================================
4 | flow:
5 | initialize: true
6 | extract_features: true
7 | feature_normalizer: true
8 | train_system: true
9 | test_system: true
10 | evaluate_system: true
11 |
12 | # ==========================================================
13 | # General
14 | # ==========================================================
15 | general:
16 | development_dataset: TUTSoundEvents_2016_DevelopmentSet
17 | challenge_dataset: TUTSoundEvents_2016_EvaluationSet
18 |
19 | overwrite: false # Overwrite previously stored data
20 |
21 | # ==========================================================
22 | # Paths
23 | # ==========================================================
24 | path:
25 | data: data/
26 |
27 | base: system/baseline_dcase2016_task3/
28 | features: features/
29 | feature_normalizers: feature_normalizers/
30 | models: acoustic_models/
31 | results: evaluation_results/
32 |
33 | challenge_results: challenge_submission/task_3_sound_event_detection_in_real_life_audio/
34 |
35 | # ==========================================================
36 | # Feature extraction
37 | # ==========================================================
38 | features:
39 | fs: 44100
40 | win_length_seconds: 0.04
41 | hop_length_seconds: 0.02
42 |
43 | include_mfcc0: false
44 | include_delta: true
45 | include_acceleration: true
46 |
47 | mfcc:
48 | window: hamming_asymmetric # [hann_asymmetric, hamming_asymmetric]
49 | n_mfcc: 20 # Number of MFCC coefficients
50 | n_mels: 40 # Number of MEL bands used
51 | n_fft: 2048 # FFT length, make sure this is larger than win_length_seconds*fs
52 | fmin: 0 # Minimum frequency when constructing MEL bands
53 |     fmax: 22050                   # Maximum frequency when constructing MEL bands
54 | htk: false # Switch for HTK-styled MEL-frequency equation
55 |
56 | mfcc_delta:
57 | width: 9
58 |
59 | mfcc_acceleration:
60 | width: 9
61 |
62 | # ==========================================================
63 | # Classifier
64 | # ==========================================================
65 | classifier:
66 | method: gmm # The system supports only gmm
67 | parameters: !!null # Parameters are copied from classifier_parameters based on defined method
68 |
69 | classifier_parameters:
70 | gmm:
71 | n_components: 16 # Number of Gaussian components
72 | covariance_type: diag # [diag|full] Diagonal or full covariance matrix
73 | random_state: 0
74 | thresh: !!null
75 | tol: 0.001
76 | min_covar: 0.001
77 | n_iter: 40
78 | n_init: 1
79 | params: wmc
80 | init_params: wmc
81 |
82 | # ==========================================================
83 | # Detector
84 | # ==========================================================
85 | detector:
86 | decision_threshold: 160.0
87 | smoothing_window_length: 1.0 # seconds
88 | minimum_event_length: 0.1 # seconds
89 | minimum_event_gap: 0.1 # seconds
90 |
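91 | # These four values are passed to event_detection() in
92 | # src/sound_event_detection.py: the frame-wise likelihood ratio is smoothed
93 | # over smoothing_window_length and compared against decision_threshold, and the
94 | # resulting segments are then cleaned up using minimum_event_length and
95 | # minimum_event_gap (see that file for the exact rules).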
--------------------------------------------------------------------------------